## Confidence Interval

The concept of the confidence interval was introduced and developed theoretically by Neyman in the 1930s. A confidence interval is a range of values around a parameter estimate, constructed so as to express a specified degree of certainty that the range contains the true value of the population parameter. The upper and lower boundaries of the range are the confidence limits. The width of the confidence interval indicates the degree of precision associated with the parameter estimate: wider intervals indicate less precision, and narrower intervals indicate greater precision. The width of the interval can never be zero, because there will always be some sampling error associated with estimating a population parameter from sample data. Sampling error may be due to measurement unreliability or other chance factors that cause fluctuations from sample to sample. The result is that no matter how carefully a sample is drawn or how large it is, there can be no certainty that the sample estimate is exactly equal to the parameter (population) value.

The calculation of the confidence interval for any parameter is based on the standard error of the relevant sampling distribution. For a simple observation, X, assuming an underlying normal distribution with mean μ and standard deviation σ, the confidence limits on the observation can be stated simply as X = μ ± zσ, where z represents the standard normal deviate associated with any particular level of confidence. Any confidence level may be specified, but in practice the most commonly used intervals are the 95%, 99%, and 99.9% levels:

95% confidence limits: −1.96σ < X − μ < +1.96σ

99% confidence limits: −2.58σ < X − μ < +2.58σ

99.9% confidence limits: −3.29σ < X − μ < +3.29σ
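The critical values above can be computed directly from the standard normal distribution. The following Python sketch (the function name `z_interval` is illustrative, not from the original) uses the standard library's `statistics.NormalDist` to recover the limits for any confidence level:

```python
from statistics import NormalDist

def z_interval(mu, sigma, confidence):
    """Confidence limits mu ± z*sigma for a single observation X
    drawn from a normal distribution with mean mu and SD sigma."""
    # Two-sided critical value: e.g. inv_cdf(0.975) ≈ 1.96 for 95%
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return mu - z * sigma, mu + z * sigma

for conf in (0.95, 0.99, 0.999):
    lo, hi = z_interval(0, 1, conf)
    print(f"{conf:.1%}: ({lo:+.2f}, {hi:+.2f})")
```

Run on the standard normal (μ = 0, σ = 1), this reproduces the three pairs of limits listed above: ±1.96, ±2.58, and ±3.29.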

During the early decades of research in experimental psychology, 50% confidence intervals were commonly reported, based on the concept of the probable error (μ ± 0.6745σ). The probable error is now seldom used and is generally considered obsolete.

Confidence limits can be computed for any sample statistic for which the sampling distribution is known. For example, for the mean (M), the standard error of the mean (σ_M) is used, so that M = μ ± zσ_M. If the population mean and standard deviation are not known, which is often the case, estimates based on observed samples may be substituted. However, in this situation, the confidence limits must be set using the t-distribution rather than the normal (z) distribution. When sampling distributions are unknown or seriously depart from the normal distribution, various advanced techniques can be employed to estimate the standard error from observed data, such as bootstrapping, jackknifing, and computer simulation. Confidence intervals are most commonly reported for well-known statistics such as sample means, correlation and regression coefficients, proportions, and predicted scores, but they should also be determined for less commonly used statistics, such as measures of effect size and goodness-of-fit indexes.
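To illustrate the resampling approach mentioned above, here is a minimal sketch of a percentile bootstrap interval (the function name `bootstrap_ci` and the 10,000-resample default are illustrative choices, not prescribed by the original). It makes no assumption about the shape of the sampling distribution:

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, confidence=0.95,
                 n_resamples=10_000, seed=0):
    """Percentile bootstrap confidence interval for any statistic.

    Resamples the observed data with replacement, computes the
    statistic on each resample, and takes the middle `confidence`
    proportion of the resulting empirical distribution."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(
        stat([rng.choice(sample) for _ in range(n)])
        for _ in range(n_resamples)
    )
    alpha = (1 - confidence) / 2
    lo = estimates[int(alpha * n_resamples)]
    hi = estimates[int((1 - alpha) * n_resamples) - 1]
    return lo, hi
```

Because `stat` is a parameter, the same routine applies to a median, a correlation, or an effect-size measure whose standard error has no simple formula.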

As an example of the use of confidence intervals, consider an incoming class of 750 college freshmen with an average recorded Scholastic Aptitude Test (SAT) score of 550. Assuming an underlying normal distribution with μ = 500 and σ = 100, then σ_M = σ/√750 ≈ 3.65. The resulting 95% confidence interval is 543–557. This interval is typically interpreted as meaning that there is a 95% chance that the interval 543–557 contains the "true" value of the freshman class SAT score. However, since any specifically computed interval either does or does not contain the true score, it is probably fairer to say that if a very large number—in principle, an infinite number—of such group means were sampled, 95% of the resulting confidence intervals would contain the true score.
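The arithmetic of the SAT example can be checked in a few lines of Python (stdlib only; the variable names are illustrative):

```python
import math
from statistics import NormalDist

n, sample_mean, mu, sigma = 750, 550, 500, 100
se = sigma / math.sqrt(n)          # standard error of the mean ≈ 3.65
z = NormalDist().inv_cdf(0.975)    # 95% two-sided critical value ≈ 1.96
lo, hi = sample_mean - z * se, sample_mean + z * se
print(f"se = {se:.2f}, 95% CI = ({lo:.0f}, {hi:.0f})")  # → (543, 557)
```

Rounded to whole SAT points, this reproduces the 543–557 interval given above.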

The use of confidence intervals is increasingly being recommended as a substitute for statistical significance testing. This position received its first major explication by Rozeboom in 1960 and has been elaborated by others since, especially Cohen. It holds that null hypothesis significance testing is a barrier to progress in behavioral science, especially with respect to the accumulation of knowledge across studies. The use of confidence intervals, in conjunction with other techniques such as meta-analysis, is proposed to replace traditional significance testing. The idea here is that confidence intervals can provide all of the information present in a significance test while yielding important additional information as well.

Joseph S. Rossi

University of Rhode Island