Excel Statistical Functions: CONFIDENCE (828124)

MORE INFORMATION

The CONFIDENCE(alpha, sigma, n) function returns a value that you can use to construct a confidence interval for a population mean. The confidence interval is a range of values that are centered at a known sample mean. Observations in the sample are assumed to come from a normal distribution with known standard deviation, sigma, and the number of observations in the sample is n.

Syntax

CONFIDENCE(alpha,sigma,n)

Parameters: Alpha is a probability and 0 < alpha < 1. Sigma is a positive number, and n is a positive integer that corresponds to the sample size.

Typically, alpha is a small probability, such as 0.05.

Example Usage

Assume that Intelligence Quotient (IQ) scores follow a normal distribution with standard deviation 15. You test IQs for a sample of 50 students in your local school and obtain a sample mean of 105. You want to compute a 95% confidence interval for the population mean. A 95% or 0.95 confidence interval corresponds to alpha = 1 - 0.95 = 0.05.

To illustrate the CONFIDENCE function, create a blank Excel worksheet, copy the table below, and then select cell A1 in your blank Excel worksheet. On the Edit menu, click Paste so that the entries in the table below fill cells A1:B7 in your worksheet.

alpha	0.05
stdev	15
n	50
sample mean	105

=CONFIDENCE(B1,B2,B3)
=NORMSINV(1 - B1/2)*B2/SQRT(B3)

Note After you paste this table into your new Excel worksheet, click the Paste Options button, and then click Match Destination Formatting. With the pasted range still selected, on the Format menu, point to Column, and then click AutoFit Selection.

Cell A6 shows the value of CONFIDENCE. Cell A7 shows the same value because a call to CONFIDENCE(alpha, sigma, n) returns the result of computing:

NORMSINV(1 - alpha/2) * sigma / SQRT(n)

No changes were made directly to CONFIDENCE, but NORMSINV was improved between versions of Excel that are earlier than Excel 2002 and Excel 2002, and then again, improvements were made between Excel 2002 and Excel 2003. Therefore, CONFIDENCE may return different (and improved) results in these later versions because CONFIDENCE relies on NORMSINV.

This does not mean that you should lose confidence in CONFIDENCE for earlier versions. Inaccuracies in NORMSINV generally occurred for values of its argument very close to 0 or very close to 1. In practice, alpha is generally set to 0.05, 0.01, or maybe 0.001. Values of alpha have to be much smaller than that, for example 0.0000001, before round off errors in NORMSINV are likely to be noticed.

Note See the article on NORMSINV for a discussion of computational differences in NORMSINV for versions of Excel that are earlier than Excel 2002, for Excel 2002, and for Excel 2003.

For additional information, click the following article number to view the article in the Microsoft Knowledge Base:

826772 Excel Statistical Functions: NORMSINV

Interpretation of the Results of CONFIDENCE

The Excel Help file for CONFIDENCE has been rewritten for Excel 2003 because all earlier versions of the Help file gave misleading advice on interpreting results. The example states, "Suppose we observe that, in our sample of 50 commuters, the average length of travel to work is 30 minutes with a population standard deviation of 2.5. We can be 95 percent confident that the population mean is in the interval 30 +/- 0.692951" where 0.692951 is the value returned by CONFIDENCE(0.05, 2.5, 50). For the same example the conclusion reads, "the average length of travel to work equals 30 ± 0.692951 minutes, or 29.3 to 30.7 minutes." Presumably, this is also a statement about the population mean falling within the interval [30 - 0.692951, 30 + 0.692951] with probability 0.95.

Before conducting the experiment that yielded the data for this example, a classical statistician (as opposed to a Bayesian statistician) can make no statement about the probability distribution of the population mean. Instead, a classical statistician deals with hypothesis testing. For example, a classical statistician may want to conduct a two-sided hypothesis test that is based on the supposition of a normal distribution with known standard deviation (such as 2.5), a particular pre-selected value of the population mean, µ0, and a pre-selected significance level (such as 0.05). The test's result would be based on the value of the observed sample mean (for example 30) and the null hypothesis that the population mean is µ0 would be rejected at a significance level 0.05 if the observed sample mean was too far from µ0 in either direction. If the null hypothesis is rejected, the interpretation is that a sample mean that far or further from µ0 would occur by chance less than 5% of the time under the supposition that µ0 is the true population mean. After conducting this test, a classical statistician still cannot make any statement about the probability distribution of the population mean.

A Bayesian statistician, on the other hand, would start with an assumed probability distribution for the population mean (named an a priori distribution), would gather experimental evidence in the same way as the classical statistician, and would use this evidence to revise her or his probability distribution for the population mean and thereby obtain an a posteriori distribution. Excel provides no statistical functions that would help a Bayesian statistician in this endeavor. Excel's statistical functions are all intended for classical statisticians.

Confidence intervals are related to Hypothesis Tests. Given the experimental evidence, a confidence interval makes a concise statement about the values of the hypothesized population mean µ0 that would yield acceptance of the null hypothesis that the population mean is µ0 and the values of µ0 that would yield rejection of the null hypothesis that the population mean is µ0. A classical statistician cannot make any statement about the chance that the population mean falls in any specific interval, because she or he never makes a priori assumptions about this probability distribution and such assumptions would be required if one were to use experimental evidence to revise them.

Explore the relationship between hypothesis tests and confidence intervals by using the example at the beginning of this section. With the relationship between CONFIDENCE and NORMSINV stated in the last section, you have:

CONFIDENCE(0.05, 2.5, 50) = NORMSINV(1 - 0.05/2) * 2.5 / SQRT(50) = 0.692951

Because the sample mean is 30, the confidence interval is 30 +/- 0.692951.

Now consider a two-sided hypothesis test with the significance level 0.05 as described earlier that assumes a normal distribution with standard deviation 2.5, a sample size of 50 and a specific hypothesized population mean, µ0. If this is the true population mean, then the sample mean will come from a normal distribution with population mean µ0 and standard deviation, 2.5/SQRT(50). This distribution is symmetrical about µ0 and you would want to reject the null hypothesis if ABS(sample mean - µ0) > some cutoff value. The cutoff value would be such that if µ0 were the true population mean, a value of sample mean - µ0 higher than this cutoff or a value of µ0 - sample mean higher than this cutoff would each occur with probability 0.05/2. This cutoff value is

NORMSINV(1 - 0.05/2) * 2.5/SQRT(50) = CONFIDENCE(0.05, 2.5, 50) = 0. 692951

So reject the null hypothesis (population mean = µ0) if

sample mean - µ0 > 0. 692951

-or-

0 - sample mean > 0. 692951

Because sample mean = 30 in our example these two statements become

30 - µ0 > 0. 692951

-or-

µ0 - 30 > 0. 692951

and rewriting them so that only µ0 appears on the left

µ0 < 30 - 0. 692951

-or-

µ0 > 30 + 0. 692951

These are exactly the values of µ0 that are not in the confidence interval [30 - 0.692951, 30 + 0.692951]. Therefore the confidence interval [30 - 0.692951, 30 + 0.692951] contains those values of µ0 where the null hypothesis that the population mean is µ0 would not be rejected given the sample evidence. For values of µ0 outside this interval, the null hypothesis that the population mean is µ0 would be rejected given the sample evidence.

Conclusions

Inaccuracies in earlier versions of Excel generally occur for extremely small or extremely large values of p in NORMSINV(p). CONFIDENCE is evaluated by calling NORMSINV(p) so accuracy of NORMSINV is a potential concern for users of CONFIDENCE. However, values of p used in practice are not likely to be extreme enough to cause significant round off errors in NORMSINV and performance of CONFIDENCE should not be a concern to users of any version of Excel.

Most of this article has focused on interpreting the results of CONFIDENCE, in other words, "what is the meaning of a confidence interval?" Confidence intervals are frequently misunderstood. Unfortunately, Excel Help files in all versions of Excel that are earlier than Excel 2003 have contributed to this misunderstanding. The Excel 2003 Help file has been improved.

For more information about CONFIDENCE, click Microsoft Excel Help on the Help menu, type confidence in the Search for box in the Assistance pane, and then click Start searching to view the topic.

Excel Statistical Functions: CONFIDENCE (828124)

SUMMARY

MORE INFORMATION

Syntax

Example Usage

Interpretation of the Results of CONFIDENCE

Conclusions