Excel statistical functions: DVAR, DVARP, DSTDEV, and DSTDEVP (828125)
The information in this article applies to:
- Microsoft Office Excel 2003
- Microsoft Excel 2004 for Mac
SUMMARYThe computational formulas for the DVAR, DVARP, DSTDEV, and
DSTDEVP functions have been improved for Excel 2003. This article discusses the
effects of these improvements.
Microsoft Excel 2004 for Mac information
The statistical functions in Excel 2004 for Mac were updated by using the same algorithms that were used to update the statistical functions in Microsoft Office Excel 2003. Any information in this article that describes how a function works or how a function was modified for Excel 2003 also applies to Excel 2004 for Mac.MORE INFORMATIONThese four functions are among Excel's Database and List
Management Functions. Where the functions calculate the variance or the
standard deviation over a set of observations, DVAR, DVARP, DSTDEV, and DSTDEVP
differ from VAR, VARP, STDEV, and STDEVP (respectively) in the way they define
the set of observations. DVAR, DVARP, DSTDEV, and DSTDEVP take as
arguments a database, a field, and criteria. The functions take as observations
the field values in each database record that satisfies criteria. VAR,
VARP, STDEV, and STDEVP take as arguments a set of up to 30 value arguments.
Most frequently, VAR, VARP, STDEV, and STDEVP use a single value argument that
corresponds to a range of cells, such as VAR(A1:B100). Therefore,
with VAR, VARP, STDEV, and STDEVP, you specify the cells that contain the
values you want to include. With DVAR, DVARP, DSTDEV, and DSTDEVP, you specify
a database (table), field (column) and criteria. They include only those values
that occur in cells that satisfy the criteria. SyntaxDVAR(database, field, criteria) Database is the range of cells that makes up the list or database. A
database is a list of related data where the rows of related information are
records, and the columns of data are fields. The first row of the list contains
labels for each column. Field indicates what column the function uses. The field argument can
appear as text with the column label enclosed between quotation marks, such as
"Age" or "Yield," or as a number that represents the position of the column in
the list: 1 (without quotation marks) for the first column, 2 for the second
column, and so on. Criteria is the range of cells that contains the conditions you specify.
You can use any range for the criteria argument if it includes at least one
column label and at least one cell below the column label that specifies a
condition for the column. Note The syntax of DVARP, DSTDEV, and DSTDEVP is the same as DVAR. Example usageThe Excel Help files contain a useful example for these functions.
The following example is a slightly modified version of the example in the Help
files. The following example focuses on the computational aspects of the
functions. Create a blank Excel worksheet, copy the following table,
select cell A1 in your blank Excel worksheet, and then click
Paste on the Edit menu so that the entries in
the table below fill cells A1:F15 in your worksheet.
Tree | Height | Age | Yield | Profit | Height | Apple | >10 | | | | <16 | Pear | | | | | | Tree | Height | Age | Yield | Profit | | Apple | 18 | 20 | =13 +
10^$F$12 | 105 | | Pear | 12 | 12 | =9 +
10^$F$12 | 96 | | Cherry | 13 | 14 | =8 +
10^$F$12 | 105 | | Apple | 14 | 15 | =9 +
10^$F$12 | 75 | | Pear | 9 | 8 | =7 +
10^$F$12 | 76.8 | | Apple | 8 | 9 | =5 +
10^$F$12 | 45 | | | | | | | | =DSTDEV(A4:E10,"Yield",A1:A3) | The estimated standard
deviation in the yield of apple and pear trees if the data in the database is
only a sample of the total orchard population. (2.9664794) | =STDEV(D5,
D6, D8, D9, D10) | 16 | Power of 10 added to Yield | 9 | =DSTDEVP(A4:E10,"Yield",A1:A3) | The true standard
deviation in the yield of apple and pear trees if the data in the database is
the whole population. (2.6532998) | =STDEVP(D5, D6, D8, D9,
D10) | 12.8 | | | =DVAR(A4:E10,"Yield",A1:A3) | The estimated variance in
the yield of apple and pear trees if the data in the database is only a sample
of the total orchard population. (8.8000000) | =VAR(D5, D6, D8, D9,
D10) | 256 | | | =DVARP(A4:E10,"Yield",A1:A3) | The true variance in the
yield of apple and pear trees if the data in the database is the whole orchard
population. (
7.0400000) | =VARP(D5,D6,D8,D9,D10) | 163.84 | | |
You may want to format cells A12:A15 and C12:C15 as Number with
7 decimal places and cells D5:D10 as Number with 0 (zero) decimal
places. This worksheet is designed to show the differences in behavior
between Excel 2002 (and earlier) and Excel 2003. Similar worksheets in the
articles for VAR, VARP, STDEV, and STDEVP give you an opportunity for a more
complete investigation of these differences.
For additional information, click the following article numbers to view the articles in the Microsoft Knowledge Base:
826112
Excel statistical functions: VAR
826393 Excel statistical functions: VARP
826349 Description of the STDEV function in Excel 2003
826406 Excel statistical functions: STDEVP
Each of the four calls to these "D" functions in cells
A12:A15 use the same database, field, and criteria. Data is in A4:E10, the
specified field is "Yield" whose entries are in column D, and the criteria
specified in A1:A3 is that the value in the "Tree" field must be "Apple" or
"Pear". There are five values that satisfy the criteria and are in cells D5,
D6, D8, D9, and D10. Entries in C12:C15 are the corresponding calls to STDEV,
STDEVP, VAR, and VARP that use these five cells as arguments, for example
STDEV(D5, D6, D8, D9, D10). Entries in C12:C15 are the same as those in
A12:A15. If the value in cell F12 has not changed from its initial value of 0,
these values appear in parentheses in the text descriptions in cells
B12:B15. If you have a version of Excel earlier than Excel 2003, there
is the potential for significant round off errors in extreme situations. In
cell F12, you can add a power of 10 to each Yield data value. If you add the
same constant to all the Yield data values, this should not affect the values
of any of the four "D" functions or their counterparts STDEV, STDEVP, VAR, and
VARP. If you gradually increase the value in cell F12 to 1, 2, 3, and 4, you
can see that the addition of a constant does not affect the values of any of
the eight functions shown in A12:A15 and C12:C15. Then, if you continue to
increase the value in F12 to 5, 6, and 7, minor round off errors occur only in
DVAR and VAR. If you continue to increase the value in F12 to 8 and 9, serious
round off errors occur in all eight functions. When you set F12 to 9, these
entries agree with the fixed numbers that appear in D12:D15. These
round off errors may alarm you, but they also illustrate the extreme type of
situation where they occur. If there are many significant digits in the data
and a very small variability between the data points, these round off errors
can occur. When you set F12 to 9, every value is over 1 billion, but the sample
and population standard deviations are both below 3. If you have Excel
2003, there are no round off errors when you conduct this experiment. If you
set F12 to 9, the entries in D12:D15 are the values that you would have
obtained with the versions of Excel earlier than Excel 2003. Results in earlier versions of ExcelIn extreme cases where there are many significant digits in the
data but a small variance, the old formula leads to inaccurate results. Earlier
versions of Excel use a single pass through the data to calculate the sum of
the squares of the data values, the sum of the data values, and the count of
the data values (sample size). Excel 2002 and earlier then combine these
quantities into the formula in the Help files for VAR, VARP, STDEV, and STDEVP.
These formulas are also known as "calculator formulas" because they are
suitable for use on a hand calculator for small sets of numerically
well-behaved data. If, for example, you had weights in pounds for a sample of
100 people and these weights were all whole numbers between 100 and 400, then
the "calculator formulas" work just as effectively as a calculator in practice. Results in Excel 2003In Excel 2003, the procedure uses two passes through the data. On
the first pass, Excel 2003 calculates the sum and count of the data values, and
from these it can calculate the sample mean (average). On the second pass,
Excel finds the squared difference between each data point and the sample mean,
and then sums these squared differences. In the numeric examples, even a high
power of 10 in cell F12 does not affect the squared differences, and the
results of the second pass are independent of the entry in cell F12. Therefore,
Excel 2003 gives results that are numerically more stable. ConclusionsA two-pass approach gives better numeric performance of the
functions DVAR, DVARP, DSTDEV, and DSTDEVP and the functions VAR, VARP, STDEV,
and STDEVP in Excel 2003 than in earlier versions of Excel. The Excel 2003
results are never less accurate than the results in earlier versions of
Excel. In most practical cases, however, you are not likely see a
difference between Excel 2003 results and the results in earlier versions of
Excel. Typical data is not likely to exhibit the unusual behavior that the
earlier experiment illustrates. Numeric instability will most likely occur in
earlier versions of Excel when the data contains many significant digits and
little variation between data values. The following procedure that
finds the sum of squared deviations about a sample mean
- Find the sample mean.
- Calculate each squared deviation.
- Sum the squared deviations.
is more accurate than the alternative procedure (the "calculator
formula")
- Find the sum of the squares of all the observations, the
sample size, and the sum of all the observations.
- Compute the sum of the squares of all the observations
minus ((sum of all the
observations)^2)/sample
size)
Because Excel 2003 replaces the one-pass procedure with the
two-pass procedure that finds the sample mean on the first pass and computes
the sum of squared deviations about it on the second pass, many other functions
are improved in Excel 2003. A short list of such improved functions includes
VAR, VARP, STDEV, STDEVP, DVAR, DVARP, DSTDEV, DSTDEVP, FORECAST, SLOPE,
INTERCEPT, PEARSON, RSQ, and STEYX. Excel 2003 has similar improvements in each
of the three Analysis of Variance tools in the Analysis
ToolPak. For more information about DVAR, DVARP,
DSTDEV, or DSTDEVP, click Microsoft Excel Help on the
Help menu, type dvar, dvarp, dstdev, or
dstdevp in the Search for box in the Assistance
pane, and then click Start searching to view the topic.
Modification Type: | Minor | Last Reviewed: | 1/11/2006 |
---|
Keywords: | kbinfo KB828125 kbAudEndUser |
---|
|