MORE INFORMATION
The FORECAST(x, known_y's,known_x's) function returns the
predicted value of the dependent variable (represented in the data by
known_y's) for the specific value, x, of the independent variable (represented
in the data by known_x's) by using a best fit (least squares) linear regression
to predict y values from x values.
If you assume that data pairs are
plotted in a scatter plot with x values that are measured on the horizontal
axis and with y values that are measured on the vertical axis, FORECAST returns
the height of the best fit regression line at the specific value x on the
horizontal axis. FORECAST is the value of y that would be predicted based on
both the value of x and the regression line (characterized by its slope and
intercept that can be found by using Excel's SLOPE and INTERCEPT functions).
Syntax
FORECAST(x, known_y's,known_x's)
The parameter x must
have a numeric value, known_y's and known_x's must be arrays or cell ranges
that contain equal numbers of numeric data values.
The most common
usage of FORECAST includes a specific x value plus 2 ranges of cells that
contain the data, such as FORECAST(125, A1:A100, B1:B100).
Example Usage
To illustrate the FORECAST function, follow these steps:
- Create a blank Excel worksheet, and then copy the following
table:
y-values | x-values | | |
1 | = 3 + 10^$D$3 | | Power of 10 to
add to data |
2 | =4 + 10^$D$3 | | 0 |
3 | =2 + 10^$D$3 | | Chosen x |
4 | =5 + 10^$D$3 | | 6 |
5 | =4+10^$D$3 | | Chosen x + Power of
10 |
6 | =7+10^$D$3 | | =D5 + 10^$D$3 |
| | | |
=SLOPE(A2:A7,B2:B7) | | | Excel 2002
and earlier |
=INTERCEPT(A2:A7,B2:B7) | | | when D3
= 7.5 |
=FORECAST(D7,A2:A7,B2:B7) | | | 4.875 |
=A10 + A9*D7 | | | when D3 = 8 |
| | | #DIV/0! |
- Select cell A1 in your blank Excel worksheet, and then
click Paste on the Edit menu so that the
entries in the table so that the entries in the table fill cells A1:D13 in your
Excel worksheet.
- After you paste the table in your new Excel worksheet,
click the Paste Options button, and then click Match
Destination Formatting. With the pasted range still selected, point to
Column on the Format menu, and then click
AutoFit Selection.
Note You may want to format cells B2:B7 as Number with 0 decimal
places, and cells A9:D13 as Number with 6 decimal places.
Cells A2:A7
and B2:B7 contain both the y-values and the x-values that are used to call
FORECAST in cell A10. The x value is in cell D7.
If you have a version
of Excel that is earlier than Excel 2003, FORECAST may exhibit round-off errors
in the earlier versions. This behavior of FORECAST has been improved for Excel
2003. FORECAST(x, known_y's,known_x's) is the result of evaluating the
following expression:
INTERCEPT(known_y's,known_x's) + SLOPE(known_y's,known_x's) * x.
While the code for FORECAST has not been directly changed for
Excel 2003, the behavior of FORECAST is improved as a result of improved code
for SLOPE. The code for INTERCEPT has also not been directly changed, but it
too is improved because it also calls SLOPE.
In versions of Excel that
are earlier than Excel 2003, FORECAST may exhibit round-off errors. If you add
a positive constant to each of the observations in B2:B7, and add that same
constant to x at the same time, it should not affect the value of FORECAST. If
you were to plot x,y pairs with x on the horizontal axis and with y on the
vertical axis, the data would just shift to the right if you add a positive
constant to each x value. If x is shifted to the right by the same amount, the
value of FORECAST should be unaffected. The best fit regression line would
still have the same slope. However, the shifted data would have a different
intercept.
With the default value of 0 in D3, SLOPE in A9 is
0.775280899. Cells A10:A11 show values of INTERCEPT and FORECAST. Cell A12
contains the following expression that is evaluated when you calculate
FORECAST:
INTERCEPT(known_y's,known_x's) + SLOPE(known_y's,known_x's) * x
Values in cells A11 and A12 will always agree because the value
in A12 is exactly what FORECAST returns. SLOPE should not vary as you add
different positive constants to the known_x's.
If you increase the
value in D3, a larger constant is added to B2:B7, and that same constant is
added to the first argument of FORECAST. If D3 <= 7, there are no round-off
errors that appear in the first six decimal places of SLOPE. As a result,
INTERCEPT and FORECAST are well behaved. Now change the value in D3 to 7.25,
7.5, 7.75, and 8. You will notice that SLOPE in A9 changes. This causes
variations in the values in cells A11:A12. These values should remain constant
throughout the experiment.
Cells D7:D13 show that the values that are
returned by FORECAST and the values that should have been returned by FORECAST
had SLOPE (and INTERCEPT) not changed. These pairs of values are shown for the
cases where D3 = 7.5 and 8, respectively.
Note that round-off errors
have become so severe that division by 0 occurs when D3 = 8.
Earlier
versions of Excel exhibit incorrect answers in these cases because the effects
of round-off errors are more profound with the computational formula that is
used by these versions. Still, the cases used in this experiment could be
viewed as extreme.
If you have Excel 2003, there are not any changes
in the common values in A11 and A12 if you try the experiment. However, cells
D7:D13 show round-off errors that you would have obtained with earlier versions
of Excel.
Results in Earlier Versions of Excel
The article for SLOPE describes the less numerically robust
formula that is used in earlier versions of Excel. It requires only one pass
through the data. It is the shortcomings of SLOPE in these earlier versions of
Excel that cause FORECAST to also exhibit round-off errors in extreme cases.
Results in Excel 2003
SLOPE is calculated by an improved procedure in Excel 2003, and
this improves the performance of FORECAST. This procedure requires two passes
through the data. Again, the article on SLOPE describes the improvement.
Conclusions
Replacing a one-pass approach with a two-pass approach guarantees
better numeric performance of SLOPE in Excel 2003, and this translates in a
better performance of FORECAST. Excel 2003 results will never be less accurate
than results in earlier versions of Excel.
However, you are not likely
to see a difference between the Excel 2003 results and the results in earlier
versions of Excel in most practical examples. This occurs because typical data
is unlikely to exhibit the kind of unusual behavior that this experiment
illustrates. Numeric instability is most likely to appear in earlier versions
of Excel when the data contains a high number of significant digits combined
with relatively little variation between data values.
The procedure of
finding the sum of squared deviations about a sample mean has been improved in
Excel 2003 by
- finding the sample mean,
- computing each squared deviation, and
- summing the squared deviations.
This procedure is more accurate than the alternative procedure
(frequently named the "calculator formula" because it was suitable for use of a
calculator on a small number of data points). The alternative procedure would:
- find the sum of squares of all observations, the sample
size, and the sum of all observations, and then
- compute the sum of squares of all observations minus ((sum
of all observations)^2)/sample size).
There are many other functions that have been improved for Excel
2003 by replacing this latter one-pass procedure by the two-pass procedure that
finds the sample mean on the first pass and computes the sum of squared
deviations about it on the second pass.
The following is a list of the
functions that have been improved in Excel 2003 in this way:
- VAR
- VARP
- STDEV
- STDEVP
- DVAR
- DVARP
- DSTDEV
- DSTDEVP
- FORECAST
- SLOPE
- INTERCEPT
- PEARSON
- RSQ
- STEYX
Similar improvements were made in each of the three Analysis of
Variance tools in the Analysis ToolPak.
For more information about FORECAST, click
Microsoft Excel Help on the
Help menu, type
forecast in the
Search for box in the
Assistance pane, and then click
Start searching to view the
topic.