MORE INFORMATION
The TREND(known_y's, known_x's, new_x's, constant) function
is used to perform Linear Regression. A least squares criterion is used and
TREND tries to find the best fit under that criterion. Known_y's represent data
on the "dependent variable" and known_x's represent data on one or more
"independent variables". The TREND Help file discusses rare cases where the
second or third argument may be omitted.
For more information about TREND, click
Microsoft Excel Help on the
Help menu, type
trend in the
Search for box in the
Assistance pane, and then click
Start searching to view the
topic.
If the last argument "constant" is set to TRUE,
you want the regression model to include a coefficient for the intercept in the
regression model. If the last argument is set to FALSE, no intercept term is
included; the fitted regression is forced to go through the origin. The last
argument is optional; if omitted it is interpreted as TRUE.
For ease
of exposition in the rest of this article, assume that data is arranged in
columns so that known_y's is a column of y data and known_x's is one or more
columns of x data. Of course the dimensions (lengths) of each of these columns
must be equal. New_x's will also be assumed to be arranged in columns and there
must be the same number of columns for new_x's as for known_x's. All the
observations in this article are equally true if the data is not arranged in columns, but
it is just easier to discuss this single (most frequently used)
case.
After computing the best fit regression model (by essentially
calling Excel's LINEST function), TREND returns predicted values associated
with new_x's.
This article uses examples to show how TREND relates to
LINEST and to point out problems with LINEST in Excel 2002 and earlier that
translate to problems with TREND. While the code for TREND has not been
rewritten for Excel 2003, extensive changes (and improvements) in LINEST code
have been made. TREND effectively calls LINEST, executes LINEST, uses
regression coefficients in LINEST output in its calculation of predicted y
values associated with each row of new_x's, and presents this column of
predicted y values to you. Therefore, you must know about problems in the
execution of LINEST.
As supplements to this article, the article on
LINEST is highly recommended. It contains several examples and documents
problems with LINEST in Excel 2002 and earlier.
For additional information, click the following article number to view the article in the Microsoft Knowledge Base:
828533
Description of the LINEST function in Excel 2003 and in Excel 2004 for Mac
The LINEST Help file revised for Excel
2003 is also recommended. For more
information about LINEST, click
Microsoft Excel Help on the
Help menu, type
linest in the
Search for box in the Assistance pane, and then click
Start searching to view the topic.
Because the focus here is on numeric problems
in Excel 2002 and earlier, this article is light on practical examples for how to use TREND. TREND's Help file contains useful examples.
Syntax
TREND(known_y's, known_x's, new_x's, constant)
The arguments, known_y's, known_x's, and new_x's must be arrays
or cell ranges with related dimensions. If known_y's is one column by m rows,
then known_x's are c columns by m rows where c is greater than or equal to one.
Note that c is the number of predictor variables; m is the number of data
points. New_x's must then be c columns by r rows where r is greater than or
equal to one. (Similar relationships in dimensions must hold if data is laid
out in rows instead of columns.) Constant is a logical argument that must be
set to TRUE or FALSE (or 0 or 1 that Excel interprets as FALSE or TRUE,
respectively). The last three arguments to TREND are all optional; see the
TREND Help file for options of omitting the second argument, the third
argument, or both. Omitting the fourth argument is interpreted as
TRUE.
The most common usage of TREND includes two ranges of cells that
contain the data, such as TREND(A1:A100, B1:F100, B101:F108, TRUE). Note that
because there is typically more than one predictor variable, the second
argument in this example contains multiple columns. In this example, there are
one hundred subjects, one dependent variable value (known_y) for each subject,
and five dependent variable values (known_x's) for each subject. There are
eight additional hypothetical subjects where you want to use TREND to compute
predicted y values.
Example Usage
An Excel worksheet example is provided to illustrate two key
concepts:
- How TREND interacts with LINEST.
- Problems because of collinear known_x's in Excel 2002 and
earlier versions of TREND (or LINEST).
Extensive discussion of the second bullet point in the context
of LINEST is provided in the article on LINEST.
To illustrate TREND
collinearity, create a blank Excel worksheet, copy the following table, select cell
A1 in your blank Excel worksheet. On the
Edit menu, click
Paste so that the entries in the following table fill cells
A1:K35 in your worksheet.
y: | x's: | | | | | | | | | |
1 | 1 | 2 | 1 | | | | | | | |
2 | 3 | 4 | 1 | | | | | | | |
3 | 4 | 5 | 1 | | | | | | | |
4 | 6 | 7 | 1 | | | | | | | |
5 | 7 | 8 | 1 | | | | | | | |
new
x's: | 9 | 11 | | | | | | | | |
| 12 | 14 | | | | | | | | |
| | | | | | | | | | |
TREND using cols
B,C: | | | | pre-Excel 2003
values: | | | | Excel 2003
values: | | |
=TREND(A2:A6,B2:C6,B7:C8,TRUE) | | | | #NUM! | | | | 6.15789473684211 | | |
=TREND(A2:A6,B2:C6,B7:C8,TRUE) | | | | #NUM! | | | | 8.13157894736842 | | |
| | | | | | | | | | |
TREND using col B
only | | | | | | | | | | |
=TREND(A2:A6,B2:B6,B7:B8,TRUE) | | | | 6.1578947368421 | | | | 6.15789473684211 | | |
=TREND(A2:A6,B2:B6,B7:B8,TRUE) | | | | 8.13157894736842 | | | | 8.13157894736842 | | |
| | | | | | | | | | |
Fitted values from Excel 2003 LINEST
results | | | | | | | | | | |
Using cols B, C | | Using Col
B | | | | | | | | |
= K24*1 + J24*B7 +
I24*C7 | | =J31*1+I31*B7 | | | | | | | | |
=K24*1 + J24*B8 + I24*C8 | | =J31*1
+I31*B8 | | | | | | | | |
| | | | | | | | | | |
LINEST using cols
B,C: | | | | pre-Excel 2003
values: | | | | Excel 2003
values: | | |
=LINEST(A2:A6,B2:C6,TRUE,TRUE) | =LINEST(A2:A6,B2:C6,TRUE,TRUE) | =LINEST(A2:A6,B2:C6,TRUE,TRUE) | | #NUM! | #NUM! | #NUM! | | 0 | 0.657894736842105 | 0.236842105263158 |
=LINEST(A2:A6,B2:C6,TRUE,TRUE) | =LINEST(A2:A6,B2:C6,TRUE,TRUE) | =LINEST(A2:A6,B2:C6,TRUE,TRUE) | | #NUM! | #NUM! | #NUM! | | 0 | 0.043859649122807 | 0.206652964726136 |
=LINEST(A2:A6,B2:C6,TRUE,TRUE) | =LINEST(A2:A6,B2:C6,TRUE,TRUE) | =LINEST(A2:A6,B2:C6,TRUE,TRUE) | | #NUM! | #NUM! | #NUM! | | 0.986842105263158 | 0.209426954145848 | #N/A |
=LINEST(A2:A6,B2:C6,TRUE,TRUE) | =LINEST(A2:A6,B2:C6,TRUE,TRUE) | =LINEST(A2:A6,B2:C6,TRUE,TRUE) | | #NUM! | #NUM! | #NUM! | | 225 | 3 | #N/A |
=LINEST(A2:A6,B2:C6,TRUE,TRUE) | =LINEST(A2:A6,B2:C6,TRUE,TRUE) | =LINEST(A2:A6,B2:C6,TRUE,TRUE) | | #NUM! | #NUM! | #NUM! | | 9.86842105263158 | 0.131578947368421 | #N/A |
| | | | | | | | | | |
LINEST using col B
only | | | | | | | | | | |
=LINEST(A2:A6,B2:B6,TRUE,TRUE) | =LINEST(A2:A6,B2:B6,TRUE,TRUE) | | | 0.657894736842105 | 0.236842105263159 | | | 0.657894736842105 | 0.236842105263158 | |
=LINEST(A2:A6,B2:B6,TRUE,TRUE) | =LINEST(A2:A6,B2:B6,TRUE,TRUE) | | | 0.0438596491228071 | 0.206652964726136 | | | 0.043859649122807 | 0.206652964726136 | |
=LINEST(A2:A6,B2:B6,TRUE,TRUE) | =LINEST(A2:A6,B2:B6,TRUE,TRUE) | | | 0.986842105263158 | 0.209426954145848 | | | 0.986842105263158 | 0.209426954145848 | |
=LINEST(A2:A6,B2:B6,TRUE,TRUE) | =LINEST(A2:A6,B2:B6,TRUE,TRUE) | | | 224.999999999999 | 3 | | | 225 | 3 | |
=LINEST(A2:A6,B2:B6,TRUE,TRUE) | =LINEST(A2:A6,B2:B6,TRUE,TRUE) | | | 9.86842105263158 | 0.131578947368421 | | | 9.86842105263158 | 0.131578947368421 | |
Note After you paste this table into your new Excel worksheet, click
Paste Options, and then click
Match
Destination Formatting. With the pasted range still selected, point to
Column on the
Format menu, and then click
AutoFit Selection.
Data for TREND are in cells
A1:C8. (Entries in cells D2:D6 are not part of the data, but are used for
illustration later in this article.) Results of TREND for two different models for both Excel
2002 (and earlier versions) and Excel 2003 are presented in cells E10:E16 and
cells I10:116 respectively. Results in cells A10:A16 will correspond to the
version of Excel that you are using. For now, this article focuses on the Excel
2003 results in investigating how TREND calls LINEST and how it uses LINEST
results.
TREND and LINEST can be viewed as interacting as follows:
- You call TREND(known_y's, known_x's, new_x's,
constant).
- TREND calls LINEST(known_y's, known_x's, constant,
TRUE).
- Regression coefficients from this call to LINEST are
obtained; these coefficients appear in the first row of LINEST's output
table.
- For each new_x's row, the predicted y-value is calculated
based on these LINEST coefficients and the new_x's values in that
row.
- The calculated value in step 4 is returned in the
appropriate cell for TREND output corresponding to that new_x's row.
If TREND is to return appropriate results, then LINEST had
better generate appropriate results in step 3. Problems here come from
collinear predictor columns.
Predictor columns (known_x's) are
collinear if at least one column, c, can be expressed as a sum of multiples of
others, c1, c2, and other columns. Column c is frequently named redundant
because the information that it contains can be constructed from the columns
c1, c2, and other columns. The fundamental principle in the presence of
collinearity is that results should be unaffected by including or removing a
redundant column from the original data. Because Excel 2002 and earlier
versions of LINEST did not look for collinearity, this principle was easily
violated. Predictor columns are almost collinear if at least one column, c, can
be expressed as almost equal to a sum of multiples of others, c1, c2, and other
columns. In this case "almost equal" means a very small sum of squared
deviations of entries in c from corresponding entries in the weighted sum of
c1, c2, and other columns; "very small" might be less than 10^(-12) for
example.
The first model, in rows 10 to 12, uses columns B and C as
predictors and requests Excel to model the constant (last argument set to
TRUE). Excel then effectively inserts an additional predictor column that looks
just like cells D2:D6. It is easy to notice that entries in column C in rows 2
to 6 are precisely equal to the sum of corresponding entries in columns B and D.
Therefore, there is collinearity present because column C is a sum of multiples
of:
- Column B
-and- - Excel's additional column of 1's inserted because the third
argument to LINEST (same as fourth argument to TREND) was omitted or TRUE (the
"normal" case).
This causes such numeric problems that Excel 2002 and earlier
versions cannot compute results and the TREND output table is filled with
#NUM!.
The second model, in rows 14 to 16, is one that any version of
Excel can handle successfully. There is no collinearity and you can again
request Excel to model the constant. This model is included here for two
reasons.
First, it is perhaps most typical of practical cases: no
collinearity present. These cases are handled well in all versions of Excel.
Hopefully, it is reassuring that, if you have an earlier version, numeric
problems are not likely to occur in the most common practical
case.
Second, this example is used to compare behavior of Excel 2003
in the two models. Most major statistical packages analyze collinearity, remove
a column that is a sum of multiples of others from the model, and alert you
with a message like "column C is linearly dependent on other predictor columns
and has been removed from the analysis."
In Excel 2003, such a message
is conveyed not in an alert or a text string, but in the LINEST output table.
TREND has no mechanism for delivering such a message to you. In the LINEST
output table, a regression coefficient that is zero and whose standard error is
zero corresponds to a coefficient for a column that has been removed from the
model. LINEST output tables are included in rows 23 to 35 corresponding to the
TREND output in rows 10 to 16. The entries in cells I24:I25 show an eliminated
redundant predictor column. In this case, LINEST chose to remove column C
(coefficients in cells I24, J24, K24 correspond to columns C, B, and Excel's
constant column, respectively). When there is collinearity present, any one of
the columns involved can be removed and the choice is basically
arbitrary.
In the second model in rows 30 to 35, there is no
collinearity and no column removed. You can see that the predicted y values are
the same in both models. This issue occurs because removing a redundant column
that is a sum of multiples of others does not reduce the goodness of fit of the
resulting model. Such columns are removed precisely because they represent no
value added in trying to find the best least squares fit. Also, if you examine
Excel 2003 LINEST output in cells I23:K35, you will notice that the last three
rows of the output tables are the same and that the entries in cells I31:J32
and cells J24:K25 coincide. This demonstrates that the same results are
obtained when column C is included in the model but found to be redundant
(output in cells I24:K28) as when column C was eliminated before LINEST was run
(output in cells I31:J35). This satisfies the fundamental principle in the
presence of collinearity.
In cells A18:C21, this article uses Excel
2003 data to illustrate how TREND takes LINEST output and computes the relevant
predicted y-values. By examining the formulas in cells A20:A21 and cells
C20:C21, you can see how LINEST coefficients are combined with new_x's data in
cells B7:C8 for each of the two models (using columns B, C as predictors; using
only column B as a predictor).
Collinearity is identified in LINEST in
Excel 2003 because of a completely different approach, QR Decomposition, to
solving for the regression coefficients. The LINEST article describes a
walkthrough of the QR Decomposition algorithm for a small example.
Summary of Results in Earlier Versions of Excel
TREND results are adversely affected in Excel 2002 and earlier
versions by inaccurate results in LINEST.
LINEST was calculated by
using an approach that paid no attention to collinearity issues. The existence
of collinearity caused round off errors, inappropriate standard errors of
regression coefficients, and inappropriate degrees of freedom. Sometimes round
off problems were sufficiently severe that LINEST filled its output table with
#NUM!. If, as in the great majority of cases in practice, you can be confident
that there are not collinear (or almost collinear) predictor columns, then
LINEST generally provides acceptable results. Therefore, if you use TREND, you
can be similarly reassured if you are confident that there are not collinear
(or almost collinear) predictor columns.
Summary of Results in Excel 2003
Improvements in LINEST include switching to the QR Decomposition
method of determining regression coefficients. QR Decomposition has two
advantages:
- Better numeric stability (generally smaller round off
errors).
-and- - Analysis of collinearity issues.
All problems with Excel 2002 and earlier versions illustrated
in this article have been corrected for Excel 2003.
Conclusions
TREND's performance has been improved because LINEST has been
greatly improved for Excel 2003. If you use an earlier version of Excel, verify
that predictor columns are not collinear before using TREND. While much of the
material presented in this article, and in the LINEST article, might at first
appear alarming to Excel 2002 and earlier users, note that collinearity is a
problem in a small percentage of cases. Earlier versions of Excel give
acceptable TREND results when there is no collinearity.
Fortunately,
improvements in LINEST also affect the Analysis ToolPak's linear regression
tool (this calls LINEST) and two other related Excel functions: LOGEST and
GROWTH.