Excel statistical functions: GROWTH (828526)

MORE INFORMATION

The GROWTH(known_y's, known_x's, new_x's, constant) function is used to perform a Regression analysis where an exponential curve is fitted. A least squares criterion is used and GROWTH tries to find the best fit under that criterion. Known_y's represent data on the "dependent variable" and known_x's represent data on one or more "independent variables". The GROWTH Help file discusses rare cases where the second or third argument may be omitted.

Assuming that there are p predictor variables, GROWTH essentially calls LOGEST. LOGEST fits an equation of the form:

y = b * (m1^x1) * (m2^x2) * ... * (mp^xp)

Values of the coefficients, b, m1, m2, ..., mp are determined that give the best fit to the y data.

If the last argument "constant" is set to TRUE, you want the regression model to include the multiplicative coefficient b in the regression model. If set to FALSE, b is excluded by essentially setting it to 1. The last argument is optional; if the argument is omitted it is interpreted as TRUE.

For ease of exposition in the remainder of this article, assume that data is arranged in columns so that known_y's is a column of y data and known_x's is one or more columns of x data. Of course the dimensions (lengths) of each of these columns must be equal. New_x's will also be assumed to be arranged in columns and there must be the same number of columns for new_x's as for known_x's. All our observations below are equally true if the data is not arranged in columns, but it is just easier to discuss this single (most frequently used) case.

After you compute the best fit regression model (by essentially calling Excel's LOGEST function), GROWTH returns predicted values that are associated with new_x's.

This article uses examples to show how GROWTH relates to LOGEST and to point out problems with LOGEST in versions of Excel earlier than Excel 2003 that translate to problems with GROWTH. GROWTH effectively calls LOGEST, executes LOGEST, uses regression coefficients in LOGEST output in its calculation of predicted y values that are associated with each row of new_x's, and presents this column of predicted y values to you. Therefore, you must know about problems in the execution of LOGEST. When LOGEST is called, it in turn effectively calls LINEST. While code for GROWTH and LOGEST have not been rewritten for Excel 2003, extensive changes (and improvements) in LINEST code have been made.

As supplements to this article, the article about LINEST is highly recommended. It contains several examples and documents problems with LINEST in versions of Excel earlier than Excel 2003.

For additional information about LINEST, click the following article number to view the article in the Microsoft Knowledge Base:

828533 Description of the LINEST function in Excel 2003 and in Excel 2004 for Mac

The LINEST Help file revised for Excel 2003 is also recommended.

For more information about LINEST, click Microsoft Excel Help on the Help menu, type linest in the Search for box in the Assistance pane, and then click Start searching to view the topic.

The article about LOGEST explains how LOGEST interacts with LINEST; these details are omitted here.

For additional information, click the following article number to view the article in the Microsoft Knowledge Base:

828528 Excel statistical functions: LOGEST

Because the focus here is on numeric problems in versions of Excel earlier than Excel 2003, this article is light on practical examples of the use of GROWTH. The Help file in GROWTH contains useful examples.

For more information about GROWTH, click Microsoft Excel Help on the Help menu, type growth in the Search for box in the Assistance pane, and then click Start searching to view the topic.

Syntax

GROWTH(known_y's, known_x's, new_x's, constant)

The arguments, known_y's, known_x's, and new_x's must be arrays or cell ranges that have related dimensions. If known_y's is one column by m rows then known_x's is c columns by m rows where c is greater than or equal to one. C is the number of predictor variables; m is the number of data points. New_x's must then be c columns by r rows where r is greater than or equal to one. (Similar relationships in dimensions must hold if data is laid out in rows instead of columns.) Constant is a logical argument that must be set to TRUE or FALSE (or 0 or 1 that Excel interprets as FALSE or TRUE, respectively). The last three arguments to GROWTH are all optional; see the GROWTH Help file for options of omitting the second argument, third argument, or both; omitting the fourth argument is interpreted as TRUE.

The most common usage of GROWTH includes two ranges of cells that contain the data, such as GROWTH(A1:A100, B1:F100, B101:F108, TRUE). Note that because there is typically more than one predictor variable, the second argument in this example contains multiple columns. In this example, there are one hundred subjects, one dependent variable value (known_y) for each subject, and five dependent variable values (known_x's) for each subject. There are eight additional hypothetical subjects where you want to use GROWTH to compute predicted y values.

Example usage

An Excel worksheet example is provided to illustrate two key concepts:

How GROWTH interacts with LOGEST.
Problems because of collinear known_x's in versions of Excel earlier than Excel 2003 of GROWTH (or LOGEST and LINEST).

Note An extensive discussion of the second item in the context of LINEST is provided in the article about LINEST.

To illustrate the GROWTH function, create a blank Excel worksheet, copy the table below, select cell A1 in your blank Excel worksheet. On the Edit menu, click Paste so that the entries in the table below fill cells A1:K35 in your worksheet.

y:	x's:
=EXP(F2)	1	2	1		1
=EXP(F3)	3	4	1		2
=EXP(F4)	4	5	1		3
=EXP(F5)	6	7	1		4
=EXP(F6)	7	8	1		5
new x's:	9	11
	12	14

GROWTH using cols B,C:				Excel 2002 and earlier values:			Excel 2003 values:
=GROWTH(A2:A6,B2:C6,B7:C8,TRUE)				#NUM!			472.432432563203
=GROWTH(A2:A6,B2:C6,B7:C8,TRUE)				#NUM!			3400.16400895377

GROWTH using col B only
=GROWTH(A2:A6,B2:B6,B7:B8,TRUE)				472.432432563203			472.432432563203
=GROWTH(A2:A6,B2:B6,B7:B8,TRUE)				3400.16400895377			3400.16400895377

Fitted values from Excel 2003 LOGEST results
Using cols B, C		Using Col B
=EXP(LN(K24)1 + LN(J24)B7 + LN(I24)*C7)		=EXP(LN(J31)1 + LN(I31)B7)
=EXP(LN(K24)1 + LN(J24)B8 + LN(I24)*C8)		=EXP(LN(J31)1 + LN(I31)B8)

LOGEST using cols B,C:				Excel 2002 and earlier values:			Excel 2003 values:
=LOGEST(A2:A6,B2:C6,TRUE,TRUE)	=LOGEST(A2:A6,B2:C6,TRUE,TRUE)	=LOGEST(A2:A6,B2:C6,TRUE,TRUE)		#NUM!	#NUM!	#NUM!	1	1.9307233720034	1.26724101129183
=LOGEST(A2:A6,B2:C6,TRUE,TRUE)	=LOGEST(A2:A6,B2:C6,TRUE,TRUE)	=LOGEST(A2:A6,B2:C6,TRUE,TRUE)		#NUM!	#NUM!	#NUM!	0	0.043859649122807	0.206652964726136
=LOGEST(A2:A6,B2:C6,TRUE,TRUE)	=LOGEST(A2:A6,B2:C6,TRUE,TRUE)	=LOGEST(A2:A6,B2:C6,TRUE,TRUE)		#NUM!	#NUM!	#NUM!	0.986842105263158	0.209426954145848	#N/A
=LOGEST(A2:A6,B2:C6,TRUE,TRUE)	=LOGEST(A2:A6,B2:C6,TRUE,TRUE)	=LOGEST(A2:A6,B2:C6,TRUE,TRUE)		#NUM!	#NUM!	#NUM!	225	3	#N/A
=LOGEST(A2:A6,B2:C6,TRUE,TRUE)	=LOGEST(A2:A6,B2:C6,TRUE,TRUE)	=LOGEST(A2:A6,B2:C6,TRUE,TRUE)		#NUM!	#NUM!	#NUM!	9.86842105263158	0.131578947368421	#N/A

LOGEST using col B only
=LOGEST(A2:A6,B2:B6,TRUE,TRUE)	=LOGEST(A2:A6,B2:B6,TRUE,TRUE)			1.9307233720034	1.26724101129183		1.9307233720034	1.26724101129183
=LOGEST(A2:A6,B2:B6,TRUE,TRUE)	=LOGEST(A2:A6,B2:B6,TRUE,TRUE)			0.0438596491228071	0.206652964726136		0.043859649122807	0.206652964726136
=LOGEST(A2:A6,B2:B6,TRUE,TRUE)	=LOGEST(A2:A6,B2:B6,TRUE,TRUE)			0.986842105263158	0.209426954145848		0.986842105263158	0.209426954145848
=LOGEST(A2:A6,B2:B6,TRUE,TRUE)	=LOGEST(A2:A6,B2:B6,TRUE,TRUE)			224.999999999999	3		225	3
=LOGEST(A2:A6,B2:B6,TRUE,TRUE)	=LOGEST(A2:A6,B2:B6,TRUE,TRUE)			9.86842105263158	0.131578947368421		9.86842105263158	0.131578947368421

Note After you paste this table in your new Excel worksheet, click the Paste Options button, and then click Match Destination Formatting. With the pasted range still selected, point to Column on the Format menu, and then click AutoFit Selection.

Data for GROWTH are in cells A1:C8. (Entries in cells D2:D6 are not part of the data, but are used for illustration below.) Results of GROWTH for two different models for both versions of Excel earlier than Excel 2003 and for the Excel 2003 version are presented in cells E10:E16 and I10:116 respectively. Results in cells A10:A16 will correspond to the version of Excel that you are using. For now, focus on the Excel 2003 results in investigating how GROWTH calls LOGEST and how it uses LOGEST results.

GROWTH and LOGEST can be viewed as interacting in the following steps:

You call GROWTH(known_y's, known_x's, new_x's, constant);
GROWTH calls LOGEST(known_y's, known_x's, constant, TRUE);
Regression coefficients from this call to LOGEST are obtained; these coefficients appear in the first row of LOGEST's output table.
For each new_x's row, the predicted y-value is calculated based on these LOGEST coefficients and the new_x's values in that row.
The calculated value in step 4 is returned in the appropriate cell for GROWTH output corresponding to that new_x's row.

If GROWTH is to return appropriate results, then LOGEST had better generate appropriate results in step 3. Because the evaluation of LOGEST in step 3 requires a call to LINEST, it is essential that LINEST be well-behaved. Problems in versions of Excel earlier than Excel 2003 of LINEST come from collinear predictor columns. (There are other problems with LINEST and LOGEST in the earlier versions that occur when the last argument to GROWTH is set to FALSE, but those problems do not affect the results of GROWTH and are not discussed here.)

Predictor columns (known_x's) are collinear if at least one column, c, can be expressed as a sum of multiples of others, c1, c2, and other columns. Column c is frequently called redundant because the information that it contains can be constructed from the columns c1, c2, and other columns. The fundamental principle in the existence of collinearity is that results should be unaffected by whether a redundant column is included in the original data or removed from the original data. Because versions of Excel earlier than Excel 2003 of LINEST did not look for collinearity, this principle was easily violated. Predictor columns are almost collinear if at least one column, c, can be expressed as almost equal to a sum of multiples of others, c1, c2, and other columns. In this case "almost equal" means a very small sum of squared deviations of entries in c from corresponding entries in the weighted sum of c1, c2, and other columns; "very small" might be less than 10^(-12) for example.

The first model, in rows 10 to 12, uses columns B and C as predictors and requests Excel to model the constant (last argument set to TRUE). Excel then effectively inserts an additional predictor column that looks just like cells D2:D6. It is easy to notice that entries in column C in rows 2 to 6 are exactly equal to the sum of corresponding entries in columns B and D. Therefore, there is collinearity present because column C is a sum of multiples of

column B, and
Excel's additional column of 1's inserted because the third argument to LOGEST was omitted or TRUE (the "normal" case).

This causes such numeric problems that versions of Excel earlier than Excel 2003 cannot compute results, and the GROWTH output table is filled with #NUM!.

The second model, in rows 14 to 16, is one that any version of Excel can handle successfully. There is no collinearity and the user again requests Excel to model the constant. This model is included here for the following two reasons:

First, it is perhaps most typical of practical cases: that there is no collinearity present. These cases are handled sufficiently in all versions of Excel. Hopefully, it is reassuring that, if you have an earlier version, numeric problems are not likely to occur in the most common practical case.
Second, this example is used to compare behavior of Excel 2003 in the two models. Most major statistical packages analyze collinearity, remove a column that is a sum of multiples of others from the model, and alert the user with a message like "column C is linearly dependent on other predictor columns and has been removed from the analysis."

In Excel 2003, such a message is conveyed not in an alert or a text string, but in the LOGEST output table. GROWTH has no mechanism for delivering such a message to the user. In the LOGEST output table, a regression coefficient that is one, and whose standard error is zero, corresponds to a coefficient for a column that has been removed from the model. LOGEST output tables are included in rows 23 to 35 corresponding to the GROWTH output in rows 10 to 16. The entries in cells I24:I25 show an eliminated redundant predictor column. In this case, LOGEST chose to remove column C (coefficients in cells I24, J24, K24 correspond to columns C, B, and Excel's constant column, respectively). When there is collinearity present, any one of the columns involved can be removed and the choice is basically arbitrary.

In the second model in rows 30 to 35, there is no collinearity and no column removed. You can see that the predicted y values are the same in both models. This issue occurs because removing a redundant column that is a sum of multiples of others does not reduce the goodness of fit of the resulting model. Such columns are removed precisely because they represent no value added in trying to find the best least squares fit. Also, if you examine Excel 2003 LOGEST output in cells I23:K35, you will notice that the last three rows of the output tables are the same and that the entries in cells I31:J32 and cells J24:K25 coincide. This demonstrates that the same results are obtained when column C is included in the model, but found to be redundant (output in cells I24:K28) as when column C was eliminated before LOGEST was run (output in cells I31:J35). This satisfies the fundamental principle in the existence of collinearity.

In cells A18:C21, Microsoft uses Excel 2003 data to illustrate how GROWTH takes LOGEST output and computes the relevant predicted y-values. By examining the formulas in cells A20:A21 and cells C20:C21, you can see how LOGEST coefficients are combined with new_x's data in cells B7:C8 for each of the two models (using columns B, C as predictors; using only column B as a predictor).

Collinearity is identified in LOGEST in Excel 2003 because LOGEST calls LINEST and LINEST uses a completely different approach to solving for the regression coefficients, QR Decomposition. The LINEST article contains a walkthrough of the QR Decomposition algorithm for a small example.

Summary of results in earlier versions of Excel

GROWTH results are adversely affected in versions of Excel earlier than Excel 2003 by inaccurate results in LOGEST that, in turn, stem from inaccurate results in LINEST.

LINEST was calculated using an approach that paid no attention to collinearity issues. The existence of collinearity caused round off errors, inappropriate standard errors of regression coefficients, and inappropriate degrees of freedom. Sometimes round off problems are sufficiently severe that LINEST filled its output table with #NUM!. If, as in the great majority of cases in practice, you can be confident that there were not collinear (or almost collinear) predictor columns, then LINEST would generally provide acceptable results. Therefore, users of GROWTH can be similarly reassured if they can see the absence of collinear (or almost collinear) predictor columns.

Summary of results in Excel 2003

Improvements in LINEST include switching to the QR Decomposition method of determining regression coefficients. QR Decomposition has two advantages:

Better numeric stability (generally smaller round off errors).

-and-
Analysis of collinearity issues.

All problems with versions of Excel earlier than Excel 2003 illustrated in this article have been corrected for Excel 2003. These improvements in LINEST translate to improvements in LOGEST and GROWTH.

Conclusions

GROWTH's performance has been improved because LINEST has been greatly improved for Excel 2003. Improvements in LINEST also affect LOGEST that is essentially called by GROWTH. Users of earlier versions of Excel should verify that predictor columns are not collinear before using GROWTH. While much of the material presented in this article, and in the LINEST article, might at first appear alarming to users of versions of Excel earlier than Excel 2003, it should be noted that collinearity is a problem in a small percentage of cases. Earlier versions of Excel give acceptable GROWTH results when there is no collinearity.

Fortunately, improvements in LINEST also affect the Analysis ToolPak's linear regression tool (that calls LINEST) and two other related Excel functions: LOGEST and TREND.