LINEST
The LINEST function returns an array of the statistics for a straight line that best fits the given data using the least squares method.
LINEST(known-y-values, known-x-values, nonzero-y-intercept, more-stats)
known-y-values: A collection containing the known y values. known-y-values must contain either number values or date/time values. If there is only one collection of known x values, known-y-values can be any size. If there is more than one collection of known x values, known-y-values can be either one column containing the values or one row containing the values, but not both.
known-x-values: An optional collection containing the known x values. known-x-values must contain either number values or date/time values. If omitted, it will be assumed to be a set of the same size as known-y-values beginning with 1—for example, 1, 2, 3 if there are three known-y-values. If there is only one set of known x values, known-x-values, if specified, should be the same size as known-y-values. If there is more than one set of known x values, each row/column of known-x-values is considered to be one set and the size of each row/column must be the same as the size of the row/column of known-y-values.
nonzero-y-intercept: An optional modal value specifying how the y intercept (constant b) should be calculated.
normal (1, TRUE, or omitted): The value of the y intercept (constant b) should be calculated normally.
force 0 value (0, FALSE): The value of the y intercept (constant b) should be forced to be 0.
more-stats: An optional modal value specifying whether additional statistical information should be returned.
no additional stats (0, FALSE, or omitted): Do not return additional regression statistics in the returned array.
additional stats (1, TRUE): Return additional regression statistics in the returned array.
Notes
The values returned by the function are contained in an array. One method of reading the values in the array is to use the INDEX function. You can wrap the LINEST function within the INDEX function: =INDEX(LINEST(known-y-values, known-x-values, y-intercept, more-stats), y, x) where y and x are the column and row index of the desired value.
If additional statistics are not returned (more-stats is FALSE), the array returned is one row deep. The number of columns is equal to the number of sets of known-x-values plus 1. It contains the line slopes (one value for each row/column of x values) in reverse order (the first value relates to the last row/column of x values) and then the value for b, the intercept.
If additional statistics are returned (more-stats is TRUE), the array contains five rows. See additional information about this array immediately following the examples.
Examples |
---|
Given the following table of known-y-values (cells A2:A6) and known-x-values (cells B2:B6): |
A | B | |
---|---|---|
1 | Y | X |
2 | 0 | -1 |
3 | 8 | 10 |
4 | 9 | 12 |
5 | 4 | 5 |
6 | 1 | 3 |
=INDEX(LINEST(A2:A6, B2:B6, 1, 0), 1) returns approximately 0.752707581227437, given a normal (1) value for non-zero-y-intercept. This is the best-fit line slope because we specified we wanted the first value from the array returned by INDEX and we only specified one set of known-x-values. =INDEX(LINEST(A2:A6, B2:B6, 1, 0), 2) returns approximately 0.0342960288808646, which is b, the intercept for the best-fit line. The intercept was returned because we specified we wanted the second value from the array returned by INDEX, which would be the second value because we only specified one set of known-x-values. |
Contents of the array of additional statistics
LINEST can include additional statistical information in the array returned by the function. For purposes of the following discussion, suppose that there are five sets of known x values, in addition to the known y values. Suppose further that the known-x-values are in five table rows or five table columns. Based on this, the array returned by LINEST would contain the following values.
1 | 2 | 3 | 4 | 5 | 6 | |
---|---|---|---|---|---|---|
1 | S5 | S4 | S3 | S2 | S1 | b |
2 | SE5 | SE4 | SE3 | SE2 | SE1 | SEb |
3 | C | SEy | ||||
4 | F | DF | ||||
5 | R1 | R2 |
Row 1, column 1 contains S5 (the slope for the fifth set of known-x-values) continuing through column 5, which would contain S1 (the slope for the first set of known-x-values). Note that the slope related to each of the sets of known-x-values are returned in reverse order.
The last cell in row 1 contains b, the y intercept for the known x values. In our example, this would be row 1 column 6.
Row 2, column 1 contains SE5 (the standard error for the coefficient associated with the fifth set of known-x-values) continuing through column 5, which would contain SE1 (the standard error coefficient for the first set of known-x-values). These values are returned in reverse order; that is, if there are five known x value sets, the value for the fifth set is returned first in the array. This is the same way the slope values are returned.
The last cell in row 2 contains SEb, the standard error associated with the y-intercept value (b). In our example, this would be row 2 column 6.
Row 3, column 1 contains C, the coefficient of determination. This statistic compares estimated and actual y values. If it is 1, there is no difference between the estimated y value and the actual y value. This is known as perfect correlation. If the coefficient of determination is 0, there is no correlation and the given regression equation is not helpful in predicting a y value.
Row 3, column 2 contains SEy, the standard error associated with the y value estimate.
Row 4, column 1 contains F, the F observed value. The F observed value can be used to help determine whether the observed relationship between the dependent and independent variables occurs by chance.
Row 4, column 2 contains DF, the degrees of freedom. Use the degrees of freedom statistic to help determine a confidence level.
Row 5, column 1 contains R1, the regression sum of squares.
Row 5, column 2 contains R2, the residual sum of squares.
Here are some things to keep in mind about the array of additional statistics:
It does not matter whether the known x values and known y values are in rows or columns. In either case, the returned array is ordered by rows as illustrated in the table.
The example assumed five sets of known x values. If there were more or less than five, the number of columns in the returned array would change accordingly (it is always equal to the number of sets of known x values plus 1), but the number of rows would remain constant.
If additional statistics are not specified in the arguments to LINEST, the returned array is equal to the first row only.