How the least squares method is implemented. The least squares method in Excel using the TREND function

Example.

Experimental data on the values of the variables x and y are given in the table.

As a result of smoothing these data, the function g(x) = ∛(x + 1) + 1 was obtained.

Using the least squares method, approximate these data by a linear dependence y = ax + b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) fits the experimental data. Make a drawing.

The essence of the least squares method (LSM).

The task is to find the coefficients a and b of the linear dependence for which the function of two variables F(a, b) = Σᵢ (yᵢ − (a xᵢ + b))² takes its smallest value. That is, for these a and b the sum of squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

Thus, solving the example comes down to finding the extremum of a function of two variables.

Deriving formulas for finding coefficients.

A system of two equations in two unknowns is set up and solved. We find the partial derivatives of the function F(a, b) with respect to the variables a and b and equate these derivatives to zero.

We solve the resulting system of equations by any method (for example, by substitution or by Cramer's rule) and obtain the formulas for finding the coefficients by the least squares method (LSM).

For these a and b the function F(a, b) takes its smallest value. A proof of this fact is given below.

That is the whole least squares method. The formula for finding the parameter a contains the sums Σxᵢ, Σyᵢ, Σxᵢyᵢ, Σxᵢ² and the parameter n, the number of experimental data points. We recommend calculating these sums separately. The coefficient b is found after a has been calculated.

It's time to remember the original example.

Solution.

In our example n = 5. We fill in the table for convenience in calculating the sums that enter the formulas for the required coefficients.

The values in the fourth row of the table are obtained by multiplying the values of the 2nd row by the values of the 3rd row for each index i.

The values in the fifth row of the table are obtained by squaring the values of the 2nd row for each index i.

The values in the last column of the table are the sums of the values across the rows.

We use the least squares formulas to find the coefficients a and b, substituting the corresponding values from the last column of the table into them:

Hence, y = 0.165x + 2.184 is the desired approximating straight line.
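As a quick illustration (our sketch, not part of the original article), the same calculation can be done in a few lines of Python, using the data from the table given later on this page; the variable names are ours.

```python
# Minimal sketch: computing a and b by the least squares sum formulas,
# for the example data from the table given later in the article.
x = [0, 1, 2, 4, 5]
y = [2.1, 2.4, 2.6, 2.8, 3.0]

n = len(x)
sum_x = sum(x)                                   # Σ x_i    = 12
sum_y = sum(y)                                   # Σ y_i    = 12.9
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # Σ x_i·y_i = 33.8
sum_x2 = sum(xi ** 2 for xi in x)                # Σ x_i²   = 46

a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - a * sum_x) / n
print(round(a, 3), round(b, 3))                  # 0.165 2.184
```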

It remains to find out which of the lines, y = 0.165x + 2.184 or g(x) = ∛(x + 1) + 1, better approximates the original data, that is, to make the comparison using the least squares method.

Error estimation of the least squares method.

To do this, we calculate the sums of squared deviations of the original data from each of these lines, σ₁ and σ₂; the smaller value corresponds to the line that better approximates the original data in the sense of the least squares method.

Since σ₁ < σ₂, the straight line y = 0.165x + 2.184 better approximates the original data.

Graphic illustration of the least squares (LS) method.

Everything is clearly visible on the graph. The red line is the found straight line y = 0.165x + 2.184, the blue line is the smoothed function g(x) = ∛(x + 1) + 1, and the pink dots are the original data.

Why is this needed, why all these approximations?

I personally use it for data smoothing and for interpolation and extrapolation problems (in the original example one might be asked to find the value of the observed quantity y at x = 3 or at x = 6 by the least squares method). But we will talk more about this in another section of the site.

Proof.

For the function to take its smallest value at the found a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function be positive definite. Let us show this.

The method of least squares (OLS) allows you to estimate various quantities using the results of many measurements containing random errors.

Characteristics of the least squares method

The main idea of the method is that the sum of squared errors is taken as the criterion of the accuracy of the solution, and this sum is to be minimized. Both numerical and analytical approaches can be used.

In particular, as a numerical implementation, the least squares method means taking as many measurements of the unknown random variable as possible. The more measurements, the more accurate the solution. From this set of measurements (initial data) another set of candidate solutions is obtained, from which the best one is then selected. If the set of solutions is parameterized, the least squares method reduces to finding the optimal values of the parameters.

In the analytical approach, a functional is defined on the set of initial data (measurements) and on the expected set of solutions; it can be expressed by a formula obtained as a hypothesis that requires confirmation. In this case the least squares method comes down to finding the minimum of this functional on the set of squared errors of the original data.

Note that it is the squares of the errors that are summed, not the errors themselves. Why? Deviations of measurements from the exact value are often both positive and negative. When averaging, simple summation could lead to an incorrect conclusion about the quality of the estimate, since mutual cancellation of positive and negative values would waste the information contained in the sample of repeated measurements and, consequently, degrade the accuracy of the estimate.

To prevent this, the squared deviations are summed. Moreover, to make the dimension of the final estimate match that of the measured quantity, the square root of the sum of squared errors is taken.
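A tiny Python sketch (our illustration, with made-up residual values) of why squaring matters: the signed errors nearly cancel, while the root-mean-square error reflects the actual spread.

```python
import math

# Hypothetical residuals of repeated measurements (illustrative values only).
errors = [0.5, -0.4, 0.3, -0.5, 0.1]

mean_error = sum(errors) / len(errors)                             # signed errors nearly cancel
rms_error = math.sqrt(sum(e * e for e in errors) / len(errors))    # squares do not cancel

print(mean_error)   # ≈ 0   -> misleadingly suggests a perfect estimate
print(rms_error)    # ≈ 0.39 -> reflects the actual spread
```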

Some applications of LSM

LSM is widely used in various fields. For example, in probability theory and mathematical statistics the method is used to determine such a characteristic of a random variable as the standard deviation, which describes the width of the range of values of the random variable.

After smoothing the experimental data, a function of the following form is obtained: g(x) = ∛(x + 1) + 1.

We can approximate these data by the linear dependence y = ax + b by calculating the corresponding parameters. To do this we will need to apply the so-called least squares method. We will also need to make a drawing to check which line best fits the experimental data.


What exactly is OLS (least squares method)

The main thing we need to do is to find coefficients of the linear dependence for which the value of the function of two variables F(a, b) = Σᵢ₌₁ⁿ (yᵢ − (a xᵢ + b))² is smallest. In other words, for certain values of a and b the sum of squared deviations of the given data from the resulting straight line will be minimal. This is the meaning of the least squares method. All that is required to solve the example is to find the extremum of this function of two variables.

How to derive formulas for calculating coefficients

In order to derive the formulas for calculating the coefficients, we need to set up and solve a system of equations in two variables. To do this, we calculate the partial derivatives of the expression F(a, b) = Σᵢ₌₁ⁿ (yᵢ − (a xᵢ + b))² with respect to a and b and equate them to 0.

\begin{cases} \dfrac{\partial F(a, b)}{\partial a} = 0 \\ \dfrac{\partial F(a, b)}{\partial b} = 0 \end{cases}
\;\Leftrightarrow\;
\begin{cases} -2 \sum_{i=1}^{n} (y_i - (a x_i + b)) x_i = 0 \\ -2 \sum_{i=1}^{n} (y_i - (a x_i + b)) = 0 \end{cases}
\;\Leftrightarrow\;
\begin{cases} a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i \\ a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i \end{cases}

To solve a system of equations, you can use any methods, for example, substitution or Cramer's method. As a result, we should have formulas that can be used to calculate coefficients using the least squares method.

a = \dfrac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad
b = \dfrac{\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i}{n}

We have found the values of a and b at which the function F(a, b) = Σᵢ₌₁ⁿ (yᵢ − (a xᵢ + b))² takes its minimum value. In the third section we will prove why this is indeed the case.

This is the least squares method in practice. The formula used to find the parameter a includes the sums ∑xᵢ, ∑yᵢ, ∑xᵢyᵢ, ∑xᵢ², as well as the parameter n, which denotes the number of experimental data points. We advise calculating each sum separately. The coefficient b is calculated immediately after a.
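As an aside (our sketch, not part of the original text), the same coefficients can be obtained by solving the 2×2 system of normal equations directly, for example with NumPy:

```python
import numpy as np

# Sketch: solve the normal equations
#   [Σx_i²  Σx_i] [a]   [Σx_i·y_i]
#   [Σx_i    n  ] [b] = [Σy_i    ]
# for the example data from the table below.
x = np.array([0, 1, 2, 4, 5], dtype=float)
y = np.array([2.1, 2.4, 2.6, 2.8, 3.0])

A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    len(x)]])
rhs = np.array([np.sum(x * y), np.sum(y)])

a, b = np.linalg.solve(A, rhs)
print(a, b)    # ≈ 0.165, ≈ 2.184
```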

Let's go back to the original example.

Example 1

Here n equals five. To make it more convenient to calculate the required sums included in the coefficient formulas, let us fill in the table.

i         1      2      3      4      5      Σ (i = 1..5)
x_i       0      1      2      4      5      12
y_i       2.1    2.4    2.6    2.8    3      12.9
x_i·y_i   0      2.4    5.2    11.2   15     33.8
x_i²      0      1      4      16     25     46

Solution

The fourth row contains the values obtained by multiplying the values of the second row by the values of the third row for each individual i. The fifth row contains the values of the second row, squared. The last column shows the sums of the values across the rows.

Let us use the least squares method to calculate the required coefficients a and b. To do this we substitute the required values from the last column of the table and compute:

a = \dfrac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \dfrac{5 \cdot 33.8 - 12 \cdot 12.9}{5 \cdot 46 - 12^2} \approx 0.165, \qquad
b = \dfrac{\sum y_i - a \sum x_i}{n} = \dfrac{12.9 - a \cdot 12}{5} \approx 2.184

It turns out that the required approximating straight line is y = 0.165x + 2.184. Now we need to determine which line better approximates the data, g(x) = ∛(x + 1) + 1 or y = 0.165x + 2.184. Let us check using the least squares method.

To estimate the error, we find the sums of squared deviations of the data from the straight line and from the curve, σ₁ = Σᵢ (yᵢ − (a xᵢ + b))² and σ₂ = Σᵢ (yᵢ − g(xᵢ))²; the smaller value corresponds to the better-fitting line.

\sigma_1 = \sum_{i=1}^{5} \bigl(y_i - (0.165 x_i + 2.184)\bigr)^2 \approx 0.019, \qquad
\sigma_2 = \sum_{i=1}^{5} \bigl(y_i - (\sqrt[3]{x_i + 1} + 1)\bigr)^2 \approx 0.096
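These sums are easy to check numerically; here is a short sketch of ours (not from the original text) using NumPy:

```python
import numpy as np

# Sketch: compare the two candidate curves by their sums of squared deviations.
x = np.array([0, 1, 2, 4, 5], dtype=float)
y = np.array([2.1, 2.4, 2.6, 2.8, 3.0])

line = 0.165 * x + 2.184           # fitted straight line
g = np.cbrt(x + 1) + 1             # curve obtained after smoothing the data

sigma1 = np.sum((y - line) ** 2)   # ≈ 0.019
sigma2 = np.sum((y - g) ** 2)      # ≈ 0.096
print(sigma1, sigma2, sigma1 < sigma2)
```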

Answer: since σ₁ < σ₂, the straight line that best approximates the original data is
y = 0.165x + 2.184.

The least squares method is clearly seen in the graphical illustration. The red line marks the curve g(x) = ∛(x + 1) + 1, the blue line marks y = 0.165x + 2.184, and the original data are shown as pink dots.

Let us explain why exactly approximations of this type are needed.

They can be used in tasks that require data smoothing, as well as in those where data must be interpolated or extrapolated. For example, in the problem discussed above, one could find the value of the observed quantity y at x = 3 or at x = 6. We have devoted a separate article to such examples.

Proof of the OLS method

In order for the function to take a minimum value at the calculated a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function F(a, b) = Σᵢ₌₁ⁿ (yᵢ − (a xᵢ + b))² be positive definite. Let us show how this looks.

Example 2

We have a second order differential of the following form:

d^2 F(a; b) = \dfrac{\partial^2 F(a; b)}{\partial a^2}\, da^2 + 2\, \dfrac{\partial^2 F(a; b)}{\partial a\, \partial b}\, da\, db + \dfrac{\partial^2 F(a; b)}{\partial b^2}\, db^2

Solution

\dfrac{\partial^2 F(a; b)}{\partial a^2} = \dfrac{\partial}{\partial a}\left(-2 \sum_{i=1}^{n} (y_i - (a x_i + b)) x_i\right) = 2 \sum_{i=1}^{n} x_i^2

\dfrac{\partial^2 F(a; b)}{\partial a\, \partial b} = \dfrac{\partial}{\partial b}\left(-2 \sum_{i=1}^{n} (y_i - (a x_i + b)) x_i\right) = 2 \sum_{i=1}^{n} x_i

\dfrac{\partial^2 F(a; b)}{\partial b^2} = \dfrac{\partial}{\partial b}\left(-2 \sum_{i=1}^{n} (y_i - (a x_i + b))\right) = 2 \sum_{i=1}^{n} 1 = 2n

In other words, d^2 F(a; b) = 2 \sum_{i=1}^{n} x_i^2 \, da^2 + 2 \left(2 \sum_{i=1}^{n} x_i\right) da\, db + 2n\, db^2.

We obtain the matrix of the quadratic form

M = \begin{pmatrix} 2 \sum_{i=1}^{n} x_i^2 & 2 \sum_{i=1}^{n} x_i \\ 2 \sum_{i=1}^{n} x_i & 2n \end{pmatrix}.

In this case the values of the individual elements do not depend on a and b. Is this matrix positive definite? To answer this question, let us check whether its leading principal (corner) minors are positive.

First-order corner minor: 2 Σᵢ xᵢ² > 0. Since the points xᵢ do not all coincide, the inequality is strict. We will keep this in mind in what follows.

Second-order corner minor:

\det(M) = \begin{vmatrix} 2 \sum_{i=1}^{n} x_i^2 & 2 \sum_{i=1}^{n} x_i \\ 2 \sum_{i=1}^{n} x_i & 2n \end{vmatrix} = 4\left(n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)
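For the example data above, positive definiteness is easy to verify numerically; a small sketch of ours:

```python
import numpy as np

# Sketch: check the corner minors of M for the example data x = 0, 1, 2, 4, 5.
x = np.array([0, 1, 2, 4, 5], dtype=float)
n = len(x)

M = np.array([[2 * np.sum(x**2), 2 * np.sum(x)],
              [2 * np.sum(x),    2 * n]])

minor1 = M[0, 0]                  # 92 > 0
minor2 = np.linalg.det(M)         # 4 * (5*46 - 12**2) = 344 > 0
print(minor1, minor2, np.all(np.linalg.eigvalsh(M) > 0))   # positive definite
```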

Next we prove the inequality n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 > 0 by mathematical induction.

  1. Let us check that the inequality holds for the smallest case, n = 2:

2 \sum_{i=1}^{2} x_i^2 - \left(\sum_{i=1}^{2} x_i\right)^2 = 2(x_1^2 + x_2^2) - (x_1 + x_2)^2 = x_1^2 - 2 x_1 x_2 + x_2^2 = (x_1 - x_2)^2 > 0

The inequality holds (provided that the values x₁ and x₂ do not coincide).

  2. Let us assume that the inequality is true for n, i.e. n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 > 0.
  3. Now let us prove that it also holds for n + 1, i.e. that (n + 1) \sum_{i=1}^{n+1} x_i^2 - \left(\sum_{i=1}^{n+1} x_i\right)^2 > 0, given the assumption of step 2.

We calculate:

(n + 1) \sum_{i=1}^{n+1} x_i^2 - \left(\sum_{i=1}^{n+1} x_i\right)^2 = (n + 1)\left(\sum_{i=1}^{n} x_i^2 + x_{n+1}^2\right) - \left(\sum_{i=1}^{n} x_i + x_{n+1}\right)^2 =
= n \sum_{i=1}^{n} x_i^2 + n x_{n+1}^2 + \sum_{i=1}^{n} x_i^2 + x_{n+1}^2 - \left(\sum_{i=1}^{n} x_i\right)^2 - 2 x_{n+1} \sum_{i=1}^{n} x_i - x_{n+1}^2 =
= \left(n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right) + n x_{n+1}^2 - 2 x_{n+1} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i^2 =
= \left(n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right) + (x_{n+1} - x_1)^2 + (x_{n+1} - x_2)^2 + \ldots + (x_{n+1} - x_n)^2 > 0

The expression in the first parentheses is greater than 0 (by the assumption made in step 2), and the remaining terms are squares and therefore non-negative, so the whole sum is greater than 0. The inequality is proved.

Answer: the found a and b correspond to the smallest value of the function F(a, b) = Σᵢ (yᵢ − (a xᵢ + b))², which means that they are the required parameters of the least squares method (LSM).


The ordinary least squares (OLS) method is a mathematical method used to solve various problems, based on minimizing the sum of squared deviations of certain functions from the desired variables. It can be used to "solve" overdetermined systems of equations (when the number of equations exceeds the number of unknowns), to find solutions of ordinary (not overdetermined) nonlinear systems of equations, and to approximate pointwise values of some function. OLS is one of the basic methods of regression analysis for estimating unknown parameters of regression models from sample data.


History

Before the beginning of the 19th century, scientists had no definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until then, ad hoc techniques were used that depended on the type of equations and on the ingenuity of the calculators, so that different calculators, starting from the same observational data, arrived at different conclusions. Gauss (1795) was the first to apply the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace connected the method with probability theory, and the American mathematician Adrain (1808) considered its probability-theoretic applications. The method was spread and improved by the further research of Encke, Bessel, Hansen and others.

The essence of the least squares method

Let x be a set of n unknown variables (parameters), and let fᵢ(x), i = 1, …, m, with m > n, be a set of functions of these variables. The task is to choose values of x so that the values of these functions are as close as possible to certain target values yᵢ. Essentially we are talking about a "solution" of the overdetermined system of equations fᵢ(x) = yᵢ, i = 1, …, m, in the indicated sense of maximal closeness of the left- and right-hand sides of the system. The essence of the least squares method is to take as the "measure of closeness" the sum of squared deviations of the left- and right-hand sides |fᵢ(x) − yᵢ|:

\sum_i e_i^2 = \sum_i (y_i - f_i(x))^2 \rightarrow \min_x .

If the system of equations has a solution, then the minimum of the sum of squares is zero, and exact solutions of the system can be found analytically or, for example, by various numerical optimization methods. If the system is overdetermined, that is, loosely speaking, the number of independent equations is greater than the number of sought variables, then the system has no exact solution, and the least squares method allows one to find an "optimal" vector x in the sense of maximal closeness of the vectors y and f(x), or maximal closeness of the deviation vector e to zero (closeness understood in the sense of Euclidean distance).

Example - system of linear equations

In particular, the least squares method can be used to "solve" a system of linear equations

A x = b,

where A is a rectangular matrix of size m × n, m > n (i.e. the number of rows of the matrix A is greater than the number of sought variables).

In the general case such a system of equations has no solution. Therefore this system can be "solved" only in the sense of choosing a vector x that minimizes the "distance" between the vectors Ax and b. To do this, one can apply the criterion of minimizing the sum of squared differences of the left- and right-hand sides of the equations of the system, that is, (Ax − b)ᵀ(Ax − b) → min. It is easy to show that solving this minimization problem leads to solving the following system of equations:

A^T A x = A^T b \;\Rightarrow\; x = (A^T A)^{-1} A^T b .
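A minimal NumPy sketch of this (our illustration, with randomly generated data): the normal-equations solution coincides with the dedicated least squares solver.

```python
import numpy as np

# Sketch: "solving" an overdetermined system Ax = b (m > n) in the least squares sense.
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))              # m = 10 equations, n = 3 unknowns
b = rng.normal(size=10)

# Normal equations: x = (AᵀA)⁻¹ Aᵀ b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Numerically preferable equivalent based on SVD
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))     # True
```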

OLS in regression analysis (data approximation)

Suppose there are n values of some variable y (these may be results of observations, experiments, and so on) and corresponding values of variables x. The task is to approximate the relationship between y and x by some function f(x, b), known up to unknown parameters b, that is, in fact, to find the values of the parameters b for which f(x, b) comes as close as possible to the actual values y. Essentially this reduces to "solving" an overdetermined system of equations with respect to b:

f(x_t, b) = y_t, \quad t = 1, \ldots, n .

In regression analysis and in particular in econometrics, probabilistic models of dependence between variables are used

y_t = f(x_t, b) + \varepsilon_t ,

where ε_t are the so-called random errors of the model.

Accordingly, deviations of the observed values y from the model values f(x, b) are already assumed in the model itself. The essence of the (ordinary, classical) least squares method is to find parameters b for which the sum of squared deviations (errors; for regression models they are often called regression residuals) e_t is minimal:

\hat{b}_{OLS} = \arg\min_b RSS(b) ,

where RSS, the residual sum of squares, is defined as:

RSS(b) = e^T e = \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} (y_t - f(x_t, b))^2 .

In the general case this problem can be solved by numerical optimization (minimization) methods; one then speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, one must find the stationary points of the function RSS(b) by differentiating it with respect to the unknown parameters b, equating the derivatives to zero and solving the resulting system of equations:

\sum_{t=1}^{n} (y_t - f(x_t, b)) \frac{\partial f(x_t, b)}{\partial b} = 0 .
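For the numerical (nonlinear) route, here is a minimal sketch with SciPy; the exponential model f(x, b) = b₀·exp(b₁·x) and the data are purely illustrative assumptions, not from the text.

```python
import numpy as np
from scipy.optimize import least_squares

# Sketch: nonlinear least squares for an illustrative model f(x, b) = b0 * exp(b1 * x).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.1, 5.2, 8.9])     # hypothetical observations

def residuals(b):
    # e_t = y_t - f(x_t, b); least_squares minimizes the sum of their squares
    return y - b[0] * np.exp(b[1] * x)

result = least_squares(residuals, x0=[1.0, 0.5])
print(result.x)    # estimated parameters b0, b1
```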

OLS in the case of linear regression

Let the regression dependence be linear:

y_t = \sum_{j=1}^{k} b_j x_{tj} + \varepsilon_t = x_t^T b + \varepsilon_t .

Let y be the column vector of observations of the explained variable, and let X be the (n × k) matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vector of values of a given factor over all observations). The matrix form of the linear model is:

y = X b + \varepsilon .

Then the vector of estimates of the explained variable and the vector of regression residuals will be equal

\hat{y} = X b, \qquad e = y - \hat{y} = y - X b .

Accordingly, the sum of squares of the regression residuals will be equal to

RSS = e^T e = (y - X b)^T (y - X b) .

Differentiating this function with respect to the parameter vector b and equating the derivatives to zero, we obtain a system of equations (in matrix form):

(X^T X) b = X^T y .

Written out in expanded matrix form, this system of equations looks as follows:

\begin{pmatrix}
\sum x_{t1}^2 & \sum x_{t1} x_{t2} & \sum x_{t1} x_{t3} & \ldots & \sum x_{t1} x_{tk} \\
\sum x_{t2} x_{t1} & \sum x_{t2}^2 & \sum x_{t2} x_{t3} & \ldots & \sum x_{t2} x_{tk} \\
\sum x_{t3} x_{t1} & \sum x_{t3} x_{t2} & \sum x_{t3}^2 & \ldots & \sum x_{t3} x_{tk} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum x_{tk} x_{t1} & \sum x_{tk} x_{t2} & \sum x_{tk} x_{t3} & \ldots & \sum x_{tk}^2
\end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_k \end{pmatrix}
=
\begin{pmatrix} \sum x_{t1} y_t \\ \sum x_{t2} y_t \\ \sum x_{t3} y_t \\ \vdots \\ \sum x_{tk} y_t \end{pmatrix},

where all sums are taken over all admissible values of t.

If a constant is included in the model (as is usual), then x_{t1} = 1 for all t; therefore the upper-left corner of the matrix of the system contains the number of observations n, the remaining elements of the first row and first column are simply the sums of the variable values Σ x_{tj}, and the first element of the right-hand side of the system is Σ y_t.

The solution of this system of equations gives the general formula for the OLS estimates of the linear model:

\hat{b}_{OLS} = (X^T X)^{-1} X^T y = \left(\tfrac{1}{n} X^T X\right)^{-1} \tfrac{1}{n} X^T y = V_x^{-1} C_{xy} .

For analytical purposes, the last representation of this formula is useful (in the system of equations, dividing by n replaces the sums by arithmetic means). If in the regression model the data are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.

An important property of OLS estimates for models with a constant is that the constructed regression line passes through the center of gravity of the sample data, that is, the equality

\bar{y} = \hat{b}_1 + \sum_{j=2}^{k} \hat{b}_j \bar{x}_j

is satisfied.

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) equals the mean of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of the minimum sum of squared deviations from it.
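A short NumPy sketch of ours (random illustrative data) of both facts: the fitted line with a constant passes through the sample means, and regressing on a constant alone returns the mean.

```python
import numpy as np

# Sketch: OLS with a constant; the fitted line passes through the sample means.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 1.5 + 2.0 * x + rng.normal(scale=0.3, size=50)     # illustrative data

X = np.column_stack([np.ones_like(x), x])              # first column = constant
b_hat = np.linalg.solve(X.T @ X, X.T @ y)              # (XᵀX)⁻¹ Xᵀ y

# The regression passes through the center of gravity of the data:
print(np.isclose(y.mean(), b_hat[0] + b_hat[1] * x.mean()))        # True

# With only a constant as regressor, the OLS estimate is the sample mean:
only_const = np.ones((len(y), 1))
print(np.linalg.lstsq(only_const, y, rcond=None)[0][0], y.mean())  # equal values
```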

The simplest special cases

In the case of paired (simple) linear regression y_t = a + b x_t + ε_t, when the linear dependence of one variable on another is estimated, the calculation formulas simplify (one can do without matrix algebra). The system of equations has the form:

\begin{pmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix} =
\begin{pmatrix} \bar{y} \\ \overline{xy} \end{pmatrix} .

From here it is easy to find coefficient estimates:

\begin{cases}
\hat{b} = \dfrac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \dfrac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \\[6pt]
\hat{a} = \bar{y} - \hat{b}\,\bar{x}.
\end{cases}
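A quick sketch of these formulas in NumPy (our illustration), reusing the five data points from the example earlier on this page; note that here a is the intercept and b the slope:

```python
import numpy as np

# Sketch: paired regression coefficients via sample means (no matrix algebra).
x = np.array([0, 1, 2, 4, 5], dtype=float)
y = np.array([2.1, 2.4, 2.6, 2.8, 3.0])

b_hat = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean()**2)
a_hat = y.mean() - b_hat * x.mean()
print(a_hat, b_hat)    # intercept ≈ 2.184, slope ≈ 0.165
```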

Although models with a constant are generally preferable, in some cases it is known from theoretical considerations that the constant a must be equal to zero. For example, in physics the relationship between voltage and current is U = I·R; when measuring voltage and current, the resistance must be estimated. In that case we are dealing with the model y = bx, and instead of a system of equations we have the single equation

\left(\sum x_t^2\right) b = \sum x_t y_t .

Therefore, the formula for estimating the single coefficient has the form

\hat{b} = \frac{\sum_{t=1}^{n} x_t y_t}{\sum_{t=1}^{n} x_t^2} = \frac{\overline{xy}}{\overline{x^2}} .
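A minimal sketch of regression through the origin for the resistance example; the current and voltage readings below are made-up illustrative values, not from the text.

```python
import numpy as np

# Sketch: regression through the origin, e.g. estimating resistance R from U = I * R.
current = np.array([0.10, 0.21, 0.29, 0.42, 0.50])    # I, amperes (illustrative)
voltage = np.array([1.05, 2.08, 2.95, 4.22, 4.96])    # U, volts (illustrative)

R_hat = np.sum(current * voltage) / np.sum(current**2)   # b̂ = Σ x_t y_t / Σ x_t²
print(R_hat)    # ≈ 10 ohms for these made-up data
```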

The case of a polynomial model

If the data are fitted by a polynomial regression function of one variable, f(x) = b_0 + \sum_{i=1}^{k} b_i x^i, then, treating the powers x^i as separate factors for each i, one can estimate the model parameters from the general formula for the parameters of a linear model. To do this, it suffices to note that with this interpretation x_{ti} x_{tj} = x_t^i x_t^j = x_t^{i+j} and x_{tj} y_t = x_t^j y_t. Hence the matrix equations in this case take the form:

\begin{pmatrix}
n & \sum_t x_t & \ldots & \sum_t x_t^k \\
\sum_t x_t & \sum_t x_t^2 & \ldots & \sum_t x_t^{k+1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_t x_t^k & \sum_t x_t^{k+1} & \ldots & \sum_t x_t^{2k}
\end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{pmatrix}
=
\begin{pmatrix} \sum_t y_t \\ \sum_t x_t y_t \\ \vdots \\ \sum_t x_t^k y_t \end{pmatrix} .
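As a sketch (ours, with illustrative data), here is a quadratic fit obtained by treating x and x² as separate linear factors, cross-checked against NumPy's polynomial fitter:

```python
import numpy as np

# Sketch: quadratic fit (k = 2) by treating x and x² as separate linear factors.
rng = np.random.default_rng(2)
x = np.linspace(-2, 2, 30)
y = 1.0 - 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.1, size=x.size)   # illustrative data

X = np.column_stack([np.ones_like(x), x, x**2])        # design matrix [1, x, x²]
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]           # b0, b1, b2

# np.polyfit solves the same least squares problem (coefficients in reverse order)
print(b_hat, np.polyfit(x, y, deg=2)[::-1])
```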

Statistical properties of OLS estimates

First of all, we note that for linear models OLS estimates are linear estimates, as follows from the formula above. For OLS estimates to be unbiased, it is necessary and sufficient that the key condition of regression analysis hold: the mathematical expectation of the random error, conditional on the factors, must be equal to zero. This condition is satisfied, in particular, if

  1. the mathematical expectation of random errors is zero, and
  2. factors and random errors are independent random variables.

The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not satisfied, then almost any estimates can be expected to be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not make it possible to obtain qualitative estimates in this case). In the classical case a stronger assumption is made, that the factors are deterministic (as opposed to the random error), which automatically implies the exogeneity condition. In the general case, for consistency of the estimates it is sufficient that the exogeneity condition hold together with convergence of the matrix V_x to some non-singular matrix as the sample size increases to infinity.

In order for the (ordinary) least squares estimates to be, in addition to consistency and unbiasedness, also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold: constant (identical) variance of the random errors in all observations (no heteroscedasticity) and absence of correlation between the random errors of different observations (no autocorrelation).

These assumptions can be formulated jointly for the covariance matrix of the random error vector: V(ε) = σ²I.

A linear model satisfying these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian literature the Gauss-Markov theorem is more often cited). It is easy to show that the covariance matrix of the vector of coefficient estimates is equal to:

V(\hat{b}_{OLS}) = \sigma^2 (X^T X)^{-1} .

Efficiency means that this covariance matrix is "minimal" (any linear combination of the coefficients, and in particular the coefficients themselves, have minimal variance), that is, in the class of linear unbiased estimates the OLS estimates are the best. The diagonal elements of this matrix, the variances of the coefficient estimates, are important characteristics of the quality of the obtained estimates. However, the covariance matrix cannot be computed directly, because the variance of the random errors is unknown. It can be proved that an unbiased and consistent (for the classical linear model) estimate of the variance of the random errors is the quantity:

s^2 = RSS / (n - k) .

Substituting this value into the formula for the covariance matrix, we obtain an estimate of the covariance matrix. The resulting estimates are also unbiased and consistent. It is also important that the estimate of the error variance (and hence of the variances of the coefficients) and the estimates of the model parameters are independent random variables, which makes it possible to obtain test statistics for testing hypotheses about the model coefficients.
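A compact sketch (ours, with simulated data) of these formulas: estimate the coefficients, then s² = RSS/(n − k), then the estimated covariance matrix s²(XᵀX)⁻¹ and the standard errors on its diagonal.

```python
import numpy as np

# Sketch: estimating the error variance and the covariance matrix of OLS estimates.
rng = np.random.default_rng(3)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # includes a constant
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.7, size=n)                 # simulated data

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y
residuals = y - X @ b_hat

s2 = residuals @ residuals / (n - k)     # s² = RSS / (n − k)
cov_b = s2 * XtX_inv                     # estimated V(b̂) = s² (XᵀX)⁻¹
print(b_hat, np.sqrt(np.diag(cov_b)))    # estimates and their standard errors
```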

Generalized least squares

It should be noted that when the classical assumptions are not satisfied, OLS parameter estimates are not the most efficient. In that case one can minimize a generalized (weighted) sum of squared residuals eᵀWe, where W is some symmetric positive definite weight matrix. Ordinary least squares is the special case of this approach in which the weight matrix is proportional to the identity matrix. As is known, for symmetric matrices (or operators) there is a decomposition W = PᵀP. Therefore this functional can be represented as eᵀPᵀPe = (Pe)ᵀ(Pe) = e*ᵀe*, that is, it can again be represented as a sum of squares of certain transformed "residuals". Thus one can distinguish a whole class of least squares methods, the LS (Least Squares) methods.

It has been proved (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors) the most efficient estimates (in the class of linear unbiased estimates) are the so-called generalized least squares (GLS) estimates: the LS method with a weight matrix equal to the inverse covariance matrix of the random errors, W = V_ε⁻¹.

It can be shown that the formula for GLS estimates of the parameters of a linear model has the form

\hat{b}_{GLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y .

The covariance matrix of these estimates will accordingly be equal to

V(\hat{b}_{GLS}) = (X^T V^{-1} X)^{-1} .

In fact, the essence of GLS lies in a certain (linear) transformation P of the original data and the application of ordinary OLS to the transformed data. The purpose of this transformation is that, for the transformed data, the random errors already satisfy the classical assumptions.
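A sketch of ours of this equivalence with simulated data: the direct GLS formula coincides with ordinary OLS applied to data whitened by a matrix P with PᵀP = V⁻¹.

```python
import numpy as np

# Sketch: GLS as a whitening transform followed by ordinary OLS.
rng = np.random.default_rng(4)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = np.diag(rng.uniform(0.5, 3.0, size=n))           # assumed error covariance (illustrative)
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(n), V)

# Direct GLS formula: b̂ = (Xᵀ V⁻¹ X)⁻¹ Xᵀ V⁻¹ y
V_inv = np.linalg.inv(V)
b_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

# Equivalent: whiten with P such that PᵀP = V⁻¹ (here P = Lᵀ from the Cholesky factor of V⁻¹),
# then run ordinary least squares on the transformed data.
P = np.linalg.cholesky(V_inv).T
b_whitened = np.linalg.lstsq(P @ X, P @ y, rcond=None)[0]
print(np.allclose(b_gls, b_whitened))                # True
```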

Weighted OLS

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors) we have so-called weighted least squares (WLS). In this case the weighted sum of squares of the model residuals is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation: eᵀWe = Σ_{t=1}^{n} e_t² / σ_t². In effect, the data are transformed by weighting the observations (dividing by a quantity proportional to the assumed standard deviation of the random errors), and ordinary OLS is applied to the weighted data.
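A short WLS sketch (ours, with simulated heteroscedastic data): weight each observation by 1/σ_t and run ordinary least squares on the weighted data.

```python
import numpy as np

# Sketch: weighted least squares with per-observation error variances.
rng = np.random.default_rng(5)
n = 80
x = rng.normal(size=n)
sigma = rng.uniform(0.2, 2.0, size=n)            # assumed known error standard deviations
y = 0.5 + 1.7 * x + rng.normal(scale=sigma)      # heteroscedastic illustrative data

X = np.column_stack([np.ones(n), x])
w = 1.0 / sigma                                  # weight each observation by 1/σ_t
b_wls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
print(b_wls)    # ≈ [0.5, 1.7]
```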


