A Linear Algebra View on Least Squares
Last time, we introduced the least squares problem and took a direct approach to solving it. Essentially, we minimized the $$y$$-distance between a model and the data by varying the model's parameters. Today, we find a completely different, but very neat, interpretation.
Suppose we have the following problem: a system of linear equations without a solution, i.e. a matrix $$\mat{A}$$ and a vector $$\vec{b}$$ such that no vector $$\vec{x}$$ satisfies

$$\mat{A}\vec{x} = \vec{b}.$$
Bummer. What does that actually mean, though? If we plug every possible value of $$\vec{x}$$ into this equation, the products $$\mat{A}\vec{x}$$ trace out all linear combinations of the column vectors of $$\mat{A}$$: the span of the columns, which is a subspace (the column space). So $$\vec{b}$$ does not lie in this subspace; otherwise we could find an $$\vec{x}$$ that solves our linear equation.
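To make this concrete, here is a small NumPy sketch. The matrix and vector are made up for illustration (they are not from this post): a $$3 \times 2$$ matrix $$\mat{A}$$ whose column space is a plane in 3D, and a $$\vec{b}$$ outside that plane, so the system has no exact solution.

```python
import numpy as np

# Made-up example: a 3x2 matrix A and a vector b that is not in A's
# column space, so A x = b has no exact solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])

# If b were in the column space, appending it as an extra column would
# not increase the rank.
print(np.linalg.matrix_rank(A))                        # 2
print(np.linalg.matrix_rank(np.column_stack([A, b])))  # 3 -> b is outside
```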
Here is a visualization in 3D, where $$\mat{A}$$ has two (linearly independent) columns:
The column space of the matrix is a plane, and $$\vec{b}$$ does not lie in that plane. A natural question now is: which vector in the column space of $$\mat{A}$$ is the best approximation to $$\vec{b}$$? But what makes an approximation good? How about the Euclidean distance between the endpoints of the vectors? If we choose this criterion, there is a simple answer: the orthogonal projection of $$\vec{b}$$ onto the column space of $$\mat{A}$$ gives the best approximation. Here is another image:
We have decomposed $$\vec{b}$$ into its best approximation in the column space of $$\mat{A}$$, which we call $$\vec{b}_\parallel$$, and a component orthogonal to it, called $$\vec{b}_\perp$$. At the end of the day we want an approximate solution to our original equation, so we are not really interested in $$\vec{b}_\parallel$$ itself, but rather in an $$\vec{x}^*$$ so that

$$\mat{A}\vec{x}^* = \vec{b}_\parallel.$$
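As a sketch of this decomposition (same made-up $$\mat{A}$$ and $$\vec{b}$$ as above), we can build an orthonormal basis of the column space with a QR factorization and split $$\vec{b}$$ accordingly; the QR route is just one convenient way to compute the projection, not something the derivation below depends on.

```python
import numpy as np

# Decompose b = b_par + b_perp with respect to the column space of A.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])

Q, _ = np.linalg.qr(A)       # orthonormal basis of the column space of A
b_par = Q @ (Q.T @ b)        # projection of b onto the column space
b_perp = b - b_par           # component orthogonal to the column space

print(b_par)
print(b_perp)
print(A.T @ b_perp)          # ~[0, 0]: b_perp is orthogonal to every column of A
```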
Let's look for an equation for this $$\vec{x}^*$$. We will start out by using that the two components of $$\vec{b}$$ are perpendicular to each other, and we will express $$\vec{b}_\perp$$ through vector addition as $$\vec{b}_\perp = \vec{b} - \vec{b}_\parallel = \vec{b} - \mat{A}\vec{x}^*$$:

$$\vec{b}_\parallel \cdot \vec{b}_\perp = 0$$

$$\left(\mat{A}\vec{x}^*\right)^T \left(\vec{b} - \mat{A}\vec{x}^*\right) = 0$$

$$\left(\vec{x}^*\right)^T \left(\mat{A}^T\vec{b} - \mat{A}^T\mat{A}\vec{x}^*\right) = 0$$
Since $$\vec{x}^*$$ can't be zero, the term in parentheses must be zero. This directly yields the least squares formula again:

$$\mat{A}^T\mat{A}\,\vec{x}^* = \mat{A}^T\vec{b} \qquad\Longleftrightarrow\qquad \vec{x}^* = \left(\mat{A}^T\mat{A}\right)^{-1}\mat{A}^T\vec{b}$$
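A quick numerical check of the formula, again with the illustrative $$\mat{A}$$ and $$\vec{b}$$ from the earlier snippets: solving the normal equations gives the same $$\vec{x}^*$$ as NumPy's built-in least squares routine, and $$\mat{A}\vec{x}^*$$ reproduces the projection $$\vec{b}_\parallel$$.

```python
import numpy as np

# Check of x* = (A^T A)^{-1} A^T b with the same made-up A and b.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])

# Solve the normal equations A^T A x* = A^T b (more stable than forming the inverse).
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# NumPy's own least squares solver should agree.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_star)                        # best approximate solution
print(np.allclose(x_star, x_lstsq))  # True
print(A @ x_star)                    # equals b_par from the previous snippet
```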
Conclusion
We did the exact same thing as in the last post, but this time we took a different approach and quite a different view of the problem. The least squares solution came up very naturally here, with a simple geometric interpretation (at least in the 3D case; the approach is valid in other dimensions as well).
The last installment of this series will take a look at least squares in statistical models.