A Linear Algebra View on Least Squares
Last time, we introduced the least squares problem and took a direct approach to solving it: we simply minimized the y-distance between the model and the data by varying the parameters of the model. Today, we find a completely different, but very neat, interpretation.
Suppose we have the following problem: there is a set of linear equations without a solution, i.e. a matrix \mat{A} and a vector \vec{b} such that no vector \vec{x} exists that would solve the equation

\mat{A} \vec{x} = \vec{b}
Bummer. What does that actually mean, though? If we plug every possible value of \vec{x} into this equation, we get essentially all linear combinations of the column vectors of \mat{A}. That is the span of those column vectors. That is a subspace. So \vec{b} is not in our subspace; otherwise we would find an \vec{x} that solves our linear equation.
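To make this concrete, here is a minimal NumPy sketch with made-up values for \mat{A} and \vec{b} (purely hypothetical numbers, not from any data set): appending \vec{b} as an extra column raises the rank of the matrix, so \vec{b} is not in the column space and no exact solution exists.

```python
import numpy as np

# A hypothetical 3x2 matrix: its two columns span a plane in 3D.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
# A hypothetical right-hand side that sticks out of that plane.
b = np.array([1.0, 2.0, 3.0])

# If b were a linear combination of A's columns, appending it would not raise the rank.
print(np.linalg.matrix_rank(A))                        # 2
print(np.linalg.matrix_rank(np.column_stack([A, b])))  # 3 -> A x = b has no solution
```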
Here is a visualization in 3D for the case where \mat{A} has two columns:

The column space of the matrix is a plane, and \vec{b} is not in that plane. A natural question now is: which vector in the column space of \mat{A} is the best approximation to \vec{b}? But what makes an approximation good? Well, how about the Euclidean distance between the endpoints of the vectors. If we choose this criterion, there is a simple solution: the projection of \vec{b} onto the column space of \mat{A} gives the best approximation. Here is another image:

We have decomposed \vec{b} into its best approximation within the column space of \mat{A}, which we call \vec{b}_\parallel, and a component orthogonal to it, called \vec{b}_\perp. At the end of the day, we are interested in a best approximate solution to our original equation, so we are not really interested in finding \vec{b}_\parallel itself, but rather an \vec{x}^* so that

\mat{A} \vec{x}^* = \vec{b}_\parallel
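Continuing the toy example from above (still a sketch with the same hypothetical \mat{A} and \vec{b}), the standard projection formula \vec{b}_\parallel = \mat{A} (\mat{A}^T \mat{A})^{-1} \mat{A}^T \vec{b} produces exactly this decomposition, and the remainder is orthogonal to every column of \mat{A}:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

# Orthogonal projection onto the column space of A (A has full column rank here).
P = A @ np.linalg.inv(A.T @ A) @ A.T
b_par = P @ b          # component of b inside the column space
b_perp = b - b_par     # component of b orthogonal to the column space

print(b_par)         # [1. 2. 0.]
print(A.T @ b_perp)  # [0. 0.] -> b_perp is perpendicular to every column of A
```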
Let's look for an equation for this \vec{x}^*. We will start out by using the fact that the two components of \vec{b} are perpendicular to each other, and we will express \vec{b}_\perp through vector addition as \vec{b}_\perp = \vec{b} - \vec{b}_\parallel = \vec{b} - \mat{A} \vec{x}^*:

\vec{b}_\parallel \cdot \vec{b}_\perp = 0
(\mat{A} \vec{x}^*)^T (\vec{b} - \mat{A} \vec{x}^*) = 0
\vec{x}^{*T} \left( \mat{A}^T \vec{b} - \mat{A}^T \mat{A} \vec{x}^* \right) = 0
Since \vec{x}^* can't be zero, the term in parentheses must be. This directly yields the least squares formula again:

\mat{A}^T \mat{A} \vec{x}^* = \mat{A}^T \vec{b} \quad \Rightarrow \quad \vec{x}^* = \left( \mat{A}^T \mat{A} \right)^{-1} \mat{A}^T \vec{b}
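As a quick check, the formula agrees with NumPy's built-in least squares solver on the toy example from above (again only a sketch; for real problems np.linalg.lstsq or a QR factorization is numerically preferable to explicitly inverting \mat{A}^T \mat{A}):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

# x* = (A^T A)^{-1} A^T b, the formula derived above.
x_star = np.linalg.inv(A.T @ A) @ A.T @ b

# NumPy's least squares routine as a reference.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_star)                      # [1. 2.]
print(np.allclose(x_star, x_ref))  # True
```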
Conclusion
We did the exact same thing as in the last post, but this time we took another approach and quite a different world view on the problem. The least squares solution came up very naturally here, with a very simple geometric interpretation (at least in the 3D case; the approach is also valid in higher dimensions, though).
The last iteration of this series will take a look at least squares in statistical models.