What Least Squares Assumes About Your Data
In my college linear algebra course I learned about least squares regression, where the choice of squared error wasn't motivated much beyond "we square the error because it makes errors non-negative, heavily penalizes outliers, and is easy to compute".
Recently, I learned about maximum likelihood estimation, where we pick the parameters of a model that maximize the likelihood of the observed data. For example, suppose we have a random variable

Y = a X + b + E

where
X ~ U(0, 1)    (X uniformly sampled from [0, 1])
E ~ N(0, s^2)  (error sampled from a Gaussian with standard deviation s)
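To make this concrete, here's a small sketch in Python that draws samples from this model (the parameter values a = 2, b = 1, s = 0.5 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (made-up) parameters for the model Y = a X + b + E.
a, b, s = 2.0, 1.0, 0.5

n = 200
X = rng.uniform(0.0, 1.0, size=n)  # X ~ U(0, 1)
E = rng.normal(0.0, s, size=n)     # E ~ N(0, s^2)
Y = a * X + b + E
```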
Given samples (X1, Y1), (X2, Y2), ..., (Xn, Yn), and omitting constant factors for clarity (including the 1/2 in the Gaussian exponent, which doesn't change the argmax), we can find the parameters (a, b) that maximize the likelihood via:
argmax[a,b] of product[i=1...n] of exp(-((Yi - a Xi - b) / s)^2)
{ taking logarithms, an increasing function }
= argmax[a,b] of sum[i=1...n] of -((Yi - a Xi - b) / s)^2
{ multiplying by the constant -s^2, which turns argmax into argmin }
= argmin[a,b] of sum[i=1...n] of (Yi - (a Xi + b))^2
Which is just a least squares fit.
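As a quick numerical sanity check (a sketch, using the same made-up parameters as above), a least squares fit and a direct numerical maximization of the Gaussian likelihood land on the same (a, b):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Made-up ground truth, as before.
a_true, b_true, s = 2.0, 1.0, 0.5
X = rng.uniform(0.0, 1.0, size=200)
Y = a_true * X + b_true + rng.normal(0.0, s, size=200)

# Least squares: closed-form linear fit of Y against X.
a_ls, b_ls = np.polyfit(X, Y, deg=1)

# MLE: minimize the negative log-likelihood, i.e. the sum above with
# its sign flipped (constants dropped, since they don't move the argmin).
def neg_log_likelihood(params):
    a, b = params
    return np.sum(((Y - a * X - b) / s) ** 2)

a_mle, b_mle = minimize(neg_log_likelihood, x0=[0.0, 0.0]).x

print(a_ls, b_ls)    # ~ (2, 1)
print(a_mle, b_mle)  # same fit, up to optimizer tolerance
```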
So, least squares can be interpreted as maximizing likelihood under the assumption that the error is drawn from a normal distribution. Assuming other error distributions yields other maximum likelihood estimators. For example, assuming Laplace-distributed errors yields an estimator that minimizes the sum of absolute errors instead of squared errors.
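Sketching that Laplace variant (again with made-up parameters; the objective is just the sum of absolute residuals, and Nelder-Mead is used because that objective isn't smooth):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Made-up ground truth, now with Laplace-distributed errors.
a_true, b_true = 2.0, 1.0
X = rng.uniform(0.0, 1.0, size=200)
Y = a_true * X + b_true + rng.laplace(0.0, 0.5, size=200)

# The Laplace density is proportional to exp(-|y - a x - b| / scale),
# so maximizing likelihood means minimizing the sum of absolute errors.
def sum_abs_errors(params):
    a, b = params
    return np.sum(np.abs(Y - a * X - b))

a_l1, b_l1 = minimize(sum_abs_errors, x0=[0.0, 0.0], method="Nelder-Mead").x
print(a_l1, b_l1)  # ~ (2, 1), but less sensitive to outliers than least squares
```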
A gap in my knowledge for the longest time was understanding what a model is. I had some fuzzy idea that a model is a simplified version of reality that follows strict rules, which let us do calculations about it. In retrospect, I was confusing models with abstractions.
I now think of a model as a belief, a claim regarding the process that generates some data set.
Once you commit to believing that your model is true, you can derive exact quantities: probabilities, expected values, MLEs, etc. Then, you can apply those results "blindly" to the real world (the model is true, after all).
If the estimates fail, we don't just patch them with ad-hoc fixes. Instead, we push the fix back into the model. We ask: what change to the model would yield the change in estimator that I see is needed? Perhaps the answer is adding a new parameter, who knows.
There's even an extreme version of this: in practice we often come up with solutions or rules of thumb without any model behind them at all. But we can ask: for which model or models would this rule be the exact optimal solution? That alone can point to cases where the rule would break down.
Thinking about real-world problem solving this way feels less like just picking up tricks and more like a systematic (scientific, even) process of modeling -> testing -> refining models.