This video (by the masterful Sean Rule) is quite possibly the best explanation of how linear regression works that I’ve ever seen.
A couple questions that come up when thinking about linear regression:
Why are we only concerned with the VERTICAL distance on the y axis between each point and the line?
Sean points out: “You minimize the vertical (offset) distance because you’re checking the error between the model (the “best fit” line) and the actual “performance” of the data. By checking the vertical distance, the x – coordinate (input variable) remains consistent between y (data dependent value) and “y hat” (the predicted y-value).”
Or to put it another way: for each value on the x axis, we want to see how far off we are on the y axis. We know our x values–the point of linear regression is to tell us the value of y based on the known value of x. So all we care about is the y distance from the line for any specific x point.
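To make that concrete, here’s a minimal sketch (plain Python, with made-up data points and a made-up candidate line) of those vertical distances, usually called residuals:

```python
# Hypothetical data points and a candidate line y = m*x + b.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
m, b = 2.0, 0.0  # slope and intercept of the candidate line

# For each known x, compare the actual y to the line's prediction ("y hat").
# The x stays fixed; only the vertical (y) gap is measured.
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
print(residuals)  # sign tells you whether the point sits above or below the line
```

Nothing fancy: each residual is just “actual y minus predicted y” at the same x.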
Why do we square the errors instead of just taking absolute value?
Sean covers this briefly around the 2:10 mark, but let me try to build it out. I’ll start by saying: it’s complex and tricky, and the main answer is a dumb one, namely: “that’s how we’ve always done it”.
There’s a better reason that involves calculus, but to be honest with you, I don’t quite get it (the curse of being an English major, I suppose).
Suffice it to say, you probably could use absolute value and get something that works. If you’re really smart, you’ll know the limitations and advantages of each method. I am not really smart, so I’m gonna do it the way most everybody does. That’s a horrible answer, but it ticks the important boxes, namely:
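For what it’s worth, the calculus payoff is that squared error has a smooth derivative, which means you can solve for the best slope and intercept directly instead of searching for them. Here’s a rough sketch (plain Python, hypothetical data) of that closed-form least-squares fit:

```python
# Hypothetical data points.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# These formulas fall out of the calculus: set the derivative of the
# summed squared error to zero and solve for slope and intercept.
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - m * mean_x
print(m, b)  # best-fit slope and intercept for this data
```

With absolute value you don’t get a tidy formula like this; you’d have to minimize it iteratively instead.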
- Does it work to give us good predictions most of the time? Yes, yes it does.
- Does it treat negative differences as just as important as positive ones? Yes, yes it does.
Ok, that’s good enough for me!
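If you want to see that second box get ticked, here’s a quick sketch (hypothetical error values) showing that squaring, like absolute value, makes negative errors count just as much as positive ones:

```python
# Hypothetical vertical errors, half above the line, half below.
errors = [0.5, -0.5, 1.0, -1.0]

# Both treatments erase the sign, so negative misses count like positive ones.
sum_abs = sum(abs(e) for e in errors)  # 0.5 + 0.5 + 1.0 + 1.0 = 3.0
sum_sq = sum(e ** 2 for e in errors)   # 0.25 + 0.25 + 1.0 + 1.0 = 2.5
print(sum_abs, sum_sq)
```

One visible difference: squaring punishes the big misses (the 1.0s) more than the small ones, which is part of why least squares chases outliers harder than absolute value would.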