Should you read this article? If you understand all the words in the next section, then no. If you don’t care to understand them, then also no. If you want the bolded bits explained, then yes.
The bias-variance tradeoff
“The bias-variance tradeoff” is a popular phrase you’ll hear in the context of ML/AI. If you’re a statistician, you might think it’s about summarizing this formula:
MSE = Bias² + Variance
Well, it’s loosely related, but the phrase actually refers to a practical recipe for how to pick a model’s complexity sweet spot. It’s most useful when you’re tuning a regularization hyperparameter.
Note: If you’ve never heard of the MSE, you might need a bit of help with some of the jargon. When you hit a new term you want explained in more detail, you can follow the links to my other articles where I introduce the words I’m using.
Understanding the basics
The mean squared error (MSE) is the most popular (and vanilla) choice for a model’s loss function and it tends to be the first one you’re taught. You’ll likely take a whole bunch of stats classes before it occurs to anyone to tell you that you’re welcome to minimize other loss functions if you like. (But let’s be real: parabolae are super easy to optimize. Remember d/dx x²? 2x. That convenience is enough to keep most of you loyal to the MSE.)
Once you learn about the MSE, it’s usually mere moments until someone mentions the bias and variance formula:
MSE = Bias² + Variance
I did it too and, like a garden variety data science jerk, left the proof as homework for the interested reader.
Let’s make amends — if you’d like me to derive it for you while making snide comments in the margins, take a small detour to here. If you choose to skip the mathy stuff, then you’ll have to put up with my jazz hands and just take my word for it.
Positive vibes only
Want me to tell the key thing to you bluntly? Notice that the formula consists of two terms that can’t be negative.
The quantity (MSE) you’re trying to optimize when you fit your predictive ML/AI models can be decomposed into always-positive terms that involve bias only and variance only.
MSE = Bias² + Variance = (Bias)² + (Standard Deviation)²
Even more bluntly? Okay, sure.
A better model has a lower MSE. E stands for error and fewer errors are better, so the best model has a zero MSE: it makes no mistakes. That means it also has no bias and no variance.
Instead of perfect models, let’s look at going from good to better. If you’re truly able to improve your model (in terms of MSE), there’s no need for a tradeoff between bias and variance. If you became a better archer, you became a better archer. No tradeoff. (You probably needed more practice — data! — to get there.)
As Tolstoy would say, all perfect models are alike, but each unhappy model can be unhappy in its own way.
All perfect models are alike
As Tolstoy would say, all perfect models are alike, but each unhappy model (for a given MSE) can be unhappy in its own way. You can get two equally rubbish yet different models with the same MSE: one model can have really good (low) bias but high variance while the other can have really good (low) variance but high bias, and yet both can have the same MSE (overall score).
If we measure the performance of an archer by MSE, we’re saying that decreasing an archer’s standard deviation is worth the same as an equivalent decrease in bias. We’re saying we’re indifferent between the two. (Wait, what if you’re not indifferent between them? Then the MSE might not be your best choice here. Don’t like the MSE’s way of scoring performance? Not a problem. Make your own loss function.)
Now that we’ve set the table, head over to Part 2 where we dig into the heart of the matter: Is there an actual tradeoff? (Yes! But not where you might think.) And what does overfitting have to do with it? (Hint: everything!)
Thanks for reading! How about a YouTube course?
If you had fun here and you’re looking for an entire applied AI course designed to be fun for beginners and experts alike, here’s the one I made for your amusement:
Looking for hands-on ML/AI tutorials?
Here are some of my favorite 10 minute walkthroughs:
- Vertex AI
- AI notebooks
- ML for tabular data
- Text classification
- Image classification
- Video classification
Original post: https://towardsdatascience.com/is-there-always-a-tradeoff-between-bias-and-variance-5ca44398a552