Disclaimer: My opinions are informed by my experience maintaining Cortex, an open source platform for machine learning engineering.
If you frequent any part of the tech internet, you’ve come across GPT-3, OpenAI’s new state-of-the-art language model. While hype cycles around new technology are nothing new—GPT-3’s predecessor, GPT-2, generated quite a few headlines as well—GPT-3 is in a league of its own.
Look at Hacker News over the last couple of months and you’ll find dozens of hugely popular posts, all about GPT-3.
And of course, there have been plenty of “Is this the beginning of SkyNet?” articles written.
The excitement over GPT-3 is just one piece of a bigger trend. Every month, more and more new projects launch, all built on machine learning.
GPT-3 serves as a useful case study for understanding why this is happening, and what the trend’s broader implications are.
What’s so special about GPT-3?
The obvious take here is that GPT-3 is simply more powerful than any other language model, and that the increase in production machine learning lately can be chalked up to similar improvements across the field.
Undoubtedly, yes. This is a factor. But, and this is crucial, GPT-3 isn’t so popular just because it’s powerful. GPT-3 is ubiquitous because it is usable.
By “usable,” I mean that anyone can build with it, and it’s easy. For context, after the full GPT-2 model was released, most of the popular projects built on it came from machine learning specialists and required substantial effort.
Comparatively, it has only been a couple of months since GPT-3’s announcement, and we’re already seeing dozens of viral projects built on it, often of the “I got bored and built this in an afternoon” variety.
Anyone with some basic engineering chops can now build an application leveraging state of the art machine learning, and this increase in the usability of models—not just their raw power—is an industry-wide phenomenon.
Why it’s suddenly so easy to build with machine learning
One of the biggest blockers to using machine learning in production has been infrastructure. We’ve had models capable of doing incredible things for a long time, but actually building with them has remained a major challenge.
For example, consider GPT-2. How would you build a GPT-2 application?
Intuitively, the model is more or less an input-output machine, and the most logical thing to do would be to treat it as some sort of microservice, a predict() function your application could call. Pass in some text and receive GPT-2-generated text in return, just like any other API.
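The pattern could be sketched like this. Note that the stub below merely stands in for the real model—loading actual GPT-2 weights (e.g. with a library like Hugging Face’s transformers, as hinted in the comments) and serving this function behind an HTTP endpoint is exactly the part that turns out to be hard:

```python
# Sketch of the predict()-microservice pattern. A stub stands in for
# GPT-2 itself; a real service would load the multi-gigabyte model at
# startup and expose predict() behind an HTTP endpoint.

def load_model():
    # Hypothetical real version (assumes the transformers library):
    #   from transformers import pipeline
    #   return pipeline("text-generation", model="gpt2")
    def generate(prompt: str) -> str:
        # Placeholder so the shape of the interface is clear.
        return prompt + " [generated continuation]"
    return generate

model = load_model()

def predict(prompt: str) -> str:
    """The single input-output call an application would make."""
    return model(prompt)
```

Everything difficult about production machine learning hides inside that load_model() call: disk, GPUs, and cost.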
This is the main way of deploying GPT-2 (what is known as realtime inference), and it comes with some serious challenges:
- GPT-2 is massive. The fully trained model is roughly 6 GB. Hosting a GPT-2 microservice requires a lot of disk space.
- GPT-2 is compute hungry. Without at least one GPU, you will not be able to generate predictions with anywhere near acceptable latency.
- GPT-2 is expensive. Given the above, you need to deploy GPT-2 to a cluster provisioned with large GPU instances—very expensive at scale.
And this is just for the vanilla, pretrained GPT-2 model. If you want to fine-tune GPT-2 for other tasks, that is its own technical challenge.
This is why machine learning has been so unusable. Using it in production required you to be versed not only in machine learning, but also in DevOps and backend development. Very few people fit that description.
Over the last several years, this has changed. There has been an emphasis in the community on improving infrastructure, and as a result, it’s gotten much easier to actually use models. Now, you can take a new model, write your API, and hit deploy—no DevOps needed.
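With a tool like Cortex, for example, a GPU-backed deployment can be little more than a short config file. Roughly (a sketch; field names follow Cortex’s 2020-era config and vary by version):

```yaml
# cortex.yaml (sketch, not a definitive reference)
- name: text-generator
  predictor:
    type: python
    path: predictor.py   # defines a class with a predict() method
  compute:
    gpu: 1
    mem: 8G
```

The cluster provisioning, autoscaling, and GPU scheduling happen behind that file.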
GPT-3 is an extreme example of this trend. The model, which is almost certainly too large for most teams to host, was actually released as an API.
While this move rankled many, it had a secondary effect. All of a sudden, using the most powerful language model in the world was easier than sending a text message with Twilio or setting up payments with Stripe.
In other words, you could call GPT-3 the most complex language model in history, but you could also call it just another API.
The number of people who can query an API, as it turns out, is orders of magnitude higher than the number of people who can deploy GPT-2 to production, hence the huge number of GPT-3 projects.
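Concretely, “just another API” means an ordinary HTTP request. The sketch below builds one against the completions endpoint OpenAI exposed during the 2020 beta (the endpoint path and parameters are from that era and may have changed since):

```python
# Sketch of calling GPT-3 as "just another API" (2020-beta endpoint
# shape; treat the URL and fields as assumptions, not a reference).
import json
import urllib.request

API_URL = "https://api.openai.com/v1/engines/davinci/completions"

def build_request(prompt: str, api_key: str, max_tokens: int = 64):
    """Assemble the HTTP request; no ML infrastructure required."""
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Sending it is one more line (requires a real API key):
# with urllib.request.urlopen(build_request("Once upon a time", KEY)) as r:
#     print(json.load(r)["choices"][0]["text"])
```

That is the entire “stack”: no model weights, no GPUs, no cluster—which is why weekend projects became possible.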
Machine learning engineering is mainstream now
GPT-3’s hype train is a convergence of things. It does have unprecedented accuracy, but it is also incredibly usable, and was released at a time when machine learning engineering has matured as an ecosystem and discipline.
For context, machine learning engineering is a field focused on building applications out of models. “How can I train a model to most accurately generate text?” is an ML research question. “How can I use GPT-2 to write folk music?” is a machine learning engineering question.
Because the machine learning engineering community is growing rapidly, companies are releasing new models the way they release web frameworks, hoping to attract engineers to build with them. Usability, therefore, has to be a consideration—they want to release not just the most powerful model, but the most used one.
Obviously, the proliferation of machine learning has many implications, but for engineers, there are two big conclusions to draw from this GPT-3 situation:
- It is easier than ever for you to actually build with machine learning.
- It is unlikely that, in the near future, you will be working on a piece of software that doesn’t incorporate machine learning in some way.
Machine learning is becoming a standard part of the software stack, and that trend is only accelerating. If you’re not already, it’s time to get familiar with production machine learning.