Ultra-Large AI Models Are Over

I don’t mean ‘over’ as in “you won’t see a new large AI model ever again,” but as in “AI companies have reasons not to pursue them as a core research goal — indefinitely.”

Don’t get me wrong. This article isn’t a critique of the past years — even if I don’t buy the “scale is all you need” argument, I acknowledge just how far scaling has advanced the field.

A parallel can be drawn between the 2020–2022 scaling race and — keeping a sense of proportion — the space race of the 1950s–70s. Both advanced science significantly as a byproduct of other ambitions.

But there’s a key distinction.

While space exploration was innovative in nature, the quest for novelty isn’t present in the “bigger is better” AI trend: to conquer space, the US and USSR had to design novel paths toward a clear goal. In contrast, AI companies have blindly followed a predefined path without knowing why, or whether it would lead anywhere.

You can’t put the cart before the horse.

That makes all the difference, and it explains how and why we got here.

The scaling laws of large models

Call it AGI, superintelligence, human-level AI, or true AI.

In any case, it’s been a recurring goal since the field’s birth in 1956. But the idea began to feel tangible in 2012, more so in 2017, and it finally exploded in 2020.

The last milestone was OpenAI’s discovery and application of the strongest version of scaling laws for large language models (LLMs).

They accepted, earlier than anyone else, that sheer model size (and thus data and computing power) was key to advancing the field. OpenAI’s faith in the scaling hypothesis was reflected in the Jan 2020 empirical paper “Scaling Laws for Neural Language Models.”
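
The paper fit simple power laws relating test loss to model size, dataset size, and compute. As a rough illustration, here is the size-only law in Python; the constants are approximately those reported in the paper, so treat this as indicative rather than exact:

```python
# Approximate parameter-count scaling law from "Scaling Laws for Neural
# Language Models" (Kaplan et al., 2020): L(N) ~ (Nc / N) ** alpha_N.
# Constants are approximate fits from the paper; this is an illustration,
# not a reference implementation.
ALPHA_N = 0.076
NC = 8.8e13  # in non-embedding parameters

def loss_from_params(n_params: float) -> float:
    """Predicted test loss (nats per token) as a function of model size alone."""
    return (NC / n_params) ** ALPHA_N

for n in (1.5e9, 175e9, 540e9):  # roughly GPT-2, GPT-3, and PaLM parameter counts
    print(f"{n:.1e} params -> predicted loss ~ {loss_from_params(n):.2f}")
```

The appeal is obvious: the curve keeps falling smoothly as models grow, with no wall in sight, which is exactly what fueled the race described next.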

In May 2020, OpenAI announced GPT-3, a direct result of applying the scaling laws. The unprecedentedly big 175-billion-parameter LLM put OpenAI ahead of everyone else.

The belief that making models bigger would yield emergent properties — like true intelligence — suddenly seemed within reach.

The race toward AGI… or something

It was during this period that phrases like “AGI is coming” and “scale is all you need” became super popular.

OpenAI set a precedent. Google, Meta, Nvidia, DeepMind, Baidu, Alibaba… the major players in the field lost no time. Their priority was surpassing GPT-3. It wasn’t so much a competition with OpenAI as an attempt at corroborating the rumors: Did scale really work that well? Could AGI actually be around the corner?

Big tech companies bought the scale argument and wanted to signal their presence in the AI race. Here’s a brief, incomplete list of how the landscape changed in one year, from mid-2021 to mid-2022 [company: model (size, release date)]:

  • Google: LaMDA (137B, May 2021), and PaLM (540B, Apr 2022)
  • Meta: OPT (175B, May 2022), and BlenderBot 3 (175B, Aug 2022)
  • DeepMind: Gopher (280B, Dec 2021), and Chinchilla (70B, Apr 2022)
  • Microsoft-Nvidia: MT-NLG (530B, Oct 2021)
  • BigScience: BLOOM (176B, June 2022)
  • Baidu: PCL-BAIDU Wenxin (260B, Dec 2021)
  • Yandex: YaLM (100B, June 2022)
  • Tsinghua: GLM (130B, July 2022)
  • AI21 Labs: Jurassic-1 (178B, Aug 2021)
  • Aleph Alpha: Luminous (200B, Nov 2021)
Credit: Alan D. Thompson (LifeArchitect.ai)

A pretty dramatic picture — cherry-picked to back my argument, yes, but quite revealing regardless. Companies were running away from small-scale AI.

But what were they looking for among those hundreds of billions of parameters? They didn’t know.

Scale proved to improve performance, but were benchmark results translatable to real-world performance? They didn’t know.

Could they really reach AGI with sheer size? Could scale alone lead us to intelligence?

They also didn’t know.

Every few months, a company released a new largest-ever model. But this was a headlong flight from having to think about the limitations. They didn’t have a plan. They didn’t know where they were going or, most importantly, why.

The title of “largest model ever” changed hands so often that it was hard to keep track. In April 2022, Google released PaLM — now, six months later, no one has claimed the throne.

Are they done?

They’re done — taking shots in the dark

The scaling race was intense, and then it stopped.

Key lessons on why large AI models are no longer the priority stem from research conducted by those very companies. The scaling hypothesis is extremely attractive in its simplicity — but not everyone buys it.

And then there’s the bigger picture — which I so frequently try to paint in my articles.

The AI field doesn’t exist in a vacuum. Yet, one of the most prevalent aspects I notice when I read discussions on AI, scale, AGI, superintelligence, etc. is people looking exclusively at a carefully selected subset of reality.

To them, only the purely technical — and sometimes scientific — aspects matter. Everything else? Irrelevant. Key aspects that have a direct influence on technological development are often ignored.

The perfect recipe for failing predictions.

Below is a broad — although likely incomplete — compilation of reasons why ultra-large AI models are over. This isn’t to say companies have stopped for these reasons, but they should consider them before resuming a pointless race.

Technical reasons

New scaling laws

In early 2022, DeepMind revisited the scaling laws and found that compute-optimal models, like its Chinchilla, need more data, not more parameters.

At “just” 70B parameters, Chinchilla instantly became the second-best-performing model across benchmarks, behind only PaLM (and surpassing GPT-3, Gopher, and MT-NLG).

DeepMind found that current super-large models are “significantly undertrained.” They’re unnecessarily big.

Why make models larger when there’s room for improvement at lower sizes?
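
As a rough illustration of the compute-optimal recipe: the Chinchilla result is often summarized as roughly 20 training tokens per parameter (Chinchilla trained 70B parameters on ~1.4T tokens, versus Gopher’s 280B parameters on ~300B tokens). Here is a minimal back-of-the-envelope sketch in Python, treating the 20:1 ratio as a rule of thumb rather than an exact law:

```python
def compute_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rough compute-optimal token budget using the ~20 tokens/parameter
    rule of thumb commonly cited from DeepMind's Chinchilla paper."""
    return n_params * tokens_per_param

models = [
    # (name, parameters, tokens actually trained on)
    ("GPT-3",      175e9, 300e9),
    ("Gopher",     280e9, 300e9),
    ("Chinchilla",  70e9, 1.4e12),
]

for name, params, trained_on in models:
    optimal = compute_optimal_tokens(params)
    print(f"{name}: trained on {trained_on:.1e} tokens; "
          f"compute-optimal would be ~{optimal:.1e}")
```

By this yardstick, GPT-3 and Gopher saw only a fraction of the data their parameter counts call for, which is what “significantly undertrained” means in practice.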

Prompt engineering limitations

What we can get out of an LLM depends heavily on our prompting skills. And those skills don’t depend only on the effort users put into learning them, but also on the collective knowledge we have about good prompting practices (people are still coming up with new methods: “let’s think step by step,” chain-of-thought prompting, ask-me-anything, self-ask, etc.).

Unless our collective prompting knowledge reaches its theoretical maximum (which we don’t know, and wouldn’t recognize even if we were already there), we may never know what LLMs are actually capable of.

Prompting resembles searching for an object in a dark room — without knowing what object you’re looking for. If we haven’t explored the latent space deeply enough, why build a larger model?
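
To make the “collective prompting knowledge” point concrete, here is a minimal sketch of zero-shot chain-of-thought prompting (the “let’s think step by step” trick). The query_llm function is a hypothetical placeholder, not any particular vendor’s API:

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in: swap in whatever LLM client you actually use."""
    return "<model output would appear here>"

question = (
    "A juggler has 16 balls. Half of them are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

# Plain prompt: the model often jumps straight to an answer (and may get it wrong).
plain_answer = query_llm(question)

# Zero-shot chain of thought: appending this phrase nudges the model to write out
# intermediate reasoning steps before committing to a final answer.
cot_answer = query_llm(question + "\nLet's think step by step.")

print(plain_answer)
print(cot_answer)
```

That a single appended sentence can change measured capability this much is precisely why it’s hard to claim we’ve hit the ceiling of any given model.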

Suboptimal training settings

OpenAI and Microsoft realized they could improve GPT-3 further by using a better set of initial hyperparameters during training. They found a technique to do it at virtually no extra cost, simply by transferring those hyperparameters from a smaller, equivalent model.

They showed that a 6.7B version of GPT-3 tuned this way performed better than its 13B sibling.
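
The idea, heavily simplified: tune hyperparameters on a small proxy model, then rescale the width-sensitive ones analytically instead of re-tuning at full size. The sketch below is a toy illustration of that idea; the scaling rules are simplified versions of the published ones and every number is invented:

```python
# Toy illustration of hyperparameter transfer across model widths.
# Not an actual muP/muTransfer implementation; real setups use per-layer rules.

base_width = 256       # width of the small proxy model we can afford to sweep
target_width = 16384   # width of the large model we actually want to train

# Hypothetical values found by sweeping only the small proxy model.
tuned_on_proxy = {
    "adam_lr_hidden": 3e-3,
    "init_std_hidden": 0.02,
}

ratio = target_width / base_width

# Simplified width-scaling: hidden-layer Adam learning rate shrinks ~1/width,
# hidden-layer init std shrinks ~1/sqrt(width).
transferred = {
    "adam_lr_hidden": tuned_on_proxy["adam_lr_hidden"] / ratio,
    "init_std_hidden": tuned_on_proxy["init_std_hidden"] / ratio ** 0.5,
}

# These are the hyperparameters you would use for the large run,
# without ever sweeping at the large (expensive) scale.
print(transferred)
```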

Unsuitable hardware

As models got larger, individual chips’ memory became increasingly insufficient to hold them.

Engineers applied parallelization techniques (data, model, pipeline) to make training work across entire clusters.

However, as models grew even further, training them became almost intractable.

Parallelization is a band-aid. AI hardware companies like Cerebras Systems are tackling this, but universal solutions are yet to be found.
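
For concreteness, here is a minimal sketch of just one of those techniques, data parallelism, using PyTorch’s DistributedDataParallel. It assumes a multi-GPU job launched with torchrun, and it also shows the limitation: every GPU still holds a full copy of the model, which is exactly what breaks once the model no longer fits on a single chip.

```python
# Minimal data-parallelism sketch (PyTorch).
# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)

    model = torch.nn.Linear(1024, 1024).to(device)   # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])      # full replica on every GPU

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(32, 1024, device=device)         # each rank gets its own batch of data

    loss = model(x).pow(2).mean()
    loss.backward()                                  # gradients are all-reduced across GPUs here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Model and pipeline parallelism go further and split the network itself across devices, at the price of much more engineering complexity.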

Scientific reasons

Biological neurons >>> artificial neurons

A study published in September 2021 in Neuron showed that artificial neurons are too simple: you need ~1,000 of them, built into a whole neural network, to accurately represent a single biological neuron.

This is evidence that the foundation on which deep learning and neural networks are built is too simplistic.

In turn, this renders comparisons between large models and human brains pointless. It’s not an apples-to-apples comparison. It’s more like comparing apples to forests of apple trees.

Let’s see how these findings change the too-typical comparison:

The human brain has ~100 billion neurons with ~10,000 connections each. That’s about 1,000 trillion synapses. If we accept even a conservative complexity delta of two orders of magnitude, we’d need a model with 100 quadrillion parameters to reach the scale of the human brain.

That’s more than 500,000 times the size of GPT-3, and roughly 200,000 times the size of PaLM, the largest model on the list above.
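
Spelling out the back-of-the-envelope arithmetic (the two-orders-of-magnitude complexity factor is the assumption doing most of the work):

```python
neurons = 1e11           # ~100 billion neurons in the human brain
connections = 1e4        # ~10,000 synaptic connections per neuron
complexity_factor = 1e2  # assumed "complexity delta": two orders of magnitude

brain_scale_params = neurons * connections * complexity_factor
print(f"{brain_scale_params:.0e} parameters")  # 1e+17, i.e. 100 quadrillion

print(brain_scale_params / 175e9)  # ~571,000x GPT-3 (175B)
print(brain_scale_params / 540e9)  # ~185,000x PaLM (540B)
```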

Maybe pursuing AGI mindlessly through pure scaling isn’t so reasonable after all.

Wait, but why?

Without a clear goal, and driven by dubious empirical evidence, plans break down.

What was the scientific purpose of making models larger if companies didn’t know why they were doing it or what they were looking for?

Dubious construct validity and reliability

Construct validity refers to how well a test measures a concept we want to measure when it’s not directly measurable. For instance, do IQ tests measure intelligence? Do AI language benchmarks measure AI models’ linguistic ability?

Reliability refers to the consistency of a test: whether it always gives the same results under the same conditions or not.

Because of how AI benchmarks are designed, it’s often difficult to ensure adequate validity and reliability (this isn’t true for all benchmarks).

The world is multimodal

The world is multimodal and our brain is multisensory. Exploring other modes of information and how they interact inside neural networks makes more sense than further enlarging text-only models.

DeepMind’s generalist agent, Gato, is 1.2B parameters. That’s more than 100x smaller than GPT-3.

The AI art revolution

OpenAI’s DALL·E 2, Midjourney, Stable Diffusion, Meta’s Make-A-Scene, Google’s Imagen, and Parti… AI art models comprise 2022’s AI revolution.

Yet, there’s not a single generative visual AI model that comes anywhere close to the typical LLM size (Google’s Parti is the largest at 20B). And high-quality models abound in the 1–5B parameter range (Stable Diffusion is 1.2B and DALL·E 2 is 3.5B).

This makes AI art models easier to build, train, and deploy.

As I wrote in my last article, “the attractiveness of the visual component, the easiness with which anyone could leverage the models, and the relatively smaller size compared to their language counterparts” makes them more appealing to both companies and consumers.

Philosophical reasons

What is AGI anyway?

The lack of definitions and measuring tools creates an insurmountable gap between our knowledge of reality and reality itself.

How can we draw conclusions if AI models are uninterpretable and we rely on blind prompting to prove the presence or absence of those presumed emergent properties?

The LaMDA sentience/consciousness debate made this apparent. Saying it was sentient wasn’t even wrong, as the question was empty to start with because we lack adequate definitions and tools.

Human cognitive limits

What if we’re not that intelligent?

What if human intelligence is a very narrow form of intelligence, instead of general, and our capabilities aren’t great enough?

(See Erik Brynjolfsson’s tweet on this.)

If we assume that’s true, and assume an AGI would be at least human-level at every task it does, the obvious conclusion is that it would surpass us greatly.

Couldn’t it simply hide from us before we notice?

Existential risks

Shouldn’t we ensure that AI is friendly before attempting to build it?

If AGI turns out to be “bad” or misaligned, it may be too late for us to do anything to revert the situation.

It’s weird that it’s precisely those most concerned with existential risks who are willing to work on scaling AI models boundlessly.

Their reasoning is circular: Because everyone is working to advance AI and progress is inevitable, we should work to advance AI to get there first. That way, we can ensure it’s done correctly.

But it’s only because they follow this reasoning that they’re working more and more to advance the field, making progress inevitable.

Aligned AI, how?

If the problem sounds not just hard, but intangible, it’s because it is. Not even people working on alignment know how to do it — mainly because the issues it tries to pre-solve are so distant we can’t begin to correctly conceptualize them.

Sociopolitical reasons

The open-source revolution

Open-source alternatives (BLOOM, OPT, and others) disincentivize companies from devoting extensive resources to training and deploying proprietary LLMs: there’s no guaranteed ROI if people can simply download a model of similar characteristics and quality.
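
As an illustration of how low the barrier has become, here is a minimal sketch that downloads and runs an open model with the Hugging Face transformers library (using a small BLOOM checkpoint so it runs on modest hardware):

```python
# Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # a small open checkpoint from the BLOOM family
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Ultra-large AI models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```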

The dark side of LLMs

Developers train them on data scraped from the internet. The consequences? The toxic content that populates the web poisons LLMs to the point that they can become unusable.

LLMs can also generate seemingly human-written texts, which makes them great tools to generate believable misinformation.

These limitations have generated a backlash against AI companies, pressuring them to stop training these models with the sole purpose of passing benchmarks.

Bad for the climate

Building, training, and deploying LLMs is polluting. It’s not the most polluting activity we do, but among the activities we pursue without a clear purpose, it’s probably near the top (apart from crypto, of course).
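
A back-of-the-envelope sketch of where the footprint comes from; every number below is a hypothetical placeholder, not a measurement of any real training run:

```python
# All numbers are illustrative assumptions, not measurements of any real model.
num_accelerators = 1000     # GPUs/TPUs used for training
power_per_chip_kw = 0.4     # average draw per accelerator, in kW
training_days = 30
pue = 1.2                   # datacenter power usage effectiveness (overhead factor)
grid_kgco2_per_kwh = 0.4    # carbon intensity of the local grid

energy_kwh = num_accelerators * power_per_chip_kw * 24 * training_days * pue
emissions_tonnes = energy_kwh * grid_kgco2_per_kwh / 1000

print(f"~{energy_kwh:,.0f} kWh and ~{emissions_tonnes:,.0f} tCO2e for one training run")
```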

Economic reasons

The benefit-cost ratio is low

OpenAI partnered with Microsoft in 2019 for $1 billion, and even then Sam Altman thought they’d run out of money within five years.

OpenAI isn’t in this business to make money (they’re true believers), but other companies are. If there’s no benefit, there’s no reason to bear the cost.

Good-enough models

Companies that want to create useful, valuable products and services don’t need to keep scaling. Larger models would bring only minor improvements, unnoticeable at the consumer level.

Final remarks

LLMs are useful

What is pointless is pursuing a mirage of what they could become. LLMs are useful, but they aren’t agents capable of reasoning and understanding in the human sense.

We’ll eventually see a larger AI model

In fact, I’d guess that if we eventually build an AGI, at least a part of it will be a deep learning large model larger than any of the existing ones.

Yet, the reasons not to pursue this avenue right now are very strong.

The arguments are valid even if my predictions are wrong

There’s more than enough to gain from what we’ve already built, and more than enough to reflect on in the limitations we’ve encountered, without continuing to build bigger models just for the sake of it.

The Algorithmic Bridge is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

You can also support my work on Medium directly and get unlimited access by becoming a member using my referral link here! 🙂

Original post: https://albertoromgar.medium.com/ultra-large-ai-models-are-over-c7b5fc007d0a
