As we grapple with sovereign AI, perhaps we should treat computational resources as finite and precious


In part one, we explored how ‘foundation’ large language models (LLMs), like OpenAI’s GPT-4, cost hundreds of millions of dollars to ‘train’, much of that cost being the enormous energy resources required to power the aircraft-hangar-sized data centres where training occurs.

Just a year ago we had only a few ‘foundation’ models: GPT-3.5 from OpenAI, PaLM from Google, and Claude, from OpenAI breakaway Anthropic. In the wake of the sudden and massive popularity of ChatGPT, every tech company (and a fair few non-tech enterprises) decided they needed their own foundation models, and set to work training them. It’s difficult to know exactly how many foundation models have been created in the last year, but the names of a few give you a sense of how widespread the practice has become: BloombergGPT, ChaseGPT, TimeGPT-1, Falcon, LLaMA (1 and 2), Yi, and Samsung Gauss are among a growing number.

Two foundation models announced in the last week (!) point to how broad and specific these models have become. 

Where foundation models from OpenAI, Microsoft and Google largely have been trained on English-language texts – and therefore struggle when presented with less widely used languages – the Aya language model has been trained to be responsive across more than 100 languages. Aya helps to ensure those languages aren’t subject to a sort of ‘AI colonialism’ that pushes them to the margins simply because they aren’t the ‘language of AI’. Something we wouldn’t even have considered a need for a year ago can now be seen as vital.

This touches on a point recently made by nVidia founder and CEO Jensen Huang: the need for nations to invest in ‘sovereign’ AI capabilities. Looking beyond the self-interest embedded in Huang’s observation – nVidia has become the third-most-valuable company in the world due to its commanding lead in sales of the AI kit nations will need to stock up on to ensure their sovereign AI capabilities – an ‘arms race’ paranoia already underlies US-China trade in AI-capable chips. China finds ways to smuggle nVidia chips in (despite a fierce American ban on the export of those chips) so it can train its own foundation models away from spying American eyes. Wouldn’t every nation want a similar capability – if it can afford it?

While both commercial and political imperatives drive some of the explosive growth in foundation models, more of it – much, much more – will be driven by an increasingly nuanced understanding of the value of these models. For the last year we’ve been mesmerised by ‘conversational’ bots: we speak to them, and they respond. It seems like magic, even when we know that underneath they’re all just mathematics and statistics. That bright experience of delight blinded us to the real utility of these models: the essential nature of the breakthrough, which has nothing at all to do with any language.

Another model – released last year, and updated last week – gives us a peek into the shape of a future where foundation models have become absolutely common – and incredibly useful.

GeneGPT turns the human genome dataset into a language model – all of those Ts, Cs, As, and Gs – that can be queried, explored, searched and ‘conversed’ with. GeneGPT allows a researcher – or, in the near future, a doctor treating patients with genetic disorders – to interact with a genome in much the same way they’d use ChatGPT or any other foundation model. It’s easy to imagine a medical workflow that goes from human genome sequencing (the price of which has fallen to around $1000 per patient) through GeneGPT, then on to a treatment program supervised by a doctor. 

It seems like magic, even when we know underneath they’re all just mathematics and statistics.

While we’ll certainly start with a GeneGPT for humans, we can expect similar versions for the food crops we rely on, the animals we use in agriculture (and, of course, our pets), then, into the broader world: the millions of plant and animal species in the biosphere. ChatGPT informs me there are around 8.7 million identified species. On that basis alone, we could expect to see more GeneGPT foundation models than all other foundation models combined; each species with its own foundation GeneGPT model – where will we get the energy for that?

This is the question that will dog AI until we work our way toward an effective solution. We can do so much – if we can afford it. Any solution will likely be a ‘kludge’ – mixing efficiencies in chip designs, improvements in training algorithms (already getting markedly better), and some good old-fashioned human ingenuity. We’re good at working within constraints – when we have no choice.

Yet this is only half the problem. Training foundation models consumes huge amounts of resources – once. ‘Inferencing’ those models – that is, getting them to generate ‘completions’ in response to ‘prompts’ – produces far greater energy demands over time than training ever could.

Generating a response from a large language model requires a computer program to traverse the billions (or hundreds of billions, perhaps even trillions) of ‘weights’ within the model as it searches for the statistically most likely next bit of the generated output. The program has to do this for every ‘token’ of output it generates – and as a rough rule of thumb, one token equals one syllable of generated output. Asking for a 500-word essay on the history of Tampere, Finland, for example, means that a model has to generate roughly 1000 tokens of output. If the model is something like the just-released Smaug-72B (named after the famed dragon of The Hobbit), that could mean the computer program has to traverse that entire model – all seventy-two billion weights – for each token generated. That’s potentially an enormous number of discrete computer operations – tens of billions for each token, and tens of trillions for the full essay. (These are worst-case numbers; inferencing is generally at least 100x more efficient than this.)
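The arithmetic above can be sketched in a few lines, using the worst-case assumptions just stated: one operation per weight per token, and roughly 1000 tokens for the essay.

```python
# Back-of-envelope estimate of inference cost, using the worst-case
# assumption above: every weight is touched once per generated token.

WEIGHTS = 72_000_000_000   # Smaug-72B: roughly 72 billion parameters
TOKENS = 1_000             # ~500-word essay, at ~1 token per syllable

ops_per_token = WEIGHTS            # worst case: one operation per weight
total_ops = ops_per_token * TOKENS

print(f"{ops_per_token:.1e} operations per token")    # 7.2e+10
print(f"{total_ops:.1e} operations for the essay")    # 7.2e+13

# The article notes real inferencing is at least ~100x more efficient:
print(f"{total_ops / 100:.1e} operations, optimistically")  # 7.2e+11
```

Even the optimistic figure is hundreds of billions of operations for a single short essay – which is why the energy bill scales so quickly with usage.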

It takes at least a few billion computer operations to generate each syllable of a response from a foundation model. That’s obvious when we see ChatGPT ‘thinking’ for a moment before it generates a response to a prompt. Even on the very fastest computers, it takes time to do that much maths on that much data. Those computers consume huge amounts of energy in order to maintain their speed. That wouldn’t be much of an issue if we only made a few prompts a day to a foundation model – which is pretty much where things sit today. Even OpenAI – which has a hundred million weekly users of ChatGPT – doesn’t see that many prompts per day per user. We don’t have that much to say to these chatbots, yet.

But that’s about to change – because the way we use these foundation models is about to transform, tying us into a continuous loop of conversation. The shape of that future was recently revealed in a bit of work released by researchers working with Microsoft. Project ‘UFO’ (no relation to the famous Gerry Anderson series from the early 1970s) uses AI to drive an ‘agent’ – that is, a system capable of translating human requests into a series of actions that it then performs without human intervention – autonomously. These ‘autonomous agents’ were effectively impossible before large language models. Now, they’re almost easy to create, as UFO demonstrates.

It takes at least a few billion computer operations to generate each syllable of a response from a foundation model.

It’s a bit of open-source software (that you can download and play with yourself) for Windows that asks you what you’d like it to do for you – for example, delete all the speaker notes on a PowerPoint deck, or write an email on the following topic to such-and-such a person, and so on. UFO works out how to perform the task – but doesn’t try to do this by itself; instead UFO engages in continuous conversation with GPT-4, asking it how to break the problem down into steps, how to break those steps into actions, how to perform those actions, and how to test whether those actions worked. In a twist that tells us just how far AI has advanced in the last year – UFO also takes a snapshot of the Windows desktop, passing that along to GPT-4V (a version of the foundation model that’s trained to analyse images), asking it to feed back to UFO a list of what applications are open on the Windows desktop – and therefore can be ‘driven’ by UFO.
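That continuous conversation can be sketched as a loop. The sketch below is illustrative only – the helper functions (`ask_llm`, `take_screenshot`, `perform`) are hypothetical stand-ins, not UFO’s actual API – but it captures the plan-act-verify cycle described above.

```python
# Hypothetical sketch of a UFO-style agent loop. The injected helpers
# (ask_llm, take_screenshot, perform) are illustrative stand-ins for
# real model calls and OS automation -- not Microsoft's actual API.

def run_agent(user_request, ask_llm, take_screenshot, perform):
    # 1. Show a vision model the desktop, to learn what can be 'driven'.
    screen = take_screenshot()
    apps = ask_llm("Which applications are open in this image?",
                   image=screen)

    # 2. Ask the language model to break the request into steps.
    steps = ask_llm(f"Break this request into steps: {user_request}. "
                    f"Available applications: {apps}")

    # 3. For each step: ask for concrete actions, perform them, verify.
    for step in steps:
        actions = ask_llm(f"What actions accomplish this step: {step}?")
        for action in actions:
            perform(action)
        ok = ask_llm(f"Did this step succeed: {step}?",
                     image=take_screenshot())
        if not ok:
            break  # a real agent would re-plan here rather than give up
```

Note how many model calls a single request generates: one for the screenshot, one to plan, and at least two per step. That multiplication is the point of the paragraphs that follow.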

It’s still very early days for this sort of thing. UFO requires everything to be set up ‘just right’ for it to correctly perform the task at hand. That will improve, and as it does we’ll see a big change in the information flow between our computers (and our smartphones, which will similarly have agents embedded within them) and foundation models. The user’s computer continuously queries the foundation model, asking for guidance, explanations and demonstrations of how it can do what it needs to do for the user. That’s the next place AI is going – a conversation with a direction, aimed toward a specific outcome. Help me do this. Do that – while I’m over here doing this other thing. This is a different conversation, and it’s likely to be more-or-less continuous, operating in the background, supporting the user.

In that agent-enabled future – commonplace within the next two or three years – our prompts to foundation models will skyrocket from a few tens a day to perhaps a few thousand per day. Some of these prompts – perhaps even most of them – could be answered locally. A smarter successor to Smaug-72B could sit in the computer’s memory all the time, replying to the majority of the prompts. Those it can’t work out for itself could be passed along to the ‘big brains’ in the cloud. This distribution of intelligence would help – yet, multiplied by the billions of desktops and smartphones that will all be using agents, we end up back in the same place: a lot of computation happening pretty much everywhere, on every device, pretty much all the time.
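One way to picture that ‘distribution of intelligence’ is a local-first router: try the on-device model, and fall back to the cloud only when the local answer looks shaky. Everything here – the function names, the confidence score, the threshold – is an illustrative assumption, not any shipping system’s design.

```python
# Illustrative local-first routing between an on-device model and the
# cloud. The model functions and confidence threshold are hypothetical.

def route_prompt(prompt, local_model, cloud_model, threshold=0.8):
    answer, confidence = local_model(prompt)
    if confidence >= threshold:
        return answer, "local"           # majority of prompts end here
    return cloud_model(prompt), "cloud"  # pass the hard ones upstream
```

The design choice that matters is where the threshold sits: set it high and the cloud soaks up the energy cost; set it low and every device carries it instead. Either way, the total computation doesn’t go away – it just moves.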

Perhaps we need to regard computing as a finite resource – like water, or a precious metal. Last year, researcher Wim Vanderbauwhede introduced the concept of ‘Frugal Computing’: “As a society we need to start treating computational resources as finite and precious, to be utilised only when necessary, and as effectively as possible.” Precious and finite as they are, we have found an extraordinary new use for those limited computational resources. The next decade will see us trying to balance our dreams within our budgets.
