The speed with which artificial intelligence took off last year caught everyone by surprise – especially those long in the field, accustomed to years of slow, steady progress. Suddenly, ‘good enough’ AI created a pervasive and accelerating demand for all sorts of AI-infused products and services, and – for the first time – a real commercial focus for those services.
At the forefront of this drive to bring AI to ‘all the things’, software titan Microsoft has been threading its own chatbot, ‘Copilot’, into nearly everything it makes – from operating systems to productivity apps to keyboards. (You read that right: a dedicated key on the latest Windows keyboards will invoke its AI chatbot.)
At the end of September, Microsoft sent an update out to nearly half a billion Windows 11 desktops, integrating Copilot deep into the operating system, adding a Copilot icon to the Windows taskbar, and giving users a key sequence (Windows+C, for those who lack a dedicated key) to open a Copilot window on the desktop.
Then, in January, the firm threw open the doors to ‘Copilot Pro in Microsoft 365’ – a pricey subscription service offering deep integration of AI within its hugely popular suite of productivity software: Word, PowerPoint, Excel, Outlook, etc.
Hundreds of millions of users rely on those apps, and Microsoft believes that – eventually – most of them will be hooked into its Copilot Pro. As of this writing, nearly a hundred million Windows 11 users have Copilot installed; with Microsoft announcing plans to make it available for the soon-to-be-obsolete Windows 10 as well, Copilot will have a potential reach of a billion and a half desktops.
That’s a lot of computers (and smartphones and tablets) hooked into AI services.
We don’t yet have a strong sense of how people will be using Copilot or other AI chatbots to help them get their work done – or make their personal lives better. For many office workers, AI offers a range of small improvements, but nothing dramatic.
For a few whose needs perfectly match the capabilities of a general-purpose generative AI tool, productivity could improve ten- or even a hundred-fold. These benefits start out very unevenly spread, but as we move deeper into this decade and build mature tools on ‘good enough’ AI, the range of tasks AI can assist with will expand until it touches almost every office-based task – and many beyond the office.
Unless something knocks us off this path, it’s reasonable to expect that by around 2030 there will be more than a billion people using AI day-to-day in their work, and perhaps another 3 or 4 billion using it via their smartphones (or smartwatches, or smart glasses) for more quotidian assistance.
That’s a lot of requests flowing into these AI systems, a lot of data – and a lot of power.
Artificial intelligence is both mathematically and computationally intense. Getting an AI chatbot to generate even a single word of response to a user ‘prompt’ can require trillions of operations.
Our fastest computer chips – manufactured principally by nVidia, which as a result has recently become the fourth-most-valuable company in the world – can perform those trillions of operations in less than a second. That’s quite something – a huge amount of work in not a lot of time. But the performance comes at a cost in energy consumed and heat generated. Here we touch on one of the fundamental trade-offs in computing: making something go faster means it runs hotter and consumes more power.
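To put rough numbers on that – a back-of-envelope sketch only, where the parameter count and chip speed are illustrative assumptions rather than published GPT-4 or nVidia figures – the arithmetic looks something like this:

```python
# Back-of-envelope arithmetic only. Assumes the common rule of thumb of
# roughly 2 operations per model parameter to generate one token (about one
# word); the model size and chip speed below are illustrative guesses, not
# published GPT-4 or nVidia specifications.
model_parameters = 1e12              # a hypothetical trillion-parameter model
ops_per_word = 2 * model_parameters  # ~2 trillion operations per word

chip_ops_per_second = 1e15           # order-of-magnitude peak for a top AI accelerator
ideal_seconds_per_word = ops_per_word / chip_ops_per_second

print(f"~{ops_per_word / 1e12:.0f} trillion operations per word")
print(f"ideal time on one chip: ~{ideal_seconds_per_word * 1000:.0f} milliseconds")
```

Real systems fall well short of that ideal – models wait on memory, and every one of those operations sheds heat – but the shape of the trade-off is the same.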
Those thermal limits are why computer chips aren’t clocked substantially faster in 2024 than they were in 2014. Chip designers hit a wall: it became impossible to keep pushing up speeds without heating the chip to the point where it would melt.
From that time to the present, they’ve added more and more parallelism to their chips – a typical latest-generation smartphone has six or eight ‘cores’, each more powerful than a single core from 2014, yet together consuming not much more power than that far less capable older chip. Scale this approach up by a thousand and you have a latest-generation nVidia H100 chip, with nearly twenty thousand cores – each of them smaller and less capable than those in your smartphone, but all very finely tuned for the kinds of operations artificial intelligence programs demand.
Those programs break down into two broad categories: ‘training’ and ‘inferencing’ – each with very different energy requirements. They’re complementary: training involves the creation of an AI model, while inferencing generates responses to user prompts put to the trained model.
Training a generative AI model generally involves ‘feeding’ the model a lot of data – often trillions of words of text. That part is relatively straightforward – though lengthy. What comes after – the ‘reinforcement learning’, or ‘RL’ – is an extensive and detailed process of asking the model ‘questions’ about all of the data that’s been fed into it.
At first the answer from the model will be junk – just random noise. The reinforcement learning program corrects the model, saying, in effect, ‘that’s the wrong answer, here’s the correct one’, then poses the question again. After many rounds of question, incorrect response and correction, the model begins to return a correct answer. At that point, the reinforcement learning program moves on to the next question, and the process starts all over again.
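As a cartoon of that loop – a toy sketch only, nowhere near the scale or sophistication of a real training pipeline – the cycle of question, wrong answer and correction looks something like this:

```python
# A toy cartoon of the 'ask, correct, ask again' cycle described above.
# This is NOT how a real model is trained -- just an illustration of how
# repeated corrections gradually push a model towards the right answers.
import random

# Hypothetical 'questions': prompts paired with the answer we want.
training_pairs = [("the sky is", "blue"), ("grass is", "green")]
vocabulary = ["blue", "green", "red"]

# The 'model': a table of scores for each (prompt, word) combination.
scores = {(p, w): 0.0 for p, _ in training_pairs for w in vocabulary}

def answer(prompt):
    # The model's answer is whichever word it currently scores highest.
    return max(vocabulary, key=lambda w: scores[(prompt, w)] + random.random() * 0.01)

for cycle in range(2000):                     # many, many go-rounds
    prompt, correct = random.choice(training_pairs)
    guess = answer(prompt)
    if guess != correct:
        scores[(prompt, guess)] -= 0.1        # 'that's the wrong answer...'
        scores[(prompt, correct)] += 0.1      # '...here's the correct one'

print(answer("the sky is"))                   # after enough cycles: 'blue'
```

A real model adjusts billions of numerical weights rather than a handful of scores, which is where the enormous computational cost comes from.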
This cycle is repeated for many, many questions – how many depends on the specifics of the model, but a general-purpose model such as GPT-3.5 (which powers the free version of ChatGPT) could easily have tens or hundreds of millions of different questions put to it during its training. Each question will be asked and answered many thousands of times before it satisfies the reinforcement learning program.
The total number of cycles required to train a model like that could exceed a trillion (and almost certainly did for GPT-4, the far-more-clever successor to GPT-3.5). If each reinforcement learning cycle takes a second (and each is itself composed of a trillion-plus computations), training a model would take approximately 31,000 years – and who has time for that?
To cut that time down to something more reasonable, reinforcement learning is performed as a massively parallel operation. Instead of a single reinforcement learning program grinding away for thirty-one millennia, AI model-makers use thirty-thousand-or-so ‘instances’ (high-powered AI systems, each with a pricey nVidia H100 at its core), all working on the model simultaneously.
That parallelism meant that it took not a third of a million months but only around eighteen for OpenAI to fully train GPT-4.
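The back-of-envelope numbers behind that claim – using the rough figures above, not OpenAI’s actual ones – work out like this:

```python
# Back-of-envelope arithmetic only, using the rough figures above
# (a trillion one-second training cycles), not OpenAI's actual numbers.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

cycles = 1_000_000_000_000           # ~a trillion reinforcement learning cycles
seconds_per_cycle = 1                # assume one second per cycle

serial_years = cycles * seconds_per_cycle / SECONDS_PER_YEAR
print(f"one machine: ~{serial_years:,.0f} years")                        # ~31,710 years

instances = 30_000                   # 'thirty-thousand-or-so' parallel instances
parallel_months = serial_years * 12 / instances
print(f"{instances:,} machines: ~{parallel_months:.0f} months (ideal)")  # ~13 months
```

Parallelism is never perfectly efficient – the machines spend time waiting on one another and on data – which is how an ideal year or so stretches out to the roughly eighteen months cited above.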
Massively parallel training doesn’t come cheap: it’s entirely possible that OpenAI consumed USD $100,000,000 in computing and energy resources to train GPT-4. That puts training an AI chatbot as powerful as ChatGPT well out of reach of all but the largest and best-funded companies.
To be able to afford to train GPT-4, tiny OpenAI struck a big deal with Microsoft – a billion-dollar ‘investment’ that largely consisted of access to Microsoft’s ‘Azure’ cloud computing infrastructure: precisely what OpenAI needed to train GPT-4.
These sorts of financial requirements create a ‘moat’ around these ‘foundation’ AI models; they’re expensive and difficult to replicate because training them consumes incredible computational and energy resources. That moat already works to the advantage of Microsoft’s Copilot – built on top of the very same GPT-4 that OpenAI trained on Microsoft’s cloud – and could mean that the tech giants end up controlling AI as thoroughly as they already control operating systems and cloud computing.
If so, pretty much every computer and every smartphone will be continuously connected to a cloud operated by Microsoft (or Google or Apple or Meta), grinding through responses to user prompts.
That world would need perhaps ten times as many cloud computing facilities as we already have. Since cloud computing already consumes more than 2% of all electricity generated worldwide, we’d be looking at an AI future – within the next decade – where at least a fifth of all the electricity we generate goes to power cloud computing installations.
That could be a problem, as we’ll need that electricity for electric cars, heat pumps, and the other sorts of electric appliances that dramatically reduce our carbon emissions.
In a worst-case scenario, we could end up building more coal-fuelled electricity generators to satisfy that additional AI-driven demand. To avoid that, OpenAI CEO Sam Altman has been touting a new generation of ‘modular’ nuclear reactors to provide the power for an AI-everywhere future – but those plants exist only as blueprints today and are, at best, decades away from practical deployment.
So we’re a bit stuck: hurtling into an AI-in-everything world, but without any way to power it.
At some point (not terribly far away) we’re going to hit a wall not unlike the one encountered by chip designers a decade ago: How can we avoid melting the world?
That’s the topic for part two.
Editor’s note: This is the first of a two-part series on AI and the next generation of technologies. The concluding article will be live next week.