Balancing accuracy and emissions: the climate cost of your AI 

Every time you ask an artificial intelligence a question, there’s a surprising cost: carbon emissions.

Before an AI like ChatGPT can respond, it first breaks down your input into “tokens” — small chunks of text such as words, parts of words, or punctuation. These tokens are converted into numbers, which the model processes using billions of internal settings called parameters that help it recognise patterns, make connections and predict what comes next. Predictions are made one token at a time and then assembled into a final answer.
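To make the idea concrete, here is a toy sketch of tokenisation in Python. It is not any real model's tokenizer — production LLMs use subword schemes such as byte-pair encoding — but it shows the basic step: text becomes a sequence of small integer IDs that the model can process.

```python
def toy_tokenize(text, vocab):
    """Split text into known chunks and map each chunk to an integer ID.

    This is a deliberately simplified illustration: real tokenizers
    break text into subword pieces, not whole words.
    """
    tokens = []
    for word in text.lower().split():
        if word in vocab:
            tokens.append(word)
        else:
            # unknown words fall back to single characters
            tokens.extend(word)
    # assign a new ID to any chunk not yet in the vocabulary
    return [vocab.setdefault(t, len(vocab)) for t in tokens]

vocab = {"what": 0, "is": 1, "the": 2, "climate": 3, "cost": 4}
ids = toy_tokenize("What is the climate cost", vocab)
print(ids)  # [0, 1, 2, 3, 4]
```

Every one of those IDs the model reads — and every one it generates in reply — costs a small amount of computation, and therefore energy.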

That entire process consumes energy. And now, researchers in Germany have calculated how much CO₂ is released by different large language models (LLMs) when they answer a question.

LLMs are the software behind tools like ChatGPT, Google Gemini and other AI assistants. They’ve been trained on massive volumes of text to learn how to read, write and respond intelligently. 

The researchers tested 14 LLMs by asking them 1,000 benchmark questions across diverse subjects. They then calculated the associated CO₂ emissions, revealing a big divide between “concise” models and those that generate lengthy, reasoned responses.

“The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions,” says first author Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences. “We found that reasoning-enabled models produced up to 50 times more CO₂ emissions than concise response models.”

Reasoning models, on average, created 543.5 ‘thinking’ tokens per question, whereas concise models required just 37.7. More tokens mean higher CO₂ emissions, but they don’t always translate into more accurate answers.
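A quick back-of-envelope calculation using the averages reported above shows the scale of that gap:

```python
# Average 'thinking' tokens per question, as reported in the study
reasoning_tokens = 543.5
concise_tokens = 37.7

ratio = reasoning_tokens / concise_tokens
print(f"Reasoning models generate ~{ratio:.1f}x more tokens per question")
# ~14.4x more tokens per question
```

Since each generated token carries an energy cost, that roughly fourteen-fold difference in token count is what drives the large emissions gap between the two classes of model.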

The best-performing model, Cogito (with 70 billion parameters), scored 84.9% accuracy, but emitted three times more CO₂ than similar-sized models that gave shorter answers.

“Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies,” says Dauner. “None of the models that kept emissions below 500 grams of CO₂ equivalent achieved higher than 80% accuracy on answering the 1,000 questions correctly.” 

The subject mattered, too. Philosophical or abstract mathematical questions caused up to six times more emissions than simpler topics like high school history, due to longer reasoning chains.

The researchers hope these findings will encourage more thoughtful use of AI.

“Users can significantly reduce emissions by prompting AI to generate concise answers or limiting the use of high-capacity models to tasks that genuinely require that power,” says Dauner.

Even the choice of model makes a difference. For example, DeepSeek R1 (70 billion parameters) answering 600,000 questions generates emissions equivalent to a round-trip flight from London to New York. By contrast, another model — Qwen 2.5, with 72 billion parameters — can answer more than three times as many questions with similar accuracy, while generating the same emissions.
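The article's figures imply the per-question difference directly. The sketch below just restates that arithmetic — the "more than three times" multiplier is taken from the comparison above, not an exact number from the study:

```python
# From the article: DeepSeek R1 (70B) answers ~600,000 questions for
# roughly one round-trip London-New York flight's worth of CO2.
deepseek_questions = 600_000

# Qwen 2.5 (72B) answers "more than three times as many" questions
# for the same emissions (3x used here as a conservative lower bound).
qwen_questions = 3 * deepseek_questions
print(qwen_questions)  # 1800000 -- over 1.8 million questions

# Per answer, that means Qwen 2.5 emits roughly a third as much CO2
# as DeepSeek R1, at similar accuracy.
```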

The team notes that the emissions figures may vary depending on the hardware used and the energy source powering it (for instance, coal-heavy grids versus renewables), but the key message remains: asking a chatbot isn’t free from climate consequences.

“If users know the exact CO₂ cost of their AI-generated outputs, such as casually turning themselves into an action figure, they might be more selective and thoughtful about when and how they use these technologies,” says Dauner.

These findings are published in Frontiers in Communication.
