If you are in tech, you have almost surely followed the rapid development of LLMs over the past couple of years. But have you ever wondered about the energy used and the carbon footprint of training and continuously running these large language models? You probably have, but haven't given it much thought.
Well, I am going to dive a little into that here.
Let’s break down the two key ways LLMs generate emissions: first, the building costs (manufacturing the relevant hardware and running the training on it), and second, the ongoing usage cost, which may look negligible per query but becomes enormous once we account for billions of monthly visits.
Let’s look at some of the numbers available in current research. I will mostly look at CO2e (carbon dioxide equivalent) emissions, since that is the standard measure of the greenhouse gases threatening our environment.
Training
The AI startup Hugging Face was one of the first organizations to try to accurately measure the true carbon footprint of an LLM, starting with their own model BLOOM. They added up lots of different numbers: the energy used to train the model on a supercomputer, the energy required to manufacture the hardware and maintain the infrastructure, and all the energy used to run BLOOM once it was deployed.
The final estimate was that BLOOM’s training released 22 metric tons of CO2e. That covers only the active training procedure; including hardware manufacturing and infrastructure maintenance roughly doubles the number.
Now, for models like GPT-3 that were trained on much more carbon-intensive energy sources, training could use 1,287 MWh (nearly 1.3 million kWh) of electricity and release about 552 metric tons of CO2e. The numbers for Meta’s Llama 2 are similar: about 1,273,000 (1.3 million) kWh of electricity and 529 metric tons of CO2e. And again, this is just the training part.
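The relationship between these two figures is just energy multiplied by the carbon intensity of the grid that powered the training. Here is a back-of-envelope sketch; the intensity value is implied by the reported GPT-3 figures above, not an official number from OpenAI:

```python
def training_emissions_tonnes(energy_kwh: float, intensity_kg_per_kwh: float) -> float:
    """Estimate emissions in metric tons of CO2e from energy use and grid intensity."""
    return energy_kwh * intensity_kg_per_kwh / 1000  # kg CO2e -> metric tons

# GPT-3: ~1,287,000 kWh at an implied grid intensity of ~0.429 kg CO2e/kWh
gpt3 = training_emissions_tonnes(1_287_000, 0.429)
print(f"GPT-3 estimate: {gpt3:.0f} metric tons CO2e")  # ~552 t
```

The same formula with a cleaner grid (a lower intensity value) is exactly why the choice of data center location matters so much, as discussed later.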
Ongoing Usage
A recent study estimated the aggregate cost of ChatGPT’s ongoing usage at between 1 and 23 million kWh per month. Another study arrived at roughly 4 million kWh using a different methodology, suggesting these estimates are in the right ballpark.
Clearly, the electricity consumed by ongoing usage is several times higher than training, and so are the emissions. This matches claims from AWS and Nvidia that inference accounts for up to 90% of the cost of large-scale AI workloads.
While there are plenty of such estimates from some of the top researchers in the space, a great deal of uncertainty surrounds their accuracy, owing to the lack of a standard methodology and the lack of transparency in LLM development.
How much is this, really?
Now, you might think these are all big numbers, and that surely it must be a concern given we are talking about it. But let’s compare this to some more everyday figures to give you a better sense of how big this problem is, and whether it is in fact a problem.
The average American household consumes about 2,977 kWh of electricity per month. If we assume GPT-3 required a million kWh to train, that amount of energy could power an American household for roughly 336 months, or about 28 years. Now imagine how much more energy went into training GPT-4 (the more commonly used model at present), and the emissions that follow from there. And again, that’s just the training part.
We don’t know what data OpenAI used to train their models. The same goes for Google’s models powering their Gemini chatbot. Even for open-source models, the true impact is unknown, because of the costs of serving the model to an unknown and varying number of users, as well as the emissions from manufacturing the specialized hardware.
Plus, the fact that these manufacturing and training processes use a mix of fossil fuels and renewable energy only makes this calculation more complex.
In addition to electricity, ChatGPT’s water consumption is estimated at 500 milliliters for a session of 20-50 queries. Multiply that across billions of visits and it adds up to billions of liters of water used to cool data centers.
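To see how 500 mL per session reaches billions of liters, here is a rough aggregation. The monthly visit count is an assumption of mine (roughly ChatGPT scale), and treating each visit as one session is a deliberate simplification:

```python
ML_PER_SESSION = 500              # ~500 mL per 20-50 query session (figure from the text)
MONTHLY_VISITS = 1_800_000_000    # assumed visit count, roughly ChatGPT scale

# Treat each visit as one session -- a rough simplification.
liters_per_month = MONTHLY_VISITS * ML_PER_SESSION / 1000  # mL -> L
liters_per_year = liters_per_month * 12
print(f"~{liters_per_month / 1e6:.0f} million liters/month, "
      f"~{liters_per_year / 1e9:.1f} billion liters/year")
```

Even at this conservative one-session-per-visit rate, the annual total lands comfortably in the billions of liters.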
So what about it
So, what does the future look like? Is there a more sustainable way of going about training and using LLMs? Short answer: yes. But it is easier said than done.
Regardless of the exact environmental impact, all tech leaders, especially those training and deploying the largest and most sophisticated models, can reduce the problem by improving the location and timing of training, model size, transparency, and hardware efficiency.
All of these factors can be significant. For instance, location can greatly influence the carbon footprint, depending on the carbon intensity of the local grid. One article highlights how GPT-4 trained in a Microsoft US-West data center could have an almost 14 times higher carbon footprint than if it were trained in a Canada-East data center.
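The mechanism behind that 14x gap is nothing more than the same energy multiplied by very different grid intensities. The intensity values below are illustrative assumptions chosen to reproduce the cited ratio, not published figures for those Azure regions:

```python
# Illustrative grid carbon intensities in g CO2e per kWh (assumed values;
# a hydro-heavy region like Canada-East sits far below a mixed-fuel one).
INTENSITY_G_PER_KWH = {"us-west": 280, "canada-east": 20}

ENERGY_KWH = 1_000_000  # hypothetical training run

def emissions_tonnes(region: str, energy_kwh: float = ENERGY_KWH) -> float:
    """Emissions for the same training run in a given region, in metric tons."""
    return INTENSITY_G_PER_KWH[region] * energy_kwh / 1e6  # g -> metric tons

ratio = emissions_tonnes("us-west") / emissions_tonnes("canada-east")
print(f"Identical training run emits {ratio:.0f}x more CO2e in us-west")  # 14x
```

Since the energy term cancels out, the ratio depends only on the two grid intensities, which is why relocating (or time-shifting) training is such a cheap lever.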
Another approach suggested by some in the space is a new business model: training much smaller LLMs for specific categories of tasks. Companies that specialize in a particular knowledge area can fine-tune as needed and update only the relevant parameters.
One study showed that models less than half the size of ChatGPT reached more than 97% of ChatGPT’s performance on certain tasks with as little as 12 hours of fine-tuning on a single GPU.
Even Sam Altman has said that the ROI from making LLMs bigger will soon diminish, and that other ways of improving them will be needed. The race to bring ever-larger models to market may soon stop paying off.
As of early 2024, most of tech is focused on shiny new, bigger, and “better” models, while other industries are either adopting these LLMs or thinking about reorienting around them.
As model size becomes less important to consumers, it will fall to VCs to rethink whether the push for growth at all costs, which runs directly counter to the aim of sustainability, is really worth it. And for the big tech companies that control most of the training, deployment, and infrastructure for LLM development, there is a straightforward lever: government regulation and consumer pressure.
While no one can say for sure that focusing on smaller fine-tuned models or optimizing training locations will guarantee a better future, it is important to keep paying attention to the growing demand for, and integration of, LLM-based applications in everyday life.
Read more about this
The Environmental Impact of Large Language Models — Stanford CS324
The AI Index Annual Report — Stanford HAI
If you enjoyed this letter, share it with your friends ✌️