Large Language Models have become a pretty common part of our lives now. I recently came across a paper that highlighted how the way we talk about LLMs might get in the way of understanding how they really work.
Models such as BERT and GPT-4 have been game-changing for how people perceive and value artificial intelligence. Many tasks that once demanded dedicated human intelligence can now be reduced to next-token prediction with a sufficiently performant model.
I want to talk about how, as these models have developed in just the last year or two, it has become increasingly common for people to anthropomorphize them, despite these systems working in fundamentally different ways.
We humans have evolved over time to co-exist and ensure a degree of mutual understanding. And it is surely okay to be playful when we apply that understanding to objects that don’t work that way. However, AI is not just another thing we play with; it will have increasing influence in almost every aspect of our lives, with critical consequences. Hence, applying the same assumptions we hold between humans to AI systems is a serious mistake when their core operation is so different.
What do LLMs really do?
Human language is an aspect of human collective behavior, and it only makes sense in the wider context of social interactions. When a child is born into a community that uses a specific language, they absorb the language by engaging and interacting with this community. Over time, the community changes and the language evolves with it: new words appear that make sense only to one generation, while others find them baffling.
As adults, this foundation of language as social interaction is what helps us make sense of the world. A large language model works in an entirely different way.
To put it simply, LLMs are mathematical models that predict the next word (or character, or fragment of a word, including punctuation) based on the words that came before, learned from a vast corpus of human-generated text. When you ask ChatGPT a question, the real question you are asking is something more like this: “Here is a bunch of words. According to your mathematical model, what words should come next?” See? It’s more like a sentence- or passage-completion problem.
You may even just type “Red, blue, yellow …” and the LLM could respond with “green.” This is not because the LLM understands that red, blue, yellow, and green are all colors, but because, based on the data it was trained on, “green” is the most likely word to appear after the combination of “red,” “blue,” and “yellow.”
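For the curious, here is roughly what that looks like in code. This is a minimal sketch using the Hugging Face transformers library with GPT-2 standing in for a larger model; the model choice and prompt are purely illustrative, not what ChatGPT or Bard actually run:

```python
# Minimal sketch: ask a causal language model for its most likely next tokens.
# GPT-2 is used here only as a small, freely available stand-in.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Red, blue, yellow,"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model's only output is a probability distribution over the next token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)

for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode([token_id.item()])!r}: {prob.item():.3f}")
```

No notion of “color” appears anywhere in this process; the model simply ranks every token in its vocabulary by how likely it is to follow the prompt.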
When the LLM is used through chat interfaces like ChatGPT or Bard, you don’t see the actual prompt that produces the specific results you get. You may simply type in “Harry Potter” and expect an explanation of the books or the movie series. In fact, that is what the LLM will return. However, it returned that result not because it understood that you want to learn about the topic (the way a human would), but because it predicted what word should come after “Harry Potter” and then kept predicting the next word, again and again.
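That “kept predicting” part is just a loop: feed the text in, take a likely next token, append it, and repeat. Here is a rough sketch, again with GPT-2 as a stand-in and greedy decoding for simplicity; real chat products add hidden system prompts, sampling strategies, and much more on top of this:

```python
# Rough sketch of autoregressive generation: repeated next-token prediction.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("Harry Potter", return_tensors="pt").input_ids

for _ in range(20):  # generate 20 more tokens
    with torch.no_grad():
        logits = model(input_ids).logits
    # Pick the single most likely next token and append it to the sequence.
    next_id = logits[0, -1].argmax()
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0].tolist()))
```

The continuation about the books or films emerges only because that text frequently follows “Harry Potter” in the training data, not because the model grasped your intent.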
These distinctions in how an LLM fundamentally works are not obvious at the level where most people use it every day. However, when you evaluate the utility of the model, they matter significantly. It is important for people in the technology space, especially developers, designers, and product managers, to avoid using words like “belief,” “understanding,” “self,” or “consciousness” when describing the functionality of LLMs.
Stop nitpicking! Does this really matter?
Okay, I understand that we anthropomorphize a lot of things. Saying things like “The sun has finally decided to come out today” is fairly common. So why does it matter how we talk about LLMs, when we understand what we really mean?
To understand the importance of this, we need to acknowledge that large language models are unlike most other objects or digital applications we have used in the last few decades. LLMs are powerful tools, and importantly, they are convincingly intelligent.
As developers and designers, and in general as AI practitioners, it is key to convey what the technology can do today and what it might do in the future, so that people understand the potential positive and negative consequences.
When technology leaders speak to policymakers and influence critical regulations, it is important to be careful about words like “believes” and “thinks,” which describe nothing an LLM actually does.
Beyond the tech bubble, there are a lot of people being impacted by LLMs every day. It matters how we talk about LLMs because everyone deserves to know what they are truly interacting with: not a being that can reason, only an excellent predictor.
I am certainly not suggesting that there is only one right way of talking about these new tools. However, anthropomorphizing them certainly misrepresents them. This matters especially because so many of us, within and outside the technology space, still don’t understand how they fundamentally work.
If you enjoyed this letter, share it with your friends ✌️