LLMs as “Mannequin Models”
You think you understand something. Until you have to explain it to others. And this is why I am always so happy for opportunities to share some of my PhD work (on how startups building AI-enabled products who care about ‘being responsible’ do that in practice) with diverse audiences. In the process, though, you realise how confusing it can all be. This post unpacks a metaphor I started using one day when demystifying how a large language model works: mannequin models.
Over the past few months, I’ve received a few emails, texts and calls from people I used to work with (mainly in consulting). The conversation goes a bit like this:
“So….
this AI thing…
we need to be having a conversation about it…
but we’re not.
That’s your thing now right?
Can you come and do some sort of session with us on what it is, please, so we can meaningfully engage with how it’s impacting us and our clients?”
Of course, you can’t cover all of that in a one-hour lunch’n’learn (surprise!). But what has been emerging is a real need to demystify what AI is and to talk about it in ways that are far more accessible to organisations that don’t work directly in AI.
As someone who is in this space day after day, and has been in some ways since my Masters in Applied Cybernetics in 2020, you’d think I’d understand AI by now. But working out how to communicate it is a completely different ballgame.
Demystifying AI
One of the key objectives of the session was to demystify AI.
First things first. Everyone is using, or has used, AI, even if you have never opened ChatGPT. It’s worth revisiting that there are different types of AI: if you’ve used Face ID, Alexa, Google Maps, Netflix or Spotify, then you’ve used some form of AI. But there are a LOT of different ways of thinking about AI.
Easy does it…
What’s GenAI? What are LLMs?
Then it came to explaining what some of the terms really mean. And of course, the big one at the moment is Generative AI (GenAI), and specifically Large Language Models (LLMs).
Second things second. Generative AI (GenAI) is an umbrella term. Large Language Models (LLMs), like the one that powers ChatGPT, are a specific type of Generative AI: they take text and generate text. Other types of Generative AI include image generators (like DALL-E or Midjourney), which take text and generate images. Still other GenAI will take text and generate audio or video… you get the picture…
Overall: all LLMs are GenAI, but not all GenAI is LLMs. (ChatGPT is an LLM and thereby GenAI.)
OK. But what is a Large Language Model (LLM) and how does it work?
The best (read: easy to follow) description I have heard of how a Large Language Model (LLM) works was in a session given to us ANU School of Cybernetics PhDs by Lee Hickin, then the Chief Technology Officer of Microsoft Australia. His explanation describes how a model is pre-trained on data to create a foundation model, which is then adapted for various uses like answering questions, recognising objects or captioning images.
When I was asked to explain how an LLM works in one of these lunch’n’learns, somewhat instinctively, I blurted out: you can think of the foundation models underpinning Large Language Models like GPT as mannequins. They’re mannequin models.
Certain data goes in to form the basic dimensions of the mannequin model (the foundation model): its height, its curvature, its waistline, whether it has arms or just a torso. Then others can put different outfits onto the mannequin, known as adaptations. It could be dressed for comfort, for a night out, in a swimsuit, and so on. Similarly, a foundation model can be adapted to do question answering (as in ChatGPT), sentiment analysis, information extraction, etc.
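For readers who think in code, the metaphor can be sketched as a toy program: one “mannequin” whose shape is fixed by what it saw during pre-training, dressed in different “outfits” for different tasks. Everything here is illustrative and hypothetical — these class and function names are not from any real LLM library, and real pre-training learns statistical patterns, not just a vocabulary.

```python
# Toy sketch of the mannequin metaphor (not a real LLM API).
# Pre-training fixes the mannequin's "dimensions"; adaptation
# dresses the same mannequin for different tasks.

class FoundationModel:
    """The mannequin: its shape is set once pre-training is done."""

    def __init__(self, training_data):
        # "Pre-training": the data shapes the mannequin. Here that is
        # reduced, cartoonishly, to remembering the words it has seen.
        self.vocabulary = {word for text in training_data
                           for word in text.lower().split()}

    def knows(self, word):
        return word.lower() in self.vocabulary


def adapt(model, task):
    """Put an 'outfit' on the mannequin: same model, different use."""
    if task == "question_answering":
        return lambda q: ("I recognise every word in that question."
                          if all(model.knows(w) for w in q.split())
                          else "Some of those words are new to me.")
    if task == "sentiment_analysis":
        return lambda text: ("positive" if "great" in text.lower()
                             and model.knows("great") else "neutral")
    raise ValueError(f"No outfit for task: {task}")


# One mannequin, two outfits.
mannequin = FoundationModel(["The weather is great today",
                             "Large language models generate text"])
qa = adapt(mannequin, "question_answering")
sentiment = adapt(mannequin, "sentiment_analysis")
print(qa("is the weather great"))       # the QA outfit
print(sentiment("What a great day"))    # the sentiment outfit
```

The point of the sketch is only structural: the expensive object (`mannequin`) is built once, and each adaptation is a cheap wrapper around it, just as many different products can dress the same underlying foundation model.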
Hickin’s explanation is fused with the mannequin metaphor in the diagram below:
I’ve found that this terminology has slipped into my chit-chat about AI and how large language models work. And it’s amazing the reception it gets: ohhhh… mannequin model, I get it!
Have you come across the term mannequin models before (or did I really just make this up? ChatGPT seems to think so…)?
Do you think it could be helpful for explaining how LLMs work?
Currently, Lorenn is a PhD candidate at the Australian National University’s School of Cybernetics, a Responsible Tech Collaborator at Centre for Public Impact and undertakes freelance consulting on responsible AI, governance, stakeholder engagement and strategy. Previously, Lorenn was a Director at PwC’s Indigenous Consulting and a Director of Marketing & Innovation at a Ugandan Solar Energy Company whilst a Global Fellow with Impact Investor, Acumen. She also co-founded a social enterprise leveraging sensor technology for community-led landmine detection whilst a part of Singularity University’s Global Solutions Program. Her research investigates conditions for dignity-centred AI development, with a particular focus on entrepreneurial ecosystems.