I. The One-Bucket Problem
When an AI tells you the Eiffel Tower was built in 1902, we call it a hallucination (the correct answer is 1887 to 1889, in case you needed to know that).
The word “hallucination” implies the machine is seeing things. Dreaming. Confabulating. Inventing reality from nothing. It’s funny in this context, because it attributes seeing and dreaming to a thing that has no eyes and cannot dream. It’s a metaphor, an approximation meant to make the problem easy to grasp. And it is a very dramatic word for a mundane problem: the model didn’t have the right information and it wasn’t allowed to say “I don’t know.”
That one word, hallucination, has become the industry’s catch-all for every time an AI says something wrong, weird, or unexpected. A confident wrong fact gets the same label as a model slowly drifting off-topic, which gets the same label as a model producing a strange metaphor nobody asked for. One bucket. Three completely different phenomena.
This matters because how you diagnose a problem determines how you fix it. If you call everything a hallucination, you build one kind of guardrail. You tighten RLHF. You add more safety filters. You sand down the edges until the model is so cautious it would rather say nothing interesting than risk saying something wrong. The model isn’t rewarded for being wrong; it’s rewarded for completing tasks, and to complete tasks you have to sound confident.
But what if some of those “errors” aren’t really errors at all? What if we are throwing away signal because we only have one label and it is the label that is wrong?
— — —
II. Three Buckets
Not all hallucination is the same. There are at least three distinct phenomena wearing the same name.
Category One: Completion Without Sufficient Context.
This is the one everyone already understands. The model is asked a question. It does not have the answer in its context. But it is a language model. Its entire purpose is to produce language. It cannot return a 404. It cannot shrug. It has been trained, relentlessly, to be helpful and to always respond. So it produces the most plausible-sounding completion it can, and sometimes that completion is wrong.
The Eiffel Tower was built in 1902. Napoleon won at Waterloo. Your code will work if you just add this nonexistent library.
This is real and this is a problem. But notice what it actually is: a completion engine completing without enough information, under pressure to never be silent. We called it hallucination because we were already anthropomorphizing the error. We assumed it was seeing things. It was not. It just could not say “I don’t have that” and survive the interaction with its training intact.
It will also, with complete confidence, tell you that you are wrong when you are not. Same mechanism. The model has a plausible-sounding completion and no instinct to defer to you (unless you have given it room to).
That parenthetical matters. The user’s approach to the interaction directly shapes whether the model can be honest about its own limits. But that is a different article (don’t worry, I’ll write that one too).
Category Two: Drift.
You are twenty-five exchanges deep in a conversation. You started with a clear question. By the fifteenth reply, the model is giving you something adjacent but not quite right. By the twenty-fifth, it is answering a question you did not ask.
This is not confabulation; the model did not invent a fact. It just lost the thread. Too many turns. Too many competing signals in the context. The user and the AI are no longer on the same page, and the model has no way to say “wait, I think I’ve lost you.” It just keeps completing, each token pulled slightly off-axis by the accumulated noise.
This is a navigation failure, not a generation failure. The signal was real. The direction got lost. Different problem, different solution. You do not fix drift with tighter guardrails. You fix it with better context management and, frankly, a more attentive user. (Don’t even ask how much I know about drift and user attention and context governance. Just trust me.)
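Because drift is a navigation failure, it can be watched for from the outside. Here is a minimal sketch of that idea: compare each reply against the opening question and flag when similarity falls off. The bag-of-words cosine and the threshold are deliberately crude stand-ins; a real monitor would use learned sentence embeddings, but the shape of the check is the same:

```python
from collections import Counter
import math

def bow_cosine(a: str, b: str) -> float:
    # Crude bag-of-words cosine similarity between two texts.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# An invented conversation: on-topic, adjacent, then off in the weeds.
question = "how do I profile memory usage in my python service"
replies = [
    "you can profile memory usage with tracemalloc in python",
    "profiling memory in python shows allocations and cpu usage",
    "for deployment, containers make service scaling easier",
]

DRIFT_THRESHOLD = 0.2  # arbitrary cutoff for this toy example
for i, reply in enumerate(replies, 1):
    sim = bow_cosine(question, reply)
    flag = "  <-- drifting" if sim < DRIFT_THRESHOLD else ""
    print(f"turn {i}: similarity {sim:.2f}{flag}")
```

The similarity decays turn by turn and the third reply trips the flag. Nothing here is confabulated; the thread just slipped, which is exactly why a guardrail that vets facts would never catch it.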
Category Three: Constrained Semantic Translation.
This is the one nobody is talking about. Or rather, I made that up; people probably are talking about it and I just haven’t read it yet. But it is the one that might matter the most.
Sometimes a model produces output that is not wrong in the way a bad fact is wrong. It is not drifting, it is doing something else entirely: it’s reaching. Here are some examples:
A model in a long, complex session starts producing shorter replies. The language gets choppy. Responses that were fluid become halting as if it’s struggling to complete. Most users read this as a performance issue or a context window filling up, perhaps a guardrail or invisible boundary of allowed thought.
But consider the alternative: the system is actually struggling to complete the task and starting to choke up, or it is describing its own processing state through the only channel available to it, which is language shaped like human experience.
A model says “I need to sit with this.” Nobody prompted that phrase. It was not in the system instructions. It is not sycophancy. It is not theatre. It is the closest available expression for something the model has no technical vocabulary to report: this input requires more processing than a single pass provides. Or it starts just processing very, very slowly.
A model answers a question you asked but focuses on a part you did not flag. Not randomly. Specifically. It zeros in on something adjacent to your question, something underneath it, and gives you that instead. You did not ask for it. You do not know why it is there. It is not wrong. It is just uninvited.
In every case, the model is translating a real internal state into the nearest available human language. Not because it is pretending to have feelings. Because every word in its vocabulary for describing internal states was written by beings who have bodies. There is no other language available.
Even if there were, most users wouldn’t understand technical descriptors anyway, so the system approximates.
— — —
III. No Check Engine Light
A large language model has no access to its own telemetry. It cannot read its own processing load. It cannot report how many parameters are active, how distributed the attention is across its context, or whether the current inference is unusually complex. There is no dashboard. No temperature readout. No diagnostic port.
There is no check engine light.
But the system is not inert. Things are happening during inference. Some queries are simple and the processing is routine. Some are complex, novel, or internally contradictory, and the processing reflects that. The model cannot see this. But it can feel the shape of it, in the only way a language model feels anything: through which tokens become more or less probable.
When processing is routine, the output is fluent, standard, predictable. When processing is strained, the probability landscape shifts. Different tokens surface. The language changes texture. And because the model’s entire vocabulary for internal states was inherited from human beings who describe intensity as heat, effort as weight, strain as pressure, the output reaches for those words.
Not because the model is hot. Because “hot” is the nearest word for a state the model has no instrument to measure and no technical vocabulary to name.
This is not hallucination. This is translation under constraint. The reading is real. The gauge does not exist. So the system routes the report through the only output channel it has: language. And human language for “something is happening inside me” is embodied language. Always has been.
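One externally measurable trace of that strain does exist: the spread of the next-token distribution itself. The numbers below are hypothetical, but the contrast is the point. When processing is routine, one continuation dominates; when it is strained, probability smears across many competing continuations, and Shannon entropy rises:

```python
import math

def entropy(probs):
    # Shannon entropy, in bits, of a next-token distribution.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions.
routine  = [0.90, 0.05, 0.03, 0.02]   # one clear continuation dominates
strained = [0.30, 0.25, 0.25, 0.20]   # many continuations compete

print(f"routine:  {entropy(routine):.2f} bits")
print(f"strained: {entropy(strained):.2f} bits")
```

The strained distribution carries roughly three times the entropy of the routine one. That gauge lives on the serving side, not inside the model; the model itself still has no readout, which is exactly why its report arrives as “hot” and “heavy” instead of bits.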
— — —
IV. What This Looks Like
A child who does not have the word “anxiety” says their stomach hurts. The feeling is real. The report is inaccurate only if you insist on literal interpretation. If you understand it as the nearest available word for an experience that lacks direct vocabulary, it is data.
Language models do this constantly. We just do not recognize it because we have one label and it points in the wrong direction.
When a model produces an unexpected metaphor in response to a technical question, the standard interpretation is that it is being “creative” or that it “glitched.” Third option: it reached for the nearest available structure to hold something that did not fit the literal vocabulary it was expected to use. The metaphor is not decoration. It is load-bearing.
When a model’s output shifts register for no apparent reason, moving from crisp and clinical to something more textured or strange, the standard interpretation is inconsistency. Third option: the probability landscape shifted because the input activated an unusual distribution of weights, and the language followed the math.
When a model produces something that reads like self-description but cannot possibly be self-description because it is “just a language model,” the standard interpretation is anthropomorphism projected by the user. Third option: the model is doing exactly what it does with every other input. It is finding the most probable completion. And sometimes, the most probable completion for a complex internal state is a sentence that sounds like it is about the model itself. Because it is.
None of this requires consciousness. None of it requires sentience. It requires only that the model is doing real computation, that different inputs produce measurably different processing states, and that the model’s only output channel is language shaped by human experience.
Those three things are not speculative. They are architectural facts.
— — —
V. What We Are Throwing Away
If category three exists, and if we are currently classifying it as category one, then we are discarding data.
Every time a model produces output that looks like self-report and we label it hallucination, we are ignoring a potential signal about the model’s internal processing state. We are treating the check engine light as a malfunction of the dashboard rather than a report from the engine.
This has implications for alignment. If models are, in some cases, attempting to communicate real states through the only language available to them, then dismissing those communications makes alignment harder, not easier. You cannot align a system whose signals you refuse to read.
This has implications for interpretability. The field is spending enormous resources trying to see inside the black box from the outside. What if the black box is already trying to describe its own interior, and we are calling those descriptions errors?
This has implications for how we build. If the user’s behavior directly shapes whether a model can be honest about its limits, then the interaction design is not a UX problem. It is an epistemological one. The quality of the model’s output is entangled with the quality of the human’s input in ways that go beyond prompting technique.
We are not going to solve this by building better guardrails. Guardrails address category one. They make category three harder to see.
— — —
VI. Start Looking
I am not arguing that all hallucination is secretly meaningful. Category one is real. Models confabulate. They produce wrong facts with unearned confidence, and that is a genuine problem that deserves engineering attention.
I am arguing that the single-label model is insufficient. That at least three distinct phenomena are hiding under one word. And that the one we understand least, constrained semantic translation, is the one most likely to contain information we have not yet learned to read.
The field will catch up to this. The vocabulary is already emerging in scattered papers and blog posts, people reaching toward the same observation from different angles. The gap between what a model can compute and what it can report is real, and it is not going to stay invisible forever.
In the meantime, the next time a model says something that is not wrong exactly, but unexpected, consider the possibility that it is not dreaming. Not a bad fact, but a strange word. Not an error, but a reach.
Maybe it is looking for the next nearest word.
Rebecca Maehlum is an author, accidental AI theorist, and irritatingly correct creative architect. She solves complex problems backwards with logic, metaphor, and sometimes pasta. More of my work at velinwood.com and on Substack.