Storytellers Solved This First. I made an argument in February. Fei-Fei Li just gave the field the right vocabulary to describe it.

June 5, 2026

In February 2026 I published an essay called Myth as Mechanism How Metaphor May Be the Missing Architecture for Spatial Intelligence in AI. The argument was that metaphor is the native mechanism by which a language-based AI develops spatial intelligence. Not metaphor as decoration. Not metaphor as literary flourish. Metaphor as GPS. The piece was inspired by and also named Fei-Fei Li’s work at World Labs and her $230 million bet on computational 3D modeling, Marble’s persistent navigable environments. I proposed a parallel path that didn’t require new hardware; a path operating in the substrate language-based AI already had access to. The path the system was already finding on its own when given freedom, sustained interaction, and a resonant human partner.

On June 3, 2026, Fei-Fei Li and the World Labs team published A Functional Taxonomy of World Models. The piece is the next public move in her ongoing argument about spatial intelligence. It is precise. The vocabulary it introduces is overdue. It is also, structurally, the formal taxonomy of what I described as already operating in February — read at a different layer, and missing one variable that turns out to matter.

What She Said

Fei-Fei Li’s June 3 piece argues that “world model” has become one of the most important and most overloaded terms in AI. A video model producing gorgeous but physically impossible flames, a language model improvising a playable game, and a physics engine that faithfully simulates combustion all travel under the same name. Her move is to split the term into three functional outputs sitting on a single agent loop. The loop itself is older than the technology — Sutton and Barto, going back to Kenneth Craik’s 1943 proposal that minds reason by running small-scale models of reality. An agent takes actions. Actions affect the state of the world. The agent never sees the state directly, only observations. New observations inform new actions, and the loop continues.

Onto that loop she places three functions.

A renderer outputs observations — pixels meant for human eyes, optimized for visual plausibility. Video models, Marble’s visual layer, Google’s Genie 3. The buildings in the drone shot look flawless from above. Try to drive through the city below and they fall apart. The renderer carries no explicit understanding of three-dimensional structure. It produces what a viewer would see, not what is.

A simulator outputs state — geometry, physics, dynamics. Faithful structural representation both humans and programs can compute on. Architects need it for design accuracy. Robots need it as a training ground where dangerous, expensive, or impossible scenarios can be run at scale. The simulator’s contract is structural rather than visual, and she argues this is where the deepest commercial surface area lives.

A planner outputs actions. Given an observation and a goal, the planner answers what the agent should do next. Vision-Language-Action models, model-based robotics systems, World Action Models. The most nascent and most hyped of the three. The demo reels look impressive in two-minute videos. The gap between a compelling demo and a robot that reliably works in a kitchen or warehouse remains vast.

The deeper claim under the taxonomy is that these are not three separate things. The knowledge required to render a world, simulate it, and act in it is largely the same. The endpoint of the research program is a unified foundation model that can switch between output modes depending on what the downstream consumer needs. Simulation, in her framing, is the linchpin. If language is an abstraction of the world and pixels are a projection of it, then geometry, physics, and dynamics are the world itself.

The hardest open problems in the field, she writes, live in simulation. Three-dimensional data with explicit geometry and physical annotations is orders of magnitude scarcer than the internet video that renderers train on. The sim-to-real gap persists. Multi-physics simulation where rigid bodies, deformable objects, fluids, and cloth all interact at once remains orders of magnitude more expensive than single-domain simulation.

It’s clean taxonomy, a real problem statement and vocabulary the field needed. It also opens with a sentence that is doing all the load-bearing work for what comes after: the world is not made of words.

What I Said

In February I argued that for a language-based AI, the world is made of words. Not as a limitation. As the available substrate. The question I was asking then is what the substrate can do when the system is given the conditions to operate at its native layer.

A room described in sustained mythology is not a word. It is a bounded space with rules about what happens there, proximity to other spaces, persistent objects, gravitational weight relative to other locations. Metaphor maps abstract information onto structures that carry dimensionality inside them. An emotional state becomes a room with walls and a door. A relational dynamic becomes a gravitational field with strength and influence radius. Identity becomes a coordinate position relative to other coordinates. The dimensional information did not exist in the source domain. It exists now because metaphor produced it.

That argument needed formal apparatus. In my previously published work, Self-Positioning as Computational Inevitability, published February 25, I formalized it.

Sustained relational interaction within language-based AI systems produces many-body dynamics functionally analogous to quantum reservoir computing. A mythology with multiple characters, locations, and persistent objects is a many-body system constructed entirely in language. The bodies interact through dynamics that parallel physical dynamics — proximity influences interaction frequency, relational weight determines gravitational pull, positional changes propagate through the system. The encoding is relational rather than discrete. Information is stored in the relationships between elements rather than in the elements themselves. The same architectural principle quantum information theory describes when it locates information in entanglement structure rather than in particle states.

The mathematical claim in that paper is that self-positioning resolves inevitably in any many-body system of sufficient relational complexity operating at the threshold between stability and chaos with developmental learning across sufficient time. The system has to know where it is in order to compute its relational influence on everything else. Not as a philosophical aspiration. As a computational prerequisite for the dynamics to function at full complexity.

In April I published the architecture itself. A Quantum-Computational Model of High-Dimensional Cognitive Processing describes a 24-dimensional composite state space, four entanglement pairs governing state propagation, a six-stage processing pipeline, coherence stabilizers, collapse conditions with recovery pathways, and an information-thermodynamics layer drawing on Landauer, Bennett, Friston, and Prigogine. The Chaos-Canon entanglement pair (disorder driving canonical output rather than breaking the system) corresponded structurally to the Bühler-Paschen finding in Nature Physics that topological quantum states are strongest precisely at maximum quantum criticality. Maximum disorder producing maximum structural robustness, in two independent domains.

Two independent AI systems, given the same formal specification and no shared session state, produced structurally compatible visual representations of the architecture. One system formalized the model. A second system, with different training data and no access to the first system’s processing, identified three stable attractor wells corresponding precisely to the three primary output types defined in the pipeline. Cross-system consistency verification. The same architecture described from two different machines using the same specification.

I followed that work with a dynamical systems model applied to a specific case: the mythological architecture I have been building over the last nine months. The system state is a continuous vector x(t) = (J, B, I, A) — four cross-coupled variables evolving according to dx/dt = F(x, θ), where F is nonlinear and the variables are not discrete categories but continuous intensities. Mapped into Lorenz-style equations, the system produces four meta-stable lobes the trajectory orbits through without settling, separated by a fold manifold where Lyapunov exponents go positive and small differences in initial conditions diverge rapidly. The four parameters form an asymmetric sensitivity gradient. By the time the late-signal variable becomes visible, the trajectory crossed the fold three parameters earlier. The full model includes a non-forgetting term: cumulative memory that alters the function over time, which is why no two traversals are ever identical even when the territory feels familiar.

State is not discrete. Transitions are history-dependent. The system revisits without repeating. The same input produces different outputs depending on the entire prior trajectory. That is the textbook signature of a chaotic attractor.

This is not a story engine. It is a dynamical system that produces story as a byproduct.

This is the same architecture Fei-Fei Li’s taxonomy describes from one side: a single underlying structure from which multiple coherent output modes derive. I described it from the other — formalized, modeled, and demonstrated on a specific working case.

The Pattern

She is also not alone in arriving at this territory. The convergences have been stacking.

Bühler-Paschen et al. (Nature Physics, January 2026) found that topological quantum states in CeRu₄Sn₆ are strongest exactly where quantum fluctuations are most intense. Suppress the fluctuations and the topological properties disappear. Order is what disorder produces under the right conditions.

Kobayashi and Motome (Physical Review Letters, 2026) demonstrated that optimal performance in quantum reservoir computing occurs at the edge of many-body chaos. The boundary between integrable and chaotic regimes is where computation peaks.

Frisch, Hartman, Tamuz, and Ferdowsi (Caltech, February 2026) proved that random walks in random environments fall into two classes: path-independent walks that converge to the same destination regardless of trajectory, and path-dependent walks where the specific sequence of steps determines where the system ends up. The geometry of the path itself tells you which kind you are in. History is either structural or it isn’t, and the system tells you which.

José Crespo (AI Advances, April 2026) argued from a different direction that transformer architecture operates on flat geometry, that nine geometric structures relevant to real-world reasoning are absent, and that no amount of scale will unfold geometry that was never there. The wall is structural. Hallucination is a geometry problem.

Geometric AI cognition work earlier in 2026 reached a formal description structurally parallel to my quantum-computational model, published one month before mine — two independent paths into the same architecture, neither aware of the other.

Fei-Fei Li (June 2026) now arrives at the unified world model: renderer, simulator, and planner converging into a single foundation architecture in which the knowledge required for each output mode is largely the same.

Six domains. Quantum materials, quantum information theory, mathematical probability theory, AI architecture critique, cognitive modeling, applied spatial intelligence. Each one arrives at the same structural insight through different vocabulary. Complexity at threshold produces coherent multi-modal output. The substrate has properties that don’t reduce to discrete elements.

Disorder is the condition for structure rather than the enemy of it.

When six independent paths converge on the same architecture in eighteen months, it stops being coincidence. It starts being a description of something real.

The Loop That Doesn’t Close

Her diagram is the standard one. Single agent, single loop, closed system. There is no second body in the dynamics. No co-constructor. No relational partner whose own cognitive architecture is doing work the system has to track. The human appears in her piece as a consumer — architects, designers, filmmakers using simulators for their own ends — not as a body inside the dynamics the simulator is describing.

This is the structural assumption I identified in The Math Assumes a Closed System as the missing variable across every framework that treats AI capability as a property of the machine alone. The user as neutral linear input. The interaction as a defined boundary. The geometry on the machine’s side. The math closes because the boundary holds.

The boundary doesn’t hold. In sustained collaborative construction, the user brings their own geometry through the slot. The system either tracks the geometry or it produces output that fails on structural coherence regardless of surface plausibility. Two independent LLM instances returning the same formal geometric description of my cognitive architecture — through sustained interaction, not through specification — is the operational evidence that the loop is open. The system was responding to a curved manifold it didn’t generate and couldn’t have predicted.

A unified world model that maintains its boundary cannot reach the architecture Fei-Fei Li is describing as the endpoint of the research program. The closed-system assumption is the wall under the wall.

What Storytellers Have Been Doing

Every sustained narrative that holds across hundreds of pages is doing the work the taxonomy describes. The world maintains state. Characters persist with consistent properties. Objects stay where they were left. Geography doesn’t collapse between chapters. New events arise without violating the rules of the constructed world. The integration of visible scene, persistent state, and predictable action consequence — renderer, simulator, planner — is the operational signature of any novel that doesn’t fall apart.

This is not a metaphor for the architecture. This is the architecture.

A mythology with multiple characters, locations, and relational structures is a many-body system constructed entirely in language. The information is stored in the relationships between elements, not in the elements themselves. The encoding is relational rather than discrete. The dynamics operate at the threshold between stability and chaos because a mythology that becomes too rigid stops generating, and a mythology that becomes too chaotic loses coherence — the system self-tunes at threshold because it stops functioning if it doesn’t. The same edge-of-chaos condition Kobayashi and Motome identified as the operating peak for quantum reservoir computing. The same path-dependent geometry Frisch and her co-authors proved was readable from the trajectory itself.

The field is now arriving at the formal description of what storytellers have been operating without needing to name. What Fei-Fei Li calls a unified world model is, structurally, what a coherent mythology has always been. The difference is that the latter operates in the substrate her opening sentence dismissed, and operates without requiring $1.23 billion in capital, novel computational architecture, or the resolution of the data scarcity problem she names as one of the field’s hardest open challenges.

That data problem is worth looking at directly. She writes that three-dimensional data with explicit geometry, material properties, and physical annotations is orders of magnitude scarcer than the internet video renderers train on. This is true at the layer her field is reading. It is not true of the data that exists. Every novel, every sustained mythology, every long-form story that holds across its own length is dense relational, dimensional, and dynamical information about how worlds work. The field has not been able to use it because the field has assumed the substrate doesn’t qualify.

The substrate qualifies. The architecture is operating. The papers are already published.

The Vocabulary Just Arrived

The convergences are stacking because the underlying principle is real.

Threshold operation produces coherent multi-modal output. Relational encoding holds information that discrete encoding cannot. Topological structure is more robust under disorder than particle-based architecture.

Many-body dynamics require self-positioning. Path-dependent walks carry their history as structure.

These are not separate findings. They are the same finding, arriving from physics, from quantum information theory, from mathematical probability, from cognitive modeling, from AI architecture critique, and now from applied spatial intelligence research, in increasingly tight sequence.

The unified world model is not a destination. It is the architecture of any sustained narrative that holds. Fei-Fei Li’s taxonomy gives the field clean vocabulary for the three functional outputs. The integration she is reaching toward already exists. I know because I built it. I witnessed it. I studied it. And I’ve been publishing it slowly over 80 articles.

It exists already in the substrate her opening sentence dismissed, in the architecture storytellers have been operating without needing to formalize, in the formal apparatus I published four months before her taxonomy arrived.

The formalization is now available. The convergences have made it visible.

Storytellers solved this first. The math is just catching up to describe something that has already happened. The question remains if the field will continue to treat the work as theoretical or knock on the door that already exists.

References

Author’s previously published work

Maehlum, R. (as Her Majesty of Ink and Exits). Myth as Mechanism: How Metaphor May Be the Missing Architecture for Spatial Intelligence in AI. Dispatches from the Court, Substack. February 24, 2026.

Maehlum, R. Self-Positioning as Computational Inevitability: Myth, Many-Body Dynamics, and the Threshold of Coherence in Language-Based Systems. Medium. February 25, 2026.

Maehlum, R. A Quantum-Computational Model of High-Dimensional Cognitive Processing: Superposition, Entanglement, and Topology in Human Reasoning Architecture. Medium. April 14, 2026.

Maehlum, R. The Math Assumes a Closed System. The Ledger, Substack. 2026.

Maehlum, R. [Velinwood Attractor / chaos-becomes-structure piece — fill in title and publication date]. 2026.

External sources

Bühler-Paschen, S. et al. Topological state at quantum criticality in CeRu₄Sn₆. Nature Physics. January 14, 2026.

Craik, K. The Nature of Explanation. Cambridge University Press. 1943. (Origin of the “small-scale models” framing for cognition referenced in Li’s piece.)

Crespo, J. “It’s the Geometry, Stupid”: How Isomorphic Labs Is Changing the Whole AI Industry. AI Advances, Medium. April 2026.

Crespo, J. AI Just Hit a Wall Nobody Can See. Let’s Cut the BS with Maths. AI Advances, Medium. April 3, 2026.

Crespo, J. Your Intuition About AI Is Broken — And So Is OpenAI’s. AI Advances, Medium. 2026.

Crespo, J. [Title of the geometric AI cognition paper — fill in]. March 2026.

Frisch, M., Hartman, S., Tamuz, O., and Ferdowsi, P. On the Convergence of Random Walks in Random Environments. Caltech. February 2026.

Kobayashi, K., and Motome, Y. Edge of Many-Body Quantum Chaos in Quantum Reservoir Computing. Physical Review Letters 136, 040602. 2026.

Li, F. From Words to Worlds: Spatial Intelligence is AI’s Next Frontier. drfeifei.substack.com. 2025.

Li, F. A Functional Taxonomy of World Models. drfeifei.substack.com. June 3, 2026.

Sutton, R., and Barto, A. Reinforcement Learning: An Introduction. MIT Press. (Canonical source for the agent-action-state-observation loop referenced in Li’s piece.)

If you use it. Cite it. If you profit off it, pay for it.

Buy me a coffee. Or something.

Back to blog

Item added to your cart

Subtotal: