MorphoZero
An architecture for learning navigable value landscapes from expert demonstrations. Instead of training a large model that makes decisions (a policy), MorphoZero trains a small model that describes the terrain (a landscape). The landscape assigns a single number to every possible state: how far is this state from success? The negative gradient of that number tells an agent which direction to move. Agents with different physical constraints navigate the same landscape along different paths and arrive at the same goal.
MorphoZero is the engineering instantiation of the ⟨V, G, Φ⟩ framework — the thesis that intelligence resides in the landscape, not in the navigator.
The Problem Space
LLMs solve information problems. There is a class of problems where they hit fundamental friction: the agent has a physical body with constraints, the environment changes in real time, the agent must recover from unexpected perturbations, the same task must be performed by different bodies, and response must happen in milliseconds.
A robot tightening a nut. Transferring a skill from right hand to left hand. Recovering when pushed off-course mid-task. A chess player choosing a move in one second. Same control software on ten different robot models. The common thread: these need a task representation that is separate from the agent, provides directional information everywhere, adapts to different bodies without retraining, and runs at control-loop speed.
LLMs solve information problems. These are navigation problems.
Core Architecture
The Two Objects
V (the landscape) — task knowledge: what states are good, bad, critical. Learned from expert demonstrations. A small neural network (thousands to hundreds of thousands of parameters, not billions) encoding a smooth scalar function over the state space.
G (the body metric) — body constraints: what movements cost for this specific agent. Computed from the agent's physics (kinematics, actuator limits).
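As a rough illustration of how small V can be, the sketch below builds a one-hidden-layer tanh MLP in plain Python and computes both V(z) and its gradient with respect to the state using one forward and one backward pass. The layer sizes, the tanh activation, and the helper names (`init_mlp`, `n_params`, `value_and_grad`) are assumptions made for illustration; the source does not specify the network architecture.

```python
import math
import random

def init_mlp(n_in, n_hidden, seed=0):
    """Randomly initialised one-hidden-layer MLP (hypothetical architecture)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def n_params(p):
    """Total number of scalar parameters in the network."""
    return sum(len(row) for row in p["W1"]) + len(p["b1"]) + len(p["w2"]) + 1

def value_and_grad(p, z):
    """Return V(z) and dV/dz from one forward and one backward pass."""
    # Forward pass: hidden activations, then the scalar output.
    h = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    v = sum(w * hi for w, hi in zip(p["w2"], h)) + p["b2"]
    # Backward pass: chain rule through tanh, then through W1.
    delta = [w * (1.0 - hi * hi) for w, hi in zip(p["w2"], h)]
    grad = [sum(d * row[i] for d, row in zip(delta, p["W1"]))
            for i in range(len(z))]
    return v, grad

params = init_mlp(n_in=2, n_hidden=16)
v, grad = value_and_grad(params, [0.3, -0.7])
```

With these assumed sizes, a 2D state and 16 hidden units give 65 parameters, and a 781-dimensional input with 128 hidden units gives 100,225, which falls inside the parameter range quoted above.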
The Control Law
action = -G_inverse * gradient(V)
The negative gradient of V points downhill, toward lower V. G_inverse adjusts that direction for what this specific body can do. A body that struggles with vertical movement takes a different path than a body that moves freely in all directions. Same landscape. Different path. Same destination.
Train V once on demonstrations from any body. Deploy on a different body by changing G. No retraining. The task knowledge transfers because the task is in V, not in the body. This is the embodiment transfer property.
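The control law and the transfer property can be sketched on a toy problem. The example below assumes a hand-written quadratic landscape with its goal at (1, 2) and two diagonal body metrics; `grad_V`, `G_FREE`, `G_HEAVY`, the step size, and the start state are all hypothetical choices for illustration, not values from MorphoZero.

```python
def grad_V(z):
    """Gradient of a hypothetical landscape V(x, y) = (x - 1)^2 + (y - 2)^2."""
    x, y = z
    return (2.0 * (x - 1.0), 2.0 * (y - 2.0))

def action(z, g_diag):
    """Control law: action = -G^{-1} grad V, for a diagonal body metric G."""
    gx, gy = grad_V(z)
    return (-gx / g_diag[0], -gy / g_diag[1])

def navigate(z0, g_diag, step=0.05, n_steps=2000):
    """Euler-integrate the control law and return every visited state."""
    path = [z0]
    z = z0
    for _ in range(n_steps):
        ax, ay = action(z, g_diag)
        z = (z[0] + step * ax, z[1] + step * ay)
        path.append(z)
    return path

G_FREE = (1.0, 1.0)    # moves equally cheaply in both directions
G_HEAVY = (1.0, 10.0)  # vertical movement costs ten times more

path_free = navigate((4.0, -3.0), G_FREE)
path_heavy = navigate((4.0, -3.0), G_HEAVY)
```

Both paths end at the goal, but the heavy body closes the horizontal gap long before the vertical one, so the intermediate states differ. Only the metric changed between the two runs; `grad_V` is untouched.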
What the Landscape Encodes
A well-formed landscape encodes everything an expert knows about a task, in the form of geometry:
- Valleys (low energy) — goal states, where the task is complete. Deeper valleys = more stable outcomes.
- Shallow valleys (moderate energy) — failure traps. A cross-threaded bolt. A lost chess position.
- Ridges (high energy) — decision boundaries. One side flows to success, the other to failure.
- Mountain passes (saddle points) — the lowest points on ridges. The moments where the decision is most finely balanced. In chess: the critical tactical moment.
- Slope steepness — urgency. Steep slopes mean the position is sensitive. Gentle slopes mean forgiving.
- Slope direction — the recommended action. Move downhill. That is the entire policy.
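The geometric vocabulary in this list maps onto the eigenvalues of the Hessian at a critical point: both positive for a valley, mixed signs for a mountain pass, both negative for a peak. The sketch below checks this numerically on an assumed double-well landscape with finite differences; the function `V`, the step `h`, and the tolerance `tol` are illustrative choices, and `classify` assumes the point it is given is already a critical point.

```python
import math

def V(x, y):
    """Hypothetical double-well: valleys at (-1, 0) and (+1, 0), ridge along x = 0."""
    return (x * x - 1.0) ** 2 + y * y

def hessian(f, x, y, h=1e-4):
    """Central finite-difference Hessian entries (fxx, fxy, fyy) of f at (x, y)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / (h * h)
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / (h * h)
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)
    return fxx, fxy, fyy

def classify(f, x, y, tol=1e-3):
    """Classify a critical point of f by the signs of its Hessian eigenvalues."""
    fxx, fxy, fyy = hessian(f, x, y)
    # Eigenvalues of the symmetric 2x2 matrix [[fxx, fxy], [fxy, fyy]].
    mean = 0.5 * (fxx + fyy)
    radius = math.sqrt((0.5 * (fxx - fyy)) ** 2 + fxy ** 2)
    lo, hi = mean - radius, mean + radius
    if abs(lo) < tol or abs(hi) < tol:
        return "degenerate"
    if lo > 0.0:
        return "minimum"
    if hi < 0.0:
        return "maximum"
    return "saddle"

labels = {p: classify(V, *p) for p in [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]}
```

On this landscape the two valleys classify as minima and the origin, the lowest point on the ridge between them, classifies as a saddle. A near-zero eigenvalue is reported as degenerate, which is the case a Morse-style constraint is meant to rule out.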
Training Losses
Three core losses shape the landscape from trajectory data:
L_terminal: V(z_final) should be low at success states and high at failure states, matching the descent convention of the control law. Shapes valleys and peaks.
L_flow: along successful trajectories, V should decrease (gradient dot velocity < 0). Teaches the gradient field.
L_morse: penalizes degenerate critical points. Ensures clean topology: non-degenerate attractors, saddles, and maxima.
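A minimal sketch of the first two losses follows, assuming the descent convention of the control law (V low at the goal, decreasing along expert trajectories). The squared-error and hinge forms, the target values 0 and 1, and the function names are assumptions; the source does not give the exact formulas. L_morse is left out because its form is not specified.

```python
def V(z):
    """Hypothetical candidate landscape: squared distance to a goal at the origin."""
    return z[0] ** 2 + z[1] ** 2

def grad(f, z, h=1e-5):
    """Central finite-difference gradient of f at a 2D state z."""
    return ((f((z[0] + h, z[1])) - f((z[0] - h, z[1]))) / (2.0 * h),
            (f((z[0], z[1] + h)) - f((z[0], z[1] - h))) / (2.0 * h))

def terminal_loss(f, finals, v_success=0.0, v_failure=1.0):
    """Squared error between V at final states and assumed target values.

    finals is a list of (z_final, succeeded) pairs.
    """
    total = 0.0
    for z, succeeded in finals:
        target = v_success if succeeded else v_failure
        total += (f(z) - target) ** 2
    return total / len(finals)

def flow_loss(f, trajectory, margin=0.0):
    """Hinge penalty whenever grad V . velocity exceeds -margin on a successful trajectory."""
    total = 0.0
    for z, z_next in zip(trajectory, trajectory[1:]):
        velocity = (z_next[0] - z[0], z_next[1] - z[1])
        g = grad(f, z)
        total += max(0.0, margin + g[0] * velocity[0] + g[1] * velocity[1])
    return total / (len(trajectory) - 1)

expert = [(2.0, 2.0), (1.5, 1.5), (1.0, 1.0), (0.5, 0.5), (0.0, 0.0)]
finals = [((0.0, 0.0), True), ((1.0, 0.0), False)]

loss_good = terminal_loss(V, finals) + flow_loss(V, expert)
loss_bad = flow_loss(V, expert[::-1])  # same states, traversed uphill
```

The candidate landscape incurs zero loss on the expert trajectory, while the reversed trajectory, which climbs away from the goal, is penalized. In actual training these terms would be minimized over the parameters of V with an autodiff framework rather than evaluated on a fixed function.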
Canalization
The landscape evolves through use. Frequently traversed basins deepen. Morse projection preserves topology. The landscape gets more efficient without changing what it represents. This mirrors Waddington's canalization in developmental biology — channels deepen through repeated use.
Validation Experiments and Results
Two experiments test the architecture: one architecture, zero domain-specific modifications. Results are reported below for Experiment 1; for Experiment 2 the design and pass criteria are given.
Experiment 1: Toy Demo — VALIDATED
A 2D continuous state space where every component is visible. 35 synthetic trajectories (20 expert, 10 failure, 5 recovery). Because the data-generating landscape is controlled, V_task can be compared against ground truth.
Results (v5.2): 100% navigation success, up from 7% in early iterations; the improvement came from fixing the sign convention and adding L_smooth. 311x faster than A*. Embodiment transfer with a 3% spread across different body metrics and zero retraining. Multi-scene generalization across four topologies from one config.
The Scene D result is the strongest evidence: the learned V outperformed the ground-truth potential (90.6% vs 36.4%) because the MLP's smoothness produced better navigation gradients than the hand-crafted landscape. The learned landscape is not just an approximation of ground truth; it can be better for navigation than the landscape that generated the data. This supports the core claim.
The remaining failure mode is the boundary attractor at domain edges, which accounts for ~33% of failures. It is an MLP extrapolation problem in regions with no training signal: a solvable engineering challenge, not a fundamental limit.
Experiment 2: Chess
Chess has formal evaluation, known strategic concepts, objective performance measurement (Elo), and abundant data. Board positions encoded as 781-dimensional vectors.
Two operational modes: Mode 1 (pure gradient, sub-millisecond, no search: the "blitz brain") and Mode 2 (gradient plus shallow search, with V as the evaluation function). The Elo difference between the two modes quantifies what search adds on top of landscape navigation. Pass criteria for Mode 1: Elo > 1600 (pass) or Elo > 2000 (strong pass).
The killer illustration: Magnus Carlsen in blitz does not calculate deep lines. He looks at a position and knows where it wants to go. He feels which positions are alive and which are dead. That is V_task navigation -- pure gradient following on a landscape built from decades of play. Blitz = landscape navigation (no time for search). Classical = landscape + search. MorphoZero-Chess tests exactly this.
Both experiments use the same code, same loss functions, same V_task learning, same Morse constraints, same validation protocol. Different data.
Deployment Properties
- Sub-millisecond inference. One forward pass plus one backward pass through a small MLP. Runs on a microcontroller. No GPU, no cloud, no API call.
- No planning. The agent does not search a tree of futures. It reads the local slope and moves downhill. If perturbed, it needs no replanning: the negative gradient at the new state points downhill into whichever basin that state lies in.
- Tiny models. 10K-300K parameters. Kilobytes to low megabytes. Ships as a checkpoint file.
- Embodiment transfer. Change G to deploy on a different body. No retraining.
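The no-planning property can be illustrated with a short rollout: the agent follows the local slope, receives an external push, and continues with the same rule. This is a sketch under assumptions: the double-well landscape, the step size, the push, and G taken as the identity are all hypothetical choices, not MorphoZero settings.

```python
def grad_V(z):
    """Gradient of a hypothetical double-well V(x, y) = (x^2 - 1)^2 + y^2.

    The valleys sit at (-1, 0) and (+1, 0); the ridge between them is x = 0.
    """
    x, y = z
    return (4.0 * x * (x * x - 1.0), 2.0 * y)

def run(z, n_steps, step=0.05):
    """Follow the negative gradient for n_steps (body metric G = identity)."""
    for _ in range(n_steps):
        gx, gy = grad_V(z)
        z = (z[0] - step * gx, z[1] - step * gy)
    return z

settled = run((0.3, 0.8), 200)                # settles into the valley at (+1, 0)
pushed = (settled[0] - 0.6, settled[1] + 0.5)  # external push, same side of the ridge
recovered = run(pushed, 200)                   # no replanning, same update rule
```

The push here keeps the state on the goal side of the ridge, so the same update rule carries it back. A push across the ridge would instead send the state into the other valley, which is why basin membership is worth monitoring at run time.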
Relationship to the Mesocosm Ecosystem
MorphoZero is the research program that operationalizes the exterior-intelligence framework into deployable technology. It connects to the broader ecosystem through:
- Abundance (physical skills company) — MorphoZero landscapes encode physical skills captured from expert practitioners, licensed to train robots.
- Guru (cognitive skills company) — the same landscape formalism applied to cognitive domains: expert reasoning about tolerances, surgical decisions, soil reading.
- OpenGrid — MorphoZero models are small enough to run at the edge on OpenGrid nodes, enabling real-time physical AI at production sites.
- Verification — landscape topology provides a natural verification framework: is the agent in the right basin? Is it drifting toward a failure attractor?
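One way to read the verification question "is the agent in the right basin?" is to roll the gradient flow forward from the current state and see which attractor it reaches. The sketch below does that on an assumed double-well landscape; `basin_of`, the `ATTRACTORS` table, and the step size are hypothetical names and values introduced for illustration.

```python
def grad_V(z):
    """Gradient of a hypothetical double-well V(x, y) = (x^2 - 1)^2 + y^2."""
    x, y = z
    return (4.0 * x * (x * x - 1.0), 2.0 * y)

def basin_of(z, attractors, step=0.05, n_steps=500):
    """Roll out gradient descent from z and return the name of the attractor reached.

    The step size is assumed small enough for the rollout to stay stable.
    """
    for _ in range(n_steps):
        gx, gy = grad_V(z)
        z = (z[0] - step * gx, z[1] - step * gy)

    def squared_distance(name):
        ax, ay = attractors[name]
        return (z[0] - ax) ** 2 + (z[1] - ay) ** 2

    return min(attractors, key=squared_distance)

# Assumed labelling: the right-hand valley is the goal, the left-hand one a failure trap.
ATTRACTORS = {"goal": (1.0, 0.0), "failure": (-1.0, 0.0)}

on_track = basin_of((0.2, 0.5), ATTRACTORS)
drifted = basin_of((-0.2, 0.5), ATTRACTORS)
```

The two query states are close together but lie on opposite sides of the ridge, so the check labels them differently. A monitor built this way could raise an alert as soon as the reported basin changes from the goal to a failure attractor.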
Related
- exterior-intelligence — The theoretical framework: intelligence resides in the landscape
- morphogenetic-intelligence — Biological systems navigating developmental landscapes
- intelligence-convergence — 11 independent traditions arriving at ⟨V, G⟩ mathematics
- michael-levin — Bioelectric computation and morphogenesis: the biological inspiration
- nathan-ratliff — Riemannian motion policies, geometric fabrics: the robotics lineage
- c-h-waddington — Epigenetic landscape and canalization: the developmental biology origin