Home / concepts

Exterior Intelligence

[CONVICTION]

Intelligence resides in the landscape agents navigate, not inside the agents themselves. This inverts the dominant paradigm of AI and cognitive science (bigger model = smarter agent). The control law is the same everywhere: u = -G⁻¹ · [alpha nabla V_task + beta nabla V_lyap - eta nabla Sigma] -- a body-metric inverse applied to a landscape gradient, augmented with safety shaping and epistemic exploration. It appears in neuroscience, robotics, developmental biology, ecological psychology, and evolutionary theory under different names, derived independently, with identical mathematical form.

The Formal ⟨V, G, Phi⟩ Architecture

graph TD
    subgraph VGP["⟨V, G, Phi⟩ Framework"]
        V["V_task(z; theta)<br/>Value Landscape<br/>━━━━━━━━━━━━━<br/>Scalar field over state manifold<br/>Goals = minima<br/>Failures = maxima<br/>Decisions = saddle points"]
        G["G(z, L)<br/>Body Metric<br/>━━━━━━━━━━━━━<br/>Riemannian metric tensor<br/>Movement cost from physics<br/>Kinematics + allostatic load<br/>Transforms gradients → motion"]
        PHI["Phi_canal<br/>Canalization<br/>━━━━━━━━━━━━━<br/>Slow reshaping of V through use<br/>Traversed basins deepen<br/>Topology preserved<br/>Geometry refines"]
    end

    V -->|"nabla V<br/>(gradient)"| CL["Control Law<br/>u = -G⁻¹ · nabla V"]
    G -->|"G⁻¹<br/>(body inverse)"| CL
    CL -->|"action"| ENV["Environment"]
    ENV -->|"experience"| PHI
    PHI -->|"reshapes"| V

    style V fill:#e74c3c,color:#fff
    style G fill:#3498db,color:#fff
    style PHI fill:#27ae60,color:#fff
    style CL fill:#f39c12,color:#fff

Three objects encode any intelligent system completely. This formalism evolved from a two-object ⟨V, G⟩ framework (v2) through spectral grounding (v3) to a full engineering architecture (v4). See vgphi-framework-evolution for the development history.

V_task(z; theta): M -> R -- The Value Landscape

A scalar function over a low-dimensional state manifold M, learned from data with topological constraints (Morse regularization). V_task encodes the domain's attractor structure: goals are minima where gradient flow converges, failure modes are maxima where it diverges, and decision boundaries are saddle points where perturbation determines which basin the system enters. Parameterized as a small MLP (2-4 layers, 10K-200K parameters).

[EVIDENCE]

V is not metaphorical. Quasi-potentials in gene regulatory networks are measurable Lyapunov functions (Bhattacharya et al. 2011). Bioelectric patterns in Levin's planaria are recorded with voltage-sensitive dyes. Pheromone fields in ant colonies are physical substances with measured concentrations. In the spectral formulation (v3), V emerges from the spectral decomposition of co-occurrence statistics -- dominant Fourier modes define macro structure, higher modes add detail:

V_base(x) = Sigma_k lambda_k · f_k(x)

where lambda_k are eigenvalues (mode amplitudes) and f_k(x) are Fourier eigenmodes. The landscape has principled geometry discoverable from data, not hand-engineered.

What V provides by differentiation: V(z) gives scalar distance from the goal. nabla V gives the optimal descent direction. ||nabla V|| gives sensitivity near decision points. H(z) = nabla² V gives local curvature for stability analysis. The eigenvalue signature Lambda(H) classifies critical points. Basin integration partitions which states converge to which attractors.
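This differentiation inventory can be sketched on a toy landscape. The double-well V below and its finite-difference derivatives are illustrative, not a trained V_task:

```python
import numpy as np

# Hedged sketch: the differentiation inventory on a toy double-well.
# Minima at z = (±1, 0), saddle at (0, 0).

def V(z):
    x, y = z
    return (x**2 - 1.0)**2 + y**2

def grad_V(z, h=1e-5):       # central-difference gradient
    z = np.asarray(z, float)
    g = np.zeros_like(z)
    for i in range(len(z)):
        e = np.zeros_like(z); e[i] = h
        g[i] = (V(z + e) - V(z - e)) / (2 * h)
    return g

def hessian(z, h=1e-4):      # finite-difference Hessian, symmetrized
    z = np.asarray(z, float)
    n = len(z)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = h
        H[:, i] = (grad_V(z + e) - grad_V(z - e)) / (2 * h)
    return 0.5 * (H + H.T)

def classify(z):
    """Morse classification by the sign pattern of Hessian eigenvalues."""
    w = np.linalg.eigvalsh(hessian(z))
    if np.all(w > 0):  return "minimum (attractor)"
    if np.all(w < 0):  return "maximum (repeller)"
    return "saddle (decision point)"
```

classify([1, 0]) reports a minimum and classify([0, 0]) a saddle: the same Hessian machinery the Morse validation protocol relies on.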

Training losses: V is trained through four losses: L_terminal (shapes altitude at trajectory endpoints), L_flow (aligns gradient field with observed trajectory directions), L_morse (regularizes toward non-degenerate critical points by pushing Hessian determinants away from zero), and L_cross (interpretive mode: rewards multi-channel propagation of perturbations).
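A minimal sketch of three of these losses on a toy quadratic landscape (the V, its analytic derivatives, and the margin below are illustrative; L_cross is omitted since it applies only in interpretive mode):

```python
import numpy as np

# Hedged sketch of the composite objective. The toy V, its analytic
# gradient/Hessian, and the margin are illustrative assumptions.

def loss_terminal(V, endpoints, targets):
    """Shape altitude at trajectory endpoints."""
    return np.mean([(V(z) - t)**2 for z, t in zip(endpoints, targets)])

def loss_flow(grad_V, states, directions):
    """Penalize misalignment of -∇V with observed motion directions."""
    total = 0.0
    for z, d in zip(states, directions):
        g = -grad_V(z)
        cos = g @ d / (np.linalg.norm(g) * np.linalg.norm(d) + 1e-9)
        total += 1.0 - cos
    return total / len(states)

def loss_morse(hess, points, margin=1e-3):
    """Push Hessian determinants away from zero (non-degeneracy)."""
    return np.mean([max(0.0, margin - abs(np.linalg.det(hess(z))))
                    for z in points])

# A quadratic bowl V = 0.5 ||z||² already satisfies all three losses:
V      = lambda z: 0.5 * z @ z
grad_V = lambda z: z
hess   = lambda z: np.eye(2)

L = (loss_terminal(V, [np.zeros(2)], [0.0])
     + loss_flow(grad_V, [np.array([1.0, 0.0])], [np.array([-1.0, 0.0])])
     + loss_morse(hess, [np.zeros(2)]))
```

In the real system these terms are computed on demonstration trajectories and backpropagated through the small MLP; here the bowl is already a fixed point of the objective, so L is effectively zero.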

G(z, L) -- The Body Metric

A Riemannian metric tensor encoding instantaneous movement cost. Constructed from physics (not learned): kinematics, sensor telemetry, allostatic load. Transforms gradients into body-feasible motion. G is built via Riemannian pullback:

G(z, L) = J(z)⁻ᵀ · A(L) · J(z)⁻¹

where J(z) is the encoder Jacobian mapping between observation and latent spaces, and A(L) is a body cost matrix whose diagonal entries scale with load on the corresponding degrees of freedom. The allostatic load L evolves dynamically: dL/dt = F(z, u) - R(z) · L, where F accumulates effort and R is a state-dependent recovery rate.
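A sketch of the pullback and the load dynamics, under an assumed 2-DOF Jacobian and assumed effort/recovery functions:

```python
import numpy as np

# Hedged sketch of G = J^{-T} A J^{-1} and the allostatic-load update.
# J, A, and the effort/recovery functions are illustrative stand-ins.

def pullback_metric(J, A):
    """Riemannian pullback of the body cost matrix through the encoder."""
    J_inv = np.linalg.inv(J)
    return J_inv.T @ A @ J_inv

def load_step(L, u, dt=0.01):
    """Euler step of dL/dt = F - R·L with assumed F (effort) and R (recovery)."""
    F = np.abs(u)            # effort accumulates per degree of freedom
    R = 0.5                  # constant recovery rate for the sketch
    return L + dt * (F - R * L)

J = np.array([[1.0, 0.2],    # assumed encoder Jacobian (observation -> latent)
              [0.0, 1.0]])
L = np.array([0.0, 2.0])     # second DOF carries accumulated load
A = np.diag(1.0 + L)         # cost grows with load on each DOF
G = pullback_metric(J, A)
```

The resulting G is symmetric positive definite, and the loaded degree of freedom is more expensive to move: the geometry of feasible motion changes with the body's state, exactly the embodiment-transfer mechanism described below.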

[REFRAME]

The separation is the key insight. V encodes the task. G encodes the body. They compose but never merge. The consequence: embodiment transfer. Two bodies performing the same task share V_task but have different J (different sensing/actuation) and different A (different load states). The pullback produces different G, yielding different geodesics to the same attractor. Same landscape, new body, new trajectory, same goal.

In the spectral formulation (v3), G is decomposed across five kosha layers -- G = {G_anna, G_prana, G_mano, G_vijnana, G_ananda} -- each a function of natal configuration modulated by time-varying dasha activation. The G tensor is expressed in the Fourier basis discovered from co-occurrence statistics, ensuring its coordinate system matches the landscape's intrinsic geometry.

Phi_canal -- Canalization Dynamics

The slow process that reshapes V_task through use while preserving validated topology. Frequently traversed basins deepen. Practiced paths steepen. Topology is invariant: same attractors, same saddles, same basin boundaries. Only the geometry refines.

dtheta/dt = Pi_Morse[-epsilon · Q(tau_recent) · nabla_theta C(rho, V_task)]

Pi_Morse is a topology-preserving projection that keeps the critical point count and classification unchanged. After each canalization step, the system verifies critical point inventory; if a spurious critical point appears (topology corruption), the parameter step is reverted and epsilon reduced by 50%. This provides a hard guarantee on topological stability.
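The revert-and-halve guard can be sketched directly. The count_critical_points function and the gradient below are placeholders, and the guard stands in for the full Pi_Morse projection:

```python
import numpy as np

# Hedged sketch of the canalization guard: take a parameter step,
# re-count critical points, revert and halve epsilon if topology changed.
# count_critical_points and grad_C are illustrative placeholders.

def canalize(theta, grad_C, count_critical_points, epsilon):
    """One guarded step of dtheta/dt = Pi_Morse[-epsilon · nabla_theta C]."""
    n_before = count_critical_points(theta)
    theta_new = theta - epsilon * grad_C(theta)
    if count_critical_points(theta_new) != n_before:
        return theta, 0.5 * epsilon      # topology corrupted: revert, halve
    return theta_new, epsilon            # topology preserved: accept

# Toy check: a "landscape" whose critical-point count changes if the
# parameter crosses zero; the guard refuses the step and halves epsilon.
count = lambda th: 3 if th[0] > 0 else 1
grad  = lambda th: np.array([1.0])

theta, eps = canalize(np.array([0.05]), grad, count, 0.1)
```

The offending step is rejected: theta is unchanged and the step size drops to 0.05, so subsequent steps approach the topological boundary geometrically rather than crossing it.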

Morse Validation Protocol

[EVIDENCE]

V_task must pass a topological gate before deployment: (1) enumerate critical points via gradient descent on ||nabla V||^2, (2) classify each by Hessian eigenvalues, (3) verify critical point count matches domain expectations, (4) integrate gradient trajectories to map basin boundaries, (5) reject if spurious critical points or degenerate Hessians appear. This makes the landscape inspectable -- enumerable attractors, saddles, basins -- in contrast to the opaque internal representations of neural policies.
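Steps (1)-(2) of the gate can be sketched on a toy double-well landscape (the V below and its analytic gradient are illustrative, not a deployed V_task):

```python
import numpy as np

# Hedged sketch of gate steps (1)-(2): enumerate critical points by
# gradient descent on ||∇V||² from several seeds, then read off the
# inventory. V = (x² - 1)² + y² is an illustrative stand-in.

def grad_V(z):
    x, y = z
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

def find_critical(z, lr=0.002, steps=5000, h=1e-5):
    """Minimize g(z) = ||∇V(z)||² by finite-difference descent."""
    z = np.array(z, float)
    for _ in range(steps):
        g = np.zeros(2)
        for i in range(2):
            e = np.zeros(2); e[i] = h
            g[i] = (np.sum(grad_V(z + e)**2)
                    - np.sum(grad_V(z - e)**2)) / (2.0 * h)
        z -= lr * g
    return z

# Seeds in different basins recover the full inventory:
# minima at (±1, 0) and the saddle at (0, 0).
found = sorted(np.round(find_critical(s), 2)[0] for s in
               [(-1.2, 0.4), (0.1, -0.3), (1.2, 0.4)])
```

Classification by Hessian eigenvalues (step 2) and basin mapping (step 4) then run over this enumerated inventory; the landscape is inspectable because this list is finite and checkable against domain expectations.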

Epistemic Uncertainty Sigma

Sigma(z) quantifies confidence in V_task at state z. Sources: encoder posterior variance, training data density, Hessian condition number. The epistemic drive eta . nabla Sigma pushes the system toward poorly constrained regions, generating domain-specific probing strategies. This preserves Friston's epistemic drive without requiring full expected free energy computation.
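One of the listed sources -- training data density -- can be sketched with an assumed kernel estimate; the drive eta · nabla Sigma then points away from visited states:

```python
import numpy as np

# Hedged sketch: Σ(z) from training-data density, one of the listed
# sources, estimated with an assumed Gaussian kernel over visited states.

data = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.3]])   # visited states

def Sigma(z, bw=0.5):
    """High where training-data density is low."""
    d2 = np.sum((data - z)**2, axis=1)
    return 1.0 - np.mean(np.exp(-d2 / (2.0 * bw**2)))

def grad_Sigma(z, h=1e-5):
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = h
        g[i] = (Sigma(z + e) - Sigma(z - e)) / (2.0 * h)
    return g

# eta · ∇Σ at a probe state points away from the visited cluster,
# toward the poorly constrained region.
drive = 0.1 * grad_Sigma(np.array([0.5, 0.0]))
```

Uncertainty is near zero inside the data cluster and saturates far from it, so the drive generates probing exactly where V_task is least constrained.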

Two Operating Modes

Constructive mode: V_task does not exist a priori. The system builds it from expert demonstrations (50-150 for robotics). A foundation model scores demonstrations during training only; at deployment, only the encoder, V_task, and G are active. Inference cost: O(d^2) per step.

Interpretive mode: V_task already exists as a natural system's intrinsic attractor structure. The system discovers it through observation and controlled perturbation -- sending signals through the system's native medium, recording responses, and building V from the data. This is the interpreter model: learn the system's dynamical language, participate in it, translate it. The system does not control the natural system's trajectory; it communicates a target state. No forward model of the interior required.

Four Domain Instantiations

The architecture is domain-general. Each instantiation shares the same mathematics and differs only in manifold construction, sensing modality, signaling medium, and training data:

| Domain | Mode | Manifold | V_task size | Signaling medium |
|---|---|---|---|---|
| MorphoZero (robotics) | Constructive | Joint angles, RGB-D, force/torque; d=8-16 | MLP [d, 64, 64, 1] | Joint torques |
| MorphoLife (human state) | Interpretive | HRV, respiratory, voice, circadian + semantic dims; d=17+ | MLP [17+, 64, 64, 1] | Vibration, breath guides, language |
| MorphoNature (ecosystems) | Interpretive | Chemical, microbial, acoustic, bioelectric; d=16-64 | Domain-dependent | Nutrient, hydrological, chemical |
| MorphoSocial (coordination) | Interpretive | Trust, resource flows, governance; d=12-32 | Domain-dependent | Protocol parameters |

Eleven Convergent Traditions

graph TD
    subgraph LANDSCAPE["Landscape Traditions<br/>(V is real)"]
        W["Waddington<br/>Epigenetic landscape"]
        L["Levin<br/>Bioelectric fields"]
        WK["Wright / Kauffman<br/>Fitness landscapes"]
    end

    subgraph COUPLING["Perception-Action Traditions<br/>(agents couple to V through G)"]
        GI["Gibson<br/>Affordances"]
        FR["Friston<br/>Free Energy Principle"]
        RA["Ratliff<br/>Geometric Fabrics"]
        RK["Rimon & Koditschek<br/>Navigation functions"]
    end

    subgraph EXTERIOR["Exterior Cognition Traditions<br/>(intelligence in the medium)"]
        ST["Stigmergy<br/>Ant colonies, slime molds"]
        CL["Clark & Hutchins<br/>Extended Mind"]
    end

    subgraph LINGUISTIC["Linguistic-Philosophical Traditions<br/>(field precedes utterance)"]
        PA["Panini<br/>Karaka system"]
        BH["Bhartrihari<br/>Sphota / four Vak levels"]
    end

    LANDSCAPE --> CONV["Universal Control Law<br/>u = -G⁻¹ · nabla V"]
    COUPLING --> CONV
    EXTERIOR --> CONV
    LINGUISTIC --> CONV

    style LANDSCAPE fill:#27ae60,color:#fff
    style COUPLING fill:#3498db,color:#fff
    style EXTERIOR fill:#9b59b6,color:#fff
    style LINGUISTIC fill:#e67e22,color:#fff
    style CONV fill:#e74c3c,color:#fff

[EVIDENCE]

Eleven independent research traditions arrived at the same architecture from different directions. See intelligence-convergence for the full argument that this convergence constitutes evidence of structural truth.

The landscape traditions (V is real)

  1. c-h-waddington (1957) -- Epigenetic landscape formalized as quasi-potential functions with Lyapunov stability. Bhattacharya et al. (2011) proved the quasi-potential decreases along differentiation trajectories. Cell types are attractor minima; differentiation barriers are saddle points with measurable escape times.
  2. michael-levin (2005-present) -- Bioelectric fields as navigable morphogenetic landscape. 48-hour voltage perturbation permanently rewrites planarian target morphology, wild-type genome intact. Xenobots reveal attractor states no frog has ever occupied -- the landscape exceeds the species' normal repertoire.
  3. Sewall Wright / Stuart Kauffman -- Adaptive fitness landscapes. NK model parametrizes ruggedness through epistatic interactions. Coupled coevolutionary landscapes where one species' evolution deforms another's field. Evolution is gradient-climbing on an exterior fitness surface.

The perception-action traditions (agents couple to V through G)

  1. james-gibson (1979) -- Affordances: information for perception exists in the ambient optic array, not inside the head. William Warren (1984) showed stair-climbability is invariant at ratio R/L ~ 0.88 across body sizes -- behavior specified by exterior relational structure, body-scaled.
  2. karl-friston (2006-present) -- Free Energy Principle: action as gradient descent a-dot = -dF/da. Solved the mountain-car benchmark using only free energy minimization, no reward or utility. Markov blankets define systems by boundaries, not interiors. Kuchling, Friston, Georgiev, and Levin (2020) unified morphogenesis with active inference: cells navigate anatomical morphospace by following free energy gradients.
  3. nathan-ratliff (2018-present) -- Riemannian Motion Policies and Geometric Fabrics at NVIDIA. Control law u = M⁻¹ · f -- metric-weighted acceleration fields. Neural Geometric Fabrics outperform both classical baselines and unstructured neural networks on 23-DOF dexterous manipulation.
  4. Rimon & Koditschek (1990-1992) -- Navigation functions with Morse-theoretic topology: unique global minimum, saddle points determined by obstacle count. Navigation properties invariant under diffeomorphisms.

The exterior cognition traditions (intelligence in the medium)

  1. Stigmergy / Ant Colony Optimization -- Grasse (1959): termites build without blueprints via environmental traces. Dorigo (1992-96) proved ACO is mathematically equivalent to stochastic gradient descent in pheromone space. Physarum polycephalum -- zero neurons, replicates the Tokyo rail network.
  2. Clark & Chalmers / Hutchins -- Extended Mind thesis (1998). Hutchins showed navigation aboard the USS Palau is accomplished by a socio-technical system.

The linguistic-philosophical traditions (the field precedes the utterance)

  1. panini (c. 5th century BCE) -- Karaka system: six semantic roles structuring all verb-argument relations as an intermediate field between syntax and semantics. Rick Briggs (1985, NASA Ames) showed the Paninian method is "identical not only in essence but in form with current work in Artificial Intelligence." ~4,000 sutras generate all of Classical Sanskrit -- a complete navigable architecture.
  2. Bhartrihari (c. 5th century CE) -- Sphota theory and the four levels of Vak: para (undifferentiated potentiality), pasyanti (pre-linguistic whole-meaning), madhyama (mental speech), vaikhari (articulate speech). Manifestation from undifferentiated field through progressive differentiation.

Cross-modal confirmation

Cohn and Paczynski (2013) demonstrated the Agent-Patient-Instrument structure across language, vision, gesture, and drawing. Goldin-Meadow et al. (2008) found homesign systems worldwide converge on Agent-Patient-Act ordering regardless of spoken language. Synesthesia research (Cuskley et al., 2019): ~70% of all participants produce isomorphically structured cross-modal mappings -- perception starts unified and differentiates, exactly as the Vak model predicts.

The Universal Control Law

[REFRAME]

The same geometric equation appears independently across disciplines:

| Tradition | Form | V equivalent | G equivalent |
|---|---|---|---|
| Active inference | mu-dot = G⁻¹ · nabla F | Free energy functional | Fisher information metric |
| Geometric fabrics | u = M⁻¹ · f | Policy force field | Riemannian metric |
| Gradient systems | x-dot = -g⁻¹ · nabla V | Lyapunov potential | Manifold metric |
| Epigenetic landscape | Cell trajectories | Quasi-potential | Gene regulatory coupling |
| Ant colony optimization | Pheromone following | Pheromone concentration | Sensorimotor coupling |

Yuan and Ao (2014) proved constructively that any dynamics with a Lyapunov function has a corresponding physical realization as x-dot = -g^-1 nabla V. The control law is not an analogy. It is the same mathematics.

The v4 formalism extends this with safety shaping (V_lyap for bounded-disturbance convergence) and epistemic exploration (nabla Sigma driving uncertainty reduction), making it a complete engineering specification rather than a descriptive observation.

Where Interior-Only Approaches Fail

[EVIDENCE]

The negative evidence is systematic. See morphogenetic-vs-interior for the full argument.

Three structural failure modes of interior models: (1) Brittleness under perturbation -- an interior dynamics model contains no gradient information in regions the training data never visited; an exterior landscape has gradient information everywhere. (2) Non-transferability across embodiments -- a forward model of one robot's kinematics cannot transfer to another; an exterior landscape encodes the task, not the body. (3) Inability to capture self-organizing systems -- a cell, an organism, an ecosystem does not have a state transition function that can be written down; but the boundary signature is stable and readable.

The boundary principle: the rich interior dynamics of a self-organizing system project onto a scalar value field V at the system's boundary. An exterior architecture that senses at the boundary and navigates on V captures the system's relevant structure without modeling its interior. The principle holds when: (a) the system self-organizes toward attractors, (b) the boundary observable is non-degenerate, and (c) the timescale of interest is slower than the system's internal regulation.

Lake and Baroni (2018): standard seq2seq models achieve near-0% accuracy on SCAN compositional splits. Google's CFQ benchmark confirmed a strong negative correlation between compound divergence and accuracy. Pure interior computation cannot capture rule-like exterior compositional structure.

When exterior structure is added, performance transforms. Neural-Symbolic Stack Machine (Chen et al., 2020): 100% generalization on all four compositional benchmarks. Chain-of-thought prompting: PaLM 540B improves from 18% to 57% on GSM8K by externalizing reasoning into navigable token sequences. On SWE-bench Pro, top models collapse to 23%; on WebArena, GPT-4 agents achieve 14.41% vs. human 78.24%. Yann LeCun's formal argument: if each token carries independent error probability epsilon, whole-sequence accuracy (1-epsilon)^n -> 0 as length n grows.
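The compounding argument is a two-line computation (the per-token error rates below are illustrative):

```python
# Pure-Python check of the compounding-error argument: with per-token
# error probability epsilon, whole-sequence accuracy (1 - epsilon)^n
# decays geometrically with length n. Rates are illustrative.

def seq_accuracy(epsilon, n):
    return (1.0 - epsilon) ** n

# Even a 1% per-token error rate halves sequence-level accuracy by n ≈ 69
acc_100 = seq_accuracy(0.01, 100)     # ≈ 0.366
acc_1000 = seq_accuracy(0.01, 1000)   # ≈ 4.3e-5
```
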

The Scale Contrast

[REFRAME]

The parameter efficiency tells the story. VLA foundation models (RT-2, pi-zero): billions of parameters, GPU clusters, no stability certificate, no embodiment transfer. The ⟨V, G, Phi⟩ MorphoZero: 10K-200K parameters, sub-millisecond inference on edge hardware, Morse-validated topology, embodiment transfer by swapping G. Diffusion policies: 50-100 denoising steps per action vs one gradient evaluation. The systematic failures of LLM agents -- architectural, not model-level -- confirm the insufficiency of computation without coupling to structured exterior fields.

Implications for the Mesocosm

[CONVICTION]

You don't design intelligent citizens -- you design intelligent environments. Affordance landscapes that naturally develop the capacities you want. This applies to education (28-the-sovereign-child), health, governance, and economic design. The architecture scales: from molecular networks navigating transcriptional space, to cells navigating morphospace, to organisms navigating affordance space, to populations navigating fitness landscapes, to civilizations navigating institutional possibility space.

The interpreter model extends this. For any self-organizing system -- a body, a watershed, a community -- the role is the same: learn its dynamical language (map V_task), signal through its native medium (the control law), and translate for humans (LLM as interface). The factory is not replaced by a smarter factory. It is replaced by an interpreter that speaks the system's language.

Robotics: The Energy-Landscape Reward Model

[FRONTIER]

The robotics reward model field has bifurcated into VLM-based exterior reward models (rich task understanding, poor physics reasoning) and world-model-based approaches (good physics, limited task understanding). The ⟨V, G, Phi⟩ architecture occupies a productive intersection that neither camp has explored.

V-JEPA 2 (Assran, Bardes, LeCun; June 2025) is the closest existing implementation: a 1.2B-parameter vision transformer world model achieving 65-80% success rates on zero-shot manipulation using only 62 hours of unlabeled robot data. The energy function is L1 distance in learned representation space. Planning uses Cross-Entropy Method -- sampling 800 candidate action sequences, scoring by energy. But V-JEPA 2 lacks explicit agent-environment separation and uses a simple distance metric rather than a learned energy function.
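The CEM planning loop described above can be sketched with a trivial stand-in world model. The rollout, goal embedding, and hyperparameters below are illustrative, not V-JEPA 2's:

```python
import numpy as np

# Hedged sketch of Cross-Entropy Method planning: sample candidate
# action sequences, score each by an energy (here an assumed L1
# distance of a toy linear rollout to a goal embedding), refit the
# sampling distribution to the elites, iterate.

rng = np.random.default_rng(0)
goal = np.array([1.0, -1.0])

def rollout_energy(actions):
    z = actions.sum(axis=0)             # trivial stand-in world model
    return np.abs(z - goal).sum()       # L1 energy in "latent" space

def cem_plan(horizon=2, dim=2, n_samples=800, n_elite=50, iters=10):
    mu = np.zeros((horizon, dim))
    sigma = np.ones((horizon, dim))
    for _ in range(iters):
        cands = mu + sigma * rng.standard_normal((n_samples, horizon, dim))
        scores = np.array([rollout_energy(a) for a in cands])
        elite = cands[np.argsort(scores)[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

plan = cem_plan()
```

The design choice CEM makes is to avoid gradients entirely: it only needs energy evaluations, which is why it pairs naturally with a learned scoring function but costs hundreds of rollouts per action versus the single gradient evaluation of the ⟨V, G, Phi⟩ control law.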

The gap the ⟨V, G, Phi⟩ architecture fills: explicit agent-environment separation (following SEAR's auxiliary loss, proven across 18 environments and 5 robots), compositional energy functions (following Yilun Du's work: energies compose additively, enabling zero-shot generalization that outperforms LLM planners), and process-level energy evaluation (scoring each manipulation substep, not just full trajectories -- borrowing the process reward model concept from LLM training, which achieved 6x sample efficiency gains).

Three LLM reward innovations remain untapped for robotics: constitutional AI (robotics principles as energy terms), self-play (VLM challenger generates scenarios, VLA solver executes, physics simulator verifies), and reward model ensembles (given that no single VLM excels across all tasks).

The strongest differentiator of the ⟨V, G, Phi⟩ approach: data efficiency (62 hours vs 1M trajectories for VLM reward models), compositionality (no VLM reward model can compose novel task rewards from primitives), and physical grounding (structured energy landscapes vs statistical pattern matching). These are precisely the properties that matter for real-world deployment.

The Consciousness Ground

[FRONTIER]

The ⟨V, G, Phi⟩ framework describes agents navigating landscapes. It does not, on its own, specify what the landscapes are made of. If spacetime is emergent, then the value landscapes are not physical substrates but structures in something deeper. Hoffman's Conscious Agent Theory and the amplituhedron share mathematical objects (decorated permutations), suggesting that the agents navigating landscapes and the pre-spacetime geometry may be aspects of the same thing. The consciousness-first ontology provides the substrate claim: the landscapes are consciousness structured into navigable form. The physics-vedanta-convergence documents the convergence from both physics and contemplative traditions on what that substrate looks like.

Related

Tags: intelligence, landscape, framework, vgphi, convergence, spectral, geometric-fabrics