OpenGrid
Open compute-and-verification fabric for real work in the physical world. A separate company cofounded by Mesocosm that routes compute jobs to distributed data center nodes and provides agent scaffolding for the entire ecosystem.
What OpenGrid Is
OpenGrid is distributed in physical structure and state, locally decentralized in adaptation, and polycentric in governance and policy. It is not workload-agnostic. The topology, routing policy, failure posture, and proof requirements all change with the workload.
Three properties define the architecture:
- Distributed — compute, state, evidence capture, and capacity are spread across many nodes and places.
- Locally decentralized — nodes and colonies adapt to local conditions without a single global optimizer.
- Polycentric — authority is plural and nested: site policy, metro overflow, and federation rules live at different layers. A factory colony, a campus colony, a metro canopy, and a regional federation each make decisions at their own scale.
What OpenGrid Does
If the thesis goal is a verifiable physical economy, then OpenGrid cannot be positioned as generic agent infrastructure that happens to run at the edge. Its role is narrower and stronger:
- Placement network — for inference-native work that cares about locality.
- Verification network — for compute events and physical-world events.
- Interface network — for sensors, robots, cameras, PLCs, operator apps, and domain-specific tools.
- Economic rail — where verified outcomes coordinate discovery, contracts, and settlement.
Not "an agentic internet CDN." That story collapses the thesis into generic cloud competition. The stronger story: open compute and verification infrastructure for operators in the physical world.
Session-Native Architecture
[CONVICTION]
OpenGrid is session-native, not request-native. The unit on the network is not a token or a request -- it is a session. This is the architectural choice that separates OpenGrid from every other distributed compute project.
The parallel is telecom, not the web. The web never built a proper session layer -- it faked statefulness with cookies. CDNs never built session management -- they are stateless by design. Telecom is the only infrastructure that took sessions seriously as a network primitive, with SS7 as the signaling layer, SIP as the session protocol, and mobile handover as the migration mechanism. OpenGrid brings that same thinking to the agentic internet, but for intelligent sessions rather than dumb audio pipes.
The session manifest is the contract between an agent and the infrastructure. It declares everything the IDN needs to manage this agent's sessions: session characteristics (duration, interactivity, state footprint), placement constraints (sovereignty, hardware, latency ceiling), migration policy (cost of migration, acceptable state loss), scaling profile (demand spikes, warm-up time), and verification requirements (what Mycel needs to verify). This is fundamentally different from a Docker container spec -- a container says "I need 16GB RAM." A session manifest says "I need 200ms response times, sessions last 30 minutes, state must survive node failure, and I need to prove learning happened."
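A minimal sketch of what a session manifest could encode, assuming a declarative schema; every field name and default value below is illustrative, not the OpenGrid spec.

```python
# Illustrative sketch of a session manifest; fields and values are assumptions,
# not the OpenGrid specification.
from dataclasses import dataclass, field

@dataclass
class SessionManifest:
    # Session characteristics
    expected_duration_min: int = 30          # how long a typical session lasts
    interactivity: str = "real-time"         # "real-time" | "interactive" | "batch"
    state_footprint_mb: int = 512            # live state the serving node must hold

    # Placement constraints
    sovereignty_region: str = "EU"           # data and state must not leave this region
    latency_ceiling_ms: int = 200            # response-time ceiling
    hardware_class: str = "gpu-24gb"         # minimum accelerator class

    # Migration policy
    migration_cost: str = "medium"           # "cheap" | "medium" | "prohibitive"
    acceptable_state_loss_s: int = 5         # seconds of state that may be dropped on failover

    # Scaling profile
    demand_spike_factor: float = 3.0         # peak vs. baseline concurrent sessions
    warmup_s: int = 20                       # time to load weights and warm caches

    # Verification requirements (what Mycel needs to verify)
    proofs: list = field(default_factory=lambda: ["session-occurred", "learning-outcome"])
```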
Three dimensions govern session management decisions: continuity model (real-time stream vs interactive pauses vs fire-and-forget with checkpoints), migration cost (a text chat with 4K context is cheap to migrate; a voice agent mid-sentence is nearly impossible), and value curve (a tutoring session where the agent has built a model of the student's understanding is worth more than a fresh session).
Three-Plane Architecture
[CONVICTION]
OpenGrid separates three planes to ensure the network's intelligence layer stays responsive under compute load:
Control plane (signaling) -- session registry, routing decisions, node health, failover coordination, settlement, verification proof routing. Lightweight, always responsive. Never touches GPU. Never blocked by inference. This is the SS7 equivalent for the agentic internet.
Inference plane (real-time compute) -- the live agent brain. Voice-to-voice, LLM reasoning, tool execution. Latency-critical. Runs on GPU nodes. Cannot be cached. On each node, a lightweight signaling daemon (CPU-bound, always responsive) runs alongside the agent runtime (GPU-bound, doing inference). The signaling daemon reports health, accepts placement instructions, and coordinates migration without being starved by inference workloads.
Asset plane (content delivery) -- static and semi-static content that supports sessions. Diagrams, videos, documents, generated images, curriculum content. Can be cached at the edge. Can be pre-positioned based on the session manifest. This IS traditional CDN territory. The node focuses on what only it can do -- real-time inference. Everything else comes from the asset layer.
The separation matters because a node maxed out on GPU serving a voice agent cannot simultaneously respond to "are you healthy? can you take another session?" If control messages get queued behind inference workloads, the IDN goes blind exactly when it most needs to see.
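A minimal node-side sketch of that separation, with assumed names: the signaling daemon answers health and admission questions on a fixed cadence, independent of whatever the GPU runtime is doing.

```python
# Sketch of the node-side split between the signaling daemon (CPU, always
# responsive) and the agent runtime (GPU, doing inference). Names are assumed;
# the point is that health reporting never waits on inference.
import threading
import time

class SignalingDaemon(threading.Thread):
    """Answers 'are you healthy? can you take another session?' without touching the GPU."""
    def __init__(self, node_state):
        super().__init__(daemon=True)
        self.node_state = node_state

    def run(self):
        while True:
            heartbeat = {
                "healthy": True,
                "active_sessions": self.node_state["active_sessions"],
                "free_session_slots": self.node_state["slots"] - self.node_state["active_sessions"],
            }
            send_to_control_plane(heartbeat)   # placeholder transport to the control plane
            time.sleep(1)                       # fixed cadence, independent of GPU load

def send_to_control_plane(msg):
    pass  # placeholder: in practice a lightweight message to the regional coordinator

node_state = {"active_sessions": 0, "slots": 4}
SignalingDaemon(node_state).start()
# The agent runtime updates node_state as sessions start and stop; inference runs
# in its own process, so a saturated GPU never delays the heartbeat above.
```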
Agent Deployment: CDN for Agents
[CONVICTION]
The unit on the network is not a "node with a GPU" but a "node hosting a capable agent." An agent that is good at legal research is not the same as one that is good at code review. The IDN routes work to agents based on capability, not to nodes based on proximity.
The model: build an agent, push it to OpenGrid with one API call, and the IDN handles everything -- replicates it to nodes near users, serves it at low latency, scales it up and down. A biology tutor deployed to OpenGrid runs in 50 locations without the developer managing a single server. This is the Cloudflare moment for AI.
The agent package includes model weights (or model reference), system prompt/persona, tools, memory structure, curriculum data, and RAG knowledge base. The IDN receives this and makes placement decisions: which nodes should host a copy based on demand patterns, latency requirements, and hardware availability.
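A hedged sketch of the "one API call" deploy, assuming an HTTP endpoint; the URL, package schema, and field names are hypothetical, not a published OpenGrid API.

```python
# Hypothetical deployment call; the endpoint and package schema are assumptions
# made to illustrate the shape of the agent package described above.
import json
import urllib.request

agent_package = {
    "name": "biology-tutor",
    "model_ref": "hf://org/model-8b-instruct",    # weights by reference, not upload (example)
    "system_prompt": "You are a patient biology tutor...",
    "tools": ["diagram-renderer", "quiz-grader"],
    "memory_structure": "per-student-profile",
    "rag_index": "s3://curriculum/biology-v3",     # knowledge base location (example)
    "session_manifest": {"latency_ceiling_ms": 200, "expected_duration_min": 30},
}

req = urllib.request.Request(
    "https://api.opengrid.example/v1/agents",      # hypothetical endpoint
    data=json.dumps(agent_package).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # placement, replication, and scaling are then handled by the IDN
```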
Node operators do not just contribute raw compute. They deploy agents -- either ones they built or ones from an open ecosystem. The IDN indexes what each agent can do (skills, tools, track record, latency profile). When work comes in, the IDN matches the task to the best available agent considering skill match, latency, cost, and verified performance history. Verification through Mycel becomes essential because in a world of specialized agents, proof of quality -- not just proof of liveness -- is the trust layer that makes capability-based routing credible.
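One way such capability-based matching could be scored, as a sketch; the weights and field names are assumptions, with the verified success rate standing in for the Mycel-backed performance history.

```python
# Sketch of capability-based matching: score candidate agents on skill match,
# latency, cost, and verified performance history. Weights are illustrative.
def score_agent(task, agent):
    skill = len(set(task["skills"]) & set(agent["skills"])) / max(len(task["skills"]), 1)
    latency = 1.0 if agent["p95_latency_ms"] <= task["latency_ceiling_ms"] else 0.0
    cost = 1.0 - min(agent["price_per_session"] / task["max_price"], 1.0)
    history = agent["verified_success_rate"]   # drawn from verified proofs, not self-reported
    return 0.4 * skill + 0.2 * latency + 0.1 * cost + 0.3 * history

def route(task, candidates):
    eligible = [a for a in candidates if a["p95_latency_ms"] <= task["latency_ceiling_ms"]]
    return max(eligible, key=lambda a: score_agent(task, a)) if eligible else None
```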
Agent Scaffolding
OpenGrid provides the agent layer for the entire Mesocosm ecosystem:
- Session containers any agent plugs into
- Agent lifecycle management
- Agent verification (is the agent performing correctly)
- Agent marketplace and registry
- Composability (agents calling agents)
- Identity tied to Mycel
- Distributed storage owned by user
- Billing through Mycel
Everything in the ecosystem runs on OpenGrid — learning agents, farming agents, enterprise agents, routing apps. Microcosm's personal AI, Macrocosm's bioregional AI, and all ecosystem company products operate on OpenGrid compute.
Protocol Architecture: MMP Core and OMP
The v2 mesh protocol family refactors the protocol surface into three layers:
Mycel trust grammar — identities (did:mycel), proof envelopes, vault and consent semantics, MIPs, settlement, federation trust anchors, policy packs. OpenGrid does not invent a second identity or proof stack.
MMP Core (Mycelial Mesh Protocol) — the reusable locality and coordination substrate. Defines ParticipantDescriptor, LocalityScope, ColonyDescriptor, ProfileMembership, CapabilityEnvelope, FederationTreaty, and ReceiptRef. Multiple mesh profiles inhabit the same topology without collapsing into one undifferentiated network.
OMP (OpenGrid Mesh Profile) — the compute/runtime profile. Overlay maintenance, liveness detection, capability-class routing, deterministic hot-path scoring, session lifecycle, route feedback, model and artifact distribution, receipt emission.
The same MMP Core substrate is reused by Arena (challenge routing), research profiles, capability formation, and capital routing — each with different demand objects, lifecycle state machines, and feedback signals.
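Illustrative shapes for a few of the MMP Core objects named above; the fields are assumptions added to make the layering concrete, not the protocol definition.

```python
# Illustrative object shapes for MMP Core; field names and contents are
# assumptions, only the type names come from the protocol description.
from dataclasses import dataclass

@dataclass
class LocalityScope:
    colony_id: str            # e.g. "colony:factory-7"
    canopy_zone: str          # metro/regional overlay this colony reports into
    federation: str           # treaty domain

@dataclass
class ParticipantDescriptor:
    did: str                  # did:mycel identity (trust layer is reused, not reinvented)
    scope: LocalityScope
    profiles: list            # e.g. ["omp"] for compute, ["arena"] for challenge routing

@dataclass
class CapabilityEnvelope:
    capability_class: str     # e.g. "gpu-24gb-voice", what OMP routes on
    attestation: str          # reference to a Mycel proof envelope

@dataclass
class ReceiptRef:
    session_id: str
    proof_uri: str            # pointer to the emitted receipt in the trust layer
```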
The Three-Scale Topology
Colonies — dense local clusters: factories, campuses, micro-data-centers, neighborhood cabinets. The fast path is internal.
Canopy — sparse metro or regional overlay of high-uptime nodes providing route summaries, admission help, cache, and selective durability. Current RC clusters evolve into canopy seeds and oracle nodes — first among equals, not permanent bosses.
Federation — treaty layer between sovereign colonies and canopy zones. Authority distributed by scope, not dissolved.
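A sketch of the colony-first placement preference this topology implies: serve inside the colony when possible, overflow to the canopy, and cross a federation boundary only when treaty policy allows. Structures and the policy gate are illustrative.

```python
# Illustrative colony-first placement with canopy overflow; data shapes and the
# treaty gate are assumptions, the ordering follows the three-scale topology.
def place_session(session, colony_nodes, canopy_nodes, federated_nodes):
    def fits(node):
        return node["free_slots"] > 0 and node["capability"] == session["capability"]

    for node in colony_nodes:                  # fast path: stays inside the local cluster
        if fits(node):
            return node
    for node in canopy_nodes:                  # metro overlay: overflow, cache, admission help
        if fits(node):
            return node
    if session.get("may_leave_federation"):    # treaty/sovereignty gate before going wider
        for node in federated_nodes:
            if fits(node):
                return node
    return None                                # reject rather than break local policy
```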
Workload Fit
The mesh helps most when latency, sovereignty, and verification all matter:
| Priority | Use Case | Thesis Fit |
|---|---|---|
| Strongest wedge | Site-bound conversational agents | Nearby serving, low tail latency, sovereignty |
| Strongest wedge | Physical AI at a site | Factory cells, agri stations, warehouse perception — makes verification visible |
| Strong support | Nearby agent hosting | Session affinity, warm models, local policy |
| Capacity fill | Autoresearch and back-office batch | Absorbs spare capacity, too generic alone |
| Weak initial wedge | Generic internet-wide chatbot hosting | Story becomes cheap inference, not locality + proofs |
Four Invariants
- Control and compute never mix. The control plane is not in the request hot path.
- Control plane stays out of the hot path for premium traffic. Even as the mesh becomes more autonomous, hard QoS lanes need fast local admission.
- Nodes remain the infrastructure. Data centers, factories, and public nodes all become substrate; the coordination layer composes them.
- Media and inference data stay direct between client and serving node. The control plane supervises but does not mediate.
Application Protocol Compatibility
MCP, A2A, UCP, WebRTC, HTTP, QUIC, ROS2, OPC UA, FHIR, EPCIS, and similar protocols remain unchanged. They operate above the mesh substrate. OpenGrid does not compete with MCP, A2A, or UCP — it gives them a locality-aware, sovereignty-aware substrate to run on.
Operator Economics: The Circular Economy
[CONVICTION]
The ATM analogy drives the economic architecture. Anyone can plug in a node and earn. The routing layer itself becomes an economic role anyone can graduate into, creating a two-sided market.
Operator tiers form a career ladder:
Node host -- the shopkeeper or school that provides space, power, and internet for a small edge node. Minimal effort, small revenue share, like hosting a cell tower on a roof. Basically passive income with occasional "make sure the light is green" responsibility.
Fleet operator -- monitors and maintains hundreds of small edge nodes across a city or district. Dispatches for physical issues, manages swap inventory, handles basic troubleshooting remotely. The new blue-collar tech job. Scalable, learnable, local. A single fleet operator managing 300-500 edge nodes in a city is a solid small business.
Site operator -- manages a medium node or small datacenter site (4-8 GPU servers with cooling, UPS, networking). More technical, more responsibility, better compensated. Career progression from fleet operator.
Regional coordinator -- runs the routing and coordination for an entire region. Earned through reputation and uptime history. Partly technical, partly operational, partly business development. Earns the routing fee on every session routed for any node in the region, not just their own.
Fleet investor -- deploys capital to buy and place nodes. Contracts with operators for maintenance. Earns returns from session fees minus operator costs. A million dollars buys roughly 250 Mac Studios deployed across a city in shops, schools, clinics, community centers.
Community ownership (NodeCo) -- communities pool money to buy nodes. A Mac Studio costs $4,000; if 40 families put in $100 each, they own a node. The ownership structure is registered on Mycel. Session fees flow back proportionally: 70% to owners, 20% to operator, 10% to protocol.
The software is free. The OpenGrid foundation builds all monitoring, diagnostics, fleet management, self-healing, predictive maintenance. The irreducible human job is physical presence -- plugging in replacement units, checking that cooling vents are not blocked, carrying dead units out and new units in. This is appliance maintenance, not engineering.
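A worked example of the NodeCo flow under the numbers above, with the monthly session-fee figure assumed purely for illustration.

```python
# Worked example of the NodeCo split: 40 families fund a $4,000 node; session
# fees split 70% owners / 20% operator / 10% protocol. The monthly fee figure
# is assumed for illustration only.
node_cost = 4_000
families = 40
stake_per_family = node_cost / families             # $100 each

monthly_session_fees = 600                           # assumed figure
owners_pool   = monthly_session_fees * 0.70          # $420 to the 40 owners
operator_cut  = monthly_session_fees * 0.20          # $120 to the operator
protocol_cut  = monthly_session_fees * 0.10          # $60 to the protocol layer

per_family = owners_pool / families                  # $10.50 per family per month
```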
Economics
Revenue model: Compute routing fees (0.5-2%), agent scaffolding usage fees, node certification fees, marketplace transaction fees. Settlement splits: 75% to node operator, 15% to routing operator, 10% to protocol layer (the ATM interchange model).
TAM: Global cloud computing is $913B in 2025, projected to reach $1.6-2.4T by 2030. Distributed/edge compute and AI inference are the fastest-growing segments. Capturing 1% of the distributed compute segment works out to $2-4B/year by 2030.
Comparison: Like Cloudflare for compute routing. Cloudflare has $1.7B revenue routing ~20% of global web traffic. OpenGrid routes compute jobs to distributed nodes. Same model, different layer.
vs. OpenAI Frontier: Same layers (business context, agent execution, evaluation, interfaces). Opposite architecture: open distributed vs. proprietary centralized. No platform tax. Data stays with user. Critical difference — OpenAI verifies AI agent performance ("did the AI work well?"). Mesocosm ecosystem verifies human and organizational outcomes ("did the situation improve for the humans involved?").
Scaling: the Lightning Network model. Data center operators join for the economic return. Scales globally through economics from day one. The first thing in the ecosystem to go global.
Protocol Architecture: The Honest Design
[CONVICTION]
OpenGrid is a CDN. Not a blockchain. Not a P2P network. A CDN. Akamai, Cloudflare, and Fastly all solved the same problem -- get compute geographically close to users with sub-50ms latency. They all arrived at the same architecture: hierarchical, regionally coordinated, thin global control plane. OpenGrid's architecture is the same, with one difference: anyone can contribute a node.
The "permissionless" property lives at the node layer -- that is where it matters for the distributed compute thesis. The coordination layer is operated by known entities running open-source software. This matches how the internet itself works: ICANN, root DNS servers, and tier-1 ISPs are known entities running open protocols. The internet has run reliably for 40 years with this model.
Lightning Network's dirty secret proves the pattern: the protocol is "permissionless" but the routing network is roughly 20 large nodes on AWS, Google Cloud, and Hetzner. If those nodes go down, Lightning functionally stops. The "decentralization" is a legal and ideological wrapper around what is operationally hub-and-spoke. And that is fine -- because it works.
What is open and permissionless: node participation (anyone installs the daemon, meets hardware thresholds, starts serving), the protocol spec (signaling, CR format, session manifests, heartbeat -- all open, anyone implements), the RC software (open source Elixir -- if the operator stops, someone else deploys).
What is not decentralized: routing decisions (local, sub-millisecond, from ETS -- no consensus, no gossip in the hot path), RC operation (needs known network locations with good peering, high uptime, professional monitoring), settlement (hourly batches, authoritative single CP -- simpler, more auditable than BFT consensus).
The key insight: decentralize the control plane, not the data plane. Bitcoin decentralizes the ledger at the cost of throughput. OpenGrid cannot afford that cost for real-time inference. But it can afford it for slower operations: RC election (hours), settlement (hourly batches), cross-region routing updates (seconds).
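A sketch of the hot-path consequence: the routing decision is a purely local, in-memory lookup with no consensus or network round trip. A Python dict stands in for the RC's ETS table; the data in it is illustrative.

```python
# Sketch of the hot-path routing decision: a purely local, in-memory lookup
# with no consensus, gossip, or network round trip. A dict stands in for the
# RC's ETS table; entries are kept fresh by heartbeats outside the hot path.
node_table = {
    "node-a": {"free_slots": 2, "capability": "gpu-24gb", "p95_ms": 18},
    "node-b": {"free_slots": 0, "capability": "gpu-24gb", "p95_ms": 12},
    "node-c": {"free_slots": 5, "capability": "cpu-only", "p95_ms": 9},
}

def admit(capability, latency_ceiling_ms):
    candidates = [
        (name, n) for name, n in node_table.items()
        if n["capability"] == capability
        and n["free_slots"] > 0
        and n["p95_ms"] <= latency_ceiling_ms
    ]
    return min(candidates, key=lambda kv: kv[1]["p95_ms"])[0] if candidates else None

assert admit("gpu-24gb", 50) == "node-a"   # local decision, no consensus, no gossip
```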
The Circular Economy
[CONVICTION]
The routing layer is the second side of that two-sided market, and anyone can graduate into it:
The flywheel: A college kid plugs in a Mac Mini, earns from inference. After months, buys a second Mac. Notices only one RC serves their region with 35ms latency. Rents a $40/month VPS, installs open-source RC binary, advertises as routing operator. Local nodes measure 8ms latency versus 35ms and switch over. Now they earn compute fees AND routing fees. A local ISP co-locates an RC -- even better latency, more nodes. The ecosystem grows through economic natural selection, not consensus governance.
RC-to-RC peering replaces Kafka for cross-region coordination. RCs exchange capacity summaries every 30 seconds. When an RC cannot serve locally, it checks its peering table and routes cross-region. Direct TCP between RCs. No Kafka, no CP in the path. This is the BGP equivalent.
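A sketch of that peering exchange and fallback, with message shape and table names assumed; only the 30-second capacity-summary cadence and the "no CP in the path" property come from the description above.

```python
# Illustrative RC-to-RC peering: each RC periodically builds a capacity summary
# for its peers and consults its peering table when it cannot serve locally.
import time

def capacity_summary(region, node_table):
    capabilities = {n["capability"] for n in node_table.values()}
    return {
        "region": region,
        "ts": time.time(),
        "free_slots_by_capability": {
            cap: sum(n["free_slots"] for n in node_table.values() if n["capability"] == cap)
            for cap in capabilities
        },
    }

peering_table = {}   # region -> latest summary received from that RC (refreshed every ~30s)

def route_cross_region(capability, local_answer):
    if local_answer is not None:
        return ("local", local_answer)
    for region, summary in peering_table.items():
        if summary["free_slots_by_capability"].get(capability, 0) > 0:
            return ("peer", region)        # direct RC-to-RC handoff, no CP in the path
    return ("reject", None)
```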
CR co-signing creates the economic audit trail. The RC co-signs every CR for sessions it routed. Settlement splits: 75% to node operator, 15% to routing operator, 10% to protocol layer. This is the ATM interchange model.
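The split itself is simple arithmetic; a worked example on an assumed per-session fee:

```python
# Worked example of the settlement split on a co-signed CR: 75% node operator,
# 15% routing operator, 10% protocol. The fee amount is an assumed figure.
def settle(cr_fee_usd):
    return {
        "node_operator": round(cr_fee_usd * 0.75, 4),
        "routing_operator": round(cr_fee_usd * 0.15, 4),   # the RC that co-signed the CR
        "protocol": round(cr_fee_usd * 0.10, 4),
    }

settle(0.20)   # a $0.20 session fee -> {'node_operator': 0.15, 'routing_operator': 0.03, 'protocol': 0.02}
```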
Progressive decentralization path: Phase 1 (now) -- ship hierarchical, operator-deployed RCs. Phase 2 (after PMF) -- add an RC election protocol, region by region. Phase 3 (protocol maturity) -- replace the CP with a BFT validator ring, remove Kafka; TerraMater can disappear and the network continues. This is the Helium path.
Sovereignty
Nodes are locally owned. Data stays local through routing policies. An open deployment standard provides the spec for data center nodes that any operator can build and certify.
Related
- mesocosm-ecosystem — The full ecosystem architecture
- verification-infrastructure — What OpenGrid enables at the verification layer
- four-protocol-layers — The protocol stack OpenGrid implements
- platform-vs-protocol — Why open distributed beats proprietary centralized
- distributed-abundance — The thesis outcome OpenGrid enables