§ 01 / 06

Agent system · 26 specialists
Typed IO · deterministic bus
One job each · no monolith prompt

412 typed actions catalogued · 0 monolith prompts

exAI Agentic OS · agent system

Twenty-six agents.
One job each.

The agent system inside exAI is not one giant prompt. It is twenty-six small, sharp specialists — typed inputs, typed outputs, declared effects — moving across a deterministic bus.

A planner that only plans. A test runner that only runs tests. A rollback operator that only knows how to revert. Each agent is replay-safe, receipted, cancellable. The orchestrator composes them; reviewers read the trail like a ledger, not a transcript.

Browse the 26 · Register a custom agent

Active roster · #00214

7 categories

Live · agents on the bus

TestRunner · running · 3 shards · 412 / 412 green
MigrationPlanner · idle · last run 4h · 7 tables
ErrorTriager · running · 1 trace · classifying
PolicyAuditor · idle · next sweep 22:00
DeployRunner · parked · awaiting approval · SRE on-call

Agents

Each agent
knows one thing.

Specialists, not generalists. Twenty-six agents across seven categories. The orchestrator composes them into a typed DAG; reviewers can name every node and predict its blast radius. Generalist prompts do not survive a Fortune 100 audit.

01 · Planning · 4 agents

Read intent. Shape the DAG.

IntentParser: Business request → typed objective
TaskDecomposer: Objective → atomic, schedulable steps
DAGPlanner: Steps → typed dependency graph
ScopeNegotiator: Resolves scope & budget conflicts

02 · Generation · 4 agents

Write code. Write schema. Write tests.

CodeWriter: Typed source · file-by-file plan
SchemaDesigner: Postgres tables · indexes · FKs
MigrationWriter: Forward + backward SQL
TestWriter: Unit · integration · e2e

03 · Verification · 4 agents

Prove it green before it lands.

TypeChecker: tsc · strict · zero-error gate
TestRunner: Sharded across warm pool
IntegrationProber: Live deps · contract checks
RegressionHunter: Bisects flake from real failure

04 · Repair · 4 agents

Self-heal until the gate opens.

ErrorTriager: Stack trace → root-cause class
FlakeFixer: Re-runs · quarantines · proposes fix
PatchApplier: Hunk-level edits · type-safe
RebaseResolver: Conflict-aware merge strategy

05 · Packaging · 4 agents

Make the artifact reviewable.

ReleaseNotesWriter: Diff → human-readable notes
ChangelogCurator: Conventional commits · SemVer
DocsAssembler: API + guides · type-aware
ArtifactSigner: SBOM · cosign · attestation

06 · Governance · 3 agents

Receipts the auditor wants.

PolicyAuditor: OPA · Rego · per-tenant rules
ComplianceSweeper: SOC 2 · ISO · evidence drop
AccessReviewer: Least-privilege drift detector

07 · Operations · 3 agents

Land it. Watch it. Roll back if it slips.

DeployRunner: Blue/green · canary · zero-downtime
RollbackOperator: SLO breach → revert + receipt
ObservabilitySpotter: Anomaly · trace · log triage

7 categories · 26 specialists · 412 typed actions catalogued
Catalog version v2026.04 · agents.exai.dev/registry

§ 03 / 06

Typed IO · deterministic bus

A contract,
not a prompt.

Every agent declares input, output, effects. The bus refuses to dispatch anything that does not parse. A run is a directed graph of contracts — schedulable at plan-time, replayable byte-for-byte, receipted at every hop.

A contract names what an agent will read, what it will return, and what it will touch. The orchestrator type-checks the graph before any VM warms. Models change, prompts change — the seam between agents does not.
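To make the plan-time check concrete, here is a minimal sketch of how a graph of contracts could be validated before anything is dispatched. It is not the exAI orchestrator: the `Shape`, `Contract`, `Edge`, and `checkGraph` names are hypothetical, shapes are reduced to primitive type tags instead of zod schemas, the `TestWriter` output field is invented for illustration, and each edge is assumed to carry the consumer's whole input.

```typescript
// Hypothetical, simplified contract: field name -> primitive type tag.
type FieldType = "string" | "number" | "boolean";
type Shape = Record<string, FieldType>;

interface Contract {
  name: string;
  input: Shape;
  output: Shape;
  effects: string[];
}

interface Edge {
  from: string; // producer agent name
  to: string;   // consumer agent name
}

// Returns human-readable problems; an empty list means the graph may be scheduled.
function checkGraph(contracts: Contract[], edges: Edge[]): string[] {
  const byName: Record<string, Contract> = {};
  for (const c of contracts) byName[c.name] = c;
  const problems: string[] = [];
  const known: Edge[] = [];

  for (const edge of edges) {
    const producer = byName[edge.from];
    const consumer = byName[edge.to];
    if (!producer || !consumer) {
      problems.push(`unknown agent on edge ${edge.from} -> ${edge.to}`);
      continue;
    }
    known.push(edge);
    // Every field the consumer reads must be produced upstream with the same type.
    for (const field of Object.keys(consumer.input)) {
      const want = consumer.input[field];
      const got = producer.output[field];
      if (got !== want) {
        problems.push(`${edge.from} -> ${edge.to}: "${field}" expected ${want}, got ${got === undefined ? "nothing" : got}`);
      }
    }
  }

  // Kahn's algorithm: a DAG can be fully topologically sorted, a cyclic graph cannot.
  const indegree: Record<string, number> = {};
  for (const c of contracts) indegree[c.name] = 0;
  for (const e of known) indegree[e.to] += 1;
  const ready = contracts.map((c) => c.name).filter((n) => indegree[n] === 0);
  let sorted = 0;
  while (ready.length > 0) {
    const n = ready.pop() as string;
    sorted += 1;
    for (const e of known) {
      if (e.from !== n) continue;
      indegree[e.to] -= 1;
      if (indegree[e.to] === 0) ready.push(e.to);
    }
  }
  if (sorted < contracts.length) problems.push("graph contains a cycle");
  return problems;
}

// Usage: a two-node plan that type-checks, and one that loops back on itself.
const testWriter: Contract = { name: "TestWriter", input: {}, output: { shards: "number" }, effects: [] };
const testRunner: Contract = {
  name: "TestRunner",
  input: { shards: "number" },
  output: { pass: "number", fail: "number" },
  effects: ["vm.dispatch", "ledger.write"],
};
const ok = checkGraph([testWriter, testRunner], [{ from: "TestWriter", to: "TestRunner" }]);
const bad = checkGraph(
  [testWriter, testRunner],
  [{ from: "TestWriter", to: "TestRunner" }, { from: "TestRunner", to: "TestWriter" }],
);
```

The point of the sketch is the ordering: both checks run over declarations only, so a bad plan is rejected without invoking any agent.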

agents/test-runner.ts · contract · v3

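// Assumed import path, mirroring the "@exai/sdk" import in the custom-agent example.
import { defineAgent, z } from "@exai/sdk";

export const TestRunner = defineAgent({
  name: "TestRunner",
  category: "verification",
  // Parsed at the seam before run() is called.
  input: z.object({ shards: z.number() }),
  // Validated before the result leaves the agent.
  output: z.object({ pass: z.number(), fail: z.number() }),
  // Every side effect the agent may touch, named up front.
  effects: ["vm.dispatch", "ledger.write"],
  capability: 0.97,
  run: async (ctx, { shards }) => {
    // Fan the test shards out across the bus and collect the totals.
    const r = await ctx.bus.fanOut(shards);
    return { pass: r.pass, fail: r.fail };
  },
});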

Inputs: zod · parsed at the seam
Outputs: zod · validated before return
Effects: named · audited · receipted
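The three seams above (inputs parsed, outputs validated, effects named) can be sketched as one dispatch wrapper. This is an illustration under assumptions, not the exAI bus: `Parser`, `Ctx`, `dispatch`, and the `num` helper are hypothetical stand-ins, plain functions replace zod schemas so the block runs without dependencies, and `run` is synchronous for brevity where the real contract is async.

```typescript
// Hypothetical stand-in for a zod schema: returns the parsed value or throws.
type Parser<T> = (raw: unknown) => T;

interface Receipt {
  agent: string;
  effect: string;
}

interface Ctx {
  // In this sketch, the only way an agent can touch the outside world.
  effect: (name: string) => void;
}

interface AgentContract<I, O> {
  name: string;
  input: Parser<I>;
  output: Parser<O>;
  effects: string[];
  run: (ctx: Ctx, input: I) => O;
}

function dispatch<I, O>(agent: AgentContract<I, O>, raw: unknown, ledger: Receipt[]): O {
  const input = agent.input(raw); // fails closed: bad input never reaches run()
  const ctx: Ctx = {
    effect: (name) => {
      if (agent.effects.indexOf(name) === -1) {
        throw new Error(`${agent.name}: undeclared effect "${name}"`);
      }
      ledger.push({ agent: agent.name, effect: name }); // receipt per declared effect
    },
  };
  const result = agent.run(ctx, input);
  return agent.output(result); // validated before it leaves the agent
}

// Minimal field parser used by the example contract below.
function num(raw: unknown, field: string): number {
  if (typeof raw !== "object" || raw === null) throw new Error("expected an object");
  const v = (raw as Record<string, unknown>)[field];
  if (typeof v !== "number") throw new Error(`field "${field}" must be a number`);
  return v;
}

const testRunner: AgentContract<{ shards: number }, { pass: number; fail: number }> = {
  name: "TestRunner",
  input: (raw) => ({ shards: num(raw, "shards") }),
  output: (raw) => ({ pass: num(raw, "pass"), fail: num(raw, "fail") }),
  effects: ["vm.dispatch", "ledger.write"],
  run: (ctx, { shards }) => {
    ctx.effect("vm.dispatch");
    return { pass: shards, fail: 0 }; // placeholder result, not real test output
  },
};

const ledger: Receipt[] = [];
const result = dispatch(testRunner, { shards: 3 }, ledger);
```

An agent that calls `ctx.effect` with a name missing from its `effects` list throws before the effect is recorded, which is the behaviour the "named in the contract or it does not run" rule describes.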

Bus invariants

01 · Schedulable at plan-time.
Inputs, outputs, and effects are declared up front. The orchestrator rejects impossible graphs before a token is spent.

02 · Effects declared up front.
Network egress, filesystem writes, secret access: every side effect is named in the contract or it does not run.

03 · Deterministic replay.
Same input + same model + same tool ledger → byte-identical output. Replay any run, any time.

04 · Receipt on every call.
Tokens, latency, cost, model id, tool calls: all receipted to the orchestrator ledger.

05 · Cancellation propagates.
Cancel one node and its descendants halt, VMs snapshot, and budgets refund automatically.
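As an illustration of the cancellation invariant, the sketch below walks a run graph from the cancelled node, halts every unfinished descendant, and totals the budget handed back. The `RunNode` shape, the `reservedBudget` unit, and the `cancel` function are assumptions made for the example; VM snapshotting is left out.

```typescript
type NodeState = "pending" | "running" | "done" | "cancelled";

interface RunNode {
  id: string;
  state: NodeState;
  reservedBudget: number; // hypothetical unit, e.g. tokens reserved for this node
  children: string[];     // ids of nodes that depend on this one
}

interface CancelResult {
  halted: string[];
  refunded: number;
}

// Cancels rootId and every descendant that has not finished yet.
function cancel(nodes: Record<string, RunNode>, rootId: string): CancelResult {
  const halted: string[] = [];
  let refunded = 0;
  const seen: Record<string, boolean> = {};
  const stack: string[] = [rootId];

  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen[id]) continue; // a node may be reachable through several parents
    seen[id] = true;
    const node = nodes[id];
    if (!node) continue;

    if (node.state === "pending" || node.state === "running") {
      node.state = "cancelled";
      refunded += node.reservedBudget; // unspent reservation goes back to the pool
      node.reservedBudget = 0;
      halted.push(id);
    }
    // Finished nodes are left alone, but their descendants are still visited.
    for (const child of node.children) stack.push(child);
  }
  return { halted, refunded };
}

// Usage: TypeChecker -> TestRunner -> DeployRunner, cancelling in the middle.
const run: Record<string, RunNode> = {
  TypeChecker: { id: "TypeChecker", state: "running", reservedBudget: 100, children: ["TestRunner"] },
  TestRunner: { id: "TestRunner", state: "running", reservedBudget: 400, children: ["DeployRunner"] },
  DeployRunner: { id: "DeployRunner", state: "pending", reservedBudget: 50, children: [] },
};
const outcome = cancel(run, "TestRunner");
```

Cancelling `TestRunner` leaves the upstream `TypeChecker` untouched; only the cancelled node and what depends on it are halted.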

§ 04 / 06

Capability scoring · per-agent

Capability vs.
latency. Per agent.

Each agent carries a per-task capability score and a measured P50 latency. The router uses both, under a cost ceiling, to pick the model each agent runs on. Scores update on every receipt.

Top 5 by capability · last 7d

Score · cohort of 40 tenants

rolling · n=21,084 receipts

TestRunner · Verification · capability 0.97 · P50 0.4s
DAGPlanner · Planning · capability 0.95 · P50 0.8s
CodeWriter · Generation · capability 0.94 · P50 1.2s
MigrationWriter · Generation · capability 0.93 · P50 2.1s
ErrorTriager · Repair · capability 0.91 · P50 0.3s

Updated: on every receipt · live aggregate
Used by: router · planner · approval gate
Floor: 0.85 · below the floor, work routes to a human
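One way to picture the routing rule is the sketch below. The page states only that the router weighs capability and P50 latency under a cost ceiling, with a 0.85 floor. The per-model scores, the `costPerCall` field, the model ids, and the "highest capability wins, latency breaks ties" policy are all assumptions made for the example.

```typescript
interface ModelOption {
  model: string;         // hypothetical model id
  capability: number;    // rolling per-task score, 0..1
  p50LatencySec: number; // measured P50 latency in seconds
  costPerCall: number;   // hypothetical cost unit
}

type Route = { kind: "model"; model: string } | { kind: "human" };

const FLOOR = 0.85; // from the page: below the floor, work routes to a human

function route(options: ModelOption[], costCeiling: number): Route {
  // Keep only options that clear the capability floor and fit the budget.
  const eligible = options.filter(
    (o) => o.capability >= FLOOR && o.costPerCall <= costCeiling,
  );
  if (eligible.length === 0) return { kind: "human" };

  // Assumed policy: highest capability first, lower latency breaks ties.
  eligible.sort(
    (a, b) => b.capability - a.capability || a.p50LatencySec - b.p50LatencySec,
  );
  return { kind: "model", model: eligible[0].model };
}

// Usage with three made-up options.
const options: ModelOption[] = [
  { model: "small-fast", capability: 0.86, p50LatencySec: 0.3, costPerCall: 1 },
  { model: "large", capability: 0.97, p50LatencySec: 1.2, costPerCall: 5 },
  { model: "tiny", capability: 0.7, p50LatencySec: 0.1, costPerCall: 0.5 },
];
const generous = route(options, 10); // budget allows the strongest option
const tight = route(options, 2);     // budget excludes "large"
const starved = route(options, 0.6); // only "tiny" fits, and it is below the floor
```

A different policy (for example, cheapest option above the floor) would change only the sort comparator; the floor and ceiling filters stay the same.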

System totals

26 · Specialists in the catalog
412 · Typed actions catalogued
0 · Monolith prompts

How the score moves

· Pass: score lifts in proportion to receipt confidence.
· Fail: score decays; the agent retries on a stronger model.
· Drift: below the floor for 24h; the agent is quarantined for review.
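The three rules above can be sketched as a single update function. The page does not publish the actual formula, so the lift and decay rates, the `confidence` field on the receipt, and the `AgentScore` shape are assumptions chosen to make the behaviour visible. The retry on a stronger model is out of scope here.

```typescript
interface AgentScore {
  score: number;                  // 0..1
  belowFloorSince: number | null; // epoch ms when the score first dipped under the floor
  quarantined: boolean;
}

interface Receipt {
  passed: boolean;
  confidence: number; // 0..1, hypothetical receipt field
  at: number;         // epoch ms
}

const FLOOR = 0.85;                          // from the page
const DRIFT_WINDOW_MS = 24 * 60 * 60 * 1000; // "below floor for 24h"
const LIFT_RATE = 0.05;                      // assumed tuning constant
const DECAY_RATE = 0.1;                      // assumed tuning constant

function applyReceipt(prev: AgentScore, r: Receipt): AgentScore {
  const score = r.passed
    ? prev.score + LIFT_RATE * r.confidence * (1 - prev.score) // lift scales with confidence
    : prev.score * (1 - DECAY_RATE);                           // decay on failure

  // Track how long the score has been under the floor; recovering resets the clock.
  const belowFloorSince = score < FLOOR ? (prev.belowFloorSince ?? r.at) : null;

  const quarantined =
    prev.quarantined ||
    (belowFloorSince !== null && r.at - belowFloorSince >= DRIFT_WINDOW_MS);

  return { score, belowFloorSince, quarantined };
}

// Usage: one confident pass, then two failures a day apart.
const healthy: AgentScore = { score: 0.97, belowFloorSince: null, quarantined: false };
const lifted = applyReceipt(healthy, { passed: true, confidence: 1, at: 0 });

const shaky: AgentScore = { score: 0.9, belowFloorSince: null, quarantined: false };
const dipped = applyReceipt(shaky, { passed: false, confidence: 1, at: 1000 });
const drifted = applyReceipt(dipped, { passed: false, confidence: 1, at: 1000 + DRIFT_WINDOW_MS });
```

Because the function returns a new state instead of mutating the old one, the same receipts replayed in the same order give the same score.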

§ 05 / 06

Custom agents · same bus

Bring your own
specialist.

Your domain has agents the catalog will not ship — a claims router, a SAP migrator, a regulator-specific evidence packer. Register them on the same typed bus as the 26. Same contract, same receipts, same replay.

Define an agent in TypeScript. Declare its input, output, and the effects it touches. The bus refuses to dispatch anything the contract does not name. The orchestrator schedules custom agents the same way it schedules the catalog — replayable, receipted, auditable.

agents/claims-router.ts · custom · tenant-bound

import { registerAgent, z } from "@exai/sdk";

registerAgent({
  name: "ClaimsRouter",
  input: z.object({
    claim_id: z.string(),
    policy: z.enum(["auto", "home"]),
  }),
  output: z.object({ adjuster: z.string() }),
  effects: ["hr.read", "audit.write"],
  run: async (ctx, input) => {
    const a = await ctx.tools.hr.onCallFor(input.policy);
    return { adjuster: a.id };
  },
});

npm i @exai/sdk · zod peer
register from any Node 20+ runtime
tenant-scoped · never global

01 · TypeScript SDK

Strict types · no any.

Generic over input and output. Compile-time guarantee that the bus only sees the shape it agreed to.

02 · zod schemas

Validated at the seam.

Inputs parsed before run. Outputs parsed before return. Bad data fails closed at the boundary.

03 · Webhook bus

Same wire as native.

Custom agents sit on the same NATS bus as the 26. Signed envelopes. Bring your own runtime.

04 · Replay-safe

Pure runs · no clocks.

Deterministic by contract. Side-effects routed through a ledger so the run is rewindable.
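A small sketch may help show what "side-effects routed through a ledger" can mean in practice. It is an assumption-laden illustration, not the SDK's mechanism: `Tool`, `recording`, `replaying`, and the string-keyed call names are hypothetical. The agent reads the clock and the on-call roster only through the tool seam, so a replay can serve both from the ledger.

```typescript
interface LedgerEntry {
  call: string;
  result: unknown;
}

type Tool = (call: string) => unknown;

// Record mode: run the real tool and append each result to the ledger.
function recording(tool: Tool, ledger: LedgerEntry[]): Tool {
  return (call) => {
    const result = tool(call);
    ledger.push({ call, result });
    return result;
  };
}

// Replay mode: never touch the real tool; serve results from the ledger in order.
function replaying(ledger: LedgerEntry[]): Tool {
  let cursor = 0;
  return (call) => {
    const entry = ledger[cursor];
    if (!entry || entry.call !== call) {
      throw new Error(`replay diverged at step ${cursor}: "${call}"`);
    }
    cursor += 1;
    return entry.result;
  };
}

// An agent body with no direct clock or network access.
function routeClaim(tool: Tool, claimId: string): string {
  const now = tool("clock.now") as number;
  const adjuster = tool("hr.onCallFor:auto") as string;
  return `${claimId} -> ${adjuster} @ ${now}`;
}

// A deliberately non-repeatable live tool: every clock read returns a new value.
let tick = 0;
const liveTool: Tool = (call) => (call === "clock.now" ? ++tick * 1000 : "adj-" + tick);

const toolLedger: LedgerEntry[] = [];
const first = routeClaim(recording(liveTool, toolLedger), "claim-7");
const replayed = routeClaim(replaying(toolLedger), "claim-7");
const liveAgain = routeClaim(liveTool, "claim-7");
```

Here `first` and `replayed` are the same string, while `liveAgain` differs because the live clock has moved on. If the agent asked for calls in a different order during replay, `replaying` throws instead of returning mismatched data.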

§ 06 / 06

Closing · the agent system

Twenty-six specialists.
One governed pipeline.

The agent system is the part of exAI Agentic OS that survives the audit. Typed contracts, deterministic bus, replayable runs. Bring a monorepo migration, a compliance sweep, an internal portal. The bus will dispatch.
