v2026.04
exAI Agentic OS
§ 01 / 06
Agent system · 26 specialists
Typed IO · deterministic bus
One job each · no monolith prompt.
412 typed actions catalogued · 0 monolith prompts
exAI Agentic OS · agent system

Twenty-six agents.
One job each.

The agent system inside exAI is not one giant prompt. It is twenty-six small, sharp specialists — typed inputs, typed outputs, declared effects — moving across a deterministic bus.

A planner that only plans. A test runner that only runs tests. A rollback operator that only knows how to revert. Each agent is replay-safe, receipted, cancellable. The orchestrator composes them; reviewers read the trail like a ledger, not a transcript.

Active roster · #00214
7 categories
Live · agents on the bus
TestRunner · running · 3 shards · 412 / 412 green
MigrationPlanner · idle · last run 4h · 7 tables
ErrorTriager · running · 1 trace · classifying
PolicyAuditor · idle · next sweep 22:00
DeployRunner · parked · awaiting approval · SRE on-call
Fig. 01 · roster snapshot · 10:04:11
typed bus · NATS
§ 02 / 06
The 26 · catalog

Each agent
knows one thing.

Specialists, not generalists. Twenty-six agents across seven categories. The orchestrator composes them into a typed DAG; reviewers can name every node and predict its blast radius. Generalist prompts do not survive a Fortune 100 audit.
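
One way to read "predict its blast radius": take the effects a node declares, plus the effects of everything downstream of it. The sketch below assumes a hypothetical `PlanNode` shape and illustrative effect names such as `fs.write` and `deploy.canary`; the actual planner types are not shown on this page.

```typescript
// Hypothetical plan node. The shape and the effect names are assumptions.
interface PlanNode {
  agent: string;
  effects: string[];   // side effects the agent declares
  dependsOn: string[]; // agents that must finish first
}

// Blast radius = effects of the start node plus every node downstream of it.
function blastRadius(plan: PlanNode[], start: string): string[] {
  const byName = new Map<string, PlanNode>();
  for (const n of plan) byName.set(n.agent, n);

  const seen = new Set<string>();
  const effects = new Set<string>();
  const queue: string[] = [start];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (seen.has(current)) continue;
    seen.add(current);

    const node = byName.get(current);
    if (!node) throw new Error(`unknown node: ${current}`);
    node.effects.forEach((e) => effects.add(e));

    // Enqueue every node that depends on the current one.
    for (const n of plan) {
      if (n.dependsOn.indexOf(current) !== -1) queue.push(n.agent);
    }
  }
  return Array.from(effects).sort();
}

const plan: PlanNode[] = [
  { agent: "CodeWriter", effects: ["fs.write"], dependsOn: [] },
  { agent: "TestRunner", effects: ["vm.dispatch", "ledger.write"], dependsOn: ["CodeWriter"] },
  { agent: "DeployRunner", effects: ["deploy.canary"], dependsOn: ["TestRunner"] },
];

console.log(blastRadius(plan, "TestRunner"));
// [ 'deploy.canary', 'ledger.write', 'vm.dispatch' ]
```

Because every effect is declared in the contract, this is a plan-time calculation: nothing has to run for a reviewer to see what a node could touch.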

01 · Planning · 4

Read intent. Shape the DAG.

  • IntentParser: Business request → typed objective
  • TaskDecomposer: Objective → atomic, schedulable steps
  • DAGPlanner: Steps → typed dependency graph
  • ScopeNegotiator: Resolves scope & budget conflicts
02 · Generation · 4

Write code. Write schema. Write tests.

  • CodeWriter: Typed source · file-by-file plan
  • SchemaDesigner: Postgres tables · indexes · FKs
  • MigrationWriter: Forward + backward SQL
  • TestWriter: Unit · integration · e2e
03 · Verification · 4

Prove it green before it lands.

  • TypeChecker: tsc · strict · zero-error gate
  • TestRunner: Sharded across warm pool
  • IntegrationProber: Live deps · contract checks
  • RegressionHunter: Bisects flake from real failure
04 · Repair · 4

Self-heal until the gate opens.

  • ErrorTriager: Stack trace → root-cause class
  • FlakeFixer: Re-runs · quarantines · proposes fix
  • PatchApplier: Hunk-level edits · type-safe
  • RebaseResolver: Conflict-aware merge strategy
05 · Packaging · 4

Make the artifact reviewable.

  • ReleaseNotesWriter: Diff → human-readable notes
  • ChangelogCurator: Conventional commits · SemVer
  • DocsAssembler: API + guides · type-aware
  • ArtifactSigner: SBOM · cosign · attestation
06 · Governance · 3

Receipts the auditor wants.

  • PolicyAuditor: OPA · Rego · per-tenant rules
  • ComplianceSweeper: SOC 2 · ISO · evidence drop
  • AccessReviewer: Least-privilege drift detector
07 · Operations · 3

Land it. Watch it. Roll back if it slips.

  • DeployRunner: Blue/green · canary · zero-downtime
  • RollbackOperator: SLO breach → revert + receipt
  • ObservabilitySpotter: Anomaly · trace · log triage
7 categories · 26 specialists · 412 typed actions catalogued
Catalog version v2026.04 · agents.exai.dev/registry
§ 03 / 06
Typed IO · deterministic bus

A contract,
not a prompt.

Every agent declares input, output, effects. The bus refuses to dispatch anything that does not parse. A run is a directed graph of contracts — schedulable at plan-time, replayable byte-for-byte, receipted at every hop.

A contract names what an agent will read, what it will return, and what it will touch. The orchestrator type-checks the graph before any VM warms. Models change, prompts change — the seam between agents does not.
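
Type-checking the graph before anything runs can be pictured as an edge-by-edge comparison: every field a consumer expects must appear, with the same type, in its producer's output. The sketch below is dependency-free and rests on assumptions: schemas are reduced to plain field-to-type maps rather than zod objects, and `ShardPlanner` and `Summarizer` are hypothetical agents invented for the example.

```typescript
// Simplified stand-in for a schema: field name -> primitive type name.
// Assumption: real contracts use zod; plain maps keep this sketch self-contained.
type FieldType = "string" | "number" | "boolean";
type Shape = Record<string, FieldType>;

interface Contract { agent: string; input: Shape; output: Shape; }
interface Edge { from: string; to: string; }

// Return one message per mismatched field; an empty list means the graph is dispatchable.
function checkGraph(contracts: Contract[], edges: Edge[]): string[] {
  const byName = new Map<string, Contract>();
  for (const c of contracts) byName.set(c.agent, c);

  const errors: string[] = [];
  for (const e of edges) {
    const producer = byName.get(e.from);
    const consumer = byName.get(e.to);
    if (!producer || !consumer) {
      errors.push(`${e.from} -> ${e.to}: unknown agent`);
      continue;
    }
    for (const field of Object.keys(consumer.input)) {
      const want: FieldType | undefined = consumer.input[field];
      const got: FieldType | undefined = producer.output[field];
      if (got !== want) {
        errors.push(`${e.from} -> ${e.to}: field "${field}" expected ${want}, got ${got ?? "nothing"}`);
      }
    }
  }
  return errors;
}

const contracts: Contract[] = [
  { agent: "ShardPlanner", input: {}, output: { shards: "number" } }, // hypothetical
  { agent: "TestRunner", input: { shards: "number" }, output: { pass: "number", fail: "number" } },
  { agent: "Summarizer", input: { pass: "number", summary: "string" }, output: {} }, // hypothetical
];
const edges: Edge[] = [
  { from: "ShardPlanner", to: "TestRunner" },
  { from: "TestRunner", to: "Summarizer" },
];

console.log(checkGraph(contracts, edges));
// [ 'TestRunner -> Summarizer: field "summary" expected string, got nothing' ]
```

The second edge fails at plan-time because `TestRunner` never produces `summary`, so the mismatch surfaces before any agent is dispatched.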

agents/test-runner.ts · contract · v3
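// Assumption: defineAgent is exported by @exai/sdk alongside z.
import { defineAgent, z } from "@exai/sdk";

export const TestRunner = defineAgent({
  name: "TestRunner",
  category: "verification",
  // Input and output are zod schemas, parsed at the seam.
  input: z.object({ shards: z.number() }),
  output: z.object({ pass: z.number(), fail: z.number() }),
  // Every side effect the agent may touch is named here.
  effects: ["vm.dispatch", "ledger.write"],
  capability: 0.97,
  run: async (ctx, { shards }) => {
    // Fan the test shards out across the bus and tally the results.
    const r = await ctx.bus.fanOut(shards);
    return { pass: r.pass, fail: r.fail };
  },
});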
Inputs: zod · parsed at the seam
Outputs: zod · validated before return
Effects: named · audited · receipted
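
"Parsed at the seam" can be sketched as a wrapper that parses the raw input before the agent runs and parses the result before it leaves. The version below is a minimal, dependency-free illustration: `atSeam`, `parseShards`, and `parseTally` are hypothetical names, and the hand-written parsers stand in for zod schemas (zod's `schema.parse` follows the same return-or-throw contract). The run function is synchronous here for brevity.

```typescript
// A parser returns a typed value or throws.
type Parser<T> = (raw: unknown) => T;

// Wrap a run function so nothing unparsed gets in or out.
function atSeam<I, O>(
  parseIn: Parser<I>,
  parseOut: Parser<O>,
  run: (input: I) => unknown,
): (raw: unknown) => O {
  return (raw) => {
    const input = parseIn(raw);   // bad input fails before run
    return parseOut(run(input));  // bad output fails before return
  };
}

const parseShards: Parser<{ shards: number }> = (raw) => {
  if (typeof raw !== "object" || raw === null) throw new Error("input must be an object");
  const shards = (raw as { shards?: unknown }).shards;
  if (typeof shards !== "number") throw new Error("shards must be a number");
  return { shards };
};

const parseTally: Parser<{ pass: number; fail: number }> = (raw) => {
  if (typeof raw !== "object" || raw === null) throw new Error("output must be an object");
  const { pass, fail } = raw as { pass?: unknown; fail?: unknown };
  if (typeof pass !== "number" || typeof fail !== "number") {
    throw new Error("pass and fail must be numbers");
  }
  return { pass, fail };
};

// Stand-in run: pretends every shard passes.
const runTests = atSeam(parseShards, parseTally, ({ shards }) => ({ pass: shards, fail: 0 }));

console.log(runTests({ shards: 3 })); // { pass: 3, fail: 0 }
```

Calling `runTests({ shards: "3" })` throws before the run function is reached, which is the fail-closed behaviour the contract describes.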
Bus invariants
  • 01
    Schedulable at plan-time.
    Inputs, outputs, effects declared up-front. The orchestrator rejects impossible graphs before a token is spent.
  • 02
    Effects declared upfront.
    Network egress, filesystem writes, secret access — every side-effect named in the contract or it does not run.
  • 03
    Deterministic replay.
    Same input + same model + same tool ledger → byte-identical output. Replay any run, any time.
  • 04
    Receipt on every call.
    Tokens, latency, cost, model id, tool calls — receipted to the orchestrator ledger.
  • 05
    Cancellation propagates.
    Cancel one node — its descendants halt, VMs snapshot, budgets refund automatically.
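
Invariant 05 can be illustrated with a small recursive walk: cancelling a node marks it and everything downstream as cancelled and adds up the budget that is released. This is a sketch under assumptions: the `RunNode` shape, the budget units, and the choice to refund a running node's whole reservation are all illustrative, and VM snapshotting is left out.

```typescript
type Status = "pending" | "running" | "done" | "cancelled";

// Hypothetical run-graph node; field names are assumptions.
interface RunNode {
  id: string;
  status: Status;
  budget: number;     // budget reserved for this node (illustrative units)
  children: string[]; // nodes that consume this node's output
}

// Cancel a node and its descendants; return the total budget released.
function cancel(nodes: Map<string, RunNode>, id: string): number {
  const node = nodes.get(id);
  if (!node) throw new Error(`unknown node: ${id}`);
  // Finished or already-cancelled work is left untouched.
  if (node.status === "done" || node.status === "cancelled") return 0;

  node.status = "cancelled";
  let refunded = node.budget;
  for (const child of node.children) refunded += cancel(nodes, child);
  return refunded;
}

const nodes = new Map<string, RunNode>();
const seed: RunNode[] = [
  { id: "plan", status: "done", budget: 1, children: ["write"] },
  { id: "write", status: "running", budget: 5, children: ["test"] },
  { id: "test", status: "pending", budget: 3, children: ["deploy"] },
  { id: "deploy", status: "pending", budget: 2, children: [] },
];
for (const n of seed) nodes.set(n.id, n);

console.log(cancel(nodes, "write")); // 10
```

Only `write`, `test`, and `deploy` are cancelled; `plan` is upstream and already done, so it keeps its status and its budget.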
§ 04 / 06
Capability scoring · per-agent

Capability vs.
latency. Per agent.

Each agent carries a per-task capability score and a measured P50 latency. The router uses both, under a cost ceiling, to pick the model each agent calls. Scores update on every receipt.
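
A routing rule of this kind might look like the sketch below: drop options over the cost ceiling or under the capability floor, then prefer the highest capability and break ties on lower P50 latency. The ranking rule, the `ModelOption` shape, the model names, and the cost figures are all assumptions; only the 0.85 floor comes from this page.

```typescript
// Hypothetical per-agent routing option. All field names are assumptions.
interface ModelOption {
  model: string;
  capability: number;  // capability score of this agent on this model
  p50Seconds: number;  // measured P50 latency
  costPerCall: number; // illustrative cost units
}

// Pick the best eligible option, or null when nothing qualifies.
function pickModel(
  options: ModelOption[],
  costCeiling: number,
  floor: number = 0.85,
): ModelOption | null {
  const eligible = options.filter(
    (o) => o.costPerCall <= costCeiling && o.capability >= floor,
  );
  // Highest capability first; lower latency breaks ties.
  eligible.sort((a, b) => b.capability - a.capability || a.p50Seconds - b.p50Seconds);
  return eligible[0] ?? null;
}

const options: ModelOption[] = [
  { model: "model-small", capability: 0.88, p50Seconds: 0.4, costPerCall: 0.01 },
  { model: "model-medium", capability: 0.93, p50Seconds: 1.2, costPerCall: 0.05 },
  { model: "model-large", capability: 0.97, p50Seconds: 2.1, costPerCall: 0.2 },
];

console.log(pickModel(options, 0.1)?.model); // model-medium
console.log(pickModel(options, 0.005));      // null
```

A `null` result corresponds to the case where no model clears the floor within budget, which a caller could treat as a hand-off to a human.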

Top 5 by capability · last 7d
Score · cohort of 40 tenants
rolling · n=21,084 receipts
Agent · Category · Capability · P50 latency
TestRunner · Verification · 0.97 · 0.4s
DAGPlanner · Planning · 0.95 · 0.8s
CodeWriter · Generation · 0.94 · 1.2s
MigrationWriter · Generation · 0.93 · 2.1s
ErrorTriager · Repair · 0.91 · 0.3s
Updated: on every receipt · live aggregate
Used by: router · planner · approval gate
Floor: 0.85 · below floor · routes to human
System totals
Specialists in the catalog: 26
Typed actions catalogued: 412
Monolith prompts: 0
How the score moves
  • Pass: score lifts in proportion to receipt confidence.
  • Fail: score decays · agent retries on a stronger model.
  • Drift: below floor for 24h · agent quarantined for review.
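
The three rules above can be sketched as a single update function. The page gives the floor (0.85) and the drift window (24h) but not the formula, so the step size and the exact lift and decay expressions below are assumptions chosen only to show the shape of the logic.

```typescript
const FLOOR = 0.85;                   // from the page
const DRIFT_MS = 24 * 60 * 60 * 1000; // 24h, from the page
const STEP = 0.05;                    // assumption: how far one receipt moves the score

interface AgentScore {
  score: number;
  belowFloorSince: number | null; // epoch ms when the score first dipped under the floor
  quarantined: boolean;
}

interface Receipt {
  passed: boolean;
  confidence: number; // 0..1
  at: number;         // epoch ms
}

function applyReceipt(s: AgentScore, r: Receipt): AgentScore {
  // Pass: move toward 1, scaled by confidence. Fail: decay by a fixed fraction.
  const score = r.passed
    ? s.score + STEP * r.confidence * (1 - s.score)
    : s.score * (1 - STEP);

  // Track how long the score has been under the floor.
  const belowFloorSince = score < FLOOR ? (s.belowFloorSince ?? r.at) : null;

  // Drift: under the floor for the whole window means quarantine.
  const quarantined =
    s.quarantined || (belowFloorSince !== null && r.at - belowFloorSince >= DRIFT_MS);

  return { score, belowFloorSince, quarantined };
}

let state: AgentScore = { score: 0.86, belowFloorSince: null, quarantined: false };
state = applyReceipt(state, { passed: false, confidence: 1, at: 0 });
state = applyReceipt(state, { passed: false, confidence: 1, at: DRIFT_MS });
console.log(state.quarantined); // true
```

In this example the first failure pushes the score under the floor and starts the clock; the second failure arrives 24h later with the score still under the floor, so the agent is quarantined.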
§ 05 / 06
Custom agents · same bus

Bring your own
specialist.

Your domain has agents the catalog will not ship — a claims router, a SAP migrator, a regulator-specific evidence packer. Register them on the same typed bus as the 26. Same contract, same receipts, same replay.

Define an agent in TypeScript. Declare its input, output, and the effects it touches. The bus refuses to dispatch anything the contract does not name. The orchestrator schedules custom agents the same way it schedules the catalog — replayable, receipted, auditable.

agents/claims-router.ts · custom · tenant-bound
import { registerAgent, z } from "@exai/sdk";

registerAgent({
  name: "ClaimsRouter",
  input: z.object({
    claim_id: z.string(),
    policy: z.enum(["auto", "home"]),
  }),
  output: z.object({ adjuster: z.string() }),
  effects: ["hr.read", "audit.write"],
  run: async (ctx, input) => {
    // Look up the on-call adjuster for this policy type.
    const a = await ctx.tools.hr.onCallFor(input.policy);
    return { adjuster: a.id };
  },
});
npm i @exai/sdk · zod peer
register from any Node 20+ runtime
tenant-scoped · never global
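
"The bus refuses to dispatch anything the contract does not name" can be pictured as a gate that every side effect passes through. The sketch below is an assumption about how such a gate could work, not the SDK's implementation: `makeGate` and `perform` are hypothetical names, and the HR lookup is a stand-in value.

```typescript
// Build a gate from the effects a contract declares.
function makeGate(declared: readonly string[]) {
  const allowed = new Set<string>(declared);
  const used: string[] = []; // trail of effects actually performed

  return {
    // Run `action` only if `effect` is named in the contract.
    perform<T>(effect: string, action: () => T): T {
      if (!allowed.has(effect)) {
        throw new Error(`effect not declared in contract: ${effect}`);
      }
      used.push(effect);
      return action();
    },
    used,
  };
}

// Same declared effects as the ClaimsRouter contract above.
const gate = makeGate(["hr.read", "audit.write"]);

const adjuster = gate.perform("hr.read", () => "adj-007"); // stand-in for an HR lookup
console.log(adjuster, gate.used); // adj-007 [ 'hr.read' ]
```

An attempt such as `gate.perform("net.egress", ...)` throws before the action runs, and `used` doubles as the raw material for a receipt.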
01 · TypeScript SDK
Strict types · no any.

Generic over input and output. Compile-time guarantee that the bus only sees the shape it agreed to.

02 · zod schemas
Validated at the seam.

Inputs parsed before run. Outputs parsed before return. Bad data fails closed at the boundary.

03 · Webhook bus
Same wire as native.

Custom agents sit on the same NATS bus as the 26. Signed envelopes. Bring your own runtime.

04 · Replay-safe
Pure runs · no clocks.

Deterministic by contract. Side-effects routed through a ledger so the run is rewindable.
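
Routing side effects through a ledger is what makes a run rewindable: in live mode each effect is performed and its result recorded; in replay mode the recorded result is returned and the effect is never performed. The sketch below illustrates that idea under assumptions. `makeEffects`, the ledger shape, and the effect keys are hypothetical, and the clock is treated as just another effect so the run itself never reads time directly.

```typescript
type Ledger = { key: string; value: string }[];
type Effect = (key: string, perform: () => string) => string;

// Live: perform and record. Replay: return the recording, never perform.
function makeEffects(ledger: Ledger, mode: "live" | "replay"): Effect {
  let cursor = 0;
  return (key, perform) => {
    if (mode === "live") {
      const value = perform();
      ledger.push({ key, value });
      return value;
    }
    const entry = ledger[cursor];
    cursor += 1;
    if (!entry || entry.key !== key) {
      throw new Error(`replay diverged at step ${cursor}: expected ${key}`);
    }
    return entry.value;
  };
}

// The run touches the outside world only through `effect`.
function routeClaim(effect: Effect): string {
  const now = effect("clock.now", () => new Date().toISOString());
  const owner = effect("hr.read", () => "adj-007"); // stand-in lookup
  return `${owner}@${now}`;
}

const ledger: Ledger = [];
const first = routeClaim(makeEffects(ledger, "live"));
const second = routeClaim(makeEffects(ledger, "replay"));

console.log(first === second); // true
```

The replay throws if the run asks for effects in a different order than the ledger recorded, which turns any non-determinism into a visible failure.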

§ 06 / 06
Closing · the agent system

Twenty-six specialists.
One governed pipeline.

The agent system is the part of exAI Agentic OS that survives the audit. Typed contracts, deterministic bus, replayable runs. Bring a monorepo migration, a compliance sweep, an internal portal. The bus will dispatch.

Agents
26
Categories
07
Typed actions
412
Monolith prompts
00
SOC 2 Type II · ISO 27001 · GDPR · DPF · FedRAMP
Live · agents.exai.dev · v2026.04 · status nominal
Specialists, on a bus. The opposite of one large prompt.