30-hour runs.
Replayable byte-identical.
Hand the Orchestrator a monorepo migration, a framework upgrade, a compliance sweep. A typed DAG of 26 specialized agents runs inside isolated microVMs and shows its work at every hop.
Pause it. Resume it next week. Replay it byte-identical from the ledger. Every checkpoint a receipt, every gate a signature, every retry a reason.
Intake to merge.
One graph.
The Orchestrator plans the work as a typed DAG, fans out to a warm microVM pool, snapshots state every 60 seconds, parks at human gates, and resumes byte-identical from the ledger. Below: a real run, edited for readability.
Five rules that
hold at hour 19.
Long-running agent work fails in predictable ways: untyped graphs that deadlock, lost state on restart, sequential dispatch, missing approval surfaces, runaway spend. The Orchestrator answers each one as a first-class invariant.
- Typed DAG. Every agent declares inputs, outputs, and effects. The scheduler rejects impossible graphs at plan time, not at hour 19.
- Durable checkpoints. State, VM disk, and the tool-call ledger snapshot to object storage every 60 seconds. Crash today, resume Thursday.
- Parallel by default. Independent nodes fan out across a warm Firecracker pool. Wall-clock hours, not days.
- Human-in-the-loop gates. Mark any node approval-required. The Orchestrator parks, pings Slack with the diff, and resumes hours or weeks later.
- Receipted budgets. Per-run token and compute ceilings with live receipts. Aborts cleanly before overshoot and rolls back to the last gate.
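To make the first rule concrete, here is a minimal sketch of what a plan-time check could look like. It is not the `@exai/orchestrator` API: `NodeSpec`, `Edge`, `PortType`, and `validatePlan` are hypothetical names, and the type system is reduced to three primitive port types. The sketch rejects edges whose port types disagree and uses Kahn's algorithm to detect cycles before anything is dispatched.

```typescript
// Hypothetical shapes for illustration only; not the Orchestrator's real types.
type PortType = "string" | "number" | "boolean";

interface NodeSpec {
  name: string;
  inputs: Record<string, PortType>;
  outputs: Record<string, PortType>;
}

interface Edge {
  from: string;   // producing node
  output: string; // output port on the producer
  to: string;     // consuming node
  input: string;  // input port on the consumer
}

// Returns a list of plan-time errors; an empty list means the graph is schedulable.
function validatePlan(nodes: NodeSpec[], edges: Edge[]): string[] {
  const errors: string[] = [];
  const byName = new Map<string, NodeSpec>();
  for (const n of nodes) byName.set(n.name, n);

  // 1. Every edge must connect declared ports of the same type.
  for (const e of edges) {
    const src = byName.get(e.from);
    const dst = byName.get(e.to);
    if (!src || !dst) {
      errors.push(`unknown node on edge ${e.from} -> ${e.to}`);
      continue;
    }
    const outT = src.outputs[e.output];
    const inT = dst.inputs[e.input];
    if (outT === undefined) errors.push(`${e.from} has no output "${e.output}"`);
    else if (inT === undefined) errors.push(`${e.to} has no input "${e.input}"`);
    else if (outT !== inT) {
      errors.push(`type mismatch ${e.from}.${e.output} (${outT}) -> ${e.to}.${e.input} (${inT})`);
    }
  }

  // 2. Kahn's algorithm: if some node can never become ready, the graph has a cycle.
  const indegree = new Map<string, number>();
  for (const n of nodes) indegree.set(n.name, 0);
  for (const e of edges) {
    if (byName.has(e.from) && byName.has(e.to)) {
      indegree.set(e.to, (indegree.get(e.to) ?? 0) + 1);
    }
  }
  const ready = nodes.filter((n) => indegree.get(n.name) === 0).map((n) => n.name);
  let scheduled = 0;
  while (ready.length > 0) {
    const name = ready.pop()!;
    scheduled++;
    for (const e of edges) {
      if (e.from !== name || !byName.has(e.to)) continue;
      const d = (indegree.get(e.to) ?? 0) - 1;
      indegree.set(e.to, d);
      if (d === 0) ready.push(e.to);
    }
  }
  if (scheduled < nodes.length) errors.push("graph contains a cycle");
  return errors;
}
```

A real scheduler would also have to reason about effects and richer schemas; the point here is only that both failure modes are detectable before hour zero.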
import { defineNode } from "@exai/orchestrator";
import { z } from "zod";

export const migratePackage = defineNode({
  name: "migrate-package",
  // Typed inputs and outputs, checked at plan time.
  inputs: {
    repo: z.string(),
    package: z.string(),
    from: z.string(),
    to: z.string(),
  },
  outputs: {
    diff: z.string(),
    coverage: z.number(),
  },
  // Side effects this node is allowed to have.
  effects: ["fs", "net"],
  // Per-node ceilings: tokens and compute-hours.
  budget: { tokens: 2_000_000, compute_h: 2 },
  retry: { max: 3, backoff: "exp" },
  // Park here until a human approves.
  gate: "merge-approval",
  async run({ input, vm, log }) {
    const result = await vm.codemod(input.from, input.to);
    log.checkpoint("codemod-applied");
    return result;
  },
});

Hand it the work
nobody wants overnight.
Four shapes of long-running work that the Orchestrator owns end-to-end. None of it requires a babysitter. All of it produces a PR, a binder, or a receipt at the other side.
- Turbo, Nx, pnpm, npm workspaces. P50 wall-clock 18h 42m, fully unattended, single PR per package.
- React 18 → 19, Next 14 → 15, Rails 7 → 8. Codemods + manual escapes + e2e per package boundary.
- SOC 2 evidence quarterly, unattended. Pulls audit logs, runs control attestations, files the binder.
- Re-run a flaky suite a thousand times, bisect to first failing commit, open the PR with the repro.
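The flaky-suite hunt is the most algorithmic of the four, so here is a rough sketch of the idea: a binary search over commits where each probe re-runs the suite several times and treats any failure as "bad". The function name `firstFailingCommit`, the `runSuite` callback, and the `reruns` parameter are hypothetical, and the sketch assumes history is ordered oldest to newest and that every commit after the first bad one is also bad.

```typescript
// Hypothetical helper; runSuite returns true when the suite passes.
function firstFailingCommit(
  commits: string[],
  runSuite: (commit: string) => boolean,
  reruns: number,
): string | null {
  // A commit counts as bad if any of its reruns fails.
  const isBad = (commit: string): boolean => {
    for (let i = 0; i < reruns; i++) {
      if (!runSuite(commit)) return true;
    }
    return false;
  };

  // Nothing to bisect if the newest commit never fails within the rerun budget.
  if (commits.length === 0 || !isBad(commits[commits.length - 1])) return null;

  let lo = 0;
  let hi = commits.length - 1; // invariant: commits[hi] is bad
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (isBad(commits[mid])) hi = mid;
    else lo = mid + 1;
  }
  return commits[lo];
}
```

The rerun count trades cost for confidence: a failure that shows up once in a thousand runs needs far more reruns per probe than one that shows up once in ten, otherwise a bad commit can be misread as good.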
Spend has a ceiling.
Receipts ship hourly.
Every run is bounded. Per-node tokens, compute-hours, and model spend are declared up front. The Orchestrator emits a streaming receipt, aborts cleanly before overshoot, and rolls back to the last checkpoint with a signed reason.
The Orchestrator treats budget like a typed input. You declare ceilings on the run — total tokens, total compute-hours, total model spend — and the scheduler refuses to dispatch a node whose worst-case cost would breach the envelope.
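As an illustration of that refusal rule, here is a minimal sketch of a pre-flight check. The names `Cost`, `Dimension`, and `preflight` are assumptions for this example, not the Orchestrator's API, and `committed` is assumed to mean everything already spent plus the worst case reserved for nodes still in flight.

```typescript
// Hypothetical budget dimensions, mirroring tokens, compute-hours, and model spend.
type Dimension = "tokens" | "computeHours" | "spendUsd";
type Cost = Record<Dimension, number>;

const DIMENSIONS: Dimension[] = ["tokens", "computeHours", "spendUsd"];

type Preflight = { ok: true } | { ok: false; breached: Dimension[] };

// Refuse dispatch if committed cost plus the node's worst case exceeds any ceiling.
function preflight(ceiling: Cost, committed: Cost, worstCase: Cost): Preflight {
  const breached = DIMENSIONS.filter(
    (d) => committed[d] + worstCase[d] > ceiling[d],
  );
  return breached.length === 0 ? { ok: true } : { ok: false, breached };
}

// Example: the token ceiling is nearly used up, so a 2M-token node is refused.
const decision = preflight(
  { tokens: 10_000_000, computeHours: 30, spendUsd: 500 },
  { tokens: 9_000_000, computeHours: 10, spendUsd: 100 },
  { tokens: 2_000_000, computeHours: 2, spendUsd: 20 },
);
```

Checking the worst case rather than the expected case is what lets the refusal happen before dispatch instead of mid-node.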
Receipts stream hourly to your finance webhook with the actual tokens-by-model breakdown, the wall-clock per VM, and the per-node line item. When a cap is approached, the Orchestrator drains the work in flight, snapshots state, and aborts cleanly: no zombie shards, no orphaned VMs.
- Pre-flight cost estimation per node, refused before dispatch.
- Hourly streaming receipts to webhook · Slack · S3.
- Hard ceiling triggers drain → snapshot → clean abort.
- Resume from the last gate with a fresh budget on rerun.
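For a sense of what an hourly receipt might carry, here is a small sketch that rolls per-node line items up into a tokens-by-model breakdown and a wall-clock total per VM. The `LineItem` and `Receipt` shapes and the `buildReceipt` function are assumptions made for illustration; the actual receipt schema is not shown on this page.

```typescript
// Hypothetical receipt shapes; field names are illustrative only.
interface LineItem {
  node: string;
  vm: string;
  model: string;
  tokens: number;
  wallClockSec: number;
}

interface Receipt {
  tokensByModel: Record<string, number>;
  wallClockSecByVm: Record<string, number>;
  lineItems: LineItem[];
}

// Aggregate one reporting window's line items into a receipt.
function buildReceipt(items: LineItem[]): Receipt {
  const tokensByModel: Record<string, number> = {};
  const wallClockSecByVm: Record<string, number> = {};
  for (const it of items) {
    tokensByModel[it.model] = (tokensByModel[it.model] ?? 0) + it.tokens;
    wallClockSecByVm[it.vm] = (wallClockSecByVm[it.vm] ?? 0) + it.wallClockSec;
  }
  return { tokensByModel, wallClockSecByVm, lineItems: items };
}
```

Keeping the raw line items alongside the rollups means a finance consumer can recompute either total independently.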
Hand it a 30-hour
problem. Sleep.
Typed DAG. Sixty-second checkpoints. Parallel by default. Human-in-the-loop gates. Receipted budgets. The runtime that owns long-running agent work — built for the platform engineers actually shipping migrations at scale.