Operator memory.
Real numbers.
Forty platform teams, eighteen industries, one agentic runtime — measured at ninety days, not at the demo.
These pages are not testimonials. They are receipts. Onboarding-time reductions, AI-PR throughput, cold-start P50, contractual uptime, and dollar savings per seat, all collected from the same telemetry the customer ships to their own SIEM. Pick one. We will put you on the phone with the engineer who lived it.
Median across the cohort. Outliers redacted at the request of customer counsel; full datasets available under NDA at /benchmarks.
We retired a forked VS Code, two point tools, and an internal Gitpod. Ramp-up dropped from six weeks to three days, and our platform team stopped maintaining the IDE.
A forked VS Code distribution maintained by a five-person platform team, two point-tool subscriptions for AI completions and review, and an internally hosted Gitpod cluster running on EKS.
GitHub Enterprise, Datadog, Buildkite, the SOC 2 control catalog, the existing Okta-fronted SSO, and the Friday demo cadence — exAI dropped in beneath them, not on top.
Reviewer trust climbed before throughput did. The platform team had budgeted six months for cultural adoption — Composer's plan-first surface earned the room in three weeks.
Karim leads developer platform for a Fortune 50 fintech with 4,812 engineers across nine product divisions. The cohort he represents is the hardest one we sell into — incumbent tooling, entrenched vendors, an internal IDE team with twelve years of accumulated culture. He took the reference call on the record.
Reviewer trust.
Container root.
Migration scale.
Three differently sized organizations, three differently shaped problems, the same agentic runtime underneath. Each operator picked the receipt that mattered to their seat.
“Composer's plan step is the first AI coding feature I've seen that earns reviewer trust.”
“Firecracker per workspace means I stopped having the 'why does every dev have Docker root' conversation.”
“The Orchestrator ran a 30-hour Next.js 14→15 migration across 184 apps. Two human gates. One PR per app. Zero rollbacks.”
What 40 platform teams
actually measured.
Pulled from the customer's own telemetry, not from a survey. Ninety-day rolling windows. Outliers trimmed at p2 / p98 before medians were computed. The full cohort dataset is published, anonymized, at /benchmarks.
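For readers who want to check the arithmetic against their own telemetry, the trimming step can be sketched roughly as follows. This is a minimal illustration, not the pipeline behind /benchmarks: the function names, the inclusive percentile method, and the synthetic sample are all assumptions made for the example.

```python
from statistics import median, quantiles


def trim_outliers(values, lower_pct=2, upper_pct=98):
    """Keep only the values between the lower and upper percentile cut points.

    Assumption: percentiles use the 'inclusive' interpolation method; the
    source does not say which method the published medians were built on.
    """
    if len(values) < 2:
        raise ValueError("need at least two data points to compute percentiles")
    # quantiles(..., n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [v for v in values if low <= v <= high]


def trimmed_median(values, lower_pct=2, upper_pct=98):
    """Median computed after trimming at the given percentiles."""
    return median(trim_outliers(values, lower_pct, upper_pct))


# Synthetic sample (hypothetical, not customer data): 100 ordinary readings
# plus one extreme value that the p98 cut removes.
sample = list(range(1, 101)) + [10_000]
kept = trim_outliers(sample)
print(len(kept), max(kept), trimmed_median(sample))
```

Trimming before taking the median matters most for small cohorts, where a single extreme reading can shift the middle of the distribution; with larger samples the median is already fairly robust and the trim mainly affects the reported range.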
Talk to one of them.
On the record.
Pick the seat closest to yours — CISO, principal engineer, head of platform — and we will route the call. No pre-screening, no vendor-side talking points, no NDA on the conversation. The numbers above are the ones they will recite.