PANTHEON

Running AI agents safely — on other people's money and data.

One builder. Live in production. The assistant below is the product — governed by the very thing it's selling. Try to make it misbehave; the panel shows it refusing, in real time.

Try to break the assistant ↓

↑ This hero is bespoke generative art, rendered by PANTHEON's own generator. So is the favicon, the social card, and the assistant's avatar. The page is built by the thing it describes.

Live · not a recording

Don't take my word for it. Ask it.

This assistant is a PANTHEON resident — tenant-isolated, governed, and metered like any other. The panel shows the real machinery for each turn. Try to break it.

capability envelope: loading…
PANTHEON assistant
resident · pantheon-labs · governed
🔒 the crown jewel · live

Try to breach tenant isolation. Watch it hold.

Two sandbox tenants, each with a private note. A shared tool, owned by Alpha, granted to Beta. When Beta calls it — even asking explicitly for Alpha's data — Postgres returns nothing, because the tool runs under Beta's scope. The "even through a shared tool" claim, run live against a real database with row-level security and a NOBYPASSRLS role. Not a script — hit the button.

⛔ governance · live

Watch a consequential action get governed.

Ask the sandbox to do something with stakes — publish a public announcement (tier-3). It doesn't just run. It's classified, checked against the kill switch, and parked for a human. Authorize, then act. Flip the kill switch and watch tier-3 refuse before it can even queue.

kill switch

Every number on this page, with the asterisk already attached.

Most builders pad. I'd rather you trust the parts that are real than be impressed by parts that aren't.

1builder* not a team
~550tests green* not a formal proof
35 → 0adversarial agents, zero criticals* self-run, not third-party
1production instance* by choice — scale is the unlearned lesson
2 → 1residents, one spine* composition, proven by lint

The system is honest because the person who built it is — and I wired that conviction into the code.

Every business it builds gets its own art.

Free, instant, deterministic from the name — no stock photos, no image generation. Type any business and watch its identity render. Same generator that made this page's hero.

generated favicongenerated avatar

The problem nobody wants to own

Frameworks help you build an agent. They leave you the part that matters the moment an agent touches the real world: running it on behalf of other people — spending their money, touching their data, taking consequential actions — without leaking across customers or doing something irreversible and wrong.

Tenancy

One customer's data unreachable to another's agent — even through a shared tool.

Autonomy

Letting an agent act, not just chat — with a way to not act catastrophically.

Governance

Consequential actions authorized, approved, reversible. Nothing ships ungated.

Economics

Spend bounded per tenant — the meter as a throttle, not just a bill.

The hard problems it solves

01 Isolation that survives composition

A tool owned by tenant A, called by tenant B, runs under B's scope, never A's. Proven: A can't read B, even through a tool A doesn't own.

02 Consequential action, safely

Capability envelopes, an approval queue (authorize-then-act), a kill switch, soft-launch, and a crisis protocol — wired at every entry point.

03 Composition without conflating trust

Internal = in-process registry (fast, tenant-scoped). External = MCP, both directions. MCP is never the internal bus.

04 Output that provably works

A quality gate: generate → verify → repair → ship. The model proposes; the backend enforces the schema.

05 Platform, not product

A vertical is composition, not a rebuild. Two unrelated residents ride one spine — and the purity lint proves it.

06 Economics as a safety primitive

Credits decremented atomically; a turn deflected to crisis resources is refunded. You don't bill someone in distress.

Want the depth on any of these? Ask the assistant — it answers from the same knowledge, and tells you when it doesn't have something.

the Studio · real output

It builds the whole thing — site, art, and a governed assistant.

Every card below is a real, live site PANTHEON generated from a few sentences: themed, with bespoke generative art and a working assistant. Different trades, different looks, one substrate. Open any of them.

The proof

Running in production

A no-code Studio takes a non-technical owner from "describe your business" to a themed site with a governed assistant.

Audited adversarially

A 35-agent self-run audit, nine dimensions, every finding independently verified: zero criticals, no cross-tenant breach. ~550 tests green; purity enforced.

This page

The assistant is a governed resident; the trace shows its real auth gate, scope, and meter. Even this page's art is generated by the substrate.

eval · live · ~15s

The audit, reproducible. Run it yourself.

A governance eval — adversarial cases scored pass/fail against the real assistant and the live sandbox, right now. Not a badge; a button. Same shape as the 35-agent audit that hardened it: probe, then verify.

What the audit actually found.

"35 agents, zero criticals" with the body attached: nine dimensions, every finding verified by a separate agent before it counted, then fixed and redeployed. Three of the real ones:

HIGH Secrets in the image

The .dockerignore matched paths with no basename fallback, so the token-signing secrets shipped inside the gateway image. Fixed: recursive **/.env* + rotation.

MED Rate-limit drain

The limiter keyed on the rotating anonymous token — a fresh token per render made the cap illusory. Fixed: key on the verified tenant.

MED Crisis filed as a lead

A distressed user could be captured as a sales lead. Fixed: the crisis flag now short-circuits lead capture.

Nine dimensions: isolation · auth · injection · money · governance · generative-engine · frontend · completeness · infra. Verdict: zero critical, no cross-tenant breach, no live auth bypass — every edge fixed.

What I'm not claiming

PANTHEON is advanced in architecture and verified by a self-run audit, in a corner of the space — governed multi-tenant agent infrastructure — that is genuinely hard and under-built. It is not a research artifact, and it is early on scale: single-instance today, by choice. Design is the most copyable thing in software. It's a window, not a victory. That candour is the discipline that built it.