Most enterprises sit between Stage 02 and Stage 03. Production-readiness lives at Stage 05.
What changes as you climb
Immature vs. mature, measured
Six control surfaces decide whether an agent can safely run in production.
| Control surface | Immature (Stage 01–02) | Maturing (Stage 03) | Mature (Stage 04–05) |
|---|---|---|---|
| Inventory | No list. Nobody can say which agents exist, what they touch, or where they run. | A partial register: the important agents are tracked, the long tail is not. | A complete, current registry. Every agent has a purpose, an owner, and a defined scope of access. |
| Ownership | Personal accounts. No named human is accountable for any agent’s behaviour. | Owners assigned to some agents, inconsistently, with gaps at handover. | Every agent has a named owner accountable for behaviour, change, incident response, and risk acceptance. |
| Runtime visibility | None. What an agent does between input and output is invisible. | Some logging, reviewed after the fact when someone goes looking. | Every tool call, data access, and decision is logged, attributable, and visible in real time. |
| Access control | Personal tokens with broad scopes. Agent identity is indistinguishable from a user’s. | Scopes assigned, but static and hard to change without touching code. | Least-privilege scopes granted, narrowed, and revoked without code changes. Agent identity is distinct from user identity. |
| Evaluations | None, or a single check before launch that is never repeated. | Ad hoc pre-deployment testing, with little visibility once an agent is live. | Quality, safety, and policy compliance tested continuously in production. Drift and regressions caught automatically. |
| Governance | No policy, no audit path, no way to stop an agent cleanly. | Reviews and policy exist, but enforcement drifts behind real behaviour. | Runtime enforcement: policy that blocks, scopes, or escalates the instant an agent acts. |
Stage by stage
The five stages in detail
Each stage is defined by its primary risk and the single control that moves you on. You do not skip stages — you close the gap in front of you.
Stage 01: Experimental
A handful of engineers and analysts run agents from personal accounts to save themselves time. Nothing is registered, reviewed, or shared. Ask "which agents are running here?" and nobody can answer.
Nothing is written down. Capability lives in individual heads and personal API keys.
Primary risk: no inventory, personal tokens, no review. Agents are shadow infrastructure from day one.
Next control: start an inventory, even a spreadsheet. You cannot govern what you cannot list.
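As a rough illustration of how small a first inventory can be, the sketch below models one row per agent. The field names (`purpose`, `systems`, `runs_on`) and the sample agents are assumptions made for this example, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    """One row of a hypothetical agent inventory."""
    name: str
    purpose: str
    systems: list[str] = field(default_factory=list)  # what the agent touches
    runs_on: str = ""                                  # where it runs


def agents_touching(inventory: list[AgentRecord], system: str) -> list[str]:
    """Answer 'which agents touch this system?' from the inventory alone."""
    return [a.name for a in inventory if system in a.systems]


# Illustrative entries only; a real list comes from asking each team.
inventory = [
    AgentRecord("invoice-triage", "Sort inbound invoices", ["finance-inbox"], "analyst laptop"),
    AgentRecord("lead-summariser", "Summarise new leads", ["crm"], "personal cloud account"),
    AgentRecord("renewal-nudger", "Draft renewal emails", ["crm", "email"], "team server"),
]

print(agents_touching(inventory, "crm"))
```

The same three questions from the table (which agents exist, what they touch, where they run) become simple lookups once the list exists, whatever tool holds it.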
Stage 02: Shared
One useful agent becomes four. It spreads from the team that built it into marketing, sales, support, and finance. People copy prompts and tokens; everyone has a slightly different version.
Agents are now organisational tools, but ownership is diffuse and identity is borrowed from whoever set them up.
Primary risk: fragmented ownership and inconsistent standards. No single team is accountable for any given agent.
Next control: assign named owners. Move agents off personal tokens and onto scoped, revocable identities.
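One way to picture a scoped, revocable identity is as data that sits beside the agent rather than inside its code. The sketch below is a minimal illustration; the `AgentIdentity` class and the scope strings such as `crm:read` are hypothetical, not the API of any particular identity provider.

```python
from dataclasses import dataclass, field


@dataclass
class AgentIdentity:
    """A hypothetical agent identity, distinct from any user's account."""
    agent: str
    owner: str                                   # the named, accountable human
    scopes: set[str] = field(default_factory=set)
    revoked: bool = False

    def allows(self, scope: str) -> bool:
        """An action is permitted only if the identity is live and holds the scope."""
        return not self.revoked and scope in self.scopes


identity = AgentIdentity("lead-summariser", owner="a.khan", scopes={"crm:read", "crm:write"})

# Narrowing and revoking are changes to data, not to the agent's code.
identity.scopes.discard("crm:write")
print(identity.allows("crm:read"), identity.allows("crm:write"))

identity.revoked = True
print(identity.allows("crm:read"))
```

Because the owner and the scopes live on the identity, a handover or a permissions change does not require anyone to find and edit a copied token.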
Stage 03: Operational
Agents are wired into the CRM, the data warehouse, file storage, and internal APIs. They move real work and affect real records, but there is no way to see or intervene while they do it.
Agents have left the sandbox. They read and write production systems, so their mistakes are now production incidents.
Primary risk: workflow impact without runtime intervention. When something goes wrong, you find out afterwards, if at all.
Next control: add runtime visibility. Log every tool call, data access, and decision, attributable and in real time.
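A minimal sketch of attributable tool-call logging, assuming the agent's tools are plain Python functions. The `audited` decorator, the in-memory `AUDIT_LOG`, and the sample tool are all hypothetical; a real deployment would ship entries to a proper log pipeline.

```python
import functools
import json
import time

AUDIT_LOG: list[dict] = []  # stand-in for a real log sink


def audited(agent: str, owner: str):
    """Wrap a tool so every call is recorded with who made it and how it ended."""
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            entry = {
                "ts": time.time(),
                "agent": agent,
                "owner": owner,
                "tool": tool.__name__,
                "args": repr(args),
                "kwargs": repr(kwargs),
            }
            try:
                result = tool(*args, **kwargs)
                entry["outcome"] = "ok"
                return result
            except Exception as exc:
                entry["outcome"] = f"error: {exc}"
                raise
            finally:
                # Recorded whether the call succeeded or failed.
                AUDIT_LOG.append(entry)
                print(json.dumps(entry))
        return wrapper
    return decorator


@audited(agent="lead-summariser", owner="a.khan")
def update_crm_record(record_id: str, note: str) -> str:
    return f"updated {record_id}"


update_crm_record("acct-17", note="follow up")
```

Emitting the entry in a `finally` block means failed calls are logged too, which is usually when the record matters most.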
Stage 04: Governed
There is a registry, named owners, change reviews, and evaluations. The controls exist on paper and in process. The hard part is keeping them complete as the agent estate keeps growing.
The organisation can describe how every agent should behave. The remaining gap is between the policy and the running system.
Primary risk: coverage gaps and control drift. Policies exist, but enforcement lags behind what agents are actually doing.
Next control: add runtime enforcement, with policies that block, scope, or escalate at the moment of action, not after the fact.
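The difference between a written policy and runtime enforcement can be sketched as a check that runs before each action. The example below covers only the block and escalate outcomes; the `Decision` enum, the `decide` function, and the bulk limit of 100 records are assumptions chosen for illustration.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"


def decide(granted_scopes: set[str], action: str, record_count: int,
           bulk_limit: int = 100) -> Decision:
    """Evaluate one proposed action at the moment the agent attempts it."""
    if action not in granted_scopes:
        return Decision.BLOCK       # outside the agent's scope: stop it
    if record_count > bulk_limit:
        return Decision.ESCALATE    # in scope but unusually large: ask a human
    return Decision.ALLOW


scopes = {"crm:read", "crm:write"}
print(decide(scopes, "crm:write", record_count=3))
print(decide(scopes, "crm:delete", record_count=1))
print(decide(scopes, "crm:write", record_count=5000))
```

The point of the sketch is placement: the decision happens in the call path, so a policy violation is prevented rather than discovered in a later review.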
Stage 05: Production-Ready
Every agent is inventoried, owned, observable, access-controlled, and continuously evaluated. Policies are enforced as actions happen. Residual risk is measured, accepted deliberately, and reviewed.
Agents are treated like any other piece of production infrastructure: governed at runtime, not trusted on faith.
Primary risk: residual risk, measured and managed. The failure modes are known, bounded, and rehearsed rather than discovered.
Next control: maintain. Quarterly evaluations, drift checks, and incident postmortems keep the estate from sliding back.
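A drift check can be as simple as comparing recent evaluation scores with a baseline recorded at the last review. The sketch below assumes scores on a 0 to 1 scale and a tolerance of 0.05; both are illustrative choices, and the `drifted` helper is hypothetical.

```python
def drifted(baseline: float, recent_scores: list[float], tolerance: float = 0.05) -> bool:
    """Return True when the recent mean falls more than `tolerance` below the baseline."""
    if not recent_scores:
        raise ValueError("no recent evaluation scores to compare")
    recent_mean = sum(recent_scores) / len(recent_scores)
    return baseline - recent_mean > tolerance


# Illustrative numbers only.
print(drifted(0.92, [0.91, 0.93]))        # scores holding steady
print(drifted(0.92, [0.80, 0.82, 0.78]))  # scores well below the baseline
```

Run on a schedule, a check like this turns "sliding back" from something noticed in a postmortem into something flagged while it is still small.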
Where real companies sit
Real organisations on the curve
You can read an organisation’s maturity from the roles it hires. Below, companies from our live Jobs Index (updated 16 June 2026) placed by the most advanced agentic role they are staffing.
Hiring agent-ops, evaluations, AI-governance, and AI-security roles — the controls that keep agents safe at scale.
Hiring agent engineers, forward-deployed engineers, and solutions architects — shipping agents as production software.
Hiring foundational AI engineers and AI product managers — the first agent builds, before operational controls.
Placement reflects hiring signal, not a private audit — the strongest public evidence of maturity available. See the full method on the Jobs Index.
The destination
What Stage 05 actually requires
A production-ready agent has five properties. Without them you cannot measure trust — or manage risk.
Inventoried
Registered with a purpose, an owner, and a defined scope of access.
Owned
A named human is accountable for its behaviour, change management, and incident response.
Observable
Every action it takes is logged, attributable, and visible in real time.
Access-controlled
Least-privilege permissions, granted and revoked without code changes.
Continuously evaluated
Quality, safety, and compliance tested in production — not just before launch.
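The five properties lend themselves to a simple per-agent checklist. The sketch below is one possible shape, assuming each property can be answered yes or no; the `Readiness` class and `gaps` helper are hypothetical names for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class Readiness:
    """The five production-readiness properties for a single agent."""
    inventoried: bool
    owned: bool
    observable: bool
    access_controlled: bool
    continuously_evaluated: bool


def gaps(r: Readiness) -> list[str]:
    """List the properties an agent is still missing, in declaration order."""
    return [f.name for f in fields(r) if not getattr(r, f.name)]


agent = Readiness(inventoried=True, owned=True, observable=False,
                  access_controlled=True, continuously_evaluated=False)
print(gaps(agent))
print("production-ready" if not gaps(agent) else "not yet production-ready")
```

An agent counts as production-ready in this sketch only when the list of gaps is empty, which mirrors the all-or-nothing framing of the five properties.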
The maturity model, answered directly.
Short answers for teams placing themselves on the agent operational maturity curve.
What is the Agent Operational Maturity Model?
It is a five-stage model that maps how ready an organisation is to run AI agents as production infrastructure — from Experimental (isolated individual use) through Shared, Operational, and Governed, to Production-Ready (agents monitored and controlled at runtime). Each stage is defined by its primary risk and the single control that moves you to the next one.
What does agent operational maturity mean?
Agent operational maturity describes how ready an organisation is to run AI agents as operational infrastructure. It covers inventory, ownership, runtime visibility, access control, evaluations, auditability, policy enforcement, incident response, and compliance readiness.
What separates an immature organisation from a mature one?
An immature organisation cannot list its agents, has no named owners, no runtime visibility, broad personal-token access, no evaluations, and no enforcement. A mature organisation has a complete registry, accountable owners, real-time observability, least-privilege identity, continuous evaluation, and runtime policy enforcement. The difference is whether agents are treated as shadow tools or as governed production infrastructure.
How do I assess my organisation’s agent maturity?
Measure six control surfaces — inventory, ownership, runtime visibility, access control, evaluations, and governance — against the five-stage ladder. Most enterprises sit between Stage 02 (Shared) and Stage 03 (Operational). The free assessment scores you in about seven minutes and names the highest-risk gap to close next.
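For a quick self-placement before taking the assessment, one option is to rate each surface against the three bands in the comparison table and look for the weakest. This is a sketch under that assumption only; it is not the assessment's actual scoring, and the `weakest_surfaces` helper and sample ratings are hypothetical.

```python
BANDS = ["immature", "maturing", "mature"]  # the three bands in the comparison table
SURFACES = ["inventory", "ownership", "runtime visibility",
            "access control", "evaluations", "governance"]


def weakest_surfaces(ratings: dict[str, str]) -> list[str]:
    """Return the control surfaces rated in the lowest band."""
    missing = [s for s in SURFACES if s not in ratings]
    if missing:
        raise ValueError(f"unrated surfaces: {missing}")
    lowest = min(BANDS.index(ratings[s]) for s in SURFACES)
    return [s for s in SURFACES if BANDS.index(ratings[s]) == lowest]


# Illustrative self-ratings only.
ratings = {
    "inventory": "maturing",
    "ownership": "maturing",
    "runtime visibility": "immature",
    "access control": "maturing",
    "evaluations": "immature",
    "governance": "maturing",
}
print(weakest_surfaces(ratings))
```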
What does production-ready mean for an AI agent?
A production-ready AI agent is inventoried, owned, observable at runtime, access-controlled, continuously evaluated, and governed by policy that is enforced at the moment of action.