Is prompt injection really the main threat to agentic AI?

It is the most distinctive one — the attack that exists because agents read content and hold tools at the same time. But most real incidents are mundane: over-broad credentials, a shared token nobody can rotate, an agent nobody remembered was running. Do the identity and permissions work first; it bounds the injection damage too.

Do our existing security tools cover AI agents?

Partially. Your IAM can hold agent identities, your SIEM can ingest agent logs, and both should. What they lack is the agent-specific layer: tool-level permissioning, injection-aware input handling, and runtime gates on agent actions. Use what you have for the foundations and add the agent layer on top, rather than buying a parallel stack. The wider terrain — what securing AI means beyond agents — is mapped in [what is AI security](/ai-security).

Security

How to secure the AI agents you run

Goal

Put a working security model around your agentic AI — identity, scoped permissions, untrusted-input handling, and a kill switch — so an agent that goes wrong is contained rather than catastrophic.

Before you start

An inventory of the agents in scope, with the systems each one reads and writes
Authority to issue and revoke service credentials
Somewhere durable to send logs — agent security work produces evidence, and it needs a home

Steps

1

Map each agent's attack surface before its threat list

For every agent, write down three things: the tools it can invoke, the credentials those tools carry, and every place untrusted content enters — user messages, retrieved documents, web pages, emails, API responses. That third list is the one teams skip, and it is where agent incidents start: anything an agent reads can try to steer it. The attack surface of an agent is its tool list multiplied by its input sources, not its model.
2

Give every agent its own identity and least-privilege scopes

One agent, one service identity, never a shared account or a developer's token. Then scope each tool's credential to the narrowest grant that works — read-only where the agent only reads, this-table not this-database, this-channel not this-workspace. The test of the setup is operational: can you revoke one agent in one step without breaking anything else, and can you tell, from any log line downstream, which agent acted?
3

Treat everything the agent reads as untrusted input

Prompt injection is the agent-era equivalent of SQL injection: instructions hidden in content the agent was merely supposed to process — a support ticket, a web page, a calendar invite. Defences are layered, not absolute: separate instructions from data in your prompting, strip or flag instruction-like content in retrieved material, and — the control that actually bounds the damage — refuse to let high-risk tool calls proceed on the strength of retrieved content alone. Assume injection will sometimes work, and design so that when it does, the blast radius is a scoped tool, not the estate.
4

Gate the consequential actions at runtime

Decide which action classes are expensive to reverse — payments, deletions, sending external messages, granting access — and put a control in front of each: a human approval, a policy check, a rate limit, a dollar ceiling. The gate must live outside the agent's own reasoning; an agent persuaded by injected instructions will also be persuaded that the action is fine. Runtime enforcement is what stands when the prompt falls.
5

Log every action as evidence, not telemetry

Record each tool call with its inputs, outputs, initiating agent identity, and the chain of reasoning context that led to it, redacting sensitive fields as you capture. The standard you are aiming for: a security reviewer can reconstruct any consequential action after the fact without asking the team that built the agent. If your audit trail cannot answer "why did the agent do that", it is monitoring, not security.
6

Red-team the agent before launch and after every model change

Run an adversarial pass against the assembled system, not the model in isolation: injection attempts through every input source, tool-misuse chains, data-exfiltration paths through innocuous-looking outputs. Re-run it when the underlying model version changes, because behaviour shifts under your feet even when your code does not. Keep the failing cases as a regression suite — the attacks that worked once are the first ones to try again.
7

Build the kill switch before you need it

Decide now how you stop an agent in under a minute: revoke its identity, disable its credentials at the providers, halt its orchestrator. Write the steps down where on-call can find them, and rehearse once — a revocation path that has never been exercised is a hypothesis. Containment speed, not prevention, is what separates an agent incident from an agent story.

Common pitfalls

Securing the model instead of the tool surface. Jailbreak resistance is the vendor's problem; what the agent can write to is yours, and it is the part you control completely.
A shared service account across agents — one compromise revokes everything or nothing, and the audit trail can no longer say which agent acted.
Trusting retrieved content because it came from an internal source. The wiki page an agent reads may have been edited by anyone, including a previous agent.
A one-time security review while the model underneath changes quarterly. Agent security has the shelf life of the model version it was tested against.
Treating the audit trail as done because logs exist. Logs nobody can query during an incident are storage, not security.

Before you start

Steps

Map each agent's attack surface before its threat list

Give every agent its own identity and least-privilege scopes

Treat everything the agent reads as untrusted input

Gate the consequential actions at runtime

Log every action as evidence, not telemetry

Red-team the agent before launch and after every model change

Build the kill switch before you need it

Common pitfalls

Frequently asked questions