AI agent security

AI agent security covers the controls and practices that protect AI agent systems from manipulation, exploitation, and unintended harm — including prompt injection defenses, least-privilege permission scoping, tool-call sandboxing, output validation, and audit trails for every action an agent takes.

How agents create new attack surfaces

A language model that generates text creates a risk of bad outputs. An agent that acts on the world — writing files, calling APIs, sending messages, querying databases — creates a risk of bad actions. The attack surface expands in two directions: inbound (inputs the agent processes can contain adversarial instructions) and outbound (actions the agent takes can affect systems beyond its intended scope). Prompt injection is the primary inbound threat: attacker-controlled content in the agent's context — a malicious document, a crafted web page, an instruction embedded in a retrieved record — can redirect the agent's behavior. Outbound risk is proportional to the set of tools the agent can call and the permissions attached to those tools.

Core security controls for agents

Least-privilege access limits what tools the agent can call and what data it can reach to the minimum required for its defined task. Sandboxing isolates tool execution so that a compromised tool call cannot escalate privileges or escape the defined scope. Input validation screens content that enters the agent's context for injection patterns before the model processes it. Output validation checks what the agent is about to do before execution — flagging actions outside authorized parameters. Audit logging records every tool call and its arguments so the full sequence of agent actions can be reconstructed after the fact. None of these controls is sufficient alone; effective agent security requires all five working together.

Security considerations in multi-agent systems

When agents invoke other agents, trust boundaries become critical. A compromised sub-agent can pass malicious instructions back to the orchestrator, or an orchestrator with broad permissions can be weaponized by a sub-agent that has been prompt-injected. Each agent in a multi-agent system should have its own permission scope and should not inherit the calling agent's full authorization. Messages between agents should be treated with the same skepticism as messages from external sources — trust must be established by the system design, not assumed from the fact that the message came from another agent in the same system.

AI agent security — FAQ

What is prompt injection in the context of AI agents?

Prompt injection is an attack in which malicious content in the agent's context — a document it reads, a web page it retrieves, a database record it processes — contains instructions designed to redirect the agent's behavior. Unlike traditional injection attacks, prompt injection exploits the model's instruction-following tendency rather than a parsing vulnerability. Defense requires input sanitization, context separation, and validation of the agent's planned actions before execution.

How do I scope agent permissions appropriately?

Start with the minimum set of tool calls required for the defined task and grant access only to those. For each tool, apply the narrowest scope available — read-only where write is not required, restricted to relevant data sources rather than broad database access. Re-evaluate permissions whenever the agent's task scope changes. The goal is to ensure that if the agent is compromised or makes an error, the potential impact is bounded by what it was actually authorized to do.