How agents create new attack surfaces
A language model that generates text creates a risk of bad outputs. An agent that acts on the world (writing files, calling APIs, sending messages, querying databases) creates a risk of bad actions. The attack surface expands in two directions: inbound, where inputs the agent processes can contain adversarial instructions, and outbound, where actions the agent takes can affect systems beyond its intended scope. Prompt injection is the primary inbound threat: attacker-controlled content in the agent's context, such as a malicious document, a crafted web page, or an instruction embedded in a retrieved record, can redirect the agent's behavior. Outbound risk grows with the number of tools the agent can call and the breadth of the permissions attached to those tools.
Core security controls for agents
Least-privilege access limits the tools the agent can call and the data it can reach to the minimum its defined task requires. Sandboxing isolates tool execution so that a compromised tool call cannot escalate privileges or escape the defined scope. Input validation screens content entering the agent's context for injection patterns before the model processes it. Output validation checks what the agent is about to do before execution and flags actions outside authorized parameters. Audit logging records every tool call and its arguments so the full sequence of agent actions can be reconstructed after the fact. None of these controls is sufficient alone; effective agent security requires all five working together.
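As a rough illustration of how several of these controls can sit in one place, the sketch below routes every tool call through a single gateway that enforces an allowlist, validates arguments, and logs each attempt. All names here (`ToolGateway`, `screen_input`, `path_in_workspace`, `read_file_stub`, the `/workspace/` prefix, the two regex patterns) are assumptions made for the example, not part of any particular agent framework. Sandboxing is not shown, since it depends on the execution environment, and pattern matching alone should not be relied on to catch injection.

```python
import json
import posixpath
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical patterns for illustration; regexes alone will not catch real attacks.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?system prompt", re.IGNORECASE),
]


def screen_input(text: str) -> bool:
    """Input validation: True if no known injection pattern appears in the text."""
    return not any(p.search(text) for p in INJECTION_PATTERNS)


def path_in_workspace(args: Dict[str, Any]) -> bool:
    """Output validation for read_file: exactly one 'path' argument, under /workspace/."""
    if set(args) != {"path"}:
        return False
    # Normalize first so "/workspace/../etc/passwd" is not accepted.
    return posixpath.normpath(str(args["path"])).startswith("/workspace/")


@dataclass
class ToolGateway:
    allowed_tools: Dict[str, Callable[..., Any]]             # least privilege: explicit allowlist
    validators: Dict[str, Callable[[Dict[str, Any]], bool]]  # output validation per tool
    audit_log: List[str] = field(default_factory=list)       # audit trail, one JSON line per call

    def call(self, tool: str, args: Dict[str, Any]) -> Any:
        if tool not in self.allowed_tools:
            decision = "denied: tool not in allowlist"
        elif tool not in self.validators or not self.validators[tool](args):
            decision = "denied: arguments failed validation"
        else:
            decision = "allowed"
        # Log every attempt, including denied ones, before anything executes.
        self.audit_log.append(
            json.dumps({"tool": tool, "args": args, "decision": decision}, default=str)
        )
        if decision != "allowed":
            raise PermissionError(decision)
        return self.allowed_tools[tool](**args)


def read_file_stub(path: str) -> str:
    """Stand-in for a real tool; it does not touch the filesystem."""
    return f"<contents of {path}>"


gateway = ToolGateway(
    allowed_tools={"read_file": read_file_stub},
    validators={"read_file": path_in_workspace},
)
```

With this setup, `gateway.call("read_file", {"path": "/workspace/notes.txt"})` returns the stub's string, while a call to an unlisted tool such as `send_email`, or a path that normalizes outside `/workspace/`, raises `PermissionError`. Every attempt, allowed or denied, lands in `gateway.audit_log`. A tool with no registered validator is denied by default, which keeps the failure mode closed rather than open.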
Security considerations in multi-agent systems
When agents invoke other agents, trust boundaries become critical. A compromised sub-agent can pass malicious instructions back to the orchestrator, and an orchestrator with broad permissions can be weaponized by a sub-agent that has been prompt-injected. Each agent in a multi-agent system should have its own permission scope and should not inherit the calling agent's full authorization. Messages between agents should be treated with the same skepticism as messages from external sources: trust must be established by the system design, not assumed because a message came from another agent in the same system.
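One way to picture per-agent scopes is the sketch below, in which a sub-agent receives only the intersection of what it requests and what its parent holds, and a request carried in an inter-agent message is accepted only if both the sender and the receiver are authorized for the action. The names (`AgentScope`, `AgentMessage`, `handle_request`, the permission strings) are hypothetical, and the "both must be authorized" rule is one possible policy rather than the only reasonable one. The sketch also assumes the `sender` field has already been authenticated by the surrounding system, which is a separate problem not addressed here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class AgentScope:
    name: str
    permissions: FrozenSet[str]

    def delegate(self, child_name: str, requested: FrozenSet[str]) -> "AgentScope":
        """Give a sub-agent only what it requested and the parent itself holds."""
        return AgentScope(child_name, requested & self.permissions)


@dataclass(frozen=True)
class AgentMessage:
    sender: str   # assumed to be authenticated elsewhere
    action: str   # the action the sender asks the receiver to perform
    payload: str


def handle_request(registry: Dict[str, AgentScope], receiver: str, msg: AgentMessage) -> str:
    """Decide on an inter-agent request from registered scopes, not from the message's claims."""
    sender_scope = registry.get(msg.sender)
    if sender_scope is None:
        return "rejected: unknown sender"
    # A sub-agent cannot borrow the receiver's broader permissions.
    if msg.action not in sender_scope.permissions:
        return "rejected: sender not authorized"
    if msg.action not in registry[receiver].permissions:
        return "rejected: receiver not authorized"
    return "accepted"


orchestrator = AgentScope("orchestrator", frozenset({"read_docs", "send_email", "write_db"}))
# The summarizer asks for more than it should get; it ends up with read_docs only.
summarizer = orchestrator.delegate("summarizer", frozenset({"read_docs", "delete_db"}))
registry = {agent.name: agent for agent in (orchestrator, summarizer)}
```

Here a prompt-injected summarizer that asks the orchestrator to perform `send_email` is rejected, because the decision depends on the summarizer's registered scope rather than on the orchestrator's broader one. The same request for `read_docs` is accepted.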