Two different practices share the name "AI security", and the first useful move is to pull them apart. One is using AI *in* security — models that triage alerts, hunt threats, and summarise incidents for a SOC. The other is securing the AI itself: protecting the models, data flows, and — increasingly — the agents your organisation runs, against attackers who now have a new surface to work with. This page is about the second meaning. It is the one that arrives uninvited: you can choose whether to buy AI-powered defence, but the moment your teams deploy AI, securing it stops being optional.
What makes securing AI different from securing the software around it is that the system's behaviour is not fully specified by its code. A model does what its training and its inputs make it do, which means an attacker does not need to find a bug in your implementation — influencing what the system *reads* can be enough. Classic application security assumes the program is on your side and the inputs are data. AI security has to assume the inputs are arguments, and the program can be argued with.
The agent shift
For systems that generate content for human review, the stakes of that property are bounded: a manipulated model produces bad output, and a person stands between the output and the consequence. Agentic AI removes the person from that position. An agent holds credentials, calls tools, and writes to real systems, so the same manipulation that once produced a bad paragraph now produces an action — and a chain of them, because agents build each step on the last. The canonical attack here is prompt injection: instructions smuggled into content the agent was merely supposed to process, a support ticket or a wiki page or an email, steering it to act with the full force of whatever permissions it holds. The defence is not a smarter model; it is the security architecture around the agent — identity, scoped tools, runtime gates, and a rehearsed kill switch.
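As a rough illustration of that architecture, the sketch below puts the decision about a tool call in a gate that runs outside the model. Everything in it is hypothetical and chosen for illustration: the `AgentIdentity` and `ToolGate` names, the tool names, and the returned strings are assumptions, not part of any real framework.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical identity record: who the agent is and which tools it may call."""
    agent_id: str
    allowed_tools: frozenset[str]


@dataclass
class ToolGate:
    """Policy check that runs outside the model, before any tool call executes."""
    consequential_tools: frozenset[str]
    killed: set[str] = field(default_factory=set)

    def kill(self, agent_id: str) -> None:
        # Kill switch: every later call from this agent is refused.
        self.killed.add(agent_id)

    def decide(self, agent: AgentIdentity, tool: str, approved: bool = False) -> str:
        if agent.agent_id in self.killed:
            return "deny: agent disabled"
        if tool not in agent.allowed_tools:
            return "deny: tool out of scope"
        if tool in self.consequential_tools and not approved:
            return "hold: needs approval"
        return "allow"


# Illustrative setup: a ticket-triage agent with three scoped tools,
# one of which (issue_refund) is treated as consequential.
triage = AgentIdentity(
    "ticket-triage",
    frozenset({"read_ticket", "add_label", "issue_refund"}),
)
gate = ToolGate(consequential_tools=frozenset({"issue_refund"}))

print(gate.decide(triage, "read_ticket"))
print(gate.decide(triage, "send_email"))
print(gate.decide(triage, "issue_refund"))
```

The point of the design is that an injected instruction can change what the agent asks for, but not what the gate allows: the scope check, the approval hold, and the kill switch never consult the model's reasoning.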
The same logic extends to the agent's supply lines. Tool layers like the Model Context Protocol concentrate credentials and capability into servers that agents call, which makes them both the right place to enforce limits and an attractive thing to compromise — they need securing in their own right. And the model itself sits in a supply chain — weights, fine-tuning data, retrieved context, third-party tools — every link of which is something an attacker can poison and your inventory should account for.
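One common way to make such an inventory checkable is to pin a cryptographic digest for each artifact and compare before loading. The sketch below assumes that approach; it is not the only option. The `verify_artifacts` helper, the artifact names, and the status labels are hypothetical, and the byte strings stand in for real weight files or tool manifests.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_artifacts(pinned: dict[str, str], artifacts: dict[str, bytes]) -> dict[str, str]:
    """Compare artifacts against pinned digests.

    Returns a status per name: 'ok', 'mismatch', 'missing', or 'unlisted'.
    """
    report: dict[str, str] = {}
    for name, expected in pinned.items():
        if name not in artifacts:
            report[name] = "missing"
        elif sha256_hex(artifacts[name]) == expected:
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    # Anything present but absent from the inventory is flagged, not trusted.
    for name in artifacts:
        if name not in pinned:
            report[name] = "unlisted"
    return report


# Illustrative inventory: digests recorded when each artifact was approved.
pinned = {
    "model-weights": sha256_hex(b"approved weights"),
    "tool-manifest": sha256_hex(b"approved manifest"),
}
# What is actually found at load time (the manifest has been altered).
found = {
    "model-weights": b"approved weights",
    "tool-manifest": b"tampered manifest",
    "extra-plugin": b"unknown code",
}
print(verify_artifacts(pinned, found))
```

A digest check only shows that an artifact is unchanged since it was approved; it says nothing about whether the approved version was trustworthy in the first place.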
What the work looks like
In practice, the work is less exotic than the attack stories suggest, and most of it will feel familiar to anyone who has secured operational software. The estate gets inventoried, because you cannot secure agents you do not know exist. Each agent gets an identity and the narrowest permissions that do its job. Inputs are treated as untrusted wherever they originate. Consequential actions get gates that live outside the agent's own reasoning. Everything emits evidence: an audit trail that can answer *what did it do and why* after the fact. The agent-specific parts (injection-aware input handling, tool-level permissioning, autonomy that widens only on evaluation evidence) sit on top of that foundation, not instead of it.
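To make the evidence requirement concrete, here is a minimal sketch of an audit trail that records what an agent did and why, with each entry chained to the previous one by hash so that later edits are detectable. The `AuditLog` class, its field names, and the hash chaining are assumptions made for illustration; a production system would also need durable, access-controlled storage.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash an entry body deterministically (keys sorted before encoding)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, agent_id: str, action: str, reason: str, decision: str) -> dict:
        """Append one entry describing what was attempted, why, and the outcome."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "agent_id": agent_id,
            "action": action,
            "reason": reason,
            "decision": decision,
            "prev": prev,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; False if any entry was altered or reordered."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.record("ticket-triage", "add_label", "ticket mentions billing", "allow")
log.record("ticket-triage", "issue_refund", "customer requested refund", "hold")
print(log.verify())
```

Because each hash covers the previous entry's hash, changing an earlier record invalidates the chain from that point on, which is what lets the trail serve as evidence after the fact.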
Frameworks exist to organise this work and to prove it happened: NIST's AI Risk Management Framework and ISO/IEC 42001 at the organisational layer, OWASP's LLM and agentic security guidance and MITRE ATLAS at the engineering layer. They are worth adopting deliberately rather than collecting: choosing the one that fits the job you actually have matters more than choosing the famous one. Most were written for the content era, so expect to supplement whichever you adopt where agents are concerned. The decision rights behind all of it (who approves an agent, who widens what it may do, who shuts it down) are governance work, security's twin: governance decides what the organisation lets an agent do on purpose; security bounds what an attacker can make it do anyway.