
What is AI security?

AI security is the work of protecting AI systems themselves — their models, data, and increasingly their agents — and it is not the same thing as using AI to defend your network.

Two different practices share the name "AI security", and the first useful move is to pull them apart. One is using AI *in* security — models that triage alerts, hunt threats, and summarise incidents for a SOC. The other is securing the AI itself: protecting the models, data flows, and — increasingly — the agents your organisation runs, against attackers who now have a new surface to work with. This page is about the second meaning. It is the one that arrives uninvited: you can choose whether to buy AI-powered defence, but the moment your teams deploy AI, securing it stops being optional.

What makes securing AI different from securing the software around it is that the system's behaviour is not fully specified by its code. A model does what its training and its inputs make it do, which means an attacker does not need to find a bug in your implementation — influencing what the system *reads* can be enough. Classic application security assumes the program is on your side and the inputs are data. AI security has to assume the inputs are arguments, and the program can be argued with.

The agent shift

For systems that generate content for human review, the stakes of that property are bounded: a manipulated model produces bad output, and a person stands between the output and the consequence. Agentic AI removes the person from that position. An agent holds credentials, calls tools, and writes to real systems, so the same manipulation that once produced a bad paragraph now produces an action — and a chain of them, because agents build each step on the last. The canonical attack here is prompt injection: instructions smuggled into content the agent was merely supposed to process, a support ticket or a wiki page or an email, steering it to act with the full force of whatever permissions it holds. The defence is not a smarter model; it is the security architecture around the agent — identity, scoped tools, runtime gates, and a rehearsed kill switch.
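To make "scoped tools" and "kill switch" concrete, here is a minimal sketch of a dispatcher that sits between an agent and its tools. Every name in it (`ScopedDispatcher`, `KillSwitch`, `read_ticket`, `delete_user`, the `support-triage` identity) is hypothetical and chosen for illustration; it is not the API of any particular agent framework. The point it tries to show is that the limits are enforced in ordinary code outside the model, so an injected instruction cannot talk its way past them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet


class ToolDenied(Exception):
    """Raised when the dispatcher refuses a tool call."""


@dataclass
class KillSwitch:
    """Operator-controlled flag. The agent is given no way to reset it."""
    tripped: bool = False

    def trip(self) -> None:
        self.tripped = True


@dataclass
class ScopedDispatcher:
    """Runs tool calls for one agent identity, limited to an explicit allow-list."""
    agent_id: str
    allowed_tools: FrozenSet[str]
    kill_switch: KillSwitch
    registry: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def call(self, tool: str, **kwargs) -> str:
        # Both checks run on every call, whatever the model was persuaded to ask for.
        if self.kill_switch.tripped:
            raise ToolDenied(f"{self.agent_id}: kill switch is tripped")
        if tool not in self.allowed_tools:
            raise ToolDenied(f"{self.agent_id}: {tool!r} is out of scope")
        return self.registry[tool](**kwargs)


# Hypothetical tools: one harmless read, one destructive write.
def read_ticket(ticket_id: str) -> str:
    return f"ticket {ticket_id}: please reset my password"


def delete_user(user_id: str) -> str:
    return f"deleted {user_id}"


switch = KillSwitch()
dispatcher = ScopedDispatcher(
    agent_id="support-triage",
    allowed_tools=frozenset({"read_ticket"}),  # narrowest scope that does the job
    kill_switch=switch,
    registry={"read_ticket": read_ticket, "delete_user": delete_user},
)
```

With this setup, `dispatcher.call("read_ticket", ticket_id="42")` succeeds, while `dispatcher.call("delete_user", user_id="u1")` raises `ToolDenied` even if a poisoned ticket convinced the model to request it. After `switch.trip()`, every call is refused. A real deployment would back the identity with actual credentials and keep the switch somewhere the agent's process cannot write to; the sketch only shows where the checks sit.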

The same logic extends to the agent's supply lines. Tool layers like the Model Context Protocol concentrate credentials and capability into servers that agents call, which makes them both the right place to enforce limits and an attractive thing to compromise — they need securing in their own right. And the model itself sits in a supply chain — weights, fine-tuning data, retrieved context, third-party tools — every link of which is something an attacker can poison and your inventory should account for.
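One way an inventory can account for the static links in that chain is to pin a digest for each artifact and refuse anything that is missing or does not match. The sketch below assumes a simple name-to-SHA-256 mapping; the function names and the `model-weights-v1` entry are hypothetical, and a real inventory would also carry provenance and signatures. It covers artifacts that hold still (weights, fine-tuning datasets, tool packages), not retrieved context, which changes at runtime and needs input handling instead.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(name: str, data: bytes, inventory: Dict[str, str]) -> bool:
    """Return True only if the artifact is inventoried and matches its pinned digest."""
    expected = inventory.get(name)
    if expected is None:
        # Not in the inventory: reject rather than trust by default.
        return False
    return sha256_hex(data) == expected


# Hypothetical inventory with one pinned artifact.
weights = b"example model weights"
inventory = {"model-weights-v1": sha256_hex(weights)}

ok = verify_artifact("model-weights-v1", weights, inventory)
tampered = verify_artifact("model-weights-v1", weights + b"!", inventory)
unknown = verify_artifact("surprise-plugin", b"anything", inventory)
```

Here `ok` is `True`, while `tampered` and `unknown` are both `False`: a modified artifact fails the digest check, and one nobody inventoried is rejected before its contents are even considered.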

What the work looks like

In practice, the work is less exotic than the attack stories suggest, and most of it will feel familiar to anyone who has secured operational software. The estate gets inventoried, because you cannot secure agents you do not know exist. Each agent gets an identity and the narrowest permissions that do its job. Inputs are treated as untrusted wherever they originate. Consequential actions get gates that live outside the agent's own reasoning. Everything emits evidence: an audit trail that can answer *what did it do and why* after the fact. The agent-specific parts (injection-aware input handling, tool-level permissioning, autonomy that widens only on evaluation evidence) sit on top of that foundation, not instead of it.
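As a rough illustration of a gate that lives outside the agent's reasoning, paired with an audit trail, consider the sketch below. The action names, the `policy` rule, and the `billing-bot` identity are all assumptions made for the example. The detail worth noticing is that the agent's stated reason is recorded as evidence but never consulted when deciding whether to approve: the decision depends only on the action and its parameters.

```python
import json
import time
from typing import Any, Callable, Dict, List, Optional

# Hypothetical set of actions considered consequential enough to gate.
CONSEQUENTIAL = {"send_payment", "delete_record"}


class AuditLog:
    """Append-only list of JSON events, standing in for a real audit store."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def record(self, **event: Any) -> None:
        event["ts"] = time.time()
        self.entries.append(json.dumps(event, sort_keys=True))


def gated_execute(
    agent_id: str,
    action: str,
    params: Dict[str, Any],
    stated_reason: str,
    approver: Callable[[str, str, Dict[str, Any]], bool],
    log: AuditLog,
    run: Callable[[str, Dict[str, Any]], str],
) -> Optional[str]:
    """Run an action only if the external gate approves it; log either way."""
    needs_approval = action in CONSEQUENTIAL
    # The approver never sees stated_reason, so persuasive text cannot sway it.
    approved = (not needs_approval) or approver(agent_id, action, params)
    log.record(agent=agent_id, action=action, params=params,
               reason=stated_reason, approved=approved)
    if not approved:
        return None
    return run(action, params)


def policy(agent_id: str, action: str, params: Dict[str, Any]) -> bool:
    """Hypothetical rule: small payments pass, everything else gated is refused."""
    return action == "send_payment" and params.get("amount", 0) <= 100


def run(action: str, params: Dict[str, Any]) -> str:
    return f"ran {action}"


log = AuditLog()
small = gated_execute("billing-bot", "send_payment", {"amount": 50},
                      "invoice due", policy, log, run)
large = gated_execute("billing-bot", "send_payment", {"amount": 5000},
                      "the email said it was urgent", policy, log, run)
```

The small payment runs and the large one is refused, yet both leave an entry in `log.entries` that records what was attempted, the reason the agent gave, and the gate's decision. In practice the approver might be a policy engine or a human review step; the sketch only fixes where the decision is made.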

Frameworks exist to organise this work and to prove it happened: NIST's AI Risk Management Framework and ISO/IEC 42001 at the organisational layer, OWASP's LLM and agentic security guidance and MITRE ATLAS at the engineering layer. They are worth adopting deliberately rather than collecting — choosing one for the job you actually have matters more than choosing the famous one — and all of them currently need supplementing where agents are concerned, because most were written for the content era. The decision rights behind all of it — who approves an agent, who widens what it may do, who shuts it down — are governance work, security's twin: governance decides what the organisation lets an agent do on purpose; security bounds what an attacker can make it do anyway.

Where to go from here

If you run agents today, start with the securing guide, which turns this page's model into seven concrete steps. If your exposure runs through tool servers, the MCP security guide covers that layer. And to see where this work sits in the larger readiness picture, the agent maturity curve maps the journey: security lands hardest at the transition into production, and the assessment will tell you how close you are to that transition.

The concept pages on AI security fundamentals, security best practices, enterprise AI security, and cyber security threats from AI systems cover the underlying landscape in more depth.

Frequently asked questions

What is AI security?

AI security is the work of protecting AI systems themselves — their models, data, and increasingly the agents you run — against attackers who now have a new surface to work with. It is distinct from using AI to defend your network, and it stops being optional the moment your teams deploy AI.

How is securing AI different from traditional application security?

A model's behaviour is not fully specified by its code — it does what its training and inputs make it do. So an attacker need not find a bug in your implementation; influencing what the system reads can be enough. AI security has to treat inputs as arguments, not just data.

What is prompt injection?

Prompt injection smuggles instructions into content an agent was merely supposed to process — a support ticket, a wiki page, an email — steering it to act with the full force of whatever permissions it holds. The defence is the security architecture around the agent, not a smarter model.

How does AI security change with AI agents?

For content tools, a manipulated model just produces a bad output a person can catch. An agent holds credentials and acts, so the same manipulation produces an action — and a chain of them. Defence shifts to identity, scoped tools, runtime gates, and a rehearsed kill switch.

What does AI security work actually involve?

Mostly familiar operational discipline: inventory the estate, give each agent an identity and least-privilege permissions, treat all inputs as untrusted, gate consequential actions outside the agent's reasoning, and keep an audit trail. Agent-specific controls sit on top of that foundation, not instead of it.
