Attacks through inputs

AI systems are steerable by content. Prompt injection hides instructions in material the system was merely supposed to process — a ticket, a web page, a retrieved document — and the model, which cannot fully separate data from directive, acts on them. Poisoning moves the same attack upstream into training or retrieval corpora. These are not exotic: any system that reads untrusted content and holds any capability is exposed, and defences are layered rather than absolute — instruction separation, content flagging, and above all refusing to let high-risk actions proceed on the strength of retrieved content alone.
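
One way to picture the last of those defences is a gate that tracks where the justification for an action came from. The sketch below is a minimal illustration, not a complete defence: the `Source` labels, the `ProposedAction` type, and the `HIGH_RISK_TOOLS` list are all hypothetical names invented here, and it assumes the surrounding system can reliably record provenance for each proposed action.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER = "user"            # typed by the authenticated operator
    RETRIEVED = "retrieved"  # tickets, web pages, retrieved documents


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    justified_by: frozenset  # Source values that led the model to this action


# Hypothetical set of tools this sketch treats as high-risk.
HIGH_RISK_TOOLS = {"send_payment", "delete_records", "send_email"}


def is_allowed(action: ProposedAction) -> bool:
    """Let low-risk tools through; high-risk tools need user provenance."""
    if action.tool not in HIGH_RISK_TOOLS:
        return True
    # Retrieved content alone is never enough to justify a high-risk action.
    return Source.USER in action.justified_by
```

Under these assumptions, a payment proposed purely because a retrieved document asked for it is refused, while the same payment requested by the operator proceeds. The hard part in practice is the provenance tracking itself, which this sketch takes as given.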

Leakage of data and secrets

The quieter risk class is what flows out. Sensitive data enters prompts and contexts; traces and evaluation logs capture tool inputs and outputs wholesale; outputs themselves can reveal what the system was given. The commonly missed surface is observability: step-level tracing is essential for operating agents, but it is also a fresh copy of everything the agent touched. That is why redacting at write time and applying the same access controls as the source data are part of the security work, not optional polish.
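
Redaction at write time can be sketched as a filter that sits between the agent and the trace store, so that unredacted text is never persisted. The example below is illustrative only: the two patterns (email addresses and a hypothetical `sk-` prefixed key format) and the `write_trace` helper are assumptions made for this sketch, and real deployments would need patterns matched to their own data.

```python
import re

# Illustrative patterns only; the "sk-" key format is a hypothetical example.
PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[API_KEY]"),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def write_trace(sink: list, step: str, tool_input: str, tool_output: str) -> None:
    """Append a trace record, redacting before anything reaches the sink."""
    sink.append({
        "step": step,
        "input": redact(tool_input),
        "output": redact(tool_output),
    })
```

The design point is the ordering: redaction happens inside the write path, so no later cleanup job has to find and scrub copies that already exist. Pattern matching will miss sensitive data that has no recognisable shape, which is why access controls on the trace store still matter.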

Supply chain, multiplied

An AI system's supply chain is wider than its package manifest: model weights and versions, fine-tuning data, retrieved context sources, tool layers, and — concretely, today — the MCP servers teams install with production credentials attached. Each is something an attacker can compromise or a vendor can silently change; a model version bump alone alters behaviour with no code change on your side. Pinning, allowlists, and re-review on update apply to all of it, exactly as they do to conventional dependencies.
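
Pinning and allowlisting can be applied to any of these artifacts in the same way as to packages. The following sketch assumes a hypothetical lock structure that maps a component name to an approved version and SHA-256 digest; the `verify` function and the lock layout are inventions for illustration, not a format used by any particular tool.

```python
import hashlib
import hmac


def verify(name: str, version: str, artifact: bytes, lock: dict) -> bool:
    """Accept a component only if its name, version, and digest match the lock."""
    entry = lock.get(name)
    if entry is None:
        return False  # not on the allowlist at all
    if entry["version"] != version:
        return False  # version changed: needs re-review before use
    digest = hashlib.sha256(artifact).hexdigest()
    # Constant-time comparison of the computed and pinned digests.
    return hmac.compare_digest(digest, entry["sha256"])
```

A component that is unknown, has moved to a new version, or whose bytes no longer match the pinned digest is rejected until someone reviews it and updates the lock. Hosted models cannot be hashed this way, so for those the equivalent step is pinning an explicit version identifier rather than a floating alias.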

Action risk: where agents change the stakes

With generative tools, the risks above end in bad content that a person reviews. Agents remove that buffer: the same injection now fires a tool call, the same over-broad credential now moves money or deletes records, and errors compound step after step at machine speed. This is why agent security work concentrates on bounding action (per-agent identity, least-privilege tools, runtime gates, kill switches) and why the securing guide treats those, not model hardening, as the core of the job.
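
The four controls named above can be combined in a single chokepoint that every tool call passes through. This is a minimal sketch under stated assumptions: `AgentIdentity`, `ToolGateway`, `ActionDenied`, and the approval callback are hypothetical names, and a production gateway would also need authentication, audit logging, and rate limits.

```python
import threading
from dataclasses import dataclass


class ActionDenied(Exception):
    """Raised when the gateway refuses a tool call."""


@dataclass(frozen=True)
class AgentIdentity:
    name: str
    allowed_tools: frozenset  # least privilege: only the tools this agent needs


class ToolGateway:
    def __init__(self, tools, needs_approval=frozenset(), approve=None):
        self._tools = dict(tools)
        self._needs_approval = frozenset(needs_approval)
        # Default approver denies everything, so gated tools fail closed.
        self._approve = approve or (lambda agent, tool, args: False)
        self._halted = threading.Event()

    def kill(self) -> None:
        """Kill switch: block all further tool calls."""
        self._halted.set()

    def call(self, agent: AgentIdentity, tool: str, **args):
        if self._halted.is_set():
            raise ActionDenied("kill switch engaged")
        if tool not in agent.allowed_tools:
            raise ActionDenied(f"{agent.name} may not use {tool}")
        # Runtime gate: selected tools need an explicit approval decision.
        if tool in self._needs_approval and not self._approve(agent.name, tool, args):
            raise ActionDenied(f"{tool} was not approved")
        return self._tools[tool](**args)
```

Because the checks live outside the model, an injected instruction can change what the agent asks for but not what the gateway permits. Each denial also fails closed: an unknown agent, an unlisted tool, or a missing approver all result in no action.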