How to set up an audit trail for AI agents

Goal

Stand up an action-level audit trail that can answer, months later and under scrutiny: which agent did what, when, with what inputs, under whose approval, and with what outcome.

Before you start

A distinct identity for each agent (shared service accounts make attribution impossible)
A list of which agent actions count as consequential for your business
Somewhere append-only to write: object-lock storage, a WORM bucket, or an audit-grade log service

Steps

1. Log at the action boundary, not the model boundary

Capture events where the agent touches the world: every tool call that writes, sends, or moves something. Model-call logs tell you what the agent considered; action logs tell you what it did. Auditors ask the second question.
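One way to capture events at the action boundary is to wrap each side-effecting tool in a decorator, so that no write, send, or move can run without producing an event. The following is a minimal sketch, not a definitive implementation: `audited`, `AUDIT_LOG`, and `send_refund` are hypothetical names, and the in-memory list stands in for whatever append-only sink you actually use.

```python
import functools
import json
from datetime import datetime, timezone
from typing import Any, Callable

# Assumption: an in-memory list stands in for the real append-only sink.
AUDIT_LOG: list[str] = []


def audited(action: str) -> Callable:
    """Wrap a tool so every call emits one audit event, success or failure."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*, agent_id: str, **params: Any) -> Any:
            event = {
                "ts": datetime.now(timezone.utc).isoformat(),
                "agent_id": agent_id,
                "action": action,
                "params": params,
            }
            try:
                result = fn(**params)
                event["outcome"] = "ok"
                return result
            except Exception as exc:
                # Failed actions are evidence too: record them, then re-raise.
                event["outcome"] = f"error: {type(exc).__name__}"
                raise
            finally:
                AUDIT_LOG.append(json.dumps(event, sort_keys=True))

        return wrapper

    return decorator


@audited("send_refund")
def send_refund(customer_id: str, amount_cents: int) -> str:
    """Hypothetical tool that moves money; the real side effect is omitted."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    return f"refunded {amount_cents} to {customer_id}"
```

Calling `send_refund(agent_id="agent-7", customer_id="c1", amount_cents=500)` appends one event to `AUDIT_LOG` whether the call succeeds or raises. Requiring `agent_id` as a keyword on every call is a deliberate choice in this sketch: an action without an attributable identity cannot be invoked at all.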

2. Fix the event schema early

Each event needs: timestamp, agent identity and version, the human or trigger it acted on behalf of, the action and its parameters, the policy decision that allowed it (including which approvals), the outcome, and a trace ID linking back to the full task. Schema changes later mean two eras of evidence that don't join.
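The field list above could be pinned down as a frozen dataclass, so that an event missing a required field fails at construction rather than at audit time. This is a sketch under assumptions: the field names, the `schema_version` field, and the example values in the usage note are illustrative, not a standard.

```python
import json
from dataclasses import asdict, dataclass

# Assumption: an explicit version number lets later readers tell schema eras apart.
SCHEMA_VERSION = 1

_REQUIRED_TEXT_FIELDS = (
    "timestamp", "agent_id", "agent_version", "on_behalf_of",
    "action", "policy_decision", "outcome", "trace_id",
)


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str        # ISO 8601, UTC
    agent_id: str         # distinct identity, never a shared service account
    agent_version: str
    on_behalf_of: str     # the human or trigger the agent acted for
    action: str
    parameters: dict      # already redacted before it reaches this object
    policy_decision: str  # the rule that allowed the action
    approval_ids: tuple   # links to approval records; empty if none required
    outcome: str
    trace_id: str         # joins the event back to the full task
    schema_version: int = SCHEMA_VERSION

    def __post_init__(self) -> None:
        # Reject events that would leave a gap in the evidence.
        for name in _REQUIRED_TEXT_FIELDS:
            if not getattr(self, name):
                raise ValueError(f"{name} is required")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

Constructing `AuditEvent(..., trace_id="")` raises `ValueError`, which is the point: an unjoinable event is caught when it is written, not months later when someone tries to query it.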

3. Redact at write time

Tool inputs and outputs carry customer data. Decide field by field what is stored raw, what is masked, and what is hashed for matching without disclosure. Redaction at query time means the sensitive data was retained all along — which is its own finding.
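A field-by-field policy can be expressed as a small lookup table applied before the event is written. The sketch below assumes three hypothetical fields and a placeholder key; it uses a keyed hash (HMAC-SHA256) so that stored values can be matched against each other without being guessable from a dictionary of likely inputs. Fields with no declared rule are dropped, which is an assumption of this example rather than a requirement from the text.

```python
import hashlib
import hmac

# Assumption: hypothetical field names; unknown fields are dropped by default.
FIELD_POLICY = {
    "order_id": "raw",      # not sensitive, stored as-is
    "card_number": "mask",  # keep only the last four characters
    "email": "hash",        # matchable across events without disclosure
}

# Assumption: placeholder only; in practice the key comes from a secrets manager.
HASH_KEY = b"replace-with-a-managed-secret"


def redact(params: dict) -> dict:
    """Apply the per-field policy before anything reaches the audit store."""
    redacted = {}
    for name, value in params.items():
        rule = FIELD_POLICY.get(name, "drop")
        text = str(value)
        if rule == "raw":
            redacted[name] = value
        elif rule == "mask":
            redacted[name] = "*" * max(len(text) - 4, 0) + text[-4:]
        elif rule == "hash":
            digest = hmac.new(HASH_KEY, text.encode("utf-8"), hashlib.sha256)
            redacted[name] = digest.hexdigest()
        # rule == "drop": the field is never written
    return redacted
```

Because the hash is keyed and deterministic, two events involving the same email produce the same stored value, so an investigator can still group them without the address itself ever being retained.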

4. Make the store append-only and the retention deliberate

Write to storage that nobody — including the agent platform's own admins — can quietly edit. Set retention to match the longest obligation that applies to you (financial-services and healthcare rules commonly run five to seven years), and document why.
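Immutability itself has to be enforced by the storage layer (object lock, WORM), not by application code. As a complement, and a different technique from the one this step describes, each record can carry the hash of the previous one so that a quiet edit becomes detectable on verification. The sketch below shows that hash chain plus a trivial way to record the retention decision; the obligation names and year values are placeholders, not legal guidance.

```python
import hashlib
import json

GENESIS = "0" * 64  # previous-hash value used for the first record


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, event: dict) -> dict:
    """Append a record whose hash covers the event and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {"prev_hash": prev_hash, "event": event,
              "hash": _digest(prev_hash, event)}
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True


# Assumption: placeholder obligations; list the rules that actually apply to
# you, with their sources, so the "why" is documented next to the number.
OBLIGATIONS_YEARS = {"example-rule-a": 5, "example-rule-b": 7}


def retention_years(obligations: dict) -> int:
    """Retention follows the longest obligation that applies."""
    return max(obligations.values())
```

A chain like this only proves tampering after the fact, and only if the latest hash is also anchored somewhere the platform's admins cannot rewrite, so it supplements locked storage rather than replacing it.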

5. Rehearse retrieval

Pick a real past incident or invent one: "show every action agent X took against customer Y in March." Time how long the answer takes. If it is days, the trail exists but the capability does not. The rehearsal also surfaces the joins you forgot — usually between approvals and actions.
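A rehearsal can be as simple as running the example question as a timed query. The sketch below uses an in-memory SQLite database with hypothetical `actions` and `approvals` tables and invented rows; your store and schema will differ. The `LEFT JOIN` is deliberate: actions with no matching approval record still appear, with an empty approver, rather than silently dropping out of the answer.

```python
import sqlite3
import time


def build_demo_store() -> sqlite3.Connection:
    """Create a tiny in-memory store with invented rows for the rehearsal."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE approvals (approval_id TEXT PRIMARY KEY, approver TEXT);
        CREATE TABLE actions (
            event_id TEXT PRIMARY KEY, ts TEXT, agent_id TEXT,
            customer_id TEXT, action TEXT, approval_id TEXT
        );
    """)
    conn.execute("INSERT INTO approvals VALUES (?, ?)", ("ap-1", "alice"))
    conn.executemany("INSERT INTO actions VALUES (?, ?, ?, ?, ?, ?)", [
        ("e1", "2024-03-04T10:00:00Z", "agent-x", "cust-y", "send_refund", "ap-1"),
        ("e2", "2024-03-09T12:30:00Z", "agent-x", "cust-y", "update_address", None),
        ("e3", "2024-04-01T08:00:00Z", "agent-x", "cust-y", "send_email", None),
        ("e4", "2024-03-05T09:00:00Z", "agent-z", "cust-y", "send_email", None),
    ])
    return conn


def actions_for(conn, agent_id, customer_id, start, end):
    """Return (rows, seconds) for one agent and customer in [start, end)."""
    started = time.perf_counter()
    rows = conn.execute(
        """
        SELECT a.ts, a.action, a.approval_id, p.approver
        FROM actions AS a
        LEFT JOIN approvals AS p ON p.approval_id = a.approval_id
        WHERE a.agent_id = ? AND a.customer_id = ?
          AND a.ts >= ? AND a.ts < ?
        ORDER BY a.ts
        """,
        (agent_id, customer_id, start, end),
    ).fetchall()
    return rows, time.perf_counter() - started
```

Calling `actions_for(build_demo_store(), "agent-x", "cust-y", "2024-03-01", "2024-04-01")` returns the two March actions for that agent and customer along with the elapsed time. The timing here only measures the query; in a real rehearsal, the number that matters is wall-clock time from the question being asked to the answer being delivered, including access requests and handoffs.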

Common pitfalls

Confusing observability traces (sampled, mutable, engineer-facing) with audit evidence (complete, append-only, third-party-facing)
Logging everything raw and discovering you have built a sensitive-data lake with a retention obligation
No link between the approval record and the action it approved
Trail exists but only the team that built it can query it
