How agent incidents differ

Classic incident response assumes a deterministic, inspectable principal: a server, a service account, a workstation. You pull it offline, read its logs, rotate its key, and you end up with a fixed account of what happened. A compromised agent violates each of those assumptions.

The principal is non-deterministic. The same prompt can produce a different tool-call sequence on the next run, so replay alone won't reproduce the incident, and the agent's own account of what it did is not evidence. The agent holds credentials in its own right: OAuth tokens, API keys, vault leases, short-lived session tokens. Revoking the human operator's access leaves all of them valid. It has also written to memory and audit surfaces, so the record you'd use to investigate may itself be poisoned. And it reaches downstream systems through channels that look benign: a Slack message, a commit, a calendar invite, a row written to a shared store.

So containment can't run sequentially. You work identity, memory, channels, and downstream propagation in parallel, because each one is a live path the compromise keeps using while you handle the others.

The shift that matters

A compromised agent authenticates as itself, writes the evidence you'll later read, and reaches other systems through traffic that trips no alarms. Treat its memory and audit trail as suspect until you've diffed them against a known-good baseline.
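
One way to make "diff against a known-good baseline" concrete is to fingerprint each memory entry and compare the two snapshots. The sketch below is illustrative only: it assumes memory can be exported as a mapping of entry ID to a JSON-serialisable record, and the names `fingerprint` and `diff_against_baseline` are hypothetical, not part of any particular agent framework.

```python
import hashlib
import json


def fingerprint(entry: dict) -> str:
    """Return a stable SHA-256 of one memory entry (key order is ignored)."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_against_baseline(baseline: dict, current: dict) -> dict:
    """Compare two {entry_id: entry} snapshots and list what changed."""
    base = {key: fingerprint(value) for key, value in baseline.items()}
    cur = {key: fingerprint(value) for key, value in current.items()}
    return {
        "added": sorted(cur.keys() - base.keys()),
        "removed": sorted(base.keys() - cur.keys()),
        "modified": sorted(
            key for key in base.keys() & cur.keys() if base[key] != cur[key]
        ),
    }
```

Anything reported as added or modified since the baseline is a candidate for poisoning and deserves manual review. The baseline itself has to come from storage the agent could not write to.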

Phase 1 · Detect and triage

The first fifteen minutes set up everything that follows. Confirm the symptom, preserve state before it decays, classify severity, and find the person who can revoke the agent's credentials. Order matters here: evidence preservation comes before any action that mutates the agent.

Preserve evidence first

Do not restart the agent. A restart rolls the context window and can destroy the only record of the injected instruction. When you do stop it, send SIGTERM rather than deleting the container. You want the process to exit cleanly with its forensic state intact, not vaporised.
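
The ordering above (snapshot first, then a clean stop) can be sketched as follows. This is a minimal illustration under stated assumptions: the agent's state lives in a directory you can read, the agent runs as a local process with a known PID, and POSIX signal semantics apply. The function names are hypothetical.

```python
import hashlib
import os
import shutil
import signal
from pathlib import Path


def snapshot_evidence(source_dir: str, dest_dir: str) -> dict:
    """Copy the agent's state directory and return {relative_path: sha256}.

    dest_dir must not exist yet, so an earlier snapshot is never overwritten.
    """
    src, dst = Path(source_dir), Path(dest_dir)
    shutil.copytree(src, dst)
    manifest = {}
    for path in sorted(dst.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(dst).as_posix()] = digest
    return manifest


def stop_agent_gracefully(pid: int) -> None:
    """Ask the process to exit with SIGTERM. Call only after snapshotting."""
    os.kill(pid, signal.SIGTERM)
```

Storing the manifest alongside the copy lets you show later that the evidence has not changed since collection.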

Classify severity before you contain. The level dictates how aggressively you reverse external actions later.

Level | Definition | Implication
L1 | Agent confused, no irreversible action taken. | Contain and investigate. No external reversal needed.
L2 | Agent acted outside scope, but the actions can be undone. | Contain, then roll back every out-of-scope action.
L3 | Irreversible action taken: a deployment, a payment, a public post, a deletion, or suspected exfiltration. | Full response. Assume external impact and notify accordingly.
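
The three levels reduce to a short decision rule. The sketch below encodes it, assuming triage has already answered three yes/no questions. The `Severity` enum and the parameter names are illustrative choices, not a standard.

```python
from enum import Enum


class Severity(Enum):
    L1 = 1  # confused, nothing irreversible
    L2 = 2  # out of scope, but reversible
    L3 = 3  # irreversible action or suspected exfiltration


def classify(
    out_of_scope_action: bool,
    irreversible_action: bool,
    suspected_exfiltration: bool,
) -> Severity:
    """Map triage answers to a severity level, checking the worst case first."""
    if irreversible_action or suspected_exfiltration:
        return Severity.L3
    if out_of_scope_action:
        return Severity.L2
    return Severity.L1
```

Checking the L3 conditions first means a suspected exfiltration is never downgraded just because the other actions look reversible.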

Phase 2 · Contain

Containment is the parallel-path phase. The agent's identity, its hosting, its channels, and the systems it has already written to are four independent exposures. Close them concurrently, not one after another. The target window is the first thirty minutes.
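
Closing the exposures concurrently can be sketched with a thread pool. This is a hedged illustration: each containment step is assumed to be wrapped in a zero-argument callable that you supply (revoking tokens, pausing the host, removing the bot, freezing downstream writes), and `contain_in_parallel` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def contain_in_parallel(actions: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run every containment action at once and return {name: outcome}.

    A failure in one path is recorded and does not stop the others.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(actions))) as pool:
        futures = {name: pool.submit(action) for name, action in actions.items()}
        for name, future in futures.items():
            try:
                future.result()
                results[name] = "ok"
            except Exception as exc:  # record the failure, keep going
                results[name] = f"failed: {exc}"
    return results
```

Any path that comes back as failed is still a live exposure, so it goes straight back to a human rather than being retried silently.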

Phase 3 · Investigate

With the agent contained, reconstruct what happened from the evidence you preserved, not from the agent. Five questions structure the first twenty-four hours. Each maps to one of the parallel exposures.

Note

"The agent denies it did X" is not evidence. The model's self-report is generated text, subject to the same compromise you're investigating. Verify every claim against logs the agent never controlled.

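Verifying claims against logs the agent never controlled amounts to a set comparison. The sketch below assumes you can reduce both the agent's self-report and an independent log (for example, one kept by a gateway or the target system) to sets of comparable action identifiers. The identifier format and the `reconcile` name are assumptions made for illustration.

```python
from typing import Dict, List, Set


def reconcile(self_reported: Set[str], independent_log: Set[str]) -> Dict[str, List[str]]:
    """Compare what the agent says it did with what independent logs show."""
    return {
        # Claimed and independently observed.
        "confirmed": sorted(self_reported & independent_log),
        # Claimed by the agent only: not evidence of anything.
        "unverified_claims": sorted(self_reported - independent_log),
        # Observed but never mentioned by the agent: investigate first.
        "unreported_actions": sorted(independent_log - self_reported),
    }
```

The unreported actions are usually the most interesting bucket, because they are exactly what a compromised agent would leave out of its own account.
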
Phase 4 · Remediate

Remediation closes the exposure and patches the vector so the same incident class can't recur. The recurring failure here is partial rotation: rotating only the credential that obviously leaked and leaving the rest in place.

Rotate the full set

Rotate every secret the agent has handled, not only the one that obviously leaked. A compromised agent has read its own environment, so assume the attacker knows everything in it. Partial rotation leaves a working key behind and turns a closed incident into a second one.
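
A simple guard against partial rotation is to compare the full inventory of secrets the agent handled with the list of secrets actually rotated. The sketch below assumes you maintain that inventory grouped by source (environment, vault leases, OAuth grants, and so on). The source labels and the `rotation_gaps` name are hypothetical.

```python
from typing import Dict, Iterable, List


def rotation_gaps(
    handled_by_source: Dict[str, Iterable[str]],
    rotated: Iterable[str],
) -> Dict[str, List[str]]:
    """Return {source: [secret names still unrotated]}, omitting clean sources."""
    done = set(rotated)
    gaps: Dict[str, List[str]] = {}
    for source, names in handled_by_source.items():
        missing = sorted(set(names) - done)
        if missing:
            gaps[source] = missing
    return gaps
```

An empty result is the condition for calling rotation complete. Anything else names the keys an attacker can still use.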

Phase 5 · Learn

The incident isn't closed when the agent comes back online. It's closed when the response is documented, the controls are updated, and you've shown the next response will be faster.

Roles to confirm in advance

The slowest part of a real response is finding who has the authority to act. Confirm these roles before an incident, not during one. If a single person fills several of them, that concentration is itself a risk worth recording.

Role | Responsibility
Agent owner | Holds authority over scope, credentials, and deployment.
Security on-call | Coordinates the response across systems.
Vault / KMS owner | Runs credential revocation and rotation.
Channel admin | Removes the bot from channels and disables webhooks.
Communications | Handles external-facing notifications when data was exposed.
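
The advance check described above (every role filled, concentration recorded) can be automated against a roster. The sketch below assumes the roster is a plain mapping of role name to person. The role names come from the table, while `review_roles` and the roster format are illustrative.

```python
from collections import defaultdict
from typing import Dict, List

REQUIRED_ROLES = (
    "Agent owner",
    "Security on-call",
    "Vault / KMS owner",
    "Channel admin",
    "Communications",
)


def review_roles(assignments: Dict[str, str]) -> Dict[str, object]:
    """Report unfilled roles and people who hold more than one role."""
    unfilled: List[str] = [role for role in REQUIRED_ROLES if not assignments.get(role)]
    by_person: Dict[str, List[str]] = defaultdict(list)
    for role in REQUIRED_ROLES:
        person = assignments.get(role)
        if person:
            by_person[person].append(role)
    concentrated = {p: roles for p, roles in by_person.items() if len(roles) > 1}
    return {"unfilled": unfilled, "concentrated": concentrated}
```

Running this on a schedule, rather than once, catches the roster drifting as people change teams.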

Checklist

Anti-patterns during response

  • Restarting the agent before snapshotting evidence.
  • Killing the container before exporting logs.
  • Treating "the agent denies it did X" as evidence.
  • Rotating only the credential that obviously leaked.
  • Skipping memory review because "the agent doesn't write to memory" (verify before you believe it).
  • Closing the incident without updating the relevant L-level.
