Why zero trust is the right frame
Individual agent controls (guard prompts, sandboxes, credential vaults, redaction filters) accumulate ad hoc unless something ties them together. After a few months of operation you find overlapping guards, contradictory policies, and no single component responsible for any given decision. A control added for one threat gets quietly worked around to address another. Zero Trust Architecture (ZTA) is the frame that makes those controls compose instead of pile up.
Zero trust is an architectural posture rather than a product you buy. Its premise is never to trust by default, and its working principles (verify every access decision explicitly, grant least privilege, and design on the assumption that some component is already compromised) are codified for human and workload access in NIST SP 800-207 and were demonstrated at scale by Google's BeyondCorp. The translation to agents is unusually direct. Every tool call is a trust boundary, every external input may carry adversarial content, and every agent is a non-deterministic process holding credentials.
The perimeter model assumed a friendly interior behind a hardened edge. Agents have no such interior. The agent runs inside the perimeter and is continuously influenced by content that originated outside it: fetched pages, files, channel messages, tool and sub-agent outputs. An agent violates the friendly-interior assumption by design, which is exactly why the perimeter model fails for agents and why ZTA's assumptions match the real environment.
The agent is a non-deterministic principal. Never trust its decisions implicitly, and enforce every one at the boundary. What stops a rogue tool call is a control placed outside the agent's reach. A control expressed in its prompt does not.
The PDP/PEP model
NIST SP 800-207 defines a small vocabulary worth getting straight, because the rest of this frame depends on it. The subject is the entity making the request. The resource is what it requests: an API, a file, a signing service, a database. The Policy Engine decides whether this subject may access this resource right now under these conditions, and emits a yes/no with reasons. The Policy Administrator acts on that decision by setting up or tearing down the session and issuing any session credential. Together they form the Policy Decision Point (PDP). The Policy Enforcement Point (PEP) is the inline component on the data plane that carries the decision out by allowing or denying traffic.
For agents the mapping is one-to-one. The agent is the subject, every tool is a resource, and every tool call passes through a PEP. The PEP is where you actually stop a call: the wire-level decision that it will or will not be dispatched, rather than the model's intent inferred from natural language. The PDP is where you express the policy governing that decision: what credentials are required, what arguments are admissible, whether human confirmation is needed.
The architecture is deliberately small: one PDP (co-located with the harness or run as a sidecar) and one PEP per tool. The PDP holds the policy and reads the contextual signals (which agent, which tool, which arguments, what the previous tool call was, whether any external content has been ingested this session). The PEP enforces the result on the wire. Because the agent may be influenced or compromised, the controls sit where it cannot touch them.
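The shape described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the class names, the `decide` and `dispatch` methods, and the allow-list policy are all hypothetical, and a real PDP would read far more context than the agent and tool name.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str


@dataclass
class PolicyDecisionPoint:
    """Holds the policy. Here the policy is only a per-agent tool allow-list."""

    allowed_tools: dict[str, set[str]] = field(default_factory=dict)

    def decide(self, agent_id: str, tool: str, args: dict[str, Any]) -> Decision:
        # args is accepted so a richer policy could inspect it; this sketch does not.
        if tool not in self.allowed_tools.get(agent_id, set()):
            return Decision(False, f"{agent_id} is not authorised for {tool}")
        return Decision(True, f"{tool} is on the allow-list for {agent_id}")


class PolicyEnforcementPoint:
    """Wraps exactly one tool. The tool function is reachable only via dispatch()."""

    def __init__(self, tool: str, fn: Callable[..., Any], pdp: PolicyDecisionPoint):
        self._tool = tool
        self._fn = fn
        self._pdp = pdp

    def dispatch(self, agent_id: str, args: dict[str, Any]) -> Any:
        decision = self._pdp.decide(agent_id, self._tool, args)
        if not decision.allow:
            # The call is stopped here, regardless of what the model intended.
            raise PermissionError(decision.reason)
        return self._fn(**args)


# One PDP, one PEP per tool. The tool bodies are stand-ins.
pdp = PolicyDecisionPoint({"research-agent": {"web_fetch"}})
fetch_pep = PolicyEnforcementPoint("web_fetch", lambda url: f"fetched {url}", pdp)
shell_pep = PolicyEnforcementPoint("shell", lambda cmd: f"ran {cmd}", pdp)
```

With this wiring, `fetch_pep.dispatch("research-agent", {"url": "https://example.com"})` goes through, while the same agent calling `shell_pep.dispatch(...)` raises `PermissionError` before the tool function runs.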
Verify explicitly, for agents
Every access decision is authenticated and authorised on all available signals, not granted implicitly by origin. For agents this has four consequences.
The agent is a subject with its own identity. Not "the user's identity, used by the agent". This matters for authorisation, where the agent's privileges should not be the user's, and for audit. Once an agent acts as a person, the trail can no longer separate human action from agent action, and revoking the agent means revoking the human. Give each agent its own service account, credentials and identity material.
Authorise per tool call, not per session. Granting one broad authorisation at session start ("act on behalf of the user for thirty minutes") collapses the access decision into a moment that occurred before any later action was chosen. Authorise each invocation of a state-changing tool (delete_file, send_email, deploy_release, make_payment) independently.
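One way to make per-call authorisation concrete is a grant that is bound to a specific tool and argument set and consumed on use. The sketch below assumes that design; `PerCallAuthoriser`, `call_fingerprint` and the single-use rule are illustrative choices, not something the NIST model prescribes, and tools outside the state-changing set are assumed to be covered by other policy.

```python
import hashlib
import json
from typing import Any

# Tool names taken from the examples in the text above.
STATE_CHANGING = frozenset({"delete_file", "send_email", "deploy_release", "make_payment"})


def call_fingerprint(tool: str, args: dict[str, Any]) -> str:
    """Stable digest of one concrete invocation (tool plus exact arguments)."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PerCallAuthoriser:
    """Hypothetical authoriser: a grant covers one invocation and is then spent."""

    def __init__(self) -> None:
        self._grants: set[str] = set()

    def grant(self, tool: str, args: dict[str, Any]) -> None:
        self._grants.add(call_fingerprint(tool, args))

    def authorise(self, tool: str, args: dict[str, Any]) -> bool:
        if tool not in STATE_CHANGING:
            # Assumption: read-only tools are handled by other policy in this sketch.
            return True
        fingerprint = call_fingerprint(tool, args)
        if fingerprint in self._grants:
            self._grants.remove(fingerprint)  # single use: no replay within the session
            return True
        return False
```

A grant for `make_payment` with `{"amount": 10}` authorises exactly one such call. A repeat of the same call, or the same tool with different arguments, is refused until a new grant is issued.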
Consider signals beyond the prompt. Which tool, which arguments, whether the arguments came from a fetched URL or from direct user input, whether the previous call was a network fetch that put external instructions into context. A slack_post_message whose channel argument was extracted from a fetched web page deserves stricter scrutiny than one whose channel is the constant #research declared in the tool configuration.
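Argument provenance only helps if the harness records it. The following sketch assumes the harness can tag each argument with where its value came from; the `Provenance` labels and the three scrutiny levels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    CONFIG = "config"      # constant declared in the tool configuration
    USER = "user"          # typed directly by the user
    EXTERNAL = "external"  # extracted from fetched or tool-returned content


@dataclass(frozen=True)
class Arg:
    value: str
    provenance: Provenance


def scrutiny(args: dict[str, Arg], external_content_in_session: bool) -> str:
    """Return a scrutiny level for one tool call based on non-prompt signals."""
    if any(arg.provenance is Provenance.EXTERNAL for arg in args.values()):
        return "strict"    # an argument was lifted from untrusted content
    if external_content_in_session:
        return "elevated"  # context may already carry injected instructions
    return "normal"


from_page = {"channel": Arg("#general", Provenance.EXTERNAL)}
from_config = {"channel": Arg("#research", Provenance.CONFIG)}
```

Here `scrutiny(from_page, True)` returns `"strict"` and `scrutiny(from_config, False)` returns `"normal"`, matching the slack_post_message contrast in the paragraph above. What each level triggers (deny, confirm, log) is left to the deployment.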
Escalate high-risk calls to a human. The agent decides, the PEP intercepts, and the operator is asked "the agent decided to do this, are you also OK with it?". The call proceeds only on yes. The high-risk class is deployment-specific but typically covers anything irreversible: payments, deletions, public posts, transactions, external email.
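The escalation step can be sketched as a PEP that pauses on a confirmation callback. The function name, the `confirm` callback signature and the contents of the high-risk set are assumptions for illustration; in practice the callback would reach an operator through whatever approval channel the deployment uses.

```python
from typing import Any, Callable

# Deployment-specific; these three names are examples only.
HIGH_RISK_TOOLS = frozenset({"make_payment", "delete_file", "send_email"})


def enforce_with_confirmation(
    tool: str,
    args: dict[str, Any],
    execute: Callable[..., Any],
    confirm: Callable[[str, dict[str, Any]], bool],
) -> Any:
    """Run the tool only if it is low-risk or the operator says yes."""
    if tool in HIGH_RISK_TOOLS and not confirm(tool, args):
        raise PermissionError(f"operator declined {tool}")
    return execute(**args)


def always_decline(tool: str, args: dict[str, Any]) -> bool:
    """Stand-in for a real operator prompt; declines everything."""
    return False
```

Because the check sits in the enforcement path, a declined call never reaches `execute`. Low-risk tools skip the prompt entirely, which keeps confirmations rare enough to be read.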
Least privilege, for agents
Applied to agents, least privilege reads at both the tool level and the role level.
At the tool level, an agent's tools are its capabilities, so removing a tool is the strongest deprivileging available. The first question for any configured tool is whether the agent needs it. A research agent rarely needs a shell. A coding agent rarely needs slack_post_message. A design agent rarely needs production database access. Tool inventory is the most effective single lever for reducing blast radius, and it grows silently during development, which makes it the easiest to overlook.
Where a tool must stay, scope it tightly at the boundary. A slack_post_message that can post to any channel is too broad; constrain it to #research or to direct messages. A read_file that accepts an arbitrary path is too broad; constrain it to the working directory. A web_fetch that accepts an arbitrary URL is too broad; constrain it to a domain allow-list. Credentials follow the same logic. Prefer short-lived just-in-time credentials issued at call time and revoked at session end over long-lived broad-scope tokens.
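Boundary scoping for read_file and web_fetch might look like the sketch below. The helper names are hypothetical, `docs.example.com` is a placeholder domain, and requiring `https` is an extra assumption not stated above. `Path.is_relative_to` needs Python 3.9 or later.

```python
from pathlib import Path
from urllib.parse import urlparse


def scoped_path(requested: str, workdir: Path) -> Path:
    """Resolve a requested path and refuse anything outside the working directory."""
    root = workdir.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{requested!r} is outside {root}")
    return candidate


def scoped_url(url: str, allowed_domains: frozenset[str]) -> str:
    """Accept a URL only if its host is exactly on the allow-list."""
    parsed = urlparse(url)
    # hostname is lower-cased and excludes any userinfo or port.
    if parsed.scheme != "https" or parsed.hostname not in allowed_domains:
        raise PermissionError(f"{url!r} is not an allowed destination")
    return url


ALLOWED_DOMAINS = frozenset({"docs.example.com"})
```

The checks run on the resolved path and the parsed hostname rather than on the raw string, so `../` traversal, absolute paths, and look-alike hosts such as `docs.example.com.evil.net` are rejected.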
Least privilege applies recursively. In a multi-agent system an orchestrator does not need the union of every sub-agent's permissions. Hand-offs should pass the minimum capability required for the delegated task. An orchestrator that accumulates authority is the principal multi-agent failure mode.
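A minimal sketch of a least-privilege hand-off, assuming capabilities are declared per task rather than inherited from the orchestrator. The task names, the `TASK_CAPABILITIES` table and the `Handoff` type are hypothetical.

```python
from dataclasses import dataclass

# Assumed declaration: each delegated task lists the only tools it needs.
TASK_CAPABILITIES: dict[str, frozenset[str]] = {
    "summarise_paper": frozenset({"web_fetch"}),
    "fix_bug": frozenset({"read_file", "write_file"}),
}


@dataclass(frozen=True)
class Handoff:
    task: str
    sub_agent: str
    capabilities: frozenset[str]


def hand_off(task: str, sub_agent: str) -> Handoff:
    """Build a hand-off carrying only the capabilities declared for the task."""
    if task not in TASK_CAPABILITIES:
        raise PermissionError(f"no capabilities declared for task {task!r}")
    return Handoff(task, sub_agent, TASK_CAPABILITIES[task])
```

The orchestrator never holds `web_fetch` or `write_file` itself in this design. It names a task, and the grant that reaches the sub-agent is looked up from the declaration, so authority does not accumulate at the top.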
At the role level, different tasks belong to different agents. A coding agent does not need email. A research agent does not need a shell. The all-in-one assistant (one agent with shell, email, Slack, database and irreversible-action tools) is convenient to stand up, and it combines the worst case of every threat class. Role separation is a structural defence that bounds what any single compromise can reach.
Assume breach, for agents
Design for the case where a component is already compromised. Five practical consequences.
- Treat every external input as adversarial. Web pages, fetched files, channel messages, MCP server outputs and sub-agent return values are all potential carriers of injected instructions. The defender's question is "this will be hostile, what then?". The strict-ignore guard is the prompt-level answer, and it is a soft control. Tool-side enforcement is the hard control.
- Bound the blast radius of any single agent. A compromised research agent should not reach production, and a compromised coding agent should not reach personal email. Role separation is the structural means, and harness and runtime controls are the operational means.
- Audit every action that crosses a tool boundary. Log the tool name, arguments, decision and outcome, stored separately from the agent. A log the agent can write is a log it can rewrite after a successful injection.
- Encrypt agent-to-tool channels on infrastructure the operator controls. mTLS between harness and tool servers, an overlay network for cross-host links, and no tool interfaces on open public ports. Encryption does not prevent compromise, since a hostile MCP server speaks TLS too, but it removes cheap attacks and pins the trusted components to the ones the operator chose.
- Plan for compromise. A runbook that says, in plain language, "this agent is misbehaving, do these things" beats any number of preventive controls. It typically includes a kill switch, a token rotation procedure, and a means to replay a session's action history. The point is that it exists before the incident.
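For the audit consequence above, one common way to make a tool-call log tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below assumes that technique; the field names and the `GENESIS` value are illustrative, and the chain only helps if the log (or at least its latest hash) lives in a store the agent cannot write.

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict[str, Any]) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(
    log: list[dict[str, Any]], tool: str, args: dict[str, Any], decision: str, outcome: str
) -> dict[str, Any]:
    """Append one tool-boundary event, chained to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"tool": tool, "args": args, "decision": decision, "outcome": outcome, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: list[dict[str, Any]]) -> bool:
    """Recompute every hash and link; any edited or removed entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if body["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Replaying a session's action history then amounts to reading the verified entries in order. Truncating the end of the log is not detected by the chain alone, which is one reason to anchor the latest hash somewhere the agent cannot reach.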
A zero-trust maturity ladder
Five dimensions across four coarse levels. The goal is to locate a deployment's current state and pick a direction of travel, not to define a certification scheme.
| Level | Identity | Tool authorisation | Egress | Secrets | Audit |
|---|---|---|---|---|---|
| L0 Naive | Agent shares user identity | Tools have full scope | Open internet | Cleartext in env / config | None |
| L1 Basic | Distinct service account per agent | Tool scopes per agent | Limited egress (allow-list) | Env vars, none in repos | Conversation log |
| L2 Standard | Per-agent identity + per-session token | Per-tool-call PEP, role-separated | VPN-gated egress | Password manager / KMS | Conversation + tool calls logged centrally |
| L3 Hardened | Short-lived JIT credentials, identity attested | Risk-scored decisions, HITL for high-risk | No egress except declared | Vault, dynamic per-call secrets | Tamper-evident audit, replay capability |
L0 is the default state of many current deployments and the first priority for remediation: shared identity, full tool scope, open internet, cleartext credentials in environment variables or config, and no audit trail. Every property is individually fixable. L2 is a realistic target for most organisations. L3 is appropriate where the agent touches irreversible actions, regulated data, or systems where compromise is materially expensive.
The ladder is not a competition. Aim for consistent L2 across a deployment, with selective L3 on the agents that carry the greatest blast radius. A read-only research agent does not need L3. An agent that signs blockchain transactions plausibly needs L3 on every dimension, because a successful injection there is non-recoverable.
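Locating a deployment on the ladder can be as simple as recording a level per dimension and listing what falls short of the target. The sketch below assumes levels are recorded as integers 0 to 3; the dimension keys and the `gaps` helper are hypothetical names.

```python
# The five ladder dimensions, as snake_case keys.
DIMENSIONS = ("identity", "tool_authorisation", "egress", "secrets", "audit")


def gaps(assessment: dict[str, int], target: int) -> dict[str, int]:
    """Return the dimensions whose current level is below the target level."""
    missing = set(DIMENSIONS) - assessment.keys()
    if missing:
        raise ValueError(f"unassessed dimensions: {sorted(missing)}")
    return {dim: assessment[dim] for dim in DIMENSIONS if assessment[dim] < target}


research_agent = {
    "identity": 1,
    "tool_authorisation": 2,
    "egress": 0,
    "secrets": 1,
    "audit": 2,
}
```

Calling `gaps(research_agent, target=2)` lists identity, egress and secrets as the next moves. For an agent with a large blast radius, the same call with `target=3` gives the hardening backlog.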
Patterns to avoid
A handful of recurring anti-patterns account for a disproportionate share of incidents.
- Running the agent under the user's identity. Accountability collapses, the trail stops distinguishing human from agent, and revocation has to take out the human too. The fix is mechanical: a service identity per agent.
- Granting all tools to every agent. Tool inventory is the most effective lever for reducing blast radius and the easiest to neglect. A more capable model is no substitute for tool-scope enforcement: however an agent is manipulated, it cannot call a tool it does not hold.
- Holding a long-lived broad-scope OAuth token. This is a common cause of high-impact secret leaks in agentic deployments, because one leaked token exposes everything in its scope for as long as it lives. Prefer short-lived, narrow-scope, service-to-service credentials.
- Relying on instruction-level guardrails as the hard control. "Tell the model not to do X" is a useful soft layer. If the only thing between an agent and a destructive tool call is the wording of the system prompt, the deployment is at L0. Pair the guard with enforcement at the tool boundary.
- Storing audit logs in the agent's reach. An agent with write access to its own log can edit it, so after an injection the log is not a trustworthy record. Write audit data to a store the agent cannot reach.
- Treating confirmation prompts as friction. If confirmations are too frequent to be useful, the agent has too many sensitive capabilities. Split it into more narrowly scoped agents rather than silencing the prompts.
Checklist
Before shipping an agent with tools
- The agent has its own service identity, distinct from any human user.
- Each agent holds only the tools its task requires; tool inventory has been reviewed.
- Every tool is scoped at the boundary: allow-listed channels, paths, domains.
- Every state-changing tool call passes a PEP that authorises per call, not per session.
- The PDP reads signals beyond the prompt: tool, arguments, argument provenance, prior fetches.
- High-risk and irreversible actions require human confirmation at the PEP.
- Credentials are short-lived and narrowly scoped; no long-lived broad tokens in env or config.
- Egress is constrained to declared destinations.
- Tool calls, not just conversation, are logged to a store the agent cannot modify.
- A compromise runbook exists: kill switch, token rotation, session replay.
- You can name the level (L0 to L3) on each ladder dimension and your next move up.