Why zero trust is the right frame

Individual agent controls (guard prompts, sandboxes, credential vaults, redaction filters) accumulate ad hoc unless something ties them together. After a few months of operation you find overlapping guards, contradictory policies, and no single component responsible for any given decision. A control added for one threat gets quietly worked around to address another. Zero Trust Architecture (ZTA) is the frame that makes those controls compose instead of pile up.

Zero trust is an architectural posture rather than a product you buy. Its three principles (never trust by default, verify every access decision explicitly, and design on the assumption that some component is already compromised) were written for human and workload access in NIST SP 800-207 and proven at scale by Google's BeyondCorp. The translation to agents is unusually direct. Every tool call is a trust boundary, every external input may carry adversarial content, and every agent is a non-deterministic process holding credentials.

The perimeter model assumed a friendly interior behind a hardened edge. Agents have no such interior. The agent runs inside the perimeter and is continuously influenced by content that originated outside it: fetched pages, files, channel messages, tool and sub-agent outputs. An agent violates the friendly-interior assumption by design, which is exactly why the perimeter model fails for agents and why ZTA's assumptions match the real environment.

The central reframe

The agent is a non-deterministic principal. Never trust its decisions implicitly, and enforce every one at the boundary. What stops a rogue tool call is a control placed outside the agent's reach. A control expressed in its prompt does not.

The PDP/PEP model

NIST SP 800-207 defines a small vocabulary worth getting straight, because the rest of this frame depends on it. The subject is the entity making the request. The resource is what it requests: an API, a file, a signing service, a database. The Policy Engine decides whether this subject may access this resource right now under these conditions, and emits a yes/no with reasons. The Policy Administrator turns that decision into a session credential. Together they form the Policy Decision Point (PDP), which makes the decision. The Policy Enforcement Point (PEP) is the inline component on the data plane that carries the decision out by allowing or denying the traffic.

For agents the mapping is one-to-one. The agent is the subject, every tool is a resource, and every tool call passes through a PEP. The PEP is where you actually stop a call: the wire-level decision that it will or will not be dispatched, rather than the model's intent inferred from natural language. The PDP is where you express the policy governing that decision: what credentials are required, what arguments are admissible, whether human confirmation is needed.

Diagram: an agent (subject) requests access to tools (resources); each tool call is intercepted by a PEP that consults a PDP, which holds policy and reads contextual signals before returning an allow or deny decision enforced on the data plane.
The agent is the subject and each tool a resource. A PEP intercepts every call and consults the PDP, which holds policy and reads context. The agent is not trusted to honour either of them.

The architecture is deliberately small: one PDP (co-located with the harness or run as a sidecar) and one PEP per tool. The PDP holds the policy and reads the contextual signals (which agent, which tool, which arguments, what the previous tool call was, whether any external content has been ingested this session). The PEP enforces the result on the wire. Because the agent may be influenced or compromised, the controls sit where it cannot touch them.
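
The split between deciding and enforcing can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the class names, the `Request` fields, and the example rule that blocks `send_email` once external content has been ingested are all hypothetical, chosen only to mirror the signals listed above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Set


@dataclass(frozen=True)
class Request:
    agent_id: str                        # the subject
    tool: str                            # the resource
    args: Dict[str, Any]
    previous_tool: Optional[str] = None  # contextual signal
    external_content_ingested: bool = False


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str


class PolicyDecisionPoint:
    """Holds policy and reads context. It never dispatches a call itself."""

    def __init__(self, tools_by_agent: Dict[str, Set[str]]) -> None:
        self._tools_by_agent = tools_by_agent

    def decide(self, req: Request) -> Decision:
        if req.tool not in self._tools_by_agent.get(req.agent_id, set()):
            return Decision(False, f"{req.agent_id} is not granted {req.tool}")
        # Hypothetical example rule: no outbound email once external content is in context.
        if req.tool == "send_email" and req.external_content_ingested:
            return Decision(False, "send_email denied after external content was ingested")
        return Decision(True, "policy satisfied")


class PolicyEnforcementPoint:
    """Wraps exactly one tool. The tool runs only on an allow decision."""

    def __init__(self, tool_name: str, tool_fn: Callable[..., Any],
                 pdp: PolicyDecisionPoint) -> None:
        self._tool_name = tool_name
        self._tool_fn = tool_fn
        self._pdp = pdp

    def call(self, agent_id: str, args: Dict[str, Any], **context: Any) -> Any:
        decision = self._pdp.decide(Request(agent_id, self._tool_name, args, **context))
        if not decision.allow:
            raise PermissionError(decision.reason)
        return self._tool_fn(**args)


# Stand-in tool for illustration; a real deployment would wrap the actual client.
pdp = PolicyDecisionPoint({"research-agent": {"web_fetch", "send_email"}})
fetch_pep = PolicyEnforcementPoint("web_fetch", lambda url: f"fetched {url}", pdp)
print(fetch_pep.call("research-agent", {"url": "https://example.com"}))
```

The agent only ever receives the `call` method of a PEP. It holds no reference to the underlying tool function or to the policy table, which is the property the architecture relies on.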

Verify explicitly, for agents

Every access decision is authenticated and authorised on all available signals, not granted implicitly by origin. For agents this has four consequences.

The agent is a subject with its own identity, not "the user's identity, used by the agent". This matters for authorisation, where the agent's privileges should not be the user's, and for audit. Once an agent acts as a person, the trail can no longer separate human action from agent action, and revoking the agent means revoking the human. Give each agent its own service account, credentials and identity material.

Authorise per tool call, not per session. Granting one broad authorisation at session start ("act on behalf of the user for thirty minutes") collapses the access decision into a moment that occurred before any later action was chosen. Authorise each invocation of a state-changing tool (delete_file, send_email, deploy_release, make_payment) independently.
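
One way to make per-call authorisation concrete is a decorator that re-evaluates policy on every invocation of a state-changing tool. The sketch below is illustrative only: `per_call_pep`, `payment_policy`, the payee list and the limit are hypothetical names and values, and `make_payment` is a stand-in that returns a string rather than moving money.

```python
import functools
from typing import Any, Callable, Dict

STATE_CHANGING = {"delete_file", "send_email", "deploy_release", "make_payment"}
Authoriser = Callable[[str, Dict[str, Any]], bool]


def per_call_pep(tool_name: str, authorise: Authoriser):
    """Wrap a tool so that policy is evaluated on every call, never cached per session."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def guarded(**args: Any) -> Any:
            if tool_name in STATE_CHANGING and not authorise(tool_name, args):
                raise PermissionError(f"{tool_name} denied for arguments {args}")
            return fn(**args)
        return guarded
    return wrap


KNOWN_PAYEES = {"acme-hosting"}  # hypothetical allow-list
PAYMENT_LIMIT = 100              # hypothetical per-call ceiling


def payment_policy(tool: str, args: Dict[str, Any]) -> bool:
    return args.get("payee") in KNOWN_PAYEES and args.get("amount", 0) <= PAYMENT_LIMIT


@per_call_pep("make_payment", payment_policy)
def make_payment(payee: str, amount: int) -> str:
    return f"paid {amount} to {payee}"  # stand-in for the real side effect
```

Within the same session, a first call to `make_payment(payee="acme-hosting", amount=40)` succeeds while a later call with an unknown payee raises `PermissionError`. The earlier approval buys the later call nothing.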

Consider signals beyond the prompt: which tool, which arguments, whether the arguments came from a fetched URL or from direct user input, and whether the previous call was a network fetch that put external instructions into context. A slack_post_message whose channel argument was extracted from a fetched web page deserves stricter scrutiny than one whose channel is the constant #research declared in the tool configuration.
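
Argument provenance only becomes a usable signal if it is recorded alongside the value. A minimal sketch, assuming the harness can tag each argument with its origin; the `Provenance` enum, the `Arg` wrapper and the two scrutiny levels are hypothetical and stand in for whatever the PDP actually consumes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Provenance(Enum):
    TOOL_CONFIG = "tool_config"          # constant declared in the tool configuration
    USER_INPUT = "user_input"            # typed directly by the user
    FETCHED_CONTENT = "fetched_content"  # extracted from a page, file or tool output


@dataclass(frozen=True)
class Arg:
    value: str
    provenance: Provenance


def scrutiny_for(args: Dict[str, Arg], previous_tool: Optional[str] = None) -> str:
    """Return 'strict' when external content could have shaped this call."""
    if any(a.provenance is Provenance.FETCHED_CONTENT for a in args.values()):
        return "strict"
    if previous_tool == "web_fetch":
        return "strict"
    return "standard"


configured = {"channel": Arg("#research", Provenance.TOOL_CONFIG)}
extracted = {"channel": Arg("#general", Provenance.FETCHED_CONTENT)}
print(scrutiny_for(configured), scrutiny_for(extracted))
```

What "strict" means is left to policy: it might require human confirmation, a narrower allow-list, or outright denial.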

Escalate high-risk calls to a human. The agent decides, the PEP intercepts, and the operator is asked "the agent decided to do this, are you also OK with it?". The call proceeds only on yes. The high-risk class is deployment-specific but typically covers anything irreversible: payments, deletions, public posts, transactions, external email.
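
The escalation step can be sketched as a gate in the PEP that blocks until an operator answers. Everything here is an assumption made for illustration: the `HIGH_RISK` set, the function name, and the `ask_operator` callback, which in practice might be a chat prompt, a ticket, or an approval UI.

```python
from typing import Any, Callable, Dict

HIGH_RISK = {"make_payment", "delete_file", "send_email"}  # deployment-specific


def enforce_with_confirmation(
    tool: str,
    args: Dict[str, Any],
    tools: Dict[str, Callable[..., Any]],
    ask_operator: Callable[[str], bool],
) -> Any:
    """Dispatch a tool call, pausing for a human yes/no on high-risk tools."""
    if tool in HIGH_RISK:
        question = f"The agent decided to call {tool} with {args}. Are you also OK with it?"
        if not ask_operator(question):
            raise PermissionError(f"operator declined {tool}")
    return tools[tool](**args)


# Stand-in tools for illustration only.
tools = {
    "web_fetch": lambda url: f"fetched {url}",
    "send_email": lambda to, body: f"sent to {to}",
}
```

A call such as `enforce_with_confirmation("send_email", {...}, tools, ask_operator)` proceeds only when the callback returns `True`. Low-risk tools like `web_fetch` pass through without a prompt.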

Least privilege, for agents

For agents, least privilege applies at two levels: the tool and the role.

At the tool level, an agent's tools are its capabilities, so removing a tool is the strongest deprivileging available. The first question for any configured tool is whether the agent needs it. A research agent rarely needs a shell. A coding agent rarely needs slack_post_message. A design agent rarely needs production database access. Tool inventory is the most effective single lever for reducing blast radius, and it grows silently during development, which makes it the easiest to overlook.

Where a tool must stay, scope it tightly at the boundary. A slack_post_message that can post to any channel is too broad; constrain it to #research or to direct messages. A read_file that accepts an arbitrary path is too broad; constrain it to the working directory. A web_fetch that accepts an arbitrary URL is too broad; constrain it to a domain allow-list. Credentials follow the same logic. Prefer short-lived just-in-time credentials issued at call time and revoked at session end over long-lived broad-scope tokens.
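
The three scoping rules above can each be checked at the boundary before a call is dispatched. The sketch below assumes a hypothetical working directory, domain allow-list and channel allow-list; the point is the shape of the checks, in particular that paths are resolved before comparison and that URLs are matched on the parsed hostname rather than by substring.

```python
from pathlib import Path
from urllib.parse import urlparse

WORKDIR = Path("/srv/agent/work")                    # hypothetical working directory
ALLOWED_DOMAINS = {"docs.python.org", "example.com"}  # hypothetical allow-list
ALLOWED_CHANNELS = {"#research"}


def path_in_scope(candidate: str, root: Path = WORKDIR) -> bool:
    """True only if the resolved path stays inside the working directory."""
    root_resolved = root.resolve()
    resolved = (root / candidate).resolve()  # collapses '..' and follows symlinks
    return resolved == root_resolved or root_resolved in resolved.parents


def url_in_scope(url: str) -> bool:
    """True only for https URLs whose exact hostname is allow-listed."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_DOMAINS


def channel_in_scope(channel: str) -> bool:
    return channel in ALLOWED_CHANNELS


print(path_in_scope("notes/a.txt"), path_in_scope("../secrets.txt"))
print(url_in_scope("https://example.com/page"), url_in_scope("https://example.com.evil.net/"))
```

Exact hostname matching is deliberately strict here. Allowing subdomains is a policy choice that would need its own explicit rule.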

Note

Least privilege applies recursively. In a multi-agent system an orchestrator does not need the union of every sub-agent's permissions. Hand-offs should pass the minimum capability required for the delegated task. An orchestrator that accumulates authority is the principal multi-agent failure mode.
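
A hand-off that passes minimum capability might look like the following sketch. It assumes a hypothetical table mapping each delegated task to the tools it requires; the task names, `ORCHESTRATOR_TOOLS` and `hand_off` are illustrative and not drawn from any particular framework.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical mapping from delegated task to the tools that task requires.
TASK_TOOLS: Dict[str, FrozenSet[str]] = {
    "summarise_paper": frozenset({"web_fetch"}),
    "fix_bug": frozenset({"read_file", "write_file"}),
}

# The orchestrator can delegate, and holds none of the sub-agents' tools itself.
ORCHESTRATOR_TOOLS: FrozenSet[str] = frozenset({"delegate"})


@dataclass(frozen=True)
class HandOff:
    task: str
    sub_agent: str
    tools: FrozenSet[str]


def hand_off(task: str, sub_agent: str) -> HandOff:
    """Grant the sub-agent only the tools the delegated task requires."""
    if task not in TASK_TOOLS:
        raise KeyError(f"no capability profile defined for task {task!r}")
    return HandOff(task, sub_agent, TASK_TOOLS[task])


grant = hand_off("summarise_paper", "research-agent")
print(sorted(grant.tools))
```

Because grants are issued per task from a table the orchestrator does not own, compromising the orchestrator does not yield the union of every sub-agent's tools.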

At the role level, different tasks belong to different agents. A coding agent does not need email. A research agent does not need a shell. The all-in-one assistant (one agent with shell, email, Slack, database and irreversible-action tools) is convenient to stand up, and it combines the worst case of every threat class. Role separation is a structural defence that bounds what any single compromise can reach.

Assume breach, for agents

Design for the case where a component is already compromised. Five practical consequences.

A zero-trust maturity ladder

Five dimensions across four coarse levels. The goal is to locate a deployment's current state and pick a direction of travel, not to define a certification scheme.

| Level | Identity | Tool authorisation | Egress | Secrets | Audit |
| --- | --- | --- | --- | --- | --- |
| L0 Naive | Agent shares user identity | Tools have full scope | Open internet | Cleartext in env / config | None |
| L1 Basic | Distinct service account per agent | Tool scopes per agent | Limited egress (allow-list) | Env vars, none in repos | Conversation log |
| L2 Standard | Per-agent identity + per-session token | Per-tool-call PEP, role-separated | VPN-gated egress | Password manager / KMS | Conversation + tool calls logged centrally |
| L3 Hardened | Short-lived JIT credentials, identity attested | Risk-scored decisions, HITL for high-risk | No egress except declared | Vault, dynamic per-call secrets | Tamper-evident audit, replay capability |

L0 is the default state of many current deployments and the first priority for remediation: shared identity, full tool scope, open internet, long-lived cleartext credentials in environment variables or config, and no audit trail. Every property is individually fixable. L2 is a realistic target for most organisations. L3 is appropriate where the agent touches irreversible actions, regulated data, or systems where compromise is materially expensive.

The ladder is not a competition. Aim for consistent L2 across a deployment, with selective L3 on the agents that carry the greatest blast radius. A read-only research agent does not need L3. An agent that signs blockchain transactions plausibly needs L3 on every dimension, because a successful injection there is non-recoverable.
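
Locating a deployment on the ladder amounts to recording one level per dimension and listing the dimensions that fall short of the target. A small sketch, assuming levels are recorded as integers 0 to 3; the function name and the example posture are hypothetical.

```python
from typing import Dict

DIMENSIONS = ("identity", "tool_authorisation", "egress", "secrets", "audit")


def gaps(posture: Dict[str, int], target: int = 2) -> Dict[str, int]:
    """Return the dimensions whose current level is below the target level."""
    missing = [d for d in DIMENSIONS if d not in posture]
    if missing:
        raise ValueError(f"posture is missing dimensions: {missing}")
    return {d: posture[d] for d in DIMENSIONS if posture[d] < target}


# Hypothetical assessment of one agent.
research_agent = {
    "identity": 1,
    "tool_authorisation": 2,
    "egress": 0,
    "secrets": 2,
    "audit": 3,
}
print(gaps(research_agent))            # dimensions below L2
print(gaps(research_agent, target=3))  # dimensions below L3
```

Running the same check with `target=3` only for the agents with the greatest blast radius matches the advice above: consistent L2 everywhere, selective L3 where compromise is non-recoverable.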

Patterns to avoid

A handful of recurring anti-patterns account for a disproportionate share of incidents.

Checklist

Before shipping an agent with tools

  • The agent has its own service identity, distinct from any human user.
  • Each agent holds only the tools its task requires; tool inventory has been reviewed.
  • Every tool is scoped at the boundary: allow-listed channels, paths, domains.
  • Every state-changing tool call passes a PEP that authorises per call, not per session.
  • The PDP reads signals beyond the prompt: tool, arguments, argument provenance, prior fetches.
  • High-risk and irreversible actions require human confirmation at the PEP.
  • Credentials are short-lived and narrowly scoped; no long-lived broad tokens in env or config.
  • Egress is constrained to declared destinations.
  • Tool calls, not just conversation, are logged to a store the agent cannot modify.
  • A compromise runbook exists: kill switch, token rotation, session replay.
  • You can name the level (L0 to L3) on each ladder dimension and your next move up.
