The principle

Least privilege is one of the oldest ideas in security and one of the least applied in agent design. Many agent frameworks default to the opposite: they hand the model a broad set of tools and trust the prompt to keep it in line. That puts the burden in the wrong place, on instructions an attacker can override. The right starting point is deny-by-default: every agent starts with zero tools, and you add each capability explicitly, only when a task requires it.
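A minimal sketch of deny-by-default in code. `AgentToolbox`, `Grant`, and `ToolNotGranted` are hypothetical names, not any framework's API. The registry starts empty, every grant must carry a written justification, and any call to an ungranted tool fails in code, whatever the prompt asked for.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class ToolNotGranted(Exception):
    """Raised when an agent calls a tool it was never explicitly granted."""


@dataclass
class Grant:
    tool: Callable[..., Any]
    justification: str  # why the task needs this capability; kept for review


@dataclass
class AgentToolbox:
    """Hypothetical per-agent registry that starts empty (deny-by-default)."""

    agent_id: str
    _grants: Dict[str, Grant] = field(default_factory=dict, init=False, repr=False)

    def grant(self, name: str, tool: Callable[..., Any], justification: str) -> None:
        # Capabilities are added one at a time, and each must carry a reason.
        if not justification.strip():
            raise ValueError(f"granting {name!r} to {self.agent_id!r} needs a justification")
        self._grants[name] = Grant(tool, justification)

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        # Anything not granted is refused here, whatever the prompt asked for.
        grant = self._grants.get(name)
        if grant is None:
            raise ToolNotGranted(f"{self.agent_id!r} has no grant for {name!r}")
        return grant.tool(*args, **kwargs)

    def granted(self) -> List[str]:
        return sorted(self._grants)


box = AgentToolbox("summarizer")
box.grant("word_count", lambda text: len(text.split()), "summaries report their length")
print(box.call("word_count", "least privilege by default"))  # 4
```

Calling `box.call("send_email", ...)` raises `ToolNotGranted`: the missing capability is a hard failure, not something the model is asked to avoid.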

This is the layer that makes prompt injection survivable. You cannot reliably stop an agent from being hijacked, but you can make sure a hijacked agent holds nothing worth stealing and can reach nothing worth breaking. Done well, least privilege turns what would be a critical incident into a logged non-event.

Figure: a broad capability union, one agent holding many tools (left), against scoped, single-purpose authority, one agent with a single allow-listed capability (right).

Per-tool scoping

Whether a tool is present is not the only knob. For each tool you grant, constrain its configuration and arguments as tightly as the task allows, and enforce those limits at the tool boundary rather than in the prompt.
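A minimal sketch of argument constraints enforced at the boundary, assuming two hypothetical tools: a file reader pinned to one directory tree with a size cap, and a URL guard for an HTTP tool that allows HTTPS to an exact list of hosts. The names and limits are illustrative, not taken from any framework.

```python
from pathlib import Path
from typing import Callable, FrozenSet
from urllib.parse import urlparse


class ArgumentRejected(Exception):
    """Raised when a tool argument falls outside the tool's configured scope."""


def make_scoped_reader(root: str, max_bytes: int = 64_000) -> Callable[[str], str]:
    """Hypothetical file-read tool pinned to one directory tree and a size cap."""
    base = Path(root).resolve()

    def read_file(relative_path: str) -> str:
        # Resolve '..' and symlinks first, so the check sees the real target.
        target = (base / relative_path).resolve()
        if not target.is_relative_to(base):  # Python 3.9+
            raise ArgumentRejected(f"{relative_path!r} is outside {base}")
        with target.open("rb") as fh:
            return fh.read(max_bytes).decode("utf-8", errors="replace")

    return read_file


def check_url(url: str, allowed_hosts: FrozenSet[str]) -> str:
    """Hypothetical guard for an HTTP tool: HTTPS only, exact host allow-list."""
    parts = urlparse(url)
    if parts.scheme != "https" or parts.hostname not in allowed_hosts:
        raise ArgumentRejected(f"{url!r} is not an allowed destination")
    return url
```

An injected request to read `../../.ssh/id_rsa` or to post to `https://api.example.com.evil.net` fails in code before any I/O happens. The path check is still subject to a race if a symlink is swapped between the check and the open, which is one reason this layer sits inside a sandbox rather than replacing it.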

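Then apply three preferences, in order:

  • Read-only over write.
  • Idempotent over non-idempotent.
  • Reversible over irreversible.

Gate whatever remains behind confirmation.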
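One way to encode the read-only, idempotent, and reversible preferences is to have the operator declare them per tool and put a confirmation gate in front of every call. `ToolProfile`, `needs_confirmation`, and `run_tool` are hypothetical, and the exact policy (a write may skip confirmation only if it is both idempotent and reversible) is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical properties the operator declares for each granted tool."""

    name: str
    read_only: bool
    idempotent: bool
    reversible: bool


def needs_confirmation(tool: ToolProfile) -> bool:
    # Assumed policy: read-only tools run freely; a write runs freely only if it
    # is both idempotent and reversible; everything else waits for a human.
    return not (tool.read_only or (tool.idempotent and tool.reversible))


def run_tool(tool: ToolProfile, action: Callable[[], Any], confirm: Callable[[str], bool]) -> Any:
    if needs_confirmation(tool) and not confirm(f"Allow {tool.name}?"):
        raise PermissionError(f"{tool.name} was not confirmed")
    return action()
```

The properties are declared by whoever grants the tool, not inferred by the model, so a hijacked agent cannot talk its way past the gate.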

The hardest tool

Shell access is the hardest capability to constrain, because it is a universal tool by design. If you grant it, run the agent in a sandbox and push every command through a Policy Enforcement Point. There is no safe "shell, but trust the prompt" configuration.
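A minimal Policy Enforcement Point sketch for shell commands. It decides on the command text before anything executes: it refuses shell syntax outright, tokenizes with `shlex.split`, checks the binary against an allow-list, and runs the result without a shell. The allow-list and metacharacter set are hypothetical; a real PEP would also constrain arguments (an allow-listed `cat` can still read any file the process can), which is why the sandbox stays mandatory.

```python
import shlex
import subprocess
from typing import List

# Hypothetical policy: the only binaries this agent may run.
ALLOWED_BINARIES = frozenset({"ls", "cat", "grep", "wc"})
# Characters that signal shell syntax (chaining, pipes, redirects, substitution).
SHELL_METACHARACTERS = frozenset(";|&$`<>(){}\n")


class CommandDenied(Exception):
    """Raised when the PEP refuses a command before it runs."""


def authorize(command: str) -> List[str]:
    """Decide on the command text before anything executes."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise CommandDenied("shell syntax is not allowed")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. an unbalanced quote
        raise CommandDenied(f"unparseable command: {exc}") from exc
    name = argv[0] if argv else ""
    if name not in ALLOWED_BINARIES:
        raise CommandDenied(f"{name!r} is not on the allow-list")
    return argv


def run_authorized(command: str, timeout: float = 10.0) -> str:
    argv = authorize(command)
    # No shell=True: argv goes straight to exec, with no second round of parsing.
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout, check=False)
    return result.stdout
```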

Per-role separation

The second axis is separation across agents. Run different tasks on different agents, each with its own minimal capability set.
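A minimal sketch of per-role separation as data that can be checked before deployment. `Role` and `check_separation` are hypothetical. The check flags two failures named in this article: agents sharing one identity, and a single agent that holds every tool in the system.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Role:
    """Hypothetical role definition: one identity and one minimal tool set."""

    name: str
    service_account: str
    tools: FrozenSet[str]


def check_separation(roles: List[Role]) -> List[str]:
    """Return the problems found; an empty list means the layout passes."""
    problems = []
    accounts = [r.service_account for r in roles]
    for account in sorted(set(accounts)):
        if accounts.count(account) > 1:
            problems.append(f"service account {account!r} is shared")
    every_tool = frozenset().union(*(r.tools for r in roles))
    for r in roles:
        if len(roles) > 1 and r.tools == every_tool:
            problems.append(f"{r.name!r} holds every tool (all-in-one agent)")
    return problems
```

Running a check like this in CI keeps a convenience "assistant" role from quietly accumulating the union of everyone else's capabilities.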

Sandboxing risky tasks

Code execution, package installation, untrusted-file processing, and any shell-using agent belong in a sandbox. Match the isolation level to the risk.

| Isolation | Properties | Use when |
|---|---|---|
| Container | Cheap and fast, but escape paths exist for a capable adversary | Low-risk, trusted-ish code |
| VM / microVM (Firecracker, Kata) | Each workload runs its own guest kernel behind a hypervisor, instead of sharing the host kernel | Executing genuinely untrusted code |
| Ephemeral per-task | Spun up for one task and destroyed after, so impact does not accumulate | Any repeated untrusted workload |

Keep the controls outside the box

Long-lived credentials and persistent identity stay outside the sandbox. You sandbox the work, not the controls that govern it. If the secrets live in the same environment you are trying to isolate, you have isolated nothing.
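A minimal sketch of an ephemeral, locked-down task sandbox whose environment is built from an allow-list, so host credentials never cross into it. It assumes Docker as the runtime and only builds the argument list; the flags (`--rm`, `--network none`, `--read-only`, `--cap-drop ALL`, `--security-opt no-new-privileges`, `--pids-limit`, `--memory`, `--env`) are standard `docker run` options, while `SAFE_ENV_KEYS` and the resource limits are hypothetical. Moving genuinely untrusted code onto a microVM-backed runtime is a deployment choice this sketch does not cover.

```python
from typing import Dict, List, Mapping

# Hypothetical allow-list: the only host variables the sandbox may inherit.
SAFE_ENV_KEYS = frozenset({"LANG", "TZ"})


def sandbox_env(host_env: Mapping[str, str]) -> Dict[str, str]:
    # Allow-list rather than deny-list, so a secret added to the host later
    # is excluded by default instead of leaking until someone notices.
    return {k: v for k, v in host_env.items() if k in SAFE_ENV_KEYS}


def docker_run_argv(image: str, command: List[str], host_env: Mapping[str, str]) -> List[str]:
    """Build (not execute) a `docker run` invocation for one ephemeral task."""
    argv = [
        "docker", "run",
        "--rm",                   # container is deleted when the task exits
        "--network", "none",      # no network access
        "--read-only",            # read-only root filesystem
        "--cap-drop", "ALL",      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--pids-limit", "128",
        "--memory", "512m",
    ]
    for key, value in sorted(sandbox_env(host_env).items()):
        argv += ["--env", f"{key}={value}"]
    return argv + [image, *command]
```

The host-side orchestrator, which keeps the long-lived credentials, would pass the result to `subprocess.run(..., check=True)` with `os.environ` as `host_env`. Anything the task needs from a credentialed service goes through that orchestrator rather than into the container.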

Multi-agent discipline

Once agents talk to each other, the boundaries between them are trust boundaries. Treat them that way.
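A minimal sketch of treating a hand-off as external input, assuming a hypothetical message shape (`HandOff`) and a toy keyword router. The hand-off is validated at the boundary and wrapped as `UntrustedInput`, and the routing function takes only the user's prompt, so text inside a hand-off has no path to redirect the workflow. The wrapper does not stop a model from reading injected text; it marks the content so downstream code never grants tools or changes routing on its basis.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

MAX_HANDOFF_CHARS = 4_000                                 # hypothetical size cap
ALLOWED_KINDS = frozenset({"summary", "search_results"})  # hypothetical message types


@dataclass(frozen=True)
class HandOff:
    sender: str
    kind: str
    body: str


@dataclass(frozen=True)
class UntrustedInput:
    """Marks content as data from another agent, never as instructions."""

    source: str
    text: str


def accept_handoff(msg: HandOff, expected_senders: FrozenSet[str]) -> UntrustedInput:
    # Validate at the boundary, exactly as you would for input from outside.
    if msg.sender not in expected_senders:
        raise PermissionError(f"unexpected sender {msg.sender!r}")
    if msg.kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported hand-off kind {msg.kind!r}")
    if len(msg.body) > MAX_HANDOFF_CHARS:
        raise ValueError("hand-off body exceeds the size cap")
    return UntrustedInput(source=f"agent:{msg.sender}", text=msg.body)


def route(user_prompt: str, routes: Dict[str, str], default: str) -> str:
    # Routing looks only at the user's prompt; hand-off text is not an input
    # here, so an injected "send this to the deploy agent" cannot redirect it.
    lowered = user_prompt.lower()
    for keyword, agent in routes.items():
        if keyword in lowered:
            return agent
    return default
```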

The L0→L3 ladder

Most teams start at L0 without naming it. Use this to find where you sit and what the next step up buys you.

| Level | Tool scope | Role separation | Sandbox | Audit |
|---|---|---|---|---|
| L0 | "Whatever the model needs" | One agent does everything | None | Conversation only |
| L1 | Tool allow-list per agent | Separate prompts | None | Conversation + tool calls |
| L2 | Per-tool argument constraints | Distinct service accounts | Containers for shell-using agents | Tool calls in a central store |
| L3 | PEP at every call, HITL for high-risk | Hand-off only, no shared scope | microVM per task, ephemeral | Tamper-evident, replay-capable |
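For the L3 audit column, one common way to make a tool-call log tamper-evident is a hash chain: each entry stores the SHA-256 of the previous entry, so editing any earlier record breaks verification. A minimal sketch with a hypothetical `AuditLog` class follows. The chain detects edits but cannot stop someone who can rewrite the entire log from rebuilding it, so the latest hash should also be recorded somewhere the agents cannot write.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: Dict[str, Any]) -> str:
    # Canonical JSON so an identical record always produces the identical hash.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only, hash-chained log of tool calls."""

    entries: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, agent: str, tool: str, args: Dict[str, Any], result: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        # Round-trip args through JSON: stores an independent copy and fails fast
        # on anything that is not serializable.
        record = {"agent": agent, "tool": tool, "args": json.loads(json.dumps(args)),
                  "result": result, "prev": prev}
        record["hash"] = _digest(record)
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def replay(self) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
        # Yield calls in their original order, e.g. to re-run them against a test double.
        for entry in self.entries:
            yield entry["agent"], entry["tool"], entry["args"]
```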

Anti-patterns to remove

Checklist

When adding a capability or onboarding a new agent

  • Start from zero tools; add each capability explicitly and justify it.
  • Constrain each tool's arguments at the boundary, not in the prompt.
  • Default to read-only, idempotent, reversible; gate the rest behind confirmation.
  • Split capabilities across role-specific agents; no all-in-one assistant.
  • Give each agent its own identity, tokens, and audit trail.
  • Sandbox any code execution or shell use; keep credentials outside the sandbox.
  • Base the orchestrator's routing on the user's prompt only; treat hand-offs as external input.
  • Know which ladder level you are at, and what the next one removes.
