The principle
Least privilege is one of the oldest ideas in security and one of the least applied in agent design. Most agent frameworks default to the opposite: hand the model a broad set of tools and trust the prompt to keep it in line. That puts the burden on the prompt, which enforces nothing. The right default is deny: every agent starts with zero tools, and you add each capability explicitly, only when a task requires it.
This is the layer that makes prompt injection survivable. You cannot stop an agent from being hijacked, but you can make sure a hijacked agent holds nothing worth stealing and can reach nothing worth breaking. Done well, least privilege turns what would be a critical incident into a logged non-event.
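A minimal sketch of the deny-by-default idea, assuming a hypothetical `Agent` wrapper (the class, its `grant` and `call` methods, and the `ToolNotGranted` exception are illustrative names, not part of any particular framework). The agent starts with an empty tool table, each grant carries a written justification, and any call to an ungranted tool is refused in code rather than discouraged in the prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class ToolNotGranted(PermissionError):
    """Raised when an agent calls a tool it was never granted."""


@dataclass
class Agent:
    """Hypothetical agent wrapper; not taken from any specific framework."""

    name: str
    # Deny-by-default: a new agent starts with an empty tool table.
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)
    justifications: Dict[str, str] = field(default_factory=dict)

    def grant(self, tool_name: str, fn: Callable[..., object], justification: str) -> None:
        """Add one capability explicitly, with a recorded reason."""
        if not justification.strip():
            raise ValueError("every grant needs a written justification")
        self.tools[tool_name] = fn
        self.justifications[tool_name] = justification

    def call(self, tool_name: str, *args: object, **kwargs: object) -> object:
        """Dispatch a tool call; anything not granted is refused."""
        if tool_name not in self.tools:
            raise ToolNotGranted(f"{self.name} has no tool {tool_name!r}")
        return self.tools[tool_name](*args, **kwargs)


# Illustrative usage: the lambda stands in for a real fetch tool.
researcher = Agent("researcher")
researcher.grant("web_fetch", lambda url: f"fetched {url}", "task requires reading public docs")
```

With this shape, a hijacked `researcher` that tries `researcher.call("shell", ...)` gets an exception at the dispatch boundary, whatever the injected text says.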
Per-tool scoping
Whether a tool is present is not the only knob. For each tool you grant, constrain its configuration as tightly as the task allows:
- slack_post_message → a channel allow-list, not an arbitrary channel argument.
- read_file / write_file → the working directory only; reject path traversal (..) at the tool boundary.
- web_fetch → a domain allow-list, not an arbitrary URL.
- git_commit → a single repository, with write access limited to a specific branch.
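The constraints above can be sketched as argument checks that run at the tool boundary, before the tool does any work. This is an illustration under assumptions: the function names, the `ScopeViolation` exception, and the example allow-lists are hypothetical, and a real deployment would also need to handle symlinks created after the check, redirects on fetch, and similar cases.

```python
from pathlib import Path
from typing import FrozenSet
from urllib.parse import urlsplit


class ScopeViolation(PermissionError):
    """Raised when a tool argument falls outside the granted scope."""


def resolve_in_workdir(workdir: Path, requested: str) -> Path:
    """Return the resolved path only if it stays inside workdir."""
    root = workdir.resolve()
    # Resolving collapses ".." segments; an absolute `requested` replaces root entirely.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise ScopeViolation(f"path escapes working directory: {requested!r}")
    return candidate


def check_url(url: str, allowed_hosts: FrozenSet[str]) -> str:
    """Allow only https URLs whose exact hostname is on the allow-list."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or host not in allowed_hosts:
        raise ScopeViolation(f"URL not on the domain allow-list: {url!r}")
    return url


def check_channel(channel: str, allowed_channels: FrozenSet[str]) -> str:
    """Allow posting only to channels named in the allow-list."""
    if channel not in allowed_channels:
        raise ScopeViolation(f"channel not on the allow-list: {channel!r}")
    return channel


# Example allow-lists; the values are placeholders.
ALLOWED_HOSTS = frozenset({"docs.python.org"})
ALLOWED_CHANNELS = frozenset({"#build-status"})
```

Matching on the exact parsed hostname, rather than a substring of the URL, is what rejects look-alikes such as `https://docs.python.org.evil.example/` or `https://docs.python.org@evil.example/`.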
Then apply three preferences, in order:
- Read-only over read-write wherever the task permits.
- Idempotent over non-idempotent.
- Reversible over irreversible. Irreversible actions (delete, payment, public post, transaction) get human-in-the-loop confirmation.
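One way to make these preferences mechanical is to record the three properties on each tool and enforce the confirmation gate in code. The sketch below assumes a hypothetical `ToolSpec` record and a `confirm` callback that stands in for whatever human-in-the-loop channel you use; neither name comes from an existing library.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical per-tool metadata describing how risky a call is."""

    name: str
    read_only: bool
    idempotent: bool
    reversible: bool


def prefer(candidates: Iterable[ToolSpec]) -> ToolSpec:
    """Pick the safest candidate, comparing the three preferences in order."""
    # Tuples compare left to right and True sorts above False, so read-only
    # wins first, then idempotent, then reversible.
    return max(candidates, key=lambda s: (s.read_only, s.idempotent, s.reversible))


def invoke(spec: ToolSpec, action: Callable[[], T], confirm: Callable[[ToolSpec], bool]) -> T:
    """Run action, but require human confirmation for irreversible tools."""
    if not spec.reversible and not confirm(spec):
        raise PermissionError(f"{spec.name}: human confirmation refused")
    return action()


# Illustrative specs for two ways of handling the same task.
get_invoice = ToolSpec("get_invoice", read_only=True, idempotent=True, reversible=True)
send_payment = ToolSpec("send_payment", read_only=False, idempotent=False, reversible=False)
```

Here `prefer([send_payment, get_invoice])` selects `get_invoice`, and `invoke(send_payment, ...)` does nothing unless the `confirm` callback returns true.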
Shell access is the hardest capability to constrain, because it is a universal tool by design. If you grant it, run the agent in a sandbox and push every command through a Policy Enforcement Point. There is no safe "shell, but trust the prompt" configuration.
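As a rough illustration of a Policy Enforcement Point for shell commands, the sketch below parses each command into an argument vector and checks it against an allow-list before anything runs. The policy format, the `authorize` function, and the `CommandDenied` exception are assumptions made for this example. It is deliberately conservative and incomplete: it does not inspect flags, and it complements the sandbox rather than replacing it.

```python
import shlex
from typing import Dict, FrozenSet, List


class CommandDenied(PermissionError):
    """Raised when a command is not permitted by the policy."""


# Hypothetical policy: executable -> allowed first argument (subcommand).
POLICY: Dict[str, FrozenSet[str]] = {
    "git": frozenset({"status", "diff", "log"}),
}

# Characters that only matter to a shell; refuse them outright.
SHELL_META = frozenset(";|&`$<>\n")


def authorize(command_line: str, policy: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return an argv list if the command is allowed, else raise CommandDenied."""
    if any(ch in SHELL_META for ch in command_line):
        raise CommandDenied("shell metacharacters are not allowed")
    try:
        argv = shlex.split(command_line)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise CommandDenied(f"unparseable command: {exc}") from exc
    if not argv:
        raise CommandDenied("empty command")
    executable, rest = argv[0], argv[1:]
    if executable not in policy:
        raise CommandDenied(f"executable not on the allow-list: {executable!r}")
    if not rest or rest[0] not in policy[executable]:
        raise CommandDenied(f"subcommand not allowed for {executable!r}")
    # The caller should execute argv directly (no shell) inside the sandbox.
    return argv
```

Returning an argv list instead of a string means the caller can execute it without a shell, so the policy decision and the thing that actually runs cannot drift apart through shell expansion.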
Per-role separation
The second axis is separation across agents. Run different tasks on different agents, each with its own minimal capability set.
- A coding agent does not need email. A research agent does not need a shell. A design agent does not need access to secrets.
- No "all-in-one assistant" with shell, email, Slack, database, and irreversible-action tools combined. One such agent collects the worst case of every threat class into a single blast radius.
- Delegate, but keep authority scoped. When agent A delegates to agent B, B runs with B's capabilities, not the union of both. Delegation must not widen authority.
- Per-role identity. Give each agent its own service account, tokens, and audit trail. Sharing one OAuth token across agents collapses every separation you just built.
- Per-role guard prompts that explicitly enumerate the tools the agent does not have.
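The delegation and identity rules above can be sketched with a small role model. Everything here is hypothetical (the `Role` record, the function names, the service-account strings); the point is only that the delegated task runs with the receiver's tool set, that shared identities are detected, and that the guard prompt is generated from the same data as the enforced scope.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class Role:
    """Hypothetical role definition: one identity and one tool set per agent."""

    name: str
    service_account: str
    tools: FrozenSet[str]


def delegated_scope(sender: Role, receiver: Role, required: FrozenSet[str]) -> FrozenSet[str]:
    """Scope for a delegated task: the receiver's tools, never the union."""
    missing = required - receiver.tools
    if missing:
        raise PermissionError(
            f"{sender.name} asked {receiver.name} for tools it lacks: {sorted(missing)}"
        )
    return receiver.tools


def check_distinct_identities(roles: Iterable[Role]) -> None:
    """Refuse configurations where two roles share a service account."""
    owners: Dict[str, str] = {}
    for role in roles:
        owner = owners.setdefault(role.service_account, role.name)
        if owner != role.name:
            raise ValueError(f"{owner} and {role.name} share {role.service_account}")


def guard_prompt(role: Role, all_tools: FrozenSet[str]) -> str:
    """Enumerate the tools this role does not have."""
    absent = ", ".join(sorted(all_tools - role.tools))
    return f"You are the {role.name} agent. You do NOT have these tools: {absent}."


coder = Role("coder", "svc-coder", frozenset({"read_file", "write_file", "git_commit"}))
researcher = Role("researcher", "svc-researcher", frozenset({"web_fetch"}))
```

The guard prompt remains advisory; the enforced part is `delegated_scope`, which rejects a request from `coder` for `researcher` to use `git_commit` instead of lending it.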
Sandboxing risky tasks
Code execution, package installation, untrusted-file processing, and any shell-using agent belong in a sandbox. Match the isolation level to the risk.
| Isolation | Properties | Use when |
|---|---|---|
| Container | Cheap and fast, but escape paths exist for a capable adversary | Low-risk, trusted-ish code |
| VM / microVM (Firecracker, Kata) | Kernel-level isolation | Executing genuinely untrusted code |
| Ephemeral per-task | Spun up for one task and destroyed after, so impact does not accumulate | Any repeated untrusted workload |
Long-lived credentials and persistent identity stay outside the sandbox. You sandbox the work, not the controls that govern it. If the secrets live in the same environment you are trying to isolate, you have isolated nothing.
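The sketch below illustrates only two of these properties: an ephemeral working directory that is destroyed after the task, and a child process whose environment carries none of the parent's secrets. It is not an isolation boundary by itself; a plain subprocess shares the host kernel and filesystem, so in practice this would run inside the container or microVM from the table. The function name and the `PASS_THROUGH` list are assumptions for illustration.

```python
import os
import subprocess
import sys
import tempfile

# Assumption: only these variables are passed through so the child can start.
PASS_THROUGH = ("PATH", "SYSTEMROOT")


def run_ephemeral(code: str, timeout: float = 10.0) -> str:
    """Run Python code in a throwaway directory with a scrubbed environment."""
    # Build the child's environment from an allow-list, so tokens and keys
    # present in the parent's environment are never inherited.
    clean_env = {k: os.environ[k] for k in PASS_THROUGH if k in os.environ}
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            cwd=workdir,
            env=clean_env,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    # The working directory has been deleted by the time we return.
    return result.stdout
```

Building the environment from an allow-list rather than deleting known secrets follows the same deny-by-default logic as the tool grants: a credential added to the parent later is excluded without anyone remembering to scrub it.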
Multi-agent discipline
Once agents talk to each other, the boundaries between them are trust boundaries. Treat them that way.
- Orchestrator routing derives from the user's prompt, never from text produced by a sub-agent or by fetched content. An orchestrator that routes on sub-agent output is a confused deputy waiting to happen.
- Treat each inter-agent boundary like external input: guard prompt, provenance tag, and scope check at every A → B → C hop.
- Prefer direct hand-offs over shared memory and broadcast. A hand-off keeps an error contained to one consumer; shared memory lets it spread to everyone who reads. Where shared memory is unavoidable, segregate by writer and tag by provenance.
- Sign hand-offs from A to B and verify the signature at B's enforcement point. Confused-deputy patterns then show up at the boundary instead of going unnoticed.
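A minimal sketch of signed hand-offs, assuming a shared secret per agent pair and HMAC-SHA256 from the Python standard library. The source does not prescribe a signature scheme, so treat this choice, the envelope layout, and the function names as illustrative; asymmetric signatures would serve equally well where agents should not share keys.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def sign_handoff(key: bytes, sender: str, receiver: str, payload: Dict[str, Any]) -> Dict[str, str]:
    """Wrap a payload in an envelope that binds sender, receiver, and content."""
    body = json.dumps(
        {"from": sender, "to": receiver, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    sig = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_handoff(key: bytes, envelope: Dict[str, str], expected_sender: str, me: str) -> Dict[str, Any]:
    """Verify at the receiver's enforcement point; raise on any mismatch."""
    expected = hmac.new(key, envelope["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(expected, envelope["sig"]):
        raise PermissionError("hand-off signature does not verify")
    message = json.loads(envelope["body"])
    if message["from"] != expected_sender or message["to"] != me:
        raise PermissionError("hand-off is not from the expected sender to this agent")
    # A valid signature proves origin, not safety: the payload is still
    # untrusted input and gets the same guard and scope checks as any other.
    return message["payload"]


# Placeholder key for the example; real keys would come from a secret store.
KEY_A_TO_B = b"example-key-for-a-to-b"
envelope = sign_handoff(KEY_A_TO_B, "agent_a", "agent_b", {"task": "summarize", "doc_id": 7})
```

Binding the sender and receiver names into the signed body means an envelope replayed to a different agent, or forwarded under a different sender's name, fails verification at the boundary.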
The L0→L3 ladder
Most teams start at L0 without naming it. Use this to find where you sit and what the next step up buys you.
| Level | Tool scope | Role separation | Sandbox | Audit |
|---|---|---|---|---|
| L0 | "Whatever the model needs" | One agent does everything | None | Conversation only |
| L1 | Tool allow-list per agent | Separate prompts | None | Conversation + tool calls |
| L2 | Per-tool argument constraints | Distinct service accounts | Containers for shell-using agents | Tool calls in a central store |
| L3 | PEP at every call, HITL for high-risk | Hand-off only, no shared scope | microVM per task, ephemeral | Tamper-evident, replay-capable |
Anti-patterns to remove
- Granting shell access by default.
- Sharing OAuth tokens across multiple agents.
- Running the agent under the user's own identity.
- Treating prompt content as a substitute for enforced scope.
- Treating the orchestrator as implicitly trusted.
- Deferring sandboxing on cost grounds.
Checklist
When adding a capability or onboarding a new agent:
- Start from zero tools; add each capability explicitly and justify it.
- Constrain each tool's arguments at the boundary, not in the prompt.
- Default to read-only, idempotent, reversible; gate the rest behind confirmation.
- Split capabilities across role-specific agents; no all-in-one assistant.
- Give each agent its own identity, tokens, and audit trail.
- Sandbox any code execution or shell use; keep credentials outside the sandbox.
- Route the orchestrator on the user's prompt only; treat hand-offs as external input.
- Know which ladder level you are at, and what the next one removes.