Least-privilege agent setup

The principle

Least privilege is one of the oldest ideas in security and one of the least applied in agent design. Most agent frameworks default to the opposite: hand the model a broad set of tools and trust the prompt to keep it in line. That puts the burden in the wrong place. The right posture is deny-by-default: every agent starts with zero tools, and you add each capability explicitly, only when a task requires it.
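A minimal sketch of what deny-by-default can look like in code, assuming a hypothetical in-process Agent class (not taken from any particular framework). The agent starts with an empty tool table, every grant carries a written reason, and a call to an ungranted tool fails at the boundary instead of relying on the prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class ToolNotGranted(PermissionError):
    """Raised when an agent calls a tool it was never granted."""


@dataclass
class Agent:
    """Hypothetical agent wrapper used only for illustration."""
    name: str
    # Deny-by-default: a new agent holds no tools at all.
    _tools: Dict[str, Callable[..., object]] = field(default_factory=dict)
    _reasons: Dict[str, str] = field(default_factory=dict)

    def grant(self, tool_name: str, fn: Callable[..., object], reason: str) -> None:
        # Each capability is added explicitly and must be justified.
        if not reason.strip():
            raise ValueError("every grant needs a written justification")
        self._tools[tool_name] = fn
        self._reasons[tool_name] = reason

    def call(self, tool_name: str, *args, **kwargs):
        # Anything not granted is refused, whatever the prompt says.
        if tool_name not in self._tools:
            raise ToolNotGranted(f"{self.name} has no tool {tool_name!r}")
        return self._tools[tool_name](*args, **kwargs)
```

The point of the design is that the refusal lives in code the model cannot talk its way around; the recorded reasons also give you something to review when you audit grants later.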

This is the layer that makes prompt injection survivable. You cannot stop an agent from being hijacked, but you can make sure a hijacked agent holds nothing worth stealing and can reach nothing worth breaking. Done well, least privilege turns what would be a critical incident into a logged non-event.

Figure: a broadly scoped agent holding many tools (left) contrasted with a narrowly scoped agent holding a single allow-listed capability (right).

Per-tool scoping

Whether a tool is present is not the only knob. For each tool you grant, constrain its configuration as tightly as the task allows:

slack_post_message → a channel allow-list, not an arbitrary channel argument.
read_file / write_file → the working directory only; resolve the path and reject anything that escapes it (.. segments, absolute paths, symlinks) at the tool boundary.
web_fetch → a domain allow-list, not an arbitrary URL.
git_commit → a single repository, with write access limited to a specific branch.
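The constraints above can be sketched as small validators that run at the tool boundary, before the tool does any work. This is an illustration under stated assumptions: WORKDIR, ALLOWED_DOMAINS, and ALLOWED_CHANNELS are hypothetical values, and requiring https is an extra assumption that the text does not state.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical configuration, for illustration only.
WORKDIR = Path("/srv/agent/work")
ALLOWED_DOMAINS = frozenset({"docs.python.org"})
ALLOWED_CHANNELS = frozenset({"#build-alerts"})


def check_path(user_path: str, workdir: Path = WORKDIR) -> Path:
    """Resolve user_path under workdir and refuse anything that escapes it."""
    base = workdir.resolve()
    # Joining an absolute path replaces the base, so that case is caught too.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise PermissionError(f"path escapes working directory: {user_path!r}")
    return candidate


def check_url(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> str:
    """Allow only https URLs whose host is on the allow-list."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname not in allowed:
        raise PermissionError(f"URL not on the allow-list: {url!r}")
    return url


def check_channel(channel: str, allowed: frozenset = ALLOWED_CHANNELS) -> str:
    """Allow posting only to pre-approved channels."""
    if channel not in allowed:
        raise PermissionError(f"channel not on the allow-list: {channel!r}")
    return channel
```

Comparing the parsed hostname, not a substring of the URL, is what stops tricks such as https://docs.python.org@evil.example/ from passing the check.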

Then apply three preferences, in order:

Read-only over read-write wherever the task permits.
Idempotent over non-idempotent.
Reversible over irreversible. Irreversible actions (delete, payment, public post, transaction) get human-in-the-loop confirmation.
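One way to encode these preferences, as a sketch only: tag each tool with the three properties, pick the safest candidate that can do the job, and require a human confirmation callback before anything irreversible runs. ToolSpec, pick_safest, and invoke are hypothetical names, and reading "in order" as a strict priority ordering is an interpretation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical descriptor for a tool's safety properties."""
    name: str
    read_only: bool
    idempotent: bool
    reversible: bool


def pick_safest(candidates: Iterable[ToolSpec]) -> ToolSpec:
    # Priority order: read-only first, then idempotent, then reversible.
    # False sorts before True, so "not property" ranks safer tools first.
    return min(
        candidates,
        key=lambda s: (not s.read_only, not s.idempotent, not s.reversible),
    )


def invoke(spec: ToolSpec, action: Callable[[], object],
           confirm: Callable[[str], bool]) -> object:
    # Irreversible actions run only after a human says yes.
    if not spec.reversible and not confirm(spec.name):
        raise PermissionError(f"{spec.name}: confirmation denied")
    return action()
```

In practice confirm would route to whatever approval channel you already use; here it is just a callable that returns True or False.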

The hardest tool

Shell access is the hardest capability to constrain, because it is a universal tool by design. If you grant it, run the agent in a sandbox and push every command through a Policy Enforcement Point (PEP), a check outside the model that allows or denies each call before it runs. There is no safe "shell, but trust the prompt" configuration.
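A deliberately small sketch of an enforcement check for shell commands, assuming a hypothetical binary allow-list. It parses the command into an argument vector, refuses shell metacharacters, and allows only listed binaries. A real enforcement point would also constrain arguments (cat on an arbitrary path is still dangerous), and this check complements the sandbox; it does not replace it.

```python
import shlex
from typing import List

# Hypothetical policy, for illustration only.
ALLOWED_BINARIES = frozenset({"ls", "cat", "git"})
FORBIDDEN_CHARS = frozenset(";|&`$<>\n")


def authorize(command: str) -> List[str]:
    """Return an argv list if the command passes policy, else raise."""
    # Refuse chaining, substitution, and redirection outright.
    if any(ch in FORBIDDEN_CHARS for ch in command):
        raise PermissionError("shell metacharacters are not allowed")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise PermissionError(f"unparseable command: {exc}") from exc
    if not argv or argv[0] not in ALLOWED_BINARIES:
        raise PermissionError(f"binary not on the allow-list: {command!r}")
    return argv
```

The returned list is meant to be executed inside the sandbox without a shell, for example with subprocess.run(argv, shell=False), so the string is never reinterpreted after it has been checked.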

Per-role separation

The second axis is separation across agents. Run different tasks on different agents, each with its own minimal capability set.

A coding agent does not need email. A research agent does not need a shell. A design agent does not need access to secrets.
Do not build an "all-in-one assistant" that combines shell, email, Slack, database, and irreversible-action tools. Such an agent collects the worst case of every threat class into a single blast radius.
Delegate, but keep authority scoped. When agent A delegates to agent B, B runs with B's capabilities, not the union of both. Delegation must not widen authority.
Per-role identity. Give each agent its own service account, tokens, and audit trail. Sharing one OAuth token across agents collapses every separation you just built.
Write per-role guard prompts that explicitly enumerate the tools the agent does not have. They supplement enforced scope; they do not replace it.
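The separation rules above can be sketched with a hypothetical AgentRole record. The three helpers illustrate, in turn, that a delegate runs with its own capabilities, that no two roles may share a service account, and that a guard prompt can be generated from the tools a role lacks. All names and values here are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class AgentRole:
    """Hypothetical role definition."""
    name: str
    service_account: str
    capabilities: FrozenSet[str]


def effective_capabilities(delegator: AgentRole, delegate: AgentRole) -> FrozenSet[str]:
    # When A delegates to B, B runs with B's capabilities only.
    # The delegator's set is deliberately ignored, never unioned in.
    return delegate.capabilities


def assert_distinct_identities(roles: Iterable[AgentRole]) -> None:
    # Refuse configurations where two roles share one service account.
    seen: Dict[str, str] = {}
    for role in roles:
        owner = seen.setdefault(role.service_account, role.name)
        if owner != role.name:
            raise ValueError(
                f"{role.name} and {owner} share {role.service_account!r}"
            )


def guard_prompt(role: AgentRole, all_tools: FrozenSet[str]) -> str:
    # Enumerate the tools this role does NOT have.
    missing = ", ".join(sorted(all_tools - role.capabilities))
    return f"You are the {role.name} agent. You do not have these tools: {missing}."
```

Running assert_distinct_identities at start-up turns a shared-token mistake into a configuration error instead of a silent collapse of the separation.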

Sandboxing risky tasks

Code execution, package installation, untrusted-file processing, and any shell-using agent belong in a sandbox. Match the isolation level to the risk.

Isolation	Properties	Use when
Container	Cheap and fast, but shares the host kernel, so escape paths exist for a capable adversary	Low-risk, mostly trusted code
VM / microVM (Firecracker, Kata)	Hardware-virtualized isolation with a separate guest kernel	Executing genuinely untrusted code
Ephemeral per-task	Spun up for one task and destroyed after, so impact does not accumulate	Any repeated untrusted workload

Keep the controls outside the box

Long-lived credentials and persistent identity stay outside the sandbox. You sandbox the work, not the controls that govern it. If the secrets live in the same environment you are trying to isolate, you have isolated nothing.
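A small sketch of one piece of this: building the environment for sandboxed work from an allow-list, so that tokens and keys held by the controlling process are never inherited. SAFE_ENV_KEYS is a hypothetical allow-list, and subprocess.run stands in for whatever launcher actually starts your container or microVM; a plain subprocess is not itself a sandbox.

```python
import subprocess
from typing import Dict, List, Mapping

# Hypothetical allow-list: only variables the task needs to run at all.
SAFE_ENV_KEYS = ("PATH", "LANG", "TZ")


def sandbox_env(parent_env: Mapping[str, str]) -> Dict[str, str]:
    """Copy only allow-listed variables; secrets in the parent are dropped."""
    return {k: parent_env[k] for k in SAFE_ENV_KEYS if k in parent_env}


def run_task(argv: List[str], parent_env: Mapping[str, str]) -> subprocess.CompletedProcess:
    # Stand-in launcher: the child sees the scrubbed environment only.
    return subprocess.run(
        argv,
        env=sandbox_env(parent_env),
        capture_output=True,
        text=True,
        timeout=30,
        check=False,
    )
```

An allow-list is used instead of a block-list because you cannot enumerate every secret-bearing variable in advance, while the set the task genuinely needs is short.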

Multi-agent discipline

Once agents talk to each other, the boundaries between them are trust boundaries. Treat them that way.

Orchestrator routing must derive from the user's prompt, never from text produced by a sub-agent or by fetched content. An orchestrator that routes on sub-agent output is a confused deputy waiting to happen.
Treat each inter-agent boundary like external input: guard prompt, provenance tag, and scope check at every A → B → C hop.
Prefer direct hand-offs over shared memory and broadcast. A hand-off keeps an error contained to one consumer; shared memory lets it spread to everyone who reads. Where shared memory is unavoidable, segregate by writer and tag by provenance.
Sign hand-offs from A to B and verify the signature at B's enforcement point. Confused-deputy patterns then show up at the boundary instead of going unnoticed.
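Signed hand-offs can be sketched with an HMAC over a canonical JSON envelope, using only the Python standard library. The envelope layout, the sender field used as a provenance tag, and the per-sender key table are assumptions for illustration. HMAC uses a shared key, so the receiver could forge the sender's messages; asymmetric signatures avoid that if it matters in your design.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, Mapping


def _canonical(sender: str, payload: Any) -> bytes:
    # Stable serialisation so signer and verifier hash identical bytes.
    envelope = {"sender": sender, "payload": payload}
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode()


def sign_handoff(payload: Any, sender: str, key: bytes) -> Dict[str, Any]:
    """Wrap payload in an envelope tagged with its sender and a signature."""
    sig = hmac.new(key, _canonical(sender, payload), hashlib.sha256).hexdigest()
    return {"sender": sender, "payload": payload, "sig": sig}


def verify_handoff(envelope: Mapping[str, Any], keys: Mapping[str, bytes]) -> Any:
    """Run at the receiver's enforcement point; returns payload or raises."""
    sender = envelope.get("sender")
    key = keys.get(sender) if isinstance(sender, str) else None
    if key is None:
        raise PermissionError(f"unknown sender: {sender!r}")
    expected = hmac.new(
        key, _canonical(sender, envelope.get("payload")), hashlib.sha256
    ).hexdigest()
    sig = envelope.get("sig")
    # Constant-time comparison avoids leaking how much of the tag matched.
    if not isinstance(sig, str) or not hmac.compare_digest(expected, sig):
        raise PermissionError("hand-off signature does not verify")
    return envelope["payload"]
```

Because the sender name is inside the signed bytes, an agent cannot relabel another agent's message as its own without the verification failing at the boundary.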

The L0→L3 ladder

Most teams start at L0 without naming it. Use the ladder below to find where you sit and what the next step up buys you.

Level	Tool scope	Role separation	Sandbox	Audit
L0	"Whatever the model needs"	One agent does everything	None	Conversation only
L1	Tool allow-list per agent	Separate prompts	None	Conversation + tool calls
L2	Per-tool argument constraints	Distinct service accounts	Containers for shell-using agents	Tool calls in a central store
L3	Policy Enforcement Point (PEP) at every call, human-in-the-loop (HITL) confirmation for high-risk actions	Hand-off only, no shared scope	microVM per task, ephemeral	Tamper-evident, replay-capable

Anti-patterns to remove

Granting shell access by default.
Sharing OAuth tokens across multiple agents.
Running the agent under the user's own identity.
Treating prompt content as a substitute for enforced scope.
Treating the orchestrator as implicitly trusted.
Deferring sandboxing on cost grounds.

Checklist

When adding a capability or onboarding a new agent

Start from zero tools; add each capability explicitly and justify it.
Constrain each tool's arguments at the boundary, not in the prompt.
Default to read-only, idempotent, reversible; gate the rest behind confirmation.
Split capabilities across role-specific agents; no all-in-one assistant.
Give each agent its own identity, tokens, and audit trail.
Sandbox any code execution or shell use; keep credentials outside the sandbox.
Route the orchestrator on the user's prompt only; treat hand-offs as external input.
Know which ladder level you are at, and what the next one removes.

