Why the harness is the control plane
The model decides what it wants to do. The harness decides what actually happens. It dispatches every tool call, assembles the context window, runs hooks before and after each action, and writes the audit trail. Treat it as application code with a hostile caller, because that is what an injected agent is.
Two layers carry the security weight, and they fail differently. The harness governs what the agent is permitted to attempt: tool scoping, hook enforcement, the Policy Enforcement Point. The host governs what a successful attempt can reach: network destinations, filesystem mounts, process boundaries, credentials. A flawless harness on a host that shares the developer's user account and SSH keys still hands those keys to an attacker. Treat each layer as its own security problem, not as one undifferentiated pile of infrastructure.
Prompt-layer defenses lower the probability of a hijack. The harness and host limit its consequences. The first is best-effort against an adversary who gets unlimited attempts. The second still holds after the model has been turned. Spend accordingly.
Network egress: the single most effective control
An agent that cannot reach a destination cannot exfiltrate to it, no matter what it was tricked into deciding. Egress control bounds the worst case directly, which is why it earns the most attention.
- Default-deny outbound. The harness has no internet access until a destination is explicitly granted. Start from zero, not from the open internet minus a blocklist.
- Per-agent, per-tool allow-list. Enumerate the destinations each tool actually needs (api.openai.com, slack.com, a specific news provider) and nothing else. A scope wide enough to be convenient is wide enough to exfiltrate through.
- Domain filtering at an application-layer proxy. Squid, mitmproxy in transparent mode, or a cloud-provider equivalent enforces the allow-list above the packet level.
- DNS filtered at the resolver. An agent blocked at the proxy can still encode data into DNS lookups. Close that channel at the resolver, not only at the HTTP layer.
- Internal services over VPN only. Tailscale, WireGuard, or mTLS. Nothing the agent talks to should be reachable from the public internet.
- No SSH credentials between hosts. If the agent holds no keys, lateral movement is blocked by construction, not by a policy you have to trust.
If you harden one thing, make it egress. Default-deny outbound plus a tight allow-list closes most exfiltration paths at once, including the ones you never anticipated. Most other hardening refines this control.
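The allow-list logic is small enough to sketch. The following is a minimal sketch of the default-deny check that a proxy or a WebFetch tool might run. It assumes an in-memory allow-list keyed by (agent, tool), and it also assumes an HTTPS-only policy, which the list above does not state. The agent names, tool names, and news.example.com are hypothetical placeholders, not a real configuration. Matching is exact on the parsed hostname, so a lookalike such as api.openai.com.evil.example does not pass.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: (agent, tool) -> exact hostnames that tool may reach.
# Any (agent, tool) pair or host not listed here is denied (default-deny).
ALLOW_LIST: dict[tuple[str, str], frozenset[str]] = {
    ("research-agent", "web_fetch"): frozenset({"news.example.com"}),
    ("research-agent", "llm_call"): frozenset({"api.openai.com"}),
    ("ops-agent", "notify"): frozenset({"slack.com"}),
}


def is_egress_allowed(agent: str, tool: str, url: str) -> bool:
    """Return True only if this agent's tool may reach the URL's host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False  # assumption: plaintext and other schemes are never granted
    # urlsplit lowercases the hostname; also drop a trailing root dot.
    host = (parts.hostname or "").rstrip(".")
    allowed = ALLOW_LIST.get((agent, tool), frozenset())
    # Exact match only: no suffix or substring matching.
    return host in allowed
```

Where the check runs determines how much it protects. In the proxy it binds every process on the host. A copy inside the agent's own WebFetch tool is only the second, hook-level layer, because a turned agent may be able to route around code that runs in its own process.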
Filesystem isolation
The filesystem is the second exfiltration and tampering surface. An agent reads what it should not, writes where it should not, or climbs out of its working directory with a relative path. Bound all three at the tool boundary, where you control the code, not in the prompt, where you control nothing.
- read_file and write_file accept paths only inside an explicit working directory. Resolve and validate the canonical path; do not pattern-match the raw string.
- Reject path traversal (..) at the tool, not by asking the model to behave.
- Mount system directories read-only. Mount source code read-only wherever the task allows.
- Exclude credential and config files from the agent's view entirely: ~/.ssh, ~/.aws, ~/.config/claude, ~/.npmrc, browser profile directories. The agent should not be able to read what it cannot name.
- Give each task an ephemeral filesystem (microVM or container) so an injection in one session leaves nothing behind for the next.
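The canonical-path check for file tools can be sketched with Python's pathlib. In this sketch, confine, PathRejected, and the DENIED_NAMES set are hypothetical names chosen for illustration. The function resolves symlinks and .. segments before comparing, instead of pattern-matching the raw string. It also refuses credential locations even when they sit inside the working directory.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical path components the agent must never touch,
# even inside its working directory (e.g. a checked-in .ssh or .npmrc).
DENIED_NAMES = frozenset({".ssh", ".aws", ".npmrc", ".config"})


class PathRejected(ValueError):
    """Raised when a requested path falls outside the agent's allowed area."""


def confine(workdir: str | Path, requested: str) -> Path:
    """Resolve `requested` against `workdir` and return it only if it stays inside."""
    root = Path(workdir).resolve(strict=True)
    # Canonicalize: collapses '..' and follows symlinks, so 'src/../../x',
    # an absolute path, or a symlink pointing elsewhere cannot escape.
    target = (root / requested).resolve()
    if not target.is_relative_to(root):  # Python 3.9+
        raise PathRejected(f"outside working directory: {requested!r}")
    if any(part in DENIED_NAMES for part in target.relative_to(root).parts):
        raise PathRejected(f"credential or config path: {requested!r}")
    return target
```

This sketch does not close one gap. A check followed by a separate open can race with a symlink swapped in between the two steps. That race is one more reason to back the check with read-only mounts and an ephemeral filesystem rather than rely on it alone.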
Add capabilities deliberately, one at a time, with a recorded reason. The common failure runs the other way: an agent ships with wide filesystem access, then gets narrowed after an incident shows what it could already reach.
Hooks as inline guardrails
Hooks run the harness's own code in the path of every tool call, before the action executes and after it returns. They are defense-in-depth: a second enforcement point that does not depend on the model. They sit in front of the Policy Enforcement Point, which still makes the hard authorization decision.
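The order of operations is easier to see in code. This is a minimal sketch of a dispatch path in four steps. Pre-hooks run first, and the Policy Enforcement Point still makes the authorization decision. Then the tool runs, and post-hooks filter its output before the model sees it. Every name here (ToolCall, Denied, dispatch, and the hook and policy signatures) is hypothetical, not the API of any particular harness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    agent: str
    tool: str
    args: dict


class Denied(Exception):
    """Raised when a hook or the policy point refuses a call."""


# A pre-hook inspects the call and raises Denied to block it.
PreHook = Callable[[ToolCall], None]
# A post-hook receives the raw output and returns a (possibly rewritten) output.
PostHook = Callable[[ToolCall, str], str]


def dispatch(
    call: ToolCall,
    tools: dict[str, Callable[..., str]],
    pre_hooks: list[PreHook],
    policy: Callable[[ToolCall], bool],
    post_hooks: list[PostHook],
) -> str:
    for hook in pre_hooks:
        hook(call)  # harness code in the path; does not depend on the model
    if not policy(call):  # the PEP still makes the hard authorization decision
        raise Denied(f"policy denied {call.tool} for {call.agent}")
    output = tools[call.tool](**call.args)
    for hook in post_hooks:
        output = hook(call, output)  # e.g. redact secrets before the model sees it
    return output
```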
| Hook | Job |
|---|---|
| PreToolUse · Bash | Match dangerous commands (rm -rf, git push --force, curl ... \| sh); deny or escalate to a human. |
| PreToolUse · Write | Block writes outside the working directory and writes to dotfiles. |
| PreToolUse · WebFetch | Enforce the domain allow-list a second time, at hook level. |
| PostToolUse · * | Redact known secret patterns out of tool outputs before they reach the model's context. |
| UserPromptSubmit | Pattern-strip known injection markers on the way in. |
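Two of the hooks in the table can be sketched as plain functions: a PreToolUse check for Bash and a PostToolUse redactor. The patterns are illustrative assumptions, not a complete list. Matching shell text is easy to evade through quoting, variables, or encoding, which is why these hooks are a second layer rather than the control itself. The secret shapes cover AWS access key IDs, GitHub tokens, and PEM private-key blocks.

```python
import re

# Illustrative deny patterns for the Bash tool (hypothetical and incomplete).
DANGEROUS_BASH = [
    # rm with both a recursive flag (r/R) and f in one option cluster: -rf, -fr, -Rf
    re.compile(r"\brm\s+-(?=[a-zA-Z]*[rR])(?=[a-zA-Z]*f)[a-zA-Z]+"),
    # git push with --force (also catches --force-with-lease) or -f
    re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"),
    # a download piped straight into sh, bash, or zsh
    re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba|z)?sh\b"),
]

# Illustrative secret shapes: AWS access key IDs, GitHub tokens, PEM private keys.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.S),
]


def pre_bash(command: str) -> str:
    """PreToolUse for Bash: return 'deny' on a dangerous match, else 'allow'."""
    for pattern in DANGEROUS_BASH:
        if pattern.search(command):
            return "deny"  # a harness could escalate to a human here instead
    return "allow"


def post_redact(output: str) -> str:
    """PostToolUse: replace known secret patterns before output enters the context."""
    for pattern in SECRET_PATTERNS:
        output = pattern.sub("[REDACTED]", output)
    return output
```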
Anti-pattern: disabling hooks for performance. A hook that runs on every tool call is exactly the code you want in the path once the agent has been turned. If hook latency is the problem, profile the hook; do not remove the guardrail.
Process model
Shared state between agents turns one compromise into many. Keep the boundaries hard:
- One harness process per agent. No process shared between agents on a host.
- An ephemeral subprocess per task for code execution and shell, torn down when the task ends.
- No filesystem shared between agents.
- No shared credentials. Give each agent its own service account, so an attacker who turns one agent inherits that agent's authority and nothing past it.
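A per-task subprocess can be sketched with the Python standard library, assuming a Linux host. run_task, the limit values, and the minimal PATH are hypothetical. Each call gets a fresh temporary directory as its working directory and HOME. It also gets an environment that inherits none of the parent's variables (so no cloud or Git tokens leak through), a timeout, and OS-level memory and file-size caps applied in the child before exec. This bounds one task. The per-agent service account and the container or microVM around it still have to come from the host.

```python
import resource
import subprocess
import tempfile

# Hypothetical per-task limits.
MEMORY_BYTES = 512 * 1024 * 1024  # address-space cap for the child
FILE_BYTES = 64 * 1024 * 1024  # largest single file the task may write
TIMEOUT_S = 30


def _apply_limits() -> None:
    # Runs in the child between fork and exec (POSIX only).
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_BYTES, MEMORY_BYTES))
    resource.setrlimit(resource.RLIMIT_FSIZE, (FILE_BYTES, FILE_BYTES))


def run_task(argv: list[str]) -> subprocess.CompletedProcess:
    """Run one task in a throwaway directory with a scrubbed environment."""
    with tempfile.TemporaryDirectory(prefix="agent-task-") as workdir:
        # Leaving this block deletes workdir and everything the task wrote.
        return subprocess.run(
            argv,
            cwd=workdir,
            # A fresh env: nothing such as AWS_*, GITHUB_TOKEN, or SSH_AUTH_SOCK
            # is inherited from the harness process.
            env={"PATH": "/usr/bin:/bin", "HOME": workdir},
            stdin=subprocess.DEVNULL,
            capture_output=True,
            text=True,
            timeout=TIMEOUT_S,  # on expiry the child is killed and TimeoutExpired raised
            preexec_fn=_apply_limits,
        )
```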
Resource limits
Limits turn runaway behavior into a bounded, observable event. They cap the damage of a loop the model cannot escape, and the throughput of an exfiltration the egress controls failed to fully close.
- Loop-iteration cap and wall-clock limit per session.
- Per-tool rate limits per session.
- Token and monetary budgets per session, enforced by the harness.
- OS-level memory and disk quotas.
- An outbound byte ceiling per session. A sudden approach to the ceiling is a strong exfiltration signal.
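Several of these limits can live in one harness-side object that is checked around every tool call. In this sketch, SessionBudget, BudgetExceeded, and every default threshold are hypothetical. The point is that each counter is kept by the harness, outside the model, and that crossing a limit raises instead of warning. Memory and disk quotas stay with the OS, as the list says.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a session crosses any of its limits."""


class SessionBudget:
    """Per-session limits counted by the harness (hypothetical defaults)."""

    def __init__(self, max_steps: int = 50, max_seconds: float = 600.0,
                 max_calls_per_minute: int = 20, max_tokens: int = 200_000,
                 max_cost_usd: float = 5.0, max_outbound_bytes: int = 1_000_000,
                 clock: Callable[[], float] = time.monotonic):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self.max_calls_per_minute = max_calls_per_minute
        self.max_tokens = max_tokens
        self.max_cost_usd = max_cost_usd
        self.max_outbound_bytes = max_outbound_bytes
        self.clock = clock
        self.started = clock()
        self.steps = 0
        self.tokens = 0
        self.cost_usd = 0.0
        self.outbound_bytes = 0
        self.recent = defaultdict(deque)  # tool -> call times within the last 60 s

    def before_call(self, tool: str) -> None:
        """Check loop, wall-clock, and per-tool rate limits before dispatch."""
        now = self.clock()
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded("loop-iteration cap reached")
        if now - self.started > self.max_seconds:
            raise BudgetExceeded("wall-clock limit reached")
        window = self.recent[tool]
        while window and now - window[0] >= 60:
            window.popleft()  # drop calls older than the one-minute window
        if len(window) >= self.max_calls_per_minute:
            raise BudgetExceeded(f"rate limit reached for {tool}")
        window.append(now)

    def after_call(self, tokens: int, cost_usd: float, outbound_bytes: int) -> None:
        """Charge the call's usage to the session and stop at any ceiling."""
        self.tokens += tokens
        self.cost_usd += cost_usd
        self.outbound_bytes += outbound_bytes
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token budget exhausted")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded("monetary budget exhausted")
        if self.outbound_bytes > self.max_outbound_bytes:
            raise BudgetExceeded("outbound byte ceiling reached")
```

The outbound counter is cumulative, so the same number can drive the exfiltration signal from the list. A session whose outbound_bytes climbs toward the ceiling early is worth an alert before it trips.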
Audit and observability
You cannot reconstruct an incident you never recorded, and you cannot trust a record the attacker could edit. Both halves matter.
- Log every tool call: identity, tool, redacted arguments, outcome, latency, cost.
- Write the audit log to a store outside the agent's reach. An agent with write access to its own log can rewrite history after the fact.
- Log conversation transcripts with provenance tags, so you can tell later which content was untrusted.
- Stream logs in real time. Batch upload leaves a window where a compromised agent can suppress its own evidence.
- Alert on outliers: a tool used for the first time, call volume past the 99th percentile, off-hours invocation, a new egress destination.
- Retain audit logs for at least 90 days for general agents, and longer for high-risk ones.
- Replay an old session into a sandbox now and then, and confirm the audit trail is enough to reconstruct it. An audit you have never tested is a guess.
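The sketch below writes one audit record per tool call, with the fields from the first bullet, and implements the simplest outlier alert: a tool this identity has not used before. It writes JSON lines to any file-like sink and flushes after each record. In a real deployment, that sink would be a live stream to a store the agent cannot write to; the sketch assumes that store rather than implementing it. AuditLogger, SENSITIVE_KEYS, the field names, and the baseline mechanism are hypothetical. Redaction here only checks top-level argument names.

```python
import json
import time
from typing import Callable, Optional, TextIO

# Hypothetical: argument names whose values never reach the log.
SENSITIVE_KEYS = {"password", "token", "api_key", "authorization"}


def redact_args(args: dict) -> dict:
    """Replace sensitive values but keep the keys, so the call's shape stays visible."""
    return {k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v for k, v in args.items()}


class AuditLogger:
    """Writes one JSON line per tool call and alerts on first-time tool use."""

    def __init__(self, sink: TextIO, alert: Callable[[str], None],
                 baseline: Optional[dict[str, set[str]]] = None):
        self.sink = sink  # assumed: a live stream to a store outside the agent's reach
        self.alert = alert
        # identity -> tools already seen for that identity (e.g. from past logs)
        self.seen = {who: set(tools) for who, tools in (baseline or {}).items()}

    def record(self, identity: str, tool: str, args: dict, outcome: str,
               latency_ms: float, cost_usd: float) -> None:
        entry = {
            "ts": time.time(),
            "identity": identity,
            "tool": tool,
            "args": redact_args(args),
            "outcome": outcome,
            "latency_ms": latency_ms,
            "cost_usd": cost_usd,
        }
        self.sink.write(json.dumps(entry) + "\n")
        self.sink.flush()  # send each record now instead of batching
        known = self.seen.setdefault(identity, set())
        if tool not in known:
            self.alert(f"{identity} used {tool} for the first time")
            known.add(tool)
```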
Anti-pattern: running the agent on a laptop under the developer's user account, writing audit logs inside the working directory, and binding the harness web interface to a public port. Each one collapses a boundary the rest of this page is built to hold.
Checklist
Before running an agent on a shared or production host
- Default-deny outbound, with a per-agent, per-tool allow-list at an application-layer proxy.
- DNS filtered at the resolver; internal services reachable over VPN only; no inter-host SSH credentials.
- File tools confined to a working directory, path traversal rejected, credential directories excluded from view.
- Per-task ephemeral filesystem so injection does not persist across sessions.
- PreToolUse hooks enforced on Bash, Write, and WebFetch, and a PostToolUse hook redacting secrets from every tool output.
- One process and one service account per agent; no shared filesystem, process, or credentials.
- Loop, wall-clock, rate, cost, memory, disk, and outbound-byte limits set per session.
- Every tool call logged to a store outside the agent's reach, streamed live, alerting on outliers.
- Audit reconstructability verified by replaying a real session into a sandbox.