Why the harness is the control plane

The model decides what it wants to do. The harness decides what actually happens. It dispatches every tool call, assembles the context window, runs hooks before and after each action, and writes the audit trail. Treat it as application code with a hostile caller, because that is what an injected agent is.

Two layers carry the security weight, and they fail differently. The harness governs what the agent is permitted to attempt: tool scoping, hook enforcement, the Policy Enforcement Point. The host governs what a successful attempt can reach: network destinations, filesystem mounts, process boundaries, credentials. A flawless harness on a host that shares the developer's user account and SSH keys still hands an attacker the keys. Treat each as its own security problem, not as one undifferentiated pile of infrastructure.

The framing that matters

Prompt-layer defenses lower the probability of a hijack. The harness and host limit its consequences. The first is best-effort against an adversary who gets unlimited attempts. The second still holds after the model has been turned. Spend accordingly.

Diagram: an agent's tool call passes through hooks and a Policy Enforcement Point in the harness, then out through an egress proxy and DNS filter on the host, with the audit log written to a store outside the agent's reach.
The harness enforces policy on the call. The host bounds where the call can go and what it can touch.

Network egress: the single most effective control

An agent that cannot reach a destination cannot exfiltrate to it, no matter what it was tricked into deciding. Egress control bounds the worst case directly, which is why it earns the most attention.

The single most important decision

Get default-deny outbound plus a tight allow-list right, and most exfiltration paths close at once, including the ones you never anticipated. Most other hardening refines this one.
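
A minimal sketch of the default-deny decision, assuming a per-agent, per-tool allow-list keyed on exact hostnames. The agent name, the hosts, and the `egress_allowed` function are hypothetical placeholders, not part of any particular proxy. In practice this decision belongs in the application-layer proxy on the host, where the agent cannot skip it.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: (agent, tool) -> exact hostnames it may reach.
ALLOW_LIST = {
    ("research-agent", "WebFetch"): {"docs.example.com", "api.example.com"},
}


def egress_allowed(agent: str, tool: str, url: str) -> bool:
    """Default-deny: allow only HTTPS requests to an exactly matching host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Unparseable URL: deny rather than guess.
        return False
    if parts.scheme != "https":
        return False
    # hostname drops userinfo and port, so "allowed.host@evil.net" resolves
    # to "evil.net" and is rejected.
    host = (parts.hostname or "").rstrip(".")
    return host in ALLOW_LIST.get((agent, tool), set())
```

Exact matching is deliberate here. Suffix or substring matching is what lets a host such as docs.example.com.evil.net slip through. Anything not listed, including an unknown agent or tool, falls through to deny.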

Filesystem isolation

The filesystem is the second exfiltration and tampering surface. An agent reads what it should not, writes where it should not, or climbs out of its working directory with a relative path. Bound all three at the tool boundary, where you control the code, not in the prompt, where you control nothing.
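
A minimal sketch of confinement at the tool boundary, assuming every file tool routes its path argument through one function before touching disk. `confine` and `PathEscapeError` are hypothetical names. The sketch requires Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Raised when a requested path resolves outside the working directory."""


def confine(workdir: Path, requested: str) -> Path:
    """Resolve `requested` against `workdir` and reject anything outside it."""
    root = workdir.resolve()
    # Joining with an absolute path discards `root`, so "/etc/passwd" is
    # caught by the same check as "../secrets". resolve() also collapses
    # ".." segments and follows symlinks that already exist.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PathEscapeError(f"{requested!r} resolves outside {root}")
    return candidate
```

This check alone does not close the gap between checking a path and opening it. A symlink created after the check can still redirect the open, which is one reason the host-level mount boundary matters as well.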

Note

Add capabilities deliberately, one at a time, with a recorded reason. The common failure runs the other way: an agent ships with wide filesystem access, then gets narrowed after an incident shows what it could already reach.

Hooks as inline guardrails

Hooks run the harness's own code in the path of every tool call, before the action executes and after it returns. They are defense-in-depth: a second enforcement point that does not depend on the model. They sit in front of the Policy Enforcement Point, which still makes the hard authorization decision.

Hook                     Job
PreToolUse · Bash        Match dangerous commands (rm -rf, git push --force, curl ... | sh); deny or escalate to a human.
PreToolUse · Write       Block writes outside the working directory and writes to dotfiles.
PreToolUse · WebFetch    Enforce the domain allow-list a second time, at hook level.
PostToolUse · *          Redact known secret patterns out of tool outputs before they reach the model's context.
UserPromptSubmit         Pattern-strip known injection markers on the way in.

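
Two of these hooks can be sketched as plain functions. This is an illustration under stated assumptions, not any harness's real hook API: `pre_bash_hook`, `redact`, and both pattern lists are hypothetical, and the patterns are deliberately short. A real deployment would maintain longer lists and could return an escalation verdict instead of a flat deny.

```python
import re

# Illustrative patterns only. Easy to evade, which is why hooks are
# defense-in-depth and not the authorization decision.
DANGEROUS_COMMANDS = [
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"),       # rm with both -r and -f
    re.compile(r"\bgit\s+push\b.*--force"),            # forced push
    re.compile(r"\bcurl\b[^|]*\|\s*(?:sh|bash)\b"),    # pipe a download to a shell
]

SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),               # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def pre_bash_hook(command: str) -> str:
    """Runs before a Bash tool call. Returns "deny" or "allow"."""
    if any(p.search(command) for p in DANGEROUS_COMMANDS):
        return "deny"
    return "allow"


def redact(output: str) -> str:
    """Runs after any tool call, before output reaches the model's context."""
    for pattern in SECRET_PATTERNS:
        output = pattern.sub("[REDACTED]", output)
    return output
```

Both functions are pure and cheap, so they are easy to unit-test and profile in isolation.
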
Anti-pattern

Disabling hooks for performance. A hook that runs on every tool call is exactly the code you want in the path once the agent has been turned. If hook latency is the problem, profile the hook. Do not remove the guardrail.

Process model

Shared state between agents turns one compromise into many. Keep the boundaries hard: one process and one service account per agent, with no shared filesystem, process, or credentials.

Resource limits

Limits turn runaway behavior into a bounded, observable event. They cap the damage of a loop the model cannot escape, and the throughput of an exfiltration the egress controls failed to fully close.
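
A minimal sketch of per-session limits enforced in the harness, assuming the harness calls `charge` once per tool call. `SessionBudget`, `LimitExceeded`, and the default values are hypothetical. Memory and disk limits are better enforced by the host (for example with cgroups) than in harness code, so they are left out.

```python
import time
from dataclasses import dataclass, field


class LimitExceeded(RuntimeError):
    """Raised when a session crosses one of its configured limits."""


@dataclass
class SessionBudget:
    max_tool_calls: int = 200            # loop limit
    max_seconds: float = 900.0           # wall-clock limit
    max_outbound_bytes: int = 5_000_000  # caps exfiltration throughput
    tool_calls: int = 0
    outbound_bytes: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, outbound_bytes: int = 0) -> None:
        """Record one tool call and fail closed if any limit is exceeded."""
        self.tool_calls += 1
        self.outbound_bytes += outbound_bytes
        if self.tool_calls > self.max_tool_calls:
            raise LimitExceeded("tool-call limit exceeded")
        if time.monotonic() - self.started > self.max_seconds:
            raise LimitExceeded("wall-clock limit exceeded")
        if self.outbound_bytes > self.max_outbound_bytes:
            raise LimitExceeded("outbound-byte limit exceeded")
```

Raising an exception, as opposed to returning a flag, means a caller that forgets to check still stops. The exception is also the observable event to log and alert on.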

Audit and observability

You cannot reconstruct an incident you never recorded, and you cannot trust a record the attacker could edit. Both halves matter.
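
One way to make edits detectable is a hash chain, where each record commits to the one before it. This is a technique swapped in for illustration, not something the text above prescribes, and it complements an out-of-reach log store without replacing it. `append_record`, `verify`, and the record layout are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous record's hash together with this event."""
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: list, event: dict) -> dict:
    """Append an event whose hash covers the previous record's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    record = {"prev": prev, "event": event, "hash": _digest(prev, event)}
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Recompute the chain; any edited or reordered record breaks it."""
    prev = GENESIS
    for record in log:
        if record["prev"] != prev or record["hash"] != _digest(prev, record["event"]):
            return False
        prev = record["hash"]
    return True
```

An attacker who can rewrite the whole log can also recompute the whole chain. The chain only helps if the latest hash is anchored somewhere the agent cannot write.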

Anti-pattern

Running the agent on a laptop under the developer's user account, with audit logs written inside the working directory and the harness web interface bound to a public port. Each one collapses a boundary the rest of this page is built to hold.

Checklist

Before running an agent on a shared or production host

  • Default-deny outbound, with a per-agent, per-tool allow-list at an application-layer proxy.
  • DNS filtered at the resolver; internal services reachable over VPN only; no inter-host SSH credentials.
  • File tools confined to a working directory, path traversal rejected, credential directories excluded from view.
  • Per-task ephemeral filesystem so injection does not persist across sessions.
  • Pre and post hooks enforced on Bash, Write, and WebFetch, with secrets redacted from outputs.
  • One process and one service account per agent; no shared filesystem, process, or credentials.
  • Loop, wall-clock, rate, cost, memory, disk, and outbound-byte limits set per session.
  • Every tool call logged to a store outside the agent's reach, streamed live, alerting on outliers.
  • Audit reconstructability verified by replaying a real session into a sandbox.
