Threat modeling an agentic system

Why agentic systems need their own threat model

Classical threat modeling assumes a deterministic system with human users at the edges and code in the middle. An agentic system breaks both assumptions. The "user" driving an action may be a model responding to text it read three tool calls ago. The component you most need to constrain, the model, is the one you cannot fully specify. The most dangerous data flow is usually internal: tool output feeding the next decision.

The discipline still transfers. You decompose, find boundaries, enumerate threats per component, and trace the worst ones to ground. The difference is that the control flow underneath is probabilistic.

Scope this correctly

Threat modeling is a design-level activity. It tells you where the attack surface is and which controls to enforce. It is not a line-by-line code audit and does not replace one. Run it before you build, then again before you ship.

The method in five steps

STRIDE-per-component gives you breadth. Attack trees give you depth on the threats that warrant it. The five steps:

Decompose the system into components and data flows.
Draw the trust boundaries.
Apply STRIDE to each component.
Build attack trees for the highest-severity threats.
Map each leaf to an enforced control, and record what residual risk remains.

1 · Decompose the system

List every component and every data flow between them. For an agent, the components are almost always some subset of:

The model (the decision-maker), and its system prompt.
The harness / runtime that drives the agent loop.
Each tool, with its scope and the system it reaches.
External content sources (web, files, tickets, channels).
Memory and state (conversation, vector store, scratch files).
Other agents and the orchestrator.
Secrets and identity (tokens, service accounts).

For each data flow, note the direction and what crosses it. Injection and confused-deputy attacks live in the flows.
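One way to keep the inventory honest is to record it as data rather than only as a diagram. The sketch below is a minimal, hypothetical model. The `SystemModel`, `Component`, `Flow`, and `Kind` names are ours and do not come from any framework. Each component is listed once, and a flow may only connect components that are already on the list. The `web_fetch` and `send_payment` tools reuse the names from the attack-tree example in step 4; everything else is illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    # Hypothetical categories mirroring the component list above.
    MODEL = "model"
    HARNESS = "harness"
    TOOL = "tool"
    EXTERNAL_SOURCE = "external_source"
    MEMORY = "memory"
    AGENT = "agent"
    SECRET = "secret"


@dataclass(frozen=True)
class Component:
    name: str
    kind: Kind


@dataclass(frozen=True)
class Flow:
    src: str       # component the data leaves
    dst: str       # component the data enters
    carries: str   # what crosses the flow, e.g. "fetched page text"


@dataclass
class SystemModel:
    components: dict[str, Component] = field(default_factory=dict)
    flows: list[Flow] = field(default_factory=list)

    def add(self, name: str, kind: Kind) -> None:
        self.components[name] = Component(name, kind)

    def connect(self, src: str, dst: str, carries: str) -> None:
        # Refuse flows to or from unlisted components, so the component
        # inventory and the flow list cannot silently drift apart.
        for end in (src, dst):
            if end not in self.components:
                raise KeyError(f"unknown component: {end}")
        self.flows.append(Flow(src, dst, carries))


def example_system() -> SystemModel:
    """A tiny hypothetical agent: fetch a page, decide, maybe pay."""
    m = SystemModel()
    m.add("web", Kind.EXTERNAL_SOURCE)
    m.add("web_fetch", Kind.TOOL)
    m.add("model", Kind.MODEL)
    m.add("send_payment", Kind.TOOL)
    m.connect("web", "web_fetch", "page HTML")
    m.connect("web_fetch", "model", "fetched page text")
    m.connect("model", "send_payment", "tool call arguments")
    return m
```

Refusing unknown endpoints is a deliberate choice. A flow into a component nobody listed is exactly the kind of unintended path the checklist at the end asks you to find.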

2 · Draw the trust boundaries

A trust boundary is any line across which the trust level changes. In agentic systems, the non-obvious ones do the damage:

External content → model context. The classic indirect-injection boundary.
Tool output → model context. Poisoned results re-enter as instructions.
Sub-agent output → orchestrator. Internal, so teams routinely forget it.
Model decision → tool execution. This is where authority actually gets exercised.
Inside the sandbox → outside. Credentials must not cross here.

Anything crossing a boundary is untrusted on the far side until a control says otherwise. Mark every crossing. Each one generates threats in the next step.
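Once each component is assigned a trust zone, the boundary rule can be checked mechanically. Here is a minimal sketch, assuming a hypothetical `ZONE` mapping and hypothetical component names (`research_agent` stands in for a sub-agent). A flow crosses a boundary when its endpoints sit in different zones, and a crossing stays untrusted until a control is recorded against it.

```python
from dataclasses import dataclass

# Hypothetical zone assignment: each component sits in exactly one zone.
ZONE = {
    "web": "external",
    "web_fetch": "tool_output",
    "research_agent": "sub_agent",   # internal, but still its own zone
    "model": "model_context",
    "orchestrator": "model_context",
    "send_payment": "tool_execution",
}


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str


def boundary_crossings(flows: list[Flow], zone: dict[str, str]) -> list[Flow]:
    """Flows whose endpoints sit in different trust zones."""
    return [f for f in flows if zone[f.src] != zone[f.dst]]


def uncontrolled(
    flows: list[Flow], zone: dict[str, str], controlled: set[Flow]
) -> list[Flow]:
    """Crossings with no control recorded against them.

    `controlled` holds the flows that some enforced control covers.
    Everything else that crosses a boundary stays untrusted on the far side.
    """
    return [f for f in boundary_crossings(flows, zone) if f not in controlled]
```

The sub-agent deliberately gets its own zone even though it runs inside your system. Placing it in the orchestrator's zone would hide the sub-agent → orchestrator boundary, which is the one teams routinely forget.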

3 · Apply STRIDE per component

For each component, walk the six categories. Read each one through an agentic lens:

STRIDE	What it means for an agent
Spoofing	One agent or tool impersonates another; an injected instruction impersonates the user or the system prompt.
Tampering	Poisoned fetched content, manipulated tool output, altered memory or vector-store entries.
Repudiation	No tamper-evident audit of which agent took which action with which authority.
Information disclosure	Secrets in prompts, tokens echoed in tool output, system prompt exfiltrated, cross-tenant memory leakage.
Denial of service	Token/cost exhaustion, infinite tool loops, resource starvation in the runtime.
Elevation of privilege	Confused deputy: a low-trust input induces a high-authority tool call. This is the threat that defines agentic systems.

Most teams stop here, with a list. The list feeds the next step. It is not the deliverable.
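A per-component STRIDE walk is easier to audit as a grid with one cell per component and category. Here is a minimal sketch; the `Stride` enum and the helper names are hypothetical. Every cell starts empty. An empty cell means nobody considered that category, which is different from a cell that explicitly records "n/a".

```python
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFO_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION = "Elevation of privilege"


Cell = tuple[str, Stride]


def worksheet(components: list[str]) -> dict[Cell, list[str]]:
    """One empty cell per (component, category): all six, for every component."""
    return {(c, s): [] for c in components for s in Stride}


def unwalked(sheet: dict[Cell, list[str]]) -> list[Cell]:
    """Cells with nothing recorded, not even an explicit 'n/a'."""
    return [cell for cell, threats in sheet.items() if not threats]
```

Usage: `sheet = worksheet(["model", "web_fetch", "send_payment"])` yields 18 cells. Recording a threat in a cell, for example `sheet[("send_payment", Stride.ELEVATION)].append("web content induces a payment call")`, removes that cell from `unwalked(sheet)`. The filled cells are the list that feeds step 4.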

4 · Build attack trees for the top threats

Pick the highest-severity threats from STRIDE (elevation of privilege and information disclosure usually dominate) and decompose each into an attack tree. The root is the attacker's goal. The branches are the ways to reach it. The leaves are concrete, testable preconditions.

Take "attacker induces a fund-moving transaction." One branch of the tree is a chain in which every step must hold: inject via a fetched page → reach a payment tool with an unconstrained amount argument → no human confirmation on the path. Each leaf is now something you can verify as true or false against your design.
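Attack trees are commonly drawn with AND nodes, where every child must hold, and OR nodes, where any child suffices. That convention is an assumption here, because the method does not prescribe a notation. The sketch encodes the payment example as a single AND branch under an OR root, and each leaf is a true/false fact about the design. The `Node` class and the leaf wording are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Attack-tree node. Leaves hold a checkable fact; inner nodes combine."""
    label: str
    gate: str = "LEAF"                   # "AND", "OR", or "LEAF"
    holds: bool = False                  # only meaningful for leaves
    children: list["Node"] = field(default_factory=list)

    def reachable(self) -> bool:
        """True if the attacker goal at this node is reachable in the design."""
        if self.gate == "LEAF":
            return self.holds
        results = [child.reachable() for child in self.children]
        if self.gate == "AND":
            return all(results)
        if self.gate == "OR":
            return any(results)
        raise ValueError(f"unknown gate: {self.gate}")


def payment_tree(confirmation_gate: bool) -> Node:
    """The payment example; other branches would sit beside it under the root."""
    injected_payment = Node(
        "inject via fetched page, reach payment tool, no confirmation",
        gate="AND",
        children=[
            Node("fetched page text reaches the model context", holds=True),
            Node("send_payment accepts an unconstrained amount", holds=True),
            Node("no human confirmation on the path", holds=not confirmation_gate),
        ],
    )
    return Node(
        "attacker induces a fund-moving transaction",
        gate="OR",
        children=[injected_payment],
    )
```

Adding a confirmation gate makes one leaf false. Because the branch is an AND, that cuts the whole path: `payment_tree(True).reachable()` is `False`, while `payment_tree(False).reachable()` is `True`.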

An attack tree earns its keep when its leaves are specific enough to check. "The model could be tricked" gives you nothing to test. "web_fetch output reaches send_payment with no PEP (policy enforcement point) between them" does.
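You can check that leaf as graph reachability over the data flows from step 1. Here is a minimal sketch, assuming the design is available as an adjacency map and that each PEP is modelled as a node; the `payment_pep` name is hypothetical. The search treats every PEP as a cut. It answers whether `web_fetch` output can reach `send_payment` along some path that bypasses all of them.

```python
from collections import deque


def reaches_without_pep(
    edges: dict[str, list[str]], src: str, dst: str, peps: set[str]
) -> bool:
    """True if data can flow from src to dst along a path that avoids every PEP.

    `edges` maps each component to the components its output flows into.
    PEP nodes are never expanded, so any path through one is treated as cut.
    """
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt == dst:
                return True
            if nxt in seen or nxt in peps:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return False
```

A PEP that exists but has a bypass edge beside it does not make the leaf false. With `{"web_fetch": ["model"], "model": ["payment_pep", "send_payment"], "payment_pep": ["send_payment"]}`, the search finds the direct `model` → `send_payment` edge and returns `True`.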

Figure: the output of the method, a per-component threat matrix (system components against threat categories, shaded by severity) that produces the control list.

5 · Map to controls and record residual risk

Every leaf in every attack tree maps to exactly one of three outcomes: an enforced control that cuts the path (a PEP, a scope constraint, a confirmation gate), an accepted risk that you document and own, or a gap that becomes work. A threat model without a residual-risk section is incomplete. It implies you closed everything, which is rarely true.
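Here is one way an enforced control could cut the payment path: a PEP in the harness, placed between the model's decision and tool execution, that combines a scope constraint with a confirmation gate. This is a minimal sketch. The tool allowlist, the `MAX_AMOUNT` and `CONFIRM_ABOVE` thresholds, and the `confirm` callback are hypothetical placeholders for whatever your design actually specifies.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)


class PolicyViolation(Exception):
    """Raised when the PEP blocks a call; the harness must not execute it."""


ALLOWED_TOOLS = {"web_fetch", "send_payment"}  # hypothetical allowlist
MAX_AMOUNT = 500                               # hypothetical scope constraint
CONFIRM_ABOVE = 100                            # hypothetical confirmation threshold


def enforce(call: ToolCall, confirm: Callable[[ToolCall], bool]) -> ToolCall:
    """Policy enforcement point between model decision and tool execution.

    Returns the call unchanged if it is allowed; raises PolicyViolation otherwise.
    """
    # Deny by default: a tool that is not on the allowlist never runs.
    if call.tool not in ALLOWED_TOOLS:
        raise PolicyViolation(f"tool not allowed: {call.tool}")
    if call.tool != "send_payment":
        return call
    amount = call.args.get("amount")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount <= 0:
        raise PolicyViolation("amount missing or invalid")
    # Scope constraint: the argument can never exceed the configured ceiling.
    if amount > MAX_AMOUNT:
        raise PolicyViolation(f"amount {amount} exceeds limit {MAX_AMOUNT}")
    # Confirmation gate: a human must approve larger payments.
    if amount > CONFIRM_ABOVE and not confirm(call):
        raise PolicyViolation("human confirmation declined")
    return call
```

The check runs in harness code that the model cannot edit. An injected instruction can change what the model asks for, but not what the harness allows.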

The deliverable is short and structured: system context, trust boundaries, the prioritised threats, the control mapped to each, and the residual risk. A real engagement uses the same structure, and so does the example below.
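The step-5 mapping and the deliverable can share one structure. Here is a minimal sketch with hypothetical names (`Outcome`, `Leaf`, `render`). The enum forces each leaf into exactly one of the three outcomes. Anything that an enforced control does not cut lands in the residual-risk section automatically, so the section cannot be silently left out.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CONTROL = "enforced control"
    ACCEPTED = "accepted risk"
    GAP = "gap"


@dataclass
class Leaf:
    threat: str
    outcome: Outcome  # exactly one of the three outcomes, by construction
    note: str         # the control, the risk owner, or the tracking ticket


def residual_risk(leaves: list[Leaf]) -> list[Leaf]:
    """Every leaf not cut by an enforced control is residual risk."""
    return [leaf for leaf in leaves if leaf.outcome is not Outcome.CONTROL]


def fmt(leaf: Leaf) -> str:
    return f"- {leaf.threat}: {leaf.outcome.value} ({leaf.note})"


def render(context: str, boundaries: list[str], leaves: list[Leaf]) -> str:
    """Deliverable skeleton: context, boundaries, threats, controls, residual risk."""
    lines = ["System context", context, "", "Trust boundaries"]
    lines += [f"- {b}" for b in boundaries]
    lines += ["", "Threats and mapped outcomes"]
    lines += [fmt(leaf) for leaf in leaves]
    # Always emitted, so an empty section is visible rather than missing.
    lines += ["", "Residual risk"]
    lines += [fmt(leaf) for leaf in residual_risk(leaves)] or ["- none recorded"]
    return "\n".join(lines)
```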

Worked example

We publish a complete pre-deployment threat model for a fictional multi-agent system. The system is invented, but the report has the same structure and depth as a client engagement: system context, trust boundaries, prioritised risks, and recommended mitigations.

Read the example threat model

A pre-deployment threat model for an example multi-agent system. It is what this method produces when you run it end to end.


Checklist

A threat model is done when

Every component and data flow is listed, including the paths nobody intended.
Every trust boundary is drawn, including the internal tool-output and sub-agent ones.
STRIDE has been walked per component rather than over the system as a whole.
The top threats have attack trees with concrete, checkable leaves.
Each leaf maps to an enforced control, an accepted risk, or a tracked gap.
There is a residual-risk section, and it states what you actually left open.
The output is re-runnable: change the design and the model changes with it.
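Several of these items can be checked automatically, which is one way to keep the model re-runnable as the design changes. Here is a minimal sketch, assuming a hypothetical plain-dict threat model whose keys are described in the docstring. It checks only whether the structure is complete. It cannot judge whether you found the paths nobody intended or whether your leaves are specific enough.

```python
STRIDE = {"S", "T", "R", "I", "D", "E"}
OUTCOMES = {"control", "accepted", "gap"}


def checklist_failures(tm: dict) -> list[str]:
    """Return the structural checklist items `tm` still fails (empty = done).

    Hypothetical keys:
      components:    list of component names
      flows:         list of (src, dst) tuples
      boundaries:    list of (src, dst) tuples marked as trust-boundary crossings
      stride:        {component: {letter: [threats, or "n/a"]}}
      leaves:        {leaf description: "control" | "accepted" | "gap"}
      residual_risk: list of leaf descriptions left open
    """
    failures = []
    names = set(tm["components"])
    flows = set(tm["flows"])
    for src, dst in flows:
        if src not in names or dst not in names:
            failures.append(f"flow {src}->{dst} touches an unlisted component")
    for crossing in tm["boundaries"]:
        if crossing not in flows:
            failures.append(f"boundary {crossing} is not on any listed flow")
    for comp in tm["components"]:
        missing = STRIDE - set(tm["stride"].get(comp, {}))
        if missing:
            failures.append(f"STRIDE not walked for {comp}: {''.join(sorted(missing))}")
    for leaf, outcome in tm["leaves"].items():
        if outcome not in OUTCOMES:
            failures.append(f"leaf not mapped: {leaf}")
    # Every leaf not cut by a control must appear in the residual-risk section.
    open_leaves = {leaf for leaf, o in tm["leaves"].items() if o != "control"}
    if "residual_risk" not in tm:
        failures.append("no residual-risk section")
    elif not open_leaves <= set(tm["residual_risk"]):
        omitted = sorted(open_leaves - set(tm["residual_risk"]))
        failures.append("residual risk omits: " + ", ".join(omitted))
    return failures
```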

Want a threat model before you ship?

We produce pre-deployment threat models and security design reviews for agentic systems on any stack. You get the structured report shown above, tailored to your architecture.
