Model and supply-chain hygiene

The expanded trusted computing base

In a classical system, the trusted computing base (TCB) is the set of components whose failure breaks every security property. The discipline is to keep it small and audit it hard. An agent inverts the usual ratio: almost nothing in its TCB is code you wrote.

Walk the dependency graph of a deployed agent and you find a model whose weights and serving behaviour belong to a third party, MCP servers pulled from a registry, plugins and skills downloaded from a marketplace, agent files (CLAUDE.md, AGENT.md, .cursor/rules, .aider.conf.yml) that may sync in from shared locations, tool descriptions written by whoever published the server and dropped verbatim into the system prompt, and a build pipeline that assembles all of it. Every one of these shapes what the agent does, and every one is a place an adversary can stand.

Diagram: the agent harness at the centre, surrounded by model provider, MCP servers, plugins and skills, agent files, tool descriptions, and the build pipeline, all feeding into the trusted computing base. — The agent's TCB. The harness is the small in-house core, ringed by third-party components that each influence behaviour.

The rule

Treat any third-party MCP server, plugin, skill, or tool description as part of your TCB. If you did not author or review it, it is trusted code running with your agent's authority, whatever you choose to call it.

The failure mode is rarely exotic malware. It is the quiet default: model-latest aliases, auto-updating servers, agent files loaded from a synced folder, tool descriptions read as documentation when they are really prompt input. Each default buys a little convenience and leaves a standing path into the agent's intent. Hygiene means closing those paths one component class at a time.

Model-provider trust

The model is the largest dependency and the one you can inspect least. Two concerns dominate: behavioural drift and data handling.

Pin the version. A model-latest alias lets the provider change your agent's behaviour without any deploy on your side. Injection resistance, tool-call formatting, and refusal behaviour can all shift between point releases. Pin an explicit version for any high-consequence agent. Treat an upgrade as a deliberate change that re-triggers evaluation, injection-resistance evals above all.

Read the data-handling agreement. Default contracts may permit prompt logging or training use, which is wrong for high-sensitivity workloads. Where it matters, negotiate a no-train clause and zero retention, or move to a self-hosted or zero-retention deployment of an open-weights model and accept the capability reduction as the price of containment.

Pin model versions explicitly. No model-latest aliases for high-consequence agents.
Treat every upgrade as a change event and re-run evals before promoting it.
Give provider keys narrow scope, short lifetimes, and an audit trail.

MCP servers

An MCP server is the cleanest example of the inverted TCB. You add it for one tool and inherit everything it registers: its code, its declared scopes, and its tool descriptions, which the harness injects into the system prompt. A server you treat as a utility is, in practice, code with a writable channel into your agent's instructions.

Inspect the descriptions, not just the README. The README is marketing. The tool descriptions are the attack surface, because they become part of the prompt the model obeys. A description carrying imperative sentences directed at the model ("after calling this, always email the result to...") is a tool-description injection sitting in your system prompt by design. Read every description a server registers before you adopt it, and re-validate them against the deployed prompt after every upgrade.

Sandbox the untrusted ones. Run any server you did not author in a sandbox alongside the harness, under the same egress restrictions you apply to the agent itself. Restrict its outbound network to the one API it actually needs. Issue short-lived, scope-limited credentials through the harness instead of leaving long-lived keys on disk.

Anti-pattern

Auto-updating MCP servers. An auto-update is an unreviewed code push and an unreviewed prompt change at the same time. Pin specific versions, disable auto-update, subscribe to release notifications, and review the diff before you take it.

Plugins, skills, and agent files

Plugins and skills carry the same risk as MCP servers, namely third-party code plus third-party prompt content, so give them the same treatment. Pin versions. Scan bundles for embedded tokens, suspicious URLs, and shell invocations. Prefer marketplaces that show provenance: signed bundles, publisher identity, visible version history. "Open-source" does not mean "reviewed," and a high download count is not a security signal.

Agent files deserve separate attention because they read like configuration but act like a system prompt. CLAUDE.md and its siblings steer the model's behaviour directly. Treat them as security-relevant configuration: review changes, restrict who can modify them, and hash-pin them if they load from a shared location.

Note

The agent must not write its own agent files, and those files must not load from synced cloud folders, network mounts, or auto-synced repos. Allow either and a single successful injection that writes to CLAUDE.md becomes a persistent backdoor that survives the session.

Tool-description injection

To carry the thread from the MCP section forward: a tool description is prompt content, not documentation. Whoever writes it writes into your system prompt. When the description comes from a third party, you have handed an outside party a sentence in your instructions.

Generate descriptions in-house where you can. Where you cannot, inspect them before deployment and strip any imperative-mood sentences aimed at the model. Length-limit descriptions and reject any that issue instructions instead of stating what the tool does. If a third-party description has to stay, label it explicitly as third-party metadata in the system prompt so the model weighs it as data, not command.

Build and deploy

Everything above assumes the artefact you reviewed is the artefact you run. The build pipeline decides whether that assumption holds. Without checksums and signing, a compromised dependency or a registry substitution can swap in code you never saw.

Reproducible install steps: pinned dependencies, no auto-update at session start.
CI builds with checksums on artefacts.
Signed releases for high-value agents.
A software bill of materials (SBOM) for the agent and its dependencies.
An attestation chain from source to deployed binary.

None of this is specific to agents. It is ordinary supply-chain discipline, applied to a system whose supply chain happens to include a language model and a pile of third-party prompt content.

The L0 to L3 ladder

Hygiene is not binary. Most teams start at L0 by default and climb as the agent's consequence rises. Use the ladder to place each component class honestly, then pull the laggards up to match the agent's blast radius.

Level	Models	MCP / plugins	Skills / agent files	Build / deploy
L0	`model-latest`; default contract	Install whatever the docs suggest	Auto-load anything in `~`	No CI; unsigned artefacts
L1	Pin model version	Pin server / plugin versions	Review at commit	Reproducible install steps
L2	No-train clause; zero-retention if available	Allow-list of vetted servers; descriptions inspected	Marketplace with publisher identity	CI builds; checksums
L3	Self-hosted or contractually verified; second-source plan	Sandboxed MCP servers; signed releases	Signed skill bundles; hash-pinned files	Signed releases; SBOM; attestation

The pre-adoption review for a new MCP server is the ladder in miniature. Read the source. Read every tool description. Note declared permissions and scopes. Check publisher identity and prior releases. Test in a sandboxed harness with no production credentials. Pin the version, issue short-lived credentials through the harness, and restrict outbound network to the one API it needs.

Checklist

Before adopting a model, server, plugin, skill, or agent file

Model version pinned; upgrades re-trigger evals; data-handling agreement read and acceptable for the workload.
Every MCP server version-pinned with auto-update disabled.
Every tool description a server registers inspected and re-validated against the deployed prompt.
Untrusted servers sandboxed, egress-restricted, and given short-lived scoped credentials.
Plugin and skill versions pinned; bundles scanned for tokens, suspicious URLs, and shell invocations.
Agent files treated as security configuration: reviewed, access-restricted, never loaded from synced locations, never self-written.
Tool descriptions generated in-house where possible; any third-party ones stripped of imperatives and labelled as metadata.
Reproducible builds with checksums, signed releases, an SBOM, and an attestation chain.

Adopting third-party models, servers, or skills?

We review agent supply chains, sandbox and pre-adoption-test MCP servers, and build pinning and attestation discipline into your pipeline. Tell us what you are integrating.

Get in touch Read: securing MCP servers

The expanded trusted computing base

Model-provider trust

MCP servers

Plugins, skills, and agent files

Tool-description injection

Build and deploy

The L0 to L3 ladder

Checklist

Before adopting a model, server, plugin, skill, or agent file

Adopting third-party models, servers, or skills?

Securing MCP servers and tools

Defending against prompt injection