Secret management for agents

Secrets are delegated authority

A secret is not data the agent holds. It is authority the agent inherits. Once an API key, OAuth token, or database credential is within reach, every system that key unlocks is reachable by whatever the model decides to do next, including whatever an injected instruction decides for it. Treat each credential as a delegation of your own access and three design questions follow. How broad is the grant? How long does it last? Where can it leak on the way through?

The answer to all three is one posture. Authority should be scoped to one repo, one channel, one table. It should expire in minutes rather than months. It should be redacted from anything that gets logged or sent to the model, and it should never enter the prompt context. An agent does not need to see a secret to use it. The credential belongs at the tool boundary, where the harness injects it into the outbound request and the model never reads it.
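
A minimal sketch of that tool-boundary injection, assuming a hypothetical post-message tool. The names build_request and get_credential, the "chat-bot" credential name, and the URL are illustrative and not part of any real harness. The model supplies only the arguments, and the harness attaches the credential to the outbound request.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class OutboundRequest:
    url: str
    headers: Dict[str, str]
    body: Dict[str, str]


def build_request(
    model_args: Dict[str, str],
    get_credential: Callable[[str], str],
) -> OutboundRequest:
    """Build the request for a hypothetical post-message tool.

    model_args is everything the model produced. The credential is looked up
    by the harness and attached here, so it never appears in model_args,
    the prompt, or the transcript.
    """
    # Reject any attempt to pass auth material through the model's arguments.
    forbidden = {"token", "authorization", "api_key"}
    leaked = forbidden & {key.lower() for key in model_args}
    if leaked:
        raise ValueError(f"credential-like arguments not allowed: {sorted(leaked)}")

    token = get_credential("chat-bot")  # hypothetical credential name
    return OutboundRequest(
        url="https://chat.example.invalid/api/post",  # placeholder URL
        headers={"Authorization": f"Bearer {token}"},
        body=dict(model_args),
    )
```

Only the harness sees the returned headers. What goes back to the model is the tool result, not the request.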

The shift that matters

A leaked secret in a chatbot is an embarrassment. In an agent it is a standing grant of authority that an attacker can replay long after the conversation ends. Severity scales with what the credential can reach. How the agent was tricked into spilling it barely matters.

Figure: The secrets ladder, three rungs from environment variables at the floor, through a password manager or KMS, up to just-in-time vault issuance scoped to a single tool call. Each rung shortens credential lifetime and narrows scope, up to per-tool-call issuance at the top.

Five places secrets leak

Before choosing where to store a credential, know where it escapes. Five locations account for nearly every agent secret leak. Four of them are specific to how agents operate.

The environment. OPENAI_API_KEY=sk-... in a .bashrc, on a CI runner, in a checked-in .env. Long-lived, broadly scoped, rarely rotated.
The agent's own context. Secrets pasted into system prompts, agent files (CLAUDE.md, AGENT.md, .cursor/rules), or pulled in through retrieval. These travel on every model call.
Tool arguments. slack_post_message(token="xoxb-...") lands in the transcript, the harness audit trail, and the model provider's logs.
Tool outputs and fetched content. A page returns a secret in cleartext and the agent dutifully repeats it in a summary, carrying it downstream.
Persistent files. Logs, transcripts, scratch files. Every secret the agent has ever touched, sitting on disk waiting to be read.

The first leak is the classic ops problem. The other four are the agent-specific surface, and they are why the rest of this guide exists. Store the secret well and you have still solved nothing if it ends up in the context window or a log line.

L1 · Environment variables

Environment variables are the floor, not a destination. The goal here is narrow: no cleartext secret ever written somewhere durable. That is the minimum, and a large fraction of real deployments do not reach it.

No cleartext secrets in workspaces or configuration files.
No secrets in checked-in code, dotfiles in repos, agent files, or test fixtures.
.env files never committed to version control.
No secrets in skill files, plugin configs, MCP server configs, or scheduled-task scripts.
Per-host environment files (.envrc, a systemd unit) populated at boot from a more secure store.
CI/CD secrets injected at runtime under a strict no-echo policy.
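
One way to check the list above is a scan of the workspace for cleartext keys. The sketch below is a rough illustration, not a replacement for a dedicated scanner. The three patterns and their length thresholds are assumptions chosen for the example.

```python
import re
from pathlib import Path
from typing import Iterator, List, Tuple

# Illustrative patterns only; dedicated scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "openai_style_key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "slack_bot_token": re.compile(r"xoxb-[A-Za-z0-9-]{10,}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
}


def scan_text(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, pattern_name) for every suspected secret."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                yield lineno, name


def scan_workspace(root: Path) -> List[Tuple[str, int, str]]:
    """Scan every readable file under root, skipping .git internals."""
    findings = []
    for path in root.rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        findings.extend((str(path), n, name) for n, name in scan_text(text))
    return findings
```

Findings carry the file, line number, and pattern name, never the matched value, so the report itself does not become a new leak.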

Warn

An export OPENAI_API_KEY=sk-... in .bashrc is the single most common agent secret leak. It is long-lived, sits in cleartext on disk, and is inherited by every process launched from the user's shell. If the export was ever typed at a prompt, it is in shell history too. If you find one, treat the key as already compromised. Rotate it, then move it up the ladder.

L2 · Password manager or KMS

One rung up, the secret never touches the host filesystem. The harness pulls it from a managed store into the process environment at session start, uses it, and drops it when the process exits. The credential lives in 1Password, Bitwarden, Vaultwarden, or a cloud KMS. The agent launch command is wrapped so the value exists only in memory for the life of the run.

No cleartext secrets anywhere on the host filesystem, including the locations L1 missed.
Secrets pulled into the process environment at session start, never written to disk.
op run (1Password CLI) wraps each agent's launch command, with a per-agent scoped set.
aws-vault exec for AWS, which keeps the long-lived keys in the OS keychain and hands the launched process short-lived STS credentials.
gcloud auth application-default print-access-token for GCP, minting short-lived OAuth tokens on demand.

L2 is the right baseline for most teams. It removes the durable artefact and gives you one place to scope and audit what each agent can reach. The weakness that remains is lifetime. A token pulled at session start is valid for the whole session, so a leak mid-run stays replayable until it expires.
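
The in-memory handoff can be sketched in plain Python. This is the same idea a wrapper like op run applies, shown here with a hypothetical fetch_secret callable standing in for your manager's client. Nothing below is the API of any real tool.

```python
import os
import subprocess
from typing import Callable, Dict, List, Mapping


def child_env(
    secret_names: List[str],
    fetch_secret: Callable[[str], str],
    base: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Return a copy of the environment with secrets added for one child process.

    fetch_secret is a hypothetical callable that reads from your manager.
    The parent's own os.environ is not modified and nothing is written to disk.
    """
    env = dict(base)
    for name in secret_names:
        env[name] = fetch_secret(name)
    return env


def launch_agent(argv: List[str], env: Dict[str, str]) -> int:
    """Run the agent with the prepared environment and return its exit code."""
    return subprocess.run(argv, env=env, check=False).returncode
```

Passing a per-agent secret_names list is where the scoped set is enforced: an agent cannot inherit a variable that was never placed in its environment.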

L3 · Just-in-time vault issuance

The top rung issues credentials at tool-call time and discards them right after. A vault (HashiCorp Vault, Akeyless, AWS Secrets Manager with IAM dynamic credentials) mints a credential scoped to one action, valid for minutes, tied to the specific call. The Policy Enforcement Point (PEP), the harness code that sits at the tool boundary, requests it, embeds it in the outbound request, and throws it away. The model never sees a secret, because there is no standing secret to see.

Database access: a Postgres role with a 5-minute TTL, revoked at session end if it has not already expired.
Source control: a GitHub App installation token, single repo, 1-hour expiry.
Cloud access: STS or OIDC federation, 15-minute tokens scoped to a single resource.
A per-tool-call credential the PEP requests, embeds, and discards.
Every issuance traceable to the agent, the tool call, and the resulting action.

This is where you want any agent with reach into production. A leaked credential is worthless within minutes, the blast radius is one resource, and the audit trail reconstructs exactly which call issued what. The cost is operational. You need the vault, the dynamic backends, and a PEP disciplined enough to request and discard on every call.
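
The request, embed, discard cycle can be sketched with an in-memory stand-in for the vault. FakeVault and credential_for_call are hypothetical names for illustration. A real deployment would call its vault's own client here, and the scope string format is an assumption.

```python
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Lease:
    lease_id: str
    token: str
    scope: str
    expires_at: float


@dataclass
class FakeVault:
    """In-memory stand-in for a vault with dynamic credentials (hypothetical API)."""
    active: Dict[str, Lease] = field(default_factory=dict)
    audit: List[dict] = field(default_factory=list)

    def issue(self, agent: str, tool_call_id: str, scope: str, ttl_s: int) -> Lease:
        lease = Lease(secrets.token_hex(8), secrets.token_urlsafe(24),
                      scope, time.time() + ttl_s)
        self.active[lease.lease_id] = lease
        # The audit record names the agent, call, and scope, never the token.
        self.audit.append({"event": "issue", "agent": agent,
                           "tool_call": tool_call_id, "scope": scope,
                           "lease": lease.lease_id})
        return lease

    def revoke(self, lease_id: str) -> None:
        if self.active.pop(lease_id, None) is not None:
            self.audit.append({"event": "revoke", "lease": lease_id})

    def is_valid(self, lease_id: str) -> bool:
        lease = self.active.get(lease_id)
        return lease is not None and time.time() < lease.expires_at


@contextmanager
def credential_for_call(vault: FakeVault, agent: str, tool_call_id: str,
                        scope: str, ttl_s: int = 300) -> Iterator[Lease]:
    """Issue a credential for one tool call and revoke it on the way out."""
    lease = vault.issue(agent, tool_call_id, scope, ttl_s)
    try:
        yield lease
    finally:
        vault.revoke(lease.lease_id)  # runs even if the tool call raises
```

The PEP would wrap each tool call in credential_for_call, so the token exists only inside the with block. The TTL is the backstop if revocation fails.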

Note

The ladder is cumulative, not a menu. L3 assumes L1 and L2 already hold: no cleartext on disk, nothing in the prompt, nothing in a committed file. Just-in-time issuance does not save you if the agent still logs the credential it was handed.

The three rungs compared:

Rung	Where the secret lives	Lifetime	Blast radius
L1 · Environment	Process env, populated at boot from a secure store	Until rotated (rarely)	Everything the key grants
L2 · Manager / KMS	Pulled to memory at session start, never on disk	The whole session	Per-agent scoped set
L3 · JIT vault	Issued per tool call, discarded after	Minutes	One resource, one action

Output and argument redaction

Storing a secret well does nothing if the agent prints it. Redaction is the filter between the agent and everything that persists or gets sent onward. Run it as a PostToolUse hook on outputs and a PreToolUse hook on arguments. Do not rely on the model to police itself.

Redact known API-key formats: sk-..., xoxb-..., gho_..., AKIA....
Redact JWT-like strings and content following Authorization: Bearer.
Redact long base64 blobs above a configurable length threshold.
Redact organisation-specific secret patterns you know exist in your own systems.
Use the pattern sets from detect-secrets, gitleaks, or trufflehog.
Replace rather than strip, so the leak is visible in review: sk-...abcd becomes <REDACTED:openai_api_key>.
Redact tool arguments too (PreToolUse), not only outputs.
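
A minimal sketch of such a redaction filter. The regular expressions and length thresholds are illustrative assumptions, not the rule sets shipped by detect-secrets, gitleaks, or trufflehog, and the function names are hypothetical. A hook would call redact_value on arguments before the tool runs and on outputs before they are stored or returned.

```python
import re
from typing import Any

# Illustrative patterns; tune lengths and add your organisation's own formats.
_PATTERNS = [
    ("openai_api_key", re.compile(r"sk-[A-Za-z0-9_-]{20,}")),
    ("slack_bot_token", re.compile(r"xoxb-[A-Za-z0-9-]{10,}")),
    ("github_oauth_token", re.compile(r"gho_[A-Za-z0-9]{20,}")),
    ("aws_access_key_id", re.compile(r"AKIA[0-9A-Z]{16}")),
    ("jwt", re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
    # Whatever follows the header, unless an earlier rule already replaced it.
    ("bearer_token",
     re.compile(r"(?i)(?<=Authorization: Bearer )(?!<REDACTED:)\S+")),
]


def redact(text: str, min_blob_len: int = 40) -> str:
    """Replace suspected secrets with a labelled marker instead of stripping them."""
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"<REDACTED:{label}>", text)
    # Long base64-looking runs are redacted last, above a configurable length.
    blob = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % min_blob_len)
    return blob.sub("<REDACTED:base64_blob>", text)


def redact_value(value: Any) -> Any:
    """Recursively redact strings inside tool arguments or tool outputs."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {k: redact_value(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(redact_value(v) for v in value)
    return value
```

Keeping the label in the marker means a reviewer can see which kind of secret leaked and where, without seeing the value.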

Warn

Asking the model to "never reveal secrets" is a request, not a control. Redaction belongs in deterministic code at the tool boundary. A prompt-level rule fails the first time an injection, or a verbose debug path, routes a credential around it.

Per-agent identity

Shared credentials destroy attribution and widen every blast radius. When two agents use the same token, a compromise of either is a compromise of both, and the audit log cannot tell you which one acted. Give every agent its own identity, scoped to exactly what that agent does.

No shared credentials between agents, ever.
GitHub: a fine-grained PAT, one repo, specific permissions, 30-day expiry or less.
Slack: per-bot OAuth, channel-scoped, rotated quarterly.
Cloud: a per-agent IAM role via OIDC federation, no long-lived access keys.
Database: a per-agent role with access to just the tables and columns it needs.
No tokens sitting in the agent's filesystem.

Per-agent identity is what makes the rotation and revocation drill below tractable. If you can name which agent holds which credential, you can cut any one of them without guessing what else breaks.
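
Naming which agent holds which credential can start as a plain inventory. The sketch below assumes a hypothetical Grant record that stores a credential identifier, never the secret itself, and flags any identifier held by more than one agent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Grant:
    agent: str
    credential_id: str  # an identifier for the credential, never the secret
    scope: str


def shared_credentials(grants: List[Grant]) -> Dict[str, List[str]]:
    """Map each credential held by more than one agent to those agents."""
    holders = defaultdict(set)
    for grant in grants:
        holders[grant.credential_id].add(grant.agent)
    return {cid: sorted(agents)
            for cid, agents in holders.items() if len(agents) > 1}


def credentials_of(grants: List[Grant], agent: str) -> List[str]:
    """List the credential identifiers one agent holds, for targeted revocation."""
    return sorted({g.credential_id for g in grants if g.agent == agent})
```

An empty result from shared_credentials is the condition to enforce. credentials_of gives the list to cut when a single agent is compromised.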

Rotation and revocation

Every secret has a half-life. Rotation bounds how long a quiet compromise stays useful. Revocation is the emergency stop. Both have to be automated and both have to be drilled, because a kill switch you have never pulled is a kill switch you do not actually have.

Long-lived tokens rotated every 30 days; webhook secrets every 90 days.
Rotation automated, not a manual calendar reminder.
Revocation wired as a kill switch: one command, a single source of truth.
Any secret that has appeared in a log, transcript, or tool output is rotated immediately, not eventually.
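
The rotation rules above reduce to a small policy check that a scheduler could run. This is a sketch under assumptions: SecretRecord, the kind labels, and the status strings are hypothetical, and the 30-day and 90-day limits are taken from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Maximum ages from the policy above.
MAX_AGE = {
    "token": timedelta(days=30),
    "webhook_secret": timedelta(days=90),
}


@dataclass
class SecretRecord:
    name: str
    kind: str                     # a key of MAX_AGE
    rotated_at: datetime
    seen_in_output: bool = False  # set by redaction hooks when a leak is caught


def rotation_status(record: SecretRecord, now: datetime) -> str:
    """Return 'rotate_now', 'due', or 'ok' for one secret."""
    if record.seen_in_output:
        return "rotate_now"  # exposure overrides the schedule
    if now - record.rotated_at >= MAX_AGE[record.kind]:
        return "due"
    return "ok"
```

Wiring the redaction hooks to set seen_in_output is what turns "rotated immediately" from a policy statement into something the scheduler enforces.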

Run the revocation drill

Ask the question out loud before you need the answer: "What breaks if I revoke the coding agent's GitHub token right now?" If you cannot list the consequences, you do not control that credential. You are hoping nothing depends on it. Pull the switch in a controlled window, watch what fails, and fix the surprises before an incident finds them for you.

Checklist

Before handing an agent any credential

No cleartext secrets in workspaces, config, dotfiles, agent files, or committed .env files.
Secrets pulled from a manager or KMS into memory at session start, never written to disk.
Production-reaching agents use just-in-time vault issuance, scoped per tool call, minutes-long.
Nothing secret in the prompt context: not the system prompt, not agent files, not retrieval.
Redaction hooks on tool arguments (PreToolUse) and outputs (PostToolUse), in code, not in the prompt.
Every agent has its own scoped identity; no credential shared between agents.
Rotation automated; anything seen in a log or transcript rotated immediately.
You have run the revocation drill and know what each kill switch breaks.
