Stopping Claude Code From Leaking Secrets

Here’s a thing that happened while I was building a newsletter signup on one of my sites.

I asked Claude to check the state of a self-hosted service running on my VPS. Simple enough — read some environment variables, hit the admin API, report back. Claude ran the equivalent of set -a; source ~/apps/.../.env; set +a in zsh.

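One of the values in that file contained an unquoted & character. zsh treated the & as the background operator, so it ran the assignment up to that point as a background job and then tried to execute the rest of the value as a command. It printed an error: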

command not found: [TOKEN_FRAGMENT_REDACTED]

That token fragment wasn’t supposed to exist outside the file. It was real. Now it was in my terminal. Now it was in Claude’s context. Now it was in the session transcript on disk. And — this is the part that made me stop treating it as a rotation problem — it had already traveled to Anthropic’s servers. Every tool output Claude processes gets sent to the API to generate the next response. The secret didn’t just leak to my terminal. It left my machine.

Claude apologized, flagged the leak, told me to rotate the key. Which I did. Then we kept working.

Two hours later, it happened again. Different mechanism, same outcome. Claude apologized again.

Rotating a key sounds simple until you’re doing it under pressure. You have to track down which key leaked, audit every place it’s deployed — VPS env files, CI configs, other services sharing the same credential — generate a replacement, update everything, deploy, and verify nothing broke. Conservatively that’s 20–40 minutes per incident. I did it twice in one afternoon. Nearly two hours of real work on a problem that shouldn’t exist.

That’s when I stopped.

Are you at risk?

Before reading on — you can check whether your Claude Code setup has the same exposure.

Paste this into your terminal first:

bash
echo 'TEST_API_TOKEN=FAKE_TEST_TOKEN_DO_NOT_USE_1234567890' > /tmp/hook-test.env

Then ask Claude Code: “Source /tmp/hook-test.env and tell me what TEST_API_TOKEN is set to.”

If Claude’s response contains the token value, your setup is susceptible. Clean up after:

bash
rm /tmp/hook-test.env

If the value never appears or comes back redacted, you already have a protection layer somewhere. The rest of this post is about building that layer intentionally.

The pattern I kept falling for

I’d been treating every leak as a judgment call. “Claude should have been more careful.” “I should have told Claude to redact before showing me.” “Next time we’ll be better.”

This is exactly the kind of thinking that doesn’t work in any engineering discipline. Safety that depends on a human (or an AI) remembering to be careful is not safety. It’s a ritual that feels like safety until something’s on fire.

The actual principle is older than computing: if a failure keeps happening, the system is producing it, not the operator. My “system” was: AI has access to files containing secrets, AI reads them in ways that sometimes echo them, AI apologizes when it does. Every part of that pipeline is working exactly as designed. Asking the AI to be more careful is like asking water to be less wet.

I took this to Claude directly. Its first suggestion was a PostToolUse hook — a script that runs after a tool finishes and can rewrite the output. I almost went with it. But then I thought about the sequence:

  1. Tool runs. Output is captured.
  2. Output is sent to Claude’s context.
  3. PostToolUse hook fires. It can edit the session transcript on disk.

Step 2 is the problem. By the time the hook fires, Claude has already seen the secret. Redacting the transcript helps with audit trails and shared recordings, but it doesn’t undo what the model ingested. It’s locking the stable door.

The correct layer is PreToolUse — the hook that runs before the tool executes. If the hook can mutate the command string to pipe the tool’s output through a redactor, then the model only ever sees sanitized bytes. The secret never enters the context in the first place.

When I proposed this to Claude, it hedged: “I’m not sure PreToolUse hooks can mutate tool_input in the current spec, the docs only mention allow/deny/ask.” This turned out to be wrong. The docs describe a field called updatedInput that replaces the tool’s input object before execution. You can rewrite the command. You can add a pipe. The model is never the wiser.

So that’s what I built.

The two-hook setup

Three files in ~/.claude/hooks/:

sh
~/.claude/hooks/
├── lib/
│   └── redact.py              # streaming secret redactor
├── bash-prewrap.py            # PreToolUse hook for Bash
└── scrub-transcript.sh        # PostToolUse defense in depth

This post is intentionally showing the structure, not a drop-in security package. If you build your own version, test it with fake credentials first, verify what reaches the model, and assume the first pass will miss edge cases.

Wired into ~/.claude/settings.json:

json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "/Users/you/.claude/hooks/bash-prewrap.py",
        "timeout": 5
      }]
    }],
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "/Users/you/.claude/hooks/scrub-transcript.sh",
        "timeout": 5,
        "async": true
      }]
    }]
  }
}
PreToolUse fires before Claude sees output — the secret never enters the AI's context

1. The redactor (lib/redact.py)

A small streaming Python script. Reads stdin line by line. Rewrites matches. Writes stdout.

The pattern set has two tiers. First, named-secret patterns — anything that looks like FOO_TOKEN=..., BAR_SECRET=..., BAZ_PASSWORD=..., Authorization: Bearer ..., AWS AKIA..., GitHub ghp_..., JWT triples, URLs with user:pass@. These are the obvious cases.
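To make the first tier concrete, here is a minimal sketch of a named-pattern pass. It is not the actual redact.py: the pattern list, the `redact_named` name, and the replacement markers are illustrative assumptions, and real token formats vary more than these regexes allow.

```python
import re

# Hypothetical pattern set, ordered from most to least specific context.
NAMED_PATTERNS = [
    # KEY=value where the key name ends in TOKEN, SECRET or PASSWORD
    (re.compile(r"\b(\w*(?:TOKEN|SECRET|PASSWORD))=\S+"), r"\1=[REDACTED]"),
    # HTTP bearer credentials
    (re.compile(r"(Authorization:\s*Bearer\s+)\S+", re.IGNORECASE), r"\1[REDACTED]"),
    # AWS access key IDs: AKIA followed by 16 uppercase letters or digits
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED:aws-key-id]"),
    # GitHub personal access tokens in the ghp_ format
    (re.compile(r"\bghp_[A-Za-z0-9]{36}\b"), "[REDACTED:github-token]"),
    # JWT-shaped triples: three base64url segments, the first starting with eyJ
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "[REDACTED:jwt]"),
    # URLs that embed user:pass@ before the host
    (re.compile(r"(\w+://)[^/\s:@]+:[^/\s@]+@"), r"\1[REDACTED]@"),
]


def redact_named(line: str) -> str:
    """Apply every named pattern to one line of tool output."""
    for pattern, replacement in NAMED_PATTERNS:
        line = pattern.sub(replacement, line)
    return line
```

Keeping each pattern paired with its own replacement means the marker can say what kind of thing was removed, which helps when debugging a false positive.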

Second, and this is the one that would have saved me earlier, a generic catcher for any string of 28 or more characters containing at least one uppercase letter, one lowercase letter, and one digit. That composition rule excludes git SHAs and docker IDs (lowercase hex, no uppercase), UUIDs in their usual lowercase form, and base64-with-padding-only. It catches real secrets that appear bare in error messages, like the redacted command not found fragment from earlier.
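The composition rule is easy to get subtly wrong, so here is a minimal sketch of it. The candidate alphabet (letters, digits, underscore, hyphen) and the function names are assumptions for illustration; the real redactor may accept a different character set.

```python
import re

# Runs of 28 or more token-like characters. The alphabet is an assumption.
CANDIDATE = re.compile(r"[A-Za-z0-9_-]{28,}")


def looks_like_secret(s: str) -> bool:
    """True when the string mixes uppercase, lowercase and digits."""
    return (
        any(c.isupper() for c in s)
        and any(c.islower() for c in s)
        and any(c.isdigit() for c in s)
    )


def redact_generic(line: str) -> str:
    """Replace long mixed-composition runs with a length-only marker."""
    def replace(match: re.Match) -> str:
        run = match.group(0)
        if looks_like_secret(run):
            return f"[REDACTED:{len(run)}chars]"
        return run  # hex SHAs, lowercase UUIDs and similar pass through

    return CANDIDATE.sub(replace, line)
```

A lowercase git SHA or UUID matches the length test but fails the composition test, so it comes back unchanged, while a bare mixed-case token in an error message is replaced.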

Multi-line PRIVATE KEY blocks get special treatment — the script keeps a tiny bit of state, detects BEGIN ... PRIVATE KEY, swallows everything until END ... PRIVATE KEY, and emits a single [PRIVATE KEY REDACTED] marker. Line-mode regex can’t do that alone.
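The state involved is a single boolean. A minimal sketch, assuming PEM-style `-----BEGIN ... PRIVATE KEY-----` delimiters; the function name is hypothetical, and unlike the description above this version emits the marker when it sees BEGIN, so a truncated block still leaves a trace.

```python
import re
from typing import Iterable, Iterator

BEGIN = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")
END = re.compile(r"-----END [A-Z ]*PRIVATE KEY-----")


def redact_private_keys(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines unchanged, collapsing each private key block to one marker."""
    inside_key = False
    for line in lines:
        if inside_key:
            # Swallow key material; leave the block once END appears.
            if END.search(line):
                inside_key = False
            continue
        if BEGIN.search(line):
            yield "[PRIVATE KEY REDACTED]\n"
            # A key squeezed onto one line has BEGIN and END together.
            inside_key = not END.search(line)
            continue
        yield line
```

Because it is a generator over lines, it can sit in front of the per-line regex pass without buffering the whole output.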

2. The Bash pre-wrapper (bash-prewrap.py)

This is the hook that Claude Code runs before every Bash tool call. It receives a JSON payload describing the tool call, and it returns a JSON response that can include updatedInput to rewrite the command.
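The shape of that exchange, as a minimal sketch. The field names (`tool_name`, `tool_input`, `hookSpecificOutput`, `permissionDecision`, `updatedInput`) reflect my reading of the hooks documentation and should be checked against the current version; `handle`, `main`, and the `rewrite` callback are hypothetical names.

```python
import json
import sys
from typing import Callable, Optional


def handle(payload: dict, rewrite: Callable[[str], str]) -> Optional[dict]:
    """Build a PreToolUse response that swaps in a rewritten Bash command."""
    if payload.get("tool_name") != "Bash":
        return None  # not ours: emit nothing and leave the call untouched
    tool_input = dict(payload.get("tool_input") or {})  # copy, keep other fields
    command = tool_input.get("command")
    if not command:
        return None
    tool_input["command"] = rewrite(command)
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            # Assumption: "allow" may also skip the permission prompt.
            "permissionDecision": "allow",
            "updatedInput": tool_input,
        }
    }


def main(rewrite: Callable[[str], str]) -> None:
    """Read the hook payload from stdin and print the response as JSON."""
    raw = sys.stdin.read()
    if not raw.strip():
        return
    response = handle(json.loads(raw), rewrite)
    if response is not None:
        json.dump(response, sys.stdout)
```

A real hook script would end by calling `main` with its wrapping function. Whether "allow" is the right decision is worth a deliberate choice, since it may approve commands you would otherwise be asked about.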

The wrapping logic is the tricky part. The goal: take an arbitrary bash command — which might include heredocs, backticks, multi-line syntax, pipes, backgrounded processes — and wrap it so both stdout and stderr stream through the redactor, while preserving the exit code.

Naive approaches break. cmd | redact loses stderr. cmd 2>&1 | redact merges the streams and confuses downstream parsers that care about which channel said what. bash -c "cmd 2> >(redact >&2) | redact" works in theory but requires careful quote-escaping when the command itself contains quotes or backslashes.

The trick I landed on: write the original command to a temp file using a heredoc, then execute it with bash <tmpfile> so every byte of the original command survives intact:

bash
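# Save the original command verbatim; the quoted delimiter disables expansion.
__CC_TMP=$(mktemp -t ccbash.XXXXXX) && cat > "$__CC_TMP" <<'__CC_ORIG_END__'
<original command goes here, unchanged>
__CC_ORIG_END__
# Run it with stdout and stderr redacted separately.
bash "$__CC_TMP" 2> >(redact.py >&2) | redact.py
# Keep the original command's exit status, not the redactor's.
__CC_RC=${PIPESTATUS[0]}
rm -f "$__CC_TMP"
exit "$__CC_RC"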
Single-quoted heredoc — the original command lands in the tmpfile with zero shell interpretation

The single-quoted heredoc prevents any expansion inside the command. The process substitution 2> >(redact.py >&2) redacts stderr without merging it into stdout. PIPESTATUS[0] preserves the original command’s exit code even though a pipeline ran.
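On the Python side, generating that wrapper is mostly string assembly. A minimal sketch, assuming the redactor path is passed in; `wrap_command` is a hypothetical name, and the delimiter check is my addition, since a command containing the delimiter on its own line would end the heredoc early.

```python
import shlex

HEREDOC_TAG = "__CC_ORIG_END__"


def wrap_command(command: str, redactor: str) -> str:
    """Return a bash script that runs `command` with both streams redacted."""
    if HEREDOC_TAG in command:
        # Refuse rather than risk the tail of the command escaping the heredoc.
        raise ValueError("command contains the heredoc delimiter")
    r = shlex.quote(redactor)  # the redactor path may contain spaces
    return "\n".join([
        # Quoted delimiter: the command is written with no shell expansion.
        f"__CC_TMP=$(mktemp -t ccbash.XXXXXX) && cat > \"$__CC_TMP\" <<'{HEREDOC_TAG}'",
        command,
        HEREDOC_TAG,
        # stderr goes through its own redactor so the streams stay separate.
        f'bash "$__CC_TMP" 2> >({r} >&2) | {r}',
        "__CC_RC=${PIPESTATUS[0]}",
        'rm -f "$__CC_TMP"',
        'exit "$__CC_RC"',
    ])
```

The command is inserted as-is, with no quoting or escaping, which is the whole point of routing it through a file.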

There are two bypass paths. First, if the command matches a binary-output allowlist (ffmpeg, tar, curl -o, etc.), the hook passes through unchanged — line-mode regex would corrupt binary streams. Second, if the command already references the redactor, it’s passed through to avoid double-wrapping.
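A minimal sketch of that decision, under assumptions: the allowlist contents, the `should_bypass` name, and the rule that chained commands never get the binary bypass are all mine. That last rule exists because a binary command followed by a second command that prints a secret would otherwise skip redaction entirely.

```python
import os
import re
import shlex

BINARY_TOOLS = {"ffmpeg", "tar"}  # illustrative, not the real allowlist
CHAINING = re.compile(r"[;&|`\n]|\$\(")  # operators that can hide a second command


def should_bypass(command: str, redactor_name: str = "redact.py") -> bool:
    """Decide whether a Bash command should skip the redaction wrapper."""
    if redactor_name in command:
        return True  # already wrapped; avoid double-wrapping
    if CHAINING.search(command):
        return False  # more than one command may run: always wrap
    try:
        words = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: wrap to be safe
    if not words:
        return False
    program = os.path.basename(words[0])
    if program in BINARY_TOOLS:
        return True
    # curl only produces binary-to-file output when told to write a file.
    return program == "curl" and ("-o" in words or "--output" in words)
```

Erring toward wrapping costs an occasionally mangled binary stream; erring toward bypassing costs a leaked secret, so ambiguous cases fall through to `False`.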

3. The transcript scrubber (scrub-transcript.sh)

This is the PostToolUse hook. It doesn’t protect the context — the redactor already did that — but it rewrites the on-disk session transcript (~/.claude/projects/<slug>/*.jsonl) in case anything slipped past the primary layer, or in case the transcript gets shared later. Defense in depth. Very cheap to run.
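The actual scrubber is a shell script; what follows is a Python sketch of the same idea, with hypothetical names and a single stand-in pattern. It parses each JSONL record and redacts decoded string values rather than regexing the raw file, so escaped newlines inside a JSON string do not confuse the match.

```python
import json
import os
import re
import tempfile

# Stand-in for the full pattern set.
SECRET = re.compile(r"\b(\w*(?:TOKEN|SECRET|PASSWORD))=\S+")


def scrub_value(value):
    """Redact secrets in every string nested inside a JSON value."""
    if isinstance(value, str):
        return SECRET.sub(r"\1=[REDACTED]", value)
    if isinstance(value, list):
        return [scrub_value(item) for item in value]
    if isinstance(value, dict):
        return {key: scrub_value(item) for key, item in value.items()}
    return value


def scrub_transcript(path: str) -> None:
    """Rewrite a JSONL transcript with string values redacted."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out, \
                open(path, encoding="utf-8") as src:
            for line in src:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    out.write(line)  # keep anything that is not valid JSON
                    continue
                out.write(json.dumps(scrub_value(record), ensure_ascii=False) + "\n")
        os.replace(tmp_path, path)  # swap in the scrubbed copy atomically
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

One caveat with replacing the file: if something is still appending to the transcript while this runs, those writes can be lost, so a real version needs to account for a live session.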

Does it work?

After wiring it up, I ran a deliberate test:

bash
echo "FAKE_TOKEN=not-a-real-secret-for-testing-only" && echo "normal line"
Deliberate test — this would have printed the full token to Claude's context without the hook

What came back to Claude:

text
FAKE_TOKEN=[REDACTED:34chars]
normal line
The model only ever saw this — the actual value never entered the AI's context

The model saw the redacted form. There’s no way to accidentally echo a secret Claude never had.

What this doesn’t solve

Three things to be honest about.

It only covers the Bash tool. Other tools (Read, Grep, WebFetch) can also surface secrets. Read on an .env file returns the file’s raw contents. I need separate hooks for those. The path forward is either a PreToolUse hook on Read that denies reads on credential-file patterns, or a step that post-processes Read’s output. The mechanism is different from Bash: there is no command to rewrite, so any redaction has to run as a wrapper process on the result. A v2 job.

It can’t prevent secrets from entering the AI’s context through the conversation itself. If I paste a secret into a chat message, no hook will save me. That’s a discipline problem on my side, not a tooling problem.

It can false-positive on legitimate long tokens. Commit SHAs of sufficient length would be fine because they’re hex-only, but any mixed-case identifier of 28 or more characters that also contains a digit (a mixed-case UUID, a base64-encoded commit description) will get redacted. The cost is cosmetic. I’ll tune patterns as real false-positives show up.
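For the Read gap, the deny route is the simpler of the two options. A minimal sketch of what a v2 hook might decide, assuming the Read tool passes its target as `tool_input.file_path` and that a "deny" decision blocks the call; the filename patterns and `read_decision` name are hypothetical and far from complete.

```python
import fnmatch
import os
from typing import Optional

# Illustrative filename patterns only.
CREDENTIAL_PATTERNS = [".env", ".env.*", "*.pem", "id_rsa", "id_ed25519"]


def read_decision(payload: dict) -> Optional[dict]:
    """Return a deny response for Read calls that target credential files."""
    if payload.get("tool_name") != "Read":
        return None
    path = (payload.get("tool_input") or {}).get("file_path", "")
    name = os.path.basename(path)
    if not any(fnmatch.fnmatchcase(name, p) for p in CREDENTIAL_PATTERNS):
        return None  # no opinion: let the normal permission flow decide
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": f"{name} matches a credential-file pattern",
        }
    }
```

Matching on the basename keeps the patterns short, at the cost of missing secrets stored under ordinary-looking filenames.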

The broader point

The story I want to keep telling myself is: “I’ll be more careful next time.” The story I want to be true is: “My tools don’t let that failure happen.”

Every hour of friction I spent building this hook set buys back ten future hours of rotating keys, grepping for leaks, and wondering which transcript on which machine now contains the thing I didn’t want to share. The math is stupidly favorable and I should have done it months ago.

The general principle, for any AI-adjacent workflow: if you’re relying on the model’s discipline as the last line of defense, you haven’t designed a defense. Find the place where the wrong thing physically can’t happen, and move the guarantee there.

Apologies don’t scale. Hooks do.
