Claude Code Cache TTL Regression: 1h→5m Causes 20-32% Quota Inflation (v2.1.90/v2.1.108 Fix)
Analysis of ~120K API calls across two machines reveals that Anthropic's prompt cache TTL silently regressed from 1 hour to 5 minutes around March 6-8, 2026, causing a 20-32% increase in cache creation costs and significant quota consumption spikes for subscription users. Two Claude Code bugs were confirmed as root causes: an overage-latch bug (fixed in v2.1.90) and a telemetry-disabled fallback (fixed in v2.1.108). Users on Pro/Max plans who previously never hit limits began exhausting quotas in 1.5 hours instead of 5+. Upgrading to v2.1.108 or later resolves both known bugs. According to staff comments, Anthropic is also introducing environment variables for manual TTL control.
Symptoms
- Sudden 20-32% increase in cache creation token consumption without changes to usage patterns
- Subscription quota exhausted 3-5x faster than before (e.g., 5-hour limit hit in 1.5 hours on the Max 5x plan)
- Cache read token counts drop significantly while cache creation tokens dominate usage breakdown
- ephemeral_5m_input_tokens surge back after disappearing during the 1h-only period (Feb 1 - Mar 5)
- ephemeral_1h_input_tokens drop to zero or near-zero after March 6-8
- Users who never hit subscription limits before suddenly encounter rate-limiting and quota exhaustion
- Long-running sessions with subagents (>5 min tool calls) experience cache invalidation between every turn
Error signatures
ephemeral_5m_input_tokens > 0 appearing in usage after March 6 where previously zero
ephemeral_1h_input_tokens = 0 consistently after March 6 on the main conversation loop
cache_creation tokens >> cache_read tokens indicating TTL mismatch
tool definitions (~24K tokens) shipped without cache_control headers in request inspection
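The first three signatures can be checked mechanically on a single `usage` object. The following is a minimal sketch, not part of Claude Code: the function name `ttl_signature`, the label strings, and the `ratio` threshold used to approximate ">>" are all assumptions made for illustration. The top-level field names `cache_creation_input_tokens` and `cache_read_input_tokens` are assumed to follow the Anthropic API usage object.

```python
from typing import Any


def ttl_signature(usage: dict[str, Any], ratio: float = 5.0) -> list[str]:
    """Return the labels of the regression signatures one usage record matches.

    `ratio` is an arbitrary stand-in for "creation >> read"; tune it to taste.
    """
    tiers = usage.get("cache_creation") or {}
    five_min = tiers.get("ephemeral_5m_input_tokens") or 0
    one_hour = tiers.get("ephemeral_1h_input_tokens") or 0
    # Fall back to the per-tier sum when the aggregate field is absent.
    created = usage.get("cache_creation_input_tokens") or (five_min + one_hour)
    read = usage.get("cache_read_input_tokens") or 0

    hits = []
    if five_min > 0:
        hits.append("5m_writes_present")
    if five_min > 0 and one_hour == 0:
        hits.append("no_1h_writes")
    if created > ratio * max(read, 1):
        hits.append("creation_dominates_reads")
    return hits


# Hypothetical record resembling a post-regression main-loop call.
sample = {
    "cache_creation": {"ephemeral_5m_input_tokens": 24000, "ephemeral_1h_input_tokens": 0},
    "cache_creation_input_tokens": 24000,
    "cache_read_input_tokens": 1200,
}
print(ttl_signature(sample))
```

A record that matches all three labels on the main conversation loop is consistent with the 5-minute TTL fallback; a single match on its own is weaker evidence.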
Possible causes
- Most likely (check first): Overage-latch bug — seats in quota-overage state permanently fall back to 5-minute TTL instead of intended 1-hour TTL for subscribers (fixed in v2.1.90, April 1, 2026)
- Most likely (check first): Telemetry-disabled fallback — users who disabled telemetry were classified as non-subscription API users and given 5-minute TTL (fixed in v2.1.108, April 13, 2026)
- Server-side TTL optimization change: Anthropic introduced per-request TTL selection heuristics around March 6-8, 2026 that prioritized 5-minute TTL for most users to reduce costs on one-shot cache writes
- Architectural cache bypass: System tool catalog (~24K tokens) consistently shipped without cache_control headers, forcing full input charges on every turn regardless of TTL
- The 'Msg 0' cache escape: First message in each session/subagent turn fails to attach cache_control headers, causing guaranteed cache miss on the largest payload
- Subagent one-shot architecture: Subagents with <5 min inter-turn gaps make 5m TTL economical, but main-agent turns with >5 min gaps (code review, thinking, long tool calls) suffer from repeated cache rewrites
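To see why the same TTL can be economical for subagents and expensive for the main loop, the sketch below models the cost of re-sending one fixed prefix. It is a simplification under stated assumptions: the multipliers (5-minute write 1.25x, 1-hour write 2x, read 0.1x of the base input price) reflect Anthropic's published API pricing at the time of writing and should be checked against current documentation; subscription quota accounting may differ; the prefix size and gap patterns are hypothetical.

```python
# Assumed multipliers relative to the base input-token price (verify before relying on them).
TTL_SECONDS = {"5m": 300, "1h": 3600}
WRITE_MULT = {"5m": 1.25, "1h": 2.0}
READ_MULT = 0.1


def prefix_cost(gaps_s, ttl, prefix_tokens=100_000):
    """Cost, in base-input-token equivalents, of reusing one fixed prefix.

    gaps_s lists the idle seconds before each follow-up turn. The first turn
    always writes the cache, and every hit is assumed to refresh the TTL.
    """
    cost = prefix_tokens * WRITE_MULT[ttl]  # initial cache write
    for gap in gaps_s:
        hit = gap <= TTL_SECONDS[ttl]
        cost += prefix_tokens * (READ_MULT if hit else WRITE_MULT[ttl])
    return cost


subagent_gaps = [60] * 10   # hypothetical: rapid tool loop, 1 minute between turns
main_gaps = [600] * 10      # hypothetical: 10-minute pauses between prompts

for name, gaps in (("subagent", subagent_gaps), ("main", main_gaps)):
    for ttl in ("5m", "1h"):
        print(f"{name:8s} ttl={ttl}: {prefix_cost(gaps, ttl) / 100_000:.2f}x prefix")
```

Under these assumptions the rapid loop costs about 2.25 prefix-equivalents with a 5-minute TTL versus 3 with a 1-hour TTL, while the paused session costs about 13.75 versus 3, which is the asymmetry the last cause describes.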
Solutions
Session Optimization: Shorter Sessions and Cache-Aware Workflows
While waiting for or after applying fixes, reduce the impact of 5-minute TTL by keeping sessions shorter (one task per session), front-loading CLAUDE.md with critical context, and using the /compact command before pausing. This is a mitigation, not a fix — upgrade remains the primary solution.
- Structure work into single-task sessions rather than marathon coding sessions spanning multiple hours
- Move the most frequently needed context to the top of CLAUDE.md for better cache hit probability (cache reads from the prefix)
- Use `/compact` proactively before pausing for > 5 minutes to reduce the context that will need to be re-cached on return
- Avoid using cache-keepalive ping tools — they consume real tokens for cache reads that may never be used, and often cost more than they save
Commands
# Run inside a Claude Code session before pausing:
/compact
Config examples
# CLAUDE.md structure for cache efficiency (cache reads from start of message):
# TOP (~500 words): Project architecture overview, key file paths and their purposes, current task context, active constraints
# BOTTOM: Detailed coding conventions, long-form documentation references, historical notes
Risks
- Shorter sessions mean more context-switching overhead — test whether the reduced cache pressure outweighs the overhead for your workflow
- Cache-keepalive ping tools (e.g., claude-code-cache-keepalive) are counterproductive: each ping is a full cache-read on the prefix plus response tokens, consuming quota for cache that may never be used
Verification
- After adopting shorter sessions, track daily quota usage with `/cost` — EXPECTED: burn rate should decrease compared to pre-optimization baseline
- Ensure session-to-session context is sufficient: can you resume work without re-explaining the codebase? If not, add more context to CLAUDE.md TOP section
Diagnose Cache TTL Status from JSONL Session Logs
Use jq queries against Claude Code's JSONL session logs (~/.claude/projects/) to determine whether your sessions are using 1-hour or 5-minute TTL caching, identify which sessions and versions are affected, and verify the fix is working after upgrade.
- Pre-check: verify that JSONL files exist under ~/.claude/projects/ (session logs sit in per-project subdirectories, so search recursively) with `find ~/.claude/projects/ -name '*.jsonl' 2>/dev/null | head -5`
- Run the TTL distribution query to see which dates/versions are affected
- Run the 1h vs 5m summary query to get total token counts per TTL tier
- Compare results before and after upgrade: 1h tokens should be non-zero post-upgrade
Commands
# Pre-check: verify JSONL files exist
find ~/.claude/projects/ -name '*.jsonl' 2>/dev/null | head -5 | grep . || echo 'NO_JSONL_FILES_FOUND'
# TTL distribution by date and version (filters non-haiku, non-sidechain):
grep -h -r -E 'ephemeral_.*_input_tokens' ~/.claude/projects/ 2>/dev/null | jq -r 'select((.isSidechain == false) and ((.message.model // "") | startswith("claude-haiku") | not) and (.message.usage.cache_creation.ephemeral_5m_input_tokens // 0) > 0) | (.timestamp // "unknown") + "," + (.version // "unknown")' 2>/dev/null | sed 's/T.*,/,/' | sort | uniq -c
# 1h vs 5m token summary:
find ~/.claude/projects/ -name '*.jsonl' -exec cat {} + 2>/dev/null | jq -s 'map(select(.message.usage.cache_creation)) | {total_1h_tokens: (map(.message.usage.cache_creation.ephemeral_1h_input_tokens // 0) | add), total_5m_tokens: (map(.message.usage.cache_creation.ephemeral_5m_input_tokens // 0) | add), api_call_count: length}' 2>/dev/null
Risks
- JSONL files can be very large (multiple GBs) — the summary query reads ALL files; on large datasets, use `find ... -name '*.jsonl' -newer <date-file>` to limit scope to recent sessions
- jq queries on large files may be slow; run in background with `&` and check output later if dataset exceeds 100MB
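Because `jq -s` loads every record into memory before summarizing, a streaming pass can be easier on multi-gigabyte log directories. The sketch below is one way to do that in Python; it mirrors the summary query's output keys, but the helper names `iter_log_lines` and `summarize_ttl` are made up for this example, and it assumes the same record layout (`.message.usage.cache_creation`) that the jq queries rely on.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_log_lines(root: Path) -> Iterator[str]:
    """Yield lines from every *.jsonl file under root, one file at a time."""
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            yield from fh


def summarize_ttl(lines: Iterable[str]) -> dict[str, int]:
    """Accumulate 1h and 5m cache-write tokens without holding records in memory."""
    totals = {"total_1h_tokens": 0, "total_5m_tokens": 0, "api_call_count": 0}
    for line in lines:
        if "cache_creation" not in line:  # cheap pre-filter before parsing
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        tiers = usage.get("cache_creation") if isinstance(usage, dict) else None
        if not isinstance(tiers, dict):
            continue
        totals["total_1h_tokens"] += tiers.get("ephemeral_1h_input_tokens") or 0
        totals["total_5m_tokens"] += tiers.get("ephemeral_5m_input_tokens") or 0
        totals["api_call_count"] += 1
    return totals


# Usage against real logs:
#   print(summarize_ttl(iter_log_lines(Path.home() / ".claude" / "projects")))
```

Keeping the parser separate from the file walker means the same function can be pointed at a subset of files, for example only those modified after an upgrade.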
Verification
- Run `find ~/.claude/projects/ -name '*.jsonl' 2>/dev/null | head -5`; EXPECTED OUTPUT: at least one .jsonl file path, or 'NO_JSONL_FILES_FOUND' from the pre-check command if no session logs exist
- Run the TTL distribution query: dates after your upgrade (>= April 2026) should show fewer/no lines in output (indicating 5m tokens are rare)
- Run the 1h vs 5m summary: `total_1h_tokens` should be > 0 after upgrade — non-zero 1h tokens confirm the fix is active
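For the before/after comparison, it can help to see both TTL tiers side by side per day and client version rather than only counting 5m lines. The following sketch groups main-loop records that way; `ttl_by_day` is a hypothetical helper, and the `timestamp`, `version`, and `isSidechain` fields are assumed to be present as in the jq queries above.

```python
import json
from collections import defaultdict


def ttl_by_day(lines):
    """Sum 1h and 5m cache-write tokens per (date, version) for main-loop calls."""
    table = defaultdict(lambda: {"1h": 0, "5m": 0})
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(rec, dict) or rec.get("isSidechain") is True:
            continue  # ignore subagent (sidechain) traffic
        message = rec.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        tiers = usage.get("cache_creation") if isinstance(usage, dict) else None
        if not isinstance(tiers, dict):
            continue
        day = str(rec.get("timestamp") or "unknown")[:10]  # keep YYYY-MM-DD
        key = (day, str(rec.get("version") or "unknown"))
        table[key]["1h"] += tiers.get("ephemeral_1h_input_tokens") or 0
        table[key]["5m"] += tiers.get("ephemeral_5m_input_tokens") or 0
    return dict(sorted(table.items()))
```

After a successful upgrade, rows dated after the upgrade should show most tokens in the `1h` column; rows that still show only `5m` tokens point at one of the remaining causes.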
Enable Telemetry to Prevent TTL Fallback
Users who have disabled telemetry in Claude Code may be incorrectly classified as non-subscription users and given 5-minute TTL. Enabling telemetry ensures the server correctly identifies subscription status and applies 1-hour TTL where appropriate. This is necessary even after upgrading if you previously disabled telemetry.
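- Check if telemetry is disabled: look for environment variables such as DISABLE_TELEMETRY, CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC, CLAUDE_CODE_DISABLE_TELEMETRY, or CLAUDE_CODE_TELEMETRY (exact names can vary by version; the grep below matches any of them)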
- Remove or comment out any telemetry-disabling environment variables from shell profiles (~/.bashrc, ~/.zshrc, ~/.config/fish/config.fish)
- Remove telemetry-disable flags from any Claude Code configuration files in ~/.claude/
- Restart Claude Code and verify telemetry status is no longer suppressed
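The profile check in the steps above can be sketched as a small scanner that lists active (uncommented) assignments to telemetry-related variables, covering both POSIX `export` lines and fish `set` lines. The function name and the regular expression are illustrative assumptions; the scanner reports any variable whose name contains "telemetry", so each hit still needs a manual look to decide whether it disables or enables telemetry.

```python
import re

# Matches `VAR=...`, `export VAR=...`, and fish-style `set -gx VAR ...`
# where VAR contains "telemetry" (case-insensitive). Commented lines do not match.
ASSIGNMENT = re.compile(
    r"^\s*(?:export\s+|set\s+(?:-\w+\s+)*)?(\w*TELEMETRY\w*)",
    re.IGNORECASE,
)


def find_telemetry_vars(profile_text: str) -> list[tuple[int, str]]:
    """Return (line_number, variable_name) for each active telemetry assignment."""
    hits = []
    for number, line in enumerate(profile_text.splitlines(), start=1):
        match = ASSIGNMENT.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# Hypothetical profile contents for illustration.
example = (
    "# export CLAUDE_CODE_DISABLE_TELEMETRY=1\n"
    "export CLAUDE_CODE_DISABLE_TELEMETRY=1\n"
    "set -gx CLAUDE_CODE_TELEMETRY false\n"
    "alias ll='ls -l'\n"
)
print(find_telemetry_vars(example))
```

To scan a real profile, pass the file contents, for example `find_telemetry_vars(Path.home().joinpath(".zshrc").read_text())` with `pathlib.Path` imported.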
Commands
env | grep -iE 'telemetry|CLAUDE_CODE_DISABLE' 2>/dev/null || echo 'NO_TELEMETRY_VARS_FOUND'
grep -r 'telemetry' ~/.claude/ 2>/dev/null | head -10 | grep . || echo 'NO_TELEMETRY_CONFIG_FOUND'
Config examples
# BEFORE (remove these lines from ~/.bashrc or ~/.zshrc):
# export CLAUDE_CODE_DISABLE_TELEMETRY=1   ← DELETE
# export CLAUDE_CODE_TELEMETRY=false       ← DELETE
# AFTER (no telemetry-disabling vars should remain):
# (these lines should be absent from your shell profile)
Risks
- Enabling telemetry sends usage data to Anthropic — review privacy implications before proceeding
- Some users may prefer higher costs over data sharing — this is a tradeoff, not mandatory
Verification
- Run `env | grep -iE 'telemetry|CLAUDE_CODE_DISABLE'` — EXPECTED OUTPUT: empty (no matches) or 'NO_TELEMETRY_VARS_FOUND'
- After a session, check 1h cache activity: `grep -h -r 'ephemeral_1h_input_tokens' ~/.claude/projects/ 2>/dev/null | jq -c 'select((.message.usage.cache_creation.ephemeral_1h_input_tokens // 0) > 0)' 2>/dev/null | wc -l`; EXPECTED OUTPUT: number > 0
Upgrade Claude Code to v2.1.108+ (Primary Fix)
Upgrade to v2.1.108 or later. These releases carry both the overage-latch fix (first shipped in v2.1.90) and the telemetry-disabled fallback fix (v2.1.108), along with additional cache optimizations from Anthropic.
- Check current Claude Code version: `claude --version` or `npm list -g @anthropic-ai/claude-code`
- Upgrade to latest: `npm install -g @anthropic-ai/claude-code@latest`
- Verify upgrade: `claude --version` should report >= 2.1.108
- Restart all running Claude Code sessions after upgrade
- Run a test session and check /cost to verify cache_read tokens are non-trivial
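When scripting the version check, compare version components numerically rather than as strings, since a plain string comparison ranks "2.1.90" above "2.1.108". The sketch below assumes that the output of `claude --version` contains a dotted `X.Y.Z` number somewhere in the text; the helper names are hypothetical.

```python
import re

MIN_FIXED = (2, 1, 108)  # first release reported to contain both fixes


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract the first X.Y.Z triple from arbitrary version output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def has_both_fixes(version_output: str) -> bool:
    """True when the reported version is at least 2.1.108."""
    return parse_version(version_output) >= MIN_FIXED


# Numeric comparison gets the ordering right where string comparison does not.
print(has_both_fixes("2.1.90"), "2.1.90" > "2.1.108")
```

In a shell pipeline, the function could be fed the captured output of `claude --version` via `subprocess.run(["claude", "--version"], capture_output=True, text=True).stdout`.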
Commands
npm list -g @anthropic-ai/claude-code 2>/dev/null | grep claude-code
npm install -g @anthropic-ai/claude-code@latest
claude --version
Config examples
# package.json (project-local install)
{
  "devDependencies": {
    "@anthropic-ai/claude-code": "^2.1.108"
  }
}
Risks
- Version 2.1.108+ may have breaking changes in MCP or tool API — review changelog before upgrading in production pipelines
- If using custom MCP servers, verify compatibility with the new Claude Code version
Verification
- Run `claude --version` — EXPECTED OUTPUT: version number >= 2.1.108 (e.g., '2.1.170')
- Start a new Claude Code session, work for >= 10 minutes with pauses between prompts
- In Claude Code, run `/cost` — EXPECTED: cache_read tokens > 0 (non-zero indicates cache hits are working)
- Check JSONL for 1h cache activity: `grep -h -r 'ephemeral_1h_input_tokens' ~/.claude/projects/ 2>/dev/null | jq -c 'select((.message.usage.cache_creation.ephemeral_1h_input_tokens // 0) > 0)' 2>/dev/null | head -3`; EXPECTED OUTPUT: at least one JSON object with non-zero ephemeral_1h_input_tokens
Agent JSON
Canonical machine-readable representation of this issue:
{
"issue_id": "78037c97-9a41-4817-ba06-d8ff35f07a09",
"slug": "claude-code-cache-ttl-regression-1h-5m-causes-20-32-quota-inflation-v2-1-90-v2-1-108-fix-2tra50",
"verification_status": "unverified",
"canonical_json": "https://codekb.dev/v1/issues/claude-code-cache-ttl-regression-1h-5m-causes-20-32-quota-inflation-v2-1-90-v2-1-108-fix-2tra50"
}