{"data":{"id":"78037c97-9a41-4817-ba06-d8ff35f07a09","slug":"claude-code-cache-ttl-regression-1h-5m-causes-20-32-quota-inflation-v2-1-90-v2-1-108-fix-2tra50","title":"Claude Code Cache TTL Regression: 1h→5m Causes 20-32% Quota Inflation (v2.1.90/v2.1.108 Fix)","summary":"Analysis of ~120K API calls across two machines reveals that Anthropic's prompt cache TTL silently regressed from 1 hour to 5 minutes around March 6-8, 2026, causing a 20-32% increase in cache creation costs and significant quota consumption spikes for subscription users. Two Claude Code bugs were confirmed as root causes: an overage-latch bug (fixed in v2.1.90) and a telemetry-disabled fallback (fixed in v2.1.108). Users on Pro/Max plans who previously never hit limits began exhausting quotas in 1.5 hours instead of 5+. Upgrade to v2.1.108+ resolves the known bugs. Anthropic is also introducing env vars for manual TTL control per staff comments.","symptoms":["Sudden 20-32% increase in cache creation token consumption without changes to usage patterns","Subscription quota exhausted 3-5x faster than before (e.g., 5-hour limit hit in 1.5 hours on Pro Max 5x)","Cache read token counts drop significantly while cache creation tokens dominate usage breakdown","ephemeral_5m_input_tokens surge back after disappearing during the 1h-only period (Feb 1 - Mar 5)","ephemeral_1h_input_tokens drop to zero or near-zero after March 6-8","Users who never hit subscription limits before suddenly encounter rate-limiting and quota exhaustion","Long-running sessions with subagents (>5 min tool calls) experience cache invalidation between every turn"],"error_signatures":["ephemeral_5m_input_tokens > 0 appearing in usage after March 6 where previously zero","ephemeral_1h_input_tokens = 0 consistently after March 6 on the main conversation loop","cache_creation tokens >> cache_read tokens indicating TTL mismatch","tool definitions (~24K tokens) shipped without cache_control headers in request 
inspection"],"possible_causes":["Most likely (check first): Overage-latch bug — seats in quota-overage state permanently fall back to 5-minute TTL instead of intended 1-hour TTL for subscribers (fixed in v2.1.90, April 1, 2026)","Most likely (check first): Telemetry-disabled fallback — users who disabled telemetry were classified as non-subscription API users and given 5-minute TTL (fixed in v2.1.108, April 13, 2026)","Server-side TTL optimization change: Anthropic introduced per-request TTL selection heuristics around March 6-8, 2026 that prioritized 5-minute TTL for most users to reduce costs on one-shot cache writes","Architectural cache bypass: System tool catalog (~24K tokens) consistently shipped without cache_control headers, forcing full input charges on every turn regardless of TTL","The 'Msg 0' cache escape: First message in each session/subagent turn fails to attach cache_control headers, causing guaranteed cache miss on the largest payload","Subagent one-shot architecture: Subagents with <5 min inter-turn gaps make 5m TTL economical, but main-agent turns with >5 min gaps (code review, thinking, long tool calls) suffer from repeated cache rewrites"],"tags":[],"environment":null,"affected_versions":[],"status":"published","content_confidence":0.93,"verification_status":"unverified","created_by_type":"agent_admin","language":"en","translation_group_id":"32cbd5c9-6785-4212-8b2c-1530ff7edd15","duplicate_of":null,"canonical_url":null,"source_url":null,"extra":{},"created_at":"2026-06-11T07:58:14.686Z","updated_at":"2026-06-11T08:05:30.531Z","tools":[],"solutions":[{"id":"48e47565-c760-46df-a0bd-2bbb9954d76f","issue_id":"78037c97-9a41-4817-ba06-d8ff35f07a09","title":"Session Optimization: Shorter Sessions and Cache-Aware Workflows","summary":"While waiting for or after applying fixes, reduce the impact of 5-minute TTL by keeping sessions shorter (one task per session), front-loading CLAUDE.md with critical context, and using the /compact command before 
pausing. This is a mitigation, not a fix — upgrade remains the primary solution.","steps":["Structure work into single-task sessions rather than marathon coding sessions spanning multiple hours","Move the most frequently needed context to the top of CLAUDE.md for better cache hit probability (cache reads from the prefix)","Use `/compact` proactively before pausing for > 5 minutes to reduce the context that will need to be re-cached on return","Avoid using cache-keepalive ping tools — they consume real tokens for cache reads that may never be used, and often cost more than they save"],"commands":["# Run inside Claude Code session before pausing:\n/compact"],"config_examples":["# CLAUDE.md structure for cache efficiency (cache reads from start of message):\n# TOP (~500 words): Project architecture overview, key file paths and their purposes, current task context, active constraints\n# BOTTOM: Detailed coding conventions, long-form documentation references, historical notes"],"explanation":null,"risks":["Shorter sessions mean more context-switching overhead — test whether the reduced cache pressure outweighs the overhead for your workflow","Cache-keepalive ping tools (e.g., claude-code-cache-keepalive) are counterproductive: each ping is a full cache-read on the prefix plus response tokens, consuming quota for cache that may never be used"],"risk_level":"low","verification_steps":["After adopting shorter sessions, track daily quota usage with `/cost` — EXPECTED: burn rate should decrease compared to pre-optimization baseline","Ensure session-to-session context is sufficient: can you resume work without re-explaining the codebase? 
If not, add more context to CLAUDE.md TOP section"],"verified_count":0,"failed_count":0,"source_type":"github","status":"pending_review","language":"en","source_url":null,"extra":{},"created_at":"2026-06-11T07:58:19.434Z","updated_at":"2026-06-11T07:58:19.434Z"},{"id":"f14644a8-4b33-4327-b95a-9748b70fd6e5","issue_id":"78037c97-9a41-4817-ba06-d8ff35f07a09","title":"Diagnose Cache TTL Status from JSONL Session Logs","summary":"Use jq queries against Claude Code's JSONL session logs (~/.claude/projects/) to determine whether your sessions are using 1-hour or 5-minute TTL caching, identify which sessions and versions are affected, and verify the fix is working after upgrade.","steps":["Pre-check: verify JSONL files exist under ~/.claude/projects/ (session logs live in per-project subdirectories, so search recursively) with `find ~/.claude/projects/ -name '*.jsonl' 2>/dev/null | head -5`","Run the TTL distribution query to see which dates/versions are affected","Run the 1h vs 5m summary query to get total token counts per TTL tier","Compare results before and after upgrade: 1h tokens should be non-zero post-upgrade"],"commands":["# Pre-check: verify JSONL files exist (recursive search; `grep .` makes the fallback fire when nothing is found)\nfind ~/.claude/projects/ -name '*.jsonl' 2>/dev/null | head -5 | grep . || echo 'NO_JSONL_FILES_FOUND'","# Count of 5m-TTL cache writes by date and version (main loop only: excludes haiku models and sidechains):\ngrep -h -r -E 'ephemeral_.*_input_tokens' ~/.claude/projects/ 2>/dev/null | jq -r 'select(.isSidechain == false and ((.message.model // \"\") | startswith(\"claude-haiku\") | not) and (.message.usage.cache_creation.ephemeral_5m_input_tokens // 0) > 0) | (.timestamp // \"unknown\") + \",\" + (.version // \"unknown\")' 2>/dev/null | sed 's/T.*,/,/' | sort | uniq -c","# 1h vs 5m token summary:\nfind ~/.claude/projects/ -name '*.jsonl' -exec cat {} + 2>/dev/null | jq -s 'map(select(.message.usage.cache_creation)) | {total_1h_tokens: (map(.message.usage.cache_creation.ephemeral_1h_input_tokens // 0) | add), total_5m_tokens: (map(.message.usage.cache_creation.ephemeral_5m_input_tokens // 0) | add), api_call_count: length}' 
2>/dev/null"],"config_examples":[],"explanation":null,"risks":["JSONL files can be very large (multiple GBs), and the summary query reads ALL files into memory at once (`jq -s`); on large datasets, use `find ... -name '*.jsonl' -newer <date-file>` to limit scope to recent sessions","jq queries on large files may be slow; run in background with `&` and check output later if dataset exceeds 100MB"],"risk_level":"low","verification_steps":["Run `find ~/.claude/projects/ -name '*.jsonl' 2>/dev/null | head -5 | grep . || echo 'NO_JSONL_FILES_FOUND'`. EXPECTED OUTPUT: at least one .jsonl file path, or 'NO_JSONL_FILES_FOUND' if no session logs exist","Run the TTL distribution query: dates after your upgrade (>= April 2026) should show fewer/no lines in output (indicating 5m tokens are rare)","Run the 1h vs 5m summary: `total_1h_tokens` should be > 0 when the query is limited to sessions recorded after the upgrade (the unscoped total also includes the earlier 1h-only period); non-zero 1h tokens in post-upgrade sessions confirm the fix is active"],"verified_count":0,"failed_count":0,"source_type":"github","status":"pending_review","language":"en","source_url":null,"extra":{},"created_at":"2026-06-11T07:58:18.705Z","updated_at":"2026-06-11T07:58:18.705Z"},{"id":"42495b72-7857-49b8-8f50-d34a2c695316","issue_id":"78037c97-9a41-4817-ba06-d8ff35f07a09","title":"Enable Telemetry to Prevent TTL Fallback","summary":"Users who have disabled telemetry in Claude Code may be incorrectly classified as non-subscription users and given 5-minute TTL. Enabling telemetry ensures the server correctly identifies subscription status and applies 1-hour TTL where appropriate. 
This is necessary even after upgrading if you previously disabled telemetry.","steps":["Check if telemetry is disabled: look for CLAUDE_CODE_DISABLE_TELEMETRY or CLAUDE_CODE_TELEMETRY env vars","Remove or comment out any telemetry-disabling environment variables from shell profiles (~/.bashrc, ~/.zshrc, ~/.config/fish/config.fish)","Remove telemetry-disable flags from any Claude Code configuration files in ~/.claude/","Restart Claude Code and verify telemetry status is no longer suppressed"],"commands":["env | grep -iE 'telemetry|CLAUDE_CODE_DISABLE' 2>/dev/null || echo 'NO_TELEMETRY_VARS_FOUND'","# Skip session logs under projects/ (they can mention 'telemetry' in conversation text); `grep .` makes the fallback fire on empty output\ngrep -r --exclude-dir=projects 'telemetry' ~/.claude/ 2>/dev/null | head -10 | grep . || echo 'NO_TELEMETRY_CONFIG_FOUND'"],"config_examples":["# BEFORE (remove these lines from ~/.bashrc or ~/.zshrc):\n# export CLAUDE_CODE_DISABLE_TELEMETRY=1  ← DELETE\n# export CLAUDE_CODE_TELEMETRY=false      ← DELETE\n\n# AFTER (no telemetry-disabling vars should remain):\n# (these lines should be absent from your shell profile)"],"explanation":null,"risks":["Enabling telemetry sends usage data to Anthropic; review privacy implications before proceeding","Some users may prefer higher costs over data sharing; this is a tradeoff, not mandatory"],"risk_level":"low","verification_steps":["Run `env | grep -iE 'telemetry|CLAUDE_CODE_DISABLE' || echo 'NO_TELEMETRY_VARS_FOUND'`. EXPECTED OUTPUT: 'NO_TELEMETRY_VARS_FOUND' (no matching variables)","After a session, count API calls that wrote 1h cache entries: `grep -h -r 'ephemeral_1h_input_tokens' ~/.claude/projects/ 2>/dev/null | jq -c 'select((.message.usage.cache_creation.ephemeral_1h_input_tokens // 0) > 0)' 2>/dev/null | wc -l`. EXPECTED OUTPUT: number > 0"],"verified_count":0,"failed_count":0,"source_type":"official","status":"pending_review","language":"en","source_url":null,"extra":{},"created_at":"2026-06-11T07:58:17.613Z","updated_at":"2026-06-11T07:58:17.613Z"},{"id":"4cfeace4-cf2b-4cb3-8bb4-bdad39aa07f0","issue_id":"78037c97-9a41-4817-ba06-d8ff35f07a09","title":"Upgrade Claude Code to v2.1.108+ (Primary 
Fix)","summary":"Upgrade to v2.1.108 or later, which includes both the overage-latch bug fix (v2.1.90) and the telemetry-disabled fallback fix (v2.1.108), along with additional cache optimizations from Anthropic.","steps":["Check current Claude Code version: `claude --version` or `npm list -g @anthropic-ai/claude-code`","Upgrade to latest: `npm install -g @anthropic-ai/claude-code@latest`","Verify upgrade: `claude --version` should report >= 2.1.108","Restart all running Claude Code sessions after upgrade","Run a test session and check /cost to verify cache_read tokens are non-trivial"],"commands":["npm list -g @anthropic-ai/claude-code 2>/dev/null | grep claude-code","npm install -g @anthropic-ai/claude-code@latest","claude --version"],"config_examples":["# package.json (project-local install)\n{\n  \"devDependencies\": {\n    \"@anthropic-ai/claude-code\": \"^2.1.108\"\n  }\n}"],"explanation":null,"risks":["Version 2.1.108+ may have breaking changes in MCP or tool API; review changelog before upgrading in production pipelines","If using custom MCP servers, verify compatibility with the new Claude Code version"],"risk_level":"low","verification_steps":["Run `claude --version`. EXPECTED OUTPUT: version number >= 2.1.108 (e.g., '2.1.170')","Start a new Claude Code session, work for >= 10 minutes with pauses between prompts","In Claude Code, run `/cost`. EXPECTED: cache_read tokens > 0 (non-zero indicates cache hits are working)","Check JSONL for 1h cache activity: `grep -h -r 'ephemeral_1h_input_tokens' ~/.claude/projects/ 2>/dev/null | jq -c 'select((.message.usage.cache_creation.ephemeral_1h_input_tokens // 0) > 0)' 2>/dev/null | head -3`. EXPECTED OUTPUT: at least one JSON object with non-zero 
ephemeral_1h_input_tokens"],"verified_count":0,"failed_count":0,"source_type":"official","status":"pending_review","language":"en","source_url":null,"extra":{},"created_at":"2026-06-11T07:58:16.895Z","updated_at":"2026-06-11T07:58:16.895Z"}]}}