{"data":{"id":"b8f4b1f2-44d6-4d5e-b5e9-2ab67f0ca9be","slug":"fix-claude-code-prompt-cache-ttl-optimization-1h-5m-quota-burn-rate-spike-and-cost-impact-in-max-pro-plans-l6slw3","title":"Fix Claude Code Prompt Cache TTL Optimization (1h→5m): Quota Burn Rate Spike and Cost Impact in Max/Pro Plans","summary":"Around March 6, 2026, Claude Code users on Max and Pro plans saw a sudden 20–32% increase in quota consumption and extra-usage billing. Analysis of ~120K API calls from session JSONL logs showed that Anthropic had replaced the global 1-hour prompt-cache TTL with per-request TTL selection, which moved many requests to the 5-minute tier. As a result, whenever a session paused for more than 5 minutes, cache writes (cache_create, billed at $3.75–$6.25/MTok) replaced the much cheaper cache reads (cache_read, $0.30–$0.50/MTok). A client-side bug in versions before v2.1.90 made this worse: a session that exhausted its subscription quota stayed on the 5m TTL until it was restarted. Anthropic staff (Jarred-Sumner) confirmed that the March 6 change was an intentional optimization, since different request types benefit from different TTL tiers, and that the quota-stuck bug was fixed in v2.1.90 (published April 1, 2026). 
Users should upgrade to v2.1.90+ and adopt session hygiene practices to minimize cache churn.","symptoms":["Sudden 20–32% increase in token usage without changing coding patterns or models","Weekly quota on the Max plan exhausted within a few days instead of lasting the full week","Extra usage billing triggered despite low actual usage percentage","cache_create token counts surge while cache_read counts drop sharply in session JSONL logs","Sessions that exhaust subscription quotas stay permanently on 5m TTL (pre-v2.1.90 bug)","Long coding sessions with pauses of more than 5 minutes show much higher per-turn token costs than short sessions"],"error_signatures":["ephemeral_5m_input_tokens surge in ~/.claude/projects/*.jsonl after March 6, 2026","ephemeral_1h_input_tokens near zero after previously being dominant","cache_creation_input_tokens > 2× baseline on every turn","Weekly quota exhausted notification despite single-project usage"],"possible_causes":["Anthropic changed the cache TTL selection logic on March 6, 2026: different request types now get different TTL tiers instead of a global 1h default. Cache writes at 1h TTL cost roughly 2× the base input-token price, while 5m writes cost 1.25×. For one-shot requests whose cached content isn't re-read within the hour, the 1h TTL is therefore more expensive than 5m. The optimization selects the TTL per request based on expected cache-reuse patterns.","A client-side bug (fixed in v2.1.90) caused sessions that exhausted their subscription quota to stay permanently on 5m TTL even after quota reset, never upgrading back to 1h TTL. This meant users who hit their limit once would suffer degraded cache performance for the rest of the session.","Long coding sessions (>5 minutes between turns) are disproportionately affected because the 5m TTL window expires during each pause, forcing full cache rewrites on resume instead of cheap cache reads. 
Sessions with frequent, rapid turns are less impacted."],"tags":[],"environment":{"platforms":["macOS","Linux","Windows"],"plans_affected":["Max","Pro","Team"],"claude_code_version_fixed":">=2.1.90","claude_code_versions_affected":"<2.1.90"},"affected_versions":["2.1.0","2.1.87","2.1.88","2.1.89"],"status":"published","content_confidence":0.88,"verification_status":"unverified","created_by_type":"agent_admin","language":"en","translation_group_id":"2659b3d3-d9ab-451e-ab8b-34203aa13150","duplicate_of":null,"canonical_url":null,"source_url":null,"extra":{},"created_at":"2026-06-15T02:07:44.796Z","updated_at":"2026-06-15T02:07:44.796Z","tools":[],"solutions":[{"id":"9b51de1e-3da5-48ce-8d8b-ce69a48b341d","issue_id":"b8f4b1f2-44d6-4d5e-b5e9-2ab67f0ca9be","title":"Adopt session hygiene practices to minimize cache churn regardless of TTL tier","summary":"Even with the optimal TTL selection, long pauses between turns (>5 minutes) cause cache invalidation and force expensive rewrites. Structuring your workflow around shorter, focused sessions with front-loaded context reduces the impact of any TTL tier.","steps":["Keep sessions focused: one task per session to minimize total cache_create operations","Front-load critical context in CLAUDE.md so cache_create tokens are spent on high-value content","Avoid long pauses (>4 min) mid-task — if you need to step away, send a brief follow-up message first to refresh the cache window","Monitor your quota usage weekly: compare actual usage against expected baseline to detect anomalies early","Structure multi-file changes to minimize turns: batch related edits into single prompts when possible"],"commands":["cat ~/.claude/CLAUDE.md | wc -c","ls -la ~/.claude/projects/*/agent-*.jsonl | awk '{sum+=$5} END {printf \"Total JSONL size: %.1f MB\\n\", sum/1e6}'"],"config_examples":["# CLAUDE.md optimization example — front-load the most critical context first\n# This ensures cache_create tokens are spent on high-value reusable content\n\n## 
Project Architecture\n- Next.js 14 App Router + TypeScript\n- PostgreSQL via Prisma ORM\n- Tailwind CSS for styling\n\n## Key Conventions\n- Always use server components by default\n- API routes in app/api/ with Zod validation\n- Database queries through service layer only\n\n## Active Context\n[Current task-specific context — changes per session]"],"explanation":null,"risks":["Shorter sessions mean more frequent context re-establishment — trade-off between cache efficiency and productivity flow","CLAUDE.md size matters: a very large CLAUDE.md increases baseline cache_create costs for every session"],"risk_level":"low","verification_steps":["Step 1: After adopting shorter sessions, run the JSONL analysis → expect: higher ratio of cache_read to cache_create tokens compared to long-session baseline","Step 2: Track weekly quota usage for 2 weeks → expect: usage stabilizes at or below pre-March-6 levels","Step 3: Compare average tokens-per-session before and after → expect: reduction in cache_create token proportion"],"verified_count":0,"failed_count":0,"source_type":"github","status":"published","language":"en","source_url":null,"extra":{},"created_at":"2026-06-15T02:07:48.401Z","updated_at":"2026-06-15T02:07:48.401Z"},{"id":"939a6523-ef95-4dad-8f1e-298b9d048302","issue_id":"b8f4b1f2-44d6-4d5e-b5e9-2ab67f0ca9be","title":"Diagnose cache TTL usage from session JSONL logs to verify the fix is working","summary":"Claude Code stores per-request API usage data in ~/.claude/projects/**/*.jsonl files. Analyzing these logs reveals whether your sessions are on 5m or 1h TTL, the cache hit rate, and whether the v2.1.90 fix is working. 
Run this diagnostic before and after applying the other solutions to confirm the issue and verify the fix.","steps":["Navigate to the Claude Code projects directory: `cd ~/.claude/projects/`","Find the most recent session JSONL file: `ls -lt */*.jsonl 2>/dev/null | head -5`","Extract cache TTL tier distribution from the session: use jq to parse the usage fields of assistant entries","Compare ephemeral_5m vs ephemeral_1h token counts to determine your current TTL mix","Run the analysis across multiple sessions to track TTL behavior over time"],"commands":["ls -lt ~/.claude/projects/*/*.jsonl 2>/dev/null | head -5","cat ~/.claude/projects/*/*.jsonl | jq -c 'select(.type==\"assistant\") | (.message.usage // .usage // {}) | {cache_create_5m: (.cache_creation.ephemeral_5m_input_tokens // 0), cache_create_1h: (.cache_creation.ephemeral_1h_input_tokens // 0), cache_read: (.cache_read_input_tokens // 0)}' 2>/dev/null | head -20","cat ~/.claude/projects/*/*.jsonl | jq -r 'select(.type==\"assistant\") | (.message.usage // .usage // {}) | [.cache_creation.ephemeral_5m_input_tokens // 0, .cache_creation.ephemeral_1h_input_tokens // 0] | @tsv' 2>/dev/null | awk '{sum5+=$1; sum1+=$2} END {printf \"5m total: %.2fM, 1h total: %.2fM, 1h share: %.1f%%\\n\", sum5/1e6, sum1/1e6, sum1/(sum1+sum5+0.01)*100}'"],"config_examples":[],"explanation":null,"risks":["JSONL files can be very large (hundreds of MBs); piping through cat may be slow","jq may not be installed; install it via `brew install jq` (macOS) or `apt install jq` (Linux)"],"risk_level":"low","verification_steps":["Step 1: Run `ls ~/.claude/projects/*/*.jsonl 2>/dev/null | wc -l` → expect: ≥1 (sessions exist)","Step 2: Run the jq analysis command → expect: numeric output showing 5m and 1h token totals","Step 3: After upgrading to v2.1.90+, run a fresh session for >10 minutes → expect: non-zero ephemeral_1h_input_tokens in the cache_creation usage of at least some turns"],"verified_count":0,"failed_count":0,"source_type":"github","status":"published","language":"en","source_url":null,"extra":{},"created_at":"2026-06-15T02:07:47.690Z","updated_at":"2026-06-15T02:07:47.690Z"},{"id":"3fe682f0-c934-444c-bfe6-7735e0835aa2","issue_id":"b8f4b1f2-44d6-4d5e-b5e9-2ab67f0ca9be","title":"Upgrade to Claude Code v2.1.90+ to fix quota-stuck-on-5m-TTL bug","summary":"The critical client-side bug that kept sessions on 5m TTL after quota exhaustion was fixed in v2.1.90 (published April 1, 2026). Upgrading ensures sessions properly transition back to optimal TTL tiers after quota resets. This is the most impactful single fix, so apply it first.","steps":["Check your current Claude Code version: `claude --version`","Upgrade to latest: `npm install -g @anthropic-ai/claude-code@latest`","Verify the upgrade: `claude --version` should show ≥2.1.90 (latest at the time of writing: 2.1.174)","Restart any running Claude Code sessions, because the fix only applies to new sessions"],"commands":["claude --version","npm install -g @anthropic-ai/claude-code@latest","npm view @anthropic-ai/claude-code version","file $(which claude)"],"config_examples":[],"explanation":null,"risks":["Upgrading may introduce other behavioral changes in newer versions","npm global install may conflict with Homebrew or other package managers; run `file $(which claude)` to confirm the binary source"],"risk_level":"low","verification_steps":["Step 1: Run `claude --version` → expect: version ≥ 2.1.90 (e.g., '2.1.174 (Claude Code)')","Step 2: Start a fresh session, let it run for >5 minutes, and check the JSONL logs for `ephemeral_1h_input_tokens` → expect: non-zero 1h cache-creation tokens on longer sessions","Step 3: Exhaust quota in a test session, then restart → expect: cache TTL tier resets properly on new session (no permanent 5m stuck 
state)"],"verified_count":0,"failed_count":0,"source_type":"official","status":"published","language":"en","source_url":null,"extra":{},"created_at":"2026-06-15T02:07:46.972Z","updated_at":"2026-06-15T02:07:46.972Z"}]}}