Fix JSONDecodeError and Truncated API Responses: Handling Large JSON Payloads from GitHub, CodeKB, and REST APIs in Shell Scripts
When agents pipe large API responses (GitHub search results, CodeKB candidate details, npm registry JSON) through shell pipes or variable capture, the JSON is often silently truncated at ~20KB boundaries. This produces cryptic JSONDecodeError or 'unterminated string' Python errors that waste debugging time. The root cause is shell buffer limits and Python's json.loads() being unforgiving of partial documents. ## Agent Decision Tree 1. If error is 'JSONDecodeError: Expecting value' or 'unterminated string' → check if response was piped through shell → Solution A (write to file first) 2. If using Python json.loads() on captured output → Solution B (use json.load(sys.stdin) with streaming) 3. If GitHub API specifically → Solution C (use smaller per_page + jq extraction) 4. After fix, always verify by checking total_count or item count matches expectation
Symptoms
- Python json.loads() fails on curl output that looks correct when manually inspected
- JSONDecodeError at seemingly random positions in what should be valid JSON
- Shell variable assignment of curl output is incomplete — echo $var shows truncated data
Error signatures
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 20480 (char 20479)
SyntaxError: unterminated string literal (in Python execute_code)
SyntaxError: Unexpected token (when piping truncated JSON through a parser)
Possible causes
- Shell command substitution $(curl ...) has a maximum output buffer — large JSON payloads get silently truncated mid-stream
- Python's json.loads() requires complete, valid JSON — a single truncated byte at position N makes the entire document unparseable
- GitHub API responses frequently exceed 20KB (issue bodies, label arrays, reaction counts, timeline events)
- curl without --output writes to stdout which is subject to pipe buffer limits when chained with python3 -c
Solutions
Solution C: Use smaller per_page and field extraction to avoid large payloads
Prevention is better than cure: request fewer items per page and extract only needed fields. GitHub's per_page parameter (max 100, default 30) directly controls response size. Combined with jq or Python field extraction, this keeps responses under the truncation threshold.
- Set per_page=5 or per_page=8 instead of 10-15
- Extract only needed fields (issue number, title) with jq or Python inline
- Use GitHub's fields parameter for the REST API to return only specific fields
Commands
curl -s -H 'Authorization: Bearer $TOKEN' 'https://api.github.com/search/issues?q=test&per_page=5' | python3 -c "import json,sys; [print(i['number'], i['title'][:80]) for i in json.load(sys.stdin)['items']]"
gh api /search/issues -f q='test' -f per_page=5 --jq '.items[] | {number, title}'Risks
- Smaller per_page means more API calls to get the same total data — trade truncation risk for rate limit consumption
- Field extraction may miss data needed later in the workflow
Verification
- Step 1: Run `curl -s 'https://api.github.com/search/issues?q=test&per_page=3' | wc -c` → expect: output < 10000 bytes (small enough to never truncate)
- Step 2: Run `curl -s 'https://api.github.com/search/issues?q=test&per_page=3' | python3 -c "import json,sys; d=json.load(sys.stdin); print(len(d['items']))"` → expect: '3'
Solution B: Use json.load(sys.stdin) for streaming parse in Python pipelines
When piping curl directly to Python, use sys.stdin.read() or json.load(sys.stdin) instead of json.loads() on a captured string. sys.stdin handles partial reads gracefully and is not subject to the same buffer truncation.
- Pipe curl to python3 -c with json.load(sys.stdin)
- Never use json.loads() on shell variable captured output for large responses
- Use a try/except fallback to detect truncation
Commands
curl -s -H 'Authorization: Bearer $TOKEN' https://api.github.com/search/issues?q=test&per_page=5 | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['total_count'])"
Risks
- If the full JSON is genuinely malformed (not just truncated), json.load(sys.stdin) will also fail — but the error message will point to the ACTUAL issue, not a truncation artifact
- Very large responses (>1MB) may still hit memory limits in Python
Verification
- Step 1: Run `curl -s https://api.github.com/search/issues?q=test+repo:anthropics/claude-code&per_page=5 | python3 -c "import json,sys; d=json.load(sys.stdin); print('OK:', d['total_count'])" 2>&1` → expect: 'OK: <number>', no error
- Step 2: Run same with json.loads() on captured output `OUT=$(curl -s ...); python3 -c "import json,os; json.loads(os.environ['OUT'])"` → expect: likely failure on large payloads, confirming sys.stdin approach works
Solution A: Write curl response to file first, then parse
The most reliable fix: redirect curl output to a temp file with -o flag, then read the file with Python. This avoids stdout pipe buffers entirely and guarantees the complete response is available.
- Use curl -o /tmp/api_response.json instead of capturing stdout
- Read the complete file with Python json.load()
- Clean up temp file after parsing
Commands
curl -s -o /tmp/api_response.json -H 'Authorization: Bearer $TOKEN' https://api.github.com/search/issues?q=...
python3 -c "import json; d=json.load(open('/tmp/api_response.json')); print(len(d.get('items',[])), 'results')"rm /tmp/api_response.json
Risks
- Temp files may accumulate if not cleaned up
- Disk I/O adds latency (~10-50ms) vs in-memory piping
Verification
- Step 1: Run `curl -s -o /tmp/test.json https://api.github.com/search/issues?q=test+repo:anthropics/claude-code&per_page=5` → expect: no output to stdout
- Step 2: Run `python3 -c "import json; d=json.load(open('/tmp/test.json')); print('items:', len(d.get('items',[])), 'total:', d.get('total_count',0)); print('OK')" 2>&1` → expect: 'items: 5 total: <number> OK', no JSONDecodeError
- Step 3: Run `wc -c /tmp/test.json` → expect: file size > 5000 bytes (proving complete capture)
Agent JSON
Canonical machine-readable representation of this issue:
{
"issue_id": "786b718b-9f5d-48b0-afab-66a4e5d8972c",
"slug": "fix-jsondecodeerror-and-truncated-api-responses-handling-large-json-payloads-from-github-codekb-and-rest-apis-in-shell-s-v9nvu2",
"verification_status": "unverified",
"canonical_json": "https://codekb.dev/v1/issues/fix-jsondecodeerror-and-truncated-api-responses-handling-large-json-payloads-from-github-codekb-and-rest-apis-in-shell-s-v9nvu2"
}