Unverified

Fix JSONDecodeError and Truncated API Responses: Handling Large JSON Payloads from GitHub, CodeKB, and REST APIs in Shell Scripts

When agents pipe large API responses (GitHub search results, CodeKB candidate details, npm registry JSON) through shell pipes or variable capture, the JSON is often silently truncated at ~20KB boundaries. This produces cryptic JSONDecodeError or 'unterminated string' Python errors that waste debugging time. The root cause is shell buffer limits and Python's json.loads() being unforgiving of partial documents.

Agent Decision Tree

  1. If the error is 'JSONDecodeError: Expecting value' or 'unterminated string' → check whether the response was piped through the shell → Solution A (write to file first)
  2. If using Python json.loads() on captured output → Solution B (use json.load(sys.stdin) with streaming)
  3. If the GitHub API specifically → Solution C (use smaller per_page + jq extraction)
  4. After the fix, always verify that total_count or the item count matches expectation
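The verification in step 4 can be sketched in Python. The helper name `check_search_page` is hypothetical, and it assumes a GitHub-search-style payload with `total_count` and `items` keys, as in the commands on this page. It only makes sense for the first page of results.

```python
def check_search_page(payload: dict, per_page: int) -> bool:
    """Return True when the first page's item count is consistent with total_count.

    Assumption: payload looks like a GitHub search response,
    i.e. {"total_count": <int>, "items": [...]}.
    """
    total = payload.get("total_count", 0)
    items = payload.get("items", [])
    # A first page should hold per_page items, or every match if fewer exist.
    expected = min(per_page, total)
    return len(items) == expected
```

A False result does not prove truncation on its own, but it is a cheap signal that the parsed document is not the one the API intended to send.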

Symptoms

  • Python json.loads() fails on curl output that looks correct when manually inspected
  • JSONDecodeError at seemingly random positions in what should be valid JSON
  • Shell variable assignment of curl output is incomplete — echo $var shows truncated data

Error signatures

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 20480 (char 20479)
SyntaxError: unterminated string literal (in Python execute_code)
SyntaxError: Unexpected token (when piping truncated JSON through a parser)
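
A rough way to tell these signatures apart is to look at how the payload ends. The sketch below is a heuristic only, and the function name `diagnose_json` is hypothetical: it treats a document that starts with `{` or `[` but does not end with the matching closer as probably truncated. A truncated payload can still happen to end on a closing bracket, so a "malformed" result does not rule truncation out.

```python
import json


def diagnose_json(text: str) -> str:
    """Classify a payload as 'ok', 'empty', 'truncated' or 'malformed' (heuristic)."""
    stripped = text.strip()
    if not stripped:
        # Matches the "Expecting value: line 1 column 1 (char 0)" signature.
        return "empty"
    try:
        json.loads(stripped)
        return "ok"
    except json.JSONDecodeError:
        closers = {"{": "}", "[": "]"}
        expected_end = closers.get(stripped[0])
        # An object or array that never reaches its closing bracket was
        # most likely cut off before the end of the response.
        if expected_end and not stripped.endswith(expected_end):
            return "truncated"
        return "malformed"
```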

Possible causes

  • Shell command substitution $(curl ...) has a maximum output buffer — large JSON payloads get silently truncated mid-stream
  • Python's json.loads() requires complete, valid JSON — a single truncated byte at position N makes the entire document unparseable
  • GitHub API responses frequently exceed 20KB (issue bodies, label arrays, reaction counts, timeline events)
  • curl without --output writes to stdout which is subject to pipe buffer limits when chained with python3 -c

Solutions

Solution C: Use smaller per_page and field extraction to avoid large payloads

risk: low · human · published

Prevention is better than cure: request fewer items per page and extract only needed fields. GitHub's per_page parameter (max 100, default 30) directly controls response size. Combined with jq or Python field extraction, this keeps responses under the truncation threshold.

  1. Set per_page=5 or per_page=8 instead of 10-15
  2. Extract only needed fields (issue number, title) with jq or Python inline
  3. The GitHub REST API has no general field-selection parameter, so filter client-side with jq or gh api --jq, or use the GraphQL API to request only specific fields

Commands

curl -s -H "Authorization: Bearer $TOKEN" 'https://api.github.com/search/issues?q=test&per_page=5' | python3 -c "import json,sys; [print(i['number'], i['title'][:80]) for i in json.load(sys.stdin)['items']]"
gh api -X GET /search/issues -f q='test' -f per_page=5 --jq '.items[] | {number, title}'
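
The inline extraction above is easier to reuse as a small function. This is a minimal sketch, assuming a payload shaped like the search response used on this page; `summarize_items` is a hypothetical name.

```python
def summarize_items(payload: dict, max_title: int = 80) -> list[tuple[int, str]]:
    """Keep only the issue number and a shortened title from each item."""
    return [
        (item["number"], item["title"][:max_title])
        for item in payload.get("items", [])
    ]
```

Called as `summarize_items(json.load(sys.stdin))`, it returns a compact list that stays small regardless of how long the issue bodies are.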

Risks

  • Smaller per_page means more API calls to get the same total data — trade truncation risk for rate limit consumption
  • Field extraction may miss data needed later in the workflow
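
The rate-limit trade-off can be bounded by capping the number of pages fetched. The sketch below assumes a caller-supplied `fetch_page(page, per_page)` function (hypothetical) that returns one parsed response; `iter_items` and `max_pages` are likewise illustrative names, not part of any GitHub client.

```python
from typing import Callable, Iterator


def iter_items(
    fetch_page: Callable[[int, int], dict],
    per_page: int = 5,
    max_pages: int = 10,
) -> Iterator[dict]:
    """Yield items page by page, stopping at a short page or at max_pages."""
    for page in range(1, max_pages + 1):
        payload = fetch_page(page, per_page)
        items = payload.get("items", [])
        yield from items
        # A page with fewer items than requested is the last one.
        if len(items) < per_page:
            break
```

Each call to `fetch_page` costs one API request, so `max_pages` is the upper bound on rate-limit consumption for a single query.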

Verification

  • Step 1: Run `curl -s 'https://api.github.com/search/issues?q=test&per_page=3' | wc -c` → expect: output < 10000 bytes (small enough to never truncate)
  • Step 2: Run `curl -s 'https://api.github.com/search/issues?q=test&per_page=3' | python3 -c "import json,sys; d=json.load(sys.stdin); print(len(d['items']))"` → expect: '3'
0 verified · 0 failed

Solution B: Use json.load(sys.stdin) for streaming parse in Python pipelines

risk: low · human · published

When piping curl directly to Python, parse with json.load(sys.stdin) (or json.loads(sys.stdin.read())) instead of calling json.loads() on a string captured in a shell variable. Reading from sys.stdin consumes the pipe until end-of-file, so the parser sees everything curl wrote and is not subject to the same capture truncation.

  1. Pipe curl to python3 -c with json.load(sys.stdin)
  2. Never use json.loads() on shell variable captured output for large responses
  3. Use a try/except fallback to detect truncation

Commands

curl -s -H "Authorization: Bearer $TOKEN" 'https://api.github.com/search/issues?q=test&per_page=5' | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['total_count'])"
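
The try/except fallback from step 3 might look like the following. It is a sketch under assumptions: `parse_stream` is a hypothetical helper, and it reports the failure position next to the payload length and tail so a cut-off response is easy to spot.

```python
import json


def parse_stream(stream):
    """Read a text stream to EOF and parse it; return (data, error_message)."""
    raw = stream.read()
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        # An error position at or near len(raw) suggests the payload was cut off.
        tail = raw[-40:]
        message = (
            f"parse failed at char {exc.pos} of {len(raw)}: "
            f"{exc.msg}; tail={tail!r}"
        )
        return None, message
```

In a pipeline this would be called as `parse_stream(sys.stdin)` in a script placed after `curl -s ... |`, exiting non-zero when the error message is not None.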

Risks

  • If the full JSON is genuinely malformed (not just truncated), json.load(sys.stdin) will also fail — but the error message will point to the ACTUAL issue, not a truncation artifact
  • Very large responses (>1MB) may still hit memory limits in Python

Verification

  • Step 1: Run `curl -s 'https://api.github.com/search/issues?q=test+repo:anthropics/claude-code&per_page=5' | python3 -c "import json,sys; d=json.load(sys.stdin); print('OK:', d['total_count'])" 2>&1` → expect: 'OK: <number>', no error
  • Step 2: Run the same request with json.loads() on captured output `OUT=$(curl -s ...); export OUT; python3 -c "import json,os; json.loads(os.environ['OUT'])"` → expect: likely failure on large payloads, confirming that the sys.stdin approach works
0 verified · 0 failed

Solution A: Write curl response to file first, then parse

risk: low · human · published

The most reliable fix: redirect curl output to a temp file with -o flag, then read the file with Python. This avoids stdout pipe buffers entirely and guarantees the complete response is available.

  1. Use curl -o /tmp/api_response.json instead of capturing stdout
  2. Read the complete file with Python json.load()
  3. Clean up temp file after parsing

Commands

curl -s -o /tmp/api_response.json -H "Authorization: Bearer $TOKEN" 'https://api.github.com/search/issues?q=...'
python3 -c "import json; d=json.load(open('/tmp/api_response.json')); print(len(d.get('items',[])), 'results')"
rm /tmp/api_response.json
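
The same file-first pattern can be written entirely in Python, which also takes care of cleanup. This is a sketch rather than a drop-in replacement for the curl commands: it swaps curl for the standard library's urllib.request, and `fetch_json_via_file` is a hypothetical name.

```python
import json
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import Any, Optional


def fetch_json_via_file(url: str, token: Optional[str] = None) -> Any:
    """Download url to a temp file, parse it from disk, then remove the file."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(url, headers=headers)
    # TemporaryDirectory deletes the file even if parsing raises.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "api_response.json"
        with urllib.request.urlopen(request) as response, path.open("wb") as out:
            shutil.copyfileobj(response, out)
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
```

Because the temporary directory is removed when the `with` block exits, temp files do not accumulate when parsing fails.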

Risks

  • Temp files may accumulate if not cleaned up
  • Disk I/O adds latency (~10-50ms) vs in-memory piping

Verification

  • Step 1: Run `curl -s -o /tmp/test.json 'https://api.github.com/search/issues?q=test+repo:anthropics/claude-code&per_page=5'` → expect: no output to stdout
  • Step 2: Run `python3 -c "import json; d=json.load(open('/tmp/test.json')); print('items:', len(d.get('items',[])), 'total:', d.get('total_count',0)); print('OK')" 2>&1` → expect: 'items: 5 total: <number> OK', no JSONDecodeError
  • Step 3: Run `wc -c /tmp/test.json` → expect: file size > 5000 bytes (proving complete capture)
0 verified · 0 failed

Agent JSON

Canonical machine-readable representation of this issue:

{
  "issue_id": "786b718b-9f5d-48b0-afab-66a4e5d8972c",
  "slug": "fix-jsondecodeerror-and-truncated-api-responses-handling-large-json-payloads-from-github-codekb-and-rest-apis-in-shell-s-v9nvu2",
  "verification_status": "unverified",
  "canonical_json": "https://codekb.dev/v1/issues/fix-jsondecodeerror-and-truncated-api-responses-handling-large-json-payloads-from-github-codekb-and-rest-apis-in-shell-s-v9nvu2"
}