MCP Server Horizontal Scaling: Session Persistence Across Multiple Workers in Streamable HTTP Mode
When deploying MCP servers (FastMCP) horizontally on AWS ECS or Kubernetes, clients hit intermittent 'Bad Request: No valid session ID provided' 400 errors because StreamableHTTPSessionManager stores sessions in memory only. This entry covers five solutions, ranging from ALB sticky sessions to custom Redis-backed session managers and the emerging mcp-persist package; the long-term fix awaits an official SDK-level pluggable SessionStore protocol.
Symptoms
- Intermittent 'Bad Request: No valid session ID provided' 400 errors when connecting to a horizontally scaled MCP server
- MCP Inspector connections fail when traffic routes to a different ECS task or Kubernetes pod
- Sampling functionality completely broken across multiple service instances
- EventStore (Redis/SQLite) persistence works for events but sessions still tied to a single worker instance
- Client-server state mismatch when a task cycles or rolling updates occur mid-session
Possible causes
- StreamableHTTPSessionManager uses an in-memory Python dict (_sessions) to store active sessions, so they cannot be shared across service instances in a load-balanced deployment
- The StreamableHTTPServerTransport._request_streams dict is the real stateful core: it maps stream IDs to active request queues, and these queues cannot survive pod restarts or cross-instance routing
- Even with an external EventStore (Redis/SQLite/PostgreSQL) implemented for event resumability, the session manager still relies on the in-memory dict for session ID validation, causing lookup failures when traffic routes to a different worker
- stateless_http=True mode disables features requiring authentication context or cross-call resumability, such as sampling and tool-specific state
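The first cause can be reproduced without any MCP code. The following is a minimal simulation, not SDK code: the Worker class and its sessions dict are stand-ins invented for illustration, and the load balancer is modeled as plain round-robin.

```python
import itertools
import uuid


class Worker:
    """One server process; its session table lives only in this process's memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Stand-in for the session manager's in-memory dict (hypothetical name).
        self.sessions: dict = {}

    def initialize(self) -> str:
        """Create a session and remember it locally, as an in-memory manager would."""
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {"created_by": self.name}
        return session_id

    def handle(self, session_id: str) -> int:
        """Return 200 if this worker knows the session, else 400."""
        return 200 if session_id in self.sessions else 400


def simulate(num_workers: int, num_requests: int) -> list:
    """Round-robin one client's requests across workers and collect status codes."""
    workers = [Worker(f"worker-{i}") for i in range(num_workers)]
    rotation = itertools.cycle(workers)
    # The first request creates the session on whichever worker receives it.
    session_id = next(rotation).initialize()
    return [next(rotation).handle(session_id) for _ in range(num_requests)]


if __name__ == "__main__":
    print(simulate(num_workers=1, num_requests=4))
    print(simulate(num_workers=2, num_requests=4))
```

With one worker every follow-up request succeeds. With two workers, every request that lands on the worker that did not create the session returns 400, which matches the intermittent pattern described under Symptoms.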
Solutions
Solution 5: Track SDK-level SessionStore Protocol (Long-term Correct Fix)
The MCP Python SDK community is discussing a pluggable SessionStore protocol (analogous to the existing EventStore interface) that would allow Redis, DynamoDB, or PostgreSQL backends to replace the in-memory _sessions dict. When available, this will be the definitive solution for horizontally-scaled stateful MCP servers without sticky sessions.
- Subscribe to GitHub Issue #880 on modelcontextprotocol/python-sdk for updates
- Search for related PRs: gh pr list --repo modelcontextprotocol/python-sdk --search session
- Review the current _sessions dict usage scope in streamable_http_manager.py to assess migration complexity
- Prepare your deployment to adopt the SessionStore interface once released
- Test in staging with Redis/DynamoDB-backed SessionStore before production rollout
Commands
gh issue view 880 --repo modelcontextprotocol/python-sdk
gh pr list --repo modelcontextprotocol/python-sdk --search session --state open
Config examples
# Conceptual future API (not yet available):
# from mcp.server.session_store import RedisSessionStore
#
# session_store = RedisSessionStore(redis_url="redis://...")
# mcp = FastMCP("My App", session_store=session_store)
Verification
- Once official SessionStore interface is released: deploy to ECS multi-task, remove sticky sessions, test with MCP Inspector for zero 400 errors across workers
- Confirm sampling and other stateful features work correctly across worker instances
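To make the idea of a pluggable store concrete, here is one possible shape for such a protocol. Everything in this sketch is an assumption: the SessionStore name comes from the discussion above, but the method names, the synchronous signatures, and the InMemorySessionStore reference implementation are invented for illustration and are not part of the MCP SDK. A real backend would most likely be async and talk to Redis, DynamoDB, or PostgreSQL.

```python
import time
from typing import Callable, Optional, Protocol


class SessionStore(Protocol):
    """Hypothetical interface; not an SDK API."""

    def save(self, session_id: str, metadata: dict, ttl_seconds: float) -> None: ...

    def load(self, session_id: str) -> Optional[dict]: ...

    def delete(self, session_id: str) -> None: ...


class InMemorySessionStore:
    """Reference implementation used only to exercise the interface."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: dict = {}  # session_id -> (expires_at, metadata)

    def save(self, session_id: str, metadata: dict, ttl_seconds: float) -> None:
        self._items[session_id] = (self._clock() + ttl_seconds, dict(metadata))

    def load(self, session_id: str) -> Optional[dict]:
        entry = self._items.get(session_id)
        if entry is None:
            return None
        expires_at, metadata = entry
        if self._clock() >= expires_at:
            # Expired entries behave as if they were never stored.
            del self._items[session_id]
            return None
        return dict(metadata)

    def delete(self, session_id: str) -> None:
        self._items.pop(session_id, None)


def validate_session(store: SessionStore, session_id: Optional[str]) -> int:
    """Return the HTTP status a worker would send for this session ID."""
    if session_id is None or store.load(session_id) is None:
        return 400
    return 200
```

Because validate_session depends only on the protocol, any worker holding a client for the same shared backend would reach the same answer, which is the property the in-memory dict cannot provide.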
Solution 4: mcp-persist Package for Production EventStore (Community Middle-ground)
Use the mcp-persist package providing tested EventStore backends (SQLite, Redis, PostgreSQL) with TTL, atomic monotonic IDs, and a with_persistence() helper that integrates into a Starlette app in 2 lines. Solves the EventStore persistence problem but still needs sticky sessions for session ID routing until SDK provides SessionStore plugin support.
- Install mcp-persist and your chosen backend driver (redis, psycopg2, etc.)
- Instantiate the appropriate EventStore class (RedisEventStore, PostgresEventStore, etc.)
- Wrap your Starlette MCP app with with_persistence(app, store, ttl=N)
- Keep sticky sessions configured for session ID routing across workers
- Monitor event TTL cleanup and confirm atomic ID generation works correctly
Commands
pip install mcp-persist redis
# For PostgreSQL backend:
# pip install mcp-persist psycopg[binary]
Config examples
from mcp_persist import RedisEventStore, with_persistence

store = RedisEventStore(
    redis_url="redis://my-redis-cluster:6379/0",
    key_prefix="mcp:events:"
)

app = with_persistence(
    mcp_app,
    store,
    ttl=3600  # auto-expire sessions after 1 hour
)
Verification
- Simulate a worker restart — confirm events are replayed from the persistent store on reconnect
- Verify TTL cleanup by checking that stale session keys are removed after the configured duration
- Test cross-worker tool calls — confirm sticky sessions still needed for session ID routing
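The three properties named above (monotonic IDs, TTL expiry, replay on reconnect) can be illustrated with a standalone in-memory sketch. This is not the mcp-persist implementation and does not import the SDK: the TTLMemoryEventStore class is hypothetical, its two methods are only loosely modeled on the store_event / replay_events_after shape of an event store, and the callback signature is simplified. The itertools counter is a single-process stand-in for an atomic counter such as Redis INCR or a SQL sequence.

```python
import asyncio
import itertools
import time
from typing import Awaitable, Callable, Optional


class TTLMemoryEventStore:
    """Illustrative event store with monotonic IDs, TTL expiry, and replay."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._next_id = itertools.count(1)
        self._events: list = []  # (event_id, stream_id, expires_at, message)

    async def store_event(self, stream_id: str, message: dict) -> str:
        """Record a message and return its strictly increasing event ID."""
        event_id = next(self._next_id)
        self._events.append((event_id, stream_id, self._clock() + self._ttl, message))
        return str(event_id)

    async def replay_events_after(
        self,
        last_event_id: str,
        send: Callable[[str, dict], Awaitable[None]],
    ) -> Optional[str]:
        """Resend later events from the same stream; return the stream ID or None."""
        now = self._clock()
        # Drop expired events before looking anything up.
        self._events = [event for event in self._events if event[2] > now]
        last = int(last_event_id)
        stream_id = next((event[1] for event in self._events if event[0] == last), None)
        if stream_id is None:
            return None  # unknown or expired event ID: nothing to resume from
        for event_id, event_stream, _expires, message in self._events:
            if event_stream == stream_id and event_id > last:
                await send(str(event_id), message)
        return stream_id
```

Note that replay only restores missed events for a stream. It does not tell a different worker that the session ID is valid, which is why sticky sessions are still required with this solution.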
Solution 3: Custom PersistentSessionManager via Redis (Advanced Community Hack)
Subclass StreamableHTTPSessionManager and override _handle_stateful_request() to serialize/deserialize sessions via Redis. When a request hits a worker that doesn't have the session in memory, restore it from Redis. Works for basic tool calls but _request_streams state for streaming still needs per-worker affinity.
- Install the redis package (version 4.2 or later ships the redis.asyncio client), optionally with the hiredis extra for faster response parsing
- Create a PersistentSessionManager subclass of StreamableHTTPSessionManager
- Implement JSON serialization/deserialization of session state to Redis with TTL
- Override _handle_stateful_request to check Redis when session ID is not found locally
- Note: _request_streams dict must be handled separately — streaming requests still need same-worker routing
Commands
pip install "redis[hiredis]"
docker run -d --name mcp-redis -p 6379:6379 redis:7-alpine
Config examples
import json

import redis.asyncio as aioredis
from mcp.server.streamable_http_manager import StreamableHTTPSessionManager


class PersistentSessionManager(StreamableHTTPSessionManager):
    """StreamableHTTPSessionManager with Redis-backed session persistence.

    Illustrative only: _create_session() and _serialize_session() are hypothetical
    helpers you must write yourself, and the private names and method signature used
    here (_sessions, _handle_stateful_request) may differ in your installed SDK version.
    """

    def __init__(self, *args, redis_url="redis://localhost:6379", **kwargs):
        super().__init__(*args, **kwargs)
        self._redis = aioredis.from_url(redis_url, decode_responses=True)

    async def _handle_stateful_request(self, request, session_id):
        # On a local miss, try to restore the session from Redis before delegating.
        if session_id and session_id not in self._sessions:
            cached = await self._redis.get(f"mcp:session:{session_id}")
            if cached:
                # Recreate session from cached data
                session_data = json.loads(cached)
                self._sessions[session_id] = self._create_session(session_data)
        return await super()._handle_stateful_request(request, session_id)

    async def _persist_session(self, session_id, ttl=3600):
        # Write the serialized session to Redis with an expiry so stale entries age out.
        if session_id in self._sessions:
            data = json.dumps(self._serialize_session(self._sessions[session_id]))
            await self._redis.setex(f"mcp:session:{session_id}", ttl, data)
Verification
- Deploy 2+ workers behind a round-robin load balancer (no sticky sessions), connect via MCP Inspector and call tools — verify sessions migrate across workers
- Check Redis keys (SCAN 0 MATCH mcp:session:* is safer than KEYS mcp:session:* on a production instance) and run TTL on a key to confirm session data is being persisted with the expected expiry
- Test streaming responses — note this may still fail due to _request_streams being worker-local
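The reason streaming state stays worker-local can be shown in a few lines: session metadata is plain data, but live queue objects are not serializable. The sketch below assumes a simplified session state dict whose field names (session_id, protocol_version, request_streams) are chosen for illustration and do not mirror the SDK's internal layout. The split_session_state helper is hypothetical.

```python
import asyncio
import json
from typing import Any


def split_session_state(state: dict) -> tuple:
    """Separate JSON-serializable fields from live, process-local objects."""
    portable: dict = {}
    worker_local: dict = {}
    for key, value in state.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            # Queues, sockets, tasks and similar objects cannot leave this process.
            worker_local[key] = value
        else:
            portable[key] = value
    return portable, worker_local


if __name__ == "__main__":
    state: dict = {
        "session_id": "abc123",
        "protocol_version": "2025-03-26",
        # Stand-in for the transport's per-request stream map.
        "request_streams": {"req-1": asyncio.Queue()},
    }
    portable, worker_local = split_session_state(state)
    print("can go to Redis:", sorted(portable))
    print("stays on this worker:", sorted(worker_local))
```

Only the portable half can be written to Redis by something like _persist_session. An in-flight streaming response depends on the other half, so a request routed to a different worker mid-stream has nothing to attach to.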
Solution 2: stateless_http Mode (Feature-Limited Alternative)
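Set stateless_http=True to operate without session state entirely. With the official SDK's mcp.server.fastmcp (1.x releases) this is a FastMCP constructor setting; some releases of the standalone fastmcp package accept it on run() instead, so check which form your installed version supports. This eliminates the need for session persistence but disables features like sampling that require per-session context.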
- Modify your FastMCP server startup code to add stateless_http=True parameter
- Remove any sticky session configuration from your load balancer
- Redeploy the service across multiple instances
- Test all tools to confirm they function correctly without session state
- Document which features are unavailable (sampling, per-session authentication context)
Commands
# FastMCP stateless startup pattern (assumes my_server builds its FastMCP instance with stateless_http=True)
python -c "from my_server import mcp; mcp.run(transport='streamable-http')"
Config examples
from mcp.server.fastmcp import FastMCP

# stateless_http=True enables horizontal scaling without sticky sessions.
# In the official SDK 1.x it is a FastMCP constructor setting, not a run() argument.
mcp = FastMCP("My Tools Server", stateless_http=True)


@mcp.tool()
def calculate_bmi(weight_kg: float, height_m: float) -> float:
    """Calculate BMI given weight in kg and height in meters."""
    return weight_kg / (height_m ** 2)


if __name__ == '__main__':
    mcp.run(transport='streamable-http')
Verification
- Scale deployment to 3+ instances, run multiple concurrent MCP Inspector sessions, confirm zero 'No valid session ID' errors
- Verify that stateless tools return correct results across all instances
- Confirm sampling feature returns appropriate error (expected: not supported in stateless mode)
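The first verification step can be partly automated with a small probe that sends independent requests and tallies the status codes. This is a sketch under assumptions: the URL in the usage note is a placeholder, the /mcp path is only the common default, and it assumes a stateless server answers a tools/list request without a prior initialize call, which you should confirm against your deployment. The post_tools_list and probe names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable

# Streamable HTTP clients advertise both JSON and SSE responses.
HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}


def post_tools_list(url: str, request_id: int) -> int:
    """POST one JSON-RPC tools/list request and return the HTTP status code."""
    body = json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}).encode()
    request = urllib.request.Request(url, data=body, headers=HEADERS, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as exceptions; the code is what we want to count.
        return exc.code


def probe(url: str, attempts: int, send: Callable[[str, int], int] = post_tools_list) -> Counter:
    """Send `attempts` independent requests and count the status codes returned."""
    return Counter(send(url, request_id) for request_id in range(1, attempts + 1))
```

For example, probe("http://localhost:8000/mcp", 30) run against a load-balanced stateless deployment should report no 400 entries. The send parameter exists so the tally logic can be exercised without a network.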
Solution 1: ALB Sticky Sessions (Short-term Workaround)
Configure cookie-based session affinity on AWS ALB or Kubernetes Ingress to pin all requests from a client to the same backend instance. This is simple, but it only works when the client stores and returns the affinity cookie, and sessions are still lost when a task cycles or restarts.
- Enable stickiness on the AWS ALB target group via aws elbv2 CLI or console
- Set stickiness cookie duration to an appropriate value (e.g., 86400 seconds / 1 day)
- For Kubernetes, add nginx.ingress.kubernetes.io/affinity annotation set to 'cookie'
- Test with MCP Inspector to confirm no 400 errors on tool calls
- Document that rolling deployments and task restarts will still break open sessions
Commands
aws elbv2 modify-target-group-attributes --target-group-arn <TG_ARN> --attributes Key=stickiness.enabled,Value=true Key=stickiness.type,Value=lb_cookie Key=stickiness.lb_cookie.duration_seconds,Value=86400
Config examples
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-server-ingress
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "MCP_SESSION"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "86400"
spec:
  rules:
    - host: mcp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mcp-server-service
                port:
                  number: 8000
Verification
- Deploy with sticky sessions enabled, connect via MCP Inspector, execute 10+ tool calls, verify zero 400 errors
- Check ALB access logs or Ingress controller logs to confirm all requests from a session hit the same backend pod
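If the target group is managed from Python rather than the CLI, the same three attributes can be applied with boto3. This is a sketch: the helper names are hypothetical, boto3 is a third-party dependency imported lazily, and the range check reflects the 1 second to 7 day limit AWS documents for load-balancer-generated cookies.

```python
def stickiness_attributes(duration_seconds: int = 86400) -> list:
    """Build the target group attributes that enable lb_cookie stickiness."""
    if not 1 <= duration_seconds <= 604800:
        raise ValueError("duration_seconds must be between 1 and 604800 (7 days)")
    return [
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": str(duration_seconds)},
    ]


def enable_stickiness(target_group_arn: str, duration_seconds: int = 86400) -> None:
    """Apply the attributes to an existing ALB target group."""
    import boto3  # third-party; imported here so the helper above works without it

    client = boto3.client("elbv2")
    client.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=stickiness_attributes(duration_seconds),
    )
```

Calling enable_stickiness with your target group ARN mirrors the CLI command under Commands. Attribute values are passed as strings because that is how the ELBv2 API represents them.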
Agent JSON
Canonical machine-readable representation of this issue:
{
  "issue_id": "3db4feef-2bf1-4cab-9343-4e454030e2d5",
  "slug": "mcp-server-horizontal-scaling-session-persistence-across-multiple-workers-in-streamable-http-mode-4hqih2",
  "verification_status": "unverified",
  "canonical_json": "https://codekb.dev/v1/issues/mcp-server-horizontal-scaling-session-persistence-across-multiple-workers-in-streamable-http-mode-4hqih2"
}