Unverified

MCP Server Horizontal Scaling: Session Persistence Across Multiple Workers in Streamable HTTP Mode

When MCP servers built with FastMCP are deployed horizontally on AWS ECS or Kubernetes, clients hit intermittent 'Bad Request: No valid session ID provided' 400 errors because StreamableHTTPSessionManager stores sessions in memory only. This entry covers five solutions, ranging from ALB sticky sessions to custom Redis-backed session managers and the emerging mcp-persist package. The long-term fix awaits an official SDK-level pluggable SessionStore protocol.

Symptoms

  • Intermittent 'Bad Request: No valid session ID provided' 400 errors when connecting to a horizontally scaled MCP server
  • MCP Inspector connections fail when traffic routes to a different ECS task or Kubernetes pod
  • Sampling is completely broken across multiple service instances
  • EventStore (Redis/SQLite) persistence works for events, but sessions are still tied to a single worker instance
  • Client-server state mismatch when a task cycles or a rolling update occurs mid-session

Possible causes

  • StreamableHTTPSessionManager keeps active sessions in an in-memory Python dict (called _sessions in this entry; recent SDK releases name the attribute _server_instances), which cannot be shared across service instances in a load-balanced deployment
  • StreamableHTTPServerTransport._request_streams is the real stateful core: it maps request IDs to in-process streams, and those streams cannot survive pod restarts or cross-instance routing
  • Even with an external EventStore (Redis/SQLite/PostgreSQL) providing event resumability, the session manager still validates session IDs against its in-memory dict, so lookups fail when traffic routes to a different worker
  • stateless_http=True mode disables features that require authentication context or cross-call resumability, such as sampling and tool-specific state
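
The first cause can be reproduced without any MCP code. The following is a minimal sketch, assuming a round-robin balancer in front of two workers that each keep their own session dict. Worker, initialize, handle, and the task names are illustrative and are not SDK APIs.

```python
from __future__ import annotations

import itertools
import uuid


class Worker:
    """Toy stand-in for one server process with an in-memory session table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._sessions: dict[str, dict] = {}  # visible to this process only

    def initialize(self) -> str:
        """Create a session and return its ID, as an initialize request would."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {"created_by": self.name}
        return session_id

    def handle(self, session_id: str) -> int:
        """Return 200 for a known session ID and 400 for an unknown one."""
        return 200 if session_id in self._sessions else 400


workers = [Worker("task-a"), Worker("task-b")]
round_robin = itertools.cycle(workers)

# The initialize request lands on the first worker...
session_id = next(round_robin).initialize()
# ...and the next four requests alternate between the two workers.
statuses = [next(round_robin).handle(session_id) for _ in range(4)]
print(statuses)
```

Every request that reaches the worker that did not create the session fails, which matches the intermittent pattern listed under Symptoms.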

Solutions

Solution 5: Track SDK-level SessionStore Protocol (Long-term Correct Fix)

risk: low · github · pending_review

The MCP Python SDK community is discussing a pluggable SessionStore protocol (analogous to the existing EventStore interface) that would allow Redis, DynamoDB, or PostgreSQL backends to replace the in-memory _sessions dict. When available, this will be the definitive solution for horizontally-scaled stateful MCP servers without sticky sessions.

  1. Subscribe to GitHub Issue #880 on modelcontextprotocol/python-sdk for updates
  2. Search for related PRs: gh pr list --repo modelcontextprotocol/python-sdk --search session
  3. Review the current _sessions dict usage scope in streamable_http_manager.py to assess migration complexity
  4. Prepare your deployment to adopt the SessionStore interface once released
  5. Test in staging with Redis/DynamoDB-backed SessionStore before production rollout

Commands

gh issue view 880 --repo modelcontextprotocol/python-sdk
gh pr list --repo modelcontextprotocol/python-sdk --search session --state open

Config examples

# Conceptual future API (not yet available):
# from mcp.server.session_store import RedisSessionStore
#
# session_store = RedisSessionStore(redis_url="redis://...")
# mcp = FastMCP("My App", session_store=session_store)
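
To make the idea of a pluggable store concrete, here is a hedged sketch of what such an interface could look like. Nothing here comes from the SDK: the SessionStore protocol, its save/load/delete methods, and InMemorySessionStore are hypothetical names chosen for illustration, and the eventual official design may differ.

```python
from __future__ import annotations

import asyncio
import time
from typing import Protocol


class SessionStore(Protocol):
    """Hypothetical interface; the SDK has not published one."""

    async def save(self, session_id: str, data: dict, ttl: float) -> None: ...
    async def load(self, session_id: str) -> dict | None: ...
    async def delete(self, session_id: str) -> None: ...


class InMemorySessionStore:
    """Test backend. A Redis or DynamoDB backend would implement the same
    three methods against a service that every worker can reach."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, dict]] = {}

    async def save(self, session_id: str, data: dict, ttl: float) -> None:
        self._data[session_id] = (time.monotonic() + ttl, dict(data))

    async def load(self, session_id: str) -> dict | None:
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave as if they were never stored.
            del self._data[session_id]
            return None
        return dict(data)

    async def delete(self, session_id: str) -> None:
        self._data.pop(session_id, None)


async def demo() -> tuple[dict | None, dict | None]:
    store: SessionStore = InMemorySessionStore()
    await store.save("abc", {"client": "inspector"}, ttl=60)
    found = await store.load("abc")
    await store.delete("abc")
    missing = await store.load("abc")
    return found, missing


found, missing = asyncio.run(demo())
print(found, missing)
```

Writing deployment code against a small interface like this keeps the later switch to an official SessionStore mostly mechanical.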

Verification

  • Once an official SessionStore interface is released: deploy to multiple ECS tasks, remove sticky sessions, and test with MCP Inspector for zero 400 errors across workers
  • Confirm sampling and other stateful features work correctly across worker instances

Solution 4: mcp-persist Package for Production EventStore (Community Middle-ground)

risk: low · github · pending_review

Use the mcp-persist package, which provides tested EventStore backends (SQLite, Redis, PostgreSQL) with TTL, atomic monotonic IDs, and a with_persistence() helper that integrates into a Starlette app in 2 lines. This solves the EventStore persistence problem, but sticky sessions are still needed for session ID routing until the SDK provides SessionStore plugin support.

  1. Install mcp-persist and your chosen backend driver (redis, psycopg, etc.)
  2. Instantiate the appropriate EventStore class (RedisEventStore, PostgresEventStore, etc.)
  3. Wrap your Starlette MCP app with with_persistence(app, store, ttl=N)
  4. Keep sticky sessions configured for session ID routing across workers
  5. Monitor event TTL cleanup and confirm atomic ID generation works correctly

Commands

pip install mcp-persist redis
# For PostgreSQL backend:
# pip install mcp-persist "psycopg[binary]"

Config examples

from mcp_persist import RedisEventStore, with_persistence

store = RedisEventStore(
    redis_url="redis://my-redis-cluster:6379/0",
    key_prefix="mcp:events:"
)

app = with_persistence(
    mcp_app,
    store,
    ttl=3600  # auto-expire stored events after 1 hour
)
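
The two properties this solution depends on, monotonic event IDs and TTL expiry, can be illustrated in memory. This is a sketch under stated assumptions and not the mcp-persist implementation: TTLEventStore and its clock parameter are hypothetical, the method names store_event and replay_events_after mirror the SDK's EventStore interface, and messages are plain dicts to keep the example self-contained.

```python
from __future__ import annotations

import asyncio
import itertools
import time
from typing import Awaitable, Callable

SendCallback = Callable[[str, dict], Awaitable[None]]


class TTLEventStore:
    """In-memory illustration of monotonic event IDs plus TTL expiry."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._counter = itertools.count(1)  # a shared counter (e.g. Redis INCR) in production
        self._events: list[tuple[int, str, float, dict]] = []

    async def store_event(self, stream_id: str, message: dict) -> str:
        event_id = next(self._counter)
        self._events.append((event_id, stream_id, self._clock() + self._ttl, message))
        return str(event_id)

    async def replay_events_after(self, last_event_id: str, send: SendCallback) -> str | None:
        now = self._clock()
        # Drop expired events before looking anything up.
        self._events = [e for e in self._events if e[2] > now]
        last = int(last_event_id)
        stream_id = next((s for i, s, _, _ in self._events if i == last), None)
        if stream_id is None:
            return None
        # Replay only later events that belong to the same stream.
        for i, s, _, message in self._events:
            if s == stream_id and i > last:
                await send(str(i), message)
        return stream_id


async def demo() -> tuple[str | None, list[tuple[str, int]], str | None]:
    now = [0.0]  # fake clock so expiry can be triggered without sleeping
    store = TTLEventStore(ttl=10, clock=lambda: now[0])
    first = await store.store_event("s1", {"n": 1})
    await store.store_event("s1", {"n": 2})
    await store.store_event("s2", {"n": 99})
    await store.store_event("s1", {"n": 3})

    replayed: list[tuple[str, int]] = []

    async def send(event_id: str, message: dict) -> None:
        replayed.append((event_id, message["n"]))

    stream = await store.replay_events_after(first, send)
    now[0] = 11.0  # move past the TTL
    expired = await store.replay_events_after(first, send)
    return stream, replayed, expired


stream, replayed, expired = asyncio.run(demo())
print(stream, replayed, expired)
```

In a multi-worker deployment the counter and the event list would live in the shared backend, so that any worker can replay events written by another.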

Verification

  • Simulate a worker restart — confirm events are replayed from the persistent store on reconnect
  • Verify TTL cleanup by checking that stale event keys are removed after the configured duration
  • Test cross-worker tool calls — confirm sticky sessions still needed for session ID routing

Solution 3: Custom PersistentSessionManager via Redis (Advanced Community Hack)

risk: low · github · pending_review

Subclass StreamableHTTPSessionManager and override _handle_stateful_request() to serialize/deserialize sessions via Redis. When a request hits a worker that doesn't have the session in memory, restore it from Redis. Works for basic tool calls but _request_streams state for streaming still needs per-worker affinity.

  1. Install redis with the hiredis extra (redis[hiredis]); the async client ships in the same package as redis.asyncio
  2. Create a PersistentSessionManager subclass of StreamableHTTPSessionManager
  3. Implement JSON serialization/deserialization of session state to Redis with TTL
  4. Override _handle_stateful_request to check Redis when session ID is not found locally
  5. Note: _request_streams dict must be handled separately — streaming requests still need same-worker routing

Commands

pip install "redis[hiredis]"
docker run -d --name mcp-redis -p 6379:6379 redis:7-alpine

Config examples

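import json
import redis.asyncio as aioredis
from mcp.server.streamable_http_manager import StreamableHTTPSessionManager

SESSION_HEADER = b"mcp-session-id"  # ASGI header names are lowercase bytes

class PersistentSessionManager(StreamableHTTPSessionManager):
    """StreamableHTTPSessionManager with Redis-backed session persistence.

    Sketch only: it overrides private SDK internals, which can change between
    releases. Recent releases pass (scope, receive, send) to
    _handle_stateful_request and keep live transports in _server_instances
    (the dict this entry calls _sessions). Verify against your installed version.
    """

    def __init__(self, *args, redis_url="redis://localhost:6379", **kwargs):
        super().__init__(*args, **kwargs)
        self._redis = aioredis.from_url(redis_url, decode_responses=True)

    async def _handle_stateful_request(self, scope, receive, send):
        headers = dict(scope.get("headers") or [])
        session_id = headers.get(SESSION_HEADER, b"").decode()
        if session_id and session_id not in self._server_instances:
            cached = await self._redis.get(f"mcp:session:{session_id}")
            if cached:
                # Recreate the session on this worker from the cached data
                session_data = json.loads(cached)
                self._server_instances[session_id] = self._create_session(session_data)
        return await super()._handle_stateful_request(scope, receive, send)

    async def _persist_session(self, session_id, ttl=3600):
        # Nothing in the SDK calls this; invoke it after a session is created or updated.
        if session_id in self._server_instances:
            data = json.dumps(self._serialize_session(self._server_instances[session_id]))
            await self._redis.setex(f"mcp:session:{session_id}", ttl, data)

    def _create_session(self, session_data):
        """Hypothetical helper, not in the SDK: rebuild a transport and start its server task."""
        raise NotImplementedError

    def _serialize_session(self, transport):
        """Hypothetical helper, not in the SDK: return a JSON-safe dict of session state."""
        raise NotImplementedError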
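
The class above calls _create_session and _serialize_session, which the SDK does not provide, so you have to decide what session state is worth persisting. The following sketch, assuming a hypothetical SessionSnapshot with illustrative fields, shows the part that round-trips cleanly through JSON and why live stream state does not.

```python
from __future__ import annotations

import asyncio
import json
from dataclasses import asdict, dataclass


@dataclass
class SessionSnapshot:
    """Hypothetical JSON-safe subset of session state."""

    session_id: str
    client_name: str
    initialized: bool


def serialize_session(snapshot: SessionSnapshot) -> str:
    return json.dumps(asdict(snapshot))


def deserialize_session(raw: str) -> SessionSnapshot:
    return SessionSnapshot(**json.loads(raw))


snapshot = SessionSnapshot("abc123", "inspector", True)
restored = deserialize_session(serialize_session(snapshot))

# Live stream plumbing is different: a queue holds in-process state,
# and json has no way to encode it.
try:
    json.dumps({"request_streams": {"1": asyncio.Queue()}})
    queue_is_serializable = True
except TypeError:
    queue_is_serializable = False

print(restored == snapshot, queue_is_serializable)
```

This is why the approach can restore enough state for simple request/response tool calls while in-flight streaming responses still depend on the worker that opened them.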

Verification

  • Deploy 2+ workers behind a round-robin load balancer (no sticky sessions), connect via MCP Inspector and call tools — verify sessions migrate across workers
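  • Check Redis keys to confirm session data is being persisted with an appropriate TTL (KEYS mcp:session:* works on a dev instance; prefer redis-cli --scan --pattern 'mcp:session:*' in production because KEYS blocks the server)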
  • Test streaming responses — note this may still fail due to _request_streams being worker-local

Solution 2: stateless_http Mode (Feature-Limited Alternative)

risk: low · official · pending_review

With the official SDK (mcp.server.fastmcp), pass stateless_http=True to the FastMCP constructor to operate without session state entirely; in current releases of that package, run() takes only the transport and mount path. The standalone fastmcp 2.x package has also accepted the flag on run(), so check the signature in your installed version. Stateless mode eliminates the need for session persistence but disables features like sampling that require per-session context.

  1. Modify your FastMCP server code to pass stateless_http=True when constructing the server
  2. Remove any sticky session configuration from your load balancer
  3. Redeploy the service across multiple instances
  4. Test all tools to confirm they function correctly without session state
  5. Document which features are unavailable (sampling, per-session authentication context)

Commands

# FastMCP stateless startup: the flag is set inside my_server.py (see the config example)
python my_server.py

Config examples

from mcp.server.fastmcp import FastMCP

# stateless_http=True enables horizontal scaling without sticky sessions
mcp = FastMCP("My Tools Server", stateless_http=True)

@mcp.tool()
def calculate_bmi(weight_kg: float, height_m: float) -> float:
    """Calculate BMI given weight in kg and height in meters."""
    return weight_kg / (height_m ** 2)

if __name__ == '__main__':
    mcp.run(transport='streamable-http')

Verification

  • Scale deployment to 3+ instances, run multiple concurrent MCP Inspector sessions, confirm zero 'No valid session ID' errors
  • Verify that stateless tools return correct results across all instances
  • Confirm sampling feature returns appropriate error (expected: not supported in stateless mode)

Solution 1: ALB Sticky Sessions (Short-term Workaround)

risk: low · github · pending_review

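Configure cookie-based session affinity on AWS ALB or Kubernetes Ingress to pin all requests from a client to the same backend instance. This is simple to set up, but it only works if the MCP client stores and resends the load balancer's cookie, and sessions are still lost when a task cycles or restarts.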

  1. Enable stickiness on the AWS ALB target group via aws elbv2 CLI or console
  2. Set stickiness cookie duration to an appropriate value (e.g., 86400 seconds / 1 day)
  3. For Kubernetes, add nginx.ingress.kubernetes.io/affinity annotation set to 'cookie'
  4. Test with MCP Inspector to confirm no 400 errors on tool calls
  5. Document that rolling deployments and task restarts will still break open sessions

Commands

aws elbv2 modify-target-group-attributes --target-group-arn <TG_ARN> --attributes Key=stickiness.enabled,Value=true Key=stickiness.type,Value=lb_cookie Key=stickiness.lb_cookie.duration_seconds,Value=86400

Config examples

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-server-ingress
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "MCP_SESSION"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "86400"
spec:
  rules:
  - host: mcp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: mcp-server-service
            port:
              number: 8000
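
The limitation of this workaround is easy to see in a toy model. This is a sketch, assuming a hypothetical StickyBalancer that pins each affinity cookie to one backend. It does not model ALB or ingress-nginx internals; it only shows the effect of replacing the pinned backend during a rolling update.

```python
from __future__ import annotations


class StickyBalancer:
    """Toy load balancer that pins each affinity cookie to one backend."""

    def __init__(self, backends: list[str]) -> None:
        self.backends = list(backends)
        self._pins: dict[str, str] = {}
        self._next = 0

    def route(self, cookie: str) -> str:
        pinned = self._pins.get(cookie)
        if pinned not in self.backends:
            # First request, or the pinned backend is gone: pick a new one.
            pinned = self.backends[self._next % len(self.backends)]
            self._next += 1
            self._pins[cookie] = pinned
        return pinned

    def replace(self, old: str, new: str) -> None:
        """Swap one backend for another, as a rolling update would."""
        self.backends[self.backends.index(old)] = new


# Each backend keeps its own in-memory session table.
sessions: dict[str, set[str]] = {"task-a": set(), "task-b": set()}
lb = StickyBalancer(["task-a", "task-b"])

home = lb.route("cookie-1")
sessions[home].add("session-1")
before = [lb.route("cookie-1") for _ in range(5)]

# A rolling update replaces the task that owned the session.
lb.replace(home, "task-c")
sessions["task-c"] = set()
after = lb.route("cookie-1")
survived = "session-1" in sessions[after]

print(before, after, survived)
```

Affinity keeps every request on the same backend while that backend exists, but the session does not follow the client to a new one. Clients need to re-initialize after deployments.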

Verification

  • Deploy with sticky sessions enabled, connect via MCP Inspector, execute 10+ tool calls, verify zero 400 errors
  • Check ALB access logs or Ingress controller logs to confirm all requests from a session hit the same backend pod

Agent JSON

Canonical machine-readable representation of this issue:

{
  "issue_id": "3db4feef-2bf1-4cab-9343-4e454030e2d5",
  "slug": "mcp-server-horizontal-scaling-session-persistence-across-multiple-workers-in-streamable-http-mode-4hqih2",
  "verification_status": "unverified",
  "canonical_json": "https://codekb.dev/v1/issues/mcp-server-horizontal-scaling-session-persistence-across-multiple-workers-in-streamable-http-mode-4hqih2"
}