# LLM Gateway MCP Server

Unified LLM access with failover chains, cost tracking, and streaming support.

## Features

- **Multi-Provider Support**: Anthropic, OpenAI, Google, DeepSeek
- **Automatic Failover**: Circuit breaker with configurable thresholds
- **Cost Tracking**: Redis-based per-project/agent usage tracking
- **Streaming**: SSE support for real-time token delivery
- **Model Groups**: Pre-configured chains for different use cases

## Quick Start

```bash
# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py
```

## Configuration

Environment variables:

```bash
LLM_GATEWAY_HOST=0.0.0.0
LLM_GATEWAY_PORT=8001
LLM_GATEWAY_DEBUG=false
LLM_GATEWAY_REDIS_URL=redis://localhost:6379/1

# Provider API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
DEEPSEEK_API_KEY=...
```

## MCP Tools

### chat_completion

Generate completions with automatic failover.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "messages": [{"role": "user", "content": "Hello"}],
  "model_group": "reasoning",
  "max_tokens": 4096,
  "temperature": 0.7,
  "stream": false
}
```

### count_tokens

Count tokens in text using tiktoken.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "text": "Hello, world!",
  "model": "gpt-4"
}
```

### list_models

List available models by group.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "model_group": "code"
}
```

### get_usage

Get usage statistics.
```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "period": "day"
}
```

## Model Groups

| Group | Primary | Fallback 1 | Fallback 2 |
|-------|---------|------------|------------|
| reasoning | claude-opus-4-5 | gpt-4.1 | gemini-2.5-pro |
| code | claude-sonnet-4 | gpt-4.1 | deepseek-coder |
| fast | claude-haiku | gpt-4.1-mini | gemini-flash |
| vision | claude-sonnet-4 | gpt-4.1 | gemini-2.5-pro |
| embedding | text-embedding-3-large | voyage-3 | - |

## Circuit Breaker

- **Threshold**: 5 consecutive failures
- **Cooldown**: 30 seconds
- **Half-Open**: After cooldown, allows one test request

## Testing

```bash
# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/mcp/tools` | GET | List available tools |
| `/mcp` | POST | JSON-RPC 2.0 tool execution |
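As a client-side illustration, the `/mcp` endpoint accepts JSON-RPC 2.0 tool calls like the `chat_completion` payload shown earlier. The sketch below is an assumption-laden example, not part of the server: it presumes the standard MCP `tools/call` method name and the default host/port from the configuration section, so check `/mcp/tools` for the authoritative tool names and schemas.

```python
import json
from urllib import request

# Assumed from the configuration section above (LLM_GATEWAY_HOST/PORT).
GATEWAY_URL = "http://localhost:8001/mcp"

def build_tool_call(tool, arguments, req_id=1):
    """Build a JSON-RPC 2.0 envelope for an MCP tool call.

    Assumes the standard MCP "tools/call" method; the gateway's own
    dispatch may differ, so verify against /mcp/tools.
    """
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

payload = build_tool_call("chat_completion", {
    "project_id": "proj-123",
    "agent_id": "agent-456",
    "messages": [{"role": "user", "content": "Hello"}],
    "model_group": "fast",
})

# POST it to a running gateway (network call, shown commented out):
# req = request.Request(GATEWAY_URL, data=json.dumps(payload).encode(),
#                       headers={"Content-Type": "application/json"})
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```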
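The circuit-breaker behavior described above (5 consecutive failures open the circuit, a 30-second cooldown, then a half-open probe) can be sketched as a small state machine. This is an illustrative model of the documented parameters, not the server's actual implementation; the class and method names are hypothetical, and the injectable clock exists only to make the sketch testable.

```python
import time

class CircuitBreaker:
    """Minimal sketch of the failover circuit breaker described above."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold  # consecutive failures before opening
        self.cooldown = cooldown    # seconds before a half-open probe
        self.clock = clock          # injectable clock, for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block until the cooldown elapses, then allow a probe
        # request through (the half-open state).
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # open (or re-open) the circuit
```

In the gateway's failover chain, a provider whose breaker is open would simply be skipped and the next model in the group tried instead.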