# LLM Gateway MCP Server
Unified LLM access with failover chains, cost tracking, and streaming support.
## Features
- Multi-Provider Support: Anthropic, OpenAI, Google, DeepSeek
- Automatic Failover: Circuit breaker with configurable thresholds
- Cost Tracking: Redis-based per-project/agent usage tracking
- Streaming: SSE support for real-time token delivery
- Model Groups: Pre-configured chains for different use cases
## Quick Start

```bash
# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py
```
## Configuration

Environment variables:

```bash
LLM_GATEWAY_HOST=0.0.0.0
LLM_GATEWAY_PORT=8001
LLM_GATEWAY_DEBUG=false
LLM_GATEWAY_REDIS_URL=redis://localhost:6379/1

# Provider API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
DEEPSEEK_API_KEY=...
```
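As a rough illustration of how these variables map to runtime settings, here is a minimal sketch of a config loader. The function name `load_config`, the defaults, and the dictionary shape are assumptions for illustration, not the server's actual configuration code.

```python
import os

# Hypothetical helper: one way the gateway could read the variables above.
# Defaults mirror the values shown in the Configuration section.
def load_config() -> dict:
    return {
        "host": os.getenv("LLM_GATEWAY_HOST", "0.0.0.0"),
        "port": int(os.getenv("LLM_GATEWAY_PORT", "8001")),
        "debug": os.getenv("LLM_GATEWAY_DEBUG", "false").lower() == "true",
        "redis_url": os.getenv("LLM_GATEWAY_REDIS_URL", "redis://localhost:6379/1"),
        "anthropic_api_key": os.getenv("ANTHROPIC_API_KEY"),
        "openai_api_key": os.getenv("OPENAI_API_KEY"),
        "google_api_key": os.getenv("GOOGLE_API_KEY"),
        "deepseek_api_key": os.getenv("DEEPSEEK_API_KEY"),
    }
```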
## MCP Tools

### chat_completion

Generate completions with automatic failover.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "messages": [{"role": "user", "content": "Hello"}],
  "model_group": "reasoning",
  "max_tokens": 4096,
  "temperature": 0.7,
  "stream": false
}
```
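A hedged client-side sketch of invoking this tool over the `/mcp` JSON-RPC endpoint. The `tools/call` envelope follows the common MCP convention and the `localhost:8001` URL comes from the default configuration; the exact request shape this server expects may differ.

```python
import requests

# Assumed JSON-RPC 2.0 envelope for calling the chat_completion tool.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "chat_completion",
        "arguments": {
            "project_id": "proj-123",
            "agent_id": "agent-456",
            "messages": [{"role": "user", "content": "Hello"}],
            "model_group": "reasoning",
            "max_tokens": 4096,
            "temperature": 0.7,
            "stream": False,
        },
    },
}

resp = requests.post("http://localhost:8001/mcp", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```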
### count_tokens

Count tokens in text using tiktoken.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "text": "Hello, world!",
  "model": "gpt-4"
}
```
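For reference, a minimal sketch of tiktoken-based counting along the lines this tool describes; the fallback encoding and function signature are illustrative, not the server's implementation.

```python
import tiktoken

# Count tokens with the model's encoding; fall back to a generic
# encoding for models tiktoken does not recognize.
def count_tokens(text: str, model: str = "gpt-4") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

print(count_tokens("Hello, world!", "gpt-4"))
```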
### list_models

List available models by group.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "model_group": "code"
}
```
### get_usage

Get usage statistics.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "period": "day"
}
```
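To make the Redis-based cost tracking concrete, here is a hypothetical sketch of how per-project/agent usage might be recorded and read back. The key layout, field names, and expiry are assumptions, not the server's actual schema.

```python
import redis

r = redis.Redis.from_url("redis://localhost:6379/1")

def record_usage(project_id: str, agent_id: str, tokens: int, cost_usd: float) -> None:
    # Hypothetical key layout: one hash per project/agent per period.
    key = f"usage:{project_id}:{agent_id}:day"
    pipe = r.pipeline()
    pipe.hincrby(key, "tokens", tokens)
    pipe.hincrbyfloat(key, "cost_usd", cost_usd)
    pipe.expire(key, 60 * 60 * 24)  # roll the counter over daily
    pipe.execute()

def get_usage(project_id: str, agent_id: str) -> dict:
    key = f"usage:{project_id}:{agent_id}:day"
    return {k.decode(): v.decode() for k, v in r.hgetall(key).items()}
```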
## Model Groups
| Group | Primary | Fallback 1 | Fallback 2 |
|---|---|---|---|
| reasoning | claude-opus-4-5 | gpt-4.1 | gemini-2.5-pro |
| code | claude-sonnet-4 | gpt-4.1 | deepseek-coder |
| fast | claude-haiku | gpt-4.1-mini | gemini-flash |
| vision | claude-sonnet-4 | gpt-4.1 | gemini-2.5-pro |
| embedding | text-embedding-3-large | voyage-3 | - |
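The table above can be read as ordered failover chains. A sketch of that mapping as a plain data structure (the constant name and shape are illustrative, not the server's internal representation):

```python
# Failover order per group, taken directly from the table above.
MODEL_GROUPS: dict[str, list[str]] = {
    "reasoning": ["claude-opus-4-5", "gpt-4.1", "gemini-2.5-pro"],
    "code": ["claude-sonnet-4", "gpt-4.1", "deepseek-coder"],
    "fast": ["claude-haiku", "gpt-4.1-mini", "gemini-flash"],
    "vision": ["claude-sonnet-4", "gpt-4.1", "gemini-2.5-pro"],
    "embedding": ["text-embedding-3-large", "voyage-3"],
}
```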
## Circuit Breaker
- Threshold: 5 consecutive failures
- Cooldown: 30 seconds
- Half-Open: After cooldown, allows one test request
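A minimal sketch of this behavior, assuming the thresholds listed above; class and method names are illustrative and the real implementation may differ.

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures, allow one test request after cooldown."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # half-open: let one test request through
        return False  # open: fail fast and move to the next provider

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()
```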
## Testing

```bash
# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v
```
## API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /mcp/tools | GET | List available tools |
| /mcp | POST | JSON-RPC 2.0 tool execution |
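A quick smoke test against the two GET endpoints, assuming the default host and port from the Configuration section; response shapes are not documented here, so the example just prints whatever JSON comes back.

```python
import requests

base = "http://localhost:8001"

# Health check and tool listing against a locally running gateway.
print(requests.get(f"{base}/health", timeout=5).json())
print(requests.get(f"{base}/mcp/tools", timeout=5).json())
```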