# LLM Gateway MCP Server

Unified LLM access with failover chains, cost tracking, and streaming support.

## Features
- Multi-Provider Support: Anthropic, OpenAI, Google, DeepSeek
- Automatic Failover: Circuit breaker with configurable thresholds
- Cost Tracking: Redis-based per-project/agent usage tracking
- Streaming: SSE support for real-time token delivery
- Model Groups: Pre-configured chains for different use cases
## Quick Start

```bash
# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py
```
## Configuration

Environment variables:

```bash
LLM_GATEWAY_HOST=0.0.0.0
LLM_GATEWAY_PORT=8001
LLM_GATEWAY_DEBUG=false
LLM_GATEWAY_REDIS_URL=redis://localhost:6379/1

# Provider API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
DEEPSEEK_API_KEY=...
```
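The variables above can be loaded like this minimal sketch; the names and defaults come from the README, but the server itself may read them differently (e.g. via pydantic-settings):

```python
import os

def load_gateway_config(env=None):
    """Read gateway settings from environment variables, falling back to
    the documented defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("LLM_GATEWAY_HOST", "0.0.0.0"),
        "port": int(env.get("LLM_GATEWAY_PORT", "8001")),
        "debug": env.get("LLM_GATEWAY_DEBUG", "false").lower() == "true",
        "redis_url": env.get("LLM_GATEWAY_REDIS_URL", "redis://localhost:6379/1"),
    }

# Override just the port; everything else falls back to the defaults
print(load_gateway_config({"LLM_GATEWAY_PORT": "9000"})["port"])  # -> 9000
```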
## MCP Tools

### chat_completion

Generate completions with automatic failover.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "messages": [{"role": "user", "content": "Hello"}],
  "model_group": "reasoning",
  "max_tokens": 4096,
  "temperature": 0.7,
  "stream": false
}
```
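A sketch of wrapping these arguments in a JSON-RPC 2.0 request for the `/mcp` endpoint. The `tools/call` method name follows the common MCP convention and is an assumption here; check the server for the exact method it expects:

```python
import json

def make_jsonrpc_request(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 payload naming an MCP tool and its arguments."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # assumed MCP method name
        "params": {"name": tool, "arguments": arguments},
    })

payload = make_jsonrpc_request("chat_completion", {
    "project_id": "proj-123",
    "agent_id": "agent-456",
    "messages": [{"role": "user", "content": "Hello"}],
    "model_group": "reasoning",
    "max_tokens": 4096,
    "temperature": 0.7,
    "stream": False,
})
print(payload)
```

The resulting string can be POSTed to `/mcp` with any HTTP client.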
### count_tokens

Count tokens in text using tiktoken.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "text": "Hello, world!",
  "model": "gpt-4"
}
```
### list_models

List available models by group.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "model_group": "code"
}
```
### get_usage

Get usage statistics.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "period": "day"
}
```
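An in-memory sketch of the per-project/agent counters that the Redis-based tracker maintains. The key layout is an assumption for illustration; the gateway's actual Redis schema is not documented here:

```python
from collections import defaultdict
from datetime import date

class UsageTracker:
    """In-memory stand-in for the Redis-backed usage tracker, showing how
    token counts roll up per (project, agent, period bucket)."""
    def __init__(self):
        self.counters = defaultdict(int)

    def record(self, project_id, agent_id, tokens, day=None):
        day = day or date.today().isoformat()
        self.counters[(project_id, agent_id, day)] += tokens

    def get_usage(self, project_id, agent_id, day=None):
        day = day or date.today().isoformat()
        return self.counters[(project_id, agent_id, day)]

tracker = UsageTracker()
tracker.record("proj-123", "agent-456", 120, day="2025-01-01")
tracker.record("proj-123", "agent-456", 80, day="2025-01-01")
print(tracker.get_usage("proj-123", "agent-456", day="2025-01-01"))  # -> 200
```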
## Model Groups
| Group | Primary | Fallback 1 | Fallback 2 |
|---|---|---|---|
| reasoning | claude-opus-4-5 | gpt-4.1 | gemini-2.5-pro |
| code | claude-sonnet-4 | gpt-4.1 | deepseek-coder |
| fast | claude-haiku | gpt-4.1-mini | gemini-flash |
| vision | claude-sonnet-4 | gpt-4.1 | gemini-2.5-pro |
| embedding | text-embedding-3-large | voyage-3 | - |
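The failover behavior behind these chains can be sketched as follows. This is an illustrative sketch, not the gateway's actual implementation; the `call_provider` callback and the error handling are assumptions:

```python
def complete_with_failover(chain, prompt, call_provider):
    """Try each model in a group's chain in order, returning the first
    successful response; raise only if every model fails."""
    errors = []
    for model in chain:
        try:
            return call_provider(model, prompt)
        except RuntimeError as exc:
            errors.append((model, str(exc)))
    raise RuntimeError(f"all models in chain failed: {errors}")

# Demo with a fake provider where the primary model is unavailable
chain = ["claude-opus-4-5", "gpt-4.1", "gemini-2.5-pro"]

def fake_call(model, prompt):
    if model == "claude-opus-4-5":
        raise RuntimeError("provider down")
    return f"{model}: ok"

print(complete_with_failover(chain, "hi", fake_call))  # -> gpt-4.1: ok
```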
## Circuit Breaker
- Threshold: 5 consecutive failures
- Cooldown: 30 seconds
- Half-Open: After cooldown, allows one test request
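The policy above can be sketched as a small state machine; this is an illustrative sketch of the documented thresholds, not the gateway's internal implementation:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures, then allow a single
    test request once `cooldown` seconds have elapsed (half-open)."""
    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # half-open: allow one test request
        return False     # open: reject until cooldown elapses

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()  # trip the breaker

cb = CircuitBreaker()
for _ in range(5):
    cb.record_failure()
print(cb.allow_request())  # -> False: circuit is open after 5 failures
```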
## Testing

```bash
# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v
```
## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/mcp/tools` | GET | List available tools |
| `/mcp` | POST | JSON-RPC 2.0 tool execution |