
LLM Gateway MCP Server

Unified LLM access with failover chains, cost tracking, and streaming support.

Features

  • Multi-Provider Support: Anthropic, OpenAI, Google, DeepSeek
  • Automatic Failover: Circuit breaker with configurable thresholds
  • Cost Tracking: Redis-based per-project/agent usage tracking
  • Streaming: SSE support for real-time token delivery
  • Model Groups: Pre-configured chains for different use cases

Quick Start

# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py

Configuration

Environment variables:

LLM_GATEWAY_HOST=0.0.0.0
LLM_GATEWAY_PORT=8001
LLM_GATEWAY_DEBUG=false
LLM_GATEWAY_REDIS_URL=redis://localhost:6379/1

# Provider API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
DEEPSEEK_API_KEY=...
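
As an illustration of how these variables might be read at startup, here is a minimal sketch. The `GatewaySettings` class and `load_settings` function are hypothetical names, not part of the gateway, and treating the values shown above as defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GatewaySettings:
    """Hypothetical container for the LLM_GATEWAY_* variables listed above."""
    host: str
    port: int
    debug: bool
    redis_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> GatewaySettings:
    """Read settings from a mapping (os.environ when none is given).

    Assumption: the example values from this README act as fallbacks.
    """
    env = os.environ if env is None else env
    debug_raw = env.get("LLM_GATEWAY_DEBUG", "false")
    return GatewaySettings(
        host=env.get("LLM_GATEWAY_HOST", "0.0.0.0"),
        port=int(env.get("LLM_GATEWAY_PORT", "8001")),
        # Accept common truthy spellings; anything else is treated as False.
        debug=debug_raw.strip().lower() in {"1", "true", "yes"},
        redis_url=env.get("LLM_GATEWAY_REDIS_URL", "redis://localhost:6379/1"),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the loader easy to exercise in tests.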

MCP Tools

chat_completion

Generate completions with automatic failover.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "messages": [{"role": "user", "content": "Hello"}],
  "model_group": "reasoning",
  "max_tokens": 4096,
  "temperature": 0.7,
  "stream": false
}

count_tokens

Count tokens in text using tiktoken.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "text": "Hello, world!",
  "model": "gpt-4"
}

list_models

List available models by group.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "model_group": "code"
}

get_usage

Get usage statistics for a project and agent over a given period (for example, "day").

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "period": "day"
}

Model Groups

Group      Primary                 Fallback 1    Fallback 2
reasoning  claude-opus-4-5         gpt-4.1       gemini-2.5-pro
code       claude-sonnet-4         gpt-4.1       deepseek-coder
fast       claude-haiku            gpt-4.1-mini  gemini-flash
vision     claude-sonnet-4         gpt-4.1       gemini-2.5-pro
embedding  text-embedding-3-large  voyage-3      -
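
The chains above can be read as an ordered list per group: try the primary, then each fallback in turn. The sketch below shows that idea only. `MODEL_GROUPS` mirrors the table, while `complete_with_failover` and the `call_model` callback are hypothetical names, and the circuit-breaker checks are left out for brevity.

```python
from typing import Callable, Dict, List, Tuple

# Ordered chains copied from the Model Groups table (primary first).
MODEL_GROUPS: Dict[str, List[str]] = {
    "reasoning": ["claude-opus-4-5", "gpt-4.1", "gemini-2.5-pro"],
    "code": ["claude-sonnet-4", "gpt-4.1", "deepseek-coder"],
    "fast": ["claude-haiku", "gpt-4.1-mini", "gemini-flash"],
    "vision": ["claude-sonnet-4", "gpt-4.1", "gemini-2.5-pro"],
    "embedding": ["text-embedding-3-large", "voyage-3"],
}


def complete_with_failover(
    group: str, call_model: Callable[[str], str]
) -> Tuple[str, str]:
    """Try each model in the group's chain; return (model, result) on first success.

    `call_model` is a caller-supplied function (an assumption of this sketch)
    that raises an exception when a provider request fails.
    """
    errors: List[str] = []
    for model in MODEL_GROUPS[group]:
        try:
            return model, call_model(model)
        except Exception as exc:  # any provider error moves on to the next model
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models in group failed: " + "; ".join(errors))
```

Returning the model name alongside the result lets the caller record which provider actually served the request.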

Circuit Breaker

  • Threshold: 5 consecutive failures
  • Cooldown: 30 seconds
  • Half-Open: After cooldown, allows one test request
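
The three rules above map onto a small state machine. The following is a minimal single-threaded sketch, not the gateway's actual implementation: the `CircuitBreaker` class and its method names are hypothetical, and reopening the circuit when the test request fails is an assumption about behavior the list does not spell out.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Illustrative breaker: opens after `threshold` consecutive failures."""

    def __init__(
        self,
        threshold: int = 5,
        cooldown: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None
        self._probe_in_flight = False

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown:
            return "half_open"
        return "open"

    def allow_request(self) -> bool:
        state = self.state
        if state == "closed":
            return True
        # Half-open: let exactly one test request through.
        if state == "half_open" and not self._probe_in_flight:
            self._probe_in_flight = True
            return True
        return False

    def record_success(self) -> None:
        # Any success resets the consecutive-failure count and closes the circuit.
        self._failures = 0
        self._opened_at = None
        self._probe_in_flight = False

    def record_failure(self) -> None:
        self._probe_in_flight = False
        self._failures += 1
        if self._failures >= self.threshold:
            # Open (or reopen) the circuit and restart the cooldown.
            self._opened_at = self._clock()
```

The injectable `clock` is a design choice for testability: a fake clock can step past the 30-second cooldown without sleeping.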

Testing

# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v

API Endpoints

Endpoint    Method  Description
/health     GET     Health check
/mcp/tools  GET     List available tools
/mcp        POST    JSON-RPC 2.0 tool execution
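
To make the POST /mcp endpoint concrete, here is a rough sketch of wrapping a tool's arguments in a JSON-RPC 2.0 envelope. The `jsonrpc`, `id`, `method`, and `params` fields come from JSON-RPC 2.0 itself. The `tools/call` method name and the `name`/`arguments` params shape follow the general MCP convention and are an assumption here, since this README does not state what the server expects. `build_tool_call` and `send` are hypothetical helpers, and the URL assumes the default port shown under Configuration.

```python
import json
import urllib.request
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def build_tool_call(
    tool: str, arguments: Dict[str, Any], method: str = "tools/call"
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request body for one tool invocation.

    Assumption: the server accepts the MCP-style `tools/call` method.
    """
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": {"name": tool, "arguments": arguments},
    }


def send(payload: Dict[str, Any], url: str = "http://localhost:8001/mcp") -> Dict[str, Any]:
    """POST the payload as JSON and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Example payload using the get_usage arguments shown earlier.
usage_request = build_tool_call(
    "get_usage",
    {"project_id": "proj-123", "agent_id": "agent-456", "period": "day"},
)
```

With a server running locally, `send(usage_request)` would submit the call. The response shape is not documented here, so inspect it before relying on specific fields.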