# LLM Gateway MCP Server

Unified LLM access with failover chains, cost tracking, and streaming support.

## Features

- **Multi-Provider Support**: Anthropic, OpenAI, Google, DeepSeek
- **Automatic Failover**: Circuit breaker with configurable thresholds
- **Cost Tracking**: Redis-based per-project/agent usage tracking
- **Streaming**: SSE support for real-time token delivery
- **Model Groups**: Pre-configured chains for different use cases

## Quick Start

```bash
# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py
```

## Configuration

Environment variables:

```bash
LLM_GATEWAY_HOST=0.0.0.0
LLM_GATEWAY_PORT=8001
LLM_GATEWAY_DEBUG=false
LLM_GATEWAY_REDIS_URL=redis://localhost:6379/1

# Provider API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
DEEPSEEK_API_KEY=...
```

## MCP Tools

### chat_completion

Generate completions with automatic failover.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "messages": [{"role": "user", "content": "Hello"}],
  "model_group": "reasoning",
  "max_tokens": 4096,
  "temperature": 0.7,
  "stream": false
}
```

### count_tokens

Count tokens in text using tiktoken.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "text": "Hello, world!",
  "model": "gpt-4"
}
```

### list_models

List available models by group.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "model_group": "code"
}
```

### get_usage

Get usage statistics.
```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "period": "day"
}
```

## Model Groups

| Group | Primary | Fallback 1 | Fallback 2 |
|-------|---------|------------|------------|
| reasoning | claude-opus-4-5 | gpt-4.1 | gemini-2.5-pro |
| code | claude-sonnet-4 | gpt-4.1 | deepseek-coder |
| fast | claude-haiku | gpt-4.1-mini | gemini-flash |
| vision | claude-sonnet-4 | gpt-4.1 | gemini-2.5-pro |
| embedding | text-embedding-3-large | voyage-3 | - |

## Circuit Breaker

- **Threshold**: 5 consecutive failures
- **Cooldown**: 30 seconds
- **Half-Open**: After cooldown, allows one test request

## Testing

```bash
# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/mcp/tools` | GET | List available tools |
| `/mcp` | POST | JSON-RPC 2.0 tool execution |
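As a client-side illustration, the `/mcp` endpoint accepts JSON-RPC 2.0 tool calls like the `chat_completion` payload shown earlier. The sketch below is an assumption-laden example, not part of the server: it presumes the standard MCP `tools/call` method name and the default host/port from the configuration section, so check `/mcp/tools` for the authoritative tool names and schemas.

```python
import json
from urllib import request

# Assumed from the configuration section above (LLM_GATEWAY_HOST/PORT).
GATEWAY_URL = "http://localhost:8001/mcp"

def build_tool_call(tool, arguments, req_id=1):
    """Build a JSON-RPC 2.0 envelope for an MCP tool call.

    Assumes the standard MCP "tools/call" method; the gateway's own
    dispatch may differ, so verify against /mcp/tools.
    """
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

payload = build_tool_call("chat_completion", {
    "project_id": "proj-123",
    "agent_id": "agent-456",
    "messages": [{"role": "user", "content": "Hello"}],
    "model_group": "fast",
})

# POST it to a running gateway (network call, shown commented out):
# req = request.Request(GATEWAY_URL, data=json.dumps(payload).encode(),
#                       headers={"Content-Type": "application/json"})
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```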
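The circuit-breaker behavior described above (5 consecutive failures open the circuit, a 30-second cooldown, then a half-open probe) can be sketched as a small state machine. This is an illustrative model of the documented parameters, not the server's actual implementation; the class and method names are hypothetical, and the injectable clock exists only to make the sketch testable.

```python
import time

class CircuitBreaker:
    """Minimal sketch of the failover circuit breaker described above."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold  # consecutive failures before opening
        self.cooldown = cooldown    # seconds before a half-open probe
        self.clock = clock          # injectable clock, for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block until the cooldown elapses, then allow a probe
        # request through (the half-open state).
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # open (or re-open) the circuit
```

In the gateway's failover chain, a provider whose breaker is open would simply be skipped and the next model in the group tried instead.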