Felipe Cardoso 2ab69f8561 docs(mcp): add comprehensive MCP server documentation
- Add docs/architecture/MCP_SERVERS.md with full architecture overview
- Add README.md for LLM Gateway with quick start, tools, and model groups
- Add README.md for Knowledge Base with search types, chunking strategies
- Include API endpoints, security guidelines, and testing instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:37:04 +01:00


LLM Gateway MCP Server

Unified LLM access with failover chains, cost tracking, and streaming support.

Features

  • Multi-Provider Support: Anthropic, OpenAI, Google, DeepSeek
  • Automatic Failover: Circuit breaker with configurable thresholds
  • Cost Tracking: Redis-based per-project/agent usage tracking
  • Streaming: SSE support for real-time token delivery
  • Model Groups: Pre-configured chains for different use cases

Quick Start

# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py

Configuration

Environment variables:

LLM_GATEWAY_HOST=0.0.0.0
LLM_GATEWAY_PORT=8001
LLM_GATEWAY_DEBUG=false
LLM_GATEWAY_REDIS_URL=redis://localhost:6379/1

# Provider API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
DEEPSEEK_API_KEY=...
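
As a rough illustration of how these variables might be read at startup, the sketch below parses them into a typed settings object. The `GatewaySettings` class and `load_settings` function are hypothetical names, and using the values shown above as defaults is an assumption, not something the server is confirmed to do.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewaySettings:
    """Hypothetical container for the LLM_GATEWAY_* variables."""
    host: str
    port: int
    debug: bool
    redis_url: str


def load_settings(env=None) -> GatewaySettings:
    """Build settings from a mapping (defaults to os.environ).

    Fallback values mirror the example configuration; treating them as
    defaults is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    debug_raw = env.get("LLM_GATEWAY_DEBUG", "false")
    return GatewaySettings(
        host=env.get("LLM_GATEWAY_HOST", "0.0.0.0"),
        port=int(env.get("LLM_GATEWAY_PORT", "8001")),
        # Accept a few common truthy spellings, case-insensitively.
        debug=debug_raw.strip().lower() in {"1", "true", "yes"},
        redis_url=env.get("LLM_GATEWAY_REDIS_URL", "redis://localhost:6379/1"),
    )
```

Passing a plain dict instead of `os.environ` keeps the parsing easy to test without touching the process environment.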

MCP Tools

chat_completion

Generate completions with automatic failover.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "messages": [{"role": "user", "content": "Hello"}],
  "model_group": "reasoning",
  "max_tokens": 4096,
  "temperature": 0.7,
  "stream": false
}

count_tokens

Count tokens in text using tiktoken.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "text": "Hello, world!",
  "model": "gpt-4"
}

list_models

List available models by group.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "model_group": "code"
}

get_usage

Get usage statistics for a project and agent over a given period.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "period": "day"
}
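
To make the per-project/agent bookkeeping concrete, here is a minimal in-memory sketch of daily usage aggregation. It is a stand-in for the Redis-based tracker, not its implementation: the `UsageTracker` class, the `usage:<project>:<agent>:<date>` key layout, and the recorded fields are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


class UsageTracker:
    """In-memory stand-in for a Redis-backed tracker (hypothetical key layout)."""

    def __init__(self):
        self._totals = defaultdict(
            lambda: {"requests": 0, "tokens": 0, "cost_usd": 0.0}
        )

    @staticmethod
    def _key(project_id, agent_id, when):
        # One bucket per project, agent, and calendar day.
        return f"usage:{project_id}:{agent_id}:{when:%Y-%m-%d}"

    def record(self, project_id, agent_id, tokens, cost_usd, when=None):
        """Add one request's tokens and cost to the matching daily bucket."""
        when = when or datetime.now(timezone.utc)
        bucket = self._totals[self._key(project_id, agent_id, when)]
        bucket["requests"] += 1
        bucket["tokens"] += tokens
        bucket["cost_usd"] += cost_usd

    def get_usage_for_day(self, project_id, agent_id, when=None):
        """Return a copy of the totals for the given day (zeros if unused)."""
        when = when or datetime.now(timezone.utc)
        return dict(self._totals[self._key(project_id, agent_id, when)])
```

With Redis, the same layout could map onto one hash per key with atomic increments, though the real schema may differ.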

Model Groups

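| Group     | Primary                | Fallback 1   | Fallback 2     |
|-----------|------------------------|--------------|----------------|
| reasoning | claude-opus-4-5        | gpt-4.1      | gemini-2.5-pro |
| code      | claude-sonnet-4        | gpt-4.1      | deepseek-coder |
| fast      | claude-haiku           | gpt-4.1-mini | gemini-flash   |
| vision    | claude-sonnet-4        | gpt-4.1      | gemini-2.5-pro |
| embedding | text-embedding-3-large | voyage-3     | -              |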
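
The chains above can be read as an ordered list of models to try in turn. The sketch below shows one plausible way to walk such a chain; `complete_with_failover` and the `call_model` callback are hypothetical names, and treating every exception as a reason to fall through is a simplifying assumption.

```python
from typing import Callable

# Chains copied from the Model Groups table (primary first).
MODEL_GROUPS = {
    "reasoning": ["claude-opus-4-5", "gpt-4.1", "gemini-2.5-pro"],
    "code": ["claude-sonnet-4", "gpt-4.1", "deepseek-coder"],
    "fast": ["claude-haiku", "gpt-4.1-mini", "gemini-flash"],
    "vision": ["claude-sonnet-4", "gpt-4.1", "gemini-2.5-pro"],
    "embedding": ["text-embedding-3-large", "voyage-3"],
}


def complete_with_failover(
    group: str, call_model: Callable[[str], str]
) -> tuple[str, str]:
    """Try each model in the group's chain; return (model, result) on first success.

    `call_model` is a hypothetical callback that performs the provider
    request and raises on failure.
    """
    errors = []
    for model in MODEL_GROUPS[group]:
        try:
            return model, call_model(model)
        except Exception as exc:  # sketch only: real code would narrow this
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Returning the model name alongside the result lets the caller attribute cost to the model that actually served the request.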

Circuit Breaker

  • Threshold: 5 consecutive failures
  • Cooldown: 30 seconds
  • Half-Open: After cooldown, allows one test request
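
The three states listed above can be sketched as a small state machine. This is a minimal illustration using the stated threshold and cooldown, not the gateway's actual code: the `CircuitBreaker` class and its method names are hypothetical, and the behavior after the test request (success closes the circuit, failure reopens it and restarts the cooldown) is an assumption the list does not spell out.

```python
import time


class CircuitBreaker:
    """Sketch of a closed / open / half-open breaker (hypothetical API)."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold      # consecutive failures before opening
        self.cooldown = cooldown        # seconds to stay open
        self.clock = clock              # injectable for testing
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed
        self.probe_in_flight = False    # True while the half-open test runs

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True                 # closed: traffic flows normally
        if self.clock() - self.opened_at < self.cooldown:
            return False                # open: still cooling down
        if self.probe_in_flight:
            return False                # half-open: test request already out
        self.probe_in_flight = True     # half-open: let one test request through
        return True

    def record_success(self) -> None:
        # Assumption: any success fully closes the circuit.
        self.failures = 0
        self.opened_at = None
        self.probe_in_flight = False

    def record_failure(self) -> None:
        self.failures += 1
        # Assumption: a failed test request reopens and restarts the cooldown.
        if self.probe_in_flight or self.failures >= self.threshold:
            self.opened_at = self.clock()
        self.probe_in_flight = False
```

Injecting the clock makes the cooldown testable without sleeping for 30 seconds.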

Testing

# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v

API Endpoints

| Endpoint   | Method | Description                 |
|------------|--------|-----------------------------|
| /health    | GET    | Health check                |
| /mcp/tools | GET    | List available tools        |
| /mcp       | POST   | JSON-RPC 2.0 tool execution |
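
For the `/mcp` endpoint, a request body might be assembled as in the sketch below. The `jsonrpc`, `id`, `method`, and `params` fields come from JSON-RPC 2.0 itself; using `tools/call` as the method with `name` and `arguments` parameters follows the general MCP convention and is an assumption about this server. `build_tool_call` is a hypothetical helper.

```python
import json
from itertools import count

# Monotonically increasing request ids, unique within this process.
_request_ids = count(1)


def build_tool_call(tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that invokes one MCP tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",  # assumption: standard MCP method name
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload)
```

The resulting string would be sent as the POST body with a `Content-Type: application/json` header, for example to `http://localhost:8001/mcp` if the example port from the configuration is in use.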