# LLM Gateway MCP Server
Unified LLM access with failover chains, cost tracking, and streaming support.
## Features
- Multi-Provider Support: Anthropic, OpenAI, Google, DeepSeek
- Automatic Failover: Circuit breaker with configurable thresholds
- Cost Tracking: Redis-based per-project/agent usage tracking
- Streaming: SSE support for real-time token delivery
- Model Groups: Pre-configured chains for different use cases
## Quick Start

```bash
# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py
```
## Configuration

Environment variables:

```bash
LLM_GATEWAY_HOST=0.0.0.0
LLM_GATEWAY_PORT=8001
LLM_GATEWAY_DEBUG=false
LLM_GATEWAY_REDIS_URL=redis://localhost:6379/1

# Provider API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
DEEPSEEK_API_KEY=...
```
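As a rough illustration of how these variables map to runtime settings, here is a minimal sketch of a config loader. The function name `load_config`, the defaults, and the dictionary shape are assumptions for illustration, not the server's actual configuration code.

```python
import os

# Hypothetical helper: one way the gateway could read the variables above.
# Defaults mirror the values shown in the Configuration section.
def load_config() -> dict:
    return {
        "host": os.getenv("LLM_GATEWAY_HOST", "0.0.0.0"),
        "port": int(os.getenv("LLM_GATEWAY_PORT", "8001")),
        "debug": os.getenv("LLM_GATEWAY_DEBUG", "false").lower() == "true",
        "redis_url": os.getenv("LLM_GATEWAY_REDIS_URL", "redis://localhost:6379/1"),
        "anthropic_api_key": os.getenv("ANTHROPIC_API_KEY"),
        "openai_api_key": os.getenv("OPENAI_API_KEY"),
        "google_api_key": os.getenv("GOOGLE_API_KEY"),
        "deepseek_api_key": os.getenv("DEEPSEEK_API_KEY"),
    }
```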
## MCP Tools

### chat_completion

Generate completions with automatic failover.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "messages": [{"role": "user", "content": "Hello"}],
  "model_group": "reasoning",
  "max_tokens": 4096,
  "temperature": 0.7,
  "stream": false
}
```
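A hedged client-side sketch of invoking this tool over the `/mcp` JSON-RPC endpoint. The `tools/call` envelope follows the common MCP convention and the `localhost:8001` URL comes from the default configuration; the exact request shape this server expects may differ.

```python
import requests

# Assumed JSON-RPC 2.0 envelope for calling the chat_completion tool.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "chat_completion",
        "arguments": {
            "project_id": "proj-123",
            "agent_id": "agent-456",
            "messages": [{"role": "user", "content": "Hello"}],
            "model_group": "reasoning",
            "max_tokens": 4096,
            "temperature": 0.7,
            "stream": False,
        },
    },
}

resp = requests.post("http://localhost:8001/mcp", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```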
### count_tokens

Count tokens in text using tiktoken.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "text": "Hello, world!",
  "model": "gpt-4"
}
```
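For reference, a minimal sketch of tiktoken-based counting along the lines this tool describes; the fallback encoding and function signature are illustrative, not the server's implementation.

```python
import tiktoken

# Count tokens with the model's encoding; fall back to a generic
# encoding for models tiktoken does not recognize.
def count_tokens(text: str, model: str = "gpt-4") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

print(count_tokens("Hello, world!", "gpt-4"))
```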
### list_models

List available models by group.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "model_group": "code"
}
```
### get_usage

Get usage statistics.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "period": "day"
}
```
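To make the Redis-based cost tracking concrete, here is a hypothetical sketch of how per-project/agent usage might be recorded and read back. The key layout, field names, and expiry are assumptions, not the server's actual schema.

```python
import redis

r = redis.Redis.from_url("redis://localhost:6379/1")

def record_usage(project_id: str, agent_id: str, tokens: int, cost_usd: float) -> None:
    # Hypothetical key layout: one hash per project/agent per period.
    key = f"usage:{project_id}:{agent_id}:day"
    pipe = r.pipeline()
    pipe.hincrby(key, "tokens", tokens)
    pipe.hincrbyfloat(key, "cost_usd", cost_usd)
    pipe.expire(key, 60 * 60 * 24)  # roll the counter over daily
    pipe.execute()

def get_usage(project_id: str, agent_id: str) -> dict:
    key = f"usage:{project_id}:{agent_id}:day"
    return {k.decode(): v.decode() for k, v in r.hgetall(key).items()}
```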
## Model Groups
| Group | Primary | Fallback 1 | Fallback 2 |
|---|---|---|---|
| reasoning | claude-opus-4-5 | gpt-4.1 | gemini-2.5-pro |
| code | claude-sonnet-4 | gpt-4.1 | deepseek-coder |
| fast | claude-haiku | gpt-4.1-mini | gemini-flash |
| vision | claude-sonnet-4 | gpt-4.1 | gemini-2.5-pro |
| embedding | text-embedding-3-large | voyage-3 | - |
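The table above can be read as ordered failover chains. A sketch of that mapping as a plain data structure (the constant name and shape are illustrative, not the server's internal representation):

```python
# Failover order per group, taken directly from the table above.
MODEL_GROUPS: dict[str, list[str]] = {
    "reasoning": ["claude-opus-4-5", "gpt-4.1", "gemini-2.5-pro"],
    "code": ["claude-sonnet-4", "gpt-4.1", "deepseek-coder"],
    "fast": ["claude-haiku", "gpt-4.1-mini", "gemini-flash"],
    "vision": ["claude-sonnet-4", "gpt-4.1", "gemini-2.5-pro"],
    "embedding": ["text-embedding-3-large", "voyage-3"],
}
```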
## Circuit Breaker
- Threshold: 5 consecutive failures
- Cooldown: 30 seconds
- Half-Open: After cooldown, allows one test request
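A minimal sketch of this behavior, assuming the thresholds listed above; class and method names are illustrative and the real implementation may differ.

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures, allow one test request after cooldown."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # half-open: let one test request through
        return False  # open: fail fast and move to the next provider

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()
```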
## Testing

```bash
# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v
```
## API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /mcp/tools | GET | List available tools |
| /mcp | POST | JSON-RPC 2.0 tool execution |
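A quick smoke test against the two GET endpoints, assuming the default host and port from the Configuration section; response shapes are not documented here, so the example just prints whatever JSON comes back.

```python
import requests

base = "http://localhost:8001"

# Health check and tool listing against a locally running gateway.
print(requests.get(f"{base}/health", timeout=5).json())
print(requests.get(f"{base}/mcp/tools", timeout=5).json())
```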