forked from cardosofelipe/fast-next-template


Felipe Cardoso 2ab69f8561 docs(mcp): add comprehensive MCP server documentation

- Add docs/architecture/MCP_SERVERS.md with full architecture overview
- Add README.md for LLM Gateway with quick start, tools, and model groups
- Add README.md for Knowledge Base with search types, chunking strategies
- Include API endpoints, security guidelines, and testing instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-04 01:37:04 +01:00

2.5 KiB


LLM Gateway MCP Server

Unified LLM access with failover chains, cost tracking, and streaming support.

Features

Multi-Provider Support: Anthropic, OpenAI, Google, DeepSeek
Automatic Failover: Circuit breaker with configurable thresholds
Cost Tracking: Redis-based per-project/agent usage tracking
Streaming: SSE support for real-time token delivery
Model Groups: Pre-configured chains for different use cases

Quick Start

# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py

Configuration

Environment variables:

LLM_GATEWAY_HOST=0.0.0.0
LLM_GATEWAY_PORT=8001
LLM_GATEWAY_DEBUG=false
LLM_GATEWAY_REDIS_URL=redis://localhost:6379/1

# Provider API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
DEEPSEEK_API_KEY=...
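A minimal sketch of reading these variables in Python, assuming the values shown above are the defaults. `GatewaySettings`, `load_settings`, and the accepted spellings for a true `LLM_GATEWAY_DEBUG` are hypothetical names chosen for illustration, not the server's actual configuration code.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional

# Hypothetical mapping of provider name -> environment variable holding its key.
PROVIDER_KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


@dataclass(frozen=True)
class GatewaySettings:
    """Hypothetical settings object mirroring the environment variables above."""

    host: str = "0.0.0.0"
    port: int = 8001
    debug: bool = False
    redis_url: str = "redis://localhost:6379/1"
    # Only providers whose key is set (and non-empty) appear here.
    api_keys: Dict[str, str] = field(default_factory=dict)


def load_settings(env: Optional[Mapping[str, str]] = None) -> GatewaySettings:
    """Build settings from a mapping (defaults to os.environ), falling back to the listed values."""
    env = os.environ if env is None else env
    keys = {name: env[var] for name, var in PROVIDER_KEY_VARS.items() if env.get(var)}
    return GatewaySettings(
        host=env.get("LLM_GATEWAY_HOST", "0.0.0.0"),
        port=int(env.get("LLM_GATEWAY_PORT", "8001")),
        # Assumption: "1", "true", and "yes" (any case) count as enabled.
        debug=env.get("LLM_GATEWAY_DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        redis_url=env.get("LLM_GATEWAY_REDIS_URL", "redis://localhost:6379/1"),
        api_keys=keys,
    )
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the loader easy to exercise in tests.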

MCP Tools

chat_completion

Generate a chat completion, failing over along the selected model group's chain if a provider errors.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "messages": [{"role": "user", "content": "Hello"}],
  "model_group": "reasoning",
  "max_tokens": 4096,
  "temperature": 0.7,
  "stream": false
}

count_tokens

Count tokens in text using tiktoken.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "text": "Hello, world!",
  "model": "gpt-4"
}

list_models

List available models by group.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "model_group": "code"
}

get_usage

Get usage statistics for a project and agent over the given period.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "period": "day"
}

Model Groups

Group	Primary	Fallback 1	Fallback 2
reasoning	claude-opus-4-5	gpt-4.1	gemini-2.5-pro
code	claude-sonnet-4	gpt-4.1	deepseek-coder
fast	claude-haiku	gpt-4.1-mini	gemini-flash
vision	claude-sonnet-4	gpt-4.1	gemini-2.5-pro
embedding	text-embedding-3-large	voyage-3	-
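Each group reads as an ordered failover chain: the primary model is tried first and each fallback only if the one before it fails. A minimal sketch, assuming a caller-supplied `call(model)` function that raises on failure; `MODEL_GROUPS`, `complete_with_failover`, and `AllModelsFailed` are hypothetical and do not reproduce the gateway's real routing code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical copy of the table above: primary first, then fallbacks in order.
MODEL_GROUPS: Dict[str, List[str]] = {
    "reasoning": ["claude-opus-4-5", "gpt-4.1", "gemini-2.5-pro"],
    "code": ["claude-sonnet-4", "gpt-4.1", "deepseek-coder"],
    "fast": ["claude-haiku", "gpt-4.1-mini", "gemini-flash"],
    "vision": ["claude-sonnet-4", "gpt-4.1", "gemini-2.5-pro"],
    "embedding": ["text-embedding-3-large", "voyage-3"],
}


class AllModelsFailed(Exception):
    """Raised when every model in a group's chain fails."""


def complete_with_failover(group: str, call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in the group in order; return (model, result) for the first success."""
    errors = []
    for model in MODEL_GROUPS[group]:
        try:
            return model, call(model)
        except Exception as exc:  # a real gateway would catch provider-specific errors
            errors.append((model, exc))
    raise AllModelsFailed(f"all models in {group!r} failed: {errors}")
```

For example, if `call` raises for `claude-sonnet-4` but succeeds otherwise, a `code` request returns a result from `gpt-4.1` and never reaches `deepseek-coder`.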

Circuit Breaker

Threshold: 5 consecutive failures
Cooldown: 30 seconds
Half-Open: After cooldown, allows one test request
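These three settings describe the usual closed, open, and half-open breaker cycle. The class below is a minimal sketch of that cycle, assuming a success resets the failure count and a failed half-open trial reopens the circuit for another cooldown; `CircuitBreaker` and its methods are hypothetical, and the gateway may scope breakers per provider or per model differently.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Sketch of a breaker cycling closed -> open -> half-open -> closed."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold      # consecutive failures before opening
        self.cooldown = cooldown        # seconds to stay open
        self.clock = clock              # injectable for tests
        self.failures = 0
        self.opened_at: Optional[float] = None
        self.trial_in_flight = False

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at < self.cooldown:
            return "open"
        return "half-open"

    def allow_request(self) -> bool:
        state = self.state
        if state == "closed":
            return True
        if state == "half-open" and not self.trial_in_flight:
            self.trial_in_flight = True  # let exactly one test request through
            return True
        return False

    def record_success(self) -> None:
        # Any success closes the circuit and resets the consecutive-failure count.
        self.failures = 0
        self.opened_at = None
        self.trial_in_flight = False

    def record_failure(self) -> None:
        self.failures += 1
        if self.trial_in_flight or self.failures >= self.threshold:
            # A failed trial or too many consecutive failures (re)opens the circuit.
            self.opened_at = self.clock()
        self.trial_in_flight = False
```

Injecting `clock` lets tests advance time without sleeping through the 30-second cooldown.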

Testing

# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v

API Endpoints

Endpoint	Method	Description
`/health`	GET	Health check
`/mcp/tools`	GET	List available tools
`/mcp`	POST	JSON-RPC 2.0 tool execution
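Tool calls go to `/mcp` as JSON-RPC 2.0 requests. A minimal sketch of preparing one with the standard library, assuming the MCP convention of a `tools/call` method whose `params` carry the tool `name` and its `arguments`, and assuming the server listens on the default port from the Configuration section; `build_tool_call` and `make_http_request` are hypothetical helpers, not part of the server.

```python
import itertools
import json
import urllib.request
from typing import Any, Dict, Optional

_ids = itertools.count(1)  # simple source of unique request ids


def build_tool_call(tool: str, arguments: Dict[str, Any],
                    request_id: Optional[int] = None) -> Dict[str, Any]:
    """Wrap tool arguments in a JSON-RPC 2.0 request envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids) if request_id is None else request_id,
        "method": "tools/call",  # assumption: MCP-style method name
        "params": {"name": tool, "arguments": arguments},
    }


def make_http_request(payload: Dict[str, Any],
                      base_url: str = "http://localhost:8001") -> urllib.request.Request:
    """Prepare (but do not send) a POST of the payload to the /mcp endpoint."""
    return urllib.request.Request(
        f"{base_url}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = build_tool_call("chat_completion", {
    "project_id": "proj-123",
    "agent_id": "agent-456",
    "messages": [{"role": "user", "content": "Hello"}],
    "model_group": "reasoning",
})
request = make_http_request(payload)
```

Passing `request` to `urllib.request.urlopen` would send it to a running server; the same envelope works for `count_tokens`, `list_models`, and `get_usage` by changing the tool name and arguments.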
