# LLM Gateway MCP Server

Unified LLM access with failover chains, cost tracking, and streaming support.

## Features

- **Multi-Provider Support**: Anthropic, OpenAI, Google, DeepSeek
- **Automatic Failover**: Circuit breaker with configurable thresholds
- **Cost Tracking**: Redis-based per-project/agent usage tracking
- **Streaming**: SSE support for real-time token delivery
- **Model Groups**: Pre-configured chains for different use cases

## Quick Start

```bash
# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py
```
## Configuration

Environment variables:

```bash
LLM_GATEWAY_HOST=0.0.0.0
LLM_GATEWAY_PORT=8001
LLM_GATEWAY_DEBUG=false
LLM_GATEWAY_REDIS_URL=redis://localhost:6379/1

# Provider API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
DEEPSEEK_API_KEY=...
```
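The gateway is a Python service, so these variables would typically be loaded into a settings object at startup. A minimal sketch, assuming `pydantic-settings` and the `LLM_GATEWAY_` prefix; the class and field names here are illustrative, not the gateway's actual settings module:

```python
# Illustrative only -- assumes pydantic-settings; the real settings module
# may be structured differently. Provider API keys (ANTHROPIC_API_KEY, etc.)
# are unprefixed and would be read separately.
from pydantic_settings import BaseSettings, SettingsConfigDict


class GatewaySettings(BaseSettings):
    """Reads LLM_GATEWAY_* variables from the environment (or a .env file)."""

    model_config = SettingsConfigDict(env_prefix="LLM_GATEWAY_", env_file=".env")

    host: str = "0.0.0.0"
    port: int = 8001
    debug: bool = False
    redis_url: str = "redis://localhost:6379/1"


settings = GatewaySettings()  # LLM_GATEWAY_PORT=8001 -> settings.port == 8001
```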
## MCP Tools

### chat_completion

Generate completions with automatic failover.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "messages": [{"role": "user", "content": "Hello"}],
  "model_group": "reasoning",
  "max_tokens": 4096,
  "temperature": 0.7,
  "stream": false
}
```
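Tool arguments like the block above are carried in a JSON-RPC 2.0 envelope and POSTed to `/mcp` (see API Endpoints below). A minimal client sketch, assuming the standard MCP `tools/call` method name and the default port from the Configuration section:

```python
# Sketch of calling chat_completion through the /mcp JSON-RPC endpoint.
# The "tools/call" method name follows the standard MCP convention and is
# an assumption here; adjust if this server expects a different method.
import requests

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "chat_completion",
        "arguments": {
            "project_id": "proj-123",
            "agent_id": "agent-456",
            "messages": [{"role": "user", "content": "Hello"}],
            "model_group": "reasoning",
            "max_tokens": 4096,
            "temperature": 0.7,
            "stream": False,
        },
    },
}

response = requests.post("http://localhost:8001/mcp", json=payload, timeout=60)
print(response.json())
```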
### count_tokens

Count tokens in text using tiktoken.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "text": "Hello, world!",
  "model": "gpt-4"
}
```
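The count is roughly what you would get from tiktoken directly; a small sketch of the equivalent client-side computation (the tool's exact fallback behaviour for non-OpenAI model names isn't documented here):

```python
# Approximate the count_tokens result locally with tiktoken.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")       # gpt-4 -> cl100k_base
token_count = len(encoding.encode("Hello, world!"))   # 4 tokens for this string
print(token_count)
```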
### list_models

List available models by group.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "model_group": "code"
}
```
### get_usage

Get usage statistics for a project/agent over a given period.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "period": "day"
}
```
## Model Groups

| Group | Primary | Fallback 1 | Fallback 2 |
|-------|---------|------------|------------|
| reasoning | claude-opus-4-5 | gpt-4.1 | gemini-2.5-pro |
| code | claude-sonnet-4 | gpt-4.1 | deepseek-coder |
| fast | claude-haiku | gpt-4.1-mini | gemini-flash |
| vision | claude-sonnet-4 | gpt-4.1 | gemini-2.5-pro |
| embedding | text-embedding-3-large | voyage-3 | - |
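Each group is effectively an ordered candidate list: the primary model is tried first, with fallbacks attempted down the chain on failure. An illustrative sketch of that walk (the data mirrors the table; the selection logic is simplified and, in the real gateway, also consults the circuit breaker described below):

```python
# Illustrative failover walk over a model group's chain; not the gateway's
# actual implementation.
MODEL_GROUPS = {
    "reasoning": ["claude-opus-4-5", "gpt-4.1", "gemini-2.5-pro"],
    "code": ["claude-sonnet-4", "gpt-4.1", "deepseek-coder"],
    "fast": ["claude-haiku", "gpt-4.1-mini", "gemini-flash"],
    "vision": ["claude-sonnet-4", "gpt-4.1", "gemini-2.5-pro"],
    "embedding": ["text-embedding-3-large", "voyage-3"],
}


def complete_with_failover(group: str, call_model) -> str:
    """Try each model in the group's chain until one succeeds."""
    last_error = None
    for model in MODEL_GROUPS[group]:
        try:
            return call_model(model)  # caller-supplied provider call
        except Exception as exc:      # provider error: try the next model
            last_error = exc
    raise RuntimeError(f"All models in group '{group}' failed") from last_error
```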
## Circuit Breaker

- **Threshold**: 5 consecutive failures
- **Cooldown**: 30 seconds
- **Half-Open**: After cooldown, allows one test request
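A minimal sketch of the state machine these defaults describe; the naming and wiring are assumptions for illustration, not the gateway's actual code:

```python
# Circuit-breaker sketch matching the defaults above: open after 5
# consecutive failures, wait 30 s, then allow a single half-open probe.
import time


class CircuitBreaker:
    def __init__(self, threshold: int = 5, cooldown: float = 30.0) -> None:
        self.threshold = threshold      # consecutive failures before opening
        self.cooldown = cooldown        # seconds to wait before a probe
        self.failures = 0
        self.opened_at = None           # None while the breaker is closed
        self.probing = False            # True while the half-open probe is out

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True                 # closed: requests flow normally
        if not self.probing and time.monotonic() - self.opened_at >= self.cooldown:
            self.probing = True         # half-open: allow one test request
            return True
        return False                    # open: fail over to the next model

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None
        self.probing = False            # probe succeeded: close the breaker

    def record_failure(self) -> None:
        self.failures += 1
        if self.probing or self.failures >= self.threshold:
            self.opened_at = time.monotonic()   # (re)open and restart the cooldown
            self.probing = False
```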
## Testing

```bash
# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v
```
## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/mcp/tools` | GET | List available tools |
| `/mcp` | POST | JSON-RPC 2.0 tool execution |
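Once the server is running, the first two endpoints can be exercised directly (the `/mcp` call is sketched under `chat_completion` above); this assumes the default host and port from the Configuration section:

```python
# Smoke-test the HTTP endpoints listed above.
import requests

base = "http://localhost:8001"

print(requests.get(f"{base}/health", timeout=5).json())     # health check
print(requests.get(f"{base}/mcp/tools", timeout=5).json())  # available MCP tools
```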