forked from cardosofelipe/fast-next-template
feat(mcp): Implement LLM Gateway MCP Server #56
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Implement the LLM Gateway MCP server that provides unified access to multiple LLM providers with intelligent routing, failover, and cost tracking. This is the highest priority MCP server as all agent interactions depend on it.
Sub-Tasks
1. Project Setup
mcp-servers/llm-gateway/pyproject.tomlwith dependenciesfastmcp>=0.4.0,litellm>=1.50.0,redis>=5.0.0Dockerfile,.dockerignore)docker-compose.dev.ymlREADME.mdwith setup instructions2. LiteLLM Integration (
providers.py)3. Model Group Configuration (
models.py)reasoninggroup: Claude Opus 4.5 → GPT-5.1 → Gemini 3 Procodegroup: Claude Sonnet 4 → Codex Max → DeepSeek Coderfastgroup: Claude Haiku 3.5 → GPT-5.1 Mini → Gemini Flashvisiongroup: Claude Opus 4.5 → GPT-5.1 Vision → Gemini Pro Visionembeddinggroup: text-embedding-3-large → ada-0024. Failover Chain (
failover.py)5. Routing Logic (
routing.py)ModelRouterclass6. Cost Tracking (
cost_tracking.py)UsageRecordmodel with all cost fields7. Token Usage Logging (
usage.py)8. MCP Tools Implementation (
server.py)completetool (non-streaming)stream_completetool (streaming)get_usagetoolhealth_checktoollist_modelstool9. Streaming Support (
streaming.py)10. Error Handling
LLMErrorbase exceptionProviderErrorfor provider failuresRateLimitErrorfor rate limitingContextLengthErrorfor token overflowContentFilterErrorfor blocked content11. Configuration
config.yamlfor provider settings12. Docker & Deployment
Dockerfile(multi-stage build)13. Testing
providers.pyfailover.pyrouting.pycost_tracking.py14. Documentation
Technical Specifications
MCP Tools
Model Pricing (per 1K tokens)
Acceptance Criteria
Dependencies
Assignable To
backend-engineer agent
Implementation complete! PR #71 is ready for review.
Summary:
chat_completion,list_models,get_usage,count_tokensPR: #71