feat(backend): Implement MCP Client Infrastructure #55

Closed
opened 2026-01-03 01:24:06 +00:00 by cardosofelipe · 0 comments

Summary

Create the core MCP client infrastructure that enables the backend to communicate with all MCP servers. This is the foundation for all Phase 2 MCP integrations.

Sub-Tasks

1. Project Setup

  • Create backend/app/services/mcp/ directory structure
  • Add MCP SDK dependency to pyproject.toml (mcp>=1.0.0)
  • Add httpx for HTTP transport
  • Create __init__.py with public exports

2. Configuration System

  • Create backend/app/core/config/mcp_servers.yaml schema
  • Add Pydantic model for MCP server config (MCPServerConfig)
  • Add config loading in backend/app/core/config.py
  • Support environment variable overrides for URLs
  • Add validation for required server configurations
  • Document configuration options in README

3. Server Registry (registry.py)

  • Create MCPServerRegistry class
  • Implement register(name, config) method
  • Implement get(name) method with KeyError handling
  • Implement list_servers() method
  • Implement get_capabilities(name) method (lazy-loaded)
  • Add thread-safe singleton pattern
  • Add registry initialization on app startup
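
The registry items above could be sketched roughly as follows. This is a minimal illustration using only the standard library: the `MCPServerConfig` dataclass is a stand-in for the Pydantic model named in section 2, its fields are assumed from the configuration schema, and the `instance()` accessor is a hypothetical name for the singleton entry point.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class MCPServerConfig:
    # Stand-in for the Pydantic model; field names are assumptions.
    url: str
    transport: str = "http"
    timeout: float = 10.0


class MCPServerRegistry:
    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self):
        self._servers: dict[str, MCPServerConfig] = {}
        self._lock = threading.RLock()

    @classmethod
    def instance(cls) -> "MCPServerRegistry":
        # Double-checked locking: the lock is only contended on first creation.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def register(self, name: str, config: MCPServerConfig) -> None:
        with self._lock:
            self._servers[name] = config

    def get(self, name: str) -> MCPServerConfig:
        with self._lock:
            try:
                return self._servers[name]
            except KeyError:
                # Re-raise with a clearer message instead of the bare key.
                raise KeyError(f"Unknown MCP server: {name!r}") from None

    def list_servers(self) -> list[str]:
        with self._lock:
            return sorted(self._servers)
```

Whether `get` should raise `KeyError` or the `MCPServerNotFoundError` from section 7 is left open here; the sketch follows the wording of the sub-task.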

4. Connection Management (connection.py)

  • Create MCPConnection class wrapping MCP client
  • Implement connection pooling (1 connection per server)
  • Add automatic reconnection with exponential backoff (1s, 2s, 4s, 8s, max 30s)
  • Implement connection timeout handling (configurable, default 10s)
  • Add graceful disconnect on shutdown
  • Implement connection health ping
  • Add connection state enum (CONNECTING, CONNECTED, DISCONNECTED, ERROR)
  • Handle SSL/TLS for production connections
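
The state enum and backoff schedule above might look like the sketch below. It assumes the schedule keeps doubling past 8s and is capped at 30s, which is one reading of "1s, 2s, 4s, 8s, max 30s"; the names `ConnectionState` and `backoff_delays` are illustrative.

```python
import enum
from itertools import islice
from typing import Iterator


class ConnectionState(enum.Enum):
    CONNECTING = "connecting"
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    ERROR = "error"


def backoff_delays(
    base: float = 1.0, factor: float = 2.0, cap: float = 30.0
) -> Iterator[float]:
    """Yield reconnect delays in seconds: base, base*factor, ... capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


# First seven delays under the assumed schedule.
print(list(islice(backoff_delays(), 7)))
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

A reconnect loop would sleep for each yielded delay between attempts and move the connection through `CONNECTING` to either `CONNECTED` or `ERROR`.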

5. MCPClientManager (client_manager.py)

  • Create MCPClientManager class as main facade
  • Implement async def connect(server_name: str) -> MCPClient
  • Implement async def disconnect(server_name: str) -> None
  • Implement async def disconnect_all() -> None
  • Implement async def call_tool(server: str, tool: str, args: dict) -> ToolResult
  • Implement async def list_tools(server: str) -> list[ToolInfo]
  • Implement async def health_check() -> dict[str, ServerHealth]
  • Add dependency injection support for FastAPI
  • Create get_mcp_client() dependency function

6. Tool Call Routing (routing.py)

  • Create ToolRouter class
  • Implement tool name to server mapping
  • Add request/response serialization
  • Implement retry logic (3 attempts with backoff)
  • Add timeout per tool call (configurable)
  • Handle partial failures gracefully
  • Add circuit breaker pattern (5 failures = open for 30s)
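
As an illustration of the circuit breaker item (5 failures = open for 30s), here is a minimal single-threaded sketch. The class and method names are hypothetical, and the half-open behaviour (letting calls through again once the cool-down has elapsed) is an assumption that the issue does not spell out.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests need not sleep
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True
        # Open: reject until the cool-down has elapsed, then allow a trial call.
        return self._clock() - self._opened_at >= self.reset_timeout

    def record_success(self) -> None:
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the cool-down.
            self._opened_at = self._clock()
```

A router would call `allow_request()` before dispatching a tool call and report the outcome with `record_success()` or `record_failure()`. A failed trial call re-opens the circuit for another full cool-down.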

7. Error Handling (exceptions.py)

  • Create MCPError base exception
  • Create MCPConnectionError for connection failures
  • Create MCPToolError for tool execution failures
  • Create MCPTimeoutError for timeouts
  • Create MCPServerNotFoundError for unknown servers
  • Add error context preservation (original exception chaining)
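
The exception hierarchy and the chaining requirement could be sketched as below. The `server` attribute and the `open_socket`/`connect` helpers are hypothetical and exist only to show `raise ... from` preserving the original exception.

```python
from typing import Optional


class MCPError(Exception):
    """Base class for all MCP client errors."""

    def __init__(self, message: str, *, server: Optional[str] = None):
        super().__init__(message)
        self.server = server  # assumed extra context, not specified in the issue


class MCPConnectionError(MCPError):
    """Connection could not be established or was lost."""


class MCPToolError(MCPError):
    """A tool call failed on the server side."""


class MCPTimeoutError(MCPError):
    """A connection attempt or tool call timed out."""


class MCPServerNotFoundError(MCPError):
    """The requested server name is not registered."""


def open_socket(server: str) -> None:
    # Hypothetical low-level call that always fails, for demonstration.
    raise OSError("connection refused")


def connect(server: str) -> None:
    try:
        open_socket(server)
    except OSError as exc:
        # `from exc` keeps the original error available as __cause__.
        raise MCPConnectionError(f"cannot connect to {server}", server=server) from exc
```

Callers can then catch `MCPError` once and still inspect `err.__cause__` for the underlying failure.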

8. Logging & Observability

  • Add structured logging for all MCP operations
  • Log connection state changes
  • Log tool calls with timing (request/response)
  • Add correlation IDs for request tracing
  • Create metrics for tool call latency
  • Create metrics for connection pool status
  • Add health check endpoint at /api/v1/mcp/health
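
One possible shape for the structured logging, timing, and correlation ID items is sketched here with the standard library only. The logger name `mcp`, the event name, the JSON field names, and the `create_issue` tool are assumptions for illustration.

```python
import contextvars
import json
import logging
import time
import uuid
from contextlib import contextmanager

# Carries the current request's ID across async calls without passing it around.
correlation_id = contextvars.ContextVar("correlation_id", default="-")
logger = logging.getLogger("mcp")


@contextmanager
def logged_tool_call(server: str, tool: str):
    """Log one JSON line per tool call with status and duration."""
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        logger.info(json.dumps({
            "event": "mcp.tool_call",
            "correlation_id": correlation_id.get(),
            "server": server,
            "tool": tool,
            "status": status,
            "duration_ms": round((time.perf_counter() - start) * 1000, 2),
        }))


logging.basicConfig(level=logging.INFO, format="%(message)s")
correlation_id.set(uuid.uuid4().hex)
with logged_tool_call("issues", "create_issue"):
    pass  # the actual tool call would go here
```

In a FastAPI application the correlation ID would typically be set once per request, for example in middleware, so every log line emitted during that request shares it.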

9. API Endpoints (api/routes/mcp.py)

  • Create GET /api/v1/mcp/servers - list registered servers
  • Create GET /api/v1/mcp/servers/{name}/tools - list server tools
  • Create GET /api/v1/mcp/health - health check all servers
  • Create POST /api/v1/mcp/call - direct tool call (admin only)
  • Add OpenAPI documentation for all endpoints

10. Testing

  • Create tests/services/mcp/test_registry.py
  • Create tests/services/mcp/test_connection.py
  • Create tests/services/mcp/test_client_manager.py
  • Create tests/services/mcp/test_routing.py
  • Create mock MCP server for integration tests
  • Add integration tests with mock server
  • Achieve >90% code coverage
  • Add E2E test for full tool call flow

11. Documentation

  • Create backend/docs/MCP_CLIENT.md with usage examples
  • Document configuration options
  • Add troubleshooting guide
  • Document error handling patterns

Technical Specifications

MCPClientManager Interface

class MCPClientManager:
    """Manages connections to all MCP servers."""

    async def connect(self, server_name: str) -> MCPClient: ...
    async def disconnect(self, server_name: str) -> None: ...
    async def disconnect_all(self) -> None: ...
    async def call_tool(self, server: str, tool: str, args: dict) -> ToolResult: ...
    async def list_tools(self, server: str) -> list[ToolInfo]: ...
    async def health_check(self) -> dict[str, ServerHealth]: ...

Configuration Schema

mcp_servers:
  llm-gateway:
    url: ${LLM_GATEWAY_URL:-http://llm-gateway:8001}
    transport: http
    timeout: 30
    retry_attempts: 3
    circuit_breaker_threshold: 5
  knowledge-base:
    url: ${KNOWLEDGE_BASE_URL:-http://knowledge-base:8002}
    transport: http
    timeout: 10
  git-ops:
    url: ${GIT_OPS_URL:-http://git-ops:8003}
    transport: http
    timeout: 60
  issues:
    url: ${ISSUES_URL:-http://issues:8004}
    transport: http
    timeout: 10
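
YAML itself does not expand the `${VAR:-default}` placeholders used in the schema above, so the config loader has to substitute them after parsing. The sketch below shows one way to do that, assuming shell-like semantics (the default applies when the variable is unset or empty); `expand_env` is a hypothetical helper name.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand_env(value: str, env=None) -> str:
    """Replace ${VAR} / ${VAR:-default} placeholders in a config string."""
    env = os.environ if env is None else env

    def _sub(match):
        name, default = match.group("name"), match.group("default")
        current = env.get(name)
        if default is None:
            if current is None:
                raise KeyError(f"environment variable {name} is not set")
            return current
        # ':-' falls back to the default when the variable is unset or empty.
        return current if current else default

    return _PATTERN.sub(_sub, value)
```

For example, `expand_env("${LLM_GATEWAY_URL:-http://llm-gateway:8001}", env={})` returns `http://llm-gateway:8001`, while setting `LLM_GATEWAY_URL` in the environment overrides it.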

Acceptance Criteria

  • MCPClientManager can connect to MCP servers
  • Server registry loads from configuration
  • Connection auto-reconnects on failure
  • Tool calls route correctly to servers
  • Health check endpoint reports server status
  • Circuit breaker prevents cascading failures
  • Comprehensive unit tests (>90% coverage)
  • Integration tests with mock MCP servers
  • Logging for all MCP interactions
  • API endpoints documented in OpenAPI

Dependencies

  • No blockers (can start immediately)
  • Blocks: #56, #57, #58, #59 (All MCP servers)

Assignable To

backend-engineer agent

cardosofelipe added the backend, mcp, phase-2 labels 2026-01-03 01:24:54 +00:00