docs(mcp): add comprehensive MCP server documentation

- Add docs/architecture/MCP_SERVERS.md with full architecture overview
- Add README.md for LLM Gateway with quick start, tools, and model groups
- Add README.md for Knowledge Base with search types, chunking strategies
- Include API endpoints, security guidelines, and testing instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:37:04 +01:00
parent 95342cc94d
commit 2ab69f8561
3 changed files with 499 additions and 0 deletions
--- a/docs/architecture/MCP_SERVERS.md
+++ b/docs/architecture/MCP_SERVERS.md
@@ -0,0 +1,192 @@
 # MCP Servers Architecture
 This document describes the Model Context Protocol (MCP) server architecture in Syndarix.
 ## Overview
 Syndarix uses MCP servers to provide specialized capabilities to AI agents. Each MCP server exposes its tools over JSON-RPC 2.0, and agents invoke them through the MCPClientManager.
 ## Architecture Diagram
 ```
 ┌─────────────────────────────────────────────────────────────────────┐
 │                         Backend (FastAPI)                            │
 │  ┌─────────────────────────────────────────────────────────────┐    │
 │  │                    MCPClientManager                          │    │
 │  │  - Connection pooling    - Health checks    - Tool routing   │    │
 │  └──────────────────────────┬──────────────────────────────────┘    │
 └─────────────────────────────┼───────────────────────────────────────┘
                              │ HTTP/JSON-RPC 2.0
          ┌───────────────────┼───────────────────┐
          │                   │                   │
          ▼                   ▼                   ▼
 ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
 │   LLM Gateway   │ │ Knowledge Base  │ │   Future MCP    │
 │   Port 8001     │ │   Port 8002     │ │   Servers       │
 │                 │ │                 │ │                 │
 │ - chat_complete │ │ - search        │ │ - git_ops       │
 │ - count_tokens  │ │ - ingest        │ │ - issues        │
 │ - list_models   │ │ - delete        │ │ - etc.          │
 │ - get_usage     │ │ - update        │ │                 │
 └────────┬────────┘ └────────┬────────┘ └─────────────────┘
         │                   │
         ▼                   ▼
 ┌─────────────────┐ ┌─────────────────┐
 │    LiteLLM      │ │    PostgreSQL   │
 │  (Anthropic,    │ │   + pgvector    │
 │   OpenAI, etc.) │ │                 │
 └─────────────────┘ └─────────────────┘
 ```
 ## MCP Servers
 ### 1. LLM Gateway (`mcp-servers/llm-gateway/`)
 **Purpose**: Unified access to multiple LLM providers with failover, streaming, and cost tracking.
 **Port**: 8001
 **Tools**:
 | Tool | Description |
 |------|-------------|
 | `chat_completion` | Generate completions with automatic failover |
 | `count_tokens` | Count tokens in text using tiktoken |
 | `list_models` | List available models by group |
 | `get_usage` | Get token/cost usage statistics |
 **Model Groups**:
 - `reasoning`: Claude Opus 4.5 → GPT-4.1 → Gemini 2.5 Pro
 - `code`: Claude Sonnet 4 → Codex → DeepSeek Coder
 - `fast`: Claude Haiku → GPT-4.1 Mini → Gemini Flash
 - `vision`: Claude Opus 4.5 → GPT-4.1 Vision
 - `embedding`: text-embedding-3-large → voyage-3
 **Features**:
 - Circuit breaker for provider failures (5 failures → 30s cooldown)
 - Redis-based cost tracking per project/agent
 - Streaming support via SSE
 - Automatic failover chain
 ### 2. Knowledge Base (`mcp-servers/knowledge-base/`)
 **Purpose**: RAG capabilities with pgvector for semantic search, intelligent chunking, and collection management.
 **Port**: 8002
 **Tools**:
 | Tool | Description |
 |------|-------------|
 | `search_knowledge` | Semantic, keyword, or hybrid search |
 | `ingest_content` | Add content with automatic chunking |
 | `delete_content` | Remove by source, collection, or IDs |
 | `list_collections` | List collections in a project |
 | `get_collection_stats` | Get collection statistics |
 | `update_document` | Atomically replace document content |
 **Chunking Strategies**:
 - **Code**: AST-aware for Python, tree-sitter for JS/TS/Go/Rust
 - **Markdown**: Heading-hierarchy aware, preserves structure
 - **Text**: Sentence-based with configurable overlap
 **Search Types**:
 - **Semantic**: pgvector cosine similarity (HNSW index)
 - **Keyword**: PostgreSQL full-text search (ts_rank)
 - **Hybrid**: Reciprocal Rank Fusion (RRF) combining both
 **Features**:
 - Redis caching for embedding deduplication
 - 1536-dimension embeddings via LLM Gateway
 - Atomic document updates (delete + insert in a single transaction)
 - Per-project collection isolation
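
The embedding deduplication listed above could look roughly like the sketch below. The document only says that Redis is used for caching, so the content-hash key scheme, the `emb:` prefix, the `EmbeddingCache` class, and the in-memory dict standing in for Redis are all assumptions made for illustration.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Cache embeddings by content hash so identical text is embedded once."""

    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        self._embed = embed  # hypothetical call into the LLM Gateway
        self._store: dict[str, list[float]] = {}  # stand-in for Redis

    @staticmethod
    def key_for(text: str) -> str:
        # Identical content always maps to the same key (assumed key scheme).
        return "emb:" + hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> list[float]:
        key = self.key_for(text)
        if key not in self._store:
            # Cache miss: compute the embedding once and remember it.
            self._store[key] = self._embed(text)
        return self._store[key]
```

Keying on a hash of the content rather than on the source path means a chunk that reappears unchanged in another file, or after a re-ingest, does not trigger a second embedding call.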
 ## Communication Protocol
 Tool execution uses JSON-RPC 2.0 over HTTP; tool discovery and health checks are plain HTTP GET endpoints:
 ### Tool Discovery
 ```
 GET /mcp/tools
 Response: { "tools": [{ "name": "...", "description": "...", "inputSchema": {...} }] }
 ```
 ### Tool Execution
 ```
 POST /mcp
 Request: {
  "jsonrpc": "2.0",
  "method": "tool_name",
  "params": { "project_id": "...", "agent_id": "...", ... },
  "id": 1
 }
 Response: {
  "jsonrpc": "2.0",
  "result": { "success": true, ... },
  "id": 1
 }
 ```
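
As a rough illustration of the request and response envelopes above, a client might build and unpack them as follows. The helper names `build_request` and `parse_response` are hypothetical, and error handling assumes the standard JSON-RPC 2.0 `error` member, which this document does not show.

```python
import json
from typing import Any


def build_request(tool: str, params: dict[str, Any], request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request whose method is the tool name."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": tool, "params": params, "id": request_id}
    )


def parse_response(body: str, request_id: int = 1) -> dict[str, Any]:
    """Return the `result` member, raising on an error or a mismatched id."""
    message = json.loads(body)
    if message.get("id") != request_id:
        raise ValueError(f"unexpected response id: {message.get('id')!r}")
    if "error" in message:
        # Assumed shape: the standard JSON-RPC 2.0 error object.
        raise RuntimeError(f"tool call failed: {message['error']}")
    return message["result"]
```

The serialized request would be sent as the body of `POST /mcp`; the transport itself is left out so the sketch stays independent of any HTTP library.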
 ### Health Check
 ```
 GET /health
 Response: { "status": "healthy", "dependencies": {...} }
 ```
 ## Configuration
 ### Environment Variables
 **LLM Gateway**:
 ```bash
 LLM_GATEWAY_HOST=0.0.0.0
 LLM_GATEWAY_PORT=8001
 LLM_GATEWAY_REDIS_URL=redis://redis:6379/1
 ANTHROPIC_API_KEY=...
 OPENAI_API_KEY=...
 ```
 **Knowledge Base**:
 ```bash
 KB_HOST=0.0.0.0
 KB_PORT=8002
 KB_DATABASE_URL=postgresql://...
 KB_REDIS_URL=redis://redis:6379/2
 KB_LLM_GATEWAY_URL=http://llm-gateway:8001
 ```
 ## Security
 ### Input Validation
 - `project_id`, `agent_id`: Alphanumeric + hyphens/underscores (1-128 chars)
 - `collection`: Alphanumeric + hyphens/underscores (1-64 chars)
 - `source_path`: No path traversal (`..`), no null bytes, max 4096 chars
 - `content`: Max size limit (configurable, default 10MB)
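
The rules above translate fairly directly into code. The following is a minimal sketch and not the servers' actual validators: the function names are hypothetical, "10MB" is assumed to mean 10 × 1024 × 1024 bytes of UTF-8, and path traversal is interpreted as any `..` path segment.

```python
import re

ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,128}")
COLLECTION_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")
MAX_PATH_LENGTH = 4096
MAX_CONTENT_BYTES = 10 * 1024 * 1024  # assumed meaning of the 10MB default


def validate_id(value: str) -> bool:
    """project_id / agent_id: alphanumeric, hyphen, underscore, 1-128 chars."""
    return ID_PATTERN.fullmatch(value) is not None


def validate_collection(value: str) -> bool:
    """collection: same alphabet as IDs, 1-64 chars."""
    return COLLECTION_PATTERN.fullmatch(value) is not None


def validate_source_path(path: str) -> bool:
    """Reject over-long paths, null bytes, and any '..' path segment."""
    if len(path) > MAX_PATH_LENGTH or "\x00" in path:
        return False
    return ".." not in re.split(r"[/\\]", path)


def validate_content(content: str) -> bool:
    """Measure the size in encoded bytes rather than characters."""
    return len(content.encode("utf-8")) <= MAX_CONTENT_BYTES
```

Using `fullmatch` rather than `match` matters here: it rejects values that merely start with a valid prefix, such as an ID followed by a newline.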
 ### Error Codes
 | Code | Meaning |
 |------|---------|
 | `INVALID_REQUEST` | Input validation failed |
 | `NOT_FOUND` | Resource not found |
 | `INTERNAL_ERROR` | Unexpected server error |
 | `EMBEDDING_ERROR` | Embedding generation failed |
 | `SEARCH_ERROR` | Search operation failed |
 ## Testing
 ```bash
 # Run LLM Gateway tests (the subshell leaves the working directory at the repo root)
 (cd mcp-servers/llm-gateway && IS_TEST=True uv run pytest -v --cov=.)
 # Run Knowledge Base tests
 (cd mcp-servers/knowledge-base && IS_TEST=True uv run pytest -v --cov=.)
 ```
 ## Adding New MCP Servers
 1. Create directory under `mcp-servers/<name>/`
 2. Use FastMCP for tool registration
 3. Implement `/health`, `/mcp/tools`, `/mcp` endpoints
 4. Add Docker configuration
 5. Register in MCPClientManager config
 6. Add tests (>90% coverage target)
--- a/mcp-servers/knowledge-base/README.md
+++ b/mcp-servers/knowledge-base/README.md
@@ -0,0 +1,178 @@
 # Knowledge Base MCP Server
 RAG capabilities with pgvector for semantic search, intelligent chunking, and collection management.
 ## Features
 - **Semantic Search**: pgvector cosine similarity with HNSW indexing
 - **Keyword Search**: PostgreSQL full-text search
 - **Hybrid Search**: Reciprocal Rank Fusion combining both
 - **Intelligent Chunking**: Code-aware, markdown-aware, and text chunking
 - **Collection Management**: Per-project knowledge organization
 - **Embedding Caching**: Redis deduplication for efficiency
 ## Quick Start
 ```bash
 # Install dependencies
 uv sync
 # Run tests
 IS_TEST=True uv run pytest -v
 # Start server
 uv run python server.py
 ```
 ## Configuration
 Environment variables:
 ```bash
 KB_HOST=0.0.0.0
 KB_PORT=8002
 KB_DEBUG=false
 KB_DATABASE_URL=postgresql://user:pass@localhost:5432/syndarix
 KB_REDIS_URL=redis://localhost:6379/2
 KB_LLM_GATEWAY_URL=http://localhost:8001
 ```
 ## MCP Tools
 ### search_knowledge
 Search the knowledge base.
 ```json
 {
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "query": "authentication flow",
  "search_type": "hybrid",
  "collection": "code",
  "limit": 10,
  "threshold": 0.7,
  "file_types": ["python", "typescript"]
 }
 ```
 ### ingest_content
 Add content to the knowledge base.
 ```json
 {
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "content": "def authenticate(user): ...",
  "source_path": "/src/auth.py",
  "collection": "code",
  "chunk_type": "code",
  "file_type": "python"
 }
 ```
 ### delete_content
 Remove content from the knowledge base.
 ```json
 {
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "source_path": "/src/old_file.py"
 }
 ```
 ### list_collections
 List all collections in a project.
 ```json
 {
  "project_id": "proj-123",
  "agent_id": "agent-456"
 }
 ```
 ### get_collection_stats
 Get detailed collection statistics.
 ```json
 {
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "collection": "code"
 }
 ```
 ### update_document
 Atomically replace document content.
 ```json
 {
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "source_path": "/src/auth.py",
  "content": "def authenticate_v2(user): ...",
  "collection": "code",
  "chunk_type": "code",
  "file_type": "python"
 }
 ```
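
To illustrate what "atomically replace" means in practice, the sketch below performs the delete and the insert inside one transaction so that a failed insert leaves the old chunks in place. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and the `chunks` table layout and the `replace_document` name are assumptions, not the server's schema.

```python
import sqlite3


def replace_document(
    conn: sqlite3.Connection,
    project_id: str,
    source_path: str,
    chunks: list[str],
) -> None:
    """Swap all chunks for one document in a single transaction."""
    # `with conn` commits on success and rolls back if any statement raises,
    # so readers never observe the document half-deleted.
    with conn:
        conn.execute(
            "DELETE FROM chunks WHERE project_id = ? AND source_path = ?",
            (project_id, source_path),
        )
        conn.executemany(
            "INSERT INTO chunks (project_id, source_path, position, content) "
            "VALUES (?, ?, ?, ?)",
            [(project_id, source_path, i, text) for i, text in enumerate(chunks)],
        )
```

Scoping the `DELETE` by `project_id` as well as `source_path` keeps the operation consistent with per-project isolation: two projects may ingest the same path without affecting each other.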
 ## Chunking Strategies
 ### Code Chunking
 - **Python**: AST-based (functions, classes, methods)
 - **JavaScript/TypeScript**: Tree-sitter based
 - **Go/Rust**: Tree-sitter based
 - Target: ~500 tokens, 50 token overlap
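
For Python sources, AST-based chunking can be sketched with the standard `ast` module. This simplified version emits one chunk per top-level function or class and does not enforce the ~500-token target or the overlap; `chunk_python` is a hypothetical name, not the server's API.

```python
import ast


def chunk_python(source: str) -> list[tuple[str, str]]:
    """Return (name, source text) for each top-level function or class."""
    tree = ast.parse(source)
    chunks: list[tuple[str, str]] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment slices the original text by node position,
            # so formatting and comments inside the definition are preserved.
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append((node.name, segment))
    return chunks
```

Splitting on definition boundaries keeps each chunk syntactically complete, which tends to give more useful retrieval hits than cutting at a fixed character count. A fuller version would presumably also split oversized classes into methods.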
 ### Markdown Chunking
 - Heading-hierarchy aware
 - Preserves code blocks
 - Target: ~800 tokens, 100 token overlap
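
One way to read "heading-hierarchy aware" is that each chunk carries the trail of headings above it and that fenced code blocks are never split or mistaken for headings. The sketch below follows that reading; it ignores the token target and overlap, and `chunk_markdown` is a hypothetical helper.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # marker for fenced code blocks


def chunk_markdown(text: str) -> list[tuple[str, str]]:
    """Return (heading trail, body) pairs, one per heading section."""
    chunks: list[tuple[str, str]] = []
    trail: list[tuple[int, str]] = []  # stack of (level, title)
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:
            chunks.append((" > ".join(title for _, title in trail), content))
        body.clear()

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a fence are never treated as headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level before pushing.
            while trail and trail[-1][0] >= level:
                trail.pop()
            trail.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks
```

Storing the heading trail alongside the body gives a retrieved chunk its context, for example `Guide > Setup`, even when the body itself never repeats those words.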
 ### Text Chunking
 - Sentence-based splitting
 - Target: ~400 tokens, 50 token overlap
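
Sentence-based splitting with overlap can be sketched as below. Whitespace-separated words stand in for tokens, which is an approximation rather than the tokenizer the server uses, and `chunk_sentences` is a hypothetical name.

```python
import re


def chunk_sentences(
    text: str, max_tokens: int = 400, overlap_tokens: int = 50
) -> list[str]:
    """Group whole sentences into chunks, repeating trailing sentences as overlap."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for sentence in sentences:
        count = len(sentence.split())  # word count as a rough token count
        if current and size + count > max_tokens:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward while they fit the overlap budget.
            carried: list[str] = []
            carried_size = 0
            for previous in reversed(current):
                previous_count = len(previous.split())
                if carried_size + previous_count > overlap_tokens:
                    break
                carried.insert(0, previous)
                carried_size += previous_count
            current, size = carried, carried_size
        current.append(sentence)
        size += count
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because overlap is taken in whole sentences, the repeated text at the start of a chunk is always readable on its own, at the cost of the overlap being somewhat smaller than the configured budget.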
 ## Search Types
 ### Semantic Search
 Uses pgvector cosine similarity with HNSW indexing for fast approximate nearest neighbor search.
 ### Keyword Search
 Uses PostgreSQL full-text search with ts_rank scoring.
 ### Hybrid Search
 Combines semantic and keyword results using Reciprocal Rank Fusion (RRF):
 - Default weights: 70% semantic, 30% keyword
 - Configurable via settings
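
A weighted form of Reciprocal Rank Fusion scores each document as the sum of `weight / (k + rank)` over the rankings it appears in. The sketch below applies the 70/30 default weights; the constant `k = 60` is the value conventionally used with RRF and is an assumption here, as is the function name.

```python
def reciprocal_rank_fusion(
    semantic: list[str],
    keyword: list[str],
    semantic_weight: float = 0.7,
    keyword_weight: float = 0.3,
    k: int = 60,
) -> list[str]:
    """Fuse two ranked ID lists into one, best match first."""
    scores: dict[str, float] = {}
    for weight, ranking in ((semantic_weight, semantic), (keyword_weight, keyword)):
        for rank, doc_id in enumerate(ranking, start=1):
            # Only ranks are used, so raw similarity and ts_rank scores
            # never need to be normalized against each other.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

For example, with semantic results `["a", "b", "c"]` and keyword results `["c", "a", "d"]`, document `a` ranks first because it places well in both lists, and `c` overtakes `b` thanks to its top keyword rank.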
 ## Security
 - Input validation for all IDs and paths
 - Path traversal prevention
 - Content size limits (default 10MB)
 - Per-project data isolation
 ## Testing
 ```bash
 # Full test suite with coverage
 IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing
 # Specific test file
 IS_TEST=True uv run pytest tests/test_server.py -v
 ```
 ## API Endpoints
 | Endpoint | Method | Description |
 |----------|--------|-------------|
 | `/health` | GET | Health check with dependency status |
 | `/mcp/tools` | GET | List available tools |
 | `/mcp` | POST | JSON-RPC 2.0 tool execution |
--- a/mcp-servers/llm-gateway/README.md
+++ b/mcp-servers/llm-gateway/README.md
@@ -0,0 +1,129 @@
 # LLM Gateway MCP Server
 Unified LLM access with failover chains, cost tracking, and streaming support.
 ## Features
 - **Multi-Provider Support**: Anthropic, OpenAI, Google, DeepSeek
 - **Automatic Failover**: Circuit breaker with configurable thresholds
 - **Cost Tracking**: Redis-based per-project/agent usage tracking
 - **Streaming**: SSE support for real-time token delivery
 - **Model Groups**: Pre-configured chains for different use cases
 ## Quick Start
 ```bash
 # Install dependencies
 uv sync
 # Run tests
 IS_TEST=True uv run pytest -v
 # Start server
 uv run python server.py
 ```
 ## Configuration
 Environment variables:
 ```bash
 LLM_GATEWAY_HOST=0.0.0.0
 LLM_GATEWAY_PORT=8001
 LLM_GATEWAY_DEBUG=false
 LLM_GATEWAY_REDIS_URL=redis://localhost:6379/1
 # Provider API keys
 ANTHROPIC_API_KEY=sk-ant-...
 OPENAI_API_KEY=sk-...
 GOOGLE_API_KEY=...
 DEEPSEEK_API_KEY=...
 ```
 ## MCP Tools
 ### chat_completion
 Generate completions with automatic failover.
 ```json
 {
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "messages": [{"role": "user", "content": "Hello"}],
  "model_group": "reasoning",
  "max_tokens": 4096,
  "temperature": 0.7,
  "stream": false
 }
 ```
 ### count_tokens
 Count tokens in text using tiktoken.
 ```json
 {
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "text": "Hello, world!",
  "model": "gpt-4"
 }
 ```
 ### list_models
 List available models by group.
 ```json
 {
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "model_group": "code"
 }
 ```
 ### get_usage
 Get usage statistics.
 ```json
 {
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "period": "day"
 }
 ```
 ## Model Groups
 | Group | Primary | Fallback 1 | Fallback 2 |
 |-------|---------|------------|------------|
 | reasoning | claude-opus-4-5 | gpt-4.1 | gemini-2.5-pro |
 | code | claude-sonnet-4 | gpt-4.1 | deepseek-coder |
 | fast | claude-haiku | gpt-4.1-mini | gemini-flash |
 | vision | claude-sonnet-4 | gpt-4.1 | gemini-2.5-pro |
 | embedding | text-embedding-3-large | voyage-3 | - |
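
The failover chain implied by the table can be sketched as a loop that tries each model in order and returns the first success. `MODEL_GROUPS` mirrors the table above; `complete_with_failover`, `AllProvidersFailed`, and the `call` parameter are hypothetical names, and the real gateway presumably routes calls through LiteLLM rather than a plain callable.

```python
from typing import Callable

MODEL_GROUPS: dict[str, list[str]] = {
    "reasoning": ["claude-opus-4-5", "gpt-4.1", "gemini-2.5-pro"],
    "code": ["claude-sonnet-4", "gpt-4.1", "deepseek-coder"],
    "fast": ["claude-haiku", "gpt-4.1-mini", "gemini-flash"],
    "vision": ["claude-sonnet-4", "gpt-4.1", "gemini-2.5-pro"],
    "embedding": ["text-embedding-3-large", "voyage-3"],
}


class AllProvidersFailed(Exception):
    """Raised when every model in a group has failed."""


def complete_with_failover(
    group: str, call: Callable[[str], str]
) -> tuple[str, str]:
    """Try each model in the group in order; return (model, response)."""
    errors: list[str] = []
    for model in MODEL_GROUPS[group]:
        try:
            return model, call(model)
        except Exception as exc:  # any provider error triggers the next fallback
            errors.append(f"{model}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the model name alongside the response lets the caller attribute cost and usage to the provider that actually served the request.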
 ## Circuit Breaker
 - **Threshold**: 5 consecutive failures
 - **Cooldown**: 30 seconds
 - **Half-Open**: After cooldown, allows one test request
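
The three states described above can be modeled with a small class. This is a minimal single-threaded sketch using the stated defaults (5 consecutive failures, 30-second cooldown); the class and method names are hypothetical, and the injectable `clock` exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(
        self,
        threshold: int = 5,
        cooldown: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None
        self._probe_in_flight = False

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        if self._clock() - self._opened_at < self.cooldown:
            return False  # open: still cooling down
        if self._probe_in_flight:
            return False  # half-open: the single test request is already out
        self._probe_in_flight = True
        return True  # half-open: let exactly one test request through

    def record_success(self) -> None:
        # Any success closes the circuit and resets the failure streak.
        self._failures = 0
        self._opened_at = None
        self._probe_in_flight = False

    def record_failure(self) -> None:
        self._failures += 1
        # A failed probe, or reaching the threshold, (re)opens the circuit.
        if self._probe_in_flight or self._failures >= self.threshold:
            self._opened_at = self._clock()
            self._probe_in_flight = False
```

A caller would check `allow_request()` before contacting a provider and report the outcome with `record_success()` or `record_failure()`; when the breaker refuses, the failover chain moves on to the next model.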
 ## Testing
 ```bash
 # Full test suite with coverage
 IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing
 # Specific test file
 IS_TEST=True uv run pytest tests/test_server.py -v
 ```
 ## API Endpoints
 | Endpoint | Method | Description |
 |----------|--------|-------------|
 | `/health` | GET | Health check |
 | `/mcp/tools` | GET | List available tools |
 | `/mcp` | POST | JSON-RPC 2.0 tool execution |