# Knowledge Base MCP Server

RAG (retrieval-augmented generation) capabilities backed by pgvector: semantic search, intelligent chunking, and collection management.

## Features

- **Semantic Search**: pgvector cosine similarity with HNSW indexing
- **Keyword Search**: PostgreSQL full-text search
- **Hybrid Search**: Reciprocal Rank Fusion (RRF) combining semantic and keyword results
- **Intelligent Chunking**: Code-aware, markdown-aware, and plain-text chunking
- **Collection Management**: Per-project knowledge organization
- **Embedding Caching**: Redis-based deduplication to avoid recomputing identical embeddings

## Quick Start

```bash
# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py
```

## Configuration

Environment variables:

```bash
KB_HOST=0.0.0.0
KB_PORT=8002
KB_DEBUG=false
KB_DATABASE_URL=postgresql://user:pass@localhost:5432/syndarix
KB_REDIS_URL=redis://localhost:6379/2
KB_LLM_GATEWAY_URL=http://localhost:8001
```

## MCP Tools

### search_knowledge

Search the knowledge base.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "query": "authentication flow",
  "search_type": "hybrid",
  "collection": "code",
  "limit": 10,
  "threshold": 0.7,
  "file_types": ["python", "typescript"]
}
```

### ingest_content

Add content to the knowledge base.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "content": "def authenticate(user): ...",
  "source_path": "/src/auth.py",
  "collection": "code",
  "chunk_type": "code",
  "file_type": "python"
}
```

### delete_content

Remove content from the knowledge base.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "source_path": "/src/old_file.py"
}
```

### list_collections

List all collections in a project.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456"
}
```

### get_collection_stats

Get detailed statistics for a collection.

```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "collection": "code"
}
```

### update_document

Atomically replace a document's content.
```json
{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "source_path": "/src/auth.py",
  "content": "def authenticate_v2(user): ...",
  "collection": "code",
  "chunk_type": "code",
  "file_type": "python"
}
```

## Chunking Strategies

### Code Chunking

- **Python**: AST-based (functions, classes, methods)
- **JavaScript/TypeScript**: Tree-sitter-based
- **Go/Rust**: Tree-sitter-based
- Target: ~500 tokens, 50-token overlap

### Markdown Chunking

- Heading-hierarchy aware
- Preserves code blocks
- Target: ~800 tokens, 100-token overlap

### Text Chunking

- Sentence-based splitting
- Target: ~400 tokens, 50-token overlap

## Search Types

### Semantic Search

Uses pgvector cosine similarity with HNSW indexing for fast approximate nearest-neighbor (ANN) search.

### Keyword Search

Uses PostgreSQL full-text search with `ts_rank` scoring.

### Hybrid Search

Combines semantic and keyword results using Reciprocal Rank Fusion (RRF):

- Default weights: 70% semantic, 30% keyword
- Configurable via settings

## Security

- Input validation for all IDs and paths
- Path traversal prevention
- Content size limits (default 10 MB)
- Per-project data isolation

## Testing

```bash
# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check with dependency status |
| `/mcp/tools` | GET | List available tools |
| `/mcp` | POST | JSON-RPC 2.0 tool execution |
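
### Example: Calling a Tool over JSON-RPC

Because `/mcp` accepts JSON-RPC 2.0, any HTTP client can invoke the tools listed above. The sketch below uses only the Python standard library and rests on stated assumptions: the server listens on `localhost` with the default `KB_PORT` (8002), and it follows the MCP convention of a `tools/call` method that carries the tool name and its arguments in `params`. The helpers `build_tool_call` and `call_tool` are hypothetical and not part of this project; check the request format your server actually expects before relying on them.

```python
import json
import urllib.request
from typing import Any

# Assumption: the server listens on localhost with the default KB_PORT (8002).
MCP_URL = "http://localhost:8002/mcp"


def build_tool_call(
    tool: str, arguments: dict[str, Any], request_id: int = 1
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope for one tool call.

    Assumption: the server follows the MCP "tools/call" convention, with the
    tool name and its arguments carried in "params".
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def call_tool(
    tool: str, arguments: dict[str, Any], url: str = MCP_URL
) -> dict[str, Any]:
    """POST a tool call to the /mcp endpoint and return the decoded response."""
    body = json.dumps(build_tool_call(tool, arguments)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (without sending) a hybrid search request using arguments taken from
# the search_knowledge example; call_tool() would send it to a live server.
search_request = build_tool_call(
    "search_knowledge",
    {
        "project_id": "proj-123",
        "agent_id": "agent-456",
        "query": "authentication flow",
        "search_type": "hybrid",
        "limit": 10,
    },
)
print(json.dumps(search_request, indent=2))
```

To execute the request, pass the same tool name and arguments to `call_tool` while the server is running. Keeping envelope construction separate from transport means that if the server expects a different method layout (for example, the tool name as the JSON-RPC `method`), only `build_tool_call` needs to change.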