Files

Felipe Cardoso 4154dd5268 feat: enhance database transactions, add Makefiles, and improve Docker setup

- Refactored database batch operations to ensure transaction atomicity and simplify nested structure.
- Added `Makefile` for `knowledge-base` and `llm-gateway` modules to streamline development workflows.
- Simplified `Dockerfile` for `llm-gateway` by removing multi-stage builds and optimizing dependencies.
- Improved code readability in `collection_manager` and `failover` modules with refined logic.
- Minor fixes in `test_server` and Redis health check handling for better diagnostics.

2026-01-05 00:49:19 +01:00

Name	Last commit	Last updated
chunking	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00
tests	feat: enhance database transactions, add Makefiles, and improve Docker setup	2026-01-05 00:49:19 +01:00
collection_manager.py	feat: enhance database transactions, add Makefiles, and improve Docker setup	2026-01-05 00:49:19 +01:00
config.py	fix(mcp-kb): address critical issues from deep review	2026-01-04 01:03:58 +01:00
database.py	feat: enhance database transactions, add Makefiles, and improve Docker setup	2026-01-05 00:49:19 +01:00
Dockerfile	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00
embeddings.py	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00
exceptions.py	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00
Makefile	feat: enhance database transactions, add Makefiles, and improve Docker setup	2026-01-05 00:49:19 +01:00
models.py	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00
pyproject.toml	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00
README.md	docs(mcp): add comprehensive MCP server documentation	2026-01-04 01:37:04 +01:00
search.py	fix(mcp-kb): add input validation, path security, and health checks	2026-01-04 01:18:50 +01:00
server.py	feat: enhance database transactions, add Makefiles, and improve Docker setup	2026-01-05 00:49:19 +01:00
uv.lock	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00

README.md

Knowledge Base MCP Server

An MCP server that provides retrieval-augmented generation (RAG) capabilities: semantic search backed by pgvector, intelligent chunking, and collection management.

Features

Semantic Search: pgvector cosine similarity with HNSW indexing
Keyword Search: PostgreSQL full-text search
Hybrid Search: Reciprocal Rank Fusion (RRF) combining semantic and keyword results
Intelligent Chunking: Code-aware, markdown-aware, and text chunking
Collection Management: Per-project knowledge organization
Embedding Caching: Redis-backed cache that deduplicates embedding requests for identical content
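
One way to picture the embedding cache is as a content-addressed lookup: hash the text, and call the embedding backend only on a miss. The sketch below is an illustration under assumptions, not the code in `embeddings.py`. A plain dict stands in for Redis, and `EmbeddingCache`, `embed_fn`, and the `kb:emb:` key prefix are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Content-addressed embedding cache; a dict stands in for Redis here."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn          # hypothetical embedding backend
        self._store: Dict[str, List[float]] = {}
        self.misses = 0                    # counts calls that reached the backend

    @staticmethod
    def key_for(text: str) -> str:
        # Identical content always hashes to the same key, which is what
        # makes deduplication possible.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"kb:emb:{digest}"          # hypothetical key prefix

    def get(self, text: str) -> List[float]:
        key = self.key_for(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]
```

Requesting the same text twice calls `embed_fn` only once; the second call is served from the store.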

Quick Start

# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py

Configuration

Environment variables:

KB_HOST=0.0.0.0
KB_PORT=8002
KB_DEBUG=false
KB_DATABASE_URL=postgresql://user:pass@localhost:5432/syndarix
KB_REDIS_URL=redis://localhost:6379/2
KB_LLM_GATEWAY_URL=http://localhost:8001
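
As an illustration of how these variables might be consumed, here is a minimal standard-library sketch. The actual `config.py` may use a different mechanism. `KBSettings` and `load_settings` are hypothetical names, and the fallback values simply reuse the example values above rather than confirmed defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class KBSettings:
    host: str
    port: int
    debug: bool
    database_url: str
    redis_url: str
    llm_gateway_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> KBSettings:
    """Read KB_* variables; fallbacks mirror the example values (assumed)."""
    env = os.environ if env is None else env
    debug_raw = env.get("KB_DEBUG", "false").strip().lower()
    return KBSettings(
        host=env.get("KB_HOST", "0.0.0.0"),
        port=int(env.get("KB_PORT", "8002")),
        debug=debug_raw in {"1", "true", "yes"},
        database_url=env.get(
            "KB_DATABASE_URL", "postgresql://user:pass@localhost:5432/syndarix"
        ),
        redis_url=env.get("KB_REDIS_URL", "redis://localhost:6379/2"),
        llm_gateway_url=env.get("KB_LLM_GATEWAY_URL", "http://localhost:8001"),
    )
```

Passing an explicit mapping, for example `load_settings({"KB_PORT": "9000"})`, keeps the function easy to test without touching the process environment.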

MCP Tools

search_knowledge

Search the knowledge base.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "query": "authentication flow",
  "search_type": "hybrid",
  "collection": "code",
  "limit": 10,
  "threshold": 0.7,
  "file_types": ["python", "typescript"]
}

ingest_content

Add content to the knowledge base.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "content": "def authenticate(user): ...",
  "source_path": "/src/auth.py",
  "collection": "code",
  "chunk_type": "code",
  "file_type": "python"
}

delete_content

Remove content from the knowledge base.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "source_path": "/src/old_file.py"
}

list_collections

List all collections in a project.

{
  "project_id": "proj-123",
  "agent_id": "agent-456"
}

get_collection_stats

Get detailed collection statistics.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "collection": "code"
}

update_document

Atomically replace document content.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "source_path": "/src/auth.py",
  "content": "def authenticate_v2(user): ...",
  "collection": "code",
  "chunk_type": "code",
  "file_type": "python"
}
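
"Atomically" here suggests that removing the old chunks and inserting the new ones happen in one transaction, so a failed update leaves the previous content in place. The sketch below shows that pattern with `sqlite3` as a stand-in for PostgreSQL. The `chunks` table and `replace_document` function are hypothetical and not taken from `database.py`.

```python
import sqlite3
from typing import List


def replace_document(
    conn: sqlite3.Connection, source_path: str, chunks: List[str]
) -> None:
    """Delete the old chunks and insert the new ones in a single transaction."""
    # `with conn` commits if the block succeeds and rolls back if any
    # statement raises, so readers never see a half-replaced document.
    with conn:
        conn.execute("DELETE FROM chunks WHERE source_path = ?", (source_path,))
        conn.executemany(
            "INSERT INTO chunks (source_path, content) VALUES (?, ?)",
            [(source_path, chunk) for chunk in chunks],
        )
```

If an insert fails partway through, the delete is rolled back along with it and the original chunks remain searchable.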

Chunking Strategies

Code Chunking

Python: AST-based (functions, classes, methods)
JavaScript/TypeScript: Tree-sitter based
Go/Rust: Tree-sitter based
Target: ~500 tokens, 50-token overlap
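
To make the AST-based approach for Python concrete, the following sketch uses the standard `ast` module to extract top-level functions, classes, and their methods as separate chunks. It does not enforce the token target or overlap, and `chunk_python_source` is a hypothetical name rather than the function used in the `chunking` package.

```python
import ast
from typing import List, Optional, Tuple

Chunk = Tuple[str, str, Optional[str]]  # (kind, qualified name, source text)


def chunk_python_source(source: str) -> List[Chunk]:
    """Return one chunk per top-level function, class, and method."""
    tree = ast.parse(source)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    chunks: List[Chunk] = []
    for node in tree.body:
        if isinstance(node, func_types):
            text = ast.get_source_segment(source, node)
            chunks.append(("function", node.name, text))
        elif isinstance(node, ast.ClassDef):
            # The class chunk contains the whole class body; methods are
            # additionally emitted on their own for finer-grained retrieval.
            chunks.append(("class", node.name, ast.get_source_segment(source, node)))
            for item in node.body:
                if isinstance(item, func_types):
                    text = ast.get_source_segment(source, item)
                    chunks.append(("method", f"{node.name}.{item.name}", text))
    return chunks
```

Splitting on syntactic boundaries keeps each chunk a complete definition, which tends to embed more meaningfully than a fixed-size window cut mid-function.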

Markdown Chunking

Heading-hierarchy aware
Preserves code blocks
Target: ~800 tokens, 100-token overlap
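
A heading-aware splitter that keeps code blocks intact could look roughly like the sketch below. It tracks the current heading path and ignores `#` lines inside fenced blocks. It leaves out the token target and overlap, and `chunk_markdown` is a hypothetical name.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE_MARKERS = ("`" * 3, "~" * 3)  # opening/closing markers of fenced blocks


def chunk_markdown(text: str) -> List[Tuple[List[str], str]]:
    """Split on headings and return (heading_path, body) pairs."""
    chunks: List[Tuple[List[str], str]] = []
    path: List[str] = []   # e.g. ["Guide", "Setup"]
    body: List[str] = []
    in_fence = False

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:
            chunks.append((list(path), content))
        body.clear()

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
            body.append(line)
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]            # drop headings at this depth or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks
```

Storing the heading path with each chunk lets search results carry their position in the document, for example `Guide > Setup`.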

Text Chunking

Sentence-based splitting
Target: ~400 tokens, 50-token overlap
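
Sentence-based splitting with a size target and overlap can be sketched as follows. Tokens are approximated by whitespace-separated words, which is an assumption, since the real tokenizer is not described here. `chunk_text` is a hypothetical name.

```python
import re
from typing import List


def chunk_text(
    text: str, target_tokens: int = 400, overlap_tokens: int = 50
) -> List[str]:
    """Group sentences into chunks of roughly target_tokens with overlap."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []   # sentences in the chunk being built
    count = 0                 # approximate token count of `current`
    for sentence in sentences:
        size = len(sentence.split())
        if current and count + size > target_tokens:
            chunks.append(" ".join(current))
            # Carry trailing sentences into the next chunk, staying within
            # the overlap budget.
            carried: List[str] = []
            carried_count = 0
            for prev in reversed(current):
                prev_size = len(prev.split())
                if carried_count + prev_size > overlap_tokens:
                    break
                carried.insert(0, prev)
                carried_count += prev_size
            current, count = carried, carried_count
        current.append(sentence)
        count += size
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The target is soft: a single sentence longer than `target_tokens` still becomes its own chunk rather than being cut mid-sentence.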

Search Types

Semantic Search

Uses pgvector cosine similarity with HNSW indexing for fast approximate nearest neighbor search.
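
For reference, the quantity being approximated is plain cosine similarity. The brute-force sketch below computes it exactly over an in-memory dict, whereas an HNSW index trades a little recall for speed on large tables. Treating `threshold` as a minimum similarity and `limit` as a top-k cap is an assumption about how the `search_knowledge` parameters behave.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(
    query: Sequence[float],
    docs: Dict[str, Sequence[float]],
    limit: int = 10,
    threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Exact nearest-neighbor search: score everything, filter, sort, cut."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:limit]
```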

Keyword Search

Uses PostgreSQL full-text search with ts_rank scoring.

Hybrid Search

Combines semantic and keyword results using Reciprocal Rank Fusion (RRF):

Default weights: 70% semantic, 30% keyword
Configurable via settings
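
A weighted form of RRF scores each document as the sum over rankings of `weight / (k + rank)`. The sketch below applies the 70/30 weights mentioned above. The smoothing constant `k = 60` is the value commonly used for RRF and is an assumption here, as is the function name.

```python
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    semantic: Sequence[str],
    keyword: Sequence[str],
    semantic_weight: float = 0.7,
    keyword_weight: float = 0.3,
    k: int = 60,  # assumed smoothing constant
) -> List[Tuple[str, float]]:
    """Fuse two ranked ID lists; ranks start at 1, higher score is better."""
    scores: Dict[str, float] = {}
    for weight, ranking in ((semantic_weight, semantic), (keyword_weight, keyword)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

Because RRF works on ranks rather than raw scores, cosine similarities and `ts_rank` values never need to be normalized onto a common scale.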

Security

Input validation for all IDs and paths
Path traversal prevention
Content size limits (default 10MB)
Per-project data isolation
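
The checks listed above could be implemented along these lines. This is a sketch under assumptions: the ID pattern, the function names, and the interpretation of 10MB as 10 * 1024 * 1024 bytes are illustrative and not taken from the server code.

```python
import posixpath
import re

ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,128}")  # hypothetical ID rule
MAX_CONTENT_BYTES = 10 * 1024 * 1024              # assumed meaning of "10MB"


def validate_id(value: str) -> str:
    if not ID_PATTERN.fullmatch(value):
        raise ValueError(f"invalid identifier: {value!r}")
    return value


def validate_source_path(path: str) -> str:
    if not path or "\x00" in path:
        raise ValueError("empty path or embedded NUL byte")
    # Reject any '..' segment outright instead of trying to resolve it.
    if ".." in path.replace("\\", "/").split("/"):
        raise ValueError("path traversal is not allowed")
    return posixpath.normpath(path)


def validate_content(content: str) -> str:
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValueError("content exceeds size limit")
    return content
```

Rejecting `..` segments before normalizing is deliberately strict: a path such as `/src/../etc/passwd` fails validation rather than being silently rewritten.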

Testing

# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v

API Endpoints

Endpoint	Method	Description
`/health`	GET	Health check with dependency status
`/mcp/tools`	GET	List available tools
`/mcp`	POST	JSON-RPC 2.0 tool execution
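
A request body for the `/mcp` endpoint might be built as in the sketch below. The JSON-RPC 2.0 envelope (`jsonrpc`, `id`, `method`, `params`) is standard. The `tools/call` method name and the `name`/`arguments` parameter shape follow the MCP convention and are assumptions about this server, so check the `/mcp/tools` output before relying on them.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # simple monotonically increasing request IDs


def build_tool_request(tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request that invokes one tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",  # assumption: MCP-style method name
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload)


body = build_tool_request(
    "list_collections",
    {"project_id": "proj-123", "agent_id": "agent-456"},
)
```

The resulting string can be sent as the body of a POST to `/mcp` with a `Content-Type: application/json` header.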