Files

Felipe Cardoso 51404216ae refactor(knowledge-base mcp server): adjust formatting for consistency and readability

Improved code formatting, line breaks, and indentation across chunking logic and multiple test modules to enhance code clarity and maintain consistent style. No functional changes made.

2026-01-06 17:20:31 +01:00

Name	Last commit	Date
chunking	refactor(knowledge-base mcp server): adjust formatting for consistency and readability	2026-01-06 17:20:31 +01:00
tests	refactor(knowledge-base mcp server): adjust formatting for consistency and readability	2026-01-06 17:20:31 +01:00
collection_manager.py	refactor(knowledge-base mcp server): adjust formatting for consistency and readability	2026-01-06 17:20:31 +01:00
config.py	fix(mcp-kb): address critical issues from deep review	2026-01-04 01:03:58 +01:00
database.py	fix(knowledge-base): ensure pgvector extension before pool creation	2026-01-06 02:55:02 +01:00
Dockerfile	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00
embeddings.py	refactor(knowledge-base mcp server): adjust formatting for consistency and readability	2026-01-06 17:20:31 +01:00
exceptions.py	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00
Makefile	feat: enhance database transactions, add Makefiles, and improve Docker setup	2026-01-05 00:49:19 +01:00
models.py	refactor(knowledge-base mcp server): adjust formatting for consistency and readability	2026-01-06 17:20:31 +01:00
pyproject.toml	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00
README.md	docs(mcp): add comprehensive MCP server documentation	2026-01-04 01:37:04 +01:00
search.py	refactor(knowledge-base mcp server): adjust formatting for consistency and readability	2026-01-06 17:20:31 +01:00
server.py	refactor(knowledge-base mcp server): adjust formatting for consistency and readability	2026-01-06 17:20:31 +01:00
uv.lock	feat(knowledge-base): implement Knowledge Base MCP Server (#57)	2026-01-03 21:33:26 +01:00

README.md

Knowledge Base MCP Server

An MCP server that provides RAG (retrieval-augmented generation) capabilities: semantic search backed by pgvector, intelligent chunking, and collection management.

Features

Semantic Search: pgvector cosine similarity with HNSW indexing
Keyword Search: PostgreSQL full-text search
Hybrid Search: Reciprocal Rank Fusion combining both
Intelligent Chunking: Code-aware, markdown-aware, and text chunking
Collection Management: Per-project knowledge organization
Embedding Caching: Redis deduplication for efficiency
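
The embedding cache is only named in the feature list, so the following is a rough sketch of how content-hash deduplication might work, not the server's actual code. The class name `EmbeddingCache`, the key prefix `kb:emb:`, and the `embed_fn` callable are all hypothetical, and a plain dict stands in for Redis so the example runs without a server.

```python
import hashlib


class EmbeddingCache:
    """Hypothetical cache: identical text is embedded only once."""

    def __init__(self, embed_fn, store=None):
        self._embed_fn = embed_fn          # callable: str -> list[float]
        # A dict stands in for Redis here (assumption for illustration).
        self._store = {} if store is None else store

    @staticmethod
    def key_for(text, model="default"):
        # Hash the content so equal texts map to the same cache key.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"kb:emb:{model}:{digest}"

    def get_embedding(self, text, model="default"):
        key = self.key_for(text, model)
        if key not in self._store:
            # Cache miss: compute the embedding and remember it.
            self._store[key] = self._embed_fn(text)
        return self._store[key]
```

Keying on a hash of the content rather than on the source path means the same chunk appearing in two files would share one embedding call.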

Quick Start

# Install dependencies
uv sync

# Run tests
IS_TEST=True uv run pytest -v

# Start server
uv run python server.py

Configuration

Environment variables:

KB_HOST=0.0.0.0
KB_PORT=8002
KB_DEBUG=false
KB_DATABASE_URL=postgresql://user:pass@localhost:5432/syndarix
KB_REDIS_URL=redis://localhost:6379/2
KB_LLM_GATEWAY_URL=http://localhost:8001
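
As an illustration of how these variables could be consumed, here is a minimal standard-library sketch. It assumes the values listed above act as fallbacks, which the README does not state; `KBSettings` and `load_settings` are hypothetical names, and the real `config.py` may be structured differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class KBSettings:
    """Hypothetical settings holder; the real config.py may differ."""
    host: str
    port: int
    debug: bool
    database_url: str
    redis_url: str
    llm_gateway_url: str


def load_settings(env=None) -> KBSettings:
    """Read KB_* variables, falling back to the values shown above."""
    env = os.environ if env is None else env
    return KBSettings(
        host=env.get("KB_HOST", "0.0.0.0"),
        port=int(env.get("KB_PORT", "8002")),
        # Accept a few common spellings of "true" (assumption).
        debug=env.get("KB_DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        database_url=env.get(
            "KB_DATABASE_URL", "postgresql://user:pass@localhost:5432/syndarix"
        ),
        redis_url=env.get("KB_REDIS_URL", "redis://localhost:6379/2"),
        llm_gateway_url=env.get("KB_LLM_GATEWAY_URL", "http://localhost:8001"),
    )
```

Passing a mapping to `load_settings` instead of relying on `os.environ` keeps the function easy to test.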

MCP Tools

search_knowledge

Search the knowledge base.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "query": "authentication flow",
  "search_type": "hybrid",
  "collection": "code",
  "limit": 10,
  "threshold": 0.7,
  "file_types": ["python", "typescript"]
}

ingest_content

Add content to the knowledge base.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "content": "def authenticate(user): ...",
  "source_path": "/src/auth.py",
  "collection": "code",
  "chunk_type": "code",
  "file_type": "python"
}

delete_content

Remove content from the knowledge base.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "source_path": "/src/old_file.py"
}

list_collections

List all collections in a project.

{
  "project_id": "proj-123",
  "agent_id": "agent-456"
}

get_collection_stats

Get detailed collection statistics.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "collection": "code"
}

update_document

Atomically replace document content.

{
  "project_id": "proj-123",
  "agent_id": "agent-456",
  "source_path": "/src/auth.py",
  "content": "def authenticate_v2(user): ...",
  "collection": "code",
  "chunk_type": "code",
  "file_type": "python"
}

Chunking Strategies

Code Chunking

Python: AST-based (functions, classes, methods)
JavaScript/TypeScript: Tree-sitter based
Go/Rust: Tree-sitter based
Target: ~500 tokens, 50 token overlap
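
To make the AST-based approach for Python concrete, here is a small sketch using the standard `ast` module. It emits one chunk per top-level function, class, and method; it does not enforce the ~500 token target or the overlap, and `chunk_python_source` is a hypothetical name rather than the function used in the `chunking` package.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Return one chunk per top-level function, class, and method."""
    tree = ast.parse(source)
    chunks = []
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)

    def add(node, kind, parent=None):
        chunks.append({
            "name": f"{parent}.{node.name}" if parent else node.name,
            "kind": kind,
            "start_line": node.lineno,
            "end_line": node.end_lineno,
            # Exact source text covered by this node.
            "text": ast.get_source_segment(source, node),
        })

    for node in tree.body:
        if isinstance(node, func_types):
            add(node, "function")
        elif isinstance(node, ast.ClassDef):
            add(node, "class")
            for child in node.body:
                if isinstance(child, func_types):
                    add(child, "method", parent=node.name)
    return chunks
```

Splitting on syntactic boundaries keeps each chunk a complete definition, which tends to embed more meaningfully than a fixed-size window cut mid-function.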

Markdown Chunking

Heading-hierarchy aware
Preserves code blocks
Target: ~800 tokens, 100 token overlap
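
One way to picture heading-aware splitting that preserves code blocks is the sketch below. It cuts a document at ATX headings, records the heading path for each section, and ignores `#` lines that sit inside fenced code blocks. Token limits and overlap are left out, and `split_markdown_sections` is a hypothetical name.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def split_markdown_sections(markdown: str) -> list[dict]:
    """Split on headings while keeping fenced code blocks intact."""
    sections = []
    path = []        # current heading hierarchy, e.g. ["Guide", "Install"]
    body = []        # lines collected for the current section
    in_fence = False

    def flush():
        text = "\n".join(body).strip()
        if text:
            sections.append({"path": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Inside a fence, a leading "#" is code, not a heading.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]   # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return sections
```

Carrying the heading path alongside each section lets a retrieved chunk be shown with its context even when the chunk text itself contains no heading.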

Text Chunking

Sentence-based splitting
Target: ~400 tokens, 50 token overlap
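
The sentence-based strategy with overlap can be sketched as follows. Tokens are approximated by whitespace-separated words, which is an assumption (the server presumably uses a real tokenizer), and `chunk_text` is a hypothetical name.

```python
import re


def count_tokens(text: str) -> int:
    # Rough stand-in for a tokenizer: count whitespace-separated words.
    return len(text.split())


def chunk_text(text: str, target_tokens: int = 400, overlap_tokens: int = 50) -> list[str]:
    """Group sentences into chunks of about target_tokens, with overlap."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    current = []
    current_tokens = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and current_tokens + n > target_tokens:
            chunks.append(" ".join(current))
            # Carry trailing sentences (up to overlap_tokens) into the next chunk.
            carried = []
            carried_tokens = 0
            for prev in reversed(current):
                prev_n = count_tokens(prev)
                if carried_tokens + prev_n > overlap_tokens:
                    break
                carried.insert(0, prev)
                carried_tokens += prev_n
            current, current_tokens = carried, carried_tokens
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because the overlap is taken in whole sentences, the carried text may be shorter than `overlap_tokens`, and a single sentence longer than `target_tokens` still ends up in one chunk.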

Search Types

Semantic Search

Uses pgvector cosine similarity with HNSW indexing for fast approximate nearest neighbor search.
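
For reference, cosine similarity itself is simple to compute; pgvector exposes cosine distance, and similarity is one minus that distance. The sketch below is an exact brute-force scan in pure Python, not the approximate HNSW search the server relies on. Applying the `threshold` parameter to the cosine similarity score is an assumption, and `rank_by_similarity` is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates, threshold=0.7, limit=10):
    """Score every candidate, keep those at or above threshold, best first."""
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in candidates.items()
    ]
    hits = [item for item in scored if item[1] >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:limit]
```

An HNSW index trades this exhaustive comparison for a graph traversal that visits only a fraction of the vectors, at the cost of occasionally missing a true nearest neighbor.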

Keyword Search

Uses PostgreSQL full-text search with ts_rank scoring.

Hybrid Search

Combines semantic and keyword results using Reciprocal Rank Fusion (RRF):

Default weights: 70% semantic, 30% keyword
Configurable via settings
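
A weighted form of RRF can be sketched in a few lines. Each result list contributes `weight / (k + rank)` to a document's score, with ranks starting at 1. The constant `k = 60` is the value commonly used with RRF and is an assumption here, as is the exact way the server applies its 70/30 weights; `reciprocal_rank_fusion` is a hypothetical name.

```python
def reciprocal_rank_fusion(
    semantic_ids,
    keyword_ids,
    semantic_weight=0.7,
    keyword_weight=0.3,
    k=60,
):
    """Fuse two ranked lists of document IDs into one scored ranking."""
    scores = {}
    rankings = (
        (semantic_weight, semantic_ids),
        (keyword_weight, keyword_ids),
    )
    for weight, ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents found by both searches accumulate both contributions.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = reciprocal_rank_fusion(["a", "b", "c"], ["b", "d"])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'c', 'd']
```

RRF uses only ranks, so the cosine and `ts_rank` scores never need to be normalized against each other; a document that appears in both lists (here `b`) is promoted above one that tops a single list.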

Security

Input validation for all IDs and paths
Path traversal prevention
Content size limits (default 10 MB)
Per-project data isolation
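
The checks above might look roughly like the following sketch. The ID pattern, the interpretation of 10 MB as 10 × 1024 × 1024 bytes, and all function names are assumptions made for illustration; the server's actual validation rules may be stricter or different.

```python
import re
from pathlib import PurePosixPath

MAX_CONTENT_BYTES = 10 * 1024 * 1024  # assumed meaning of "10 MB"
ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,128}$")  # hypothetical rule


def validate_id(value: str) -> str:
    """Accept only short IDs made of letters, digits, '_' and '-'."""
    if not ID_PATTERN.match(value):
        raise ValueError(f"invalid id: {value!r}")
    return value


def validate_source_path(path: str) -> str:
    """Reject paths containing NUL bytes or parent-directory segments."""
    if "\x00" in path:
        raise ValueError("path contains a NUL byte")
    # Normalize backslashes so Windows-style separators are also checked.
    parts = PurePosixPath(path.replace("\\", "/")).parts
    if ".." in parts:
        raise ValueError("path traversal is not allowed")
    return path


def validate_content(content: str) -> str:
    """Enforce the size limit on the UTF-8 encoded content."""
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValueError("content exceeds the size limit")
    return content
```

Checking the size of the encoded bytes rather than the character count matters for non-ASCII content, where one character can occupy several bytes.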

Testing

# Full test suite with coverage
IS_TEST=True uv run pytest -v --cov=. --cov-report=term-missing

# Specific test file
IS_TEST=True uv run pytest tests/test_server.py -v

API Endpoints

Endpoint	Method	Description
`/health`	GET	Health check with dependency status
`/mcp/tools`	GET	List available tools
`/mcp`	POST	JSON-RPC 2.0 tool execution
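
To show how a client might call a tool through the `/mcp` endpoint, here is a sketch that builds a JSON-RPC 2.0 envelope with the standard library. The method name `tools/call` and the `name`/`arguments` parameter layout follow the Model Context Protocol convention and are an assumption about this server; `build_tool_call` and `post_json` are hypothetical helpers, and the URL uses the `KB_PORT` value from the configuration section.

```python
import json
import urllib.request
from itertools import count

_request_ids = count(1)  # simple source of unique request IDs


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in a JSON-RPC 2.0 request object."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",  # assumed method name (MCP convention)
        "params": {"name": tool_name, "arguments": arguments},
    }


def post_json(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST a JSON payload and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_tool_call(
    "search_knowledge",
    {
        "project_id": "proj-123",
        "agent_id": "agent-456",
        "query": "authentication flow",
        "search_type": "hybrid",
        "limit": 10,
    },
)
# With a server running locally, the call would be:
# result = post_json("http://localhost:8002/mcp", payload)
```

The `id` field lets a client match responses to requests, so each call should use a fresh value.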