503 Commits

Author SHA1 Message Date
Felipe Cardoso
48ecb40f18 feat(memory): implement memory scoping with hierarchy and access control (#93)
Add scope management system for hierarchical memory access:
- ScopeManager with hierarchy: Global → Project → Agent Type → Agent Instance → Session
- ScopePolicy for access control (read, write, inherit permissions)
- ScopeResolver for resolving queries across scope hierarchies with inheritance
- ScopeFilter for filtering scopes by type, project, or agent
- Access control enforcement with parent scope visibility
- Deduplication support during resolution across scopes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 02:39:22 +01:00
Felipe Cardoso
b818f17418 feat(memory): add procedural memory implementation (Issue #92)
Implements procedural memory for learned skills and procedures:

Core functionality:
- ProceduralMemory class for procedure storage/retrieval
- record_procedure with duplicate detection and step merging
- find_matching for context-based procedure search
- record_outcome for success/failure tracking
- get_best_procedure for finding highest success rate
- update_steps for procedure refinement

Supporting modules:
- ProcedureMatcher: Keyword-based procedure matching
- MatchResult/MatchContext: Matching result types
- Success rate weighting in match scoring

Test coverage:
- 43 unit tests covering all modules
- matching.py: 97% coverage
- memory.py: 86% coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 02:31:32 +01:00
Felipe Cardoso
e946787a61 feat(memory): add semantic memory implementation (Issue #91)
Implements semantic memory with fact storage, retrieval, and verification:

Core functionality:
- SemanticMemory class for fact storage/retrieval
- Fact storage as subject-predicate-object triples
- Duplicate detection with reinforcement
- Semantic search with text-based fallback
- Entity-based retrieval
- Confidence scoring and decay
- Conflict resolution

Supporting modules:
- FactExtractor: Pattern-based fact extraction from episodes
- FactVerifier: Contradiction detection and reliability scoring

Test coverage:
- 47 unit tests covering all modules
- extraction.py: 99% coverage
- verification.py: 95% coverage
- memory.py: 78% coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 02:23:06 +01:00
Felipe Cardoso
3554efe66a feat(memory): add episodic memory implementation (Issue #90)
Implements the episodic memory service for storing and retrieving
agent task execution experiences. This enables learning from past
successes and failures.

Components:
- EpisodicMemory: Main service class combining recording and retrieval
- EpisodeRecorder: Handles episode creation, importance scoring
- EpisodeRetriever: Multiple retrieval strategies (recency, semantic,
  outcome, importance, task type)

Key features:
- Records task completions with context, actions, outcomes
- Calculates importance scores based on outcome, duration, lessons
- Semantic search with fallback to recency when embeddings unavailable
- Full CRUD operations with statistics and summarization
- Comprehensive unit tests (50 tests, all passing)

Closes #90

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 02:08:16 +01:00
Felipe Cardoso
bd988f76b0 fix(memory): address review findings from Issue #88
Fixes based on multi-agent review:

Model Improvements:
- Remove duplicate index ix_procedures_agent_type (already indexed via Column)
- Fix postgresql_where to use text() instead of string literal in Fact model
- Add thread-safety to Procedure.success_rate property (snapshot values)

Data Integrity Constraints:
- Add CheckConstraint for Episode: importance_score 0-1, duration >= 0, tokens >= 0
- Add CheckConstraint for Fact: confidence 0-1
- Add CheckConstraint for Procedure: success_count >= 0, failure_count >= 0

Migration Updates:
- Add check constraints creation in upgrade()
- Add check constraints removal in downgrade()

Note: SQLAlchemy Column default=list is correct (callable factory pattern)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:54:51 +01:00
Felipe Cardoso
4974233169 feat(memory): add working memory implementation (Issue #89)
Implements session-scoped ephemeral memory with:

Storage Backends:
- InMemoryStorage: Thread-safe fallback with TTL support and capacity limits
- RedisStorage: Primary storage with connection pooling and JSON serialization
- Auto-fallback from Redis to in-memory when unavailable

WorkingMemory Class:
- Key-value storage with TTL and reserved key protection
- Task state tracking with progress updates
- Scratchpad for reasoning steps with timestamps
- Checkpoint/snapshot support for recovery
- Factory methods for auto-configured storage

Tests:
- 55 unit tests covering all functionality
- Tests for basic ops, TTL, capacity, concurrency
- Tests for task state, scratchpad, checkpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:51:03 +01:00
Felipe Cardoso
c9d8c0835c feat(memory): add database schema and storage layer (Issue #88)
Add SQLAlchemy models for the Agent Memory System:
- WorkingMemory: Key-value storage with TTL for active sessions
- Episode: Experiential memories from task executions
- Fact: Semantic knowledge triples with confidence scores
- Procedure: Learned skills and procedures with success tracking
- MemoryConsolidationLog: Tracks consolidation jobs between memory tiers

Create enums for memory system:
- ScopeType: global, project, agent_type, agent_instance, session
- EpisodeOutcome: success, failure, partial
- ConsolidationType: working_to_episodic, episodic_to_semantic, etc.
- ConsolidationStatus: pending, running, completed, failed

Add Alembic migration (0005) for all memory tables with:
- Foreign key relationships to projects, agent_instances, agent_types
- Comprehensive indexes for query patterns
- Unique constraints for key lookups and triple uniqueness
- Vector embedding column placeholders (Text fallback until pgvector enabled)

Fix timezone-naive datetime.now() in types.py TaskState (review feedback)

Includes 30 unit tests for models and enums.

Closes #88

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:37:58 +01:00
Felipe Cardoso
085a748929 feat(memory): #87 project setup & core architecture
Implements Sub-Issue #87 of Issue #62 (Agent Memory System).

Core infrastructure:
- memory/types.py: Type definitions for all memory types (Working, Episodic,
  Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome
- memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton
- memory/exceptions.py: Comprehensive exception hierarchy for memory operations
- memory/manager.py: MemoryManager facade with placeholder methods

Directory structure:
- working/: Working memory (Redis/in-memory) - to be implemented in #89
- episodic/: Episodic memory (experiences) - to be implemented in #90
- semantic/: Semantic memory (facts) - to be implemented in #91
- procedural/: Procedural memory (skills) - to be implemented in #92
- scoping/: Scope management - to be implemented in #93
- indexing/: Vector indexing - to be implemented in #94
- consolidation/: Memory consolidation - to be implemented in #95

Tests: 71 unit tests for config, types, and exceptions
Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:27:36 +01:00
Felipe Cardoso
4b149b8a52 feat(tests): add unit tests for Context Management API routes
- Added detailed unit tests for `/context` endpoints, covering health checks, context assembly, token counting, budget retrieval, and cache invalidation.
- Included edge cases, error handling, and input validation for context-related operations.
- Improved test coverage for the Context Management module with mocked dependencies and integration scenarios.
2026-01-05 01:02:49 +01:00
Felipe Cardoso
ad0c06851d feat(tests): add comprehensive E2E tests for MCP and Agent workflows
- Introduced end-to-end tests for MCP workflows, including server discovery, authentication, context engine operations, error handling, and input validation.
- Added full lifecycle tests for agent workflows, covering type management, instance spawning, status transitions, and admin-only operations.
- Enhanced test coverage for real-world MCP and Agent scenarios across PostgreSQL and async environments.
2026-01-05 01:02:41 +01:00
Felipe Cardoso
49359b1416 feat(api): add Context Management API and routes
- Introduced a new `context` module and its endpoints for Context Management.
- Added `/context` route to the API router for assembling LLM context, token counting, budget management, and cache invalidation.
- Implemented health checks, context assembly, token counting, and caching operations in the Context Management Engine.
- Included schemas for request/response models and tightened error handling for context-related operations.
2026-01-05 01:02:33 +01:00
Felipe Cardoso
911d950c15 feat(tests): add comprehensive integration tests for MCP stack
- Introduced integration tests covering backend, LLM Gateway, Knowledge Base, and Context Engine.
- Includes health checks, tool listing, token counting, and end-to-end MCP flows.
- Added `RUN_INTEGRATION_TESTS` environment flag to enable selective test execution.
- Includes a quick health check script to verify service availability before running tests.
2026-01-05 01:02:22 +01:00
Felipe Cardoso
b2a3ac60e0 feat: add integration testing target to Makefile
- Introduced `test-integration` command for MCP integration tests.
- Expanded help section with details about running integration tests.
- Improved Makefile's testing capabilities for enhanced developer workflows.
2026-01-05 01:02:16 +01:00
Felipe Cardoso
dea092e1bb feat: extend Makefile with testing and validation commands, expand help section
- Added new targets for testing (`test`, `test-backend`, `test-mcp`, `test-frontend`, etc.) and validation (`validate`, `validate-all`).
- Enhanced help section to reflect updates, including detailed descriptions for testing, validation, and new MCP-specific commands.
- Improved developer workflow by centralizing testing and linting processes in the Makefile.
2026-01-05 01:02:09 +01:00
Felipe Cardoso
4154dd5268 feat: enhance database transactions, add Makefiles, and improve Docker setup
- Refactored database batch operations to ensure transaction atomicity and simplify nested structure.
- Added `Makefile` for `knowledge-base` and `llm-gateway` modules to streamline development workflows.
- Simplified `Dockerfile` for `llm-gateway` by removing multi-stage builds and optimizing dependencies.
- Improved code readability in `collection_manager` and `failover` modules with refined logic.
- Minor fixes in `test_server` and Redis health check handling for better diagnostics.
2026-01-05 00:49:19 +01:00
Felipe Cardoso
db12937495 feat: integrate MCP servers into Docker Compose files for development and deployment
- Added `mcp-llm-gateway` and `mcp-knowledge-base` services to `docker-compose.dev.yml`, `docker-compose.deploy.yml`, and `docker-compose.yml` for AI agent capabilities.
- Configured health checks, environment variables, and dependencies for MCP services.
- Included updated resource limits and deployment settings for production environments.
- Connected backend and agent services to the MCP servers.
2026-01-05 00:49:10 +01:00
Felipe Cardoso
81e1456631 test(activity): fix flaky test by generating fresh events for today group
- Resolves timezone and day boundary issues by creating fresh "today" events in the test case.
2026-01-05 00:30:36 +01:00
Felipe Cardoso
58e78d8700 docs(workflow): add pre-commit hooks documentation
Document the pre-commit hook setup, behavior, and rationale for
protecting only main/dev branches while allowing flexibility on
feature branches.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 19:49:45 +01:00
Felipe Cardoso
5e80139afa chore: add pre-commit hook for protected branch validation
Adds a git hook that:
- Blocks commits to main/dev if validation fails
- Runs `make validate` for backend changes
- Runs `npm run validate` for frontend changes
- Skips validation for feature branches (can run manually)

To enable: git config core.hooksPath .githooks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 19:42:53 +01:00
Felipe Cardoso
60ebeaa582 test(safety): add comprehensive tests for safety framework modules
Add tests to improve backend coverage from 85% to 93%:

- test_audit.py: 60 tests for AuditLogger (20% -> 99%)
  - Hash chain integrity, sanitization, retention, handlers
  - Fixed bug: hash chain modification after event creation
  - Fixed bug: verification not using correct prev_hash

- test_hitl.py: Tests for HITL manager (0% -> 100%)
- test_permissions.py: Tests for permissions manager (0% -> 99%)
- test_rollback.py: Tests for rollback manager (0% -> 100%)
- test_metrics.py: Tests for metrics collector (0% -> 100%)
- test_mcp_integration.py: Tests for MCP safety wrapper (0% -> 100%)
- test_validation.py: Additional cache and edge case tests (76% -> 100%)
- test_scoring.py: Lock cleanup and edge case tests (78% -> 91%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 19:41:54 +01:00
Felipe Cardoso
758052dcff feat(context): improve budget validation and XML safety in ranking and Claude adapter
- Added stricter budget validation in ContextRanker with explicit error handling for invalid configurations.
- Introduced `_get_valid_token_count()` helper to validate and safeguard token counts.
- Enhanced XML escaping in Claude adapter to prevent injection risks from scores and unhandled content.
2026-01-04 16:02:18 +01:00
Felipe Cardoso
1628eacf2b feat(context): enhance timeout handling, tenant isolation, and budget management
- Added timeout enforcement for token counting, scoring, and compression with detailed error handling.
- Introduced tenant isolation in context caching using project and agent identifiers.
- Enhanced budget management with stricter checks for critical context overspending and buffer limitations.
- Optimized per-context locking with cleanup to prevent memory leaks in concurrent environments.
- Updated default assembly timeout settings for improved performance and reliability.
- Improved XML escaping in Claude adapter for safety against injection attacks.
- Standardized token estimation using model-specific ratios.
2026-01-04 15:52:50 +01:00
Felipe Cardoso
2bea057fb1 chore(context): refactor for consistency, optimize formatting, and simplify logic
- Cleaned up unnecessary comments in `__all__` definitions for better readability.
- Adjusted indentation and formatting across modules for improved clarity (e.g., long lines, logical grouping).
- Simplified conditional expressions and inline comments for context scoring and ranking.
- Replaced some hard-coded values with type-safe annotations (e.g., `ClassVar`).
- Removed unused imports and ensured consistent usage across test files.
- Updated `test_score_not_cached_on_context` to clarify caching behavior.
- Improved truncation strategy logic and marker handling.
2026-01-04 15:23:14 +01:00
Felipe Cardoso
9e54f16e56 test(context): add edge case tests for truncation and scoring concurrency
- Add tests for truncation edge cases, including zero tokens, short content, and marker handling.
- Add concurrency tests for scoring to verify per-context locking and handling of multiple contexts.
2026-01-04 12:38:04 +01:00
Felipe Cardoso
96e6400bd8 feat(context): enhance performance, caching, and settings management
- Replace hard-coded limits with configurable settings (e.g., cache memory size, truncation strategy, relevance settings).
- Optimize parallel execution in token counting, scoring, and reranking for source diversity.
- Improve caching logic:
  - Add per-context locks for safe parallel scoring.
  - Reuse precomputed fingerprints for cache efficiency.
- Make truncation, scoring, and ranker behaviors fully configurable via settings.
- Add support for middle truncation, context hash-based hashing, and dynamic token limiting.
- Refactor methods for scalability and better error handling.

Tests: Updated all affected components with additional test cases.
2026-01-04 12:37:58 +01:00
Felipe Cardoso
6c7b72f130 chore(context): apply linter fixes and sort imports (#86)
Phase 8 of Context Management Engine - Final Cleanup:

- Sort __all__ exports alphabetically
- Sort imports per isort conventions
- Fix minor linting issues

Final test results:
- 311 context management tests passing
- 2507 total backend tests passing
- 85% code coverage

Context Management Engine is complete with all 8 phases:
1. Foundation: Types, Config, Exceptions
2. Token Budget Management
3. Context Scoring & Ranking
4. Context Assembly Pipeline
5. Model Adapters (Claude, OpenAI)
6. Caching Layer (Redis + in-memory)
7. Main Engine & Integration
8. Testing & Documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:46:56 +01:00
Felipe Cardoso
027ebfc332 feat(context): implement main ContextEngine with full integration (#85)
Phase 7 of Context Management Engine - Main Engine:

- Add ContextEngine as main orchestration class
- Integrate all components: calculator, scorer, ranker, compressor, cache
- Add high-level assemble_context() API with:
  - System prompt support
  - Task description support
  - Knowledge Base integration via MCP
  - Conversation history conversion
  - Tool results conversion
  - Custom contexts support
- Add helper methods:
  - get_budget_for_model()
  - count_tokens() with caching
  - invalidate_cache()
  - get_stats()
- Add create_context_engine() factory function

Tests: 26 new tests, 311 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:44:40 +01:00
Felipe Cardoso
c2466ab401 feat(context): implement Redis-based caching layer (#84)
Phase 6 of Context Management Engine - Caching Layer:

- Add ContextCache with Redis integration
- Support fingerprint-based assembled context caching
- Support token count caching (model-specific)
- Support score caching (scorer + context + query)
- Add in-memory fallback with LRU eviction
- Add cache invalidation with pattern matching
- Add cache statistics reporting

Key features:
- Hierarchical cache key structure (ctx:type:hash)
- Automatic TTL expiration
- Memory cache for fast repeated access
- Graceful degradation when Redis unavailable

Tests: 29 new tests, 285 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:41:21 +01:00
Felipe Cardoso
7828d35e06 feat(context): implement model adapters for Claude and OpenAI (#83)
Phase 5 of Context Management Engine - Model Adapters:

- Add ModelAdapter abstract base class with model matching
- Add DefaultAdapter for unknown models (plain text)
- Add ClaudeAdapter with XML-based formatting:
  - <system_instructions> for system context
  - <reference_documents>/<document> for knowledge
  - <conversation_history>/<message> for chat
  - <tool_results>/<tool_result> for tool outputs
  - XML escaping for special characters
- Add OpenAIAdapter with markdown formatting:
  - ## headers for sections
  - ### Source headers for documents
  - **ROLE** bold labels for conversation
  - Code blocks for tool outputs
- Add get_adapter() factory function for model selection

Tests: 33 new tests, 256 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:36:32 +01:00
Felipe Cardoso
6b07e62f00 feat(context): implement assembly pipeline and compression (#82)
Phase 4 of Context Management Engine - Assembly Pipeline:

- Add TruncationStrategy with end/middle/sentence-aware truncation
- Add TruncationResult dataclass for tracking compression metrics
- Add ContextCompressor for type-specific compression
- Add ContextPipeline orchestrating full assembly workflow:
  - Token counting for all contexts
  - Scoring and ranking via ContextRanker
  - Optional compression when budget threshold exceeded
  - Model-specific formatting (XML for Claude, markdown for OpenAI)
- Add PipelineMetrics for performance tracking
- Update AssembledContext with new fields (model, contexts, metadata)
- Add backward compatibility aliases for renamed fields

Tests: 34 new tests, 223 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:32:25 +01:00
Felipe Cardoso
0d2005ddcb feat(context): implement context scoring and ranking (Phase 3)
Add comprehensive scoring system with three strategies:
- RelevanceScorer: Semantic similarity with keyword fallback
- RecencyScorer: Exponential decay with type-specific half-lives
- PriorityScorer: Priority-based scoring with type bonuses

Implement CompositeScorer combining all strategies with configurable
weights (default: 50% relevance, 30% recency, 20% priority).

Add ContextRanker for budget-aware context selection with:
- Greedy selection algorithm respecting token budgets
- CRITICAL priority contexts always included
- Diversity reranking to prevent source dominance
- Comprehensive selection statistics

68 tests covering all scoring and ranking functionality.

Part of #61 - Context Management Engine

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:24:06 +01:00
Felipe Cardoso
dfa75e682e feat(context): implement token budget management (Phase 2)
Add TokenCalculator with LLM Gateway integration for accurate token
counting with in-memory caching and fallback character-based estimation.
Implement TokenBudget for tracking allocations per context type with
budget enforcement, and BudgetAllocator for creating budgets based on
model context window sizes.

- TokenCalculator: MCP integration, caching, model-specific ratios
- TokenBudget: allocation tracking, can_fit/allocate/deallocate/reset
- BudgetAllocator: model context sizes, budget creation and adjustment
- 35 comprehensive tests covering all budget functionality

Part of #61 - Context Management Engine

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:13:23 +01:00
Felipe Cardoso
22ecb5e989 feat(context): Phase 1 - Foundation types, config and exceptions (#79)
Implements the foundation for Context Management Engine:

Types (backend/app/services/context/types/):
- BaseContext: Abstract base with ID, content, priority, scoring
- SystemContext: System prompts, personas, instructions
- KnowledgeContext: RAG results from Knowledge Base MCP
- ConversationContext: Chat history with role support
- TaskContext: Task/issue context with acceptance criteria
- ToolContext: Tool definitions and execution results
- AssembledContext: Final assembled context result

Configuration (config.py):
- Token budget allocation (system 5%, task 10%, knowledge 40%, etc.)
- Scoring weights (relevance 50%, recency 30%, priority 20%)
- Cache settings (TTL, prefix)
- Performance settings (max assembly time, parallel scoring)
- Environment variable overrides with CTX_ prefix

Exceptions (exceptions.py):
- ContextError: Base exception
- BudgetExceededError: Token budget violations
- TokenCountError: Token counting failures
- CompressionError: Compression failures
- AssemblyTimeoutError: Assembly timeout
- ScoringError, FormattingError, CacheError
- ContextNotFoundError, InvalidContextError

All 86 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:07:39 +01:00
Felipe Cardoso
2ab69f8561 docs(mcp): add comprehensive MCP server documentation
- Add docs/architecture/MCP_SERVERS.md with full architecture overview
- Add README.md for LLM Gateway with quick start, tools, and model groups
- Add README.md for Knowledge Base with search types, chunking strategies
- Include API endpoints, security guidelines, and testing instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:37:04 +01:00
Felipe Cardoso
95342cc94d fix(mcp-gateway): address critical issues from deep review
Frontend:
- Fix debounce race condition in UserListTable search handler
- Use useRef to properly track and cleanup timeout between keystrokes

Backend (LLM Gateway):
- Add thread-safe double-checked locking for global singletons
  (providers, circuit registry, cost tracker)
- Fix Redis URL parsing with proper urlparse validation
- Add explicit error handling for malformed Redis URLs
- Document circuit breaker state transition safety

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:36:55 +01:00
Felipe Cardoso
f6194b3e19 Merge pull request #72: feat(knowledge-base): implement Knowledge Base MCP Server (#57)
Implements RAG capabilities with pgvector, intelligent chunking, and 6 MCP tools.

Closes #57
2026-01-04 01:28:20 +01:00
Felipe Cardoso
6bb376a336 fix(mcp-kb): add input validation, path security, and health checks
Security fixes from deep review:
- Add input validation patterns for project_id, agent_id, collection
- Add path traversal protection for source_path (reject .., null bytes)
- Add error codes (INTERNAL_ERROR) to generic exception handlers
- Handle FieldInfo objects in validation for test robustness

Performance fixes:
- Enable concurrent hybrid search with asyncio.gather

Health endpoint improvements:
- Check all dependencies (database, Redis, LLM Gateway)
- Return degraded/unhealthy status based on dependency health
- Updated tests for new health check response structure

All 139 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:18:50 +01:00
Felipe Cardoso
cd7a9ccbdf fix(mcp-kb): add transactional batch insert and atomic document update
- Wrap store_embeddings_batch in transaction for all-or-nothing semantics
- Add replace_source_embeddings method for atomic document updates
- Update collection_manager to use transactional replace
- Prevents race conditions and data inconsistency (closes #77)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:07:40 +01:00
Felipe Cardoso
953af52d0e fix(mcp-kb): address critical issues from deep review
- Fix SQL HAVING clause bug by using CTE approach (closes #73)
- Add /mcp JSON-RPC 2.0 endpoint for tool execution (closes #74)
- Add /mcp/tools endpoint for tool discovery (closes #75)
- Add content size limits to prevent DoS attacks (closes #78)
- Add comprehensive tests for new endpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 01:03:58 +01:00
Felipe Cardoso
e6e98d4ed1 docs(workflow): enforce stack verification as mandatory step
- Added "Stack Verification" section to CLAUDE.md with detailed steps.
- Updated WORKFLOW.md to mandate running the full stack before marking work as complete.
- Prevents issues where high test coverage masks application startup failures.
2026-01-04 00:58:31 +01:00
Felipe Cardoso
ca5f5e3383 refactor(environment): update virtualenv path to /opt/venv in Docker setup
- Adjusted `docker-compose.dev.yml` to reflect the new venv location.
- Modified entrypoint script and Dockerfile to reference `/opt/venv` for isolated dependencies.
- Improved bind mount setup to prevent venv overwrites during development.
2026-01-04 00:58:24 +01:00
Felipe Cardoso
d0fc7f37ff feat(knowledge-base): implement Knowledge Base MCP Server (#57)
Implements RAG capabilities with pgvector for semantic search:

- Intelligent chunking strategies (code-aware, markdown-aware, text)
- Semantic search with vector similarity (HNSW index)
- Keyword search with PostgreSQL full-text search
- Hybrid search using Reciprocal Rank Fusion (RRF)
- Redis caching for embeddings
- Collection management (ingest, search, delete, stats)
- FastMCP tools: search_knowledge, ingest_content, delete_content,
  list_collections, get_collection_stats, update_document

Testing:
- 128 comprehensive tests covering all components
- 58% code coverage (database integration tests use mocks)
- Passes ruff linting and mypy type checking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 21:33:26 +01:00
Felipe Cardoso
18d717e996 Merge pull request #71 from feature/56-llm-gateway-mcp-server
feat(llm-gateway): implement LLM Gateway MCP Server (#56)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-01-03 20:56:35 +01:00
Felipe Cardoso
f482559e15 fix(llm-gateway): improve type safety and datetime consistency
- Add type annotations for mypy compliance
- Use UTC-aware datetimes consistently (datetime.now(UTC))
- Add type: ignore comments for LiteLLM incomplete stubs
- Fix import ordering and formatting
- Update pyproject.toml mypy configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 20:56:05 +01:00
Felipe Cardoso
6e8b0b022a feat(llm-gateway): implement LLM Gateway MCP Server (#56)
Implements complete LLM Gateway MCP Server with:
- FastMCP server with 4 tools: chat_completion, list_models, get_usage, count_tokens
- LiteLLM Router with multi-provider failover chains
- Circuit breaker pattern for fault tolerance
- Redis-based cost tracking per project/agent
- Comprehensive test suite (209 tests, 92% coverage)

Model groups defined per ADR-004:
- reasoning: claude-opus-4 → gpt-4.1 → gemini-2.5-pro
- code: claude-sonnet-4 → gpt-4.1 → deepseek-coder
- fast: claude-haiku → gpt-4.1-mini → gemini-2.0-flash

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 20:31:19 +01:00
Felipe Cardoso
746fb7b181 refactor(connection): improve retry and cleanup behavior in project events
- Refined retry delay logic for clarity and correctness in `getNextRetryDelay`.
- Added `connectRef` to ensure latest `connect` function is called in retries.
- Separated cleanup and connection management effects to prevent premature disconnections.
- Enhanced inline comments for maintainability.
2026-01-03 18:36:51 +01:00
Felipe Cardoso
caf283bed2 feat(safety): enhance rate limiting and cost control with alert deduplication and usage tracking
- Added `record_action` in `RateLimiter` for precise tracking of slot consumption post-validation.
- Introduced deduplication mechanism for warning alerts in `CostController` to prevent spamming.
- Refactored `CostController`'s session and daily budget alert handling for improved clarity.
- Implemented test suites for `CostController` and `SafetyGuardian` to validate changes.
- Expanded integration testing to cover deduplication, validation, and loop detection edge cases.
2026-01-03 17:55:34 +01:00
Felipe Cardoso
520c06175e refactor(safety): apply consistent formatting across services and tests
Improved code readability and uniformity by standardizing line breaks, indentation, and inline conditions across safety-related services, models, and tests, including content filters, validation rules, and emergency controls.
2026-01-03 16:23:39 +01:00
Felipe Cardoso
065e43c5a9 fix(tests): use delay variables in retry delay test
The delay2 and delay3 variables were calculated but never asserted,
causing lint warnings. Added assertions to verify all delays are
positive and within max bounds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 16:19:54 +01:00
Felipe Cardoso
c8b88dadc3 fix(safety): copy default patterns to avoid test pollution
The ContentFilter was appending references to DEFAULT_PATTERNS objects,
so when tests modified patterns (e.g., disabling them), those changes
persisted across test runs. Use dataclass replace() to create copies.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 12:08:43 +01:00