syndarix

Author	SHA1	Message	Date
Felipe Cardoso	57680c3772	feat(memory): implement metrics and observability (#100 ) Add comprehensive metrics collector for memory system with: - Counter metrics: operations, retrievals, cache hits/misses, consolidations, episodes recorded, patterns/anomalies/insights detected - Gauge metrics: item counts, memory size, cache size, procedure success rates, active sessions, pending consolidations - Histogram metrics: working memory latency, retrieval latency, consolidation duration, embedding latency - Prometheus format export - Summary and cache stats helpers 31 tests covering all metric types, singleton pattern, and edge cases. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 11:00:53 +01:00
Felipe Cardoso	997cfaa03a	feat(memory): implement memory reflection service (#99 ) Add reflection layer for memory system with pattern detection, success/failure factor analysis, anomaly detection, and insights generation. Enables agents to learn from past experiences and identify optimization opportunities. Key components: - Pattern detection: recurring success/failure, action sequences, temporal, efficiency - Factor analysis: action, context, timing, resource, preceding state factors - Anomaly detection: unusual duration, token usage, failure rates, action patterns - Insight generation: optimization, warning, learning, recommendation, trend insights Also fixes pre-existing timezone issues in test_types.py (datetime.now() -> datetime.now(UTC)). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 04:22:23 +01:00
Felipe Cardoso	6954774e36	feat(memory): implement caching layer for memory operations (#98 ) Add comprehensive caching layer for the Agent Memory System: - HotMemoryCache: LRU cache for frequently accessed memories - Python 3.12 type parameter syntax - Thread-safe operations with RLock - TTL-based expiration - Access count tracking for hot memory identification - Scoped invalidation by type, scope, or pattern - EmbeddingCache: Cache embeddings by content hash - Content-hash based deduplication - Optional Redis backing for persistence - LRU eviction with configurable max size - CachedEmbeddingGenerator wrapper for transparent caching - CacheManager: Unified cache management - Coordinates hot cache, embedding cache, and retrieval cache - Centralized invalidation across all caches - Aggregated statistics and hit rate tracking - Automatic cleanup scheduling - Cache warmup support Performance targets: - Cache hit rate > 80% for hot memories - Cache operations < 1ms (memory), < 5ms (Redis) 83 new tests with comprehensive coverage. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 04:04:13 +01:00
Felipe Cardoso	30e5c68304	feat(memory): integrate memory system with context engine (#97 ) ## Changes ### New Context Type - Add MEMORY to ContextType enum for agent memory context - Create MemoryContext class with subtypes (working, episodic, semantic, procedural) - Factory methods: from_working_memory, from_episodic_memory, from_semantic_memory, from_procedural_memory ### Memory Context Source - MemoryContextSource service fetches relevant memories for context assembly - Configurable fetch limits per memory type - Parallel fetching from all memory types ### Agent Lifecycle Hooks - AgentLifecycleManager handles spawn, pause, resume, terminate events - spawn: Initialize working memory with optional initial state - pause: Create checkpoint of working memory - resume: Restore from checkpoint - terminate: Consolidate working memory to episodic memory - LifecycleHooks for custom extension points ### Context Engine Integration - Add memory_query parameter to assemble_context() - Add session_id and agent_type_id for memory scoping - Memory budget allocation (15% by default) - set_memory_source() for runtime configuration ### Tests - 48 new tests for MemoryContext, MemoryContextSource, and lifecycle hooks - All 108 memory-related tests passing - mypy and ruff checks passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 03:49:22 +01:00
Felipe Cardoso	0b24d4c6cc	feat(memory): implement MCP tools for agent memory operations (#96 ) Add MCP-compatible tools that expose memory operations to agents: Tools implemented: - remember: Store data in working, episodic, semantic, or procedural memory - recall: Retrieve memories by query across multiple memory types - forget: Delete specific keys or bulk delete by pattern - reflect: Analyze patterns in recent episodes (success/failure factors) - get_memory_stats: Return usage statistics and breakdowns - search_procedures: Find procedures matching trigger patterns - record_outcome: Record task outcomes and update procedure success rates Key components: - tools.py: Pydantic schemas for tool argument validation with comprehensive field constraints (importance 0-1, TTL limits, limit ranges) - service.py: MemoryToolService coordinating memory type operations with proper scoping via ToolContext (project_id, agent_instance_id, session_id) - Lazy initialization of memory services (WorkingMemory, EpisodicMemory, SemanticMemory, ProceduralMemory) Test coverage: - 60 tests covering tool definitions, argument validation, and service execution paths - Mock-based tests for all memory type interactions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 03:32:10 +01:00
Felipe Cardoso	1670e05e0d	feat(memory): implement memory consolidation service and tasks (#95 ) - Add MemoryConsolidationService with Working→Episodic→Semantic/Procedural transfer - Add Celery tasks for session and nightly consolidation - Implement memory pruning with importance-based retention - Add comprehensive test suite (32 tests) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 03:04:28 +01:00
Felipe Cardoso	999b7ac03f	feat(memory): implement memory indexing and retrieval engine (#94 ) Add comprehensive indexing and retrieval system for memory search: - VectorIndex for semantic similarity search using cosine similarity - TemporalIndex for time-based queries with range and recency support - EntityIndex for entity-based lookups with multi-entity intersection - OutcomeIndex for success/failure filtering on episodes - MemoryIndexer as unified interface for all index types - RetrievalEngine with hybrid search combining all indices - RelevanceScorer for multi-signal relevance scoring - RetrievalCache for LRU caching of search results 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 02:50:13 +01:00
Felipe Cardoso	48ecb40f18	feat(memory): implement memory scoping with hierarchy and access control (#93 ) Add scope management system for hierarchical memory access: - ScopeManager with hierarchy: Global → Project → Agent Type → Agent Instance → Session - ScopePolicy for access control (read, write, inherit permissions) - ScopeResolver for resolving queries across scope hierarchies with inheritance - ScopeFilter for filtering scopes by type, project, or agent - Access control enforcement with parent scope visibility - Deduplication support during resolution across scopes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 02:39:22 +01:00
Felipe Cardoso	b818f17418	feat(memory): add procedural memory implementation (Issue #92 ) Implements procedural memory for learned skills and procedures: Core functionality: - ProceduralMemory class for procedure storage/retrieval - record_procedure with duplicate detection and step merging - find_matching for context-based procedure search - record_outcome for success/failure tracking - get_best_procedure for finding highest success rate - update_steps for procedure refinement Supporting modules: - ProcedureMatcher: Keyword-based procedure matching - MatchResult/MatchContext: Matching result types - Success rate weighting in match scoring Test coverage: - 43 unit tests covering all modules - matching.py: 97% coverage - memory.py: 86% coverage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 02:31:32 +01:00
Felipe Cardoso	e946787a61	feat(memory): add semantic memory implementation (Issue #91 ) Implements semantic memory with fact storage, retrieval, and verification: Core functionality: - SemanticMemory class for fact storage/retrieval - Fact storage as subject-predicate-object triples - Duplicate detection with reinforcement - Semantic search with text-based fallback - Entity-based retrieval - Confidence scoring and decay - Conflict resolution Supporting modules: - FactExtractor: Pattern-based fact extraction from episodes - FactVerifier: Contradiction detection and reliability scoring Test coverage: - 47 unit tests covering all modules - extraction.py: 99% coverage - verification.py: 95% coverage - memory.py: 78% coverage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 02:23:06 +01:00
Felipe Cardoso	3554efe66a	feat(memory): add episodic memory implementation (Issue #90 ) Implements the episodic memory service for storing and retrieving agent task execution experiences. This enables learning from past successes and failures. Components: - EpisodicMemory: Main service class combining recording and retrieval - EpisodeRecorder: Handles episode creation, importance scoring - EpisodeRetriever: Multiple retrieval strategies (recency, semantic, outcome, importance, task type) Key features: - Records task completions with context, actions, outcomes - Calculates importance scores based on outcome, duration, lessons - Semantic search with fallback to recency when embeddings unavailable - Full CRUD operations with statistics and summarization - Comprehensive unit tests (50 tests, all passing) Closes #90 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 02:08:16 +01:00
Felipe Cardoso	bd988f76b0	fix(memory): address review findings from Issue #88 Fixes based on multi-agent review: Model Improvements: - Remove duplicate index ix_procedures_agent_type (already indexed via Column) - Fix postgresql_where to use text() instead of string literal in Fact model - Add thread-safety to Procedure.success_rate property (snapshot values) Data Integrity Constraints: - Add CheckConstraint for Episode: importance_score 0-1, duration >= 0, tokens >= 0 - Add CheckConstraint for Fact: confidence 0-1 - Add CheckConstraint for Procedure: success_count >= 0, failure_count >= 0 Migration Updates: - Add check constraints creation in upgrade() - Add check constraints removal in downgrade() Note: SQLAlchemy Column default=list is correct (callable factory pattern) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 01:54:51 +01:00
Felipe Cardoso	4974233169	feat(memory): add working memory implementation (Issue #89 ) Implements session-scoped ephemeral memory with: Storage Backends: - InMemoryStorage: Thread-safe fallback with TTL support and capacity limits - RedisStorage: Primary storage with connection pooling and JSON serialization - Auto-fallback from Redis to in-memory when unavailable WorkingMemory Class: - Key-value storage with TTL and reserved key protection - Task state tracking with progress updates - Scratchpad for reasoning steps with timestamps - Checkpoint/snapshot support for recovery - Factory methods for auto-configured storage Tests: - 55 unit tests covering all functionality - Tests for basic ops, TTL, capacity, concurrency - Tests for task state, scratchpad, checkpoints 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 01:51:03 +01:00
Felipe Cardoso	c9d8c0835c	feat(memory): add database schema and storage layer (Issue #88 ) Add SQLAlchemy models for the Agent Memory System: - WorkingMemory: Key-value storage with TTL for active sessions - Episode: Experiential memories from task executions - Fact: Semantic knowledge triples with confidence scores - Procedure: Learned skills and procedures with success tracking - MemoryConsolidationLog: Tracks consolidation jobs between memory tiers Create enums for memory system: - ScopeType: global, project, agent_type, agent_instance, session - EpisodeOutcome: success, failure, partial - ConsolidationType: working_to_episodic, episodic_to_semantic, etc. - ConsolidationStatus: pending, running, completed, failed Add Alembic migration (0005) for all memory tables with: - Foreign key relationships to projects, agent_instances, agent_types - Comprehensive indexes for query patterns - Unique constraints for key lookups and triple uniqueness - Vector embedding column placeholders (Text fallback until pgvector enabled) Fix timezone-naive datetime.now() in types.py TaskState (review feedback) Includes 30 unit tests for models and enums. Closes #88 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 01:37:58 +01:00
Felipe Cardoso	085a748929	feat(memory): #87 project setup & core architecture Implements Sub-Issue #87 of Issue #62 (Agent Memory System). Core infrastructure: - memory/types.py: Type definitions for all memory types (Working, Episodic, Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome - memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton - memory/exceptions.py: Comprehensive exception hierarchy for memory operations - memory/manager.py: MemoryManager facade with placeholder methods Directory structure: - working/: Working memory (Redis/in-memory) - to be implemented in #89 - episodic/: Episodic memory (experiences) - to be implemented in #90 - semantic/: Semantic memory (facts) - to be implemented in #91 - procedural/: Procedural memory (skills) - to be implemented in #92 - scoping/: Scope management - to be implemented in #93 - indexing/: Vector indexing - to be implemented in #94 - consolidation/: Memory consolidation - to be implemented in #95 Tests: 71 unit tests for config, types, and exceptions Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 01:27:36 +01:00
Felipe Cardoso	4b149b8a52	feat(tests): add unit tests for Context Management API routes - Added detailed unit tests for `/context` endpoints, covering health checks, context assembly, token counting, budget retrieval, and cache invalidation. - Included edge cases, error handling, and input validation for context-related operations. - Improved test coverage for the Context Management module with mocked dependencies and integration scenarios.	2026-01-05 01:02:49 +01:00
Felipe Cardoso	ad0c06851d	feat(tests): add comprehensive E2E tests for MCP and Agent workflows - Introduced end-to-end tests for MCP workflows, including server discovery, authentication, context engine operations, error handling, and input validation. - Added full lifecycle tests for agent workflows, covering type management, instance spawning, status transitions, and admin-only operations. - Enhanced test coverage for real-world MCP and Agent scenarios across PostgreSQL and async environments.	2026-01-05 01:02:41 +01:00
Felipe Cardoso	49359b1416	feat(api): add Context Management API and routes - Introduced a new `context` module and its endpoints for Context Management. - Added `/context` route to the API router for assembling LLM context, token counting, budget management, and cache invalidation. - Implemented health checks, context assembly, token counting, and caching operations in the Context Management Engine. - Included schemas for request/response models and tightened error handling for context-related operations.	2026-01-05 01:02:33 +01:00
Felipe Cardoso	911d950c15	feat(tests): add comprehensive integration tests for MCP stack - Introduced integration tests covering backend, LLM Gateway, Knowledge Base, and Context Engine. - Includes health checks, tool listing, token counting, and end-to-end MCP flows. - Added `RUN_INTEGRATION_TESTS` environment flag to enable selective test execution. - Includes a quick health check script to verify service availability before running tests.	2026-01-05 01:02:22 +01:00
Felipe Cardoso	b2a3ac60e0	feat: add integration testing target to Makefile - Introduced `test-integration` command for MCP integration tests. - Expanded help section with details about running integration tests. - Improved Makefile's testing capabilities for enhanced developer workflows.	2026-01-05 01:02:16 +01:00
Felipe Cardoso	60ebeaa582	test(safety): add comprehensive tests for safety framework modules Add tests to improve backend coverage from 85% to 93%: - test_audit.py: 60 tests for AuditLogger (20% -> 99%) - Hash chain integrity, sanitization, retention, handlers - Fixed bug: hash chain modification after event creation - Fixed bug: verification not using correct prev_hash - test_hitl.py: Tests for HITL manager (0% -> 100%) - test_permissions.py: Tests for permissions manager (0% -> 99%) - test_rollback.py: Tests for rollback manager (0% -> 100%) - test_metrics.py: Tests for metrics collector (0% -> 100%) - test_mcp_integration.py: Tests for MCP safety wrapper (0% -> 100%) - test_validation.py: Additional cache and edge case tests (76% -> 100%) - test_scoring.py: Lock cleanup and edge case tests (78% -> 91%) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 19:41:54 +01:00
Felipe Cardoso	758052dcff	feat(context): improve budget validation and XML safety in ranking and Claude adapter - Added stricter budget validation in ContextRanker with explicit error handling for invalid configurations. - Introduced `_get_valid_token_count()` helper to validate and safeguard token counts. - Enhanced XML escaping in Claude adapter to prevent injection risks from scores and unhandled content.	2026-01-04 16:02:18 +01:00
Felipe Cardoso	1628eacf2b	feat(context): enhance timeout handling, tenant isolation, and budget management - Added timeout enforcement for token counting, scoring, and compression with detailed error handling. - Introduced tenant isolation in context caching using project and agent identifiers. - Enhanced budget management with stricter checks for critical context overspending and buffer limitations. - Optimized per-context locking with cleanup to prevent memory leaks in concurrent environments. - Updated default assembly timeout settings for improved performance and reliability. - Improved XML escaping in Claude adapter for safety against injection attacks. - Standardized token estimation using model-specific ratios.	2026-01-04 15:52:50 +01:00
Felipe Cardoso	2bea057fb1	chore(context): refactor for consistency, optimize formatting, and simplify logic - Cleaned up unnecessary comments in `__all__` definitions for better readability. - Adjusted indentation and formatting across modules for improved clarity (e.g., long lines, logical grouping). - Simplified conditional expressions and inline comments for context scoring and ranking. - Replaced some hard-coded values with type-safe annotations (e.g., `ClassVar`). - Removed unused imports and ensured consistent usage across test files. - Updated `test_score_not_cached_on_context` to clarify caching behavior. - Improved truncation strategy logic and marker handling.	2026-01-04 15:23:14 +01:00
Felipe Cardoso	9e54f16e56	test(context): add edge case tests for truncation and scoring concurrency - Add tests for truncation edge cases, including zero tokens, short content, and marker handling. - Add concurrency tests for scoring to verify per-context locking and handling of multiple contexts.	2026-01-04 12:38:04 +01:00
Felipe Cardoso	96e6400bd8	feat(context): enhance performance, caching, and settings management - Replace hard-coded limits with configurable settings (e.g., cache memory size, truncation strategy, relevance settings). - Optimize parallel execution in token counting, scoring, and reranking for source diversity. - Improve caching logic: - Add per-context locks for safe parallel scoring. - Reuse precomputed fingerprints for cache efficiency. - Make truncation, scoring, and ranker behaviors fully configurable via settings. - Add support for middle truncation, context hash-based hashing, and dynamic token limiting. - Refactor methods for scalability and better error handling. Tests: Updated all affected components with additional test cases.	2026-01-04 12:37:58 +01:00
Felipe Cardoso	6c7b72f130	chore(context): apply linter fixes and sort imports (#86 ) Phase 8 of Context Management Engine - Final Cleanup: - Sort __all__ exports alphabetically - Sort imports per isort conventions - Fix minor linting issues Final test results: - 311 context management tests passing - 2507 total backend tests passing - 85% code coverage Context Management Engine is complete with all 8 phases: 1. Foundation: Types, Config, Exceptions 2. Token Budget Management 3. Context Scoring & Ranking 4. Context Assembly Pipeline 5. Model Adapters (Claude, OpenAI) 6. Caching Layer (Redis + in-memory) 7. Main Engine & Integration 8. Testing & Documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 02:46:56 +01:00
Felipe Cardoso	027ebfc332	feat(context): implement main ContextEngine with full integration (#85 ) Phase 7 of Context Management Engine - Main Engine: - Add ContextEngine as main orchestration class - Integrate all components: calculator, scorer, ranker, compressor, cache - Add high-level assemble_context() API with: - System prompt support - Task description support - Knowledge Base integration via MCP - Conversation history conversion - Tool results conversion - Custom contexts support - Add helper methods: - get_budget_for_model() - count_tokens() with caching - invalidate_cache() - get_stats() - Add create_context_engine() factory function Tests: 26 new tests, 311 total context tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 02:44:40 +01:00
Felipe Cardoso	c2466ab401	feat(context): implement Redis-based caching layer (#84 ) Phase 6 of Context Management Engine - Caching Layer: - Add ContextCache with Redis integration - Support fingerprint-based assembled context caching - Support token count caching (model-specific) - Support score caching (scorer + context + query) - Add in-memory fallback with LRU eviction - Add cache invalidation with pattern matching - Add cache statistics reporting Key features: - Hierarchical cache key structure (ctx:type:hash) - Automatic TTL expiration - Memory cache for fast repeated access - Graceful degradation when Redis unavailable Tests: 29 new tests, 285 total context tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 02:41:21 +01:00
Felipe Cardoso	7828d35e06	feat(context): implement model adapters for Claude and OpenAI (#83 ) Phase 5 of Context Management Engine - Model Adapters: - Add ModelAdapter abstract base class with model matching - Add DefaultAdapter for unknown models (plain text) - Add ClaudeAdapter with XML-based formatting: - <system_instructions> for system context - <reference_documents>/<document> for knowledge - <conversation_history>/<message> for chat - <tool_results>/<tool_result> for tool outputs - XML escaping for special characters - Add OpenAIAdapter with markdown formatting: - ## headers for sections - ### Source headers for documents - ROLE bold labels for conversation - Code blocks for tool outputs - Add get_adapter() factory function for model selection Tests: 33 new tests, 256 total context tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 02:36:32 +01:00
Felipe Cardoso	6b07e62f00	feat(context): implement assembly pipeline and compression (#82 ) Phase 4 of Context Management Engine - Assembly Pipeline: - Add TruncationStrategy with end/middle/sentence-aware truncation - Add TruncationResult dataclass for tracking compression metrics - Add ContextCompressor for type-specific compression - Add ContextPipeline orchestrating full assembly workflow: - Token counting for all contexts - Scoring and ranking via ContextRanker - Optional compression when budget threshold exceeded - Model-specific formatting (XML for Claude, markdown for OpenAI) - Add PipelineMetrics for performance tracking - Update AssembledContext with new fields (model, contexts, metadata) - Add backward compatibility aliases for renamed fields Tests: 34 new tests, 223 total context tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 02:32:25 +01:00
Felipe Cardoso	0d2005ddcb	feat(context): implement context scoring and ranking (Phase 3) Add comprehensive scoring system with three strategies: - RelevanceScorer: Semantic similarity with keyword fallback - RecencyScorer: Exponential decay with type-specific half-lives - PriorityScorer: Priority-based scoring with type bonuses Implement CompositeScorer combining all strategies with configurable weights (default: 50% relevance, 30% recency, 20% priority). Add ContextRanker for budget-aware context selection with: - Greedy selection algorithm respecting token budgets - CRITICAL priority contexts always included - Diversity reranking to prevent source dominance - Comprehensive selection statistics 68 tests covering all scoring and ranking functionality. Part of #61 - Context Management Engine 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 02:24:06 +01:00
Felipe Cardoso	dfa75e682e	feat(context): implement token budget management (Phase 2) Add TokenCalculator with LLM Gateway integration for accurate token counting with in-memory caching and fallback character-based estimation. Implement TokenBudget for tracking allocations per context type with budget enforcement, and BudgetAllocator for creating budgets based on model context window sizes. - TokenCalculator: MCP integration, caching, model-specific ratios - TokenBudget: allocation tracking, can_fit/allocate/deallocate/reset - BudgetAllocator: model context sizes, budget creation and adjustment - 35 comprehensive tests covering all budget functionality Part of #61 - Context Management Engine 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 02:13:23 +01:00
Felipe Cardoso	22ecb5e989	feat(context): Phase 1 - Foundation types, config and exceptions (#79 ) Implements the foundation for Context Management Engine: Types (backend/app/services/context/types/): - BaseContext: Abstract base with ID, content, priority, scoring - SystemContext: System prompts, personas, instructions - KnowledgeContext: RAG results from Knowledge Base MCP - ConversationContext: Chat history with role support - TaskContext: Task/issue context with acceptance criteria - ToolContext: Tool definitions and execution results - AssembledContext: Final assembled context result Configuration (config.py): - Token budget allocation (system 5%, task 10%, knowledge 40%, etc.) - Scoring weights (relevance 50%, recency 30%, priority 20%) - Cache settings (TTL, prefix) - Performance settings (max assembly time, parallel scoring) - Environment variable overrides with CTX_ prefix Exceptions (exceptions.py): - ContextError: Base exception - BudgetExceededError: Token budget violations - TokenCountError: Token counting failures - CompressionError: Compression failures - AssemblyTimeoutError: Assembly timeout - ScoringError, FormattingError, CacheError - ContextNotFoundError, InvalidContextError All 86 tests pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-04 02:07:39 +01:00
Felipe Cardoso	ca5f5e3383	refactor(environment): update virtualenv path to `/opt/venv` in Docker setup - Adjusted `docker-compose.dev.yml` to reflect the new venv location. - Modified entrypoint script and Dockerfile to reference `/opt/venv` for isolated dependencies. - Improved bind mount setup to prevent venv overwrites during development.	2026-01-04 00:58:24 +01:00
Felipe Cardoso	caf283bed2	feat(safety): enhance rate limiting and cost control with alert deduplication and usage tracking - Added `record_action` in `RateLimiter` for precise tracking of slot consumption post-validation. - Introduced deduplication mechanism for warning alerts in `CostController` to prevent spamming. - Refactored `CostController`'s session and daily budget alert handling for improved clarity. - Implemented test suites for `CostController` and `SafetyGuardian` to validate changes. - Expanded integration testing to cover deduplication, validation, and loop detection edge cases.	2026-01-03 17:55:34 +01:00
Felipe Cardoso	520c06175e	refactor(safety): apply consistent formatting across services and tests Improved code readability and uniformity by standardizing line breaks, indentation, and inline conditions across safety-related services, models, and tests, including content filters, validation rules, and emergency controls.	2026-01-03 16:23:39 +01:00
Felipe Cardoso	065e43c5a9	fix(tests): use delay variables in retry delay test The delay2 and delay3 variables were calculated but never asserted, causing lint warnings. Added assertions to verify all delays are positive and within max bounds. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 16:19:54 +01:00
Felipe Cardoso	c8b88dadc3	fix(safety): copy default patterns to avoid test pollution The ContentFilter was appending references to DEFAULT_PATTERNS objects, so when tests modified patterns (e.g., disabling them), those changes persisted across test runs. Use dataclass replace() to create copies. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 12:08:43 +01:00
Felipe Cardoso	015f2de6c6	test(safety): add Phase E comprehensive safety tests - Add tests for models: ActionMetadata, ActionRequest, ActionResult, ValidationRule, BudgetStatus, RateLimitConfig, ApprovalRequest/Response, Checkpoint, RollbackResult, AuditEvent, SafetyPolicy, GuardianResult - Add tests for validation: ActionValidator rules, priorities, patterns, bypass mode, batch validation, rule creation helpers - Add tests for loops: LoopDetector exact/semantic/oscillation detection, LoopBreaker throttle/backoff, history management - Add tests for content filter: PII filtering (email, phone, SSN, credit card), secret blocking (API keys, GitHub tokens, private keys), custom patterns, scan without filtering, dict filtering - Add tests for emergency controls: state management, pause/resume/reset, scoped emergency stops, callbacks, EmergencyTrigger events - Fix exception kwargs in content filter and emergency controls to match exception class signatures All 108 tests passing with lint and type checks clean. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 11:52:35 +01:00
Felipe Cardoso	f36bfb3781	feat(safety): add Phase D MCP integration and metrics - Add MCPSafetyWrapper for safe MCP tool execution - Add MCPToolCall/MCPToolResult models for MCP interactions - Add SafeToolExecutor context manager - Add SafetyMetrics collector with Prometheus export support - Track validations, approvals, rate limits, budgets, and more - Support for counters, gauges, and histograms Issue #63 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 11:40:14 +01:00
Felipe Cardoso	ef659cd72d	feat(safety): add Phase C advanced controls - Add rollback manager with file checkpointing and transaction context - Add HITL manager with approval queues and notification handlers - Add content filter with PII, secrets, and injection detection - Add emergency controls with stop/pause/resume capabilities - Update SafetyConfig with checkpoint_dir setting Issue #63 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 11:36:24 +01:00
Felipe Cardoso	728edd1453	feat(backend): add Phase B safety subsystems (#63 ) Implements core control subsystems for the safety framework: Action Validation (validation/validator.py): - Rule-based validation engine with priority ordering - Allow/deny/require-approval rule types - Pattern matching for tools and resources - Validation result caching with LRU eviction - Emergency bypass capability with audit Permission System (permissions/manager.py): - Per-agent permission grants on resources - Resource pattern matching (wildcards) - Temporary permissions with expiration - Permission inheritance hierarchy - Default deny with configurable defaults Cost Control (costs/controller.py): - Per-session and per-day budget tracking - Token and USD cost limits - Warning alerts at configurable thresholds - Budget rollover and reset policies - Real-time usage tracking Rate Limiting (limits/limiter.py): - Sliding window rate limiter - Per-action, per-LLM-call, per-file-op limits - Burst allowance with recovery - Configurable limits per operation type Loop Detection (loops/detector.py): - Exact repetition detection (same action+args) - Semantic repetition (similar actions) - Oscillation pattern detection (A→B→A→B) - Per-agent action history tracking - Loop breaking suggestions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 11:28:00 +01:00
Felipe Cardoso	498c0a0e94	feat(backend): add safety framework foundation (Phase A) (#63 ) Core safety framework architecture for autonomous agent guardrails: Core Components: - SafetyGuardian: Main orchestrator for all safety checks - AuditLogger: Comprehensive audit logging with hash chain tamper detection - SafetyConfig: Pydantic-based configuration - Models: Action requests, validation results, policies, checkpoints Exception Hierarchy: - SafetyError base with context preservation - Permission, Budget, RateLimit, Loop errors - Approval workflow errors (Required, Denied, Timeout) - Rollback, Sandbox, Emergency exceptions Safety Policy System: - Autonomy level based policies (FULL_CONTROL, MILESTONE, AUTONOMOUS) - Cost limits, rate limits, permission patterns - HITL approval requirements per action type - Configurable loop detection thresholds Directory Structure: - validation/, costs/, limits/, loops/ - Control subsystems - permissions/, rollback/, hitl/ - Access and recovery - content/, sandbox/, emergency/ - Protection systems - audit/, policies/ - Logging and configuration Phase A establishes the architecture. Subsystems to be implemented in Phase B-C. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 11:22:25 +01:00
Felipe Cardoso	e5975fa5d0	feat(backend): implement MCP client infrastructure (#55 ) Core MCP client implementation with comprehensive tooling: Services: - MCPClientManager: Main facade for all MCP operations - MCPServerRegistry: Thread-safe singleton for server configs - ConnectionPool: Connection pooling with auto-reconnection - ToolRouter: Automatic tool routing with circuit breaker - AsyncCircuitBreaker: Custom async-compatible circuit breaker Configuration: - YAML-based config with Pydantic models - Environment variable expansion support - Transport types: HTTP, SSE, STDIO API Endpoints: - GET /mcp/servers - List all MCP servers - GET /mcp/servers/{name}/tools - List server tools - GET /mcp/tools - List all tools from all servers - GET /mcp/health - Health check all servers - POST /mcp/call - Execute tool (admin only) - GET /mcp/circuit-breakers - Circuit breaker status - POST /mcp/circuit-breakers/{name}/reset - Reset circuit breaker - POST /mcp/servers/{name}/reconnect - Force reconnection Testing: - 156 unit tests with comprehensive coverage - Tests for all services, routes, and error handling - Proper mocking and async test support Documentation: - MCP_CLIENT.md with usage examples - Phase 2+ workflow documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-03 11:12:41 +01:00
Felipe Cardoso	664415111a	test(backend): add comprehensive tests for OAuth and agent endpoints - Added tests for OAuth provider admin and consent endpoints covering edge cases. - Extended agent-related tests to handle incorrect project associations and lifecycle state transitions. - Introduced tests for sprint status transitions and validation checks. - Improved multiline formatting consistency across all test functions.	2026-01-03 01:44:11 +01:00
Felipe Cardoso	acd18ff694	chore(backend): standardize multiline formatting across modules Reformatted multiline function calls, object definitions, and queries for improved code readability and consistency. Adjusted imports and constraints where necessary.	2026-01-03 01:35:18 +01:00
Felipe Cardoso	8b6cca5d4d	refactor(backend): simplify ENUM handling in alembic migration script - Removed explicit ENUM creation statements; rely on `sa.Enum` to auto-generate ENUM types during table creation. - Cleaned up redundant `create_type=False` arguments to streamline definitions.	2026-01-01 12:34:09 +01:00
Felipe Cardoso	7280b182bd	fix(backend): race condition fixes for task completion and sprint operations ## Changes ### agent_instance.py - Task Completion Counter Race Condition - Changed `record_task_completion()` from read-modify-write pattern to atomic SQL UPDATE - Previously: Read instance → increment in Python memory → write back - Now: Uses `UPDATE ... SET tasks_completed = tasks_completed + 1` - Prevents lost updates when multiple concurrent task completions occur ### sprint.py - Row-Level Locking for Sprint Operations - Added `with_for_update()` to `complete_sprint()` to prevent race conditions during velocity calculation - Added `with_for_update()` to `cancel_sprint()` for consistency - Ensures atomic check-and-update for sprint status changes ## Impact These fixes prevent: - Counter metrics being lost under concurrent load - Data corruption during sprint completion - Race conditions with concurrent sprint status changes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-31 17:23:33 +01:00
Felipe Cardoso	06b2491c1f	fix(backend): critical bug fixes for agent termination and sprint validation Bug Fixes: - bulk_terminate_by_project now unassigns issues before terminating agents to prevent orphaned issue assignments - PATCH /issues/{id} now validates sprint status - cannot assign issues to COMPLETED or CANCELLED sprints - archive_project now performs cascading cleanup: - Terminates all active agent instances - Cancels all planned/active sprints - Unassigns issues from terminated agents Added edge case tests for all fixed bugs (19 new tests total): - TestBulkTerminateEdgeCases - TestSprintStatusValidation - TestArchiveProjectCleanup - TestDataIntegrityEdgeCases (IDOR protection) Coverage: 93% (1836 tests passing) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-31 15:23:21 +01:00

1 2 3 4

177 Commits