syndarix

Author	SHA1	Message	Date
Felipe Cardoso	c8ba23928e	fix(memory): add thread-safe singleton initialization - Add threading.Lock with double-check locking to ScopeManager - Add asyncio.Lock with double-check locking to MemoryReflection - Make reset_memory_metrics async with proper locking - Update test fixtures to handle async reset functions	2026-01-05 17:39:39 +01:00
Felipe Cardoso	032738c8dd	fix(memory): add data integrity constraints to Fact model - Change source_episode_ids from JSON to JSONB for PostgreSQL consistency - Add unique constraint for global facts (project_id IS NULL) - Add CHECK constraint ensuring reinforcement_count >= 1	2026-01-05 17:39:30 +01:00
Felipe Cardoso	6121aac899	fix(tests): move memory model tests to avoid import conflicts Moved tests/unit/models/memory/ to tests/models/memory/ to avoid Python import path conflicts when pytest collects all tests. The conflict was caused by tests/models/ and tests/unit/models/ both having __init__.py files, causing Python to confuse app.models.memory imports.	2026-01-05 15:45:30 +01:00
Felipe Cardoso	8c7c89a49e	feat(memory): add memory consolidation task and switch `source_episode_ids` to JSON - Added `memory_consolidation` to the task list and updated `__all__` in test files. - Updated `source_episode_ids` in `Fact` model to use JSON for cross-database compatibility. - Revised related database migrations to use JSONB instead of ARRAY. - Adjusted test concurrency in Makefile for improved test performance.	2026-01-05 15:38:52 +01:00
Felipe Cardoso	535e0055e1	style(memory): apply ruff formatting and linting fixes Auto-fixed linting errors and formatting issues: - Removed unused imports (F401): pytest, Any, AnalysisType, MemoryType, OutcomeType - Removed unused variable (F841): hooks variable in test - Applied consistent formatting across memory service and test files	2026-01-05 14:07:48 +01:00
Felipe Cardoso	1eaa923cd2	docs(memory): add comprehensive memory system documentation (#101) Add complete documentation for the Agent Memory System including: - Architecture overview with ASCII diagram - Memory type descriptions (working, episodic, semantic, procedural) - Usage examples for all memory operations - Memory scoping hierarchy explanation - Consolidation flow documentation - MCP tools reference - Reflection capabilities - Configuration reference table - Integration with Context Engine - Metrics reference - Performance targets - Troubleshooting guide - Directory structure	2026-01-05 11:03:57 +01:00
Felipe Cardoso	08bca06e71	feat(memory): implement metrics and observability (#100) Add comprehensive metrics collector for memory system with: - Counter metrics: operations, retrievals, cache hits/misses, consolidations, episodes recorded, patterns/anomalies/insights detected - Gauge metrics: item counts, memory size, cache size, procedure success rates, active sessions, pending consolidations - Histogram metrics: working memory latency, retrieval latency, consolidation duration, embedding latency - Prometheus format export - Summary and cache stats helpers 31 tests covering all metric types, singleton pattern, and edge cases.	2026-01-05 11:00:53 +01:00
Felipe Cardoso	05b75de21f	feat(memory): implement memory reflection service (#99) Add reflection layer for memory system with pattern detection, success/failure factor analysis, anomaly detection, and insights generation. Enables agents to learn from past experiences and identify optimization opportunities. Key components: - Pattern detection: recurring success/failure, action sequences, temporal, efficiency - Factor analysis: action, context, timing, resource, preceding state factors - Anomaly detection: unusual duration, token usage, failure rates, action patterns - Insight generation: optimization, warning, learning, recommendation, trend insights Also fixes pre-existing timezone issues in test_types.py (datetime.now() -> datetime.now(UTC)).	2026-01-05 04:22:23 +01:00
Felipe Cardoso	6be8e2e88d	feat(memory): implement caching layer for memory operations (#98) Add comprehensive caching layer for the Agent Memory System: - HotMemoryCache: LRU cache for frequently accessed memories - Python 3.12 type parameter syntax - Thread-safe operations with RLock - TTL-based expiration - Access count tracking for hot memory identification - Scoped invalidation by type, scope, or pattern - EmbeddingCache: Cache embeddings by content hash - Content-hash based deduplication - Optional Redis backing for persistence - LRU eviction with configurable max size - CachedEmbeddingGenerator wrapper for transparent caching - CacheManager: Unified cache management - Coordinates hot cache, embedding cache, and retrieval cache - Centralized invalidation across all caches - Aggregated statistics and hit rate tracking - Automatic cleanup scheduling - Cache warmup support Performance targets: - Cache hit rate > 80% for hot memories - Cache operations < 1ms (memory), < 5ms (Redis) 83 new tests with comprehensive coverage.	2026-01-05 04:04:13 +01:00
Felipe Cardoso	283f2567df	feat(memory): integrate memory system with context engine (#97) ## Changes ### New Context Type - Add MEMORY to ContextType enum for agent memory context - Create MemoryContext class with subtypes (working, episodic, semantic, procedural) - Factory methods: from_working_memory, from_episodic_memory, from_semantic_memory, from_procedural_memory ### Memory Context Source - MemoryContextSource service fetches relevant memories for context assembly - Configurable fetch limits per memory type - Parallel fetching from all memory types ### Agent Lifecycle Hooks - AgentLifecycleManager handles spawn, pause, resume, terminate events - spawn: Initialize working memory with optional initial state - pause: Create checkpoint of working memory - resume: Restore from checkpoint - terminate: Consolidate working memory to episodic memory - LifecycleHooks for custom extension points ### Context Engine Integration - Add memory_query parameter to assemble_context() - Add session_id and agent_type_id for memory scoping - Memory budget allocation (15% by default) - set_memory_source() for runtime configuration ### Tests - 48 new tests for MemoryContext, MemoryContextSource, and lifecycle hooks - All 108 memory-related tests passing - mypy and ruff checks passing	2026-01-05 03:49:22 +01:00
Felipe Cardoso	6444f22e64	feat(memory): implement MCP tools for agent memory operations (#96) Add MCP-compatible tools that expose memory operations to agents: Tools implemented: - remember: Store data in working, episodic, semantic, or procedural memory - recall: Retrieve memories by query across multiple memory types - forget: Delete specific keys or bulk delete by pattern - reflect: Analyze patterns in recent episodes (success/failure factors) - get_memory_stats: Return usage statistics and breakdowns - search_procedures: Find procedures matching trigger patterns - record_outcome: Record task outcomes and update procedure success rates Key components: - tools.py: Pydantic schemas for tool argument validation with comprehensive field constraints (importance 0-1, TTL limits, limit ranges) - service.py: MemoryToolService coordinating memory type operations with proper scoping via ToolContext (project_id, agent_instance_id, session_id) - Lazy initialization of memory services (WorkingMemory, EpisodicMemory, SemanticMemory, ProceduralMemory) Test coverage: - 60 tests covering tool definitions, argument validation, and service execution paths - Mock-based tests for all memory type interactions	2026-01-05 03:32:10 +01:00
Felipe Cardoso	7b4db3e687	feat(memory): implement memory consolidation service and tasks (#95) - Add MemoryConsolidationService with Working→Episodic→Semantic/Procedural transfer - Add Celery tasks for session and nightly consolidation - Implement memory pruning with importance-based retention - Add comprehensive test suite (32 tests)	2026-01-05 03:04:28 +01:00
Felipe Cardoso	6b66db8b09	feat(memory): implement memory indexing and retrieval engine (#94) Add comprehensive indexing and retrieval system for memory search: - VectorIndex for semantic similarity search using cosine similarity - TemporalIndex for time-based queries with range and recency support - EntityIndex for entity-based lookups with multi-entity intersection - OutcomeIndex for success/failure filtering on episodes - MemoryIndexer as unified interface for all index types - RetrievalEngine with hybrid search combining all indices - RelevanceScorer for multi-signal relevance scoring - RetrievalCache for LRU caching of search results	2026-01-05 02:50:13 +01:00
Felipe Cardoso	12c8fa9ba5	feat(memory): implement memory scoping with hierarchy and access control (#93) Add scope management system for hierarchical memory access: - ScopeManager with hierarchy: Global → Project → Agent Type → Agent Instance → Session - ScopePolicy for access control (read, write, inherit permissions) - ScopeResolver for resolving queries across scope hierarchies with inheritance - ScopeFilter for filtering scopes by type, project, or agent - Access control enforcement with parent scope visibility - Deduplication support during resolution across scopes	2026-01-05 02:39:22 +01:00
Felipe Cardoso	e587e70be1	feat(memory): add procedural memory implementation (Issue #92) Implements procedural memory for learned skills and procedures: Core functionality: - ProceduralMemory class for procedure storage/retrieval - record_procedure with duplicate detection and step merging - find_matching for context-based procedure search - record_outcome for success/failure tracking - get_best_procedure for finding highest success rate - update_steps for procedure refinement Supporting modules: - ProcedureMatcher: Keyword-based procedure matching - MatchResult/MatchContext: Matching result types - Success rate weighting in match scoring Test coverage: - 43 unit tests covering all modules - matching.py: 97% coverage - memory.py: 86% coverage	2026-01-05 02:31:32 +01:00
Felipe Cardoso	72b10ce001	feat(memory): add semantic memory implementation (Issue #91) Implements semantic memory with fact storage, retrieval, and verification: Core functionality: - SemanticMemory class for fact storage/retrieval - Fact storage as subject-predicate-object triples - Duplicate detection with reinforcement - Semantic search with text-based fallback - Entity-based retrieval - Confidence scoring and decay - Conflict resolution Supporting modules: - FactExtractor: Pattern-based fact extraction from episodes - FactVerifier: Contradiction detection and reliability scoring Test coverage: - 47 unit tests covering all modules - extraction.py: 99% coverage - verification.py: 95% coverage - memory.py: 78% coverage	2026-01-05 02:23:06 +01:00
Felipe Cardoso	28121864a2	feat(memory): add episodic memory implementation (Issue #90) Implements the episodic memory service for storing and retrieving agent task execution experiences. This enables learning from past successes and failures. Components: - EpisodicMemory: Main service class combining recording and retrieval - EpisodeRecorder: Handles episode creation, importance scoring - EpisodeRetriever: Multiple retrieval strategies (recency, semantic, outcome, importance, task type) Key features: - Records task completions with context, actions, outcomes - Calculates importance scores based on outcome, duration, lessons - Semantic search with fallback to recency when embeddings unavailable - Full CRUD operations with statistics and summarization - Comprehensive unit tests (50 tests, all passing) Closes #90	2026-01-05 02:08:16 +01:00
Felipe Cardoso	26fd776927	fix(memory): address review findings from Issue #88 Fixes based on multi-agent review: Model Improvements: - Remove duplicate index ix_procedures_agent_type (already indexed via Column) - Fix postgresql_where to use text() instead of string literal in Fact model - Add thread-safety to Procedure.success_rate property (snapshot values) Data Integrity Constraints: - Add CheckConstraint for Episode: importance_score 0-1, duration >= 0, tokens >= 0 - Add CheckConstraint for Fact: confidence 0-1 - Add CheckConstraint for Procedure: success_count >= 0, failure_count >= 0 Migration Updates: - Add check constraints creation in upgrade() - Add check constraints removal in downgrade() Note: SQLAlchemy Column default=list is correct (callable factory pattern)	2026-01-05 01:54:51 +01:00
Felipe Cardoso	66cdfb6a5f	feat(memory): add working memory implementation (Issue #89) Implements session-scoped ephemeral memory with: Storage Backends: - InMemoryStorage: Thread-safe fallback with TTL support and capacity limits - RedisStorage: Primary storage with connection pooling and JSON serialization - Auto-fallback from Redis to in-memory when unavailable WorkingMemory Class: - Key-value storage with TTL and reserved key protection - Task state tracking with progress updates - Scratchpad for reasoning steps with timestamps - Checkpoint/snapshot support for recovery - Factory methods for auto-configured storage Tests: - 55 unit tests covering all functionality - Tests for basic ops, TTL, capacity, concurrency - Tests for task state, scratchpad, checkpoints	2026-01-05 01:51:03 +01:00
Felipe Cardoso	c56fa77680	feat(memory): add database schema and storage layer (Issue #88) Add SQLAlchemy models for the Agent Memory System: - WorkingMemory: Key-value storage with TTL for active sessions - Episode: Experiential memories from task executions - Fact: Semantic knowledge triples with confidence scores - Procedure: Learned skills and procedures with success tracking - MemoryConsolidationLog: Tracks consolidation jobs between memory tiers Create enums for memory system: - ScopeType: global, project, agent_type, agent_instance, session - EpisodeOutcome: success, failure, partial - ConsolidationType: working_to_episodic, episodic_to_semantic, etc. - ConsolidationStatus: pending, running, completed, failed Add Alembic migration (0005) for all memory tables with: - Foreign key relationships to projects, agent_instances, agent_types - Comprehensive indexes for query patterns - Unique constraints for key lookups and triple uniqueness - Vector embedding column placeholders (Text fallback until pgvector enabled) Fix timezone-naive datetime.now() in types.py TaskState (review feedback) Includes 30 unit tests for models and enums. Closes #88	2026-01-05 01:37:58 +01:00
Felipe Cardoso	11dbafd2b5	feat(memory): #87 project setup & core architecture Implements Sub-Issue #87 of Issue #62 (Agent Memory System). Core infrastructure: - memory/types.py: Type definitions for all memory types (Working, Episodic, Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome - memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton - memory/exceptions.py: Comprehensive exception hierarchy for memory operations - memory/manager.py: MemoryManager facade with placeholder methods Directory structure: - working/: Working memory (Redis/in-memory) - to be implemented in #89 - episodic/: Episodic memory (experiences) - to be implemented in #90 - semantic/: Semantic memory (facts) - to be implemented in #91 - procedural/: Procedural memory (skills) - to be implemented in #92 - scoping/: Scope management - to be implemented in #93 - indexing/: Vector indexing - to be implemented in #94 - consolidation/: Memory consolidation - to be implemented in #95 Tests: 71 unit tests for config, types, and exceptions Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md	2026-01-05 01:27:36 +01:00
Felipe Cardoso	d72c262a29	feat(tests): add unit tests for Context Management API routes - Added detailed unit tests for `/context` endpoints, covering health checks, context assembly, token counting, budget retrieval, and cache invalidation. - Included edge cases, error handling, and input validation for context-related operations. - Improved test coverage for the Context Management module with mocked dependencies and integration scenarios.	2026-01-05 01:02:49 +01:00
Felipe Cardoso	c385643d6b	feat(tests): add comprehensive E2E tests for MCP and Agent workflows - Introduced end-to-end tests for MCP workflows, including server discovery, authentication, context engine operations, error handling, and input validation. - Added full lifecycle tests for agent workflows, covering type management, instance spawning, status transitions, and admin-only operations. - Enhanced test coverage for real-world MCP and Agent scenarios across PostgreSQL and async environments.	2026-01-05 01:02:41 +01:00
Felipe Cardoso	0931675bb8	feat(api): add Context Management API and routes - Introduced a new `context` module and its endpoints for Context Management. - Added `/context` route to the API router for assembling LLM context, token counting, budget management, and cache invalidation. - Implemented health checks, context assembly, token counting, and caching operations in the Context Management Engine. - Included schemas for request/response models and tightened error handling for context-related operations.	2026-01-05 01:02:33 +01:00
Felipe Cardoso	dff5fe14d8	feat(tests): add comprehensive integration tests for MCP stack - Introduced integration tests covering backend, LLM Gateway, Knowledge Base, and Context Engine. - Includes health checks, tool listing, token counting, and end-to-end MCP flows. - Added `RUN_INTEGRATION_TESTS` environment flag to enable selective test execution. - Includes a quick health check script to verify service availability before running tests.	2026-01-05 01:02:22 +01:00
Felipe Cardoso	010fb6002c	feat: add integration testing target to Makefile - Introduced `test-integration` command for MCP integration tests. - Expanded help section with details about running integration tests. - Improved Makefile's testing capabilities for enhanced developer workflows.	2026-01-05 01:02:16 +01:00
Felipe Cardoso	c8e5a18cbd	test(safety): add comprehensive tests for safety framework modules Add tests to improve backend coverage from 85% to 93%: - test_audit.py: 60 tests for AuditLogger (20% -> 99%) - Hash chain integrity, sanitization, retention, handlers - Fixed bug: hash chain modification after event creation - Fixed bug: verification not using correct prev_hash - test_hitl.py: Tests for HITL manager (0% -> 100%) - test_permissions.py: Tests for permissions manager (0% -> 99%) - test_rollback.py: Tests for rollback manager (0% -> 100%) - test_metrics.py: Tests for metrics collector (0% -> 100%) - test_mcp_integration.py: Tests for MCP safety wrapper (0% -> 100%) - test_validation.py: Additional cache and edge case tests (76% -> 100%) - test_scoring.py: Lock cleanup and edge case tests (78% -> 91%)	2026-01-04 19:41:54 +01:00
Felipe Cardoso	fa625dfe32	feat(context): improve budget validation and XML safety in ranking and Claude adapter - Added stricter budget validation in ContextRanker with explicit error handling for invalid configurations. - Introduced `_get_valid_token_count()` helper to validate and safeguard token counts. - Enhanced XML escaping in Claude adapter to prevent injection risks from scores and unhandled content.	2026-01-04 16:02:18 +01:00
Felipe Cardoso	f346cf8bb1	feat(context): enhance timeout handling, tenant isolation, and budget management - Added timeout enforcement for token counting, scoring, and compression with detailed error handling. - Introduced tenant isolation in context caching using project and agent identifiers. - Enhanced budget management with stricter checks for critical context overspending and buffer limitations. - Optimized per-context locking with cleanup to prevent memory leaks in concurrent environments. - Updated default assembly timeout settings for improved performance and reliability. - Improved XML escaping in Claude adapter for safety against injection attacks. - Standardized token estimation using model-specific ratios.	2026-01-04 15:52:50 +01:00
Felipe Cardoso	9c88aa4a2c	chore(context): refactor for consistency, optimize formatting, and simplify logic - Cleaned up unnecessary comments in `__all__` definitions for better readability. - Adjusted indentation and formatting across modules for improved clarity (e.g., long lines, logical grouping). - Simplified conditional expressions and inline comments for context scoring and ranking. - Replaced some hard-coded values with type-safe annotations (e.g., `ClassVar`). - Removed unused imports and ensured consistent usage across test files. - Updated `test_score_not_cached_on_context` to clarify caching behavior. - Improved truncation strategy logic and marker handling.	2026-01-04 15:23:14 +01:00
Felipe Cardoso	6f18372689	test(context): add edge case tests for truncation and scoring concurrency - Add tests for truncation edge cases, including zero tokens, short content, and marker handling. - Add concurrency tests for scoring to verify per-context locking and handling of multiple contexts.	2026-01-04 12:38:04 +01:00
Felipe Cardoso	844660eea2	feat(context): enhance performance, caching, and settings management - Replace hard-coded limits with configurable settings (e.g., cache memory size, truncation strategy, relevance settings). - Optimize parallel execution in token counting, scoring, and reranking for source diversity. - Improve caching logic: - Add per-context locks for safe parallel scoring. - Reuse precomputed fingerprints for cache efficiency. - Make truncation, scoring, and ranker behaviors fully configurable via settings. - Add support for middle truncation, context hash-based hashing, and dynamic token limiting. - Refactor methods for scalability and better error handling. Tests: Updated all affected components with additional test cases.	2026-01-04 12:37:58 +01:00
Felipe Cardoso	c6b0dc7af8	chore(context): apply linter fixes and sort imports (#86) Phase 8 of Context Management Engine - Final Cleanup: - Sort __all__ exports alphabetically - Sort imports per isort conventions - Fix minor linting issues Final test results: - 311 context management tests passing - 2507 total backend tests passing - 85% code coverage Context Management Engine is complete with all 8 phases: 1. Foundation: Types, Config, Exceptions 2. Token Budget Management 3. Context Scoring & Ranking 4. Context Assembly Pipeline 5. Model Adapters (Claude, OpenAI) 6. Caching Layer (Redis + in-memory) 7. Main Engine & Integration 8. Testing & Documentation	2026-01-04 02:46:56 +01:00
Felipe Cardoso	8bc27599d7	feat(context): implement main ContextEngine with full integration (#85) Phase 7 of Context Management Engine - Main Engine: - Add ContextEngine as main orchestration class - Integrate all components: calculator, scorer, ranker, compressor, cache - Add high-level assemble_context() API with: - System prompt support - Task description support - Knowledge Base integration via MCP - Conversation history conversion - Tool results conversion - Custom contexts support - Add helper methods: - get_budget_for_model() - count_tokens() with caching - invalidate_cache() - get_stats() - Add create_context_engine() factory function Tests: 26 new tests, 311 total context tests passing	2026-01-04 02:44:40 +01:00
Felipe Cardoso	1c8d7f8f73	feat(context): implement Redis-based caching layer (#84) Phase 6 of Context Management Engine - Caching Layer: - Add ContextCache with Redis integration - Support fingerprint-based assembled context caching - Support token count caching (model-specific) - Support score caching (scorer + context + query) - Add in-memory fallback with LRU eviction - Add cache invalidation with pattern matching - Add cache statistics reporting Key features: - Hierarchical cache key structure (ctx:type:hash) - Automatic TTL expiration - Memory cache for fast repeated access - Graceful degradation when Redis unavailable Tests: 29 new tests, 285 total context tests passing	2026-01-04 02:41:21 +01:00
Felipe Cardoso	2aaae5382e	feat(context): implement model adapters for Claude and OpenAI (#83) Phase 5 of Context Management Engine - Model Adapters: - Add ModelAdapter abstract base class with model matching - Add DefaultAdapter for unknown models (plain text) - Add ClaudeAdapter with XML-based formatting: - <system_instructions> for system context - <reference_documents>/<document> for knowledge - <conversation_history>/<message> for chat - <tool_results>/<tool_result> for tool outputs - XML escaping for special characters - Add OpenAIAdapter with markdown formatting: - ## headers for sections - ### Source headers for documents - ROLE bold labels for conversation - Code blocks for tool outputs - Add get_adapter() factory function for model selection Tests: 33 new tests, 256 total context tests passing	2026-01-04 02:36:32 +01:00
Felipe Cardoso	d94b3ea904	feat(context): implement assembly pipeline and compression (#82) Phase 4 of Context Management Engine - Assembly Pipeline: - Add TruncationStrategy with end/middle/sentence-aware truncation - Add TruncationResult dataclass for tracking compression metrics - Add ContextCompressor for type-specific compression - Add ContextPipeline orchestrating full assembly workflow: - Token counting for all contexts - Scoring and ranking via ContextRanker - Optional compression when budget threshold exceeded - Model-specific formatting (XML for Claude, markdown for OpenAI) - Add PipelineMetrics for performance tracking - Update AssembledContext with new fields (model, contexts, metadata) - Add backward compatibility aliases for renamed fields Tests: 34 new tests, 223 total context tests passing	2026-01-04 02:32:25 +01:00
Felipe Cardoso	78f874a5c3	feat(context): implement context scoring and ranking (Phase 3) Add comprehensive scoring system with three strategies: - RelevanceScorer: Semantic similarity with keyword fallback - RecencyScorer: Exponential decay with type-specific half-lives - PriorityScorer: Priority-based scoring with type bonuses Implement CompositeScorer combining all strategies with configurable weights (default: 50% relevance, 30% recency, 20% priority). Add ContextRanker for budget-aware context selection with: - Greedy selection algorithm respecting token budgets - CRITICAL priority contexts always included - Diversity reranking to prevent source dominance - Comprehensive selection statistics 68 tests covering all scoring and ranking functionality. Part of #61 - Context Management Engine	2026-01-04 02:24:06 +01:00
Felipe Cardoso	a394a12f66	feat(context): implement token budget management (Phase 2) Add TokenCalculator with LLM Gateway integration for accurate token counting with in-memory caching and fallback character-based estimation. Implement TokenBudget for tracking allocations per context type with budget enforcement, and BudgetAllocator for creating budgets based on model context window sizes. - TokenCalculator: MCP integration, caching, model-specific ratios - TokenBudget: allocation tracking, can_fit/allocate/deallocate/reset - BudgetAllocator: model context sizes, budget creation and adjustment - 35 comprehensive tests covering all budget functionality Part of #61 - Context Management Engine	2026-01-04 02:13:23 +01:00
Felipe Cardoso	4a54dcc96a	feat(context): Phase 1 - Foundation types, config and exceptions (#79) Implements the foundation for Context Management Engine: Types (backend/app/services/context/types/): - BaseContext: Abstract base with ID, content, priority, scoring - SystemContext: System prompts, personas, instructions - KnowledgeContext: RAG results from Knowledge Base MCP - ConversationContext: Chat history with role support - TaskContext: Task/issue context with acceptance criteria - ToolContext: Tool definitions and execution results - AssembledContext: Final assembled context result Configuration (config.py): - Token budget allocation (system 5%, task 10%, knowledge 40%, etc.) - Scoring weights (relevance 50%, recency 30%, priority 20%) - Cache settings (TTL, prefix) - Performance settings (max assembly time, parallel scoring) - Environment variable overrides with CTX_ prefix Exceptions (exceptions.py): - ContextError: Base exception - BudgetExceededError: Token budget violations - TokenCountError: Token counting failures - CompressionError: Compression failures - AssemblyTimeoutError: Assembly timeout - ScoringError, FormattingError, CacheError - ContextNotFoundError, InvalidContextError All 86 tests pass.	2026-01-04 02:07:39 +01:00
Felipe Cardoso	361dfde90c	refactor(environment): update virtualenv path to `/opt/venv` in Docker setup - Adjusted `docker-compose.dev.yml` to reflect the new venv location. - Modified entrypoint script and Dockerfile to reference `/opt/venv` for isolated dependencies. - Improved bind mount setup to prevent venv overwrites during development.	2026-01-04 00:58:24 +01:00
Felipe Cardoso	451df58cc2	feat(safety): enhance rate limiting and cost control with alert deduplication and usage tracking - Added `record_action` in `RateLimiter` for precise tracking of slot consumption post-validation. - Introduced deduplication mechanism for warning alerts in `CostController` to prevent spamming. - Refactored `CostController`'s session and daily budget alert handling for improved clarity. - Implemented test suites for `CostController` and `SafetyGuardian` to validate changes. - Expanded integration testing to cover deduplication, validation, and loop detection edge cases.	2026-01-03 17:55:34 +01:00
Felipe Cardoso	41cf5c99a1	refactor(safety): apply consistent formatting across services and tests Improved code readability and uniformity by standardizing line breaks, indentation, and inline conditions across safety-related services, models, and tests, including content filters, validation rules, and emergency controls.	2026-01-03 16:23:39 +01:00
Felipe Cardoso	f49f12cbe4	fix(tests): use delay variables in retry delay test The delay2 and delay3 variables were calculated but never asserted, causing lint warnings. Added assertions to verify all delays are positive and within max bounds.	2026-01-03 16:19:54 +01:00
Felipe Cardoso	8cc3ee4c46	fix(safety): copy default patterns to avoid test pollution The ContentFilter was appending references to DEFAULT_PATTERNS objects, so when tests modified patterns (e.g., disabling them), those changes persisted across test runs. Use dataclass replace() to create copies.	2026-01-03 12:08:43 +01:00
Felipe Cardoso	7ff64a40d0	test(safety): add Phase E comprehensive safety tests - Add tests for models: ActionMetadata, ActionRequest, ActionResult, ValidationRule, BudgetStatus, RateLimitConfig, ApprovalRequest/Response, Checkpoint, RollbackResult, AuditEvent, SafetyPolicy, GuardianResult - Add tests for validation: ActionValidator rules, priorities, patterns, bypass mode, batch validation, rule creation helpers - Add tests for loops: LoopDetector exact/semantic/oscillation detection, LoopBreaker throttle/backoff, history management - Add tests for content filter: PII filtering (email, phone, SSN, credit card), secret blocking (API keys, GitHub tokens, private keys), custom patterns, scan without filtering, dict filtering - Add tests for emergency controls: state management, pause/resume/reset, scoped emergency stops, callbacks, EmergencyTrigger events - Fix exception kwargs in content filter and emergency controls to match exception class signatures All 108 tests passing with lint and type checks clean.	2026-01-03 11:52:35 +01:00
Felipe Cardoso	595d9e4fa0	feat(safety): add Phase D MCP integration and metrics - Add MCPSafetyWrapper for safe MCP tool execution - Add MCPToolCall/MCPToolResult models for MCP interactions - Add SafeToolExecutor context manager - Add SafetyMetrics collector with Prometheus export support - Track validations, approvals, rate limits, budgets, and more - Support for counters, gauges, and histograms Issue #63	2026-01-03 11:40:14 +01:00
Felipe Cardoso	ebe0fe09d0	feat(safety): add Phase C advanced controls - Add rollback manager with file checkpointing and transaction context - Add HITL manager with approval queues and notification handlers - Add content filter with PII, secrets, and injection detection - Add emergency controls with stop/pause/resume capabilities - Update SafetyConfig with checkpoint_dir setting Issue #63	2026-01-03 11:36:24 +01:00
Felipe Cardoso	71e4c560e4	feat(backend): add Phase B safety subsystems (#63) Implements core control subsystems for the safety framework: Action Validation (validation/validator.py): - Rule-based validation engine with priority ordering - Allow/deny/require-approval rule types - Pattern matching for tools and resources - Validation result caching with LRU eviction - Emergency bypass capability with audit Permission System (permissions/manager.py): - Per-agent permission grants on resources - Resource pattern matching (wildcards) - Temporary permissions with expiration - Permission inheritance hierarchy - Default deny with configurable defaults Cost Control (costs/controller.py): - Per-session and per-day budget tracking - Token and USD cost limits - Warning alerts at configurable thresholds - Budget rollover and reset policies - Real-time usage tracking Rate Limiting (limits/limiter.py): - Sliding window rate limiter - Per-action, per-LLM-call, per-file-op limits - Burst allowance with recovery - Configurable limits per operation type Loop Detection (loops/detector.py): - Exact repetition detection (same action+args) - Semantic repetition (similar actions) - Oscillation pattern detection (A→B→A→B) - Per-agent action history tracking - Loop breaking suggestions	2026-01-03 11:28:00 +01:00
Felipe Cardoso	4307bc1380	feat(backend): add safety framework foundation (Phase A) (#63) Core safety framework architecture for autonomous agent guardrails: Core Components: - SafetyGuardian: Main orchestrator for all safety checks - AuditLogger: Comprehensive audit logging with hash chain tamper detection - SafetyConfig: Pydantic-based configuration - Models: Action requests, validation results, policies, checkpoints Exception Hierarchy: - SafetyError base with context preservation - Permission, Budget, RateLimit, Loop errors - Approval workflow errors (Required, Denied, Timeout) - Rollback, Sandbox, Emergency exceptions Safety Policy System: - Autonomy level based policies (FULL_CONTROL, MILESTONE, AUTONOMOUS) - Cost limits, rate limits, permission patterns - HITL approval requirements per action type - Configurable loop detection thresholds Directory Structure: - validation/, costs/, limits/, loops/ - Control subsystems - permissions/, rollback/, hitl/ - Access and recovery - content/, sandbox/, emergency/ - Protection systems - audit/, policies/ - Logging and configuration Phase A establishes the architecture. Subsystems to be implemented in Phase B-C.	2026-01-03 11:22:25 +01:00
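
Commit c8ba23928e describes thread-safe singleton initialization using threading.Lock with double-check locking. The log does not show the repository's code, so the following is only a minimal sketch of that general pattern; ScopeManagerSketch and get_scope_manager are hypothetical names chosen for illustration and are not taken from the project.

```python
import threading
from typing import Optional


class ScopeManagerSketch:
    """Hypothetical stand-in class; not the repository's ScopeManager."""


# Module-level singleton state (names are assumptions for this sketch).
_instance: Optional[ScopeManagerSketch] = None
_instance_lock = threading.Lock()


def get_scope_manager() -> ScopeManagerSketch:
    """Return the shared instance, creating it at most once."""
    global _instance
    # First check, without the lock: the cheap path once initialized.
    if _instance is None:
        with _instance_lock:
            # Second check, under the lock: another thread may have
            # created the instance while this one was waiting.
            if _instance is None:
                _instance = ScopeManagerSketch()
    return _instance
```

The unlocked first check keeps the common path cheap after initialization; the second check under the lock is what prevents two racing threads from each constructing an instance.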

183 Commits