202 Commits

Author SHA1 Message Date
Felipe Cardoso
4ad3d20cf2 chore(agents): update sort_order values for agent types to improve logical grouping 2026-01-06 18:43:29 +01:00
Felipe Cardoso
8e16e2645e test(forms): add unit tests for FormTextarea and FormSelect components
- Add comprehensive test coverage for FormTextarea and FormSelect components to validate rendering, accessibility, props forwarding, error handling, and behavior.
- Introduced function-scoped fixtures in e2e tests to ensure test isolation and address event loop issues with pytest-asyncio and SQLAlchemy.
2026-01-06 17:54:49 +01:00
Felipe Cardoso
3f23bc3db3 refactor(migrations): replace hardcoded database URL with configurable environment variable and update command syntax to use consistent quoting style 2026-01-06 17:19:28 +01:00
Felipe Cardoso
a0ec5fa2cc test(agents): add validation tests for category and display fields
Added comprehensive unit and API tests to validate AgentType category and display fields:
- Category validation for valid, null, and invalid values
- Icon, color, and sort_order field constraints
- Typical tasks and collaboration hints handling (stripping, removing empty strings, normalization)
- New API tests for field creation, filtering, updating, and grouping
2026-01-06 17:19:21 +01:00
Felipe Cardoso
9339ea30a1 feat(agents): add category and display fields to AgentType model
Add 6 new fields to AgentType for better organization and UI display:
- category: enum for grouping (development, design, quality, etc.)
- icon: Lucide icon identifier for UI
- color: hex color code for visual distinction
- sort_order: display ordering within categories
- typical_tasks: list of tasks the agent excels at
- collaboration_hints: agent slugs that work well together

Backend changes:
- Add AgentTypeCategory enum to enums.py
- Update AgentType model with 6 new columns and indexes
- Update schemas with validators for new fields
- Add category filter and /grouped endpoint to routes
- Update CRUD with get_grouped_by_category method
- Update seed data with categories for all 27 agents
- Add migration 0007

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 16:11:22 +01:00
Felipe Cardoso
79cb6bfd7b feat(agents): comprehensive agent types with rich personalities
Major revamp of agent types based on SOTA personality design research:
- Expanded from 6 to 27 specialized agent types
- Rich personality prompts following Anthropic and CrewAI best practices
- Each agent has structured prompt with Core Identity, Expertise,
  Principles, and Scenario Handling sections

Agent Categories:
- Core Development (8): Product Owner, PM, BA, Architect, Full Stack,
  Backend, Frontend, Mobile Engineers
- Design (2): UI/UX Designer, UX Researcher
- Quality & Operations (3): QA, DevOps, Security Engineers
- AI/ML (5): AI/ML Engineer, Researcher, CV, NLP, MLOps Engineers
- Data (2): Data Scientist, Data Engineer
- Leadership (2): Technical Lead, Scrum Master
- Domain Specialists (5): Financial, Healthcare, Scientific,
  Behavioral Psychology Experts, Technical Writer

Research applied:
- Anthropic Claude persona design guidelines
- CrewAI role/backstory/goal patterns
- Role prompting research on detailed vs generic personas
- Temperature tuning per agent type (0.2-0.7 based on role)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 14:25:13 +01:00
Felipe Cardoso
600657adc4 fix(agents): properly initialize form with API data defaults
Root cause: The demo data's model_params was missing `top_p`, but the
Zod schema required all three fields (temperature, max_tokens, top_p).
This caused silent validation failures when editing agent types.

Fixes:
1. Add getInitialValues() that ensures all required fields have defaults
2. Handle nested validation errors in handleFormError (e.g., model_params.top_p)
3. Add useEffect to reset form when agentType changes
4. Add console.error logging for debugging validation failures
5. Update demo data to include top_p in all agent types

The form now properly initializes with safe defaults for any missing
fields from the API response, preventing silent validation failures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 11:54:45 +01:00
Felipe Cardoso
5a4d93df26 feat(dashboard): use real API data and add 3 more demo projects
Dashboard changes:
- Update useDashboard hook to fetch real projects from API
- Calculate stats (active projects, agents, issues) from real data
- Keep pending approvals as mock (no backend endpoint yet)

Demo data additions:
- API Gateway Modernization project (active, complex)
- Customer Analytics Dashboard project (completed)
- DevOps Pipeline Automation project (active, complex)
- Added sprints, agent instances, and issues for each new project

Total demo data: 6 projects, 14 agents, 22 issues

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 03:10:10 +01:00
Felipe Cardoso
7ef217be39 feat(demo): tie all demo projects to admin user
- Update demo_data.json to use "__admin__" as owner_email for all projects
- Add admin user lookup in load_demo_data() with special "__admin__" key
- Remove notification_email from project settings (not a valid field)

This ensures demo projects are visible to the admin user when logged in.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 03:00:07 +01:00
Felipe Cardoso
f9a72fcb34 fix(models): use enum values instead of names for PostgreSQL
Add values_callable to all enum columns so SQLAlchemy serializes using
the enum's .value (lowercase) instead of .name (uppercase). PostgreSQL
enum types defined in migrations use lowercase values.

Fixes: invalid input value for enum autonomy_level: "MILESTONE"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 02:53:45 +01:00
Felipe Cardoso
fcb0a5f86a fix(models): add explicit enum names to match migration types
SQLAlchemy's Enum() auto-generates type names from Python class names
(e.g., AutonomyLevel -> autonomylevel), but migrations defined them
with underscores (e.g., autonomy_level). This mismatch caused:

  "type 'autonomylevel' does not exist"

Added explicit name parameters to all enum columns to match the
migration-defined type names:
- autonomy_level, project_status, project_complexity, client_mode
- agent_status, sprint_status
- issue_type, issue_status, issue_priority, sync_status

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 02:48:10 +01:00
Felipe Cardoso
92782bcb05 refactor(init_db): remove demo data file and implement structured seeding
- Delete `demo_data.json` replaced by structured logic for better modularity.
- Add support for seeding default agent types and new demo data structure.
- Ensure demo mode only executes when explicitly enabled (settings.DEMO_MODE).
- Enhance logging for improved debugging during DB initialization.
2026-01-06 02:34:34 +01:00
Felipe Cardoso
1dcf99ee38 fix(memory): use deque for metrics histograms to ensure bounded memory usage
- Replace default empty list with `deque` for `memory_retrieval_latency_seconds`
- Prevents unbounded memory growth by leveraging bounded circular buffer behavior
2026-01-06 02:34:28 +01:00
Felipe Cardoso
192237e69b fix(memory): unify Outcome enum and add ABANDONED support
- Add ABANDONED value to core Outcome enum in types.py
- Replace duplicate OutcomeType class in mcp/tools.py with alias to Outcome
- Simplify mcp/service.py to use outcome directly (no more silent mapping)
- Add migration 0006 to extend PostgreSQL episode_outcome enum
- Add missing constraints to migration 0005 (ix_facts_unique_triple_global)

This fixes the semantic issue where ABANDONED outcomes were silently
converted to FAILURE, losing information about task abandonment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 01:46:48 +01:00
Felipe Cardoso
3edce9cd26 fix(memory): address critical bugs from multi-agent review
Bug Fixes:
- Remove singleton pattern from consolidation/reflection services to
  prevent stale database session bugs (session is now passed per-request)
- Add LRU eviction to MemoryToolService._working dict (max 1000 sessions)
  to prevent unbounded memory growth
- Replace O(n) list.remove() with O(1) OrderedDict.move_to_end() in
  RetrievalCache for better performance under load
- Use deque with maxlen for metrics histograms to prevent unbounded
  memory growth (circular buffer with 10k max samples)
- Use full UUID for checkpoint IDs instead of 8-char prefix to avoid
  collision risk at scale (birthday paradox at ~50k checkpoints)

Test Updates:
- Update checkpoint test to expect 36-char UUID
- Update reflection singleton tests to expect new factory behavior
- Add reset_memory_reflection() no-op for backwards compatibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 18:55:32 +01:00
Felipe Cardoso
35aea2d73a perf(mcp): optimize test performance with parallel connections and reduced retries
- Connect to MCP servers concurrently instead of sequentially
- Reduce retry settings in test mode (IS_TEST=True):
  - 1 attempt instead of 3
  - 100ms retry delay instead of 1s
  - 2s timeout instead of 30-120s

Reduces MCP E2E test time from ~16s to under 1s.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 18:33:38 +01:00
Felipe Cardoso
d0f32d04f7 fix(tests): reduce TTL durations to improve test reliability
- Adjusted TTL durations and sleep intervals across memory and cache tests for consistent expiration behavior.
- Prevented test flakiness caused by timing discrepancies in token expiration and cache cleanup.
2026-01-05 18:29:02 +01:00
Felipe Cardoso
da85a8aba8 fix(memory): prevent entry metadata mutation in vector search
- Create shallow copy of VectorIndexEntry when adding similarity score
- Prevents mutation of cached entries that could corrupt shared state

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:39:54 +01:00
Felipe Cardoso
f8bd1011e9 security(memory): escape SQL ILIKE patterns to prevent injection
- Add _escape_like_pattern() helper to escape SQL wildcards (%, _, \)
- Apply escaping in SemanticMemory.search_facts and get_by_entity
- Apply escaping in ProceduralMemory.search and find_best_for_task

Prevents attackers from injecting SQL wildcard patterns through
user-controlled search terms.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:39:47 +01:00
Felipe Cardoso
f057c2f0b6 fix(memory): add thread-safe singleton initialization
- Add threading.Lock with double-check locking to ScopeManager
- Add asyncio.Lock with double-check locking to MemoryReflection
- Make reset_memory_metrics async with proper locking
- Update test fixtures to handle async reset functions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:39:39 +01:00
Felipe Cardoso
33ec889fc4 fix(memory): add data integrity constraints to Fact model
- Change source_episode_ids from JSON to JSONB for PostgreSQL consistency
- Add unique constraint for global facts (project_id IS NULL)
- Add CHECK constraint ensuring reinforcement_count >= 1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:39:30 +01:00
Felipe Cardoso
74b8c65741 fix(tests): move memory model tests to avoid import conflicts
Moved tests/unit/models/memory/ to tests/models/memory/ to avoid
Python import path conflicts when pytest collects all tests.

The conflict was caused by tests/models/ and tests/unit/models/ both
having __init__.py files, causing Python to confuse app.models.memory
imports.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 15:45:30 +01:00
Felipe Cardoso
b232298c61 feat(memory): add memory consolidation task and switch source_episode_ids to JSON
- Added `memory_consolidation` to the task list and updated `__all__` in test files.
- Updated `source_episode_ids` in `Fact` model to use JSON for cross-database compatibility.
- Revised related database migrations to use JSONB instead of ARRAY.
- Adjusted test concurrency in Makefile for improved test performance.
2026-01-05 15:38:52 +01:00
Felipe Cardoso
cf6291ac8e style(memory): apply ruff formatting and linting fixes
Auto-fixed linting errors and formatting issues:
- Removed unused imports (F401): pytest, Any, AnalysisType, MemoryType, OutcomeType
- Removed unused variable (F841): hooks variable in test
- Applied consistent formatting across memory service and test files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 14:07:48 +01:00
Felipe Cardoso
e3fe0439fd docs(memory): add comprehensive memory system documentation (#101)
Add complete documentation for the Agent Memory System including:
- Architecture overview with ASCII diagram
- Memory type descriptions (working, episodic, semantic, procedural)
- Usage examples for all memory operations
- Memory scoping hierarchy explanation
- Consolidation flow documentation
- MCP tools reference
- Reflection capabilities
- Configuration reference table
- Integration with Context Engine
- Metrics reference
- Performance targets
- Troubleshooting guide
- Directory structure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 11:03:57 +01:00
Felipe Cardoso
57680c3772 feat(memory): implement metrics and observability (#100)
Add comprehensive metrics collector for memory system with:
- Counter metrics: operations, retrievals, cache hits/misses, consolidations,
  episodes recorded, patterns/anomalies/insights detected
- Gauge metrics: item counts, memory size, cache size, procedure success rates,
  active sessions, pending consolidations
- Histogram metrics: working memory latency, retrieval latency, consolidation
  duration, embedding latency
- Prometheus format export
- Summary and cache stats helpers

31 tests covering all metric types, singleton pattern, and edge cases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 11:00:53 +01:00
Felipe Cardoso
997cfaa03a feat(memory): implement memory reflection service (#99)
Add reflection layer for memory system with pattern detection, success/failure
factor analysis, anomaly detection, and insights generation. Enables agents to
learn from past experiences and identify optimization opportunities.

Key components:
- Pattern detection: recurring success/failure, action sequences, temporal, efficiency
- Factor analysis: action, context, timing, resource, preceding state factors
- Anomaly detection: unusual duration, token usage, failure rates, action patterns
- Insight generation: optimization, warning, learning, recommendation, trend insights

Also fixes pre-existing timezone issues in test_types.py (datetime.now() -> datetime.now(UTC)).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 04:22:23 +01:00
Felipe Cardoso
6954774e36 feat(memory): implement caching layer for memory operations (#98)
Add comprehensive caching layer for the Agent Memory System:

- HotMemoryCache: LRU cache for frequently accessed memories
  - Python 3.12 type parameter syntax
  - Thread-safe operations with RLock
  - TTL-based expiration
  - Access count tracking for hot memory identification
  - Scoped invalidation by type, scope, or pattern

- EmbeddingCache: Cache embeddings by content hash
  - Content-hash based deduplication
  - Optional Redis backing for persistence
  - LRU eviction with configurable max size
  - CachedEmbeddingGenerator wrapper for transparent caching

- CacheManager: Unified cache management
  - Coordinates hot cache, embedding cache, and retrieval cache
  - Centralized invalidation across all caches
  - Aggregated statistics and hit rate tracking
  - Automatic cleanup scheduling
  - Cache warmup support

Performance targets:
- Cache hit rate > 80% for hot memories
- Cache operations < 1ms (memory), < 5ms (Redis)

83 new tests with comprehensive coverage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 04:04:13 +01:00
Felipe Cardoso
30e5c68304 feat(memory): integrate memory system with context engine (#97)
## Changes

### New Context Type
- Add MEMORY to ContextType enum for agent memory context
- Create MemoryContext class with subtypes (working, episodic, semantic, procedural)
- Factory methods: from_working_memory, from_episodic_memory, from_semantic_memory, from_procedural_memory

### Memory Context Source
- MemoryContextSource service fetches relevant memories for context assembly
- Configurable fetch limits per memory type
- Parallel fetching from all memory types

### Agent Lifecycle Hooks
- AgentLifecycleManager handles spawn, pause, resume, terminate events
- spawn: Initialize working memory with optional initial state
- pause: Create checkpoint of working memory
- resume: Restore from checkpoint
- terminate: Consolidate working memory to episodic memory
- LifecycleHooks for custom extension points

### Context Engine Integration
- Add memory_query parameter to assemble_context()
- Add session_id and agent_type_id for memory scoping
- Memory budget allocation (15% by default)
- set_memory_source() for runtime configuration

### Tests
- 48 new tests for MemoryContext, MemoryContextSource, and lifecycle hooks
- All 108 memory-related tests passing
- mypy and ruff checks passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 03:49:22 +01:00
Felipe Cardoso
0b24d4c6cc feat(memory): implement MCP tools for agent memory operations (#96)
Add MCP-compatible tools that expose memory operations to agents:

Tools implemented:
- remember: Store data in working, episodic, semantic, or procedural memory
- recall: Retrieve memories by query across multiple memory types
- forget: Delete specific keys or bulk delete by pattern
- reflect: Analyze patterns in recent episodes (success/failure factors)
- get_memory_stats: Return usage statistics and breakdowns
- search_procedures: Find procedures matching trigger patterns
- record_outcome: Record task outcomes and update procedure success rates

Key components:
- tools.py: Pydantic schemas for tool argument validation with comprehensive
  field constraints (importance 0-1, TTL limits, limit ranges)
- service.py: MemoryToolService coordinating memory type operations with
  proper scoping via ToolContext (project_id, agent_instance_id, session_id)
- Lazy initialization of memory services (WorkingMemory, EpisodicMemory,
  SemanticMemory, ProceduralMemory)

Test coverage:
- 60 tests covering tool definitions, argument validation, and service
  execution paths
- Mock-based tests for all memory type interactions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 03:32:10 +01:00
Felipe Cardoso
1670e05e0d feat(memory): implement memory consolidation service and tasks (#95)
- Add MemoryConsolidationService with Working→Episodic→Semantic/Procedural transfer
- Add Celery tasks for session and nightly consolidation
- Implement memory pruning with importance-based retention
- Add comprehensive test suite (32 tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 03:04:28 +01:00
Felipe Cardoso
999b7ac03f feat(memory): implement memory indexing and retrieval engine (#94)
Add comprehensive indexing and retrieval system for memory search:
- VectorIndex for semantic similarity search using cosine similarity
- TemporalIndex for time-based queries with range and recency support
- EntityIndex for entity-based lookups with multi-entity intersection
- OutcomeIndex for success/failure filtering on episodes
- MemoryIndexer as unified interface for all index types
- RetrievalEngine with hybrid search combining all indices
- RelevanceScorer for multi-signal relevance scoring
- RetrievalCache for LRU caching of search results

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 02:50:13 +01:00
Felipe Cardoso
48ecb40f18 feat(memory): implement memory scoping with hierarchy and access control (#93)
Add scope management system for hierarchical memory access:
- ScopeManager with hierarchy: Global → Project → Agent Type → Agent Instance → Session
- ScopePolicy for access control (read, write, inherit permissions)
- ScopeResolver for resolving queries across scope hierarchies with inheritance
- ScopeFilter for filtering scopes by type, project, or agent
- Access control enforcement with parent scope visibility
- Deduplication support during resolution across scopes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 02:39:22 +01:00
Felipe Cardoso
b818f17418 feat(memory): add procedural memory implementation (Issue #92)
Implements procedural memory for learned skills and procedures:

Core functionality:
- ProceduralMemory class for procedure storage/retrieval
- record_procedure with duplicate detection and step merging
- find_matching for context-based procedure search
- record_outcome for success/failure tracking
- get_best_procedure for finding highest success rate
- update_steps for procedure refinement

Supporting modules:
- ProcedureMatcher: Keyword-based procedure matching
- MatchResult/MatchContext: Matching result types
- Success rate weighting in match scoring

Test coverage:
- 43 unit tests covering all modules
- matching.py: 97% coverage
- memory.py: 86% coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 02:31:32 +01:00
Felipe Cardoso
e946787a61 feat(memory): add semantic memory implementation (Issue #91)
Implements semantic memory with fact storage, retrieval, and verification:

Core functionality:
- SemanticMemory class for fact storage/retrieval
- Fact storage as subject-predicate-object triples
- Duplicate detection with reinforcement
- Semantic search with text-based fallback
- Entity-based retrieval
- Confidence scoring and decay
- Conflict resolution

Supporting modules:
- FactExtractor: Pattern-based fact extraction from episodes
- FactVerifier: Contradiction detection and reliability scoring

Test coverage:
- 47 unit tests covering all modules
- extraction.py: 99% coverage
- verification.py: 95% coverage
- memory.py: 78% coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 02:23:06 +01:00
Felipe Cardoso
3554efe66a feat(memory): add episodic memory implementation (Issue #90)
Implements the episodic memory service for storing and retrieving
agent task execution experiences. This enables learning from past
successes and failures.

Components:
- EpisodicMemory: Main service class combining recording and retrieval
- EpisodeRecorder: Handles episode creation, importance scoring
- EpisodeRetriever: Multiple retrieval strategies (recency, semantic,
  outcome, importance, task type)

Key features:
- Records task completions with context, actions, outcomes
- Calculates importance scores based on outcome, duration, lessons
- Semantic search with fallback to recency when embeddings unavailable
- Full CRUD operations with statistics and summarization
- Comprehensive unit tests (50 tests, all passing)

Closes #90

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 02:08:16 +01:00
Felipe Cardoso
bd988f76b0 fix(memory): address review findings from Issue #88
Fixes based on multi-agent review:

Model Improvements:
- Remove duplicate index ix_procedures_agent_type (already indexed via Column)
- Fix postgresql_where to use text() instead of string literal in Fact model
- Add thread-safety to Procedure.success_rate property (snapshot values)

Data Integrity Constraints:
- Add CheckConstraint for Episode: importance_score 0-1, duration >= 0, tokens >= 0
- Add CheckConstraint for Fact: confidence 0-1
- Add CheckConstraint for Procedure: success_count >= 0, failure_count >= 0

Migration Updates:
- Add check constraints creation in upgrade()
- Add check constraints removal in downgrade()

Note: SQLAlchemy Column default=list is correct (callable factory pattern)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:54:51 +01:00
Felipe Cardoso
4974233169 feat(memory): add working memory implementation (Issue #89)
Implements session-scoped ephemeral memory with:

Storage Backends:
- InMemoryStorage: Thread-safe fallback with TTL support and capacity limits
- RedisStorage: Primary storage with connection pooling and JSON serialization
- Auto-fallback from Redis to in-memory when unavailable

WorkingMemory Class:
- Key-value storage with TTL and reserved key protection
- Task state tracking with progress updates
- Scratchpad for reasoning steps with timestamps
- Checkpoint/snapshot support for recovery
- Factory methods for auto-configured storage

Tests:
- 55 unit tests covering all functionality
- Tests for basic ops, TTL, capacity, concurrency
- Tests for task state, scratchpad, checkpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:51:03 +01:00
Felipe Cardoso
c9d8c0835c feat(memory): add database schema and storage layer (Issue #88)
Add SQLAlchemy models for the Agent Memory System:
- WorkingMemory: Key-value storage with TTL for active sessions
- Episode: Experiential memories from task executions
- Fact: Semantic knowledge triples with confidence scores
- Procedure: Learned skills and procedures with success tracking
- MemoryConsolidationLog: Tracks consolidation jobs between memory tiers

Create enums for memory system:
- ScopeType: global, project, agent_type, agent_instance, session
- EpisodeOutcome: success, failure, partial
- ConsolidationType: working_to_episodic, episodic_to_semantic, etc.
- ConsolidationStatus: pending, running, completed, failed

Add Alembic migration (0005) for all memory tables with:
- Foreign key relationships to projects, agent_instances, agent_types
- Comprehensive indexes for query patterns
- Unique constraints for key lookups and triple uniqueness
- Vector embedding column placeholders (Text fallback until pgvector enabled)

Fix timezone-naive datetime.now() in types.py TaskState (review feedback)

Includes 30 unit tests for models and enums.

Closes #88

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:37:58 +01:00
Felipe Cardoso
085a748929 feat(memory): #87 project setup & core architecture
Implements Sub-Issue #87 of Issue #62 (Agent Memory System).

Core infrastructure:
- memory/types.py: Type definitions for all memory types (Working, Episodic,
  Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome
- memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton
- memory/exceptions.py: Comprehensive exception hierarchy for memory operations
- memory/manager.py: MemoryManager facade with placeholder methods

Directory structure:
- working/: Working memory (Redis/in-memory) - to be implemented in #89
- episodic/: Episodic memory (experiences) - to be implemented in #90
- semantic/: Semantic memory (facts) - to be implemented in #91
- procedural/: Procedural memory (skills) - to be implemented in #92
- scoping/: Scope management - to be implemented in #93
- indexing/: Vector indexing - to be implemented in #94
- consolidation/: Memory consolidation - to be implemented in #95

Tests: 71 unit tests for config, types, and exceptions
Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:27:36 +01:00
Felipe Cardoso
4b149b8a52 feat(tests): add unit tests for Context Management API routes
- Added detailed unit tests for `/context` endpoints, covering health checks, context assembly, token counting, budget retrieval, and cache invalidation.
- Included edge cases, error handling, and input validation for context-related operations.
- Improved test coverage for the Context Management module with mocked dependencies and integration scenarios.
2026-01-05 01:02:49 +01:00
Felipe Cardoso
ad0c06851d feat(tests): add comprehensive E2E tests for MCP and Agent workflows
- Introduced end-to-end tests for MCP workflows, including server discovery, authentication, context engine operations, error handling, and input validation.
- Added full lifecycle tests for agent workflows, covering type management, instance spawning, status transitions, and admin-only operations.
- Enhanced test coverage for real-world MCP and Agent scenarios across PostgreSQL and async environments.
2026-01-05 01:02:41 +01:00
Felipe Cardoso
49359b1416 feat(api): add Context Management API and routes
- Introduced a new `context` module and its endpoints for Context Management.
- Added `/context` route to the API router for assembling LLM context, token counting, budget management, and cache invalidation.
- Implemented health checks, context assembly, token counting, and caching operations in the Context Management Engine.
- Included schemas for request/response models and tightened error handling for context-related operations.
2026-01-05 01:02:33 +01:00
Felipe Cardoso
911d950c15 feat(tests): add comprehensive integration tests for MCP stack
- Introduced integration tests covering backend, LLM Gateway, Knowledge Base, and Context Engine.
- Includes health checks, tool listing, token counting, and end-to-end MCP flows.
- Added `RUN_INTEGRATION_TESTS` environment flag to enable selective test execution.
- Includes a quick health check script to verify service availability before running tests.
2026-01-05 01:02:22 +01:00
Felipe Cardoso
b2a3ac60e0 feat: add integration testing target to Makefile
- Introduced `test-integration` command for MCP integration tests.
- Expanded help section with details about running integration tests.
- Improved Makefile's testing capabilities for enhanced developer workflows.
2026-01-05 01:02:16 +01:00
Felipe Cardoso
60ebeaa582 test(safety): add comprehensive tests for safety framework modules
Add tests to improve backend coverage from 85% to 93%:

- test_audit.py: 60 tests for AuditLogger (20% -> 99%)
  - Hash chain integrity, sanitization, retention, handlers
  - Fixed bug: hash chain modification after event creation
  - Fixed bug: verification not using correct prev_hash

- test_hitl.py: Tests for HITL manager (0% -> 100%)
- test_permissions.py: Tests for permissions manager (0% -> 99%)
- test_rollback.py: Tests for rollback manager (0% -> 100%)
- test_metrics.py: Tests for metrics collector (0% -> 100%)
- test_mcp_integration.py: Tests for MCP safety wrapper (0% -> 100%)
- test_validation.py: Additional cache and edge case tests (76% -> 100%)
- test_scoring.py: Lock cleanup and edge case tests (78% -> 91%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 19:41:54 +01:00
Felipe Cardoso
758052dcff feat(context): improve budget validation and XML safety in ranking and Claude adapter
- Added stricter budget validation in ContextRanker with explicit error handling for invalid configurations.
- Introduced `_get_valid_token_count()` helper to validate and safeguard token counts.
- Enhanced XML escaping in Claude adapter to prevent injection risks from scores and unhandled content.
2026-01-04 16:02:18 +01:00
Felipe Cardoso
1628eacf2b feat(context): enhance timeout handling, tenant isolation, and budget management
- Added timeout enforcement for token counting, scoring, and compression with detailed error handling.
- Introduced tenant isolation in context caching using project and agent identifiers.
- Enhanced budget management with stricter checks for critical context overspending and buffer limitations.
- Optimized per-context locking with cleanup to prevent memory leaks in concurrent environments.
- Updated default assembly timeout settings for improved performance and reliability.
- Improved XML escaping in Claude adapter for safety against injection attacks.
- Standardized token estimation using model-specific ratios.
2026-01-04 15:52:50 +01:00
Felipe Cardoso
2bea057fb1 chore(context): refactor for consistency, optimize formatting, and simplify logic
- Cleaned up unnecessary comments in `__all__` definitions for better readability.
- Adjusted indentation and formatting across modules for improved clarity (e.g., long lines, logical grouping).
- Simplified conditional expressions and inline comments for context scoring and ranking.
- Replaced some hard-coded values with type-safe annotations (e.g., `ClassVar`).
- Removed unused imports and ensured consistent usage across test files.
- Updated `test_score_not_cached_on_context` to clarify caching behavior.
- Improved truncation strategy logic and marker handling.
2026-01-04 15:23:14 +01:00
Felipe Cardoso
9e54f16e56 test(context): add edge case tests for truncation and scoring concurrency
- Add tests for truncation edge cases, including zero tokens, short content, and marker handling.
- Add concurrency tests for scoring to verify per-context locking and handling of multiple contexts.
2026-01-04 12:38:04 +01:00