# Agent Memory System - Implementation Plan

## Issue #62 - Part of Epic #60 (Phase 2: MCP Integration)

**Branch:** `feature/62-agent-memory-system`
**Parent Epic:** #60 [EPIC] Phase 2: MCP Integration
**Dependencies:** #56 (LLM Gateway), #57 (Knowledge Base), #61 (Context Management Engine)

---

## Executive Summary

The Agent Memory System provides multi-tier cognitive memory for AI agents, enabling them to:

- Maintain state across sessions (Working Memory)
- Learn from past experiences (Episodic Memory)
- Store and retrieve facts (Semantic Memory)
- Develop and reuse procedures (Procedural Memory)

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                             Agent Memory System                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐                      ┌─────────────────┐               │
│  │ Working Memory  │────────────────────▶ │ Episodic Memory │               │
│  │ (Redis/In-Mem)  │    consolidate       │  (PostgreSQL)   │               │
│  │                 │                      │                 │               │
│  │ • Current task  │                      │ • Past sessions │               │
│  │ • Variables     │                      │ • Experiences   │               │
│  │ • Scratchpad    │                      │ • Outcomes      │               │
│  └─────────────────┘                      └────────┬────────┘               │
│                                                    │                        │
│                                                    │ extract                │
│                                                    ▼                        │
│  ┌─────────────────┐                      ┌─────────────────┐               │
│  │Procedural Memory│◀─────────────────────│ Semantic Memory │               │
│  │  (PostgreSQL)   │    learn from        │ (PostgreSQL +   │               │
│  │                 │                      │   pgvector)     │               │
│  │ • Procedures    │                      │                 │               │
│  │ • Skills        │                      │ • Facts         │               │
│  │ • Patterns      │                      │ • Entities      │               │
│  └─────────────────┘                      │ • Relationships │               │
│                                           └─────────────────┘               │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Memory Scoping Hierarchy

```
Global Memory (shared by all)
└── Project Memory (per project)
    └── Agent Type Memory (per agent type)
        └── Agent Instance Memory (per instance)
            └── Session Memory (ephemeral)
```

---

## Sub-Issue Breakdown

### Phase 1: Foundation (Critical Path)

#### Sub-Issue #62-1: Project Setup & Core Architecture

**Priority:** P0 - Must complete first
**Estimated Complexity:** Medium

**Tasks:**

- [ ] Create `backend/app/services/memory/` directory structure
- [ ] Create `__init__.py` with public API exports
- [ ] Create `config.py` with `MemorySettings` (Pydantic)
- [ ] Define base interfaces in `types.py`:
  - `MemoryItem` - Base class for all memory items
  - `MemoryScope` - Enum for scoping levels
  - `MemoryStore` - Abstract base for storage backends
- [ ] Create `manager.py` with `MemoryManager` class (facade)
- [ ] Create `exceptions.py` with memory-specific errors
- [ ] Write ADR-010 documenting memory architecture decisions
- [ ] Create dependency injection setup
- [ ] Unit tests for configuration and types

**Deliverables:**

- Directory structure matching existing patterns (like `context/`, `safety/`)
- Configuration with `MEM_` env prefix
- Type definitions for all memory concepts
- Comprehensive unit tests

---

#### Sub-Issue #62-2: Database Schema & Storage Layer

**Priority:** P0 - Required for all memory types
**Estimated Complexity:** High

**Database Tables:**

1. **`working_memory`** - Ephemeral key-value storage
   - `id` (UUID, PK)
   - `scope_type` (ENUM: global/project/agent_type/agent_instance/session)
   - `scope_id` (VARCHAR - the ID for the scope level)
   - `key` (VARCHAR)
   - `value` (JSONB)
   - `expires_at` (TIMESTAMP WITH TZ)
   - `created_at`, `updated_at`

2. **`episodes`** - Experiential memories
   - `id` (UUID, PK)
   - `project_id` (UUID, FK)
   - `agent_instance_id` (UUID, FK, nullable)
   - `agent_type_id` (UUID, FK, nullable)
   - `session_id` (VARCHAR)
   - `task_type` (VARCHAR)
   - `task_description` (TEXT)
   - `actions` (JSONB)
   - `context_summary` (TEXT)
   - `outcome` (ENUM: success/failure/partial)
   - `outcome_details` (TEXT)
   - `duration_seconds` (FLOAT)
   - `tokens_used` (BIGINT)
   - `lessons_learned` (JSONB - list of strings)
   - `importance_score` (FLOAT, 0-1)
   - `embedding` (VECTOR(1536))
   - `occurred_at` (TIMESTAMP WITH TZ)
   - `created_at`, `updated_at`

3. **`facts`** - Semantic knowledge
   - `id` (UUID, PK)
   - `project_id` (UUID, FK, nullable - null for global)
   - `subject` (VARCHAR)
   - `predicate` (VARCHAR)
   - `object` (TEXT)
   - `confidence` (FLOAT, 0-1)
   - `source_episode_ids` (UUID[])
   - `first_learned` (TIMESTAMP WITH TZ)
   - `last_reinforced` (TIMESTAMP WITH TZ)
   - `reinforcement_count` (INT)
   - `embedding` (VECTOR(1536))
   - `created_at`, `updated_at`

4. **`procedures`** - Learned skills
   - `id` (UUID, PK)
   - `project_id` (UUID, FK, nullable)
   - `agent_type_id` (UUID, FK, nullable)
   - `name` (VARCHAR)
   - `trigger_pattern` (TEXT)
   - `steps` (JSONB)
   - `success_count` (INT)
   - `failure_count` (INT)
   - `last_used` (TIMESTAMP WITH TZ)
   - `embedding` (VECTOR(1536))
   - `created_at`, `updated_at`

5. **`memory_consolidation_log`** - Consolidation tracking
   - `id` (UUID, PK)
   - `consolidation_type` (ENUM)
   - `source_count` (INT)
   - `result_count` (INT)
   - `started_at`, `completed_at`
   - `status` (ENUM: pending/running/completed/failed)
   - `error` (TEXT, nullable)

**Tasks:**

- [ ] Create SQLAlchemy models in `backend/app/models/memory/`
- [ ] Create Alembic migration with all tables
- [ ] Add pgvector indexes (HNSW for episodes, facts, procedures)
- [ ] Create repository classes in `backend/app/crud/memory/`
- [ ] Add composite indexes for common query patterns
- [ ] Unit tests for all repositories

---

#### Sub-Issue #62-3: Working Memory Implementation

**Priority:** P0 - Core functionality
**Estimated Complexity:** Medium

**Components:**

- `backend/app/services/memory/working/memory.py` - WorkingMemory class
- `backend/app/services/memory/working/storage.py` - Redis + in-memory backend

**Features:**

- [ ] Session-scoped containers with automatic cleanup
- [ ] Variable storage (get/set/delete)
- [ ] Task state tracking (current step, status, progress)
- [ ] Scratchpad for reasoning steps
- [ ] Configurable capacity limits
- [ ] TTL-based expiration
- [ ] Checkpoint/snapshot support for recovery
- [ ] Redis primary storage with in-memory fallback
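Before the Redis backend exists, the Working Memory features listed above could be prototyped as an in-memory fallback. The following is a minimal sketch, not the planned implementation. It assumes a single asyncio event loop and one object per scope. `InMemoryWorkingMemory`, `WorkingMemoryFullError`, and the `max_items` limit are hypothetical names that do not appear in this plan. The sketch mirrors the `(scope_type, scope_id, key) -> value, expires_at` shape of the `working_memory` table and covers get/set/delete, TTL expiry, a capacity limit, the scratchpad, and checkpoint/restore.

```python
from __future__ import annotations

import asyncio
import copy
import time
import uuid
from dataclasses import dataclass, field
from typing import Any


class WorkingMemoryFullError(Exception):
    """Hypothetical error raised when a scope reaches its capacity limit."""


@dataclass
class _Entry:
    value: Any
    expires_at: float | None  # time.monotonic() deadline; None means no TTL


@dataclass
class InMemoryWorkingMemory:
    """Hypothetical in-memory fallback holding one scope's working memory."""

    scope_type: str  # e.g. "session", mirroring working_memory.scope_type
    scope_id: str
    max_items: int = 1000  # assumed capacity limit; the plan does not fix a value
    _data: dict[str, _Entry] = field(default_factory=dict, init=False, repr=False)
    _scratchpad: list[str] = field(default_factory=list, init=False, repr=False)
    _checkpoints: dict[str, tuple[dict[str, _Entry], list[str]]] = field(
        default_factory=dict, init=False, repr=False
    )

    def _purge_expired(self) -> None:
        # Lazy expiry: drop every entry whose deadline has already passed.
        now = time.monotonic()
        expired = [k for k, e in self._data.items() if e.expires_at is not None and e.expires_at <= now]
        for key in expired:
            del self._data[key]

    async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None:
        self._purge_expired()
        # Overwriting an existing key never counts against the capacity limit.
        if key not in self._data and len(self._data) >= self.max_items:
            raise WorkingMemoryFullError(f"{self.scope_type}:{self.scope_id} is full")
        deadline = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._data[key] = _Entry(value, deadline)

    async def get(self, key: str, default: Any = None) -> Any:
        self._purge_expired()
        entry = self._data.get(key)
        return default if entry is None else entry.value

    async def delete(self, key: str) -> bool:
        self._purge_expired()
        return self._data.pop(key, None) is not None

    async def append_scratchpad(self, content: str) -> None:
        self._scratchpad.append(content)

    async def get_scratchpad(self) -> list[str]:
        return list(self._scratchpad)  # copy so callers cannot mutate internal state

    async def create_checkpoint(self) -> str:
        self._purge_expired()
        checkpoint_id = str(uuid.uuid4())
        # Deep-copy values so later mutations do not leak into the snapshot.
        self._checkpoints[checkpoint_id] = (copy.deepcopy(self._data), list(self._scratchpad))
        return checkpoint_id

    async def restore_checkpoint(self, checkpoint_id: str) -> None:
        data, scratchpad = self._checkpoints[checkpoint_id]  # KeyError if unknown
        # Deadlines are absolute, so entries that expired since the snapshot stay expired.
        self._data = copy.deepcopy(data)
        self._scratchpad = list(scratchpad)


if __name__ == "__main__":
    async def demo() -> None:
        wm = InMemoryWorkingMemory(scope_type="session", scope_id="sess-1")
        await wm.set("step", 1)
        checkpoint = await wm.create_checkpoint()
        await wm.set("step", 2)
        await wm.restore_checkpoint(checkpoint)
        print(await wm.get("step"))  # prints 1

    asyncio.run(demo())
```

This sketch purges expired entries lazily on access instead of running a background sweeper. A Redis-backed store could rely on Redis's native key expiry for the same behaviour.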
**API:**

```python
from __future__ import annotations  # domain types such as TaskState are only used in annotations

from typing import Any


class WorkingMemory:
    async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None: ...
    async def get(self, key: str, default: Any = None) -> Any: ...
    async def delete(self, key: str) -> bool: ...
    async def exists(self, key: str) -> bool: ...
    async def list_keys(self, pattern: str = "*") -> list[str]: ...
    async def get_all(self) -> dict[str, Any]: ...
    async def clear(self) -> int: ...

    async def set_task_state(self, state: TaskState) -> None: ...
    async def get_task_state(self) -> TaskState | None: ...

    async def append_scratchpad(self, content: str) -> None: ...
    async def get_scratchpad(self) -> list[str]: ...

    async def create_checkpoint(self) -> str: ...  # Returns checkpoint ID
    async def restore_checkpoint(self, checkpoint_id: str) -> None: ...
```

---

### Phase 2: Memory Types

#### Sub-Issue #62-4: Episodic Memory Implementation

**Priority:** P1
**Estimated Complexity:** High

**Components:**

- `backend/app/services/memory/episodic/memory.py` - EpisodicMemory class
- `backend/app/services/memory/episodic/recorder.py` - Episode recording
- `backend/app/services/memory/episodic/retrieval.py` - Retrieval strategies

**Features:**

- [ ] Episode recording during agent execution
- [ ] Store task completions with context
- [ ] Store failures with error context
- [ ] Retrieval by semantic similarity (vector search)
- [ ] Retrieval by recency
- [ ] Retrieval by outcome (success/failure)
- [ ] Importance scoring based on outcome significance
- [ ] Episode summarization for long-term storage

**API:**

```python
from __future__ import annotations  # Episode, EpisodeCreate, Outcome are only used in annotations

from datetime import datetime
from uuid import UUID


class EpisodicMemory:
    async def record_episode(self, episode: EpisodeCreate) -> Episode: ...
    async def search_similar(self, query: str, limit: int = 10) -> list[Episode]: ...
    async def get_recent(self, limit: int = 10, since: datetime | None = None) -> list[Episode]: ...
    async def get_by_outcome(self, outcome: Outcome, limit: int = 10) -> list[Episode]: ...
    async def get_by_task_type(self, task_type: str, limit: int = 10) -> list[Episode]: ...
    async def update_importance(self, episode_id: UUID, score: float) -> None: ...
    async def summarize_episodes(self, episode_ids: list[UUID]) -> str: ...
```

---

#### Sub-Issue #62-5: Semantic Memory Implementation

**Priority:** P1
**Estimated Complexity:** High

**Components:**

- `backend/app/services/memory/semantic/memory.py` - SemanticMemory class
- `backend/app/services/memory/semantic/extraction.py` - Fact extraction from episodes
- `backend/app/services/memory/semantic/verification.py` - Fact verification

**Features:**

- [ ] Fact storage in triple format (subject, predicate, object)
- [ ] Confidence scoring and decay
- [ ] Fact extraction from episodic memory
- [ ] Conflict resolution for contradictory facts
- [ ] Retrieval by query (semantic search)
- [ ] Retrieval by entity (subject or object)
- [ ] Source tracking (which episodes contributed)
- [ ] Reinforcement on repeated learning

**API:**

```python
from __future__ import annotations  # Fact, FactCreate, Episode are only used in annotations

from uuid import UUID


class SemanticMemory:
    async def store_fact(self, fact: FactCreate) -> Fact: ...
    async def search_facts(self, query: str, limit: int = 10) -> list[Fact]: ...
    async def get_by_entity(self, entity: str, limit: int = 20) -> list[Fact]: ...
    async def reinforce_fact(self, fact_id: UUID) -> Fact: ...
    async def deprecate_fact(self, fact_id: UUID, reason: str) -> None: ...
    async def extract_facts_from_episode(self, episode: Episode) -> list[Fact]: ...
    async def resolve_conflict(self, fact_ids: list[UUID]) -> Fact: ...
```

---

#### Sub-Issue #62-6: Procedural Memory Implementation

**Priority:** P2
**Estimated Complexity:** Medium

**Components:**

- `backend/app/services/memory/procedural/memory.py` - ProceduralMemory class
- `backend/app/services/memory/procedural/matching.py` - Procedure matching

**Features:**

- [ ] Procedure recording from successful task patterns
- [ ] Trigger pattern matching
- [ ] Step-by-step procedure storage
- [ ] Success/failure rate tracking
- [ ] Procedure suggestion based on context
- [ ] Procedure versioning

**API:**

```python
from __future__ import annotations  # Procedure, ProcedureCreate, Step are only used in annotations

from uuid import UUID

class ProceduralMemory:
    async def record_procedure(self, procedure: ProcedureCreate) -> Procedure: ...
    async def find_matching(self, context: str, limit: int = 5) -> list[Procedure]: ...
    async def record_outcome(self, procedure_id: UUID, success: bool) -> None: ...
    async def get_best_procedure(self, task_type: str) -> Procedure | None: ...
    async def update_steps(self, procedure_id: UUID, steps: list[Step]) -> Procedure: ...
```

---

### Phase 3: Advanced Features

#### Sub-Issue #62-7: Memory Scoping

**Priority:** P1
**Estimated Complexity:** Medium

**Components:**

- `backend/app/services/memory/scoping/scope.py` - Scope management
- `backend/app/services/memory/scoping/resolver.py` - Scope resolution

**Features:**

- [ ] Global scope (shared across all)
- [ ] Project scope (per project)
- [ ] Agent type scope (per agent type)
- [ ] Agent instance scope (per instance)
- [ ] Session scope (ephemeral)
- [ ] Scope inheritance (a child scope sees its parents' memories)
- [ ] Access control policies

---

#### Sub-Issue #62-8: Memory Indexing & Retrieval

**Priority:** P1
**Estimated Complexity:** High

**Components:**

- `backend/app/services/memory/indexing/index.py` - Memory indexer
- `backend/app/services/memory/indexing/retrieval.py` - Retrieval engine

**Features:**

- [ ] Vector embeddings for all memory types
- [ ] Temporal index (by time)
- [ ] Entity index (by entities mentioned)
- [ ] Outcome index (by success/failure)
- [ ] Hybrid retrieval (vector + filters)
- [ ] Relevance scoring
- [ ] Retrieval caching

---

#### Sub-Issue #62-9: Memory Consolidation

**Priority:** P2
**Estimated Complexity:** High

**Components:**

- `backend/app/services/memory/consolidation/service.py` - Consolidation service
- `backend/app/tasks/memory_consolidation.py` - Celery tasks

**Features:**

- [ ] Working → Episodic transfer (at session end)
- [ ] Episodic → Semantic extraction (learn facts)
- [ ] Episodic → Procedural extraction (learn procedures)
- [ ] Nightly consolidation Celery tasks
- [ ] Memory pruning (remove low-value memories)
- [ ] Importance-based retention

---

### Phase 4: Integration

#### Sub-Issue #62-10: MCP Tools Definition

**Priority:** P0 - Required for agent usage
**Estimated Complexity:** Medium

**MCP Tools:**

1. **`remember`** - Store in memory
   ```json
   {
     "memory_type": "working|episodic|semantic|procedural",
     "content": "...",
     "importance": 0.8,
     "ttl_seconds": 3600
   }
   ```

2. **`recall`** - Retrieve from memory
   ```json
   {
     "query": "...",
     "memory_types": ["episodic", "semantic"],
     "limit": 10,
     "filters": {"outcome": "success"}
   }
   ```

3. **`forget`** - Remove from memory
   ```json
   {
     "memory_type": "working",
     "key": "temp_calculation"
   }
   ```

4. **`reflect`** - Analyze patterns
   ```json
   {
     "analysis_type": "recent_patterns|success_factors|failure_patterns"
   }
   ```

5. **`get_memory_stats`** - Usage statistics
6. **`search_procedures`** - Find relevant procedures
7. **`record_outcome`** - Record task success/failure

---

#### Sub-Issue #62-11: Component Integration

**Priority:** P1
**Estimated Complexity:** Medium

**Integrations:**

- [ ] Context Engine (#61) - Include relevant memories in context assembly
- [ ] Knowledge Base (#57) - Coordinate with the KB to avoid duplication
- [ ] LLM Gateway (#56) - Use for embedding generation
- [ ] Agent lifecycle hooks (spawn, pause, resume, terminate)

---

#### Sub-Issue #62-12: Caching Layer

**Priority:** P2
**Estimated Complexity:** Medium

**Features:**

- [ ] Hot memory caching (frequently accessed items)
- [ ] Retrieval result caching
- [ ] Embedding caching
- [ ] Cache invalidation strategies

---

### Phase 5: Intelligence & Quality

#### Sub-Issue #62-13: Memory Reflection

**Priority:** P3
**Estimated Complexity:** High

**Features:**

- [ ] Pattern detection in episodic memory
- [ ] Success/failure factor analysis
- [ ] Anomaly detection
- [ ] Insight generation

---

#### Sub-Issue #62-14: Metrics & Observability

**Priority:** P2
**Estimated Complexity:** Low

**Metrics:**

- `memory_size_bytes` by type and scope
- `memory_operations_total` counter
- `memory_retrieval_latency_seconds` histogram
- `memory_consolidation_duration_seconds` histogram
- `procedure_success_rate` gauge

---

#### Sub-Issue #62-15: Documentation & Final Testing

**Priority:** P0
**Estimated Complexity:** Medium

**Deliverables:**

- [ ] README with architecture overview
- [ ] API documentation with examples
- [ ] Integration guide
- [ ] E2E tests for the full memory lifecycle
- [ ] Achieve >90% code coverage
- [ ] Performance benchmarks

---

## Implementation Order

```
Phase 1 (Foundation) - Sequential
  #62-1 → #62-2 → #62-3

Phase 2 (Memory Types) - Can parallelize after Phase 1
  #62-4, #62-5, #62-6 (parallel after #62-3)

Phase 3 (Advanced) - Sequential within phase
  #62-7 → #62-8 → #62-9

Phase 4 (Integration) - After Phase 2
  #62-10 → #62-11 → #62-12

Phase 5 (Quality) - Final
  #62-13, #62-14, #62-15
```

---

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Working memory get/set | <5ms | P95 |
| Episodic memory retrieval | <100ms | P95, per the epic |
| Semantic memory search | <100ms | P95 |
| Procedural memory matching | <50ms | P95 |
| Consolidation batch | <30s | Per 1000 episodes |

---

## Risk Mitigation

1. **Embedding costs** - Cache aggressively and batch embedding requests
2. **Storage growth** - Implement TTL, pruning, and archival policies
3. **Query performance** - HNSW indexes, pagination, query optimization
4. **Scope complexity** - Start simple (instance scope only) and add the hierarchy later

---

## Review Checkpoints

After each sub-issue:

1. Run `make validate-all`
2. Multi-agent code review
3. Verify the E2E stack still works
4. Commit with a granular message