Implements Sub-Issue #87 of Issue #62 (Agent Memory System).
Core infrastructure:
- memory/types.py: Type definitions for all memory types (Working, Episodic, Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome
- memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton
- memory/exceptions.py: Comprehensive exception hierarchy for memory operations
- memory/manager.py: MemoryManager facade with placeholder methods
Directory structure:
- working/: Working memory (Redis/in-memory) - to be implemented in #89
- episodic/: Episodic memory (experiences) - to be implemented in #90
- semantic/: Semantic memory (facts) - to be implemented in #91
- procedural/: Procedural memory (skills) - to be implemented in #92
- scoping/: Scope management - to be implemented in #93
- indexing/: Vector indexing - to be implemented in #94
- consolidation/: Memory consolidation - to be implemented in #95
Tests: 71 unit tests for config, types, and exceptions
Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Agent Memory System - Implementation Plan
Issue #62 - Part of Epic #60 (Phase 2: MCP Integration)
Branch: feature/62-agent-memory-system
Parent Epic: #60 [EPIC] Phase 2: MCP Integration
Dependencies: #56 (LLM Gateway), #57 (Knowledge Base), #61 (Context Management Engine)
Executive Summary
The Agent Memory System provides multi-tier cognitive memory for AI agents, enabling them to:
- Maintain task state within a session (Working Memory)
- Learn from past experiences (Episodic Memory)
- Store and retrieve facts (Semantic Memory)
- Develop and reuse procedures (Procedural Memory)
Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│                             Agent Memory System                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐                      ┌─────────────────┐               │
│  │ Working Memory  │───────────────────▶  │ Episodic Memory │               │
│  │ (Redis/In-Mem)  │     consolidate      │ (PostgreSQL)    │               │
│  │                 │                      │                 │               │
│  │ • Current task  │                      │ • Past sessions │               │
│  │ • Variables     │                      │ • Experiences   │               │
│  │ • Scratchpad    │                      │ • Outcomes      │               │
│  └─────────────────┘                      └────────┬────────┘               │
│                                                    │                        │
│                                            extract │                        │
│                                                    ▼                        │
│  ┌─────────────────┐                      ┌─────────────────┐               │
│  │Procedural Memory│◀─────────────────────│ Semantic Memory │               │
│  │ (PostgreSQL)    │      learn from      │ (PostgreSQL +   │               │
│  │                 │                      │   pgvector)     │               │
│  │ • Procedures    │                      │                 │               │
│  │ • Skills        │                      │ • Facts         │               │
│  │ • Patterns      │                      │ • Entities      │               │
│  └─────────────────┘                      │ • Relationships │               │
│                                           └─────────────────┘               │
└─────────────────────────────────────────────────────────────────────────────┘
Memory Scoping Hierarchy
Global Memory (shared by all)
└── Project Memory (per project)
    └── Agent Type Memory (per agent type)
        └── Agent Instance Memory (per instance)
            └── Session Memory (ephemeral)
Sub-Issue Breakdown
Phase 1: Foundation (Critical Path)
Sub-Issue #62-1: Project Setup & Core Architecture
Priority: P0 - Must complete first
Estimated Complexity: Medium
Tasks:
- Create backend/app/services/memory/ directory structure
- Create __init__.py with public API exports
- Create config.py with MemorySettings (Pydantic)
- Define base interfaces in types.py:
  - MemoryItem - Base class for all memory items
  - MemoryScope - Enum for scoping levels
  - MemoryStore - Abstract base for storage backends
- Create manager.py with MemoryManager class (facade)
- Create exceptions.py with memory-specific errors
- Write ADR-010 documenting memory architecture decisions
- Create dependency injection setup
- Unit tests for configuration and types
Deliverables:
- Directory structure matching existing patterns (like context/, safety/)
- Configuration with MEM_ env prefix
- Type definitions for all memory concepts
- Comprehensive unit tests
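The MEM_ env prefix and singleton access planned for config.py can be sketched as follows. This is a stdlib-only stand-in for the planned Pydantic settings class, not the actual implementation; the two field names and the get_memory_settings helper are hypothetical.

```python
import os
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemorySettings:
    """Stdlib stand-in for the planned Pydantic MemorySettings."""

    working_memory_backend: str = "redis"           # hypothetical field
    working_memory_default_ttl_seconds: int = 3600  # hypothetical field

    @classmethod
    def from_env(cls, prefix: str = "MEM_") -> "MemorySettings":
        # Each field is read from <prefix><FIELD_NAME>, falling back to its default.
        return cls(
            working_memory_backend=os.environ.get(
                f"{prefix}WORKING_MEMORY_BACKEND", "redis"
            ),
            working_memory_default_ttl_seconds=int(
                os.environ.get(f"{prefix}WORKING_MEMORY_DEFAULT_TTL_SECONDS", "3600")
            ),
        )


_settings: Optional[MemorySettings] = None
_settings_lock = threading.Lock()


def get_memory_settings() -> MemorySettings:
    """Return one shared settings instance (double-checked locking)."""
    global _settings
    if _settings is None:
        with _settings_lock:
            if _settings is None:
                _settings = MemorySettings.from_env()
    return _settings
```

With Pydantic, the same effect would normally come from the settings class's env prefix option rather than manual os.environ lookups.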
Sub-Issue #62-2: Database Schema & Storage Layer
Priority: P0 - Required for all memory types
Estimated Complexity: High
Database Tables:
- working_memory - Ephemeral key-value storage
  - id (UUID, PK)
  - scope_type (ENUM: global/project/agent_type/agent_instance/session)
  - scope_id (VARCHAR - the ID for the scope level)
  - key (VARCHAR)
  - value (JSONB)
  - expires_at (TIMESTAMP WITH TZ)
  - created_at, updated_at
- episodes - Experiential memories
  - id (UUID, PK)
  - project_id (UUID, FK)
  - agent_instance_id (UUID, FK, nullable)
  - agent_type_id (UUID, FK, nullable)
  - session_id (VARCHAR)
  - task_type (VARCHAR)
  - task_description (TEXT)
  - actions (JSONB)
  - context_summary (TEXT)
  - outcome (ENUM: success/failure/partial)
  - outcome_details (TEXT)
  - duration_seconds (FLOAT)
  - tokens_used (BIGINT)
  - lessons_learned (JSONB - list of strings)
  - importance_score (FLOAT, 0-1)
  - embedding (VECTOR(1536))
  - occurred_at (TIMESTAMP WITH TZ)
  - created_at, updated_at
- facts - Semantic knowledge
  - id (UUID, PK)
  - project_id (UUID, FK, nullable - null for global)
  - subject (VARCHAR)
  - predicate (VARCHAR)
  - object (TEXT)
  - confidence (FLOAT, 0-1)
  - source_episode_ids (UUID[])
  - first_learned (TIMESTAMP WITH TZ)
  - last_reinforced (TIMESTAMP WITH TZ)
  - reinforcement_count (INT)
  - embedding (VECTOR(1536))
  - created_at, updated_at
- procedures - Learned skills
  - id (UUID, PK)
  - project_id (UUID, FK, nullable)
  - agent_type_id (UUID, FK, nullable)
  - name (VARCHAR)
  - trigger_pattern (TEXT)
  - steps (JSONB)
  - success_count (INT)
  - failure_count (INT)
  - last_used (TIMESTAMP WITH TZ)
  - embedding (VECTOR(1536))
  - created_at, updated_at
- memory_consolidation_log - Consolidation tracking
  - id (UUID, PK)
  - consolidation_type (ENUM)
  - source_count (INT)
  - result_count (INT)
  - started_at, completed_at
  - status (ENUM: pending/running/completed/failed)
  - error (TEXT, nullable)
Tasks:
- Create SQLAlchemy models in backend/app/models/memory/
- Create Alembic migration with all tables
- Add pgvector indexes (HNSW for episodes, facts, procedures)
- Create repository classes in backend/app/crud/memory/
- Add composite indexes for common query patterns
- Unit tests for all repositories
Sub-Issue #62-3: Working Memory Implementation
Priority: P0 - Core functionality
Estimated Complexity: Medium
Components:
- backend/app/services/memory/working/memory.py - WorkingMemory class
- backend/app/services/memory/working/storage.py - Redis + in-memory backend
Features:
- Session-scoped containers with automatic cleanup
- Variable storage (get/set/delete)
- Task state tracking (current step, status, progress)
- Scratchpad for reasoning steps
- Configurable capacity limits
- TTL-based expiration
- Checkpoint/snapshot support for recovery
- Redis primary storage with in-memory fallback
API:
class WorkingMemory:
    async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None
    async def get(self, key: str, default: Any = None) -> Any
    async def delete(self, key: str) -> bool
    async def exists(self, key: str) -> bool
    async def list_keys(self, pattern: str = "*") -> list[str]
    async def get_all(self) -> dict[str, Any]
    async def clear(self) -> int
    async def set_task_state(self, state: TaskState) -> None
    async def get_task_state(self) -> TaskState | None
    async def append_scratchpad(self, content: str) -> None
    async def get_scratchpad(self) -> list[str]
    async def create_checkpoint(self) -> str  # Returns checkpoint ID
    async def restore_checkpoint(self, checkpoint_id: str) -> None
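To illustrate how TTL-based expiration and checkpoints could behave, here is a minimal in-memory sketch covering a subset of the API above. The class name InMemoryWorkingMemory and its internals are assumptions for illustration; the planned Redis-backed storage is not shown.

```python
from __future__ import annotations

import asyncio
import copy
import time
import uuid
from typing import Any


class InMemoryWorkingMemory:
    """Illustrative in-memory backend (hypothetical, not the planned storage.py)."""

    def __init__(self) -> None:
        # key -> (value, expiry on the monotonic clock, or None for no expiry)
        self._data: dict[str, tuple[Any, float | None]] = {}
        self._checkpoints: dict[str, dict[str, tuple[Any, float | None]]] = {}

    async def set(self, key: str, value: Any, ttl_seconds: float | None = None) -> None:
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._data[key] = (value, expires_at)

    async def get(self, key: str, default: Any = None) -> Any:
        if key not in self._data:
            return default
        value, expires_at = self._data[key]
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiration on read
            return default
        return value

    async def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None

    async def create_checkpoint(self) -> str:
        checkpoint_id = uuid.uuid4().hex
        # Deep copy so later mutations do not leak into the snapshot.
        self._checkpoints[checkpoint_id] = copy.deepcopy(self._data)
        return checkpoint_id

    async def restore_checkpoint(self, checkpoint_id: str) -> None:
        self._data = copy.deepcopy(self._checkpoints[checkpoint_id])


async def demo() -> Any:
    memory = InMemoryWorkingMemory()
    await memory.set("current_step", 1)
    checkpoint_id = await memory.create_checkpoint()
    await memory.set("current_step", 2)
    await memory.restore_checkpoint(checkpoint_id)
    return await memory.get("current_step")
```

Running asyncio.run(demo()) restores the value stored before the checkpoint. Expiry is checked lazily on read here; a Redis backend would typically rely on native key expiry instead.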
Phase 2: Memory Types
Sub-Issue #62-4: Episodic Memory Implementation
Priority: P1
Estimated Complexity: High
Components:
- backend/app/services/memory/episodic/memory.py - EpisodicMemory class
- backend/app/services/memory/episodic/recorder.py - Episode recording
- backend/app/services/memory/episodic/retrieval.py - Retrieval strategies
Features:
- Episode recording during agent execution
- Store task completions with context
- Store failures with error context
- Retrieval by semantic similarity (vector search)
- Retrieval by recency
- Retrieval by outcome (success/failure)
- Importance scoring based on outcome significance
- Episode summarization for long-term storage
API:
class EpisodicMemory:
    async def record_episode(self, episode: EpisodeCreate) -> Episode
    async def search_similar(self, query: str, limit: int = 10) -> list[Episode]
    async def get_recent(self, limit: int = 10, since: datetime | None = None) -> list[Episode]
    async def get_by_outcome(self, outcome: Outcome, limit: int = 10) -> list[Episode]
    async def get_by_task_type(self, task_type: str, limit: int = 10) -> list[Episode]
    async def update_importance(self, episode_id: UUID, score: float) -> None
    async def summarize_episodes(self, episode_ids: list[UUID]) -> str
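The non-vector retrieval paths (by recency and by outcome) can be sketched with a plain in-memory list. This is a synchronous simplification under assumed names: InMemoryEpisodeLog is hypothetical, and the Episode dataclass keeps only a few of the columns listed in the schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import List, Optional


class Outcome(str, Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    PARTIAL = "partial"


@dataclass
class Episode:
    task_type: str
    outcome: Outcome
    occurred_at: datetime
    importance_score: float = 0.5


class InMemoryEpisodeLog:
    """Hypothetical stand-in for the PostgreSQL-backed episode store."""

    def __init__(self) -> None:
        self._episodes: List[Episode] = []

    def record_episode(self, episode: Episode) -> Episode:
        self._episodes.append(episode)
        return episode

    def get_recent(self, limit: int = 10, since: Optional[datetime] = None) -> List[Episode]:
        # Optionally drop episodes older than `since`, then return newest first.
        candidates = [
            e for e in self._episodes if since is None or e.occurred_at >= since
        ]
        candidates.sort(key=lambda e: e.occurred_at, reverse=True)
        return candidates[:limit]

    def get_by_outcome(self, outcome: Outcome, limit: int = 10) -> List[Episode]:
        matching = [e for e in self._episodes if e.outcome == outcome]
        matching.sort(key=lambda e: e.occurred_at, reverse=True)
        return matching[:limit]


now = datetime.now(timezone.utc)
log = InMemoryEpisodeLog()
log.record_episode(Episode("deploy", Outcome.FAILURE, now - timedelta(days=1)))
log.record_episode(Episode("review", Outcome.SUCCESS, now))
latest = log.get_recent(limit=1)[0]  # the "review" episode
```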
Sub-Issue #62-5: Semantic Memory Implementation
Priority: P1
Estimated Complexity: High
Components:
- backend/app/services/memory/semantic/memory.py - SemanticMemory class
- backend/app/services/memory/semantic/extraction.py - Fact extraction from episodes
- backend/app/services/memory/semantic/verification.py - Fact verification
Features:
- Fact storage with triple format (subject, predicate, object)
- Confidence scoring and decay
- Fact extraction from episodic memory
- Conflict resolution for contradictory facts
- Retrieval by query (semantic search)
- Retrieval by entity (subject or object)
- Source tracking (which episodes contributed)
- Reinforcement on repeated learning
API:
class SemanticMemory:
    async def store_fact(self, fact: FactCreate) -> Fact
    async def search_facts(self, query: str, limit: int = 10) -> list[Fact]
    async def get_by_entity(self, entity: str, limit: int = 20) -> list[Fact]
    async def reinforce_fact(self, fact_id: UUID) -> Fact
    async def deprecate_fact(self, fact_id: UUID, reason: str) -> None
    async def extract_facts_from_episode(self, episode: Episode) -> list[Fact]
    async def resolve_conflict(self, fact_ids: list[UUID]) -> Fact
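One possible reading of "confidence scoring and decay" and "reinforcement on repeated learning" is sketched below. The plan does not specify formulas, so both the update rule and the exponential half-life decay are assumptions, as are the boost and half_life_days parameters.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    subject: str
    predicate: str
    object: str
    confidence: float           # kept within [0, 1]
    reinforcement_count: int = 1


def reinforce(fact: Fact, boost: float = 0.1) -> Fact:
    """Move confidence a fraction of the way toward 1.0 (assumed rule)."""
    fact.confidence += boost * (1.0 - fact.confidence)
    fact.reinforcement_count += 1
    return fact


def decayed_confidence(
    fact: Fact, days_since_reinforced: float, half_life_days: float = 30.0
) -> float:
    """Exponential decay: confidence halves every half_life_days (assumed rule)."""
    return fact.confidence * 0.5 ** (days_since_reinforced / half_life_days)


fact = Fact("api-service", "uses", "PostgreSQL", confidence=0.5)
reinforce(fact)                                      # confidence rises toward 1.0
stale = decayed_confidence(fact, days_since_reinforced=30)  # half of current value
```

Because each reinforcement closes a fixed fraction of the remaining gap to 1.0, confidence stays within the 0-1 range that the facts table requires.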
Sub-Issue #62-6: Procedural Memory Implementation
Priority: P2
Estimated Complexity: Medium
Components:
- backend/app/services/memory/procedural/memory.py - ProceduralMemory class
- backend/app/services/memory/procedural/matching.py - Procedure matching
Features:
- Procedure recording from successful task patterns
- Trigger pattern matching
- Step-by-step procedure storage
- Success/failure rate tracking
- Procedure suggestion based on context
- Procedure versioning
API:
class ProceduralMemory:
    async def record_procedure(self, procedure: ProcedureCreate) -> Procedure
    async def find_matching(self, context: str, limit: int = 5) -> list[Procedure]
    async def record_outcome(self, procedure_id: UUID, success: bool) -> None
    async def get_best_procedure(self, task_type: str) -> Procedure | None
    async def update_steps(self, procedure_id: UUID, steps: list[Step]) -> Procedure
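Success/failure tracking and get_best_procedure might work along these lines. The smoothed success rate is an assumption (the plan only lists success_count and failure_count), and the task_type field on Procedure is hypothetical, since the schema stores a trigger_pattern instead.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Procedure:
    name: str
    task_type: str          # hypothetical field used for lookup in this sketch
    success_count: int = 0
    failure_count: int = 0

    @property
    def success_rate(self) -> float:
        # Laplace smoothing: a procedure with no history scores 0.5, and a single
        # lucky success does not outrank a long, mostly successful track record.
        total = self.success_count + self.failure_count
        return (self.success_count + 1) / (total + 2)


def record_outcome(procedure: Procedure, success: bool) -> None:
    if success:
        procedure.success_count += 1
    else:
        procedure.failure_count += 1


def get_best_procedure(
    procedures: Iterable[Procedure], task_type: str
) -> Optional[Procedure]:
    candidates = [p for p in procedures if p.task_type == task_type]
    return max(candidates, key=lambda p: p.success_rate, default=None)


proven = Procedure("blue-green deploy", "deploy", success_count=9, failure_count=1)
untested = Procedure("direct deploy", "deploy", success_count=1)
best = get_best_procedure([proven, untested], "deploy")  # picks `proven`
```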
Phase 3: Advanced Features
Sub-Issue #62-7: Memory Scoping
Priority: P1
Estimated Complexity: Medium
Components:
- backend/app/services/memory/scoping/scope.py - Scope management
- backend/app/services/memory/scoping/resolver.py - Scope resolution
Features:
- Global scope (shared across all)
- Project scope (per project)
- Agent type scope (per agent type)
- Agent instance scope (per instance)
- Session scope (ephemeral)
- Scope inheritance (child sees parent memories)
- Access control policies
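Scope inheritance (a child scope seeing its parents' memories) can be sketched as a walk up the parent chain. The Scope dataclass, visible_scopes, resolve, and the dict-based store are illustrative assumptions; only the five level names come from the plan.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Dict, List, Optional, Tuple


class ScopeLevel(IntEnum):
    GLOBAL = 0
    PROJECT = 1
    AGENT_TYPE = 2
    AGENT_INSTANCE = 3
    SESSION = 4


@dataclass(frozen=True)
class Scope:
    level: ScopeLevel
    scope_id: str
    parent: Optional["Scope"] = None


def visible_scopes(scope: Scope) -> List[Scope]:
    """Return the scope and all its ancestors, most specific first."""
    chain = []
    current: Optional[Scope] = scope
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain


Store = Dict[Tuple[ScopeLevel, str], Dict[str, Any]]


def resolve(key: str, scope: Scope, store: Store) -> Any:
    """Look the key up in the nearest scope that defines it, else None."""
    for candidate in visible_scopes(scope):
        values = store.get((candidate.level, candidate.scope_id), {})
        if key in values:
            return values[key]
    return None


global_scope = Scope(ScopeLevel.GLOBAL, "global")
project = Scope(ScopeLevel.PROJECT, "proj-1", parent=global_scope)
session = Scope(ScopeLevel.SESSION, "sess-9", parent=project)
store: Store = {
    (ScopeLevel.GLOBAL, "global"): {"style": "pep8", "lang": "python"},
    (ScopeLevel.PROJECT, "proj-1"): {"style": "black"},
}
style = resolve("style", session, store)  # project value shadows the global one
```

Most-specific-first resolution means a project can override a global memory without modifying it, while sessions still fall through to global values such as "lang".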
Sub-Issue #62-8: Memory Indexing & Retrieval
Priority: P1
Estimated Complexity: High
Components:
- backend/app/services/memory/indexing/index.py - Memory indexer
- backend/app/services/memory/indexing/retrieval.py - Retrieval engine
Features:
- Vector embeddings for all memory types
- Temporal index (by time)
- Entity index (by entities mentioned)
- Outcome index (by success/failure)
- Hybrid retrieval (vector + filters)
- Relevance scoring
- Retrieval caching
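Hybrid retrieval (vector + filters) with relevance scoring could look like the following pure-Python sketch. In the planned system the similarity search would run in pgvector; the IndexedMemory type, the hybrid_search function, and the importance_weight blend are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class IndexedMemory:
    memory_id: str
    embedding: List[float]
    outcome: str
    importance: float  # 0-1


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    query_embedding: Sequence[float],
    items: List[IndexedMemory],
    filters: Dict[str, object],
    limit: int = 10,
    importance_weight: float = 0.2,  # hypothetical blend factor
) -> List[IndexedMemory]:
    # Step 1: exact-match metadata filters narrow the candidate set.
    candidates = [
        m for m in items if all(getattr(m, k) == v for k, v in filters.items())
    ]
    # Step 2: relevance blends vector similarity with stored importance.
    scored = [
        (
            (1 - importance_weight) * cosine_similarity(query_embedding, m.embedding)
            + importance_weight * m.importance,
            m,
        )
        for m in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:limit]]


items = [
    IndexedMemory("a", [1.0, 0.0], "success", 0.5),
    IndexedMemory("b", [0.0, 1.0], "success", 0.5),
    IndexedMemory("c", [1.0, 0.0], "failure", 0.9),
]
results = hybrid_search([1.0, 0.0], items, filters={"outcome": "success"})
```

Filtering before scoring keeps memory "c" out of the results even though its embedding matches the query exactly.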
Sub-Issue #62-9: Memory Consolidation
Priority: P2
Estimated Complexity: High
Components:
- backend/app/services/memory/consolidation/service.py - Consolidation service
- backend/app/tasks/memory_consolidation.py - Celery tasks
Features:
- Working → Episodic transfer (session end)
- Episodic → Semantic extraction (learn facts)
- Episodic → Procedural extraction (learn procedures)
- Nightly consolidation Celery tasks
- Memory pruning (remove low-value)
- Importance-based retention
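Memory pruning with importance-based retention might select candidates as sketched below. The thresholds (min_importance, max_age) and the rule that an episode must be both old and low-importance are assumptions; the plan does not define the retention policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class StoredEpisode:
    episode_id: str
    importance_score: float  # 0-1
    occurred_at: datetime


def select_for_pruning(
    episodes: List[StoredEpisode],
    now: datetime,
    min_importance: float = 0.3,           # hypothetical threshold
    max_age: timedelta = timedelta(days=90),  # hypothetical threshold
) -> List[StoredEpisode]:
    """Pick episodes that are both older than max_age and below min_importance."""
    return [
        e
        for e in episodes
        if e.importance_score < min_importance and now - e.occurred_at > max_age
    ]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
episodes = [
    StoredEpisode("old-trivial", 0.1, now - timedelta(days=200)),
    StoredEpisode("old-important", 0.9, now - timedelta(days=200)),
    StoredEpisode("new-trivial", 0.1, now - timedelta(days=5)),
]
to_prune = select_for_pruning(episodes, now)
```

Requiring both conditions keeps important episodes indefinitely and gives recent low-value episodes a chance to be consolidated before removal.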
Phase 4: Integration
Sub-Issue #62-10: MCP Tools Definition
Priority: P0 - Required for agent usage
Estimated Complexity: Medium
MCP Tools:
- remember - Store in memory
  { "memory_type": "working|episodic|semantic|procedural", "content": "...", "importance": 0.8, "ttl_seconds": 3600 }
- recall - Retrieve from memory
  { "query": "...", "memory_types": ["episodic", "semantic"], "limit": 10, "filters": {"outcome": "success"} }
- forget - Remove from memory
  { "memory_type": "working", "key": "temp_calculation" }
- reflect - Analyze patterns
  { "analysis_type": "recent_patterns|success_factors|failure_patterns" }
- get_memory_stats - Usage statistics
- search_procedures - Find relevant procedures
- record_outcome - Record task success/failure
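Validating a remember payload before dispatching it to a memory type could be sketched like this. RememberRequest and parse_remember are hypothetical names, and the default importance of 0.5 is an assumption; the field names and allowed memory types come from the tool definition above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

VALID_MEMORY_TYPES = {"working", "episodic", "semantic", "procedural"}


@dataclass(frozen=True)
class RememberRequest:
    memory_type: str
    content: str
    importance: float = 0.5          # assumed default
    ttl_seconds: Optional[int] = None


def parse_remember(payload: Dict[str, Any]) -> RememberRequest:
    """Build a request from a tool-call payload, rejecting invalid values."""
    try:
        request = RememberRequest(**payload)
    except TypeError as exc:  # missing or unexpected fields
        raise ValueError(f"invalid remember payload: {exc}") from exc
    if request.memory_type not in VALID_MEMORY_TYPES:
        raise ValueError(f"unknown memory_type: {request.memory_type!r}")
    if not 0.0 <= request.importance <= 1.0:
        raise ValueError("importance must be between 0 and 1")
    if request.ttl_seconds is not None and request.ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return request


request = parse_remember(
    {
        "memory_type": "working",
        "content": "intermediate result",
        "importance": 0.8,
        "ttl_seconds": 3600,
    }
)
```

Normalizing every failure to ValueError gives the tool handler a single error type to translate into an MCP error response.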
Sub-Issue #62-11: Component Integration
Priority: P1
Estimated Complexity: Medium
Integrations:
- Context Engine (#61) - Include relevant memories in context assembly
- Knowledge Base (#57) - Coordinate with KB to avoid duplication
- LLM Gateway (#56) - Use for embedding generation
- Agent lifecycle hooks (spawn, pause, resume, terminate)
Sub-Issue #62-12: Caching Layer
Priority: P2
Estimated Complexity: Medium
Features:
- Hot memory caching (frequently accessed)
- Retrieval result caching
- Embedding caching
- Cache invalidation strategies
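Embedding caching with explicit invalidation can be sketched as a small LRU cache keyed by a content hash. EmbeddingCache and its embed_fn parameter are hypothetical; in the planned system the embedding call would go through the LLM Gateway (#56).

```python
import hashlib
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    """LRU cache for embeddings, keyed by SHA-256 of the input text."""

    def __init__(
        self, embed_fn: Callable[[str], List[float]], max_entries: int = 1024
    ) -> None:
        self._embed_fn = embed_fn
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        embedding = self._embed_fn(text)
        self._entries[key] = embedding
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return embedding

    def invalidate(self, text: str) -> bool:
        """Drop a cached embedding, e.g. after the embedding model changes."""
        return self._entries.pop(self._key(text), None) is not None


def fake_embed(text: str) -> List[float]:
    # Placeholder embedding for the demo; a real one would call the gateway.
    return [float(len(text))]


cache = EmbeddingCache(fake_embed, max_entries=2)
cache.get("hello")
cache.get("hello")  # served from cache
```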
Phase 5: Intelligence & Quality
Sub-Issue #62-13: Memory Reflection
Priority: P3
Estimated Complexity: High
Features:
- Pattern detection in episodic memory
- Success/failure factor analysis
- Anomaly detection
- Insights generation
Sub-Issue #62-14: Metrics & Observability
Priority: P2
Estimated Complexity: Low
Metrics:
- memory_size_bytes by type and scope
- memory_operations_total counter
- memory_retrieval_latency_seconds histogram
- memory_consolidation_duration_seconds histogram
- procedure_success_rate gauge
Sub-Issue #62-15: Documentation & Final Testing
Priority: P0
Estimated Complexity: Medium
Deliverables:
- README with architecture overview
- API documentation with examples
- Integration guide
- E2E tests for full memory lifecycle
- Achieve >90% code coverage
- Performance benchmarks
Implementation Order
Phase 1 (Foundation) - Sequential
#62-1 → #62-2 → #62-3
Phase 2 (Memory Types) - Can parallelize after Phase 1
#62-4, #62-5, #62-6 (parallel after #62-3)
Phase 3 (Advanced) - Sequential within phase
#62-7 → #62-8 → #62-9
Phase 4 (Integration) - After Phase 2
#62-10 → #62-11 → #62-12
Phase 5 (Quality) - Final
#62-13, #62-14, #62-15
Performance Targets
| Metric | Target | Notes |
|---|---|---|
| Working memory get/set | <5ms | P95 |
| Episodic memory retrieval | <100ms | P95, as per epic |
| Semantic memory search | <100ms | P95 |
| Procedural memory matching | <50ms | P95 |
| Consolidation batch | <30s | Per 1000 episodes |
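A benchmark could check the P95 targets in the table with a helper like the one below. The p95 and meets_target names are hypothetical, and the choice of the "inclusive" quantile method is an assumption; the plan does not say how percentiles are computed.

```python
import statistics
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile of latency samples (needs at least two samples)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def meets_target(samples_ms: Sequence[float], target_ms: float) -> bool:
    return p95(samples_ms) < target_ms


# Example: mostly fast working-memory reads with a few slow outliers.
latencies_ms = [1.2] * 97 + [4.0, 6.5, 9.0]
within_budget = meets_target(latencies_ms, target_ms=5.0)
```

Because the target applies to P95 rather than the maximum, a handful of slow outliers does not fail the check as long as 95% of samples stay under the budget.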
Risk Mitigation
- Embedding costs - Use caching aggressively, batch embeddings
- Storage growth - Implement TTL, pruning, and archival policies
- Query performance - HNSW indexes, pagination, query optimization
- Scope complexity - Start simple (instance scope only), add hierarchy later
Review Checkpoints
After each sub-issue:
- Run make validate-all
- Multi-agent code review
- Verify E2E stack still works
- Commit with granular message