Implements Sub-Issue #87 of Issue #62 (Agent Memory System). Core infrastructure: - memory/types.py: Type definitions for all memory types (Working, Episodic, Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome - memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton - memory/exceptions.py: Comprehensive exception hierarchy for memory operations - memory/manager.py: MemoryManager facade with placeholder methods Directory structure: - working/: Working memory (Redis/in-memory) - to be implemented in #89 - episodic/: Episodic memory (experiences) - to be implemented in #90 - semantic/: Semantic memory (facts) - to be implemented in #91 - procedural/: Procedural memory (skills) - to be implemented in #92 - scoping/: Scope management - to be implemented in #93 - indexing/: Vector indexing - to be implemented in #94 - consolidation/: Memory consolidation - to be implemented in #95 Tests: 71 unit tests for config, types, and exceptions Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Agent Memory System - Implementation Plan

## Issue #62 - Part of Epic #60 (Phase 2: MCP Integration)

**Branch:** `feature/62-agent-memory-system`
**Parent Epic:** #60 [EPIC] Phase 2: MCP Integration
**Dependencies:** #56 (LLM Gateway), #57 (Knowledge Base), #61 (Context Management Engine)

---

## Executive Summary

The Agent Memory System provides multi-tier cognitive memory for AI agents, enabling them to:
- Maintain state across sessions (Working Memory)
- Learn from past experiences (Episodic Memory)
- Store and retrieve facts (Semantic Memory)
- Develop and reuse procedures (Procedural Memory)

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                             Agent Memory System                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐                      ┌─────────────────┐               │
│  │ Working Memory  │───────────────────▶  │ Episodic Memory │               │
│  │ (Redis/In-Mem)  │    consolidate       │  (PostgreSQL)   │               │
│  │                 │                      │                 │               │
│  │ • Current task  │                      │ • Past sessions │               │
│  │ • Variables     │                      │ • Experiences   │               │
│  │ • Scratchpad    │                      │ • Outcomes      │               │
│  └─────────────────┘                      └────────┬────────┘               │
│                                                    │                        │
│                                           extract  │                        │
│                                                    ▼                        │
│  ┌─────────────────┐                      ┌─────────────────┐               │
│  │Procedural Memory│◀─────────────────────│ Semantic Memory │               │
│  │  (PostgreSQL)   │     learn from       │  (PostgreSQL +  │               │
│  │                 │                      │    pgvector)    │               │
│  │ • Procedures    │                      │                 │               │
│  │ • Skills        │                      │ • Facts         │               │
│  │ • Patterns      │                      │ • Entities      │               │
│  └─────────────────┘                      │ • Relationships │               │
│                                           └─────────────────┘               │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Memory Scoping Hierarchy

```
Global Memory (shared by all)
└── Project Memory (per project)
    └── Agent Type Memory (per agent type)
        └── Agent Instance Memory (per instance)
            └── Session Memory (ephemeral)
```

---

## Sub-Issue Breakdown

### Phase 1: Foundation (Critical Path)

#### Sub-Issue #62-1: Project Setup & Core Architecture
**Priority:** P0 - Must complete first
**Estimated Complexity:** Medium

**Tasks:**
- [ ] Create `backend/app/services/memory/` directory structure
- [ ] Create `__init__.py` with public API exports
- [ ] Create `config.py` with `MemorySettings` (Pydantic)
- [ ] Define base interfaces in `types.py`:
  - `MemoryItem` - Base class for all memory items
  - `MemoryScope` - Enum for scoping levels
  - `MemoryStore` - Abstract base for storage backends
- [ ] Create `manager.py` with `MemoryManager` class (facade)
- [ ] Create `exceptions.py` with memory-specific errors
- [ ] Write ADR-010 documenting memory architecture decisions
- [ ] Create dependency injection setup
- [ ] Unit tests for configuration and types

**Deliverables:**
- Directory structure matching existing patterns (like `context/`, `safety/`)
- Configuration with MEM_ env prefix
- Type definitions for all memory concepts
- Comprehensive unit tests
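
As a rough illustration of the base interfaces listed in the tasks above, a minimal sketch of `types.py` could look like the following. Only the names `MemoryItem`, `MemoryScope`, and `MemoryStore` come from this plan, and the enum values mirror the `scope_type` column defined in #62-2. The fields, the `put`/`get` method names, and the `InMemoryStore` helper are assumptions made purely for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any
from uuid import UUID, uuid4


class MemoryScope(str, Enum):
    """Scoping levels, listed from broadest to narrowest."""

    GLOBAL = "global"
    PROJECT = "project"
    AGENT_TYPE = "agent_type"
    AGENT_INSTANCE = "agent_instance"
    SESSION = "session"


@dataclass
class MemoryItem:
    """Base class for all memory items (field set is illustrative)."""

    scope: MemoryScope
    scope_id: str
    content: Any
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class MemoryStore(ABC):
    """Abstract base for storage backends (method names are assumptions)."""

    @abstractmethod
    def put(self, item: MemoryItem) -> None: ...

    @abstractmethod
    def get(self, item_id: UUID) -> MemoryItem | None: ...


class InMemoryStore(MemoryStore):
    """Hypothetical dict-backed store, intended only for unit tests."""

    def __init__(self) -> None:
        self._items: dict[UUID, MemoryItem] = {}

    def put(self, item: MemoryItem) -> None:
        self._items[item.id] = item

    def get(self, item_id: UUID) -> MemoryItem | None:
        return self._items.get(item_id)
```

Keeping the store interface abstract lets the Redis and PostgreSQL backends planned in later sub-issues be swapped in without touching callers.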

---

#### Sub-Issue #62-2: Database Schema & Storage Layer
**Priority:** P0 - Required for all memory types
**Estimated Complexity:** High

**Database Tables:**

1. **`working_memory`** - Ephemeral key-value storage
   - `id` (UUID, PK)
   - `scope_type` (ENUM: global/project/agent_type/agent_instance/session)
   - `scope_id` (VARCHAR - the ID for the scope level)
   - `key` (VARCHAR)
   - `value` (JSONB)
   - `expires_at` (TIMESTAMP WITH TZ)
   - `created_at`, `updated_at`

2. **`episodes`** - Experiential memories
   - `id` (UUID, PK)
   - `project_id` (UUID, FK)
   - `agent_instance_id` (UUID, FK, nullable)
   - `agent_type_id` (UUID, FK, nullable)
   - `session_id` (VARCHAR)
   - `task_type` (VARCHAR)
   - `task_description` (TEXT)
   - `actions` (JSONB)
   - `context_summary` (TEXT)
   - `outcome` (ENUM: success/failure/partial)
   - `outcome_details` (TEXT)
   - `duration_seconds` (FLOAT)
   - `tokens_used` (BIGINT)
   - `lessons_learned` (JSONB - list of strings)
   - `importance_score` (FLOAT, 0-1)
   - `embedding` (VECTOR(1536))
   - `occurred_at` (TIMESTAMP WITH TZ)
   - `created_at`, `updated_at`

3. **`facts`** - Semantic knowledge
   - `id` (UUID, PK)
   - `project_id` (UUID, FK, nullable - null for global)
   - `subject` (VARCHAR)
   - `predicate` (VARCHAR)
   - `object` (TEXT)
   - `confidence` (FLOAT, 0-1)
   - `source_episode_ids` (UUID[])
   - `first_learned` (TIMESTAMP WITH TZ)
   - `last_reinforced` (TIMESTAMP WITH TZ)
   - `reinforcement_count` (INT)
   - `embedding` (VECTOR(1536))
   - `created_at`, `updated_at`

4. **`procedures`** - Learned skills
   - `id` (UUID, PK)
   - `project_id` (UUID, FK, nullable)
   - `agent_type_id` (UUID, FK, nullable)
   - `name` (VARCHAR)
   - `trigger_pattern` (TEXT)
   - `steps` (JSONB)
   - `success_count` (INT)
   - `failure_count` (INT)
   - `last_used` (TIMESTAMP WITH TZ)
   - `embedding` (VECTOR(1536))
   - `created_at`, `updated_at`

5. **`memory_consolidation_log`** - Consolidation tracking
   - `id` (UUID, PK)
   - `consolidation_type` (ENUM)
   - `source_count` (INT)
   - `result_count` (INT)
   - `started_at`, `completed_at`
   - `status` (ENUM: pending/running/completed/failed)
   - `error` (TEXT, nullable)

**Tasks:**
- [ ] Create SQLAlchemy models in `backend/app/models/memory/`
- [ ] Create Alembic migration with all tables
- [ ] Add pgvector indexes (HNSW for episodes, facts, procedures)
- [ ] Create repository classes in `backend/app/crud/memory/`
- [ ] Add composite indexes for common query patterns
- [ ] Unit tests for all repositories

---

#### Sub-Issue #62-3: Working Memory Implementation
**Priority:** P0 - Core functionality
**Estimated Complexity:** Medium

**Components:**
- `backend/app/services/memory/working/memory.py` - WorkingMemory class
- `backend/app/services/memory/working/storage.py` - Redis + in-memory backend

**Features:**
- [ ] Session-scoped containers with automatic cleanup
- [ ] Variable storage (get/set/delete)
- [ ] Task state tracking (current step, status, progress)
- [ ] Scratchpad for reasoning steps
- [ ] Configurable capacity limits
- [ ] TTL-based expiration
- [ ] Checkpoint/snapshot support for recovery
- [ ] Redis primary storage with in-memory fallback

**API:**
```python
from __future__ import annotations

from typing import Any


# Interface sketch: bodies are omitted; `TaskState` is assumed to be defined with the other memory types.
class WorkingMemory:
    async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None: ...
    async def get(self, key: str, default: Any = None) -> Any: ...
    async def delete(self, key: str) -> bool: ...
    async def exists(self, key: str) -> bool: ...
    async def list_keys(self, pattern: str = "*") -> list[str]: ...
    async def get_all(self) -> dict[str, Any]: ...
    async def clear(self) -> int: ...
    async def set_task_state(self, state: TaskState) -> None: ...
    async def get_task_state(self) -> TaskState | None: ...
    async def append_scratchpad(self, content: str) -> None: ...
    async def get_scratchpad(self) -> list[str]: ...
    async def create_checkpoint(self) -> str: ...  # Returns checkpoint ID
    async def restore_checkpoint(self, checkpoint_id: str) -> None: ...
```
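
To make the TTL and checkpoint semantics of the API above more concrete, here is a minimal sketch of an in-memory fallback backend. The class name `InMemoryWorkingMemory`, the lazy expire-on-read strategy, and the deep-copy checkpoint format are all assumptions; the plan does not specify how either feature is implemented, and only a subset of the methods is shown.

```python
from __future__ import annotations

import asyncio
import copy
import time
import uuid
from typing import Any


class InMemoryWorkingMemory:
    """Hypothetical in-memory backend covering a subset of the API above."""

    def __init__(self) -> None:
        # key -> (value, absolute expiry on the monotonic clock, or None)
        self._data: dict[str, tuple[Any, float | None]] = {}
        self._checkpoints: dict[str, dict[str, tuple[Any, float | None]]] = {}

    async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None:
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._data[key] = (value, expires_at)

    async def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiration: drop the entry on read
            return default
        return value

    async def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None

    async def create_checkpoint(self) -> str:
        checkpoint_id = uuid.uuid4().hex
        self._checkpoints[checkpoint_id] = copy.deepcopy(self._data)
        return checkpoint_id

    async def restore_checkpoint(self, checkpoint_id: str) -> None:
        # Raises KeyError if the checkpoint ID is unknown.
        self._data = copy.deepcopy(self._checkpoints[checkpoint_id])
```

A Redis-backed implementation would more likely delegate expiry to Redis itself; the lazy check here only keeps the fallback free of background tasks.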

---

### Phase 2: Memory Types

#### Sub-Issue #62-4: Episodic Memory Implementation
**Priority:** P1
**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/episodic/memory.py` - EpisodicMemory class
- `backend/app/services/memory/episodic/recorder.py` - Episode recording
- `backend/app/services/memory/episodic/retrieval.py` - Retrieval strategies

**Features:**
- [ ] Episode recording during agent execution
- [ ] Store task completions with context
- [ ] Store failures with error context
- [ ] Retrieval by semantic similarity (vector search)
- [ ] Retrieval by recency
- [ ] Retrieval by outcome (success/failure)
- [ ] Importance scoring based on outcome significance
- [ ] Episode summarization for long-term storage

**API:**
```python
from __future__ import annotations

from datetime import datetime
from uuid import UUID


# Interface sketch: bodies are omitted; `EpisodeCreate`, `Episode`, and `Outcome` are assumed to be defined with the other memory types.
class EpisodicMemory:
    async def record_episode(self, episode: EpisodeCreate) -> Episode: ...
    async def search_similar(self, query: str, limit: int = 10) -> list[Episode]: ...
    async def get_recent(self, limit: int = 10, since: datetime | None = None) -> list[Episode]: ...
    async def get_by_outcome(self, outcome: Outcome, limit: int = 10) -> list[Episode]: ...
    async def get_by_task_type(self, task_type: str, limit: int = 10) -> list[Episode]: ...
    async def update_importance(self, episode_id: UUID, score: float) -> None: ...
    async def summarize_episodes(self, episode_ids: list[UUID]) -> str: ...
```
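
The recency and outcome retrieval strategies can be sketched against a plain list before any database is involved. In the sketch below the `Outcome` values follow the `episodes.outcome` enum from #62-2; the trimmed-down `Episode` dataclass and the free functions are illustrative assumptions, not the planned repository code.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Outcome(str, Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    PARTIAL = "partial"


@dataclass(frozen=True)
class Episode:
    """Reduced stand-in for an `episodes` row (only the fields used here)."""

    task_type: str
    outcome: Outcome
    occurred_at: datetime


def get_recent(
    episodes: list[Episode], limit: int = 10, since: datetime | None = None
) -> list[Episode]:
    """Return the newest episodes first, optionally only those at or after `since`."""
    candidates = [e for e in episodes if since is None or e.occurred_at >= since]
    return sorted(candidates, key=lambda e: e.occurred_at, reverse=True)[:limit]


def get_by_outcome(
    episodes: list[Episode], outcome: Outcome, limit: int = 10
) -> list[Episode]:
    """Return the newest episodes that ended with the given outcome."""
    return get_recent([e for e in episodes if e.outcome is outcome], limit=limit)
```

In the real service these filters would presumably become `WHERE` and `ORDER BY occurred_at DESC` clauses backed by the composite indexes mentioned in #62-2.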

---

#### Sub-Issue #62-5: Semantic Memory Implementation
**Priority:** P1
**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/semantic/memory.py` - SemanticMemory class
- `backend/app/services/memory/semantic/extraction.py` - Fact extraction from episodes
- `backend/app/services/memory/semantic/verification.py` - Fact verification

**Features:**
- [ ] Fact storage with triple format (subject, predicate, object)
- [ ] Confidence scoring and decay
- [ ] Fact extraction from episodic memory
- [ ] Conflict resolution for contradictory facts
- [ ] Retrieval by query (semantic search)
- [ ] Retrieval by entity (subject or object)
- [ ] Source tracking (which episodes contributed)
- [ ] Reinforcement on repeated learning

**API:**
```python
from __future__ import annotations

from uuid import UUID


# Interface sketch: bodies are omitted; `FactCreate`, `Fact`, and `Episode` are assumed to be defined with the other memory types.
class SemanticMemory:
    async def store_fact(self, fact: FactCreate) -> Fact: ...
    async def search_facts(self, query: str, limit: int = 10) -> list[Fact]: ...
    async def get_by_entity(self, entity: str, limit: int = 20) -> list[Fact]: ...
    async def reinforce_fact(self, fact_id: UUID) -> Fact: ...
    async def deprecate_fact(self, fact_id: UUID, reason: str) -> None: ...
    async def extract_facts_from_episode(self, episode: Episode) -> list[Fact]: ...
    async def resolve_conflict(self, fact_ids: list[UUID]) -> Fact: ...
```
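
The plan lists "confidence scoring and decay" and "reinforcement on repeated learning" without fixing the formulas. One possible reading is sketched below: reinforcement moves confidence a fixed fraction of the way toward 1.0, and decay halves confidence after a configurable number of days without reinforcement. Both formulas, the `boost` and `half_life_days` parameters, and the reduced `Fact` dataclass are assumptions for illustration only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Fact:
    """Reduced stand-in for a `facts` row (only the fields used here)."""

    subject: str
    predicate: str
    object: str
    confidence: float  # 0-1, as in the `facts` table
    reinforcement_count: int = 0


def reinforce(fact: Fact, boost: float = 0.2) -> Fact:
    """Move confidence a fraction `boost` of the way toward 1.0."""
    fact.confidence += boost * (1.0 - fact.confidence)
    fact.reinforcement_count += 1
    return fact


def decayed_confidence(
    confidence: float, days_since_reinforced: float, half_life_days: float = 30.0
) -> float:
    """Exponential decay: confidence halves every `half_life_days`."""
    return confidence * 0.5 ** (days_since_reinforced / half_life_days)
```

Because reinforcement is proportional to the remaining gap, repeated learning approaches but never exceeds a confidence of 1.0, which keeps the value inside the 0-1 range of the column.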

---

#### Sub-Issue #62-6: Procedural Memory Implementation
**Priority:** P2
**Estimated Complexity:** Medium

**Components:**
- `backend/app/services/memory/procedural/memory.py` - ProceduralMemory class
- `backend/app/services/memory/procedural/matching.py` - Procedure matching

**Features:**
- [ ] Procedure recording from successful task patterns
- [ ] Trigger pattern matching
- [ ] Step-by-step procedure storage
- [ ] Success/failure rate tracking
- [ ] Procedure suggestion based on context
- [ ] Procedure versioning

**API:**
```python
from __future__ import annotations

from uuid import UUID


# Interface sketch: bodies are omitted; `ProcedureCreate`, `Procedure`, and `Step` are assumed to be defined with the other memory types.
class ProceduralMemory:
    async def record_procedure(self, procedure: ProcedureCreate) -> Procedure: ...
    async def find_matching(self, context: str, limit: int = 5) -> list[Procedure]: ...
    async def record_outcome(self, procedure_id: UUID, success: bool) -> None: ...
    async def get_best_procedure(self, task_type: str) -> Procedure | None: ...
    async def update_steps(self, procedure_id: UUID, steps: list[Step]) -> Procedure: ...
```
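
Success/failure tracking and "best procedure" selection can be illustrated with the two counters from the `procedures` table. The sketch below assumes the best procedure is simply the one with the highest observed success rate and that unused procedures score 0.0; the plan does not define the ranking rule, so treat both choices as placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Procedure:
    """Reduced stand-in for a `procedures` row (only the fields used here)."""

    name: str
    success_count: int = 0
    failure_count: int = 0

    @property
    def success_rate(self) -> float:
        total = self.success_count + self.failure_count
        return self.success_count / total if total else 0.0


def record_outcome(procedure: Procedure, success: bool) -> None:
    """Increment the matching counter after a procedure has been used."""
    if success:
        procedure.success_count += 1
    else:
        procedure.failure_count += 1


def best_procedure(procedures: list[Procedure]) -> Procedure | None:
    """Pick the candidate with the highest success rate, or None if there are none."""
    return max(procedures, key=lambda p: p.success_rate, default=None)
```

A raw ratio favours procedures with very few uses, so a production ranking would probably also weigh the number of observations.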

---

### Phase 3: Advanced Features

#### Sub-Issue #62-7: Memory Scoping
**Priority:** P1
**Estimated Complexity:** Medium

**Components:**
- `backend/app/services/memory/scoping/scope.py` - Scope management
- `backend/app/services/memory/scoping/resolver.py` - Scope resolution

**Features:**
- [ ] Global scope (shared across all)
- [ ] Project scope (per project)
- [ ] Agent type scope (per agent type)
- [ ] Agent instance scope (per instance)
- [ ] Session scope (ephemeral)
- [ ] Scope inheritance (child sees parent memories)
- [ ] Access control policies
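
Scope inheritance ("child sees parent memories") can be sketched as a lookup that walks from the reader's scope up to the global scope. The enum below follows the five levels listed above; the `visible_levels` and `resolve` helpers, the dict-of-dicts store layout, and the rule that the narrowest scope wins on conflicts are assumptions, since the plan does not define conflict handling.

```python
from __future__ import annotations

from enum import IntEnum
from typing import Any


class ScopeLevel(IntEnum):
    """Scope levels ordered from broadest (0) to narrowest (4)."""

    GLOBAL = 0
    PROJECT = 1
    AGENT_TYPE = 2
    AGENT_INSTANCE = 3
    SESSION = 4


def visible_levels(level: ScopeLevel) -> list[ScopeLevel]:
    """Levels a reader at `level` can see: itself and every ancestor, narrowest first."""
    return [lvl for lvl in sorted(ScopeLevel, reverse=True) if lvl <= level]


def resolve(
    key: str, level: ScopeLevel, stores: dict[ScopeLevel, dict[str, Any]]
) -> Any:
    """Return the value from the narrowest visible scope that defines `key`."""
    for candidate in visible_levels(level):
        scope_store = stores.get(candidate, {})
        if key in scope_store:
            return scope_store[key]
    return None
```

For example, a session reader resolving `"style"` would get a project-level override before falling back to the global value, while a reader at the global level never sees project memories.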

---

#### Sub-Issue #62-8: Memory Indexing & Retrieval
**Priority:** P1
**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/indexing/index.py` - Memory indexer
- `backend/app/services/memory/indexing/retrieval.py` - Retrieval engine

**Features:**
- [ ] Vector embeddings for all memory types
- [ ] Temporal index (by time)
- [ ] Entity index (by entities mentioned)
- [ ] Outcome index (by success/failure)
- [ ] Hybrid retrieval (vector + filters)
- [ ] Relevance scoring
- [ ] Retrieval caching
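
Hybrid retrieval (vector + filters) amounts to narrowing the candidate set with exact-match filters and then ranking what remains by vector similarity. The sketch below does this in pure Python with cosine similarity over toy embeddings; the item layout (plain dicts with an `embedding` key) and the `hybrid_search` function are hypothetical, and in the planned system the ranking would be done by pgvector rather than in application code.

```python
from __future__ import annotations

import math
from typing import Any


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    items: list[dict[str, Any]],
    query_embedding: list[float],
    filters: dict[str, Any],
    limit: int = 10,
) -> list[tuple[float, dict[str, Any]]]:
    """Apply exact-match filters first, then rank survivors by similarity."""
    matches = [
        item for item in items
        if all(item.get(field) == value for field, value in filters.items())
    ]
    scored = [(cosine_similarity(item["embedding"], query_embedding), item) for item in matches]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

Filtering before scoring keeps the similarity computation limited to rows that can actually be returned, which is the same reason the schema pairs vector indexes with composite indexes.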

---

#### Sub-Issue #62-9: Memory Consolidation
**Priority:** P2
**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/consolidation/service.py` - Consolidation service
- `backend/app/tasks/memory_consolidation.py` - Celery tasks

**Features:**
- [ ] Working → Episodic transfer (session end)
- [ ] Episodic → Semantic extraction (learn facts)
- [ ] Episodic → Procedural extraction (learn procedures)
- [ ] Nightly consolidation Celery tasks
- [ ] Memory pruning (remove low-value)
- [ ] Importance-based retention
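
Pruning and importance-based retention can be pictured as a single selection step over `importance_score`. The sketch below keeps episodes at or above a threshold, capped at a maximum count, and reports the rest as prunable. The threshold, the cap, and the `EpisodeSummary` and `select_for_retention` names are illustrative assumptions; the plan does not specify the retention policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeSummary:
    """Reduced stand-in for an `episodes` row (only the fields used here)."""

    id: str
    importance_score: float  # 0-1, as in the `episodes` table


def select_for_retention(
    episodes: list[EpisodeSummary],
    min_importance: float = 0.2,
    max_items: int = 1000,
) -> tuple[list[EpisodeSummary], list[EpisodeSummary]]:
    """Split episodes into (kept, pruned), keeping the most important ones."""
    ranked = sorted(episodes, key=lambda e: e.importance_score, reverse=True)
    kept = [e for e in ranked if e.importance_score >= min_importance][:max_items]
    kept_ids = {e.id for e in kept}
    pruned = [e for e in ranked if e.id not in kept_ids]
    return kept, pruned
```

Returning the pruned set rather than deleting in place leaves room for the archival policy mentioned under Risk Mitigation.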

---

### Phase 4: Integration

#### Sub-Issue #62-10: MCP Tools Definition
**Priority:** P0 - Required for agent usage
**Estimated Complexity:** Medium

**MCP Tools:**

1. **`remember`** - Store in memory
   ```json
   {
     "memory_type": "working|episodic|semantic|procedural",
     "content": "...",
     "importance": 0.8,
     "ttl_seconds": 3600
   }
   ```

2. **`recall`** - Retrieve from memory
   ```json
   {
     "query": "...",
     "memory_types": ["episodic", "semantic"],
     "limit": 10,
     "filters": {"outcome": "success"}
   }
   ```

3. **`forget`** - Remove from memory
   ```json
   {
     "memory_type": "working",
     "key": "temp_calculation"
   }
   ```

4. **`reflect`** - Analyze patterns
   ```json
   {
     "analysis_type": "recent_patterns|success_factors|failure_patterns"
   }
   ```

5. **`get_memory_stats`** - Usage statistics
6. **`search_procedures`** - Find relevant procedures
7. **`record_outcome`** - Record task success/failure

---

#### Sub-Issue #62-11: Component Integration
**Priority:** P1
**Estimated Complexity:** Medium

**Integrations:**
- [ ] Context Engine (#61) - Include relevant memories in context assembly
- [ ] Knowledge Base (#57) - Coordinate with KB to avoid duplication
- [ ] LLM Gateway (#56) - Use for embedding generation
- [ ] Agent lifecycle hooks (spawn, pause, resume, terminate)

---

#### Sub-Issue #62-12: Caching Layer
**Priority:** P2
**Estimated Complexity:** Medium

**Features:**
- [ ] Hot memory caching (frequently accessed)
- [ ] Retrieval result caching
- [ ] Embedding caching
- [ ] Cache invalidation strategies

---

### Phase 5: Intelligence & Quality

#### Sub-Issue #62-13: Memory Reflection
**Priority:** P3
**Estimated Complexity:** High

**Features:**
- [ ] Pattern detection in episodic memory
- [ ] Success/failure factor analysis
- [ ] Anomaly detection
- [ ] Insights generation

---

#### Sub-Issue #62-14: Metrics & Observability
**Priority:** P2
**Estimated Complexity:** Low

**Metrics:**
- `memory_size_bytes` by type and scope
- `memory_operations_total` counter
- `memory_retrieval_latency_seconds` histogram
- `memory_consolidation_duration_seconds` histogram
- `procedure_success_rate` gauge

---

#### Sub-Issue #62-15: Documentation & Final Testing
**Priority:** P0
**Estimated Complexity:** Medium

**Deliverables:**
- [ ] README with architecture overview
- [ ] API documentation with examples
- [ ] Integration guide
- [ ] E2E tests for full memory lifecycle
- [ ] Achieve >90% code coverage
- [ ] Performance benchmarks

---

## Implementation Order

```
Phase 1 (Foundation) - Sequential
  #62-1 → #62-2 → #62-3

Phase 2 (Memory Types) - Can parallelize after Phase 1
  #62-4, #62-5, #62-6 (parallel after #62-3)

Phase 3 (Advanced) - Sequential within phase
  #62-7 → #62-8 → #62-9

Phase 4 (Integration) - After Phase 2
  #62-10 → #62-11 → #62-12

Phase 5 (Quality) - Final
  #62-13, #62-14, #62-15
```
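
The ordering above can be expressed as a dependency graph and checked mechanically with the standard library's `graphlib`. The edges for Phases 1, 2, and 4 follow the listing directly. Two edges are assumptions, because the plan does not state them: Phase 3 is taken to start once #62-3 is done, and Phase 5 is taken to wait for both #62-9 and #62-12. The `parallel_batches` helper is hypothetical.

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# Map each sub-issue to the sub-issues it waits on.
DEPENDS_ON: dict[str, set[str]] = {
    "#62-1": set(),
    "#62-2": {"#62-1"},
    "#62-3": {"#62-2"},
    "#62-4": {"#62-3"},
    "#62-5": {"#62-3"},
    "#62-6": {"#62-3"},
    "#62-7": {"#62-3"},  # assumption: Phase 3 can start after Phase 1
    "#62-8": {"#62-7"},
    "#62-9": {"#62-8"},
    "#62-10": {"#62-4", "#62-5", "#62-6"},
    "#62-11": {"#62-10"},
    "#62-12": {"#62-11"},
}
for final_issue in ("#62-13", "#62-14", "#62-15"):
    # assumption: "Final" means after Phases 3 and 4 have both finished
    DEPENDS_ON[final_issue] = {"#62-9", "#62-12"}


def parallel_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-issues into batches whose members can be worked on in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Calling `parallel_batches(DEPENDS_ON)` yields one list per wave of work, so changing an assumed edge immediately shows how much parallelism is gained or lost.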

---

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Working memory get/set | <5ms | P95 |
| Episodic memory retrieval | <100ms | P95, as per epic |
| Semantic memory search | <100ms | P95 |
| Procedural memory matching | <50ms | P95 |
| Consolidation batch | <30s | Per 1000 episodes |

---

## Risk Mitigation

1. **Embedding costs** - Use caching aggressively, batch embeddings
2. **Storage growth** - Implement TTL, pruning, and archival policies
3. **Query performance** - HNSW indexes, pagination, query optimization
4. **Scope complexity** - Start simple (instance scope only), add hierarchy later

---

## Review Checkpoints

After each sub-issue:
1. Run `make validate-all`
2. Multi-agent code review
3. Verify E2E stack still works
4. Commit with granular message