feat(memory): #87 project setup & core architecture

Implements Sub-Issue #87 of Issue #62 (Agent Memory System).

Core infrastructure:
- memory/types.py: Type definitions for all memory types (Working, Episodic, Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome
- memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton
- memory/exceptions.py: Comprehensive exception hierarchy for memory operations
- memory/manager.py: MemoryManager facade with placeholder methods

Directory structure:
- working/: Working memory (Redis/in-memory) - to be implemented in #89
- episodic/: Episodic memory (experiences) - to be implemented in #90
- semantic/: Semantic memory (facts) - to be implemented in #91
- procedural/: Procedural memory (skills) - to be implemented in #92
- scoping/: Scope management - to be implemented in #93
- indexing/: Vector indexing - to be implemented in #94
- consolidation/: Memory consolidation - to be implemented in #95

Tests: 71 unit tests for config, types, and exceptions

Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:27:36 +01:00
parent 4b149b8a52
commit 085a748929
17 changed files with 3242 additions and 0 deletions
--- a/docs/architecture/memory-system-plan.md
+++ b/docs/architecture/memory-system-plan.md
@@ -0,0 +1,526 @@
+# Agent Memory System - Implementation Plan
+
+## Issue #62 - Part of Epic #60 (Phase 2: MCP Integration)
+
+**Branch:** `feature/62-agent-memory-system`
+**Parent Epic:** #60 [EPIC] Phase 2: MCP Integration
+**Dependencies:** #56 (LLM Gateway), #57 (Knowledge Base), #61 (Context Management Engine)
+
+---
+
+## Executive Summary
+
+The Agent Memory System provides multi-tier cognitive memory for AI agents, enabling them to:
+- Maintain state across sessions (Working Memory)
+- Learn from past experiences (Episodic Memory)
+- Store and retrieve facts (Semantic Memory)
+- Develop and reuse procedures (Procedural Memory)
+
+### Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                           Agent Memory System                                │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                              │
+│  ┌─────────────────┐                      ┌─────────────────┐               │
+│  │ Working Memory  │───────────────────▶  │ Episodic Memory │               │
+│  │ (Redis/In-Mem)  │    consolidate       │  (PostgreSQL)   │               │
+│  │                 │                      │                 │               │
+│  │ • Current task  │                      │ • Past sessions │               │
+│  │ • Variables     │                      │ • Experiences   │               │
+│  │ • Scratchpad    │                      │ • Outcomes      │               │
+│  └─────────────────┘                      └────────┬────────┘               │
+│                                                    │                         │
+│                                           extract  │                         │
+│                                                    ▼                         │
+│  ┌─────────────────┐                      ┌─────────────────┐               │
+│  │Procedural Memory│◀─────────────────────│ Semantic Memory │               │
+│  │  (PostgreSQL)   │      learn from      │  (PostgreSQL +  │               │
+│  │                 │                      │    pgvector)    │               │
+│  │ • Procedures    │                      │                 │               │
+│  │ • Skills        │                      │ • Facts         │               │
+│  │ • Patterns      │                      │ • Entities      │               │
+│  └─────────────────┘                      │ • Relationships │               │
+│                                           └─────────────────┘               │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### Memory Scoping Hierarchy
+
+```
+Global Memory (shared by all)
+└── Project Memory (per project)
+    └── Agent Type Memory (per agent type)
+        └── Agent Instance Memory (per instance)
+            └── Session Memory (ephemeral)
+```
+
+---
+
+## Sub-Issue Breakdown
+
+### Phase 1: Foundation (Critical Path)
+
+#### Sub-Issue #62-1: Project Setup & Core Architecture
+**Priority:** P0 - Must complete first
+**Estimated Complexity:** Medium
+
+**Tasks:**
+- [ ] Create `backend/app/services/memory/` directory structure
+- [ ] Create `__init__.py` with public API exports
+- [ ] Create `config.py` with `MemorySettings` (Pydantic)
+- [ ] Define base interfaces in `types.py`:
+  - `MemoryItem` - Base class for all memory items
+  - `MemoryScope` - Enum for scoping levels
+  - `MemoryStore` - Abstract base for storage backends
+- [ ] Create `manager.py` with `MemoryManager` class (facade)
+- [ ] Create `exceptions.py` with memory-specific errors
+- [ ] Write ADR-010 documenting memory architecture decisions
+- [ ] Create dependency injection setup
+- [ ] Unit tests for configuration and types
+
+**Deliverables:**
+- Directory structure matching existing patterns (like `context/`, `safety/`)
+- Configuration with MEM_ env prefix
+- Type definitions for all memory concepts
+- Comprehensive unit tests
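To illustrate the `MEM_`-prefixed configuration and the thread-safe singleton mentioned in the commit message, here is a minimal sketch. It is not the code in this commit: the plan calls for a Pydantic `MemorySettings`, while this version swaps in a plain dataclass so it runs with the standard library only. The field names, defaults, and the `get_settings` helper are hypothetical.

```python
import os
import threading
from dataclasses import dataclass

ENV_PREFIX = "MEM_"


@dataclass(frozen=True)
class MemorySettings:
    # Field names and defaults are hypothetical, chosen only for illustration.
    working_ttl_seconds: int = 3600
    redis_url: str = "redis://localhost:6379/0"

    @classmethod
    def from_env(cls, environ=None) -> "MemorySettings":
        """Build settings from MEM_-prefixed variables, falling back to defaults."""
        env = os.environ if environ is None else environ
        return cls(
            working_ttl_seconds=int(env.get(ENV_PREFIX + "WORKING_TTL_SECONDS", 3600)),
            redis_url=env.get(ENV_PREFIX + "REDIS_URL", "redis://localhost:6379/0"),
        )


_settings = None
_lock = threading.Lock()


def get_settings() -> MemorySettings:
    """Return the process-wide settings, creating them once under a lock."""
    global _settings
    if _settings is None:  # fast path: no locking once initialised
        with _lock:
            if _settings is None:  # re-check: another thread may have won the race
                _settings = MemorySettings.from_env()
    return _settings
```

Passing an explicit mapping to `from_env` keeps unit tests independent of the real process environment.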
+
+---
+
+#### Sub-Issue #62-2: Database Schema & Storage Layer
+**Priority:** P0 - Required for all memory types
+**Estimated Complexity:** High
+
+**Database Tables:**
+
+1. **`working_memory`** - Ephemeral key-value storage
+   - `id` (UUID, PK)
+   - `scope_type` (ENUM: global/project/agent_type/agent_instance/session)
+   - `scope_id` (VARCHAR - the ID for the scope level)
+   - `key` (VARCHAR)
+   - `value` (JSONB)
+   - `expires_at` (TIMESTAMP WITH TZ)
+   - `created_at`, `updated_at`
+
+2. **`episodes`** - Experiential memories
+   - `id` (UUID, PK)
+   - `project_id` (UUID, FK)
+   - `agent_instance_id` (UUID, FK, nullable)
+   - `agent_type_id` (UUID, FK, nullable)
+   - `session_id` (VARCHAR)
+   - `task_type` (VARCHAR)
+   - `task_description` (TEXT)
+   - `actions` (JSONB)
+   - `context_summary` (TEXT)
+   - `outcome` (ENUM: success/failure/partial)
+   - `outcome_details` (TEXT)
+   - `duration_seconds` (FLOAT)
+   - `tokens_used` (BIGINT)
+   - `lessons_learned` (JSONB - list of strings)
+   - `importance_score` (FLOAT, 0-1)
+   - `embedding` (VECTOR(1536))
+   - `occurred_at` (TIMESTAMP WITH TZ)
+   - `created_at`, `updated_at`
+
+3. **`facts`** - Semantic knowledge
+   - `id` (UUID, PK)
+   - `project_id` (UUID, FK, nullable - null for global)
+   - `subject` (VARCHAR)
+   - `predicate` (VARCHAR)
+   - `object` (TEXT)
+   - `confidence` (FLOAT, 0-1)
+   - `source_episode_ids` (UUID[])
+   - `first_learned` (TIMESTAMP WITH TZ)
+   - `last_reinforced` (TIMESTAMP WITH TZ)
+   - `reinforcement_count` (INT)
+   - `embedding` (VECTOR(1536))
+   - `created_at`, `updated_at`
+
+4. **`procedures`** - Learned skills
+   - `id` (UUID, PK)
+   - `project_id` (UUID, FK, nullable)
+   - `agent_type_id` (UUID, FK, nullable)
+   - `name` (VARCHAR)
+   - `trigger_pattern` (TEXT)
+   - `steps` (JSONB)
+   - `success_count` (INT)
+   - `failure_count` (INT)
+   - `last_used` (TIMESTAMP WITH TZ)
+   - `embedding` (VECTOR(1536))
+   - `created_at`, `updated_at`
+
+5. **`memory_consolidation_log`** - Consolidation tracking
+   - `id` (UUID, PK)
+   - `consolidation_type` (ENUM)
+   - `source_count` (INT)
+   - `result_count` (INT)
+   - `started_at`, `completed_at`
+   - `status` (ENUM: pending/running/completed/failed)
+   - `error` (TEXT, nullable)
+
+**Tasks:**
+- [ ] Create SQLAlchemy models in `backend/app/models/memory/`
+- [ ] Create Alembic migration with all tables
+- [ ] Add pgvector indexes (HNSW for episodes, facts, procedures)
+- [ ] Create repository classes in `backend/app/crud/memory/`
+- [ ] Add composite indexes for common query patterns
+- [ ] Unit tests for all repositories
+
+---
+
+#### Sub-Issue #62-3: Working Memory Implementation
+**Priority:** P0 - Core functionality
+**Estimated Complexity:** Medium
+
+**Components:**
+- `backend/app/services/memory/working/memory.py` - WorkingMemory class
+- `backend/app/services/memory/working/storage.py` - Redis + in-memory backend
+
+**Features:**
+- [ ] Session-scoped containers with automatic cleanup
+- [ ] Variable storage (get/set/delete)
+- [ ] Task state tracking (current step, status, progress)
+- [ ] Scratchpad for reasoning steps
+- [ ] Configurable capacity limits
+- [ ] TTL-based expiration
+- [ ] Checkpoint/snapshot support for recovery
+- [ ] Redis primary storage with in-memory fallback
+
+**API:**
+```python
+from __future__ import annotations  # TaskState is defined elsewhere; keep annotations unevaluated
+
+from typing import Any
+
+
+class WorkingMemory:
+    """Planned interface; method bodies are stubs."""
+
+    async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None: ...
+    async def get(self, key: str, default: Any = None) -> Any: ...
+    async def delete(self, key: str) -> bool: ...
+    async def exists(self, key: str) -> bool: ...
+    async def list_keys(self, pattern: str = "*") -> list[str]: ...
+    async def get_all(self) -> dict[str, Any]: ...
+    async def clear(self) -> int: ...
+    async def set_task_state(self, state: TaskState) -> None: ...
+    async def get_task_state(self) -> TaskState | None: ...
+    async def append_scratchpad(self, content: str) -> None: ...
+    async def get_scratchpad(self) -> list[str]: ...
+    async def create_checkpoint(self) -> str: ...  # Returns checkpoint ID
+    async def restore_checkpoint(self, checkpoint_id: str) -> None: ...
+```
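The in-memory fallback with TTL-based expiration listed above could look roughly like the sketch below. It is an illustration under assumptions, not the backend planned for this sub-issue: the class name `InMemoryWorkingStore` and the injectable `clock` parameter are hypothetical, and only `set`, `get`, and `delete` from the API are covered.

```python
import asyncio
import time
from typing import Any, Optional


class InMemoryWorkingStore:
    """Hypothetical fallback backend: a dict with lazy TTL expiry."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._data: dict[str, tuple[Any, Optional[float]]] = {}

    async def set(self, key: str, value: Any, ttl_seconds: Optional[int] = None) -> None:
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    async def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired: evict lazily on read
            return default
        return value

    async def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None


async def demo() -> Any:
    store = InMemoryWorkingStore()
    await store.set("current_step", 3, ttl_seconds=60)
    return await store.get("current_step")
```

Expired keys are only removed when read, so a long-lived process would also want a periodic sweep; Redis handles expiry on its own.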
+
+---
+
+### Phase 2: Memory Types
+
+#### Sub-Issue #62-4: Episodic Memory Implementation
+**Priority:** P1
+**Estimated Complexity:** High
+
+**Components:**
+- `backend/app/services/memory/episodic/memory.py` - EpisodicMemory class
+- `backend/app/services/memory/episodic/recorder.py` - Episode recording
+- `backend/app/services/memory/episodic/retrieval.py` - Retrieval strategies
+
+**Features:**
+- [ ] Episode recording during agent execution
+- [ ] Store task completions with context
+- [ ] Store failures with error context
+- [ ] Retrieval by semantic similarity (vector search)
+- [ ] Retrieval by recency
+- [ ] Retrieval by outcome (success/failure)
+- [ ] Importance scoring based on outcome significance
+- [ ] Episode summarization for long-term storage
+
+**API:**
+```python
+from __future__ import annotations  # EpisodeCreate, Episode, Outcome are defined elsewhere
+
+from datetime import datetime
+from uuid import UUID
+
+
+class EpisodicMemory:
+    async def record_episode(self, episode: EpisodeCreate) -> Episode: ...
+    async def search_similar(self, query: str, limit: int = 10) -> list[Episode]: ...
+    async def get_recent(self, limit: int = 10, since: datetime | None = None) -> list[Episode]: ...
+    async def get_by_outcome(self, outcome: Outcome, limit: int = 10) -> list[Episode]: ...
+    async def get_by_task_type(self, task_type: str, limit: int = 10) -> list[Episode]: ...
+    async def update_importance(self, episode_id: UUID, score: float) -> None: ...
+    async def summarize_episodes(self, episode_ids: list[UUID]) -> str: ...
+```
+
+---
+
+#### Sub-Issue #62-5: Semantic Memory Implementation
+**Priority:** P1
+**Estimated Complexity:** High
+
+**Components:**
+- `backend/app/services/memory/semantic/memory.py` - SemanticMemory class
+- `backend/app/services/memory/semantic/extraction.py` - Fact extraction from episodes
+- `backend/app/services/memory/semantic/verification.py` - Fact verification
+
+**Features:**
+- [ ] Fact storage with triple format (subject, predicate, object)
+- [ ] Confidence scoring and decay
+- [ ] Fact extraction from episodic memory
+- [ ] Conflict resolution for contradictory facts
+- [ ] Retrieval by query (semantic search)
+- [ ] Retrieval by entity (subject or object)
+- [ ] Source tracking (which episodes contributed)
+- [ ] Reinforcement on repeated learning
+
+**API:**
+```python
+from __future__ import annotations  # FactCreate, Fact, Episode are defined elsewhere
+
+from uuid import UUID
+
+
+class SemanticMemory:
+    async def store_fact(self, fact: FactCreate) -> Fact: ...
+    async def search_facts(self, query: str, limit: int = 10) -> list[Fact]: ...
+    async def get_by_entity(self, entity: str, limit: int = 20) -> list[Fact]: ...
+    async def reinforce_fact(self, fact_id: UUID) -> Fact: ...
+    async def deprecate_fact(self, fact_id: UUID, reason: str) -> None: ...
+    async def extract_facts_from_episode(self, episode: Episode) -> list[Fact]: ...
+    async def resolve_conflict(self, fact_ids: list[UUID]) -> Fact: ...
+```
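The plan lists "confidence scoring and decay" and "reinforcement on repeated learning" without fixing a formula. One possible reading, sketched here purely as an assumption, is exponential decay with a half-life plus a reinforcement step that closes part of the gap to full confidence. The `half_life_days` and `gain` parameters and both function names are hypothetical; the `Fact` fields mirror a subset of the `facts` table.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    object: str
    confidence: float
    last_reinforced: datetime
    reinforcement_count: int = 1


def decayed_confidence(fact: Fact, now: datetime, half_life_days: float = 30.0) -> float:
    """Exponential decay: confidence halves every `half_life_days` without reinforcement."""
    age_days = (now - fact.last_reinforced).total_seconds() / 86400
    return fact.confidence * 0.5 ** (max(age_days, 0.0) / half_life_days)


def reinforce(fact: Fact, now: datetime, gain: float = 0.5) -> Fact:
    """Close a fixed fraction (`gain`) of the remaining gap to full confidence."""
    current = decayed_confidence(fact, now)
    return replace(
        fact,
        confidence=current + gain * (1.0 - current),
        last_reinforced=now,
        reinforcement_count=fact.reinforcement_count + 1,
    )


learned = datetime(2026, 1, 1, tzinfo=timezone.utc)
fact = Fact("api", "uses", "PostgreSQL", confidence=0.8, last_reinforced=learned)
stale = decayed_confidence(fact, learned + timedelta(days=30))  # one half-life later
```

Computing decay at read time avoids a background job that rewrites every row; only reinforcement needs to persist a new confidence value.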
+
+---
+
+#### Sub-Issue #62-6: Procedural Memory Implementation
+**Priority:** P2
+**Estimated Complexity:** Medium
+
+**Components:**
+- `backend/app/services/memory/procedural/memory.py` - ProceduralMemory class
+- `backend/app/services/memory/procedural/matching.py` - Procedure matching
+
+**Features:**
+- [ ] Procedure recording from successful task patterns
+- [ ] Trigger pattern matching
+- [ ] Step-by-step procedure storage
+- [ ] Success/failure rate tracking
+- [ ] Procedure suggestion based on context
+- [ ] Procedure versioning
+
+**API:**
+```python
+from __future__ import annotations  # ProcedureCreate, Procedure, Step are defined elsewhere
+
+from uuid import UUID
+
+
+class ProceduralMemory:
+    async def record_procedure(self, procedure: ProcedureCreate) -> Procedure: ...
+    async def find_matching(self, context: str, limit: int = 5) -> list[Procedure]: ...
+    async def record_outcome(self, procedure_id: UUID, success: bool) -> None: ...
+    async def get_best_procedure(self, task_type: str) -> Procedure | None: ...
+    async def update_steps(self, procedure_id: UUID, steps: list[Step]) -> Procedure: ...
+```
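Success/failure rate tracking and `get_best_procedure` can be illustrated with the `success_count` and `failure_count` columns from the `procedures` table. The sketch below is an assumption about how ranking might work, not the planned implementation: it uses Laplace smoothing so that a procedure with one lucky run does not outrank a well-tested one, and the free functions stand in for the async methods.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Procedure:
    name: str
    success_count: int = 0
    failure_count: int = 0

    @property
    def success_rate(self) -> float:
        # Laplace smoothing (an assumption): an unused procedure scores 0.5.
        return (self.success_count + 1) / (self.success_count + self.failure_count + 2)


def record_outcome(procedure: Procedure, success: bool) -> None:
    """Increment the matching counter after a procedure has been applied."""
    if success:
        procedure.success_count += 1
    else:
        procedure.failure_count += 1


def best_procedure(candidates: list[Procedure]) -> Optional[Procedure]:
    """Pick the candidate with the highest smoothed success rate, or None if empty."""
    return max(candidates, key=lambda p: p.success_rate, default=None)
```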
+
+---
+
+### Phase 3: Advanced Features
+
+#### Sub-Issue #62-7: Memory Scoping
+**Priority:** P1
+**Estimated Complexity:** Medium
+
+**Components:**
+- `backend/app/services/memory/scoping/scope.py` - Scope management
+- `backend/app/services/memory/scoping/resolver.py` - Scope resolution
+
+**Features:**
+- [ ] Global scope (shared across all)
+- [ ] Project scope (per project)
+- [ ] Agent type scope (per agent type)
+- [ ] Agent instance scope (per instance)
+- [ ] Session scope (ephemeral)
+- [ ] Scope inheritance (child sees parent memories)
+- [ ] Access control policies
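Scope inheritance ("child sees parent memories") amounts to walking the hierarchy from the current level up to global. The sketch below assumes a `ScopeLevel` enum whose members match the `scope_type` values in the schema; the member names and the `visible_scopes` helper are hypothetical and may differ from `memory/types.py` in this commit.

```python
from enum import Enum


class ScopeLevel(str, Enum):
    GLOBAL = "global"
    PROJECT = "project"
    AGENT_TYPE = "agent_type"
    AGENT_INSTANCE = "agent_instance"
    SESSION = "session"


# Ordered from broadest to narrowest, matching the hierarchy in this plan.
HIERARCHY = list(ScopeLevel)


def visible_scopes(level: ScopeLevel) -> list[ScopeLevel]:
    """Scopes readable from `level`: itself first, then each ancestor up to global."""
    index = HIERARCHY.index(level)
    return HIERARCHY[index::-1]
```

Returning the narrowest scope first lets a resolver stop at the first hit, so a session-level value shadows a project-level one with the same key.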
+
+---
+
+#### Sub-Issue #62-8: Memory Indexing & Retrieval
+**Priority:** P1
+**Estimated Complexity:** High
+
+**Components:**
+- `backend/app/services/memory/indexing/index.py` - Memory indexer
+- `backend/app/services/memory/indexing/retrieval.py` - Retrieval engine
+
+**Features:**
+- [ ] Vector embeddings for all memory types
+- [ ] Temporal index (by time)
+- [ ] Entity index (by entities mentioned)
+- [ ] Outcome index (by success/failure)
+- [ ] Hybrid retrieval (vector + filters)
+- [ ] Relevance scoring
+- [ ] Retrieval caching
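Hybrid retrieval (vector + filters) can be pictured as two stages: exact-match filtering, then similarity ranking. The following is an in-memory sketch only; the plan targets pgvector with HNSW indexes, which this does not use. `MemoryRecord` and `hybrid_search` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    outcome: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(records, query_embedding, filters, limit=10):
    """Apply exact-match filters first, then rank the survivors by cosine similarity."""
    matching = [
        r for r in records
        if all(getattr(r, field) == value for field, value in filters.items())
    ]
    matching.sort(key=lambda r: cosine_similarity(r.embedding, query_embedding), reverse=True)
    return matching[:limit]
```

For example, `hybrid_search(records, query_vec, {"outcome": "success"}, limit=5)` mirrors the `filters` field of the `recall` tool described later in the plan.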
+
+---
+
+#### Sub-Issue #62-9: Memory Consolidation
+**Priority:** P2
+**Estimated Complexity:** High
+
+**Components:**
+- `backend/app/services/memory/consolidation/service.py` - Consolidation service
+- `backend/app/tasks/memory_consolidation.py` - Celery tasks
+
+**Features:**
+- [ ] Working → Episodic transfer (session end)
+- [ ] Episodic → Semantic extraction (learn facts)
+- [ ] Episodic → Procedural extraction (learn procedures)
+- [ ] Nightly consolidation Celery tasks
+- [ ] Memory pruning (remove low-value)
+- [ ] Importance-based retention
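Memory pruning with importance-based retention could be expressed as a simple selection rule. This sketch assumes one possible policy (prune only episodes that are both old and unimportant); the thresholds, the `select_for_pruning` name, and the trimmed-down `Episode` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Episode:
    id: str
    importance_score: float
    occurred_at: datetime


def select_for_pruning(
    episodes: list[Episode],
    now: datetime,
    min_importance: float = 0.3,
    keep_recent: timedelta = timedelta(days=7),
) -> list[Episode]:
    """Return episodes that are both older than `keep_recent` and below `min_importance`."""
    cutoff = now - keep_recent
    return [
        e for e in episodes
        if e.occurred_at < cutoff and e.importance_score < min_importance
    ]
```

Requiring both conditions means recent episodes survive long enough to be consolidated, and important ones are kept regardless of age.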
+
+---
+
+### Phase 4: Integration
+
+#### Sub-Issue #62-10: MCP Tools Definition
+**Priority:** P0 - Required for agent usage
+**Estimated Complexity:** Medium
+
+**MCP Tools:**
+
+1. **`remember`** - Store in memory
+   ```json
+   {
+     "memory_type": "working|episodic|semantic|procedural",
+     "content": "...",
+     "importance": 0.8,
+     "ttl_seconds": 3600
+   }
+   ```
+
+2. **`recall`** - Retrieve from memory
+   ```json
+   {
+     "query": "...",
+     "memory_types": ["episodic", "semantic"],
+     "limit": 10,
+     "filters": {"outcome": "success"}
+   }
+   ```
+
+3. **`forget`** - Remove from memory
+   ```json
+   {
+     "memory_type": "working",
+     "key": "temp_calculation"
+   }
+   ```
+
+4. **`reflect`** - Analyze patterns
+   ```json
+   {
+     "analysis_type": "recent_patterns|success_factors|failure_patterns"
+   }
+   ```
+
+5. **`get_memory_stats`** - Usage statistics
+6. **`search_procedures`** - Find relevant procedures
+7. **`record_outcome`** - Record task success/failure
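A tool handler would need to validate the `remember` payload before dispatching to a memory type. The sketch below checks the four fields shown in the JSON example above. Which fields are required, the default `importance` of 0.5, and the `validate_remember` name are assumptions, not part of the plan.

```python
MEMORY_TYPES = {"working", "episodic", "semantic", "procedural"}


def validate_remember(payload: dict) -> dict:
    """Check a `remember` request and return a normalized copy; raise ValueError if invalid."""
    memory_type = payload.get("memory_type")
    if memory_type not in MEMORY_TYPES:
        raise ValueError(f"unknown memory_type: {memory_type!r}")

    content = payload.get("content")
    if not isinstance(content, str) or not content:
        raise ValueError("content must be a non-empty string")

    importance = float(payload.get("importance", 0.5))  # default is an assumption
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be between 0 and 1")

    ttl = payload.get("ttl_seconds")
    if ttl is not None and (not isinstance(ttl, int) or ttl <= 0):
        raise ValueError("ttl_seconds must be a positive integer")

    return {
        "memory_type": memory_type,
        "content": content,
        "importance": importance,
        "ttl_seconds": ttl,
    }
```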
+
+---
+
+#### Sub-Issue #62-11: Component Integration
+**Priority:** P1
+**Estimated Complexity:** Medium
+
+**Integrations:**
+- [ ] Context Engine (#61) - Include relevant memories in context assembly
+- [ ] Knowledge Base (#57) - Coordinate with KB to avoid duplication
+- [ ] LLM Gateway (#56) - Use for embedding generation
+- [ ] Agent lifecycle hooks (spawn, pause, resume, terminate)
+
+---
+
+#### Sub-Issue #62-12: Caching Layer
+**Priority:** P2
+**Estimated Complexity:** Medium
+
+**Features:**
+- [ ] Hot memory caching (frequently accessed)
+- [ ] Retrieval result caching
+- [ ] Embedding caching
+- [ ] Cache invalidation strategies
+
+---
+
+### Phase 5: Intelligence & Quality
+
+#### Sub-Issue #62-13: Memory Reflection
+**Priority:** P3
+**Estimated Complexity:** High
+
+**Features:**
+- [ ] Pattern detection in episodic memory
+- [ ] Success/failure factor analysis
+- [ ] Anomaly detection
+- [ ] Insights generation
+
+---
+
+#### Sub-Issue #62-14: Metrics & Observability
+**Priority:** P2
+**Estimated Complexity:** Low
+
+**Metrics:**
+- `memory_size_bytes` by type and scope
+- `memory_operations_total` counter
+- `memory_retrieval_latency_seconds` histogram
+- `memory_consolidation_duration_seconds` histogram
+- `procedure_success_rate` gauge
+
+---
+
+#### Sub-Issue #62-15: Documentation & Final Testing
+**Priority:** P0
+**Estimated Complexity:** Medium
+
+**Deliverables:**
+- [ ] README with architecture overview
+- [ ] API documentation with examples
+- [ ] Integration guide
+- [ ] E2E tests for full memory lifecycle
+- [ ] Achieve >90% code coverage
+- [ ] Performance benchmarks
+
+---
+
+## Implementation Order
+
+```
+Phase 1 (Foundation) - Sequential
+  #62-1 → #62-2 → #62-3
+
+Phase 2 (Memory Types) - Can parallelize after Phase 1
+  #62-4, #62-5, #62-6 (parallel after #62-3)
+
+Phase 3 (Advanced) - Sequential within phase
+  #62-7 → #62-8 → #62-9
+
+Phase 4 (Integration) - After Phase 2
+  #62-10 → #62-11 → #62-12
+
+Phase 5 (Quality) - Final
+  #62-13, #62-14, #62-15
+```
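The ordering above is a dependency graph, so the sub-issues that can run in parallel fall out of a topological sort. The sketch uses the standard-library `graphlib.TopologicalSorter`. Two edges are assumptions rather than statements from the plan: that Phase 3 starts after #62-3, and that Phase 5 waits for both #62-9 and #62-12.

```python
from graphlib import TopologicalSorter

# Each key depends on every sub-issue in its value set.
DEPENDENCIES = {
    "62-2": {"62-1"},
    "62-3": {"62-2"},
    "62-4": {"62-3"},
    "62-5": {"62-3"},
    "62-6": {"62-3"},
    "62-7": {"62-3"},  # assumption: Phase 3 starts once Phase 1 is done
    "62-8": {"62-7"},
    "62-9": {"62-8"},
    "62-10": {"62-4", "62-5", "62-6"},
    "62-11": {"62-10"},
    "62-12": {"62-11"},
    "62-13": {"62-9", "62-12"},  # assumption: Phase 5 waits for Phases 3 and 4
    "62-14": {"62-9", "62-12"},
    "62-15": {"62-9", "62-12"},
}


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-issues into waves; everything within a wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Calling `execution_waves(DEPENDENCIES)` yields one list per step of the schedule, which makes it easy to see how much of the work is actually parallelizable.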
+
+---
+
+## Performance Targets
+
+| Metric | Target | Notes |
+|--------|--------|-------|
+| Working memory get/set | <5ms | P95 |
+| Episodic memory retrieval | <100ms | P95, as per epic |
+| Semantic memory search | <100ms | P95 |
+| Procedural memory matching | <50ms | P95 |
+| Consolidation batch | <30s | Per 1000 episodes |
+
+---
+
+## Risk Mitigation
+
+1. **Embedding costs** - Use caching aggressively, batch embeddings
+2. **Storage growth** - Implement TTL, pruning, and archival policies
+3. **Query performance** - HNSW indexes, pagination, query optimization
+4. **Scope complexity** - Start simple (instance scope only), add hierarchy later
+
+---
+
+## Review Checkpoints
+
+After each sub-issue:
+1. Run `make validate-all`
+2. Multi-agent code review
+3. Verify E2E stack still works
+4. Commit with granular message