feat(memory): #87 project setup & core architecture

Implements Sub-Issue #87 of Issue #62 (Agent Memory System).

Core infrastructure:
- memory/types.py: Type definitions for all memory types (Working, Episodic,
  Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome
- memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton
- memory/exceptions.py: Comprehensive exception hierarchy for memory operations
- memory/manager.py: MemoryManager facade with placeholder methods

Directory structure:
- working/: Working memory (Redis/in-memory) - to be implemented in #89
- episodic/: Episodic memory (experiences) - to be implemented in #90
- semantic/: Semantic memory (facts) - to be implemented in #91
- procedural/: Procedural memory (skills) - to be implemented in #92
- scoping/: Scope management - to be implemented in #93
- indexing/: Vector indexing - to be implemented in #94
- consolidation/: Memory consolidation - to be implemented in #95

Tests: 71 unit tests for config, types, and exceptions
Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:27:36 +01:00
parent 4b149b8a52
commit 085a748929
17 changed files with 3242 additions and 0 deletions


@@ -0,0 +1,526 @@
# Agent Memory System - Implementation Plan
## Issue #62 - Part of Epic #60 (Phase 2: MCP Integration)
**Branch:** `feature/62-agent-memory-system`

**Parent Epic:** #60 [EPIC] Phase 2: MCP Integration

**Dependencies:** #56 (LLM Gateway), #57 (Knowledge Base), #61 (Context Management Engine)

---
## Executive Summary
The Agent Memory System provides multi-tier cognitive memory for AI agents, enabling them to:
- Maintain state across sessions (Working Memory)
- Learn from past experiences (Episodic Memory)
- Store and retrieve facts (Semantic Memory)
- Develop and reuse procedures (Procedural Memory)
### Architecture Overview
```
┌───────────────────────────────────────────────────────────────┐
│                      Agent Memory System                      │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────────────┐                     ┌─────────────────┐  │
│  │ Working Memory  │────────────────────▶│ Episodic Memory │  │
│  │ (Redis/In-Mem)  │     consolidate     │ (PostgreSQL)    │  │
│  │                 │                     │                 │  │
│  │ • Current task  │                     │ • Past sessions │  │
│  │ • Variables     │                     │ • Experiences   │  │
│  │ • Scratchpad    │                     │ • Outcomes      │  │
│  └─────────────────┘                     └────────┬────────┘  │
│                                                   │           │
│                                           extract │           │
│                                                   ▼           │
│  ┌─────────────────┐                     ┌─────────────────┐  │
│  │Procedural Memory│◀────────────────────│ Semantic Memory │  │
│  │ (PostgreSQL)    │     learn from      │ (PostgreSQL +   │  │
│  │                 │                     │  pgvector)      │  │
│  │ • Procedures    │                     │                 │  │
│  │ • Skills        │                     │ • Facts         │  │
│  │ • Patterns      │                     │ • Entities      │  │
│  └─────────────────┘                     │ • Relationships │  │
│                                          └─────────────────┘  │
└───────────────────────────────────────────────────────────────┘
```
### Memory Scoping Hierarchy
```
Global Memory (shared by all)
└── Project Memory (per project)
    └── Agent Type Memory (per agent type)
        └── Agent Instance Memory (per instance)
            └── Session Memory (ephemeral)
```
---
## Sub-Issue Breakdown
### Phase 1: Foundation (Critical Path)
#### Sub-Issue #62-1: Project Setup & Core Architecture
**Priority:** P0 - Must complete first

**Estimated Complexity:** Medium

**Tasks:**
- [ ] Create `backend/app/services/memory/` directory structure
- [ ] Create `__init__.py` with public API exports
- [ ] Create `config.py` with `MemorySettings` (Pydantic)
- [ ] Define base interfaces in `types.py`:
  - `MemoryItem` - Base class for all memory items
  - `MemoryScope` - Enum for scoping levels
  - `MemoryStore` - Abstract base for storage backends
- [ ] Create `manager.py` with `MemoryManager` class (facade)
- [ ] Create `exceptions.py` with memory-specific errors
- [ ] Write ADR-010 documenting memory architecture decisions
- [ ] Create dependency injection setup
- [ ] Unit tests for configuration and types
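
A minimal sketch of the three base interfaces in `types.py`. It assumes `MemoryItem` is a dataclass and `MemoryStore` is an async abstract base class. The field names, the `put`/`get`/`delete` methods, and the `DictStore` test backend are hypothetical and illustrative; they are not the committed definitions.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
from uuid import UUID, uuid4


class MemoryScope(str, Enum := __import__("enum").Enum):
    """Scoping levels, broadest to narrowest (values match the scope_type ENUM)."""
    GLOBAL = "global"
    PROJECT = "project"
    AGENT_TYPE = "agent_type"
    AGENT_INSTANCE = "agent_instance"
    SESSION = "session"


@dataclass
class MemoryItem:
    """Hypothetical base record shared by all memory types."""
    scope: MemoryScope
    scope_id: str
    content: Any
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class MemoryStore(ABC):
    """Abstract backend; Redis/PostgreSQL implementations would subclass this."""

    @abstractmethod
    async def put(self, item: MemoryItem) -> None: ...

    @abstractmethod
    async def get(self, item_id: UUID) -> MemoryItem | None: ...

    @abstractmethod
    async def delete(self, item_id: UUID) -> bool: ...


class DictStore(MemoryStore):
    """Hypothetical in-process backend, used only to exercise the interface."""

    def __init__(self) -> None:
        self._items: dict[UUID, MemoryItem] = {}

    async def put(self, item: MemoryItem) -> None:
        self._items[item.id] = item

    async def get(self, item_id: UUID) -> MemoryItem | None:
        return self._items.get(item_id)

    async def delete(self, item_id: UUID) -> bool:
        return self._items.pop(item_id, None) is not None


async def demo() -> bool:
    store = DictStore()
    item = MemoryItem(scope=MemoryScope.SESSION, scope_id="sess-1", content={"k": "v"})
    await store.put(item)
    found = await store.get(item.id)
    deleted = await store.delete(item.id)
    return found is item and deleted


print(asyncio.run(demo()))  # True
```

Keeping `MemoryStore` abstract lets the Redis backend and the in-memory fallback share one contract. The manager can then stay agnostic about which backend is active.
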

**Deliverables:**
- Directory structure matching existing patterns (like `context/`, `safety/`)
- Configuration with MEM_ env prefix
- Type definitions for all memory concepts
- Comprehensive unit tests
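
The plan calls for a Pydantic `MemorySettings`, and the commit describes it as a thread-safe singleton with the `MEM_` prefix. The sketch below reproduces those two behaviours using only the standard library, so it runs without dependencies. The field names and defaults are hypothetical. With `pydantic-settings`, the prefix would instead be declared through `SettingsConfigDict(env_prefix="MEM_")`.

```python
from __future__ import annotations

import os
import threading
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MemorySettings:
    # Hypothetical fields and defaults, for illustration only.
    working_memory_ttl_seconds: int = 3600
    embedding_dimensions: int = 1536
    redis_url: str = "redis://localhost:6379/0"

    @classmethod
    def from_env(cls, prefix: str = "MEM_") -> MemorySettings:
        """Build settings from MEM_<FIELD_NAME> variables, falling back to defaults."""
        overrides = {}
        for f in fields(cls):
            raw = os.environ.get(prefix + f.name.upper())
            if raw is not None:
                # Cast the string to the type of the field's default (int or str here).
                overrides[f.name] = type(f.default)(raw)
        return cls(**overrides)


_settings: MemorySettings | None = None
_lock = threading.Lock()


def get_memory_settings() -> MemorySettings:
    """Return the process-wide instance, creating it once under a lock."""
    global _settings
    if _settings is None:  # fast path: no locking once initialised
        with _lock:
            if _settings is None:  # re-check: another thread may have won the race
                _settings = MemorySettings.from_env()
    return _settings


def reset_memory_settings() -> None:
    """Discard the cached instance, e.g. in tests that change the environment."""
    global _settings
    with _lock:
        _settings = None


os.environ["MEM_WORKING_MEMORY_TTL_SECONDS"] = "7200"
reset_memory_settings()
settings = get_memory_settings()
print(settings.working_memory_ttl_seconds)  # 7200
print(settings is get_memory_settings())  # True
```

The reset hook matters for unit tests. Without it, the first test to touch the singleton would freeze the environment for every later test.
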
---
#### Sub-Issue #62-2: Database Schema & Storage Layer
**Priority:** P0 - Required for all memory types

**Estimated Complexity:** High

**Database Tables:**
1. **`working_memory`** - Ephemeral key-value storage
   - `id` (UUID, PK)
   - `scope_type` (ENUM: global/project/agent_type/agent_instance/session)
   - `scope_id` (VARCHAR - the ID for the scope level)
   - `key` (VARCHAR)
   - `value` (JSONB)
   - `expires_at` (TIMESTAMP WITH TZ)
   - `created_at`, `updated_at`
2. **`episodes`** - Experiential memories
   - `id` (UUID, PK)
   - `project_id` (UUID, FK)
   - `agent_instance_id` (UUID, FK, nullable)
   - `agent_type_id` (UUID, FK, nullable)
   - `session_id` (VARCHAR)
   - `task_type` (VARCHAR)
   - `task_description` (TEXT)
   - `actions` (JSONB)
   - `context_summary` (TEXT)
   - `outcome` (ENUM: success/failure/partial)
   - `outcome_details` (TEXT)
   - `duration_seconds` (FLOAT)
   - `tokens_used` (BIGINT)
   - `lessons_learned` (JSONB - list of strings)
   - `importance_score` (FLOAT, 0-1)
   - `embedding` (VECTOR(1536))
   - `occurred_at` (TIMESTAMP WITH TZ)
   - `created_at`, `updated_at`
3. **`facts`** - Semantic knowledge
   - `id` (UUID, PK)
   - `project_id` (UUID, FK, nullable - null for global)
   - `subject` (VARCHAR)
   - `predicate` (VARCHAR)
   - `object` (TEXT)
   - `confidence` (FLOAT, 0-1)
   - `source_episode_ids` (UUID[])
   - `first_learned` (TIMESTAMP WITH TZ)
   - `last_reinforced` (TIMESTAMP WITH TZ)
   - `reinforcement_count` (INT)
   - `embedding` (VECTOR(1536))
   - `created_at`, `updated_at`
4. **`procedures`** - Learned skills
   - `id` (UUID, PK)
   - `project_id` (UUID, FK, nullable)
   - `agent_type_id` (UUID, FK, nullable)
   - `name` (VARCHAR)
   - `trigger_pattern` (TEXT)
   - `steps` (JSONB)
   - `success_count` (INT)
   - `failure_count` (INT)
   - `last_used` (TIMESTAMP WITH TZ)
   - `embedding` (VECTOR(1536))
   - `created_at`, `updated_at`
5. **`memory_consolidation_log`** - Consolidation tracking
   - `id` (UUID, PK)
   - `consolidation_type` (ENUM)
   - `source_count` (INT)
   - `result_count` (INT)
   - `started_at`, `completed_at`
   - `status` (ENUM: pending/running/completed/failed)
   - `error` (TEXT, nullable)

**Tasks:**
- [ ] Create SQLAlchemy models in `backend/app/models/memory/`
- [ ] Create Alembic migration with all tables
- [ ] Add pgvector indexes (HNSW for episodes, facts, procedures)
- [ ] Create repository classes in `backend/app/crud/memory/`
- [ ] Add composite indexes for common query patterns
- [ ] Unit tests for all repositories
---
#### Sub-Issue #62-3: Working Memory Implementation
**Priority:** P0 - Core functionality

**Estimated Complexity:** Medium

**Components:**
- `backend/app/services/memory/working/memory.py` - WorkingMemory class
- `backend/app/services/memory/working/storage.py` - Redis + in-memory backend

**Features:**
- [ ] Session-scoped containers with automatic cleanup
- [ ] Variable storage (get/set/delete)
- [ ] Task state tracking (current step, status, progress)
- [ ] Scratchpad for reasoning steps
- [ ] Configurable capacity limits
- [ ] TTL-based expiration
- [ ] Checkpoint/snapshot support for recovery
- [ ] Redis primary storage with in-memory fallback

**API:**
```python
from __future__ import annotations

from typing import Any


class WorkingMemory:
    # Key-value variables
    async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None: ...
    async def get(self, key: str, default: Any = None) -> Any: ...
    async def delete(self, key: str) -> bool: ...
    async def exists(self, key: str) -> bool: ...
    async def list_keys(self, pattern: str = "*") -> list[str]: ...
    async def get_all(self) -> dict[str, Any]: ...
    async def clear(self) -> int: ...
    # Task state and scratchpad
    async def set_task_state(self, state: TaskState) -> None: ...
    async def get_task_state(self) -> TaskState | None: ...
    async def append_scratchpad(self, content: str) -> None: ...
    async def get_scratchpad(self) -> list[str]: ...
    # Checkpoints for recovery
    async def create_checkpoint(self) -> str: ...  # Returns checkpoint ID
    async def restore_checkpoint(self, checkpoint_id: str) -> None: ...
```
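
A minimal in-memory sketch of part of this API: `set`/`get`/`delete` with TTL, plus checkpoints. It assumes expiry against a monotonic clock and deep-copied snapshots for checkpoints. It stands in for the in-memory fallback, not the Redis backend. The class name `InMemoryWorkingMemory` and the injectable `clock` are hypothetical.

```python
from __future__ import annotations

import asyncio
import copy
import time
import uuid
from typing import Any, Callable


class InMemoryWorkingMemory:
    """Hypothetical dict-backed working memory with per-key TTL and checkpoints."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, absolute expiry time, or None for no expiry)
        self._data: dict[str, tuple[Any, float | None]] = {}
        self._checkpoints: dict[str, dict[str, tuple[Any, float | None]]] = {}

    def _purge_expired(self) -> None:
        now = self._clock()
        expired = [k for k, (_, exp) in self._data.items() if exp is not None and exp <= now]
        for key in expired:
            del self._data[key]

    async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None:
        expires = self._clock() + ttl_seconds if ttl_seconds is not None else None
        self._data[key] = (value, expires)

    async def get(self, key: str, default: Any = None) -> Any:
        self._purge_expired()
        entry = self._data.get(key)
        return entry[0] if entry is not None else default

    async def delete(self, key: str) -> bool:
        self._purge_expired()
        return self._data.pop(key, None) is not None

    async def create_checkpoint(self) -> str:
        """Snapshot the live (non-expired) state; returns the checkpoint ID."""
        self._purge_expired()
        checkpoint_id = str(uuid.uuid4())
        self._checkpoints[checkpoint_id] = copy.deepcopy(self._data)
        return checkpoint_id

    async def restore_checkpoint(self, checkpoint_id: str) -> None:
        self._data = copy.deepcopy(self._checkpoints[checkpoint_id])


class FakeClock:
    """Manually advanced clock so the TTL behaviour is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


async def demo() -> list[Any]:
    clock = FakeClock()
    wm = InMemoryWorkingMemory(clock=clock)
    await wm.set("step", 1)
    await wm.set("temp", "x", ttl_seconds=10)
    checkpoint = await wm.create_checkpoint()
    await wm.set("step", 2)
    clock.now = 11.0  # "temp" has now expired
    expired = await wm.get("temp", "gone")
    await wm.restore_checkpoint(checkpoint)
    return [expired, await wm.get("step")]


print(asyncio.run(demo()))  # ['gone', 1]
```

Restoring a checkpoint also restores each key's original expiry time. A restored key that has since expired therefore stays expired rather than being revived.
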
---
### Phase 2: Memory Types
#### Sub-Issue #62-4: Episodic Memory Implementation
**Priority:** P1

**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/episodic/memory.py` - EpisodicMemory class
- `backend/app/services/memory/episodic/recorder.py` - Episode recording
- `backend/app/services/memory/episodic/retrieval.py` - Retrieval strategies

**Features:**
- [ ] Episode recording during agent execution
- [ ] Store task completions with context
- [ ] Store failures with error context
- [ ] Retrieval by semantic similarity (vector search)
- [ ] Retrieval by recency
- [ ] Retrieval by outcome (success/failure)
- [ ] Importance scoring based on outcome significance
- [ ] Episode summarization for long-term storage

**API:**
```python
from __future__ import annotations

from datetime import datetime
from uuid import UUID


class EpisodicMemory:
    async def record_episode(self, episode: EpisodeCreate) -> Episode: ...
    # Retrieval strategies
    async def search_similar(self, query: str, limit: int = 10) -> list[Episode]: ...
    async def get_recent(self, limit: int = 10, since: datetime | None = None) -> list[Episode]: ...
    async def get_by_outcome(self, outcome: Outcome, limit: int = 10) -> list[Episode]: ...
    async def get_by_task_type(self, task_type: str, limit: int = 10) -> list[Episode]: ...
    # Maintenance
    async def update_importance(self, episode_id: UUID, score: float) -> None: ...
    async def summarize_episodes(self, episode_ids: list[UUID]) -> str: ...
```
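
In production, `search_similar` would run as a pgvector query. To show the ranking it performs, here is a pure-Python sketch that scores stored episode embeddings by cosine similarity against a query embedding and keeps the top `limit`. The `EpisodeVec` record and the toy 3-dimensional vectors are hypothetical; real embeddings are `VECTOR(1536)`. The sketch also assumes the query text has already been embedded.

```python
from __future__ import annotations

import heapq
import math
from dataclasses import dataclass


@dataclass
class EpisodeVec:
    """Hypothetical minimal view of an episode: its ID and embedding."""
    episode_id: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_similar(
    query_embedding: list[float], episodes: list[EpisodeVec], limit: int = 10
) -> list[tuple[str, float]]:
    """Return (episode_id, similarity) for the `limit` most similar episodes, best first."""
    scored = ((cosine_similarity(query_embedding, ep.embedding), ep.episode_id) for ep in episodes)
    return [(episode_id, round(score, 3)) for score, episode_id in heapq.nlargest(limit, scored)]


episodes = [
    EpisodeVec("ep-1", [1.0, 0.0, 0.0]),
    EpisodeVec("ep-2", [0.7, 0.7, 0.0]),
    EpisodeVec("ep-3", [0.0, 0.0, 1.0]),
]
print(search_similar([1.0, 0.1, 0.0], episodes, limit=2))  # [('ep-1', 0.995), ('ep-2', 0.774)]
```

In PostgreSQL, the same ordering would come from pgvector's cosine-distance operator `<=>` (smaller distance means more similar), with the planned HNSW index keeping the query fast.
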
---
#### Sub-Issue #62-5: Semantic Memory Implementation
**Priority:** P1

**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/semantic/memory.py` - SemanticMemory class
- `backend/app/services/memory/semantic/extraction.py` - Fact extraction from episodes
- `backend/app/services/memory/semantic/verification.py` - Fact verification

**Features:**
- [ ] Fact storage with triple format (subject, predicate, object)
- [ ] Confidence scoring and decay
- [ ] Fact extraction from episodic memory
- [ ] Conflict resolution for contradictory facts
- [ ] Retrieval by query (semantic search)
- [ ] Retrieval by entity (subject or object)
- [ ] Source tracking (which episodes contributed)
- [ ] Reinforcement on repeated learning

**API:**
```python
from __future__ import annotations

from uuid import UUID


class SemanticMemory:
    async def store_fact(self, fact: FactCreate) -> Fact: ...
    async def search_facts(self, query: str, limit: int = 10) -> list[Fact]: ...
    async def get_by_entity(self, entity: str, limit: int = 20) -> list[Fact]: ...
    async def reinforce_fact(self, fact_id: UUID) -> Fact: ...
    async def deprecate_fact(self, fact_id: UUID, reason: str) -> None: ...
    async def extract_facts_from_episode(self, episode: Episode) -> list[Fact]: ...
    async def resolve_conflict(self, fact_ids: list[UUID]) -> Fact: ...
```
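
The plan calls for confidence scoring with decay and for reinforcement on repeated learning, but it does not fix a formula. The sketch below shows one possible policy, offered as an assumption. Each reinforcement closes a fixed fraction of the gap to 1.0, and confidence decays exponentially with the time since `last_reinforced`. The step size and the 30-day half-life are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

REINFORCE_STEP = 0.2  # hypothetical: share of the remaining gap to 1.0 closed per reinforcement
HALF_LIFE = timedelta(days=30)  # hypothetical: unreinforced confidence halves every 30 days


@dataclass
class Fact:
    """Subset of the `facts` table columns."""
    subject: str
    predicate: str
    object: str
    confidence: float
    last_reinforced: datetime
    reinforcement_count: int = 1


def reinforce(fact: Fact, now: datetime) -> Fact:
    """Move confidence toward 1.0 and record the reinforcement."""
    fact.confidence += REINFORCE_STEP * (1.0 - fact.confidence)
    fact.reinforcement_count += 1
    fact.last_reinforced = now
    return fact


def effective_confidence(fact: Fact, now: datetime) -> float:
    """Stored confidence decayed exponentially by the time since last reinforcement."""
    return fact.confidence * 0.5 ** ((now - fact.last_reinforced) / HALF_LIFE)


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
fact = Fact("api-service", "uses", "PostgreSQL", confidence=0.5, last_reinforced=t0)
reinforce(fact, t0)
print(round(fact.confidence, 2), fact.reinforcement_count)  # 0.6 2
print(round(effective_confidence(fact, t0 + timedelta(days=30)), 2))  # 0.3
```

Storing the raw confidence and applying decay at read time avoids a scheduled job that rewrites every `facts` row.
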
---
#### Sub-Issue #62-6: Procedural Memory Implementation
**Priority:** P2

**Estimated Complexity:** Medium

**Components:**
- `backend/app/services/memory/procedural/memory.py` - ProceduralMemory class
- `backend/app/services/memory/procedural/matching.py` - Procedure matching

**Features:**
- [ ] Procedure recording from successful task patterns
- [ ] Trigger pattern matching
- [ ] Step-by-step procedure storage
- [ ] Success/failure rate tracking
- [ ] Procedure suggestion based on context
- [ ] Procedure versioning

**API:**
```python
from __future__ import annotations

from uuid import UUID


class ProceduralMemory:
    async def record_procedure(self, procedure: ProcedureCreate) -> Procedure: ...
    async def find_matching(self, context: str, limit: int = 5) -> list[Procedure]: ...
    async def record_outcome(self, procedure_id: UUID, success: bool) -> None: ...
    async def get_best_procedure(self, task_type: str) -> Procedure | None: ...
    async def update_steps(self, procedure_id: UUID, steps: list[Step]) -> Procedure: ...
```
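
`get_best_procedure` needs to rank procedures by `success_count` and `failure_count` without over-trusting procedures that have run only once or twice. A common choice, assumed here rather than taken from the plan, is a Laplace-smoothed success rate, (successes + 1) / (successes + failures + 2), which starts every procedure at 0.5. Matching on `task_type` is also a hypothetical simplification of `trigger_pattern`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProcedureStats:
    """Subset of the `procedures` table; `task_type` is a hypothetical matching key."""
    name: str
    task_type: str
    success_count: int = 0
    failure_count: int = 0

    @property
    def smoothed_success_rate(self) -> float:
        # Laplace smoothing: starts at 0.5 and converges to the raw rate as runs accumulate.
        return (self.success_count + 1) / (self.success_count + self.failure_count + 2)


def record_outcome(proc: ProcedureStats, success: bool) -> None:
    if success:
        proc.success_count += 1
    else:
        proc.failure_count += 1


def get_best_procedure(procs: list[ProcedureStats], task_type: str) -> ProcedureStats | None:
    candidates = [p for p in procs if p.task_type == task_type]
    return max(candidates, key=lambda p: p.smoothed_success_rate, default=None)


procs = [
    ProcedureStats("lucky-once", "deploy", success_count=1),  # raw 1.0, smoothed 0.67
    ProcedureStats("proven", "deploy", success_count=18, failure_count=2),  # raw 0.9, smoothed 0.86
]
best = get_best_procedure(procs, "deploy")
print(best.name if best else None)  # proven
```

Ranking by the raw rate would pick `lucky-once` (1/1 = 1.0) over `proven` (18/20 = 0.9). Smoothing reverses that until the new procedure accumulates evidence.
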
---
### Phase 3: Advanced Features
#### Sub-Issue #62-7: Memory Scoping
**Priority:** P1

**Estimated Complexity:** Medium

**Components:**
- `backend/app/services/memory/scoping/scope.py` - Scope management
- `backend/app/services/memory/scoping/resolver.py` - Scope resolution

**Features:**
- [ ] Global scope (shared across all projects and agents)
- [ ] Project scope (per project)
- [ ] Agent type scope (per agent type)
- [ ] Agent instance scope (per instance)
- [ ] Session scope (ephemeral)
- [ ] Scope inheritance (a child scope can read memories from its parent scopes)
- [ ] Access control policies
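
Scope inheritance can be expressed as a resolution chain. A lookup from a session walks up through the agent instance, agent type, and project scopes to global, and the nearest scope that defines a key wins. The sketch assumes that "nearest wins" policy and a plain dict per `(scope_type, scope_id)` pair. `ScopeContext` and `resolve` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Broadest to narrowest, matching the scoping hierarchy.
SCOPE_ORDER = ["global", "project", "agent_type", "agent_instance", "session"]


@dataclass(frozen=True)
class ScopeContext:
    """IDs describing where a request comes from (None = level not applicable)."""
    project_id: str | None = None
    agent_type_id: str | None = None
    agent_instance_id: str | None = None
    session_id: str | None = None

    def chain(self) -> list[tuple[str, str]]:
        """(scope_type, scope_id) pairs from narrowest to broadest."""
        ids = {
            "global": "global",
            "project": self.project_id,
            "agent_type": self.agent_type_id,
            "agent_instance": self.agent_instance_id,
            "session": self.session_id,
        }
        return [(level, ids[level]) for level in reversed(SCOPE_ORDER) if ids[level] is not None]


def resolve(store: dict[tuple[str, str], dict[str, Any]], ctx: ScopeContext, key: str) -> Any:
    """Return the value from the nearest scope that defines `key`, else None."""
    for scope in ctx.chain():
        values = store.get(scope, {})
        if key in values:
            return values[key]
    return None


store = {
    ("global", "global"): {"style": "pep8", "timeout": 30},
    ("project", "p1"): {"timeout": 60},
    ("session", "s1"): {"draft": "wip"},
}
ctx = ScopeContext(project_id="p1", agent_instance_id="a1", session_id="s1")
print(resolve(store, ctx, "timeout"), resolve(store, ctx, "style"))  # 60 pep8
```

This also fits the risk note about scope complexity. Starting with instance scope only amounts to a chain of length one, and more levels can be added later without changing callers.
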
---
#### Sub-Issue #62-8: Memory Indexing & Retrieval
**Priority:** P1

**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/indexing/index.py` - Memory indexer
- `backend/app/services/memory/indexing/retrieval.py` - Retrieval engine

**Features:**
- [ ] Vector embeddings for all memory types
- [ ] Temporal index (by time)
- [ ] Entity index (by entities mentioned)
- [ ] Outcome index (by success/failure)
- [ ] Hybrid retrieval (vector + filters)
- [ ] Relevance scoring
- [ ] Retrieval caching
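
One reading of hybrid retrieval is "filter first, then rank". Hard filters (here, exact metadata matches such as `outcome`) narrow the candidate set, and a relevance score blends vector similarity with recency and importance. The weights and the one-week recency half-life below are hypothetical tuning knobs. The sketch also assumes the similarity has already been computed by the vector index.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any


@dataclass
class Candidate:
    memory_id: str
    similarity: float  # from the vector index, in [0, 1]
    importance: float  # importance_score, in [0, 1]
    occurred_at: datetime
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
W_SIMILARITY, W_RECENCY, W_IMPORTANCE = 0.6, 0.2, 0.2
RECENCY_HALF_LIFE = timedelta(days=7)


def relevance(c: Candidate, now: datetime) -> float:
    recency = 0.5 ** ((now - c.occurred_at) / RECENCY_HALF_LIFE)
    return W_SIMILARITY * c.similarity + W_RECENCY * recency + W_IMPORTANCE * c.importance


def hybrid_retrieve(
    candidates: list[Candidate], filters: dict[str, Any], now: datetime, limit: int = 10
) -> list[str]:
    """Apply exact-match metadata filters, then return IDs ordered by relevance."""
    kept = [c for c in candidates if all(c.metadata.get(k) == v for k, v in filters.items())]
    kept.sort(key=lambda c: relevance(c, now), reverse=True)
    return [c.memory_id for c in kept[:limit]]


now = datetime(2026, 1, 5, tzinfo=timezone.utc)
candidates = [
    Candidate("old-match", 0.90, 0.5, now - timedelta(days=60), {"outcome": "success"}),
    Candidate("fresh-match", 0.80, 0.5, now - timedelta(days=1), {"outcome": "success"}),
    Candidate("failure", 0.95, 0.9, now, {"outcome": "failure"}),
]
print(hybrid_retrieve(candidates, {"outcome": "success"}, now))  # ['fresh-match', 'old-match']
```

The recency term lets a slightly less similar but much fresher episode outrank an old one. The filter drops the failed episode outright, whatever its score.
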
---
#### Sub-Issue #62-9: Memory Consolidation
**Priority:** P2

**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/consolidation/service.py` - Consolidation service
- `backend/app/tasks/memory_consolidation.py` - Celery tasks

**Features:**
- [ ] Working → Episodic transfer (session end)
- [ ] Episodic → Semantic extraction (learn facts)
- [ ] Episodic → Procedural extraction (learn procedures)
- [ ] Nightly consolidation Celery tasks
- [ ] Memory pruning (remove low-value memories)
- [ ] Importance-based retention
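
Importance-based retention and pruning could share a single pass. The pass computes a retention score from `importance_score` and age, prunes items below a threshold, and always keeps the best N so that a quiet scope is never emptied. The half-life, threshold, and N are hypothetical, since the plan leaves the policy open.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

HALF_LIFE = timedelta(days=90)  # hypothetical
THRESHOLD = 0.1  # hypothetical minimum retention score
MIN_KEEP = 2  # hypothetical: always keep the N highest-scoring episodes


@dataclass
class StoredEpisode:
    episode_id: str
    importance_score: float
    occurred_at: datetime


def retention_score(ep: StoredEpisode, now: datetime) -> float:
    return ep.importance_score * 0.5 ** ((now - ep.occurred_at) / HALF_LIFE)


def plan_pruning(episodes: list[StoredEpisode], now: datetime) -> tuple[list[str], list[str]]:
    """Split episode IDs into (keep, prune), highest retention score first."""
    ranked = sorted(episodes, key=lambda e: retention_score(e, now), reverse=True)
    keep: list[str] = []
    prune: list[str] = []
    for rank, ep in enumerate(ranked):
        if rank < MIN_KEEP or retention_score(ep, now) >= THRESHOLD:
            keep.append(ep.episode_id)
        else:
            prune.append(ep.episode_id)
    return keep, prune


now = datetime(2026, 1, 5, tzinfo=timezone.utc)
episodes = [
    StoredEpisode("critical-old", 0.9, now - timedelta(days=180)),
    StoredEpisode("trivial-old", 0.2, now - timedelta(days=180)),
    StoredEpisode("trivial-new", 0.2, now - timedelta(days=1)),
    StoredEpisode("routine", 0.5, now - timedelta(days=30)),
]
print(plan_pruning(episodes, now))
# (['routine', 'critical-old', 'trivial-new'], ['trivial-old'])
```

Returning a plan instead of deleting directly fits the `memory_consolidation_log` table. The nightly task can record `source_count` and `result_count` before it applies the deletions.
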
---
### Phase 4: Integration
#### Sub-Issue #62-10: MCP Tools Definition
**Priority:** P0 - Required for agent usage

**Estimated Complexity:** Medium

**MCP Tools:**
1. **`remember`** - Store in memory
   ```json
   {
     "memory_type": "working|episodic|semantic|procedural",
     "content": "...",
     "importance": 0.8,
     "ttl_seconds": 3600
   }
   ```
2. **`recall`** - Retrieve from memory
   ```json
   {
     "query": "...",
     "memory_types": ["episodic", "semantic"],
     "limit": 10,
     "filters": {"outcome": "success"}
   }
   ```
3. **`forget`** - Remove from memory
   ```json
   {
     "memory_type": "working",
     "key": "temp_calculation"
   }
   ```
4. **`reflect`** - Analyze patterns
   ```json
   {
     "analysis_type": "recent_patterns|success_factors|failure_patterns"
   }
   ```
5. **`get_memory_stats`** - Usage statistics
6. **`search_procedures`** - Find relevant procedures
7. **`record_outcome`** - Record task success/failure
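
Before dispatching, the server has to check tool arguments such as the `remember` payload. The sketch assumes rules that the plan implies but does not state. `memory_type` must be one of the four types, `importance` must lie in [0, 1] and defaults to 0.5, and `ttl_seconds` applies only to working memory. `validate_remember` is a hypothetical helper, independent of any MCP SDK.

```python
from __future__ import annotations

from typing import Any

MEMORY_TYPES = {"working", "episodic", "semantic", "procedural"}


def validate_remember(args: dict[str, Any]) -> list[str]:
    """Return validation errors for a `remember` call (an empty list means valid)."""
    errors: list[str] = []
    memory_type = args.get("memory_type")
    if memory_type not in MEMORY_TYPES:
        errors.append(f"memory_type must be one of {sorted(MEMORY_TYPES)}")
    if not isinstance(args.get("content"), str) or not args["content"].strip():
        errors.append("content must be a non-empty string")
    importance = args.get("importance", 0.5)  # hypothetical default
    if not isinstance(importance, (int, float)) or not 0.0 <= importance <= 1.0:
        errors.append("importance must be a number in [0, 1]")
    ttl = args.get("ttl_seconds")
    if ttl is not None:
        if memory_type != "working":
            errors.append("ttl_seconds is only supported for working memory")
        elif not isinstance(ttl, int) or ttl <= 0:
            errors.append("ttl_seconds must be a positive integer")
    return errors


print(validate_remember({"memory_type": "working", "content": "x", "importance": 0.8, "ttl_seconds": 3600}))  # []
print(validate_remember({"memory_type": "semantic", "content": "x", "ttl_seconds": 60}))
# ['ttl_seconds is only supported for working memory']
```

Collecting every error rather than raising on the first one gives the calling agent a single, complete message to correct its arguments.
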
---
#### Sub-Issue #62-11: Component Integration
**Priority:** P1

**Estimated Complexity:** Medium

**Integrations:**
- [ ] Context Engine (#61) - Include relevant memories in context assembly
- [ ] Knowledge Base (#57) - Coordinate with KB to avoid duplication
- [ ] LLM Gateway (#56) - Use for embedding generation
- [ ] Agent lifecycle hooks (spawn, pause, resume, terminate)
---
#### Sub-Issue #62-12: Caching Layer
**Priority:** P2

**Estimated Complexity:** Medium

**Features:**
- [ ] Hot memory caching (frequently accessed)
- [ ] Retrieval result caching
- [ ] Embedding caching
- [ ] Cache invalidation strategies
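
Embedding caching is the easiest of these to pin down. Identical text sent to the same embedding model is expected to yield the same vector, so the cache key can be a hash of the model name and text, with LRU eviction bounding memory. The sketch assumes a caller-supplied `embed_fn` standing in for the LLM Gateway call. `EmbeddingCache`, its capacity, and the model name are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import OrderedDict
from typing import Callable

Embedding = list[float]


class EmbeddingCache:
    """LRU cache of embeddings keyed by sha256(model + text)."""

    def __init__(self, embed_fn: Callable[[str], Embedding], model: str, capacity: int = 10_000) -> None:
        self._embed_fn = embed_fn
        self._model = model
        self._capacity = capacity
        self._entries: OrderedDict[str, Embedding] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _key(self, text: str) -> str:
        # Including the model name means switching models never serves stale vectors.
        return hashlib.sha256(f"{self._model}\x00{text}".encode()).hexdigest()

    def get(self, text: str) -> Embedding:
        key = self._key(text)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        vector = self._embed_fn(text)
        self._entries[key] = vector
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return vector


calls: list[str] = []


def fake_embed(text: str) -> Embedding:
    """Stand-in for the real embedding call; records what was requested."""
    calls.append(text)
    return [float(len(text))]


cache = EmbeddingCache(fake_embed, model="example-embedding-model", capacity=2)
for text in ["a", "a", "bb", "ccc", "a"]:
    cache.get(text)
print(cache.hits, cache.misses, calls)  # 1 4 ['a', 'bb', 'ccc', 'a']
```

Because the key includes the model name, invalidation on a model change comes for free. The other caches in this list (hot memories, retrieval results) would additionally need explicit invalidation when the underlying rows change.
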
---
### Phase 5: Intelligence & Quality
#### Sub-Issue #62-13: Memory Reflection
**Priority:** P3

**Estimated Complexity:** High

**Features:**
- [ ] Pattern detection in episodic memory
- [ ] Success/failure factor analysis
- [ ] Anomaly detection
- [ ] Insights generation
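
As a first cut at success/failure factor analysis, a reflection pass could aggregate episodes by `task_type` and flag the types whose failure rate stands out. The sketch assumes that approach, with a hypothetical minimum sample size and failure-rate threshold. Richer pattern and anomaly detection would build on it.

```python
from __future__ import annotations

from collections import defaultdict


def failure_hotspots(
    episodes: list[tuple[str, str]], min_episodes: int = 3, threshold: float = 0.5
) -> dict[str, float]:
    """Map task_type -> failure rate for types with enough episodes and a high failure rate.

    `episodes` holds (task_type, outcome) pairs; outcome is success/failure/partial.
    """
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for task_type, outcome in episodes:
        totals[task_type] += 1
        if outcome == "failure":
            failures[task_type] += 1
    return {
        task_type: round(failures[task_type] / count, 2)
        for task_type, count in totals.items()
        if count >= min_episodes and failures[task_type] / count >= threshold
    }


history = [
    ("migration", "failure"), ("migration", "failure"), ("migration", "success"),
    ("code_review", "success"), ("code_review", "partial"), ("code_review", "success"),
    ("deploy", "failure"),
]
print(failure_hotspots(history))  # {'migration': 0.67}
```

The minimum sample size keeps a single failed `deploy` from being reported as a 100% failure pattern.
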
---
#### Sub-Issue #62-14: Metrics & Observability
**Priority:** P2

**Estimated Complexity:** Low

**Metrics:**
- `memory_size_bytes` by type and scope
- `memory_operations_total` counter
- `memory_retrieval_latency_seconds` histogram
- `memory_consolidation_duration_seconds` histogram
- `procedure_success_rate` gauge
---
#### Sub-Issue #62-15: Documentation & Final Testing
**Priority:** P0

**Estimated Complexity:** Medium

**Deliverables:**
- [ ] README with architecture overview
- [ ] API documentation with examples
- [ ] Integration guide
- [ ] E2E tests for full memory lifecycle
- [ ] Achieve >90% code coverage
- [ ] Performance benchmarks
---
## Implementation Order
```
Phase 1 (Foundation) - Sequential
  #62-1 → #62-2 → #62-3

Phase 2 (Memory Types) - Can parallelize after Phase 1
  #62-4, #62-5, #62-6 (parallel after #62-3)

Phase 3 (Advanced) - Sequential within phase
  #62-7 → #62-8 → #62-9

Phase 4 (Integration) - After Phase 2
  #62-10 → #62-11 → #62-12

Phase 5 (Quality) - Final
  #62-13, #62-14, #62-15
```
---
## Performance Targets
| Metric | Target | Notes |
|--------|--------|-------|
| Working memory get/set | <5ms | P95 |
| Episodic memory retrieval | <100ms | P95, as specified in Epic #60 |
| Semantic memory search | <100ms | P95 |
| Procedural memory matching | <50ms | P95 |
| Consolidation batch | <30s | Per 1000 episodes |
---
## Risk Mitigation
1. **Embedding costs** - Use caching aggressively, batch embeddings
2. **Storage growth** - Implement TTL, pruning, and archival policies
3. **Query performance** - HNSW indexes, pagination, query optimization
4. **Scope complexity** - Start simple (instance scope only), add hierarchy later
---
## Review Checkpoints
After each sub-issue:
1. Run `make validate-all`
2. Multi-agent code review
3. Verify E2E stack still works
4. Commit with a granular commit message