# Agent Memory System - Implementation Plan

## Issue #62 - Part of Epic #60 (Phase 2: MCP Integration)

**Branch:** `feature/62-agent-memory-system`
**Parent Epic:** #60 [EPIC] Phase 2: MCP Integration
**Dependencies:** #56 (LLM Gateway), #57 (Knowledge Base), #61 (Context Management Engine)

---

## Executive Summary

The Agent Memory System provides multi-tier cognitive memory for AI agents, enabling them to:
- Maintain state across sessions (Working Memory)
- Learn from past experiences (Episodic Memory)
- Store and retrieve facts (Semantic Memory)
- Develop and reuse procedures (Procedural Memory)

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           Agent Memory System                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────┐                      ┌─────────────────┐               │
│  │ Working Memory  │───────────────────▶  │ Episodic Memory │               │
│  │ (Redis/In-Mem)  │    consolidate       │  (PostgreSQL)   │               │
│  │                 │                      │                 │               │
│  │ • Current task  │                      │ • Past sessions │               │
│  │ • Variables     │                      │ • Experiences   │               │
│  │ • Scratchpad    │                      │ • Outcomes      │               │
│  └─────────────────┘                      └────────┬────────┘               │
│                                                    │                         │
│                                           extract  │                         │
│                                                    ▼                         │
│  ┌─────────────────┐                      ┌─────────────────┐               │
│  │Procedural Memory│◀─────────────────────│ Semantic Memory │               │
│  │  (PostgreSQL)   │      learn from      │  (PostgreSQL +  │               │
│  │                 │                      │    pgvector)    │               │
│  │ • Procedures    │                      │                 │               │
│  │ • Skills        │                      │ • Facts         │               │
│  │ • Patterns      │                      │ • Entities      │               │
│  └─────────────────┘                      │ • Relationships │               │
│                                           └─────────────────┘               │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Memory Scoping Hierarchy

```
Global Memory (shared by all)
└── Project Memory (per project)
    └── Agent Type Memory (per agent type)
        └── Agent Instance Memory (per instance)
            └── Session Memory (ephemeral)
```
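
The hierarchy above implies a lookup order from the narrowest scope outward. A minimal sketch, assuming `MemoryScope` is modelled as an ordered `IntEnum` (the plan only says it is an enum) and using a hypothetical helper named `lookup_order`:

```python
from enum import IntEnum


class MemoryScope(IntEnum):
    """Scoping levels, ordered from broadest (0) to narrowest (4)."""

    GLOBAL = 0
    PROJECT = 1
    AGENT_TYPE = 2
    AGENT_INSTANCE = 3
    SESSION = 4


def lookup_order(scope: MemoryScope) -> list[MemoryScope]:
    """Return the scope itself followed by each ancestor, narrowest first."""
    # Hypothetical helper: walks the numeric levels down to GLOBAL.
    return [MemoryScope(level) for level in range(scope, -1, -1)]
```

Under these assumptions, `lookup_order(MemoryScope.AGENT_TYPE)` yields agent type, then project, then global.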

---

## Sub-Issue Breakdown

### Phase 1: Foundation (Critical Path)

#### Sub-Issue #62-1: Project Setup & Core Architecture
**Priority:** P0 - Must complete first
**Estimated Complexity:** Medium

**Tasks:**
- [ ] Create `backend/app/services/memory/` directory structure
- [ ] Create `__init__.py` with public API exports
- [ ] Create `config.py` with `MemorySettings` (Pydantic)
- [ ] Define base interfaces in `types.py`:
  - `MemoryItem` - Base class for all memory items
  - `MemoryScope` - Enum for scoping levels
  - `MemoryStore` - Abstract base for storage backends
- [ ] Create `manager.py` with `MemoryManager` class (facade)
- [ ] Create `exceptions.py` with memory-specific errors
- [ ] Write ADR-010 documenting memory architecture decisions
- [ ] Create dependency injection setup
- [ ] Unit tests for configuration and types

**Deliverables:**
- Directory structure matching existing patterns (like `context/`, `safety/`)
- Configuration with `MEM_` environment-variable prefix
- Type definitions for all memory concepts
- Comprehensive unit tests
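
To illustrate the `MEM_` prefix, here is a rough sketch of how prefixed environment variables could map onto settings fields. The plan calls for a Pydantic-based `MemorySettings`; this version uses a standard-library dataclass only to stay self-contained, and every field name and default below is hypothetical.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MemorySettings:
    # Field names and defaults are illustrative assumptions, not from the plan.
    redis_url: str = "redis://localhost:6379/0"
    working_memory_ttl_seconds: int = 3600
    embedding_dimensions: int = 1536  # matches VECTOR(1536) in the schema

    @classmethod
    def from_env(cls, env=None, prefix="MEM_"):
        """Build settings from variables such as MEM_REDIS_URL."""
        env = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            raw = env.get(prefix + f.name.upper())
            if raw is not None:
                # Coerce using the type of the default value (str or int here).
                overrides[f.name] = type(f.default)(raw)
        return cls(**overrides)
```

Passing a plain dict as `env` keeps the loader easy to unit test without touching the real process environment.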

---

#### Sub-Issue #62-2: Database Schema & Storage Layer
**Priority:** P0 - Required for all memory types
**Estimated Complexity:** High

**Database Tables:**

1. **`working_memory`** - Ephemeral key-value storage
   - `id` (UUID, PK)
   - `scope_type` (ENUM: global/project/agent_type/agent_instance/session)
   - `scope_id` (VARCHAR - the ID for the scope level)
   - `key` (VARCHAR)
   - `value` (JSONB)
   - `expires_at` (TIMESTAMP WITH TZ)
   - `created_at`, `updated_at`

2. **`episodes`** - Experiential memories
   - `id` (UUID, PK)
   - `project_id` (UUID, FK)
   - `agent_instance_id` (UUID, FK, nullable)
   - `agent_type_id` (UUID, FK, nullable)
   - `session_id` (VARCHAR)
   - `task_type` (VARCHAR)
   - `task_description` (TEXT)
   - `actions` (JSONB)
   - `context_summary` (TEXT)
   - `outcome` (ENUM: success/failure/partial)
   - `outcome_details` (TEXT)
   - `duration_seconds` (FLOAT)
   - `tokens_used` (BIGINT)
   - `lessons_learned` (JSONB - list of strings)
   - `importance_score` (FLOAT, 0-1)
   - `embedding` (VECTOR(1536))
   - `occurred_at` (TIMESTAMP WITH TZ)
   - `created_at`, `updated_at`

3. **`facts`** - Semantic knowledge
   - `id` (UUID, PK)
   - `project_id` (UUID, FK, nullable - null for global)
   - `subject` (VARCHAR)
   - `predicate` (VARCHAR)
   - `object` (TEXT)
   - `confidence` (FLOAT, 0-1)
   - `source_episode_ids` (UUID[])
   - `first_learned` (TIMESTAMP WITH TZ)
   - `last_reinforced` (TIMESTAMP WITH TZ)
   - `reinforcement_count` (INT)
   - `embedding` (VECTOR(1536))
   - `created_at`, `updated_at`

4. **`procedures`** - Learned skills
   - `id` (UUID, PK)
   - `project_id` (UUID, FK, nullable)
   - `agent_type_id` (UUID, FK, nullable)
   - `name` (VARCHAR)
   - `trigger_pattern` (TEXT)
   - `steps` (JSONB)
   - `success_count` (INT)
   - `failure_count` (INT)
   - `last_used` (TIMESTAMP WITH TZ)
   - `embedding` (VECTOR(1536))
   - `created_at`, `updated_at`

5. **`memory_consolidation_log`** - Consolidation tracking
   - `id` (UUID, PK)
   - `consolidation_type` (ENUM)
   - `source_count` (INT)
   - `result_count` (INT)
   - `started_at`, `completed_at`
   - `status` (ENUM: pending/running/completed/failed)
   - `error` (TEXT, nullable)

**Tasks:**
- [ ] Create SQLAlchemy models in `backend/app/models/memory/`
- [ ] Create Alembic migration with all tables
- [ ] Add pgvector indexes (HNSW for episodes, facts, procedures)
- [ ] Create repository classes in `backend/app/crud/memory/`
- [ ] Add composite indexes for common query patterns
- [ ] Unit tests for all repositories
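
The expiry semantics of `working_memory` can be prototyped before the PostgreSQL migration exists. The sketch below uses the standard-library `sqlite3` module as a stand-in, so `JSONB` becomes `TEXT` and `TIMESTAMP WITH TZ` becomes a float. It also assumes that `(scope_type, scope_id, key)` is unique, which the plan does not state, and omits the `id` and audit columns.

```python
import json
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE working_memory (
        scope_type TEXT NOT NULL,
        scope_id   TEXT NOT NULL,
        key        TEXT NOT NULL,
        value      TEXT NOT NULL,  -- JSONB in PostgreSQL
        expires_at REAL,           -- TIMESTAMP WITH TZ in PostgreSQL
        PRIMARY KEY (scope_type, scope_id, key)  -- assumed uniqueness rule
    )
    """
)


def put(scope_type, scope_id, key, value, ttl_seconds=None, now=None):
    """Insert or overwrite one entry; `now` is injectable for tests."""
    now = time.time() if now is None else now
    expires_at = None if ttl_seconds is None else now + ttl_seconds
    conn.execute(
        "INSERT OR REPLACE INTO working_memory VALUES (?, ?, ?, ?, ?)",
        (scope_type, scope_id, key, json.dumps(value), expires_at),
    )


def fetch(scope_type, scope_id, key, now=None):
    """Return the stored value, or None if it is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT value FROM working_memory "
        "WHERE scope_type = ? AND scope_id = ? AND key = ? "
        "AND (expires_at IS NULL OR expires_at > ?)",
        (scope_type, scope_id, key, now),
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

Filtering on `expires_at` at read time means a late cleanup job never causes stale values to be returned.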

---

#### Sub-Issue #62-3: Working Memory Implementation
**Priority:** P0 - Core functionality
**Estimated Complexity:** Medium

**Components:**
- `backend/app/services/memory/working/memory.py` - WorkingMemory class
- `backend/app/services/memory/working/storage.py` - Redis + in-memory backend

**Features:**
- [ ] Session-scoped containers with automatic cleanup
- [ ] Variable storage (get/set/delete)
- [ ] Task state tracking (current step, status, progress)
- [ ] Scratchpad for reasoning steps
- [ ] Configurable capacity limits
- [ ] TTL-based expiration
- [ ] Checkpoint/snapshot support for recovery
- [ ] Redis primary storage with in-memory fallback

**API:**
```python
from __future__ import annotations  # TaskState is defined elsewhere

from typing import Any


class WorkingMemory:
    # Key-value access
    async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None: ...
    async def get(self, key: str, default: Any = None) -> Any: ...
    async def delete(self, key: str) -> bool: ...
    async def exists(self, key: str) -> bool: ...
    async def list_keys(self, pattern: str = "*") -> list[str]: ...
    async def get_all(self) -> dict[str, Any]: ...
    async def clear(self) -> int: ...

    # Task state, scratchpad, and checkpoints
    async def set_task_state(self, state: TaskState) -> None: ...
    async def get_task_state(self) -> TaskState | None: ...
    async def append_scratchpad(self, content: str) -> None: ...
    async def get_scratchpad(self) -> list[str]: ...
    async def create_checkpoint(self) -> str: ...  # Returns checkpoint ID
    async def restore_checkpoint(self, checkpoint_id: str) -> None: ...
```

---

### Phase 2: Memory Types

#### Sub-Issue #62-4: Episodic Memory Implementation
**Priority:** P1
**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/episodic/memory.py` - EpisodicMemory class
- `backend/app/services/memory/episodic/recorder.py` - Episode recording
- `backend/app/services/memory/episodic/retrieval.py` - Retrieval strategies

**Features:**
- [ ] Episode recording during agent execution
- [ ] Store task completions with context
- [ ] Store failures with error context
- [ ] Retrieval by semantic similarity (vector search)
- [ ] Retrieval by recency
- [ ] Retrieval by outcome (success/failure)
- [ ] Importance scoring based on outcome significance
- [ ] Episode summarization for long-term storage

**API:**
```python
from __future__ import annotations  # Episode, EpisodeCreate, Outcome are defined elsewhere

from datetime import datetime
from uuid import UUID


class EpisodicMemory:
    async def record_episode(self, episode: EpisodeCreate) -> Episode: ...
    async def search_similar(self, query: str, limit: int = 10) -> list[Episode]: ...
    async def get_recent(self, limit: int = 10, since: datetime | None = None) -> list[Episode]: ...
    async def get_by_outcome(self, outcome: Outcome, limit: int = 10) -> list[Episode]: ...
    async def get_by_task_type(self, task_type: str, limit: int = 10) -> list[Episode]: ...
    async def update_importance(self, episode_id: UUID, score: float) -> None: ...
    async def summarize_episodes(self, episode_ids: list[UUID]) -> str: ...
```
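
One possible shape for "episode recording during agent execution" is a context manager that times the task and captures the outcome. This is only a sketch: `EpisodeDraft`, the synchronous `record_episode` helper, and the list used as `sink` are hypothetical, with field names borrowed from the `episodes` table.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class EpisodeDraft:
    # Field names follow the `episodes` table; the class itself is hypothetical.
    task_type: str
    task_description: str
    actions: list = field(default_factory=list)
    outcome: str = "success"  # callers may set "partial" before the block exits
    outcome_details: str = ""
    duration_seconds: float = 0.0


@contextmanager
def record_episode(task_type, task_description, sink):
    """Time a task, capture its outcome, and append the draft to `sink`."""
    draft = EpisodeDraft(task_type, task_description)
    started = time.perf_counter()
    try:
        yield draft
    except Exception as exc:
        draft.outcome = "failure"
        draft.outcome_details = repr(exc)  # keep the error context
        raise
    finally:
        draft.duration_seconds = time.perf_counter() - started
        sink.append(draft)
```

Because the draft is appended in `finally`, failed tasks are recorded with their error context before the exception propagates.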

---

#### Sub-Issue #62-5: Semantic Memory Implementation
**Priority:** P1
**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/semantic/memory.py` - SemanticMemory class
- `backend/app/services/memory/semantic/extraction.py` - Fact extraction from episodes
- `backend/app/services/memory/semantic/verification.py` - Fact verification

**Features:**
- [ ] Fact storage with triple format (subject, predicate, object)
- [ ] Confidence scoring and decay
- [ ] Fact extraction from episodic memory
- [ ] Conflict resolution for contradictory facts
- [ ] Retrieval by query (semantic search)
- [ ] Retrieval by entity (subject or object)
- [ ] Source tracking (which episodes contributed)
- [ ] Reinforcement on repeated learning

**API:**
```python
from __future__ import annotations  # Fact, FactCreate, Episode are defined elsewhere

from uuid import UUID


class SemanticMemory:
    async def store_fact(self, fact: FactCreate) -> Fact: ...
    async def search_facts(self, query: str, limit: int = 10) -> list[Fact]: ...
    async def get_by_entity(self, entity: str, limit: int = 20) -> list[Fact]: ...
    async def reinforce_fact(self, fact_id: UUID) -> Fact: ...
    async def deprecate_fact(self, fact_id: UUID, reason: str) -> None: ...
    async def extract_facts_from_episode(self, episode: Episode) -> list[Fact]: ...
    async def resolve_conflict(self, fact_ids: list[UUID]) -> Fact: ...
```
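
The plan does not specify how confidence is reinforced or decayed. One simple option, shown here purely as an assumption, is to move confidence a fixed fraction toward 1.0 on each reinforcement and to apply exponential decay with a half-life since the last reinforcement. The `gain` and `half_life_days` defaults are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    # Mirrors a subset of the `facts` table columns.
    subject: str
    predicate: str
    object: str
    confidence: float
    reinforcement_count: int = 0


def reinforce(fact: Fact, gain: float = 0.2) -> Fact:
    """Move confidence a fraction `gain` of the way toward 1.0."""
    fact.confidence += gain * (1.0 - fact.confidence)
    fact.reinforcement_count += 1
    return fact


def decayed_confidence(fact: Fact, days_since_reinforced: float,
                       half_life_days: float = 90.0) -> float:
    """Exponential decay: confidence halves every `half_life_days`."""
    return fact.confidence * 0.5 ** (days_since_reinforced / half_life_days)
```

With these rules confidence stays inside the 0-1 range required by the schema, since reinforcement approaches 1.0 without exceeding it and decay only shrinks the value.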

---

#### Sub-Issue #62-6: Procedural Memory Implementation
**Priority:** P2
**Estimated Complexity:** Medium

**Components:**
- `backend/app/services/memory/procedural/memory.py` - ProceduralMemory class
- `backend/app/services/memory/procedural/matching.py` - Procedure matching

**Features:**
- [ ] Procedure recording from successful task patterns
- [ ] Trigger pattern matching
- [ ] Step-by-step procedure storage
- [ ] Success/failure rate tracking
- [ ] Procedure suggestion based on context
- [ ] Procedure versioning

**API:**
```python
from __future__ import annotations  # Procedure, ProcedureCreate, Step are defined elsewhere

from uuid import UUID


class ProceduralMemory:
    async def record_procedure(self, procedure: ProcedureCreate) -> Procedure: ...
    async def find_matching(self, context: str, limit: int = 5) -> list[Procedure]: ...
    async def record_outcome(self, procedure_id: UUID, success: bool) -> None: ...
    async def get_best_procedure(self, task_type: str) -> Procedure | None: ...
    async def update_steps(self, procedure_id: UUID, steps: list[Step]) -> Procedure: ...
```
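
Ranking by raw success rate would favour a procedure that succeeded once over one that succeeded eight times out of ten. A hedged sketch of one alternative is Laplace (add-one) smoothing over `success_count` and `failure_count`. The plan does not name a ranking rule, so both the smoothing choice and the `best_procedure` helper are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Procedure:
    name: str
    success_count: int = 0
    failure_count: int = 0

    def record_outcome(self, success: bool) -> None:
        """Increment the counter matching the outcome."""
        if success:
            self.success_count += 1
        else:
            self.failure_count += 1

    @property
    def smoothed_success_rate(self) -> float:
        # Laplace smoothing: an unused procedure scores 0.5, and the score
        # moves toward the observed rate as evidence accumulates.
        total = self.success_count + self.failure_count
        return (self.success_count + 1) / (total + 2)


def best_procedure(candidates):
    """Return the candidate with the highest smoothed rate, or None if empty."""
    return max(candidates, key=lambda p: p.smoothed_success_rate, default=None)
```

Here a procedure with 8 successes and 2 failures (0.75 smoothed) outranks one with a single success (about 0.67 smoothed).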

---

### Phase 3: Advanced Features

#### Sub-Issue #62-7: Memory Scoping
**Priority:** P1
**Estimated Complexity:** Medium

**Components:**
- `backend/app/services/memory/scoping/scope.py` - Scope management
- `backend/app/services/memory/scoping/resolver.py` - Scope resolution

**Features:**
- [ ] Global scope (shared across all)
- [ ] Project scope (per project)
- [ ] Agent type scope (per agent type)
- [ ] Agent instance scope (per instance)
- [ ] Session scope (ephemeral)
- [ ] Scope inheritance (child sees parent memories)
- [ ] Access control policies
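
Scope inheritance ("child sees parent memories") can be pictured as a walk over a chain of scopes, narrowest first, that returns the first match. The sketch below assumes scopes are identified by `(scope_type, scope_id)` tuples and that each scope's memories are available as a dict. The `resolve` function and the `stores` layout are hypothetical, and access control is not modelled.

```python
def resolve(key, scope_chain, stores):
    """Return (scope, value) from the narrowest scope that defines `key`.

    scope_chain: list of (scope_type, scope_id) tuples, narrowest first.
    stores: dict mapping each scope tuple to its key-value memories.
    """
    for scope in scope_chain:
        values = stores.get(scope, {})
        if key in values:
            return scope, values[key]
    return None  # no scope in the chain defines the key


# Example data: a project-level value shadows the global default.
stores = {
    ("global", ""): {"style": "pep8", "lang": "en"},
    ("project", "p1"): {"style": "black"},
}
chain = [
    ("session", "s1"),
    ("agent_instance", "a1"),
    ("project", "p1"),
    ("global", ""),
]
```

With this data, `resolve("style", chain, stores)` finds the project value, while `resolve("lang", chain, stores)` falls through to the global scope.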

---

#### Sub-Issue #62-8: Memory Indexing & Retrieval
**Priority:** P1
**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/indexing/index.py` - Memory indexer
- `backend/app/services/memory/indexing/retrieval.py` - Retrieval engine

**Features:**
- [ ] Vector embeddings for all memory types
- [ ] Temporal index (by time)
- [ ] Entity index (by entities mentioned)
- [ ] Outcome index (by success/failure)
- [ ] Hybrid retrieval (vector + filters)
- [ ] Relevance scoring
- [ ] Retrieval caching

---

#### Sub-Issue #62-9: Memory Consolidation
**Priority:** P2
**Estimated Complexity:** High

**Components:**
- `backend/app/services/memory/consolidation/service.py` - Consolidation service
- `backend/app/tasks/memory_consolidation.py` - Celery tasks

**Features:**
- [ ] Working → Episodic transfer (session end)
- [ ] Episodic → Semantic extraction (learn facts)
- [ ] Episodic → Procedural extraction (learn procedures)
- [ ] Nightly consolidation Celery tasks
- [ ] Memory pruning (remove low-value)
- [ ] Importance-based retention
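
Importance-based retention needs some rule for trading importance against age. The following sketch assumes an exponential age discount with a 30-day half-life and a fixed retention budget. Neither is specified in the plan, and `occurred_day` is a simplified stand-in for `occurred_at`.

```python
def retention_score(importance, age_days, half_life_days=30.0):
    """Importance discounted by age; halves every `half_life_days`."""
    return importance * 0.5 ** (age_days / half_life_days)


def prune(episodes, keep, now_days):
    """Split episodes into (kept, pruned), keeping the `keep` highest scores."""
    ranked = sorted(
        episodes,
        key=lambda e: retention_score(e["importance_score"], now_days - e["occurred_day"]),
        reverse=True,
    )
    return ranked[:keep], ranked[keep:]


# Day numbers are relative offsets, used only to keep the example small.
episodes = [
    {"id": "old-critical", "importance_score": 0.9, "occurred_day": 0},
    {"id": "recent-minor", "importance_score": 0.3, "occurred_day": 58},
    {"id": "old-minor", "importance_score": 0.2, "occurred_day": 0},
]
```

Returning the pruned list rather than deleting in place leaves room for an archival step before removal.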

---

### Phase 4: Integration

#### Sub-Issue #62-10: MCP Tools Definition
**Priority:** P0 - Required for agent usage
**Estimated Complexity:** Medium

**MCP Tools:**

1. **`remember`** - Store in memory
   ```json
   {
     "memory_type": "working|episodic|semantic|procedural",
     "content": "...",
     "importance": 0.8,
     "ttl_seconds": 3600
   }
   ```

2. **`recall`** - Retrieve from memory
   ```json
   {
     "query": "...",
     "memory_types": ["episodic", "semantic"],
     "limit": 10,
     "filters": {"outcome": "success"}
   }
   ```

3. **`forget`** - Remove from memory
   ```json
   {
     "memory_type": "working",
     "key": "temp_calculation"
   }
   ```

4. **`reflect`** - Analyze patterns
   ```json
   {
     "analysis_type": "recent_patterns|success_factors|failure_patterns"
   }
   ```

5. **`get_memory_stats`** - Usage statistics
6. **`search_procedures`** - Find relevant procedures
7. **`record_outcome`** - Record task success/failure
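
Tool inputs will need validation before they reach the memory layer. A minimal sketch for the `remember` payload follows. The helper name `validate_remember`, the default importance of 0.5, and the treatment of `ttl_seconds` as optional are assumptions rather than part of the tool definition above.

```python
MEMORY_TYPES = {"working", "episodic", "semantic", "procedural"}


def validate_remember(payload: dict) -> dict:
    """Return a normalized copy of a `remember` payload or raise ValueError."""
    memory_type = payload.get("memory_type")
    if memory_type not in MEMORY_TYPES:
        raise ValueError(f"memory_type must be one of {sorted(MEMORY_TYPES)}")

    content = payload.get("content")
    if not isinstance(content, str) or not content.strip():
        raise ValueError("content must be a non-empty string")

    importance = float(payload.get("importance", 0.5))  # default is an assumption
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be between 0 and 1")

    ttl = payload.get("ttl_seconds")
    if ttl is not None and (not isinstance(ttl, int) or ttl <= 0):
        raise ValueError("ttl_seconds must be a positive integer")

    return {
        "memory_type": memory_type,
        "content": content,
        "importance": importance,
        "ttl_seconds": ttl,
    }
```

In the real service this role would more likely be filled by a Pydantic model, in keeping with the rest of the codebase.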

---

#### Sub-Issue #62-11: Component Integration
**Priority:** P1
**Estimated Complexity:** Medium

**Integrations:**
- [ ] Context Engine (#61) - Include relevant memories in context assembly
- [ ] Knowledge Base (#57) - Coordinate with KB to avoid duplication
- [ ] LLM Gateway (#56) - Use for embedding generation
- [ ] Agent lifecycle hooks (spawn, pause, resume, terminate)

---

#### Sub-Issue #62-12: Caching Layer
**Priority:** P2
**Estimated Complexity:** Medium

**Features:**
- [ ] Hot memory caching (frequently accessed)
- [ ] Retrieval result caching
- [ ] Embedding caching
- [ ] Cache invalidation strategies
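
Embedding caching could be as simple as an LRU map keyed by a hash of the input text, so identical strings are embedded once. The `EmbeddingCache` class below is a hypothetical in-process sketch; `embed` stands for whatever callable ends up wrapping the LLM Gateway, and a production version would likely sit in Redis.

```python
import hashlib
from collections import OrderedDict


class EmbeddingCache:
    """LRU cache keyed by a hash of the text; `embed` is any text -> vector callable."""

    def __init__(self, embed, max_entries=1024):
        self._embed = embed
        self._max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        vector = self._embed(text)
        self._entries[key] = vector
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return vector
```

Tracking `hits` and `misses` on the cache itself gives a cheap way to check whether the cache is paying for its memory.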

---

### Phase 5: Intelligence & Quality

#### Sub-Issue #62-13: Memory Reflection
**Priority:** P3
**Estimated Complexity:** High

**Features:**
- [ ] Pattern detection in episodic memory
- [ ] Success/failure factor analysis
- [ ] Anomaly detection
- [ ] Insights generation

---

#### Sub-Issue #62-14: Metrics & Observability
**Priority:** P2
**Estimated Complexity:** Low

**Metrics:**
- `memory_size_bytes` by type and scope
- `memory_operations_total` counter
- `memory_retrieval_latency_seconds` histogram
- `memory_consolidation_duration_seconds` histogram
- `procedure_success_rate` gauge
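
To make the counter and histogram semantics concrete, here is a small standard-library sketch. It is not a substitute for a real metrics client; the `Histogram` class, the bucket bounds, and the label layout are all assumptions, with buckets following the usual "less than or equal to the bound" convention.

```python
import bisect
from collections import Counter


class Histogram:
    """Fixed-bucket histogram; the final slot counts values above every bound."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value):
        # bisect_left finds the first bound >= value, so a value equal to a
        # bound lands in that bound's bucket.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1


# memory_operations_total, labelled by (memory_type, operation)
memory_operations_total = Counter()

# memory_retrieval_latency_seconds with hypothetical bucket bounds
memory_retrieval_latency_seconds = Histogram([0.005, 0.05, 0.1])


def record_retrieval(memory_type, seconds):
    """Count one retrieval and record its latency."""
    memory_operations_total[(memory_type, "retrieve")] += 1
    memory_retrieval_latency_seconds.observe(seconds)
```

The bucket bounds chosen here line up with the 5 ms, 50 ms, and 100 ms performance targets, which makes target breaches visible directly in the bucket counts.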

---

#### Sub-Issue #62-15: Documentation & Final Testing
**Priority:** P0
**Estimated Complexity:** Medium

**Deliverables:**
- [ ] README with architecture overview
- [ ] API documentation with examples
- [ ] Integration guide
- [ ] E2E tests for full memory lifecycle
- [ ] Achieve >90% code coverage
- [ ] Performance benchmarks

---

## Implementation Order

```
Phase 1 (Foundation) - Sequential
  #62-1 → #62-2 → #62-3

Phase 2 (Memory Types) - Can parallelize after Phase 1
  #62-4, #62-5, #62-6 (parallel after #62-3)

Phase 3 (Advanced) - Sequential within phase
  #62-7 → #62-8 → #62-9

Phase 4 (Integration) - After Phase 2
  #62-10 → #62-11 → #62-12

Phase 5 (Quality) - Final
  #62-13, #62-14, #62-15
```
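
The ordering above can be checked mechanically with a topological sort. The sketch below encodes only the dependencies the plan states explicitly, using the standard-library `graphlib` module. Phase 5 is left out because its prerequisites are given only as "Final", and no prerequisite is assumed for #62-7.

```python
from graphlib import TopologicalSorter

# Each key depends on every sub-issue in its value set.
depends_on = {
    "62-2": {"62-1"},
    "62-3": {"62-2"},
    "62-4": {"62-3"},
    "62-5": {"62-3"},
    "62-6": {"62-3"},
    "62-8": {"62-7"},
    "62-9": {"62-8"},
    "62-10": {"62-4", "62-5", "62-6"},  # Phase 4 starts after Phase 2
    "62-11": {"62-10"},
    "62-12": {"62-11"},
}

# static_order() raises graphlib.CycleError if the plan contains a cycle.
order = list(TopologicalSorter(depends_on).static_order())
```

Any valid order places #62-1 through #62-3 first in sequence and #62-10 after all three memory types. The relative position of independent items such as #62-7 is not fixed.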

---

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Working memory get/set | <5ms | P95 |
| Episodic memory retrieval | <100ms | P95, as per epic |
| Semantic memory search | <100ms | P95 |
| Procedural memory matching | <50ms | P95 |
| Consolidation batch | <30s | Per 1000 episodes |
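
Since most targets are stated as P95, benchmark code needs a consistent percentile definition. A small sketch using `statistics.quantiles` is shown below. The target keys and the `meets_target` helper are hypothetical, and the inclusive interpolation method is one reasonable choice among several.

```python
import statistics

# P95 latency targets in milliseconds, taken from the table above.
TARGETS_MS = {
    "working_get_set": 5,
    "episodic_retrieval": 100,
    "semantic_search": 100,
    "procedural_matching": 50,
}


def p95(samples_ms):
    """95th percentile of at least two samples, using inclusive interpolation."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def meets_target(name, samples_ms):
    """True if the measured P95 is strictly below the target for `name`."""
    return p95(samples_ms) < TARGETS_MS[name]
```

A single slow outlier among 100 samples does not fail a P95 target, whereas ten slow samples out of 100 do.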

---

## Risk Mitigation

1. **Embedding costs** - Use caching aggressively, batch embeddings
2. **Storage growth** - Implement TTL, pruning, and archival policies
3. **Query performance** - HNSW indexes, pagination, query optimization
4. **Scope complexity** - Start simple (instance scope only), add hierarchy later

---

## Review Checkpoints

After each sub-issue:
1. Run `make validate-all`
2. Multi-agent code review
3. Verify E2E stack still works
4. Commit with granular message