syndarix/docs/architecture/memory-system-plan.md
Felipe Cardoso 085a748929 feat(memory): #87 project setup & core architecture
Implements Sub-Issue #87 of Issue #62 (Agent Memory System).

Core infrastructure:
- memory/types.py: Type definitions for all memory types (Working, Episodic,
  Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome
- memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton
- memory/exceptions.py: Comprehensive exception hierarchy for memory operations
- memory/manager.py: MemoryManager facade with placeholder methods

Directory structure:
- working/: Working memory (Redis/in-memory) - to be implemented in #89
- episodic/: Episodic memory (experiences) - to be implemented in #90
- semantic/: Semantic memory (facts) - to be implemented in #91
- procedural/: Procedural memory (skills) - to be implemented in #92
- scoping/: Scope management - to be implemented in #93
- indexing/: Vector indexing - to be implemented in #94
- consolidation/: Memory consolidation - to be implemented in #95

Tests: 71 unit tests for config, types, and exceptions
Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:27:36 +01:00


Agent Memory System - Implementation Plan

Issue #62 - Part of Epic #60 (Phase 2: MCP Integration)

Branch: feature/62-agent-memory-system
Parent Epic: #60 [EPIC] Phase 2: MCP Integration
Dependencies: #56 (LLM Gateway), #57 (Knowledge Base), #61 (Context Management Engine)


Executive Summary

The Agent Memory System provides multi-tier cognitive memory for AI agents, enabling them to:

  • Maintain state during the current task or session (Working Memory)
  • Learn from past experiences (Episodic Memory)
  • Store and retrieve facts (Semantic Memory)
  • Develop and reuse procedures (Procedural Memory)

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                           Agent Memory System                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────┐                      ┌─────────────────┐               │
│  │ Working Memory  │───────────────────▶  │ Episodic Memory │               │
│  │ (Redis/In-Mem)  │    consolidate       │  (PostgreSQL)   │               │
│  │                 │                      │                 │               │
│  │ • Current task  │                      │ • Past sessions │               │
│  │ • Variables     │                      │ • Experiences   │               │
│  │ • Scratchpad    │                      │ • Outcomes      │               │
│  └─────────────────┘                      └────────┬────────┘               │
│                                                    │                         │
│                                           extract  │                         │
│                                                    ▼                         │
│  ┌─────────────────┐                      ┌─────────────────┐               │
│  │Procedural Memory│◀─────────────────────│ Semantic Memory │               │
│  │  (PostgreSQL)   │      learn from      │  (PostgreSQL +  │               │
│  │                 │                      │    pgvector)    │               │
│  │ • Procedures    │                      │                 │               │
│  │ • Skills        │                      │ • Facts         │               │
│  │ • Patterns      │                      │ • Entities      │               │
│  └─────────────────┘                      │ • Relationships │               │
│                                           └─────────────────┘               │
└─────────────────────────────────────────────────────────────────────────────┘

Memory Scoping Hierarchy

Global Memory (shared by all)
└── Project Memory (per project)
    └── Agent Type Memory (per agent type)
        └── Agent Instance Memory (per instance)
            └── Session Memory (ephemeral)

Sub-Issue Breakdown

Phase 1: Foundation (Critical Path)

Sub-Issue #62-1: Project Setup & Core Architecture

Priority: P0 - Must complete first
Estimated Complexity: Medium

Tasks:

  • Create backend/app/services/memory/ directory structure
  • Create __init__.py with public API exports
  • Create config.py with MemorySettings (Pydantic)
  • Define base interfaces in types.py:
    • MemoryItem - Base class for all memory items
    • MemoryScope - Enum for scoping levels
    • MemoryStore - Abstract base for storage backends
  • Create manager.py with MemoryManager class (facade)
  • Create exceptions.py with memory-specific errors
  • Write ADR-010 documenting memory architecture decisions
  • Create dependency injection setup
  • Unit tests for configuration and types

Deliverables:

  • Directory structure matching existing patterns (like context/, safety/)
  • Configuration with MEM_ env prefix
  • Type definitions for all memory concepts
  • Comprehensive unit tests

Sub-Issue #62-2: Database Schema & Storage Layer

Priority: P0 - Required for all memory types
Estimated Complexity: High

Database Tables:

  1. working_memory - Ephemeral key-value storage

    • id (UUID, PK)
    • scope_type (ENUM: global/project/agent_type/agent_instance/session)
    • scope_id (VARCHAR - the ID for the scope level)
    • key (VARCHAR)
    • value (JSONB)
    • expires_at (TIMESTAMP WITH TZ)
    • created_at, updated_at
  2. episodes - Experiential memories

    • id (UUID, PK)
    • project_id (UUID, FK)
    • agent_instance_id (UUID, FK, nullable)
    • agent_type_id (UUID, FK, nullable)
    • session_id (VARCHAR)
    • task_type (VARCHAR)
    • task_description (TEXT)
    • actions (JSONB)
    • context_summary (TEXT)
    • outcome (ENUM: success/failure/partial)
    • outcome_details (TEXT)
    • duration_seconds (FLOAT)
    • tokens_used (BIGINT)
    • lessons_learned (JSONB - list of strings)
    • importance_score (FLOAT, 0-1)
    • embedding (VECTOR(1536))
    • occurred_at (TIMESTAMP WITH TZ)
    • created_at, updated_at
  3. facts - Semantic knowledge

    • id (UUID, PK)
    • project_id (UUID, FK, nullable - null for global)
    • subject (VARCHAR)
    • predicate (VARCHAR)
    • object (TEXT)
    • confidence (FLOAT, 0-1)
    • source_episode_ids (UUID[])
    • first_learned (TIMESTAMP WITH TZ)
    • last_reinforced (TIMESTAMP WITH TZ)
    • reinforcement_count (INT)
    • embedding (VECTOR(1536))
    • created_at, updated_at
  4. procedures - Learned skills

    • id (UUID, PK)
    • project_id (UUID, FK, nullable)
    • agent_type_id (UUID, FK, nullable)
    • name (VARCHAR)
    • trigger_pattern (TEXT)
    • steps (JSONB)
    • success_count (INT)
    • failure_count (INT)
    • last_used (TIMESTAMP WITH TZ)
    • embedding (VECTOR(1536))
    • created_at, updated_at
  5. memory_consolidation_log - Consolidation tracking

    • id (UUID, PK)
    • consolidation_type (ENUM)
    • source_count (INT)
    • result_count (INT)
    • started_at, completed_at
    • status (ENUM: pending/running/completed/failed)
    • error (TEXT, nullable)

Tasks:

  • Create SQLAlchemy models in backend/app/models/memory/
  • Create Alembic migration with all tables
  • Add pgvector indexes (HNSW for episodes, facts, procedures)
  • Create repository classes in backend/app/crud/memory/
  • Add composite indexes for common query patterns
  • Unit tests for all repositories

Sub-Issue #62-3: Working Memory Implementation

Priority: P0 - Core functionality
Estimated Complexity: Medium

Components:

  • backend/app/services/memory/working/memory.py - WorkingMemory class
  • backend/app/services/memory/working/storage.py - Redis + in-memory backend

Features:

  • Session-scoped containers with automatic cleanup
  • Variable storage (get/set/delete)
  • Task state tracking (current step, status, progress)
  • Scratchpad for reasoning steps
  • Configurable capacity limits
  • TTL-based expiration
  • Checkpoint/snapshot support for recovery
  • Redis primary storage with in-memory fallback

API:

from __future__ import annotations  # annotations stay unevaluated, so TaskState need not be imported here

from typing import Any

class WorkingMemory:
    async def set(self, key: str, value: Any, ttl_seconds: int | None = None) -> None: ...
    async def get(self, key: str, default: Any = None) -> Any: ...
    async def delete(self, key: str) -> bool: ...
    async def exists(self, key: str) -> bool: ...
    async def list_keys(self, pattern: str = "*") -> list[str]: ...
    async def get_all(self) -> dict[str, Any]: ...
    async def clear(self) -> int: ...
    async def set_task_state(self, state: TaskState) -> None: ...
    async def get_task_state(self) -> TaskState | None: ...
    async def append_scratchpad(self, content: str) -> None: ...
    async def get_scratchpad(self) -> list[str]: ...
    async def create_checkpoint(self) -> str: ...  # Returns checkpoint ID
    async def restore_checkpoint(self, checkpoint_id: str) -> None: ...

Phase 2: Memory Types

Sub-Issue #62-4: Episodic Memory Implementation

Priority: P1
Estimated Complexity: High

Components:

  • backend/app/services/memory/episodic/memory.py - EpisodicMemory class
  • backend/app/services/memory/episodic/recorder.py - Episode recording
  • backend/app/services/memory/episodic/retrieval.py - Retrieval strategies

Features:

  • Episode recording during agent execution
  • Store task completions with context
  • Store failures with error context
  • Retrieval by semantic similarity (vector search)
  • Retrieval by recency
  • Retrieval by outcome (success/failure)
  • Importance scoring based on outcome significance
  • Episode summarization for long-term storage

API:

from __future__ import annotations  # Episode, EpisodeCreate, Outcome are referenced by name only

from datetime import datetime
from uuid import UUID

class EpisodicMemory:
    async def record_episode(self, episode: EpisodeCreate) -> Episode: ...
    async def search_similar(self, query: str, limit: int = 10) -> list[Episode]: ...
    async def get_recent(self, limit: int = 10, since: datetime | None = None) -> list[Episode]: ...
    async def get_by_outcome(self, outcome: Outcome, limit: int = 10) -> list[Episode]: ...
    async def get_by_task_type(self, task_type: str, limit: int = 10) -> list[Episode]: ...
    async def update_importance(self, episode_id: UUID, score: float) -> None: ...
    async def summarize_episodes(self, episode_ids: list[UUID]) -> str: ...

Sub-Issue #62-5: Semantic Memory Implementation

Priority: P1
Estimated Complexity: High

Components:

  • backend/app/services/memory/semantic/memory.py - SemanticMemory class
  • backend/app/services/memory/semantic/extraction.py - Fact extraction from episodes
  • backend/app/services/memory/semantic/verification.py - Fact verification

Features:

  • Fact storage with triple format (subject, predicate, object)
  • Confidence scoring and decay
  • Fact extraction from episodic memory
  • Conflict resolution for contradictory facts
  • Retrieval by query (semantic search)
  • Retrieval by entity (subject or object)
  • Source tracking (which episodes contributed)
  • Reinforcement on repeated learning

API:

from __future__ import annotations  # Fact, FactCreate, Episode are referenced by name only

from uuid import UUID

class SemanticMemory:
    async def store_fact(self, fact: FactCreate) -> Fact: ...
    async def search_facts(self, query: str, limit: int = 10) -> list[Fact]: ...
    async def get_by_entity(self, entity: str, limit: int = 20) -> list[Fact]: ...
    async def reinforce_fact(self, fact_id: UUID) -> Fact: ...
    async def deprecate_fact(self, fact_id: UUID, reason: str) -> None: ...
    async def extract_facts_from_episode(self, episode: Episode) -> list[Fact]: ...
    async def resolve_conflict(self, fact_ids: list[UUID]) -> Fact: ...
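The plan names confidence decay and reinforcement but does not fix a formula. One plausible scheme, assumed here: confidence decays exponentially since `last_reinforced`, and each reinforcement first applies the decay and then closes a fixed fraction of the gap to 1.0. `HALF_LIFE_DAYS`, `REINFORCEMENT_BOOST`, and `FactState` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 30.0      # hypothetical: unreinforced confidence halves every 30 days
REINFORCEMENT_BOOST = 0.2  # hypothetical: share of the remaining gap to 1.0 gained per reinforcement


@dataclass
class FactState:
    # Subset of the `facts` columns relevant to scoring.
    subject: str
    predicate: str
    object: str
    confidence: float
    last_reinforced: datetime
    reinforcement_count: int = 0


def effective_confidence(fact: FactState, now: datetime) -> float:
    """Stored confidence decayed exponentially since the last reinforcement."""
    age_days = max((now - fact.last_reinforced).total_seconds() / 86400, 0.0)
    return fact.confidence * 0.5 ** (age_days / HALF_LIFE_DAYS)


def reinforce(fact: FactState, now: datetime) -> FactState:
    """Apply decay up to `now`, then move confidence part of the way toward 1.0."""
    current = effective_confidence(fact, now)
    fact.confidence = current + (1.0 - current) * REINFORCEMENT_BOOST
    fact.reinforcement_count += 1
    fact.last_reinforced = now
    return fact
```

Because the boost is proportional to the remaining gap, confidence approaches 1.0 but never exceeds it, however often a fact is reinforced.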

Sub-Issue #62-6: Procedural Memory Implementation

Priority: P2
Estimated Complexity: Medium

Components:

  • backend/app/services/memory/procedural/memory.py - ProceduralMemory class
  • backend/app/services/memory/procedural/matching.py - Procedure matching

Features:

  • Procedure recording from successful task patterns
  • Trigger pattern matching
  • Step-by-step procedure storage
  • Success/failure rate tracking
  • Procedure suggestion based on context
  • Procedure versioning

API:

from __future__ import annotations  # Procedure, ProcedureCreate, Step are referenced by name only

from uuid import UUID

class ProceduralMemory:
    async def record_procedure(self, procedure: ProcedureCreate) -> Procedure: ...
    async def find_matching(self, context: str, limit: int = 5) -> list[Procedure]: ...
    async def record_outcome(self, procedure_id: UUID, success: bool) -> None: ...
    async def get_best_procedure(self, task_type: str) -> Procedure | None: ...
    async def update_steps(self, procedure_id: UUID, steps: list[Step]) -> Procedure: ...

Phase 3: Advanced Features

Sub-Issue #62-7: Memory Scoping

Priority: P1
Estimated Complexity: Medium

Components:

  • backend/app/services/memory/scoping/scope.py - Scope management
  • backend/app/services/memory/scoping/resolver.py - Scope resolution

Features:

  • Global scope (shared across all)
  • Project scope (per project)
  • Agent type scope (per agent type)
  • Agent instance scope (per instance)
  • Session scope (ephemeral)
  • Scope inheritance (child sees parent memories)
  • Access control policies
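Scope inheritance can be sketched as a narrowest-first lookup along the chain of scopes a caller belongs to, so a session or instance value shadows project and global values for the same key. `ScopeLevel`, `Scope`, `resolve`, and the dict-backed store are hypothetical stand-ins, and access control is not modeled.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any


class ScopeLevel(IntEnum):
    # Ordered from broadest to narrowest, matching the scoping hierarchy.
    GLOBAL = 0
    PROJECT = 1
    AGENT_TYPE = 2
    AGENT_INSTANCE = 3
    SESSION = 4


@dataclass(frozen=True)
class Scope:
    level: ScopeLevel
    scope_id: str


def resolve(key: str, chain: list[Scope], store: dict[tuple[Scope, str], Any]) -> Any:
    """Look up `key` from the narrowest scope outward; child values shadow parents."""
    for scope in sorted(chain, key=lambda s: s.level, reverse=True):
        if (scope, key) in store:
            return store[(scope, key)]
    return None
```

Sorting by level means callers can pass the chain in any order. Starting with only the instance scope, as the risk mitigation section suggests, amounts to passing a one-element chain.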

Sub-Issue #62-8: Memory Indexing & Retrieval

Priority: P1
Estimated Complexity: High

Components:

  • backend/app/services/memory/indexing/index.py - Memory indexer
  • backend/app/services/memory/indexing/retrieval.py - Retrieval engine

Features:

  • Vector embeddings for all memory types
  • Temporal index (by time)
  • Entity index (by entities mentioned)
  • Outcome index (by success/failure)
  • Hybrid retrieval (vector + filters)
  • Relevance scoring
  • Retrieval caching
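One way to combine hybrid retrieval with relevance scoring, assumed for illustration: apply exact metadata filters first, then rank the survivors by a weighted sum of vector similarity, recency, and `importance_score`. The weights, the 72-hour recency half-life, and the `Candidate` type are hypothetical; `similarity` is assumed to come from a prior vector search and lie in [0, 1].

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Candidate:
    memory_id: str
    similarity: float   # from the vector search, assumed in [0, 1]
    importance: float   # importance_score, 0-1
    occurred_at: datetime
    metadata: dict[str, str] = field(default_factory=dict)


# Hypothetical weights; tuning them would be part of the retrieval work.
WEIGHTS = {"similarity": 0.6, "recency": 0.25, "importance": 0.15}
RECENCY_HALF_LIFE_HOURS = 72.0


def relevance(c: Candidate, now: datetime) -> float:
    age_hours = max((now - c.occurred_at).total_seconds() / 3600, 0.0)
    recency = 0.5 ** (age_hours / RECENCY_HALF_LIFE_HOURS)  # 1.0 now, 0.5 after 72h
    return (
        WEIGHTS["similarity"] * c.similarity
        + WEIGHTS["recency"] * recency
        + WEIGHTS["importance"] * c.importance
    )


def hybrid_retrieve(
    candidates: list[Candidate], filters: dict[str, str], now: datetime, limit: int = 10
) -> list[Candidate]:
    """Keep candidates matching every filter exactly, then rank by weighted relevance."""
    matching = [c for c in candidates if all(c.metadata.get(k) == v for k, v in filters.items())]
    return sorted(matching, key=lambda c: relevance(c, now), reverse=True)[:limit]
```

Applying filters before scoring means an excluded memory (here, a failure when the caller asked for successes) can never outrank a permitted one, however similar it is.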

Sub-Issue #62-9: Memory Consolidation

Priority: P2
Estimated Complexity: High

Components:

  • backend/app/services/memory/consolidation/service.py - Consolidation service
  • backend/app/tasks/memory_consolidation.py - Celery tasks

Features:

  • Working → Episodic transfer (session end)
  • Episodic → Semantic extraction (learn facts)
  • Episodic → Procedural extraction (learn procedures)
  • Nightly consolidation Celery tasks
  • Memory pruning (remove low-value)
  • Importance-based retention
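A sketch of the selection step of importance-based pruning, assuming three hypothetical rules: never prune episodes younger than a minimum age, prune only below an importance threshold, and keep any episode still cited by a fact's `source_episode_ids` so source tracking stays intact. The thresholds and the `EpisodeSummary` type are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(days=30)  # hypothetical: recent episodes are never pruned
IMPORTANCE_THRESHOLD = 0.3    # hypothetical cutoff on importance_score


@dataclass
class EpisodeSummary:
    episode_id: str
    importance_score: float
    occurred_at: datetime
    referenced_by_facts: bool = False  # listed in some fact's source_episode_ids


def select_for_pruning(episodes: list[EpisodeSummary], now: datetime) -> list[str]:
    """Return IDs of old, low-importance episodes that no fact depends on."""
    cutoff = now - MIN_AGE
    return [
        e.episode_id
        for e in episodes
        if e.occurred_at < cutoff
        and e.importance_score < IMPORTANCE_THRESHOLD
        and not e.referenced_by_facts
    ]
```

A nightly Celery task could pass the returned IDs to a batched delete or archival step and record the counts in `memory_consolidation_log`.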

Phase 4: Integration

Sub-Issue #62-10: MCP Tools Definition

Priority: P0 - Required for agent usage
Estimated Complexity: Medium

MCP Tools:

  1. remember - Store in memory

    {
      "memory_type": "working|episodic|semantic|procedural",
      "content": "...",
      "importance": 0.8,
      "ttl_seconds": 3600
    }
    
  2. recall - Retrieve from memory

    {
      "query": "...",
      "memory_types": ["episodic", "semantic"],
      "limit": 10,
      "filters": {"outcome": "success"}
    }
    
  3. forget - Remove from memory

    {
      "memory_type": "working",
      "key": "temp_calculation"
    }
    
  4. reflect - Analyze patterns

    {
      "analysis_type": "recent_patterns|success_factors|failure_patterns"
    }
    
  5. get_memory_stats - Usage statistics

  6. search_procedures - Find relevant procedures

  7. record_outcome - Record task success/failure
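A sketch of server-side validation for the `remember` arguments shown above. The field names come from the example payload; the rule that `ttl_seconds` applies only to working memory is an assumption (only the `working_memory` table has an `expires_at` column), and `validate_remember_args` is a hypothetical helper.

```python
from typing import Any

MEMORY_TYPES = {"working", "episodic", "semantic", "procedural"}


def validate_remember_args(args: dict[str, Any]) -> list[str]:
    """Return validation errors for a `remember` call (empty list if valid)."""
    errors: list[str] = []
    memory_type = args.get("memory_type")
    if memory_type not in MEMORY_TYPES:
        errors.append(f"memory_type must be one of {sorted(MEMORY_TYPES)}")
    content = args.get("content")
    if not isinstance(content, str) or not content.strip():
        errors.append("content must be a non-empty string")
    importance = args.get("importance")
    if importance is not None and (
        isinstance(importance, bool)  # bool is an int subclass; reject it explicitly
        or not isinstance(importance, (int, float))
        or not 0.0 <= importance <= 1.0
    ):
        errors.append("importance must be a number between 0 and 1")
    ttl = args.get("ttl_seconds")
    if ttl is not None:
        if isinstance(ttl, bool) or not isinstance(ttl, int) or ttl <= 0:
            errors.append("ttl_seconds must be a positive integer")
        elif memory_type != "working":
            # Assumption: TTLs only apply to working memory.
            errors.append("ttl_seconds is only supported for working memory")
    return errors
```

Returning all errors at once, rather than raising on the first, gives the calling agent one complete message to correct its tool call against.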


Sub-Issue #62-11: Component Integration

Priority: P1
Estimated Complexity: Medium

Integrations:

  • Context Engine (#61) - Include relevant memories in context assembly
  • Knowledge Base (#57) - Coordinate with KB to avoid duplication
  • LLM Gateway (#56) - Use for embedding generation
  • Agent lifecycle hooks (spawn, pause, resume, terminate)

Sub-Issue #62-12: Caching Layer

Priority: P2
Estimated Complexity: Medium

Features:

  • Hot memory caching (frequently accessed)
  • Retrieval result caching
  • Embedding caching
  • Cache invalidation strategies
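A minimal sketch of embedding caching: an LRU cache keyed by a SHA-256 hash of the model name and text. Content-addressed keys sidestep most invalidation, because edited text hashes to a new key and stale entries simply age out. The `embed` callable stands in for a call through the LLM Gateway (#56); `EmbeddingCache` and its parameters are hypothetical.

```python
import hashlib
from collections import OrderedDict
from collections.abc import Callable


class EmbeddingCache:
    """LRU cache of embeddings keyed by a hash of (model, text)."""

    def __init__(self, embed: Callable[[str], list[float]], model: str, max_entries: int = 10_000) -> None:
        self._embed = embed
        self._model = model
        self._max_entries = max_entries
        self._entries: OrderedDict[str, list[float]] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _key(self, text: str) -> str:
        # Including the model name keeps vectors from different models apart.
        return hashlib.sha256(f"{self._model}\x00{text}".encode()).hexdigest()

    def get(self, text: str) -> list[float]:
        key = self._key(text)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        vector = self._embed(text)
        self._entries[key] = vector
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return vector
```

The `hits` and `misses` counters are the kind of signal the metrics work (#62-14) could export.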

Phase 5: Intelligence & Quality

Sub-Issue #62-13: Memory Reflection

Priority: P3
Estimated Complexity: High

Features:

  • Pattern detection in episodic memory
  • Success/failure factor analysis
  • Anomaly detection
  • Insights generation
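A simple statistical baseline for success/failure factor analysis, assumed here as a starting point: group episodes by `task_type`, compute the success rate, and flag task types with enough history whose rate falls at or below a threshold. The thresholds and `failure_hotspots` helper are hypothetical; richer pattern detection or LLM-generated insights would build on top of this.

```python
from collections import Counter, defaultdict


def failure_hotspots(
    episodes: list[dict[str, str]],
    min_episodes: int = 5,
    max_success_rate: float = 0.5,
) -> dict[str, float]:
    """Map task types with enough history and a low success rate to that rate."""
    outcomes: defaultdict[str, Counter[str]] = defaultdict(Counter)
    for ep in episodes:
        outcomes[ep["task_type"]][ep["outcome"]] += 1
    hotspots: dict[str, float] = {}
    for task_type, counter in outcomes.items():
        total = sum(counter.values())
        if total < min_episodes:
            continue  # too little evidence to call it a pattern
        success_rate = counter["success"] / total
        if success_rate <= max_success_rate:
            hotspots[task_type] = success_rate
    return hotspots
```

The `min_episodes` guard keeps a couple of early failures from being reported as a pattern.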

Sub-Issue #62-14: Metrics & Observability

Priority: P2
Estimated Complexity: Low

Metrics:

  • memory_size_bytes by type and scope
  • memory_operations_total counter
  • memory_retrieval_latency_seconds histogram
  • memory_consolidation_duration_seconds histogram
  • procedure_success_rate gauge

Sub-Issue #62-15: Documentation & Final Testing

Priority: P0
Estimated Complexity: Medium

Deliverables:

  • README with architecture overview
  • API documentation with examples
  • Integration guide
  • E2E tests for full memory lifecycle
  • Achieve >90% code coverage
  • Performance benchmarks

Implementation Order

Phase 1 (Foundation) - Sequential
  #62-1 → #62-2 → #62-3

Phase 2 (Memory Types) - Can parallelize after Phase 1
  #62-4, #62-5, #62-6 (parallel after #62-3)

Phase 3 (Advanced) - Sequential within phase
  #62-7 → #62-8 → #62-9

Phase 4 (Integration) - After Phase 2
  #62-10 → #62-11 → #62-12

Phase 5 (Quality) - Final
  #62-13, #62-14, #62-15

Performance Targets

Metric                        Target    Notes
Working memory get/set        <5ms      P95
Episodic memory retrieval     <100ms    P95, per the epic
Semantic memory search        <100ms    P95
Procedural memory matching    <50ms     P95
Consolidation batch           <30s      Per 1000 episodes
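The latency targets are stated as P95 values. A minimal benchmark helper for checking them, assuming a synchronous callable, is sketched below. Async operations such as the WorkingMemory API would need to be timed inside an event loop instead, and the `p95_ms` name is hypothetical.

```python
import statistics
import time
from collections.abc import Callable


def p95_ms(operation: Callable[[], object], iterations: int = 1000) -> float:
    """Run `operation` repeatedly and return its 95th-percentile latency in milliseconds."""
    if iterations < 2:
        raise ValueError("need at least two samples for a percentile")
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples, n=100)[94]
```

For example, `p95_ms(lambda: store.__setitem__("k", "v"))` times a plain dict write, a lower bound on what the in-memory working memory path could achieve.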

Risk Mitigation

  1. Embedding costs - Use caching aggressively, batch embeddings
  2. Storage growth - Implement TTL, pruning, and archival policies
  3. Query performance - HNSW indexes, pagination, query optimization
  4. Scope complexity - Start simple (instance scope only), add hierarchy later

Review Checkpoints

After each sub-issue:

  1. Run make validate-all
  2. Multi-agent code review
  3. Verify E2E stack still works
  4. Commit with granular message