docs: add architecture spikes and deep analysis documentation

Add comprehensive spike research documents:
- SPIKE-002: Agent Orchestration Pattern (LangGraph + Temporal hybrid)
- SPIKE-006: Knowledge Base pgvector (RAG with hybrid search)
- SPIKE-007: Agent Communication Protocol (JSON-RPC + Redis Streams)
- SPIKE-008: Workflow State Machine (transitions lib + event sourcing)
- SPIKE-009: Issue Synchronization (bi-directional sync with conflict resolution)
- SPIKE-010: Cost Tracking (LiteLLM callbacks + budget enforcement)
- SPIKE-011: Audit Logging (structured event sourcing)
- SPIKE-012: Client Approval Flow (checkpoint-based approvals)

Add architecture documentation:
- ARCHITECTURE_DEEP_ANALYSIS.md: Memory management, security, testing strategy
- IMPLEMENTATION_ROADMAP.md: 6-phase, 24-week implementation plan

Closes #2, #6, #7, #8, #9, #10, #11, #12

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@@ -0,0 +1,680 @@
# Syndarix Architecture Deep Analysis
**Version:** 1.0
**Date:** 2025-12-29
**Status:** Draft - Architectural Thinking
---
## Executive Summary
This document captures deep architectural thinking about Syndarix beyond the immediate spikes. It addresses complex challenges that arise when building a truly autonomous multi-agent system and proposes solutions based on first principles.
---
## 1. Agent Memory and Context Management
### The Challenge
Agents in Syndarix may work on projects for weeks or months. LLM context windows are finite (128K-200K tokens), but project context grows without bound. How do we maintain a coherent agent "memory" over time?
### Analysis
**Context Window Constraints:**
| Model | Context Window | Practical Limit (with tools) |
|-------|---------------|------------------------------|
| Claude 3.5 Sonnet | 200K tokens | ~150K usable |
| GPT-4 Turbo | 128K tokens | ~100K usable |
| Llama 3 / 3.1 (70B) | 8K / 128K tokens | up to ~80K usable (128K variants) |
**Memory Types Needed:**
1. **Working Memory** - Current task context (fits in context window)
2. **Short-term Memory** - Recent conversation history (RAG-retrievable)
3. **Long-term Memory** - Project knowledge, past decisions (RAG + summarization)
4. **Episodic Memory** - Specific past events/mistakes to learn from
### Proposed Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Agent Memory System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Working │ │ Short-term │ │ Long-term │ │
│ │ Memory │ │ Memory │ │ Memory │ │
│ │ (Context) │ │ (Redis) │ │ (pgvector) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └───────────────────┼──────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Context Assembler │ │
│ │ │ │
│ │ 1. System prompt (agent personality, role) │ │
│ │ 2. Project context (from long-term memory) │ │
│ │ 3. Task context (current issue, requirements) │ │
│ │ 4. Relevant history (from short-term memory) │ │
│ │ 5. User message │ │
│ │ │ │
│ │ Total: Fit within context window limits │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Context Compression Strategy:**
```python
class ContextManager:
    """Manages agent context to fit within LLM limits."""

    MAX_CONTEXT_TOKENS = 100_000  # Leave room for the response

    async def build_context(
        self,
        agent: AgentInstance,
        task: Task,
        user_message: str,
    ) -> list[Message]:
        # Fixed costs
        system_prompt = self._get_system_prompt(agent)  # ~2K tokens
        task_context = self._get_task_context(task)     # ~1K tokens

        # Variable budget
        remaining = self.MAX_CONTEXT_TOKENS - token_count(
            system_prompt, task_context, user_message
        )

        # Allocate the remaining budget across memory tiers
        long_term = await self._query_long_term(agent, task, budget=remaining * 0.4)
        short_term = await self._get_short_term(agent, budget=remaining * 0.4)
        episodic = await self._get_relevant_episodes(agent, task, budget=remaining * 0.2)

        return self._assemble_messages(
            system_prompt, task_context, long_term, short_term,
            episodic, user_message,
        )
```
**Conversation Summarization:**
- After every N turns (e.g., 10), summarize the conversation and archive it
- Use a smaller, cheaper model for summarization
- Store summaries in pgvector for semantic retrieval (see the sketch below)
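A minimal sketch of that loop, assuming a cheaper `cheap_llm` client, an `embed` helper, and a pgvector-backed `summary_store` (all hypothetical names):
```python
SUMMARIZE_EVERY_N_TURNS = 10

async def maybe_summarize(agent_id: str, history: list[dict]) -> None:
    """Archive older turns as a summary once the history grows past N turns."""
    if len(history) < SUMMARIZE_EVERY_N_TURNS:
        return
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    # Use a smaller, cheaper model for the summary itself
    summary = await cheap_llm.complete(
        system="Summarize this agent conversation; keep decisions and open questions.",
        user=transcript,
    )
    # Store with an embedding so the summary is semantically retrievable later
    await summary_store.add(agent_id=agent_id, text=summary, embedding=await embed(summary))
    history.clear()  # the stored summary now stands in for the archived turns
```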
### Recommendation
Implement a **tiered memory system** with automatic context compression and semantic retrieval. Use Redis for hot short-term memory, pgvector for cold long-term memory, and automatic summarization to prevent context overflow.
---
## 2. Cross-Project Knowledge Sharing
### The Challenge
Each project has isolated knowledge, but agents could benefit from cross-project learnings:
- Common patterns (authentication, testing, CI/CD)
- Technology expertise (how to configure Kubernetes)
- Anti-patterns (what didn't work before)
### Analysis
**Privacy Considerations:**
- Client data must remain isolated (contractual, legal)
- Technical patterns are generally shareable
- Need clear data classification
**Knowledge Categories:**
| Category | Scope | Examples |
|----------|-------|----------|
| **Client Data** | Project-only | Requirements, business logic, code |
| **Technical Patterns** | Global | Best practices, configurations |
| **Agent Learnings** | Global | What approaches worked/failed |
| **Anti-patterns** | Global | Common mistakes to avoid |
### Proposed Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Knowledge Graph │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ GLOBAL KNOWLEDGE │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Patterns │ │ Anti-patterns│ │ Expertise │ │ │
│ │ │ Library │ │ Library │ │ Index │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ Curated extraction │
│ │ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Project A │ │ Project B │ │ Project C │ │
│ │ Knowledge │ │ Knowledge │ │ Knowledge │ │
│ │ (Isolated) │ │ (Isolated) │ │ (Isolated) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Knowledge Extraction Pipeline:**
```python
class KnowledgeExtractor:
    """Extracts shareable learnings from project work."""

    async def extract_learnings(self, project_id: str) -> list[Learning]:
        """
        Run periodically or after sprints to extract learnings.
        Human review is required before promoting to global.
        """
        # Get completed work
        completed_issues = await self.get_completed_issues(project_id)

        # Extract patterns using an LLM
        patterns = await self.llm.extract_patterns(
            completed_issues,
            categories=["architecture", "testing", "deployment", "security"],
        )

        # Classify privacy
        for pattern in patterns:
            pattern.privacy_level = await self.llm.classify_privacy(pattern)

        # Return only shareable patterns for review
        return [p for p in patterns if p.privacy_level == "public"]
```
### Recommendation
Implement **privacy-aware knowledge extraction** with human review gate. Project knowledge stays isolated by default; only explicitly approved patterns flow to global knowledge.
---
## 3. Agent Specialization vs Generalization Trade-offs
### The Challenge
Should each agent type be highly specialized (depth) or have overlapping capabilities (breadth)?
### Analysis
**Specialization Benefits:**
- Deeper expertise in domain
- Cleaner system prompts
- Less confusion about responsibilities
- Easier to optimize prompts per role
**Generalization Benefits:**
- Fewer agent types to maintain
- Smoother handoffs (shared context)
- More flexible team composition
- Graceful degradation if agent unavailable
**Current Agent Types (10):**
| Role | Primary Domain | Potential Overlap |
|------|---------------|-------------------|
| Product Owner | Requirements | Business Analyst |
| Business Analyst | Documentation | Product Owner |
| Project Manager | Planning | Product Owner |
| Software Architect | Design | Software Engineer |
| Software Engineer | Coding | Architect, QA |
| UI/UX Designer | Interface | Frontend Engineer |
| QA Engineer | Testing | Software Engineer |
| DevOps Engineer | Infrastructure | Software Engineer |
| AI/ML Engineer | ML/AI | Software Engineer |
| Security Expert | Security | All |
### Proposed Approach: Layered Specialization
```
┌─────────────────────────────────────────────────────────────────┐
│ Agent Capability Layers │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 3: Role-Specific Expertise │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Product │ │ Architect│ │Engineer │ │ QA │ │
│ │ Owner │ │ │ │ │ │ │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ Layer 2: Shared Professional Skills │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Technical Communication | Code Understanding | Git │ │
│ │ Documentation | Research | Problem Decomposition │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ Layer 1: Foundation Model Capabilities │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Reasoning | Analysis | Writing | Coding (LLM Base) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Capability Inheritance:**
```python
class AgentTypeBuilder:
    """Builds agent types with layered capabilities."""

    BASE_CAPABILITIES = [
        "reasoning", "analysis", "writing", "coding_assist",
    ]

    PROFESSIONAL_SKILLS = [
        "technical_communication", "code_understanding",
        "git_operations", "documentation", "research",
    ]

    ROLE_SPECIFIC = {
        "ENGINEER": ["code_generation", "code_review", "testing", "debugging"],
        "ARCHITECT": ["system_design", "adr_writing", "tech_selection"],
        "QA": ["test_planning", "test_automation", "bug_reporting"],
        # ...
    }

    def build_capabilities(self, role: AgentRole) -> list[str]:
        return (
            self.BASE_CAPABILITIES
            + self.PROFESSIONAL_SKILLS
            + self.ROLE_SPECIFIC[role]
        )
```
### Recommendation
Adopt **layered specialization** where all agents share foundational and professional capabilities, with role-specific expertise on top. This enables smooth collaboration while maintaining clear responsibilities.
---
## 4. Human-Agent Collaboration Model
### The Challenge
Beyond approval gates, how do humans effectively collaborate with autonomous agents during active work?
### Interaction Patterns
| Pattern | Use Case | Frequency |
|---------|----------|-----------|
| **Approval** | Confirm before action | Per checkpoint |
| **Guidance** | Steer direction | On-demand |
| **Override** | Correct mistake | Rare |
| **Pair Working** | Work together | Optional |
| **Review** | Evaluate output | Post-completion |
### Proposed Collaboration Interface
```
┌─────────────────────────────────────────────────────────────────┐
│ Human-Agent Collaboration Dashboard │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Activity Stream │ │
│ │ ────────────────────────────────────────────────────── │ │
│ │ [10:23] Dave (Engineer) is implementing login API │ │
│ │ [10:24] Dave created auth/service.py │ │
│ │ [10:25] Dave is writing unit tests │ │
│ │ [LIVE] Dave: "I'm adding JWT validation. Using HS256..." │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Intervention Panel │ │
│ │ │ │
│ │ [💬 Chat] [⏸️ Pause] [↩️ Undo Last] [📝 Guide] │ │
│ │ │ │
│ │ Quick Guidance: │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ "Use RS256 instead of HS256 for JWT signing" │ │ │
│ │ │ [Send] 📤 │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Intervention API:**
```python
@router.post("/agents/{agent_id}/intervene")
async def intervene(
    agent_id: UUID,
    intervention: InterventionRequest,
    current_user: User = Depends(get_current_user),
):
    """Allow a human to intervene in agent work."""
    match intervention.type:
        case "pause":
            await orchestrator.pause_agent(agent_id)
        case "resume":
            await orchestrator.resume_agent(agent_id)
        case "guide":
            await orchestrator.send_guidance(agent_id, intervention.message)
        case "undo":
            await orchestrator.undo_last_action(agent_id)
        case "override":
            await orchestrator.override_decision(agent_id, intervention.decision)
```
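The `InterventionRequest` model is not shown here; a plausible Pydantic sketch, with field names assumed from the handler above:
```python
from typing import Literal
from pydantic import BaseModel

class InterventionRequest(BaseModel):
    """Request body for the intervene endpoint (field names assumed from the handler)."""
    type: Literal["pause", "resume", "guide", "undo", "override"]
    message: str | None = None    # used by "guide"
    decision: dict | None = None  # used by "override"
```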
### Recommendation
Build a **real-time collaboration dashboard** with intervention capabilities. Humans should be able to observe, guide, pause, and correct agents without stopping the entire workflow.
---
## 5. Testing Strategy for Autonomous AI Systems
### The Challenge
Traditional testing (unit, integration, E2E) doesn't capture autonomous agent behavior. How do we ensure quality?
### Testing Pyramid for AI Agents
```
                 ╱╲
                ╱  ╲    E2E Agent Scenarios
               ╱    ╲   (full workflows)
              ╱──────╲
             ╱        ╲    Integration Tests (with mocks)
            ╱          ╲   (deterministic LLM responses)
           ╱────────────╲
          ╱              ╲    Unit Tests (no LLM needed)
         ╱                ╲   (orchestrator, services, pure logic)
        ╱──────────────────╲
       ╱                    ╲    Prompt Testing (LLM evals)
      ╱                      ╲   (system prompt quality metrics)
     ╱────────────────────────╲
```
### Test Categories
**1. Prompt Testing (Eval Framework):**
```python
class PromptEvaluator:
    """Evaluate system prompt quality."""

    TEST_CASES = [
        EvalCase(
            name="requirement_extraction",
            input="Client wants a mobile app for food delivery",
            expected_behaviors=[
                "asks clarifying questions",
                "identifies stakeholders",
                "considers non-functional requirements",
            ],
        ),
        EvalCase(
            name="code_review_thoroughness",
            input="Review this PR: [vulnerable SQL code]",
            expected_behaviors=[
                "identifies SQL injection",
                "suggests parameterized queries",
                "mentions security best practices",
            ],
        ),
    ]

    async def evaluate(self, agent_type: AgentType) -> EvalReport:
        results = []
        for case in self.TEST_CASES:
            response = await self.llm.complete(
                system=agent_type.system_prompt,
                user=case.input,
            )
            score = await self.judge_response(response, case.expected_behaviors)
            results.append(score)
        return EvalReport(results)
```
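`judge_response` is left undefined above; a minimal LLM-as-judge method sketch (the JSON-verdict prompt format is an assumption):
```python
import json

async def judge_response(self, response: str, expected_behaviors: list[str]) -> float:
    """Score a response 0-1: the fraction of expected behaviors a judge model confirms."""
    verdict = await self.llm.complete(
        system=(
            "You are a strict evaluator. For each expected behavior, answer true or "
            "false whether the response exhibits it. Reply with a JSON list of booleans."
        ),
        user=f"Response:\n{response}\n\nExpected behaviors:\n{json.dumps(expected_behaviors)}",
    )
    checks = json.loads(verdict)
    return sum(checks) / len(checks)
```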
**2. Integration Testing (Mock LLM):**
```python
import pytest

@pytest.fixture
def mock_llm():
    """Deterministic LLM responses for integration tests."""
    responses = {
        "analyze requirements": "...",
        "generate code": "def hello(): return 'world'",
        "review code": "LGTM",
    }
    return MockLLM(responses)

@pytest.mark.asyncio  # assumes pytest-asyncio
async def test_story_implementation_workflow(mock_llm):
    """Test the full workflow with predictable responses."""
    orchestrator = AgentOrchestrator(llm=mock_llm)
    result = await orchestrator.execute_workflow(
        workflow="implement_story",
        inputs={"story_id": "TEST-123"},
    )
    assert result.status == "completed"
    assert "hello" in result.artifacts["code"]
```
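`MockLLM` is likewise assumed; a minimal version could match canned responses on prompt substrings:
```python
class MockLLM:
    """Deterministic stand-in for an LLM client in integration tests (sketch)."""

    def __init__(self, responses: dict[str, str]):
        self.responses = responses

    async def complete(self, system: str = "", user: str = "") -> str:
        # Return the canned response whose key appears in the prompt
        for key, response in self.responses.items():
            if key in user:
                return response
        raise KeyError(f"no canned response for prompt: {user!r}")
```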
**3. Agent Scenario Testing:**
```python
class AgentScenarioTest:
    """End-to-end agent behavior testing."""

    @scenario("engineer_handles_bug_report")
    async def test_bug_resolution(self):
        """The engineer agent should fix bugs correctly."""
        # Setup
        project = await create_test_project()
        engineer = await spawn_agent("engineer", project)

        # Act
        bug = await create_issue(
            project,
            title="Login button not working",
            type="bug",
        )
        result = await engineer.handle(bug)

        # Assert
        assert result.pr_created
        assert result.tests_pass
        assert "button" in result.changes_summary.lower()
```
### Recommendation
Implement a **multi-layer testing strategy** with prompt evals, deterministic integration tests, and scenario-based agent testing. Use LLM-as-judge for evaluating open-ended responses.
---
## 6. Rollback and Recovery
### The Challenge
Autonomous agents will make mistakes. How do we recover gracefully?
### Error Categories
| Category | Example | Recovery Strategy |
|----------|---------|-------------------|
| **Reversible** | Wrong code generated | Revert commit, regenerate |
| **Partially Reversible** | Merged bad PR | Revert PR, fix, re-merge |
| **Non-reversible** | Deployed to production | Forward-fix or rollback deploy |
| **External Side Effects** | Email sent to client | Apology + correction |
### Recovery Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Recovery System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Action Log │ │
│ │ ┌──────────────────────────────────────────────────┐ │ │
│ │ │ Action ID | Agent | Type | Reversible | State │ │ │
│ │ ├──────────────────────────────────────────────────┤ │ │
│ │ │ a-001 | Dave | commit | Yes | completed │ │ │
│ │ │ a-002 | Dave | push | Yes | completed │ │ │
│ │ │ a-003 | Dave | create_pr | Yes | completed │ │ │
│ │ │ a-004 | Kate | merge_pr | Partial | completed │ │ │
│ │ └──────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Rollback Engine │ │
│ │ │ │
│ │ rollback_to(action_id) -> Reverses all actions after │ │
│ │ undo_action(action_id) -> Reverses single action │ │
│ │ compensate(action_id) -> Creates compensating action │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Action Logging:**
```python
from datetime import datetime, timezone
from uuid import UUID, uuid4

class ActionLog:
    """Immutable log of all agent actions for recovery."""

    async def record(
        self,
        agent_id: UUID,
        action_type: str,
        inputs: dict,
        outputs: dict,
        reversible: bool,
        reverse_action: str | None = None,
    ) -> ActionRecord:
        record = ActionRecord(
            id=uuid4(),
            agent_id=agent_id,
            action_type=action_type,
            inputs=inputs,
            outputs=outputs,
            reversible=reversible,
            reverse_action=reverse_action,
            timestamp=datetime.now(timezone.utc),
        )
        await self.db.add(record)
        return record

    async def rollback_to(self, action_id: UUID) -> RollbackResult:
        """Rollback all actions after the given action."""
        actions = await self.get_actions_after(action_id)
        results = []
        # Undo in reverse chronological order
        for action in reversed(actions):
            if action.reversible:
                result = await self._execute_reverse(action)
                results.append(result)
            else:
                results.append(RollbackSkipped(action, reason="non-reversible"))
        return RollbackResult(results)
```
**Compensation Pattern:**
```python
class CompensationEngine:
    """Handles compensating actions for non-reversible operations."""

    COMPENSATIONS = {
        "email_sent": "send_correction_email",
        "deployment": "rollback_deployment",
        "external_api_call": "create_reversal_request",
    }

    async def compensate(self, action: ActionRecord) -> CompensationResult:
        if action.action_type in self.COMPENSATIONS:
            compensation = self.COMPENSATIONS[action.action_type]
            return await self._execute_compensation(compensation, action)
        else:
            return CompensationResult(
                status="manual_required",
                message=f"No automatic compensation for {action.action_type}",
            )
```
### Recommendation
Implement **comprehensive action logging** with rollback capabilities. Define compensation strategies for non-reversible actions. Enable point-in-time recovery for project state.
---
## 7. Security Considerations for Autonomous Agents
### Threat Model
| Threat | Risk | Mitigation |
|--------|------|------------|
| Agent executes malicious code | High | Sandboxed execution, code review gates |
| Agent exfiltrates data | High | Network isolation, output filtering |
| Prompt injection via user input | Medium | Input sanitization, prompt hardening |
| Agent credential abuse | Medium | Least-privilege tokens, short TTL |
| Agent collusion | Low | Independent agent instances, monitoring |
### Security Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Security Layers │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 4: Output Filtering │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Code scan before commit │ │
│ │ - Secrets detection │ │
│ │ - Policy compliance check │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 3: Action Authorization │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Role-based permissions │ │
│ │ - Project scope enforcement │ │
│ │ - Sensitive action approval │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 2: Input Sanitization │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Prompt injection detection │ │
│ │ - Content filtering │ │
│ │ - Schema validation │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 1: Infrastructure Isolation │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Container sandboxing │ │
│ │ - Network segmentation │ │
│ │ - File system restrictions │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
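As a concrete illustration of Layer 4, a naive secrets-detection pass over agent output before commit might look like this sketch (patterns illustrative, not exhaustive; a dedicated scanner covers many more credential shapes):
```python
import re

# Illustrative patterns only; a real scanner covers many more credential shapes
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(?:api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}"),
]

def scan_for_secrets(diff: str) -> list[str]:
    """Return secret-like strings found in a diff; block the commit if non-empty."""
    findings: list[str] = []
    for pattern in SECRET_PATTERNS:
        findings.extend(m.group(0) for m in pattern.finditer(diff))
    return findings
```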
### Recommendation
Implement **defense-in-depth** with multiple security layers. Assume agents can be compromised and design for containment.
---
## Summary of Recommendations
| Area | Recommendation | Priority |
|------|----------------|----------|
| Memory | Tiered memory with context compression | High |
| Knowledge | Privacy-aware extraction with human gate | Medium |
| Specialization | Layered capabilities with role-specific top | Medium |
| Collaboration | Real-time dashboard with intervention | High |
| Testing | Multi-layer with prompt evals | High |
| Recovery | Action logging with rollback engine | High |
| Security | Defense-in-depth, assume compromise | High |
---
## Next Steps
1. **Validate with spike research** - Update based on spike findings
2. **Create detailed ADRs** - For memory, recovery, security
3. **Prototype critical paths** - Memory system, rollback engine
4. **Security review** - External audit before production
---
*This document captures architectural thinking to guide implementation. It should be updated as spikes complete and design evolves.*


@@ -0,0 +1,339 @@
# Syndarix Implementation Roadmap
**Version:** 1.0
**Date:** 2025-12-29
**Status:** Draft
---
## Executive Summary
This roadmap outlines the phased implementation approach for Syndarix, prioritizing foundational infrastructure before advanced features. Each phase builds upon the previous, with clear milestones and deliverables.
---
## Phase 0: Foundation (Weeks 1-2)
**Goal:** Establish development infrastructure and basic platform
### 0.1 Repository Setup
- [x] Fork PragmaStack to Syndarix
- [x] Create spike backlog in Gitea
- [x] Complete architecture documentation
- [ ] Rebrand codebase (Issue #13 - in progress)
- [ ] Configure CI/CD pipelines
- [ ] Set up development environment documentation
### 0.2 Core Infrastructure
- [ ] Configure Redis for cache + pub/sub
- [ ] Set up Celery worker infrastructure
- [ ] Configure pgvector extension
- [ ] Create MCP server directory structure
- [ ] Set up Docker Compose for local development
### Deliverables
- Fully branded Syndarix repository
- Working local development environment
- CI/CD pipeline running tests
---
## Phase 1: Core Platform (Weeks 3-6)
**Goal:** Basic project and agent management without LLM integration
### 1.1 Data Model
- [ ] Create Project entity and CRUD
- [ ] Create AgentType entity and CRUD
- [ ] Create AgentInstance entity and CRUD
- [ ] Create Issue entity with external tracker fields
- [ ] Create Sprint entity and CRUD
- [ ] Database migrations with Alembic
### 1.2 API Layer
- [ ] Project management endpoints
- [ ] Agent type configuration endpoints
- [ ] Agent instance management endpoints
- [ ] Issue CRUD endpoints
- [ ] Sprint management endpoints
### 1.3 Real-time Infrastructure
- [ ] Implement EventBus with Redis Pub/Sub (see the sketch after this list)
- [ ] Create SSE endpoint for project events
- [ ] Implement event types enum
- [ ] Add keepalive mechanism
- [ ] Client-side SSE handling
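
A minimal sketch of the EventBus-to-SSE pairing, assuming `redis-py`'s asyncio client and FastAPI (the channel naming is hypothetical):
```python
import redis.asyncio as redis
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
r = redis.Redis()

@app.get("/projects/{project_id}/events")
async def project_events(project_id: str):
    async def stream():
        pubsub = r.pubsub()
        await pubsub.subscribe(f"project:{project_id}:events")
        try:
            while True:
                msg = await pubsub.get_message(ignore_subscribe_messages=True, timeout=15.0)
                if msg is None:
                    yield ": keepalive\n\n"  # SSE comment keeps the connection open
                else:
                    yield f"data: {msg['data'].decode()}\n\n"
        finally:
            await pubsub.unsubscribe()
    return StreamingResponse(stream(), media_type="text/event-stream")
```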
### 1.4 Frontend Foundation
- [ ] Project dashboard page
- [ ] Agent configuration UI
- [ ] Issue list and detail views
- [ ] Real-time activity feed component
- [ ] Basic navigation and layout
### Deliverables
- CRUD operations for all core entities
- Real-time event streaming working
- Basic admin UI for configuration
---
## Phase 2: MCP Integration (Weeks 7-10)
**Goal:** Build MCP servers for external integrations
### 2.1 MCP Client Infrastructure
- [ ] Create MCPClientManager class
- [ ] Implement server registry
- [ ] Add connection management with reconnection
- [ ] Create tool call routing
### 2.2 LLM Gateway MCP (Priority 1)
- [ ] Create FastMCP server structure
- [ ] Implement LiteLLM integration
- [ ] Add model group routing
- [ ] Implement failover chain
- [ ] Add cost tracking callbacks
- [ ] Create token usage logging
### 2.3 Knowledge Base MCP (Priority 2)
- [ ] Create pgvector schema for embeddings
- [ ] Implement document ingestion pipeline
- [ ] Create chunking strategies (code, markdown, text)
- [ ] Implement semantic search
- [ ] Add hybrid search (vector + keyword; see the sketch after this list)
- [ ] Per-project collection isolation
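
One possible shape for the hybrid search, sketched under assumed table and column names (`chunks` with `tsv` and `embedding`):
```python
from sqlalchemy import text

# Blend pgvector cosine similarity with Postgres full-text rank, weighted 70/30.
# Table and column names are assumptions for this sketch.
HYBRID_SEARCH = text("""
    SELECT id, content,
           0.7 * (1 - (embedding <=> CAST(:qvec AS vector)))
         + 0.3 * ts_rank(tsv, plainto_tsquery(:qtext)) AS score
    FROM chunks
    WHERE project_id = :project_id          -- per-project isolation
    ORDER BY score DESC
    LIMIT :k
""")

async def hybrid_search(session, project_id: str, qtext: str, qvec: list[float], k: int = 10):
    result = await session.execute(
        HYBRID_SEARCH,
        {"qvec": str(qvec), "qtext": qtext, "project_id": project_id, "k": k},
    )
    return result.fetchall()
```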
### 2.4 Git MCP (Priority 3)
- [ ] Create Git operations wrapper
- [ ] Implement clone, commit, push operations
- [ ] Add branch management
- [ ] Create PR operations
- [ ] Add Gitea API integration
- [ ] Implement GitHub/GitLab adapters
### 2.5 Issues MCP (Priority 4)
- [ ] Create issue sync service
- [ ] Implement Gitea issue operations
- [ ] Add GitHub issue adapter
- [ ] Add GitLab issue adapter
- [ ] Implement bi-directional sync
- [ ] Create conflict resolution logic
### Deliverables
- 4 working MCP servers
- LLM calls routed through gateway
- RAG search functional
- Git operations working
- Issue sync with external trackers
---
## Phase 3: Agent Orchestration (Weeks 11-14)
**Goal:** Enable agents to perform autonomous work
### 3.1 Agent Runner
- [ ] Create AgentRunner class
- [ ] Implement context assembly
- [ ] Add memory management (short-term, long-term)
- [ ] Implement action execution
- [ ] Add tool call handling
- [ ] Create agent error handling
### 3.2 Agent Orchestrator
- [ ] Implement spawn_agent method
- [ ] Create terminate_agent method
- [ ] Implement send_message routing
- [ ] Add broadcast functionality
- [ ] Create agent status tracking
- [ ] Implement agent recovery
### 3.3 Inter-Agent Communication
- [ ] Define message format schema
- [ ] Implement message persistence
- [ ] Create message routing logic
- [ ] Add @mention parsing
- [ ] Implement priority queues
- [ ] Add conversation threading
### 3.4 Background Task Integration
- [ ] Create Celery task wrappers
- [ ] Implement progress reporting
- [ ] Add task chaining for workflows
- [ ] Create agent queue routing
- [ ] Implement task retry logic
### Deliverables
- Agents can be spawned and communicate
- Agents can call MCP tools
- Background tasks for long operations
- Agent activity visible in real-time
---
## Phase 4: Workflow Engine (Weeks 15-18)
**Goal:** Implement structured workflows for software delivery
### 4.1 State Machine Foundation
- [ ] Create workflow state machine base (see the sketch after this list)
- [ ] Implement state persistence
- [ ] Add transition validation
- [ ] Create state history logging
- [ ] Implement compensation patterns
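
SPIKE-008 proposes the `transitions` library; a minimal workflow state machine sketch with it (states and triggers here are assumptions, not the final workflow design):
```python
from transitions import Machine

class StoryWorkflow:
    """Sketch: story-implementation state machine on the `transitions` library."""

    states = ["planned", "in_progress", "in_review", "approved", "done", "failed"]

    def __init__(self):
        self.machine = Machine(
            model=self,
            states=StoryWorkflow.states,
            initial="planned",
            after_state_change="persist_state",  # hook for persistence / event sourcing
        )
        self.machine.add_transition("start", "planned", "in_progress")
        self.machine.add_transition("submit", "in_progress", "in_review")
        self.machine.add_transition("approve", "in_review", "approved")
        self.machine.add_transition("merge", "approved", "done")
        self.machine.add_transition("fail", "*", "failed")

    def persist_state(self):
        ...  # append the state-change event to the audit log
```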
### 4.2 Core Workflows
- [ ] Requirements Discovery workflow
- [ ] Architecture Spike workflow
- [ ] Sprint Planning workflow
- [ ] Story Implementation workflow
- [ ] Sprint Demo workflow
### 4.3 Approval Gates
- [ ] Create approval checkpoint system
- [ ] Implement approval UI components
- [ ] Add notification triggers
- [ ] Create timeout handling
- [ ] Implement escalation logic
### 4.4 Autonomy Levels
- [ ] Implement FULL_CONTROL mode
- [ ] Implement MILESTONE mode
- [ ] Implement AUTONOMOUS mode
- [ ] Create autonomy configuration UI
- [ ] Add per-action approval overrides
### Deliverables
- Structured workflows executing
- Approval gates working
- Autonomy levels configurable
- Full sprint cycle possible
---
## Phase 5: Advanced Features (Weeks 19-22)
**Goal:** Polish and production readiness
### 5.1 Cost Management
- [ ] Real-time cost tracking dashboard (cost callback sketch after this list)
- [ ] Budget configuration per project
- [ ] Alert threshold system
- [ ] Cost optimization recommendations
- [ ] Historical cost analytics
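
The cost-tracking spike builds on LiteLLM callbacks; a minimal success-callback sketch (the `budget_store` service and metadata layout are assumptions):
```python
import litellm

def track_cost(kwargs, completion_response, start_time, end_time):
    """LiteLLM success callback: record per-call spend against a project budget."""
    cost = litellm.completion_cost(completion_response=completion_response)
    # Project id travels in the call metadata (layout per LiteLLM callback docs)
    meta = kwargs.get("litellm_params", {}).get("metadata") or {}
    project_id = meta.get("project_id", "unknown")
    budget_store.add_spend(project_id, cost)  # budget_store is a hypothetical service
    if budget_store.remaining(project_id) <= 0:
        budget_store.flag_over_budget(project_id)  # enforcement handled by the orchestrator

litellm.success_callback = [track_cost]
```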
### 5.2 Audit & Compliance
- [ ] Comprehensive action logging
- [ ] Audit trail viewer UI
- [ ] Export functionality
- [ ] Retention policy implementation
- [ ] Compliance report generation
### 5.3 Human-Agent Collaboration
- [ ] Live activity dashboard
- [ ] Intervention panel (pause, guide, undo)
- [ ] Agent chat interface
- [ ] Context inspector
- [ ] Decision explainer
### 5.4 Additional MCP Servers
- [ ] File System MCP
- [ ] Code Analysis MCP
- [ ] CI/CD MCP
### Deliverables
- Production-ready system
- Full observability
- Cost controls active
- Audit compliance
---
## Phase 6: Polish & Launch (Weeks 23-24)
**Goal:** Production deployment
### 6.1 Performance Optimization
- [ ] Load testing
- [ ] Query optimization
- [ ] Caching optimization
- [ ] Memory profiling
### 6.2 Security Hardening
- [ ] Security audit
- [ ] Penetration testing
- [ ] Secrets management
- [ ] Rate limiting tuning
### 6.3 Documentation
- [ ] User documentation
- [ ] API documentation
- [ ] Deployment guide
- [ ] Runbook
### 6.4 Deployment
- [ ] Production environment setup
- [ ] Monitoring & alerting
- [ ] Backup & recovery
- [ ] Launch checklist
---
## Risk Register
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| LLM API outages | High | Medium | Multi-provider failover |
| Cost overruns | High | Medium | Budget enforcement, local models |
| Agent hallucinations | High | Medium | Approval gates, code review |
| Performance bottlenecks | Medium | Medium | Load testing, caching |
| Integration failures | Medium | Low | Contract testing, mocks |
---
## Success Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Agent task success rate | >90% | Completed tasks / total tasks |
| Response time (P95) | <2s | API latency |
| Cost per project | <$50/sprint | LLM + compute costs |
| Time to first commit | <1 hour | From requirements to PR |
| Client satisfaction | >4/5 | Post-sprint survey |
---
## Dependencies
```
Phase 0 ──▶ Phase 1 ──▶ Phase 2 ──▶ Phase 3 ──▶ Phase 4 ──▶ Phase 5 ──▶ Phase 6
Foundation  Core        MCP         Agent       Workflow    Advanced    Polish &
            Platform    Integration Orchestr.   Engine      Features    Launch

Phase 3 onward depends on the LLM Gateway, the Knowledge Base, and the
real-time event infrastructure delivered in Phases 1-2.
```
---
## Resource Requirements
### Development Team
- 1 Backend Engineer (Python/FastAPI)
- 1 Frontend Engineer (React/Next.js)
- 0.5 DevOps Engineer
- 0.25 Product Manager
### Infrastructure
- PostgreSQL (managed or self-hosted)
- Redis (managed or self-hosted)
- Celery workers (2-4 instances)
- MCP servers (7 containers)
- API server (2+ instances)
- Frontend (static hosting or SSR)
### External Services
- Anthropic API (primary LLM)
- OpenAI API (fallback)
- Ollama (local models, optional)
- Gitea/GitHub/GitLab (issue tracking)
---
*This roadmap will be refined as spikes complete and requirements evolve.*