Syndarix Architecture Deep Analysis

Version: 1.0 | Date: 2025-12-29 | Status: Draft - Architectural Thinking


Executive Summary

This document captures deep architectural thinking about Syndarix beyond the immediate spikes. It addresses complex challenges that arise when building a truly autonomous multi-agent system and proposes solutions based on first principles.


1. Agent Memory and Context Management

The Challenge

Agents in Syndarix may work on projects for weeks or months. LLM context windows are finite (128K-200K tokens), but project context grows without bound. How do we maintain coherent agent "memory" over time?

Analysis

Context Window Constraints:

| Model | Context Window | Practical Limit (with tools) |
|-------|----------------|------------------------------|
| Claude 3.5 Sonnet | 200K tokens | ~150K usable |
| GPT-4 Turbo | 128K tokens | ~100K usable |
| Llama 3 (70B) | 8K-128K tokens | ~80K usable |

Memory Types Needed:

  1. Working Memory - Current task context (fits in context window)
  2. Short-term Memory - Recent conversation history (RAG-retrievable)
  3. Long-term Memory - Project knowledge, past decisions (RAG + summarization)
  4. Episodic Memory - Specific past events/mistakes to learn from

Proposed Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     Agent Memory System                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐        │
│  │   Working    │   │  Short-term  │   │  Long-term   │        │
│  │   Memory     │   │   Memory     │   │   Memory     │        │
│  │  (Context)   │   │   (Redis)    │   │  (pgvector)  │        │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘        │
│         │                   │                  │                 │
│         └───────────────────┼──────────────────┘                │
│                             │                                    │
│                             ▼                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                  Context Assembler                        │  │
│  │                                                           │  │
│  │  1. System prompt (agent personality, role)               │  │
│  │  2. Project context (from long-term memory)               │  │
│  │  3. Task context (current issue, requirements)            │  │
│  │  4. Relevant history (from short-term memory)             │  │
│  │  5. User message                                          │  │
│  │                                                           │  │
│  │  Total: Fit within context window limits                  │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Context Compression Strategy:

class ContextManager:
    """Manages agent context to fit within LLM limits."""

    MAX_CONTEXT_TOKENS = 100_000  # Leave room for response

    async def build_context(
        self,
        agent: AgentInstance,
        task: Task,
        user_message: str
    ) -> list[Message]:
        # Fixed costs
        system_prompt = self._get_system_prompt(agent)  # ~2K tokens
        task_context = self._get_task_context(task)     # ~1K tokens

        # Variable budget
        remaining = self.MAX_CONTEXT_TOKENS - token_count(system_prompt, task_context, user_message)

        # Allocate remaining to memories
        long_term = await self._query_long_term(agent, task, budget=remaining * 0.4)
        short_term = await self._get_short_term(agent, budget=remaining * 0.4)
        episodic = await self._get_relevant_episodes(agent, task, budget=remaining * 0.2)

        return self._assemble_messages(
            system_prompt, task_context, long_term, short_term, episodic, user_message
        )
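
The token_count helper used above is assumed rather than defined. A minimal sketch using tiktoken is shown below; the choice of tokenizer is an assumption (Anthropic models tokenize differently), so treat the result as a budgeting estimate rather than an exact count:

import tiktoken

def token_count(*texts: str, encoding_name: str = "cl100k_base") -> int:
    """Approximate token count across text fragments; exact counts vary by provider."""
    enc = tiktoken.get_encoding(encoding_name)
    return sum(len(enc.encode(text)) for text in texts)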

Conversation Summarization:

  • After every N turns (e.g., 10), summarize the conversation and archive it (see the sketch below)
  • Use smaller/cheaper model for summarization
  • Store summaries in pgvector for semantic retrieval
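
A minimal sketch of that summarization loop, assuming a cheap_llm client with the same complete() interface used elsewhere in this document and a hypothetical vector_store wrapper around pgvector:

SUMMARIZE_EVERY_N_TURNS = 10

async def maybe_summarize(agent_id: str, conversation: list[dict],
                          cheap_llm, vector_store) -> None:
    """Summarize the conversation every N turns and archive the summary."""
    if not conversation or len(conversation) % SUMMARIZE_EVERY_N_TURNS != 0:
        return

    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in conversation)
    summary = await cheap_llm.complete(
        system="Summarize this agent conversation in under 200 words.",
        user=transcript,
    )
    # Archive in pgvector so later context assembly can retrieve it semantically
    await vector_store.add(
        collection="conversation_summaries",
        text=summary,
        metadata={"agent_id": agent_id, "turns": len(conversation)},
    )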

Recommendation

Implement a tiered memory system with automatic context compression and semantic retrieval. Use Redis for hot short-term memory, pgvector for cold long-term memory, and automatic summarization to prevent context overflow.
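
As a rough illustration of the hot tier, short-term entries could live in Redis lists with a TTL and be read back newest-first. The key layout, list cap, and 24-hour TTL below are assumptions, not settled design:

import json
import redis.asyncio as redis

r = redis.Redis()  # connection settings omitted
SHORT_TERM_TTL = 60 * 60 * 24  # keep hot memory for 24 hours

async def remember(agent_id: str, entry: dict) -> None:
    """Push a memory entry onto the agent's short-term list."""
    key = f"agent:{agent_id}:short_term"
    await r.lpush(key, json.dumps(entry))
    await r.ltrim(key, 0, 199)          # cap the list at 200 entries
    await r.expire(key, SHORT_TERM_TTL)

async def recall(agent_id: str, limit: int = 20) -> list[dict]:
    """Return the most recent short-term entries, newest first."""
    raw = await r.lrange(f"agent:{agent_id}:short_term", 0, limit - 1)
    return [json.loads(item) for item in raw]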


2. Cross-Project Knowledge Sharing

The Challenge

Each project has isolated knowledge, but agents could benefit from cross-project learnings:

  • Common patterns (authentication, testing, CI/CD)
  • Technology expertise (how to configure Kubernetes)
  • Anti-patterns (what didn't work before)

Analysis

Privacy Considerations:

  • Client data must remain isolated (contractual, legal)
  • Technical patterns are generally shareable
  • Need clear data classification

Knowledge Categories:

| Category | Scope | Examples |
|----------|-------|----------|
| Client Data | Project-only | Requirements, business logic, code |
| Technical Patterns | Global | Best practices, configurations |
| Agent Learnings | Global | What approaches worked/failed |
| Anti-patterns | Global | Common mistakes to avoid |
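
One way to make that classification explicit in code - the names here are illustrative, not an existing Syndarix API:

from dataclasses import dataclass
from enum import Enum

class KnowledgeScope(Enum):
    PROJECT_ONLY = "project_only"   # client data: never leaves the project
    GLOBAL = "global"               # curated patterns and learnings

@dataclass
class KnowledgeEntry:
    category: str                   # e.g. "technical_pattern", "anti_pattern"
    scope: KnowledgeScope
    content: str
    source_project_id: str | None = None   # retained for audit even after promotion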

Proposed Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Knowledge Graph                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                   GLOBAL KNOWLEDGE                       │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │   │
│  │  │  Patterns   │  │ Anti-patterns│  │  Expertise  │      │   │
│  │  │   Library   │  │   Library   │  │   Index     │      │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘      │   │
│  └─────────────────────────────────────────────────────────┘   │
│                             ▲                                    │
│                             │ Curated extraction                 │
│                             │                                    │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐             │
│  │  Project A  │  │  Project B  │  │  Project C  │             │
│  │  Knowledge  │  │  Knowledge  │  │  Knowledge  │             │
│  │  (Isolated) │  │  (Isolated) │  │  (Isolated) │             │
│  └─────────────┘  └─────────────┘  └─────────────┘             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Knowledge Extraction Pipeline:

class KnowledgeExtractor:
    """Extracts shareable learnings from project work."""

    async def extract_learnings(self, project_id: str) -> list[Learning]:
        """
        Run periodically or after sprints to extract learnings.
        Human review required before promoting to global.
        """
        # Get completed work
        completed_issues = await self.get_completed_issues(project_id)

        # Extract patterns using LLM
        patterns = await self.llm.extract_patterns(
            completed_issues,
            categories=["architecture", "testing", "deployment", "security"]
        )

        # Classify privacy
        for pattern in patterns:
            pattern.privacy_level = await self.llm.classify_privacy(pattern)

        # Return only shareable patterns for review
        return [p for p in patterns if p.privacy_level == "public"]

Recommendation

Implement privacy-aware knowledge extraction with human review gate. Project knowledge stays isolated by default; only explicitly approved patterns flow to global knowledge.
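
A sketch of that review gate: extracted learnings are queued for a human decision, and only approved items are written to the global store. All class and method names here are hypothetical:

class KnowledgePromotionGate:
    """Holds extracted learnings until a human reviewer approves promotion."""

    def __init__(self, review_queue, global_store):
        self.review_queue = review_queue   # pending-review storage (e.g. a DB table)
        self.global_store = global_store   # global patterns library

    async def submit(self, learning) -> None:
        # Nothing reaches global knowledge without passing through this queue
        await self.review_queue.add(learning, status="pending")

    async def resolve(self, learning_id: str, approved: bool, reviewer: str) -> None:
        learning = await self.review_queue.get(learning_id)
        if approved:
            await self.global_store.add(learning, approved_by=reviewer)
        await self.review_queue.set_status(
            learning_id, "approved" if approved else "rejected"
        )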


3. Agent Specialization vs Generalization Trade-offs

The Challenge

Should each agent type be highly specialized (depth) or have overlapping capabilities (breadth)?

Analysis

Specialization Benefits:

  • Deeper expertise in domain
  • Cleaner system prompts
  • Less confusion about responsibilities
  • Easier to optimize prompts per role

Generalization Benefits:

  • Fewer agent types to maintain
  • Smoother handoffs (shared context)
  • More flexible team composition
  • Graceful degradation if agent unavailable

Current Agent Types (10):

| Role | Primary Domain | Potential Overlap |
|------|----------------|-------------------|
| Product Owner | Requirements | Business Analyst |
| Business Analyst | Documentation | Product Owner |
| Project Manager | Planning | Product Owner |
| Software Architect | Design | Senior Engineer |
| Software Engineer | Coding | Architect, QA |
| UI/UX Designer | Interface | Frontend Engineer |
| QA Engineer | Testing | Software Engineer |
| DevOps Engineer | Infrastructure | Senior Engineer |
| AI/ML Engineer | ML/AI | Software Engineer |
| Security Expert | Security | All |

Proposed Approach: Layered Specialization

┌─────────────────────────────────────────────────────────────────┐
│                   Agent Capability Layers                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Layer 3: Role-Specific Expertise                                │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐               │
│  │ Product │ │ Architect│ │Engineer │ │   QA    │               │
│  │ Owner   │ │         │ │         │ │         │               │
│  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘               │
│       │           │           │           │                      │
│  Layer 2: Shared Professional Skills                             │
│  ┌──────────────────────────────────────────────────────┐       │
│  │ Technical Communication | Code Understanding | Git   │       │
│  │ Documentation | Research | Problem Decomposition     │       │
│  └──────────────────────────────────────────────────────┘       │
│                             │                                    │
│  Layer 1: Foundation Model Capabilities                          │
│  ┌──────────────────────────────────────────────────────┐       │
│  │ Reasoning | Analysis | Writing | Coding (LLM Base)   │       │
│  └──────────────────────────────────────────────────────┘       │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Capability Inheritance:

class AgentTypeBuilder:
    """Builds agent types with layered capabilities."""

    BASE_CAPABILITIES = [
        "reasoning", "analysis", "writing", "coding_assist"
    ]

    PROFESSIONAL_SKILLS = [
        "technical_communication", "code_understanding",
        "git_operations", "documentation", "research"
    ]

    ROLE_SPECIFIC = {
        "ENGINEER": ["code_generation", "code_review", "testing", "debugging"],
        "ARCHITECT": ["system_design", "adr_writing", "tech_selection"],
        "QA": ["test_planning", "test_automation", "bug_reporting"],
        # ...
    }

    def build_capabilities(self, role: AgentRole) -> list[str]:
        # Assumes AgentRole is a str-valued enum matching the keys above;
        # roles without a role-specific entry simply get the shared layers
        return (
            self.BASE_CAPABILITIES +
            self.PROFESSIONAL_SKILLS +
            self.ROLE_SPECIFIC.get(role, [])
        )

Recommendation

Adopt layered specialization where all agents share foundational and professional capabilities, with role-specific expertise on top. This enables smooth collaboration while maintaining clear responsibilities.


4. Human-Agent Collaboration Model

The Challenge

Beyond approval gates, how do humans effectively collaborate with autonomous agents during active work?

Interaction Patterns

| Pattern | Use Case | Frequency |
|---------|----------|-----------|
| Approval | Confirm before action | Per checkpoint |
| Guidance | Steer direction | On-demand |
| Override | Correct mistake | Rare |
| Pair Working | Work together | Optional |
| Review | Evaluate output | Post-completion |

Proposed Collaboration Interface

┌─────────────────────────────────────────────────────────────────┐
│                Human-Agent Collaboration Dashboard               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                    Activity Stream                       │   │
│  │  ────────────────────────────────────────────────────── │   │
│  │  [10:23] Dave (Engineer) is implementing login API      │   │
│  │  [10:24] Dave created auth/service.py                   │   │
│  │  [10:25] Dave is writing unit tests                     │   │
│  │  [LIVE] Dave: "I'm adding JWT validation. Using HS256..." │  │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                   Intervention Panel                     │   │
│  │                                                         │   │
│  │  [💬 Chat]  [⏸️ Pause]  [↩️ Undo Last]  [📝 Guide]     │   │
│  │                                                         │   │
│  │  Quick Guidance:                                        │   │
│  │  ┌─────────────────────────────────────────────────┐   │   │
│  │  │ "Use RS256 instead of HS256 for JWT signing"    │   │   │
│  │  │                                    [Send] 📤    │   │   │
│  │  └─────────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Intervention API:

@router.post("/agents/{agent_id}/intervene")
async def intervene(
    agent_id: UUID,
    intervention: InterventionRequest,
    current_user: User = Depends(get_current_user)
):
    """Allow human to intervene in agent work."""
    match intervention.type:
        case "pause":
            await orchestrator.pause_agent(agent_id)
        case "resume":
            await orchestrator.resume_agent(agent_id)
        case "guide":
            await orchestrator.send_guidance(agent_id, intervention.message)
        case "undo":
            await orchestrator.undo_last_action(agent_id)
        case "override":
            await orchestrator.override_decision(agent_id, intervention.decision)
        case _:
            raise HTTPException(status_code=400, detail="Unknown intervention type")
    return {"status": "accepted", "type": intervention.type}
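
The InterventionRequest body is not defined above; a minimal Pydantic sketch (field names are assumptions) might look like this:

from typing import Literal
from pydantic import BaseModel

class InterventionRequest(BaseModel):
    type: Literal["pause", "resume", "guide", "undo", "override"]
    message: str | None = None     # used by "guide"
    decision: dict | None = None   # used by "override"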

Recommendation

Build a real-time collaboration dashboard with intervention capabilities. Humans should be able to observe, guide, pause, and correct agents without stopping the entire workflow.


5. Testing Strategy for Autonomous AI Systems

The Challenge

Traditional testing (unit, integration, E2E) doesn't capture the open-ended, non-deterministic behavior of autonomous agents. How do we ensure quality?

Testing Pyramid for AI Agents

From the apex of the pyramid (few, expensive end-to-end runs) down to its base (many, cheap checks):

  1. Agent Scenario Tests - full agent workflows exercised end to end
  2. Integration Tests (with mocks) - tool + LLM integration against deterministic responses
  3. Unit Tests (no LLM needed) - orchestrator and services, pure logic
  4. Prompt Testing (LLM evals) - system prompt evaluation with quality metrics

Test Categories

1. Prompt Testing (Eval Framework):

class PromptEvaluator:
    """Evaluate system prompt quality."""

    TEST_CASES = [
        EvalCase(
            name="requirement_extraction",
            input="Client wants a mobile app for food delivery",
            expected_behaviors=[
                "asks clarifying questions",
                "identifies stakeholders",
                "considers non-functional requirements"
            ]
        ),
        EvalCase(
            name="code_review_thoroughness",
            input="Review this PR: [vulnerable SQL code]",
            expected_behaviors=[
                "identifies SQL injection",
                "suggests parameterized queries",
                "mentions security best practices"
            ]
        )
    ]

    async def evaluate(self, agent_type: AgentType) -> EvalReport:
        results = []
        for case in self.TEST_CASES:
            response = await self.llm.complete(
                system=agent_type.system_prompt,
                user=case.input
            )
            score = await self.judge_response(response, case.expected_behaviors)
            results.append(score)
        return EvalReport(results)

2. Integration Testing (Mock LLM):

@pytest.fixture
def mock_llm():
    """Deterministic LLM responses for integration tests."""
    responses = {
        "analyze requirements": "...",
        "generate code": "def hello(): return 'world'",
        "review code": "LGTM"
    }
    return MockLLM(responses)

async def test_story_implementation_workflow(mock_llm):
    """Test full workflow with predictable responses."""
    orchestrator = AgentOrchestrator(llm=mock_llm)

    result = await orchestrator.execute_workflow(
        workflow="implement_story",
        inputs={"story_id": "TEST-123"}
    )

    assert result.status == "completed"
    assert "hello" in result.artifacts["code"]
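
MockLLM is referenced but never defined; a minimal sketch that matches canned responses by substring (an assumption about how it is keyed) could be:

class MockLLM:
    """Canned LLM responses for integration tests - no network calls."""

    def __init__(self, responses: dict[str, str], default: str = "OK"):
        self.responses = responses
        self.default = default
        self.calls: list[dict] = []   # recorded so tests can assert on prompts

    async def complete(self, system: str = "", user: str = "") -> str:
        self.calls.append({"system": system, "user": user})
        for trigger, reply in self.responses.items():
            if trigger.lower() in user.lower():
                return reply
        return self.default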

3. Agent Scenario Testing:

class AgentScenarioTest:
    """End-to-end agent behavior testing."""

    @scenario("engineer_handles_bug_report")
    async def test_bug_resolution(self):
        """Engineer agent should fix bugs correctly."""
        # Setup
        project = await create_test_project()
        engineer = await spawn_agent("engineer", project)

        # Act
        bug = await create_issue(
            project,
            title="Login button not working",
            type="bug"
        )
        result = await engineer.handle(bug)

        # Assert
        assert result.pr_created
        assert result.tests_pass
        assert "button" in result.changes_summary.lower()

Recommendation

Implement a multi-layer testing strategy with prompt evals, deterministic integration tests, and scenario-based agent testing. Use LLM-as-judge for evaluating open-ended responses.
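
A sketch of the LLM-as-judge step (judge_response in the evaluator above), shown here as a standalone function; the judge model, prompt format, and JSON contract are assumptions:

import json

JUDGE_PROMPT = """You are grading an AI agent's response.

Expected behaviors:
{behaviors}

Response to grade:
{response}

Reply with JSON only: {{"scores": {{"<behavior>": 0 or 1}}, "explanation": "..."}}"""

async def judge_response(judge_llm, response: str, expected_behaviors: list[str]) -> float:
    """Score a response against expected behaviors; returns the fraction satisfied."""
    raw = await judge_llm.complete(
        system="You are a strict evaluator. Reply with JSON only.",
        user=JUDGE_PROMPT.format(
            behaviors="\n".join(f"- {b}" for b in expected_behaviors),
            response=response,
        ),
    )
    verdict = json.loads(raw)   # a production version would handle malformed JSON
    scores = verdict["scores"]
    return sum(scores.values()) / max(len(scores), 1)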


6. Rollback and Recovery

The Challenge

Autonomous agents will make mistakes. How do we recover gracefully?

Error Categories

| Category | Example | Recovery Strategy |
|----------|---------|-------------------|
| Reversible | Wrong code generated | Revert commit, regenerate |
| Partially Reversible | Merged bad PR | Revert PR, fix, re-merge |
| Non-reversible | Deployed to production | Forward-fix or rollback deploy |
| External Side Effects | Email sent to client | Apology + correction |

Recovery Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Recovery System                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                    Action Log                            │   │
│  │  ┌──────────────────────────────────────────────────┐   │   │
│  │  │ Action ID | Agent | Type | Reversible | State    │   │   │
│  │  ├──────────────────────────────────────────────────┤   │   │
│  │  │ a-001 | Dave | commit | Yes | completed          │   │   │
│  │  │ a-002 | Dave | push | Yes | completed            │   │   │
│  │  │ a-003 | Dave | create_pr | Yes | completed       │   │   │
│  │  │ a-004 | Kate | merge_pr | Partial | completed    │   │   │
│  │  └──────────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                 Rollback Engine                          │   │
│  │                                                         │   │
│  │  rollback_to(action_id) -> Reverses all actions after   │   │
│  │  undo_action(action_id) -> Reverses single action       │   │
│  │  compensate(action_id) -> Creates compensating action   │   │
│  │                                                         │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Action Logging:

class ActionLog:
    """Immutable log of all agent actions for recovery."""

    async def record(
        self,
        agent_id: UUID,
        action_type: str,
        inputs: dict,
        outputs: dict,
        reversible: bool,
        reverse_action: str | None = None
    ) -> ActionRecord:
        record = ActionRecord(
            id=uuid4(),
            agent_id=agent_id,
            action_type=action_type,
            inputs=inputs,
            outputs=outputs,
            reversible=reversible,
            reverse_action=reverse_action,
            timestamp=datetime.utcnow()
        )
        await self.db.add(record)
        return record

    async def rollback_to(self, action_id: UUID) -> RollbackResult:
        """Rollback all actions after the given action."""
        actions = await self.get_actions_after(action_id)

        results = []
        for action in reversed(actions):
            if action.reversible:
                result = await self._execute_reverse(action)
                results.append(result)
            else:
                results.append(RollbackSkipped(action, reason="non-reversible"))

        return RollbackResult(results)
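
The _execute_reverse call above is left abstract. One way to wire it up is a registry that maps action types to coroutines that undo them; the handler names below are illustrative:

class ReverseActionRegistry:
    """Maps action types to coroutines that know how to undo them."""

    def __init__(self):
        self._handlers = {}

    def register(self, action_type: str, handler) -> None:
        self._handlers[action_type] = handler

    async def execute_reverse(self, action):
        handler = self._handlers.get(action.action_type)
        if handler is None:
            raise LookupError(f"No reverse handler registered for {action.action_type}")
        # Each handler inverts its own action, e.g. a git revert for a commit
        return await handler(action)

# Example wiring (git_revert_commit would be implemented elsewhere):
# registry = ReverseActionRegistry()
# registry.register("commit", git_revert_commit)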

Compensation Pattern:

class CompensationEngine:
    """Handles compensating actions for non-reversible operations."""

    COMPENSATIONS = {
        "email_sent": "send_correction_email",
        "deployment": "rollback_deployment",
        "external_api_call": "create_reversal_request"
    }

    async def compensate(self, action: ActionRecord) -> CompensationResult:
        if action.action_type in self.COMPENSATIONS:
            compensation = self.COMPENSATIONS[action.action_type]
            return await self._execute_compensation(compensation, action)
        else:
            return CompensationResult(
                status="manual_required",
                message=f"No automatic compensation for {action.action_type}"
            )

Recommendation

Implement comprehensive action logging with rollback capabilities. Define compensation strategies for non-reversible actions. Enable point-in-time recovery for project state.


7. Security Considerations for Autonomous Agents

Threat Model

| Threat | Risk | Mitigation |
|--------|------|------------|
| Agent executes malicious code | High | Sandboxed execution, code review gates |
| Agent exfiltrates data | High | Network isolation, output filtering |
| Prompt injection via user input | Medium | Input sanitization, prompt hardening |
| Agent credential abuse | Medium | Least-privilege tokens, short TTL |
| Agent collusion | Low | Independent agent instances, monitoring |
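
As one concrete example of the input-sanitization mitigation, a lightweight heuristic pre-filter for prompt-injection attempts might look like the sketch below; the patterns are illustrative, and a production check would combine them with model-based classification:

import re

INJECTION_PATTERNS = [
    r"ignore (all|previous|prior) instructions",
    r"disregard .{0,40}system prompt",
    r"you are now .{0,40}(unrestricted|jailbroken)",
    r"reveal .{0,40}(system prompt|credentials|api key)",
]

def looks_like_prompt_injection(user_input: str) -> bool:
    """Heuristic pre-filter; flagged inputs go to review rather than straight to an agent."""
    lowered = user_input.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)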

Security Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Security Layers                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Layer 4: Output Filtering                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ - Code scan before commit                               │   │
│  │ - Secrets detection                                      │   │
│  │ - Policy compliance check                                │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│  Layer 3: Action Authorization                                   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ - Role-based permissions                                │   │
│  │ - Project scope enforcement                              │   │
│  │ - Sensitive action approval                              │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│  Layer 2: Input Sanitization                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ - Prompt injection detection                            │   │
│  │ - Content filtering                                      │   │
│  │ - Schema validation                                      │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│  Layer 1: Infrastructure Isolation                               │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ - Container sandboxing                                   │   │
│  │ - Network segmentation                                   │   │
│  │ - File system restrictions                               │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
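
To make Layer 4 concrete, a pre-commit output filter could scan generated code for obvious secrets before anything is pushed. The regexes below are illustrative, not exhaustive:

import re

SECRET_PATTERNS = {
    "aws_access_key": r"AKIA[0-9A-Z]{16}",
    "private_key": r"-----BEGIN (RSA |EC )?PRIVATE KEY-----",
    "generic_api_key": r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*\S{16,}",
}

def find_secrets(generated_output: str) -> list[str]:
    """Return the names of secret patterns detected in agent output."""
    return [
        name for name, pattern in SECRET_PATTERNS.items()
        if re.search(pattern, generated_output)
    ]

# The orchestrator would block the commit (or require human approval)
# whenever find_secrets() returns anything.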

Recommendation

Implement defense-in-depth with multiple security layers. Assume agents can be compromised and design for containment.


Summary of Recommendations

| Area | Recommendation | Priority |
|------|----------------|----------|
| Memory | Tiered memory with context compression | High |
| Knowledge | Privacy-aware extraction with human gate | Medium |
| Specialization | Layered capabilities with role-specific top | Medium |
| Collaboration | Real-time dashboard with intervention | High |
| Testing | Multi-layer with prompt evals | High |
| Recovery | Action logging with rollback engine | High |
| Security | Defense-in-depth, assume compromise | High |

Next Steps

  1. Validate with spike research - Update based on spike findings
  2. Create detailed ADRs - For memory, recovery, security
  3. Prototype critical paths - Memory system, rollback engine
  4. Security review - External audit before production

This document captures architectural thinking to guide implementation. It should be updated as spikes complete and design evolves.