Syndarix Architecture Deep Analysis
Version: 1.0 Date: 2025-12-29 Status: Draft - Architectural Thinking
Executive Summary
This document captures deep architectural thinking about Syndarix beyond the immediate spikes. It addresses complex challenges that arise when building a truly autonomous multi-agent system and proposes solutions based on first principles.
1. Agent Memory and Context Management
The Challenge
Agents in Syndarix may work on projects for weeks or months. LLM context windows are finite (128K-200K tokens), but project context grows without bound. How do we maintain coherent agent "memory" over time?
Analysis
Context Window Constraints:
| Model | Context Window | Practical Limit (with tools) |
|---|---|---|
| Claude 3.5 Sonnet | 200K tokens | ~150K usable |
| GPT-4 Turbo | 128K tokens | ~100K usable |
| Llama 3 / 3.1 (70B) | 8K (Llama 3) - 128K (Llama 3.1) tokens | ~80K usable |
Memory Types Needed:
- Working Memory - Current task context (fits in context window)
- Short-term Memory - Recent conversation history (RAG-retrievable)
- Long-term Memory - Project knowledge, past decisions (RAG + summarization)
- Episodic Memory - Specific past events/mistakes to learn from
Proposed Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Agent Memory System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Working │ │ Short-term │ │ Long-term │ │
│ │ Memory │ │ Memory │ │ Memory │ │
│ │ (Context) │ │ (Redis) │ │ (pgvector) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └───────────────────┼──────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Context Assembler │ │
│ │ │ │
│ │ 1. System prompt (agent personality, role) │ │
│ │ 2. Project context (from long-term memory) │ │
│ │ 3. Task context (current issue, requirements) │ │
│ │ 4. Relevant history (from short-term memory) │ │
│ │ 5. User message │ │
│ │ │ │
│ │ Total: Fit within context window limits │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Context Compression Strategy:
```python
class ContextManager:
    """Manages agent context to fit within LLM limits."""

    MAX_CONTEXT_TOKENS = 100_000  # Leave room for response

    async def build_context(
        self,
        agent: AgentInstance,
        task: Task,
        user_message: str
    ) -> list[Message]:
        # Fixed costs
        system_prompt = self._get_system_prompt(agent)  # ~2K tokens
        task_context = self._get_task_context(task)     # ~1K tokens

        # Variable budget
        remaining = self.MAX_CONTEXT_TOKENS - token_count(system_prompt, task_context, user_message)

        # Allocate remaining to memories
        long_term = await self._query_long_term(agent, task, budget=remaining * 0.4)
        short_term = await self._get_short_term(agent, budget=remaining * 0.4)
        episodic = await self._get_relevant_episodes(agent, task, budget=remaining * 0.2)

        return self._assemble_messages(
            system_prompt, task_context, long_term, short_term, episodic, user_message
        )
```
Conversation Summarization:
- After every N turns (e.g., 10), summarize conversation and archive
- Use smaller/cheaper model for summarization
- Store summaries in pgvector for semantic retrieval
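A minimal sketch of this summarization step, assuming a LiteLLM-style `acompletion` call (per SPIKE-010) and a hypothetical `store_summary` helper that embeds and writes to pgvector; the model name and turn threshold are illustrative:

```python
import litellm

SUMMARIZE_EVERY_N_TURNS = 10   # illustrative threshold
SUMMARY_MODEL = "gpt-4o-mini"  # any cheap summarization model

async def maybe_summarize(conversation: list[dict], agent_id: str, store_summary) -> list[dict]:
    """After every N turns, compress older turns into a summary and archive it for retrieval."""
    if len(conversation) < SUMMARIZE_EVERY_N_TURNS:
        return conversation

    old_turns, recent_turns = conversation[:-4], conversation[-4:]  # keep the last few turns verbatim
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old_turns)

    response = await litellm.acompletion(
        model=SUMMARY_MODEL,
        messages=[
            {"role": "system", "content": "Summarize this agent conversation, preserving decisions and open questions."},
            {"role": "user", "content": transcript},
        ],
    )
    summary = response.choices[0].message.content

    # store_summary is a hypothetical helper: embed the text and insert it into pgvector.
    await store_summary(agent_id=agent_id, text=summary)

    # Replace the archived turns with one compact summary message.
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent_turns
```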
Recommendation
Implement a tiered memory system with automatic context compression and semantic retrieval. Use Redis for hot short-term memory, pgvector for cold long-term memory, and automatic summarization to prevent context overflow.
2. Cross-Project Knowledge Sharing
The Challenge
Each project has isolated knowledge, but agents could benefit from cross-project learnings:
- Common patterns (authentication, testing, CI/CD)
- Technology expertise (how to configure Kubernetes)
- Anti-patterns (what didn't work before)
Analysis
Privacy Considerations:
- Client data must remain isolated (contractual, legal)
- Technical patterns are generally shareable
- Need clear data classification
Knowledge Categories:
| Category | Scope | Examples |
|---|---|---|
| Client Data | Project-only | Requirements, business logic, code |
| Technical Patterns | Global | Best practices, configurations |
| Agent Learnings | Global | What approaches worked/failed |
| Anti-patterns | Global | Common mistakes to avoid |
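One way to make this classification explicit in code is a scope enum on each knowledge entry that defaults to the most restrictive level; the names below are a sketch, not a settled schema:

```python
from dataclasses import dataclass
from enum import Enum

class KnowledgeScope(str, Enum):
    PROJECT_ONLY = "project_only"  # client data: requirements, business logic, code
    GLOBAL = "global"              # technical patterns, agent learnings, anti-patterns

@dataclass
class KnowledgeEntry:
    content: str
    category: str                                        # e.g. "client_data", "technical_pattern"
    scope: KnowledgeScope = KnowledgeScope.PROJECT_ONLY  # default to the restrictive scope
    reviewed_by: str | None = None                       # set once a human approves promotion
```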
Proposed Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Knowledge Graph │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ GLOBAL KNOWLEDGE │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Patterns │ │ Anti-patterns│ │ Expertise │ │ │
│ │ │ Library │ │ Library │ │ Index │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ Curated extraction │
│ │ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Project A │ │ Project B │ │ Project C │ │
│ │ Knowledge │ │ Knowledge │ │ Knowledge │ │
│ │ (Isolated) │ │ (Isolated) │ │ (Isolated) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Knowledge Extraction Pipeline:
```python
class KnowledgeExtractor:
    """Extracts shareable learnings from project work."""

    async def extract_learnings(self, project_id: str) -> list[Learning]:
        """
        Run periodically or after sprints to extract learnings.
        Human review required before promoting to global.
        """
        # Get completed work
        completed_issues = await self.get_completed_issues(project_id)

        # Extract patterns using LLM
        patterns = await self.llm.extract_patterns(
            completed_issues,
            categories=["architecture", "testing", "deployment", "security"]
        )

        # Classify privacy
        for pattern in patterns:
            pattern.privacy_level = await self.llm.classify_privacy(pattern)

        # Return only shareable patterns for review
        return [p for p in patterns if p.privacy_level == "public"]
```
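The human review gate itself could be a small API that promotes an approved learning into the global index. This is a sketch only; `get_current_admin`, `learning_repo`, and `global_knowledge` are hypothetical names, not part of the current design:

```python
from uuid import UUID
from fastapi import APIRouter, Depends, HTTPException

router = APIRouter()

@router.post("/learnings/{learning_id}/promote")
async def promote_learning(
    learning_id: UUID,
    current_user=Depends(get_current_admin),  # hypothetical: only privileged reviewers may promote
):
    """Human review gate: promote a reviewed project learning into the global knowledge base."""
    learning = await learning_repo.get(learning_id)  # hypothetical repository
    if learning is None or learning.privacy_level != "public":
        raise HTTPException(status_code=400, detail="Learning is not eligible for promotion")

    learning.promoted_by = current_user.id
    await global_knowledge.add(learning)  # copy into the global pgvector index
    return {"status": "promoted", "learning_id": str(learning_id)}
```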
Recommendation
Implement privacy-aware knowledge extraction with human review gate. Project knowledge stays isolated by default; only explicitly approved patterns flow to global knowledge.
3. Agent Specialization vs Generalization Trade-offs
The Challenge
Should each agent type be highly specialized (depth) or have overlapping capabilities (breadth)?
Analysis
Specialization Benefits:
- Deeper expertise in domain
- Cleaner system prompts
- Less confusion about responsibilities
- Easier to optimize prompts per role
Generalization Benefits:
- Fewer agent types to maintain
- Smoother handoffs (shared context)
- More flexible team composition
- Graceful degradation if agent unavailable
Current Agent Types (10):
| Role | Primary Domain | Potential Overlap |
|---|---|---|
| Product Owner | Requirements | Business Analyst |
| Business Analyst | Documentation | Product Owner |
| Project Manager | Planning | Product Owner |
| Software Architect | Design | Senior Engineer |
| Software Engineer | Coding | Architect, QA |
| UI/UX Designer | Interface | Frontend Engineer |
| QA Engineer | Testing | Software Engineer |
| DevOps Engineer | Infrastructure | Senior Engineer |
| AI/ML Engineer | ML/AI | Software Engineer |
| Security Expert | Security | All |
Proposed Approach: Layered Specialization
┌─────────────────────────────────────────────────────────────────┐
│ Agent Capability Layers │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 3: Role-Specific Expertise │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Product │ │ Architect│ │Engineer │ │ QA │ │
│ │ Owner │ │ │ │ │ │ │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ Layer 2: Shared Professional Skills │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Technical Communication | Code Understanding | Git │ │
│ │ Documentation | Research | Problem Decomposition │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ Layer 1: Foundation Model Capabilities │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Reasoning | Analysis | Writing | Coding (LLM Base) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Capability Inheritance:
```python
class AgentTypeBuilder:
    """Builds agent types with layered capabilities."""

    BASE_CAPABILITIES = [
        "reasoning", "analysis", "writing", "coding_assist"
    ]

    PROFESSIONAL_SKILLS = [
        "technical_communication", "code_understanding",
        "git_operations", "documentation", "research"
    ]

    ROLE_SPECIFIC = {
        "ENGINEER": ["code_generation", "code_review", "testing", "debugging"],
        "ARCHITECT": ["system_design", "adr_writing", "tech_selection"],
        "QA": ["test_planning", "test_automation", "bug_reporting"],
        # ...
    }

    def build_capabilities(self, role: AgentRole) -> list[str]:
        return (
            self.BASE_CAPABILITIES +
            self.PROFESSIONAL_SKILLS +
            self.ROLE_SPECIFIC[role]
        )
```
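For illustration, a system prompt could then be composed from these layers by mapping each capability to a prompt fragment; `CAPABILITY_PROMPTS` and `build_system_prompt` are assumed names for this sketch, not existing interfaces:

```python
# Hypothetical mapping from capability name to a reusable prompt fragment.
CAPABILITY_PROMPTS = {
    "code_generation": "You write production-quality code with tests.",
    "code_review": "You review diffs for correctness, style, and security.",
    "system_design": "You produce architecture proposals and ADRs.",
    # ... one fragment per capability
}

def build_system_prompt(builder: AgentTypeBuilder, role: str) -> str:
    """Compose a system prompt from base + professional + role-specific capability fragments."""
    fragments = [
        CAPABILITY_PROMPTS[cap]
        for cap in builder.build_capabilities(role)
        if cap in CAPABILITY_PROMPTS
    ]
    return "\n".join([f"You are the team's {role.title()}.", *fragments])

# e.g. build_system_prompt(AgentTypeBuilder(), "ENGINEER")
```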
Recommendation
Adopt layered specialization where all agents share foundational and professional capabilities, with role-specific expertise on top. This enables smooth collaboration while maintaining clear responsibilities.
4. Human-Agent Collaboration Model
The Challenge
Beyond approval gates, how do humans effectively collaborate with autonomous agents during active work?
Interaction Patterns
| Pattern | Use Case | Frequency |
|---|---|---|
| Approval | Confirm before action | Per checkpoint |
| Guidance | Steer direction | On-demand |
| Override | Correct mistake | Rare |
| Pair Working | Work together | Optional |
| Review | Evaluate output | Post-completion |
Proposed Collaboration Interface
┌─────────────────────────────────────────────────────────────────┐
│ Human-Agent Collaboration Dashboard │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Activity Stream │ │
│ │ ────────────────────────────────────────────────────── │ │
│ │ [10:23] Dave (Engineer) is implementing login API │ │
│ │ [10:24] Dave created auth/service.py │ │
│ │ [10:25] Dave is writing unit tests │ │
│ │ [LIVE] Dave: "I'm adding JWT validation. Using HS256..." │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Intervention Panel │ │
│ │ │ │
│ │ [💬 Chat] [⏸️ Pause] [↩️ Undo Last] [📝 Guide] │ │
│ │ │ │
│ │ Quick Guidance: │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ "Use RS256 instead of HS256 for JWT signing" │ │ │
│ │ │ [Send] 📤 │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Intervention API:
```python
from uuid import UUID

from fastapi import APIRouter, Depends, HTTPException

router = APIRouter()

@router.post("/agents/{agent_id}/intervene")
async def intervene(
    agent_id: UUID,
    intervention: InterventionRequest,
    current_user: User = Depends(get_current_user)
):
    """Allow a human to intervene in agent work."""
    match intervention.type:
        case "pause":
            await orchestrator.pause_agent(agent_id)
        case "resume":
            await orchestrator.resume_agent(agent_id)
        case "guide":
            await orchestrator.send_guidance(agent_id, intervention.message)
        case "undo":
            await orchestrator.undo_last_action(agent_id)
        case "override":
            await orchestrator.override_decision(agent_id, intervention.decision)
        case _:
            raise HTTPException(status_code=400, detail=f"Unknown intervention type: {intervention.type}")
    return {"status": "accepted", "type": intervention.type}
```
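The `InterventionRequest` schema referenced above is not pinned down yet; a plausible Pydantic sketch:

```python
from typing import Literal
from pydantic import BaseModel

class InterventionRequest(BaseModel):
    """Payload for the intervene endpoint; fields beyond `type` are optional and type-dependent."""
    type: Literal["pause", "resume", "guide", "undo", "override"]
    message: str | None = None     # used with "guide"
    decision: dict | None = None   # used with "override"
```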
Recommendation
Build a real-time collaboration dashboard with intervention capabilities. Humans should be able to observe, guide, pause, and correct agents without stopping the entire workflow.
5. Testing Strategy for Autonomous AI Systems
The Challenge
Traditional testing (unit, integration, E2E) doesn't capture the non-deterministic, open-ended behavior of autonomous agents. How do we ensure quality?
Testing Pyramid for AI Agents
▲
╱ ╲
╱ ╲
╱ E2E ╲ Agent Scenarios
╱ Agent ╲ (Full workflows)
╱─────────╲
╱ Integration╲ Tool + LLM Integration
╱ (with mocks) ╲ (Deterministic responses)
╱─────────────────╲
╱ Unit Tests ╲ Orchestrator, Services
╱ (no LLM needed) ╲ (Pure logic)
╱───────────────────────╲
╱ Prompt Testing ╲ System prompt evaluation
╱ (LLM evals) ╲(Quality metrics)
╱─────────────────────────────╲
Test Categories
1. Prompt Testing (Eval Framework):
```python
class PromptEvaluator:
    """Evaluate system prompt quality."""

    TEST_CASES = [
        EvalCase(
            name="requirement_extraction",
            input="Client wants a mobile app for food delivery",
            expected_behaviors=[
                "asks clarifying questions",
                "identifies stakeholders",
                "considers non-functional requirements"
            ]
        ),
        EvalCase(
            name="code_review_thoroughness",
            input="Review this PR: [vulnerable SQL code]",
            expected_behaviors=[
                "identifies SQL injection",
                "suggests parameterized queries",
                "mentions security best practices"
            ]
        )
    ]

    async def evaluate(self, agent_type: AgentType) -> EvalReport:
        results = []
        for case in self.TEST_CASES:
            response = await self.llm.complete(
                system=agent_type.system_prompt,
                user=case.input
            )
            score = await self.judge_response(response, case.expected_behaviors)
            results.append(score)
        return EvalReport(results)
```
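`judge_response` is left abstract above; one common approach is LLM-as-judge, sketched here as a standalone function using a LiteLLM-style call (the evaluator method could delegate to it). The rubric format and judge model are assumptions:

```python
import json
import litellm

async def judge_response(response: str, expected_behaviors: list[str], judge_model: str = "gpt-4o") -> float:
    """Ask a judge model which expected behaviors the response exhibits; return the fraction satisfied."""
    rubric = "\n".join(f"- {b}" for b in expected_behaviors)
    judgment = await litellm.acompletion(
        model=judge_model,
        messages=[{
            "role": "user",
            "content": (
                "For each expected behavior, answer true or false, as a JSON list of booleans only.\n"
                f"Expected behaviors:\n{rubric}\n\nResponse to evaluate:\n{response}"
            ),
        }],
    )
    # A production version would constrain the output format and handle parse failures.
    verdicts = json.loads(judgment.choices[0].message.content)
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```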
2. Integration Testing (Mock LLM):
```python
import pytest

@pytest.fixture
def mock_llm():
    """Deterministic LLM responses for integration tests."""
    responses = {
        "analyze requirements": "...",
        "generate code": "def hello(): return 'world'",
        "review code": "LGTM"
    }
    return MockLLM(responses)

# Async tests assume an async-capable pytest setup (e.g. pytest-asyncio in auto mode).
async def test_story_implementation_workflow(mock_llm):
    """Test full workflow with predictable responses."""
    orchestrator = AgentOrchestrator(llm=mock_llm)

    result = await orchestrator.execute_workflow(
        workflow="implement_story",
        inputs={"story_id": "TEST-123"}
    )

    assert result.status == "completed"
    assert "hello" in result.artifacts["code"]
```
3. Agent Scenario Testing:
```python
class AgentScenarioTest:
    """End-to-end agent behavior testing."""

    @scenario("engineer_handles_bug_report")
    async def test_bug_resolution(self):
        """Engineer agent should fix bugs correctly."""
        # Setup
        project = await create_test_project()
        engineer = await spawn_agent("engineer", project)

        # Act
        bug = await create_issue(
            project,
            title="Login button not working",
            type="bug"
        )
        result = await engineer.handle(bug)

        # Assert
        assert result.pr_created
        assert result.tests_pass
        assert "button" in result.changes_summary.lower()
```
Recommendation
Implement a multi-layer testing strategy with prompt evals, deterministic integration tests, and scenario-based agent testing. Use LLM-as-judge for evaluating open-ended responses.
6. Rollback and Recovery
The Challenge
Autonomous agents will make mistakes. How do we recover gracefully?
Error Categories
| Category | Example | Recovery Strategy |
|---|---|---|
| Reversible | Wrong code generated | Revert commit, regenerate |
| Partially Reversible | Merged bad PR | Revert PR, fix, re-merge |
| Non-reversible | Deployed to production | Forward-fix or rollback deploy |
| External Side Effects | Email sent to client | Apology + correction |
Recovery Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Recovery System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Action Log │ │
│ │ ┌──────────────────────────────────────────────────┐ │ │
│ │ │ Action ID | Agent | Type | Reversible | State │ │ │
│ │ ├──────────────────────────────────────────────────┤ │ │
│ │ │ a-001 | Dave | commit | Yes | completed │ │ │
│ │ │ a-002 | Dave | push | Yes | completed │ │ │
│ │ │ a-003 | Dave | create_pr | Yes | completed │ │ │
│ │ │ a-004 | Kate | merge_pr | Partial | completed │ │ │
│ │ └──────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Rollback Engine │ │
│ │ │ │
│ │ rollback_to(action_id) -> Reverses all actions after │ │
│ │ undo_action(action_id) -> Reverses single action │ │
│ │ compensate(action_id) -> Creates compensating action │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Action Logging:
```python
from datetime import datetime
from uuid import UUID, uuid4

class ActionLog:
    """Immutable log of all agent actions for recovery."""

    async def record(
        self,
        agent_id: UUID,
        action_type: str,
        inputs: dict,
        outputs: dict,
        reversible: bool,
        reverse_action: str | None = None
    ) -> ActionRecord:
        record = ActionRecord(
            id=uuid4(),
            agent_id=agent_id,
            action_type=action_type,
            inputs=inputs,
            outputs=outputs,
            reversible=reversible,
            reverse_action=reverse_action,
            timestamp=datetime.utcnow()
        )
        await self.db.add(record)
        return record

    async def rollback_to(self, action_id: UUID) -> RollbackResult:
        """Rollback all actions after the given action."""
        actions = await self.get_actions_after(action_id)
        results = []
        for action in reversed(actions):
            if action.reversible:
                result = await self._execute_reverse(action)
                results.append(result)
            else:
                results.append(RollbackSkipped(action, reason="non-reversible"))
        return RollbackResult(results)
```
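`_execute_reverse` presumes some mapping from an action type to its inverse operation. A sketch of that registry follows; `git_revert`, `close_pr`, and `force_push_previous` are hypothetical tool functions, not existing APIs:

```python
# Hypothetical registry: action type -> callable that reverses it given the original record.
REVERSE_HANDLERS = {
    "commit": lambda record: git_revert(record.outputs["commit_sha"]),
    "create_pr": lambda record: close_pr(record.outputs["pr_number"]),
    "push": lambda record: force_push_previous(record.inputs["branch"], record.inputs["before_sha"]),
}

async def execute_reverse(record):
    """Look up and run the reverse handler for an action record."""
    handler = REVERSE_HANDLERS.get(record.action_type)
    if handler is None:
        raise ValueError(f"No reverse handler registered for {record.action_type}")
    return await handler(record)
```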
Compensation Pattern:
```python
class CompensationEngine:
    """Handles compensating actions for non-reversible operations."""

    COMPENSATIONS = {
        "email_sent": "send_correction_email",
        "deployment": "rollback_deployment",
        "external_api_call": "create_reversal_request"
    }

    async def compensate(self, action: ActionRecord) -> CompensationResult:
        if action.action_type in self.COMPENSATIONS:
            compensation = self.COMPENSATIONS[action.action_type]
            return await self._execute_compensation(compensation, action)
        else:
            return CompensationResult(
                status="manual_required",
                message=f"No automatic compensation for {action.action_type}"
            )
```
Recommendation
Implement comprehensive action logging with rollback capabilities. Define compensation strategies for non-reversible actions. Enable point-in-time recovery for project state.
7. Security Considerations for Autonomous Agents
Threat Model
| Threat | Risk | Mitigation |
|---|---|---|
| Agent executes malicious code | High | Sandboxed execution, code review gates |
| Agent exfiltrates data | High | Network isolation, output filtering |
| Prompt injection via user input | Medium | Input sanitization, prompt hardening |
| Agent credential abuse | Medium | Least-privilege tokens, short TTL |
| Agent collusion | Low | Independent agent instances, monitoring |
Security Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Security Layers │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 4: Output Filtering │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Code scan before commit │ │
│ │ - Secrets detection │ │
│ │ - Policy compliance check │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 3: Action Authorization │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Role-based permissions │ │
│ │ - Project scope enforcement │ │
│ │ - Sensitive action approval │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 2: Input Sanitization │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Prompt injection detection │ │
│ │ - Content filtering │ │
│ │ - Schema validation │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 1: Infrastructure Isolation │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Container sandboxing │ │
│ │ - Network segmentation │ │
│ │ - File system restrictions │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
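As one concrete example of Layer 4 output filtering, a pre-commit secrets check over agent-produced diffs; the patterns below are illustrative only, and a real deployment would use a dedicated scanner such as detect-secrets or gitleaks:

```python
import re

# Illustrative patterns only; real scanners maintain far broader rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                                   # AWS access key ID
    re.compile(r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----"),       # private key material
    re.compile(r"(?i)(api[_-]?key|secret|password)\s*[:=]\s*\S{8,}"),  # inline credentials
]

def scan_for_secrets(diff: str) -> list[str]:
    """Return secret-like matches in an agent-produced diff; block the commit if any are found."""
    findings: list[str] = []
    for pattern in SECRET_PATTERNS:
        findings.extend(match.group(0) for match in pattern.finditer(diff))
    return findings
```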
Recommendation
Implement defense-in-depth with multiple security layers. Assume agents can be compromised and design for containment.
Summary of Recommendations
| Area | Recommendation | Priority |
|---|---|---|
| Memory | Tiered memory with context compression | High |
| Knowledge | Privacy-aware extraction with human gate | Medium |
| Specialization | Layered capabilities with role-specific top | Medium |
| Collaboration | Real-time dashboard with intervention | High |
| Testing | Multi-layer with prompt evals | High |
| Recovery | Action logging with rollback engine | High |
| Security | Defense-in-depth, assume compromise | High |
Next Steps
- Validate with spike research - Update based on spike findings
- Create detailed ADRs - For memory, recovery, security
- Prototype critical paths - Memory system, rollback engine
- Security review - External audit before production
This document captures architectural thinking to guide implementation. It should be updated as spikes complete and design evolves.