docs: add architecture spikes and deep analysis documentation

Add comprehensive spike research documents:
- SPIKE-002: Agent Orchestration Pattern (LangGraph + Temporal hybrid)
- SPIKE-006: Knowledge Base pgvector (RAG with hybrid search)
- SPIKE-007: Agent Communication Protocol (JSON-RPC + Redis Streams)
- SPIKE-008: Workflow State Machine (transitions lib + event sourcing)
- SPIKE-009: Issue Synchronization (bi-directional sync with conflict resolution)
- SPIKE-010: Cost Tracking (LiteLLM callbacks + budget enforcement)
- SPIKE-011: Audit Logging (structured event sourcing)
- SPIKE-012: Client Approval Flow (checkpoint-based approvals)

Add architecture documentation:
- ARCHITECTURE_DEEP_ANALYSIS.md: Memory management, security, testing strategy
- IMPLEMENTATION_ROADMAP.md: 6-phase, 24-week implementation plan

Closes #2, #6, #7, #8, #9, #10, #11, #12

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@@ -0,0 +1,680 @@
# Syndarix Architecture Deep Analysis
**Version:** 1.0
**Date:** 2025-12-29
**Status:** Draft - Architectural Thinking
---
## Executive Summary
This document captures deep architectural thinking about Syndarix beyond the immediate spikes. It addresses complex challenges that arise when building a truly autonomous multi-agent system and proposes solutions based on first principles.
---
## 1. Agent Memory and Context Management
### The Challenge
Agents in Syndarix may work on projects for weeks or months. LLM context windows are finite (128K-200K tokens), but project context grows without bound. How do we maintain a coherent agent "memory" over time?
### Analysis
**Context Window Constraints:**
| Model | Context Window | Practical Limit (with tools) |
|-------|---------------|------------------------------|
| Claude 3.5 Sonnet | 200K tokens | ~150K usable |
| GPT-4 Turbo | 128K tokens | ~100K usable |
| Llama 3 / 3.1 (70B) | 8K / 128K tokens | up to ~80K usable (128K variants) |
**Memory Types Needed:**
1. **Working Memory** - Current task context (fits in context window)
2. **Short-term Memory** - Recent conversation history (RAG-retrievable)
3. **Long-term Memory** - Project knowledge, past decisions (RAG + summarization)
4. **Episodic Memory** - Specific past events/mistakes to learn from
### Proposed Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Agent Memory System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Working │ │ Short-term │ │ Long-term │ │
│ │ Memory │ │ Memory │ │ Memory │ │
│ │ (Context) │ │ (Redis) │ │ (pgvector) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └───────────────────┼──────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Context Assembler │ │
│ │ │ │
│ │ 1. System prompt (agent personality, role) │ │
│ │ 2. Project context (from long-term memory) │ │
│ │ 3. Task context (current issue, requirements) │ │
│ │ 4. Relevant history (from short-term memory) │ │
│ │ 5. User message │ │
│ │ │ │
│ │ Total: Fit within context window limits │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Context Compression Strategy:**
```python
class ContextManager:
    """Manages agent context to fit within LLM limits."""

    MAX_CONTEXT_TOKENS = 100_000  # Leave room for the response

    async def build_context(
        self,
        agent: AgentInstance,
        task: Task,
        user_message: str,
    ) -> list[Message]:
        # Fixed costs
        system_prompt = self._get_system_prompt(agent)  # ~2K tokens
        task_context = self._get_task_context(task)     # ~1K tokens

        # Variable budget
        remaining = self.MAX_CONTEXT_TOKENS - token_count(
            system_prompt, task_context, user_message
        )

        # Allocate the remaining budget across memory tiers
        long_term = await self._query_long_term(agent, task, budget=remaining * 0.4)
        short_term = await self._get_short_term(agent, budget=remaining * 0.4)
        episodic = await self._get_relevant_episodes(agent, task, budget=remaining * 0.2)

        return self._assemble_messages(
            system_prompt, task_context, long_term, short_term,
            episodic, user_message,
        )
```
**Conversation Summarization:**
- After every N turns (e.g., 10), summarize the conversation and archive it
- Use a smaller, cheaper model for summarization
- Store summaries in pgvector for semantic retrieval (see the sketch below)
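A minimal sketch of that loop, assuming a cheaper `cheap_llm` client, an `embed` helper, and a pgvector-backed `summary_store` (all hypothetical names):
```python
SUMMARIZE_EVERY_N_TURNS = 10

async def maybe_summarize(agent_id: str, history: list[dict]) -> None:
    """Archive older turns as a summary once the history grows past N turns."""
    if len(history) < SUMMARIZE_EVERY_N_TURNS:
        return
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    # Use a smaller, cheaper model for the summary itself
    summary = await cheap_llm.complete(
        system="Summarize this agent conversation; keep decisions and open questions.",
        user=transcript,
    )
    # Store with an embedding so the summary is semantically retrievable later
    await summary_store.add(agent_id=agent_id, text=summary, embedding=await embed(summary))
    history.clear()  # the stored summary now stands in for the archived turns
```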
### Recommendation
Implement a **tiered memory system** with automatic context compression and semantic retrieval. Use Redis for hot short-term memory, pgvector for cold long-term memory, and automatic summarization to prevent context overflow.
---
## 2. Cross-Project Knowledge Sharing
### The Challenge
Each project has isolated knowledge, but agents could benefit from cross-project learnings:
- Common patterns (authentication, testing, CI/CD)
- Technology expertise (how to configure Kubernetes)
- Anti-patterns (what didn't work before)
### Analysis
**Privacy Considerations:**
- Client data must remain isolated (contractual, legal)
- Technical patterns are generally shareable
- Need clear data classification
**Knowledge Categories:**
| Category | Scope | Examples |
|----------|-------|----------|
| **Client Data** | Project-only | Requirements, business logic, code |
| **Technical Patterns** | Global | Best practices, configurations |
| **Agent Learnings** | Global | What approaches worked/failed |
| **Anti-patterns** | Global | Common mistakes to avoid |
### Proposed Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Knowledge Graph │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ GLOBAL KNOWLEDGE │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Patterns │ │ Anti-patterns│ │ Expertise │ │ │
│ │ │ Library │ │ Library │ │ Index │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ Curated extraction │
│ │ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Project A │ │ Project B │ │ Project C │ │
│ │ Knowledge │ │ Knowledge │ │ Knowledge │ │
│ │ (Isolated) │ │ (Isolated) │ │ (Isolated) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Knowledge Extraction Pipeline:**
```python
class KnowledgeExtractor:
    """Extracts shareable learnings from project work."""

    async def extract_learnings(self, project_id: str) -> list[Learning]:
        """
        Run periodically or after sprints to extract learnings.
        Human review is required before promoting to global.
        """
        # Get completed work
        completed_issues = await self.get_completed_issues(project_id)

        # Extract patterns using an LLM
        patterns = await self.llm.extract_patterns(
            completed_issues,
            categories=["architecture", "testing", "deployment", "security"],
        )

        # Classify privacy
        for pattern in patterns:
            pattern.privacy_level = await self.llm.classify_privacy(pattern)

        # Return only shareable patterns for review
        return [p for p in patterns if p.privacy_level == "public"]
```
### Recommendation
Implement **privacy-aware knowledge extraction** with human review gate. Project knowledge stays isolated by default; only explicitly approved patterns flow to global knowledge.
---
## 3. Agent Specialization vs Generalization Trade-offs
### The Challenge
Should each agent type be highly specialized (depth) or have overlapping capabilities (breadth)?
### Analysis
**Specialization Benefits:**
- Deeper expertise in domain
- Cleaner system prompts
- Less confusion about responsibilities
- Easier to optimize prompts per role
**Generalization Benefits:**
- Fewer agent types to maintain
- Smoother handoffs (shared context)
- More flexible team composition
- Graceful degradation if agent unavailable
**Current Agent Types (10):**
| Role | Primary Domain | Potential Overlap |
|------|---------------|-------------------|
| Product Owner | Requirements | Business Analyst |
| Business Analyst | Documentation | Product Owner |
| Project Manager | Planning | Product Owner |
| Software Architect | Design | Software Engineer |
| Software Engineer | Coding | Architect, QA |
| UI/UX Designer | Interface | Frontend Engineer |
| QA Engineer | Testing | Software Engineer |
| DevOps Engineer | Infrastructure | Software Engineer |
| AI/ML Engineer | ML/AI | Software Engineer |
| Security Expert | Security | All |
### Proposed Approach: Layered Specialization
```
┌─────────────────────────────────────────────────────────────────┐
│ Agent Capability Layers │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 3: Role-Specific Expertise │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Product │ │ Architect│ │Engineer │ │ QA │ │
│ │ Owner │ │ │ │ │ │ │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ Layer 2: Shared Professional Skills │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Technical Communication | Code Understanding | Git │ │
│ │ Documentation | Research | Problem Decomposition │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ Layer 1: Foundation Model Capabilities │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Reasoning | Analysis | Writing | Coding (LLM Base) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Capability Inheritance:**
```python
class AgentTypeBuilder:
    """Builds agent types with layered capabilities."""

    BASE_CAPABILITIES = [
        "reasoning", "analysis", "writing", "coding_assist",
    ]

    PROFESSIONAL_SKILLS = [
        "technical_communication", "code_understanding",
        "git_operations", "documentation", "research",
    ]

    ROLE_SPECIFIC = {
        "ENGINEER": ["code_generation", "code_review", "testing", "debugging"],
        "ARCHITECT": ["system_design", "adr_writing", "tech_selection"],
        "QA": ["test_planning", "test_automation", "bug_reporting"],
        # ...
    }

    def build_capabilities(self, role: AgentRole) -> list[str]:
        return (
            self.BASE_CAPABILITIES
            + self.PROFESSIONAL_SKILLS
            + self.ROLE_SPECIFIC[role]
        )
```
### Recommendation
Adopt **layered specialization** where all agents share foundational and professional capabilities, with role-specific expertise on top. This enables smooth collaboration while maintaining clear responsibilities.
---
## 4. Human-Agent Collaboration Model
### The Challenge
Beyond approval gates, how do humans effectively collaborate with autonomous agents during active work?
### Interaction Patterns
| Pattern | Use Case | Frequency |
|---------|----------|-----------|
| **Approval** | Confirm before action | Per checkpoint |
| **Guidance** | Steer direction | On-demand |
| **Override** | Correct mistake | Rare |
| **Pair Working** | Work together | Optional |
| **Review** | Evaluate output | Post-completion |
### Proposed Collaboration Interface
```
┌─────────────────────────────────────────────────────────────────┐
│ Human-Agent Collaboration Dashboard │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Activity Stream │ │
│ │ ────────────────────────────────────────────────────── │ │
│ │ [10:23] Dave (Engineer) is implementing login API │ │
│ │ [10:24] Dave created auth/service.py │ │
│ │ [10:25] Dave is writing unit tests │ │
│ │ [LIVE] Dave: "I'm adding JWT validation. Using HS256..." │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Intervention Panel │ │
│ │ │ │
│ │ [💬 Chat] [⏸️ Pause] [↩️ Undo Last] [📝 Guide] │ │
│ │ │ │
│ │ Quick Guidance: │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ "Use RS256 instead of HS256 for JWT signing" │ │ │
│ │ │ [Send] 📤 │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Intervention API:**
```python
@router.post("/agents/{agent_id}/intervene")
async def intervene(
    agent_id: UUID,
    intervention: InterventionRequest,
    current_user: User = Depends(get_current_user),
):
    """Allow a human to intervene in agent work."""
    match intervention.type:
        case "pause":
            await orchestrator.pause_agent(agent_id)
        case "resume":
            await orchestrator.resume_agent(agent_id)
        case "guide":
            await orchestrator.send_guidance(agent_id, intervention.message)
        case "undo":
            await orchestrator.undo_last_action(agent_id)
        case "override":
            await orchestrator.override_decision(agent_id, intervention.decision)
```
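The `InterventionRequest` model is not shown here; a plausible Pydantic sketch, with field names assumed from the handler above:
```python
from typing import Literal
from pydantic import BaseModel

class InterventionRequest(BaseModel):
    """Request body for the intervene endpoint (field names assumed from the handler)."""
    type: Literal["pause", "resume", "guide", "undo", "override"]
    message: str | None = None    # used by "guide"
    decision: dict | None = None  # used by "override"
```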
### Recommendation
Build a **real-time collaboration dashboard** with intervention capabilities. Humans should be able to observe, guide, pause, and correct agents without stopping the entire workflow.
---
## 5. Testing Strategy for Autonomous AI Systems
### The Challenge
Traditional testing (unit, integration, E2E) doesn't capture autonomous agent behavior. How do we ensure quality?
### Testing Pyramid for AI Agents
```
                 ╱╲
                ╱  ╲    E2E Agent Scenarios
               ╱    ╲   (full workflows)
              ╱──────╲
             ╱        ╲    Integration Tests (with mocks)
            ╱          ╲   (deterministic LLM responses)
           ╱────────────╲
          ╱              ╲    Unit Tests (no LLM needed)
         ╱                ╲   (orchestrator, services, pure logic)
        ╱──────────────────╲
       ╱                    ╲    Prompt Testing (LLM evals)
      ╱                      ╲   (system prompt quality metrics)
     ╱────────────────────────╲
```
### Test Categories
**1. Prompt Testing (Eval Framework):**
```python
class PromptEvaluator:
    """Evaluate system prompt quality."""

    TEST_CASES = [
        EvalCase(
            name="requirement_extraction",
            input="Client wants a mobile app for food delivery",
            expected_behaviors=[
                "asks clarifying questions",
                "identifies stakeholders",
                "considers non-functional requirements",
            ],
        ),
        EvalCase(
            name="code_review_thoroughness",
            input="Review this PR: [vulnerable SQL code]",
            expected_behaviors=[
                "identifies SQL injection",
                "suggests parameterized queries",
                "mentions security best practices",
            ],
        ),
    ]

    async def evaluate(self, agent_type: AgentType) -> EvalReport:
        results = []
        for case in self.TEST_CASES:
            response = await self.llm.complete(
                system=agent_type.system_prompt,
                user=case.input,
            )
            score = await self.judge_response(response, case.expected_behaviors)
            results.append(score)
        return EvalReport(results)
```
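`judge_response` is left undefined above; a minimal LLM-as-judge method sketch (the JSON-verdict prompt format is an assumption):
```python
import json

async def judge_response(self, response: str, expected_behaviors: list[str]) -> float:
    """Score a response 0-1: the fraction of expected behaviors a judge model confirms."""
    verdict = await self.llm.complete(
        system=(
            "You are a strict evaluator. For each expected behavior, answer true or "
            "false whether the response exhibits it. Reply with a JSON list of booleans."
        ),
        user=f"Response:\n{response}\n\nExpected behaviors:\n{json.dumps(expected_behaviors)}",
    )
    checks = json.loads(verdict)
    return sum(checks) / len(checks)
```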
**2. Integration Testing (Mock LLM):**
```python
import pytest

@pytest.fixture
def mock_llm():
    """Deterministic LLM responses for integration tests."""
    responses = {
        "analyze requirements": "...",
        "generate code": "def hello(): return 'world'",
        "review code": "LGTM",
    }
    return MockLLM(responses)

@pytest.mark.asyncio  # assumes pytest-asyncio
async def test_story_implementation_workflow(mock_llm):
    """Test the full workflow with predictable responses."""
    orchestrator = AgentOrchestrator(llm=mock_llm)
    result = await orchestrator.execute_workflow(
        workflow="implement_story",
        inputs={"story_id": "TEST-123"},
    )
    assert result.status == "completed"
    assert "hello" in result.artifacts["code"]
```
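`MockLLM` is likewise assumed; a minimal version could match canned responses on prompt substrings:
```python
class MockLLM:
    """Deterministic stand-in for an LLM client in integration tests (sketch)."""

    def __init__(self, responses: dict[str, str]):
        self.responses = responses

    async def complete(self, system: str = "", user: str = "") -> str:
        # Return the canned response whose key appears in the prompt
        for key, response in self.responses.items():
            if key in user:
                return response
        raise KeyError(f"no canned response for prompt: {user!r}")
```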
**3. Agent Scenario Testing:**
```python
class AgentScenarioTest:
    """End-to-end agent behavior testing."""

    @scenario("engineer_handles_bug_report")
    async def test_bug_resolution(self):
        """The engineer agent should fix bugs correctly."""
        # Setup
        project = await create_test_project()
        engineer = await spawn_agent("engineer", project)

        # Act
        bug = await create_issue(
            project,
            title="Login button not working",
            type="bug",
        )
        result = await engineer.handle(bug)

        # Assert
        assert result.pr_created
        assert result.tests_pass
        assert "button" in result.changes_summary.lower()
```
### Recommendation
Implement a **multi-layer testing strategy** with prompt evals, deterministic integration tests, and scenario-based agent testing. Use LLM-as-judge for evaluating open-ended responses.
---
## 6. Rollback and Recovery
### The Challenge
Autonomous agents will make mistakes. How do we recover gracefully?
### Error Categories
| Category | Example | Recovery Strategy |
|----------|---------|-------------------|
| **Reversible** | Wrong code generated | Revert commit, regenerate |
| **Partially Reversible** | Merged bad PR | Revert PR, fix, re-merge |
| **Non-reversible** | Deployed to production | Forward-fix or rollback deploy |
| **External Side Effects** | Email sent to client | Apology + correction |
### Recovery Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Recovery System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Action Log │ │
│ │ ┌──────────────────────────────────────────────────┐ │ │
│ │ │ Action ID | Agent | Type | Reversible | State │ │ │
│ │ ├──────────────────────────────────────────────────┤ │ │
│ │ │ a-001 | Dave | commit | Yes | completed │ │ │
│ │ │ a-002 | Dave | push | Yes | completed │ │ │
│ │ │ a-003 | Dave | create_pr | Yes | completed │ │ │
│ │ │ a-004 | Kate | merge_pr | Partial | completed │ │ │
│ │ └──────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Rollback Engine │ │
│ │ │ │
│ │ rollback_to(action_id) -> Reverses all actions after │ │
│ │ undo_action(action_id) -> Reverses single action │ │
│ │ compensate(action_id) -> Creates compensating action │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Action Logging:**
```python
from datetime import datetime, timezone
from uuid import UUID, uuid4

class ActionLog:
    """Immutable log of all agent actions for recovery."""

    async def record(
        self,
        agent_id: UUID,
        action_type: str,
        inputs: dict,
        outputs: dict,
        reversible: bool,
        reverse_action: str | None = None,
    ) -> ActionRecord:
        record = ActionRecord(
            id=uuid4(),
            agent_id=agent_id,
            action_type=action_type,
            inputs=inputs,
            outputs=outputs,
            reversible=reversible,
            reverse_action=reverse_action,
            timestamp=datetime.now(timezone.utc),
        )
        await self.db.add(record)
        return record

    async def rollback_to(self, action_id: UUID) -> RollbackResult:
        """Rollback all actions after the given action."""
        actions = await self.get_actions_after(action_id)
        results = []
        # Undo in reverse chronological order
        for action in reversed(actions):
            if action.reversible:
                result = await self._execute_reverse(action)
                results.append(result)
            else:
                results.append(RollbackSkipped(action, reason="non-reversible"))
        return RollbackResult(results)
```
**Compensation Pattern:**
```python
class CompensationEngine:
    """Handles compensating actions for non-reversible operations."""

    COMPENSATIONS = {
        "email_sent": "send_correction_email",
        "deployment": "rollback_deployment",
        "external_api_call": "create_reversal_request",
    }

    async def compensate(self, action: ActionRecord) -> CompensationResult:
        if action.action_type in self.COMPENSATIONS:
            compensation = self.COMPENSATIONS[action.action_type]
            return await self._execute_compensation(compensation, action)
        else:
            return CompensationResult(
                status="manual_required",
                message=f"No automatic compensation for {action.action_type}",
            )
```
### Recommendation
Implement **comprehensive action logging** with rollback capabilities. Define compensation strategies for non-reversible actions. Enable point-in-time recovery for project state.
---
## 7. Security Considerations for Autonomous Agents
### Threat Model
| Threat | Risk | Mitigation |
|--------|------|------------|
| Agent executes malicious code | High | Sandboxed execution, code review gates |
| Agent exfiltrates data | High | Network isolation, output filtering |
| Prompt injection via user input | Medium | Input sanitization, prompt hardening |
| Agent credential abuse | Medium | Least-privilege tokens, short TTL |
| Agent collusion | Low | Independent agent instances, monitoring |
### Security Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Security Layers │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 4: Output Filtering │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Code scan before commit │ │
│ │ - Secrets detection │ │
│ │ - Policy compliance check │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 3: Action Authorization │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Role-based permissions │ │
│ │ - Project scope enforcement │ │
│ │ - Sensitive action approval │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 2: Input Sanitization │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Prompt injection detection │ │
│ │ - Content filtering │ │
│ │ - Schema validation │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 1: Infrastructure Isolation │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Container sandboxing │ │
│ │ - Network segmentation │ │
│ │ - File system restrictions │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
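As a concrete illustration of Layer 4, a naive secrets-detection pass over agent output before commit might look like this sketch (patterns illustrative, not exhaustive; a dedicated scanner covers many more credential shapes):
```python
import re

# Illustrative patterns only; a real scanner covers many more credential shapes
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(?:api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}"),
]

def scan_for_secrets(diff: str) -> list[str]:
    """Return secret-like strings found in a diff; block the commit if non-empty."""
    findings: list[str] = []
    for pattern in SECRET_PATTERNS:
        findings.extend(m.group(0) for m in pattern.finditer(diff))
    return findings
```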
### Recommendation
Implement **defense-in-depth** with multiple security layers. Assume agents can be compromised and design for containment.
---
## Summary of Recommendations
| Area | Recommendation | Priority |
|------|----------------|----------|
| Memory | Tiered memory with context compression | High |
| Knowledge | Privacy-aware extraction with human gate | Medium |
| Specialization | Layered capabilities with role-specific top | Medium |
| Collaboration | Real-time dashboard with intervention | High |
| Testing | Multi-layer with prompt evals | High |
| Recovery | Action logging with rollback engine | High |
| Security | Defense-in-depth, assume compromise | High |
---
## Next Steps
1. **Validate with spike research** - Update based on spike findings
2. **Create detailed ADRs** - For memory, recovery, security
3. **Prototype critical paths** - Memory system, rollback engine
4. **Security review** - External audit before production
---
*This document captures architectural thinking to guide implementation. It should be updated as spikes complete and design evolves.*


@@ -0,0 +1,339 @@
# Syndarix Implementation Roadmap
**Version:** 1.0
**Date:** 2025-12-29
**Status:** Draft
---
## Executive Summary
This roadmap outlines the phased implementation approach for Syndarix, prioritizing foundational infrastructure before advanced features. Each phase builds upon the previous, with clear milestones and deliverables.
---
## Phase 0: Foundation (Weeks 1-2)
**Goal:** Establish development infrastructure and basic platform
### 0.1 Repository Setup
- [x] Fork PragmaStack to Syndarix
- [x] Create spike backlog in Gitea
- [x] Complete architecture documentation
- [ ] Rebrand codebase (Issue #13 - in progress)
- [ ] Configure CI/CD pipelines
- [ ] Set up development environment documentation
### 0.2 Core Infrastructure
- [ ] Configure Redis for cache + pub/sub
- [ ] Set up Celery worker infrastructure
- [ ] Configure pgvector extension
- [ ] Create MCP server directory structure
- [ ] Set up Docker Compose for local development
### Deliverables
- Fully branded Syndarix repository
- Working local development environment
- CI/CD pipeline running tests
---
## Phase 1: Core Platform (Weeks 3-6)
**Goal:** Basic project and agent management without LLM integration
### 1.1 Data Model
- [ ] Create Project entity and CRUD
- [ ] Create AgentType entity and CRUD
- [ ] Create AgentInstance entity and CRUD
- [ ] Create Issue entity with external tracker fields
- [ ] Create Sprint entity and CRUD
- [ ] Database migrations with Alembic
### 1.2 API Layer
- [ ] Project management endpoints
- [ ] Agent type configuration endpoints
- [ ] Agent instance management endpoints
- [ ] Issue CRUD endpoints
- [ ] Sprint management endpoints
### 1.3 Real-time Infrastructure
- [ ] Implement EventBus with Redis Pub/Sub (see the sketch after this list)
- [ ] Create SSE endpoint for project events
- [ ] Implement event types enum
- [ ] Add keepalive mechanism
- [ ] Client-side SSE handling
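
A minimal sketch of the EventBus-to-SSE pairing, assuming `redis-py`'s asyncio client and FastAPI (the channel naming is hypothetical):
```python
import redis.asyncio as redis
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
r = redis.Redis()

@app.get("/projects/{project_id}/events")
async def project_events(project_id: str):
    async def stream():
        pubsub = r.pubsub()
        await pubsub.subscribe(f"project:{project_id}:events")
        try:
            while True:
                msg = await pubsub.get_message(ignore_subscribe_messages=True, timeout=15.0)
                if msg is None:
                    yield ": keepalive\n\n"  # SSE comment keeps the connection open
                else:
                    yield f"data: {msg['data'].decode()}\n\n"
        finally:
            await pubsub.unsubscribe()
    return StreamingResponse(stream(), media_type="text/event-stream")
```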
### 1.4 Frontend Foundation
- [ ] Project dashboard page
- [ ] Agent configuration UI
- [ ] Issue list and detail views
- [ ] Real-time activity feed component
- [ ] Basic navigation and layout
### Deliverables
- CRUD operations for all core entities
- Real-time event streaming working
- Basic admin UI for configuration
---
## Phase 2: MCP Integration (Weeks 7-10)
**Goal:** Build MCP servers for external integrations
### 2.1 MCP Client Infrastructure
- [ ] Create MCPClientManager class
- [ ] Implement server registry
- [ ] Add connection management with reconnection
- [ ] Create tool call routing
### 2.2 LLM Gateway MCP (Priority 1)
- [ ] Create FastMCP server structure
- [ ] Implement LiteLLM integration
- [ ] Add model group routing
- [ ] Implement failover chain
- [ ] Add cost tracking callbacks
- [ ] Create token usage logging
### 2.3 Knowledge Base MCP (Priority 2)
- [ ] Create pgvector schema for embeddings
- [ ] Implement document ingestion pipeline
- [ ] Create chunking strategies (code, markdown, text)
- [ ] Implement semantic search
- [ ] Add hybrid search (vector + keyword; see the sketch after this list)
- [ ] Per-project collection isolation
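
One possible shape for the hybrid search, sketched under assumed table and column names (`chunks` with `tsv` and `embedding`):
```python
from sqlalchemy import text

# Blend pgvector cosine similarity with Postgres full-text rank, weighted 70/30.
# Table and column names are assumptions for this sketch.
HYBRID_SEARCH = text("""
    SELECT id, content,
           0.7 * (1 - (embedding <=> CAST(:qvec AS vector)))
         + 0.3 * ts_rank(tsv, plainto_tsquery(:qtext)) AS score
    FROM chunks
    WHERE project_id = :project_id          -- per-project isolation
    ORDER BY score DESC
    LIMIT :k
""")

async def hybrid_search(session, project_id: str, qtext: str, qvec: list[float], k: int = 10):
    result = await session.execute(
        HYBRID_SEARCH,
        {"qvec": str(qvec), "qtext": qtext, "project_id": project_id, "k": k},
    )
    return result.fetchall()
```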
### 2.4 Git MCP (Priority 3)
- [ ] Create Git operations wrapper
- [ ] Implement clone, commit, push operations
- [ ] Add branch management
- [ ] Create PR operations
- [ ] Add Gitea API integration
- [ ] Implement GitHub/GitLab adapters
### 2.5 Issues MCP (Priority 4)
- [ ] Create issue sync service
- [ ] Implement Gitea issue operations
- [ ] Add GitHub issue adapter
- [ ] Add GitLab issue adapter
- [ ] Implement bi-directional sync
- [ ] Create conflict resolution logic
### Deliverables
- 4 working MCP servers
- LLM calls routed through gateway
- RAG search functional
- Git operations working
- Issue sync with external trackers
---
## Phase 3: Agent Orchestration (Weeks 11-14)
**Goal:** Enable agents to perform autonomous work
### 3.1 Agent Runner
- [ ] Create AgentRunner class
- [ ] Implement context assembly
- [ ] Add memory management (short-term, long-term)
- [ ] Implement action execution
- [ ] Add tool call handling
- [ ] Create agent error handling
### 3.2 Agent Orchestrator
- [ ] Implement spawn_agent method
- [ ] Create terminate_agent method
- [ ] Implement send_message routing
- [ ] Add broadcast functionality
- [ ] Create agent status tracking
- [ ] Implement agent recovery
### 3.3 Inter-Agent Communication
- [ ] Define message format schema
- [ ] Implement message persistence
- [ ] Create message routing logic
- [ ] Add @mention parsing
- [ ] Implement priority queues
- [ ] Add conversation threading
### 3.4 Background Task Integration
- [ ] Create Celery task wrappers
- [ ] Implement progress reporting
- [ ] Add task chaining for workflows
- [ ] Create agent queue routing
- [ ] Implement task retry logic
### Deliverables
- Agents can be spawned and communicate
- Agents can call MCP tools
- Background tasks for long operations
- Agent activity visible in real-time
---
## Phase 4: Workflow Engine (Weeks 15-18)
**Goal:** Implement structured workflows for software delivery
### 4.1 State Machine Foundation
- [ ] Create workflow state machine base (see the sketch after this list)
- [ ] Implement state persistence
- [ ] Add transition validation
- [ ] Create state history logging
- [ ] Implement compensation patterns
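
SPIKE-008 proposes the `transitions` library; a minimal workflow state machine sketch with it (states and triggers here are assumptions, not the final workflow design):
```python
from transitions import Machine

class StoryWorkflow:
    """Sketch: story-implementation state machine on the `transitions` library."""

    states = ["planned", "in_progress", "in_review", "approved", "done", "failed"]

    def __init__(self):
        self.machine = Machine(
            model=self,
            states=StoryWorkflow.states,
            initial="planned",
            after_state_change="persist_state",  # hook for persistence / event sourcing
        )
        self.machine.add_transition("start", "planned", "in_progress")
        self.machine.add_transition("submit", "in_progress", "in_review")
        self.machine.add_transition("approve", "in_review", "approved")
        self.machine.add_transition("merge", "approved", "done")
        self.machine.add_transition("fail", "*", "failed")

    def persist_state(self):
        ...  # append the state-change event to the audit log
```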
### 4.2 Core Workflows
- [ ] Requirements Discovery workflow
- [ ] Architecture Spike workflow
- [ ] Sprint Planning workflow
- [ ] Story Implementation workflow
- [ ] Sprint Demo workflow
### 4.3 Approval Gates
- [ ] Create approval checkpoint system
- [ ] Implement approval UI components
- [ ] Add notification triggers
- [ ] Create timeout handling
- [ ] Implement escalation logic
### 4.4 Autonomy Levels
- [ ] Implement FULL_CONTROL mode
- [ ] Implement MILESTONE mode
- [ ] Implement AUTONOMOUS mode
- [ ] Create autonomy configuration UI
- [ ] Add per-action approval overrides
### Deliverables
- Structured workflows executing
- Approval gates working
- Autonomy levels configurable
- Full sprint cycle possible
---
## Phase 5: Advanced Features (Weeks 19-22)
**Goal:** Polish and production readiness
### 5.1 Cost Management
- [ ] Real-time cost tracking dashboard (cost callback sketch after this list)
- [ ] Budget configuration per project
- [ ] Alert threshold system
- [ ] Cost optimization recommendations
- [ ] Historical cost analytics
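
The cost-tracking spike builds on LiteLLM callbacks; a minimal success-callback sketch (the `budget_store` service and metadata layout are assumptions):
```python
import litellm

def track_cost(kwargs, completion_response, start_time, end_time):
    """LiteLLM success callback: record per-call spend against a project budget."""
    cost = litellm.completion_cost(completion_response=completion_response)
    # Project id travels in the call metadata (layout per LiteLLM callback docs)
    meta = kwargs.get("litellm_params", {}).get("metadata") or {}
    project_id = meta.get("project_id", "unknown")
    budget_store.add_spend(project_id, cost)  # budget_store is a hypothetical service
    if budget_store.remaining(project_id) <= 0:
        budget_store.flag_over_budget(project_id)  # enforcement handled by the orchestrator

litellm.success_callback = [track_cost]
```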
### 5.2 Audit & Compliance
- [ ] Comprehensive action logging
- [ ] Audit trail viewer UI
- [ ] Export functionality
- [ ] Retention policy implementation
- [ ] Compliance report generation
### 5.3 Human-Agent Collaboration
- [ ] Live activity dashboard
- [ ] Intervention panel (pause, guide, undo)
- [ ] Agent chat interface
- [ ] Context inspector
- [ ] Decision explainer
### 5.4 Additional MCP Servers
- [ ] File System MCP
- [ ] Code Analysis MCP
- [ ] CI/CD MCP
### Deliverables
- Production-ready system
- Full observability
- Cost controls active
- Audit compliance
---
## Phase 6: Polish & Launch (Weeks 23-24)
**Goal:** Production deployment
### 6.1 Performance Optimization
- [ ] Load testing
- [ ] Query optimization
- [ ] Caching optimization
- [ ] Memory profiling
### 6.2 Security Hardening
- [ ] Security audit
- [ ] Penetration testing
- [ ] Secrets management
- [ ] Rate limiting tuning
### 6.3 Documentation
- [ ] User documentation
- [ ] API documentation
- [ ] Deployment guide
- [ ] Runbook
### 6.4 Deployment
- [ ] Production environment setup
- [ ] Monitoring & alerting
- [ ] Backup & recovery
- [ ] Launch checklist
---
## Risk Register
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| LLM API outages | High | Medium | Multi-provider failover |
| Cost overruns | High | Medium | Budget enforcement, local models |
| Agent hallucinations | High | Medium | Approval gates, code review |
| Performance bottlenecks | Medium | Medium | Load testing, caching |
| Integration failures | Medium | Low | Contract testing, mocks |
---
## Success Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Agent task success rate | >90% | Completed tasks / total tasks |
| Response time (P95) | <2s | API latency |
| Cost per project | <$50/sprint | LLM + compute costs |
| Time to first commit | <1 hour | From requirements to PR |
| Client satisfaction | >4/5 | Post-sprint survey |
---
## Dependencies
```
Phase 0 ──▶ Phase 1 ──▶ Phase 2 ──▶ Phase 3 ──▶ Phase 4 ──▶ Phase 5 ──▶ Phase 6
Foundation  Core        MCP         Agent       Workflow    Advanced    Polish &
            Platform    Integration Orchestr.   Engine      Features    Launch

Phase 3 onward depends on the LLM Gateway, the Knowledge Base, and the
real-time event infrastructure delivered in Phases 1-2.
```
---
## Resource Requirements
### Development Team
- 1 Backend Engineer (Python/FastAPI)
- 1 Frontend Engineer (React/Next.js)
- 0.5 DevOps Engineer
- 0.25 Product Manager
### Infrastructure
- PostgreSQL (managed or self-hosted)
- Redis (managed or self-hosted)
- Celery workers (2-4 instances)
- MCP servers (7 containers)
- API server (2+ instances)
- Frontend (static hosting or SSR)
### External Services
- Anthropic API (primary LLM)
- OpenAI API (fallback)
- Ollama (local models, optional)
- Gitea/GitHub/GitLab (issue tracking)
---
*This roadmap will be refined as spikes complete and requirements evolve.*