docs: add architecture spikes and deep analysis documentation

Add comprehensive spike research documents:
- SPIKE-002: Agent Orchestration Pattern (LangGraph + Temporal hybrid)
- SPIKE-006: Knowledge Base pgvector (RAG with hybrid search)
- SPIKE-007: Agent Communication Protocol (JSON-RPC + Redis Streams)
- SPIKE-008: Workflow State Machine (transitions lib + event sourcing)
- SPIKE-009: Issue Synchronization (bi-directional sync with conflict resolution)
- SPIKE-010: Cost Tracking (LiteLLM callbacks + budget enforcement)
- SPIKE-011: Audit Logging (structured event sourcing)
- SPIKE-012: Client Approval Flow (checkpoint-based approvals)

Add architecture documentation:
- ARCHITECTURE_DEEP_ANALYSIS.md: Memory management, security, testing strategy
- IMPLEMENTATION_ROADMAP.md: 6-phase, 24-week implementation plan

Closes #2, #6, #7, #8, #9, #10, #11, #12

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 13:31:02 +01:00
parent ebd307cab4
commit 5594655fba
10 changed files with 12654 additions and 0 deletions


@@ -0,0 +1,680 @@
# Syndarix Architecture Deep Analysis
**Version:** 1.0
**Date:** 2025-12-29
**Status:** Draft - Architectural Thinking
---
## Executive Summary
This document captures deep architectural thinking about Syndarix beyond the immediate spikes. It addresses complex challenges that arise when building a truly autonomous multi-agent system and proposes solutions based on first principles.
---
## 1. Agent Memory and Context Management
### The Challenge
Agents in Syndarix may work on projects for weeks or months. LLM context windows are finite (128K-200K tokens), but project context grows without bound. How do we maintain coherent agent "memory" over time?
### Analysis
**Context Window Constraints:**
| Model | Context Window | Practical Limit (with tools) |
|-------|---------------|------------------------------|
| Claude 3.5 Sonnet | 200K tokens | ~150K usable |
| GPT-4 Turbo | 128K tokens | ~100K usable |
| Llama 3 (70B) | 8K-128K tokens | ~80K usable |
**Memory Types Needed:**
1. **Working Memory** - Current task context (fits in context window)
2. **Short-term Memory** - Recent conversation history (RAG-retrievable)
3. **Long-term Memory** - Project knowledge, past decisions (RAG + summarization)
4. **Episodic Memory** - Specific past events/mistakes to learn from
### Proposed Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Agent Memory System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Working │ │ Short-term │ │ Long-term │ │
│ │ Memory │ │ Memory │ │ Memory │ │
│ │ (Context) │ │ (Redis) │ │ (pgvector) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └───────────────────┼──────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Context Assembler │ │
│ │ │ │
│ │ 1. System prompt (agent personality, role) │ │
│ │ 2. Project context (from long-term memory) │ │
│ │ 3. Task context (current issue, requirements) │ │
│ │ 4. Relevant history (from short-term memory) │ │
│ │ 5. User message │ │
│ │ │ │
│ │ Total: Fit within context window limits │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Context Compression Strategy:**
```python
class ContextManager:
"""Manages agent context to fit within LLM limits."""
MAX_CONTEXT_TOKENS = 100_000 # Leave room for response
async def build_context(
self,
agent: AgentInstance,
task: Task,
user_message: str
) -> list[Message]:
# Fixed costs
system_prompt = self._get_system_prompt(agent) # ~2K tokens
task_context = self._get_task_context(task) # ~1K tokens
# Variable budget
remaining = self.MAX_CONTEXT_TOKENS - token_count(system_prompt, task_context, user_message)
# Allocate remaining to memories
long_term = await self._query_long_term(agent, task, budget=remaining * 0.4)
short_term = await self._get_short_term(agent, budget=remaining * 0.4)
episodic = await self._get_relevant_episodes(agent, task, budget=remaining * 0.2)
return self._assemble_messages(
system_prompt, task_context, long_term, short_term, episodic, user_message
)
```
**Conversation Summarization:**
- After every N turns (e.g., 10), summarize conversation and archive
- Use smaller/cheaper model for summarization
- Store summaries in pgvector for semantic retrieval
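The archive-every-N-turns loop described above can be sketched as a small rolling buffer. All names and the threshold here are illustrative, not part of the codebase; `summarize` is injected so a cheaper model (or a stub in tests) can do the compression:

```python
from dataclasses import dataclass, field

SUMMARIZE_EVERY_N_TURNS = 10  # hypothetical threshold from the text above


@dataclass
class ConversationBuffer:
    """Rolling buffer that compresses turns into a summary every N turns.

    `summarize` is any callable taking the list of turns and returning a
    summary string; in production it would call a small/cheap model and
    write the result to pgvector for semantic retrieval.
    """
    summarize: callable
    turns: list = field(default_factory=list)
    summaries: list = field(default_factory=list)

    def add_turn(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        if len(self.turns) >= SUMMARIZE_EVERY_N_TURNS:
            # Archive: compress the accumulated turns and clear the buffer.
            self.summaries.append(self.summarize(self.turns))
            self.turns = []
```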
### Recommendation
Implement a **tiered memory system** with automatic context compression and semantic retrieval. Use Redis for hot short-term memory, pgvector for cold long-term memory, and automatic summarization to prevent context overflow.
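The Redis hot tier in the recommendation could be as simple as a capped list per agent. Key naming and the cap are assumptions; any client exposing `lpush`/`ltrim`/`lrange` (e.g. redis-py) fits this shape:

```python
def remember(client, agent_id: str, entry: str, max_entries: int = 50) -> None:
    """Push a short-term memory entry and cap the list (hot tier)."""
    key = f"agent:{agent_id}:stm"        # hypothetical key scheme
    client.lpush(key, entry)             # newest entries go to the front
    client.ltrim(key, 0, max_entries - 1)  # evict the oldest beyond the cap


def recall(client, agent_id: str, n: int = 10) -> list:
    """Return the n most recent entries, newest first."""
    return client.lrange(f"agent:{agent_id}:stm", 0, n - 1)
```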
---
## 2. Cross-Project Knowledge Sharing
### The Challenge
Each project has isolated knowledge, but agents could benefit from cross-project learnings:
- Common patterns (authentication, testing, CI/CD)
- Technology expertise (how to configure Kubernetes)
- Anti-patterns (what didn't work before)
### Analysis
**Privacy Considerations:**
- Client data must remain isolated (contractual, legal)
- Technical patterns are generally shareable
- Need clear data classification
**Knowledge Categories:**
| Category | Scope | Examples |
|----------|-------|----------|
| **Client Data** | Project-only | Requirements, business logic, code |
| **Technical Patterns** | Global | Best practices, configurations |
| **Agent Learnings** | Global | What approaches worked/failed |
| **Anti-patterns** | Global | Common mistakes to avoid |
### Proposed Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Knowledge Graph │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ GLOBAL KNOWLEDGE │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Patterns │ │ Anti-patterns│ │ Expertise │ │ │
│ │ │ Library │ │ Library │ │ Index │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ Curated extraction │
│ │ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Project A │ │ Project B │ │ Project C │ │
│ │ Knowledge │ │ Knowledge │ │ Knowledge │ │
│ │ (Isolated) │ │ (Isolated) │ │ (Isolated) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Knowledge Extraction Pipeline:**
```python
class KnowledgeExtractor:
"""Extracts shareable learnings from project work."""
async def extract_learnings(self, project_id: str) -> list[Learning]:
"""
Run periodically or after sprints to extract learnings.
Human review required before promoting to global.
"""
# Get completed work
completed_issues = await self.get_completed_issues(project_id)
# Extract patterns using LLM
patterns = await self.llm.extract_patterns(
completed_issues,
categories=["architecture", "testing", "deployment", "security"]
)
# Classify privacy
for pattern in patterns:
pattern.privacy_level = await self.llm.classify_privacy(pattern)
# Return only shareable patterns for review
return [p for p in patterns if p.privacy_level == "public"]
```
### Recommendation
Implement **privacy-aware knowledge extraction** with human review gate. Project knowledge stays isolated by default; only explicitly approved patterns flow to global knowledge.
---
## 3. Agent Specialization vs Generalization Trade-offs
### The Challenge
Should each agent type be highly specialized (depth) or have overlapping capabilities (breadth)?
### Analysis
**Specialization Benefits:**
- Deeper expertise in domain
- Cleaner system prompts
- Less confusion about responsibilities
- Easier to optimize prompts per role
**Generalization Benefits:**
- Fewer agent types to maintain
- Smoother handoffs (shared context)
- More flexible team composition
- Graceful degradation if agent unavailable
**Current Agent Types (10):**
| Role | Primary Domain | Potential Overlap |
|------|---------------|-------------------|
| Product Owner | Requirements | Business Analyst |
| Business Analyst | Documentation | Product Owner |
| Project Manager | Planning | Product Owner |
| Software Architect | Design | Senior Engineer |
| Software Engineer | Coding | Architect, QA |
| UI/UX Designer | Interface | Frontend Engineer |
| QA Engineer | Testing | Software Engineer |
| DevOps Engineer | Infrastructure | Senior Engineer |
| AI/ML Engineer | ML/AI | Software Engineer |
| Security Expert | Security | All |
### Proposed Approach: Layered Specialization
```
┌─────────────────────────────────────────────────────────────────┐
│ Agent Capability Layers │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 3: Role-Specific Expertise │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Product │ │ Architect│ │Engineer │ │ QA │ │
│ │ Owner │ │ │ │ │ │ │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ Layer 2: Shared Professional Skills │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Technical Communication | Code Understanding | Git │ │
│ │ Documentation | Research | Problem Decomposition │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ Layer 1: Foundation Model Capabilities │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Reasoning | Analysis | Writing | Coding (LLM Base) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Capability Inheritance:**
```python
class AgentTypeBuilder:
"""Builds agent types with layered capabilities."""
BASE_CAPABILITIES = [
"reasoning", "analysis", "writing", "coding_assist"
]
PROFESSIONAL_SKILLS = [
"technical_communication", "code_understanding",
"git_operations", "documentation", "research"
]
ROLE_SPECIFIC = {
"ENGINEER": ["code_generation", "code_review", "testing", "debugging"],
"ARCHITECT": ["system_design", "adr_writing", "tech_selection"],
"QA": ["test_planning", "test_automation", "bug_reporting"],
# ...
}
def build_capabilities(self, role: AgentRole) -> list[str]:
return (
self.BASE_CAPABILITIES +
self.PROFESSIONAL_SKILLS +
self.ROLE_SPECIFIC[role]
)
```
### Recommendation
Adopt **layered specialization** where all agents share foundational and professional capabilities, with role-specific expertise on top. This enables smooth collaboration while maintaining clear responsibilities.
---
## 4. Human-Agent Collaboration Model
### The Challenge
Beyond approval gates, how do humans effectively collaborate with autonomous agents during active work?
### Interaction Patterns
| Pattern | Use Case | Frequency |
|---------|----------|-----------|
| **Approval** | Confirm before action | Per checkpoint |
| **Guidance** | Steer direction | On-demand |
| **Override** | Correct mistake | Rare |
| **Pair Working** | Work together | Optional |
| **Review** | Evaluate output | Post-completion |
### Proposed Collaboration Interface
```
┌─────────────────────────────────────────────────────────────────┐
│ Human-Agent Collaboration Dashboard │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Activity Stream │ │
│ │ ────────────────────────────────────────────────────── │ │
│ │ [10:23] Dave (Engineer) is implementing login API │ │
│ │ [10:24] Dave created auth/service.py │ │
│ │ [10:25] Dave is writing unit tests │ │
│ │ [LIVE] Dave: "I'm adding JWT validation. Using HS256..." │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Intervention Panel │ │
│ │ │ │
│ │ [💬 Chat] [⏸️ Pause] [↩️ Undo Last] [📝 Guide] │ │
│ │ │ │
│ │ Quick Guidance: │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ "Use RS256 instead of HS256 for JWT signing" │ │ │
│ │ │ [Send] 📤 │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Intervention API:**
```python
@router.post("/agents/{agent_id}/intervene")
async def intervene(
agent_id: UUID,
intervention: InterventionRequest,
current_user: User = Depends(get_current_user)
):
"""Allow human to intervene in agent work."""
match intervention.type:
case "pause":
await orchestrator.pause_agent(agent_id)
case "resume":
await orchestrator.resume_agent(agent_id)
case "guide":
await orchestrator.send_guidance(agent_id, intervention.message)
case "undo":
await orchestrator.undo_last_action(agent_id)
case "override":
await orchestrator.override_decision(agent_id, intervention.decision)
```
### Recommendation
Build a **real-time collaboration dashboard** with intervention capabilities. Humans should be able to observe, guide, pause, and correct agents without stopping the entire workflow.
---
## 5. Testing Strategy for Autonomous AI Systems
### The Challenge
Traditional testing (unit, integration, E2E) doesn't capture autonomous agent behavior. How do we ensure quality?
### Testing Pyramid for AI Agents
```
            ╱ Agent E2E ╲         Agent scenarios (full workflows)
           ╱─────────────╲
          ╱  Integration  ╲       Tool + LLM integration
         ╱   (mock LLM)    ╲      (deterministic responses)
        ╱───────────────────╲
       ╱     Unit Tests      ╲    Orchestrator, services
      ╱    (no LLM needed)    ╲   (pure logic)
     ╱─────────────────────────╲
    ╱      Prompt Testing       ╲  System prompt evaluation
   ╱        (LLM evals)          ╲ (quality metrics)
  ╱───────────────────────────────╲
```
### Test Categories
**1. Prompt Testing (Eval Framework):**
```python
class PromptEvaluator:
"""Evaluate system prompt quality."""
TEST_CASES = [
EvalCase(
name="requirement_extraction",
input="Client wants a mobile app for food delivery",
expected_behaviors=[
"asks clarifying questions",
"identifies stakeholders",
"considers non-functional requirements"
]
),
EvalCase(
name="code_review_thoroughness",
input="Review this PR: [vulnerable SQL code]",
expected_behaviors=[
"identifies SQL injection",
"suggests parameterized queries",
"mentions security best practices"
]
)
]
async def evaluate(self, agent_type: AgentType) -> EvalReport:
results = []
for case in self.TEST_CASES:
response = await self.llm.complete(
system=agent_type.system_prompt,
user=case.input
)
score = await self.judge_response(response, case.expected_behaviors)
results.append(score)
return EvalReport(results)
```
**2. Integration Testing (Mock LLM):**
```python
import pytest

@pytest.fixture
def mock_llm():
    """Deterministic LLM responses for integration tests."""
    responses = {
        "analyze requirements": "...",
        "generate code": "def hello(): return 'world'",
        "review code": "LGTM"
    }
    return MockLLM(responses)

@pytest.mark.asyncio
async def test_story_implementation_workflow(mock_llm):
    """Test full workflow with predictable responses."""
    orchestrator = AgentOrchestrator(llm=mock_llm)
    result = await orchestrator.execute_workflow(
        workflow="implement_story",
        inputs={"story_id": "TEST-123"}
    )
    assert result.status == "completed"
    assert "hello" in result.artifacts["code"]
```
**3. Agent Scenario Testing:**
```python
class AgentScenarioTest:
"""End-to-end agent behavior testing."""
@scenario("engineer_handles_bug_report")
async def test_bug_resolution(self):
"""Engineer agent should fix bugs correctly."""
# Setup
project = await create_test_project()
engineer = await spawn_agent("engineer", project)
# Act
bug = await create_issue(
project,
title="Login button not working",
type="bug"
)
result = await engineer.handle(bug)
# Assert
assert result.pr_created
assert result.tests_pass
assert "button" in result.changes_summary.lower()
```
### Recommendation
Implement a **multi-layer testing strategy** with prompt evals, deterministic integration tests, and scenario-based agent testing. Use LLM-as-judge for evaluating open-ended responses.
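The `judge_response` helper referenced in the PromptEvaluator sketch could reduce LLM-as-judge to a per-behavior yes/no question. `ask_judge` is an assumed callable wrapping the judge model, so evals stay testable with a stub:

```python
def judge_response(response: str, expected_behaviors: list, ask_judge) -> float:
    """Score a response as the fraction of expected behaviors confirmed.

    `ask_judge(question) -> bool` wraps a judge LLM call; any callable
    works (a keyword stub in tests, a real model in CI evals).
    """
    if not expected_behaviors:
        return 1.0
    hits = sum(
        1 for behavior in expected_behaviors
        if ask_judge(f"Does this response {behavior}?\n\n{response}")
    )
    return hits / len(expected_behaviors)
```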
---
## 6. Rollback and Recovery
### The Challenge
Autonomous agents will make mistakes. How do we recover gracefully?
### Error Categories
| Category | Example | Recovery Strategy |
|----------|---------|-------------------|
| **Reversible** | Wrong code generated | Revert commit, regenerate |
| **Partially Reversible** | Merged bad PR | Revert PR, fix, re-merge |
| **Non-reversible** | Deployed to production | Forward-fix or rollback deploy |
| **External Side Effects** | Email sent to client | Apology + correction |
### Recovery Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Recovery System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Action Log │ │
│ │ ┌──────────────────────────────────────────────────┐ │ │
│ │ │ Action ID | Agent | Type | Reversible | State │ │ │
│ │ ├──────────────────────────────────────────────────┤ │ │
│ │ │ a-001 | Dave | commit | Yes | completed │ │ │
│ │ │ a-002 | Dave | push | Yes | completed │ │ │
│ │ │ a-003 | Dave | create_pr | Yes | completed │ │ │
│ │ │ a-004 | Kate | merge_pr | Partial | completed │ │ │
│ │ └──────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Rollback Engine │ │
│ │ │ │
│ │ rollback_to(action_id) -> Reverses all actions after │ │
│ │ undo_action(action_id) -> Reverses single action │ │
│ │ compensate(action_id) -> Creates compensating action │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Action Logging:**
```python
from datetime import datetime
from uuid import UUID, uuid4

class ActionLog:
"""Immutable log of all agent actions for recovery."""
async def record(
self,
agent_id: UUID,
action_type: str,
inputs: dict,
outputs: dict,
reversible: bool,
reverse_action: str | None = None
) -> ActionRecord:
record = ActionRecord(
id=uuid4(),
agent_id=agent_id,
action_type=action_type,
inputs=inputs,
outputs=outputs,
reversible=reversible,
reverse_action=reverse_action,
timestamp=datetime.utcnow()
)
await self.db.add(record)
return record
async def rollback_to(self, action_id: UUID) -> RollbackResult:
"""Rollback all actions after the given action."""
actions = await self.get_actions_after(action_id)
results = []
for action in reversed(actions):
if action.reversible:
result = await self._execute_reverse(action)
results.append(result)
else:
results.append(RollbackSkipped(action, reason="non-reversible"))
return RollbackResult(results)
```
**Compensation Pattern:**
```python
class CompensationEngine:
"""Handles compensating actions for non-reversible operations."""
COMPENSATIONS = {
"email_sent": "send_correction_email",
"deployment": "rollback_deployment",
"external_api_call": "create_reversal_request"
}
async def compensate(self, action: ActionRecord) -> CompensationResult:
if action.action_type in self.COMPENSATIONS:
compensation = self.COMPENSATIONS[action.action_type]
return await self._execute_compensation(compensation, action)
else:
return CompensationResult(
status="manual_required",
message=f"No automatic compensation for {action.action_type}"
)
```
### Recommendation
Implement **comprehensive action logging** with rollback capabilities. Define compensation strategies for non-reversible actions. Enable point-in-time recovery for project state.
---
## 7. Security Considerations for Autonomous Agents
### Threat Model
| Threat | Risk | Mitigation |
|--------|------|------------|
| Agent executes malicious code | High | Sandboxed execution, code review gates |
| Agent exfiltrates data | High | Network isolation, output filtering |
| Prompt injection via user input | Medium | Input sanitization, prompt hardening |
| Agent credential abuse | Medium | Least-privilege tokens, short TTL |
| Agent collusion | Low | Independent agent instances, monitoring |
### Security Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Security Layers │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 4: Output Filtering │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Code scan before commit │ │
│ │ - Secrets detection │ │
│ │ - Policy compliance check │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 3: Action Authorization │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Role-based permissions │ │
│ │ - Project scope enforcement │ │
│ │ - Sensitive action approval │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 2: Input Sanitization │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Prompt injection detection │ │
│ │ - Content filtering │ │
│ │ - Schema validation │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 1: Infrastructure Isolation │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Container sandboxing │ │
│ │ - Network segmentation │ │
│ │ - File system restrictions │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
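Layer 4's secrets detection can start as a regex scan run before every commit. These patterns are illustrative only; production scanners ship far larger rulesets:

```python
import re

# Illustrative patterns only -- real rulesets cover many more formats.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_api_key": re.compile(
        r"(?i)\bapi[_-]?key\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}


def scan_for_secrets(text: str) -> list:
    """Return names of secret patterns found; block the commit if non-empty."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]
```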
### Recommendation
Implement **defense-in-depth** with multiple security layers. Assume agents can be compromised and design for containment.
---
## Summary of Recommendations
| Area | Recommendation | Priority |
|------|----------------|----------|
| Memory | Tiered memory with context compression | High |
| Knowledge | Privacy-aware extraction with human gate | Medium |
| Specialization | Layered capabilities with role-specific top | Medium |
| Collaboration | Real-time dashboard with intervention | High |
| Testing | Multi-layer with prompt evals | High |
| Recovery | Action logging with rollback engine | High |
| Security | Defense-in-depth, assume compromise | High |
---
## Next Steps
1. **Validate with spike research** - Update based on spike findings
2. **Create detailed ADRs** - For memory, recovery, security
3. **Prototype critical paths** - Memory system, rollback engine
4. **Security review** - External audit before production
---
*This document captures architectural thinking to guide implementation. It should be updated as spikes complete and design evolves.*


@@ -0,0 +1,339 @@
# Syndarix Implementation Roadmap
**Version:** 1.0
**Date:** 2025-12-29
**Status:** Draft
---
## Executive Summary
This roadmap outlines the phased implementation approach for Syndarix, prioritizing foundational infrastructure before advanced features. Each phase builds upon the previous, with clear milestones and deliverables.
---
## Phase 0: Foundation (Weeks 1-2)
**Goal:** Establish development infrastructure and basic platform
### 0.1 Repository Setup
- [x] Fork PragmaStack to Syndarix
- [x] Create spike backlog in Gitea
- [x] Complete architecture documentation
- [ ] Rebrand codebase (Issue #13 - in progress)
- [ ] Configure CI/CD pipelines
- [ ] Set up development environment documentation
### 0.2 Core Infrastructure
- [ ] Configure Redis for cache + pub/sub
- [ ] Set up Celery worker infrastructure
- [ ] Configure pgvector extension
- [ ] Create MCP server directory structure
- [ ] Set up Docker Compose for local development
### Deliverables
- Fully branded Syndarix repository
- Working local development environment
- CI/CD pipeline running tests
---
## Phase 1: Core Platform (Weeks 3-6)
**Goal:** Basic project and agent management without LLM integration
### 1.1 Data Model
- [ ] Create Project entity and CRUD
- [ ] Create AgentType entity and CRUD
- [ ] Create AgentInstance entity and CRUD
- [ ] Create Issue entity with external tracker fields
- [ ] Create Sprint entity and CRUD
- [ ] Database migrations with Alembic
### 1.2 API Layer
- [ ] Project management endpoints
- [ ] Agent type configuration endpoints
- [ ] Agent instance management endpoints
- [ ] Issue CRUD endpoints
- [ ] Sprint management endpoints
### 1.3 Real-time Infrastructure
- [ ] Implement EventBus with Redis Pub/Sub
- [ ] Create SSE endpoint for project events
- [ ] Implement event types enum
- [ ] Add keepalive mechanism
- [ ] Client-side SSE handling
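A minimal sketch of the event plumbing this phase targets: per-project Pub/Sub channel naming, SSE wire formatting, and the keepalive comment line. Channel names and event shapes are assumptions, not settled design:

```python
import json


def project_channel(project_id: str) -> str:
    """Redis Pub/Sub channel per project (naming is an assumption)."""
    return f"events:project:{project_id}"


def format_sse(event_type: str, data: dict) -> str:
    """Serialize an event into Server-Sent Events wire format."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"


# SSE comment line; clients ignore it, but it keeps the connection alive.
KEEPALIVE = ": keepalive\n\n"
```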
### 1.4 Frontend Foundation
- [ ] Project dashboard page
- [ ] Agent configuration UI
- [ ] Issue list and detail views
- [ ] Real-time activity feed component
- [ ] Basic navigation and layout
### Deliverables
- CRUD operations for all core entities
- Real-time event streaming working
- Basic admin UI for configuration
---
## Phase 2: MCP Integration (Weeks 7-10)
**Goal:** Build MCP servers for external integrations
### 2.1 MCP Client Infrastructure
- [ ] Create MCPClientManager class
- [ ] Implement server registry
- [ ] Add connection management with reconnection
- [ ] Create tool call routing
### 2.2 LLM Gateway MCP (Priority 1)
- [ ] Create FastMCP server structure
- [ ] Implement LiteLLM integration
- [ ] Add model group routing
- [ ] Implement failover chain
- [ ] Add cost tracking callbacks
- [ ] Create token usage logging
### 2.3 Knowledge Base MCP (Priority 2)
- [ ] Create pgvector schema for embeddings
- [ ] Implement document ingestion pipeline
- [ ] Create chunking strategies (code, markdown, text)
- [ ] Implement semantic search
- [ ] Add hybrid search (vector + keyword)
- [ ] Per-project collection isolation
### 2.4 Git MCP (Priority 3)
- [ ] Create Git operations wrapper
- [ ] Implement clone, commit, push operations
- [ ] Add branch management
- [ ] Create PR operations
- [ ] Add Gitea API integration
- [ ] Implement GitHub/GitLab adapters
### 2.5 Issues MCP (Priority 4)
- [ ] Create issue sync service
- [ ] Implement Gitea issue operations
- [ ] Add GitHub issue adapter
- [ ] Add GitLab issue adapter
- [ ] Implement bi-directional sync
- [ ] Create conflict resolution logic
### Deliverables
- 4 working MCP servers
- LLM calls routed through gateway
- RAG search functional
- Git operations working
- Issue sync with external trackers
---
## Phase 3: Agent Orchestration (Weeks 11-14)
**Goal:** Enable agents to perform autonomous work
### 3.1 Agent Runner
- [ ] Create AgentRunner class
- [ ] Implement context assembly
- [ ] Add memory management (short-term, long-term)
- [ ] Implement action execution
- [ ] Add tool call handling
- [ ] Create agent error handling
### 3.2 Agent Orchestrator
- [ ] Implement spawn_agent method
- [ ] Create terminate_agent method
- [ ] Implement send_message routing
- [ ] Add broadcast functionality
- [ ] Create agent status tracking
- [ ] Implement agent recovery
### 3.3 Inter-Agent Communication
- [ ] Define message format schema
- [ ] Implement message persistence
- [ ] Create message routing logic
- [ ] Add @mention parsing
- [ ] Implement priority queues
- [ ] Add conversation threading
### 3.4 Background Task Integration
- [ ] Create Celery task wrappers
- [ ] Implement progress reporting
- [ ] Add task chaining for workflows
- [ ] Create agent queue routing
- [ ] Implement task retry logic
### Deliverables
- Agents can be spawned and communicate
- Agents can call MCP tools
- Background tasks for long operations
- Agent activity visible in real-time
---
## Phase 4: Workflow Engine (Weeks 15-18)
**Goal:** Implement structured workflows for software delivery
### 4.1 State Machine Foundation
- [ ] Create workflow state machine base
- [ ] Implement state persistence
- [ ] Add transition validation
- [ ] Create state history logging
- [ ] Implement compensation patterns
### 4.2 Core Workflows
- [ ] Requirements Discovery workflow
- [ ] Architecture Spike workflow
- [ ] Sprint Planning workflow
- [ ] Story Implementation workflow
- [ ] Sprint Demo workflow
### 4.3 Approval Gates
- [ ] Create approval checkpoint system
- [ ] Implement approval UI components
- [ ] Add notification triggers
- [ ] Create timeout handling
- [ ] Implement escalation logic
### 4.4 Autonomy Levels
- [ ] Implement FULL_CONTROL mode
- [ ] Implement MILESTONE mode
- [ ] Implement AUTONOMOUS mode
- [ ] Create autonomy configuration UI
- [ ] Add per-action approval overrides
### Deliverables
- Structured workflows executing
- Approval gates working
- Autonomy levels configurable
- Full sprint cycle possible
---
## Phase 5: Advanced Features (Weeks 19-22)
**Goal:** Polish and production readiness
### 5.1 Cost Management
- [ ] Real-time cost tracking dashboard
- [ ] Budget configuration per project
- [ ] Alert threshold system
- [ ] Cost optimization recommendations
- [ ] Historical cost analytics
### 5.2 Audit & Compliance
- [ ] Comprehensive action logging
- [ ] Audit trail viewer UI
- [ ] Export functionality
- [ ] Retention policy implementation
- [ ] Compliance report generation
### 5.3 Human-Agent Collaboration
- [ ] Live activity dashboard
- [ ] Intervention panel (pause, guide, undo)
- [ ] Agent chat interface
- [ ] Context inspector
- [ ] Decision explainer
### 5.4 Additional MCP Servers
- [ ] File System MCP
- [ ] Code Analysis MCP
- [ ] CI/CD MCP
### Deliverables
- Production-ready system
- Full observability
- Cost controls active
- Audit compliance
---
## Phase 6: Polish & Launch (Weeks 23-24)
**Goal:** Production deployment
### 6.1 Performance Optimization
- [ ] Load testing
- [ ] Query optimization
- [ ] Caching optimization
- [ ] Memory profiling
### 6.2 Security Hardening
- [ ] Security audit
- [ ] Penetration testing
- [ ] Secrets management
- [ ] Rate limiting tuning
### 6.3 Documentation
- [ ] User documentation
- [ ] API documentation
- [ ] Deployment guide
- [ ] Runbook
### 6.4 Deployment
- [ ] Production environment setup
- [ ] Monitoring & alerting
- [ ] Backup & recovery
- [ ] Launch checklist
---
## Risk Register
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| LLM API outages | High | Medium | Multi-provider failover |
| Cost overruns | High | Medium | Budget enforcement, local models |
| Agent hallucinations | High | Medium | Approval gates, code review |
| Performance bottlenecks | Medium | Medium | Load testing, caching |
| Integration failures | Medium | Low | Contract testing, mocks |
---
## Success Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Agent task success rate | >90% | Completed tasks / total tasks |
| Response time (P95) | <2s | API latency |
| Cost per project | <$50/sprint | LLM + compute costs |
| Time to first commit | <1 hour | From requirements to PR |
| Client satisfaction | >4/5 | Post-sprint survey |
---
## Dependencies
```
Phase 0 ─────▶ Phase 1 ─────▶ Phase 2 ─────▶ Phase 3 ─────▶ Phase 4 ─────▶ Phase 5 ─────▶ Phase 6
Foundation Core Platform MCP Integration Agent Orch Workflows Advanced Launch
Depends on:
- LLM Gateway
- Knowledge Base
- Real-time events
```
---
## Resource Requirements
### Development Team
- 1 Backend Engineer (Python/FastAPI)
- 1 Frontend Engineer (React/Next.js)
- 0.5 DevOps Engineer
- 0.25 Product Manager
### Infrastructure
- PostgreSQL (managed or self-hosted)
- Redis (managed or self-hosted)
- Celery workers (2-4 instances)
- MCP servers (7 containers)
- API server (2+ instances)
- Frontend (static hosting or SSR)
### External Services
- Anthropic API (primary LLM)
- OpenAI API (fallback)
- Ollama (local models, optional)
- Gitea/GitHub/GitLab (issue tracking)
---
*This roadmap will be refined as spikes complete and requirements evolve.*

File diffs for the remaining 8 files (the SPIKE documents) suppressed because they are too large.