docs: add remaining ADRs and comprehensive architecture documentation

Added 7 new Architecture Decision Records completing the full set:
- ADR-008: Knowledge Base and RAG (pgvector)
- ADR-009: Agent Communication Protocol (structured messages)
- ADR-010: Workflow State Machine (transitions + PostgreSQL)
- ADR-011: Issue Synchronization (webhook-first + polling)
- ADR-012: Cost Tracking (LiteLLM callbacks + Redis budgets)
- ADR-013: Audit Logging (hash chaining + tiered storage)
- ADR-014: Client Approval Flow (checkpoint-based)

Added comprehensive ARCHITECTURE.md that:
- Summarizes all 14 ADRs in decision matrix
- Documents full system architecture with diagrams
- Explains all component interactions
- Details technology stack with self-hostability guarantee
- Covers security, scalability, and deployment

Updated IMPLEMENTATION_ROADMAP.md to mark Phase 0 completed items.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 13:54:43 +01:00
parent bd702734c2
commit 406b25cda0
9 changed files with 1899 additions and 5 deletions
--- a/docs/architecture/ARCHITECTURE.md
+++ b/docs/architecture/ARCHITECTURE.md
@@ -0,0 +1,425 @@
+# Syndarix Architecture
+
+**Version:** 1.0
+**Date:** 2025-12-29
+**Status:** Approved
+
+---
+
+## Executive Summary
+
+Syndarix is an autonomous AI-powered software consulting platform that orchestrates specialized AI agents to deliver complete software solutions. This document describes the chosen architecture, key decisions, and component interactions.
+
+### Core Principles
+
+1. **Self-Hostable First:** All components are fully self-hostable with permissive licenses (MIT, BSD, or PostgreSQL License)
+2. **Production-Ready:** Use battle-tested technologies, not experimental frameworks
+3. **Hybrid Architecture:** Combine best-in-class tools rather than monolithic frameworks
+4. **Auditability:** Every agent action is logged and traceable
+5. **Human-in-the-Loop:** Configurable autonomy with approval checkpoints
+
+---
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                              SYNDARIX PLATFORM                                   │
+├─────────────────────────────────────────────────────────────────────────────────┤
+│                                                                                  │
+│  ┌──────────────────────────────────────────────────────────────────────────┐   │
+│  │                         FRONTEND (Next.js 16)                             │   │
+│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐         │   │
+│  │  │ Dashboard  │  │  Project   │  │  Agent     │  │  Approval  │         │   │
+│  │  │   Pages    │  │   Views    │  │  Monitor   │  │   Queue    │         │   │
+│  │  └────────────┘  └────────────┘  └────────────┘  └────────────┘         │   │
+│  └──────────────────────────────────────────────────────────────────────────┘   │
+│                                       │                                          │
+│                          REST + SSE + WebSocket                                  │
+│                                       ▼                                          │
+│  ┌──────────────────────────────────────────────────────────────────────────┐   │
+│  │                         BACKEND (FastAPI)                                 │   │
+│  │                                                                           │   │
+│  │  ┌─────────────────────────────────────────────────────────────────────┐ │   │
+│  │  │                    ORCHESTRATION LAYER                               │ │   │
+│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │ │   │
+│  │  │  │   Agent     │  │  Workflow   │  │  Approval   │                  │ │   │
+│  │  │  │ Orchestrator│  │   Engine    │  │   Service   │                  │ │   │
+│  │  │  └─────────────┘  └─────────────┘  └─────────────┘                  │ │   │
+│  │  └─────────────────────────────────────────────────────────────────────┘ │   │
+│  │                                                                           │   │
+│  │  ┌─────────────────────────────────────────────────────────────────────┐ │   │
+│  │  │                    INTEGRATION LAYER                                 │ │   │
+│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │ │   │
+│  │  │  │ LLM Gateway │  │  MCP Client │  │   Event     │                  │ │   │
+│  │  │  │  (LiteLLM)  │  │   Manager   │  │    Bus      │                  │ │   │
+│  │  │  └─────────────┘  └─────────────┘  └─────────────┘                  │ │   │
+│  │  └─────────────────────────────────────────────────────────────────────┘ │   │
+│  └──────────────────────────────────────────────────────────────────────────┘   │
+│                                       │                                          │
+│           ┌───────────────────────────┼───────────────────────────┐             │
+│           ▼                           ▼                           ▼             │
+│  ┌────────────────┐          ┌────────────────┐          ┌────────────────┐    │
+│  │   PostgreSQL   │          │     Redis      │          │  Celery Workers│    │
+│  │   + pgvector   │          │  (Cache/Queue) │          │  (Background)  │    │
+│  └────────────────┘          └────────────────┘          └────────────────┘    │
+│                                       │                                          │
+│                                       ▼                                          │
+│  ┌──────────────────────────────────────────────────────────────────────────┐   │
+│  │                         MCP SERVERS                                       │   │
+│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │   │
+│  │  │   LLM    │  │Knowledge │  │   Git    │  │  Issues  │  │   File   │   │   │
+│  │  │ Gateway  │  │   Base   │  │   MCP    │  │   MCP    │  │  System  │   │   │
+│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘   │   │
+│  └──────────────────────────────────────────────────────────────────────────┘   │
+│                                                                                  │
+└─────────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Key Architecture Decisions
+
+### ADR Summary Matrix
+
+| ADR | Decision | Key Technology |
+|-----|----------|----------------|
+| ADR-001 | MCP Integration | FastMCP 2.0, Unified Singletons |
+| ADR-002 | Real-time Communication | SSE primary, WebSocket for chat |
+| ADR-003 | Background Tasks | Celery + Redis |
+| ADR-004 | LLM Provider | LiteLLM with failover |
+| ADR-005 | Tech Stack | PragmaStack + extensions |
+| ADR-006 | Agent Orchestration | Type-Instance pattern |
+| ADR-007 | Framework Selection | Hybrid (LangGraph + custom) |
+| ADR-008 | Knowledge Base | pgvector for RAG |
+| ADR-009 | Agent Communication | Structured messages + Redis Streams |
+| ADR-010 | Workflows | transitions + PostgreSQL + Celery |
+| ADR-011 | Issue Sync | Webhook-first + polling fallback |
+| ADR-012 | Cost Tracking | LiteLLM callbacks + Redis budgets |
+| ADR-013 | Audit Logging | Structlog + hash chaining |
+| ADR-014 | Client Approval | Checkpoint-based + notifications |
+
+---
+
+## Component Deep Dives
+
+### 1. Agent Orchestration
+
+**Pattern:** Type-Instance
+
+- **Agent Types:** Templates defining model, expertise, personality, capabilities
+- **Agent Instances:** Runtime instances spawned from types, assigned to projects
+- **Orchestrator:** Manages lifecycle, routing, and resource tracking
+
+```
+Agent Type (Template)              Agent Instance (Runtime)
+┌─────────────────────┐            ┌─────────────────────┐
+│ name: "Engineer"    │───spawn───▶│ id: "eng-001"       │
+│ model: "sonnet"     │            │ name: "Dave"        │
+│ expertise: [py, js] │            │ project: "proj-123" │
+│ capabilities: [...] │            │ context: {...}      │
+└─────────────────────┘            │ status: ACTIVE      │
+                                   └─────────────────────┘
+```
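+
+A minimal sketch of the Type-Instance pattern above, assuming plain dataclasses rather than the actual Syndarix models. The names `AgentType`, `AgentInstance`, `Status`, and `spawn`, and the sequential ID scheme, are illustrative assumptions:
+
+```python
+from dataclasses import dataclass, field
+from enum import Enum
+from itertools import count
+
+
+class Status(Enum):
+    ACTIVE = "active"
+    IDLE = "idle"
+
+
+@dataclass(frozen=True)
+class AgentType:
+    """Template: configuration shared by every instance spawned from it."""
+    name: str
+    model: str
+    expertise: tuple[str, ...]
+    capabilities: tuple[str, ...] = ()
+
+
+@dataclass
+class AgentInstance:
+    """Runtime instance: one named agent assigned to one project."""
+    id: str
+    name: str
+    agent_type: AgentType
+    project: str
+    context: dict = field(default_factory=dict)
+    status: Status = Status.ACTIVE
+
+
+_sequence = count(1)  # hypothetical ID scheme, mirrors "eng-001" in the diagram
+
+
+def spawn(agent_type: AgentType, name: str, project: str) -> AgentInstance:
+    """Create a runtime instance from a template."""
+    prefix = agent_type.name[:3].lower()
+    return AgentInstance(
+        id=f"{prefix}-{next(_sequence):03d}",
+        name=name,
+        agent_type=agent_type,
+        project=project,
+    )
+
+
+engineer = AgentType(name="Engineer", model="sonnet", expertise=("py", "js"))
+dave = spawn(engineer, name="Dave", project="proj-123")
+```
+
+Freezing the type keeps per-project state on the instance only, so many instances can share one template safely.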
+
+### 2. LLM Gateway (LiteLLM)
+
+**Failover Chain:**
+```
+Claude 3.5 Sonnet (Primary)
+         │
+         ▼ (on failure)
+    GPT-4 Turbo (Fallback)
+         │
+         ▼ (on failure)
+    Ollama/Llama 3 (Local)
+```
+
+**Model Groups:**
+| Group | Use Case | Primary Model |
+|-------|----------|---------------|
+| high-reasoning | Architecture, complex analysis | Claude 3.5 Sonnet |
+| fast-response | Quick tasks, status updates | Claude 3 Haiku |
+| cost-optimized | High-volume, non-critical | Local Llama 3 |
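+
+The failover chain can be sketched in plain Python. This is a stand-in that shows the control flow only; it does not use LiteLLM's router API, and the provider stubs and the `complete_with_failover` helper are hypothetical:
+
+```python
+from typing import Callable, Sequence
+
+Provider = Callable[[str], str]
+
+
+class AllProvidersFailed(RuntimeError):
+    """Raised when every provider in the chain has failed."""
+
+
+def complete_with_failover(
+    prompt: str, chain: Sequence[tuple[str, Provider]]
+) -> tuple[str, str]:
+    """Try each provider in order and return (provider_name, response)."""
+    errors = []
+    for name, call in chain:
+        try:
+            return name, call(prompt)
+        except Exception as exc:  # any provider error triggers the next hop
+            errors.append(f"{name}: {exc}")
+    raise AllProvidersFailed("; ".join(errors))
+
+
+# Stub providers standing in for real model calls.
+def primary(prompt: str) -> str:
+    raise TimeoutError("upstream timeout")
+
+
+def fallback(prompt: str) -> str:
+    return f"fallback answer to: {prompt}"
+
+
+def local(prompt: str) -> str:
+    return f"local answer to: {prompt}"
+
+
+chain = [
+    ("claude-3.5-sonnet", primary),
+    ("gpt-4-turbo", fallback),
+    ("ollama/llama3", local),
+]
+used, text = complete_with_failover("ping", chain)
+```
+
+Here the primary stub times out, so the request is served by the second entry in the chain.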
+
+### 3. Knowledge Base (RAG)
+
+**Stack:** pgvector + LiteLLM embeddings
+
+**Chunking Strategy:**
+| Content | Strategy | Model |
+|---------|----------|-------|
+| Code | AST-based (function/class) | voyage-code-3 |
+| Docs | Heading-based | text-embedding-3-small |
+| Conversations | Turn-based | text-embedding-3-small |
+
+**Search:** Hybrid (70% vector + 30% keyword)
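+
+One way to read the 70/30 split is as a weighted sum of two relevance scores. The sketch below assumes both scores are already normalized to the range 0 to 1; `hybrid_score`, `rank`, and the candidate data are illustrative, not taken from ADR-008:
+
+```python
+def hybrid_score(
+    vector_score: float, keyword_score: float, vector_weight: float = 0.7
+) -> float:
+    """Blend vector similarity and keyword relevance into one score."""
+    return vector_weight * vector_score + (1.0 - vector_weight) * keyword_score
+
+
+def rank(candidates: list[dict]) -> list[dict]:
+    """Sort candidate chunks by blended score, best first."""
+    return sorted(
+        candidates,
+        key=lambda c: hybrid_score(c["vector"], c["keyword"]),
+        reverse=True,
+    )
+
+
+candidates = [
+    {"id": "chunk-a", "vector": 0.9, "keyword": 0.1},
+    {"id": "chunk-b", "vector": 0.6, "keyword": 0.9},
+    {"id": "chunk-c", "vector": 0.2, "keyword": 0.3},
+]
+ranked = [c["id"] for c in rank(candidates)]
+```
+
+In this data `chunk-b` outranks `chunk-a`: its strong keyword match outweighs a lower vector similarity.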
+
+### 4. Workflow Engine
+
+**Stack:** transitions library + PostgreSQL + Celery
+
+**Core Workflows:**
+- **Sprint Workflow:** planning → active → review → done
+- **Story Workflow:** analysis → design → implementation → review → testing → done
+- **PR Workflow:** submitted → reviewing → changes_requested → approved → merged
+
+**Durability:** Event sourcing with state persistence to PostgreSQL
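+
+A minimal sketch of the Story workflow with an event log, written in plain Python instead of the `transitions` library so that it has no dependencies. The in-memory `events` list stands in for rows persisted to PostgreSQL, and the class and method names are assumptions:
+
+```python
+STORY_STATES = ["analysis", "design", "implementation", "review", "testing", "done"]
+
+
+class InvalidTransition(Exception):
+    """Raised when an event does not match the current state."""
+
+
+class StoryWorkflow:
+    def __init__(self) -> None:
+        self.state = STORY_STATES[0]
+        self.events: list[tuple[str, str]] = []  # append-only (source, dest) log
+
+    def advance(self) -> str:
+        """Move to the next state and record the transition as an event."""
+        index = STORY_STATES.index(self.state)
+        if index == len(STORY_STATES) - 1:
+            raise InvalidTransition("story is already done")
+        source, dest = self.state, STORY_STATES[index + 1]
+        self.events.append((source, dest))
+        self.state = dest
+        return dest
+
+    @classmethod
+    def replay(cls, events: list[tuple[str, str]]) -> "StoryWorkflow":
+        """Rebuild a workflow from its event log, e.g. after a worker restart."""
+        workflow = cls()
+        for source, dest in events:
+            if workflow.state != source or workflow.advance() != dest:
+                raise InvalidTransition(f"cannot apply {source} -> {dest}")
+        return workflow
+
+
+story = StoryWorkflow()
+for _ in range(3):
+    story.advance()  # analysis -> design -> implementation -> review
+restored = StoryWorkflow.replay(story.events)
+```
+
+Because the current state is derived from the log, replaying the stored events is enough to recover a workflow.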
+
+### 5. Real-time Communication
+
+**SSE (90% of use cases):**
+- Agent activity streams
+- Project progress updates
+- Approval notifications
+- Issue change notifications
+
+**WebSocket (10% - bidirectional):**
+- Interactive chat with agents
+- Real-time debugging
+
+**Event Bus:** Redis Pub/Sub for cross-instance distribution
+
+### 6. Issue Synchronization
+
+**Architecture:** Webhook-first + polling fallback
+
+**Supported Providers:**
+- Gitea (primary)
+- GitHub
+- GitLab
+
+**Conflict Resolution:** Last-Writer-Wins with version vectors
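+
+One possible reading of this rule, sketched under assumptions: version vectors decide the winner when one revision causally follows the other, and the wall-clock timestamp (last writer wins) is used only for concurrent edits. `Revision`, `dominates`, and `resolve` are hypothetical names:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Revision:
+    value: str
+    clock: dict[str, int]  # version vector: replica name -> update counter
+    updated_at: float  # wall-clock timestamp, used only as a tie-breaker
+
+
+def dominates(a: dict[str, int], b: dict[str, int]) -> bool:
+    """True if clock `a` has seen every update in `b` plus at least one more."""
+    keys = a.keys() | b.keys()
+    at_least = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
+    strictly = any(a.get(k, 0) > b.get(k, 0) for k in keys)
+    return at_least and strictly
+
+
+def resolve(local: Revision, remote: Revision) -> Revision:
+    """Pick the causally newer revision; fall back to last writer wins."""
+    if dominates(local.clock, remote.clock):
+        return local
+    if dominates(remote.clock, local.clock):
+        return remote
+    # Concurrent (or identical) clocks: the later timestamp wins.
+    return local if local.updated_at >= remote.updated_at else remote
+
+
+newer = Revision("title A", {"syndarix": 2, "gitea": 1}, updated_at=100.0)
+older = Revision("title B", {"syndarix": 1, "gitea": 1}, updated_at=200.0)
+causal_winner = resolve(newer, older)
+
+concurrent = Revision("title C", {"syndarix": 1, "gitea": 2}, updated_at=200.0)
+lww_winner = resolve(newer, concurrent)
+```
+
+In the first case the causally newer revision wins even though its timestamp is older; in the second the clocks are concurrent, so the timestamp decides.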
+
+### 7. Cost Tracking
+
+**Real-time Pipeline:**
+```
+LLM Request → LiteLLM Callback → Redis INCR → Budget Check
+                    │
+              Async Queue → PostgreSQL → SSE Dashboard Update
+```
+
+**Budget Enforcement:**
+- Soft limits: Alerts + model downgrade
+- Hard limits: Block requests
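+
+The two limits can be sketched as a single check on a running counter. This is a minimal example, assuming spend is tracked in integer cents; the in-memory `spent` field stands in for the Redis counter, and `BudgetTracker`, `Decision`, and the inclusive thresholds are assumptions:
+
+```python
+from enum import Enum
+
+
+class Decision(Enum):
+    ALLOW = "allow"
+    DOWNGRADE = "downgrade"  # soft limit reached: alert and use a cheaper model
+    BLOCK = "block"  # hard limit reached: reject the request
+
+
+class BudgetTracker:
+    def __init__(self, soft_limit_cents: int, hard_limit_cents: int) -> None:
+        if soft_limit_cents > hard_limit_cents:
+            raise ValueError("soft limit must not exceed hard limit")
+        self.soft_limit = soft_limit_cents
+        self.hard_limit = hard_limit_cents
+        self.spent = 0  # stands in for a per-project Redis counter
+
+    def record(self, cost_cents: int) -> None:
+        """Add the cost of a completed LLM call to the running total."""
+        self.spent += cost_cents
+
+    def check(self) -> Decision:
+        """Decide how the next request should be handled."""
+        if self.spent >= self.hard_limit:
+            return Decision.BLOCK
+        if self.spent >= self.soft_limit:
+            return Decision.DOWNGRADE
+        return Decision.ALLOW
+
+
+budget = BudgetTracker(soft_limit_cents=800, hard_limit_cents=1000)
+decisions = []
+for cost in (500, 350, 200):
+    budget.record(cost)
+    decisions.append(budget.check())
+```
+
+With these numbers the running total goes 500, 850, 1050 cents, so the three checks move from allow to downgrade to block.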
+
+### 8. Audit Logging
+
+**Immutability:** SHA-256 hash chaining (tamper-evident: changing a past entry invalidates its hash and every later link)
+
+**Storage Tiers:**
+| Tier | Storage | Retention |
+|------|---------|-----------|
+| Hot | PostgreSQL | 0-90 days |
+| Cold | S3/MinIO | 90+ days |
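+
+Hash chaining can be illustrated with the standard library. In this sketch each entry stores the hash of its predecessor, so verification walks the chain from a fixed genesis value; the entry layout and the `append_event` and `verify` helpers are assumptions, not the ADR-013 schema:
+
+```python
+import hashlib
+import json
+
+GENESIS = "0" * 64  # placeholder "previous hash" for the first entry
+
+
+def entry_hash(prev_hash: str, event: dict) -> str:
+    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
+    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
+    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
+
+
+def append_event(log: list[dict], event: dict) -> None:
+    """Append an event, linking it to the hash of the last entry."""
+    prev_hash = log[-1]["hash"] if log else GENESIS
+    log.append(
+        {"event": event, "prev_hash": prev_hash, "hash": entry_hash(prev_hash, event)}
+    )
+
+
+def verify(log: list[dict]) -> bool:
+    """Recompute every link; any edited or reordered entry breaks the chain."""
+    prev_hash = GENESIS
+    for entry in log:
+        if entry["prev_hash"] != prev_hash:
+            return False
+        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
+            return False
+        prev_hash = entry["hash"]
+    return True
+
+
+audit_log: list[dict] = []
+append_event(audit_log, {"agent": "eng-001", "action": "create_branch"})
+append_event(audit_log, {"agent": "eng-001", "action": "commit"})
+append_event(audit_log, {"agent": "eng-001", "action": "open_pr"})
+intact = verify(audit_log)
+```
+
+Editing any stored event afterwards makes `verify` return `False`, which is what makes the log tamper-evident rather than tamper-proof.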
+
+### 9. Client Approval Flow
+
+**Autonomy Levels:**
+| Level | Description |
+|-------|-------------|
+| FULL_CONTROL | Approve every action |
+| MILESTONE | Approve sprint boundaries |
+| AUTONOMOUS | Only critical decisions |
+
+**Notifications:** SSE + Email + Mobile Push
+
+---
+
+## Technology Stack
+
+### Core Technologies
+
+| Layer | Technology | Version | License |
+|-------|------------|---------|---------|
+| Backend | FastAPI | 0.115+ | MIT |
+| Frontend | Next.js | 16 | MIT |
+| Database | PostgreSQL + pgvector | 15+ | PostgreSQL |
+| Cache/Queue | Redis | 7.0+ | BSD-3 (through 7.2; 7.4+ is not BSD-licensed) |
+| Task Queue | Celery | 5.3+ | BSD-3 |
+| LLM Gateway | LiteLLM | Latest | MIT |
+| MCP Framework | FastMCP | 2.0+ | MIT |
+
+### Self-Hostability Guarantee
+
+**All components are fully self-hostable with no mandatory subscriptions:**
+
+| Component | Self-Hosted | Managed Alternative (Optional) |
+|-----------|-------------|--------------------------------|
+| PostgreSQL | Yes | RDS, Neon, Supabase |
+| Redis | Yes | Redis Cloud |
+| LiteLLM | Yes | LiteLLM Enterprise |
+| Celery | Yes | - |
+| FastMCP | Yes | - |
+
+---
+
+## Data Flow Diagrams
+
+### Agent Task Execution
+
+```
+1. Client creates story in Syndarix
+         │
+         ▼
+2. Story workflow transitions to "implementation"
+         │
+         ▼
+3. Agent Orchestrator spawns Engineer instance
+         │
+         ▼
+4. Engineer queries Knowledge Base (RAG)
+         │
+         ▼
+5. Engineer calls LLM Gateway for code generation
+         │
+         ▼
+6. Engineer calls Git MCP to create branch & commit
+         │
+         ▼
+7. Engineer creates PR via Git MCP
+         │
+         ▼
+8. Workflow transitions to "review"
+         │
+         ▼
+9. If autonomy_level != AUTONOMOUS:
+   └── Approval request created
+   └── Client notified via SSE + email
+         │
+         ▼
+10. Client approves → PR merged → Workflow to "testing"
+```
+
+### Real-time Event Flow
+
+```
+Agent Action
+     │
+     ▼
+Event Bus (Redis Pub/Sub)
+     │
+     ├──▶ SSE Endpoint ──▶ Frontend Dashboard
+     │
+     ├──▶ Audit Logger ──▶ PostgreSQL
+     │
+     └──▶ Other Backend Instances (horizontal scaling)
+```
+
+---
+
+## Security Architecture
+
+### Authentication Flow
+
+- **Users:** JWT dual-token (access + refresh) via PragmaStack
+- **Agents:** Service tokens for MCP communication
+- **MCP Servers:** Internal network only, validated service tokens
+
+### Multi-Tenancy
+
+- **Project Isolation:** All queries scoped by project_id
+- **Row-Level Security:** PostgreSQL RLS for knowledge base
+- **Agent Scoping:** Every MCP tool requires project_id + agent_id
+
+### Audit Trail
+
+- **Hash Chaining:** Tamper-evident event log
+- **Complete Coverage:** All agent actions, LLM calls, MCP tool invocations
+
+---
+
+## Scalability Considerations
+
+### Horizontal Scaling
+
+| Component | Scaling Strategy |
+|-----------|-----------------|
+| FastAPI | Multiple instances behind load balancer |
+| Celery Workers | Add workers per queue as needed |
+| PostgreSQL | Read replicas, connection pooling |
+| Redis | Cluster mode for high availability |
+
+### Expected Scale
+
+| Metric | Target |
+|--------|--------|
+| Concurrent Projects | 50+ |
+| Concurrent Agent Instances | 200+ |
+| Background Jobs/minute | 500+ |
+| SSE Connections | 200+ |
+
+---
+
+## Deployment Architecture
+
+### Local Development
+
+```
+docker-compose up
+├── PostgreSQL (+ pgvector)
+├── Redis
+├── FastAPI Backend
+├── Next.js Frontend
+├── Celery Workers (agent, git, sync queues)
+├── Celery Beat (scheduler)
+├── Flower (monitoring)
+└── MCP Servers (7 containers)
+```
+
+### Production
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        Load Balancer                             │
+└─────────────────────────────┬───────────────────────────────────┘
+                              │
+         ┌────────────────────┼────────────────────┐
+         ▼                    ▼                    ▼
+┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
+│  API Instance 1 │  │  API Instance 2 │  │  API Instance N │
+└─────────────────┘  └─────────────────┘  └─────────────────┘
+         │                    │                    │
+         └────────────────────┼────────────────────┘
+                              │
+         ┌────────────────────┼────────────────────┐
+         ▼                    ▼                    ▼
+┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
+│   PostgreSQL    │  │  Redis Cluster  │  │  Celery Workers │
+│   (Primary +    │  │                 │  │  (Auto-scaled)  │
+│    Replicas)    │  │                 │  │                 │
+└─────────────────┘  └─────────────────┘  └─────────────────┘
+```
+
+---
+
+## Related Documents
+
+- [Implementation Roadmap](./IMPLEMENTATION_ROADMAP.md)
+- [Architecture Deep Analysis](./ARCHITECTURE_DEEP_ANALYSIS.md)
+- [ADRs](../adrs/) - All architecture decision records
+- [Spikes](../spikes/) - Research documents
+
+---
+
+## Appendix: Full ADR List
+
+1. [ADR-001: MCP Integration Architecture](../adrs/ADR-001-mcp-integration-architecture.md)
+2. [ADR-002: Real-time Communication](../adrs/ADR-002-realtime-communication.md)
+3. [ADR-003: Background Task Architecture](../adrs/ADR-003-background-task-architecture.md)
+4. [ADR-004: LLM Provider Abstraction](../adrs/ADR-004-llm-provider-abstraction.md)
+5. [ADR-005: Technology Stack Selection](../adrs/ADR-005-tech-stack-selection.md)
+6. [ADR-006: Agent Orchestration](../adrs/ADR-006-agent-orchestration.md)
+7. [ADR-007: Agentic Framework Selection](../adrs/ADR-007-agentic-framework-selection.md)
+8. [ADR-008: Knowledge Base and RAG](../adrs/ADR-008-knowledge-base-rag.md)
+9. [ADR-009: Agent Communication Protocol](../adrs/ADR-009-agent-communication-protocol.md)
+10. [ADR-010: Workflow State Machine](../adrs/ADR-010-workflow-state-machine.md)
+11. [ADR-011: Issue Synchronization](../adrs/ADR-011-issue-synchronization.md)
+12. [ADR-012: Cost Tracking](../adrs/ADR-012-cost-tracking.md)
+13. [ADR-013: Audit Logging](../adrs/ADR-013-audit-logging.md)
+14. [ADR-014: Client Approval Flow](../adrs/ADR-014-client-approval-flow.md)
+
+---
+
+*This document serves as the authoritative architecture reference for Syndarix.*
--- a/docs/architecture/IMPLEMENTATION_ROADMAP.md
+++ b/docs/architecture/IMPLEMENTATION_ROADMAP.md
@@ -17,9 +17,11 @@ This roadmap outlines the phased implementation approach for Syndarix, prioritiz

 ### 0.1 Repository Setup
 - [x] Fork PragmaStack to Syndarix
- [x] Create spike backlog in Gitea
+- [x] Create spike backlog in Gitea (12 issues)
 - [x] Complete architecture documentation
- [ ] Rebrand codebase (Issue #13 - in progress)
+- [x] Complete all spike research (SPIKE-001 through SPIKE-012)
+- [x] Create all ADRs (ADR-001 through ADR-014)
+- [x] Rebrand codebase (all URLs, names, configs updated)
 - [ ] Configure CI/CD pipelines
 - [ ] Set up development environment documentation

@@ -31,9 +33,12 @@ This roadmap outlines the phased implementation approach for Syndarix, prioritiz
 - [ ] Set up Docker Compose for local development

 ### Deliverables
- Fully branded Syndarix repository
- Working local development environment
- CI/CD pipeline running tests
+- [x] Fully branded Syndarix repository
+- [x] Complete architecture documentation (ARCHITECTURE.md)
+- [x] All spike research completed (12 spikes)
+- [x] All ADRs documented (14 ADRs)
+- [ ] Working local development environment (Docker Compose)
+- [ ] CI/CD pipeline running tests

 ---