# Syndarix Architecture

**Version:** 1.0
**Date:** 2025-12-29
**Status:** Approved

---

## Executive Summary

Syndarix is an autonomous AI-powered software consulting platform that orchestrates specialized AI agents to deliver complete software solutions. This document describes the chosen architecture, key decisions, and component interactions.

### Core Principles

1. **Self-Hostable First:** All components are fully self-hostable with permissive licenses (MIT/BSD)
2. **Production-Ready:** Use battle-tested technologies, not experimental frameworks
3. **Hybrid Architecture:** Combine best-in-class tools rather than monolithic frameworks
4. **Auditability:** Every agent action is logged and traceable
5. **Human-in-the-Loop:** Configurable autonomy with approval checkpoints

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                                SYNDARIX PLATFORM                                │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  ┌───────────────────────────────────────────────────────────────────────────┐  │
│  │                           FRONTEND (Next.js 16)                           │  │
│  │     ┌────────────┐   ┌────────────┐   ┌────────────┐   ┌────────────┐     │  │
│  │     │ Dashboard  │   │  Project   │   │   Agent    │   │  Approval  │     │  │
│  │     │   Pages    │   │   Views    │   │  Monitor   │   │   Queue    │     │  │
│  │     └────────────┘   └────────────┘   └────────────┘   └────────────┘     │  │
│  └───────────────────────────────────────────────────────────────────────────┘  │
│                                        │                                        │
│                             REST + SSE + WebSocket                              │
│                                        ▼                                        │
│  ┌───────────────────────────────────────────────────────────────────────────┐  │
│  │                             BACKEND (FastAPI)                             │  │
│  │                                                                           │  │
│  │ ┌───────────────────────────────────────────────────────────────────────┐ │  │
│  │ │                          ORCHESTRATION LAYER                          │ │  │
│  │ │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌───────────┐     │ │  │
│  │ │  │    Agent    │  │  Workflow   │  │  Approval   │  │ LangGraph │     │ │  │
│  │ │  │ Orchestrator│  │   Engine    │  │   Service   │  │  Runtime  │     │ │  │
│  │ │  │(Type-Inst.) │  │(transitions)│  │             │  │           │     │ │  │
│  │ │  └─────────────┘  └─────────────┘  └─────────────┘  └───────────┘     │ │  │
│  │ └───────────────────────────────────────────────────────────────────────┘ │  │
│  │                                                                           │  │
│  │ ┌───────────────────────────────────────────────────────────────────────┐ │  │
│  │ │                           INTEGRATION LAYER                           │ │  │
│  │ │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                    │ │  │
│  │ │  │ LLM Gateway │  │ MCP Client  │  │    Event    │                    │ │  │
│  │ │  │  (LiteLLM)  │  │   Manager   │  │     Bus     │                    │ │  │
│  │ │  └─────────────┘  └─────────────┘  └─────────────┘                    │ │  │
│  │ └───────────────────────────────────────────────────────────────────────┘ │  │
│  └───────────────────────────────────────────────────────────────────────────┘  │
│                                        │                                        │
│            ┌───────────────────────────┼───────────────────────────┐            │
│            ▼                           ▼                           ▼            │
│    ┌────────────────┐          ┌────────────────┐          ┌────────────────┐   │
│    │   PostgreSQL   │          │     Redis      │          │ Celery Workers │   │
│    │   + pgvector   │          │ (Cache/Queue)  │          │  (Background)  │   │
│    └────────────────┘          └────────────────┘          └────────────────┘   │
│                                        │                                        │
│                                        ▼                                        │
│  ┌───────────────────────────────────────────────────────────────────────────┐  │
│  │                                MCP SERVERS                                │  │
│  │   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │  │
│  │   │   LLM    │  │Knowledge │  │   Git    │  │  Issues  │  │   File   │    │  │
│  │   │ Gateway  │  │   Base   │  │   MCP    │  │   MCP    │  │  System  │    │  │
│  │   └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘    │  │
│  └───────────────────────────────────────────────────────────────────────────┘  │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘
```

---

## Key Architecture Decisions

### ADR Summary Matrix

| ADR | Decision | Key Technology |
|-----|----------|----------------|
| ADR-001 | MCP Integration | FastMCP 2.0, Unified Singletons |
| ADR-002 | Real-time Communication | SSE primary, WebSocket for chat |
| ADR-003 | Background Tasks | Celery + Redis |
| ADR-004 | LLM Provider | LiteLLM with failover |
| ADR-005 | Tech Stack | PragmaStack + extensions |
| ADR-006 | Agent Orchestration | Type-Instance pattern |
| ADR-007 | Framework Selection | Hybrid (LangGraph + transitions + Celery) |
| ADR-008 | Knowledge Base | pgvector for RAG |
| ADR-009 | Agent Communication | Structured messages + Redis Streams |
| ADR-010 | Workflows | transitions + PostgreSQL + Celery |
| ADR-011 | Issue Sync | Webhook-first + polling fallback |
| ADR-012 | Cost Tracking | LiteLLM callbacks + Redis budgets |
| ADR-013 | Audit Logging | Structlog + hash chaining |
| ADR-014 | Client Approval | Checkpoint-based + notifications |

---

## Component Deep Dives

### 1. Agent Orchestration

**Pattern:** Type-Instance

- **Agent Types:** Templates defining model, expertise, personality, capabilities
- **Agent Instances:** Runtime instances spawned from types, assigned to projects
- **Orchestrator:** Manages lifecycle, routing, and resource tracking

```
Agent Type (Template)              Agent Instance (Runtime)
┌─────────────────────┐            ┌─────────────────────┐
│ name: "Engineer"    │───spawn───▶│ id: "eng-001"       │
│ model: "sonnet"     │            │ name: "Dave"        │
│ expertise: [py, js] │            │ project: "proj-123" │
│ capabilities: [...] │            │ context: {...}      │
└─────────────────────┘            │ status: ACTIVE      │
                                   └─────────────────────┘
```

### 2. LLM Gateway (LiteLLM)

**Failover Chain:**

```
Claude Opus 4.5 (Primary)
        │
        ▼ (on failure/rate limit)
GPT 5.1 Codex max (Code specialist)
        │
        ▼ (on failure/rate limit)
Gemini 3 Pro (Multimodal)
        │
        ▼ (on failure)
Qwen3-235B / DeepSeek V3.2 (Self-hosted)
```

**Model Groups:**

| Group | Use Case | Primary Model | Fallback |
|-------|----------|---------------|----------|
| high-reasoning | Architecture, complex analysis | Claude Opus 4.5 | GPT 5.1 Codex max |
| code-generation | Code writing, refactoring | GPT 5.1 Codex max | Claude Opus 4.5 |
| fast-response | Quick tasks, status updates | Gemini 3 Flash | Qwen3-235B |
| cost-optimized | High-volume, non-critical | Qwen3-235B | DeepSeek V3.2 |
| self-hosted | Privacy-sensitive, air-gapped | DeepSeek V3.2 | Qwen3-235B |

### 3. Knowledge Base (RAG)

**Stack:** pgvector + LiteLLM embeddings

**Chunking Strategy:**

| Content | Strategy | Model |
|---------|----------|-------|
| Code | AST-based (function/class) | voyage-code-3 |
| Docs | Heading-based | text-embedding-3-small |
| Conversations | Turn-based | text-embedding-3-small |

**Search:** Hybrid (70% vector + 30% keyword)

### 4. Workflow Engine

**Stack:** transitions library + PostgreSQL + Celery

**Core Workflows:**

- **Sprint Workflow:** planning → active → review → done
- **Story Workflow:** analysis → design → implementation → review → testing → done
- **PR Workflow:** submitted → reviewing → changes_requested → approved → merged

**Durability:** Event sourcing with state persistence to PostgreSQL

### 5. Real-time Communication

**SSE (90% of use cases):**

- Agent activity streams
- Project progress updates
- Approval notifications
- Issue change notifications

**WebSocket (10% - bidirectional):**

- Interactive chat with agents
- Real-time debugging

**Event Bus:** Redis Pub/Sub for cross-instance distribution

### 6. Issue Synchronization

**Architecture:** Webhook-first + polling fallback

**Supported Providers:**

- Gitea (primary)
- GitHub
- GitLab

**Conflict Resolution:** Last-Writer-Wins with version vectors

### 7. Cost Tracking

**Real-time Pipeline:**

```
LLM Request → LiteLLM Callback → Redis INCR → Budget Check
                     │
                Async Queue → PostgreSQL → SSE Dashboard Update
```

**Budget Enforcement:**

- Soft limits: Alerts + model downgrade
- Hard limits: Block requests

### 8. Audit Logging

**Immutability:** SHA-256 hash chaining

**Storage Tiers:**

| Tier | Storage | Retention |
|------|---------|-----------|
| Hot | PostgreSQL | 0-90 days |
| Cold | S3/MinIO | 90+ days |

### 9. Client Approval Flow

**Autonomy Levels:**

| Level | Description |
|-------|-------------|
| FULL_CONTROL | Approve every action |
| MILESTONE | Approve sprint boundaries |
| AUTONOMOUS | Only critical decisions |

**Notifications:** SSE + Email + Mobile Push

---

## Technology Stack

### Core Technologies

| Layer | Technology | Version | License |
|-------|------------|---------|---------|
| Backend | FastAPI | 0.115+ | MIT |
| Frontend | Next.js | 16 | MIT |
| Database | PostgreSQL + pgvector | 15+ | PostgreSQL |
| Cache/Queue | Redis | 7.0+ | BSD-3 |
| Task Queue | Celery | 5.3+ | BSD-3 |
| LLM Gateway | LiteLLM | Latest | MIT |
| MCP Framework | FastMCP | 2.0+ | MIT |

### Self-Hostability Guarantee

**All components are fully self-hostable with no mandatory subscriptions:**

| Component | License | Self-Hosted | Managed Alternative (Optional) |
|-----------|---------|-------------|--------------------------------|
| PostgreSQL | PostgreSQL | Yes | RDS, Neon, Supabase |
| Redis | BSD-3 | Yes | Redis Cloud |
| LiteLLM | MIT | Yes | LiteLLM Enterprise |
| Celery | BSD-3 | Yes | - |
| FastMCP | MIT | Yes | - |
| LangGraph | MIT | Yes | LangSmith (observability only) |
| transitions | MIT | Yes | - |
| DeepSeek V3.2 | MIT | Yes | API available |
| Qwen3-235B | Apache 2.0 | Yes | Alibaba Cloud |

---

## Data Flow Diagrams

### Agent Task Execution

```
1. Client creates story in Syndarix
         │
         ▼
2. Story workflow transitions to "implementation"
         │
         ▼
3. Agent Orchestrator spawns Engineer instance
         │
         ▼
4. Engineer queries Knowledge Base (RAG)
         │
         ▼
5. Engineer calls LLM Gateway for code generation
         │
         ▼
6. Engineer calls Git MCP to create branch & commit
         │
         ▼
7. Engineer creates PR via Git MCP
         │
         ▼
8. Workflow transitions to "review"
         │
         ▼
9. If autonomy_level != AUTONOMOUS:
   └── Approval request created
   └── Client notified via SSE + email
         │
         ▼
10. Client approves → PR merged → Workflow to "testing"
```

### Real-time Event Flow

```
Agent Action
     │
     ▼
Event Bus (Redis Pub/Sub)
     │
     ├──▶ SSE Endpoint ──▶ Frontend Dashboard
     │
     ├──▶ Audit Logger ──▶ PostgreSQL
     │
     └──▶ Other Backend Instances (horizontal scaling)
```

---

## Security Architecture

### Authentication Flow

- **Users:** JWT dual-token (access + refresh) via PragmaStack
- **Agents:** Service tokens for MCP communication
- **MCP Servers:** Internal network only, validated service tokens

### Multi-Tenancy

- **Project Isolation:** All queries scoped by project_id
- **Row-Level Security:** PostgreSQL RLS for knowledge base
- **Agent Scoping:** Every MCP tool requires project_id + agent_id

### Audit Trail

- **Hash Chaining:** Tamper-evident event log
- **Complete Coverage:** All agent actions, LLM calls, MCP tool invocations

---

## Scalability Considerations

### Horizontal Scaling

| Component | Scaling Strategy |
|-----------|-----------------|
| FastAPI | Multiple instances behind load balancer |
| Celery Workers | Add workers per queue as needed |
| PostgreSQL | Read replicas, connection pooling |
| Redis | Cluster mode for high availability |

### Expected Scale

| Metric | Target |
|--------|--------|
| Concurrent Projects | 50+ |
| Concurrent Agent Instances | 200+ |
| Background Jobs/minute | 500+ |
| SSE Connections | 200+ |

---

## Deployment Architecture

### Local Development

```
docker-compose up
├── PostgreSQL (+ pgvector)
├── Redis
├── FastAPI Backend
├── Next.js Frontend
├── Celery Workers (agent, git, sync queues)
├── Celery Beat (scheduler)
├── Flower (monitoring)
└── MCP Servers (7 containers)
```

### Production

```
┌───────────────────────────────────────────────────────────┐
│                       Load Balancer                       │
└─────────────────────────────┬─────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ API Instance 1  │  │ API Instance 2  │  │ API Instance N  │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│   PostgreSQL    │  │  Redis Cluster  │  │ Celery Workers  │
│   (Primary +    │  │                 │  │  (Auto-scaled)  │
│    Replicas)    │  │                 │  │                 │
└─────────────────┘  └─────────────────┘  └─────────────────┘
```

---

## Related Documents

- [Implementation Roadmap](./IMPLEMENTATION_ROADMAP.md)
- [Architecture Deep Analysis](./ARCHITECTURE_DEEP_ANALYSIS.md)
- [ADRs](../adrs/) - All architecture decision records
- [Spikes](../spikes/) - Research documents

---

## Appendix: Full ADR List

1. [ADR-001: MCP Integration Architecture](../adrs/ADR-001-mcp-integration-architecture.md)
2. [ADR-002: Real-time Communication](../adrs/ADR-002-realtime-communication.md)
3. [ADR-003: Background Task Architecture](../adrs/ADR-003-background-task-architecture.md)
4. [ADR-004: LLM Provider Abstraction](../adrs/ADR-004-llm-provider-abstraction.md)
5. [ADR-005: Technology Stack Selection](../adrs/ADR-005-tech-stack-selection.md)
6. [ADR-006: Agent Orchestration](../adrs/ADR-006-agent-orchestration.md)
7. [ADR-007: Agentic Framework Selection](../adrs/ADR-007-agentic-framework-selection.md)
8. [ADR-008: Knowledge Base and RAG](../adrs/ADR-008-knowledge-base-rag.md)
9. [ADR-009: Agent Communication Protocol](../adrs/ADR-009-agent-communication-protocol.md)
10. [ADR-010: Workflow State Machine](../adrs/ADR-010-workflow-state-machine.md)
11. [ADR-011: Issue Synchronization](../adrs/ADR-011-issue-synchronization.md)
12. [ADR-012: Cost Tracking](../adrs/ADR-012-cost-tracking.md)
13. [ADR-013: Audit Logging](../adrs/ADR-013-audit-logging.md)
14. [ADR-014: Client Approval Flow](../adrs/ADR-014-client-approval-flow.md)

---

*This document serves as the authoritative architecture reference for Syndarix.*
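
The SHA-256 hash chaining named under Audit Logging (ADR-013) can be illustrated with a minimal sketch. This is not the Syndarix implementation: the record layout, the all-zero genesis value, and the helper names `entry_hash`, `append_event`, and `verify_chain` are assumptions made for illustration. Each record stores the hash of its predecessor, so editing any earlier event invalidates every hash that follows it.

```python
import hashlib
import json

# Assumed sentinel used as the "previous hash" of the very first record.
GENESIS_HASH = "0" * 64


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous record's hash together with a canonical JSON encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain: list[dict], event: dict) -> dict:
    """Append an event to the in-memory chain, linking it to the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    record = {
        "event": event,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, event),
    }
    chain.append(record)
    return record


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash from the genesis value; any edited record breaks the chain."""
    prev_hash = GENESIS_HASH
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != entry_hash(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

In a deployment matching the storage tiers above, the chain would presumably live in PostgreSQL rather than a Python list, and verification could run as a periodic background job; both points are assumptions, not details taken from the ADR.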