forked from cardosofelipe/fast-next-template
docs: add remaining ADRs and comprehensive architecture documentation
Added 7 new Architecture Decision Records completing the full set:
- ADR-008: Knowledge Base and RAG (pgvector)
- ADR-009: Agent Communication Protocol (structured messages)
- ADR-010: Workflow State Machine (transitions + PostgreSQL)
- ADR-011: Issue Synchronization (webhook-first + polling)
- ADR-012: Cost Tracking (LiteLLM callbacks + Redis budgets)
- ADR-013: Audit Logging (hash chaining + tiered storage)
- ADR-014: Client Approval Flow (checkpoint-based)

Added comprehensive ARCHITECTURE.md that:
- Summarizes all 14 ADRs in decision matrix
- Documents full system architecture with diagrams
- Explains all component interactions
- Details technology stack with self-hostability guarantee
- Covers security, scalability, and deployment

Updated IMPLEMENTATION_ROADMAP.md to mark Phase 0 completed items.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
425
docs/architecture/ARCHITECTURE.md
Normal file
@@ -0,0 +1,425 @@
|
||||
# Syndarix Architecture
|
||||
|
||||
**Version:** 1.0
**Date:** 2025-12-29
**Status:** Approved
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Syndarix is an autonomous AI-powered software consulting platform that orchestrates specialized AI agents to deliver complete software solutions. This document describes the chosen architecture, key decisions, and component interactions.
|
||||
|
||||
### Core Principles
|
||||
|
||||
1. **Self-Hostable First:** Every component can be self-hosted and is released under a permissive open-source license (MIT, BSD, and similar)
2. **Production-Ready:** Use battle-tested technologies rather than experimental frameworks
3. **Hybrid Architecture:** Combine best-in-class tools rather than adopting one monolithic framework
4. **Auditability:** Every agent action is logged and traceable
5. **Human-in-the-Loop:** Configurable autonomy with approval checkpoints
|
||||
|
||||
---
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ SYNDARIX PLATFORM │
|
||||
├─────────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ FRONTEND (Next.js 16) │ │
|
||||
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
|
||||
│ │ │ Dashboard │ │ Project │ │ Agent │ │ Approval │ │ │
|
||||
│ │ │ Pages │ │ Views │ │ Monitor │ │ Queue │ │ │
|
||||
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │
|
||||
│ └──────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ REST + SSE + WebSocket │
|
||||
│ ▼ │
|
||||
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ BACKEND (FastAPI) │ │
|
||||
│ │ │ │
|
||||
│ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │
|
||||
│ │ │ ORCHESTRATION LAYER │ │ │
|
||||
│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ │
|
||||
│ │ │ │ Agent │ │ Workflow │ │ Approval │ │ │ │
|
||||
│ │ │ │ Orchestrator│ │ Engine │ │ Service │ │ │ │
|
||||
│ │ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ │
|
||||
│ │ └─────────────────────────────────────────────────────────────────────┘ │ │
|
||||
│ │ │ │
|
||||
│ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │
|
||||
│ │ │ INTEGRATION LAYER │ │ │
|
||||
│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ │
|
||||
│ │ │ │ LLM Gateway │ │ MCP Client │ │ Event │ │ │ │
|
||||
│ │ │ │ (LiteLLM) │ │ Manager │ │ Bus │ │ │ │
|
||||
│ │ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ │
|
||||
│ │ └─────────────────────────────────────────────────────────────────────┘ │ │
|
||||
│ └──────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌───────────────────────────┼───────────────────────────┐ │
|
||||
│ ▼ ▼ ▼ │
|
||||
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
|
||||
│ │ PostgreSQL │ │ Redis │ │ Celery Workers│ │
|
||||
│ │ + pgvector │ │ (Cache/Queue) │ │ (Background) │ │
|
||||
│ └────────────────┘ └────────────────┘ └────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ MCP SERVERS │ │
|
||||
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
|
||||
│ │ │ LLM │ │Knowledge │ │ Git │ │ Issues │ │ File │ │ │
|
||||
│ │ │ Gateway │ │ Base │ │ MCP │ │ MCP │ │ System │ │ │
|
||||
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
|
||||
│ └──────────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Architecture Decisions
|
||||
|
||||
### ADR Summary Matrix
|
||||
|
||||
| ADR | Decision | Key Technology |
|-----|----------|----------------|
| ADR-001 | MCP Integration | FastMCP 2.0, Unified Singletons |
| ADR-002 | Real-time Communication | SSE primary, WebSocket for chat |
| ADR-003 | Background Tasks | Celery + Redis |
| ADR-004 | LLM Provider | LiteLLM with failover |
| ADR-005 | Tech Stack | PragmaStack + extensions |
| ADR-006 | Agent Orchestration | Type-Instance pattern |
| ADR-007 | Framework Selection | Hybrid (LangGraph + custom) |
| ADR-008 | Knowledge Base | pgvector for RAG |
| ADR-009 | Agent Communication | Structured messages + Redis Streams |
| ADR-010 | Workflows | transitions + PostgreSQL + Celery |
| ADR-011 | Issue Sync | Webhook-first + polling fallback |
| ADR-012 | Cost Tracking | LiteLLM callbacks + Redis budgets |
| ADR-013 | Audit Logging | Structlog + hash chaining |
| ADR-014 | Client Approval | Checkpoint-based + notifications |
|
||||
|
||||
---
|
||||
|
||||
## Component Deep Dives
|
||||
|
||||
### 1. Agent Orchestration
|
||||
|
||||
**Pattern:** Type-Instance
|
||||
|
||||
- **Agent Types:** Templates defining model, expertise, personality, capabilities
- **Agent Instances:** Runtime instances spawned from types, assigned to projects
- **Orchestrator:** Manages lifecycle, routing, and resource tracking
|
||||
|
||||
```
|
||||
Agent Type (Template)              Agent Instance (Runtime)
┌─────────────────────┐            ┌─────────────────────┐
│ name: "Engineer"    │───spawn───▶│ id: "eng-001"       │
│ model: "sonnet"     │            │ name: "Dave"        │
│ expertise: [py, js] │            │ project: "proj-123" │
│ capabilities: [...] │            │ context: {...}      │
└─────────────────────┘            │ status: ACTIVE      │
                                   └─────────────────────┘
|
||||
```
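
The Type-Instance split above can be sketched in a few lines of Python. This is a minimal illustration, not the Syndarix implementation: the class names, the `spawn` method, and the `eng-001` style ID scheme are assumptions inferred from the diagram.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class AgentStatus(Enum):
    IDLE = "idle"
    ACTIVE = "active"
    TERMINATED = "terminated"


@dataclass(frozen=True)
class AgentType:
    """Immutable template shared by every instance of a role."""
    name: str
    model: str
    expertise: tuple = ()
    capabilities: tuple = ()


@dataclass
class AgentInstance:
    """Runtime agent spawned from a type and bound to one project."""
    id: str
    name: str
    agent_type: AgentType
    project: str
    context: dict = field(default_factory=dict)
    status: AgentStatus = AgentStatus.ACTIVE


class Orchestrator:
    """Hypothetical orchestrator: only tracks which instances exist."""

    def __init__(self) -> None:
        self._instances = {}
        self._seq = count(1)

    def spawn(self, agent_type: AgentType, name: str, project: str) -> AgentInstance:
        # Assumed ID scheme: first three letters of the type plus a counter.
        instance_id = f"{agent_type.name[:3].lower()}-{next(self._seq):03d}"
        instance = AgentInstance(instance_id, name, agent_type, project)
        self._instances[instance_id] = instance
        return instance

    def for_project(self, project: str) -> list:
        return [i for i in self._instances.values() if i.project == project]


orchestrator = Orchestrator()
engineer = AgentType("Engineer", "sonnet", ("py", "js"))
dave = orchestrator.spawn(engineer, name="Dave", project="proj-123")
```

Keeping the type frozen means per-project state can only live on the instance, which is the property the pattern relies on.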
|
||||
|
||||
### 2. LLM Gateway (LiteLLM)
|
||||
|
||||
**Failover Chain:**
|
||||
```
|
||||
Claude 3.5 Sonnet (Primary)
        │
        ▼ (on failure)
GPT-4 Turbo (Fallback)
        │
        ▼ (on failure)
Ollama/Llama 3 (Local)
|
||||
```
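
The ordering of the failover chain can be illustrated without any gateway dependency. The sketch below is plain Python and deliberately does not use the LiteLLM API; in the real gateway this behaviour would presumably be delegated to LiteLLM's own routing. The function name, the provider labels, and the stub providers are all hypothetical.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider error triggers the next hop
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real model calls.
def _unavailable(prompt: str) -> str:
    raise TimeoutError("provider unavailable")


def _local_echo(prompt: str) -> str:
    return f"echo: {prompt}"


chain = [
    ("claude-3.5-sonnet", _unavailable),
    ("gpt-4-turbo", _unavailable),
    ("ollama/llama3", _local_echo),
]
used, text = complete_with_failover("ping", chain)
```

Collecting the per-provider errors before raising keeps the final failure debuggable, which matters when the last hop is a local model that may simply not be running.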
|
||||
|
||||
**Model Groups:**
|
||||
| Group | Use Case | Primary Model |
|-------|----------|---------------|
| high-reasoning | Architecture, complex analysis | Claude 3.5 Sonnet |
| fast-response | Quick tasks, status updates | Claude 3 Haiku |
| cost-optimized | High-volume, non-critical | Local Llama 3 |
|
||||
|
||||
### 3. Knowledge Base (RAG)
|
||||
|
||||
**Stack:** pgvector + LiteLLM embeddings
|
||||
|
||||
**Chunking Strategy:**
|
||||
| Content Type | Chunking Strategy | Embedding Model |
|--------------|-------------------|-----------------|
| Code | AST-based (function/class) | voyage-code-3 |
| Docs | Heading-based | text-embedding-3-small |
| Conversations | Turn-based | text-embedding-3-small |
|
||||
|
||||
**Search:** Hybrid (70% vector + 30% keyword)
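
One way to read the 70/30 split is as a weighted sum of two relevance scores. The sketch below assumes both scores are already normalised to the range 0 to 1; the function names and the sample document IDs are illustrative only, and the actual fusion method is defined in ADR-008, not here.

```python
def hybrid_score(vector_score: float, keyword_score: float,
                 vector_weight: float = 0.7) -> float:
    """Weighted sum of vector and keyword relevance (both assumed in [0, 1])."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    return vector_weight * vector_score + (1.0 - vector_weight) * keyword_score


def rank(candidates: dict) -> list:
    """Order document IDs by descending hybrid score.

    `candidates` maps a document ID to (vector_score, keyword_score).
    """
    return sorted(candidates,
                  key=lambda doc_id: hybrid_score(*candidates[doc_id]),
                  reverse=True)


ranking = rank({
    "auth.py::login": (0.9, 0.1),   # strong semantic match, weak keyword match
    "docs/auth.md": (0.6, 0.9),     # weaker semantic match, strong keyword match
})
```

In this example the document with the strong keyword match ranks first (about 0.69 against 0.66), showing that the 30% keyword share can overturn a vector-only ranking.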
|
||||
|
||||
### 4. Workflow Engine
|
||||
|
||||
**Stack:** transitions library + PostgreSQL + Celery
|
||||
|
||||
**Core Workflows:**
|
||||
- **Sprint Workflow:** planning → active → review → done
- **Story Workflow:** analysis → design → implementation → review → testing → done
- **PR Workflow:** submitted → reviewing → changes_requested → approved → merged
|
||||
|
||||
**Durability:** Event sourcing with state persistence to PostgreSQL
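
The Story workflow and its event-sourced durability can be sketched as follows. This is a dependency-free stand-in: it does not use the `transitions` library, and an in-memory list takes the place of the PostgreSQL event table. The class and method names are assumptions. It only models a linear workflow, so the `changes_requested` loop of the PR workflow is out of scope.

```python
class InvalidTransition(Exception):
    pass


class StoryWorkflow:
    STATES = ("analysis", "design", "implementation", "review", "testing", "done")

    def __init__(self, story_id: str) -> None:
        self.story_id = story_id
        self.state = self.STATES[0]
        self.events = []  # append-only log; stand-in for a database table

    def advance(self) -> str:
        """Move to the next state and record the transition as an event."""
        position = self.STATES.index(self.state)
        if position == len(self.STATES) - 1:
            raise InvalidTransition(f"{self.story_id} is already done")
        target = self.STATES[position + 1]
        self.events.append(
            {"story_id": self.story_id, "from": self.state, "to": target}
        )
        self.state = target
        return target

    @classmethod
    def replay(cls, story_id: str, events: list) -> "StoryWorkflow":
        """Rebuild a workflow from its event log, e.g. after a worker restart."""
        workflow = cls(story_id)
        for event in events:
            if event["from"] != workflow.state:
                raise InvalidTransition("event log is inconsistent")
            workflow.state = event["to"]
            workflow.events.append(event)
        return workflow


story = StoryWorkflow("story-42")
story.advance()  # analysis -> design
story.advance()  # design -> implementation
restored = StoryWorkflow.replay("story-42", story.events)
```

Because state is derived from the log, a Celery worker that crashes mid-story can be replaced by one that replays the events and continues from `implementation`.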
|
||||
|
||||
### 5. Real-time Communication
|
||||
|
||||
**SSE (90% of use cases):**
|
||||
- Agent activity streams
- Project progress updates
- Approval notifications
- Issue change notifications
|
||||
|
||||
**WebSocket (10% - bidirectional):**
|
||||
- Interactive chat with agents
- Real-time debugging
|
||||
|
||||
**Event Bus:** Redis Pub/Sub for cross-instance distribution
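
For readers unfamiliar with SSE, each message on the wire is a small block of `field: value` lines terminated by a blank line. The helper below is a sketch of that framing only; the function name, event name, and payload fields are hypothetical, and the Redis subscription and FastAPI streaming response are left out.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialise one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # lets clients resume via Last-Event-ID
    lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field suffices.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # blank line terminates the event


frame = format_sse("agent.activity", {"agent_id": "eng-001"}, event_id="42")
```

Setting an `id` on every frame is what allows a browser `EventSource` to reconnect and ask for missed events, which is one reason SSE suits the one-way streams listed above.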
|
||||
|
||||
### 6. Issue Synchronization
|
||||
|
||||
**Architecture:** Webhook-first + polling fallback
|
||||
|
||||
**Supported Providers:**
|
||||
- Gitea (primary)
- GitHub
- GitLab
|
||||
|
||||
**Conflict Resolution:** Last-Writer-Wins with version vectors
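
One plausible reading of "Last-Writer-Wins with version vectors" is that the vectors decide whether two copies of an issue are causally ordered, and the timestamp tie-break applies only to concurrent edits. The sketch below follows that reading; it is an assumption, as are the record layout (`vv`, `updated_at`) and the replica names.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two version vectors (replica name -> counter)."""
    keys = a.keys() | b.keys()
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def resolve(local: dict, remote: dict) -> dict:
    """Pick the winning copy of an issue."""
    order = compare(local["vv"], remote["vv"])
    if order in ("a_newer", "equal"):
        return local
    if order == "b_newer":
        return remote
    # Concurrent edits: fall back to last-writer-wins on the timestamp.
    return local if local["updated_at"] >= remote["updated_at"] else remote


local = {"title": "Fix login", "vv": {"syndarix": 2, "gitea": 1}, "updated_at": 100.0}
remote = {"title": "Fix login bug", "vv": {"syndarix": 1, "gitea": 2}, "updated_at": 105.0}
winner = resolve(local, remote)
```

Here both sides edited independently, so the vectors report a conflict and the later write from the provider wins.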
|
||||
|
||||
### 7. Cost Tracking
|
||||
|
||||
**Real-time Pipeline:**
|
||||
```
|
||||
LLM Request → LiteLLM Callback → Redis INCR → Budget Check
|
||||
│
|
||||
Async Queue → PostgreSQL → SSE Dashboard Update
|
||||
```
|
||||
|
||||
**Budget Enforcement:**
|
||||
- Soft limits: Alerts + model downgrade
|
||||
- Hard limits: Block requests
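
The soft and hard limits can be expressed as a three-way decision on accumulated spend. In the sketch below an in-memory dictionary stands in for the Redis counters, and spend is tracked in integer cents so that it maps onto Redis integer increments (`INCRBY`). The class names and thresholds are hypothetical.

```python
from enum import Enum


class BudgetDecision(Enum):
    ALLOW = "allow"
    DOWNGRADE = "downgrade"  # soft limit reached: alert and use a cheaper model
    BLOCK = "block"          # hard limit reached: reject the request


class BudgetTracker:
    def __init__(self, soft_limit_cents: int, hard_limit_cents: int) -> None:
        if soft_limit_cents > hard_limit_cents:
            raise ValueError("soft limit must not exceed hard limit")
        self.soft = soft_limit_cents
        self.hard = hard_limit_cents
        self._spent = {}  # project_id -> cents; stand-in for Redis counters

    def record(self, project_id: str, cost_cents: int) -> int:
        """Add the cost of one LLM call and return the new total."""
        self._spent[project_id] = self._spent.get(project_id, 0) + cost_cents
        return self._spent[project_id]

    def check(self, project_id: str) -> BudgetDecision:
        spent = self._spent.get(project_id, 0)
        if spent >= self.hard:
            return BudgetDecision.BLOCK
        if spent >= self.soft:
            return BudgetDecision.DOWNGRADE
        return BudgetDecision.ALLOW


budget = BudgetTracker(soft_limit_cents=800, hard_limit_cents=1000)
budget.record("proj-123", 500)
first = budget.check("proj-123")
budget.record("proj-123", 400)
second = budget.check("proj-123")
budget.record("proj-123", 200)
third = budget.check("proj-123")
```

The three checks yield allow, downgrade, and block in turn as the project crosses each threshold.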
|
||||
|
||||
### 8. Audit Logging
|
||||
|
||||
**Immutability:** SHA-256 hash chaining
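
Hash chaining makes the log tamper-evident: each entry stores the hash of its predecessor, so changing any earlier entry invalidates every later hash. The following is a minimal sketch using only the standard library; the entry layout, the all-zero genesis value, and the function names are assumptions rather than the format defined in ADR-013.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> dict:
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"payload": payload, "prev_hash": prev,
             "hash": entry_hash(prev, payload)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


audit_log = []
append(audit_log, {"agent": "eng-001", "action": "create_branch"})
append(audit_log, {"agent": "eng-001", "action": "open_pr"})
chain_ok = verify(audit_log)
```

Serialising the payload with sorted keys matters: without a canonical form, the same logical event could hash differently on re-verification.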
|
||||
|
||||
**Storage Tiers:**
|
||||
| Tier | Storage | Retention |
|------|---------|-----------|
| Hot | PostgreSQL | 0-90 days |
| Cold | S3/MinIO | 90+ days |
|
||||
|
||||
### 9. Client Approval Flow
|
||||
|
||||
**Autonomy Levels:**
|
||||
| Level | Description |
|-------|-------------|
| FULL_CONTROL | Client approves every action |
| MILESTONE | Client approves at sprint boundaries |
| AUTONOMOUS | Client approves only critical decisions |
|
||||
|
||||
**Notifications:** SSE + Email + Mobile Push
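
The autonomy levels amount to a small decision function over the kind of action an agent is about to take. The sketch below is illustrative: the `ActionKind` categories are hypothetical, and it assumes that MILESTONE also gates critical decisions, which the table does not state explicitly.

```python
from enum import Enum


class AutonomyLevel(Enum):
    FULL_CONTROL = "full_control"
    MILESTONE = "milestone"
    AUTONOMOUS = "autonomous"


class ActionKind(Enum):
    ROUTINE = "routine"
    SPRINT_BOUNDARY = "sprint_boundary"
    CRITICAL = "critical"


def requires_approval(level: AutonomyLevel, action: ActionKind) -> bool:
    """Return True when a checkpoint must pause for client approval."""
    if level is AutonomyLevel.FULL_CONTROL:
        return True
    if level is AutonomyLevel.MILESTONE:
        # Assumption: critical decisions are gated here as well.
        return action in (ActionKind.SPRINT_BOUNDARY, ActionKind.CRITICAL)
    return action is ActionKind.CRITICAL


needs_ok = requires_approval(AutonomyLevel.MILESTONE, ActionKind.ROUTINE)
```

Keeping the rule in one pure function makes it easy to unit-test every level against every action kind before wiring it into the workflow engine.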
|
||||
|
||||
---
|
||||
|
||||
## Technology Stack
|
||||
|
||||
### Core Technologies
|
||||
|
||||
| Layer | Technology | Version | License |
|-------|------------|---------|---------|
| Backend | FastAPI | 0.115+ | MIT |
| Frontend | Next.js | 16 | MIT |
| Database | PostgreSQL + pgvector | 15+ | PostgreSQL |
| Cache/Queue | Redis | 7.0+ | BSD-3 (through 7.2; 7.4+ is RSALv2/SSPLv1) |
| Task Queue | Celery | 5.3+ | BSD-3 |
| LLM Gateway | LiteLLM | Latest | MIT |
| MCP Framework | FastMCP | 2.0+ | Apache-2.0 |
|
||||
|
||||
### Self-Hostability Guarantee
|
||||
|
||||
**All components are fully self-hostable with no mandatory subscriptions:**
|
||||
|
||||
| Component | Self-Hosted | Managed Alternative (Optional) |
|-----------|-------------|--------------------------------|
| PostgreSQL | Yes | RDS, Neon, Supabase |
| Redis | Yes | Redis Cloud |
| LiteLLM | Yes | LiteLLM Enterprise |
| Celery | Yes | - |
| FastMCP | Yes | - |
|
||||
|
||||
---
|
||||
|
||||
## Data Flow Diagrams
|
||||
|
||||
### Agent Task Execution
|
||||
|
||||
```
|
||||
1. Client creates story in Syndarix
        │
        ▼
2. Story workflow transitions to "implementation"
        │
        ▼
3. Agent Orchestrator spawns Engineer instance
        │
        ▼
4. Engineer queries Knowledge Base (RAG)
        │
        ▼
5. Engineer calls LLM Gateway for code generation
        │
        ▼
6. Engineer calls Git MCP to create branch & commit
        │
        ▼
7. Engineer creates PR via Git MCP
        │
        ▼
8. Workflow transitions to "review"
        │
        ▼
9. If autonomy_level != AUTONOMOUS:
   └── Approval request created
   └── Client notified via SSE + email
        │
        ▼
10. Client approves → PR merged → Workflow to "testing"
|
||||
```
|
||||
|
||||
### Real-time Event Flow
|
||||
|
||||
```
|
||||
Agent Action
     │
     ▼
Event Bus (Redis Pub/Sub)
     │
     ├──▶ SSE Endpoint ──▶ Frontend Dashboard
     │
     ├──▶ Audit Logger ──▶ PostgreSQL
     │
     └──▶ Other Backend Instances (horizontal scaling)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Security Architecture
|
||||
|
||||
### Authentication Flow
|
||||
|
||||
- **Users:** JWT dual-token (access + refresh) via PragmaStack
- **Agents:** Service tokens for MCP communication
- **MCP Servers:** Internal network only, validated service tokens
|
||||
|
||||
### Multi-Tenancy
|
||||
|
||||
- **Project Isolation:** All queries scoped by project_id
|
||||
- **Row-Level Security:** PostgreSQL RLS for knowledge base
|
||||
- **Agent Scoping:** Every MCP tool requires project_id + agent_id
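
Agent scoping can be enforced with a guard that runs before any MCP tool body. The sketch below assumes a simple lookup table from agent to project; `ToolCall`, `ScopeError`, `authorize`, and the tool name are hypothetical and not part of FastMCP.

```python
from dataclasses import dataclass


class ScopeError(PermissionError):
    """Raised when a tool call is outside the caller's project scope."""


@dataclass(frozen=True)
class ToolCall:
    tool: str
    project_id: str
    agent_id: str


def authorize(call: ToolCall, assignments: dict) -> None:
    """Reject calls with missing IDs or an agent not assigned to the project.

    `assignments` maps agent_id -> project_id (stand-in for a database lookup).
    """
    if not call.project_id or not call.agent_id:
        raise ScopeError("project_id and agent_id are both required")
    if assignments.get(call.agent_id) != call.project_id:
        raise ScopeError(
            f"agent {call.agent_id} is not assigned to {call.project_id}"
        )


assignments = {"eng-001": "proj-123"}
authorize(ToolCall("git.create_branch", "proj-123", "eng-001"), assignments)
```

A guard like this complements the row-level security listed above: the database rejects cross-project reads, while the guard rejects cross-project tool calls before they reach Git or the file system.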
|
||||
|
||||
### Audit Trail
|
||||
|
||||
- **Hash Chaining:** Tamper-evident event log
- **Complete Coverage:** All agent actions, LLM calls, MCP tool invocations
|
||||
|
||||
---
|
||||
|
||||
## Scalability Considerations
|
||||
|
||||
### Horizontal Scaling
|
||||
|
||||
| Component | Scaling Strategy |
|-----------|-----------------|
| FastAPI | Multiple instances behind load balancer |
| Celery Workers | Add workers per queue as needed |
| PostgreSQL | Read replicas, connection pooling |
| Redis | Cluster mode for high availability |
|
||||
|
||||
### Expected Scale
|
||||
|
||||
| Metric | Target |
|--------|--------|
| Concurrent Projects | 50+ |
| Concurrent Agent Instances | 200+ |
| Background Jobs/minute | 500+ |
| SSE Connections | 200+ |
|
||||
|
||||
---
|
||||
|
||||
## Deployment Architecture
|
||||
|
||||
### Local Development
|
||||
|
||||
```
|
||||
docker-compose up
├── PostgreSQL (+ pgvector)
├── Redis
├── FastAPI Backend
├── Next.js Frontend
├── Celery Workers (agent, git, sync queues)
├── Celery Beat (scheduler)
├── Flower (monitoring)
└── MCP Servers (7 containers)
|
||||
```
|
||||
|
||||
### Production
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Load Balancer │
|
||||
└─────────────────────────────┬───────────────────────────────────┘
|
||||
│
|
||||
┌────────────────────┼────────────────────┐
|
||||
▼ ▼ ▼
|
||||
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||||
│ API Instance 1 │ │ API Instance 2 │ │ API Instance N │
|
||||
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
||||
│ │ │
|
||||
└────────────────────┼────────────────────┘
|
||||
│
|
||||
┌────────────────────┼────────────────────┐
|
||||
▼ ▼ ▼
|
||||
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||||
│ PostgreSQL │ │ Redis Cluster │ │ Celery Workers │
|
||||
│ (Primary + │ │ │ │ (Auto-scaled) │
|
||||
│ Replicas) │ │ │ │ │
|
||||
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Related Documents
|
||||
|
||||
- [Implementation Roadmap](./IMPLEMENTATION_ROADMAP.md)
- [Architecture Deep Analysis](./ARCHITECTURE_DEEP_ANALYSIS.md)
- [ADRs](../adrs/) - All architecture decision records
- [Spikes](../spikes/) - Research documents
|
||||
|
||||
---
|
||||
|
||||
## Appendix: Full ADR List
|
||||
|
||||
1. [ADR-001: MCP Integration Architecture](../adrs/ADR-001-mcp-integration-architecture.md)
2. [ADR-002: Real-time Communication](../adrs/ADR-002-realtime-communication.md)
3. [ADR-003: Background Task Architecture](../adrs/ADR-003-background-task-architecture.md)
4. [ADR-004: LLM Provider Abstraction](../adrs/ADR-004-llm-provider-abstraction.md)
5. [ADR-005: Technology Stack Selection](../adrs/ADR-005-tech-stack-selection.md)
6. [ADR-006: Agent Orchestration](../adrs/ADR-006-agent-orchestration.md)
7. [ADR-007: Agentic Framework Selection](../adrs/ADR-007-agentic-framework-selection.md)
8. [ADR-008: Knowledge Base and RAG](../adrs/ADR-008-knowledge-base-rag.md)
9. [ADR-009: Agent Communication Protocol](../adrs/ADR-009-agent-communication-protocol.md)
10. [ADR-010: Workflow State Machine](../adrs/ADR-010-workflow-state-machine.md)
11. [ADR-011: Issue Synchronization](../adrs/ADR-011-issue-synchronization.md)
12. [ADR-012: Cost Tracking](../adrs/ADR-012-cost-tracking.md)
13. [ADR-013: Audit Logging](../adrs/ADR-013-audit-logging.md)
14. [ADR-014: Client Approval Flow](../adrs/ADR-014-client-approval-flow.md)
|
||||
|
||||
---
|
||||
|
||||
*This document serves as the authoritative architecture reference for Syndarix.*
|
||||
@@ -17,9 +17,11 @@ This roadmap outlines the phased implementation approach for Syndarix, prioritiz
|
||||
|
||||
### 0.1 Repository Setup
|
||||
- [x] Fork PragmaStack to Syndarix
|
||||
- [x] Create spike backlog in Gitea
|
||||
- [x] Create spike backlog in Gitea (12 issues)
|
||||
- [x] Complete architecture documentation
|
||||
- [ ] Rebrand codebase (Issue #13 - in progress)
|
||||
- [x] Complete all spike research (SPIKE-001 through SPIKE-012)
|
||||
- [x] Create all ADRs (ADR-001 through ADR-014)
|
||||
- [x] Rebrand codebase (all URLs, names, configs updated)
|
||||
- [ ] Configure CI/CD pipelines
|
||||
- [ ] Set up development environment documentation
|
||||
|
||||
@@ -31,9 +33,12 @@ This roadmap outlines the phased implementation approach for Syndarix, prioritiz
|
||||
- [ ] Set up Docker Compose for local development
|
||||
|
||||
### Deliverables
|
||||
- Fully branded Syndarix repository
|
||||
- Working local development environment
|
||||
- CI/CD pipeline running tests
|
||||
- [x] Fully branded Syndarix repository
|
||||
- [x] Complete architecture documentation (ARCHITECTURE.md)
|
||||
- [x] All spike research completed (12 spikes)
|
||||
- [x] All ADRs documented (14 ADRs)
|
||||
- [ ] Working local development environment (Docker Compose)
|
||||
- [ ] CI/CD pipeline running tests
|
||||
|
||||
---
|
||||
|
||||
|
||||