forked from cardosofelipe/fast-next-template

Felipe Cardoso 88cf4e0abc feat: Update to production model stack and fix remaining inconsistencies

## Model Stack Updates (User's Actual Models)

Updated all documentation to reflect production models:
- Claude Opus 4.5 (primary reasoning)
- GPT 5.1 Codex max (code generation specialist)
- Gemini 3 Pro/Flash (multimodal, fast inference)
- Qwen3-235B (cost-effective, self-hostable)
- DeepSeek V3.2 (self-hosted, open weights)

### Files Updated:
- ADR-004: Full model groups, failover chains, cost tables
- ADR-007: Code example with correct model identifiers
- ADR-012: Cost tracking with new model prices
- ARCHITECTURE.md: Model groups, failover diagram
- IMPLEMENTATION_ROADMAP.md: External services list

## Architecture Diagram Updates

- Added LangGraph Runtime to orchestration layer
- Added technology labels (Type-Instance, transitions)

## Self-Hostability Table Expanded

Added entries for:
- LangGraph (MIT)
- transitions (MIT)
- DeepSeek V3.2 (MIT)
- Qwen3-235B (Apache 2.0)

## Metric Alignments

- Response time: Split into API (<200ms) and Agent (<10s/<60s)
- Cost per project: Adjusted to $100/sprint for Opus 4.5 pricing
- Added concurrent projects (10+) and agents (50+) metrics

## Infrastructure Updates

- Celery workers: 4-8 instances (was 2-4) across 4 queues
- MCP servers: Clarified Phase 2 + Phase 5 deployment
- Sync interval: Clarified 60s fallback + 15min reconciliation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-29 23:35:51 +01:00

Syndarix Architecture

Version: 1.0
Date: 2025-12-29
Status: Approved

Executive Summary

Syndarix is an autonomous AI-powered software consulting platform that orchestrates specialized AI agents to deliver complete software solutions. This document describes the chosen architecture, key decisions, and component interactions.

Core Principles

Self-Hostable First: All components are fully self-hostable under permissive licenses (MIT, BSD, Apache 2.0, PostgreSQL License)
Production-Ready: Use battle-tested technologies, not experimental frameworks
Hybrid Architecture: Combine best-in-class tools rather than monolithic frameworks
Auditability: Every agent action is logged and traceable
Human-in-the-Loop: Configurable autonomy with approval checkpoints

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────────┐
│                              SYNDARIX PLATFORM                                   │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                  │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │                         FRONTEND (Next.js 16)                             │   │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐         │   │
│  │  │ Dashboard  │  │  Project   │  │  Agent     │  │  Approval  │         │   │
│  │  │   Pages    │  │   Views    │  │  Monitor   │  │   Queue    │         │   │
│  │  └────────────┘  └────────────┘  └────────────┘  └────────────┘         │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                       │                                          │
│                          REST + SSE + WebSocket                                  │
│                                       ▼                                          │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │                         BACKEND (FastAPI)                                 │   │
│  │                                                                           │   │
│  │  ┌─────────────────────────────────────────────────────────────────────┐ │   │
│  │  │                    ORCHESTRATION LAYER                               │ │   │
│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌───────────┐  │ │   │
│  │  │  │   Agent     │  │  Workflow   │  │  Approval   │  │ LangGraph │  │ │   │
│  │  │  │ Orchestrator│  │   Engine    │  │   Service   │  │  Runtime  │  │ │   │
│  │  │  │(Type-Inst.) │  │(transitions)│  │             │  │           │  │ │   │
│  │  │  └─────────────┘  └─────────────┘  └─────────────┘  └───────────┘  │ │   │
│  │  └─────────────────────────────────────────────────────────────────────┘ │   │
│  │                                                                           │   │
│  │  ┌─────────────────────────────────────────────────────────────────────┐ │   │
│  │  │                    INTEGRATION LAYER                                 │ │   │
│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │ │   │
│  │  │  │ LLM Gateway │  │  MCP Client │  │   Event     │                  │ │   │
│  │  │  │  (LiteLLM)  │  │   Manager   │  │    Bus      │                  │ │   │
│  │  │  └─────────────┘  └─────────────┘  └─────────────┘                  │ │   │
│  │  └─────────────────────────────────────────────────────────────────────┘ │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                       │                                          │
│           ┌───────────────────────────┼───────────────────────────┐             │
│           ▼                           ▼                           ▼             │
│  ┌────────────────┐          ┌────────────────┐          ┌────────────────┐    │
│  │   PostgreSQL   │          │     Redis      │          │  Celery Workers│    │
│  │   + pgvector   │          │  (Cache/Queue) │          │  (Background)  │    │
│  └────────────────┘          └────────────────┘          └────────────────┘    │
│                                       │                                          │
│                                       ▼                                          │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │                         MCP SERVERS                                       │   │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │   │
│  │  │   LLM    │  │Knowledge │  │   Git    │  │  Issues  │  │   File   │   │   │
│  │  │ Gateway  │  │   Base   │  │   MCP    │  │   MCP    │  │  System  │   │   │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘   │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                  │
└─────────────────────────────────────────────────────────────────────────────────┘

Key Architecture Decisions

ADR Summary Matrix

ADR	Decision	Key Technology
ADR-001	MCP Integration	FastMCP 2.0, Unified Singletons
ADR-002	Real-time Communication	SSE primary, WebSocket for chat
ADR-003	Background Tasks	Celery + Redis
ADR-004	LLM Provider	LiteLLM with failover
ADR-005	Tech Stack	PragmaStack + extensions
ADR-006	Agent Orchestration	Type-Instance pattern
ADR-007	Framework Selection	Hybrid (LangGraph + transitions + Celery)
ADR-008	Knowledge Base	pgvector for RAG
ADR-009	Agent Communication	Structured messages + Redis Streams
ADR-010	Workflows	transitions + PostgreSQL + Celery
ADR-011	Issue Sync	Webhook-first + polling fallback
ADR-012	Cost Tracking	LiteLLM callbacks + Redis budgets
ADR-013	Audit Logging	Structlog + hash chaining
ADR-014	Client Approval	Checkpoint-based + notifications

Component Deep Dives

1. Agent Orchestration

Pattern: Type-Instance

Agent Types: Templates defining model, expertise, personality, capabilities
Agent Instances: Runtime instances spawned from types, assigned to projects
Orchestrator: Manages lifecycle, routing, and resource tracking

Agent Type (Template)              Agent Instance (Runtime)
┌─────────────────────┐            ┌─────────────────────┐
│ name: "Engineer"    │───spawn───▶│ id: "eng-001"       │
│ model: "sonnet"     │            │ name: "Dave"        │
│ expertise: [py, js] │            │ project: "proj-123" │
│ capabilities: [...]  │            │ context: {...}      │
└─────────────────────┘            │ status: ACTIVE      │
                                   └─────────────────────┘
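The Type-Instance split above can be sketched with two dataclasses and a small orchestrator. This is a minimal illustration, not the Syndarix implementation: the class names, the `spawn` and `terminate` methods, and the ID scheme (`eng-001`) are assumptions modelled on the diagram.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class AgentStatus(Enum):
    ACTIVE = "active"
    TERMINATED = "terminated"


@dataclass(frozen=True)
class AgentType:
    """Immutable template shared by every instance spawned from it."""
    name: str
    model: str
    expertise: tuple
    capabilities: tuple = ()


@dataclass
class AgentInstance:
    """Runtime agent bound to exactly one project."""
    id: str
    name: str
    agent_type: AgentType
    project: str
    context: dict = field(default_factory=dict)
    status: AgentStatus = AgentStatus.ACTIVE


class Orchestrator:
    """Hypothetical lifecycle manager: spawns and tracks instances in memory."""

    def __init__(self):
        self._next_id = count(1)
        self.instances = {}

    def spawn(self, agent_type, name, project):
        # ID scheme is an assumption: first three letters of the type + counter.
        prefix = agent_type.name[:3].lower()
        instance = AgentInstance(
            id=f"{prefix}-{next(self._next_id):03d}",
            name=name,
            agent_type=agent_type,
            project=project,
        )
        self.instances[instance.id] = instance
        return instance

    def terminate(self, instance_id):
        self.instances[instance_id].status = AgentStatus.TERMINATED
```

Keeping the type frozen means a template cannot be mutated through one of its instances; per-project state lives only on the instance.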

2. LLM Gateway (LiteLLM)

Failover Chain:

Claude Opus 4.5 (Primary)
         │
         ▼ (on failure/rate limit)
    GPT 5.1 Codex max (Code specialist)
         │
         ▼ (on failure/rate limit)
    Gemini 3 Pro (Multimodal)
         │
         ▼ (on failure)
    Qwen3-235B / DeepSeek V3.2 (Self-hosted)

Model Groups:

Group	Use Case	Primary Model	Fallback
high-reasoning	Architecture, complex analysis	Claude Opus 4.5	GPT 5.1 Codex max
code-generation	Code writing, refactoring	GPT 5.1 Codex max	Claude Opus 4.5
fast-response	Quick tasks, status updates	Gemini 3 Flash	Qwen3-235B
cost-optimized	High-volume, non-critical	Qwen3-235B	DeepSeek V3.2
self-hosted	Privacy-sensitive, air-gapped	DeepSeek V3.2	Qwen3-235B
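A rough sketch of how a model group with a fallback could be resolved. It deliberately does not call LiteLLM: `call_model` is a caller-supplied function, `ProviderError` is a hypothetical exception, and the model identifier strings are placeholders rather than real provider IDs.

```python
class ProviderError(Exception):
    """Hypothetical error standing in for a provider failure or rate limit."""


# Primary model first, fallback second, mirroring the Model Groups table.
# The identifier strings are placeholders, not real provider model IDs.
MODEL_GROUPS = {
    "high-reasoning": ["claude-opus-4.5", "gpt-5.1-codex-max"],
    "code-generation": ["gpt-5.1-codex-max", "claude-opus-4.5"],
    "fast-response": ["gemini-3-flash", "qwen3-235b"],
    "cost-optimized": ["qwen3-235b", "deepseek-v3.2"],
    "self-hosted": ["deepseek-v3.2", "qwen3-235b"],
}


def complete_with_failover(group, prompt, call_model):
    """Try each model in the group in order; return (model, response).

    `call_model(model, prompt)` is supplied by the caller and is expected
    to raise ProviderError when the model is unavailable.
    """
    errors = []
    for model in MODEL_GROUPS[group]:
        try:
            return model, call_model(model, prompt)
        except ProviderError as exc:
            errors.append(f"{model}: {exc}")
    raise ProviderError("all models failed: " + "; ".join(errors))
```

Returning the model that actually answered lets the caller attribute cost to the right provider.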

3. Knowledge Base (RAG)

Stack: pgvector + LiteLLM embeddings

Chunking Strategy:

Content	Chunking Strategy	Embedding Model
Code	AST-based (function/class)	voyage-code-3
Docs	Heading-based	text-embedding-3-small
Conversations	Turn-based	text-embedding-3-small

Search: Hybrid (70% vector + 30% keyword)
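The 70/30 weighting can be read as a weighted sum of two relevance scores. The sketch below assumes both scores are already normalized to the range 0 to 1; the function names and the candidate tuple layout are illustrative only.

```python
def hybrid_score(vector_score, keyword_score, vector_weight=0.7):
    """Blend a vector-similarity score with a keyword score.

    Assumes both inputs are normalized to [0, 1].
    """
    return vector_weight * vector_score + (1.0 - vector_weight) * keyword_score


def rank(candidates):
    """Sort (chunk_id, vector_score, keyword_score) tuples, best match first."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c[1], c[2]),
        reverse=True,
    )
```

For example, a chunk with scores (0.6, 0.9) blends to about 0.69 and outranks one with (0.9, 0.1), which blends to about 0.66, so a strong keyword match can overturn a modest vector lead.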

4. Workflow Engine

Stack: transitions library + PostgreSQL + Celery

Core Workflows:

Sprint Workflow: planning → active → review → done
Story Workflow: analysis → design → implementation → review → testing → done
PR Workflow: submitted → reviewing → changes_requested → approved → merged

Durability: Event sourcing with state persistence to PostgreSQL
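To illustrate the durability idea, here is a dependency-free sketch of the Story workflow with an append-only event log. It does not use the transitions library or PostgreSQL; the class, the in-memory `events` list, and the `replay` method are assumptions that stand in for the real persistence layer.

```python
STORY_STATES = ["analysis", "design", "implementation", "review", "testing", "done"]


class InvalidTransition(Exception):
    pass


class StoryWorkflow:
    """Linear Story workflow; every transition is recorded as an event."""

    def __init__(self, story_id):
        self.story_id = story_id
        self.state = STORY_STATES[0]
        self.events = []  # stands in for an event table in PostgreSQL

    def advance(self):
        index = STORY_STATES.index(self.state)
        if index == len(STORY_STATES) - 1:
            raise InvalidTransition(f"{self.state} is a terminal state")
        next_state = STORY_STATES[index + 1]
        self.events.append(
            {"story_id": self.story_id, "from": self.state, "to": next_state}
        )
        self.state = next_state
        return next_state

    @classmethod
    def replay(cls, story_id, events):
        """Rebuild a workflow from its stored events, e.g. after a restart."""
        workflow = cls(story_id)
        for event in events:
            if event["from"] != workflow.state:
                raise InvalidTransition(f"log out of order at {event}")
            workflow.state = event["to"]
            workflow.events.append(event)
        return workflow
```

Because state is derived from the log, a worker that crashes mid-sprint can reconstruct the workflow by replaying the stored events.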

5. Real-time Communication

SSE (90% of use cases):

Agent activity streams
Project progress updates
Approval notifications
Issue change notifications

WebSocket (10% - bidirectional):

Interactive chat with agents
Real-time debugging

Event Bus: Redis Pub/Sub for cross-instance distribution
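For the SSE side, each update sent to the browser is a small text frame. The helper below sketches that wire format; the function name and the event name used in the example are hypothetical, and the surrounding FastAPI endpoint is omitted.

```python
import json


def format_sse(event, data, event_id=None):
    """Serialize one Server-Sent Events frame.

    The payload is JSON-encoded onto a single `data:` line, and the frame
    is terminated by a blank line as the SSE format requires.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"
```

Including an `id` line lets a reconnecting browser send `Last-Event-ID`, so the server can resume the stream from the right point.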

6. Issue Synchronization

Architecture: Webhook-first + polling fallback

Supported Providers:

Gitea (primary)
GitHub
GitLab

Conflict Resolution: Last-Writer-Wins with version vectors
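One plausible reading of this rule: version vectors decide the winner when one copy strictly supersedes the other, and last-writer-wins by timestamp breaks the tie only for truly concurrent edits. The sketch below follows that reading; the record layout (`version`, `updated_at`) is an assumption.

```python
def compare(a, b):
    """Compare two version vectors (dicts mapping replica name to counter).

    Returns "after", "before", "equal", or "concurrent" from a's viewpoint.
    """
    keys = a.keys() | b.keys()
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def resolve(local, remote):
    """Pick the surviving copy of an issue.

    Each record is a dict with a "version" vector and an "updated_at"
    timestamp (both field names are assumptions for this sketch).
    """
    order = compare(local["version"], remote["version"])
    if order in ("after", "equal"):
        return local
    if order == "before":
        return remote
    # Concurrent edits: fall back to last-writer-wins on the timestamp.
    return local if local["updated_at"] >= remote["updated_at"] else remote
```

Using the vectors first avoids discarding a causally newer edit just because one side's clock is skewed.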

7. Cost Tracking

Real-time Pipeline:

LLM Request → LiteLLM Callback → Redis INCR → Budget Check
                    │
              Async Queue → PostgreSQL → SSE Dashboard Update

Budget Enforcement:

Soft limits: Alerts + model downgrade
Hard limits: Block requests
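The soft and hard limits can be sketched as a two-threshold check on a running counter. In this illustration an in-memory dict stands in for the Redis counter, spend is tracked in integer cents, and the class and enum names are hypothetical.

```python
from enum import Enum


class BudgetAction(Enum):
    ALLOW = "allow"
    DOWNGRADE = "downgrade"  # soft limit reached: alert and use a cheaper model
    BLOCK = "block"          # hard limit reached: reject the request


class BudgetTracker:
    """Per-project spend tracker; a dict stands in for Redis counters."""

    def __init__(self, soft_limit_cents, hard_limit_cents):
        if soft_limit_cents > hard_limit_cents:
            raise ValueError("soft limit must not exceed hard limit")
        self.soft_limit_cents = soft_limit_cents
        self.hard_limit_cents = hard_limit_cents
        self.spent_cents = {}

    def record(self, project_id, cost_cents):
        """Add the cost of one LLM call and return the new running total."""
        total = self.spent_cents.get(project_id, 0) + cost_cents
        self.spent_cents[project_id] = total
        return total

    def check(self, project_id):
        """Decide what to do with the next request for this project."""
        spent = self.spent_cents.get(project_id, 0)
        if spent >= self.hard_limit_cents:
            return BudgetAction.BLOCK
        if spent >= self.soft_limit_cents:
            return BudgetAction.DOWNGRADE
        return BudgetAction.ALLOW
```

Integer cents keep the counter compatible with an integer increment and avoid floating-point drift in the totals.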

8. Audit Logging

Immutability: SHA-256 hash chaining
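Hash chaining means each log entry stores the hash of its predecessor, so altering any earlier entry invalidates every hash after it. A minimal sketch using only the standard library follows; the entry layout and the all-zero genesis value are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Append a payload to the chain, linking it to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, payload),
    }
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Recompute every hash; return False at the first mismatch."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Sorting keys before hashing makes the encoding deterministic, so the same payload always produces the same hash regardless of dict ordering.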

Storage Tiers:

Tier	Storage	Retention
Hot	PostgreSQL	0-90 days
Cold	S3/MinIO	90+ days

9. Client Approval Flow

Autonomy Levels:

Level	Description
FULL_CONTROL	Approve every action
MILESTONE	Approve sprint boundaries
AUTONOMOUS	Only critical decisions

Notifications: SSE + Email + Mobile Push
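The three autonomy levels could map to an approval check like the one below. This is one interpretation of the table, not the documented behaviour: the `ActionKind` categories and the rule that critical actions always need approval are assumptions.

```python
from enum import Enum


class AutonomyLevel(Enum):
    FULL_CONTROL = "full_control"
    MILESTONE = "milestone"
    AUTONOMOUS = "autonomous"


class ActionKind(Enum):
    """Hypothetical action categories used only for this sketch."""
    ROUTINE = "routine"
    SPRINT_BOUNDARY = "sprint_boundary"
    CRITICAL = "critical"


def requires_approval(level, action):
    """Return True when the client must approve before the agent proceeds."""
    if action is ActionKind.CRITICAL:
        return True  # assumed: critical decisions are gated at every level
    if level is AutonomyLevel.FULL_CONTROL:
        return True
    if level is AutonomyLevel.MILESTONE:
        return action is ActionKind.SPRINT_BOUNDARY
    return False
```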

Technology Stack

Core Technologies

Layer	Technology	Version	License
Backend	FastAPI	0.115+	MIT
Frontend	Next.js	16	MIT
Database	PostgreSQL + pgvector	15+	PostgreSQL
Cache/Queue	Redis	7.0+	BSD-3
Task Queue	Celery	5.3+	BSD-3
LLM Gateway	LiteLLM	Latest	MIT
MCP Framework	FastMCP	2.0+	MIT

Self-Hostability Guarantee

All components are fully self-hostable with no mandatory subscriptions:

Component	License	Self-Hosted	Managed Alternative (Optional)
PostgreSQL	PostgreSQL	Yes	RDS, Neon, Supabase
Redis	BSD-3	Yes	Redis Cloud
LiteLLM	MIT	Yes	LiteLLM Enterprise
Celery	BSD-3	Yes	-
FastMCP	MIT	Yes	-
LangGraph	MIT	Yes	LangSmith (observability only)
transitions	MIT	Yes	-
DeepSeek V3.2	MIT	Yes	API available
Qwen3-235B	Apache 2.0	Yes	Alibaba Cloud

Data Flow Diagrams

Agent Task Execution

1. Client creates story in Syndarix
         │
         ▼
2. Story workflow transitions to "implementation"
         │
         ▼
3. Agent Orchestrator spawns Engineer instance
         │
         ▼
4. Engineer queries Knowledge Base (RAG)
         │
         ▼
5. Engineer calls LLM Gateway for code generation
         │
         ▼
6. Engineer calls Git MCP to create branch & commit
         │
         ▼
7. Engineer creates PR via Git MCP
         │
         ▼
8. Workflow transitions to "review"
         │
         ▼
9. If autonomy_level != AUTONOMOUS:
   └── Approval request created
   └── Client notified via SSE + email
         │
         ▼
10. Client approves → PR merged → Workflow to "testing"

Real-time Event Flow

Agent Action
     │
     ▼
Event Bus (Redis Pub/Sub)
     │
     ├──▶ SSE Endpoint ──▶ Frontend Dashboard
     │
     ├──▶ Audit Logger ──▶ PostgreSQL
     │
     └──▶ Other Backend Instances (horizontal scaling)

Security Architecture

Authentication Flow

Users: JWT dual-token (access + refresh) via PragmaStack
Agents: Service tokens for MCP communication
MCP Servers: Internal network only, validated service tokens

Multi-Tenancy

Project Isolation: All queries scoped by project_id
Row-Level Security: PostgreSQL RLS for knowledge base
Agent Scoping: Every MCP tool requires project_id + agent_id
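Agent scoping can be enforced at the tool boundary. The decorator below is a framework-agnostic sketch that rejects any call missing a `project_id` or `agent_id`; it does not use FastMCP, and `scoped_tool` and `read_file` are hypothetical names.

```python
import functools


def scoped_tool(func):
    """Require non-empty project_id and agent_id keyword arguments."""

    @functools.wraps(func)
    def wrapper(*, project_id, agent_id, **kwargs):
        if not project_id or not agent_id:
            raise PermissionError("project_id and agent_id are required")
        return func(project_id=project_id, agent_id=agent_id, **kwargs)

    return wrapper


@scoped_tool
def read_file(*, project_id, agent_id, path):
    """Hypothetical tool: returns the project-scoped location of a file."""
    return f"{project_id}/{path}"
```

Making both identifiers keyword-only means a tool cannot be invoked positionally without them, and every call site shows its scope explicitly.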

Audit Trail

Hash Chaining: Tamper-evident event log
Complete Coverage: All agent actions, LLM calls, MCP tool invocations

Scalability Considerations

Horizontal Scaling

Component	Scaling Strategy
FastAPI	Multiple instances behind load balancer
Celery Workers	Add workers per queue as needed
PostgreSQL	Read replicas, connection pooling
Redis	Cluster mode for high availability

Expected Scale

Metric	Target
Concurrent Projects	50+
Concurrent Agent Instances	200+
Background Jobs/minute	500+
SSE Connections	200+

Deployment Architecture

Local Development

docker-compose up
├── PostgreSQL (+ pgvector)
├── Redis
├── FastAPI Backend
├── Next.js Frontend
├── Celery Workers (agent, git, sync queues)
├── Celery Beat (scheduler)
├── Flower (monitoring)
└── MCP Servers (7 containers)

Production

┌─────────────────────────────────────────────────────────────────┐
│                        Load Balancer                             │
└─────────────────────────────┬───────────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  API Instance 1 │  │  API Instance 2 │  │  API Instance N │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│   PostgreSQL    │  │  Redis Cluster  │  │  Celery Workers │
│   (Primary +    │  │                 │  │  (Auto-scaled)  │
│    Replicas)    │  │                 │  │                 │
└─────────────────┘  └─────────────────┘  └─────────────────┘

Implementation Roadmap
Architecture Deep Analysis
ADRs - All architecture decision records
Spikes - Research documents

Appendix: Full ADR List

This document serves as the authoritative architecture reference for Syndarix.
