docs: add architecture decision records (ADRs) for key technical choices

- Added the following ADRs to the `docs/adrs/` directory:
  - ADR-001: MCP Integration Architecture
  - ADR-002: Real-time Communication Architecture
  - ADR-003: Background Task Architecture
  - ADR-004: LLM Provider Abstraction
  - ADR-005: Technology Stack Selection
- Each ADR details the context, decision drivers, considered options, final decisions, and implementation plans.
- Documentation aligns technical choices with architecture principles and system requirements for Syndarix.
2025-12-29 13:16:02 +01:00
parent a6a336b66e
commit 6e3cdebbfb
7 changed files with 1565 additions and 0 deletions
--- a/docs/adrs/ADR-001-mcp-integration-architecture.md
+++ b/docs/adrs/ADR-001-mcp-integration-architecture.md
@@ -0,0 +1,134 @@
+# ADR-001: MCP Integration Architecture
+
+**Status:** Accepted
+**Date:** 2025-12-29
+**Deciders:** Architecture Team
+**Related Spikes:** SPIKE-001
+
+---
+
+## Context
+
+Syndarix requires integration with multiple external services (LLM providers, Git, issue trackers, file systems, CI/CD). The Model Context Protocol (MCP) was identified as the standard for tool integration in AI applications. We need to decide on:
+
+1. The MCP framework to use
+2. Server deployment pattern (singleton vs per-project)
+3. Scoping mechanism for multi-project/multi-agent access
+
+## Decision Drivers
+
+- **Simplicity:** Minimize operational complexity
+- **Resource Efficiency:** Avoid spawning redundant processes
+- **Consistency:** Unified interface across all integrations
+- **Scalability:** Support 10+ concurrent projects
+- **Maintainability:** Easy to add new MCP servers
+
+## Considered Options
+
+### Option 1: Per-Project MCP Servers
+Spawn dedicated MCP server instances for each project.
+
+**Pros:**
+- Complete isolation between projects
+- Simple access control (project owns server)
+
+**Cons:**
+- Resource heavy (7 servers × N projects)
+- Complex orchestration
+- Difficult to share cross-project resources
+
+### Option 2: Unified Singleton MCP Servers (Selected)
+Single instance of each MCP server type, with explicit project/agent scoping.
+
+**Pros:**
+- Resource efficient (7 total servers)
+- Simpler deployment
+- Enables cross-project learning (if desired)
+- Consistent management
+
+**Cons:**
+- Requires explicit scoping in all tools
+- Shared state requires careful design
+
+### Option 3: Hybrid (MCP Proxy)
+Single proxy that routes to per-project backends.
+
+**Pros:**
+- Balance of isolation and efficiency
+
+**Cons:**
+- Added complexity
+- Routing overhead
+
+## Decision
+
+**Adopt Option 2: Unified Singleton MCP Servers with explicit scoping.**
+
+All MCP servers are deployed as singletons. Every tool accepts `project_id` and `agent_id` parameters for:
+- Access control validation
+- Audit logging
+- Context filtering
+
+## Implementation
+
+### MCP Server Registry
+
+| Server | Port | Purpose |
+|--------|------|---------|
+| LLM Gateway | 9001 | Route LLM requests with failover |
+| Git MCP | 9002 | Git operations across providers |
+| Knowledge Base MCP | 9003 | RAG and document search |
+| Issues MCP | 9004 | Issue tracking operations |
+| File System MCP | 9005 | Workspace file operations |
+| Code Analysis MCP | 9006 | Static analysis, linting |
+| CI/CD MCP | 9007 | Pipeline operations |
+
+### Framework Selection
+
+Use **FastMCP 2.0** for all MCP server implementations:
+- Decorator-based tool registration
+- Built-in async support
+- Compatible with SSE transport
+- Type-safe with Pydantic
+
+### Tool Signature Pattern
+
+```python
+@mcp.tool()
+def tool_name(
+    project_id: str,   # Required: project scope
+    agent_id: str,     # Required: calling agent
+    # ... tool-specific params
+) -> Result:
+    validate_access(agent_id, project_id)
+    log_tool_usage(agent_id, project_id, "tool_name")
+    # ... implementation
+```
+
+## Consequences
+
+### Positive
+- Single deployment per MCP type simplifies operations
+- Consistent interface across all tools
+- Easy to add monitoring/logging centrally
+- Cross-project analytics possible
+
+### Negative
+- All tools must include scoping parameters
+- Shared state requires careful design
+- Single point of failure per MCP type (mitigated by multiple instances)
+
+### Neutral
+- Requires MCP client manager in FastAPI backend
+- Authentication handled internally (service tokens for v1)
+
+## Compliance
+
+This decision aligns with:
+- FR-802: MCP-first architecture requirement
+- NFR-201: Horizontal scalability requirement
+- NFR-602: Centralized logging requirement
+
+---
+
+*This ADR supersedes any previous decisions regarding MCP architecture.*
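A minimal sketch of the ADR-001 tool signature pattern, written without FastMCP so it runs on its own. The `scoped` decorator, the in-memory `ACCESS_GRANTS` and `AUDIT_LOG` stores, and the `list_branches` tool are all hypothetical names introduced for illustration; only `validate_access` and `log_tool_usage` appear in the ADR, and their bodies here are assumptions.

```python
from functools import wraps

# Hypothetical in-memory grants: agent_id -> project_ids the agent may access.
ACCESS_GRANTS: dict[str, set[str]] = {"agent-1": {"proj-a"}}

# Hypothetical audit trail of (agent_id, project_id, tool_name) tuples.
AUDIT_LOG: list[tuple[str, str, str]] = []


class AccessDenied(Exception):
    """Raised when an agent calls a tool outside its project scope."""


def validate_access(agent_id: str, project_id: str) -> None:
    # Reject the call unless the agent has an explicit grant for the project.
    if project_id not in ACCESS_GRANTS.get(agent_id, set()):
        raise AccessDenied(f"{agent_id} may not access {project_id}")


def log_tool_usage(agent_id: str, project_id: str, tool_name: str) -> None:
    AUDIT_LOG.append((agent_id, project_id, tool_name))


def scoped(func):
    """Enforce scoping and audit logging before the tool body runs."""

    @wraps(func)
    def wrapper(project_id: str, agent_id: str, *args, **kwargs):
        validate_access(agent_id, project_id)
        log_tool_usage(agent_id, project_id, func.__name__)
        return func(project_id, agent_id, *args, **kwargs)

    return wrapper


@scoped
def list_branches(project_id: str, agent_id: str, prefix: str = "") -> list[str]:
    # Stand-in data; a real Git MCP tool would query the provider here.
    branches = {"proj-a": ["main", "feature/login"]}
    return [b for b in branches.get(project_id, []) if b.startswith(prefix)]
```

Calling `list_branches("proj-a", "agent-1", prefix="feature")` passes the access check and is recorded in `AUDIT_LOG`, while a call for a project the agent has no grant for raises `AccessDenied` before anything is logged. Moving the two checks into a decorator is one way to keep a singleton server from forgetting them in an individual tool.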
--- a/docs/adrs/ADR-002-realtime-communication.md
+++ b/docs/adrs/ADR-002-realtime-communication.md
@@ -0,0 +1,160 @@
+# ADR-002: Real-time Communication Architecture
+
+**Status:** Accepted
+**Date:** 2025-12-29
+**Deciders:** Architecture Team
+**Related Spikes:** SPIKE-003
+
+---
+
+## Context
+
+Syndarix requires real-time communication for:
+- Agent activity streams
+- Project progress updates
+- Build/pipeline status
+- Client approval requests
+- Issue change notifications
+- Interactive chat with agents
+
+We need to decide between WebSocket and Server-Sent Events (SSE) for real-time data delivery.
+
+## Decision Drivers
+
+- **Simplicity:** Minimize implementation complexity
+- **Reliability:** Built-in reconnection handling
+- **Scalability:** Support 200+ concurrent connections
+- **Compatibility:** Work through proxies and load balancers
+- **Use Case Fit:** Match communication patterns
+
+## Considered Options
+
+### Option 1: WebSocket Only
+Use WebSocket for all real-time communication.
+
+**Pros:**
+- Bidirectional communication
+- Single protocol to manage
+- Well-supported in FastAPI
+
+**Cons:**
+- Manual reconnection logic required
+- More complex through proxies
+- Overkill for server-to-client streams
+
+### Option 2: SSE Only
+Use Server-Sent Events for all real-time communication.
+
+**Pros:**
+- Built-in automatic reconnection
+- Native HTTP (proxy-friendly)
+- Simpler implementation
+
+**Cons:**
+- Unidirectional only
+- Browser connection limits per domain
+
+### Option 3: SSE Primary + WebSocket for Chat (Selected)
+Use SSE for server-to-client events, WebSocket for bidirectional chat.
+
+**Pros:**
+- Best tool for each use case
+- SSE simplicity for 90% of needs
+- WebSocket only where truly needed
+
+**Cons:**
+- Two protocols to manage
+
+## Decision
+
+**Adopt Option 3: SSE as primary transport, WebSocket for interactive chat.**
+
+### SSE Use Cases (90%)
+- Agent activity streams
+- Project progress updates
+- Build/pipeline status
+- Approval request notifications
+- Issue change notifications
+
+### WebSocket Use Cases (10%)
+- Interactive chat with agents
+- Real-time debugging sessions
+- Future collaboration features
+
+## Implementation
+
+### Event Bus with Redis Pub/Sub
+
+```
+FastAPI Backend ──publish──> Redis Pub/Sub ──subscribe──> SSE Endpoints
+                                   │
+                                   └──> Other Backend Instances
+```
+
+### SSE Endpoint Pattern
+
+```python
+@router.get("/projects/{project_id}/events")
+async def project_events(project_id: str, request: Request):
+    async def event_generator():
+        subscriber = await event_bus.subscribe(f"project:{project_id}")
+        try:
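+            while not await request.is_disconnected():
+                try:
+                    event = await asyncio.wait_for(
+                        subscriber.get_event(), timeout=30.0
+                    )
+                except asyncio.TimeoutError:
+                    # No event within 30 s: send an SSE comment as a keepalive
+                    # and re-check whether the client has disconnected.
+                    yield ": keepalive\n\n"
+                    continue
+                yield f"event: {event.type}\ndata: {event.json()}\n\n"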
+        finally:
+            await subscriber.unsubscribe()
+
+    return StreamingResponse(
+        event_generator(),
+        media_type="text/event-stream"
+    )
+```
+
+### Event Types
+
+| Category | Event Types |
+|----------|-------------|
+| Agent | `agent_started`, `agent_activity`, `agent_completed`, `agent_error` |
+| Project | `issue_created`, `issue_updated`, `issue_closed` |
+| Git | `branch_created`, `commit_pushed`, `pr_created`, `pr_merged` |
+| Workflow | `approval_required`, `sprint_started`, `sprint_completed` |
+| Pipeline | `pipeline_started`, `pipeline_completed`, `pipeline_failed` |
+
+### Client Implementation
+
+- Single SSE connection per project
+- Event multiplexing through event types
+- Exponential backoff on reconnection
+- Native `EventSource` API with automatic reconnect
+
+## Consequences
+
+### Positive
+- Simpler implementation for server-to-client streams
+- Automatic reconnection reduces client complexity
+- Works through standard HTTP proxies and load balancers, provided response buffering is disabled for the event stream
+- Reduced server resource usage vs WebSocket
+
+### Negative
+- Two protocols to maintain
+- WebSocket requires manual reconnect logic
+- SSE limited to ~6 connections per domain (HTTP/1.1)
+
+### Mitigation
+- Use HTTP/2 where possible (higher connection limits)
+- Multiplex all project events on single connection
+- WebSocket only for interactive chat sessions
+
+## Compliance
+
+This decision aligns with:
+- FR-105: Real-time agent activity monitoring
+- NFR-102: 200+ concurrent connections requirement
+- NFR-501: Responsive UI updates
+
+---
+
+*This ADR supersedes any previous decisions regarding real-time communication.*
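A minimal sketch of the SSE framing described in ADR-002, assuming an `asyncio.Queue` stands in for the Redis-backed subscriber. `format_sse`, `stream_events`, and the `max_frames` cut-off are hypothetical names used only to keep the example finite and runnable.

```python
import asyncio
import json


def format_sse(event_type: str, payload: dict) -> str:
    """Build one SSE frame: an event name, a JSON data line, and a blank line."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"


async def stream_events(queue: asyncio.Queue, keepalive_seconds: float, max_frames: int):
    """Yield SSE frames from the queue, or a comment frame when it stays idle."""
    sent = 0
    while sent < max_frames:
        try:
            event_type, payload = await asyncio.wait_for(
                queue.get(), timeout=keepalive_seconds
            )
            yield format_sse(event_type, payload)
        except asyncio.TimeoutError:
            # Lines starting with ":" are SSE comments; clients ignore them,
            # but they keep idle connections from being closed by intermediaries.
            yield ": keepalive\n\n"
        sent += 1
```

Because every frame carries an `event:` name, a single connection per project can multiplex all categories from the Event Types table, and the browser can register one listener per event type.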
--- a/docs/adrs/ADR-003-background-task-architecture.md
+++ b/docs/adrs/ADR-003-background-task-architecture.md
@@ -0,0 +1,179 @@
+# ADR-003: Background Task Architecture
+
+**Status:** Accepted
+**Date:** 2025-12-29
+**Deciders:** Architecture Team
+**Related Spikes:** SPIKE-004
+
+---
+
+## Context
+
+Syndarix requires background task processing for:
+- Agent actions (LLM calls, code generation)
+- Git operations (clone, commit, push, PR creation)
+- External synchronization (issue sync with Gitea/GitHub/GitLab)
+- CI/CD pipeline triggers
+- Long-running workflows (sprints, story implementation)
+
+These tasks are too slow for synchronous API responses and need proper queuing, retry, and monitoring.
+
+## Decision Drivers
+
+- **Reliability:** Tasks must complete even if workers restart
+- **Visibility:** Progress tracking for long-running operations
+- **Scalability:** Handle concurrent agent operations
+- **Rate Limiting:** Respect LLM API rate limits
+- **Async Compatibility:** Work with async FastAPI
+
+## Considered Options
+
+### Option 1: FastAPI BackgroundTasks
+Use FastAPI's built-in background tasks.
+
+**Pros:**
+- Simple, no additional infrastructure
+- Direct async integration
+
+**Cons:**
+- No persistence (lost on restart)
+- No retry mechanism
+- No distributed workers
+
+### Option 2: Celery + Redis (Selected)
+Use Celery for task queue with Redis as broker/backend.
+
+**Pros:**
+- Mature, battle-tested
+- Persistent task queue
+- Built-in retry with backoff
+- Distributed workers
+- Task chaining and workflows
+- Monitoring with Flower
+
+**Cons:**
+- Additional infrastructure
+- Sync-only task execution (bridge needed for async)
+
+### Option 3: Dramatiq + Redis
+Use Dramatiq as a simpler Celery alternative.
+
+**Pros:**
+- Simpler API than Celery
+- Good async support
+
+**Cons:**
+- Less mature ecosystem
+- Fewer monitoring tools
+
+### Option 4: ARQ (Async Redis Queue)
+Use ARQ for native async task processing.
+
+**Pros:**
+- Native async
+- Simple API
+
+**Cons:**
+- Less feature-rich
+- Smaller community
+
+## Decision
+
+**Adopt Option 2: Celery + Redis.**
+
+Celery provides the reliability, monitoring, and ecosystem maturity needed for production workloads. Redis serves as both broker and result backend.
+
+## Implementation
+
+### Queue Architecture
+
+```
+┌─────────────────────────────────────────────────┐
+│                 Redis (Broker + Backend)         │
+├─────────────┬─────────────┬─────────────────────┤
+│ agent_queue │  git_queue  │     sync_queue      │
+│ (prefetch=1)│ (prefetch=4)│    (prefetch=4)     │
+└──────┬──────┴──────┬──────┴──────────┬──────────┘
+       │             │                 │
+       ▼             ▼                 ▼
+  ┌─────────┐  ┌─────────┐       ┌─────────┐
+  │ Agent   │  │  Git    │       │  Sync   │
+  │ Workers │  │ Workers │       │ Workers │
+  └─────────┘  └─────────┘       └─────────┘
+```
+
+### Queue Configuration
+
+| Queue | Prefetch | Concurrency | Purpose |
+|-------|----------|-------------|---------|
+| `agent_queue` | 1 | 4 | LLM-based tasks (rate limited) |
+| `git_queue` | 4 | 8 | Git operations |
+| `sync_queue` | 4 | 4 | External sync |
+| `cicd_queue` | 4 | 4 | Pipeline operations |
+
+### Task Patterns
+
+**Progress Reporting:**
+```python
+@celery_app.task(bind=True)
+def implement_story(self, story_id: str, agent_id: str, project_id: str):
+    steps = plan_story_steps(story_id)  # hypothetical helper returning the ordered steps
+    for i, step in enumerate(steps):
+        self.update_state(
+            state="PROGRESS",
+            meta={"current": i + 1, "total": len(steps)}
+        )
+        # Publish SSE event for real-time UI update
+        event_bus.publish(f"project:{project_id}", {
+            "type": "agent_progress",
+            "step": i + 1,
+            "total": len(steps)
+        })
+        execute_step(step)
+```
+
+**Task Chaining:**
+```python
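+from celery import chain
+
+# Each task's return value is passed as the first argument to the next task.
+workflow = chain(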
+    analyze_requirements.s(story_id),
+    design_solution.s(),
+    implement_code.s(),
+    run_tests.s(),
+    create_pr.s()
+)
+```
+
+### Monitoring
+
+- **Flower:** Web UI for task monitoring (port 5555)
+- **Prometheus:** Metrics export for alerting
+- **Dead Letter Queue:** Failed tasks for investigation
+
+## Consequences
+
+### Positive
+- Reliable task execution with persistence
+- Automatic retry with exponential backoff
+- Progress tracking for long operations
+- Distributed workers for scalability
+- Rich monitoring and debugging tools
+
+### Negative
+- Additional infrastructure (Redis, workers)
+- Celery is synchronous (event_loop bridge for async calls)
+- Learning curve for task patterns
+
+### Mitigation
+- Use existing Redis instance (already needed for SSE)
+- Wrap async calls with `asyncio.run()` or asgiref's `async_to_sync` (`sync_to_async` goes the other way, calling sync code from async code)
+- Document common task patterns
+
+## Compliance
+
+This decision aligns with:
+- FR-304: Long-running implementation workflow
+- NFR-102: 500+ background jobs per minute
+- NFR-402: Task reliability and fault tolerance
+
+---
+
+*This ADR supersedes any previous decisions regarding background task processing.*
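A minimal sketch of the sync-to-async bridge that ADR-003 lists as a mitigation, shown without Celery so it runs on its own. `call_llm`, `run_async`, and `summarize_story` are hypothetical names; in a worker, `summarize_story` would be the body of a function decorated with `@celery_app.task`.

```python
import asyncio


async def call_llm(prompt: str) -> str:
    """Stand-in for an async client call, for example to the LLM Gateway."""
    await asyncio.sleep(0)  # simulate awaiting network I/O
    return prompt.upper()


def run_async(coro):
    """Run a coroutine to completion from synchronous code.

    asyncio.run() creates a fresh event loop per call, so this is only safe
    where no loop is already running, as in a synchronous task body.
    """
    return asyncio.run(coro)


def summarize_story(story_id: str) -> str:
    # Synchronous task body that delegates to async code through the bridge.
    return run_async(call_llm(f"summarize {story_id}"))
```

Creating a loop per task call is simple but adds overhead; whether that matters depends on how short the tasks are, which the ADR does not specify.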
--- a/docs/adrs/ADR-004-llm-provider-abstraction.md
+++ b/docs/adrs/ADR-004-llm-provider-abstraction.md
@@ -0,0 +1,189 @@
+# ADR-004: LLM Provider Abstraction
+
+**Status:** Accepted
+**Date:** 2025-12-29
+**Deciders:** Architecture Team
+**Related Spikes:** SPIKE-005
+
+---
+
+## Context
+
+Syndarix agents require access to large language models (LLMs) from multiple providers:
+- **Anthropic** (Claude) - Primary provider
+- **OpenAI** (GPT-4) - Fallback provider
+- **Local models** (Ollama/Llama) - Cost optimization, privacy
+
+We need a unified abstraction layer that provides:
+- Consistent API across providers
+- Automatic failover on errors
+- Usage tracking and cost management
+- Rate limiting compliance
+
+## Decision Drivers
+
+- **Reliability:** Automatic failover on provider outages
+- **Cost Control:** Track and limit API spending
+- **Flexibility:** Easy to add/swap providers
+- **Consistency:** Single interface for all agents
+- **Async Support:** Compatible with async FastAPI
+
+## Considered Options
+
+### Option 1: Direct Provider SDKs
+Use Anthropic and OpenAI SDKs directly with custom abstraction.
+
+**Pros:**
+- Full control over implementation
+- No external dependencies
+
+**Cons:**
+- Significant development effort
+- Must maintain failover logic
+- Must track token costs manually
+
+### Option 2: LiteLLM (Selected)
+Use LiteLLM as unified abstraction layer.
+
+**Pros:**
+- Unified API for 100+ providers
+- Built-in failover and routing
+- Automatic token counting
+- Cost tracking built-in
+- Redis caching support
+- Active community
+
+**Cons:**
+- External dependency
+- May lag behind provider SDK updates
+
+### Option 3: LangChain
+Use LangChain's LLM abstraction.
+
+**Pros:**
+- Large ecosystem
+- Many integrations
+
+**Cons:**
+- Heavy dependency
+- Overkill for just LLM abstraction
+- Complexity overhead
+
+## Decision
+
+**Adopt Option 2: LiteLLM for unified LLM provider abstraction.**
+
+LiteLLM provides the reliability, monitoring, and multi-provider support needed with minimal overhead.
+
+## Implementation
+
+### Model Groups
+
+| Group Name | Use Case | Primary Model | Fallback |
+|------------|----------|---------------|----------|
+| `high-reasoning` | Complex analysis, architecture | Claude 3.5 Sonnet | GPT-4 Turbo |
+| `fast-response` | Quick tasks, simple queries | Claude 3 Haiku | GPT-4o Mini |
+| `cost-optimized` | High-volume, non-critical | Local Llama 3 | Claude 3 Haiku |
+
+### Failover Chain
+
+```
+Claude 3.5 Sonnet (Anthropic)
+         │
+         ▼ (on failure)
+    GPT-4 Turbo (OpenAI)
+         │
+         ▼ (on failure)
+    Llama 3 (Ollama/Local)
+         │
+         ▼ (on failure)
+    Error with retry
+```
+
+### LLM Gateway Service
+
+```python
+from litellm import Router
+
+
+class LLMGateway:
+    def __init__(self):
+        self.router = Router(
+            model_list=model_list,
+            fallbacks=[
+                {"high-reasoning": ["high-reasoning", "local-fallback"]},
+            ],
+            routing_strategy="latency-based-routing",
+            num_retries=3,
+        )
+
+    async def complete(
+        self,
+        agent_id: str,
+        project_id: str,
+        messages: list[dict],
+        model_preference: str = "high-reasoning",
+    ) -> dict:
+        response = await self.router.acompletion(
+            model=model_preference,
+            messages=messages,
+        )
+        await self._track_usage(agent_id, project_id, response)
+        return response
+```
+
+### Cost Tracking
+
+| Model | Input (per 1M tokens) | Output (per 1M tokens) |
+|-------|----------------------|------------------------|
+| Claude 3.5 Sonnet | $3.00 | $15.00 |
+| Claude 3 Haiku | $0.25 | $1.25 |
+| GPT-4 Turbo | $10.00 | $30.00 |
+| GPT-4o Mini | $0.15 | $0.60 |
+| Ollama (local) | $0.00 | $0.00 |
+
+### Agent Type Mapping
+
+| Agent Type | Model Preference | Rationale |
+|------------|------------------|-----------|
+| Product Owner | high-reasoning | Complex requirements analysis |
+| Software Architect | high-reasoning | Architecture decisions |
+| Software Engineer | high-reasoning | Code generation |
+| QA Engineer | fast-response | Test case generation |
+| DevOps Engineer | fast-response | Config generation |
+| Project Manager | fast-response | Status updates |
+
+### Caching Strategy
+
+- **Redis-backed cache** for repeated queries
+- **TTL:** 1 hour for general queries
+- **Skip cache:** For context-dependent generation
+- **Cache key:** Hash of (model, messages, temperature)
+
+## Consequences
+
+### Positive
+- Single interface for all LLM operations
+- Automatic failover improves reliability
+- Built-in cost tracking and budgeting
+- Easy to add new providers
+- Caching reduces API costs
+
+### Negative
+- Dependency on LiteLLM library
+- May lag behind provider SDK features
+- Additional abstraction layer
+
+### Mitigation
+- Pin LiteLLM version, test before upgrades
+- Direct SDK access available if needed
+- Monitor LiteLLM updates for breaking changes
+
+## Compliance
+
+This decision aligns with:
+- FR-101: Agent type model configuration
+- NFR-103: Agent response time targets
+- NFR-402: Failover requirements
+- TR-001: LLM API unavailability mitigation
+
+---
+
+*This ADR supersedes any previous decisions regarding LLM integration.*
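A minimal sketch of two ADR-004 details: estimating the cost of a request from the per-million-token prices in the Cost Tracking table, and deriving a cache key from (model, messages, temperature). The dictionary keys, `estimate_cost_usd`, and `cache_key` are hypothetical names, and SHA-256 over canonical JSON is one assumed way to hash the key; the ADR does not say how LiteLLM or the gateway does either internally.

```python
import hashlib
import json

# (input, output) USD prices per 1M tokens, copied from the Cost Tracking table.
# The model labels are illustrative, not provider model identifiers.
PRICES_PER_MILLION_USD: dict[str, tuple[float, float]] = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-4o-mini": (0.15, 0.60),
    "ollama-local": (0.00, 0.00),
}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    input_price, output_price = PRICES_PER_MILLION_USD[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


def cache_key(model: str, messages: list[dict], temperature: float) -> str:
    """Hash (model, messages, temperature) into a stable hex digest."""
    canonical = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,  # make the key independent of dict ordering
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With these prices, a Claude 3 Haiku request of 2,000 prompt tokens and 500 completion tokens comes to (2000 × 0.25 + 500 × 1.25) / 1,000,000 = $0.001125. Including the temperature in the key means the same prompt at a different temperature is not served from the cache.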
--- a/docs/adrs/ADR-005-tech-stack-selection.md
+++ b/docs/adrs/ADR-005-tech-stack-selection.md
@@ -0,0 +1,156 @@
+# ADR-005: Technology Stack Selection
+
+**Status:** Accepted
+**Date:** 2025-12-29
+**Deciders:** Architecture Team
+
+---
+
+## Context
+
+Syndarix needs a robust, modern technology stack that can support:
+- Multi-agent orchestration with real-time communication
+- Full-stack web application with API backend
+- Background task processing for long-running operations
+- Vector search for RAG (Retrieval-Augmented Generation)
+- Multiple external integrations via MCP
+
+The decision was made to build upon **PragmaStack** as the foundation, extending it with Syndarix-specific components.
+
+## Decision Drivers
+
+- **Productivity:** Rapid development with modern frameworks
+- **Type Safety:** Minimize runtime errors
+- **Async Performance:** Handle concurrent agent operations
+- **Ecosystem:** Rich library support
+- **Familiarity:** Team expertise with selected technologies
+- **Production-Ready:** Proven technologies for production workloads
+
+## Decision
+
+**Adopt PragmaStack as foundation with Syndarix-specific extensions.**
+
+### Core Stack (from PragmaStack)
+
+| Layer | Technology | Version | Rationale |
+|-------|------------|---------|-----------|
+| **Backend** | FastAPI | 0.115+ | Async, OpenAPI, type hints |
+| **Backend Language** | Python | 3.11+ | Type hints, async/await, ecosystem |
+| **Frontend** | Next.js | 16 | React 19, server components, App Router |
+| **Frontend Language** | TypeScript | 5.0+ | Type safety, IDE support |
+| **Database** | PostgreSQL | 15+ | Robust, extensible, pgvector |
+| **ORM** | SQLAlchemy | 2.0+ | Async support, type hints |
+| **Validation** | Pydantic | 2.0+ | Data validation, serialization |
+| **State Management** | Zustand | 4.0+ | Simple, performant |
+| **Data Fetching** | TanStack Query | 5.0+ | Caching, invalidation |
+| **UI Components** | shadcn/ui | Latest | Accessible, customizable |
+| **CSS** | Tailwind CSS | 4.0+ | Utility-first, fast styling |
+| **Auth** | JWT | - | Dual-token (access + refresh) |
+
+### Syndarix Extensions
+
+| Component | Technology | Version | Purpose |
+|-----------|------------|---------|---------|
+| **Task Queue** | Celery | 5.3+ | Background job processing |
+| **Message Broker** | Redis | 7.0+ | Celery broker, caching, pub/sub |
+| **Vector Store** | pgvector | Latest | Embeddings for RAG |
+| **MCP Framework** | FastMCP | 2.0+ | MCP server development |
+| **LLM Abstraction** | LiteLLM | Latest | Multi-provider LLM access |
+| **Real-time** | SSE + WebSocket | - | Event streaming, chat |
+
+### Testing Stack
+
+| Type | Technology | Version | Purpose |
+|------|------------|---------|---------|
+| **Backend Unit** | pytest | 8.0+ | Python testing |
+| **Backend Async** | pytest-asyncio | - | Async test support |
+| **Backend Coverage** | coverage.py | - | Code coverage |
+| **Frontend Unit** | Jest | 29+ | React testing |
+| **Frontend Components** | React Testing Library | - | Component testing |
+| **E2E** | Playwright | 1.40+ | Browser automation |
+
+### DevOps Stack
+
+| Component | Technology | Version | Purpose |
+|-----------|------------|---------|---------|
+| **Containerization** | Docker | 24+ | Application packaging |
+| **Orchestration** | Docker Compose | - | Local development |
+| **CI/CD** | Gitea Actions | - | Automated pipelines |
+| **Database Migrations** | Alembic | - | Schema versioning |
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                         Frontend (Next.js 16)                    │
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
+│  │   Pages     │  │ Components  │  │   Stores    │              │
+│  │ (App Router)│  │ (shadcn/ui) │  │  (Zustand)  │              │
+│  └─────────────┘  └─────────────┘  └─────────────┘              │
+└────────────────────────────┬────────────────────────────────────┘
+                             │ REST + SSE + WebSocket
+                             ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                      Backend (FastAPI 0.115+)                    │
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
+│  │    API      │  │  Services   │  │    CRUD     │              │
+│  │   Routes    │  │   Layer     │  │   Layer     │              │
+│  └─────────────┘  └─────────────┘  └─────────────┘              │
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
+│  │ LLM Gateway │  │  MCP Client │  │ Event Bus   │              │
+│  │ (LiteLLM)   │  │  Manager    │  │ (Redis)     │              │
+│  └─────────────┘  └─────────────┘  └─────────────┘              │
+└────────────────────────────┬────────────────────────────────────┘
+                             │
+        ┌────────────────────┼────────────────────┐
+        ▼                    ▼                    ▼
+┌───────────────┐  ┌───────────────┐  ┌───────────────────────────┐
+│  PostgreSQL   │  │     Redis     │  │      MCP Servers          │
+│  + pgvector   │  │ (Cache/Queue) │  │ (LLM, Git, KB, Issues...) │
+└───────────────┘  └───────────────┘  └───────────────────────────┘
+                             │
+                             ▼
+                   ┌───────────────┐
+                   │    Celery     │
+                   │   Workers     │
+                   └───────────────┘
+```
+
+## Consequences
+
+### Positive
+- Proven, production-ready stack
+- Strong typing throughout (Python + TypeScript)
+- Excellent async performance
+- Rich ecosystem for extensions
+- Team familiarity reduces learning curve
+
+### Negative
+- Python GIL limits CPU-bound concurrency (mitigated by Celery)
+- Multiple languages (Python + TypeScript) to maintain
+- PostgreSQL requires management (vs serverless options)
+
+### Neutral
+- PragmaStack provides solid foundation but may include unused features
+- Stack is opinionated, limiting some technology choices
+
+## Version Pinning Strategy
+
+| Component | Strategy | Rationale |
+|-----------|----------|-----------|
+| Python | 3.11+ (specific minor) | Stability |
+| Node.js | 20 LTS | Long-term support |
+| FastAPI | 0.115+ | Latest stable |
+| Next.js | 16 | Current major |
+| PostgreSQL | 15+ | Required for features |
+
+## Compliance
+
+This decision aligns with:
+- NFR-601: Code quality standards (TypeScript, type hints)
+- NFR-603: Docker containerization requirement
+- TC-001 through TC-006: Technical constraints
+
+---
+
+*This ADR establishes the foundational technology choices for Syndarix.*
--- a/docs/adrs/ADR-006-agent-orchestration.md
+++ b/docs/adrs/ADR-006-agent-orchestration.md
@@ -0,0 +1,260 @@
+# ADR-006: Agent Orchestration Architecture
+
+**Status:** Accepted
+**Date:** 2025-12-29
+**Deciders:** Architecture Team
+**Related Spikes:** SPIKE-002
+
+---
+
+## Context
+
+Syndarix requires an agent orchestration system that can:
+- Define reusable agent types with specific capabilities
+- Spawn multiple instances of the same type with unique identities
+- Manage agent state, context, and conversation history
+- Route messages between agents
+- Handle agent failover and recovery
+- Track resource usage per agent
+
+## Decision Drivers
+
+- **Flexibility:** Support diverse agent roles and capabilities
+- **Scalability:** Handle 50+ concurrent agent instances
+- **Isolation:** Each instance maintains separate state
+- **Observability:** Full visibility into agent activities
+- **Reliability:** Graceful handling of failures
+
+## Decision
+
+**Adopt a Type-Instance pattern** where:
+- **Agent Types** define templates (model, expertise, personality)
+- **Agent Instances** are spawned from types with unique identities
+- **Agent Orchestrator** manages lifecycle and communication
+
+## Architecture
+
+### Agent Type Definition
+
+```python
+class AgentType(Base):
+    id = Column(UUID, primary_key=True)
+    name = Column(String(50), unique=True)  # "Software Engineer"
+    role = Column(Enum(AgentRole))          # ENGINEER
+    base_model = Column(String(100))        # "claude-3-5-sonnet-20241022"
+    failover_model = Column(String(100))    # "gpt-4-turbo"
+    expertise = Column(ARRAY(String))       # ["python", "fastapi", "testing"]
+    personality = Column(JSONB)             # {"style": "detailed", "tone": "professional"}
+    system_prompt = Column(Text)            # Base system prompt template
+    capabilities = Column(ARRAY(String))    # ["code_generation", "code_review"]
+    is_active = Column(Boolean, default=True)
+```
+
+### Agent Instance Definition
+
+```python
+class AgentInstance(Base):
+    id = Column(UUID, primary_key=True)
+    name = Column(String(50))               # "Dave"
+    agent_type_id = Column(UUID, ForeignKey)
+    project_id = Column(UUID, ForeignKey)
+    status = Column(Enum(InstanceStatus))   # ACTIVE, IDLE, TERMINATED
+    context = Column(JSONB)                 # Current working context
+    conversation_id = Column(UUID)          # Active conversation
+    rag_collection_id = Column(String)      # Domain knowledge collection
+    token_usage = Column(JSONB)             # {"prompt": 0, "completion": 0}
+    last_active_at = Column(DateTime)
+    created_at = Column(DateTime)
+    terminated_at = Column(DateTime)
+```
+
+### Orchestrator Service
+
+```python
+class AgentOrchestrator:
+    """Central service for agent lifecycle management."""
+
+    async def spawn_agent(
+        self,
+        agent_type_id: UUID,
+        project_id: UUID,
+        name: str,
+        domain_knowledge: list[str] | None = None,
+    ) -> AgentInstance:
+        """Spawn a new agent instance from a type definition."""
+        agent_type = await self.get_agent_type(agent_type_id)
+
+        instance = AgentInstance(
+            name=name,
+            agent_type_id=agent_type_id,
+            project_id=project_id,
+            status=InstanceStatus.ACTIVE,
+            context={"initialized_at": datetime.utcnow().isoformat()},
+        )
+
+        # Initialize RAG collection if domain knowledge provided
+        if domain_knowledge:
+            instance.rag_collection_id = await self._init_rag_collection(
+                instance.id, domain_knowledge
+            )
+
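+        # Assuming self.db is a SQLAlchemy AsyncSession: add() is synchronous,
+        # so only commit() is awaited.
+        self.db.add(instance)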
+        await self.db.commit()
+
+        # Publish spawn event
+        await self.event_bus.publish(f"project:{project_id}", {
+            "type": "agent_spawned",
+            "agent_id": str(instance.id),
+            "name": name,
+            "role": agent_type.role.value
+        })
+
+        return instance
+
+    async def terminate_agent(self, instance_id: UUID) -> None:
+        """Terminate an agent instance and release resources."""
+        instance = await self.get_instance(instance_id)
+        instance.status = InstanceStatus.TERMINATED
+        instance.terminated_at = datetime.utcnow()
+
+        # Cleanup RAG collection
+        if instance.rag_collection_id:
+            await self._cleanup_rag_collection(instance.rag_collection_id)
+
+        await self.db.commit()
+
+    async def send_message(
+        self,
+        from_id: UUID,
+        to_id: UUID,
+        message: AgentMessage
+    ) -> None:
+        """Route a message from one agent to another."""
+        # Validate both agents exist and are active
+        sender = await self.get_instance(from_id)
+        recipient = await self.get_instance(to_id)
+
+        # Persist message
+        await self.message_store.save(message)
+
+        # If recipient is idle, trigger action
+        if recipient.status == InstanceStatus.IDLE:
+            await self._trigger_agent_action(recipient.id, message)
+
+        # Publish for real-time tracking
+        await self.event_bus.publish(f"project:{sender.project_id}", {
+            "type": "agent_message",
+            "from": str(from_id),
+            "to": str(to_id),
+            "preview": message.content[:100]
+        })
+
+    async def broadcast(
+        self,
+        from_id: UUID,
+        target_role: AgentRole,
+        message: AgentMessage
+    ) -> None:
+        """Broadcast a message to all agents of a specific role."""
+        sender = await self.get_instance(from_id)
+        recipients = await self.get_instances_by_role(
+            sender.project_id, target_role
+        )
+
+        for recipient in recipients:
+            await self.send_message(from_id, recipient.id, message)
+```
+
+### Agent Execution Pattern
+
+```python
+class AgentRunner:
+    """Executes agent actions using an LLM."""
+
+    def __init__(self, instance: AgentInstance, llm_gateway: LLMGateway):
+        self.instance = instance
+        self.llm = llm_gateway
+
+    async def execute(self, action: str, context: dict) -> dict:
+        """Execute an action using the agent's configured model."""
+        agent_type = await self.get_agent_type(self.instance.agent_type_id)
+
+        # Build messages with system prompt and context
+        messages = [
+            {"role": "system", "content": self._build_system_prompt(agent_type)},
+            *self._get_conversation_history(),
+            {"role": "user", "content": self._build_action_prompt(action, context)}
+        ]
+
+        # Add RAG context if available
+        if self.instance.rag_collection_id:
+            rag_context = await self._query_rag(action, context)
+            messages.insert(1, {
+                "role": "system",
+                "content": f"Relevant context:\n{rag_context}"
+            })
+
+        # Execute with failover
+        response = await self.llm.complete(
+            agent_id=str(self.instance.id),
+            project_id=str(self.instance.project_id),
+            messages=messages,
+            model_preference=self._get_model_preference(agent_type)
+        )
+
+        # Update instance context (tolerate a context that is still unset)
+        self.instance.context = {
+            **(self.instance.context or {}),
+            "last_action": action,
+            "last_response_at": datetime.utcnow().isoformat()
+        }
+
+        return response
+```
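
The message ordering used by `AgentRunner.execute` can be shown in isolation. The following is a minimal sketch, not the Syndarix implementation: `build_messages` and its parameters are hypothetical names introduced here, and the only behaviour it mirrors is that retrieved RAG context is placed directly after the primary system prompt.

```python
from typing import Optional


def build_messages(
    system_prompt: str,
    history: list[dict],
    action_prompt: str,
    rag_context: Optional[str] = None,
) -> list[dict]:
    """Assemble chat messages: system prompt, optional RAG context, history, action."""
    messages = [{"role": "system", "content": system_prompt}]
    if rag_context:
        # Retrieved context sits right after the primary system prompt,
        # which is the position produced by messages.insert(1, ...) above.
        messages.append(
            {"role": "system", "content": f"Relevant context:\n{rag_context}"}
        )
    messages.extend(history)
    messages.append({"role": "user", "content": action_prompt})
    return messages
```

Building the list in its final order avoids the positional `insert`, which is easy to get wrong if more system messages are added later.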
+
+### Agent Roles
+
+| Role | Instances | Primary Capabilities |
+|------|-----------|---------------------|
+| Product Owner | 1 | requirements, prioritization, client_communication |
+| Project Manager | 1 | planning, tracking, coordination |
+| Business Analyst | 1 | analysis, documentation, process_modeling |
+| Software Architect | 1 | design, architecture_decisions, tech_selection |
+| Software Engineer | 1-5 | code_generation, code_review, testing |
+| UI/UX Designer | 1 | design, wireframes, accessibility |
+| QA Engineer | 1-2 | test_planning, test_automation, bug_reporting |
+| DevOps Engineer | 1 | cicd, infrastructure, deployment |
+| AI/ML Engineer | 1 | ml_development, model_training, mlops |
+| Security Expert | 1 | security_review, vulnerability_assessment |
+
+## Consequences
+
+### Positive
+- Clear separation between type definition and instance runtime
+- Multiple instances share type configuration (DRY)
+- Easy to add new agent roles
+- Full observability through events
+- Graceful failure handling with model failover
+
+### Negative
+- Complexity in managing instance lifecycle
+- State synchronization across instances
+- Memory overhead for context storage
+
+### Mitigation
+- Context archival for long-running instances
+- Periodic cleanup of terminated instances
+- State compression for large contexts
+
+## Compliance
+
+This decision aligns with:
+- FR-101: Agent type configuration
+- FR-102: Agent instance spawning
+- FR-103: Agent domain knowledge (RAG)
+- FR-104: Inter-agent communication
+- FR-105: Agent activity monitoring
+
+---
+
+*This ADR establishes the agent orchestration architecture for Syndarix.*
--- a/docs/architecture/ARCHITECTURE_OVERVIEW.md
+++ b/docs/architecture/ARCHITECTURE_OVERVIEW.md
@@ -0,0 +1,487 @@
+# Syndarix Architecture Overview
+
+**Version:** 1.0
+**Date:** 2025-12-29
+**Status:** Draft
+
+---
+
+## Table of Contents
+
+1. [Executive Summary](#1-executive-summary)
+2. [System Context](#2-system-context)
+3. [High-Level Architecture](#3-high-level-architecture)
+4. [Core Components](#4-core-components)
+5. [Data Architecture](#5-data-architecture)
+6. [Integration Architecture](#6-integration-architecture)
+7. [Security Architecture](#7-security-architecture)
+8. [Deployment Architecture](#8-deployment-architecture)
+9. [Cross-Cutting Concerns](#9-cross-cutting-concerns)
+10. [Architecture Decisions](#10-architecture-decisions)
+
+---
+
+## 1. Executive Summary
+
+Syndarix is an AI-powered software consulting agency platform that orchestrates specialized AI agents to deliver complete software solutions autonomously. This document describes the technical architecture that enables:
+
+- **Multi-Agent Orchestration:** 10 specialized agent roles collaborating on projects
+- **MCP-First Integration:** All external tools via Model Context Protocol
+- **Real-time Visibility:** SSE-based event streaming for progress tracking
+- **Autonomous Workflows:** Configurable autonomy levels, from full human control to fully autonomous operation
+- **Full Artifact Delivery:** Code, documentation, tests, and ADRs
+
+### Architecture Principles
+
+1. **MCP-First:** All integrations through unified MCP servers
+2. **Event-Driven:** Async communication via Redis Pub/Sub
+3. **Type-Safe:** Full typing in Python and TypeScript
+4. **Stateless Services:** Horizontal scaling through stateless design
+5. **Explicit Scoping:** All operations scoped to project/agent
+
+---
+
+## 2. System Context
+
+### Context Diagram
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                              EXTERNAL ACTORS                                 │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐  │
+│  │   Client    │    │   Admin     │    │ LLM APIs    │    │ Git Hosts   │  │
+│  │   (Human)   │    │   (Human)   │    │ (Anthropic) │    │  (Gitea)    │  │
+│  └──────┬──────┘    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘  │
+│         │                  │                  │                  │          │
+└─────────│──────────────────│──────────────────│──────────────────│──────────┘
+          │                  │                  │                  │
+          │ Web UI           │ Admin UI         │ API              │ API
+          │ SSE              │                  │                  │
+          ▼                  ▼                  ▼                  ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                                                                             │
+│                              SYNDARIX PLATFORM                              │
+│                                                                             │
+│   ┌─────────────────────────────────────────────────────────────────────┐   │
+│   │                         Agent Orchestration                          │   │
+│   │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐            │   │
+│   │  │   PO   │ │   PM   │ │  Arch  │ │  Eng   │ │   QA   │  ...       │   │
+│   │  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘            │   │
+│   └─────────────────────────────────────────────────────────────────────┘   │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+          │                  │                  │                  │
+          │ Storage          │ Events           │ Tasks            │ Tool calls
+          ▼                  ▼                  ▼                  ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                              INFRASTRUCTURE                                  │
+├─────────────────────────────────────────────────────────────────────────────┤
+│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐  │
+│  │ PostgreSQL  │    │    Redis    │    │   Celery    │    │MCP Servers  │  │
+│  │ + pgvector  │    │   Pub/Sub   │    │   Workers   │    │ (7 types)   │  │
+│  └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘  │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### Key Actors
+
+| Actor | Type | Interaction |
+|-------|------|-------------|
+| Client | Human | Web UI, approvals, feedback |
+| Admin | Human | Configuration, monitoring |
+| LLM Providers | External | Claude, GPT-4, local models |
+| Git Hosts | External | Gitea, GitHub, GitLab |
+| CI/CD Systems | External | Gitea Actions, etc. |
+
+---
+
+## 3. High-Level Architecture
+
+### Layered Architecture
+
+```
+┌───────────────────────────────────────────────────────────────────┐
+│                      PRESENTATION LAYER                           │
+│  ┌─────────────────────────────────────────────────────────────┐  │
+│  │                    Next.js 16 Frontend                       │  │
+│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │  │
+│  │  │Dashboard │  │ Projects │  │  Agents  │  │  Issues  │    │  │
+│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘    │  │
+│  └─────────────────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────────────────┘
+                                │
+                                │ REST + SSE + WebSocket
+                                ▼
+┌───────────────────────────────────────────────────────────────────┐
+│                       APPLICATION LAYER                           │
+│  ┌─────────────────────────────────────────────────────────────┐  │
+│  │                    FastAPI Backend                           │  │
+│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │  │
+│  │  │   Auth   │  │   API    │  │ Services │  │  Events  │    │  │
+│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘    │  │
+│  └─────────────────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────────────────┘
+                                │
+                                ▼
+┌───────────────────────────────────────────────────────────────────┐
+│                       ORCHESTRATION LAYER                         │
+│  ┌─────────────────────────────────────────────────────────────┐  │
+│  │  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐   │  │
+│  │  │    Agent      │  │   Workflow    │  │    Project    │   │  │
+│  │  │ Orchestrator  │  │    Engine     │  │   Manager     │   │  │
+│  │  └───────────────┘  └───────────────┘  └───────────────┘   │  │
+│  └─────────────────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────────────────┘
+                                │
+                                ▼
+┌───────────────────────────────────────────────────────────────────┐
+│                      INTEGRATION LAYER                            │
+│  ┌─────────────────────────────────────────────────────────────┐  │
+│  │                    MCP Client Manager                        │  │
+│  │  Connects to: LLM, Git, KB, Issues, FS, Code, CI/CD MCPs    │  │
+│  └─────────────────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────────────────┘
+                                │
+                                ▼
+┌───────────────────────────────────────────────────────────────────┐
+│                       DATA LAYER                                  │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐            │
+│  │  PostgreSQL  │  │    Redis     │  │  File Store  │            │
+│  │  + pgvector  │  │              │  │              │            │
+│  └──────────────┘  └──────────────┘  └──────────────┘            │
+└───────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 4. Core Components
+
+### 4.1 Agent Orchestrator
+
+**Purpose:** Manages agent lifecycle, spawning, communication, and coordination.
+
+**Responsibilities:**
+- Spawn agent instances from type definitions
+- Route messages between agents
+- Manage agent context and memory
+- Handle agent failover
+- Track resource usage
+
+**Key Patterns:**
+- Type-Instance pattern (types define templates, instances are runtime)
+- Message routing with priority queues
+- Context compression for long-running agents
+
+See: [ADR-006: Agent Orchestration](../adrs/ADR-006-agent-orchestration.md)
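
The Type-Instance pattern listed above can be sketched with plain dataclasses. This is an illustration under stated assumptions rather than the actual Syndarix models: the field names follow the Key Entities table (`base_model`, `system_prompt`, `context`), while `spawn` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass(frozen=True)
class AgentType:
    """Shared, immutable template: one per role."""
    role: str
    base_model: str
    system_prompt: str


@dataclass
class AgentInstance:
    """Runtime agent: points at its type and owns its own mutable state."""
    agent_type: AgentType
    project_id: str
    name: str
    id: UUID = field(default_factory=uuid4)
    context: dict = field(default_factory=dict)


def spawn(agent_type: AgentType, project_id: str, name: str) -> AgentInstance:
    """Create a new instance; configuration is referenced, never copied."""
    return AgentInstance(agent_type=agent_type, project_id=project_id, name=name)
```

Because every instance references the same `AgentType` object, a change to the template applies to all instances of that role, while `context` stays private to each instance.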
+
+### 4.2 Workflow Engine
+
+**Purpose:** Orchestrates multi-step workflows and agent collaboration.
+
+**Responsibilities:**
+- Execute workflow templates (requirements discovery, sprint, etc.)
+- Track workflow state and progress
+- Handle branching and conditions
+- Manage approval gates
+
+**Workflow Types:**
+- Requirements Discovery
+- Architecture Spike
+- Sprint Planning
+- Implementation
+- Sprint Demo
+
+### 4.3 Project Manager (Component)
+
+**Purpose:** Manages project lifecycle, configuration, and state.
+
+**Responsibilities:**
+- Create and configure projects
+- Manage complexity levels
+- Track project status
+- Generate reports
+
+### 4.4 LLM Gateway
+
+**Purpose:** Unified LLM access with failover and cost tracking.
+
+**Implementation:** LiteLLM-based router with:
+- Multiple model groups (high-reasoning, fast-response)
+- Automatic failover chain
+- Per-agent token tracking
+- Redis-backed caching
+
+See: [ADR-004: LLM Provider Abstraction](../adrs/ADR-004-llm-provider-abstraction.md)
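
The failover chain can be illustrated without any provider SDK. The sketch below deliberately does not use LiteLLM's API; `complete_with_failover` and `AllProvidersFailed` are hypothetical names, and each provider is modelled as a plain callable so the control flow is visible.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_failover(
    providers: Sequence[tuple[str, Callable[[list[dict]], str]]],
    messages: list[dict],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, response)."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(messages)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the response makes it possible to attribute token usage to the model that actually served the request.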
+
+### 4.5 MCP Client Manager
+
+**Purpose:** Connects to all MCP servers and routes tool calls.
+
+**Implementation:**
+- SSE connections to 7 MCP server types
+- Automatic reconnection
+- Request/response correlation
+- Scoped tool calls with project_id/agent_id
+
+See: [ADR-001: MCP Integration Architecture](../adrs/ADR-001-mcp-integration-architecture.md)
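
Scoped tool calls can be sketched as a small helper that attaches the caller's identity to every request. This is an assumption about how scoping might be enforced, not the documented client API: `Scope` and `scoped_arguments` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Scope:
    project_id: str
    agent_id: str


def scoped_arguments(scope: Scope, arguments: dict[str, Any]) -> dict[str, Any]:
    """Return tool arguments with the caller's scope attached.

    Caller-supplied scope keys are rejected so that a tool call cannot
    claim to come from another project or agent.
    """
    reserved = {"project_id", "agent_id"}
    clash = reserved.intersection(arguments)
    if clash:
        raise ValueError(f"reserved scope keys in arguments: {sorted(clash)}")
    return {**arguments, "project_id": scope.project_id, "agent_id": scope.agent_id}
```

Injecting the scope in one place, rather than trusting each call site to pass it, keeps the singleton servers from ever seeing an unscoped request.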
+
+### 4.6 Event Bus
+
+**Purpose:** Real-time event distribution using Redis Pub/Sub.
+
+**Channels:**
+- `project:{project_id}` - Project-scoped events
+- `agent:{agent_id}` - Agent-specific events
+- `system` - System-wide announcements
+
+See: [ADR-002: Real-time Communication](../adrs/ADR-002-realtime-communication.md)
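
The channel naming scheme above is simple enough to capture in a few helpers. This sketch is independent of any Redis client; `project_channel`, `agent_channel`, and `encode_event` are hypothetical names, and JSON is assumed as the wire format because Pub/Sub payloads are plain strings or bytes.

```python
import json

SYSTEM_CHANNEL = "system"


def project_channel(project_id: str) -> str:
    """Channel carrying all events scoped to one project."""
    return f"project:{project_id}"


def agent_channel(agent_id: str) -> str:
    """Channel carrying events for a single agent instance."""
    return f"agent:{agent_id}"


def encode_event(event_type: str, payload: dict) -> str:
    """Serialize an event to JSON text; the explicit type always wins."""
    return json.dumps({**payload, "type": event_type}, sort_keys=True)
```

Keeping channel construction in one module means publishers and SSE subscribers cannot drift apart on the naming convention.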
+
+---
+
+## 5. Data Architecture
+
+### 5.1 Entity Model
+
+```
+┌─────────────┐       ┌─────────────┐       ┌─────────────┐
+│    User     │───1:N─│   Project   │───1:N─│   Sprint    │
+└─────────────┘       └──────┬──────┘       └──────┬──────┘
+                             │ 1:N                 │ 1:N
+            ┌────────────────┴───┐                 │
+    ┌───────┴───────┐     ┌──────┴──────┐   ┌──────┴──────┐
+    │ AgentInstance │     │ Repository  │   │    Issue    │
+    └───────┬───────┘     └──────┬──────┘   └──────┬──────┘
+            │ 1:N                │ 1:N             │ 1:N
+    ┌───────┴───────┐     ┌──────┴──────┐   ┌──────┴──────┐
+    │    Message    │     │ PullRequest │   │IssueComment │
+    └───────────────┘     └─────────────┘   └─────────────┘
+```
+
+### 5.2 Key Entities
+
+| Entity | Purpose | Key Fields |
+|--------|---------|------------|
+| User | Human users | email, auth |
+| Project | Work containers | name, complexity, autonomy_level |
+| AgentType | Agent templates | base_model, expertise, system_prompt |
+| AgentInstance | Running agents | name, project_id, context |
+| Issue | Work items | type, status, external_tracker_fields |
+| Sprint | Time-boxed iterations | goal, velocity |
+| Repository | Git repos | provider, clone_url |
+| KnowledgeDocument | RAG documents | content, embedding_id |
+
+### 5.3 Vector Storage
+
+**pgvector** extension for:
+- Document embeddings (RAG)
+- Semantic search across knowledge base
+- Agent context similarity
+
+---
+
+## 6. Integration Architecture
+
+### 6.1 MCP Server Registry
+
+| Server | Port | Purpose | Priority Providers |
+|--------|------|---------|-------------------|
+| LLM Gateway | 9001 | LLM routing | Anthropic, OpenAI, Ollama |
+| Git MCP | 9002 | Git operations | Gitea, GitHub, GitLab |
+| Knowledge Base | 9003 | RAG search | pgvector |
+| Issues MCP | 9004 | Issue tracking | Gitea, GitHub, GitLab |
+| File System | 9005 | Workspace files | Local FS |
+| Code Analysis | 9006 | Static analysis | Ruff, ESLint |
+| CI/CD MCP | 9007 | Pipelines | Gitea Actions |
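
The registry above maps naturally onto a lookup table. In this sketch the ports come from the table, but the dictionary keys, the `server_url` helper, and the `localhost` default are assumptions made for illustration.

```python
# Ports are taken from the MCP Server Registry table; the keys are hypothetical.
MCP_SERVER_PORTS = {
    "llm-gateway": 9001,
    "git": 9002,
    "knowledge-base": 9003,
    "issues": 9004,
    "file-system": 9005,
    "code-analysis": 9006,
    "cicd": 9007,
}


def server_url(name: str, host: str = "localhost") -> str:
    """Build the base URL for a registered MCP server."""
    try:
        port = MCP_SERVER_PORTS[name]
    except KeyError:
        raise KeyError(f"unknown MCP server: {name!r}") from None
    return f"http://{host}:{port}"
```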
+
+### 6.2 External Integration Diagram
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        Syndarix Backend                          │
+│                                                                  │
+│  ┌──────────────────────────────────────────────────────────┐   │
+│  │                    MCP Client Manager                     │   │
+│  │                                                          │   │
+│  │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │   │
+│  │  │  LLM   │ │  Git   │ │   KB   │ │ Issues │ │ CI/CD  │ │   │
+│  │  │ Client │ │ Client │ │ Client │ │ Client │ │ Client │ │   │
+│  │  └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ │   │
+│  └──────│──────────│──────────│──────────│──────────│──────┘   │
+└─────────│──────────│──────────│──────────│──────────│──────────┘
+          │          │          │          │          │
+          │ SSE      │ SSE      │ SSE      │ SSE      │ SSE
+          ▼          ▼          ▼          ▼          ▼
+     ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
+     │  LLM   │ │  Git   │ │   KB   │ │ Issues │ │ CI/CD  │
+     │  MCP   │ │  MCP   │ │  MCP   │ │  MCP   │ │  MCP   │
+     │ Server │ │ Server │ │ Server │ │ Server │ │ Server │
+     └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘
+         │          │          │          │          │
+         ▼          ▼          ▼          ▼          ▼
+    ┌─────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
+    │Anthropic│ │ Gitea  │ │pgvector│ │ Gitea  │ │ Gitea  │
+    │ OpenAI  │ │ GitHub │ │        │ │ Issues │ │Actions │
+    │ Ollama  │ │ GitLab │ │        │ │        │ │        │
+    └─────────┘ └────────┘ └────────┘ └────────┘ └────────┘
+```
+
+---
+
+## 7. Security Architecture
+
+### 7.1 Authentication
+
+- **JWT Dual-Token:** Access token (15 min) + Refresh token (7 days)
+- **OAuth 2.0 Provider:** For MCP client authentication
+- **Service Tokens:** Internal service-to-service auth
+
+### 7.2 Authorization
+
+- **RBAC:** Role-based access control
+- **Project Scoping:** All operations scoped to projects
+- **Agent Permissions:** Agents operate within project scope
+
+### 7.3 Data Protection
+
+- **TLS 1.3:** All external communications
+- **Encryption at Rest:** Database encryption
+- **Secrets Management:** Environment-based, never in code
+
+---
+
+## 8. Deployment Architecture
+
+### 8.1 Container Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        Docker Compose                            │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                  │
+│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐        │
+│  │ Frontend │  │ Backend  │  │ Workers  │  │  Flower  │        │
+│  │ (Next.js)│  │ (FastAPI)│  │ (Celery) │  │(Monitor) │        │
+│  │  :3000   │  │  :8000   │  │          │  │  :5555   │        │
+│  └──────────┘  └──────────┘  └──────────┘  └──────────┘        │
+│                                                                  │
+│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐        │
+│  │ LLM MCP  │  │ Git MCP  │  │  KB MCP  │  │Issues MCP│        │
+│  │  :9001   │  │  :9002   │  │  :9003   │  │  :9004   │        │
+│  └──────────┘  └──────────┘  └──────────┘  └──────────┘        │
+│                                                                  │
+│  ┌──────────┐  ┌──────────┐  ┌──────────┐                      │
+│  │  FS MCP  │  │ Code MCP │  │CI/CD MCP │                      │
+│  │  :9005   │  │  :9006   │  │  :9007   │                      │
+│  └──────────┘  └──────────┘  └──────────┘                      │
+│                                                                  │
+│  ┌──────────────────────────────────────────────────────────┐   │
+│  │                      Infrastructure                       │   │
+│  │  ┌──────────┐  ┌──────────┐                              │   │
+│  │  │PostgreSQL│  │  Redis   │                              │   │
+│  │  │  :5432   │  │  :6379   │                              │   │
+│  │  └──────────┘  └──────────┘                              │   │
+│  └──────────────────────────────────────────────────────────┘   │
+│                                                                  │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### 8.2 Scaling Strategy
+
+| Component | Scaling | Strategy |
+|-----------|---------|----------|
+| Frontend | Horizontal | Stateless, behind LB |
+| Backend | Horizontal | Stateless, behind LB |
+| Celery Workers | Horizontal | Queue-based routing |
+| MCP Servers | Horizontal | Stateless singletons |
+| PostgreSQL | Vertical + Read Replicas | Primary/replica |
+| Redis | Cluster | Sentinel for failover or Cluster mode for sharding |
+
+---
+
+## 9. Cross-Cutting Concerns
+
+### 9.1 Logging
+
+- **Format:** Structured JSON
+- **Correlation:** Request IDs across services
+- **Levels:** DEBUG, INFO, WARNING, ERROR, CRITICAL
+
+### 9.2 Monitoring
+
+- **Metrics:** Prometheus-compatible export
+- **Traces:** OpenTelemetry (future)
+- **Dashboards:** Grafana (optional)
+
+### 9.3 Error Handling
+
+- **Agent Errors:** Logged, published via SSE
+- **Task Failures:** Celery retry with backoff
+- **Integration Errors:** Circuit breaker pattern
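
The circuit breaker pattern named for integration errors can be sketched as follows. This is a generic, synchronous illustration and not the Syndarix implementation: the class name, thresholds, and injectable `clock` parameter are all assumptions chosen to keep the example testable.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; call skipped")
            # Timeout elapsed: half-open state, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                # Open (or re-open) the circuit and restart the timeout.
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of waiting on a provider that is known to be down, which keeps one failing integration from tying up workers.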
+
+---
+
+## 10. Architecture Decisions
+
+### Summary of ADRs
+
+| ADR | Title | Status |
+|-----|-------|--------|
+| [ADR-001](../adrs/ADR-001-mcp-integration-architecture.md) | MCP Integration Architecture | Accepted |
+| [ADR-002](../adrs/ADR-002-realtime-communication.md) | Real-time Communication | Accepted |
+| [ADR-003](../adrs/ADR-003-background-task-architecture.md) | Background Task Architecture | Accepted |
+| [ADR-004](../adrs/ADR-004-llm-provider-abstraction.md) | LLM Provider Abstraction | Accepted |
+| [ADR-005](../adrs/ADR-005-tech-stack-selection.md) | Tech Stack Selection | Accepted |
+| [ADR-006](../adrs/ADR-006-agent-orchestration.md) | Agent Orchestration | Accepted |
+
+### Key Decisions Summary
+
+1. **Unified Singleton MCP Servers** with project/agent scoping
+2. **SSE for real-time events**, WebSocket only for chat
+3. **Celery + Redis** for background tasks
+4. **LiteLLM** for unified LLM abstraction with failover
+5. **PragmaStack** as foundation with Syndarix extensions
+6. **Type-Instance pattern** for agent orchestration
+
+---
+
+## Appendix A: Technology Stack Quick Reference
+
+| Layer | Technology |
+|-------|------------|
+| Frontend | Next.js 16, React 19, TypeScript, Tailwind, shadcn/ui |
+| Backend | FastAPI, Python 3.11+, SQLAlchemy 2.0, Pydantic 2.0 |
+| Database | PostgreSQL 15+ with pgvector |
+| Cache/Queue | Redis 7.0+ |
+| Task Queue | Celery 5.3+ |
+| MCP | FastMCP 2.0 |
+| LLM | LiteLLM (Claude, GPT-4, Ollama) |
+| Testing | pytest, Jest, Playwright |
+| Container | Docker, Docker Compose |
+
+---
+
+## Appendix B: Port Reference
+
+| Service | Port |
+|---------|------|
+| Frontend | 3000 |
+| Backend | 8000 |
+| PostgreSQL | 5432 |
+| Redis | 6379 |
+| Flower | 5555 |
+| LLM MCP | 9001 |
+| Git MCP | 9002 |
+| KB MCP | 9003 |
+| Issues MCP | 9004 |
+| FS MCP | 9005 |
+| Code MCP | 9006 |
+| CI/CD MCP | 9007 |
+
+---
+
+*This document provides the comprehensive architecture overview for Syndarix. For detailed decisions, see the individual ADRs.*