diff --git a/docs/adrs/ADR-004-llm-provider-abstraction.md b/docs/adrs/ADR-004-llm-provider-abstraction.md
index 2b5a3b6..6cc6931 100644
--- a/docs/adrs/ADR-004-llm-provider-abstraction.md
+++ b/docs/adrs/ADR-004-llm-provider-abstraction.md
@@ -10,9 +10,11 @@
 ## Context
 
 Syndarix agents require access to large language models (LLMs) from multiple providers:
-- **Anthropic** (Claude) - Primary provider
-- **OpenAI** (GPT-4) - Fallback provider
-- **Local models** (Ollama/Llama) - Cost optimization, privacy
+- **Anthropic** (Claude Opus 4.5) - Primary provider, highest reasoning capability
+- **Google** (Gemini 3 Pro/Flash) - Strong multimodal, fast inference
+- **OpenAI** (GPT 5.1 Codex max) - Code generation specialist
+- **Alibaba** (Qwen3-235B) - Cost-effective alternative
+- **DeepSeek** (V3.2) - Open-weights, self-hostable option
 
 We need a unified abstraction layer that provides:
 - Consistent API across providers
@@ -79,25 +81,33 @@ LiteLLM provides the reliability, monitoring, and multi-provider support needed
 
 ### Model Groups
 
-| Group Name | Use Case | Primary Model | Fallback |
-|------------|----------|---------------|----------|
-| `high-reasoning` | Complex analysis, architecture | Claude 3.5 Sonnet | GPT-4 Turbo |
-| `fast-response` | Quick tasks, simple queries | Claude 3 Haiku | GPT-4o Mini |
-| `cost-optimized` | High-volume, non-critical | Local Llama 3 | Claude 3 Haiku |
+| Group Name | Use Case | Primary Model | Fallback Chain |
+|------------|----------|---------------|----------------|
+| `high-reasoning` | Complex analysis, architecture | Claude Opus 4.5 | GPT 5.1 Codex max → Gemini 3 Pro |
+| `code-generation` | Code writing, refactoring | GPT 5.1 Codex max | Claude Opus 4.5 → DeepSeek V3.2 |
+| `fast-response` | Quick tasks, simple queries | Gemini 3 Flash | Qwen3-235B → DeepSeek V3.2 |
+| `cost-optimized` | High-volume, non-critical | Qwen3-235B | DeepSeek V3.2 (self-hosted) |
+| `self-hosted` | Privacy-sensitive, air-gapped | DeepSeek V3.2 | Qwen3-235B |
 
-### Failover Chain
+### Failover Chain (Primary)
 
 ```
-Claude 3.5 Sonnet (Anthropic)
+Claude Opus 4.5 (Anthropic)
+         │
+         ▼ (on failure/rate limit)
+    GPT 5.1 Codex max (OpenAI)
+         │
+         ▼ (on failure/rate limit)
+    Gemini 3 Pro (Google)
+         │
+         ▼ (on failure/rate limit)
+    Qwen3-235B (Alibaba/Self-hosted)
          │
          ▼ (on failure)
-    GPT-4 Turbo (OpenAI)
+    DeepSeek V3.2 (Self-hosted)
          │
-         ▼ (on failure)
-    Llama 3 (Ollama/Local)
-         │
-         ▼ (on failure)
-    Error with retry
+         ▼ (all failed)
+    Error with exponential backoff retry
 ```
 
 ### LLM Gateway Service
@@ -131,24 +141,26 @@ class LLMGateway:
 
 ### Cost Tracking
 
-| Model | Input (per 1M tokens) | Output (per 1M tokens) |
-|-------|----------------------|------------------------|
-| Claude 3.5 Sonnet | $3.00 | $15.00 |
-| Claude 3 Haiku | $0.25 | $1.25 |
-| GPT-4 Turbo | $10.00 | $30.00 |
-| GPT-4o Mini | $0.15 | $0.60 |
-| Ollama (local) | $0.00 | $0.00 |
+| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
+|-------|----------------------|------------------------|-------|
+| Claude Opus 4.5 | $15.00 | $75.00 | Highest reasoning capability |
+| GPT 5.1 Codex max | $12.00 | $60.00 | Code generation specialist |
+| Gemini 3 Pro | $3.50 | $10.50 | Strong multimodal |
+| Gemini 3 Flash | $0.35 | $1.05 | Fast inference |
+| Qwen3-235B | $2.00 | $6.00 | Cost-effective (or $0 self-hosted) |
+| DeepSeek V3.2 | $0.00 | $0.00 | Self-hosted, open weights (no per-token fee; infrastructure cost not included) |
 
 ### Agent Type Mapping
 
 | Agent Type | Model Preference | Rationale |
 |------------|------------------|-----------|
-| Product Owner | high-reasoning | Complex requirements analysis |
-| Software Architect | high-reasoning | Architecture decisions |
-| Software Engineer | high-reasoning | Code generation |
-| QA Engineer | fast-response | Test case generation |
-| DevOps Engineer | fast-response | Config generation |
-| Project Manager | fast-response | Status updates |
+| Product Owner | high-reasoning | Complex requirements analysis needs Claude Opus 4.5 |
+| Software Architect | high-reasoning | Architecture decisions need top-tier reasoning |
+| Software Engineer | code-generation | GPT 5.1 Codex max is optimized for code |
+| QA Engineer | code-generation | Test code generation |
+| DevOps Engineer | fast-response | Config generation (Gemini 3 Flash) |
+| Project Manager | fast-response | Status updates, quick responses |
+| Business Analyst | high-reasoning | Document analysis needs strong reasoning |
 
 ### Caching Strategy
 
diff --git a/docs/adrs/ADR-007-agentic-framework-selection.md b/docs/adrs/ADR-007-agentic-framework-selection.md
index a74fac6..6a32db7 100644
--- a/docs/adrs/ADR-007-agentic-framework-selection.md
+++ b/docs/adrs/ADR-007-agentic-framework-selection.md
@@ -297,9 +297,9 @@ def create_agent_graph() -> StateGraph:
 # 4. LiteLLM handles LLM calls with failover
 async def think_node(state: AgentState) -> AgentState:
     response = await litellm.acompletion(
-        model="claude-3-5-sonnet-latest",  # Claude 3.5 Sonnet (primary)
+        model="claude-opus-4-5",  # Claude Opus 4.5 (primary)
         messages=state["messages"],
-        fallbacks=["gpt-4-turbo", "ollama/llama3"],
+        fallbacks=["gpt-5.1-codex-max", "gemini-3-pro", "qwen3-235b", "deepseek-v3.2"],
         metadata={"agent_id": state["agent_id"]},
     )
     return {"messages": [response.choices[0].message]}
diff --git a/docs/adrs/ADR-011-issue-synchronization.md b/docs/adrs/ADR-011-issue-synchronization.md
index 437a4a2..cce7674 100644
--- a/docs/adrs/ADR-011-issue-synchronization.md
+++ b/docs/adrs/ADR-011-issue-synchronization.md
@@ -61,7 +61,7 @@ External Trackers (Gitea/GitHub/GitLab)
     └───────────────────┘
               │
     ┌─────────┴─────────┐
-    │  Polling Worker   │ (reconciliation every 15 min)
+    │  Polling Worker   │ (fallback polling: 60s, full reconciliation: 15 min)
     └───────────────────┘
               │
     ┌─────────┴─────────┐
diff --git a/docs/adrs/ADR-012-cost-tracking.md b/docs/adrs/ADR-012-cost-tracking.md
index 60ae9e2..6d6f929 100644
--- a/docs/adrs/ADR-012-cost-tracking.md
+++ b/docs/adrs/ADR-012-cost-tracking.md
@@ -154,22 +154,24 @@ GROUP BY project_id, DATE(timestamp);
 
 ### Cost Model Prices
 
-| Model | Input ($/1M) | Output ($/1M) |
-|-------|-------------|---------------|
-| Claude 3.5 Sonnet | $3.00 | $15.00 |
-| Claude 3 Haiku | $0.25 | $1.25 |
-| GPT-4 Turbo | $10.00 | $30.00 |
-| GPT-4o Mini | $0.15 | $0.60 |
-| Ollama (local) | $0.00 | $0.00 |
+| Model | Input ($/1M) | Output ($/1M) | Notes |
+|-------|-------------|---------------|-------|
+| Claude Opus 4.5 | $15.00 | $75.00 | Highest reasoning capability |
+| GPT 5.1 Codex max | $12.00 | $60.00 | Code generation specialist |
+| Gemini 3 Pro | $3.50 | $10.50 | Strong multimodal |
+| Gemini 3 Flash | $0.35 | $1.05 | Fast inference |
+| Qwen3-235B | $2.00 | $6.00 | Cost-effective (or $0 self-hosted) |
+| DeepSeek V3.2 | $0.00 | $0.00 | Self-hosted, open weights (no per-token fee; infrastructure cost not included) |
 
 ### Cost Optimization Strategies
 
 | Strategy | Savings | Implementation |
 |----------|---------|----------------|
 | Semantic caching | 15-30% | Redis cache for repeated queries |
-| Model cascading | 60-80% | Start with Haiku, escalate to Sonnet |
+| Model cascading | 60-80% | Start with Gemini 3 Flash, escalate to Claude Opus 4.5 |
 | Prompt compression | 10-20% | Remove redundant context |
-| Local fallback | 100% for some | Ollama for simple tasks |
+| Self-hosted fallback | 100% of API fees on those tasks | DeepSeek V3.2/Qwen3-235B for non-critical tasks |
+| Task-appropriate routing | 40-60% | Route code tasks to GPT 5.1 Codex max, simple tasks to Gemini 3 Flash |
 
 ## Consequences
 
diff --git a/docs/architecture/ARCHITECTURE.md b/docs/architecture/ARCHITECTURE.md
index 84ac920..a2d7b5a 100644
--- a/docs/architecture/ARCHITECTURE.md
+++ b/docs/architecture/ARCHITECTURE.md
@@ -42,10 +42,11 @@ Syndarix is an autonomous AI-powered software consulting platform that orchestra
 │  │                                                                           │   │
 │  │  ┌─────────────────────────────────────────────────────────────────────┐ │   │
 │  │  │                    ORCHESTRATION LAYER                               │ │   │
-│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │ │   │
-│  │  │  │   Agent     │  │  Workflow   │  │  Approval   │                  │ │   │
-│  │  │  │ Orchestrator│  │   Engine    │  │   Service   │                  │ │   │
-│  │  │  └─────────────┘  └─────────────┘  └─────────────┘                  │ │   │
+│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌───────────┐  │ │   │
+│  │  │  │   Agent     │  │  Workflow   │  │  Approval   │  │ LangGraph │  │ │   │
+│  │  │  │ Orchestrator│  │   Engine    │  │   Service   │  │  Runtime  │  │ │   │
+│  │  │  │(Type-Inst.) │  │(transitions)│  │             │  │           │  │ │   │
+│  │  │  └─────────────┘  └─────────────┘  └─────────────┘  └───────────┘  │ │   │
 │  │  └─────────────────────────────────────────────────────────────────────┘ │   │
 │  │                                                                           │   │
 │  │  ┌─────────────────────────────────────────────────────────────────────┐ │   │
@@ -126,21 +127,26 @@ Agent Type (Template)              Agent Instance (Runtime)
 
 **Failover Chain:**
 ```
-Claude 3.5 Sonnet (Primary)
+Claude Opus 4.5 (Primary)
+         │
+         ▼ (on failure/rate limit)
+    GPT 5.1 Codex max (Code specialist)
+         │
+         ▼ (on failure/rate limit)
+    Gemini 3 Pro (Multimodal)
          │
          ▼ (on failure)
-    GPT-4 Turbo (Fallback)
-         │
-         ▼ (on failure)
-    Ollama/Llama 3 (Local)
+    Qwen3-235B / DeepSeek V3.2 (Self-hosted)
 ```
 
 **Model Groups:**
-| Group | Use Case | Primary Model |
-|-------|----------|---------------|
-| high-reasoning | Architecture, complex analysis | Claude 3.5 Sonnet |
-| fast-response | Quick tasks, status updates | Claude 3 Haiku |
-| cost-optimized | High-volume, non-critical | Local Llama 3 |
+| Group | Use Case | Primary Model | Fallback |
+|-------|----------|---------------|----------|
+| high-reasoning | Architecture, complex analysis | Claude Opus 4.5 | GPT 5.1 Codex max |
+| code-generation | Code writing, refactoring | GPT 5.1 Codex max | Claude Opus 4.5 |
+| fast-response | Quick tasks, status updates | Gemini 3 Flash | Qwen3-235B |
+| cost-optimized | High-volume, non-critical | Qwen3-235B | DeepSeek V3.2 |
+| self-hosted | Privacy-sensitive, air-gapped | DeepSeek V3.2 | Qwen3-235B |
 
 ### 3. Knowledge Base (RAG)
 
@@ -245,13 +251,17 @@ LLM Request → LiteLLM Callback → Redis INCR → Budget Check
 
 **All components are fully self-hostable with no mandatory subscriptions:**
 
-| Component | Self-Hosted | Managed Alternative (Optional) |
-|-----------|-------------|--------------------------------|
-| PostgreSQL | Yes | RDS, Neon, Supabase |
-| Redis | Yes | Redis Cloud |
-| LiteLLM | Yes | LiteLLM Enterprise |
-| Celery | Yes | - |
-| FastMCP | Yes | - |
+| Component | License | Self-Hosted | Managed Alternative (Optional) |
+|-----------|---------|-------------|--------------------------------|
+| PostgreSQL | PostgreSQL | Yes | RDS, Neon, Supabase |
+| Redis | BSD-3 (≤7.2); RSALv2/SSPLv1 (7.4+), plus AGPLv3 (8.0+) | Yes | Redis Cloud |
+| LiteLLM | MIT | Yes | LiteLLM Enterprise |
+| Celery | BSD-3 | Yes | - |
+| FastMCP | MIT | Yes | - |
+| LangGraph | MIT | Yes | LangSmith (observability only) |
+| transitions | MIT | Yes | - |
+| DeepSeek V3.2 | MIT | Yes | API available |
+| Qwen3-235B | Apache 2.0 | Yes | Alibaba Cloud |
 
 ---
 
diff --git a/docs/architecture/IMPLEMENTATION_ROADMAP.md b/docs/architecture/IMPLEMENTATION_ROADMAP.md
index c7f8556..d6dc3c6 100644
--- a/docs/architecture/IMPLEMENTATION_ROADMAP.md
+++ b/docs/architecture/IMPLEMENTATION_ROADMAP.md
@@ -295,10 +295,13 @@ This roadmap outlines the phased implementation approach for Syndarix, prioritiz
 | Metric | Target | Measurement |
 |--------|--------|-------------|
 | Agent task success rate | >90% | Completed tasks / total tasks |
-| Response time (P95) | <2s | API latency |
-| Cost per project | <$50/sprint | LLM + compute costs |
+| API response time (P95) | <200ms | Pure API latency (per NFR-101) |
+| Agent response time | <10s simple, <60s code | End-to-end including LLM (per NFR-103) |
+| Cost per project | <$100/sprint | LLM + compute costs (with Opus 4.5 pricing) |
 | Time to first commit | <1 hour | From requirements to PR |
 | Client satisfaction | >4/5 | Post-sprint survey |
+| Concurrent projects | 10+ | Active projects in parallel |
+| Concurrent agents | 50+ | Agent instances running |
 
 ---
 
@@ -328,15 +331,17 @@ Foundation    Core Platform   MCP Integration  Agent Orch    Workflows     Advan
 ### Infrastructure
 - PostgreSQL (managed or self-hosted)
 - Redis (managed or self-hosted)
-- Celery workers (2-4 instances)
-- MCP servers (7 containers)
+- Celery workers (4-8 instances across 4 queues: agent, git, sync, cicd)
+- MCP servers (7 containers, deployed in Phases 2 and 5)
 - API server (2+ instances)
 - Frontend (static hosting or SSR)
 
 ### External Services
-- Anthropic API (primary LLM)
-- OpenAI API (fallback)
-- Ollama (local models, optional)
+- Anthropic API (Claude Opus 4.5 - primary reasoning)
+- OpenAI API (GPT 5.1 Codex max - code generation)
+- Google API (Gemini 3 Pro/Flash - multimodal, fast)
+- Alibaba API (Qwen3-235B - cost-effective, or self-host)
+- DeepSeek V3.2 (self-hosted, open weights)
 - Gitea/GitHub/GitLab (issue tracking)
 
 ---