feat: Update to production model stack and fix remaining inconsistencies

## Model Stack Updates (User's Actual Models)

Updated all documentation to reflect production models:
- Claude Opus 4.5 (primary reasoning)
- GPT 5.1 Codex max (code generation specialist)
- Gemini 3 Pro/Flash (multimodal, fast inference)
- Qwen3-235B (cost-effective, self-hostable)
- DeepSeek V3.2 (self-hosted, open weights)

### Files Updated:
- ADR-004: Full model groups, failover chains, cost tables
- ADR-007: Code example with correct model identifiers
- ADR-012: Cost tracking with new model prices
- ARCHITECTURE.md: Model groups, failover diagram
- IMPLEMENTATION_ROADMAP.md: External services list

## Architecture Diagram Updates
- Added LangGraph Runtime to orchestration layer
- Added technology labels (Type-Instance, transitions)

## Self-Hostability Table Expanded

Added entries for:
- LangGraph (MIT)
- transitions (MIT)
- DeepSeek V3.2 (MIT)
- Qwen3-235B (Apache 2.0)

## Metric Alignments
- Response time: Split into API (<200ms) and Agent (<10s/<60s)
- Cost per project: Adjusted to $100/sprint for Opus 4.5 pricing
- Added concurrent projects (10+) and agents (50+) metrics

## Infrastructure Updates
- Celery workers: 4-8 instances (was 2-4) across 4 queues
- MCP servers: Clarified Phase 2 + Phase 5 deployment
- Sync interval: Clarified 60s fallback + 15min reconciliation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 23:35:51 +01:00
parent f138417486
commit 88cf4e0abc
6 changed files with 98 additions and 69 deletions
--- a/docs/architecture/ARCHITECTURE.md
+++ b/docs/architecture/ARCHITECTURE.md
@@ -42,10 +42,11 @@ Syndarix is an autonomous AI-powered software consulting platform that orchestra
 │  │                                                                           │   │
 │  │  ┌─────────────────────────────────────────────────────────────────────┐ │   │
 │  │  │                    ORCHESTRATION LAYER                               │ │   │
-│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │ │   │
-│  │  │  │   Agent     │  │  Workflow   │  │  Approval   │                  │ │   │
-│  │  │  │ Orchestrator│  │   Engine    │  │   Service   │                  │ │   │
-│  │  │  └─────────────┘  └─────────────┘  └─────────────┘                  │ │   │
+│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌───────────┐  │ │   │
+│  │  │  │   Agent     │  │  Workflow   │  │  Approval   │  │ LangGraph │  │ │   │
+│  │  │  │ Orchestrator│  │   Engine    │  │   Service   │  │  Runtime  │  │ │   │
+│  │  │  │(Type-Inst.) │  │(transitions)│  │             │  │           │  │ │   │
+│  │  │  └─────────────┘  └─────────────┘  └─────────────┘  └───────────┘  │ │   │
 │  │  └─────────────────────────────────────────────────────────────────────┘ │   │
 │  │                                                                           │   │
 │  │  ┌─────────────────────────────────────────────────────────────────────┐ │   │
@@ -126,21 +127,26 @@ Agent Type (Template)              Agent Instance (Runtime)

 **Failover Chain:**
 ```
-Claude 3.5 Sonnet (Primary)
+Claude Opus 4.5 (Primary)
+         │
+         ▼ (on failure/rate limit)
+    GPT 5.1 Codex max (Code specialist)
+         │
+         ▼ (on failure/rate limit)
+    Gemini 3 Pro (Multimodal)
         │
         ▼ (on failure)
-    GPT-4 Turbo (Fallback)
-         │
-         ▼ (on failure)
-    Ollama/Llama 3 (Local)
+    Qwen3-235B / DeepSeek V3.2 (Self-hosted)
 ```

 **Model Groups:**
-| Group | Use Case | Primary Model |
-|-------|----------|---------------|
-| high-reasoning | Architecture, complex analysis | Claude 3.5 Sonnet |
-| fast-response | Quick tasks, status updates | Claude 3 Haiku |
-| cost-optimized | High-volume, non-critical | Local Llama 3 |
+| Group | Use Case | Primary Model | Fallback |
+|-------|----------|---------------|----------|
+| high-reasoning | Architecture, complex analysis | Claude Opus 4.5 | GPT 5.1 Codex max |
+| code-generation | Code writing, refactoring | GPT 5.1 Codex max | Claude Opus 4.5 |
+| fast-response | Quick tasks, status updates | Gemini 3 Flash | Qwen3-235B |
+| cost-optimized | High-volume, non-critical | Qwen3-235B | DeepSeek V3.2 |
+| self-hosted | Privacy-sensitive, air-gapped | DeepSeek V3.2 | Qwen3-235B |

 ### 3. Knowledge Base (RAG)

@@ -245,13 +251,17 @@ LLM Request → LiteLLM Callback → Redis INCR → Budget Check

 **All components are fully self-hostable with no mandatory subscriptions:**

-| Component | Self-Hosted | Managed Alternative (Optional) |
-|-----------|-------------|--------------------------------|
-| PostgreSQL | Yes | RDS, Neon, Supabase |
-| Redis | Yes | Redis Cloud |
-| LiteLLM | Yes | LiteLLM Enterprise |
-| Celery | Yes | - |
-| FastMCP | Yes | - |
+| Component | License | Self-Hosted | Managed Alternative (Optional) |
+|-----------|---------|-------------|--------------------------------|
+| PostgreSQL | PostgreSQL | Yes | RDS, Neon, Supabase |
+| Redis | BSD-3 | Yes | Redis Cloud |
+| LiteLLM | MIT | Yes | LiteLLM Enterprise |
+| Celery | BSD-3 | Yes | - |
+| FastMCP | MIT | Yes | - |
+| LangGraph | MIT | Yes | LangSmith (observability only) |
+| transitions | MIT | Yes | - |
+| DeepSeek V3.2 | MIT | Yes | API available |
+| Qwen3-235B | Apache 2.0 | Yes | Alibaba Cloud |

 ---

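Review note: the model groups and primary/fallback pairs introduced in the ARCHITECTURE.md hunks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: `ModelUnavailable`, `complete_with_failover`, and the `call_model` callback are hypothetical names, and the model strings are the display names from the table rather than real API identifiers. In the actual stack this routing would presumably be delegated to the LiteLLM layer the document already mentions.

```python
from typing import Callable

# (primary, fallback) per group, copied from the Model Groups table.
MODEL_GROUPS: dict[str, tuple[str, str]] = {
    "high-reasoning": ("Claude Opus 4.5", "GPT 5.1 Codex max"),
    "code-generation": ("GPT 5.1 Codex max", "Claude Opus 4.5"),
    "fast-response": ("Gemini 3 Flash", "Qwen3-235B"),
    "cost-optimized": ("Qwen3-235B", "DeepSeek V3.2"),
    "self-hosted": ("DeepSeek V3.2", "Qwen3-235B"),
}


class ModelUnavailable(Exception):
    """Hypothetical error a client raises on failure or rate limit."""


def complete_with_failover(
    group: str,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try the group's primary model, then its fallback.

    Returns (model_used, response). Raises RuntimeError if every
    model in the group is unavailable.
    """
    errors: list[str] = []
    for model in MODEL_GROUPS[group]:
        try:
            return model, call_model(model, prompt)
        except ModelUnavailable as exc:
            # Record the failure and move on to the next model.
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

A caller supplies `call_model` (for example, a thin wrapper around whatever client is in use) and gets back both the response and the name of the model that actually served it, which is useful for per-model cost tracking.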
--- a/docs/architecture/IMPLEMENTATION_ROADMAP.md
+++ b/docs/architecture/IMPLEMENTATION_ROADMAP.md
@@ -295,10 +295,13 @@ This roadmap outlines the phased implementation approach for Syndarix, prioritiz
 | Metric | Target | Measurement |
 |--------|--------|-------------|
 | Agent task success rate | >90% | Completed tasks / total tasks |
-| Response time (P95) | <2s | API latency |
-| Cost per project | <$50/sprint | LLM + compute costs |
+| API response time (P95) | <200ms | Pure API latency (per NFR-101) |
+| Agent response time | <10s simple, <60s code | End-to-end including LLM (per NFR-103) |
+| Cost per project | <$100/sprint | LLM + compute costs (with Opus 4.5 pricing) |
 | Time to first commit | <1 hour | From requirements to PR |
 | Client satisfaction | >4/5 | Post-sprint survey |
+| Concurrent projects | 10+ | Active projects in parallel |
+| Concurrent agents | 50+ | Agent instances running |

 ---

@@ -328,15 +331,17 @@ Foundation    Core Platform   MCP Integration  Agent Orch    Workflows     Advan
 ### Infrastructure
 - PostgreSQL (managed or self-hosted)
 - Redis (managed or self-hosted)
-- Celery workers (2-4 instances)
-- MCP servers (7 containers)
+- Celery workers (4-8 instances across 4 queues: agent, git, sync, cicd)
+- MCP servers (7 containers, deployed in Phase 2 + Phase 5)
 - API server (2+ instances)
 - Frontend (static hosting or SSR)

 ### External Services
-- Anthropic API (primary LLM)
-- OpenAI API (fallback)
-- Ollama (local models, optional)
+- Anthropic API (Claude Opus 4.5 - primary reasoning)
+- OpenAI API (GPT 5.1 Codex max - code generation)
+- Google API (Gemini 3 Pro/Flash - multimodal, fast)
+- Alibaba API (Qwen3-235B - cost-effective, or self-host)
+- DeepSeek V3.2 (self-hosted, open weights)
 - Gitea/GitHub/GitLab (issue tracking)

 ---
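Review note: the four Celery queues named in the roadmap hunk above (agent, git, sync, cicd) could be wired up with a routing table along these lines. This is a sketch under assumptions: the `app.tasks.*` module paths and the `queue_for` helper are hypothetical and do not appear in the commit. The dictionary has the shape Celery's `task_routes` setting accepts (glob patterns mapped to route options); the helper only mimics the lookup so the mapping can be checked without a running broker.

```python
from fnmatch import fnmatch

# Glob pattern -> route options, in the shape accepted by Celery's
# `task_routes` setting. Module paths are hypothetical.
TASK_ROUTES: dict[str, dict[str, str]] = {
    "app.tasks.agent.*": {"queue": "agent"},
    "app.tasks.git.*": {"queue": "git"},
    "app.tasks.sync.*": {"queue": "sync"},
    "app.tasks.cicd.*": {"queue": "cicd"},
}


def queue_for(task_name: str, default: str = "celery") -> str:
    """Return the queue a task name would be routed to.

    Falls back to `default` ("celery" is Celery's default queue name)
    when no pattern matches.
    """
    for pattern, route in TASK_ROUTES.items():
        if fnmatch(task_name, pattern):
            return route["queue"]
    return default
```

With a mapping like this, the 4-8 worker instances can each be started against one or more queues, so that long-running agent tasks do not starve the git or sync queues.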