From 88cf4e0abc336e24c68733919059882048752b3f Mon Sep 17 00:00:00 2001
From: Felipe Cardoso
Date: Mon, 29 Dec 2025 23:35:51 +0100
Subject: [PATCH] feat: Update to production model stack and fix remaining inconsistencies
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Model Stack Updates (User's Actual Models)

Updated all documentation to reflect production models:

- Claude Opus 4.5 (primary reasoning)
- GPT 5.1 Codex max (code generation specialist)
- Gemini 3 Pro/Flash (multimodal, fast inference)
- Qwen3-235B (cost-effective, self-hostable)
- DeepSeek V3.2 (self-hosted, open weights)

### Files Updated:

- ADR-004: Full model groups, failover chains, cost tables
- ADR-007: Code example with correct model identifiers
- ADR-012: Cost tracking with new model prices
- ARCHITECTURE.md: Model groups, failover diagram
- IMPLEMENTATION_ROADMAP.md: External services list

## Architecture Diagram Updates

- Added LangGraph Runtime to orchestration layer
- Added technology labels (Type-Instance, transitions)

## Self-Hostability Table Expanded

Added entries for:

- LangGraph (MIT)
- transitions (MIT)
- DeepSeek V3.2 (MIT)
- Qwen3-235B (Apache 2.0)

## Metric Alignments

- Response time: Split into API (<200ms) and Agent (<10s/<60s)
- Cost per project: Adjusted to $100/sprint for Opus 4.5 pricing
- Added concurrent projects (10+) and agents (50+) metrics

## Infrastructure Updates

- Celery workers: 4-8 instances (was 2-4) across 4 queues
- MCP servers: Clarified Phase 2 + Phase 5 deployment
- Sync interval: Clarified 60s fallback + 15min reconciliation

πŸ€– Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 docs/adrs/ADR-004-llm-provider-abstraction.md | 70 +++++++++++--------
 .../ADR-007-agentic-framework-selection.md    |  4 +-
 docs/adrs/ADR-011-issue-synchronization.md    |  2 +-
 docs/adrs/ADR-012-cost-tracking.md            | 20 +++---
 docs/architecture/ARCHITECTURE.md             | 52 ++++++++------
 docs/architecture/IMPLEMENTATION_ROADMAP.md   | 19 +++--
 6 files changed, 98 insertions(+), 69 deletions(-)

diff --git a/docs/adrs/ADR-004-llm-provider-abstraction.md b/docs/adrs/ADR-004-llm-provider-abstraction.md
index 2b5a3b6..6cc6931 100644
--- a/docs/adrs/ADR-004-llm-provider-abstraction.md
+++ b/docs/adrs/ADR-004-llm-provider-abstraction.md
@@ -10,9 +10,11 @@
 ## Context

 Syndarix agents require access to large language models (LLMs) from multiple providers:
-- **Anthropic** (Claude) - Primary provider
-- **OpenAI** (GPT-4) - Fallback provider
-- **Local models** (Ollama/Llama) - Cost optimization, privacy
+- **Anthropic** (Claude Opus 4.5) - Primary provider, highest reasoning capability
+- **Google** (Gemini 3 Pro/Flash) - Strong multimodal, fast inference
+- **OpenAI** (GPT 5.1 Codex max) - Code generation specialist
+- **Alibaba** (Qwen3-235B) - Cost-effective alternative
+- **DeepSeek** (V3.2) - Open-weights, self-hostable option

 We need a unified abstraction layer that provides:
 - Consistent API across providers
@@ -79,25 +81,33 @@ LiteLLM provides the reliability, monitoring, and multi-provider support needed

 ### Model Groups

-| Group Name | Use Case | Primary Model | Fallback |
-|------------|----------|---------------|----------|
-| `high-reasoning` | Complex analysis, architecture | Claude 3.5 Sonnet | GPT-4 Turbo |
-| `fast-response` | Quick tasks, simple queries | Claude 3 Haiku | GPT-4o Mini |
-| `cost-optimized` | High-volume, non-critical | Local Llama 3 | Claude 3 Haiku |
+| Group Name | Use Case | Primary Model | Fallback Chain |
+|------------|----------|---------------|----------------|
+| `high-reasoning` | Complex analysis, architecture | Claude Opus 4.5 | GPT 5.1 Codex max β†’ Gemini 3 Pro |
+| `code-generation` | Code writing, refactoring | GPT 5.1 Codex max | Claude Opus 4.5 β†’ DeepSeek V3.2 |
+| `fast-response` | Quick tasks, simple queries | Gemini 3 Flash | Qwen3-235B β†’ DeepSeek V3.2 |
+| `cost-optimized` | High-volume, non-critical | Qwen3-235B | DeepSeek V3.2 (self-hosted) |
+| `self-hosted` | Privacy-sensitive, air-gapped | DeepSeek V3.2 | Qwen3-235B |

-### Failover Chain
+### Failover Chain (Primary)

 ```
-Claude 3.5 Sonnet (Anthropic)
+Claude Opus 4.5 (Anthropic)
+  β”‚
+  β–Ό (on failure/rate limit)
+  GPT 5.1 Codex max (OpenAI)
+  β”‚
+  β–Ό (on failure/rate limit)
+  Gemini 3 Pro (Google)
+  β”‚
+  β–Ό (on failure/rate limit)
+  Qwen3-235B (Alibaba/Self-hosted)
   β”‚
   β–Ό (on failure)
-  GPT-4 Turbo (OpenAI)
+  DeepSeek V3.2 (Self-hosted)
   β”‚
-  β–Ό (on failure)
-  Llama 3 (Ollama/Local)
-  β”‚
-  β–Ό (on failure)
-  Error with retry
+  β–Ό (all failed)
+  Error with exponential backoff retry
 ```

 ### LLM Gateway Service
@@ -131,24 +141,26 @@ class LLMGateway:

 ### Cost Tracking

-| Model | Input (per 1M tokens) | Output (per 1M tokens) |
-|-------|----------------------|------------------------|
-| Claude 3.5 Sonnet | $3.00 | $15.00 |
-| Claude 3 Haiku | $0.25 | $1.25 |
-| GPT-4 Turbo | $10.00 | $30.00 |
-| GPT-4o Mini | $0.15 | $0.60 |
-| Ollama (local) | $0.00 | $0.00 |
+| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
+|-------|----------------------|------------------------|-------|
+| Claude Opus 4.5 | $15.00 | $75.00 | Highest reasoning capability |
+| GPT 5.1 Codex max | $12.00 | $60.00 | Code generation specialist |
+| Gemini 3 Pro | $3.50 | $10.50 | Strong multimodal |
+| Gemini 3 Flash | $0.35 | $1.05 | Fast inference |
+| Qwen3-235B | $2.00 | $6.00 | Cost-effective (or self-host: $0) |
+| DeepSeek V3.2 | $0.00 | $0.00 | Self-hosted, open weights |

 ### Agent Type Mapping

 | Agent Type | Model Preference | Rationale |
 |------------|------------------|-----------|
-| Product Owner | high-reasoning | Complex requirements analysis |
-| Software Architect | high-reasoning | Architecture decisions |
-| Software Engineer | high-reasoning | Code generation |
-| QA Engineer | fast-response | Test case generation |
-| DevOps Engineer | fast-response | Config generation |
-| Project Manager | fast-response | Status updates |
+| Product Owner | high-reasoning | Complex requirements analysis needs Claude Opus 4.5 |
+| Software Architect | high-reasoning | Architecture decisions need top-tier reasoning |
+| Software Engineer | code-generation | GPT 5.1 Codex max optimized for code |
+| QA Engineer | code-generation | Test code generation |
+| DevOps Engineer | fast-response | Config generation (Gemini 3 Flash) |
+| Project Manager | fast-response | Status updates, quick responses |
+| Business Analyst | high-reasoning | Document analysis needs strong reasoning |

 ### Caching Strategy

diff --git a/docs/adrs/ADR-007-agentic-framework-selection.md b/docs/adrs/ADR-007-agentic-framework-selection.md
index a74fac6..6a32db7 100644
--- a/docs/adrs/ADR-007-agentic-framework-selection.md
+++ b/docs/adrs/ADR-007-agentic-framework-selection.md
@@ -297,9 +297,9 @@ def create_agent_graph() -> StateGraph:
 # 4. LiteLLM handles LLM calls with failover
 async def think_node(state: AgentState) -> AgentState:
     response = await litellm.acompletion(
-        model="claude-3-5-sonnet-latest",  # Claude 3.5 Sonnet (primary)
+        model="claude-opus-4-5",  # Claude Opus 4.5 (primary)
         messages=state["messages"],
-        fallbacks=["gpt-4-turbo", "ollama/llama3"],
+        fallbacks=["gpt-5.1-codex-max", "gemini-3-pro", "qwen3-235b", "deepseek-v3.2"],
         metadata={"agent_id": state["agent_id"]},
     )
     return {"messages": [response.choices[0].message]}
diff --git a/docs/adrs/ADR-011-issue-synchronization.md b/docs/adrs/ADR-011-issue-synchronization.md
index 437a4a2..cce7674 100644
--- a/docs/adrs/ADR-011-issue-synchronization.md
+++ b/docs/adrs/ADR-011-issue-synchronization.md
@@ -61,7 +61,7 @@ External Trackers (Gitea/GitHub/GitLab)
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚
 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
-β”‚ Polling Worker    β”‚ (reconciliation every 15 min)
+β”‚ Polling Worker    β”‚ (fallback: 60s, full reconciliation: 15 min)
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚
 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
diff --git a/docs/adrs/ADR-012-cost-tracking.md b/docs/adrs/ADR-012-cost-tracking.md
index 60ae9e2..6d6f929 100644
--- a/docs/adrs/ADR-012-cost-tracking.md
+++ b/docs/adrs/ADR-012-cost-tracking.md
@@ -154,22 +154,24 @@ GROUP BY project_id, DATE(timestamp);

 ### Cost Model Prices

-| Model | Input ($/1M) | Output ($/1M) |
-|-------|-------------|---------------|
-| Claude 3.5 Sonnet | $3.00 | $15.00 |
-| Claude 3 Haiku | $0.25 | $1.25 |
-| GPT-4 Turbo | $10.00 | $30.00 |
-| GPT-4o Mini | $0.15 | $0.60 |
-| Ollama (local) | $0.00 | $0.00 |
+| Model | Input ($/1M) | Output ($/1M) | Notes |
+|-------|-------------|---------------|-------|
+| Claude Opus 4.5 | $15.00 | $75.00 | Highest reasoning capability |
+| GPT 5.1 Codex max | $12.00 | $60.00 | Code generation specialist |
+| Gemini 3 Pro | $3.50 | $10.50 | Strong multimodal |
+| Gemini 3 Flash | $0.35 | $1.05 | Fast inference |
+| Qwen3-235B | $2.00 | $6.00 | Cost-effective (or $0 self-hosted) |
+| DeepSeek V3.2 | $0.00 | $0.00 | Self-hosted, open weights |

 ### Cost Optimization Strategies

 | Strategy | Savings | Implementation |
 |----------|---------|----------------|
 | Semantic caching | 15-30% | Redis cache for repeated queries |
-| Model cascading | 60-80% | Start with Haiku, escalate to Sonnet |
+| Model cascading | 60-80% | Start with Gemini Flash, escalate to Opus |
 | Prompt compression | 10-20% | Remove redundant context |
-| Local fallback | 100% for some | Ollama for simple tasks |
+| Self-hosted fallback | 100% for some | DeepSeek V3.2/Qwen3 for non-critical tasks |
+| Task-appropriate routing | 40-60% | Route code tasks to GPT 5.1 Codex, simple to Flash |

 ## Consequences

diff --git a/docs/architecture/ARCHITECTURE.md b/docs/architecture/ARCHITECTURE.md
index 84ac920..a2d7b5a 100644
--- a/docs/architecture/ARCHITECTURE.md
+++ b/docs/architecture/ARCHITECTURE.md
@@ -42,10 +42,11 @@ Syndarix is an autonomous AI-powered software consulting platform that orchestra
 β”‚ β”‚ β”‚                                                                     β”‚ β”‚ β”‚
 β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚
 β”‚ β”‚ β”‚ β”‚                      ORCHESTRATION LAYER                        β”‚ β”‚ β”‚ β”‚
-β”‚ β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”               β”‚ β”‚ β”‚ β”‚
-β”‚ β”‚ β”‚ β”‚ β”‚ Agent       β”‚ β”‚ Workflow    β”‚ β”‚ Approval    β”‚               β”‚ β”‚ β”‚ β”‚
-β”‚ β”‚ β”‚ β”‚ β”‚ Orchestratorβ”‚ β”‚ Engine      β”‚ β”‚ Service     β”‚               β”‚ β”‚ β”‚ β”‚
-β”‚ β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜               β”‚ β”‚ β”‚ β”‚
+β”‚ β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ β”‚
+β”‚ β”‚ β”‚ β”‚ β”‚ Agent       β”‚ β”‚ Workflow    β”‚ β”‚ Approval    β”‚ β”‚ LangGraph β”‚ β”‚ β”‚ β”‚ β”‚
+β”‚ β”‚ β”‚ β”‚ β”‚ Orchestratorβ”‚ β”‚ Engine      β”‚ β”‚ Service     β”‚ β”‚ Runtime   β”‚ β”‚ β”‚ β”‚ β”‚
+β”‚ β”‚ β”‚ β”‚ β”‚(Type-Inst.) β”‚ β”‚(transitions)β”‚ β”‚             β”‚ β”‚           β”‚ β”‚ β”‚ β”‚ β”‚
+β”‚ β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β”‚
 β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚
 β”‚ β”‚                                                                         β”‚ β”‚
 β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
@@ -126,21 +127,26 @@ Agent Type (Template)          Agent Instance (Runtime)
 **Failover Chain:**

 ```
-Claude 3.5 Sonnet (Primary)
+Claude Opus 4.5 (Primary)
+  β”‚
+  β–Ό (on failure/rate limit)
+  GPT 5.1 Codex max (Code specialist)
+  β”‚
+  β–Ό (on failure/rate limit)
+  Gemini 3 Pro (Multimodal)
   β”‚
   β–Ό (on failure)
-  GPT-4 Turbo (Fallback)
-  β”‚
-  β–Ό (on failure)
-  Ollama/Llama 3 (Local)
+  Qwen3-235B / DeepSeek V3.2 (Self-hosted)
 ```

 **Model Groups:**

-| Group | Use Case | Primary Model |
-|-------|----------|---------------|
-| high-reasoning | Architecture, complex analysis | Claude 3.5 Sonnet |
-| fast-response | Quick tasks, status updates | Claude 3 Haiku |
-| cost-optimized | High-volume, non-critical | Local Llama 3 |
+| Group | Use Case | Primary Model | Fallback |
+|-------|----------|---------------|----------|
+| high-reasoning | Architecture, complex analysis | Claude Opus 4.5 | GPT 5.1 Codex max |
+| code-generation | Code writing, refactoring | GPT 5.1 Codex max | Claude Opus 4.5 |
+| fast-response | Quick tasks, status updates | Gemini 3 Flash | Qwen3-235B |
+| cost-optimized | High-volume, non-critical | Qwen3-235B | DeepSeek V3.2 |
+| self-hosted | Privacy-sensitive, air-gapped | DeepSeek V3.2 | Qwen3-235B |

 ### 3. Knowledge Base (RAG)
@@ -245,13 +251,17 @@ LLM Request β†’ LiteLLM Callback β†’ Redis INCR β†’ Budget Check

 **All components are fully self-hostable with no mandatory subscriptions:**

-| Component | Self-Hosted | Managed Alternative (Optional) |
-|-----------|-------------|--------------------------------|
-| PostgreSQL | Yes | RDS, Neon, Supabase |
-| Redis | Yes | Redis Cloud |
-| LiteLLM | Yes | LiteLLM Enterprise |
-| Celery | Yes | - |
-| FastMCP | Yes | - |
+| Component | License | Self-Hosted | Managed Alternative (Optional) |
+|-----------|---------|-------------|--------------------------------|
+| PostgreSQL | PostgreSQL | Yes | RDS, Neon, Supabase |
+| Redis | BSD-3 | Yes | Redis Cloud |
+| LiteLLM | MIT | Yes | LiteLLM Enterprise |
+| Celery | BSD-3 | Yes | - |
+| FastMCP | MIT | Yes | - |
+| LangGraph | MIT | Yes | LangSmith (observability only) |
+| transitions | MIT | Yes | - |
+| DeepSeek V3.2 | MIT | Yes | API available |
+| Qwen3-235B | Apache 2.0 | Yes | Alibaba Cloud |

 ---

diff --git a/docs/architecture/IMPLEMENTATION_ROADMAP.md b/docs/architecture/IMPLEMENTATION_ROADMAP.md
index c7f8556..d6dc3c6 100644
--- a/docs/architecture/IMPLEMENTATION_ROADMAP.md
+++ b/docs/architecture/IMPLEMENTATION_ROADMAP.md
@@ -295,10 +295,13 @@ This roadmap outlines the phased implementation approach for Syndarix, prioritiz

 | Metric | Target | Measurement |
 |--------|--------|-------------|
 | Agent task success rate | >90% | Completed tasks / total tasks |
-| Response time (P95) | <2s | API latency |
-| Cost per project | <$50/sprint | LLM + compute costs |
+| API response time (P95) | <200ms | Pure API latency (per NFR-101) |
+| Agent response time | <10s simple, <60s code | End-to-end including LLM (per NFR-103) |
+| Cost per project | <$100/sprint | LLM + compute costs (with Opus 4.5 pricing) |
 | Time to first commit | <1 hour | From requirements to PR |
 | Client satisfaction | >4/5 | Post-sprint survey |
+| Concurrent projects | 10+ | Active projects in parallel |
+| Concurrent agents | 50+ | Agent instances running |

 ---
@@ -328,15 +331,17 @@ Foundation Core Platform MCP Integration Agent Orch Workflows Advan
 ### Infrastructure
 - PostgreSQL (managed or self-hosted)
 - Redis (managed or self-hosted)
-- Celery workers (2-4 instances)
-- MCP servers (7 containers)
+- Celery workers (4-8 instances across 4 queues: agent, git, sync, cicd)
+- MCP servers (7 containers, deployed in Phase 2 + Phase 5)
 - API server (2+ instances)
 - Frontend (static hosting or SSR)

 ### External Services
-- Anthropic API (primary LLM)
-- OpenAI API (fallback)
-- Ollama (local models, optional)
+- Anthropic API (Claude Opus 4.5 - primary reasoning)
+- OpenAI API (GPT 5.1 Codex max - code generation)
+- Google API (Gemini 3 Pro/Flash - multimodal, fast)
+- Alibaba API (Qwen3-235B - cost-effective, or self-host)
+- DeepSeek V3.2 (self-hosted, open weights)
 - Gitea/GitHub/GitLab (issue tracking)

 ---