feat: Update to production model stack and fix remaining inconsistencies
## Model Stack Updates (User's Actual Models)

Updated all documentation to reflect production models:

- Claude Opus 4.5 (primary reasoning)
- GPT 5.1 Codex max (code generation specialist)
- Gemini 3 Pro/Flash (multimodal, fast inference)
- Qwen3-235B (cost-effective, self-hostable)
- DeepSeek V3.2 (self-hosted, open weights)

### Files Updated:
- ADR-004: Full model groups, failover chains, cost tables
- ADR-007: Code example with correct model identifiers
- ADR-012: Cost tracking with new model prices
- ARCHITECTURE.md: Model groups, failover diagram
- IMPLEMENTATION_ROADMAP.md: External services list

## Architecture Diagram Updates
- Added LangGraph Runtime to orchestration layer
- Added technology labels (Type-Instance, transitions)

## Self-Hostability Table Expanded
Added entries for:
- LangGraph (MIT)
- transitions (MIT)
- DeepSeek V3.2 (MIT)
- Qwen3-235B (Apache 2.0)

## Metric Alignments
- Response time: Split into API (<200ms) and Agent (<10s/<60s)
- Cost per project: Adjusted to $100/sprint for Opus 4.5 pricing
- Added concurrent projects (10+) and agents (50+) metrics

## Infrastructure Updates
- Celery workers: 4-8 instances (was 2-4) across 4 queues
- MCP servers: Clarified Phase 2 + Phase 5 deployment
- Sync interval: Clarified 60s fallback + 15min reconciliation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -10,9 +10,11 @@
 ## Context

 Syndarix agents require access to large language models (LLMs) from multiple providers:

-- **Anthropic** (Claude) - Primary provider
-- **OpenAI** (GPT-4) - Fallback provider
-- **Local models** (Ollama/Llama) - Cost optimization, privacy
+- **Anthropic** (Claude Opus 4.5) - Primary provider, highest reasoning capability
+- **Google** (Gemini 3 Pro/Flash) - Strong multimodal, fast inference
+- **OpenAI** (GPT 5.1 Codex max) - Code generation specialist
+- **Alibaba** (Qwen3-235B) - Cost-effective alternative
+- **DeepSeek** (V3.2) - Open-weights, self-hostable option

 We need a unified abstraction layer that provides:
 - Consistent API across providers
@@ -79,25 +81,33 @@ LiteLLM provides the reliability, monitoring, and multi-provider support needed

 ### Model Groups

-| Group Name | Use Case | Primary Model | Fallback |
-|------------|----------|---------------|----------|
-| `high-reasoning` | Complex analysis, architecture | Claude 3.5 Sonnet | GPT-4 Turbo |
-| `fast-response` | Quick tasks, simple queries | Claude 3 Haiku | GPT-4o Mini |
-| `cost-optimized` | High-volume, non-critical | Local Llama 3 | Claude 3 Haiku |
+| Group Name | Use Case | Primary Model | Fallback Chain |
+|------------|----------|---------------|----------------|
+| `high-reasoning` | Complex analysis, architecture | Claude Opus 4.5 | GPT 5.1 Codex max → Gemini 3 Pro |
+| `code-generation` | Code writing, refactoring | GPT 5.1 Codex max | Claude Opus 4.5 → DeepSeek V3.2 |
+| `fast-response` | Quick tasks, simple queries | Gemini 3 Flash | Qwen3-235B → DeepSeek V3.2 |
+| `cost-optimized` | High-volume, non-critical | Qwen3-235B | DeepSeek V3.2 (self-hosted) |
+| `self-hosted` | Privacy-sensitive, air-gapped | DeepSeek V3.2 | Qwen3-235B |

-### Failover Chain
+### Failover Chain (Primary)

 ```
-Claude 3.5 Sonnet (Anthropic)
+Claude Opus 4.5 (Anthropic)
+│
+▼ (on failure/rate limit)
+GPT 5.1 Codex max (OpenAI)
+│
+▼ (on failure/rate limit)
+Gemini 3 Pro (Google)
+│
+▼ (on failure/rate limit)
+Qwen3-235B (Alibaba/Self-hosted)
 │
 ▼ (on failure)
-GPT-4 Turbo (OpenAI)
+DeepSeek V3.2 (Self-hosted)
 │
-▼ (on failure)
-Llama 3 (Ollama/Local)
-│
-▼ (on failure)
-Error with retry
+▼ (all failed)
+Error with exponential backoff retry
 ```

 ### LLM Gateway Service
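For orientation, the model groups and fallback chains above map onto LiteLLM's `Router` roughly as in the sketch below. This is a minimal illustration, not the project's actual configuration: the model identifier strings and environment-variable names are assumptions.

```python
# Minimal sketch of the ADR-004 model groups as a LiteLLM Router.
# Model ID strings and env var names are assumptions, not confirmed IDs.
import os

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "high-reasoning",  # group name callers use
            "litellm_params": {
                "model": "anthropic/claude-opus-4-5",  # assumed identifier
                "api_key": os.environ["ANTHROPIC_API_KEY"],
            },
        },
        {
            "model_name": "code-generation",
            "litellm_params": {
                "model": "openai/gpt-5.1-codex-max",  # assumed identifier
                "api_key": os.environ["OPENAI_API_KEY"],
            },
        },
        {
            "model_name": "fast-response",
            "litellm_params": {
                "model": "gemini/gemini-3-flash",  # assumed identifier
                "api_key": os.environ["GEMINI_API_KEY"],
            },
        },
    ],
    # Group-level failover: when a group's deployment fails or hits a rate
    # limit, the request is retried against the listed fallback groups.
    fallbacks=[
        {"high-reasoning": ["code-generation", "fast-response"]},
        {"code-generation": ["high-reasoning", "fast-response"]},
    ],
)

# Callers address a group, never a concrete model:
#   await router.acompletion(model="high-reasoning", messages=[...])
```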
@@ -131,24 +141,26 @@ class LLMGateway:

 ### Cost Tracking

-| Model | Input (per 1M tokens) | Output (per 1M tokens) |
-|-------|----------------------|------------------------|
-| Claude 3.5 Sonnet | $3.00 | $15.00 |
-| Claude 3 Haiku | $0.25 | $1.25 |
-| GPT-4 Turbo | $10.00 | $30.00 |
-| GPT-4o Mini | $0.15 | $0.60 |
-| Ollama (local) | $0.00 | $0.00 |
+| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
+|-------|----------------------|------------------------|-------|
+| Claude Opus 4.5 | $15.00 | $75.00 | Highest reasoning capability |
+| GPT 5.1 Codex max | $12.00 | $60.00 | Code generation specialist |
+| Gemini 3 Pro | $3.50 | $10.50 | Strong multimodal |
+| Gemini 3 Flash | $0.35 | $1.05 | Fast inference |
+| Qwen3-235B | $2.00 | $6.00 | Cost-effective (or self-host: $0) |
+| DeepSeek V3.2 | $0.00 | $0.00 | Self-hosted, open weights |

 ### Agent Type Mapping

 | Agent Type | Model Preference | Rationale |
 |------------|------------------|-----------|
-| Product Owner | high-reasoning | Complex requirements analysis |
-| Software Architect | high-reasoning | Architecture decisions |
-| Software Engineer | high-reasoning | Code generation |
-| QA Engineer | fast-response | Test case generation |
-| DevOps Engineer | fast-response | Config generation |
-| Project Manager | fast-response | Status updates |
+| Product Owner | high-reasoning | Complex requirements analysis needs Claude Opus 4.5 |
+| Software Architect | high-reasoning | Architecture decisions need top-tier reasoning |
+| Software Engineer | code-generation | GPT 5.1 Codex max optimized for code |
+| QA Engineer | code-generation | Test code generation |
+| DevOps Engineer | fast-response | Config generation (Gemini 3 Flash) |
+| Project Manager | fast-response | Status updates, quick responses |
+| Business Analyst | high-reasoning | Document analysis needs strong reasoning |

 ### Caching Strategy

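The cost table above translates directly into a per-request dollar figure. A small illustrative helper (prices copied from the table; token counts are what a completion response reports in its usage block):

```python
# Per-request cost at the ADR prices above (illustrative helper).
PRICES_PER_1M = {  # model -> (input $/1M tokens, output $/1M tokens)
    "claude-opus-4-5": (15.00, 75.00),
    "gpt-5.1-codex-max": (12.00, 60.00),
    "gemini-3-pro": (3.50, 10.50),
    "gemini-3-flash": (0.35, 1.05),
    "qwen3-235b": (2.00, 6.00),
    "deepseek-v3.2": (0.00, 0.00),  # self-hosted: no per-token charge
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single completion at the table prices."""
    in_price, out_price = PRICES_PER_1M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# A 10k-token-in / 2k-token-out call on Opus 4.5:
# (10_000 * 15.00 + 2_000 * 75.00) / 1e6 = $0.30
print(request_cost("claude-opus-4-5", 10_000, 2_000))  # 0.3
```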
@@ -297,9 +297,9 @@ def create_agent_graph() -> StateGraph:
     # 4. LiteLLM handles LLM calls with failover
     async def think_node(state: AgentState) -> AgentState:
         response = await litellm.acompletion(
-            model="claude-3-5-sonnet-latest",  # Claude 3.5 Sonnet (primary)
+            model="claude-opus-4-5",  # Claude Opus 4.5 (primary)
             messages=state["messages"],
-            fallbacks=["gpt-4-turbo", "ollama/llama3"],
+            fallbacks=["gpt-5.1-codex-max", "gemini-3-pro", "qwen3-235b", "deepseek-v3.2"],
             metadata={"agent_id": state["agent_id"]},
         )
         return {"messages": [response.choices[0].message]}
@@ -61,7 +61,7 @@ External Trackers (Gitea/GitHub/GitLab)
 └───────────────────┘
           │
 ┌─────────┴─────────┐
-│  Polling Worker   │ (reconciliation every 15 min)
+│  Polling Worker   │ (fallback: 60s, full reconciliation: 15 min)
 └───────────────────┘
           │
 ┌─────────┴─────────┐
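The two cadences in the updated Polling Worker label (60 s fallback poll, 15 min full reconciliation) would look roughly like this as a Celery beat schedule. A minimal sketch; the task paths and broker URL are hypothetical:

```python
# Hypothetical Celery beat schedule for the two sync cadences above.
from celery import Celery
from celery.schedules import crontab

app = Celery("syndarix", broker="redis://localhost:6379/0")  # assumed broker

app.conf.beat_schedule = {
    # Fallback poll: catches tracker events whenever webhooks are missed.
    "tracker-fallback-poll": {
        "task": "sync.tasks.poll_trackers",  # hypothetical task path
        "schedule": 60.0,  # every 60 seconds
    },
    # Full reconciliation: re-syncs complete issue state from the tracker.
    "tracker-full-reconciliation": {
        "task": "sync.tasks.reconcile_trackers",  # hypothetical task path
        "schedule": crontab(minute="*/15"),  # every 15 minutes
    },
}
```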
@@ -154,22 +154,24 @@ GROUP BY project_id, DATE(timestamp);

 ### Cost Model Prices

-| Model | Input ($/1M) | Output ($/1M) |
-|-------|-------------|---------------|
-| Claude 3.5 Sonnet | $3.00 | $15.00 |
-| Claude 3 Haiku | $0.25 | $1.25 |
-| GPT-4 Turbo | $10.00 | $30.00 |
-| GPT-4o Mini | $0.15 | $0.60 |
-| Ollama (local) | $0.00 | $0.00 |
+| Model | Input ($/1M) | Output ($/1M) | Notes |
+|-------|-------------|---------------|-------|
+| Claude Opus 4.5 | $15.00 | $75.00 | Highest reasoning capability |
+| GPT 5.1 Codex max | $12.00 | $60.00 | Code generation specialist |
+| Gemini 3 Pro | $3.50 | $10.50 | Strong multimodal |
+| Gemini 3 Flash | $0.35 | $1.05 | Fast inference |
+| Qwen3-235B | $2.00 | $6.00 | Cost-effective (or $0 self-hosted) |
+| DeepSeek V3.2 | $0.00 | $0.00 | Self-hosted, open weights |

 ### Cost Optimization Strategies

 | Strategy | Savings | Implementation |
 |----------|---------|----------------|
 | Semantic caching | 15-30% | Redis cache for repeated queries |
-| Model cascading | 60-80% | Start with Haiku, escalate to Sonnet |
+| Model cascading | 60-80% | Start with Gemini Flash, escalate to Opus |
 | Prompt compression | 10-20% | Remove redundant context |
-| Local fallback | 100% for some | Ollama for simple tasks |
+| Self-hosted fallback | 100% for some | DeepSeek V3.2/Qwen3 for non-critical tasks |
+| Task-appropriate routing | 40-60% | Route code tasks to GPT 5.1 Codex, simple to Flash |

 ## Consequences

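The "model cascading" row above carries the largest single saving. A minimal sketch of the pattern, assuming a system prompt that instructs the cheap model to reply `ESCALATE` when it is not confident; that trigger is illustrative only, and a production check might validate output quality instead:

```python
# Minimal model-cascading sketch: try Gemini 3 Flash, escalate to Opus 4.5.
# Model IDs are assumptions; the ESCALATE convention is illustrative only.
import litellm

CHEAP_MODEL = "gemini-3-flash"
STRONG_MODEL = "claude-opus-4-5"


async def cascaded_completion(messages: list[dict]) -> str:
    # First pass on the cheap model (~40x cheaper than Opus on input
    # at the table prices above).
    cheap = await litellm.acompletion(model=CHEAP_MODEL, messages=messages)
    answer = cheap.choices[0].message.content
    if "ESCALATE" not in answer:
        return answer
    # Cheap model declined: replay the original request on the strong model.
    strong = await litellm.acompletion(model=STRONG_MODEL, messages=messages)
    return strong.choices[0].message.content
```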
@@ -42,10 +42,11 @@ Syndarix is an autonomous AI-powered software consulting platform that orchestra
 │ │                                                                         │ │
 │ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │
 │ │ │                         ORCHESTRATION LAYER                         │ │ │
-│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐                     │ │ │
-│ │ │ │    Agent    │ │  Workflow   │ │  Approval   │                     │ │ │
-│ │ │ │ Orchestrator│ │   Engine    │ │   Service   │                     │ │ │
-│ │ │ └─────────────┘ └─────────────┘ └─────────────┘                     │ │ │
+│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐       │ │ │
+│ │ │ │    Agent    │ │  Workflow   │ │  Approval   │ │ LangGraph │       │ │ │
+│ │ │ │ Orchestrator│ │   Engine    │ │   Service   │ │  Runtime  │       │ │ │
+│ │ │ │(Type-Inst.) │ │(transitions)│ │             │ │           │       │ │ │
+│ │ │ └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘       │ │ │
 │ │ └─────────────────────────────────────────────────────────────────────┘ │ │
 │ │                                                                         │ │
 │ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │
@@ -126,21 +127,26 @@ Agent Type (Template) Agent Instance (Runtime)

 **Failover Chain:**
 ```
-Claude 3.5 Sonnet (Primary)
+Claude Opus 4.5 (Primary)
+│
+▼ (on failure/rate limit)
+GPT 5.1 Codex max (Code specialist)
+│
+▼ (on failure/rate limit)
+Gemini 3 Pro (Multimodal)
 │
 ▼ (on failure)
-GPT-4 Turbo (Fallback)
-│
-▼ (on failure)
-Ollama/Llama 3 (Local)
+Qwen3-235B / DeepSeek V3.2 (Self-hosted)
 ```

 **Model Groups:**
-| Group | Use Case | Primary Model |
-|-------|----------|---------------|
-| high-reasoning | Architecture, complex analysis | Claude 3.5 Sonnet |
-| fast-response | Quick tasks, status updates | Claude 3 Haiku |
-| cost-optimized | High-volume, non-critical | Local Llama 3 |
+| Group | Use Case | Primary Model | Fallback |
+|-------|----------|---------------|----------|
+| high-reasoning | Architecture, complex analysis | Claude Opus 4.5 | GPT 5.1 Codex max |
+| code-generation | Code writing, refactoring | GPT 5.1 Codex max | Claude Opus 4.5 |
+| fast-response | Quick tasks, status updates | Gemini 3 Flash | Qwen3-235B |
+| cost-optimized | High-volume, non-critical | Qwen3-235B | DeepSeek V3.2 |
+| self-hosted | Privacy-sensitive, air-gapped | DeepSeek V3.2 | Qwen3-235B |

 ### 3. Knowledge Base (RAG)

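The `self-hosted` group above implies an OpenAI-compatible endpoint for the open-weight models. A minimal sketch of routing one request there with LiteLLM, assuming a vLLM (or similar) server at a hypothetical internal URL:

```python
# Sketch: calling a self-hosted, OpenAI-compatible endpoint via LiteLLM.
# The base URL and served model name are hypothetical deployment details.
import litellm

response = litellm.completion(
    model="openai/deepseek-v3.2",             # OpenAI-compatible passthrough
    api_base="http://vllm.internal:8000/v1",  # hypothetical vLLM server
    api_key="unused",                         # many local servers ignore this
    messages=[{"role": "user", "content": "Summarize this issue."}],
)
print(response.choices[0].message.content)
```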
@@ -245,13 +251,17 @@ LLM Request → LiteLLM Callback → Redis INCR → Budget Check

 **All components are fully self-hostable with no mandatory subscriptions:**

-| Component | Self-Hosted | Managed Alternative (Optional) |
-|-----------|-------------|--------------------------------|
-| PostgreSQL | Yes | RDS, Neon, Supabase |
-| Redis | Yes | Redis Cloud |
-| LiteLLM | Yes | LiteLLM Enterprise |
-| Celery | Yes | - |
-| FastMCP | Yes | - |
+| Component | License | Self-Hosted | Managed Alternative (Optional) |
+|-----------|---------|-------------|--------------------------------|
+| PostgreSQL | PostgreSQL | Yes | RDS, Neon, Supabase |
+| Redis | BSD-3 | Yes | Redis Cloud |
+| LiteLLM | MIT | Yes | LiteLLM Enterprise |
+| Celery | BSD-3 | Yes | - |
+| FastMCP | MIT | Yes | - |
+| LangGraph | MIT | Yes | LangSmith (observability only) |
+| transitions | MIT | Yes | - |
+| DeepSeek V3.2 | MIT | Yes | API available |
+| Qwen3-235B | Apache 2.0 | Yes | Alibaba Cloud |

 ---

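This hunk's context line names the cost pipeline (LLM Request → LiteLLM Callback → Redis INCR → Budget Check). A minimal sketch of that callback; the Redis key scheme, the budget figure, and the overrun handling are hypothetical:

```python
# Sketch of the "LiteLLM Callback → Redis INCR → Budget Check" step.
# Key scheme, budget figure, and overrun handling are hypothetical.
import litellm
import redis

r = redis.Redis()


def track_spend(kwargs, completion_response, start_time, end_time):
    """LiteLLM success callback: accumulate per-project spend in Redis."""
    meta = kwargs.get("litellm_params", {}).get("metadata") or {}
    project_id = meta.get("project_id", "unknown")
    cost = kwargs.get("response_cost") or 0.0  # cost computed by LiteLLM
    spent = r.incrbyfloat(f"spend:{project_id}", cost)
    if spent > 100.0:  # $100/sprint target from the roadmap metrics
        # Real handling would pause the project's agents, not just log.
        print(f"Budget exceeded for {project_id}: ${spent:.2f}")


litellm.success_callback = [track_spend]
```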
@@ -295,10 +295,13 @@ This roadmap outlines the phased implementation approach for Syndarix, prioritiz
 | Metric | Target | Measurement |
 |--------|--------|-------------|
 | Agent task success rate | >90% | Completed tasks / total tasks |
-| Response time (P95) | <2s | API latency |
-| Cost per project | <$50/sprint | LLM + compute costs |
+| API response time (P95) | <200ms | Pure API latency (per NFR-101) |
+| Agent response time | <10s simple, <60s code | End-to-end including LLM (per NFR-103) |
+| Cost per project | <$100/sprint | LLM + compute costs (with Opus 4.5 pricing) |
 | Time to first commit | <1 hour | From requirements to PR |
 | Client satisfaction | >4/5 | Post-sprint survey |
+| Concurrent projects | 10+ | Active projects in parallel |
+| Concurrent agents | 50+ | Agent instances running |

 ---

@@ -328,15 +331,17 @@ Foundation Core Platform MCP Integration Agent Orch Workflows Advan
 ### Infrastructure
 - PostgreSQL (managed or self-hosted)
 - Redis (managed or self-hosted)
-- Celery workers (2-4 instances)
-- MCP servers (7 containers)
+- Celery workers (4-8 instances across 4 queues: agent, git, sync, cicd)
+- MCP servers (7 containers, deployed in Phase 2 + Phase 5)
 - API server (2+ instances)
 - Frontend (static hosting or SSR)

 ### External Services
-- Anthropic API (primary LLM)
-- OpenAI API (fallback)
-- Ollama (local models, optional)
+- Anthropic API (Claude Opus 4.5 - primary reasoning)
+- OpenAI API (GPT 5.1 Codex max - code generation)
+- Google API (Gemini 3 Pro/Flash - multimodal, fast)
+- Alibaba API (Qwen3-235B - cost-effective, or self-host)
+- DeepSeek V3.2 (self-hosted, open weights)
 - Gitea/GitHub/GitLab (issue tracking)

 ---