feat: Update to production model stack and fix remaining inconsistencies

## Model Stack Updates (User's Actual Models)

Updated all documentation to reflect production models:
- Claude Opus 4.5 (primary reasoning)
- GPT 5.1 Codex max (code generation specialist)
- Gemini 3 Pro/Flash (multimodal, fast inference)
- Qwen3-235B (cost-effective, self-hostable)
- DeepSeek V3.2 (self-hosted, open weights)

### Files Updated:
- ADR-004: Full model groups, failover chains, cost tables
- ADR-007: Code example with correct model identifiers
- ADR-012: Cost tracking with new model prices
- ARCHITECTURE.md: Model groups, failover diagram
- IMPLEMENTATION_ROADMAP.md: External services list

## Architecture Diagram Updates
- Added LangGraph Runtime to orchestration layer
- Added technology labels (Type-Instance, transitions)

## Self-Hostability Table Expanded

Added entries for:
- LangGraph (MIT)
- transitions (MIT)
- DeepSeek V3.2 (MIT)
- Qwen3-235B (Apache 2.0)

## Metric Alignments
- Response time: Split into API (<200ms) and Agent (<10s/<60s)
- Cost per project: Adjusted to $100/sprint for Opus 4.5 pricing
- Added concurrent projects (10+) and agents (50+) metrics

## Infrastructure Updates
- Celery workers: 4-8 instances (was 2-4) across 4 queues
- MCP servers: Clarified Phase 2 + Phase 5 deployment
- Sync interval: Clarified 60s fallback + 15min reconciliation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -295,10 +295,13 @@ This roadmap outlines the phased implementation approach for Syndarix, prioritiz
 | Metric | Target | Measurement |
 |--------|--------|-------------|
 | Agent task success rate | >90% | Completed tasks / total tasks |
-| Response time (P95) | <2s | API latency |
-| Cost per project | <$50/sprint | LLM + compute costs |
+| API response time (P95) | <200ms | Pure API latency (per NFR-101) |
+| Agent response time | <10s simple, <60s code | End-to-end including LLM (per NFR-103) |
+| Cost per project | <$100/sprint | LLM + compute costs (with Opus 4.5 pricing) |
 | Time to first commit | <1 hour | From requirements to PR |
 | Client satisfaction | >4/5 | Post-sprint survey |
+| Concurrent projects | 10+ | Active projects in parallel |
+| Concurrent agents | 50+ | Agent instances running |
 
 ---
 
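Review note: the split response-time and cost targets in this hunk are easy to check mechanically. The following is a minimal sketch, assuming the thresholds are strict upper bounds as the table's `<` suggests. The dictionary keys, the `p95` helper, and the sample numbers are all hypothetical and not part of the roadmap.

```python
# Targets copied from the updated metrics table.
# The key names are hypothetical; only the numbers come from the diff.
TARGETS = {
    "api_p95_ms": 200.0,           # API response time (P95) < 200ms
    "agent_simple_s": 10.0,        # Agent response time, simple tasks < 10s
    "agent_code_s": 60.0,          # Agent response time, code tasks < 60s
    "cost_per_sprint_usd": 100.0,  # Cost per project < $100/sprint
}


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check(measured):
    """Return the sorted names of measured metrics that miss their target."""
    return sorted(
        name
        for name, limit in TARGETS.items()
        if name in measured and not measured[name] < limit
    )


if __name__ == "__main__":
    # Illustrative sample data only, not real measurements.
    latencies_ms = [120, 95, 180, 150, 110, 90, 130, 170, 160, 210]
    measured = {"api_p95_ms": p95(latencies_ms), "cost_per_sprint_usd": 84.5}
    print(check(measured))
```

Metrics that are absent from `measured` are skipped rather than treated as failures, so a partial report does not raise false alarms.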
@@ -328,15 +331,17 @@ Foundation Core Platform MCP Integration Agent Orch Workflows Advan
 ### Infrastructure
 - PostgreSQL (managed or self-hosted)
 - Redis (managed or self-hosted)
-- Celery workers (2-4 instances)
-- MCP servers (7 containers)
+- Celery workers (4-8 instances across 4 queues: agent, git, sync, cicd)
+- MCP servers (7 containers, deployed in Phase 2 + Phase 5)
 - API server (2+ instances)
 - Frontend (static hosting or SSR)
 
 ### External Services
-- Anthropic API (primary LLM)
-- OpenAI API (fallback)
-- Ollama (local models, optional)
+- Anthropic API (Claude Opus 4.5 - primary reasoning)
+- OpenAI API (GPT 5.1 Codex max - code generation)
+- Google API (Gemini 3 Pro/Flash - multimodal, fast)
+- Alibaba API (Qwen3-235B - cost-effective, or self-host)
+- DeepSeek V3.2 (self-hosted, open weights)
 - Gitea/GitHub/GitLab (issue tracking)
 
 ---
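Review note: the four Celery queues named in this hunk (agent, git, sync, cicd) imply some task-to-queue routing. The following is a minimal sketch of how that mapping could look, written with the standard library only so it runs without Celery installed. The dictionary has the shape of Celery's `task_routes` setting (glob pattern mapped to a queue), but the task module names and the default queue are assumptions, not taken from the repository.

```python
from fnmatch import fnmatchcase

# Shaped like Celery's `task_routes` setting: glob pattern -> {"queue": name}.
# Queue names come from the diff; the "app.tasks.*" module names are hypothetical.
TASK_ROUTES = {
    "app.tasks.agent.*": {"queue": "agent"},
    "app.tasks.git.*": {"queue": "git"},
    "app.tasks.sync.*": {"queue": "sync"},
    "app.tasks.cicd.*": {"queue": "cicd"},
}

# Assumption: unmatched tasks fall back to the agent queue.
DEFAULT_QUEUE = "agent"


def queue_for(task_name):
    """Return the queue for a task name, using the first matching glob pattern."""
    for pattern, route in TASK_ROUTES.items():
        if fnmatchcase(task_name, pattern):
            return route["queue"]
    return DEFAULT_QUEUE


if __name__ == "__main__":
    for name in ("app.tasks.git.push_branch", "app.tasks.sync.reconcile", "app.misc.ping"):
        print(name, "->", queue_for(name))
```

With a mapping like this, each queue can be served by its own worker pool through the worker's `-Q` option, which is one way to spread the 4-8 instances across the four queues.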