forked from cardosofelipe/fast-next-template
feat: Update to production model stack and fix remaining inconsistencies
## Model Stack Updates (User's Actual Models) Updated all documentation to reflect production models: - Claude Opus 4.5 (primary reasoning) - GPT 5.1 Codex max (code generation specialist) - Gemini 3 Pro/Flash (multimodal, fast inference) - Qwen3-235B (cost-effective, self-hostable) - DeepSeek V3.2 (self-hosted, open weights) ### Files Updated: - ADR-004: Full model groups, failover chains, cost tables - ADR-007: Code example with correct model identifiers - ADR-012: Cost tracking with new model prices - ARCHITECTURE.md: Model groups, failover diagram - IMPLEMENTATION_ROADMAP.md: External services list ## Architecture Diagram Updates - Added LangGraph Runtime to orchestration layer - Added technology labels (Type-Instance, transitions) ## Self-Hostability Table Expanded Added entries for: - LangGraph (MIT) - transitions (MIT) - DeepSeek V3.2 (MIT) - Qwen3-235B (Apache 2.0) ## Metric Alignments - Response time: Split into API (<200ms) and Agent (<10s/<60s) - Cost per project: Adjusted to $100/sprint for Opus 4.5 pricing - Added concurrent projects (10+) and agents (50+) metrics ## Infrastructure Updates - Celery workers: 4-8 instances (was 2-4) across 4 queues - MCP servers: Clarified Phase 2 + Phase 5 deployment - Sync interval: Clarified 60s fallback + 15min reconciliation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -154,22 +154,24 @@ GROUP BY project_id, DATE(timestamp);
|
||||
|
||||
### Cost Model Prices
|
||||
|
||||
| Model | Input ($/1M) | Output ($/1M) |
|
||||
|-------|-------------|---------------|
|
||||
| Claude 3.5 Sonnet | $3.00 | $15.00 |
|
||||
| Claude 3 Haiku | $0.25 | $1.25 |
|
||||
| GPT-4 Turbo | $10.00 | $30.00 |
|
||||
| GPT-4o Mini | $0.15 | $0.60 |
|
||||
| Ollama (local) | $0.00 | $0.00 |
|
||||
| Model | Input ($/1M) | Output ($/1M) | Notes |
|
||||
|-------|-------------|---------------|-------|
|
||||
| Claude Opus 4.5 | $15.00 | $75.00 | Highest reasoning capability |
|
||||
| GPT 5.1 Codex max | $12.00 | $60.00 | Code generation specialist |
|
||||
| Gemini 3 Pro | $3.50 | $10.50 | Strong multimodal |
|
||||
| Gemini 3 Flash | $0.35 | $1.05 | Fast inference |
|
||||
| Qwen3-235B | $2.00 | $6.00 | Cost-effective (or $0 self-hosted) |
|
||||
| DeepSeek V3.2 | $0.00 | $0.00 | Self-hosted, open weights |
|
||||
|
||||
### Cost Optimization Strategies
|
||||
|
||||
| Strategy | Savings | Implementation |
|
||||
|----------|---------|----------------|
|
||||
| Semantic caching | 15-30% | Redis cache for repeated queries |
|
||||
| Model cascading | 60-80% | Start with Haiku, escalate to Sonnet |
|
||||
| Model cascading | 60-80% | Start with Gemini Flash, escalate to Opus |
|
||||
| Prompt compression | 10-20% | Remove redundant context |
|
||||
| Local fallback | 100% for some | Ollama for simple tasks |
|
||||
| Self-hosted fallback | 100% for some | DeepSeek V3.2/Qwen3 for non-critical tasks |
|
||||
| Task-appropriate routing | 40-60% | Route code tasks to GPT 5.1 Codex, simple to Flash |
|
||||
|
||||
## Consequences
|
||||
|
||||
|
||||
Reference in New Issue
Block a user