[SPIKE-010] Cost Tracking & Budget Management #10

Closed
opened 2025-12-29 03:51:02 +00:00 by cardosofelipe · 1 comment

Objective

Design a system for tracking LLM API costs and enforcing budget limits.

Key Questions

  1. How do we capture token usage per API call?
  2. How do we attribute costs to projects/agents?
  3. How do we set and enforce budget limits?
  4. What alerts/notifications should fire at budget thresholds?
  5. How do we handle cost optimization (model selection)?

Metrics to Track

  • Tokens (input/output) per call
  • Cost per call (based on model pricing)
  • Aggregated cost per: project, agent, sprint, day
  • Trend analysis

Research Areas

  • LLM provider pricing APIs
  • Token counting before/after calls
  • Time-series storage for cost data
  • Alert/notification patterns

Expected Deliverables

  • Cost tracking schema
  • Budget enforcement logic
  • Dashboard/reporting design
  • Alert configuration
  • ADR documenting the approach

Acceptance Criteria

  • All LLM calls tracked with cost
  • Costs attributable to project/agent
  • Budget limits enforced
  • Alerts trigger at thresholds

Labels

spike, architecture, observability

Author
Owner

SPIKE-010: Cost Tracking Research Completed

The comprehensive spike document has been created at docs/spikes/SPIKE-010-cost-tracking.md.

Executive Summary

Syndarix requires comprehensive LLM cost tracking to manage expenses across multiple providers (Anthropic, OpenAI, local Ollama). The research recommends a multi-layered cost tracking architecture:

  1. LiteLLM Callbacks for real-time usage capture at the gateway level
  2. PostgreSQL for persistent usage records with time-series aggregation
  3. Redis for real-time budget enforcement and rate limiting
  4. Celery Beat for scheduled budget checks and alert processing
  5. SSE Events for real-time dashboard updates
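
The gateway-level capture (layer 1) can be sketched as a LiteLLM success callback. LiteLLM invokes custom callbacks with `(kwargs, completion_response, start_time, end_time)` and exposes the computed price as `kwargs["response_cost"]`; the in-memory `usage_records` list here is a stand-in for the PostgreSQL writer, and the response is treated as a plain dict for illustration (the real object exposes `.usage`).

```python
# Sketch of a LiteLLM success callback that records token usage per call.
# The `usage_records` sink is a stand-in for the PostgreSQL writer; the
# attribution metadata key is an assumption for this sketch.
usage_records = []

def track_cost_callback(kwargs, completion_response, start_time, end_time):
    usage = completion_response["usage"]  # OpenAI-style usage shape
    usage_records.append({
        "model": kwargs.get("model"),
        "input_tokens": usage["prompt_tokens"],
        "output_tokens": usage["completion_tokens"],
        "cost_usd": kwargs.get("response_cost", 0.0),
        "latency_s": (end_time - start_time).total_seconds(),
        # attribution tags would come from request metadata in practice
        "project_id": (kwargs.get("metadata") or {}).get("project_id"),
    })

# Registration would be: litellm.success_callback = [track_cost_callback]
```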

Key Findings

| Area | Recommendation |
|------|----------------|
| Token Tracking | LiteLLM provides built-in `response.usage` and `kwargs["response_cost"]` |
| Cost Attribution | Hierarchical: Organization > Project > Sprint > Agent Instance > Request |
| Budget Enforcement | Soft limits with alerts for weekly/monthly; hard limits for daily budgets |
| Cost Optimization | 60-80% savings possible through caching, cascading, and compression |
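
The soft/hard split can be expressed as a small check run against the Redis counters. This is an illustrative stand-alone function, not the enforcement code itself; the 80% alert threshold and the string return values are assumptions for the sketch.

```python
# Illustrative budget check implementing the soft/hard split:
# weekly/monthly budgets only raise alerts, the daily budget blocks.
def check_budget(period: str, spent_usd: float, limit_usd: float) -> str:
    """Return 'ok', 'alert' (soft breach), or 'block' (hard breach)."""
    if spent_usd < 0.8 * limit_usd:
        return "ok"
    if spent_usd < limit_usd:
        return "alert"  # assumed threshold alert at 80% for every period
    # over the limit: daily budgets are hard, weekly/monthly stay soft
    return "block" if period == "daily" else "alert"
```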

Cost Optimization Strategies

  1. Semantic Caching (15-30% savings): Cache responses by semantic similarity using vector embeddings
  2. Model Cascading (up to 87% savings): Route 90% of queries to cheaper models, escalate only when needed
  3. Prompt Compression (up to 80% savings): Use LLMLingua for intelligent prompt compression with minimal quality loss
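
The cascading strategy can be sketched as a two-tier router: try the cheap model first and escalate only when a confidence check rejects its answer. The model names, confidence heuristic, and `call_model` hook here are placeholders, not the actual routing code.

```python
# Model-cascading sketch: route to the cheap model first, escalate to the
# expensive one only when a confidence heuristic rejects the cheap answer.
# `call_model` and `confident` are placeholders for real implementations.
def cascade(prompt, call_model, confident, cheap="claude-3-haiku",
            strong="claude-3-5-sonnet"):
    answer = call_model(cheap, prompt)
    if confident(answer):
        return cheap, answer  # the bulk of traffic should stop here
    return strong, call_model(strong, prompt)
```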

Database Schema Highlights

  • token_usage: Individual LLM request records with full attribution
  • budgets: Configurable daily/weekly/monthly budgets with soft/hard limits
  • budget_alerts: Alert tracking with severity levels and acknowledgment
  • daily_cost_summaries: Materialized aggregations for fast reporting

LLM Pricing Reference (per 1M tokens)

| Model | Input | Output |
|-------|-------|--------|
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Local Ollama | $0.00 | $0.00 |
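
With per-1M-token prices, per-call cost is a straight linear computation; the model keys here are illustrative identifiers, not LiteLLM's canonical model names.

```python
# Per-call cost from per-1M-token prices: model -> (input_usd, output_usd).
PRICES_PER_MTOK = {
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-4o-mini": (0.15, 0.60),
    "ollama-local": (0.00, 0.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES_PER_MTOK[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000
```

For example, a Sonnet call with 2,000 input and 500 output tokens costs 2000 x 3/1M + 500 x 15/1M = $0.0135.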

Implementation Roadmap

  • Phase 1 (Week 1-2): Core infrastructure (schema, Redis, callbacks)
  • Phase 2 (Week 2-3): Budget management and alerts
  • Phase 3 (Week 3-4): Cost optimization (caching, cascading)
  • Phase 4 (Week 4-5): Reporting dashboard
  • Phase 5 (Week 5-6): Testing and documentation

Related Spikes

  • SPIKE-005: LLM Provider Abstraction (LiteLLM baseline)
  • SPIKE-003: Real-time Updates (SSE architecture)
  • SPIKE-004: Celery + Redis Integration (background tasks)

Next Steps:

  1. Create ADR-010 based on these findings
  2. Begin Phase 1 implementation
  3. Define specific budget limits for Syndarix projects

Full details available in SPIKE-010-cost-tracking.md.
