# ADR-012: Cost Tracking and Budget Management

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-010

---

## Context

Syndarix agents make potentially expensive LLM API calls. Without proper cost tracking and budget enforcement, projects could incur unexpected charges. We need:

- Real-time cost visibility
- Per-project budget enforcement
- Cost optimization strategies
- Historical analytics

## Decision Drivers

- **Visibility:** Real-time cost tracking per agent/project
- **Control:** Budget enforcement with soft/hard limits
- **Optimization:** Identify and reduce unnecessary costs
- **Attribution:** Clear cost allocation for billing

## Decision

**Implement multi-layered cost tracking** using:

1. **LiteLLM Callbacks** for real-time usage capture
2. **Redis** for budget enforcement
3. **PostgreSQL** for persistent analytics
4. **SSE Events** for dashboard updates

## Implementation

### Cost Attribution Hierarchy

```
Organization (Billing Entity)
└── Project (Cost Center)
    └── Sprint (Time-bounded Budget)
        └── Agent Instance (Worker)
            └── LLM Request (Atomic Cost Unit)
```

### LiteLLM Callback

```python
from datetime import datetime, timezone

from litellm.integrations.custom_logger import CustomLogger

# BudgetStatus is the enum defined with BudgetService (see Budget Enforcement).


class SyndarixCostLogger(CustomLogger):
    def __init__(self, budget_service, usage_queue):
        super().__init__()
        self.budget_service = budget_service
        self.usage_queue = usage_queue

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # LiteLLM passes caller-supplied metadata under litellm_params.
        metadata = (kwargs.get("litellm_params") or {}).get("metadata") or {}
        agent_id = metadata.get("agent_id")
        project_id = metadata.get("project_id")
        model = kwargs.get("model")
        cost = kwargs.get("response_cost") or 0  # may be None for unpriced models
        usage = response_obj.usage

        # Update real-time budget counters (Redis)
        await self.budget_service.increment(
            project_id=project_id,
            cost=cost,
            tokens=usage.total_tokens,
        )

        # Persistent record (async queue to PostgreSQL)
        await self.usage_queue.enqueue({
            "agent_id": agent_id,
            "project_id": project_id,
            "model": model,
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.total_tokens,  # NOT NULL in token_usage
            "cost_usd": cost,
            "timestamp": datetime.now(timezone.utc),  # timezone-aware for TIMESTAMPTZ
        })

        # Check budget status; hard-limited projects report BLOCKED, not EXCEEDED
        budget_status = await self.budget_service.check_budget(project_id)
        if budget_status in (BudgetStatus.EXCEEDED, BudgetStatus.BLOCKED):
            await self.notify_budget_exceeded(project_id)
```

### Budget Enforcement

```python
from enum import Enum


class BudgetStatus(str, Enum):
    OK = "ok"
    APPROACHING = "approaching"
    WARNING = "warning"
    EXCEEDED = "exceeded"
    BLOCKED = "blocked"


class BudgetExceededException(Exception):
    """Raised when a hard-limited project has exhausted its budget."""


class BudgetService:
    # get_budget() and set_model_override() are omitted here.

    async def check_budget(self, project_id: str) -> BudgetStatus:
        """Check current budget status."""
        budget = await self.get_budget(project_id)
        # Redis returns None for a missing key and a string/bytes value otherwise.
        raw = await self.redis.get(f"cost:{project_id}:daily")
        usage = float(raw) if raw is not None else 0.0
        percentage = (usage / float(budget.daily_limit_usd)) * 100

        if percentage >= 100 and budget.enforcement == "hard":
            return BudgetStatus.BLOCKED
        elif percentage >= 100:
            return BudgetStatus.EXCEEDED
        elif percentage >= 80:
            return BudgetStatus.WARNING
        elif percentage >= 50:
            return BudgetStatus.APPROACHING
        else:
            return BudgetStatus.OK

    async def enforce(self, project_id: str) -> bool:
        """Returns True if request should proceed."""
        status = await self.check_budget(project_id)

        if status == BudgetStatus.BLOCKED:
            raise BudgetExceededException(project_id)

        if status in (BudgetStatus.EXCEEDED, BudgetStatus.WARNING):
            # Auto-downgrade to cheaper model
            await self.set_model_override(project_id, "cost-optimized")

        return True
```

### Database Schema

```sql
CREATE TABLE token_usage (
    id UUID PRIMARY KEY,
    agent_id UUID,
    project_id UUID NOT NULL,
    model VARCHAR(100) NOT NULL,
    prompt_tokens INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    total_tokens INTEGER NOT NULL,
    cost_usd DECIMAL(10, 6) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL
);

CREATE TABLE project_budgets (
    id UUID PRIMARY KEY,
    project_id UUID NOT NULL UNIQUE,
    daily_limit_usd DECIMAL(10, 2) DEFAULT 50.00,
    weekly_limit_usd DECIMAL(10, 2) DEFAULT 250.00,
    monthly_limit_usd DECIMAL(10, 2) DEFAULT 1000.00,
    enforcement VARCHAR(20) DEFAULT 'soft', -- 'soft', 'hard'
    alert_thresholds JSONB DEFAULT '[50, 80, 100]'
);

-- Materialized view for analytics
CREATE MATERIALIZED VIEW daily_cost_summary AS
SELECT
    project_id,
    DATE(timestamp) AS date,
    SUM(cost_usd) AS total_cost,
    SUM(total_tokens) AS total_tokens,
    COUNT(*) AS request_count
FROM token_usage
GROUP BY project_id, DATE(timestamp);
```

### Cost Model Prices

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Notes |
|-------|---------------------|----------------------|-------|
| Claude Opus 4.5 | $15.00 | $75.00 | Highest reasoning capability |
| GPT 5.1 Codex max | $12.00 | $60.00 | Code generation specialist |
| Gemini 3 Pro | $3.50 | $10.50 | Strong multimodal |
| Gemini 3 Flash | $0.35 | $1.05 | Fast inference |
| Qwen3-235B | $2.00 | $6.00 | Cost-effective (or $0 self-hosted) |
| DeepSeek V3.2 | $0.00 | $0.00 | Self-hosted, open weights |

### Cost Optimization Strategies

| Strategy | Savings | Implementation |
|----------|---------|----------------|
| Semantic caching | 15-30% | Redis cache for repeated queries |
| Model cascading | 60-80% | Start with Gemini Flash, escalate to Opus |
| Prompt compression | 10-20% | Remove redundant context |
| Self-hosted fallback | 100% for some tasks | DeepSeek V3.2/Qwen3 for non-critical tasks |
| Task-appropriate routing | 40-60% | Route code tasks to GPT 5.1 Codex, simple tasks to Flash |

## Consequences

### Positive

- Complete cost visibility at every level of the attribution hierarchy
- Automatic budget enforcement
- Cost optimization strategies reduce spend (see the savings estimates above)
- Real-time dashboard updates

### Negative

- Redis dependency for real-time tracking
- Additional complexity in LLM gateway

### Mitigation

- Redis already required for other features
- Clear separation of concerns in cost tracking module

## Compliance

This decision aligns with:

- FR-801: Real-time cost tracking
- FR-802: Budget configuration (soft/hard limits)
- FR-803: Budget alerts
- FR-804: Cost analytics
- NFR-602: Logging and monitoring (cost observability)
- BR-002: Cost overruns from API usage (risk mitigation)

---

*This ADR establishes the cost tracking and budget management architecture for Syndarix.*
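
As a worked example of how the per-million-token prices in the Cost Model Prices table translate into a per-request cost, the following minimal sketch multiplies token counts by the listed rates. It is illustrative only: the ADR takes the actual cost from LiteLLM's `response_cost`, and the names `PRICES_PER_MILLION`, `estimate_cost_usd`, and the model keys are hypothetical, not part of Syndarix or LiteLLM.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the Cost Model Prices table.
# The dictionary keys are hypothetical identifiers chosen for this sketch.
PRICES_PER_MILLION = {
    "claude-opus-4.5": (Decimal("15.00"), Decimal("75.00")),
    "gemini-3-pro": (Decimal("3.50"), Decimal("10.50")),
    "gemini-3-flash": (Decimal("0.35"), Decimal("1.05")),
}

ONE_MILLION = Decimal(1_000_000)


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the cost of one request, rounded to 6 decimal places
    to match the DECIMAL(10, 6) cost_usd column."""
    input_price, output_price = PRICES_PER_MILLION[model]
    cost = (
        Decimal(prompt_tokens) * input_price
        + Decimal(completion_tokens) * output_price
    ) / ONE_MILLION
    return cost.quantize(Decimal("0.000001"))


if __name__ == "__main__":
    # Same request shape (10,000 prompt tokens, 2,000 completion tokens) on two models.
    print(estimate_cost_usd("claude-opus-4.5", 10_000, 2_000))  # 0.300000
    print(estimate_cost_usd("gemini-3-flash", 10_000, 2_000))   # 0.005600
```

Using `Decimal` rather than `float` keeps the arithmetic exact at the six-decimal precision the `token_usage` table stores, which matters when many small per-request costs are summed for billing.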