syndarix/docs/adrs/ADR-012-cost-tracking.md
Felipe Cardoso 88cf4e0abc feat: Update to production model stack and fix remaining inconsistencies
## Model Stack Updates (User's Actual Models)

Updated all documentation to reflect production models:
- Claude Opus 4.5 (primary reasoning)
- GPT 5.1 Codex max (code generation specialist)
- Gemini 3 Pro/Flash (multimodal, fast inference)
- Qwen3-235B (cost-effective, self-hostable)
- DeepSeek V3.2 (self-hosted, open weights)

### Files Updated:
- ADR-004: Full model groups, failover chains, cost tables
- ADR-007: Code example with correct model identifiers
- ADR-012: Cost tracking with new model prices
- ARCHITECTURE.md: Model groups, failover diagram
- IMPLEMENTATION_ROADMAP.md: External services list

## Architecture Diagram Updates

- Added LangGraph Runtime to orchestration layer
- Added technology labels (Type-Instance, transitions)

## Self-Hostability Table Expanded

Added entries for:
- LangGraph (MIT)
- transitions (MIT)
- DeepSeek V3.2 (MIT)
- Qwen3-235B (Apache 2.0)

## Metric Alignments

- Response time: Split into API (<200ms) and Agent (<10s/<60s)
- Cost per project: Adjusted to $100/sprint for Opus 4.5 pricing
- Added concurrent projects (10+) and agents (50+) metrics

## Infrastructure Updates

- Celery workers: 4-8 instances (was 2-4) across 4 queues
- MCP servers: Clarified Phase 2 + Phase 5 deployment
- Sync interval: Clarified 60s fallback + 15min reconciliation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 23:35:51 +01:00


ADR-012: Cost Tracking and Budget Management

Status: Accepted
Date: 2025-12-29
Deciders: Architecture Team
Related Spikes: SPIKE-010


Context

Syndarix agents make potentially expensive LLM API calls. Without proper cost tracking and budget enforcement, projects could incur unexpected charges. We need:

  • Real-time cost visibility
  • Per-project budget enforcement
  • Cost optimization strategies
  • Historical analytics

Decision Drivers

  • Visibility: Real-time cost tracking per agent/project
  • Control: Budget enforcement with soft/hard limits
  • Optimization: Identify and reduce unnecessary costs
  • Attribution: Clear cost allocation for billing

Decision

Implement multi-layered cost tracking using:

  1. LiteLLM Callbacks for real-time usage capture
  2. Redis for budget enforcement
  3. PostgreSQL for persistent analytics
  4. SSE Events for dashboard updates
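As a rough illustration of the fourth layer, a cost update could be serialized into the Server-Sent Events wire format as sketched below. The event name `budget.warning` and the payload fields are assumptions made for this example; the ADR does not define the event schema.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one event in SSE wire format: an event line, a data line, a blank line."""
    payload = json.dumps(data)
    return f"event: {event}\ndata: {payload}\n\n"


# Hypothetical event name and payload, for illustration only.
message = format_sse("budget.warning", {"project_id": "proj-a", "percentage": 80})
print(message, end="")
```

A dashboard client would subscribe with an `EventSource` and listen for the event name chosen by the backend.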

Implementation

Cost Attribution Hierarchy

Organization (Billing Entity)
  └── Project (Cost Center)
        └── Sprint (Time-bounded Budget)
              └── Agent Instance (Worker)
                    └── LLM Request (Atomic Cost Unit)
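
One way to read the hierarchy above: every LLM request carries the identifiers of all levels above it, so costs can be summed at any level. The sketch below assumes a hypothetical record shape (`LLMRequestCost`) and helper (`roll_up`); neither name comes from the Syndarix codebase.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LLMRequestCost:
    """One atomic cost unit, tagged with every level above it (hypothetical shape)."""
    organization_id: str
    project_id: str
    sprint_id: str
    agent_instance_id: str
    cost_usd: Decimal


def roll_up(records, level):
    """Sum cost per distinct value of `level`, e.g. 'project_id' or 'sprint_id'."""
    totals = defaultdict(Decimal)
    for record in records:
        totals[getattr(record, level)] += record.cost_usd
    return dict(totals)


records = [
    LLMRequestCost("org-1", "proj-a", "sprint-1", "agent-1", Decimal("0.12")),
    LLMRequestCost("org-1", "proj-a", "sprint-1", "agent-2", Decimal("0.30")),
    LLMRequestCost("org-1", "proj-b", "sprint-7", "agent-3", Decimal("0.05")),
]
by_project = roll_up(records, "project_id")
by_org = roll_up(records, "organization_id")
print(by_project, by_org)
```

Using `Decimal` rather than `float` keeps the sums consistent with the `DECIMAL` columns used for persistence.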

LiteLLM Callback

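from datetime import datetime, timezone

from litellm.integrations.custom_logger import CustomLogger

class SyndarixCostLogger(CustomLogger):
    def __init__(self, budget_service, usage_queue):
        super().__init__()
        # Injected collaborators: BudgetService (Redis) and an async queue feeding PostgreSQL
        self.budget_service = budget_service
        self.usage_queue = usage_queue

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # LiteLLM exposes per-call metadata to callbacks under "litellm_params"
        metadata = (kwargs.get("litellm_params") or {}).get("metadata") or {}
        agent_id = metadata.get("agent_id")
        project_id = metadata.get("project_id")
        model = kwargs.get("model")
        cost = kwargs.get("response_cost") or 0  # can be None when no price is known
        usage = response_obj.usage

        # Real-time budget check (Redis)
        await self.budget_service.increment(
            project_id=project_id,
            cost=cost,
            tokens=usage.total_tokens
        )

        # Persistent record (async queue to PostgreSQL)
        await self.usage_queue.enqueue({
            "agent_id": agent_id,
            "project_id": project_id,
            "model": model,
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.total_tokens,  # NOT NULL in token_usage
            "cost_usd": cost,
            "timestamp": datetime.now(timezone.utc)  # timezone-aware, matches TIMESTAMPTZ
        })

        # Check budget status (BudgetService.check_budget returns a BudgetStatus)
        budget_status = await self.budget_service.check_budget(project_id)
        if budget_status in (BudgetStatus.EXCEEDED, BudgetStatus.BLOCKED):
            await self.notify_budget_exceeded(project_id)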

Budget Enforcement

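from enum import Enum

class BudgetStatus(str, Enum):
    OK = "ok"
    APPROACHING = "approaching"
    WARNING = "warning"
    EXCEEDED = "exceeded"
    BLOCKED = "blocked"

class BudgetExceededException(Exception):
    """Raised when a hard-enforced budget is exhausted."""

class BudgetService:
    async def check_budget(self, project_id: str) -> BudgetStatus:
        """Check current budget status."""
        budget = await self.get_budget(project_id)
        # Redis returns a string/bytes value, or None before the first increment
        raw_usage = await self.redis.get(f"cost:{project_id}:daily")
        usage = float(raw_usage or 0)

        # Field name follows the project_budgets.daily_limit_usd column
        percentage = (usage / float(budget.daily_limit_usd)) * 100

        if percentage >= 100 and budget.enforcement == "hard":
            return BudgetStatus.BLOCKED
        elif percentage >= 100:
            return BudgetStatus.EXCEEDED
        elif percentage >= 80:
            return BudgetStatus.WARNING
        elif percentage >= 50:
            return BudgetStatus.APPROACHING
        else:
            return BudgetStatus.OK

    async def enforce(self, project_id: str) -> bool:
        """Returns True if request should proceed; raises if the budget is hard-blocked."""
        status = await self.check_budget(project_id)

        if status == BudgetStatus.BLOCKED:
            raise BudgetExceededException(project_id)

        if status in [BudgetStatus.EXCEEDED, BudgetStatus.WARNING]:
            # Auto-downgrade to cheaper model
            await self.set_model_override(project_id, "cost-optimized")

        return True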
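
To make the threshold behaviour concrete, here is a minimal, Redis-free sketch of the same classification. It assumes the 50/80/100 percent thresholds used in `check_budget`; the `classify` function and the `InMemoryDailySpend` class are hypothetical stand-ins written for this example, not part of the service.

```python
from enum import Enum


class BudgetStatus(Enum):
    OK = "ok"
    APPROACHING = "approaching"
    WARNING = "warning"
    EXCEEDED = "exceeded"
    BLOCKED = "blocked"


def classify(spent_usd: float, daily_limit_usd: float, enforcement: str = "soft") -> BudgetStatus:
    """Map spend against a daily limit onto a status using the 50/80/100% thresholds."""
    if daily_limit_usd <= 0:
        raise ValueError("daily_limit_usd must be positive")
    percentage = spent_usd / daily_limit_usd * 100
    if percentage >= 100:
        return BudgetStatus.BLOCKED if enforcement == "hard" else BudgetStatus.EXCEEDED
    if percentage >= 80:
        return BudgetStatus.WARNING
    if percentage >= 50:
        return BudgetStatus.APPROACHING
    return BudgetStatus.OK


class InMemoryDailySpend:
    """Stand-in for the Redis daily counter (illustration only)."""

    def __init__(self):
        self._spend = {}

    def increment(self, project_id: str, cost_usd: float) -> float:
        self._spend[project_id] = self._spend.get(project_id, 0.0) + cost_usd
        return self._spend[project_id]


store = InMemoryDailySpend()
# Four requests against a $50.00 daily limit with soft enforcement.
statuses = [classify(store.increment("proj-a", cost), 50.00) for cost in (10.0, 15.0, 15.0, 10.0)]
print([status.name for status in statuses])
```

With soft enforcement the project moves through OK, APPROACHING, WARNING and EXCEEDED as spend accumulates; the same final spend under `enforcement="hard"` would classify as BLOCKED.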

Database Schema

CREATE TABLE token_usage (
    id UUID PRIMARY KEY,
    agent_id UUID,
    project_id UUID NOT NULL,
    model VARCHAR(100) NOT NULL,
    prompt_tokens INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    total_tokens INTEGER NOT NULL,
    cost_usd DECIMAL(10, 6) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL
);

CREATE TABLE project_budgets (
    id UUID PRIMARY KEY,
    project_id UUID NOT NULL UNIQUE,
    daily_limit_usd DECIMAL(10, 2) DEFAULT 50.00,
    weekly_limit_usd DECIMAL(10, 2) DEFAULT 250.00,
    monthly_limit_usd DECIMAL(10, 2) DEFAULT 1000.00,
    enforcement VARCHAR(20) DEFAULT 'soft',  -- 'soft', 'hard'
    alert_thresholds JSONB DEFAULT '[50, 80, 100]'
);

-- Materialized view for analytics
CREATE MATERIALIZED VIEW daily_cost_summary AS
SELECT
    project_id,
    DATE(timestamp) as date,
    SUM(cost_usd) as total_cost,
    SUM(total_tokens) as total_tokens,
    COUNT(*) as request_count
FROM token_usage
GROUP BY project_id, DATE(timestamp);

Cost Model Prices

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Notes |
|-------|---------------------|----------------------|-------|
| Claude Opus 4.5 | $15.00 | $75.00 | Highest reasoning capability |
| GPT 5.1 Codex max | $12.00 | $60.00 | Code generation specialist |
| Gemini 3 Pro | $3.50 | $10.50 | Strong multimodal |
| Gemini 3 Flash | $0.35 | $1.05 | Fast inference |
| Qwen3-235B | $2.00 | $6.00 | Cost-effective (or $0 self-hosted) |
| DeepSeek V3.2 | $0.00 | $0.00 | Self-hosted, open weights |
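
The per-request cost follows directly from these prices: prompt tokens times the input price plus completion tokens times the output price, divided by one million. The sketch below copies three rows of the price table; the dictionary keys are illustrative identifiers, not necessarily the model names the gateway uses.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the price table.
# The keys are hypothetical identifiers chosen for this example.
PRICES_PER_MILLION = {
    "claude-opus-4.5": (Decimal("15.00"), Decimal("75.00")),
    "gemini-3-flash": (Decimal("0.35"), Decimal("1.05")),
    "deepseek-v3.2": (Decimal("0.00"), Decimal("0.00")),
}


def request_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Compute the cost of one request, rounded to 6 decimals like DECIMAL(10, 6)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    cost = (
        Decimal(prompt_tokens) * input_price
        + Decimal(completion_tokens) * output_price
    ) / Decimal(1_000_000)
    return cost.quantize(Decimal("0.000001"))


opus_cost = request_cost_usd("claude-opus-4.5", 12_000, 3_000)
flash_cost = request_cost_usd("gemini-3-flash", 12_000, 3_000)
print(opus_cost, flash_cost)
```

For a request with 12,000 prompt tokens and 3,000 completion tokens this gives $0.405 on Claude Opus 4.5 and $0.00735 on Gemini 3 Flash.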

Cost Optimization Strategies

| Strategy | Estimated savings | Implementation |
|----------|-------------------|----------------|
| Semantic caching | 15-30% | Redis cache for repeated queries |
| Model cascading | 60-80% | Start with Gemini Flash, escalate to Opus |
| Prompt compression | 10-20% | Remove redundant context |
| Self-hosted fallback | 100% for some tasks | DeepSeek V3.2/Qwen3 for non-critical tasks |
| Task-appropriate routing | 40-60% | Route code tasks to GPT 5.1 Codex, simple tasks to Flash |
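
Model cascading can be sketched as trying tiers from cheapest to most expensive and stopping at the first answer that looks good enough. The sketch below uses stub functions in place of real model calls, and it assumes each call returns a confidence score; how such a score would be obtained in practice (self-evaluation, a verifier, heuristics) is not specified in this ADR.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ModelTier:
    name: str
    # Returns (answer, confidence in 0..1). The confidence signal is an assumption.
    call: Callable[[str], Tuple[str, float]]


def cascade(prompt: str, tiers: Sequence[ModelTier], min_confidence: float = 0.8) -> Tuple[str, List[str]]:
    """Try tiers cheapest-first; return the first sufficiently confident answer and the tiers tried."""
    if not tiers:
        raise ValueError("at least one tier is required")
    attempts: List[str] = []
    answer = ""
    for tier in tiers:
        answer, confidence = tier.call(prompt)
        attempts.append(tier.name)
        if confidence >= min_confidence:
            break
    # If no tier was confident, the most capable tier's answer is returned.
    return answer, attempts


def flash_stub(prompt: str) -> Tuple[str, float]:
    # Stub: pretend the cheap model is only confident on short prompts.
    return "short answer", 0.9 if len(prompt) < 40 else 0.4


def opus_stub(prompt: str) -> Tuple[str, float]:
    return "detailed answer", 0.95


tiers = [ModelTier("gemini-3-flash", flash_stub), ModelTier("claude-opus-4.5", opus_stub)]
easy = cascade("What is 2 + 2?", tiers)
hard = cascade("Design a migration plan for a sharded multi-tenant database.", tiers)
print(easy, hard)
```

The savings come from requests like the first one, which never reach the expensive tier; escalated requests pay for both calls, so the threshold needs tuning against real traffic.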

Consequences

Positive

  • Complete cost visibility at all levels
  • Automatic budget enforcement
  • Cost optimization reduces spend significantly
  • Real-time dashboard updates

Negative

  • Redis dependency for real-time tracking
  • Additional complexity in LLM gateway

Mitigation

  • Redis already required for other features
  • Clear separation of concerns in cost tracking module

Compliance

This decision aligns with:

  • FR-801: Real-time cost tracking
  • FR-802: Budget configuration (soft/hard limits)
  • FR-803: Budget alerts
  • FR-804: Cost analytics
  • NFR-602: Logging and monitoring (cost observability)
  • BR-002: Cost overruns from API usage (risk mitigation)

This ADR establishes the cost tracking and budget management architecture for Syndarix.