docs: add remaining ADRs and comprehensive architecture documentation
Added 7 new Architecture Decision Records completing the full set:
- ADR-008: Knowledge Base and RAG (pgvector)
- ADR-009: Agent Communication Protocol (structured messages)
- ADR-010: Workflow State Machine (transitions + PostgreSQL)
- ADR-011: Issue Synchronization (webhook-first + polling)
- ADR-012: Cost Tracking (LiteLLM callbacks + Redis budgets)
- ADR-013: Audit Logging (hash chaining + tiered storage)
- ADR-014: Client Approval Flow (checkpoint-based)

Added comprehensive ARCHITECTURE.md that:
- Summarizes all 14 ADRs in decision matrix
- Documents full system architecture with diagrams
- Explains all component interactions
- Details technology stack with self-hostability guarantee
- Covers security, scalability, and deployment

Updated IMPLEMENTATION_ROADMAP.md to mark Phase 0 completed items.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
199
docs/adrs/ADR-012-cost-tracking.md
Normal file
# ADR-012: Cost Tracking and Budget Management

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-010

---

## Context

Syndarix agents make potentially expensive LLM API calls. Without proper cost tracking and budget enforcement, projects could incur unexpected charges. We need:

- Real-time cost visibility
- Per-project budget enforcement
- Cost optimization strategies
- Historical analytics

## Decision Drivers

- **Visibility:** Real-time cost tracking per agent/project
- **Control:** Budget enforcement with soft/hard limits
- **Optimization:** Identify and reduce unnecessary costs
- **Attribution:** Clear cost allocation for billing

## Decision

**Implement multi-layered cost tracking** using:

1. **LiteLLM Callbacks** for real-time usage capture
2. **Redis** for budget enforcement
3. **PostgreSQL** for persistent analytics
4. **Server-Sent Events (SSE)** for live dashboard updates

## Implementation

### Cost Attribution Hierarchy

```
Organization (Billing Entity)
└── Project (Cost Center)
    └── Sprint (Time-bounded Budget)
        └── Agent Instance (Worker)
            └── LLM Request (Atomic Cost Unit)
```

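Every LLM request's cost is credited to each ancestor in this hierarchy. The sketch below shows that roll-up in memory. `LLMRequest` and `rollup` are hypothetical names, and the org and sprint IDs on each request are assumed fields (the `token_usage` table below only stores agent and project). In practice the same aggregation would more likely run as SQL over persisted usage rows.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LLMRequest:
    """Atomic cost unit tagged with every hierarchy level (hypothetical record)."""
    organization_id: str
    project_id: str
    sprint_id: str
    agent_id: str
    cost_usd: Decimal


def rollup(requests: list[LLMRequest]) -> dict[tuple[str, ...], Decimal]:
    """Sum request costs at every prefix of the hierarchy path.

    Keys look like ("org",), ("org", "project"), ... down to the agent level.
    """
    totals: dict[tuple[str, ...], Decimal] = defaultdict(Decimal)
    for req in requests:
        path = (req.organization_id, req.project_id, req.sprint_id, req.agent_id)
        # Credit the cost to each ancestor level, including the agent itself
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += req.cost_usd
    return dict(totals)
```

Using `Decimal` rather than `float` keeps the sums exact, which matches the `DECIMAL` columns in the schema.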
### LiteLLM Callback

```python
from datetime import datetime, timezone

from litellm.integrations.custom_logger import CustomLogger


class SyndarixCostLogger(CustomLogger):
    def __init__(self, budget_service, usage_queue):
        super().__init__()
        self.budget_service = budget_service  # Redis-backed BudgetService
        self.usage_queue = usage_queue  # async queue drained into PostgreSQL

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # LiteLLM exposes caller-supplied metadata under kwargs["litellm_params"]["metadata"]
        metadata = kwargs.get("litellm_params", {}).get("metadata") or {}
        agent_id = metadata.get("agent_id")
        project_id = metadata.get("project_id")
        model = kwargs.get("model")
        cost = kwargs.get("response_cost") or 0.0  # None if LiteLLM has no price for the model
        usage = response_obj.usage

        # Real-time budget counter (Redis)
        await self.budget_service.increment(
            project_id=project_id,
            cost=cost,
            tokens=usage.total_tokens,
        )

        # Persistent record (async queue to PostgreSQL)
        await self.usage_queue.enqueue({
            "agent_id": agent_id,
            "project_id": project_id,
            "model": model,
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.total_tokens,  # NOT NULL in token_usage
            "cost_usd": cost,
            "timestamp": datetime.now(timezone.utc),  # timezone-aware; utcnow() is deprecated
        })

        # Check budget status (BudgetStatus is defined alongside BudgetService)
        budget_status = await self.budget_service.check_budget(project_id)
        if budget_status in (BudgetStatus.EXCEEDED, BudgetStatus.BLOCKED):
            await self.notify_budget_exceeded(project_id)
```

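The callback hands persistent records to `usage_queue` instead of writing to PostgreSQL inline, so a slow database does not delay the LLM response path. Below is a minimal sketch of such a queue. It assumes a hypothetical `UsageQueue` built on `asyncio.Queue` and an injected `flush` coroutine; in a real deployment `flush` would perform a batched `INSERT INTO token_usage`.

```python
import asyncio
from typing import Any, Awaitable, Callable

Record = dict[str, Any]


class UsageQueue:
    """Buffers usage records and flushes them in batches (hypothetical helper)."""

    def __init__(self, flush: Callable[[list[Record]], Awaitable[None]], batch_size: int = 100):
        self._queue: asyncio.Queue = asyncio.Queue()
        self._flush = flush
        self._batch_size = batch_size

    async def enqueue(self, record: Record) -> None:
        # Called from the LiteLLM callback; only touches memory
        await self._queue.put(record)

    async def drain_once(self) -> int:
        """Flush up to batch_size queued records; return how many were written."""
        batch: list[Record] = []
        while len(batch) < self._batch_size and not self._queue.empty():
            batch.append(self._queue.get_nowait())
        if batch:
            await self._flush(batch)
        return len(batch)
```

A background task would call `drain_once()` in a loop with a short sleep. Records still in memory are lost if the process crashes, which is acceptable for analytics because Redis remains the source for real-time enforcement.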
### Budget Enforcement

```python
from enum import Enum


class BudgetStatus(str, Enum):
    OK = "ok"
    APPROACHING = "approaching"  # >= 50% of the daily limit
    WARNING = "warning"          # >= 80%
    EXCEEDED = "exceeded"        # >= 100%, soft enforcement
    BLOCKED = "blocked"          # >= 100%, hard enforcement


class BudgetExceededException(Exception):
    def __init__(self, project_id: str):
        super().__init__(f"Hard budget limit reached for project {project_id}")
        self.project_id = project_id


class BudgetService:
    # self.redis, get_budget() and set_model_override() are provided elsewhere in the service

    async def check_budget(self, project_id: str) -> BudgetStatus:
        """Check current budget status against the daily limit."""
        budget = await self.get_budget(project_id)
        # GET returns None for a missing key, otherwise the number as bytes/str
        raw_usage = await self.redis.get(f"cost:{project_id}:daily")
        usage = float(raw_usage or 0)

        # DECIMAL columns load as Decimal; convert so float division works
        percentage = usage / float(budget.daily_limit) * 100

        if percentage >= 100 and budget.enforcement == "hard":
            return BudgetStatus.BLOCKED
        elif percentage >= 100:
            return BudgetStatus.EXCEEDED
        elif percentage >= 80:
            return BudgetStatus.WARNING
        elif percentage >= 50:
            return BudgetStatus.APPROACHING
        else:
            return BudgetStatus.OK

    async def enforce(self, project_id: str) -> bool:
        """Return True if the request should proceed; raise if the hard limit is hit."""
        status = await self.check_budget(project_id)

        if status == BudgetStatus.BLOCKED:
            raise BudgetExceededException(project_id)

        if status in (BudgetStatus.EXCEEDED, BudgetStatus.WARNING):
            # Auto-downgrade to cheaper model
            await self.set_model_override(project_id, "cost-optimized")

        return True
```

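`check_budget` reads a `cost:{project_id}:daily` counter, and the callback updates it through `budget_service.increment(...)`. Neither block shows how that counter stays atomic or resets each day. The sketch below assumes one approach: a `redis.asyncio.Redis`-compatible client, with Redis `INCRBYFLOAT` and `EXPIREAT` issued in a single transactional pipeline. Expiring at the next UTC midnight is an illustrative choice, not something this ADR specifies, and `increment_daily_cost` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_utc_midnight(now: datetime) -> datetime:
    """Start of the next UTC day; the daily counter expires at this instant."""
    today = now.astimezone(timezone.utc).date()
    return datetime.combine(today + timedelta(days=1), datetime.min.time(), tzinfo=timezone.utc)


async def increment_daily_cost(redis, project_id: str, cost: float,
                               now: Optional[datetime] = None) -> float:
    """Atomically add `cost` to the project's daily counter; return the new total.

    `redis` is assumed to be a redis.asyncio.Redis-compatible client.
    """
    now = now or datetime.now(timezone.utc)
    key = f"cost:{project_id}:daily"  # the key BudgetService.check_budget reads
    expires_at = int(next_utc_midnight(now).timestamp())  # unix seconds
    async with redis.pipeline(transaction=True) as pipe:
        pipe.incrbyfloat(key, cost)
        # Same absolute expiry for every call on a given UTC day, so re-setting it is harmless
        pipe.expireat(key, expires_at)
        new_total, _ = await pipe.execute()
    return float(new_total)
```

An absolute `EXPIREAT` avoids the usual pitfall of a relative `EXPIRE` being pushed forward on every increment, which would stop the counter from ever resetting.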
### Database Schema

```sql
CREATE TABLE token_usage (
    id UUID PRIMARY KEY,
    agent_id UUID,
    project_id UUID NOT NULL,
    model VARCHAR(100) NOT NULL,
    prompt_tokens INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    total_tokens INTEGER NOT NULL,
    cost_usd DECIMAL(10, 6) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL
);

CREATE TABLE project_budgets (
    id UUID PRIMARY KEY,
    project_id UUID NOT NULL UNIQUE,
    daily_limit_usd DECIMAL(10, 2) DEFAULT 50.00,
    weekly_limit_usd DECIMAL(10, 2) DEFAULT 250.00,
    monthly_limit_usd DECIMAL(10, 2) DEFAULT 1000.00,
    enforcement VARCHAR(20) DEFAULT 'soft', -- 'soft', 'hard'
    alert_thresholds JSONB DEFAULT '[50, 80, 100]'
);

-- Materialized view for analytics
CREATE MATERIALIZED VIEW daily_cost_summary AS
SELECT
    project_id,
    DATE(timestamp) AS date,
    SUM(cost_usd) AS total_cost,
    SUM(total_tokens) AS total_tokens,
    COUNT(*) AS request_count
FROM token_usage
GROUP BY project_id, DATE(timestamp);
```

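The `alert_thresholds` column lists the percentages (default `[50, 80, 100]`) at which alerts fire. To avoid re-sending the same alert on every request, the alerting path only needs the thresholds newly crossed between the spend before and after the latest charge. Here is a minimal sketch. The function name is hypothetical, and returning no alerts for a non-positive limit is an assumed convention.

```python
from decimal import Decimal
from typing import Iterable


def newly_crossed(
    thresholds: Iterable[int],
    limit_usd: Decimal,
    spent_before: Decimal,
    spent_after: Decimal,
) -> list[int]:
    """Return alert thresholds (percent) crossed by the latest charge, ascending."""
    if limit_usd <= 0:
        return []  # assumed: no percentage-based alerts without a positive limit
    before_pct = spent_before / limit_usd * 100
    after_pct = spent_after / limit_usd * 100
    # A threshold fires exactly once: when spend moves from below it to at-or-above it
    return sorted(t for t in thresholds if before_pct < t <= after_pct)
```

With the default $50.00 daily limit, a charge that moves spend from $30 to $55 (60% to 110%) returns `[80, 100]`. The earlier 50% alert is not repeated.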
### Model Prices

| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
|-------|---------------------------|----------------------------|
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-4o Mini | $0.15 | $0.60 |
| Ollama (local) | $0.00 | $0.00 |

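Per-request cost follows directly from the price table: multiply each token count by its per-million-token price. LiteLLM normally supplies `response_cost` itself, so a manual calculation like this sketch is mainly useful for forecasting or cross-checking. The `PRICES` mapping copies the table, and its keys are illustrative labels, not provider model IDs.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the price table: (input, output)
PRICES: dict[str, tuple[Decimal, Decimal]] = {
    "claude-3.5-sonnet": (Decimal("3.00"), Decimal("15.00")),
    "claude-3-haiku": (Decimal("0.25"), Decimal("1.25")),
    "gpt-4-turbo": (Decimal("10.00"), Decimal("30.00")),
    "gpt-4o-mini": (Decimal("0.15"), Decimal("0.60")),
    "ollama-local": (Decimal("0"), Decimal("0")),
}

PER_MILLION = Decimal(1_000_000)


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Cost in USD, quantized to the 6 decimal places of token_usage.cost_usd."""
    input_price, output_price = PRICES[model]
    raw = (prompt_tokens * input_price + completion_tokens * output_price) / PER_MILLION
    return raw.quantize(Decimal("0.000001"))
```

For a request with 2,000 prompt tokens and 500 completion tokens, Claude 3.5 Sonnet costs $0.0135 and Claude 3 Haiku costs $0.001125, a 12× difference. That gap is what model cascading exploits.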
### Cost Optimization Strategies

| Strategy | Estimated Savings | Implementation |
|----------|-------------------|----------------|
| Semantic caching | 15-30% | Redis cache for repeated or semantically similar queries |
| Model cascading | 60-80% | Start with Haiku, escalate to Sonnet |
| Prompt compression | 10-20% | Remove redundant context |
| Local fallback | Up to 100% (eligible tasks) | Ollama for simple tasks |

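Model cascading sends each request to the cheapest model first and escalates only when the cheaper answer is not good enough. The sketch below shows that control flow under stated assumptions. `call_model` is a hypothetical coroutine, for example a thin wrapper over the LiteLLM gateway. `is_acceptable` is a hypothetical quality check such as a schema validator or confidence score. The escalation criterion largely determines how much is actually saved.

```python
from typing import Awaitable, Callable, Sequence

CallModel = Callable[[str, str], Awaitable[str]]  # (model, prompt) -> completion


async def cascade(
    prompt: str,
    models: Sequence[str],
    call_model: CallModel,
    is_acceptable: Callable[[str], bool],
) -> tuple[str, str]:
    """Try models cheapest-first; return (model_used, completion).

    If no answer passes the check, the last (most capable) model's answer is returned.
    """
    if not models:
        raise ValueError("at least one model is required")
    completion = ""
    for model in models:
        completion = await call_model(model, prompt)
        if is_acceptable(completion):
            return model, completion
    return models[-1], completion
```

Rejected cheap attempts still incur cost. If every attempt goes through the LiteLLM gateway, the cost callback records each one, so cascading overhead shows up in the same usage data.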
## Consequences

### Positive
- Complete cost visibility at all levels
- Automatic budget enforcement
- Optimization strategies (caching, model cascading, local fallback) can substantially reduce spend
- Real-time dashboard updates

### Negative
- Redis dependency for real-time tracking
- Additional complexity in LLM gateway

### Mitigation
- Redis already required for other features
- Clear separation of concerns in cost tracking module

## Compliance

This decision aligns with:

- FR-401: Cost tracking per agent/project
- FR-402: Budget enforcement
- NFR-302: Budget alert system

---

*This ADR establishes the cost tracking and budget management architecture for Syndarix.*