docs: add remaining ADRs and comprehensive architecture documentation

Added 7 new Architecture Decision Records completing the full set:
- ADR-008: Knowledge Base and RAG (pgvector)
- ADR-009: Agent Communication Protocol (structured messages)
- ADR-010: Workflow State Machine (transitions + PostgreSQL)
- ADR-011: Issue Synchronization (webhook-first + polling)
- ADR-012: Cost Tracking (LiteLLM callbacks + Redis budgets)
- ADR-013: Audit Logging (hash chaining + tiered storage)
- ADR-014: Client Approval Flow (checkpoint-based)

Added comprehensive ARCHITECTURE.md that:
- Summarizes all 14 ADRs in decision matrix
- Documents full system architecture with diagrams
- Explains all component interactions
- Details technology stack with self-hostability guarantee
- Covers security, scalability, and deployment

Updated IMPLEMENTATION_ROADMAP.md to mark Phase 0 completed items.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 13:54:43 +01:00
parent bd702734c2
commit 406b25cda0
9 changed files with 1899 additions and 5 deletions
--- a/docs/adrs/ADR-012-cost-tracking.md
+++ b/docs/adrs/ADR-012-cost-tracking.md
@@ -0,0 +1,199 @@
+# ADR-012: Cost Tracking and Budget Management
+
+**Status:** Accepted
+**Date:** 2025-12-29
+**Deciders:** Architecture Team
+**Related Spikes:** SPIKE-010
+
+---
+
+## Context
+
+Syndarix agents make potentially expensive LLM API calls. Without proper cost tracking and budget enforcement, projects could incur unexpected charges. We need:
+- Real-time cost visibility
+- Per-project budget enforcement
+- Cost optimization strategies
+- Historical analytics
+
+## Decision Drivers
+
+- **Visibility:** Real-time cost tracking per agent/project
+- **Control:** Budget enforcement with soft/hard limits
+- **Optimization:** Identify and reduce unnecessary costs
+- **Attribution:** Clear cost allocation for billing
+
+## Decision
+
+**Implement multi-layered cost tracking** using:
+1. **LiteLLM Callbacks** for real-time usage capture
+2. **Redis** for budget enforcement
+3. **PostgreSQL** for persistent analytics
+4. **SSE Events** for dashboard updates
+
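The fourth layer pushes cost updates to the dashboard as SSE events. As a rough sketch of what serializing one such event could look like: the event name `budget.updated` and the payload fields below are assumptions for illustration, not something this ADR specifies.

```python
import json
from decimal import Decimal


def format_sse_event(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Events message with an event name and JSON data."""
    # Decimal is not JSON-serializable by default, so it is rendered as a string.
    data = json.dumps(payload, default=str, separators=(",", ":"))
    # A blank line terminates a message in the SSE wire format.
    return f"event: {event}\ndata: {data}\n\n"


# Hypothetical event name and payload shape, for illustration only.
message = format_sse_event(
    "budget.updated",
    {"project_id": "proj-123", "daily_cost_usd": Decimal("12.50"), "status": "ok"},
)
```

Sending amounts as strings keeps decimal precision intact on the dashboard side; a real payload would likely follow whatever event schema the rest of the system already uses.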
+## Implementation
+
+### Cost Attribution Hierarchy
+
+```
+Organization (Billing Entity)
+  └── Project (Cost Center)
+        └── Sprint (Time-bounded Budget)
+              └── Agent Instance (Worker)
+                    └── LLM Request (Atomic Cost Unit)
+```
+
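One way to read the hierarchy above: if every LLM request is tagged with the identifiers of all its ancestors, costs can be rolled up at any level. The sketch below illustrates that idea with in-memory data; the class and field names are hypothetical and not taken from the Syndarix codebase.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LLMRequestCost:
    """One atomic cost unit, tagged with every level of the hierarchy."""
    organization_id: str
    project_id: str
    sprint_id: str
    agent_instance_id: str
    cost_usd: Decimal


def roll_up(requests: list[LLMRequestCost], level: str) -> dict[str, Decimal]:
    """Sum request costs grouped by one hierarchy level, e.g. 'project_id'."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for request in requests:
        totals[getattr(request, level)] += request.cost_usd
    return dict(totals)


requests = [
    LLMRequestCost("org-1", "proj-a", "sprint-1", "agent-1", Decimal("0.0105")),
    LLMRequestCost("org-1", "proj-a", "sprint-1", "agent-2", Decimal("0.0020")),
    LLMRequestCost("org-1", "proj-b", "sprint-7", "agent-3", Decimal("0.0300")),
]
by_project = roll_up(requests, "project_id")
by_org = roll_up(requests, "organization_id")
```

Here `by_project` attributes the two `proj-a` requests to one cost center, while `by_org` gives the total for the billing entity.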
+### LiteLLM Callback
+
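+```python
+from datetime import datetime, timezone
+
+from litellm.integrations.custom_logger import CustomLogger
+
+# BudgetStatus is defined alongside BudgetService (see Budget Enforcement).
+# budget_service, usage_queue and notify_budget_exceeded are assumed to be
+# provided by the surrounding application.
+class SyndarixCostLogger(CustomLogger):
+    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
+        # Request metadata is typically nested under litellm_params; fall back
+        # to a top-level "metadata" key if it was placed there instead.
+        litellm_params = kwargs.get("litellm_params") or {}
+        metadata = litellm_params.get("metadata") or kwargs.get("metadata") or {}
+        agent_id = metadata.get("agent_id")
+        project_id = metadata.get("project_id")
+        model = kwargs.get("model")
+        cost = kwargs.get("response_cost") or 0  # guard against a missing or None cost
+        usage = response_obj.usage
+
+        # Real-time usage counter (Redis)
+        await self.budget_service.increment(
+            project_id=project_id,
+            cost=cost,
+            tokens=usage.total_tokens
+        )
+
+        # Persistent record (async queue to PostgreSQL)
+        await self.usage_queue.enqueue({
+            "agent_id": agent_id,
+            "project_id": project_id,
+            "model": model,
+            "prompt_tokens": usage.prompt_tokens,
+            "completion_tokens": usage.completion_tokens,
+            "total_tokens": usage.total_tokens,  # NOT NULL in token_usage
+            "cost_usd": cost,
+            "timestamp": datetime.now(timezone.utc)  # timezone-aware, matches TIMESTAMPTZ
+        })
+
+        # Check budget status and notify once the daily limit is reached
+        budget_status = await self.budget_service.check_budget(project_id)
+        if budget_status in (BudgetStatus.EXCEEDED, BudgetStatus.BLOCKED):
+            await self.notify_budget_exceeded(project_id)
+```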
+
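The callback above delegates the real-time counter to `budget_service.increment`. A minimal sketch of how that counter might work follows, using an in-memory stand-in so the example runs without a Redis server. `FakeRedis`, `BudgetCounter`, and the 24-hour expiry are assumptions; only the `cost:{project_id}:daily` key pattern comes from this ADR. The token counter that the real `increment` also receives is left out for brevity.

```python
import asyncio


class FakeRedis:
    """In-memory stand-in exposing the two Redis commands the sketch needs."""

    def __init__(self) -> None:
        self.values: dict[str, float] = {}
        self.ttls: dict[str, int] = {}

    async def incrbyfloat(self, name: str, amount: float) -> float:
        self.values[name] = self.values.get(name, 0.0) + amount
        return self.values[name]

    async def expire(self, name: str, time: int) -> bool:
        self.ttls[name] = time
        return True


class BudgetCounter:
    """Hypothetical helper that could back `budget_service.increment`."""

    def __init__(self, redis) -> None:
        self.redis = redis

    async def increment(self, project_id: str, cost: float) -> float:
        key = f"cost:{project_id}:daily"
        # An atomic increment avoids read-modify-write races between agents.
        total = await self.redis.incrbyfloat(key, cost)
        # Assumed policy: the counter lapses 24 hours after the last write.
        await self.redis.expire(key, 24 * 60 * 60)
        return total


async def demo() -> float:
    counter = BudgetCounter(FakeRedis())
    await counter.increment("proj-a", 0.25)
    return await counter.increment("proj-a", 0.50)


daily_total = asyncio.run(demo())
```

Note that refreshing the expiry on every write gives a rolling window rather than a calendar day. If budgets must reset at midnight, adding a date suffix to the key would be one alternative.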
+### Budget Enforcement
+
+```python
+from enum import Enum
+
+class BudgetStatus(Enum):
+    OK = "ok"
+    APPROACHING = "approaching"
+    WARNING = "warning"
+    EXCEEDED = "exceeded"
+    BLOCKED = "blocked"
+
+class BudgetExceededException(Exception):
+    """Raised when a hard-enforced budget is exhausted."""
+
+class BudgetService:
+    async def check_budget(self, project_id: str) -> BudgetStatus:
+        """Check current budget status."""
+        budget = await self.get_budget(project_id)
+        # Redis returns a string, or None if the key does not exist yet
+        raw_usage = await self.redis.get(f"cost:{project_id}:daily")
+        usage = float(raw_usage or 0)
+
+        percentage = (usage / float(budget.daily_limit_usd)) * 100
+
+        if percentage >= 100 and budget.enforcement == "hard":
+            return BudgetStatus.BLOCKED
+        elif percentage >= 100:
+            return BudgetStatus.EXCEEDED
+        elif percentage >= 80:
+            return BudgetStatus.WARNING
+        elif percentage >= 50:
+            return BudgetStatus.APPROACHING
+        else:
+            return BudgetStatus.OK
+
+    async def enforce(self, project_id: str) -> bool:
+        """Return True if the request may proceed; raise if the budget is hard-blocked."""
+        status = await self.check_budget(project_id)
+
+        if status == BudgetStatus.BLOCKED:
+            raise BudgetExceededException(project_id)
+
+        if status in [BudgetStatus.EXCEEDED, BudgetStatus.WARNING]:
+            # Auto-downgrade to cheaper model
+            await self.set_model_override(project_id, "cost-optimized")
+
+        return True
+```
+
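The 50/80/100 percent steps used in `check_budget` match the default `alert_thresholds` in the schema. A budget alert system probably wants to fire each alert once, when a threshold is first crossed, not on every request above it. The function below is a sketch of that check under this assumption; its name and signature are hypothetical.

```python
from decimal import Decimal


def newly_crossed_thresholds(
    previous_usd: Decimal,
    current_usd: Decimal,
    limit_usd: Decimal,
    thresholds: tuple[int, ...] = (50, 80, 100),
) -> list[int]:
    """Return the alert thresholds (in percent) crossed by the latest increment."""
    previous_pct = previous_usd / limit_usd * 100
    current_pct = current_usd / limit_usd * 100
    # A threshold counts as crossed if spend was below it before and is at or above it now.
    return [t for t in thresholds if previous_pct < t <= current_pct]


limit = Decimal("50.00")  # default daily_limit_usd from the schema
first = newly_crossed_thresholds(Decimal("20.00"), Decimal("42.00"), limit)
second = newly_crossed_thresholds(Decimal("42.00"), Decimal("45.00"), limit)
third = newly_crossed_thresholds(Decimal("45.00"), Decimal("55.00"), limit)
```

In this example the first increment moves spend from 40% to 84% and crosses two thresholds at once, the second stays between thresholds, and the third crosses the 100% mark.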
+### Database Schema
+
+```sql
+CREATE TABLE token_usage (
+    id UUID PRIMARY KEY,
+    agent_id UUID,
+    project_id UUID NOT NULL,
+    model VARCHAR(100) NOT NULL,
+    prompt_tokens INTEGER NOT NULL,
+    completion_tokens INTEGER NOT NULL,
+    total_tokens INTEGER NOT NULL,
+    cost_usd DECIMAL(10, 6) NOT NULL,
+    timestamp TIMESTAMPTZ NOT NULL
+);
+
+CREATE TABLE project_budgets (
+    id UUID PRIMARY KEY,
+    project_id UUID NOT NULL UNIQUE,
+    daily_limit_usd DECIMAL(10, 2) DEFAULT 50.00,
+    weekly_limit_usd DECIMAL(10, 2) DEFAULT 250.00,
+    monthly_limit_usd DECIMAL(10, 2) DEFAULT 1000.00,
+    enforcement VARCHAR(20) DEFAULT 'soft',  -- 'soft', 'hard'
+    alert_thresholds JSONB DEFAULT '[50, 80, 100]'
+);
+
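+-- Materialized view for analytics. It is a snapshot, so refresh it on a
+-- schedule with: REFRESH MATERIALIZED VIEW daily_cost_summary;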
+CREATE MATERIALIZED VIEW daily_cost_summary AS
+SELECT
+    project_id,
+    DATE(timestamp) as date,
+    SUM(cost_usd) as total_cost,
+    SUM(total_tokens) as total_tokens,
+    COUNT(*) as request_count
+FROM token_usage
+GROUP BY project_id, DATE(timestamp);
+```
+
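To make the aggregation in `daily_cost_summary` concrete, the sketch below runs an equivalent `GROUP BY` on sample rows. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, with a reduced `token_usage` table and made-up data, so types and date handling differ from the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for token_usage: only the columns the summary needs.
conn.execute(
    """
    CREATE TABLE token_usage (
        project_id TEXT NOT NULL,
        total_tokens INTEGER NOT NULL,
        cost_usd REAL NOT NULL,
        timestamp TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO token_usage VALUES (?, ?, ?, ?)",
    [
        ("proj-a", 1500, 0.25, "2025-12-29 09:00:00"),
        ("proj-a", 500, 0.50, "2025-12-29 17:30:00"),
        ("proj-a", 1000, 0.125, "2025-12-30 08:15:00"),
    ],
)
# Same shape as the materialized view: one row per project per day.
rows = conn.execute(
    """
    SELECT project_id,
           DATE(timestamp) AS date,
           SUM(cost_usd) AS total_cost,
           SUM(total_tokens) AS total_tokens,
           COUNT(*) AS request_count
    FROM token_usage
    GROUP BY project_id, DATE(timestamp)
    ORDER BY date
    """
).fetchall()
```

The two requests on 2025-12-29 collapse into a single summary row, which is the property that keeps dashboard queries cheap compared with scanning raw usage records.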
+### Cost Model Prices
+
+| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
+|-------|---------------------------|----------------------------|
+| Claude 3.5 Sonnet | $3.00 | $15.00 |
+| Claude 3 Haiku | $0.25 | $1.25 |
+| GPT-4 Turbo | $10.00 | $30.00 |
+| GPT-4o Mini | $0.15 | $0.60 |
+| Ollama (local) | $0.00 | $0.00 |
+
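As a worked example of how the per-million-token prices above translate into a per-request cost, here is a small sketch. The dictionary keys are hypothetical identifiers chosen for this example, not LiteLLM model names, and in practice the cost arrives precomputed via `response_cost` in the callback.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the table above.
PRICES_PER_MILLION = {
    "claude-3.5-sonnet": (Decimal("3.00"), Decimal("15.00")),
    "claude-3-haiku": (Decimal("0.25"), Decimal("1.25")),
    "gpt-4-turbo": (Decimal("10.00"), Decimal("30.00")),
    "gpt-4o-mini": (Decimal("0.15"), Decimal("0.60")),
    "ollama-local": (Decimal("0.00"), Decimal("0.00")),
}
MILLION = Decimal(1_000_000)


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Cost in USD of one request, priced separately for input and output tokens."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / MILLION


# 1,000 prompt tokens and 500 completion tokens on two different models.
sonnet_cost = request_cost("claude-3.5-sonnet", 1000, 500)
haiku_cost = request_cost("claude-3-haiku", 1000, 500)
```

For this request shape the Sonnet call costs 0.0105 USD and the Haiku call 0.000875 USD, a factor of twelve, which is the gap that model cascading tries to exploit.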
+### Cost Optimization Strategies
+
+| Strategy | Savings | Implementation |
+|----------|---------|----------------|
+| Semantic caching | 15-30% | Redis cache for repeated queries |
+| Model cascading | 60-80% | Start with Haiku, escalate to Sonnet |
+| Prompt compression | 10-20% | Remove redundant context |
+| Local fallback | Up to 100% for tasks served locally | Ollama for simple tasks |
+
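Model cascading, the strategy with the largest estimated savings in the table above, can be sketched as trying the cheapest tier first and escalating only when the answer does not look good enough. The confidence score, the 0.7 threshold, and the stub model functions below are all assumptions for illustration; how answer quality is actually judged is not specified in this ADR.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelResult:
    text: str
    confidence: float  # hypothetical quality score in [0, 1]
    model: str


def cascade(
    prompt: str,
    tiers: list[Callable[[str], ModelResult]],
    min_confidence: float = 0.7,
) -> ModelResult:
    """Try each tier in order, cheapest first; stop at the first confident answer."""
    if not tiers:
        raise ValueError("at least one model tier is required")
    result = tiers[0](prompt)
    for call_model in tiers[1:]:
        if result.confidence >= min_confidence:
            break
        result = call_model(prompt)  # escalate to the next, more capable tier
    return result


# Stub callables standing in for real Haiku and Sonnet calls.
def cheap_model(prompt: str) -> ModelResult:
    return ModelResult("draft", 0.9 if len(prompt) < 40 else 0.4, "haiku")


def strong_model(prompt: str) -> ModelResult:
    return ModelResult("final", 0.95, "sonnet")


simple = cascade("Rename this variable.", [cheap_model, strong_model])
hard = cascade(
    "Design a migration plan for the multi-tenant billing schema.",
    [cheap_model, strong_model],
)
```

The short prompt is answered by the cheap tier alone, while the longer one escalates. An escalated request pays for both calls, so the realized savings depend on how often the cheap tier suffices.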
+## Consequences
+
+### Positive
+- Complete cost visibility at all levels
+- Automatic budget enforcement
+- Cost optimization strategies target an estimated 10-80% savings, depending on the technique
+- Real-time dashboard updates
+
+### Negative
+- Redis dependency for real-time tracking
+- Additional complexity in LLM gateway
+
+### Mitigation
+- Redis already required for other features
+- Clear separation of concerns in cost tracking module
+
+## Compliance
+
+This decision aligns with:
+- FR-401: Cost tracking per agent/project
+- FR-402: Budget enforcement
+- NFR-302: Budget alert system
+
+---
+
+*This ADR establishes the cost tracking and budget management architecture for Syndarix.*