# ADR-004: LLM Provider Abstraction

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-005

---

## Context

Syndarix agents require access to large language models (LLMs) from multiple providers:

- **Anthropic** (Claude) - Primary provider
- **OpenAI** (GPT-4) - Fallback provider
- **Local models** (Ollama/Llama) - Cost optimization, privacy

We need a unified abstraction layer that provides:

- Consistent API across providers
- Automatic failover on errors
- Usage tracking and cost management
- Rate limiting compliance

## Decision Drivers

- **Reliability:** Automatic failover on provider outages
- **Cost Control:** Track and limit API spending
- **Flexibility:** Easy to add/swap providers
- **Consistency:** Single interface for all agents
- **Async Support:** Compatible with async FastAPI

## Considered Options

### Option 1: Direct Provider SDKs

Use the Anthropic and OpenAI SDKs directly with a custom abstraction.

**Pros:**

- Full control over implementation
- No external dependencies

**Cons:**

- Significant development effort
- Must maintain failover logic
- Must track token costs manually

### Option 2: LiteLLM (Selected)

Use LiteLLM as the unified abstraction layer.

**Pros:**

- Unified API for 100+ providers
- Built-in failover and routing
- Automatic token counting
- Cost tracking built-in
- Redis caching support
- Active community

**Cons:**

- External dependency
- May lag behind provider SDK updates

### Option 3: LangChain

Use LangChain's LLM abstraction.

**Pros:**

- Large ecosystem
- Many integrations

**Cons:**

- Heavy dependency
- Overkill for just LLM abstraction
- Complexity overhead

## Decision

**Adopt Option 2: LiteLLM for unified LLM provider abstraction.**

LiteLLM provides the reliability, monitoring, and multi-provider support needed with minimal overhead.
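To make the "unified API" claim concrete, the following is a minimal sketch of a provider-agnostic call through LiteLLM. The model identifiers shown are illustrative assumptions, not part of this decision; actual identifiers come from the deployment configuration described below.

```python
# Minimal sketch: the same call shape works across providers via LiteLLM.
# Model identifiers are illustrative; real IDs are set in deployment config.
import litellm


async def ask(model: str, prompt: str) -> str:
    """Send one prompt through LiteLLM's provider-agnostic completion API."""
    response = await litellm.acompletion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # LiteLLM normalizes responses to an OpenAI-style structure.
    return response.choices[0].message.content


# The same function covers Anthropic, OpenAI, and local Ollama backends:
#   await ask("claude-3-5-sonnet-20241022", "Summarize ADR-004")
#   await ask("gpt-4-turbo", "Summarize ADR-004")
#   await ask("ollama/llama3", "Summarize ADR-004")
```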
## Implementation

### Model Groups

| Group Name | Use Case | Primary Model | Fallback |
|------------|----------|---------------|----------|
| `high-reasoning` | Complex analysis, architecture | Claude 3.5 Sonnet | GPT-4 Turbo |
| `fast-response` | Quick tasks, simple queries | Claude 3 Haiku | GPT-4o Mini |
| `cost-optimized` | High-volume, non-critical | Local Llama 3 | Claude 3 Haiku |

### Failover Chain

```
Claude 3.5 Sonnet (Anthropic)
  │
  ▼ (on failure)
GPT-4 Turbo (OpenAI)
  │
  ▼ (on failure)
Llama 3 (Ollama/Local)
  │
  ▼ (on failure)
Error with retry
```

### LLM Gateway Service

```python
from litellm import Router


class LLMGateway:
    def __init__(self):
        # model_list defines the model groups (see table above) and
        # per-provider credentials; it is loaded from deployment config.
        self.router = Router(
            model_list=model_list,
            fallbacks=[
                {"high-reasoning": ["high-reasoning", "local-fallback"]},
            ],
            routing_strategy="latency-based-routing",
            num_retries=3,
        )

    async def complete(
        self,
        agent_id: str,
        project_id: str,
        messages: list[dict],
        model_preference: str = "high-reasoning",
    ) -> dict:
        # Route to the preferred model group; the Router handles retries
        # and fallback to the configured alternatives.
        response = await self.router.acompletion(
            model=model_preference,
            messages=messages,
        )
        # Record token usage and cost per agent and project.
        await self._track_usage(agent_id, project_id, response)
        return response
```

### Cost Tracking

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|-----------------------|------------------------|
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-4o Mini | $0.15 | $0.60 |
| Ollama (local) | $0.00 | $0.00 |

### Agent Type Mapping

| Agent Type | Model Preference | Rationale |
|------------|------------------|-----------|
| Product Owner | high-reasoning | Complex requirements analysis |
| Software Architect | high-reasoning | Architecture decisions |
| Software Engineer | high-reasoning | Code generation |
| QA Engineer | fast-response | Test case generation |
| DevOps Engineer | fast-response | Config generation |
| Project Manager | fast-response | Status updates |

### Caching Strategy

- **Redis-backed cache** for repeated queries
- **TTL:** 1 hour for general queries
- **Skip cache:** for context-dependent generation
- **Cache key:** hash of (model, messages, temperature)

## Consequences

### Positive

- Single interface for all LLM operations
- Automatic failover improves reliability
- Built-in cost tracking and budgeting
- Easy to add new providers
- Caching reduces API costs

### Negative

- Dependency on the LiteLLM library
- May lag behind provider SDK features
- Additional abstraction layer

### Mitigation

- Pin the LiteLLM version; test before upgrades
- Direct SDK access remains available if needed
- Monitor LiteLLM releases for breaking changes

## Compliance

This decision aligns with:

- FR-101: Agent type model configuration
- NFR-103: Agent response time targets
- NFR-402: Failover requirements
- TR-001: LLM API unavailability mitigation

---

*This ADR supersedes any previous decisions regarding LLM integration.*
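For reference, a minimal sketch of the `model_list` configuration assumed by the gateway's `Router` above. The group names mirror the Model Groups table; the exact model identifiers, environment variable names, and the `local-fallback` group are illustrative assumptions for this sketch, and how deployments within a group are ordered depends on the chosen routing strategy.

```python
import os

# Illustrative model_list consumed by Router(model_list=...) in the gateway.
# Multiple entries sharing a model_name form one routable group.
model_list = [
    {
        "model_name": "high-reasoning",
        "litellm_params": {
            "model": "claude-3-5-sonnet-20241022",          # primary
            "api_key": os.environ["ANTHROPIC_API_KEY"],
        },
    },
    {
        "model_name": "high-reasoning",
        "litellm_params": {
            "model": "gpt-4-turbo",                          # fallback
            "api_key": os.environ["OPENAI_API_KEY"],
        },
    },
    {
        "model_name": "fast-response",
        "litellm_params": {
            "model": "claude-3-haiku-20240307",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
        },
    },
    {
        "model_name": "local-fallback",
        "litellm_params": {
            "model": "ollama/llama3",
            "api_base": "http://localhost:11434",            # local Ollama
        },
    },
]
```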