docs: add architecture decision records (ADRs) for key technical choices

- Added the following ADRs to the `docs/adrs/` directory:
  - ADR-001: MCP Integration Architecture
  - ADR-002: Real-time Communication Architecture
  - ADR-003: Background Task Architecture
  - ADR-004: LLM Provider Abstraction
  - ADR-005: Technology Stack Selection
- Each ADR details the context, decision drivers, considered options, final decisions, and implementation plans.
- Documentation aligns technical choices with architecture principles and system requirements for Syndarix.
2025-12-29 13:16:02 +01:00
parent a6a336b66e
commit 6e3cdebbfb
7 changed files with 1565 additions and 0 deletions
--- a/docs/adrs/ADR-004-llm-provider-abstraction.md
+++ b/docs/adrs/ADR-004-llm-provider-abstraction.md
@@ -0,0 +1,189 @@
+# ADR-004: LLM Provider Abstraction
+
+**Status:** Accepted
+**Date:** 2025-12-29
+**Deciders:** Architecture Team
+**Related Spikes:** SPIKE-005
+
+---
+
+## Context
+
+Syndarix agents require access to large language models (LLMs) from multiple providers:
+- **Anthropic** (Claude) - Primary provider
+- **OpenAI** (GPT-4) - Fallback provider
+- **Local models** (Ollama/Llama) - Cost optimization, privacy
+
+We need a unified abstraction layer that provides:
+- Consistent API across providers
+- Automatic failover on errors
+- Usage tracking and cost management
+- Rate limiting compliance
+
+## Decision Drivers
+
+- **Reliability:** Automatic failover on provider outages
+- **Cost Control:** Track and limit API spending
+- **Flexibility:** Easy to add/swap providers
+- **Consistency:** Single interface for all agents
+- **Async Support:** Compatible with async FastAPI
+
+## Considered Options
+
+### Option 1: Direct Provider SDKs
+Use the Anthropic and OpenAI SDKs directly behind a custom abstraction layer.
+
+**Pros:**
+- Full control over implementation
+- No dependencies beyond the provider SDKs
+
+**Cons:**
+- Significant development effort
+- Must maintain failover logic
+- Must track token costs manually
+
+### Option 2: LiteLLM (Selected)
+Use LiteLLM as unified abstraction layer.
+
+**Pros:**
+- Unified API for 100+ providers
+- Built-in failover and routing
+- Automatic token counting
+- Cost tracking built-in
+- Redis caching support
+- Active community
+
+**Cons:**
+- External dependency
+- May lag behind provider SDK updates
+
+### Option 3: LangChain
+Use LangChain's LLM abstraction.
+
+**Pros:**
+- Large ecosystem
+- Many integrations
+
+**Cons:**
+- Heavy dependency
+- Overkill for just LLM abstraction
+- Complexity overhead
+
+## Decision
+
+**Adopt Option 2: LiteLLM for unified LLM provider abstraction.**
+
+LiteLLM provides the reliability, monitoring, and multi-provider support needed with minimal overhead.
+
+## Implementation
+
+### Model Groups
+
+| Group Name | Use Case | Primary Model | Fallback |
+|------------|----------|---------------|----------|
+| `high-reasoning` | Complex analysis, architecture | Claude 3.5 Sonnet | GPT-4 Turbo |
+| `fast-response` | Quick tasks, simple queries | Claude 3 Haiku | GPT-4o Mini |
+| `cost-optimized` | High-volume, non-critical | Local Llama 3 | Claude 3 Haiku |
+
+### Failover Chain
+
+```
+Claude 3.5 Sonnet (Anthropic)
+         │
+         ▼ (on failure)
+    GPT-4 Turbo (OpenAI)
+         │
+         ▼ (on failure)
+    Llama 3 (Ollama/Local)
+         │
+         ▼ (on failure)
+    Error with retry
+```
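+
+In the selected design this chain is delegated to LiteLLM's router. Purely as an illustration of the ordering, a library-free sketch might look like the following; `complete_with_failover`, `AllProvidersFailed`, and the provider callables are hypothetical names, not part of LiteLLM or Syndarix.
+
+```python
+from typing import Callable, Sequence
+
+
+class AllProvidersFailed(Exception):
+    """Raised when every provider in the chain has failed."""
+
+
+def complete_with_failover(
+    providers: Sequence[tuple[str, Callable[[str], str]]],
+    prompt: str,
+) -> tuple[str, str]:
+    """Try each (name, call) pair in order; return (name, reply) from the first success."""
+    errors: list[str] = []
+    for name, call in providers:
+        try:
+            return name, call(prompt)
+        except Exception as exc:  # any provider error moves on to the next link
+            errors.append(f"{name}: {exc}")
+    # Every link failed: surface one error so the caller can decide whether to retry.
+    raise AllProvidersFailed("; ".join(errors))
+```
+
+The final "Error with retry" step is left to the caller here, which can catch `AllProvidersFailed` and re-run the whole chain after a delay.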
+
+### LLM Gateway Service
+
+```python
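+from litellm import ModelResponse, Router
+
+
+class LLMGateway:
+    def __init__(self, model_list: list[dict]):
+        # model_list holds the LiteLLM deployment entries for every model group.
+        self.router = Router(
+            model_list=model_list,
+            fallbacks=[
+                # Model groups tried in order when a "high-reasoning" request fails.
+                {"high-reasoning": ["high-reasoning", "local-fallback"]},
+            ],
+            routing_strategy="latency-based-routing",
+            num_retries=3,
+        )
+
+    async def complete(
+        self,
+        agent_id: str,
+        project_id: str,
+        messages: list[dict],
+        model_preference: str = "high-reasoning",
+    ) -> ModelResponse:
+        # Route the request to the preferred model group; the router handles
+        # retries and fallbacks.
+        response = await self.router.acompletion(
+            model=model_preference,
+            messages=messages,
+        )
+        await self._track_usage(agent_id, project_id, response)
+        return response
+
+    async def _track_usage(
+        self, agent_id: str, project_id: str, response: ModelResponse
+    ) -> None:
+        """Placeholder: record token usage and cost per agent and project."""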
+```
+
+### Cost Tracking
+
+| Model | Input (per 1M tokens) | Output (per 1M tokens) |
+|-------|----------------------|------------------------|
+| Claude 3.5 Sonnet | $3.00 | $15.00 |
+| Claude 3 Haiku | $0.25 | $1.25 |
+| GPT-4 Turbo | $10.00 | $30.00 |
+| GPT-4o Mini | $0.15 | $0.60 |
+| Ollama (local) | $0.00 | $0.00 |
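+
+LiteLLM reports cost itself, so the following is only a sketch of the arithmetic behind the table, assuming prices are applied linearly per token. The dictionary keys and the `estimate_cost` helper are hypothetical labels for illustration, not LiteLLM model identifiers.
+
+```python
+from decimal import Decimal
+
+# USD per 1M tokens as (input, output), copied from the table above.
+PRICES_PER_MILLION = {
+    "claude-3.5-sonnet": (Decimal("3.00"), Decimal("15.00")),
+    "claude-3-haiku": (Decimal("0.25"), Decimal("1.25")),
+    "gpt-4-turbo": (Decimal("10.00"), Decimal("30.00")),
+    "gpt-4o-mini": (Decimal("0.15"), Decimal("0.60")),
+    "ollama-local": (Decimal("0.00"), Decimal("0.00")),
+}
+
+
+def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
+    """Return the estimated USD cost of a single call."""
+    input_price, output_price = PRICES_PER_MILLION[model]
+    million = Decimal(1_000_000)
+    return (input_tokens * input_price + output_tokens * output_price) / million
+```
+
+For example, a call with 2,000 input tokens and 500 output tokens on `gpt-4o-mini` comes to $0.0006.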
+
+### Agent Type Mapping
+
+| Agent Type | Model Preference | Rationale |
+|------------|------------------|-----------|
+| Product Owner | high-reasoning | Complex requirements analysis |
+| Software Architect | high-reasoning | Architecture decisions |
+| Software Engineer | high-reasoning | Code generation |
+| QA Engineer | fast-response | Test case generation |
+| DevOps Engineer | fast-response | Config generation |
+| Project Manager | fast-response | Status updates |
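+
+One way to express this mapping in code is a plain lookup table. This is a minimal sketch: the snake_case keys and the `model_preference_for` helper are hypothetical, and falling back to `high-reasoning` for unknown agent types is an assumption that mirrors the gateway's default `model_preference`.
+
+```python
+# Agent type -> model group, taken from the table above.
+AGENT_MODEL_PREFERENCE = {
+    "product_owner": "high-reasoning",
+    "software_architect": "high-reasoning",
+    "software_engineer": "high-reasoning",
+    "qa_engineer": "fast-response",
+    "devops_engineer": "fast-response",
+    "project_manager": "fast-response",
+}
+
+# Assumption: unknown agent types use the gateway's default group.
+DEFAULT_MODEL_PREFERENCE = "high-reasoning"
+
+
+def model_preference_for(agent_type: str) -> str:
+    """Return the model group name for an agent type."""
+    return AGENT_MODEL_PREFERENCE.get(agent_type, DEFAULT_MODEL_PREFERENCE)
+```
+
+The returned group name can be passed as `model_preference` to `LLMGateway.complete`.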
+
+### Caching Strategy
+
+- **Redis-backed cache** for repeated queries
+- **TTL:** 1 hour for general queries
+- **Skip cache:** For context-dependent generation
+- **Cache key:** Hash of (model, messages, temperature)
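+
+A cache key of this shape could be derived as sketched below, assuming the inputs are JSON-serializable and SHA-256 is an acceptable hash. The `cache_key` function is a hypothetical helper; LiteLLM's own cache may build its keys differently.
+
+```python
+import hashlib
+import json
+
+
+def cache_key(model: str, messages: list[dict], temperature: float) -> str:
+    """Return a stable hex digest for (model, messages, temperature)."""
+    # sort_keys and fixed separators make the serialization deterministic,
+    # so equal requests always hash to the same key.
+    payload = json.dumps(
+        {"model": model, "messages": messages, "temperature": temperature},
+        sort_keys=True,
+        separators=(",", ":"),
+    )
+    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
+```
+
+Message order is preserved in the key, so the same messages in a different order produce a different cache entry.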
+
+## Consequences
+
+### Positive
+- Single interface for all LLM operations
+- Automatic failover improves reliability
+- Built-in cost tracking and budgeting
+- Easy to add new providers
+- Caching reduces API costs
+
+### Negative
+- Dependency on LiteLLM library
+- May lag behind provider SDK features
+- Additional abstraction layer
+
+### Mitigation
+- Pin the LiteLLM version and test before upgrading
+- Keep direct provider SDK access available for features LiteLLM does not yet expose
+- Monitor LiteLLM releases for breaking changes
+
+## Compliance
+
+This decision aligns with:
+- FR-101: Agent type model configuration
+- NFR-103: Agent response time targets
+- NFR-402: Failover requirements
+- TR-001: LLM API unavailability mitigation
+
+---
+
+*This ADR supersedes any previous decisions regarding LLM integration.*