
ADR-004: LLM Provider Abstraction

Status: Accepted
Date: 2025-12-29
Deciders: Architecture Team
Related Spikes: SPIKE-005


Context

Syndarix agents require access to large language models (LLMs) from multiple providers:

  • Anthropic (Claude) - Primary provider
  • OpenAI (GPT-4) - Fallback provider
  • Local models (Ollama/Llama) - Cost optimization, privacy

We need a unified abstraction layer that provides:

  • Consistent API across providers
  • Automatic failover on errors
  • Usage tracking and cost management
  • Rate limiting compliance

Decision Drivers

  • Reliability: Automatic failover on provider outages
  • Cost Control: Track and limit API spending
  • Flexibility: Easy to add/swap providers
  • Consistency: Single interface for all agents
  • Async Support: Compatible with async FastAPI

Considered Options

Option 1: Direct Provider SDKs

Use the Anthropic and OpenAI SDKs directly, with a custom abstraction layer built in-house.

Pros:

  • Full control over implementation
  • No third-party abstraction dependency (only the provider SDKs themselves)

Cons:

  • Significant development effort
  • Must maintain failover logic
  • Must track token costs manually

Option 2: LiteLLM (Selected)

Use LiteLLM as the unified abstraction layer.

Pros:

  • Unified API for 100+ providers
  • Built-in failover and routing
  • Automatic token counting
  • Cost tracking built-in
  • Redis caching support
  • Active community

Cons:

  • External dependency
  • May lag behind provider SDK updates

Option 3: LangChain

Use LangChain's LLM abstraction.

Pros:

  • Large ecosystem
  • Many integrations

Cons:

  • Heavy dependency
  • Overkill for just LLM abstraction
  • Complexity overhead

Decision

Adopt Option 2: LiteLLM for unified LLM provider abstraction.

LiteLLM provides the reliability, monitoring, and multi-provider support needed with minimal overhead.

Implementation

Model Groups

| Group Name     | Use Case                       | Primary Model     | Fallback      |
|----------------|--------------------------------|-------------------|---------------|
| high-reasoning | Complex analysis, architecture | Claude 3.5 Sonnet | GPT-4 Turbo   |
| fast-response  | Quick tasks, simple queries    | Claude 3 Haiku    | GPT-4o Mini   |
| cost-optimized | High-volume, non-critical      | Local Llama 3     | Claude 3 Haiku |
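
A minimal sketch of how these groups might be declared as a LiteLLM Router model_list (the specific model identifiers, environment-variable names, and local api_base are illustrative assumptions, not final configuration):

import os

# Illustrative Router model_list: deployments that share a model_name belong
# to the same group and act as alternatives within it.
model_list = [
    {
        "model_name": "high-reasoning",
        "litellm_params": {
            "model": "anthropic/claude-3-5-sonnet-20241022",
            "api_key": os.environ.get("ANTHROPIC_API_KEY"),
        },
    },
    {
        "model_name": "high-reasoning",
        "litellm_params": {
            "model": "openai/gpt-4-turbo",
            "api_key": os.environ.get("OPENAI_API_KEY"),
        },
    },
    {
        "model_name": "fast-response",
        "litellm_params": {
            "model": "anthropic/claude-3-haiku-20240307",
            "api_key": os.environ.get("ANTHROPIC_API_KEY"),
        },
    },
    {
        "model_name": "cost-optimized",
        "litellm_params": {
            "model": "ollama/llama3",
            "api_base": "http://localhost:11434",
        },
    },
]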

Failover Chain

Claude 3.5 Sonnet (Anthropic)
         │
         ▼ (on failure)
    GPT-4 Turbo (OpenAI)
         │
         ▼ (on failure)
    Llama 3 (Ollama/Local)
         │
         ▼ (on failure)
    Error with retry

LLM Gateway Service

from litellm import Router


class LLMGateway:
    def __init__(self):
        # Router over the model groups defined above; retries within a group
        # before falling through to the configured fallback group(s).
        self.router = Router(
            model_list=model_list,  # deployments for each model group
            fallbacks=[
                {"high-reasoning": ["high-reasoning", "local-fallback"]},
            ],
            routing_strategy="latency-based-routing",
            num_retries=3,
        )

    async def complete(
        self,
        agent_id: str,
        project_id: str,
        messages: list[dict],
        model_preference: str = "high-reasoning",
    ) -> dict:
        # Route the request to the preferred model group; LiteLLM handles
        # provider selection, retries, and failover.
        response = await self.router.acompletion(
            model=model_preference,
            messages=messages,
        )
        # Record token usage and cost per agent and project for budgeting.
        await self._track_usage(agent_id, project_id, response)
        return response
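
A hypothetical call site, for example inside an async FastAPI handler (the identifiers below are placeholders):

gateway = LLMGateway()

response = await gateway.complete(
    agent_id="agent-123",      # placeholder identifiers
    project_id="project-456",
    messages=[{"role": "user", "content": "Summarize the open tasks."}],
    model_preference="fast-response",
)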

Cost Tracking

| Model             | Input (per 1M tokens) | Output (per 1M tokens) |
|-------------------|-----------------------|------------------------|
| Claude 3.5 Sonnet | $3.00                 | $15.00                 |
| Claude 3 Haiku    | $0.25                 | $1.25                  |
| GPT-4 Turbo       | $10.00                | $30.00                 |
| GPT-4o Mini       | $0.15                 | $0.60                  |
| Ollama (local)    | $0.00                 | $0.00                  |
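
A minimal sketch of deriving per-request cost from these rates and the token counts reported on a response (the helper name and model keys are illustrative):

# USD per 1M tokens, mirroring the table above: (input_rate, output_rate).
PRICING = {
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-4o-mini": (0.15, 0.60),
    "ollama-llama3": (0.00, 0.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Scale the per-million-token rates by the actual token counts.
    input_rate, output_rate = PRICING[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

For example, a Claude 3.5 Sonnet call with 2,000 input and 500 output tokens costs (2,000 × $3.00 + 500 × $15.00) / 1,000,000 ≈ $0.0135.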

Agent Type Mapping

| Agent Type         | Model Preference | Rationale                      |
|--------------------|------------------|--------------------------------|
| Product Owner      | high-reasoning   | Complex requirements analysis  |
| Software Architect | high-reasoning   | Architecture decisions         |
| Software Engineer  | high-reasoning   | Code generation                |
| QA Engineer        | fast-response    | Test case generation           |
| DevOps Engineer    | fast-response    | Config generation              |
| Project Manager    | fast-response    | Status updates                 |
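
This mapping could be kept as a simple lookup consulted when an agent calls the gateway (the constant name and the default group are assumptions):

# Agent type -> model group passed as model_preference to LLMGateway.complete().
AGENT_MODEL_PREFERENCES = {
    "product_owner": "high-reasoning",
    "software_architect": "high-reasoning",
    "software_engineer": "high-reasoning",
    "qa_engineer": "fast-response",
    "devops_engineer": "fast-response",
    "project_manager": "fast-response",
}

def model_preference_for(agent_type: str) -> str:
    # Unrecognized agent types fall back to the cost-optimized group (assumption).
    return AGENT_MODEL_PREFERENCES.get(agent_type, "cost-optimized")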

Caching Strategy

  • Redis-backed cache for repeated queries
  • TTL: 1 hour for general queries
  • Skip cache: For context-dependent generation
  • Cache key: Hash of (model, messages, temperature)
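
LiteLLM ships its own Redis-backed cache (noted under Option 2); the sketch below only illustrates the cache-key scheme from the last bullet, hashing exactly the parameters that affect the output (prefix and names are illustrative):

import hashlib
import json

CACHE_TTL_SECONDS = 3600  # 1-hour TTL for general queries

def cache_key(model: str, messages: list[dict], temperature: float) -> str:
    # Deterministic key over (model, messages, temperature): identical requests
    # hit the same Redis entry; context-dependent calls skip the cache entirely.
    payload = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,
    )
    return "llm:cache:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()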

Consequences

Positive

  • Single interface for all LLM operations
  • Automatic failover improves reliability
  • Built-in cost tracking and budgeting
  • Easy to add new providers
  • Caching reduces API costs

Negative

  • Dependency on LiteLLM library
  • May lag behind provider SDK features
  • Additional abstraction layer

Mitigation

  • Pin the LiteLLM version and test before upgrades
  • Direct SDK access available if needed
  • Monitor LiteLLM updates for breaking changes

Compliance

This decision aligns with:

  • FR-101: Agent type model configuration
  • NFR-103: Agent response time targets
  • NFR-402: Failover requirements
  • TR-001: LLM API unavailability mitigation

This ADR supersedes any previous decisions regarding LLM integration.