# ADR-004: LLM Provider Abstraction

- **Status:** Accepted
- **Date:** 2025-12-29
- **Deciders:** Architecture Team
- **Related Spikes:** SPIKE-005

## Context
Syndarix agents require access to large language models (LLMs) from multiple providers:
- Anthropic (Claude) - Primary provider
- OpenAI (GPT-4) - Fallback provider
- Local models (Ollama/Llama) - Cost optimization, privacy
We need a unified abstraction layer that provides:
- Consistent API across providers
- Automatic failover on errors
- Usage tracking and cost management
- Rate limiting compliance
## Decision Drivers
- Reliability: Automatic failover on provider outages
- Cost Control: Track and limit API spending
- Flexibility: Easy to add/swap providers
- Consistency: Single interface for all agents
- Async Support: Compatible with async FastAPI
## Considered Options

### Option 1: Direct Provider SDKs

Use the Anthropic and OpenAI SDKs directly, behind a custom in-house abstraction.
Pros:
- Full control over implementation
- No external dependencies
Cons:
- Significant development effort
- Must maintain failover logic
- Must track token costs manually
### Option 2: LiteLLM (Selected)

Use LiteLLM as a unified abstraction layer.
Pros:
- Unified API for 100+ providers
- Built-in failover and routing
- Automatic token counting
- Cost tracking built-in
- Redis caching support
- Active community
Cons:
- External dependency
- May lag behind provider SDK updates
### Option 3: LangChain
Use LangChain's LLM abstraction.
Pros:
- Large ecosystem
- Many integrations
Cons:
- Heavy dependency
- Overkill for just LLM abstraction
- Complexity overhead
## Decision

Adopt **Option 2: LiteLLM** for unified LLM provider abstraction.
LiteLLM provides the reliability, monitoring, and multi-provider support needed with minimal overhead.
## Implementation

### Model Groups

| Group Name | Use Case | Primary Model | Fallback |
|---|---|---|---|
| high-reasoning | Complex analysis, architecture | Claude 3.5 Sonnet | GPT-4 Turbo |
| fast-response | Quick tasks, simple queries | Claude 3 Haiku | GPT-4o Mini |
| cost-optimized | High-volume, non-critical | Local Llama 3 | Claude 3 Haiku |
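
As a rough sketch, these groups could be expressed as a LiteLLM Router `model_list`. The concrete model identifiers, environment-variable names, and the `local-fallback` alias (referenced by the gateway code below) are illustrative assumptions, not values prescribed by this ADR.

```python
import os

# Illustrative LiteLLM Router configuration for the model groups above.
model_list = [
    {
        "model_name": "high-reasoning",
        "litellm_params": {
            "model": "anthropic/claude-3-5-sonnet-20241022",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
        },
    },
    {
        # Same group name: an additional deployment the Router can route or fall back to.
        "model_name": "high-reasoning",
        "litellm_params": {
            "model": "openai/gpt-4-turbo",
            "api_key": os.environ["OPENAI_API_KEY"],
        },
    },
    {
        "model_name": "fast-response",
        "litellm_params": {
            "model": "anthropic/claude-3-haiku-20240307",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
        },
    },
    {
        "model_name": "cost-optimized",
        "litellm_params": {
            "model": "ollama/llama3",
            "api_base": "http://localhost:11434",
        },
    },
    # A "local-fallback" alias used by the gateway's fallbacks could point at the
    # same Ollama deployment.
]
```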
### Failover Chain

```
Claude 3.5 Sonnet (Anthropic)
        │
        ▼ (on failure)
GPT-4 Turbo (OpenAI)
        │
        ▼ (on failure)
Llama 3 (Ollama/Local)
        │
        ▼ (on failure)
Error with retry
```
### LLM Gateway Service
```python
from litellm import Router


class LLMGateway:
    def __init__(self):
        # model_list is the Router configuration built from the Model Groups above
        # (defined at module level / loaded from settings).
        self.router = Router(
            model_list=model_list,
            fallbacks=[
                {"high-reasoning": ["high-reasoning", "local-fallback"]},
            ],
            routing_strategy="latency-based-routing",
            num_retries=3,
        )

    async def complete(
        self,
        agent_id: str,
        project_id: str,
        messages: list[dict],
        model_preference: str = "high-reasoning",
    ) -> dict:
        # Route to the preferred model group; LiteLLM handles retries and fallback.
        response = await self.router.acompletion(
            model=model_preference,
            messages=messages,
        )
        # Persist token usage and cost per agent/project (implemented elsewhere).
        await self._track_usage(agent_id, project_id, response)
        return response
```
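
For context, a usage sketch under the assumptions above (a module-level `model_list`, an existing `_track_usage` implementation, and purely illustrative agent/project identifiers):

```python
import asyncio


async def demo() -> None:
    gateway = LLMGateway()
    response = await gateway.complete(
        agent_id="agent-qa-1",          # hypothetical identifiers
        project_id="proj-syndarix",
        messages=[{"role": "user", "content": "Draft test cases for the login flow."}],
        model_preference="fast-response",
    )
    # LiteLLM returns an OpenAI-style response object.
    print(response.choices[0].message.content)


asyncio.run(demo())
```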
### Cost Tracking

| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
|---|---|---|
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-4o Mini | $0.15 | $0.60 |
| Ollama (local) | $0.00 | $0.00 |
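
As a worked example of how these rates translate into per-request cost (the pricing constants and function name below are illustrative, not project code):

```python
# USD per 1M tokens, (input_rate, output_rate), from the table above.
PRICING = {
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-4o-mini": (0.15, 0.60),
    "ollama-llama3": (0.00, 0.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_rate, output_rate = PRICING[model]
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate


# e.g. a 2,000-token prompt with an 800-token completion on Claude 3.5 Sonnet:
# 0.002 * 3.00 + 0.0008 * 15.00 ≈ $0.018
print(request_cost("claude-3-5-sonnet", 2_000, 800))
```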
### Agent Type Mapping
| Agent Type | Model Preference | Rationale |
|---|---|---|
| Product Owner | high-reasoning | Complex requirements analysis |
| Software Architect | high-reasoning | Architecture decisions |
| Software Engineer | high-reasoning | Code generation |
| QA Engineer | fast-response | Test case generation |
| DevOps Engineer | fast-response | Config generation |
| Project Manager | fast-response | Status updates |
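
One way this mapping might be carried in code; the constant name and key spellings are hypothetical, not part of the ADR:

```python
# Hypothetical agent-type → model-group mapping, mirroring the table above.
AGENT_MODEL_PREFERENCE: dict[str, str] = {
    "product_owner": "high-reasoning",
    "software_architect": "high-reasoning",
    "software_engineer": "high-reasoning",
    "qa_engineer": "fast-response",
    "devops_engineer": "fast-response",
    "project_manager": "fast-response",
}

# An agent would then pass its preference into the gateway, e.g.:
# await gateway.complete(agent_id, project_id, messages,
#                        model_preference=AGENT_MODEL_PREFERENCE["qa_engineer"])
```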
### Caching Strategy
- Redis-backed cache for repeated queries
- TTL: 1 hour for general queries
- Skip cache: For context-dependent generation
- Cache key: Hash of (model, messages, temperature)
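
A minimal sketch of how such a cache key could be derived; the function name and key prefix are illustrative, and LiteLLM's built-in Redis caching could serve the same purpose:

```python
import hashlib
import json

# One-hour TTL for general queries, per the strategy above.
CACHE_TTL_SECONDS = 3600


def cache_key(model: str, messages: list[dict], temperature: float) -> str:
    """Deterministic cache key: hash of (model, messages, temperature)."""
    payload = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,
    )
    return "llm:cache:" + hashlib.sha256(payload.encode()).hexdigest()


# With redis-py, a lookup would then be roughly:
#   cached = await redis_client.get(cache_key(model, messages, temperature))
#   if cached is None:
#       response = await gateway.complete(...)   # skip cache for context-dependent calls
#       await redis_client.set(key, serialized_response, ex=CACHE_TTL_SECONDS)
```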
## Consequences

### Positive
- Single interface for all LLM operations
- Automatic failover improves reliability
- Built-in cost tracking and budgeting
- Easy to add new providers
- Caching reduces API costs
### Negative
- Dependency on LiteLLM library
- May lag behind provider SDK features
- Additional abstraction layer
### Mitigation
- Pin LiteLLM version, test before upgrades
- Direct SDK access available if needed
- Monitor LiteLLM updates for breaking changes
## Compliance
This decision aligns with:
- FR-101: Agent type model configuration
- NFR-103: Agent response time targets
- NFR-402: Failover requirements
- TR-001: LLM API unavailability mitigation
This ADR supersedes any previous decisions regarding LLM integration.