[SPIKE-005] LLM Provider Abstraction #5

Closed
opened 2025-12-29 03:50:15 +00:00 by cardosofelipe · 1 comment

Objective

Design an abstraction layer that supports multiple LLM providers with failover capability.

Providers to Support

  1. Anthropic (Claude) - Primary
  2. OpenAI (GPT-4) - Secondary/Failover
  3. Ollama - Self-hosted option

Key Questions

  1. How do we abstract provider-specific APIs? (see the interface sketch after this list)
  2. How do we handle failover between providers?
  3. How do we normalize tool/function calling across providers?
  4. How do we track token usage and costs per provider?
  5. How do we handle streaming responses uniformly?
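
To make question 1 concrete, here is a minimal sketch of what a provider-agnostic client interface could look like. All names and signatures are hypothetical, not a committed design:

```python
# Hypothetical provider-agnostic interface: each adapter (Anthropic, OpenAI,
# Ollama) would implement this, normalizing tool calls and streaming.
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Protocol


@dataclass
class LLMResponse:
    text: str
    tool_calls: list[dict[str, Any]] = field(default_factory=list)  # normalized across providers
    input_tokens: int = 0   # feeds per-provider token/cost tracking
    output_tokens: int = 0


class LLMProvider(Protocol):
    async def complete(
        self,
        messages: list[dict[str, str]],
        tools: list[dict[str, Any]] | None = None,
    ) -> LLMResponse:
        """Single completion with tool-call output in a common shape."""
        ...

    def stream(self, messages: list[dict[str, str]]) -> AsyncIterator[str]:
        """Uniform streaming interface yielding text deltas."""
        ...
```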

Research Areas

  • LiteLLM or similar abstraction libraries
  • Provider-specific quirks (tool calling, context limits)
  • Failover patterns and health checks
  • Cost tracking per model/call

Expected Deliverables

  • Unified LLM client interface
  • Provider implementations (Anthropic, OpenAI)
  • Failover logic (one possible shape is sketched after this list)
  • Cost tracking integration
  • ADR documenting the pattern
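
As a starting point for the failover deliverable, one possible shape of the logic, building on the hypothetical LLMProvider/LLMResponse interface sketched above:

```python
# Sketch only: try providers in priority order and fall through on error.
import logging

from llm_client import LLMProvider, LLMResponse  # hypothetical module from the sketch above

logger = logging.getLogger(__name__)


async def complete_with_failover(
    providers: list[LLMProvider],        # e.g. [anthropic, openai, ollama], in priority order
    messages: list[dict[str, str]],
) -> LLMResponse:
    last_error: Exception | None = None
    for provider in providers:
        try:
            return await provider.complete(messages)
        except Exception as exc:          # narrow to provider-specific errors in real code
            logger.warning("LLM provider %r failed, trying next: %s", provider, exc)
            last_error = exc
    raise RuntimeError("all LLM providers failed") from last_error
```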

Acceptance Criteria

  • Same code works with multiple providers
  • Automatic failover on provider error
  • Tool calling works uniformly
  • Token/cost tracking functional
  • Streaming works with all providers

Labels

spike, architecture, llm

Author
Owner

Spike Completed

Research completed and documented in:

  • Spike Document: docs/spikes/SPIKE-005-llm-provider-abstraction.md
  • ADR: docs/adrs/ADR-004-llm-provider-abstraction.md

Key Findings:

  • LiteLLM provides a unified API for 100+ LLM providers
  • Built-in failover and routing with a latency-based strategy
  • Model groups: high-reasoning (Claude 3.5 Sonnet), fast-response (Claude 3 Haiku)
  • Cost tracking per agent/project with token usage
  • Redis-backed caching for repeated queries

Decision:

Adopt LiteLLM as the unified LLM abstraction layer with automatic failover, usage-based routing, and Redis-backed caching.
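
A rough sketch of how that could be wired up with LiteLLM's Router; the model IDs, environment variables, and Redis settings below are placeholders rather than the final configuration:

```python
# Sketch of the adopted LiteLLM setup: model groups, latency-based routing,
# retries for failover, Redis-backed caching, and per-call cost lookup.
import os

import litellm
from litellm import Router

router = Router(
    model_list=[
        {   # "high-reasoning" group: Claude 3.5 Sonnet primary, GPT-4-class failover
            "model_name": "high-reasoning",
            "litellm_params": {
                "model": "anthropic/claude-3-5-sonnet-20241022",
                "api_key": os.environ["ANTHROPIC_API_KEY"],
            },
        },
        {
            "model_name": "high-reasoning",
            "litellm_params": {
                "model": "openai/gpt-4o",
                "api_key": os.environ["OPENAI_API_KEY"],
            },
        },
        {   # "fast-response" group: Claude 3 Haiku
            "model_name": "fast-response",
            "litellm_params": {
                "model": "anthropic/claude-3-haiku-20240307",
                "api_key": os.environ["ANTHROPIC_API_KEY"],
            },
        },
    ],
    routing_strategy="latency-based-routing",
    num_retries=2,                       # retries across deployments in a group give failover
    cache_responses=True,                # Redis-backed cache for repeated queries
    redis_host=os.environ.get("REDIS_HOST", "localhost"),
    redis_port=6379,
)

response = router.completion(
    model="high-reasoning",              # callers address the model group, not a specific provider
    messages=[{"role": "user", "content": "Summarize the open tickets."}],
)

# Per-call cost, feeding the per-agent/per-project tracking
cost_usd = litellm.completion_cost(completion_response=response)
```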

This spike can be closed.
