**File:** docs/adrs/ADR-001-mcp-integration-architecture.md

# ADR-001: MCP Integration Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-001

---

## Context

Syndarix requires integration with multiple external services (LLM providers, Git, issue trackers, file systems, CI/CD). The Model Context Protocol (MCP) was identified as the standard for tool integration in AI applications. We need to decide on:

1. The MCP framework to use
2. The server deployment pattern (singleton vs. per-project)
3. The scoping mechanism for multi-project/multi-agent access

## Decision Drivers

- **Simplicity:** Minimize operational complexity
- **Resource Efficiency:** Avoid spawning redundant processes
- **Consistency:** Unified interface across all integrations
- **Scalability:** Support 10+ concurrent projects
- **Maintainability:** Easy to add new MCP servers

## Considered Options

### Option 1: Per-Project MCP Servers

Spawn a dedicated set of MCP server instances for each project.

**Pros:**
- Complete isolation between projects
- Simple access control (each project owns its servers)

**Cons:**
- Resource heavy (7 servers × N projects)
- Complex orchestration
- Difficult to share cross-project resources

### Option 2: Unified Singleton MCP Servers (Selected)

Run a single instance of each MCP server type, with explicit project/agent scoping.

**Pros:**
- Resource efficient (7 servers in total)
- Simpler deployment
- Enables cross-project learning (if desired)
- Consistent management

**Cons:**
- Requires explicit scoping in every tool
- Shared state requires careful design

### Option 3: Hybrid (MCP Proxy)

Run a single proxy that routes to per-project backends.

**Pros:**
- Balances isolation and efficiency

**Cons:**
- Added complexity
- Routing overhead

## Decision

**Adopt Option 2: Unified Singleton MCP Servers with explicit scoping.**

All MCP servers are deployed as singletons. Every tool accepts `project_id` and `agent_id` parameters, which are used for:
- Access control validation
- Audit logging
- Context filtering

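A minimal sketch of this scoping contract, assuming an in-memory ACL; the store, agent IDs, and helper names are illustrative, not part of the ADR (a real deployment would back this with the project database):

```python
class AccessDenied(Exception):
    pass

# Hypothetical ACL: agent_id -> set of project_ids the agent may act on
ACL: dict[str, set[str]] = {
    "agent-po-1": {"proj-alpha"},
    "agent-eng-7": {"proj-alpha", "proj-beta"},
}

# Audit trail of (agent_id, project_id, tool) tuples
AUDIT_LOG: list[tuple[str, str, str]] = []


def validate_access(agent_id: str, project_id: str) -> None:
    """Reject any tool call whose agent is not scoped to the project."""
    if project_id not in ACL.get(agent_id, set()):
        raise AccessDenied(f"{agent_id} may not access {project_id}")


def log_tool_usage(agent_id: str, project_id: str, tool: str) -> None:
    """Record the call for auditing."""
    AUDIT_LOG.append((agent_id, project_id, tool))
```

Every singleton server would run these two checks at the top of each tool, so scoping is enforced uniformly rather than per-server.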
## Implementation

### MCP Server Registry

| Server | Port | Purpose |
|--------|------|---------|
| LLM Gateway | 9001 | Route LLM requests with failover |
| Git MCP | 9002 | Git operations across providers |
| Knowledge Base MCP | 9003 | RAG and document search |
| Issues MCP | 9004 | Issue tracking operations |
| File System MCP | 9005 | Workspace file operations |
| Code Analysis MCP | 9006 | Static analysis, linting |
| CI/CD MCP | 9007 | Pipeline operations |

### Framework Selection

Use **FastMCP 2.0** for all MCP server implementations:
- Decorator-based tool registration
- Built-in async support
- Compatible with SSE transport
- Type-safe with Pydantic

### Tool Signature Pattern

```python
@mcp.tool()
def tool_name(
    project_id: str,  # Required: project scope
    agent_id: str,    # Required: calling agent
    # ... tool-specific params
) -> Result:
    validate_access(agent_id, project_id)
    log_tool_usage(agent_id, project_id, "tool_name")
    # ... implementation
```

## Consequences

### Positive
- A single deployment per MCP type simplifies operations
- Consistent interface across all tools
- Monitoring and logging are easy to add centrally
- Cross-project analytics become possible

### Negative
- All tools must include scoping parameters
- Shared state requires careful design
- Single point of failure per MCP type (mitigated by running multiple instances)

### Neutral
- Requires an MCP client manager in the FastAPI backend
- Authentication handled internally (service tokens for v1)

## Compliance

This decision aligns with:
- FR-802: MCP-first architecture requirement
- NFR-201: Horizontal scalability requirement
- NFR-602: Centralized logging requirement

---

*This ADR supersedes any previous decisions regarding MCP architecture.*

**File:** docs/adrs/ADR-002-realtime-communication.md

# ADR-002: Real-time Communication Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-003

---

## Context

Syndarix requires real-time communication for:
- Agent activity streams
- Project progress updates
- Build/pipeline status
- Client approval requests
- Issue change notifications
- Interactive chat with agents

We need to decide between WebSocket and Server-Sent Events (SSE) for real-time data delivery.

## Decision Drivers

- **Simplicity:** Minimize implementation complexity
- **Reliability:** Built-in reconnection handling
- **Scalability:** Support 200+ concurrent connections
- **Compatibility:** Work through proxies and load balancers
- **Use Case Fit:** Match communication patterns

## Considered Options

### Option 1: WebSocket Only

Use WebSocket for all real-time communication.

**Pros:**
- Bidirectional communication
- Single protocol to manage
- Well supported in FastAPI

**Cons:**
- Manual reconnection logic required
- More complex through proxies
- Overkill for server-to-client streams

### Option 2: SSE Only

Use Server-Sent Events for all real-time communication.

**Pros:**
- Built-in automatic reconnection
- Native HTTP (proxy-friendly)
- Simpler implementation

**Cons:**
- Unidirectional only
- Browser connection limits per domain

### Option 3: SSE Primary + WebSocket for Chat (Selected)

Use SSE for server-to-client events and WebSocket for bidirectional chat.

**Pros:**
- The best tool for each use case
- SSE simplicity covers roughly 90% of needs
- WebSocket only where truly needed

**Cons:**
- Two protocols to manage

## Decision

**Adopt Option 3: SSE as the primary transport, WebSocket for interactive chat.**

### SSE Use Cases (~90%)
- Agent activity streams
- Project progress updates
- Build/pipeline status
- Approval request notifications
- Issue change notifications

### WebSocket Use Cases (~10%)
- Interactive chat with agents
- Real-time debugging sessions
- Future collaboration features

## Implementation

### Event Bus with Redis Pub/Sub

```
FastAPI Backend ──publish──> Redis Pub/Sub ──subscribe──> SSE Endpoints
                                  │
                                  └──> Other Backend Instances
```

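The fan-out above can be sketched with an in-memory stand-in for Redis Pub/Sub; the `EventBus` class, method names, and channel format are illustrative (a production bus would use `redis.asyncio` so events also reach other backend instances):

```python
import asyncio
from collections import defaultdict


class EventBus:
    """In-memory stand-in for the Redis Pub/Sub fan-out shown above."""

    def __init__(self) -> None:
        # channel name -> queues of all current subscribers
        self._channels: dict[str, list[asyncio.Queue]] = defaultdict(list)

    async def subscribe(self, channel: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._channels[channel].append(queue)
        return queue

    async def publish(self, channel: str, event: dict) -> None:
        # Every subscriber (SSE endpoint, other backend instance) gets a copy
        for queue in self._channels[channel]:
            await queue.put(event)


async def demo() -> dict:
    bus = EventBus()
    sub = await bus.subscribe("project:42")
    await bus.publish("project:42", {"type": "agent_started"})
    return await sub.get()
```

The key design point is that publishers never talk to SSE connections directly; they publish to a channel keyed by project, and each connection drains its own queue.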
### SSE Endpoint Pattern

```python
import asyncio

from fastapi import APIRouter, Request
from fastapi.responses import StreamingResponse

router = APIRouter()


@router.get("/projects/{project_id}/events")
async def project_events(project_id: str, request: Request):
    async def event_generator():
        # event_bus is the application-wide Redis-backed bus
        subscriber = await event_bus.subscribe(f"project:{project_id}")
        try:
            while not await request.is_disconnected():
                try:
                    event = await asyncio.wait_for(
                        subscriber.get_event(), timeout=30.0
                    )
                except asyncio.TimeoutError:
                    # No event within 30s: send an SSE comment as a keepalive
                    yield ": keepalive\n\n"
                    continue
                yield f"event: {event.type}\ndata: {event.json()}\n\n"
        finally:
            await subscriber.unsubscribe()

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
    )
```

### Event Types

| Category | Event Types |
|----------|-------------|
| Agent | `agent_started`, `agent_activity`, `agent_completed`, `agent_error` |
| Project | `issue_created`, `issue_updated`, `issue_closed` |
| Git | `branch_created`, `commit_pushed`, `pr_created`, `pr_merged` |
| Workflow | `approval_required`, `sprint_started`, `sprint_completed` |
| Pipeline | `pipeline_started`, `pipeline_completed`, `pipeline_failed` |

### Client Implementation

- Single SSE connection per project
- Event multiplexing through event types
- Exponential backoff on reconnection
- Native `EventSource` API with automatic reconnect

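For the WebSocket side (SSE reconnection is handled by `EventSource` itself), the backoff policy might look like this; the base and cap values are illustrative, and a real client would typically add jitter:

```python
def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay in seconds before reconnection attempt `attempt` (0-based).

    Doubles on each attempt and is capped so a long outage does not
    push the wait time to minutes.
    """
    return min(cap, base * 2 ** attempt)
```

So attempts wait 1s, 2s, 4s, 8s, ... up to the 30s cap.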
## Consequences

### Positive
- Simpler implementation for server-to-client streams
- Automatic reconnection reduces client complexity
- Works through all HTTP proxies
- Reduced server resource usage compared to WebSocket

### Negative
- Two protocols to maintain
- WebSocket requires manual reconnect logic
- SSE is limited to ~6 connections per domain (HTTP/1.1)

### Mitigation
- Use HTTP/2 where possible (higher connection limits)
- Multiplex all project events on a single connection
- Use WebSocket only for interactive chat sessions

## Compliance

This decision aligns with:
- FR-105: Real-time agent activity monitoring
- NFR-102: 200+ concurrent connections requirement
- NFR-501: Responsive UI updates

---

*This ADR supersedes any previous decisions regarding real-time communication.*

**File:** docs/adrs/ADR-003-background-task-architecture.md

# ADR-003: Background Task Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-004

---

## Context

Syndarix requires background task processing for:
- Agent actions (LLM calls, code generation)
- Git operations (clone, commit, push, PR creation)
- External synchronization (issue sync with Gitea/GitHub/GitLab)
- CI/CD pipeline triggers
- Long-running workflows (sprints, story implementation)

These tasks are too slow for synchronous API responses and need proper queuing, retries, and monitoring.

## Decision Drivers

- **Reliability:** Tasks must complete even if workers restart
- **Visibility:** Progress tracking for long-running operations
- **Scalability:** Handle concurrent agent operations
- **Rate Limiting:** Respect LLM API rate limits
- **Async Compatibility:** Work with async FastAPI

## Considered Options

### Option 1: FastAPI BackgroundTasks

Use FastAPI's built-in background tasks.

**Pros:**
- Simple, no additional infrastructure
- Direct async integration

**Cons:**
- No persistence (tasks are lost on restart)
- No retry mechanism
- No distributed workers

### Option 2: Celery + Redis (Selected)

Use Celery as the task queue with Redis as broker and result backend.

**Pros:**
- Mature and battle-tested
- Persistent task queue
- Built-in retry with backoff
- Distributed workers
- Task chaining and workflows
- Monitoring with Flower

**Cons:**
- Additional infrastructure
- Sync-only task execution (a bridge is needed for async code)

### Option 3: Dramatiq + Redis

Use Dramatiq as a simpler Celery alternative.

**Pros:**
- Simpler API than Celery
- Good async support

**Cons:**
- Less mature ecosystem
- Fewer monitoring tools

### Option 4: ARQ (Async Redis Queue)

Use ARQ for native async task processing.

**Pros:**
- Native async
- Simple API

**Cons:**
- Less feature-rich
- Smaller community

## Decision

**Adopt Option 2: Celery + Redis.**

Celery provides the reliability, monitoring, and ecosystem maturity needed for production workloads. Redis serves as both broker and result backend.

## Implementation

### Queue Architecture

```
┌─────────────────────────────────────────────────┐
│           Redis (Broker + Backend)              │
├─────────────┬─────────────┬─────────────────────┤
│ agent_queue │  git_queue  │     sync_queue      │
│ (prefetch=1)│ (prefetch=4)│    (prefetch=4)     │
└──────┬──────┴──────┬──────┴──────────┬──────────┘
       │             │                 │
       ▼             ▼                 ▼
  ┌─────────┐   ┌─────────┐      ┌─────────┐
  │  Agent  │   │   Git   │      │  Sync   │
  │ Workers │   │ Workers │      │ Workers │
  └─────────┘   └─────────┘      └─────────┘
```

### Queue Configuration

| Queue | Prefetch | Concurrency | Purpose |
|-------|----------|-------------|---------|
| `agent_queue` | 1 | 4 | LLM-based tasks (rate limited) |
| `git_queue` | 4 | 8 | Git operations |
| `sync_queue` | 4 | 4 | External sync |
| `cicd_queue` | 4 | 4 | Pipeline operations |

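A hypothetical Celery routing configuration matching the table above; the task module paths and app name are illustrative, not taken from the ADR:

```python
# Route tasks to queues by module path (illustrative names).
task_routes = {
    "syndarix.tasks.agent.*": {"queue": "agent_queue"},
    "syndarix.tasks.git.*": {"queue": "git_queue"},
    "syndarix.tasks.sync.*": {"queue": "sync_queue"},
    "syndarix.tasks.cicd.*": {"queue": "cicd_queue"},
}

# Prefetch and concurrency from the table are per-worker settings, e.g.:
#   celery -A syndarix worker -Q agent_queue --prefetch-multiplier=1 --concurrency=4
#   celery -A syndarix worker -Q git_queue --prefetch-multiplier=4 --concurrency=8
```

Keeping `agent_queue` at prefetch 1 means each worker process reserves only one LLM task at a time, which is what makes the rate limiting enforceable.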
### Task Patterns

**Progress Reporting:**

```python
@celery_app.task(bind=True)
def implement_story(self, story_id: str, agent_id: str, project_id: str):
    # `steps` is the list of implementation steps planned earlier
    # in the task (planning elided here)
    for i, step in enumerate(steps):
        self.update_state(
            state="PROGRESS",
            meta={"current": i + 1, "total": len(steps)},
        )
        # Publish an SSE event for a real-time UI update
        event_bus.publish(f"project:{project_id}", {
            "type": "agent_progress",
            "step": i + 1,
            "total": len(steps),
        })
        execute_step(step)
```

**Task Chaining:**

```python
from celery import chain

workflow = chain(
    analyze_requirements.s(story_id),
    design_solution.s(),
    implement_code.s(),
    run_tests.s(),
    create_pr.s(),
)
```

### Monitoring

- **Flower:** Web UI for task monitoring (port 5555)
- **Prometheus:** Metrics export for alerting
- **Dead Letter Queue:** Failed tasks retained for investigation

## Consequences

### Positive
- Reliable task execution with persistence
- Automatic retry with exponential backoff
- Progress tracking for long operations
- Distributed workers for scalability
- Rich monitoring and debugging tools

### Negative
- Additional infrastructure (Redis, workers)
- Celery is synchronous (an event-loop bridge is needed for async calls)
- Learning curve for task patterns

### Mitigation
- Use the existing Redis instance (already needed for SSE)
- Wrap async calls with `asyncio.run()` or an equivalent async-to-sync bridge
- Document common task patterns

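The event-loop bridge in the mitigation list can be sketched as follows; the gateway call is a stand-in for any async-only client code the task needs:

```python
import asyncio


async def call_llm_gateway(prompt: str) -> str:
    """Stand-in for an async-only client call (e.g. the LLM gateway)."""
    await asyncio.sleep(0)  # represents awaiting the real I/O
    return f"response to: {prompt}"


def agent_task(prompt: str) -> str:
    """Body of a synchronous Celery task bridging into async code.

    Each task run gets its own short-lived event loop, so the sync
    worker process can drive async client libraries safely.
    """
    return asyncio.run(call_llm_gateway(prompt))
```

One caveat worth documenting alongside the pattern: `asyncio.run()` creates and tears down a loop per call, so connection pools that must outlive one call need a longer-lived loop (e.g. one per worker process).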
## Compliance

This decision aligns with:
- FR-304: Long-running implementation workflow
- NFR-102: 500+ background jobs per minute
- NFR-402: Task reliability and fault tolerance

---

*This ADR supersedes any previous decisions regarding background task processing.*

**File:** docs/adrs/ADR-004-llm-provider-abstraction.md

# ADR-004: LLM Provider Abstraction

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-005

---

## Context

Syndarix agents require access to large language models (LLMs) from multiple providers:
- **Anthropic** (Claude): primary provider
- **OpenAI** (GPT-4): fallback provider
- **Local models** (Ollama/Llama): cost optimization, privacy

We need a unified abstraction layer that provides:
- A consistent API across providers
- Automatic failover on errors
- Usage tracking and cost management
- Rate limiting compliance

## Decision Drivers

- **Reliability:** Automatic failover on provider outages
- **Cost Control:** Track and limit API spending
- **Flexibility:** Easy to add or swap providers
- **Consistency:** A single interface for all agents
- **Async Support:** Compatible with async FastAPI

## Considered Options

### Option 1: Direct Provider SDKs

Use the Anthropic and OpenAI SDKs directly with a custom abstraction.

**Pros:**
- Full control over the implementation
- No external dependencies

**Cons:**
- Significant development effort
- Failover logic must be maintained in-house
- Token costs must be tracked manually

### Option 2: LiteLLM (Selected)

Use LiteLLM as the unified abstraction layer.

**Pros:**
- Unified API for 100+ providers
- Built-in failover and routing
- Automatic token counting
- Built-in cost tracking
- Redis caching support
- Active community

**Cons:**
- External dependency
- May lag behind provider SDK updates

### Option 3: LangChain

Use LangChain's LLM abstraction.

**Pros:**
- Large ecosystem
- Many integrations

**Cons:**
- Heavy dependency
- Overkill for LLM abstraction alone
- Complexity overhead

## Decision

**Adopt Option 2: LiteLLM for unified LLM provider abstraction.**

LiteLLM provides the reliability, monitoring, and multi-provider support needed with minimal overhead.

## Implementation

### Model Groups

| Group Name | Use Case | Primary Model | Fallback |
|------------|----------|---------------|----------|
| `high-reasoning` | Complex analysis, architecture | Claude 3.5 Sonnet | GPT-4 Turbo |
| `fast-response` | Quick tasks, simple queries | Claude 3 Haiku | GPT-4o Mini |
| `cost-optimized` | High-volume, non-critical | Local Llama 3 | Claude 3 Haiku |

### Failover Chain

```
Claude 3.5 Sonnet (Anthropic)
        │
        ▼ (on failure)
GPT-4 Turbo (OpenAI)
        │
        ▼ (on failure)
Llama 3 (Ollama/Local)
        │
        ▼ (on failure)
Error with retry
```

### LLM Gateway Service

```python
from litellm import Router


class LLMGateway:
    def __init__(self):
        # model_list maps group names to provider deployments
        # (see the Model Groups table above)
        self.router = Router(
            model_list=model_list,
            fallbacks=[
                {"high-reasoning": ["high-reasoning", "local-fallback"]},
            ],
            routing_strategy="latency-based-routing",
            num_retries=3,
        )

    async def complete(
        self,
        agent_id: str,
        project_id: str,
        messages: list[dict],
        model_preference: str = "high-reasoning",
    ) -> dict:
        response = await self.router.acompletion(
            model=model_preference,
            messages=messages,
        )
        await self._track_usage(agent_id, project_id, response)
        return response
```

### Cost Tracking

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|----------------------|------------------------|
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-4o Mini | $0.15 | $0.60 |
| Ollama (local) | $0.00 | $0.00 |

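As a sketch, per-request cost follows directly from the table above; the prices are the table's values, while the dictionary keys and function name are illustrative:

```python
# Per-million-token prices (input, output) in USD, from the table above.
PRICES = {
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-4o-mini": (0.15, 0.60),
    "ollama-local": (0.00, 0.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single completion, per the pricing table."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

This is the calculation `_track_usage` would apply to each response's token counts before aggregating per agent and per project.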
### Agent Type Mapping

| Agent Type | Model Preference | Rationale |
|------------|------------------|-----------|
| Product Owner | high-reasoning | Complex requirements analysis |
| Software Architect | high-reasoning | Architecture decisions |
| Software Engineer | high-reasoning | Code generation |
| QA Engineer | fast-response | Test case generation |
| DevOps Engineer | fast-response | Config generation |
| Project Manager | fast-response | Status updates |

### Caching Strategy

- **Redis-backed cache** for repeated queries
- **TTL:** 1 hour for general queries
- **Skip cache:** for context-dependent generation
- **Cache key:** hash of (model, messages, temperature)

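A minimal sketch of such a cache key, assuming the message list is JSON-serializable; the function name is illustrative:

```python
import hashlib
import json


def cache_key(model: str, messages: list[dict], temperature: float) -> str:
    """Deterministic cache key over (model, messages, temperature).

    sort_keys makes the JSON canonical, so logically equal requests
    hash to the same key regardless of dict insertion order.
    """
    payload = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Because temperature is part of the key, a deterministic (temperature 0) request never collides with a sampled one for the same prompt.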
## Consequences

### Positive
- A single interface for all LLM operations
- Automatic failover improves reliability
- Built-in cost tracking and budgeting
- Easy to add new providers
- Caching reduces API costs

### Negative
- Dependency on the LiteLLM library
- May lag behind provider SDK features
- An additional abstraction layer

### Mitigation
- Pin the LiteLLM version and test before upgrades
- Direct SDK access remains available if needed
- Monitor LiteLLM updates for breaking changes

## Compliance

This decision aligns with:
- FR-101: Agent type model configuration
- NFR-103: Agent response time targets
- NFR-402: Failover requirements
- TR-001: LLM API unavailability mitigation

---

*This ADR supersedes any previous decisions regarding LLM integration.*

**File:** docs/adrs/ADR-005-tech-stack-selection.md

# ADR-005: Technology Stack Selection

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team

---

## Context

Syndarix needs a robust, modern technology stack that can support:
- Multi-agent orchestration with real-time communication
- A full-stack web application with an API backend
- Background task processing for long-running operations
- Vector search for RAG (Retrieval-Augmented Generation)
- Multiple external integrations via MCP

The decision was made to build upon **PragmaStack** as the foundation, extending it with Syndarix-specific components.

## Decision Drivers

- **Productivity:** Rapid development with modern frameworks
- **Type Safety:** Minimize runtime errors
- **Async Performance:** Handle concurrent agent operations
- **Ecosystem:** Rich library support
- **Familiarity:** Team expertise with the selected technologies
- **Production-Ready:** Proven technologies for production workloads

## Decision

**Adopt PragmaStack as the foundation, with Syndarix-specific extensions.**

### Core Stack (from PragmaStack)

| Layer | Technology | Version | Rationale |
|-------|------------|---------|-----------|
| **Backend** | FastAPI | 0.115+ | Async, OpenAPI, type hints |
| **Backend Language** | Python | 3.11+ | Type hints, async/await, ecosystem |
| **Frontend** | Next.js | 16 | React 19, server components, App Router |
| **Frontend Language** | TypeScript | 5.0+ | Type safety, IDE support |
| **Database** | PostgreSQL | 15+ | Robust, extensible, pgvector |
| **ORM** | SQLAlchemy | 2.0+ | Async support, type hints |
| **Validation** | Pydantic | 2.0+ | Data validation, serialization |
| **State Management** | Zustand | 4.0+ | Simple, performant |
| **Data Fetching** | TanStack Query | 5.0+ | Caching, invalidation |
| **UI Components** | shadcn/ui | Latest | Accessible, customizable |
| **CSS** | Tailwind CSS | 4.0+ | Utility-first, fast styling |
| **Auth** | JWT | - | Dual-token (access + refresh) |

### Syndarix Extensions

| Component | Technology | Version | Purpose |
|-----------|------------|---------|---------|
| **Task Queue** | Celery | 5.3+ | Background job processing |
| **Message Broker** | Redis | 7.0+ | Celery broker, caching, pub/sub |
| **Vector Store** | pgvector | Latest | Embeddings for RAG |
| **MCP Framework** | FastMCP | 2.0+ | MCP server development |
| **LLM Abstraction** | LiteLLM | Latest | Multi-provider LLM access |
| **Real-time** | SSE + WebSocket | - | Event streaming, chat |

### Testing Stack

| Type | Technology | Version | Purpose |
|------|------------|---------|---------|
| **Backend Unit** | pytest | 8.0+ | Python testing |
| **Backend Async** | pytest-asyncio | - | Async test support |
| **Backend Coverage** | coverage.py | - | Code coverage |
| **Frontend Unit** | Jest | 29+ | React testing |
| **Frontend Components** | React Testing Library | - | Component testing |
| **E2E** | Playwright | 1.40+ | Browser automation |

### DevOps Stack

| Component | Technology | Version | Purpose |
|-----------|------------|---------|---------|
| **Containerization** | Docker | 24+ | Application packaging |
| **Orchestration** | Docker Compose | - | Local development |
| **CI/CD** | Gitea Actions | - | Automated pipelines |
| **Database Migrations** | Alembic | - | Schema versioning |

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                      Frontend (Next.js 16)                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │    Pages    │  │ Components  │  │   Stores    │              │
│  │ (App Router)│  │ (shadcn/ui) │  │  (Zustand)  │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└────────────────────────────┬────────────────────────────────────┘
                             │ REST + SSE + WebSocket
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Backend (FastAPI 0.115+)                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │     API     │  │  Services   │  │    CRUD     │              │
│  │   Routes    │  │    Layer    │  │    Layer    │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │ LLM Gateway │  │ MCP Client  │  │  Event Bus  │              │
│  │  (LiteLLM)  │  │   Manager   │  │   (Redis)   │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└────────────────────────────┬────────────────────────────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        ▼                    ▼                    ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────────────────┐
│  PostgreSQL   │   │     Redis     │   │        MCP Servers        │
│  + pgvector   │   │ (Cache/Queue) │   │ (LLM, Git, KB, Issues...) │
└───────────────┘   └───────────────┘   └───────────────────────────┘
                             │
                             ▼
                    ┌───────────────┐
                    │    Celery     │
                    │    Workers    │
                    └───────────────┘
```

## Consequences

### Positive
- A proven, production-ready stack
- Strong typing throughout (Python + TypeScript)
- Excellent async performance
- Rich ecosystem for extensions
- Team familiarity reduces the learning curve

### Negative
- The Python GIL limits CPU-bound concurrency (mitigated by Celery workers)
- Multiple languages (Python + TypeScript) to maintain
- PostgreSQL requires management (vs. serverless options)

### Neutral
- PragmaStack provides a solid foundation but may include unused features
- The stack is opinionated, limiting some technology choices

## Version Pinning Strategy

| Component | Strategy | Rationale |
|-----------|----------|-----------|
| Python | 3.11+ (specific minor) | Stability |
| Node.js | 20 LTS | Long-term support |
| FastAPI | 0.115+ | Latest stable |
| Next.js | 16 | Current major |
| PostgreSQL | 15+ | Required for features |

## Compliance

This decision aligns with:
- NFR-601: Code quality standards (TypeScript, type hints)
- NFR-603: Docker containerization requirement
- TC-001 through TC-006: Technical constraints

---

*This ADR establishes the foundational technology choices for Syndarix.*

260 docs/adrs/ADR-006-agent-orchestration.md Normal file
@@ -0,0 +1,260 @@
# ADR-006: Agent Orchestration Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-002

---

## Context

Syndarix requires an agent orchestration system that can:

- Define reusable agent types with specific capabilities
- Spawn multiple instances of the same type with unique identities
- Manage agent state, context, and conversation history
- Route messages between agents
- Handle agent failover and recovery
- Track resource usage per agent

## Decision Drivers

- **Flexibility:** Support diverse agent roles and capabilities
- **Scalability:** Handle 50+ concurrent agent instances
- **Isolation:** Each instance maintains separate state
- **Observability:** Full visibility into agent activities
- **Reliability:** Graceful handling of failures

## Decision

**Adopt a Type-Instance pattern** where:

- **Agent Types** define templates (model, expertise, personality)
- **Agent Instances** are spawned from types with unique identities
- An **Agent Orchestrator** manages lifecycle and communication

## Architecture

### Agent Type Definition
```python
class AgentType(Base):
    id = Column(UUID, primary_key=True)
    name = Column(String(50), unique=True)     # "Software Engineer"
    role = Column(Enum(AgentRole))             # ENGINEER
    base_model = Column(String(100))           # "claude-3-5-sonnet-20241022"
    failover_model = Column(String(100))       # "gpt-4-turbo"
    expertise = Column(ARRAY(String))          # ["python", "fastapi", "testing"]
    personality = Column(JSONB)                # {"style": "detailed", "tone": "professional"}
    system_prompt = Column(Text)               # Base system prompt template
    capabilities = Column(ARRAY(String))       # ["code_generation", "code_review"]
    is_active = Column(Boolean, default=True)
```

### Agent Instance Definition

```python
class AgentInstance(Base):
    id = Column(UUID, primary_key=True)
    name = Column(String(50))                  # "Dave"
    agent_type_id = Column(UUID, ForeignKey("agent_types.id"))
    project_id = Column(UUID, ForeignKey("projects.id"))
    status = Column(Enum(InstanceStatus))      # ACTIVE, IDLE, TERMINATED
    context = Column(JSONB)                    # Current working context
    conversation_id = Column(UUID)             # Active conversation
    rag_collection_id = Column(String)         # Domain knowledge collection
    token_usage = Column(JSONB)                # {"prompt": 0, "completion": 0}
    last_active_at = Column(DateTime)
    created_at = Column(DateTime)
    terminated_at = Column(DateTime)
```

### Orchestrator Service

```python
class AgentOrchestrator:
    """Central service for agent lifecycle management."""

    async def spawn_agent(
        self,
        agent_type_id: UUID,
        project_id: UUID,
        name: str,
        domain_knowledge: list[str] | None = None,
    ) -> AgentInstance:
        """Spawn a new agent instance from a type definition."""
        agent_type = await self.get_agent_type(agent_type_id)

        instance = AgentInstance(
            name=name,
            agent_type_id=agent_type_id,
            project_id=project_id,
            status=InstanceStatus.ACTIVE,
            context={"initialized_at": datetime.utcnow().isoformat()},
        )

        # Initialize RAG collection if domain knowledge provided
        if domain_knowledge:
            instance.rag_collection_id = await self._init_rag_collection(
                instance.id, domain_knowledge
            )

        self.db.add(instance)  # Session.add is synchronous
        await self.db.commit()

        # Publish spawn event
        await self.event_bus.publish(f"project:{project_id}", {
            "type": "agent_spawned",
            "agent_id": str(instance.id),
            "name": name,
            "role": agent_type.role.value,
        })

        return instance

    async def terminate_agent(self, instance_id: UUID) -> None:
        """Terminate an agent instance and release resources."""
        instance = await self.get_instance(instance_id)
        instance.status = InstanceStatus.TERMINATED
        instance.terminated_at = datetime.utcnow()

        # Clean up the RAG collection
        if instance.rag_collection_id:
            await self._cleanup_rag_collection(instance.rag_collection_id)

        await self.db.commit()

    async def send_message(
        self,
        from_id: UUID,
        to_id: UUID,
        message: AgentMessage,
    ) -> None:
        """Route a message from one agent to another."""
        # Validate that both agents exist and are active
        sender = await self.get_instance(from_id)
        recipient = await self.get_instance(to_id)

        # Persist the message
        await self.message_store.save(message)

        # If the recipient is idle, trigger an action
        if recipient.status == InstanceStatus.IDLE:
            await self._trigger_agent_action(recipient.id, message)

        # Publish for real-time tracking
        await self.event_bus.publish(f"project:{sender.project_id}", {
            "type": "agent_message",
            "from": str(from_id),
            "to": str(to_id),
            "preview": message.content[:100],
        })

    async def broadcast(
        self,
        from_id: UUID,
        target_role: AgentRole,
        message: AgentMessage,
    ) -> None:
        """Broadcast a message to all agents of a specific role."""
        sender = await self.get_instance(from_id)
        recipients = await self.get_instances_by_role(
            sender.project_id, target_role
        )

        for recipient in recipients:
            await self.send_message(from_id, recipient.id, message)
```

### Agent Execution Pattern

```python
class AgentRunner:
    """Executes agent actions using LLM."""

    def __init__(self, instance: AgentInstance, llm_gateway: LLMGateway):
        self.instance = instance
        self.llm = llm_gateway

    async def execute(self, action: str, context: dict) -> dict:
        """Execute an action using the agent's configured model."""
        agent_type = await self.get_agent_type(self.instance.agent_type_id)

        # Build messages with system prompt and context
        messages = [
            {"role": "system", "content": self._build_system_prompt(agent_type)},
            *self._get_conversation_history(),
            {"role": "user", "content": self._build_action_prompt(action, context)},
        ]

        # Add RAG context if available
        if self.instance.rag_collection_id:
            rag_context = await self._query_rag(action, context)
            messages.insert(1, {
                "role": "system",
                "content": f"Relevant context:\n{rag_context}",
            })

        # Execute with failover
        response = await self.llm.complete(
            agent_id=str(self.instance.id),
            project_id=str(self.instance.project_id),
            messages=messages,
            model_preference=self._get_model_preference(agent_type),
        )

        # Update instance context
        self.instance.context = {
            **self.instance.context,
            "last_action": action,
            "last_response_at": datetime.utcnow().isoformat(),
        }

        return response
```

### Agent Roles

| Role | Instances | Primary Capabilities |
|------|-----------|----------------------|
| Product Owner | 1 | requirements, prioritization, client_communication |
| Project Manager | 1 | planning, tracking, coordination |
| Business Analyst | 1 | analysis, documentation, process_modeling |
| Software Architect | 1 | design, architecture_decisions, tech_selection |
| Software Engineer | 1-5 | code_generation, code_review, testing |
| UI/UX Designer | 1 | design, wireframes, accessibility |
| QA Engineer | 1-2 | test_planning, test_automation, bug_reporting |
| DevOps Engineer | 1 | cicd, infrastructure, deployment |
| AI/ML Engineer | 1 | ml_development, model_training, mlops |
| Security Expert | 1 | security_review, vulnerability_assessment |

## Consequences

### Positive

- Clear separation between type definition and instance runtime
- Multiple instances share type configuration (DRY)
- Easy to add new agent roles
- Full observability through events
- Graceful failure handling with model failover

### Negative

- Complexity in managing the instance lifecycle
- State synchronization across instances
- Memory overhead for context storage

### Mitigation

- Context archival for long-running instances
- Periodic cleanup of terminated instances
- State compression for large contexts

## Compliance

This decision aligns with:

- FR-101: Agent type configuration
- FR-102: Agent instance spawning
- FR-103: Agent domain knowledge (RAG)
- FR-104: Inter-agent communication
- FR-105: Agent activity monitoring

---

*This ADR establishes the agent orchestration architecture for Syndarix.*
487 docs/architecture/ARCHITECTURE_OVERVIEW.md Normal file
@@ -0,0 +1,487 @@
# Syndarix Architecture Overview

**Version:** 1.0
**Date:** 2025-12-29
**Status:** Draft

---

## Table of Contents

1. [Executive Summary](#1-executive-summary)
2. [System Context](#2-system-context)
3. [High-Level Architecture](#3-high-level-architecture)
4. [Core Components](#4-core-components)
5. [Data Architecture](#5-data-architecture)
6. [Integration Architecture](#6-integration-architecture)
7. [Security Architecture](#7-security-architecture)
8. [Deployment Architecture](#8-deployment-architecture)
9. [Cross-Cutting Concerns](#9-cross-cutting-concerns)
10. [Architecture Decisions](#10-architecture-decisions)

---

## 1. Executive Summary

Syndarix is an AI-powered software consulting agency platform that orchestrates specialized AI agents to deliver complete software solutions autonomously. This document describes the technical architecture that enables:

- **Multi-Agent Orchestration:** 10 specialized agent roles collaborating on projects
- **MCP-First Integration:** All external tools via the Model Context Protocol
- **Real-time Visibility:** SSE-based event streaming for progress tracking
- **Autonomous Workflows:** Configurable autonomy levels, from full control to fully autonomous
- **Full Artifact Delivery:** Code, documentation, tests, and ADRs

### Architecture Principles

1. **MCP-First:** All integrations through unified MCP servers
2. **Event-Driven:** Async communication via Redis Pub/Sub
3. **Type-Safe:** Full typing in Python and TypeScript
4. **Stateless Services:** Horizontal scaling through stateless design
5. **Explicit Scoping:** All operations scoped to a project/agent

---

## 2. System Context

### Context Diagram
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                               EXTERNAL ACTORS                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐     │
│   │   Client    │   │    Admin    │   │  LLM APIs   │   │  Git Hosts  │     │
│   │   (Human)   │   │   (Human)   │   │ (Anthropic) │   │   (Gitea)   │     │
│   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘     │
│          │                 │                 │                 │            │
└──────────│─────────────────│─────────────────│─────────────────│────────────┘
           │                 │                 │                 │
           │ Web UI          │ Admin UI        │ API             │ API
           │ SSE             │                 │                 │
           ▼                 ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│                              SYNDARIX PLATFORM                              │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                         Agent Orchestration                         │   │
│   │   ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐        │   │
│   │   │   PO   │  │   PM   │  │  Arch  │  │  Eng   │  │   QA   │  ...   │   │
│   │   └────────┘  └────────┘  └────────┘  └────────┘  └────────┘        │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
           │                 │                 │                 │
           │ Storage         │ Events          │ Tasks           │
           ▼                 ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               INFRASTRUCTURE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐     │
│   │ PostgreSQL  │   │    Redis    │   │   Celery    │   │ MCP Servers │     │
│   │ + pgvector  │   │   Pub/Sub   │   │   Workers   │   │  (7 types)  │     │
│   └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘     │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Key Actors

| Actor | Type | Interaction |
|-------|------|-------------|
| Client | Human | Web UI, approvals, feedback |
| Admin | Human | Configuration, monitoring |
| LLM Providers | External | Claude, GPT-4, local models |
| Git Hosts | External | Gitea, GitHub, GitLab |
| CI/CD Systems | External | Gitea Actions, etc. |

---
## 3. High-Level Architecture

### Layered Architecture

```
┌───────────────────────────────────────────────────────────────────┐
│                        PRESENTATION LAYER                         │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                     Next.js 16 Frontend                     │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │  │
│  │  │Dashboard │  │ Projects │  │  Agents  │  │  Issues  │     │  │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘
                                │
                                │ REST + SSE + WebSocket
                                ▼
┌───────────────────────────────────────────────────────────────────┐
│                        APPLICATION LAYER                          │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                       FastAPI Backend                       │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │  │
│  │  │   Auth   │  │   API    │  │ Services │  │  Events  │     │  │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────────┐
│                        ORCHESTRATION LAYER                        │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐    │  │
│  │  │     Agent     │  │   Workflow    │  │    Project    │    │  │
│  │  │ Orchestrator  │  │    Engine     │  │    Manager    │    │  │
│  │  └───────────────┘  └───────────────┘  └───────────────┘    │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────────┐
│                        INTEGRATION LAYER                          │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                     MCP Client Manager                      │  │
│  │   Connects to: LLM, Git, KB, Issues, FS, Code, CI/CD MCPs   │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────────┐
│                           DATA LAYER                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐           │
│  │  PostgreSQL  │   │    Redis     │   │  File Store  │           │
│  │  + pgvector  │   │              │   │              │           │
│  └──────────────┘   └──────────────┘   └──────────────┘           │
└───────────────────────────────────────────────────────────────────┘
```

---

## 4. Core Components

### 4.1 Agent Orchestrator

**Purpose:** Manages the agent lifecycle: spawning, communication, and coordination.

**Responsibilities:**
- Spawn agent instances from type definitions
- Route messages between agents
- Manage agent context and memory
- Handle agent failover
- Track resource usage

**Key Patterns:**
- Type-Instance pattern (types define templates, instances are runtime)
- Message routing with priority queues
- Context compression for long-running agents

See: [ADR-006: Agent Orchestration](../adrs/ADR-006-agent-orchestration.md)

### 4.2 Workflow Engine

**Purpose:** Orchestrates multi-step workflows and agent collaboration.

**Responsibilities:**
- Execute workflow templates (requirements discovery, sprint, etc.)
- Track workflow state and progress
- Handle branching and conditions
- Manage approval gates

**Workflow Types:**
- Requirements Discovery
- Architecture Spike
- Sprint Planning
- Implementation
- Sprint Demo

### 4.3 Project Manager (Component)

**Purpose:** Manages the project lifecycle, configuration, and state.

**Responsibilities:**
- Create and configure projects
- Manage complexity levels
- Track project status
- Generate reports
### 4.4 LLM Gateway

**Purpose:** Unified LLM access with failover and cost tracking.

**Implementation:** LiteLLM-based router with:
- Multiple model groups (high-reasoning, fast-response)
- An automatic failover chain
- Per-agent token tracking
- Redis-backed caching

See: [ADR-004: LLM Provider Abstraction](../adrs/ADR-004-llm-provider-abstraction.md)

### 4.5 MCP Client Manager

**Purpose:** Connects to all MCP servers and routes tool calls.

**Implementation:**
- SSE connections to 7 MCP server types
- Automatic reconnection
- Request/response correlation
- Scoped tool calls with project_id/agent_id

See: [ADR-001: MCP Integration Architecture](../adrs/ADR-001-mcp-integration-architecture.md)
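A scoped tool call might be assembled as below. The envelope shape and field names are assumptions for illustration; the key point is that every call to a unified singleton server carries `project_id`/`agent_id` plus a request id for correlating the response on the SSE stream:

```python
import uuid


def build_scoped_tool_call(
    tool: str,
    arguments: dict,
    project_id: str,
    agent_id: str,
) -> dict:
    """Wrap a tool invocation with scoping fields and a request id
    used to correlate the asynchronous response."""
    return {
        "request_id": str(uuid.uuid4()),
        "tool": tool,
        "arguments": {
            **arguments,
            # Scoping: unified singleton servers filter on these.
            "project_id": project_id,
            "agent_id": agent_id,
        },
    }
```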

### 4.6 Event Bus

**Purpose:** Real-time event distribution using Redis Pub/Sub.

**Channels:**
- `project:{project_id}` - Project-scoped events
- `agent:{agent_id}` - Agent-specific events
- `system` - System-wide announcements

See: [ADR-002: Real-time Communication](../adrs/ADR-002-realtime-communication.md)
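The channel scheme can be captured in a thin wrapper. For brevity this sketch is synchronous and takes an injectable publish function; in production the publisher would be `redis.asyncio.Redis(...).publish` and the calls would be awaited:

```python
import json
from typing import Any, Callable


def project_channel(project_id: str) -> str:
    return f"project:{project_id}"


def agent_channel(agent_id: str) -> str:
    return f"agent:{agent_id}"


class EventBus:
    """Thin wrapper over a publish function (Redis in production)."""

    def __init__(self, publish_fn: Callable[[str, str], Any]):
        self._publish = publish_fn

    def publish(self, channel: str, event: dict) -> None:
        # Events are serialized as JSON so SSE consumers can relay
        # them to the frontend unchanged.
        self._publish(channel, json.dumps(event))
```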

---

## 5. Data Architecture

### 5.1 Entity Model
```
┌─────────────┐       ┌─────────────┐       ┌─────────────┐
│    User     │──1:N──│   Project   │──1:N──│   Sprint    │
└─────────────┘       └──────┬──────┘       └─────────────┘
                             │ 1:N
         ┌───────────────────┼───────────────────┐
         │                   │                   │
┌────────┴────────┐   ┌──────┴──────┐     ┌──────┴──────┐
│  AgentInstance  │   │ Repository  │     │    Issue    │
└────────┬────────┘   └──────┬──────┘     └──────┬──────┘
         │ 1:N               │ 1:N               │ 1:N
┌────────┴────────┐   ┌──────┴──────┐     ┌──────┴──────┐
│     Message     │   │ PullRequest │     │IssueComment │
└─────────────────┘   └─────────────┘     └─────────────┘
```

### 5.2 Key Entities

| Entity | Purpose | Key Fields |
|--------|---------|------------|
| User | Human users | email, auth |
| Project | Work containers | name, complexity, autonomy_level |
| AgentType | Agent templates | base_model, expertise, system_prompt |
| AgentInstance | Running agents | name, project_id, context |
| Issue | Work items | type, status, external_tracker_fields |
| Sprint | Time-boxed iterations | goal, velocity |
| Repository | Git repos | provider, clone_url |
| KnowledgeDocument | RAG documents | content, embedding_id |
### 5.3 Vector Storage

The **pgvector** extension is used for:

- Document embeddings (RAG)
- Semantic search across the knowledge base
- Agent context similarity
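As a sketch, a project-scoped semantic search over document embeddings might look like the following. Table and column names are illustrative; `<=>` is pgvector's cosine-distance operator:

```sql
-- Nearest-neighbour search over document embeddings.
-- Assumes CREATE EXTENSION vector; and an index (ivfflat or hnsw)
-- on the embedding column.
SELECT id,
       content,
       1 - (embedding <=> :query_embedding) AS similarity
FROM knowledge_documents
WHERE project_id = :project_id          -- explicit project scoping
ORDER BY embedding <=> :query_embedding
LIMIT 5;
```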

---

## 6. Integration Architecture

### 6.1 MCP Server Registry

| Server | Port | Purpose | Priority Providers |
|--------|------|---------|--------------------|
| LLM Gateway | 9001 | LLM routing | Anthropic, OpenAI, Ollama |
| Git MCP | 9002 | Git operations | Gitea, GitHub, GitLab |
| Knowledge Base | 9003 | RAG search | pgvector |
| Issues MCP | 9004 | Issue tracking | Gitea, GitHub, GitLab |
| File System | 9005 | Workspace files | Local FS |
| Code Analysis | 9006 | Static analysis | Ruff, ESLint |
| CI/CD MCP | 9007 | Pipelines | Gitea Actions |
### 6.2 External Integration Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                        Syndarix Backend                         │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                    MCP Client Manager                    │   │
│  │                                                          │   │
│  │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐  │   │
│  │  │  LLM   │ │  Git   │ │   KB   │ │ Issues │ │ CI/CD  │  │   │
│  │  │ Client │ │ Client │ │ Client │ │ Client │ │ Client │  │   │
│  │  └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘  │   │
│  └──────│──────────│──────────│──────────│──────────│──────┘   │
└─────────│──────────│──────────│──────────│──────────│──────────┘
          │          │          │          │          │
          │ SSE      │ SSE      │ SSE      │ SSE      │ SSE
          ▼          ▼          ▼          ▼          ▼
     ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
     │  LLM   │ │  Git   │ │   KB   │ │ Issues │ │ CI/CD  │
     │  MCP   │ │  MCP   │ │  MCP   │ │  MCP   │ │  MCP   │
     │ Server │ │ Server │ │ Server │ │ Server │ │ Server │
     └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘
         │          │          │          │          │
         ▼          ▼          ▼          ▼          ▼
    ┌─────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
    │Anthropic│ │ Gitea  │ │pgvector│ │ Gitea  │ │ Gitea  │
    │ OpenAI  │ │ GitHub │ │        │ │ Issues │ │Actions │
    │ Ollama  │ │ GitLab │ │        │ │        │ │        │
    └─────────┘ └────────┘ └────────┘ └────────┘ └────────┘
```

---

## 7. Security Architecture

### 7.1 Authentication

- **JWT Dual-Token:** Access token (15 min) + refresh token (7 days)
- **OAuth 2.0 Provider:** For MCP client authentication
- **Service Tokens:** Internal service-to-service auth
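For illustration, the dual-token issuance can be sketched with the standard library alone (HS256 via `hmac`). Production code would use a maintained JWT library such as PyJWT; the helper names and claim set here are hypothetical:

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def issue_token(sub: str, secret: bytes, ttl_seconds: int) -> str:
    """Create a signed HS256 JWT with an expiry claim."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(
        json.dumps({"sub": sub, "exp": int(time.time()) + ttl_seconds}).encode()
    )
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"


def issue_token_pair(sub: str, secret: bytes) -> dict:
    """Access token: 15 minutes; refresh token: 7 days."""
    return {
        "access": issue_token(sub, secret, ttl_seconds=15 * 60),
        "refresh": issue_token(sub, secret, ttl_seconds=7 * 24 * 3600),
    }
```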

### 7.2 Authorization

- **RBAC:** Role-based access control
- **Project Scoping:** All operations are scoped to projects
- **Agent Permissions:** Agents operate within project scope

### 7.3 Data Protection

- **TLS 1.3:** All external communications
- **Encryption at Rest:** Database encryption
- **Secrets Management:** Environment-based, never in code

---

## 8. Deployment Architecture

### 8.1 Container Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         Docker Compose                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │ Frontend │  │ Backend  │  │ Workers  │  │  Flower  │         │
│  │ (Next.js)│  │ (FastAPI)│  │ (Celery) │  │(Monitor) │         │
│  │  :3000   │  │  :8000   │  │          │  │  :5555   │         │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘         │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │ LLM MCP  │  │ Git MCP  │  │  KB MCP  │  │Issues MCP│         │
│  │  :9001   │  │  :9002   │  │  :9003   │  │  :9004   │         │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘         │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                       │
│  │  FS MCP  │  │ Code MCP │  │CI/CD MCP │                       │
│  │  :9005   │  │  :9006   │  │  :9007   │                       │
│  └──────────┘  └──────────┘  └──────────┘                       │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                      Infrastructure                      │   │
│  │        ┌──────────┐            ┌──────────┐              │   │
│  │        │PostgreSQL│            │  Redis   │              │   │
│  │        │  :5432   │            │  :6379   │              │   │
│  │        └──────────┘            └──────────┘              │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### 8.2 Scaling Strategy

| Component | Scaling | Strategy |
|-----------|---------|----------|
| Frontend | Horizontal | Stateless, behind LB |
| Backend | Horizontal | Stateless, behind LB |
| Celery Workers | Horizontal | Queue-based routing |
| MCP Servers | Horizontal | Stateless singletons |
| PostgreSQL | Vertical + read replicas | Primary/replica |
| Redis | Cluster | Sentinel or Cluster mode |

---

## 9. Cross-Cutting Concerns

### 9.1 Logging

- **Format:** Structured JSON
- **Correlation:** Request IDs across services
- **Levels:** DEBUG, INFO, WARNING, ERROR, CRITICAL
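A minimal sketch of structured JSON logging with a correlation id, using only the standard library (the field set is illustrative, not the platform's actual log schema):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, carrying the request id
    used to correlate events across services."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Attached via `extra={"request_id": ...}` by middleware.
            "request_id": getattr(record, "request_id", None),
        })


def make_logger(name: str) -> logging.Logger:
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```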

### 9.2 Monitoring

- **Metrics:** Prometheus-compatible export
- **Traces:** OpenTelemetry (future)
- **Dashboards:** Grafana (optional)

### 9.3 Error Handling

- **Agent Errors:** Logged and published via SSE
- **Task Failures:** Celery retry with exponential backoff
- **Integration Errors:** Circuit-breaker pattern
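The retry delay schedule can be expressed as a small helper. This mirrors the shape of Celery's `retry_backoff`/`retry_backoff_max` behaviour (exponential growth with a cap), without the optional jitter; the defaults are illustrative:

```python
def retry_delay(retries: int, base: float = 1.0, cap: float = 600.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap`
    seconds so repeated failures do not grow unbounded."""
    return min(base * (2 ** retries), cap)
```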

---

## 10. Architecture Decisions

### Summary of ADRs

| ADR | Title | Status |
|-----|-------|--------|
| [ADR-001](../adrs/ADR-001-mcp-integration-architecture.md) | MCP Integration Architecture | Accepted |
| [ADR-002](../adrs/ADR-002-realtime-communication.md) | Real-time Communication | Accepted |
| [ADR-003](../adrs/ADR-003-background-task-architecture.md) | Background Task Architecture | Accepted |
| [ADR-004](../adrs/ADR-004-llm-provider-abstraction.md) | LLM Provider Abstraction | Accepted |
| [ADR-005](../adrs/ADR-005-tech-stack-selection.md) | Tech Stack Selection | Accepted |
| [ADR-006](../adrs/ADR-006-agent-orchestration.md) | Agent Orchestration | Accepted |

### Key Decisions Summary

1. **Unified singleton MCP servers** with project/agent scoping
2. **SSE for real-time events**, WebSocket only for chat
3. **Celery + Redis** for background tasks
4. **LiteLLM** for unified LLM abstraction with failover
5. **PragmaStack** as the foundation, with Syndarix extensions
6. **Type-Instance pattern** for agent orchestration
---

## Appendix A: Technology Stack Quick Reference

| Layer | Technology |
|-------|------------|
| Frontend | Next.js 16, React 19, TypeScript, Tailwind, shadcn/ui |
| Backend | FastAPI, Python 3.11+, SQLAlchemy 2.0, Pydantic 2.0 |
| Database | PostgreSQL 15+ with pgvector |
| Cache/Queue | Redis 7.0+ |
| Task Queue | Celery 5.3+ |
| MCP | FastMCP 2.0 |
| LLM | LiteLLM (Claude, GPT-4, Ollama) |
| Testing | pytest, Jest, Playwright |
| Container | Docker, Docker Compose |
---

## Appendix B: Port Reference

| Service | Port |
|---------|------|
| Frontend | 3000 |
| Backend | 8000 |
| PostgreSQL | 5432 |
| Redis | 6379 |
| Flower | 5555 |
| LLM MCP | 9001 |
| Git MCP | 9002 |
| KB MCP | 9003 |
| Issues MCP | 9004 |
| FS MCP | 9005 |
| Code MCP | 9006 |
| CI/CD MCP | 9007 |
---

*This document provides the comprehensive architecture overview for Syndarix. For detailed decisions, see the individual ADRs.*
288
docs/spikes/SPIKE-001-mcp-integration-pattern.md
Normal file
@@ -0,0 +1,288 @@
# SPIKE-001: MCP Integration Pattern

**Status:** Completed
**Date:** 2025-12-29
**Author:** Architecture Team
**Related Issue:** #1

---
## Objective

Research the optimal pattern for integrating Model Context Protocol (MCP) servers with the FastAPI backend, focusing on unified singleton servers with project/agent scoping.

## Research Questions

1. What is the recommended MCP SDK for Python/FastAPI?
2. How should we structure unified MCP servers vs per-project servers?
3. What is the best pattern for project/agent scoping in MCP tools?
4. How do we handle authentication between Syndarix and MCP servers?
## Findings

### 1. FastMCP 2.0 - Recommended Framework

**FastMCP** is a high-level, Pythonic framework for building MCP servers that significantly reduces boilerplate compared to the low-level MCP SDK.

**Key Features:**
- Decorator-based tool registration (`@mcp.tool()`)
- Built-in context management for resources and prompts
- Support for server-sent events (SSE) and stdio transports
- Type-safe with Pydantic model support
- Async-first design compatible with FastAPI

**Installation:**
```bash
pip install fastmcp
```
**Basic Example:**
```python
from fastmcp import FastMCP

mcp = FastMCP("syndarix-knowledge-base")

@mcp.tool()
def search_knowledge(
    project_id: str,
    query: str,
    scope: str = "project"
) -> list[dict]:
    """Search the knowledge base with project scoping."""
    # Implementation here
    return results

@mcp.resource("project://{project_id}/config")
def get_project_config(project_id: str) -> dict:
    """Get project configuration."""
    return config
```
### 2. Unified Singleton Pattern (Recommended)

**Decision:** Use unified singleton MCP servers instead of per-project servers.

**Architecture:**
```
┌─────────────────────────────────────────────────────────┐
│                   Syndarix Backend                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │
│  │   Agent 1   │  │   Agent 2   │  │   Agent 3   │      │
│  │ (project A) │  │ (project A) │  │ (project B) │      │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘      │
│         │                │                │             │
│         └────────────────┼────────────────┘             │
│                          │                              │
│                          ▼                              │
│  ┌─────────────────────────────────────────────────┐    │
│  │            MCP Client (Singleton)               │    │
│  │     Maintains connections to all MCP servers    │    │
│  └─────────────────────────────────────────────────┘    │
└──────────────────────────┬──────────────────────────────┘
                           │
           ┌───────────────┼───────────────┐
           │               │               │
           ▼               ▼               ▼
    ┌────────────┐  ┌────────────┐  ┌────────────┐
    │  Git MCP   │  │   KB MCP   │  │  LLM MCP   │
    │ (Singleton)│  │ (Singleton)│  │ (Singleton)│
    └────────────┘  └────────────┘  └────────────┘
```
**Why Singleton:**
- Resource efficiency (one process per MCP type)
- Shared connection pools
- Centralized logging and monitoring
- Simpler deployment (7 services vs N×7)
- Cross-project learning possible (if needed)
**Scoping Pattern:**
```python
from typing import Literal

@mcp.tool()
def search_knowledge(
    project_id: str,  # Required - scopes to project
    agent_id: str,    # Required - identifies calling agent
    query: str,
    scope: Literal["project", "global"] = "project"
) -> SearchResults:
    """
    All tools accept project_id and agent_id for:
    - Access control validation
    - Audit logging
    - Context filtering
    """
    # Validate agent has access to project
    validate_access(agent_id, project_id)

    # Log the access
    log_tool_usage(agent_id, project_id, "search_knowledge")

    # Perform scoped search
    if scope == "project":
        return search_project_kb(project_id, query)
    else:
        return search_global_kb(query)
```
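The helpers above (`validate_access`, `log_tool_usage`, the `search_*` functions) are left undefined in this spike. A minimal sketch of what `validate_access` could look like, assuming an in-memory agent-to-project grant map (the store, names, and error type are all illustrative):

```python
# Hypothetical grant store; in practice this would be backed by the database.
AGENT_GRANTS: dict[str, set[str]] = {
    "agent-1": {"project-a"},
    "agent-2": {"project-a", "project-b"},
}

def validate_access(agent_id: str, project_id: str) -> None:
    """Raise PermissionError unless the agent has been granted the project."""
    if project_id not in AGENT_GRANTS.get(agent_id, set()):
        raise PermissionError(f"{agent_id} may not access {project_id}")
```

Centralizing the check this way keeps every tool's first line identical, which is what makes the singleton-with-scoping pattern auditable.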
### 3. MCP Server Registry Architecture

```python
# mcp/registry.py
from dataclasses import dataclass
from typing import Dict

@dataclass
class MCPServerConfig:
    name: str
    port: int
    transport: str  # "sse" or "stdio"
    enabled: bool = True

MCP_SERVERS: Dict[str, MCPServerConfig] = {
    "llm_gateway": MCPServerConfig("llm-gateway", 9001, "sse"),
    "git": MCPServerConfig("git-mcp", 9002, "sse"),
    "knowledge_base": MCPServerConfig("kb-mcp", 9003, "sse"),
    "issues": MCPServerConfig("issues-mcp", 9004, "sse"),
    "file_system": MCPServerConfig("fs-mcp", 9005, "sse"),
    "code_analysis": MCPServerConfig("code-mcp", 9006, "sse"),
    "cicd": MCPServerConfig("cicd-mcp", 9007, "sse"),
}
```
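The registry is what the client manager iterates to build connection endpoints. A sketch of deriving an SSE URL from a config entry — the `/sse` path and `localhost` host are assumptions, not part of the registry:

```python
from dataclasses import dataclass

@dataclass
class MCPServerConfig:
    name: str
    port: int
    transport: str  # "sse" or "stdio"
    enabled: bool = True

def sse_url(config: MCPServerConfig, host: str = "localhost") -> str:
    """Build a hypothetical SSE endpoint URL for a configured server."""
    return f"http://{host}:{config.port}/sse"
```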
### 4. Authentication Pattern

**MCP OAuth 2.0 Integration:**
```python
from fastmcp import FastMCP
from fastmcp.auth import OAuth2Bearer

mcp = FastMCP(
    "syndarix-mcp",
    auth=OAuth2Bearer(
        token_url="https://syndarix.local/oauth/token",
        scopes=["mcp:read", "mcp:write"]
    )
)
```

**Internal Service Auth (Recommended for v1):**
```python
# For internal deployment, use service tokens
@mcp.tool()
def create_issue(
    service_token: str,  # Validated internally
    project_id: str,
    title: str,
    body: str
) -> Issue:
    validate_service_token(service_token)
    # ... implementation
```
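`validate_service_token` is not defined in the spike; a minimal sketch using a constant-time comparison (the token source and error type are assumptions — in practice the secret would come from settings, not a module constant):

```python
import hmac

EXPECTED_TOKEN = "internal-service-secret"  # illustrative; load from config in practice

def validate_service_token(service_token: str) -> None:
    """Reject the call unless the token matches, using a timing-safe comparison."""
    if not hmac.compare_digest(service_token, EXPECTED_TOKEN):
        raise PermissionError("invalid service token")
```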
### 5. FastAPI Integration Pattern

```python
# app/mcp/client.py
from typing import Any
from contextlib import asynccontextmanager

from mcp import ClientSession
from mcp.client.sse import sse_client

class MCPClientManager:
    def __init__(self):
        self._sessions: dict[str, ClientSession] = {}

    async def connect_all(self):
        """Connect to all configured MCP servers."""
        for name, config in MCP_SERVERS.items():
            if config.enabled:
                session = await self._connect_server(config)
                self._sessions[name] = session

    async def call_tool(
        self,
        server: str,
        tool_name: str,
        arguments: dict
    ) -> Any:
        """Call a tool on a specific MCP server."""
        session = self._sessions[server]
        result = await session.call_tool(tool_name, arguments)
        return result.content

# Usage in FastAPI
mcp_client = MCPClientManager()

@app.on_event("startup")
async def startup():
    await mcp_client.connect_all()

@app.post("/api/v1/knowledge/search")
async def search_knowledge(request: SearchRequest):
    result = await mcp_client.call_tool(
        "knowledge_base",
        "search_knowledge",
        {
            "project_id": request.project_id,
            "agent_id": request.agent_id,
            "query": request.query
        }
    )
    return result
```
## Recommendations

### Immediate Actions

1. **Use FastMCP 2.0** for all MCP server implementations
2. **Implement unified singleton pattern** with explicit scoping
3. **Use SSE transport** for MCP server connections
4. **Service tokens** for internal auth (v1), OAuth 2.0 for the future

### MCP Server Priority

1. **LLM Gateway** - Critical for agent operation
2. **Knowledge Base** - Required for RAG functionality
3. **Git MCP** - Required for code delivery
4. **Issues MCP** - Required for project management
5. **File System** - Required for workspace operations
6. **Code Analysis** - Enhances code quality
7. **CI/CD** - Automates deployments
### Code Organization

```
syndarix/
├── backend/
│   └── app/
│       └── mcp/
│           ├── __init__.py
│           ├── client.py      # MCP client manager
│           ├── registry.py    # Server configurations
│           └── schemas.py     # Tool argument schemas
└── mcp_servers/
    ├── llm_gateway/
    │   ├── __init__.py
    │   ├── server.py
    │   └── tools.py
    ├── knowledge_base/
    ├── git/
    ├── issues/
    ├── file_system/
    ├── code_analysis/
    └── cicd/
```
## References

- [FastMCP Documentation](https://gofastmcp.com)
- [MCP Protocol Specification](https://spec.modelcontextprotocol.io)
- [Anthropic MCP SDK](https://github.com/anthropics/anthropic-sdk-mcp)

## Decision

**Adopt FastMCP 2.0** with unified singleton servers and explicit project/agent scoping for all MCP integrations.

---

*Spike completed. Findings will inform ADR-001: MCP Integration Architecture.*
338
docs/spikes/SPIKE-003-realtime-updates.md
Normal file
@@ -0,0 +1,338 @@
# SPIKE-003: Real-time Updates Architecture

**Status:** Completed
**Date:** 2025-12-29
**Author:** Architecture Team
**Related Issue:** #3

---
## Objective

Evaluate WebSocket vs Server-Sent Events (SSE) for real-time updates in Syndarix, focusing on agent activity streams, progress updates, and client notifications.

## Research Questions

1. What are the trade-offs between WebSocket and SSE?
2. Which pattern best fits Syndarix's use cases?
3. How do we handle reconnection and reliability?
4. What is the FastAPI implementation approach?
## Findings

### 1. Use Case Analysis

| Use Case | Direction | Frequency | Latency Req |
|----------|-----------|-----------|-------------|
| Agent activity feed | Server → Client | High | Low |
| Sprint progress | Server → Client | Medium | Low |
| Build status | Server → Client | Low | Medium |
| Client approval requests | Server → Client | Low | High |
| Client messages | Client → Server | Low | Medium |
| Issue updates | Server → Client | Medium | Low |

**Key Insight:** 90%+ of real-time communication is **server-to-client** (unidirectional).
### 2. Technology Comparison

| Feature | Server-Sent Events (SSE) | WebSocket |
|---------|-------------------------|-----------|
| Direction | Unidirectional (server → client) | Bidirectional |
| Protocol | HTTP/1.1 or HTTP/2 | Custom (ws://) |
| Reconnection | Built-in automatic | Manual implementation |
| Connection limits | Limited per domain | Similar limits |
| Browser support | Excellent | Excellent |
| Through proxies | Native HTTP | May require config |
| Complexity | Simple | More complex |
| FastAPI support | Native | Native |
### 3. Recommendation: SSE for Primary, WebSocket for Chat

**SSE (Recommended for 90% of use cases):**
- Agent activity streams
- Progress updates
- Build/pipeline status
- Issue change notifications
- Approval request alerts

**WebSocket (For bidirectional needs):**
- Live chat with agents
- Interactive debugging sessions
- Real-time collaboration (future)
### 4. FastAPI SSE Implementation

```python
# app/api/v1/events.py
import asyncio

from fastapi import APIRouter, Depends, Request
from fastapi.responses import StreamingResponse

from app.services.events import EventBus

router = APIRouter()

@router.get("/projects/{project_id}/events")
async def project_events(
    project_id: str,
    request: Request,
    # User / get_current_user are app-provided auth helpers
    current_user: User = Depends(get_current_user)
):
    """Stream real-time events for a project."""

    async def event_generator():
        event_bus = EventBus()
        subscriber = await event_bus.subscribe(
            channel=f"project:{project_id}",
            user_id=current_user.id
        )

        try:
            while True:
                # Check if client disconnected
                if await request.is_disconnected():
                    break

                # Wait for next event (with timeout for keepalive)
                try:
                    event = await asyncio.wait_for(
                        subscriber.get_event(),
                        timeout=30.0
                    )
                    yield f"event: {event.type}\ndata: {event.json()}\n\n"
                except asyncio.TimeoutError:
                    # Send keepalive comment
                    yield ": keepalive\n\n"
        finally:
            await subscriber.unsubscribe()

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",  # Disable nginx buffering
        }
    )
```
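The `event:`/`data:` strings yielded by the generator follow the SSE wire format: one `event:` line naming the event type, one `data:` line with the payload, and a blank-line terminator. A tiny formatter (the function name is illustrative) makes this testable without a running server:

```python
import json

def format_sse(event_type: str, data: dict) -> str:
    """Render one SSE frame: event line, data line, blank-line terminator."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"

frame = format_sse("agent_activity", {"agent_id": "a1"})
```

Keeping frame construction in one place also makes it easy to add SSE `id:` lines later for resumable streams.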
### 5. Event Bus Architecture with Redis

```python
# app/services/events.py
import json
from dataclasses import dataclass

import redis.asyncio as redis

@dataclass
class Event:
    type: str
    data: dict
    project_id: str
    agent_id: str | None = None
    timestamp: float | None = None

class EventBus:
    def __init__(self, redis_url: str):
        self.redis = redis.from_url(redis_url)
        self.pubsub = self.redis.pubsub()

    async def publish(self, channel: str, event: Event):
        """Publish an event to a channel."""
        await self.redis.publish(
            channel,
            json.dumps(event.__dict__)
        )

    async def subscribe(self, channel: str) -> "Subscriber":
        """Subscribe to a channel."""
        await self.pubsub.subscribe(channel)
        return Subscriber(self.pubsub, channel)

class Subscriber:
    def __init__(self, pubsub, channel: str):
        self.pubsub = pubsub
        self.channel = channel

    async def get_event(self) -> Event:
        """Get the next event (blocking)."""
        while True:
            message = await self.pubsub.get_message(
                ignore_subscribe_messages=True,
                timeout=1.0
            )
            if message and message["type"] == "message":
                data = json.loads(message["data"])
                return Event(**data)

    async def unsubscribe(self):
        await self.pubsub.unsubscribe(self.channel)
```
### 6. Client-Side Implementation

```typescript
// frontend/lib/events.ts
// Named ProjectEventStream so it does not shadow the built-in EventSource.
class ProjectEventStream {
  private eventSource: EventSource | null = null;
  private reconnectDelay = 1000;
  private maxReconnectDelay = 30000;

  connect(projectId: string, onEvent: (event: ProjectEvent) => void) {
    const url = `/api/v1/projects/${projectId}/events`;

    this.eventSource = new EventSource(url, {
      withCredentials: true
    });

    this.eventSource.onopen = () => {
      console.log('SSE connected');
      this.reconnectDelay = 1000; // Reset on success
    };

    this.eventSource.addEventListener('agent_activity', (e) => {
      onEvent({ type: 'agent_activity', data: JSON.parse(e.data) });
    });

    this.eventSource.addEventListener('issue_update', (e) => {
      onEvent({ type: 'issue_update', data: JSON.parse(e.data) });
    });

    this.eventSource.addEventListener('approval_required', (e) => {
      onEvent({ type: 'approval_required', data: JSON.parse(e.data) });
    });

    this.eventSource.onerror = () => {
      this.eventSource?.close();
      // Exponential backoff reconnect
      setTimeout(() => this.connect(projectId, onEvent), this.reconnectDelay);
      this.reconnectDelay = Math.min(
        this.reconnectDelay * 2,
        this.maxReconnectDelay
      );
    };
  }

  disconnect() {
    this.eventSource?.close();
    this.eventSource = null;
  }
}
```
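The reconnect policy above doubles the delay on each failure and caps it at 30 s. The resulting delay sequence can be sketched (and unit-tested) as a pure function — names are illustrative:

```python
def backoff_delays(initial_ms: int = 1000, cap_ms: int = 30000, attempts: int = 6) -> list[int]:
    """Delays doubling per failed attempt, capped — mirrors the client's reconnect policy."""
    delays, delay = [], initial_ms
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap_ms)
    return delays
```

The cap prevents a long outage from pushing retry intervals into minutes; the client snippet also resets the delay to 1 s on a successful `onopen`.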
### 7. Event Types

```python
# app/schemas/events.py
from datetime import datetime
from enum import Enum

from pydantic import BaseModel

class EventType(str, Enum):
    # Agent Events
    AGENT_STARTED = "agent_started"
    AGENT_ACTIVITY = "agent_activity"
    AGENT_COMPLETED = "agent_completed"
    AGENT_ERROR = "agent_error"

    # Project Events
    ISSUE_CREATED = "issue_created"
    ISSUE_UPDATED = "issue_updated"
    ISSUE_CLOSED = "issue_closed"

    # Git Events
    BRANCH_CREATED = "branch_created"
    COMMIT_PUSHED = "commit_pushed"
    PR_CREATED = "pr_created"
    PR_MERGED = "pr_merged"

    # Workflow Events
    APPROVAL_REQUIRED = "approval_required"
    SPRINT_STARTED = "sprint_started"
    SPRINT_COMPLETED = "sprint_completed"

    # Pipeline Events
    PIPELINE_STARTED = "pipeline_started"
    PIPELINE_COMPLETED = "pipeline_completed"
    PIPELINE_FAILED = "pipeline_failed"

class ProjectEvent(BaseModel):
    id: str
    type: EventType
    project_id: str
    agent_id: str | None
    data: dict
    timestamp: datetime
```
### 8. WebSocket for Chat (Secondary)

```python
# app/api/v1/chat.py
from fastapi import WebSocket, WebSocketDisconnect

from app.services.agent_chat import AgentChatService

@router.websocket("/projects/{project_id}/agents/{agent_id}/chat")
async def agent_chat(
    websocket: WebSocket,
    project_id: str,
    agent_id: str
):
    """Bidirectional chat with an agent."""
    await websocket.accept()

    chat_service = AgentChatService(project_id, agent_id)

    try:
        while True:
            # Receive message from client
            message = await websocket.receive_json()

            # Stream response from agent
            async for chunk in chat_service.get_response(message):
                await websocket.send_json({
                    "type": "chunk",
                    "content": chunk
                })

            await websocket.send_json({"type": "done"})
    except WebSocketDisconnect:
        pass
```
## Performance Considerations

### Connection Limits
- Browser limit: ~6 connections per domain (HTTP/1.1)
- Recommendation: use a single SSE connection per project and multiplex events over it

### Scalability
- Redis Pub/Sub handles cross-instance event distribution
- Consider Redis Streams for message persistence (audit/replay)

### Keepalive
- Send a comment every 30 seconds to prevent timeout
- Client reconnects automatically on disconnect
## Recommendations

1. **Use SSE for all server-to-client events** (simpler, auto-reconnect)
2. **Use WebSocket only for interactive chat** with agents
3. **Redis Pub/Sub for event distribution** across instances
4. **Single SSE connection per project** with event multiplexing
5. **Exponential backoff** for client reconnection
## References

- [FastAPI SSE](https://fastapi.tiangolo.com/advanced/custom-response/#streamingresponse)
- [MDN EventSource](https://developer.mozilla.org/en-US/docs/Web/API/EventSource)
- [Redis Pub/Sub](https://redis.io/topics/pubsub)

## Decision

**Adopt SSE as the primary real-time transport** with WebSocket reserved for bidirectional chat. Use Redis Pub/Sub for event distribution.

---

*Spike completed. Findings will inform ADR-002: Real-time Communication Architecture.*
420
docs/spikes/SPIKE-004-celery-redis-integration.md
Normal file
@@ -0,0 +1,420 @@
# SPIKE-004: Celery + Redis Integration

**Status:** Completed
**Date:** 2025-12-29
**Author:** Architecture Team
**Related Issue:** #4

---
## Objective

Research best practices for integrating Celery with FastAPI for background task processing, focusing on agent orchestration, long-running workflows, and task monitoring.

## Research Questions

1. How do we properly integrate Celery with async FastAPI?
2. What is the optimal task queue architecture for Syndarix?
3. How do we handle long-running agent tasks?
4. What monitoring and visibility patterns should we use?
## Findings

### 1. Celery + FastAPI Integration Pattern

**Challenge:** Celery is synchronous, while FastAPI is async.

**Solution:** Use `celery.result.AsyncResult` with async polling or callbacks.

```python
# app/core/celery.py
from celery import Celery

from app.core.config import settings

celery_app = Celery(
    "syndarix",
    broker=settings.REDIS_URL,
    backend=settings.REDIS_URL,
    include=[
        "app.tasks.agent_tasks",
        "app.tasks.git_tasks",
        "app.tasks.sync_tasks",
    ]
)

celery_app.conf.update(
    task_serializer="json",
    accept_content=["json"],
    result_serializer="json",
    timezone="UTC",
    enable_utc=True,
    task_track_started=True,
    task_time_limit=3600,             # 1 hour max
    task_soft_time_limit=3300,        # 55 min soft limit
    worker_prefetch_multiplier=1,     # One task at a time for LLM tasks
    task_acks_late=True,              # Acknowledge after completion
    task_reject_on_worker_lost=True,  # Retry if worker dies
)
```
### 2. Task Queue Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        FastAPI Backend                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  API Layer  │  │  Services   │  │   Events    │              │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘              │
│         │                │                │                     │
│         └────────────────┼────────────────┘                     │
│                          │                                      │
│                          ▼                                      │
│         ┌────────────────────────────────┐                      │
│         │        Task Dispatcher         │                      │
│         │       (Celery send_task)       │                      │
│         └────────────────┬───────────────┘                      │
└──────────────────────────┼──────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│                    Redis (Broker + Backend)                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐            │
│  │ agent_queue  │  │  git_queue   │  │  sync_queue  │            │
│  │  (priority)  │  │              │  │              │            │
│  └──────────────┘  └──────────────┘  └──────────────┘            │
└──────────────────────────────────────────────────────────────────┘
                           │
           ┌───────────────┼───────────────┐
           │               │               │
           ▼               ▼               ▼
    ┌────────────┐  ┌────────────┐  ┌────────────┐
    │   Worker   │  │   Worker   │  │   Worker   │
    │  (agents)  │  │   (git)    │  │   (sync)   │
    │ prefetch=1 │  │ prefetch=4 │  │ prefetch=4 │
    └────────────┘  └────────────┘  └────────────┘
```
### 3. Queue Configuration

```python
# app/core/celery.py
from kombu import Queue

celery_app.conf.task_queues = [
    Queue("agent_queue", routing_key="agent.#"),
    Queue("git_queue", routing_key="git.#"),
    Queue("sync_queue", routing_key="sync.#"),
    Queue("cicd_queue", routing_key="cicd.#"),
]

celery_app.conf.task_routes = {
    "app.tasks.agent_tasks.*": {"queue": "agent_queue"},
    "app.tasks.git_tasks.*": {"queue": "git_queue"},
    "app.tasks.sync_tasks.*": {"queue": "sync_queue"},
    "app.tasks.cicd_tasks.*": {"queue": "cicd_queue"},
}
```
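Celery matches the glob-style `task_routes` patterns against the dotted task name. A plain-Python stand-in for that resolution step (illustrative only — not Celery's actual router) shows how the patterns above select a queue:

```python
from fnmatch import fnmatch

TASK_ROUTES = {
    "app.tasks.agent_tasks.*": "agent_queue",
    "app.tasks.git_tasks.*": "git_queue",
}

def queue_for(task_name: str, default: str = "celery") -> str:
    """Return the first queue whose glob pattern matches the task name."""
    for pattern, queue in TASK_ROUTES.items():
        if fnmatch(task_name, pattern):
            return queue
    return default
```

Tasks that match no pattern fall through to Celery's default queue, which is why new task modules must be added to `task_routes` explicitly.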
### 4. Agent Task Implementation

```python
# app/tasks/agent_tasks.py
from celery import Task

from app.core.celery import celery_app
from app.services.agent_runner import AgentRunner
from app.services.events import EventBus

class AgentTask(Task):
    """Base class for agent tasks with retry and monitoring."""

    autoretry_for = (ConnectionError, TimeoutError)
    retry_backoff = True
    retry_backoff_max = 600
    retry_jitter = True
    max_retries = 3

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        """Handle task failure."""
        project_id = kwargs.get("project_id")
        agent_id = kwargs.get("agent_id")
        EventBus().publish(f"project:{project_id}", {
            "type": "agent_error",
            "agent_id": agent_id,
            "error": str(exc)
        })

@celery_app.task(bind=True, base=AgentTask)
def run_agent_action(
    self,
    agent_id: str,
    project_id: str,
    action: str,
    context: dict
) -> dict:
    """
    Execute an agent action as a background task.

    Args:
        agent_id: The agent instance ID
        project_id: The project context
        action: The action to perform
        context: Action-specific context

    Returns:
        Action result dictionary
    """
    runner = AgentRunner(agent_id, project_id)

    # Update task state for monitoring
    self.update_state(
        state="RUNNING",
        meta={"agent_id": agent_id, "action": action}
    )

    # Publish start event
    EventBus().publish(f"project:{project_id}", {
        "type": "agent_started",
        "agent_id": agent_id,
        "action": action,
        "task_id": self.request.id
    })

    try:
        result = runner.execute(action, context)

        # Publish completion event
        EventBus().publish(f"project:{project_id}", {
            "type": "agent_completed",
            "agent_id": agent_id,
            "action": action,
            "result_summary": result.get("summary")
        })

        return result
    except Exception:
        # Re-raise so on_failure fires
        raise
```
### 5. Long-Running Task Patterns

**Progress Reporting:**
```python
@celery_app.task(bind=True)
def implement_story(self, story_id: str, agent_id: str, project_id: str):
    """Implement a user story with progress reporting."""

    steps = [
        ("analyzing", "Analyzing requirements"),
        ("designing", "Designing solution"),
        ("implementing", "Writing code"),
        ("testing", "Running tests"),
        ("documenting", "Updating documentation"),
    ]

    for i, (state, description) in enumerate(steps):
        self.update_state(
            state="PROGRESS",
            meta={
                "current": i + 1,
                "total": len(steps),
                "status": description
            }
        )

        # Do the actual work
        execute_step(state, story_id, agent_id)

        # Publish progress event
        EventBus().publish(f"project:{project_id}", {
            "type": "agent_progress",
            "agent_id": agent_id,
            "step": i + 1,
            "total": len(steps),
            "description": description
        })

    return {"status": "completed", "story_id": story_id}
```
**Task Chaining:**
```python
from celery import chain, group

# Sequential workflow
workflow = chain(
    analyze_requirements.s(story_id),
    design_solution.s(),
    implement_code.s(),
    run_tests.s(),
    create_pr.s()
)

# Parallel execution
parallel_tests = group(
    run_unit_tests.s(project_id),
    run_integration_tests.s(project_id),
    run_linting.s(project_id)
)
```
### 6. FastAPI Integration

```python
# app/api/v1/agents.py
from fastapi import APIRouter
from celery.result import AsyncResult

from app.schemas.agent import AgentActionRequest  # request schema (import path assumed)
from app.tasks.agent_tasks import run_agent_action

router = APIRouter()

@router.post("/agents/{agent_id}/actions")
async def trigger_agent_action(
    agent_id: str,
    action: AgentActionRequest,
):
    """Trigger an agent action as a background task."""
    # Dispatch to Celery (not FastAPI BackgroundTasks, which run in-process)
    task = run_agent_action.delay(
        agent_id=agent_id,
        project_id=action.project_id,
        action=action.action,
        context=action.context
    )

    return {
        "task_id": task.id,
        "status": "queued"
    }

@router.get("/tasks/{task_id}")
async def get_task_status(task_id: str):
    """Get the status of a background task."""
    result = AsyncResult(task_id)

    if result.state == "PENDING":
        return {"status": "pending"}
    elif result.state == "STARTED":
        return {"status": "running"}
    elif result.state == "PROGRESS":
        return {"status": "progress", **result.info}
    elif result.state == "SUCCESS":
        return {"status": "completed", "result": result.result}
    elif result.state == "FAILURE":
        return {"status": "failed", "error": str(result.result)}

    return {"status": result.state}
```

### 7. Worker Configuration

```bash
# Run different workers for different queues

# Agent worker (prefetch=1 so each process reserves one LLM task at a time)
celery -A app.core.celery worker \
  -Q agent_queue \
  -c 4 \
  --prefetch-multiplier=1 \
  -n agent_worker@%h

# Git worker (can handle multiple concurrent tasks)
celery -A app.core.celery worker \
  -Q git_queue \
  -c 8 \
  --prefetch-multiplier=4 \
  -n git_worker@%h

# Sync worker
celery -A app.core.celery worker \
  -Q sync_queue \
  -c 4 \
  --prefetch-multiplier=4 \
  -n sync_worker@%h
```

### 8. Monitoring with Flower

```yaml
# docker-compose.yml
services:
  flower:
    image: mher/flower:latest
    command: celery flower --broker=redis://redis:6379/0
    ports:
      - "5555:5555"
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
      - FLOWER_BASIC_AUTH=admin:password
```

### 9. Task Scheduling (Celery Beat)

```python
# app/core/celery.py
from celery.schedules import crontab

celery_app.conf.beat_schedule = {
    # Sync issues every minute
    "sync-external-issues": {
        "task": "app.tasks.sync_tasks.sync_all_issues",
        "schedule": 60.0,
    },
    # Health check every 5 minutes
    "agent-health-check": {
        "task": "app.tasks.agent_tasks.health_check_all_agents",
        "schedule": 300.0,
    },
    # Daily cleanup at midnight
    "cleanup-old-tasks": {
        "task": "app.tasks.maintenance.cleanup_old_tasks",
        "schedule": crontab(hour=0, minute=0),
    },
}
```

## Best Practices

1. **One task per LLM call** - Avoid rate limiting issues
2. **Progress reporting** - Update state for long-running tasks
3. **Idempotent tasks** - Handle retries gracefully
4. **Separate queues** - Isolate slow tasks from fast ones
5. **Task result expiry** - Set `result_expires` to avoid Redis bloat
6. **Soft time limits** - Allow graceful shutdown before hard kill

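Several of these practices map directly onto Celery configuration keys. A hedged sketch of the relevant settings (values are illustrative, not benchmarks; `celery_app` refers to the instance in `app/core/celery.py`):

```python
# Illustrative Celery settings covering result expiry, time limits,
# retry safety, and queue routing.
CELERY_SETTINGS = {
    "result_expires": 3600,          # drop task results after 1 hour (practice 5)
    "task_soft_time_limit": 540,     # SoftTimeLimitExceeded raised at 9 min (practice 6)
    "task_time_limit": 600,          # hard kill at 10 min
    "task_acks_late": True,          # redeliver if a worker dies mid-task (practice 3)
    "task_routes": {                 # practice 4: route by task module
        "app.tasks.agent_tasks.*": {"queue": "agent_queue"},
        "app.tasks.git_tasks.*": {"queue": "git_queue"},
        "app.tasks.sync_tasks.*": {"queue": "sync_queue"},
    },
}

# Applied with: celery_app.conf.update(**CELERY_SETTINGS)
```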
## Recommendations

1. **Use Celery for all long-running operations**
   - Agent actions
   - Git operations
   - External sync
   - CI/CD triggers

2. **Use Redis as both broker and backend**
   - Simplifies infrastructure
   - Fast enough for our scale

3. **Configure separate queues**
   - `agent_queue` with prefetch=1
   - `git_queue` with prefetch=4
   - `sync_queue` with prefetch=4

4. **Implement proper monitoring**
   - Flower for web UI
   - Prometheus metrics export
   - Dead letter queue for failed tasks

## References

- [Celery Documentation](https://docs.celeryq.dev/)
- [FastAPI Background Tasks](https://fastapi.tiangolo.com/tutorial/background-tasks/)
- [Celery Best Practices](https://docs.celeryq.dev/en/stable/userguide/tasks.html#tips-and-best-practices)

## Decision

**Adopt Celery + Redis** for all background task processing with queue-based routing and progress reporting via Redis Pub/Sub events.

---

*Spike completed. Findings will inform ADR-003: Background Task Architecture.*

516
docs/spikes/SPIKE-005-llm-provider-abstraction.md
Normal file
@@ -0,0 +1,516 @@
# SPIKE-005: LLM Provider Abstraction

**Status:** Completed
**Date:** 2025-12-29
**Author:** Architecture Team
**Related Issue:** #5

---

## Objective

Research the best approach for unified LLM provider abstraction with support for multiple providers, automatic failover, and cost tracking.

## Research Questions

1. What libraries exist for unified LLM access?
2. How to implement automatic failover between providers?
3. How to track token usage and costs per agent/project?
4. What caching strategies can reduce API costs?

## Findings

### 1. LiteLLM - Recommended Solution

**LiteLLM** provides a unified interface to 100+ LLM providers using the OpenAI SDK format.

**Key Features:**
- Unified API across providers (Anthropic, OpenAI, local, etc.)
- Built-in failover and load balancing
- Token counting and cost tracking
- Streaming support
- Async support
- Caching with Redis

**Installation:**
```bash
pip install litellm
```

### 2. Basic Usage

```python
import os

import litellm
from litellm import completion, acompletion

# Configure providers
litellm.api_key = os.getenv("ANTHROPIC_API_KEY")
litellm.set_verbose = True  # For debugging

# Synchronous call
response = completion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Async call (for FastAPI)
response = await acompletion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

### 3. Model Naming Convention

LiteLLM uses prefixed model names:

| Provider | Model Format |
|----------|--------------|
| Anthropic | `claude-3-5-sonnet-20241022` |
| OpenAI | `gpt-4-turbo` |
| Azure OpenAI | `azure/deployment-name` |
| Ollama | `ollama/llama3` |
| Together AI | `together_ai/togethercomputer/llama-2-70b` |

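Because the provider is encoded in the model string, switching providers is a configuration change rather than a code change. A hedged illustration of the convention (`provider_of` is a hypothetical helper, not part of LiteLLM):

```python
def provider_of(model: str) -> str:
    """Infer the provider from a LiteLLM-style model name."""
    if "/" in model:
        # Prefixed form, e.g. "ollama/llama3" or "azure/deployment-name"
        return model.split("/", 1)[0]
    # Unprefixed names are resolved by LiteLLM itself, e.g.
    # "claude-*" to Anthropic, "gpt-*" to OpenAI
    if model.startswith("claude"):
        return "anthropic"
    if model.startswith("gpt"):
        return "openai"
    return "unknown"
```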
### 4. Failover Configuration

```python
import os

from litellm import Router

# Define model list with fallbacks
model_list = [
    {
        "model_name": "primary-agent",
        "litellm_params": {
            "model": "claude-3-5-sonnet-20241022",
            "api_key": os.getenv("ANTHROPIC_API_KEY"),
        },
        "model_info": {"id": 1}
    },
    {
        "model_name": "primary-agent",  # Same name = same deployment group
        "litellm_params": {
            "model": "gpt-4-turbo",
            "api_key": os.getenv("OPENAI_API_KEY"),
        },
        "model_info": {"id": 2}
    },
    {
        "model_name": "primary-agent",
        "litellm_params": {
            "model": "ollama/llama3",
            "api_base": "http://localhost:11434",
        },
        "model_info": {"id": 3}
    }
]

# Initialize router with failover
router = Router(
    model_list=model_list,
    fallbacks=[
        {"primary-agent": ["primary-agent"]}  # Try all deployments in the group
    ],
    routing_strategy="simple-shuffle",  # or "latency-based-routing"
    num_retries=3,
    retry_after=5,  # seconds
    timeout=60,
)

# Use router
response = await router.acompletion(
    model="primary-agent",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

### 5. Syndarix LLM Gateway Architecture

```python
# app/services/llm_gateway.py
from litellm import Router, acompletion

from app.core.config import settings
from app.models.agent import AgentType
from app.services.cost_tracker import CostTracker
from app.services.events import EventBus

class LLMGateway:
    """Unified LLM gateway with failover and cost tracking."""

    def __init__(self):
        self.router = self._build_router()
        self.cost_tracker = CostTracker()
        self.event_bus = EventBus()

    def _build_router(self) -> Router:
        """Build LiteLLM router from configuration."""
        model_list = []

        # Add Anthropic models
        if settings.ANTHROPIC_API_KEY:
            model_list.extend([
                {
                    "model_name": "high-reasoning",
                    "litellm_params": {
                        "model": "claude-3-5-sonnet-20241022",
                        "api_key": settings.ANTHROPIC_API_KEY,
                    }
                },
                {
                    "model_name": "fast-response",
                    "litellm_params": {
                        "model": "claude-3-haiku-20240307",
                        "api_key": settings.ANTHROPIC_API_KEY,
                    }
                }
            ])

        # Add OpenAI fallbacks
        if settings.OPENAI_API_KEY:
            model_list.extend([
                {
                    "model_name": "high-reasoning",
                    "litellm_params": {
                        "model": "gpt-4-turbo",
                        "api_key": settings.OPENAI_API_KEY,
                    }
                },
                {
                    "model_name": "fast-response",
                    "litellm_params": {
                        "model": "gpt-4o-mini",
                        "api_key": settings.OPENAI_API_KEY,
                    }
                }
            ])

        # Add local models (Ollama)
        if settings.OLLAMA_URL:
            model_list.append({
                "model_name": "local-fallback",
                "litellm_params": {
                    "model": "ollama/llama3",
                    "api_base": settings.OLLAMA_URL,
                }
            })

        return Router(
            model_list=model_list,
            fallbacks=[
                {"high-reasoning": ["high-reasoning", "local-fallback"]},
                {"fast-response": ["fast-response", "local-fallback"]},
            ],
            routing_strategy="latency-based-routing",
            num_retries=3,
            timeout=120,
        )

    async def complete(
        self,
        agent_id: str,
        project_id: str,
        messages: list[dict],
        model_preference: str = "high-reasoning",
        stream: bool = False,
        **kwargs
    ) -> dict:
        """
        Generate a completion with automatic failover and cost tracking.

        Args:
            agent_id: The calling agent's ID
            project_id: The project context
            messages: Chat messages
            model_preference: "high-reasoning" or "fast-response"
            stream: Whether to stream the response
            **kwargs: Additional LiteLLM parameters

        Returns:
            Completion response dictionary
        """
        try:
            if stream:
                return self._stream_completion(
                    agent_id, project_id, messages, model_preference, **kwargs
                )

            response = await self.router.acompletion(
                model=model_preference,
                messages=messages,
                **kwargs
            )

            # Track usage
            await self._track_usage(
                agent_id=agent_id,
                project_id=project_id,
                model=response.model,
                usage=response.usage,
            )

            return {
                "content": response.choices[0].message.content,
                "model": response.model,
                "usage": {
                    "prompt_tokens": response.usage.prompt_tokens,
                    "completion_tokens": response.usage.completion_tokens,
                    "total_tokens": response.usage.total_tokens,
                }
            }

        except Exception as e:
            # Publish error event
            await self.event_bus.publish(f"project:{project_id}", {
                "type": "llm_error",
                "agent_id": agent_id,
                "error": str(e)
            })
            raise

    async def _stream_completion(
        self,
        agent_id: str,
        project_id: str,
        messages: list[dict],
        model_preference: str,
        **kwargs
    ):
        """Stream a completion response."""
        response = await self.router.acompletion(
            model=model_preference,
            messages=messages,
            stream=True,
            **kwargs
        )

        async for chunk in response:
            if chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content

    async def _track_usage(
        self,
        agent_id: str,
        project_id: str,
        model: str,
        usage: dict
    ):
        """Track token usage and costs."""
        await self.cost_tracker.record_usage(
            agent_id=agent_id,
            project_id=project_id,
            model=model,
            prompt_tokens=usage.prompt_tokens,
            completion_tokens=usage.completion_tokens,
        )
```

### 6. Cost Tracking

```python
# app/services/cost_tracker.py
from datetime import datetime

from sqlalchemy.ext.asyncio import AsyncSession

from app.models.usage import TokenUsage

# Cost per 1M tokens (approximate)
MODEL_COSTS = {
    "claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
    "claude-3-haiku-20240307": {"input": 0.25, "output": 1.25},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "ollama/llama3": {"input": 0.00, "output": 0.00},  # Local
}

class CostTracker:
    def __init__(self, db: AsyncSession):
        self.db = db

    async def record_usage(
        self,
        agent_id: str,
        project_id: str,
        model: str,
        prompt_tokens: int,
        completion_tokens: int,
    ):
        """Record token usage and calculate cost."""
        costs = MODEL_COSTS.get(model, {"input": 0, "output": 0})

        input_cost = (prompt_tokens / 1_000_000) * costs["input"]
        output_cost = (completion_tokens / 1_000_000) * costs["output"]
        total_cost = input_cost + output_cost

        usage = TokenUsage(
            agent_id=agent_id,
            project_id=project_id,
            model=model,
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
            cost_usd=total_cost,
            timestamp=datetime.utcnow(),
        )

        self.db.add(usage)
        await self.db.commit()

    async def get_project_usage(
        self,
        project_id: str,
        start_date: datetime = None,
        end_date: datetime = None,
    ) -> dict:
        """Get usage summary for a project."""
        # Query aggregated usage
        ...

    async def check_budget(
        self,
        project_id: str,
        budget_limit: float,
    ) -> bool:
        """Check if project is within budget."""
        usage = await self.get_project_usage(project_id)
        return usage["total_cost_usd"] < budget_limit
```

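To make the per-1M-token arithmetic concrete, a standalone worked example using the claude-3-5-sonnet rates from `MODEL_COSTS` ($3/M input, $15/M output); `estimate_cost` is an illustrative helper, not part of the service:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Cost in USD for one call, given per-1M-token rates."""
    input_cost = (prompt_tokens / 1_000_000) * input_rate
    output_cost = (completion_tokens / 1_000_000) * output_rate
    return input_cost + output_cost

# A call with 10,000 prompt tokens and 2,000 completion tokens on
# claude-3-5-sonnet: 0.03 input + 0.03 output = 0.06 USD
cost = estimate_cost(10_000, 2_000, 3.00, 15.00)
```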
### 7. Caching with Redis

```python
import litellm
from litellm import Cache

from app.core.config import settings

# Configure Redis cache
litellm.cache = Cache(
    type="redis",
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    password=settings.REDIS_PASSWORD,
)

# Enable caching
litellm.enable_cache()

# Cached completions (same input = cached response)
response = await litellm.acompletion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    cache={"ttl": 3600}  # Cache for 1 hour
)
```

### 8. Agent Type Model Mapping

```python
# app/models/agent_type.py
from enum import Enum

from sqlalchemy import Column, Enum as SQLEnum, Float, Integer, String, Text
from sqlalchemy.dialects.postgresql import UUID

from app.db.base import Base

class ModelPreference(str, Enum):
    HIGH_REASONING = "high-reasoning"
    FAST_RESPONSE = "fast-response"
    COST_OPTIMIZED = "cost-optimized"

class AgentType(Base):
    __tablename__ = "agent_types"

    id = Column(UUID, primary_key=True)
    name = Column(String(50), unique=True)
    role = Column(String(50))

    # LLM configuration
    model_preference = Column(
        SQLEnum(ModelPreference),
        default=ModelPreference.HIGH_REASONING
    )
    max_tokens = Column(Integer, default=4096)
    temperature = Column(Float, default=0.7)

    # System prompt
    system_prompt = Column(Text)

# Mapping agent types to models
AGENT_MODEL_MAPPING = {
    "Product Owner": ModelPreference.HIGH_REASONING,
    "Project Manager": ModelPreference.FAST_RESPONSE,
    "Business Analyst": ModelPreference.HIGH_REASONING,
    "Software Architect": ModelPreference.HIGH_REASONING,
    "Software Engineer": ModelPreference.HIGH_REASONING,
    "UI/UX Designer": ModelPreference.HIGH_REASONING,
    "QA Engineer": ModelPreference.FAST_RESPONSE,
    "DevOps Engineer": ModelPreference.FAST_RESPONSE,
    "AI/ML Engineer": ModelPreference.HIGH_REASONING,
    "Security Expert": ModelPreference.HIGH_REASONING,
}
```

## Rate Limiting Strategy

```python
import asyncio

from litellm import Router

# Configure rate limits per model
router = Router(
    model_list=model_list,
    redis_host=settings.REDIS_HOST,
    redis_port=settings.REDIS_PORT,
    routing_strategy="usage-based-routing",  # Route based on rate limits
)

# Custom rate limiter (sliding one-minute window)
class RateLimiter:
    def __init__(self, requests_per_minute: int = 60):
        self.rpm = requests_per_minute
        self.semaphore = asyncio.Semaphore(requests_per_minute)

    async def acquire(self):
        await self.semaphore.acquire()
        # Release the permit after 60 seconds
        asyncio.create_task(self._release_after(60))

    async def _release_after(self, seconds: int):
        await asyncio.sleep(seconds)
        self.semaphore.release()
```

## Recommendations

1. **Use LiteLLM as the unified abstraction layer**
   - Simplifies multi-provider support
   - Built-in failover and retry
   - Consistent API across providers

2. **Configure model groups by use case**
   - `high-reasoning`: Complex analysis, architecture decisions
   - `fast-response`: Quick tasks, simple queries
   - `cost-optimized`: Non-critical, high-volume tasks

3. **Implement automatic failover chain**
   - Primary: Claude 3.5 Sonnet
   - Fallback 1: GPT-4 Turbo
   - Fallback 2: Local Llama 3 (if available)

4. **Track all usage and costs**
   - Per agent, per project
   - Set budget alerts
   - Generate usage reports

5. **Cache frequently repeated queries**
   - Use Redis-backed cache
   - Cache embeddings for RAG
   - Cache deterministic transformations

## References

- [LiteLLM Documentation](https://docs.litellm.ai/)
- [LiteLLM Router](https://docs.litellm.ai/docs/routing)
- [Anthropic Rate Limits](https://docs.anthropic.com/en/api/rate-limits)

## Decision

**Adopt LiteLLM** as the unified LLM abstraction layer with automatic failover, usage-based routing, and Redis-backed caching.

---

*Spike completed. Findings will inform ADR-004: LLM Provider Integration Architecture.*