# ADR-003: Background Task Architecture
**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-004
---
## Context
Syndarix requires background task processing for:
- Agent actions (LLM calls, code generation)
- Git operations (clone, commit, push, PR creation)
- External synchronization (issue sync with Gitea/GitHub/GitLab)
- CI/CD pipeline triggers
- Long-running workflows (sprints, story implementation)
These tasks are too slow for synchronous API responses and need proper queuing, retry, and monitoring.
## Decision Drivers
- **Reliability:** Tasks must complete even if workers restart
- **Visibility:** Progress tracking for long-running operations
- **Scalability:** Handle concurrent agent operations
- **Rate Limiting:** Respect LLM API rate limits
- **Async Compatibility:** Work with async FastAPI
## Considered Options
### Option 1: FastAPI BackgroundTasks
Use FastAPI's built-in background tasks.
**Pros:**
- Simple, no additional infrastructure
- Direct async integration
**Cons:**
- No persistence (lost on restart)
- No retry mechanism
- No distributed workers
### Option 2: Celery + Redis (Selected)
Use Celery for task queue with Redis as broker/backend.
**Pros:**
- Mature, battle-tested
- Persistent task queue
- Built-in retry with backoff
- Distributed workers
- Task chaining and workflows
- Monitoring with Flower
**Cons:**
- Additional infrastructure
- Sync-only task execution (bridge needed for async)
### Option 3: Dramatiq + Redis
Use Dramatiq as a simpler Celery alternative.
**Pros:**
- Simpler API than Celery
- Good async support
**Cons:**
- Less mature ecosystem
- Fewer monitoring tools
### Option 4: ARQ (Async Redis Queue)
Use ARQ for native async task processing.
**Pros:**
- Native async
- Simple API
**Cons:**
- Less feature-rich
- Smaller community
## Decision
**Adopt Option 2: Celery + Redis.**
Celery provides the reliability, monitoring, and ecosystem maturity needed for production workloads. Redis serves as both broker and result backend.
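A minimal application setup under this decision might look like the following sketch (the module path, environment variable names, and config values are illustrative, not mandated by this ADR):
```python
# app/core/celery_app.py (illustrative location)
import os

from celery import Celery

celery_app = Celery(
    "syndarix",
    broker=os.getenv("CELERY_BROKER_URL", "redis://localhost:6379/0"),
    backend=os.getenv("CELERY_RESULT_BACKEND", "redis://localhost:6379/1"),
)

celery_app.conf.update(
    task_acks_late=True,              # re-queue tasks if a worker dies mid-execution
    task_reject_on_worker_lost=True,  # pair with acks_late for worker-crash safety
    result_expires=3600,              # keep task results for one hour
)
```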
## Implementation
### Queue Architecture
```
┌─────────────────────────────────────────────┐
│          Redis (Broker + Backend)           │
├───────────────┬───────────────┬─────────────┤
│  agent_queue  │   git_queue   │  sync_queue │
│ (prefetch=1)  │ (prefetch=4)  │ (prefetch=4)│
└───────┬───────┴───────┬───────┴──────┬──────┘
        │               │              │
        ▼               ▼              ▼
  ┌───────────┐   ┌───────────┐  ┌───────────┐
  │   Agent   │   │    Git    │  │   Sync    │
  │  Workers  │   │  Workers  │  │  Workers  │
  └───────────┘   └───────────┘  └───────────┘
```
### Queue Configuration
| Queue | Prefetch | Concurrency | Purpose |
|-------|----------|-------------|---------|
| `agent_queue` | 1 | 4 | LLM-based tasks (rate limited) |
| `git_queue` | 4 | 8 | Git operations |
| `sync_queue` | 4 | 4 | External sync |
| `cicd_queue` | 4 | 4 | Pipeline operations |
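One way to wire these queues up, assuming tasks are grouped into modules named after their queue (the module paths and CLI flags below are illustrative):
```python
# Route tasks to the queues above by module prefix
celery_app.conf.task_routes = {
    "app.tasks.agent.*": {"queue": "agent_queue"},
    "app.tasks.git.*":   {"queue": "git_queue"},
    "app.tasks.sync.*":  {"queue": "sync_queue"},
    "app.tasks.cicd.*":  {"queue": "cicd_queue"},
}

# A worker pool is started per queue so concurrency and prefetch can differ, e.g.:
#   celery -A app.core.celery_app worker -Q agent_queue --concurrency=4 --prefetch-multiplier=1
#   celery -A app.core.celery_app worker -Q git_queue   --concurrency=8 --prefetch-multiplier=4
```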
### Task Patterns
**Progress Reporting:**
```python
@celery_app.task(bind=True)
def implement_story(self, story_id: str, agent_id: str, project_id: str):
    # `steps` is assumed to come from an earlier planning phase (omitted here)
    for i, step in enumerate(steps):
        self.update_state(
            state="PROGRESS",
            meta={"current": i + 1, "total": len(steps)},
        )
        # Publish SSE event for real-time UI update
        event_bus.publish(f"project:{project_id}", {
            "type": "agent_progress",
            "step": i + 1,
            "total": len(steps),
        })
        execute_step(step)
```
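The custom `PROGRESS` state can then be read back, for example from a task-status API endpoint (a sketch; the endpoint wiring is omitted):
```python
from celery.result import AsyncResult

def get_task_progress(task_id: str) -> dict:
    """Return the current state and, if available, progress metadata for a task."""
    result = AsyncResult(task_id, app=celery_app)
    if result.state == "PROGRESS":
        return {"state": result.state, **(result.info or {})}
    return {"state": result.state}
```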
**Task Chaining:**
```python
from celery import chain

workflow = chain(
    analyze_requirements.s(story_id),
    design_solution.s(),
    implement_code.s(),
    run_tests.s(),
    create_pr.s(),
)
```
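Each `.s()` builds a signature, and the return value of one task is passed as the first argument to the next. Dispatch is asynchronous (the final result here assumes `create_pr` returns the PR URL):
```python
result = workflow.apply_async()    # returns the AsyncResult of the last task in the chain
pr_url = result.get(timeout=3600)  # blocking; API code should poll the task id instead
```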
### Monitoring
- **Flower:** Web UI for task monitoring (port 5555)
- **Prometheus:** Metrics export for alerting
- **Dead Letter Queue:** Failed tasks captured for investigation (see the sketch below)
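Redis has no native dead-letter queue, so one possible pattern (the class, task, and list names are illustrative assumptions) combines Celery's autoretry options with an `on_failure` hook that records exhausted tasks:
```python
import json

from celery import Task

class DeadLetterTask(Task):
    """Record permanently failed tasks on a Redis list for later inspection."""

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # `redis_client` is assumed to be a shared redis.Redis instance
        redis_client.rpush(
            "dead_letter_queue",
            json.dumps({"task": self.name, "task_id": task_id, "error": str(exc)}),
        )

@celery_app.task(
    base=DeadLetterTask,
    bind=True,
    autoretry_for=(ConnectionError, TimeoutError),
    retry_backoff=True,     # exponential backoff between attempts
    retry_backoff_max=600,  # cap backoff at 10 minutes
    retry_jitter=True,
    max_retries=5,
)
def sync_external_issue(self, issue_id: str):
    ...
```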
## Consequences
### Positive
- Reliable task execution with persistence
- Automatic retry with exponential backoff
- Progress tracking for long operations
- Distributed workers for scalability
- Rich monitoring and debugging tools
### Negative
- Additional infrastructure (Redis, workers)
- Celery task execution is synchronous (an event loop bridge is needed for async calls)
- Learning curve for task patterns
### Mitigation
- Use existing Redis instance (already needed for SSE)
- Wrap async calls with `asyncio.run()` or `asgiref`'s `async_to_sync` (see the sketch below)
- Document common task patterns
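A sketch of that bridge (the `generate_completion` coroutine is a placeholder for the project's async LLM abstraction, see ADR-004):
```python
import asyncio

async def generate_completion(agent_id: str, prompt: str) -> str:
    """Placeholder for the async LLM client call (ADR-004)."""
    ...

@celery_app.task(bind=True)
def run_agent_action(self, agent_id: str, prompt: str) -> str:
    # Celery workers execute sync code, so the async call is driven to
    # completion with asyncio.run() inside the worker process.
    return asyncio.run(generate_completion(agent_id, prompt))
```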
## Compliance
This decision aligns with:
- FR-304: Long-running implementation workflow
- NFR-102: 500+ background jobs per minute
- NFR-402: Task reliability and fault tolerance
---
*This ADR supersedes any previous decisions regarding background task processing.*