ADR-003: Background Task Architecture
Status: Accepted
Date: 2025-12-29
Deciders: Architecture Team
Related Spikes: SPIKE-004
Context
Syndarix requires background task processing for:
- Agent actions (LLM calls, code generation)
- Git operations (clone, commit, push, PR creation)
- External synchronization (issue sync with Gitea/GitHub/GitLab)
- CI/CD pipeline triggers
- Long-running workflows (sprints, story implementation)
These tasks are too slow for synchronous API responses and need proper queuing, retry, and monitoring.
Decision Drivers
- Reliability: Tasks must complete even if workers restart
- Visibility: Progress tracking for long-running operations
- Scalability: Handle concurrent agent operations
- Rate Limiting: Respect LLM API rate limits
- Async Compatibility: Work with async FastAPI
Considered Options
Option 1: FastAPI BackgroundTasks
Use FastAPI's built-in background tasks.
Pros:
- Simple, no additional infrastructure
- Direct async integration
Cons:
- No persistence (lost on restart)
- No retry mechanism
- No distributed workers
Option 2: Celery + Redis (Selected)
Use Celery for task queue with Redis as broker/backend.
Pros:
- Mature, battle-tested
- Persistent task queue
- Built-in retry with backoff
- Distributed workers
- Task chaining and workflows
- Monitoring with Flower
Cons:
- Additional infrastructure
- Sync-only task execution (bridge needed for async)
Option 3: Dramatiq + Redis
Use Dramatiq as a simpler Celery alternative.
Pros:
- Simpler API than Celery
- Good async support
Cons:
- Less mature ecosystem
- Fewer monitoring tools
Option 4: ARQ (Async Redis Queue)
Use ARQ for native async task processing.
Pros:
- Native async
- Simple API
Cons:
- Less feature-rich
- Smaller community
Decision
Adopt Option 2: Celery + Redis.
Celery provides the reliability, monitoring, and ecosystem maturity needed for production workloads. Redis serves as both broker and result backend.
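The settings this decision implies might look like the following minimal sketch. The Redis URL, the database numbers, and the helper name `build_celery_config` are assumptions for illustration, not values from this ADR; the dictionary keys are standard Celery setting names.

```python
from typing import Any


def build_celery_config(redis_url: str = "redis://localhost:6379") -> dict[str, Any]:
    """Return Celery settings that use Redis as broker and result backend.

    The URL and database numbers are illustrative assumptions.
    """
    return {
        "broker_url": f"{redis_url}/0",
        "result_backend": f"{redis_url}/1",
        # Acknowledge a message only after the task finishes, so a task that
        # was running when its worker died is redelivered.
        "task_acks_late": True,
        "task_reject_on_worker_lost": True,
        # Report the STARTED state so long-running tasks are visible.
        "task_track_started": True,
        "task_serializer": "json",
        "result_serializer": "json",
        "accept_content": ["json"],
    }


config = build_celery_config()
print(config["broker_url"])
```

With Celery installed, the dictionary can be applied via `celery_app.conf.update(config)`. Late acknowledgement means a task may run more than once after a crash, so tasks configured this way should be idempotent.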
Implementation
Queue Architecture
┌─────────────────────────────────────────────────┐
│            Redis (Broker + Backend)             │
├─────────────┬─────────────┬─────────────────────┤
│ agent_queue │  git_queue  │     sync_queue      │
│ (prefetch=1)│ (prefetch=4)│    (prefetch=4)     │
└──────┬──────┴──────┬──────┴──────────┬──────────┘
       │             │                 │
       ▼             ▼                 ▼
  ┌─────────┐   ┌─────────┐       ┌─────────┐
  │  Agent  │   │   Git   │       │  Sync   │
  │ Workers │   │ Workers │       │ Workers │
  └─────────┘   └─────────┘       └─────────┘
Queue Configuration
| Queue | Prefetch | Concurrency | Purpose |
|---|---|---|---|
| `agent_queue` | 1 | 4 | LLM-based tasks (rate limited) |
| `git_queue` | 4 | 8 | Git operations |
| `sync_queue` | 4 | 4 | External sync |
| `cicd_queue` | 4 | 4 | Pipeline operations |
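In Celery, prefetch and concurrency are worker-level settings rather than queue-level ones, so per-queue values like those in the table suggest one dedicated worker pool per queue. The sketch below derives worker commands and a routing map from the table. The application path `app.celery_app` and the task-name patterns are hypothetical, since the module layout is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSpec:
    name: str
    prefetch: int
    concurrency: int


# Values copied from the Queue Configuration table.
QUEUES = [
    QueueSpec("agent_queue", prefetch=1, concurrency=4),
    QueueSpec("git_queue", prefetch=4, concurrency=8),
    QueueSpec("sync_queue", prefetch=4, concurrency=4),
    QueueSpec("cicd_queue", prefetch=4, concurrency=4),
]

# Hypothetical task-name patterns, usable as Celery's `task_routes` setting.
TASK_ROUTES = {
    "app.tasks.agent.*": {"queue": "agent_queue"},
    "app.tasks.git.*": {"queue": "git_queue"},
    "app.tasks.sync.*": {"queue": "sync_queue"},
    "app.tasks.cicd.*": {"queue": "cicd_queue"},
}


def worker_command(spec: QueueSpec, app: str = "app.celery_app") -> str:
    """Build the CLI command for a worker dedicated to one queue."""
    return (
        f"celery -A {app} worker -Q {spec.name} "
        f"--concurrency={spec.concurrency} "
        f"--prefetch-multiplier={spec.prefetch}"
    )


for spec in QUEUES:
    print(worker_command(spec))
```

The prefetch multiplier applies per worker process, so a worker with concurrency 4 and a multiplier of 1 can still reserve up to four messages at a time.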
Task Patterns
Progress Reporting:
@celery_app.task(bind=True)
def implement_story(self, story_id: str, agent_id: str, project_id: str):
    # `steps`, `execute_step`, and `event_bus` are assumed to be defined elsewhere.
    for i, step in enumerate(steps):
        # Store progress in the result backend under a custom PROGRESS state
        self.update_state(
            state="PROGRESS",
            meta={"current": i + 1, "total": len(steps)}
        )
        # Publish SSE event for real-time UI update
        event_bus.publish(f"project:{project_id}", {
            "type": "agent_progress",
            "step": i + 1,
            "total": len(steps)
        })
        execute_step(step)
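On the reading side, the state and `meta` dictionary written by `update_state` are available through `AsyncResult.state` and `AsyncResult.info`. The sketch below shows one way an API handler could turn them into a response. `progress_response` is a hypothetical helper, and it takes the state and info as plain arguments so that it runs without Celery.

```python
from typing import Any, Optional


def progress_response(state: str, info: Optional[Any]) -> dict[str, Any]:
    """Translate a task's state and info into a JSON-friendly dict.

    `state` and `info` are what AsyncResult(task_id).state and .info
    would return; they are passed in directly to keep the sketch standalone.
    """
    if state == "PROGRESS" and isinstance(info, dict):
        current, total = info.get("current", 0), info.get("total", 0)
        percent = round(100 * current / total) if total else 0
        return {"state": state, "current": current, "total": total, "percent": percent}
    if state == "FAILURE":
        # For failed tasks, info holds the raised exception.
        return {"state": state, "error": str(info)}
    # Other states (PENDING, STARTED, SUCCESS, RETRY) carry no progress meta here.
    return {"state": state}


print(progress_response("PROGRESS", {"current": 2, "total": 5}))
```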
Task Chaining:
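from celery import chain

# Each task's return value is passed as the first argument to the next task
workflow = chain(
    analyze_requirements.s(story_id),
    design_solution.s(),
    implement_code.s(),
    run_tests.s(),
    create_pr.s()
)
workflow.apply_async()  # enqueue the chain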
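In a Celery `chain`, each task's return value becomes the first argument of the next signature, which is why only the first `.s()` call takes `story_id`. The following is a plain-Python model of that data flow, not Celery itself: it runs synchronously, and the stub task bodies and the `run_chain` helper are hypothetical.

```python
from functools import reduce
from typing import Any, Callable


def run_chain(first_arg: Any, *steps: Callable[[Any], Any]) -> Any:
    """Feed each step's return value into the next, as chain() does."""
    return reduce(lambda result, step: step(result), steps, first_arg)


# Stub steps standing in for the real tasks; each returns what the next needs.
def analyze_requirements(story_id: str) -> dict:
    return {"story_id": story_id, "stages": ["analyzed"]}


def design_solution(ctx: dict) -> dict:
    return {**ctx, "stages": ctx["stages"] + ["designed"]}


def implement_code(ctx: dict) -> dict:
    return {**ctx, "stages": ctx["stages"] + ["implemented"]}


result = run_chain("story-42", analyze_requirements, design_solution, implement_code)
print(result["stages"])
```

One practical consequence is that every task after the first should accept the previous task's result as its leading parameter and return whatever the next task needs. In Celery itself, a step that fails stops the remaining steps in the chain.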
Monitoring
- Flower: Web UI for task monitoring (port 5555)
- Prometheus: Metrics export for alerting
- Dead Letter Queue: Failed tasks for investigation
Consequences
Positive
- Reliable task execution with persistence
- Automatic retry with exponential backoff
- Progress tracking for long operations
- Distributed workers for scalability
- Rich monitoring and debugging tools
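Celery exposes automatic retry through task options such as `autoretry_for`, `retry_backoff`, `retry_backoff_max`, `retry_jitter`, and `max_retries`. The sketch below is a standalone model of the delay calculation behind exponential backoff, not a call into Celery. The defaults (factor 1, cap 600 seconds) are meant to mirror Celery's documented defaults and should be treated as assumptions; `backoff_delay` is a hypothetical helper.

```python
import random


def backoff_delay(retries: int, factor: int = 1, maximum: int = 600,
                  jitter: bool = False) -> int:
    """Seconds to wait before retry number `retries` (0-based).

    Models exponential backoff: factor * 2**retries, capped at `maximum`.
    With jitter, a random value in [0, delay] is used instead, which
    spreads out retries from many tasks that failed together.
    """
    delay = min(maximum, factor * (2 ** retries))
    if jitter:
        delay = random.randrange(delay + 1)
    return delay


print([backoff_delay(n) for n in range(6)])
```

For LLM calls that hit provider rate limits, jitter matters as much as the exponential growth, because it keeps a burst of failed tasks from retrying at the same instant.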
Negative
- Additional infrastructure (Redis, workers)
- Celery tasks run synchronously, so calling async code requires an event-loop bridge
- Learning curve for task patterns
Mitigation
- Use existing Redis instance (already needed for SSE)
- Wrap async calls with `asyncio.run()` or `async_to_sync`
- Document common task patterns
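The `asyncio.run()` bridge mentioned above can be sketched as follows. `call_llm` and `generate_code_task` are hypothetical names, and the sleep stands in for a real network call. In practice the second function would be the body of a Celery task.

```python
import asyncio


async def call_llm(prompt: str) -> str:
    """Hypothetical async client call; the sleep stands in for network I/O."""
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"


def generate_code_task(prompt: str) -> str:
    """Body of a synchronous task that needs an async result.

    asyncio.run() creates a fresh event loop, runs the coroutine to
    completion, and closes the loop, so it works in a worker process
    that has no loop running.
    """
    return asyncio.run(call_llm(prompt))


print(generate_code_task("add a health endpoint"))
```

`asyncio.run()` raises `RuntimeError` if an event loop is already running in the current thread, so this pattern fits worker processes that execute tasks synchronously. Because each call creates a new loop, async clients that hold loop-bound connections may need to be created inside the coroutine rather than at import time.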
Compliance
This decision aligns with:
- FR-304: Long-running implementation workflow
- NFR-102: 500+ background jobs per minute
- NFR-402: Task reliability and fault tolerance
This ADR supersedes any previous decisions regarding background task processing.