syndarix/docs/adrs/ADR-003-background-task-architecture.md
Felipe Cardoso 6e3cdebbfb docs: add architecture decision records (ADRs) for key technical choices
- Added the following ADRs to `docs/adrs/` directory:
  - ADR-001: MCP Integration Architecture
  - ADR-002: Real-time Communication Architecture
  - ADR-003: Background Task Architecture
  - ADR-004: LLM Provider Abstraction
  - ADR-005: Technology Stack Selection
- Each ADR details the context, decision drivers, considered options, final decisions, and implementation plans.
- Documentation aligns technical choices with architecture principles and system requirements for Syndarix.
2025-12-29 13:16:02 +01:00

ADR-003: Background Task Architecture

Status: Accepted
Date: 2025-12-29
Deciders: Architecture Team
Related Spikes: SPIKE-004


Context

Syndarix requires background task processing for:

  • Agent actions (LLM calls, code generation)
  • Git operations (clone, commit, push, PR creation)
  • External synchronization (issue sync with Gitea/GitHub/GitLab)
  • CI/CD pipeline triggers
  • Long-running workflows (sprints, story implementation)

These tasks are too slow for synchronous API responses and need proper queuing, retry, and monitoring.

Decision Drivers

  • Reliability: Tasks must complete even if workers restart
  • Visibility: Progress tracking for long-running operations
  • Scalability: Handle concurrent agent operations
  • Rate Limiting: Respect LLM API rate limits
  • Async Compatibility: Work with async FastAPI

Considered Options

Option 1: FastAPI BackgroundTasks

Use FastAPI's built-in background tasks.

Pros:

  • Simple, no additional infrastructure
  • Direct async integration

Cons:

  • No persistence (lost on restart)
  • No retry mechanism
  • No distributed workers

Option 2: Celery + Redis (Selected)

Use Celery for task queue with Redis as broker/backend.

Pros:

  • Mature, battle-tested
  • Persistent task queue
  • Built-in retry with backoff
  • Distributed workers
  • Task chaining and workflows
  • Monitoring with Flower

Cons:

  • Additional infrastructure
  • Sync-only task execution (bridge needed for async)

Option 3: Dramatiq + Redis

Use Dramatiq as a simpler Celery alternative.

Pros:

  • Simpler API than Celery
  • Good async support

Cons:

  • Less mature ecosystem
  • Fewer monitoring tools

Option 4: ARQ (Async Redis Queue)

Use ARQ for native async task processing.

Pros:

  • Native async
  • Simple API

Cons:

  • Less feature-rich
  • Smaller community

Decision

Adopt Option 2: Celery + Redis.

Celery provides the reliability, monitoring, and ecosystem maturity needed for production workloads. Redis serves as both broker and result backend.
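
As a minimal sketch of what "Redis as both broker and result backend" could look like in configuration, the settings below use Celery's documented lowercase setting names. The host, port, database numbers, and the `redis_url` helper are assumptions for illustration, not values taken from this ADR.

```python
def redis_url(host: str = "localhost", port: int = 6379, db: int = 0) -> str:
    """Build a Redis connection URL (host, port, and db are placeholder values)."""
    return f"redis://{host}:{port}/{db}"


# Celery setting names are real; the values are illustrative assumptions.
CELERY_SETTINGS = {
    # Queue of pending task messages.
    "broker_url": redis_url(db=0),
    # Task state and results, including custom PROGRESS metadata.
    "result_backend": redis_url(db=1),
    # Acknowledge a message only after the task finishes, so work held by a
    # stopped worker can be redelivered.
    "task_acks_late": True,
    # Requeue the message if the worker process dies mid-task.
    "task_reject_on_worker_lost": True,
    # Report a STARTED state once a worker picks the task up.
    "task_track_started": True,
}

# Applied with something like: celery_app.conf.update(CELERY_SETTINGS)
```

Late acknowledgement is one way to approach the "tasks must complete even if workers restart" driver, but it means a task may run more than once, so task bodies would need to be idempotent.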

Implementation

Queue Architecture

┌─────────────────────────────────────────────────┐
│                 Redis (Broker + Backend)         │
├─────────────┬─────────────┬─────────────────────┤
│ agent_queue │  git_queue  │     sync_queue      │
│ (prefetch=1)│ (prefetch=4)│    (prefetch=4)     │
└──────┬──────┴──────┬──────┴──────────┬──────────┘
       │             │                 │
       ▼             ▼                 ▼
  ┌─────────┐  ┌─────────┐       ┌─────────┐
  │ Agent   │  │  Git    │       │  Sync   │
  │ Workers │  │ Workers │       │ Workers │
  └─────────┘  └─────────┘       └─────────┘

Queue Configuration

Queue        Prefetch  Concurrency  Purpose
-----------  --------  -----------  ------------------------------
agent_queue  1         4            LLM-based tasks (rate limited)
git_queue    4         8            Git operations
sync_queue   4         4            External sync
cicd_queue   4         4            Pipeline operations
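
In Celery, prefetch is a worker-level setting (`worker_prefetch_multiplier`) rather than a per-queue one, so one way to realize the table is to run a dedicated worker per queue. The sketch below builds the corresponding `celery worker` command lines. It assumes the "Prefetch" column maps to `--prefetch-multiplier`, and the module path `app.celery_app` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSpec:
    name: str
    prefetch: int
    concurrency: int


# Values copied from the Queue Configuration table.
QUEUES = [
    QueueSpec("agent_queue", prefetch=1, concurrency=4),
    QueueSpec("git_queue", prefetch=4, concurrency=8),
    QueueSpec("sync_queue", prefetch=4, concurrency=4),
    QueueSpec("cicd_queue", prefetch=4, concurrency=4),
]


def worker_command(spec: QueueSpec, app: str = "app.celery_app") -> list[str]:
    """Return the argv for one dedicated worker; `app` is a hypothetical module path."""
    return [
        "celery", "-A", app, "worker",
        "-Q", spec.name,                           # consume only this queue
        f"--concurrency={spec.concurrency}",       # worker processes
        f"--prefetch-multiplier={spec.prefetch}",  # messages reserved per process
    ]


for spec in QUEUES:
    print(" ".join(worker_command(spec)))
```

Keeping the agent workers at a prefetch of 1 means a slow LLM task does not hold other reserved messages hostage while it runs.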

Task Patterns

Progress Reporting:

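# celery_app, event_bus, plan_steps, and execute_step are assumed to be
# defined elsewhere in the application; plan_steps is a hypothetical helper.
@celery_app.task(bind=True)
def implement_story(self, story_id: str, agent_id: str, project_id: str):
    steps = plan_steps(story_id, agent_id)  # the original leaves `steps` undefined
    total = len(steps)
    for i, step in enumerate(steps, start=1):
        # Record progress in the result backend under a custom PROGRESS state
        self.update_state(
            state="PROGRESS",
            meta={"current": i, "total": total}
        )
        # Publish SSE event for real-time UI update
        event_bus.publish(f"project:{project_id}", {
            "type": "agent_progress",
            "step": i,
            "total": total
        })
        execute_step(step)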
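
On the reading side, the PROGRESS state and meta written by `update_state` above can be turned into a status payload for an API response. This is a rough sketch: the function name and payload shape are assumptions, and it takes the state and meta as plain arguments so it runs without a Celery installation.

```python
from typing import Any, Optional


def progress_payload(state: str, info: Optional[Any]) -> dict:
    """Translate a Celery task state and its meta into a small status payload."""
    if state == "PROGRESS" and isinstance(info, dict):
        current = info.get("current", 0)
        total = info.get("total", 0)
        # Guard against division by zero when no steps were planned.
        percent = round(100 * current / total) if total else 0
        return {"state": state, "current": current, "total": total, "percent": percent}
    if state == "SUCCESS":
        return {"state": state, "percent": 100}
    # PENDING, STARTED, RETRY, FAILURE: no step counts are available.
    return {"state": state, "percent": None}
```

With Celery available, the inputs would come from something like `result = AsyncResult(task_id, app=celery_app)` followed by `progress_payload(result.state, result.info)`.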

Task Chaining:

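from celery import chain

# Each task's return value is passed as the first positional argument of the next task.
workflow = chain(
    analyze_requirements.s(story_id),
    design_solution.s(),
    implement_code.s(),
    run_tests.s(),
    create_pr.s(),
)
result = workflow.apply_async()  # enqueue the chain; result tracks the final task (create_pr)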
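
The data flow of the chain can be illustrated without a broker. In the sketch below, plain functions stand in for the tasks (same names, placeholder bodies) and `run_chain_locally` is a hypothetical helper that mimics how each step receives the previous step's return value. It is not how Celery executes chains internally.

```python
from functools import reduce


def analyze_requirements(story_id: str) -> dict:
    """Placeholder first step: starts the payload that later steps extend."""
    return {"story_id": story_id, "stages": ["analyze"]}


def _stage(name: str):
    """Build a placeholder step that records its name on the payload it receives."""
    def step(previous: dict) -> dict:
        return {**previous, "stages": previous["stages"] + [name]}
    return step


design_solution = _stage("design")
implement_code = _stage("implement")
run_tests = _stage("test")
create_pr = _stage("pr")


def run_chain_locally(first_arg, first, *rest):
    """The first step gets the explicit argument; each later step gets the previous result."""
    return reduce(lambda result, step: step(result), rest, first(first_arg))


final = run_chain_locally(
    "story-123", analyze_requirements,
    design_solution, implement_code, run_tests, create_pr,
)
print(final["stages"])
```

In a real chain the intermediate results travel through the broker, so they need to be serializable with the configured serializer (JSON by default in Celery).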

Monitoring

  • Flower: Web UI for task monitoring (port 5555)
  • Prometheus: Metrics export for alerting
  • Dead Letter Queue: Failed tasks for investigation

Consequences

Positive

  • Reliable task execution with persistence
  • Automatic retry with exponential backoff
  • Progress tracking for long operations
  • Distributed workers for scalability
  • Rich monitoring and debugging tools
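
Retries in Celery are opt-in: a task retries when it calls `self.retry()` or when it is declared with options such as `autoretry_for`, `retry_backoff`, `retry_backoff_max`, and `max_retries`. The sketch below reproduces the delay schedule that `retry_backoff` produces with jitter disabled, under the assumption that the documented defaults (factor 1, cap of 600 seconds) apply. The function name is illustrative.

```python
def backoff_delays(max_retries: int, factor: int = 1, maximum: int = 600) -> list[int]:
    """Seconds to wait before each retry: factor * 2**n, capped at `maximum`.

    Jitter is omitted here; Celery randomizes delays by default (retry_jitter=True).
    """
    return [min(maximum, factor * 2 ** n) for n in range(max_retries)]


print(backoff_delays(5))
```

A task using this schedule might be declared with `autoretry_for=(SomeTransientError,)`, `retry_backoff=True`, and `max_retries=5`, where `SomeTransientError` is a hypothetical exception type for rate-limit or network failures.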

Negative

  • Additional infrastructure (Redis, workers)
  • Celery tasks run synchronously, so calling async code requires an event-loop bridge
  • Learning curve for task patterns

Mitigation

  • Use existing Redis instance (already needed for SSE)
  • Wrap async calls with asyncio.run() or asgiref's async_to_sync
  • Document common task patterns
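
The async bridge mentioned in the mitigations can be sketched with the standard library alone. Here `call_llm` is a hypothetical async client call and `run_agent_action` stands in for the synchronous body of a Celery task; neither name comes from the Syndarix codebase.

```python
import asyncio


async def call_llm(prompt: str) -> str:
    """Hypothetical async client call; a real one would await network I/O."""
    await asyncio.sleep(0)
    return f"response to: {prompt}"


def run_agent_action(prompt: str) -> str:
    """Synchronous task body: run the coroutine to completion on a fresh event loop."""
    return asyncio.run(call_llm(prompt))


print(run_agent_action("summarize story"))
```

`asyncio.run()` creates and closes an event loop on every call, which is fine for a worker process that has no loop already running, but it raises `RuntimeError` if called from inside a running loop.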

Compliance

This decision aligns with:

  • FR-304: Long-running implementation workflow
  • NFR-102: 500+ background jobs per minute
  • NFR-402: Task reliability and fault tolerance

This ADR supersedes any previous decisions regarding background task processing.