syndarix/docs/adrs/ADR-002-realtime-communication.md

# ADR-002: Real-time Communication Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-003

---

## Context

Syndarix requires real-time communication for:
- Agent activity streams
- Project progress updates
- Build/pipeline status
- Client approval requests
- Issue change notifications
- Interactive chat with agents

We need to decide whether to use WebSocket, Server-Sent Events (SSE), or a combination of the two for real-time data delivery.

## Decision Drivers

- **Simplicity:** Minimize implementation complexity
- **Reliability:** Built-in reconnection handling
- **Scalability:** Support 200+ concurrent connections
- **Compatibility:** Work through proxies and load balancers
- **Use Case Fit:** Match communication patterns

## Considered Options

### Option 1: WebSocket Only
Use WebSocket for all real-time communication.

**Pros:**
- Bidirectional communication
- Single protocol to manage
- Well-supported in FastAPI

**Cons:**
- Manual reconnection logic required
- More complex through proxies
- Overkill for server-to-client streams

### Option 2: SSE Only
Use Server-Sent Events for all real-time communication.

**Pros:**
- Built-in automatic reconnection
- Native HTTP (proxy-friendly)
- Simpler implementation

**Cons:**
- Unidirectional only
- Browser connection limits per domain

### Option 3: SSE Primary + WebSocket for Chat (Selected)
Use SSE for server-to-client events, WebSocket for bidirectional chat.

**Pros:**
- Best tool for each use case
- SSE simplicity for 90% of needs
- WebSocket only where truly needed

**Cons:**
- Two protocols to manage

## Decision

**Adopt Option 3: SSE as primary transport, WebSocket for interactive chat.**

### SSE Use Cases (90%)
- Agent activity streams
- Project progress updates
- Build/pipeline status
- Approval request notifications
- Issue change notifications

### WebSocket Use Cases (10%)
- Interactive chat with agents
- Real-time debugging sessions
- Future collaboration features

## Implementation

### Event Bus with Redis Pub/Sub

```
FastAPI Backend ──publish──> Redis Pub/Sub ──subscribe──> SSE Endpoints
                                   │
                                   └──> Other Backend Instances
```
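
The fan-out in the diagram can be sketched with an in-process stand-in. This is not the Redis-backed bus itself: `InMemoryEventBus` and its method names are hypothetical, and `asyncio.Queue` replaces Redis so the example runs without a server. It only illustrates the channel-per-project publish/subscribe shape.

```python
import asyncio
from collections import defaultdict


class InMemoryEventBus:
    """In-process stand-in for the Redis-backed bus (hypothetical interface)."""

    def __init__(self) -> None:
        # Maps a channel name to the queues of its current subscribers.
        self._channels: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, channel: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._channels[channel].add(queue)
        return queue

    def unsubscribe(self, channel: str, queue: asyncio.Queue) -> None:
        self._channels[channel].discard(queue)

    def publish(self, channel: str, message: str) -> int:
        """Deliver a message to every current subscriber; return the receiver count."""
        subscribers = self._channels.get(channel, set())
        for queue in subscribers:
            queue.put_nowait(message)
        return len(subscribers)


async def demo() -> list[str]:
    bus = InMemoryEventBus()
    first = bus.subscribe("project:42")
    second = bus.subscribe("project:42")
    other = bus.subscribe("project:7")  # different project, must stay empty
    delivered = bus.publish("project:42", "agent_started")
    return [await first.get(), await second.get(), str(delivered), str(other.qsize())]
```

As with Redis Pub/Sub, delivery here is fire-and-forget: a message published while a channel has no subscribers is dropped, so clients that reconnect may need to refetch current state.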

### SSE Endpoint Pattern

```python
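import asyncio

from fastapi import APIRouter, Request
from fastapi.responses import StreamingResponse

router = APIRouter()
# `event_bus` is assumed to be the application's Redis-backed pub/sub wrapper, defined elsewhere.


@router.get("/projects/{project_id}/events")
async def project_events(project_id: str, request: Request):
    async def event_generator():
        subscriber = await event_bus.subscribe(f"project:{project_id}")
        try:
            while not await request.is_disconnected():
                try:
                    event = await asyncio.wait_for(
                        subscriber.get_event(), timeout=30.0
                    )
                except asyncio.TimeoutError:
                    # No event within 30 s: emit an SSE comment as a keep-alive,
                    # then loop to re-check whether the client disconnected.
                    yield ": keep-alive\n\n"
                    continue
                yield f"event: {event.type}\ndata: {event.json()}\n\n"
        finally:
            # Always release the subscription, including on client disconnect.
            await subscriber.unsubscribe()

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )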
```
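
The wire format produced by the endpoint can be isolated in a small helper. The sketch below is illustrative: `format_sse` and its parameters are hypothetical names, not part of the codebase. It also shows two details of the SSE format that the inline f-string does not cover: data containing newlines must be split across several `data:` lines, and an optional `id:` field lets the browser send `Last-Event-ID` when it reconnects.

```python
from typing import Optional


def format_sse(event_type: str, data: str, event_id: Optional[str] = None) -> str:
    """Serialize one event as an SSE frame terminated by a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event_type}")
    # Each line of the payload needs its own data: field; the client
    # rejoins them with newlines when it reassembles the event.
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"
```

For example, `format_sse("agent_started", '{"agent": "a1"}')` yields the same frame shape as the endpoint above: an `event:` line, a `data:` line, and a terminating blank line.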

### Event Types

| Category | Event Types |
|----------|-------------|
| Agent | `agent_started`, `agent_activity`, `agent_completed`, `agent_error` |
| Project | `issue_created`, `issue_updated`, `issue_closed` |
| Git | `branch_created`, `commit_pushed`, `pr_created`, `pr_merged` |
| Workflow | `approval_required`, `sprint_started`, `sprint_completed` |
| Pipeline | `pipeline_started`, `pipeline_completed`, `pipeline_failed` |
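
One way to keep these names consistent between publishers and SSE endpoints is a shared enumeration. The following is a sketch only: `EventType`, `EVENT_CATEGORIES`, and `category_of` are hypothetical names, while the event names and categories are taken from the table above.

```python
from enum import Enum


class EventType(str, Enum):
    AGENT_STARTED = "agent_started"
    AGENT_ACTIVITY = "agent_activity"
    AGENT_COMPLETED = "agent_completed"
    AGENT_ERROR = "agent_error"
    ISSUE_CREATED = "issue_created"
    ISSUE_UPDATED = "issue_updated"
    ISSUE_CLOSED = "issue_closed"
    BRANCH_CREATED = "branch_created"
    COMMIT_PUSHED = "commit_pushed"
    PR_CREATED = "pr_created"
    PR_MERGED = "pr_merged"
    APPROVAL_REQUIRED = "approval_required"
    SPRINT_STARTED = "sprint_started"
    SPRINT_COMPLETED = "sprint_completed"
    PIPELINE_STARTED = "pipeline_started"
    PIPELINE_COMPLETED = "pipeline_completed"
    PIPELINE_FAILED = "pipeline_failed"


E = EventType
EVENT_CATEGORIES: dict[str, tuple[EventType, ...]] = {
    "Agent": (E.AGENT_STARTED, E.AGENT_ACTIVITY, E.AGENT_COMPLETED, E.AGENT_ERROR),
    "Project": (E.ISSUE_CREATED, E.ISSUE_UPDATED, E.ISSUE_CLOSED),
    "Git": (E.BRANCH_CREATED, E.COMMIT_PUSHED, E.PR_CREATED, E.PR_MERGED),
    "Workflow": (E.APPROVAL_REQUIRED, E.SPRINT_STARTED, E.SPRINT_COMPLETED),
    "Pipeline": (E.PIPELINE_STARTED, E.PIPELINE_COMPLETED, E.PIPELINE_FAILED),
}


def category_of(event_type: str) -> str:
    """Return the table category for an event name; unknown names raise ValueError."""
    member = EventType(event_type)
    for category, members in EVENT_CATEGORIES.items():
        if member in members:
            return category
    raise ValueError(f"{event_type} has no category")
```

Because `EventType` subclasses `str`, a member's `.value` can be used directly as the `event:` field of an SSE frame.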

### Client Implementation

- Single SSE connection per project
- Event multiplexing through event types
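- Native `EventSource` API, which reconnects automatically after a dropped connection
- Application-level exponential backoff for cases `EventSource` does not retry on its own (for example, a non-200 response)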
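
The client-side logic in the list above can be sketched as follows. In the browser this would be written against `EventSource` with one listener per event type; Python is used here only for consistency with the other examples in this document. The function names, the 1-second base delay, and the 30-second cap are illustrative assumptions, not agreed values.

```python
from typing import Callable, Dict, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before reconnect attempt `attempt` (0-based), doubling up to `cap`."""
    return min(cap, base * (2 ** attempt))


def parse_frame(frame: str) -> Tuple[str, str]:
    """Split one SSE frame into (event type, data); the type defaults to "message"."""
    event_type, data_lines = "message", []
    for line in frame.splitlines():
        if not line or line.startswith(":"):
            continue  # blank separator or comment line (e.g. a keep-alive)
        field, _, value = line.partition(":")
        value = value[1:] if value.startswith(" ") else value
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    return event_type, "\n".join(data_lines)


def dispatch(frame: str, handlers: Dict[str, Callable[[str], None]]) -> bool:
    """Route a frame from the single project stream to the handler for its type."""
    event_type, data = parse_frame(frame)
    handler = handlers.get(event_type)
    if handler is None:
        return False
    handler(data)
    return True
```

A real client would typically add random jitter to the delay so that many clients do not reconnect at the same moment after a server restart.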

## Consequences

### Positive
- Simpler implementation for server-to-client streams
- Automatic reconnection reduces client complexity
- Works through standard HTTP proxies and load balancers, provided response buffering is disabled for event streams
- Reduced server resource usage vs WebSocket

### Negative
- Two protocols to maintain
- WebSocket requires manual reconnect logic
- Browsers limit SSE to ~6 concurrent connections per domain over HTTP/1.1

### Mitigation
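- Use HTTP/2 where possible (SSE streams share one multiplexed connection, so the limit becomes the negotiated maximum number of concurrent streams, typically around 100)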
- Multiplex all project events on single connection
- WebSocket only for interactive chat sessions

## Compliance

This decision aligns with:
- FR-105: Real-time agent activity monitoring
- NFR-102: 200+ concurrent connections requirement
- NFR-501: Responsive UI updates

---

*This ADR supersedes any previous decisions regarding real-time communication.*