# ADR-002: Real-time Communication Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-003

---

## Context

Syndarix requires real-time communication for:

- Agent activity streams
- Project progress updates
- Build/pipeline status
- Client approval requests
- Issue change notifications
- Interactive chat with agents

We need to decide between WebSocket and Server-Sent Events (SSE) for real-time data delivery.

## Decision Drivers

- **Simplicity:** Minimize implementation complexity
- **Reliability:** Built-in reconnection handling
- **Scalability:** Support 200+ concurrent connections
- **Compatibility:** Work through proxies and load balancers
- **Use Case Fit:** Match communication patterns

## Considered Options

### Option 1: WebSocket Only

Use WebSocket for all real-time communication.

**Pros:**

- Bidirectional communication
- Single protocol to manage
- Well-supported in FastAPI

**Cons:**

- Manual reconnection logic required
- More complex through proxies
- Overkill for server-to-client streams

### Option 2: SSE Only

Use Server-Sent Events for all real-time communication.

**Pros:**

- Built-in automatic reconnection
- Native HTTP (proxy-friendly)
- Simpler implementation

**Cons:**

- Unidirectional only
- Browser connection limits per domain

### Option 3: SSE Primary + WebSocket for Chat (Selected)

Use SSE for server-to-client events, WebSocket for bidirectional chat.


**Pros:**

- Best tool for each use case
- SSE simplicity for roughly 90% of use cases
- WebSocket only where bidirectional communication is needed

**Cons:**

- Two protocols to manage

## Decision

**Adopt Option 3: SSE as primary transport, WebSocket for interactive chat.**

### SSE Use Cases (90%)

- Agent activity streams
- Project progress updates
- Build/pipeline status
- Approval request notifications
- Issue change notifications

### WebSocket Use Cases (10%)

- Interactive chat with agents
- Real-time debugging sessions
- Future collaboration features

## Implementation

### Event Bus with Redis Pub/Sub

```
FastAPI Backend ──publish──> Redis Pub/Sub ──subscribe──> SSE Endpoints
                                   │
                                   └──> Other Backend Instances
```

### SSE Endpoint Pattern

```python
import asyncio

from fastapi import APIRouter, Request
from fastapi.responses import StreamingResponse

router = APIRouter()
# `event_bus` is the application's Redis-backed event bus, defined elsewhere.


@router.get("/projects/{project_id}/events")
async def project_events(project_id: str, request: Request):
    async def event_generator():
        subscriber = await event_bus.subscribe(f"project:{project_id}")
        try:
            while not await request.is_disconnected():
                try:
                    event = await asyncio.wait_for(
                        subscriber.get_event(), timeout=30.0
                    )
                except asyncio.TimeoutError:
                    # No event within 30 s: send an SSE comment as a keepalive
                    # and re-check whether the client has disconnected.
                    yield ": keepalive\n\n"
                    continue
                yield f"event: {event.type}\ndata: {event.json()}\n\n"
        finally:
            # Always release the subscription, even on disconnect or error.
            await subscriber.unsubscribe()

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```

### Event Types

| Category | Event Types |
|----------|-------------|
| Agent | `agent_started`, `agent_activity`, `agent_completed`, `agent_error` |
| Project | `issue_created`, `issue_updated`, `issue_closed` |
| Git | `branch_created`, `commit_pushed`, `pr_created`, `pr_merged` |
| Workflow | `approval_required`, `sprint_started`, `sprint_completed` |
| Pipeline | `pipeline_started`, `pipeline_completed`, `pipeline_failed` |

### Client Implementation

- Single SSE connection per project
- Event multiplexing through event types
- Native `EventSource` API with automatic reconnect
- Exponential backoff on reconnection

## Consequences

### Positive

- Simpler implementation for server-to-client streams
- Automatic reconnection reduces client complexity
- Works through standard HTTP proxies and load balancers, provided response buffering is disabled for the event stream
- Reduced server resource usage vs WebSocket

### Negative

- Two protocols to maintain
- WebSocket requires manual reconnect logic
- SSE limited to ~6 connections per domain (HTTP/1.1)

### Mitigation

- Use HTTP/2 where possible (streams are multiplexed over one connection, so the per-domain limit is much higher)
- Multiplex all project events on a single connection
- WebSocket only for interactive chat sessions

## Compliance

This decision aligns with:

- FR-105: Real-time agent activity monitoring
- NFR-102: 200+ concurrent connections requirement
- NFR-501: Responsive UI updates

---

*This ADR supersedes any previous decisions regarding real-time communication.*
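
The SSE endpoint pattern in this ADR calls `event_bus.subscribe(...)`, `subscriber.get_event()`, and `subscriber.unsubscribe()` without defining them. As a rough illustration only, the sketch below shows an in-process stand-in with that interface, using one `asyncio.Queue` per subscriber. The names `Event`, `Subscriber`, `InMemoryEventBus`, `publish`, and `demo` are hypothetical and not taken from the Syndarix codebase; a production version would presumably wrap Redis Pub/Sub, as decided above, so that events reach every backend instance.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event shape: a type name plus a payload."""
    type: str
    data: dict = field(default_factory=dict)


class Subscriber:
    """Receives events for one channel through its own queue."""

    def __init__(self, bus: "InMemoryEventBus", channel: str) -> None:
        self._bus = bus
        self._channel = channel
        self._queue: asyncio.Queue = asyncio.Queue()

    def deliver(self, event: Event) -> None:
        # Called by the bus; the queue is unbounded, so this never blocks.
        self._queue.put_nowait(event)

    async def get_event(self) -> Event:
        # Waits until the next event for this subscriber arrives.
        return await self._queue.get()

    async def unsubscribe(self) -> None:
        self._bus.remove(self._channel, self)


class InMemoryEventBus:
    """In-process stand-in for a Redis-backed bus (assumption, single instance only)."""

    def __init__(self) -> None:
        self._channels: dict = {}  # channel name -> set of Subscriber

    async def subscribe(self, channel: str) -> Subscriber:
        subscriber = Subscriber(self, channel)
        self._channels.setdefault(channel, set()).add(subscriber)
        return subscriber

    async def publish(self, channel: str, event: Event) -> int:
        # Fan the event out to every current subscriber of the channel.
        subscribers = list(self._channels.get(channel, ()))
        for subscriber in subscribers:
            subscriber.deliver(event)
        return len(subscribers)

    def remove(self, channel: str, subscriber: Subscriber) -> None:
        subscribers = self._channels.get(channel)
        if subscribers is None:
            return
        subscribers.discard(subscriber)
        if not subscribers:
            del self._channels[channel]


async def demo() -> list:
    bus = InMemoryEventBus()
    first = await bus.subscribe("project:42")
    second = await bus.subscribe("project:42")
    await bus.publish("project:42", Event("agent_started", {"agent": "a1"}))
    await second.unsubscribe()  # later events no longer reach `second`
    await bus.publish("project:42", Event("agent_completed"))
    return [(await first.get_event()).type, (await first.get_event()).type]
```

Calling `asyncio.run(demo())` returns the two event types in publish order. A bus like this only fans out within one process, which is the gap Redis Pub/Sub fills when several backend instances serve SSE clients.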