docs: add architecture decision records (ADRs) for key technical choices

- Added the following ADRs to `docs/adrs/` directory:
  - ADR-001: MCP Integration Architecture
  - ADR-002: Real-time Communication Architecture
  - ADR-003: Background Task Architecture
  - ADR-004: LLM Provider Abstraction
  - ADR-005: Technology Stack Selection
- Each ADR details the context, decision drivers, considered options, final decisions, and implementation plans.
- Documentation aligns technical choices with architecture principles and system requirements for Syndarix.
2025-12-29 13:16:02 +01:00
parent a6a336b66e
commit 6e3cdebbfb
7 changed files with 1565 additions and 0 deletions
--- a/docs/adrs/ADR-002-realtime-communication.md
+++ b/docs/adrs/ADR-002-realtime-communication.md
@@ -0,0 +1,160 @@
+# ADR-002: Real-time Communication Architecture
+
+**Status:** Accepted
+**Date:** 2025-12-29
+**Deciders:** Architecture Team
+**Related Spikes:** SPIKE-003
+
+---
+
+## Context
+
+Syndarix requires real-time communication for:
+- Agent activity streams
+- Project progress updates
+- Build/pipeline status
+- Client approval requests
+- Issue change notifications
+- Interactive chat with agents
+
+We need to decide between WebSocket and Server-Sent Events (SSE) for real-time data delivery.
+
+## Decision Drivers
+
+- **Simplicity:** Minimize implementation complexity
+- **Reliability:** Built-in reconnection handling
+- **Scalability:** Support 200+ concurrent connections
+- **Compatibility:** Work through proxies and load balancers
+- **Use Case Fit:** Match communication patterns
+
+## Considered Options
+
+### Option 1: WebSocket Only
+Use WebSocket for all real-time communication.
+
+**Pros:**
+- Bidirectional communication
+- Single protocol to manage
+- Well-supported in FastAPI
+
+**Cons:**
+- Manual reconnection logic required
+- More complex through proxies
+- Overkill for server-to-client streams
+
+### Option 2: SSE Only
+Use Server-Sent Events for all real-time communication.
+
+**Pros:**
+- Built-in automatic reconnection
+- Native HTTP (proxy-friendly)
+- Simpler implementation
+
+**Cons:**
+- Unidirectional only
+- Browser connection limits per domain
+
+### Option 3: SSE Primary + WebSocket for Chat (Selected)
+Use SSE for server-to-client events, WebSocket for bidirectional chat.
+
+**Pros:**
+- Best tool for each use case
+- SSE simplicity for 90% of needs
+- WebSocket only where truly needed
+
+**Cons:**
+- Two protocols to manage
+
+## Decision
+
+**Adopt Option 3: SSE as primary transport, WebSocket for interactive chat.**
+
+### SSE Use Cases (90%)
+- Agent activity streams
+- Project progress updates
+- Build/pipeline status
+- Approval request notifications
+- Issue change notifications
+
+### WebSocket Use Cases (10%)
+- Interactive chat with agents
+- Real-time debugging sessions
+- Future collaboration features
+
+## Implementation
+
+### Event Bus with Redis Pub/Sub
+
+```
+FastAPI Backend ──publish──> Redis Pub/Sub ──subscribe──> SSE Endpoints
+                                   │
+                                   └──> Other Backend Instances
+```
+
+### SSE Endpoint Pattern
+
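+```python
+import asyncio
+
+from fastapi import APIRouter, Request
+from fastapi.responses import StreamingResponse
+
+router = APIRouter()
+
+
+@router.get("/projects/{project_id}/events")
+async def project_events(project_id: str, request: Request):
+    async def event_generator():
+        # `event_bus` is the application's Redis-backed event bus (see above).
+        subscriber = await event_bus.subscribe(f"project:{project_id}")
+        try:
+            while not await request.is_disconnected():
+                try:
+                    event = await asyncio.wait_for(
+                        subscriber.get_event(), timeout=30.0
+                    )
+                except asyncio.TimeoutError:
+                    # No event within 30 s: emit an SSE comment as a keep-alive
+                    # instead of letting the timeout end the stream, then
+                    # re-check whether the client has disconnected.
+                    yield ": keep-alive\n\n"
+                    continue
+                yield f"event: {event.type}\ndata: {event.json()}\n\n"
+        finally:
+            # Always release the subscription, even if the client drops.
+            await subscriber.unsubscribe()
+
+    return StreamingResponse(
+        event_generator(),
+        media_type="text/event-stream",
+    )
+```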
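
The endpoint above builds each frame by hand with an f-string. As a sketch of the wire format it is producing, the hypothetical helper below (`format_sse` is not part of the ADR) serializes one event. It assumes the payload is already a string, and splits multi-line payloads into one `data:` line each, which the `text/event-stream` format requires.

```python
from typing import Optional


def format_sse(event_type: str, data: str, event_id: Optional[str] = None) -> str:
    """Serialize one event into the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        # The browser echoes the last seen id back when it reconnects.
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event_type}")
    # A raw newline inside a data line would end the field early, so each
    # payload line becomes its own `data:` line; the client rejoins them.
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    for chunk in normalized.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"
```

For a single-line JSON payload this yields the same bytes as the f-string in the endpoint; the helper only matters once payloads can contain newlines or an event id is needed.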
+
+### Event Types
+
+| Category | Event Types |
+|----------|-------------|
+| Agent | `agent_started`, `agent_activity`, `agent_completed`, `agent_error` |
+| Project | `issue_created`, `issue_updated`, `issue_closed` |
+| Git | `branch_created`, `commit_pushed`, `pr_created`, `pr_merged` |
+| Workflow | `approval_required`, `sprint_started`, `sprint_completed` |
+| Pipeline | `pipeline_started`, `pipeline_completed`, `pipeline_failed` |
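
The SSE endpoint reads `event.type` and calls `event.json()`, but the ADR does not define the event object. One possible shape, sketched under that assumption, is shown below; `EVENT_CATEGORIES`, `CATEGORY_BY_TYPE`, and `Event` are illustrative names, and only the event type strings come from the table above.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

# Event types grouped by category, copied from the table above.
EVENT_CATEGORIES = {
    "agent": ("agent_started", "agent_activity", "agent_completed", "agent_error"),
    "project": ("issue_created", "issue_updated", "issue_closed"),
    "git": ("branch_created", "commit_pushed", "pr_created", "pr_merged"),
    "workflow": ("approval_required", "sprint_started", "sprint_completed"),
    "pipeline": ("pipeline_started", "pipeline_completed", "pipeline_failed"),
}

# Reverse lookup: event type -> category.
CATEGORY_BY_TYPE = {
    event_type: category
    for category, event_types in EVENT_CATEGORIES.items()
    for event_type in event_types
}


@dataclass(frozen=True)
class Event:
    """Hypothetical event object exposing `.type` and `.json()`."""

    type: str
    payload: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject typos early so clients never see an unregistered event name.
        if self.type not in CATEGORY_BY_TYPE:
            raise ValueError(f"unknown event type: {self.type!r}")

    @property
    def category(self) -> str:
        return CATEGORY_BY_TYPE[self.type]

    def json(self) -> str:
        # Compact, single-line JSON keeps each event on one `data:` line.
        return json.dumps(self.payload, separators=(",", ":"))
```

Validating the type at construction keeps the set of names a client has to handle identical to the table.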
+
+### Client Implementation
+
+- Single SSE connection per project
+- Event multiplexing through event types
+- Exponential backoff when the client must reconnect manually (`EventSource` retries on its own at a fixed interval)
+- Native `EventSource` API with automatic reconnect
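
The client behaviour listed above would normally live in browser code. To keep one language across the examples, here is a Python sketch of the same two ideas: routing the frames of a single stream to per-event-type handlers, and computing an exponential backoff delay for manual reconnects. All function names and the backoff parameters (1 s base, 30 s cap, optional jitter) are assumptions for illustration.

```python
import random
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from the lines of an event stream."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the frame; frames without data are dropped.
            if data:
                yield event_type, "\n".join(data)
            event_type, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, e.g. a keep-alive
        else:
            name, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if name == "event":
                event_type = value
            elif name == "data":
                data.append(value)


def dispatch(frames: Iterable[Tuple[str, str]],
             handlers: Dict[str, Callable[[str], None]]) -> int:
    """Route each frame to the handler for its type; return the handled count."""
    handled = 0
    for event_type, data in frames:
        handler = handlers.get(event_type)
        if handler is not None:
            handler(data)
            handled += 1
    return handled


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before reconnect number `attempt` (0-based)."""
    ceiling = min(cap, base * 2 ** attempt)
    # With an rng, pick a random delay up to the ceiling ("full jitter") so
    # many clients do not reconnect in lockstep after an outage.
    return rng.uniform(0, ceiling) if rng is not None else ceiling
```

Because every event on the connection carries its type, one stream per project is enough: adding a new event type only means registering another handler.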
+
+## Consequences
+
+### Positive
+- Simpler implementation for server-to-client streams
+- Automatic reconnection reduces client complexity
+- Works through standard HTTP proxies and load balancers, provided response buffering is disabled for the event stream
+- Reduced server resource usage vs WebSocket
+
+### Negative
+- Two protocols to maintain
+- WebSocket requires manual reconnect logic
+- SSE limited to ~6 connections per domain (HTTP/1.1)
+
+### Mitigation
+- Use HTTP/2 where possible (streams are multiplexed over one connection, so the ~6-connection limit no longer applies)
+- Multiplex all project events on single connection
+- WebSocket only for interactive chat sessions
+
+## Compliance
+
+This decision aligns with:
+- FR-105: Real-time agent activity monitoring
+- NFR-102: 200+ concurrent connections requirement
+- NFR-501: Responsive UI updates
+
+---
+
+*This ADR supersedes any previous decisions regarding real-time communication.*