forked from cardosofelipe/fast-next-template
docs: add architecture decision records (ADRs) for key technical choices
- Added the following ADRs to the `docs/adrs/` directory:
  - ADR-001: MCP Integration Architecture
  - ADR-002: Real-time Communication Architecture
  - ADR-003: Background Task Architecture
  - ADR-004: LLM Provider Abstraction
  - ADR-005: Technology Stack Selection
- Each ADR details the context, decision drivers, considered options, final decisions, and implementation plans.
- Documentation aligns technical choices with architecture principles and system requirements for Syndarix.
160
docs/adrs/ADR-002-realtime-communication.md
Normal file
@@ -0,0 +1,160 @@
# ADR-002: Real-time Communication Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-003

---

## Context

Syndarix requires real-time communication for:
- Agent activity streams
- Project progress updates
- Build/pipeline status
- Client approval requests
- Issue change notifications
- Interactive chat with agents

We need to decide between WebSocket and Server-Sent Events (SSE) for real-time data delivery.

## Decision Drivers

- **Simplicity:** Minimize implementation complexity
- **Reliability:** Built-in reconnection handling
- **Scalability:** Support 200+ concurrent connections
- **Compatibility:** Work through proxies and load balancers
- **Use Case Fit:** Match communication patterns

## Considered Options

### Option 1: WebSocket Only
Use WebSocket for all real-time communication.

**Pros:**
- Bidirectional communication
- Single protocol to manage
- Well-supported in FastAPI

**Cons:**
- Manual reconnection logic required
- More complex through proxies
- Overkill for server-to-client streams

### Option 2: SSE Only
Use Server-Sent Events for all real-time communication.

**Pros:**
- Built-in automatic reconnection
- Native HTTP (proxy-friendly)
- Simpler implementation

**Cons:**
- Unidirectional only
- Browser connection limits per domain

### Option 3: SSE Primary + WebSocket for Chat (Selected)
Use SSE for server-to-client events, WebSocket for bidirectional chat.

**Pros:**
- Best tool for each use case
- SSE simplicity for 90% of needs
- WebSocket only where truly needed

**Cons:**
- Two protocols to manage

## Decision

**Adopt Option 3: SSE as primary transport, WebSocket for interactive chat.**

### SSE Use Cases (90%)
- Agent activity streams
- Project progress updates
- Build/pipeline status
- Approval request notifications
- Issue change notifications

### WebSocket Use Cases (10%)
- Interactive chat with agents
- Real-time debugging sessions
- Future collaboration features

## Implementation

### Event Bus with Redis Pub/Sub

```
FastAPI Backend ──publish──> Redis Pub/Sub ──subscribe──> SSE Endpoints
                                   │
                                   └──> Other Backend Instances
```

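
The publish/subscribe flow in the diagram can be sketched without a Redis server. The following is a minimal single-process stand-in, not the project's actual event bus: `Event`, `Subscriber`, and `InMemoryEventBus` are hypothetical names, and `asyncio.Queue` fan-out replaces Redis Pub/Sub. It only illustrates the `subscribe` / `get_event` / `unsubscribe` interface that the SSE endpoint relies on.

```python
import asyncio
import json
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event envelope: a type name plus a JSON-serializable payload."""
    type: str
    payload: dict = field(default_factory=dict)

    def json(self) -> str:
        return json.dumps(self.payload)


class Subscriber:
    """Handle returned by subscribe(); wraps one queue per connected client."""

    def __init__(self, bus: "InMemoryEventBus", channel: str) -> None:
        self._bus = bus
        self._channel = channel
        self._queue: asyncio.Queue = asyncio.Queue()

    async def get_event(self) -> Event:
        # Waits until the next event is published on this channel.
        return await self._queue.get()

    async def unsubscribe(self) -> None:
        self._bus._subscribers[self._channel].discard(self)


class InMemoryEventBus:
    """Single-process stand-in for Redis Pub/Sub (an assumption for illustration)."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(set)

    async def subscribe(self, channel: str) -> Subscriber:
        subscriber = Subscriber(self, channel)
        self._subscribers[channel].add(subscriber)
        return subscriber

    async def publish(self, channel: str, event: Event) -> int:
        # Fan out to every current subscriber; return how many received the event.
        targets = list(self._subscribers[channel])
        for subscriber in targets:
            subscriber._queue.put_nowait(event)
        return len(targets)
```

With several backend instances, `publish` would instead write to a Redis channel so that subscribers attached to other instances also receive the event; the in-memory version only reaches subscribers in the same process.
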
### SSE Endpoint Pattern

```python
import asyncio

from fastapi import APIRouter, Request
from fastapi.responses import StreamingResponse

router = APIRouter()
# `event_bus` is assumed to be the Redis-backed event bus, defined elsewhere.


@router.get("/projects/{project_id}/events")
async def project_events(project_id: str, request: Request):
    async def event_generator():
        subscriber = await event_bus.subscribe(f"project:{project_id}")
        try:
            while not await request.is_disconnected():
                try:
                    event = await asyncio.wait_for(
                        subscriber.get_event(), timeout=30.0
                    )
                except asyncio.TimeoutError:
                    # No event within 30 s: emit an SSE comment as a keep-alive
                    # instead of letting the timeout end the stream.
                    yield ": keep-alive\n\n"
                    continue
                yield f"event: {event.type}\ndata: {event.json()}\n\n"
        finally:
            # Release the subscription once the client disconnects.
            await subscriber.unsubscribe()

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
    )
```

### Event Types

| Category | Event Types |
|----------|-------------|
| Agent | `agent_started`, `agent_activity`, `agent_completed`, `agent_error` |
| Project | `issue_created`, `issue_updated`, `issue_closed` |
| Git | `branch_created`, `commit_pushed`, `pr_created`, `pr_merged` |
| Workflow | `approval_required`, `sprint_started`, `sprint_completed` |
| Pipeline | `pipeline_started`, `pipeline_completed`, `pipeline_failed` |

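
One way to keep the event names in the table consistent with what goes on the wire is to validate them when a frame is serialized. The sketch below is illustrative only: `EVENT_TYPES`, `KNOWN_EVENTS`, and `format_sse` are hypothetical names, and the payload shape is an assumption. It produces the same `event:` / `data:` frame layout as the endpoint pattern.

```python
import json

# Event names copied from the table above, grouped by category.
EVENT_TYPES = {
    "agent": {"agent_started", "agent_activity", "agent_completed", "agent_error"},
    "project": {"issue_created", "issue_updated", "issue_closed"},
    "git": {"branch_created", "commit_pushed", "pr_created", "pr_merged"},
    "workflow": {"approval_required", "sprint_started", "sprint_completed"},
    "pipeline": {"pipeline_started", "pipeline_completed", "pipeline_failed"},
}
KNOWN_EVENTS = frozenset().union(*EVENT_TYPES.values())


def format_sse(event_type: str, payload: dict) -> str:
    """Serialize one event as an SSE frame (hypothetical helper)."""
    if event_type not in KNOWN_EVENTS:
        raise ValueError(f"unknown event type: {event_type}")
    # json.dumps emits no raw newlines, so the payload fits on one data line.
    data = json.dumps(payload, separators=(",", ":"))
    # A blank line terminates the frame.
    return f"event: {event_type}\ndata: {data}\n\n"
```

For example, `format_sse("pipeline_failed", {"pipeline_id": 7})` yields a frame that a client can route by its `event:` name, while a misspelled event type fails on the server instead of being silently ignored by clients.
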
### Client Implementation

- Single SSE connection per project
- Event multiplexing through event types
- Exponential backoff on reconnection
- Native `EventSource` API with automatic reconnect

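
The native `EventSource` reconnects on its own after a reconnection delay that the server can adjust with the `retry:` field, so an exponential schedule mainly matters where the client manages reconnection itself, such as the WebSocket chat channel. The sketch below shows one possible schedule. It is written in Python for consistency with the backend examples (a browser client would port the same arithmetic), and `backoff_delay` with its default `base`, `cap`, and `jitter` values is an assumption, not part of this ADR.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  jitter: float = 0.0) -> float:
    """Delay in seconds before reconnect attempt `attempt` (0-based)."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Double the delay on every attempt, but never exceed the cap.
    delay = min(cap, base * (2 ** attempt))
    # Optional jitter spreads reconnects out so clients do not retry in lockstep.
    return delay + random.uniform(0.0, jitter * delay)


# Deterministic schedule (jitter disabled) for the first six attempts.
schedule = [backoff_delay(n) for n in range(6)]
```

With the defaults, the delay doubles from 1 second and is capped at 30 seconds. Resetting the attempt counter after a successful connection keeps later reconnects fast.
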
## Consequences

### Positive
- Simpler implementation for server-to-client streams
- Automatic reconnection reduces client complexity
- Works through standard HTTP proxies and load balancers, provided response buffering is disabled for the event stream
- Reduced server resource usage vs WebSocket

### Negative
- Two protocols to maintain
- WebSocket requires manual reconnect logic
- SSE limited to ~6 connections per domain (HTTP/1.1)

### Mitigation
- Use HTTP/2 where possible (streams are multiplexed over one connection, raising the limit to the negotiated maximum of concurrent streams, 100 by default)
- Multiplex all project events on a single connection
- WebSocket only for interactive chat sessions

## Compliance

This decision aligns with:
- FR-105: Real-time agent activity monitoring
- NFR-102: 200+ concurrent connections requirement
- NFR-501: Responsive UI updates

---

*This ADR supersedes any previous decisions regarding real-time communication.*