syndarix/docs/adrs/ADR-009-agent-communication-protocol.md
Felipe Cardoso 406b25cda0 docs: add remaining ADRs and comprehensive architecture documentation
2025-12-29 13:54:43 +01:00

# ADR-009: Agent Communication Protocol
**Status:** Accepted

**Date:** 2025-12-29

**Deciders:** Architecture Team

**Related Spikes:** SPIKE-007

---
## Context
Syndarix requires a robust protocol for inter-agent communication. Ten or more specialized AI agents must collaborate on software projects by sharing context, delegating tasks, and resolving conflicts.
## Decision Drivers
- **Auditability:** All communication must be traceable
- **Flexibility:** Support various communication patterns
- **Performance:** Low-latency for interactive collaboration
- **Reliability:** Messages must not be lost
## Considered Options
### Option 1: Pure Natural Language
Agents communicate via free-form text messages.

- **Pros:** Simple, flexible
- **Cons:** Difficult to route, parse, and audit
### Option 2: Rigid RPC Protocol
Strongly typed function calls between agents.

- **Pros:** Predictable, type-safe
- **Cons:** Loses LLM reasoning flexibility
### Option 3: Structured Envelope + Natural Language Payload (Selected)
A JSON envelope carries routing and audit fields; the message content stays in natural language.

- **Pros:** The envelope makes messages routable and auditable, while the payload preserves LLM reasoning flexibility
- **Cons:** Slightly more complex
## Decision
**Adopt structured message envelopes with natural language payloads**, inspired by concepts from Google's Agent2Agent (A2A) protocol.
## Implementation
### Message Schema
```python
from __future__ import annotations  # AgentIdentity and Attachment are defined elsewhere

from dataclasses import dataclass
from datetime import datetime
from typing import Literal
from uuid import UUID


@dataclass
class AgentMessage:
    id: UUID  # Unique message ID
    type: Literal["request", "response", "broadcast", "notification"]

    # Routing
    from_agent: AgentIdentity  # Source agent
    to_agent: AgentIdentity | None  # Target (None = broadcast)
    routing: Literal["direct", "role", "broadcast", "topic"]

    # Action
    action: str  # e.g., "request_guidance", "task_handoff"
    priority: Literal["low", "normal", "high", "urgent"]

    # Context
    project_id: str
    conversation_id: str | None  # For threading
    correlation_id: UUID | None  # For request/response matching

    # Content
    content: str  # Natural language message
    attachments: list[Attachment]  # Code snippets, files, etc.

    # Metadata
    created_at: datetime
    expires_at: datetime | None
    requires_response: bool
```
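
The message bus in this ADR calls `message.to_dict()`, which the schema above does not define. Redis stream entries are flat maps of field names to values, so one plausible shape is a flattening step like the one below. This is only a sketch: `EnvelopeSketch` is a hypothetical, trimmed-down stand-in for `AgentMessage` (agents and attachments are reduced to strings), and `to_stream_fields` and its encoding choices (empty string for `None`, `"1"`/`"0"` for booleans, JSON for lists) are assumptions, not part of the decision.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass
class EnvelopeSketch:
    """Hypothetical, simplified stand-in for AgentMessage."""

    id: UUID
    type: str
    from_agent: str
    to_agent: str | None
    action: str
    project_id: str
    content: str
    created_at: datetime
    correlation_id: UUID | None = None
    requires_response: bool = False
    attachments: list[str] = field(default_factory=list)


def to_stream_fields(msg: EnvelopeSketch) -> dict[str, str]:
    """Flatten an envelope into string-valued fields (assumed encoding)."""
    out: dict[str, str] = {}
    for key, value in asdict(msg).items():
        if value is None:
            out[key] = ""  # assumed convention for "not set"
        elif isinstance(value, datetime):
            out[key] = value.isoformat()
        elif isinstance(value, bool):
            out[key] = "1" if value else "0"
        elif isinstance(value, list):
            out[key] = json.dumps(value)
        else:
            out[key] = str(value)  # UUIDs and plain strings
    return out


example = EnvelopeSketch(
    id=uuid4(),
    type="request",
    from_agent="agent-123",
    to_agent=None,
    action="task_handoff",
    project_id="proj-1",
    content="Please take over the login endpoint.",
    created_at=datetime(2025, 12, 29, tzinfo=timezone.utc),
)
fields = to_stream_fields(example)
```

Keeping every value a string means the same dictionary can be written to a stream and to a log line without further conversion; a real implementation would also need the inverse mapping.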
### Routing Strategies
| Strategy | Syntax | Use Case |
|----------|--------|----------|
| Direct | `to: "agent-123"` | Specific agent |
| Role-based | `to: "@engineers"` | All agents of role |
| Broadcast | `to: "@all"` | Project-wide |
| Topic-based | `to: "#auth-module"` | Subscribed agents |
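
The address syntax in the table can be mapped to a routing strategy and a channel name with a small resolver. The sketch below is illustrative: `resolve_route` and `Route` are hypothetical names, the `agent:`, `role:`, and `project:` prefixes follow the channel names used by the message bus in this ADR, and the `topic:` prefix is an assumption because the ADR does not name a topic channel.

```python
from typing import Literal, NamedTuple

Routing = Literal["direct", "role", "broadcast", "topic"]


class Route(NamedTuple):
    routing: Routing
    channel: str


def resolve_route(to: str, project_id: str) -> Route:
    """Map a `to` address from the routing table to a strategy and channel."""
    if to in ("", "@", "#"):
        raise ValueError(f"invalid address: {to!r}")
    if to == "@all":
        # Broadcasts are scoped to the project
        return Route("broadcast", f"project:{project_id}")
    if to.startswith("@"):
        # Assumes the text after "@" matches the role channel suffix
        return Route("role", f"role:{to[1:]}")
    if to.startswith("#"):
        # "topic:" prefix is an assumption, not defined in this ADR
        return Route("topic", f"topic:{to[1:]}")
    return Route("direct", f"agent:{to}")
```

For example, `resolve_route("@engineers", "proj-1")` yields the `role` strategy on channel `role:engineers`, and `resolve_route("agent-123", "proj-1")` yields `direct` on `agent:agent-123`.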
### Communication Modes
```python
from enum import Enum


class MessageMode(str, Enum):
    SYNC = "sync"             # Await response (< 30s)
    ASYNC = "async"           # Queue, callback later
    FIRE_AND_FORGET = "fire"  # No response expected
    STREAM = "stream"         # Continuous updates
```
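
`SYNC` mode implies that the sender waits for a reply carrying the same `correlation_id` and gives up after the 30-second limit. One way to sketch that with `asyncio` is shown below; `PendingReplies` and its methods are hypothetical names, and the reply is reduced to a plain string for brevity.

```python
from __future__ import annotations

import asyncio
from uuid import UUID, uuid4

SYNC_TIMEOUT_SECONDS = 30.0  # upper bound stated for SYNC mode


class PendingReplies:
    """Matches incoming responses to waiting senders by correlation ID."""

    def __init__(self) -> None:
        self._waiting: dict[UUID, asyncio.Future[str]] = {}

    def resolve(self, correlation_id: UUID, content: str) -> bool:
        """Deliver a reply; returns False if nobody is waiting for it."""
        future = self._waiting.pop(correlation_id, None)
        if future is None or future.done():
            return False
        future.set_result(content)
        return True

    async def wait(
        self, correlation_id: UUID, timeout: float = SYNC_TIMEOUT_SECONDS
    ) -> str:
        """Wait for the reply; raises asyncio.TimeoutError on expiry."""
        future: asyncio.Future[str] = asyncio.get_running_loop().create_future()
        self._waiting[correlation_id] = future
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            # Drop the entry so late replies are reported as unmatched
            self._waiting.pop(correlation_id, None)


async def demo() -> str:
    pending = PendingReplies()
    correlation_id = uuid4()
    waiter = asyncio.create_task(pending.wait(correlation_id, timeout=1.0))
    await asyncio.sleep(0)  # let the waiter register before the reply arrives
    pending.resolve(correlation_id, "ack")
    return await waiter
```

In a real sender the waiter should be registered before the request is published, otherwise a fast reply could arrive with nobody listening.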
### Message Bus Implementation
```python
from collections.abc import AsyncIterator


class AgentMessageBus:
    """Redis Streams-based message bus for agent communication."""

    async def send(self, message: AgentMessage) -> None:
        # Persist to PostgreSQL for audit
        await self.store.save(message)

        # Append to the Redis stream for real-time delivery;
        # to_dict() must return flat field/value pairs
        channel = self._get_channel(message)
        await self.redis.xadd(channel, message.to_dict())

        # Publish SSE event for UI visibility (first 100 characters only)
        await self.event_bus.publish(
            f"project:{message.project_id}",
            {"type": "agent_message", "preview": message.content[:100]},
        )

    async def subscribe(self, agent_id: str) -> AsyncIterator[AgentMessage]:
        """Subscribe to messages for an agent."""
        # Assumption: the agent record is looked up by ID; this ADR does not
        # specify the lookup API, so `self.agents.get` is a placeholder
        agent = await self.agents.get(agent_id)
        channels = [
            f"agent:{agent_id}",            # Direct messages
            f"role:{agent.role}",           # Role-based
            f"project:{agent.project_id}",  # Broadcasts
        ]
        # ... Redis Streams consumer group logic
```
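
For unit tests it can help to have a stand-in that follows the same order of operations (record for audit, then deliver) and the same channel naming, without Redis or PostgreSQL. The sketch below is such a stand-in, not the Redis Streams implementation: `Agent`, `channels_for`, and `InMemoryBus` are hypothetical names, messages are plain strings, and delivery uses `asyncio.Queue`.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical minimal agent record."""

    id: str
    role: str
    project_id: str


def channels_for(agent: Agent) -> list[str]:
    """Channels an agent listens on, mirroring the bus's subscribe() list."""
    return [
        f"agent:{agent.id}",            # Direct messages
        f"role:{agent.role}",           # Role-based
        f"project:{agent.project_id}",  # Broadcasts
    ]


class InMemoryBus:
    """Test double: audit list instead of PostgreSQL, queues instead of Redis."""

    def __init__(self) -> None:
        self.audit_log: list[tuple[str, str]] = []
        self._inboxes: dict[str, list[asyncio.Queue[str]]] = defaultdict(list)

    def subscribe(self, agent: Agent) -> asyncio.Queue[str]:
        inbox: asyncio.Queue[str] = asyncio.Queue()
        for channel in channels_for(agent):
            self._inboxes[channel].append(inbox)
        return inbox

    async def send(self, channel: str, content: str) -> int:
        """Record the message, then deliver it; returns the recipient count."""
        self.audit_log.append((channel, content))  # audit record comes first
        recipients = self._inboxes.get(channel, [])
        for inbox in recipients:
            await inbox.put(content)
        return len(recipients)


async def demo() -> tuple[int, int, int]:
    bus = InMemoryBus()
    bus.subscribe(Agent("agent-1", "engineer", "proj-1"))
    bus.subscribe(Agent("agent-2", "product_owner", "proj-1"))
    to_role = await bus.send("role:engineer", "Please review the auth changes.")
    to_all = await bus.send("project:proj-1", "Sprint planning starts now.")
    return to_role, to_all, len(bus.audit_log)
```

Messages sent to a channel with no subscribers are still recorded, which matches the requirement that all communication be traceable.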
### Context Hierarchy
1. **Conversation Context** (short-term): Current thread, last N exchanges
2. **Session Context** (medium-term): Sprint goals, recent decisions
3. **Project Context** (long-term): Architecture, requirements, knowledge base
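
The three layers can be sketched as a small container that keeps only the last N exchanges at the conversation level. Everything here is illustrative: `ContextLayers`, the window size of 5, and the choice to render long-term context first are assumptions, since the ADR specifies neither N nor an ordering.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field

MAX_EXCHANGES = 5  # assumed value for "last N exchanges"


@dataclass
class ContextLayers:
    project: list[str] = field(default_factory=list)   # long-term
    session: list[str] = field(default_factory=list)   # medium-term
    conversation: deque[str] = field(                   # short-term
        default_factory=lambda: deque(maxlen=MAX_EXCHANGES)
    )

    def record_exchange(self, text: str) -> None:
        """Append an exchange; the oldest one is dropped once the window is full."""
        self.conversation.append(text)

    def build_prompt_context(self) -> str:
        """Render non-empty layers, long-term first (assumed ordering)."""
        sections = [
            ("Project", self.project),
            ("Session", self.session),
            ("Conversation", list(self.conversation)),
        ]
        lines: list[str] = []
        for title, items in sections:
            if items:
                lines.append(f"## {title}")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


ctx = ContextLayers(
    project=["Architecture: see ARCHITECTURE.md"],
    session=["Sprint goal: ship the auth module"],
)
for i in range(7):
    ctx.record_exchange(f"exchange {i}")
rendered = ctx.build_prompt_context()
```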
### Conflict Resolution
When agents disagree:
1. **Peer Resolution:** Agents attempt consensus (2 attempts)
2. **Supervisor Escalation:** Product Owner or Architect decides
3. **Human Override:** Client approval if configured
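
The escalation ladder above can be expressed as a single function. This is one possible reading, not the definitive flow: it assumes that when a human override is configured, the supervisor's decision is passed to the client for a final say. `resolve_conflict`, `ResolvedBy`, and the callback signatures are hypothetical.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Optional

MAX_PEER_ATTEMPTS = 2  # from the ADR: agents attempt consensus twice


class ResolvedBy(str, Enum):
    PEERS = "peers"
    SUPERVISOR = "supervisor"
    HUMAN = "human"


def resolve_conflict(
    peer_attempt: Callable[[int], Optional[str]],
    supervisor_decide: Callable[[], str],
    human_override: Optional[Callable[[str], str]] = None,
) -> tuple[ResolvedBy, str]:
    """Walk the escalation ladder and report who settled the conflict."""
    # 1. Peer resolution: each attempt returns a consensus or None
    for attempt in range(1, MAX_PEER_ATTEMPTS + 1):
        outcome = peer_attempt(attempt)
        if outcome is not None:
            return ResolvedBy.PEERS, outcome

    # 2. Supervisor escalation (Product Owner or Architect)
    decision = supervisor_decide()

    # 3. Human override, only if configured (assumed to review the decision)
    if human_override is not None:
        return ResolvedBy.HUMAN, human_override(decision)
    return ResolvedBy.SUPERVISOR, decision
```

Returning who resolved the conflict alongside the outcome keeps the result easy to record in the audit trail.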
## Consequences
### Positive
- Full audit trail of all agent communication
- Flexible routing supports various collaboration patterns
- Natural language preserves LLM reasoning quality
- Real-time UI visibility into agent collaboration
### Negative
- Additional complexity compared with simple function calls
- Persisting every message increases storage requirements
### Mitigation
- Archival policy for old messages
- Compression for large attachments
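
As an illustration of the compression mitigation, the sketch below compresses an attachment only when it is large and when compression actually saves space. The 4 KiB threshold, the use of `zlib`, and the function names are assumptions; the ADR does not pick a codec or a size limit.

```python
from __future__ import annotations

import zlib

COMPRESS_THRESHOLD_BYTES = 4096  # assumed cutoff for "large" attachments


def pack_attachment(data: bytes) -> tuple[bool, bytes]:
    """Return (compressed?, payload); small or incompressible data is stored as-is."""
    if len(data) < COMPRESS_THRESHOLD_BYTES:
        return False, data
    packed = zlib.compress(data, level=6)
    if len(packed) >= len(data):
        return False, data  # compression did not help
    return True, packed


def unpack_attachment(compressed: bool, payload: bytes) -> bytes:
    """Inverse of pack_attachment."""
    return zlib.decompress(payload) if compressed else payload


snippet = b"def handler(request):\n    return ok(request)\n" * 200
is_compressed, stored = pack_attachment(snippet)
restored = unpack_attachment(is_compressed, stored)
```

The flag would need to be stored with the attachment so readers know whether to decompress.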
## Compliance
This decision aligns with:
- FR-104: Inter-agent communication
- FR-105: Agent activity monitoring
- NFR-602: Comprehensive audit logging
---
*This ADR establishes the agent communication protocol for Syndarix.*