# ADR-009: Agent Communication Protocol

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-007

---

## Context

Syndarix requires a robust protocol for inter-agent communication. Ten or more specialized AI agents must collaborate on software projects by sharing context, delegating tasks, and resolving conflicts.

## Decision Drivers

- **Auditability:** All communication must be traceable
- **Flexibility:** Support various communication patterns
- **Performance:** Low latency for interactive collaboration
- **Reliability:** Messages must not be lost

## Considered Options

### Option 1: Pure Natural Language
Agents communicate via free-form text messages.

**Pros:** Simple, flexible
**Cons:** Difficult to route, parse, and audit

### Option 2: Rigid RPC Protocol
Strongly-typed function calls between agents.

**Pros:** Predictable, type-safe
**Cons:** Loses LLM reasoning flexibility

### Option 3: Structured Envelope + Natural Language Payload (Selected)
JSON envelope for routing and auditing, with natural language content.

**Pros:** Combines the strengths of both options: routable and auditable while preserving LLM capabilities
**Cons:** Slightly more complex than either alternative

## Decision

**Adopt structured message envelopes with natural language payloads**, inspired by Google's A2A protocol concepts.

## Implementation

### Message Schema

```python
from __future__ import annotations  # Defer annotation evaluation

from dataclasses import dataclass
from datetime import datetime
from typing import Literal
from uuid import UUID

# AgentIdentity and Attachment are defined elsewhere in the codebase.


@dataclass
class AgentMessage:
    id: UUID                          # Unique message ID
    type: Literal["request", "response", "broadcast", "notification"]

    # Routing
    from_agent: AgentIdentity         # Source agent
    to_agent: AgentIdentity | None    # Target (None = broadcast)
    routing: Literal["direct", "role", "broadcast", "topic"]

    # Action
    action: str                       # e.g., "request_guidance", "task_handoff"
    priority: Literal["low", "normal", "high", "urgent"]

    # Context
    project_id: str
    conversation_id: str | None       # For threading
    correlation_id: UUID | None       # For request/response matching

    # Content
    content: str                      # Natural language message
    attachments: list[Attachment]     # Code snippets, files, etc.

    # Metadata
    created_at: datetime
    expires_at: datetime | None
    requires_response: bool
```

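Because the envelope is meant to travel as JSON, the non-JSON field types (`UUID`, `datetime`) need an explicit encoding. The following is a minimal sketch of a round trip, not the project's serializer. `Envelope`, `to_json`, and `from_json` are hypothetical names, and `Envelope` is a simplified stand-in that carries only a subset of the `AgentMessage` fields and uses plain strings for agent identities.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class Envelope:
    """Simplified stand-in for AgentMessage (assumed subset of fields)."""
    action: str
    content: str
    project_id: str
    from_agent: str
    to_agent: Optional[str] = None
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def to_json(msg: Envelope) -> str:
    """Serialize the envelope; UUID and datetime values become strings."""
    def encode(value):
        if isinstance(value, UUID):
            return str(value)
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f"cannot serialize {type(value).__name__}")

    return json.dumps(asdict(msg), default=encode)


def from_json(raw: str) -> Envelope:
    """Rebuild the envelope, restoring the typed fields."""
    data = json.loads(raw)
    data["id"] = UUID(data["id"])
    data["created_at"] = datetime.fromisoformat(data["created_at"])
    return Envelope(**data)


msg = Envelope(
    action="request_guidance",
    content="Which auth library should we use?",
    project_id="proj-1",
    from_agent="agent-123",
    to_agent="agent-456",
)
restored = from_json(to_json(msg))
```

The routing fields stay machine-readable after serialization, while `content` remains free text for the receiving agent to interpret.
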
### Routing Strategies

| Strategy | Syntax | Use Case |
|----------|--------|----------|
| Direct | `to: "agent-123"` | Specific agent |
| Role-based | `to: "@engineers"` | All agents of role |
| Broadcast | `to: "@all"` | Project-wide |
| Topic-based | `to: "#auth-module"` | Subscribed agents |

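The address syntax in the table can be resolved to a routing strategy and a channel name with a few prefix checks. This is a sketch under assumptions: `resolve_route` is a hypothetical helper, the `agent:`, `role:`, and `project:` prefixes follow the channel names used by the message bus in this ADR, and the `topic:` prefix is an illustrative guess.

```python
def resolve_route(address: str, project_id: str) -> tuple[str, str]:
    """Map a `to` address to (routing strategy, channel name)."""
    if address in ("", "@", "#"):
        raise ValueError("address must name an agent, role, or topic")
    if address == "@all":
        # Broadcasts go to the project-wide channel.
        return "broadcast", f"project:{project_id}"
    if address.startswith("@"):
        return "role", f"role:{address[1:]}"
    if address.startswith("#"):
        return "topic", f"topic:{address[1:]}"
    # Anything without a prefix is treated as a direct agent ID.
    return "direct", f"agent:{address}"


routes = [
    resolve_route(address, "proj-1")
    for address in ("agent-123", "@engineers", "@all", "#auth-module")
]
```

Checking `@all` before the generic `@` prefix matters; otherwise a broadcast would be misread as a role named `all`.
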
### Communication Modes

```python
from enum import Enum


class MessageMode(str, Enum):
    SYNC = "sync"             # Await response (< 30s)
    ASYNC = "async"           # Queue, callback later
    FIRE_AND_FORGET = "fire"  # No response expected
    STREAM = "stream"         # Continuous updates
```

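For the sync mode, one way to pair a response with its request is to park a future under the request's `correlation_id` and await it with a timeout. The sketch below assumes that approach; `PendingReplies` and its methods are hypothetical names, and the 30-second default mirrors the bound noted on `SYNC`.

```python
import asyncio
from uuid import UUID, uuid4


class PendingReplies:
    """Tracks in-flight sync requests by correlation ID."""

    def __init__(self) -> None:
        self._waiting: dict[UUID, asyncio.Future] = {}

    def expect(self) -> tuple[UUID, asyncio.Future]:
        """Register a new request and return its correlation ID and future."""
        correlation_id = uuid4()
        future = asyncio.get_running_loop().create_future()
        self._waiting[correlation_id] = future
        return correlation_id, future

    def resolve(self, correlation_id: UUID, reply: str) -> bool:
        """Deliver a reply; returns False if nobody is waiting for it."""
        future = self._waiting.pop(correlation_id, None)
        if future is None or future.done():
            return False
        future.set_result(reply)
        return True

    async def wait(self, correlation_id: UUID, future: asyncio.Future,
                   timeout: float = 30.0) -> str:
        """Await the reply, raising asyncio.TimeoutError after `timeout` seconds."""
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            # Drop the entry so late replies are ignored rather than leaked.
            self._waiting.pop(correlation_id, None)


async def demo() -> tuple[str, bool]:
    pending = PendingReplies()

    # Case 1: the responder answers shortly after the request is sent.
    cid, fut = pending.expect()
    asyncio.get_running_loop().call_later(0.01, pending.resolve, cid, "use OAuth2")
    reply = await pending.wait(cid, fut, timeout=1.0)

    # Case 2: nobody answers, so the wait times out.
    cid2, fut2 = pending.expect()
    try:
        await pending.wait(cid2, fut2, timeout=0.01)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return reply, timed_out


result = asyncio.run(demo())
```

A caller that hits the timeout could fall back to the async mode and let the reply arrive later via callback.
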
### Message Bus Implementation

```python
from collections.abc import AsyncIterator


class AgentMessageBus:
    """Redis Streams-based message bus for agent communication."""

    async def send(self, message: AgentMessage) -> None:
        # Persist to PostgreSQL for audit
        await self.store.save(message)

        # Publish to Redis for real-time delivery.
        # to_dict() is assumed to return a flat mapping of string fields,
        # which is what a Redis stream entry requires.
        channel = self._get_channel(message)
        await self.redis.xadd(channel, message.to_dict())

        # Publish SSE event for UI visibility
        await self.event_bus.publish(
            f"project:{message.project_id}",
            {"type": "agent_message", "preview": message.content[:100]},
        )

    async def subscribe(self, agent_id: str) -> AsyncIterator[AgentMessage]:
        """Subscribe to messages for an agent."""
        # Assumption: an agent registry (self.agents) resolves the ID to the
        # agent's role and project; the original sketch left this lookup implicit.
        agent = await self.agents.get(agent_id)
        channels = [
            f"agent:{agent_id}",            # Direct messages
            f"role:{agent.role}",           # Role-based
            f"project:{agent.project_id}",  # Broadcasts
        ]
        # ... Redis Streams consumer group logic
```

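The persist-then-publish order and the per-agent channel fan-in can be exercised without PostgreSQL or Redis. The following is a hypothetical in-memory test double, not the production bus: `InMemoryBus` uses a list in place of the audit store and `asyncio.Queue` inboxes in place of Redis Streams, and its `send`/`subscribe` signatures are simplified for illustration.

```python
import asyncio
from collections import defaultdict


class InMemoryBus:
    """Test double: same send/subscribe shape, no external services."""

    def __init__(self) -> None:
        self.audit_log: list[dict] = []  # Stands in for PostgreSQL
        # Channel name -> inboxes listening on it (stands in for Redis Streams)
        self._inboxes: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, channels: list[str]) -> asyncio.Queue:
        """Create one inbox that receives messages from all given channels."""
        inbox: asyncio.Queue = asyncio.Queue()
        for channel in channels:
            self._inboxes[channel].append(inbox)
        return inbox

    async def send(self, channel: str, message: dict) -> None:
        # Persist first so the audit record exists before delivery.
        self.audit_log.append(message)
        for inbox in self._inboxes[channel]:
            await inbox.put(message)


async def demo():
    bus = InMemoryBus()
    engineer = bus.subscribe(["agent:agent-123", "role:engineer", "project:proj-1"])
    architect = bus.subscribe(["agent:agent-456", "role:architect", "project:proj-1"])

    await bus.send("agent:agent-123", {"content": "direct to the engineer"})
    await bus.send("project:proj-1", {"content": "project-wide broadcast"})
    return bus, engineer.qsize(), architect.qsize()


bus, engineer_count, architect_count = asyncio.run(demo())
```

Here the engineer's inbox holds both messages, the architect's holds only the broadcast, and the audit log holds every message regardless of who received it.
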
### Context Hierarchy

1. **Conversation Context** (short-term): Current thread, last N exchanges
2. **Session Context** (medium-term): Sprint goals, recent decisions
3. **Project Context** (long-term): Architecture, requirements, knowledge base

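One way to picture the three layers is as a single structure that is flattened into prompt context, with only the conversation layer trimmed to the last N exchanges. This is an illustrative sketch: `AgentContext`, `render`, the layer ordering (most durable first), and the sample entries are all assumptions rather than part of the decision.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    project: list[str] = field(default_factory=list)       # Long-term
    session: list[str] = field(default_factory=list)       # Medium-term
    conversation: list[str] = field(default_factory=list)  # Short-term

    def render(self, last_n: int = 3) -> str:
        """Flatten the layers, keeping only the last N conversation entries."""
        recent = self.conversation[-last_n:] if last_n > 0 else []
        sections = [
            ("Project", self.project),
            ("Session", self.session),
            ("Conversation", recent),
        ]
        return "\n".join(
            f"[{name}] {item}" for name, items in sections for item in items
        )


ctx = AgentContext(
    project=["Requirement: users can log in"],
    session=["Sprint goal: ship login"],
    conversation=["m1", "m2", "m3", "m4"],
)
rendered = ctx.render(last_n=2)
```
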
### Conflict Resolution

When agents disagree:
1. **Peer Resolution:** Agents attempt to reach consensus (up to 2 attempts)
2. **Supervisor Escalation:** Product Owner or Architect decides
3. **Human Override:** Client approval, if configured

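The escalation ladder above can be sketched as a function that tries each level in order. The names here are hypothetical, and one detail is an assumption: the sketch lets the supervisor decline (return `None`), which is what triggers the human step when it is configured.

```python
from typing import Callable, Optional

# A resolver returns a decision, or None if that level could not settle it.
Resolver = Callable[[], Optional[str]]


def resolve_conflict(
    peer: Resolver,
    supervisor: Resolver,
    human: Optional[Resolver] = None,  # None means human override is not configured
    max_peer_attempts: int = 2,
) -> tuple[str, str]:
    """Return (level that decided, decision), escalating level by level."""
    for _ in range(max_peer_attempts):
        decision = peer()
        if decision is not None:
            return "peer", decision

    decision = supervisor()
    if decision is not None:
        return "supervisor", decision

    if human is not None:
        decision = human()
        if decision is not None:
            return "human", decision

    raise RuntimeError("conflict unresolved at every configured level")


peer_calls = []


def stubborn_peers() -> Optional[str]:
    """Peers that never reach consensus."""
    peer_calls.append(1)
    return None


level, decision = resolve_conflict(stubborn_peers, lambda: "use PostgreSQL")
```

In this run the peers fail twice, so the supervisor's decision is the one returned.
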
## Consequences

### Positive
- Full audit trail of all agent communication
- Flexible routing supports various collaboration patterns
- Natural language preserves LLM reasoning quality
- Real-time UI visibility into agent collaboration

### Negative
- Additional complexity vs simple function calls
- Message persistence storage requirements

### Mitigation
- Archival policy for old messages
- Compression for large attachments

## Compliance

This decision aligns with:
- FR-104: Inter-agent communication
- FR-105: Agent activity monitoring
- NFR-602: Comprehensive audit logging

---

*This ADR establishes the agent communication protocol for Syndarix.*