docs: add remaining ADRs and comprehensive architecture documentation

Added 7 new Architecture Decision Records completing the full set:
- ADR-008: Knowledge Base and RAG (pgvector)
- ADR-009: Agent Communication Protocol (structured messages)
- ADR-010: Workflow State Machine (transitions + PostgreSQL)
- ADR-011: Issue Synchronization (webhook-first + polling)
- ADR-012: Cost Tracking (LiteLLM callbacks + Redis budgets)
- ADR-013: Audit Logging (hash chaining + tiered storage)
- ADR-014: Client Approval Flow (checkpoint-based)

Added comprehensive ARCHITECTURE.md that:
- Summarizes all 14 ADRs in decision matrix
- Documents full system architecture with diagrams
- Explains all component interactions
- Details technology stack with self-hostability guarantee
- Covers security, scalability, and deployment

Updated IMPLEMENTATION_ROADMAP.md to mark Phase 0 completed items.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
166
docs/adrs/ADR-009-agent-communication-protocol.md
Normal file
@@ -0,0 +1,166 @@
# ADR-009: Agent Communication Protocol

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-007

---

## Context

Syndarix requires a robust protocol for inter-agent communication. More than ten specialized AI agents must collaborate on software projects: sharing context, delegating tasks, and resolving conflicts.

## Decision Drivers

- **Auditability:** All communication must be traceable
- **Flexibility:** Support various communication patterns
- **Performance:** Low latency for interactive collaboration
- **Reliability:** Messages must not be lost

## Considered Options

### Option 1: Pure Natural Language
Agents communicate via free-form text messages.

**Pros:** Simple, flexible
**Cons:** Difficult to route, parse, and audit

### Option 2: Rigid RPC Protocol
Strongly-typed function calls between agents.

**Pros:** Predictable, type-safe
**Cons:** Loses LLM reasoning flexibility

### Option 3: Structured Envelope + Natural Language Payload (Selected)
JSON envelope for routing/auditing with natural language content.

**Pros:** Best of both worlds: routable and auditable while preserving LLM capabilities
**Cons:** Slightly more complex

## Decision

**Adopt structured message envelopes with natural language payloads**, inspired by Google's A2A protocol concepts.

## Implementation

### Message Schema

```python
from __future__ import annotations  # AgentIdentity and Attachment are defined elsewhere

from dataclasses import dataclass
from datetime import datetime
from typing import Literal
from uuid import UUID


@dataclass
class AgentMessage:
    id: UUID                        # Unique message ID
    type: Literal["request", "response", "broadcast", "notification"]

    # Routing
    from_agent: AgentIdentity       # Source agent
    to_agent: AgentIdentity | None  # Target (None = broadcast)
    routing: Literal["direct", "role", "broadcast", "topic"]

    # Action
    action: str                     # e.g., "request_guidance", "task_handoff"
    priority: Literal["low", "normal", "high", "urgent"]

    # Context
    project_id: str
    conversation_id: str | None     # For threading
    correlation_id: UUID | None     # For request/response matching

    # Content
    content: str                    # Natural language message
    attachments: list[Attachment]   # Code snippets, files, etc.

    # Metadata
    created_at: datetime
    expires_at: datetime | None
    requires_response: bool
```
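
The message bus calls `message.to_dict()` before writing to Redis, but the ADR does not show that method. Because a Redis stream entry is a flat map of field/value pairs, nested values such as the agent identity have to be encoded first. The following is a minimal sketch of one way to do that, not the project's implementation: `SketchMessage` is a trimmed-down stand-in for `AgentMessage`, the `agent_id` and `role` fields on `AgentIdentity` are assumptions, and `from_dict` is a hypothetical inverse for the consumer side.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class AgentIdentity:
    # Hypothetical shape: the ADR does not define AgentIdentity's fields.
    agent_id: str
    role: str


@dataclass
class SketchMessage:
    """Trimmed-down stand-in for AgentMessage, for illustration only."""

    id: UUID
    type: str
    from_agent: AgentIdentity
    to_agent: Optional[AgentIdentity]
    action: str
    project_id: str
    content: str
    created_at: datetime
    requires_response: bool = False
    correlation_id: Optional[UUID] = None

    def to_dict(self) -> dict:
        """Flatten to string values so the result fits a flat stream entry."""
        return {
            "id": str(self.id),
            "type": self.type,
            "from_agent": json.dumps(asdict(self.from_agent)),
            "to_agent": json.dumps(asdict(self.to_agent)) if self.to_agent else "",
            "action": self.action,
            "project_id": self.project_id,
            "content": self.content,
            "created_at": self.created_at.isoformat(),
            "requires_response": "1" if self.requires_response else "0",
            "correlation_id": str(self.correlation_id) if self.correlation_id else "",
        }

    @classmethod
    def from_dict(cls, raw: dict) -> "SketchMessage":
        """Rebuild a message from the flat form produced by to_dict()."""
        to_agent = raw["to_agent"]
        correlation_id = raw["correlation_id"]
        return cls(
            id=UUID(raw["id"]),
            type=raw["type"],
            from_agent=AgentIdentity(**json.loads(raw["from_agent"])),
            to_agent=AgentIdentity(**json.loads(to_agent)) if to_agent else None,
            action=raw["action"],
            project_id=raw["project_id"],
            content=raw["content"],
            created_at=datetime.fromisoformat(raw["created_at"]),
            requires_response=raw["requires_response"] == "1",
            correlation_id=UUID(correlation_id) if correlation_id else None,
        )
```

Empty strings stand in for `None` here because stream fields cannot hold null values; a real implementation might instead omit absent fields.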

### Routing Strategies

| Strategy | Syntax | Use Case |
|----------|--------|----------|
| Direct | `to: "agent-123"` | Specific agent |
| Role-based | `to: "@engineers"` | All agents of role |
| Broadcast | `to: "@all"` | Project-wide |
| Topic-based | `to: "#auth-module"` | Subscribed agents |
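
The address syntax in the table can be turned into a routing strategy with a small parser. This is a sketch under stated assumptions: `Route` and `parse_address` are hypothetical names, and the precedence (`@all` checked before other `@` addresses) is inferred from the table rather than specified by the ADR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    routing: str           # "direct", "role", "broadcast", or "topic"
    target: Optional[str]  # Agent ID, role name, or topic; None for broadcast


def parse_address(to: str) -> Route:
    """Classify a `to` address using the prefixes from the routing table."""
    if not to:
        raise ValueError("address must not be empty")
    if to == "@all":
        return Route("broadcast", None)
    if to[0] in "@#":
        name = to[1:]
        if not name:
            raise ValueError(f"address {to!r} has a prefix but no name")
        return Route("role" if to[0] == "@" else "topic", name)
    # Anything without a prefix is treated as a specific agent ID.
    return Route("direct", to)
```

For example, `parse_address("@engineers")` yields a role route targeting `engineers`, while `parse_address("agent-123")` yields a direct route.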

### Communication Modes

```python
from enum import Enum


class MessageMode(str, Enum):
    SYNC = "sync"             # Await response (< 30s)
    ASYNC = "async"           # Queue, callback later
    FIRE_AND_FORGET = "fire"  # No response expected
    STREAM = "stream"         # Continuous updates
```
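
`SYNC` mode implies that the sender waits for a reply matched by `correlation_id` and gives up after the time limit. One way this could work is sketched below with plain `asyncio` futures. `PendingReplies` and its methods are hypothetical helpers, and reading the "< 30s" note as a 30-second timeout is an assumption.

```python
import asyncio
from uuid import UUID, uuid4

SYNC_TIMEOUT_SECONDS = 30.0  # Assumed from the "< 30s" note on SYNC


class PendingReplies:
    """Tracks in-flight SYNC requests by correlation ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._waiting = {}  # Maps correlation ID -> asyncio.Future

    def resolve(self, correlation_id: UUID, reply: str) -> bool:
        """Deliver a reply to its waiter; False if nobody is waiting."""
        future = self._waiting.pop(correlation_id, None)
        if future is None or future.done():
            return False
        future.set_result(reply)
        return True

    async def wait(self, correlation_id: UUID,
                   timeout: float = SYNC_TIMEOUT_SECONDS) -> str:
        """Block until the matching reply arrives or the timeout elapses."""
        future = asyncio.get_running_loop().create_future()
        self._waiting[correlation_id] = future
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            # Drop the entry on success, timeout, or cancellation.
            self._waiting.pop(correlation_id, None)


async def demo():
    pending = PendingReplies()
    correlation_id = uuid4()

    async def responder() -> None:
        await asyncio.sleep(0.01)
        pending.resolve(correlation_id, "ack")

    task = asyncio.create_task(responder())
    reply = await pending.wait(correlation_id, timeout=1.0)
    await task

    try:
        await pending.wait(uuid4(), timeout=0.01)  # Nobody answers this one
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return reply, timed_out
```

On timeout the caller could fall back to `ASYNC` handling or escalate; the ADR does not say which.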

### Message Bus Implementation

```python
from collections.abc import AsyncIterator


class AgentMessageBus:
    """Redis Streams-based message bus for agent communication.

    Assumes `store`, `redis`, `event_bus`, and `agents` are injected elsewhere.
    """

    async def send(self, message: AgentMessage) -> None:
        # Persist to PostgreSQL first, so the audit record exists before delivery
        await self.store.save(message)

        # Publish to Redis for real-time delivery
        channel = self._get_channel(message)
        await self.redis.xadd(channel, message.to_dict())

        # Publish SSE event for UI visibility (first 100 characters only)
        await self.event_bus.publish(
            f"project:{message.project_id}",
            {"type": "agent_message", "preview": message.content[:100]},
        )

    async def subscribe(self, agent_id: str) -> AsyncIterator[AgentMessage]:
        """Subscribe to messages for an agent."""
        # Hypothetical lookup that resolves the agent's role and project
        agent = await self.agents.get(agent_id)
        channels = [
            f"agent:{agent_id}",            # Direct messages
            f"role:{agent.role}",           # Role-based
            f"project:{agent.project_id}",  # Broadcasts
        ]
        # ... Redis Streams consumer group logic
```
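
`send()` relies on `_get_channel()`, which the ADR leaves undefined. A minimal sketch of the mapping follows, reusing the `agent:`, `role:`, and `project:` prefixes from `subscribe()`. The function name `channel_for`, its parameters, and the `topic:` prefix are assumptions, since the ADR shows no topic channel.

```python
from typing import Optional

# Channel prefix per targeted routing strategy. "topic" is an assumed prefix.
_PREFIXES = {"direct": "agent", "role": "role", "topic": "topic"}


def channel_for(routing: str, project_id: str,
                target: Optional[str] = None) -> str:
    """Map a routing strategy to the stream name a message is published on."""
    if routing == "broadcast":
        # Broadcasts go to the project-wide channel every agent subscribes to.
        return f"project:{project_id}"
    if routing not in _PREFIXES:
        raise ValueError(f"unknown routing strategy: {routing!r}")
    if not target:
        raise ValueError(f"{routing!r} routing needs a target")
    return f"{_PREFIXES[routing]}:{target}"
```

Note that the role channel, as named in `subscribe()`, is not scoped by project; whether roles are meant to be shared across projects is not stated in the ADR.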

### Context Hierarchy

1. **Conversation Context** (short-term): Current thread, last N exchanges
2. **Session Context** (medium-term): Sprint goals, recent decisions
3. **Project Context** (long-term): Architecture, requirements, knowledge base
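
The three tiers could be held in one structure that bounds the conversation tier to the last N exchanges. The sketch below is illustrative only: `LayeredContext`, the default of 5 for N, and the rendering order are all assumptions, as the ADR does not specify them.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LayeredContext:
    project: list = field(default_factory=list)  # Long-term facts
    session: list = field(default_factory=list)  # Medium-term goals and decisions
    max_exchanges: int = 5                       # "last N exchanges"; N is assumed
    conversation: deque = field(init=False)      # Short-term, bounded

    def __post_init__(self) -> None:
        # A bounded deque silently drops the oldest exchange once it is full.
        self.conversation = deque(maxlen=self.max_exchanges)

    def record_exchange(self, text: str) -> None:
        self.conversation.append(text)

    def render(self) -> str:
        """Render long-term context first and the newest exchanges last."""
        blocks = [
            ("Project context", self.project),
            ("Session context", self.session),
            ("Conversation context", list(self.conversation)),
        ]
        return "\n\n".join(
            f"## {title}\n" + "\n".join(f"- {item}" for item in items)
            for title, items in blocks
            if items
        )
```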

### Conflict Resolution

When agents disagree:
1. **Peer Resolution:** Agents attempt to reach consensus (up to 2 attempts)
2. **Supervisor Escalation:** Product Owner or Architect decides
3. **Human Override:** Client approval if configured
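
The escalation ladder above can be sketched as a small function. The names here (`ResolutionLevel`, `resolve_conflict`, and the three callbacks) are hypothetical. The sketch also assumes one reading of step 3: when client approval is configured, the client gets the final say on the supervisor's decision.

```python
from enum import Enum
from typing import Callable, Optional

MAX_PEER_ATTEMPTS = 2  # From "2 attempts" in the peer resolution step


class ResolutionLevel(str, Enum):
    PEER = "peer"
    SUPERVISOR = "supervisor"
    HUMAN = "human"


def resolve_conflict(
    peer_round: Callable[[int], Optional[str]],
    supervisor: Callable[[], str],
    client_override: Optional[Callable[[str], str]] = None,
):
    """Return (level, decision) for the first level that settles the conflict."""
    # 1. Peers try to agree; a round returns None when no consensus is reached.
    for attempt in range(1, MAX_PEER_ATTEMPTS + 1):
        outcome = peer_round(attempt)
        if outcome is not None:
            return ResolutionLevel.PEER, outcome

    # 2. Product Owner or Architect decides.
    decision = supervisor()
    if client_override is None:
        return ResolutionLevel.SUPERVISOR, decision

    # 3. If configured, the client confirms or replaces the decision.
    return ResolutionLevel.HUMAN, client_override(decision)
```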

## Consequences

### Positive
- Full audit trail of all agent communication
- Flexible routing supports various collaboration patterns
- Natural language preserves LLM reasoning quality
- Real-time UI visibility into agent collaboration

### Negative
- Additional complexity compared with simple function calls
- Persisting every message increases storage requirements

### Mitigation
- Archival policy for old messages
- Compression for large attachments

## Compliance

This decision aligns with:
- FR-104: Inter-agent communication
- FR-105: Agent activity monitoring
- NFR-602: Comprehensive audit logging

---

*This ADR establishes the agent communication protocol for Syndarix.*