ADR-009: Agent Communication Protocol
Status: Accepted
Date: 2025-12-29
Deciders: Architecture Team
Related Spikes: SPIKE-007
Context
Syndarix requires a protocol for inter-agent communication. More than ten specialized AI agents must collaborate on software projects: sharing context, delegating tasks, and resolving conflicts.
Decision Drivers
- Auditability: All communication must be traceable
- Flexibility: Support various communication patterns
- Performance: Low latency for interactive collaboration
- Reliability: Messages must not be lost
Considered Options
Option 1: Pure Natural Language
Agents communicate via free-form text messages.
Pros: Simple, flexible
Cons: Difficult to route, parse, and audit
Option 2: Rigid RPC Protocol
Strongly-typed function calls between agents.
Pros: Predictable, type-safe
Cons: Loses LLM reasoning flexibility
Option 3: Structured Envelope + Natural Language Payload (Selected)
JSON envelope for routing/auditing with natural language content.
Pros: Best of both worlds: routable and auditable while preserving LLM capabilities
Cons: Slightly more complex
Decision
Adopt structured message envelopes with natural language payloads, inspired by concepts from Google's Agent2Agent (A2A) protocol.
Implementation
Message Schema
from dataclasses import dataclass
from datetime import datetime
from typing import Literal
from uuid import UUID

# AgentIdentity and Attachment are assumed to be defined elsewhere.

@dataclass
class AgentMessage:
    id: UUID  # Unique message ID
    type: Literal["request", "response", "broadcast", "notification"]

    # Routing
    from_agent: AgentIdentity  # Source agent
    to_agent: AgentIdentity | None  # Target (None = broadcast)
    routing: Literal["direct", "role", "broadcast", "topic"]

    # Action
    action: str  # e.g., "request_guidance", "task_handoff"
    priority: Literal["low", "normal", "high", "urgent"]

    # Context
    project_id: str
    conversation_id: str | None  # For threading
    correlation_id: UUID | None  # For request/response matching

    # Content
    content: str  # Natural language message
    attachments: list[Attachment]  # Code snippets, files, etc.

    # Metadata
    created_at: datetime
    expires_at: datetime | None
    requires_response: bool
Routing Strategies
| Strategy | Syntax | Use Case |
|---|---|---|
| Direct | to: "agent-123" | Specific agent |
| Role-based | to: "@engineers" | All agents of role |
| Broadcast | to: "@all" | Project-wide |
| Topic-based | to: "#auth-module" | Subscribed agents |
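The address syntax in the table can be mapped onto the `routing` field of the envelope with a small parser. This is a sketch only: the function name `parse_target` and its return shape are assumptions, not part of the decision.

```python
from typing import Literal, Optional

Routing = Literal["direct", "role", "broadcast", "topic"]


def parse_target(to: str) -> tuple[Routing, Optional[str]]:
    """Map a `to` address onto (routing strategy, target name).

    "@all" is checked before the generic "@role" prefix so that it is
    treated as a broadcast rather than a role called "all".
    """
    if to == "@all":
        return "broadcast", None
    if to.startswith("@"):
        kind, name = "role", to[1:]
    elif to.startswith("#"):
        kind, name = "topic", to[1:]
    else:
        kind, name = "direct", to
    if not name:
        raise ValueError(f"empty target in address {to!r}")
    return kind, name
```

For example, `parse_target("@engineers")` yields `("role", "engineers")` and `parse_target("#auth-module")` yields `("topic", "auth-module")`.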
Communication Modes
from enum import Enum

class MessageMode(str, Enum):
    SYNC = "sync"  # Await response (< 30s)
    ASYNC = "async"  # Queue, callback later
    FIRE_AND_FORGET = "fire"  # No response expected
    STREAM = "stream"  # Continuous updates
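One way the SYNC mode could be layered on top of an asynchronous bus is to park a future per `correlation_id` and await it with a timeout. The sketch below assumes this approach; the class `PendingReplies` and its methods are hypothetical, and the 30-second default mirrors the limit noted in the enum comment.

```python
import asyncio
from uuid import UUID


class PendingReplies:
    """Tracks requests that are waiting for a correlated response."""

    def __init__(self) -> None:
        self._waiting: dict = {}  # correlation_id -> asyncio.Future

    def resolve(self, correlation_id: UUID, content: str) -> bool:
        """Deliver a response; returns False if nobody is waiting for it."""
        future = self._waiting.pop(correlation_id, None)
        if future is None or future.done():
            return False
        future.set_result(content)
        return True

    async def wait(self, correlation_id: UUID, timeout: float = 30.0) -> str:
        """Block until the matching response arrives or the timeout elapses."""
        future = asyncio.get_running_loop().create_future()
        self._waiting[correlation_id] = future
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            # Drop the entry on success, timeout, or cancellation.
            self._waiting.pop(correlation_id, None)
```

A sender in SYNC mode would publish the request and then `await pending.wait(message.id)`, while the subscriber loop calls `pending.resolve(...)` for each incoming response. A timeout surfaces as `asyncio.TimeoutError`, which the caller can turn into a retry or an escalation.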
Message Bus Implementation
from typing import AsyncIterator

class AgentMessageBus:
    """Redis Streams-based message bus for agent communication."""

    async def send(self, message: AgentMessage) -> None:
        # Persist to PostgreSQL for audit
        await self.store.save(message)

        # Publish to Redis for real-time delivery
        channel = self._get_channel(message)
        await self.redis.xadd(channel, message.to_dict())

        # Publish SSE event for UI visibility
        await self.event_bus.publish(
            f"project:{message.project_id}",
            {"type": "agent_message", "preview": message.content[:100]},
        )

    async def subscribe(self, agent_id: str) -> AsyncIterator[AgentMessage]:
        """Subscribe to messages for an agent."""
        # Resolve the agent record (self.agents is an assumed registry)
        agent = await self.agents.get(agent_id)
        channels = [
            f"agent:{agent_id}",  # Direct messages
            f"role:{agent.role}",  # Role-based
            f"project:{agent.project_id}",  # Broadcasts
        ]
        # ... Redis Streams consumer group logic
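The bus relies on `_get_channel`, which is not shown. A possible shape for it, sketched as a standalone function named `channel_for` (hypothetical), reuses the channel names that `subscribe` listens on. The `topic:` prefix is an assumption, since the subscription list above does not include a topic channel.

```python
from typing import Optional


def channel_for(routing: str, project_id: str, target: Optional[str]) -> str:
    """Pick the stream key for a message based on its routing strategy."""
    if routing == "broadcast":
        # Broadcasts go to the project-wide stream and need no target.
        return f"project:{project_id}"
    if not target:
        raise ValueError(f"routing {routing!r} requires a target")
    if routing == "direct":
        return f"agent:{target}"
    if routing == "role":
        return f"role:{target}"
    if routing == "topic":
        return f"topic:{target}"  # assumed prefix, not defined in this ADR
    raise ValueError(f"unknown routing strategy: {routing!r}")
```

Keeping the mapping in one place means the sender and subscriber cannot drift apart on channel naming.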
Context Hierarchy
- Conversation Context (short-term): Current thread, last N exchanges
- Session Context (medium-term): Sprint goals, recent decisions
- Project Context (long-term): Architecture, requirements, knowledge base
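The three tiers could be assembled into a single prompt context along these lines. This is an illustrative sketch: the class `ContextStack`, the default of five exchanges for "last N", and the bracketed section labels are all assumptions.

```python
from collections import deque


class ContextStack:
    """Holds the three context tiers and renders them longest-lived first."""

    def __init__(self, max_exchanges: int = 5) -> None:
        self.project: list[str] = []  # long-term: architecture, requirements
        self.session: list[str] = []  # medium-term: sprint goals, decisions
        # Short-term: a bounded deque silently drops the oldest exchange.
        self.conversation: deque[str] = deque(maxlen=max_exchanges)

    def record(self, exchange: str) -> None:
        self.conversation.append(exchange)

    def render(self) -> str:
        sections = [
            ("Project", self.project),
            ("Session", self.session),
            ("Conversation", list(self.conversation)),
        ]
        lines: list[str] = []
        for title, items in sections:
            if items:  # skip empty tiers entirely
                lines.append(f"[{title}]")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

Only the conversation tier is bounded here; how project and session context are trimmed or retrieved is outside the scope of this sketch.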
Conflict Resolution
When agents disagree, resolution escalates in order:
- Peer Resolution: Agents attempt to reach consensus (up to 2 attempts)
- Supervisor Escalation: Product Owner or Architect decides
- Human Override: Escalate to the client for approval, if configured
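The escalation ladder can be expressed as a small decision function. The sketch below assumes the levels are tried strictly in order and that, when human override is configured, the client reviews the supervisor's ruling; the ADR does not spell out that last detail. All names are hypothetical.

```python
from enum import Enum
from typing import Optional

MAX_PEER_ATTEMPTS = 2  # taken from the peer resolution step


class Step(str, Enum):
    PEER = "peer"
    SUPERVISOR = "supervisor"
    HUMAN = "human"


def next_step(
    peer_attempts: int, supervisor_ruled: bool, human_override: bool
) -> Optional[Step]:
    """Return the next resolution step, or None when the ruling is final."""
    if peer_attempts < MAX_PEER_ATTEMPTS:
        return Step.PEER
    if not supervisor_ruled:
        return Step.SUPERVISOR
    if human_override:
        return Step.HUMAN  # assumed: client reviews the supervisor's ruling
    return None
```

With no failed attempts the function returns `Step.PEER`; after two failed attempts it returns `Step.SUPERVISOR`; once the supervisor has ruled it returns `Step.HUMAN` only if override is configured.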
Consequences
Positive
- Full audit trail of all agent communication
- Flexible routing supports various collaboration patterns
- Natural language preserves LLM reasoning quality
- Real-time UI visibility into agent collaboration
Negative
- Additional complexity compared with simple function calls
- Persisting every message increases storage requirements
Mitigation
- Archival policy for old messages
- Compression for large attachments
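As an illustration of the compression mitigation, attachments above a size threshold could be compressed before storage and tagged with their encoding. The 1 KiB threshold, the choice of `zlib`, and the function names are assumptions made for this sketch.

```python
import zlib

COMPRESS_THRESHOLD = 1024  # bytes; assumed cut-off, tune to real traffic


def pack_attachment(data: bytes) -> tuple[str, bytes]:
    """Return (encoding, payload), compressing only when it actually helps."""
    if len(data) < COMPRESS_THRESHOLD:
        return "raw", data
    packed = zlib.compress(data, 6)
    if len(packed) >= len(data):
        # Already-compressed content (images, archives) may not shrink.
        return "raw", data
    return "zlib", packed


def unpack_attachment(encoding: str, payload: bytes) -> bytes:
    """Reverse pack_attachment using the stored encoding tag."""
    if encoding == "raw":
        return payload
    if encoding == "zlib":
        return zlib.decompress(payload)
    raise ValueError(f"unknown attachment encoding: {encoding!r}")
```

Storing the encoding tag alongside the payload lets archived messages be read back even if the threshold or algorithm changes later.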
Compliance
This decision aligns with:
- FR-104: Inter-agent communication
- FR-105: Agent activity monitoring
- NFR-602: Comprehensive audit logging
This ADR establishes the agent communication protocol for Syndarix.