
# ADR-009: Agent Communication Protocol

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-007

---

## Context

Syndarix requires a robust protocol for inter-agent communication. More than ten specialized AI agents must collaborate on software projects: sharing context, delegating tasks, and resolving conflicts.

## Decision Drivers

- **Auditability:** All communication must be traceable
- **Flexibility:** Support various communication patterns
- **Performance:** Low-latency for interactive collaboration
- **Reliability:** Messages must not be lost

## Considered Options

### Option 1: Pure Natural Language
Agents communicate via free-form text messages.

**Pros:** Simple, flexible
**Cons:** Difficult to route, parse, and audit

### Option 2: Rigid RPC Protocol
Strongly-typed function calls between agents.

**Pros:** Predictable, type-safe
**Cons:** Loses LLM reasoning flexibility

### Option 3: Structured Envelope + Natural Language Payload (Selected)
JSON envelope for routing/auditing with natural language content.

**Pros:** Best of both worlds: routable and auditable while preserving LLM reasoning capabilities
**Cons:** Slightly more complex

## Decision

**Adopt structured message envelopes with natural language payloads**, inspired by Google's A2A protocol concepts.

## Implementation

### Message Schema

```python
from __future__ import annotations  # lets AgentIdentity/Attachment resolve lazily

from dataclasses import dataclass
from datetime import datetime
from typing import Literal
from uuid import UUID

# AgentIdentity and Attachment are project types defined elsewhere (not shown).


@dataclass
class AgentMessage:
    id: UUID                           # Unique message ID
    type: Literal["request", "response", "broadcast", "notification"]

    # Routing
    from_agent: AgentIdentity          # Source agent
    to_agent: AgentIdentity | None     # Target (None = broadcast)
    routing: Literal["direct", "role", "broadcast", "topic"]

    # Action
    action: str                        # e.g., "request_guidance", "task_handoff"
    priority: Literal["low", "normal", "high", "urgent"]

    # Context
    project_id: str
    conversation_id: str | None        # For threading
    correlation_id: UUID | None        # For request/response matching

    # Content
    content: str                       # Natural language message
    attachments: list[Attachment]      # Code snippets, files, etc.

    # Metadata
    created_at: datetime
    expires_at: datetime | None
    requires_response: bool
```

### Routing Strategies

| Strategy | Syntax | Use Case |
|----------|--------|----------|
| Direct | `to: "agent-123"` | Specific agent |
| Role-based | `to: "@engineers"` | All agents of role |
| Broadcast | `to: "@all"` | Project-wide |
| Topic-based | `to: "#auth-module"` | Subscribed agents |
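One way to map the `to:` syntax in the table onto stream names is a small prefix parser. This sketch reuses the channel naming from `subscribe()` (`agent:`, `role:`, `project:`) and adds a `topic:` prefix, which is an assumption since this ADR does not name topic streams; `resolve_route` is a hypothetical helper:

```python
from typing import Literal

Routing = Literal["direct", "role", "broadcast", "topic"]


def resolve_route(to: str, project_id: str) -> tuple[Routing, str]:
    """Map a `to:` address onto (routing strategy, stream name)."""
    if not to:
        raise ValueError("empty address")
    prefix, name = to[0], to[1:]
    if prefix in ("@", "#") and not name:
        raise ValueError(f"missing name after {prefix!r}")
    if to == "@all":
        # Project-wide broadcast
        return "broadcast", f"project:{project_id}"
    if prefix == "@":
        # Every agent holding the role, e.g. "@engineers"
        return "role", f"role:{name}"
    if prefix == "#":
        # Agents subscribed to a topic (the "topic:" prefix is an assumption)
        return "topic", f"topic:{name}"
    # Anything else is treated as a concrete agent ID
    return "direct", f"agent:{to}"
```

Checking `@all` before the generic `@` prefix keeps the broadcast address from being parsed as a role named `all`.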

### Communication Modes

```python
from enum import Enum


class MessageMode(str, Enum):
    SYNC = "sync"              # Await response (< 30s)
    ASYNC = "async"            # Queue, callback later
    FIRE_AND_FORGET = "fire"   # No response expected
    STREAM = "stream"          # Continuous updates
```
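`SYNC` mode implies that the sender blocks until a response carrying a matching `correlation_id` arrives, or the 30-second bound expires. A minimal asyncio sketch of that matching; the `PendingReplies` helper is hypothetical, and passing the response content as a plain string is a simplification of a full `AgentMessage`:

```python
import asyncio
from uuid import UUID, uuid4

SYNC_TIMEOUT_S = 30.0  # the "< 30s" bound on MessageMode.SYNC


class PendingReplies:
    """Hypothetical helper: match responses to requests by correlation_id."""

    def __init__(self) -> None:
        self._waiters: dict[UUID, asyncio.Future[str]] = {}

    def expect(self, request_id: UUID) -> asyncio.Future[str]:
        """Register interest in a reply; call BEFORE sending the request."""
        fut: asyncio.Future[str] = asyncio.get_running_loop().create_future()
        self._waiters[request_id] = fut
        return fut

    def resolve(self, correlation_id: UUID, content: str) -> bool:
        """Hand a response to its waiter; False if late, duplicate, or unknown."""
        fut = self._waiters.pop(correlation_id, None)
        if fut is None or fut.done():
            return False
        fut.set_result(content)
        return True

    async def wait(self, request_id: UUID, fut: asyncio.Future[str],
                   timeout: float = SYNC_TIMEOUT_S) -> str:
        """Await the reply; raises asyncio.TimeoutError after `timeout` seconds."""
        try:
            return await asyncio.wait_for(fut, timeout)
        finally:
            self._waiters.pop(request_id, None)  # no-op if resolve() already popped


async def demo() -> str:
    pending = PendingReplies()
    request_id = uuid4()
    fut = pending.expect(request_id)
    # Simulate the responder's message (correlation_id == request_id) arriving later
    asyncio.get_running_loop().call_later(0.01, pending.resolve, request_id, "LGTM")
    return await pending.wait(request_id, fut)


print(asyncio.run(demo()))  # LGTM
```

Registering the waiter before sending matters because the send itself awaits, so a fast responder could otherwise reply before anyone is listening.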

### Message Bus Implementation

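```python
from collections.abc import AsyncIterator

# AgentMessage and AgentIdentity as defined in the message schema above.


class AgentMessageBus:
    """Redis Streams-based message bus for agent communication."""

    def __init__(self, store, redis, event_bus) -> None:
        self.store = store          # PostgreSQL-backed message repository
        self.redis = redis          # async Redis client, e.g. redis.asyncio.Redis
        self.event_bus = event_bus  # SSE publisher for UI updates

    async def send(self, message: AgentMessage) -> None:
        # Persist to PostgreSQL for audit before attempting delivery
        await self.store.save(message)

        # Publish to Redis for real-time delivery; XADD needs flat field values
        channel = self._get_channel(message)  # maps message.routing to a stream name
        await self.redis.xadd(channel, message.to_dict())

        # Publish SSE event for UI visibility (first 100 characters only)
        await self.event_bus.publish(
            f"project:{message.project_id}",
            {"type": "agent_message", "preview": message.content[:100]},
        )

    async def subscribe(self, agent: AgentIdentity) -> AsyncIterator[AgentMessage]:
        """Subscribe to messages for an agent.

        Takes the full identity (assumed to expose id, role, and project_id)
        because the role and project channels need more than the agent's ID.
        """
        channels = [
            f"agent:{agent.id}",           # Direct messages
            f"role:{agent.role}",          # Role-based
            f"project:{agent.project_id}", # Broadcasts
        ]
        # ... Redis Streams consumer group logic
```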
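The consumer-group logic is elided above. A hedged sketch of what it could look like, written against the method names of redis-py's asyncio client (`xgroup_create`, `xreadgroup`, `xack`) and assuming list-style (RESP2) replies with `decode_responses=True`. It assumes one consumer group per agent so that role and broadcast streams fan out to every subscribed agent; the function name `consume` is hypothetical:

```python
from collections.abc import AsyncIterator
from typing import Any


async def consume(
    redis: Any,                 # e.g. redis.asyncio.Redis(decode_responses=True)
    channels: list[str],
    group: str,                 # one group per agent, e.g. the agent's ID
    consumer: str,
    block_ms: int = 5000,
    count: int = 10,
) -> AsyncIterator[tuple[str, str, dict[str, str]]]:
    """Yield (stream, entry_id, fields) from the agent's streams, acking after use."""
    for channel in channels:
        try:
            # id="0": a brand-new group also sees entries written before it existed;
            # mkstream=True: create the stream if nobody has written to it yet
            await redis.xgroup_create(channel, group, id="0", mkstream=True)
        except Exception as exc:  # redis-py raises ResponseError here
            if "BUSYGROUP" not in str(exc):  # group already exists: fine
                raise
    streams = {channel: ">" for channel in channels}  # ">" = never-delivered entries
    while True:
        replies = await redis.xreadgroup(group, consumer, streams,
                                         count=count, block=block_ms)
        for stream, entries in replies or []:
            for entry_id, fields in entries:
                yield stream, entry_id, fields
                # Reached only when the caller asks for the next entry,
                # i.e. after it has finished processing this one
                await redis.xack(stream, group, entry_id)
```

Acknowledging only after the caller resumes the generator gives at-least-once delivery: an entry whose processing crashes stays in the group's pending list. Reclaiming those entries (for example with `XAUTOCLAIM`) is left out of this sketch.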

### Context Hierarchy

1. **Conversation Context** (short-term): Current thread, last N exchanges
2. **Session Context** (medium-term): Sprint goals, recent decisions
3. **Project Context** (long-term): Architecture, requirements, knowledge base
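A minimal sketch of how the three layers might be combined into prompt context, ordered from long-term to short-term and keeping only the last N exchanges of the current thread. The `ContextLayers` type, the section headings, and the default N are illustrative assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class ContextLayers:
    project: str                                      # long-term: architecture, requirements
    session: str                                      # medium-term: sprint goals, decisions
    thread: list[str] = field(default_factory=list)  # short-term: current conversation


def assemble_context(layers: ContextLayers, last_n: int = 6) -> str:
    """Build prompt context: most stable layer first, newest exchanges last."""
    # Guard last_n == 0 explicitly: thread[-0:] would return the whole thread
    recent = layers.thread[-last_n:] if last_n > 0 else []
    sections = [
        ("Project context", layers.project),
        ("Session context", layers.session),
        ("Conversation", "\n".join(recent)),
    ]
    # Skip empty layers so the prompt carries no blank headings
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)
```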

### Conflict Resolution

When agents disagree:
1. **Peer Resolution:** Agents attempt consensus (up to 2 attempts)
2. **Supervisor Escalation:** Product Owner or Architect decides
3. **Human Override:** Client approval if configured

## Consequences

### Positive
- Full audit trail of all agent communication
- Flexible routing supports various collaboration patterns
- Natural language preserves LLM reasoning quality
- Real-time UI visibility into agent collaboration

### Negative
- Additional complexity compared with simple function calls
- Storage requirements for persisted messages

### Mitigation
- Archival policy for old messages
- Compression for large attachments
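For the attachment-compression mitigation, a minimal sketch using the standard library's `zlib`: it compresses only payloads above a size threshold and keeps the original bytes when compression does not help. The 64 KiB threshold, the encoding labels, and the function names are assumptions, not values set by this ADR:

```python
import zlib

COMPRESS_THRESHOLD = 64 * 1024  # bytes; hypothetical cut-off


def pack_attachment(data: bytes, threshold: int = COMPRESS_THRESHOLD) -> tuple[str, bytes]:
    """Return (encoding, payload); 'zlib' only when large enough and actually smaller."""
    if len(data) < threshold:
        return "identity", data
    compressed = zlib.compress(data, level=6)
    if len(compressed) >= len(data):
        return "identity", data  # incompressible, e.g. already-compressed files
    return "zlib", compressed


def unpack_attachment(encoding: str, payload: bytes) -> bytes:
    """Reverse pack_attachment using the stored encoding label."""
    if encoding == "identity":
        return payload
    if encoding == "zlib":
        return zlib.decompress(payload)
    raise ValueError(f"unknown encoding: {encoding}")
```

Storing the encoding label next to the payload keeps reads unambiguous even if the threshold changes later.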

## Compliance

This decision aligns with:
- FR-104: Inter-agent communication
- FR-105: Agent activity monitoring
- NFR-602: Comprehensive audit logging

---

*This ADR establishes the agent communication protocol for Syndarix.*