syndarix/docs/adrs/ADR-009-agent-communication-protocol.md
Felipe Cardoso 406b25cda0 docs: add remaining ADRs and comprehensive architecture documentation
Added 7 new Architecture Decision Records completing the full set:
- ADR-008: Knowledge Base and RAG (pgvector)
- ADR-009: Agent Communication Protocol (structured messages)
- ADR-010: Workflow State Machine (transitions + PostgreSQL)
- ADR-011: Issue Synchronization (webhook-first + polling)
- ADR-012: Cost Tracking (LiteLLM callbacks + Redis budgets)
- ADR-013: Audit Logging (hash chaining + tiered storage)
- ADR-014: Client Approval Flow (checkpoint-based)

Added comprehensive ARCHITECTURE.md that:
- Summarizes all 14 ADRs in decision matrix
- Documents full system architecture with diagrams
- Explains all component interactions
- Details technology stack with self-hostability guarantee
- Covers security, scalability, and deployment

Updated IMPLEMENTATION_ROADMAP.md to mark Phase 0 completed items.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 13:54:43 +01:00


ADR-009: Agent Communication Protocol

Status: Accepted
Date: 2025-12-29
Deciders: Architecture Team
Related Spikes: SPIKE-007


Context

Syndarix requires a robust protocol for inter-agent communication. Ten or more specialized AI agents must collaborate on software projects, sharing context, delegating tasks, and resolving conflicts.

Decision Drivers

  • Auditability: All communication must be traceable
  • Flexibility: Support various communication patterns
  • Performance: Low latency for interactive collaboration
  • Reliability: Messages must not be lost

Considered Options

Option 1: Pure Natural Language

Agents communicate via free-form text messages.

Pros: Simple, flexible
Cons: Difficult to route, parse, and audit

Option 2: Rigid RPC Protocol

Strongly-typed function calls between agents.

Pros: Predictable, type-safe
Cons: Loses LLM reasoning flexibility

Option 3: Structured Envelope + Natural Language Payload (Selected)

JSON envelope for routing/auditing with natural language content.

Pros: Best of both worlds: routable and auditable while preserving LLM capabilities
Cons: Slightly more complex

Decision

Adopt structured message envelopes with natural language payloads, inspired by Google's A2A protocol concepts.

Implementation

Message Schema

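from __future__ import annotations  # AgentIdentity and Attachment are defined elsewhere

from dataclasses import dataclass
from datetime import datetime
from typing import Literal
from uuid import UUID


@dataclass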
class AgentMessage:
    id: UUID                           # Unique message ID
    type: Literal["request", "response", "broadcast", "notification"]

    # Routing
    from_agent: AgentIdentity          # Source agent
    to_agent: AgentIdentity | None     # Target (None = broadcast)
    routing: Literal["direct", "role", "broadcast", "topic"]

    # Action
    action: str                        # e.g., "request_guidance", "task_handoff"
    priority: Literal["low", "normal", "high", "urgent"]

    # Context
    project_id: str
    conversation_id: str | None        # For threading
    correlation_id: UUID | None        # For request/response matching

    # Content
    content: str                       # Natural language message
    attachments: list[Attachment]      # Code snippets, files, etc.

    # Metadata
    created_at: datetime
    expires_at: datetime | None
    requires_response: bool

Routing Strategies

Strategy      Syntax               Use Case
Direct        to: "agent-123"      Specific agent
Role-based    to: "@engineers"     All agents with that role
Broadcast     to: "@all"           Project-wide
Topic-based   to: "#auth-module"   Subscribed agents

Communication Modes

from enum import Enum


class MessageMode(str, Enum):
    SYNC = "sync"              # Await response (< 30s)
    ASYNC = "async"            # Queue, callback later
    FIRE_AND_FORGET = "fire"   # No response expected
    STREAM = "stream"          # Continuous updates
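`SYNC` mode implies that the sender blocks until a reply carrying its `correlation_id` arrives, or gives up after roughly 30 seconds. A minimal sketch, assuming one asyncio event loop per agent process; `PendingRequests`, `request`, and `deliver` are hypothetical names, and the 30-second default is taken from the comment on `MessageMode.SYNC`.

```python
import asyncio
from collections.abc import Awaitable, Callable
from uuid import UUID


class PendingRequests:
    """Hypothetical helper: match SYNC replies to requests by correlation_id."""

    def __init__(self) -> None:
        self._waiting: dict[UUID, asyncio.Future[str]] = {}

    async def request(
        self,
        request_id: UUID,
        send: Callable[[], Awaitable[None]],
        timeout: float = 30.0,  # mirrors the "< 30s" note on MessageMode.SYNC
    ) -> str:
        future: asyncio.Future[str] = asyncio.get_running_loop().create_future()
        # Register before sending, so a fast reply cannot arrive before we listen.
        self._waiting[request_id] = future
        try:
            await send()
            # Raises asyncio.TimeoutError if no reply arrives in time.
            return await asyncio.wait_for(future, timeout)
        finally:
            self._waiting.pop(request_id, None)

    def deliver(self, correlation_id: UUID, content: str) -> bool:
        """Resolve the waiting request; False for unknown, late, or duplicate replies."""
        future = self._waiting.get(correlation_id)
        if future is None or future.done():
            return False
        future.set_result(content)
        return True
```

`ASYNC` mode would skip the wait and let the consumer loop handle the reply later, while `FIRE_AND_FORGET` would not register a waiter at all.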

Message Bus Implementation

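from collections.abc import AsyncIterator


class AgentMessageBus:
    """Redis Streams-based message bus for agent communication."""

    def __init__(self, store, redis, event_bus, agents) -> None:
        self.store = store          # PostgreSQL-backed message store
        self.redis = redis          # async Redis client
        self.event_bus = event_bus  # SSE event publisher
        self.agents = agents        # agent registry (hypothetical)

    async def send(self, message: AgentMessage) -> None:
        # Persist to PostgreSQL for audit
        await self.store.save(message)

        # Publish to Redis for real-time delivery.
        # XADD accepts only flat str/bytes/int/float field values,
        # so to_dict() must serialize nested fields (e.g. as JSON).
        channel = self._get_channel(message)
        await self.redis.xadd(channel, message.to_dict())

        # Publish SSE event for UI visibility
        await self.event_bus.publish(
            f"project:{message.project_id}",
            {"type": "agent_message", "preview": message.content[:100]}
        )

    async def subscribe(self, agent_id: str) -> AsyncIterator[AgentMessage]:
        """Subscribe to messages for an agent."""
        agent = await self.agents.get(agent_id)  # hypothetical registry lookup
        channels = [
            f"agent:{agent_id}",           # Direct messages
            f"role:{agent.role}",          # Role-based
            f"project:{agent.project_id}", # Broadcasts
        ]                                  # (plus any subscribed topic streams)
        # ... Redis Streams consumer group logic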
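One detail matters for the consumer group logic: Redis delivers each stream entry to only one consumer within a group. If every engineer joined a shared group on `role:engineers`, a role-based message would reach just one of them. A minimal sketch, assuming each agent reads all of its streams through its own consumer group so role, broadcast, and topic messages fan out to every recipient; `AgentInfo`, `read_group_args`, and the group and topic naming are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentInfo:
    """Hypothetical subset of AgentIdentity needed to choose streams."""
    agent_id: str
    role: str
    project_id: str
    topics: list[str] = field(default_factory=list)


def read_group_args(agent: AgentInfo) -> tuple[str, str, dict[str, str]]:
    """Return (group, consumer, streams) for a single XREADGROUP call."""
    streams = [
        f"agent:{agent.agent_id}",              # direct messages
        f"role:{agent.role}",                   # role-based
        f"project:{agent.project_id}",          # broadcasts
        *(f"topic:{t}" for t in agent.topics),  # topic subscriptions (assumed naming)
    ]
    # One group per agent: each group sees every entry in a stream, so every
    # engineer receives a role:engineers message instead of just one of them.
    group = f"agent-group:{agent.agent_id}"
    # ">" requests entries never delivered to this group before.
    return group, agent.agent_id, {stream: ">" for stream in streams}
```

With redis-py's asyncio client, the group would be created once per stream with `xgroup_create(stream, group, id="$", mkstream=True)` (tolerating the BUSYGROUP error on restart), read with `xreadgroup(group, consumer, streams, block=...)`, and processed entries acknowledged with `xack`. Because the group name is the same on every stream, one `XREADGROUP` call can cover all of an agent's streams.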

Context Hierarchy

  1. Conversation Context (short-term): Current thread, last N exchanges
  2. Session Context (medium-term): Sprint goals, recent decisions
  3. Project Context (long-term): Architecture, requirements, knowledge base
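One way to turn the three tiers into prompt context is to keep the short-term tier as a bounded window of the last N exchanges and prepend the longer-lived tiers. A minimal sketch; the `ContextLayers` class, the default value of N, and the rendering order are assumptions, not part of this ADR.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ContextLayers:
    """Hypothetical container for the three context tiers."""
    project: list[str]      # long-term: architecture, requirements, KB excerpts
    session: list[str]      # medium-term: sprint goals, recent decisions
    max_exchanges: int = 6  # N for the short-term window (assumed value)
    conversation: deque[str] = field(init=False)

    def __post_init__(self) -> None:
        # A bounded deque drops the oldest exchange once N is exceeded.
        self.conversation = deque(maxlen=self.max_exchanges)

    def add_exchange(self, text: str) -> None:
        self.conversation.append(text)

    def render(self) -> str:
        """Assemble the context, broadest tier first; empty tiers are skipped."""
        sections = [
            ("Project context", self.project),
            ("Session context", self.session),
            ("Conversation", list(self.conversation)),
        ]
        return "\n\n".join(
            f"## {title}\n" + "\n".join(items) for title, items in sections if items
        )
```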

Conflict Resolution

When agents disagree:

  1. Peer Resolution: Agents attempt to reach consensus (up to 2 attempts)
  2. Supervisor Escalation: Product Owner or Architect decides
  3. Human Override: Client approval if configured

Consequences

Positive

  • Full audit trail of all agent communication
  • Flexible routing supports various collaboration patterns
  • Natural language preserves LLM reasoning quality
  • Real-time UI visibility into agent collaboration

Negative

  • Additional complexity vs simple function calls
  • Message persistence storage requirements

Mitigation

  • Archival policy for old messages
  • Compression for large attachments
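Attachment compression can be applied only above a size threshold, with the encoding recorded alongside the body. A minimal sketch using zlib and base64 (so the result fits in string fields); the 64 KiB threshold and the `pack_attachment`/`unpack_attachment` names are assumptions.

```python
import base64
import zlib

COMPRESS_THRESHOLD = 64 * 1024  # bytes; assumed cut-off, tune per deployment


def pack_attachment(data: bytes) -> dict[str, str]:
    """Compress attachments at or above the threshold; keep small ones as-is."""
    if len(data) >= COMPRESS_THRESHOLD:
        compressed = zlib.compress(data, level=6)
        # Keep the compressed form only if it actually saved space.
        if len(compressed) < len(data):
            return {"encoding": "zlib", "body": base64.b64encode(compressed).decode("ascii")}
    return {"encoding": "identity", "body": base64.b64encode(data).decode("ascii")}


def unpack_attachment(packed: dict[str, str]) -> bytes:
    """Reverse pack_attachment based on the recorded encoding."""
    raw = base64.b64decode(packed["body"])
    if packed["encoding"] == "zlib":
        return zlib.decompress(raw)
    return raw
```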

Compliance

This decision aligns with:

  • FR-104: Inter-agent communication
  • FR-105: Agent activity monitoring
  • NFR-602: Comprehensive audit logging

This ADR establishes the agent communication protocol for Syndarix.