[SPIKE-007] Agent-to-Agent Communication Protocol #7

Closed
opened 2025-12-29 03:50:15 +00:00 by cardosofelipe · 1 comment

Objective

Define how agents communicate with each other during collaborative work.

Scenarios

  1. Architect asks Developer for implementation estimate
  2. Developer asks QA to review code
  3. PO asks multiple agents for sprint planning input
  4. Agents need to share context/artifacts

Key Questions

  1. Synchronous vs asynchronous communication?
  2. How do we structure messages between agents?
  3. How do we handle "waiting for response" states?
  4. How do we track conversation threads between agents?
  5. How do we prevent circular dependencies?

Research Areas

  • Message queue patterns (direct, pub/sub, request-reply)
  • Protocol design for agent messages
  • Conversation/thread tracking
  • Timeout and retry handling

Expected Deliverables

  • Message protocol specification
  • Implementation using Celery or dedicated messaging
  • Example multi-agent workflow
  • ADR documenting the protocol

Acceptance Criteria

  • Agents can send messages to each other
  • Responses are correctly routed
  • Conversations are trackable
  • Timeouts handled gracefully
  • Works with parallel execution

Labels

spike, architecture, agents


SPIKE-007 Research Complete

The comprehensive spike document has been created at docs/spikes/SPIKE-007-agent-communication-protocol.md.

Executive Summary

After researching industry standards (Google A2A, IBM ACP, Anthropic MCP) and multi-agent system patterns, we recommend a hybrid message-based communication protocol that integrates with our existing infrastructure.

Key Decisions

  1. Message Format: Structured JSON envelope with natural language content payload

  2. Routing Strategies:

    • Direct routing (to: "agent-123")
    • Role-based routing (to: "@engineers")
    • Broadcast routing (to: "@all")
    • Topic-based routing (to: "#auth-module")
  3. Communication Modes:

    • Sync (request-response with timeout)
    • Async (task queue + callback)
    • Fire-and-forget (broadcasts)
    • Streaming (long-running updates)
  4. Infrastructure Mapping:

    • Redis Pub/Sub for real-time delivery
    • PostgreSQL for message persistence and audit
    • Celery for async task delegation
    • SSE for client notifications
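The envelope and the four routing strategies above can be sketched in a few lines. This is only an illustration, not the schema from the spike document: it uses standard-library dataclasses rather than the Pydantic models the spike mentions, and `Envelope`, `AgentInfo`, `AGENTS`, and `resolve_recipients` are hypothetical names. Only the `to` address forms and the `in_response_to` field come from the summary.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AgentInfo:
    """Hypothetical registry entry: one role plus subscribed topics."""
    role: str
    topics: frozenset = frozenset()


@dataclass
class Envelope:
    """Structured envelope around a natural language payload (assumed fields)."""
    sender: str
    to: str                                # "agent-123", "@engineers", "@all", "#auth-module"
    content: str                           # natural language payload
    in_response_to: Optional[str] = None   # id of the message being answered
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


# Example registry, invented for illustration only.
AGENTS: Dict[str, AgentInfo] = {
    "agent-123": AgentInfo("engineers", frozenset({"auth-module"})),
    "agent-456": AgentInfo("engineers"),
    "agent-789": AgentInfo("qa", frozenset({"auth-module"})),
}


def resolve_recipients(
    to: str, agents: Dict[str, AgentInfo], sender: Optional[str] = None
) -> List[str]:
    """Expand a `to` address into concrete agent ids, excluding the sender."""
    if to == "@all":                       # broadcast routing
        matched = list(agents)
    elif to.startswith("@"):               # role-based routing
        matched = [a for a, info in agents.items() if info.role == to[1:]]
    elif to.startswith("#"):               # topic-based routing
        matched = [a for a, info in agents.items() if to[1:] in info.topics]
    else:                                  # direct routing
        matched = [to] if to in agents else []
    return sorted(a for a in matched if a != sender)
```

Keeping address resolution separate from delivery means the same function could sit in front of Redis Pub/Sub, Celery, or an in-memory test transport.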

Research Questions Answered

| Question | Answer |
|----------|--------|
| Structured vs natural language? | Hybrid - structured envelope, natural language content |
| Async vs sync? | Pattern-based - sync for quick clarifications, async for tasks |
| Message routing? | Four strategies: direct, role-based, broadcast, topic-based |
| Context management? | Three-tier: conversation, session, project (RAG) |
| Conflict resolution? | Hierarchical escalation: negotiation -> expert -> human |
| Audit/logging? | Full message persistence with content hashing |
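As a rough illustration of the "content hashing" audit answer, a persisted message could carry a digest of its canonical JSON form so that later edits are detectable. The summary does not say which hash or canonicalisation the spike uses, so SHA-256 over sorted-key JSON is an assumption here, and `content_hash` and `verify` are hypothetical helpers.

```python
import hashlib
import hmac
import json


def content_hash(message: dict) -> str:
    """Return a SHA-256 hex digest of the message in a canonical JSON form.

    Sorting keys and fixing separators makes the digest independent of
    dict ordering and whitespace (an assumed canonicalisation).
    """
    canonical = json.dumps(
        message, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(message: dict, stored_hash: str) -> bool:
    """Check a stored digest against the message as it reads today."""
    return hmac.compare_digest(content_hash(message), stored_hash)
```

A plain digest only detects accidental or naive modification. If the audit trail must resist someone who can rewrite both message and hash, a keyed HMAC or hash chaining would be needed instead.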

Acceptance Criteria Status

  • [x] Agents can send messages to each other - MessageRouter service defined
  • [x] Responses are correctly routed - Response linking with in_response_to field
  • [x] Conversations are trackable - Conversation model with thread support
  • [x] Timeouts handled gracefully - send_and_wait() with configurable timeout
  • [x] Works with parallel execution - Redis Pub/Sub + Celery integration
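To make the response linking and timeout behaviour concrete, here is a minimal asyncio sketch of how `send_and_wait()` might correlate a reply through `in_response_to`. It is not the MessageRouter from the spike document: `InMemoryRouter`, `Message`, and `demo` are hypothetical, and an in-process dictionary stands in for Redis Pub/Sub.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Optional, Set


@dataclass
class Message:
    sender: str
    to: str
    content: str
    in_response_to: Optional[str] = None
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


Handler = Callable[[Message], Awaitable[None]]


class InMemoryRouter:
    """Stand-in transport; a real router would publish to a broker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._pending: Dict[str, asyncio.Future] = {}
        self._tasks: Set[asyncio.Task] = set()

    def register(self, agent_id: str, handler: Handler) -> None:
        self._handlers[agent_id] = handler

    async def send(self, message: Message) -> None:
        # A reply to a message someone is waiting on resolves that waiter.
        waiter = self._pending.get(message.in_response_to or "")
        if waiter is not None and not waiter.done():
            waiter.set_result(message)
            return
        handler = self._handlers.get(message.to)
        if handler is not None:
            task = asyncio.create_task(handler(message))
            self._tasks.add(task)  # keep a reference until the task finishes
            task.add_done_callback(self._tasks.discard)

    async def send_and_wait(
        self, message: Message, timeout: float
    ) -> Optional[Message]:
        """Send and wait for the linked reply; return None on timeout."""
        future = asyncio.get_running_loop().create_future()
        self._pending[message.id] = future
        try:
            await self.send(message)
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            return None
        finally:
            self._pending.pop(message.id, None)


async def demo():
    router = InMemoryRouter()

    async def developer(msg: Message) -> None:
        reply = Message("developer", msg.sender, "3 days", in_response_to=msg.id)
        await router.send(reply)

    async def silent_qa(msg: Message) -> None:
        return None  # never replies, so the caller times out

    router.register("developer", developer)
    router.register("qa", silent_qa)
    reply = await router.send_and_wait(
        Message("architect", "developer", "Estimate?"), timeout=1.0
    )
    missed = await router.send_and_wait(
        Message("architect", "qa", "Review?"), timeout=0.05
    )
    return reply, missed
```

Returning `None` on timeout is one possible reading of "handled gracefully". Raising a dedicated exception or scheduling a retry would be equally valid designs. In this sketch a reply that arrives after the timeout is no longer matched to a waiter and is delivered to the original sender's handler like any other message.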

Deliverables Included

  1. Message Protocol Specification - Complete Pydantic schemas
  2. Database Schema - AgentMessage, Conversation, MessageDelivery models
  3. Code Examples - MessageRouter, AgentInbox, API endpoints
  4. SSE Integration - Extended event types for message notifications
  5. @Mentions Support - MentionParser with role and agent resolution
  6. Priority Handling - Urgent message interrupts with PriorityMessageHandler
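The priority handling deliverable could be approximated with a heap-backed inbox in which urgent messages are read before anything else. This is an assumption-laden sketch rather than the PriorityMessageHandler from the spike: the priority levels, `PRIORITY`, and `PriorityInbox` are invented for illustration, and it only reorders the queue without interrupting a task that is already running.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple

# Assumed priority levels; lower numbers are served first.
PRIORITY = {"urgent": 0, "normal": 1, "low": 2}


class PriorityInbox:
    """Inbox that returns urgent messages first, FIFO within each level."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def put(self, message: Any, priority: str = "normal") -> None:
        heapq.heappush(
            self._heap, (PRIORITY[priority], next(self._counter), message)
        )

    def get(self) -> Optional[Any]:
        """Pop the highest-priority message, or None if the inbox is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The counter in each heap entry keeps ordering stable within a priority level and prevents Python from ever comparing two message objects directly.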

References

  • Google A2A Protocol: https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
  • A2A Linux Foundation Project: https://www.linuxfoundation.org/press/linux-foundation-launches-the-agent2agent-protocol-project-to-enable-secure-intelligent-communication-between-ai-agents
  • IBM ACP & Protocol Survey: https://arxiv.org/html/2505.02279v1
  • Multi-Agent Collaboration Mechanisms: https://arxiv.org/html/2501.06322v1

Next Steps

  1. Create ADR-007 documenting the protocol decision
  2. Implement AgentMessage model and migrations
  3. Implement MessageRouter service
  4. Add message-related API endpoints
  5. Extend SSE events for message notifications
  6. Integration testing with multi-agent workflows