docs: add remaining ADRs and comprehensive architecture documentation

Added 7 new Architecture Decision Records completing the full set:
- ADR-008: Knowledge Base and RAG (pgvector)
- ADR-009: Agent Communication Protocol (structured messages)
- ADR-010: Workflow State Machine (transitions + PostgreSQL)
- ADR-011: Issue Synchronization (webhook-first + polling)
- ADR-012: Cost Tracking (LiteLLM callbacks + Redis budgets)
- ADR-013: Audit Logging (hash chaining + tiered storage)
- ADR-014: Client Approval Flow (checkpoint-based)

Added comprehensive ARCHITECTURE.md that:
- Summarizes all 14 ADRs in decision matrix
- Documents full system architecture with diagrams
- Explains all component interactions
- Details technology stack with self-hostability guarantee
- Covers security, scalability, and deployment

Updated IMPLEMENTATION_ROADMAP.md to mark Phase 0 completed items.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 13:54:43 +01:00
parent bd702734c2
commit 406b25cda0
9 changed files with 1899 additions and 5 deletions
--- a/docs/adrs/ADR-009-agent-communication-protocol.md
+++ b/docs/adrs/ADR-009-agent-communication-protocol.md
@@ -0,0 +1,166 @@
+# ADR-009: Agent Communication Protocol
+
+**Status:** Accepted
+**Date:** 2025-12-29
+**Deciders:** Architecture Team
+**Related Spikes:** SPIKE-007
+
+---
+
+## Context
+
+Syndarix requires a robust protocol for inter-agent communication. More than ten specialized AI agents must collaborate on software projects by sharing context, delegating tasks, and resolving conflicts.
+
+## Decision Drivers
+
+- **Auditability:** All communication must be traceable
+- **Flexibility:** Support various communication patterns
+- **Performance:** Low-latency for interactive collaboration
+- **Reliability:** Messages must not be lost
+
+## Considered Options
+
+### Option 1: Pure Natural Language
+Agents communicate via free-form text messages.
+
+**Pros:** Simple, flexible
+**Cons:** Difficult to route, parse, and audit
+
+### Option 2: Rigid RPC Protocol
+Strongly-typed function calls between agents.
+
+**Pros:** Predictable, type-safe
+**Cons:** Loses LLM reasoning flexibility
+
+### Option 3: Structured Envelope + Natural Language Payload (Selected)
+JSON envelope for routing/auditing with natural language content.
+
+**Pros:** Routable and auditable like an RPC protocol, while preserving the flexibility of natural language for LLM reasoning
+**Cons:** Slightly more complex than either alternative
+
+## Decision
+
+**Adopt structured message envelopes with natural language payloads**, inspired by Google's A2A protocol concepts.
+
+## Implementation
+
+### Message Schema
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+from datetime import datetime
+from typing import Literal
+from uuid import UUID
+
+# AgentIdentity and Attachment are assumed to be defined elsewhere in the codebase.
+
+@dataclass
+class AgentMessage:
+    id: UUID                           # Unique message ID
+    type: Literal["request", "response", "broadcast", "notification"]
+
+    # Routing
+    from_agent: AgentIdentity          # Source agent
+    to_agent: AgentIdentity | None     # Target (None = broadcast)
+    routing: Literal["direct", "role", "broadcast", "topic"]
+
+    # Action
+    action: str                        # e.g., "request_guidance", "task_handoff"
+    priority: Literal["low", "normal", "high", "urgent"]
+
+    # Context
+    project_id: str
+    conversation_id: str | None        # For threading
+    correlation_id: UUID | None        # For request/response matching
+
+    # Content
+    content: str                       # Natural language message
+    attachments: list[Attachment]      # Code snippets, files, etc.
+
+    # Metadata
+    created_at: datetime
+    expires_at: datetime | None
+    requires_response: bool
+```
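The schema above can be exercised with a small, self-contained sketch. It is illustrative only: `Envelope` is a trimmed stand-in for `AgentMessage`, the fields of `AgentIdentity` are assumed (the ADR does not define them), and `to_dict` and `reply_to` are hypothetical helpers showing one way to flatten a message and to match a response to its request via `correlation_id`.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical identity shape; the ADR does not define its fields."""
    id: str
    role: str


@dataclass
class Envelope:
    """Trimmed stand-in for AgentMessage, carrying a subset of its fields."""
    type: str
    from_agent: AgentIdentity
    to_agent: Optional[AgentIdentity]
    action: str
    project_id: str
    content: str
    requires_response: bool = False
    correlation_id: Optional[UUID] = None
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def to_dict(self) -> dict[str, str]:
        """Flatten to string values only, so the result can be stored as-is."""
        def identity(agent: Optional[AgentIdentity]) -> str:
            return "" if agent is None else json.dumps({"id": agent.id, "role": agent.role})

        return {
            "id": str(self.id),
            "type": self.type,
            "from_agent": identity(self.from_agent),
            "to_agent": identity(self.to_agent),
            "action": self.action,
            "project_id": self.project_id,
            "content": self.content,
            "requires_response": "1" if self.requires_response else "0",
            "correlation_id": "" if self.correlation_id is None else str(self.correlation_id),
            "created_at": self.created_at.isoformat(),
        }


def reply_to(request: Envelope, content: str) -> Envelope:
    """Build a response that swaps sender and receiver and points back at the request."""
    if request.to_agent is None:
        raise ValueError("cannot build a direct reply to a broadcast")
    return Envelope(
        type="response",
        from_agent=request.to_agent,
        to_agent=request.from_agent,
        action=request.action,
        project_id=request.project_id,
        content=content,
        correlation_id=request.id,
    )
```

Keeping the natural language in `content` and everything else in typed fields is what lets the bus route and audit a message without interpreting its text.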
+
+### Routing Strategies
+
+| Strategy | Syntax | Use Case |
+|----------|--------|----------|
+| Direct | `to: "agent-123"` | Specific agent |
+| Role-based | `to: "@engineers"` | All agents with a given role |
+| Broadcast | `to: "@all"` | Project-wide |
+| Topic-based | `to: "#auth-module"` | Subscribed agents |
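A minimal sketch of how the `to` syntax in the table might be parsed into a routing strategy. `parse_target` is a hypothetical helper, and the prefix rules (`@all`, `@role`, `#topic`, anything else is an agent ID) are inferred from the table's examples rather than from a specification.

```python
from typing import Literal, Optional

Routing = Literal["direct", "role", "broadcast", "topic"]


def parse_target(to: str) -> tuple[Routing, Optional[str]]:
    """Map a target string to (routing strategy, target name).

    The target name is None for broadcasts, which address the whole project.
    """
    if not to or to in ("@", "#"):
        raise ValueError(f"empty routing target: {to!r}")
    if to == "@all":
        return ("broadcast", None)
    if to.startswith("@"):
        return ("role", to[1:])
    if to.startswith("#"):
        return ("topic", to[1:])
    return ("direct", to)
```

Checking `@all` before the generic `@` prefix keeps the broadcast alias from being treated as a role named `all`.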
+
+### Communication Modes
+
+```python
+from enum import Enum
+
+
+class MessageMode(str, Enum):
+    SYNC = "sync"              # Await response (< 30s)
+    ASYNC = "async"            # Queue, callback later
+    FIRE_AND_FORGET = "fire"   # No response expected
+    STREAM = "stream"          # Continuous updates
+```
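As a rough illustration of the difference between the modes, the sketch below bounds a `SYNC` exchange with a timeout and returns immediately otherwise. The `dispatch` function and its `send` / `wait_for_reply` callables are hypothetical; the 30-second default comes from the comment on `SYNC`, and `ASYNC` callbacks and `STREAM` delivery are left out.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, Optional


class MessageMode(str, Enum):
    SYNC = "sync"
    ASYNC = "async"
    FIRE_AND_FORGET = "fire"
    STREAM = "stream"


SYNC_TIMEOUT_SECONDS = 30.0  # Taken from the "< 30s" note on SYNC


async def dispatch(
    mode: MessageMode,
    send: Callable[[], Awaitable[None]],
    wait_for_reply: Callable[[], Awaitable[str]],
    timeout: float = SYNC_TIMEOUT_SECONDS,
) -> Optional[str]:
    """Send a message; block for the reply only in SYNC mode."""
    await send()
    if mode is MessageMode.SYNC:
        # Raises asyncio.TimeoutError if no reply arrives in time.
        return await asyncio.wait_for(wait_for_reply(), timeout)
    # Other modes do not block the sender; replies, if any, arrive out of band.
    return None
```

A caller that hits the timeout could fall back to `ASYNC` handling instead of failing the task, though the ADR does not prescribe this.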
+
+### Message Bus Implementation
+
+```python
+from collections.abc import AsyncIterator
+
+
+class AgentMessageBus:
+    """Redis Streams-based message bus for agent communication."""
+
+    async def send(self, message: AgentMessage) -> None:
+        # Persist to PostgreSQL for audit
+        await self.store.save(message)
+
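+        # Publish to Redis for real-time delivery. to_dict() must return a flat
+        # mapping of str/bytes/number values: XADD fields cannot hold nested data or None.
+        channel = self._get_channel(message)
+        await self.redis.xadd(channel, message.to_dict())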
+
+        # Publish SSE event for UI visibility
+        await self.event_bus.publish(
+            f"project:{message.project_id}",
+            {"type": "agent_message", "preview": message.content[:100]}
+        )
+
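+    async def subscribe(self, agent: AgentIdentity) -> AsyncIterator[AgentMessage]:
+        """Subscribe to messages for an agent."""
+        # Takes the full identity rather than a bare ID, because the role and
+        # project channels need more than the ID. The `id`, `role`, and
+        # `project_id` attributes on AgentIdentity are assumed here.
+        channels = [
+            f"agent:{agent.id}",           # Direct messages
+            f"role:{agent.role}",          # Role-based
+            f"project:{agent.project_id}", # Broadcasts
+        ]
+        # ... Redis Streams consumer group logic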
+```
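The bus above calls `_get_channel` without showing it. One possible shape, sketched as a standalone function: the `agent:`, `role:`, and `project:` prefixes mirror the channels listed in `subscribe`, while the `topic:` prefix and the `channel_for` name and signature are assumptions.

```python
from typing import Optional


def channel_for(routing: str, project_id: str, target: Optional[str] = None) -> str:
    """Pick the stream name for a message from its routing strategy."""
    if routing == "broadcast":
        # Broadcasts are project-wide and need no explicit target.
        return f"project:{project_id}"
    if not target:
        raise ValueError(f"routing {routing!r} requires a target")
    prefixes = {"direct": "agent", "role": "role", "topic": "topic"}
    if routing not in prefixes:
        raise ValueError(f"unknown routing strategy: {routing!r}")
    return f"{prefixes[routing]}:{target}"
```

Whatever naming is chosen, the sender and `subscribe` must agree on it, otherwise messages land on streams nobody reads.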
+
+### Context Hierarchy
+
+1. **Conversation Context** (short-term): Current thread, last N exchanges
+2. **Session Context** (medium-term): Sprint goals, recent decisions
+3. **Project Context** (long-term): Architecture, requirements, knowledge base
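The three layers above could be combined into a single prompt context along these lines. This is a sketch under assumptions: `ContextLayers`, the use of plain strings for entries, the default of five exchanges, and the long-term-first ordering are all illustrative choices, not part of the ADR.

```python
from dataclasses import dataclass, field


@dataclass
class ContextLayers:
    """Holds the three context tiers as simple lists of text entries."""
    project: list[str] = field(default_factory=list)       # Long-term
    session: list[str] = field(default_factory=list)       # Medium-term
    conversation: list[str] = field(default_factory=list)  # Short-term

    def assemble(self, last_n: int = 5) -> list[str]:
        """Return project and session context plus the last N exchanges."""
        recent = self.conversation[-last_n:] if last_n > 0 else []
        return [*self.project, *self.session, *recent]
```

Only the conversation tier is truncated here; a real implementation would likely also budget the longer-lived tiers, for example through retrieval from the knowledge base.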
+
+### Conflict Resolution
+
+When agents disagree:
+1. **Peer Resolution:** Agents attempt to reach consensus (up to 2 attempts)
+2. **Supervisor Escalation:** Product Owner or Architect decides
+3. **Human Override:** Client approval if configured
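The escalation ladder above can be sketched as a single function. The callables and the name `resolve_conflict` are hypothetical, and the sketch assumes one reading of step 3: when client approval is configured, the client sees the supervisor's decision and has the final word.

```python
from typing import Callable, Optional

Decision = str


def resolve_conflict(
    peer_attempt: Callable[[int], Optional[Decision]],
    supervisor: Callable[[], Decision],
    client: Optional[Callable[[Decision], Decision]] = None,
    max_peer_attempts: int = 2,
) -> tuple[str, Decision]:
    """Return (level that decided, decision)."""
    # 1. Peer resolution: a None result means no consensus on that attempt.
    for attempt in range(1, max_peer_attempts + 1):
        decision = peer_attempt(attempt)
        if decision is not None:
            return ("peer", decision)
    # 2. Supervisor escalation: Product Owner or Architect decides.
    decision = supervisor()
    # 3. Human override: only when client approval is configured.
    if client is not None:
        return ("human", client(decision))
    return ("supervisor", decision)
```

Returning the deciding level alongside the decision makes it straightforward to record in the audit trail who settled the disagreement.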
+
+## Consequences
+
+### Positive
+- Full audit trail of all agent communication
+- Flexible routing supports various collaboration patterns
+- Natural language preserves LLM reasoning quality
+- Real-time UI visibility into agent collaboration
+
+### Negative
+- Additional complexity compared with simple function calls
+- Persisting every message increases storage requirements
+
+### Mitigation
+- Archival policy for old messages
+- Compression for large attachments
+
+## Compliance
+
+This decision aligns with:
+- FR-104: Inter-agent communication
+- FR-105: Agent activity monitoring
+- NFR-602: Comprehensive audit logging
+
+---
+
+*This ADR establishes the agent communication protocol for Syndarix.*