# ADR-013: Audit Logging Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-011

---

## Context

As an autonomous AI-powered system, Syndarix requires comprehensive audit logging for:

- Compliance (SOC2, GDPR)
- Debugging agent behavior
- Client trust and transparency
- Security investigation

Every action taken by agents must be traceable and tamper-evident.

## Decision Drivers

- **Completeness:** Log all significant events
- **Immutability:** Tamper-evident audit trail
- **Queryability:** Fast search and filtering
- **Scalability:** Handle high event volumes
- **Retention:** Configurable retention policies

## Decision

**Implement structured audit logging** using:

- **Structlog** for JSON event formatting
- **PostgreSQL** for hot storage (0-90 days)
- **S3-compatible storage** for cold archival
- **Cryptographic hash chaining** for immutability

## Implementation

### Event Categories

| Category | Event Types |
|----------|-------------|
| **Agent** | spawned, action_started, action_completed, decision, terminated |
| **LLM** | request, response, error, tool_call |
| **MCP** | tool_invoked, tool_result, tool_error |
| **Approval** | requested, granted, rejected, timeout |
| **Git** | commit, branch_created, pr_created, pr_merged |
| **Project** | created, sprint_started, milestone_completed |

### Event Schema

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel


class AuditEvent(BaseModel):
    # Identity
    event_id: str                 # UUID v7 (time-ordered)
    trace_id: str | None          # OpenTelemetry correlation
    parent_event_id: str | None   # Event chain

    # Timestamp
    timestamp: datetime
    timestamp_unix_ms: int

    # Classification
    event_type: str               # e.g., "agent.action.completed"
    event_category: str           # e.g., "agent"
    severity: Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

    # Context
    project_id: str | None
    agent_id: str | None
    user_id: str | None

    # Content
    action: str                   # Human-readable description
    data: dict                    # Event-specific payload
    before_state: dict | None
    after_state: dict | None

    # Immutability
    previous_hash: str | None     # Hash of previous event
    event_hash: str               # SHA-256 of this event
```

### Hash Chain Implementation

```python
import hashlib
import json


class AuditLogger:
    def __init__(self):
        self._last_hash: str | None = None

    async def log(self, event: AuditEvent) -> None:
        # Set hash chain
        event.previous_hash = self._last_hash
        event.event_hash = self._compute_hash(event)
        self._last_hash = event.event_hash

        # Persist
        await self._store(event)

    def _compute_hash(self, event: AuditEvent) -> str:
        payload = json.dumps({
            "event_id": event.event_id,
            "timestamp_unix_ms": event.timestamp_unix_ms,
            "event_type": event.event_type,
            "data": event.data,
            "previous_hash": event.previous_hash
        }, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    async def verify_chain(self, events: list[AuditEvent]) -> bool:
        """Verify audit trail integrity."""
        for i, event in enumerate(events):
            expected_hash = self._compute_hash(event)
            if expected_hash != event.event_hash:
                return False
            if i > 0 and event.previous_hash != events[i - 1].event_hash:
                return False
        return True
```
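To make the tamper-evidence property concrete, the sketch below logs two events and then verifies the chain. It is a minimal illustration only: `InMemoryAuditLogger` is a hypothetical test double, and the UUIDs and field values are placeholders, not part of the production design.

```python
import asyncio
from datetime import datetime, timezone


class InMemoryAuditLogger(AuditLogger):
    """Hypothetical test double: keeps events in a list instead of PostgreSQL."""

    def __init__(self):
        super().__init__()
        self.events: list[AuditEvent] = []

    async def _store(self, event: AuditEvent) -> None:
        self.events.append(event)


async def main() -> None:
    logger = InMemoryAuditLogger()
    now = datetime.now(timezone.utc)

    for i in range(2):
        await logger.log(AuditEvent(
            event_id=f"0194e6a0-000{i}-7000-8000-00000000000{i}",  # placeholder UUID v7
            trace_id=None,
            parent_event_id=None,
            timestamp=now,
            timestamp_unix_ms=int(now.timestamp() * 1000),
            event_type="agent.action.completed",
            event_category="agent",
            severity="INFO",
            project_id=None,
            agent_id=None,
            user_id=None,
            action=f"Example action #{i}",
            data={"step": i},
            before_state=None,
            after_state=None,
            previous_hash=None,  # overwritten by log()
            event_hash="",       # overwritten by log()
        ))

    assert await logger.verify_chain(logger.events)      # intact chain passes

    logger.events[0].data["step"] = 999                  # tamper with history...
    assert not await logger.verify_chain(logger.events)  # ...verification now fails


asyncio.run(main())
```

Because each `event_hash` folds in `previous_hash`, editing or deleting any historical event invalidates every event downstream of it, which is what periodic batch verification is meant to catch. Note that `_compute_hash` covers only the identity, type, `data`, and chaining fields; `action` and the state snapshots are not tamper-protected unless they are added to the hashed payload.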
### Database Schema

```sql
CREATE TABLE audit_events (
    event_id           VARCHAR(36) PRIMARY KEY,
    trace_id           VARCHAR(36),
    parent_event_id    VARCHAR(36),
    timestamp          TIMESTAMPTZ NOT NULL,
    timestamp_unix_ms  BIGINT NOT NULL,
    event_type         VARCHAR(100) NOT NULL,
    event_category     VARCHAR(50) NOT NULL,
    severity           VARCHAR(20) NOT NULL,
    project_id         UUID,
    agent_id           UUID,
    user_id            UUID,
    action             TEXT NOT NULL,
    data               JSONB NOT NULL,
    before_state       JSONB,
    after_state        JSONB,
    previous_hash      VARCHAR(64),
    event_hash         VARCHAR(64) NOT NULL
);

-- Indexes for common queries
CREATE INDEX idx_audit_timestamp ON audit_events (timestamp DESC);
CREATE INDEX idx_audit_project ON audit_events (project_id, timestamp DESC);
CREATE INDEX idx_audit_agent ON audit_events (agent_id, timestamp DESC);
CREATE INDEX idx_audit_type ON audit_events (event_type, timestamp DESC);
```

### Storage Tiers

| Tier | Storage | Retention | Query Speed |
|------|---------|-----------|-------------|
| Hot | PostgreSQL | 0-90 days | Fast |
| Cold | S3/MinIO | 90+ days | Slow |

### Archival Process

```python
import gzip
from datetime import datetime, timedelta, timezone


@celery_app.task
def archive_old_events():
    """Move events older than 90 days to cold storage."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)

    # Export to S3 in daily batches
    events = db.query("""
        SELECT * FROM audit_events
        WHERE timestamp < $1
        ORDER BY timestamp
    """, cutoff)

    for date, batch in group_by_date(events):
        s3.put_object(
            Bucket="syndarix-audit",
            Key=f"audit/{date.isoformat()}.jsonl.gz",
            Body=gzip.compress(batch.to_jsonl())
        )

    # Delete from PostgreSQL
    db.execute("DELETE FROM audit_events WHERE timestamp < $1", cutoff)
```

### Audit Viewer API

```python
@router.get("/projects/{project_id}/audit")
async def get_audit_trail(
    project_id: str,
    event_type: str | None = None,
    agent_id: str | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    limit: int = 100
) -> list[AuditEvent]:
    """Query audit trail with filters."""
    ...
```

## Consequences

### Positive

- Complete audit trail of all agent actions
- Tamper-evident through hash chaining
- Fast queries for recent events
- Cost-effective long-term storage

### Negative

- Storage requirements grow with activity
- Hash chain verification adds complexity

### Mitigation

- Tiered storage with archival
- Batch verification for chain integrity

## Compliance

This decision aligns with:

- NFR-602: Comprehensive audit logging
- Compliance: SOC2, GDPR requirements

---

*This ADR establishes the audit logging architecture for Syndarix.*