# ADR-013: Audit Logging Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-011

---

## Context

As an autonomous AI-powered system, Syndarix requires comprehensive audit logging for:

- Compliance (SOC2, GDPR)
- Debugging agent behavior
- Client trust and transparency
- Security investigation

Every action taken by agents must be traceable and tamper-evident.

## Decision Drivers

- **Completeness:** Log all significant events
- **Immutability:** Tamper-evident audit trail
- **Queryability:** Fast search and filtering
- **Scalability:** Handle high event volumes
- **Retention:** Configurable retention policies

## Decision

**Implement structured audit logging** using:

- **Structlog** for JSON event formatting (see the configuration sketch below)
- **PostgreSQL** for hot storage (0-90 days)
- **S3-compatible storage** for cold archival
- **Cryptographic hash chaining** for immutability
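
The ADR does not prescribe a specific logger setup; as a rough sketch, structlog can be configured to emit one JSON object per event along these lines (the processor choices here are assumptions, not a mandated configuration):

```python
import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,    # pull in bound context (agent, project, trace)
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.JSONRenderer(),        # one JSON object per line
    ],
)

audit_log = structlog.get_logger("audit")
audit_log.info(
    "agent.action.completed",
    agent_id="hypothetical-agent-id",      # illustrative placeholder
    project_id="hypothetical-project-id",  # illustrative placeholder
    duration_ms=1840,
)
```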

## Implementation

### Event Categories

| Category | Event Types |
|----------|-------------|
| **Agent** | spawned, action_started, action_completed, decision, terminated |
| **LLM** | request, response, error, tool_call |
| **MCP** | tool_invoked, tool_result, tool_error |
| **Approval** | requested, granted, rejected, timeout |
| **Git** | commit, branch_created, pr_created, pr_merged |
| **Project** | created, sprint_started, milestone_completed |

### Event Schema

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel


class AuditEvent(BaseModel):
    # Identity
    event_id: str                   # UUID v7 (time-ordered)
    trace_id: str | None            # OpenTelemetry correlation
    parent_event_id: str | None     # Event chain

    # Timestamp
    timestamp: datetime
    timestamp_unix_ms: int

    # Classification
    event_type: str                 # e.g., "agent.action.completed"
    event_category: str             # e.g., "agent"
    severity: Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

    # Context
    project_id: str | None
    agent_id: str | None
    user_id: str | None

    # Content
    action: str                     # Human-readable description
    data: dict                      # Event-specific payload
    before_state: dict | None
    after_state: dict | None

    # Immutability
    previous_hash: str | None       # Hash of previous event
    event_hash: str                 # SHA-256 of this event
```
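
As an illustration of how the schema and the event types from the table above fit together, a completed agent action might be recorded like this. All field values are invented for the example, and `uuid4()` stands in for a time-ordered UUID v7 generator:

```python
from datetime import datetime, timezone
from uuid import uuid4  # stand-in; production would use a time-ordered UUID v7

now = datetime.now(timezone.utc)

event = AuditEvent(
    event_id=str(uuid4()),
    trace_id=None,
    parent_event_id=None,
    timestamp=now,
    timestamp_unix_ms=int(now.timestamp() * 1000),
    event_type="agent.action.completed",   # category prefix matches event_category
    event_category="agent",
    severity="INFO",
    project_id=str(uuid4()),               # hypothetical project
    agent_id=str(uuid4()),                 # hypothetical agent
    user_id=None,
    action="Architect agent finished drafting the sprint plan",
    data={"duration_ms": 1840, "tokens_used": 2312},
    before_state=None,
    after_state=None,
    previous_hash=None,                    # set by AuditLogger.log()
    event_hash="",                         # set by AuditLogger.log()
)
```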

### Hash Chain Implementation

```python
import hashlib
import json


class AuditLogger:
    def __init__(self):
        self._last_hash: str | None = None

    async def log(self, event: AuditEvent) -> None:
        # Set hash chain
        event.previous_hash = self._last_hash
        event.event_hash = self._compute_hash(event)
        self._last_hash = event.event_hash

        # Persist
        await self._store(event)

    def _compute_hash(self, event: AuditEvent) -> str:
        payload = json.dumps({
            "event_id": event.event_id,
            "timestamp_unix_ms": event.timestamp_unix_ms,
            "event_type": event.event_type,
            "data": event.data,
            "previous_hash": event.previous_hash,
        }, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    async def verify_chain(self, events: list[AuditEvent]) -> bool:
        """Verify audit trail integrity."""
        for i, event in enumerate(events):
            expected_hash = self._compute_hash(event)
            if expected_hash != event.event_hash:
                return False
            if i > 0 and event.previous_hash != events[i - 1].event_hash:
                return False
        return True
```
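
A quick sketch of why this is tamper-evident: altering any hashed field after persistence, or reordering events, makes `verify_chain` fail. Since `_store` above is left abstract, this uses a hypothetical in-memory subclass, and `build_event()` is a hypothetical helper that constructs an `AuditEvent` as in the earlier example:

```python
import asyncio


class InMemoryAuditLogger(AuditLogger):
    """Test double: keeps events in a list instead of writing to PostgreSQL."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[AuditEvent] = []

    async def _store(self, event: AuditEvent) -> None:
        self.events.append(event)


async def demo() -> None:
    logger = InMemoryAuditLogger()
    await logger.log(build_event("agent.spawned"))           # hypothetical helper; previous_hash is None
    await logger.log(build_event("agent.action.completed"))  # chained to the first event's hash

    assert await logger.verify_chain(logger.events)

    # Tampering with a hashed field after persistence is detected.
    logger.events[0].data["tokens_used"] = 0
    assert not await logger.verify_chain(logger.events)


asyncio.run(demo())
```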

### Database Schema

```sql
CREATE TABLE audit_events (
    event_id VARCHAR(36) PRIMARY KEY,
    trace_id VARCHAR(36),
    parent_event_id VARCHAR(36),

    timestamp TIMESTAMPTZ NOT NULL,
    timestamp_unix_ms BIGINT NOT NULL,

    event_type VARCHAR(100) NOT NULL,
    event_category VARCHAR(50) NOT NULL,
    severity VARCHAR(20) NOT NULL,

    project_id UUID,
    agent_id UUID,
    user_id UUID,

    action TEXT NOT NULL,
    data JSONB NOT NULL,
    before_state JSONB,
    after_state JSONB,

    previous_hash VARCHAR(64),
    event_hash VARCHAR(64) NOT NULL
);

-- Indexes for common queries
CREATE INDEX idx_audit_timestamp ON audit_events (timestamp DESC);
CREATE INDEX idx_audit_project ON audit_events (project_id, timestamp DESC);
CREATE INDEX idx_audit_agent ON audit_events (agent_id, timestamp DESC);
CREATE INDEX idx_audit_type ON audit_events (event_type, timestamp DESC);
```

### Storage Tiers

| Tier | Storage | Retention | Query Speed |
|------|---------|-----------|-------------|
| Hot | PostgreSQL | 0-90 days | Fast |
| Cold | S3/MinIO | 90+ days | Slow |

### Archival Process

```python
@celery_app.task
def archive_old_events():
    """Move events older than 90 days to cold storage."""
    cutoff = datetime.utcnow() - timedelta(days=90)

    # Export to S3 in daily batches
    events = db.query("""
        SELECT * FROM audit_events
        WHERE timestamp < $1
        ORDER BY timestamp
    """, cutoff)

    for date, batch in group_by_date(events):
        s3.put_object(
            Bucket="syndarix-audit",
            Key=f"audit/{date.isoformat()}.jsonl.gz",
            Body=gzip.compress(batch.to_jsonl())
        )

    # Delete from PostgreSQL
    db.execute("DELETE FROM audit_events WHERE timestamp < $1", cutoff)
```
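
Reading an archived day back for an investigation is the inverse of the task above. A minimal sketch, assuming boto3 (MinIO works through the same client via `endpoint_url`) and the `audit/<date>.jsonl.gz` key layout used by the archival task:

```python
import gzip
import json

import boto3

s3 = boto3.client("s3")  # for MinIO: boto3.client("s3", endpoint_url="http://minio:9000", ...)


def load_archived_events(date_iso: str) -> list[dict]:
    """Load one archived day of audit events from cold storage."""
    obj = s3.get_object(Bucket="syndarix-audit", Key=f"audit/{date_iso}.jsonl.gz")
    raw = gzip.decompress(obj["Body"].read())
    return [json.loads(line) for line in raw.splitlines() if line.strip()]
```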

### Audit Viewer API

```python
@router.get("/projects/{project_id}/audit")
async def get_audit_trail(
    project_id: str,
    event_type: str | None = None,
    agent_id: str | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    limit: int = 100
) -> list[AuditEvent]:
    """Query audit trail with filters."""
    ...
```
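
One possible shape for the handler body, shown only as a sketch: it assumes an asyncpg-style connection (`db.fetch`) with JSONB columns decoded to dicts (e.g., via a type codec), and builds the WHERE clause so the `idx_audit_project` index can serve the common case:

```python
async def query_audit_events(
    db,  # assumed: asyncpg connection/pool with JSONB decoded to dicts
    project_id: str,
    event_type: str | None = None,
    agent_id: str | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    limit: int = 100,
) -> list[AuditEvent]:
    conditions = ["project_id = $1"]
    params: list = [project_id]

    optional = [
        ("event_type = ${}", event_type),
        ("agent_id = ${}", agent_id),
        ("timestamp >= ${}", start_time),
        ("timestamp <= ${}", end_time),
    ]
    for template, value in optional:
        if value is not None:
            params.append(value)
            conditions.append(template.format(len(params)))

    params.append(limit)
    rows = await db.fetch(
        f"""
        SELECT * FROM audit_events
        WHERE {' AND '.join(conditions)}
        ORDER BY timestamp DESC
        LIMIT ${len(params)}
        """,
        *params,
    )
    return [AuditEvent(**dict(row)) for row in rows]
```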

## Consequences

### Positive

- Complete audit trail of all agent actions
- Tamper-evident through hash chaining
- Fast queries for recent events
- Cost-effective long-term storage

### Negative

- Storage requirements grow with activity
- Hash chain verification adds complexity

### Mitigation

- Tiered storage with archival
- Batch verification for chain integrity (see the sketch below)
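
The batch-verification mitigation could be a scheduled job that periodically re-checks recent chain segments. This is only a sketch: it reuses `AuditLogger.verify_chain` from above, assumes the same `celery_app`/`db` helpers as the archival task with JSONB decoded to dicts, and `alert_security_team` is a hypothetical alerting hook:

```python
import asyncio
from datetime import datetime, timedelta


@celery_app.task
def verify_recent_chain(hours: int = 24) -> bool:
    """Recompute hashes for the most recent events and flag any chain break."""
    cutoff = datetime.utcnow() - timedelta(hours=hours)

    rows = db.query("""
        SELECT * FROM audit_events
        WHERE timestamp >= $1
        ORDER BY timestamp
    """, cutoff)

    events = [AuditEvent(**dict(row)) for row in rows]
    intact = asyncio.run(AuditLogger().verify_chain(events))

    if not intact:
        # Hypothetical alerting hook; wire up to the project's real alerting.
        alert_security_team("Audit hash chain verification failed")
    return intact
```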

## Compliance

This decision aligns with:

- NFR-602: Comprehensive audit logging
- Compliance: SOC2, GDPR requirements

---

*This ADR establishes the audit logging architecture for Syndarix.*