ADR-013: Audit Logging Architecture

Status: Accepted
Date: 2025-12-29
Deciders: Architecture Team
Related Spikes: SPIKE-011


Context

As an autonomous AI-powered system, Syndarix requires comprehensive audit logging for:

  • Compliance (SOC2, GDPR)
  • Debugging agent behavior
  • Client trust and transparency
  • Security investigation

Every action taken by agents must be traceable and tamper-evident.

Decision Drivers

  • Completeness: Log all significant events
  • Immutability: Tamper-evident audit trail
  • Queryability: Fast search and filtering
  • Scalability: Handle high event volumes
  • Retention: Configurable retention policies

Decision

Implement structured audit logging using:

  • Structlog for JSON event formatting
  • PostgreSQL for hot storage (0-90 days)
  • S3-compatible storage for cold archival
  • Cryptographic hash chaining for immutability
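
A minimal Structlog configuration sketch (the processor choice here is illustrative; any JSON-producing pipeline works) showing how events can be rendered as one JSON object per line before persistence:

import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,                     # attach severity
        structlog.processors.TimeStamper(fmt="iso", utc=True),  # ISO-8601 UTC timestamp
        structlog.processors.JSONRenderer(),                    # one JSON object per event
    ]
)

log = structlog.get_logger()
log.info("agent.action.completed", agent_id="<agent-uuid>", project_id="<project-uuid>")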

Implementation

Event Categories

Category   Event Types
--------   -----------
Agent      spawned, action_started, action_completed, decision, terminated
LLM        request, response, error, tool_call
MCP        tool_invoked, tool_result, tool_error
Approval   requested, granted, rejected, timeout
Git        commit, branch_created, pr_created, pr_merged
Project    created, sprint_started, milestone_completed

Event Schema

from datetime import datetime
from typing import Literal

from pydantic import BaseModel


class AuditEvent(BaseModel):
    # Identity
    event_id: str                      # UUID v7 (time-ordered)
    trace_id: str | None               # OpenTelemetry correlation
    parent_event_id: str | None        # Event chain

    # Timestamp
    timestamp: datetime
    timestamp_unix_ms: int

    # Classification
    event_type: str                    # e.g., "agent.action.completed"
    event_category: str                # e.g., "agent"
    severity: Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

    # Context
    project_id: str | None
    agent_id: str | None
    user_id: str | None

    # Content
    action: str                        # Human-readable description
    data: dict                         # Event-specific payload
    before_state: dict | None
    after_state: dict | None

    # Immutability
    previous_hash: str | None          # Hash of previous event
    event_hash: str                    # SHA-256 of this event
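
For illustration, constructing an event could look like the sketch below. The uuid7 import is an assumption: the stdlib only gained uuid.uuid7 in Python 3.14, so earlier versions need a backport such as the uuid6 package.

from datetime import datetime, timezone

from uuid6 import uuid7  # backport; Python 3.14+ ships uuid.uuid7

now = datetime.now(timezone.utc)
event = AuditEvent(
    event_id=str(uuid7()),            # time-ordered, so IDs sort chronologically
    trace_id=None,
    parent_event_id=None,
    timestamp=now,
    timestamp_unix_ms=int(now.timestamp() * 1000),
    event_type="agent.action.completed",
    event_category="agent",
    severity="INFO",
    project_id=None,
    agent_id=str(uuid7()),            # hypothetical agent ID
    user_id=None,
    action="Agent completed code review",
    data={"files_reviewed": 3},
    before_state=None,
    after_state=None,
    previous_hash=None,               # set by AuditLogger.log()
    event_hash="",                    # set by AuditLogger.log()
)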

Hash Chain Implementation

import hashlib
import json


class AuditLogger:
    def __init__(self):
        # Last hash in the chain; a production deployment would restore
        # this from storage on startup so the chain survives restarts.
        self._last_hash: str | None = None

    async def log(self, event: AuditEvent) -> None:
        # Set hash chain
        event.previous_hash = self._last_hash
        event.event_hash = self._compute_hash(event)
        self._last_hash = event.event_hash

        # Persist
        await self._store(event)

    def _compute_hash(self, event: AuditEvent) -> str:
        # Canonical JSON of the chained fields; sort_keys keeps the hash deterministic
        payload = json.dumps({
            "event_id": event.event_id,
            "timestamp_unix_ms": event.timestamp_unix_ms,
            "event_type": event.event_type,
            "data": event.data,
            "previous_hash": event.previous_hash
        }, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    async def verify_chain(self, events: list[AuditEvent]) -> bool:
        """Verify audit trail integrity."""
        for i, event in enumerate(events):
            expected_hash = self._compute_hash(event)
            if expected_hash != event.event_hash:
                return False
            if i > 0 and event.previous_hash != events[i-1].event_hash:
                return False
        return True
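
As a usage sketch (make_event is a hypothetical test helper that builds a fully populated AuditEvent), any mutation of a logged event breaks verification:

# Inside an async context (e.g., a pytest-asyncio test)
logger = AuditLogger()
events = [make_event("agent.spawned"), make_event("agent.action.started")]
for event in events:
    await logger.log(event)

assert await logger.verify_chain(events)

events[0].data["injected"] = True               # tamper with a stored event
assert not await logger.verify_chain(events)    # recomputed hash no longer matches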

Database Schema

CREATE TABLE audit_events (
    event_id VARCHAR(36) PRIMARY KEY,
    trace_id VARCHAR(36),
    parent_event_id VARCHAR(36),

    timestamp TIMESTAMPTZ NOT NULL,
    timestamp_unix_ms BIGINT NOT NULL,

    event_type VARCHAR(100) NOT NULL,
    event_category VARCHAR(50) NOT NULL,
    severity VARCHAR(20) NOT NULL,

    project_id UUID,
    agent_id UUID,
    user_id UUID,

    action TEXT NOT NULL,
    data JSONB NOT NULL,
    before_state JSONB,
    after_state JSONB,

    previous_hash VARCHAR(64),
    event_hash VARCHAR(64) NOT NULL
);

-- Indexes for common queries
CREATE INDEX idx_audit_timestamp ON audit_events (timestamp DESC);
CREATE INDEX idx_audit_project ON audit_events (project_id, timestamp DESC);
CREATE INDEX idx_audit_agent ON audit_events (agent_id, timestamp DESC);
CREATE INDEX idx_audit_type ON audit_events (event_type, timestamp DESC);

Storage Tiers

Tier   Storage      Retention   Query Speed
----   -------      ---------   -----------
Hot    PostgreSQL   0-90 days   Fast
Cold   S3/MinIO     90+ days    Slow

Archival Process

import gzip
from datetime import datetime, timedelta, timezone


@celery_app.task
def archive_old_events():
    """Move events older than 90 days to cold storage."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)

    # Export to S3 in daily batches
    events = db.query("""
        SELECT * FROM audit_events
        WHERE timestamp < $1
        ORDER BY timestamp
    """, cutoff)

    # group_by_date() and to_jsonl() are project helpers: bucket events
    # by calendar day and serialize each batch as JSON Lines bytes
    for date, batch in group_by_date(events):
        s3.put_object(
            Bucket="syndarix-audit",
            Key=f"audit/{date.isoformat()}.jsonl.gz",
            Body=gzip.compress(batch.to_jsonl())
        )

    # Delete from PostgreSQL once every batch has been archived
    db.execute("DELETE FROM audit_events WHERE timestamp < $1", cutoff)
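
Reading a day of archived events back out of cold storage is correspondingly slower; a minimal retrieval sketch with boto3, assuming the bucket and key layout used above:

import gzip
import json

import boto3

s3 = boto3.client("s3")

def load_archived_events(day: str) -> list[dict]:
    """Fetch one archived day, e.g. day='2025-09-01'."""
    obj = s3.get_object(Bucket="syndarix-audit", Key=f"audit/{day}.jsonl.gz")
    lines = gzip.decompress(obj["Body"].read()).splitlines()
    return [json.loads(line) for line in lines if line]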

Audit Viewer API

@router.get("/projects/{project_id}/audit")
async def get_audit_trail(
    project_id: str,
    event_type: str | None = None,
    agent_id: str | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    limit: int = 100
) -> list[AuditEvent]:
    """Query audit trail with filters."""
    ...
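
Querying the endpoint from a client is then a plain filtered GET; for example with httpx (host and project ID are illustrative):

import httpx

resp = httpx.get(
    "https://syndarix.example.com/projects/p-123/audit",
    params={"event_type": "approval.granted", "limit": 50},
)
events = resp.json()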

Consequences

Positive

  • Complete audit trail of all agent actions
  • Tamper-evident through hash chaining
  • Fast queries for recent events
  • Cost-effective long-term storage

Negative

  • Storage requirements grow with activity
  • Hash chain verification adds complexity

Mitigation

  • Tiered storage with archival
  • Batch verification for chain integrity

Compliance

This decision aligns with:

  • NFR-602: Comprehensive audit logging
  • Compliance: SOC2, GDPR requirements

This ADR establishes the audit logging architecture for Syndarix.