# ADR-013: Audit Logging Architecture
**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-011
---
## Context
As an autonomous AI-powered system, Syndarix requires comprehensive audit logging for:
- Compliance (SOC2, GDPR)
- Debugging agent behavior
- Client trust and transparency
- Security investigation
Every action taken by agents must be traceable and tamper-evident.
## Decision Drivers
- **Completeness:** Log all significant events
- **Immutability:** Tamper-evident audit trail
- **Queryability:** Fast search and filtering
- **Scalability:** Handle high event volumes
- **Retention:** Configurable retention policies
## Decision
**Implement structured audit logging** using:
- **Structlog** for JSON event formatting (minimal configuration sketched below)
- **PostgreSQL** for hot storage (0-90 days)
- **S3-compatible storage** for cold archival
- **Cryptographic hash chaining** for immutability
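The Structlog piece is a small amount of configuration. A minimal sketch, assuming a plain JSON pipeline (the processor choice here is illustrative, not the final pipeline):
```python
import structlog

# Minimal sketch: render every log entry as a JSON object with an
# ISO-8601 UTC timestamp and the severity level attached.
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.JSONRenderer(),
    ],
)

logger = structlog.get_logger()
logger.info("agent.action.completed", agent_id="agent-42", project_id="proj-123")
```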
## Implementation
### Event Categories
| Category | Event Types |
|----------|-------------|
| **Agent** | spawned, action_started, action_completed, decision, terminated |
| **LLM** | request, response, error, tool_call |
| **MCP** | tool_invoked, tool_result, tool_error |
| **Approval** | requested, granted, rejected, timeout |
| **Git** | commit, branch_created, pr_created, pr_merged |
| **Project** | created, sprint_started, milestone_completed |
### Event Schema
```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel


class AuditEvent(BaseModel):
    # Identity
    event_id: str                       # UUID v7 (time-ordered)
    trace_id: str | None = None         # OpenTelemetry correlation
    parent_event_id: str | None = None  # Event chain

    # Timestamp
    timestamp: datetime
    timestamp_unix_ms: int

    # Classification
    event_type: str                     # e.g., "agent.action.completed"
    event_category: str                 # e.g., "agent"
    severity: Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

    # Context
    project_id: str | None = None
    agent_id: str | None = None
    user_id: str | None = None

    # Content
    action: str                         # Human-readable description
    data: dict                          # Event-specific payload
    before_state: dict | None = None
    after_state: dict | None = None

    # Immutability (populated by AuditLogger, below)
    previous_hash: str | None = None    # Hash of previous event
    event_hash: str = ""                # SHA-256 of this event
```
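For illustration, constructing an event might look like the following; the IDs and payload are made up, and `uuid4` stands in for the UUID v7 generator the schema assumes:
```python
from datetime import datetime, timezone
from uuid import uuid4

now = datetime.now(timezone.utc)
event = AuditEvent(
    event_id=str(uuid4()),  # illustration only; the schema calls for time-ordered UUID v7
    timestamp=now,
    timestamp_unix_ms=int(now.timestamp() * 1000),
    event_type="agent.action.completed",
    event_category="agent",
    severity="INFO",
    project_id="proj-123",
    agent_id="agent-42",
    action="Implemented issue #17 and opened a pull request",
    data={"issue": 17, "pr": 42},
)
```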
### Hash Chain Implementation
```python
import hashlib
import json


class AuditLogger:
    def __init__(self):
        self._last_hash: str | None = None

    async def log(self, event: AuditEvent) -> None:
        # Set hash chain
        event.previous_hash = self._last_hash
        event.event_hash = self._compute_hash(event)
        self._last_hash = event.event_hash
        # Persist
        await self._store(event)

    def _compute_hash(self, event: AuditEvent) -> str:
        payload = json.dumps({
            "event_id": event.event_id,
            "timestamp_unix_ms": event.timestamp_unix_ms,
            "event_type": event.event_type,
            "data": event.data,
            "previous_hash": event.previous_hash,
        }, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    async def verify_chain(self, events: list[AuditEvent]) -> bool:
        """Verify audit trail integrity."""
        for i, event in enumerate(events):
            expected_hash = self._compute_hash(event)
            if expected_hash != event.event_hash:
                return False
            if i > 0 and event.previous_hash != events[i - 1].event_hash:
                return False
        return True
```
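One design note: `_last_hash` lives in memory, so a restarted logger must re-seed it from the most recently persisted event before appending. Verification itself is straightforward; a hypothetical check (where `load_events` and `alert_security_team` are assumed helpers, not part of the implementation above) might look like:
```python
# Load a project's trail oldest-first and re-verify it.
events = await load_events(project_id="proj-123")  # assumed helper
if not await audit_logger.verify_chain(events):
    await alert_security_team("audit chain broken", project_id="proj-123")  # assumed hook

# Any mutation of a stored event breaks the chain from that point on:
events[3].data["issue"] = 99
assert not await audit_logger.verify_chain(events)
```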
### Database Schema
```sql
CREATE TABLE audit_events (
    event_id          VARCHAR(36) PRIMARY KEY,
    trace_id          VARCHAR(36),
    parent_event_id   VARCHAR(36),
    timestamp         TIMESTAMPTZ NOT NULL,
    timestamp_unix_ms BIGINT NOT NULL,
    event_type        VARCHAR(100) NOT NULL,
    event_category    VARCHAR(50) NOT NULL,
    severity          VARCHAR(20) NOT NULL,
    project_id        UUID,
    agent_id          UUID,
    user_id           UUID,
    action            TEXT NOT NULL,
    data              JSONB NOT NULL,
    before_state      JSONB,
    after_state       JSONB,
    previous_hash     VARCHAR(64),
    event_hash        VARCHAR(64) NOT NULL
);

-- Indexes for common queries
CREATE INDEX idx_audit_timestamp ON audit_events (timestamp DESC);
CREATE INDEX idx_audit_project ON audit_events (project_id, timestamp DESC);
CREATE INDEX idx_audit_agent ON audit_events (agent_id, timestamp DESC);
CREATE INDEX idx_audit_type ON audit_events (event_type, timestamp DESC);
```
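As a sketch of how reads line up with these indexes, the common "recent events for one project" query is served by `idx_audit_project`; asyncpg is shown here purely for illustration:
```python
import asyncpg
import uuid


async def recent_project_events(
    pool: asyncpg.Pool, project_id: str, limit: int = 100
) -> list[asyncpg.Record]:
    # Served by idx_audit_project (project_id, timestamp DESC),
    # so no sort step is needed for the common case.
    return await pool.fetch(
        """
        SELECT * FROM audit_events
        WHERE project_id = $1
        ORDER BY timestamp DESC
        LIMIT $2
        """,
        uuid.UUID(project_id),
        limit,
    )
```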
### Storage Tiers
| Tier | Storage | Retention | Query Speed |
|------|---------|-----------|-------------|
| Hot | PostgreSQL | 0-90 days | Fast |
| Cold | S3/MinIO | 90+ days | Slow |
### Archival Process
```python
import gzip
from datetime import datetime, timedelta, timezone


@celery_app.task
def archive_old_events():
    """Move events older than 90 days to cold storage."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)

    # Export to S3 in daily batches
    events = db.query("""
        SELECT * FROM audit_events
        WHERE timestamp < $1
        ORDER BY timestamp
    """, cutoff)

    for date, batch in group_by_date(events):
        s3.put_object(
            Bucket="syndarix-audit",
            Key=f"audit/{date.isoformat()}.jsonl.gz",
            Body=gzip.compress(batch.to_jsonl()),
        )

    # Delete from PostgreSQL only once all batches are uploaded
    db.execute("DELETE FROM audit_events WHERE timestamp < $1", cutoff)
```
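How the task is scheduled is a deployment concern; one possible setup uses Celery beat (the nightly time and module path below are assumptions):
```python
from celery.schedules import crontab

# Illustrative only: run the archival sweep nightly at 03:00 UTC.
celery_app.conf.beat_schedule = {
    "archive-audit-events": {
        "task": "tasks.archive_old_events",  # assumed task module path
        "schedule": crontab(hour=3, minute=0),
    },
}
```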
### Audit Viewer API
```python
@router.get("/projects/{project_id}/audit")
async def get_audit_trail(
    project_id: str,
    event_type: str | None = None,
    agent_id: str | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    limit: int = 100,
) -> list[AuditEvent]:
    """Query audit trail with filters."""
    ...
```
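The endpoint body is intentionally elided; as a hypothetical sketch, the filter assembly it implies could look like the following, with clause numbering following the `$n` placeholder style used above:
```python
from datetime import datetime


def build_audit_query(
    project_id: str,
    event_type: str | None = None,
    agent_id: str | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    limit: int = 100,
) -> tuple[str, list]:
    # Start from the mandatory project filter, then append one
    # positional clause per supplied optional filter.
    clauses, params = ["project_id = $1"], [project_id]
    if event_type is not None:
        params.append(event_type)
        clauses.append(f"event_type = ${len(params)}")
    if agent_id is not None:
        params.append(agent_id)
        clauses.append(f"agent_id = ${len(params)}")
    if start_time is not None:
        params.append(start_time)
        clauses.append(f"timestamp >= ${len(params)}")
    if end_time is not None:
        params.append(end_time)
        clauses.append(f"timestamp <= ${len(params)}")
    params.append(limit)
    sql = (
        f"SELECT * FROM audit_events WHERE {' AND '.join(clauses)} "
        f"ORDER BY timestamp DESC LIMIT ${len(params)}"
    )
    return sql, params
```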
## Consequences
### Positive
- Complete audit trail of all agent actions
- Tamper-evident through hash chaining
- Fast queries for recent events
- Cost-effective long-term storage
### Negative
- Storage requirements grow with activity
- Hash chain verification adds complexity
### Mitigation
- Tiered storage with archival
- Batch verification for chain integrity
## Compliance
This decision aligns with:
- NFR-602: Comprehensive audit logging
- Compliance: SOC2, GDPR requirements
---
*This ADR establishes the audit logging architecture for Syndarix.*