# ADR-013: Audit Logging Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-011

---

## Context

As an autonomous AI-powered system, Syndarix requires comprehensive audit logging for:

- Compliance (SOC2, GDPR)
- Debugging agent behavior
- Client trust and transparency
- Security investigation

Every action taken by agents must be traceable and tamper-evident.

## Decision Drivers

- **Completeness:** Log all significant events
- **Immutability:** Tamper-evident audit trail
- **Queryability:** Fast search and filtering
- **Scalability:** Handle high event volumes
- **Retention:** Configurable retention policies

## Decision

**Implement structured audit logging** using:

- **Structlog** for JSON event formatting (see the configuration sketch below)
- **PostgreSQL** for hot storage (0-90 days)
- **S3-compatible storage** for cold archival
- **Cryptographic hash chaining** for immutability
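
The ADR does not prescribe a specific logger setup; as a rough sketch, structlog can be configured to emit one JSON object per event along these lines (the processor choices here are assumptions, not a mandated configuration):

```python
import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,    # pull in bound context (agent, project, trace)
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.JSONRenderer(),        # one JSON object per line
    ],
)

audit_log = structlog.get_logger("audit")
audit_log.info(
    "agent.action.completed",
    agent_id="hypothetical-agent-id",      # illustrative placeholder
    project_id="hypothetical-project-id",  # illustrative placeholder
    duration_ms=1840,
)
```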

## Implementation

### Event Categories

| Category | Event Types |
|----------|-------------|
| **Agent** | spawned, action_started, action_completed, decision, terminated |
| **LLM** | request, response, error, tool_call |
| **MCP** | tool_invoked, tool_result, tool_error |
| **Approval** | requested, granted, rejected, timeout |
| **Git** | commit, branch_created, pr_created, pr_merged |
| **Project** | created, sprint_started, milestone_completed |

### Event Schema

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel


class AuditEvent(BaseModel):
    # Identity
    event_id: str                   # UUID v7 (time-ordered)
    trace_id: str | None            # OpenTelemetry correlation
    parent_event_id: str | None     # Event chain

    # Timestamp
    timestamp: datetime
    timestamp_unix_ms: int

    # Classification
    event_type: str                 # e.g., "agent.action.completed"
    event_category: str             # e.g., "agent"
    severity: Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

    # Context
    project_id: str | None
    agent_id: str | None
    user_id: str | None

    # Content
    action: str                     # Human-readable description
    data: dict                      # Event-specific payload
    before_state: dict | None
    after_state: dict | None

    # Immutability
    previous_hash: str | None       # Hash of previous event
    event_hash: str                 # SHA-256 of this event
```
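
As an illustration of how the schema and the event types from the table above fit together, a completed agent action might be recorded like this. All field values are invented for the example, and `uuid4()` stands in for a time-ordered UUID v7 generator:

```python
from datetime import datetime, timezone
from uuid import uuid4  # stand-in; production would use a time-ordered UUID v7

now = datetime.now(timezone.utc)

event = AuditEvent(
    event_id=str(uuid4()),
    trace_id=None,
    parent_event_id=None,
    timestamp=now,
    timestamp_unix_ms=int(now.timestamp() * 1000),
    event_type="agent.action.completed",   # category prefix matches event_category
    event_category="agent",
    severity="INFO",
    project_id=str(uuid4()),               # hypothetical project
    agent_id=str(uuid4()),                 # hypothetical agent
    user_id=None,
    action="Architect agent finished drafting the sprint plan",
    data={"duration_ms": 1840, "tokens_used": 2312},
    before_state=None,
    after_state=None,
    previous_hash=None,                    # set by AuditLogger.log()
    event_hash="",                         # set by AuditLogger.log()
)
```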

### Hash Chain Implementation

```python
import hashlib
import json


class AuditLogger:
    def __init__(self):
        self._last_hash: str | None = None

    async def log(self, event: AuditEvent) -> None:
        # Set hash chain
        event.previous_hash = self._last_hash
        event.event_hash = self._compute_hash(event)
        self._last_hash = event.event_hash

        # Persist
        await self._store(event)

    def _compute_hash(self, event: AuditEvent) -> str:
        payload = json.dumps({
            "event_id": event.event_id,
            "timestamp_unix_ms": event.timestamp_unix_ms,
            "event_type": event.event_type,
            "data": event.data,
            "previous_hash": event.previous_hash,
        }, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    async def verify_chain(self, events: list[AuditEvent]) -> bool:
        """Verify audit trail integrity."""
        for i, event in enumerate(events):
            expected_hash = self._compute_hash(event)
            if expected_hash != event.event_hash:
                return False
            if i > 0 and event.previous_hash != events[i - 1].event_hash:
                return False
        return True
```
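
A quick sketch of why this is tamper-evident: altering any hashed field after persistence, or reordering events, makes `verify_chain` fail. Since `_store` above is left abstract, this uses a hypothetical in-memory subclass, and `build_event()` is a hypothetical helper that constructs an `AuditEvent` as in the earlier example:

```python
import asyncio


class InMemoryAuditLogger(AuditLogger):
    """Test double: keeps events in a list instead of writing to PostgreSQL."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[AuditEvent] = []

    async def _store(self, event: AuditEvent) -> None:
        self.events.append(event)


async def demo() -> None:
    logger = InMemoryAuditLogger()
    await logger.log(build_event("agent.spawned"))           # hypothetical helper; previous_hash is None
    await logger.log(build_event("agent.action.completed"))  # chained to the first event's hash

    assert await logger.verify_chain(logger.events)

    # Tampering with a hashed field after persistence is detected.
    logger.events[0].data["tokens_used"] = 0
    assert not await logger.verify_chain(logger.events)


asyncio.run(demo())
```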

### Database Schema

```sql
CREATE TABLE audit_events (
    event_id VARCHAR(36) PRIMARY KEY,
    trace_id VARCHAR(36),
    parent_event_id VARCHAR(36),

    timestamp TIMESTAMPTZ NOT NULL,
    timestamp_unix_ms BIGINT NOT NULL,

    event_type VARCHAR(100) NOT NULL,
    event_category VARCHAR(50) NOT NULL,
    severity VARCHAR(20) NOT NULL,

    project_id UUID,
    agent_id UUID,
    user_id UUID,

    action TEXT NOT NULL,
    data JSONB NOT NULL,
    before_state JSONB,
    after_state JSONB,

    previous_hash VARCHAR(64),
    event_hash VARCHAR(64) NOT NULL
);

-- Indexes for common queries
CREATE INDEX idx_audit_timestamp ON audit_events (timestamp DESC);
CREATE INDEX idx_audit_project ON audit_events (project_id, timestamp DESC);
CREATE INDEX idx_audit_agent ON audit_events (agent_id, timestamp DESC);
CREATE INDEX idx_audit_type ON audit_events (event_type, timestamp DESC);
```

### Storage Tiers

| Tier | Storage | Retention | Query Speed |
|------|---------|-----------|-------------|
| Hot | PostgreSQL | 0-90 days | Fast |
| Cold | S3/MinIO | 90+ days | Slow |

### Archival Process

```python
@celery_app.task
def archive_old_events():
    """Move events older than 90 days to cold storage."""
    cutoff = datetime.utcnow() - timedelta(days=90)

    # Export to S3 in daily batches
    events = db.query("""
        SELECT * FROM audit_events
        WHERE timestamp < $1
        ORDER BY timestamp
    """, cutoff)

    for date, batch in group_by_date(events):
        s3.put_object(
            Bucket="syndarix-audit",
            Key=f"audit/{date.isoformat()}.jsonl.gz",
            Body=gzip.compress(batch.to_jsonl())
        )

    # Delete from PostgreSQL
    db.execute("DELETE FROM audit_events WHERE timestamp < $1", cutoff)
```
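
Reading an archived day back for an investigation is the inverse of the task above. A minimal sketch, assuming boto3 (MinIO works through the same client via `endpoint_url`) and the `audit/<date>.jsonl.gz` key layout used by the archival task:

```python
import gzip
import json

import boto3

s3 = boto3.client("s3")  # for MinIO: boto3.client("s3", endpoint_url="http://minio:9000", ...)


def load_archived_events(date_iso: str) -> list[dict]:
    """Load one archived day of audit events from cold storage."""
    obj = s3.get_object(Bucket="syndarix-audit", Key=f"audit/{date_iso}.jsonl.gz")
    raw = gzip.decompress(obj["Body"].read())
    return [json.loads(line) for line in raw.splitlines() if line.strip()]
```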

### Audit Viewer API

```python
@router.get("/projects/{project_id}/audit")
async def get_audit_trail(
    project_id: str,
    event_type: str | None = None,
    agent_id: str | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    limit: int = 100
) -> list[AuditEvent]:
    """Query audit trail with filters."""
    ...
```
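
One possible shape for the handler body, shown only as a sketch: it assumes an asyncpg-style connection (`db.fetch`) with JSONB columns decoded to dicts (e.g., via a type codec), and builds the WHERE clause so the `idx_audit_project` index can serve the common case:

```python
async def query_audit_events(
    db,  # assumed: asyncpg connection/pool with JSONB decoded to dicts
    project_id: str,
    event_type: str | None = None,
    agent_id: str | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    limit: int = 100,
) -> list[AuditEvent]:
    conditions = ["project_id = $1"]
    params: list = [project_id]

    optional = [
        ("event_type = ${}", event_type),
        ("agent_id = ${}", agent_id),
        ("timestamp >= ${}", start_time),
        ("timestamp <= ${}", end_time),
    ]
    for template, value in optional:
        if value is not None:
            params.append(value)
            conditions.append(template.format(len(params)))

    params.append(limit)
    rows = await db.fetch(
        f"""
        SELECT * FROM audit_events
        WHERE {' AND '.join(conditions)}
        ORDER BY timestamp DESC
        LIMIT ${len(params)}
        """,
        *params,
    )
    return [AuditEvent(**dict(row)) for row in rows]
```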

## Consequences

### Positive

- Complete audit trail of all agent actions
- Tamper-evident through hash chaining
- Fast queries for recent events
- Cost-effective long-term storage

### Negative

- Storage requirements grow with activity
- Hash chain verification adds complexity

### Mitigation

- Tiered storage with archival
- Batch verification for chain integrity (see the sketch below)
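
The batch-verification mitigation could be a scheduled job that periodically re-checks recent chain segments. This is only a sketch: it reuses `AuditLogger.verify_chain` from above, assumes the same `celery_app`/`db` helpers as the archival task with JSONB decoded to dicts, and `alert_security_team` is a hypothetical alerting hook:

```python
import asyncio
from datetime import datetime, timedelta


@celery_app.task
def verify_recent_chain(hours: int = 24) -> bool:
    """Recompute hashes for the most recent events and flag any chain break."""
    cutoff = datetime.utcnow() - timedelta(hours=hours)

    rows = db.query("""
        SELECT * FROM audit_events
        WHERE timestamp >= $1
        ORDER BY timestamp
    """, cutoff)

    events = [AuditEvent(**dict(row)) for row in rows]
    intact = asyncio.run(AuditLogger().verify_chain(events))

    if not intact:
        # Hypothetical alerting hook; wire up to the project's real alerting.
        alert_security_team("Audit hash chain verification failed")
    return intact
```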

## Compliance

This decision aligns with:

- NFR-602: Comprehensive audit logging
- Compliance: SOC2, GDPR requirements

---

*This ADR establishes the audit logging architecture for Syndarix.*