# ADR-013: Audit Logging Architecture

Status: Accepted
Date: 2025-12-29
Deciders: Architecture Team
Related Spikes: SPIKE-011
## Context
As an autonomous AI-powered system, Syndarix requires comprehensive audit logging for:
- Compliance (SOC2, GDPR)
- Debugging agent behavior
- Client trust and transparency
- Security investigation
Every action taken by agents must be traceable and tamper-evident.
## Decision Drivers
- Completeness: Log all significant events
- Immutability: Tamper-evident audit trail
- Queryability: Fast search and filtering
- Scalability: Handle high event volumes
- Retention: Configurable retention policies
## Decision

Implement structured audit logging using:

- structlog for JSON event formatting (see the configuration sketch below)
- PostgreSQL for hot storage (0-90 days)
- S3-compatible storage for cold archival
- Cryptographic hash chaining for immutability
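A minimal structlog configuration for the formatting piece. This is a sketch: the ADR only mandates JSON output, so the exact processor pipeline and the example field values are illustrative.

```python
import structlog

# Illustrative processor pipeline; the decision only requires JSON events.
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", key="timestamp"),
        structlog.processors.JSONRenderer(),
    ],
)

log = structlog.get_logger()
# Emits one JSON object per call, carrying the event name plus bound fields
log.info("agent.action.completed", agent_id="agent-123", files_changed=3)
```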
## Implementation

### Event Categories
| Category | Event Types |
|---|---|
| Agent | spawned, action_started, action_completed, decision, terminated |
| LLM | request, response, error, tool_call |
| MCP | tool_invoked, tool_result, tool_error |
| Approval | requested, granted, rejected, timeout |
| Git | commit, branch_created, pr_created, pr_merged |
| Project | created, sprint_started, milestone_completed |
### Event Schema

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel


class AuditEvent(BaseModel):
    # Identity
    event_id: str                        # UUID v7 (time-ordered)
    trace_id: str | None = None          # OpenTelemetry correlation
    parent_event_id: str | None = None   # Event chain

    # Timestamp
    timestamp: datetime
    timestamp_unix_ms: int

    # Classification
    event_type: str                      # e.g., "agent.action.completed"
    event_category: str                  # e.g., "agent"
    severity: Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

    # Context
    project_id: str | None = None
    agent_id: str | None = None
    user_id: str | None = None

    # Content
    action: str                          # Human-readable description
    data: dict                           # Event-specific payload
    before_state: dict | None = None
    after_state: dict | None = None

    # Immutability (both fields are filled in by AuditLogger)
    previous_hash: str | None = None     # Hash of previous event
    event_hash: str = ""                 # SHA-256 of this event
```
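For illustration, constructing an event under this schema might look as follows. `uuid4()` is a stand-in for a UUID v7 generator (e.g. the `uuid6` package), and all IDs and payload values are made up:

```python
import time
import uuid
from datetime import datetime, timezone

# uuid4() stands in for the time-ordered UUID v7 the schema assumes.
event = AuditEvent(
    event_id=str(uuid.uuid4()),
    timestamp=datetime.now(timezone.utc),
    timestamp_unix_ms=int(time.time() * 1000),
    event_type="agent.action.completed",
    event_category="agent",
    severity="INFO",
    agent_id="agent-123",
    action="Coder agent finished implementing task T-1",
    data={"task_id": "T-1", "files_changed": 3},
)
```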
### Hash Chain Implementation

```python
import hashlib
import json


class AuditLogger:
    def __init__(self):
        self._last_hash: str | None = None

    async def log(self, event: AuditEvent) -> None:
        # Link the event to its predecessor, then seal it with its own hash
        event.previous_hash = self._last_hash
        event.event_hash = self._compute_hash(event)
        self._last_hash = event.event_hash

        # Persist (the storage backend implements _store)
        await self._store(event)

    def _compute_hash(self, event: AuditEvent) -> str:
        payload = json.dumps({
            "event_id": event.event_id,
            "timestamp_unix_ms": event.timestamp_unix_ms,
            "event_type": event.event_type,
            "data": event.data,
            "previous_hash": event.previous_hash,
        }, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    async def verify_chain(self, events: list[AuditEvent]) -> bool:
        """Verify audit trail integrity."""
        for i, event in enumerate(events):
            expected_hash = self._compute_hash(event)
            if expected_hash != event.event_hash:
                return False
            if i > 0 and event.previous_hash != events[i - 1].event_hash:
                return False
        return True
```
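`_store` is left abstract above, so a throwaway in-memory subclass is enough to show the tamper-evidence property end to end. The subclass and `demo` function are hypothetical names for this sketch:

```python
class InMemoryAuditLogger(AuditLogger):
    """Illustration only; real hot storage is PostgreSQL."""

    def __init__(self):
        super().__init__()
        self.events: list[AuditEvent] = []

    async def _store(self, event: AuditEvent) -> None:
        self.events.append(event)


async def demo(first: AuditEvent, second: AuditEvent) -> None:
    logger = InMemoryAuditLogger()
    await logger.log(first)
    await logger.log(second)
    assert await logger.verify_chain(logger.events)

    # Mutating a logged payload invalidates that event's hash,
    # so verification of the chain fails.
    first.data["files_changed"] = 999
    assert not await logger.verify_chain(logger.events)
```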
### Database Schema

```sql
CREATE TABLE audit_events (
    event_id          VARCHAR(36) PRIMARY KEY,
    trace_id          VARCHAR(36),
    parent_event_id   VARCHAR(36),

    timestamp         TIMESTAMPTZ NOT NULL,
    timestamp_unix_ms BIGINT NOT NULL,

    event_type        VARCHAR(100) NOT NULL,
    event_category    VARCHAR(50) NOT NULL,
    severity          VARCHAR(20) NOT NULL,

    project_id        UUID,
    agent_id          UUID,
    user_id           UUID,

    action            TEXT NOT NULL,
    data              JSONB NOT NULL,
    before_state      JSONB,
    after_state       JSONB,

    previous_hash     VARCHAR(64),
    event_hash        VARCHAR(64) NOT NULL
);

-- Indexes for common queries
CREATE INDEX idx_audit_timestamp ON audit_events (timestamp DESC);
CREATE INDEX idx_audit_project ON audit_events (project_id, timestamp DESC);
CREATE INDEX idx_audit_agent ON audit_events (agent_id, timestamp DESC);
CREATE INDEX idx_audit_type ON audit_events (event_type, timestamp DESC);
```
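The project index covers the most common query shape: recent events for one project. A hypothetical helper, assuming an asyncpg-style `db.fetch()` with the same `$n` placeholders used elsewhere in this ADR:

```python
# Hypothetical helper; `db.fetch` is an assumed asyncpg-style interface.
async def recent_project_events(db, project_id: str, limit: int = 100):
    return await db.fetch(
        """
        SELECT * FROM audit_events
        WHERE project_id = $1        -- served by idx_audit_project
        ORDER BY timestamp DESC
        LIMIT $2
        """,
        project_id,
        limit,
    )
```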
### Storage Tiers
| Tier | Storage | Retention | Query Speed |
|---|---|---|---|
| Hot | PostgreSQL | 0-90 days | Fast |
| Cold | S3/MinIO | 90+ days | Slow |
### Archival Process

```python
import gzip
from datetime import datetime, timedelta, timezone


@celery_app.task
def archive_old_events():
    """Move events older than 90 days to cold storage."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)

    # Export to S3 in daily batches
    events = db.query("""
        SELECT * FROM audit_events
        WHERE timestamp < $1
        ORDER BY timestamp
    """, cutoff)

    # group_by_date() and batch.to_jsonl() are application helpers that
    # bucket events by calendar day and serialize them as JSON Lines bytes.
    for date, batch in group_by_date(events):
        s3.put_object(
            Bucket="syndarix-audit",
            Key=f"audit/{date.isoformat()}.jsonl.gz",
            Body=gzip.compress(batch.to_jsonl()),
        )

    # Delete from PostgreSQL only after the archive writes succeed
    db.execute("DELETE FROM audit_events WHERE timestamp < $1", cutoff)
```
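Reading a day back out of the cold tier is the mirror image. A boto3-style sketch; the helper name is hypothetical:

```python
import gzip
import json


def load_archived_day(s3, day: str) -> list[dict]:
    """Hypothetical helper: fetch one day's archived events from S3/MinIO."""
    obj = s3.get_object(Bucket="syndarix-audit", Key=f"audit/{day}.jsonl.gz")
    lines = gzip.decompress(obj["Body"].read()).splitlines()
    return [json.loads(line) for line in lines]
```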
### Audit Viewer API

```python
@router.get("/projects/{project_id}/audit")
async def get_audit_trail(
    project_id: str,
    event_type: str | None = None,
    agent_id: str | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    limit: int = 100,
) -> list[AuditEvent]:
    """Query audit trail with filters."""
    ...
```
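The handler body is elided above. One way it could assemble the optional filters, sketched in the same `$n` placeholder style; this is not the actual implementation:

```python
# Sketch only: dynamic WHERE clause for the optional filters above.
conditions, params = ["project_id = $1"], [project_id]
for template, value in [
    ("event_type = ${}", event_type),
    ("agent_id = ${}", agent_id),
    ("timestamp >= ${}", start_time),
    ("timestamp <= ${}", end_time),
]:
    if value is not None:
        params.append(value)
        conditions.append(template.format(len(params)))
params.append(limit)

rows = await db.fetch(
    f"""
    SELECT * FROM audit_events
    WHERE {" AND ".join(conditions)}
    ORDER BY timestamp DESC
    LIMIT ${len(params)}
    """,
    *params,
)
```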
## Consequences

### Positive

- Complete audit trail of all agent actions
- Tamper-evident through hash chaining
- Fast queries for recent events
- Cost-effective long-term storage

### Negative

- Storage requirements grow with activity
- Hash chain verification adds complexity

### Mitigation

- Tiered storage with archival
- Batch verification for chain integrity (see the sketch below)
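Batch verification could run as a scheduled task rather than on every write. In this sketch, `load_events_for_day` and `alert_security_team` are hypothetical application helpers:

```python
import asyncio
from datetime import date, timedelta


@celery_app.task
def verify_yesterdays_chain():
    """Re-verify one day's chain segment in a single batch."""
    events = load_events_for_day(date.today() - timedelta(days=1))  # hypothetical
    # Checks hashes and links within the day; the first event's link to
    # the prior day falls outside this slice.
    if not asyncio.run(AuditLogger().verify_chain(events)):
        alert_security_team("Audit chain integrity check failed")   # hypothetical
```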
## Compliance
This decision aligns with:
- NFR-602: Comprehensive audit logging
- Compliance: SOC2, GDPR requirements
This ADR establishes the audit logging architecture for Syndarix.