docs: add remaining ADRs and comprehensive architecture documentation
Added 7 new Architecture Decision Records completing the full set:
- ADR-008: Knowledge Base and RAG (pgvector)
- ADR-009: Agent Communication Protocol (structured messages)
- ADR-010: Workflow State Machine (transitions + PostgreSQL)
- ADR-011: Issue Synchronization (webhook-first + polling)
- ADR-012: Cost Tracking (LiteLLM callbacks + Redis budgets)
- ADR-013: Audit Logging (hash chaining + tiered storage)
- ADR-014: Client Approval Flow (checkpoint-based)

Added comprehensive ARCHITECTURE.md that:
- Summarizes all 14 ADRs in decision matrix
- Documents full system architecture with diagrams
- Explains all component interactions
- Details technology stack with self-hostability guarantee
- Covers security, scalability, and deployment

Updated IMPLEMENTATION_ROADMAP.md to mark Phase 0 completed items.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
228
docs/adrs/ADR-013-audit-logging.md
Normal file
@@ -0,0 +1,228 @@
# ADR-013: Audit Logging Architecture

**Status:** Accepted
**Date:** 2025-12-29
**Deciders:** Architecture Team
**Related Spikes:** SPIKE-011

---

## Context

As an autonomous AI-powered system, Syndarix requires comprehensive audit logging for:
- Compliance (SOC2, GDPR)
- Debugging agent behavior
- Client trust and transparency
- Security investigation

Every action taken by agents must be traceable and tamper-evident.

## Decision Drivers

- **Completeness:** Log all significant events
- **Immutability:** Tamper-evident audit trail
- **Queryability:** Fast search and filtering
- **Scalability:** Handle high event volumes
- **Retention:** Configurable retention policies

## Decision

**Implement structured audit logging** using:
- **Structlog** for JSON event formatting
- **PostgreSQL** for hot storage (0-90 days)
- **S3-compatible storage** for cold archival
- **Cryptographic hash chaining** for immutability

## Implementation

### Event Categories

| Category | Event Types |
|----------|-------------|
| **Agent** | spawned, action_started, action_completed, decision, terminated |
| **LLM** | request, response, error, tool_call |
| **MCP** | tool_invoked, tool_result, tool_error |
| **Approval** | requested, granted, rejected, timeout |
| **Git** | commit, branch_created, pr_created, pr_merged |
| **Project** | created, sprint_started, milestone_completed |

### Event Schema

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel


class AuditEvent(BaseModel):
    # Identity
    event_id: str                       # UUID v7 (time-ordered)
    trace_id: str | None = None         # OpenTelemetry correlation
    parent_event_id: str | None = None  # Event chain

    # Timestamp
    timestamp: datetime
    timestamp_unix_ms: int

    # Classification
    event_type: str      # e.g., "agent.action.completed"
    event_category: str  # e.g., "agent"
    severity: Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

    # Context
    project_id: str | None = None
    agent_id: str | None = None
    user_id: str | None = None

    # Content
    action: str  # Human-readable description
    data: dict   # Event-specific payload
    before_state: dict | None = None
    after_state: dict | None = None

    # Immutability (both fields are filled in by AuditLogger.log,
    # so they need defaults at construction time)
    previous_hash: str | None = None  # Hash of previous event
    event_hash: str = ""              # SHA-256 of this event
```

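
The schema comments call for a time-ordered UUID v7 as `event_id`. Where the standard library in use does not provide a UUID v7 generator, one can be assembled by hand from the RFC 9562 bit layout. The following is a minimal sketch under that assumption; the function name `uuid7` and its `unix_ms` parameter are illustrative and not part of this ADR.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID v7: 48-bit Unix ms timestamp, then version, variant, and random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: rand_a
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand_b                               # bits 61..0: rand_b
    return uuid.UUID(int=value)


event_id = str(uuid7())
```

Because the timestamp occupies the most significant bits, the 36-character string form sorts by creation time at millisecond granularity, which fits the `VARCHAR(36)` primary key used in the database schema. Events created within the same millisecond are ordered randomly in this sketch.
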
### Hash Chain Implementation

```python
import hashlib
import json


class AuditLogger:
    def __init__(self):
        # Head of the chain. This is process-local state: it is lost on
        # restart and is not shared between workers.
        self._last_hash: str | None = None

    async def log(self, event: AuditEvent) -> None:
        # Set hash chain
        event.previous_hash = self._last_hash
        event.event_hash = self._compute_hash(event)
        self._last_hash = event.event_hash

        # Persist (_store is not shown in this ADR)
        await self._store(event)

    def _compute_hash(self, event: AuditEvent) -> str:
        # Only the fields listed here are covered by the hash.
        payload = json.dumps({
            "event_id": event.event_id,
            "timestamp_unix_ms": event.timestamp_unix_ms,
            "event_type": event.event_type,
            "data": event.data,
            "previous_hash": event.previous_hash
        }, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    async def verify_chain(self, events: list[AuditEvent]) -> bool:
        """Verify audit trail integrity. Events must be passed in chain order."""
        for i, event in enumerate(events):
            # The stored hash must match a fresh hash of the event's fields
            expected_hash = self._compute_hash(event)
            if expected_hash != event.event_hash:
                return False
            # Each event must point at the hash of the event before it
            if i > 0 and event.previous_hash != events[i-1].event_hash:
                return False
        return True
```

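
To illustrate why the chain is tamper-evident, the sketch below reproduces the same hashing and verification logic on plain dictionaries, so it runs without Pydantic or a database. The helper names (`compute_hash`, `append`, `verify`) and the sample events are assumptions made for the example, not part of the ADR's code.

```python
import hashlib
import json

# The same five fields that AuditLogger._compute_hash feeds into SHA-256
HASHED_FIELDS = ("event_id", "timestamp_unix_ms", "event_type", "data", "previous_hash")


def compute_hash(event: dict) -> str:
    payload = json.dumps({k: event[k] for k in HASHED_FIELDS}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append(chain: list, event: dict) -> None:
    """Link the event to the current chain head, then store it."""
    event["previous_hash"] = chain[-1]["event_hash"] if chain else None
    event["event_hash"] = compute_hash(event)
    chain.append(event)


def verify(chain: list) -> bool:
    for i, event in enumerate(chain):
        if compute_hash(event) != event["event_hash"]:
            return False
        if i > 0 and event["previous_hash"] != chain[i - 1]["event_hash"]:
            return False
    return True


chain = []
for n in range(3):
    append(chain, {
        "event_id": f"evt-{n}",
        "timestamp_unix_ms": 1_700_000_000_000 + n,
        "event_type": "agent.decision",
        "data": {"step": n},
    })

intact = verify(chain)         # chain as written
chain[1]["data"]["step"] = 99  # simulate an after-the-fact edit
tampered_ok = verify(chain)    # the stored hash of evt-1 no longer matches
```

Note that only the five hashed fields are protected. With the hash payload as written in this ADR, a change to `action`, `before_state`, or `after_state` would not be detected unless those fields are added to the payload.
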
### Database Schema

```sql
CREATE TABLE audit_events (
    event_id VARCHAR(36) PRIMARY KEY,
    trace_id VARCHAR(36),
    parent_event_id VARCHAR(36),

    timestamp TIMESTAMPTZ NOT NULL,
    timestamp_unix_ms BIGINT NOT NULL,

    event_type VARCHAR(100) NOT NULL,
    event_category VARCHAR(50) NOT NULL,
    severity VARCHAR(20) NOT NULL,

    project_id UUID,
    agent_id UUID,
    user_id UUID,

    action TEXT NOT NULL,
    data JSONB NOT NULL,
    before_state JSONB,
    after_state JSONB,

    previous_hash VARCHAR(64),
    event_hash VARCHAR(64) NOT NULL
);

-- Indexes for common queries
CREATE INDEX idx_audit_timestamp ON audit_events (timestamp DESC);
CREATE INDEX idx_audit_project ON audit_events (project_id, timestamp DESC);
CREATE INDEX idx_audit_agent ON audit_events (agent_id, timestamp DESC);
CREATE INDEX idx_audit_type ON audit_events (event_type, timestamp DESC);
```

### Storage Tiers

| Tier | Storage | Retention | Query Speed |
|------|---------|-----------|-------------|
| Hot | PostgreSQL | 0-90 days | Fast |
| Cold | S3/MinIO | 90+ days | Slow |

### Archival Process

```python
import gzip
from datetime import datetime, timedelta, timezone


@celery_app.task
def archive_old_events():
    """Move events older than 90 days to cold storage."""
    # Timezone-aware cutoff, to compare cleanly with the TIMESTAMPTZ column
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)

    # Export to S3 in daily batches
    # (celery_app, db, s3, and group_by_date are application objects
    # that are not defined in this ADR)
    events = db.query("""
        SELECT * FROM audit_events
        WHERE timestamp < $1
        ORDER BY timestamp
    """, cutoff)

    for date, batch in group_by_date(events):
        s3.put_object(
            Bucket="syndarix-audit",
            Key=f"audit/{date.isoformat()}.jsonl.gz",
            # gzip.compress requires bytes, so to_jsonl() must return bytes
            Body=gzip.compress(batch.to_jsonl())
        )

    # Delete from PostgreSQL only after every batch has been uploaded
    db.execute("DELETE FROM audit_events WHERE timestamp < $1", cutoff)
```

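
The archival task relies on a `group_by_date` helper and a JSONL serializer that this ADR does not define. One possible shape for them, using only the standard library, is sketched below. It assumes events arrive as dictionaries with a timezone-aware `timestamp` and are already sorted by it, as the `ORDER BY timestamp` clause would ensure. The `to_jsonl_gz` name is hypothetical.

```python
import gzip
import json
from datetime import datetime, timezone
from itertools import groupby


def group_by_date(events):
    """Yield (date, [events]) pairs; input must already be sorted by timestamp."""
    for day, batch in groupby(events, key=lambda e: e["timestamp"].date()):
        yield day, list(batch)


def to_jsonl_gz(batch) -> bytes:
    """Serialize one event per line, then gzip the result."""
    # default=str turns datetimes (and UUIDs) into strings
    lines = (json.dumps(e, default=str, sort_keys=True) for e in batch)
    return gzip.compress("\n".join(lines).encode("utf-8"))


events = [
    {"event_id": "a", "timestamp": datetime(2025, 1, 1, 23, 59, tzinfo=timezone.utc)},
    {"event_id": "b", "timestamp": datetime(2025, 1, 2, 0, 1, tzinfo=timezone.utc)},
    {"event_id": "c", "timestamp": datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)},
]

# Object keys follow the same pattern as the archival task
archives = {
    f"audit/{day.isoformat()}.jsonl.gz": to_jsonl_gz(batch)
    for day, batch in group_by_date(events)
}
```

Each value in `archives` is a bytes object that could be passed as `Body` to `s3.put_object`. Grouping by the UTC date keeps every object key aligned with a single calendar day.
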
### Audit Viewer API

```python
@router.get("/projects/{project_id}/audit")
async def get_audit_trail(
    project_id: str,
    event_type: str | None = None,
    agent_id: str | None = None,
    start_time: datetime | None = None,
    end_time: datetime | None = None,
    limit: int = 100
) -> list[AuditEvent]:
    """Query audit trail with filters."""
    ...
```

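
The endpoint body is elided in this ADR. As one way the filters could translate into SQL against `audit_events`, the sketch below builds a parameterized query from the optional arguments. The function name `build_audit_query`, the `$n` placeholder style (matching the archival query), and the cap of 1000 rows are assumptions made for illustration.

```python
from datetime import datetime
from typing import Any, Optional


def build_audit_query(
    project_id: str,
    event_type: Optional[str] = None,
    agent_id: Optional[str] = None,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    limit: int = 100,
) -> tuple:
    """Return (sql, params) with one numbered placeholder per supplied filter."""
    clauses = ["project_id = $1"]
    params: list = [project_id]

    def add(condition: str, value: Any) -> None:
        params.append(value)
        clauses.append(condition.format(n=len(params)))

    if event_type is not None:
        add("event_type = ${n}", event_type)
    if agent_id is not None:
        add("agent_id = ${n}", agent_id)
    if start_time is not None:
        add("timestamp >= ${n}", start_time)
    if end_time is not None:
        add("timestamp < ${n}", end_time)

    # Clamp the page size (the upper bound of 1000 is an assumed value)
    params.append(max(1, min(limit, 1000)))
    sql = (
        "SELECT * FROM audit_events WHERE "
        + " AND ".join(clauses)
        + f" ORDER BY timestamp DESC LIMIT ${len(params)}"
    )
    return sql, params


sql, params = build_audit_query("p1", event_type="llm.error", limit=5000)
```

Filtering on `project_id` first and ordering by `timestamp DESC` lines up with the `idx_audit_project` index from the database schema. Values are always passed as parameters and never interpolated into the SQL string.
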
## Consequences

### Positive
- Complete audit trail of all agent actions
- Tamper-evident through hash chaining
- Fast queries for recent events
- Cost-effective long-term storage

### Negative
- Storage requirements grow with activity
- Hash chain verification adds complexity

### Mitigation
- Tiered storage with archival
- Batch verification for chain integrity

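
Batch verification is not spelled out in this ADR. One plausible reading is to verify the chain one batch at a time while carrying the last verified hash across batch boundaries, so that no single call has to load the full history. The sketch below follows that reading on plain dictionaries. `verify_in_batches`, `anchor_hash`, and `build_chain` are hypothetical names. Unlike `verify_chain` above, it also checks the first event's `previous_hash` against the anchor.

```python
import hashlib
import json
from typing import Iterable, Optional


def compute_hash(event: dict) -> str:
    """Hash the same fields as AuditLogger._compute_hash."""
    fields = ("event_id", "timestamp_unix_ms", "event_type", "data", "previous_hash")
    payload = json.dumps({k: event[k] for k in fields}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def verify_in_batches(batches: Iterable[list], anchor_hash: Optional[str] = None) -> bool:
    """Verify ordered batches; anchor_hash is the hash preceding the first event."""
    expected_previous = anchor_hash
    for batch in batches:
        for event in batch:
            if compute_hash(event) != event["event_hash"]:
                return False
            if event["previous_hash"] != expected_previous:
                return False
            expected_previous = event["event_hash"]
    return True


def build_chain(n: int) -> list:
    """Create n linked sample events for the demonstration."""
    chain, prev = [], None
    for i in range(n):
        event = {
            "event_id": f"evt-{i}",
            "timestamp_unix_ms": 1_700_000_000_000 + i,
            "event_type": "mcp.tool_invoked",
            "data": {"i": i},
            "previous_hash": prev,
        }
        event["event_hash"] = compute_hash(event)
        prev = event["event_hash"]
        chain.append(event)
    return chain


chain = build_chain(5)
all_ok = verify_in_batches([chain[:2], chain[2:]])
resumed_ok = verify_in_batches([chain[2:]], anchor_hash=chain[1]["event_hash"])
reordered_ok = verify_in_batches([chain[2:], chain[:2]])
```

Passing an `anchor_hash` lets verification resume from the oldest event still in hot storage, using the hash of the last archived event as the starting point.
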
## Compliance

This decision aligns with:
- NFR-602: Comprehensive audit logging
- Compliance: SOC2, GDPR requirements

---

*This ADR establishes the audit logging architecture for Syndarix.*