[SPIKE-011] Audit Logging & Decision Tracking #11

cardosofelipe · 2025-12-29T03:51:02Z

cardosofelipe commented

2025-12-29 03:51:02 +00:00

Objective

Design comprehensive audit logging for all agent actions and decisions.

What to Log

All agent actions (tool calls, file changes, issue updates)
Agent decisions and reasoning
Approval requests and responses
State transitions
Configuration changes

Key Questions

What's the log storage strategy? (database, file, external service)
How do we structure log entries for searchability?
How long do we retain logs?
How do we make logs useful for debugging agent behavior?
How do we protect sensitive data in logs?

Research Areas

Structured logging patterns
Log aggregation options
PII/sensitive data handling
Log retention policies

Expected Deliverables

Audit log schema
Logging middleware/decorators
Log query API
Retention policy
ADR documenting the approach

Acceptance Criteria

All agent actions logged
Logs are searchable by project/agent/time
Sensitive data protected
Retention policy implemented

Labels

spike, architecture, observability

cardosofelipe commented

2025-12-29 12:20:20 +00:00

SPIKE-011: Audit Logging - Research Completed

The spike document has been created at docs/spikes/SPIKE-011-audit-logging.md.

Executive Summary

Recommendation: Implement a structured, OpenTelemetry-compatible audit logging system using:

Structlog for structured JSON logging with contextual enrichment
PostgreSQL + TimescaleDB for hot and warm storage (0-90 days)
S3-compatible object storage (MinIO) for cold archival (90+ days)
Cryptographic hash chaining for immutability verification
OpenTelemetry integration for trace/span correlation

Key Findings

What to Log

The spike defines comprehensive logging for:

Agent Actions: spawned, action.started, action.completed, action.failed, decision, terminated
LLM Interactions: request, response, error, tool_call (with prompt/response capture)
MCP Tool Invocations: invoked, result, error
Human Approvals: requested, granted, rejected, timeout
Git Operations: commit, branch.created, pr.created, pr.merged
Project Lifecycle: created, sprint.started, milestone.completed, checkpoint
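The categories above suggest an event record along the following lines. This is only a sketch: the spike's actual `AuditEvent` is a Pydantic schema that is not reproduced in this comment, so a standard-library dataclass is swapped in here, and every field name other than `event_type` is an assumption chosen to cover the project/agent/time search criteria and trace correlation mentioned in this issue.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit record; field names are illustrative assumptions."""

    event_type: str   # e.g. "action.started", taken from the list above
    project_id: str   # assumed field, supports search by project
    agent_id: str     # assumed field, supports search by agent
    details: dict[str, Any] = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # ISO 8601 UTC timestamp, supports search by time
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    trace_id: Optional[str] = None  # assumed OpenTelemetry correlation fields
    span_id: Optional[str] = None

    def to_json(self) -> str:
        """Serialize with sorted keys so the output is stable across runs."""
        return json.dumps(asdict(self), sort_keys=True)


event = AuditEvent(
    event_type="action.started",
    project_id="project-1",
    agent_id="agent-7",
    details={"tool": "update_issue"},
)
print(event.to_json())
```

Sorting keys on serialization keeps the JSON form deterministic, which matters if the same record is later hashed for tamper evidence.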

Storage Architecture

HOT (0-30 days)     -> PostgreSQL + TimescaleDB (full detail, fast queries)
WARM (30-90 days)   -> TimescaleDB compressed chunks + aggregates
COLD (90+ days)     -> S3/MinIO Parquet archives (7 year retention)
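As an illustration of the tier boundaries above, the age-based routing could be expressed as a small function. The function name and the handling of the exact boundary days (an event exactly 30 days old is treated as warm, exactly 90 days old as cold) are assumptions; the summary does not say which tier owns the boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HOT_WINDOW = timedelta(days=30)   # from the tier table above
WARM_WINDOW = timedelta(days=90)  # from the tier table above


def storage_tier(event_time: datetime, now: Optional[datetime] = None) -> str:
    """Return "hot", "warm", or "cold" based on the event's age.

    Hypothetical helper: boundary days are assigned to the older tier.
    """
    now = now or datetime.now(timezone.utc)
    age = now - event_time
    if age < HOT_WINDOW:
        return "hot"
    if age < WARM_WINDOW:
        return "warm"
    return "cold"


reference = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(storage_tier(reference - timedelta(days=45), now=reference))
```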

Immutability

SHA-256 hash chaining (blockchain-like) for tamper evidence
Each event includes previous_hash and event_hash
Verification API to audit chain integrity
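A minimal sketch of the hash chaining described above, assuming events are held in an in-memory list and hashed over a canonical JSON form. Only `previous_hash` and `event_hash` come from the spike summary; the helper names, the all-zero genesis value, and the canonicalization scheme are assumptions, not the spike's implementation.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed sentinel for the first event in a chain


def compute_event_hash(payload: dict, previous_hash: str) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    data = (previous_hash + canonical).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append_event(chain: list, payload: dict) -> dict:
    """Link a new event to the tail of the chain and return it."""
    previous_hash = chain[-1]["event_hash"] if chain else GENESIS_HASH
    event = {
        "payload": payload,
        "previous_hash": previous_hash,
        "event_hash": compute_event_hash(payload, previous_hash),
    }
    chain.append(event)
    return event


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered event breaks the chain."""
    expected_previous = GENESIS_HASH
    for event in chain:
        if event["previous_hash"] != expected_previous:
            return False
        recomputed = compute_event_hash(event["payload"], event["previous_hash"])
        if event["event_hash"] != recomputed:
            return False
        expected_previous = event["event_hash"]
    return True


chain: list = []
append_event(chain, {"event_type": "spawned", "agent_id": "agent-7"})
append_event(chain, {"event_type": "decision", "agent_id": "agent-7"})
print(verify_chain(chain))
```

Chaining provides tamper evidence rather than tamper prevention: someone with write access could rewrite the entire chain, so a real deployment would likely also anchor the latest hash somewhere outside the database.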

Compliance

SOC2: Tamper-evident logs, access controls, documented retention
GDPR: PII redaction, pseudonymization in archives, right-to-deletion support
7-year retention for financial/legal compliance
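To make the redaction and pseudonymization points concrete, here is one possible shape for a pre-write scrubbing step. It is a sketch under stated assumptions: the sensitive key names, the email pattern, the `user_` prefix, and the keyed-HMAC pseudonym are all illustrative choices, not taken from the spike document.

```python
import hashlib
import hmac
import re

# Assumed list of keys whose values are always masked.
SENSITIVE_KEYS = {"password", "api_key", "token", "secret"}
# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(value: str, key: bytes) -> str:
    """Stable keyed pseudonym: same input and key give the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def redact(value, key: bytes):
    """Recursively mask sensitive keys and pseudonymize emails in strings."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else redact(v, key)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(item, key) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub(
            lambda m: "user_" + pseudonymize(m.group(0), key), value
        )
    return value


scrubbed = redact(
    {"api_key": "abc123", "note": "contact alice@example.com"},
    key=b"demo-key",
)
print(scrubbed)
```

A keyed pseudonym keeps events from the same person correlatable in archives, and discarding the key afterwards is one way a deletion request could be approximated without rewriting archived files. Whether that satisfies a given GDPR obligation is a legal question this sketch does not answer.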

Implementation Phases

Week 1-2: Foundation (TimescaleDB, base schema, agent action decorator)
Week 3-4: LLM & MCP logging with OpenTelemetry integration
Week 5-6: Immutability, compliance, cold storage archival
Week 7-8: Query APIs, full-text search, audit dashboard

Code Examples Included

AuditEvent Pydantic schema with all required fields
TimescaleDB table schema with hypertables and compression
@audit_agent_action, @audit_llm_call, @audit_mcp_tool decorators
Hash chaining implementation for immutability
OpenTelemetry integration for trace correlation
Common query patterns (timeline, search, LLM usage summary)
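The decorators listed above are not reproduced in this comment. As a rough idea of what `@audit_agent_action` might look like, the sketch below emits the `action.started`, `action.completed`, and `action.failed` event types named earlier. The `action_name` parameter, the in-memory `AUDIT_SINK`, and the recorded fields are assumptions; the spike's version presumably writes through Structlog to TimescaleDB instead.

```python
import functools
from datetime import datetime, timezone

AUDIT_SINK: list = []  # hypothetical in-memory sink standing in for real storage


def _emit(event_type: str, **fields) -> None:
    """Append one audit record with a UTC timestamp to the sink."""
    AUDIT_SINK.append(
        {
            "event_type": event_type,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            **fields,
        }
    )


def audit_agent_action(action_name: str):
    """Wrap a function so its start, success, and failure are recorded."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            _emit("action.started", action=action_name)
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # Record only the exception type to avoid leaking message text.
                _emit("action.failed", action=action_name, error=type(exc).__name__)
                raise
            _emit("action.completed", action=action_name)
            return result

        return wrapper

    return decorator


@audit_agent_action("update_issue")
def update_issue(issue_id: int) -> str:
    return f"issue {issue_id} updated"


update_issue(11)
print([record["event_type"] for record in AUDIT_SINK])
```

The wrapper re-raises after logging, so auditing never changes the decorated function's behavior. An async agent runtime would need an `async def` variant of the wrapper.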

Next Steps

Review spike findings with team
Create ADR-007: Audit Logging Architecture
Begin Phase 1 implementation

Spike document: docs/spikes/SPIKE-011-audit-logging.md

cardosofelipe referenced this issue from a commit

2025-12-29 12:31:10 +00:00

docs: add architecture spikes and deep analysis documentation

cardosofelipe closed this issue

2025-12-29 12:31:47 +00:00

Reference: cardosofelipe/syndarix#11