Felipe Cardoso
c8b88dadc3
fix(safety): copy default patterns to avoid test pollution
The ContentFilter was appending references to DEFAULT_PATTERNS objects,
so when tests modified patterns (e.g., disabling them), those changes
persisted across test runs. Use dataclass replace() to create copies.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 12:08:43 +01:00
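The aliasing problem fixed in c8b88dadc3 can be illustrated with a minimal sketch. The `FilterPattern` dataclass and its fields are hypothetical stand-ins, and this `ContentFilter` is a reduced illustration, not the project's actual class; only `DEFAULT_PATTERNS`, `ContentFilter`, and the use of `dataclasses.replace()` come from the commit message.

```python
from dataclasses import dataclass, replace


@dataclass
class FilterPattern:  # hypothetical stand-in for the project's pattern type
    name: str
    regex: str
    enabled: bool = True


# Module-level defaults shared by every filter instance.
DEFAULT_PATTERNS = [
    FilterPattern("email", r"[\w.+-]+@[\w-]+\.[\w.]+"),
    FilterPattern("ssn", r"\b\d{3}-\d{2}-\d{4}\b"),
]


class ContentFilter:  # reduced illustration, not the real implementation
    def __init__(self) -> None:
        # replace() with no overrides returns a new instance with the same
        # field values, so later mutation does not touch DEFAULT_PATTERNS.
        self.patterns = [replace(p) for p in DEFAULT_PATTERNS]


first = ContentFilter()
first.patterns[0].enabled = False  # e.g. a test disabling a pattern

second = ContentFilter()  # still sees the untouched defaults
```

Note that `replace()` makes a shallow copy: it is sufficient here because the illustrative fields are immutable values, but a pattern holding a mutable field would need a deeper copy.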
Felipe Cardoso
015f2de6c6
test(safety): add Phase E comprehensive safety tests
- Add tests for models: ActionMetadata, ActionRequest, ActionResult,
  ValidationRule, BudgetStatus, RateLimitConfig, ApprovalRequest/Response,
  Checkpoint, RollbackResult, AuditEvent, SafetyPolicy, GuardianResult
- Add tests for validation: ActionValidator rules, priorities, patterns,
  bypass mode, batch validation, rule creation helpers
- Add tests for loops: LoopDetector exact/semantic/oscillation detection,
  LoopBreaker throttle/backoff, history management
- Add tests for content filter: PII filtering (email, phone, SSN, credit card),
  secret blocking (API keys, GitHub tokens, private keys), custom patterns,
  scan without filtering, dict filtering
- Add tests for emergency controls: state management, pause/resume/reset,
  scoped emergency stops, callbacks, EmergencyTrigger events
- Fix exception kwargs in content filter and emergency controls to match
  exception class signatures
All 108 tests passing with lint and type checks clean.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:52:35 +01:00
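As a rough idea of what the PII-filtering tests listed in 015f2de6c6 might look like, here is a pytest-style sketch. The `redact` helper, the `PII_PATTERNS` table, and both regular expressions are assumptions made for illustration; the log does not show the real filter's API or patterns.

```python
import re

# Hypothetical patterns; the real filter's expressions are not shown in the log.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each match with a placeholder naming the pattern."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{name.upper()}]", text)
    return text


def test_redacts_email_and_ssn() -> None:
    out = redact("Mail bob@example.com, SSN 123-45-6789.")
    assert out == "Mail [EMAIL], SSN [SSN]."


def test_clean_text_is_unchanged() -> None:
    assert redact("nothing sensitive here") == "nothing sensitive here"
```

Pairing each positive case with a "clean input stays unchanged" case helps catch patterns that are too greedy.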
Felipe Cardoso
ef659cd72d
feat(safety): add Phase C advanced controls
- Add rollback manager with file checkpointing and transaction context
- Add HITL manager with approval queues and notification handlers
- Add content filter with PII, secrets, and injection detection
- Add emergency controls with stop/pause/resume capabilities
- Update SafetyConfig with checkpoint_dir setting
Issue #63
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:36:24 +01:00
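The "file checkpointing and transaction context" idea from ef659cd72d can be sketched as a context manager that snapshots a file and restores it on failure. The `file_transaction` function and the `.ckpt` naming are hypothetical; only the concept and the `checkpoint_dir` setting name come from the commit message, and the actual rollback manager may work quite differently.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def file_transaction(target: Path, checkpoint_dir: Path) -> Iterator[Path]:
    """Snapshot `target` before the block runs; restore it if the block raises."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    backup = checkpoint_dir / (target.name + ".ckpt")
    shutil.copy2(target, backup)  # checkpoint: copy contents and metadata
    try:
        yield target
    except Exception:
        shutil.copy2(backup, target)  # roll back to the checkpointed state
        raise
    finally:
        backup.unlink(missing_ok=True)  # checkpoint is no longer needed


# Usage: a failing action leaves the file as it was before the transaction.
workdir = Path(tempfile.mkdtemp())
config = workdir / "config.txt"
config.write_text("original")

try:
    with file_transaction(config, workdir / "checkpoints") as path:
        path.write_text("half-written change")
        raise RuntimeError("action failed")
except RuntimeError:
    pass

restored = config.read_text()
```

This sketch covers a single existing file; a fuller rollback manager would also need to handle newly created or deleted files and multi-file transactions.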
Felipe Cardoso
498c0a0e94
feat(backend): add safety framework foundation (Phase A) (#63)
Core safety framework architecture for autonomous agent guardrails:
**Core Components:**
- SafetyGuardian: Main orchestrator for all safety checks
- AuditLogger: Comprehensive audit logging with hash chain tamper detection
- SafetyConfig: Pydantic-based configuration
- Models: Action requests, validation results, policies, checkpoints
**Exception Hierarchy:**
- SafetyError base with context preservation
- Permission, Budget, RateLimit, Loop errors
- Approval workflow errors (Required, Denied, Timeout)
- Rollback, Sandbox, Emergency exceptions
**Safety Policy System:**
- Autonomy level based policies (FULL_CONTROL, MILESTONE, AUTONOMOUS)
- Cost limits, rate limits, permission patterns
- HITL approval requirements per action type
- Configurable loop detection thresholds
**Directory Structure:**
- validation/, costs/, limits/, loops/ - Control subsystems
- permissions/, rollback/, hitl/ - Access and recovery
- content/, sandbox/, emergency/ - Protection systems
- audit/, policies/ - Logging and configuration
Phase A establishes the architecture. Subsystems are to be implemented in Phases B and C.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:22:25 +01:00
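The "hash chain tamper detection" mentioned for AuditLogger in 498c0a0e94 generally means each log entry's hash covers the previous entry's hash, so editing any earlier entry invalidates the chain from that point on. The following is a minimal sketch of that general technique; the `AuditLog` class, the `GENESIS` value, and the choice of SHA-256 over JSON are assumptions, not the project's implementation.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # hypothetical starting value for the chain


def _digest(prev_hash: str, event: dict) -> str:
    # Hash the previous entry's hash together with a canonical event encoding.
    payload = json.dumps(event, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)  # (event, hash) pairs

    def append(self, event: dict) -> None:
        prev = self.entries[-1][1] if self.entries else GENESIS
        self.entries.append((event, _digest(prev, event)))

    def verify(self) -> bool:
        # Recompute every hash from the start; any mismatch means tampering.
        prev = GENESIS
        for event, stored in self.entries:
            if _digest(prev, event) != stored:
                return False
            prev = stored
        return True


log = AuditLog()
log.append({"action": "read_file", "agent": "a1"})
log.append({"action": "write_file", "agent": "a1"})
intact = log.verify()

log.entries[0][0]["action"] = "delete_file"  # tamper with the first event
tampered_detected = not log.verify()
```

A chain like this detects modification of stored events but not truncation of the tail, so real deployments typically also anchor the latest hash somewhere outside the log.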