Felipe Cardoso
|
728edd1453
|
feat(backend): add Phase B safety subsystems (#63)
Implements core control subsystems for the safety framework:
**Action Validation (validation/validator.py):**
- Rule-based validation engine with priority ordering
- Allow/deny/require-approval rule types
- Pattern matching for tools and resources
- Validation result caching with LRU eviction
- Emergency bypass capability with audit
**Permission System (permissions/manager.py):**
- Per-agent permission grants on resources
- Resource pattern matching (wildcards)
- Temporary permissions with expiration
- Permission inheritance hierarchy
- Default deny with configurable defaults
**Cost Control (costs/controller.py):**
- Per-session and per-day budget tracking
- Token and USD cost limits
- Warning alerts at configurable thresholds
- Budget rollover and reset policies
- Real-time usage tracking
**Rate Limiting (limits/limiter.py):**
- Sliding window rate limiter
- Per-action, per-LLM-call, per-file-op limits
- Burst allowance with recovery
- Configurable limits per operation type
**Loop Detection (loops/detector.py):**
- Exact repetition detection (same action+args)
- Semantic repetition (similar actions)
- Oscillation pattern detection (A→B→A→B)
- Per-agent action history tracking
- Loop breaking suggestions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-01-03 11:28:00 +01:00 |
|
Felipe Cardoso
|
498c0a0e94
|
feat(backend): add safety framework foundation (Phase A) (#63)
Core safety framework architecture for autonomous agent guardrails:
**Core Components:**
- SafetyGuardian: Main orchestrator for all safety checks
- AuditLogger: Comprehensive audit logging with hash chain tamper detection
- SafetyConfig: Pydantic-based configuration
- Models: Action requests, validation results, policies, checkpoints
**Exception Hierarchy:**
- SafetyError base with context preservation
- Permission, Budget, RateLimit, Loop errors
- Approval workflow errors (Required, Denied, Timeout)
- Rollback, Sandbox, Emergency exceptions
**Safety Policy System:**
- Autonomy level based policies (FULL_CONTROL, MILESTONE, AUTONOMOUS)
- Cost limits, rate limits, permission patterns
- HITL approval requirements per action type
- Configurable loop detection thresholds
**Directory Structure:**
- validation/, costs/, limits/, loops/ - Control subsystems
- permissions/, rollback/, hitl/ - Access and recovery
- content/, sandbox/, emergency/ - Protection systems
- audit/, policies/ - Logging and configuration
Phase A establishes the architecture. Subsystems to be implemented in Phase B-C.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-01-03 11:22:25 +01:00 |
|