Commit Graph

161 Commits

Author SHA1 Message Date
Felipe Cardoso
ad0c06851d feat(tests): add comprehensive E2E tests for MCP and Agent workflows
- Introduced end-to-end tests for MCP workflows, including server discovery, authentication, context engine operations, error handling, and input validation.
- Added full lifecycle tests for agent workflows, covering type management, instance spawning, status transitions, and admin-only operations.
- Enhanced test coverage for real-world MCP and Agent scenarios across PostgreSQL and async environments.
2026-01-05 01:02:41 +01:00
Felipe Cardoso
49359b1416 feat(api): add Context Management API and routes
- Introduced a new `context` module and its endpoints for Context Management.
- Added `/context` route to the API router for assembling LLM context, token counting, budget management, and cache invalidation.
- Implemented health checks, context assembly, token counting, and caching operations in the Context Management Engine.
- Included schemas for request/response models and tightened error handling for context-related operations.
2026-01-05 01:02:33 +01:00
Felipe Cardoso
911d950c15 feat(tests): add comprehensive integration tests for MCP stack
- Introduced integration tests covering backend, LLM Gateway, Knowledge Base, and Context Engine.
- Includes health checks, tool listing, token counting, and end-to-end MCP flows.
- Added `RUN_INTEGRATION_TESTS` environment flag to enable selective test execution.
- Includes a quick health check script to verify service availability before running tests.
2026-01-05 01:02:22 +01:00
Felipe Cardoso
b2a3ac60e0 feat: add integration testing target to Makefile
- Introduced `test-integration` command for MCP integration tests.
- Expanded help section with details about running integration tests.
- Improved Makefile's testing capabilities for enhanced developer workflows.
2026-01-05 01:02:16 +01:00
Felipe Cardoso
60ebeaa582 test(safety): add comprehensive tests for safety framework modules
Add tests to improve backend coverage from 85% to 93%:

- test_audit.py: 60 tests for AuditLogger (20% -> 99%)
  - Hash chain integrity, sanitization, retention, handlers
  - Fixed bug: hash chain modification after event creation
  - Fixed bug: verification not using correct prev_hash

- test_hitl.py: Tests for HITL manager (0% -> 100%)
- test_permissions.py: Tests for permissions manager (0% -> 99%)
- test_rollback.py: Tests for rollback manager (0% -> 100%)
- test_metrics.py: Tests for metrics collector (0% -> 100%)
- test_mcp_integration.py: Tests for MCP safety wrapper (0% -> 100%)
- test_validation.py: Additional cache and edge case tests (76% -> 100%)
- test_scoring.py: Lock cleanup and edge case tests (78% -> 91%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 19:41:54 +01:00
Felipe Cardoso
758052dcff feat(context): improve budget validation and XML safety in ranking and Claude adapter
- Added stricter budget validation in ContextRanker with explicit error handling for invalid configurations.
- Introduced `_get_valid_token_count()` helper to validate and safeguard token counts.
- Enhanced XML escaping in Claude adapter to prevent injection risks from scores and unhandled content.
2026-01-04 16:02:18 +01:00
Felipe Cardoso
1628eacf2b feat(context): enhance timeout handling, tenant isolation, and budget management
- Added timeout enforcement for token counting, scoring, and compression with detailed error handling.
- Introduced tenant isolation in context caching using project and agent identifiers.
- Enhanced budget management with stricter checks for critical context overspending and buffer limitations.
- Optimized per-context locking with cleanup to prevent memory leaks in concurrent environments.
- Updated default assembly timeout settings for improved performance and reliability.
- Improved XML escaping in Claude adapter for safety against injection attacks.
- Standardized token estimation using model-specific ratios.
2026-01-04 15:52:50 +01:00
Felipe Cardoso
2bea057fb1 chore(context): refactor for consistency, optimize formatting, and simplify logic
- Cleaned up unnecessary comments in `__all__` definitions for better readability.
- Adjusted indentation and formatting across modules for improved clarity (e.g., long lines, logical grouping).
- Simplified conditional expressions and inline comments for context scoring and ranking.
- Replaced some hard-coded values with type-safe annotations (e.g., `ClassVar`).
- Removed unused imports and ensured consistent usage across test files.
- Updated `test_score_not_cached_on_context` to clarify caching behavior.
- Improved truncation strategy logic and marker handling.
2026-01-04 15:23:14 +01:00
Felipe Cardoso
9e54f16e56 test(context): add edge case tests for truncation and scoring concurrency
- Add tests for truncation edge cases, including zero tokens, short content, and marker handling.
- Add concurrency tests for scoring to verify per-context locking and handling of multiple contexts.
2026-01-04 12:38:04 +01:00
Felipe Cardoso
96e6400bd8 feat(context): enhance performance, caching, and settings management
- Replace hard-coded limits with configurable settings (e.g., cache memory size, truncation strategy, relevance settings).
- Optimize parallel execution in token counting, scoring, and reranking for source diversity.
- Improve caching logic:
  - Add per-context locks for safe parallel scoring.
  - Reuse precomputed fingerprints for cache efficiency.
- Make truncation, scoring, and ranker behaviors fully configurable via settings.
- Add support for middle truncation, context hash-based hashing, and dynamic token limiting.
- Refactor methods for scalability and better error handling.

Tests: Updated all affected components with additional test cases.
2026-01-04 12:37:58 +01:00
Felipe Cardoso
6c7b72f130 chore(context): apply linter fixes and sort imports (#86)
Phase 8 of Context Management Engine - Final Cleanup:

- Sort __all__ exports alphabetically
- Sort imports per isort conventions
- Fix minor linting issues

Final test results:
- 311 context management tests passing
- 2507 total backend tests passing
- 85% code coverage

Context Management Engine is complete with all 8 phases:
1. Foundation: Types, Config, Exceptions
2. Token Budget Management
3. Context Scoring & Ranking
4. Context Assembly Pipeline
5. Model Adapters (Claude, OpenAI)
6. Caching Layer (Redis + in-memory)
7. Main Engine & Integration
8. Testing & Documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:46:56 +01:00
Felipe Cardoso
027ebfc332 feat(context): implement main ContextEngine with full integration (#85)
Phase 7 of Context Management Engine - Main Engine:

- Add ContextEngine as main orchestration class
- Integrate all components: calculator, scorer, ranker, compressor, cache
- Add high-level assemble_context() API with:
  - System prompt support
  - Task description support
  - Knowledge Base integration via MCP
  - Conversation history conversion
  - Tool results conversion
  - Custom contexts support
- Add helper methods:
  - get_budget_for_model()
  - count_tokens() with caching
  - invalidate_cache()
  - get_stats()
- Add create_context_engine() factory function

Tests: 26 new tests, 311 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:44:40 +01:00
Felipe Cardoso
c2466ab401 feat(context): implement Redis-based caching layer (#84)
Phase 6 of Context Management Engine - Caching Layer:

- Add ContextCache with Redis integration
- Support fingerprint-based assembled context caching
- Support token count caching (model-specific)
- Support score caching (scorer + context + query)
- Add in-memory fallback with LRU eviction
- Add cache invalidation with pattern matching
- Add cache statistics reporting

Key features:
- Hierarchical cache key structure (ctx:type:hash)
- Automatic TTL expiration
- Memory cache for fast repeated access
- Graceful degradation when Redis unavailable

Tests: 29 new tests, 285 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:41:21 +01:00
Felipe Cardoso
7828d35e06 feat(context): implement model adapters for Claude and OpenAI (#83)
Phase 5 of Context Management Engine - Model Adapters:

- Add ModelAdapter abstract base class with model matching
- Add DefaultAdapter for unknown models (plain text)
- Add ClaudeAdapter with XML-based formatting:
  - <system_instructions> for system context
  - <reference_documents>/<document> for knowledge
  - <conversation_history>/<message> for chat
  - <tool_results>/<tool_result> for tool outputs
  - XML escaping for special characters
- Add OpenAIAdapter with markdown formatting:
  - ## headers for sections
  - ### Source headers for documents
  - **ROLE** bold labels for conversation
  - Code blocks for tool outputs
- Add get_adapter() factory function for model selection

Tests: 33 new tests, 256 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:36:32 +01:00
Felipe Cardoso
6b07e62f00 feat(context): implement assembly pipeline and compression (#82)
Phase 4 of Context Management Engine - Assembly Pipeline:

- Add TruncationStrategy with end/middle/sentence-aware truncation
- Add TruncationResult dataclass for tracking compression metrics
- Add ContextCompressor for type-specific compression
- Add ContextPipeline orchestrating full assembly workflow:
  - Token counting for all contexts
  - Scoring and ranking via ContextRanker
  - Optional compression when budget threshold exceeded
  - Model-specific formatting (XML for Claude, markdown for OpenAI)
- Add PipelineMetrics for performance tracking
- Update AssembledContext with new fields (model, contexts, metadata)
- Add backward compatibility aliases for renamed fields

Tests: 34 new tests, 223 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:32:25 +01:00
Felipe Cardoso
0d2005ddcb feat(context): implement context scoring and ranking (Phase 3)
Add comprehensive scoring system with three strategies:
- RelevanceScorer: Semantic similarity with keyword fallback
- RecencyScorer: Exponential decay with type-specific half-lives
- PriorityScorer: Priority-based scoring with type bonuses

Implement CompositeScorer combining all strategies with configurable
weights (default: 50% relevance, 30% recency, 20% priority).

Add ContextRanker for budget-aware context selection with:
- Greedy selection algorithm respecting token budgets
- CRITICAL priority contexts always included
- Diversity reranking to prevent source dominance
- Comprehensive selection statistics

68 tests covering all scoring and ranking functionality.

Part of #61 - Context Management Engine

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:24:06 +01:00
Felipe Cardoso
dfa75e682e feat(context): implement token budget management (Phase 2)
Add TokenCalculator with LLM Gateway integration for accurate token
counting with in-memory caching and fallback character-based estimation.
Implement TokenBudget for tracking allocations per context type with
budget enforcement, and BudgetAllocator for creating budgets based on
model context window sizes.

- TokenCalculator: MCP integration, caching, model-specific ratios
- TokenBudget: allocation tracking, can_fit/allocate/deallocate/reset
- BudgetAllocator: model context sizes, budget creation and adjustment
- 35 comprehensive tests covering all budget functionality

Part of #61 - Context Management Engine

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:13:23 +01:00
Felipe Cardoso
22ecb5e989 feat(context): Phase 1 - Foundation types, config and exceptions (#79)
Implements the foundation for Context Management Engine:

Types (backend/app/services/context/types/):
- BaseContext: Abstract base with ID, content, priority, scoring
- SystemContext: System prompts, personas, instructions
- KnowledgeContext: RAG results from Knowledge Base MCP
- ConversationContext: Chat history with role support
- TaskContext: Task/issue context with acceptance criteria
- ToolContext: Tool definitions and execution results
- AssembledContext: Final assembled context result

Configuration (config.py):
- Token budget allocation (system 5%, task 10%, knowledge 40%, etc.)
- Scoring weights (relevance 50%, recency 30%, priority 20%)
- Cache settings (TTL, prefix)
- Performance settings (max assembly time, parallel scoring)
- Environment variable overrides with CTX_ prefix

Exceptions (exceptions.py):
- ContextError: Base exception
- BudgetExceededError: Token budget violations
- TokenCountError: Token counting failures
- CompressionError: Compression failures
- AssemblyTimeoutError: Assembly timeout
- ScoringError, FormattingError, CacheError
- ContextNotFoundError, InvalidContextError

All 86 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:07:39 +01:00
Felipe Cardoso
ca5f5e3383 refactor(environment): update virtualenv path to /opt/venv in Docker setup
- Adjusted `docker-compose.dev.yml` to reflect the new venv location.
- Modified entrypoint script and Dockerfile to reference `/opt/venv` for isolated dependencies.
- Improved bind mount setup to prevent venv overwrites during development.
2026-01-04 00:58:24 +01:00
Felipe Cardoso
caf283bed2 feat(safety): enhance rate limiting and cost control with alert deduplication and usage tracking
- Added `record_action` in `RateLimiter` for precise tracking of slot consumption post-validation.
- Introduced deduplication mechanism for warning alerts in `CostController` to prevent spamming.
- Refactored `CostController`'s session and daily budget alert handling for improved clarity.
- Implemented test suites for `CostController` and `SafetyGuardian` to validate changes.
- Expanded integration testing to cover deduplication, validation, and loop detection edge cases.
2026-01-03 17:55:34 +01:00
Felipe Cardoso
520c06175e refactor(safety): apply consistent formatting across services and tests
Improved code readability and uniformity by standardizing line breaks, indentation, and inline conditions across safety-related services, models, and tests, including content filters, validation rules, and emergency controls.
2026-01-03 16:23:39 +01:00
Felipe Cardoso
065e43c5a9 fix(tests): use delay variables in retry delay test
The delay2 and delay3 variables were calculated but never asserted,
causing lint warnings. Added assertions to verify all delays are
positive and within max bounds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 16:19:54 +01:00
Felipe Cardoso
c8b88dadc3 fix(safety): copy default patterns to avoid test pollution
The ContentFilter was appending references to DEFAULT_PATTERNS objects,
so when tests modified patterns (e.g., disabling them), those changes
persisted across test runs. Use dataclass replace() to create copies.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 12:08:43 +01:00
Felipe Cardoso
015f2de6c6 test(safety): add Phase E comprehensive safety tests
- Add tests for models: ActionMetadata, ActionRequest, ActionResult,
  ValidationRule, BudgetStatus, RateLimitConfig, ApprovalRequest/Response,
  Checkpoint, RollbackResult, AuditEvent, SafetyPolicy, GuardianResult
- Add tests for validation: ActionValidator rules, priorities, patterns,
  bypass mode, batch validation, rule creation helpers
- Add tests for loops: LoopDetector exact/semantic/oscillation detection,
  LoopBreaker throttle/backoff, history management
- Add tests for content filter: PII filtering (email, phone, SSN, credit card),
  secret blocking (API keys, GitHub tokens, private keys), custom patterns,
  scan without filtering, dict filtering
- Add tests for emergency controls: state management, pause/resume/reset,
  scoped emergency stops, callbacks, EmergencyTrigger events
- Fix exception kwargs in content filter and emergency controls to match
  exception class signatures

All 108 tests passing with lint and type checks clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:52:35 +01:00
Felipe Cardoso
f36bfb3781 feat(safety): add Phase D MCP integration and metrics
- Add MCPSafetyWrapper for safe MCP tool execution
- Add MCPToolCall/MCPToolResult models for MCP interactions
- Add SafeToolExecutor context manager
- Add SafetyMetrics collector with Prometheus export support
- Track validations, approvals, rate limits, budgets, and more
- Support for counters, gauges, and histograms

Issue #63

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:40:14 +01:00
Felipe Cardoso
ef659cd72d feat(safety): add Phase C advanced controls
- Add rollback manager with file checkpointing and transaction context
- Add HITL manager with approval queues and notification handlers
- Add content filter with PII, secrets, and injection detection
- Add emergency controls with stop/pause/resume capabilities
- Update SafetyConfig with checkpoint_dir setting

Issue #63

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:36:24 +01:00
Felipe Cardoso
728edd1453 feat(backend): add Phase B safety subsystems (#63)
Implements core control subsystems for the safety framework:

**Action Validation (validation/validator.py):**
- Rule-based validation engine with priority ordering
- Allow/deny/require-approval rule types
- Pattern matching for tools and resources
- Validation result caching with LRU eviction
- Emergency bypass capability with audit

**Permission System (permissions/manager.py):**
- Per-agent permission grants on resources
- Resource pattern matching (wildcards)
- Temporary permissions with expiration
- Permission inheritance hierarchy
- Default deny with configurable defaults

**Cost Control (costs/controller.py):**
- Per-session and per-day budget tracking
- Token and USD cost limits
- Warning alerts at configurable thresholds
- Budget rollover and reset policies
- Real-time usage tracking

**Rate Limiting (limits/limiter.py):**
- Sliding window rate limiter
- Per-action, per-LLM-call, per-file-op limits
- Burst allowance with recovery
- Configurable limits per operation type

**Loop Detection (loops/detector.py):**
- Exact repetition detection (same action+args)
- Semantic repetition (similar actions)
- Oscillation pattern detection (A→B→A→B)
- Per-agent action history tracking
- Loop breaking suggestions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:28:00 +01:00
Felipe Cardoso
498c0a0e94 feat(backend): add safety framework foundation (Phase A) (#63)
Core safety framework architecture for autonomous agent guardrails:

**Core Components:**
- SafetyGuardian: Main orchestrator for all safety checks
- AuditLogger: Comprehensive audit logging with hash chain tamper detection
- SafetyConfig: Pydantic-based configuration
- Models: Action requests, validation results, policies, checkpoints

**Exception Hierarchy:**
- SafetyError base with context preservation
- Permission, Budget, RateLimit, Loop errors
- Approval workflow errors (Required, Denied, Timeout)
- Rollback, Sandbox, Emergency exceptions

**Safety Policy System:**
- Autonomy level based policies (FULL_CONTROL, MILESTONE, AUTONOMOUS)
- Cost limits, rate limits, permission patterns
- HITL approval requirements per action type
- Configurable loop detection thresholds

**Directory Structure:**
- validation/, costs/, limits/, loops/ - Control subsystems
- permissions/, rollback/, hitl/ - Access and recovery
- content/, sandbox/, emergency/ - Protection systems
- audit/, policies/ - Logging and configuration

Phase A establishes the architecture. Subsystems to be implemented in Phase B-C.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:22:25 +01:00
Felipe Cardoso
e5975fa5d0 feat(backend): implement MCP client infrastructure (#55)
Core MCP client implementation with comprehensive tooling:

**Services:**
- MCPClientManager: Main facade for all MCP operations
- MCPServerRegistry: Thread-safe singleton for server configs
- ConnectionPool: Connection pooling with auto-reconnection
- ToolRouter: Automatic tool routing with circuit breaker
- AsyncCircuitBreaker: Custom async-compatible circuit breaker

**Configuration:**
- YAML-based config with Pydantic models
- Environment variable expansion support
- Transport types: HTTP, SSE, STDIO

**API Endpoints:**
- GET /mcp/servers - List all MCP servers
- GET /mcp/servers/{name}/tools - List server tools
- GET /mcp/tools - List all tools from all servers
- GET /mcp/health - Health check all servers
- POST /mcp/call - Execute tool (admin only)
- GET /mcp/circuit-breakers - Circuit breaker status
- POST /mcp/circuit-breakers/{name}/reset - Reset circuit breaker
- POST /mcp/servers/{name}/reconnect - Force reconnection

**Testing:**
- 156 unit tests with comprehensive coverage
- Tests for all services, routes, and error handling
- Proper mocking and async test support

**Documentation:**
- MCP_CLIENT.md with usage examples
- Phase 2+ workflow documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:12:41 +01:00
Felipe Cardoso
664415111a test(backend): add comprehensive tests for OAuth and agent endpoints
- Added tests for OAuth provider admin and consent endpoints covering edge cases.
- Extended agent-related tests to handle incorrect project associations and lifecycle state transitions.
- Introduced tests for sprint status transitions and validation checks.
- Improved multiline formatting consistency across all test functions.
2026-01-03 01:44:11 +01:00
Felipe Cardoso
acd18ff694 chore(backend): standardize multiline formatting across modules
Reformatted multiline function calls, object definitions, and queries for improved code readability and consistency. Adjusted imports and constraints where necessary.
2026-01-03 01:35:18 +01:00
Felipe Cardoso
8b6cca5d4d refactor(backend): simplify ENUM handling in alembic migration script
- Removed explicit ENUM creation statements; rely on `sa.Enum` to auto-generate ENUM types during table creation.
- Cleaned up redundant `create_type=False` arguments to streamline definitions.
2026-01-01 12:34:09 +01:00
Felipe Cardoso
7280b182bd fix(backend): race condition fixes for task completion and sprint operations
## Changes

### agent_instance.py - Task Completion Counter Race Condition
- Changed `record_task_completion()` from read-modify-write pattern to
  atomic SQL UPDATE
- Previously: Read instance → increment in Python memory → write back
- Now: Uses `UPDATE ... SET tasks_completed = tasks_completed + 1`
- Prevents lost updates when multiple concurrent task completions occur

### sprint.py - Row-Level Locking for Sprint Operations
- Added `with_for_update()` to `complete_sprint()` to prevent race
  conditions during velocity calculation
- Added `with_for_update()` to `cancel_sprint()` for consistency
- Ensures atomic check-and-update for sprint status changes

## Impact
These fixes prevent:
- Counter metrics being lost under concurrent load
- Data corruption during sprint completion
- Race conditions with concurrent sprint status changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 17:23:33 +01:00
Felipe Cardoso
06b2491c1f fix(backend): critical bug fixes for agent termination and sprint validation
Bug Fixes:
- bulk_terminate_by_project now unassigns issues before terminating agents
  to prevent orphaned issue assignments
- PATCH /issues/{id} now validates sprint status - cannot assign issues
  to COMPLETED or CANCELLED sprints
- archive_project now performs cascading cleanup:
  - Terminates all active agent instances
  - Cancels all planned/active sprints
  - Unassigns issues from terminated agents

Added edge case tests for all fixed bugs (19 new tests total):
- TestBulkTerminateEdgeCases
- TestSprintStatusValidation
- TestArchiveProjectCleanup
- TestDataIntegrityEdgeCases (IDOR protection)

Coverage: 93% (1836 tests passing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 15:23:21 +01:00
Felipe Cardoso
b8265783f3 fix(agents): prevent issue assignment to terminated agents and cleanup on termination
This commit fixes 4 production bugs found via edge case testing:

1. BUG: System allowed assigning issues to terminated agents
   - Added validation in issue creation endpoint
   - Added validation in issue update endpoint
   - Added validation in issue assign endpoint

2. BUG: Issues remained orphaned when agent was terminated
   - Agent termination now auto-unassigns all issues from that agent

These bugs could lead to issues being assigned to non-functional agents
that would never work on them, causing work to stall silently.

Tests added in tests/api/routes/syndarix/test_edge_cases.py to verify:
- Cannot assign issue to terminated agent (3 variations)
- Issues are auto-unassigned when agent is terminated
- Various other edge cases (sprints, projects, IDOR protection)

Coverage: 88% → 93% (1830 tests passing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:43:08 +01:00
Felipe Cardoso
63066c50ba test(crud): add comprehensive Syndarix CRUD tests for 95% coverage
Added CRUD layer tests for all Syndarix domain modules:
- test_issue.py: 37 tests covering issue CRUD operations
- test_sprint.py: 31 tests covering sprint CRUD operations
- test_agent_instance.py: 28 tests covering agent instance CRUD
- test_agent_type.py: 19 tests covering agent type CRUD
- test_project.py: 20 tests covering project CRUD operations

Each test file covers:
- Successful CRUD operations
- Not found cases
- Exception handling paths (IntegrityError, OperationalError)
- Filter and pagination operations
- PostgreSQL-specific tests marked as skip for SQLite

Coverage improvements:
- issue.py: 65% → 99%
- sprint.py: 74% → 100%
- agent_instance.py: 73% → 100%
- agent_type.py: 71% → 93%
- project.py: 79% → 100%

Total backend coverage: 89% → 92%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:30:05 +01:00
Felipe Cardoso
ddf9b5fe25 test(sprints): add sprint issues and IDOR prevention tests
- Add TestSprintIssues class (5 tests)
  - List sprint issues (empty/with data)
  - Add issue to sprint
  - Add nonexistent issue to sprint

- Add TestSprintCrossProjectValidation class (3 tests)
  - IDOR prevention for get/update/start through wrong project

Coverage: sprints.py 72% → 76%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:04:05 +01:00
Felipe Cardoso
c3b66cccfc test(syndarix): add agent_types and enhance issues API tests
- Add comprehensive test_agent_types.py (36 tests)
  - CRUD operations (create, read, update, deactivate)
  - Authorization (superuser vs regular user)
  - Pagination and filtering
  - Slug lookup functionality
  - Model configuration validation

- Enhance test_issues.py (15 new tests, total 39)
  - Issue assignment/unassignment endpoints
  - Issue sync endpoint
  - Cross-project validation (IDOR prevention)
  - Validation error handling
  - Sprint/agent reference validation

Coverage improvements:
- agent_types.py: 41% → 83%
- issues.py: 55% → 75%
- Overall: 88% → 89%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:00:11 +01:00
Felipe Cardoso
896f0d92e5 test(agents): add comprehensive API route tests
Add 22 tests for agents API covering:
- CRUD operations (spawn, list, get, update, delete)
- Lifecycle management (pause, resume)
- Agent metrics (single and project-level)
- Authorization and access control
- Status filtering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:20:25 +01:00
Felipe Cardoso
2ccaeb23f2 test(issues): add comprehensive API route tests
Add 24 tests for issues API covering:
- CRUD operations (create, list, get, update, delete)
- Status and priority filtering
- Search functionality
- Issue statistics
- Authorization and access control

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:20:17 +01:00
Felipe Cardoso
04c939d4c2 test(sprints): add comprehensive API route tests
Add 28 tests for sprints API covering:
- CRUD operations (create, list, get, update)
- Lifecycle management (start, complete, cancel)
- Sprint velocity endpoint
- Authorization and access control
- Pagination and filtering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:20:09 +01:00
Felipe Cardoso
71c94c3b5a test(projects): add comprehensive API route tests
Add 46 tests for projects API covering:
- CRUD operations (create, list, get, update, archive)
- Lifecycle management (pause, resume)
- Authorization and access control
- Pagination and filtering
- All autonomy levels

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:20:01 +01:00
Felipe Cardoso
d71891ac4e fix(agents): move project metrics endpoint before {agent_id} routes
FastAPI processes routes in order, so /agents/metrics must be defined
before /agents/{agent_id} to prevent "metrics" from being parsed as a UUID.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:19:53 +01:00
Felipe Cardoso
3492941aec fix(issues): route ordering and delete method
- Move stats endpoint before {issue_id} routes to prevent UUID parsing errors
- Use remove() instead of soft_delete() since Issue model lacks deleted_at column

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:19:45 +01:00
Felipe Cardoso
81e8d7e73d fix(sprints): move velocity endpoint before {sprint_id} routes
FastAPI processes routes in order, so /velocity must be defined
before /{sprint_id} to prevent "velocity" from being parsed as a UUID.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:19:37 +01:00
Felipe Cardoso
82cb6386a6 fix(backend): regenerate Syndarix migration to match models
Completely rewrote migration 0004 to match current model definitions:
- Added issue_type ENUM (epic, story, task, bug)
- Fixed sprint_status ENUM to include in_review
- Fixed all table columns to match models exactly
- Fixed all indexes and constraints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 12:47:30 +01:00
Felipe Cardoso
2d05035c1d fix(backend): add unique constraint for sprint numbers
Add UniqueConstraint to Sprint model to ensure sprint numbers
are unique within a project, matching the migration specification.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 12:47:19 +01:00
Felipe Cardoso
15d747eb28 fix(sse): Fix critical SSE auth and URL issues
1. Fix SSE URL mismatch (CRITICAL):
   - Frontend was connecting to /events instead of /events/stream
   - Updated useProjectEvents.ts to use correct endpoint path

2. Fix SSE token authentication (CRITICAL):
   - EventSource API doesn't support custom headers
   - Added get_current_user_sse dependency that accepts tokens from:
     - Authorization header (preferred, for non-EventSource clients)
     - Query parameter 'token' (fallback for browser EventSource)
   - Updated SSE endpoint to use new auth dependency
   - Both auth methods now work correctly

Files changed:
- backend/app/api/dependencies/auth.py: +80 lines (new SSE auth)
- backend/app/api/routes/events.py: +23 lines (query param support)
- frontend/src/lib/hooks/useProjectEvents.ts: +5 lines (URL fix)

All 20 backend SSE tests pass.
All 17 frontend useProjectEvents tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 11:59:33 +01:00
Felipe Cardoso
cea97afe25 fix: Add missing API endpoints and validation improvements
- Add cancel_sprint and delete_sprint endpoints to sprints.py
- Add unassign_issue endpoint to issues.py
- Add remove_issue_from_sprint endpoint to sprints.py
- Add CRUD methods: remove_sprint_from_issues, unassign, remove_from_sprint
- Add validation to prevent closed issues in active/planned sprints
- Add authorization tests for SSE events endpoint
- Fix IDOR vulnerabilities in agents.py and projects.py
- Add Syndarix models migration (0004)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 15:39:51 +01:00
Felipe Cardoso
b43fa8ace2 feat: Implement Phase 1 API layer (Issues #28-32)
Complete REST API endpoints for all Syndarix core entities:

Projects (8 endpoints):
- CRUD operations with owner-based access control
- Lifecycle management (pause/resume)
- Slug-based retrieval

Agent Types (6 endpoints):
- CRUD operations with superuser-only writes
- Search and filtering support
- Instance count tracking

Agent Instances (10 endpoints):
- Spawn/list/update/terminate operations
- Status lifecycle with transition validation
- Pause/resume functionality
- Individual and project-wide metrics

Issues (8 endpoints):
- CRUD with comprehensive filtering
- Agent/human assignment
- External tracker sync trigger
- Statistics aggregation

Sprints (10 endpoints):
- CRUD with lifecycle enforcement
- Start/complete transitions
- Issue management
- Velocity metrics

All endpoints include:
- Rate limiting via slowapi
- Project ownership authorization
- Proper error handling with custom exceptions
- Comprehensive logging

Phase 1 API Layer: 100% complete
Phase 1 Overall: ~88% (frontend blocked by design approvals)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 10:50:32 +01:00