Commit Graph

84 Commits

Author SHA1 Message Date
Felipe Cardoso
3554efe66a feat(memory): add episodic memory implementation (Issue #90)
Implements the episodic memory service for storing and retrieving
agent task execution experiences. This enables learning from past
successes and failures.

Components:
- EpisodicMemory: Main service class combining recording and retrieval
- EpisodeRecorder: Handles episode creation, importance scoring
- EpisodeRetriever: Multiple retrieval strategies (recency, semantic,
  outcome, importance, task type)

Key features:
- Records task completions with context, actions, outcomes
- Calculates importance scores based on outcome, duration, lessons
- Semantic search with fallback to recency when embeddings unavailable
- Full CRUD operations with statistics and summarization
- Comprehensive unit tests (50 tests, all passing)

Closes #90

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 02:08:16 +01:00
Felipe Cardoso
4974233169 feat(memory): add working memory implementation (Issue #89)
Implements session-scoped ephemeral memory with:

Storage Backends:
- InMemoryStorage: Thread-safe fallback with TTL support and capacity limits
- RedisStorage: Primary storage with connection pooling and JSON serialization
- Auto-fallback from Redis to in-memory when unavailable

WorkingMemory Class:
- Key-value storage with TTL and reserved key protection
- Task state tracking with progress updates
- Scratchpad for reasoning steps with timestamps
- Checkpoint/snapshot support for recovery
- Factory methods for auto-configured storage

Tests:
- 55 unit tests covering all functionality
- Tests for basic ops, TTL, capacity, concurrency
- Tests for task state, scratchpad, checkpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:51:03 +01:00
Felipe Cardoso
c9d8c0835c feat(memory): add database schema and storage layer (Issue #88)
Add SQLAlchemy models for the Agent Memory System:
- WorkingMemory: Key-value storage with TTL for active sessions
- Episode: Experiential memories from task executions
- Fact: Semantic knowledge triples with confidence scores
- Procedure: Learned skills and procedures with success tracking
- MemoryConsolidationLog: Tracks consolidation jobs between memory tiers

Create enums for memory system:
- ScopeType: global, project, agent_type, agent_instance, session
- EpisodeOutcome: success, failure, partial
- ConsolidationType: working_to_episodic, episodic_to_semantic, etc.
- ConsolidationStatus: pending, running, completed, failed

Add Alembic migration (0005) for all memory tables with:
- Foreign key relationships to projects, agent_instances, agent_types
- Comprehensive indexes for query patterns
- Unique constraints for key lookups and triple uniqueness
- Vector embedding column placeholders (Text fallback until pgvector enabled)

Fix timezone-naive datetime.now() in types.py TaskState (review feedback)

Includes 30 unit tests for models and enums.

Closes #88

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:37:58 +01:00
Felipe Cardoso
085a748929 feat(memory): #87 project setup & core architecture
Implements Sub-Issue #87 of Issue #62 (Agent Memory System).

Core infrastructure:
- memory/types.py: Type definitions for all memory types (Working, Episodic,
  Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome
- memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton
- memory/exceptions.py: Comprehensive exception hierarchy for memory operations
- memory/manager.py: MemoryManager facade with placeholder methods

Directory structure:
- working/: Working memory (Redis/in-memory) - to be implemented in #89
- episodic/: Episodic memory (experiences) - to be implemented in #90
- semantic/: Semantic memory (facts) - to be implemented in #91
- procedural/: Procedural memory (skills) - to be implemented in #92
- scoping/: Scope management - to be implemented in #93
- indexing/: Vector indexing - to be implemented in #94
- consolidation/: Memory consolidation - to be implemented in #95

Tests: 71 unit tests for config, types, and exceptions
Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 01:27:36 +01:00
Felipe Cardoso
4b149b8a52 feat(tests): add unit tests for Context Management API routes
- Added detailed unit tests for `/context` endpoints, covering health checks, context assembly, token counting, budget retrieval, and cache invalidation.
- Included edge cases, error handling, and input validation for context-related operations.
- Improved test coverage for the Context Management module with mocked dependencies and integration scenarios.
2026-01-05 01:02:49 +01:00
Felipe Cardoso
ad0c06851d feat(tests): add comprehensive E2E tests for MCP and Agent workflows
- Introduced end-to-end tests for MCP workflows, including server discovery, authentication, context engine operations, error handling, and input validation.
- Added full lifecycle tests for agent workflows, covering type management, instance spawning, status transitions, and admin-only operations.
- Enhanced test coverage for real-world MCP and Agent scenarios across PostgreSQL and async environments.
2026-01-05 01:02:41 +01:00
Felipe Cardoso
911d950c15 feat(tests): add comprehensive integration tests for MCP stack
- Introduced integration tests covering backend, LLM Gateway, Knowledge Base, and Context Engine.
- Includes health checks, tool listing, token counting, and end-to-end MCP flows.
- Added `RUN_INTEGRATION_TESTS` environment flag to enable selective test execution.
- Includes a quick health check script to verify service availability before running tests.
2026-01-05 01:02:22 +01:00
Felipe Cardoso
60ebeaa582 test(safety): add comprehensive tests for safety framework modules
Add tests to improve backend coverage from 85% to 93%:

- test_audit.py: 60 tests for AuditLogger (20% -> 99%)
  - Hash chain integrity, sanitization, retention, handlers
  - Fixed bug: hash chain modification after event creation
  - Fixed bug: verification not using correct prev_hash

- test_hitl.py: Tests for HITL manager (0% -> 100%)
- test_permissions.py: Tests for permissions manager (0% -> 99%)
- test_rollback.py: Tests for rollback manager (0% -> 100%)
- test_metrics.py: Tests for metrics collector (0% -> 100%)
- test_mcp_integration.py: Tests for MCP safety wrapper (0% -> 100%)
- test_validation.py: Additional cache and edge case tests (76% -> 100%)
- test_scoring.py: Lock cleanup and edge case tests (78% -> 91%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 19:41:54 +01:00
Felipe Cardoso
1628eacf2b feat(context): enhance timeout handling, tenant isolation, and budget management
- Added timeout enforcement for token counting, scoring, and compression with detailed error handling.
- Introduced tenant isolation in context caching using project and agent identifiers.
- Enhanced budget management with stricter checks for critical context overspending and buffer limitations.
- Optimized per-context locking with cleanup to prevent memory leaks in concurrent environments.
- Updated default assembly timeout settings for improved performance and reliability.
- Improved XML escaping in Claude adapter for safety against injection attacks.
- Standardized token estimation using model-specific ratios.
2026-01-04 15:52:50 +01:00
Felipe Cardoso
2bea057fb1 chore(context): refactor for consistency, optimize formatting, and simplify logic
- Cleaned up unnecessary comments in `__all__` definitions for better readability.
- Adjusted indentation and formatting across modules for improved clarity (e.g., long lines, logical grouping).
- Simplified conditional expressions and inline comments for context scoring and ranking.
- Replaced some hard-coded values with type-safe annotations (e.g., `ClassVar`).
- Removed unused imports and ensured consistent usage across test files.
- Updated `test_score_not_cached_on_context` to clarify caching behavior.
- Improved truncation strategy logic and marker handling.
2026-01-04 15:23:14 +01:00
Felipe Cardoso
9e54f16e56 test(context): add edge case tests for truncation and scoring concurrency
- Add tests for truncation edge cases, including zero tokens, short content, and marker handling.
- Add concurrency tests for scoring to verify per-context locking and handling of multiple contexts.
2026-01-04 12:38:04 +01:00
Felipe Cardoso
027ebfc332 feat(context): implement main ContextEngine with full integration (#85)
Phase 7 of Context Management Engine - Main Engine:

- Add ContextEngine as main orchestration class
- Integrate all components: calculator, scorer, ranker, compressor, cache
- Add high-level assemble_context() API with:
  - System prompt support
  - Task description support
  - Knowledge Base integration via MCP
  - Conversation history conversion
  - Tool results conversion
  - Custom contexts support
- Add helper methods:
  - get_budget_for_model()
  - count_tokens() with caching
  - invalidate_cache()
  - get_stats()
- Add create_context_engine() factory function

Tests: 26 new tests, 311 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:44:40 +01:00
Felipe Cardoso
c2466ab401 feat(context): implement Redis-based caching layer (#84)
Phase 6 of Context Management Engine - Caching Layer:

- Add ContextCache with Redis integration
- Support fingerprint-based assembled context caching
- Support token count caching (model-specific)
- Support score caching (scorer + context + query)
- Add in-memory fallback with LRU eviction
- Add cache invalidation with pattern matching
- Add cache statistics reporting

Key features:
- Hierarchical cache key structure (ctx:type:hash)
- Automatic TTL expiration
- Memory cache for fast repeated access
- Graceful degradation when Redis unavailable

Tests: 29 new tests, 285 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:41:21 +01:00
Felipe Cardoso
7828d35e06 feat(context): implement model adapters for Claude and OpenAI (#83)
Phase 5 of Context Management Engine - Model Adapters:

- Add ModelAdapter abstract base class with model matching
- Add DefaultAdapter for unknown models (plain text)
- Add ClaudeAdapter with XML-based formatting:
  - <system_instructions> for system context
  - <reference_documents>/<document> for knowledge
  - <conversation_history>/<message> for chat
  - <tool_results>/<tool_result> for tool outputs
  - XML escaping for special characters
- Add OpenAIAdapter with markdown formatting:
  - ## headers for sections
  - ### Source headers for documents
  - **ROLE** bold labels for conversation
  - Code blocks for tool outputs
- Add get_adapter() factory function for model selection

Tests: 33 new tests, 256 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:36:32 +01:00
Felipe Cardoso
6b07e62f00 feat(context): implement assembly pipeline and compression (#82)
Phase 4 of Context Management Engine - Assembly Pipeline:

- Add TruncationStrategy with end/middle/sentence-aware truncation
- Add TruncationResult dataclass for tracking compression metrics
- Add ContextCompressor for type-specific compression
- Add ContextPipeline orchestrating full assembly workflow:
  - Token counting for all contexts
  - Scoring and ranking via ContextRanker
  - Optional compression when budget threshold exceeded
  - Model-specific formatting (XML for Claude, markdown for OpenAI)
- Add PipelineMetrics for performance tracking
- Update AssembledContext with new fields (model, contexts, metadata)
- Add backward compatibility aliases for renamed fields

Tests: 34 new tests, 223 total context tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:32:25 +01:00
Felipe Cardoso
0d2005ddcb feat(context): implement context scoring and ranking (Phase 3)
Add comprehensive scoring system with three strategies:
- RelevanceScorer: Semantic similarity with keyword fallback
- RecencyScorer: Exponential decay with type-specific half-lives
- PriorityScorer: Priority-based scoring with type bonuses

Implement CompositeScorer combining all strategies with configurable
weights (default: 50% relevance, 30% recency, 20% priority).

Add ContextRanker for budget-aware context selection with:
- Greedy selection algorithm respecting token budgets
- CRITICAL priority contexts always included
- Diversity reranking to prevent source dominance
- Comprehensive selection statistics

68 tests covering all scoring and ranking functionality.

Part of #61 - Context Management Engine

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:24:06 +01:00
Felipe Cardoso
dfa75e682e feat(context): implement token budget management (Phase 2)
Add TokenCalculator with LLM Gateway integration for accurate token
counting with in-memory caching and fallback character-based estimation.
Implement TokenBudget for tracking allocations per context type with
budget enforcement, and BudgetAllocator for creating budgets based on
model context window sizes.

- TokenCalculator: MCP integration, caching, model-specific ratios
- TokenBudget: allocation tracking, can_fit/allocate/deallocate/reset
- BudgetAllocator: model context sizes, budget creation and adjustment
- 35 comprehensive tests covering all budget functionality

Part of #61 - Context Management Engine

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:13:23 +01:00
Felipe Cardoso
22ecb5e989 feat(context): Phase 1 - Foundation types, config and exceptions (#79)
Implements the foundation for Context Management Engine:

Types (backend/app/services/context/types/):
- BaseContext: Abstract base with ID, content, priority, scoring
- SystemContext: System prompts, personas, instructions
- KnowledgeContext: RAG results from Knowledge Base MCP
- ConversationContext: Chat history with role support
- TaskContext: Task/issue context with acceptance criteria
- ToolContext: Tool definitions and execution results
- AssembledContext: Final assembled context result

Configuration (config.py):
- Token budget allocation (system 5%, task 10%, knowledge 40%, etc.)
- Scoring weights (relevance 50%, recency 30%, priority 20%)
- Cache settings (TTL, prefix)
- Performance settings (max assembly time, parallel scoring)
- Environment variable overrides with CTX_ prefix

Exceptions (exceptions.py):
- ContextError: Base exception
- BudgetExceededError: Token budget violations
- TokenCountError: Token counting failures
- CompressionError: Compression failures
- AssemblyTimeoutError: Assembly timeout
- ScoringError, FormattingError, CacheError
- ContextNotFoundError, InvalidContextError

All 86 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:07:39 +01:00
Felipe Cardoso
caf283bed2 feat(safety): enhance rate limiting and cost control with alert deduplication and usage tracking
- Added `record_action` in `RateLimiter` for precise tracking of slot consumption post-validation.
- Introduced deduplication mechanism for warning alerts in `CostController` to prevent spamming.
- Refactored `CostController`'s session and daily budget alert handling for improved clarity.
- Implemented test suites for `CostController` and `SafetyGuardian` to validate changes.
- Expanded integration testing to cover deduplication, validation, and loop detection edge cases.
2026-01-03 17:55:34 +01:00
Felipe Cardoso
520c06175e refactor(safety): apply consistent formatting across services and tests
Improved code readability and uniformity by standardizing line breaks, indentation, and inline conditions across safety-related services, models, and tests, including content filters, validation rules, and emergency controls.
2026-01-03 16:23:39 +01:00
Felipe Cardoso
065e43c5a9 fix(tests): use delay variables in retry delay test
The delay2 and delay3 variables were calculated but never asserted,
causing lint warnings. Added assertions to verify all delays are
positive and within max bounds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 16:19:54 +01:00
Felipe Cardoso
015f2de6c6 test(safety): add Phase E comprehensive safety tests
- Add tests for models: ActionMetadata, ActionRequest, ActionResult,
  ValidationRule, BudgetStatus, RateLimitConfig, ApprovalRequest/Response,
  Checkpoint, RollbackResult, AuditEvent, SafetyPolicy, GuardianResult
- Add tests for validation: ActionValidator rules, priorities, patterns,
  bypass mode, batch validation, rule creation helpers
- Add tests for loops: LoopDetector exact/semantic/oscillation detection,
  LoopBreaker throttle/backoff, history management
- Add tests for content filter: PII filtering (email, phone, SSN, credit card),
  secret blocking (API keys, GitHub tokens, private keys), custom patterns,
  scan without filtering, dict filtering
- Add tests for emergency controls: state management, pause/resume/reset,
  scoped emergency stops, callbacks, EmergencyTrigger events
- Fix exception kwargs in content filter and emergency controls to match
  exception class signatures

All 108 tests passing with lint and type checks clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:52:35 +01:00
Felipe Cardoso
e5975fa5d0 feat(backend): implement MCP client infrastructure (#55)
Core MCP client implementation with comprehensive tooling:

**Services:**
- MCPClientManager: Main facade for all MCP operations
- MCPServerRegistry: Thread-safe singleton for server configs
- ConnectionPool: Connection pooling with auto-reconnection
- ToolRouter: Automatic tool routing with circuit breaker
- AsyncCircuitBreaker: Custom async-compatible circuit breaker

**Configuration:**
- YAML-based config with Pydantic models
- Environment variable expansion support
- Transport types: HTTP, SSE, STDIO

**API Endpoints:**
- GET /mcp/servers - List all MCP servers
- GET /mcp/servers/{name}/tools - List server tools
- GET /mcp/tools - List all tools from all servers
- GET /mcp/health - Health check all servers
- POST /mcp/call - Execute tool (admin only)
- GET /mcp/circuit-breakers - Circuit breaker status
- POST /mcp/circuit-breakers/{name}/reset - Reset circuit breaker
- POST /mcp/servers/{name}/reconnect - Force reconnection

**Testing:**
- 156 unit tests with comprehensive coverage
- Tests for all services, routes, and error handling
- Proper mocking and async test support

**Documentation:**
- MCP_CLIENT.md with usage examples
- Phase 2+ workflow documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:12:41 +01:00
Felipe Cardoso
664415111a test(backend): add comprehensive tests for OAuth and agent endpoints
- Added tests for OAuth provider admin and consent endpoints covering edge cases.
- Extended agent-related tests to handle incorrect project associations and lifecycle state transitions.
- Introduced tests for sprint status transitions and validation checks.
- Improved multiline formatting consistency across all test functions.
2026-01-03 01:44:11 +01:00
Felipe Cardoso
06b2491c1f fix(backend): critical bug fixes for agent termination and sprint validation
Bug Fixes:
- bulk_terminate_by_project now unassigns issues before terminating agents
  to prevent orphaned issue assignments
- PATCH /issues/{id} now validates sprint status - cannot assign issues
  to COMPLETED or CANCELLED sprints
- archive_project now performs cascading cleanup:
  - Terminates all active agent instances
  - Cancels all planned/active sprints
  - Unassigns issues from terminated agents

Added edge case tests for all fixed bugs (19 new tests total):
- TestBulkTerminateEdgeCases
- TestSprintStatusValidation
- TestArchiveProjectCleanup
- TestDataIntegrityEdgeCases (IDOR protection)

Coverage: 93% (1836 tests passing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 15:23:21 +01:00
Felipe Cardoso
b8265783f3 fix(agents): prevent issue assignment to terminated agents and cleanup on termination
This commit fixes 4 production bugs found via edge case testing:

1. BUG: System allowed assigning issues to terminated agents
   - Added validation in issue creation endpoint
   - Added validation in issue update endpoint
   - Added validation in issue assign endpoint

2. BUG: Issues remained orphaned when agent was terminated
   - Agent termination now auto-unassigns all issues from that agent

These bugs could lead to issues being assigned to non-functional agents
that would never work on them, causing work to stall silently.

Tests added in tests/api/routes/syndarix/test_edge_cases.py to verify:
- Cannot assign issue to terminated agent (3 variations)
- Issues are auto-unassigned when agent is terminated
- Various other edge cases (sprints, projects, IDOR protection)

Coverage: 88% → 93% (1830 tests passing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:43:08 +01:00
Felipe Cardoso
63066c50ba test(crud): add comprehensive Syndarix CRUD tests for 95% coverage
Added CRUD layer tests for all Syndarix domain modules:
- test_issue.py: 37 tests covering issue CRUD operations
- test_sprint.py: 31 tests covering sprint CRUD operations
- test_agent_instance.py: 28 tests covering agent instance CRUD
- test_agent_type.py: 19 tests covering agent type CRUD
- test_project.py: 20 tests covering project CRUD operations

Each test file covers:
- Successful CRUD operations
- Not found cases
- Exception handling paths (IntegrityError, OperationalError)
- Filter and pagination operations
- PostgreSQL-specific tests marked as skip for SQLite

Coverage improvements:
- issue.py: 65% → 99%
- sprint.py: 74% → 100%
- agent_instance.py: 73% → 100%
- agent_type.py: 71% → 93%
- project.py: 79% → 100%

Total backend coverage: 89% → 92%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:30:05 +01:00
Felipe Cardoso
ddf9b5fe25 test(sprints): add sprint issues and IDOR prevention tests
- Add TestSprintIssues class (5 tests)
  - List sprint issues (empty/with data)
  - Add issue to sprint
  - Add nonexistent issue to sprint

- Add TestSprintCrossProjectValidation class (3 tests)
  - IDOR prevention for get/update/start through wrong project

Coverage: sprints.py 72% → 76%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:04:05 +01:00
Felipe Cardoso
c3b66cccfc test(syndarix): add agent_types and enhance issues API tests
- Add comprehensive test_agent_types.py (36 tests)
  - CRUD operations (create, read, update, deactivate)
  - Authorization (superuser vs regular user)
  - Pagination and filtering
  - Slug lookup functionality
  - Model configuration validation

- Enhance test_issues.py (15 new tests, total 39)
  - Issue assignment/unassignment endpoints
  - Issue sync endpoint
  - Cross-project validation (IDOR prevention)
  - Validation error handling
  - Sprint/agent reference validation

Coverage improvements:
- agent_types.py: 41% → 83%
- issues.py: 55% → 75%
- Overall: 88% → 89%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 14:00:11 +01:00
Felipe Cardoso
896f0d92e5 test(agents): add comprehensive API route tests
Add 22 tests for agents API covering:
- CRUD operations (spawn, list, get, update, delete)
- Lifecycle management (pause, resume)
- Agent metrics (single and project-level)
- Authorization and access control
- Status filtering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:20:25 +01:00
Felipe Cardoso
2ccaeb23f2 test(issues): add comprehensive API route tests
Add 24 tests for issues API covering:
- CRUD operations (create, list, get, update, delete)
- Status and priority filtering
- Search functionality
- Issue statistics
- Authorization and access control

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:20:17 +01:00
Felipe Cardoso
04c939d4c2 test(sprints): add comprehensive API route tests
Add 28 tests for sprints API covering:
- CRUD operations (create, list, get, update)
- Lifecycle management (start, complete, cancel)
- Sprint velocity endpoint
- Authorization and access control
- Pagination and filtering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:20:09 +01:00
Felipe Cardoso
71c94c3b5a test(projects): add comprehensive API route tests
Add 46 tests for projects API covering:
- CRUD operations (create, list, get, update, archive)
- Lifecycle management (pause, resume)
- Authorization and access control
- Pagination and filtering
- All autonomy levels

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 13:20:01 +01:00
Felipe Cardoso
cea97afe25 fix: Add missing API endpoints and validation improvements
- Add cancel_sprint and delete_sprint endpoints to sprints.py
- Add unassign_issue endpoint to issues.py
- Add remove_issue_from_sprint endpoint to sprints.py
- Add CRUD methods: remove_sprint_from_issues, unassign, remove_from_sprint
- Add validation to prevent closed issues in active/planned sprints
- Add authorization tests for SSE events endpoint
- Fix IDOR vulnerabilities in agents.py and projects.py
- Add Syndarix models migration (0004)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 15:39:51 +01:00
Felipe Cardoso
742ce4c9c8 fix: Comprehensive validation and bug fixes
Infrastructure:
- Add Redis and Celery workers to all docker-compose files
- Fix celery migration race condition in entrypoint.sh
- Add healthchecks and resource limits to dev compose
- Update .env.template with Redis/Celery variables

Backend Models & Schemas:
- Rename Sprint.completed_points to velocity (per requirements)
- Add AgentInstance.name as required field
- Rename Issue external tracker fields for consistency
- Add IssueSource and TrackerType enums
- Add Project.default_tracker_type field

Backend Fixes:
- Add Celery retry configuration with exponential backoff
- Remove unused sequence counter from EventBus
- Add mypy overrides for test dependencies
- Fix test file using wrong schema (UserUpdate -> dict)

Frontend Fixes:
- Fix memory leak in useProjectEvents (proper cleanup)
- Fix race condition with stale closure in reconnection
- Sync TokenWithUser type with regenerated API client
- Fix expires_in null handling in useAuth
- Clean up unused imports in prototype pages
- Add ESLint relaxed rules for prototype files

CI/CD:
- Add E2E testing stage with Testcontainers
- Add security scanning with Trivy and pip-audit
- Add dependency caching for faster builds

Tests:
- Update all tests to use renamed fields (velocity, name, etc.)
- Fix 14 schema test failures
- All 1500 tests pass with 91% coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 10:35:30 +01:00
Felipe Cardoso
11da0d57a8 feat(backend): Add Celery worker infrastructure with task stubs
- Add Celery app configuration with Redis broker/backend
- Add task modules: agent, workflow, cost, git, sync
- Add task stubs for:
  - Agent execution (spawn, heartbeat, terminate)
  - Workflow orchestration (start sprint, checkpoint, code review)
  - Cost tracking (record usage, calculate, generate report)
  - Git operations (clone, commit, push, sync)
  - External sync (import issues, export updates)
- Add task tests directory structure
- Configure for production-ready Celery setup

Implements #18

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 02:08:14 +01:00
Felipe Cardoso
acfda1e9a9 feat(backend): Add SSE endpoint for project event streaming
- Add /projects/{project_id}/events/stream SSE endpoint
- Add event_bus dependency injection
- Add project access authorization (placeholder)
- Add test event endpoint for development
- Add keepalive comments every 30 seconds
- Add reconnection support via Last-Event-ID header
- Add rate limiting (10/minute per IP)
- Mount events router in API
- Add sse-starlette dependency
- Add 19 comprehensive tests for SSE functionality

Implements #34

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 02:08:03 +01:00
Felipe Cardoso
3c24a8c522 feat(backend): Add EventBus service with Redis Pub/Sub
- Add EventBus class for real-time event communication
- Add Event schema with type-safe event types (agent, issue, sprint events)
- Add typed payload schemas (AgentSpawnedPayload, AgentMessagePayload)
- Add channel helpers for project/agent/user scoping
- Add subscribe_sse generator for SSE streaming
- Add reconnection support via Last-Event-ID
- Add keepalive mechanism for connection health
- Add 44 comprehensive tests with mocked Redis

Implements #33

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 02:07:51 +01:00
Felipe Cardoso
ec111f9ce6 feat(backend): Add Redis client with connection pooling
- Add RedisClient with async connection pool management
- Add cache operations (get, set, delete, expire, pattern delete)
- Add JSON serialization helpers for cache
- Add pub/sub operations (publish, subscribe, psubscribe)
- Add health check and pool statistics
- Add FastAPI dependency injection support
- Update config with Redis settings (URL, SSL, TLS)
- Add comprehensive tests for Redis client

Implements #17

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 02:07:40 +01:00
Felipe Cardoso
520a4d60fb feat(backend): Add Syndarix domain models with CRUD operations
- Add Project model with slug, description, autonomy level, and settings
- Add AgentType model for agent templates with model config and failover
- Add AgentInstance model for running agents with status and memory
- Add Issue model with external tracker sync (Gitea/GitHub/GitLab)
- Add Sprint model with velocity tracking and lifecycle management
- Add comprehensive Pydantic schemas with validation
- Add full CRUD operations for all models with filtering/sorting
- Add 280+ tests for models, schemas, and CRUD operations

Implements #23, #24, #25, #26, #27

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-30 02:07:27 +01:00
Felipe Cardoso
7ba1767cea Refactor E2E tests for OAuth provider workflows
- Renamed unused `code_verifier` variables to `_code_verifier` for clarity.
- Improved test readability by reformatting long lines and assertions.
- Streamlined `get` request calls by consolidating parameters into single lines.
2025-11-26 14:10:25 +01:00
Felipe Cardoso
803b720530 Add comprehensive E2E tests for OAuth provider workflows
- Introduced E2E test coverage for OAuth Provider mode, covering metadata discovery, client management, authorization flows, token operations, consent management, and security checks.
- Verified PKCE enforcement, consent submission, token rotation, and introspection.
- Expanded fixtures and utility methods for testing real OAuth scenarios with PostgreSQL via Testcontainers.
2025-11-26 14:06:20 +01:00
Felipe Cardoso
b3f0dd4005 Add full OAuth provider functionality and enhance flows
- Implemented OAuth 2.0 Authorization Server endpoints per RFCs, including token, introspection, revocation, and metadata discovery.
- Added user consent submission, listing, and revocation APIs alongside frontend integration for improved UX.
- Enforced stricter OAuth security measures (PKCE, state validation, scopes).
- Refactored schemas and services for consistency and expanded coverage of OAuth workflows.
- Updated documentation and type definitions for new API behaviors.
2025-11-26 13:23:44 +01:00
Felipe Cardoso
0ea428b718 Refactor tests for improved readability and fixture consistency
- Reformatted headers in E2E tests to improve readability and ensure consistent style.
- Updated confidential client fixture to use bcrypt for secret hashing, enhancing security and testing backward compatibility with legacy SHA-256 hashes.
- Added new test cases for PKCE verification, rejecting insecure 'plain' methods, and improved error handling.
- Refined session workflows and user agent handling in E2E tests for session management.
- Consolidated schema operation tests and fixed minor formatting inconsistencies.
2025-11-26 00:13:53 +01:00
Felipe Cardoso
400d6f6f75 Enhance OAuth security and state validation
- Implemented stricter OAuth security measures, including CSRF protection via state parameter validation and redirect_uri checks.
- Updated OAuth models to support timezone-aware datetime comparisons, replacing deprecated `utcnow`.
- Enhanced logging for malformed Basic auth headers during token, introspect, and revoke requests.
- Added allowlist validation for OAuth provider domains to prevent open redirect attacks.
- Improved nonce validation for OpenID Connect tokens, ensuring token integrity during Google provider flows.
- Updated E2E and unit tests to cover new security features and expanded OAuth state handling scenarios.
2025-11-25 23:50:43 +01:00
Felipe Cardoso
7716468d72 Add E2E tests for admin and organization workflows
- Introduced E2E tests for admin user and organization management workflows: user listing, creation, updates, bulk actions, and organization membership management.
- Added comprehensive tests for organization CRUD operations, membership visibility, roles, and permission validation.
- Expanded fixtures for superuser and member setup to streamline testing of admin-specific operations.
- Verified pagination, filtering, and action consistency across admin endpoints.
2025-11-25 23:50:02 +01:00
Felipe Cardoso
48f052200f Add OAuth provider mode and MCP integration
- Introduced full OAuth 2.0 Authorization Server functionality for MCP clients.
- Updated documentation with details on endpoints, scopes, and consent management.
- Added a new frontend OAuth consent page for user authorization flows.
- Implemented database models for authorization codes, refresh tokens, and user consents.
- Created unit tests for service methods (PKCE verification, client validation, scope handling).
- Included comprehensive integration tests for OAuth provider workflows.
2025-11-25 23:18:19 +01:00
Felipe Cardoso
fbb030da69 Add E2E workflow tests for organizations, users, sessions, and API contracts
- Introduced comprehensive E2E tests for organization workflows: creation, membership management, and updates.
- Added tests for user management workflows: profile viewing, updates, password changes, and settings.
- Implemented session management tests, including listing, revocation, multi-device handling, and cleanup.
- Included API contract validation tests using Schemathesis, covering protected endpoints and schema structure.
- Enhanced E2E testing infrastructure with full PostgreSQL support and detailed workflow coverage.
2025-11-25 23:13:28 +01:00
Felipe Cardoso
507f2e9c00 Refactor E2E tests and fixtures for improved readability and consistency
- Reformatted assertions in `test_database_workflows.py` for better readability.
- Refactored `postgres_url` transformation logic in `conftest.py` for improved clarity.
- Adjusted import handling in `test_api_contracts.py` to streamline usage of Hypothesis and Schemathesis libraries.
2025-11-25 22:27:11 +01:00
Felipe Cardoso
c0b253a010 Add support for E2E testing infrastructure and OAuth configurations
- Introduced make commands for E2E tests using Testcontainers and Schemathesis.
- Updated `.env.demo` with configurable OAuth settings for Google and GitHub.
- Enhanced `README.md` with updated environment setup instructions.
- Added E2E testing dependencies and markers in `pyproject.toml` for real PostgreSQL and API contract validation.
- Included new libraries (`arrow`, `attrs`, `docker`, etc.) for testing and schema validation workflows.
2025-11-25 22:24:23 +01:00