Add scope management system for hierarchical memory access:
- ScopeManager with hierarchy: Global → Project → Agent Type → Agent Instance → Session
- ScopePolicy for access control (read, write, inherit permissions)
- ScopeResolver for resolving queries across scope hierarchies with inheritance
- ScopeFilter for filtering scopes by type, project, or agent
- Access control enforcement with parent scope visibility
- Deduplication support during resolution across scopes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements procedural memory for learned skills and procedures:
Core functionality:
- ProceduralMemory class for procedure storage/retrieval
- record_procedure with duplicate detection and step merging
- find_matching for context-based procedure search
- record_outcome for success/failure tracking
- get_best_procedure for finding highest success rate
- update_steps for procedure refinement
Supporting modules:
- ProcedureMatcher: Keyword-based procedure matching
- MatchResult/MatchContext: Matching result types
- Success rate weighting in match scoring
Test coverage:
- 43 unit tests covering all modules
- matching.py: 97% coverage
- memory.py: 86% coverage
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the episodic memory service for storing and retrieving
agent task execution experiences. This enables learning from past
successes and failures.
Components:
- EpisodicMemory: Main service class combining recording and retrieval
- EpisodeRecorder: Handles episode creation, importance scoring
- EpisodeRetriever: Multiple retrieval strategies (recency, semantic,
outcome, importance, task type)
Key features:
- Records task completions with context, actions, outcomes
- Calculates importance scores based on outcome, duration, lessons
- Semantic search with fallback to recency when embeddings unavailable
- Full CRUD operations with statistics and summarization
- Comprehensive unit tests (50 tests, all passing)
Closes#90🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements session-scoped ephemeral memory with:
Storage Backends:
- InMemoryStorage: Thread-safe fallback with TTL support and capacity limits
- RedisStorage: Primary storage with connection pooling and JSON serialization
- Auto-fallback from Redis to in-memory when unavailable
WorkingMemory Class:
- Key-value storage with TTL and reserved key protection
- Task state tracking with progress updates
- Scratchpad for reasoning steps with timestamps
- Checkpoint/snapshot support for recovery
- Factory methods for auto-configured storage
Tests:
- 55 unit tests covering all functionality
- Tests for basic ops, TTL, capacity, concurrency
- Tests for task state, scratchpad, checkpoints
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements Sub-Issue #87 of Issue #62 (Agent Memory System).
Core infrastructure:
- memory/types.py: Type definitions for all memory types (Working, Episodic,
Semantic, Procedural) with enums for MemoryType, ScopeLevel, Outcome
- memory/config.py: MemorySettings with MEM_ env prefix, thread-safe singleton
- memory/exceptions.py: Comprehensive exception hierarchy for memory operations
- memory/manager.py: MemoryManager facade with placeholder methods
Directory structure:
- working/: Working memory (Redis/in-memory) - to be implemented in #89
- episodic/: Episodic memory (experiences) - to be implemented in #90
- semantic/: Semantic memory (facts) - to be implemented in #91
- procedural/: Procedural memory (skills) - to be implemented in #92
- scoping/: Scope management - to be implemented in #93
- indexing/: Vector indexing - to be implemented in #94
- consolidation/: Memory consolidation - to be implemented in #95
Tests: 71 unit tests for config, types, and exceptions
Docs: Comprehensive implementation plan at docs/architecture/memory-system-plan.md
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added stricter budget validation in ContextRanker with explicit error handling for invalid configurations.
- Introduced `_get_valid_token_count()` helper to validate and safeguard token counts.
- Enhanced XML escaping in Claude adapter to prevent injection risks from scores and unhandled content.
- Added timeout enforcement for token counting, scoring, and compression with detailed error handling.
- Introduced tenant isolation in context caching using project and agent identifiers.
- Enhanced budget management with stricter checks for critical context overspending and buffer limitations.
- Optimized per-context locking with cleanup to prevent memory leaks in concurrent environments.
- Updated default assembly timeout settings for improved performance and reliability.
- Improved XML escaping in Claude adapter for safety against injection attacks.
- Standardized token estimation using model-specific ratios.
- Cleaned up unnecessary comments in `__all__` definitions for better readability.
- Adjusted indentation and formatting across modules for improved clarity (e.g., long lines, logical grouping).
- Simplified conditional expressions and inline comments for context scoring and ranking.
- Replaced some hard-coded values with type-safe annotations (e.g., `ClassVar`).
- Removed unused imports and ensured consistent usage across test files.
- Updated `test_score_not_cached_on_context` to clarify caching behavior.
- Improved truncation strategy logic and marker handling.
- Replace hard-coded limits with configurable settings (e.g., cache memory size, truncation strategy, relevance settings).
- Optimize parallel execution in token counting, scoring, and reranking for source diversity.
- Improve caching logic:
- Add per-context locks for safe parallel scoring.
- Reuse precomputed fingerprints for cache efficiency.
- Make truncation, scoring, and ranker behaviors fully configurable via settings.
- Add support for middle truncation, context hash-based hashing, and dynamic token limiting.
- Refactor methods for scalability and better error handling.
Tests: Updated all affected components with additional test cases.
Phase 7 of Context Management Engine - Main Engine:
- Add ContextEngine as main orchestration class
- Integrate all components: calculator, scorer, ranker, compressor, cache
- Add high-level assemble_context() API with:
- System prompt support
- Task description support
- Knowledge Base integration via MCP
- Conversation history conversion
- Tool results conversion
- Custom contexts support
- Add helper methods:
- get_budget_for_model()
- count_tokens() with caching
- invalidate_cache()
- get_stats()
- Add create_context_engine() factory function
Tests: 26 new tests, 311 total context tests passing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 5 of Context Management Engine - Model Adapters:
- Add ModelAdapter abstract base class with model matching
- Add DefaultAdapter for unknown models (plain text)
- Add ClaudeAdapter with XML-based formatting:
- <system_instructions> for system context
- <reference_documents>/<document> for knowledge
- <conversation_history>/<message> for chat
- <tool_results>/<tool_result> for tool outputs
- XML escaping for special characters
- Add OpenAIAdapter with markdown formatting:
- ## headers for sections
- ### Source headers for documents
- **ROLE** bold labels for conversation
- Code blocks for tool outputs
- Add get_adapter() factory function for model selection
Tests: 33 new tests, 256 total context tests passing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add TokenCalculator with LLM Gateway integration for accurate token
counting with in-memory caching and fallback character-based estimation.
Implement TokenBudget for tracking allocations per context type with
budget enforcement, and BudgetAllocator for creating budgets based on
model context window sizes.
- TokenCalculator: MCP integration, caching, model-specific ratios
- TokenBudget: allocation tracking, can_fit/allocate/deallocate/reset
- BudgetAllocator: model context sizes, budget creation and adjustment
- 35 comprehensive tests covering all budget functionality
Part of #61 - Context Management Engine
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added `record_action` in `RateLimiter` for precise tracking of slot consumption post-validation.
- Introduced deduplication mechanism for warning alerts in `CostController` to prevent spamming.
- Refactored `CostController`'s session and daily budget alert handling for improved clarity.
- Implemented test suites for `CostController` and `SafetyGuardian` to validate changes.
- Expanded integration testing to cover deduplication, validation, and loop detection edge cases.
Improved code readability and uniformity by standardizing line breaks, indentation, and inline conditions across safety-related services, models, and tests, including content filters, validation rules, and emergency controls.
The ContentFilter was appending references to DEFAULT_PATTERNS objects,
so when tests modified patterns (e.g., disabling them), those changes
persisted across test runs. Use dataclass replace() to create copies.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add MCPSafetyWrapper for safe MCP tool execution
- Add MCPToolCall/MCPToolResult models for MCP interactions
- Add SafeToolExecutor context manager
- Add SafetyMetrics collector with Prometheus export support
- Track validations, approvals, rate limits, budgets, and more
- Support for counters, gauges, and histograms
Issue #63🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add rollback manager with file checkpointing and transaction context
- Add HITL manager with approval queues and notification handlers
- Add content filter with PII, secrets, and injection detection
- Add emergency controls with stop/pause/resume capabilities
- Update SafetyConfig with checkpoint_dir setting
Issue #63🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Core MCP client implementation with comprehensive tooling:
**Services:**
- MCPClientManager: Main facade for all MCP operations
- MCPServerRegistry: Thread-safe singleton for server configs
- ConnectionPool: Connection pooling with auto-reconnection
- ToolRouter: Automatic tool routing with circuit breaker
- AsyncCircuitBreaker: Custom async-compatible circuit breaker
**Configuration:**
- YAML-based config with Pydantic models
- Environment variable expansion support
- Transport types: HTTP, SSE, STDIO
**API Endpoints:**
- GET /mcp/servers - List all MCP servers
- GET /mcp/servers/{name}/tools - List server tools
- GET /mcp/tools - List all tools from all servers
- GET /mcp/health - Health check all servers
- POST /mcp/call - Execute tool (admin only)
- GET /mcp/circuit-breakers - Circuit breaker status
- POST /mcp/circuit-breakers/{name}/reset - Reset circuit breaker
- POST /mcp/servers/{name}/reconnect - Force reconnection
**Testing:**
- 156 unit tests with comprehensive coverage
- Tests for all services, routes, and error handling
- Proper mocking and async test support
**Documentation:**
- MCP_CLIENT.md with usage examples
- Phase 2+ workflow documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reformatted multiline function calls, object definitions, and queries for improved code readability and consistency. Adjusted imports and constraints where necessary.
Infrastructure:
- Add Redis and Celery workers to all docker-compose files
- Fix celery migration race condition in entrypoint.sh
- Add healthchecks and resource limits to dev compose
- Update .env.template with Redis/Celery variables
Backend Models & Schemas:
- Rename Sprint.completed_points to velocity (per requirements)
- Add AgentInstance.name as required field
- Rename Issue external tracker fields for consistency
- Add IssueSource and TrackerType enums
- Add Project.default_tracker_type field
Backend Fixes:
- Add Celery retry configuration with exponential backoff
- Remove unused sequence counter from EventBus
- Add mypy overrides for test dependencies
- Fix test file using wrong schema (UserUpdate -> dict)
Frontend Fixes:
- Fix memory leak in useProjectEvents (proper cleanup)
- Fix race condition with stale closure in reconnection
- Sync TokenWithUser type with regenerated API client
- Fix expires_in null handling in useAuth
- Clean up unused imports in prototype pages
- Add ESLint relaxed rules for prototype files
CI/CD:
- Add E2E testing stage with Testcontainers
- Add security scanning with Trivy and pip-audit
- Add dependency caching for faster builds
Tests:
- Update all tests to use renamed fields (velocity, name, etc.)
- Fix 14 schema test failures
- All 1500 tests pass with 91% coverage
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Deleted `I18N_IMPLEMENTATION_PLAN.md` and `PROJECT_PROGRESS.md` to declutter the repository.
- These documents were finalized, no longer relevant, and superseded by implemented features and external references.
- Implemented OAuth 2.0 Authorization Server endpoints per RFCs, including token, introspection, revocation, and metadata discovery.
- Added user consent submission, listing, and revocation APIs alongside frontend integration for improved UX.
- Enforced stricter OAuth security measures (PKCE, state validation, scopes).
- Refactored schemas and services for consistency and expanded coverage of OAuth workflows.
- Updated documentation and type definitions for new API behaviors.
- Enforced stricter PKCE requirements by rejecting insecure 'plain' method for public clients.
- Transitioned client secret hashing to bcrypt for improved security and migration compatibility.
- Added constant-time comparison for state parameter validation to prevent timing attacks.
- Improved error handling and logging for OAuth workflows, including malformed headers and invalid scopes.
- Upgraded Google OIDC token validation to verify both signature and nonce.
- Refactored OAuth service methods and schemas for better readability, consistency, and compliance with RFC specifications.
- Implemented stricter OAuth security measures, including CSRF protection via state parameter validation and redirect_uri checks.
- Updated OAuth models to support timezone-aware datetime comparisons, replacing deprecated `utcnow`.
- Enhanced logging for malformed Basic auth headers during token, introspect, and revoke requests.
- Added allowlist validation for OAuth provider domains to prevent open redirect attacks.
- Improved nonce validation for OpenID Connect tokens, ensuring token integrity during Google provider flows.
- Updated E2E and unit tests to cover new security features and expanded OAuth state handling scenarios.
- Introduced full OAuth 2.0 Authorization Server functionality for MCP clients.
- Updated documentation with details on endpoints, scopes, and consent management.
- Added a new frontend OAuth consent page for user authorization flows.
- Implemented database models for authorization codes, refresh tokens, and user consents.
- Created unit tests for service methods (PKCE verification, client validation, scope handling).
- Included comprehensive integration tests for OAuth provider workflows.
- Added models for `OAuthClient`, `OAuthState`, and `OAuthAccount`.
- Created Pydantic schemas to support OAuth flows, client management, and linked accounts.
- Implemented skeleton endpoints for OAuth Provider mode: authorization, token, and revocation.
- Updated router imports to include new `/oauth` and `/oauth/provider` routes.
- Added Alembic migration script to create OAuth-related database tables.
- Enhanced `users` table to allow OAuth-only accounts by making `password_hash` nullable.
- Introduced `pyproject.toml` to centralize backend tool configurations (e.g., Ruff, mypy, coverage, pytest).
- Replaced Black, isort, and Flake8 with Ruff for linting, formatting, and import sorting.
- Updated `requirements.txt` to include Ruff and remove replaced tools.
- Added `Makefile` to streamline development workflows with commands for linting, formatting, type-checking, testing, and cleanup.
- **Authentication Refactor:** Introduce `_create_login_session` utility to streamline session creation for login and OAuth flows.
- **Configurations:** Dynamically set app name in PostgreSQL connection (`application_name`) and adjust token expiration settings (`expires_in`) based on system configuration.
Updated the `Token` schema to include `user` data and `expires_in` field. Adjusted backend `auth_service.py` to populate these fields while generating tokens. Replaced `getCurrentUserInfo` with `getCurrentUserProfile` in the frontend and disabled ESLint for generated files to suppress warnings.
- Deleted `database_async.py`, `base_async.py`, and `organization_async.py` modules due to deprecation and unused references across the project.
- Improved overall codebase clarity and minimized redundant functionality by removing unused async database logic, CRUD utilities, and organization-related operations.
- Updated `verify_password` and `get_password_hash` to their async counterparts to prevent event loop blocking.
- Replaced N+1 query patterns in `admin.py` and `session_async.py` with optimized bulk operations for improved performance.
- Enhanced `user_async.py` with bulk update and soft delete methods for efficient user management.
- Added eager loading support in CRUD operations to prevent N+1 query issues.
- Updated test cases with stronger password examples for better security representation.
- Refactored `create_user` and `change_password` methods to add transaction rollback on failures and enhanced logging for error contexts.
- Updated security utilities to use constant-time comparison (`hmac.compare_digest`) to mitigate timing attacks.
- Adjusted API responses in registration and password reset flows for better security and user experience.
- Added session invalidation after password resets to enhance account security.
- Migrated database sessions and operations to `AsyncSession` for full async support.
- Updated all service methods and dependencies (`get_db` to `get_async_db`) to support async logic.
- Refactored admin, user, organization, session-related CRUD methods, and routes with await syntax.
- Improved consistency and performance with async SQLAlchemy patterns.
- Enhanced logging and error handling for async context.
- Introduced session management endpoints to list, revoke, and cleanup sessions per user.
- Added cron-based job for periodic cleanup of expired sessions.
- Implemented `CRUDSession` for session-specific database operations.
- Integrated session cleanup startup and shutdown events in the application lifecycle.
- Enhanced CORS configuration to include `X-Device-Id` for session tracking.
- Added comprehensive integration tests for multi-device login, per-device logout, session listing, and cleanup logic.
- Introduced endpoints for requesting and confirming password resets.
- Implemented token-based password reset logic with validation checks.
- Added `EmailService` with `ConsoleEmailBackend` and placeholder for SMTP backend.
- Integrated password reset flow in `auth` API routes with rate limiting.
- Updated schemas for password reset requests and token confirmation.
- Refined validation for secure password updates and token verification.
- Enhanced configuration with `FRONTEND_URL` for email links.
This commit introduces a suite of tests for user models, schemas, CRUD operations, and authentication services. It also adds utilities for in-memory database setup to support these tests and updates environment settings for consistency.