- Implemented sorting of agent types by `sort_order` in Agents page.
- Added support for category, icon, color, sort_order, typical_tasks, and collaboration_hints fields in agent creation and update actions.
- Added grid and list view modes to AgentTypeList with user preference management.
- Enhanced filtering with category selection alongside existing search and status filters.
- Updated AgentTypeDetail with category badges and improved layout.
- Added unit tests for grid/list views and category filtering in AgentTypeList.
- Introduced `@radix-ui/react-toggle-group` for view mode toggle in AgentTypeList.
- Add comprehensive test coverage for FormTextarea and FormSelect components to validate rendering, accessibility, props forwarding, error handling, and behavior.
- Introduced function-scoped fixtures in e2e tests to ensure test isolation and address event loop issues with pytest-asyncio and SQLAlchemy.
- Introduced `format-check` for verification without modification in `llm-gateway` and `knowledge-base` Makefiles.
- Updated `validate` to include `format-check`.
- Added `format-all` to root Makefile for consistent formatting across all components.
- Unexported `VIRTUAL_ENV` to prevent virtual environment warnings.
Improved code formatting, line breaks, and indentation across chunking logic and multiple test modules to enhance code clarity and maintain consistent style. No functional changes made.
Added comprehensive unit and API tests to validate AgentType category and display fields:
- Category validation for valid, null, and invalid values
- Icon, color, and sort_order field constraints
- Typical tasks and collaboration hints handling (stripping, removing empty strings, normalization)
- New API tests for field creation, filtering, updating, and grouping
Add new "Category & Display" card in Basic Info tab with:
- Category dropdown to select agent category
- Sort order input for display ordering
- Icon text input with Lucide icon name
- Color picker with hex input and visual color selector
- Typical tasks tag input for agent capabilities
- Collaboration hints tag input for agent relationships
Updates include:
- TAB_FIELD_MAPPING with new field mappings
- State and handlers for typical_tasks and collaboration_hints
- Fix tests to use getAllByRole for multiple Add buttons
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Frontend changes to support new AgentType category and display fields:
Types (agentTypes.ts):
- Add AgentTypeCategory union type with 8 categories
- Add CATEGORY_METADATA constant with labels, descriptions, colors
- Update all interfaces with new fields (category, icon, color, etc.)
- Add AgentTypeGroupedResponse type
Validation (agentType.ts):
- Add AGENT_TYPE_CATEGORIES constant with metadata
- Add AVAILABLE_ICONS constant for icon picker
- Add COLOR_PALETTE constant for color selection
- Update agentTypeFormSchema with new field validators
- Update defaultAgentTypeValues with new fields
Form updates:
- Transform function now maps category and display fields from API
Test updates:
- Add new fields to mock AgentTypeResponse objects
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add 6 new fields to AgentType for better organization and UI display:
- category: enum for grouping (development, design, quality, etc.)
- icon: Lucide icon identifier for UI
- color: hex color code for visual distinction
- sort_order: display ordering within categories
- typical_tasks: list of tasks the agent excels at
- collaboration_hints: agent slugs that work well together
Backend changes:
- Add AgentTypeCategory enum to enums.py
- Update AgentType model with 6 new columns and indexes
- Update schemas with validators for new fields
- Add category filter and /grouped endpoint to routes
- Update CRUD with get_grouped_by_category method
- Update seed data with categories for all 27 agents
- Add migration 0007
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Major revamp of agent types based on SOTA personality design research:
- Expanded from 6 to 27 specialized agent types
- Rich personality prompts following Anthropic and CrewAI best practices
- Each agent has structured prompt with Core Identity, Expertise,
Principles, and Scenario Handling sections
Agent Categories:
- Core Development (8): Product Owner, PM, BA, Architect, Full Stack,
Backend, Frontend, Mobile Engineers
- Design (2): UI/UX Designer, UX Researcher
- Quality & Operations (3): QA, DevOps, Security Engineers
- AI/ML (5): AI/ML Engineer, Researcher, CV, NLP, MLOps Engineers
- Data (2): Data Scientist, Data Engineer
- Leadership (2): Technical Lead, Scrum Master
- Domain Specialists (5): Financial, Healthcare, Scientific,
Behavioral Psychology Experts, Technical Writer
Research applied:
- Anthropic Claude persona design guidelines
- CrewAI role/backstory/goal patterns
- Role prompting research on detailed vs generic personas
- Temperature tuning per agent type (0.2-0.7 based on role)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When default value is null but source has a value (e.g., description
field), the merge was discarding the source value because typeof null
!== typeof string. Now properly accepts source values for nullable fields.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds console.log statements throughout the form submission flow:
- Form submit triggered
- Current form values
- Form state (isDirty, isValid, isSubmitting, errors)
- Validation pass/fail
- onSubmit call and completion
This will help diagnose why the save button appears to do nothing.
Check browser console for '[AgentTypeForm]' logs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: The demo data's model_params was missing `top_p`, but the
Zod schema required all three fields (temperature, max_tokens, top_p).
This caused silent validation failures when editing agent types.
Fixes:
1. Add getInitialValues() that ensures all required fields have defaults
2. Handle nested validation errors in handleFormError (e.g., model_params.top_p)
3. Add useEffect to reset form when agentType changes
4. Add console.error logging for debugging validation failures
5. Update demo data to include top_p in all agent types
The form now properly initializes with safe defaults for any missing
fields from the API response, preventing silent validation failures.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When form validation fails (e.g., personality_prompt is empty), the form
would silently not submit. Now it shows a toast with the first error
and navigates to the tab containing the error field.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When running in Docker, the frontend needs to use 'http://backend:8000'
as the backend URL for Next.js rewrites. This env var is set to use
the Docker service name for proper container-to-container communication.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The rewrite was using 'http://backend:8000' which only resolves inside
Docker network. When running Next.js locally (npm run dev), the hostname
'backend' doesn't exist, causing ENOTFOUND errors.
Now uses NEXT_PUBLIC_API_BASE_URL env var with fallback to localhost:8000
for local development. In Docker, set NEXT_PUBLIC_API_BASE_URL=http://backend:8000.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Dashboard changes:
- Update useDashboard hook to fetch real projects from API
- Calculate stats (active projects, agents, issues) from real data
- Keep pending approvals as mock (no backend endpoint yet)
Demo data additions:
- API Gateway Modernization project (active, complex)
- Customer Analytics Dashboard project (completed)
- DevOps Pipeline Automation project (active, complex)
- Added sprints, agent instances, and issues for each new project
Total demo data: 6 projects, 14 agents, 22 issues
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update demo_data.json to use "__admin__" as owner_email for all projects
- Add admin user lookup in load_demo_data() with special "__admin__" key
- Remove notification_email from project settings (not a valid field)
This ensures demo projects are visible to the admin user when logged in.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
register_vector() requires the vector type to exist in PostgreSQL before
it can register the type codec. Move CREATE EXTENSION to a separate
_ensure_pgvector_extension() method that runs before pool creation.
This fixes the "unknown type: public.vector" error on fresh databases.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add values_callable to all enum columns so SQLAlchemy serializes using
the enum's .value (lowercase) instead of .name (uppercase). PostgreSQL
enum types defined in migrations use lowercase values.
Fixes: invalid input value for enum autonomy_level: "MILESTONE"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SQLAlchemy's Enum() auto-generates type names from Python class names
(e.g., AutonomyLevel -> autonomylevel), but migrations defined them
with underscores (e.g., autonomy_level). This mismatch caused:
"type 'autonomylevel' does not exist"
Added explicit name parameters to all enum columns to match the
migration-defined type names:
- autonomy_level, project_status, project_complexity, client_mode
- agent_status, sprint_status
- issue_type, issue_status, issue_priority, sync_status
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Delete `demo_data.json` replaced by structured logic for better modularity.
- Add support for seeding default agent types and new demo data structure.
- Ensure demo mode only executes when explicitly enabled (settings.DEMO_MODE).
- Enhance logging for improved debugging during DB initialization.
- Replace default empty list with `deque` for `memory_retrieval_latency_seconds`
- Prevents unbounded memory growth by leveraging bounded circular buffer behavior
- Skip SSE connection in demo mode (MSW doesn't support SSE).
- Remove unused `useProjectEvents` and related real-time hooks from `Dashboard`.
- Temporarily disable activity feed SSE until a global endpoint is available.
- Add ABANDONED value to core Outcome enum in types.py
- Replace duplicate OutcomeType class in mcp/tools.py with alias to Outcome
- Simplify mcp/service.py to use outcome directly (no more silent mapping)
- Add migration 0006 to extend PostgreSQL episode_outcome enum
- Add missing constraints to migration 0005 (ix_facts_unique_triple_global)
This fixes the semantic issue where ABANDONED outcomes were silently
converted to FAILURE, losing information about task abandonment.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Bug Fixes:
- Remove singleton pattern from consolidation/reflection services to
prevent stale database session bugs (session is now passed per-request)
- Add LRU eviction to MemoryToolService._working dict (max 1000 sessions)
to prevent unbounded memory growth
- Replace O(n) list.remove() with O(1) OrderedDict.move_to_end() in
RetrievalCache for better performance under load
- Use deque with maxlen for metrics histograms to prevent unbounded
memory growth (circular buffer with 10k max samples)
- Use full UUID for checkpoint IDs instead of 8-char prefix to avoid
collision risk at scale (birthday paradox at ~50k checkpoints)
Test Updates:
- Update checkpoint test to expect 36-char UUID
- Update reflection singleton tests to expect new factory behavior
- Add reset_memory_reflection() no-op for backwards compatibility
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Connect to MCP servers concurrently instead of sequentially
- Reduce retry settings in test mode (IS_TEST=True):
- 1 attempt instead of 3
- 100ms retry delay instead of 1s
- 2s timeout instead of 30-120s
Reduces MCP E2E test time from ~16s to under 1s.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Adjusted TTL durations and sleep intervals across memory and cache tests for consistent expiration behavior.
- Prevented test flakiness caused by timing discrepancies in token expiration and cache cleanup.
- Create shallow copy of VectorIndexEntry when adding similarity score
- Prevents mutation of cached entries that could corrupt shared state
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add _escape_like_pattern() helper to escape SQL wildcards (%, _, \)
- Apply escaping in SemanticMemory.search_facts and get_by_entity
- Apply escaping in ProceduralMemory.search and find_best_for_task
Prevents attackers from injecting SQL wildcard patterns through
user-controlled search terms.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add threading.Lock with double-check locking to ScopeManager
- Add asyncio.Lock with double-check locking to MemoryReflection
- Make reset_memory_metrics async with proper locking
- Update test fixtures to handle async reset functions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change source_episode_ids from JSON to JSONB for PostgreSQL consistency
- Add unique constraint for global facts (project_id IS NULL)
- Add CHECK constraint ensuring reinforcement_count >= 1
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Moved tests/unit/models/memory/ to tests/models/memory/ to avoid
Python import path conflicts when pytest collects all tests.
The conflict was caused by tests/models/ and tests/unit/models/ both
having __init__.py files, causing Python to confuse app.models.memory
imports.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added `memory_consolidation` to the task list and updated `__all__` in test files.
- Updated `source_episode_ids` in `Fact` model to use JSON for cross-database compatibility.
- Revised related database migrations to use JSONB instead of ARRAY.
- Adjusted test concurrency in Makefile for improved test performance.
Auto-fixed linting errors and formatting issues:
- Removed unused imports (F401): pytest, Any, AnalysisType, MemoryType, OutcomeType
- Removed unused variable (F841): hooks variable in test
- Applied consistent formatting across memory service and test files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive caching layer for the Agent Memory System:
- HotMemoryCache: LRU cache for frequently accessed memories
- Python 3.12 type parameter syntax
- Thread-safe operations with RLock
- TTL-based expiration
- Access count tracking for hot memory identification
- Scoped invalidation by type, scope, or pattern
- EmbeddingCache: Cache embeddings by content hash
- Content-hash based deduplication
- Optional Redis backing for persistence
- LRU eviction with configurable max size
- CachedEmbeddingGenerator wrapper for transparent caching
- CacheManager: Unified cache management
- Coordinates hot cache, embedding cache, and retrieval cache
- Centralized invalidation across all caches
- Aggregated statistics and hit rate tracking
- Automatic cleanup scheduling
- Cache warmup support
Performance targets:
- Cache hit rate > 80% for hot memories
- Cache operations < 1ms (memory), < 5ms (Redis)
83 new tests with comprehensive coverage.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add MCP-compatible tools that expose memory operations to agents:
Tools implemented:
- remember: Store data in working, episodic, semantic, or procedural memory
- recall: Retrieve memories by query across multiple memory types
- forget: Delete specific keys or bulk delete by pattern
- reflect: Analyze patterns in recent episodes (success/failure factors)
- get_memory_stats: Return usage statistics and breakdowns
- search_procedures: Find procedures matching trigger patterns
- record_outcome: Record task outcomes and update procedure success rates
Key components:
- tools.py: Pydantic schemas for tool argument validation with comprehensive
field constraints (importance 0-1, TTL limits, limit ranges)
- service.py: MemoryToolService coordinating memory type operations with
proper scoping via ToolContext (project_id, agent_instance_id, session_id)
- Lazy initialization of memory services (WorkingMemory, EpisodicMemory,
SemanticMemory, ProceduralMemory)
Test coverage:
- 60 tests covering tool definitions, argument validation, and service
execution paths
- Mock-based tests for all memory type interactions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add MemoryConsolidationService with Working→Episodic→Semantic/Procedural transfer
- Add Celery tasks for session and nightly consolidation
- Implement memory pruning with importance-based retention
- Add comprehensive test suite (32 tests)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive indexing and retrieval system for memory search:
- VectorIndex for semantic similarity search using cosine similarity
- TemporalIndex for time-based queries with range and recency support
- EntityIndex for entity-based lookups with multi-entity intersection
- OutcomeIndex for success/failure filtering on episodes
- MemoryIndexer as unified interface for all index types
- RetrievalEngine with hybrid search combining all indices
- RelevanceScorer for multi-signal relevance scoring
- RetrievalCache for LRU caching of search results
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>