feat(mcp): Prompt Management System #67

Open
opened 2026-01-03 09:15:53 +00:00 by cardosofelipe · 0 comments

Overview

Implement a system for managing prompts as first-class engineering artifacts. Prompts are the "code" that instructs AI agents - they need versioning, testing, templating, and optimization just like any other code. This system prevents prompt drift, enables A/B testing, and allows model-specific optimizations.

Parent Epic

  • Epic #60: [EPIC] Phase 2: MCP Integration

Why This Is Critical

The Problem

  • Prompts are scattered across code, hard to find and update
  • No version control for prompts (changes break things silently)
  • Same prompt doesn't work equally well across models
  • No way to test if a prompt change is an improvement
  • Prompt duplication leads to inconsistency
  • No optimization feedback loop

The Solution

A comprehensive prompt management system that:

  1. Centralizes prompts in a versioned repository
  2. Templates prompts with composable parts
  3. Adapts per model with model-specific variants
  4. Tests prompts with regression detection
  5. Optimizes prompts based on performance data

Implementation Sub-Tasks

1. Project Setup & Architecture

  • Create backend/src/mcp_core/prompts/ directory
  • Create __init__.py with public API exports
  • Create manager.py with PromptManager class
  • Create config.py with Pydantic settings
  • Define prompt schema standards
  • Design storage architecture (files + database)
  • Write architecture decision record (ADR)

2. Prompt Schema & Types

  • Create schema/prompt.py with prompt definitions
  • Define base Prompt model
  • Define SystemPrompt for agent personas
  • Define TaskPrompt for specific tasks
  • Define ToolPrompt for tool descriptions
  • Define EvaluationPrompt for LLM-as-judge
  • Define ChainPrompt for multi-step workflows
  • Create prompt metadata schema
  • Add prompt validation
  • Write schema tests

3. Prompt Templating

  • Create templating/engine.py with template engine
  • Implement Jinja2-based templating
  • Create variable injection system
  • Implement conditional sections
  • Create loop support for dynamic content
  • Implement template inheritance
  • Create partial templates (reusable sections)
  • Add template validation
  • Implement safe variable escaping
  • Write templating tests
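
The real engine will be Jinja2 (per the sub-tasks above); the stdlib-only sketch below illustrates just two of the ideas, variable injection and partial (reusable-section) inclusion, using `string.Template` so it runs without third-party dependencies. `PARTIALS`, `render`, and the `{include:name}` marker are illustrative, not the final API.

```python
from string import Template

# Reusable prompt sections, keyed by name (a stand-in for partial templates).
PARTIALS = {
    "questioning_techniques": "Ask open-ended questions before closed ones.",
}

def render(template_text: str, variables: dict[str, str]) -> str:
    # Resolve {include:name} markers first, then substitute ${var} placeholders.
    # Template.substitute raises KeyError for any missing variable, which is
    # the fail-loudly behaviour prompt rendering should have.
    for name, body in PARTIALS.items():
        template_text = template_text.replace("{include:%s}" % name, body)
    return Template(template_text).substitute(variables)
```

For example, `render("Project: ${project}\n{include:questioning_techniques}", {"project": "Demo"})` injects the variable and expands the partial in one pass.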

4. Prompt Composition

  • Create composition/composer.py with composition
  • Implement section-based composition
  • Create prompt modules (reusable blocks)
  • Implement priority-based section ordering
  • Create conditional inclusion
  • Implement token-budget-aware composition
  • Add composition validation
  • Create composition debugging
  • Write composition tests
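
The token-budget-aware step above can be sketched as follows: sections are admitted in priority order until an approximate token budget is spent, then emitted in their original authoring order. `Section`, `compose`, and the 4-characters-per-token estimate are assumptions for illustration, not the final API.

```python
from dataclasses import dataclass

@dataclass
class Section:
    name: str
    content: str
    priority: int  # lower number = more important, kept first

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def compose(sections: list[Section], token_budget: int) -> str:
    chosen: list[Section] = []
    used = 0
    for section in sorted(sections, key=lambda s: s.priority):
        cost = estimate_tokens(section.content)
        if used + cost <= token_budget:
            chosen.append(section)
            used += cost
    # Emit in authoring order so the final prompt still reads naturally.
    chosen.sort(key=lambda s: sections.index(s))
    return "\n\n".join(s.content for s in chosen)
```

Under a tight budget this drops the lowest-priority sections (e.g. long example blocks) while keeping the persona and constraints intact.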

5. Prompt Versioning

  • Create versioning/manager.py with version control
  • Implement semantic versioning for prompts
  • Create version history tracking
  • Implement diff between versions
  • Create version rollback
  • Implement branching (experiment without affecting main)
  • Add version metadata (author, date, changelog)
  • Create version migration tools
  • Write versioning tests
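
Two of the operations listed above, semantic-version bumping and diffing between versions, can be sketched with the stdlib alone. `bump` and `diff_versions` are illustrative names, not a committed API.

```python
import difflib

def bump(version: str, part: str) -> str:
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")

def diff_versions(old_content: str, new_content: str,
                  old_label: str, new_label: str) -> str:
    # Unified diff, the same format a `prompt diff` CLI command could print.
    return "".join(difflib.unified_diff(
        old_content.splitlines(keepends=True),
        new_content.splitlines(keepends=True),
        fromfile=old_label, tofile=new_label))
```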

6. Model-Specific Variants

  • Create variants/manager.py with variant management
  • Implement base prompt + model overrides
  • Create variant selection logic
  • Define model capability profiles
  • Implement variant inheritance
  • Create variant validation
  • Add variant A/B testing
  • Write variant tests
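
The variant selection logic can be sketched as a fallback chain: prefer an exact model-id match, fall back to a model-family prefix (so a dated model id matches a family key), and finally use the base prompt. `select_variant` is an illustrative name.

```python
def select_variant(base_content: str, variants: dict[str, str],
                   model_id: str) -> str:
    if model_id in variants:
        return variants[model_id]
    # Longest matching prefix wins, so "claude-3-opus" beats "claude-3".
    matches = [key for key in variants if model_id.startswith(key)]
    if matches:
        return variants[max(matches, key=len)]
    return base_content
```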

7. Prompt Storage

  • Create storage/repository.py with storage layer
  • Implement file-based storage (YAML/JSON)
  • Implement database storage for dynamic prompts
  • Create hybrid storage (files for version control, DB for runtime)
  • Implement caching layer
  • Create import/export functionality
  • Add backup/restore
  • Write storage tests
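
The file-backed half of the hybrid layer might look like the sketch below. The issue allows YAML or JSON; JSON is used here so the sketch needs only the stdlib. The directory layout (one folder per prompt, one file per version) is an assumption, not a decided convention.

```python
import json
from pathlib import Path

def save_prompt(root: Path, name: str, version: str, record: dict) -> Path:
    path = root / name / f"{version}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2, default=str))
    return path

def load_prompt(root: Path, name: str, version: str) -> dict:
    return json.loads((root / name / f"{version}.json").read_text())

def list_versions(root: Path, name: str) -> list[str]:
    folder = root / name
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.json"))
```

Keeping one file per version makes the store trivially git-friendly, which is what lets ordinary version control carry the audit trail.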

8. Prompt Registry

  • Create registry/registry.py with prompt registry
  • Implement prompt registration
  • Create prompt discovery
  • Implement namespace management
  • Create prompt tagging
  • Implement dependency tracking
  • Add registry search
  • Write registry tests
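
A minimal registry covering registration with duplicate detection, namespace lookup, and tag search could look like this; `PromptRegistry` and its method names are assumptions about the eventual API.

```python
class PromptRegistry:
    def __init__(self) -> None:
        self._prompts: dict[str, dict] = {}   # full dotted name -> record
        self._tags: dict[str, set[str]] = {}  # tag -> names carrying it

    def register(self, name: str, record: dict,
                 tags: tuple[str, ...] = ()) -> None:
        if name in self._prompts:
            raise ValueError(f"prompt already registered: {name}")
        self._prompts[name] = record
        for tag in tags:
            self._tags.setdefault(tag, set()).add(name)

    def in_namespace(self, prefix: str) -> list[str]:
        # Dotted namespaces, e.g. "agents.product_owner".
        return sorted(n for n in self._prompts
                      if n == prefix or n.startswith(prefix + "."))

    def by_tag(self, tag: str) -> list[str]:
        return sorted(self._tags.get(tag, set()))
```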

9. Prompt Optimization

  • Create optimization/optimizer.py with optimization
  • Implement token counting and reduction
  • Create clarity analysis
  • Implement redundancy detection
  • Create optimization suggestions
  • Implement automated optimization (with approval)
  • Add optimization metrics
  • Write optimization tests
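
Two of the optimizer checks above can be sketched cheaply: an approximate token count (a real implementation would use the target model's tokenizer) and near-duplicate-line detection as a crude redundancy signal.

```python
import difflib

def estimate_tokens(text: str) -> int:
    # ~4 characters per token is a common rough heuristic for English.
    return max(1, len(text) // 4)

def redundant_line_pairs(text: str,
                         threshold: float = 0.9) -> list[tuple[str, str]]:
    # Flag pairs of non-empty lines whose similarity exceeds the threshold;
    # these are candidates for the "optimization suggestions" step.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    pairs = []
    for i, a in enumerate(lines):
        for b in lines[i + 1:]:
            if difflib.SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((a, b))
    return pairs
```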

10. Prompt Testing

  • Create testing/tester.py with prompt testing
  • Implement golden test integration
  • Create regression detection
  • Implement quality scoring
  • Create coverage analysis
  • Implement A/B testing framework
  • Add test reporting
  • Write tests for the testing framework itself
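
Regression detection from the list above might reduce to a score comparison: flag any golden test whose score drops from the active version to a candidate by more than a tolerance. Scores as floats in [0, 1] and the 0.02 default tolerance are assumptions.

```python
def detect_regressions(baseline: dict[str, float],
                       candidate: dict[str, float],
                       tolerance: float = 0.02) -> list[str]:
    # Returns the names of golden tests whose candidate score fell more
    # than `tolerance` below the baseline score.
    return sorted(
        test_name
        for test_name, old_score in baseline.items()
        if test_name in candidate
        and candidate[test_name] < old_score - tolerance
    )
```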

11. Prompt Analytics

  • Create analytics/collector.py with analytics
  • Track prompt usage frequency
  • Track prompt success rates
  • Track token usage per prompt
  • Track response quality per prompt
  • Create prompt performance dashboards
  • Implement trend analysis
  • Add analytics alerts
  • Write analytics tests

12. Prompt Validation

  • Create validation/validator.py with validation
  • Implement syntax validation
  • Create token limit validation
  • Implement variable usage validation
  • Create consistency validation
  • Implement style guide validation
  • Add validation hooks (pre-save, pre-use)
  • Write validation tests
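
Variable-usage validation can be sketched as two checks: every `{{ variable }}` a template references must be declared, and every required declared variable must be referenced. Loop-local names (like `objective` inside a `{% for %}`) would need extra handling in a real Jinja2-aware implementation; this regex sketch ignores that case.

```python
import re

VAR_RE = re.compile(r"{{\s*([A-Za-z_]\w*)")

def validate_variable_usage(content: str,
                            declared: dict[str, bool]) -> list[str]:
    """`declared` maps variable name -> required flag; returns problems found."""
    used = set(VAR_RE.findall(content))
    problems = [f"undeclared variable: {name}"
                for name in sorted(used - set(declared))]
    problems += [f"required variable never used: {name}"
                 for name, required in sorted(declared.items())
                 if required and name not in used]
    return problems
```

This is the kind of check the pre-save validation hook would run before a prompt is accepted into the repository.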

13. Prompt Security

  • Create security/scanner.py with security scanning
  • Detect prompt injection vulnerabilities
  • Detect data leakage risks
  • Implement sensitive data detection
  • Create security scoring
  • Add security alerts
  • Write security tests
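
An illustrative first pass at injection scanning matches known jailbreak-style phrases in prompt content that interpolates untrusted input. The pattern list below is a small, non-exhaustive sample; a production scanner would combine patterns with model-based classification.

```python
import re

# Non-exhaustive sample patterns; real deployments maintain a curated list.
INJECTION_PATTERNS = [
    r"ignore\s+(all\s+|any\s+)?previous\s+instructions",
    r"disregard\s+(the\s+)?system\s+prompt",
    r"you\s+are\s+now\s+in\s+developer\s+mode",
]

def scan_for_injection(text: str) -> list[str]:
    # Returns the patterns that matched, usable as input to security scoring.
    lowered = text.lower()
    return [pattern for pattern in INJECTION_PATTERNS
            if re.search(pattern, lowered)]
```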

14. MCP Integration

  • Create get_prompt tool - Retrieve prompt by name/version
  • Create list_prompts tool - List available prompts
  • Create render_prompt tool - Render with variables
  • Create validate_prompt tool - Validate a prompt
  • Create get_prompt_stats tool - Get prompt analytics
  • Create suggest_prompt tool - Get prompt recommendations
  • Write MCP tool tests

15. CLI & Admin Tools

  • Create cli/prompt_cli.py with CLI commands
  • Implement prompt list command
  • Implement prompt show command
  • Implement prompt validate command
  • Implement prompt test command
  • Implement prompt diff command
  • Implement prompt optimize command
  • Write CLI tests

16. Testing

  • Write unit tests for all components
  • Write integration tests for full system
  • Create end-to-end prompt lifecycle tests
  • Write performance benchmarks
  • Achieve >90% code coverage
  • Create regression test suite

17. Documentation

  • Write README with system overview
  • Document prompt schema
  • Document templating syntax
  • Document versioning workflow
  • Document model variants
  • Create prompt writing guide
  • Add troubleshooting guide
  • Create best practices

Technical Specifications

Prompt Schema

from datetime import datetime
from typing import Any, Literal

from pydantic import BaseModel

class Prompt(BaseModel):
    # Identity
    name: str  # Unique identifier
    version: str  # Semantic version
    namespace: str  # e.g., "agents.product_owner"
    
    # Content
    content: str  # The actual prompt text
    template_type: Literal["static", "jinja2"]
    
    # Variables (for templates)
    variables: list[PromptVariable]
    
    # Metadata
    description: str
    author: str
    created_at: datetime
    updated_at: datetime
    
    # Model compatibility
    base_model: str | None  # Model this was optimized for
    variants: dict[str, str]  # model_id -> variant content
    
    # Dependencies
    includes: list[str]  # Other prompts this depends on
    modules: list[str]  # Reusable modules included
    
    # Testing
    golden_tests: list["GoldenTest"]  # GoldenTest is defined by the testing component (sub-task 10)
    quality_score: float | None
    
    # Usage
    tags: list[str]
    category: str

class PromptVariable(BaseModel):
    name: str
    type: Literal["string", "number", "list", "object"]
    required: bool
    default: Any | None
    description: str
    validation: str | None  # Regex or validator name

Prompt Template Example

name: agent.product_owner.requirements_discovery
version: 1.2.0
namespace: agents.product_owner
template_type: jinja2

content: |
  You are an experienced Product Owner helping to discover and refine requirements.
  
  ## Context
  Project: {{ project_name }}
  Current Phase: {{ phase }}
  
  ## Your Objectives
  {% for objective in objectives %}
  - {{ objective }}
  {% endfor %}
  
  ## Guidelines
  {% include "modules/questioning_techniques.yaml" %}
  
  ## Constraints
  {% if autonomy_level == "full_control" %}
  Always confirm decisions with the user before proceeding.
  {% else %}
  You may proceed with minor decisions autonomously.
  {% endif %}
  
  ## Available Tools
  {% for tool in tools %}
  - {{ tool.name }}: {{ tool.description }}
  {% endfor %}

variables:
  - name: project_name
    type: string
    required: true
    description: Name of the current project
  - name: phase
    type: string
    required: true
    description: Current project phase
  - name: objectives
    type: list
    required: true
    description: List of objectives for this session
  - name: autonomy_level
    type: string
    required: false
    default: "milestone"
    description: Agent autonomy level
  - name: tools
    type: list
    required: true
    description: Available tools for this agent

variants:
  claude-3-opus: |
    # Same content but with Claude-optimized formatting using XML tags
    <context>
    Project: {{ project_name }}
    ...
  gpt-4: |
    # Same content but with GPT-optimized formatting
    ...

golden_tests:
  - name: basic_requirements_discovery
    variables:
      project_name: "E-Commerce Platform"
      phase: "discovery"
      objectives: ["Understand user needs", "Define MVP scope"]
      tools: [{"name": "ask_question", "description": "Ask user a question"}]
    expected_behavior: "Should ask clarifying questions about user needs"

Prompt Versioning Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                          Prompt Lifecycle                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐                 │
│  │  Draft   │──▶│  Review  │──▶│   Test   │──▶│  Active  │                 │
│  │  v0.0.x  │   │  v0.x.x  │   │  v1.0.0  │   │  v1.x.x  │                 │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘                 │
│       │              │              │              │                        │
│       │              │              │              ▼                        │
│       │              │              │         ┌──────────┐                  │
│       │              │              │         │ Archived │                  │
│       │              │              │         └──────────┘                  │
│       │              │              │                                       │
│       ▼              ▼              ▼                                       │
│  ┌────────────────────────────────────────────────────────┐                │
│  │                  Version Control                        │                │
│  │  - All changes tracked                                  │                │
│  │  - Diff between versions                                │                │
│  │  - Rollback capability                                  │                │
│  │  - Branch for experiments                               │                │
│  └────────────────────────────────────────────────────────┘                │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Acceptance Criteria

  • All prompts are centrally managed
  • Prompt versioning tracks all changes
  • Templates render correctly with all variable types
  • Model variants work correctly for 3+ models
  • Prompt testing catches regressions
  • Optimization reduces token usage by ≥10%
  • Analytics track all prompt usage
  • Security scanning detects injection risks
  • >90% test coverage
  • Documentation complete with examples

Labels

phase-2, mcp, backend, prompts, quality

Milestone

Phase 2: MCP Integration
