feat(mcp): Prompt Management System #67

Open
opened 2026-01-03 09:15:53 +00:00 by cardosofelipe · 0 comments

Overview

Implement a system for managing prompts as first-class engineering artifacts. Prompts are the "code" that instructs AI agents - they need versioning, testing, templating, and optimization just like any other code. This system prevents prompt drift, enables A/B testing, and allows model-specific optimizations.

Parent Epic

  • Epic #60: [EPIC] Phase 2: MCP Integration

Why This Is Critical

The Problem

  • Prompts are scattered across code, hard to find and update
  • No version control for prompts (changes break things silently)
  • Same prompt doesn't work equally well across models
  • No way to test if a prompt change is an improvement
  • Prompt duplication leads to inconsistency
  • No optimization feedback loop

The Solution

A comprehensive prompt management system that:

  1. Centralizes prompts in a versioned repository
  2. Templates prompts with composable parts
  3. Adapts per model with model-specific variants
  4. Tests prompts with regression detection
  5. Optimizes prompts based on performance data

Implementation Sub-Tasks

1. Project Setup & Architecture

  • Create backend/src/mcp_core/prompts/ directory
  • Create __init__.py with public API exports
  • Create manager.py with PromptManager class
  • Create config.py with Pydantic settings
  • Define prompt schema standards
  • Design storage architecture (files + database)
  • Write architecture decision record (ADR)

2. Prompt Schema & Types

  • Create schema/prompt.py with prompt definitions
  • Define base Prompt model
  • Define SystemPrompt for agent personas
  • Define TaskPrompt for specific tasks
  • Define ToolPrompt for tool descriptions
  • Define EvaluationPrompt for LLM-as-judge
  • Define ChainPrompt for multi-step workflows
  • Create prompt metadata schema
  • Add prompt validation
  • Write schema tests

3. Prompt Templating

  • Create templating/engine.py with template engine
  • Implement Jinja2-based templating
  • Create variable injection system
  • Implement conditional sections
  • Create loop support for dynamic content
  • Implement template inheritance
  • Create partial templates (reusable sections)
  • Add template validation
  • Implement safe variable escaping
  • Write templating tests
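
The real engine will be Jinja2 (per the sub-tasks above); the stdlib-only sketch below illustrates just two of the ideas, variable injection and partial (reusable-section) inclusion, using `string.Template` so it runs without third-party dependencies. `PARTIALS`, `render`, and the `{include:name}` marker are illustrative, not the final API.

```python
from string import Template

# Reusable prompt sections, keyed by name (a stand-in for partial templates).
PARTIALS = {
    "questioning_techniques": "Ask open-ended questions before closed ones.",
}

def render(template_text: str, variables: dict[str, str]) -> str:
    # Resolve {include:name} markers first, then substitute ${var} placeholders.
    # Template.substitute raises KeyError for any missing variable, which is
    # the fail-loudly behaviour prompt rendering should have.
    for name, body in PARTIALS.items():
        template_text = template_text.replace("{include:%s}" % name, body)
    return Template(template_text).substitute(variables)
```

For example, `render("Project: ${project}\n{include:questioning_techniques}", {"project": "Demo"})` injects the variable and expands the partial in one pass.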

4. Prompt Composition

  • Create composition/composer.py with composition
  • Implement section-based composition
  • Create prompt modules (reusable blocks)
  • Implement priority-based section ordering
  • Create conditional inclusion
  • Implement token-budget-aware composition
  • Add composition validation
  • Create composition debugging
  • Write composition tests
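
The token-budget-aware step above can be sketched as follows: sections are admitted in priority order until an approximate token budget is spent, then emitted in their original authoring order. `Section`, `compose`, and the 4-characters-per-token estimate are assumptions for illustration, not the final API.

```python
from dataclasses import dataclass

@dataclass
class Section:
    name: str
    content: str
    priority: int  # lower number = more important, kept first

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def compose(sections: list[Section], token_budget: int) -> str:
    chosen: list[Section] = []
    used = 0
    for section in sorted(sections, key=lambda s: s.priority):
        cost = estimate_tokens(section.content)
        if used + cost <= token_budget:
            chosen.append(section)
            used += cost
    # Emit in authoring order so the final prompt still reads naturally.
    chosen.sort(key=lambda s: sections.index(s))
    return "\n\n".join(s.content for s in chosen)
```

Under a tight budget this drops the lowest-priority sections (e.g. long example blocks) while keeping the persona and constraints intact.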

5. Prompt Versioning

  • Create versioning/manager.py with version control
  • Implement semantic versioning for prompts
  • Create version history tracking
  • Implement diff between versions
  • Create version rollback
  • Implement branching (experiment without affecting main)
  • Add version metadata (author, date, changelog)
  • Create version migration tools
  • Write versioning tests
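
Two of the operations listed above, semantic-version bumping and diffing between versions, can be sketched with the stdlib alone. `bump` and `diff_versions` are illustrative names, not a committed API.

```python
import difflib

def bump(version: str, part: str) -> str:
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")

def diff_versions(old_content: str, new_content: str,
                  old_label: str, new_label: str) -> str:
    # Unified diff, the same format a `prompt diff` CLI command could print.
    return "".join(difflib.unified_diff(
        old_content.splitlines(keepends=True),
        new_content.splitlines(keepends=True),
        fromfile=old_label, tofile=new_label))
```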

6. Model-Specific Variants

  • Create variants/manager.py with variant management
  • Implement base prompt + model overrides
  • Create variant selection logic
  • Define model capability profiles
  • Implement variant inheritance
  • Create variant validation
  • Add variant A/B testing
  • Write variant tests
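
The variant selection logic can be sketched as a fallback chain: prefer an exact model-id match, fall back to a model-family prefix (so a dated model id matches a family key), and finally use the base prompt. `select_variant` is an illustrative name.

```python
def select_variant(base_content: str, variants: dict[str, str],
                   model_id: str) -> str:
    if model_id in variants:
        return variants[model_id]
    # Longest matching prefix wins, so "claude-3-opus" beats "claude-3".
    matches = [key for key in variants if model_id.startswith(key)]
    if matches:
        return variants[max(matches, key=len)]
    return base_content
```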

7. Prompt Storage

  • Create storage/repository.py with storage layer
  • Implement file-based storage (YAML/JSON)
  • Implement database storage for dynamic prompts
  • Create hybrid storage (files for version control, DB for runtime)
  • Implement caching layer
  • Create import/export functionality
  • Add backup/restore
  • Write storage tests
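
The file-backed half of the hybrid layer might look like the sketch below. The issue allows YAML or JSON; JSON is used here so the sketch needs only the stdlib. The directory layout (one folder per prompt, one file per version) is an assumption, not a decided convention.

```python
import json
from pathlib import Path

def save_prompt(root: Path, name: str, version: str, record: dict) -> Path:
    path = root / name / f"{version}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2, default=str))
    return path

def load_prompt(root: Path, name: str, version: str) -> dict:
    return json.loads((root / name / f"{version}.json").read_text())

def list_versions(root: Path, name: str) -> list[str]:
    folder = root / name
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.json"))
```

Keeping one file per version makes the store trivially git-friendly, which is what lets ordinary version control carry the audit trail.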

8. Prompt Registry

  • Create registry/registry.py with prompt registry
  • Implement prompt registration
  • Create prompt discovery
  • Implement namespace management
  • Create prompt tagging
  • Implement dependency tracking
  • Add registry search
  • Write registry tests
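
A minimal registry covering registration with duplicate detection, namespace lookup, and tag search could look like this; `PromptRegistry` and its method names are assumptions about the eventual API.

```python
class PromptRegistry:
    def __init__(self) -> None:
        self._prompts: dict[str, dict] = {}   # full dotted name -> record
        self._tags: dict[str, set[str]] = {}  # tag -> names carrying it

    def register(self, name: str, record: dict,
                 tags: tuple[str, ...] = ()) -> None:
        if name in self._prompts:
            raise ValueError(f"prompt already registered: {name}")
        self._prompts[name] = record
        for tag in tags:
            self._tags.setdefault(tag, set()).add(name)

    def in_namespace(self, prefix: str) -> list[str]:
        # Dotted namespaces, e.g. "agents.product_owner".
        return sorted(n for n in self._prompts
                      if n == prefix or n.startswith(prefix + "."))

    def by_tag(self, tag: str) -> list[str]:
        return sorted(self._tags.get(tag, set()))
```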

9. Prompt Optimization

  • Create optimization/optimizer.py with optimization
  • Implement token counting and reduction
  • Create clarity analysis
  • Implement redundancy detection
  • Create optimization suggestions
  • Implement automated optimization (with approval)
  • Add optimization metrics
  • Write optimization tests
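
Two of the optimizer checks above can be sketched cheaply: an approximate token count (a real implementation would use the target model's tokenizer) and near-duplicate-line detection as a crude redundancy signal.

```python
import difflib

def estimate_tokens(text: str) -> int:
    # ~4 characters per token is a common rough heuristic for English.
    return max(1, len(text) // 4)

def redundant_line_pairs(text: str,
                         threshold: float = 0.9) -> list[tuple[str, str]]:
    # Flag pairs of non-empty lines whose similarity exceeds the threshold;
    # these are candidates for the "optimization suggestions" step.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    pairs = []
    for i, a in enumerate(lines):
        for b in lines[i + 1:]:
            if difflib.SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((a, b))
    return pairs
```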

10. Prompt Testing

  • Create testing/tester.py with prompt testing
  • Implement golden test integration
  • Create regression detection
  • Implement quality scoring
  • Create coverage analysis
  • Implement A/B testing framework
  • Add test reporting
  • Write tests for the testing framework itself
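
Regression detection from the list above might reduce to a score comparison: flag any golden test whose score drops from the active version to a candidate by more than a tolerance. Scores as floats in [0, 1] and the 0.02 default tolerance are assumptions.

```python
def detect_regressions(baseline: dict[str, float],
                       candidate: dict[str, float],
                       tolerance: float = 0.02) -> list[str]:
    # Returns the names of golden tests whose candidate score fell more
    # than `tolerance` below the baseline score.
    return sorted(
        test_name
        for test_name, old_score in baseline.items()
        if test_name in candidate
        and candidate[test_name] < old_score - tolerance
    )
```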

11. Prompt Analytics

  • Create analytics/collector.py with analytics
  • Track prompt usage frequency
  • Track prompt success rates
  • Track token usage per prompt
  • Track response quality per prompt
  • Create prompt performance dashboards
  • Implement trend analysis
  • Add analytics alerts
  • Write analytics tests

12. Prompt Validation

  • Create validation/validator.py with validation
  • Implement syntax validation
  • Create token limit validation
  • Implement variable usage validation
  • Create consistency validation
  • Implement style guide validation
  • Add validation hooks (pre-save, pre-use)
  • Write validation tests
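
Variable-usage validation can be sketched as two checks: every `{{ variable }}` a template references must be declared, and every required declared variable must be referenced. Loop-local names (like `objective` inside a `{% for %}`) would need extra handling in a real Jinja2-aware implementation; this regex sketch ignores that case.

```python
import re

VAR_RE = re.compile(r"{{\s*([A-Za-z_]\w*)")

def validate_variable_usage(content: str,
                            declared: dict[str, bool]) -> list[str]:
    """`declared` maps variable name -> required flag; returns problems found."""
    used = set(VAR_RE.findall(content))
    problems = [f"undeclared variable: {name}"
                for name in sorted(used - set(declared))]
    problems += [f"required variable never used: {name}"
                 for name, required in sorted(declared.items())
                 if required and name not in used]
    return problems
```

This is the kind of check the pre-save validation hook would run before a prompt is accepted into the repository.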

13. Prompt Security

  • Create security/scanner.py with security scanning
  • Detect prompt injection vulnerabilities
  • Detect data leakage risks
  • Implement sensitive data detection
  • Create security scoring
  • Add security alerts
  • Write security tests
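
An illustrative first pass at injection scanning matches known jailbreak-style phrases in prompt content that interpolates untrusted input. The pattern list below is a small, non-exhaustive sample; a production scanner would combine patterns with model-based classification.

```python
import re

# Non-exhaustive sample patterns; real deployments maintain a curated list.
INJECTION_PATTERNS = [
    r"ignore\s+(all\s+|any\s+)?previous\s+instructions",
    r"disregard\s+(the\s+)?system\s+prompt",
    r"you\s+are\s+now\s+in\s+developer\s+mode",
]

def scan_for_injection(text: str) -> list[str]:
    # Returns the patterns that matched, usable as input to security scoring.
    lowered = text.lower()
    return [pattern for pattern in INJECTION_PATTERNS
            if re.search(pattern, lowered)]
```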

14. MCP Integration

  • Create get_prompt tool - Retrieve prompt by name/version
  • Create list_prompts tool - List available prompts
  • Create render_prompt tool - Render with variables
  • Create validate_prompt tool - Validate a prompt
  • Create get_prompt_stats tool - Get prompt analytics
  • Create suggest_prompt tool - Get prompt recommendations
  • Write MCP tool tests

15. CLI & Admin Tools

  • Create cli/prompt_cli.py with CLI commands
  • Implement prompt list command
  • Implement prompt show command
  • Implement prompt validate command
  • Implement prompt test command
  • Implement prompt diff command
  • Implement prompt optimize command
  • Write CLI tests

16. Testing

  • Write unit tests for all components
  • Write integration tests for full system
  • Create end-to-end prompt lifecycle tests
  • Write performance benchmarks
  • Achieve >90% code coverage
  • Create regression test suite

17. Documentation

  • Write README with system overview
  • Document prompt schema
  • Document templating syntax
  • Document versioning workflow
  • Document model variants
  • Create prompt writing guide
  • Add troubleshooting guide
  • Create best practices

Technical Specifications

Prompt Schema

from datetime import datetime
from typing import Any, Literal

from pydantic import BaseModel

class Prompt(BaseModel):
    # Identity
    name: str  # Unique identifier
    version: str  # Semantic version
    namespace: str  # e.g., "agents.product_owner"
    
    # Content
    content: str  # The actual prompt text
    template_type: Literal["static", "jinja2"]
    
    # Variables (for templates)
    variables: list[PromptVariable]
    
    # Metadata
    description: str
    author: str
    created_at: datetime
    updated_at: datetime
    
    # Model compatibility
    base_model: str | None  # Model this was optimized for
    variants: dict[str, str]  # model_id -> variant content
    
    # Dependencies
    includes: list[str]  # Other prompts this depends on
    modules: list[str]  # Reusable modules included
    
    # Testing
    golden_tests: list["GoldenTest"]  # GoldenTest is defined by the testing component (sub-task 10)
    quality_score: float | None
    
    # Usage
    tags: list[str]
    category: str

class PromptVariable(BaseModel):
    name: str
    type: Literal["string", "number", "list", "object"]
    required: bool
    default: Any | None
    description: str
    validation: str | None  # Regex or validator name

Prompt Template Example

name: agent.product_owner.requirements_discovery
version: 1.2.0
namespace: agents.product_owner
template_type: jinja2

content: |
  You are an experienced Product Owner helping to discover and refine requirements.
  
  ## Context
  Project: {{ project_name }}
  Current Phase: {{ phase }}
  
  ## Your Objectives
  {% for objective in objectives %}
  - {{ objective }}
  {% endfor %}
  
  ## Guidelines
  {% include "modules/questioning_techniques.yaml" %}
  
  ## Constraints
  {% if autonomy_level == "full_control" %}
  Always confirm decisions with the user before proceeding.
  {% else %}
  You may proceed with minor decisions autonomously.
  {% endif %}
  
  ## Available Tools
  {% for tool in tools %}
  - {{ tool.name }}: {{ tool.description }}
  {% endfor %}

variables:
  - name: project_name
    type: string
    required: true
    description: Name of the current project
  - name: phase
    type: string
    required: true
    description: Current project phase
  - name: objectives
    type: list
    required: true
    description: List of objectives for this session
  - name: autonomy_level
    type: string
    required: false
    default: "milestone"
    description: Agent autonomy level
  - name: tools
    type: list
    required: true
    description: Available tools for this agent

variants:
  claude-3-opus: |
    # Same content but with Claude-optimized formatting using XML tags
    <context>
    Project: {{ project_name }}
    ...
  gpt-4: |
    # Same content but with GPT-optimized formatting
    ...

golden_tests:
  - name: basic_requirements_discovery
    variables:
      project_name: "E-Commerce Platform"
      phase: "discovery"
      objectives: ["Understand user needs", "Define MVP scope"]
      tools: [{"name": "ask_question", "description": "Ask user a question"}]
    expected_behavior: "Should ask clarifying questions about user needs"

Prompt Versioning Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                          Prompt Lifecycle                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐                 │
│  │  Draft   │──▶│  Review  │──▶│   Test   │──▶│  Active  │                 │
│  │  v0.0.x  │   │  v0.x.x  │   │  v1.0.0  │   │  v1.x.x  │                 │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘                 │
│       │              │              │              │                        │
│       │              │              │              ▼                        │
│       │              │              │         ┌──────────┐                  │
│       │              │              │         │ Archived │                  │
│       │              │              │         └──────────┘                  │
│       │              │              │                                       │
│       ▼              ▼              ▼                                       │
│  ┌────────────────────────────────────────────────────────┐                │
│  │                  Version Control                        │                │
│  │  - All changes tracked                                  │                │
│  │  - Diff between versions                                │                │
│  │  - Rollback capability                                  │                │
│  │  - Branch for experiments                               │                │
│  └────────────────────────────────────────────────────────┘                │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Acceptance Criteria

  • All prompts are centrally managed
  • Prompt versioning tracks all changes
  • Templates render correctly with all variable types
  • Model variants work correctly for 3+ models
  • Prompt testing catches regressions
  • Optimization reduces token usage by ≥10%
  • Analytics track all prompt usage
  • Security scanning detects injection risks
  • >90% test coverage
  • Documentation complete with examples

Labels

phase-2, mcp, backend, prompts, quality

Milestone

Phase 2: MCP Integration
