feat(mcp): Context Management Engine #61

Closed
opened 2026-01-03 09:07:52 +00:00 by cardosofelipe · 0 comments

Overview

Implement a sophisticated context management engine that optimizes what information gets sent to LLMs. This is THE MOST CRITICAL component for enabling smaller, cheaper models to perform well - they need precisely the right context, not everything.

Parent Epic

  • Epic #60: [EPIC] Phase 2: MCP Integration

Architecture Decision

Location: Backend Service (Not Separate MCP Server)

Unlike LLM Gateway (#56) and Knowledge Base (#57), the Context Management Engine is implemented as a backend service within the main FastAPI application, not as a separate MCP server.

Reasoning:

  1. Tight Integration Required: Context management needs direct access to MCPClientManager, agent instances, project settings, and safety framework
  2. No External Consumers: Only the backend orchestration layer uses context management - it's an internal optimization layer
  3. Reduced Latency: Every LLM call goes through context assembly; an extra HTTP hop on that path would add unacceptable latency (see the sketch after this list)
  4. Consistent with Epic #60: Architecture diagram shows Context Engine inside backend SERVICE LAYER
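
To make the in-process decision concrete, the sketch below shows how the orchestration layer might call the engine directly before handing off to the LLM Gateway. The ContextEngine.assemble signature, the run_agent_turn helper, and the model/budget values are illustrative assumptions, not an existing API.

```python
# Hypothetical sketch: the orchestration layer calls the engine in-process,
# so context assembly adds no extra network hop before the LLM Gateway call.
from app.services.context import ContextEngine  # assumed public export (issue #85)

async def run_agent_turn(agent, task, mcp_clients, llm_gateway):
    engine = ContextEngine(mcp_clients=mcp_clients)   # direct access, no HTTP
    assembled = await engine.assemble(
        agent=agent,
        task=task,
        model="claude-sonnet",      # target model drives adapter/formatting choice
        max_tokens=100_000,         # total token budget (see allocation below)
    )
    # The assembled, model-formatted context goes straight to the LLM Gateway (#56).
    return await llm_gateway.complete(messages=assembled.messages)
```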

Directory Structure

backend/app/services/context/
├── __init__.py              # Public API exports
├── engine.py                # ContextEngine main class
├── config.py                # Pydantic settings
├── exceptions.py            # Context-specific exceptions
├── types/
│   ├── base.py              # BaseContext abstract class
│   ├── system.py            # SystemContext
│   ├── knowledge.py         # KnowledgeContext
│   ├── conversation.py      # ConversationContext
│   ├── task.py              # TaskContext
│   └── tool.py              # ToolContext
├── budget/
│   ├── calculator.py        # Token counting integration
│   └── allocator.py         # Budget partitioning
├── scoring/
│   ├── relevance.py         # Relevance scoring
│   └── composite.py         # Combined scoring
├── prioritization/
│   └── ranker.py            # Context ranking/selection
├── compression/
│   └── truncation.py        # Smart truncation
├── assembly/
│   ├── pipeline.py          # Assembly pipeline
│   └── formatter.py         # LLM-specific formatting
├── adapters/
│   ├── base.py              # Abstract adapter
│   ├── claude.py            # Claude-specific
│   └── openai.py            # OpenAI-specific
└── cache/
    └── context_cache.py     # Redis caching
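
As a rough illustration of the types/ layer, a BaseContext abstraction could carry the content plus the metadata that later stages need (priority, relevance score, token count, cache fingerprint). Field and method names here are assumptions ahead of issue #79, not a finalized interface.

```python
# Hypothetical sketch of types/base.py; names and fields are assumptions (issue #79).
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import IntEnum


class ContextPriority(IntEnum):
    CRITICAL = 0   # system prompt, task definition - never dropped
    HIGH = 1       # directly relevant knowledge
    MEDIUM = 2     # conversation history
    LOW = 3        # nice-to-have background


@dataclass
class BaseContext(ABC):
    content: str
    priority: ContextPriority = ContextPriority.MEDIUM
    relevance_score: float = 0.0       # filled in by the scoring stage
    token_count: int | None = None     # filled in by the budget calculator

    @abstractmethod
    def fingerprint(self) -> str:
        """Stable hash used for Redis cache keys and deduplication."""
```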

Context Assembly Flow

┌─────────────────────────────────────────────────────────────────────┐
│                     Context Assembly Pipeline                        │
├─────────────────────────────────────────────────────────────────────┤
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │  GATHER  │──▶│  SCORE   │──▶│   RANK   │──▶│ COMPRESS │         │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘         │
│       │              │              │              │                 │
│       ▼              ▼              ▼              ▼                 │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │ Knowledge│   │ Relevance│   │ Priority │   │ Truncate │         │
│  │ Memory   │   │ Recency  │   │ Diversity│   │ Cache    │         │
│  │ Tools    │   │ Priority │   │ Coverage │   │          │         │
│  │ History  │   │          │   │          │   │          │         │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘         │
│                                                     │                │
│                                                     ▼                │
│                                              ┌──────────┐           │
│                                              │  FORMAT  │           │
│                                              │ for LLM  │           │
│                                              └──────────┘           │
└─────────────────────────────────────────────────────────────────────┘
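
A condensed sketch of how the five stages in the diagram could chain together follows. The stage interfaces (gather, score, rank, fit_to_budget, format_for_model) are assumptions that mirror the diagram, not the final pipeline API from issue #82.

```python
# Hypothetical sketch of assembly/pipeline.py mirroring the diagram above (issue #82).
async def assemble_context(request, sources, scorer, ranker, compressor, formatter, budget):
    # GATHER: collect candidate items from knowledge base, memory, tools and history.
    candidates = []
    for source in sources:
        candidates.extend(await source.gather(request))

    # SCORE: combine relevance, recency and priority into one composite score per item.
    for item in candidates:
        item.relevance_score = scorer.score(item, query=request.query)

    # RANK: order by composite score while respecting diversity/coverage constraints.
    ranked = ranker.rank(candidates)

    # COMPRESS: truncate or drop the lowest-value items until the allocated budget is met.
    selected = compressor.fit_to_budget(ranked, budget)

    # FORMAT: emit Claude XML blocks or OpenAI markdown depending on the target model.
    return formatter.format_for_model(selected, model=request.model)
```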

Child Issues (Implementation Phases)

| Phase | Issue | Description                            | Status  |
|-------|-------|----------------------------------------|---------|
| 1     | #79   | Foundation: Types, Config & Exceptions | Pending |
| 2     | #80   | Token Budget Management                | Pending |
| 3     | #81   | Context Scoring & Ranking              | Pending |
| 4     | #82   | Context Assembly Pipeline              | Pending |
| 5     | #83   | Model Adapters (Claude, OpenAI)        | Pending |
| 6     | #84   | Caching Layer                          | Pending |
| 7     | #85   | Main Engine & Integration              | Pending |
| 8     | #86   | Testing & Documentation                | Pending |

Token Budget Allocation

Total Budget: 100,000 tokens (example)
├── System Prompt: 5,000 (5%)
├── Task Context: 10,000 (10%)
├── Knowledge/RAG: 40,000 (40%)
├── Conversation History: 20,000 (20%)
├── Tool Descriptions: 5,000 (5%)
├── Response Reserve: 15,000 (15%)
└── Buffer: 5,000 (5%)
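
The split above can be expressed as percentage-based partitioning of whatever total budget the target model allows. The ratios below reproduce the example; the allocate_budget helper and its defaults are illustrative assumptions for issue #80.

```python
# Hypothetical sketch of budget/allocator.py using the example split above (issue #80).
DEFAULT_ALLOCATION = {
    "system_prompt": 0.05,
    "task_context": 0.10,
    "knowledge": 0.40,
    "conversation": 0.20,
    "tools": 0.05,
    "response_reserve": 0.15,
    "buffer": 0.05,
}

def allocate_budget(total_tokens: int, allocation: dict[str, float] = DEFAULT_ALLOCATION) -> dict[str, int]:
    """Partition a total token budget into per-category limits."""
    assert abs(sum(allocation.values()) - 1.0) < 1e-9, "allocation must sum to 100%"
    return {name: int(total_tokens * share) for name, share in allocation.items()}

# allocate_budget(100_000) -> {'system_prompt': 5000, 'task_context': 10000, 'knowledge': 40000, ...}
```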

Acceptance Criteria

  • Token counting accurate (via LLM Gateway integration)
  • Context assembly completes in <100ms
  • Correct budget allocation and enforcement
  • Model-specific formatting (Claude XML, OpenAI markdown) - see the adapter sketch after this list
  • Redis caching with fingerprint-based keys
  • >90% test coverage
  • Integration with existing MCPClientManager
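
For the model-specific formatting criterion, the adapter layer might render the same selected sections as XML-tagged blocks for Claude and as markdown sections for OpenAI models. Class names and tag choices below are assumptions ahead of issue #83.

```python
# Hypothetical sketch of adapters/: the same context, two model-specific renderings (issue #83).
from abc import ABC, abstractmethod


class ModelAdapter(ABC):
    @abstractmethod
    def format(self, sections: dict[str, str]) -> str:
        """Render named context sections into the model's preferred structure."""


class ClaudeAdapter(ModelAdapter):
    def format(self, sections: dict[str, str]) -> str:
        # Claude responds well to explicit XML-style tags around each section.
        return "\n".join(f"<{name}>\n{text}\n</{name}>" for name, text in sections.items())


class OpenAIAdapter(ModelAdapter):
    def format(self, sections: dict[str, str]) -> str:
        # OpenAI models are typically given markdown headings instead.
        return "\n\n".join(f"## {name}\n{text}" for name, text in sections.items())
```

Keeping both renderings behind one abstract interface lets the formatter select an adapter from the requested model without the rest of the pipeline knowing about model differences.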

Deferred Features (Future Phase)

  • Abstractive summarization (requires LLM calls - expensive)
  • Hierarchical compression
  • Context streaming
  • A/B testing hooks
  • Prometheus metrics/Grafana dashboard

Labels

phase-2, mcp, backend, context, critical
