feat(mcp): Context Management Engine #61

Closed
opened 2026-01-03 09:07:52 +00:00 by cardosofelipe · 0 comments

Overview

Implement a sophisticated context management engine that optimizes what information gets sent to LLMs. This is THE MOST CRITICAL component for enabling smaller, cheaper models to perform well - they need precisely the right context, not everything.

Parent Epic

  • Epic #60: [EPIC] Phase 2: MCP Integration

Architecture Decision

Location: Backend Service (Not Separate MCP Server)

Unlike LLM Gateway (#56) and Knowledge Base (#57), the Context Management Engine is implemented as a backend service within the main FastAPI application, not as a separate MCP server.

Reasoning:

  1. Tight Integration Required: Context management needs direct access to MCPClientManager, agent instances, project settings, and safety framework
  2. No External Consumers: Only the backend orchestration layer uses context management - it's an internal optimization layer
  3. Reduced Latency: Every LLM call goes through context assembly; an extra HTTP hop on that path would add unacceptable latency (see the sketch after this list)
  4. Consistent with Epic #60: Architecture diagram shows Context Engine inside backend SERVICE LAYER
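
To make the in-process decision concrete, the sketch below shows how the orchestration layer might call the engine directly before handing off to the LLM Gateway. The ContextEngine.assemble signature, the run_agent_turn helper, and the model/budget values are illustrative assumptions, not an existing API.

```python
# Hypothetical sketch: the orchestration layer calls the engine in-process,
# so context assembly adds no extra network hop before the LLM Gateway call.
from app.services.context import ContextEngine  # assumed public export (issue #85)

async def run_agent_turn(agent, task, mcp_clients, llm_gateway):
    engine = ContextEngine(mcp_clients=mcp_clients)   # direct access, no HTTP
    assembled = await engine.assemble(
        agent=agent,
        task=task,
        model="claude-sonnet",      # target model drives adapter/formatting choice
        max_tokens=100_000,         # total token budget (see allocation below)
    )
    # The assembled, model-formatted context goes straight to the LLM Gateway (#56).
    return await llm_gateway.complete(messages=assembled.messages)
```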

Directory Structure

backend/app/services/context/
├── __init__.py              # Public API exports
├── engine.py                # ContextEngine main class
├── config.py                # Pydantic settings
├── exceptions.py            # Context-specific exceptions
├── types/
│   ├── base.py              # BaseContext abstract class
│   ├── system.py            # SystemContext
│   ├── knowledge.py         # KnowledgeContext
│   ├── conversation.py      # ConversationContext
│   ├── task.py              # TaskContext
│   └── tool.py              # ToolContext
├── budget/
│   ├── calculator.py        # Token counting integration
│   └── allocator.py         # Budget partitioning
├── scoring/
│   ├── relevance.py         # Relevance scoring
│   └── composite.py         # Combined scoring
├── prioritization/
│   └── ranker.py            # Context ranking/selection
├── compression/
│   └── truncation.py        # Smart truncation
├── assembly/
│   ├── pipeline.py          # Assembly pipeline
│   └── formatter.py         # LLM-specific formatting
├── adapters/
│   ├── base.py              # Abstract adapter
│   ├── claude.py            # Claude-specific
│   └── openai.py            # OpenAI-specific
└── cache/
    └── context_cache.py     # Redis caching
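
As a rough illustration of the types/ layer, a BaseContext abstraction could carry the content plus the metadata that later stages need (priority, relevance score, token count, cache fingerprint). Field and method names here are assumptions ahead of issue #79, not a finalized interface.

```python
# Hypothetical sketch of types/base.py; names and fields are assumptions (issue #79).
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import IntEnum


class ContextPriority(IntEnum):
    CRITICAL = 0   # system prompt, task definition - never dropped
    HIGH = 1       # directly relevant knowledge
    MEDIUM = 2     # conversation history
    LOW = 3        # nice-to-have background


@dataclass
class BaseContext(ABC):
    content: str
    priority: ContextPriority = ContextPriority.MEDIUM
    relevance_score: float = 0.0       # filled in by the scoring stage
    token_count: int | None = None     # filled in by the budget calculator

    @abstractmethod
    def fingerprint(self) -> str:
        """Stable hash used for Redis cache keys and deduplication."""
```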

Context Assembly Flow

┌─────────────────────────────────────────────────────────────────────┐
│                     Context Assembly Pipeline                        │
├─────────────────────────────────────────────────────────────────────┤
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │  GATHER  │──▶│  SCORE   │──▶│   RANK   │──▶│ COMPRESS │         │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘         │
│       │              │              │              │                 │
│       ▼              ▼              ▼              ▼                 │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │ Knowledge│   │ Relevance│   │ Priority │   │ Truncate │         │
│  │ Memory   │   │ Recency  │   │ Diversity│   │ Cache    │         │
│  │ Tools    │   │ Priority │   │ Coverage │   │          │         │
│  │ History  │   │          │   │          │   │          │         │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘         │
│                                                     │                │
│                                                     ▼                │
│                                              ┌──────────┐           │
│                                              │  FORMAT  │           │
│                                              │ for LLM  │           │
│                                              └──────────┘           │
└─────────────────────────────────────────────────────────────────────┘
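
A condensed sketch of how the five stages in the diagram could chain together follows. The stage interfaces (gather, score, rank, fit_to_budget, format_for_model) are assumptions that mirror the diagram, not the final pipeline API from issue #82.

```python
# Hypothetical sketch of assembly/pipeline.py mirroring the diagram above (issue #82).
async def assemble_context(request, sources, scorer, ranker, compressor, formatter, budget):
    # GATHER: collect candidate items from knowledge base, memory, tools and history.
    candidates = []
    for source in sources:
        candidates.extend(await source.gather(request))

    # SCORE: combine relevance, recency and priority into one composite score per item.
    for item in candidates:
        item.relevance_score = scorer.score(item, query=request.query)

    # RANK: order by composite score while respecting diversity/coverage constraints.
    ranked = ranker.rank(candidates)

    # COMPRESS: truncate or drop the lowest-value items until the allocated budget is met.
    selected = compressor.fit_to_budget(ranked, budget)

    # FORMAT: emit Claude XML blocks or OpenAI markdown depending on the target model.
    return formatter.format_for_model(selected, model=request.model)
```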

Child Issues (Implementation Phases)

| Phase | Issue | Description                            | Status  |
|-------|-------|----------------------------------------|---------|
| 1     | #79   | Foundation: Types, Config & Exceptions | Pending |
| 2     | #80   | Token Budget Management                | Pending |
| 3     | #81   | Context Scoring & Ranking              | Pending |
| 4     | #82   | Context Assembly Pipeline              | Pending |
| 5     | #83   | Model Adapters (Claude, OpenAI)        | Pending |
| 6     | #84   | Caching Layer                          | Pending |
| 7     | #85   | Main Engine & Integration              | Pending |
| 8     | #86   | Testing & Documentation                | Pending |

Token Budget Allocation

Total Budget: 100,000 tokens (example)
├── System Prompt: 5,000 (5%)
├── Task Context: 10,000 (10%)
├── Knowledge/RAG: 40,000 (40%)
├── Conversation History: 20,000 (20%)
├── Tool Descriptions: 5,000 (5%)
├── Response Reserve: 15,000 (15%)
└── Buffer: 5,000 (5%)
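
The split above can be expressed as percentage-based partitioning of whatever total budget the target model allows. The ratios below reproduce the example; the allocate_budget helper and its defaults are illustrative assumptions for issue #80.

```python
# Hypothetical sketch of budget/allocator.py using the example split above (issue #80).
DEFAULT_ALLOCATION = {
    "system_prompt": 0.05,
    "task_context": 0.10,
    "knowledge": 0.40,
    "conversation": 0.20,
    "tools": 0.05,
    "response_reserve": 0.15,
    "buffer": 0.05,
}

def allocate_budget(total_tokens: int, allocation: dict[str, float] = DEFAULT_ALLOCATION) -> dict[str, int]:
    """Partition a total token budget into per-category limits."""
    assert abs(sum(allocation.values()) - 1.0) < 1e-9, "allocation must sum to 100%"
    return {name: int(total_tokens * share) for name, share in allocation.items()}

# allocate_budget(100_000) -> {'system_prompt': 5000, 'task_context': 10000, 'knowledge': 40000, ...}
```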

Acceptance Criteria

  • Token counting accurate (via LLM Gateway integration)
  • Context assembly completes in <100ms
  • Correct budget allocation and enforcement
  • Model-specific formatting (Claude XML, OpenAI markdown) - see the adapter sketch after this list
  • Redis caching with fingerprint-based keys
  • >90% test coverage
  • Integration with existing MCPClientManager
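
For the model-specific formatting criterion, the adapter layer might render the same selected sections as XML-tagged blocks for Claude and as markdown sections for OpenAI models. Class names and tag choices below are assumptions ahead of issue #83.

```python
# Hypothetical sketch of adapters/: the same context, two model-specific renderings (issue #83).
from abc import ABC, abstractmethod


class ModelAdapter(ABC):
    @abstractmethod
    def format(self, sections: dict[str, str]) -> str:
        """Render named context sections into the model's preferred structure."""


class ClaudeAdapter(ModelAdapter):
    def format(self, sections: dict[str, str]) -> str:
        # Claude responds well to explicit XML-style tags around each section.
        return "\n".join(f"<{name}>\n{text}\n</{name}>" for name, text in sections.items())


class OpenAIAdapter(ModelAdapter):
    def format(self, sections: dict[str, str]) -> str:
        # OpenAI models are typically given markdown headings instead.
        return "\n\n".join(f"## {name}\n{text}" for name, text in sections.items())
```

Keeping both renderings behind one abstract interface lets the formatter select an adapter from the requested model without the rest of the pipeline knowing about model differences.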

Deferred Features (Future Phase)

  • Abstractive summarization (requires LLM calls - expensive)
  • Hierarchical compression
  • Context streaming
  • A/B testing hooks
  • Prometheus metrics/Grafana dashboard

Labels

phase-2, mcp, backend, context, critical
