feat(context): Phase 2 - Token Budget Management #80

Closed
opened 2026-01-04 00:51:16 +00:00 by cardosofelipe · 0 comments

Overview

Implement accurate token counting and budget allocation for context management.

Parent Issue

  • #61: Context Management Engine

Implementation Tasks

1. Create budget/calculator.py

  • Create TokenCalculator class
  • Integrate with the LLM Gateway's count_tokens tool via MCPClientManager
  • Implement in-memory caching for token counts
  • Handle edge cases (empty text, very long text)
class TokenCalculator:
    def __init__(self, mcp_manager: MCPClientManager):
        self.mcp = mcp_manager
        self._cache: dict[str, int] = {}

    async def count_tokens(
        self,
        text: str,
        model: str | None = None
    ) -> int:
        """Count tokens using LLM Gateway."""
        ...

    async def count_batch(
        self,
        texts: list[str],
        model: str | None = None
    ) -> list[int]:
        """Count tokens for multiple texts in parallel."""
        ...
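
As a starting point, here is a hedged sketch of how this skeleton could be filled in. It assumes MCPClientManager exposes an async call_tool(name, arguments) method and that the gateway's count_tokens tool returns a payload like {"count": int}; both are assumptions to verify against the real gateway contract. The cache key includes the model, since the same text tokenizes differently across models.

# Hedged sketch only; call_tool and its response shape are assumptions.
import asyncio
import hashlib

class TokenCalculator:
    def __init__(self, mcp_manager: "MCPClientManager"):  # project-local type
        self.mcp = mcp_manager
        self._cache: dict[str, int] = {}

    def _key(self, text: str, model: str | None) -> str:
        # Hash the text so very long inputs do not bloat cache keys,
        # and include the model because tokenization is model-specific.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{model or 'default'}:{digest}"

    async def count_tokens(self, text: str, model: str | None = None) -> int:
        """Count tokens using LLM Gateway, with in-memory caching."""
        if not text:
            return 0  # edge case: empty text needs no gateway round-trip
        key = self._key(text, model)
        if key not in self._cache:
            result = await self.mcp.call_tool(  # assumed MCPClientManager API
                "count_tokens", {"text": text, "model": model}
            )
            self._cache[key] = int(result["count"])  # assumed response shape
        return self._cache[key]

    async def count_batch(
        self, texts: list[str], model: str | None = None
    ) -> list[int]:
        """Count tokens for multiple texts in parallel.

        Note: identical texts within one batch may still race to the
        gateway before the first result lands in the cache.
        """
        return list(await asyncio.gather(
            *(self.count_tokens(t, model) for t in texts)
        ))

A size cap or LRU eviction on _cache may be worth adding if contexts churn heavily.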

2. Create budget/allocator.py

  • Create TokenBudget dataclass
  • Implement budget partitioning from percentages
  • Implement remaining() method
  • Implement can_fit() method
  • Implement allocate() method
  • Implement to_dict() for reporting
@dataclass
class TokenBudget:
    total: int
    system: int
    task: int
    knowledge: int
    conversation: int
    tools: int
    response_reserve: int
    buffer: int

    used: dict[ContextType, int] = field(default_factory=dict)

    @classmethod
    def from_settings(
        cls,
        total_tokens: int,
        settings: ContextSettings
    ) -> "TokenBudget":
        """Create budget from settings percentages."""
        ...
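
A hedged sketch of the allocation logic follows. The ContextSettings percentage field names (system_pct and friends) and the assumption that ContextType enum values match the dataclass field names are illustrative guesses, not the real Phase 1 schema.

# Hedged sketch; settings field names and the ContextType/field-name
# correspondence are assumptions to verify against Phase 1 code.
from dataclasses import asdict, dataclass, field

@dataclass
class TokenBudget:
    total: int
    system: int
    task: int
    knowledge: int
    conversation: int
    tools: int
    response_reserve: int
    buffer: int
    used: dict["ContextType", int] = field(default_factory=dict)

    @classmethod
    def from_settings(
        cls, total_tokens: int, settings: "ContextSettings"
    ) -> "TokenBudget":
        """Partition total_tokens according to settings percentages."""
        def share(pct: float) -> int:
            return int(total_tokens * pct / 100)  # truncate; slack stays unallocated
        return cls(
            total=total_tokens,
            system=share(settings.system_pct),  # field names assumed
            task=share(settings.task_pct),
            knowledge=share(settings.knowledge_pct),
            conversation=share(settings.conversation_pct),
            tools=share(settings.tools_pct),
            response_reserve=share(settings.response_reserve_pct),
            buffer=share(settings.buffer_pct),
        )

    def remaining(self, section: "ContextType") -> int:
        """Tokens still free in a section's partition."""
        # Assumes ContextType values ("system", "task", ...) name the fields above.
        return getattr(self, section.value) - self.used.get(section, 0)

    def can_fit(self, section: "ContextType", tokens: int) -> bool:
        return tokens <= self.remaining(section)

    def allocate(self, section: "ContextType", tokens: int) -> bool:
        """Record usage if it fits; leave the budget untouched otherwise."""
        if not self.can_fit(section, tokens):
            return False
        self.used[section] = self.used.get(section, 0) + tokens
        return True

    def to_dict(self) -> dict:
        """Flat snapshot for logging and reporting."""
        snapshot = asdict(self)
        snapshot["used"] = {k.value: v for k, v in self.used.items()}
        return snapshot

One design option shown here: allocate() fails closed, so the assembly loop can try an allocation and fall back to trimming when it returns False.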

3. Create budget/__init__.py

  • Export TokenCalculator and TokenBudget
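
For the package init, a plain re-export of the two public names is likely all that is needed:

from .calculator import TokenCalculator
from .allocator import TokenBudget

__all__ = ["TokenCalculator", "TokenBudget"]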

Files to Create

backend/app/services/context/budget/
├── __init__.py
├── calculator.py
└── allocator.py

Acceptance Criteria

  • Token counting via the LLM Gateway returns correct counts, including for empty and very long text
  • Budget allocation matches settings percentages
  • can_fit() and allocate() enforce per-section limits correctly
  • In-memory caching prevents duplicate API calls for repeated text
  • Unit tests with mocked LLM Gateway
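
For the last criterion, a minimal test sketch, assuming the hypothetical call_tool contract from the calculator sketch above and the pytest-asyncio plugin:

# Assumes pytest-asyncio and the hypothetical call_tool contract above.
from unittest.mock import AsyncMock

import pytest

from app.services.context.budget import TokenCalculator  # per the tree above

@pytest.mark.asyncio
async def test_count_tokens_hits_gateway_once_per_unique_text():
    mcp = AsyncMock()
    mcp.call_tool.return_value = {"count": 7}
    calc = TokenCalculator(mcp)

    assert await calc.count_tokens("hello world") == 7
    assert await calc.count_tokens("hello world") == 7  # served from cache

    mcp.call_tool.assert_awaited_once()  # caching prevented a duplicate gateway call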

Dependencies

  • #69 (Phase 1 - Foundation)

Labels

phase-2, context, backend
