fix: address code review findings — batch args, venv path, serve flags
- Fix missing BATCH_ARGS in long-context commands (both benchmark scripts)
- Fix CLAUDE.md stale venv path (data/venv → .venv) and add serve/power docs
- Add -b/--batch to bin/benchmark help text
- Add --no-think flag to serve script (--reasoning-budget 0)
- Sanitize model names in eval run directories
- Simplify agentic setup to use requirements.txt
- Add serve --help test, batch flag assertions to existing tests
- Add requirements.txt for reproducible venv setup (Python 3.13)
12
requirements.txt
Normal file
@@ -0,0 +1,12 @@
# Agentic evaluation frameworks
# Install: python3.13 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
# Requires Python >=3.10, <3.14 (bigcodebench constraint)

inspect-ai>=0.3.201
inspect-evals>=0.6.0
evalplus>=0.3.1
bigcodebench>=0.2.5
openai>=2.26.0

# IFEval dependency (not on PyPI)
instruction_following_eval @ git+https://github.com/josejg/instruction_following_eval
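The header comment in requirements.txt states a supported interpreter range of >=3.10, <3.14. A setup script could check that range before creating the venv. The following is a minimal sketch, not part of this commit: the function name `python_version_supported` and both constants are hypothetical, and the bounds are copied from the comment rather than read from bigcodebench's package metadata.

```python
import sys

# Bounds copied from the comment in requirements.txt (>=3.10, <3.14).
# Assumption: these mirror the bigcodebench constraint named there.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 14)


def python_version_supported(version=None):
    """Return True if the (major, minor) pair is inside the supported range.

    `version` may be any sequence whose first two items are major and minor;
    it defaults to the running interpreter's version.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    # Tuple comparison is lexicographic, so (3, 9) < (3, 10) < (3, 14).
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE
```

A caller might run `python_version_supported()` at the top of the setup step and abort with a readable message when it returns `False`, instead of letting `pip install -r requirements.txt` fail partway through dependency resolution.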