fix: address code review findings — batch args, venv path, serve flags

- Fix missing BATCH_ARGS in long-context commands (both benchmark scripts)
- Fix CLAUDE.md stale venv path (data/venv → .venv) and add serve/power docs
- Add -b/--batch to bin/benchmark help text
- Add --no-think flag to serve script (--reasoning-budget 0)
- Sanitize model names in eval run directories
- Simplify agentic setup to use requirements.txt
- Add serve --help test, batch flag assertions to existing tests
- Add requirements.txt for reproducible venv setup (Python 3.13)
2026-03-31 10:10:48 +02:00
parent dd403a907c
commit 6ab08537ca
10 changed files with 137 additions and 93 deletions
--- a/bin/benchmark
+++ b/bin/benchmark
@@ -23,6 +23,7 @@ case "${1:-help}" in
        echo "  --category LIST     Comma-separated: smoke,dense,moe"
        echo "  --skip-longctx      Skip long-context (32K) tests"
        echo "  --reps N            Standard test repetitions (default: 5)"
+        echo "  -b, --batch N       Batch size (default: 2048, try 256 for MoE)"
        echo "  --kv-types LIST     KV cache sweep (e.g. f16,q8_0,q4_0 or q4_0:q8_0)"
        echo ""
        echo "Examples:"