fix: address code review findings — batch args, venv path, serve flags
- Fix missing BATCH_ARGS in long-context commands (both benchmark scripts)
- Fix CLAUDE.md stale venv path (data/venv → .venv) and add serve/power docs
- Add -b/--batch to bin/benchmark help text
- Add --no-think flag to serve script (--reasoning-budget 0)
- Sanitize model names in eval run directories
- Simplify agentic setup to use requirements.txt
- Add serve --help test and batch-flag assertions to existing tests
- Add requirements.txt for reproducible venv setup (Python 3.13)
14
CLAUDE.md
@@ -41,9 +41,21 @@ make verify # 9-point optimization checklist
bin/audit --json | python3 -m json.tool # Verify JSON output is valid
```
## Serving
`scripts/serve/launch.sh` with dispatcher at `bin/serve`. Launches llama-server inside toolbox containers with optimized defaults: Vulkan RADV, q4_0 KV cache, flash attention, no-mmap, full GPU offload. Key flags:
- `--ngram` — n-gram speculative decoding (~1.1-1.4x tg for repetitive content)
- `--no-think` — disables thinking/reasoning via `--reasoning-budget 0` (faster for evals)
- `--ctx N` — context size (default 131072)
- `--parallel N` — concurrent request slots
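The flags above compose on one command line; a hypothetical invocation of the dispatcher might look like this (the model name `qwen3-8b` is a placeholder, not from this repo — real names live in `configs/models.conf`):

```shell
# Launch llama-server via the dispatcher with thinking disabled
# and a reduced context for faster eval turnaround
bin/serve qwen3-8b --no-think --ctx 32768 --parallel 4

# Enable n-gram speculative decoding for repetitive workloads
bin/serve qwen3-8b --ngram
```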
## System Tuning
`scripts/optimize/power-profile.sh` applies Phase 2 optimizations: RyzenAdj PPT increase (85W target, HP caps at 70W sustained), sysctl tuning (vm.swappiness=1, vm.max_map_count=500000), THP=always, RADV_PERFTEST=nogttspill. Systemd services for boot/resume persistence at `configs/ryzenadj-llm.service` and `configs/ryzenadj-resume.service`.
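The sysctl and THP portion of those Phase 2 optimizations can be sketched as below (the RyzenAdj PPT bump and RADV_PERFTEST export are hardware-specific and omitted; the script's exact internals are an assumption):

```shell
# Reduce swap pressure and raise mmap limits for large model files
sudo sysctl -w vm.swappiness=1
sudo sysctl -w vm.max_map_count=500000

# Enable transparent hugepages for all allocations
echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```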
## Agentic Evaluation
- Scripts in `scripts/agentic/` with dispatcher at `bin/agentic`. Uses a Python venv at `data/venv/`. Eval frameworks: inspect-ai (all-in-one), evalplus (HumanEval+/MBPP+), bigcodebench. All target an OpenAI-compatible endpoint (ollama or llama.cpp server). Model catalog at `configs/models.conf`.
+ Scripts in `scripts/agentic/` with dispatcher at `bin/agentic`. Uses a Python venv at `.venv/` (Python 3.13, dependencies in `requirements.txt`). Eval frameworks: inspect-ai (all-in-one), inspect-evals (task definitions), evalplus (HumanEval+/MBPP+), bigcodebench. All target an OpenAI-compatible endpoint — auto-detects llama-server (port 8080) or ollama (port 11434). Model catalog at `configs/models.conf`.
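With dependencies pinned in `requirements.txt`, recreating the venv is a standard two-step (a sketch; assumes Python 3.13 is available on PATH as `python3`):

```shell
# Create the venv at the repo root and install the pinned eval dependencies
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
```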
## External Resources