feat(benchmark): add -b/--batch flag, test MoE batch size impact

Add a batch size override to the benchmark scripts. Testing -b 256 against
the default of 2048 on Vulkan RADV shows no meaningful difference for MoE
pp2048 (843 vs 826 t/s, within noise). The community-reported +70%
improvement does not reproduce on this backend.
Felipe Cardoso
2026-03-30 20:01:24 +02:00
parent ea70687cd2
commit ba24091791
3 changed files with 36 additions and 11 deletions

@@ -104,11 +104,14 @@ Living document tracking what was applied, tested, and the actual results. Each
### 3.2 MoE Batch Size `-b 256`
-- **Date**: PENDING
-- **Change**: Add `-b 256` to MoE benchmark runs
-- **Expected**: Up to +70% pp improvement for MoE models (community benchmarks)
-- **Benchmark**: Not yet run
-- **Verdict**: PENDING
+- **Date**: 2026-03-30
+- **Change**: `-b 256` vs default (2048)
+- **Benchmark**: `data/benchmarks/batch-default-*` vs `data/benchmarks/batch-256-*`
+- **Result** (Vulkan RADV, Qwen3.5-35B-A3B UD-Q4_K_XL, q4_0 KV):
+  - Default: 826 pp, 55.9 tg
+  - b=256: 843 pp, 55.5 tg (within noise)
+- **Notes**: Community-reported +70% improvement does not reproduce on Vulkan RADV. May only apply to ROCm or CPU backends, or to longer prompts (pp8192+).
+- **Verdict**: NO IMPACT on Vulkan — not recommended
---
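
To compare the measured effect with the +70% community claim, the figures in the Result entry above can be converted to relative changes. This is a minimal sketch: `relative_change` is an illustrative helper, not part of the benchmark scripts, and it does not estimate run-to-run noise, which would need repeated runs.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Return the change from baseline to candidate as a fraction of baseline."""
    return (candidate - baseline) / baseline


# Throughput in tokens/s, copied from the Result entry.
default_run = {"pp": 826.0, "tg": 55.9}  # default batch size (2048)
batch_256 = {"pp": 843.0, "tg": 55.5}    # run with -b 256

deltas = {
    metric: relative_change(default_run[metric], batch_256[metric])
    for metric in default_run
}

for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.1%}")
# pp: +2.1%
# tg: -0.7%
```

Prompt processing moves by about 2% and token generation by under 1%, far below the reported +70%, which is consistent with the "within noise" reading.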