feat(benchmark): add -b/--batch flag, test MoE batch size impact
Add a batch size override to the benchmark scripts. Testing -b 256 against the default of 2048 on Vulkan RADV shows no meaningful difference for MoE pp2048 (843 vs 826 t/s, within noise). The community-reported +70% improvement does not reproduce on this backend.
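For illustration, a batch size override of this kind could be wired up roughly as follows. This is a minimal sketch, not the code in this commit: the function names `parse_args` and `build_command` are hypothetical, `model.gguf` is a placeholder path, and it assumes the scripts wrap llama.cpp's `llama-bench`, which accepts `-m`, `-p` and `-b`. The commit itself does not name the underlying tool.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line of a hypothetical benchmark wrapper."""
    parser = argparse.ArgumentParser(description="Benchmark wrapper (illustrative)")
    parser.add_argument("-m", "--model", required=True, help="path to the model file")
    parser.add_argument(
        "-b", "--batch", type=int, default=None,
        help="batch size override; omit to keep the tool's default",
    )
    return parser.parse_args(argv)


def build_command(args, prompt_tokens=2048):
    """Build the argument list, forwarding -b only when an override was given."""
    # Assumption: the wrapped tool is llama-bench; -p sets the prompt length.
    cmd = ["llama-bench", "-m", args.model, "-p", str(prompt_tokens)]
    if args.batch is not None:
        if args.batch <= 0:
            raise ValueError("batch size must be a positive integer")
        cmd += ["-b", str(args.batch)]
    return cmd


# Build both variants compared in this commit without running anything.
default_cmd = build_command(parse_args(["-m", "model.gguf"]))
override_cmd = build_command(parse_args(["-m", "model.gguf", "-b", "256"]))
print(default_cmd)
print(override_cmd)
```

Leaving `-b` out when no override is given keeps the default run on the tool's own default batch size, so the two result sets differ in that one parameter only.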
@@ -104,11 +104,14 @@ Living document tracking what was applied, tested, and the actual results. Each
 
 ### 3.2 MoE Batch Size `-b 256`
 
-- **Date**: PENDING
-- **Change**: Add `-b 256` to MoE benchmark runs
-- **Expected**: Up to +70% pp improvement for MoE models (community benchmarks)
-- **Benchmark**: Not yet run
-- **Verdict**: PENDING
+- **Date**: 2026-03-30
+- **Change**: `-b 256` vs default (2048)
+- **Benchmark**: `data/benchmarks/batch-default-*` vs `data/benchmarks/batch-256-*`
+- **Result** (Vulkan RADV, Qwen3.5-35B-A3B UD-Q4_K_XL, q4_0 KV):
+  - Default: 826 pp, 55.9 tg
+  - b=256: 843 pp, 55.5 tg (within noise)
+- **Notes**: Community-reported +70% improvement does not reproduce on Vulkan RADV. May only apply to ROCm or CPU backends, or to longer prompts (pp8192+).
+- **Verdict**: NO IMPACT on Vulkan — not recommended
 
 ---
 
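To put the "within noise" reading in numbers, the relative change between the two runs can be computed from the figures recorded in the diff. The sketch below is only arithmetic on those four values. The helper name `relative_change_pct` is hypothetical, and because run-to-run variance is not recorded in the commit, this shows the size of the delta rather than proving it is noise.

```python
def relative_change_pct(baseline, candidate):
    """Return the change from baseline to candidate as a percentage."""
    return (candidate - baseline) / baseline * 100.0


# Throughput in t/s, copied from the Result entry in the diff.
default_run = {"pp": 826.0, "tg": 55.9}
b256_run = {"pp": 843.0, "tg": 55.5}

for metric in ("pp", "tg"):
    delta = relative_change_pct(default_run[metric], b256_run[metric])
    print(f"{metric}: {delta:+.1f}%")
# pp: +2.1%
# tg: -0.7%
```

For comparison, the community-reported +70% would correspond to roughly 1404 t/s pp on this baseline (826 × 1.7), far from the 843 t/s measured.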