# Optimization Log

Living document tracking what was applied, tested, and the actual results. Each entry records the change, benchmark evidence, and verdict.

**Verdicts**: KEEP (applied permanently), REVERTED (tested, didn't help), PENDING (not yet tested), BLOCKED (can't test yet).

---

## Phase 1: Core System

### 1.1 Tuned Profile: accelerator-performance

- **Date**: 2026-03-26
- **Change**: `sudo tuned-adm profile accelerator-performance`
- **Benchmark**: `data/benchmarks/after-tuned-*`
- **Result**: +5-8% pp improvement, +2-3% tg improvement
- **Verdict**: KEEP

### 1.2 Kernel Boot Parameters

- **Date**: 2026-03-26
- **Change**: `iommu=pt amdgpu.gttsize=60416 ttm.pages_limit=15466496`
- **Benchmark**: `data/benchmarks/full-opt-all-models-*`
- **Result**: Combined with the BIOS VRAM change. Large models now fit in GTT. Peak usage 38.8/59 GiB.
- **Verdict**: KEEP

### 1.3 BIOS VRAM Reduction (512 MB)

- **Date**: 2026-03-26
- **Change**: UMA Frame Buffer Size 32 GB -> 512 MB (HP ZBook F10 BIOS)
- **Benchmark**: `data/benchmarks/full-opt-all-models-*`
- **Result**: 31.5 GB freed for OS/GTT. Small models ~3-8% slower (GTT indirection vs dedicated VRAM), but the system gained the ability to run 37 GB+ models at 32K+ context. Net positive.
- **Trade-off**: The small-model regression is acceptable given the massive capability gain.
- **Verdict**: KEEP

---

## Phase 2: System Tuning

### 2.1 RyzenAdj 85W PPT

- **Date**: PENDING
- **Change**: `sudo ryzenadj --stapm-limit=85000 --fast-limit=85000 --slow-limit=85000`
- **Expected**: +12-19% CPU/GPU throughput (community data from the Strix Halo Wiki)
- **Benchmark**: Not yet run
- **Notes**: HP ZBook ships at 60W. 85W is the community-recommended sweet spot.
- **Verdict**: PENDING

### 2.2 VM Sysctl Tuning

- **Date**: PENDING
- **Change**: `vm.swappiness=1`, `vm.dirty_ratio=40`, `vm.max_map_count=500000`
- **Expected**: Prevent model weight eviction, reduce I/O disruption
- **Benchmark**: Not yet run
- **Verdict**: PENDING

### 2.3 Transparent Huge Pages

- **Date**: PENDING
- **Change**: `transparent_hugepage=always`
- **Expected**: Faster model load time, possible 1-5% tg improvement from reduced TLB misses
- **Benchmark**: Not yet run
- **Verdict**: PENDING

### 2.4 RADV_PERFTEST=nogttspill

- **Date**: PENDING
- **Change**: `export RADV_PERFTEST=nogttspill`
- **Expected**: Fix pp degradation on Vulkan RADV (community-reported fix for Strix Halo)
- **Benchmark**: Not yet run — needs a Vulkan-specific benchmark comparison
- **Verdict**: PENDING

### 2.5 amdgpu.noretry=0

- **Date**: PENDING
- **Change**: Kernel cmdline `amdgpu.noretry=0`
- **Expected**: Improved stability under memory pressure
- **Notes**: Only apply if experiencing GPU page faults or crashes during large-model loading
- **Verdict**: PENDING

---

## Phase 3: Runtime Flags

### 3.1 KV Cache Quantization

- **Date**: PENDING (sweep running)
- **Change**: `-ctk q8_0 -ctv q8_0` / `-ctk q4_0 -ctv q4_0`
- **Benchmark**: `data/benchmarks/kv-sweep-128k-*` (in progress)
- **Expected**: Q8_0: ~50% less KV memory, negligible quality loss. Q4_0: ~75% less, noticeable quality impact.
- **Verdict**: PENDING

### 3.2 MoE Batch Size `-b 256`

- **Date**: PENDING
- **Change**: Add `-b 256` to MoE benchmark runs
- **Expected**: Up to +70% pp improvement for MoE models (community benchmarks)
- **Benchmark**: Not yet run
- **Verdict**: PENDING

---

## Phase 4: Build Optimizations

### 4.1 rocWMMA Flash Attention

- **Date**: PENDING
- **Change**: Rebuild the ROCm toolbox with `-DGGML_HIP_ROCWMMA_FATTN=ON -DGGML_HIP_UMA=ON`
- **Expected**: +96% long-context performance (65K+)
- **Notes**: Need to check if Donato's toolboxes already include this
- **Verdict**: PENDING

### 4.2 rocWMMA Tuned Patch (PR #16827)

- **Date**: PENDING
- **Notes**: Fixes the long-context regression. Check Donato's latest toolbox builds.
- **Verdict**: PENDING

---

## Phase 5: Future / Blocked

### 5.1 Speculative Decoding

- **Status**: BLOCKED — llama.cpp PR #20075 (hybrid SSM/MoE fix)
- **Draft model**: Downloaded `Qwen3.5-0.8B-Q8_0.gguf` (812 MB) on 2026-03-27
- **Last checked**: 2026-03-27 — PR open since 2026-03-03, has ROCm buffer issues

### 5.2 Native MTP (Multi-Token Prediction)

- **Status**: BLOCKED — llama.cpp PR #20700
- **Last checked**: 2026-03-27 — WIP, not expected to merge soon

### 5.3 GPU Clock Fix

- **Status**: BLOCKED — ROCm issue #5750
- **Notes**: GPU may be stuck at 885 MHz instead of 2900 MHz on gfx1151
- **Last checked**: 2026-03-27

---

## Context Window Benchmarks

### 64K Context (pp4096/tg1024, MoE models)

- **Date**: 2026-03-26
- **Benchmark**: `data/benchmarks/ctx64k-*`
- **Results**: (check logs)

### 128K Context (pp8192/tg1024, MoE models)

- **Date**: 2026-03-26
- **Benchmark**: `data/benchmarks/ctx128k-realistic-*`
- **Results**: (check logs)

### 256K Context (pp16384/tg1024, MoE models)

- **Date**: 2026-03-27
- **Benchmark**: `data/benchmarks/ctx256k-*`
- **Results**: (check logs)

---

## How to Add Entries

When testing a new optimization:

1. Record the date and exact change
2. Run a benchmark: `make benchmark ARGS="--tag DESCRIPTIVE-NAME ..."`
3. Compare: `make benchmark-compare BEFORE=data/path/baseline AFTER=data/path/new`
4. Update this log with results and verdict
5. If KEEP: document in [optimization.md](optimization.md) with the measured numbers
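The KEEP/REVERTED call in step 4 ultimately comes down to the percentage delta between the `BEFORE` and `AFTER` runs. A minimal sketch of that arithmetic, usable when eyeballing compare output — the `pct_delta` helper and the sample tok/s figures are hypothetical, not part of this repo's Makefile:

```shell
# Hypothetical helper: signed % change between two benchmark numbers
# (e.g. pp tok/s before and after a change).
pct_delta() {
  # usage: pct_delta BEFORE AFTER
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%+.1f%%\n", (a - b) / b * 100 }'
}

pct_delta 612.4 650.1   # prints +6.2%
```

A delta inside typical run-to-run noise (roughly ±1-2%) is usually not worth a KEEP on its own; rerun the benchmark before updating the verdict.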