# External References
Single source of truth for all external links used across this project.
## AMD Official
- ROCm Strix Halo Optimization Guide — BIOS, kernel params, GTT/TTM configuration
- ROCm System Optimization Index — General ROCm tuning
- ROCm Installation Guide (Linux) — Package installation
- AMD SMI Documentation — GPU monitoring API
- ROCm GitHub — Source and issue tracker
## Strix Halo Toolboxes (Donato Capitella)
The most comprehensive community resource for Strix Halo LLM optimization.
- strix-halo-toolboxes.com — Documentation, benchmarks, guides
- GitHub: kyuz0/amd-strix-halo-toolboxes — Container images, benchmark scripts, VRAM estimator
- Benchmark Results Viewer — Interactive performance charts
## Community
- Strix Halo Wiki — AI Capabilities — Community benchmarks, model compatibility
- Strix Halo Wiki — Power Modes — RyzenAdj sweet spots (85W recommended)
- Strix Halo Wiki — llama.cpp Performance — Backend comparison data
- Level1Techs Forum — HP G1a Guide — Laptop-specific configuration
- Framework Community — GPU Performance Tests — Framework Desktop results
- Framework Community — Compiling vLLM on Strix Halo — Native vLLM build guide
- Hardware Corner — Strix Halo LLM Optimization — Comprehensive optimization walkthrough
- Chips and Cheese — Strix Halo Memory Subsystem — Bandwidth measurements (215 GB/s)
- LLM Tracker — Strix Halo — Centralized performance database
## Other Strix Halo Repos
- pablo-ross/strix-halo-gmktec-evo-x2 — GMKtec EVO X2 optimization
- kyuz0/amd-strix-halo-llm-finetuning — Fine-tuning guides (Gemma-3, Qwen-3)
## Monitoring Tools
- amdgpu_top — AMD GPU monitor with TUI, GUI, and JSON output modes
- nvtop — Cross-vendor GPU monitor
- btop — System resource monitor
## LLM Inference
- llama.cpp — LLM inference engine (Vulkan + ROCm)
- ollama — LLM runtime with model management
- vLLM — High-throughput serving
- llama-benchy — Multi-backend LLM benchmarking
## Qwen3.5 Models (GGUF)
- unsloth/Qwen3.5-35B-A3B-GGUF — Top pick for 64GB Strix Halo (MoE, 3B active parameters)
- unsloth/Qwen3.5-27B-GGUF — Dense 27B
- unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF — Best for agentic/coding
- Qwen3.5 Official — Model family overview
- Unsloth Dynamic 2.0 — Adaptive quantization methodology
- Unsloth Studio — Training + inference UI (beta)
## Agentic Evaluation
- Inspect AI — All-in-one eval framework (HumanEval, BFCL, IFEval, GAIA)
- EvalPlus — HumanEval+ / MBPP+ with native ollama support
- BigCodeBench — 1,140 coding tasks across 139 libraries
- BFCL — Berkeley Function Calling Leaderboard
- SWE-bench — Real GitHub issue resolution
- Qwen-Agent — Optimized agentic framework for Qwen models
## System Tuning
- RyzenAdj — Power management for Ryzen APUs (PPT/TDP control)
- geohot/ztop — Power monitoring for Strix Halo (used to discover HP's 60W power limits)
- ROCm Issue #5750 — GPU clocks stuck at idle on gfx1151
- Mesa RADV Environment Variables — RADV_PERFTEST=nogttspill docs
- Linux Kernel: amd-pstate — CPU power management
## llama.cpp Optimization
- llama.cpp Speculative Decoding — Draft model setup
- llama.cpp PR #20075 — Fix speculative for Qwen3.5 MoE (pending)
- llama.cpp PR #20700 — Native MTP for Qwen3.5 (WIP)
- llama.cpp PR #16827 — rocWMMA tuned flash attention
- llama.cpp Issue #12444 — Hugepage support proposal
## AMD GPU Profiling
- Radeon GPU Profiler (RGP) — Hardware-level Vulkan/HIP profiling
- Radeon GPU Analyzer (RGA) — Offline shader/kernel analysis