
# External References

Single source of truth for all external links used across this project.

## AMD Official

## Strix Halo Toolboxes (Donato Capitella)

The most comprehensive community resource for Strix Halo LLM optimization.

## Community

## Other Strix Halo Repos

## Monitoring Tools

- amdgpu_top — Best AMD GPU monitor (TUI/GUI/JSON output modes)
- nvtop — Cross-vendor GPU monitor
- btop — System resource monitor
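The tools above are interactive; for scripting, the amdgpu kernel driver also exposes GPU utilization directly in sysfs as `gpu_busy_percent`. A minimal sketch (the card index in the path varies per system, and the helper name is illustrative):

```python
from pathlib import Path

# The amdgpu driver reports instantaneous GPU utilization as a plain
# integer percentage, one sysfs file per card. "card0" is an assumption;
# check /sys/class/drm/ for the card index on your system.
BUSY_PATH = Path("/sys/class/drm/card0/device/gpu_busy_percent")

def read_gpu_busy(path: Path = BUSY_PATH) -> int:
    """Return the current GPU busy percentage (0-100)."""
    return int(path.read_text().strip())
```

Polling this file in a loop gives a dependency-free utilization logger for benchmark runs.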

## LLM Inference

- llama.cpp — LLM inference engine (Vulkan + ROCm backends)
- ollama — LLM runtime with model management
- vLLM — High-throughput LLM serving
- llama-benchy — Multi-backend LLM benchmarking
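llama.cpp's `llama-bench` prints results as a markdown table whose column count varies with the options used (enabling KV cache quantization, for instance, adds `type_k`/`type_v` columns), so a parser that hardcodes column positions breaks. A more robust approach is to index fields from the end of the row; a minimal sketch, with a made-up sample row in the usual layout:

```python
def parse_bench_row(row: str) -> tuple[str, str, float]:
    """Split a llama-bench markdown table row and read fields from the
    end of the row, so extra columns (type_k, type_v, ...) inserted in
    the middle don't shift the indices of `test` and `t/s`."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    model = cells[0]                    # model name is always first
    test = cells[-2]                    # e.g. "pp512" or "tg128"
    tps = float(cells[-1].split()[0])   # t/s column, e.g. "512.30 ± 1.20"
    return model, test, tps

# Illustrative row (values are made up):
row = "| llama 7B Q4_K | 3.80 GiB | 6.74 B | Vulkan | 99 | pp512 | 512.30 ± 1.20 |"
```

The same end-relative indexing works regardless of how many backend- or cache-specific columns a given run adds.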

## Qwen3.5 Models (GGUF)

## Agentic Evaluation

- Inspect AI — All-in-one eval framework (HumanEval, BFCL, IFEval, GAIA)
- EvalPlus — HumanEval+ / MBPP+ with native ollama support
- BigCodeBench — 1,140 coding tasks across 139 libraries
- BFCL — Berkeley Function Calling Leaderboard
- SWE-bench — Real GitHub issue resolution
- Qwen-Agent — Optimized agentic framework for Qwen models
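Coding harnesses like the ones above typically report pass@k. For comparing local runs against published numbers, the standard unbiased estimator (introduced with HumanEval) is worth having on hand:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of
    k samples drawn without replacement from n generated solutions, of
    which c are correct, passes.

        pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: a pass is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n = 10 samples of which c = 5 pass, pass@1 comes out to 0.5, as expected.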

## System Tuning

## llama.cpp Optimization

## AMD GPU Profiling