strix-halo-optimizations/docs/references.md

# External References

Single source of truth for all external links used across this project.

## AMD Official

- [ROCm Strix Halo Optimization Guide](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/strixhalo.html) — BIOS, kernel params, GTT/TTM configuration
- [ROCm System Optimization Index](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/index.html) — General ROCm tuning
- [ROCm Installation Guide (Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/) — Package installation
- [AMD SMI Documentation](https://rocm.docs.amd.com/projects/amdsmi/en/latest/) — GPU monitoring API
- [ROCm GitHub](https://github.com/ROCm/ROCm) — Source and issue tracker

## Strix Halo Toolboxes (Donato Capitella)

The most comprehensive community resource for Strix Halo LLM optimization.

- [strix-halo-toolboxes.com](https://strix-halo-toolboxes.com/) — Documentation, benchmarks, guides
- [GitHub: kyuz0/amd-strix-halo-toolboxes](https://github.com/kyuz0/amd-strix-halo-toolboxes) — Container images, benchmark scripts, VRAM estimator
- [Benchmark Results Viewer](https://kyuz0.github.io/amd-strix-halo-toolboxes/) — Interactive performance charts

## Community

- [Strix Halo Wiki — AI Capabilities](https://strixhalo.wiki/AI/AI_Capabilities_Overview) — Community benchmarks, model compatibility
- [Strix Halo Wiki — Power Modes](https://strixhalo.wiki/Guides/Power-Modes-and-Performance) — RyzenAdj sweet spots (85W recommended)
- [Strix Halo Wiki — llama.cpp Performance](https://strixhalo.wiki/AI/llamacpp-performance) — Backend comparison data
- [Level1Techs Forum — HP G1a Guide](https://forum.level1techs.com/t/the-ultimate-arch-secureboot-guide-for-ryzen-ai-max-ft-hp-g1a-128gb-8060s-monster-laptop/230652) — Laptop-specific configuration
- [Framework Community — GPU Performance Tests](https://community.frame.work/t/amd-strix-halo-ryzen-ai-max-395-gpu-llm-performance-tests/72521) — Framework Desktop results
- [Framework Community — Compiling vLLM on Strix Halo](https://community.frame.work/t/how-to-compiling-vllm-from-source-on-strix-halo/77241) — Native vLLM build guide
- [Hardware Corner — Strix Halo LLM Optimization](https://www.hardware-corner.net/strix-halo-llm-optimization/) — Comprehensive optimization walkthrough
- [Chips and Cheese — Strix Halo Memory Subsystem](https://chipsandcheese.com/p/strix-halos-memory-subsystem-tackling) — Bandwidth measurements (215 GB/s)
- [LLM Tracker — Strix Halo](https://llm-tracker.info/_TOORG/Strix-Halo) — Centralized performance database

## Other Strix Halo Repos

- [pablo-ross/strix-halo-gmktec-evo-x2](https://github.com/pablo-ross/strix-halo-gmktec-evo-x2) — GMKtec EVO X2 optimization
- [kyuz0/amd-strix-halo-llm-finetuning](https://github.com/kyuz0/amd-strix-halo-llm-finetuning) — Fine-tuning guides (Gemma-3, Qwen-3)

## Monitoring Tools

- [amdgpu_top](https://github.com/Umio-Yasuno/amdgpu_top) — Best AMD GPU monitor (TUI/GUI/JSON)
- [nvtop](https://github.com/Syllo/nvtop) — Cross-vendor GPU monitor
- [btop](https://github.com/aristocratos/btop) — System resource monitor

## LLM Inference

- [llama.cpp](https://github.com/ggml-org/llama.cpp) — LLM inference engine (Vulkan + ROCm)
- [ollama](https://ollama.com/) — LLM runtime with model management
- [vLLM](https://github.com/vllm-project/vllm) — High-throughput serving
- [llama-benchy](https://github.com/eugr/llama-benchy) — Multi-backend LLM benchmarking

## Qwen3.5 Models (GGUF)

- [unsloth/Qwen3.5-35B-A3B-GGUF](https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF) — Top pick for 64GB Strix Halo (MoE, 3B active)
- [unsloth/Qwen3.5-27B-GGUF](https://huggingface.co/unsloth/Qwen3.5-27B-GGUF) — Dense 27B
- [unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF](https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF) — Best for agentic/coding
- [Qwen3.5 Official](https://github.com/QwenLM/Qwen3.5) — Model family overview
- [Unsloth Dynamic 2.0](https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs) — Adaptive quantization methodology
- [Unsloth Studio](https://unsloth.ai/docs/new/studio) — Training + inference UI (beta)

## Agentic Evaluation

- [Inspect AI](https://github.com/UKGovernmentBEIS/inspect_ai) — All-in-one eval framework (HumanEval, BFCL, IFEval, GAIA)
- [EvalPlus](https://github.com/evalplus/evalplus) — HumanEval+ / MBPP+ with native ollama support
- [BigCodeBench](https://github.com/bigcode-project/bigcodebench) — 1,140 coding tasks across 139 libraries
- [BFCL](https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard) — Berkeley Function Calling Leaderboard
- [SWE-bench](https://github.com/princeton-nlp/SWE-bench) — Real GitHub issue resolution
- [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) — Optimized agentic framework for Qwen models

## System Tuning

- [RyzenAdj](https://github.com/FlyGoat/RyzenAdj) — Power management for Ryzen APUs (PPT/TDP control)
- [geohot/ztop](https://github.com/geohot/ztop) — Power monitoring for Strix Halo (discovered 60W HP limits)
- [ROCm Issue #5750](https://github.com/ROCm/ROCm/issues/5750) — GPU clocks stuck at idle on gfx1151
- [Mesa RADV Environment Variables](https://docs.mesa3d.org/envvars.html) — RADV_PERFTEST=nogttspill docs
- [Linux Kernel: amd-pstate](https://docs.kernel.org/admin-guide/pm/amd-pstate.html) — CPU power management

## llama.cpp Optimization

- [llama.cpp Speculative Decoding](https://github.com/ggml-org/llama.cpp/blob/master/docs/speculative.md) — Draft model setup
- [llama.cpp PR #20075](https://github.com/ggml-org/llama.cpp/pull/20075) — Fix speculative for Qwen3.5 MoE (pending)
- [llama.cpp PR #20700](https://github.com/ggml-org/llama.cpp/pull/20700) — Native MTP for Qwen3.5 (WIP)
- [llama.cpp PR #16827](https://github.com/ggml-org/llama.cpp/pull/16827) — rocWMMA tuned flash attention
- [llama.cpp Issue #12444](https://github.com/ggml-org/llama.cpp/issues/12444) — Hugepage support proposal

## AMD GPU Profiling

- [Radeon GPU Profiler (RGP)](https://gpuopen.com/rgp/) — Hardware-level Vulkan/HIP profiling
- [Radeon GPU Analyzer (RGA)](https://gpuopen.com/rga/) — Offline shader/kernel analysis