# Strix Halo Optimization Toolkit

Audit, monitor, benchmark, and optimize AMD Strix Halo integrated GPU systems for LLM inference workloads.

Target hardware: AMD Ryzen AI MAX+ 395 / Radeon 8060S (gfx1151) with 64 GB unified memory, on Fedora 43. Tested on an HP ZBook Ultra G1a.
## Quick Start

```sh
make audit              # See current system status and optimization score
make monitor-install    # Install amdgpu_top + btop
make benchmark-setup    # Create toolbox containers + download test model
make benchmark-baseline # Capture performance before optimization
```
## System Status

`make audit` produces a single-screen overview:

```
=== Memory Allocation ===
[!!] VRAM (dedicated)  32.0 GiB — should be 0.5 GiB in BIOS
[!!] GTT (dynamic)     15.5 GiB — should be ~59.0 GiB with kernel params

=== Kernel Boot Parameters ===
[!!] iommu=pt MISSING
[!!] amdgpu.gttsize MISSING — recommended: 60416
[!!] ttm.pages_limit MISSING — recommended: 15466496

=== Performance Profile ===
[!!] Tuned profile throughput-performance — recommended: accelerator-performance

=== Optimization Score ===
2 / 8 checks passing
```

Each `[!!]` line is an optimization opportunity. Run `make optimize` to address them.
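The VRAM and GTT figures come from the amdgpu sysfs node. A minimal sketch of that kind of check (the `mem_info_*` attributes are standard amdgpu sysfs files, but the card index varies per system, and the helper names here are hypothetical, not the toolkit's actual code):

```shell
# Hypothetical sketch of the kind of check `make audit` performs: read the
# VRAM and GTT pool sizes the amdgpu driver exposes in sysfs.

mem_gib() {
    # Format a byte count as a one-decimal GiB string.
    awk -v b="$1" 'BEGIN { printf "%.1f GiB", b / (1024 * 1024 * 1024) }'
}

report_pools() {
    dev="$1"  # e.g. /sys/class/drm/card1/device (card index varies per system)
    [ -r "$dev/mem_info_vram_total" ] || { echo "no amdgpu node at $dev" >&2; return 1; }
    echo "VRAM (dedicated): $(mem_gib "$(cat "$dev/mem_info_vram_total")")"
    echo "GTT (dynamic):    $(mem_gib "$(cat "$dev/mem_info_gtt_total")")"
}

report_pools /sys/class/drm/card1/device || echo "(no amdgpu device at that path)"
```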
## Commands

| Command | Description |
|---|---|
| `make audit` | Quick system status (single screen) |
| `make audit-full` | Full system report (saved to `data/audits/`) |
| `make monitor` | Launch tmux monitoring dashboard |
| `make monitor-simple` | Launch amdgpu_top only |
| `make monitor-install` | Install monitoring tools (amdgpu_top, btop) |
| `make monitor-log` | Start background CSV metric logger |
| `make benchmark-setup` | Ensure toolboxes and test models are ready |
| `make benchmark-baseline` | Capture pre-optimization baseline |
| `make benchmark` | Run full benchmark suite |
| `make benchmark-compare` | Compare two runs (`BEFORE=dir AFTER=dir`) |
| `sudo make optimize` | Interactive optimization walkthrough |
| `sudo make optimize-kernel` | Configure kernel boot parameters |
| `sudo make optimize-tuned` | Switch to accelerator-performance profile |
| `make optimize-vram` | BIOS VRAM guidance + GTT verification |
| `make verify` | Post-optimization verification checklist |
| `sudo make rollback` | Roll back optimizations |
| `make agentic-setup` | Install agentic eval frameworks (inspect-ai, evalplus) |
| `make agentic-quick ARGS="--model NAME"` | EvalPlus + IFEval (~1 hour) |
| `make agentic-code ARGS="--model NAME"` | Code generation evals (~2-3 hours) |
| `make agentic-tooluse ARGS="--model NAME"` | BFCL function-calling eval (~1-2 hours) |
| `make agentic-full ARGS="--model NAME"` | All agentic evaluations (~5-6 hours) |
| `make test` | Run BATS test suite |
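At its core, a before/after comparison like `make benchmark-compare` reduces to percent-change arithmetic on the measured tokens/sec. A tiny sketch (the `pct_change` helper and the sample numbers are hypothetical, not taken from the toolkit's code):

```shell
# Hypothetical sketch of the arithmetic behind a before/after comparison:
# signed percent change in tokens/sec between two runs.
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%+.1f%%", (after - before) / before * 100 }'
}

pct_change 41.2 44.5   # hypothetical pp t/s before vs. after tuning -> +8.0%
```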
## Optimization Workflow

```
1. Audit          make audit
        │
2. Monitor        make monitor-install && make monitor
        │
3. Baseline       make benchmark-setup && make benchmark-baseline
        │
4. Optimize       sudo make optimize
        │   ├── tuned profile (instant, +5-8% pp)
        │   ├── kernel params (reboot required)
        │   └── BIOS VRAM (reboot + BIOS access)
        │
5. Verify         make verify
        │
6. Re-benchmark   make benchmark && make benchmark-compare BEFORE=... AFTER=...
```

See docs/optimization.md for the full walkthrough with explanations.
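On Fedora, the kernel-parameter step can also be applied by hand with `grubby`, which edits the boot cmdline for installed kernels. A sketch using the values the audit recommends (note that `sudo make optimize-kernel` is the supported path, and the values it writes for your system may differ):

```shell
# Manual sketch of the kernel-parameter change on Fedora.
# Values match the recommendations in the audit output above.
ARGS="iommu=pt amdgpu.gttsize=60416 ttm.pages_limit=15466496"
echo "will add: $ARGS"
# sudo grubby --update-kernel=ALL --args="$ARGS"   # then reboot to apply
```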
## Project Structure

```
bin/      Entry points (audit, monitor, benchmark, optimize)
lib/      Shared bash libraries (common, detect, format)
scripts/  Implementation organized by function
configs/  Reference configuration (grub-cmdline.conf with recommended kernel params)
data/     Runtime output: audits, benchmarks, logs, backups (gitignored)
docs/     Technical documentation
```

See docs/architecture.md for the full architecture, data flow, and JSON schemas.
## Requirements

- OS: Fedora 43 (tested). Requires kernel >= 6.18.4
- Hardware: AMD Strix Halo (Ryzen AI MAX / MAX+) with RDNA 3.5 iGPU
- Tools: `bc`, `python3`, `tmux`, `podman`, `toolbox`
- Optional: `amdgpu_top` (installed via `make monitor-install`), `huggingface-cli` (for model downloads)
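A minimal preflight sketch for the tool list above (the `check_tools` helper is hypothetical, not part of the toolkit):

```shell
# Hypothetical preflight check: report any listed tool missing from PATH.
check_tools() {
    rc=0
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || { echo "missing: $t"; rc=1; }
    done
    return "$rc"
}

check_tools bc python3 tmux podman toolbox \
    && echo "all required tools present" \
    || echo "install the missing tools first"
```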
## Documentation
| Document | Contents |
|---|---|
| docs/architecture.md | Script layers, data flow, unified memory model, JSON schemas |
| docs/optimization.md | Step-by-step optimization walkthrough |
| docs/benchmarking.md | Benchmark methodology, test params, result interpretation |
| docs/bios-vram-guide.md | HP ZBook BIOS configuration for VRAM |
| docs/troubleshooting.md | Common issues and fixes |
| docs/model-recommendations.md | Qwen3.5 models, quantization, memory planning |
| docs/agentic-benchmarks.md | Agentic evaluation frameworks and methodology |
| docs/references.md | External links: AMD docs, toolboxes, community resources |
## Contributing
AI assistants: see CLAUDE.md for safety rules and technical context. Agent workflows are in AGENTS.md.