Strix Halo Optimization Toolkit

Audit, monitor, benchmark, and optimize AMD Strix Halo integrated GPU systems for LLM inference workloads.

Target hardware: AMD Ryzen AI MAX+ 395 / Radeon 8060S (gfx1151) with 64 GB unified memory, on Fedora 43. Tested on HP ZBook Ultra G1a.

Quick Start

make audit                # See current system status and optimization score
make monitor-install      # Install amdgpu_top + btop
make benchmark-setup      # Create toolbox containers + download test model
make benchmark-baseline   # Capture performance before optimization

System Status

make audit produces a single-screen overview:

=== Memory Allocation ===
  [!!] VRAM (dedicated)               32.0 GiB — should be 0.5 GiB in BIOS
  [!!] GTT (dynamic)                  15.5 GiB — should be ~59.0 GiB with kernel params

=== Kernel Boot Parameters ===
  [!!] iommu=pt                       MISSING
  [!!] amdgpu.gttsize                 MISSING — recommended: 60416
  [!!] ttm.pages_limit                MISSING — recommended: 15466496

=== Performance Profile ===
  [!!] Tuned profile                  throughput-performance — recommended: accelerator-performance

=== Optimization Score ===
  2 / 8 checks passing

Each [!!] is an optimization opportunity. Run make optimize to address them.
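The VRAM and GTT figures in the audit come from counters the amdgpu driver exposes under sysfs. A minimal sketch of that kind of check (the card index and file names are assumptions about a typical amdgpu setup; the real audit implementation in scripts/ may differ):

```shell
# Illustrative memory check in the style of `make audit`; the actual
# implementation may differ. card0 is an assumption -- pick your amdgpu card.
GPU="/sys/class/drm/card0/device"

to_gib() {                # bytes -> GiB with one decimal place
  awk -v b="$1" 'BEGIN { printf "%.1f", b / (1024 ^ 3) }'
}

for f in mem_info_vram_total mem_info_gtt_total; do
  # Skip silently on machines without an amdgpu card.
  [ -r "$GPU/$f" ] && echo "$f: $(to_gib "$(cat "$GPU/$f")") GiB"
done
```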

Commands

Command                                   Description
make audit                                Quick system status (single screen)
make audit-full                           Full system report (saved to data/audits/)
make monitor                              Launch tmux monitoring dashboard
make monitor-simple                       Launch amdgpu_top only
make monitor-install                      Install monitoring tools (amdgpu_top, btop)
make monitor-log                          Start background CSV metric logger
make benchmark-setup                      Ensure toolboxes and test models are ready
make benchmark-baseline                   Capture pre-optimization baseline
make benchmark                            Run full benchmark suite
make benchmark-compare                    Compare two runs (BEFORE=dir AFTER=dir)
sudo make optimize                        Interactive optimization walkthrough
sudo make optimize-kernel                 Configure kernel boot parameters
sudo make optimize-tuned                  Switch to accelerator-performance profile
make optimize-vram                        BIOS VRAM guidance + GTT verification
make verify                               Post-optimization verification checklist
sudo make rollback                        Rollback optimizations
make agentic-setup                        Install agentic eval frameworks (inspect-ai, evalplus)
make agentic-quick ARGS="--model NAME"    EvalPlus + IFEval (~1 hour)
make agentic-code ARGS="--model NAME"     Code generation evals (~2-3 hours)
make agentic-tooluse ARGS="--model NAME"  BFCL function calling eval (~1-2 hours)
make agentic-full ARGS="--model NAME"     All agentic evaluations (~5-6 hours)
make test                                 Run BATS test suite
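As a rough idea of what `make monitor-log` does, a background CSV logger only needs to poll a couple of sysfs counters and emit one row per sample. A hedged sketch (the sysfs path and column set are assumptions, not the actual logger):

```shell
# Hypothetical sketch of a CSV metric logger; the real monitor-log
# implementation lives in scripts/ and may track different columns.
SYSFS_GPU="/sys/class/drm/card0/device"   # assumed card index

read_metric() {           # print a sysfs metric, or NA if unavailable
  local f="$SYSFS_GPU/$1"
  { [ -r "$f" ] && cat "$f"; } || echo "NA"
}

log_sample() {            # one CSV row: unix timestamp, GPU busy %, VRAM bytes
  printf '%s,%s,%s\n' "$(date +%s)" \
    "$(read_metric gpu_busy_percent)" \
    "$(read_metric mem_info_vram_used)"
}

echo "timestamp,gpu_busy_pct,vram_used_bytes"   # CSV header
log_sample                                      # in practice: loop with sleep
```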

Optimization Workflow

1. Audit          make audit
      │
2. Monitor        make monitor-install && make monitor
      │
3. Baseline       make benchmark-setup && make benchmark-baseline
      │
4. Optimize       sudo make optimize
      │               ├── tuned profile  (instant, +5-8% pp)
      │               ├── kernel params  (reboot required)
      │               └── BIOS VRAM      (reboot + BIOS access)
      │
5. Verify         make verify
      │
6. Re-benchmark   make benchmark && make benchmark-compare BEFORE=... AFTER=...

See docs/optimization.md for the full walkthrough with explanations.
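The re-benchmark step boils down to a percent-change comparison of metrics such as tokens/s between the two run directories. A sketch of that arithmetic (illustrative only; the real benchmark-compare parses the saved result files):

```shell
# Signed percent change between a BEFORE and AFTER measurement,
# as a compare step might report it. Not the actual implementation.
pct_change() {            # usage: pct_change BEFORE AFTER
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%+.1f", (a - b) / b * 100 }'
}

pct_change 42.0 45.5      # e.g. tokens/s before vs after optimization
```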

Project Structure

bin/            Entry points (audit, monitor, benchmark, optimize)
lib/            Shared bash libraries (common, detect, format)
scripts/        Implementation organized by function
configs/        Reference configuration (grub-cmdline.conf with recommended kernel params)
data/           Runtime output: audits, benchmarks, logs, backups (gitignored)
docs/           Technical documentation

See docs/architecture.md for the full architecture, data flow, and JSON schemas.
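For orientation, the kernel parameters referenced by configs/grub-cmdline.conf plausibly match the values the audit reports (gttsize in MiB, pages_limit in 4 KiB pages, i.e. ~59 GiB). A sketch, not the file verbatim:

```shell
# Sketch of the recommended kernel command line, using the values shown in
# the audit output above; the actual configs/grub-cmdline.conf may differ.
# Appended to GRUB_CMDLINE_LINUX in /etc/default/grub before rebuilding grub.
GRUB_CMDLINE_LINUX="... iommu=pt amdgpu.gttsize=60416 ttm.pages_limit=15466496"
```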

Requirements

  • OS: Fedora 43 (tested). Requires kernel >= 6.18.4
  • Hardware: AMD Strix Halo (Ryzen AI MAX / MAX+) with RDNA 3.5 iGPU
  • Tools: bc, python3, tmux, podman, toolbox
  • Optional: amdgpu_top (installed via make monitor-install), huggingface-cli (for model downloads)

Documentation

Document                        Contents
docs/architecture.md            Script layers, data flow, unified memory model, JSON schemas
docs/optimization.md            Step-by-step optimization walkthrough
docs/benchmarking.md            Benchmark methodology, test params, result interpretation
docs/bios-vram-guide.md         HP ZBook BIOS configuration for VRAM
docs/troubleshooting.md         Common issues and fixes
docs/model-recommendations.md   Qwen3.5 models, quantization, memory planning
docs/agentic-benchmarks.md      Agentic evaluation frameworks and methodology
docs/references.md              External links: AMD docs, toolboxes, community resources

Contributing

AI assistants: see CLAUDE.md for safety rules and technical context. Agent workflows are in AGENTS.md.
