# Strix Halo Optimization Toolkit

Audit, monitor, benchmark, and optimize AMD Strix Halo integrated GPU systems for LLM inference workloads.

Target hardware: AMD Ryzen AI MAX+ 395 / Radeon 8060S (gfx1151) with 64 GB unified memory, on Fedora 43. Tested on an HP ZBook Ultra G1a.
## Quick Start

```sh
make audit              # See current system status and optimization score
make monitor-install    # Install amdgpu_top + btop
make benchmark-setup    # Create toolbox containers + download test model
make benchmark-baseline # Capture performance before optimization
```
## System Status

`make audit` produces a single-screen overview:

```
=== Memory Allocation ===
[!!] VRAM (dedicated)  32.0 GiB — should be 0.5 GiB in BIOS
[!!] GTT (dynamic)     15.5 GiB — should be ~59.0 GiB with kernel params

=== Kernel Boot Parameters ===
[!!] iommu=pt MISSING
[!!] amdgpu.gttsize MISSING — recommended: 60416
[!!] ttm.pages_limit MISSING — recommended: 15466496

=== Performance Profile ===
[!!] Tuned profile throughput-performance — recommended: accelerator-performance

=== Optimization Score ===
2 / 8 checks passing
```

Each `[!!]` line is an optimization opportunity. Run `make optimize` to address them.
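The VRAM and GTT figures come from the amdgpu sysfs node. A minimal sketch of that kind of check (the `mem_info_*` attributes are standard amdgpu sysfs files, but the card index varies per system, and the helper names here are hypothetical, not the toolkit's actual code):

```shell
# Hypothetical sketch of the kind of check `make audit` performs: read the
# VRAM and GTT pool sizes the amdgpu driver exposes in sysfs.

mem_gib() {
    # Format a byte count as a one-decimal GiB string.
    awk -v b="$1" 'BEGIN { printf "%.1f GiB", b / (1024 * 1024 * 1024) }'
}

report_pools() {
    dev="$1"  # e.g. /sys/class/drm/card1/device (card index varies per system)
    [ -r "$dev/mem_info_vram_total" ] || { echo "no amdgpu node at $dev" >&2; return 1; }
    echo "VRAM (dedicated): $(mem_gib "$(cat "$dev/mem_info_vram_total")")"
    echo "GTT (dynamic):    $(mem_gib "$(cat "$dev/mem_info_gtt_total")")"
}

report_pools /sys/class/drm/card1/device || echo "(no amdgpu device at that path)"
```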
## Commands

| Command | Description |
|---|---|
| `make audit` | Quick system status (single screen) |
| `make audit-full` | Full system report (saved to `data/audits/`) |
| `make monitor` | Launch tmux monitoring dashboard |
| `make monitor-simple` | Launch amdgpu_top only |
| `make monitor-install` | Install monitoring tools (amdgpu_top, btop) |
| `make monitor-log` | Start background CSV metric logger |
| `make benchmark-setup` | Ensure toolboxes and test models are ready |
| `make benchmark-baseline` | Capture pre-optimization baseline |
| `make benchmark` | Run full benchmark suite |
| `make benchmark-compare` | Compare two runs (`BEFORE=dir AFTER=dir`) |
| `sudo make optimize` | Interactive optimization walkthrough |
| `sudo make optimize-kernel` | Configure kernel boot parameters |
| `sudo make optimize-tuned` | Switch to accelerator-performance profile |
| `make optimize-vram` | BIOS VRAM guidance + GTT verification |
| `make verify` | Post-optimization verification checklist |
| `sudo make rollback` | Roll back optimizations |
| `make agentic-setup` | Install agentic eval frameworks (inspect-ai, evalplus) |
| `make agentic-quick ARGS="--model NAME"` | EvalPlus + IFEval (~1 hour) |
| `make agentic-code ARGS="--model NAME"` | Code generation evals (~2-3 hours) |
| `make agentic-tooluse ARGS="--model NAME"` | BFCL function-calling eval (~1-2 hours) |
| `make agentic-full ARGS="--model NAME"` | All agentic evaluations (~5-6 hours) |
| `make test` | Run BATS test suite |
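At its core, a before/after comparison like `make benchmark-compare` reduces to percent-change arithmetic on the measured tokens/sec. A tiny sketch (the `pct_change` helper and the sample numbers are hypothetical, not taken from the toolkit's code):

```shell
# Hypothetical sketch of the arithmetic behind a before/after comparison:
# signed percent change in tokens/sec between two runs.
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%+.1f%%", (after - before) / before * 100 }'
}

pct_change 41.2 44.5   # hypothetical pp t/s before vs. after tuning -> +8.0%
```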
## Optimization Workflow

```
1. Audit          make audit
        │
2. Monitor        make monitor-install && make monitor
        │
3. Baseline       make benchmark-setup && make benchmark-baseline
        │
4. Optimize       sudo make optimize
        │   ├── tuned profile (instant, +5-8% pp)
        │   ├── kernel params (reboot required)
        │   └── BIOS VRAM (reboot + BIOS access)
        │
5. Verify         make verify
        │
6. Re-benchmark   make benchmark && make benchmark-compare BEFORE=... AFTER=...
```

See docs/optimization.md for the full walkthrough with explanations.
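On Fedora, the kernel-parameter step can also be applied by hand with `grubby`, which edits the boot cmdline for installed kernels. A sketch using the values the audit recommends (note that `sudo make optimize-kernel` is the supported path, and the values it writes for your system may differ):

```shell
# Manual sketch of the kernel-parameter change on Fedora.
# Values match the recommendations in the audit output above.
ARGS="iommu=pt amdgpu.gttsize=60416 ttm.pages_limit=15466496"
echo "will add: $ARGS"
# sudo grubby --update-kernel=ALL --args="$ARGS"   # then reboot to apply
```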
## Project Structure

```
bin/      Entry points (audit, monitor, benchmark, optimize)
lib/      Shared bash libraries (common, detect, format)
scripts/  Implementation organized by function
configs/  Reference configuration (grub-cmdline.conf with recommended kernel params)
data/     Runtime output: audits, benchmarks, logs, backups (gitignored)
docs/     Technical documentation
```

See docs/architecture.md for the full architecture, data flow, and JSON schemas.
## Requirements

- OS: Fedora 43 (tested). Requires kernel >= 6.18.4
- Hardware: AMD Strix Halo (Ryzen AI MAX / MAX+) with RDNA 3.5 iGPU
- Tools: `bc`, `python3`, `tmux`, `podman`, `toolbox`
- Optional: `amdgpu_top` (installed via `make monitor-install`), `huggingface-cli` (for model downloads)
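A minimal preflight sketch for the tool list above (the `check_tools` helper is hypothetical, not part of the toolkit):

```shell
# Hypothetical preflight check: report any listed tool missing from PATH.
check_tools() {
    rc=0
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || { echo "missing: $t"; rc=1; }
    done
    return "$rc"
}

check_tools bc python3 tmux podman toolbox \
    && echo "all required tools present" \
    || echo "install the missing tools first"
```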
## Documentation
| Document | Contents |
|---|---|
| docs/architecture.md | Script layers, data flow, unified memory model, JSON schemas |
| docs/optimization.md | Step-by-step optimization walkthrough |
| docs/benchmarking.md | Benchmark methodology, test params, result interpretation |
| docs/bios-vram-guide.md | HP ZBook BIOS configuration for VRAM |
| docs/troubleshooting.md | Common issues and fixes |
| docs/model-recommendations.md | Qwen3.5 models, quantization, memory planning |
| docs/agentic-benchmarks.md | Agentic evaluation frameworks and methodology |
| docs/references.md | External links: AMD docs, toolboxes, community resources |
## Contributing
AI assistants: see CLAUDE.md for safety rules and technical context. Agent workflows are in AGENTS.md.