docs: add README, CLAUDE.md, AGENTS.md, and full docs/ suite
- README.md: project overview, quick start, command reference, workflow
- CLAUDE.md: AI safety rules, technical details, conventions
- AGENTS.md: agent workflows, file responsibility map, dependency matrix
- docs/architecture.md: script layers, data flow, unified memory, JSON schemas
- docs/optimization.md: step-by-step optimization walkthrough
- docs/benchmarking.md: methodology, test params, result interpretation
- docs/troubleshooting.md: common issues and fixes
- docs/references.md: centralized external links (single source of truth)
- docs/bios-vram-guide.md: add back-link to optimization workflow

Cross-linked non-redundantly: each doc owns one layer, others link to it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AGENTS.md (new file)
@@ -0,0 +1,62 @@
# AGENTS.md — Agent Workflows

Read [CLAUDE.md](CLAUDE.md) first for safety rules and technical context.

## Common Workflows
### Add a Detection Function

1. Add the function to `lib/detect.sh` following the `detect_*` naming convention
2. If it reads sysfs, use `$GPU_SYSFS` (auto-detected) with a `2>/dev/null` fallback
3. Wire it into `scripts/audit/quick-glance.sh` (display) and/or `scripts/audit/system-report.sh` (JSON output)
4. If it has an optimal value, add a check to `scripts/optimize/verify.sh`
5. Validate: `make audit` and `bin/audit --json | python3 -m json.tool`
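Steps 1 and 2 above can be sketched as follows. The function name and the sysfs attribute it reads are illustrative; only the `detect_*` shape and the `2>/dev/null` fallback mirror the conventions stated here.

```shell
#!/usr/bin/env bash
# Hypothetical detector: reads one sysfs attribute under $GPU_SYSFS and
# falls back to 0 when the file is missing (e.g. on non-AMD machines).
GPU_SYSFS="${GPU_SYSFS:-/sys/class/drm/card1/device}"

detect_gpu_busy_percent() {
    local value
    value=$(cat "$GPU_SYSFS/gpu_busy_percent" 2>/dev/null) || value=0
    echo "${value:-0}"
}

detect_gpu_busy_percent
```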
### Add a Benchmark Backend

1. Add the toolbox entry to the `BENCH_PATHS` associative array in both `scripts/benchmark/run-baseline.sh` and `scripts/benchmark/run-suite.sh`
2. Map the toolbox name → llama-bench binary path (Vulkan: `/usr/sbin/llama-bench`, ROCm: `/usr/local/bin/llama-bench`)
3. For ROCm backends, the existing `ENV_ARGS` logic already handles `ROCBLAS_USE_HIPBLASLT=1`
4. Add the toolbox to `refresh-toolboxes.sh` in the [toolboxes repo](https://github.com/kyuz0/amd-strix-halo-toolboxes)
5. Validate: `make benchmark-setup`, then `make benchmark`
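A minimal sketch of the `BENCH_PATHS` / `ENV_ARGS` wiring the steps above describe. The binary paths are the ones documented here; treat the array bodies as illustrative rather than the scripts' exact contents.

```shell
#!/usr/bin/env bash
# Toolbox name -> llama-bench path, as described in step 2.
declare -A BENCH_PATHS=(
    ["llama-vulkan-radv"]="/usr/sbin/llama-bench"
    ["llama-rocm-7.2"]="/usr/local/bin/llama-bench"
)

# Build ENV_ARGS as a real bash array so `toolbox run` receives separate
# words rather than one string subjected to word splitting.
build_env_args() {
    ENV_ARGS=()
    if [[ "$1" == llama-rocm-* ]]; then
        ENV_ARGS=(env ROCBLAS_USE_HIPBLASLT=1)
    fi
}

build_env_args "llama-rocm-7.2"
# Real call shape: toolbox run -c "$tb" -- "${ENV_ARGS[@]}" "${BENCH_PATHS[$tb]}" ...
echo "${ENV_ARGS[@]}"
```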
### Add an Optimization Script

1. Create `scripts/optimize/my-optimization.sh` sourcing `lib/common.sh` (and `detect.sh` / `format.sh` as needed)
2. Add a root check at the top if the script modifies system state: `[[ $EUID -ne 0 ]] && { log_error "Requires root"; exit 1; }`
3. Add a corresponding case to `bin/optimize`
4. Add a Makefile target
5. Add verification criteria to `scripts/optimize/verify.sh`
6. If the optimization is reversible, add rollback logic to `scripts/optimize/rollback.sh`
7. Document it in [docs/optimization.md](docs/optimization.md)
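A skeleton of steps 1 and 2 above. `log_error` is stubbed so the sketch runs standalone; the real one comes from `lib/common.sh`.

```shell
#!/usr/bin/env bash
# Skeleton for a new scripts/optimize/*.sh script.
set -euo pipefail

log_error() { echo "ERROR: $*" >&2; }   # stub; real one lives in lib/common.sh

# Root check for scripts that modify system state (step 2).
require_root() {
    [[ $EUID -ne 0 ]] && { log_error "Requires root"; exit 1; }
    return 0
}

main() {
    # In the real script: source lib/common.sh (and detect.sh/format.sh),
    # call require_root, then apply and verify the change.
    echo "optimization steps go here"
}

main "$@"
```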
### Add a Monitoring Metric

1. In `scripts/monitor/log-metrics.sh`, cache the sysfs path at startup (avoid per-sample globbing)
2. Read with `read -r var < "$SYSFS_PATH" 2>/dev/null || var=0` (no subshells in the hot loop)
3. Add the column to the CSV header and the `echo` line
4. Update the CSV schema in [docs/architecture.md](docs/architecture.md)
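The hot-loop pattern from steps 1 and 2 above, in miniature: the path is cached once at startup and samples use only the `read` builtin. The path shown is the usual amdgpu location and is illustrative.

```shell
#!/usr/bin/env bash
# Cached at startup (step 1); no globbing or subshell per sample (step 2).
BUSY_PATH="/sys/class/drm/card1/device/gpu_busy_percent"

sample() {
    local busy
    read -r busy < "$BUSY_PATH" 2>/dev/null || busy=0
    echo "$busy"
}

sample   # prints 0 on machines without the sysfs file
```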
## File Responsibility Map

| Want to change... | Touch these files |
|-------------------|-------------------|
| What `make audit` shows | `scripts/audit/quick-glance.sh`, `lib/detect.sh` |
| JSON audit output | `scripts/audit/system-report.sh`, `lib/detect.sh` |
| Dashboard layout | `scripts/monitor/dashboard.sh` |
| Metric collection | `scripts/monitor/log-metrics.sh` |
| Benchmark parameters | `scripts/benchmark/run-baseline.sh`, `run-suite.sh` |
| Result comparison | `scripts/benchmark/compare.sh` |
| Kernel params | `scripts/optimize/kernel-params.sh`, `lib/detect.sh` (recommended values) |
| Optimization checks | `scripts/optimize/verify.sh`, `scripts/audit/quick-glance.sh` |
| Shared utilities | `lib/common.sh` (logging), `lib/format.sh` (output), `lib/detect.sh` (hardware) |
| External links | `docs/references.md` (single source of truth) |
## Dependencies by Workflow

| Workflow | Requires |
|----------|----------|
| Audit | `bc`, `python3` |
| Monitor | `tmux`, `amdgpu_top` or `nvtop`, `btop` or `htop` |
| Benchmark | `toolbox`, `podman`, GGUF models in `data/models/`, `python3` |
| Optimize | `sudo`/root, `grubby` or `grub2-mkconfig`, `tuned-adm`, `python3` |
CLAUDE.md (new file)
@@ -0,0 +1,48 @@
# CLAUDE.md — AI Assistant Context

Optimization toolkit for AMD Strix Halo (Ryzen AI MAX+ 395, Radeon 8060S gfx1151, 64 GB unified memory) on Fedora 43. Pure bash scripts with inline Python for JSON handling and GRUB editing. See [README.md](README.md) for user-facing commands.

## Architecture

`bin/` dispatchers → `scripts/` implementations → `lib/` shared libraries. All scripts source libs in order: `common.sh` → `detect.sh` → `format.sh`. Runtime data goes to `data/` (gitignored). Full details in [docs/architecture.md](docs/architecture.md).
## Safety Rules

- **`scripts/optimize/kernel-params.sh`** modifies `/etc/default/grub` — requires root, backs up to `data/backups/` first. Always maintain the Python-with-env-vars pattern for GRUB editing (no shell variable interpolation into Python code).
- **`scripts/optimize/tuned-profile.sh`** and **`rollback.sh`** require root and save previous state for rollback.
- **`data/backups/`** contains GRUB backups and tuned profile snapshots — never delete these.
- Optimization scripts that require root check `$EUID` at the top and exit immediately if not root.
- All Python blocks receive data via environment variables (`os.environ`), never via shell interpolation into Python source. This prevents injection. **Do not revert to `'''$var'''` or `"$var"` patterns inside Python heredocs.**
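The env-var pattern from the first and last bullets, in miniature: the shell exports the value, the single-quoted heredoc guarantees nothing is interpolated into the Python source, and Python reads it back via `os.environ`. The variable names here are illustrative.

```shell
#!/usr/bin/env bash
# Data crosses into Python through the environment only. The <<'EOF'
# quoting means $NEW_CMDLINE is never expanded inside the Python text.
NEW_CMDLINE='quiet iommu=pt amdgpu.gttsize=61440'

GRUB_NEW_CMDLINE="$NEW_CMDLINE" python3 - <<'EOF'
import os
cmdline = os.environ["GRUB_NEW_CMDLINE"]
print(len(cmdline.split()))   # 3 parameters, regardless of their content
EOF
```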
## Key Technical Details

- **GPU sysfs**: Auto-detected by `find_gpu_card()` in `lib/detect.sh` (matches vendor `0x1002`). Falls back to the first card with `mem_info_vram_total`.
- **Memory recommendations**: `recommended_gttsize_mib()` in `detect.sh` computes from total physical RAM = visible RAM + dedicated VRAM (the VRAM is still physical memory). Floor at 1 GiB.
- **Kernel param detection**: `detect_kernel_param()` uses a word-boundary-anchored regex to avoid `iommu` matching `amd_iommu`.
- **Benchmark invocation**: `toolbox run -c NAME -- [env ROCBLAS_USE_HIPBLASLT=1] /path/to/llama-bench -ngl 99 -mmp 0 -fa 1 -r N`. `ENV_ARGS` is passed as a proper bash array (not string splitting).
- **llama-bench output**: Pipe-delimited table. The Python parser reads fixed column indices (parts[8]=test, parts[9]=t/s). Format changes upstream would break parsing.
- **ROCm for gfx1151**: `ROCBLAS_USE_HIPBLASLT=1`, `HSA_OVERRIDE_GFX_VERSION=11.5.1`.
- **Fedora GRUB**: Prefers `grubby` (BLS) over `grub2-mkconfig`. Both paths are handled.
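A sketch of the fixed-index parse described above. The sample line is synthetic; it only mimics the pipe-delimited shape so that, after splitting on `|`, the test name lands at `parts[8]` and the t/s cell at `parts[9]` as stated.

```shell
#!/usr/bin/env bash
# Synthetic line with the documented shape (not verbatim llama-bench output).
LINE='| qwen3-4b | 2.4 GiB | 4.0 B | Vulkan | 99 | 0 | 1 | pp512 | 548.18 +/- 1.59 |'

BENCH_LINE="$LINE" python3 - <<'EOF'
import os
parts = [p.strip() for p in os.environ["BENCH_LINE"].split("|")]
test, tps = parts[8], parts[9]
print(test, tps.split()[0])   # test name and mean t/s, dropping the +/- part
EOF
```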
## Conventions

- `set -euo pipefail` in every executable script
- `snake_case` function names, `UPPER_CASE` for constants and loop variables
- 4-space indentation, no tabs
- `lib/` files are sourced (no shebang enforcement), but include `#!/usr/bin/env bash` for editor support
- Colors gated on `[[ -t 1 ]]` (disabled when piped)
- `bc` used for float math; `python3` for JSON and GRUB editing only
## Validating Changes

```bash
make audit                                # Quick check — system status with pass/fail indicators
make verify                               # 9-point optimization checklist
bin/audit --json | python3 -m json.tool   # Verify JSON output is valid
```
## External Resources

All external links are centralized in [docs/references.md](docs/references.md). Key ones:

- AMD ROCm Strix Halo guide (kernel params, GTT configuration)
- Donato Capitella toolboxes (container images, benchmarks, VRAM estimator)
README.md (new file)
@@ -0,0 +1,114 @@
# Strix Halo Optimization Toolkit

Audit, monitor, benchmark, and optimize AMD Strix Halo integrated GPU systems for LLM inference workloads.

**Target hardware**: AMD Ryzen AI MAX+ 395 / Radeon 8060S (gfx1151) with 64 GB unified memory, on Fedora 43. Tested on an HP ZBook Ultra G1a.
## Quick Start

```bash
make audit               # See current system status and optimization score
make monitor-install     # Install amdgpu_top + btop
make benchmark-setup     # Create toolbox containers + download test model
make benchmark-baseline  # Capture performance before optimization
```
## System Status

`make audit` produces a single-screen overview:

```
=== Memory Allocation ===
[!!] VRAM (dedicated)   32.0 GiB — should be 0.5 GiB in BIOS
[!!] GTT (dynamic)      15.5 GiB — should be ~59.0 GiB with kernel params

=== Kernel Boot Parameters ===
[!!] iommu=pt           MISSING
[!!] amdgpu.gttsize     MISSING — recommended: 60416
[!!] ttm.pages_limit    MISSING — recommended: 15466496

=== Performance Profile ===
[!!] Tuned profile      throughput-performance — recommended: accelerator-performance

=== Optimization Score ===
2 / 8 checks passing
```

Each `[!!]` is an optimization opportunity. Run `make optimize` to address them.
## Commands

| Command | Description |
|---------|-------------|
| `make audit` | Quick system status (single screen) |
| `make audit-full` | Full system report (saved to `data/audits/`) |
| `make monitor` | Launch tmux monitoring dashboard |
| `make monitor-simple` | Launch amdgpu_top only |
| `make monitor-install` | Install monitoring tools (amdgpu_top, btop) |
| `make monitor-log` | Start background CSV metric logger |
| `make benchmark-setup` | Ensure toolboxes and test models are ready |
| `make benchmark-baseline` | Capture pre-optimization baseline |
| `make benchmark` | Run full benchmark suite |
| `make benchmark-compare` | Compare two runs (`BEFORE=dir AFTER=dir`) |
| `sudo make optimize` | Interactive optimization walkthrough |
| `sudo make optimize-kernel` | Configure kernel boot parameters |
| `sudo make optimize-tuned` | Switch to accelerator-performance profile |
| `make optimize-vram` | BIOS VRAM guidance + GTT verification |
| `make verify` | Post-optimization verification checklist |
| `sudo make rollback` | Rollback optimizations |
## Optimization Workflow

```
1. Audit          make audit
       │
2. Monitor        make monitor-install && make monitor
       │
3. Baseline       make benchmark-setup && make benchmark-baseline
       │
4. Optimize       sudo make optimize
       │            ├── tuned profile (instant, +5-8% pp)
       │            ├── kernel params (reboot required)
       │            └── BIOS VRAM (reboot + BIOS access)
       │
5. Verify         make verify
       │
6. Re-benchmark   make benchmark && make benchmark-compare BEFORE=... AFTER=...
```

See [docs/optimization.md](docs/optimization.md) for the full walkthrough with explanations.
## Project Structure

```
bin/       Entry points (audit, monitor, benchmark, optimize)
lib/       Shared bash libraries (common, detect, format)
scripts/   Implementation organized by function
configs/   Reference configuration templates
data/      Runtime output: audits, benchmarks, logs, backups (gitignored)
docs/      Technical documentation
```

See [docs/architecture.md](docs/architecture.md) for the full architecture, data flow, and JSON schemas.
## Requirements

- **OS**: Fedora 43 (tested); Fedora 42+ should also work
- **Hardware**: AMD Strix Halo (Ryzen AI MAX / MAX+) with RDNA 3.5 iGPU
- **Tools**: `bc`, `python3`, `tmux`, `podman`, `toolbox`
- **Optional**: `amdgpu_top` (installed via `make monitor-install`), `huggingface-cli` (for model downloads)
## Documentation

| Document | Contents |
|----------|----------|
| [docs/architecture.md](docs/architecture.md) | Script layers, data flow, unified memory model, JSON schemas |
| [docs/optimization.md](docs/optimization.md) | Step-by-step optimization walkthrough |
| [docs/benchmarking.md](docs/benchmarking.md) | Benchmark methodology, test params, result interpretation |
| [docs/bios-vram-guide.md](docs/bios-vram-guide.md) | HP ZBook BIOS configuration for VRAM |
| [docs/troubleshooting.md](docs/troubleshooting.md) | Common issues and fixes |
| [docs/references.md](docs/references.md) | External links: AMD docs, toolboxes, community resources |
## Contributing

AI assistants: see [CLAUDE.md](CLAUDE.md) for safety rules and technical context. Agent workflows are in [AGENTS.md](AGENTS.md).
docs/architecture.md (new file)
@@ -0,0 +1,118 @@
# Architecture

## Script Layers

```
bin/         User entry points (thin dispatchers)
    audit      → scripts/audit/
    monitor    → scripts/monitor/
    benchmark  → scripts/benchmark/
    optimize   → scripts/optimize/

scripts/     Implementation (sourcing lib/ for shared logic)
    audit/       System assessment
    monitor/     GPU/system monitoring + metrics logging
    benchmark/   llama-bench via toolbox containers
    optimize/    GRUB, tuned, BIOS guidance, verify, rollback

lib/         Shared bash libraries (sourced, not executed)
    common.sh    Logging, root checks, confirm prompts, paths
    detect.sh    Hardware/config detection from sysfs + system tools
    format.sh    Color output, human_bytes, status indicators, tables

configs/     Reference configuration templates
data/        Runtime output (gitignored)
docs/        Documentation
```

Every script sources libs in order: `common.sh` → `detect.sh` → `format.sh`. `format.sh` depends on color variables defined in `common.sh`.
## Data Flow

```
/sys/class/drm/card1/device/  ──→  lib/detect.sh  ──→  scripts/audit/
/proc/cpuinfo, /proc/meminfo  ──→   (detect_*)    ──→  scripts/monitor/
/proc/cmdline                 ──→                 ──→  scripts/optimize/
tuned-adm, rpm, lspci         ──→                 ──→  scripts/benchmark/
                                                         │
                                                         ▼
                                                       data/
                                                         ├── audits/*.json
                                                         ├── logs/*.csv
                                                         ├── baselines/*/summary.json
                                                         └── benchmarks/*/summary.json
```
## Unified Memory Model

AMD Strix Halo shares physical RAM between CPU and GPU. Two allocation mechanisms:

| Type | Description | Configuration |
|------|-------------|---------------|
| **VRAM (dedicated)** | Permanently reserved for the GPU framebuffer | BIOS setting (UMA Frame Buffer Size) |
| **GTT (dynamic)** | System RAM mapped into the GPU address space on demand | Kernel boot params: `amdgpu.gttsize`, `ttm.pages_limit` |

**Optimal for LLM workloads**: Minimize VRAM (0.5 GiB), maximize GTT (~60 GiB on a 64 GB system). The GPU borrows memory when needed and releases it when idle.
### Kernel Parameter Math (64 GB system)

```
Total physical RAM:   64 GiB
OS reserve:            4 GiB
Available for GTT:    60 GiB = 61440 MiB

amdgpu.gttsize  = 60 * 1024       = 61440    (MiB)
ttm.pages_limit = 60 * 1024 * 256 = 15728640 (4K pages)
iommu           = pt              (passthrough, lower latency)
```

The toolkit computes these dynamically via `recommended_gttsize_mib()` and `recommended_pages_limit()` in `lib/detect.sh`, based on detected total physical RAM (visible + VRAM).
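The arithmetic above can be sketched as two helpers. The names match the ones this doc cites, but the bodies are illustrative; the real functions detect total RAM rather than take it as an argument.

```shell
#!/usr/bin/env bash
# Sketch of the 64 GB math: reserve 4 GiB for the OS, convert to MiB,
# then to 4K pages (256 pages per MiB).
recommended_gttsize_mib() {
    local total_gib="$1" reserve_gib=4
    echo $(( (total_gib - reserve_gib) * 1024 ))
}

recommended_pages_limit() {
    echo $(( $1 * 256 ))   # 256 four-KiB pages per MiB
}

gtt=$(recommended_gttsize_mib 64)
echo "amdgpu.gttsize=$gtt ttm.pages_limit=$(recommended_pages_limit "$gtt")"
# → amdgpu.gttsize=61440 ttm.pages_limit=15728640
```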
### Sysfs Paths

| Path | Content |
|------|---------|
| `/sys/class/drm/card1/device/mem_info_vram_total` | Dedicated VRAM in bytes |
| `/sys/class/drm/card1/device/mem_info_vram_used` | VRAM currently in use |
| `/sys/class/drm/card1/device/mem_info_gtt_total` | GTT allocation in bytes |
| `/sys/class/drm/card1/device/mem_info_gtt_used` | GTT currently in use |
| `/sys/class/drm/card1/device/gpu_busy_percent` | GPU utilization 0-100 |
| `/sys/class/drm/card1/device/hwmon/hwmon*/temp1_input` | Temperature in millidegrees C |
| `/sys/class/drm/card1/device/hwmon/hwmon*/power1_average` | Power in microwatts |

The card number is auto-detected by `find_gpu_card()` (matches AMD vendor ID `0x1002`).
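The hwmon rows above need unit conversion: `temp1_input` arrives in millidegrees C and `power1_average` in microwatts. A sketch with a zero fallback so it degrades gracefully off-target; the concrete `hwmon0` path is illustrative (the real code globs `hwmon*`).

```shell
#!/usr/bin/env bash
# Read one sysfs attribute, falling back to 0 when it is absent.
read_sensor() {
    local value
    read -r value < "$1" 2>/dev/null || value=0
    echo "$value"
}

temp_mc=$(read_sensor /sys/class/drm/card1/device/hwmon/hwmon0/temp1_input)
power_uw=$(read_sensor /sys/class/drm/card1/device/hwmon/hwmon0/power1_average)
echo "temp_c=$(( temp_mc / 1000 )) power_w=$(( power_uw / 1000000 ))"
```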
## JSON Output Schemas

### system-state.json (from `audit --json`)

```json
{
  "timestamp": "20260325-120000",
  "hardware": { "cpu_model": "...", "cpu_cores": 16, "cpu_threads": 32, "gpu_name": "...", "gpu_device_id": "1586", "system_ram_kb": 32609248 },
  "memory": { "vram_total_bytes": 0, "vram_used_bytes": 0, "gtt_total_bytes": 0, "gtt_used_bytes": 0, "recommended_gttsize_mib": 0, "recommended_pages_limit": 0 },
  "kernel": { "version": "...", "cmdline": "...", "param_iommu": "", "param_gttsize": "", "param_pages_limit": "" },
  "firmware": "...", "tuned_profile": "...", "rocm_version": "...",
  "vulkan": { "driver": "...", "version": "..." },
  "sensors": { "gpu_temp_mc": 0, "gpu_power_uw": 0, "gpu_busy_pct": 0 },
  "toolboxes": [], "stacks": { "ollama": "...", "lmstudio": "...", "llamacpp": "...", "opencode": "..." }
}
```
### summary.json (from benchmark runs)

```json
{
  "results": [
    { "file": "model__backend__fa1.log", "model": "...", "size": "...", "backend": "Vulkan", "test": "pp512", "tokens_per_sec": 548.18, "raw": "548.18 +/- 1.59" }
  ]
}
```
### metrics.csv (from `monitor --log`)

```
timestamp,gpu_busy_pct,vram_used_mib,gtt_used_mib,gpu_temp_c,gpu_power_w,cpu_pct,ram_used_mib
```

Sampled every 2 seconds by default. Pure bash implementation (no subshell forks per sample).
docs/benchmarking.md (new file)
@@ -0,0 +1,94 @@
# Benchmarking

## What We Measure

All benchmarks use [llama-bench](https://github.com/ggml-org/llama.cpp) (part of llama.cpp) running inside toolbox containers. Two test types:

| Metric | Meaning | Test Params |
|--------|---------|-------------|
| **pp** (prompt processing) | How fast the model ingests input tokens | Default: 512 tokens |
| **tg** (token generation) | How fast the model produces output tokens | Default: 128 tokens |

Results are in **tokens/second (t/s)**. Higher is better.
## Test Parameters

### Standard Test

```
-ngl 99 -mmp 0 -fa 1 -r 5
```

- `-ngl 99` — all layers on GPU
- `-mmp 0` — disable memory mapping (`--no-mmap`)
- `-fa 1` — flash attention enabled
- `-r 5` — 5 repetitions for statistical confidence

### Long-Context Test

```
-ngl 99 -mmp 0 -fa 1 -p 2048 -n 32 -d 32768 -ub SIZE -r 3
```

- `-p 2048` — 2048 prompt tokens
- `-n 32` — generate 32 tokens
- `-d 32768` — 32K context window
- `-ub SIZE` — micro-batch size (512 for Vulkan, 2048 for ROCm)
- `-r 3` — 3 repetitions (long-context tests are slow)

The flash-attention, no-mmap, and full-GPU-offload flags (`-fa 1 -mmp 0 -ngl 99`) are **mandatory** on Strix Halo to avoid crashes.
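Outside the Makefile, the standard test maps to a direct toolbox invocation. A sketch that assembles the command as an argument array (the toolkit's own style) and prints it rather than running it; the model filename is illustrative.

```shell
#!/usr/bin/env bash
# Standard-test invocation against the RADV backend, built as an array.
MODEL="data/models/qwen3-4b-Q4_K_M.gguf"   # illustrative filename
CMD=(toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench
     -m "$MODEL" -ngl 99 -mmp 0 -fa 1 -r 5)
echo "${CMD[@]}"
```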
## Available Backends

| Backend | Container | Technology | Notes |
|---------|-----------|------------|-------|
| `llama-vulkan-radv` | Mesa RADV | Vulkan | Most stable, recommended default |
| `llama-vulkan-amdvlk` | AMDVLK | Vulkan | Fastest when it works, 2 GB buffer limit |
| `llama-rocm-6.4.4` | ROCm 6.4.4 | HIP | Proven stable |
| `llama-rocm-7.2` | ROCm 7.2 | HIP | Latest, compiler fixes applied |

Containers are from [kyuz0/amd-strix-halo-toolboxes](https://github.com/kyuz0/amd-strix-halo-toolboxes). Set them up with `make benchmark-setup`.
## Workflow

```bash
# 1. Setup (one-time)
make benchmark-setup

# 2. Capture baseline (before optimization)
make benchmark-baseline

# 3. After optimizing, run again
make benchmark   # or: bin/benchmark run --tag post-opt

# 4. Compare
make benchmark-compare BEFORE=data/baselines/20260325-120000 AFTER=data/benchmarks/post-opt-20260326-100000
```
## Result Format

Each run produces a directory under `data/baselines/` or `data/benchmarks/`:

```
TIMESTAMP/
    system-state.json   # Full system audit snapshot
    summary.json        # Parsed results (model, backend, test, t/s)
    metrics.csv         # GPU/CPU metrics during the run
    *.log               # Raw llama-bench output per backend+model+test
```

### Comparison Output

```
Backend      | Model    | Test  | Before    | After     | Delta
vulkan-radv  | qwen3-4b | pp512 | 548 t/s   | 612 t/s   | +11.7%
vulkan-radv  | qwen3-4b | tg128 | 13.9 t/s  | 15.2 t/s  | +9.4%
```

Configuration changes between runs (VRAM, GTT, kernel params, tuned profile) are shown if system-state.json differs.
## Recommended Test Models

| Size | Model | File | Disk | Use Case |
|------|-------|------|------|----------|
| Small | Qwen3-4B | Q4_K_M.gguf | ~3 GB | Quick smoke tests |
| Medium | Qwen3-14B | Q4_K_M.gguf | ~9 GB | Standard benchmarks |
| Large | Qwen3-32B | Q4_K_M.gguf | ~20 GB | Memory pressure tests |

Place models in `data/models/`. The VRAM estimator from the [toolboxes project](https://github.com/kyuz0/amd-strix-halo-toolboxes) (`gguf-vram-estimator.py`) can help plan which models fit.
docs/bios-vram-guide.md (modified)
@@ -1,5 +1,7 @@
 # BIOS VRAM Configuration — HP ZBook Ultra G1a
 
+> Part of the [optimization workflow](optimization.md). For the full context on unified memory, see [architecture.md](architecture.md).
+
 ## Why Change VRAM?
 
 AMD Strix Halo uses **unified memory** — the CPU and GPU share the same physical RAM. By default, the HP ZBook allocates **32 GB as dedicated VRAM**, permanently locking that memory away from the OS even when the GPU isn't using it.
docs/optimization.md (new file)
@@ -0,0 +1,84 @@
# Optimization Guide

Complete walkthrough for optimizing AMD Strix Halo for LLM workloads.

**Prerequisites**: Run `make audit` first to see your current state. Run `make benchmark-baseline` to capture pre-optimization performance numbers.
## Step 1: Tuned Profile (no reboot)

```bash
sudo make optimize-tuned
```

Switches from `throughput-performance` to `accelerator-performance`, which disables the deeper, higher-latency CPU idle states (C-states). Provides a 5-8% improvement in prompt processing throughput.

Takes effect immediately. The previous profile is saved for rollback.
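What the make target does, roughly, as a manual equivalent: record the active profile (the toolkit saves it for rollback), then switch. The `parse_active` helper is split out so the `Current active profile: X` line can be parsed even without tuned installed; the exact bookkeeping is the toolkit's own and is not reproduced here.

```shell
#!/usr/bin/env bash
# Extract the profile name from tuned-adm's "Current active profile: X" line.
parse_active() { awk -F': ' '{print $2}' <<<"$1"; }

if command -v tuned-adm >/dev/null 2>&1; then
    prev=$(parse_active "$(tuned-adm active)")
    echo "previous profile: $prev"
    sudo -n tuned-adm profile accelerator-performance 2>/dev/null \
        || echo "switching requires root"
else
    echo "tuned-adm not installed"
fi
```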
## Step 2: Kernel Boot Parameters (reboot required)

```bash
sudo make optimize-kernel
```

Adds three parameters to GRUB:

| Parameter | Value (64 GB) | Purpose |
|-----------|---------------|---------|
| `iommu=pt` | — | IOMMU passthrough, reduces memory access latency |
| `amdgpu.gttsize` | `60416` | Max GPU-addressable system RAM in MiB |
| `ttm.pages_limit` | `15466496` | Max pinnable 4K pages for GPU memory |

Values are computed dynamically based on your system's total physical RAM. The script backs up `/etc/default/grub` before modifying it.

See [docs/architecture.md](architecture.md) for the math behind these values.
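On Fedora's BLS setup, the grubby path boils down to a single command. A sketch using the 64 GB example values from the table above; the script computes its own values and performs a backup first, so this only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Assemble the grubby invocation with the 64 GB example values.
ARGS="iommu=pt amdgpu.gttsize=60416 ttm.pages_limit=15466496"
echo "grubby --update-kernel=ALL --args=\"$ARGS\""
# The real (root) invocation, after the script's automatic GRUB backup:
#   sudo grubby --update-kernel=ALL --args="$ARGS"
```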
## Step 3: BIOS VRAM Reduction (reboot + BIOS access)

```bash
make optimize-vram
```

This prints guidance — it cannot modify the BIOS directly. The goal is to reduce dedicated VRAM from 32 GB to 0.5 GB, freeing 31.5 GB back to the OS for dynamic GPU access via GTT.

See [docs/bios-vram-guide.md](bios-vram-guide.md) for the full BIOS walkthrough.

**Combine Steps 2 and 3 into a single reboot**: apply the kernel params, then reboot into BIOS (F10) to change VRAM, then boot normally.
## Step 4: Verify

```bash
make verify
```

Checks 9 criteria and reports a score. Target: 9/9.
## Step 5: Measure Impact

```bash
make benchmark
make benchmark-compare BEFORE=data/baselines/TIMESTAMP AFTER=data/benchmarks/TAG-TIMESTAMP
```

See [docs/benchmarking.md](benchmarking.md) for methodology and result interpretation.
## Expected Impact

| Optimization | pp512 Improvement | tg128 Improvement |
|--------------|-------------------|-------------------|
| Tuned profile | +5-8% | +2-3% |
| Kernel params + BIOS VRAM | +10-20% | +5-15% |
| **Combined** | **+15-25%** | **+8-18%** |

Numbers vary by model size and backend. Larger models see bigger gains from GTT expansion.
## Rollback

```bash
sudo make rollback
```

Restores the GRUB backup and the previous tuned profile. BIOS VRAM must be reverted manually (F10 → restore the previous UMA Frame Buffer Size).
## Troubleshooting

If anything goes wrong, see [docs/troubleshooting.md](troubleshooting.md).
docs/references.md (new file)
@@ -0,0 +1,49 @@
# External References

Single source of truth for all external links used across this project.

## AMD Official

- [ROCm Strix Halo Optimization Guide](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/strixhalo.html) — BIOS, kernel params, GTT/TTM configuration
- [ROCm System Optimization Index](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/index.html) — General ROCm tuning
- [ROCm Installation Guide (Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/) — Package installation
- [AMD SMI Documentation](https://rocm.docs.amd.com/projects/amdsmi/en/latest/) — GPU monitoring API
- [ROCm GitHub](https://github.com/ROCm/ROCm) — Source and issue tracker

## Strix Halo Toolboxes (Donato Capitella)

The most comprehensive community resource for Strix Halo LLM optimization.

- [strix-halo-toolboxes.com](https://strix-halo-toolboxes.com/) — Documentation, benchmarks, guides
- [GitHub: kyuz0/amd-strix-halo-toolboxes](https://github.com/kyuz0/amd-strix-halo-toolboxes) — Container images, benchmark scripts, VRAM estimator
- [Benchmark Results Viewer](https://kyuz0.github.io/amd-strix-halo-toolboxes/) — Interactive performance charts

## Community

- [Strix Halo Wiki — AI Capabilities](https://strixhalo.wiki/AI/AI_Capabilities_Overview) — Community benchmarks, model compatibility
- [Level1Techs Forum — HP G1a Guide](https://forum.level1techs.com/t/the-ultimate-arch-secureboot-guide-for-ryzen-ai-max-ft-hp-g1a-128gb-8060s-monster-laptop/230652) — Laptop-specific configuration
- [Framework Community — GPU Performance Tests](https://community.frame.work/t/amd-strix-halo-ryzen-ai-max-395-gpu-llm-performance-tests/72521) — Framework Desktop results
- [LLM Tracker — Strix Halo](https://llm-tracker.info/_TOORG/Strix-Halo) — Centralized performance database

## Other Strix Halo Repos

- [pablo-ross/strix-halo-gmktec-evo-x2](https://github.com/pablo-ross/strix-halo-gmktec-evo-x2) — GMKtec EVO X2 optimization
- [kyuz0/amd-strix-halo-llm-finetuning](https://github.com/kyuz0/amd-strix-halo-llm-finetuning) — Fine-tuning guides (Gemma-3, Qwen-3)

## Monitoring Tools

- [amdgpu_top](https://github.com/Umio-Yasuno/amdgpu_top) — Best AMD GPU monitor (TUI/GUI/JSON)
- [nvtop](https://github.com/Syllo/nvtop) — Cross-vendor GPU monitor
- [btop](https://github.com/aristocratos/btop) — System resource monitor

## LLM Inference

- [llama.cpp](https://github.com/ggml-org/llama.cpp) — LLM inference engine (Vulkan + ROCm)
- [ollama](https://ollama.com/) — LLM runtime with model management
- [vLLM](https://github.com/vllm-project/vllm) — High-throughput serving
- [llama-benchy](https://github.com/eugr/llama-benchy) — Multi-backend LLM benchmarking

## AMD GPU Profiling

- [Radeon GPU Profiler (RGP)](https://gpuopen.com/rgp/) — Hardware-level Vulkan/HIP profiling
- [Radeon GPU Analyzer (RGA)](https://gpuopen.com/rga/) — Offline shader/kernel analysis
96
docs/troubleshooting.md
Normal file
@@ -0,0 +1,96 @@
# Troubleshooting

## Firmware: linux-firmware 20251125 Causes ROCm Crashes

**Symptoms**: Arbitrary crashes, instability, or mysterious failures with ROCm workloads.

**Check**: `rpm -qa | grep linux-firmware`

**Fix**: Downgrade to 20251111 or upgrade to 20260110+. After changing firmware, regenerate the initramfs:

```bash
sudo dracut -f --kver $(uname -r)
```

The toolkit checks this automatically — `make audit` shows firmware status.
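The version check can be scripted; a minimal sketch (the `check_fw` helper is hypothetical, not part of the toolkit — the version strings mirror the known-bad and fixed builds above):

```bash
# check_fw VERSION: flag the linux-firmware build known to crash ROCm workloads.
check_fw() {
  case "$1" in
    *20251125*) echo "WARNING: linux-firmware $1 is the known-bad build" ;;
    *)          echo "linux-firmware $1 looks OK" ;;
  esac
}

# Query the installed package on an RPM-based system; fall back to "unknown".
installed=$(rpm -q --qf '%{VERSION}' linux-firmware 2>/dev/null || echo unknown)
check_fw "$installed"
```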
## amdgpu_top: Cargo Build Fails (gix-hash error)

**Symptoms**: `error: Please set either the sha1 or sha256 feature flag` during `cargo install amdgpu_top`.

**Cause**: Rust toolchain version incompatibility with the `gix-hash` dependency.

**Fix**: Use the pre-built RPM instead:

```bash
make monitor-install
```

The install script downloads the RPM from GitHub releases, bypassing cargo entirely.
## Toolbox GPU Access Failure

**Symptoms**: `llama-cli --list-devices` shows no GPU inside a toolbox container.

**Check**: Device mappings when creating the toolbox:

- Vulkan backends need: `--device /dev/dri`
- ROCm backends need: `--device /dev/dri --device /dev/kfd`

**Fix**: Recreate the toolbox with correct device flags. The [refresh-toolboxes.sh](https://github.com/kyuz0/amd-strix-halo-toolboxes) script handles this automatically.

Also ensure your user is in the `video` and `render` groups:

```bash
sudo usermod -aG video,render $USER
```
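The group check can be scripted too; a minimal sketch (the `check_groups` helper is hypothetical):

```bash
# check_groups "GROUP LIST": report whether video and render are both present.
check_groups() {
  missing=""
  for g in video render; do
    case " $1 " in *" $g "*) ;; *) missing="$missing $g" ;; esac
  done
  if [ -z "$missing" ]; then echo "group membership OK"
  else echo "missing groups:$missing (usermod -aG, then log out and back in)"
  fi
}

# id -nG prints the current user's groups space-separated.
check_groups "$(id -nG)"
```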
## GRUB Changes Not Taking Effect

**Symptoms**: After `make optimize-kernel` and reboot, `make audit` still shows missing params.

**Possible causes**:

1. **BLS (Boot Loader Spec)**: Modern Fedora uses BLS entries. The script uses `grubby` when available, but verify:

   ```bash
   grubby --info=ALL | grep args
   ```

2. **Wrong GRUB config path**: Check which config is actually used:

   ```bash
   cat /proc/cmdline      # what the kernel actually booted with
   cat /etc/default/grub  # what the script modified
   ```

3. **GRUB not regenerated**: Manually regenerate:

   ```bash
   sudo grub2-mkconfig -o /boot/grub2/grub.cfg
   ```
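All three checks boil down to one question: is the parameter on `/proc/cmdline`? A sketch (the `param_active` helper is hypothetical, and `amdgpu.gttsize` is only an example parameter):

```bash
# param_active PARAM CMDLINE: true if PARAM appears as a token or token prefix,
# so "amdgpu.gttsize" also matches "amdgpu.gttsize=131072".
param_active() {
  case " $2 " in *" $1"*) return 0 ;; *) return 1 ;; esac
}

# Compare the running kernel against the config the script edits.
# The parameter below is an example; substitute the one you added.
if param_active "amdgpu.gttsize" "$(cat /proc/cmdline 2>/dev/null)"; then
  echo "parameter is active on the running kernel"
else
  echo "parameter missing: inspect /etc/default/grub and 'grubby --info=DEFAULT'"
fi
```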
## Memory Unchanged After BIOS Change

**Symptoms**: Changed VRAM in BIOS but `make audit` still shows 32 GiB.

**Check**:

```bash
cat /sys/class/drm/card1/device/mem_info_vram_total
```

**Possible causes**:

- BIOS change not saved (verify by re-entering BIOS)
- Wrong BIOS setting modified (look for "UMA Frame Buffer Size", not "Shared Memory")
- Kernel params not applied (VRAM reduction requires kernel params to be useful)
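The sysfs value is in bytes; a small sketch to print every card's report in GiB (the `bytes_to_gib` helper is hypothetical, and the glob covers the fact that the card index varies between systems):

```bash
# bytes_to_gib BYTES: convert a raw sysfs byte count to GiB.
bytes_to_gib() { awk -v b="$1" 'BEGIN { printf "%.1f", b / (1024 ^ 3) }'; }

# card0 vs card1 differs between machines, so scan every card.
for f in /sys/class/drm/card*/device/mem_info_vram_total; do
  [ -r "$f" ] || continue
  echo "$f: $(bytes_to_gib "$(cat "$f")") GiB"
done
```

If the printed value still corresponds to 32 GiB after the BIOS change, the setting did not take.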
## Benchmark Failures

**Symptoms**: `make benchmark-baseline` reports "FAILED" for some backends.

**Common fixes**:

- Ensure model exists: `ls data/models/*.gguf`
- Check model fits in memory: small models (4B) for initial testing
- Try `llama-vulkan-radv` first (most stable backend)
- Check dmesg for GPU errors: `dmesg | tail -30`
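The first check can be sketched as a pre-flight helper (the `preflight` function is hypothetical; `data/models/` is this repo's model directory):

```bash
# preflight DIR: verify at least one .gguf model exists before benchmarking.
preflight() {
  set -- "$1"/*.gguf
  if [ -e "$1" ]; then
    echo "found model: $1"
  else
    echo "no .gguf model found; download a small (4B) model first"
    return 1
  fi
}

preflight data/models || true
```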
## Rollback

If optimization causes issues:

```bash
sudo make rollback
```

This restores the GRUB backup and previous tuned profile. BIOS changes must be reverted manually (F10 at boot). See [docs/optimization.md](optimization.md) for the full rollback procedure.