# Strix Halo Optimization Toolkit
Audit, monitor, benchmark, and optimize AMD Strix Halo integrated GPU systems for LLM inference workloads.
**Target hardware**: AMD Ryzen AI MAX+ 395 / Radeon 8060S (gfx1151) with 64 GB unified memory, on Fedora 43. Tested on HP ZBook Ultra G1a.
## Quick Start
```bash
make audit              # See current system status and optimization score
make monitor-install    # Install amdgpu_top + btop
make benchmark-setup    # Create toolbox containers + download test model
make benchmark-baseline # Capture performance before optimization
```
## System Status
`make audit` produces a single-screen overview:
```
=== Memory Allocation ===
[!!] VRAM (dedicated) 32.0 GiB — should be 0.5 GiB in BIOS
[!!] GTT (dynamic) 15.5 GiB — should be ~59.0 GiB with kernel params
=== Kernel Boot Parameters ===
[!!] iommu=pt MISSING
[!!] amdgpu.gttsize MISSING — recommended: 60416
[!!] ttm.pages_limit MISSING — recommended: 15466496
=== Performance Profile ===
[!!] Tuned profile throughput-performance — recommended: accelerator-performance
=== Optimization Score ===
2 / 8 checks passing
```
Each `[!!]` is an optimization opportunity. Run `make optimize` to address them.
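The two kernel values the audit recommends describe the same GTT pool in different units: `amdgpu.gttsize` is in MiB, `ttm.pages_limit` in 4 KiB pages, and 60416 MiB is the ~59 GiB that leaves roughly 4.5 GiB of the 64 GiB unified memory for the OS. A quick sanity check of that relationship:

```shell
# amdgpu.gttsize (MiB) and ttm.pages_limit (4 KiB pages) must describe
# the same pool: convert MiB -> KiB -> 4 KiB pages.
gttsize_mib=60416
pages_limit=$(( gttsize_mib * 1024 / 4 ))
echo "$pages_limit"                  # 15466496, matching the audit output
echo "$(( gttsize_mib / 1024 )) GiB" # 59 GiB
```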
## Commands
| Command | Description |
|---------|-------------|
| `make audit` | Quick system status (single screen) |
| `make audit-full` | Full system report (saved to data/audits/) |
| `make monitor` | Launch tmux monitoring dashboard |
| `make monitor-simple` | Launch amdgpu_top only |
| `make monitor-install` | Install monitoring tools (amdgpu_top, btop) |
| `make monitor-log` | Start background CSV metric logger |
| `make benchmark-setup` | Ensure toolboxes and test models are ready |
| `make benchmark-baseline` | Capture pre-optimization baseline |
| `make benchmark` | Run full benchmark suite |
| `make benchmark-compare` | Compare two runs (`BEFORE=dir AFTER=dir`) |
| `sudo make optimize` | Interactive optimization walkthrough |
| `sudo make optimize-kernel` | Configure kernel boot parameters |
| `sudo make optimize-tuned` | Switch to accelerator-performance profile |
| `make optimize-vram` | BIOS VRAM guidance + GTT verification |
| `make verify` | Post-optimization verification checklist |
| `sudo make rollback` | Rollback optimizations |
| `make agentic-setup` | Install agentic eval frameworks (inspect-ai, evalplus) |
| `make agentic-quick ARGS="--model NAME"` | EvalPlus + IFEval (~1 hour) |
| `make agentic-code ARGS="--model NAME"` | Code generation evals (~2-3 hours) |
| `make agentic-tooluse ARGS="--model NAME"` | BFCL function calling eval (~1-2 hours) |
| `make agentic-full ARGS="--model NAME"` | All agentic evaluations (~5-6 hours) |
| `make test` | Run BATS test suite |
## Optimization Workflow
```
1. Audit         make audit
2. Monitor       make monitor-install && make monitor
3. Baseline      make benchmark-setup && make benchmark-baseline
4. Optimize      sudo make optimize
   ├── tuned profile (instant, +5-8% prompt processing)
   ├── kernel params (reboot required)
   └── BIOS VRAM (reboot + BIOS access)
5. Verify        make verify
6. Re-benchmark  make benchmark && make benchmark-compare BEFORE=... AFTER=...
```
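After the kernel-parameter step and a reboot, a minimal sketch of the kind of check `make verify` performs is to scan the kernel command line for the recommended parameters. Here `cmdline` is a hardcoded sample string for illustration; on a real system you would use `cmdline=$(cat /proc/cmdline)`:

```shell
# Sample command line standing in for /proc/cmdline (assumption for
# illustration); ttm.pages_limit is deliberately missing here.
cmdline="BOOT_IMAGE=/vmlinuz-6.18.4 root=/dev/mapper/root iommu=pt amdgpu.gttsize=60416"
for p in "iommu=pt" "amdgpu.gttsize=" "ttm.pages_limit="; do
  case "$cmdline" in
    *"$p"*) echo "OK   $p" ;;
    *)      echo "MISS $p" ;;
  esac
done
# -> OK   iommu=pt
#    OK   amdgpu.gttsize=
#    MISS ttm.pages_limit=
```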
See [docs/optimization.md](docs/optimization.md) for the full walkthrough with explanations.
## Project Structure
```
bin/      Entry points (audit, monitor, benchmark, optimize)
lib/      Shared bash libraries (common, detect, format)
scripts/  Implementation organized by function
configs/  Reference configuration (grub-cmdline.conf with recommended kernel params)
data/     Runtime output: audits, benchmarks, logs, backups (gitignored)
docs/     Technical documentation
```
See [docs/architecture.md](docs/architecture.md) for the full architecture, data flow, and JSON schemas.
## Requirements
- **OS**: Fedora 43 (tested). Requires kernel >= 6.18.4
- **Hardware**: AMD Strix Halo (Ryzen AI MAX / MAX+) with RDNA 3.5 iGPU
- **Tools**: `bc`, `python3`, `tmux`, `podman`, `toolbox`
- **Optional**: `amdgpu_top` (installed via `make monitor-install`), `huggingface-cli` (for model downloads)
## Documentation
| Document | Contents |
|----------|----------|
| [docs/architecture.md](docs/architecture.md) | Script layers, data flow, unified memory model, JSON schemas |
| [docs/optimization.md](docs/optimization.md) | Step-by-step optimization walkthrough |
| [docs/benchmarking.md](docs/benchmarking.md) | Benchmark methodology, test params, result interpretation |
| [docs/bios-vram-guide.md](docs/bios-vram-guide.md) | HP ZBook BIOS configuration for VRAM |
| [docs/troubleshooting.md](docs/troubleshooting.md) | Common issues and fixes |
| [docs/model-recommendations.md](docs/model-recommendations.md) | Qwen3.5 models, quantization, memory planning |
| [docs/agentic-benchmarks.md](docs/agentic-benchmarks.md) | Agentic evaluation frameworks and methodology |
| [docs/references.md](docs/references.md) | External links: AMD docs, toolboxes, community resources |
## Contributing
AI assistants: see [CLAUDE.md](CLAUDE.md) for safety rules and technical context. Agent workflows are in [AGENTS.md](AGENTS.md).