# Optimization Guide

Complete walkthrough for optimizing AMD Strix Halo for LLM workloads.

**Prerequisites**: Run `make audit` first to see your current state. Run `make benchmark-baseline` to capture pre-optimization performance numbers.

## Step 1: Tuned Profile (no reboot)

```bash
sudo make optimize-tuned
```

Switches from `throughput-performance` to `accelerator-performance`, which disables higher-latency CPU STOP states. Provides 5-8% improvement in prompt processing throughput.

Takes effect immediately. Previous profile is saved for rollback.

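
To confirm by hand that the profile switch took effect, the output of `tuned-adm active` can be parsed. The sketch below assumes the usual `Current active profile: <name>` output format. The helper name `active_profile_from` is hypothetical, and this is not necessarily how `make optimize-tuned` performs its own check.

```bash
# Extract the profile name from `tuned-adm active` output.
# Assumed input format: "Current active profile: <name>"
active_profile_from() {
    printf '%s\n' "$1" | sed -n 's/^Current active profile: //p'
}

# Usage on a live system (requires the tuned service):
#   active_profile_from "$(tuned-adm active)"
```

After Step 1, the printed name should be `accelerator-performance`.
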
## Step 2: Kernel Boot Parameters (reboot required)

```bash
sudo make optimize-kernel
```

Adds three parameters to the kernel command line in GRUB:

| Parameter | Value (64 GB) | Purpose |
|-----------|--------------|---------|
| `iommu=pt` | n/a | IOMMU passthrough, reduces memory access latency |
| `amdgpu.gttsize` | `60416` | Max GPU-addressable system RAM in MiB |
| `ttm.pages_limit` | `15466496` | Max pinnable 4K pages for GPU memory |

The `amdgpu.gttsize` and `ttm.pages_limit` values are computed dynamically from your system's total physical RAM; the table shows the result for a 64 GB machine. The script backs up `/etc/default/grub` before modifying it.

See [docs/architecture.md](architecture.md) for the math behind these values.

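
The two numeric values in the table are consistent with each other: `ttm.pages_limit` is the same memory budget as `amdgpu.gttsize`, expressed in 4 KiB pages instead of MiB. The sketch below shows only that unit conversion. How the script derives the MiB figure from total RAM is covered in architecture.md and is not reproduced here. The function name `gtt_mib_to_pages` is hypothetical.

```bash
# Convert a GTT size in MiB to a count of 4 KiB pages.
# 1 MiB = 1024 KiB = 256 pages of 4 KiB each.
gtt_mib_to_pages() {
    echo $(( $1 * 1024 / 4 ))
}

gtt_mib_to_pages 60416   # prints 15466496, the ttm.pages_limit value in the table
```
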
## Step 3: BIOS VRAM Reduction (reboot + BIOS access)

```bash
make optimize-vram
```

This only prints guidance; it cannot modify the BIOS directly. The goal is to reduce dedicated VRAM from 32 GB to 0.5 GB, freeing 31.5 GB back to the OS for dynamic GPU access via GTT.

See [docs/bios-vram-guide.md](bios-vram-guide.md) for the full BIOS walkthrough.

**Combine Steps 2 and 3 into a single reboot**: apply kernel params, then reboot into BIOS (F10) to change VRAM, then boot normally.

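
To see the dedicated VRAM and GTT sizes the driver currently reports, before or after the BIOS change, the amdgpu sysfs counters can be read directly. This is a minimal sketch, not part of the project's scripts. It assumes the GPU is exposed as `card0`, which varies between systems. The function names are hypothetical.

```bash
# Convert a byte count (as reported by amdgpu sysfs) to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Print dedicated VRAM and GTT totals for one DRM device directory.
# The default path is an assumption; the card index differs between systems.
show_gpu_memory() {
    dev="${1:-/sys/class/drm/card0/device}"
    for f in mem_info_vram_total mem_info_gtt_total; do
        if [ -r "$dev/$f" ]; then
            echo "$f: $(bytes_to_mib "$(cat "$dev/$f")") MiB"
        else
            echo "$f: not available under $dev"
        fi
    done
}

# Usage on a live system:
#   show_gpu_memory
```

With the 0.5 GB target from this step applied, `mem_info_vram_total` should come out near 512 MiB.
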
## Step 4: Verify

```bash
make verify
```

Checks 9 criteria and reports a score. Target: 9/9.

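
As a manual spot check alongside `make verify`, the running kernel's command line can be inspected for the three parameters from Step 2. This sketch does not describe the nine criteria that `make verify` actually evaluates. The helper `cmdline_has` is hypothetical.

```bash
# Succeed if a kernel command line contains the given parameter, either as an
# exact token (e.g. iommu=pt) or as the key of a key=value pair.
cmdline_has() {
    case " $1 " in
        *" $2 "*|*" $2="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system:
#   for p in iommu=pt amdgpu.gttsize ttm.pages_limit; do
#       if cmdline_has "$(cat /proc/cmdline)" "$p"; then
#           echo "$p: set"
#       else
#           echo "$p: missing"
#       fi
#   done
```
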
## Step 5: Measure Impact

```bash
make benchmark
make benchmark-compare BEFORE=data/baselines/TIMESTAMP AFTER=data/benchmarks/TAG-TIMESTAMP
```

See [docs/benchmarking.md](benchmarking.md) for methodology and result interpretation.

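
The improvement figures in this guide are relative changes in throughput. For two tokens-per-second readings taken by hand, the percentage can be computed as below. The input numbers are made-up placeholders rather than measured results, and `pct_change` is a hypothetical helper, not part of `make benchmark-compare`.

```bash
# Percentage change from a baseline throughput to a new one (tokens/s).
# awk supplies the floating-point arithmetic that POSIX shell lacks.
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f%%\n", (after - before) / before * 100 }'
}

pct_change 200 230   # prints 15.0%
```
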
## Expected Impact

| Optimization | pp512 Improvement | tg128 Improvement |
|-------------|-------------------|-------------------|
| Tuned profile | +5-8% | +2-3% |
| Kernel params + BIOS VRAM | +10-20% | +5-15% |
| **Combined** | **+15-25%** | **+8-18%** |

Numbers vary by model size and backend. Larger models see bigger gains from GTT expansion.

## Rollback

```bash
sudo make rollback
```

Restores GRUB backup and previous tuned profile. BIOS VRAM must be reverted manually (F10 → restore previous UMA Frame Buffer Size).

## Troubleshooting

If anything goes wrong, see [docs/troubleshooting.md](troubleshooting.md).