docs: add README, CLAUDE.md, AGENTS.md, and full docs/ suite

- README.md: project overview, quick start, command reference, workflow
- CLAUDE.md: AI safety rules, technical details, conventions
- AGENTS.md: agent workflows, file responsibility map, dependency matrix
- docs/architecture.md: script layers, data flow, unified memory, JSON schemas
- docs/optimization.md: step-by-step optimization walkthrough
- docs/benchmarking.md: methodology, test params, result interpretation
- docs/troubleshooting.md: common issues and fixes
- docs/references.md: centralized external links (single source of truth)
- docs/bios-vram-guide.md: add back-link to optimization workflow

Cross-linked non-redundantly: each doc owns one layer, others link to it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 20:50:00 +01:00
parent af0515d05d
commit 5b81437637
9 changed files with 667 additions and 0 deletions
--- a/docs/optimization.md
+++ b/docs/optimization.md
@@ -0,0 +1,84 @@
+# Optimization Guide
+
+Complete walkthrough for optimizing AMD Strix Halo for LLM workloads.
+
+**Prerequisites**: Run `make audit` first to see your current state. Run `make benchmark-baseline` to capture pre-optimization performance numbers.
+
+## Step 1: Tuned Profile (no reboot)
+
+```bash
+sudo make optimize-tuned
+```
+
+Switches the tuned profile from `throughput-performance` to `accelerator-performance`, which disables higher-latency CPU idle states (C-states on x86; the profile's own description calls them STOP states). Expect roughly 5-8% higher prompt-processing (pp512) throughput.
+
+Takes effect immediately. Previous profile is saved for rollback.
+
+## Step 2: Kernel Boot Parameters (reboot required)
+
+```bash
+sudo make optimize-kernel
+```
+
+Adds three parameters to GRUB:
+
+| Parameter | Value (64 GB) | Purpose |
+|-----------|--------------|---------|
+| `iommu=pt` | — | IOMMU passthrough, reduces memory access latency |
+| `amdgpu.gttsize` | `60416` | Max GPU-addressable system RAM in MiB |
+| `ttm.pages_limit` | `15466496` | Max pinnable 4 KiB pages for GPU memory |
+
+Values are computed dynamically based on your system's total physical RAM. The script backs up `/etc/default/grub` before modifying it.
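+
+For illustration, on a 64 GB system the resulting line in `/etc/default/grub` might look like the sketch below. The variable name `GRUB_CMDLINE_LINUX` and the pre-existing `rhgb quiet` arguments are assumptions (distributions differ, and the script may write to a different variable); only the three appended parameters come from the table above.
+
+```bash
+# /etc/default/grub (hypothetical excerpt; "rhgb quiet" stands in for whatever was already there)
+GRUB_CMDLINE_LINUX="rhgb quiet iommu=pt amdgpu.gttsize=60416 ttm.pages_limit=15466496"
+```
+
+After the reboot, `cat /proc/cmdline` shows whether the running kernel actually picked up the parameters.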
+
+See [docs/architecture.md](architecture.md) for the math behind these values.
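+
+As a quick sanity check, the two numeric values in the 64 GB column describe the same amount of memory in different units. The sketch below uses only the table values and assumes 4 KiB pages, as the Purpose column states; how the 59 GiB figure is derived from total RAM is not reproduced here.
+
+$$
+\begin{aligned}
+60416\ \text{MiB} &= 59 \times 1024\ \text{MiB} = 59\ \text{GiB} \\
+\frac{60416\ \text{MiB} \times 1024\ \text{KiB/MiB}}{4\ \text{KiB/page}} &= 60416 \times 256 = 15466496\ \text{pages}
+\end{aligned}
+$$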
+
+## Step 3: BIOS VRAM Reduction (reboot + BIOS access)
+
+```bash
+make optimize-vram
+```
+
+This target only prints guidance; it cannot modify BIOS settings. The goal is to reduce dedicated VRAM from 32 GB to 0.5 GB, returning 31.5 GB to the OS, which the GPU can then access dynamically via GTT.
+
+See [docs/bios-vram-guide.md](bios-vram-guide.md) for the full BIOS walkthrough.
+
+**Combine Steps 2 and 3 into a single reboot**: apply kernel params, then reboot into BIOS (F10) to change VRAM, then boot normally.
+
+## Step 4: Verify
+
+```bash
+make verify
+```
+
+Checks 9 criteria and reports a score. Target: 9/9.
+
+## Step 5: Measure Impact
+
+```bash
+make benchmark
+make benchmark-compare BEFORE=data/baselines/TIMESTAMP AFTER=data/benchmarks/TAG-TIMESTAMP
+```
+
+See [docs/benchmarking.md](benchmarking.md) for methodology and result interpretation.
+
+## Expected Impact
+
+| Optimization | pp512 Improvement | tg128 Improvement |
+|-------------|-------------------|-------------------|
+| Tuned profile | +5-8% | +2-3% |
+| Kernel params + BIOS VRAM | +10-20% | +5-15% |
+| **Combined** | **+15-25%** | **+8-18%** |
+
+Numbers vary by model size and backend. Larger models see bigger gains from GTT expansion.
+
+## Rollback
+
+```bash
+sudo make rollback
+```
+
+Restores the GRUB backup and the previous tuned profile. BIOS VRAM must be reverted manually (F10 → restore the previous UMA Frame Buffer Size).
+
+## Troubleshooting
+
+If anything goes wrong, see [docs/troubleshooting.md](troubleshooting.md).