docs: add README, CLAUDE.md, AGENTS.md, and full docs/ suite

- README.md: project overview, quick start, command reference, workflow
- CLAUDE.md: AI safety rules, technical details, conventions
- AGENTS.md: agent workflows, file responsibility map, dependency matrix
- docs/architecture.md: script layers, data flow, unified memory, JSON schemas
- docs/optimization.md: step-by-step optimization walkthrough
- docs/benchmarking.md: methodology, test params, result interpretation
- docs/troubleshooting.md: common issues and fixes
- docs/references.md: centralized external links (single source of truth)
- docs/bios-vram-guide.md: add back-link to optimization workflow

Cross-linked non-redundantly: each doc owns one layer, others link to it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Felipe Cardoso
2026-03-25 20:50:00 +01:00
parent af0515d05d
commit 5b81437637
9 changed files with 667 additions and 0 deletions

docs/optimization.md Normal file

@@ -0,0 +1,84 @@
# Optimization Guide
Complete walkthrough for optimizing AMD Strix Halo for LLM workloads.
**Prerequisites**: Run `make audit` first to see your current state. Run `make benchmark-baseline` to capture pre-optimization performance numbers.
## Step 1: Tuned Profile (no reboot)
```bash
sudo make optimize-tuned
```
Switches the tuned profile from `throughput-performance` to `accelerator-performance`, which disables higher-latency CPU STOP states. Expect roughly a 5-8% improvement in prompt processing throughput (pp512).
The change takes effect immediately, and the previous profile is saved for rollback.
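To confirm the switch by hand, you can read the active profile back from `tuned-adm active`. The sketch below assumes the usual `Current active profile: <name>` output line; `active_profile` is a hypothetical helper, not part of this repo.

```bash
# Hypothetical helper (not part of this repo): extract the profile name
# from the output of `tuned-adm active`, read on stdin.
active_profile() {
  sed -n 's/^Current active profile: //p'
}

# On a live system with tuned installed:
#   tuned-adm active | active_profile
# Demonstration with a sample output line:
echo "Current active profile: accelerator-performance" | active_profile
```

If the helper prints anything other than `accelerator-performance`, the profile switch did not stick.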
## Step 2: Kernel Boot Parameters (reboot required)
```bash
sudo make optimize-kernel
```
Adds three parameters to GRUB:
| Parameter | Value (64 GB) | Purpose |
|-----------|--------------|---------|
| `iommu=pt` | — | IOMMU passthrough, reduces memory access latency |
| `amdgpu.gttsize` | `60416` | Max GPU-addressable system RAM in MiB |
| `ttm.pages_limit` | `15466496` | Max pinnable 4 KiB pages for GPU memory |
Values are computed dynamically based on your system's total physical RAM. The script backs up `/etc/default/grub` before modifying it.
See [docs/architecture.md](architecture.md) for the math behind these values.
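The two numeric values in the table are related: `ttm.pages_limit` is the GTT size expressed in 4 KiB pages. A minimal sketch of that conversion follows; `gtt_mib_to_pages` is a hypothetical name, and how the script picks the GTT size from total RAM is not shown here.

```bash
# Hypothetical helper: convert a GTT size in MiB to a count of 4 KiB pages.
# 1 MiB = 1024 KiB, so each MiB holds 1024 / 4 = 256 pages.
gtt_mib_to_pages() {
  echo $(( $1 * 1024 / 4 ))
}

gtt_mib_to_pages 60416   # 64 GB example from the table, prints 15466496
```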
## Step 3: BIOS VRAM Reduction (reboot + BIOS access)
```bash
make optimize-vram
```
This target only prints guidance; it cannot modify BIOS settings directly. The goal is to reduce dedicated VRAM from 32 GB to 0.5 GB, freeing 31.5 GB for the OS, which the GPU can then access dynamically via GTT.
See [docs/bios-vram-guide.md](bios-vram-guide.md) for the full BIOS walkthrough.
**Combine Steps 2 and 3 into a single reboot**: apply kernel params, then reboot into BIOS (F10) to change VRAM, then boot normally.
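After the reboot, a quick manual spot check is to look for the new parameters in `/proc/cmdline`. The sketch below is an assumption-laden illustration, not what `make verify` runs; `cmdline_has` is a hypothetical helper, and the values shown are the 64 GB examples from the table.

```bash
# Hypothetical helper: succeed only if every listed parameter appears
# as a whole word in the given kernel command line string.
cmdline_has() {
  local cmdline="$1"; shift
  local p
  for p in "$@"; do
    case " $cmdline " in
      *" $p "*) ;;                                  # parameter present
      *) echo "missing: $p" >&2; return 1 ;;        # report first gap
    esac
  done
}

# On a live system (values assume the 64 GB example):
#   cmdline_has "$(cat /proc/cmdline)" iommu=pt amdgpu.gttsize=60416 ttm.pages_limit=15466496
# Demonstration with a sample command line:
sample="ro quiet iommu=pt amdgpu.gttsize=60416 ttm.pages_limit=15466496"
cmdline_has "$sample" iommu=pt amdgpu.gttsize=60416 && echo "kernel params active"
```

Matching on space-delimited words avoids a false positive when one parameter is a prefix of another.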
## Step 4: Verify
```bash
make verify
```
Checks 9 criteria and reports a score. Target: 9/9.
## Step 5: Measure Impact
```bash
make benchmark
make benchmark-compare BEFORE=data/baselines/TIMESTAMP AFTER=data/benchmarks/TAG-TIMESTAMP
```
See [docs/benchmarking.md](benchmarking.md) for methodology and result interpretation.
## Expected Impact
| Optimization | pp512 Improvement | tg128 Improvement |
|-------------|-------------------|-------------------|
| Tuned profile | +5-8% | +2-3% |
| Kernel params + BIOS VRAM | +10-20% | +5-15% |
| **Combined** | **+15-25%** | **+8-18%** |
Numbers vary by model size and backend. Larger models see bigger gains from GTT expansion.
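To put your own results next to the ranges above, the relative change between a baseline and a post-optimization figure can be computed as below. `pct_change` is a hypothetical helper, and the inputs are illustrative placeholders, not measured results.

```bash
# Hypothetical helper: percent change from a baseline tokens/s figure
# to a post-optimization figure, printed with an explicit sign.
pct_change() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%+.1f%%\n", (after - before) / before * 100 }'
}

pct_change 400 460   # illustrative inputs only, prints +15.0%
```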
## Rollback
```bash
sudo make rollback
```
Restores the GRUB backup and the previous tuned profile. BIOS VRAM must be reverted manually (F10 → restore previous UMA Frame Buffer Size).
## Troubleshooting
If anything goes wrong, see [docs/troubleshooting.md](troubleshooting.md).