docs: add README, CLAUDE.md, AGENTS.md, and full docs/ suite

- README.md: project overview, quick start, command reference, workflow
- CLAUDE.md: AI safety rules, technical details, conventions
- AGENTS.md: agent workflows, file responsibility map, dependency matrix
- docs/architecture.md: script layers, data flow, unified memory, JSON schemas
- docs/optimization.md: step-by-step optimization walkthrough
- docs/benchmarking.md: methodology, test params, result interpretation
- docs/troubleshooting.md: common issues and fixes
- docs/references.md: centralized external links (single source of truth)
- docs/bios-vram-guide.md: add back-link to optimization workflow

Cross-linked non-redundantly: each doc owns one layer, others link to it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

114
README.md
Normal file
@@ -0,0 +1,114 @@
# Strix Halo Optimization Toolkit

Audit, monitor, benchmark, and optimize AMD Strix Halo integrated GPU systems for LLM inference workloads.

**Target hardware**: AMD Ryzen AI MAX+ 395 / Radeon 8060S (gfx1151) with 64 GB unified memory, on Fedora 43. Tested on HP ZBook Ultra G1a.

## Quick Start

```bash
make audit               # See current system status and optimization score
make monitor-install     # Install amdgpu_top + btop
make benchmark-setup     # Create toolbox containers + download test model
make benchmark-baseline  # Capture performance before optimization
```

## System Status

`make audit` produces a single-screen overview:

```
=== Memory Allocation ===
  [!!] VRAM (dedicated)   32.0 GiB — should be 0.5 GiB in BIOS
  [!!] GTT (dynamic)      15.5 GiB — should be ~59.0 GiB with kernel params

=== Kernel Boot Parameters ===
  [!!] iommu=pt           MISSING
  [!!] amdgpu.gttsize     MISSING — recommended: 60416
  [!!] ttm.pages_limit    MISSING — recommended: 15466496

=== Performance Profile ===
  [!!] Tuned profile      throughput-performance — recommended: accelerator-performance

=== Optimization Score ===
  2 / 8 checks passing
```

Each `[!!]` is an optimization opportunity. Run `make optimize` to address them.

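The two recommended numbers in the sample output are related by simple arithmetic. The sketch below reproduces them from the ~59 GiB GTT target, assuming `amdgpu.gttsize` is expressed in MiB and `ttm.pages_limit` counts 4 KiB pages. The variable names are illustrative and are not taken from the toolkit's scripts.

```bash
# Sketch only: derive the recommended kernel parameter values from a GTT target.
# Assumptions: amdgpu.gttsize is in MiB, ttm.pages_limit is in 4 KiB pages.
gtt_gib=59                          # target GTT size shown in the audit output
gtt_mib=$(( gtt_gib * 1024 ))       # GiB -> MiB
pages=$(( gtt_mib * 1024 / 4 ))     # MiB -> KiB -> number of 4 KiB pages

echo "amdgpu.gttsize=${gtt_mib} ttm.pages_limit=${pages}"
# prints: amdgpu.gttsize=60416 ttm.pages_limit=15466496
```

On a system with a different amount of memory the same arithmetic would apply with a different `gtt_gib`; how the toolkit chooses that target is not shown here.
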
## Commands

| Command | Description |
|---------|-------------|
| `make audit` | Quick system status (single screen) |
| `make audit-full` | Full system report (saved to data/audits/) |
| `make monitor` | Launch tmux monitoring dashboard |
| `make monitor-simple` | Launch amdgpu_top only |
| `make monitor-install` | Install monitoring tools (amdgpu_top, btop) |
| `make monitor-log` | Start background CSV metric logger |
| `make benchmark-setup` | Ensure toolboxes and test models are ready |
| `make benchmark-baseline` | Capture pre-optimization baseline |
| `make benchmark` | Run full benchmark suite |
| `make benchmark-compare` | Compare two runs (`BEFORE=dir AFTER=dir`) |
| `sudo make optimize` | Interactive optimization walkthrough |
| `sudo make optimize-kernel` | Configure kernel boot parameters |
| `sudo make optimize-tuned` | Switch to accelerator-performance profile |
| `make optimize-vram` | BIOS VRAM guidance + GTT verification |
| `make verify` | Post-optimization verification checklist |
| `sudo make rollback` | Rollback optimizations |

## Optimization Workflow

```
1. Audit          make audit
       │
2. Monitor        make monitor-install && make monitor
       │
3. Baseline       make benchmark-setup && make benchmark-baseline
       │
4. Optimize       sudo make optimize
       │           ├── tuned profile (instant, +5-8% pp)
       │           ├── kernel params (reboot required)
       │           └── BIOS VRAM (reboot + BIOS access)
       │
5. Verify         make verify
       │
6. Re-benchmark   make benchmark && make benchmark-compare BEFORE=... AFTER=...
```

See [docs/optimization.md](docs/optimization.md) for the full walkthrough with explanations.

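After the reboot that the kernel-params step requires, the boot parameters can also be spot-checked by hand against `/proc/cmdline`. The following is a minimal sketch of such a check, not a description of how `make verify` is implemented; `has_param` is a hypothetical helper introduced only for this example.

```bash
# Sketch only: check whether expected parameters are on the kernel command line.
# has_param is a hypothetical helper, not part of the toolkit.

# has_param CMDLINE KEY: succeed if KEY appears as a bare word or as KEY=value.
has_param() {
    local cmdline=$1 key=$2 word
    for word in $cmdline; do        # intentional word splitting on whitespace
        case $word in
            "$key"|"$key"=*) return 0 ;;
        esac
    done
    return 1
}

# Read the running kernel's command line (empty if /proc is unavailable).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

for key in iommu=pt amdgpu.gttsize ttm.pages_limit; do
    if has_param "$cmdline" "$key"; then
        echo "[ok] $key"
    else
        echo "[!!] $key MISSING"
    fi
done
```

Matching whole words avoids false positives from substrings, for example a parameter such as `amd_iommu=on` does not satisfy the `iommu=pt` check.
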
## Project Structure

```
bin/       Entry points (audit, monitor, benchmark, optimize)
lib/       Shared bash libraries (common, detect, format)
scripts/   Implementation organized by function
configs/   Reference configuration templates
data/      Runtime output: audits, benchmarks, logs, backups (gitignored)
docs/      Technical documentation
```

See [docs/architecture.md](docs/architecture.md) for the full architecture, data flow, and JSON schemas.

## Requirements

- **OS**: Fedora 43 (tested); Fedora 42+ should work
- **Hardware**: AMD Strix Halo (Ryzen AI MAX / MAX+) with RDNA 3.5 iGPU
- **Tools**: `bc`, `python3`, `tmux`, `podman`, `toolbox`
- **Optional**: `amdgpu_top` (installed via `make monitor-install`), `huggingface-cli` (for model downloads)

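A quick way to confirm the required tools are present before running any `make` target is to probe `PATH` for each of them. This is a minimal sketch; `missing_tools` is a hypothetical helper and is not part of the toolkit.

```bash
# Sketch only: report which of the required tools are not on PATH.
# missing_tools is a hypothetical helper, not part of the toolkit.

# missing_tools TOOL...: print the name of each tool that cannot be found.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools bc python3 tmux podman toolbox)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Missing required tools:" $missing
else
    echo "All required tools found"
fi
```
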
## Documentation

| Document | Contents |
|----------|----------|
| [docs/architecture.md](docs/architecture.md) | Script layers, data flow, unified memory model, JSON schemas |
| [docs/optimization.md](docs/optimization.md) | Step-by-step optimization walkthrough |
| [docs/benchmarking.md](docs/benchmarking.md) | Benchmark methodology, test params, result interpretation |
| [docs/bios-vram-guide.md](docs/bios-vram-guide.md) | HP ZBook BIOS configuration for VRAM |
| [docs/troubleshooting.md](docs/troubleshooting.md) | Common issues and fixes |
| [docs/references.md](docs/references.md) | External links: AMD docs, toolboxes, community resources |

## Contributing

AI assistants: see [CLAUDE.md](CLAUDE.md) for safety rules and technical context. Agent workflows are in [AGENTS.md](AGENTS.md).