# Architecture

## Script Layers

```
bin/          User entry points (thin dispatchers)
  audit     → scripts/audit/
  monitor   → scripts/monitor/
  benchmark → scripts/benchmark/
  optimize  → scripts/optimize/

scripts/      Implementation (sources lib/ for shared logic)
  audit/      System assessment
  monitor/    GPU/system monitoring + metrics logging
  benchmark/  llama-bench via toolbox containers
  optimize/   GRUB, tuned, BIOS guidance, verify, rollback

lib/          Shared bash libraries (sourced, not executed)
  common.sh   Logging, root checks, confirm prompts, paths
  detect.sh   Hardware/config detection from sysfs + system tools
  format.sh   Color output, human_bytes, status indicators, tables

configs/      Reference configuration templates
data/         Runtime output (gitignored)
docs/         Documentation
```

Scripts source libraries as needed: always `common.sh` first, then `detect.sh` for hardware detection, then `format.sh` for formatted output. Not every script needs all three; `rollback.sh`, for example, needs only `common.sh`. `format.sh` depends on color variables defined in `common.sh`.

## Data Flow

```
/sys/class/drm/card1/device/  ──→  lib/detect.sh  ──→  scripts/audit/
/proc/cpuinfo, /proc/meminfo  ──→    (detect_*)   ──→  scripts/monitor/
/proc/cmdline                 ──→                 ──→  scripts/optimize/
tuned-adm, rpm, lspci         ──→                 ──→  scripts/benchmark/
                                                          │
                                                          ▼
                                                        data/
                                                        ├── audits/*.json
                                                        ├── logs/*.csv
                                                        ├── baselines/*/summary.json
                                                        └── benchmarks/*/summary.json
```

## Unified Memory Model

AMD Strix Halo shares physical RAM between CPU and GPU. There are two allocation mechanisms:

| Type | Description | Configuration |
|------|-------------|---------------|
| **VRAM (dedicated)** | Permanently reserved for GPU framebuffer | BIOS setting (UMA Frame Buffer Size) |
| **GTT (dynamic)** | System RAM mapped into GPU address space on demand | Kernel boot params: `amdgpu.gttsize`, `ttm.pages_limit` |

**Optimal for LLM workloads**: minimize VRAM (0.5 GiB), maximize GTT (~60 GiB on a 64 GB system). The GPU borrows memory when needed and releases it when idle.
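A minimal sketch of reading the VRAM and GTT counters from sysfs, with a fallback for machines where the files are absent. The `read_sysfs` helper is illustrative, not the toolkit's `detect.sh` API, and `card1` is hard-coded here only for brevity:

```shell
#!/usr/bin/env bash
# Sketch only: read a sysfs memory counter, falling back to a default
# when the file is absent (e.g. on a non-AMD machine). The helper name
# read_sysfs is hypothetical; lib/detect.sh has its own functions.
set -euo pipefail

# read_sysfs PATH DEFAULT — print the file's first line, or DEFAULT.
read_sysfs() {
  local path="$1" default="$2" line
  if [[ -r "$path" ]] && IFS= read -r line < "$path"; then
    printf '%s\n' "$line"
  else
    printf '%s\n' "$default"
  fi
}

# card1 is just the common case; find_gpu_card() resolves it properly.
dev=/sys/class/drm/card1/device
vram=$(read_sysfs "$dev/mem_info_vram_total" 0)
gtt=$(read_sysfs "$dev/mem_info_gtt_total" 0)
echo "vram_total_bytes=$vram gtt_total_bytes=$gtt"
```

Both counters report bytes, matching the sysfs paths documented below.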
### Kernel Parameter Math

The formula (using integer GiB arithmetic):

```
total_physical_gib = (visible_ram + dedicated_vram) / 1024 / 1024   (integer division)
gtt_gib            = total_physical_gib - 4                         (4 GiB OS reserve)
amdgpu.gttsize     = gtt_gib * 1024                                 (MiB)
ttm.pages_limit    = amdgpu.gttsize * 256                           (4 KiB pages)
iommu              = pt                                             (passthrough)
```

Actual values depend on how the OS reports memory. On a 64 GB HP ZBook with 32 GB dedicated VRAM:

```
visible_ram ≈ 31.1 GiB, vram = 32 GiB  →  total = 63 GiB (integer)  →  59 GiB GTT
amdgpu.gttsize = 60416, ttm.pages_limit = 15466496
```

Run `make audit` to see the exact computed values for your system. The toolkit computes these dynamically via `recommended_gttsize_mib()` and `recommended_pages_limit()` in `lib/detect.sh`.

### Sysfs Paths

The card number is auto-detected by `find_gpu_card()` in `lib/detect.sh` (it matches AMD vendor ID `0x1002`, falling back to the first card that exposes `mem_info_vram_total`). The table below uses `cardN` as a placeholder:

| Path | Content |
|------|---------|
| `/sys/class/drm/cardN/device/mem_info_vram_total` | Dedicated VRAM in bytes |
| `/sys/class/drm/cardN/device/mem_info_vram_used` | VRAM currently in use |
| `/sys/class/drm/cardN/device/mem_info_gtt_total` | GTT allocation in bytes |
| `/sys/class/drm/cardN/device/mem_info_gtt_used` | GTT currently in use |
| `/sys/class/drm/cardN/device/gpu_busy_percent` | GPU utilization, 0–100 |
| `/sys/class/drm/cardN/device/hwmon/hwmon*/temp1_input` | Temperature in millidegrees C |
| `/sys/class/drm/cardN/device/hwmon/hwmon*/power1_average` | Power in microwatts |

## JSON Output Schemas

### system-state.json (from `audit --json`)

```json
{
  "timestamp": "20260325-120000",
  "hardware": {
    "cpu_model": "...",
    "cpu_cores": 16,
    "cpu_threads": 32,
    "gpu_name": "...",
    "gpu_device_id": "1586",
    "system_ram_kb": 32609248      // OS-visible RAM (excludes dedicated VRAM)
  },
  "memory": {
    "vram_total_bytes": 0,
    "vram_used_bytes": 0,
    "gtt_total_bytes": 0,
    "gtt_used_bytes": 0,
    "recommended_gttsize_mib": 0,
"recommended_pages_limit": 0 }, "kernel": { "version": "...", "cmdline": "...", "param_iommu": "", "param_gttsize": "", "param_pages_limit": "" }, "firmware": "...", "tuned_profile": "...", "rocm_version": "...", "vulkan": { "driver": "...", "version": "..." }, "sensors": { "gpu_temp_mc": 0, "gpu_power_uw": 0, "gpu_busy_pct": 0 }, "toolboxes": [], "stacks": { "ollama": "...", "lmstudio": "...", "llamacpp": "...", "opencode": "..." } } ``` ### summary.json (from benchmark runs) ```json { "results": [ { "file": "model__backend__fa1.log", "model": "...", "size": "...", "backend": "Vulkan", "test": "pp512", "tokens_per_sec": 548.18, "raw": "548.18 +/- 1.59" } ] } ``` ### metrics.csv (from `monitor --log`) ``` timestamp,gpu_busy_pct,vram_used_mib,gtt_used_mib,gpu_temp_c,gpu_power_w,cpu_pct,ram_used_mib ``` Sampled every 2 seconds by default. Pure bash implementation (no subshell forks per sample).