# Architecture ## Script Layers ``` bin/ User entry points (thin dispatchers) audit → scripts/audit/ monitor → scripts/monitor/ benchmark → scripts/benchmark/ optimize → scripts/optimize/ scripts/ Implementation (sourcing lib/ for shared logic) audit/ System assessment monitor/ GPU/system monitoring + metrics logging benchmark/ llama-bench via toolbox containers optimize/ GRUB, tuned, BIOS guidance, verify, rollback lib/ Shared bash libraries (sourced, not executed) common.sh Logging, root checks, confirm prompts, paths detect.sh Hardware/config detection from sysfs + system tools format.sh Color output, human_bytes, status indicators, tables configs/ Reference configuration templates data/ Runtime output (gitignored) docs/ Documentation ``` Every script sources libs in order: `common.sh` → `detect.sh` → `format.sh`. `format.sh` depends on color variables defined in `common.sh`. ## Data Flow ``` /sys/class/drm/card1/device/ ──→ lib/detect.sh ──→ scripts/audit/ /proc/cpuinfo, /proc/meminfo ──→ (detect_*) ──→ scripts/monitor/ /proc/cmdline ──→ ──→ scripts/optimize/ tuned-adm, rpm, lspci ──→ ──→ scripts/benchmark/ │ ▼ data/ ├── audits/*.json ├── logs/*.csv ├── baselines/*/summary.json └── benchmarks/*/summary.json ``` ## Unified Memory Model AMD Strix Halo shares physical RAM between CPU and GPU. Two allocation mechanisms: | Type | Description | Configuration | |------|-------------|---------------| | **VRAM (dedicated)** | Permanently reserved for GPU framebuffer | BIOS setting (UMA Frame Buffer Size) | | **GTT (dynamic)** | System RAM mapped into GPU address space on demand | Kernel boot params: `amdgpu.gttsize`, `ttm.pages_limit` | **Optimal for LLM workloads**: Minimize VRAM (0.5 GiB), maximize GTT (~60 GiB on 64 GB system). The GPU borrows memory when needed and releases it when idle. ### Kernel Parameter Math (64 GB system) ``` Total physical RAM: 64 GiB OS reserve: 4 GiB Available for GTT: 60 GiB = 61440 MiB amdgpu.gttsize = 60 * 1024 = 61440 (MiB) ttm.pages_limit = 60 * 1024 * 256 = 15728640 (4K pages) iommu = pt (passthrough, lower latency) ``` The toolkit computes these dynamically via `recommended_gttsize_mib()` and `recommended_pages_limit()` in `lib/detect.sh`, based on detected total physical RAM (visible + VRAM). ### Sysfs Paths | Path | Content | |------|---------| | `/sys/class/drm/card1/device/mem_info_vram_total` | Dedicated VRAM in bytes | | `/sys/class/drm/card1/device/mem_info_vram_used` | VRAM currently in use | | `/sys/class/drm/card1/device/mem_info_gtt_total` | GTT allocation in bytes | | `/sys/class/drm/card1/device/mem_info_gtt_used` | GTT currently in use | | `/sys/class/drm/card1/device/gpu_busy_percent` | GPU utilization 0-100 | | `/sys/class/drm/card1/device/hwmon/hwmon*/temp1_input` | Temperature in millidegrees C | | `/sys/class/drm/card1/device/hwmon/hwmon*/power1_average` | Power in microwatts | Card number is auto-detected by `find_gpu_card()` (matches AMD vendor ID `0x1002`). ## JSON Output Schemas ### system-state.json (from `audit --json`) ```json { "timestamp": "20260325-120000", "hardware": { "cpu_model": "...", "cpu_cores": 16, "cpu_threads": 32, "gpu_name": "...", "gpu_device_id": "1586", "system_ram_kb": 32609248 }, "memory": { "vram_total_bytes": 0, "vram_used_bytes": 0, "gtt_total_bytes": 0, "gtt_used_bytes": 0, "recommended_gttsize_mib": 0, "recommended_pages_limit": 0 }, "kernel": { "version": "...", "cmdline": "...", "param_iommu": "", "param_gttsize": "", "param_pages_limit": "" }, "firmware": "...", "tuned_profile": "...", "rocm_version": "...", "vulkan": { "driver": "...", "version": "..." }, "sensors": { "gpu_temp_mc": 0, "gpu_power_uw": 0, "gpu_busy_pct": 0 }, "toolboxes": [], "stacks": { "ollama": "...", "lmstudio": "...", "llamacpp": "...", "opencode": "..." } } ``` ### summary.json (from benchmark runs) ```json { "results": [ { "file": "model__backend__fa1.log", "model": "...", "size": "...", "backend": "Vulkan", "test": "pp512", "tokens_per_sec": 548.18, "raw": "548.18 +/- 1.59" } ] } ``` ### metrics.csv (from `monitor --log`) ``` timestamp,gpu_busy_pct,vram_used_mib,gtt_used_mib,gpu_temp_c,gpu_power_w,cpu_pct,ram_used_mib ``` Sampled every 2 seconds by default. Pure bash implementation (no subshell forks per sample).