# Troubleshooting ## Firmware: linux-firmware 20251125 Causes ROCm Crashes **Symptoms**: Arbitrary crashes, instability, or mysterious failures with ROCm workloads. **Check**: `rpm -qa | grep linux-firmware` **Fix**: Downgrade to 20251111 or upgrade to 20260110+. After changing firmware: ```bash sudo dracut -f --kver $(uname -r) ``` The toolkit checks this automatically — `make audit` shows firmware status. ## amdgpu_top: Cargo Build Fails (gix-hash error) **Symptoms**: `error: Please set either the sha1 or sha256 feature flag` during `cargo install amdgpu_top`. **Cause**: Rust toolchain version incompatibility with the `gix-hash` dependency. **Fix**: Use the pre-built RPM instead: ```bash make monitor-install ``` The install script downloads the RPM from GitHub releases, bypassing cargo entirely. ## Toolbox GPU Access Failure **Symptoms**: `llama-cli --list-devices` shows no GPU inside a toolbox container. **Check**: Device mappings when creating the toolbox: - Vulkan backends need: `--device /dev/dri` - ROCm backends need: `--device /dev/dri --device /dev/kfd` **Fix**: Recreate the toolbox with correct device flags. The [refresh-toolboxes.sh](https://github.com/kyuz0/amd-strix-halo-toolboxes) script handles this automatically. Also ensure your user is in the `video` and `render` groups: ```bash sudo usermod -aG video,render $USER ``` ## GRUB Changes Not Taking Effect **Symptoms**: After `make optimize-kernel` and reboot, `make audit` still shows missing params. **Possible causes**: 1. **BLS (Boot Loader Spec)**: Modern Fedora uses BLS entries. The script uses `grubby` when available, but verify: ```bash grubby --info=ALL | grep args ``` 2. **Wrong GRUB config path**: Check which config is actually used: ```bash cat /proc/cmdline # what the kernel actually booted with cat /etc/default/grub # what the script modified ``` 3. **GRUB not regenerated**: Manually regenerate: ```bash sudo grub2-mkconfig -o /boot/grub2/grub.cfg ``` ## Memory Unchanged After BIOS Change **Symptoms**: Changed VRAM in BIOS but `make audit` still shows 32 GiB. **Check**: ```bash cat /sys/class/drm/card1/device/mem_info_vram_total ``` **Possible causes**: - BIOS change not saved (verify by re-entering BIOS) - Wrong BIOS setting modified (look for "UMA Frame Buffer Size", not "Shared Memory") - Kernel params not applied (VRAM reduction requires kernel params to be useful) ## Benchmark Failures **Symptoms**: `make benchmark-baseline` reports "FAILED" for some backends. **Common fixes**: - Ensure model exists: `ls data/models/*.gguf` - Check model fits in memory: small models (4B) for initial testing - Try `llama-vulkan-radv` first (most stable backend) - Check dmesg for GPU errors: `dmesg | tail -30` ## Rollback If optimization causes issues: ```bash sudo make rollback ``` This restores the GRUB backup and previous tuned profile. BIOS changes must be reverted manually (F10 at boot). See [docs/optimization.md](optimization.md) for the full rollback procedure. ## Further Resources For external tool documentation, upstream bug trackers, and community resources, see [docs/references.md](references.md).