feat(serve): set APEX I-Compact as default, harden benchmark workflow
Serving: - make serve now launches Claude-distilled APEX 35B-A3B (16GB) with 2 parallel slots and 256K context as the daily driver - add serve-custom for ad-hoc model testing - add flush-gpu to reclaim unified memory after stuck runs Benchmarks: - default Vulkan-only backends (ROCm trails at long context) - add --backends filter to run-baseline.sh - fix backend filter substring bug (grep -qFx for exact line match) - fix model filter regex metacharacter bug (grep -qiF for literal) - respect --tg in long-context tests instead of hardcoded n=32 ROCm bump to 7.2.1 (kernel 6.18.4+ patch); keep 7.2 as optional. Catalog: - add mudler APEX I-Compact (Claude-distilled 35B, 17GB) - add 0xSero REAP-40 (pruned 122B-A10B, 46GB) - update download instructions: hf download (huggingface-cli is gone)
This commit is contained in:
@@ -41,7 +41,8 @@ The `-fa 1 -mmp 0 -ngl 99` flags are **mandatory** on Strix Halo to avoid crashe
|
||||
| `llama-vulkan-radv` | Mesa RADV | Vulkan | Most stable, recommended default |
|
||||
| `llama-vulkan-amdvlk` | AMDVLK | Vulkan | Fastest when it works, 2GB buffer limit |
|
||||
| `llama-rocm-6.4.4` | ROCm 6.4.4 | HIP | Proven stable |
|
||||
| `llama-rocm-7.2` | ROCm 7.2 | HIP | Latest, compiler fixes applied |
|
||||
| `llama-rocm-7.2.1` | ROCm 7.2.1 | HIP | Current stable (kernel 6.18.4+ patch) |
|
||||
| `llama-rocm-7.2` | ROCm 7.2 | HIP | Deprecated — use 7.2.1 |
|
||||
| `llama-rocm7-nightlies` | ROCm 7 nightly | HIP | Experimental/development builds |
|
||||
|
||||
Containers are from [kyuz0/amd-strix-halo-toolboxes](https://github.com/kyuz0/amd-strix-halo-toolboxes). Set up with `make benchmark-setup`.
|
||||
|
||||
Reference in New Issue
Block a user