feat(serve): set APEX I-Compact as default, harden benchmark workflow

Serving:
- make serve now launches Claude-distilled APEX 35B-A3B (16GB) with
  2 parallel slots and 256K context as the daily driver
- add serve-custom for ad-hoc model testing
- add flush-gpu to reclaim unified memory after stuck runs

Benchmarks:
- default Vulkan-only backends (ROCm trails at long context)
- add --backends filter to run-baseline.sh
- fix backend filter substring bug (grep -qFx for exact line match)
- fix model filter regex metacharacter bug (grep -qiF for literal)
- respect --tg in long-context tests instead of hardcoded n=32

ROCm bump to 7.2.1 (kernel 6.18.4+ patch); keep 7.2 as optional.

Catalog:
- add mudler APEX I-Compact (Claude-distilled 35B, 17GB)
- add 0xSero REAP-40 (pruned 122B-A10B, 46GB)
- update download instructions: hf download (huggingface-cli is gone)
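The two filter fixes named in the message can be sketched as follows. run-baseline.sh is not part of the diff shown here, so the function names, argument order, and the comma-separated filter format are all assumptions made for illustration; only the grep flags (-qFx and -qiF) come from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: run-baseline.sh's real names and filter format are
# not shown in this commit. Only the grep flags are taken from the message.

# Backend filter: succeed only if $1 equals one whole entry of the
# comma-separated list in $2 (assumed format).
# -F = literal string, -x = whole-line match, -q = quiet.
# Without -x, "rocm" would also match an entry such as "rocm-7.2.1".
backend_selected() {
    local backend="$1" filter="$2"
    printf '%s\n' "$filter" | tr ',' '\n' | grep -qFx -- "$backend"
}

# Model filter: literal, case-insensitive substring match on a file name.
# Without -F, the "." in "Qwen3.5" is a regex wildcard and matches any character.
model_selected() {
    local model="$1" filter="$2"
    printf '%s\n' "$model" | grep -qiF -- "$filter"
}
```

With these helpers, `backend_selected rocm "vulkan,rocm-7.2.1"` fails, where a plain substring grep would have selected the backend, and `model_selected Qwen305-35B.gguf Qwen3.5` fails, where a regex match would have succeeded.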
13
Makefile
@@ -39,12 +39,23 @@ benchmark-compare: ## Compare two benchmark runs (usage: make benchmark-compare
 	@bash bin/benchmark compare $(BEFORE) $(AFTER)
 
 # --- Serve ---
-serve: ## Launch llama-server with optimized settings (ARGS="-m MODEL.gguf")
+serve: ## Launch APEX I-Compact daily driver (2 slots, 256K ctx)
+	@bash bin/serve -m Qwen3.5-35B-A3B-Claude-Distilled-APEX-I-Compact.gguf --parallel 2 --ctx 262144 $(ARGS)
+
+serve-custom: ## Launch llama-server with custom model (ARGS="-m MODEL.gguf")
 	@bash bin/serve $(ARGS)
 
 serve-ngram: ## Launch with n-gram speculative decoding (ARGS="-m MODEL.gguf")
 	@bash bin/serve --ngram $(ARGS)
 
+flush-gpu: ## Kill llama-server/bench processes and drop kernel caches to free unified VRAM
+	-@pkill -x llama-server 2>/dev/null || true
+	-@pkill -x llama-bench 2>/dev/null || true
+	-@pkill -x llama-cli 2>/dev/null || true
+	-@podman ps --filter name=llama --format '{{.Names}}' | xargs -r podman stop
+	@sync && sudo sysctl vm.drop_caches=3
+	@echo "VRAM usage:" && cat /sys/class/drm/card*/device/mem_info_vram_used 2>/dev/null | awk '{printf " %.2f MiB\n", $$1/1048576}'
+
 # --- Hardware Info ---
 hw-bandwidth: ## Measure GPU memory bandwidth and compute (clpeak)
 	@clpeak 2>&1
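The last line of the flush-gpu recipe reads byte counts from sysfs and prints them in MiB. A minimal stand-alone sketch of that conversion follows; the function name vram_mib is hypothetical and not part of the repository. Note that the Makefile writes the awk field as `$$1` because Make consumes one `$`, while in plain shell it is `$1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the awk step of the flush-gpu recipe.
# Reads one byte count per line on stdin and prints it in MiB
# (1 MiB = 1048576 bytes), using the same format string as the recipe.
vram_mib() {
    awk '{printf " %.2f MiB\n", $1/1048576}'
}

# Assumed usage on a system exposing the amdgpu sysfs counter; the glob
# may match nothing on other hardware, in which case nothing is printed.
# cat /sys/class/drm/card*/device/mem_info_vram_used 2>/dev/null | vram_mib
```

For example, `echo 17179869184 | vram_mib` prints ` 16384.00 MiB`, that is, 16 GiB expressed in MiB.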