Commit Graph

6 Commits

Felipe Cardoso
15bb6a8ed9 feat(serve): set APEX I-Compact as default, harden benchmark workflow
Serving:
- make serve now launches Claude-distilled APEX 35B-A3B (16GB) with 2
  parallel slots and 256K context as the daily driver
- add serve-custom for ad-hoc model testing
- add flush-gpu to reclaim unified memory after stuck runs

Benchmarks:
- default to Vulkan-only backends (ROCm trails Vulkan at long context)
- add --backends filter to run-baseline.sh
- fix backend filter substring bug (grep -qFx for exact line match)
- fix model filter regex metacharacter bug (grep -qiF for literal)
- respect --tg in long-context tests instead of hardcoded n=32
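Both grep fixes come down to option choice; a minimal sketch of the two failure modes, with variable and file names assumed for illustration rather than taken from the actual run-baseline.sh:

```shell
# Backend filter: plain grep does substring matching, so a request like
# "radv" wrongly matches the "vulkan-radv" line.
available_backends='vulkan-radv
rocm-7.2'
requested="radv"

printf '%s\n' "$available_backends" | grep -q   "$requested" && buggy=matched
# -F takes the pattern literally, -x requires a whole-line match:
printf '%s\n' "$available_backends" | grep -qFx "$requested" || fixed=no-match

# Model filter: an unescaped "." is a regex wildcard, so "qwen3.5" also
# matches "qwen315"; -F makes it literal, -i keeps it case-insensitive.
printf '%s\n' "qwen315-27b.gguf" | grep -q   "qwen3.5" && regex_bug=matched
printf '%s\n' "qwen315-27b.gguf" | grep -qiF "qwen3.5" || literal=no-match

echo "buggy=$buggy fixed=$fixed regex_bug=$regex_bug literal=$literal"
```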

ROCm bump to 7.2.1 (kernel 6.18.4+ patch); keep 7.2 as optional.

Catalog:
- add mudler APEX I-Compact (Claude-distilled 35B, 17GB)
- add 0xSero REAP-40 (pruned 122B-A10B, 46GB)
- update download instructions: hf download (huggingface-cli is gone)
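The huggingface-cli entry point was removed from huggingface_hub in favor of the hf CLI. A sketch of the replacement invocation, with placeholder repo and file names (not real catalog entries):

```shell
# Placeholder names — substitute the catalog entry you actually want.
repo="someorg/some-model-GGUF"
file="some-model-Q4_K_M.gguf"

# Old (removed):  huggingface-cli download "$repo" "$file"
# New hf CLI from huggingface_hub:
cmd="hf download $repo $file --local-dir /data/models/llms"
echo "$cmd"
```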
2026-04-13 01:11:46 +02:00
Felipe Cardoso
fb1e57f1bf feat: make llama-rocm-7.2 a required toolbox in benchmark setup
The llama-rocm-7.2 toolbox is now created alongside vulkan-radv during
setup, enabling Vulkan-vs-ROCm comparisons in baseline and
post-optimization benchmarks.

Smoke test: ROCm 7.2 on Qwen3.5-0.8B → 8090 t/s pp512, 161 t/s tg128
(vs Vulkan: 8900 t/s pp512, 177 t/s tg128)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:23:03 +01:00
Felipe Cardoso
d22c062ca7 fix: model catalog shows download status, GPU detection in toolbox
- Catalog * indicator now searches recursively (finds models in subdirs)
- GPU verification suppresses toolbox crun stderr (directory not found noise)
- GPU verification matches on "radeon" and "available devices" in Vulkan output
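A sketch of how such a probe can be made quiet and tolerant of varying device strings; the function names and sample output line are assumptions, not the repo's actual code:

```shell
# Run a probe command, drop toolbox/crun stderr noise, and accept either
# a "Radeon" device name or an "available devices" summary line.
gpu_check() {
  "$@" 2>/dev/null | grep -qiE 'radeon|available devices'
}

# Simulated probe standing in for the real in-toolbox Vulkan command:
fake_probe() {
  echo 'GPU0: AMD Radeon RX 7900 XTX (RADV)'
  echo 'crun: directory not found' >&2   # the noise being suppressed
}

if gpu_check fake_probe; then gpu=detected; else gpu=missing; fi
echo "gpu=$gpu"
```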

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:14:31 +01:00
Felipe Cardoso
eb52ea52ce fix: follow symlinks in model discovery, update model catalog
- Add -L flag to find in benchmark scripts (follows symlinks to /data/models/llms/)
- Exclude mmproj-*.gguf (vision projection files, not LLM models)
- Update configs/models.conf: remove Qwen3-Coder (user prefers Qwen3.5-35B-A3B),
  add Qwen3.5-27B-Q4_K_M and Q8_0 variant, reflect actual downloaded models
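The -L fix can be reproduced in miniature; the toy directory layout below is a stand-in for the real /data/models/llms symlink:

```shell
# Build a toy layout: a real dir holding one LLM and one vision
# projector, reachable only through a symlink (as /data/models/llms is).
tmp=$(mktemp -d)
mkdir -p "$tmp/real"
touch "$tmp/real/model-Q4_K_M.gguf" "$tmp/real/mmproj-model.gguf"
ln -s "$tmp/real" "$tmp/link"

# -L follows the symlink; ! -name excludes mmproj-* projector files,
# which are vision adapters rather than standalone LLMs.
found=$(find -L "$tmp/link" -name '*.gguf' ! -name 'mmproj-*.gguf')
echo "$found"
rm -rf "$tmp"
```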

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 09:44:16 +01:00
Felipe Cardoso
58124cd657 feat: add Qwen3.5 model catalog and agentic evaluation framework
Models:
- configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick),
  Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
- Updated benchmark setup to show catalog with download status
- docs/model-recommendations.md: memory planning, quantization guide

Agentic evaluation:
- scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench
  in a Python venv
- scripts/agentic/run-eval.sh: runs evaluations against local LLM server
  (ollama or llama.cpp). Suites: quick (HumanEval+IFEval), code
  (EvalPlus+BigCodeBench), tooluse (BFCL), full (all)
- bin/agentic: dispatcher with help
- docs/agentic-benchmarks.md: methodology, framework comparison, model
  recommendations for agentic use
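As a sketch of how an inspect-ai suite can target a local OpenAI-compatible server (the env vars come from inspect-ai's OpenAI provider; the task and model names are placeholders, not the actual run-eval.sh contents):

```shell
# llama.cpp's server exposes an OpenAI-compatible /v1 endpoint; a dummy
# key satisfies the client, and the base URL redirects it locally.
export OPENAI_BASE_URL="http://127.0.0.1:8080/v1"
export OPENAI_API_KEY="local"

# Placeholder task/model — the real script picks these per suite.
cmd="inspect eval inspect_evals/humaneval --model openai/local-model"
echo "$cmd"
```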

Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:20:23 +01:00
Felipe Cardoso
c596e38e9e Initial commit
2026-03-25 20:13:15 +01:00