Add `make optimize-power` (ryzenadj 85W, sysctl, THP, RADV nogttspill) with systemd services for boot/resume persistence. Integrate into `make optimize --all` as Phase 2. Update optimization log with RyzenAdj results (+46% tg at 70W sustained), KV sweep data, and quant shootout. Add Qwen3-Coder-30B and Nemotron-Cascade-2 to model catalog.
31 lines
2.3 KiB
Plaintext
31 lines
2.3 KiB
Plaintext
# Model catalog for benchmarking
|
|
# Format: NAME|HF_REPO|FILE|SIZE_GB|CATEGORY|DESCRIPTION
|
|
#
|
|
# Categories: smoke, standard, moe, dense
|
|
# Download with: huggingface-cli download REPO FILE --local-dir /data/models/llms/REPO
|
|
|
|
# ── Smoke tests (quick, small) ───────────────────────────
|
|
qwen3.5-0.8b-q8|unsloth/Qwen3.5-0.8B-GGUF|Qwen3.5-0.8B-Q8_0.gguf|0.8|smoke|Tiny, Q8 full precision
|
|
qwen3.5-2b-q4|unsloth/Qwen3.5-2B-GGUF|Qwen3.5-2B-Q4_K_S.gguf|1.2|smoke|Small dense 2B
|
|
qwen3.5-4b-q4|unsloth/Qwen3.5-4B-GGUF|Qwen3.5-4B-Q4_K_S.gguf|2.5|smoke|Small dense 4B
|
|
|
|
# ── Standard dense models ────────────────────────────────
|
|
qwen3.5-9b-q4|unsloth/Qwen3.5-9B-GGUF|Qwen3.5-9B-Q4_K_S.gguf|5.1|dense|Dense 9B
|
|
gpt-oss-20b-mxfp4|lmstudio-community/gpt-oss-20b-GGUF|gpt-oss-20b-MXFP4.gguf|12|dense|GPT-OSS 20B MXFP4
|
|
glm-4.7-flash-q6|lmstudio-community/GLM-4.7-Flash-GGUF|GLM-4.7-Flash-Q6_K.gguf|23|dense|GLM 4.7 Flash Q6
|
|
|
|
# ── Qwen3.5-27B dense (download needed) ─────────────────
|
|
qwen3.5-27b-q4|unsloth/Qwen3.5-27B-GGUF|Qwen3.5-27B-Q4_K_M.gguf|17|dense|Dense 27B, quality-first
|
|
|
|
# ── MoE models (fast generation, best for 64GB) ─────────
|
|
qwen3.5-35b-a3b-q4|unsloth/Qwen3.5-35B-A3B-GGUF|Qwen3.5-35B-A3B-UD-Q4_K_L.gguf|19|moe|MoE 35B, 3B active, Unsloth dynamic
|
|
qwen3.5-35b-a3b-q8|unsloth/Qwen3.5-35B-A3B-GGUF|Qwen3.5-35B-A3B-Q8_0.gguf|37|moe|MoE 35B Q8, near-full precision
|
|
nemotron-30b-a3b-q4|lmstudio-community/NVIDIA-Nemotron-3-Nano-30B-A3B-GGUF|NVIDIA-Nemotron-3-Nano-30B-A3B-Q4_K_M.gguf|23|moe|Nemotron MoE 30B, 3B active
|
|
nemotron-cascade2-q8|bartowski/nvidia_Nemotron-Cascade-2-30B-A3B-GGUF|nvidia_Nemotron-Cascade-2-30B-A3B-Q8_0.gguf|31|moe|Nemotron Cascade 2, Mamba-2 hybrid
|
|
|
|
# ── Coding models ─────────────────────────────────────────
|
|
qwen3-coder-30b-a3b-q6|unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF|Qwen3-Coder-30B-A3B-Instruct-UD-Q6_K_XL.gguf|26|moe|Agentic coding MoE, pure Transformer
|
|
|
|
# ── Draft models (speculative decoding) ───────────────────
|
|
qwen3.5-0.8b-q8-draft|unsloth/Qwen3.5-0.8B-GGUF|Qwen3.5-0.8B-Q8_0.gguf|0.8|draft|Draft for Qwen3.5 speculative decoding
|