chore: update model catalog with gemma 4, opus distill, and hw-bandwidth target
@@ -5,17 +5,23 @@
# Download with: huggingface-cli download REPO FILE --local-dir /data/models/llms/REPO
# ── Smoke tests (quick, small) ───────────────────────────
qwen3.5-0.8b-q8|unsloth/Qwen3.5-0.8B-GGUF|Qwen3.5-0.8B-Q8_0.gguf|0.8|smoke|Tiny, Q8 full precision
qwen2.5-0.5b-q8|lmstudio-community/Qwen2.5-0.5B-Instruct-GGUF|Qwen2.5-0.5B-Instruct-Q8_0.gguf|0.4|smoke|Tiny Qwen2.5, Q8
qwen3.5-0.8b-q8|unsloth/Qwen3.5-0.8B-GGUF|Qwen3.5-0.8B-Q8_0.gguf|0.8|smoke|Tiny Qwen3.5, Q8
qwen3.5-2b-q4|unsloth/Qwen3.5-2B-GGUF|Qwen3.5-2B-Q4_K_S.gguf|1.2|smoke|Small dense 2B
qwen3.5-4b-q4|unsloth/Qwen3.5-4B-GGUF|Qwen3.5-4B-Q4_K_S.gguf|2.5|smoke|Small dense 4B
# ── Standard dense models ────────────────────────────────
qwen3.5-9b-q4|unsloth/Qwen3.5-9B-GGUF|Qwen3.5-9B-Q4_K_S.gguf|5.1|dense|Dense 9B
gpt-oss-20b-mxfp4|lmstudio-community/gpt-oss-20b-GGUF|gpt-oss-20b-MXFP4.gguf|12|dense|GPT-OSS 20B MXFP4
glm-4.7-flash-q6|lmstudio-community/GLM-4.7-Flash-GGUF|GLM-4.7-Flash-Q6_K.gguf|23|dense|GLM 4.7 Flash Q6
glm-4.7-flash-q6|unsloth/GLM-4.7-Flash-GGUF|GLM-4.7-Flash-UD-Q6_K_XL.gguf|24|moe|GLM 4.7 Flash, UD Q6 (MoE 30B, 3B active)
# ── Qwen3.5-27B dense (download needed) ─────────────────
# ── Gemma 4 ────────────────────────────────────────────
gemma-4-26b-a4b-q6xl|unsloth/gemma-4-26B-A4B-it-GGUF|gemma-4-26B-A4B-it-UD-Q6_K_XL.gguf|22|moe|Gemma 4 MoE 26B, 4B active, UD Q6 XL
gemma-4-26b-a4b-q4s|unsloth/gemma-4-26B-A4B-it-GGUF|gemma-4-26B-A4B-it-UD-Q4_K_S.gguf|15|moe|Gemma 4 MoE 26B, 4B active, UD Q4
gemma-4-31b-q3xl|unsloth/gemma-4-31B-it-GGUF|gemma-4-31B-it-UD-Q3_K_XL.gguf|14|dense|Gemma 4 dense 31B, UD Q3 XL
# ── Qwen3.5-27B dense ──────────────────────────────────
qwen3.5-27b-q4|unsloth/Qwen3.5-27B-GGUF|Qwen3.5-27B-Q4_K_M.gguf|17|dense|Dense 27B, quality-first
qwen3.5-27b-opus-distill|Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF|Qwen3.5-27B.Q4_K_M.gguf|15|dense|Dense 27B, Claude Opus reasoning distilled v2
# ── MoE models (fast generation, best for 64GB) ─────────
qwen3.5-35b-a3b-q4|unsloth/Qwen3.5-35B-A3B-GGUF|Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf|21|moe|MoE 35B, 3B active, Unsloth dynamic XL
@@ -24,6 +30,7 @@ nemotron-cascade2-q8|bartowski/nvidia_Nemotron-Cascade-2-30B-A3B-GGUF|nvidia_Nem
# ── Coding models ─────────────────────────────────────────
qwen3-coder-30b-a3b-q6|unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF|Qwen3-Coder-30B-A3B-Instruct-UD-Q6_K_XL.gguf|26|moe|Agentic coding MoE, pure Transformer
qwen3-coder-next-q3|unsloth/Qwen3-Coder-Next-GGUF|Qwen3-Coder-Next-UD-Q3_K_XL.gguf|34|moe|80B MoE coder, >70% SWE-bench, hybrid DeltaNet
# ── Draft models (speculative decoding) ───────────────────
qwen3.5-0.8b-q8-draft|unsloth/Qwen3.5-0.8B-GGUF|Qwen3.5-0.8B-Q8_0.gguf|0.8|draft|Draft for Qwen3.5 speculative decoding
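
The catalog rows above appear to follow a six-field, pipe-delimited layout, and the header comment gives the download command shape (`huggingface-cli download REPO FILE --local-dir /data/models/llms/REPO`). The following is a minimal sketch of how such rows could be parsed and turned into that command. The field names (`model_id`, `size_gb`, `kind`, and so on), the reading of the fourth column as an approximate size in GB, and the helper names `parse_catalog` and `download_command` are assumptions made for illustration; the commit itself does not name the columns or show the script that consumes the file.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    # Field names are assumed; the catalog file does not label its columns.
    model_id: str
    repo: str
    filename: str
    size_gb: float  # assumed to be the approximate file size in GB
    kind: str       # e.g. smoke, dense, moe, draft
    description: str


def parse_catalog(text):
    """Parse pipe-delimited catalog rows, skipping blank lines and # comments."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split at most 5 times so a '|' inside the description is preserved.
        fields = line.split("|", 5)
        if len(fields) != 6:
            raise ValueError(f"line {lineno}: expected 6 fields, got {len(fields)}")
        model_id, repo, filename, size, kind, description = fields
        entries.append(
            CatalogEntry(model_id, repo, filename, float(size), kind, description)
        )
    return entries


def download_command(entry, root="/data/models/llms"):
    """Build the argv list matching the command in the catalog's header comment."""
    return [
        "huggingface-cli", "download", entry.repo, entry.filename,
        "--local-dir", f"{root}/{entry.repo}",
    ]


# Two rows copied from the catalog, plus a section header and a blank line.
SAMPLE = """\
# ── Gemma 4 ──
gemma-4-31b-q3xl|unsloth/gemma-4-31B-it-GGUF|gemma-4-31B-it-UD-Q3_K_XL.gguf|14|dense|Gemma 4 dense 31B, UD Q3 XL

qwen3.5-0.8b-q8-draft|unsloth/Qwen3.5-0.8B-GGUF|Qwen3.5-0.8B-Q8_0.gguf|0.8|draft|Draft for Qwen3.5 speculative decoding
"""

if __name__ == "__main__":
    for entry in parse_catalog(SAMPLE):
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(download_command(entry)))
```

Returning the command as an argument list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting concerns; `shlex.join` is used only for display.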