feat: add Qwen3.5 model catalog and agentic evaluation framework
Models: - configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick), Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding) - Updated benchmark setup to show catalog with download status - docs/model-recommendations.md: memory planning, quantization guide Agentic evaluation: - scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench in a Python venv - scripts/agentic/run-eval.sh: runs evaluations against local LLM server (ollama or llama.cpp). Suites: quick (HumanEval+IFEval), code (EvalPlus+BigCodeBench), tooluse (BFCL), full (all) - bin/agentic: dispatcher with help - docs/agentic-benchmarks.md: methodology, framework comparison, model recommendations for agentic use Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
22
configs/models.conf
Normal file
22
configs/models.conf
Normal file
@@ -0,0 +1,22 @@
|
||||
# Model catalog for benchmarking
|
||||
# Format: NAME|HF_REPO|FILE|SIZE_GB|CATEGORY|DESCRIPTION
|
||||
#
|
||||
# Categories: smoke, standard, moe, dense, coding, agentic
|
||||
# Download with: huggingface-cli download REPO FILE --local-dir data/models
|
||||
|
||||
# ── Smoke tests (quick, small) ───────────────────────────
|
||||
qwen3-4b|unsloth/Qwen3-4B-GGUF|Qwen3-4B-Q4_K_M.gguf|3|smoke|Quick validation
|
||||
|
||||
# ── Standard benchmarks ──────────────────────────────────
|
||||
qwen3-14b|unsloth/Qwen3-14B-GGUF|Qwen3-14B-Q4_K_M.gguf|9|standard|Standard test model
|
||||
|
||||
# ── Qwen3.5 MoE models (fast generation, best for 64GB) ─
|
||||
qwen3.5-35b-a3b-q8|unsloth/Qwen3.5-35B-A3B-GGUF|Qwen3.5-35B-A3B-Q8_0.gguf|37|moe|Top pick: near-full precision, 3B active
|
||||
qwen3.5-35b-a3b-q4|unsloth/Qwen3.5-35B-A3B-GGUF|Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf|22|moe|Best quality/size ratio, 3B active
|
||||
|
||||
# ── Qwen3.5 dense models ────────────────────────────────
|
||||
qwen3.5-27b-q4|unsloth/Qwen3.5-27B-GGUF|Qwen3.5-27B-Q4_K_M.gguf|17|dense|Dense 27B, quality-first
|
||||
qwen3.5-27b-q8|unsloth/Qwen3.5-27B-GGUF|Qwen3.5-27B-Q8_0.gguf|29|dense|Dense 27B, max quality
|
||||
|
||||
# ── Coding / agentic models ─────────────────────────────
|
||||
qwen3-coder-30b-a3b|unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF|Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf|18|agentic|Best for tool use + coding, 3B active
|
||||
Reference in New Issue
Block a user