feat: add Qwen3.5 model catalog and agentic evaluation framework
Models:
- configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick), Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
- Updated benchmark setup to show catalog with download status
- docs/model-recommendations.md: memory planning, quantization guide

Agentic evaluation:
- scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench in a Python venv
- scripts/agentic/run-eval.sh: runs evaluations against a local LLM server (ollama or llama.cpp). Suites: quick (HumanEval + IFEval), code (EvalPlus + BigCodeBench), tooluse (BFCL), full (all)
- bin/agentic: dispatcher with help
- docs/agentic-benchmarks.md: methodology, framework comparison, model recommendations for agentic use

Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
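The suite selection in scripts/agentic/run-eval.sh can be sketched as a simple case dispatch. The suite names (quick, code, tooluse, full) come from the commit message; the concrete task identifiers mapped to each suite are assumptions for illustration:

```shell
#!/usr/bin/env sh
# Sketch of the suite dispatch described for scripts/agentic/run-eval.sh.
# Suite names are from the commit message; the task lists are assumptions.
suite="${1:-quick}"
case "$suite" in
  quick)   tasks="humaneval ifeval" ;;
  code)    tasks="evalplus bigcodebench" ;;
  tooluse) tasks="bfcl" ;;
  full)    tasks="humaneval ifeval evalplus bigcodebench bfcl" ;;
  *)       echo "unknown suite: $suite" >&2; exit 1 ;;
esac
echo "selected tasks: $tasks"
```

Invoked as `run-eval.sh code`, this would select the EvalPlus and BigCodeBench tasks; with no argument it falls back to the quick suite.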
@@ -43,6 +43,24 @@ The most comprehensive community resource for Strix Halo LLM optimization.
- [vLLM](https://github.com/vllm-project/vllm) — High-throughput serving
- [llama-benchy](https://github.com/eugr/llama-benchy) — Multi-backend LLM benchmarking
## Qwen3.5 Models (GGUF)
- [unsloth/Qwen3.5-35B-A3B-GGUF](https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF) — Top pick for 64GB Strix Halo (MoE, 3B active)
- [unsloth/Qwen3.5-27B-GGUF](https://huggingface.co/unsloth/Qwen3.5-27B-GGUF) — Dense 27B
- [unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF](https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF) — Best for agentic/coding
- [Qwen3.5 Official](https://github.com/QwenLM/Qwen3.5) — Model family overview
- [Unsloth Dynamic 2.0](https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs) — Adaptive quantization methodology
- [Unsloth Studio](https://unsloth.ai/docs/new/studio) — Training + inference UI (beta)
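The memory-planning rule of thumb behind these picks is weights ≈ params × bits-per-weight ÷ 8. A minimal sketch, assuming ~4.8 bits per weight as a rough average for Q4_K_M-class quants (an assumption, not a measured figure; KV cache and context overhead come on top):

```shell
#!/usr/bin/env sh
# Rough GGUF weight footprint in GB: params (billions) * bits-per-weight / 8.
# ~4.8 bpw is an assumed average for Q4_K_M-class quants; KV cache is extra.
estimate_gb() { awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f", p * b / 8 }'; }
echo "Qwen3.5-35B-A3B @ ~4.8 bpw: $(estimate_gb 35 4.8) GB weights"
echo "Qwen3.5-27B     @ ~4.8 bpw: $(estimate_gb 27 4.8) GB weights"
```

Both fit comfortably in a 64 GB Strix Halo's unified memory at this quant level, which is why the MoE 35B (only 3B active parameters per token) is the top pick: similar footprint, much lower per-token compute.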
## Agentic Evaluation
- [Inspect AI](https://github.com/UKGovernmentBEIS/inspect_ai) — All-in-one eval framework (HumanEval, BFCL, IFEval, GAIA)
- [EvalPlus](https://github.com/evalplus/evalplus) — HumanEval+ / MBPP+ with native ollama support
- [BigCodeBench](https://github.com/bigcode-project/bigcodebench) — 1,140 coding tasks across 139 libraries
- [BFCL](https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard) — Berkeley Function Calling Leaderboard
- [SWE-bench](https://github.com/princeton-nlp/SWE-bench) — Real GitHub issue resolution
- [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) — Optimized agentic framework for Qwen models
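As a usage sketch, these harnesses can point at a local OpenAI-compatible endpoint such as ollama's (served at `localhost:11434/v1` by default). The `evalplus.evaluate` flag names and the model tag below are assumptions from memory — verify them against the EvalPlus documentation before running:

```shell
#!/usr/bin/env sh
# Hedged sketch: run EvalPlus HumanEval+ against a local ollama endpoint.
# Flag names and the model tag are assumptions; check the EvalPlus docs.
base_url="http://localhost:11434/v1"
model="qwen3-coder:30b"   # hypothetical local model tag
cmd="evalplus.evaluate --model $model --dataset humaneval --backend openai --base-url $base_url"
if curl -fsS "$base_url/models" >/dev/null 2>&1; then
  $cmd
else
  echo "no local server at $base_url; would run: $cmd"
fi
```

Guarding on the `/models` probe keeps the script a no-op when no server is up, which is convenient when the same snippet is wired into a Makefile target.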
## AMD GPU Profiling
- [Radeon GPU Profiler (RGP)](https://gpuopen.com/rgp/) — Hardware-level Vulkan/HIP profiling