- Catalog * indicator now searches recursively (finds models in subdirs)
- GPU verification suppresses crun stderr from toolbox ("directory not found" noise)
- Matches on "radeon" and "available devices" for Vulkan output
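A minimal Python sketch of the two checks above. It assumes the catalog indicator only needs a filename match anywhere under the models directory, and that either Vulkan string counts as a hit (matched case-insensitively). `model_present` and `vulkan_gpu_visible` are hypothetical names, not the scripts' actual code:

```python
from pathlib import Path


def model_present(models_dir: Path, filename: str) -> bool:
    """True if `filename` exists anywhere under `models_dir`, subdirectories included."""
    # rglob walks the whole tree, so models stored in subdirs are found too
    return models_dir.is_dir() and any(p.is_file() for p in models_dir.rglob(filename))


def vulkan_gpu_visible(vulkan_output: str) -> bool:
    """True if the Vulkan device listing looks like it found a GPU."""
    # Assumption: either marker string is enough; compare case-insensitively
    text = vulkan_output.lower()
    return "radeon" in text or "available devices" in text
```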
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Models:
- configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick),
  Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
- Benchmark setup now shows the catalog with each model's download status
- docs/model-recommendations.md: memory planning, quantization guide
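As a rough illustration of the memory planning, weight memory scales as parameters × bits per weight / 8. This sketch assumes a flat bits-per-weight figure and ignores KV cache and runtime overhead; `weight_memory_gib` is a hypothetical helper. For the MoE model, all ~35B parameters must stay resident even though the A3B suffix indicates only about 3B are active per token:

```python
def weight_memory_gib(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB (no KV cache, no overhead)."""
    # bytes = parameters * bits / 8, then convert bytes to GiB
    return total_params * bits_per_weight / 8 / 2**30


# MoE: every expert must be loaded, so use total params (35B), not active (3B)
print(round(weight_memory_gib(35e9, 4.0), 1))  # → 16.3
```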
Agentic evaluation:
- scripts/agentic/setup.sh: installs inspect-ai, evalplus, and bigcodebench
  in a Python venv
- scripts/agentic/run-eval.sh: runs evaluations against a local LLM server
  (ollama or llama.cpp). Suites: quick (HumanEval+IFEval), code
  (EvalPlus+BigCodeBench), tooluse (BFCL), full (all)
- bin/agentic: command dispatcher with built-in help
- docs/agentic-benchmarks.md: methodology, framework comparison, model
  recommendations for agentic use
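The suite layout described for run-eval.sh, sketched as a Python mapping. It assumes "full" is simply the union of the other three suites; `SUITES` and `benchmarks_for` are hypothetical names, not what the script actually uses:

```python
SUITES: dict[str, list[str]] = {
    "quick": ["humaneval", "ifeval"],
    "code": ["evalplus", "bigcodebench"],
    "tooluse": ["bfcl"],
}
# Assumption: "full" runs every benchmark from the other suites, in order
SUITES["full"] = [b for name in ("quick", "code", "tooluse") for b in SUITES[name]]


def benchmarks_for(suite: str) -> list[str]:
    """Resolve a suite name to its benchmark list, rejecting unknown names."""
    try:
        return SUITES[suite]
    except KeyError:
        raise ValueError(f"unknown suite {suite!r}; choose from {', '.join(SUITES)}") from None
```

For example, `benchmarks_for("code")` returns `["evalplus", "bigcodebench"]`; an unknown name fails fast with the list of valid suites.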
Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>