feat: add Qwen3.5 model catalog and agentic evaluation framework

Models:
- configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick),
  Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
- Updated benchmark setup to show catalog with download status
- docs/model-recommendations.md: memory planning, quantization guide

Agentic evaluation:
- scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench
  in a Python venv
- scripts/agentic/run-eval.sh: runs evaluations against local LLM server
  (ollama or llama.cpp). Suites: quick (HumanEval+IFEval),
  code (EvalPlus+BigCodeBench), tooluse (BFCL), full (all)
- bin/agentic: dispatcher with help
- docs/agentic-benchmarks.md: methodology, framework comparison, model
  recommendations for agentic use

Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
18
Makefile
@@ -1,4 +1,4 @@
-.PHONY: help audit audit-full monitor monitor-simple benchmark benchmark-baseline benchmark-compare optimize verify test
+.PHONY: help audit audit-full monitor monitor-simple benchmark benchmark-baseline benchmark-compare optimize verify test agentic-setup agentic-quick agentic-full
 
 help: ## Show available commands
 	@echo "Strix Halo Optimization Toolkit"
@@ -57,6 +57,22 @@ verify: ## Post-optimization verification checklist
 rollback: ## Rollback optimizations
 	@bash scripts/optimize/rollback.sh
 
+# --- Agentic Evaluation ---
+agentic-setup: ## Install agentic evaluation frameworks (inspect-ai, evalplus)
+	@bash bin/agentic setup
+
+agentic-quick: ## EvalPlus + IFEval quick eval (needs --model, ~1h)
+	@bash bin/agentic quick $(ARGS)
+
+agentic-code: ## Code generation eval: EvalPlus + BigCodeBench (~2-3h)
+	@bash bin/agentic code $(ARGS)
+
+agentic-tooluse: ## Tool/function calling eval: BFCL (~1-2h)
+	@bash bin/agentic tooluse $(ARGS)
+
+agentic-full: ## All agentic evaluations (~5-6h)
+	@bash bin/agentic full $(ARGS)
+
 # --- Tests ---
 test: ## Run BATS test suite
 	@bats tests/
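
The agentic targets above forward `$(ARGS)` to `bin/agentic`, so flags such as `--model` would presumably be supplied on the `make` command line. The sketch below illustrates that pass-through using a stand-in Makefile and a stub dispatcher in a temporary directory. The stub script and the model name `my-model` are hypothetical and are not part of this commit; the real `bin/agentic` may parse its arguments differently.

```shell
# Hypothetical demo: a stand-in Makefile and stub dispatcher, NOT the real bin/agentic.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/bin"

# Stub dispatcher: prints the suite name and any remaining arguments.
printf '%s\n' '#!/usr/bin/env bash' 'echo "suite=$1 args=${*:2}"' > "$demo_dir/bin/agentic"

# One target copied from the diff; the recipe line must start with a tab.
printf 'agentic-quick:\n\t@bash bin/agentic quick $(ARGS)\n' > "$demo_dir/Makefile"

# ARGS set on the command line is expanded into the recipe by make.
output="$(cd "$demo_dir" && make -s agentic-quick ARGS="--model my-model")"
echo "$output"

rm -rf "$demo_dir"
```

Against the real repository, the equivalent call would be `make agentic-quick ARGS="--model <name>"`, with the model name chosen from whatever the local server exposes.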