feat: add Qwen3.5 model catalog and agentic evaluation framework

Models:
- configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick),
  Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
- Updated benchmark setup to show catalog with download status
- docs/model-recommendations.md: memory planning, quantization guide

Agentic evaluation:
- scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench
  in a Python venv
- scripts/agentic/run-eval.sh: runs evaluations against local LLM server
  (ollama or llama.cpp). Suites: quick (HumanEval+IFEval), code
  (EvalPlus+BigCodeBench), tooluse (BFCL), full (all)
- bin/agentic: dispatcher with help
- docs/agentic-benchmarks.md: methodology, framework comparison, model
  recommendations for agentic use

Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:20:23 +01:00
parent 71053997be
commit 58124cd657
11 changed files with 1354 additions and 16 deletions
--- a/README.md
+++ b/README.md
@@ -56,6 +56,12 @@ Each `[!!]` is an optimization opportunity. Run `make optimize` to address them.
 | `make optimize-vram` | BIOS VRAM guidance + GTT verification |
 | `make verify` | Post-optimization verification checklist |
 | `sudo make rollback` | Rollback optimizations |
+| `make agentic-setup` | Install agentic eval frameworks (inspect-ai, evalplus) |
+| `make agentic-quick ARGS="--model NAME"` | EvalPlus + IFEval (~1 hour) |
+| `make agentic-code ARGS="--model NAME"` | Code generation evals (~2-3 hours) |
+| `make agentic-tooluse ARGS="--model NAME"` | BFCL function calling eval (~1-2 hours) |
+| `make agentic-full ARGS="--model NAME"` | All agentic evaluations (~5-6 hours) |
+| `make test` | Run BATS test suite |

 ## Optimization Workflow

@@ -107,6 +113,8 @@ See [docs/architecture.md](docs/architecture.md) for the full architecture, data
 | [docs/benchmarking.md](docs/benchmarking.md) | Benchmark methodology, test params, result interpretation |
 | [docs/bios-vram-guide.md](docs/bios-vram-guide.md) | HP ZBook BIOS configuration for VRAM |
 | [docs/troubleshooting.md](docs/troubleshooting.md) | Common issues and fixes |
+| [docs/model-recommendations.md](docs/model-recommendations.md) | Qwen3.5 models, quantization, memory planning |
+| [docs/agentic-benchmarks.md](docs/agentic-benchmarks.md) | Agentic evaluation frameworks and methodology |
 | [docs/references.md](docs/references.md) | External links: AMD docs, toolboxes, community resources |

 ## Contributing