feat: add Qwen3.5 model catalog and agentic evaluation framework

Models:
- configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick),
  Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
- Updated benchmark setup to show catalog with download status
- docs/model-recommendations.md: memory planning, quantization guide

Agentic evaluation:
- scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench
  in a Python venv
- scripts/agentic/run-eval.sh: runs evaluations against a local LLM
  server (ollama or llama.cpp). Suites: quick (HumanEval+ and IFEval),
  code (EvalPlus+BigCodeBench), tooluse (BFCL), full (all)
- bin/agentic: dispatcher with help
- docs/agentic-benchmarks.md: methodology, framework comparison, model
  recommendations for agentic use
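
The suite names above can be read as a simple lookup from suite to
benchmarks. The following is a minimal sketch of that mapping, not the
actual contents of run-eval.sh: the function name `suite_benchmarks` and
the lowercase benchmark labels are hypothetical, and `full` is assumed to
be the union of the other three suites.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a suite name to the benchmarks it covers.
# The labels are illustrative; they are not the identifiers that
# run-eval.sh or the underlying frameworks use.
suite_benchmarks() {
    case "$1" in
        quick)   echo "humaneval+ ifeval" ;;
        code)    echo "evalplus bigcodebench" ;;
        tooluse) echo "bfcl" ;;
        # Assumption: "full (all)" means every benchmark from the other suites.
        full)    echo "$(suite_benchmarks quick) $(suite_benchmarks code) $(suite_benchmarks tooluse)" ;;
        *)       echo "unknown suite: $1" >&2; return 1 ;;
    esac
}
```

Calling `suite_benchmarks code` prints the two code-generation benchmarks,
and an unrecognized name returns a non-zero status so a caller can stop
early.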

Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Felipe Cardoso
2026-03-26 00:20:23 +01:00
parent 71053997be
commit 58124cd657
11 changed files with 1354 additions and 16 deletions


@@ -56,6 +56,12 @@ Each `[!!]` is an optimization opportunity. Run `make optimize` to address them.
| `make optimize-vram` | BIOS VRAM guidance + GTT verification |
| `make verify` | Post-optimization verification checklist |
| `sudo make rollback` | Rollback optimizations |
| `make agentic-setup` | Install agentic eval frameworks (inspect-ai, evalplus) |
| `make agentic-quick ARGS="--model NAME"` | EvalPlus + IFEval (~1 hour) |
| `make agentic-code ARGS="--model NAME"` | Code generation evals (~2-3 hours) |
| `make agentic-tooluse ARGS="--model NAME"` | BFCL function calling eval (~1-2 hours) |
| `make agentic-full ARGS="--model NAME"` | All agentic evaluations (~5-6 hours) |
| `make test` | Run BATS test suite |
## Optimization Workflow
@@ -107,6 +113,8 @@ See [docs/architecture.md](docs/architecture.md) for the full architecture, data
| [docs/benchmarking.md](docs/benchmarking.md) | Benchmark methodology, test params, result interpretation |
| [docs/bios-vram-guide.md](docs/bios-vram-guide.md) | HP ZBook BIOS configuration for VRAM |
| [docs/troubleshooting.md](docs/troubleshooting.md) | Common issues and fixes |
| [docs/model-recommendations.md](docs/model-recommendations.md) | Qwen3.5 models, quantization, memory planning |
| [docs/agentic-benchmarks.md](docs/agentic-benchmarks.md) | Agentic evaluation frameworks and methodology |
| [docs/references.md](docs/references.md) | External links: AMD docs, toolboxes, community resources |
## Contributing