feat: add Qwen3.5 model catalog and agentic evaluation framework
Models:
- configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick), Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
- Updated benchmark setup to show the catalog with download status
- docs/model-recommendations.md: memory planning, quantization guide

Agentic evaluation:
- scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench in a Python venv
- scripts/agentic/run-eval.sh: runs evaluations against a local LLM server (ollama or llama.cpp). Suites: quick (HumanEval+IFEval), code (EvalPlus+BigCodeBench), tooluse (BFCL), full (all)
- bin/agentic: dispatcher with help
- docs/agentic-benchmarks.md: methodology, framework comparison, model recommendations for agentic use

Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -56,6 +56,12 @@ Each `[!!]` is an optimization opportunity. Run `make optimize` to address them.
| `make optimize-vram` | BIOS VRAM guidance + GTT verification |
| `make verify` | Post-optimization verification checklist |
| `sudo make rollback` | Rollback optimizations |
| `make agentic-setup` | Install agentic eval frameworks (inspect-ai, evalplus) |
| `make agentic-quick ARGS="--model NAME"` | EvalPlus + IFEval (~1 hour) |
| `make agentic-code ARGS="--model NAME"` | Code generation evals (~2-3 hours) |
| `make agentic-tooluse ARGS="--model NAME"` | BFCL function calling eval (~1-2 hours) |
| `make agentic-full ARGS="--model NAME"` | All agentic evaluations (~5-6 hours) |
| `make test` | Run BATS test suite |

## Optimization Workflow

@@ -107,6 +113,8 @@ See [docs/architecture.md](docs/architecture.md) for the full architecture, data
| [docs/benchmarking.md](docs/benchmarking.md) | Benchmark methodology, test params, result interpretation |
| [docs/bios-vram-guide.md](docs/bios-vram-guide.md) | HP ZBook BIOS configuration for VRAM |
| [docs/troubleshooting.md](docs/troubleshooting.md) | Common issues and fixes |
| [docs/model-recommendations.md](docs/model-recommendations.md) | Qwen3.5 models, quantization, memory planning |
| [docs/agentic-benchmarks.md](docs/agentic-benchmarks.md) | Agentic evaluation frameworks and methodology |
| [docs/references.md](docs/references.md) | External links: AMD docs, toolboxes, community resources |

## Contributing
