Commit Graph

5 Commits

Author SHA1 Message Date
Felipe Cardoso
fb1e57f1bf feat: make llama-rocm-7.2 a required toolbox in benchmark setup
ROCm 7.2 is now created alongside vulkan-radv during setup, giving
Vulkan vs ROCm comparison in baseline and post-optimization benchmarks.

Smoke test: ROCm 7.2 on Qwen3.5-0.8B → 8090 t/s pp512, 161 t/s tg128
(vs Vulkan: 8900 t/s pp512, 177 t/s tg128)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:23:03 +01:00
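
As a rough worked check on the smoke-test numbers above, the gap between the two backends can be expressed as a percentage. This is only a sketch: `relative_drop` is a hypothetical helper, not something from the repository, and the inputs are the figures quoted in the commit message.

```shell
# Hypothetical helper: percentage by which the first value falls short of the second.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
relative_drop() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

relative_drop 8090 8900   # pp512, ROCm 7.2 vs Vulkan: prints 9.1
relative_drop 161 177     # tg128, ROCm 7.2 vs Vulkan: prints 9.0
```

On this single small model, ROCm 7.2 comes out roughly 9% behind Vulkan on both prompt processing and token generation.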
Felipe Cardoso
d22c062ca7 fix: model catalog shows download status, GPU detection in toolbox
- Catalog * indicator now searches recursively (finds models in subdirs)
- GPU verification suppresses toolbox crun stderr (directory not found noise)
- Matches on "radeon" and "available devices" for Vulkan output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:14:31 +01:00
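
The GPU check described in this commit might look roughly like the sketch below. `output_shows_gpu` is a hypothetical name, the case-insensitive match is an assumption, and the command that actually produces the device listing inside the toolbox is not given in the log, so it is left as a placeholder.

```shell
# Hypothetical helper: succeed if the device listing on stdin mentions a GPU.
# Matches the two strings named in the commit message; ignoring case is an assumption.
output_shows_gpu() {
    grep -qiE 'radeon|available devices'
}

# Assumed usage (the listing command is a placeholder, not taken from the repo):
#   toolbox run -c vulkan-radv <device-listing-command> 2>/dev/null | output_shows_gpu
# Redirecting stderr to /dev/null hides the crun "directory not found" noise.
```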
Felipe Cardoso
eb52ea52ce fix: follow symlinks in model discovery, update model catalog
- Add -L flag to find in benchmark scripts (follows symlinks to /data/models/llms/)
- Exclude mmproj-*.gguf (vision projection files, not LLM models)
- Update configs/models.conf: remove Qwen3-Coder (user prefers Qwen3.5-35B-A3B),
  add Qwen3.5-27B-Q4_K_M and Q8_0 variant, reflect actual downloaded models

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 09:44:16 +01:00
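
A minimal sketch of the discovery change in this commit, assuming the benchmark scripts list `*.gguf` files with `find`. The function name `discover_models` and the sorting are illustrative assumptions; only the `-L` flag and the `mmproj-*.gguf` exclusion come from the commit message.

```shell
# Hypothetical helper: list LLM model files under a directory.
# -L makes find follow symlinks (e.g. a link pointing at /data/models/llms/);
# mmproj-*.gguf files are vision projection files, so they are skipped.
discover_models() {
    local model_dir="$1"
    find -L "$model_dir" -type f -name '*.gguf' ! -name 'mmproj-*.gguf' | sort
}
```

Without `-L`, `find` reports a symlinked directory as a link and does not descend into it, so models stored behind a symlink would not be listed.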
Felipe Cardoso
58124cd657 feat: add Qwen3.5 model catalog and agentic evaluation framework
Models:
- configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick),
  Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
- Updated benchmark setup to show catalog with download status
- docs/model-recommendations.md: memory planning, quantization guide

Agentic evaluation:
- scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench
  in a Python venv
- scripts/agentic/run-eval.sh: runs evaluations against local LLM server
  (ollama or llama.cpp). Suites: quick (HumanEval+IFEval), code
  (EvalPlus+BigCodeBench), tooluse (BFCL), full (all)
- bin/agentic: dispatcher with help
- docs/agentic-benchmarks.md: methodology, framework comparison, model
  recommendations for agentic use

Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:20:23 +01:00
Felipe Cardoso
c596e38e9e Initial commit
2026-03-25 20:13:15 +01:00