strix-halo-optimizations

Author	SHA1	Message	Date
Felipe Cardoso	d22c062ca7	fix: model catalog shows download status, GPU detection in toolbox	2026-03-26 19:14:31 +01:00
    - Catalog * indicator now searches recursively (finds models in subdirs)
    - GPU verification suppresses toolbox crun stderr (directory not found noise)
    - Matches on "radeon" and "available devices" for Vulkan output
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Felipe Cardoso	cb25fa3f6f	feat: add benchmark filtering (--max-size, --category, --skip-longctx)	2026-03-26 19:07:24 +01:00
    Both run-baseline.sh and run-suite.sh now support:
    - --max-size GB: skip models larger than N GB (prevents OOM)
    - --category LIST: filter by catalog category (smoke,dense,moe)
    - --skip-longctx: skip 32K context tests (saves time + memory)
    - --reps N: configure repetition count
    - --help: shows usage with examples
    Safe pre-optimization run: benchmark baseline --max-size 20 --skip-longctx
    Full post-optimization: benchmark baseline (no filters, all models + longctx)
    Also: 4 new BATS tests for flag parsing (98 total, all passing)
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Felipe Cardoso	eb52ea52ce	fix: follow symlinks in model discovery, update model catalog	2026-03-26 09:44:16 +01:00
    - Add -L flag to find in benchmark scripts (follows symlinks to /data/models/llms/)
    - Exclude mmproj-*.gguf (vision projection files, not LLM models)
    - Update configs/models.conf: remove Qwen3-Coder (user prefers Qwen3.5-35B-A3B), add Qwen3.5-27B-Q4_K_M and Q8_0 variant, reflect actual downloaded models
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Felipe Cardoso	58124cd657	feat: add Qwen3.5 model catalog and agentic evaluation framework	2026-03-26 00:20:23 +01:00
    Models:
    - configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick), Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
    - Updated benchmark setup to show catalog with download status
    - docs/model-recommendations.md: memory planning, quantization guide
    Agentic evaluation:
    - scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench in a Python venv
    - scripts/agentic/run-eval.sh: runs evaluations against local LLM server (ollama or llama.cpp). Suites: quick (HumanEval+IFEval), code (EvalPlus+BigCodeBench), tooluse (BFCL), full (all)
    - bin/agentic: dispatcher with help
    - docs/agentic-benchmarks.md: methodology, framework comparison, model recommendations for agentic use
    Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Felipe Cardoso	e9cb5c491f	fix+test: improve test suite, fix 2 bugs found by tests	2026-03-25 22:22:41 +01:00
    Bugs fixed in production code:
    - compare.sh: Python truthiness on 0.0; `if b_val` was False for 0.0 t/s, displaying it as a dash instead of "0.0". Fixed with `is not None` checks.
    - compare.sh: ZeroDivisionError when computing delta % with 0.0 baseline.
    Test improvements (review findings):
    - detect.bats: kernel param tests now use real detect_kernel_param logic pattern (not a separate reimplementation). Added non-GiB-aligned RAM test, device ID without 0x prefix, empty firmware version, llama-bench detection, detect_total_physical_ram_kb tests.
    - benchmark_compare.bats: assert delta percentages (+20.0%, -25.0%, 0.0%), test 0.0 t/s edge case, test per-directory error messages, test config change detection with specific field assertions.
    - log_metrics.bats: add assert_success, --help test, timestamp format validation. Remove unused mock sysfs setup.
    - common.bats: fix data_dir test, remove redundant assertion, add cleanup.
    - test_helper.sh: remove unused FIXTURES_DIR.
    - Remove empty tests/fixtures/ directory.
    94 tests, all passing.
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Felipe Cardoso	af0515d05d	fix: address code review findings (HIGH + MEDIUM)	2026-03-25 20:19:44 +01:00
    - Replace GNU \b with portable word-boundary sed patterns in kernel-params
    - Warn on unknown CLI arguments instead of silently swallowing
    - Add floor check on recommended_gttsize_mib to prevent negative values
    - Fix Python operator precedence in benchmark log parser
    - Add root checks to tuned-profile.sh and rollback.sh
    - Remove redundant sudo calls (scripts already require root at entry)
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Felipe Cardoso	c596e38e9e	Initial commit	2026-03-25 20:13:15 +01:00
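
Commit eb52ea52ce changes model discovery in two ways. First, `find -L` now follows symlinks into /data/models/llms/. Second, `mmproj-*.gguf` vision projection files are skipped. Commit cb25fa3f6f adds a `--max-size` cap on top of that. The following is a minimal Python sketch of the combined behaviour. It is not the repository's shell code, and the function name `discover_models` is hypothetical. It also assumes that "GB" means 10^9 bytes.

```python
import fnmatch
import os


def discover_models(root, max_size_gb=None):
    """Return sorted paths of *.gguf model files under root (hypothetical helper).

    Roughly mirrors `find -L root -name '*.gguf' ! -name 'mmproj-*.gguf'`
    plus an optional size cap; a sketch, not the benchmark scripts themselves.
    """
    found = []
    # followlinks=True descends into symlinked directories, similar to find -L.
    # Unlike find -L, os.walk does not detect symlink cycles.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=True):
        for name in filenames:
            if not fnmatch.fnmatch(name, "*.gguf"):
                continue
            # Vision projection files are not standalone LLM models.
            if fnmatch.fnmatch(name, "mmproj-*.gguf"):
                continue
            path = os.path.join(dirpath, name)
            # os.path.getsize follows symlinks, so this is the target file's size.
            size_gb = os.path.getsize(path) / 1e9  # assumption: GB = 10**9 bytes
            if max_size_gb is not None and size_gb > max_size_gb:
                continue
            found.append(path)
    return sorted(found)
```

For example, `discover_models("/data/models/llms", max_size_gb=20)` would list only the models at or below 20 GB. That is roughly the selection `benchmark baseline --max-size 20` is described as making.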

7 Commits
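
Commit e9cb5c491f fixes two bugs in the Python embedded in compare.sh. The first is a truthiness bug: `if b_val` treats `0.0` as missing. The second is a division by zero when the baseline is `0.0`. The following is a minimal sketch of both fixes. The helper names `fmt_tps` and `fmt_delta` are hypothetical. Rendering an undefined delta, including one from a zero baseline, as a dash is an assumption and may not match what compare.sh prints.

```python
def fmt_tps(value):
    """Format a tokens/s value; None (missing data) renders as a dash."""
    # `if value:` would treat 0.0 as missing, so compare against None explicitly.
    return f"{value:.1f}" if value is not None else "-"


def fmt_delta(base, new):
    """Signed percentage change from base to new, or a dash when undefined."""
    if base is None or new is None:
        return "-"
    if base == 0.0:
        # Guard against ZeroDivisionError; a relative change from 0.0 is undefined.
        # Assumption: shown as a dash here.
        return "-"
    return f"{(new - base) / base * 100:+.1f}%"
```

The key point is the distinction between "no value" and "a value of zero". A benchmark that really measured 0.0 t/s is a result worth showing. Only `None` should collapse to a placeholder.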