strix-halo-optimizations

Author	SHA1	Message	Date
Felipe Cardoso	6ab08537ca	fix: address code review findings — batch args, venv path, serve flags - Fix missing BATCH_ARGS in long-context commands (both benchmark scripts) - Fix CLAUDE.md stale venv path (data/venv → .venv) and add serve/power docs - Add -b/--batch to bin/benchmark help text - Add --no-think flag to serve script (--reasoning-budget 0) - Sanitize model names in eval run directories - Simplify agentic setup to use requirements.txt - Add serve --help test, batch flag assertions to existing tests - Add requirements.txt for reproducible venv setup (Python 3.13)	2026-03-31 10:10:48 +02:00
Felipe Cardoso	7531f6fa74	feat(benchmark): add --kv-types flag for KV cache quantization sweep	2026-03-27 12:29:19 +01:00
Felipe Cardoso	cb25fa3f6f	feat: add benchmark filtering (--max-size, --category, --skip-longctx) Both run-baseline.sh and run-suite.sh now support: - --max-size GB: skip models larger than N GB (prevents OOM) - --category LIST: filter by catalog category (smoke,dense,moe) - --skip-longctx: skip 32K context tests (saves time + memory) - --reps N: configure repetition count - --help: shows usage with examples Safe pre-optimization run: benchmark baseline --max-size 20 --skip-longctx Full post-optimization: benchmark baseline (no filters, all models + longctx) Also: 4 new BATS tests for flag parsing (98 total, all passing) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 19:07:24 +01:00
Felipe Cardoso	e9cb5c491f	fix+test: improve test suite, fix 2 bugs found by tests Bugs fixed in production code: - compare.sh: Python truthiness on 0.0 — `if b_val` was False for 0.0 t/s, displaying it as a dash instead of "0.0". Fixed with `is not None` checks. - compare.sh: ZeroDivisionError when computing delta % with 0.0 baseline. Test improvements (review findings): - detect.bats: kernel param tests now use real detect_kernel_param logic pattern (not a separate reimplementation). Added non-GiB-aligned RAM test, device ID without 0x prefix, empty firmware version, llama-bench detection, detect_total_physical_ram_kb tests. - benchmark_compare.bats: assert delta percentages (+20.0%, -25.0%, 0.0%), test 0.0 t/s edge case, test per-directory error messages, test config change detection with specific field assertions. - log_metrics.bats: add assert_success, --help test, timestamp format validation. Remove unused mock sysfs setup. - common.bats: fix data_dir test, remove redundant assertion, add cleanup. - test_helper.sh: remove unused FIXTURES_DIR. - Remove empty tests/fixtures/ directory. 94 tests, all passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 22:22:41 +01:00
Felipe Cardoso	a403dd9ce0	test: add BATS test suite (79 tests) - tests/common.bats: PROJECT_ROOT detection, is_cmd, timestamp, data_dir, logging functions, color handling, require_root - tests/detect.bats: GPU sysfs reads with mock sysfs tree, kernel param parsing (word boundary, dot escaping, edge positions), recommended GTT/pages computation (64GB, 128GB, tiny, zero), firmware bad detection, stack detection - tests/format.bats: human_bytes (0, KiB, MiB, GiB boundaries, 64GiB), human_mib (sub-GiB, exact-GiB, recommended values, empty input) - tests/benchmark_compare.bats: improvement/regression display, empty results, missing files, usage output, config change detection - tests/log_metrics.bats: CSV header, data format, field count, input validation, unknown argument handling - tests/test_helper.sh: mock sysfs tree builder, bats-assert/support setup Makefile: add 'make test' target Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 22:15:34 +01:00

5 Commits
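
Note on commit e9cb5c491f: its message describes two related bugs in compare.sh, a truthiness check that treated 0.0 t/s as missing, and a ZeroDivisionError when the baseline was 0.0. The source of compare.sh is not shown here, so the following is only a minimal sketch of that class of fix. The function names (`format_rate`, `delta_percent`, `format_delta`) are hypothetical, and returning `None` for a zero baseline is an assumption, not necessarily what the commit does.

```python
from typing import Optional


def format_rate(value: Optional[float]) -> str:
    """Render a tokens/s value; only a missing value becomes a dash."""
    # `if value:` would be False for 0.0 and hide a real measurement,
    # so compare against None explicitly.
    return f"{value:.1f}" if value is not None else "-"


def delta_percent(baseline: Optional[float], current: Optional[float]) -> Optional[float]:
    """Percent change from baseline to current, or None when undefined."""
    if baseline is None or current is None:
        return None
    if baseline == 0.0:
        # Assumption: a zero baseline has no meaningful relative change,
        # so report "undefined" instead of dividing by zero.
        return None
    return (current - baseline) * 100.0 / baseline


def format_delta(delta: Optional[float]) -> str:
    """Render a signed percentage, or a dash when the delta is undefined."""
    return f"{delta:+.1f}%" if delta is not None else "-"


if __name__ == "__main__":
    for base, cur in [(10.0, 12.0), (10.0, 7.5), (0.0, 5.0), (None, 5.0)]:
        print(format_rate(base), format_rate(cur), format_delta(delta_percent(base, cur)))
```

Checking `is not None` keeps "measured as zero" distinct from "not measured", which is the distinction the commit message says the original `if b_val` check lost.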