Felipe Cardoso 58124cd657 feat: add Qwen3.5 model catalog and agentic evaluation framework
Models:
- configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick),
  Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
- Updated benchmark setup to show catalog with download status
- docs/model-recommendations.md: memory planning, quantization guide

Agentic evaluation:
- scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench
  in a Python venv
- scripts/agentic/run-eval.sh: runs evaluations against local LLM server
  (ollama or llama.cpp). Suites: quick (HumanEval+IFEval), code
  (EvalPlus+BigCodeBench), tooluse (BFCL), full (all)
- bin/agentic: dispatcher with help
- docs/agentic-benchmarks.md: methodology, framework comparison, model
  recommendations for agentic use

Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:20:23 +01:00


CLAUDE.md — AI Assistant Context

Optimization toolkit for AMD Strix Halo (Ryzen AI MAX+ 395, Radeon 8060S gfx1151, 64 GB unified memory) on Fedora 43. Pure bash scripts with inline Python for JSON handling and GRUB editing. See README.md for user-facing commands.

Architecture

bin/ dispatchers → scripts/ implementations → lib/ shared libraries. Scripts source libs as needed: always common.sh first, then detect.sh if hardware detection is needed, then format.sh if formatted output is needed. Some scripts (e.g., rollback.sh) only need common.sh. Runtime data goes to data/ (gitignored). Full details in docs/architecture.md.
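The sourcing order above can be sketched as a minimal script skeleton. This is illustrative only: `LIB_DIR` resolution and the existence guard are assumptions, not copied from the repo's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical skeleton of a scripts/ entry point; LIB_DIR resolution and
# the existence guard are illustrative, not taken from this repo.
set -euo pipefail
LIB_DIR="${LIB_DIR:-lib}"

source_lib() {                     # source a lib only if it is present
    if [[ -f "$LIB_DIR/$1" ]]; then
        source "$LIB_DIR/$1"
    fi
}

source_lib common.sh   # always first
source_lib detect.sh   # only if hardware detection is needed
source_lib format.sh   # only if formatted output is needed
order="common -> detect -> format"
echo "sourcing order: $order"
```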

Safety Rules

  • scripts/optimize/kernel-params.sh modifies /etc/default/grub — requires root, backs up to data/backups/ first. Always maintain the Python-with-env-vars pattern for GRUB editing (no shell variable interpolation into Python code).
  • scripts/optimize/tuned-profile.sh and rollback.sh require root and save previous state for rollback.
  • data/backups/ contains GRUB backups and tuned profile snapshots — never delete these.
  • Optimization scripts that modify system state (kernel-params.sh, tuned-profile.sh, rollback.sh) check $EUID at the top and exit immediately if not root. Guidance-only scripts (vram-gtt.sh, verify.sh) do not require root.
  • All Python blocks receive data via environment variables (os.environ), never via shell interpolation into Python source. This prevents injection. Do not revert to '''$var''' or "$var" patterns inside Python heredocs.
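The env-var pattern looks roughly like the following sketch (the variable name and message are illustrative, not the repo's actual GRUB-editing code). Because the heredoc delimiter is quoted, the shell never expands anything inside the Python source; the value crosses the boundary only through `os.environ`.

```shell
#!/usr/bin/env bash
# Sketch of the env-var pattern (values illustrative): data reaches Python
# through os.environ, never by interpolating $vars into the Python source,
# so a hostile value cannot be executed as code.
set -euo pipefail

NEW_PARAM='amd_iommu=off'
result="$(NEW_PARAM="$NEW_PARAM" python3 - <<'EOF'
import os
param = os.environ["NEW_PARAM"]    # read, never interpolate
print(f"would append kernel param: {param}")
EOF
)"
echo "$result"
```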

Key Technical Details

  • GPU sysfs: Auto-detected by find_gpu_card() in lib/detect.sh (matches vendor 0x1002). Falls back to first card with mem_info_vram_total.
  • Memory recommendations: recommended_gttsize_mib() in detect.sh computes from total physical RAM = visible RAM + dedicated VRAM (the VRAM is still physical memory). Floor at 1 GiB.
  • Kernel param detection: detect_kernel_param() uses word-boundary-anchored regex to avoid iommu matching amd_iommu.
  • Benchmark invocation: toolbox run -c NAME -- [env ROCBLAS_USE_HIPBLASLT=1] /path/to/llama-bench -ngl 99 -mmp 0 -fa 1 -r N. ENV_ARGS passed as a proper bash array (not string splitting).
  • llama-bench output: Pipe-delimited table. The Python parser reads fixed column indices (parts[8] = test name, parts[9] = t/s), so any upstream format change would break it.
  • ROCm for gfx1151: Scripts set ROCBLAS_USE_HIPBLASLT=1 in benchmark ENV_ARGS. HSA_OVERRIDE_GFX_VERSION=11.5.1 is set inside the toolbox containers (not by our scripts) — needed for ollama and native ROCm builds.
  • Fedora GRUB: Prefers grubby (BLS), falls back to grub2-mkconfig, then grub-mkconfig. All three paths are handled.
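The word-boundary matching used for kernel param detection can be sketched as below. The function name mirrors the doc, but the exact regex and the sample cmdline are assumptions; the real detect_kernel_param() in lib/detect.sh may differ in detail.

```shell
#!/usr/bin/env bash
# Sketch of word-boundary-anchored kernel param matching; regex and sample
# cmdline are illustrative, not copied from lib/detect.sh.
set -euo pipefail

has_kernel_param() {   # $1 = param name, $2 = kernel cmdline
    # (^| ) and ( |$) anchors stop "iommu" matching inside "amd_iommu"
    grep -Eq "(^| )$1(=[^ ]*)?( |\$)" <<<"$2"
}

CMDLINE='amd_iommu=off amdgpu.gttsize=65536'   # illustrative cmdline
iommu=$(has_kernel_param iommu "$CMDLINE" && echo present || echo absent)
amd=$(has_kernel_param amd_iommu "$CMDLINE" && echo present || echo absent)
echo "iommu=$iommu amd_iommu=$amd"
```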

Conventions

  • set -euo pipefail in every executable script
  • snake_case function names, UPPER_CASE for constants and loop variables
  • 4-space indentation, no tabs
  • lib/ files are sourced (no shebang enforcement), but include #!/usr/bin/env bash for editor support
  • Colors gated on [[ -t 1 ]] (disabled when piped)
  • bc used for float math; python3 for JSON and GRUB editing only
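The color-gating convention can be sketched as follows; the variable and function names are illustrative, not the actual names in lib/format.sh. When stdout is redirected or piped, `[[ -t 1 ]]` fails and the escape sequences collapse to empty strings.

```shell
#!/usr/bin/env bash
# Sketch of the color-gating convention: colors only when stdout is a
# terminal, so piped output stays plain. Names are illustrative.
set -euo pipefail

init_colors() {
    if [[ -t 1 ]]; then
        GREEN=$'\033[32m'; RESET=$'\033[0m'
    else
        GREEN=''; RESET=''
    fi
}
init_colors
printf '%sok%s\n' "$GREEN" "$RESET"
```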

Validating Changes

make audit          # Quick check — shows system status with pass/fail indicators
make verify         # 9-point optimization checklist
bin/audit --json | python3 -m json.tool   # Verify JSON output is valid

Agentic Evaluation

Scripts in scripts/agentic/ with dispatcher at bin/agentic. Uses a Python venv at data/venv/. Eval frameworks: inspect-ai (all-in-one), evalplus (HumanEval+/MBPP+), bigcodebench. All target an OpenAI-compatible endpoint (ollama or llama.cpp server). Model catalog at configs/models.conf.
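Pointing an eval framework at the local server typically means exporting the OpenAI-compatible base URL and a placeholder key. The sketch below uses common OpenAI-SDK environment variable names and ollama's default port; none of these values are taken from this repo's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical wiring for targeting a local OpenAI-compatible server; the
# URL, port, and variable names follow common OpenAI-SDK conventions and
# are assumptions, not values from scripts/agentic/.
set -euo pipefail

export OPENAI_BASE_URL="http://localhost:11434/v1"   # ollama's default port
export OPENAI_API_KEY="dummy"                        # local servers ignore the key
echo "evals target: $OPENAI_BASE_URL"
```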

External Resources

All external links are centralized in docs/references.md. Key ones:

  • AMD ROCm Strix Halo guide (kernel params, GTT configuration)
  • Donato Capitella toolboxes (container images, benchmarks, VRAM estimator)
  • Qwen3.5 model family (GGUF quants by Unsloth)
  • Agentic eval frameworks (Inspect AI, EvalPlus, BFCL, BigCodeBench)