# CLAUDE.md — AI Assistant Context
Optimization toolkit for AMD Strix Halo (Ryzen AI MAX+ 395, Radeon 8060S gfx1151, 64 GB unified memory) on Fedora 43. Pure bash scripts with inline Python for JSON handling and GRUB editing. See README.md for user-facing commands.
## Architecture
`bin/` dispatchers → `scripts/` implementations → `lib/` shared libraries. Scripts source libs as needed: always `common.sh` first, then `detect.sh` if hardware detection is needed, then `format.sh` if formatted output is needed. Some scripts (e.g., `rollback.sh`) only need `common.sh`. Runtime data goes to `data/` (gitignored). Full details in `docs/architecture.md`.
## Safety Rules

- `scripts/optimize/kernel-params.sh` modifies `/etc/default/grub` — requires root, backs up to `data/backups/` first. Always maintain the Python-with-env-vars pattern for GRUB editing (no shell variable interpolation into Python code).
- `scripts/optimize/tuned-profile.sh` and `rollback.sh` require root and save previous state for rollback.
- `data/backups/` contains GRUB backups and tuned profile snapshots — never delete these.
- Optimization scripts that modify system state (`kernel-params.sh`, `tuned-profile.sh`, `rollback.sh`) check `$EUID` at the top and exit immediately if not root. Guidance-only scripts (`vram-gtt.sh`, `verify.sh`) do not require root.
- All Python blocks receive data via environment variables (`os.environ`), never via shell interpolation into Python source. This prevents injection. Do not revert to `'''$var'''` or `"$var"` patterns inside Python heredocs.
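The env-var pattern can be sketched as follows (the variable name `NEW_PARAM` and the printed message are illustrative, not taken from `kernel-params.sh`):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Data crosses into Python only through the environment; the quoted
# heredoc delimiter ('PY') stops the shell from interpolating anything
# into the Python source, so a hostile value can never become code.
NEW_PARAM='amdgpu.gttsize=65536' python3 - <<'PY'
import os

param = os.environ["NEW_PARAM"]  # read the value, never splice it in
print(f"would append to GRUB_CMDLINE_LINUX: {param}")
PY
```

Compare the forbidden form: an unquoted heredoc with `'''$NEW_PARAM'''` inside the Python source, where a value containing quotes rewrites the program.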
## Key Technical Details

- GPU sysfs: auto-detected by `find_gpu_card()` in `lib/detect.sh` (matches vendor `0x1002`). Falls back to the first card with `mem_info_vram_total`.
- Memory recommendations: `recommended_gttsize_mib()` in `detect.sh` computes from total physical RAM = visible RAM + dedicated VRAM (the VRAM is still physical memory). Floor at 1 GiB.
- Kernel param detection: `detect_kernel_param()` uses a word-boundary-anchored regex so that `iommu` does not match `amd_iommu`.
- Benchmark invocation: `toolbox run -c NAME -- [env ROCBLAS_USE_HIPBLASLT=1] /path/to/llama-bench -ngl 99 -mmp 0 -fa 1 -r N`. `ENV_ARGS` is passed as a proper bash array (not string splitting).
- llama-bench output: pipe-delimited table. The Python parser reads fixed column indices (`parts[8]` = test, `parts[9]` = t/s); an upstream format change would break parsing.
- ROCm for gfx1151: scripts set `ROCBLAS_USE_HIPBLASLT=1` in benchmark `ENV_ARGS`. `HSA_OVERRIDE_GFX_VERSION=11.5.1` is set inside the toolbox containers (not by our scripts) — needed for ollama and native ROCm builds.
- Fedora GRUB: prefers `grubby` (BLS), falls back to `grub2-mkconfig`, then `grub-mkconfig`. All three paths are handled.
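The word-boundary anchoring can be illustrated with a minimal re-implementation (a sketch in the spirit of `detect_kernel_param()`, not the repo's exact code):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Match a kernel parameter only at a word boundary: it must be
# preceded by start-of-string or whitespace, optionally carry a
# =value, and be followed by end-of-string or whitespace. This is
# why "iommu" cannot match inside "amd_iommu".
detect_kernel_param() {
    local param="$1" cmdline="$2"
    [[ "$cmdline" =~ (^|[[:space:]])"$param"(=[^[:space:]]*)?($|[[:space:]]) ]]
}

cmdline="quiet amd_iommu=on amdgpu.gttsize=65536"

if detect_kernel_param "amd_iommu" "$cmdline"; then
    echo "amd_iommu: present"
fi
if ! detect_kernel_param "iommu" "$cmdline"; then
    echo "iommu: absent"
fi
```

Note the quoting inside `=~`: the quoted `"$param"` is matched literally, while the unquoted anchors remain regex syntax.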
## Conventions

- `set -euo pipefail` in every executable script
- `snake_case` function names, `UPPER_CASE` for constants and loop variables
- 4-space indentation, no tabs
- `lib/` files are sourced (no shebang enforcement), but include `#!/usr/bin/env bash` for editor support
- Colors gated on `[[ -t 1 ]]` (disabled when piped)
- `bc` used for float math; `python3` for JSON and GRUB editing only
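The `[[ -t 1 ]]` gate can be sketched like this (variable names are illustrative, not the repo's):

```shell
#!/usr/bin/env bash
set -euo pipefail

# -t 1 is true only when stdout is a terminal; when the script is
# piped or redirected, every color variable collapses to an empty
# string, so downstream tools see clean text.
if [[ -t 1 ]]; then
    GREEN=$'\033[32m'; RESET=$'\033[0m'
else
    GREEN=''; RESET=''
fi

echo "${GREEN}PASS${RESET} GTT size configured"
```

Piping the script (e.g. through `cat` or into a log file) yields plain `PASS GTT size configured` with no escape codes.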
## Validating Changes

```bash
make audit                                # Quick check — shows system status with pass/fail indicators
make verify                               # 9-point optimization checklist
bin/audit --json | python3 -m json.tool   # Verify JSON output is valid
```
## Agentic Evaluation

Scripts live in `scripts/agentic/` with a dispatcher at `bin/agentic`. Uses a Python venv at `data/venv/`. Eval frameworks: inspect-ai (all-in-one), evalplus (HumanEval+/MBPP+), bigcodebench. All target an OpenAI-compatible endpoint (ollama or llama.cpp server). Model catalog at `configs/models.conf`.
## External Resources

All external links are centralized in `docs/references.md`. Key ones:
- AMD ROCm Strix Halo guide (kernel params, GTT configuration)
- Donato Capitella toolboxes (container images, benchmarks, VRAM estimator)
- Qwen3.5 model family (GGUF quants by Unsloth)
- Agentic eval frameworks (Inspect AI, EvalPlus, BFCL, BigCodeBench)