Commit Graph

11 Commits

Author SHA1 Message Date
Felipe Cardoso
c847991740 docs: add agentic coding evaluation landscape research
Comprehensive research (706 lines, dated 2026-03-30) covering evaluation
dimensions, benchmark suites, and open-weight model performance for
software engineering agent use cases on 64GB systems.

Also gitignore evalplus_results/ (runtime outputs) and ztop/ (nested repo).
2026-04-15 15:55:04 +02:00
Felipe Cardoso
15bb6a8ed9 feat(serve): set APEX I-Compact as default, harden benchmark workflow
Serving:
- make serve now launches Claude-distilled APEX 35B-A3B (16GB) with 2
  parallel slots and 256K context as the daily driver
- add serve-custom for ad-hoc model testing
- add flush-gpu to reclaim unified memory after stuck runs

Benchmarks:
- default Vulkan-only backends (ROCm trails at long context)
- add --backends filter to run-baseline.sh
- fix backend filter substring bug (grep -qFx for exact line match)
- fix model filter regex metacharacter bug (grep -qiF for literal)
- respect --tg in long-context tests instead of hardcoded n=32
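
The two filter fixes above can be sketched as follows. This is a minimal illustration, not the code in run-baseline.sh: the helper names `backend_selected` and `model_matches` and the sample backend and model strings are hypothetical, and only the grep flags come from the commit.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 is exactly one of the
# comma-separated names in $2 (-F literal, -x whole line, -q quiet).
backend_selected() {
    printf '%s\n' "$2" | tr ',' '\n' | grep -qFx -- "$1"
}

# Hypothetical helper: case-insensitive literal substring match, so a
# "." in the filter is not treated as a regex wildcard.
model_matches() {
    printf '%s\n' "$1" | grep -qiF -- "$2"
}

# Sample names below are made up for illustration.
backend_selected "vulkan-radv" "vulkan-radv,rocm-7.2" && echo "vulkan-radv: selected"
backend_selected "vulkan" "vulkan-radv,rocm-7.2" || echo "vulkan: not selected (substring only)"
model_matches "Qwen3.5-35B-A3B" "qwen3.5" && echo "qwen3.5: literal match"
model_matches "Qwen3x5-35B" "qwen3.5" || echo "qwen3.5: no wildcard match"
```

Without `-x`, a filter of `vulkan` would also select `vulkan-radv`. Without `-F`, the `.` in `qwen3.5` would match any character.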

Bump ROCm to 7.2.1 (kernel 6.18.4+ patch); keep 7.2 as an optional backend.

Catalog:
- add mudler APEX I-Compact (Claude-distilled 35B, 17GB)
- add 0xSero REAP-40 (pruned 122B-A10B, 46GB)
- update download instructions: hf download (huggingface-cli is gone)
2026-04-13 01:11:46 +02:00
Felipe Cardoso
dd403a907c feat(serve): add optimized llama-server launcher with n-gram speculation
Add `make serve` and `make serve-ngram` for launching llama-server with
baked-in optimal settings (Vulkan RADV, q4_0 KV cache, flash attention,
no-mmap, full GPU offload). N-gram speculative decoding gives 1.1-1.4x
tg speedup on repetitive content without upstream PR dependencies.
Update Phase 5 status: MTP is months away (4 unmerged PRs, no MoE
support), draft-model speculation stalled on ROCm buffer crashes.
2026-03-30 21:12:30 +02:00
Felipe Cardoso
ba24091791 feat(benchmark): add -b/--batch flag, test MoE batch size impact
Add batch size override to benchmark scripts. Testing -b 256 vs default
2048 on Vulkan RADV shows no meaningful difference for MoE pp2048
(826 vs 843 t/s, within noise). Community-reported +70% improvement
does not reproduce on this backend.
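
For scale, the gap between the two quoted pp2048 figures can be worked out with a small sketch. The helper name `rel_diff_pct` is hypothetical, and measuring the gap relative to the larger value is an assumption about how to express it.

```shell
#!/bin/sh
# Hypothetical helper: percentage gap between two throughput figures,
# measured relative to the larger one.
rel_diff_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        hi = (a > b) ? a : b
        lo = (a > b) ? b : a
        printf "%.1f\n", (hi - lo) / hi * 100
    }'
}

rel_diff_pct 826 843   # the two pp2048 figures quoted above; prints 2.0
```

A gap of about 2% is far from the community-reported +70%.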
2026-03-30 20:01:24 +02:00
Felipe Cardoso
ea70687cd2 docs: update optimization guide with measured hardware data
Replace estimated values with clpeak measurements: DRAM 216-233 GB/s,
GPU clocks confirmed 2900 MHz under load (ROCm #5750 is sysfs reporting
only). Correct backend recommendation to Vulkan RADV (2.7x faster tg
than ROCm at 131K). Update KV cache recommendation to q4_0. Add
Nemotron-Cascade-2 to coder shootout results. Remove Nemotron-3-Nano
from catalog (replaced by Cascade-2). Update the Q4_K_L entry to Q4_K_XL.
2026-03-30 19:56:18 +02:00
Felipe Cardoso
1549bc27c0 feat(optimize): add Phase 2 power profile and system tuning
Add `make optimize-power` (ryzenadj 85W, sysctl, THP, RADV nogttspill)
with systemd services for boot/resume persistence. Integrate into
`make optimize --all` as Phase 2. Update optimization log with RyzenAdj
results (+46% tg at 70W sustained), KV sweep data, and quant shootout.
Add Qwen3-Coder-30B and Nemotron-Cascade-2 to model catalog.
2026-03-30 18:53:52 +02:00
Felipe Cardoso
f92b710492 fix(benchmark): parse llama-bench output with variable column count
KV cache quantization adds type_k/type_v columns to llama-bench output,
shifting test and t/s to different indices. Parse from end of row instead
of hardcoded positions. Also fix KV suffix separator (underscore to dash)
to avoid regex ambiguity with type names like q8_0.
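
Parsing from the end of the row can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's parser: it assumes the markdown table output of llama-bench with `test` and `t/s` as the last two columns, the helper name `parse_bench_row` is hypothetical, and the sample rows use made-up values.

```shell
#!/bin/sh
# Hypothetical helper: print "<test> <t/s>" for one markdown table row,
# counting fields from the end so extra columns do not shift the result.
parse_bench_row() {
    printf '%s\n' "$1" | awk -F'[|]' '{
        name = $(NF-2); tps = $(NF-1)   # the last field is empty (trailing "|")
        gsub(/^ +| +$/, "", name)       # trim surrounding spaces
        gsub(/^ +| +$/, "", tps)
        sub(/ .*/, "", tps)             # keep the mean, drop the stddev part
        print name, tps
    }'
}

# Made-up rows: one without and one with the type_k/type_v columns.
parse_bench_row '| model-a | 99 | pp512 | 100.00 ± 1.00 |'
parse_bench_row '| model-a | 99 | q4_0 | q4_0 | pp512 | 100.00 ± 1.00 |'
```

Both calls print `pp512 100.00`, whereas a fixed index counted from the left would read a different column in the second row.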

Add 5-phase optimization guide, optimization log for tracking results,
and research docs on llama.cpp and inference landscape optimizations.
2026-03-27 14:54:19 +01:00
Felipe Cardoso
58124cd657 feat: add Qwen3.5 model catalog and agentic evaluation framework
Models:
- configs/models.conf: catalog with Qwen3.5-35B-A3B (MoE, top pick),
  Qwen3.5-27B (dense), Qwen3-Coder-30B-A3B (agentic/coding)
- Updated benchmark setup to show catalog with download status
- docs/model-recommendations.md: memory planning, quantization guide

Agentic evaluation:
- scripts/agentic/setup.sh: installs inspect-ai, evalplus, bigcodebench
  in a Python venv
- scripts/agentic/run-eval.sh: runs evaluations against local LLM server
  (ollama or llama.cpp). Suites: quick (HumanEval+IFEval), code
  (EvalPlus+BigCodeBench), tooluse (BFCL), full (all)
- bin/agentic: dispatcher with help
- docs/agentic-benchmarks.md: methodology, framework comparison, model
  recommendations for agentic use

Updated: Makefile (6 new targets), README, CLAUDE.md, docs/references.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:20:23 +01:00
Felipe Cardoso
da2c4c6b8a fix(docs): address review findings — accuracy, consistency, completeness
- architecture.md: fix kernel param math to match actual computed values,
  use cardN placeholder in sysfs paths, clarify system_ram_kb is OS-visible
- benchmarking.md: normalize flags to -ngl 99 / -mmp 0 (matching code),
  add llama-rocm7-nightlies backend
- CLAUDE.md: clarify HSA_OVERRIDE_GFX_VERSION is set in containers not
  scripts, fix lib sourcing description, specify which scripts need root
- detect.sh: document that detect_cpu_cores returns threads, not cores
- troubleshooting.md: add link to references.md
- README.md: remove unsupported Fedora 42 claim, describe configs/ content

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 21:44:16 +01:00
Felipe Cardoso
5b81437637 docs: add README, CLAUDE.md, AGENTS.md, and full docs/ suite
- README.md: project overview, quick start, command reference, workflow
- CLAUDE.md: AI safety rules, technical details, conventions
- AGENTS.md: agent workflows, file responsibility map, dependency matrix
- docs/architecture.md: script layers, data flow, unified memory, JSON schemas
- docs/optimization.md: step-by-step optimization walkthrough
- docs/benchmarking.md: methodology, test params, result interpretation
- docs/troubleshooting.md: common issues and fixes
- docs/references.md: centralized external links (single source of truth)
- docs/bios-vram-guide.md: add back-link to optimization workflow

Cross-linked non-redundantly: each doc owns one layer, others link to it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 20:50:00 +01:00
Felipe Cardoso
c596e38e9e Initial commit
2026-03-25 20:13:15 +01:00