strix-halo-optimizations

Files

Felipe Cardoso 38daf953bf feat: add --pp and --tg flags for realistic benchmark workloads

Standard benchmarks use pp512/tg128, which underestimates real-world
agentic coding workloads, where responses run 500-2000 tokens. Both
sizes are now configurable:

  --pp N    Prompt processing tokens (default: 512)
  --tg N    Token generation count (default: 128)

Examples:
  benchmark run --tag realistic --tg 1024 --pp 2048 --category moe
  benchmark run --tag full-response --tg 2048 --category moe --reps 3

Log filenames include pp/tg when non-default (e.g., model__backend__fa1__pp2048_tg1024.log)
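
The flag handling and log naming described above can be sketched roughly as follows. This is a minimal illustration, not the code in run-baseline.sh or run-suite.sh: the function names parse_workload_flags and log_name are hypothetical, and it assumes that both pp and tg are written into the filename as soon as either one differs from its default, which the commit message does not spell out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the repository's actual implementation.

DEFAULT_PP=512   # default prompt-processing tokens (from the commit message)
DEFAULT_TG=128   # default token-generation count (from the commit message)

# parse_workload_flags ARGS...
# Sets the globals PP and TG from "--pp N" / "--tg N"; other arguments are skipped.
parse_workload_flags() {
  PP=$DEFAULT_PP
  TG=$DEFAULT_TG
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --pp|--tg)
        # Require a non-negative integer value after the flag.
        case "${2:-}" in
          ''|*[!0-9]*) echo "error: $1 needs a numeric value" >&2; return 1 ;;
        esac
        if [ "$1" = "--pp" ]; then PP=$2; else TG=$2; fi
        shift 2
        ;;
      *)
        shift
        ;;
    esac
  done
}

# log_name MODEL BACKEND FA PP TG
# Prints the log filename. Assumption: the pp/tg suffix is appended
# only when at least one of the two values is non-default.
log_name() {
  local model=$1 backend=$2 fa=$3 pp=$4 tg=$5
  local name="${model}__${backend}__fa${fa}"
  if [ "$pp" -ne "$DEFAULT_PP" ] || [ "$tg" -ne "$DEFAULT_TG" ]; then
    name="${name}__pp${pp}_tg${tg}"
  fi
  printf '%s.log\n' "$name"
}

# Usage, mirroring the first example in the commit message:
parse_workload_flags --tag realistic --tg 1024 --pp 2048 --category moe
log_name model backend 1 "$PP" "$TG"
# prints: model__backend__fa1__pp2048_tg1024.log
```

With default values the same helper prints model__backend__fa1.log, so logs from existing pp512/tg128 runs would keep their names under this assumption.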

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-26 22:48:32 +01:00

File             Last commit                                                      Date
compare.sh       fix+test: improve test suite, fix 2 bugs found by tests          2026-03-25 22:22:41 +01:00
run-baseline.sh  feat: add --pp and --tg flags for realistic benchmark workloads  2026-03-26 22:48:32 +01:00
run-suite.sh     feat: add --pp and --tg flags for realistic benchmark workloads  2026-03-26 22:48:32 +01:00
setup.sh         feat: make llama-rocm-7.2 a required toolbox in benchmark setup  2026-03-26 19:23:03 +01:00