-
c847991740
docs: add agentic coding evaluation landscape research
main
Felipe Cardoso
2026-04-15 15:55:04 +02:00
-
15bb6a8ed9
feat(serve): set APEX I-Compact as default, harden benchmark workflow
Felipe Cardoso
2026-04-13 01:11:46 +02:00
-
474d94a07e
chore: update model catalog with gemma 4, opus distill, and hw-bandwidth target
Felipe Cardoso
2026-04-03 20:03:53 +02:00
-
6ab08537ca
fix: address code review findings — batch args, venv path, serve flags
Felipe Cardoso
2026-03-31 10:10:48 +02:00
-
dd403a907c
feat(serve): add optimized llama-server launcher with n-gram speculation
Felipe Cardoso
2026-03-30 21:12:30 +02:00
-
ba24091791
feat(benchmark): add -b/--batch flag, test MoE batch size impact
Felipe Cardoso
2026-03-30 20:01:24 +02:00
-
ea70687cd2
docs: update optimization guide with measured hardware data
Felipe Cardoso
2026-03-30 19:56:18 +02:00
-
1549bc27c0
feat(optimize): add Phase 2 power profile and system tuning
Felipe Cardoso
2026-03-30 18:53:52 +02:00
-
f92b710492
fix(benchmark): parse llama-bench output with variable column count
Felipe Cardoso
2026-03-27 14:54:19 +01:00
-
7531f6fa74
feat(benchmark): add --kv-types flag for KV cache quantization sweep
Felipe Cardoso
2026-03-27 12:29:19 +01:00
-
38daf953bf
feat: add --pp and --tg flags for realistic benchmark workloads
Felipe Cardoso
2026-03-26 22:48:32 +01:00
-
3686783f4d
feat: add --context flag for configurable long-context benchmarks
Felipe Cardoso
2026-03-26 22:46:16 +01:00
-
1b5b193e81
fix: suppress exit code 143 from metric logger cleanup
Felipe Cardoso
2026-03-26 22:38:48 +01:00
-
fb1e57f1bf
feat: make llama-rocm-7.2 a required toolbox in benchmark setup
Felipe Cardoso
2026-03-26 19:23:03 +01:00
-
7c8be55bfe
fix: resolve model paths for toolbox container access
Felipe Cardoso
2026-03-26 19:17:16 +01:00
-
d22c062ca7
fix: model catalog shows download status, GPU detection in toolbox
Felipe Cardoso
2026-03-26 19:14:31 +01:00
-
6f197a1455
fix: pass ARGS through in benchmark Makefile targets
Felipe Cardoso
2026-03-26 19:10:59 +01:00
-
cb25fa3f6f
feat: add benchmark filtering (--max-size, --category, --skip-longctx)
Felipe Cardoso
2026-03-26 19:07:24 +01:00
-
eb52ea52ce
fix: follow symlinks in model discovery, update model catalog
Felipe Cardoso
2026-03-26 09:44:16 +01:00
-
58124cd657
feat: add Qwen3.5 model catalog and agentic evaluation framework
Felipe Cardoso
2026-03-26 00:20:23 +01:00
-
71053997be
chore: remove .idea from tracking, add to .gitignore
Felipe Cardoso
2026-03-25 23:58:18 +01:00
-
e9cb5c491f
fix+test: improve test suite, fix 2 bugs found by tests
Felipe Cardoso
2026-03-25 22:22:41 +01:00
-
a403dd9ce0
test: add BATS test suite (79 tests)
Felipe Cardoso
2026-03-25 22:15:34 +01:00
-
da2c4c6b8a
fix(docs): address review findings — accuracy, consistency, completeness
Felipe Cardoso
2026-03-25 21:44:16 +01:00
-
5b81437637
docs: add README, CLAUDE.md, AGENTS.md, and full docs/ suite
Felipe Cardoso
2026-03-25 20:50:00 +01:00
-
af0515d05d
fix: address code review findings (HIGH + MEDIUM)
Felipe Cardoso
2026-03-25 20:19:44 +01:00
-
c596e38e9e
Initial commit
Felipe Cardoso
2026-03-25 20:13:15 +01:00