Commit Graph

4 Commits

Author SHA1 Message Date
Felipe Cardoso
c847991740 docs: add agentic coding evaluation landscape research
Comprehensive research (706 lines, dated 2026-03-30) covering evaluation
dimensions, benchmark suites, and open-weight model performance for
software engineering agent use cases on 64GB systems.

Also gitignore evalplus_results/ (runtime outputs) and ztop/ (nested repo).
2026-04-15 15:55:04 +02:00
Felipe Cardoso
6ab08537ca fix: address code review findings — batch args, venv path, serve flags
- Fix missing BATCH_ARGS in long-context commands (both benchmark scripts)
- Fix CLAUDE.md stale venv path (data/venv → .venv) and add serve/power docs
- Add -b/--batch to bin/benchmark help text
- Add --no-think flag to serve script (--reasoning-budget 0)
- Sanitize model names in eval run directories
- Simplify agentic setup to use requirements.txt
- Add serve --help test, batch flag assertions to existing tests
- Add requirements.txt for reproducible venv setup (Python 3.13)
2026-03-31 10:10:48 +02:00
Felipe Cardoso
71053997be chore: remove .idea from tracking, add to .gitignore
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 23:58:18 +01:00
Felipe Cardoso
c596e38e9e Initial commit
2026-03-25 20:13:15 +01:00