Hacker News
Codex Daily Benchmarks for Degradation Tracking (Marginlab.ai) (marginlab.ai)
1 point by wendgeabos 13 days ago
Claude Code daily benchmarks for degradation tracking (marginlab.ai)
760 points by qwesr123 13 days ago | 355 comments
No one is evaluating AI coding agents in the way they are used (marginlab.ai)
1 point by qwesr123 29 days ago
Claude Code Daily Degradation Tracker (marginlab.ai)
3 points by qwesr123 33 days ago | 3 comments
Anatomy of a Coding Agent: A step-by-step illustration (marginlab.ai)
3 points by qwesr123 51 days ago
How are coding assistants evaluated? SWE-Bench Pro Explorer (marginlab.ai)
2 points by qwesr123 53 days ago
SWE-Bench: The $500B Benchmark (marginlab.ai)
5 points by qwesr123 55 days ago