
Roughly 50% of the CLs in SWE-Bench Verified are from the Django codebase. So if you're a big contributor to Django, you should care a lot about that benchmark. Otherwise, the difference between models is ±2 tasks done correctly. I wouldn't worry too much about it. Just try it out yourself and see if it's any better.

