
Note that Claude 2 scores 71.2% zero-shot on the Python coding benchmark HumanEval, which is better than GPT-4's 67.0%. Is there already real-world experience with its programming performance?


GPT-4's reproducible performance in the wild appears to be much higher than 67%. Testing from 3/15 (presumably on the 0314 model) puts it at 85.36% (https://twitter.com/amanrsanger/status/1635751764577361921), and the paper linked in my post (https://doi.org/10.48550/arXiv.2305.01210) reports a pass@1 of 88.4 for GPT-4 more recently (May? June?).
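For anyone comparing these numbers: pass@1 (and pass@k generally) is usually computed with the unbiased estimator from the original HumanEval paper, where n samples are generated per problem and c of them pass the tests. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper:
    pass@k = 1 - C(n - c, k) / C(n, k)
    n: total samples generated per problem
    c: number of samples that pass the unit tests
    """
    if n - c < k:
        # Every size-k draw must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k=1 this reduces to c/n, so single-sample pass@1 scores are directly comparable across models as long as the prompting and test harness match, which is often where reported numbers diverge.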


I have found just using it in the web interface comparable to OpenAI's. But the context window makes a huge difference: I can dump a lot more files in (entire schema, sample records, etc.).



