Anthropic’s Claude LLM is pretty interesting. In many ways it feels much more limited than GPT4. However, it is suspiciously good at a few edge-case code generation tasks (can’t go into details) that makes me wonder where it got its training data from. It also seems to be much less prone to hallucinating APIs and modules, preferring instead to switch back to natural language and describe the task without pretending it has a functioning solution handy.
Anthropic actually uses a more cutting-edge fine-tuning technique than OpenAI (Constitutional AI), one that relies much less on RLHF. Maybe this gives it an advantage in some areas even if their base model is only on the level of GPT-3.5 (used in free ChatGPT).
There is a language with massive usage in the enterprise but with very few (if any) high quality code examples on the public internet.
When given a broad task, GPT4 doesn’t just write incorrect code, it tries to do entire categories of things the language literally cannot do because of the ecosystem it runs inside.
Claude does a much better job writing usable code, but more importantly it does NOT tell you to do things in code that need to be done out-of-band. In fact, it uses natural language to identify these areas and point you in the right direction.
If you dig into my profile & LinkedIn you can probably guess what language I’m talking about.
I wrote it using a framework whose most recent release is substantially different from what GPT-4 was trained on.
I quickly learned to just paste the docs and examples from the new framework into GPT, telling it "this is how the API looks now", and it just worked.
It helped me do everything. From writing the code, to setting up SSL on nginx, to generating my DB schema, to getting my DB schema into the prod db (I don't use migration tooling).
Most of my time was spent telling GPT "sorry, that API is out of date; use it like this instead". Very rarely did GPT actually produce incorrect code or code that does the wrong thing.
That makes sense. My brother, who has been coding since 1990 and has worked his entire career at boring Fortune 500 companies, was wholly unimpressed by ChatGPT. It failed pretty miserably whenever he threw an older tech stack at it.
Worth keeping an eye on for sure.