Hacker News | jug's comments

I personally use it as my general purpose and coding model. It's good enough for my coding tasks most of the time, has very good and rapid web search grounding that makes the Google index almost feel like part of its training set, and Google has a family sharing plan with individual quotas for Google AI Pro at $20/month for 5 users, which also includes 2 TB in the cloud. Family sharing is a unique feature for Gemini 3 Flash Thinking (300 prompts per user per day) & Pro (100 prompts per user per day).

It's in the post?

Sorry, what I meant is whether a third party has them in their leaderboards. I don't usually trust most of what any of these vendors claim in their release notes without a third party. I know it says "verified" there, but I don't see where the SWE bench results are from a third party, whereas for the "HLE-Verified" they do have a citation to Hugging Face.

I was looking for something closer to: https://www.vals.ai/benchmarks/swebench


"SWE-Bench Verified" is the name of the benchmark: https://dev.to/duplys/swe-bench-swe-bench-verified-benchmark.... Same with "HLE-Verified". It's nothing to do with third party testing. The citation you point to makes that clear.

It's the ambition as a home user OS though, like macOS. And in the discussion of "it just works" operating systems, what else are we to go by than the vendor ambitions? Personal opinions? In that case, neither is, because both have struggled to always work in all scenarios since their respective inceptions.

When the phrase originated, manually updating CONFIG.SYS and AUTOEXEC.BAT was an expected skill of a home PC owner. The idea of buying a device, plugging it in, and having it work without a complex setup was unheard of. "It just works" on the Mac meant the absence of a DOS layer, IRQs, command lines, etc.

To me, this comparison holds up only in the long-standing debate of "LLMs as the new engineer", not "LLMs as a new programming language" (like here).

I think there are important distinctions there, predictability being one of them.


Even as an SSWE I do often wonder if I am but a high-level language.


I've heard Opus 4.5 might have an edge, especially in long-running agentic coding scenarios (?), but personally, yes, the Gemini 3 series is what I was expecting GPT-5 to be.

I'm also mostly on Gemini 3 Flash. Not because I've compared them all and found it the best bar none, but because it fulfills my needs and then some, and Google has a surprisingly little-noted family plan for it. Unlike OpenAI, unlike Anthropic. IIRC it's something like 5 shared Gemini Pro subs for the price of 1. Even with just a couple sharing it, it's a fantastic deal. My wife uses it for her studies, I use it professionally for coding, and I've never run into limits.


They do have some in-house LLMs (Phi), but they seem to either have trouble developing large flagship ones or not think it's worth it.


The craziest thing was how Microsoft took a brand that had been super established for decades and renamed Microsoft Office to Microsoft 365.

I'm not sure if it's named Microsoft 365 Copilot nowadays, or if that's an optional AI add-on? I thought it was renamed once more, but they themselves call it simply "Microsoft 365" (in a few tiers), sans Copilot. https://www.microsoft.com/microsoft-365/buy/compare-all-micr...


Surprised they didn't just try Clawbot first. I can see the case against "Clawd" (I mean; seriously...) but claws are a different matter IMHO, with that mascot and all.


It's probably still a bit too close. "Claw'd" might actually be a trademark of Anthropic now. The character and name originate from this Claude 3.5 Sonnet advertisement from June 2024, which promoted the launch of the Artifacts feature by building an 8-bit game:

https://www.youtube.com/watch?v=rHqk0ZGb6qo

"Have the crab jump up and over oncoming seashells... I think I want to name this crab... Claw'd."

Also, if you haven't found it hidden in Claude Code yet, there's a secret way to buy Clawd merch from Anthropic. Still waiting on them to make a Clawd plushie, though.


Surely this article is written by AI? The header emojis, the comparison table, the bolded "keywords"...


That means they're in bad shape, because this was labeled a last resort by Sam himself in 2024.

