
Honest question: What could a hardware device do that your phone can't do already?


TIL the phrase remontada -- thank you :)

Slang term for a comeback.


Ads are already appearing in Google Search AI Overviews [1]. AI Overviews currently have 2 billion users.

[1] https://support.google.com/google-ads/answer/16297775?hl=en


Maybe ChatGPT is sticky enough that people won't switch. But since we're talking about something as old as Google Video, we could also talk about AltaVista, which was "good enough" until people discovered a better and more useful alternative.

A lot of "normal people" are learning fast about ChatGPT alternatives now. Gemini in particular is getting a lot of mainstream buzz. Things like this [1], with 14k likes, are happening every day on social media. Marc Benioff's love for Gemini [2] also broke through into the mainstream.

[1] https://x.com/kimmonismus/status/1995900344224907500

[2] https://x.com/Benioff/status/1992726929204760661


I couldn't even get ChatGPT to let me download code it claimed to have written for me. It kept saying the files were ready but refused to let me access or download anything. It was the most basic use case and it totally bombed. I gave up on ChatGPT right then and there.

It's amazing how different people have wildly varying experiences with the same product.


It's because comparing their "ChatGPT" experience with your "ChatGPT" experience doesn't tell anyone anything. Unless people start saying which models they're using and what their prompts were, the back-and-forth about which platform is best provides zero information to anyone.


It’s the equivalent of the user who points at their workstation tower and exclaims that the “hard drive is broken!”

Use the right words, get the right response.

Ah… ahhh… now I get why they get such bad results from AI models.


Did you wait a while before downloading? The links it provides for temporary projects have a surprisingly brief window in which you can download them. I've had a similar experience even after waiting just one minute to download the file.


Since LLMs are non-deterministic, it's not that amazing. You could ask it the same question as me and we could both get very different conversations and experiences.
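
Concretely, most chat deployments sample with a temperature above zero, so the same prompt can take a different path on every run. A toy illustration (not any vendor's actual code):

    import math, random

    # Toy next-token sampler: with temperature > 0, the same logits can
    # produce different tokens across runs, which is why identical prompts
    # can lead to very different conversations.
    def sample_token(logits, temperature=1.0):
        weights = [math.exp(l / temperature) for l in logits]
        return random.choices(range(len(logits)), weights=weights)[0]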


The same thing happens to me in Claude occasionally. I have to tell it "Generate a tar.gz archive for me to download".
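
For what it's worth, the archive step itself is trivial; something like this (a sketch, with a hypothetical directory name) is all the model has to run in its sandbox:

    import tarfile

    # Bundle a hypothetical project directory into a downloadable archive
    with tarfile.open("project.tar.gz", "w:gz") as tar:
        tar.add("my_project", arcname="my_project")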


Google trains its own AI with TPUs, which are designed in house. Unlike the other hyperscalers in the AI rat race, Google doesn't have to pay retail rates for Nvidia GPUs. Therefore, Google trains its AI more cheaply than everyone else. I think everyone other than Google "loses big".


Well, those who are aware of this definitely know where it is leading. But most will surely act shocked.


But ... I don't understand why this is supposedly such a big deal. Look into it and run the numbers, and a very different picture emerges. nVidia reportedly makes about a 70% margin on its sales (that's against COGS; in other words, nVidia still pays about $1,400 for chips and memory to produce a $4,500 RTX 5090 card, and that cost is rising fast).

When you include research for current and future cards, that margin drops to 55-60%.

When you include everything on their cash flow statement it drops to about 50%.

And this disregards what Michael Burry pointed out: you really should subtract their stock dilution from stock-based compensation, about 0.2% of 4.6 trillion dollars per year. Burry's point, of course, is that this leaves slightly negative shareholders' equity, i.e. it brings the margin to just under zero, which is mathematically true. But for this argument, let's very generously say it eats only about another 10% out of that margin, as opposed to the 50% it mathematically eats.

Google and Amazon will have to be less efficient than nVidia, because they're making up ground. Let's very generously say that's another 10%, maybe 20%.

So really, making their own chips saves Google at best 30% to 40% on the price, generously. And let's again ignore Google's own claim that TPUs are 30% to 50% less efficient than nVidia chips, which for large training runs translates directly into dollars.
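
A back-of-envelope sketch of that cascade, using only the rough figures from this comment (illustrative estimates, not reported numbers):

    # Margin cascade for an RTX 5090, per the estimates above
    retail, cogs = 4500, 1400
    gross_margin = (retail - cogs) / retail    # ~0.69, the "70%" headline
    after_rnd    = gross_margin - 0.12         # R&D: drops to ~55-60%
    after_cash   = after_rnd - 0.07            # full cash-flow view: ~50%
    after_sbc    = after_cash - 0.10           # generous dilution haircut
    savings      = after_sbc - 0.10            # Google/Amazon catch-up gap
    print(f"implied best-case savings: ~{savings:.0%}")   # ~30%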

So for Google, TPUs are just about revenue neutral. It probably allows them to have more chips, more compute than they'd otherwise have, but it doesn't save them money over buying nVidia chips. Frankly, this conclusion sounds "very Google" to me.

It's exactly the sort of thing I'd expect Google to do. A VERY impressive technical accomplishment ... but one that can be criticized for being beside the point: it doesn't actually matter. As an engineer I applaud that they do it, and please keep doing it, but it's not building a moat, revenue, or profit, so the finance guy in me is screaming "WHY????????"

At best, for Google, TPUs mean certainty of supply relative to nVidia (though supplier contracts could also build certainty of supply down the chain).



Every time I see a table like this, the numbers go up. Can someone explain what this actually means? Is it just that some tests are solved in an incrementally better way, or is this a breakthrough where this model can do something that all the others cannot?


This is a list of questions and answers that was created by different people.

The questions AND the answers are public.

If the LLM manages, through reasoning OR memory, to repeat back the answer, then it wins.

The scores represent the % of correct answers the model recalled.
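
In other words, the headline number is just accuracy over a fixed Q&A set. A minimal sketch (names hypothetical, assuming exact-match grading):

    # Toy benchmark scorer: percentage of questions the model answers
    # correctly, whether by reasoning or by memorization.
    def score(model, qa_pairs):
        correct = sum(1 for q, a in qa_pairs if model(q).strip() == a)
        return 100 * correct / len(qa_pairs)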


That is not entirely true. At least some of these tests (like HLE and ARC) take steps to keep the evaluation set private so that LLMs can’t just memorize the answers.

You could question how well this works, but it’s not like the answers are just hanging out on the public internet.


Excuse my ignorance, how do these companies evaluate their models against the evaluation set without access to it?


Cooperation with the eval admins


I estimate another 7 months before models start getting 115% on Humanity's Last Exam.


If you believe another thread, the benchmarks are comparing Gemini 3 (probably Thinking) to GPT-5.1 without thinking.

The person also claims that with thinking on, the gap narrows considerably.

We'll probably have 3rd party benchmarks in a couple of days.


It's easily shown that the numbers are for GPT-5.1 Thinking (high).

Just go to the leaderboard website and see for yourself: https://arcprize.org/leaderboard


Installed it. Thought it might be cool to ask it how to improve my site UI. It thought for about 2 minutes, supposedly made changes. It says it created a "searchable, filterable grid layout" but I don't see any difference on the page. I wonder what's up.


There are a few things that could have happened:

1. Maybe the domain matching missed? You can check this by going to the library tab and seeing if it appears in "Modifications for Current Page" when you're on the site.

2. Maybe there was a silent error. Our current error system relies on Chrome notifications, and we've come to realize that many people have them disabled. That means you don't get a clear error message when something goes wrong. We are actively working on this.

3. The script could be caught by a content policy. Checking the console log could help to see if there are any errors.

4. Maybe the script just doesn't work on the first try. We can't guarantee it will work perfectly every time. You can try updating the script (Library -> click Modify on the script) and saying that it didn't work / you don't see any changes.

Happy to provide more support via email (contact@trynextbyte.com) or Discord (https://www.tweeks.io/discord).


Jumped into the discord, thanks!


Indeed, CAPTCHAs vs. CAPTCHA-solving bots has been an ongoing war for a long time. Considering all the cybercrime and ubiquitous online fraud today, it's pretty impressive that CAPTCHAs have held the line as long as they have.


Unfortunately Gemini isn't the only culprit here. I've had major problems with ChatGPT reliability myself.


I only hit that problem in voice mode; it'll just stop halfway and restart. It's a jarring reminder of its lack of "real" intelligence.


I've heard a lot that voice mode uses a faster (and worse) model than regular ChatGPT. So I think this makes sense. But I haven't seen this in any official documentation.


This is more because of VAD (voice activity detection).
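
For context, a naive VAD is just an energy threshold; if the threshold or the hangover window is misjudged, a mid-sentence pause reads as end-of-speech and the assistant cuts in. A toy sketch (not what any vendor actually ships):

    # Toy energy-based voice activity detection: a frame counts as speech
    # when its mean energy clears a threshold. A pause that dips below the
    # threshold looks like end-of-speech, which is what causes the cut-offs.
    def is_speech(frame, threshold=0.01):
        energy = sum(s * s for s in frame) / len(frame)
        return energy > threshold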


What I'm seeing from ChatGPT is highly varying performance; I think this must be something they are doing to manage compute limitations or costs. With Gemini, what I see is slightly different: more like a lower “peak capability” than ChatGPT’s “peak capability”.


I'm fairly sure there's some sort of dynamic load balancing at work. I read an anecdote from someone who had a test where they asked it to draw a little image (something like an ASCII cat, but probably not exactly that, since that seems a bit basic), and if the result came back poor they didn't bother using it until a different time of day.

Of course it could all be placebo, but when you think about it intuitively, somewhere on the road to the hundreds of billions in datacenter capex, one would expect periods where compute and demand are out of sync. It's also perfectly understandable why now would be a time to be seeing that.
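
A toy version of that probe, with a hypothetical ask() callable and a deliberately crude quality check:

    # Hypothetical canary probe: send a cheap fixed prompt and skip the
    # model for a while if the reply looks degraded.
    def seems_degraded(ask):                   # ask: prompt -> reply string
        reply = ask("Draw a small ASCII cat.")
        return len(reply.strip()) < 20         # crude stand-in for "poor result"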

