Hacker News | tlogan's comments

Gemini pro medium is failing this:

I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.

But amazingly, ChatGPT is telling me to drive.

Anyway, this just shows that they patched this because the TikTok video went viral. These systems are LLMs, and all these logic steps are still just LLM steps.


Also, the answers are non-deterministic.

It has been patched. I tried it last week and it definitely suggested walking. It seems like all the models have been updated, which is not surprising given that the TikTok video has 3.5 million views.

I tried ChatGPT today. Same results as others.

This has been viral on TikTok for at least a week, not just 4 hours.

This trick went viral on TikTok last week, and it has already been patched. To get a similar result now, try saying that the distance is 45 meters or feet.

The new one is with upside down glass: https://www.tiktok.com/t/ZP89Khv9t/


By "patched", you can't mean they added something to the internal prompt to show it how to answer this one specific question?!

Absolutely. There is a preflight guardrail that steers specific words, phrases, and concepts toward tweaked output.

This is pure speculation.

The fact that you can still reproduce the issue doesn't give it a lot of credibility.
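To make the "preflight guardrail" idea concrete, here is a minimal Python sketch of what such a rule layer could look like. It is purely illustrative: nothing in this thread confirms that any vendor works this way. `PREFLIGHT_RULES`, `preflight`, and `answer` are hypothetical names, and `model` stands in for any function that takes a prompt and returns text.

```python
import re
from typing import Callable

# Hypothetical rules: a trigger pattern mapped to extra guidance that is
# prepended to the prompt before it reaches the model. Nothing here reflects
# any vendor's actual configuration.
PREFLIGHT_RULES = [
    (
        re.compile(r"\bcar wash\b.*\b(walk|drive)\b", re.IGNORECASE | re.DOTALL),
        "Note: washing a car requires the car to be at the car wash.",
    ),
]


def preflight(prompt: str) -> str:
    """Return the prompt with guidance prepended for every rule that matches."""
    notes = [note for pattern, note in PREFLIGHT_RULES if pattern.search(prompt)]
    if not notes:
        return prompt
    return "\n".join(notes) + "\n\n" + prompt


def answer(prompt: str, model: Callable[[str], str]) -> str:
    """Apply the hypothetical guardrail, then call the model on the steered prompt."""
    return model(preflight(prompt))
```

A pattern-based rule like this only fires on wording it anticipates, so a rephrased prompt can slip past it.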


Such AGI wow!

Why do you think they’re on GPT 5.2 now?

"Stupid Pencil Maker" by Shel Silverstein

Some dummy built this pencil wrong,

The eraser's down here where the point belongs,

And the point's at the top - so it's no good to me,

It's amazing how stupid some people can be.


I just got the “you should walk” result on ChatGPT 5.2

I got the "you should walk" answer 4 out of 5 times with free ChatGPT, until I told it to, basically, "think carefully": https://news.ycombinator.com/item?id=47040530
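Because the answers are non-deterministic, a single run says little. Here is a minimal Python sketch for tallying repeated runs of the same prompt, assuming `ask` is a hypothetical function wrapping whatever chat API you use (it is not a real library call).

```python
from collections import Counter
from typing import Callable


def classify(reply: str) -> str:
    """Crudely label a reply by whichever recommendation it mentions first."""
    text = reply.lower()
    positions = {label: text.find(label) for label in ("walk", "drive")}
    found = {label: pos for label, pos in positions.items() if pos != -1}
    if not found:
        return "unclear"
    return min(found, key=found.get)


def tally(prompt: str, ask: Callable[[str], str], runs: int = 5) -> Counter:
    """Send the same prompt `runs` times and count the labelled answers."""
    return Counter(classify(ask(prompt)) for _ in range(runs))
```

The keyword classifier is deliberately crude (a reply like "don't walk, drive" is counted as "walk"), so spot-check the raw replies before trusting the counts.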

To me, the "patching" that happens anytime someone finds an absolutely glaring hole in how AIs work is so intellectually dishonest. It's the digital equivalent of house flippers slapping millennial gray paint on structural issues.

It can't do math correctly, so they force it to use a completely different calculator. It can't count correctly unless you route it to a different reasoning path. It feels like every other week someone comes up with another basic human question that results in complete fucking nonsense.

I feel like this specific patching they do is basically lying to users and investors about capabilities. Why is this OK?


Adding special tools for counting and math makes sense because they're handy. I agree with your point that patching individual questions like this is dishonest, although I would say it's pointless too. The only value from asking this question is to be entertained, and "fixing" this question makes the answer less entertaining.

From a technological standpoint, it is pointless. But from a marketing perspective, it is very important.

Take this trick question as an example. Gemini was the first to “fix” the issue, and the top comment on Hacker News is praising how Gemini’s “reasoning” is better.


> The only value from asking this question is to be entertained, and “fixing” this question makes the answer less entertaining.

You're thinking like a user. The people doing the patching are thinking like a founder trying to maintain the impression that this is a magical technology that CEOs can use to replace all their workers.

You don't have as much money to spend as the CEOs, so they don't care about your entertainment.


No, you are wrong. AGI is at our doorsteps! /s

I was able to reproduce it on ChatGPT with the exact same prompt, but not with the one I had phrased myself initially, which was interesting. I also tried changing the number and didn't get far with it.

"patched" = the answer is in search results

Ah yes, one of those novelty reversible cups.

This is a trick cup, so it's okay to have a laugh.

Patched where? Responses from 4 models were posted. Also, Azure-deployed models are absolutely not "patched" on the fly; they are rarely updated, and the dates are baked into the full SKU.

"Patching" could be happening in "general public" tools but honestly sounds a lot like "Bro science".


still failed for me on opus 4.6 extended a second ago.

when i prompted about how walking would mean leaving my car behind the "thinking" done before coming to the right conclusion was:

> lmao, fair point. the user is right - you need to bring the car to the car wash. that's a legitimate correction. own it.


One thing I have learned from the Epstein files is that when someone labels themselves as “woke,” “progressive,” or committed to “class struggle,” you should not automatically take them at their word. People lie and cheat all the time.

Labels are like military medals, when people give them to themselves it's a red flag.

True believers can make easy marks.

Most people are basically good. Don’t let high profile monsters like Epstein and his friends turn you into a misanthrope.

Weird take, considering Epstein was a literal rightist, racist and sexist to the bone, for whom feminism was the biggest enemy. (I mean, of course, he was an abuser, and the woke idea that abuse is wrong was the primary danger to him and his associates.)

Politically, both sides are in the files, including plenty of supposed centrists, but far more right-wingers, billionaires, and soft right-wingers are in them.


He was an ethnic Jewish supremacist, as clearly evidenced by the email, yet there has been very little reporting on it in the media. A complete racist, too. I wonder if this was how he and his goons told themselves that it was okay to traffic women from the former Soviet Union and other poor parts of the world; maybe they thought the girls were below them in every way possible.

You forgot to mention eugenics.

Weird take. To me it’s more that billionaires shouldn’t exist and Trump is a pedophile

I used to think this sounded crazy, but I'm starting to think that being a billionaire corrupts you, in the sense that normal things don't do it for you anymore, and it's just more and more depravity.

Humans have been corrupted by power since power structures have existed, so why would it be any different now?

The depravity also gets served up to you on a silver platter when you are at that level.

I saw the video and understand the problem, but I cannot reproduce it. The keyboard always works great for me. Could this bug be related to AI? Or to some language setting?

Same. I have both NL and EN configured as languages on my keyboard, and I couldn't reproduce it with either NL/EN or EN/NL. In the video you can see they have 4 keyboards configured. It would be helpful to know which keyboard they are using.

SHOULD = You are strongly recommended to do this, but it’s not absolutely required.

- In most cases, you are expected to follow it.

- You can choose not to follow it, but you must have a very good reason.

For example, RFC 7231 says that there should be a Date header, but some embedded devices have no real-time clock, so it is OK for them not to implement it.
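A minimal Python sketch of treating the Date header as conditional on having a clock. `response_headers` and its `has_clock` flag are hypothetical names for illustration; `email.utils.formatdate(usegmt=True)` is a real standard-library call that produces the `Sun, 06 Nov 1994 08:49:37 GMT` form HTTP uses. (For origin servers, RFC 7231 §7.1.1.2 actually goes further: a server without a clock giving a reasonable approximation of UTC MUST NOT send Date.)

```python
from email.utils import formatdate
from typing import Optional


def response_headers(has_clock: bool, now: Optional[float] = None) -> dict[str, str]:
    """Build response headers, adding Date only when a usable clock exists.

    `has_clock` is a hypothetical flag for devices without a real-time clock;
    `now` is a Unix timestamp (defaults to the current time when None).
    """
    headers = {"Content-Type": "text/plain"}
    if has_clock:
        # usegmt=True yields the "Sun, 06 Nov 1994 08:49:37 GMT" form used by HTTP.
        headers["Date"] = formatdate(timeval=now, usegmt=True)
    return headers
```

For example, `response_headers(has_clock=True, now=784111777)["Date"]` gives `"Sun, 06 Nov 1994 08:49:37 GMT"`, while a clockless device simply omits the header.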


India’s UPI is a national service, so fraud is “relatively easy” to combat, but it depends on banks’ responsiveness.

However, I heard from my Indian friends that UPI fraud is on the rise and becoming a big challenge.

Edit: The UPI fraud rate is similar to the credit card fraud rate, but only about 6% of the money lost to UPI fraud has been recovered. If this trend continues (the fraud percentage keeps growing and the recovery rate does not improve), the UPI system might get into trouble.

Btw, the stats say that the UPI fraud rate has been doubling every year for the past few years.
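To see why that trend matters, here is a back-of-the-envelope projection in Python. It takes the figures from the comments above (fraud doubling every year, about 6% of lost money recovered) and further assumes the amount lost grows at the same pace as the fraud rate. `project_losses` is a hypothetical helper, and `initial_fraud` is an arbitrary normalized unit, not real data.

```python
def project_losses(
    initial_fraud: float,
    years: int,
    growth: float = 2.0,
    recovery_rate: float = 0.06,
) -> list[tuple[int, float, float]]:
    """Project (year, fraud amount, unrecovered loss) under constant growth.

    Defaults encode the thread's figures: fraud doubling each year and ~6%
    of lost money recovered. `initial_fraud` is an arbitrary normalized unit.
    """
    rows = []
    fraud = initial_fraud
    for year in range(years + 1):
        # Unrecovered loss is whatever is not clawed back this year.
        rows.append((year, fraud, fraud * (1 - recovery_rate)))
        fraud *= growth
    return rows


for year, fraud, lost in project_losses(1.0, 3):
    print(f"year {year}: fraud {fraud:.2f}, unrecovered {lost:.2f}")
```

Under these assumptions, three years of doubling multiplies the unrecovered loss by 2^3 = 8, and a flat 6% recovery rate does nothing to bend that curve.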


The problem arises when the banks are in two different countries. If money leaves your account and mistakenly ends up in an account in another country, even within Europe, it can be very difficult to recover.

That is why national payment systems tend to work relatively well. Cross-border systems are a completely different challenge.


I think what people are missing in this conversation is fraud prevention and protection.

Any cross-border payment solution, even within Europe, that lacks strong fraud protection is dead on arrival.

But I suspect the fraud problem will be ignored until it cannot be ignored anymore. And then we will go back to square one and try everything again.

