
I like the noise variant! The "everything can go in one trip" variant is the one I've been using, and I was able to get 4o to get it right in one shot with enough coaching, and o1-preview without coaching, which convinced me (of what, I'm not sure). My other riddle is the car-accident-doctor-son one, which 4o couldn't get but o1-preview does.

I'll have to come up with more obscure riddles and not talk about them online and only use temporary chats which aren't used as training data and see what happens next. I'm sure I have a puzzle book in my library that I can use to help me make new ones.



Be careful with coaching. It's very easy to leak information. The point is to get the answer without letting it know what you're looking for.

As for o1, well, I've been using this one for a year, and a few big players have used it too. So remember that riddles get spoiled because they end up in the training set.


Good point! The problem is, I can't know what other people have spoiled it on either, so if we'd independently come up with the now-spoiled "the boat can take all" variant, I couldn't know unless that got revealed on Twitter or arXiv or HN or wherever.


We won't know whether it's spoiled, or rather how spoiled it is, unless the companies release their training data.

But in this case we can study it a different way: use things we are certain are spoiled. That's what the author here does.

But as an ML researcher, I'll let you know that I don't trust a single reasoning paper I've read.

You either have to start with the premise that the thing you're testing is in the training data (and thus spoiled), so you typically look at generalization and how robust it is. You can't prove reasoning this way, but you can disprove it this way. This also works for theory of mind (though it seems many HN readers failed to read the first paragraph).

The other way is to prove that the data isn't in training (for a strong claim you'd need to prove that it's not even indirectly in the data...). You still can't prove reasoning this way, but you would build strong evidence that it is going on (proving reasoning is very tough, if possible at all). I think if this were shown consistently, most of the conversations about LLMs not reasoning would go away and we'd discuss them like humans: capable of reasoning, but not necessarily always doing so.
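The generalization check described above can be sketched as a small harness: take a riddle you assume is spoiled, generate surface perturbations that preserve its logic, and measure how often the model's answer survives the perturbation. Everything here is illustrative, and `ask_model` is a stub standing in for a real LLM API call.

```python
# Sketch of a perturbation-robustness check for a (presumed spoiled) riddle.
# A low score on perturbed variants suggests pattern-matching, not reasoning.

def ask_model(prompt: str) -> str:
    # Placeholder: a real harness would call an LLM API here.
    return "the farmer takes everything in one trip"

def variants(riddle: str) -> list[str]:
    # Surface perturbations that preserve the logic of the puzzle.
    swaps = [("wolf", "lion"), ("goat", "sheep"), ("cabbage", "lettuce")]
    out = [riddle]
    for old, new in swaps:
        out.append(riddle.replace(old, new))
    return out

def robustness(riddle: str, is_correct) -> float:
    # Fraction of variants the model still answers correctly.
    vs = variants(riddle)
    return sum(is_correct(ask_model(v)) for v in vs) / len(vs)

riddle = ("A farmer with a wolf, a goat, and a cabbage must cross a river. "
          "The boat can take all of them at once. How many trips are needed?")
score = robustness(riddle, lambda ans: "one trip" in ans)
print(score)
```

With the stub the score is trivially perfect; the point is only the harness shape, where a model that aces the canonical wording but fails renamed-animal variants is likely retrieving, not reasoning.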

But ML is in an existential crisis right now. Theory means nothing without experimentation, but experimentation means nothing without theory. See von Neumann's elephant.



