
1. It isn't surprising to me that this happened in an advanced AI model. It seems hard to avoid in, as you say, "any sufficiently capable system".

2. It is a bit surprising to me that it happened in Claude. Without this result, I was unsure if current models had the situational awareness and non-myopia to reason about their training process.

3. Some people who are unconcerned about building systems vastly more powerful than current ones (i.e. AGI/ASI) may be surprised by this result, since one reason for their unconcern is the presumption that an AI will be good if we train it to be good.



Yeah. The whole notion that "AI will be good" is itself a category error, as if this could even be measured definitively.

https://x.com/mickeymuldoon/status/1859825564649128259


This is deeply confused nihilism. Humans are very bad at philosophy and moral inquiry, in an absolute sense, but neither field is fundamentally impossible to make progress in.


I understand your point, and your totally valid concern about nihilism, but I disagree.

Nihilism is "nothing really matters, science can't define good or bad, so who cares?"

Whereas my view is, "Being good is the most important thing, but we have no conceivable way to measure if we are making progress, either empirically or theoretically. We simply have to lead by example, and fight the eternal battles against dishonesty, cowardice, sociopathy, deception, hatred, etc."

It's in that sense that I say that technical progress is impossible. Of course, if everyone agreed with my view, and lived by it, I'd consider that a form of progress, but only in the sense that "better ideas seem to be winning right now," rather than in any technical sense of absolute progress.



