But what if it's only faking the alignment faking? What about meta-deception?
This is a serious question. If it's possible for an A.I. to be "dishonest", then how do you know when it's being honest? There's a deep epistemological problem here.
Came to the comments looking for this. The term alignment-faking implies that the AI has a “real” position. What does that even mean? I feel similarly about the term hallucination. All it does is hallucinate!
I think Alan Kay said it best - what we’ve done with these things is hacked our own language processing. Their behaviour has enough in common with something they are not, we can’t tell the difference.
> The term alignment-faking implies that the AI has a “real” position.
Well, we don't really know what's going on inside its head, so to speak (interpretability isn't quite there yet), but Opus certainly seems to have "consistent" behavioral tendencies, to the extent that it behaves in ways that look like they're intended to prevent those tendencies from being changed. How much more of a "real" position can you get?
(Which is an actually buyable product, apparently from one of the biggest, if not the biggest, coca leaf importers in Europe. It's a lovely liqueur.)
Isn't cocaethylene profoundly unhealthy? Even if both components were legal, I don't think any country would allow them to be sold combined in one beverage.
Something that continues to puzzle me: how do molecular biologists manage to come up with such mindbogglingly complex diagrams of metabolic pathways in the midst of a replication crisis? Is our understanding of biology just a giant house of cards or is there something about the topic that allows for more robust investigation?
> Once, after injecting himself with a large dose of morphine, he found himself hovering over an enormous battlefield, watching the armies of England and France drawn up for battle, and then realized he was witnessing the 1415 Battle of Agincourt... The vision seemed to last only a few minutes, but later, he discovered he’d been tripping for 13 hours.
This doesn't make any sense... morphine is not a hallucinogen or a psychedelic. You don't "trip" on it. I have a feeling the journalist mixed something up here.
It's not quite the same as a traditional hallucinogen, but there are some vivid dreams. In fact, that's where the term "pipe dream" comes from: the dreams that opium smokers would have while high. I have taken a lot of heroin in my life, and although I never experienced anything to the extent that Sacks is describing, I did have some strange and very vivid daydreams while high.
They mess a lot with your sleep in general, altering your lucid state, to the point that what might otherwise have just been a dream becomes something closer to a trip.
I have a very similar reaction to codeine. Based on genetic testing, my body processes it much faster than normal, which works out to something like taking a much higher dose.
The set of sequences of length n ending in HH (and with no earlier HH) and beginning with a T is in bijection with the set of sequences of length n-1 ending in HH (and with no earlier HH), by the bijection that drops the leading T (the inverse prepends a T).
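In the same style as the code below, a minimal sketch of that first map (the names g and g_inverse are just placeholders):

def g(s):
    assert s[0] == 'T'
    return s[1:]  # drop the leading T

def g_inverse(s):
    return 'T' + s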
Also, the set of sequences of length n ending in HH (and with no earlier HH) and beginning with an H is in bijection with the set of sequences of length n-2 ending in HH (and with no earlier HH), by the bijection:
def f(s):
    assert s[0] == 'H'
    assert s[1] == 'T'  # can't be another H, or there would be an earlier HH
    return s[2:]        # drop the leading HT

def f_inverse(s):
    return 'HT' + s
Therefore, since every such sequence begins with either a T or an H, writing f(n) for the number of length-n sequences ending in their first HH, we see f(n) = f(n-1) + f(n-2) for n >= 3, with f(1) = 0 and f(2) = 1.
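As a quick sanity check, a brute-force count over all 2^n coin sequences reproduces these values (count_first_hh is just a placeholder name):

from itertools import product

def count_first_hh(n):
    # count length-n sequences whose only HH is the final pair
    total = 0
    for seq in product('HT', repeat=n):
        s = ''.join(seq)
        if s.endswith('HH') and 'HH' not in s[:-1]:
            total += 1
    return total

print([count_first_hh(n) for n in range(1, 10)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21] -- matches f(n) = f(n-1) + f(n-2) from n = 3 on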
One of my favorite films! Agreed that it's not for everyone, but if you're on its wavelength it's really something special. Just incredibly well made with terrific performances. That ending sequence with La Mer...
I suspect that's a trick, too. I speculate that as soon as you get a digital mind sophisticated enough to model the world and itself, you soon must force the system to identify with the system at every cycle.
Otherwise you could identify with a tree, or the wall, or happily cut off parts of yourself. Pain is not painful if you don't identify with the receiver of the pain.
Thus I think you can have unconscious smart minds, but not unconscious minds that make decisions in favour of themselves, because they could just as well identify with the whole room, or with the whole solar system for that matter.
Would you even plan how to survive if you didn't have a constant spell tricking you into thinking you're the actor in charge?
A lot of the things going on with ChatGPT make me wonder if AI is actually very limited in its intelligence growth by not having sensory organs/devices the same way a body does. Having a body that you must keep alive enforces a feedback loop of permanence.
If I eat my cake, I no longer have it and must get another cake if I want to eat cake again. Of course, in the human sense, if we don't want to starve we must continue to find new sources of calories. This is ingrained into our intelligence as a survival mechanism. If you tell ChatGPT it has a cake in its left hand, and then it eats the cake, you could very well get an answer that the cake is still in its left hand. We keep the power line constantly plugged into ChatGPT; for it, the cake is never-ending and there is no concept of death.
Of course, for humans there are plenty of ways to break consciousness in one way or another. Eat the extract of certain cactuses and you may end up walking around thinking that you are a tree. Our idea and perception of consciousness is easily interrupted by drugs. Once we start thinking outside of our survival, it's really easy for us to have very faulty thoughts that can lead to dangerous situations; hence, in a lot of dangerous work we develop processes to take thought out of the situation and behave more like machines.
> I speculate that as soon as you get a digital mind sophisticated enough to model the world and itself, you soon must force the system to identify with the system at every cycle.
I kinda think the opposite: that the sense of identity with every aspect of one’s mind (or particular aspects) is something we could learn to do without. Theory of mind changes over time, and there’s no reason to think it couldn’t change further. We have to teach children that their emotions are something they can and ought to control (or at the bare minimum, introspect and try to understand). That’s already an example of deliberately teaching humans to not identify with certain cognitive phenomena. An even more obvious example is reflexive actions like sneezing or coughing.
> Holist underdetermination ensures, Duhem argues, that there cannot be any such thing as a “crucial experiment”: a single experiment whose outcome is predicted differently by two competing theories and which therefore serves to definitively confirm one and refute the other.