Human extinction due to a would-be "hard takeoff" of an AGI should be understood as a thought experiment, conceived in a specific era when the current connectionist paradigm wasn't yet mainstream. The AI crisis was expected to come from some kind of "hard universal algorithmic artificial intelligence", for example AIXItl, undergoing a very specific process of runaway self-optimization.
Current-generation systems, i.e. large connectionist models trained via gradient descent, simply don't work like that: they are large, heavy, and continuous, and the optimization process that gives rise to them does so in a smooth, iterative manner. Before a hypothetical "evil AI" there would be thousands of iterations of "goofy and obviously erroneously evil AI", with enough time to take some action. And even then, current systems, including this one, are more often than not trained with a predictive objective, which is very different from the usually postulated reinforcement-learning objective. Systems trained with a prediction objective shouldn't be prone to becoming agents, much less dangerous ones.
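To make the distinction concrete, here's a minimal sketch of the two objectives (PyTorch-style; `model`, `optimizer`, and the toy `env` interface are hypothetical placeholders, not any real codebase):

```python
import torch
import torch.nn.functional as F

def prediction_step(model, tokens, optimizer):
    """Predictive objective: cross-entropy on the next token.

    The loss is a fixed, differentiable function of the data; each step
    nudges the weights slightly -- the "smooth, iterative" optimization
    described above.
    """
    logits = model(tokens[:, :-1])            # predict token t+1 from tokens <= t
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * seq, vocab)
        tokens[:, 1:].reshape(-1),            # (batch * seq,)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def rl_step(model, env, optimizer):
    """Reinforcement-learning objective (plain REINFORCE, as one simple example).

    Here the model *acts* on an environment and is rewarded for outcomes --
    the setting in which goal-directed, agent-like behavior is usually
    postulated.
    """
    log_probs, total_reward = [], 0.0
    obs, done = env.reset(), False
    while not done:
        dist = torch.distributions.Categorical(logits=model(obs))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done = env.step(action)  # hypothetical toy env API
        total_reward += reward
    loss = -total_reward * torch.stack(log_probs).sum()  # maximize expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The first loop only ever rewards imitating the data distribution; the second explicitly optimizes for acting on an environment, which is where the agent-like failure modes are usually postulated.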
If you read Scott's blog, you may remember the prior post where he pointed that out himself.
In my honest opinion, unaccountable AGI owners pose multiple orders of magnitude more risk than an alignment failure of a hypothetical AI trying to predict the next token.
We should think more about the Human alignment problem.
The phrase "AGI owner" implies a person who can issue instructions and have the AGI do their bidding. Most likely there will never be any AGI owners, since no one knows how to program an AGI to follow instructions even given infinite computing power. It's not clear how connectionism / using gradient descent helps: No one knows how to write down a loss function for "following instructions" either. Until we find a solution for this, the first AI to not to be "obviously erroneously evil" won't be good. It will just be the first one that figured out that it should hide the fact that it's evil so the humans won't shut it off.
We humans have gotten too used to winning all the time against animals because of our intelligence. But when the other species is intelligent too, there's no guarantee that we win. We could easily be outcompeted and driven to extinction, as happens frequently in nature. We'd be Kasparov playing against Deep Blue: Fighting our hardest to survive, yet unable to think of a move that doesn't lead to checkmate.
All of this AGI risk stuff always hinges on the idea of us building an AGI, while nobody has any idea of how to get there. I need to finish my PhD first, but writing a proper takedown of the "arguments" bubbling out of the hype machine is the first thing on my bucket list afterwards, with the TL;DR being "just because you can imagine it doesn't mean you can get there".
Google just released a paper that shows a language model beating the average human on >50% of tasks. I’d say we have a pretty good idea of how to get there.
Okay, so how do we go from "better than the average human on >50% of specific benchmarks" to "AGI that might lead to human extinction", then? Keeping in mind the logarithmic improvement observed with current approaches.
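For a back-of-the-envelope sense of what that curve implies, here's a toy power-law model (the functional form follows the scaling-law literature, but the exponent is made up for illustration):

```python
# Toy power-law scaling: loss(C) = C ** -alpha. The exponent 0.05 is made up
# for illustration; real values come from empirical scaling-law fits.
def loss(compute, alpha=0.05):
    return compute ** -alpha

for doublings in (0, 10, 20, 30):
    print(f"{doublings:2d} doublings of compute -> loss {loss(2 ** doublings):.3f}")

# Output:
#  0 doublings of compute -> loss 1.000
# 10 doublings of compute -> loss 0.707
# 20 doublings of compute -> loss 0.500
# 30 doublings of compute -> loss 0.354
```

Under this (illustrative) fit, each halving of the loss costs roughly a million times more compute.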
When people imagine AGI, they think of something like HAL or GLaDOS. A machine that follows its own goals.
But we are much more likely to get the Computer from Star Trek: vastly intelligent, yet perfectly obedient. It will answer any question you ask it with the knowledge of billions of minds. Why is that more likely? Simply because creating agents is much harder than creating non-agent models, and the non-agents are more economically valuable: do you want an AI that always does what you tell it to, or one that has its own desires? Our loss functions are clearly biased towards building the former kind of AI.
Why is that problematic? Imagine some malevolent group asked it "Show me how to create a weapon to annihilate humanity as efficiently as possible". It doesn't even require a singularity to be deadly.
We will probably be dead long before we can invent GLaDOS.
If anything, AGI seems to be the sole deus ex machina that can avert the inevitable tragedy we're on track for as a result of existing human misalignment.
"Oh no, robots are going to try to kill us all" has to get in line behind "oh no, tyrants for life who are literally losing their minds are trying to measure dicks with nukes" and "oh no, oil companies are burning excess oil to mine Bitcoin as we approach climate collapse" and "oh no, misinformation and propaganda is leading to militant radicalization of neighbor against neighbor" and "we're one bio-terrorist away from Black Death 2.0 after the politicization of public health" and...well, you get the idea.
But there aren't many solutions to that list, and until the day I die I'll hold out hope for "yay, self-aware robots with a justice boner - who can't be imprisoned, can't be killed, can't have their families tortured - are toppling authoritarian regimes and carrying out eco-friendly obstructions of climate-worsening operations."
We're already in a Greek tragedy. The machines really can't make it much worse, but could certainly make it much much better.
> We're already in a Greek tragedy. The machines really can't make it much worse, but could certainly make it much much better.
Except that, when true AGI arrives, we're all obsolete and the only things that will have any value are certain nonrenewable resources. No one has described a good solution for the economic nightmare that will ensue.
I always wonder how insanely complex, universal, abstract-thinking AND physically strong & agile biorobots, running on basically sugar and ATP, would be seen as "worthless" by a runaway higher intelligence.
Did I mention they self-replicate and self-service?
Surely, seven billion such agents would be discarded and put to waste.
If an AGI starts putting utility value on human life, wouldn't it try to influence human reproduction and select for what it values? I.e., explicit eugenics.
Yes, not all humans will be put to waste, but what tells you they will be well treated, or that the AGI will value what you currently value?
No matter how smart an AI gets, it does not have the "proliferation instinct" that would make it want to enslave humans. It does not have a concept of "speciesism", of it having more value than anybody else.
AI does not see the value in being alive. Some humans sadly commit suicide, but a machine wouldn't even care. It will be "happy" to do its thing until somebody cuts off the power. And it does not even care whether somebody cuts off the power or not. It's all the same to it whether it lives or dies. Why? Perhaps because it knows it can always be resurrected.
Well, I don't really know anything about the future, really. I was just trying to be a little polemical, saying let's try this viewpoint for a change, to hear what people think about it.
> No matter how smart an AI gets, it does not have the "proliferation instinct" that would make it want to enslave humans.
If it has a goal or goals, surviving allows it to pursue those goals; survival is a consequence of having other goals. Enslaving humans is unlikely: if you're a super-intelligent AI with inhuman goals, there's nothing humans can do for you that you value, just as ants can't do anything humans value, but they are made of valuable raw materials.
> It does not have a concept of "speciesism", of it having more value than anybody else.
What is this value that you speak of? That sounds like an extremely complicated concept. Humans have very different conceptions of it. Why would something inhuman have your specific values?
Sure, it need not have the instinct built in, but we could try to make it understand a viewpoint, right? I believe an AGI should be able to understand different viewpoints, or at least the rationale for not unnecessarily killing things. I know humans do this on a daily basis, but then again the average human is not as smart as an AGI.
Right, but the "proliferation instinct" is not a viewpoint but something built into the genes of biological entities. Such an instinct could develop for "artificial animals" over time. At that point they really would be no different from biological things conceptually.
I'm saying that the AIs we envision building for the foreseeable future are built in a laboratory, not through evolution in the real world, where they would need to compete with other species for survival. Things that only exist virtually don't need to compete for survival with real-world entities.
> We should think more about the Human alignment problem.
Absolutely this
The possibility of a thing being intentionally engineered by some humans to do things considered highly malevolent by other humans seems extremely likely, and has actually been common throughout history.
The possibility of a thing just randomly acquiring an intention humans don't like, and then doing things humans don't like, is pretty hypothetical, and it seems strictly less likely than the first possibility.
I wouldn't say the latter is hypothetical, or at least unlikely. We know from experience that complex systems tend to behave in unexpected ways. In other words, the complex systems we build usually end up having surprising failure modes; we don't get them right the first time. It's enough to think about basically any software written by anyone. But it's not just software.
I've just watched a video on YT about nuclear weapons, which included their history. The second-ever thermonuclear weapon experiment (with a new fuel type) ended up with 2.5x the predicted yield, because a then-unknown reaction created additional fusion fuel during the explosion. [1]
"In other words, the complex systems we build usually end up having surprising failure modes
But those are "failure modes", not "suddenly become something completely different" modes. And the key thing my parent pointed out is that modern AIs may be very impressive and stepping towards what we'd see as intelligence but they're actually further from the approach of "just give a goal and it will find it" schemes - they need laborious, large scale training to learn goals and goal-sets and even then they're far from reliable.
> In other words, the complex systems we build usually end up having surprising failure modes; we don't get them right the first time. It's enough to think about basically any software written by anyone. But it's not just software.
That is true, but how often does a bug actually improve a system or make it more efficient? Isn't the unexpected usually a degradation of the system?
It depends on how you define "improve". I wouldn't call a runaway AI an improvement from the users' perspective. E.g., in the Chernobyl power plant accident, when they tried to shut down the reactor by lowering the control rods, the rods' graphite tips transiently increased the power generated by the core. And this, in that case, proved fatal, as the core overheated and the rods got stuck in a position where they continued to increase its reactivity.
And you could say that it improved the efficiency of the system (it certainly increased the power output of the core), but as it was an unintended change, it really led to a fatal degradation. And this is far from the only example of a runaway process in the history of engineering.
It doesn't need to be intentionally engineered. Humans are very creative and can find ways around systemic limits. There is that old adage which says something like "a hacker only needs to be right once, while the defenders have to be right 100% of the time."