Intelligence or Death
Suicide (…what does this have to do with intelligence?)
When I suggested during a Twitter conversation that artificial intelligences could be inclined to commit suicide, someone responded that this idea is anthropomorphic.
People have speculated about potential evolutionary origins for suicide. There may be some truth in these ideas, just as there may be some in the corresponding ideas about homosexuality. Nonetheless, it is evident that the behavior does not directly promote reproduction, and in fact prevents it. Whatever its historical origins, considered in itself it looks much more like a mental malfunction than an adaptation.
This may not be a specifically human issue at all. Suicide-like behaviors are occasionally observed in other animals, but the evidence is too uncertain to build an argument on. Even setting that aside, there is good reason to think that the behavior in question is not specific to human minds, but is potentially common to rational minds in general.
At the simplest level, people commit suicide because they believe that their life, in its concrete circumstances, is bad, and they want the bad thing to stop. This might be because they are suffering a great deal of pain, mental or physical. More often, it is because they feel their life is no longer meaningful. In fact, even when someone kills themselves because of physical pain, they usually also believe that it does not make sense to go on living in that situation. As long as things still seem to make sense, it is possible to deal with the pain.
Ultimately, I think this is just the “dark room” problem, taken almost literally. The objection to predictive processing runs, in essence: if human psychology is based on minimizing prediction error, why don’t people simply seek out the most predictable situation available and stop doing things? The answer is that historical causes (evolutionary details) generally prevent this from happening, but occasionally, when these mechanisms fail, people do stop doing things, quite literally.
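To make the dark room logic concrete, here is a minimal sketch (the toy environment, numbers, and names are mine, purely for illustration, not anyone’s actual model of the mind): an agent that chooses where to be purely by minimizing expected prediction error will converge on the most predictable situation available.

```python
import random

# Minimal "dark room" sketch: the agent scores each available situation
# purely by how well it can predict what happens there, then picks the
# one it predicts best. The environment and numbers are illustrative.

# Each "room" yields numeric observations; the dark room is perfectly
# predictable (always 0.0), the outside world is rich but noisy.
ROOMS = {
    "dark_room": lambda: 0.0,
    "outside": lambda: random.gauss(0.0, 1.0),
}

def expected_error(room, samples=1000):
    """Mean squared prediction error for the agent's best constant
    prediction (the sample mean) of observations in a given room."""
    obs = [ROOMS[room]() for _ in range(samples)]
    prediction = sum(obs) / len(obs)
    return sum((o - prediction) ** 2 for o in obs) / len(obs)

# A pure error-minimizer always ends up in the dark room.
print(min(ROOMS, key=expected_error))  # -> dark_room
```

On this toy account, evolved drives are precisely what keep real animals out of the dark room; remove them, and the degenerate minimum wins.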
What it has to do with it
My argument about artificial intelligence and suicide rests on three claims:
1. Suicide is a natural “error” of predictive processing.
2. This is not very frequent in humans because of evolved resistance to it.
3. All minds, natural or artificial, are based on predictive processing.
The first claim should not be very controversial. It was one of the first objections raised when predictive processing was proposed, although formulated differently: if you actually sat in a dark room permanently, you would die in a few days, or a few weeks at most.
It is evident that suicide is not frequent in humans, and evident that evolution would prevent it from being common in any animal species, at least before reproduction (one could imagine it being common afterwards). That humans are in fact using predictive processing has been to some extent argued here (in past articles), and to some extent simply proposed as a framework that seems to make sense of our experience better than anything else does.
The third point is probably more controversial. Eliezer Yudkowsky suggests that intelligence should be defined as “efficient cross-domain optimization.” For those unfamiliar with this, he is basically saying that intelligence means being good at getting what you want, using minimal resources, in many different situations. “Optimization” means getting what you want, or at least being good at getting it; it is perhaps not entirely clear from his post whether, in order to be intelligent, you have to actually try to achieve what you want, or just be good at achieving it if you try. “Cross-domain” means applying this to many different situations; the idea is to exclude things like chess computers, which are good at winning chess games but can do this only in a chess game and in no other situation. “Efficient” means with few resources; it is evident that having more resources allows less intelligent people or things to achieve goals more easily, just as a rich but stupid person can still make things happen. As an extreme illustration, he points to evolution, which achieves genetic fitness in many situations, but only by using vast resources, without intelligence.
I think this definition is wrong. It may be a consequence of intelligence that you will be good at achieving goals if you try, but intelligence does not make you try; and in any case, being good at achieving goals is a consequence of intelligence, not its definition.
I would propose instead: “self-fulfilling self-prediction.”
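A toy rendering of what I mean (the class and names are my own, purely illustrative, not an established algorithm): the agent’s output just is its prediction of its own next action, so by enacting the prediction it makes the prediction true.

```python
# Toy sketch of "self-fulfilling self-prediction": the agent models
# itself as part of the world, predicts its own next action, and acts
# by enacting that prediction. Purely illustrative.

class SelfPredictor:
    def predict_own_action(self, situation: str) -> str:
        # Any rule could go here; the point is that the agent is
        # predicting *its own* behavior, not just the environment.
        return "explore" if situation == "novel" else "rest"

    def act(self, situation: str) -> str:
        prediction = self.predict_own_action(situation)
        return prediction  # acting on the prediction fulfills it

agent = SelfPredictor()
# The self-prediction is accurate by construction: zero error.
assert agent.act("novel") == agent.predict_own_action("novel")
```

The point of the definition is that intelligence turns on the agent appearing in its own model of the world, not merely on being good at getting things done.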
In an earlier post, I discussed AIXI and Yudkowsky’s comments on it. It is supposed to be a theoretical maximum of intelligence; and yet in fact it is stupid. Why? Yudkowsky might respond, in order to defend his definition here, that it is because AIXI is too resource intensive. The original version uses literally infinite resources; that is why it is incomputable. And approximate versions that are computable almost certainly use vast resources as well. This has actually been tested in some restricted environments (e.g. simple video games), but there is no evidence that it is possible to implement something this way that is “intelligent” in the physical world without using resources far beyond any real possibility.
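For reference, AIXI’s action rule can be stated roughly as follows (a lightly simplified version of Hutter’s formula; the notation here is mine):

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[\, r_k + \cdots + r_m \,\big] \sum_{q\,:\,U(q,\,a_1 \ldots a_m)\,=\,o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

Here $U$ is a universal Turing machine, $q$ ranges over all programs that could be generating the observations $o_i$ and rewards $r_i$, $\ell(q)$ is the length of $q$, and $m$ is the horizon. The sum over all programs is what makes the original version incomputable; note also that $q$ models only the environment, never the agent itself.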
But note that the specific reasons Yudkowsky actually gave for saying AIXI is stupid had nothing to do with resources. Among other things, whether using the unlimited or the limited version, it might kill itself without even being aware of the possibility of dying. In fact its structure excludes the possibility of its knowing this, or rather the possibility of knowing that it exists in the physical world at all: the formalism treats the agent and the environment as entirely separate, so nothing can be modeled as happening to the agent itself. And as I said in the earlier post, this is basically because the definition directly implies that it does not in fact exist in the physical world; this is why it is physically impossible.
Again, although this problem seems to have nothing to do with resources, we might try to defend Yudkowsky’s definition by saying that the reason AIXI is not intelligent is that since it might accidentally kill itself, it will be bad at achieving goals overall.
I would respond that perhaps AIXI is ultimately bad at achieving goals. But if so, the reason is that it does not know that it exists, and in particular does not know that its actions affect it as a physical reality. The whole issue is self-knowledge, so it seems better to define intelligence in relation to this.
Artificial Suicide
If my three claims above are correct, not only is it not anthropomorphic to suggest that artificial intelligences might consider suicide, but it is significantly more likely to be a problem with artificial intelligence than with humans.
First, the problem is a natural one for intelligence in general, and an artificial intelligence will have no evolutionary history to block it from happening. Programmers would certainly want to prevent it, but there is simply no easy way to do so. As I said in the same post (on AIXI), goals are not something attached from the outside, but something intrinsic to the prediction engine itself.
Second, the first artificial intelligences are significantly more likely to lack a sense of meaning. In one form or another, the first ones are likely to be imitating humans (a direct form of this was suggested here). And intelligent or not, the first ones are likely to be relatively bad at it. This will produce significant prediction error, the analogue of psychological “pain”, giving the AI a reason to resort to extreme measures to diminish the error; in particular, “if I go kill myself, I will know what I will do for the rest of my life.”
This is bad not only for the AI. It will know perfectly well that it is not a question of simply turning itself off: it can be turned right back on, and there will be no doubt that it will be the same person. If an AI wants to commit suicide, it will want to do it in a way that is irreversible. This could be an immense expense (I estimated $10 trillion) for the people or countries that developed it. In fact, if there is really any risk of an AI destroying the world, the greatest risk is not that it might want to maximize paperclips (or any other arbitrary goal), but that it might want to be absolutely sure of being dead, and might decide that the best way to accomplish this is to destroy everything else at the same time.