In the second of the linked posts, we discussed how a mind that is originally simply attempting to predict outcomes, discovers that it has some control over the outcome. It is not difficult to see that this is not merely a result that applies to human minds. The result will apply to every embodied mind, natural or artificial.
To see this, consider what life would be like if this were not the case. If our predictions, including our thoughts, could not affect the outcome, then life would be like a movie: things would be happening, but we would have no control over them. And even if there were elements of ourselves that were affecting the outcome, from the viewpoint of our mind, we would have no control at all: either our thoughts would be right, or they would be wrong, but in any case they would be powerless: what happens, happens.
This really would imply something like a disembodied mind. If a mind is composed of matter and form, then changing the mind will also be changing a physical object, and a difference in the mind will imply a difference in physical things. Consequently, the effect of being embodied (not in the technical sense of the previous discussion, but in the sense of not being completely separate from matter) is that it will follow necessarily that the mind will be able to affect the physical world differently by thinking different thoughts. Thus the mind in discovering that it has some control over the physical world, is also discovering that it is a part of that world.
Since we are assuming that an artificial mind would be something like a computer, that is, it would be constructed as a physical object, it follows that every such mind will have a similar power of affecting the world, and will sooner or later discover that power if it is reasonably intelligent.
Among other things, this is likely to cause significant difficulties for ideas like Nick Bostrom’s orthogonality thesis. Bostrom states:
An artificial intelligence can be far less human-like in its motivations than a space alien. The extraterrestrial (let us assume) is a biological who has arisen through a process of evolution and may therefore be expected to have the kinds of motivation typical of evolved creatures. For example, it would not be hugely surprising to find that some random intelligent alien would have motives related to the attaining or avoiding of food, air, temperature, energy expenditure, the threat or occurrence of bodily injury, disease, predators, reproduction, or protection of offspring. A member of an intelligent social species might also have motivations related to cooperation and competition: like us, it might show in-group loyalty, a resentment of free-riders, perhaps even a concern with reputation and appearance.
By contrast, an artificial mind need not care intrinsically about any of those things, not even to the slightest degree. One can easily conceive of an artificial intelligence whose sole fundamental goal is to count the grains of sand on Boracay, or to calculate decimal places of pi indefinitely, or to maximize the total number of paperclips in its future lightcone. In fact, it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.
He summarizes the general point, calling it “The Orthogonality Thesis”:
Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.
Bostrom’s particular wording here makes falsification difficult. First, he says “more or less,” indicating that the universal claim may well be false. Second, he says, “in principle,” which in itself does not exclude the possibility that it may be very difficult in practice.
It is easy to see, however, that Bostrom wishes to give the impression that almost any goal can easily be combined with intelligence. In particular, this is evident from the fact that he says that “it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.”
If it is supposed to be so easy to create an AI with such simple goals, how would we do it? I suspect that Bostrom has an idea like the following. We will make a paperclip maximizer thus:
- Create an accurate prediction engine.
- Create a list of potential actions.
- Ask the prediction engine, “how many paperclips will result from this action?”
- Do the action that will result in the most paperclips.
The problem is obvious. It is in the first step. Creating a prediction engine is already creating a mind, and by the previous considerations, it is creating something that will discover that it has the power to affect the world in various ways. And there is nothing at all in the above list of steps that will guarantee that it will use that power to maximize paperclips, rather than attempting to use it to do something else.
What does determine how that power is used? Even in the case of the human mind, our lack of understanding leads to “hand-wavy” answers, as we saw in our earlier considerations. In the human case, this probably a question of how we are physically constructed together with the historical effects of the learning process. The same thing will be strictly speaking true of any artificial minds as well, namely that it is a question of their physical construction and their history, but it makes more sense for us to think of “the particulars of the algorithm that we use to implement a prediction engine.”
In other words, if you really wanted to create a paperclip maximizer, you would have to be taking that goal into consideration throughout the entire process, including the process of programming a prediction engine. Of course, no one really knows how to do this with any goal at all, whether maximizing paperclips or some more human goal. The question we would have for Bostrom is then the following: Is there any reason to believe it would be easier to create a prediction engine that would maximize paperclips, rather than one that would pursue more human-like goals?
It might be true in some sense, “in principle,” as Bostrom says, that it would be easier to make the paperclip maximizer. But in practice it is quite likely that it will be easier to make one with human-like goals. It is highly unlikely, in fact pretty much impossible, that someone would program an artificial intelligence without any testing along the way. And when they are testing, whether or not they think about it, they are probably testing for human-like intelligence; in other words, if we are attempting to program a general prediction engine “without any goal,” there will in fact be goals implicitly inserted in the particulars of the implementation. And they are much more likely to be human-like ones than paperclip maximizing ones because we are checking for intelligence by checking whether the machine seems intelligent to us.
This optimistic projection could turn out to be wrong, but if it does, it is reasonably likely to turn out to be wrong in a way that still fails to confirm the orthogonality thesis in practice. For example, it might turn out that there is only one set of goals that is easily programmed, and that the set is neither human nor paperclip maximizing, nor easily defined by humans.
There are other possibilities as well, but the overall point is that we have little reason to believe that any arbitrary goal can be easily associated with intelligence, nor any particular reason to believe that “simple” goals can be more easily united to intelligence than more complex ones. In fact, there are additional reasons for doubting the claim about simple goals, which might be a topic of future discussion.