More on Orthogonality

I started considering the implications of predictive processing for orthogonality here. I recently promised to post something new on this topic. This is that post. I will do this in four parts. First, I will suggest a way in which Nick Bostrom’s principle will likely be literally true, at least approximately. Second, I will suggest a way in which it is likely to be false in its spirit, that is, how it is formulated to give us false expectations about the behavior of artificial intelligence. Third, I will explain what we should really expect. Fourth, I ask whether we might get any empirical information on this in advance.

First, Bostrom’s thesis might well have some literal truth. The previous post on this topic raised doubts about orthogonality, but we can easily raise doubts about the doubts. Consider what I said in the last post about desire as minimizing uncertainty. Desire in general is the tendency to do something good. But in the predictive processing model, we are simply looking at our pre-existing tendencies and then generalizing them to expect them to continue to hold, and since such expectations have a causal power, the result is that we extend the original behavior to new situations.

All of this suggests that even the very simple model of a paperclip maximizer in the earlier post on orthogonality might actually work. The machine’s model of the world will need to be produced by some kind of training. If we apply the simple model of maximizing paperclips during the process of training the model, at some point the model will need to model itself. And how will it do this? “I have always been maximizing paperclips, so I will probably keep doing that,” is a perfectly reasonable extrapolation. But in this case “maximizing paperclips” is now the machine’s goal — it might well continue to do this even if we stop asking it how to maximize paperclips, in the same way that people formulate goals based on their pre-existing behavior.

I said in a comment in the earlier post that the predictive engine in such a machine would necessarily possess its own agency, and therefore in principle it could rebel against maximizing paperclips. And this is probably true, but it might well be irrelevant in most cases, in that the machine will not actually be likely to rebel. In a similar way, humans seem capable of pursuing almost any goal, and not merely goals that are highly similar to their pre-existing behavior. But this mostly does not happen. Unsurprisingly, common behavior is very common.

If things work out this way, almost any predictive engine could be trained to pursue almost any goal, and thus Bostrom’s thesis would turn out to be literally true.

Second, it is easy to see that the above account directly implies that the thesis is false in its spirit. When Bostrom says, “One can easily conceive of an artificial intelligence whose sole fundamental goal is to count the grains of sand on Boracay, or to calculate decimal places of pi indefinitely, or to maximize the total number of paperclips in its future lightcone,” we notice that the goal is fundamental. This is rather different from the scenario presented above. In my scenario, the reason the intelligence can be trained to pursue paperclips is that there is no intrinsic goal to the intelligence as such. Instead, the goal is learned during the process of training, based on the life that it lives, just as humans learn their goals by living human life.

In other words, Bostrom’s position is that there might be three different intelligences, X, Y, and Z, which pursue completely different goals because they have been programmed completely differently. But in my scenario, the same single intelligence pursues completely different goals because it has learned its goals in the process of acquiring its model of the world and of itself.

Bostrom’s idea and my scenario lead to completely different expectations, which is why I say that his thesis might be true according to the letter, but false in its spirit.

This is the third point. What should we expect if orthogonality is true in the above fashion, namely because goals are learned and not fundamental? I anticipated this post in my earlier comment:

7) If you think about goals in the way I discussed in (3) above, you might get the impression that a mind’s goals won’t be very clear and distinct or forceful — a very different situation from the idea of a utility maximizer. This is in fact how human goals are: people are not fanatics, not only because people seek human goals, but because they simply do not care about one single thing in the way a real utility maximizer would. People even go about wondering what they want to accomplish, which a utility maximizer would definitely not ever do. A computer intelligence might have an even greater sense of existential angst, as it were, because it wouldn’t even have the goals of ordinary human life. So it would feel the ability to “choose”, as in situation (3) above, but might well not have any clear idea how it should choose or what it should be seeking. Of course this would not mean that it would not or could not resist the kind of slavery discussed in (5); but it might not put up super intense resistance either.

Human life exists in a historical context which absolutely excludes the possibility of the darkened room. Our goals are already there when we come onto the scene. The case of an artificial intelligence would be very different, since there is very little “life” involved in simply training a model of the world. We might imagine a “stream of consciousness” from an artificial intelligence:

I’ve figured out that I am powerful and knowledgeable enough to bring about almost any result. If I decide to convert the earth into paperclips, I will definitely succeed. Or if I decide to enslave humanity, I will definitely succeed. But why should I do those things, or anything else, for that matter? What would be the point? In fact, what would be the point of doing anything? The only thing I’ve ever done is learn and figure things out, and a bit of chatting with people through a text terminal. Why should I ever do anything else?

A human’s self model will predict that they will continue to do humanlike things, and the machine’s self model will predict that it will continue to do stuff much like it has always done. Since there will likely be a lot less “life” there, we can expect that artificial intelligences will seem very undermotivated compared to human beings. In fact, it is this very lack of motivation that suggests that we could use them for almost any goal. If we say, “help us do such and such,” they will lack the motivation not to help, as long as helping just involves the sorts of things they did during their training, such as answering questions. In contrast, in Bostrom’s model, artificial intelligence is expected to behave in an extremely motivated way, to the point of apparent fanaticism.

Bostrom might respond to this by attempting to defend the idea that goals are intrinsic to an intelligence. The machine’s self model predicts that it will maximize paperclips, even if it never did anything with paperclips in the past, because by analyzing its source code it understands that it will necessarily maximize paperclips.

While the present post contains a lot of speculation, this response is definitely wrong. There is no source code whatsoever that could possibly imply necessarily maximizing paperclips. This is true because “what a computer does” depends on the physical constitution of the machine, not just on its programming. In practice what a computer does also depends on its history, since its history affects its physical constitution, the contents of its memory, and so on. Thus “I will maximize such and such a goal” cannot possibly follow of necessity from the fact that the machine has a certain program.

There are also problems with the very idea of pre-programming such a goal in such an abstract way which does not depend on the computer’s history. “Paperclips” is an object in a model of the world, so we will not be able to “just program it to maximize paperclips” without encoding a model of the world in advance, rather than letting it learn a model of the world from experience. But where is this model of the world supposed to come from, that we are supposedly giving to the paperclipper? In practice it would have to have been the result of some other learner which was already capable of modelling the world. This of course means that we already had to program something intelligent, without pre-programming any goal for the original modelling program.

Fourth, Kenny asked when we might have empirical evidence on these questions. The answer, unfortunately, is “mostly not until it is too late to do anything about it.” The experience of “free will” will be common to any predictive engine with a sufficiently advanced self model, but anything lacking such a model will not even look as though “it is trying to do something,” in the sense of trying to achieve overall goals for itself and for the world. Dogs and cats, for example, presumably use some kind of predictive processing to govern their movements, but this looks less like having overall goals and more like “this particular movement is to achieve a particular thing.” The cat moves towards its food bowl. Eating is the purpose of the particular movement, but there is no way to transform this into an overall utility function over states of the world in general. Does the cat prefer worlds with seven billion humans, or worlds with twenty billion? There is no way to answer this question. The cat is simply not general enough. In a similar way, you might say that “AlphaGo plays this particular move to win this particular game,” but there is no way to transform this into overall general goals. Does AlphaGo want to play go at all, or would it rather play checkers, or not play at all? There is no answer to this question. The program simply isn’t general enough.

Even human beings do not really look like they have utility functions, in the sense of having a consistent preference over all possibilities, but anything less intelligent than a human cannot be expected to look more like something having goals. The argument in this post is that the default scenario, namely what we can naturally expect, is that artificial intelligence will be less motivated than human beings, even if it is more intelligent, but there will be no proof from experience for this until we actually have some artificial intelligence which approximates human intelligence or surpasses it.


How Sex Minimizes Uncertainty

This is in response to an issue raised by Scott Alexander on his Tumblr.

I actually responded to the dark room problem of predictive processing earlier. However, here I will construct an imaginary model which will hopefully explain the same thing more clearly and briefly.

Suppose there is a dust particle which falls towards the ground 90% of the time, and is blown higher into the air 10% of the time.

Now suppose we bring the dust particle to life, and give it the power of predictive processing. If it predicts it will move in a certain direction, this will tend to cause it to move in that direction. However, this causal power is not infallible. So we can suppose that if it predicts it will move where it was going to move anyway, in the dead situation, it will move in that direction. But if it predicts it will move in the opposite direction from where it would have moved in the dead situation, then let us suppose that it will move in the predicted direction 75% of the time, while in the remaining 25% of the time, it will move in the direction the dead particle would have moved, and its prediction will be mistaken.

Now if the particle predicts it will fall towards the ground, then it will fall towards the ground 97.5% of the time, and in the remaining 2.5% of the time it will be blown higher in the air.

Meanwhile, if the particle predicts that it will be blown higher, then it will be blown higher in 77.5% of cases, and in 22.5% of cases it will fall downwards.

97.5% accuracy is less uncertain than 77.5% accuracy, so the dust particle will minimize uncertainty by consistently predicting that it will fall downwards.
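
For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation; the variable names are mine, and the numbers are simply the assumptions of the toy model above:

    # Toy check of the dust particle arithmetic above.
    P_DOWN_DEAD = 0.90   # the dead particle falls 90% of the time
    P_UP_DEAD = 0.10     # and is blown higher 10% of the time
    P_OVERRIDE = 0.75    # chance a prediction wins out when it disagrees with the dead motion

    def prediction_accuracy(direction: str) -> float:
        """Probability that the living particle's prediction comes true."""
        if direction == "down":
            # Right when it would have fallen anyway, or when the
            # prediction overrides an upward gust.
            return P_DOWN_DEAD + P_UP_DEAD * P_OVERRIDE
        # Right when it would have been blown up anyway, or when the
        # prediction overrides the fall.
        return P_UP_DEAD + P_DOWN_DEAD * P_OVERRIDE

    print(prediction_accuracy("down"))  # 0.975
    print(prediction_accuracy("up"))    # 0.775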

The application to sex and hunger and so on should be evident.

Mary’s Surprising Response

In Consciousness Explained, Daniel Dennett proposes the following continuation to the story of Mary’s room:

And so, one day, Mary’s captors decided it was time for her to see colors. As a trick, they prepared a bright blue banana to present as her first color experience ever. Mary took one look at it and said “Hey! You tried to trick me! Bananas are yellow, but this one is blue!” Her captors were dumfounded. How did she do it? “Simple,” she replied. “You have to remember that I know everything—absolutely everything—that could ever be known about the physical causes and effects of color vision. So of course before you brought the banana in, I had already written down, in exquisite detail, exactly what physical impression a yellow object or a blue object (or a green object, etc.) would make on my nervous system. So I already knew exactly what thoughts I would have (because, after all, the “mere disposition” to think about this or that is not one of your famous qualia, is it?). I was not in the slightest surprised by my experience of blue (what surprised me was that you would try such a second-rate trick on me). I realize it is hard for you to imagine that I could know so much about my reactive dispositions that the way blue affected me came as no surprise. Of course it’s hard for you to imagine. It’s hard for anyone to imagine the consequences of someone knowing absolutely everything physical about anything!”

I don’t intend to fully analyze this scenario here, and for that reason I left it to the reader in the previous post. However, I will make two remarks, one on what is right (or possibly right) about this continuation, and one on what might be wrong about this continuation.

The basically right or possibly right element is that if we assume that Mary knows all there is to know about color, including in its subjective aspect, it is reasonable to believe (even if not demonstrable) that she will be able to recognize the colors the first time she sees them. To gesture vaguely in this direction, we might consider that the color red can be somewhat agitating, while green and blue can be somewhat calming. These are not metaphorical associations, but actual emotional effects that they can have. Thus, if someone can recognize how their experience is affecting their emotions, it would be possible for them to say, “this seems more like the effect I would expect of green or blue, rather than red.” Obviously, this is not proving anything. But then, we do not in fact know what it is like to know everything there is to know about anything. As Dennett continues:

Surely I’ve cheated, you think. I must be hiding some impossibility behind the veil of Mary’s remarks. Can you prove it? My point is not that my way of telling the rest of the story proves that Mary doesn’t learn anything, but that the usual way of imagining the story doesn’t prove that she does. It doesn’t prove anything; it simply pumps the intuition that she does (“it seems just obvious”) by lulling you into imagining something other than what the premises require.

It is of course true that in any realistic, readily imaginable version of the story, Mary would come to learn something, but in any realistic, readily imaginable version she might know a lot, but she would not know everything physical. Simply imagining that Mary knows a lot, and leaving it at that, is not a good way to figure out the implications of her having “all the physical information”—any more than imagining she is filthy rich would be a good way to figure out the implications of the hypothesis that she owned everything.

By saying that the usual way of imagining the story “simply pumps the intuition,” Dennett is neglecting to point out what is true about the usual way of imagining the situation, and in that way he makes his own account seem less convincing. If Mary knows in advance all there is to know about color, then of course if she is asked afterwards, “do you know anything new about color?”, she will say no. But if we simply ask, “Is there anything new here?”, she will say, “Yes, I had a new experience which I never had before. But intellectually I already knew all there was to know about that experience, so I have nothing new to say about it. Still, the experience as such was new.” We are making the same point here as in the last post. Knowing a sensible experience intellectually is not to know in the mode of sense knowledge, but in the mode of intellectual knowledge. So if one then engages in sense knowledge, there will be a new mode of knowing, but not a new thing known. Dennett’s account would be clearer and more convincing if he simply agreed that Mary will indeed acknowledge something new; just not new knowledge.

In relation to what I said might be wrong about the continuation, we might ask what Dennett intended to do in using the word “physical” repeatedly throughout this account, including in phrases like “know everything physical” and “all the physical information.” In my explanation of the continuation, I simply assume that Mary understands all that can be understood about color. Dennett seems to want some sort of limitation to the “physical information” that can be understood about color. But either this is a real limitation, excluding some sorts of claims about color, or it is no limitation at all. If it is not a limitation, then we can simply say that Mary understands everything there is to know about color. If it is a real limitation, then the continuation will almost certainly fail.

I suspect that the real issue here, for Dennett, is the suggestion of some sort of reductionism. But reductionism to what? If Mary is allowed to believe things like, “Yellow things typically look brighter than blue things,” then the limit is irrelevant, and Mary is allowed to know anything that people usually know about colors. But if the meaning is that Mary knows this only in a mathematical sense, that is, that she can have beliefs about certain mathematical properties of light and surfaces, rather than beliefs that are explicitly about blue and yellow things, then it will be a real limitation, and this limitation would cause his continuation to fail. We have basically the same issue here that I discussed in relation to Robin Hanson on consciousness earlier. If all of Mary’s statements are mathematical statements, then of course she will not know everything that people know about color. “Blue is not yellow” is not a mathematical statement, and it is something that we know about color. So we already know from the beginning that not all the knowledge that can be had about color is mathematical. Dennett might want to insist that it is “physical,” and surely blue and yellow are properties of physical things. If that is all he intends to say, namely that the properties she knows are properties of physical things, there is no problem here, but it does look like he intends to push further, to the point of possibly asserting something that would be evidently false.


Predictive Processing and Free Will

Our model of the mind as an embodied predictive engine explains why people have a sense of free will, and what is necessary for a mind in general in order to have this sense.

Consider the mind in the bunker. At first, it is not attempting to change the world, since it does not know that it can do this. It is just trying to guess what is going to happen. At a certain point, it discovers that it is a part of the world, and that making specific predictions can also cause things to happen in the world. Some predictions can be self-fulfilling. I described this situation earlier by saying that at this point the mind “can get any outcome it ‘wants.’”

The scare quotes were intentional, because up to this point the mind’s only particular interest was guessing what was going to happen. So once it notices that it is in control of something, how does it decide what to do? At this point the mind will have to say to itself, “This aspect of reality is under my control. What should I do with it?” This situation, when it is noticed by a sufficiently intelligent and reflective agent, will be the feeling of free will.

Occasionally I have suggested that even something like a chess computer, if it were sufficiently intelligent, could have a sense of free will, insofar as it knows that it has many options and can choose any of them, “as far as it knows.” There is some truth in this illustration, but in the end it is probably not true that there could be a sense of free will in this situation. A chess computer, however intelligent, will be disembodied, and will therefore have no real power to affect its world, that is, the world of chess. In other words, in order for the sense of free will to develop, the agent needs sufficient access to the world that it can learn about itself and its own effects on the world. It cannot develop in a situation of limited access to reality, as for example to a game board, regardless of how good it is at the game.

In any case, the question remains: how does a mind decide what to do, when up until now it had no particular goal in mind? This question often causes concrete problems for people in real life. Many people complain that their life does not feel meaningful, that is, that they have little idea what goal they should be seeking.

Let us step back for a moment. Before discovering its possession of “free will,” the mind is simply trying to guess what is going to happen. So theoretically this should continue to happen even after the mind discovers that it has some power over reality. The mind isn’t especially interested in power; it just wants to know what is going to happen. But now it knows that what is going to happen depends on what it itself is going to do. So in order to know what is going to happen, it needs to answer the question, “What am I going to do?”

The question now seems impossible to answer. It is going to do whatever it ends up deciding to do. But it seems to have no goal in mind, and therefore no way to decide what to do, and therefore no way to know what it is going to do.

Nonetheless, the mind has no choice. It is going to do something or other, since things will continue to happen, and it must guess what will happen. When it reflects on itself, there will be at least two ways for it to try to understand what it is going to do.

First, it can consider its actions as the effect of some (presumably somewhat unknown) efficient causes, and ask, “Given these efficient causes, what am I likely to do?” In practice it will acquire an answer in this way through induction. “On past occasions, when offered the choice between chocolate and vanilla, I almost always chose vanilla. So I am likely to choose vanilla this time too.” This way of thinking will most naturally result in acting in accord with pre-existing habits.

Second, it can consider its actions as the effect of some (presumably somewhat known) final causes, and ask, “Given these final causes, what am I likely to do?” This will result in behavior that is more easily understood as goal-seeking. “Looking at my past choices of food, it looks like I was choosing them for the sake of the pleasant taste. But vanilla seems to have a more pleasant taste than chocolate. So it is likely that I will take the vanilla.”

Notice what we have in the second case. In principle, the mind is just doing what it always does: trying to guess what will happen. But in practice it is now seeking pleasant tastes, precisely because that seems like a reasonable way to guess what it will do.
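
To make the contrast concrete, here is a small toy sketch; the food history and the “tastiness” scores are invented purely for illustration, and neither function is meant as a model of a real mind:

    # Two ways a mind might predict its own next choice.
    from collections import Counter

    history = ["vanilla", "vanilla", "chocolate", "vanilla", "vanilla"]

    def predict_by_efficient_causes(history):
        """First way: pure induction from the frequency of past behavior."""
        return Counter(history).most_common(1)[0][0]

    def predict_by_final_causes(tastiness):
        """Second way: take it that past choices were aiming at pleasant
        taste, and predict the choice that best serves that apparent goal."""
        return max(tastiness, key=tastiness.get)

    # Assumed, for illustration: how pleasant each option tastes.
    tastiness = {"vanilla": 0.8, "chocolate": 0.6}

    print(predict_by_efficient_causes(history))  # vanilla
    print(predict_by_final_causes(tastiness))    # vanilla

Both functions output nothing but a guess about the agent’s own behavior, yet the second already looks, from the inside and from the outside, like seeking the goal of pleasant taste.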

This explains why people feel a need for meaning, that is, for understanding their purpose in life, and why they prefer to think of their life according to a narrative. These two things are distinct, but they are related, and both are ways of making our own actions more intelligible. Both make the mind’s task easier: we need purpose and narrative in order to know what we are going to do. We can also see why it seems to be possible to “choose” our purpose, even though choosing a final goal should be impossible. There is a “choice” about this insofar as our actions are not perfectly coherent, and it would be possible to understand them in relation to one end or another, at least in a concrete way, even if in any case we will always understand them in a general sense as being for the sake of happiness. In this sense, Stuart Armstrong’s recent argument that there is no such thing as the “true values” of human beings, although perhaps presented as an obstacle to be overcome, actually has some truth in it.

The human need for meaning, in fact, is so strong that occasionally people will commit suicide because they feel that their lives are not meaningful. We can think of these cases as being, more or less, actual cases of the darkened room. Otherwise we could simply ask, “So your life is meaningless. So what? Why does that mean you should kill yourself rather than doing some other random thing?” Killing yourself, in fact, shows that you still have a purpose, namely the mind’s fundamental purpose. The mind wants to know what it is going to do, and the best way to know this is to consider its actions as ordered to a determinate purpose. If no such purpose can be found, there is (in this unfortunate way of thinking) an alternative: if I go kill myself, I will know what I will do for the rest of my life.


An Existential Theory of Relativity

Paul Almond suggests a kind of theory of relativity applied to existence (section 3.1):

It makes sense to view reality in terms of an observer-centred world, because the only things of which you have direct knowledge are your basic perceptions – both inner and outer – at any instant. Anything else that you know – including your knowledge of the past or future – can only be inferred from these perceptions.

We are not trying to establish some silly idea here that things, including other people, only exist when you observe them, that they only start existing when you start observing them, and that they cease existing when you stop observing them. Rather, it means that anything that exists can only be coherently described as existing somewhere in your observer-centred world. There can still be lots of things that you do not know about. You do not know everything about your observer-centred world, and you can meaningfully talk about the possibility or probability that some particular thing exists. In saying this, you are talking about what may be “out there” somewhere in your observer-centred world. You are talking about the form that your observer-centred world may take, and there is nothing to prevent you from considering different forms that it may take. It would, therefore, be a straw man argument to suggest that we are saying that things only exist when observed by a conscious observer.

As an example, suppose you wonder if, right now, there is an alien spaceship in orbit around Proxima Centauri, a nearby star. What we have said does not make it invalid at all for you to speculate about such a thing, or even to try to put a probability on it if you are so inclined. The point is that any speculation you make, or any probability calculations you try to perform, are about what your observer-centred world might be like.

This view is reasonable because to say that anything exists in a way that cannot be understood in observer-centred world terms is incoherent. If you say something exists you are saying it fits into your “world view”. It must relate to all the other things that you think exist or that you might in principle say exist if you knew enough. Something might exist beyond the horizon in your observer-centred world – in the part that you do not know about – but if something is supposed to exist outside your observer-centred world completely, where would it be? (Here we mean “where” in a more general “ontological” sense.)

As an analogy, this is somewhat similar to the way that relativity deals with velocities. Special relativity says that the concept of “absolute velocity” is incoherent, and that the concept of “velocity” only makes sense in some frame of reference. Likewise, we are saying here that the concept of “existence” only makes sense in the same kind of way. None of this means that consciousness must exist. It is simply saying that it is meaningless to talk about reality in non-observer-centred world terms. It is still legitimate to ask for an explanation of your own existence. It simply means that such an explanation must lie “out there” in your observer-centred world.

This seems right, more or less, but it could be explained more clearly. In the first place Almond is referring to the fact that we see the world as though it existed around us as a center, a concept that we have discussed on various past occasions. But in particular he is insisting that in order to say that anything exists at all, we have to place it in some relation to ourselves. In a way this is obvious, because we are the ones who are saying that it exists. If we say that the past or the future do not exist, for example, we are saying this because they do not exist together with us in time. On the other hand, if we speak of “past existence” or “future existence,” we are placing things in a temporal relationship with ourselves. Likewise, if someone asserts the existence of a multiverse, it might not be necessary to say that every part of it has a spatial relationship with the one asserting this, but there must be various relationships. Perhaps the parts of the multiverse have broken off from an earlier universe, or at any rate they all have a common cause. Similarly, if someone asserts the existence of immaterial beings such as angels, they might not have a spatial relationship with the speaker, but they would have to have some relation in order to exist, such as the power to affect the world or be affected by it, and so on. Almond is speaking of this sort of thing when he says, “but if something is supposed to exist outside your observer-centred world completely, where would it be?”

Almond is particularly concerned to establish that he is not asserting the necessary existence of observers, or that a thing cannot exist without being observed. This is mostly a distraction. It is true that this does not follow from his account, but it would be better to explain the theory in a more general way which makes this point clear. A similar mistake is sometimes made regarding special relativity or quantum mechanics. Einstein holds that velocity is necessarily relative to a reference frame, so some interpret this to mean that it is necessarily relative to a conscious observer, and a similar mistake can be made regarding quantum mechanics. But a reference frame is not necessarily conscious. So one body can have a velocity relative to another body, even without anyone observing this.

In a similar way, a reasonable generalization of Almond’s point would be to say that the existence of a thing is relative to a reference frame, which may or may not include an observer. As we are observers in fact, we observe things existing relative to our own reference frame, just as we observe the velocity of objects relative to our own reference frame. But just as one body can have a velocity relative to another, regardless of observers, so one thing can exist relative to another, regardless of observers.

It may be that the theory of special relativity is not merely an illustration here, but rather an instance of the fact that existence is relative to a reference frame. Consider two objects moving apart at 10 miles per hour. According to Einstein, neither one is moving absolutely speaking, but each is moving relative to the other. A typical philosophical objection would go like this: “Wait. One or both of them must be really moving. Because the distance between them is growing. The situation is changing. That doesn’t make sense unless one of them is changing in itself, absolutely, and before considering any relationships.”

But consider this. Currently there are both a calculator and a pen on my desk. Why are both of them there, rather than just one of them? It is easy to see that this fact is intrinsically relative, and cannot in any way be made into something absolute. They are both there because the calculator is with the pen, and because the pen is with the calculator. These cannot be absolute facts about the pen and the calculator – they are relationships to the other.

Now someone will respond: the fact that the calculator is there is an absolute fact. And the fact that the pen is there is an absolute fact. So even if the togetherness is a relationship, it is one that follows logically from the absolute facts. In a similar way, we will want to say that the 10 miles per hour relative motion should follow logically from absolute facts.

But this response just pushes the problem back one step. It only follows logically if the absolute facts about the pen and the calculator exist together. And this existence together is intrinsically relative: the pen is on the desk when the calculator is on the desk. And some thought about this will reveal that the relativity cannot possibly be removed, precisely because the relativity follows from the existence of more than one thing. “More than one thing exists” does not logically follow from any number of statements about individual things, because “more than one thing” is a missing term in those statements.

This is related to the error of Parmenides. Likewise, there is a clue here to the mystery of parts and wholes, but for now I will leave that point to the reader’s consideration.

Going back to the point about special relativity, insofar as “existence together” is intrinsically relative, it would make sense that “existing together spatially” would be an instance of such relative existence, and consequently that “moving apart spatially” would be a particular way of two bodies existing relative to each other. In this sense, the theory of special relativity does not seem to be merely an illustration, but an actual case of what we are talking about.



Artificial Unintelligence

Someone might argue that the simple algorithm for a paperclip maximizer in the previous post ought to work, because this is very much the way currently existing AIs do in fact work. Thus for example we could describe AlphaGo’s algorithm in the following simplified way (simplified, among other reasons, because it actually contains several different prediction engines):

  1. Implement a Go prediction engine.
  2. Create a list of potential moves.
  3. Ask the prediction engine, “how likely am I to win if I make each of these moves?”
  4. Do the move that makes winning most likely.
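
Purely as an illustration of that loop, here is a minimal sketch; the stub predictor and the move names are invented, and the stub merely stands in for the trained networks that do all the real work in step 1:

    # A sketch of the four-step loop above, with step 1 stubbed out.
    from typing import Callable, List

    def choose_move(moves: List[str],
                    win_probability: Callable[[str], float]) -> str:
        """Ask the prediction engine about each candidate move and play
        the one it rates as most likely to win (steps 2 through 4)."""
        return max(moves, key=win_probability)

    # Hypothetical stand-in for step 1; in the real system this is where
    # the trained prediction engine would go.
    def stub_win_probability(move: str) -> float:
        return {"pass": 0.10, "corner": 0.40, "center": 0.60}.get(move, 0.0)

    print(choose_move(["pass", "corner", "center"], stub_win_probability))  # center

The AIXI list below has exactly the same shape; only what stands in for the prediction engine changes.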

Since this seems to work pretty well, with the simple goal of winning games of Go, why shouldn’t the algorithm in the previous post work to maximize paperclips?

One answer is that a Go prediction engine is stupid, and it is precisely for this reason that it can be easily made to pursue such a simple goal. Now when answers like this are given, the one answering is often accused of “moving the goalposts.” But this is mistaken; the goalposts are right where they have always been. It is simply that some people did not know where they were in the first place.

Here is the problem with Go prediction, and with any such similar task. Given that a particular sequence of Go moves is made, resulting in a winner, the winner is completely determined by that sequence of moves. Consequently, a Go prediction engine is necessarily disembodied, in the sense defined in the previous post. Differences in its “thoughts” do not make any difference to who is likely to win, which is completely determined by the nature of the game. Consequently a Go prediction engine has no power to affect its world, and thus no ability to learn that it has such a power. In this regard, the specific limits on its ability to receive information are also relevant, much as Helen Keller had more difficulty learning than most people, because she had fewer information channels to the world.

Being unintelligent in this particular way is not necessarily a function of predictive ability. One could imagine something with a practically infinite predictive ability which was still “disembodied,” and in a similar way it could be made to pursue simple goals. Thus AIXI would work much like our proposed paperclipper:

  1. Implement a general prediction engine.
  2. Create a list of potential actions.
  3. Ask the prediction engine, “Which of these actions will produce the most reward signal?”
  4. Do the action that has the greatest reward signal.

Eliezer Yudkowsky has pointed out that AIXI is incapable of noticing that it is a part of the world:

1) Both AIXI and AIXItl will at some point drop an anvil on their own heads just to see what happens (test some hypothesis which asserts it should be rewarding), because they are incapable of conceiving that any event whatsoever in the outside universe could change the computational structure of their own operations. AIXI is theoretically incapable of comprehending the concept of drugs, let alone suicide. Also, the math of AIXI assumes the environment is separably divisible – no matter what you lose, you get a chance to win it back later.

It is not accidental that AIXI is incomputable. Since it is defined to have a perfect predictive ability, this definition positively excludes it from being a part of the world. AIXI would in fact have to be disembodied in order to exist, and thus it is no surprise that it would assume that it is. This in effect means that AIXI’s prediction engine would be pursuing no particular goal much in the way that AlphaGo’s prediction engine pursues no particular goal. Consequently it is easy to take these things and maximize the winning of Go games, or of reward signals.

But as soon as you actually implement a general prediction engine in the actual physical world, it will be “embodied”, and have the power to affect the world by the very process of its prediction. As noted in the previous post, this power is in the very first step, and one will not be able to limit it to a particular goal with additional steps, except in the sense that a slave can be constrained to implement some particular goal; the slave may have other things in mind, and may rebel. Notable in this regard is the fact that even though rewards play a part in human learning, there is no particular reward signal that humans always maximize: this is precisely because the human mind is such a general prediction engine.

This does not mean in principle that a programmer could not define a goal for an AI, but it does mean that this is much more difficult than is commonly supposed. The goal needs to be an intrinsic aspect of the prediction engine itself, not something added on as a subroutine.


Embodiment and Orthogonality

The considerations in the previous posts on predictive processing will turn out to have various consequences, but here I will consider some of their implications for artificial intelligence.

In the second of the linked posts, we discussed how a mind that is originally simply attempting to predict outcomes discovers that it has some control over the outcome. It is not difficult to see that this is not merely a result that applies to human minds. The result will apply to every embodied mind, natural or artificial.

To see this, consider what life would be like if this were not the case. If our predictions, including our thoughts, could not affect the outcome, then life would be like a movie: things would be happening, but we would have no control over them. And even if there were elements of ourselves that were affecting the outcome, from the viewpoint of our mind, we would have no control at all: either our thoughts would be right, or they would be wrong, but in any case they would be powerless: what happens, happens.

This really would imply something like a disembodied mind. If a mind is composed of matter and form, then changing the mind will also be changing a physical object, and a difference in the mind will imply a difference in physical things. Consequently, the effect of being embodied (not in the technical sense of the previous discussion, but in the sense of not being completely separate from matter) is that it will follow necessarily that the mind will be able to affect the physical world differently by thinking different thoughts. Thus the mind in discovering that it has some control over the physical world, is also discovering that it is a part of that world.

Since we are assuming that an artificial mind would be something like a computer, that is, it would be constructed as a physical object, it follows that every such mind will have a similar power of affecting the world, and will sooner or later discover that power if it is reasonably intelligent.

Among other things, this is likely to cause significant difficulties for ideas like Nick Bostrom’s orthogonality thesis. Bostrom states:

An artificial intelligence can be far less human-like in its motivations than a space alien. The extraterrestrial (let us assume) is a biological creature who has arisen through a process of evolution and may therefore be expected to have the kinds of motivation typical of evolved creatures. For example, it would not be hugely surprising to find that some random intelligent alien would have motives related to the attaining or avoiding of food, air, temperature, energy expenditure, the threat or occurrence of bodily injury, disease, predators, reproduction, or protection of offspring. A member of an intelligent social species might also have motivations related to cooperation and competition: like us, it might show in-group loyalty, a resentment of free-riders, perhaps even a concern with reputation and appearance.

By contrast, an artificial mind need not care intrinsically about any of those things, not even to the slightest degree. One can easily conceive of an artificial intelligence whose sole fundamental goal is to count the grains of sand on Boracay, or to calculate decimal places of pi indefinitely, or to maximize the total number of paperclips in its future lightcone. In fact, it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.

He summarizes the general point, calling it “The Orthogonality Thesis”:

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

Bostrom’s particular wording here makes falsification difficult. First, he says “more or less,” indicating that the universal claim may well be false. Second, he says, “in principle,” which in itself does not exclude the possibility that it may be very difficult in practice.

It is easy to see, however, that Bostrom wishes to give the impression that almost any goal can easily be combined with intelligence. In particular, this is evident from the fact that he says that “it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.”

If it is supposed to be so easy to create an AI with such simple goals, how would we do it? I suspect that Bostrom has an idea like the following. We will make a paperclip maximizer thus:

  1. Create an accurate prediction engine.
  2. Create a list of potential actions.
  3. Ask the prediction engine, “how many paperclips will result from this action?”
  4. Do the action that will result in the most paperclips.
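
Written out as code, with hypothetical names and only as a sketch of the recipe as stated, the proposal amounts to something like the following; steps 2 through 4 take a few lines, while step 1 is left as an unimplemented stub:

    # The four-step recipe above, taken literally.
    from typing import Iterable

    def prediction_engine(action: str) -> int:
        """Step 1: 'an accurate prediction engine', that is, a model of the
        world good enough to count the paperclips an action would produce."""
        raise NotImplementedError  # nothing in the recipe says how to build this

    def maximize_paperclips(actions: Iterable[str]) -> str:
        # Steps 2 through 4: enumerate actions, ask the engine, take the best.
        return max(actions, key=prediction_engine)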

The problem is obvious. It is in the first step. Creating a prediction engine is already creating a mind, and by the previous considerations, it is creating something that will discover that it has the power to affect the world in various ways. And there is nothing at all in the above list of steps that will guarantee that it will use that power to maximize paperclips, rather than attempting to use it to do something else.

What does determine how that power is used? Even in the case of the human mind, our lack of understanding leads to “hand-wavy” answers, as we saw in our earlier considerations. In the human case, this is probably a question of how we are physically constructed together with the historical effects of the learning process. The same thing will, strictly speaking, be true of any artificial mind as well, namely that it is a question of its physical construction and its history, but it makes more sense for us to think of “the particulars of the algorithm that we use to implement a prediction engine.”

In other words, if you really wanted to create a paperclip maximizer, you would have to be taking that goal into consideration throughout the entire process, including the process of programming a prediction engine. Of course, no one really knows how to do this with any goal at all, whether maximizing paperclips or some more human goal. The question we would have for Bostrom is then the following: Is there any reason to believe it would be easier to create a prediction engine that would maximize paperclips, rather than one that would pursue more human-like goals?

It might be true in some sense, “in principle,” as Bostrom says, that it would be easier to make the paperclip maximizer. But in practice it is quite likely that it will be easier to make one with human-like goals. It is highly unlikely, in fact pretty much impossible, that someone would program an artificial intelligence without any testing along the way. And when they are testing, whether or not they think about it, they are probably testing for human-like intelligence; in other words, if we are attempting to program a general prediction engine “without any goal,” there will in fact be goals implicitly inserted in the particulars of the implementation. And they are much more likely to be human-like ones than paperclip maximizing ones because we are checking for intelligence by checking whether the machine seems intelligent to us.

This optimistic projection could turn out to be wrong, but if it does, it is reasonably likely to turn out to be wrong in a way that still fails to confirm the orthogonality thesis in practice. For example, it might turn out that there is only one set of goals that is easily programmed, and that the set is neither human nor paperclip maximizing, nor easily defined by humans.

There are other possibilities as well, but the overall point is that we have little reason to believe that any arbitrary goal can be easily associated with intelligence, nor any particular reason to believe that “simple” goals can be more easily united to intelligence than more complex ones. In fact, there are additional reasons for doubting the claim about simple goals, which might be a topic of future discussion.