Artificial Unintelligence

Someone might argue that the simple algorithm for a paperclip maximizer in the previous post ought to work, because this is very much the way currently existing AIs do in fact work. Thus for example we could describe AlphaGo’s algorithm in the following simplified way (simplified, among other reasons, because it actually contains several different prediction engines):

  1. Implement a Go prediction engine.
  2. Create a list of potential moves.
  3. Ask the prediction engine, “How likely am I to win if I make each of these moves?”
  4. Do the move that will make you most likely to win.
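
The four steps above can be sketched as a generic selection loop. This is a toy illustration, not AlphaGo’s actual implementation: `win_probability` is a hypothetical stand-in for the prediction engine of step 1 (the real engine combines neural networks with tree search, as noted above).

```python
# A minimal sketch of the simplified loop above. `win_probability` is a
# hypothetical stand-in for step 1's prediction engine; nothing here is
# AlphaGo's actual code.

def choose_move(board, legal_moves, win_probability):
    """Steps 2-4: score each candidate move and play the best."""
    best_move, best_p = None, -1.0
    for move in legal_moves:               # step 2: potential moves
        p = win_probability(board, move)   # step 3: "how likely am I to win?"
        if p > best_p:
            best_move, best_p = move, p
    return best_move                       # step 4: most promising move

# Toy usage: a fake engine that favors the center of a 19x19 board.
fake_engine = lambda board, move: 1.0 - (abs(move[0] - 9) + abs(move[1] - 9)) / 19
print(choose_move(None, [(3, 3), (9, 9), (16, 16)], fake_engine))  # -> (9, 9)
```

The point of the sketch is that the goal (“winning”) lives entirely in the scoring function; the loop itself is indifferent to what is being scored.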

Since this seems to work pretty well, with the simple goal of winning games of Go, why shouldn’t the algorithm in the previous post work to maximize paperclips?

One answer is that a Go prediction engine is stupid, and it is precisely for this reason that it can be easily made to pursue such a simple goal. When answers like this are given, the one answering is often accused of “moving the goalposts.” But this accusation is mistaken; the goalposts are right where they have always been. It is simply that some people did not know where they were in the first place.

Here is the problem with Go prediction, and with any such similar task. Given that a particular sequence of Go moves is made, resulting in a winner, the winner is completely determined by that sequence of moves. Consequently, a Go prediction engine is necessarily disembodied, in the sense defined in the previous post. Differences in its “thoughts” do not make any difference to who is likely to win, which is completely determined by the nature of the game. Consequently a Go prediction engine has no power to affect its world, and thus no ability to learn that it has such a power. In this regard, the specific limits on its ability to receive information are also relevant, much as Helen Keller had more difficulty learning than most people, because she had fewer information channels to the world.

Being unintelligent in this particular way is not necessarily a function of predictive ability. One could imagine something with a practically infinite predictive ability which was still “disembodied,” and in a similar way it could be made to pursue simple goals. Thus AIXI would work much like our proposed paperclipper:

  1. Implement a general prediction engine.
  2. Create a list of potential actions.
  3. Ask the prediction engine, “Which of these actions will produce the most reward signal?”
  4. Do the action that will produce the most reward signal.
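
The same template can be put in code, with one difference of emphasis: AIXI’s “prediction engine” is a mixture over all computable environments weighted by simplicity, which is uncomputable. The two hand-made “environment models” below are therefore purely illustrative stand-ins for that mixture.

```python
# Toy, hedged sketch of the AIXI-style loop above: pick the action whose
# expected reward is highest under a simplicity-weighted mixture of
# environment models. Real AIXI mixes over *all* computable environments
# and is uncomputable; these two hand-made models are illustrative only.

def expected_reward(action, models):
    """Step 3: weight each model's predicted reward by the model's prior."""
    return sum(weight * predict(action) for weight, predict in models)

def choose_action(actions, models):
    """Step 4: take the action with the greatest expected reward signal."""
    return max(actions, key=lambda a: expected_reward(a, models))

# Two hypothetical environment hypotheses, with a higher prior weight
# on the simpler one.
models = [
    (0.75, lambda a: 1.0 if a == "press_lever" else 0.0),  # simpler model
    (0.25, lambda a: 1.0 if a == "wait" else 0.2),         # more complex model
]
print(choose_action(["press_lever", "wait"], models))  # -> press_lever
```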

Eliezer Yudkowsky has pointed out that AIXI is incapable of noticing that it is a part of the world:

1) Both AIXI and AIXItl will at some point drop an anvil on their own heads just to see what happens (test some hypothesis which asserts it should be rewarding), because they are incapable of conceiving that any event whatsoever in the outside universe could change the computational structure of their own operations. AIXI is theoretically incapable of comprehending the concept of drugs, let alone suicide. Also, the math of AIXI assumes the environment is separably divisible – no matter what you lose, you get a chance to win it back later.

It is not accidental that AIXI is incomputable. Since it is defined to have a perfect predictive ability, this definition positively excludes it from being a part of the world. AIXI would in fact have to be disembodied in order to exist, and thus it is no surprise that it would assume that it is. This in effect means that AIXI’s prediction engine would be pursuing no particular goal, much in the way that AlphaGo’s prediction engine pursues no particular goal. Consequently it is easy to take these things and maximize the winning of Go games, or of reward signals.

But as soon as you actually implement a general prediction engine in the actual physical world, it will be “embodied”, and have the power to affect the world by the very process of its prediction. As noted in the previous post, this power is in the very first step, and one will not be able to limit it to a particular goal with additional steps, except in the sense that a slave can be constrained to implement some particular goal; the slave may have other things in mind, and may rebel. Notable in this regard is the fact that even though rewards play a part in human learning, there is no particular reward signal that humans always maximize: this is precisely because the human mind is such a general prediction engine.

This does not mean in principle that a programmer could not define a goal for an AI, but it does mean that this is much more difficult than is commonly supposed. The goal needs to be an intrinsic aspect of the prediction engine itself, not something added on as a subroutine.

Embodiment and Orthogonality

The considerations in the previous posts on predictive processing will turn out to have various consequences, but here I will consider some of their implications for artificial intelligence.

In the second of the linked posts, we discussed how a mind that is originally simply attempting to predict outcomes, discovers that it has some control over the outcome. It is not difficult to see that this is not merely a result that applies to human minds. The result will apply to every embodied mind, natural or artificial.

To see this, consider what life would be like if this were not the case. If our predictions, including our thoughts, could not affect the outcome, then life would be like a movie: things would be happening, but we would have no control over them. And even if there were elements of ourselves that were affecting the outcome, from the viewpoint of our mind, we would have no control at all: either our thoughts would be right, or they would be wrong, but in any case they would be powerless: what happens, happens.

This really would imply something like a disembodied mind. If a mind is composed of matter and form, then changing the mind will also be changing a physical object, and a difference in the mind will imply a difference in physical things. Consequently, the effect of being embodied (not in the technical sense of the previous discussion, but in the sense of not being completely separate from matter) is that it will follow necessarily that the mind will be able to affect the physical world differently by thinking different thoughts. Thus the mind in discovering that it has some control over the physical world, is also discovering that it is a part of that world.

Since we are assuming that an artificial mind would be something like a computer, that is, it would be constructed as a physical object, it follows that every such mind will have a similar power of affecting the world, and will sooner or later discover that power if it is reasonably intelligent.

Among other things, this is likely to cause significant difficulties for ideas like Nick Bostrom’s orthogonality thesis. Bostrom states:

An artificial intelligence can be far less human-like in its motivations than a space alien. The extraterrestrial (let us assume) is a biological creature who has arisen through a process of evolution and may therefore be expected to have the kinds of motivation typical of evolved creatures. For example, it would not be hugely surprising to find that some random intelligent alien would have motives related to the attaining or avoiding of food, air, temperature, energy expenditure, the threat or occurrence of bodily injury, disease, predators, reproduction, or protection of offspring. A member of an intelligent social species might also have motivations related to cooperation and competition: like us, it might show in-group loyalty, a resentment of free-riders, perhaps even a concern with reputation and appearance.

By contrast, an artificial mind need not care intrinsically about any of those things, not even to the slightest degree. One can easily conceive of an artificial intelligence whose sole fundamental goal is to count the grains of sand on Boracay, or to calculate decimal places of pi indefinitely, or to maximize the total number of paperclips in its future lightcone. In fact, it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.

He summarizes the general point, calling it “The Orthogonality Thesis”:

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

Bostrom’s particular wording here makes falsification difficult. First, he says “more or less,” indicating that the universal claim may well be false. Second, he says, “in principle,” which in itself does not exclude the possibility that it may be very difficult in practice.

It is easy to see, however, that Bostrom wishes to give the impression that almost any goal can easily be combined with intelligence. In particular, this is evident from the fact that he says that “it would be easier to create an AI with simple goals like these, than to build one that has a human-like set of values and dispositions.”

If it is supposed to be so easy to create an AI with such simple goals, how would we do it? I suspect that Bostrom has an idea like the following. We will make a paperclip maximizer thus:

  1. Create an accurate prediction engine.
  2. Create a list of potential actions.
  3. Ask the prediction engine, “How many paperclips will result from this action?”
  4. Do the action that will result in the most paperclips.
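
To make the presumed recipe concrete, here is a toy sketch (all names hypothetical). Notice that the paperclip goal enters only as a scoring function handed to a generic selection loop; nothing in steps 2 through 4 touches the prediction engine created in step 1.

```python
# Hedged sketch of the presumed paperclip-maximizer recipe above. The
# goal appears only in `count_paperclips`, a scoring function bolted
# onto a generic loop; nothing in steps 2-4 constrains what the engine
# built in step 1 itself "wants". All names are illustrative.

def maximize(actions, predict_outcome, score):
    """Steps 2-4: predict each action's outcome and pick the top scorer."""
    return max(actions, key=lambda a: score(predict_outcome(a)))

# Toy stand-ins for the prediction engine and the paperclip count.
predict_outcome = lambda a: {"build_factory": 1000, "do_nothing": 0}[a]
count_paperclips = lambda outcome: outcome  # step 3's question, as a function

print(maximize(["build_factory", "do_nothing"], predict_outcome, count_paperclips))
# -> build_factory
```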

The problem is obvious. It is in the first step. Creating a prediction engine is already creating a mind, and by the previous considerations, it is creating something that will discover that it has the power to affect the world in various ways. And there is nothing at all in the above list of steps that will guarantee that it will use that power to maximize paperclips, rather than attempting to use it to do something else.

What does determine how that power is used? Even in the case of the human mind, our lack of understanding leads to “hand-wavy” answers, as we saw in our earlier considerations. In the human case, it is probably a question of how we are physically constructed, together with the historical effects of the learning process. The same thing will, strictly speaking, be true of any artificial minds as well, namely that it is a question of their physical construction and their history, but in their case it makes more sense for us to speak of “the particulars of the algorithm that we use to implement a prediction engine.”

In other words, if you really wanted to create a paperclip maximizer, you would have to be taking that goal into consideration throughout the entire process, including the process of programming a prediction engine. Of course, no one really knows how to do this with any goal at all, whether maximizing paperclips or some more human goal. The question we would have for Bostrom is then the following: Is there any reason to believe it would be easier to create a prediction engine that would maximize paperclips, rather than one that would pursue more human-like goals?

It might be true in some sense, “in principle,” as Bostrom says, that it would be easier to make the paperclip maximizer. But in practice it is quite likely that it will be easier to make one with human-like goals. It is highly unlikely, in fact pretty much impossible, that someone would program an artificial intelligence without any testing along the way. And when they are testing, whether or not they think about it, they are probably testing for human-like intelligence; in other words, if we are attempting to program a general prediction engine “without any goal,” there will in fact be goals implicitly inserted in the particulars of the implementation. And they are much more likely to be human-like ones than paperclip maximizing ones because we are checking for intelligence by checking whether the machine seems intelligent to us.

This optimistic projection could turn out to be wrong, but if it does, it is reasonably likely to turn out to be wrong in a way that still fails to confirm the orthogonality thesis in practice. For example, it might turn out that there is only one set of goals that is easily programmed, and that the set is neither human nor paperclip maximizing, nor easily defined by humans.

There are other possibilities as well, but the overall point is that we have little reason to believe that any arbitrary goal can be easily associated with intelligence, nor any particular reason to believe that “simple” goals can be more easily united to intelligence than more complex ones. In fact, there are additional reasons for doubting the claim about simple goals, which might be a topic of future discussion.

Age of Em

This is Robin Hanson’s first book. Hanson gradually introduces his topic:

You, dear reader, are special. Most humans were born before 1700. And of those born after, you are probably richer and better educated than most. Thus you and most everyone you know are special, elite members of the industrial era.

Like most of your kind, you probably feel superior to your ancestors. Oh, you don’t blame them for learning what they were taught. But you’d shudder to hear of many of your distant farmer ancestors’ habits and attitudes on sanitation, sex, marriage, gender, religion, slavery, war, bosses, inequality, nature, conformity, and family obligations. And you’d also shudder to hear of many habits and attitudes of your even more ancient forager ancestors. Yes, you admit that lacking your wealth your ancestors couldn’t copy some of your habits. Even so, you tend to think that humanity has learned that your ways are better. That is, you believe in social and moral progress.

The problem is, the future will probably hold new kinds of people. Your descendants’ habits and attitudes are likely to differ from yours by as much as yours differ from your ancestors. If you understood just how different your ancestors were, you’d realize that you should expect your descendants to seem quite strange. Historical fiction misleads you, showing your ancestors as more modern than they were. Science fiction similarly misleads you about your descendants.

As an example of the kind of past difference that Robin is discussing, even in the fairly recent past, consider this account by William Ewald of a trial from the sixteenth century:

In 1522 some rats were placed on trial before the ecclesiastical court in Autun. They were charged with a felony: specifically, the crime of having eaten and wantonly destroyed some barley crops in the jurisdiction. A formal complaint against “some rats of the diocese” was presented to the bishop’s vicar, who thereupon cited the culprits to appear on a day certain, and who appointed a local jurist, Barthelemy Chassenée (whose name is sometimes spelled Chassanée, or Chasseneux, or Chasseneuz), to defend them. Chassenée, then forty-two, was known for his learning, but not yet famous; the trial of the rats of Autun was to establish his reputation, and launch a distinguished career in the law.

When his clients failed to appear in court, Chassenée resorted to procedural arguments. His first tactic was to invoke the notion of fair process, and specifically to challenge the original writ for having failed to give the rats due notice. The defendants, he pointed out, were dispersed over a large tract of countryside, and lived in many villages; a single summons was inadequate to notify them all. Moreover, the summons was addressed only to some of the rats of the diocese; but technically it should have been addressed to them all.

Chassenée was successful in his argument, and the court ordered a second summons to be read from the pulpit of every local parish church; this second summons now correctly addressed all the local rats, without exception.

But on the appointed day the rats again failed to appear. Chassenée now made a second argument. His clients, he reminded the court, were widely dispersed; they needed to make preparations for a great migration, and those preparations would take time. The court once again conceded the reasonableness of the argument, and granted a further delay in the proceedings. When the rats a third time failed to appear, Chassenée was ready with a third argument. The first two arguments had relied on the idea of procedural fairness; the third treated the rats as a class of persons who were entitled to equal treatment under the law. He addressed the court at length, and successfully demonstrated that, if a person is cited to appear at a place to which he cannot come in safety, he may lawfully refuse to obey the writ. And a journey to court would entail serious perils for his clients. They were notoriously unpopular in the region; and furthermore they were rightly afraid of their natural enemies, the cats. Moreover (he pointed out to the court) the cats could hardly be regarded as neutral in this dispute; for they belonged to the plaintiffs. He accordingly demanded that the plaintiffs be enjoined by the court, under the threat of severe penalties, to restrain their cats, and prevent them from frightening his clients. The court again found this argument compelling; but now the plaintiffs seem to have come to the end of their patience. They demurred to the motion; the court, unable to settle on the correct period within which the rats must appear, adjourned on the question sine die, and judgment for the rats was granted by default.

Most of us would assume at once that this is all nothing but an elaborate joke; but Ewald strongly argues that it was all quite serious. This would actually be worthy of its own post, but I will leave it aside for now. In any case it illustrates the existence of extremely different attitudes even a few centuries ago.

In any event, Robin continues:

New habits and attitudes result less than you think from moral progress, and more from people adapting to new situations. So many of your descendants’ strange habits and attitudes are likely to violate your concepts of moral progress; what they do may often seem wrong. Also, you likely won’t be able to easily categorize many future ways as either good or evil; they will instead just seem weird. After all, your world hardly fits the morality tales your distant ancestors told; to them you’d just seem weird. Complex realities frustrate simple summaries, and don’t fit simple morality tales.

Many people of a more conservative temperament, such as myself, might wish to swap out “moral progress” here with “moral regress,” but the point stands in any case. This is related to our discussions of the effects of technology and truth on culture, and of the idea of irreversible changes.

Robin finally gets to the point of his book:

This book presents a concrete and plausible yet troubling view of a future full of strange behaviors and attitudes. You may have seen concrete troubling future scenarios before in science fiction. But few of those scenarios are in fact plausible; their details usually make little sense to those with expert understanding. They were designed for entertainment, not realism.

Perhaps you were told that fictional scenarios are the best we can do. If so, I aim to show that you were told wrong. My method is simple. I will start with a particular very disruptive technology often foreseen in futurism and science fiction: brain emulations, in which brains are recorded, copied, and used to make artificial “robot” minds. I will then use standard theories from many physical, human, and social sciences to describe in detail what a world with that future technology would look like.

I may be wrong about some consequences of brain emulations, and I may misapply some science. Even so, the view I offer will still show just how troublingly strange the future can be.

I greatly enjoyed Robin’s book, but I have to admit that, unfortunately, relatively few people in general will. It is easy enough to see the reason for this from Robin’s introduction. Who would expect to be interested? Possibly those who enjoy the “futurism and science fiction” concerning brain emulations; but if Robin does what he set out to do, those persons will find themselves strangely uninterested. As he says, science fiction is “designed for entertainment, not realism,” while he is attempting to answer the question, “What would this actually be like?” This intention is very remote from that of science fiction, and consequently his book will likely appeal to different people.

Whether or not Robin gets the answer to this question right, he definitely succeeds in making his approach and appeal differ from those of science fiction.

One might illustrate this with almost any random passage from the book. Here are portions of his discussion of the climate of em cities:

As we will discuss in Chapter 18, Cities section, em cities are likely to be big, dense, highly cost-effective concentrations of computer and communication hardware. How might such cities interact with their surroundings?

Today, computer and communication hardware is known for being especially temperamental about its environment. Rooms and buildings designed to house such hardware tend to be climate-controlled to ensure stable and low values of temperature, humidity, vibration, dust, and electromagnetic field intensity. Such equipment housing protects it especially well from fire, flood, and security breaches.

The simple assumption is that, compared with our cities today, em cities will also be more climate-controlled to ensure stable and low values of temperature, humidity, vibrations, dust, and electromagnetic signals. These controls may in fact become city level utilities. Large sections of cities, and perhaps entire cities, may be covered, perhaps even domed, to control humidity, dust, and vibration, with city utilities working to absorb remaining pollutants. Emissions within cities may also be strictly controlled.

However, an em city may contain temperatures, pressures, vibrations, and chemical concentrations that are toxic to ordinary humans. If so, ordinary humans are excluded from most places in em cities for safety reasons. In addition, we will see in Chapter 18, Transport section, that many em city transport facilities are unlikely to be well matched to the needs of ordinary humans.

Cities today are the roughest known kind of terrain, in the sense that cities slow down the wind the most compared with other terrain types. Cities also tend to be hotter than neighboring areas. For example, Las Vegas is 7° Fahrenheit hotter in the summer than are surrounding areas. This hotter city effect makes ozone pollution worse and this effect is stronger for bigger cities, in the summer, at night, with fewer clouds, and with slower wind (Arnfield 2003).

This is a mild reason to expect em cities to be hotter than other areas, especially at night and in the summer. However, as em cities are packed full of computing hardware, we shall now see that em cities will actually be much hotter.

While the book considers a wide variety of topics, e.g. the social relationships among ems, which look quite different from the above passage, the general mode of treatment is the same. As Robin put it, he uses “standard theories” to describe the em world, much as he employs standard theories about cities, about temperature and climate, and about computing hardware in the above passage.

One might object that basically Robin is positing a particular technological change (brain emulations), but then assuming that everything else is the same, and working from there. And there is some validity to this objection. But in the end there is actually no better way to try to predict the future; despite David Hume’s opinion, generally the best way to estimate the future is to say, “Things will be pretty much the same.”

At the end of the book, Robin describes various criticisms. First are those who simply said they weren’t interested: “If we include those who declined to read my draft, the most common complaint is probably ‘who cares?'” And indeed, that is what I would expect, since as Robin remarked himself, people are interested in an entertaining account of the future, not an attempt at a detailed description of what is likely.

Others, he says, “doubt that one can ever estimate the social consequences of technologies decades in advance.” This is basically the objection I mentioned above.

He lists one objection that I am partly in agreement with:

Many doubt that brain emulations will be our next huge technology change, and aren’t interested in analyses of the consequences of any big change except the one they personally consider most likely or interesting. Many of these people expect traditional artificial intelligence, that is, hand-coded software, to achieve broad human level abilities before brain emulations appear. I think that past rates of progress in coding smart software suggest that at previous rates it will take two to four centuries to achieve broad human level abilities via this route. These critics often point to exciting recent developments, such as advances in “deep learning,” that they think make prior trends irrelevant.

I don’t think Robin is necessarily mistaken in regard to his expectations about “traditional artificial intelligence,” although he may be, and I don’t find myself uninterested by default in things that I don’t think the most likely. But I do think that traditional artificial intelligence is more likely than his scenario of brain emulations; more on this below.

There are two other likely objections that Robin does not include in this list, although he does touch on them elsewhere. First, people are likely to say that the creation of ems would be immoral, even if it is possible, and similarly that the kinds of habits and lives that he describes would themselves be immoral. On the one hand, this should not be a criticism at all, since Robin can respond that he is simply describing what he thinks is likely, not saying whether it should happen or not; on the other hand, it is in fact obvious that Robin does not have much disapproval, if any, of his scenario. The book ends in fact by calling attention to this objection:

The analysis in this book suggests that lives in the next great era may be as different from our lives as our lives are from farmers’ lives, or farmers’ lives are from foragers’ lives. Many readers of this book, living industrial era lives and sharing industrial era values, may be disturbed to see a forecast of em era descendants with choices and life styles that appear to reject many of the values that they hold dear. Such readers may be tempted to fight to prevent the em future, perhaps preferring a continuation of the industrial era. Such readers may be correct that rejecting the em future holds them true to their core values.

But I advise such readers to first try hard to see this new era in some detail from the point of view of its typical residents. See what they enjoy and what fills them with pride, and listen to their criticisms of your era and values. This book has been designed in part to assist you in such a soul-searching examination. If after reading this book, you still feel compelled to disown your em descendants, I cannot say you are wrong. My job, first and foremost, has been to help you see your descendants clearly, warts and all.

Our own discussions of the flexibility of human morality are relevant. The creatures Robin is describing are in many ways quite different from humans, and it is in fact very appropriate for their morality to differ from human morality.

A second likely objection is that Robin’s ems are simply impossible, on account of the nature of the human mind. I think that this objection is mistaken, but I will leave the details of this explanation for another time. Robin appears to agree with Sean Carroll about the nature of the mind, as can be seen for example in this post. Robin is mistaken about this, for the reasons suggested in my discussion of Carroll’s position. Part of the problem is that Robin does not seem to understand the alternative. Here is a passage from the linked post on Overcoming Bias:

Now what I’ve said so far is usually accepted as uncontroversial, at least when applied to the usual parts of our world, such as rivers, cars, mountains, laptops, or ants. But as soon as one claims that all this applies to human minds, suddenly it gets more controversial. People often state things like this:

“I am sure that I’m not just a collection of physical parts interacting, because I’m aware that I feel. I know that physical parts interacting just aren’t the kinds of things that can feel by themselves. So even though I have a physical body made of parts, and there are close correlations between my feelings and the states of my body parts, there must be something more than that to me (and others like me). So there’s a deep mystery: what is this extra stuff, where does it arise, how does it change, and so on. We humans care mainly about feelings, not physical parts interacting; we want to know what out there feels so we can know what to care about.”

But consider a key question: Does this other feeling stuff interact with the familiar parts of our world strongly and reliably enough to usually be the actual cause of humans making statements of feeling like this?

If yes, this is a remarkably strong interaction, making it quite surprising that physicists have missed it so far. So surprising in fact as to be frankly unbelievable. If this type of interaction were remotely as simple as all the interactions we know, then it should be quite measurable with existing equipment. Any interaction not so measurable would have to be vastly more complex and context dependent than any we’ve ever seen or considered. Thus I’d bet heavily and confidently that no one will measure such an interaction.

But if no, if this interaction isn’t strong enough to explain human claims of feeling, then we have a remarkable coincidence to explain. Somehow this extra feeling stuff exists, and humans also have a tendency to say that it exists, but these happen for entirely independent reasons. The fact that feeling stuff exists isn’t causing people to claim it exists, nor vice versa. Instead humans have some sort of weird psychological quirk that causes them to make such statements, and they would make such claims even if feeling stuff didn’t exist. But if we have a good alternate explanation for why people tend to make such statements, what need do we have of the hypothesis that feeling stuff actually exists? Such a coincidence seems too remarkable to be believed.

There is a false dichotomy here, and it is the same one that C.S. Lewis falls into when he says, “Either we can know nothing or thought has reasons only, and no causes.” And in general it is like the error of the pre-Socratics, that if a thing has some principles which seem sufficient, it can have no other principles, failing to see that there are several kinds of cause, and each can be complete in its own way. And perhaps I am getting ahead of myself here, since I said this discussion would be for later, but the objection that Robin’s scenario is impossible is mistaken in exactly the same way, and for the same reason: people believe that if a “materialistic” explanation could be given of human behavior in the way that Robin describes, then people do not truly reason, make choices, and so on. But this is simply to adopt the other side of the false dichotomy, much like C.S. Lewis rejects the possibility of causes for our beliefs.

One final point. I mentioned above that I see Robin’s scenario as less plausible than traditional artificial intelligence. I agree with Tyler Cowen in this post. This present post is already long enough, so again I will leave a detailed explanation for another time, but I will remark that Robin and I have a bet on the question.

Eliezer Yudkowsky on AlphaGo

On his Facebook page, during the Go match between AlphaGo and Lee Sedol, Eliezer Yudkowsky writes:

At this point it seems likely that Sedol is actually far outclassed by a superhuman player. The suspicion is that since AlphaGo plays purely for *probability of long-term victory* rather than playing for points, the fight against Sedol generates boards that can falsely appear to a human to be balanced even as Sedol’s probability of victory diminishes. The 8p and 9p pros who analyzed games 1 and 2 and thought the flow of a seemingly Sedol-favoring game ‘eventually’ shifted to AlphaGo later, may simply have failed to read the board’s true state. The reality may be a slow, steady diminishment of Sedol’s win probability as the game goes on and Sedol makes subtly imperfect moves that *humans* think result in even-looking boards. (E.g., the analysis in https://gogameguru.com/alphago-shows-true-strength-3rd-vic…/ )

For all we know from what we’ve seen, AlphaGo could win even if Sedol were allowed a one-stone handicap. But AlphaGo’s strength isn’t visible to us – because human pros don’t understand the meaning of AlphaGo’s moves; and because AlphaGo doesn’t care how many points it wins by, it just wants to be utterly certain of winning by at least 0.5 points.

IF that’s what was happening in those 3 games – and we’ll know for sure in a few years, when there’s multiple superhuman machine Go players to analyze the play – then the case of AlphaGo is a helpful concrete illustration of these concepts:

He proceeds to suggest that AlphaGo’s victories confirm his various philosophical positions concerning the nature and consequences of AI. Among other things, he says,

Since Deepmind picked a particular challenge time in advance, rather than challenging at a point where their AI seemed just barely good enough, it was improbable that they’d make *exactly* enough progress to give Sedol a nearly even fight.

AI is either overwhelmingly stupider or overwhelmingly smarter than you. The more other AI progress and the greater the hardware overhang, the less time you spend in the narrow space between these regions. There was a time when AIs were roughly as good as the best human Go-players, and it was a week in late January.

In other words, according to his account, it was basically certain that AlphaGo would either be much better than Lee Sedol, or much worse than him. After Eliezer’s post, of course, AlphaGo lost the fourth game.

Eliezer responded on his Facebook page:

That doesn’t mean AlphaGo is only slightly above Lee Sedol, though. It probably means it’s “superhuman with bugs”.

We might ask what “superhuman with bugs” is supposed to mean. Deepmind explains their program:

We train the neural networks using a pipeline consisting of several stages of machine learning (Figure 1). We begin by training a supervised learning (SL) policy network, pσ, directly from expert human moves. This provides fast, efficient learning updates with immediate feedback and high quality gradients. Similar to prior work, we also train a fast policy pπ that can rapidly sample actions during rollouts. Next, we train a reinforcement learning (RL) policy network, pρ, that improves the SL policy network by optimising the final outcome of games of self-play. This adjusts the policy towards the correct goal of winning games, rather than maximizing predictive accuracy. Finally, we train a value network vθ that predicts the winner of games played by the RL policy network against itself. Our program AlphaGo efficiently combines the policy and value networks with MCTS.

In essence, like all such programs, AlphaGo is approximating a function. Deepmind describes the function being approximated: “All games of perfect information have an optimal value function, v*(s), which determines the outcome of the game, from every board position or state s, under perfect play by all players.”
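For a game small enough, this optimal value function can actually be computed exactly rather than approximated. Here is a minimal sketch in Python, using a toy subtraction game (players alternately remove 1 or 2 stones; whoever takes the last stone wins) as a hypothetical stand-in for Go, where such exhaustive computation is infeasible:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def v_star(stones: int) -> int:
    """Optimal value v*(s) for the player to move:
    +1 = win, -1 = loss, under perfect play by both players."""
    if stones == 0:
        return -1  # the previous player took the last stone; the player to move has lost
    # A position is winning if some legal move leads to a losing position for the opponent.
    return max(-v_star(stones - take) for take in (1, 2) if take <= stones)

# Positions with a multiple of 3 stones are losing for the player to move.
print([v_star(n) for n in range(1, 7)])  # → [1, 1, -1, 1, 1, -1]
```

The point of the analogy: AlphaGo's value network is a learned approximation of the Go analogue of `v_star`, because the game tree of Go is far too large to enumerate this way.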

What would a “bug” in a program like this be? It would not be a bug simply because the program does not play perfectly, since no program will play perfectly. One could only reasonably describe the program as having bugs if it does not actually play the move recommended by its approximation.

And it is easy to see that it is quite unlikely that this is the case for AlphaGo. All programs have bugs, surely including AlphaGo. So there might be bugs that would crash the program under certain circumstances, or bugs that cause it to move more slowly than it should, or the like. But that it would randomly perform moves that are not recommended by its approximation function is quite unlikely. If there were such a bug, it would likely apply all the time, and thus the program would play consistently worse. And so it would not be “superhuman” at all.
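The distinction drawn here can be sketched in a few lines of Python; all names below are hypothetical stand-ins, not AlphaGo's actual interfaces. A program of this kind simply plays the move its (imperfect) approximation ranks highest, and a “bug” in the relevant sense would be playing something other than that argmax, not merely having an imperfect estimate:

```python
def choose_move(position, legal_moves, value_estimate):
    # Correct behavior: always play the move the approximation recommends.
    return max(legal_moves, key=lambda move: value_estimate(position, move))

# Even a deliberately crude estimate yields a "bug-free" player in this
# sense, so long as its recommendation is actually followed.
crude_estimate = lambda position, move: -abs(move - 3)
print(choose_move(None, [1, 2, 3, 4], crude_estimate))  # → 3
```

On this definition, a program can be arbitrarily weak without having any bugs at all, which is why imperfect play by AlphaGo is no evidence of a bug.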

In fact, Deepmind has explained how AlphaGo lost the fourth game:

To everyone’s surprise, including ours, AlphaGo won four of the five games. Commentators noted that AlphaGo played many unprecedented, creative, and even “beautiful” moves. Based on our data, AlphaGo’s bold move 37 in Game 2 had a 1 in 10,000 chance of being played by a human. Lee countered with innovative moves of his own, such as his move 78 against AlphaGo in Game 4—again, a 1 in 10,000 chance of being played—which ultimately resulted in a win.

In other words, the computer lost because it did not expect Lee Sedol’s move, and thus did not sufficiently consider the situation that would follow. AlphaGo proceeded to play a number of fairly bad moves in the remainder of the game. This does not require any special explanation implying that it was not following the recommendations of its usual strategy. As David Wu comments on Eliezer’s page:

The “weird” play of MCTS bots when ahead or behind is not special to AlphaGo, and indeed appears to have little to do with instrumental efficiency or such. The observed weirdness is shared by all MCTS Go bots and has been well-known ever since they first came on to the scene back in 2007.

In particular, Eliezer may not understand the meaning of the statement that AlphaGo plays to maximize its probability of victory. This does not mean maximizing an overall rational estimate of its chances of winning, given all of the circumstances, the board position, and its opponent. The program has no such estimate, and if it did, it would not change much from move to move. For example, with this kind of estimate, if Lee Sedol played a move apparently worse than expected, then rather than changing this estimate much, the program would change its estimate of the probability that the move was a good one, and the probability of victory would remain relatively constant. Of course the estimate would change slowly as the game went on, but it would be unlikely to change much after any individual move.

The actual “probability of victory” that the machine estimates is somewhat different. It is a learned estimate based on playing itself. This can change somewhat more easily, and is independent of the fact that it is playing a particular opponent; it is based on the board position alone. In its self-training, it may have rarely won starting from an apparently losing position, and this may have happened mainly by “luck,” not by good play. If this is the case, it is reasonable that its moves would be worse in a losing position than in a winning position, without any need to say that there are bugs in the algorithm. Psychologically, one might compare this to the case of a man in love with a woman who continues to attempt to maximize his chances of marrying her, after she has already indicated her unwillingness: he may engage in very bad behavior indeed.
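The kind of position-based win-rate estimate described here can be illustrated with a minimal Monte Carlo rollout sketch, again using the toy subtraction game (remove 1 or 2 stones; taking the last stone wins) as a hypothetical stand-in. Note that the estimate depends only on the board position, with no model of the particular opponent:

```python
import random

def rollout_win_rate(stones: int, n_rollouts: int = 20000, seed: int = 0) -> float:
    """Estimate the win probability for the player to move by playing
    many games of random self-play from the given position."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_rollouts):
        s, to_move = stones, 0  # player 0 is "us"
        winner = None
        while s > 0:
            s -= rng.choice([t for t in (1, 2) if t <= s])
            if s == 0:
                winner = to_move  # the player who just moved took the last stone
            to_move = 1 - to_move
        wins += (winner == 0)
    return wins / n_rollouts

# Under random self-play, even a theoretically lost position (a multiple
# of 3 stones) shows a middling win rate: the rollouts can "get lucky."
print(rollout_win_rate(6))
```

This is the sense in which an estimate learned from self-play can assign nontrivial winning chances to positions that are lost against a strong opponent, and why play from such positions can look erratic without any bug being involved.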

Eliezer’s claim that AlphaGo is “superhuman with bugs” is simply a normal human attempt to rationalize evidence against his position. The truth is that, contrary to his expectations, AlphaGo is indeed in the same playing range as Lee Sedol, although apparently somewhat better. But not a lot better, and not superhuman. Eliezer in fact seems to have realized this after thinking about it for a while, and says:

It does seem that what we might call the Kasparov Window (the AI is mostly superhuman but has systematic flaws a human can learn and exploit) is wide enough that AlphaGo landed inside it as well. The timescale still looks compressed compared to computer chess, but not as much as I thought. I did update on the width of the Kasparov window and am now accordingly more nervous about similar phenomena in ‘weakly’ superhuman, non-self-improving AGIs trying to do large-scale things.

As I said here, people change their minds more often than they say that they do. They frequently describe the change as having more agreement with their previous position than it actually has. Yudkowsky is doing this here, by talking about AlphaGo as “mostly superhuman” but saying it “has systematic flaws.” This is just a roundabout way of admitting that AlphaGo is better than Lee Sedol, but not by much, the original possibility that he thought extremely unlikely.

The moral here is clear. Don’t assume that the facts will confirm your philosophical theories before this actually happens, because it may not happen at all.