Eliezer Yudkowsky on AlphaGo

On his Facebook page, during the Go match between AlphaGo and Lee Sedol, Eliezer Yudkowsky writes:

At this point it seems likely that Sedol is actually far outclassed by a superhuman player. The suspicion is that since AlphaGo plays purely for *probability of long-term victory* rather than playing for points, the fight against Sedol generates boards that can falsely appear to a human to be balanced even as Sedol’s probability of victory diminishes. The 8p and 9p pros who analyzed games 1 and 2 and thought the flow of a seemingly Sedol-favoring game ‘eventually’ shifted to AlphaGo later, may simply have failed to read the board’s true state. The reality may be a slow, steady diminishment of Sedol’s win probability as the game goes on and Sedol makes subtly imperfect moves that *humans* think result in even-looking boards. (E.g., the analysis in https://gogameguru.com/alphago-shows-true-strength-3rd-vic…/ )

For all we know from what we’ve seen, AlphaGo could win even if Sedol were allowed a one-stone handicap. But AlphaGo’s strength isn’t visible to us – because human pros don’t understand the meaning of AlphaGo’s moves; and because AlphaGo doesn’t care how many points it wins by, it just wants to be utterly certain of winning by at least 0.5 points.

IF that’s what was happening in those 3 games – and we’ll know for sure in a few years, when there’s multiple superhuman machine Go players to analyze the play – then the case of AlphaGo is a helpful concrete illustration of these concepts:

He proceeds to suggest that AlphaGo’s victories confirm his various philosophical positions concerning the nature and consequences of AI. Among other things, he says:

Since Deepmind picked a particular challenge time in advance, rather than challenging at a point where their AI seemed just barely good enough, it was improbable that they’d make *exactly* enough progress to give Sedol a nearly even fight.

AI is either overwhelmingly stupider or overwhelmingly smarter than you. The more other AI progress and the greater the hardware overhang, the less time you spend in the narrow space between these regions. There was a time when AIs were roughly as good as the best human Go-players, and it was a week in late January.

In other words, according to his account, it was basically certain that AlphaGo would either be much better than Lee Sedol, or much worse than him. After Eliezer’s post, of course, AlphaGo lost the fourth game.

Eliezer responded on his Facebook page:

That doesn’t mean AlphaGo is only slightly above Lee Sedol, though. It probably means it’s “superhuman with bugs”.

We might ask what “superhuman with bugs” is supposed to mean. Deepmind explains their program:

We train the neural networks using a pipeline consisting of several stages of machine learning (Figure 1). We begin by training a supervised learning (SL) policy network, pσ, directly from expert human moves. This provides fast, efficient learning updates with immediate feedback and high quality gradients. Similar to prior work, we also train a fast policy pπ that can rapidly sample actions during rollouts. Next, we train a reinforcement learning (RL) policy network, pρ, that improves the SL policy network by optimising the final outcome of games of self-play. This adjusts the policy towards the correct goal of winning games, rather than maximizing predictive accuracy. Finally, we train a value network vθ that predicts the winner of games played by the RL policy network against itself. Our program AlphaGo efficiently combines the policy and value networks with MCTS.
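The quoted pipeline may be easier to follow as code. Below is a minimal runnable sketch, with trivial placeholders standing in for the actual networks; only the order and purpose of the four stages is taken from the paper, and every function name here is invented, not DeepMind’s API.

```python
import random

def train_sl_policy(expert_games):
    """p_sigma: supervised policy trained to predict expert human moves."""
    return lambda state, moves: random.choice(moves)  # placeholder policy

def train_fast_policy(expert_games):
    """p_pi: smaller, faster policy used to sample moves during rollouts."""
    return lambda state, moves: random.choice(moves)  # placeholder policy

def train_rl_policy(sl_policy):
    """p_rho: starts from p_sigma and is improved by self-play, optimising
    the final outcome (winning games) rather than move-prediction accuracy."""
    return sl_policy  # placeholder: no actual self-play improvement here

def train_value_net(rl_policy):
    """v_theta: trained to predict the winner of p_rho's self-play games,
    i.e. to map a board position to an estimated probability of winning."""
    return lambda state: 0.5  # placeholder: always "even"

expert_games = []  # stands in for the dataset of expert human games
p_sigma = train_sl_policy(expert_games)
p_pi = train_fast_policy(expert_games)
p_rho = train_rl_policy(p_sigma)
v_theta = train_value_net(p_rho)
# At play time, MCTS combines the policy networks (to propose and roll out
# moves) with v_theta (to evaluate positions) in order to pick each move.
```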

In essence, like all such programs, AlphaGo is approximating a function. Deepmind describes the function being approximated: “All games of perfect information have an optimal value function, v*(s), which determines the outcome of the game, from every board position or state s, under perfect play by all players.”
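In the paper’s notation, this amounts to the following (a restatement of the quoted definition, nothing new):

```latex
% v*(s): the outcome of the game from state s under perfect play by both
% players. The value network v_theta is trained to approximate it:
\[
  v_\theta(s) \;\approx\; v^*(s)
\]
```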

What would a “bug” in a program like this be? The program is not buggy simply because it fails to play perfectly, since no program will play perfectly. One could only reasonably describe the program as having bugs if it does not actually play the moves recommended by its approximation.

And it is easy to see that this is quite unlikely in AlphaGo’s case. All programs have bugs, surely including AlphaGo: there might be bugs that would crash the program under certain circumstances, or bugs that cause it to move more slowly than it should, or the like. But a bug that made it randomly play moves not recommended by its approximation function is quite unlikely. Such a bug would likely apply all the time, and the program would therefore play consistently worse; it would not be “superhuman” at all.
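Concretely, “playing the move recommended by its approximation” is just an argmax over the legal moves; a bug in the relevant sense would mean the program returning anything other than this argmax. A minimal sketch, with invented stand-in functions rather than real Go logic:

```python
def choose_move(state, legal_moves, apply_move, estimate_win_prob):
    """Play the move whose resulting position the approximation rates best.
    A 'bug' in the relevant sense would mean returning anything else."""
    return max(legal_moves, key=lambda m: estimate_win_prob(apply_move(state, m)))

# Toy usage with invented stand-ins:
state = ()
legal_moves = ["a", "b", "c"]
apply_move = lambda s, m: s + (m,)
estimate_win_prob = lambda s: {"a": 0.4, "b": 0.9, "c": 0.6}[s[-1]]
assert choose_move(state, legal_moves, apply_move, estimate_win_prob) == "b"
```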

In fact, Deepmind has explained how AlphaGo lost the fourth game:

To everyone’s surprise, including ours, AlphaGo won four of the five games. Commentators noted that AlphaGo played many unprecedented, creative, and even “beautiful” moves. Based on our data, AlphaGo’s bold move 37 in Game 2 had a 1 in 10,000 chance of being played by a human. Lee countered with innovative moves of his own, such as his move 78 against AlphaGo in Game 4—again, a 1 in 10,000 chance of being played—which ultimately resulted in a win.

In other words, the computer lost because it did not expect Lee Sedol’s move, and thus did not sufficiently consider the situation that would follow. AlphaGo proceeded to play a number of fairly bad moves in the remainder of the game. This does not require any special explanation implying that it was not following the recommendations of its usual strategy. As David Wu comments on Eliezer’s page:

The “weird” play of MCTS bots when ahead or behind is not special to AlphaGo, and indeed appears to have little to do with instrumental efficiency or such. The observed weirdness is shared by all MCTS Go bots and has been well-known ever since they first came on to the scene back in 2007.
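David Wu’s point can be illustrated with a toy computation (invented numbers, not AlphaGo’s code): a search that ranks moves by sampled win rate loses its ability to discriminate once every move is nearly hopeless, because the true differences between moves fall below the sampling noise of the rollouts.

```python
import random

def sampled_win_rate(true_p, n_rollouts=100):
    """Monte Carlo estimate of a move's win probability from n rollouts."""
    return sum(random.random() < true_p for _ in range(n_rollouts)) / n_rollouts

# Balanced position: the best move is clearly separated from the others.
balanced = {"good": 0.55, "ok": 0.45, "bad": 0.35}
# Lost position: every move is nearly hopeless; differences drown in noise.
lost = {"good": 0.04, "ok": 0.03, "bad": 0.02}

for name, moves in [("balanced", balanced), ("lost", lost)]:
    estimates = {m: sampled_win_rate(p) for m, p in moves.items()}
    best = max(estimates, key=estimates.get)
    print(name, estimates, "->", best)
# In the lost position, the argmax is often not the genuinely best move:
# a 0.01 gap in true strength is invisible at this sample size.
```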

In particular, Eliezer may not understand the meaning of the statement that AlphaGo plays to maximize its probability of victory. This does not mean maximizing an overall rational estimate of its chances of winning, given all of the circumstances, the board position, and its opponent. The program does not have such an estimate, and if it did, it would not change much from move to move. For example, with this kind of estimate, if Lee Sedol played a move apparently worse than expected, the program would mainly revise its estimate of the probability that the move was in fact a good one, rather than its estimate of victory, and the probability of victory would remain relatively constant. Of course it would change slowly as the game went on, but it would be unlikely to change much after any individual move.
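A toy Bayesian calculation, with invented numbers, makes this concrete: against a top professional, an apparently bad move is far more likely to be a good move one has misjudged than a genuine blunder, so the overall win estimate barely moves.

```python
# Prior estimate of winning before the surprising move: about 0.45.
p_blunder = 0.05          # a top professional rarely simply blunders
p_win_if_blunder = 0.90   # if it really was a blunder, we probably win
p_win_if_good = 0.45      # if the move was actually good, still roughly even

p_win = p_blunder * p_win_if_blunder + (1 - p_blunder) * p_win_if_good
print(p_win)  # 0.4725 -- almost unchanged from the prior estimate
```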

The actual “probability of victory” that the machine estimates is somewhat different. It is a learned estimate based on playing against itself. This can change somewhat more easily, and is independent of the fact that it is playing a particular opponent; it is based on the board position alone. In its self-play training, it may have rarely won starting from an apparently losing position, and the wins it did get may have come mainly by “luck,” not by good play. If this is the case, it is reasonable that its moves would be worse in a losing position than in a winning position, without any need to say that there are bugs in the algorithm. Psychologically, one might compare this to the case of a man in love with a woman who continues to attempt to maximize his chances of marrying her after she has already indicated her unwillingness: he may engage in very bad behavior indeed.
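The contrast can be put in a few lines of hypothetical code (invented positions and values, for illustration only): the learned estimate is a function of the board position alone, with no parameter for the opponent, so a single unexpected move can swing it sharply if the resulting position resembles positions that were lost in self-play.

```python
# Invented positions and values, for illustration only.
self_play_value = {
    "balanced_position": 0.50,
    "position_after_unexpected_move": 0.15,  # resembles self-play losses
}

def v_theta(board):
    """Stand-in for the learned value network: a function of the board
    alone; there is no parameter for the opponent's identity or strength."""
    return self_play_value[board]

print(v_theta("balanced_position"))               # 0.5
print(v_theta("position_after_unexpected_move"))  # 0.15
```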

Eliezer’s claim that AlphaGo is “superhuman with bugs” is simply a normal human attempt to rationalize evidence against his position. The truth is that, contrary to his expectations, AlphaGo is indeed in the same playing range as Lee Sedol, although apparently somewhat better. But not a lot better, and not superhuman. Eliezer in fact seems to have realized this after thinking about it for a while, and says:

It does seem that what we might call the Kasparov Window (the AI is mostly superhuman but has systematic flaws a human can learn and exploit) is wide enough that AlphaGo landed inside it as well. The timescale still looks compressed compared to computer chess, but not as much as I thought. I did update on the width of the Kasparov window and am now accordingly more nervous about similar phenomena in ‘weakly’ superhuman, non-self-improving AGIs trying to do large-scale things.

As I said here, people change their minds more often than they say they do. They frequently describe the change as agreeing more with their previous position than it actually does. Yudkowsky is doing this here, by talking about AlphaGo as “mostly superhuman” but saying it “has systematic flaws.” This is just a roundabout way of admitting that AlphaGo is better than Lee Sedol, but not by much, which is the very possibility he thought extremely unlikely.

The moral here is clear. Don’t assume that the facts will confirm your philosophical theories before this actually happens, because it may not happen at all.
