AMA: We are David Silver and Julian Schrittwieser from DeepMind’s AlphaGo team. Ask us anything.

gwern · 2017-10-19T01:30:49+00:00

How/why is Zero's training so stable? This was the question everyone was asking when DM announced it'd be experimenting with pure self-play training - deep RL is notoriously unstable and prone to forgetting, self-play is notoriously unstable and prone to forgetting, the two together should be a disaster without a good (imitation-based) initialization & lots of historical checkpoints to play against. But Zero starts from zero and if I'm reading the supplements right, you don't use any historical checkpoints as opponents to prevent forgetting or loops. But the paper essentially doesn't discuss this at all or even mention it other than one line at the beginning about tree search. So how'd you guys do it?

Cassandra120 · 2017-10-17T19:16:37+00:00

Do you think that AlphaGo would be able to solve Igo Hatsuyôron's problem 120, the "most difficult problem ever", i. e. winning a given middle game position, or confirm an existing solution (e.g. http://igohatsuyoron120.de/2015/0039.htm)?

sml0820 · 2017-10-17T15:52:23+00:00

How much more difficult are you guys finding Starcraft II versus Go, and potentially what are the technical roadblocks you are struggling with most? When can we expect a formal update?

fischgurke · 2017-10-17T19:37:47+00:00

As developers on the computer Go mailing list have stated, it is not "hard" for them to implement the algorithms presented in your paper, however it is impossible for them to provide the same amount of training to their programs as you could to AlphaGo.

In computer chess, we have observed that developers copied algorithm parts (heuristics, etc.) from other programs, including for commercial purposes. Generally, it seems with new software based on DCNNs, the algorithm is not as important as the data resulting from training. The data, however, is much easier to copy than the algorithm.

Would you say that data is more important than the algorithm at all? Your new paper about AG0 implies otherwise. Nevertheless, do you think the fact that "AI" is "copy-pastable" will be an issue in the future? Do you think that as reinforcement learning and neural networks become more important, we will see attempts to protect trained networks in similar ways as other intellectual property (e.g., patents, copyright)?

RayquazaDD · 2017-10-18T20:23:17+00:00

Thanks for the the AMA. According to the new paper,

Is AlphaGo Zero still training now? Will we get another new self-play in the future if there is a breakthrough(ex: 70% win rate vs previous version)?
AlphaGo Zero played two hoshi(star points) against AlphaGo master whether Zero is black or white. However, we saw AlphaGo Zero had played komoku in the last period of its self-play. Is there any reason?
In the paper, you mentioned AlphaGo Zero won 89 games to 11 versus AlphaGo Master. Could you release all 100 games?

Uberdude85 · 2017-10-17T22:22:06+00:00

At a talk Demis Hassabis gave in Cambridge in March he said one of the future aims of the AlphaGo project was interpretability of the neural networks. So my question is have you made any progress in interpreting the neural networks of AlphaGo or are they still essentially mysterious black boxes? Is there any emergent structure that you can correlate with the human concepts we think about when we play the game, such as parsing the board into groups and then assigning them properties like strong or weak, alive or dead?

For example in this illustrative neural network trained to produce wikipedia articles sections of the network related to producing urls could be identified (see under "Visualizing the predictions and the “neuron” firings in the RNN"). So is there anything similar in AlphaGo's networks, such as this area of the network shows greater activity when it is attacking vs defending, or fighting a ko? Perhaps even more interesting would be if there were some emergent features which do not correlate with current human Go concepts, for example we humans think of groups or stones having positions on scales of a variety of properties such as weak/strong, amount of territory/influence, alive/dead, light/heavy, thick/thin, good/bad eyeshape etc but maybe AlphaGo could introduce a whole new dimension to how we think about the game.

tr1pzz · 2017-10-18T20:16:57+00:00

Two questions after reading the amazing AlphaGo Zero paper, wow, just wow!!

Q1: Could you explain why exactly the input dimensionality for AlphaGo's residual blocks is 19x19x17?

I don't really get why it would be useful to include 8 stacked binary feature plains per player to include the recent history of the game? (In my mind 2 (or even just 1?) would be enough..) (I'm not 100% familiar with all the rules of Go, so maybe I'm missing something here (I know move repetitions are prohibited etc..) but in any case 8 seems like a lot!)

Additionally, the presence of a final, full 19x19 binary feature plain C to simply indicate which player's move it is seems like a rather awkward construction since it's duplicating a single useful bit 361 times..

In summary I'm just surprised: the input dimensionality seems unnecessarily high... (I was expecting something more like 19x19x3 + 1 (a single 19x19 plane with 3 possible values: black, white or empty + 1 binary value indicating which player's turn it is))

Q2: Since the entire pipeline uses only self-play against the latest/best version of the model, do you guys think there is any risk in overfitting to the specific SGD-driven trajectory the model is taking through parameter space? It seems like the final model-gameplay is kind of dependent on the random initialisation weights and the actual encountered game states (as a result of stochastic action sampling).

This just reminded me of OpenAI's wrestling RL agents that learn to counter their immediate opponent resulting in a strategy that doesn't generalize as well as when it would be facing multiple, diverse opponents...

ThomasWAnthony · 2017-10-18T22:01:54+00:00

Super excited to see results of AlphaGo Zero. In our NIPS paper, Thinking Fast and Slow with Deep Learning and Tree Search, we propose a very similar idea. I'm particularly interested in learning more about behaviour in longer training runs than we achieved

As AlphaGo Zero trains, how does the relative performance of greedy play by the MCTS used to create learning targets, greedy play by the policy network, and greedy play of the value function change during training? Does the improvement over the networks achieved by the MCTS ever diminish?
In light of the success of this self-play method, will deepmind/blizzard be making it possible to use self-play games in the recent Starcraft 2 API (which was not available at launch)?

brkirby · 2017-10-18T13:39:50+00:00

Any plans to open source AlphaGo?

clumma · 2017-10-17T17:48:45+00:00

With strong chess engines we can now give players intrinsic ratings -- Elo ratings inferred from move-by-move analysis of their play. This lets us do neat things like compare players of past eras, and potentially offers a platform for the study of human cognition.

Could this be done with AlphaGo? I suppose it could be more complicated for go, since in chess there is no margin of victory to consider (there is material vs depth to mate, but only rarely are these two out of sync).

reddittimiscal · 2017-10-18T22:57:10+00:00

Why stop the training at 40 days? It's still climbing the performance ladder, no? What happened if you let it run for, say, 3 months?

JulianSchrittwieser · 2017-10-18T23:35:21+00:00

[deleted]

kamui7x · 2017-10-18T02:12:46+00:00

In 1846 Shusaku played a game against Gennan Inseki with the most famous move in go history of move #127 which has been named "the ear-reddening move." This move has been praised for how spectacular it was. Does Alphago agree this is the best path forward? If not, what sequence would Alphago play?

sfenders · 2017-10-18T01:04:23+00:00

Earlier in its development, I heard that AlphaGo was guided in specific directions in its training to address weaknesses that were detected in its play. Now that it has apparently advanced beyond human understanding, is it possible that it might need another such nudge to get it out of any local maximum it has found its way into? Is that something which has been, or will be attempted?

Paranaix · 2017-10-18T10:12:43+00:00

The 50 self-play games released after Wuzhen were a shock for the professional go community. Many moves look almost alien to a human player.

Is there any chance that you

Release another set of self-play games?
Include some variations which AG thinks plausible/probably, which might help us deepen our understanding of why AG chooses certain moves?

JulianSchrittwieser · 2017-10-19T17:00:06+00:00

Hi everyone, we are here to answer your questions :)

HeyApples · 2017-10-17T20:41:44+00:00

The small sample of AlphaGo vs. AlphaGo games published showed white winning a disproportionate amount of the time. Which led some to speculate that komi was too high.

With access to a larger dataset, have you been able to make any interesting conclusions about the basic Go ruleset? (ie: Black or white have an intrinsic advantage, komi should be higher or lower, etc.)

ExtraTricky · 2017-10-18T10:47:06+00:00

One of the things that stood out to me most in the Nature paper was the fact that two of the feature planes used explicit ladder searches. I've heard several commentators on AlphaGo be surprised by its awareness of ladders, but to me it feels like a go player thinking about a position when someone taps him on the shoulder and says "Hey, in this variation the ladder stops working." Much less impressive! In addition, the pure MCTS programs that predated AlphaGo were notoriously bad at reading ladders. Do you agree that using explicit ladder searches as feature planes feels like sidestepping the problem rather than solving it? Have you made any progress or attempts at progress on that front since your last publication?

I'm also interested in the ladder problem because it's in some sense a very simple form of the general semeai problem, where one side has only one liberty. When we look at other programs such as JueYi that are based on the Nature publication, we see many cases of games (maybe around 10% of games against top pros) where there is a very large semeai with many liberties on both sides and the program decides to ignore it, resulting in a catastrophically large dead group. When AlphaGo played online as Master, we didn't see any of that in 60 games. What does AlphaGo do differently from what was described in the Nature paper that allows it to play semeai much better?

When a sufficiently strong human player approaches these positions they are able to resolve it by counting the liberties on both sides, and determining the result by comparing the two counts. From my understanding of the nature paper, it seems that the liberty counts get encoded into the 8 feature planes, which are described as representing liberty counts 1, 2, 3, 4, 5, 6, 7, and 8 or more. It seems like this would work for small semeai, as the network could easily learn that if one group has the input for 7 liberties and the other has the input for 6 liberties then the group with 7 liberties will win the race. But for large semeai, say two groups with 10 liberties each, then when we compare playing there versus not playing there, the they both look like an "8+" vs "8+" race, which would probably be learned to be counted something like a seki, since there's no way to know which side wins just from that. So I was thinking that this could explain these programs' tendencies to disastrously play away from large semeai.

Does this thinking match the data that you've observed? If so, have you made any insights into techniques for machines to learn these "count and compare"-style approaches to problems in ways that would generalize to arbitrarily high counts?

seigenblues · 2017-10-18T20:47:27+00:00

Hi David & Julian, congratulations on the fantastic paper! 5 ML questions and a Go question:

How did you know to move to a 40-block architecture? I.e., was there something you were monitoring to suggest that the 20-block architecture was hitting a ceiling?
Why is it needed to do 1600 playouts/move even at the beginning, when the networks are mostly random noise? Wouldn't it make sense to play a lot of fast random games, and to search deeper as the network gets progressively better?
Why are the input features only 8 moves back? Why not fewer? (or more?)
Would a 'delta featurization' work, where you essentially have a one-hot for the most recent moves? (from brian lee)
Implementation detail: do you actually use an infinitesimal temperature (in the deterministic playouts), or just 'approximate' it by always picking the most visited move?
Any chance of getting more detailed analysis of joseki occurences in the corpus? :)

Congratulations again!

pjox · 2017-10-18T07:15:38+00:00

Considering that AlphaGo is now retired, when do you plan to open source it? This would have a huge impact on both the Go community and the current research in machine learning.

When are you planning to release the Go tool that Demis Hassabis announced at Wuzhen?

adum · 2017-10-17T21:53:14+00:00

As an AlphaGo superfan, watching all these matches was awesome. The biggest itch left unscratched is wondering how many handicap stones AlphaGo could give top pros. We know that AlphaGo can play handicap games since the papers talk about it. I understand that the political implications of giving H2 to Ke Jie were untenable. However, as the creators, you must be very curious yourselves. Have you done any internal tests, or is there anything else you can hint at? Thanks!

David_Silver · 2017-10-18T20:24:22+00:00

[deleted]

David_Silver · 2017-10-18T22:48:59+00:00

It seems that training by self-play entirely would have been the first thing you would try in this situation before trying to scrape together human game data. What was the reason that earlier versions of AlphaGo didn't train through self-play or if it was attempted, why didn't it work as well?

In general, I am curious about how development and progress works in this field. What would have been the bottleneck two years ago in designing a self-play trained AlphaGo compared to today? What "machine learning intuition" was gained from all the iterations that finally made a self-play system viable?

fischgurke · 2017-10-17T19:40:42+00:00

Can you give any news about an "AlphaGo tool" that you hinted at during the Ke Jie match? Will it be some kind of credit-based (for example, 1 per day) online interface where you can consult AlphaGo for its opinion on Go positions?

mosicr · 2017-10-18T10:32:19+00:00

To David Silver: in your video lectures you mentioned RL can be used for financial trading. Do you have any examples of real world use ? How would you deal with Black Swans ( previously unencountered situations ) ? Thanks

seigenblues · 2017-10-18T20:48:00+00:00

Ah, and one more -- the AGZ algorithm seems very applicable to other games -- have you run it on other games like Chess or Shogi?

empror · 2017-10-17T20:58:29+00:00

Can you tell us something about the first move in the game? Does AlphaGo sometimes play moves that we haven't seen it play in any of the games you published? Like 10-10 or 5-3 or even really strange moves? If not, is it just out of "habit", or does it have a strong belief that 3-3, 3-4 and 4-4 are superior?

semi_colon · 2017-10-17T19:05:48+00:00

Grettings from /r/baduk! I don't actually have a question, but I do want to thank your team for stimulating interest in Go in the West. I've been playing it for about ten years and it's nice being able to explain Go as, "Oh, it's that game that Google made that AI for last year" and people always know what I'm talking about.

KapitalC · 2017-10-17T17:18:54+00:00

Hello David Silver and Julian Schrittwieser and thank you for taking the time to talk with us about your work. A couple months ago I've seen David's course on deep learning on YouTube and I was hooked ever since!

And now for the question:

It seems that using or simulating long term memory for RL agents is a big hurdle. Looking towards the future, do you believe we are close to “solve” this with a new way of thinking? Or is it just a matter of creating extremely large networks, and waiting for the technology to get there?

P. S. I'm aspiring to be an AI engineer but interested to get there by showcasing independent projects and not through doing a master’s degree. Do I have a chance to work at a company such as DeepMind or is a master’s degree a must?

CitricBase · 2017-10-18T02:21:56+00:00

It was said that the version of AlphaGo that played Ke Jie needed only a tenth of the processing power of the one that played against Lee Sedol. What kind of optimizations did you do to accomplish that? Was it simply that AlphaGo was ten times stronger?

Borthralla · 2017-10-18T05:15:30+00:00

I'm a huge fan of AlphaGo!
My first question is about handicap games. Is AlphaGo's Neural Network applicable to handicap games, or is strictly trained for even games with standard 7.5 komi chinese rules?

Secondly, everyone is waiting with baited breath for the AlphaGo teaching software hinted at the end of Wuzhen. Although nothing is certain yet, who will be able to get the software? And also, what will be required to run the software? Does AlphaGo's Neural Network take up a lot of space?

Third, has AlphaGo been continuing to learn since the Wuzhen games? Are you going to continue training it? If so, do you think you'll ever release more Self Play games? Also, could it review some of the games played in the 60-game self-play series? Micheal Redmond and Chris Garlock are making a series on the self-play games and I'm sure they would find that sort of thing incredibly insightful.

Edit: with the reveal of AlphaGo 0, how strong is it from the version that played at Wuzhen? Wow!!

Thank you!!!!

Adjutor_de_Vernon · 2017-10-19T03:19:32+00:00

Have you thought of using generative adversarial network?

We all love AlphaGo but it has a tendency to slow down when ahead. This is annoying for go players because it hides its real strength and play suboptimal endgame. I know this is not a bug but a feature resulting from the fact that AlphaGo maximise his winning probability. What could be cool would be to create demon version of AlphaGo that maximise his expected winning margin. That demon would not slow down when ahead, not hide his strength, not play unreasonable move when loosing and always play optimal endgame. That demon could serve as a generative adversarial network to an angel version that maximise his probability of winning. As we know, we all improve by playing against different styles. This could make hellish matches between the angel and the demon. Of course the angel would win more games, but it would be like winning the Electoral College without winning the popular vote...

rlsing · 2017-10-17T18:07:56+00:00

Michael Redmond's reviews of AlphaGo's self-play have brought up some interesting points for behavioral differences between AlphaGo and human professionals:

(1) AlphaGo clearly plays bad moves in particular situations that a human pro would never play

(2) AlphaGo was not able to learn deep procedural knowledge (joseki)

How difficult would it be to have AlphaGo pass a "Go Turing Test"? E.g., what kind of research or techniques would be necessary before it would be possible to have AlphaGo play like an actual professional? How soon could this happen? What are the roadblocks?

pvkooten · 2017-10-17T20:13:03+00:00

Thanks for doing this! And David: thanks for the RL course.

I have a few questions, I hope you can answer them:

How's life at DeepMind?
Who were the members of team AlphaGo?
Could you say something about how the work was divided within the AlphaGo team?
What's the next big challenge?

goPlayerJuggler · 2017-10-18T11:01:55+00:00

Thanks a lot for organising this Q&A. Here are my 11 (!) questions, in no particular order of preference. Some of them have already been asked by others.

How was the 50-game self-play set chosen? Was it picked from a larger set?
Could you outline the sizes of other non-published sets of AG games you have been working with?
Apparently you have stated that 7.5 komi is the best value for balancing the game, according to your data. How does that relate to Black only winning 12 games in the 50-game set?
Was Godmoves actually AlphaGo incognito? https://www.reddit.com/r/baduk/comments/5kuo93/what_is_this_god_move_thing/ http://gokifu.com/playerother/GodMoves More generally, can you tell us of any other incognito games on Go servers, apart from the Master / Magist series?
How does AG manage with triple kos, molasses ko etc? Does it have a superko implementation? What experimentation did you do in this area?
How would you go about preparing AIs for playing Go variants such as Toroidal Go? It could be a good project for an intern at DeepMind maybe? :) Here are some sample variants that would be interesting: https://senseis.xmp.net/?ToroidalGo https://senseis.xmp.net/?VetoGo https://senseis.xmp.net/?environmentalGo https://senseis.xmp.net/?SuperpowerGo (a whole family of variants) Maybe my challenge is to create a single “generic” Go AI that would play at (near) AG level for different komis, board sizes and variants.
Would it be possible to tweak AG so as to get instances with different playing styles?
Do you have a tool that takes a set of games by a single player as input, and as output returns an estimate of the player’s strength? If not, how feasible do you think creating such a tool would be? Also the problem could be made more open ended by requiring the tool to also indicate the player’s strong/weak points (fuseki, chuban, yose, positional judgement, …)
Did exposure to AG improve skills of strong Go players within Deepmind (people like Fan Hui, Aja Huang, T Hubert)? And how? Have there been experiments on using AG and related tools for training human players?
Would Deepmind reconsider retiring AG? Say aliens appeared and challenged humanity to a jubango – how much further do you think AG could be improved?
If the latest AI technology were used to play Chess, do you think something significantly stronger than the current “brute-force” chess engines could be produced?

Sorry it’s such long list.

As well as answering my and other people’s questions, I would be greatly interested to hear about your most recent research with AG. Perhaps that would be even more interesting than answering some of our questions!

Cheers; I thank you and all the Deepmind team for all your incredible work.

(edit: added line returns and question #11)

aegonbittersteel · 2017-10-17T17:07:49+00:00

The original paper mentioned that AlphaGo was initially trained using supervised learning from over a million games and then through a huge amount of self play. For most tasks that amount of initial human supervision would not exist. Now with AlphaGo's success are you looking into making a Go player entirely from self-play (without the initial supervision)? Does such a network successfully train?

Finally, a big thank you to David for your online reinforcement learning lecture videos. They are an excellent resource for anyone new to the field.

EDIT: This question has been answered in Deepmind's new blog post. See link below.

roryhr · 2017-10-18T02:12:15+00:00

What are y'all working on now?

2017-10-18T14:24:22+00:00

What are some of the most interesting things you've seen AlphaGo do?

xuzou · 2017-10-18T18:45:29+00:00

Can we have all 100 AG Zero vs AG master games instead of only the first 20 in supplementary materials? Thanks very much.

say_wot_again · 2017-10-17T14:51:42+00:00

Since both you and Facebook were working on the problem at roughly the same time, what was the advantage that allowed you to get to grandmaster level performance so much sooner?

What do you see as the next frontier for ML, and especially for RL, in areas where getting as much training data as AlphaGo had is untenable?

somebodytookmynick · 2017-10-17T21:43:46+00:00

Please tell us about Tengen.

Or … perhaps rather about why not Tengen :-)

Also, have you tried forcing AlphaGo (black) to play Tengen as first move?

If yes, can we see some games, please?

<edit>

I must re-think my question …

Could it happen that, if AGZ would play a few million more games, or a billion, it might actually discover that Tengen indeed is the best first move?

</edit>

2017-10-17T15:52:51+00:00

When do you think robots will efficiently be able to solve/generalise to highly dimensional, real world problems (e.g. a device that learns by itself how to pick up litter of any shape, size, in any location... )?

Do you think some flavour of Policy Gradient methods will be key to this?

sml0820 · 2017-10-17T16:03:52+00:00

The documentary was compelling. Although it is playing in screenings around the world: https://www.alphagomovie.com/screenings, when can we expect the ability to purchase or stream it?

sml0820 · 2017-10-17T15:48:09+00:00

You mentioned a new research paper being released in relation to the Master version of AlphaGo. You also said you may try to train AlphaGo from scratch without leveraging the initial policy network trained on human games. Do you know when the paper will be released and what is the status on training from scratch?

Orc762 · 2017-10-18T05:15:05+00:00

Glad you guys are able to take some time for us!

Will there be any more matches against pros?

newproblemsolving · 2017-10-18T06:04:33+00:00

Can AlphaGo have two exhibiting matches (not competitive matches as I know AlphaGo is retired.) with Michael Redmmon or any professional players(or high-dan amateur) with (A) 2 or 3 stone handicaps (B) White mirror go with AlphaGo taking Black?

BTW, for (B) it's just so fun to see how AlphaGo deal with it, so sad it doesn't happen so far.

splendor01 · 2017-10-18T10:32:12+00:00

I wrote a program for playing gomoku(https://github.com/splendor-kill/ml-five) based on AlphaGo paper. The SL network has been trained by datasets gathered from Gomocup top 3 players’ games. At the RL stage, the RL agent are initialized to SL NN parameters at the beginning, At battling mode, since opponent parameter is fixed, and the RL agent is gradually learning with RL algorithms. therefore, after some time, when the winning rate is greater than certain level, for example 55%. I will stop and replicate the RL agent and put it into the opponent pool. I will randomly select another opponent from the pool and repeat like this.

But here is an interesting thing I found out: The RL agent at first easily and quickly realizes the shortcomings of its opponent, defeating the opponent. However after several rounds, the agent became “stupid” and seemed to forget everything the agent has learned before.

I am wondering how does AlphaGo solve this?

Look forward to your reply .Thanks!

Walther_ · 2017-10-18T14:02:27+00:00

How to get involved in the AI work today?

I think one obvious approach is "complete a PhD and apply for a job", but that feels like an answer to the slightly different question of "what's the most common way to get a career in AI".

In today's world with hackathons, agile development, open-source communities and such, I'm fairly optimistic there have to be ways for an eager soon-to-be BSc to be able to start poking at things, to learn via experimenting, participating in group efforts, and getting mentoring from more experienced people, in addition to formal education.

(Personally, I'm currently writing my BSc thesis on AlphaGo, so I've got that going already, which is nice.)

Big thanks for all of your work and this AmA.

hyh123 · 2017-10-19T01:38:13+00:00

On AlphaGo, now that you have done AlphaGo Zero, do you think you could have created it without developing the previous versions first? It seems like it's very different from the earlier ones.

smurfix · 2017-10-19T07:57:31+00:00

Would it be possible to do this again, substituting chess for Go?

I realize that it's just another game that's already been "done" with computers, but it'd be very interesting to contrast the style of play that Deep Blue exhibited, to whatever style AlphaGoZero might develop. Also, AlphaGoZero is reported to have come up with some interesting new Go stratagems. I wonder if that'd happen with chess also. And, frankly, thirdly, as a hobbyist chess player I can at least appreciate intricate chess moves, while Go is as obscure as it gets. ;-)

ogs_kfp_t · 2017-10-19T13:17:55+00:00

I challenge you to make such a heatmap of opening move, with Alphago Zero:

http://i.imgur.com/7hz0qEL.png

I am very curious. If you send me the probabilities, I will help to create the image.

Jameswinegar · 2017-10-17T14:56:02+00:00

When working on AlphaGo what was the most difficult obstacle you faced concerning the architecture of the system?

undefdev · 2017-10-17T15:05:18+00:00

Are there any plans to release a dataset of some of the situations that are "very difficult" for AlphaGo? It seems like finding good strategies for these situations should be the next challenge we should face to further deepen our understanding of Go.

sml0820 · 2017-10-17T15:53:30+00:00

What real life areas do you find most promising for applications of reinforcement algorithms such as AlphaGo - 5, 10, and 15 years out?

empror · 2017-10-17T21:22:47+00:00

Would it be possible to train your AI to decide itself how long it wants to think about a move? For example, in the game Alphago lost against Lee Sedol, would Alphago have found a better move if it had had more time to think about the famous wedge? How about those needless forcing moves that Michael Redmond likes to criticize, aren't they a sign that Alphago cries out to have control over its pace?

Edit: Maybe my wording was a bit vague, so I'll try to explain what I mean with the last question: Often Alphago plays moves where it is obvious that the opponent has to answer (e.g. fills a liberty). For many of these forcing moves, strong players agree that the move itself cannot possibly have any positive effect (while it is not entirely clear whether the effect is negative or neutral). Michael Redmond and others have been speculating that Alphago has only some limited time for each move, and if it wants to think longer, then it plays some forcing move. So my question is: If Alphago already knows that the time is not enough, wouldn't it be feasible to just let it take longer for this move than for others?

sritee · 2017-10-17T17:26:30+00:00

Do you think we can see RL being used in Self-driving vehicles any time soon? If not, would the primary reason be its data inefficiency, or some other concerns?

2017-10-17T19:28:20+00:00

What are the stages that AlphaGo goes through, when trained from scratch (if you did this experiment), after reaching say amateur Dan level?

Do these stages correspond somehow with they way Go style evolved during the past few hundreds years for humans?

alcoholicfox · 2017-10-18T08:03:26+00:00

What do you recommend an undergrad should do if he is interested in research in deep learning

valdanylchuk · 2017-10-18T10:07:24+00:00

What are some expected milestone dates and achievements in Starcraft? Are there more exciting things to come soon, e.g. in VR or NLP?

darkmighty · 2017-10-18T15:41:59+00:00

AlphaGo is remarkable for finally combining an intuitive, heuristic, learned framework of the value and policy network, with an exact planning algorithm which are the explicit Monte Carlo rollouts.

Do you expect this approach to be enough for more general intelligence tasks, such the games Starcraft or Dota when played with visual input, or maybe the game Portal?

Notable shortcomings in those cases are that

a) Complex environments don't have simple state transition functions. Predicting the future in a Monte Carlo rollout is thus very difficult.

b) The future states are not equally important. Sometimes your actions need precision down to milliseconds, sometimes you're just strolling though a passage with nothing of note happening. Uniform steps in time seem infeasible.

c) AlphaGo is non-recursive. Thus it cannot accomplish tasks that require arbitrary computations. This is perhaps irrelevant in Go, where the state of the board itself provides a sort of memory for its thinking, with the policy network functioning more or less as an evolution function of the thinking process. Even in complex scenarios one could imagine the agent using the predicted world itself as a sort of "blackboard" to carry out complex planning. The efficiency of this seems questionable however: the environment needs to support such "blackboard" memory (have many states that can be modified with low cost); and modifying this blackboard in the real world seems largely redundant.

If not, what immediate improvements do you have in mind?

Borgut1337 · 2017-10-18T18:05:08+00:00

About AlphaGo Zero and its self-play:

Do you think that the MCTS it still uses is critical to make self-play work out correctly? I would personally suspect that Reinforcement Learning purely from self-play without any search would suffer from a risk of ''overfitting'' against itself. I suspect incorporating a bit of search helps to combat that. Do you have any thoughts on this?

EAD86 · 2017-10-18T21:40:33+00:00

How did you decide on the 40-day training time for AlphaGo Zero? Would it get stronger if you let it train longer?

NotModusPonens · 2017-10-19T01:04:57+00:00

Does alphago zero eventually only play two 4-4 points in the opening?

Edit: also, have you tried training on bigger board sizes? 21x21, 37x37, even something bigger than that?

hyperforce · 2017-10-19T01:26:58+00:00

This new approach seems much simpler than the initial AlphaGo which had a much more complicated architecture.

Was this the first time you tried this simpler approach? Why did the initial AlphaGo you went public with not use this self-learning approach? Did something change recently that made bootstrapping more feasible? Did the work into the initial AlphaGo make the road to Zero easier?

danielrrich · 2017-10-17T21:40:51+00:00

Any further updates about the discussed teaching/review assistant? I really think it would be cool from a perspective of transferring that superhuman knowledge/behavior of alphago to people.

Feryll · 2017-10-18T08:38:50+00:00

Is there any new information on the "AG training tool" that was mentioned as being something we could soon look forward to? Many of us in the go community are wondering what that is, and what a very tentative schedule for that might be.

YearZero · 2017-10-18T22:06:39+00:00

Would you guys consider applying the AlphaGo Zero technique to chess? Would it have an advantage over current top heuristic based engines like Komodo or Stockfish, which are around 3400 ELO? It would be interesting to see what would happen, even just as a curiosity. However, even better if it’s possible to release as a competing engine onto the scene, especially if it dramatically trumps all that came before, forcing the entire community to change methods and follow suit. Thanks!

Revoltwind · 2017-10-17T17:26:18+00:00

How many stones Fan Hui needs to play an even game against AlphaGo?

Is alphago able to run on mobile? If yes, How strong is it? If no, what would be the limitation to port it on mobile?

Thank you for this AMA! Looking forward for your paper.

m2u2 · 2017-10-17T21:12:20+00:00

What did you think of the Chinese government's censorship of the Ke Jie matches? Was it due to you being a google owned company or simply embarrassment that a west based team cracked this game that was invented in China?

Really looking forward to the documentary!

BuckeyeInSeattle · 2017-10-17T23:08:54+00:00

Thanks for the AMA!

DeepMind has said on multiple occasions that this foray into Go is just a stepping stone to other applications, such as medical diagnosis, which is obviously laudable.

With that in mind, I'm troubled by the way AlphaGo makes provably sub-optimal moves in the end game. When given a choice between N moves that win, AlphaGo will select the "safest", but if they're all equally safe, it appears to choose more or less at random. One specific example I can remember is when it decided to make two eyes with a group, and chose to make the second eye by playing a stone inside its own territory, rather than by playing on the boundary of its territory, losing 1 point for no reason.

The reason this concerns me is because this behavior only makes sense if you assume it can never be wrong about its analysis. In other words, it does not give any consideration to the notion that it might have calculated something wrong. If it had any idea of uncertainty, it would prefer the move that doesn't lose 1 point 100% of the time, just in case there was some move it hadn't anticipated that made it lose some points elsewhere on the board.

While playing Go, this isn't a big deal, but coming back to my original point, with things like medical diagnosis this could be a real life and death matter (pun fully intended). It seems self-evident to me that you would like your AI to account for the possibility that it has calculated something wrong, when it can be done at no cost (as is the case when choosing between two moves that both make a second eye).

Do you have any thoughts about this, or more generally about it "giving away" points in winning positions when doing so doesn't actually reduce uncertainty?

2017-10-17T19:25:39+00:00

Does AlphaGo play actual handicap games, or are the comparisons between versions done at even play, and the reported size of handicap amount is just inferred from win ratio?

Can you please publish some of the actual handicap games?

ViktorMV · 2017-10-18T01:45:30+00:00

Hi David, Julian, thanks for this thread!

1) How strong is a current version of the AG? For example compare to the Ke Jie version and to the Master version. What is it's number? Do you continue it's training?

2) Can you share self-played games with handicap vs older versions and new self-played games of the latest version?

3) Why did you decided to follow marketers recommendations to retire AG as there was still at least one very interesting for the Go community and still open questions - with how many handicap stones AG still can win a top pro?

4) Can you share AG comments with variants and win probabilities for it's self-play games on English?

5) Are there any chances that you share more information from AG - analysis of some comtemporary fuseki, new self-played games with comments, etc?

Good with your research, looking forward to see your Starcraft 2 progress!

tallguy1618 · 2017-10-18T05:16:28+00:00

Do you guys have any whacky AI's that just do fun things around the office?

salunero · 2017-10-18T00:32:11+00:00

Is it possible to derive some heuristics from the current neural networks that Alphago uses or should we only view them as mystery boxes that give out answers but not telling how and why it gave those answers? Or does this kind of thinking make no sense?

newproblemsolving · 2017-10-18T05:53:52+00:00

Is AlphaGo still training itself and will does so in the foreseeable future or it just stops completely now?

berndscb1 · 2017-10-18T11:15:14+00:00

Would it be possible for DeepMind to produce annotations of famous classic games using AlphaGo (or make AlphaGo accessible enough that others could produce something like this)?

temitope-a · 2017-10-18T12:50:37+00:00

Have you peaked inside the layers of Alpha-Go?

At times the sequences of inputs and outputs of different layers can reveal the 'understanding' the network has of the problem.

Were you able to isolate ladders, miai, hane, invasions or some other concepts of Go in AlphaGo?

Question from the Oxford Student Go Society

brkirby · 2017-10-18T13:38:59+00:00

AlphaGo cannot explain its play, which poses a problem when similar techniques are applied to areas such as health care. Any thoughts on improving this flaw? How can society trust AI when it’s known to be subject to mistakes that it can’t articulate to humans?

_tomakko · 2017-10-18T13:40:39+00:00

Hi! How did your proceed when designed the neural net architecture for Alpha Go? What kind of theoretical considerations did you do regarding e.g. effective receiptive fields, no. of layers, filter sizes? Did you fine tune the architecture by trial and error afterwards?

hawking1125 · 2017-10-18T15:53:46+00:00

What game(s) are you planning to conquer next?
What lessons did you learn from AlphaGo helped you in subsequent research?
What for you is the future of AI and how has AlphaGo affected it?
How will the results from AlphaGo Zero affect how you approach RL in Starcraft?
Do you plan on trying to beat OpenAI at DotA 2?

EDIT: Added some more questions

P42- · 2017-10-18T16:00:30+00:00

Do you expect that AGI will be able to independently design technology that is decades or centuries beyond unassisted technological progression?

temitope-a · 2017-10-18T16:38:21+00:00

Can AlphaGo be made to 'talk' about Go, beside playing it, i.e. explain what it is doing? After AlphaGo, Deepmind has explored memory / immagination / planning. Would Alpha Go improve with such techniques?

Question from the Oxford Student Go Society

enntwo · 2017-10-18T17:06:57+00:00

For the self-play games, are both "players" using the same trained network, or is each player using a separately trained network?

My assumption is that it is the same network, and if that is the case I was wondering if you could speak to any inherent biases that may arise in games where the same network plays both sides. Would each player have the same blindspots/oversights? I feel like that some of the non-humanness of these self-play games stem from biases like these where both players pretty much have the same "strategies"/"thoughts" for lack of betters terms behind each move.

If it is the case where it is the same network, do you think AG games where each player is a separately trained network of similar strengths that the games would appear more "human-like" or look different overall to those of the same network?

-S7evin- · 2017-10-18T18:58:29+00:00

You said that the AlphaGo Zero algorithm can be used in other fields besides the game, do you have a road map to start with? Thank you.

rick_rick_rick · 2017-10-18T19:27:29+00:00

It would be interesting and useful for later analysis if the sgf files for the Master vs Zero games also had the moves that Zero would have played in the place of Master.

Is there any hope that DeepMind would release some AlphaGo games that include alternative lines in that fashion, which would greatly assist in later human analysis of these games?

charm001 · 2017-10-18T19:52:33+00:00

Is one of your goals with Alphago zero to develop a version of alphago that we can buy and use on normal computers and maybe even our phones?

If so when do you think that will be possible?

picardythird · 2017-10-18T22:25:43+00:00

1.) With the advances in hardware requirements for AlphaGo Master and AlphaGo Zero making it less expensive to run, will you be providing a way for amateurs or professionals to access AlphaGo as a tool?

2.) Why do AlphaGo Master and AlphaGo Zero play random forcing moves? Michael Redmond has speculated that they are "time-saving" moves, although in the Game 11 review he mentions that he got the side-eye from a researcher when he suggested that, indicating that this is not the case.

3.) It has been mentioned that AlphaGo Master was tweaked in terms of complicated tsumego with a custom training regimen composed by Mr. Fan Hui, which some such as Michael Redmond have suggested is a reason that AlphaGo Master is prone to extremely complicated games. In comparison, while AlphaGo Zero's games are not simple by any stretch, they seem to be less confrontational than AlphaGo Master's games. Is this because AlphaGo Zero was not so tweaked by any such custom training program?

Smallpaul · 2017-10-18T23:02:27+00:00

Could the AlphaGo Zero program be taught to play Reversi or Connect Four just by changing the ruleset? Isn't this a more important milestone than Tabula Rasa mastering of a game that is already mastered? If you could apply the same engine to multiple games, the claim of generalizable technology would be indisputable.

gin_and_toxic · 2017-10-19T00:01:19+00:00

Hi David, saw the movie recently. You're especially hilarious when trolling everyone at the end of the last game. It's great to see all of your team's struggles and point of view than what we saw on the stream last year.

Questions: What are members of previous AlphaGo team working on now that you can tell us? Are everyone still working on different variations of AlphaGo, or are you moving on to something else?

If you were to give AlphaGo an avatar, what would you personally choose?

Thanks for the AMA.

zebub9 · 2017-10-19T10:12:11+00:00

Could you release a winrate map for the empty board? And maybe some selfplay games with komi 7?
Do you plan to let AG0 play a few games against humans, at decent handicap, to see the strength difference and some interesting games?
There seems significantly less strength difference between AG0 and AGMaster than between AGMaster and earlier version. Is this because there is less room towards perfect play, or for some other reason?

nestedsoftware · 2017-10-19T10:13:00+00:00

After AG lost game 4 to Lee Sedol, it was apparently trained against an “anti-AlphaGo” to fix the weaknesses in reading this loss exposed. Was AlphaGo Zero also trained in this manner? If not, how were these kind of potential problems handled?

Thank you!

icosaplex · 2017-10-19T11:48:17+00:00

So it seems like there is mounting evidence that at AlphaGo's level, white is significantly favored at 7.5 komi. I presume that black would be favored significantly at 5.5 komi.

One funny issue is that with Taylor-Tromp or other area-scoring rules, the final score (except in rare cases) only has a granularity of 2 points, whereas in Japanese rules or other territory-scoring rules, it has a genuine granularity of 1 point and presumably on average the ability to more finely differentiate in precision of play. However, territory-based rules are a nightmare to formally implement.

But there are alternatives. Have you considered using Taylor-Tromp-like rules, except with a "button", to achieve territory-scoring levels of result granularity? (https://senseis.xmp.net/?ButtonGo) If one were to use 6.5 komi with the increased granularity, do you think there would still be a strong bias in favor of one side or the other at an AlphaGo level of strength?

tobasz · 2017-10-19T13:33:41+00:00

if you replaced the board and rules of Go with the chess board and rules, would AlphaGo be able to learn to play better than a current open source chess program like Stockfish? Would anything else need to be changed, e.g., MCTS?

apriltea0409 · 2017-10-19T16:17:20+00:00

I have 3 questions. First of all, I understand all AlphaGos are trained under the Chinese rule with a 7.5 komi. Does Zero continue to perform slightly better when she plays white? Has there been such an attempt to have Zero play under 6.5 or any other numbers of komi? And if so, how did the change of komi affect Zero's performance? In theory, a perfect komi is the number of points by which Black would win given optimal play by both sides. As AlphaGo Zero is apparently much closer to a perfect player than any of the human players is as of today, we're interested to know, that based on Zero's game data, what would be a perfect komi of the Go game?

Similarly, I'd be interested in learning how well Zero would do on a larger Go board, for example, 25 by 25. Have you ever had such a try?

And here's my last question. As far as I understand, AlphaGo would come up with a few choices for each move. In case there're two or three moves that have the same odds of winning, what is the mechanism AlphaGo would use to make the final choice? Or is it just a random pick?

ffontana · 2017-10-17T20:18:46+00:00

What's the future of Alphago? Will it be publicly available? For example, renting an hour to play with the AI. Thanks!

GetInThereLewis · 2017-10-17T21:25:07+00:00

First, thank you for all your hard work on AlphaGo and your contributions to the Go playing community!

My questions are:

Do you have an update on the next publication that Demis mentioned at Wuzhen?
How closely were you watching other Go AI programs such as DeepZen and FineArt, and have you ever tested AlphaGo against them?
Will AlphaGo ever be released, or at least accessible to the public?
Can you sell DeepMind/AlphaGo swag please (shirts, hoodies, etc)?!

edit: You already answered question 1! Thank you!

cutelyaware · 2017-10-17T22:20:47+00:00

Do you have any estimation about how far is AlphaGo from perfect play, maybe by studying the progress graph over time - did the training process hit any ceiling?

IDe- · 2017-10-17T22:08:29+00:00

Has any work been done on visualizing the factors that affect the decision making process? Do you think this is something that has to be solved for domain expert + machine pairings to work effectively? Do you see teaching potential in AIs like these?

RayquazaDD · 2017-10-18T09:16:20+00:00

Thanks for the AMA.

How does AlphaGo deal with mimic go? Does AlphaGo set up double ladders or make Tengen be a good point?
Nowadays, if Go AI meets a long dragon situation(such as long liberty comparison), it will often be trouble. Does AlphaGo have same problem? How does AlphaGo solve the problem?
We saw AlphaGo 55 self-play games. Did you choose some special fuseki or random? Did you remove any game owing to some reasons? If yes, then what are the reasons?

AndrewVashevnik · 2017-10-18T10:14:52+00:00

Hi, David and Julian! Thanks a lot for your work. And thank you for publishing scientific papers and making your research available for everyone, this is amazing.

1) Have you tried to teach AlphaGo from scratch without data from human games? Doest it fall to inefficient equilibrium? Do two different attempts to train AlphaGo converge to similar result? Could you please provide some insight what are the difficulties you are facing when teaching AlphaGo from scratch?

2) As I understood from the Nature paper AlphaGo is not 100% learning algorithm. At the first stage handcrafted algorithm is used to process board position. This algorithm calculates number of liberties, whether ladders work etc, which are later passed as inputs to learning algorithm. Is it possible to make AlphaGo without this handcrafted part? Would the learning algorithm be able to come up with concepts like liberties or ladder? What ML techniques could be used to approach this problem?

3) What are blind spots of AlphaGo and the ways to solve them? Like modern chess engines often struggle with fortresses.

4) Is Fan Hui + AlphaGo significantly stronger than AlphaGo alone? Is there still a way how a pro can still make an impact when teamed with an AlphaGo?

I am curious about capabilities of AlphaGo to solve hardest go problems too.

Thanks, Andrew

UPDATE: Well, my initial question was before AlphaGo Zero was published, which pretty much answers 1) and 2)

I am really excited about general-purpose learning algorithm. Thanks for sharing it.

Some questions on AlphaGo Zero

5) Have you tried this general-learning approach to other board games? AlphaChess Zero, AlphaNoLimitHeadsUp Zero, etc

6) If you train two separate versions of AlphaGo Zero from scratch, do they gather the same knowledge, invent the same josekis? AlphaGo Zero training is stochastic (mcts), how much randomness is there in final result after 70 hours of training? Is it a good idea to train ten different AlphaGo Zero and then combine their knowledge or training one AlphaGo Zero ten times longer is better?

7) let's look at AlphaGo Zero 1 dan, which is an AlphaGo Zero after 15 hours of training which has 2000 elo and a level of an amateur 1 dan. I guess that AlphaGo Zero 1 dan would be considerably better than human 1 dan in some aspects of play and worse in some other (although their overall level is the same). Which aspects of play (close fighting, direction of play, etc) are stronger for AlphaGo Zero 1 dan and which are stronger for amateur 1 dan? What knowledge is easier and harder for AI to grasp. I have read that AI understands ladder much later than human players, are there some more examples?

8) On real-world applications: I am sure that this kind of learning algorithm could able to learn how to drive a car. The catch is that it would take millions of crashes to do so as it took millions of beginner level games to train AlphaGo Zero. How can you train an AlphaCar without allowing to crash it many times? Building a virtual simulator based on real car data? Could you please provide your thoughts on using AlphaGo general learning algorithm when simulator is not as easily available as in the game of go.

9) what would happen if you use AlphaGo Zero training algorithms, but start with AlphaGo Lee strategy rather than with complete random strategy? Would it converge to the same AlphaGo Zero after 70+ hours of training or AlphaGo Lee patterns would "spoil" something?

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS

Two questions after reading the amazing AlphaGo Zero paper, wow, just wow!!

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS

Welcome to Reddit.

Want to add to the discussion?

Two questions after reading the amazing AlphaGo Zero paper, wow, just wow!!