AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning

gwern · 2019-10-31T00:56:47+00:00

Paper: "Grandmaster level in StarCraft II using multi-agent reinforcement learning", Vinyals et al 2019:

Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional e-sports, and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions^{1–3^,the} strongest agents have simplified important aspects of the game, utilised superhuman capabilities, or employed hand-crafted subsystems. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks^{5,6^.} We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.
Blog: https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning
Video: https://www.youtube.com/watch?v=6eiErYh_FeY (not worth watching)
Replay files: https://deepmind.com/research/open-source/alphastar-resources
Media:
- https://www.nature.com/articles/d41586-019-03298-6
  
  The final version of AlphaStar relied on a cumulative 44 days of training and frequently ran into professional players. The AI wasn’t able to beat the best player in the world, as AIs have in chess and Go, but DeepMind considers its benchmark met, and says it has completed the StarCraft II challenge.
  
  The DM blog post doesn't explicitly say that the SC2 project is over, but it implies as much, with a name like 'AlphaStar Final'. Twitter convos like https://twitter.com/LiquidTLO/status/1189620887013744641 also sound like it's over. The BBC quotes David Silver as saying
  
  But Prof Silver said the lab "may rest at this point", rather than try to get AlphaStar to the level of the very elite players.
- https://www.technologyreview.com/s/614650/ai-deepmind-outcompeted-most-players-at-starcraft-ii/
- https://www.theverge.com/2019/10/30/20939147/deepmind-google-alphastar-starcraft-2-research-grandmaster-level
Discussion:
will there be demo matches at Blizzcon? Vinyals says the AS team will be there and there will be "more surprises". At least part of it is "We prepared a fun mix of #AlphaStar agents available for anyone to play. Come try them out and meet the team in the Arcade!" Interesting outcome: Serral lost 4-1 in informal AS matches in the Arcade. People are making excuses for how Serral was playing under suboptimum conditions, but still interesting.
Previous: https://www.reddit.com/r/reinforcementlearning/comments/aiocrt/deepmind_schedules_starcraft_2_demonstration_on/ https://www.reddit.com/r/reinforcementlearning/comments/ajeg5m/deepminds_alphastar_starcraft_2_demonstration/

Observations:

imitation learning does better than I'd expect
the z encoding a build order seems like a nasty hack to me.
what's a 'scatter connection'? It's not defined anywhere, even though Figure 3F says it accounts for +16% win rate against the elite bot (71%->87%), which is enormous.
as expected, some serious hardware here for those 44 days:

For every training agent in the League, we run 16,000 concurrent StarCraft II matches and 16 actor tasks (each using a TPU v3 device with 8 TPU cores^23⁾ to perform inference. The game instances progress asynchronously on preemptible CPUs (roughly equivalent to 150 processors with 28 physical cores each), but requests for agent steps are batched together dynamically to make efficient use of the TPU. Utilising TPUs for batched inference provides large efficiency gains over prior work^{14,28^}

Actors send sequences of observations, actions, and rewards over the network to a central 128-core TPU learner worker, which updates the parameters of the training agent. The received data is buffered in memory and replayed twice. The learner worker performs large-batch synchronous updates. Each TPU core processes a mini-batch of 4 sequences, for a total batch size of 512. The learner processes about 50,000 agent steps per second. The actors update their copy of the parameters from the learner every 10 seconds.

sanxiyn · 2019-10-30T20:14:42+00:00

The timing of this release is interesting with respect to the upcoming world championships at Blizzcon on Friday and Saturday. So far I haven't been able to find any information about whether DeepMind will be there as they have in prior years. Perhaps a showmatch with a longer trained version of AlphaStar Final is still possible?

The links to the Nature paper don't seem to work, but going from their blog post and the open access version here, it sounds like there is now only one agent per race, rather than having different agents with their own strategies, like they did in the January showmatch against TLO and MaNa.

In StarCraft, each player chooses one of three races — Terran, Protoss or Zerg — each with distinct mechanics. We trained the league using three main agents (one for each StarCraft race), three main exploiter agents (one for each race), and six league exploiter agents (two for each race). Each agent was trained using 32 third-generation tensor processing units (TPUs23) over 44 days. During league training almost 900 distinct players were created.

Along with the APM restrictions and camera interface, this goes a long way toward making AlphaStar play more like a human.

I haven't had time to read the paper in detail, but it looks like these are the final MMR figures for AlphaStar Final after training for 44 days:

AlphaStar Final achieved ratings of 6,275 Match Making Rating (MMR) for Protoss, 6,048 for Terran and 5,835 for Zerg, placing it above 99.8% of ranked human players, and at Grandmaster level for all three races (Fig. 2A and Extended Data Fig. 7 (analysis), Supplementary Data, Replays (game replays)). AlphaStar Supervised reached an average rating of 3,699, which places it above 84% of human players and shows the effectiveness of supervised learning.

Judging from fig. 2, AlphaStar Mid, trained on the same setup for 27 days, looks to be in the range of 5500-5700 MMR for the three races. The top pros are 7000+ MMR, with Serral at 7300+. Judging by the training curve, I'm going to make a tentative guess that we will not see this agent being competitive with the top human pros under the current, more realistic restrictions.

ought_org · 2019-11-06T00:56:08+00:00

There's some good discussion on HN: https://news.ycombinator.com/item?id=18992698

sanxiyn · 2019-10-30T19:46:02+00:00

I wonder if they will stop there or continue until they crush all humans.

reinforcementlearning

MODERATORS

Welcome to Reddit.

Want to add to the discussion?