
[–]gwern 6 points7 points  (4 children)

Observations:

  • imitation learning does better than I'd expect
  • the z encoding a build order seems like a nasty hack to me.
  • what's a 'scatter connection'? It's not defined anywhere, even though Figure 3F says it accounts for +16% win rate against the elite bot (71%->87%), which is enormous.
  • as expected, some serious hardware here for those 44 days:

    For every training agent in the League, we run 16,000 concurrent StarCraft II matches and 16 actor tasks (each using a TPU v3 device with 8 TPU cores [23]) to perform inference. The game instances progress asynchronously on preemptible CPUs (roughly equivalent to 150 processors with 28 physical cores each), but requests for agent steps are batched together dynamically to make efficient use of the TPU. Utilising TPUs for batched inference provides large efficiency gains over prior work [14, 28].

    Actors send sequences of observations, actions, and rewards over the network to a central 128-core TPU learner worker, which updates the parameters of the training agent. The received data is buffered in memory and replayed twice. The learner worker performs large-batch synchronous updates. Each TPU core processes a mini-batch of 4 sequences, for a total batch size of 512. The learner processes about 50,000 agent steps per second. The actors update their copy of the parameters from the learner every 10 seconds.
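
To put the batching arithmetic from that passage in one place, here's a rough back-of-the-envelope sketch; the numbers come straight from the quote, but the class and field names are just mine for illustration:

```python
from dataclasses import dataclass

# Numbers taken from the quoted passage; the structure is only illustrative,
# not DeepMind's actual training code.
@dataclass
class LearnerConfig:
    tpu_cores: int = 128             # cores on the central learner worker
    sequences_per_core: int = 4      # mini-batch of sequences per TPU core
    replay_factor: int = 2           # each buffered sequence is replayed twice
    agent_steps_per_second: int = 50_000
    actor_param_refresh_seconds: int = 10
    concurrent_matches: int = 16_000
    actor_tasks: int = 16            # TPU v3 inference tasks per training agent

    @property
    def batch_size(self) -> int:
        # 128 cores x 4 sequences each = 512 sequences per synchronous update
        return self.tpu_cores * self.sequences_per_core


cfg = LearnerConfig()
assert cfg.batch_size == 512
```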

[–]Miffyli 1 point2 points  (0 children)

Regarding scatter connections, there is this part in the supplementary material ("detailed-architecture.txt"). It is the only part I could find referencing "scatter":

Two additional map layers are added to those described in the interface. The first is a camera layer with two possible values: whether a location is inside or outside the virtual camera. The second is the scattered entities. `entity_embeddings` are embedded through a size 32 1D convolution followed by a ReLU, then scattered into a map layer so that the size 32 vector at a specific location corresponds to the units placed there.
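
My reading of that is roughly the following; this is only a sketch of my interpretation (the function name, shapes, and random weights are mine, not the paper's):

```python
import numpy as np

def scatter_connection(entity_embeddings, entity_xy, map_size=(128, 128), channels=32):
    """Project per-entity embeddings down to `channels` dims and scatter them
    onto a spatial map layer at each entity's (x, y) location."""
    n_entities, emb_dim = entity_embeddings.shape

    # The "size 32 1D convolution" is treated here as a shared per-entity
    # linear projection (a kernel-size-1 conv), followed by ReLU. Weights
    # would normally be learned; random values are used just to run the sketch.
    w = np.random.randn(emb_dim, channels) * 0.01
    projected = np.maximum(entity_embeddings @ w, 0.0)   # (n_entities, channels)

    # Scatter: the size-32 vector at a map location corresponds to the entity
    # placed there (later entities simply overwrite earlier ones in this toy).
    scattered = np.zeros((channels,) + map_size, dtype=np.float32)
    for emb, (x, y) in zip(projected, entity_xy):
        scattered[:, y, x] = emb
    return scattered

# Example: 10 entities with 256-dim embeddings at random map coordinates.
ents = np.random.randn(10, 256).astype(np.float32)
coords = np.random.randint(0, 128, size=(10, 2))
extra_map_layer = scatter_connection(ents, coords)   # shape (32, 128, 128)
```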

[–]sanxiyn 0 points1 point  (2 children)

Agreed on z encoding. The paper says as much: "We found our use of human data to be critical in achieving good performance with reinforcement learning". In other words, AlphaStar is not AlphaGo Zero, whose paper was titled "Mastering the game of Go without human knowledge".
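
For anyone who hasn't read the paper: as I understand it, z is roughly a summary statistic (e.g. an opening build order) sampled from human games that the policy is conditioned on. A toy sketch of that interface, with all names, data and structure made up for illustration:

```python
import random

# Toy illustration of conditioning a policy on a build-order statistic z
# sampled from human replays. Not the paper's implementation.
HUMAN_BUILD_ORDERS = [
    ("Pylon", "Gateway", "Assimilator", "CyberneticsCore"),
    ("Pylon", "Forge", "PhotonCannon", "Gateway"),
]

def sample_z():
    """Sample the statistic the agent is asked to follow this episode."""
    return random.choice(HUMAN_BUILD_ORDERS)

def policy(observation, z):
    """Stand-in for the network: a real agent embeds both the observation
    and z; here we only show that the action depends on the sampled z."""
    return {"action": "build_" + z[0].lower(), "conditioned_on": z}

z = sample_z()
print(policy(observation={"minerals": 50, "supply": 14}, z=z))
```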

[–]Nicolas_Wang 0 points1 point  (1 child)

I think they mentioned the reason: self-play alone leads to forgetting, where the agent can "forget how to win against a previous version of itself".

[–]sanxiyn 2 points3 points  (0 children)

Yes, but note that this did not cause problems for Go.

[–][deleted] 4 points5 points  (1 child)

The timing of this release is interesting with respect to the upcoming world championships at BlizzCon on Friday and Saturday. So far I haven't been able to find any information about whether DeepMind will be there as they have been in prior years. Perhaps a showmatch with a longer-trained version of AlphaStar Final is still possible?

The links to the Nature paper don't seem to work, but going from their blog post and the open access version here, it sounds like there is now only one agent per race, rather than having different agents with their own strategies, like they did in the January showmatch against TLO and MaNa.

In StarCraft, each player chooses one of three races — Terran, Protoss or Zerg — each with distinct mechanics. We trained the league using three main agents (one for each StarCraft race), three main exploiter agents (one for each race), and six league exploiter agents (two for each race). Each agent was trained using 32 third-generation tensor processing units (TPUs [23]) over 44 days. During league training almost 900 distinct players were created.

Along with the APM restrictions and camera interface, this goes a long way toward making AlphaStar play more like a human.
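
Just to unpack the league composition in that quote, a quick enumeration; the role names follow the quote, everything else is my own shorthand:

```python
from itertools import product

RACES = ("Terran", "Protoss", "Zerg")

# Per-race counts taken from the quote above: one main agent, one main
# exploiter and two league exploiters per race. The naming scheme is mine.
ROLE_COUNTS = {"main": 1, "main_exploiter": 1, "league_exploiter": 2}

league = [
    f"{role}/{race}/{i}"
    for (role, count), race in product(ROLE_COUNTS.items(), RACES)
    for i in range(count)
]

assert len(league) == 12   # 3 mains + 3 main exploiters + 6 league exploiters
# Each of these 12 agents trained on 32 TPUs for 44 days; the ~900 distinct
# "players" from the quote presumably include frozen snapshots of these
# agents taken over the course of training.
```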

I haven't had time to read the paper in detail, but it looks like these are the final MMR figures for AlphaStar Final after training for 44 days:

AlphaStar Final achieved ratings of 6,275 Match Making Rating (MMR) for Protoss, 6,048 for Terran and 5,835 for Zerg, placing it above 99.8% of ranked human players, and at Grandmaster level for all three races (Fig. 2A, Extended Data Fig. 7, Supplementary Data and game replays). AlphaStar Supervised reached an average rating of 3,699, which places it above 84% of human players and shows the effectiveness of supervised learning.

Judging from Fig. 2, AlphaStar Mid, trained on the same setup for 27 days, looks to be in the range of 5,500-5,700 MMR for the three races. The top pros are 7,000+ MMR, with Serral at 7,300+. Based on the training curve, my tentative guess is that we will not see this agent become competitive with the top human pros under the current, more realistic restrictions.

[–]sanxiyn 1 point2 points  (0 children)

Since Nature reviews on its own schedule, I don't think the timing was coordinated. (I mean, Nature could have sped up review to land before BlizzCon, but I find that unlikely.) The paper submission, though, was probably roughly timed.

[–]ought_org 1 point2 points  (0 children)

There's some good discussion on HN: https://news.ycombinator.com/item?id=18992698

[–][deleted] 1 point2 points  (3 children)

I wonder if they will stop there or continue until they crush all humans.

[–]sanxiyn 1 point2 points  (1 child)

DeepMind said they will stop.

[–]Nicolas_Wang 0 points1 point  (0 children)

They'll probably start a new project, I guess. Curious to know...

[–]tlalexander -1 points0 points  (0 children)

Prob gonna crush humans