This is for any reinforcement learning related work ranging from purely computational RL in artificial intelligence to the models of RL in neuroscience.
The standard introduction to RL is Sutton & Barto's Reinforcement Learning.
[DL, MF, N] OpenAI Five Benchmark: crushes audience team; stream of 3-game match against pros begins (twitch.tv)
submitted 5 years ago by gwern
[–]gwern[S] 7 points 5 years ago* (2 children)
OA's discussion of the announcement: https://blog.openai.com/openai-five-benchmark-results/. Worth noting:
Simple tree search over the value function implements the apparently-complicated drafting (so adding more heroes shouldn't be too hard...):
In late June we added a win probability output to our neural network to introspect what OpenAI Five is predicting. When later considering drafting, we realized we could use this to evaluate the win probability of any draft: just look at the prediction on the first frame of a game with that lineup. In one week of implementation, we crafted a fake frame for each of the 11 million possible team matchups and wrote a tree search to find OpenAI Five’s optimal draft.
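The drafting trick quoted above can be sketched as a toy alternating-pick game tree: score each complete lineup once with the network's win-probability output, then minimax over picks. Everything here is invented for illustration (tiny hero pool, a stand-in `win_probability`, simplified pick order); the real system instead evaluates a fabricated first frame for each of the ~11 million possible 5v5 matchups.

```python
from functools import lru_cache

# Hypothetical stand-in for the network's win-probability head; the real
# system runs the trained policy's value output on a fake first frame.
HEROES = tuple(range(10))  # tiny hero pool so exhaustive search is feasible

def win_probability(radiant, dire):
    # Toy model: pretend even-numbered heroes are slightly stronger.
    score = sum(h % 2 == 0 for h in radiant) - sum(h % 2 == 0 for h in dire)
    return 0.5 + 0.05 * score

@lru_cache(maxsize=None)
def best_draft_value(radiant, dire, picks_left, radiant_to_pick):
    """Minimax over alternating picks; each leaf lineup is scored once."""
    if picks_left == 0:
        return win_probability(radiant, dire)
    taken = set(radiant) | set(dire)
    values = []
    for h in HEROES:
        if h in taken:
            continue
        if radiant_to_pick:
            v = best_draft_value(tuple(sorted(radiant + (h,))), dire,
                                 picks_left - 1, False)
        else:
            v = best_draft_value(radiant, tuple(sorted(dire + (h,))),
                                 picks_left - 1, True)
        values.append(v)
    return max(values) if radiant_to_pick else min(values)
```

With memoization over sorted lineups, the whole tree collapses to one network evaluation per distinct matchup, which is why a week of engineering sufficed.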
Heavy use of Net2Net/transfer-learning to avoid needing to retrain from scratch as they expanded the NN architecture to handle more possible actions, yielding a very large final architecture:
Our usual development cycle is to train each major revision of the system from scratch. However, this version of OpenAI Five contains parameters that have been training since June 9th across six major system revisions. Each revision was initialized with parameters from the previous one. We invested heavily in “surgery” tooling which allows us to map old parameters to a new network architecture. For example, when we first trained warding, we shared a single action head for determining where to move and where to place a ward. But Five would often drop wards seemingly in the direction it was trying to go, and we hypothesized it was allocating its capacity primarily to movement. Our tooling let us split the head into two clones initialized with the same parameters.
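The warding example in the quote can be sketched in miniature: "surgery" here just means mapping old parameters onto a new architecture, e.g. cloning one shared head into two independent copies so training continues from the old weights. This is a minimal sketch assuming the network is stored as a name-to-weights dictionary; all names are illustrative, not OpenAI's actual tooling.

```python
import copy

def split_shared_head(params, shared="move_or_ward_head",
                      new=("move_head", "ward_head")):
    """Clone one head's parameters into two independent copies so the
    expanded architecture can keep training from the old weights."""
    out = {k: v for k, v in params.items() if k != shared}
    for name in new:
        out[name] = copy.deepcopy(params[shared])  # same init, separate storage
    return out
```

After the split, gradient updates to the movement head no longer touch the warding head, which addresses the hypothesized capacity conflict without retraining from scratch.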
Compute estimates:
We estimate that we used the following amounts of compute to train our various Dota systems:

1v1 model: 8 petaflop/s-days
June 6th model: 40 petaflop/s-days
Aug 5th model: 190 petaflop/s-days
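For scale, a petaflop/s-day is 10^15 operations per second sustained for one day, about 8.64 × 10^19 operations. Converting the estimates above is trivial arithmetic:

```python
PFS_DAY = 1e15 * 86400  # one petaflop/s-day in floating-point operations

estimates = {"1v1": 8, "June 6th": 40, "Aug 5th": 190}  # petaflop/s-days
total_flops = {name: pfs * PFS_DAY for name, pfs in estimates.items()}
# The Aug 5th model works out to roughly 1.6e22 operations.
```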
Past discussion of research:
Notes so far:
Discussion of the August The International tournament matches: https://www.reddit.com/r/reinforcementlearning/comments/99ieuw/n_first_openai_oa5_dota2_match_begins/
[–]untrustable2 1 point 5 years ago (1 child)
What's the evidence that they are focusing on the early game to the detriment of the late game? Could it not just be optimal play?
[–]gwern[S] 2 points 5 years ago* (0 children)
It could be, but it's interesting that OA5 seems to focus so heavily on the early game, discounts Shadow Fiend, thought it'd won the first game from the start, and it is trained with a method which should make it very hard to learn very long-range planning. So it's hard to say, but there's some evidence that OA5 might have a flaw of that sort. If so, human players could get an edge by enduring the initial rush in exchange for some sort of major late-game advantage.
EDIT: a lot of people are saying this about Game 3 - the bots seemed to have a lot of trouble with a coherent strategy in the middle and end, which is what you would expect if they are early-game centric because they either can't or don't need to learn much about later games. An example commentary making this point at length, noting that in game 3 the bots start making massive numbers of outright errors and even having trouble moving units sanely: http://www.gamesbyangelina.org/2018/08/openai-dota-2-game-is-hard/
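The "training method" point above can be made concrete: with discounted returns, a reward t steps ahead is weighted by γ^t, so the effective planning horizon is on the order of 1/(1−γ) steps, and rewards beyond it barely influence learning. A toy illustration (the γ value and the ~7.5 decisions-per-second rate are assumptions for the sake of the arithmetic, not OA5's actual settings):

```python
import math

def reward_weight(gamma, t):
    # Weight of a reward t steps in the future under discount factor gamma.
    return gamma ** t

def half_life(gamma):
    """Steps until a future reward's weight drops to one half."""
    return math.log(0.5) / math.log(gamma)

# With gamma = 0.999 the half-life is ~693 steps; at ~7.5 decisions per
# second that is only about 90 seconds of game time, far short of a
# 30-plus-minute Dota match, which would bias play toward the early game.
```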
[–]untrustable2 7 points 5 years ago (0 children)
There were several moments where the AI had a threat come into view and instantly hexed(?) the enemy before a trained human had time to even process the data, thereby making the humans essentially impotent. Couldn't help but see some rather unpleasant military overtones.