
[–]nadipity[S] 226 points227 points  (5 children)

It appears that some of our team members don't use Reddit and their freshly made accounts are getting rate limited. Will be translating some of their answers using our Redditor team member accounts =P

[–]1-800-REDDITARDS 11 points12 points  (0 children)

Tell them to go to ask reddit and farm some karma lol

Make a sympathetic comment to a rising thread.

It's like playing 100 turbo games to unlock ranked lol

[–]Sheever made me go pinkprohjort 81 points82 points  (23 children)

What is the bots' logic when placing 4 wards on the same spot, or leaving a creep behind in their own creep camp?

[–]suchenzang 125 points126 points  (9 children)

We have a theory that Five drops wards to keep item slots available for when they receive more valuable items. All of these are "learned" behaviors, so we can only theorize as to why they decide dropping multiple wards is the most likely / optimal action to take at a given time.

[–]100kV 34 points35 points  (7 children)

Are they aware that putting items in their backpacks is an option?

[–]suchenzang 69 points70 points  (6 children)

They are, but item swapping (from backpack to inventory) is also scripted.

[–][deleted] 17 points18 points  (5 children)

So, hardcoded? Were they unable to figure out its usage, or are you aware of any issues that would prevent them from using it?

[–]suchenzang 42 points43 points  (4 children)

We ran an experiment to let them learn this behavior, and it seemed like they were capable of learning it to a reasonable level. Unfortunately it didn't learn to use it any better than its scripted behavior, so we decided to take it out before our OG match.

[–][deleted] 18 points19 points  (3 children)

Out of curiosity: why not leave it in the self-learned mode? If the performance is on par with the scripted mode, what would be the motivation to revert?

[–]suchenzang 53 points54 points  (2 children)

We had a lot of model instability issues over the last few weeks leading up to the OG match. One of the suspicions was that newly introduced actions / parameters were breaking the model somehow (training runs were diverging at a really slow pace). We had to revert a lot of changes last minute and restart the training from a previous checkpoint, which unfortunately also removed the model-based item swap logic.

We also had a theory about how our introduction / implementation of item swap had broken gradients. These will all be topics we investigate over the next few months.

[–]nadipity[S] 74 points75 points  (7 children)

Currently our consumable logic is scripted, so the AI isn't really choosing when they're buying wards or regen. When the courier drops off something that the hero doesn't want, they'll often just use it right away - especially if their slots are full and they want whatever got shoved into their backpack.

As for creep camps, it's unclear if they understand the rules behind blocking a camp / finishing a camp - and even less clear if they understand the timers on those camps. The simple answer would just be that they haven't figured those concepts out yet.

[–]trebuch3t 10 points11 points  (1 child)

Additionally does this mean the salve over tango choice was yours or theirs?

[–]nadipity[S] 28 points29 points  (0 children)

Eliminating tangoes was originally our choice (particularly because we started out not telling them about all the trees in the game). We did train it over the last month or so but eventually we had to roll back due to some issues about a week before the OG match.

In terms of choice, it's a bit of a combination - while we tell them what to buy, we start out by seeing how they perform under different scripted circumstances (aka, figure out what they like or what they're good at) and then compare win rates to see which option is better for them.

[–]trebuch3t 3 points4 points  (0 children)

Can you share the scripted logic used for consumables? Some combination of health percent and available gold?

[–]FakePsyho 26 points27 points  (4 children)

Warding is one of those weird mysteries. I'm pretty sure that warding during the benchmark was much better than it is now. ¯\_(ツ)_/¯

[–]HoNTrashColonelWilly 218 points219 points  (103 children)

When the bot is training, is there an advantage between Dire and Radiant?

For human players, Radiant has a huge advantage: https://www.dotabuff.com/heroes/meta?view=played&metric=faction

[–]suchenzang 366 points367 points  (90 children)

We see a roughly +5% winrate when Five plays Radiant instead of Dire.

[–]TravisGurley 162 points163 points  (83 children)

Doesn't this mean the advantage Radiant has over Dire doesn't have to do with the camera?

[–]Zett, the Arc WardenTheZett 153 points154 points  (68 children)

"Camera advantage" depends on subjective factors anyway.

Some people prefer playing on Dire and even play better on Dire than on Radiant.

Since the bots aren’t subjective, and they still have a 5% advantage, it can be concluded that the camera factor is indeed a non-factor after all.

[–]HowIsBuffakeeTaken 10 points11 points  (57 children)

Can you give an example of a player that has a higher dire winrate?

[–]NoveltyCritique 15 points16 points  (0 children)

Being a team game, if the average is skewed this far in favor of Radiant then it's unlikely that even a player who performs better on Dire will win more Dire matches than he loses; his win rate on Dire will simply be closer to 50% than the average player's.

[–]Weshtonio 13 points14 points  (0 children)

It's time for Valve to write you a check so that you put some agents fighting each other until the game is AI-certified balanced.

[–]nadipity[S] 75 points76 points  (10 children)

Our test teams have noticed that the behaviors on Radiant and Dire are also vaguely different - either in terms of objective prioritization (ex: over-prioritizing Radiant's outer safe lane tower when playing on Dire) or lane matchups, which then impact performance and thus winrate. Overall, the bias is likely different than humans' (ex: they don't have the camera angle issue), but there may be some overlap as well.

[–]Ragoz 13 points14 points  (8 children)

If Open AI doesn't have the camera angle issue are you saying they are receiving more information than is provided from the field of view of a player?

The big issue for players is the angle of the field of view shows more information at the top of the screen than at the bottom as demonstrated in this image: https://imgur.com/IhVsx23

[–]FatChocobo 10 points11 points  (7 children)

Yes, the agents receive all visible information (i.e. not obscured by fog of war) via an API. They can see everything that's going on at all times.

[–]FakePsyho 47 points48 points  (0 children)

55-56% winrate for random mirror matchups in our 17-hero pool

[–]reapr56 73 points74 points  (17 children)

Would you guys consider adding a gimped version as replacement for the dota2 bots?

[–]suchenzang 87 points88 points  (16 children)

Would need Valve to ask us about it :)

[–]hinterlufer 9 points10 points  (13 children)

Wouldn't it be way too resource-intensive compared to scripted bots?

[–]Plebinator6000 129 points130 points  (19 children)

Hey! Is there a possibility of OpenAI Five being accessible to the public again in the future? I'm away for the weekend and I'm gutted I can't play against them, and I'm sure the community would love having an extra bot mode in the game to practise with (and be demolished by)

Thanks a lot for all the work you guys have done, it's been really interesting

[–]suchenzang 119 points120 points  (18 children)

At this time, we don't have plans to keep access to OpenAI Five public, unfortunately.

[–]dfarhi 132 points133 points  (16 children)

The main difficulty here is that every time Valve releases a game patch, Five's understanding would fall a little further behind.

[–][deleted] 9 points10 points  (7 children)

Is it not possible to keep such an AI continuously "in the loop" by keeping them busy playing throughout the new changes? If it is possible, what would be the main issue to prevent it from being realized? Is energy supply in any way a concern when running a model training perpetually?

[–]d2wraithking 41 points42 points  (3 children)

The amount of compute necessary to keep training a new model is enormous (and thus pretty expensive).

[–]SheepSlapper 21 points22 points  (2 children)

I thought GPUs grew on trees??

[–]pretty blyatkarabuka 4 points5 points  (1 child)

It's far more efficient to just download them...

[–]Be water my friendColopty 14 points15 points  (2 children)

[–]Honest_Banker 6 points7 points  (1 child)

Sell hats then! This community is willing to pay good money for an upgrade of Valve's shitty bots.

[–]meatgrind89 8 points9 points  (0 children)

GabeN has entered the chat

[–]Curiosity is what you lackhearthebell 4 points5 points  (0 children)

Aww we’ll miss them ;_;

[–]jstq 59 points60 points  (21 children)

So after this weekend, the dota part of OpenAI is done?

[–]nadipity[S] 144 points145 points  (20 children)

from dfarhi:

After this weekend we will close out the competitive portion of our project - after beating OG in the 17 hero pool, there's not as much to be gained by pushing further in the competitive direction. Instead, we're going to focus on research and using the Dota 2 environment to test tricky ideas and learn what we can about reinforcement learning and artificial intelligence. Now that we have one of the most complex and deep AI environments out there, it will hopefully unlock the ability to study really important questions about algorithms, exploration, environment structure, and more.

[–][deleted] 12 points13 points  (0 children)

Are there any insights specifically from constructing an AI for Dota 2? Is there something you've learned that pertains to training an AI on this particular game?

[–]Decency 33 points34 points  (9 children)

After this weekend we will close out the competitive portion of our project - after beating OG in the 17 hero pool, there's not as much to be gained by pushing further in the competitive direction.

I don't follow. A tremendous amount of the depth of competitive Dota 2 comes from the interplay between the massive number of entirely distinct heroes available to a team during each draft. Taking a tiny subset of that while ignoring the other 100 heroes, and saying there's not as much to be gained, feels like the equivalent of Deep Blue mastering one line of the Sicilian and declaring victory - it's a very artificial threshold.

[–]Korvacs 39 points40 points  (6 children)

I think the point is more that the model they've built can clearly learn a hero and play it better in almost all cases than a human, at any level of play. Spending more time and money expanding the hero pool doesn't actually achieve anything from a research point of view.

[–]Decency 18 points19 points  (2 children)

I think the point is more that the model they've built can clearly learn a hero and play it better in almost all cases than a human

I don't agree, at least not based on what's been shown publicly. OpenAI can play a hero excellently in all cases where the only heroes in the game are the 17 that have been chosen and trained against. For another way to phrase the argument: with 17 heroes, there are 6188 possible lineups. Just over the course of this weekend, they'll have played about that many games. But when adding the 18th hero, that number doesn't go up linearly- it goes up by about 40%. What happens when you double it, say to 34 heroes? Suddenly there are 278,256 possible lineups: 45 times more than what the AI has trained with for this event.

With the full hero roster, there are 167,549,733 possible 5-hero lineups. So for this weekend, OpenAI is showcasing its mastery of 0.0037% of all possible Dota 2 lineups. It's absolutely an accomplishment - but it's not Dota 2, not by a long shot. Each of these lineups has nuances, similarities, and differences to others that human players have to determine and evaluate on the fly (often having never played a given 5-hero combination together). The AI doesn't - it's played plenty of games with each of these lineups and against each lineup it faces.
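The lineup counts quoted above are binomial coefficients and can be checked directly, e.g. with Python's `math.comb`:

```python
from math import comb

# 5-hero lineups from a pool of n heroes: C(n, 5)
def lineups(n):
    return comb(n, 5)

print(lineups(17))   # 6188
print(lineups(18))   # 8568 (~38% more than with 17 heroes)
print(lineups(34))   # 278256
print(lineups(117))  # 167549733
```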

Another problem is that some of the ways we've seen the AI gain an advantage is through things that aren't at all related to intelligence, tactics, or strategy. Calculating the maximum damage of three spells and an autoattack to an exact value against a given magic resistance and armor isn't "outplaying" a human, it's just out-mathing it. Reaction times were another issue- I know they've tweaked it multiple times to be more accurate, but human reaction times have a variance that players need to account for. You can't just automatically rely on hitting a perfect BKB against a Lion blink->stun because Lion's cast point is 0.3 and your peak reaction time is less than that... that's not realistic at all.

If they've simply chosen to adjust their priorities based on what they've accomplished in Dota2 already, that's understandable. But phrasing that as if Dota2 has somehow been conquered when literally 100 heroes have been ignored (including all of the most complex ones) just seems ridiculous to me- certainly more marketing than science. I'd love to see an article on why they feel that the gap between 17 heroes and 117 heroes is so easily bridged just by throwing hardware at the problem, and what kind of specific training they have to do for each new hero that's introduced.

[–]Korvacs 4 points5 points  (1 child)

This is a good post, but the crux of the issue is simply time: the model can incorporate and master every hero given enough time. That's the only thing OpenAI needs - the model itself is clearly capable of learning and delivering on this scale with enough time. As it stands, I believe the learning process after a new patch for 18 heroes takes two weeks; increasing the pool size dramatically increases the learning time, to the point where it's simply impractical to learn that many heroes from the point of view of a research project. Plus there simply isn't any benefit.

And as I said in another post, the point of this isn't to build the best bot for Dota 2; it's a research project to build a model which can be used in real-world applications. Dota 2 just offers the kind of complex environment that really tests its ability to learn and master tasks, and also gives it a lot of publicity.

The fact that the reaction times aren't exactly fair compared to humans, or that it can do maths more precisely, is irrelevant to the goals of the project - the fact that it can do these things quickly and precisely is actually to its benefit.

[–]jQiNoBi 55 points56 points  (14 children)

How can we be sure that you guys are not an AI as well?

[–]FakePsyho 172 points173 points  (13 children)

Can't be sure. I frequently fail captcha tests.

[–]suchenzang 78 points79 points  (11 children)

+1

[–]FakePsyho 77 points78 points  (10 children)

you know, you can upvote on reddit ;)

[–]suchenzang 70 points71 points  (9 children)

I like typing +1

[–]FakePsyho 50 points51 points  (5 children)

I like typing +1

FTFY

[–]suchenzang 50 points51 points  (4 children)

:(

[–]FakePsyho 40 points41 points  (2 children)

Hi!

[–]carrymugabe 16 points17 points  (1 child)

These AI conversation systems here seem to be pretty close to passing Turing Test.

[–]unluckycowboy 12 points13 points  (0 children)

+1

[–]satosoujirou 23 points24 points  (2 children)

I'm pretty sure you guys are bots.

Please don't destroy humans.

[–]FakePsyho 35 points36 points  (1 child)

We love humans!

[–]pw0300 1 point2 points  (0 children)

That is what a bot would say. A real human hates other humans, you got a lot to learn.

[–]mechkg 51 points52 points  (8 children)

Hi guys. I was wondering how much does it cost to train the bots to the current level of play purely in terms of computational resources if you used AWS or the Google equivalent?

How much would it cost to train the bots to play the full hero roster at the same level?

[–]overminder 89 points90 points  (7 children)

Not from OpenAI, but their website says the latest version takes 800 PFLOPS-days to train. One preemptible TPU v3 unit provides 420 TFLOPS and costs US$2.4/h. So in total that's roughly US$110k. Note that this is a very rough calculation...
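As a quick sanity check on that back-of-the-envelope figure, using only the numbers quoted above:

```python
# All figures from the comment above; the result is a rough estimate.
pflops_days = 800        # reported training compute (PFLOPS-days)
tflops_per_tpu = 420     # one preemptible TPU v3 unit
usd_per_tpu_hour = 2.4

tpu_days = pflops_days * 1000 / tflops_per_tpu  # PFLOPS -> TFLOPS
cost_usd = tpu_days * 24 * usd_per_tpu_hour
print(round(cost_usd))  # 109714, i.e. ~US$110k
```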

[–]crashlnds_player 15 points16 points  (0 children)

It would likely cost more than that, though, since they also need to run smaller experiments and tweaks along the way. That can easily eat a lot of their budget, especially if they train from scratch - though I think they always initialize the network from the previous version's weights.

[–]FakePsyho 170 points171 points  (11 children)

Btw, there's a small easter egg that we hid in the drafting phase. As far as we know, no one has found it yet!

Funnily enough, it's been there since the benchmark match. But since we streamed matches with a custom UI for drafting, no one could see it before.

[–]kmsUFO 322 points323 points  (8 children)

FOUND https://i.imgur.com/F9d78M8.png

Omni

Phoenix

Ember

Naga

Ancient Apparition

Invoker

[–]FakePsyho 129 points130 points  (0 children)

YES!!!

[–]Wivyx 43 points44 points  (3 children)

And the next ban phase includes Faceless and Visage so I bet it spells out FIVE :)

[–]theclarice 19 points20 points  (0 children)

FAVI = NaVi?

[–]MidSolo 8 points9 points  (0 children)

Five = 5 = V

[–]ginnaz 8 points9 points  (1 child)

Are you a genius or something

[–]j2i2t2u2 38 points39 points  (1 child)

Huge congrats to the team. Couple of questions, thanks for answering.
1) Now that you have achieved superhuman performance on this complex game, what is the 6-month roadmap for RL for Dota 2?
2) What is your day-to-day like as an engineer of RL for a MOBA game?
3) What is your (OpenAI's) cooperation with Valve like? To what degree did Valve support you in achieving superhuman AI for Dota 2?

[–]suchenzang 53 points54 points  (0 children)

From @christyopenai:

1) There's still a lot left to understand! The main goal of this project is to research RL, and we've mainly been focused on getting Five to be the best it can. We can now take a step back and figure out why Five works the way it does, and hopefully help to make RL more efficient and train better.

2) Being an engineer means you have to understand Tensorflow, RL, the game engine, basically the entire stack. On a typical day, we might watch replays and see issues with training. Does Five need a new observation? Could the observations be processed in a way that is more optimal? We look at performance reports and try to find ways to crunch down the time. What is the win rate if a hero starts with an extra salve? Our team is made of engineers and researchers, but everyone knows engineering and everyone works together, so engineers frequently do research too. It's a lot of fun to be on this team :)

3) Valve helped us get frozen builds. Since we need to retrain every time there is a new patch, and that upgrading process can be time-consuming, it was important to get a version that wouldn't change.

[–]Yamakasinge 77 points78 points  (6 children)

How much computing power does it cost to run one bot after training is done?

[–]suchenzang 101 points102 points  (4 children)

32 CPU cores is enough to run a game with Five.

[–]mpetrov 86 points87 points  (0 children)

to clarify, this is 32 Intel Skylake cores which are really hyper-threads - so the real number is closer to 16 physical cores to run both the game and the bot.

[–]Petrroll 22 points23 points  (2 children)

So the inference is able to run on CPU in realtime? Any reason for not using GPU?

[–]mpetrov 73 points74 points  (0 children)

It's simpler not to use a GPU for a real time game like Dota because the gains in efficiency from using a GPU are due to being able to batch multiple passes in parallel. However, batching introduces latency / queueing problems which is not ideal for a real time game.

Also, today it would be slightly faster if you do use one GPU per game but that would be insanely expensive compared to a CPU.
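A toy calculation illustrates the queueing problem; the decision interval and batch size here are made-up numbers, not OpenAI's:

```python
# Made-up numbers for illustration only (not OpenAI's measurements).
decision_interval_ms = 133  # assumed gap between a game's action requests
gpu_batch_size = 8          # hypothetical batch needed to use a GPU well

# A single live game emits one request per interval, so filling the
# batch before running it would stall the first request by:
worst_case_wait_ms = (gpu_batch_size - 1) * decision_interval_ms
print(worst_case_wait_ms)  # 931 -- far too slow for a real-time game
```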

[–]jonathanraiman 21 points22 points  (0 children)

a recent laptop :)

[–]TentacularMaelrawn 75 points76 points  (34 children)

What's the decision process for choosing which heroes for the OpenAI Five to train on?

[–]nadipity[S] 109 points110 points  (33 children)

When we first started out, we picked heroes that we thought were easiest for the AI to learn (ranged, straightforward abilities, etc). After we started seeing some progress, we attempted to balance out the pool a bit by adding melee heroes and pos 4 heroes. Next on our list were more fun / interesting heroes, but they unfortunately didn't get to the level where they were as competitive as the original set.

[–]47-11 27 points28 points  (32 children)

Can you tell how many heroes that extended pool includes?

[–]nadipity[S] 65 points66 points  (30 children)

The first 2 we added were Drow and Huskar, and after they were nearly on par with the original set we added Pugna, Pudge, Venomancer, Mirana, and Windranger to see if we could learn new mechanics that didn't exist in the original pool. We also trained a pool of ~80 heroes (excluding summon/illusion heroes) at very low scale to see the impact.

[–]Mr_Enzyme 26 points27 points  (1 child)

The pool of 80 sounds really cool - was there a much bigger drop off in the learning rate than with the pool of 25?

[–]jonathanraiman 2 points3 points  (0 children)

Skill measurements with larger hero pools become a bit tricky. Particularly when you lack good reference opponents that you can regularly measure against to detect learning slowdown. We were able to detect high growth on totally unseen heroes, but it’s anecdotal at this point.

[–]Castature 34 points35 points  (13 children)

Are you guys planning on branching out into other games? Whether they be mobas, rts games, fps etc.

[–]suchenzang 88 points89 points  (1 child)

At this time, we're not planning on branching out to other games. There's still open questions within Dota that we can explore and utilize as an RL environment for research.

[–]LivingOnCentauri 9 points10 points  (0 children)

What are those going to be? There are still a lot of open topics in AI research - are you open to showing those results to the public at some point, if you're satisfied with them?

[–]NitroBubblegum 7 points8 points  (10 children)

There is also DeepMind, for Starcraft 2 that is also smashing the pros

[–]y2kkmac 14 points15 points  (9 children)

That bot's micro was impossible.

[–]Wivyx 28 points29 points  (7 children)

Watching games where the humans win, it feels like OpenAI is quite bad at / not capable of anticipating moves or planning for the long term. They react to what they see, and don't seem to think "we can't see the enemy, they are probably planning a gank/smoked" or "this hero has a tendency to splitpush top, let's set a trap to catch him" like humans would. Do you think these are strict limitations of the AI, or do you think the AI could learn such human-like behaviour if it trained with (high-skilled) humans? Why?

[–]suchenzang 30 points31 points  (2 children)

It's a bit hard to map how Five works to how humans reason about the state of the game. While we may not be able to see it reason explicitly, Five has learned to play in such a way as to counter strategies it developed throughout the course of its training.

If you were to rewatch our first OG match, there was a moment where Five predicted a 95% chance of winning, despite the game appearing even to most of us. Shortly after this prediction, Five wins a team fight and pushes to the high ground, at which point its 95% win prediction finally seemed accurate. Five simply has a different way of approaching how it would achieve its goal of winning, which may or may not map to how humans think about "strategy".

[–]Yamakasinge 79 points80 points  (14 children)

Will we ever see bot play full hero pool dota ?

[–]suchenzang 114 points115 points  (13 children)

We currently don't have plans to expand to the full hero pool, though we may explore this in the future if we were to discover drastic improvements to training efficiency.

[–]I miss the Old Alliance. sheeverThatForearmIsMineNow 61 points62 points  (9 children)

/u/ArgetDota was downvoted for our sins

[–]JackeyWhip 36 points37 points  (2 children)

Wtf are all these downvotes and the "for now" comments, OpenAI already said 5 days ago they'd stopped the learning process.

[–]NPSimco_ 22 points23 points  (1 child)

I didn't vote on that post but tons of people auto downvote anyone who edits just to address downvoting or who calls people retards.

[–]100kV 6 points7 points  (1 child)

Do you foresee any drastic improvements to training efficiency? Or is it just not technologically possible right now?

[–]atlatic 5 points6 points  (0 children)

Innovations in reinforcement learning algorithms could lead to it. OpenAI uses a model-free algorithm. A lot of RL researchers are working on model-based algorithms, which are more data-efficient, but these algorithms still need to be proven on smaller problems before a game as complex as Dota 2 can be attempted.

(Not from OpenAI)

[–]rawriclark 56 points57 points  (10 children)

can you please not close this? i wanna play this forever

[–]nadipity[S] 87 points88 points  (9 children)

We would love to keep it open for people to play but unfortunately, each patch for Dota 2 currently requires additional training to bring the AI up to speed.

[–]JackeyWhip 41 points42 points  (6 children)

So it is not possible to run it on a custom game that would be a copy of the current patch?

[–]nadipity[S] 65 points66 points  (4 children)

That is possible - still takes some maintenance but doable - though more difficult for the wider public to do since it takes upgrading/downgrading the client. Right now we're crossing our fingers and hoping that Valve doesn't have a patch for Dota 2 planned before Arena closes!

[–]PuppeyFacerastla 47 points48 points  (3 children)

I think what he meant was something like this: https://steamcommunity.com/sharedfiles/filedetails/?id=818848098

Then it'd be a custom game (like e.g. Dota Auto Chess)

[–]FakePsyho 18 points19 points  (0 children)

Oh lol, my tired brain filtered out "custom".

Yeah, not possible. At least not without non-trivial modifications. Those are essentially different games.

[–]atlatic 4 points5 points  (0 children)

/u/FakePsyho might not know whether this is possible.

[–]rawriclark 4 points5 points  (0 children)

could you open source the code so others can maintain it and host servers for you guys?

[–]buck614 25 points26 points  (17 children)

How does the AI get vision on itself, friendly units, and friendly structures? Can it 'see' all those at once in real time wherein a normal player only see the native field of view? I hope that makes sense.

[–]jonathanraiman 61 points62 points  (10 children)

OpenAI Five uses the bot api to observe the state of the game. We cannot break the fog of war, however we can see all visible units at once and remember where we saw them last. This means that events far off from the controlled hero are available to us.

We do however cap the number of units we can see during a game and sort by distance to our heroes. This means that when the map is crowded, we only see the closest units.
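A minimal sketch of that kind of capping (hypothetical code, not OpenAI's; positions are plain (x, y) tuples):

```python
import math

# Keep only the `cap` visible units closest to any of our heroes,
# mirroring the distance-sorted observation cap described above.
def nearest_units(units, heroes, cap):
    def dist_to_heroes(u):
        return min(math.dist(u, h) for h in heroes)
    return sorted(units, key=dist_to_heroes)[:cap]

# Example: on a crowded map, distant units drop out of the observation.
visible = [(0, 0), (5, 5), (100, 100)]
ours = [(1, 1)]
print(nearest_units(visible, ours, 2))  # [(0, 0), (5, 5)]
```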

[–]Mr_Enzyme 6 points7 points  (3 children)

So it only looks at the nearest N units, probably prioritizing heroes above creeps? Were there any other areas where you capped the length of a potentially long vector like that (maybe trees or projectiles)?

[–]LvS 12 points13 points  (0 children)

Nature's Prophet with Aghanim's basically makes your hero invisible!

[–]suchenzang 32 points33 points  (5 children)

We have access to an API from which Five is able to access the state of the game. It effectively then sees all these data points in real time - unlike the vision limitation that normal players would have.

[–]RogueCarpet 60 points61 points  (4 children)

I think it's important for people to realize that not all information that human players see is available in these APIs. For example, the bots don't handle Shrapnel very well because the bots can't see where the spell is positioned. You'll notice the bots walk into the Shrapnel briefly, and then when they take damage they realize there's an AoE there and walk away. Similarly, they only can tell where Fissure is by trying to move somewhere and having their pathing be unexpected.

[–]suchenzang 29 points30 points  (0 children)

+1 As with any engineering effort, there will be code paths that we miss and observations that we forget to integrate. The amount of observations that are added to the model during training is definitely a subset of all that is available for a game state at a given moment in time.

[–]nadipity[S] 20 points21 points  (0 children)

Additionally, we're pretty far from fully utilizing everything coming through from API because of the amount of info there and the engineering we'd have to do to support it. Sometimes it took us a significant amount of time before realizing we were the blocker for the AI doing things (such as allowing it to see and attack Gyro's missile).

[–]tutori 9 points10 points  (1 child)

But at the same time, it also allows them to do things that humans cannot, effectively seeing the whole map at once, where we are limited to a screen plus minimap.

[–]Deamon- 23 points24 points  (3 children)

will you ever show us what those bots can do with heroes like ember meepo invoker etc?

[–]nadipity[S] 126 points127 points  (2 children)

We have a few clips at various ability levels for other heroes that we'd love to share once things calm down a bit - some pretty cool (as well as hilariously bad..) game videos =D

[–][deleted] 22 points23 points  (1 child)

Do you have any data on Average MMR of team vs Win Rate against OpenAI?

[–]FakePsyho 42 points43 points  (0 children)

We don't have access to any data that is not publicly available. Which essentially means that we know as much as you do.

[–]dinosaur_noises 38 points39 points  (3 children)

One of the biggest surprises for me was that the relatively simple Proximal Policy Optimization method seems to be successful with the long-term thinking required for success in DotA 2, as you mentioned in your blog post about it. I think it aligns nicely with the recent short essay from Rich Sutton called The Bitter Lesson. I've noticed, though, that both OpenAI Five and the DeepMind SC2 AI seem to do best against humans in short-term tactics and are perhaps just competitive in long-term strategy. It is amazing that a general learning method can be successful in such a complex, cooperative, partial-information setting, but is it really measuring long-term strategic thinking? I know your team thinks carefully about this in limiting response times and ensuring their performance is similar to a human's to avoid beating them only in micro. Do you believe the AI is succeeding in this long-term planning, or is this a weakness? Thanks!

[–]jonathanraiman 37 points38 points  (2 children)

Detecting and measuring long-term planning in strategy games is definitely confounded with other aspects of gameplay. From some preliminary assessments based on extra predictions we make within Five, we find that 60-90s ahead of time we commit to specific towers and objectives in the map.
You can see these predictions as lines going from heroes to towers and lanes in this video: https://s3-us-west-2.amazonaws.com/openai-assets/how-to-train-your-openai-five/game1_og_minimap.mov (more linked here https://openai.com/blog/how-to-train-your-openai-five/#replays)

[–]RogueCarpet 16 points17 points  (2 children)

How are item builds and skill builds handled? I believe an early version of OpenAI had a few pre-selected builds for each hero and the bots would pick between these. Any changes here?

[–]FakePsyho 29 points30 points  (0 children)

We're still using fixed (scripted) item & skill builds. During training they are randomized, so the model is able to learn how to play vs different builds.

We experimented with RL-based item builds and we had promising results. Unfortunately, we ran out of time to make use of them for our Finals & Arena events.

[–]mpetrov 18 points19 points  (0 children)

The builds are mostly preselected but the bots do affect which ones are selected for different games. This is an area where we would love to give more control of it to the bots!

[–]I got jizz on me chinFortheseoccasions 11 points12 points  (6 children)

What are your MMRs?

[–]FakePsyho 39 points40 points  (4 children)

I stopped playing around half a year ago; at that point I was slightly below 3k.

[–]jonathanraiman 9 points10 points  (0 children)

If we train OpenAI Five from scratch, it takes about 24h before I can no longer beat it.

[–]HoNTrashColonelWilly 11 points12 points  (2 children)

I know the team has worked to compensate for the fact that the bots do not have the same physical barriers that humans do by limiting actions per minute or reaction time, but have they considered solutions for the loss of efficiency from how humans are forced to physically interact with the game (moving the mouse, only having so many fingers to press keys, eyes having a cone of focus, etc)?

I ask because, as I'm sure you've considered, the bot can "out-play" a human opponent not through strategy but because we do not have direct I/O to the game.

[–]nadipity[S] 22 points23 points  (1 child)

It's a bit difficult to translate the number of fingers a human has into an equivalent number of milliseconds of delay =D. Overall we're not necessarily going for an exactly even playing field, since the two sides are so inherently different - humans have advantages (e.g., being able to learn from game to game, knowing they're playing an AI) and bots have advantages (they're not human). We're more interested in how the two different paths each side took landed them in a somewhat similar place in terms of approaching Dota.

[–]xpkoala 20 points21 points  (2 children)

Had a blast watching the show with OG and the OpenAI crew. Are any technical papers about the current capabilities available to read? Will raw stats on the matches taking place over the weekend be made public (game win/loss, hero selection, apm, gpm, etc)? You all seem to be having a blast working on the project, wish you all the best as it continues to grow.

[–]nadipity[S] 27 points28 points  (1 child)

From jonathanraiman:

We are planning on posting a follow-up blog post when Arena is over analyzing the results of the games (win/loss, heroes, coop, etc..) and post replay files.

We're also planning a technical paper detailing the work in greater detail. Our blog post contains architecture details and other info in the meantime: https://openai.com/blog/openai-five/ .

[–]sheever FIGHTING !! gogo !!Bokoloony 20 points21 points  (3 children)

So a lot of people argue that since your AI "figured out" DotA, there's no incentive for you to train it against more heroes. 17 (is it?) or 117, it's only a matter of computation power and training. Do you think that's correct?

I wouldn't be surprised if the computation power required to train for 117 heroes is orders of magnitude above what you needed for 17, making it an actual challenge, because the time required is not linear at all but rather quadratic (or exponential, or even factorial, I don't know). How wrong am I?

Another argument is that the other heroes add a lot more diversity, making it a heck of a lot easier to exploit OpenAI's weaknesses (such as split-pushing, or AoE denial spells like Shrapnel, which it's apparently bad against). I guess you could tweak the set of rewards you laid out for it to learn, but would that be enough? Does OpenAI Five adapt its rewards according to the enemy team composition and its own?

[–]suchenzang 54 points55 points  (1 child)

We do agree that there is not much incentive for us to train against more heroes at this time, due to the degree of engineering difficulty in integrating more heroes into our training pipeline and battling issues with our integration with Dota. We've run experiments where our hero pool expanded to 25 and above (up to 80 at one point), and saw that most heroes were able to play at a roughly ~3-5k MMR level within a very short amount of time. This led us to believe that our model was able to transfer these learned behaviors from a small subset of heroes to the rest, without incurring the orders-of-magnitude computation cost that comes with the combinatorial explosion of hero line-ups. We haven't fully validated this theory yet, and we may reconsider exploring this in the future.

[–]HPA97 10 points11 points  (1 child)

Could putting the AI through custom scenarios to teach stuff like smoking/warding/invis be a way to fix the current problems they have with those things? Instead of having them only play the regular Dota map, have a map where they need to get from A to B without being detected (smoke or deward type scenario).

[–]suchenzang 31 points32 points  (0 children)

Yes, we've tried multiple ways of randomizing the environment so that we can place Five into situations where it's easier to learn some of these behaviors. For example, we randomized Roshan's health so that it was easier for Five to discover the value in taking Rosh.
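That kind of environment randomization can be sketched very simply; the numbers and function name below are hypothetical, not taken from Five's actual training setup:

```python
import random

def randomized_rosh_health(base_health=5500, min_frac=0.1):
    """Sample a randomized Roshan max-health for one training game.

    Dropping Roshan's health in a fraction of games makes it far more
    likely that a weak early policy stumbles into killing him and sees
    the associated reward; the behavior can then be refined in games
    where his health is closer to the normal value.
    """
    frac = random.uniform(min_frac, 1.0)
    return int(base_health * frac)

health = randomized_rosh_health()  # different each training game
```

The same trick applies to other hard-to-discover behaviors: randomize the environment so the "aha" state is occasionally cheap to reach, then anneal back toward the real game.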

[–]⬆️heypaps 10 points11 points  (1 child)

Are there any professional fields that have expressed interest in the learning system of OpenAI for practical application?

[–]suchenzang 17 points18 points  (0 children)

We've already utilized the same training pipeline for Five within our robotics team (https://openai.com/blog/learning-dexterity/).

[–]LooseGoose0 8 points9 points  (2 children)

I think the ability of OpenAI Five to be able to cooperate with humans is really interesting, especially as it was not trained to be able to do this. For AI to be able to cooperate and work with humans, rather than just replace them, is really bloody cool. Are there plans for your team to work on this problem moving forwards? Either within Dota or not.

[–]FakePsyho 10 points11 points  (0 children)

Are there plans for your team to work on this problem moving forwards? Either within Dota or not.

Personally, I'd love to further explore this area as it's fascinating both from AI / game design perspective and eventual practical applications.

[–]suchenzang 7 points8 points  (0 children)

As part of our mission (https://openai.com/charter/) this is definitely an interesting area to explore, but we don't have immediate plans on the Dota team to work on this problem going forward.

[–]BubbsTheCuber 18 points19 points  (3 children)

Hey! I wrote a paper about deep learning and the like. Artificial intelligence is really interesting to me. Do you think an artificial general intelligence will be created in the near future? Thanks for the AMA guys!

[–]hponde 32 points33 points  (1 child)

We are working towards that goal. It's part of our charter: https://openai.com/charter/

[–]TheGraysmith 5 points6 points  (0 children)

https://www.youtube.com/watch?v=bIrEM2FbOLU

This podcast will interest you!

[–]Xexos1 17 points18 points  (2 children)

What's the main reason you chose Dota 2?

[–]FakePsyho 52 points53 points  (1 child)

There were a few reasons:

  • Popularity (and huge prize pools)
  • Reflex/micro is a secondary skill
  • Depth (complexity)
  • Availability for Linux
  • API

All of them are equally important.

Complexity gives us a very interesting problem to tackle. Not relying on reflexes makes the game a fairer human-vs-AI testbed. Popularity/prize pools ensure that people have invested countless hours into the game, so we get a proper benchmark for our model. And lastly, Linux support & the API make everything more cost-effective.

[–]fdasilva59 15 points16 points  (0 children)

Any possibility to have a collaboration with Deepmind in order to have AlphaStar and OpenAiFive to compete against each other and have a technical debrief on the approaches, what is working and what is not working ?

I mean both agents competing against each other at both Dota 2 and StarCraft 2. That would give a nice insight into how the two approaches generalize to another competitive environment.

[–]burnmelt 7 points8 points  (8 children)

Any plans to lift all restrictions (heroes, summons, items, etc)?

What is the most interesting thing y’all learned?

Are there any other experiences or information you want to share, but haven’t been asked about yet?

[–]FakePsyho 19 points20 points  (3 children)

What is the most interesting thing y’all learned?

The thing that surprised me the most was that a lot of problems we believed would be extremely hard for AI to learn turned out to be not-that-hard in the end. The best example of this is map rotation during the early game, since it does require a bit of exploration with an immediate loss in reward.

Generally speaking, it seems that as humans we tend to believe that a lot of the things we do are very complex and require a lot of expertise. In the end, it turns out that is not exactly the case.

[–]Phnrcm 6 points7 points  (2 children)

The thing that surprised me the most was that a lot of problems that we believed will be extremely hard for AI to learn turned out to be not-that-hard in the end.

Was there anything that turned out to be unexpectedly hard for AI to learn?

[–]FakePsyho 15 points16 points  (1 child)

Yeah

  • Warding is way worse than expected
  • Item swapping through RL (we had to revert back to scripted)
  • Power threads switching
  • Figuring out to get melee rax instead of ranged rax (although there is a small chance we're all wrong here)

Some of those are probably due to bugs/mistakes on our side. With such a complex project, it's honestly very hard to tell if something went wrong because "it's hard for AI" or because "humans did something wrong". There are so many places where things can go wrong (engineering bugs, bad design of the training/network, unexpected Dota behavior, lack of understanding of the environment, random bugs in the network architecture, gradients going crazy for some reason) that sometimes we just had to scrap an idea and start from scratch.

[–]suchenzang 22 points23 points  (3 children)

Right now, we don't have plans to continue lifting restrictions and building a better agent to play as Five. We are definitely surprised how far we were able to push the limits of existing algorithms by scaling up to the scale that we have for training Five. We were also surprised about our ability to transfer the model across different patches of Dota and continue training, while growing the model at the same time.

[–]SFKillkenny 7 points8 points  (0 children)

When the two teams of AI vs each other do they both predict the same win probabilities or do they predict separate ones because of the lack of information. Also if they do predict separate how big is the discrepancy usually and have you ever had both teams thinking they are ahead before?

[–]turingalan_ 14 points15 points  (7 children)

Kudos to OpenAI team for AMA!
First and foremost, congrats on the win against OG; it's big for both the AI and DotA communities, and it shifts the perspective on how well simple algorithms can actually scale, to the point of beating the best human players.

I have a couple of questions for anyone who could address them:

  • Have you observed any hierarchical behavior in how the agent controls its hero when it plays with other AIs on the team vs. in collab mode? E.g., would the frequency of the actions the agent takes be much higher because of the uncertainty the human teammate introduces?
  • On Twitter, Ilya Sutskever mentioned that the agent was trained continuously for 10 months. Any insights on how different that is from the regular lifecycle of other ML/RL projects, where training is almost always started from scratch? What were the challenges, and what worked best?
  • And lastly, one of the goals of the project was to demonstrate the capability of scaling the algorithms to an absurd level (in today's computational-resource terms). What other things have you learned, and what do you expect to learn by continuing to work on this project?

Thank you!

[–]suchenzang 18 points19 points  (5 children)

  • We haven't fully researched our coop mode and how Five behaves differently. The Arena will provide some interesting data this weekend for us to dive into this.
  • As far as we know, training Five continuously over 10 months is rather unusual for RL projects. There were definitely challenges in building out the tooling to perform "surgery" on parameters, carrying them from one version of the model to the next as we grew the model over time. Beyond the dimension/shape errors that come up, there were many instances where surgery failed silently, and we ended up with Five suddenly behaving very strangely after an experiment restart.
  • We will definitely be diving deeper into our learnings over the next few months. In our push to develop Five, there were a lot of decisions that were made where it wasn't 100% clear whether or not they benefitted Five's learning curve. We hope to examine each of these in detail and release as much of our findings as we can.
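One family of techniques for this kind of parameter "surgery" is Net2Net-style function-preserving growth. The toy sketch below widens a single linear layer by duplicating existing output units; it is an illustration of the general idea, not the tooling described above (and a real pipeline would also have to split the consumer layer's incoming weights so the network computes the same function before training resumes):

```python
import numpy as np

def widen_linear_layer(w, b, new_out):
    """Grow a linear layer from `w.shape[1]` to `new_out` output units.

    New units are copies of randomly chosen existing units, so the layer's
    output space carries the same information; downstream layers must be
    adjusted accordingly for the whole network to stay function-preserving.
    """
    old_out = w.shape[1]
    assert new_out >= old_out
    # indices of existing units to duplicate for the extra outputs
    extra = np.random.randint(0, old_out, size=new_out - old_out)
    w_new = np.concatenate([w, w[:, extra]], axis=1)
    b_new = np.concatenate([b, b[extra]])
    return w_new, b_new

w = np.arange(12.0).reshape(3, 4)   # 3 inputs -> 4 outputs
b = np.zeros(4)
w2, b2 = widen_linear_layer(w, b, new_out=6)
```

The silent-failure mode mentioned above is easy to imagine here: forget to fix up one consumer layer and everything still runs, with valid shapes, but the grown model's behavior quietly diverges.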

[–]TweetsInCommentsBot 5 points6 points  (0 children)

@ilyasut

2019-04-15 17:45

OpenAI Five was trained continuously for 10 months. Typical ML models are trained in under 2 weeks. The most capable ML systems of the future will be trained for an even longer time. https://twitter.com/gdb/status/1117845462608826368



[–]Ziggy_st 13 points14 points  (4 children)

Do you think it would be better if bots could train with only +1/-1 rewards for winning/losing, instead of RL with rewards for 'small' things like CS, wards, towers, etc.?

[–]nadipity[S] 16 points17 points  (2 children)

It'd definitely be interesting, and it would open up opportunities for the AI to learn ways to win that potentially don't follow the typical path of a Dota game. We did try this with 1v1 and saw some success, but haven't attempted it with 5v5.

[–]savvy_eh 6 points7 points  (0 children)

The smaller rewards seem to be a 'shortcut' to encourage the desired behavior to occur more quickly than it would organically, so it can be 'learned'.

The OAI5 team spent ten months training the current iteration. Imagine how long it would've taken if the AI had to first learn that hitting creeps might give gold, and that having gold might increase the chance of winning - or that taking damage might mean dying, and dying might mean losing.
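The shaped-vs-sparse trade-off above can be sketched in a few lines. The event names and weights here are made-up illustrations, not OpenAI Five's actual reward table:

```python
def shaped_reward(events, shaping_weight=1.0):
    """Combine a sparse win/loss signal with dense shaping terms.

    `events` is a dict of per-game quantities. Annealing `shaping_weight`
    toward 0 over training recovers the pure +1/-1 objective once the
    policy is competent enough to reach wins on its own.
    """
    dense = (
        0.16 * events.get("last_hits", 0)
        + 1.0 * events.get("towers_taken", 0)
        - 1.0 * events.get("deaths", 0)
    )
    sparse = events.get("win", 0) - events.get("loss", 0)  # +1 / -1
    return sparse + shaping_weight * dense

r = shaped_reward({"last_hits": 10, "win": 1})
```

The dense terms give the early, random policy a gradient toward behaviors (farming, taking towers, not dying) that correlate with the sparse signal it would otherwise almost never see.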

[–]FeIiix 5 points6 points  (1 child)

What hardware setup are the agents currently playing in the arena running on?

Have you done tests/benchmarks on how much different hardware affects agent performance?

Are there plans to release the trained model to the general public?

[–]hanmas_aaa 4 points5 points  (8 children)

Any plan to tone down the AI's reaction time so they can't instantly eul/hex blink initiators? Actually, are those plays really 200ms?

[–]nadipity[S] 47 points48 points  (7 children)

It's actually a bit less about pure reaction time and more about the AI never being surprised by the play. The real solution to making it more human-like would be to dynamically nerf the response depending on whether the play is coming from out of vision or whether it would otherwise be unexpected. When a human and the AI are racing to accomplish the same expected thing (such as grabbing the bounty rune), the human almost always wins.

[–]Kitchen_Owl 4 points5 points  (2 children)

Is there a way that the bots could learn other methods to win apart from the 5-man deathball observed in the games? Not that I'm saying it isn't effective, just curious whether they are capable of playing from behind (let's say), where one major win condition is ratting (destroying buildings) while the other team members engage in a fight. In short, can various strats be considered as early as the drafting phase against specific teams with specific playstyles?

[–]suchenzang 21 points22 points  (0 children)

There definitely is a way for Five to learn other methods, but we haven't explicitly encoded any of the strategies that Five ended up discovering (in this case, the 5-man deathball strat).

The goal of this project was to let Five discover these strategies through the process of training and selfplay, as opposed to explicitly enforcing a playstyle that mimics those from humans.

[–]FakePsyho 15 points16 points  (0 children)

The strats are way more varied than just a 5-man death push.

Due to its incredible teamfight coordination, a 5-man push is just way scarier in Five's hands than in humans' hands. Since Five only plays against itself, it greatly undervalues the expected power of a 5-man push vs. human players.

[–]kamelasa11 4 points5 points  (1 child)

Firstly, I love what you guys have done! Amazing work :-) Will there be more heroes in the mix any time soon? And please make it available to the public some time in the future as well!

I was looking at the architecture of your neural network and was confused about one thing. For each of the five heroes on one team, you take N units into account at any point (such as creeps, heroes, etc., which makes sense). But you need a fixed-size vector to feed your network. Is the procedure here to just take the max value for each element (some form of max-pooling)? E.g., if you have two units represented by the vectors [10, 7, 8] and [1, 2, 15], then the resulting vector is [10, 7, 15]. But say you are looking at a thousand units, and the max over those also results in [10, 7, 15]; those two states are not equal, even though the resulting vectors are. I guess max pooling also has this issue in 2D, but not to the same extent as here..
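The question's example can be made concrete. Below is a minimal sketch of elementwise max-pooling over per-unit vectors; it is a simplified stand-in for whatever set-processing layers Five actually uses, not their implementation. The collision the commenter describes is real, but in practice the per-unit vectors are learned embeddings rather than raw features, so the network can arrange for the information it needs to survive the max:

```python
import numpy as np

def pool_units(unit_embeddings):
    """Collapse a variable number of per-unit vectors into one
    fixed-size vector via elementwise max-pooling."""
    return np.max(np.stack(unit_embeddings), axis=0)

# Exactly the commenter's example: two different unit sets can pool
# to the same vector.
pooled = pool_units([np.array([10, 7, 8]), np.array([1, 2, 15])])
# pooled == [10, 7, 15]
```

Because `pool_units` accepts any number of inputs, the same network can observe 2 units or 1000 without changing shape; that permutation-invariance and size-invariance is the whole point of pooling over sets.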

[–]JustAprofile 4 points5 points  (2 children)

It seems that while the bots reached a far more optimal learned methodology for playing Dota, they still lagged behind in active reasoning in the middle of the game: operating only from a constrained set of parameters without employing strategy or creativity, specializing in a narrow discipline and excelling along those lines, possibly above any competitive team. Does there exist a way to instill creativity, or even narrow forms of higher-order reasoning, using either software or hardware solutions, to emulate some smaller parts of cognition?

[–]suchenzang 5 points6 points  (1 child)

OpenAI is currently forming a reasoning team to explore these topics (https://twitter.com/gdb/status/1116381180079656960)

[–]Lagmawnster 4 points5 points  (0 children)

As a (finishing) PhD student in computer science, currently working on my third publication involving deep learning and transfer learning: do you have any recommendations as to what could make my profile particularly interesting to companies like OpenAI? I know the general profile you're looking for from your recruiting pages, but would, for example, a deep-learning side project applying state-of-the-art methods to Dota 2 be worth noting?

[–]buck614 3 points4 points  (10 children)

How often will the AI update this weekend? After every game, day, or after the weekend is over? Also ... any additional info on how the AI updates after it finishes matches would be great!

[–]nadipity[S] 25 points26 points  (5 children)

from dfarhi:

The AI is not updating at all from the Arena games; we exported a frozen model from the training pipeline a few days ago. It has kept training against itself in the past few days, but we probably won't pull a new model, because the difference would be too minor to be worth the technical risk that comes with any change.

It might be an interesting research avenue to pursue incorporating human games into training, but with our current process those games would just get drowned out when averaged together with the millions of bot-vs-bot games. Fun fact: since opening, the Arena still has not produced as much total gameplay data as a single iteration (~1 min) of training.

[–]buck614 5 points6 points  (3 children)

I assume the 0.07% (currently) of games won by non-killing-machines will be looked at in some way. How do you analyze those games? Just curious.

[–]suchenzang 13 points14 points  (2 children)

The team will watch them and see if we find anything unusual. :)

[–]suchenzang 26 points27 points  (3 children)

Five will not learn from the games that are played during this weekend - it currently only trains via selfplay (games against itself). It currently does not train with any data taken from games between human players.

[–]buck614 7 points8 points  (2 children)

So this is purely a widely distributed public test of the AI ... not really incorporating its experiences over the weekend to learn upon?

[–]i will reach 1.83 , believe meNortrom_ 3 points4 points  (1 child)

Sorry to ask, but how can I play against the OpenAI bots?

[–]nadipity[S] 17 points18 points  (0 children)

If you go here -> https://arena.openai.com/#/, you can create a login linked with your Steam account and then request a server to play!

[–]TheSausageKing 2 points3 points  (0 children)

How do you feel about OpenAI changing from an open, non-profit, to a for-profit entity that keeps some research proprietary? Has it affected the work you do or your view of the organization?

[–]surrealmemoir 6 points7 points  (2 children)

Have you run into difficulties letting bots perform "big jumps" in their strategies? My understanding of deep learning is that with gradient descent, you usually make only small changes to the strategy each time.

For example, "macro" strategic decisions like 5-man vs. split push may deviate from each other significantly. If the bot is being improved mostly by self-play, how would you adapt if it turns out the split-push strategy is effective?

[–]suchenzang 16 points17 points  (0 children)

It's a bit unintuitive how strategy space would map onto some metric space on which we can perform gradient descent. The fact that we see Five learn these 5-man strategies doesn't necessarily imply that it's a "leap" to go to split push, given that we can't really quantify how far apart these "strategies" are in how we have parameterized our model.

[–]realjebby 7 points8 points  (2 children)

With AlphaStar there was the issue of it being too good at micro aspects ("mechanical skill") compared to a human, and such an advantage feels like a kind of cheating, like an aimbot in a shooter. I think OpenAI Five has a similar issue. It's just too good at mechanical-skill things like right-clicking (with Sniper) and casting spells (all 5 bots perfectly focusing someone) in a teamfight, but shows no signs of understanding the big picture (the macro aspect).

So which of two options would you prefer: developing a strong brute-force bot that can defeat any human team using that artificial mechanical-skill advantage, or a mechanically weak bot (below average skill) that can sometimes win by using different strategies, showing some kind of adaptation to what the opponent is doing ("understanding" the big picture)?