Emergent Complexity via Multi-agent Competition
Code for Environments and Trained Policies: https://github.com/openai/multiagent-competition
Task 1: Run to Goal
Task 2: You Shall Not Pass
Task 3: Sumo
Task 4: Kick and Defend
Training against Ensemble of Policies
Sumo agent trained in an ensemble of 3 policies
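As a rough illustration of what training against an ensemble of policies means in practice, the sketch below keeps a small pool of policies and, on each iteration, trains a randomly chosen member against an opponent sampled from the same pool. The Policy and rollout_and_update stubs are illustrative placeholders, not part of the released code.

    import random

    class Policy:
        # Stand-in for a trainable policy.
        def __init__(self, name):
            self.name = name

        def act(self, observation):
            # Placeholder; a real policy maps observations to joint torques.
            return 0.0

    def rollout_and_update(learner, opponent):
        # Placeholder for collecting an episode against `opponent` and updating `learner`.
        pass

    # Keep an ensemble instead of a single self-play opponent: sampling opponents
    # from the pool reduces overfitting to one adversary's quirks.
    ensemble = [Policy(f"policy_{i}") for i in range(3)]

    for iteration in range(1000):
        learner = random.choice(ensemble)    # member of the ensemble being trained
        opponent = random.choice(ensemble)   # opponent sampled from the same ensemble
        rollout_and_update(learner, opponent)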
Robustness of learnt policy to wind-attack
Left: Humanoid trained on Sumo
Right: Humanoid trained on walking
The length of the arrow indicates the magnitude of the applied force, which varies from 400 to 800
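The perturbation test can be pictured with a short sketch: a horizontal force in the range quoted above is applied to the agent's torso for the current simulation step. The mujoco-py calls, the body name "torso", and the push direction are assumptions made for illustration, not the exact test shown in the videos.

    import numpy as np

    def apply_wind(sim, body_name="torso", magnitude_range=(400.0, 800.0)):
        # Apply a horizontal "wind" force to one body for the current step,
        # assuming a mujoco-py MjSim with named bodies.
        body_id = sim.model.body_name2id(body_name)
        magnitude = np.random.uniform(*magnitude_range)
        direction = np.array([1.0, 0.0, 0.0])     # push along the x-axis
        sim.data.xfrc_applied[body_id, :3] = magnitude * direction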
Effect of Exploration Curriculum
Left: Kick and Defend agents trained without curriculum (no annealing of the dense exploration reward)
Right: Humanoid Sumo agent trained without curriculum (no annealing of the dense exploration reward)
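For context, the exploration curriculum scales a dense shaping reward (e.g. for standing and moving) by a coefficient that is annealed to zero early in training, leaving only the sparse competition (win/loss) reward afterwards; the videos above show agents for which this annealing is never applied. The sketch below is a minimal illustration of such a schedule; the annealing horizon is an assumed value, not the paper's setting.

    def anneal_coefficient(iteration, anneal_iters=500):
        # Linearly decay the exploration coefficient alpha from 1 to 0.
        # `anneal_iters` is an illustrative value.
        return max(0.0, 1.0 - iteration / anneal_iters)

    def shaped_reward(dense_reward, competition_reward, episode_done, alpha):
        # Early on (alpha near 1) the dense exploration reward dominates;
        # as alpha decays, training relies on the sparse win/loss signal alone.
        sparse = competition_reward if episode_done else 0.0
        return alpha * dense_reward + (1.0 - alpha) * sparse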