Emergent Complexity via Multi-agent Competition

Task 1: Run to Goal

Task 2: You Shall Not Pass

Task 3: Sumo

Task 4: Kick and Defend

Training against an Ensemble of Policies

Sumo agent trained in an ensemble of 3 policies
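
As a rough illustration of ensemble training, the sketch below assumes each side keeps several independently initialized policies and that one policy per side is drawn uniformly at random for every rollout. The function name `sample_matchup` and the string placeholders for policies are hypothetical and are not taken from the authors' code.

```python
import random


def sample_matchup(ensemble_a, ensemble_b, rng):
    """Draw one policy from each side's ensemble, uniformly at random.

    Assumption: uniform sampling per rollout; the actual sampling
    scheme may differ.
    """
    return rng.choice(ensemble_a), rng.choice(ensemble_b)


# Placeholder policies: in practice these would be trained networks.
ensemble_a = ["agent_policy_0", "agent_policy_1", "agent_policy_2"]
ensemble_b = ["opponent_policy_0", "opponent_policy_1", "opponent_policy_2"]

rng = random.Random(0)  # seeded only to make the sketch reproducible
matchups = [sample_matchup(ensemble_a, ensemble_b, rng) for _ in range(1000)]
```

Drawing a fresh pairing for every rollout means no single policy can overfit to one fixed opponent, which is the intuition behind training with an ensemble.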

Robustness of Learnt Policy to Wind Attack

Left: Humanoid trained on Sumo

Right: Humanoid trained on walking

The length of each arrow indicates the magnitude of the applied force, which varies from 400 to 800

Effect of Exploration Curriculum

Left: Kick and Defend agents trained without curriculum (no annealing of the dense exploration reward)

Right: Humanoid Sumo agent trained without curriculum (no annealing of the dense exploration reward)
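
For readers unfamiliar with the term, the sketch below shows one way an exploration curriculum of this kind could be written, assuming the dense exploration reward is mixed with the sparse competition reward through a coefficient that decays linearly from 1 to 0. The linear schedule, the function names, and the parameter `anneal_iterations` are illustrative assumptions, not details confirmed by this page.

```python
def annealing_coefficient(iteration, anneal_iterations):
    """Linearly decay from 1.0 at iteration 0 to 0.0 at anneal_iterations.

    Assumption: a linear schedule; other decay shapes are possible.
    """
    if anneal_iterations <= 0:
        return 0.0
    return max(0.0, 1.0 - iteration / anneal_iterations)


def shaped_reward(dense, sparse, iteration, anneal_iterations):
    """Blend the dense exploration reward with the sparse competition reward."""
    alpha = annealing_coefficient(iteration, anneal_iterations)
    return alpha * dense + (1.0 - alpha) * sparse


# Early in training the dense reward dominates; later only the sparse one remains.
early = shaped_reward(dense=2.0, sparse=10.0, iteration=0, anneal_iterations=100)
halfway = shaped_reward(dense=2.0, sparse=10.0, iteration=50, anneal_iterations=100)
late = shaped_reward(dense=2.0, sparse=10.0, iteration=100, anneal_iterations=100)
```

Under these assumptions, training without the curriculum corresponds to holding the coefficient fixed at 1, so the agents keep optimizing the dense exploration reward instead of the competition outcome.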