“Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, 2019-09-17:
We’ve observed agents discovering progressively more complex tool use while playing a simple game of hide-and-seek. Through training in our new simulated hide-and-seek environment, agents build a series of 6 distinct strategies and counter-strategies, some of which we did not know our environment supported. The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.
…Surprising behaviors: We’ve shown that agents can learn sophisticated tool use in a high-fidelity physics simulator; however, we learned many lessons on the way to this result. Building environments is not easy, and agents quite often find a way to exploit the environment you build, or the physics engine itself, in an unintended way. [video samples in original]
- Box surfing: Since agents move by applying forces to themselves, they can grab a box while on top of it and “surf” it to the hider’s location.
- Endless running: Without adding explicit negative rewards for agents leaving the play area, in rare cases hiders will learn to take a box and endlessly run with it.
- Ramp exploitation (hiders): Reinforcement learning is remarkably good at finding small mechanics to exploit. In this case, hiders abuse the contact physics to remove ramps from the play area.
- Ramp exploitation (seekers): In this case, seekers learn that if they run at a wall with a ramp at the right angle, they can launch themselves upward.
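The endless-running exploit above hints at the standard fix: penalize agents for leaving the play area. The sketch below is illustrative only and is not the reward function used in the original work; the arena size, penalty magnitude, and function names (`shaped_reward`, `arena_half_size`) are all assumptions for the sake of the example.

```python
def shaped_reward(base_reward, agent_pos, arena_half_size=10.0, penalty=-10.0):
    """Add an out-of-bounds penalty to the per-step hide-and-seek reward.

    base_reward: the original team reward for this step (e.g. +1 / -1).
    agent_pos:   (x, y) position of the agent.
    arena_half_size and penalty are illustrative values, not the ones
    used in the actual environment.
    """
    x, y = agent_pos
    outside = abs(x) > arena_half_size or abs(y) > arena_half_size
    # Only apply the penalty when the agent has left the square play area.
    return base_reward + (penalty if outside else 0.0)
```

With a term like this in place, running off with a box becomes strictly worse than hiding inside the arena, so the exploit is no longer a reward-maximizing strategy.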
…These results inspire confidence that in a more open-ended and diverse environment, multi-agent dynamics could lead to extremely complex and human-relevant behavior.