https://preview.redd.it/s3v81leu955c1.jpg?width=1920&format=pjpg&auto=webp&s=d25d4db609bd60f86b4acea7fd50870b5bce5849
Full video link
When fighting in an Endurance stage, the player faces two opponents in each round, one after the other (in this case, Kano first and Sonya second), and must beat them both to win.
This is not a trivial task: the player's health does not reset between opponents, so it is easy for the second opponent to win. That is exactly what happens in the first round, when Sonya kills Sektor.
But look what happens in the second round: the model found an easier way to win. It almost kills Kano, the first opponent, but instead of finishing him it engages in a robot dance until the round timer expires, securing the win on remaining health without ever facing the second opponent!
This is an emergent behavior resulting from RL training alone: no code or reward function was tweaked to obtain it. We have seen it happen consistently, with the model leveraging it to circumvent the intrinsic difficulty of that specific stage.
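To see why no reward tweaking is needed for this exploit to emerge, consider a hypothetical sketch (not the actual training code) of a plain win/loss round reward: because a timeout win pays the same as a knockout, stalling with a health lead is just as good a strategy as finishing the opponent.

```python
# Hypothetical sketch of a generic win/loss reward for one round.
# Names and values are illustrative, not taken from the actual project.

def round_reward(player_hp, opponent_hp, opponent_ko, timer_expired):
    """End-of-round reward under a plain win/loss scheme."""
    if opponent_ko:
        return +1.0  # win by knockout
    if timer_expired:
        # On timeout, the fighter with more remaining health wins the round.
        return +1.0 if player_hp > opponent_hp else -1.0
    return 0.0  # round still in progress, no terminal reward

# Stalling with a health lead is rewarded exactly like a knockout:
print(round_reward(80, 5, opponent_ko=False, timer_expired=True))   # 1.0
print(round_reward(80, 0, opponent_ko=True, timer_expired=False))   # 1.0
```

Since the agent reaches the same +1.0 either way, and dodging until the timer runs out also skips the second opponent entirely, the policy has every incentive to discover exactly the dance seen in the video.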
One of the most fascinating aspects of Reinforcement Learning is watching emergent behaviors accomplish a task in ways you would never have expected.