[P] The Power of Reinforcement Learning: look how this DeepRL Sektor model found a smart, super-cool exploit for Ultimate Mortal Kombat 3 in the video of a submission on the DIAMBRA competition platform!

gwern · 2023-12-10T01:40:33+00:00

Pausing or running out the clock or looping is a classic reward-function hack.
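Here is a toy sketch of why running out the clock falls out of the reward. It assumes the only reward is +1/−1/0 at the end of a round, and that the opponent cannot punish a stalling player. Both assumptions are hypothetical and are not DIAMBRA's actual reward or UMK3's real mechanics. It computes the exact expected return of two made-up policies, `always_attack` and `stall_when_ahead`. All names and the `HIT_PROB` parameter are illustrative.

```python
from functools import lru_cache

# Toy model (hypothetical, not DIAMBRA's actual reward or UMK3's mechanics):
# state = (agent_hp, opp_hp, steps_left); the only reward comes at round end.
HIT_PROB = 0.5  # assumed chance an attack lands instead of being countered


def terminal_reward(agent_hp: int, opp_hp: int) -> float:
    """+1 for a win, -1 for a loss, 0 for a draw when the timer runs out."""
    return float((agent_hp > opp_hp) - (agent_hp < opp_hp))


def round_value(policy, agent_hp: int, opp_hp: int, steps_left: int) -> float:
    """Expected end-of-round reward when following `policy` from this state."""

    @lru_cache(maxsize=None)
    def v(a: int, o: int, t: int) -> float:
        if a <= 0:
            return -1.0  # agent knocked out
        if o <= 0:
            return 1.0  # opponent knocked out
        if t == 0:
            return terminal_reward(a, o)  # time out: the health lead decides
        if policy(a, o) == "stall":
            # Assumption: the opponent cannot punish stalling, which is
            # exactly the weakness this kind of exploit relies on.
            return v(a, o, t - 1)
        # Attack: either land a hit or eat a counter-hit.
        return HIT_PROB * v(a, o - 1, t - 1) + (1 - HIT_PROB) * v(a - 1, o, t - 1)

    return v(agent_hp, opp_hp, steps_left)


def always_attack(a: int, o: int) -> str:
    return "attack"


def stall_when_ahead(a: int, o: int) -> str:
    # Run out the clock as soon as the agent has a health lead.
    return "stall" if a > o else "attack"


if __name__ == "__main__":
    for name, pol in [("always_attack", always_attack),
                      ("stall_when_ahead", stall_when_ahead)]:
        print(name, round_value(pol, 2, 1, 5), round_value(pol, 3, 3, 10))
```

Under this toy model, stalling locks in the maximum value of 1.0 from any state with a lead. Always attacking from a 2-vs-1 lead with five steps left is worth only 0.5. An optimizer that maximizes this reward will therefore learn to stall.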

NSADataBot · 2023-12-09T19:29:43+00:00

I'm not familiar with the full game mechanics, but is that an "exploit" as in a cheat? It's pretty cool for sure; good job.

UnusualClimberBear · 2023-12-11T06:53:54+00:00

In my experience, policy optimization is a very good way to debug your simulator.

MachineLearning