
[–]gwern 3 points4 points  (1 child)

Pausing or running out the clock or looping is a classic reward-function hack.

[–]DIAMBRA_AIArena[S] 0 points1 point  (0 children)

Absolutely yes, and it is very interesting to see how these hacks emerge.

Think about it from a game designer's point of view: you might want a tool that finds these reward hacks so you can close loopholes in the AI gameplay design and iterate on your development.

[–]NSADataBot 2 points3 points  (4 children)

I'm not familiar with the full game mechanics but is that an "exploit" as in a cheat? It's pretty cool for sure, good job.

[–]DIAMBRA_AIArena[S] 6 points7 points  (3 children)

Thanks for the comment! Well, this is different from the typical cheats that write values into memory to alter the game's execution flow.

This is a sequence of moves that the RL agent discovered to trick the CPU's classical AI (scripted bots, typically behavior trees) into letting the time expire.

The coolest thing is that there was no specific reward for that; the algorithm just discovered that this strategy led to a higher overall reward in the long run!

[–]NSADataBot 1 point2 points  (0 children)

Cool stuff man

[–]gwern 1 point2 points  (1 child)

What happens if the AI-controlled character just stands in place to run out the clock? Is the learned pacing back and forth actually necessary to keep the game scripts from going into attack sequences?

[–]DIAMBRA_AIArena[S] 1 point2 points  (0 children)

This is easy to test with our open-source DIAMBRA Arena Python package: just script a no-op Python agent interfaced with the game. But I can tell you that in that case the opponent chases you and kills you. We use that scripted no-op agent to run some CI/CD tests on our library.
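To make the idea concrete, here is a minimal sketch of such a no-op agent. It is written against a generic Gymnasium-style `reset()`/`step()` loop and a tiny made-up environment, not against the actual DIAMBRA Arena API: `ToyFightEnv`, its action numbering, its damage values, and `run_noop_episode` are all hypothetical names and numbers chosen only for illustration. The toy opponent simply walks toward the player and attacks once adjacent, which mimics the "chases you and kills you" behavior described above.

```python
class ToyFightEnv:
    """Hypothetical stand-in for a fighting game (NOT the DIAMBRA Arena API)."""

    NOOP, LEFT, RIGHT = 0, 1, 2  # assumed action numbering, illustration only

    def __init__(self, max_steps=100):
        self.max_steps = max_steps  # round timer, in steps

    def reset(self, seed=None):
        self.player, self.opponent = 0, 5  # positions on a 1-D stage
        self.health = 100                  # player's health
        self.steps = 0
        return self._obs(), {}

    def _obs(self):
        return {"player": self.player, "opponent": self.opponent,
                "health": self.health}

    def step(self, action):
        self.steps += 1
        if action == self.LEFT:
            self.player -= 1
        elif action == self.RIGHT:
            self.player += 1
        # NOOP: the player stays where it is.

        reward = 0
        gap = self.opponent - self.player
        if abs(gap) > 1:
            # Scripted opponent closes the distance by one cell.
            self.opponent -= 1 if gap > 0 else -1
        else:
            # Opponent is adjacent: it attacks.
            self.health -= 10
            reward = -10

        terminated = self.health <= 0                         # knocked out
        truncated = not terminated and self.steps >= self.max_steps  # time up
        return self._obs(), reward, terminated, truncated, {}


def run_noop_episode(env, noop_action=0):
    """Play one episode sending only the no-op action; return a summary."""
    obs, info = env.reset()
    total_reward, steps = 0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(noop_action)
        total_reward += reward
        steps += 1
    return {"steps": steps, "total_reward": total_reward,
            "terminated": terminated, "truncated": truncated}


if __name__ == "__main__":
    print(run_noop_episode(ToyFightEnv()))
```

In this toy setup the standing-still agent is knocked out well before the timer expires, so the episode ends with `terminated` rather than `truncated`. Swapping `ToyFightEnv` for a real environment would require checking that environment's own documentation for the correct no-op action.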

Here are some useful links:

* 🌐 Homepage: https://diambra.ai/
* 🖥️ Github: https://github.com/diambra
* 💬 Discord: https://diambra.ai/discord
* 📺 Twitch: https://www.twitch.tv/diambra_ai
* 📚 Documentation: https://docs.diambra.ai/
* 📄 Paper: https://arxiv.org/abs/2210.10595

Please do not hesitate to get in touch, we are very happy to collaborate with people passionate about this amazing field!

[–]UnusualClimberBear 1 point2 points  (1 child)

In my experience, policy optimization is a very good way to debug your simulator.

[–]DIAMBRA_AIArena[S] 0 points1 point  (0 children)

That's a great point, perfectly aligned with a use case we mentioned in one of our previous comments: finding loopholes in the AI gameplay design.
