Hello everyone, for the past year I've been working on a project that uses reinforcement learning to learn PvP in Old School RuneScape. I've finally reached a point where I'm satisfied with the result, so I've open-sourced (most of) the project and released a YouTube video going over how it works.
The video stays high-level to keep it accessible, but the code is comprehensive and has a ton of cool stuff, including:
- Full PPO implementation
- Self-play strategies including prioritized past-self play
- Autoregressive and parameterized multi-discrete actions with action masking
- Full game state visibility for the critic network (can see full player and opponent information)
- Customizable model architectures
- Reward and observation normalization
- Novelty reward using running observation statistics
- AsyncIO vectorized environment
- Distributed rollout collection using Ray
There's too much to list here, so check out the code if you're curious!
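For anyone unfamiliar with the action masking item above, here's a minimal sketch of the general idea in plain Python. This is not code from the project: the function names (`masked_probs`, `sample_multi_discrete`) and the example action heads are hypothetical, and it samples each head independently rather than autoregressively. Each action head gets its own logits and a boolean mask, and invalid choices are forced to zero probability before sampling.

```python
import math
import random


def masked_probs(logits, mask):
    """Softmax over logits, giving zero probability to masked-out actions."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_multi_discrete(head_logits, head_masks, rng=random):
    """Sample one index per action head, respecting each head's mask."""
    action = []
    for logits, mask in zip(head_logits, head_masks):
        probs = masked_probs(logits, mask)
        choice = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        action.append(choice)
    return action


# Hypothetical example: two heads, e.g. "attack style" and "eat food".
head_logits = [[0.2, 1.5, -0.3], [0.0, 0.7]]
head_masks = [[True, False, True], [True, False]]  # False = not allowed now
print(sample_multi_discrete(head_logits, head_masks))
```

In a real PPO setup the same mask also has to be applied when computing log-probabilities and entropy during the update, otherwise the policy ratio is computed against a different distribution than the one that was sampled from.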
For those who are understandably concerned: nothing released here lets people use these models in the real game. The open-sourced code is purely for training and evaluating in a simulation.