
[–]CanadianTueroPhD 15 points (3 children)

As a fellow scaper, I find this very interesting! I've always wanted to do something similar, especially creating some sort of stripped-down engine so that forward search techniques could be applied to it. I'll have to check out how you integrate with Elvarg. If you don't mind me asking, what resources did you use to figure out how to create hooks for the custom environments?

[–]Naton1-[S] 1 point (2 children)

Glad to hear you find it interesting! Let me know if you have any questions about how the integration with Elvarg works. At a high level, I built a 'remote environment' socket server into Elvarg that waits to receive requests and follows an interface similar to the standard Gym API to control the agents. Happy to provide more info where I can - would you mind elaborating on what you mean by hooks for the custom environments?
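
From the Python side, the interaction ends up looking roughly like this (a minimal sketch - the message format and field names here are hypothetical, not the exact code in the repo):

```python
import json
import socket


class RemoteElvargEnv:
    """Gym-style client for the RSPS's 'remote environment' socket server (sketch)."""

    def __init__(self, host: str = "localhost", port: int = 7070):
        self.sock = socket.create_connection((host, port))
        self.io = self.sock.makefile("rw")

    def _request(self, message: dict) -> dict:
        # One JSON message per line; the server replies with one JSON line.
        self.io.write(json.dumps(message) + "\n")
        self.io.flush()
        return json.loads(self.io.readline())

    def reset(self):
        return self._request({"type": "reset"})["observation"]

    def step(self, action: int):
        reply = self._request({"type": "step", "action": action})
        return reply["observation"], reply["reward"], reply["done"], reply.get("info", {})
```

The training loop then treats it like any other Gym environment: `obs = env.reset()`, then `obs, reward, done, info = env.step(action)` once per game tick.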

[–]CanadianTueroPhD 2 points (1 child)

I mean, suppose I wanted to create environments for learning to quest (do Cook's Assistant). The hooks I would need would be to get the flags for when subsections inside the Cook's Assistant quest book entry are completed (to give partial rewards), the text from the dialog boxes when interacting with the quest NPCs, and any inventory items held, as a state observation. Are these easy to add for custom environments?

Ninja Edit: Also, you mentioned you tested this on the real game. Are you able to get the same data required for your state observations from the official client (RuneLite + plugins) as you are from the sim environment? Have you also tried taking trajectories from the sim client as training data for supervised learning, so you can run on an image observation alone? I.e., when running inference on the trained RL model with the in-game info as observations, save the current image + action taken and use that as a dataset for supervised learning.
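
I.e., something along these lines while the trained policy plays, just to build a behavioural-cloning dataset (hypothetical sketch, made-up names):

```python
import json
import time
from pathlib import Path

from PIL import Image


def log_transition(screenshot: Image.Image, action: int, out_dir: str = "bc_dataset") -> None:
    """Save one (image, action) pair produced by the RL policy for later supervised training."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = str(time.time_ns())
    screenshot.save(out / f"{stamp}.png")                               # image observation
    (out / f"{stamp}.json").write_text(json.dumps({"action": action}))  # action label
```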

Sorry, one final question (this is exciting, haha). You mentioned your accounts were disabled. Did Jagex detect you were botting, or did you self-disable out of personal integrity? If Jagex caught you, it would be a good experiment to play around with misclicks and other classic anti-bot behaviour to see where the boundaries of their bot detection are (I bet Jagex would appreciate that data).

[–]Naton1-[S] 2 points (0 children)

Since it's hooking into an RSPS for training, it has access to the full server state, so it'd be pretty easy to add new observations like that. The RSPS can easily be modified for this. The one thing to keep in mind is that the server has access to some state that the client doesn't, so if you want to keep it "fair" (or test on the real game later on), you'd want to make sure you only use observations the client can see.
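
As a rough sketch of what a server-side observation for something like Cook's Assistant could look like (hypothetical names - the actual observation code in the repo is structured differently):

```python
def build_quest_observation(player) -> list[float]:
    """Server-side observation for a quest environment (illustrative only).

    The RSPS owns the full game state, so each feature is just a lookup - but if
    you want parity with the real game, stick to things the client could also see.
    """
    return [
        player.quest_stage("cooks_assistant") / 4.0,          # partial-progress flag, normalized
        float(player.inventory.contains("bucket_of_milk")),
        float(player.inventory.contains("pot_of_flour")),
        float(player.inventory.contains("egg")),
        float(player.dialogue_open),                          # is a dialogue box currently open?
    ]
```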

Yes, I was able to get the same data from the real game that I used in the sim. It wasn't easy in some cases, though, because the client doesn't have everything immediately available (such as tick counters like ticks until next attack or ticks until unfrozen), which took a lot of detail work to figure out. You end up having to track a ton of state client-side to recreate this information by listening to animations, graphic changes, and so on.
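
As an illustration of that client-side bookkeeping, recreating a "ticks until unfrozen" counter looks conceptually like this (Python sketch of the idea only - the graphic IDs and durations are illustrative, and the real version lives in the client-side code):

```python
class FreezeTracker:
    """Rebuild a server-only 'ticks until unfrozen' timer from events the client can see."""

    # Illustrative mapping of freeze graphic IDs -> freeze duration in game ticks.
    FREEZE_GRAPHICS = {369: 33, 367: 25, 363: 16}

    def __init__(self) -> None:
        self.ticks_remaining = 0

    def on_graphic_changed(self, graphic_id: int) -> None:
        # Called when a new graphic plays on the local player (e.g. an ice spell landing).
        if graphic_id in self.FREEZE_GRAPHICS:
            self.ticks_remaining = self.FREEZE_GRAPHICS[graphic_id]

    def on_game_tick(self) -> None:
        # Count down once per game tick.
        self.ticks_remaining = max(0, self.ticks_remaining - 1)
```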

I didn't try using game images as data because it would be finicky (things like brightness or incorrect sprites/graphics could throw it off). It'd also require significantly more compute to render game clients for every agent (when training I would have 1000+ agents across multiple simulations). I think it'd be interesting to explore learning from experiences on the real game too, but my main concern is that it'd be so data-inefficient.

Unfortunately, Jagex did ban my accounts in this case - completely understandable though. I was kind of expecting it since it was doing crazy perfect 8-way gear switches in a single tick, haha.

One callout: the current environment setup is specific to PvP, but I'm sure with some engineering work you could modify it to support other types of tasks too.

[–]KomradKot 9 points (1 child)

I've always thought of OSRS as an amazing potential environment for RL agent research. The tile system and low tick rate allow for easier state representation and simulation than games with higher resolution. The game world is also extremely rich, and quests are quite varied and often require reading between the lines to figure out what needs to be done (no repetitive "kill NPC x, y times" trope like other MMOs). With the availability of open-source clients and p-server code, I'm thinking a "Tutorial Island to Dragon Slayer" challenge would be a good benchmark for autonomous generalist agents.

[–]Naton1-[S] 0 points (0 children)

That's quite an interesting idea! The simplicity of OSRS, as you've mentioned, gives it a lot of potential for experimenting with more generalized learning and exploration too, instead of just a well-defined task like PvP here. It makes me think of how people have experimented with RL tasks in Minecraft too.

[–]preordains 1 point (3 children)

I have been looking at your code for longer than I would like to admit and I'm struggling to see one thing: does this RSPS allow you to interact with it by only sending actions, and have it return an observation to you? How does the interaction with the game take place?

[–]Naton1-[S] 2 points (0 children)

I've modified the RSPS to include a built-in server to handle these action requests and return observations, and that code is all available in the GitHub repo too. I'll link some critical parts here and that should answer your question!
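
Conceptually, the built-in server does something like this for each connected agent (just a Python sketch - the real implementation is part of the RSPS code, and the names here are made up):

```python
import json
import socketserver


class AgentHandler(socketserver.StreamRequestHandler):
    """One connection per agent: read an action request, apply it on the next game
    tick, then reply with the resulting observation/reward (illustrative sketch)."""

    def handle(self):
        for line in self.rfile:                      # one JSON request per line
            request = json.loads(line)
            if request["type"] == "step":
                # Queue the action for the game loop and block until the tick is processed.
                obs, reward, done = self.server.game.apply_and_wait(request["action"])
                reply = {"observation": obs, "reward": reward, "done": done}
            else:  # "reset"
                reply = {"observation": self.server.game.reset_agent()}
            self.wfile.write((json.dumps(reply) + "\n").encode())


# Hypothetical wiring:
# server = socketserver.ThreadingTCPServer(("0.0.0.0", 7070), AgentHandler)
# server.game = ...  # handle into the RSPS game loop
# server.serve_forever()
```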

[–]toastjam 1 point (1 child)

They said they're not releasing the plugin that interacts with the game, just the RL training code itself.

[–]Naton1-[S] 0 points (0 children)

The RSPS code is actually available there too! So what's open-sourced will allow training and testing models strictly on an RSPS. What's excluded is anything that would directly run on the real game - for example, the video has clips of testing the models on the real game, and that sort of thing is not available.

[–]itsPixels 1 point (1 child)

This is absolutely incredible work! How do you think the model would fare in a more typical edge-style fight instead of NHing, if training were optimized more towards KO potential instead of outlasting? Might be something I need to try myself, as it's the style that interests me more. Anyhow, this is just fantastic (and slightly worrying)!

[–]Naton1-[S] 0 points (0 children)

Edge-style fights would be super cool to explore too. I was thinking something like Dharok's fights would be interesting because there's the trade-off of keeping low HP to do more damage while also trying not to die. It could likely learn it, but it may require a fair bit of experimentation to get right.

You can see a half-implemented Dharok's-style environment in the code, but I never finished it since NH-style fights were my real objective here. Would be super interested to see if anyone experiments with something like that!
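
If anyone does pick it up, the interesting bit is the reward shaping. Something like this (purely illustrative numbers and names, not the half-finished code in the repo) captures the trade-off:

```python
def dharoks_reward(prev, curr, max_hp: int = 99) -> float:
    """Illustrative per-tick reward for a Dharok's-style fight: damage is worth more
    at low HP (mirroring the set effect), but dying is penalized heavily, so the
    agent has to balance staying low against staying alive."""
    damage_dealt = max(0, prev.target_hp - curr.target_hp)
    low_hp_bonus = 1.0 + (max_hp - curr.own_hp) / max_hp   # grows as own HP drops
    reward = damage_dealt * low_hp_bonus
    if curr.own_hp <= 0:
        reward -= 100.0   # dying outweighs any damage bonus
    if curr.target_hp <= 0:
        reward += 100.0   # winning the fight
    return reward
```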

[–]infinitay_ 1 point (2 children)

I was wondering when someone would finally do something like this given how prominent RSPSes are. The game seems like a match made in heaven for reinforcement learning. Given how many people bot/cheat in OSRS, it's only a matter of time until PvP is flooded with PK bots with the help of this.

Not to blame you, OP - this is a really cool project and great work. But I am sure there will be a script kiddie adding support for their cheat clients within a month or so.

Anyways, I find it fascinating how it learns that it's better to stand under your opponent in combat so your opponent can't hit you as easily. Furthermore, everything is done tick-perfect, even one-tick actions such as armor swaps, so it's even more advanced. One idea I had when considering RL within an RSPS was manipulating the game tick rate for faster training. Although, thinking about it again, I'm not too sure how to scale it back up to 600 ms. My first thought was to just add a delay to the actions. My second thought was to fine-tune the model on 600 ms/tick after training it on, say, 100 ms/tick.

[–]Naton1-[S] 0 points (1 child)

Interesting you mention speeding up the game-tick rate. This kind of scenario is perfect for that!

I actually did speed up the tick rate here and made it dynamic. When the server goes through and processes each player, each individual agent will essentially block until it has an action for the current tick (or if it already has an action generated, it won't block at all). At the end of processing each tick, it immediately starts the next, so the only delay is the time it takes to generate actions.

There's no actual scaling or anything that needs to be done to get the models to work on different tick rates with the approach that 1 game tick = 1 step. As long as you can observe the environment at the end of each game tick, the underlying duration doesn't really matter!
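
In pseudocode, the sped-up loop boils down to roughly this (a sketch only - the real loop is inside the RSPS, and these names are made up):

```python
def run_game_loop(game, agents) -> None:
    """Dynamic tick rate: process the next tick as soon as every agent has an action,
    instead of sleeping 600 ms. Since 1 game tick = 1 RL step, the wall-clock tick
    duration never appears in the model's view of the world."""
    while not game.finished():
        for agent in agents:
            action = agent.wait_for_action()   # blocks only if no action is queued yet
            agent.apply(action)
        game.process_tick()                    # normal server-side tick processing
        for agent in agents:
            agent.send_observation()           # end-of-tick observation back to the trainer
```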

[–]infinitay_ 1 point (0 children)

with the approach that 1 game tick = 1 step

Oh you're right. I didn't even consider that but it makes perfect sense. Thanks for clearing it up.

[–]hazard02 0 points (1 child)

How important is the novelty reward?

[–]Naton1-[S] 3 points (0 children)

The novelty reward wasn't a game-changer, but I found that it did help a bit with exploration by rewarding the model when in rare/unseen states. Also note that the novelty reward was annealed to 0, so at the end, there was no novelty reward.

One of the situations that sparked motivation for it was learning to drink potions. At the start of this project, the model struggled with potions. It learned they were beneficial, so it would drink them, but it never stopped drinking them until they were empty. For example, it should ideally drink a boost potion at the start and only re-drink when its stats are lower (such as after drinking a brew), but it would just drink all the boost potions right away.

I was able to get around this originally by masking out drinking potions when there was no real benefit, which worked, but the novelty reward concept could also help here.
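
For reference, the general shape of a count-based novelty bonus with annealing is something like this (illustrative only, not the exact formulation I trained with):

```python
from collections import defaultdict

import numpy as np


class NoveltyBonus:
    """Count-based novelty reward that linearly decays to zero over training, so rare
    states (e.g. unusual potion/stat combinations) get a small extra reward early on
    and only the task reward remains at the end."""

    def __init__(self, scale: float = 0.05, anneal_steps: int = 1_000_000) -> None:
        self.counts = defaultdict(int)
        self.scale = scale
        self.anneal_steps = anneal_steps
        self.step = 0

    def __call__(self, observation: np.ndarray) -> float:
        self.step += 1
        key = tuple(np.round(observation, 1))                    # coarse state discretization
        self.counts[key] += 1
        anneal = max(0.0, 1.0 - self.step / self.anneal_steps)   # 1 -> 0 over training
        return anneal * self.scale / np.sqrt(self.counts[key])
```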

[–]low-day-leh-sun 0 points (1 child)

Good work!!

[–]Naton1-[S] 0 points (0 children)

Thank you!