
Adventures in Tensorflow & reinforcement learning with RuneScape


Recommended Posts

This is a topic I've seen mentioned on the forums a number of times.  If you're expecting to hear about a great success, check back in a few months: the current working product, with roughly 10 hours of development behind it, is as yet pretty useless, but still promising.

 

### Background information

Q-learning is a reinforcement learning technique built from an observable environment, an actor, and a reward metric.  In our case, the environment is a screen full of pixels (760 * 510 * 3 for width * height * 3 color channels) of 8-bit integers, the actor is going to be a screen clicker, and the reward is going to be a combination of the current HP and current damage dealt.
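To make the environment/actor/reward loop concrete, here is a minimal sketch of the tabular Q-learning update rule.  All names and numbers are illustrative (the actual bot approximates Q with a neural network over raw pixels rather than a lookup table):

```python
# Minimal tabular Q-learning sketch; hypothetical states/actions for illustration.
ALPHA = 0.1   # learning rate
GAMMA = 0.99  # discount factor for future reward

def q_update(q_table, state, action, reward, next_state, actions):
    """One Q-learning step: nudge Q(s, a) toward reward + discounted best future value."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

q = {}
q_update(q, "low_hp", "eat", 10.0, "full_hp", ["eat", "attack"])
```

Repeating this update over many episodes is what lets the actor discover which actions lead to reward.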

TensorFlow is a machine learning toolkit/library that supports convolutional neural networks.  It can fit almost any mapping, but needs thousands of examples to "learn" the relationship between an input (i.e. a screen full of pixels) and an output (in our case, an action).  I chose TensorFlow because of its easy-to-use API and built-in optimizers (i.e. it works well out of the box with little customization).

TensorFlow has Java bindings, but works more natively in Python (and examples are certainly few and far between on the Java side).  I decided a pixel bot written in Python would be a good first pass.

 

### A Repeatable Environment

For Q-learning to actually learn, the bot has to be able to make a series of decisions from the same starting state.  This is challenging for a game like RuneScape, because we can't reset the world at our whim.  The only solution was therefore to use a private server (RSPS).  With a private server, I could adjust the player respawn point, NPC respawn rate, even the tick rate to make the bot train faster.  After fiddling with a few out-of-the-box solutions from another forum, I settled on a 317 clone with a working God Wars Dungeon.  I was now ready to make a General Graardor bot.

I also needed a performance metric for Q-learning to work.  In GWD, success is a combination of staying alive, keeping HP up (including activating prayer and drinking a prayer potion), and getting kills.  But that information wasn't available just from looking at the screen (my only view into the client, since I had blockaded myself from TRiBot by using Python).  I partly solved this by adding a RabbitMQ inter-process-communication server (essentially allowing the Java server to feed more detailed information about my character back to the bot in Python).  The final equation was time_alive ^ 2 + hp_change ^ 3, rewarding a combination of eating and staying alive.
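As a sanity check, the stated reward formula is easy to sketch directly (variable names here are my own; the post only gives the equation):

```python
def reward(time_alive, hp_change):
    """Reward from the formula time_alive^2 + hp_change^3.

    Cubing hp_change preserves its sign, so taking damage (negative
    hp_change) is penalized while eating (positive hp_change) is rewarded,
    and squaring time_alive increasingly rewards longer survival."""
    return time_alive ** 2 + hp_change ** 3

# e.g. surviving 10 ticks while eating back 5 HP:
reward(10, 5)  # 100 + 125 = 225
```

One quirk worth noting: because hp_change is cubed, a large heal can dwarf the survival term, which may be part of why the reward needs further tuning for prayer.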

 

### Results

After one night of training in a VM, I managed only 8,000 iterations (8,000 clicks).  Looking at the flow of the program, it seems that TensorFlow's latency isn't good enough to get one click per tick (essentially one action per game update).  I also definitely need a smarter reward for prayer.

It... kinda sucks.  Unsurprisingly, the bot did not figure out how to connect 387,600 (760 * 510) inputs to a reliable output in only 8,000 iterations.

 

### Future Works

It's now obvious that training will take too long if I simply expect the bot to interact with raw pixels and no higher-level interfaces.  Forcing the network to learn that pixel (600, 400) is a shark, but only when input (600, 400) is colored properly, is taking too long.  By restricting the actions to things like Click Inventory Item, Click NPC, Enable Prayer, and Disable Prayer instead of the 387,600 possible mouse locations, the network should converge more easily.  Now I just need to get TRiBot to hook into an RSPS (does anybody know how?).
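The restricted action space amounts to a small dispatch table: the network outputs an index, and a thin layer translates it into one high-level client call.  A sketch, with hypothetical client hooks standing in for whatever the Java side actually exposes:

```python
# Reduced action space: the network picks an index instead of one of
# 387,600 pixel coordinates.  Client methods below are hypothetical.
ACTIONS = ["CLICK_INVENTORY_ITEM", "CLICK_NPC", "ENABLE_PRAYER", "DISABLE_PRAYER"]

class StubClient:
    """Stand-in for the real client hooks; records the last call made."""
    def click_inventory(self, slot):
        self.last = ("inventory", slot)
    def click_npc(self, name):
        self.last = ("npc", name)
    def set_prayer(self, enabled):
        self.last = ("prayer", enabled)

def perform(action_index, client):
    """Translate a network output index into one high-level client call."""
    name = ACTIONS[action_index]
    if name == "CLICK_INVENTORY_ITEM":
        client.click_inventory(slot=0)
    elif name == "CLICK_NPC":
        client.click_npc("General Graardor")
    elif name == "ENABLE_PRAYER":
        client.set_prayer(True)
    else:
        client.set_prayer(False)
    return name
```

With only four outputs instead of hundreds of thousands, each training iteration carries far more signal per click.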

 

### The code/Contribute(???)

I didn't include the code for convenience reasons, not because of secrecy.  If someone wants it, I'll post it.  It's just not worth getting excited over right now.  If any other developers are interested in doing this project with me, I'd be happy to explain the code, not just the premise.


If you are brute-forcing the pixels, you should probably look at using a CNN to take advantage of spatial coherence.

I agree the better route would be setting up an environment using the TRiBot API and linking that to a network, so it can simply output the actions that can be taken. You may also want to use a DDQN instead of plain Q-learning. Even if you use the TRiBot API to build the environment, the state space will be large.

For training such a network, I think a first try might be recording your gameplay and having it use that as training data to give it a start in making the right choices. Not sure what that would look like or how it would perform. Good luck, this sounds like a cool project! Also, use Keras all the way, it's much easier.
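That suggestion is essentially behavior cloning: log (state, action) pairs while a human plays, then pretrain the network on them as a supervised problem before Q-learning takes over.  A rough sketch of what the recording side could look like (the format and field names here are made up):

```python
# Hypothetical demonstration logger for behavior cloning.
def record_step(log, state, action):
    """Append one observed (state, action) pair to the demonstration log."""
    log.append({"state": state, "action": action})

def to_training_data(log):
    """Split the log into supervised (inputs, labels) lists for pretraining."""
    states = [step["state"] for step in log]
    actions = [step["action"] for step in log]
    return states, actions

demo = []
record_step(demo, {"hp": 99, "prayer": 70}, "CLICK_NPC")
record_step(demo, {"hp": 40, "prayer": 65}, "CLICK_INVENTORY_ITEM")
```

The pretrained weights then serve as the starting point for the Q-learning phase instead of a random initialization.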

3 hours ago, Apothum said:

If you are brute-forcing the pixels, you should probably look at using a CNN to take advantage of spatial coherence.

I agree the better route would be setting up an environment using the TRiBot API and linking that to a network, so it can simply output the actions that can be taken. You may also want to use a DDQN instead of plain Q-learning. Even if you use the TRiBot API to build the environment, the state space will be large.

For training such a network, I think a first try might be recording your gameplay and having it use that as training data to give it a start in making the right choices. Not sure what that would look like or how it would perform. Good luck, this sounds like a cool project! Also, use Keras all the way, it's much easier.

Yeah, I hooked a bot up to my private server to take advantage of the API and translated the actions I expect from a Java wrapper into a Python script.  That heavily reduces the action space from hundreds of thousands to a couple hundred.  I'm planning on training some this weekend; I'll give updates if it's worth anything.
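Since the post doesn't show the bridge's message format, here is a hypothetical sketch of what the Java <-> Python translation layer could exchange over the RabbitMQ queue: Python publishes a high-level action, and the Java client replies with character state (field names are illustrative, not the real schema):

```python
import json

def encode_action(name, **params):
    """Serialize an action request for the Java side of the bridge."""
    return json.dumps({"type": "action", "name": name, "params": params})

def decode_state(payload):
    """Parse the state reply into the fields the reward function needs."""
    msg = json.loads(payload)
    return msg["hp"], msg["damage_dealt"], msg["time_alive"]
```

Keeping the wire format to plain JSON means either side can be swapped out (e.g. RSPS today, TRiBot later) without touching the network code.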


I've made huge progress on the wrapper, so I'm quite excited.  I've written a Python/Java library that works with any bot with little code (TRiBot is where I want to end up, but for now I need an RSPS).

 

Controlling the bot's clicks from a Python notebook: https://gfycat.com/reflectingeminentindiancow

 

The pixel bot is dead, long live the hook.  Now to connect the input/output layers of the neural network to the new hooks.

  • 2 weeks later...

Looks interesting!

 

Can you speed up the process by hardcoding some factors that are always true, like:

  • praying Protect from Melee if you're tanking / Protect from Missiles if you're not the tank / even adding prayer flicking
  • don't eat until x HP / re-pot when stats are lowered / prayer pot
  • equipping Guthan's after the boss is dead
  • teleing out once you have x food left

Or am I missing the point of the whole learning process? I just think a lot of time is wasted learning something that doesn't need to be learned.
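Those heuristics could be layered over the learned policy as fixed rules that are checked first, with the network only deciding when no rule fires.  A sketch, with made-up thresholds and action names:

```python
# Hypothetical rule-based overrides on top of a learned policy.
EAT_BELOW_HP = 45
PRAYER_POT_BELOW = 20

def choose_action(state, learned_policy):
    """Apply hardcoded rules first; fall back to the network otherwise."""
    if state["food_left"] == 0:
        return "TELEPORT_OUT"              # tele out when out of food
    if state["hp"] < EAT_BELOW_HP:
        return "CLICK_INVENTORY_ITEM"      # always eat below the threshold
    if state["prayer"] < PRAYER_POT_BELOW:
        return "DRINK_PRAYER_POTION"
    return learned_policy(state)           # otherwise let the network decide

choose_action({"hp": 30, "food_left": 3, "prayer": 50}, lambda s: "CLICK_NPC")
```

This shrinks what the network actually has to learn to the genuinely ambiguous decisions, which is exactly the commenter's point.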


