Building A Reddit Bot

part of the Building A Reddit Bot series

There was a recent post in the r/RequestABot subreddit that caught my eye. A moderator of a subreddit for an online game wanted to automate the process of posting "results threads" after scheduled matches were completed, rather than creating these posts by hand. That's a perfect job for Python, especially with the powerful PRAW library available to us as developers. Let's build this bot together!

I'm a big believer in measuring twice and cutting once, as the old saying goes, so we need to spend some time planning out this project before we get things rolling.

My favorite way to start mapping out a new project is by writing a paragraph or two (or ten) about what I'm trying to accomplish. The game we're working with (Mabinogi Duel, r/MabinogiDuel subreddit) is an online, card-based game that holds "rounds" once every 48 hours (at 3AM GMT every other day). Within an hour of that time, the results of these rounds are available to check online. There are three "arenas" that each run a round at the appointed time, so in order to get all of the results we want, we'll need to check the results page for each arena. We'll end up with three results, every two days - one result for each round from each arena. Once we've compiled these results, we want to extract some data, format it in such a way that it'll make for a pretty Reddit post, and then submit a new text post to our target subreddit.

Now let's review that paragraph, looking for nouns and verbs we've used multiple times. "Round," "Arena," and "Results" look like our most important nouns. For verbs, "check" shows up a couple of times, and "format" and "submit" are usually important words. We can use these words to start thinking about what the flow of our program will look like:


  1. Check to see if results for a new round have been posted

  2. If so, get those results and reformat the data into Reddit-friendly text

  3. Submit that text to Reddit as a new post

(This may feel a little self-indulgent for a small project like this, but trust me - organizing your thoughts in some way before mapping out your project is hugely helpful when your projects get larger!)

We know that PRAW can help us with #3. We'll need to do the heavy lifting for #1 and #2 ourselves. Fortunately for us, the game developers give us some help. Match results are always posted in a URL that follows a consistent format:

http://devcat.nexon.com/duel/us/arena/view?arenaName_arena_g.XX.1

arenaName will always be one of three text values: rookie, pvp, or randomdraft. XX will always be an integer that represents the round number. So, to get the results for the rookie arena for round 5, you would visit:

http://devcat.nexon.com/duel/us/arena/view?rookie_arena_g.5.1
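That pattern is simple enough to capture in a tiny helper. This is only a sketch: the names results_page_url, RESULTS_PAGE and ARENAS are made up for illustration and aren't part of the bot we build below.

```python
# Hypothetical helper (not part of duelbot.py): builds the results-page URL
# from the pattern described above.
RESULTS_PAGE = "http://devcat.nexon.com/duel/us/arena/view?{}_arena_g.{}.1"
ARENAS = ("rookie", "pvp", "randomdraft")


def results_page_url(arena_name, round_number):
    """Return the results-page URL for one arena and one round."""
    if arena_name not in ARENAS:
        raise ValueError("unknown arena: {}".format(arena_name))
    return RESULTS_PAGE.format(arena_name, int(round_number))


print(results_page_url("rookie", 5))
```

Rejecting unknown arena names up front is a design choice for the sketch - it turns a typo into an immediate error instead of a confusing "not found" later on.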

If you change the round number to a round that doesn't exist, like -1, you'll be shown a 404 page. If you pay attention to your browser, though, you'll notice that you get there by way of a redirect. Pop open a Python shell and take a look at this:

>>> import requests
>>> requests.get("http://devcat.nexon.com/duel/us/arena/view?rookie_arena_g.5.1")
<Response [200]>
>>> requests.get("http://devcat.nexon.com/duel/us/arena/view?rookie_arena_g.-1.1")
<Response [200]>

Even though the second URL pulls up a 404 page in our browser, the actual response code is 200. That means if we want to see whether a match exists, we can't simply send a GET request and check the response code (which was my first thought). Fortunately, we don't have to paw through the actual response body and make more work for ourselves. The game developers also make an API endpoint available for match results here:

http://devcat.nexon.com/api-g/duel/arena/arenaName_arena_g.XX.1?lang=en_US

The same parameters apply to this URL as applied to the previous one. Pop that shell open once again:

>>> requests.get("http://devcat.nexon.com/api-g/duel/arena/rookie_arena_g.-1.1?lang=en_US").json()
{u'status': 404, u'message': u'arena rookie_arena_g.-1.1 not found'}

Notice that we added .json() to the end of our request - that's because we know this API endpoint returns data in a JSON format. Try pulling up the same URL in your web browser - that's JSON, baby. The status attribute of the response is 404, which is great - that gives us an easy, surface-level check as to whether or not a URL has match results we care about. Try changing that -1 to a 3 and sending the request again. That response looks WAY different! If it's difficult to read in your terminal/console, feel free to pull it up in your browser - the formatting is a little nicer.

What we have here is a JSON response that tells us everything we could want to know about a given round. Let's poke around inside.

>>> response = requests.get("http://devcat.nexon.com/api-g/duel/arena/rookie_arena_g.3.1?lang=en_US").json()
>>> response['id']           # a unique identifier which contains the arena name and round number
>>> response['round']        # the round number
>>> response['title']        # nicely formatted arena name
>>> response['description']  # an ordered list of dictionaries, each containing data about one player in the round

Thanks to the way response['description'] is structured, the first dictionary in its list will always contain data about the winner of the round in question.

Let's refer back to our list of project goals for a moment:


  1. Check to see if results for a new round have been posted

  2. If so, get those results and reformat the data into Reddit-friendly text

  3. Submit that text to Reddit as a new post

We've now got a great way to accomplish step number one - check the API endpoint. If the JSON response contains a status key, we know that we don't have any valid results to parse. If the JSON response doesn't have that status key in its dictionary, then we're working with a results page that has been populated with valid data. We don't actually have to check that the status is equal to 404, since the key is not returned at all with a valid response - its presence alone indicates a failure.
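Here's that check as a standalone snippet you can run without touching the network. The has_results name is made up for this example, and the "found" payload is a trimmed-down stand-in rather than a real API response.

```python
def has_results(payload):
    """Return True when the decoded JSON looks like real match results."""
    # A 'status' key only shows up on failed lookups, so its absence means success.
    return "status" not in payload


# The failure payload mirrors the 404 response shown earlier.
missing = {"status": 404, "message": "arena rookie_arena_g.-1.1 not found"}
# Made-up, heavily trimmed example of a successful payload.
found = {"id": "rookie_arena_g.3.1", "round": 3}

print(has_results(missing))  # False
print(has_results(found))    # True
```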

With that revelation, we're actually ready to start writing code! Let's create our file, duelbot.py, and open it up in the environment of your choice. We'll start off by adding a bunch of import statements at the top - for now, you'll just have to trust me when I say they'll all be useful.

import praw
import requests
import webbrowser
import re
from datetime import datetime

Now let's build a function that can accept a text string (arena name) and number (round number), make a call to the appropriate API endpoint, and return the JSON response.

def get_match_details(arena, number):
    # Build the API URL for this arena and round, then return the decoded JSON
    url = "http://devcat.nexon.com/api-g/duel/arena/{}.{}.1?lang=en_US".format(arena, number)
    return requests.get(url).json()

We used Python's .format() to insert the values we want into our API URL. Then, we used the requests library to send a GET request to that URL. We don't need the entire response object, so we just return the JSON info we care about. Now let's manipulate that info into a format that better suits our needs.

For any given match, there are three pieces of data we want to end up with: a description of the match, a URL to the match results, and a URL that points to the winning "deck" for that round. Let's write a function that accepts data from our get_match_details function and returns a dictionary of just those values. Before we start, recall that for any valid match, match['description'] gives you a list of dictionaries, the first of which "belongs" to the winner of the round. Within those dictionaries, there is a key sharedDeck that gives us access to the ID of the player's deck. Any given deck can be seen online at http://devcat.nexon.com/duel/us/deck?DECKIDHERE.

def format_match_details(match):
    # Takes a JSON dump of match results and returns a dictionary with values needed to craft a Reddit post
    # Note that this MUST be a dump of valid match results; this function will not gracefully handle 404 responses
    arena_name = match['title']
    round_number = match['round']
    match_id = match['id']
    match_url = "http://devcat.nexon.com/duel/us/arena/view?{}".format(match_id)
    crafted_description = "{} Round {}".format(arena_name, round_number)
    # Access 0th index of match['description']; must slice off the first two characters because of formatting voodoo
    winning_deck = match['description'][0]['sharedDeck'][2:]
    deck_url = "http://devcat.nexon.com/duel/us/deck?{}".format(winning_deck)
    return {'description': crafted_description, 'match_url': match_url, 'deck_url': deck_url}
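To see what comes out the other end, here's the same function run against a hand-made payload. Everything in sample is invented for illustration (including the "xx" prefix on the deck IDs); only the key names come from the API response we poked at earlier.

```python
def format_match_details(match):
    """Reduce one match payload to the three values the Reddit post needs."""
    match_url = "http://devcat.nexon.com/duel/us/arena/view?{}".format(match['id'])
    description = "{} Round {}".format(match['title'], match['round'])
    # The winner is first in the list; drop the two-character prefix on the deck ID
    winning_deck = match['description'][0]['sharedDeck'][2:]
    deck_url = "http://devcat.nexon.com/duel/us/deck?{}".format(winning_deck)
    return {'description': description, 'match_url': match_url, 'deck_url': deck_url}


# Invented sample data - a real response carries far more fields per player.
sample = {
    'title': 'Rookie Arena',
    'round': 3,
    'id': 'rookie_arena_g.3.1',
    'description': [{'sharedDeck': 'xxWINNERDECK'}, {'sharedDeck': 'xxRUNNERUP'}],
}

details = format_match_details(sample)
print(details['description'])  # Rookie Arena Round 3
print(details['match_url'])
print(details['deck_url'])
```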

Nothing too interesting, just some finicky string formatting to make sure we end up with exactly the URLs we want. Now let's take that info and manipulate it so that it makes a nice Reddit post.

There are a couple of things to remember about Reddit's markdown:


  1. If it looks wrong, you probably don't have enough line breaks

  2. Putting an asterisk in front of a line will make it part of an unordered (bulleted) list

  3. Stringing together a few dashes (ex: --------) will make an attractive horizontal line

  4. If you want to link to a subreddit or user, you can just type u/souldeux or r/learnprogramming - no need for fully-qualified domain names

  5. Links are created like this: [anchor text](http://link.example)
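A couple of throwaway helpers make those rules concrete. The names md_link, md_bullet and RULE are invented for this sketch; the bot itself just builds its strings inline.

```python
def md_link(text, url):
    """Render a Reddit markdown link: [anchor text](url)."""
    return "[{}]({})".format(text, url)


def md_bullet(line):
    """Render one bulleted list item, ending with a line break."""
    return "* {}\n".format(line)


# A run of dashes on its own line, padded with line breaks, becomes a horizontal rule.
RULE = "\n-----------------\n"

post = md_bullet(md_link("anchor text", "http://link.example")) + RULE + "by u/souldeux"
print(post)
```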

Let's think about how our format_match_details function works for a moment - we know that for any given post, we will always have three matches to format (one per arena). However, our function returns only a single dictionary for a single match at a time. Thinking ahead, we really need access to all three matches at once when trying to format our Reddit post. We'll write a function that accepts a list of dictionaries and think more about how to generate that list in a bit. Each dictionary will, of course, have been generated from format_match_details.

def format_reddit_post(matchlist):
    body = ""
    # One bulleted line per match: a link to the results plus a link to the winner's deck
    for match in matchlist:
        body += "\n* [{}]({}) - [Champion's Deck]({})\n\n ".format(match['description'],
                                                                   match['match_url'],
                                                                   match['deck_url'])
    body += " \n-----------------\n "
    body += (" \nRemember: decks are subject to change every two hours. Listed decks may not always "
             "correspond to the decks the top 10 started the Arena with.\n ")
    body += " \n-----------------\n"
    body += "by u/souldeux - please PM or contact at [souldeux.com](http://souldeux.com/contact) to report problems or request features"
    title = datetime.now().date().strftime('%b %d %Y Arena TOP 10s')
    return (title, body)

Again, nothing too fancy - just a bunch of string formatting. As you can see, we iterate through the list of dictionaries and use the description, match_url and deck_url to create a nicely labeled link for each round. We also use Python's strftime to nicely format a datetime object in order to generate a title with the current date. Lastly, we package up the title and body into a tuple.
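If strftime is new to you, here's the title logic on its own with a fixed date so the output is predictable. The post_title name is made up for this example, and note that %b follows your locale - the comment below assumes the default English month names.

```python
from datetime import date


def post_title(day):
    """Format a date into the post title; literal text in the pattern passes through untouched."""
    return day.strftime('%b %d %Y Arena TOP 10s')


print(post_title(date(2016, 1, 9)))  # Jan 09 2016 Arena TOP 10s
```

Taking the date as an argument instead of calling datetime.now() inside the function is just a convenience for trying it out; the bot's version always uses today's date.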

Right now, our program has no way of knowing what round number to start on for each arena. And that number will obviously change over time - each new round will increment it by one - so we don't want to hardcode a starting point. Instead, let's do something moderately clever.

Our bot will only ever make one kind of post, so if we're up to date we can be confident that our most recent post describes the results for the most recent matches for each arena. That means we can grab our last post, look at the round numbers we used there, increment them by one and send a request to see if new match results are available yet. This requires some setup - we'll need to create a new Reddit app. You can find a thorough guide to doing that in PRAW's documentation, so we'll just hit the highlights here:

Go to the Reddit apps page and hit the "are you a developer? create an app" button. Fill out the name field; description and about URL are optional. Set the redirect URI to http://127.0.0.1:65010/authorize_callback.

On the next screen, you'll see two important values. Just under your app's name, you'll see your client_id. A bit further down, you'll see your client_secret (next to the secret label).

Since our application will only ever be authorized by one account (u/MabinogiDuelBot), we can be pretty lazy with how we get our oAuth2 tokens. Open up a Python shell for a moment and type the following (using your client ID and secret, of course):

>>> import praw
>>> import webbrowser
>>> r = praw.Reddit("Descriptive User Agent String Here / VersionNumber")
>>> r.set_oauth_app_info(client_id = YOUR_CLIENT_ID, client_secret = YOUR_CLIENT_SECRET, redirect_uri="http://127.0.0.1:65010/authorize_callback")
>>> url = r.get_authorize_url('uniqueKey', 'identity submit history', True)
>>> webbrowser.open(url)

At this point, your web browser will open. Once you allow the app, the page you land on won't load properly, but that's okay - check out the URL. You'll notice a keyword argument at the end: code=accessTokenFromURL. Copy that value and go back into the same shell. (If you're curious about that "identity submit history" part, head over to the PRAW docs and read up on the various "scopes" that Reddit makes available to authorized applications.)

>>> r.get_access_information("accessTokenFromURL")

The dictionary that prints contains a value keyed to refresh_token. That's the value we've been trying to get at this whole time. With Reddit's oAuth2, access tokens are only useful for an hour. Refresh tokens, however, are good until an application's permission is revoked by the owner of the Reddit account. Since you need access tokens to do anything useful, you'll need to store this refresh token and trade it for fresh access tokens as needed. Take that refresh_token and head back into duelbot.py.

def refresh_oauth_login():
    r = praw.Reddit("Descriptive User Agent String Here / VersionNumber")
    r.set_oauth_app_info(client_id = YOUR_CLIENT_ID, client_secret = YOUR_CLIENT_SECRET, redirect_uri="http://127.0.0.1:65010/authorize_callback")
    refresh_token = "YOUR_REFRESH_TOKEN"
    # Trade the long-lived refresh token for a fresh access token
    r.refresh_access_information(refresh_token)
    return r

This instantiates a new praw.Reddit object, r, and uses the refresh token to refresh its oAuth2 login.

Now that we have an authorized r object, we can implement that logic we thought up earlier: grab our last post, look at the round numbers we used there, increment them by one and send a request to see if new match results are available yet. One problem - we don't have a "last post" just yet, so we need to make one. We'll make the post manually, using exactly the formatting we expect our code to generate for new posts. This will tell us two things: does the generated post match our expectations, and are we grabbing the info from our posts like we want?

Our manually-generated post is here (notice the custom subreddit - these are easy and free to make for testing your own bots). Now, let's write a function to grab our last post and initialize our round counters properly based on the info we scrape from it.

def initialize_counters(r):
    # r should be the result of refresh_oauth_login
    last_post = r.get_me().get_submitted().next().selftext
    # Collect every [anchor text] from the markdown of our most recent post
    bracket_captures = set(re.findall(r"\[([^]]+)\]", last_post))
    for description in bracket_captures:
        if description.startswith('PVP'):
            pvp_round = int(description.split(' ')[3]) + 1
        elif description.startswith('Rookie'):
            rookie_round = int(description.split(' ')[3]) + 1
        elif description.startswith('Random Draft'):
            draft_round = int(description.split(' ')[4]) + 1
        else:
            continue
    return {'pvp_arena_g': pvp_round, 'rookie_arena_g': rookie_round, 'randomdraft_arena_g': draft_round}

Let's walk through this line by line.

First, we get the most recent post made by our bot. r.get_me().get_submitted() returns a generator object, but we only need the first item - and then, only the text from that.

Second, we find all text in square brackets in the selftext of the post. Recall our Reddit markdown - these strings are all of the "anchor texts" for the links in our post. Thinking back to our format_match_details() and format_reddit_post() functions, we know that those anchor texts will look like this:


  1. PVP Arena Round X

  2. Rookie Arena Round X

  3. Random Draft Arena Round X

  4. Champion's Deck

  5. souldeux.com

For those first three items, the X corresponds to the round number. That means we can look to see what a given string starts with, split it at its spaces, then look at the final value returned from the split in order to determine the number of the last round we reported on for each arena. Remember that split returns a list of strings, so the extra word in "Random Draft Arena Round X" means we need to access a different index location than with our other two arenas. We convert that value to an integer, increment it by one, and store it in a variable. We ignore strings that don't have values we care about in this context.

Finally, we return a dictionary with descriptive labels keyed to integers that represent the "next" round for each arena.
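You can try the parsing step without logging in to Reddit at all. The sketch below is a variation on the function above, not a copy: next_rounds and PREFIXES are made-up names, the sample post text is invented, and it grabs the last word of each anchor text with rsplit instead of counting index positions.

```python
import re

# Maps the start of an anchor text to the arena key used in the API URL.
PREFIXES = {
    'PVP': 'pvp_arena_g',
    'Rookie': 'rookie_arena_g',
    'Random Draft': 'randomdraft_arena_g',
}


def next_rounds(selftext):
    """Return the next round number per arena, parsed from a post's markdown."""
    counters = {}
    for anchor in set(re.findall(r"\[([^]]+)\]", selftext)):
        for prefix, key in PREFIXES.items():
            if anchor.startswith(prefix):
                # The round number is always the last word of the anchor text
                counters[key] = int(anchor.rsplit(' ', 1)[-1]) + 1
    return counters


# Invented post body in the same shape our bot generates.
sample = (
    "* [PVP Arena Round 7](http://example.com/a) - [Champion's Deck](http://example.com/b)\n\n"
    "* [Rookie Arena Round 12](http://example.com/c) - [Champion's Deck](http://example.com/d)\n\n"
    "* [Random Draft Arena Round 4](http://example.com/e) - [Champion's Deck](http://example.com/f)\n\n"
    "by u/souldeux - [souldeux.com](http://souldeux.com/contact)"
)

print(next_rounds(sample))
```

Taking the last word sidesteps the "different index for Random Draft" wrinkle, at the cost of assuming the round number always comes last.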

Remember a while back when we said our format_reddit_post() function would need a list of dictionaries? Now we're in a position to create that list. Let's define a new function that accepts the output (we'll call it counters) from our initialize_counters() function.

def fetch_matches(counters):
    matchlist = []
    for arena, number in counters.items():
        details = get_match_details(arena, number)
        # Remember, we only see a 'status' on failed / 404 requests!
        if 'status' in details:
            return None
        matchlist.append(format_match_details(details))
    return matchlist
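The important behavior here is that it's all-or-nothing: one missing arena means no post at all. Here's a way to watch that happen offline. This sketch swaps the real API call for a fake_fetch stand-in passed in as an argument (both that and the extra parameter are inventions for the demo), and it skips the formatting step to stay short.

```python
def fetch_matches(counters, fetch):
    """Return a list of match payloads, or None if any arena has no results yet."""
    matchlist = []
    for arena, number in counters.items():
        details = fetch(arena, number)
        if 'status' in details:  # only failed lookups carry a 'status' key
            return None
        matchlist.append(details)
    return matchlist


def fake_fetch(arena, number):
    """Stand-in for the API call: pretends only round 3 has been played."""
    if number == 3:
        return {'id': '{}.{}.1'.format(arena, number), 'round': number}
    return {'status': 404, 'message': 'not found'}


ready = fetch_matches({'pvp_arena_g': 3, 'rookie_arena_g': 3}, fake_fetch)
not_ready = fetch_matches({'pvp_arena_g': 3, 'rookie_arena_g': 4}, fake_fetch)

print(len(ready))   # 2
print(not_ready)    # None
```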

Submitting the actual post to Reddit is easy with PRAW. Remember that format_reddit_post() returns a (title, body) tuple, and that you can access stuff inside a tuple by index location. Let's do it!

def submit_match_update(matchlist, r):
    # post_tuple is (title, body)
    post_tuple = format_reddit_post(matchlist)
    return r.submit('secondsoulenterprises', post_tuple[0], text=post_tuple[1])

We're going to keep posting in our test subreddit for now. No need to spam the live thing until we're sure it works!

We want to be able to pop open a command line, type python duelbot.py and have everything Just Work. Let's do it! Add this to the very end of your file, outside of every function.

if __name__ == '__main__':
    r = refresh_oauth_login()
    counters = initialize_counters(r)
    matchlist = fetch_matches(counters)
    if matchlist is not None:
        try:
            print(submit_match_update(matchlist, r))
        except Exception as e:
            print(e)
    else:
        print("Sorry - a match for the round IDs you provided was not available.")

This is a little more verbose than necessary so that it's easy to see what's happening. First, we load up an r object with the proper oAuth2 scopes.

Then, we use that r object to grab our last post and get the proper round counters.

Next, we feed those counters into fetch_matches() to generate our matchlist. Remember, if we got a 404 error when trying to fetch our matches, matchlist will actually be equal to None - that means we'll jump to the bottom else block.

If we have a valid matchlist (which will only happen when there are results available for matches that we have not yet posted), we try to submit a post to Reddit. This may fail due to rate limiting, solar flares, etc. so we'll handle the possible exception gracefully.

Now we can run our script by opening a command line in the same directory as our script and typing python duelbot.py - woohoo!

You've got a lot of options from here. We know that matches happen every two days, but we want to check more often than that for two reasons. One, if the script fails or crashes for some unexpected reason (maybe the API endpoint is down for a couple of hours), we don't want to fall behind. If we only check every 48 hours and fall behind one round, we've got no way to catch up! Two, this futureproofs the bot a bit - if rounds become more frequent in the future, we won't have to worry about the whole thing breaking down immediately.

Right now, this bot runs as a cronjob every hour here at souldeux.com. Depending on your deployment environment, you may rather use a task scheduler or any number of other tools. The power is yours!

Check out the repo for this project here. Questions? Comments? Ideas? Contact me using the link in the nav bar at the top of this page!

