Scraping Gravity Forms & Generating a Third-Party Posting Plugin

part of the Quick Tutorials series

According to their website, "Gravity Forms for WordPress is a full featured contact form plugin that features a drag and drop interface, advanced notification routing, lead capture, conditional logic fields, multi-page forms, pricing calculations and the ability to create posts from external forms." I have a couple of WordPress sites kicking around that make use of Gravity Forms, and from my experience all of that stuff works and works well.

However, I often find myself wanting to POST my form data to an external URL in addition to storing it locally. Gravity Forms provides a way to do this, but it's a little tricky - you need to hook into gform_after_submission and send your form data via a PHP function. There's a solid tutorial on how to do this at the Gravity Forms website, but if you've never done it before it can still be a little intimidating. Additionally, they suggest placing your function in the functions.php file of your active theme. That works, but you'll need to re-insert your function if you ever change themes. You can avoid that hassle by writing your own custom plugin.

But we're programmers! Why waste time writing a boring plugin when we could write a fun program to create a custom plugin for us? Let's use BeautifulSoup and requests to make a Python script do the boring stuff for us. Create a file called gravityscraper.py in a working directory of your choosing, open it in your favorite editor, and build along with me. Or, download the finished project from GitHub.

Here's what we want our script to accomplish:


  1. Scrape a webpage, grabbing all form elements created by the Gravity Forms plugin.

  2. Get the Form ID# for each form, and the Field ID# for each input/select/textarea field in that form along with the labels for those fields.

  3. Using those IDs and labels, generate a .php file that (once installed as a plugin) will POST data from the scraped forms to a given outside URL.

We'll use this simple demo form to work with for now. Open up that link, find the form, then right-click and inspect the input field element just below the Email label. This is what the HTML for that field looks like:

<input name="input_2" id="input_1_2" type="text" value="" class="medium" tabindex="6" placeholder="Your Email Address">

We can get two important pieces of information here. First, the name attribute of any field always indicates that field's ID#. In this case, we can tell that our Email field has an ID of 2. Second, the first number in the id attribute always tells you which form the field belongs to - here, it belongs to the form with ID 1. The second number in the id attribute is the field ID again.
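To make that naming pattern concrete, here is a minimal sketch that pulls both IDs out of the two attribute strings. The helper name ids_from_attributes is hypothetical (it is not part of the finished script), and it assumes the attributes follow the input_<field> / input_<form>_<field> pattern described above.

```python
def ids_from_attributes(name_attr, id_attr):
    """Return (form_id, field_id) from a field's name and id attributes.

    Hypothetical helper: assumes name looks like "input_2"
    and id looks like "input_1_2".
    """
    prefix = "input_"
    if not (name_attr.startswith(prefix) and id_attr.startswith(prefix)):
        raise ValueError("attributes do not follow the input_ pattern")
    field_id = name_attr[len(prefix):]   # "input_2"   -> "2"
    form_id = id_attr.split("_")[1]      # "input_1_2" -> "1"
    return form_id, field_id


# The Email field from the demo form: form 1, field 2
print(ids_from_attributes("input_2", "input_1_2"))
```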

Take a quick look back at the gform_after_submission docs, specifically the final example under "send entry data to third-party:"

add_action( 'gform_after_submission', 'post_to_third_party', 10, 2 );
function post_to_third_party( $entry, $form ) {
    $post_url = 'http://thirdparty.com';
    $body = array(
        'first_name' => rgar( $entry, '1.3' ),
        'last_name' => rgar( $entry, '1.6' ),
        'message' => rgar( $entry, '3' ),
    );
    ...

Notice how the body is constructed. This says "send the value from field ID 1.3 under the label first_name, send the value from field ID 1.6 under the label last_name, and send the value from field ID 3 under the label message."

Our add_action call currently makes this function apply to all GF forms. To make it apply to only one form, change gform_after_submission to gform_after_submission_X, where X is the ID of the desired form.
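Since our generator will eventually have to produce these hook names itself, here is a small sketch of that rule in Python. The function name hook_name is my own invention for illustration; the only thing taken from Gravity Forms is the gform_after_submission / gform_after_submission_X naming described above.

```python
def hook_name(form_id=None):
    """Return the hook to register.

    With no form_id the hook fires for every form; with a form_id
    it fires only for that form (gform_after_submission_X).
    """
    base = "gform_after_submission"
    if form_id is None:
        return base
    return "{}_{}".format(base, form_id)


print(hook_name())    # hook for all forms
print(hook_name(1))   # hook for form 1 only
```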

With that understood, it's clear why we need all of these IDs and labels to build a decent plugin. Let's start building a scraper that will grab all of that stuff for us.

BeautifulSoup makes scraping web pages pythonic and easy. We'll be using it along with the requests library. Let's start by writing a simple function to scrape the generated HTML of our form-containing page and return just the GF form elements. We can tell when a form has been generated by GF because it'll look something like this:

<form method="post" enctype="multipart/form-data" id="gform_1" action="/contact-us-form/">

The ID attribute gives it away - it starts with "gform" and ends with the form's ID#. Knowing this pattern makes it easy to grab GF-generated forms:

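import requests
from bs4 import BeautifulSoup

def get_forms_from_page(url):
    page = requests.get(url)
    soupSpoon = BeautifulSoup(page.text, 'html.parser')
    # keep only forms whose id starts with "gform"; x is None when a form has no id
    return soupSpoon.findAll('form', id=lambda x: x and x.startswith('gform'))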

First, we used requests to make a GET request to our target URL. Then, we instantiated a BeautifulSoup parser (which we named soupSpoon) and loaded the HTML response from our GET request into it. Lastly, since we know that any form element whose ID starts with gform was generated by Gravity Forms, we can use that information to parse out only those forms whose ID follows that pattern.
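The "x and" part of that lambda is easy to overlook. As far as I understand BeautifulSoup's attribute filters, the function is also called for forms that have no id attribute at all, and in that case it receives None, which has no startswith method. Here's a minimal sketch of the same test on plain values, with no scraping involved (is_gf_form_id is a hypothetical name, and the sample ids other than gform_1 are made up):

```python
def is_gf_form_id(value):
    """Same test as the lambda: True only for id values starting with "gform"."""
    # "value and ..." short-circuits on None or "", so startswith is never
    # called on a missing attribute.
    return bool(value and value.startswith("gform"))


for candidate in ["gform_1", "search-form", None]:
    print(repr(candidate), is_gf_form_id(candidate))
```

Without the guard, the None case would raise an AttributeError instead of simply being filtered out.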

findAll returns a ResultSet object, but for our purposes here it'll work just like a list. Since our demo form page only has one form, our list only contains one item - a BeautifulSoup object of type Tag. Let's parse our form to get the GF form ID.

def get_id_from_form(form):
    return form.get('id').split('_')[1]

The get method on a given tag object grabs the value of a specific attribute on that tag. In our case, we want the number that comes after the underscore - the internal ID number that Gravity Forms uses to identify the form. We grab it by splitting at the underscore and getting the second item that results.
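If you'd rather be strict about it, a regular expression can confirm the id really has the gform_<number> shape before you trust the split. This is an optional sketch, not what the finished script does; parse_form_id is a hypothetical name, and it assumes form ids are exactly "gform_" followed by digits.

```python
import re


def parse_form_id(id_attr):
    """Return the numeric part of an id like "gform_1", or None if it doesn't match."""
    match = re.match(r"^gform_(\d+)$", id_attr or "")
    return match.group(1) if match else None


print(parse_form_id("gform_1"))        # same answer as "gform_1".split('_')[1]
print(parse_form_id("gform_wrapper"))  # not a numbered form id
```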

Getting the form's fields is a little trickier. Some GF forms make use of onClick events and unnamed hidden inputs, both of which confound efforts to automatically detect and label all relevant fields. We'll get a working version going for now and take a look at some areas that could use polish later.

Recall the Email field we inspected earlier:

<input name="input_2" id="input_1_2" type="text" value="" class="medium" tabindex="6" placeholder="Your Email Address">

Inspect a few other fields and a pattern quickly emerges: the elements we want all have IDs that start with the word "input." Furthermore, if we inspect the label elements associated with our visible fields, we'll find that their for attributes match the id attributes of the elements they're labelling. Let's try to use this information to create a dictionary of data about our form. Hopefully it'll look like this:

{
    "ID": formID,
    "Data": {
        "LabelForField1": field_1_id,
        "LabelForField2": field_2_id,
        ...
    }
}

Worth a shot, right?

def get_inputs_from_form(form):
    inputs = form.findAll('input', id = lambda x: x and x.startswith('input'))
    selects = form.findAll('select', id = lambda x: x and x.startswith('input'))
    textareas = form.findAll('textarea', id = lambda x: x and x.startswith('input'))
    fields = inputs + selects + textareas
    labels = form.select('label[for^="input"]')
    form_dict = {
        "ID": get_id_from_form(form),
        "Data": {}
    }
    for f in fields:
        try:
            field_id = f.get('name')[6:] #slicing off "input_"
            label_list = [l for l in labels if f.get('id') == l.get('for')]
            try:
                label = label_list.pop()
                form_dict["Data"][label.text] = field_id
            except IndexError:
                #no label detected, like for a hidden input
                form_dict["Data"][f.get('id')] = field_id
        except TypeError:
            #field with no name attribute; typically means an input to handle an onClick event or something
            pass
    return form_dict

First, we use findAll to get all of the inputs, selects and textareas whose id attributes start with "input" (following the pattern we discovered earlier). Then we use select to grab all of the label elements whose for attributes start with "input" as well. We use the get_id_from_form function we wrote earlier to grab the form's ID when we start building our dictionary, and then we try to match labels with fields.

For each field, we try to find a label whose for attribute matches the field's id attribute. If we find a match, we associate the field ID# with the label in our Data dictionary. If we don't find a match (perhaps the field is unlabelled, like a hidden input would be), we store the field's full ID attribute as the key and the internal GF field id as the value in our Data dictionary. If we run into a field without a name attribute, we're probably dealing with a button to handle onClick events or some such nonsense - not useful for our purposes. Let's try using this function on our demo form:
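If the nested try/except makes the matching rules hard to see, here is the same logic sketched on plain dictionaries, with no BeautifulSoup involved. Everything here is illustrative: match_labels is a hypothetical helper, the hidden and nameless fields are made-up sample data, and it swaps the list comprehension for a lookup dictionary keyed on each label's for attribute.

```python
def match_labels(fields, labels):
    """Build the Data dictionary from plain field and label records.

    fields: list of {"name": ..., "id": ...}
    labels: list of {"for": ..., "text": ...}
    """
    label_text_by_target = {l["for"]: l["text"] for l in labels}
    data = {}
    for field in fields:
        name = field.get("name")
        if not name:
            continue  # no name attribute: skip it, like the except TypeError branch
        field_id = name[len("input_"):]
        # use the label text when a label points at this field, else the field's own id
        key = label_text_by_target.get(field["id"], field["id"])
        data[key] = field_id
    return data


fields = [
    {"name": "input_2", "id": "input_1_2"},
    {"name": "input_9", "id": "input_1_9"},   # hypothetical unlabelled hidden field
    {"name": None, "id": "input_1_10"},       # hypothetical input with no name
]
labels = [{"for": "input_1_2", "text": "Email*"}]
print(match_labels(fields, labels))
```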

>>> formlist = get_forms_from_page("http://demo.gravityforms.com/contact-us-form/")
>>> print get_inputs_from_form( formlist[0] )
{'Data': {u'Email*': '2', u'Last': '1.6', u'Your Comments/Quesitons*': '3', u'First': '1.3'}, 'ID': '1'}

If we inspect the form in our browser, we can see that things are matching up decently well. The form definitely has ID# 1, the email field definitely has field ID# 2, etc. So far, so good!

Let's write one quick convenience function to turn all of the forms on a given page into these cool form-dictionaries, then we'll start generating our plugin:

def get_all_form_dicts_for_url(url):
    return [ get_inputs_from_form(form) for form in get_forms_from_page(url) ]

With our forms translated into dictionaries of data, generating the necessary text for our plugin is fairly simple. We'll start with a text variable and just add stuff to it until we've got the result we want. We'll start by adding the boilerplate stuff WordPress expects at the top of every plugin:

def generate_plugin_text(dict_list):
    text = """<?php
/**
* Plugin Name: Auto-Generated Third-Party Poster for Gravity Forms
* Plugin URI: https://github.com/souldeux/GravityScraper
* Description: Posts GravityForms Form Data to a third party
* Version: 1.0.0
* Author: Nick Perry
* Author URI: http://souldeux.com
* License: GPL2
*/\n
"""

Now let's loop through our dict_list and create function definitions for each form represented therein (remember, this is in the same function, one indent level deep):

    for form_dict in dict_list:
        form_id = form_dict["ID"]
        # each form gets its own PHP function name; declaring the same name twice is a fatal error in PHP
        text += "add_action( 'gform_after_submission_{0}', 'post_to_third_party_{0}', 10, 2 );\n".format(form_id)
        text += "function post_to_third_party_{}( $entry, $form ) {{\n".format(form_id)
        text += "\t$post_url = 'WEEWOOWEEWOOBETTERNOTLEAVETHISHERE';\n"
        text += "\t$body = array(\n"
        for k, v in form_dict["Data"].items():
            text += "\t\t'{}' => rgar( $entry, '{}' ),\n".format(k,v)
        text += """\t);
\tGFCommon::log_debug( 'gform_after_submission: body => ' . print_r( $body, true ) );
\t$request = new WP_Http();
\t$response = $request->post( $post_url, array( 'body' => $body ) );
\tGFCommon::log_debug( 'gform_after_submission: response => ' . print_r( $response, true ) );
}
"""
    text += "?>"
    return text

We use two for-loops here to create one custom function for each form we've scraped. In our initial add_action definition we format in the form ID to make sure each function applies only to one specific form rather than all forms in general. Then we use our data dictionary in a second loop to construct a body that GravityForms' gform_after_submission hook can work with. If we had a page with multiple forms, we'd end up with one generated function for each form, all contained within the same text string.
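To see what the inner loop produces without scraping anything, here is a small standalone sketch that renders just the $body array for a hand-written Data dictionary. render_body is a hypothetical helper, not part of the script above, and sorting the items is my own choice so the output comes out in the same order on every run.

```python
def render_body(data):
    """Render the PHP $body array lines for one form's Data dictionary."""
    lines = ["\t$body = array("]
    for label, field_id in sorted(data.items()):
        lines.append("\t\t'{}' => rgar( $entry, '{}' ),".format(label, field_id))
    lines.append("\t);")
    return "\n".join(lines)


# Labels and field IDs taken from the demo form output shown earlier
sample = {"First": "1.3", "Last": "1.6", "Email*": "2"}
print(render_body(sample))
```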

For our last step, we'll write this generated text to a .php file. If you have your own WordPress site with GravityForms, you can compress that .php file and upload it to your own site as a plugin. Here we go!

if __name__ == "__main__":
    # raw_input is Python 2; on Python 3, use input() instead
    url = raw_input("Enter the URL of the page you wish to scrape for forms:\n")
    recipient = raw_input("Enter the URL of the endpoint to which you wish to POST data:\n")
    plugin_text = generate_plugin_text( get_all_form_dicts_for_url( url ) ).replace( "WEEWOOWEEWOOBETTERNOTLEAVETHISHERE", recipient )
    with open("GravityFormsAutoPoster.php", "w+") as outfile:
        outfile.write(plugin_text)

Now we can run our script from the command line with python gravityscraper.py. We'll enter a couple of URLs at the prompts indicating where we're scraping from and where we want to send to; the script then creates a new file (opened in w+ mode) and writes our plugin text to it. Try it out on http://demo.gravityforms.com/contact-us-form/ and see how it works!

We've built a working plugin generator, but it's far from perfect. The generated labels will likely require some editing by the plugin's end user in order to format them properly for POSTing (for instance, you might want to send the field currently called "Your Comments / Questions" under the simpler label "comments"). We also get some quirky results from certain form elements - try it out on http://demo.gravityforms.com/build-a-pizza/ to see what I mean.
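One possible bit of polish for the label problem: normalize each scraped label into a lowercase, underscore-separated key before writing it into the plugin. This is only a sketch of one approach (label_to_key is a hypothetical name, and the finished project doesn't do this), but it has the side benefit of stripping apostrophes, which would otherwise break the single-quoted PHP strings we generate.

```python
import re


def label_to_key(label):
    """Turn a scraped label into a lowercase, underscore-separated key."""
    # collapse every run of characters that isn't a-z or 0-9 into one underscore
    key = re.sub(r"[^a-z0-9]+", "_", label.lower())
    return key.strip("_")


print(label_to_key("Your Comments / Questions*"))
print(label_to_key("Email*"))
```

You'd still want to hand-edit the results to match whatever field names the receiving endpoint expects.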

Nonetheless, we've learned a lot and built a useful tool. Feel free to play with the GitHub repo and add some polish, extra functionality, your own flavor, whatever. Thanks for reading!

