Questions About APIs and Dirty Data and Best Practices | Megan Taylor

Megan Taylor

front-end dev, volunteacher, news & data junkie, bibliophile, Flyers fan, sci-fi geek and kitteh servant

Questions About APIs and Dirty Data and Best Practices

So I’m working on this Farmers Market Locator project, and I’ve got a pretty basic version up and running. Everything is client-side. And I’m using the New York State Open Data API to get the information on the farmers markets. Right now, all that happens when you use the site is a query to Google to find out where you are, and then a query to the NYSOD API to get the nearest markets’ info, and then some Google Maps API stuff. But some of the data is a little dirty: missing spaces in addresses, that kind of thing.
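For the "missing spaces in addresses" kind of dirt, a small normalizing function can run on each record before it is displayed. This is only a sketch: the post does not show what the broken addresses look like, so the patterns handled here (a house number glued to the street name, no space after a comma) and the name `cleanAddress` are assumptions.

```javascript
// Hypothetical helper: repairs a few common spacing problems in a street address.
function cleanAddress(raw) {
  return raw
    // "123Main St" -> "123 Main St", but leave ordinals such as "42nd" or "5th" alone
    .replace(/(\d)(?=[A-Za-z])(?!(?:st|nd|rd|th)\b)/gi, "$1 ")
    // "Albany,NY" -> "Albany, NY"
    .replace(/,(?=\S)/g, ", ")
    // collapse runs of whitespace and trim the ends
    .replace(/\s+/g, " ")
    .trim();
}

console.log(cleanAddress("123Main St,Albany,NY"));
```

Rules like these are cheap enough to run client-side, but they will mangle some legitimate values (a unit such as "12B" becomes "12 B"), so they are worth checking against the real data first.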

More complications: There are a bunch of features I want to add, some of which involve incorporating data from the USDA Farmers Market API. Now the USDA API has a little more info about the markets, like what kind of products are sold at each market. This data might also be pretty dirty. And as far as I can tell, the only way to match up markets between the two APIs is to do string matching. (Meaning that, having determined that the Union Square Farmers Market is the closest to your location, I then have to search the USDA API for “Union Square Farmers Market”.)
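If string matching really is the only option, exact comparison is likely to miss markets whose names are written slightly differently in each source. One possible approach, sketched here, is to normalize both names and score them by shared words (Jaccard similarity). The stop-word list, the 0.5 threshold, and the function names are all assumptions for illustration, not anything either API provides.

```javascript
// Words that say little about which market this is (assumed list; tune against real data).
const STOP_WORDS = new Set(["farmers", "farmer", "market", "markets", "the", "at", "of"]);

// Lowercase the name, strip punctuation, and return the set of meaningful words.
function nameTokens(name) {
  const words = name
    .toLowerCase()
    .replace(/['’]/g, "")          // "Farmers'" -> "farmers"
    .replace(/[^a-z0-9]+/g, " ")   // any other punctuation -> space
    .trim()
    .split(" ")
    .filter((w) => w && !STOP_WORDS.has(w));
  return new Set(words);
}

// Jaccard similarity: shared words divided by all distinct words, from 0 to 1.
function nameSimilarity(a, b) {
  const ta = nameTokens(a);
  const tb = nameTokens(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  for (const w of ta) if (tb.has(w)) shared++;
  return shared / (ta.size + tb.size - shared);
}

// Return the best-scoring candidate name, or null if nothing reaches the threshold.
function bestMatch(name, candidates, threshold = 0.5) {
  let best = null;
  let bestScore = 0;
  for (const candidate of candidates) {
    const score = nameSimilarity(name, candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return bestScore >= threshold ? { name: best, score: bestScore } : null;
}
```

Here `bestMatch("Union Square Farmers Market", listOfUsdaNames)` would pick the closest USDA name, where `listOfUsdaNames` is a placeholder for whatever names come back from the second API. Word overlap alone can confuse two markets in the same neighborhood, so a score like this is probably best combined with an address or location check.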

Is there a better way to approach matching up the markets between the two different APIs? Do I need to switch to a back-end solution? What’s the best way to clean up the dirty data? Should I be pulling this info into something else, like a Google Spreadsheet, cleaning it up there, and making queries to the spreadsheet instead of NYSOD?

I don’t even know how to approach this.

Edit: Adding links to raw JSON.

http://data.ny.gov/resource/qq4h-8p86.json

http://search.ams.usda.gov/farmersmarkets/v1/data.svc/zipSearch?zip=10008

Edit: Some suggestions have been made…but as usual they only spawn new questions.

  • Best way to match markets:
    • search name (problem: not standard)
    • search address (problem: not standard)
    • use URLs as keys (problem: what URLs??!!)
  • Cleaning up dirty data:
    • You can use a back-end solution with some data store OR you could do it all in JS. If you were to do it all in JS you would just have to call a few APIs, compare the data returned, and fill in missing data from one API with data from another. You would also need to decide which would have priority if there was a conflict. (problem: wouldn’t doing all that matching, comparison, and cleaning on the client-side make it slower?)
    • I think you are wise to consider grabbing the data, cleaning it up and storing it under your own backend. This allows you to keep your app up and running even if they change their format. (Sure, your data might get stale, but probably not by much, and it would buy you time to address any new formatting from your sources.) You could pull down the source data, create your combined JSON set, and just use that (depending on how huge all the data was). While a backend solution might sound really complicated, it doesn’t have to be. A Google spreadsheet could work, or Google Fusion Tables, or Parse.com, Firebase.com, or Tableau.
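The "fill in missing data from one API with data from another" step, with one source taking priority on conflicts, could look roughly like the sketch below. The field names (`name`, `address`, `products`) and the choice of which record counts as primary are hypothetical; the real field names would come from the two JSON feeds.

```javascript
// Treat undefined, null, and blank strings as "no data".
function isMissing(value) {
  return value === undefined || value === null ||
    (typeof value === "string" && value.trim() === "");
}

// Combine two records for the same market. On a conflict the primary record wins;
// the secondary record only fills fields the primary leaves empty.
function mergeMarket(primary, secondary) {
  const merged = {};
  const keys = new Set([...Object.keys(primary), ...Object.keys(secondary)]);
  for (const key of keys) {
    if (!isMissing(primary[key])) {
      merged[key] = primary[key];
    } else if (!isMissing(secondary[key])) {
      merged[key] = secondary[key];
    }
  }
  return merged;
}

// Example with made-up records standing in for the two sources.
const combined = mergeMarket(
  { name: "Union Square Greenmarket", address: "", products: null },
  { name: "Union Sq Greenmarket", address: "E 17th St & Broadway", products: ["vegetables"] }
);
console.log(JSON.stringify(combined));
```

The merge itself is cheap. If speed on the client is the worry, the same function could be run once ahead of time to build the combined JSON set mentioned above, so that visitors only download the result.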

So far the best suggestion is pretty close to my initial idea, but I’m hoping to get some more feedback on this before I commit myself. Chime in!

November 11, 2013 | Categories: Posts


