{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "| DS 2000 | \n", "|:----------:|\n", "| Prof. Rachlin | \n", "| JSON and Web APIs | \n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### JSON Files\n", "\n", "JSON stands for JavaScript Object Notation. It is a data interchange format that is widely used on the web and for representing hierarchical data, something that tabular *flat* files (e.g., CSV files) don't handle well. You can learn more about JSON [here](https://www.json.org), but from a Python point of view, a JSON object looks much like a dictionary (and a JSON array much like a list)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "more example.json" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Reading JSON into a dictionary\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now, let's use the `json` and `requests` modules to call web-service APIs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Application Programming Interfaces (APIs) provide a general means of interacting with other programs. More specifically, many websites provide data via a web API, which means we can make *requests* to remote applications / websites for data. Here's what this looks like schematically.\n", "\n", "![alt text](https://cdn-images-1.medium.com/max/2000/1*q9CRTmO258jWLsMZAd5JLw.png \"Image credit: http://www.robert-drummond.com/2013/05/08/how-to-build-a-restful-web-api-on-a-raspberry-pi-in-javascript-2/\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The way that we do this in Python is via the `requests` module. The idea is that the Response (in the above depiction) is nicely formatted JSON, which feels a lot like the dictionaries you now know and love.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clinical Trials \n", "Here is an example of performing a search on a government website that documents on-going clinical trials. Documentation about the API can be found here: https://www.clinicaltrials.gov/api/gui" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "\n", "Here is another example: calling the Yelp API. This one is a little trickier as it requires that we have registered our application with Yelp in order to obtain an API KEY. This will be our way of telling Yelp who we are. It's how we authenticate our request with Yelp. Yelp will keep track of how many requests we make and place daily limits on our usage (5000 API calls per day.) According to the Yelp API documentation, we must pass our API_KEY using a *request header* (discussed in more below) of the form:\n", "\n", "**Authorization: Bearer API_KEY**\n", "\n", "Request headers are additional metadata about our request in the form of key-value pairs. Which request headers we have to supply is API-specific. There are also additional parameters we can specify like the location (Boston) and the type of business we are looking for (Pizza). \n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "API_KEY = 'k04EveojLDWtsNI9D_GcEAYSDLNThcspgKOAPuP3TgaCH7u97JdAtaoFni8FiD612pkEJRQyvkSI0iCMXbM8xVWe6e6N0_KWNB-e1zQw7JR1Qv-hg_R-Rwy0L7TMXXYx'\n", "API_URL = 'https://api.yelp.com/v3/businesses/search'\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How good is Boston Pizza? Let's find out!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ " " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# How many cheap, moderate, expensive pizza places?\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Does price matter?\n", "\n", "# Create a ratings histogram\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As another example, reddit makes it possible to fetch content from subreddit streams. In particular, let's fetch some recent cat photos from reddit - because there aren't enough cat photos on the Internet!\n", "\n", "The reddit API requires that we supply a descriptive value for the 'User-Agent' in the request headers. The User-Agent header identifies the caller: the browser or application making the API request. In other words, reddit is telling us that we can freely access their database via a documented API call, but they want to know who's calling! Additional rules for using the reddit API can be found here: https://www.reddit.com/dev/api/\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import json\n", "import pprint as pp\n", "\n", "# here we're hitting the 'new' endpoint, see: https://www.reddit.com/dev/api/#GET_new\n", "\n", "reddit_url = 'https://www.reddit.com/r/cats/new.json'\n", "my_params = {'limit': 10}\n", "my_headers = {'User-Agent': 'a cat bot API call demo for NEU DS2000'}\n", "\n", "\n", "# Make the request. 
BTW, the result you get is likely to change every few minutes.\n", "# Cats are really, really popular on reddit!\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# How many items did we receive?\n", "# We asked for at most 10.\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Here are the photo captions:\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Let's see some cats!\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Let's see some cats!\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 2 }