DS2000 (Spring 2020, NCH) :: Lecture 13a

0. Administrivia

Please take the time to fill out TRACE -- we read these closely and always try to improve!
Have a wonderful summer :)

1. Setting Context: Recall Web APIs & Scraping

A couple weeks ago we talked about how we could get data from websites and considered the following image...

[Image: a client (left) sending requests to a web server (right), which sends back responses]

During that week, we were considering how to use Python on the "left" as the client making requests (via the requests module) and interpreting responses (either as JSON, or HTML via the bs4 module).

This week we'll think in reverse: now we show how Python can be used on the "right" to produce responses based upon requests. While there are many ways to do this in Python (e.g., Django), we'll make use of the flask module to build web applications that produce HTML, images, and JSON (note: flask isn't built into Python, but it is so commonly used that it is included with Anaconda). All code can be accessed in the following repository:

https://github.com/natederbinsky/wine-web

2. The Basics of `flask`

Consider the following simplest flask application (step0.py; the last two lines are commented out for purposes of the lecture notes)...

import flask

# Create a flask instance
app = flask.Flask(__name__)

# When someone goes to the base URL,
# run this function
# (Note: the @ is called a "decorator" in Python: 
# https://www.learnpython.org/en/Decorators)
@app.route('/')
def hello():
    # Just return the following text
    return "Hello, World!"

# Uncomment the following two lines
# if __name__ == '__main__':
#     app.run()

Here are the basic steps:

Import flask; create an instance of Flask called app
For as many "endpoints" (i.e., URLs) as you want to respond to, create a function for each and preface it with @app.route('whatever URL')
Start the application

Copy this code into Atom (uncommenting the last two lines), save, and then run. You'll see something like the following...

* Serving Flask app "step0" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Now, in a browser, go to the URL listed when you ran the program (in this case http://127.0.0.1:5000) - you should see our hello world message (returned by the hello function).

Every time your browser makes a request of the website you'll see it logged in the terminal, such as...

127.0.0.1 - - [22/Mar/2020 07:19:31] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [22/Mar/2020 07:19:33] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [22/Mar/2020 07:19:33] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [22/Mar/2020 07:21:31] "GET /sneaky HTTP/1.1" 404 -

If you recall, 200 is the HTTP status code meaning that everything is OK, whereas 404 means that the page wasn't found.
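
The same status codes can be checked without opening a browser. The following is a minimal sketch, assuming Flask's built-in test client (app.test_client()), which sends simulated requests to the application without starting a server. The app itself simply repeats step0.py.

```python
import flask

app = flask.Flask(__name__)

@app.route('/')
def hello():
    return "Hello, World!"

# The test client simulates browser requests inside this program,
# so there is no need to call app.run()
client = app.test_client()

ok = client.get('/')
missing = client.get('/sneaky')

# A URL with a matching @app.route gets 200 plus the function's text
print(ok.status_code, ok.get_data(as_text=True))

# A URL with no matching @app.route gets 404
print(missing.status_code)
```

This is only a convenience for experimenting; the lecture's approach of running the program and visiting the URL in a browser exercises exactly the same hello function.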

Finally, press CTRL+C to stop the program, after which you'll no longer be able to access the URL in your browser. It is good practice never to leave a web server running on your computer any longer than necessary.

3. A Slightly More Advanced `flask` Example

Where the previous example only returned text for a single URL, now try the following code (step1.py)...

import flask
import datetime

# Create a flask instance
app = flask.Flask(__name__)

# When someone goes to the base URL,
# run this function
# (Note: the @ is called a "decorator" in Python: 
# https://www.learnpython.org/en/Decorators)
@app.route('/')
def hello():
    # Just return the following text
    # Note #1: three quotes allows you to provide
    # very long strings
    # Note #2: the <a> tag allows you to make a link
    # Note #3: <br> is the tag for a line break
    return """Hello, World!<br />
Click <a href="time">here</a> for the time"""

# Now support multiple URLs
@app.route('/time')
def time():
    return "Now: {}".format(datetime.datetime.now())

# When this program is run, start flask!
# if __name__ == '__main__':
#     app.run()

This web application supports two URLs (/ and /time). Furthermore, the former URL uses HTML to break up lines of text as well as make a link to the second URL.

Copy the code into Atom, save, run, and load the URL in a browser. Finally, right-click on the first page (the / URL) and click "View Page Source" in order to verify that while the browser renders the link, it actually did receive exactly what the hello function returned.

You can close that source and click the link to view the time -- notice that each time you refresh the page, you see the current time: this is because your browser is (a) making a new request, (b) sending it to flask, which then (c) runs your function code, getting the current time, and then (d) returns that result back to your browser.
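
To make steps (a) through (d) concrete, here is a small sketch that counts how many times the time function runs. The calls dictionary is a hypothetical addition for illustration only (it is not part of step1.py), and the requests are simulated with Flask's test client rather than a real browser.

```python
import datetime
import flask

app = flask.Flask(__name__)

# Hypothetical counter, only here to show how often the function runs
calls = {'count': 0}

@app.route('/time')
def time():
    calls['count'] += 1
    return "Now: {}".format(datetime.datetime.now())

client = app.test_client()

# Each get() plays the role of one browser refresh
first = client.get('/time').get_data(as_text=True)
second = client.get('/time').get_data(as_text=True)

print(first)
print(second)
print("function ran", calls['count'], "times")
```

Two requests lead to two runs of the function, which is why the displayed time is recomputed on every refresh rather than being fixed when the program starts.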

4. Responding to Input

So far our web application doesn't have the ability to take into account input from the user (aside from which URL is selected). Now consider the following example (step2.py)...

import flask
import datetime

# Create a flask instance
app = flask.Flask(__name__)

# When someone goes to the base URL,
# run this function
# (Note: the @ is called a "decorator" in Python: 
# https://www.learnpython.org/en/Decorators)
@app.route('/')
def hello():
    # Similar to a dictionary, see if the parameter
    # "name" has been supplied; if so, return it,
    # otherwise return "World"
    name = flask.request.args.get('name', 'World')

    # Just return the following text
    # Note #1: three quotes allows you to provide
    # very long strings
    # Note #2: the <a> tag allows you to make a link
    # Note #3: <br> is the tag for a line break
    return """Hello, {}!<br />
Click <a href="time">here</a> for the time""".format(name)

# Now support multiple URLs
@app.route('/time')
def time():
    return "Now: {}".format(datetime.datetime.now())

# When this program is run, start flask!
# if __name__ == '__main__':
#     app.run()

When you run this program and load it into a browser you won't see any difference. BUT, now try accessing the following URL...

http://127.0.0.1:5000/?name=Python

That URL encodes an "argument", which, like a dictionary entry, is a key-value pair (in this case the key is "name" and the value is "Python"). The first code line of the hello function looks for this argument and uses its value as the name variable (note: as with a dictionary, the get method of the request arguments returns the supplied default, here "World", if no such input was provided).
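
To see the key-value structure for yourself, the sketch below pulls the argument out of the URL using only Python's standard urllib.parse module. This is an illustration of the idea, not a description of how flask parses requests internally.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://127.0.0.1:5000/?name=Python"

# Everything after the "?" is called the query string
query = urlsplit(url).query
print(query)  # name=Python

# parse_qs produces a dictionary; each value is a list because
# the same key may appear more than once in a URL
args = parse_qs(query)
print(args)  # {'name': ['Python']}

# Mimic flask.request.args.get('name', 'World')
name = args.get('name', ['World'])[0]
print(name)  # Python

# With no query string, the default is used instead
no_args = parse_qs(urlsplit("http://127.0.0.1:5000/").query)
default_name = no_args.get('name', ['World'])[0]
print(default_name)  # World
```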

But this is generally unfriendly to the user ... and really, you've probably never done anything like that ... so how do real websites get user input? Now try the following code (step3.py)...

import flask
import datetime

# Create a flask instance
app = flask.Flask(__name__)

# When someone goes to the base URL,
# run this function
# (Note: the @ is called a "decorator" in Python: 
# https://www.learnpython.org/en/Decorators)
@app.route('/')
def hello():
    # Similar to a dictionary, see if the parameter
    # "name" has been supplied; if so, return it,
    # otherwise return "World"
    name = flask.request.args.get('name', 'World')

    # Just return the following text
    # Note #1: three quotes allows you to provide
    # very long strings
    # Note #2: the <a> tag allows you to make a link
    # Note #3: <br> is the tag for a line break
    return """Hello, {}!<br />
Click <a href="time">here</a> for the time<br />
<form action="/">
Input your name: <input type="text" name="name" />
<input type="submit" />
</form>""".format(name)

# Now support multiple URLs
@app.route('/time')
def time():
    return "Now: {}".format(datetime.datetime.now())

# When this program is run, start flask!
# if __name__ == '__main__':
#     app.run()

The hello function now returns slightly more complicated HTML that creates a small form (via the form tag). When the submit button (the second input tag) is pressed, the browser sends an argument whose key is "name" and whose value is whatever the user typed into the text box (the first input tag) to the "/" URL (given by the "action" attribute of the form tag).
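
Because the form does not specify otherwise, the browser submits it as a GET request, placing the form data in the URL just like the hand-typed example earlier. The sketch below imitates that step with the standard library's urlencode function; the typed name is a made-up example.

```python
from urllib.parse import urlencode

# The text box's name attribute becomes the key;
# what the user typed becomes the value
form_data = {'name': 'Ada Lovelace'}  # hypothetical user input

# Characters that are not allowed in URLs get encoded
# (for example, a space becomes "+")
query = urlencode(form_data)
print(query)  # name=Ada+Lovelace

# The form's action attribute ("/") supplies the path
url = "http://127.0.0.1:5000/" + "?" + query
print(url)  # http://127.0.0.1:5000/?name=Ada+Lovelace
```

After submitting the form in a browser, the address bar should show a URL of this shape.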

Thus, URLs and web forms typically serve as the basis by which users provide input to a web application.
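
One caution follows from this: whatever the user types ends up inside the page we return, so a user who types HTML can change how the page looks or behaves. A common precaution is to "escape" the input first. The sketch below uses the standard library's html.escape function; it is one possible approach under that assumption, not something the step files in the repository do.

```python
import html

template = "Hello, {}!<br />"

# Hypothetical input from a user who types HTML into the text box
raw_name = '<b>Python</b>'

# Without escaping, the browser would treat the input as markup
unsafe_page = template.format(raw_name)
print(unsafe_page)

# html.escape replaces characters such as < and > with entities
# that the browser displays literally instead of interpreting
safe_name = html.escape(raw_name)
safe_page = template.format(safe_name)
print(safe_page)
```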

5. I Can Haz (Python-Generated) Pictures?

We've now seen how Python can dynamically change text on a webpage, but what about pictures (e.g., a plot from matplotlib)? The general way to do so is to generate the image, store its data, and then send it back to the browser. That last step has two general approaches: either encode the image data directly into the page (step4.py)...

import flask
import datetime

# PyPlot may otherwise try to use a GUI backend,
# which does not work inside a web server; selecting
# the non-interactive 'Agg' backend is the fix
import matplotlib
matplotlib.use('Agg')

# Then import PyPlot as usual
import matplotlib.pyplot as plt

# Useful for capturing input/output in
# variables (e.g., picture data)
import io

# Useful for encoding data in a way
# that can be communicated as text
# (e.g., picture on a webpage)
import base64

# Create a flask instance
app = flask.Flask(__name__)

# When someone goes to the base URL,
# run this function
# (Note: the @ is called a "decorator" in Python: 
# https://www.learnpython.org/en/Decorators)
@app.route('/')
def hello():
    # Similar to a dictionary, see if the parameter
    # "name" has been supplied; if so, return it,
    # otherwise return "World"
    name = flask.request.args.get('name', 'World')

    # Just return the following text
    # Note #1: three quotes allows you to provide
    # very long strings
    # Note #2: the <a> tag allows you to make a link
    # Note #3: <br> is the tag for a line break
    return """Hello, {}!<br />
Click <a href="time">here</a> for the time<br />
Click <a href="pretty">here</a> for a pretty picture<br />
<form action="/">
Input your name: <input type="text" name="name" />
<input type="submit" />
</form>""".format(name)

# Now support multiple URLs
@app.route('/time')
def time():
    return "Now: {}".format(datetime.datetime.now())

# What about pictures?
@app.route('/pretty')
def pic():
    # Variable to capture the rendered picture
    pic_result = io.BytesIO()

    # Make the graph (as usual)
    plt.plot([1,2,3,5], [1,2,1,2], color="k")

    # Save the picture data to a variable
    plt.savefig(pic_result, format='png')

    # Clear the plot so that repeated requests
    # do not keep drawing on the same figure
    plt.clf()
    
    # Convert the picture data to a text representation
    # for a webpage
    html_pic = base64.encodebytes(pic_result.getvalue()).decode('utf-8')

    # Insert the picture data into the HTML
    return '<img src="data:image/png;base64,{}" />'.format(html_pic)

# When this program is run, start flask!
# if __name__ == '__main__':
#     app.run()

Notice that when the /pretty URL is accessed, we use matplotlib to produce a plot, save it to a variable (of type BytesIO), encode it as text, and then directly return it with an image tag. If you view the source of the produced page, there is a LONG string of data placed there representing the data of the image.
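
The encoding step can be tried on its own, without matplotlib or flask. In the sketch below the "picture" is just a short stand-in byte string (it is not a real image), since the point is only to show how bytes become text for the img tag and back again. It also compares encodebytes (used in step4.py) with b64encode, which produces the same characters without line breaks.

```python
import base64

# Stand-in bytes: the 8-byte PNG signature plus filler, NOT a real image.
# In step4.py these bytes would come from pic_result.getvalue().
picture_bytes = b'\x89PNG\r\n\x1a\n' + bytes(range(100))

# encodebytes breaks its output into lines using newline characters
wrapped = base64.encodebytes(picture_bytes).decode('utf-8')

# b64encode produces the same characters on a single line
single_line = base64.b64encode(picture_bytes).decode('utf-8')

print('\n' in wrapped)       # True
print('\n' in single_line)   # False
print(wrapped.replace('\n', '') == single_line)  # True

# Either form can be placed in the image tag
img_tag = '<img src="data:image/png;base64,{}" />'.format(single_line)
print(img_tag[:60] + '...')

# Decoding the text recovers exactly the original bytes
recovered = base64.b64decode(single_line)
print(recovered == picture_bytes)  # True
```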

Alternatively, flask allows a URL to directly return the data of an image (step4b.py)...

import flask
import datetime

# PyPlot may otherwise try to use a GUI backend,
# which does not work inside a web server; selecting
# the non-interactive 'Agg' backend is the fix
import matplotlib
matplotlib.use('Agg')

# Then import PyPlot as usual
import matplotlib.pyplot as plt

# Useful for capturing input/output in
# variables (e.g., picture data)
import io

# Create a flask instance
app = flask.Flask(__name__)

# When someone goes to the base URL,
# run this function
# (Note: the @ is called a "decorator" in Python: 
# https://www.learnpython.org/en/Decorators)
@app.route('/')
def hello():
    # Similar to a dictionary, see if the parameter
    # "name" has been supplied; if so, return it,
    # otherwise return "World"
    name = flask.request.args.get('name', 'World')

    # Just return the following text
    # Note #1: three quotes allows you to provide
    # very long strings
    # Note #2: the <a> tag allows you to make a link
    # Note #3: <br> is the tag for a line break
    return """Hello, {}!<br />
Click <a href="time">here</a> for the time<br />
Click <a href="pretty">here</a> for a pretty picture<br />
<form action="/">
Input your name: <input type="text" name="name" />
<input type="submit" />
</form>""".format(name)

# Now support multiple URLs
@app.route('/time')
def time():
    return "Now: {}".format(datetime.datetime.now())

# What about pictures?
# This is an alternate method
# that has one function dedicated
# to producing/serving the image (pic_data below)
# and another to referring to it (pic here)
@app.route('/pretty')
def pic():
    return '<img src="/image_data" />'

# On-demand produces a picture
@app.route('/image_data')
def pic_data():
    # Variable to capture the rendered picture
    pic_result = io.BytesIO()

    # Make the graph (as usual)
    plt.plot([1,2,3,5], [1,2,1,2], color="k")

    # Save the picture data to a variable
    plt.savefig(pic_result, format='png')

    # Clear the plot so that repeated requests
    # do not keep drawing on the same figure
    plt.clf()

    # "rewind" to the beginning of the image data
    pic_result.seek(0)

    # send the image data to the browser, stating
    # explicitly that it is a PNG (the attachment_filename
    # argument was renamed download_name in Flask 2.0)
    return flask.send_file(pic_result, mimetype='image/png')

# When this program is run, start flask!
# if __name__ == '__main__':
#     app.run()

In this case, the img tag tells the browser to look to the /image_data URL for the image, which we then provide via the pic_data function. The send_file function sends the image to the browser, as well as some additional "headers" telling it that what it is receiving is a PNG image.
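
Those headers can be inspected directly. The sketch below is a cut-down stand-in for the /image_data endpoint: it serves a few placeholder bytes instead of a real plot, passes mimetype='image/png' to send_file to state the type explicitly, and uses Flask's test client in place of a browser.

```python
import io
import flask

app = flask.Flask(__name__)

@app.route('/image_data')
def pic_data():
    # Stand-in bytes (just the PNG signature), NOT a real plot
    pic_result = io.BytesIO(b'\x89PNG\r\n\x1a\n')
    return flask.send_file(pic_result, mimetype='image/png')

client = app.test_client()
response = client.get('/image_data')

print(response.status_code)

# The Content-Type header is how the browser learns
# that the bytes it received are a PNG image
content_type = response.headers['Content-Type']
print(content_type)

# The body is raw bytes rather than HTML text
body = response.get_data()
print(body)
```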

The benefit of the former approach is that there is one request/response with all data in one place (though, that response is quite large). The benefit of the latter approach is that multiple pages could refer to the same image URL for different reasons (though, the browser is now having to make multiple requests on the page).
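
To put a rough number on "quite large": base64 represents every 3 bytes of data as 4 text characters, so an embedded image takes about one third more space than the raw image. A quick check, using 3000 zero bytes as a stand-in for image data:

```python
import base64

raw = bytes(3000)  # stand-in for 3000 bytes of image data
encoded = base64.b64encode(raw)

print(len(raw), len(encoded))    # 3000 4000
print(len(encoded) / len(raw))   # roughly 1.33
```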

6. Putting It All Together: A Data Science Web App

Consider the following code (step5.py), which allows the user to perform the pandas analysis of red wines from a couple weeks ago, checking for any given attribute the relationship between that feature and overall wine quality...

import flask

app = flask.Flask(__name__)

# 

import matplotlib
matplotlib.use('Agg')

import matplotlib.pyplot as plt

# 

import io
import base64

# 

import pandas as pd
import statsmodels.formula.api as stats

##############################################

@app.route('/')
def home():
    # Read the CSV, get a list of all columns except the last
    wine = pd.read_csv("winequality-red.csv")
    wine_columns = wine.columns.values[:-1]

    # Create a form allowing the user to select a column
    # and a legal output format
    return """Wine Analysis<br />
<form action="/analyze">
    Column: <select name="column">{}</select><br />
    Format: <select name="format"><option value="html">html</option><option value="json">json</option></select><br />
    <input type="submit" />
</form>
""".format("".join(['<option value="{}">{}</option>'.format(col, col) for col in wine_columns]))

def analyze_html(column_x, column_y, x, y, y_predicted, r2, m, b):
    # Output the original data as red dots
    plt.plot(x, y, 'ro', label='Actual')

    # And the predicted line in black
    plt.plot(x, y_predicted, 'k', label='Predicted')

    # Axes labels
    plt.xlabel(column_x)
    plt.ylabel(column_y)

    # Equation of the predicted line, with r^2
    plt.title('{} ~ {:.3f}({}) + {:.3f} (R^2={:.3f})'.format(column_y, m, column_x, b, r2))
    plt.legend()

    # Grab the resulting image
    pic_result = io.BytesIO()
    plt.savefig(pic_result, format='png')

    # Clear the plot (in case future requests are made)
    plt.clf()

    # Produce HTML-friendly version of the picture
    html_pic = base64.encodebytes(pic_result.getvalue()).decode('utf-8')

    # Produce the image + a back link
    return """<img src="data:image/png;base64,{}" /><br /><a href="/">back</a>
""".format(html_pic)

@app.route('/analyze')
def analyze():
    # Get parameters sent by the user
    output_format = flask.request.args.get('format', '')
    column = flask.request.args.get('column', '')

    # Read the CSV
    wine = pd.read_csv("winequality-red.csv")

    # If a legal column (aside from the last)
    if column in wine.columns.values[:-1]:
        # Get the name of the last column
        y_column = wine.columns.values[-1]

        # Perform a linear regression
        # (the Q function takes care of column names with spaces)
        regression = stats.ols(formula='Q("{}") ~ Q("{}")'.format(y_column, column), data=wine).fit()
        
        # Grab x/y data points
        x = list(wine[column].values)
        y = list(wine[y_column].values)

        # Extract regression parameters
        m = regression.params['Q("{}")'.format(column)]
        b = regression.params['Intercept']
        r2 = regression.rsquared
        y_predicted = list(regression.fittedvalues)

        # Produce output based upon the requested format
        if output_format == "html":
            return analyze_html(column, y_column, x, y, y_predicted, r2, m, b)
        elif output_format == "json":
            result = {
                'column':column,
                'r_squared':r2,
                'slope':m,
                'intercept':b
            }

            # Converts to proper JSON and tells
            # the browser to expect it
            return flask.jsonify(result)
    
    # If bad parameters, return home
    return home()

# if __name__ == '__main__':
#     app.run()

You'll notice that you can choose to output the feature-analysis as either HTML (in which case a dynamic image is generated, good for humans) or JSON (in which case the core parts of the analysis are in a dictionary, good for Python). Thus, we now have an example application that we could access via a Python client as a web API or for scraping purposes.
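
As a sketch of the "good for Python" side, the code below shows a client reading a JSON response as a dictionary. To keep it self-contained, the /analyze endpoint here is a stand-in that returns placeholder zeros (NOT real regression results), the column name "alcohol" is an assumed example, and Flask's test client plays the role of the requests module.

```python
import flask

app = flask.Flask(__name__)

@app.route('/analyze')
def analyze():
    # Placeholder numbers, NOT real regression results
    result = {
        'column': flask.request.args.get('column', ''),
        'r_squared': 0.0,
        'slope': 0.0,
        'intercept': 0.0
    }
    return flask.jsonify(result)

client = app.test_client()
response = client.get('/analyze?column=alcohol&format=json')

# jsonify labels the response so clients know to expect JSON
print(response.headers['Content-Type'])

# get_json() parses the JSON text back into a Python dictionary
data = response.get_json()
print(data['column'], data['slope'])
```

Against the real application, the equivalent client code would use the requests module with the server's URL, as in the earlier lectures on web APIs.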

When you run flask, the resulting web application is only accessible on your computer. However, it is possible to have your web app running on a publicly accessible computer, such that anyone can access your page. While this brings many security and other issues into consideration, you can get a feel for it using a free service, such as Heroku...

https://devcenter.heroku.com/articles/getting-started-with-python

In fact, the GitHub repository can be used to produce a publicly accessible version of the last code example...

https://web-wine.herokuapp.com