DS2000 (Spring 2019, NCH) :: Lecture 9a

0. Administrivia

  1. Wednesday (in Practicum): LAST in-class quiz (via Blackboard; no book/notes/Python)
    • Review PCQ 6/7
  2. Due Friday @ 9pm: HW8 (submit via Blackboard)
  3. No Derbinsky office hour Friday :(

1. What are Jupyter Notebooks?

(From First Python Notebook: http://www.firstpythonnotebook.org/notebook/) A Jupyter Notebook is a browser application where you can write, run, remix and republish code. It is free software you can install and run like any other open-source library. It is used by scientists, scholars, investors, and corporations to create and share their research. It is also used by journalists to develop stories and show their work.

Some benefits:

  • Provides a single view in which you can see both the result of a computation and the code that produced it (great for reproducibility!)
  • Allows you to interleave code with annotation and visualization
  • It's cross-platform and, because it is so widely used, supported on many sites (e.g., Google Drive, GitHub)
  • Works with multiple languages in addition to Python, such as R

Caution:

  • As you'll see below, it is possible to execute code out-of-order in a notebook, which can cause confusion; always be sure to re-run your notebooks from scratch to make sure they do what you want them to!

2. Getting Started with Jupyter Notebooks

Jupyter Notebooks come pre-installed with Anaconda, so to get started, go to your Command Line or Terminal and type...

jupyter notebook

This will then launch a web browser, where you'll do all your notebooking. To exit, press control+c in the terminal/command prompt and close your browser windows/tabs.

From the Home screen in your browser, either click an existing notebook (files typically end in .ipynb) or choose New -> Python 3 to create one.

Click the title to name/rename the notebook. Notebooks auto-save every so often, but you can also save manually (control/command+s).

The textbox below is a "cell" - you can have as many as you wish (see Insert and Edit menus to add/remove cells), each for either (Python) code or annotation (use the dropdown list to choose Python or Markdown). Jupyter Notebooks support the Markdown language for simple formatting of text, including italics, bold, and links. It can also do tables...

| Tables     | Are (centered!) | Fun (right-aligned!) |
|------------|:---------------:|---------------------:|
| this is r1 |  and I can put  |           some value |
| row 2!     |    centered     |                   25 |
| tables     |     are fun     |            very cool |

and pretty equations (using LaTeX)...

$$ e = mc^2 \\ \pi = 4 \sum_{n=0}^{\infty} \frac{(-1)^n}{2n + 1} $$
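
The second equation above (the Leibniz series for pi) is easy to check numerically. Here is a minimal sketch; the function name leibniz_pi is made up for illustration. It sums the first few terms of the series and shows how slowly the approximation closes in on math.pi.

```python
import math

def leibniz_pi(terms):
    """Approximate pi with the first `terms` terms of 4 * sum((-1)^n / (2n + 1))."""
    total = 0.0
    for n in range(terms):
        total += (-1) ** n / (2 * n + 1)
    return 4 * total

# Print the approximation and its absolute error for a growing number of terms
for terms in (10, 1000, 100000):
    approx = leibniz_pi(terms)
    print(terms, approx, abs(approx - math.pi))
```

The error shrinks roughly in proportion to 1/terms, so each extra correct digit costs about ten times as many terms.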

To execute a cell (i.e., see the output of Python code or render Markdown annotation), click the Run button or press shift+enter/return. For Python code, the "In [X]" next to the cell shows either that the cell is currently executing (X is *) or a counter (always increasing) recording when the cell was last executed -- this is important, because cells can be executed out-of-order. As a result, before submitting a notebook, always click Restart & Run All from the Kernel menu to re-run the notebook from scratch and make sure it still works.

In [2]:
# Run first
x = 5
In [3]:
# Now me!
print(x)
5
In [4]:
# You can also just put a command/expression to see its result
x
Out[4]:
5
In [5]:
# Functions are also fine within/between cells

def myfunc(x):
    return x**2

myfunc(17)
Out[5]:
289
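
To see why out-of-order execution matters, here is a small simulation in plain Python (a sketch, not how Jupyter is implemented). Each string stands in for one cell, and a dictionary plays the role of the variables shared between cells; the cell names A, B, C and the helper run are made up for illustration.

```python
# Each string stands in for one notebook cell
cells = {
    "A": "x = 5",
    "B": "y = x * 2",
    "C": "x = 100",
}

def run(order):
    """Execute the named cells in the given order, starting from a fresh namespace."""
    namespace = {}  # plays the role of the variables shared by all cells
    for name in order:
        exec(cells[name], namespace)
    return namespace["y"]

top_to_bottom = run(["A", "B", "C"])  # the order Restart & Run All would use
out_of_order = run(["A", "C", "B"])   # cell C was run before cell B

print(top_to_bottom)  # 10
print(out_of_order)   # 200
```

Same cells, different order, different value of y: this is the situation Restart & Run All protects you from.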

3. Visualizations in Jupyter Notebooks with pyplot

There is a commonly used visualization tool called matplotlib -- while others are out there (e.g., Bokeh), we will use this for making various forms of charts/plots.

To start, import the module...

In [6]:
import matplotlib.pyplot as plt

There are MANY commands in matplotlib; we'll just cover some of the basics...

In [7]:
# Draw a line graph

# first X coordinates, then associated Y's
plt.plot([1,2,3,5], [1,2,1,2])

# Might need to re-execute if rendering is slow
plt.show()
In [8]:
# Draw the same points, but without lines and in a different color!
plt.plot([1,2,3,5], [1,2,1,2], 'ro')
Out[8]:
[<matplotlib.lines.Line2D at 0x115a1aef0>]
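
The Out[8] line above appears because plt.plot returns a value (a list of line objects), and a notebook displays the value of the last expression in a cell. The sketch below captures that return value and inspects it. It assumes matplotlib is installed, and it selects the non-interactive "Agg" backend only so that it also runs outside a notebook (you would not need that line inside Jupyter).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: lets this run as a plain script, with no window needed
import matplotlib.pyplot as plt

# plt.plot returns a list with one Line2D object per line drawn
lines = plt.plot([1, 2, 3, 5], [1, 2, 1, 2], 'ro')
line = lines[0]

print(len(lines))         # one x/y pair was plotted, so one line object
print(line.get_marker())  # the 'o' in 'ro' asks for circle markers
print(line.get_color())   # the 'r' in 'ro' asks for red

plt.close("all")  # free the figure now that we are done with it
```

Ending a cell with plt.show(), as in In [7], avoids the extra Out[...] line because nothing is left over to display.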
In [9]:
# You can overlay two plots
plt.plot([1,2,3,5], [1,2,1,2], 'k')
plt.plot([1,2,3,5], [1,2,1,2], 'ro')
Out[9]:
[<matplotlib.lines.Line2D at 0x115abdbe0>]
In [10]:
# Now let's provide all that you would expect from
# a graph intended for a user

plt.plot([1,2,3,5], [1,2,1,2], 'k', label='Model')
plt.plot([1,2,3,5], [1,2,1,2], 'ro', label='Data')

plt.xlabel('X-Axis Description/Units')
plt.ylabel('Y-Axis Description/Units')
plt.title('Plot Title')
plt.text(2.25, 1.9, 'Some important information')

plt.axis([0, 5.5, 0, 2.5])
plt.grid(True)
plt.legend()
Out[10]:
<matplotlib.legend.Legend at 0x115bb5da0>
In [11]:
# Other kinds of plots are possible...

plt.bar([0, 1], [1, 2], color=["Red", "Blue"], yerr=[0.5, 0], tick_label=['A', 'B'], align='center')
plt.ylabel('Y-Axis Description/Units')
plt.title('Plot Title')

plt.axis([-0.5, 1.5, 0, 3])

plt.show()
In [12]:
# Histogram: count the grades falling into 5 equal-width bins between 0 and 100
plt.hist([12, 27, 29, 34, 47, 48, 49, 56, 57, 57, 58, 62, 78, 99], bins=5, range=(0, 100))

plt.title('Sad Exam :(')
plt.xlabel('Grades')
plt.yticks(range(0, 11))

plt.show()
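
To make the bar heights in that histogram less mysterious, here is a plain-Python sketch of the counting involved. The helper bin_counts is made up for illustration and is not part of matplotlib. It assumes equal-width bins, with the right edge of the last bin included.

```python
def bin_counts(values, bins, value_range):
    """Count how many values fall into each of `bins` equal-width bins."""
    low, high = value_range
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # A value equal to `high` belongs to the last bin
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts

grades = [12, 27, 29, 34, 47, 48, 49, 56, 57, 57, 58, 62, 78, 99]
counts = bin_counts(grades, bins=5, value_range=(0, 100))
print(counts)  # [1, 3, 7, 2, 1]
```

So the tallest bar (the 40-60 bin) has height 7, which is why y-ticks from 0 to 10 are enough for this plot.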
In [13]:
# Often you want data read from a file or generated via code to produce these plots
import random
x = list(range(20))
y = [x_i + random.gauss(0, 1) for x_i in x]

plt.plot(x, y, 'k')

plt.show()
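
The cell above generates its data via code; for the "read from a file" case, here is a minimal sketch using the standard csv module. The file name points.csv, the column names x and y, and the helper read_xy are all assumptions made for illustration. The example first writes a tiny file into a temporary folder so that it is self-contained.

```python
import csv
import os
import tempfile

def read_xy(path):
    """Read a CSV file with header columns 'x' and 'y' into two lists of floats."""
    xs, ys = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            xs.append(float(row["x"]))
            ys.append(float(row["y"]))
    return xs, ys

# Create a small example file, then read it back
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "points.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])
        writer.writerows([[0, 0.3], [1, 1.1], [2, 1.8]])
    x, y = read_xy(path)

print(x)  # [0.0, 1.0, 2.0]
print(y)  # [0.3, 1.1, 1.8]
```

The two lists can then be passed straight to plt.plot(x, y, 'k'), exactly as in the cell above.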