{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# DS2500 Lesson6\n",
    "\n",
    "Jan 31, 2023\n",
    "\n",
    "### Content:\n",
    "- Pandas\n",
    "    - series\n",
    "    - dataframe\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# before you begin, make sure you can load data from seaborn\n",
    "import seaborn as sns\n",
    "df_penguin = sns.load_dataset('penguins')\n",
    "df_titanic = sns.load_dataset('titanic')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Having trouble?\n",
    "- see piazza for long-term solution\n",
    "    - [mac SSL error](https://piazza.com/class/lbxsbawi9yq2f9/post/55)\n",
    "- use code below for today:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# if these lines give you trouble, use the csvs available on the website\n",
    "# (be sure they're adjacent to this .ipynb file on your machine)\n",
    "import pandas as pd\n",
    "df_penguin = pd.read_csv('penguin.csv')\n",
    "df_titanic = pd.read_csv('titanic.csv')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Defaulting to user installation because normal site-packages is not writeable\n",
      "Requirement already satisfied: pandas in /home/matt/.local/lib/python3.10/site-packages (1.5.2)\n",
      "Requirement already satisfied: numpy>=1.21.0 in /home/matt/.local/lib/python3.10/site-packages (from pandas) (1.24.1)\n",
      "Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas) (2022.1)\n",
      "Requirement already satisfied: python-dateutil>=2.8.1 in /home/matt/.local/lib/python3.10/site-packages (from pandas) (2.8.2)\n",
      "Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)\n"
     ]
    }
   ],
   "source": [
    "!pip3 install pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Admin:\n",
    "- lab1\n",
    "    - talk to friends\n",
    "    - lab digest\n",
    "    - part b (part c)\n",
    "- hw0 due friday @ 11:59 PM\n",
    "    - .py and .ipynb\n",
    "    - see canvas announcement\n",
    "    - see piazza\n",
    "- look at schedule together\n",
    "- tutoring groups\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The value of talking-out-loud about programming\n",
    "\n",
    "... I learned 2 new ways to approach lab1's part B `get_win_set()` this morning!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if all items in an array are the same, then the std dev is 0\n",
    "import numpy as np\n",
    "\n",
    "np.array([1, 1, 2]).std() == 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if all items in an array are the same, \n",
    "\n",
    "# mysterious section 2 student\n",
    "len(set(np.array([1, 1, 2]))) == 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Pandas\n",
    "\n",
    "Pandas is a python module which stores data in `pd.DataFrame` and `pd.Series` objects.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "3  Adelie  Torgersen             NaN            NaN                NaN   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "\n",
       "   body_mass_g     sex  \n",
       "0       3750.0    Male  \n",
       "1       3800.0  Female  \n",
       "2       3250.0  Female  \n",
       "3          NaN     NaN  \n",
       "4       3450.0  Female  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import seaborn as sns\n",
    "\n",
    "# Example DataFrame:\n",
    "# df stands for dataframe.  df_penguin is a dataframe of penguin data\n",
    "df_penguin = sns.load_dataset('penguins')\n",
    "df_penguin.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      39.1\n",
       "1      39.5\n",
       "2      40.3\n",
       "3       NaN\n",
       "4      36.7\n",
       "       ... \n",
       "339     NaN\n",
       "340    46.8\n",
       "341    50.4\n",
       "342    45.2\n",
       "343    49.9\n",
       "Name: bill_length_mm, Length: 344, dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# example series: the \"s_\" is a (personal) convention for variables which are series\n",
    "s_bill_length_mm = df_penguin['bill_length_mm']\n",
    "\n",
    "s_bill_length_mm\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "\n",
    "### `pd.DataFrame` are two-dimensional, `pd.Series` are one-dimensional\n",
    "\n",
    "### If we already have `np.array()`, why do we need pandas?\n",
    "- pandas supports non numeric data (strings for categorical data, for example)\n",
    "- pandas supports reading / storing data from more formats\n",
    "    - csv (spreadsheets)\n",
    "- pandas more elegantly deals with missing data\n",
    "- pandas handles indexing woes\n",
    "\n",
    "You could do almost everything pandas does with numpy arrays ... but it'd be much more difficult to accomplish.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Pandas Series\n",
    "\n",
    "### building:\n",
    "- building: default index\n",
    "- building: custom index\n",
    "- building: from a dict\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "\n",
       "   body_mass_g     sex  \n",
       "0       3750.0    Male  \n",
       "1       3800.0  Female  \n",
       "2       3250.0  Female  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# look at first 3 rows of dataframe (for reference)\n",
    "df_penguin.head(3)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "species                 Adelie\n",
       "island               Torgersen\n",
       "bill_length_mm            39.1\n",
       "bill_depth_mm             18.7\n",
       "flipper_length_mm        181.0\n",
       "body_mass_g             3750.0\n",
       "sex                       Male\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# each row, or column of dataframe is a series object\n",
    "# below is first row of dataframe (more on iloc indexing later...)\n",
    "# (remember: each row is a sample -> this is 1 penguin's data)\n",
    "penguin0_series = df_penguin.iloc[0, :]\n",
    "penguin0_series\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas series contain a sequence of labelled data elements:\n",
    "- penguin0's `species` is `Adelie`\n",
    "- penguin0's `island` is `Torgersen`\n",
    "- penguin0's `bill_length_mm` is `39.1` ...\n",
    "- penguin0's `<index-name>` is `<corresponding-value>`\n",
    "\n",
    "A series is quite similar to a dictionary ...\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "penguin0_dict = {'species': 'Adelie',\n",
    "                 'island': 'Torgersen',\n",
    "                 'bill_length_mm': 39.1,\n",
    "                 'bill_depth_mm': 18.7,\n",
    "                 'flipper_length_mm': 181.0,\n",
    "                 'body_mass_g': 3750.0,\n",
    "                 'sex': 'Male'}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "species                 Adelie\n",
       "island               Torgersen\n",
       "bill_length_mm            39.1\n",
       "bill_depth_mm             18.7\n",
       "flipper_length_mm        181.0\n",
       "body_mass_g             3750.0\n",
       "sex                       Male\n",
       "dtype: object"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# build a series from dict\n",
    "penguin0_series = pd.Series(penguin0_dict)\n",
    "penguin0_series\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "species                 Adelie\n",
       "island               Torgersen\n",
       "bill_length_mm            39.1\n",
       "bill_depth_mm             18.7\n",
       "flipper_length_mm        181.0\n",
       "body_mass_g             3750.0\n",
       "sex                       Male\n",
       "dtype: object"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can also pass two corresponding lists / tuples\n",
    "index = ['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']\n",
    "values = ['Adelie', 'Torgersen', 39.1, 18.7, 181.0, 3750.0, 'Male']\n",
    "\n",
    "penguin0_series = pd.Series(values, index=index)\n",
    "penguin0_series\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0          vanilla\n",
       "1        chocolate\n",
       "2    cherry garcia\n",
       "3          oatmeal\n",
       "dtype: object"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# sometimes your data has no meaningful index\n",
    "# pandas will default to indexing things with integers\n",
    "ice_cream_flavors = 'vanilla', 'chocolate', 'cherry garcia', 'oatmeal'\n",
    "pd.Series(ice_cream_flavors)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Adelie', 'Torgersen', 39.1, 18.7, 181.0, 3750.0, 'Male'],\n",
       "      dtype=object)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can access values as an array via .values\n",
    "penguin0_series.values\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['species', 'island', 'bill_length_mm', 'bill_depth_mm',\n",
       "       'flipper_length_mm', 'body_mass_g', 'sex'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can access index (as a special pandas \"index\" object) via .index\n",
    "penguin0_series.index\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### indexing into `pd.Series`: accessing / changing data\n",
    "- accessing / setting using index:\n",
    "    - by name: `series.loc[name]`\n",
    "    - by position: `series.iloc[idx]`\n",
    "- iterating: keys, items, iteritems (much like dict)\n",
    "- deleting an entry\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matt      6\n",
       "riva      7\n",
       "eli      11\n",
       "zeke    101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dict_fav_num = {'matt': 6, 'riva': 7, 'eli': 11, 'zeke': 101}\n",
    "series_fav_num = pd.Series(dict_fav_num)\n",
    "series_fav_num"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "11"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# lookup by position: get value in position 2 (third)\n",
    "series_fav_num.iloc[2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "101"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# lookup by index (name): get value associated with index='matt'\n",
    "series_fav_num.loc['zeke']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can also address directly into the series object to lookup by index\n",
    "# (my mild preference nobody follows: avoid this ... a bit more ambiguous)\n",
    "series_fav_num['matt']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matt       6\n",
       "riva       7\n",
       "eli     1000\n",
       "zeke     101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# each of these access methods can also set the value\n",
    "series_fav_num.iloc[2] = 1000\n",
    "series_fav_num"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check membership of item in index\n",
    "'matt' in series_fav_num.index\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'bob' in series_fav_num.index\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "1000 in series_fav_num.values\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iterating through elements of a `pd.Series`\n",
    "\n",
    "... pretty much the same as a dictionary except pandas uses an \"index\" while a dictionary has \"keys\".\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "matt\n",
      "riva\n",
      "eli\n",
      "zeke\n"
     ]
    }
   ],
   "source": [
    "# iterating through index (note: no parenthases around .index below)\n",
    "for idx in series_fav_num.index:\n",
    "    print(idx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6\n",
      "7\n",
      "1000\n",
      "101\n"
     ]
    }
   ],
   "source": [
    "# iterating through values (notice: no parenthases on .values belwo)\n",
    "for val in series_fav_num.values:\n",
    "    print(val)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "matt 6\n",
      "riva 7\n",
      "eli 1000\n",
      "zeke 101\n"
     ]
    }
   ],
   "source": [
    "# iterating through index, value pairs (just like dict!)\n",
    "for key, val in series_fav_num.items():\n",
    "    print(key, val)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Removing an element\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "# removing a pair by its corresponding index (just like dict!)\n",
    "del series_fav_num['matt']\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "riva       7\n",
       "eli     1000\n",
       "zeke     101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series_fav_num\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Examining a `pd.Series`\n",
    "\n",
    "Just like numpy arrays:\n",
    "- `Series.argmin()`\n",
    "    - which index has smallest value\n",
    "    - pandas gives the row number, not the index\n",
    "- `Series.argmax()`\n",
    "    - which index has largest value\n",
    "    - pandas gives the row number, not the index\n",
    "- `Series.mean()`\n",
    "- `Series.min()`\n",
    "- `Series.max()`\n",
    "- `Series.std()`\n",
    "- `Series.var()`\n",
    "\n",
    "But wait, there's more!  These are in pandas objects but not numpy array\n",
    "- `Series.count()`\n",
    "    - number of item pairs in series\n",
    "- `Series.value_counts()`\n",
    "    - count of every unique value in series (like a histogram)\n",
    "    - (see example below please)\n",
    "- `Series.describe()`\n",
    "    - summary statistics\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matt       6\n",
       "riva       7\n",
       "eli       11\n",
       "zeke     101\n",
       "sally    101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dict_fav_num = {'matt': 6, 'riva': 7, 'eli': 11, 'zeke': 101, 'sally': 101}\n",
    "series_fav_num = pd.Series(dict_fav_num)\n",
    "series_fav_num\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "### Our old friends from numpy\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matt       6\n",
       "riva       7\n",
       "eli       11\n",
       "zeke     101\n",
       "sally    101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# for reference\n",
    "series_fav_num\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6, 101, 50.97254162782154, 2598.2)"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# our familiar friends ...\n",
    "series_fav_num.min(), series_fav_num.max(), series_fav_num.std(), series_fav_num.var()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# notice: pandas gives the position of the row with smallest value\n",
    "# (one might think they'd get index 'matt' here instead)\n",
    "series_fav_num.argmin()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'matt'"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# index 0 (first entry) has the lowest favorite number\n",
    "idx_min = series_fav_num.argmin()\n",
    "series_fav_num.index[idx_min]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# index 3 (last entry) has the highest favorite number\n",
    "series_fav_num.argmax()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### New functionality, only in pandas\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matt       6\n",
       "riva       7\n",
       "eli       11\n",
       "zeke     101\n",
       "sally    101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series_fav_num\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# number of entries (rows)\n",
    "series_fav_num.count()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "101    2\n",
       "6      1\n",
       "7      1\n",
       "11     1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# how many times did each of the favorite numbers occur?\n",
    "# (101 occurs twice in series_fav_num, while all other values occur once)\n",
    "series_fav_num.value_counts()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Adelie       152\n",
       "Gentoo       124\n",
       "Chinstrap     68\n",
       "Name: species, dtype: int64"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_penguin['species'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count      5.000000\n",
       "mean      45.200000\n",
       "std       50.972542\n",
       "min        6.000000\n",
       "25%        7.000000\n",
       "50%       11.000000\n",
       "75%      101.000000\n",
       "max      101.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# describe is useful to get a sense of how values are distributed\n",
    "# \"50%\" is equivilent to the median\n",
    "# \"25%\"\" indicates that 25% of data is less than this value (and 75% is greater)\n",
    "series_fav_num.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Extracting a `pd.DataFrame` column as a series\n",
    "\n",
    "A dataframe is a two dimensional table of data.  Each row or column is a series object.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \\\n",
       "0         0       3    male  22.0      1      0   7.2500        S  Third   \n",
       "1         1       1  female  38.0      1      0  71.2833        C  First   \n",
       "2         1       3  female  26.0      0      0   7.9250        S  Third   \n",
       "3         1       1  female  35.0      1      0  53.1000        S  First   \n",
       "4         0       3    male  35.0      0      0   8.0500        S  Third   \n",
       "\n",
       "     who  adult_male deck  embark_town alive  alone  \n",
       "0    man        True  NaN  Southampton    no  False  \n",
       "1  woman       False    C    Cherbourg   yes  False  \n",
       "2  woman       False  NaN  Southampton   yes   True  \n",
       "3  woman       False    C  Southampton   yes  False  \n",
       "4    man        True  NaN  Southampton    no   True  "
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import seaborn as sns\n",
    "\n",
    "# may take a 15 sec on first run to download titanic data\n",
    "df_titanic = sns.load_dataset('titanic')\n",
    "df_titanic.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      22.0\n",
       "1      38.0\n",
       "2      26.0\n",
       "3      35.0\n",
       "4      35.0\n",
       "       ... \n",
       "886    27.0\n",
       "887    19.0\n",
       "888     NaN\n",
       "889    26.0\n",
       "890    32.0\n",
       "Name: age, Length: 891, dtype: float64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# get the age column of dataframe as a series\n",
    "df_titanic['age']\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## In Class Activity A\n",
    "\n",
    "- `.describe()` how much people paid to get aboard the titanic.  \n",
    "- count how many passengers of each age were on board\n",
    "- each passenger corresponds to a row, what is the index of the passenger who paid the highest price?\n",
    "- change the price paid of the passenger in row index 2 (the 3rd row) to `123`\n",
    "    - notice: does anything funny happen here?  If so ... investigate\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \\\n",
       "0         0       3    male  22.0      1      0   7.2500        S  Third   \n",
       "1         1       1  female  38.0      1      0  71.2833        C  First   \n",
       "2         1       3  female  26.0      0      0   7.9250        S  Third   \n",
       "3         1       1  female  35.0      1      0  53.1000        S  First   \n",
       "4         0       3    male  35.0      0      0   8.0500        S  Third   \n",
       "\n",
       "     who  adult_male deck  embark_town alive  alone  \n",
       "0    man        True  NaN  Southampton    no  False  \n",
       "1  woman       False    C    Cherbourg   yes  False  \n",
       "2  woman       False  NaN  Southampton   yes   True  \n",
       "3  woman       False    C  Southampton   yes  False  \n",
       "4    man        True  NaN  Southampton    no   True  "
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_titanic.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    891.000000\n",
       "mean      32.204208\n",
       "std       49.693429\n",
       "min        0.000000\n",
       "25%        7.910400\n",
       "50%       14.454200\n",
       "75%       31.000000\n",
       "max      512.329200\n",
       "Name: fare, dtype: float64"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_titanic['fare'].describe()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "24.00    30\n",
       "22.00    27\n",
       "18.00    26\n",
       "19.00    25\n",
       "28.00    25\n",
       "         ..\n",
       "36.50     1\n",
       "55.50     1\n",
       "0.92      1\n",
       "23.50     1\n",
       "74.00     1\n",
       "Name: age, Length: 88, dtype: int64"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# count how many passengers of each age were on board\n",
    "df_titanic['age'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "258"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# each passenger corresponds to a row, what is the index of the passenger who paid the highest price?\n",
    "df_titanic['fare'].argmax()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        7.2500\n",
       "1       71.2833\n",
       "2      123.0000\n",
       "3       53.1000\n",
       "4        8.0500\n",
       "         ...   \n",
       "886     13.0000\n",
       "887     30.0000\n",
       "888     23.4500\n",
       "889     30.0000\n",
       "890      7.7500\n",
       "Name: fare, Length: 891, dtype: float64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# change the price paid of the passenger in row index 2 (the 3rd row) to 123\n",
    "s_fare = df_titanic['fare']\n",
    "s_fare"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/tmp/ipykernel_12591/3786600315.py:1: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  s_fare.iloc[2] = 123\n"
     ]
    }
   ],
   "source": [
    "s_fare.iloc[2] = 123"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>123.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>886</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>male</td>\n",
       "      <td>27.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>13.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Second</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>887</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>19.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>B</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>23.4500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>889</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>male</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>890</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>32.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.7500</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>891 rows × 15 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     survived  pclass     sex   age  sibsp  parch      fare embarked   class  \\\n",
       "0           0       3    male  22.0      1      0    7.2500        S   Third   \n",
       "1           1       1  female  38.0      1      0   71.2833        C   First   \n",
       "2           1       3  female  26.0      0      0  123.0000        S   Third   \n",
       "3           1       1  female  35.0      1      0   53.1000        S   First   \n",
       "4           0       3    male  35.0      0      0    8.0500        S   Third   \n",
       "..        ...     ...     ...   ...    ...    ...       ...      ...     ...   \n",
       "886         0       2    male  27.0      0      0   13.0000        S  Second   \n",
       "887         1       1  female  19.0      0      0   30.0000        S   First   \n",
       "888         0       3  female   NaN      1      2   23.4500        S   Third   \n",
       "889         1       1    male  26.0      0      0   30.0000        C   First   \n",
       "890         0       3    male  32.0      0      0    7.7500        Q   Third   \n",
       "\n",
       "       who  adult_male deck  embark_town alive  alone  \n",
       "0      man        True  NaN  Southampton    no  False  \n",
       "1    woman       False    C    Cherbourg   yes  False  \n",
       "2    woman       False  NaN  Southampton   yes   True  \n",
       "3    woman       False    C  Southampton   yes  False  \n",
       "4      man        True  NaN  Southampton    no   True  \n",
       "..     ...         ...  ...          ...   ...    ...  \n",
       "886    man        True  NaN  Southampton    no   True  \n",
       "887  woman       False    B  Southampton   yes   True  \n",
       "888  woman       False  NaN  Southampton    no  False  \n",
       "889    man        True    C    Cherbourg   yes   True  \n",
       "890    man        True  NaN   Queenstown    no   True  \n",
       "\n",
       "[891 rows x 15 columns]"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_titanic"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Pandas: DataFrame\n",
    "\n",
    "Remember:\n",
    "- `Series`:  1d data object\n",
    "- `DataFrame`: 2d data object\n",
    "\n",
    "`DataFrame`s represent two-dimensional data, like the quiz scores from last class:\n",
    "\n",
    "|           | Quiz 0 | Quiz 1 | Quiz 2 |\n",
    "|-----------|--------|--------|--------|\n",
    "| Student 0 | 80     | 90     | 50     |\n",
    "| Student 1 | 87     | 92     | 80     |\n",
    "\n",
    "Each column or row above could be considered a `Series` object\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "quiz_array = np.array([[80, 90, 50],\n",
    "                       [87, 92, 80]])\n",
    "\n",
    "df_quiz = pd.DataFrame(quiz_array, \n",
    "                       columns=('quiz0', 'quiz1', 'quiz2'), \n",
    "                       index=('student0', 'student1'))\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>asufdhsaidufhasiudfh</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                      quiz0  quiz1  quiz2\n",
       "student0                 80     90     50\n",
       "asufdhsaidufhasiudfh     87     92     80"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we construct a dataframe as a dictionary\n",
    "# keys of the dictionary are columns of dataframe\n",
    "# values are lists (or tuples) of the values in each column\n",
    "quiz_dict = {'quiz0': [80, 87],\n",
    "            'quiz1': [90, 92],\n",
    "            'quiz2': [50, 80]}\n",
    "pd.DataFrame(quiz_dict, index=('student0', 'asufdhsaidufhasiudfh'))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1   2\n",
       "0  80  90  50\n",
       "1  87  92  80"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# can we make dataframe without labelling rows / columns?\n",
    "df_quiz = pd.DataFrame(quiz_array)\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we can just add the names in afterwards if you'd like to\n",
    "df_quiz.columns = ['quiz0', 'quiz1', 'quiz2']\n",
    "df_quiz.index = ('student0', 'student1')\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Describing a `pd.DataFrame`\n",
    "\n",
    "Just like numpy arrays:\n",
    "- `DataFrame.argmin()`\n",
    "    - which index has smallest value\n",
    "    - pandas gives the row number, not the index\n",
    "- `DataFrame.argmax()`\n",
    "    - which index has largest value\n",
    "    - pandas gives the row number, not the index\n",
    "- `DataFrame.mean()`\n",
    "- `DataFrame.min()`\n",
    "- `DataFrame.max()`\n",
    "- `DataFrame.std()`\n",
    "- `DataFrame.var()`\n",
    "\n",
    "New to pandas:\n",
    "- `DataFrame.count()`\n",
    "    - number of item pairs in series\n",
    "- `DataFrame.describe()`\n",
    "    - summary statistics\n",
    "- `DataFrame.value_counts()`\n",
    "    - count how many unique rows there are\n",
    "    - see falcon / dog / cat example below please\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quiz0    83.5\n",
       "quiz1    91.0\n",
       "quiz2    65.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# by default, each method applies operation to entire column of data\n",
    "df_quiz.mean()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quiz0    83.5\n",
       "quiz1    91.0\n",
       "quiz2    65.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we can also pass axis parameter to specify if operation should be applied to row or column\n",
    "# !remember!\n",
    "# axis=0 -> apply operation across all rows (returns operation per col)\n",
    "# axis=1 -> apply operation across all cols (returns operation per row)\n",
    "df_quiz.mean(axis=0)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    73.333333\n",
       "student1    86.333333\n",
       "dtype: float64"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# applies each operation to entire column of data (row)\n",
    "df_quiz.mean(axis=1)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Take a moment to appreciate a panda:\n",
    "Those labels on the pandas objects are super help in understanding the output immediately above, right?\n",
    "\n",
    "(The `axis=0` vs `axis=1` stuff was easy to get turned around with in numpy)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>83.500000</td>\n",
       "      <td>91.000000</td>\n",
       "      <td>65.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>4.949747</td>\n",
       "      <td>1.414214</td>\n",
       "      <td>21.213203</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>80.000000</td>\n",
       "      <td>90.000000</td>\n",
       "      <td>50.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>81.750000</td>\n",
       "      <td>90.500000</td>\n",
       "      <td>57.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>83.500000</td>\n",
       "      <td>91.000000</td>\n",
       "      <td>65.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>85.250000</td>\n",
       "      <td>91.500000</td>\n",
       "      <td>72.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>87.000000</td>\n",
       "      <td>92.000000</td>\n",
       "      <td>80.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           quiz0      quiz1      quiz2\n",
       "count   2.000000   2.000000   2.000000\n",
       "mean   83.500000  91.000000  65.000000\n",
       "std     4.949747   1.414214  21.213203\n",
       "min    80.000000  90.000000  50.000000\n",
       "25%    81.750000  90.500000  57.500000\n",
       "50%    83.500000  91.000000  65.000000\n",
       "75%    85.250000  91.500000  72.500000\n",
       "max    87.000000  92.000000  80.000000"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# describe only works on columns (no axis param given)\n",
    "df_quiz.describe()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>num_legs</th>\n",
       "      <th>num_wings</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>falcon</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dog</th>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cat</th>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ant</th>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        num_legs  num_wings\n",
       "falcon         2          2\n",
       "dog            4          0\n",
       "cat            4          0\n",
       "ant            6          0"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# borrowing from pandas documentation for new example\n",
    "df = pd.DataFrame({'num_legs': [2, 4, 4, 6],\n",
    "                   'num_wings': [2, 0, 0, 0]},\n",
    "                  index=['falcon', 'dog', 'cat', 'ant'])\n",
    "df\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "num_legs  num_wings\n",
       "4         0            2\n",
       "2         2            1\n",
       "6         0            1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# notice that value_counts() gives \n",
    "df.value_counts()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`value_counts()` on a `pd.DataFrame` tells us how many times we observed each full row.  It tells us that `df` has:\n",
    "- 2 row(s) in `df` with `num_legs=4, num_wings=0`  \n",
    "- 1 row(s) in `df` with `num_legs=2, num_wings=2`\n",
    "- 1 row(s) in `df` with `num_legs=6, num_wings=0`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Indexing / Accessing a DataFrame\n",
    "- indexing: \n",
    "    - `.loc[]` indexing by name of row or column\n",
    "    - `.iloc[]` indexing by position integer (0, 1, 2, 3, 4 ...)\n",
    "    & slicing & subsets\n",
    "- using the slice operator `:` to get full rows or columns\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quiz_dict = {'quiz0': [80, 87, 50, 89],\n",
    "            'quiz1': [90, 92, 24, 85],\n",
    "            'quiz2': [50, 80, 21, 40]}\n",
    "df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "80"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# indexing data by \"name\"\n",
    "# remember: rows first, then columns ... \n",
    "# 1st entry describes which row ('student0')\n",
    "# 2nd entry describes which col ('quiz0')\n",
    "\n",
    "df_quiz.loc['student0', 'quiz0']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# index data by position\n",
    "# 1st entry describes which row.  0 -> the 1st (topmost) row\n",
    "# 2nd entry describes which col.  2 -> the 3rd (from the left) col\n",
    "df_quiz.iloc[0, 2]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### you can use same slicing syntaxes on both .loc and .iloc\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    90\n",
       "student1    92\n",
       "student2    24\n",
       "student3    85\n",
       "Name: quiz1, dtype: int64"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# get the column with idx 1 (second col)\n",
    "df_quiz.iloc[:, 1]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1st row, last col\n",
    "df_quiz.iloc[0, -1]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    80\n",
       "student1    87\n",
       "student2    50\n",
       "student3    89\n",
       "Name: quiz0, dtype: int64"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# all rows, only quiz0\n",
    "df_quiz.loc[:, 'quiz0']\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quiz0    80\n",
       "quiz1    90\n",
       "quiz2    50\n",
       "Name: student0, dtype: int64"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# slicing with named cols and rows\n",
    "# you can get a range, by name of row/col\n",
    "# note: this includes both start and stop columns (! unlike array / list)\n",
    "df_quiz.loc['student0', 'quiz0':'quiz2' ]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quiz0    80\n",
       "quiz1    90\n",
       "Name: student0, dtype: int64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# watch out:\n",
    "# when you get ranges indexed by position: include start idx, exclude stop idx)\n",
    "df_quiz.iloc[0, 0:2 ]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    80\n",
       "student1    87\n",
       "student2    50\n",
       "student3    89\n",
       "Name: quiz0, dtype: int64"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if you access directly into dataframe, it will assume you're looking for a column\n",
    "# (below is equivilent to df_quiz.loc[:, 'quiz0'])\n",
    "# mild preference: avoid this\n",
    "df_quiz['quiz0']\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### I've seen someone use `pd.DataFrame.ix` to index like above, what does that do?\n",
    "\n",
    "It was something of a hybrid between `.iloc` / `.loc` ... but it was weird to use.\n",
    "\n",
    "[Please don't use it.](https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.ix.html)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Modifying a DataFrame\n",
    "- updating values: single cell\n",
    "- adding a new column or row\n",
    "    - good practice: use a `pd.Series` to add a new row / col\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quiz_dict = {'quiz0': [80, 87, 50, 89],\n",
    "            'quiz1': [90, 92, 24, 85],\n",
    "            'quiz2': [50, 80, 21, 40]}\n",
    "df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80    123     50\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# setting single entry in dataframe\n",
    "df_quiz.loc['student0', 'quiz1'] = 123\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80    123    456\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# setting multiple (contiguous) entries in dataframe\n",
    "df_quiz.loc['student0', 'quiz1': 'quiz2'] = 123, 456\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "      <th>overall grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2 overall grade\n",
       "student0     80    123    456             b\n",
       "student1     87     92     80             c\n",
       "student2     50     24     21             d\n",
       "student3     89     85     40             a"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# adding a new column (error prone handling of indexing ... which student got which grade?)\n",
    "df_quiz['overall grade'] = 'b' , 'c', 'd', 'a'\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80    123    456\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# delete a column\n",
    "del df_quiz['overall grade']\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student1                            b-\n",
       "student2          f (no quizzes taken)\n",
       "student1928918                    asdf\n",
       "dtype: object"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# adding a column (next 2 cells) robust way of handling indexing\n",
    "# by explicilty labelling the index we're sure to match more explicitly\n",
    "s_overgrade = pd.Series({'student1': 'b-',                         \n",
    "                         'student2': 'f (no quizzes taken)',\n",
    "                        'student1928918': 'asdf'})\n",
    "s_overgrade\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "      <th>overall grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>b-</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "      <td>f (no quizzes taken)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2         overall grade\n",
       "student0     80    123    456                   NaN\n",
       "student1     87     92     80                    b-\n",
       "student2     50     24     21  f (no quizzes taken)\n",
       "student3     89     85     40                   NaN"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# notice how pandas helps us out in aligning our new column with proper row\n",
    "df_quiz.loc[: , 'overall grade'] = s_overgrade\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "      <th>overall grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>b-</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "      <td>f (no quizzes taken)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2         overall grade\n",
       "student1     87     92     80                    b-\n",
       "student2     50     24     21  f (no quizzes taken)\n",
       "student3     89     85     40                   NaN"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# how to 'drop' a row (returns a dataframe with row removed)\n",
    "df_quiz_short = df_quiz.drop('student0')\n",
    "df_quiz_short\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "      <th>overall grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>b-</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "      <td>f (no quizzes taken)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz1  quiz2         overall grade\n",
       "student0    123    456                   NaN\n",
       "student1     92     80                    b-\n",
       "student2     24     21  f (no quizzes taken)\n",
       "student3     85     40                   NaN"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can drop a column too by specifying `axis=1`\n",
    "# (by default it uses axis=0 to drop rows)\n",
    "df_quiz.drop('quiz0', axis=1)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# If slicing fails ... just pass a list\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quiz_dict = {'quiz0': [80, 87, 50, 89],\n",
    "            'quiz1': [90, 92, 24, 85],\n",
    "            'quiz2': [50, 80, 21, 40]}\n",
    "df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# how to get an arbitrary set of rows\n",
    "df_quiz.loc[['student0', 'student2', 'student3'], :]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## In Class Activity B\n",
    "1. Build the following `df_grade`.  Be sure to include the row and column names:\n",
    "\n",
    "|        | StudentB | StudentA | StudentC |\n",
    "|-------:|----:|------:|------:|\n",
    "| Quiz 1 |  89 |   100 |    78 |\n",
    "| Quiz 2 |  75 |    90 |    90 |\n",
    "| Quiz 3 |  93 |    85 |    65 |\n",
    "| Quiz 4 |  92 |    92 |    76 |\n",
    "\n",
    "1. index into this dataframe to build a `df_grade_subset`:\n",
    "    - only includes rows studentB and studentC\n",
    "    - only includes Quiz 2, Quiz 3, Quiz 4\n",
    "1. Using the `df_grade_subset` from the step above:\n",
    "    * calculate mean scores of studentB and studentC from the selected quizes\n",
    "    * calculate mean score of each quiz \n",
    "        * (remember the `axis` parameter)\n",
    "        \n",
    "Operating on `df_grade`:\n",
    "1. Add a new column `'StudentD'` with grades `60, 70, 80, 90` for quizes 1, 2, 3, 4 respectively\n",
    "    * can you do this by adding a new `pd.Series` object (to be a bit more explicit)?\n",
    "1. Add a new row, `quiz5`, with any grades\n",
    "1. Delete StudentC's column\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentA</th>\n",
       "      <th>StudentC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz1</th>\n",
       "      <td>89</td>\n",
       "      <td>100</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75</td>\n",
       "      <td>90</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93</td>\n",
       "      <td>85</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92</td>\n",
       "      <td>92</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentA  StudentC\n",
       "Quiz1        89       100        78\n",
       "Quiz2        75        90        90\n",
       "Quiz3        93        85        65\n",
       "Quiz4        92        92        76"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "student_grade_dict = {'StudentB': [89, 75, 93, 92],\n",
    "                      'StudentA': [100, 90, 85, 92],\n",
    "                      'StudentC': [78, 90, 65, 76]}\n",
    "\n",
    "df_quiz = pd.DataFrame(student_grade_dict, index=('Quiz1', 'Quiz2', 'Quiz3', 'Quiz4'))\n",
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentC\n",
       "Quiz2        75        90\n",
       "Quiz3        93        65\n",
       "Quiz4        92        76"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# index into this dataframe to build a df_grade_subset:\n",
    "# only includes rows studentB and studentC\n",
    "# only includes Quiz 2, Quiz 3, Quiz 4\n",
    "df_grade_subset = df_quiz.loc['Quiz2': , ['StudentB', 'StudentC']]\n",
    "df_grade_subset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "StudentB    86.666667\n",
       "StudentC    77.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Using the df_grade_subset from the step above:\n",
    "# calculate mean scores of studentB and studentC from the selected quizes\n",
    "df_grade_subset.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Quiz1    89.000000\n",
       "Quiz2    85.000000\n",
       "Quiz3    81.000000\n",
       "Quiz4    86.666667\n",
       "dtype: float64"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# calculate mean score of each quiz\n",
    "# (remember the axis parameter)\n",
    "df_quiz.mean(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentA</th>\n",
       "      <th>StudentC</th>\n",
       "      <th>StudentD</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz1</th>\n",
       "      <td>89</td>\n",
       "      <td>100</td>\n",
       "      <td>78</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75</td>\n",
       "      <td>90</td>\n",
       "      <td>90</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93</td>\n",
       "      <td>85</td>\n",
       "      <td>65</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92</td>\n",
       "      <td>92</td>\n",
       "      <td>76</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentA  StudentC  StudentD\n",
       "Quiz1        89       100        78        60\n",
       "Quiz2        75        90        90        70\n",
       "Quiz3        93        85        65        80\n",
       "Quiz4        92        92        76        90"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Operating on df_grade:\n",
    "# Add a new column 'StudentD' with grades 60, 70, 80, 90 for quizes 1, 2, 3, 4 respectively\n",
    "# can you do this by adding a new pd.Series object (to be a bit more explicit)?\n",
    "\n",
    "# implicit (not great)\n",
    "# df_quiz.loc[:, 'StudentD'] = 60, 70, 80, 90\n",
    "\n",
    "# a bit more error-proof (notice: order doesn't matter!)\n",
    "s_studentd = pd.Series({'Quiz1': 60, 'Quiz4': 90, 'Quiz2': 70, 'Quiz3': 80})\n",
    "df_quiz.loc[:, 'StudentD'] = s_studentd\n",
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentA</th>\n",
       "      <th>StudentD</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz1</th>\n",
       "      <td>89.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>80.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92.0</td>\n",
       "      <td>92.0</td>\n",
       "      <td>90.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz5</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentA  StudentD\n",
       "Quiz1      89.0     100.0      60.0\n",
       "Quiz2      75.0      90.0      70.0\n",
       "Quiz3      93.0      85.0      80.0\n",
       "Quiz4      92.0      92.0      90.0\n",
       "Quiz5       1.0       2.0       NaN"
      ]
     },
     "execution_count": 127,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dict_studentd = {'Quiz1': 60, 'Quiz4': 90, 'Quiz2': 70, 'Quiz3': 80}\n",
    "df_quiz.loc[:, 'StudentD'] = dict_studentd\n",
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentA</th>\n",
       "      <th>StudentC</th>\n",
       "      <th>StudentD</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz1</th>\n",
       "      <td>89.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>78.0</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>80.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92.0</td>\n",
       "      <td>92.0</td>\n",
       "      <td>76.0</td>\n",
       "      <td>90.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz5</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentA  StudentC  StudentD\n",
       "Quiz1      89.0     100.0      78.0      60.0\n",
       "Quiz2      75.0      90.0      90.0      70.0\n",
       "Quiz3      93.0      85.0      65.0      80.0\n",
       "Quiz4      92.0      92.0      76.0      90.0\n",
       "Quiz5       1.0       2.0       3.0       4.0"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Add a new row, quiz5, with any grades\n",
    "\n",
    "# implicit way\n",
    "df_quiz.loc['Quiz5', :] = 1, 2, 3, 4\n",
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentA</th>\n",
       "      <th>StudentD</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz1</th>\n",
       "      <td>89.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>80.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92.0</td>\n",
       "      <td>92.0</td>\n",
       "      <td>90.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz5</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentA  StudentD\n",
       "Quiz1      89.0     100.0      60.0\n",
       "Quiz2      75.0      90.0      70.0\n",
       "Quiz3      93.0      85.0      80.0\n",
       "Quiz4      92.0      92.0      90.0\n",
       "Quiz5       1.0       2.0       4.0"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Delete StudentC's row\n",
    "del df_quiz['StudentC']\n",
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Operating on DataFrame & Series Objects\n",
    "\n",
    "Your operators do pretty much what you'd expect them to.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 128,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quiz_dict = {'quiz0': [80, 87],\n",
    "            'quiz1': [90, 92],\n",
    "            'quiz2': [50, 80]}\n",
    "df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1'))\n",
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80000</td>\n",
       "      <td>90000</td>\n",
       "      <td>50000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87000</td>\n",
       "      <td>92000</td>\n",
       "      <td>80000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0  80000  90000  50000\n",
       "student1  87000  92000  80000"
      ]
     },
     "execution_count": 129,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz * 1000\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>1000000000079</td>\n",
       "      <td>1000000000089</td>\n",
       "      <td>1000000000049</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  quiz0          quiz1          quiz2\n",
       "student0  1000000000079  1000000000089  1000000000049\n",
       "student1             87             92             80"
      ]
     },
     "execution_count": 130,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# thats some extra credit ...\n",
    "df_quiz.loc['student0', :] += 999999999999\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>1000000000079</td>\n",
       "      <td>1000000000089</td>\n",
       "      <td>1000000000049</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  quiz0          quiz1          quiz2\n",
       "student0  1000000000079  1000000000089  1000000000049\n",
       "student1             87             92             80"
      ]
     },
     "execution_count": 131,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0   True   True   True\n",
       "student1  False  False  False"
      ]
     },
     "execution_count": 132,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we can also use comparison operators (super helpful, see boolean indexing next)\n",
    "df_quiz > 100\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Boolean Indexing into DataFrame\n",
    "\n",
    "Sometimes we want to grab only the rows or columns which meet a particular condition.\n",
    "\n",
    "\"Get all students whose grade was higher than 85 on quiz 1\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>60</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>30</td>\n",
       "      <td>23</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     60     60     70\n",
       "student3     30     23     64"
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quiz_dict = {'quiz0': [80, 87, 60, 30],\n",
    "            'quiz1': [90, 92, 60, 23],\n",
    "            'quiz2': [50, 80, 70, 64]}\n",
    "df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    90\n",
       "student1    92\n",
       "student2    60\n",
       "student3    23\n",
       "Name: quiz1, dtype: int64"
      ]
     },
     "execution_count": 135,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# quiz 1 is a series object which contains every index's quiz 1 grade\n",
    "s_quiz1 = df_quiz.loc[:, 'quiz1']\n",
    "s_quiz1\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0     True\n",
       "student1     True\n",
       "student2    False\n",
       "student3    False\n",
       "Name: quiz1, dtype: bool"
      ]
     },
     "execution_count": 136,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we create a series of booleans which is True only in the positions we're interested in\n",
    "s_bool = s_quiz1 > 85\n",
    "s_bool\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>60</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>30</td>\n",
       "      <td>23</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     60     60     70\n",
       "student3     30     23     64"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 137,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# boolean indexing: using a boolean series as index returns only those entries which are True\n",
    "# notice that since student2 & student3's quiz1 grade wasn't > 80 they aren't included below\n",
    "df_quiz.loc[s_bool, :]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 145,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       True\n",
       "1       True\n",
       "2       True\n",
       "3       True\n",
       "4       True\n",
       "       ...  \n",
       "339    False\n",
       "340    False\n",
       "341    False\n",
       "342    False\n",
       "343    False\n",
       "Length: 344, dtype: bool"
      ]
     },
     "execution_count": 145,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s_bool = (df_penguin['species'] == 'Adelie') & (df_penguin['island'] == 'Torgersen')\n",
    "s_bool"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 146,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.3</td>\n",
       "      <td>20.6</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3650.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>38.9</td>\n",
       "      <td>17.8</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3625.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.2</td>\n",
       "      <td>19.6</td>\n",
       "      <td>195.0</td>\n",
       "      <td>4675.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>34.1</td>\n",
       "      <td>18.1</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3475.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.0</td>\n",
       "      <td>20.2</td>\n",
       "      <td>190.0</td>\n",
       "      <td>4250.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>37.8</td>\n",
       "      <td>17.1</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3300.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>37.8</td>\n",
       "      <td>17.3</td>\n",
       "      <td>180.0</td>\n",
       "      <td>3700.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>41.1</td>\n",
       "      <td>17.6</td>\n",
       "      <td>182.0</td>\n",
       "      <td>3200.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>38.6</td>\n",
       "      <td>21.2</td>\n",
       "      <td>191.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>34.6</td>\n",
       "      <td>21.1</td>\n",
       "      <td>198.0</td>\n",
       "      <td>4400.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.6</td>\n",
       "      <td>17.8</td>\n",
       "      <td>185.0</td>\n",
       "      <td>3700.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>38.7</td>\n",
       "      <td>19.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.5</td>\n",
       "      <td>20.7</td>\n",
       "      <td>197.0</td>\n",
       "      <td>4500.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>34.4</td>\n",
       "      <td>18.4</td>\n",
       "      <td>184.0</td>\n",
       "      <td>3325.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>46.0</td>\n",
       "      <td>21.5</td>\n",
       "      <td>194.0</td>\n",
       "      <td>4200.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>68</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>35.9</td>\n",
       "      <td>16.6</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3050.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>69</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>41.8</td>\n",
       "      <td>19.4</td>\n",
       "      <td>198.0</td>\n",
       "      <td>4450.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>33.5</td>\n",
       "      <td>19.0</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3600.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>71</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.7</td>\n",
       "      <td>18.4</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3900.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.6</td>\n",
       "      <td>17.2</td>\n",
       "      <td>196.0</td>\n",
       "      <td>3550.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>73</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>45.8</td>\n",
       "      <td>18.9</td>\n",
       "      <td>197.0</td>\n",
       "      <td>4150.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>35.5</td>\n",
       "      <td>17.5</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3700.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.8</td>\n",
       "      <td>18.5</td>\n",
       "      <td>195.0</td>\n",
       "      <td>4250.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>76</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.9</td>\n",
       "      <td>16.8</td>\n",
       "      <td>191.0</td>\n",
       "      <td>3700.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>37.2</td>\n",
       "      <td>19.4</td>\n",
       "      <td>184.0</td>\n",
       "      <td>3900.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>78</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.2</td>\n",
       "      <td>16.1</td>\n",
       "      <td>187.0</td>\n",
       "      <td>3550.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.1</td>\n",
       "      <td>19.1</td>\n",
       "      <td>195.0</td>\n",
       "      <td>4000.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>34.6</td>\n",
       "      <td>17.2</td>\n",
       "      <td>189.0</td>\n",
       "      <td>3200.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.9</td>\n",
       "      <td>17.6</td>\n",
       "      <td>196.0</td>\n",
       "      <td>4700.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>18.8</td>\n",
       "      <td>187.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>35.1</td>\n",
       "      <td>19.4</td>\n",
       "      <td>193.0</td>\n",
       "      <td>4200.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>116</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>38.6</td>\n",
       "      <td>17.0</td>\n",
       "      <td>188.0</td>\n",
       "      <td>2900.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>117</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>37.3</td>\n",
       "      <td>20.5</td>\n",
       "      <td>199.0</td>\n",
       "      <td>3775.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>118</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>35.7</td>\n",
       "      <td>17.0</td>\n",
       "      <td>189.0</td>\n",
       "      <td>3350.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>119</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>41.1</td>\n",
       "      <td>18.6</td>\n",
       "      <td>189.0</td>\n",
       "      <td>3325.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>120</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.2</td>\n",
       "      <td>17.2</td>\n",
       "      <td>187.0</td>\n",
       "      <td>3150.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>121</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>37.7</td>\n",
       "      <td>19.8</td>\n",
       "      <td>198.0</td>\n",
       "      <td>3500.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>122</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.2</td>\n",
       "      <td>17.0</td>\n",
       "      <td>176.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>123</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>41.4</td>\n",
       "      <td>18.5</td>\n",
       "      <td>202.0</td>\n",
       "      <td>3875.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>124</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>35.2</td>\n",
       "      <td>15.9</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3050.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>125</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.6</td>\n",
       "      <td>19.0</td>\n",
       "      <td>199.0</td>\n",
       "      <td>4000.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>126</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>38.8</td>\n",
       "      <td>17.6</td>\n",
       "      <td>191.0</td>\n",
       "      <td>3275.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>127</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>41.5</td>\n",
       "      <td>18.3</td>\n",
       "      <td>195.0</td>\n",
       "      <td>4300.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>128</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.0</td>\n",
       "      <td>17.1</td>\n",
       "      <td>191.0</td>\n",
       "      <td>3050.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>129</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>44.1</td>\n",
       "      <td>18.0</td>\n",
       "      <td>210.0</td>\n",
       "      <td>4000.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>130</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>38.5</td>\n",
       "      <td>17.9</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3325.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>131</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>43.1</td>\n",
       "      <td>19.2</td>\n",
       "      <td>197.0</td>\n",
       "      <td>3500.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0    Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1    Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2    Adelie  Torgersen            40.3           18.0              195.0   \n",
       "3    Adelie  Torgersen             NaN            NaN                NaN   \n",
       "4    Adelie  Torgersen            36.7           19.3              193.0   \n",
       "5    Adelie  Torgersen            39.3           20.6              190.0   \n",
       "6    Adelie  Torgersen            38.9           17.8              181.0   \n",
       "7    Adelie  Torgersen            39.2           19.6              195.0   \n",
       "8    Adelie  Torgersen            34.1           18.1              193.0   \n",
       "9    Adelie  Torgersen            42.0           20.2              190.0   \n",
       "10   Adelie  Torgersen            37.8           17.1              186.0   \n",
       "11   Adelie  Torgersen            37.8           17.3              180.0   \n",
       "12   Adelie  Torgersen            41.1           17.6              182.0   \n",
       "13   Adelie  Torgersen            38.6           21.2              191.0   \n",
       "14   Adelie  Torgersen            34.6           21.1              198.0   \n",
       "15   Adelie  Torgersen            36.6           17.8              185.0   \n",
       "16   Adelie  Torgersen            38.7           19.0              195.0   \n",
       "17   Adelie  Torgersen            42.5           20.7              197.0   \n",
       "18   Adelie  Torgersen            34.4           18.4              184.0   \n",
       "19   Adelie  Torgersen            46.0           21.5              194.0   \n",
       "68   Adelie  Torgersen            35.9           16.6              190.0   \n",
       "69   Adelie  Torgersen            41.8           19.4              198.0   \n",
       "70   Adelie  Torgersen            33.5           19.0              190.0   \n",
       "71   Adelie  Torgersen            39.7           18.4              190.0   \n",
       "72   Adelie  Torgersen            39.6           17.2              196.0   \n",
       "73   Adelie  Torgersen            45.8           18.9              197.0   \n",
       "74   Adelie  Torgersen            35.5           17.5              190.0   \n",
       "75   Adelie  Torgersen            42.8           18.5              195.0   \n",
       "76   Adelie  Torgersen            40.9           16.8              191.0   \n",
       "77   Adelie  Torgersen            37.2           19.4              184.0   \n",
       "78   Adelie  Torgersen            36.2           16.1              187.0   \n",
       "79   Adelie  Torgersen            42.1           19.1              195.0   \n",
       "80   Adelie  Torgersen            34.6           17.2              189.0   \n",
       "81   Adelie  Torgersen            42.9           17.6              196.0   \n",
       "82   Adelie  Torgersen            36.7           18.8              187.0   \n",
       "83   Adelie  Torgersen            35.1           19.4              193.0   \n",
       "116  Adelie  Torgersen            38.6           17.0              188.0   \n",
       "117  Adelie  Torgersen            37.3           20.5              199.0   \n",
       "118  Adelie  Torgersen            35.7           17.0              189.0   \n",
       "119  Adelie  Torgersen            41.1           18.6              189.0   \n",
       "120  Adelie  Torgersen            36.2           17.2              187.0   \n",
       "121  Adelie  Torgersen            37.7           19.8              198.0   \n",
       "122  Adelie  Torgersen            40.2           17.0              176.0   \n",
       "123  Adelie  Torgersen            41.4           18.5              202.0   \n",
       "124  Adelie  Torgersen            35.2           15.9              186.0   \n",
       "125  Adelie  Torgersen            40.6           19.0              199.0   \n",
       "126  Adelie  Torgersen            38.8           17.6              191.0   \n",
       "127  Adelie  Torgersen            41.5           18.3              195.0   \n",
       "128  Adelie  Torgersen            39.0           17.1              191.0   \n",
       "129  Adelie  Torgersen            44.1           18.0              210.0   \n",
       "130  Adelie  Torgersen            38.5           17.9              190.0   \n",
       "131  Adelie  Torgersen            43.1           19.2              197.0   \n",
       "\n",
       "     body_mass_g     sex  \n",
       "0         3750.0    Male  \n",
       "1         3800.0  Female  \n",
       "2         3250.0  Female  \n",
       "3            NaN     NaN  \n",
       "4         3450.0  Female  \n",
       "5         3650.0    Male  \n",
       "6         3625.0  Female  \n",
       "7         4675.0    Male  \n",
       "8         3475.0     NaN  \n",
       "9         4250.0     NaN  \n",
       "10        3300.0     NaN  \n",
       "11        3700.0     NaN  \n",
       "12        3200.0  Female  \n",
       "13        3800.0    Male  \n",
       "14        4400.0    Male  \n",
       "15        3700.0  Female  \n",
       "16        3450.0  Female  \n",
       "17        4500.0    Male  \n",
       "18        3325.0  Female  \n",
       "19        4200.0    Male  \n",
       "68        3050.0  Female  \n",
       "69        4450.0    Male  \n",
       "70        3600.0  Female  \n",
       "71        3900.0    Male  \n",
       "72        3550.0  Female  \n",
       "73        4150.0    Male  \n",
       "74        3700.0  Female  \n",
       "75        4250.0    Male  \n",
       "76        3700.0  Female  \n",
       "77        3900.0    Male  \n",
       "78        3550.0  Female  \n",
       "79        4000.0    Male  \n",
       "80        3200.0  Female  \n",
       "81        4700.0    Male  \n",
       "82        3800.0  Female  \n",
       "83        4200.0    Male  \n",
       "116       2900.0  Female  \n",
       "117       3775.0    Male  \n",
       "118       3350.0  Female  \n",
       "119       3325.0    Male  \n",
       "120       3150.0  Female  \n",
       "121       3500.0    Male  \n",
       "122       3450.0  Female  \n",
       "123       3875.0    Male  \n",
       "124       3050.0  Female  \n",
       "125       4000.0    Male  \n",
       "126       3275.0  Female  \n",
       "127       4300.0    Male  \n",
       "128       3050.0  Female  \n",
       "129       4000.0    Male  \n",
       "130       3325.0  Female  \n",
       "131       3500.0    Male  "
      ]
     },
     "execution_count": 146,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_penguin.loc[s_bool, :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>60</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>30</td>\n",
       "      <td>23</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     60     60     70\n",
       "student3     30     23     64"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    False\n",
       "student1    False\n",
       "student2     True\n",
       "student3     True\n",
       "Name: quiz1, dtype: bool"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# what are all the students who get below a 70 on quiz1?\n",
    "s_bool = df_quiz.loc[:, 'quiz1'] < 70\n",
    "s_bool\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>60</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>30</td>\n",
       "      <td>23</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student2     60     60     70\n",
       "student3     30     23     64"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz.loc[s_bool, :]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    False\n",
       "student1     True\n",
       "student2    False\n",
       "student3    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we can build more complex conditions using \n",
    "# & (and operator)\n",
    "# | (or operator)\n",
    "\n",
    "# all students who got higher than 91 on quiz1 but didn't score higher than 90 on quiz2\n",
    "s_bool = (df_quiz.loc[:, 'quiz1'] > 91) & (df_quiz.loc[:, 'quiz2']  <= 90)\n",
    "s_bool\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz.loc[s_bool, :]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# One more thing, whats `pd.DataFrame.head()`?\n",
    "\n",
    "It grabs the \"head\" (the first few rows) of a dataframe.  DataFrames can be so big that its overwhelming to look at the whole thing, sometimes a few rows is all thats needed.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 149,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "3  Adelie  Torgersen             NaN            NaN                NaN   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "\n",
       "   body_mass_g     sex  \n",
       "0       3750.0    Male  \n",
       "1       3800.0  Female  \n",
       "2       3250.0  Female  \n",
       "3          NaN     NaN  \n",
       "4       3450.0  Female  "
      ]
     },
     "execution_count": 149,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_penguin = sns.load_dataset('penguins')\n",
    "df_penguin.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 150,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.3</td>\n",
       "      <td>20.6</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3650.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>38.9</td>\n",
       "      <td>17.8</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3625.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.2</td>\n",
       "      <td>19.6</td>\n",
       "      <td>195.0</td>\n",
       "      <td>4675.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>34.1</td>\n",
       "      <td>18.1</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3475.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.0</td>\n",
       "      <td>20.2</td>\n",
       "      <td>190.0</td>\n",
       "      <td>4250.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "3  Adelie  Torgersen             NaN            NaN                NaN   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "5  Adelie  Torgersen            39.3           20.6              190.0   \n",
       "6  Adelie  Torgersen            38.9           17.8              181.0   \n",
       "7  Adelie  Torgersen            39.2           19.6              195.0   \n",
       "8  Adelie  Torgersen            34.1           18.1              193.0   \n",
       "9  Adelie  Torgersen            42.0           20.2              190.0   \n",
       "\n",
       "   body_mass_g     sex  \n",
       "0       3750.0    Male  \n",
       "1       3800.0  Female  \n",
       "2       3250.0  Female  \n",
       "3          NaN     NaN  \n",
       "4       3450.0  Female  \n",
       "5       3650.0    Male  \n",
       "6       3625.0  Female  \n",
       "7       4675.0    Male  \n",
       "8       3475.0     NaN  \n",
       "9       4250.0     NaN  "
      ]
     },
     "execution_count": 150,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DataFrame.head() takes an argument, the number of top rows to return\n",
    "df_penguin.head(10)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## In Class Activity C\n",
    "The `pclass` of a titanic ticket describes the passenger class.  Its unclear if larger or smaller `pclass` are the fancy tickets.  See if you can answer this question by:\n",
    "\n",
    "- `.describe()` the `fare` paid by passengers who bought `pclass=3` tickets\n",
    "- `.describe()` the `fare` paid by passengers who bought `pclass=2` tickets\n",
    "- `.describe()` the `fare` paid by passengers who bought `pclass=1` tickets\n",
    "\n",
    "(++) You can use this boolean indexing to compare groups to answer all sorts of interesting questions:\n",
    "- Survival Effectiveness: Were people who travelled alone more or less likely to survive the titanic?\n",
    "- Demographics of towns: Which town, among Cherbourg, Queenstown or Southampton, seems to have the most families?\n",
    "- Layout of the boat: Does having a higher or lower cabin number suggest one is more likely to have a higher or lower ticket class?\n",
    "    - e.g. when `pclass=1` maybe these cabin numbers are all very large or small ...\n",
    "\n",
    "Data dictionary ([not the primary source, but a source](https://jkarakas.github.io/Exploratory-Analysis-of-the-Titanic-Dataset/Titanic_Dataset_Exploratory_Analysis_No_Code.html))\n",
    "\n",
    "| Variable | Definition                                 | Key                                           |\n",
    "|----------|--------------------------------------------|-----------------------------------------------|\n",
    "| Survived | Survival                                   | 0 = No, 1 = Yes                               |\n",
    "| Pclass   | Ticket class                               | 1 = 1st, 2 = 2nd, 3 = 3rd                     |\n",
    "| Sex      | Sex                                        |                                               |\n",
    "| Age      | Age in years                               |                                               |\n",
    "| Sibsp    | # of siblings / spouses aboard the Titanic |                                               |\n",
    "| Parch    | # of parents / children aboard the Titanic |                                               |\n",
    "| Ticket   | Ticket number                              |                                               |\n",
    "| Fare     | Passenger fare                             |                                               |\n",
    "| Cabin    | Cabin number                               |                                               |\n",
    "| Embarked | Port of Embarkation                        | C = Cherbourg, Q = Queenstown,S = Southampton |\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \\\n",
       "0         0       3    male  22.0      1      0   7.2500        S  Third   \n",
       "1         1       1  female  38.0      1      0  71.2833        C  First   \n",
       "2         1       3  female  26.0      0      0   7.9250        S  Third   \n",
       "3         1       1  female  35.0      1      0  53.1000        S  First   \n",
       "4         0       3    male  35.0      0      0   8.0500        S  Third   \n",
       "\n",
       "     who  adult_male deck  embark_town alive  alone  \n",
       "0    man        True  NaN  Southampton    no  False  \n",
       "1  woman       False    C    Cherbourg   yes  False  \n",
       "2  woman       False  NaN  Southampton   yes   True  \n",
       "3  woman       False    C  Southampton   yes  False  \n",
       "4    man        True  NaN  Southampton    no   True  "
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_titanic = sns.load_dataset('titanic')\n",
    "df_titanic.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 158,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_bool = df_titanic.loc[:, 'pclass'] == 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(891, 15)"
      ]
     },
     "execution_count": 160,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_titanic.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 159,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>123.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.4583</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>21.0750</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>882</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>22.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10.5167</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>884</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>25.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>885</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>39.0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>29.1250</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>23.4500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>890</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>32.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.7500</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>491 rows × 15 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     survived  pclass     sex   age  sibsp  parch      fare embarked  class  \\\n",
       "0           0       3    male  22.0      1      0    7.2500        S  Third   \n",
       "2           1       3  female  26.0      0      0  123.0000        S  Third   \n",
       "4           0       3    male  35.0      0      0    8.0500        S  Third   \n",
       "5           0       3    male   NaN      0      0    8.4583        Q  Third   \n",
       "7           0       3    male   2.0      3      1   21.0750        S  Third   \n",
       "..        ...     ...     ...   ...    ...    ...       ...      ...    ...   \n",
       "882         0       3  female  22.0      0      0   10.5167        S  Third   \n",
       "884         0       3    male  25.0      0      0    7.0500        S  Third   \n",
       "885         0       3  female  39.0      0      5   29.1250        Q  Third   \n",
       "888         0       3  female   NaN      1      2   23.4500        S  Third   \n",
       "890         0       3    male  32.0      0      0    7.7500        Q  Third   \n",
       "\n",
       "       who  adult_male deck  embark_town alive  alone  \n",
       "0      man        True  NaN  Southampton    no  False  \n",
       "2    woman       False  NaN  Southampton   yes   True  \n",
       "4      man        True  NaN  Southampton    no   True  \n",
       "5      man        True  NaN   Queenstown    no   True  \n",
       "7    child       False  NaN  Southampton    no  False  \n",
       "..     ...         ...  ...          ...   ...    ...  \n",
       "882  woman       False  NaN  Southampton    no   True  \n",
       "884    man        True  NaN  Southampton    no   True  \n",
       "885  woman       False  NaN   Queenstown    no  False  \n",
       "888  woman       False  NaN  Southampton    no  False  \n",
       "890    man        True  NaN   Queenstown    no   True  \n",
       "\n",
       "[491 rows x 15 columns]"
      ]
     },
     "execution_count": 159,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# all rows with just pclass3\n",
    "df_titanic.loc[s_bool, :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 154,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([3, 1, 2])"
      ]
     },
     "execution_count": 154,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_titanic['pclass'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 162,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pclass = 1\n",
      "         survived  pclass         age       sibsp       parch        fare\n",
      "count  216.000000   216.0  186.000000  216.000000  216.000000  216.000000\n",
      "mean     0.629630     1.0   38.233441    0.416667    0.356481   84.154687\n",
      "std      0.484026     0.0   14.802856    0.611898    0.693997   78.380373\n",
      "min      0.000000     1.0    0.920000    0.000000    0.000000    0.000000\n",
      "25%      0.000000     1.0   27.000000    0.000000    0.000000   30.923950\n",
      "50%      1.000000     1.0   37.000000    0.000000    0.000000   60.287500\n",
      "75%      1.000000     1.0   49.000000    1.000000    0.000000   93.500000\n",
      "max      1.000000     1.0   80.000000    3.000000    4.000000  512.329200\n",
      "pclass = 2\n",
      "         survived  pclass         age       sibsp       parch        fare\n",
      "count  184.000000   184.0  173.000000  184.000000  184.000000  184.000000\n",
      "mean     0.472826     2.0   29.877630    0.402174    0.380435   20.662183\n",
      "std      0.500623     0.0   14.001077    0.601633    0.690963   13.417399\n",
      "min      0.000000     2.0    0.670000    0.000000    0.000000    0.000000\n",
      "25%      0.000000     2.0   23.000000    0.000000    0.000000   13.000000\n",
      "50%      0.000000     2.0   29.000000    0.000000    0.000000   14.250000\n",
      "75%      1.000000     2.0   36.000000    1.000000    1.000000   26.000000\n",
      "max      1.000000     2.0   70.000000    3.000000    3.000000   73.500000\n",
      "pclass = 3\n",
      "         survived  pclass         age       sibsp       parch        fare\n",
      "count  491.000000   491.0  355.000000  491.000000  491.000000  491.000000\n",
      "mean     0.242363     3.0   25.140620    0.615071    0.393075   13.909919\n",
      "std      0.428949     0.0   12.495398    1.374883    0.888861   12.766894\n",
      "min      0.000000     3.0    0.420000    0.000000    0.000000    0.000000\n",
      "25%      0.000000     3.0   18.000000    0.000000    0.000000    7.750000\n",
      "50%      0.000000     3.0   24.000000    0.000000    0.000000    8.050000\n",
      "75%      0.000000     3.0   32.000000    1.000000    0.000000   15.500000\n",
      "max      1.000000     3.0   74.000000    8.000000    6.000000  123.000000\n"
     ]
    }
   ],
   "source": [
    "for pclass in sorted(df_titanic['pclass'].unique()):\n",
    "    print(f'pclass = {pclass}')\n",
    "    s_bool = df_titanic.loc[:, 'pclass'] == pclass\n",
    "    print(df_titanic.loc[s_bool, :].describe())"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}