{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# DS2500 Lesson6\n",
    "\n",
    "Jan 31, 2023\n",
    "\n",
    "### Content:\n",
    "- Pandas\n",
    "    - series\n",
    "    - dataframe\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# before you begin, make sure you can load data from seaborn\n",
    "import seaborn as sns\n",
    "df_penguin = sns.load_dataset('penguins')\n",
    "df_titanic = sns.load_dataset('titanic')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Having trouble?\n",
    "- see piazza for long-term solution\n",
    "    - [mac SSL error](https://piazza.com/class/lbxsbawi9yq2f9/post/55)\n",
    "- use code below for today:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# if the seaborn lines above give you trouble, use the csvs available on the website\n",
    "# (be sure they're in the same folder as this .ipynb file on your machine)\n",
    "import pandas as pd\n",
    "df_penguin = pd.read_csv('penguin.csv')\n",
    "df_titanic = pd.read_csv('titanic.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Admin:\n",
    "- lab1\n",
    "    - talk to friends\n",
    "    - lab digest\n",
    "    - part b (part c)\n",
    "- hw0 due friday @ 11:59 PM\n",
    "    - .py and .ipynb\n",
    "    - see canvas announcement\n",
    "    - see piazza\n",
    "- look at schedule together\n",
    "- tutoring groups\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The value of talking-out-loud about programming\n",
    "\n",
    "... I learned 2 new ways to approach lab1's part B `get_win_set()` this morning!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if all items in an array are the same, then the std dev is 0\n",
    "import numpy as np\n",
    "\n",
    "# mysterious student from section 2\n",
    "np.array([1, 1, 1]).std() == 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = np.array([1, 2, 0])\n",
    "\n",
    "# mysterious student from section 2\n",
    "len(set(x)) == 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Pandas\n",
    "\n",
    "Pandas is a Python library that stores data in `pd.DataFrame` and `pd.Series` objects.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "3  Adelie  Torgersen             NaN            NaN                NaN   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "\n",
       "   body_mass_g     sex  \n",
       "0       3750.0    Male  \n",
       "1       3800.0  Female  \n",
       "2       3250.0  Female  \n",
       "3          NaN     NaN  \n",
       "4       3450.0  Female  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import seaborn as sns\n",
    "\n",
    "# Example DataFrame:\n",
    "# df stands for dataframe.  df_penguin is a dataframe of penguin data\n",
    "df_penguin = sns.load_dataset('penguins')\n",
    "df_penguin.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      39.1\n",
       "1      39.5\n",
       "2      40.3\n",
       "3       NaN\n",
       "4      36.7\n",
       "       ... \n",
       "339     NaN\n",
       "340    46.8\n",
       "341    50.4\n",
       "342    45.2\n",
       "343    49.9\n",
       "Name: bill_length_mm, Length: 344, dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# example series: the \"s_\" is a (personal) convention for variables which are series\n",
    "s_bill_length_mm = df_penguin['bill_length_mm']\n",
    "\n",
    "s_bill_length_mm\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "\n",
    "### `pd.DataFrame` are two-dimensional, `pd.Series` are one-dimensional\n",
    "\n",
    "### If we already have `np.array()`, why do we need pandas?\n",
    "- pandas supports non-numeric data (strings for categorical data, for example)\n",
    "- pandas supports reading / storing data in more formats\n",
    "    - csv (spreadsheets)\n",
    "- pandas deals with missing data more elegantly\n",
    "- pandas handles indexing woes (rows and columns can be looked up by label, not just by position)\n",
    "\n",
    "You could do almost everything pandas does with numpy arrays ... but it'd be much more difficult to accomplish.\n"
   ]
  },
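  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the missing-data point above, assuming a small made-up list of bill lengths with one missing entry (the values are illustrative, not read from the penguin data): numpy's `mean()` propagates the `nan`, while pandas skips missing values by default.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# hypothetical measurements; np.nan marks a missing value\n",
    "bill_lengths = [39.1, 39.5, np.nan, 36.7]\n",
    "\n",
    "# numpy: a single nan in the array makes the mean nan\n",
    "np_mean = np.array(bill_lengths).mean()\n",
    "\n",
    "# pandas: missing values are skipped by default (skipna=True)\n",
    "pd_mean = pd.Series(bill_lengths).mean()\n",
    "\n",
    "print(np_mean, pd_mean)"
   ]
  },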
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Pandas Series\n",
    "\n",
    "### building:\n",
    "- building: default index\n",
    "- building: custom index\n",
    "- building: from a dict\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "\n",
       "   body_mass_g     sex  \n",
       "0       3750.0    Male  \n",
       "1       3800.0  Female  \n",
       "2       3250.0  Female  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# look at first 3 rows of dataframe (for reference)\n",
    "df_penguin.head(3)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "species                 Adelie\n",
       "island               Torgersen\n",
       "bill_length_mm            39.1\n",
       "bill_depth_mm             18.7\n",
       "flipper_length_mm        181.0\n",
       "body_mass_g             3750.0\n",
       "sex                       Male\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# each row or column of a dataframe is a series object\n",
    "# below is the first row of the dataframe (more on iloc indexing later...)\n",
    "# (remember: each row is a sample -> this is 1 penguin's data)\n",
    "penguin0_series = df_penguin.iloc[0, :]\n",
    "penguin0_series\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas series contain a sequence of labelled data elements:\n",
    "- penguin0's `species` is `Adelie`\n",
    "- penguin0's `island` is `Torgersen`\n",
    "- penguin0's `bill_length_mm` is `39.1` ...\n",
    "- penguin0's `<index-name>` is `<corresponding-value>`\n",
    "\n",
    "A series is quite similar to a dictionary ...\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "penguin0_dict = {'species': 'Adelie',\n",
    "                 'sex': 'Male',\n",
    "                 'island': 'Torgersen',\n",
    "                 'bill_length_mm': 39.1,\n",
    "                 'bill_depth_mm': 18.7,\n",
    "                 'flipper_length_mm': 181.0,\n",
    "                 'body_mass_g': 3750.0}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "species                 Adelie\n",
       "sex                       Male\n",
       "island               Torgersen\n",
       "bill_length_mm            39.1\n",
       "bill_depth_mm             18.7\n",
       "flipper_length_mm        181.0\n",
       "body_mass_g             3750.0\n",
       "dtype: object"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# build a series from dict\n",
    "penguin0_series = pd.Series(penguin0_dict)\n",
    "penguin0_series\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "species                 Adelie\n",
       "island               Torgersen\n",
       "bill_length_mm            39.1\n",
       "bill_depth_mm             18.7\n",
       "flipper_length_mm        181.0\n",
       "body_mass_g             3750.0\n",
       "sex                       Male\n",
       "dtype: object"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can also pass two corresponding lists / tuples\n",
    "index = ['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']\n",
    "values = ['Adelie', 'Torgersen', 39.1, 18.7, 181.0, 3750.0, 'Male']\n",
    "\n",
    "penguin0_series = pd.Series(values, index=index)\n",
    "penguin0_series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0          vanilla\n",
       "1        chocolate\n",
       "2    cherry garcia\n",
       "3          oatmeal\n",
       "dtype: object"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# sometimes your data has no meaningful index\n",
    "# pandas will default to indexing things with integers\n",
    "ice_cream_flavors = 'vanilla', 'chocolate', 'cherry garcia', 'oatmeal'\n",
    "pd.Series(ice_cream_flavors)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Adelie', 'Male', 'Torgersen', 39.1, 18.7, 181.0, 3750.0],\n",
       "      dtype=object)"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can access values as an array via .values\n",
    "penguin0_series.values\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['species', 'sex', 'island', 'bill_length_mm', 'bill_depth_mm',\n",
       "       'flipper_length_mm', 'body_mass_g'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can access index (as a special pandas \"index\" object) via .index\n",
    "penguin0_series.index\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### indexing into `pd.Series`: accessing / changing data\n",
    "- accessing / setting using index:\n",
    "    - by name: `series.loc[name]`\n",
    "    - by position: `series.iloc[idx]`\n",
    "- iterating: `.index`, `.values`, `.items()` (much like dict; the older `.iteritems()` was removed in pandas 2.0)\n",
    "- deleting an entry\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matt      6\n",
       "riva      7\n",
       "eli      11\n",
       "zeke    101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dict_fav_num = {'matt': 6, 'riva': 7, 'eli': 11, 'zeke': 101}\n",
    "series_fav_num = pd.Series(dict_fav_num)\n",
    "series_fav_num\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "11"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# lookup by position: get value in position 2 (third)\n",
    "series_fav_num.iloc[2]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# lookup by index (name): get value associated with index='matt'\n",
    "series_fav_num.loc['matt']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can also address directly into the series object to lookup by index\n",
    "# (my mild preference nobody follows: avoid this ... a bit more ambiguous)\n",
    "series_fav_num['matt']"
   ]
  },
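  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to see why `.loc` / `.iloc` are less ambiguous, sketched with a made-up series whose index labels are themselves integers (the name `s_letters` is hypothetical): `.loc` looks up by label and `.iloc` by position, so the same number can point at different entries.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# hypothetical series: integer labels deliberately in reverse order\n",
    "s_letters = pd.Series(['a', 'b', 'c'], index=[2, 1, 0])\n",
    "\n",
    "# by label: the entry whose index label is 0\n",
    "by_label = s_letters.loc[0]\n",
    "\n",
    "# by position: the first entry in the series\n",
    "by_position = s_letters.iloc[0]\n",
    "\n",
    "print(by_label, by_position)"
   ]
  },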
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matt       6\n",
       "riva       7\n",
       "eli     1000\n",
       "zeke     101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# each of these access methods can also set the value\n",
    "series_fav_num.iloc[2] = 1000\n",
    "series_fav_num"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check membership of item in index\n",
    "'matt' in series_fav_num.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'bob' in series_fav_num.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "1000 in series_fav_num.values\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iterating through elements of a `pd.Series`\n",
    "\n",
    "... pretty much the same as a dictionary except pandas uses an \"index\" while a dictionary has \"keys\".\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "matt\n",
      "riva\n",
      "eli\n",
      "zeke\n"
     ]
    }
   ],
   "source": [
    "# iterating through index (note: no parentheses after .index below)\n",
    "for idx in series_fav_num.index:\n",
    "    print(idx)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6\n",
      "7\n",
      "1000\n",
      "101\n"
     ]
    }
   ],
   "source": [
    "# iterating through values (notice: no parentheses after .values below)\n",
    "for val in series_fav_num.values:\n",
    "    print(val)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "matt 6\n",
      "riva 7\n",
      "eli 1000\n",
      "zeke 101\n"
     ]
    }
   ],
   "source": [
    "# iterating through index, value pairs (just like dict!)\n",
    "for key, val in series_fav_num.items():\n",
    "    print(key, val)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Removing an element\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "# removing a pair by its corresponding index (just like dict!)\n",
    "del series_fav_num['matt']\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "riva       7\n",
       "eli     1000\n",
       "zeke     101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series_fav_num\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Examining a `pd.Series`\n",
    "\n",
    "Just like numpy arrays:\n",
    "- `Series.argmin()`\n",
    "    - position of the smallest value\n",
    "    - pandas gives the row number, not the index label\n",
    "- `Series.argmax()`\n",
    "    - position of the largest value\n",
    "    - pandas gives the row number, not the index label\n",
    "- `Series.mean()`\n",
    "- `Series.min()`\n",
    "- `Series.max()`\n",
    "- `Series.std()`\n",
    "    - note: pandas defaults to the sample std dev (`ddof=1`), numpy to `ddof=0`\n",
    "- `Series.var()`\n",
    "    - same `ddof=1` default as `Series.std()`\n",
    "\n",
    "But wait, there's more!  These are available on pandas objects but not on numpy arrays:\n",
    "- `Series.count()`\n",
    "    - number of non-missing values in series\n",
    "- `Series.value_counts()`\n",
    "    - count of every unique value in series (like a histogram)\n",
    "    - (see example below please)\n",
    "- `Series.describe()`\n",
    "    - summary statistics\n"
   ]
  },
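  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small sketch of two points from the list above, using made-up data (the names `s_fav` and `s_gap` are hypothetical): `Series.idxmin()` returns the index label where `Series.argmin()` returns the row number, and `Series.count()` leaves out missing values while `len()` does not.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# hypothetical favorite numbers\n",
    "s_fav = pd.Series({'matt': 6, 'riva': 7, 'eli': 11})\n",
    "\n",
    "# argmin gives a row number, idxmin gives the index label at that row\n",
    "pos_min = s_fav.argmin()\n",
    "label_min = s_fav.idxmin()\n",
    "print(pos_min, label_min)\n",
    "\n",
    "# hypothetical series with one missing value\n",
    "s_gap = pd.Series([1.0, np.nan, 3.0])\n",
    "\n",
    "# len counts every entry, count only the non-missing ones\n",
    "print(len(s_gap), s_gap.count())"
   ]
  },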
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matt       6\n",
       "riva       7\n",
       "eli       11\n",
       "zeke     101\n",
       "sally    101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dict_fav_num = {'matt': 6, 'riva': 7, 'eli': 11, 'zeke': 101, 'sally': 101}\n",
    "series_fav_num = pd.Series(dict_fav_num)\n",
    "series_fav_num\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "### Our old friends from numpy\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matt       6\n",
       "riva       7\n",
       "eli       11\n",
       "zeke     101\n",
       "sally    101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# for reference\n",
    "series_fav_num\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6, 101, 50.97254162782154, 2598.2)"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# our familiar friends ...\n",
    "series_fav_num.min(), series_fav_num.max(), series_fav_num.std(), series_fav_num.var()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# notice: pandas gives the position of the row with smallest value\n",
    "# (one might think they'd get index 'matt' here instead)\n",
    "series_fav_num.argmin()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'matt'"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# position 0 (first entry) has the lowest favorite number; look up its index label\n",
    "idx_min = series_fav_num.argmin()\n",
    "series_fav_num.index[idx_min]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# position 3 ('zeke') holds the highest favorite number\n",
    "# ('sally' ties at 101; argmax returns the first occurrence)\n",
    "series_fav_num.argmax()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### New functionality, only in pandas\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matt       6\n",
       "riva       7\n",
       "eli       11\n",
       "zeke     101\n",
       "sally    101\n",
       "dtype: int64"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series_fav_num"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(5,)"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series_fav_num.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# number of non-missing entries (missing values such as NaN are not counted)\n",
    "series_fav_num.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "101    2\n",
       "6      1\n",
       "7      1\n",
       "11     1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# how many times did each of the favorite numbers occur?\n",
    "# (101 occurs twice in series_fav_num, while all other values occur once)\n",
    "series_fav_num.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Adelie       152\n",
       "Gentoo       124\n",
       "Chinstrap     68\n",
       "Name: species, dtype: int64"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_penguin['species'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Biscoe       168\n",
       "Dream        124\n",
       "Torgersen     52\n",
       "Name: island, dtype: int64"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_penguin['island'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count      5.000000\n",
       "mean      45.200000\n",
       "std       50.972542\n",
       "min        6.000000\n",
       "25%        7.000000\n",
       "50%       11.000000\n",
       "75%      101.000000\n",
       "max      101.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# describe is useful to get a sense of how values are distributed\n",
    "# \"50%\" is equivalent to the median\n",
    "# \"25%\" indicates that roughly 25% of data is less than this value (and 75% is greater)\n",
    "series_fav_num.describe()\n"
   ]
  },
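  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The percentile rows of `describe()` can also be computed one at a time. A minimal sketch, assuming the same five favorite numbers are rebuilt here under the hypothetical name `s_fav`: `Series.quantile(q)` takes a fraction between 0 and 1, and `Series.median()` should match the 50% row.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# same values as the favorite numbers above, without the names\n",
    "s_fav = pd.Series([6, 7, 11, 101, 101])\n",
    "\n",
    "# 25th percentile and median, computed directly\n",
    "q25 = s_fav.quantile(0.25)\n",
    "med = s_fav.median()\n",
    "\n",
    "# compare against the matching rows of describe()\n",
    "summary = s_fav.describe()\n",
    "print(q25, summary['25%'])\n",
    "print(med, summary['50%'])"
   ]
  },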
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Extracting a `pd.DataFrame` column as a series\n",
    "\n",
    "A dataframe is a two-dimensional table of data.  Each row or column is a series object.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \\\n",
       "0         0       3    male  22.0      1      0   7.2500        S  Third   \n",
       "1         1       1  female  38.0      1      0  71.2833        C  First   \n",
       "2         1       3  female  26.0      0      0   7.9250        S  Third   \n",
       "3         1       1  female  35.0      1      0  53.1000        S  First   \n",
       "4         0       3    male  35.0      0      0   8.0500        S  Third   \n",
       "\n",
       "     who  adult_male deck  embark_town alive  alone  \n",
       "0    man        True  NaN  Southampton    no  False  \n",
       "1  woman       False    C    Cherbourg   yes  False  \n",
       "2  woman       False  NaN  Southampton   yes   True  \n",
       "3  woman       False    C  Southampton   yes  False  \n",
       "4    man        True  NaN  Southampton    no   True  "
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import seaborn as sns\n",
    "\n",
    "# may take a 15 sec on first run to download titanic data\n",
    "df_titanic = sns.load_dataset('titanic')\n",
    "df_titanic.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      0\n",
       "1      1\n",
       "2      1\n",
       "3      1\n",
       "4      0\n",
       "      ..\n",
       "886    0\n",
       "887    1\n",
       "888    0\n",
       "889    1\n",
       "890    0\n",
       "Name: survived, Length: 891, dtype: int64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# get the age column of dataframe as a series\n",
    "df_titanic['survived']"
   ]
  },
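  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of pulling a column or a row out as a `Series`.  The tiny `df_toy` dataframe is made up for illustration (it is not the titanic data):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# hypothetical toy data, just to show the types involved\n",
    "df_toy = pd.DataFrame({'age': [22.0, 38.0, 26.0],\n",
    "                       'fare': [7.25, 71.28, 7.92]})\n",
    "\n",
    "# square brackets with a column name give that column as a series\n",
    "s_col = df_toy['fare']\n",
    "\n",
    "# .loc with a row label gives that row as a series\n",
    "# (the column names become the index of the series)\n",
    "s_row = df_toy.loc[1]\n",
    "\n",
    "print(type(s_col).__name__, type(s_row).__name__)\n",
    "```\n"
   ]
  },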
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## In Class Activity A\n",
    "\n",
    "- `.describe()` how much people paid to get aboard the titanic.  \n",
    "- count how many passengers of each age were on board\n",
    "- each passenger corresponds to a row, what is the index of the passenger who paid the highest price?\n",
    "- change the price paid of the passenger in row index 2 (the 3rd row) to `123`\n",
    "    - notice: does anything funny happen here?  If so ... investigate\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \\\n",
       "0         0       3    male  22.0      1      0   7.2500        S  Third   \n",
       "1         1       1  female  38.0      1      0  71.2833        C  First   \n",
       "2         1       3  female  26.0      0      0   7.9250        S  Third   \n",
       "3         1       1  female  35.0      1      0  53.1000        S  First   \n",
       "4         0       3    male  35.0      0      0   8.0500        S  Third   \n",
       "\n",
       "     who  adult_male deck  embark_town alive  alone  \n",
       "0    man        True  NaN  Southampton    no  False  \n",
       "1  woman       False    C    Cherbourg   yes  False  \n",
       "2  woman       False  NaN  Southampton   yes   True  \n",
       "3  woman       False    C  Southampton   yes  False  \n",
       "4    man        True  NaN  Southampton    no   True  "
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_titanic.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    891.000000\n",
       "mean      32.204208\n",
       "std       49.693429\n",
       "min        0.000000\n",
       "25%        7.910400\n",
       "50%       14.454200\n",
       "75%       31.000000\n",
       "max      512.329200\n",
       "Name: fare, dtype: float64"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# .describe() how much people paid to get aboard the titanic.\n",
    "df_titanic['fare'].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "24.00    30\n",
       "22.00    27\n",
       "18.00    26\n",
       "19.00    25\n",
       "28.00    25\n",
       "         ..\n",
       "36.50     1\n",
       "55.50     1\n",
       "0.92      1\n",
       "23.50     1\n",
       "74.00     1\n",
       "Name: age, Length: 88, dtype: int64"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# count how many passengers of each age were on board\n",
    "df_titanic['age'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "258"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# each passenger corresponds to a row, what is the index of the passenger who paid the highest price?\n",
    "df_titanic['fare'].argmax()"
   ]
  },
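  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`.argmax()` returns the row number (position).  It matches the index label above only because `df_titanic` uses the default 0, 1, 2, ... index.  A small sketch, on made-up data, of how position and label differ when the index is not the default:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# hypothetical fares, labelled by made-up passenger names\n",
    "s_toy = pd.Series([7.25, 512.33, 8.05],\n",
    "                  index=['ann', 'bob', 'cat'])\n",
    "\n",
    "# argmax() -> integer position (row number) of the largest value\n",
    "pos = s_toy.argmax()\n",
    "\n",
    "# idxmax() -> index label of the largest value\n",
    "label = s_toy.idxmax()\n",
    "\n",
    "print(pos, label)\n",
    "```\n"
   ]
  },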
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/tmp/ipykernel_5765/1820496597.py:4: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  s_fare.iloc[2] = 12345\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0          7.2500\n",
       "1         71.2833\n",
       "2      12345.0000\n",
       "3         53.1000\n",
       "4          8.0500\n",
       "          ...    \n",
       "886       13.0000\n",
       "887       30.0000\n",
       "888       23.4500\n",
       "889       30.0000\n",
       "890        7.7500\n",
       "Name: fare, Length: 891, dtype: float64"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# change the price paid of the passenger in row index 2 (the 3rd row) to 123\n",
    "# notice: does anything funny happen here? If so ... investigate\n",
    "s_fare = df_titanic['fare']\n",
    "s_fare.iloc[2] = 12345\n",
    "s_fare"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12345.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived  pclass     sex   age  sibsp  parch        fare embarked  class  \\\n",
       "0         0       3    male  22.0      1      0      7.2500        S  Third   \n",
       "1         1       1  female  38.0      1      0     71.2833        C  First   \n",
       "2         1       3  female  26.0      0      0  12345.0000        S  Third   \n",
       "3         1       1  female  35.0      1      0     53.1000        S  First   \n",
       "4         0       3    male  35.0      0      0      8.0500        S  Third   \n",
       "\n",
       "     who  adult_male deck  embark_town alive  alone  \n",
       "0    man        True  NaN  Southampton    no  False  \n",
       "1  woman       False    C    Cherbourg   yes  False  \n",
       "2  woman       False  NaN  Southampton   yes   True  \n",
       "3  woman       False    C  Southampton   yes  False  \n",
       "4    man        True  NaN  Southampton    no   True  "
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# notice anything different since we modified the series directly above?  (... and why?)\n",
    "df_titanic.head()"
   ]
  },
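  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why did the dataframe change?  In the pandas version that produced the output above, the series returned by `df_titanic['fare']` can share its data with the dataframe, so writing into the series also wrote into the dataframe (hence the `SettingWithCopyWarning`).  This behavior depends on the pandas version and its settings, so it is safer to be explicit.  A minimal sketch of two explicit options, using a made-up `df_toy` rather than the titanic data:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# hypothetical toy data standing in for the titanic fares\n",
    "df_toy = pd.DataFrame({'fare': [7.25, 71.28, 7.92]})\n",
    "\n",
    "# option 1: work on an explicit copy, the dataframe is left alone\n",
    "s_fare = df_toy['fare'].copy()\n",
    "s_fare.iloc[2] = 123\n",
    "fare_after_copy_edit = df_toy.loc[2, 'fare']\n",
    "\n",
    "# option 2: change the dataframe itself with .loc[row, column]\n",
    "df_toy.loc[2, 'fare'] = 123\n",
    "\n",
    "print(fare_after_copy_edit, df_toy.loc[2, 'fare'])\n",
    "```\n"
   ]
  },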
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Pandas: DataFrame\n",
    "\n",
    "Remember:\n",
    "- `Series`:  1d data object\n",
    "- `DataFrame`: 2d data object\n",
    "\n",
    "`DataFrame`s represent two-dimensional data, like the quiz scores from last class:\n",
    "\n",
    "|           | Quiz 0 | Quiz 1 | Quiz 2 |\n",
    "|-----------|--------|--------|--------|\n",
    "| Student 0 | 80     | 90     | 50     |\n",
    "| Student 1 | 87     | 92     | 80     |\n",
    "\n",
    "Each column or row above could be considered a `Series` object\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "quiz_array = np.array([[80, 90, 50],\n",
    "                       [87, 92, 80]])\n",
    "\n",
    "df_quiz = pd.DataFrame(quiz_array, \n",
    "                       columns=('quiz0', 'quiz1', 'quiz2'), \n",
    "                       index=('student0', 'student1'))\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>asdpfiuhasdifuh</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 quiz0  quiz1  quiz2\n",
       "student0            80     90     50\n",
       "asdpfiuhasdifuh     87     92     80"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we construct a dataframe as a dictionary\n",
    "# keys of the dictionary are columns of dataframe\n",
    "# values are lists (or tuples) of the values in each column\n",
    "quiz_dict = {'quiz0': [80, 87],\n",
    "            'quiz1': [90, 92],\n",
    "            'quiz2': [50, 80]}\n",
    "pd.DataFrame(quiz_dict, index=('student0', 'asdpfiuhasdifuh'))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1   2\n",
       "0  80  90  50\n",
       "1  87  92  80"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# can we make dataframe without labelling rows / columns?\n",
    "df_quiz = pd.DataFrame(quiz_array)\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we can just add the names in afterwards if you'd like to\n",
    "df_quiz.columns = ['quiz0', 'quiz1', 'quiz2']\n",
    "df_quiz.index = ('student0', 'student1')\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Describing a `pd.DataFrame`\n",
    "\n",
    "Just like numpy arrays:\n",
    "- `DataFrame.argmin()`\n",
    "    - which index has smallest value\n",
    "    - pandas gives the row number, not the index\n",
    "- `DataFrame.argmax()`\n",
    "    - which index has largest value\n",
    "    - pandas gives the row number, not the index\n",
    "- `DataFrame.mean()`\n",
    "- `DataFrame.min()`\n",
    "- `DataFrame.max()`\n",
    "- `DataFrame.std()`\n",
    "- `DataFrame.var()`\n",
    "\n",
    "New to pandas:\n",
    "- `DataFrame.count()`\n",
    "    - number of item pairs in series\n",
    "- `DataFrame.describe()`\n",
    "    - summary statistics\n",
    "- `DataFrame.value_counts()`\n",
    "    - count how many unique rows there are\n",
    "    - see falcon / dog / cat example below please\n"
   ]
  },
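  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small sketch of a few summary methods on made-up quiz scores with one missing value (`df_toy` and the student names are invented for illustration).  Note that `.count()` skips missing values, and that `.idxmax()` is the dataframe method which returns an index label per column:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# hypothetical scores, one quiz was never taken (NaN)\n",
    "df_toy = pd.DataFrame({'quiz0': [80, 87, 91],\n",
    "                       'quiz1': [90, np.nan, 70]},\n",
    "                      index=['amy', 'ben', 'cal'])\n",
    "\n",
    "# number of non-missing values in each column\n",
    "n_per_col = df_toy.count()\n",
    "\n",
    "# index label of the largest value in each column\n",
    "best_per_col = df_toy.idxmax()\n",
    "\n",
    "# mean of each column (missing values are skipped by default)\n",
    "mean_per_col = df_toy.mean()\n",
    "\n",
    "print(n_per_col, best_per_col, mean_per_col, sep='\\n')\n",
    "```\n"
   ]
  },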
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quiz0    83.5\n",
       "quiz1    91.0\n",
       "quiz2    65.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# by default, each method applies operation to entire column of data\n",
    "df_quiz.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quiz0    83.5\n",
       "quiz1    91.0\n",
       "quiz2    65.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we can also pass axis parameter to specify if operation should be applied to row or column\n",
    "# !remember!\n",
    "# axis=0 -> apply operation across all rows (returns operation per col)\n",
    "# axis=1 -> apply operation across all cols (returns operation per row)\n",
    "df_quiz.mean(axis=0)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    73.333333\n",
       "student1    86.333333\n",
       "dtype: float64"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# applies each operation to entire column of data (row)\n",
    "df_quiz.mean(axis=1)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Take a moment to appreciate a panda:\n",
    "Those labels on the pandas objects are super help in understanding the output immediately above, right?\n",
    "\n",
    "(The `axis=0` vs `axis=1` stuff was easy to get turned around with in numpy)\n"
   ]
  },
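  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the comparison concrete, here is a sketch that computes the same means with numpy and with pandas on the quiz scores.  The variable names `mean_np_axis0` etc. are just for illustration:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "quiz_array = np.array([[80, 90, 50],\n",
    "                       [87, 92, 80]])\n",
    "df_quiz = pd.DataFrame(quiz_array,\n",
    "                       columns=['quiz0', 'quiz1', 'quiz2'],\n",
    "                       index=['student0', 'student1'])\n",
    "\n",
    "# numpy: bare arrays, you must remember what each axis means\n",
    "mean_np_axis0 = quiz_array.mean(axis=0)   # one value per column\n",
    "mean_np_axis1 = quiz_array.mean(axis=1)   # one value per row\n",
    "\n",
    "# pandas: same numbers, but labelled\n",
    "mean_pd_axis0 = df_quiz.mean(axis=0)      # indexed by quiz\n",
    "mean_pd_axis1 = df_quiz.mean(axis=1)      # indexed by student\n",
    "\n",
    "print(mean_np_axis0)\n",
    "print(mean_pd_axis0)\n",
    "```\n"
   ]
  },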
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>83.500000</td>\n",
       "      <td>91.000000</td>\n",
       "      <td>65.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>4.949747</td>\n",
       "      <td>1.414214</td>\n",
       "      <td>21.213203</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>80.000000</td>\n",
       "      <td>90.000000</td>\n",
       "      <td>50.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>81.750000</td>\n",
       "      <td>90.500000</td>\n",
       "      <td>57.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>83.500000</td>\n",
       "      <td>91.000000</td>\n",
       "      <td>65.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>85.250000</td>\n",
       "      <td>91.500000</td>\n",
       "      <td>72.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>87.000000</td>\n",
       "      <td>92.000000</td>\n",
       "      <td>80.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           quiz0      quiz1      quiz2\n",
       "count   2.000000   2.000000   2.000000\n",
       "mean   83.500000  91.000000  65.000000\n",
       "std     4.949747   1.414214  21.213203\n",
       "min    80.000000  90.000000  50.000000\n",
       "25%    81.750000  90.500000  57.500000\n",
       "50%    83.500000  91.000000  65.000000\n",
       "75%    85.250000  91.500000  72.500000\n",
       "max    87.000000  92.000000  80.000000"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# describe only works on columns (no axis param given)\n",
    "df_quiz.describe()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>num_legs</th>\n",
       "      <th>num_wings</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>falcon</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dog</th>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cat</th>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ant</th>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        num_legs  num_wings\n",
       "falcon         2          2\n",
       "dog            4          0\n",
       "cat            4          0\n",
       "ant            6          0"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# borrowing from pandas documentation for new example\n",
    "df = pd.DataFrame({'num_legs': [2, 4, 4, 6],\n",
    "                   'num_wings': [2, 0, 0, 0]},\n",
    "                  index=['falcon', 'dog', 'cat', 'ant'])\n",
    "df\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "num_legs  num_wings\n",
       "4         0            2\n",
       "2         2            1\n",
       "6         0            1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# notice that value_counts() gives \n",
    "df.value_counts()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "species    island   \n",
       "Gentoo     Biscoe       124\n",
       "Chinstrap  Dream         68\n",
       "Adelie     Dream         56\n",
       "           Torgersen     52\n",
       "           Biscoe        44\n",
       "dtype: int64"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_penguin.loc[:, :'island'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`value_counts()` on a `pd.DataFrame` tells us how many times we observed each full row.  It tells us that `df` has:\n",
    "- 2 row(s) in `df` with `num_legs=4, num_wings=0`  \n",
    "- 1 row(s) in `df` with `num_legs=2, num_wings=2`\n",
    "- 1 row(s) in `df` with `num_legs=6, num_wings=0`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Indexing / Accessing a DataFrame\n",
    "- indexing: \n",
    "    - `.loc[]` indexing by name of row or column\n",
    "    - `.iloc[]` indexing by position integer (0, 1, 2, 3, 4 ...)\n",
    "    & slicing & subsets\n",
    "- using the slice operator `:` to get full rows or columns\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quiz_dict = {'quiz0': [80, 87, 50, 89],\n",
    "            'quiz1': [90, 92, 24, 85],\n",
    "            'quiz2': [50, 80, 21, 40]}\n",
    "df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "80"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# indexing data by \"name\"\n",
    "# remember: rows first, then columns ... \n",
    "# 1st entry describes which row ('student0')\n",
    "# 2nd entry describes which col ('quiz0')\n",
    "\n",
    "df_quiz.loc['student0', 'quiz0']\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# index data by position\n",
    "# 1st entry describes which row.  0 -> the 1st (topmost) row\n",
    "# 2nd entry describes which col.  2 -> the 3rd (from the left) col\n",
    "df_quiz.iloc[0, 2]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### you can use same slicing syntaxes on both .loc and .iloc\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    90\n",
       "student1    92\n",
       "student2    24\n",
       "student3    85\n",
       "Name: quiz1, dtype: int64"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# get the column with idx 1 (second col)\n",
    "df_quiz.iloc[:, 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1st row, last col\n",
    "df_quiz.iloc[0, -1]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    80\n",
       "student1    87\n",
       "student2    50\n",
       "student3    89\n",
       "Name: quiz0, dtype: int64"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# all rows, only quiz0\n",
    "df_quiz.loc[:, 'quiz0']\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quiz0    80\n",
       "quiz1    90\n",
       "quiz2    50\n",
       "Name: student0, dtype: int64"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# slicing with named cols and rows\n",
    "# you can get a range, by name of row/col\n",
    "# note: this includes both start and stop columns (! unlike array / list)\n",
    "df_quiz.loc['student0', 'quiz0':'quiz2' ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quiz0    80\n",
       "quiz1    90\n",
       "Name: student0, dtype: int64"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# watch out:\n",
    "# when you get ranges indexed by position: include start idx, exclude stop idx)\n",
    "df_quiz.iloc[0, 0:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    80\n",
       "student1    87\n",
       "student2    50\n",
       "student3    89\n",
       "Name: quiz0, dtype: int64"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if you access directly into dataframe, it will assume you're looking for a column\n",
    "# (below is equivilent to df_quiz.loc[:, 'quiz0'])\n",
    "# mild preference: avoid this\n",
    "df_quiz['quiz0']\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### I've seen someone use `pd.DataFrame.ix` to index like above, what does that do?\n",
    "\n",
    "It was something of a hybrid between `.iloc` / `.loc` ... but it was weird to use.\n",
    "\n",
    "[Please don't use it.](https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.ix.html)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Modifying a DataFrame\n",
    "- updating values: single cell\n",
    "- adding a new column or row\n",
    "    - good practice: use a `pd.Series` to add a new row / col\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quiz_dict = {'quiz0': [80, 87, 50, 89],\n",
    "            'quiz1': [90, 92, 24, 85],\n",
    "            'quiz2': [50, 80, 21, 40]}\n",
    "df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80    123     50\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# setting single entry in dataframe\n",
    "df_quiz.loc['student0', 'quiz1'] = 123\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80    123    456\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# setting multiple (contiguous) entries in dataframe\n",
    "df_quiz.loc['student0', 'quiz1': 'quiz2'] = 123, 456\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "      <th>overall grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2 overall grade\n",
       "student0     80    123    456             a\n",
       "student1     87     92     80             b\n",
       "student2     50     24     21             c\n",
       "student3     89     85     40             d"
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# adding a new column (error prone handling of indexing ... which student got which grade?)\n",
    "df_quiz['overall grade'] = 'a', 'b' , 'c', 'd'\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "      <th>overall grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>b-</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2 overall grade\n",
       "student0     80    123    456           NaN\n",
       "student1     87     92     80            b-\n",
       "student2     50     24     21           NaN\n",
       "student3     89     85     40             c"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "123"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz.loc['student0', 'quiz1']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "123"
      ]
     },
     "execution_count": 129,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz.iloc[0, 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80    123    456\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# delete a column\n",
    "del df_quiz['overall grade']\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80    123    456\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student1                   b-\n",
       "student3                    c\n",
       "some student not in df    AAA\n",
       "dtype: object"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "# adding a column (next 2 cells): a robust way of handling indexing\n",
    "# by explicitly labelling the index, we make sure each value lands in the intended row\n",
    "s_overgrade = pd.Series({'student1': 'b-',   \n",
    "                         'student3': 'c',\n",
    "                         'some student not in df': 'AAA'})\n",
    "s_overgrade\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "      <th>overall grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>b-</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2 overall grade\n",
       "student0     80    123    456           NaN\n",
       "student1     87     92     80            b-\n",
       "student2     50     24     21           NaN\n",
       "student3     89     85     40             c"
      ]
     },
     "execution_count": 122,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "# notice how pandas aligns the new column with the proper rows by index label:\n",
    "# rows missing from s_overgrade get NaN, and labels not in df_quiz are ignored\n",
    "df_quiz.loc[: , 'overall grade'] = s_overgrade\n",
    "df_quiz\n"
   ]
  },
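  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the alignment shown above, using hypothetical names (`df_demo`, `s_new`) so that `df_quiz` is left untouched. The Series is deliberately given out of order: pandas matches on index labels rather than position, fills rows that have no matching label with `NaN`, and ignores labels that are not in the DataFrame.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# hypothetical example data (not part of the lesson's df_quiz)\n",
    "df_demo = pd.DataFrame({'quiz0': [80, 87, 50]},\n",
    "                       index=['student0', 'student1', 'student2'])\n",
    "\n",
    "# labels are out of order, and 'student9' does not exist in df_demo\n",
    "s_new = pd.Series({'student2': 'c', 'student0': 'a', 'student9': 'zzz'})\n",
    "\n",
    "# assignment aligns on index labels, not on position\n",
    "df_demo['letter'] = s_new\n",
    "df_demo"
   ]
  },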
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "      <th>overall grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>b-</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2 overall grade\n",
       "student1     87     92     80            b-\n",
       "student2     50     24     21           NaN\n",
       "student3     89     85     40             c"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# how to 'drop' a row (returns a dataframe with row removed)\n",
    "df_quiz_short = df_quiz.drop('student0')\n",
    "df_quiz_short\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "      <th>overall grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>123</td>\n",
       "      <td>456</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "      <td>b-</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz1  quiz2 overall grade\n",
       "student0    123    456           NaN\n",
       "student1     92     80            b-\n",
       "student2     24     21           NaN\n",
       "student3     85     40             c"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# you can drop a column too by specifying `axis=1`\n",
    "# (by default it uses axis=0 to drop rows)\n",
    "df_quiz.drop('quiz0', axis=1)\n"
   ]
  },
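  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that `drop` returns a new DataFrame and leaves the original alone unless the result is assigned back. A small sketch, assuming a hypothetical `df_demo` (so `df_quiz` is not modified):\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# hypothetical example data (not part of the lesson's df_quiz)\n",
    "df_demo = pd.DataFrame({'quiz0': [80, 87], 'quiz1': [90, 92]},\n",
    "                       index=['student0', 'student1'])\n",
    "\n",
    "# drop returns a new dataframe; df_demo itself is unchanged\n",
    "# (columns='quiz0' is equivalent to drop('quiz0', axis=1))\n",
    "df_no_quiz0 = df_demo.drop(columns='quiz0')\n",
    "\n",
    "print(list(df_no_quiz0.columns))   # columns left in the returned dataframe\n",
    "print(list(df_demo.columns))       # the original still has both columns"
   ]
  },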
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# If slicing fails ... just pass a list\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 130,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quiz_dict = {'quiz0': [80, 87, 50, 89],\n",
    "            'quiz1': [90, 92, 24, 85],\n",
    "            'quiz2': [50, 80, 21, 40]}\n",
    "df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>50</td>\n",
       "      <td>24</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>89</td>\n",
       "      <td>85</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student2     50     24     21\n",
       "student3     89     85     40"
      ]
     },
     "execution_count": 131,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# how to get an arbitrary set of rows\n",
    "df_quiz.loc[['student0', 'student2', 'student3'], :]\n"
   ]
  },
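  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lists work on the column axis as well, and the result follows the order given in each list. A short sketch with a hypothetical `df_demo`:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# hypothetical example data (not part of the lesson's df_quiz)\n",
    "df_demo = pd.DataFrame({'quiz0': [80, 87, 50],\n",
    "                        'quiz1': [90, 92, 24],\n",
    "                        'quiz2': [50, 80, 21]},\n",
    "                       index=['student0', 'student1', 'student2'])\n",
    "\n",
    "# a list of row labels and a list of column labels, in any order\n",
    "df_demo.loc[['student2', 'student0'], ['quiz2', 'quiz0']]"
   ]
  },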
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## In Class Activity B\n",
    "1. Build the following `df_grade`. Be sure to include the row and column names:\n",
    "\n",
    "|        | StudentB | StudentA | StudentC |\n",
    "|-------:|----:|------:|------:|\n",
    "| Quiz 1 |  89 |   100 |    78 |\n",
    "| Quiz 2 |  75 |    90 |    90 |\n",
    "| Quiz 3 |  93 |    85 |    65 |\n",
    "| Quiz 4 |  92 |    92 |    76 |\n",
    "\n",
    "1. Index into this dataframe to build a `df_grade_subset`:\n",
    "    - only includes columns StudentB and StudentC\n",
    "    - only includes Quiz 2, Quiz 3, Quiz 4\n",
    "1. Using the `df_grade_subset` from the step above:\n",
    "    * calculate mean scores of StudentB and StudentC from the selected quizzes\n",
    "    * calculate mean score of each quiz\n",
    "        * (remember the `axis` parameter)\n",
    "\n",
    "Operating on `df_grade`:\n",
    "1. Add a new column `'StudentD'` with grades `60, 70, 80, 90` for quizzes 1, 2, 3, 4 respectively\n",
    "    * can you do this by adding a new `pd.Series` object (to be a bit more explicit)?\n",
    "1. Add a new row, `quiz5`, with any grades\n",
    "1. Delete StudentC's column\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentA</th>\n",
       "      <th>StudentC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz1</th>\n",
       "      <td>89</td>\n",
       "      <td>100</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75</td>\n",
       "      <td>90</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93</td>\n",
       "      <td>85</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92</td>\n",
       "      <td>92</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentA  StudentC\n",
       "Quiz1        89       100        78\n",
       "Quiz2        75        90        90\n",
       "Quiz3        93        85        65\n",
       "Quiz4        92        92        76"
      ]
     },
     "execution_count": 133,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "student_grade_dict = {'StudentB': [89, 75, 93, 92],\n",
    "                      'StudentA': [100, 90, 85, 92],\n",
    "                      'StudentC': [78, 90, 65, 76]}\n",
    "\n",
    "df_quiz = pd.DataFrame(student_grade_dict, index=('Quiz1', 'Quiz2', 'Quiz3', 'Quiz4'))\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentC\n",
       "Quiz2        75        90\n",
       "Quiz3        93        65\n",
       "Quiz4        92        76"
      ]
     },
     "execution_count": 135,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "# 1. index into this dataframe to build a `df_grade_subset`:\n",
    "#     - only includes studentB and studentC\n",
    "#     - only includes Quiz 2, Quiz 3, Quiz 4\n",
    "# (pass the column names as a list; a tuple can be read as a single MultiIndex key)\n",
    "df_quiz_subset = df_quiz.loc['Quiz2':, ['StudentB', 'StudentC']]\n",
    "df_quiz_subset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 145,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "StudentB    86.666667\n",
       "StudentC    77.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 145,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "# 1. Using the `df_grade_subset` from the step above:\n",
    "#     * calculate mean scores of studentB and studentC from the selected quizzes\n",
    "#     * calculate mean score of each quiz\n",
    "#         * (remember the `axis` parameter)\n",
    "# axis=0 averages down each column, giving one mean per student\n",
    "df_quiz_subset.mean(axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Quiz2    82.5\n",
       "Quiz3    79.0\n",
       "Quiz4    84.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 137,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "# axis=1 averages across each row, giving one mean per quiz\n",
    "df_quiz_subset.mean(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentA</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz1</th>\n",
       "      <td>89.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75.0</td>\n",
       "      <td>90.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93.0</td>\n",
       "      <td>85.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92.0</td>\n",
       "      <td>92.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz5</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentA\n",
       "Quiz1      89.0     100.0\n",
       "Quiz2      75.0      90.0\n",
       "Quiz3      93.0      85.0\n",
       "Quiz4      92.0      92.0\n",
       "Quiz5       1.0       2.0"
      ]
     },
     "execution_count": 147,
     "metadata": {},
     "output_type": "execute_result"
    }
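    ],
   "source": [
    "# remove StudentD if it already exists (e.g. from an earlier run), so the next cell can add it\n",
    "df_quiz = df_quiz.drop(columns='StudentD', errors='ignore')\n",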
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 148,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentA</th>\n",
       "      <th>StudentD</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz1</th>\n",
       "      <td>89.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>80.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92.0</td>\n",
       "      <td>92.0</td>\n",
       "      <td>90.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz5</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentA  StudentD\n",
       "Quiz1      89.0     100.0      60.0\n",
       "Quiz2      75.0      90.0      70.0\n",
       "Quiz3      93.0      85.0      80.0\n",
       "Quiz4      92.0      92.0      90.0\n",
       "Quiz5       1.0       2.0       NaN"
      ]
     },
     "execution_count": 148,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "# Operating on `df_grade`:\n",
    "# 1. Add a new column `'StudentD'` with grades `60, 70, 80, 90` for quizzes 1, 2, 3, 4 respectively\n",
    "#     * can you do this by adding a new `pd.Series` object (to be a bit more explicit)?\n",
    "s_student_d = pd.Series({'Quiz1': 60, 'Quiz2': 70, 'Quiz3': 80, 'Quiz4': 90})\n",
    "df_quiz.loc[:, 'StudentD'] = s_student_d\n",
    "\n",
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentA</th>\n",
       "      <th>StudentC</th>\n",
       "      <th>StudentD</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz1</th>\n",
       "      <td>89.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>78.0</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>80.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92.0</td>\n",
       "      <td>92.0</td>\n",
       "      <td>76.0</td>\n",
       "      <td>90.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz5</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentA  StudentC  StudentD\n",
       "Quiz1      89.0     100.0      78.0      60.0\n",
       "Quiz2      75.0      90.0      90.0      70.0\n",
       "Quiz3      93.0      85.0      65.0      80.0\n",
       "Quiz4      92.0      92.0      76.0      90.0\n",
       "Quiz5       1.0       2.0       3.0       4.0"
      ]
     },
     "execution_count": 141,
     "metadata": {},
     "output_type": "execute_result"
    }
    ],
   "source": [
    "# 1. Add a new row, `quiz5`, with any grades\n",
    "# (implicit: values are matched to columns by position ... not great but quick)\n",
    "df_quiz.loc['Quiz5', :] = 1, 2, 3, 4\n",
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. Delete StudentC's column\n",
    "del df_quiz['StudentC']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 144,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentB</th>\n",
       "      <th>StudentA</th>\n",
       "      <th>StudentD</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Quiz1</th>\n",
       "      <td>89.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz2</th>\n",
       "      <td>75.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz3</th>\n",
       "      <td>93.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>80.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz4</th>\n",
       "      <td>92.0</td>\n",
       "      <td>92.0</td>\n",
       "      <td>90.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Quiz5</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       StudentB  StudentA  StudentD\n",
       "Quiz1      89.0     100.0      60.0\n",
       "Quiz2      75.0      90.0      70.0\n",
       "Quiz3      93.0      85.0      80.0\n",
       "Quiz4      92.0      92.0      90.0\n",
       "Quiz5       1.0       2.0       4.0"
      ]
     },
     "execution_count": 144,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Operating on DataFrame & Series Objects\n",
    "\n",
    "Arithmetic operators do pretty much what you'd expect them to: they are applied element-wise.\n"
   ]
  },
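  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of what to expect, with hypothetical Series `s_a` and `s_b`: a scalar is applied to every element, while an operation between two pandas objects first aligns them by label, leaving `NaN` wherever a label is missing from either side.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# hypothetical example data\n",
    "s_a = pd.Series({'student0': 80, 'student1': 87})\n",
    "s_b = pd.Series({'student1': 5, 'student2': 10})\n",
    "\n",
    "# scalar: applied to every element\n",
    "print(s_a + 1)\n",
    "\n",
    "# two Series: aligned by label first; only 'student1' appears in both\n",
    "print(s_a + s_b)"
   ]
  },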
  {
   "cell_type": "code",
   "execution_count": 149,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 149,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quiz_dict = {'quiz0': [80, 87],\n",
    "            'quiz1': [90, 92],\n",
    "            'quiz2': [50, 80]}\n",
    "df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1'))\n",
    "df_quiz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 150,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80000</td>\n",
       "      <td>90000</td>\n",
       "      <td>50000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87000</td>\n",
       "      <td>92000</td>\n",
       "      <td>80000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0  80000  90000  50000\n",
       "student1  87000  92000  80000"
      ]
     },
     "execution_count": 150,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz * 1000\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 151,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>1000000000079</td>\n",
       "      <td>1000000000089</td>\n",
       "      <td>1000000000049</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  quiz0          quiz1          quiz2\n",
       "student0  1000000000079  1000000000089  1000000000049\n",
       "student1             87             92             80"
      ]
     },
     "execution_count": 151,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# that's some extra credit ...\n",
    "df_quiz.loc['student0', :] += 999999999999\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>1000000000079</td>\n",
       "      <td>1000000000089</td>\n",
       "      <td>1000000000049</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  quiz0          quiz1          quiz2\n",
       "student0  1000000000079  1000000000089  1000000000049\n",
       "student1             87             92             80"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0   True   True   True\n",
       "student1  False  False  False"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we can also use comparison operators (super helpful, see boolean indexing next)\n",
    "df_quiz > 100\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Boolean Indexing into DataFrame\n",
    "\n",
    "Sometimes we want to grab only the rows or columns which meet a particular condition.\n",
    "\n",
    "\"Get all students whose grade was higher than 85 on quiz 1\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 152,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>60</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>30</td>\n",
       "      <td>23</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     60     60     70\n",
       "student3     30     23     64"
      ]
     },
     "execution_count": 152,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quiz_dict = {'quiz0': [80, 87, 60, 30],\n",
    "            'quiz1': [90, 92, 60, 23],\n",
    "            'quiz2': [50, 80, 70, 64]}\n",
    "df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 153,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    90\n",
       "student1    92\n",
       "student2    60\n",
       "student3    23\n",
       "Name: quiz1, dtype: int64"
      ]
     },
     "execution_count": 153,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# the quiz1 column is a Series holding each student's quiz1 grade\n",
    "s_quiz1 = df_quiz.loc[:, 'quiz1']\n",
    "s_quiz1\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 154,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0     True\n",
       "student1     True\n",
       "student2    False\n",
       "student3    False\n",
       "Name: quiz1, dtype: bool"
      ]
     },
     "execution_count": 154,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we create a series of booleans which is True only in the positions we're interested in\n",
    "s_bool = s_quiz1 > 85\n",
    "s_bool\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>60</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>30</td>\n",
       "      <td>23</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     60     60     70\n",
       "student3     30     23     64"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# boolean indexing: using a boolean series as the index returns only the rows where it is True\n",
    "# notice that since student2 & student3's quiz1 grades weren't > 85 they aren't included below\n",
    "df_quiz.loc[s_bool, :]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 162,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.3</td>\n",
       "      <td>20.6</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3650.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.2</td>\n",
       "      <td>19.6</td>\n",
       "      <td>195.0</td>\n",
       "      <td>4675.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>38.6</td>\n",
       "      <td>21.2</td>\n",
       "      <td>191.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>34.6</td>\n",
       "      <td>21.1</td>\n",
       "      <td>198.0</td>\n",
       "      <td>4400.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.5</td>\n",
       "      <td>20.7</td>\n",
       "      <td>197.0</td>\n",
       "      <td>4500.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>46.0</td>\n",
       "      <td>21.5</td>\n",
       "      <td>194.0</td>\n",
       "      <td>4200.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>69</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>41.8</td>\n",
       "      <td>19.4</td>\n",
       "      <td>198.0</td>\n",
       "      <td>4450.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>71</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.7</td>\n",
       "      <td>18.4</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3900.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>73</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>45.8</td>\n",
       "      <td>18.9</td>\n",
       "      <td>197.0</td>\n",
       "      <td>4150.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.8</td>\n",
       "      <td>18.5</td>\n",
       "      <td>195.0</td>\n",
       "      <td>4250.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>37.2</td>\n",
       "      <td>19.4</td>\n",
       "      <td>184.0</td>\n",
       "      <td>3900.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.1</td>\n",
       "      <td>19.1</td>\n",
       "      <td>195.0</td>\n",
       "      <td>4000.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.9</td>\n",
       "      <td>17.6</td>\n",
       "      <td>196.0</td>\n",
       "      <td>4700.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>35.1</td>\n",
       "      <td>19.4</td>\n",
       "      <td>193.0</td>\n",
       "      <td>4200.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>117</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>37.3</td>\n",
       "      <td>20.5</td>\n",
       "      <td>199.0</td>\n",
       "      <td>3775.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>119</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>41.1</td>\n",
       "      <td>18.6</td>\n",
       "      <td>189.0</td>\n",
       "      <td>3325.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>121</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>37.7</td>\n",
       "      <td>19.8</td>\n",
       "      <td>198.0</td>\n",
       "      <td>3500.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>123</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>41.4</td>\n",
       "      <td>18.5</td>\n",
       "      <td>202.0</td>\n",
       "      <td>3875.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>125</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.6</td>\n",
       "      <td>19.0</td>\n",
       "      <td>199.0</td>\n",
       "      <td>4000.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>127</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>41.5</td>\n",
       "      <td>18.3</td>\n",
       "      <td>195.0</td>\n",
       "      <td>4300.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>129</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>44.1</td>\n",
       "      <td>18.0</td>\n",
       "      <td>210.0</td>\n",
       "      <td>4000.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>131</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>43.1</td>\n",
       "      <td>19.2</td>\n",
       "      <td>197.0</td>\n",
       "      <td>3500.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0    Adelie  Torgersen            39.1           18.7              181.0   \n",
       "5    Adelie  Torgersen            39.3           20.6              190.0   \n",
       "7    Adelie  Torgersen            39.2           19.6              195.0   \n",
       "13   Adelie  Torgersen            38.6           21.2              191.0   \n",
       "14   Adelie  Torgersen            34.6           21.1              198.0   \n",
       "17   Adelie  Torgersen            42.5           20.7              197.0   \n",
       "19   Adelie  Torgersen            46.0           21.5              194.0   \n",
       "69   Adelie  Torgersen            41.8           19.4              198.0   \n",
       "71   Adelie  Torgersen            39.7           18.4              190.0   \n",
       "73   Adelie  Torgersen            45.8           18.9              197.0   \n",
       "75   Adelie  Torgersen            42.8           18.5              195.0   \n",
       "77   Adelie  Torgersen            37.2           19.4              184.0   \n",
       "79   Adelie  Torgersen            42.1           19.1              195.0   \n",
       "81   Adelie  Torgersen            42.9           17.6              196.0   \n",
       "83   Adelie  Torgersen            35.1           19.4              193.0   \n",
       "117  Adelie  Torgersen            37.3           20.5              199.0   \n",
       "119  Adelie  Torgersen            41.1           18.6              189.0   \n",
       "121  Adelie  Torgersen            37.7           19.8              198.0   \n",
       "123  Adelie  Torgersen            41.4           18.5              202.0   \n",
       "125  Adelie  Torgersen            40.6           19.0              199.0   \n",
       "127  Adelie  Torgersen            41.5           18.3              195.0   \n",
       "129  Adelie  Torgersen            44.1           18.0              210.0   \n",
       "131  Adelie  Torgersen            43.1           19.2              197.0   \n",
       "\n",
       "     body_mass_g   sex  \n",
       "0         3750.0  Male  \n",
       "5         3650.0  Male  \n",
       "7         4675.0  Male  \n",
       "13        3800.0  Male  \n",
       "14        4400.0  Male  \n",
       "17        4500.0  Male  \n",
       "19        4200.0  Male  \n",
       "69        4450.0  Male  \n",
       "71        3900.0  Male  \n",
       "73        4150.0  Male  \n",
       "75        4250.0  Male  \n",
       "77        3900.0  Male  \n",
       "79        4000.0  Male  \n",
       "81        4700.0  Male  \n",
       "83        4200.0  Male  \n",
       "117       3775.0  Male  \n",
       "119       3325.0  Male  \n",
       "121       3500.0  Male  \n",
       "123       3875.0  Male  \n",
       "125       4000.0  Male  \n",
       "127       4300.0  Male  \n",
       "129       4000.0  Male  \n",
       "131       3500.0  Male  "
      ]
     },
     "execution_count": 162,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# the same idea on real data: male penguins on Torgersen island\n",
    "s_bool = (df_penguin['island'] == 'Torgersen') & (df_penguin['sex'] == 'Male')\n",
    "df_penguin.loc[s_bool, :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student0</th>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>60</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>30</td>\n",
       "      <td>23</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student0     80     90     50\n",
       "student1     87     92     80\n",
       "student2     60     60     70\n",
       "student3     30     23     64"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    False\n",
       "student1    False\n",
       "student2     True\n",
       "student3     True\n",
       "Name: quiz1, dtype: bool"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# which students scored below 70 on quiz1?\n",
    "s_bool = df_quiz.loc[:, 'quiz1'] < 70\n",
    "s_bool\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student2</th>\n",
       "      <td>60</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student3</th>\n",
       "      <td>30</td>\n",
       "      <td>23</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student2     60     60     70\n",
       "student3     30     23     64"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz.loc[s_bool, :]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student0    False\n",
       "student1     True\n",
       "student2    False\n",
       "student3    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we can build more complex conditions using\n",
    "# & (and operator)\n",
    "# | (or operator)\n",
    "# (wrap each comparison in parentheses: & and | bind tighter than > and <=)\n",
    "\n",
    "# all students who got higher than 91 on quiz1 but didn't score higher than 90 on quiz2\n",
    "s_bool = (df_quiz.loc[:, 'quiz1'] > 91) & (df_quiz.loc[:, 'quiz2'] <= 90)\n",
    "s_bool\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quiz0</th>\n",
       "      <th>quiz1</th>\n",
       "      <th>quiz2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student1</th>\n",
       "      <td>87</td>\n",
       "      <td>92</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          quiz0  quiz1  quiz2\n",
       "student1     87     92     80"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_quiz.loc[s_bool, :]\n"
   ]
  },
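  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cells above combine conditions with `&` only. As a minimal sketch (not part of the original lecture code), the cell below shows `|` (or) and `~` (not) on the same four-student `df_quiz`. The names `s_low_q1`, `s_low_q2`, `df_either` and `df_not_low` are made up for illustration.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sketch: | is element-wise \"or\", ~ is element-wise \"not\"\n",
    "# assumes df_quiz is the four-student DataFrame built above\n",
    "s_low_q1 = df_quiz.loc[:, 'quiz1'] < 70\n",
    "s_low_q2 = df_quiz.loc[:, 'quiz2'] < 70\n",
    "\n",
    "# students below 70 on quiz1 OR quiz2\n",
    "df_either = df_quiz.loc[s_low_q1 | s_low_q2, :]\n",
    "\n",
    "# students NOT below 70 on quiz1 (same rows as quiz1 >= 70)\n",
    "df_not_low = df_quiz.loc[~s_low_q1, :]\n",
    "\n",
    "df_either"
   ]
  },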
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# One more thing, what's `pd.DataFrame.head()`?\n",
    "\n",
    "It grabs the \"head\" (the first five rows by default) of a DataFrame.  DataFrames can be so big that it's overwhelming to look at the whole thing; sometimes a few rows are all that's needed.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "3  Adelie  Torgersen             NaN            NaN                NaN   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "\n",
       "   body_mass_g     sex  \n",
       "0       3750.0    Male  \n",
       "1       3800.0  Female  \n",
       "2       3250.0  Female  \n",
       "3          NaN     NaN  \n",
       "4       3450.0  Female  "
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_penguin = sns.load_dataset('penguins')\n",
    "df_penguin.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.3</td>\n",
       "      <td>20.6</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3650.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>38.9</td>\n",
       "      <td>17.8</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3625.0</td>\n",
       "      <td>Female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.2</td>\n",
       "      <td>19.6</td>\n",
       "      <td>195.0</td>\n",
       "      <td>4675.0</td>\n",
       "      <td>Male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>34.1</td>\n",
       "      <td>18.1</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3475.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>42.0</td>\n",
       "      <td>20.2</td>\n",
       "      <td>190.0</td>\n",
       "      <td>4250.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "3  Adelie  Torgersen             NaN            NaN                NaN   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "5  Adelie  Torgersen            39.3           20.6              190.0   \n",
       "6  Adelie  Torgersen            38.9           17.8              181.0   \n",
       "7  Adelie  Torgersen            39.2           19.6              195.0   \n",
       "8  Adelie  Torgersen            34.1           18.1              193.0   \n",
       "9  Adelie  Torgersen            42.0           20.2              190.0   \n",
       "\n",
       "   body_mass_g     sex  \n",
       "0       3750.0    Male  \n",
       "1       3800.0  Female  \n",
       "2       3250.0  Female  \n",
       "3          NaN     NaN  \n",
       "4       3450.0  Female  \n",
       "5       3650.0    Male  \n",
       "6       3625.0  Female  \n",
       "7       4675.0    Male  \n",
       "8       3475.0     NaN  \n",
       "9       4250.0     NaN  "
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DataFrame.head() takes an argument, the number of top rows to return\n",
    "df_penguin.head(10)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## In Class Activity C\n",
    "The `pclass` of a titanic ticket describes the passenger class.  Its unclear if larger or smaller `pclass` are the fancy tickets.  See if you can answer this question by:\n",
    "\n",
    "- `.describe()` the `fare` paid by passengers who bought `pclass=3` tickets\n",
    "- `.describe()` the `fare` paid by passengers who bought `pclass=2` tickets\n",
    "- `.describe()` the `fare` paid by passengers who bought `pclass=1` tickets\n",
    "\n",
    "(++) You can use this boolean indexing to compare groups to answer all sorts of interesting questions:\n",
    "- Survival Effectiveness: Were people who travelled alone more or less likely to survive the titanic?\n",
    "- Demographics of towns: Which town, among Cherbourg, Queenstown or Southampton, seems to have the most families?\n",
    "- Layout of the boat: Does having a higher or lower cabin number suggest one is more likely to have a higher or lower ticket class?\n",
    "    - e.g. when `pclass=1` maybe these cabin numbers are all very large or small ...\n",
    "\n",
    "Data dictionary ([not the primary source, but a source](https://jkarakas.github.io/Exploratory-Analysis-of-the-Titanic-Dataset/Titanic_Dataset_Exploratory_Analysis_No_Code.html))\n",
    "\n",
    "| Variable | Definition                                 | Key                                           |\n",
    "|----------|--------------------------------------------|-----------------------------------------------|\n",
    "| Survived | Survival                                   | 0 = No, 1 = Yes                               |\n",
    "| Pclass   | Ticket class                               | 1 = 1st, 2 = 2nd, 3 = 3rd                     |\n",
    "| Sex      | Sex                                        |                                               |\n",
    "| Age      | Age in years                               |                                               |\n",
    "| Sibsp    | # of siblings / spouses aboard the Titanic |                                               |\n",
    "| Parch    | # of parents / children aboard the Titanic |                                               |\n",
    "| Ticket   | Ticket number                              |                                               |\n",
    "| Fare     | Passenger fare                             |                                               |\n",
    "| Cabin    | Cabin number                               |                                               |\n",
    "| Embarked | Port of Embarkation                        | C = Cherbourg, Q = Queenstown,S = Southampton |\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>First</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>C</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \\\n",
       "0         0       3    male  22.0      1      0   7.2500        S  Third   \n",
       "1         1       1  female  38.0      1      0  71.2833        C  First   \n",
       "2         1       3  female  26.0      0      0   7.9250        S  Third   \n",
       "3         1       1  female  35.0      1      0  53.1000        S  First   \n",
       "4         0       3    male  35.0      0      0   8.0500        S  Third   \n",
       "\n",
       "     who  adult_male deck  embark_town alive  alone  \n",
       "0    man        True  NaN  Southampton    no  False  \n",
       "1  woman       False    C    Cherbourg   yes  False  \n",
       "2  woman       False  NaN  Southampton   yes   True  \n",
       "3  woman       False    C  Southampton   yes  False  \n",
       "4    man        True  NaN  Southampton    no   True  "
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_titanic = sns.load_dataset('titanic')\n",
    "df_titanic.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 180,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>491.000000</td>\n",
       "      <td>491.0</td>\n",
       "      <td>355.000000</td>\n",
       "      <td>491.000000</td>\n",
       "      <td>491.000000</td>\n",
       "      <td>491.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>0.242363</td>\n",
       "      <td>3.0</td>\n",
       "      <td>25.140620</td>\n",
       "      <td>0.615071</td>\n",
       "      <td>0.393075</td>\n",
       "      <td>38.801976</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>0.428949</td>\n",
       "      <td>0.0</td>\n",
       "      <td>12.495398</td>\n",
       "      <td>1.374883</td>\n",
       "      <td>0.888861</td>\n",
       "      <td>556.628917</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.0</td>\n",
       "      <td>0.420000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.0</td>\n",
       "      <td>18.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>7.750000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.0</td>\n",
       "      <td>24.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>8.050000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.0</td>\n",
       "      <td>32.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>15.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>3.0</td>\n",
       "      <td>74.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>12345.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         survived  pclass         age       sibsp       parch          fare\n",
       "count  491.000000   491.0  355.000000  491.000000  491.000000    491.000000\n",
       "mean     0.242363     3.0   25.140620    0.615071    0.393075     38.801976\n",
       "std      0.428949     0.0   12.495398    1.374883    0.888861    556.628917\n",
       "min      0.000000     3.0    0.420000    0.000000    0.000000      0.000000\n",
       "25%      0.000000     3.0   18.000000    0.000000    0.000000      7.750000\n",
       "50%      0.000000     3.0   24.000000    0.000000    0.000000      8.050000\n",
       "75%      0.000000     3.0   32.000000    1.000000    0.000000     15.500000\n",
       "max      1.000000     3.0   74.000000    8.000000    6.000000  12345.000000"
      ]
     },
     "execution_count": 180,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s_bool = df_titanic.loc[:, 'pclass'] == 3\n",
    "df_titanic.loc[s_bool, :].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 164,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>survived</th>\n",
       "      <th>pclass</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>fare</th>\n",
       "      <th>embarked</th>\n",
       "      <th>class</th>\n",
       "      <th>who</th>\n",
       "      <th>adult_male</th>\n",
       "      <th>deck</th>\n",
       "      <th>embark_town</th>\n",
       "      <th>alive</th>\n",
       "      <th>alone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12345.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>yes</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.4583</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>21.0750</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>child</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>882</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>22.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10.5167</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>884</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>25.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>885</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>39.0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>29.1250</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>23.4500</td>\n",
       "      <td>S</td>\n",
       "      <td>Third</td>\n",
       "      <td>woman</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>no</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>890</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>32.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.7500</td>\n",
       "      <td>Q</td>\n",
       "      <td>Third</td>\n",
       "      <td>man</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Queenstown</td>\n",
       "      <td>no</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>491 rows × 15 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     survived  pclass     sex   age  sibsp  parch        fare embarked  class  \\\n",
       "0           0       3    male  22.0      1      0      7.2500        S  Third   \n",
       "2           1       3  female  26.0      0      0  12345.0000        S  Third   \n",
       "4           0       3    male  35.0      0      0      8.0500        S  Third   \n",
       "5           0       3    male   NaN      0      0      8.4583        Q  Third   \n",
       "7           0       3    male   2.0      3      1     21.0750        S  Third   \n",
       "..        ...     ...     ...   ...    ...    ...         ...      ...    ...   \n",
       "882         0       3  female  22.0      0      0     10.5167        S  Third   \n",
       "884         0       3    male  25.0      0      0      7.0500        S  Third   \n",
       "885         0       3  female  39.0      0      5     29.1250        Q  Third   \n",
       "888         0       3  female   NaN      1      2     23.4500        S  Third   \n",
       "890         0       3    male  32.0      0      0      7.7500        Q  Third   \n",
       "\n",
       "       who  adult_male deck  embark_town alive  alone  \n",
       "0      man        True  NaN  Southampton    no  False  \n",
       "2    woman       False  NaN  Southampton   yes   True  \n",
       "4      man        True  NaN  Southampton    no   True  \n",
       "5      man        True  NaN   Queenstown    no   True  \n",
       "7    child       False  NaN  Southampton    no  False  \n",
       "..     ...         ...  ...          ...   ...    ...  \n",
       "882  woman       False  NaN  Southampton    no   True  \n",
       "884    man        True  NaN  Southampton    no   True  \n",
       "885  woman       False  NaN   Queenstown    no  False  \n",
       "888  woman       False  NaN  Southampton    no  False  \n",
       "890    man        True  NaN   Queenstown    no   True  \n",
       "\n",
       "[491 rows x 15 columns]"
      ]
     },
     "execution_count": 164,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# extract only rows corresponding to pclass = 3\n",
    "s_bool = df_titanic.loc[:, 'pclass'] == 3\n",
    "df_titanic.loc[s_bool, :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([3, 1, 2])"
      ]
     },
     "execution_count": 165,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_titanic['pclass'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 181,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pclass = 1 has:\n",
      "         survived  pclass         age       sibsp       parch        fare\n",
      "count  216.000000   216.0  186.000000  216.000000  216.000000  216.000000\n",
      "mean     0.629630     1.0   38.233441    0.416667    0.356481   84.154687\n",
      "std      0.484026     0.0   14.802856    0.611898    0.693997   78.380373\n",
      "min      0.000000     1.0    0.920000    0.000000    0.000000    0.000000\n",
      "25%      0.000000     1.0   27.000000    0.000000    0.000000   30.923950\n",
      "50%      1.000000     1.0   37.000000    0.000000    0.000000   60.287500\n",
      "75%      1.000000     1.0   49.000000    1.000000    0.000000   93.500000\n",
      "max      1.000000     1.0   80.000000    3.000000    4.000000  512.329200\n",
      "pclass = 2 has:\n",
      "         survived  pclass         age       sibsp       parch        fare\n",
      "count  184.000000   184.0  173.000000  184.000000  184.000000  184.000000\n",
      "mean     0.472826     2.0   29.877630    0.402174    0.380435   20.662183\n",
      "std      0.500623     0.0   14.001077    0.601633    0.690963   13.417399\n",
      "min      0.000000     2.0    0.670000    0.000000    0.000000    0.000000\n",
      "25%      0.000000     2.0   23.000000    0.000000    0.000000   13.000000\n",
      "50%      0.000000     2.0   29.000000    0.000000    0.000000   14.250000\n",
      "75%      1.000000     2.0   36.000000    1.000000    1.000000   26.000000\n",
      "max      1.000000     2.0   70.000000    3.000000    3.000000   73.500000\n",
      "pclass = 3 has:\n",
      "         survived  pclass         age       sibsp       parch          fare\n",
      "count  491.000000   491.0  355.000000  491.000000  491.000000    491.000000\n",
      "mean     0.242363     3.0   25.140620    0.615071    0.393075     38.801976\n",
      "std      0.428949     0.0   12.495398    1.374883    0.888861    556.628917\n",
      "min      0.000000     3.0    0.420000    0.000000    0.000000      0.000000\n",
      "25%      0.000000     3.0   18.000000    0.000000    0.000000      7.750000\n",
      "50%      0.000000     3.0   24.000000    0.000000    0.000000      8.050000\n",
      "75%      0.000000     3.0   32.000000    1.000000    0.000000     15.500000\n",
      "max      1.000000     3.0   74.000000    8.000000    6.000000  12345.000000\n"
     ]
    }
   ],
   "source": [
    "for pclass in sorted(df_titanic['pclass'].unique()):\n",
    "    # extract only the rows corresponding to one particular value of the feature\n",
    "    s_bool = df_titanic['pclass'] == pclass\n",
    "    df_titanic_subset = df_titanic.loc[s_bool, :]\n",
    "\n",
    "    print(f'pclass = {pclass} has:')\n",
    "    print(df_titanic_subset.describe())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 183,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "age = 0.83 has:\n",
      "       survived  pclass   age     sibsp     parch       fare\n",
      "count       2.0     2.0  2.00  2.000000  2.000000   2.000000\n",
      "mean        1.0     2.0  0.83  0.500000  1.500000  23.875000\n",
      "std         0.0     0.0  0.00  0.707107  0.707107   7.247845\n",
      "min         1.0     2.0  0.83  0.000000  1.000000  18.750000\n",
      "25%         1.0     2.0  0.83  0.250000  1.250000  21.312500\n",
      "50%         1.0     2.0  0.83  0.500000  1.500000  23.875000\n",
      "75%         1.0     2.0  0.83  0.750000  1.750000  26.437500\n",
      "max         1.0     2.0  0.83  1.000000  2.000000  29.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 2.0 has:\n",
      "        survived     pclass   age     sibsp      parch        fare\n",
      "count  10.000000  10.000000  10.0  10.00000  10.000000   10.000000\n",
      "mean    0.300000   2.600000   2.0   2.10000   1.300000   37.536250\n",
      "std     0.483046   0.699206   0.0   1.66333   0.483046   40.979945\n",
      "min     0.000000   1.000000   2.0   0.00000   1.000000   10.462500\n",
      "25%     0.000000   2.250000   2.0   1.00000   1.000000   22.306250\n",
      "50%     0.000000   3.000000   2.0   2.00000   1.000000   26.950000\n",
      "75%     0.750000   3.000000   2.0   3.75000   1.750000   30.737500\n",
      "max     1.000000   3.000000   2.0   4.00000   2.000000  151.550000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 3.0 has:\n",
      "       survived    pclass  age     sibsp     parch       fare\n",
      "count  6.000000  6.000000  6.0  6.000000  6.000000   6.000000\n",
      "mean   0.833333  2.500000  3.0  1.833333  1.333333  25.781950\n",
      "std    0.408248  0.547723  0.0  1.329160  0.516398   9.489778\n",
      "min    0.000000  2.000000  3.0  1.000000  1.000000  15.900000\n",
      "25%    1.000000  2.000000  3.0  1.000000  1.000000  19.331250\n",
      "50%    1.000000  2.500000  3.0  1.000000  1.000000  23.537500\n",
      "75%    1.000000  3.000000  3.0  2.500000  1.750000  30.040625\n",
      "max    1.000000  3.000000  3.0  4.000000  2.000000  41.579200\n",
      "--------------------------------------------------------------------------------\n",
      "age = 4.0 has:\n",
      "        survived     pclass   age      sibsp      parch       fare\n",
      "count  10.000000  10.000000  10.0  10.000000  10.000000  10.000000\n",
      "mean    0.700000   2.600000   4.0   1.600000   1.400000  29.543330\n",
      "std     0.483046   0.699206   0.0   1.577621   0.516398  20.263399\n",
      "min     0.000000   1.000000   4.0   0.000000   1.000000  11.133300\n",
      "25%     0.250000   2.250000   4.0   0.250000   1.000000  18.031250\n",
      "50%     1.000000   3.000000   4.0   1.000000   1.000000  25.450000\n",
      "75%     1.000000   3.000000   4.0   2.750000   2.000000  30.737500\n",
      "max     1.000000   3.000000   4.0   4.000000   2.000000  81.858300\n",
      "--------------------------------------------------------------------------------\n",
      "age = 5.0 has:\n",
      "       survived  pclass  age     sibsp     parch       fare\n",
      "count       4.0    4.00  4.0  4.000000  4.000000   4.000000\n",
      "mean        1.0    2.75  5.0  1.750000  1.250000  22.717700\n",
      "std         0.0    0.50  0.0  1.707825  0.957427   8.512145\n",
      "min         1.0    2.00  5.0  0.000000  0.000000  12.475000\n",
      "25%         1.0    2.75  5.0  0.750000  0.750000  17.562475\n",
      "50%         1.0    3.00  5.0  1.500000  1.500000  23.504150\n",
      "75%         1.0    3.00  5.0  2.500000  2.000000  28.659375\n",
      "max         1.0    3.00  5.0  4.000000  2.000000  31.387500\n",
      "--------------------------------------------------------------------------------\n",
      "age = 7.0 has:\n",
      "       survived    pclass  age     sibsp     parch       fare\n",
      "count  3.000000  3.000000  3.0  3.000000  3.000000   3.000000\n",
      "mean   0.333333  2.666667  7.0  2.666667  1.333333  31.687500\n",
      "std    0.577350  0.577350  0.0  2.309401  0.577350   7.075762\n",
      "min    0.000000  2.000000  7.0  0.000000  1.000000  26.250000\n",
      "25%    0.000000  2.500000  7.0  2.000000  1.000000  27.687500\n",
      "50%    0.000000  3.000000  7.0  4.000000  1.000000  29.125000\n",
      "75%    0.500000  3.000000  7.0  4.000000  1.500000  34.406250\n",
      "max    1.000000  3.000000  7.0  4.000000  2.000000  39.687500\n",
      "--------------------------------------------------------------------------------\n",
      "age = 8.0 has:\n",
      "       survived   pclass  age     sibsp  parch       fare\n",
      "count   4.00000  4.00000  4.0  4.000000   4.00   4.000000\n",
      "mean    0.50000  2.50000  8.0  2.000000   1.25  28.300000\n",
      "std     0.57735  0.57735  0.0  1.825742   0.50   6.544368\n",
      "min     0.00000  2.00000  8.0  0.000000   1.00  21.075000\n",
      "25%     0.00000  2.00000  8.0  0.750000   1.00  24.956250\n",
      "50%     0.50000  2.50000  8.0  2.000000   1.00  27.687500\n",
      "75%     1.00000  3.00000  8.0  3.250000   1.25  31.031250\n",
      "max     1.00000  3.00000  8.0  4.000000   2.00  36.750000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 11.0 has:\n",
      "       survived  pclass   age     sibsp  parch        fare\n",
      "count      4.00     4.0   4.0  4.000000    4.0    4.000000\n",
      "mean       0.25     2.5  11.0  2.500000    1.5   54.240625\n",
      "std        0.50     1.0   0.0  2.380476    1.0   45.323004\n",
      "min        0.00     1.0  11.0  0.000000    0.0   18.787500\n",
      "25%        0.00     2.5  11.0  0.750000    1.5   28.153125\n",
      "50%        0.00     3.0  11.0  2.500000    2.0   39.087500\n",
      "75%        0.25     3.0  11.0  4.250000    2.0   65.175000\n",
      "max        1.00     3.0  11.0  5.000000    2.0  120.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 14.0 has:\n",
      "       survived   pclass   age  sibsp     parch        fare\n",
      "count  6.000000  6.00000   6.0   6.00  6.000000    6.000000\n",
      "mean   0.500000  2.50000  14.0   2.00  0.833333   42.625700\n",
      "std    0.547723  0.83666   0.0   2.00  0.983192   40.903113\n",
      "min    0.000000  1.00000  14.0   0.00  0.000000    7.854200\n",
      "25%    0.000000  2.25000  14.0   1.00  0.000000   15.948975\n",
      "50%    0.500000  3.00000  14.0   1.00  0.500000   34.879150\n",
      "75%    1.000000  3.00000  14.0   3.25  1.750000   45.096875\n",
      "max    1.000000  3.00000  14.0   5.00  2.000000  120.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 15.0 has:\n",
      "       survived    pclass   age     sibsp     parch        fare\n",
      "count  5.000000  5.000000   5.0  5.000000  5.000000    5.000000\n",
      "mean   0.800000  2.600000  15.0  0.400000  0.400000   49.655020\n",
      "std    0.447214  0.894427   0.0  0.547723  0.547723   90.434075\n",
      "min    0.000000  1.000000  15.0  0.000000  0.000000    7.225000\n",
      "25%    1.000000  3.000000  15.0  0.000000  0.000000    7.229200\n",
      "50%    1.000000  3.000000  15.0  0.000000  0.000000    8.029200\n",
      "75%    1.000000  3.000000  15.0  1.000000  1.000000   14.454200\n",
      "max    1.000000  3.000000  15.0  1.000000  1.000000  211.337500\n",
      "--------------------------------------------------------------------------------\n",
      "age = 16.0 has:\n",
      "        survived     pclass   age      sibsp      parch       fare\n",
      "count  17.000000  17.000000  17.0  17.000000  17.000000  17.000000\n",
      "mean    0.352941   2.529412  16.0   0.764706   0.529412  25.745100\n",
      "std     0.492592   0.799816   0.0   1.521899   0.874475  22.486392\n",
      "min     0.000000   1.000000  16.0   0.000000   0.000000   7.733300\n",
      "25%     0.000000   2.000000  16.0   0.000000   0.000000   8.050000\n",
      "50%     0.000000   3.000000  16.0   0.000000   0.000000  18.000000\n",
      "75%     1.000000   3.000000  16.0   1.000000   1.000000  39.400000\n",
      "max     1.000000   3.000000  16.0   5.000000   3.000000  86.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 17.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  13.000000  13.000000  13.0  13.000000  13.000000   13.000000\n",
      "mean    0.461538   2.384615  17.0   0.615385   0.384615   28.389423\n",
      "std     0.518875   0.869718   0.0   1.120897   0.767948   38.546345\n",
      "min     0.000000   1.000000  17.0   0.000000   0.000000    7.054200\n",
      "25%     0.000000   2.000000  17.0   0.000000   0.000000    7.925000\n",
      "50%     0.000000   3.000000  17.0   0.000000   0.000000    8.662500\n",
      "75%     1.000000   3.000000  17.0   1.000000   0.000000   14.458300\n",
      "max     1.000000   3.000000  17.0   4.000000   2.000000  110.883300\n",
      "--------------------------------------------------------------------------------\n",
      "age = 18.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  26.000000  26.000000  26.0  26.000000  26.000000   26.000000\n",
      "mean    0.346154   2.461538  18.0   0.384615   0.423077   38.063462\n",
      "std     0.485165   0.760567   0.0   0.637302   0.702742   66.241829\n",
      "min     0.000000   1.000000  18.0   0.000000   0.000000    6.495800\n",
      "25%     0.000000   2.000000  18.0   0.000000   0.000000    7.810400\n",
      "50%     0.000000   3.000000  18.0   0.000000   0.000000   11.500000\n",
      "75%     1.000000   3.000000  18.0   1.000000   1.000000   19.659375\n",
      "max     1.000000   3.000000  18.0   2.000000   2.000000  262.375000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 19.0 has:\n",
      "        survived    pclass   age      sibsp     parch        fare\n",
      "count  25.000000  25.00000  25.0  25.000000  25.00000   25.000000\n",
      "mean    0.360000   2.36000  19.0   0.320000   0.20000   27.869496\n",
      "std     0.489898   0.81035   0.0   0.690411   0.57735   52.652311\n",
      "min     0.000000   1.00000  19.0   0.000000   0.00000    0.000000\n",
      "25%     0.000000   2.00000  19.0   0.000000   0.00000    7.895800\n",
      "50%     0.000000   3.00000  19.0   0.000000   0.00000   10.170800\n",
      "75%     1.000000   3.00000  19.0   0.000000   0.00000   26.000000\n",
      "max     1.000000   3.00000  19.0   3.000000   2.00000  263.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 20.0 has:\n",
      "        survived  pclass   age      sibsp      parch       fare\n",
      "count  15.000000    15.0  15.0  15.000000  15.000000  15.000000\n",
      "mean    0.200000     3.0  20.0   0.200000   0.066667   8.624173\n",
      "std     0.414039     0.0   0.0   0.414039   0.258199   2.433533\n",
      "min     0.000000     3.0  20.0   0.000000   0.000000   4.012500\n",
      "25%     0.000000     3.0  20.0   0.000000   0.000000   7.854200\n",
      "50%     0.000000     3.0  20.0   0.000000   0.000000   8.050000\n",
      "75%     0.000000     3.0  20.0   0.000000   0.000000   9.362500\n",
      "max     1.000000     3.0  20.0   1.000000   1.000000  15.741700\n",
      "--------------------------------------------------------------------------------\n",
      "age = 21.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  24.000000  24.000000  24.0  24.000000  24.000000   24.000000\n",
      "mean    0.208333   2.583333  21.0   0.333333   0.208333   31.565621\n",
      "std     0.414851   0.717282   0.0   0.701964   0.588230   55.340305\n",
      "min     0.000000   1.000000  21.0   0.000000   0.000000    7.250000\n",
      "25%     0.000000   2.000000  21.0   0.000000   0.000000    7.798950\n",
      "50%     0.000000   3.000000  21.0   0.000000   0.000000    8.241650\n",
      "75%     0.000000   3.000000  21.0   0.000000   0.000000   20.668750\n",
      "max     1.000000   3.000000  21.0   2.000000   2.000000  262.375000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 22.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  27.000000  27.000000  27.0  27.000000  27.000000   27.000000\n",
      "mean    0.407407   2.555556  22.0   0.148148   0.222222   25.504781\n",
      "std     0.500712   0.800641   0.0   0.362014   0.577350   38.015474\n",
      "min     0.000000   1.000000  22.0   0.000000   0.000000    7.125000\n",
      "25%     0.000000   2.500000  22.0   0.000000   0.000000    7.385400\n",
      "50%     0.000000   3.000000  22.0   0.000000   0.000000    7.895800\n",
      "75%     1.000000   3.000000  22.0   0.000000   0.000000   19.758350\n",
      "max     1.000000   3.000000  22.0   1.000000   2.000000  151.550000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 23.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  15.000000  15.000000  15.0  15.000000  15.000000   15.000000\n",
      "mean    0.333333   2.133333  23.0   0.400000   0.266667   37.994720\n",
      "std     0.487950   0.743223   0.0   0.910259   0.593617   68.585477\n",
      "min     0.000000   1.000000  23.0   0.000000   0.000000    7.550000\n",
      "25%     0.000000   2.000000  23.0   0.000000   0.000000    8.575000\n",
      "50%     0.000000   2.000000  23.0   0.000000   0.000000   13.000000\n",
      "75%     1.000000   3.000000  23.0   0.000000   0.000000   14.418750\n",
      "max     1.000000   3.000000  23.0   3.000000   2.000000  263.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 24.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  30.000000  30.000000  30.0  30.000000  30.000000   30.000000\n",
      "mean    0.500000   2.200000  24.0   0.500000   0.533333   43.035690\n",
      "std     0.508548   0.805156   0.0   0.861034   0.973204   62.858665\n",
      "min     0.000000   1.000000  24.0   0.000000   0.000000    7.050000\n",
      "25%     0.000000   2.000000  24.0   0.000000   0.000000    9.750000\n",
      "50%     0.500000   2.000000  24.0   0.000000   0.000000   16.400000\n",
      "75%     1.000000   3.000000  24.0   1.000000   0.750000   61.126050\n",
      "max     1.000000   3.000000  24.0   3.000000   3.000000  263.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 25.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  23.000000  23.000000  23.0  23.000000  23.000000   23.000000\n",
      "mean    0.260870   2.434783  25.0   0.434783   0.260870   24.415765\n",
      "std     0.448978   0.727767   0.0   0.506870   0.619192   34.416843\n",
      "min     0.000000   1.000000  25.0   0.000000   0.000000    0.000000\n",
      "25%     0.000000   2.000000  25.0   0.000000   0.000000    7.695850\n",
      "50%     0.000000   3.000000  25.0   0.000000   0.000000    7.925000\n",
      "75%     0.500000   3.000000  25.0   1.000000   0.000000   26.000000\n",
      "max     1.000000   3.000000  25.0   1.000000   2.000000  151.550000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 26.0 has:\n",
      "        survived     pclass   age      sibsp      parch          fare\n",
      "count  18.000000  18.000000  18.0  18.000000  18.000000     18.000000\n",
      "mean    0.333333   2.666667  26.0   0.388889   0.166667    704.479861\n",
      "std     0.485071   0.685994   0.0   0.607685   0.514496   2905.153854\n",
      "min     0.000000   1.000000  26.0   0.000000   0.000000      7.775000\n",
      "25%     0.000000   3.000000  26.0   0.000000   0.000000      7.895800\n",
      "50%     0.000000   3.000000  26.0   0.000000   0.000000     12.477100\n",
      "75%     1.000000   3.000000  26.0   1.000000   0.000000     24.643750\n",
      "max     1.000000   3.000000  26.0   2.000000   2.000000  12345.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 27.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  18.000000  18.000000  18.0  18.000000  18.000000   18.000000\n",
      "mean    0.611111   2.222222  27.0   0.222222   0.277778   30.361339\n",
      "std     0.501631   0.808452   0.0   0.427793   0.669113   48.708195\n",
      "min     0.000000   1.000000  27.0   0.000000   0.000000    6.975000\n",
      "25%     0.000000   2.000000  27.0   0.000000   0.000000    9.121875\n",
      "50%     1.000000   2.000000  27.0   0.000000   0.000000   13.000000\n",
      "75%     1.000000   3.000000  27.0   0.000000   0.000000   24.750000\n",
      "max     1.000000   3.000000  27.0   1.000000   2.000000  211.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 28.0 has:\n",
      "        survived     pclass   age      sibsp      parch       fare\n",
      "count  25.000000  25.000000  25.0  25.000000  25.000000  25.000000\n",
      "mean    0.280000   2.320000  28.0   0.280000   0.080000  21.020160\n",
      "std     0.458258   0.748331   0.0   0.541603   0.276887  18.143502\n",
      "min     0.000000   1.000000  28.0   0.000000   0.000000   7.795800\n",
      "25%     0.000000   2.000000  28.0   0.000000   0.000000   9.500000\n",
      "50%     0.000000   2.000000  28.0   0.000000   0.000000  13.000000\n",
      "75%     1.000000   3.000000  28.0   0.000000   0.000000  26.000000\n",
      "max     1.000000   3.000000  28.0   2.000000   1.000000  82.170800\n",
      "--------------------------------------------------------------------------------\n",
      "age = 28.5 has:\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       survived  pclass   age  sibsp  parch       fare\n",
      "count       2.0     2.0   2.0    2.0    2.0   2.000000\n",
      "mean        0.0     3.0  28.5    0.0    0.0  11.664600\n",
      "std         0.0     0.0   0.0    0.0    0.0   6.272603\n",
      "min         0.0     3.0  28.5    0.0    0.0   7.229200\n",
      "25%         0.0     3.0  28.5    0.0    0.0   9.446900\n",
      "50%         0.0     3.0  28.5    0.0    0.0  11.664600\n",
      "75%         0.0     3.0  28.5    0.0    0.0  13.882300\n",
      "max         0.0     3.0  28.5    0.0    0.0  16.100000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 29.0 has:\n",
      "        survived     pclass   age     sibsp      parch        fare\n",
      "count  20.000000  20.000000  20.0  20.00000  20.000000   20.000000\n",
      "mean    0.400000   2.400000  29.0   0.35000   0.350000   27.090825\n",
      "std     0.502625   0.753937   0.0   0.48936   0.988087   45.554098\n",
      "min     0.000000   1.000000  29.0   0.00000   0.000000    7.045800\n",
      "25%     0.000000   2.000000  29.0   0.00000   0.000000    8.011450\n",
      "50%     0.000000   3.000000  29.0   0.00000   0.000000   10.500000\n",
      "75%     1.000000   3.000000  29.0   1.00000   0.000000   26.000000\n",
      "max     1.000000   3.000000  29.0   1.00000   4.000000  211.337500\n",
      "--------------------------------------------------------------------------------\n",
      "age = 30.0 has:\n",
      "       survived     pclass   age      sibsp  parch        fare\n",
      "count      25.0  25.000000  25.0  25.000000  25.00   25.000000\n",
      "mean        0.4   2.200000  30.0   0.240000   0.04   25.541668\n",
      "std         0.5   0.816497   0.0   0.663325   0.20   28.636697\n",
      "min         0.0   1.000000  30.0   0.000000   0.00    7.225000\n",
      "25%         0.0   2.000000  30.0   0.000000   0.00    8.662500\n",
      "50%         0.0   2.000000  30.0   0.000000   0.00   13.000000\n",
      "75%         1.0   3.000000  30.0   0.000000   0.00   24.150000\n",
      "max         1.0   3.000000  30.0   3.000000   1.00  106.425000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 31.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  17.000000  17.000000  17.0  17.000000  17.000000   17.000000\n",
      "mean    0.470588   2.117647  31.0   0.470588   0.352941   37.009071\n",
      "std     0.514496   0.857493   0.0   0.514496   0.606339   42.809926\n",
      "min     0.000000   1.000000  31.0   0.000000   0.000000    7.750000\n",
      "25%     0.000000   1.000000  31.0   0.000000   0.000000    8.683300\n",
      "50%     0.000000   2.000000  31.0   0.000000   0.000000   20.525000\n",
      "75%     1.000000   3.000000  31.0   1.000000   1.000000   50.495800\n",
      "max     1.000000   3.000000  31.0   1.000000   2.000000  164.866700\n",
      "--------------------------------------------------------------------------------\n",
      "age = 32.0 has:\n",
      "        survived     pclass   age      sibsp      parch       fare\n",
      "count  18.000000  18.000000  18.0  18.000000  18.000000  18.000000\n",
      "mean    0.500000   2.555556  32.0   0.277778   0.055556  24.323378\n",
      "std     0.514496   0.704792   0.0   0.574513   0.235702  24.060172\n",
      "min     0.000000   1.000000  32.0   0.000000   0.000000   7.750000\n",
      "25%     0.000000   2.000000  32.0   0.000000   0.000000   7.925000\n",
      "50%     0.500000   3.000000  32.0   0.000000   0.000000  11.750000\n",
      "75%     1.000000   3.000000  32.0   0.000000   0.000000  29.375000\n",
      "max     1.000000   3.000000  32.0   2.000000   1.000000  76.291700\n",
      "--------------------------------------------------------------------------------\n",
      "age = 33.0 has:\n",
      "        survived     pclass   age      sibsp      parch       fare\n",
      "count  15.000000  15.000000  15.0  15.000000  15.000000  15.000000\n",
      "mean    0.400000   2.266667  33.0   0.466667   0.333333  25.825553\n",
      "std     0.507093   0.883715   0.0   0.833809   0.723747  28.179311\n",
      "min     0.000000   1.000000  33.0   0.000000   0.000000   5.000000\n",
      "25%     0.000000   1.500000  33.0   0.000000   0.000000   8.275000\n",
      "50%     0.000000   3.000000  33.0   0.000000   0.000000  12.275000\n",
      "75%     1.000000   3.000000  33.0   1.000000   0.000000  26.875000\n",
      "max     1.000000   3.000000  33.0   3.000000   2.000000  90.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 34.0 has:\n",
      "        survived     pclass   age      sibsp      parch       fare\n",
      "count  15.000000  15.000000  15.0  15.000000  15.000000  15.000000\n",
      "mean    0.400000   2.200000  34.0   0.333333   0.200000  16.636387\n",
      "std     0.507093   0.560612   0.0   0.487950   0.414039   7.846849\n",
      "min     0.000000   1.000000  34.0   0.000000   0.000000   6.495800\n",
      "25%     0.000000   2.000000  34.0   0.000000   0.000000  11.750000\n",
      "50%     0.000000   2.000000  34.0   0.000000   0.000000  13.000000\n",
      "75%     1.000000   2.500000  34.0   1.000000   0.000000  22.000000\n",
      "max     1.000000   3.000000  34.0   1.000000   1.000000  32.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 35.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  18.000000  18.000000  18.0  18.000000  18.000000   18.000000\n",
      "mean    0.611111   1.833333  35.0   0.277778   0.055556   89.312500\n",
      "std     0.501631   0.923548   0.0   0.460889   0.235702  157.870974\n",
      "min     0.000000   1.000000  35.0   0.000000   0.000000    7.050000\n",
      "25%     0.000000   1.000000  35.0   0.000000   0.000000    8.662500\n",
      "50%     1.000000   1.500000  35.0   0.000000   0.000000   26.143750\n",
      "75%     1.000000   3.000000  35.0   0.750000   0.000000   75.881250\n",
      "max     1.000000   3.000000  35.0   1.000000   1.000000  512.329200\n",
      "--------------------------------------------------------------------------------\n",
      "age = 38.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  11.000000  11.000000  11.0  11.000000  11.000000   11.000000\n",
      "mean    0.454545   1.818182  38.0   0.272727   0.545455   62.751509\n",
      "std     0.522233   0.981650   0.0   0.467099   1.507557   72.750026\n",
      "min     0.000000   1.000000  38.0   0.000000   0.000000    0.000000\n",
      "25%     0.000000   1.000000  38.0   0.000000   0.000000    8.279150\n",
      "50%     0.000000   1.000000  38.0   0.000000   0.000000   31.387500\n",
      "75%     1.000000   3.000000  38.0   0.500000   0.000000   85.000000\n",
      "max     1.000000   3.000000  38.0   1.000000   5.000000  227.525000\n",
      "--------------------------------------------------------------------------------\n",
      "age = nan has:\n",
      "       survived  pclass  age  sibsp  parch  fare\n",
      "count       0.0     0.0  0.0    0.0    0.0   0.0\n",
      "mean        NaN     NaN  NaN    NaN    NaN   NaN\n",
      "std         NaN     NaN  NaN    NaN    NaN   NaN\n",
      "min         NaN     NaN  NaN    NaN    NaN   NaN\n",
      "25%         NaN     NaN  NaN    NaN    NaN   NaN\n",
      "50%         NaN     NaN  NaN    NaN    NaN   NaN\n",
      "75%         NaN     NaN  NaN    NaN    NaN   NaN\n",
      "max         NaN     NaN  NaN    NaN    NaN   NaN\n",
      "--------------------------------------------------------------------------------\n",
      "age = 0.42 has:\n",
      "       survived  pclass   age  sibsp  parch    fare\n",
      "count       1.0     1.0  1.00    1.0    1.0  1.0000\n",
      "mean        1.0     3.0  0.42    0.0    1.0  8.5167\n",
      "std         NaN     NaN   NaN    NaN    NaN     NaN\n",
      "min         1.0     3.0  0.42    0.0    1.0  8.5167\n",
      "25%         1.0     3.0  0.42    0.0    1.0  8.5167\n",
      "50%         1.0     3.0  0.42    0.0    1.0  8.5167\n",
      "75%         1.0     3.0  0.42    0.0    1.0  8.5167\n",
      "max         1.0     3.0  0.42    0.0    1.0  8.5167\n",
      "--------------------------------------------------------------------------------\n",
      "age = 0.67 has:\n",
      "       survived  pclass   age  sibsp  parch  fare\n",
      "count       1.0     1.0  1.00    1.0    1.0   1.0\n",
      "mean        1.0     2.0  0.67    1.0    1.0  14.5\n",
      "std         NaN     NaN   NaN    NaN    NaN   NaN\n",
      "min         1.0     2.0  0.67    1.0    1.0  14.5\n",
      "25%         1.0     2.0  0.67    1.0    1.0  14.5\n",
      "50%         1.0     2.0  0.67    1.0    1.0  14.5\n",
      "75%         1.0     2.0  0.67    1.0    1.0  14.5\n",
      "max         1.0     2.0  0.67    1.0    1.0  14.5\n",
      "--------------------------------------------------------------------------------\n",
      "age = 0.75 has:\n",
      "       survived  pclass   age  sibsp  parch     fare\n",
      "count       2.0     2.0  2.00    2.0    2.0   2.0000\n",
      "mean        1.0     3.0  0.75    2.0    1.0  19.2583\n",
      "std         0.0     0.0  0.00    0.0    0.0   0.0000\n",
      "min         1.0     3.0  0.75    2.0    1.0  19.2583\n",
      "25%         1.0     3.0  0.75    2.0    1.0  19.2583\n",
      "50%         1.0     3.0  0.75    2.0    1.0  19.2583\n",
      "75%         1.0     3.0  0.75    2.0    1.0  19.2583\n",
      "max         1.0     3.0  0.75    2.0    1.0  19.2583\n",
      "--------------------------------------------------------------------------------\n",
      "age = 0.92 has:\n",
      "       survived  pclass   age  sibsp  parch    fare\n",
      "count       1.0     1.0  1.00    1.0    1.0    1.00\n",
      "mean        1.0     1.0  0.92    1.0    2.0  151.55\n",
      "std         NaN     NaN   NaN    NaN    NaN     NaN\n",
      "min         1.0     1.0  0.92    1.0    2.0  151.55\n",
      "25%         1.0     1.0  0.92    1.0    2.0  151.55\n",
      "50%         1.0     1.0  0.92    1.0    2.0  151.55\n",
      "75%         1.0     1.0  0.92    1.0    2.0  151.55\n",
      "max         1.0     1.0  0.92    1.0    2.0  151.55\n",
      "--------------------------------------------------------------------------------\n",
      "age = 1.0 has:\n",
      "       survived    pclass  age     sibsp     parch       fare\n",
      "count  7.000000  7.000000  7.0  7.000000  7.000000   7.000000\n",
      "mean   0.714286  2.714286  1.0  1.857143  1.571429  30.005957\n",
      "std    0.487950  0.487950  0.0  1.951800  0.534522  13.890034\n",
      "min    0.000000  2.000000  1.0  0.000000  1.000000  11.133300\n",
      "25%    0.500000  2.500000  1.0  0.500000  1.000000  18.158350\n",
      "50%    1.000000  3.000000  1.0  1.000000  2.000000  37.004200\n",
      "75%    1.000000  3.000000  1.0  3.000000  2.000000  39.343750\n",
      "max    1.000000  3.000000  1.0  5.000000  2.000000  46.900000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 6.0 has:\n",
      "       survived    pclass  age     sibsp     parch       fare\n",
      "count  3.000000  3.000000  3.0  3.000000  3.000000   3.000000\n",
      "mean   0.666667  2.666667  6.0  1.333333  1.333333  25.583333\n",
      "std    0.577350  0.577350  0.0  2.309401  0.577350  11.384868\n",
      "min    0.000000  2.000000  6.0  0.000000  1.000000  12.475000\n",
      "25%    0.500000  2.500000  6.0  0.000000  1.000000  21.875000\n",
      "50%    1.000000  3.000000  6.0  0.000000  1.000000  31.275000\n",
      "75%    1.000000  3.000000  6.0  2.000000  1.500000  32.137500\n",
      "max    1.000000  3.000000  6.0  4.000000  2.000000  33.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 9.0 has:\n",
      "       survived  pclass  age     sibsp    parch       fare\n",
      "count   8.00000     8.0  8.0  8.000000  8.00000   8.000000\n",
      "mean    0.25000     3.0  9.0  2.500000  1.75000  27.938537\n",
      "std     0.46291     0.0  0.0  1.772811  0.46291  10.589661\n",
      "min     0.00000     3.0  9.0  0.000000  1.00000  15.245800\n",
      "25%     0.00000     3.0  9.0  1.000000  1.75000  19.368750\n",
      "50%     0.00000     3.0  9.0  2.500000  2.00000  29.587500\n",
      "75%     0.25000     3.0  9.0  4.000000  2.00000  32.134375\n",
      "max     1.00000     3.0  9.0  5.000000  2.00000  46.900000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 10.0 has:\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       survived  pclass   age    sibsp  parch      fare\n",
      "count       2.0     2.0   2.0  2.00000    2.0   2.00000\n",
      "mean        0.0     3.0  10.0  1.50000    2.0  26.02500\n",
      "std         0.0     0.0   0.0  2.12132    0.0   2.65165\n",
      "min         0.0     3.0  10.0  0.00000    2.0  24.15000\n",
      "25%         0.0     3.0  10.0  0.75000    2.0  25.08750\n",
      "50%         0.0     3.0  10.0  1.50000    2.0  26.02500\n",
      "75%         0.0     3.0  10.0  2.25000    2.0  26.96250\n",
      "max         0.0     3.0  10.0  3.00000    2.0  27.90000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 12.0 has:\n",
      "       survived  pclass   age  sibsp  parch     fare\n",
      "count       1.0     1.0   1.0    1.0    1.0   1.0000\n",
      "mean        1.0     3.0  12.0    1.0    0.0  11.2417\n",
      "std         NaN     NaN   NaN    NaN    NaN      NaN\n",
      "min         1.0     3.0  12.0    1.0    0.0  11.2417\n",
      "25%         1.0     3.0  12.0    1.0    0.0  11.2417\n",
      "50%         1.0     3.0  12.0    1.0    0.0  11.2417\n",
      "75%         1.0     3.0  12.0    1.0    0.0  11.2417\n",
      "max         1.0     3.0  12.0    1.0    0.0  11.2417\n",
      "--------------------------------------------------------------------------------\n",
      "age = 13.0 has:\n",
      "       survived    pclass   age  sibsp     parch       fare\n",
      "count       2.0  2.000000   2.0    2.0  2.000000   2.000000\n",
      "mean        1.0  2.500000  13.0    0.0  0.500000  13.364600\n",
      "std         0.0  0.707107   0.0    0.0  0.707107   8.676766\n",
      "min         1.0  2.000000  13.0    0.0  0.000000   7.229200\n",
      "25%         1.0  2.250000  13.0    0.0  0.250000  10.296900\n",
      "50%         1.0  2.500000  13.0    0.0  0.500000  13.364600\n",
      "75%         1.0  2.750000  13.0    0.0  0.750000  16.432300\n",
      "max         1.0  3.000000  13.0    0.0  1.000000  19.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 14.5 has:\n",
      "       survived  pclass   age  sibsp  parch     fare\n",
      "count       1.0     1.0   1.0    1.0    1.0   1.0000\n",
      "mean        0.0     3.0  14.5    1.0    0.0  14.4542\n",
      "std         NaN     NaN   NaN    NaN    NaN      NaN\n",
      "min         0.0     3.0  14.5    1.0    0.0  14.4542\n",
      "25%         0.0     3.0  14.5    1.0    0.0  14.4542\n",
      "50%         0.0     3.0  14.5    1.0    0.0  14.4542\n",
      "75%         0.0     3.0  14.5    1.0    0.0  14.4542\n",
      "max         0.0     3.0  14.5    1.0    0.0  14.4542\n",
      "--------------------------------------------------------------------------------\n",
      "age = 20.5 has:\n",
      "       survived  pclass   age  sibsp  parch  fare\n",
      "count       1.0     1.0   1.0    1.0    1.0  1.00\n",
      "mean        0.0     3.0  20.5    0.0    0.0  7.25\n",
      "std         NaN     NaN   NaN    NaN    NaN   NaN\n",
      "min         0.0     3.0  20.5    0.0    0.0  7.25\n",
      "25%         0.0     3.0  20.5    0.0    0.0  7.25\n",
      "50%         0.0     3.0  20.5    0.0    0.0  7.25\n",
      "75%         0.0     3.0  20.5    0.0    0.0  7.25\n",
      "max         0.0     3.0  20.5    0.0    0.0  7.25\n",
      "--------------------------------------------------------------------------------\n",
      "age = 23.5 has:\n",
      "       survived  pclass   age  sibsp  parch    fare\n",
      "count       1.0     1.0   1.0    1.0    1.0  1.0000\n",
      "mean        0.0     3.0  23.5    0.0    0.0  7.2292\n",
      "std         NaN     NaN   NaN    NaN    NaN     NaN\n",
      "min         0.0     3.0  23.5    0.0    0.0  7.2292\n",
      "25%         0.0     3.0  23.5    0.0    0.0  7.2292\n",
      "50%         0.0     3.0  23.5    0.0    0.0  7.2292\n",
      "75%         0.0     3.0  23.5    0.0    0.0  7.2292\n",
      "max         0.0     3.0  23.5    0.0    0.0  7.2292\n",
      "--------------------------------------------------------------------------------\n",
      "age = 24.5 has:\n",
      "       survived  pclass   age  sibsp  parch  fare\n",
      "count       1.0     1.0   1.0    1.0    1.0  1.00\n",
      "mean        0.0     3.0  24.5    0.0    0.0  8.05\n",
      "std         NaN     NaN   NaN    NaN    NaN   NaN\n",
      "min         0.0     3.0  24.5    0.0    0.0  8.05\n",
      "25%         0.0     3.0  24.5    0.0    0.0  8.05\n",
      "50%         0.0     3.0  24.5    0.0    0.0  8.05\n",
      "75%         0.0     3.0  24.5    0.0    0.0  8.05\n",
      "max         0.0     3.0  24.5    0.0    0.0  8.05\n",
      "--------------------------------------------------------------------------------\n",
      "age = 30.5 has:\n",
      "       survived  pclass   age  sibsp  parch      fare\n",
      "count       2.0     2.0   2.0    2.0    2.0  2.000000\n",
      "mean        0.0     3.0  30.5    0.0    0.0  7.900000\n",
      "std         0.0     0.0   0.0    0.0    0.0  0.212132\n",
      "min         0.0     3.0  30.5    0.0    0.0  7.750000\n",
      "25%         0.0     3.0  30.5    0.0    0.0  7.825000\n",
      "50%         0.0     3.0  30.5    0.0    0.0  7.900000\n",
      "75%         0.0     3.0  30.5    0.0    0.0  7.975000\n",
      "max         0.0     3.0  30.5    0.0    0.0  8.050000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 32.5 has:\n",
      "       survived  pclass   age     sibsp  parch       fare\n",
      "count  2.000000     2.0   2.0  2.000000    2.0   2.000000\n",
      "mean   0.500000     2.0  32.5  0.500000    0.0  21.535400\n",
      "std    0.707107     0.0   0.0  0.707107    0.0  12.070878\n",
      "min    0.000000     2.0  32.5  0.000000    0.0  13.000000\n",
      "25%    0.250000     2.0  32.5  0.250000    0.0  17.267700\n",
      "50%    0.500000     2.0  32.5  0.500000    0.0  21.535400\n",
      "75%    0.750000     2.0  32.5  0.750000    0.0  25.803100\n",
      "max    1.000000     2.0  32.5  1.000000    0.0  30.070800\n",
      "--------------------------------------------------------------------------------\n",
      "age = 34.5 has:\n",
      "       survived  pclass   age  sibsp  parch    fare\n",
      "count       1.0     1.0   1.0    1.0    1.0  1.0000\n",
      "mean        0.0     3.0  34.5    0.0    0.0  6.4375\n",
      "std         NaN     NaN   NaN    NaN    NaN     NaN\n",
      "min         0.0     3.0  34.5    0.0    0.0  6.4375\n",
      "25%         0.0     3.0  34.5    0.0    0.0  6.4375\n",
      "50%         0.0     3.0  34.5    0.0    0.0  6.4375\n",
      "75%         0.0     3.0  34.5    0.0    0.0  6.4375\n",
      "max         0.0     3.0  34.5    0.0    0.0  6.4375\n",
      "--------------------------------------------------------------------------------\n",
      "age = 36.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  22.000000  22.000000  22.0  22.000000  22.000000   22.000000\n",
      "mean    0.500000   1.863636  36.0   0.363636   0.454545   59.964959\n",
      "std     0.511766   0.833550   0.0   0.492366   0.800433  108.737593\n",
      "min     0.000000   1.000000  36.0   0.000000   0.000000    0.000000\n",
      "25%     0.000000   1.000000  36.0   0.000000   0.000000   13.000000\n",
      "50%     0.500000   2.000000  36.0   0.000000   0.000000   25.075000\n",
      "75%     1.000000   2.750000  36.0   1.000000   0.750000   63.281250\n",
      "max     1.000000   3.000000  36.0   1.000000   2.000000  512.329200\n",
      "--------------------------------------------------------------------------------\n",
      "age = 36.5 has:\n",
      "       survived  pclass   age  sibsp  parch  fare\n",
      "count       1.0     1.0   1.0    1.0    1.0   1.0\n",
      "mean        0.0     2.0  36.5    0.0    2.0  26.0\n",
      "std         NaN     NaN   NaN    NaN    NaN   NaN\n",
      "min         0.0     2.0  36.5    0.0    2.0  26.0\n",
      "25%         0.0     2.0  36.5    0.0    2.0  26.0\n",
      "50%         0.0     2.0  36.5    0.0    2.0  26.0\n",
      "75%         0.0     2.0  36.5    0.0    2.0  26.0\n",
      "max         0.0     2.0  36.5    0.0    2.0  26.0\n",
      "--------------------------------------------------------------------------------\n",
      "age = 37.0 has:\n",
      "       survived    pclass   age     sibsp     parch       fare\n",
      "count  6.000000  6.000000   6.0  6.000000  6.000000   6.000000\n",
      "mean   0.166667  1.833333  37.0  0.833333  0.333333  29.811117\n",
      "std    0.408248  0.983192   0.0  0.752773  0.516398  19.809864\n",
      "min    0.000000  1.000000  37.0  0.000000  0.000000   7.925000\n",
      "25%    0.000000  1.000000  37.0  0.250000  0.000000  13.690625\n",
      "50%    0.000000  1.500000  37.0  1.000000  0.000000  27.850000\n",
      "75%    0.000000  2.750000  37.0  1.000000  0.750000  46.840650\n",
      "max    1.000000  3.000000  37.0  2.000000  1.000000  53.100000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 39.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  14.000000  14.000000  14.0  14.000000  14.000000   14.000000\n",
      "mean    0.357143   2.071429  39.0   0.428571   1.285714   36.661900\n",
      "std     0.497245   0.916875   0.0   0.513553   2.054210   33.269718\n",
      "min     0.000000   1.000000  39.0   0.000000   0.000000    0.000000\n",
      "25%     0.000000   1.000000  39.0   0.000000   0.000000   13.000000\n",
      "50%     0.000000   2.000000  39.0   0.000000   0.000000   27.562500\n",
      "75%     1.000000   3.000000  39.0   1.000000   1.000000   49.743750\n",
      "max     1.000000   3.000000  39.0   1.000000   5.000000  110.883300\n",
      "--------------------------------------------------------------------------------\n",
      "age = 40.0 has:\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  13.000000  13.000000  13.0  13.000000  13.000000   13.000000\n",
      "mean    0.461538   2.000000  40.0   0.384615   0.538462   37.109931\n",
      "std     0.518875   0.912871   0.0   0.506370   1.126601   48.843768\n",
      "min     0.000000   1.000000  40.0   0.000000   0.000000    0.000000\n",
      "25%     0.000000   1.000000  40.0   0.000000   0.000000    9.475000\n",
      "50%     0.000000   2.000000  40.0   0.000000   0.000000   15.750000\n",
      "75%     1.000000   3.000000  40.0   1.000000   1.000000   31.000000\n",
      "max     1.000000   3.000000  40.0   1.000000   4.000000  153.462500\n",
      "--------------------------------------------------------------------------------\n",
      "age = 40.5 has:\n",
      "       survived  pclass   age  sibsp     parch       fare\n",
      "count       2.0     2.0   2.0    2.0  2.000000   2.000000\n",
      "mean        0.0     3.0  40.5    0.0  1.000000  11.125000\n",
      "std         0.0     0.0   0.0    0.0  1.414214   4.772971\n",
      "min         0.0     3.0  40.5    0.0  0.000000   7.750000\n",
      "25%         0.0     3.0  40.5    0.0  0.500000   9.437500\n",
      "50%         0.0     3.0  40.5    0.0  1.000000  11.125000\n",
      "75%         0.0     3.0  40.5    0.0  1.500000  12.812500\n",
      "max         0.0     3.0  40.5    0.0  2.000000  14.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 41.0 has:\n",
      "       survived   pclass   age     sibsp     parch        fare\n",
      "count  6.000000  6.00000   6.0  6.000000  6.000000    6.000000\n",
      "mean   0.333333  2.50000  41.0  0.333333  1.333333   39.188883\n",
      "std    0.516398  0.83666   0.0  0.816497  1.966384   47.936085\n",
      "min    0.000000  1.00000  41.0  0.000000  0.000000    7.125000\n",
      "25%    0.000000  2.25000  41.0  0.000000  0.000000   15.456225\n",
      "50%    0.000000  3.00000  41.0  0.000000  0.500000   19.856250\n",
      "75%    0.750000  3.00000  41.0  0.000000  1.750000   34.818750\n",
      "max    1.000000  3.00000  41.0  2.000000  5.000000  134.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 42.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  13.000000  13.000000  13.0  13.000000  13.000000   13.000000\n",
      "mean    0.461538   2.000000  42.0   0.307692   0.076923   37.125646\n",
      "std     0.518875   0.816497   0.0   0.480384   0.277350   59.287239\n",
      "min     0.000000   1.000000  42.0   0.000000   0.000000    7.550000\n",
      "25%     0.000000   1.000000  42.0   0.000000   0.000000    8.662500\n",
      "50%     0.000000   2.000000  42.0   0.000000   0.000000   13.000000\n",
      "75%     1.000000   3.000000  42.0   1.000000   0.000000   27.000000\n",
      "max     1.000000   3.000000  42.0   1.000000   1.000000  227.525000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 43.0 has:\n",
      "       survived    pclass   age     sibsp    parch        fare\n",
      "count  5.000000  5.000000   5.0  5.000000  5.00000    5.000000\n",
      "mean   0.200000  2.400000  43.0  0.400000  1.60000   59.797500\n",
      "std    0.447214  0.894427   0.0  0.547723  2.50998   86.284285\n",
      "min    0.000000  1.000000  43.0  0.000000  0.00000    6.450000\n",
      "25%    0.000000  2.000000  43.0  0.000000  0.00000    8.050000\n",
      "50%    0.000000  3.000000  43.0  0.000000  1.00000   26.250000\n",
      "75%    0.000000  3.000000  43.0  1.000000  1.00000   46.900000\n",
      "max    1.000000  3.000000  43.0  1.000000  6.00000  211.337500\n",
      "--------------------------------------------------------------------------------\n",
      "age = 44.0 has:\n",
      "       survived    pclass   age     sibsp     parch       fare\n",
      "count  9.000000  9.000000   9.0  9.000000  9.000000   9.000000\n",
      "mean   0.333333  2.111111  44.0  0.444444  0.222222  29.758333\n",
      "std    0.500000  0.927961   0.0  0.726483  0.440959  27.530949\n",
      "min    0.000000  1.000000  44.0  0.000000  0.000000   7.925000\n",
      "25%    0.000000  1.000000  44.0  0.000000  0.000000   8.050000\n",
      "50%    0.000000  2.000000  44.0  0.000000  0.000000  26.000000\n",
      "75%    1.000000  3.000000  44.0  1.000000  0.000000  27.720800\n",
      "max    1.000000  3.000000  44.0  2.000000  1.000000  90.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 45.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  12.000000  12.000000  12.0  12.000000  12.000000   12.000000\n",
      "mean    0.416667   2.000000  45.0   0.333333   0.583333   36.818408\n",
      "std     0.514929   0.953463   0.0   0.492366   1.164500   45.311226\n",
      "min     0.000000   1.000000  45.0   0.000000   0.000000    6.975000\n",
      "25%     0.000000   1.000000  45.0   0.000000   0.000000   12.137500\n",
      "50%     0.000000   2.000000  45.0   0.000000   0.000000   26.400000\n",
      "75%     1.000000   3.000000  45.0   1.000000   1.000000   29.800000\n",
      "max     1.000000   3.000000  45.0   1.000000   4.000000  164.866700\n",
      "--------------------------------------------------------------------------------\n",
      "age = 45.5 has:\n",
      "       survived    pclass   age  sibsp  parch       fare\n",
      "count       2.0  2.000000   2.0    2.0    2.0   2.000000\n",
      "mean        0.0  2.000000  45.5    0.0    0.0  17.862500\n",
      "std         0.0  1.414214   0.0    0.0    0.0  15.043697\n",
      "min         0.0  1.000000  45.5    0.0    0.0   7.225000\n",
      "25%         0.0  1.500000  45.5    0.0    0.0  12.543750\n",
      "50%         0.0  2.000000  45.5    0.0    0.0  17.862500\n",
      "75%         0.0  2.500000  45.5    0.0    0.0  23.181250\n",
      "max         0.0  3.000000  45.5    0.0    0.0  28.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 46.0 has:\n",
      "       survived    pclass   age     sibsp  parch       fare\n",
      "count       3.0  3.000000   3.0  3.000000    3.0   3.000000\n",
      "mean        0.0  1.333333  46.0  0.333333    0.0  55.458333\n",
      "std         0.0  0.577350   0.0  0.577350    0.0  27.056796\n",
      "min         0.0  1.000000  46.0  0.000000    0.0  26.000000\n",
      "25%         0.0  1.000000  46.0  0.000000    0.0  43.587500\n",
      "50%         0.0  1.000000  46.0  0.000000    0.0  61.175000\n",
      "75%         0.0  1.500000  46.0  0.500000    0.0  70.187500\n",
      "max         0.0  2.000000  46.0  1.000000    0.0  79.200000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 47.0 has:\n",
      "       survived    pclass   age     sibsp     parch       fare\n",
      "count  9.000000  9.000000   9.0  9.000000  9.000000   9.000000\n",
      "mean   0.111111  1.777778  47.0  0.222222  0.111111  27.601389\n",
      "std    0.333333  0.971825   0.0  0.440959  0.333333  17.580570\n",
      "min    0.000000  1.000000  47.0  0.000000  0.000000   7.250000\n",
      "25%    0.000000  1.000000  47.0  0.000000  0.000000  14.500000\n",
      "50%    0.000000  1.000000  47.0  0.000000  0.000000  25.587500\n",
      "75%    0.000000  3.000000  47.0  0.000000  0.000000  38.500000\n",
      "max    1.000000  3.000000  47.0  1.000000  1.000000  52.554200\n",
      "--------------------------------------------------------------------------------\n",
      "age = 48.0 has:\n",
      "       survived    pclass   age     sibsp     parch       fare\n",
      "count  9.000000  9.000000   9.0  9.000000  9.000000   9.000000\n",
      "mean   0.666667  1.666667  48.0  0.555556  0.555556  37.893067\n",
      "std    0.500000  0.866025   0.0  0.527046  1.130388  23.051910\n",
      "min    0.000000  1.000000  48.0  0.000000  0.000000   7.854200\n",
      "25%    0.000000  1.000000  48.0  0.000000  0.000000  25.929200\n",
      "50%    1.000000  1.000000  48.0  1.000000  0.000000  34.375000\n",
      "75%    1.000000  2.000000  48.0  1.000000  0.000000  52.000000\n",
      "max    1.000000  3.000000  48.0  1.000000  3.000000  76.729200\n",
      "--------------------------------------------------------------------------------\n",
      "age = 49.0 has:\n",
      "       survived    pclass   age     sibsp     parch        fare\n",
      "count  6.000000  6.000000   6.0  6.000000  6.000000    6.000000\n",
      "mean   0.666667  1.333333  49.0  0.666667  0.166667   59.929183\n",
      "std    0.516398  0.816497   0.0  0.516398  0.408248   41.197694\n",
      "min    0.000000  1.000000  49.0  0.000000  0.000000    0.000000\n",
      "25%    0.250000  1.000000  49.0  0.250000  0.000000   33.679200\n",
      "50%    1.000000  1.000000  49.0  1.000000  0.000000   66.829200\n",
      "75%    1.000000  1.000000  49.0  1.000000  0.000000   86.010450\n",
      "max    1.000000  3.000000  49.0  1.000000  1.000000  110.883300\n",
      "--------------------------------------------------------------------------------\n",
      "age = 50.0 has:\n",
      "        survived     pclass   age      sibsp      parch        fare\n",
      "count  10.000000  10.000000  10.0  10.000000  10.000000   10.000000\n",
      "mean    0.500000   1.600000  50.0   0.400000   0.200000   64.025830\n",
      "std     0.527046   0.699206   0.0   0.699206   0.421637   77.847144\n",
      "min     0.000000   1.000000  50.0   0.000000   0.000000    8.050000\n",
      "25%     0.000000   1.000000  50.0   0.000000   0.000000   11.125000\n",
      "50%     0.500000   1.500000  50.0   0.000000   0.000000   27.356250\n",
      "75%     1.000000   2.000000  50.0   0.750000   0.000000   93.793750\n",
      "max     1.000000   3.000000  50.0   2.000000   1.000000  247.520800\n",
      "--------------------------------------------------------------------------------\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "age = 51.0 has:\n",
      "       survived  pclass   age     sibsp     parch       fare\n",
      "count  7.000000     7.0   7.0  7.000000  7.000000   7.000000\n",
      "mean   0.285714     2.0  51.0  0.142857  0.142857  28.752386\n",
      "std    0.487950     1.0   0.0  0.377964  0.377964  29.138777\n",
      "min    0.000000     1.0  51.0  0.000000  0.000000   7.054200\n",
      "25%    0.000000     1.0  51.0  0.000000  0.000000   7.900000\n",
      "50%    0.000000     2.0  51.0  0.000000  0.000000  12.525000\n",
      "75%    0.500000     3.0  51.0  0.000000  0.000000  43.964600\n",
      "max    1.000000     3.0  51.0  1.000000  1.000000  77.958300\n",
      "--------------------------------------------------------------------------------\n",
      "age = 52.0 has:\n",
      "       survived    pclass   age     sibsp     parch       fare\n",
      "count  6.000000  6.000000   6.0  6.000000  6.000000   6.000000\n",
      "mean   0.500000  1.333333  52.0  0.500000  0.333333  51.402783\n",
      "std    0.547723  0.516398   0.0  0.547723  0.516398  36.441932\n",
      "min    0.000000  1.000000  52.0  0.000000  0.000000  13.000000\n",
      "25%    0.000000  1.000000  52.0  0.000000  0.000000  17.750000\n",
      "50%    0.500000  1.000000  52.0  0.500000  0.000000  54.383350\n",
      "75%    1.000000  1.750000  52.0  1.000000  0.750000  79.304175\n",
      "max    1.000000  2.000000  52.0  1.000000  1.000000  93.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 53.0 has:\n",
      "       survived  pclass   age  sibsp  parch     fare\n",
      "count       1.0     1.0   1.0    1.0    1.0   1.0000\n",
      "mean        1.0     1.0  53.0    2.0    0.0  51.4792\n",
      "std         NaN     NaN   NaN    NaN    NaN      NaN\n",
      "min         1.0     1.0  53.0    2.0    0.0  51.4792\n",
      "25%         1.0     1.0  53.0    2.0    0.0  51.4792\n",
      "50%         1.0     1.0  53.0    2.0    0.0  51.4792\n",
      "75%         1.0     1.0  53.0    2.0    0.0  51.4792\n",
      "max         1.0     1.0  53.0    2.0    0.0  51.4792\n",
      "--------------------------------------------------------------------------------\n",
      "age = 54.0 has:\n",
      "       survived    pclass   age     sibsp     parch       fare\n",
      "count  8.000000  8.000000   8.0  8.000000  8.000000   8.000000\n",
      "mean   0.375000  1.500000  54.0  0.500000  0.500000  44.477087\n",
      "std    0.517549  0.534522   0.0  0.534522  1.069045  25.546659\n",
      "min    0.000000  1.000000  54.0  0.000000  0.000000  14.000000\n",
      "25%    0.000000  1.000000  54.0  0.000000  0.000000  25.250000\n",
      "50%    0.000000  1.500000  54.0  0.500000  0.000000  38.931250\n",
      "75%    1.000000  2.000000  54.0  1.000000  0.250000  63.871875\n",
      "max    1.000000  2.000000  54.0  1.000000  3.000000  78.266700\n",
      "--------------------------------------------------------------------------------\n",
      "age = 55.0 has:\n",
      "       survived    pclass   age  sibsp  parch       fare\n",
      "count  2.000000  2.000000   2.0    2.0    2.0   2.000000\n",
      "mean   0.500000  1.500000  55.0    0.0    0.0  23.250000\n",
      "std    0.707107  0.707107   0.0    0.0    0.0  10.253048\n",
      "min    0.000000  1.000000  55.0    0.0    0.0  16.000000\n",
      "25%    0.250000  1.250000  55.0    0.0    0.0  19.625000\n",
      "50%    0.500000  1.500000  55.0    0.0    0.0  23.250000\n",
      "75%    0.750000  1.750000  55.0    0.0    0.0  26.875000\n",
      "max    1.000000  2.000000  55.0    0.0    0.0  30.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 55.5 has:\n",
      "       survived  pclass   age  sibsp  parch  fare\n",
      "count       1.0     1.0   1.0    1.0    1.0  1.00\n",
      "mean        0.0     3.0  55.5    0.0    0.0  8.05\n",
      "std         NaN     NaN   NaN    NaN    NaN   NaN\n",
      "min         0.0     3.0  55.5    0.0    0.0  8.05\n",
      "25%         0.0     3.0  55.5    0.0    0.0  8.05\n",
      "50%         0.0     3.0  55.5    0.0    0.0  8.05\n",
      "75%         0.0     3.0  55.5    0.0    0.0  8.05\n",
      "max         0.0     3.0  55.5    0.0    0.0  8.05\n",
      "--------------------------------------------------------------------------------\n",
      "age = 56.0 has:\n",
      "       survived  pclass   age  sibsp  parch       fare\n",
      "count   4.00000     4.0   4.0    4.0   4.00   4.000000\n",
      "mean    0.50000     1.0  56.0    0.0   0.25  43.976025\n",
      "std     0.57735     0.0   0.0    0.0   0.50  26.376280\n",
      "min     0.00000     1.0  56.0    0.0   0.00  26.550000\n",
      "25%     0.00000     1.0  56.0    0.0   0.00  29.659350\n",
      "50%     0.50000     1.0  56.0    0.0   0.00  33.097900\n",
      "75%     1.00000     1.0  56.0    0.0   0.25  47.414575\n",
      "max     1.00000     1.0  56.0    0.0   1.00  83.158300\n",
      "--------------------------------------------------------------------------------\n",
      "age = 57.0 has:\n",
      "       survived  pclass   age  sibsp  parch       fare\n",
      "count       2.0     2.0   2.0    2.0    2.0   2.000000\n",
      "mean        0.0     2.0  57.0    0.0    0.0  11.425000\n",
      "std         0.0     0.0   0.0    0.0    0.0   1.308148\n",
      "min         0.0     2.0  57.0    0.0    0.0  10.500000\n",
      "25%         0.0     2.0  57.0    0.0    0.0  10.962500\n",
      "50%         0.0     2.0  57.0    0.0    0.0  11.425000\n",
      "75%         0.0     2.0  57.0    0.0    0.0  11.887500\n",
      "max         0.0     2.0  57.0    0.0    0.0  12.350000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 58.0 has:\n",
      "       survived  pclass   age  sibsp     parch        fare\n",
      "count  5.000000     5.0   5.0    5.0  5.000000    5.000000\n",
      "mean   0.600000     1.0  58.0    0.0  0.600000   93.901660\n",
      "std    0.547723     0.0   0.0    0.0  0.894427   61.946939\n",
      "min    0.000000     1.0  58.0    0.0  0.000000   26.550000\n",
      "25%    0.000000     1.0  58.0    0.0  0.000000   29.700000\n",
      "50%    1.000000     1.0  58.0    0.0  0.000000  113.275000\n",
      "75%    1.000000     1.0  58.0    0.0  1.000000  146.520800\n",
      "max    1.000000     1.0  58.0    0.0  2.000000  153.462500\n",
      "--------------------------------------------------------------------------------\n",
      "age = 59.0 has:\n",
      "       survived    pclass   age  sibsp  parch       fare\n",
      "count       2.0  2.000000   2.0    2.0    2.0   2.000000\n",
      "mean        0.0  2.500000  59.0    0.0    0.0  10.375000\n",
      "std         0.0  0.707107   0.0    0.0    0.0   4.419417\n",
      "min         0.0  2.000000  59.0    0.0    0.0   7.250000\n",
      "25%         0.0  2.250000  59.0    0.0    0.0   8.812500\n",
      "50%         0.0  2.500000  59.0    0.0    0.0  10.375000\n",
      "75%         0.0  2.750000  59.0    0.0    0.0  11.937500\n",
      "max         0.0  3.000000  59.0    0.0    0.0  13.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 60.0 has:\n",
      "       survived  pclass   age  sibsp    parch       fare\n",
      "count   4.00000    4.00   4.0   4.00  4.00000   4.000000\n",
      "mean    0.50000    1.25  60.0   0.75  0.50000  55.000000\n",
      "std     0.57735    0.50   0.0   0.50  0.57735  26.211353\n",
      "min     0.00000    1.00  60.0   0.00  0.00000  26.550000\n",
      "25%     0.00000    1.00  60.0   0.75  0.00000  35.887500\n",
      "50%     0.50000    1.00  60.0   1.00  0.50000  57.125000\n",
      "75%     1.00000    1.25  60.0   1.00  1.00000  76.237500\n",
      "max     1.00000    2.00  60.0   1.00  1.00000  79.200000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 61.0 has:\n",
      "       survived    pclass   age  sibsp  parch       fare\n",
      "count       3.0  3.000000   3.0    3.0    3.0   3.000000\n",
      "mean        0.0  1.666667  61.0    0.0    0.0  24.019433\n",
      "std         0.0  1.154701   0.0    0.0    0.0  15.410889\n",
      "min         0.0  1.000000  61.0    0.0    0.0   6.237500\n",
      "25%         0.0  1.000000  61.0    0.0    0.0  19.279150\n",
      "50%         0.0  1.000000  61.0    0.0    0.0  32.320800\n",
      "75%         0.0  2.000000  61.0    0.0    0.0  32.910400\n",
      "max         0.0  3.000000  61.0    0.0    0.0  33.500000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 62.0 has:\n",
      "       survived  pclass   age  sibsp  parch       fare\n",
      "count   4.00000    4.00   4.0    4.0    4.0   4.000000\n",
      "mean    0.50000    1.25  62.0    0.0    0.0  35.900000\n",
      "std     0.57735    0.50   0.0    0.0    0.0  30.357948\n",
      "min     0.00000    1.00  62.0    0.0    0.0  10.500000\n",
      "25%     0.00000    1.00  62.0    0.0    0.0  22.537500\n",
      "50%     0.50000    1.00  62.0    0.0    0.0  26.550000\n",
      "75%     1.00000    1.25  62.0    0.0    0.0  39.912500\n",
      "max     1.00000    2.00  62.0    0.0    0.0  80.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 63.0 has:\n",
      "       survived    pclass   age     sibsp  parch       fare\n",
      "count       2.0  2.000000   2.0  2.000000    2.0   2.000000\n",
      "mean        1.0  2.000000  63.0  0.500000    0.0  43.772900\n",
      "std         0.0  1.414214   0.0  0.707107    0.0  48.345456\n",
      "min         1.0  1.000000  63.0  0.000000    0.0   9.587500\n",
      "25%         1.0  1.500000  63.0  0.250000    0.0  26.680200\n",
      "50%         1.0  2.000000  63.0  0.500000    0.0  43.772900\n",
      "75%         1.0  2.500000  63.0  0.750000    0.0  60.865600\n",
      "max         1.0  3.000000  63.0  1.000000    0.0  77.958300\n",
      "--------------------------------------------------------------------------------\n",
      "age = 64.0 has:\n",
      "       survived  pclass   age     sibsp     parch        fare\n",
      "count       2.0     2.0   2.0  2.000000  2.000000    2.000000\n",
      "mean        0.0     1.0  64.0  0.500000  2.000000  144.500000\n",
      "std         0.0     0.0   0.0  0.707107  2.828427  167.584307\n",
      "min         0.0     1.0  64.0  0.000000  0.000000   26.000000\n",
      "25%         0.0     1.0  64.0  0.250000  1.000000   85.250000\n",
      "50%         0.0     1.0  64.0  0.500000  2.000000  144.500000\n",
      "75%         0.0     1.0  64.0  0.750000  3.000000  203.750000\n",
      "max         0.0     1.0  64.0  1.000000  4.000000  263.000000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 65.0 has:\n",
      "       survived    pclass   age  sibsp     parch       fare\n",
      "count       3.0  3.000000   3.0    3.0  3.000000   3.000000\n",
      "mean        0.0  1.666667  65.0    0.0  0.333333  32.093067\n",
      "std         0.0  1.154701   0.0    0.0  0.577350  27.536262\n",
      "min         0.0  1.000000  65.0    0.0  0.000000   7.750000\n",
      "25%         0.0  1.000000  65.0    0.0  0.000000  17.150000\n",
      "50%         0.0  1.000000  65.0    0.0  0.000000  26.550000\n",
      "75%         0.0  2.000000  65.0    0.0  0.500000  44.264600\n",
      "max         0.0  3.000000  65.0    0.0  1.000000  61.979200\n",
      "--------------------------------------------------------------------------------\n",
      "age = 66.0 has:\n",
      "       survived  pclass   age  sibsp  parch  fare\n",
      "count       1.0     1.0   1.0    1.0    1.0   1.0\n",
      "mean        0.0     2.0  66.0    0.0    0.0  10.5\n",
      "std         NaN     NaN   NaN    NaN    NaN   NaN\n",
      "min         0.0     2.0  66.0    0.0    0.0  10.5\n",
      "25%         0.0     2.0  66.0    0.0    0.0  10.5\n",
      "50%         0.0     2.0  66.0    0.0    0.0  10.5\n",
      "75%         0.0     2.0  66.0    0.0    0.0  10.5\n",
      "max         0.0     2.0  66.0    0.0    0.0  10.5\n",
      "--------------------------------------------------------------------------------\n",
      "age = 70.0 has:\n",
      "       survived    pclass   age     sibsp     parch      fare\n",
      "count       2.0  2.000000   2.0  2.000000  2.000000   2.00000\n",
      "mean        0.0  1.500000  70.0  0.500000  0.500000  40.75000\n",
      "std         0.0  0.707107   0.0  0.707107  0.707107  42.77996\n",
      "min         0.0  1.000000  70.0  0.000000  0.000000  10.50000\n",
      "25%         0.0  1.250000  70.0  0.250000  0.250000  25.62500\n",
      "50%         0.0  1.500000  70.0  0.500000  0.500000  40.75000\n",
      "75%         0.0  1.750000  70.0  0.750000  0.750000  55.87500\n",
      "max         0.0  2.000000  70.0  1.000000  1.000000  71.00000\n",
      "--------------------------------------------------------------------------------\n",
      "age = 70.5 has:\n",
      "       survived  pclass   age  sibsp  parch  fare\n",
      "count       1.0     1.0   1.0    1.0    1.0  1.00\n",
      "mean        0.0     3.0  70.5    0.0    0.0  7.75\n",
      "std         NaN     NaN   NaN    NaN    NaN   NaN\n",
      "min         0.0     3.0  70.5    0.0    0.0  7.75\n",
      "25%         0.0     3.0  70.5    0.0    0.0  7.75\n",
      "50%         0.0     3.0  70.5    0.0    0.0  7.75\n",
      "75%         0.0     3.0  70.5    0.0    0.0  7.75\n",
      "max         0.0     3.0  70.5    0.0    0.0  7.75\n",
      "--------------------------------------------------------------------------------\n",
      "age = 71.0 has:\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       survived  pclass   age  sibsp  parch       fare\n",
      "count       2.0     2.0   2.0    2.0    2.0   2.000000\n",
      "mean        0.0     1.0  71.0    0.0    0.0  42.079200\n",
      "std         0.0     0.0   0.0    0.0    0.0  10.500536\n",
      "min         0.0     1.0  71.0    0.0    0.0  34.654200\n",
      "25%         0.0     1.0  71.0    0.0    0.0  38.366700\n",
      "50%         0.0     1.0  71.0    0.0    0.0  42.079200\n",
      "75%         0.0     1.0  71.0    0.0    0.0  45.791700\n",
      "max         0.0     1.0  71.0    0.0    0.0  49.504200\n",
      "--------------------------------------------------------------------------------\n",
      "age = 74.0 has:\n",
      "       survived  pclass   age  sibsp  parch   fare\n",
      "count       1.0     1.0   1.0    1.0    1.0  1.000\n",
      "mean        0.0     3.0  74.0    0.0    0.0  7.775\n",
      "std         NaN     NaN   NaN    NaN    NaN    NaN\n",
      "min         0.0     3.0  74.0    0.0    0.0  7.775\n",
      "25%         0.0     3.0  74.0    0.0    0.0  7.775\n",
      "50%         0.0     3.0  74.0    0.0    0.0  7.775\n",
      "75%         0.0     3.0  74.0    0.0    0.0  7.775\n",
      "max         0.0     3.0  74.0    0.0    0.0  7.775\n",
      "--------------------------------------------------------------------------------\n",
      "age = 80.0 has:\n",
      "       survived  pclass   age  sibsp  parch  fare\n",
      "count       1.0     1.0   1.0    1.0    1.0   1.0\n",
      "mean        1.0     1.0  80.0    0.0    0.0  30.0\n",
      "std         NaN     NaN   NaN    NaN    NaN   NaN\n",
      "min         1.0     1.0  80.0    0.0    0.0  30.0\n",
      "25%         1.0     1.0  80.0    0.0    0.0  30.0\n",
      "50%         1.0     1.0  80.0    0.0    0.0  30.0\n",
      "75%         1.0     1.0  80.0    0.0    0.0  30.0\n",
      "max         1.0     1.0  80.0    0.0    0.0  30.0\n",
      "--------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "feat = 'age'\n",
    "for val in sorted(df_titanic[feat].unique()):\n",
    "    # extract only rows corresponding to a particular value of the feature\n",
    "    s_bool = df_titanic[feat] == val\n",
    "    df_titanic_subset = df_titanic.loc[s_bool, :]\n",
    "    \n",
    "    print(f'{feat} = {val} has:')\n",
    "    print(df_titanic_subset.describe(), end='\\n' + '-' * 80 + '\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}