{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# DS2500 Lesson 6\n",
"\n",
"Jan 31, 2023\n",
"\n",
"### Content:\n",
"- Pandas\n",
" - series\n",
" - dataframe\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# before you begin, make sure you can load data from seaborn\n",
"import seaborn as sns\n",
"df_penguin = sns.load_dataset('penguins')\n",
"df_titanic = sns.load_dataset('titanic')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Having trouble?\n",
"- see piazza for long-term solution\n",
" - [mac SSL error](https://piazza.com/class/lbxsbawi9yq2f9/post/55)\n",
"- use code below for today:\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# if the seaborn lines above give you trouble, use the csvs available on the website\n",
"# (be sure they sit in the same folder as this .ipynb file on your machine)\n",
"import pandas as pd\n",
"df_penguin = pd.read_csv('penguin.csv')\n",
"df_titanic = pd.read_csv('titanic.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Admin:\n",
"- lab1\n",
" - talk to friends\n",
" - lab digest\n",
" - part b (part c)\n",
"- hw0 due friday @ 11:59 PM\n",
" - .py and .ipynb\n",
" - see canvas announcement\n",
" - see piazza\n",
"- look at schedule together\n",
"- tutoring groups\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The value of talking-out-loud about programming\n",
"\n",
"... I learned 2 new ways to approach lab1's part B `get_win_set()` this morning!\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# if all items in an array are the same, then the std dev is 0\n",
"import numpy as np\n",
"\n",
"# mysterious student from section 2\n",
"np.array([1, 1, 1]).std() == 0"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.array([1, 2, 0])\n",
"\n",
"# mysterious student from section 2:\n",
"# all items are the same exactly when the set of items has a single element\n",
"len(set(x)) == 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Pandas\n",
"\n",
"Pandas is a Python library that stores data in `pd.DataFrame` and `pd.Series` objects.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" species island bill_length_mm bill_depth_mm flipper_length_mm \\\n",
"0 Adelie Torgersen 39.1 18.7 181.0 \n",
"1 Adelie Torgersen 39.5 17.4 186.0 \n",
"2 Adelie Torgersen 40.3 18.0 195.0 \n",
"3 Adelie Torgersen NaN NaN NaN \n",
"4 Adelie Torgersen 36.7 19.3 193.0 \n",
"\n",
" body_mass_g sex \n",
"0 3750.0 Male \n",
"1 3800.0 Female \n",
"2 3250.0 Female \n",
"3 NaN NaN \n",
"4 3450.0 Female "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import seaborn as sns\n",
"\n",
"# Example DataFrame:\n",
"# df stands for dataframe. df_penguin is a dataframe of penguin data\n",
"df_penguin = sns.load_dataset('penguins')\n",
"df_penguin.head()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 39.1\n",
"1 39.5\n",
"2 40.3\n",
"3 NaN\n",
"4 36.7\n",
" ... \n",
"339 NaN\n",
"340 46.8\n",
"341 50.4\n",
"342 45.2\n",
"343 49.9\n",
"Name: bill_length_mm, Length: 344, dtype: float64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# example series: the \"s_\" is a (personal) convention for variables which are series\n",
"s_bill_length_mm = df_penguin['bill_length_mm']\n",
"\n",
"s_bill_length_mm\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"\n",
"### A `pd.DataFrame` is two-dimensional; a `pd.Series` is one-dimensional\n",
"\n",
"### If we already have `np.array()`, why do we need pandas?\n",
"- pandas supports non-numeric data (strings for categorical data, for example)\n",
"- pandas supports reading / storing data in more formats\n",
" - csv (spreadsheets)\n",
"- pandas handles missing data more gracefully (most summary methods skip `NaN` by default)\n",
"- pandas labels rows and columns, so you can look data up by name instead of tracking positions by hand\n",
"\n",
"You could do almost everything pandas does with numpy arrays ... but it would be much harder.\n"
]
},
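{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of the missing-data point, using made-up bill lengths (the values and the name `bill_length_demo` are illustrative assumptions, not taken from the penguin dataset): numpy's `.mean()` propagates `NaN`, while the pandas `.mean()` skips it by default.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# hypothetical measurements with one missing value\n",
"bill_length_demo = [39.1, 39.5, np.nan, 36.7]\n",
"\n",
"# numpy: a single NaN makes the whole mean NaN\n",
"print(np.array(bill_length_demo).mean())\n",
"\n",
"# pandas: NaN entries are skipped by default (skipna=True)\n",
"print(pd.Series(bill_length_demo).mean())"
]
},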
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Pandas Series\n",
"\n",
"### building a series:\n",
"- with the default index\n",
"- with a custom index\n",
"- from a dict\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" species island bill_length_mm bill_depth_mm flipper_length_mm \\\n",
"0 Adelie Torgersen 39.1 18.7 181.0 \n",
"1 Adelie Torgersen 39.5 17.4 186.0 \n",
"2 Adelie Torgersen 40.3 18.0 195.0 \n",
"\n",
" body_mass_g sex \n",
"0 3750.0 Male \n",
"1 3800.0 Female \n",
"2 3250.0 Female "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# look at first 3 rows of dataframe (for reference)\n",
"df_penguin.head(3)\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"species Adelie\n",
"island Torgersen\n",
"bill_length_mm 39.1\n",
"bill_depth_mm 18.7\n",
"flipper_length_mm 181.0\n",
"body_mass_g 3750.0\n",
"sex Male\n",
"Name: 0, dtype: object"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# each row or column of a dataframe is a series object\n",
"# below is the first row of the dataframe (more on iloc indexing later...)\n",
"# (remember: each row is a sample -> this is 1 penguin's data)\n",
"penguin0_series = df_penguin.iloc[0, :]\n",
"penguin0_series\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A pandas series contains a sequence of labelled data elements:\n",
"- penguin0's `species` is `Adelie`\n",
"- penguin0's `island` is `Torgersen`\n",
"- penguin0's `bill_length_mm` is `39.1` ...\n",
"- penguin0's `label` is `value`\n",
"\n",
"A series is quite similar to a dictionary ...\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"penguin0_dict = {'species': 'Adelie',\n",
" 'sex': 'Male',\n",
" 'island': 'Torgersen',\n",
" 'bill_length_mm': 39.1,\n",
" 'bill_depth_mm': 18.7,\n",
" 'flipper_length_mm': 181.0,\n",
" 'body_mass_g': 3750.0}\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"species Adelie\n",
"sex Male\n",
"island Torgersen\n",
"bill_length_mm 39.1\n",
"bill_depth_mm 18.7\n",
"flipper_length_mm 181.0\n",
"body_mass_g 3750.0\n",
"dtype: object"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# build a series from dict\n",
"penguin0_series = pd.Series(penguin0_dict)\n",
"penguin0_series\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"species Adelie\n",
"island Torgersen\n",
"bill_length_mm 39.1\n",
"bill_depth_mm 18.7\n",
"flipper_length_mm 181.0\n",
"body_mass_g 3750.0\n",
"sex Male\n",
"dtype: object"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# you can also pass two corresponding lists / tuples\n",
"index = ['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']\n",
"values = ['Adelie', 'Torgersen', 39.1, 18.7, 181.0, 3750.0, 'Male']\n",
"\n",
"penguin0_series = pd.Series(values, index=index)\n",
"penguin0_series"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 vanilla\n",
"1 chocolate\n",
"2 cherry garcia\n",
"3 oatmeal\n",
"dtype: object"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# sometimes your data has no meaningful index\n",
"# pandas will default to indexing things with integers\n",
"ice_cream_flavors = 'vanilla', 'chocolate', 'cherry garcia', 'oatmeal'\n",
"pd.Series(ice_cream_flavors)\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Adelie', 'Male', 'Torgersen', 39.1, 18.7, 181.0, 3750.0],\n",
" dtype=object)"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# you can access values as an array via .values\n",
"penguin0_series.values\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['species', 'sex', 'island', 'bill_length_mm', 'bill_depth_mm',\n",
" 'flipper_length_mm', 'body_mass_g'],\n",
" dtype='object')"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# you can access index (as a special pandas \"index\" object) via .index\n",
"penguin0_series.index\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### indexing into `pd.Series`: accessing / changing data\n",
"- accessing / setting using index:\n",
" - by name: `series.loc[name]`\n",
" - by position: `series.iloc[idx]`\n",
"- iterating: `.index`, `.values`, `.items()` (much like a dict; `.iteritems()` was removed in pandas 2.0)\n",
"- deleting an entry\n"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matt 6\n",
"riva 7\n",
"eli 11\n",
"zeke 101\n",
"sal 101\n",
"dtype: int64"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict_fav_num = {'matt': 6, 'riva': 7, 'eli': 11, 'zeke': 101, 'sal': 101}\n",
"series_fav_num = pd.Series(dict_fav_num)\n",
"series_fav_num\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"11"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# lookup by position: get value in position 2 (third)\n",
"series_fav_num.iloc[2]\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# lookup by index (name): get value associated with index='matt'\n",
"series_fav_num.loc['matt']"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# you can also index directly into the series object to look up by index\n",
"# (my mild preference, which nobody follows: avoid this. with an integer index\n",
"# it is ambiguous whether series[0] means label 0 or position 0)\n",
"series_fav_num['matt']"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matt 6\n",
"riva 7\n",
"eli 1000\n",
"zeke 101\n",
"sal 101\n",
"dtype: int64"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# each of these access methods can also set the value\n",
"series_fav_num.iloc[2] = 1000\n",
"series_fav_num"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# check membership of item in index\n",
"'matt' in series_fav_num.index"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"'bob' in series_fav_num.index"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"1000 in series_fav_num.values\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iterating through elements of a `pd.Series`\n",
"\n",
"... pretty much the same as a dictionary except pandas uses an \"index\" while a dictionary has \"keys\".\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"matt\n",
"riva\n",
"eli\n",
"zeke\n",
"sal\n"
]
}
],
"source": [
"# iterating through index (note: no parentheses after .index below)\n",
"for idx in series_fav_num.index:\n",
" print(idx)\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6\n",
"7\n",
"1000\n",
"101\n",
"101\n"
]
}
],
"source": [
"# iterating through values (notice: no parentheses after .values below)\n",
"for val in series_fav_num.values:\n",
" print(val)\n"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"matt 6\n",
"riva 7\n",
"eli 1000\n",
"zeke 101\n",
"sal 101\n"
]
}
],
"source": [
"# iterating through index, value pairs (just like dict!)\n",
"for key, val in series_fav_num.items():\n",
" print(key, val)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Removing an element\n"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"# removing a pair by its corresponding index (just like dict!)\n",
"del series_fav_num['matt']\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"riva 7\n",
"eli 1000\n",
"zeke 101\n",
"sal 101\n",
"dtype: int64"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"series_fav_num\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Examining a `pd.Series`\n",
"\n",
"Just like numpy arrays:\n",
"- `Series.argmin()`\n",
" - which entry has the smallest value\n",
" - pandas gives the position (row number), not the index label\n",
"- `Series.argmax()`\n",
" - which entry has the largest value\n",
" - pandas gives the position (row number), not the index label\n",
"- `Series.mean()`\n",
"- `Series.min()`\n",
"- `Series.max()`\n",
"- `Series.std()`\n",
"- `Series.var()`\n",
" - careful: pandas computes the sample std / var (`ddof=1`) by default, while numpy defaults to `ddof=0`\n",
"\n",
"But wait, there's more! These are available on pandas objects but not on numpy arrays:\n",
"- `Series.count()`\n",
" - number of non-missing (non-`NaN`) entries in the series\n",
"- `Series.value_counts()`\n",
" - count of every unique value in the series (like a histogram)\n",
" - (see the example below)\n",
"- `Series.describe()`\n",
" - summary statistics\n"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matt 6\n",
"riva 7\n",
"eli 11\n",
"zeke 101\n",
"sally 101\n",
"dtype: int64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict_fav_num = {'matt': 6, 'riva': 7, 'eli': 11, 'zeke': 101, 'sally': 101}\n",
"series_fav_num = pd.Series(dict_fav_num)\n",
"series_fav_num\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"### Our old friends from numpy\n"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matt 6\n",
"riva 7\n",
"eli 11\n",
"zeke 101\n",
"sally 101\n",
"dtype: int64"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# for reference\n",
"series_fav_num\n"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(6, 101, 50.97254162782154, 2598.2)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# our familiar friends ...\n",
"series_fav_num.min(), series_fav_num.max(), series_fav_num.std(), series_fav_num.var()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# notice: pandas gives the position of the row with the smallest value\n",
"# (one might expect the index label 'matt' here instead; .idxmin() returns the label)\n",
"series_fav_num.argmin()"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'matt'"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# position 0 (first entry) has the lowest favorite number; look up its index label\n",
"idx_min = series_fav_num.argmin()\n",
"series_fav_num.index[idx_min]\n"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# position 3 ('zeke') is the first entry with the highest favorite number\n",
"# ('sally' in position 4 ties at 101; argmax reports the first occurrence)\n",
"series_fav_num.argmax()\n"
]
},
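{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want the index label rather than the position, pandas also provides `Series.idxmin()` and `Series.idxmax()`. A minimal sketch on a fresh copy of the favorite-number data (`s_demo` is a throwaway name used only in this cell):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# throwaway copy of the favorite-number data\n",
"s_demo = pd.Series({'matt': 6, 'riva': 7, 'eli': 11, 'zeke': 101, 'sally': 101})\n",
"\n",
"# idxmin / idxmax return index labels; ties go to the first occurrence\n",
"s_demo.idxmin(), s_demo.idxmax()"
]
},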
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### New functionality, only in pandas\n"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matt 6\n",
"riva 7\n",
"eli 11\n",
"zeke 101\n",
"sally 101\n",
"dtype: int64"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"series_fav_num"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(5,)"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"series_fav_num.shape"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# number of non-missing entries (rows whose value is not NaN)\n",
"series_fav_num.count()"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"101 2\n",
"6 1\n",
"7 1\n",
"11 1\n",
"dtype: int64"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# how many times did each of the favorite numbers occur?\n",
"# (101 occurs twice in series_fav_num, while all other values occur once)\n",
"series_fav_num.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Adelie 152\n",
"Gentoo 124\n",
"Chinstrap 68\n",
"Name: species, dtype: int64"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_penguin['species'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Biscoe 168\n",
"Dream 124\n",
"Torgersen 52\n",
"Name: island, dtype: int64"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_penguin['island'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"count 5.000000\n",
"mean 45.200000\n",
"std 50.972542\n",
"min 6.000000\n",
"25% 7.000000\n",
"50% 11.000000\n",
"75% 101.000000\n",
"max 101.000000\n",
"dtype: float64"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# describe is useful to get a sense of how values are distributed\n",
"# \"50%\" is equivalent to the median\n",
"# \"25%\" indicates that 25% of the data is less than this value (and 75% is greater)\n",
"series_fav_num.describe()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Extracting a `pd.DataFrame` column as a series\n",
"\n",
"A dataframe is a two-dimensional table of data. Each row or column is a series object.\n"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" survived pclass sex age sibsp parch fare embarked class \\\n",
"0 0 3 male 22.0 1 0 7.2500 S Third \n",
"1 1 1 female 38.0 1 0 71.2833 C First \n",
"2 1 3 female 26.0 0 0 7.9250 S Third \n",
"3 1 1 female 35.0 1 0 53.1000 S First \n",
"4 0 3 male 35.0 0 0 8.0500 S Third \n",
"\n",
" who adult_male deck embark_town alive alone \n",
"0 man True NaN Southampton no False \n",
"1 woman False C Cherbourg yes False \n",
"2 woman False NaN Southampton yes True \n",
"3 woman False C Southampton yes False \n",
"4 man True NaN Southampton no True "
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import seaborn as sns\n",
"\n",
"# may take about 15 seconds on the first run to download the titanic data\n",
"df_titanic = sns.load_dataset('titanic')\n",
"df_titanic.head()"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 0\n",
"1 1\n",
"2 1\n",
"3 1\n",
"4 0\n",
" ..\n",
"886 0\n",
"887 1\n",
"888 0\n",
"889 1\n",
"890 0\n",
"Name: survived, Length: 891, dtype: int64"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# get the survived column of the dataframe as a series\n",
"df_titanic['survived']"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## In Class Activity A\n",
"\n",
"- `.describe()` how much people paid to get aboard the Titanic.\n",
"- count how many passengers of each age were on board\n",
"- each passenger corresponds to a row; what is the index of the passenger who paid the highest price?\n",
"- change the price paid by the passenger in row index 2 (the 3rd row) to `12345`\n",
" - notice: does anything funny happen here? If so ... investigate\n"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
" survived pclass sex age sibsp parch fare embarked class \\\n",
"0 0 3 male 22.0 1 0 7.2500 S Third \n",
"1 1 1 female 38.0 1 0 71.2833 C First \n",
"2 1 3 female 26.0 0 0 7.9250 S Third \n",
"3 1 1 female 35.0 1 0 53.1000 S First \n",
"4 0 3 male 35.0 0 0 8.0500 S Third \n",
"\n",
" who adult_male deck embark_town alive alone \n",
"0 man True NaN Southampton no False \n",
"1 woman False C Cherbourg yes False \n",
"2 woman False NaN Southampton yes True \n",
"3 woman False C Southampton yes False \n",
"4 man True NaN Southampton no True "
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_titanic.head()\n"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 891.000000\n",
"mean 32.204208\n",
"std 49.693429\n",
"min 0.000000\n",
"25% 7.910400\n",
"50% 14.454200\n",
"75% 31.000000\n",
"max 512.329200\n",
"Name: fare, dtype: float64"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# .describe() how much people paid to get aboard the titanic.\n",
"df_titanic['fare'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"24.00 30\n",
"22.00 27\n",
"18.00 26\n",
"19.00 25\n",
"28.00 25\n",
" ..\n",
"36.50 1\n",
"55.50 1\n",
"0.92 1\n",
"23.50 1\n",
"74.00 1\n",
"Name: age, Length: 88, dtype: int64"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# count how many passengers of each age were on board\n",
"df_titanic['age'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"258"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
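],
"source": [
"# each passenger corresponds to a row; what is the index of the passenger who paid the highest price?\n",
"# (argmax gives the row position, which matches the index label here because the index is the default 0, 1, 2, ...)\n",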
"df_titanic['fare'].argmax()"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_5765/1820496597.py:4: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" s_fare.iloc[2] = 12345\n"
]
},
{
"data": {
"text/plain": [
"0 7.2500\n",
"1 71.2833\n",
"2 12345.0000\n",
"3 53.1000\n",
"4 8.0500\n",
" ... \n",
"886 13.0000\n",
"887 30.0000\n",
"888 23.4500\n",
"889 30.0000\n",
"890 7.7500\n",
"Name: fare, Length: 891, dtype: float64"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# change the price paid by the passenger in row index 2 (the 3rd row) to 12345\n",
"# notice: does anything funny happen here? If so ... investigate\n",
"s_fare = df_titanic['fare']\n",
"s_fare.iloc[2] = 12345\n",
"s_fare"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" survived pclass sex age sibsp parch fare embarked class \\\n",
"0 0 3 male 22.0 1 0 7.2500 S Third \n",
"1 1 1 female 38.0 1 0 71.2833 C First \n",
"2 1 3 female 26.0 0 0 12345.0000 S Third \n",
"3 1 1 female 35.0 1 0 53.1000 S First \n",
"4 0 3 male 35.0 0 0 8.0500 S Third \n",
"\n",
" who adult_male deck embark_town alive alone \n",
"0 man True NaN Southampton no False \n",
"1 woman False C Cherbourg yes False \n",
"2 woman False NaN Southampton yes True \n",
"3 woman False C Southampton yes False \n",
"4 man True NaN Southampton no True "
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
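],
"source": [
"# notice anything different since we modified the series directly above? (... and why?)\n",
"# the fare in row 2 of the dataframe changed too: in this pandas version s_fare shares its data\n",
"# with df_titanic rather than being an independent copy, which is what the SettingWithCopyWarning flags\n",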
"df_titanic.head()"
]
},
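{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to avoid the `SettingWithCopyWarning` is to set the value on the dataframe itself with a single `.loc[row, column]` call. A minimal sketch on a tiny made-up dataframe (`df_demo` and its fares are illustrative assumptions, not the real titanic data):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical three-passenger dataframe\n",
"df_demo = pd.DataFrame({'fare': [7.25, 71.2833, 7.925]})\n",
"\n",
"# one .loc call selects row label 2 and column 'fare', then sets the value in place\n",
"df_demo.loc[2, 'fare'] = 12345.0\n",
"df_demo"
]
},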
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Pandas: DataFrame\n",
"\n",
"Remember:\n",
"- `Series`: 1d data object\n",
"- `DataFrame`: 2d data object\n",
"\n",
"`DataFrame`s represent two-dimensional data, like the quiz scores from last class:\n",
"\n",
"| | Quiz 0 | Quiz 1 | Quiz 2 |\n",
"|-----------|--------|--------|--------|\n",
"| Student 0 | 80 | 90 | 50 |\n",
"| Student 1 | 87 | 92 | 80 |\n",
"\n",
"Each column or row above can be treated as a `Series` object.\n"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"quiz_array = np.array([[80, 90, 50],\n",
" [87, 92, 80]])\n",
"\n",
"df_quiz = pd.DataFrame(quiz_array, \n",
" columns=('quiz0', 'quiz1', 'quiz2'), \n",
" index=('student0', 'student1'))\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" asdpfiuhasdifuh | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"asdpfiuhasdifuh 87 92 80"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we construct a dataframe as a dictionary\n",
"# keys of the dictionary are columns of dataframe\n",
"# values are lists (or tuples) of the values in each column\n",
"quiz_dict = {'quiz0': [80, 87],\n",
" 'quiz1': [90, 92],\n",
" 'quiz2': [50, 80]}\n",
"pd.DataFrame(quiz_dict, index=('student0', 'asdpfiuhasdifuh'))\n"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" 1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2\n",
"0 80 90 50\n",
"1 87 92 80"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# can we make dataframe without labelling rows / columns?\n",
"df_quiz = pd.DataFrame(quiz_array)\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we can just add the names in afterwards if you'd like to\n",
"df_quiz.columns = ['quiz0', 'quiz1', 'quiz2']\n",
"df_quiz.index = ('student0', 'student1')\n",
"df_quiz\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Describing a `pd.DataFrame`\n",
"\n",
"Just like numpy arrays:\n",
"- `DataFrame.argmin()`\n",
" - which index has smallest value\n",
" - pandas gives the row number, not the index\n",
"- `DataFrame.argmax()`\n",
" - which index has largest value\n",
" - pandas gives the row number, not the index\n",
"- `DataFrame.mean()`\n",
"- `DataFrame.min()`\n",
"- `DataFrame.max()`\n",
"- `DataFrame.std()`\n",
"- `DataFrame.var()`\n",
"\n",
"New to pandas:\n",
"- `DataFrame.count()`\n",
" - number of item pairs in series\n",
"- `DataFrame.describe()`\n",
" - summary statistics\n",
"- `DataFrame.value_counts()`\n",
" - count how many unique rows there are\n",
" - see falcon / dog / cat example below please\n"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"quiz0 83.5\n",
"quiz1 91.0\n",
"quiz2 65.0\n",
"dtype: float64"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# by default, each method applies operation to entire column of data\n",
"df_quiz.mean()"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"quiz0 83.5\n",
"quiz1 91.0\n",
"quiz2 65.0\n",
"dtype: float64"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we can also pass axis parameter to specify if operation should be applied to row or column\n",
"# !remember!\n",
"# axis=0 -> apply operation across all rows (returns operation per col)\n",
"# axis=1 -> apply operation across all cols (returns operation per row)\n",
"df_quiz.mean(axis=0)\n"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"student0 73.333333\n",
"student1 86.333333\n",
"dtype: float64"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# applies each operation to entire column of data (row)\n",
"df_quiz.mean(axis=1)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Take a moment to appreciate a panda:\n",
"Those labels on the pandas objects are super help in understanding the output immediately above, right?\n",
"\n",
"(The `axis=0` vs `axis=1` stuff was easy to get turned around with in numpy)\n"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 2.000000 | \n",
" 2.000000 | \n",
" 2.000000 | \n",
"
\n",
" \n",
" mean | \n",
" 83.500000 | \n",
" 91.000000 | \n",
" 65.000000 | \n",
"
\n",
" \n",
" std | \n",
" 4.949747 | \n",
" 1.414214 | \n",
" 21.213203 | \n",
"
\n",
" \n",
" min | \n",
" 80.000000 | \n",
" 90.000000 | \n",
" 50.000000 | \n",
"
\n",
" \n",
" 25% | \n",
" 81.750000 | \n",
" 90.500000 | \n",
" 57.500000 | \n",
"
\n",
" \n",
" 50% | \n",
" 83.500000 | \n",
" 91.000000 | \n",
" 65.000000 | \n",
"
\n",
" \n",
" 75% | \n",
" 85.250000 | \n",
" 91.500000 | \n",
" 72.500000 | \n",
"
\n",
" \n",
" max | \n",
" 87.000000 | \n",
" 92.000000 | \n",
" 80.000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"count 2.000000 2.000000 2.000000\n",
"mean 83.500000 91.000000 65.000000\n",
"std 4.949747 1.414214 21.213203\n",
"min 80.000000 90.000000 50.000000\n",
"25% 81.750000 90.500000 57.500000\n",
"50% 83.500000 91.000000 65.000000\n",
"75% 85.250000 91.500000 72.500000\n",
"max 87.000000 92.000000 80.000000"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# describe only works on columns (no axis param given)\n",
"df_quiz.describe()\n"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" num_legs | \n",
" num_wings | \n",
"
\n",
" \n",
" \n",
" \n",
" falcon | \n",
" 2 | \n",
" 2 | \n",
"
\n",
" \n",
" dog | \n",
" 4 | \n",
" 0 | \n",
"
\n",
" \n",
" cat | \n",
" 4 | \n",
" 0 | \n",
"
\n",
" \n",
" ant | \n",
" 6 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" num_legs num_wings\n",
"falcon 2 2\n",
"dog 4 0\n",
"cat 4 0\n",
"ant 6 0"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# borrowing from pandas documentation for new example\n",
"df = pd.DataFrame({'num_legs': [2, 4, 4, 6],\n",
" 'num_wings': [2, 0, 0, 0]},\n",
" index=['falcon', 'dog', 'cat', 'ant'])\n",
"df\n"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"num_legs num_wings\n",
"4 0 2\n",
"2 2 1\n",
"6 0 1\n",
"dtype: int64"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# notice that value_counts() gives \n",
"df.value_counts()\n"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"species island \n",
"Gentoo Biscoe 124\n",
"Chinstrap Dream 68\n",
"Adelie Dream 56\n",
" Torgersen 52\n",
" Biscoe 44\n",
"dtype: int64"
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_penguin.loc[:, :'island'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`value_counts()` on a `pd.DataFrame` tells us how many times we observed each full row. It tells us that `df` has:\n",
"- 2 row(s) in `df` with `num_legs=4, num_wings=0` \n",
"- 1 row(s) in `df` with `num_legs=2, num_wings=2`\n",
"- 1 row(s) in `df` with `num_legs=6, num_wings=0`\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Indexing / Accessing a DataFrame\n",
"- indexing: \n",
" - `.loc[]` indexing by name of row or column\n",
" - `.iloc[]` indexing by position integer (0, 1, 2, 3, 4 ...)\n",
" & slicing & subsets\n",
"- using the slice operator `:` to get full rows or columns\n"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80\n",
"student2 50 24 21\n",
"student3 89 85 40"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quiz_dict = {'quiz0': [80, 87, 50, 89],\n",
" 'quiz1': [90, 92, 24, 85],\n",
" 'quiz2': [50, 80, 21, 40]}\n",
"df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"80"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# indexing data by \"name\"\n",
"# remember: rows first, then columns ... \n",
"# 1st entry describes which row ('student0')\n",
"# 2nd entry describes which col ('quiz0')\n",
"\n",
"df_quiz.loc['student0', 'quiz0']\n"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"50"
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# index data by position\n",
"# 1st entry describes which row. 0 -> the 1st (topmost) row\n",
"# 2nd entry describes which col. 2 -> the 3rd (from the left) col\n",
"df_quiz.iloc[0, 2]\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### you can use same slicing syntaxes on both .loc and .iloc\n"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"student0 90\n",
"student1 92\n",
"student2 24\n",
"student3 85\n",
"Name: quiz1, dtype: int64"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# get the column with idx 1 (second col)\n",
"df_quiz.iloc[:, 1]"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"50"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 1st row, last col\n",
"df_quiz.iloc[0, -1]\n"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"student0 80\n",
"student1 87\n",
"student2 50\n",
"student3 89\n",
"Name: quiz0, dtype: int64"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# all rows, only quiz0\n",
"df_quiz.loc[:, 'quiz0']\n"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80\n",
"student2 50 24 21\n",
"student3 89 85 40"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"quiz0 80\n",
"quiz1 90\n",
"quiz2 50\n",
"Name: student0, dtype: int64"
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# slicing with named cols and rows\n",
"# you can get a range, by name of row/col\n",
"# note: this includes both start and stop columns (! unlike array / list)\n",
"df_quiz.loc['student0', 'quiz0':'quiz2' ]"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"quiz0 80\n",
"quiz1 90\n",
"Name: student0, dtype: int64"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# watch out:\n",
"# when you get ranges indexed by position: include start idx, exclude stop idx)\n",
"df_quiz.iloc[0, 0:2]"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"student0 80\n",
"student1 87\n",
"student2 50\n",
"student3 89\n",
"Name: quiz0, dtype: int64"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# if you access directly into dataframe, it will assume you're looking for a column\n",
"# (below is equivilent to df_quiz.loc[:, 'quiz0'])\n",
"# mild preference: avoid this\n",
"df_quiz['quiz0']\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### I've seen someone use `pd.DataFrame.ix` to index like above, what does that do?\n",
"\n",
"It was something of a hybrid between `.iloc` / `.loc` ... but it was weird to use.\n",
"\n",
"[Please don't use it.](https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.ix.html)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Modifying a DataFrame\n",
"- updating values: single cell\n",
"- adding a new column or row\n",
" - good practice: use a `pd.Series` to add a new row / col\n"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80\n",
"student2 50 24 21\n",
"student3 89 85 40"
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quiz_dict = {'quiz0': [80, 87, 50, 89],\n",
" 'quiz1': [90, 92, 24, 85],\n",
" 'quiz2': [50, 80, 21, 40]}\n",
"df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 123 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 123 50\n",
"student1 87 92 80\n",
"student2 50 24 21\n",
"student3 89 85 40"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# setting single entry in dataframe\n",
"df_quiz.loc['student0', 'quiz1'] = 123\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 123 | \n",
" 456 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 123 456\n",
"student1 87 92 80\n",
"student2 50 24 21\n",
"student3 89 85 40"
]
},
"execution_count": 110,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# setting multiple (contiguous) entries in dataframe\n",
"df_quiz.loc['student0', 'quiz1': 'quiz2'] = 123, 456\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
" overall grade | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 123 | \n",
" 456 | \n",
" a | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
" b | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
" c | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
" d | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2 overall grade\n",
"student0 80 123 456 a\n",
"student1 87 92 80 b\n",
"student2 50 24 21 c\n",
"student3 89 85 40 d"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# adding a new column (error prone handling of indexing ... which student got which grade?)\n",
"df_quiz['overall grade'] = 'a', 'b' , 'c', 'd'\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
" overall grade | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 123 | \n",
" 456 | \n",
" NaN | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
" b- | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
" NaN | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
" c | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2 overall grade\n",
"student0 80 123 456 NaN\n",
"student1 87 92 80 b-\n",
"student2 50 24 21 NaN\n",
"student3 89 85 40 c"
]
},
"execution_count": 125,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"123"
]
},
"execution_count": 126,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz.loc['student0', 'quiz1']"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"123"
]
},
"execution_count": 129,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz.iloc[0, 1]"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 123 | \n",
" 456 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 123 456\n",
"student1 87 92 80\n",
"student2 50 24 21\n",
"student3 89 85 40"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# delete a column\n",
"del df_quiz['overall grade']\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 123 | \n",
" 456 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 123 456\n",
"student1 87 92 80\n",
"student2 50 24 21\n",
"student3 89 85 40"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"student1 b-\n",
"student3 c\n",
"some student not in df AAA\n",
"dtype: object"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# adding a column (next 2 cells) robust way of handling indexing\n",
"# by explicilty labelling the index we're sure to match more explicitly\n",
"s_overgrade = pd.Series({'student1': 'b-', \n",
" 'student3': 'c',\n",
" 'some student not in df': 'AAA'})\n",
"s_overgrade\n"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
" overall grade | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 123 | \n",
" 456 | \n",
" NaN | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
" b- | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
" NaN | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
" c | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2 overall grade\n",
"student0 80 123 456 NaN\n",
"student1 87 92 80 b-\n",
"student2 50 24 21 NaN\n",
"student3 89 85 40 c"
]
},
"execution_count": 122,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# notice how pandas helps us out in aligning our new column with proper row\n",
"df_quiz.loc[: , 'overall grade'] = s_overgrade\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
" overall grade | \n",
"
\n",
" \n",
" \n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
" b- | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
" NaN | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
" c | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2 overall grade\n",
"student1 87 92 80 b-\n",
"student2 50 24 21 NaN\n",
"student3 89 85 40 c"
]
},
"execution_count": 123,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# how to 'drop' a row (returns a dataframe with row removed)\n",
"df_quiz_short = df_quiz.drop('student0')\n",
"df_quiz_short\n"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz1 | \n",
" quiz2 | \n",
" overall grade | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 123 | \n",
" 456 | \n",
" NaN | \n",
"
\n",
" \n",
" student1 | \n",
" 92 | \n",
" 80 | \n",
" b- | \n",
"
\n",
" \n",
" student2 | \n",
" 24 | \n",
" 21 | \n",
" NaN | \n",
"
\n",
" \n",
" student3 | \n",
" 85 | \n",
" 40 | \n",
" c | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz1 quiz2 overall grade\n",
"student0 123 456 NaN\n",
"student1 92 80 b-\n",
"student2 24 21 NaN\n",
"student3 85 40 c"
]
},
"execution_count": 124,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# you can drop a column too by specifying `axis=1`\n",
"# (by default it uses axis=0 to drop rows)\n",
"df_quiz.drop('quiz0', axis=1)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# If slicing fails ... just pass a list\n"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80\n",
"student2 50 24 21\n",
"student3 89 85 40"
]
},
"execution_count": 130,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quiz_dict = {'quiz0': [80, 87, 50, 89],\n",
" 'quiz1': [90, 92, 24, 85],\n",
" 'quiz2': [50, 80, 21, 40]}\n",
"df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student2 | \n",
" 50 | \n",
" 24 | \n",
" 21 | \n",
"
\n",
" \n",
" student3 | \n",
" 89 | \n",
" 85 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student2 50 24 21\n",
"student3 89 85 40"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# how to get an arbitrary set of rows\n",
"df_quiz.loc[['student0', 'student2', 'student3'], :]\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## In Class Activity B\n",
"1. Build the following `df_grade`. Be sure to include the row and column names:\n",
"\n",
"| | StudentB | StudentA | StudentC |\n",
"|-------:|----:|------:|------:|\n",
"| Quiz 1 | 89 | 100 | 78 |\n",
"| Quiz 2 | 75 | 90 | 90 |\n",
"| Quiz 3 | 93 | 85 | 65 |\n",
"| Quiz 4 | 92 | 92 | 76 |\n",
"\n",
"1. index into this dataframe to build a `df_grade_subset`:\n",
" - only includes rows studentB and studentC\n",
" - only includes Quiz 2, Quiz 3, Quiz 4\n",
"1. Using the `df_grade_subset` from the step above:\n",
" * calculate mean scores of studentB and studentC from the selected quizes\n",
" * calculate mean score of each quiz \n",
" * (remember the `axis` parameter)\n",
" \n",
"Operating on `df_grade`:\n",
"1. Add a new column `'StudentD'` with grades `60, 70, 80, 90` for quizes 1, 2, 3, 4 respectively\n",
" * can you do this by adding a new `pd.Series` object (to be a bit more explicit)?\n",
"1. Add a new row, `quiz5`, with any grades\n",
"1. Delete StudentC's column\n"
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" StudentB | \n",
" StudentA | \n",
" StudentC | \n",
"
\n",
" \n",
" \n",
" \n",
" Quiz1 | \n",
" 89 | \n",
" 100 | \n",
" 78 | \n",
"
\n",
" \n",
" Quiz2 | \n",
" 75 | \n",
" 90 | \n",
" 90 | \n",
"
\n",
" \n",
" Quiz3 | \n",
" 93 | \n",
" 85 | \n",
" 65 | \n",
"
\n",
" \n",
" Quiz4 | \n",
" 92 | \n",
" 92 | \n",
" 76 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" StudentB StudentA StudentC\n",
"Quiz1 89 100 78\n",
"Quiz2 75 90 90\n",
"Quiz3 93 85 65\n",
"Quiz4 92 92 76"
]
},
"execution_count": 133,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"student_grade_dict = {'StudentB': [89, 75, 93, 92],\n",
" 'StudentA': [100, 90, 85, 92],\n",
" 'StudentC': [78, 90, 65, 76]}\n",
"\n",
"df_quiz = pd.DataFrame(student_grade_dict, index=('Quiz1', 'Quiz2', 'Quiz3', 'Quiz4'))\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" StudentB | \n",
" StudentC | \n",
"
\n",
" \n",
" \n",
" \n",
" Quiz2 | \n",
" 75 | \n",
" 90 | \n",
"
\n",
" \n",
" Quiz3 | \n",
" 93 | \n",
" 65 | \n",
"
\n",
" \n",
" Quiz4 | \n",
" 92 | \n",
" 76 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" StudentB StudentC\n",
"Quiz2 75 90\n",
"Quiz3 93 65\n",
"Quiz4 92 76"
]
},
"execution_count": 135,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 1. index into this dataframe to build a `df_grade_subset`:\n",
"# - only includes studentB and studentC\n",
"# - only includes Quiz 2, Quiz 3, Quiz 4\n",
"df_quiz_subset = df_quiz.loc['Quiz2': , ('StudentB', 'StudentC')]\n",
"df_quiz_subset"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"StudentB 86.666667\n",
"StudentC 77.000000\n",
"dtype: float64"
]
},
"execution_count": 145,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 1. Using the `df_grade_subset` from the step above:\n",
"# * calculate mean scores of studentB and studentC from the selected quizes\n",
"# * calculate mean score of each quiz \n",
"# * (remember the `axis` parameter)\n",
"df_quiz_subset.mean(axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Quiz2 82.5\n",
"Quiz3 79.0\n",
"Quiz4 84.0\n",
"dtype: float64"
]
},
"execution_count": 137,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz_subset.mean(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" StudentB | \n",
" StudentA | \n",
"
\n",
" \n",
" \n",
" \n",
" Quiz1 | \n",
" 89.0 | \n",
" 100.0 | \n",
"
\n",
" \n",
" Quiz2 | \n",
" 75.0 | \n",
" 90.0 | \n",
"
\n",
" \n",
" Quiz3 | \n",
" 93.0 | \n",
" 85.0 | \n",
"
\n",
" \n",
" Quiz4 | \n",
" 92.0 | \n",
" 92.0 | \n",
"
\n",
" \n",
" Quiz5 | \n",
" 1.0 | \n",
" 2.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" StudentB StudentA\n",
"Quiz1 89.0 100.0\n",
"Quiz2 75.0 90.0\n",
"Quiz3 93.0 85.0\n",
"Quiz4 92.0 92.0\n",
"Quiz5 1.0 2.0"
]
},
"execution_count": 147,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"del df_quiz['StudentD']\n",
"df_quiz"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" StudentB | \n",
" StudentA | \n",
" StudentD | \n",
"
\n",
" \n",
" \n",
" \n",
" Quiz1 | \n",
" 89.0 | \n",
" 100.0 | \n",
" 60.0 | \n",
"
\n",
" \n",
" Quiz2 | \n",
" 75.0 | \n",
" 90.0 | \n",
" 70.0 | \n",
"
\n",
" \n",
" Quiz3 | \n",
" 93.0 | \n",
" 85.0 | \n",
" 80.0 | \n",
"
\n",
" \n",
" Quiz4 | \n",
" 92.0 | \n",
" 92.0 | \n",
" 90.0 | \n",
"
\n",
" \n",
" Quiz5 | \n",
" 1.0 | \n",
" 2.0 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" StudentB StudentA StudentD\n",
"Quiz1 89.0 100.0 60.0\n",
"Quiz2 75.0 90.0 70.0\n",
"Quiz3 93.0 85.0 80.0\n",
"Quiz4 92.0 92.0 90.0\n",
"Quiz5 1.0 2.0 NaN"
]
},
"execution_count": 148,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Operating on `df_grade`:\n",
"# 1. Add a new column `'StudentD'` with grades `60, 70, 80, 90` for quizes 1, 2, 3, 4 respectively\n",
"# * can you do this by adding a new `pd.Series` object (to be a bit more explicit)?\n",
"s_student_d = {'Quiz1': 60, 'Quiz2': 70, 'Quiz3': 80, 'Quiz4': 90}\n",
"df_quiz.loc[:, 'StudentD'] = s_student_d\n",
"\n",
"df_quiz"
]
},
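{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `NaN` in the `Quiz5` row above comes from index alignment: when a `pd.Series` is assigned as a column, pandas matches values by index label rather than by position. A minimal sketch, assuming a small hypothetical dataframe (`df_demo` and `s_new` are illustrative names, not part of the lecture data):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"df_demo = pd.DataFrame({'StudentA': [100, 90, 85]},\n",
"                       index=['Quiz1', 'Quiz2', 'Quiz3'])\n",
"\n",
"# the Series has no entry for Quiz3 and lists its labels out of order\n",
"s_new = pd.Series({'Quiz2': 70, 'Quiz1': 60})\n",
"\n",
"# values are matched on index labels, not on position\n",
"df_demo['StudentD'] = s_new\n",
"\n",
"# Quiz1 gets 60, Quiz2 gets 70, Quiz3 has no match and becomes NaN\n",
"print(df_demo)\n",
"```\n",
"\n",
"Because `NaN` is a float, the new column is stored as floats even though every grade given was an integer.\n"
]
},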
{
"cell_type": "code",
"execution_count": 141,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" StudentB | \n",
" StudentA | \n",
" StudentC | \n",
" StudentD | \n",
"
\n",
" \n",
" \n",
" \n",
" Quiz1 | \n",
" 89.0 | \n",
" 100.0 | \n",
" 78.0 | \n",
" 60.0 | \n",
"
\n",
" \n",
" Quiz2 | \n",
" 75.0 | \n",
" 90.0 | \n",
" 90.0 | \n",
" 70.0 | \n",
"
\n",
" \n",
" Quiz3 | \n",
" 93.0 | \n",
" 85.0 | \n",
" 65.0 | \n",
" 80.0 | \n",
"
\n",
" \n",
" Quiz4 | \n",
" 92.0 | \n",
" 92.0 | \n",
" 76.0 | \n",
" 90.0 | \n",
"
\n",
" \n",
" Quiz5 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 3.0 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" StudentB StudentA StudentC StudentD\n",
"Quiz1 89.0 100.0 78.0 60.0\n",
"Quiz2 75.0 90.0 90.0 70.0\n",
"Quiz3 93.0 85.0 65.0 80.0\n",
"Quiz4 92.0 92.0 76.0 90.0\n",
"Quiz5 1.0 2.0 3.0 4.0"
]
},
"execution_count": 141,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 1. Add a new row, `quiz5`, with any grades (implicit ... not great but quick)\n",
"df_quiz.loc['Quiz5', :] = 1, 2, 3, 4\n",
"df_quiz"
]
},
{
"cell_type": "code",
"execution_count": 142,
"metadata": {},
"outputs": [],
"source": [
"# 1. Delete StudentC's column\n",
"del df_quiz['StudentC']"
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" StudentB | \n",
" StudentA | \n",
" StudentD | \n",
"
\n",
" \n",
" \n",
" \n",
" Quiz1 | \n",
" 89.0 | \n",
" 100.0 | \n",
" 60.0 | \n",
"
\n",
" \n",
" Quiz2 | \n",
" 75.0 | \n",
" 90.0 | \n",
" 70.0 | \n",
"
\n",
" \n",
" Quiz3 | \n",
" 93.0 | \n",
" 85.0 | \n",
" 80.0 | \n",
"
\n",
" \n",
" Quiz4 | \n",
" 92.0 | \n",
" 92.0 | \n",
" 90.0 | \n",
"
\n",
" \n",
" Quiz5 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" StudentB StudentA StudentD\n",
"Quiz1 89.0 100.0 60.0\n",
"Quiz2 75.0 90.0 70.0\n",
"Quiz3 93.0 85.0 80.0\n",
"Quiz4 92.0 92.0 90.0\n",
"Quiz5 1.0 2.0 4.0"
]
},
"execution_count": 144,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Operating on DataFrame & Series Objects\n",
"\n",
"Your operators do pretty much what you'd expect them to.\n"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80"
]
},
"execution_count": 149,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quiz_dict = {'quiz0': [80, 87],\n",
" 'quiz1': [90, 92],\n",
" 'quiz2': [50, 80]}\n",
"df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1'))\n",
"df_quiz"
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80000 | \n",
" 90000 | \n",
" 50000 | \n",
"
\n",
" \n",
" student1 | \n",
" 87000 | \n",
" 92000 | \n",
" 80000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80000 90000 50000\n",
"student1 87000 92000 80000"
]
},
"execution_count": 150,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz * 1000\n"
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 1000000000079 | \n",
" 1000000000089 | \n",
" 1000000000049 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 1000000000079 1000000000089 1000000000049\n",
"student1 87 92 80"
]
},
"execution_count": 151,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# thats some extra credit ...\n",
"df_quiz.loc['student0', :] += 999999999999\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 1000000000079 | \n",
" 1000000000089 | \n",
" 1000000000049 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 1000000000079 1000000000089 1000000000049\n",
"student1 87 92 80"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" True | \n",
" True | \n",
" True | \n",
"
\n",
" \n",
" student1 | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 True True True\n",
"student1 False False False"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we can also use comparison operators (super helpful, see boolean indexing next)\n",
"df_quiz > 100\n"
]
},
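{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch of how these elementwise operators combine, assuming a hypothetical two-student dataframe (`df_demo`, `df_curved`, `df_pass` and the 85-point threshold are illustrative choices): a scalar is applied to every entry, a comparison gives a same-shaped dataframe of booleans, and summing booleans counts the `True` entries.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"df_demo = pd.DataFrame({'quiz0': [80, 87], 'quiz1': [90, 92]},\n",
"                       index=['student0', 'student1'])\n",
"\n",
"# arithmetic with a scalar is applied to every entry\n",
"df_curved = df_demo + 5\n",
"\n",
"# a comparison returns a dataframe of booleans with the same shape\n",
"df_pass = df_demo >= 85\n",
"\n",
"# True counts as 1, so summing down the rows counts passing students per quiz\n",
"n_pass_per_quiz = df_pass.sum(axis=0)\n",
"print(n_pass_per_quiz)\n",
"```\n",
"\n",
"None of these operations modify `df_demo`; each returns a new object unless you assign back into the dataframe (as the `+=` cell above does).\n"
]
},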
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Boolean Indexing into DataFrame\n",
"\n",
"Sometimes we want to grab only the rows or columns which meet a particular condition.\n",
"\n",
"\"Get all students whose grade was higher than 85 on quiz 1\"\n"
]
},
{
"cell_type": "code",
"execution_count": 152,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 60 | \n",
" 60 | \n",
" 70 | \n",
"
\n",
" \n",
" student3 | \n",
" 30 | \n",
" 23 | \n",
" 64 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80\n",
"student2 60 60 70\n",
"student3 30 23 64"
]
},
"execution_count": 152,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quiz_dict = {'quiz0': [80, 87, 60, 30],\n",
" 'quiz1': [90, 92, 60, 23],\n",
" 'quiz2': [50, 80, 70, 64]}\n",
"df_quiz = pd.DataFrame(quiz_dict, index=('student0', 'student1', 'student2', 'student3'))\n",
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 153,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"student0 90\n",
"student1 92\n",
"student2 60\n",
"student3 23\n",
"Name: quiz1, dtype: int64"
]
},
"execution_count": 153,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# quiz 1 is a series object which contains every index's quiz 1 grade\n",
"s_quiz1 = df_quiz.loc[:, 'quiz1']\n",
"s_quiz1\n"
]
},
{
"cell_type": "code",
"execution_count": 154,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"student0 True\n",
"student1 True\n",
"student2 False\n",
"student3 False\n",
"Name: quiz1, dtype: bool"
]
},
"execution_count": 154,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we create a series of booleans which is True only in the positions we're interested in\n",
"s_bool = s_quiz1 > 85\n",
"s_bool\n"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 60 | \n",
" 60 | \n",
" 70 | \n",
"
\n",
" \n",
" student3 | \n",
" 30 | \n",
" 23 | \n",
" 64 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80\n",
"student2 60 60 70\n",
"student3 30 23 64"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# boolean indexing: using a boolean series as index returns only those entries which are True\n",
"# notice that since student2 & student3's quiz1 grade wasn't > 80 they aren't included below\n",
"df_quiz.loc[s_bool, :]\n"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" species | \n",
" island | \n",
" bill_length_mm | \n",
" bill_depth_mm | \n",
" flipper_length_mm | \n",
" body_mass_g | \n",
" sex | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Adelie | \n",
" Torgersen | \n",
" 39.1 | \n",
" 18.7 | \n",
" 181.0 | \n",
" 3750.0 | \n",
" Male | \n",
"
\n",
" \n",
" 5 | \n",
" Adelie | \n",
" Torgersen | \n",
" 39.3 | \n",
" 20.6 | \n",
" 190.0 | \n",
" 3650.0 | \n",
" Male | \n",
"
\n",
" \n",
" 7 | \n",
" Adelie | \n",
" Torgersen | \n",
" 39.2 | \n",
" 19.6 | \n",
" 195.0 | \n",
" 4675.0 | \n",
" Male | \n",
"
\n",
" \n",
" 13 | \n",
" Adelie | \n",
" Torgersen | \n",
" 38.6 | \n",
" 21.2 | \n",
" 191.0 | \n",
" 3800.0 | \n",
" Male | \n",
"
\n",
" \n",
" 14 | \n",
" Adelie | \n",
" Torgersen | \n",
" 34.6 | \n",
" 21.1 | \n",
" 198.0 | \n",
" 4400.0 | \n",
" Male | \n",
"
\n",
" \n",
" 17 | \n",
" Adelie | \n",
" Torgersen | \n",
" 42.5 | \n",
" 20.7 | \n",
" 197.0 | \n",
" 4500.0 | \n",
" Male | \n",
"
\n",
" \n",
" 19 | \n",
" Adelie | \n",
" Torgersen | \n",
" 46.0 | \n",
" 21.5 | \n",
" 194.0 | \n",
" 4200.0 | \n",
" Male | \n",
"
\n",
" \n",
" 69 | \n",
" Adelie | \n",
" Torgersen | \n",
" 41.8 | \n",
" 19.4 | \n",
" 198.0 | \n",
" 4450.0 | \n",
" Male | \n",
"
\n",
" \n",
" 71 | \n",
" Adelie | \n",
" Torgersen | \n",
" 39.7 | \n",
" 18.4 | \n",
" 190.0 | \n",
" 3900.0 | \n",
" Male | \n",
"
\n",
" \n",
" 73 | \n",
" Adelie | \n",
" Torgersen | \n",
" 45.8 | \n",
" 18.9 | \n",
" 197.0 | \n",
" 4150.0 | \n",
" Male | \n",
"
\n",
" \n",
" 75 | \n",
" Adelie | \n",
" Torgersen | \n",
" 42.8 | \n",
" 18.5 | \n",
" 195.0 | \n",
" 4250.0 | \n",
" Male | \n",
"
\n",
" \n",
" 77 | \n",
" Adelie | \n",
" Torgersen | \n",
" 37.2 | \n",
" 19.4 | \n",
" 184.0 | \n",
" 3900.0 | \n",
" Male | \n",
"
\n",
" \n",
" 79 | \n",
" Adelie | \n",
" Torgersen | \n",
" 42.1 | \n",
" 19.1 | \n",
" 195.0 | \n",
" 4000.0 | \n",
" Male | \n",
"
\n",
" \n",
" 81 | \n",
" Adelie | \n",
" Torgersen | \n",
" 42.9 | \n",
" 17.6 | \n",
" 196.0 | \n",
" 4700.0 | \n",
" Male | \n",
"
\n",
" \n",
" 83 | \n",
" Adelie | \n",
" Torgersen | \n",
" 35.1 | \n",
" 19.4 | \n",
" 193.0 | \n",
" 4200.0 | \n",
" Male | \n",
"
\n",
" \n",
" 117 | \n",
" Adelie | \n",
" Torgersen | \n",
" 37.3 | \n",
" 20.5 | \n",
" 199.0 | \n",
" 3775.0 | \n",
" Male | \n",
"
\n",
" \n",
" 119 | \n",
" Adelie | \n",
" Torgersen | \n",
" 41.1 | \n",
" 18.6 | \n",
" 189.0 | \n",
" 3325.0 | \n",
" Male | \n",
"
\n",
" \n",
" 121 | \n",
" Adelie | \n",
" Torgersen | \n",
" 37.7 | \n",
" 19.8 | \n",
" 198.0 | \n",
" 3500.0 | \n",
" Male | \n",
"
\n",
" \n",
" 123 | \n",
" Adelie | \n",
" Torgersen | \n",
" 41.4 | \n",
" 18.5 | \n",
" 202.0 | \n",
" 3875.0 | \n",
" Male | \n",
"
\n",
" \n",
" 125 | \n",
" Adelie | \n",
" Torgersen | \n",
" 40.6 | \n",
" 19.0 | \n",
" 199.0 | \n",
" 4000.0 | \n",
" Male | \n",
"
\n",
" \n",
" 127 | \n",
" Adelie | \n",
" Torgersen | \n",
" 41.5 | \n",
" 18.3 | \n",
" 195.0 | \n",
" 4300.0 | \n",
" Male | \n",
"
\n",
" \n",
" 129 | \n",
" Adelie | \n",
" Torgersen | \n",
" 44.1 | \n",
" 18.0 | \n",
" 210.0 | \n",
" 4000.0 | \n",
" Male | \n",
"
\n",
" \n",
" 131 | \n",
" Adelie | \n",
" Torgersen | \n",
" 43.1 | \n",
" 19.2 | \n",
" 197.0 | \n",
" 3500.0 | \n",
" Male | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" species island bill_length_mm bill_depth_mm flipper_length_mm \\\n",
"0 Adelie Torgersen 39.1 18.7 181.0 \n",
"5 Adelie Torgersen 39.3 20.6 190.0 \n",
"7 Adelie Torgersen 39.2 19.6 195.0 \n",
"13 Adelie Torgersen 38.6 21.2 191.0 \n",
"14 Adelie Torgersen 34.6 21.1 198.0 \n",
"17 Adelie Torgersen 42.5 20.7 197.0 \n",
"19 Adelie Torgersen 46.0 21.5 194.0 \n",
"69 Adelie Torgersen 41.8 19.4 198.0 \n",
"71 Adelie Torgersen 39.7 18.4 190.0 \n",
"73 Adelie Torgersen 45.8 18.9 197.0 \n",
"75 Adelie Torgersen 42.8 18.5 195.0 \n",
"77 Adelie Torgersen 37.2 19.4 184.0 \n",
"79 Adelie Torgersen 42.1 19.1 195.0 \n",
"81 Adelie Torgersen 42.9 17.6 196.0 \n",
"83 Adelie Torgersen 35.1 19.4 193.0 \n",
"117 Adelie Torgersen 37.3 20.5 199.0 \n",
"119 Adelie Torgersen 41.1 18.6 189.0 \n",
"121 Adelie Torgersen 37.7 19.8 198.0 \n",
"123 Adelie Torgersen 41.4 18.5 202.0 \n",
"125 Adelie Torgersen 40.6 19.0 199.0 \n",
"127 Adelie Torgersen 41.5 18.3 195.0 \n",
"129 Adelie Torgersen 44.1 18.0 210.0 \n",
"131 Adelie Torgersen 43.1 19.2 197.0 \n",
"\n",
" body_mass_g sex \n",
"0 3750.0 Male \n",
"5 3650.0 Male \n",
"7 4675.0 Male \n",
"13 3800.0 Male \n",
"14 4400.0 Male \n",
"17 4500.0 Male \n",
"19 4200.0 Male \n",
"69 4450.0 Male \n",
"71 3900.0 Male \n",
"73 4150.0 Male \n",
"75 4250.0 Male \n",
"77 3900.0 Male \n",
"79 4000.0 Male \n",
"81 4700.0 Male \n",
"83 4200.0 Male \n",
"117 3775.0 Male \n",
"119 3325.0 Male \n",
"121 3500.0 Male \n",
"123 3875.0 Male \n",
"125 4000.0 Male \n",
"127 4300.0 Male \n",
"129 4000.0 Male \n",
"131 3500.0 Male "
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s_bool = (df_penguin['island'] == 'Torgersen') & (df_penguin['sex'] == 'Male')\n",
"df_penguin.loc[s_bool, :]"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student0 | \n",
" 80 | \n",
" 90 | \n",
" 50 | \n",
"
\n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
" student2 | \n",
" 60 | \n",
" 60 | \n",
" 70 | \n",
"
\n",
" \n",
" student3 | \n",
" 30 | \n",
" 23 | \n",
" 64 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student0 80 90 50\n",
"student1 87 92 80\n",
"student2 60 60 70\n",
"student3 30 23 64"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz\n"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"student0 False\n",
"student1 False\n",
"student2 True\n",
"student3 True\n",
"Name: quiz1, dtype: bool"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# what are all the students who get below a 70 on quiz1?\n",
"s_bool = df_quiz.loc[:, 'quiz1'] < 70\n",
"s_bool\n"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student2 | \n",
" 60 | \n",
" 60 | \n",
" 70 | \n",
"
\n",
" \n",
" student3 | \n",
" 30 | \n",
" 23 | \n",
" 64 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student2 60 60 70\n",
"student3 30 23 64"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz.loc[s_bool, :]\n"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"student0 False\n",
"student1 True\n",
"student2 False\n",
"student3 False\n",
"dtype: bool"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we can build more complex conditions using \n",
"# & (and operator)\n",
"# | (or operator)\n",
"\n",
"# all students who got higher than 91 on quiz1 but didn't score higher than 90 on quiz2\n",
"s_bool = (df_quiz.loc[:, 'quiz1'] > 91) & (df_quiz.loc[:, 'quiz2'] <= 90)\n",
"s_bool\n"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quiz0 | \n",
" quiz1 | \n",
" quiz2 | \n",
"
\n",
" \n",
" \n",
" \n",
" student1 | \n",
" 87 | \n",
" 92 | \n",
" 80 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quiz0 quiz1 quiz2\n",
"student1 87 92 80"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_quiz.loc[s_bool, :]\n"
]
},
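{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same pattern extends to `|` (or) and `~` (not). A minimal sketch, assuming a stand-in copy of the quiz data (`df_demo` and the thresholds are illustrative choices). Note the parentheses around each comparison: `&` and `|` bind more tightly than `>` and `<`, so leaving them out changes the meaning and usually raises an error.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"df_demo = pd.DataFrame({'quiz1': [90, 92, 60, 23],\n",
"                        'quiz2': [50, 80, 70, 64]},\n",
"                       index=['student0', 'student1', 'student2', 'student3'])\n",
"\n",
"s_high_q1 = df_demo.loc[:, 'quiz1'] > 85\n",
"s_high_q2 = df_demo.loc[:, 'quiz2'] > 65\n",
"\n",
"# | keeps rows where either condition holds (student0, student1, student2)\n",
"df_either = df_demo.loc[s_high_q1 | s_high_q2, :]\n",
"\n",
"# ~ flips every boolean: rows where quiz1 was NOT above 85 (student2, student3)\n",
"df_not_high_q1 = df_demo.loc[~s_high_q1, :]\n",
"\n",
"# inline comparisons need parentheses around each condition (student1 only)\n",
"df_both = df_demo.loc[(df_demo.loc[:, 'quiz1'] > 85) & (df_demo.loc[:, 'quiz2'] > 65), :]\n",
"\n",
"print(df_either)\n",
"print(df_not_high_q1)\n",
"print(df_both)\n",
"```\n",
"\n",
"Python's `and`, `or` and `not` keywords do not work here, since they expect a single boolean rather than a whole Series.\n"
]
},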
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# One more thing, whats `pd.DataFrame.head()`?\n",
"\n",
"It grabs the \"head\" (the first few rows) of a dataframe. DataFrames can be so big that its overwhelming to look at the whole thing, sometimes a few rows is all thats needed.\n"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" species | \n",
" island | \n",
" bill_length_mm | \n",
" bill_depth_mm | \n",
" flipper_length_mm | \n",
" body_mass_g | \n",
" sex | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Adelie | \n",
" Torgersen | \n",
" 39.1 | \n",
" 18.7 | \n",
" 181.0 | \n",
" 3750.0 | \n",
" Male | \n",
"
\n",
" \n",
" 1 | \n",
" Adelie | \n",
" Torgersen | \n",
" 39.5 | \n",
" 17.4 | \n",
" 186.0 | \n",
" 3800.0 | \n",
" Female | \n",
"
\n",
" \n",
" 2 | \n",
" Adelie | \n",
" Torgersen | \n",
" 40.3 | \n",
" 18.0 | \n",
" 195.0 | \n",
" 3250.0 | \n",
" Female | \n",
"
\n",
" \n",
" 3 | \n",
" Adelie | \n",
" Torgersen | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" Adelie | \n",
" Torgersen | \n",
" 36.7 | \n",
" 19.3 | \n",
" 193.0 | \n",
" 3450.0 | \n",
" Female | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" species island bill_length_mm bill_depth_mm flipper_length_mm \\\n",
"0 Adelie Torgersen 39.1 18.7 181.0 \n",
"1 Adelie Torgersen 39.5 17.4 186.0 \n",
"2 Adelie Torgersen 40.3 18.0 195.0 \n",
"3 Adelie Torgersen NaN NaN NaN \n",
"4 Adelie Torgersen 36.7 19.3 193.0 \n",
"\n",
" body_mass_g sex \n",
"0 3750.0 Male \n",
"1 3800.0 Female \n",
"2 3250.0 Female \n",
"3 NaN NaN \n",
"4 3450.0 Female "
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_penguin = sns.load_dataset('penguins')\n",
"df_penguin.head()\n"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" species | \n",
" island | \n",
" bill_length_mm | \n",
" bill_depth_mm | \n",
" flipper_length_mm | \n",
" body_mass_g | \n",
" sex | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Adelie | \n",
" Torgersen | \n",
" 39.1 | \n",
" 18.7 | \n",
" 181.0 | \n",
" 3750.0 | \n",
" Male | \n",
"
\n",
" \n",
" 1 | \n",
" Adelie | \n",
" Torgersen | \n",
" 39.5 | \n",
" 17.4 | \n",
" 186.0 | \n",
" 3800.0 | \n",
" Female | \n",
"
\n",
" \n",
" 2 | \n",
" Adelie | \n",
" Torgersen | \n",
" 40.3 | \n",
" 18.0 | \n",
" 195.0 | \n",
" 3250.0 | \n",
" Female | \n",
"
\n",
" \n",
" 3 | \n",
" Adelie | \n",
" Torgersen | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" Adelie | \n",
" Torgersen | \n",
" 36.7 | \n",
" 19.3 | \n",
" 193.0 | \n",
" 3450.0 | \n",
" Female | \n",
"
\n",
" \n",
" 5 | \n",
" Adelie | \n",
" Torgersen | \n",
" 39.3 | \n",
" 20.6 | \n",
" 190.0 | \n",
" 3650.0 | \n",
" Male | \n",
"
\n",
" \n",
" 6 | \n",
" Adelie | \n",
" Torgersen | \n",
" 38.9 | \n",
" 17.8 | \n",
" 181.0 | \n",
" 3625.0 | \n",
" Female | \n",
"
\n",
" \n",
" 7 | \n",
" Adelie | \n",
" Torgersen | \n",
" 39.2 | \n",
" 19.6 | \n",
" 195.0 | \n",
" 4675.0 | \n",
" Male | \n",
"
\n",
" \n",
" 8 | \n",
" Adelie | \n",
" Torgersen | \n",
" 34.1 | \n",
" 18.1 | \n",
" 193.0 | \n",
" 3475.0 | \n",
" NaN | \n",
"
\n",
" \n",
" 9 | \n",
" Adelie | \n",
" Torgersen | \n",
" 42.0 | \n",
" 20.2 | \n",
" 190.0 | \n",
" 4250.0 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" species island bill_length_mm bill_depth_mm flipper_length_mm \\\n",
"0 Adelie Torgersen 39.1 18.7 181.0 \n",
"1 Adelie Torgersen 39.5 17.4 186.0 \n",
"2 Adelie Torgersen 40.3 18.0 195.0 \n",
"3 Adelie Torgersen NaN NaN NaN \n",
"4 Adelie Torgersen 36.7 19.3 193.0 \n",
"5 Adelie Torgersen 39.3 20.6 190.0 \n",
"6 Adelie Torgersen 38.9 17.8 181.0 \n",
"7 Adelie Torgersen 39.2 19.6 195.0 \n",
"8 Adelie Torgersen 34.1 18.1 193.0 \n",
"9 Adelie Torgersen 42.0 20.2 190.0 \n",
"\n",
" body_mass_g sex \n",
"0 3750.0 Male \n",
"1 3800.0 Female \n",
"2 3250.0 Female \n",
"3 NaN NaN \n",
"4 3450.0 Female \n",
"5 3650.0 Male \n",
"6 3625.0 Female \n",
"7 4675.0 Male \n",
"8 3475.0 NaN \n",
"9 4250.0 NaN "
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DataFrame.head() takes an argument, the number of top rows to return\n",
"df_penguin.head(10)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## In Class Activity C\n",
"The `pclass` of a titanic ticket describes the passenger class. Its unclear if larger or smaller `pclass` are the fancy tickets. See if you can answer this question by:\n",
"\n",
"- `.describe()` the `fare` paid by passengers who bought `pclass=3` tickets\n",
"- `.describe()` the `fare` paid by passengers who bought `pclass=2` tickets\n",
"- `.describe()` the `fare` paid by passengers who bought `pclass=1` tickets\n",
"\n",
"(++) You can use this boolean indexing to compare groups to answer all sorts of interesting questions:\n",
"- Survival Effectiveness: Were people who travelled alone more or less likely to survive the titanic?\n",
"- Demographics of towns: Which town, among Cherbourg, Queenstown or Southampton, seems to have the most families?\n",
"- Layout of the boat: Does having a higher or lower cabin number suggest one is more likely to have a higher or lower ticket class?\n",
" - e.g. when `pclass=1` maybe these cabin numbers are all very large or small ...\n",
"\n",
"Data dictionary ([not the primary source, but a source](https://jkarakas.github.io/Exploratory-Analysis-of-the-Titanic-Dataset/Titanic_Dataset_Exploratory_Analysis_No_Code.html))\n",
"\n",
"| Variable | Definition | Key |\n",
"|----------|--------------------------------------------|-----------------------------------------------|\n",
"| Survived | Survival | 0 = No, 1 = Yes |\n",
"| Pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |\n",
"| Sex | Sex | |\n",
"| Age | Age in years | |\n",
"| Sibsp | # of siblings / spouses aboard the Titanic | |\n",
"| Parch | # of parents / children aboard the Titanic | |\n",
"| Ticket | Ticket number | |\n",
"| Fare | Passenger fare | |\n",
"| Cabin | Cabin number | |\n",
"| Embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown,S = Southampton |\n"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" survived | \n",
" pclass | \n",
" sex | \n",
" age | \n",
" sibsp | \n",
" parch | \n",
" fare | \n",
" embarked | \n",
" class | \n",
" who | \n",
" adult_male | \n",
" deck | \n",
" embark_town | \n",
" alive | \n",
" alone | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 3 | \n",
" male | \n",
" 22.0 | \n",
" 1 | \n",
" 0 | \n",
" 7.2500 | \n",
" S | \n",
" Third | \n",
" man | \n",
" True | \n",
" NaN | \n",
" Southampton | \n",
" no | \n",
" False | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" female | \n",
" 38.0 | \n",
" 1 | \n",
" 0 | \n",
" 71.2833 | \n",
" C | \n",
" First | \n",
" woman | \n",
" False | \n",
" C | \n",
" Cherbourg | \n",
" yes | \n",
" False | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 3 | \n",
" female | \n",
" 26.0 | \n",
" 0 | \n",
" 0 | \n",
" 7.9250 | \n",
" S | \n",
" Third | \n",
" woman | \n",
" False | \n",
" NaN | \n",
" Southampton | \n",
" yes | \n",
" True | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" 1 | \n",
" female | \n",
" 35.0 | \n",
" 1 | \n",
" 0 | \n",
" 53.1000 | \n",
" S | \n",
" First | \n",
" woman | \n",
" False | \n",
" C | \n",
" Southampton | \n",
" yes | \n",
" False | \n",
"
\n",
" \n",
" 4 | \n",
" 0 | \n",
" 3 | \n",
" male | \n",
" 35.0 | \n",
" 0 | \n",
" 0 | \n",
" 8.0500 | \n",
" S | \n",
" Third | \n",
" man | \n",
" True | \n",
" NaN | \n",
" Southampton | \n",
" no | \n",
" True | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" survived pclass sex age sibsp parch fare embarked class \\\n",
"0 0 3 male 22.0 1 0 7.2500 S Third \n",
"1 1 1 female 38.0 1 0 71.2833 C First \n",
"2 1 3 female 26.0 0 0 7.9250 S Third \n",
"3 1 1 female 35.0 1 0 53.1000 S First \n",
"4 0 3 male 35.0 0 0 8.0500 S Third \n",
"\n",
" who adult_male deck embark_town alive alone \n",
"0 man True NaN Southampton no False \n",
"1 woman False C Cherbourg yes False \n",
"2 woman False NaN Southampton yes True \n",
"3 woman False C Southampton yes False \n",
"4 man True NaN Southampton no True "
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_titanic = sns.load_dataset('titanic')\n",
"df_titanic.head()\n"
]
},
{
"cell_type": "code",
"execution_count": 180,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" survived | \n",
" pclass | \n",
" age | \n",
" sibsp | \n",
" parch | \n",
" fare | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 491.000000 | \n",
" 491.0 | \n",
" 355.000000 | \n",
" 491.000000 | \n",
" 491.000000 | \n",
" 491.000000 | \n",
"
\n",
" \n",
" mean | \n",
" 0.242363 | \n",
" 3.0 | \n",
" 25.140620 | \n",
" 0.615071 | \n",
" 0.393075 | \n",
" 38.801976 | \n",
"
\n",
" \n",
" std | \n",
" 0.428949 | \n",
" 0.0 | \n",
" 12.495398 | \n",
" 1.374883 | \n",
" 0.888861 | \n",
" 556.628917 | \n",
"
\n",
" \n",
" min | \n",
" 0.000000 | \n",
" 3.0 | \n",
" 0.420000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 25% | \n",
" 0.000000 | \n",
" 3.0 | \n",
" 18.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 7.750000 | \n",
"
\n",
" \n",
" 50% | \n",
" 0.000000 | \n",
" 3.0 | \n",
" 24.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 8.050000 | \n",
"
\n",
" \n",
" 75% | \n",
" 0.000000 | \n",
" 3.0 | \n",
" 32.000000 | \n",
" 1.000000 | \n",
" 0.000000 | \n",
" 15.500000 | \n",
"
\n",
" \n",
" max | \n",
" 1.000000 | \n",
" 3.0 | \n",
" 74.000000 | \n",
" 8.000000 | \n",
" 6.000000 | \n",
" 12345.000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" survived pclass age sibsp parch fare\n",
"count 491.000000 491.0 355.000000 491.000000 491.000000 491.000000\n",
"mean 0.242363 3.0 25.140620 0.615071 0.393075 38.801976\n",
"std 0.428949 0.0 12.495398 1.374883 0.888861 556.628917\n",
"min 0.000000 3.0 0.420000 0.000000 0.000000 0.000000\n",
"25% 0.000000 3.0 18.000000 0.000000 0.000000 7.750000\n",
"50% 0.000000 3.0 24.000000 0.000000 0.000000 8.050000\n",
"75% 0.000000 3.0 32.000000 1.000000 0.000000 15.500000\n",
"max 1.000000 3.0 74.000000 8.000000 6.000000 12345.000000"
]
},
"execution_count": 180,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s_bool = df_titanic.loc[:, 'pclass'] == 3\n",
"df_titanic.loc[s_bool, :].describe()"
]
},
{
"cell_type": "code",
"execution_count": 164,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" survived | \n",
" pclass | \n",
" sex | \n",
" age | \n",
" sibsp | \n",
" parch | \n",
" fare | \n",
" embarked | \n",
" class | \n",
" who | \n",
" adult_male | \n",
" deck | \n",
" embark_town | \n",
" alive | \n",
" alone | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 3 | \n",
" male | \n",
" 22.0 | \n",
" 1 | \n",
" 0 | \n",
" 7.2500 | \n",
" S | \n",
" Third | \n",
" man | \n",
" True | \n",
" NaN | \n",
" Southampton | \n",
" no | \n",
" False | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 3 | \n",
" female | \n",
" 26.0 | \n",
" 0 | \n",
" 0 | \n",
" 12345.0000 | \n",
" S | \n",
" Third | \n",
" woman | \n",
" False | \n",
" NaN | \n",
" Southampton | \n",
" yes | \n",
" True | \n",
"
\n",
" \n",
" 4 | \n",
" 0 | \n",
" 3 | \n",
" male | \n",
" 35.0 | \n",
" 0 | \n",
" 0 | \n",
" 8.0500 | \n",
" S | \n",
" Third | \n",
" man | \n",
" True | \n",
" NaN | \n",
" Southampton | \n",
" no | \n",
" True | \n",
"
\n",
" \n",
" 5 | \n",
" 0 | \n",
" 3 | \n",
" male | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 8.4583 | \n",
" Q | \n",
" Third | \n",
" man | \n",
" True | \n",
" NaN | \n",
" Queenstown | \n",
" no | \n",
" True | \n",
"
\n",
" \n",
" 7 | \n",
" 0 | \n",
" 3 | \n",
" male | \n",
" 2.0 | \n",
" 3 | \n",
" 1 | \n",
" 21.0750 | \n",
" S | \n",
" Third | \n",
" child | \n",
" False | \n",
" NaN | \n",
" Southampton | \n",
" no | \n",
" False | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 882 | \n",
" 0 | \n",
" 3 | \n",
" female | \n",
" 22.0 | \n",
" 0 | \n",
" 0 | \n",
" 10.5167 | \n",
" S | \n",
" Third | \n",
" woman | \n",
" False | \n",
" NaN | \n",
" Southampton | \n",
" no | \n",
" True | \n",
"
\n",
" \n",
" 884 | \n",
" 0 | \n",
" 3 | \n",
" male | \n",
" 25.0 | \n",
" 0 | \n",
" 0 | \n",
" 7.0500 | \n",
" S | \n",
" Third | \n",
" man | \n",
" True | \n",
" NaN | \n",
" Southampton | \n",
" no | \n",
" True | \n",
"
\n",
" \n",
" 885 | \n",
" 0 | \n",
" 3 | \n",
" female | \n",
" 39.0 | \n",
" 0 | \n",
" 5 | \n",
" 29.1250 | \n",
" Q | \n",
" Third | \n",
" woman | \n",
" False | \n",
" NaN | \n",
" Queenstown | \n",
" no | \n",
" False | \n",
"
\n",
" \n",
" 888 | \n",
" 0 | \n",
" 3 | \n",
" female | \n",
" NaN | \n",
" 1 | \n",
" 2 | \n",
" 23.4500 | \n",
" S | \n",
" Third | \n",
" woman | \n",
" False | \n",
" NaN | \n",
" Southampton | \n",
" no | \n",
" False | \n",
"
\n",
" \n",
" 890 | \n",
" 0 | \n",
" 3 | \n",
" male | \n",
" 32.0 | \n",
" 0 | \n",
" 0 | \n",
" 7.7500 | \n",
" Q | \n",
" Third | \n",
" man | \n",
" True | \n",
" NaN | \n",
" Queenstown | \n",
" no | \n",
" True | \n",
"
\n",
" \n",
"
\n",
"
491 rows × 15 columns
\n",
"
"
],
"text/plain": [
" survived pclass sex age sibsp parch fare embarked class \\\n",
"0 0 3 male 22.0 1 0 7.2500 S Third \n",
"2 1 3 female 26.0 0 0 12345.0000 S Third \n",
"4 0 3 male 35.0 0 0 8.0500 S Third \n",
"5 0 3 male NaN 0 0 8.4583 Q Third \n",
"7 0 3 male 2.0 3 1 21.0750 S Third \n",
".. ... ... ... ... ... ... ... ... ... \n",
"882 0 3 female 22.0 0 0 10.5167 S Third \n",
"884 0 3 male 25.0 0 0 7.0500 S Third \n",
"885 0 3 female 39.0 0 5 29.1250 Q Third \n",
"888 0 3 female NaN 1 2 23.4500 S Third \n",
"890 0 3 male 32.0 0 0 7.7500 Q Third \n",
"\n",
" who adult_male deck embark_town alive alone \n",
"0 man True NaN Southampton no False \n",
"2 woman False NaN Southampton yes True \n",
"4 man True NaN Southampton no True \n",
"5 man True NaN Queenstown no True \n",
"7 child False NaN Southampton no False \n",
".. ... ... ... ... ... ... \n",
"882 woman False NaN Southampton no True \n",
"884 man True NaN Southampton no True \n",
"885 woman False NaN Queenstown no False \n",
"888 woman False NaN Southampton no False \n",
"890 man True NaN Queenstown no True \n",
"\n",
"[491 rows x 15 columns]"
]
},
"execution_count": 164,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# extract only rows corresponding to pclass = 3\n",
"s_bool = df_titanic.loc[:, 'pclass'] == 3\n",
"df_titanic.loc[s_bool, :]"
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 1, 2])"
]
},
"execution_count": 165,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_titanic['pclass'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pclass = 1 has:\n",
" survived pclass age sibsp parch fare\n",
"count 216.000000 216.0 186.000000 216.000000 216.000000 216.000000\n",
"mean 0.629630 1.0 38.233441 0.416667 0.356481 84.154687\n",
"std 0.484026 0.0 14.802856 0.611898 0.693997 78.380373\n",
"min 0.000000 1.0 0.920000 0.000000 0.000000 0.000000\n",
"25% 0.000000 1.0 27.000000 0.000000 0.000000 30.923950\n",
"50% 1.000000 1.0 37.000000 0.000000 0.000000 60.287500\n",
"75% 1.000000 1.0 49.000000 1.000000 0.000000 93.500000\n",
"max 1.000000 1.0 80.000000 3.000000 4.000000 512.329200\n",
"pclass = 2 has:\n",
" survived pclass age sibsp parch fare\n",
"count 184.000000 184.0 173.000000 184.000000 184.000000 184.000000\n",
"mean 0.472826 2.0 29.877630 0.402174 0.380435 20.662183\n",
"std 0.500623 0.0 14.001077 0.601633 0.690963 13.417399\n",
"min 0.000000 2.0 0.670000 0.000000 0.000000 0.000000\n",
"25% 0.000000 2.0 23.000000 0.000000 0.000000 13.000000\n",
"50% 0.000000 2.0 29.000000 0.000000 0.000000 14.250000\n",
"75% 1.000000 2.0 36.000000 1.000000 1.000000 26.000000\n",
"max 1.000000 2.0 70.000000 3.000000 3.000000 73.500000\n",
"pclass = 3 has:\n",
" survived pclass age sibsp parch fare\n",
"count 491.000000 491.0 355.000000 491.000000 491.000000 491.000000\n",
"mean 0.242363 3.0 25.140620 0.615071 0.393075 38.801976\n",
"std 0.428949 0.0 12.495398 1.374883 0.888861 556.628917\n",
"min 0.000000 3.0 0.420000 0.000000 0.000000 0.000000\n",
"25% 0.000000 3.0 18.000000 0.000000 0.000000 7.750000\n",
"50% 0.000000 3.0 24.000000 0.000000 0.000000 8.050000\n",
"75% 0.000000 3.0 32.000000 1.000000 0.000000 15.500000\n",
"max 1.000000 3.0 74.000000 8.000000 6.000000 12345.000000\n"
]
}
],
"source": [
"for pclass in sorted(df_titanic['pclass'].unique()):\n",
" # extract only rows coresponding to a particular value of feature\n",
" s_bool = df_titanic['pclass'] == pclass\n",
" df_titanic_subset = df_titanic.loc[s_bool, :]\n",
" \n",
" print(f'pclass = {pclass} has:')\n",
" print(df_titanic_subset.describe())"
]
},
{
"cell_type": "code",
"execution_count": 183,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"age = 0.83 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.0 2.00 2.000000 2.000000 2.000000\n",
"mean 1.0 2.0 0.83 0.500000 1.500000 23.875000\n",
"std 0.0 0.0 0.00 0.707107 0.707107 7.247845\n",
"min 1.0 2.0 0.83 0.000000 1.000000 18.750000\n",
"25% 1.0 2.0 0.83 0.250000 1.250000 21.312500\n",
"50% 1.0 2.0 0.83 0.500000 1.500000 23.875000\n",
"75% 1.0 2.0 0.83 0.750000 1.750000 26.437500\n",
"max 1.0 2.0 0.83 1.000000 2.000000 29.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 2.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 10.000000 10.000000 10.0 10.00000 10.000000 10.000000\n",
"mean 0.300000 2.600000 2.0 2.10000 1.300000 37.536250\n",
"std 0.483046 0.699206 0.0 1.66333 0.483046 40.979945\n",
"min 0.000000 1.000000 2.0 0.00000 1.000000 10.462500\n",
"25% 0.000000 2.250000 2.0 1.00000 1.000000 22.306250\n",
"50% 0.000000 3.000000 2.0 2.00000 1.000000 26.950000\n",
"75% 0.750000 3.000000 2.0 3.75000 1.750000 30.737500\n",
"max 1.000000 3.000000 2.0 4.00000 2.000000 151.550000\n",
"--------------------------------------------------------------------------------\n",
"age = 3.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 6.000000 6.000000 6.0 6.000000 6.000000 6.000000\n",
"mean 0.833333 2.500000 3.0 1.833333 1.333333 25.781950\n",
"std 0.408248 0.547723 0.0 1.329160 0.516398 9.489778\n",
"min 0.000000 2.000000 3.0 1.000000 1.000000 15.900000\n",
"25% 1.000000 2.000000 3.0 1.000000 1.000000 19.331250\n",
"50% 1.000000 2.500000 3.0 1.000000 1.000000 23.537500\n",
"75% 1.000000 3.000000 3.0 2.500000 1.750000 30.040625\n",
"max 1.000000 3.000000 3.0 4.000000 2.000000 41.579200\n",
"--------------------------------------------------------------------------------\n",
"age = 4.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 10.000000 10.000000 10.0 10.000000 10.000000 10.000000\n",
"mean 0.700000 2.600000 4.0 1.600000 1.400000 29.543330\n",
"std 0.483046 0.699206 0.0 1.577621 0.516398 20.263399\n",
"min 0.000000 1.000000 4.0 0.000000 1.000000 11.133300\n",
"25% 0.250000 2.250000 4.0 0.250000 1.000000 18.031250\n",
"50% 1.000000 3.000000 4.0 1.000000 1.000000 25.450000\n",
"75% 1.000000 3.000000 4.0 2.750000 2.000000 30.737500\n",
"max 1.000000 3.000000 4.0 4.000000 2.000000 81.858300\n",
"--------------------------------------------------------------------------------\n",
"age = 5.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 4.0 4.00 4.0 4.000000 4.000000 4.000000\n",
"mean 1.0 2.75 5.0 1.750000 1.250000 22.717700\n",
"std 0.0 0.50 0.0 1.707825 0.957427 8.512145\n",
"min 1.0 2.00 5.0 0.000000 0.000000 12.475000\n",
"25% 1.0 2.75 5.0 0.750000 0.750000 17.562475\n",
"50% 1.0 3.00 5.0 1.500000 1.500000 23.504150\n",
"75% 1.0 3.00 5.0 2.500000 2.000000 28.659375\n",
"max 1.0 3.00 5.0 4.000000 2.000000 31.387500\n",
"--------------------------------------------------------------------------------\n",
"age = 7.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 3.000000 3.000000 3.0 3.000000 3.000000 3.000000\n",
"mean 0.333333 2.666667 7.0 2.666667 1.333333 31.687500\n",
"std 0.577350 0.577350 0.0 2.309401 0.577350 7.075762\n",
"min 0.000000 2.000000 7.0 0.000000 1.000000 26.250000\n",
"25% 0.000000 2.500000 7.0 2.000000 1.000000 27.687500\n",
"50% 0.000000 3.000000 7.0 4.000000 1.000000 29.125000\n",
"75% 0.500000 3.000000 7.0 4.000000 1.500000 34.406250\n",
"max 1.000000 3.000000 7.0 4.000000 2.000000 39.687500\n",
"--------------------------------------------------------------------------------\n",
"age = 8.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 4.00000 4.00000 4.0 4.000000 4.00 4.000000\n",
"mean 0.50000 2.50000 8.0 2.000000 1.25 28.300000\n",
"std 0.57735 0.57735 0.0 1.825742 0.50 6.544368\n",
"min 0.00000 2.00000 8.0 0.000000 1.00 21.075000\n",
"25% 0.00000 2.00000 8.0 0.750000 1.00 24.956250\n",
"50% 0.50000 2.50000 8.0 2.000000 1.00 27.687500\n",
"75% 1.00000 3.00000 8.0 3.250000 1.25 31.031250\n",
"max 1.00000 3.00000 8.0 4.000000 2.00 36.750000\n",
"--------------------------------------------------------------------------------\n",
"age = 11.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 4.00 4.0 4.0 4.000000 4.0 4.000000\n",
"mean 0.25 2.5 11.0 2.500000 1.5 54.240625\n",
"std 0.50 1.0 0.0 2.380476 1.0 45.323004\n",
"min 0.00 1.0 11.0 0.000000 0.0 18.787500\n",
"25% 0.00 2.5 11.0 0.750000 1.5 28.153125\n",
"50% 0.00 3.0 11.0 2.500000 2.0 39.087500\n",
"75% 0.25 3.0 11.0 4.250000 2.0 65.175000\n",
"max 1.00 3.0 11.0 5.000000 2.0 120.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 14.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 6.000000 6.00000 6.0 6.00 6.000000 6.000000\n",
"mean 0.500000 2.50000 14.0 2.00 0.833333 42.625700\n",
"std 0.547723 0.83666 0.0 2.00 0.983192 40.903113\n",
"min 0.000000 1.00000 14.0 0.00 0.000000 7.854200\n",
"25% 0.000000 2.25000 14.0 1.00 0.000000 15.948975\n",
"50% 0.500000 3.00000 14.0 1.00 0.500000 34.879150\n",
"75% 1.000000 3.00000 14.0 3.25 1.750000 45.096875\n",
"max 1.000000 3.00000 14.0 5.00 2.000000 120.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 15.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 5.000000 5.000000 5.0 5.000000 5.000000 5.000000\n",
"mean 0.800000 2.600000 15.0 0.400000 0.400000 49.655020\n",
"std 0.447214 0.894427 0.0 0.547723 0.547723 90.434075\n",
"min 0.000000 1.000000 15.0 0.000000 0.000000 7.225000\n",
"25% 1.000000 3.000000 15.0 0.000000 0.000000 7.229200\n",
"50% 1.000000 3.000000 15.0 0.000000 0.000000 8.029200\n",
"75% 1.000000 3.000000 15.0 1.000000 1.000000 14.454200\n",
"max 1.000000 3.000000 15.0 1.000000 1.000000 211.337500\n",
"--------------------------------------------------------------------------------\n",
"age = 16.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 17.000000 17.000000 17.0 17.000000 17.000000 17.000000\n",
"mean 0.352941 2.529412 16.0 0.764706 0.529412 25.745100\n",
"std 0.492592 0.799816 0.0 1.521899 0.874475 22.486392\n",
"min 0.000000 1.000000 16.0 0.000000 0.000000 7.733300\n",
"25% 0.000000 2.000000 16.0 0.000000 0.000000 8.050000\n",
"50% 0.000000 3.000000 16.0 0.000000 0.000000 18.000000\n",
"75% 1.000000 3.000000 16.0 1.000000 1.000000 39.400000\n",
"max 1.000000 3.000000 16.0 5.000000 3.000000 86.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 17.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 13.000000 13.000000 13.0 13.000000 13.000000 13.000000\n",
"mean 0.461538 2.384615 17.0 0.615385 0.384615 28.389423\n",
"std 0.518875 0.869718 0.0 1.120897 0.767948 38.546345\n",
"min 0.000000 1.000000 17.0 0.000000 0.000000 7.054200\n",
"25% 0.000000 2.000000 17.0 0.000000 0.000000 7.925000\n",
"50% 0.000000 3.000000 17.0 0.000000 0.000000 8.662500\n",
"75% 1.000000 3.000000 17.0 1.000000 0.000000 14.458300\n",
"max 1.000000 3.000000 17.0 4.000000 2.000000 110.883300\n",
"--------------------------------------------------------------------------------\n",
"age = 18.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 26.000000 26.000000 26.0 26.000000 26.000000 26.000000\n",
"mean 0.346154 2.461538 18.0 0.384615 0.423077 38.063462\n",
"std 0.485165 0.760567 0.0 0.637302 0.702742 66.241829\n",
"min 0.000000 1.000000 18.0 0.000000 0.000000 6.495800\n",
"25% 0.000000 2.000000 18.0 0.000000 0.000000 7.810400\n",
"50% 0.000000 3.000000 18.0 0.000000 0.000000 11.500000\n",
"75% 1.000000 3.000000 18.0 1.000000 1.000000 19.659375\n",
"max 1.000000 3.000000 18.0 2.000000 2.000000 262.375000\n",
"--------------------------------------------------------------------------------\n",
"age = 19.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 25.000000 25.00000 25.0 25.000000 25.00000 25.000000\n",
"mean 0.360000 2.36000 19.0 0.320000 0.20000 27.869496\n",
"std 0.489898 0.81035 0.0 0.690411 0.57735 52.652311\n",
"min 0.000000 1.00000 19.0 0.000000 0.00000 0.000000\n",
"25% 0.000000 2.00000 19.0 0.000000 0.00000 7.895800\n",
"50% 0.000000 3.00000 19.0 0.000000 0.00000 10.170800\n",
"75% 1.000000 3.00000 19.0 0.000000 0.00000 26.000000\n",
"max 1.000000 3.00000 19.0 3.000000 2.00000 263.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 20.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 15.000000 15.0 15.0 15.000000 15.000000 15.000000\n",
"mean 0.200000 3.0 20.0 0.200000 0.066667 8.624173\n",
"std 0.414039 0.0 0.0 0.414039 0.258199 2.433533\n",
"min 0.000000 3.0 20.0 0.000000 0.000000 4.012500\n",
"25% 0.000000 3.0 20.0 0.000000 0.000000 7.854200\n",
"50% 0.000000 3.0 20.0 0.000000 0.000000 8.050000\n",
"75% 0.000000 3.0 20.0 0.000000 0.000000 9.362500\n",
"max 1.000000 3.0 20.0 1.000000 1.000000 15.741700\n",
"--------------------------------------------------------------------------------\n",
"age = 21.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 24.000000 24.000000 24.0 24.000000 24.000000 24.000000\n",
"mean 0.208333 2.583333 21.0 0.333333 0.208333 31.565621\n",
"std 0.414851 0.717282 0.0 0.701964 0.588230 55.340305\n",
"min 0.000000 1.000000 21.0 0.000000 0.000000 7.250000\n",
"25% 0.000000 2.000000 21.0 0.000000 0.000000 7.798950\n",
"50% 0.000000 3.000000 21.0 0.000000 0.000000 8.241650\n",
"75% 0.000000 3.000000 21.0 0.000000 0.000000 20.668750\n",
"max 1.000000 3.000000 21.0 2.000000 2.000000 262.375000\n",
"--------------------------------------------------------------------------------\n",
"age = 22.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 27.000000 27.000000 27.0 27.000000 27.000000 27.000000\n",
"mean 0.407407 2.555556 22.0 0.148148 0.222222 25.504781\n",
"std 0.500712 0.800641 0.0 0.362014 0.577350 38.015474\n",
"min 0.000000 1.000000 22.0 0.000000 0.000000 7.125000\n",
"25% 0.000000 2.500000 22.0 0.000000 0.000000 7.385400\n",
"50% 0.000000 3.000000 22.0 0.000000 0.000000 7.895800\n",
"75% 1.000000 3.000000 22.0 0.000000 0.000000 19.758350\n",
"max 1.000000 3.000000 22.0 1.000000 2.000000 151.550000\n",
"--------------------------------------------------------------------------------\n",
"age = 23.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 15.000000 15.000000 15.0 15.000000 15.000000 15.000000\n",
"mean 0.333333 2.133333 23.0 0.400000 0.266667 37.994720\n",
"std 0.487950 0.743223 0.0 0.910259 0.593617 68.585477\n",
"min 0.000000 1.000000 23.0 0.000000 0.000000 7.550000\n",
"25% 0.000000 2.000000 23.0 0.000000 0.000000 8.575000\n",
"50% 0.000000 2.000000 23.0 0.000000 0.000000 13.000000\n",
"75% 1.000000 3.000000 23.0 0.000000 0.000000 14.418750\n",
"max 1.000000 3.000000 23.0 3.000000 2.000000 263.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 24.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 30.000000 30.000000 30.0 30.000000 30.000000 30.000000\n",
"mean 0.500000 2.200000 24.0 0.500000 0.533333 43.035690\n",
"std 0.508548 0.805156 0.0 0.861034 0.973204 62.858665\n",
"min 0.000000 1.000000 24.0 0.000000 0.000000 7.050000\n",
"25% 0.000000 2.000000 24.0 0.000000 0.000000 9.750000\n",
"50% 0.500000 2.000000 24.0 0.000000 0.000000 16.400000\n",
"75% 1.000000 3.000000 24.0 1.000000 0.750000 61.126050\n",
"max 1.000000 3.000000 24.0 3.000000 3.000000 263.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 25.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 23.000000 23.000000 23.0 23.000000 23.000000 23.000000\n",
"mean 0.260870 2.434783 25.0 0.434783 0.260870 24.415765\n",
"std 0.448978 0.727767 0.0 0.506870 0.619192 34.416843\n",
"min 0.000000 1.000000 25.0 0.000000 0.000000 0.000000\n",
"25% 0.000000 2.000000 25.0 0.000000 0.000000 7.695850\n",
"50% 0.000000 3.000000 25.0 0.000000 0.000000 7.925000\n",
"75% 0.500000 3.000000 25.0 1.000000 0.000000 26.000000\n",
"max 1.000000 3.000000 25.0 1.000000 2.000000 151.550000\n",
"--------------------------------------------------------------------------------\n",
"age = 26.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 18.000000 18.000000 18.0 18.000000 18.000000 18.000000\n",
"mean 0.333333 2.666667 26.0 0.388889 0.166667 704.479861\n",
"std 0.485071 0.685994 0.0 0.607685 0.514496 2905.153854\n",
"min 0.000000 1.000000 26.0 0.000000 0.000000 7.775000\n",
"25% 0.000000 3.000000 26.0 0.000000 0.000000 7.895800\n",
"50% 0.000000 3.000000 26.0 0.000000 0.000000 12.477100\n",
"75% 1.000000 3.000000 26.0 1.000000 0.000000 24.643750\n",
"max 1.000000 3.000000 26.0 2.000000 2.000000 12345.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 27.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 18.000000 18.000000 18.0 18.000000 18.000000 18.000000\n",
"mean 0.611111 2.222222 27.0 0.222222 0.277778 30.361339\n",
"std 0.501631 0.808452 0.0 0.427793 0.669113 48.708195\n",
"min 0.000000 1.000000 27.0 0.000000 0.000000 6.975000\n",
"25% 0.000000 2.000000 27.0 0.000000 0.000000 9.121875\n",
"50% 1.000000 2.000000 27.0 0.000000 0.000000 13.000000\n",
"75% 1.000000 3.000000 27.0 0.000000 0.000000 24.750000\n",
"max 1.000000 3.000000 27.0 1.000000 2.000000 211.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 28.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 25.000000 25.000000 25.0 25.000000 25.000000 25.000000\n",
"mean 0.280000 2.320000 28.0 0.280000 0.080000 21.020160\n",
"std 0.458258 0.748331 0.0 0.541603 0.276887 18.143502\n",
"min 0.000000 1.000000 28.0 0.000000 0.000000 7.795800\n",
"25% 0.000000 2.000000 28.0 0.000000 0.000000 9.500000\n",
"50% 0.000000 2.000000 28.0 0.000000 0.000000 13.000000\n",
"75% 1.000000 3.000000 28.0 0.000000 0.000000 26.000000\n",
"max 1.000000 3.000000 28.0 2.000000 1.000000 82.170800\n",
"--------------------------------------------------------------------------------\n",
"age = 28.5 has:\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" survived pclass age sibsp parch fare\n",
"count 2.0 2.0 2.0 2.0 2.0 2.000000\n",
"mean 0.0 3.0 28.5 0.0 0.0 11.664600\n",
"std 0.0 0.0 0.0 0.0 0.0 6.272603\n",
"min 0.0 3.0 28.5 0.0 0.0 7.229200\n",
"25% 0.0 3.0 28.5 0.0 0.0 9.446900\n",
"50% 0.0 3.0 28.5 0.0 0.0 11.664600\n",
"75% 0.0 3.0 28.5 0.0 0.0 13.882300\n",
"max 0.0 3.0 28.5 0.0 0.0 16.100000\n",
"--------------------------------------------------------------------------------\n",
"age = 29.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 20.000000 20.000000 20.0 20.00000 20.000000 20.000000\n",
"mean 0.400000 2.400000 29.0 0.35000 0.350000 27.090825\n",
"std 0.502625 0.753937 0.0 0.48936 0.988087 45.554098\n",
"min 0.000000 1.000000 29.0 0.00000 0.000000 7.045800\n",
"25% 0.000000 2.000000 29.0 0.00000 0.000000 8.011450\n",
"50% 0.000000 3.000000 29.0 0.00000 0.000000 10.500000\n",
"75% 1.000000 3.000000 29.0 1.00000 0.000000 26.000000\n",
"max 1.000000 3.000000 29.0 1.00000 4.000000 211.337500\n",
"--------------------------------------------------------------------------------\n",
"age = 30.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 25.0 25.000000 25.0 25.000000 25.00 25.000000\n",
"mean 0.4 2.200000 30.0 0.240000 0.04 25.541668\n",
"std 0.5 0.816497 0.0 0.663325 0.20 28.636697\n",
"min 0.0 1.000000 30.0 0.000000 0.00 7.225000\n",
"25% 0.0 2.000000 30.0 0.000000 0.00 8.662500\n",
"50% 0.0 2.000000 30.0 0.000000 0.00 13.000000\n",
"75% 1.0 3.000000 30.0 0.000000 0.00 24.150000\n",
"max 1.0 3.000000 30.0 3.000000 1.00 106.425000\n",
"--------------------------------------------------------------------------------\n",
"age = 31.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 17.000000 17.000000 17.0 17.000000 17.000000 17.000000\n",
"mean 0.470588 2.117647 31.0 0.470588 0.352941 37.009071\n",
"std 0.514496 0.857493 0.0 0.514496 0.606339 42.809926\n",
"min 0.000000 1.000000 31.0 0.000000 0.000000 7.750000\n",
"25% 0.000000 1.000000 31.0 0.000000 0.000000 8.683300\n",
"50% 0.000000 2.000000 31.0 0.000000 0.000000 20.525000\n",
"75% 1.000000 3.000000 31.0 1.000000 1.000000 50.495800\n",
"max 1.000000 3.000000 31.0 1.000000 2.000000 164.866700\n",
"--------------------------------------------------------------------------------\n",
"age = 32.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 18.000000 18.000000 18.0 18.000000 18.000000 18.000000\n",
"mean 0.500000 2.555556 32.0 0.277778 0.055556 24.323378\n",
"std 0.514496 0.704792 0.0 0.574513 0.235702 24.060172\n",
"min 0.000000 1.000000 32.0 0.000000 0.000000 7.750000\n",
"25% 0.000000 2.000000 32.0 0.000000 0.000000 7.925000\n",
"50% 0.500000 3.000000 32.0 0.000000 0.000000 11.750000\n",
"75% 1.000000 3.000000 32.0 0.000000 0.000000 29.375000\n",
"max 1.000000 3.000000 32.0 2.000000 1.000000 76.291700\n",
"--------------------------------------------------------------------------------\n",
"age = 33.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 15.000000 15.000000 15.0 15.000000 15.000000 15.000000\n",
"mean 0.400000 2.266667 33.0 0.466667 0.333333 25.825553\n",
"std 0.507093 0.883715 0.0 0.833809 0.723747 28.179311\n",
"min 0.000000 1.000000 33.0 0.000000 0.000000 5.000000\n",
"25% 0.000000 1.500000 33.0 0.000000 0.000000 8.275000\n",
"50% 0.000000 3.000000 33.0 0.000000 0.000000 12.275000\n",
"75% 1.000000 3.000000 33.0 1.000000 0.000000 26.875000\n",
"max 1.000000 3.000000 33.0 3.000000 2.000000 90.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 34.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 15.000000 15.000000 15.0 15.000000 15.000000 15.000000\n",
"mean 0.400000 2.200000 34.0 0.333333 0.200000 16.636387\n",
"std 0.507093 0.560612 0.0 0.487950 0.414039 7.846849\n",
"min 0.000000 1.000000 34.0 0.000000 0.000000 6.495800\n",
"25% 0.000000 2.000000 34.0 0.000000 0.000000 11.750000\n",
"50% 0.000000 2.000000 34.0 0.000000 0.000000 13.000000\n",
"75% 1.000000 2.500000 34.0 1.000000 0.000000 22.000000\n",
"max 1.000000 3.000000 34.0 1.000000 1.000000 32.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 35.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 18.000000 18.000000 18.0 18.000000 18.000000 18.000000\n",
"mean 0.611111 1.833333 35.0 0.277778 0.055556 89.312500\n",
"std 0.501631 0.923548 0.0 0.460889 0.235702 157.870974\n",
"min 0.000000 1.000000 35.0 0.000000 0.000000 7.050000\n",
"25% 0.000000 1.000000 35.0 0.000000 0.000000 8.662500\n",
"50% 1.000000 1.500000 35.0 0.000000 0.000000 26.143750\n",
"75% 1.000000 3.000000 35.0 0.750000 0.000000 75.881250\n",
"max 1.000000 3.000000 35.0 1.000000 1.000000 512.329200\n",
"--------------------------------------------------------------------------------\n",
"age = 38.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 11.000000 11.000000 11.0 11.000000 11.000000 11.000000\n",
"mean 0.454545 1.818182 38.0 0.272727 0.545455 62.751509\n",
"std 0.522233 0.981650 0.0 0.467099 1.507557 72.750026\n",
"min 0.000000 1.000000 38.0 0.000000 0.000000 0.000000\n",
"25% 0.000000 1.000000 38.0 0.000000 0.000000 8.279150\n",
"50% 0.000000 1.000000 38.0 0.000000 0.000000 31.387500\n",
"75% 1.000000 3.000000 38.0 0.500000 0.000000 85.000000\n",
"max 1.000000 3.000000 38.0 1.000000 5.000000 227.525000\n",
"--------------------------------------------------------------------------------\n",
"age = nan has:\n",
" survived pclass age sibsp parch fare\n",
"count 0.0 0.0 0.0 0.0 0.0 0.0\n",
"mean NaN NaN NaN NaN NaN NaN\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min NaN NaN NaN NaN NaN NaN\n",
"25% NaN NaN NaN NaN NaN NaN\n",
"50% NaN NaN NaN NaN NaN NaN\n",
"75% NaN NaN NaN NaN NaN NaN\n",
"max NaN NaN NaN NaN NaN NaN\n",
"--------------------------------------------------------------------------------\n",
"age = 0.42 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.00 1.0 1.0 1.0000\n",
"mean 1.0 3.0 0.42 0.0 1.0 8.5167\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 1.0 3.0 0.42 0.0 1.0 8.5167\n",
"25% 1.0 3.0 0.42 0.0 1.0 8.5167\n",
"50% 1.0 3.0 0.42 0.0 1.0 8.5167\n",
"75% 1.0 3.0 0.42 0.0 1.0 8.5167\n",
"max 1.0 3.0 0.42 0.0 1.0 8.5167\n",
"--------------------------------------------------------------------------------\n",
"age = 0.67 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.00 1.0 1.0 1.0\n",
"mean 1.0 2.0 0.67 1.0 1.0 14.5\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 1.0 2.0 0.67 1.0 1.0 14.5\n",
"25% 1.0 2.0 0.67 1.0 1.0 14.5\n",
"50% 1.0 2.0 0.67 1.0 1.0 14.5\n",
"75% 1.0 2.0 0.67 1.0 1.0 14.5\n",
"max 1.0 2.0 0.67 1.0 1.0 14.5\n",
"--------------------------------------------------------------------------------\n",
"age = 0.75 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.0 2.00 2.0 2.0 2.0000\n",
"mean 1.0 3.0 0.75 2.0 1.0 19.2583\n",
"std 0.0 0.0 0.00 0.0 0.0 0.0000\n",
"min 1.0 3.0 0.75 2.0 1.0 19.2583\n",
"25% 1.0 3.0 0.75 2.0 1.0 19.2583\n",
"50% 1.0 3.0 0.75 2.0 1.0 19.2583\n",
"75% 1.0 3.0 0.75 2.0 1.0 19.2583\n",
"max 1.0 3.0 0.75 2.0 1.0 19.2583\n",
"--------------------------------------------------------------------------------\n",
"age = 0.92 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.00 1.0 1.0 1.00\n",
"mean 1.0 1.0 0.92 1.0 2.0 151.55\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 1.0 1.0 0.92 1.0 2.0 151.55\n",
"25% 1.0 1.0 0.92 1.0 2.0 151.55\n",
"50% 1.0 1.0 0.92 1.0 2.0 151.55\n",
"75% 1.0 1.0 0.92 1.0 2.0 151.55\n",
"max 1.0 1.0 0.92 1.0 2.0 151.55\n",
"--------------------------------------------------------------------------------\n",
"age = 1.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 7.000000 7.000000 7.0 7.000000 7.000000 7.000000\n",
"mean 0.714286 2.714286 1.0 1.857143 1.571429 30.005957\n",
"std 0.487950 0.487950 0.0 1.951800 0.534522 13.890034\n",
"min 0.000000 2.000000 1.0 0.000000 1.000000 11.133300\n",
"25% 0.500000 2.500000 1.0 0.500000 1.000000 18.158350\n",
"50% 1.000000 3.000000 1.0 1.000000 2.000000 37.004200\n",
"75% 1.000000 3.000000 1.0 3.000000 2.000000 39.343750\n",
"max 1.000000 3.000000 1.0 5.000000 2.000000 46.900000\n",
"--------------------------------------------------------------------------------\n",
"age = 6.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 3.000000 3.000000 3.0 3.000000 3.000000 3.000000\n",
"mean 0.666667 2.666667 6.0 1.333333 1.333333 25.583333\n",
"std 0.577350 0.577350 0.0 2.309401 0.577350 11.384868\n",
"min 0.000000 2.000000 6.0 0.000000 1.000000 12.475000\n",
"25% 0.500000 2.500000 6.0 0.000000 1.000000 21.875000\n",
"50% 1.000000 3.000000 6.0 0.000000 1.000000 31.275000\n",
"75% 1.000000 3.000000 6.0 2.000000 1.500000 32.137500\n",
"max 1.000000 3.000000 6.0 4.000000 2.000000 33.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 9.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 8.00000 8.0 8.0 8.000000 8.00000 8.000000\n",
"mean 0.25000 3.0 9.0 2.500000 1.75000 27.938537\n",
"std 0.46291 0.0 0.0 1.772811 0.46291 10.589661\n",
"min 0.00000 3.0 9.0 0.000000 1.00000 15.245800\n",
"25% 0.00000 3.0 9.0 1.000000 1.75000 19.368750\n",
"50% 0.00000 3.0 9.0 2.500000 2.00000 29.587500\n",
"75% 0.25000 3.0 9.0 4.000000 2.00000 32.134375\n",
"max 1.00000 3.0 9.0 5.000000 2.00000 46.900000\n",
"--------------------------------------------------------------------------------\n",
"age = 10.0 has:\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" survived pclass age sibsp parch fare\n",
"count 2.0 2.0 2.0 2.00000 2.0 2.00000\n",
"mean 0.0 3.0 10.0 1.50000 2.0 26.02500\n",
"std 0.0 0.0 0.0 2.12132 0.0 2.65165\n",
"min 0.0 3.0 10.0 0.00000 2.0 24.15000\n",
"25% 0.0 3.0 10.0 0.75000 2.0 25.08750\n",
"50% 0.0 3.0 10.0 1.50000 2.0 26.02500\n",
"75% 0.0 3.0 10.0 2.25000 2.0 26.96250\n",
"max 0.0 3.0 10.0 3.00000 2.0 27.90000\n",
"--------------------------------------------------------------------------------\n",
"age = 12.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.0000\n",
"mean 1.0 3.0 12.0 1.0 0.0 11.2417\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 1.0 3.0 12.0 1.0 0.0 11.2417\n",
"25% 1.0 3.0 12.0 1.0 0.0 11.2417\n",
"50% 1.0 3.0 12.0 1.0 0.0 11.2417\n",
"75% 1.0 3.0 12.0 1.0 0.0 11.2417\n",
"max 1.0 3.0 12.0 1.0 0.0 11.2417\n",
"--------------------------------------------------------------------------------\n",
"age = 13.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.000000 2.0 2.0 2.000000 2.000000\n",
"mean 1.0 2.500000 13.0 0.0 0.500000 13.364600\n",
"std 0.0 0.707107 0.0 0.0 0.707107 8.676766\n",
"min 1.0 2.000000 13.0 0.0 0.000000 7.229200\n",
"25% 1.0 2.250000 13.0 0.0 0.250000 10.296900\n",
"50% 1.0 2.500000 13.0 0.0 0.500000 13.364600\n",
"75% 1.0 2.750000 13.0 0.0 0.750000 16.432300\n",
"max 1.0 3.000000 13.0 0.0 1.000000 19.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 14.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.0000\n",
"mean 0.0 3.0 14.5 1.0 0.0 14.4542\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 0.0 3.0 14.5 1.0 0.0 14.4542\n",
"25% 0.0 3.0 14.5 1.0 0.0 14.4542\n",
"50% 0.0 3.0 14.5 1.0 0.0 14.4542\n",
"75% 0.0 3.0 14.5 1.0 0.0 14.4542\n",
"max 0.0 3.0 14.5 1.0 0.0 14.4542\n",
"--------------------------------------------------------------------------------\n",
"age = 20.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.00\n",
"mean 0.0 3.0 20.5 0.0 0.0 7.25\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 0.0 3.0 20.5 0.0 0.0 7.25\n",
"25% 0.0 3.0 20.5 0.0 0.0 7.25\n",
"50% 0.0 3.0 20.5 0.0 0.0 7.25\n",
"75% 0.0 3.0 20.5 0.0 0.0 7.25\n",
"max 0.0 3.0 20.5 0.0 0.0 7.25\n",
"--------------------------------------------------------------------------------\n",
"age = 23.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.0000\n",
"mean 0.0 3.0 23.5 0.0 0.0 7.2292\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 0.0 3.0 23.5 0.0 0.0 7.2292\n",
"25% 0.0 3.0 23.5 0.0 0.0 7.2292\n",
"50% 0.0 3.0 23.5 0.0 0.0 7.2292\n",
"75% 0.0 3.0 23.5 0.0 0.0 7.2292\n",
"max 0.0 3.0 23.5 0.0 0.0 7.2292\n",
"--------------------------------------------------------------------------------\n",
"age = 24.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.00\n",
"mean 0.0 3.0 24.5 0.0 0.0 8.05\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 0.0 3.0 24.5 0.0 0.0 8.05\n",
"25% 0.0 3.0 24.5 0.0 0.0 8.05\n",
"50% 0.0 3.0 24.5 0.0 0.0 8.05\n",
"75% 0.0 3.0 24.5 0.0 0.0 8.05\n",
"max 0.0 3.0 24.5 0.0 0.0 8.05\n",
"--------------------------------------------------------------------------------\n",
"age = 30.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.0 2.0 2.0 2.0 2.000000\n",
"mean 0.0 3.0 30.5 0.0 0.0 7.900000\n",
"std 0.0 0.0 0.0 0.0 0.0 0.212132\n",
"min 0.0 3.0 30.5 0.0 0.0 7.750000\n",
"25% 0.0 3.0 30.5 0.0 0.0 7.825000\n",
"50% 0.0 3.0 30.5 0.0 0.0 7.900000\n",
"75% 0.0 3.0 30.5 0.0 0.0 7.975000\n",
"max 0.0 3.0 30.5 0.0 0.0 8.050000\n",
"--------------------------------------------------------------------------------\n",
"age = 32.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.000000 2.0 2.0 2.000000 2.0 2.000000\n",
"mean 0.500000 2.0 32.5 0.500000 0.0 21.535400\n",
"std 0.707107 0.0 0.0 0.707107 0.0 12.070878\n",
"min 0.000000 2.0 32.5 0.000000 0.0 13.000000\n",
"25% 0.250000 2.0 32.5 0.250000 0.0 17.267700\n",
"50% 0.500000 2.0 32.5 0.500000 0.0 21.535400\n",
"75% 0.750000 2.0 32.5 0.750000 0.0 25.803100\n",
"max 1.000000 2.0 32.5 1.000000 0.0 30.070800\n",
"--------------------------------------------------------------------------------\n",
"age = 34.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.0000\n",
"mean 0.0 3.0 34.5 0.0 0.0 6.4375\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 0.0 3.0 34.5 0.0 0.0 6.4375\n",
"25% 0.0 3.0 34.5 0.0 0.0 6.4375\n",
"50% 0.0 3.0 34.5 0.0 0.0 6.4375\n",
"75% 0.0 3.0 34.5 0.0 0.0 6.4375\n",
"max 0.0 3.0 34.5 0.0 0.0 6.4375\n",
"--------------------------------------------------------------------------------\n",
"age = 36.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 22.000000 22.000000 22.0 22.000000 22.000000 22.000000\n",
"mean 0.500000 1.863636 36.0 0.363636 0.454545 59.964959\n",
"std 0.511766 0.833550 0.0 0.492366 0.800433 108.737593\n",
"min 0.000000 1.000000 36.0 0.000000 0.000000 0.000000\n",
"25% 0.000000 1.000000 36.0 0.000000 0.000000 13.000000\n",
"50% 0.500000 2.000000 36.0 0.000000 0.000000 25.075000\n",
"75% 1.000000 2.750000 36.0 1.000000 0.750000 63.281250\n",
"max 1.000000 3.000000 36.0 1.000000 2.000000 512.329200\n",
"--------------------------------------------------------------------------------\n",
"age = 36.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.0\n",
"mean 0.0 2.0 36.5 0.0 2.0 26.0\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 0.0 2.0 36.5 0.0 2.0 26.0\n",
"25% 0.0 2.0 36.5 0.0 2.0 26.0\n",
"50% 0.0 2.0 36.5 0.0 2.0 26.0\n",
"75% 0.0 2.0 36.5 0.0 2.0 26.0\n",
"max 0.0 2.0 36.5 0.0 2.0 26.0\n",
"--------------------------------------------------------------------------------\n",
"age = 37.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 6.000000 6.000000 6.0 6.000000 6.000000 6.000000\n",
"mean 0.166667 1.833333 37.0 0.833333 0.333333 29.811117\n",
"std 0.408248 0.983192 0.0 0.752773 0.516398 19.809864\n",
"min 0.000000 1.000000 37.0 0.000000 0.000000 7.925000\n",
"25% 0.000000 1.000000 37.0 0.250000 0.000000 13.690625\n",
"50% 0.000000 1.500000 37.0 1.000000 0.000000 27.850000\n",
"75% 0.000000 2.750000 37.0 1.000000 0.750000 46.840650\n",
"max 1.000000 3.000000 37.0 2.000000 1.000000 53.100000\n",
"--------------------------------------------------------------------------------\n",
"age = 39.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 14.000000 14.000000 14.0 14.000000 14.000000 14.000000\n",
"mean 0.357143 2.071429 39.0 0.428571 1.285714 36.661900\n",
"std 0.497245 0.916875 0.0 0.513553 2.054210 33.269718\n",
"min 0.000000 1.000000 39.0 0.000000 0.000000 0.000000\n",
"25% 0.000000 1.000000 39.0 0.000000 0.000000 13.000000\n",
"50% 0.000000 2.000000 39.0 0.000000 0.000000 27.562500\n",
"75% 1.000000 3.000000 39.0 1.000000 1.000000 49.743750\n",
"max 1.000000 3.000000 39.0 1.000000 5.000000 110.883300\n",
"--------------------------------------------------------------------------------\n",
"age = 40.0 has:\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" survived pclass age sibsp parch fare\n",
"count 13.000000 13.000000 13.0 13.000000 13.000000 13.000000\n",
"mean 0.461538 2.000000 40.0 0.384615 0.538462 37.109931\n",
"std 0.518875 0.912871 0.0 0.506370 1.126601 48.843768\n",
"min 0.000000 1.000000 40.0 0.000000 0.000000 0.000000\n",
"25% 0.000000 1.000000 40.0 0.000000 0.000000 9.475000\n",
"50% 0.000000 2.000000 40.0 0.000000 0.000000 15.750000\n",
"75% 1.000000 3.000000 40.0 1.000000 1.000000 31.000000\n",
"max 1.000000 3.000000 40.0 1.000000 4.000000 153.462500\n",
"--------------------------------------------------------------------------------\n",
"age = 40.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.0 2.0 2.0 2.000000 2.000000\n",
"mean 0.0 3.0 40.5 0.0 1.000000 11.125000\n",
"std 0.0 0.0 0.0 0.0 1.414214 4.772971\n",
"min 0.0 3.0 40.5 0.0 0.000000 7.750000\n",
"25% 0.0 3.0 40.5 0.0 0.500000 9.437500\n",
"50% 0.0 3.0 40.5 0.0 1.000000 11.125000\n",
"75% 0.0 3.0 40.5 0.0 1.500000 12.812500\n",
"max 0.0 3.0 40.5 0.0 2.000000 14.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 41.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 6.000000 6.00000 6.0 6.000000 6.000000 6.000000\n",
"mean 0.333333 2.50000 41.0 0.333333 1.333333 39.188883\n",
"std 0.516398 0.83666 0.0 0.816497 1.966384 47.936085\n",
"min 0.000000 1.00000 41.0 0.000000 0.000000 7.125000\n",
"25% 0.000000 2.25000 41.0 0.000000 0.000000 15.456225\n",
"50% 0.000000 3.00000 41.0 0.000000 0.500000 19.856250\n",
"75% 0.750000 3.00000 41.0 0.000000 1.750000 34.818750\n",
"max 1.000000 3.00000 41.0 2.000000 5.000000 134.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 42.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 13.000000 13.000000 13.0 13.000000 13.000000 13.000000\n",
"mean 0.461538 2.000000 42.0 0.307692 0.076923 37.125646\n",
"std 0.518875 0.816497 0.0 0.480384 0.277350 59.287239\n",
"min 0.000000 1.000000 42.0 0.000000 0.000000 7.550000\n",
"25% 0.000000 1.000000 42.0 0.000000 0.000000 8.662500\n",
"50% 0.000000 2.000000 42.0 0.000000 0.000000 13.000000\n",
"75% 1.000000 3.000000 42.0 1.000000 0.000000 27.000000\n",
"max 1.000000 3.000000 42.0 1.000000 1.000000 227.525000\n",
"--------------------------------------------------------------------------------\n",
"age = 43.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 5.000000 5.000000 5.0 5.000000 5.00000 5.000000\n",
"mean 0.200000 2.400000 43.0 0.400000 1.60000 59.797500\n",
"std 0.447214 0.894427 0.0 0.547723 2.50998 86.284285\n",
"min 0.000000 1.000000 43.0 0.000000 0.00000 6.450000\n",
"25% 0.000000 2.000000 43.0 0.000000 0.00000 8.050000\n",
"50% 0.000000 3.000000 43.0 0.000000 1.00000 26.250000\n",
"75% 0.000000 3.000000 43.0 1.000000 1.00000 46.900000\n",
"max 1.000000 3.000000 43.0 1.000000 6.00000 211.337500\n",
"--------------------------------------------------------------------------------\n",
"age = 44.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 9.000000 9.000000 9.0 9.000000 9.000000 9.000000\n",
"mean 0.333333 2.111111 44.0 0.444444 0.222222 29.758333\n",
"std 0.500000 0.927961 0.0 0.726483 0.440959 27.530949\n",
"min 0.000000 1.000000 44.0 0.000000 0.000000 7.925000\n",
"25% 0.000000 1.000000 44.0 0.000000 0.000000 8.050000\n",
"50% 0.000000 2.000000 44.0 0.000000 0.000000 26.000000\n",
"75% 1.000000 3.000000 44.0 1.000000 0.000000 27.720800\n",
"max 1.000000 3.000000 44.0 2.000000 1.000000 90.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 45.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 12.000000 12.000000 12.0 12.000000 12.000000 12.000000\n",
"mean 0.416667 2.000000 45.0 0.333333 0.583333 36.818408\n",
"std 0.514929 0.953463 0.0 0.492366 1.164500 45.311226\n",
"min 0.000000 1.000000 45.0 0.000000 0.000000 6.975000\n",
"25% 0.000000 1.000000 45.0 0.000000 0.000000 12.137500\n",
"50% 0.000000 2.000000 45.0 0.000000 0.000000 26.400000\n",
"75% 1.000000 3.000000 45.0 1.000000 1.000000 29.800000\n",
"max 1.000000 3.000000 45.0 1.000000 4.000000 164.866700\n",
"--------------------------------------------------------------------------------\n",
"age = 45.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.000000 2.0 2.0 2.0 2.000000\n",
"mean 0.0 2.000000 45.5 0.0 0.0 17.862500\n",
"std 0.0 1.414214 0.0 0.0 0.0 15.043697\n",
"min 0.0 1.000000 45.5 0.0 0.0 7.225000\n",
"25% 0.0 1.500000 45.5 0.0 0.0 12.543750\n",
"50% 0.0 2.000000 45.5 0.0 0.0 17.862500\n",
"75% 0.0 2.500000 45.5 0.0 0.0 23.181250\n",
"max 0.0 3.000000 45.5 0.0 0.0 28.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 46.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 3.0 3.000000 3.0 3.000000 3.0 3.000000\n",
"mean 0.0 1.333333 46.0 0.333333 0.0 55.458333\n",
"std 0.0 0.577350 0.0 0.577350 0.0 27.056796\n",
"min 0.0 1.000000 46.0 0.000000 0.0 26.000000\n",
"25% 0.0 1.000000 46.0 0.000000 0.0 43.587500\n",
"50% 0.0 1.000000 46.0 0.000000 0.0 61.175000\n",
"75% 0.0 1.500000 46.0 0.500000 0.0 70.187500\n",
"max 0.0 2.000000 46.0 1.000000 0.0 79.200000\n",
"--------------------------------------------------------------------------------\n",
"age = 47.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 9.000000 9.000000 9.0 9.000000 9.000000 9.000000\n",
"mean 0.111111 1.777778 47.0 0.222222 0.111111 27.601389\n",
"std 0.333333 0.971825 0.0 0.440959 0.333333 17.580570\n",
"min 0.000000 1.000000 47.0 0.000000 0.000000 7.250000\n",
"25% 0.000000 1.000000 47.0 0.000000 0.000000 14.500000\n",
"50% 0.000000 1.000000 47.0 0.000000 0.000000 25.587500\n",
"75% 0.000000 3.000000 47.0 0.000000 0.000000 38.500000\n",
"max 1.000000 3.000000 47.0 1.000000 1.000000 52.554200\n",
"--------------------------------------------------------------------------------\n",
"age = 48.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 9.000000 9.000000 9.0 9.000000 9.000000 9.000000\n",
"mean 0.666667 1.666667 48.0 0.555556 0.555556 37.893067\n",
"std 0.500000 0.866025 0.0 0.527046 1.130388 23.051910\n",
"min 0.000000 1.000000 48.0 0.000000 0.000000 7.854200\n",
"25% 0.000000 1.000000 48.0 0.000000 0.000000 25.929200\n",
"50% 1.000000 1.000000 48.0 1.000000 0.000000 34.375000\n",
"75% 1.000000 2.000000 48.0 1.000000 0.000000 52.000000\n",
"max 1.000000 3.000000 48.0 1.000000 3.000000 76.729200\n",
"--------------------------------------------------------------------------------\n",
"age = 49.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 6.000000 6.000000 6.0 6.000000 6.000000 6.000000\n",
"mean 0.666667 1.333333 49.0 0.666667 0.166667 59.929183\n",
"std 0.516398 0.816497 0.0 0.516398 0.408248 41.197694\n",
"min 0.000000 1.000000 49.0 0.000000 0.000000 0.000000\n",
"25% 0.250000 1.000000 49.0 0.250000 0.000000 33.679200\n",
"50% 1.000000 1.000000 49.0 1.000000 0.000000 66.829200\n",
"75% 1.000000 1.000000 49.0 1.000000 0.000000 86.010450\n",
"max 1.000000 3.000000 49.0 1.000000 1.000000 110.883300\n",
"--------------------------------------------------------------------------------\n",
"age = 50.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 10.000000 10.000000 10.0 10.000000 10.000000 10.000000\n",
"mean 0.500000 1.600000 50.0 0.400000 0.200000 64.025830\n",
"std 0.527046 0.699206 0.0 0.699206 0.421637 77.847144\n",
"min 0.000000 1.000000 50.0 0.000000 0.000000 8.050000\n",
"25% 0.000000 1.000000 50.0 0.000000 0.000000 11.125000\n",
"50% 0.500000 1.500000 50.0 0.000000 0.000000 27.356250\n",
"75% 1.000000 2.000000 50.0 0.750000 0.000000 93.793750\n",
"max 1.000000 3.000000 50.0 2.000000 1.000000 247.520800\n",
"--------------------------------------------------------------------------------\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"age = 51.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 7.000000 7.0 7.0 7.000000 7.000000 7.000000\n",
"mean 0.285714 2.0 51.0 0.142857 0.142857 28.752386\n",
"std 0.487950 1.0 0.0 0.377964 0.377964 29.138777\n",
"min 0.000000 1.0 51.0 0.000000 0.000000 7.054200\n",
"25% 0.000000 1.0 51.0 0.000000 0.000000 7.900000\n",
"50% 0.000000 2.0 51.0 0.000000 0.000000 12.525000\n",
"75% 0.500000 3.0 51.0 0.000000 0.000000 43.964600\n",
"max 1.000000 3.0 51.0 1.000000 1.000000 77.958300\n",
"--------------------------------------------------------------------------------\n",
"age = 52.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 6.000000 6.000000 6.0 6.000000 6.000000 6.000000\n",
"mean 0.500000 1.333333 52.0 0.500000 0.333333 51.402783\n",
"std 0.547723 0.516398 0.0 0.547723 0.516398 36.441932\n",
"min 0.000000 1.000000 52.0 0.000000 0.000000 13.000000\n",
"25% 0.000000 1.000000 52.0 0.000000 0.000000 17.750000\n",
"50% 0.500000 1.000000 52.0 0.500000 0.000000 54.383350\n",
"75% 1.000000 1.750000 52.0 1.000000 0.750000 79.304175\n",
"max 1.000000 2.000000 52.0 1.000000 1.000000 93.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 53.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.0000\n",
"mean 1.0 1.0 53.0 2.0 0.0 51.4792\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 1.0 1.0 53.0 2.0 0.0 51.4792\n",
"25% 1.0 1.0 53.0 2.0 0.0 51.4792\n",
"50% 1.0 1.0 53.0 2.0 0.0 51.4792\n",
"75% 1.0 1.0 53.0 2.0 0.0 51.4792\n",
"max 1.0 1.0 53.0 2.0 0.0 51.4792\n",
"--------------------------------------------------------------------------------\n",
"age = 54.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 8.000000 8.000000 8.0 8.000000 8.000000 8.000000\n",
"mean 0.375000 1.500000 54.0 0.500000 0.500000 44.477087\n",
"std 0.517549 0.534522 0.0 0.534522 1.069045 25.546659\n",
"min 0.000000 1.000000 54.0 0.000000 0.000000 14.000000\n",
"25% 0.000000 1.000000 54.0 0.000000 0.000000 25.250000\n",
"50% 0.000000 1.500000 54.0 0.500000 0.000000 38.931250\n",
"75% 1.000000 2.000000 54.0 1.000000 0.250000 63.871875\n",
"max 1.000000 2.000000 54.0 1.000000 3.000000 78.266700\n",
"--------------------------------------------------------------------------------\n",
"age = 55.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.000000 2.000000 2.0 2.0 2.0 2.000000\n",
"mean 0.500000 1.500000 55.0 0.0 0.0 23.250000\n",
"std 0.707107 0.707107 0.0 0.0 0.0 10.253048\n",
"min 0.000000 1.000000 55.0 0.0 0.0 16.000000\n",
"25% 0.250000 1.250000 55.0 0.0 0.0 19.625000\n",
"50% 0.500000 1.500000 55.0 0.0 0.0 23.250000\n",
"75% 0.750000 1.750000 55.0 0.0 0.0 26.875000\n",
"max 1.000000 2.000000 55.0 0.0 0.0 30.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 55.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.00\n",
"mean 0.0 3.0 55.5 0.0 0.0 8.05\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 0.0 3.0 55.5 0.0 0.0 8.05\n",
"25% 0.0 3.0 55.5 0.0 0.0 8.05\n",
"50% 0.0 3.0 55.5 0.0 0.0 8.05\n",
"75% 0.0 3.0 55.5 0.0 0.0 8.05\n",
"max 0.0 3.0 55.5 0.0 0.0 8.05\n",
"--------------------------------------------------------------------------------\n",
"age = 56.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 4.00000 4.0 4.0 4.0 4.00 4.000000\n",
"mean 0.50000 1.0 56.0 0.0 0.25 43.976025\n",
"std 0.57735 0.0 0.0 0.0 0.50 26.376280\n",
"min 0.00000 1.0 56.0 0.0 0.00 26.550000\n",
"25% 0.00000 1.0 56.0 0.0 0.00 29.659350\n",
"50% 0.50000 1.0 56.0 0.0 0.00 33.097900\n",
"75% 1.00000 1.0 56.0 0.0 0.25 47.414575\n",
"max 1.00000 1.0 56.0 0.0 1.00 83.158300\n",
"--------------------------------------------------------------------------------\n",
"age = 57.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.0 2.0 2.0 2.0 2.000000\n",
"mean 0.0 2.0 57.0 0.0 0.0 11.425000\n",
"std 0.0 0.0 0.0 0.0 0.0 1.308148\n",
"min 0.0 2.0 57.0 0.0 0.0 10.500000\n",
"25% 0.0 2.0 57.0 0.0 0.0 10.962500\n",
"50% 0.0 2.0 57.0 0.0 0.0 11.425000\n",
"75% 0.0 2.0 57.0 0.0 0.0 11.887500\n",
"max 0.0 2.0 57.0 0.0 0.0 12.350000\n",
"--------------------------------------------------------------------------------\n",
"age = 58.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 5.000000 5.0 5.0 5.0 5.000000 5.000000\n",
"mean 0.600000 1.0 58.0 0.0 0.600000 93.901660\n",
"std 0.547723 0.0 0.0 0.0 0.894427 61.946939\n",
"min 0.000000 1.0 58.0 0.0 0.000000 26.550000\n",
"25% 0.000000 1.0 58.0 0.0 0.000000 29.700000\n",
"50% 1.000000 1.0 58.0 0.0 0.000000 113.275000\n",
"75% 1.000000 1.0 58.0 0.0 1.000000 146.520800\n",
"max 1.000000 1.0 58.0 0.0 2.000000 153.462500\n",
"--------------------------------------------------------------------------------\n",
"age = 59.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.000000 2.0 2.0 2.0 2.000000\n",
"mean 0.0 2.500000 59.0 0.0 0.0 10.375000\n",
"std 0.0 0.707107 0.0 0.0 0.0 4.419417\n",
"min 0.0 2.000000 59.0 0.0 0.0 7.250000\n",
"25% 0.0 2.250000 59.0 0.0 0.0 8.812500\n",
"50% 0.0 2.500000 59.0 0.0 0.0 10.375000\n",
"75% 0.0 2.750000 59.0 0.0 0.0 11.937500\n",
"max 0.0 3.000000 59.0 0.0 0.0 13.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 60.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 4.00000 4.00 4.0 4.00 4.00000 4.000000\n",
"mean 0.50000 1.25 60.0 0.75 0.50000 55.000000\n",
"std 0.57735 0.50 0.0 0.50 0.57735 26.211353\n",
"min 0.00000 1.00 60.0 0.00 0.00000 26.550000\n",
"25% 0.00000 1.00 60.0 0.75 0.00000 35.887500\n",
"50% 0.50000 1.00 60.0 1.00 0.50000 57.125000\n",
"75% 1.00000 1.25 60.0 1.00 1.00000 76.237500\n",
"max 1.00000 2.00 60.0 1.00 1.00000 79.200000\n",
"--------------------------------------------------------------------------------\n",
"age = 61.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 3.0 3.000000 3.0 3.0 3.0 3.000000\n",
"mean 0.0 1.666667 61.0 0.0 0.0 24.019433\n",
"std 0.0 1.154701 0.0 0.0 0.0 15.410889\n",
"min 0.0 1.000000 61.0 0.0 0.0 6.237500\n",
"25% 0.0 1.000000 61.0 0.0 0.0 19.279150\n",
"50% 0.0 1.000000 61.0 0.0 0.0 32.320800\n",
"75% 0.0 2.000000 61.0 0.0 0.0 32.910400\n",
"max 0.0 3.000000 61.0 0.0 0.0 33.500000\n",
"--------------------------------------------------------------------------------\n",
"age = 62.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 4.00000 4.00 4.0 4.0 4.0 4.000000\n",
"mean 0.50000 1.25 62.0 0.0 0.0 35.900000\n",
"std 0.57735 0.50 0.0 0.0 0.0 30.357948\n",
"min 0.00000 1.00 62.0 0.0 0.0 10.500000\n",
"25% 0.00000 1.00 62.0 0.0 0.0 22.537500\n",
"50% 0.50000 1.00 62.0 0.0 0.0 26.550000\n",
"75% 1.00000 1.25 62.0 0.0 0.0 39.912500\n",
"max 1.00000 2.00 62.0 0.0 0.0 80.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 63.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.000000 2.0 2.000000 2.0 2.000000\n",
"mean 1.0 2.000000 63.0 0.500000 0.0 43.772900\n",
"std 0.0 1.414214 0.0 0.707107 0.0 48.345456\n",
"min 1.0 1.000000 63.0 0.000000 0.0 9.587500\n",
"25% 1.0 1.500000 63.0 0.250000 0.0 26.680200\n",
"50% 1.0 2.000000 63.0 0.500000 0.0 43.772900\n",
"75% 1.0 2.500000 63.0 0.750000 0.0 60.865600\n",
"max 1.0 3.000000 63.0 1.000000 0.0 77.958300\n",
"--------------------------------------------------------------------------------\n",
"age = 64.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.0 2.0 2.000000 2.000000 2.000000\n",
"mean 0.0 1.0 64.0 0.500000 2.000000 144.500000\n",
"std 0.0 0.0 0.0 0.707107 2.828427 167.584307\n",
"min 0.0 1.0 64.0 0.000000 0.000000 26.000000\n",
"25% 0.0 1.0 64.0 0.250000 1.000000 85.250000\n",
"50% 0.0 1.0 64.0 0.500000 2.000000 144.500000\n",
"75% 0.0 1.0 64.0 0.750000 3.000000 203.750000\n",
"max 0.0 1.0 64.0 1.000000 4.000000 263.000000\n",
"--------------------------------------------------------------------------------\n",
"age = 65.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 3.0 3.000000 3.0 3.0 3.000000 3.000000\n",
"mean 0.0 1.666667 65.0 0.0 0.333333 32.093067\n",
"std 0.0 1.154701 0.0 0.0 0.577350 27.536262\n",
"min 0.0 1.000000 65.0 0.0 0.000000 7.750000\n",
"25% 0.0 1.000000 65.0 0.0 0.000000 17.150000\n",
"50% 0.0 1.000000 65.0 0.0 0.000000 26.550000\n",
"75% 0.0 2.000000 65.0 0.0 0.500000 44.264600\n",
"max 0.0 3.000000 65.0 0.0 1.000000 61.979200\n",
"--------------------------------------------------------------------------------\n",
"age = 66.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.0\n",
"mean 0.0 2.0 66.0 0.0 0.0 10.5\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 0.0 2.0 66.0 0.0 0.0 10.5\n",
"25% 0.0 2.0 66.0 0.0 0.0 10.5\n",
"50% 0.0 2.0 66.0 0.0 0.0 10.5\n",
"75% 0.0 2.0 66.0 0.0 0.0 10.5\n",
"max 0.0 2.0 66.0 0.0 0.0 10.5\n",
"--------------------------------------------------------------------------------\n",
"age = 70.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 2.0 2.000000 2.0 2.000000 2.000000 2.00000\n",
"mean 0.0 1.500000 70.0 0.500000 0.500000 40.75000\n",
"std 0.0 0.707107 0.0 0.707107 0.707107 42.77996\n",
"min 0.0 1.000000 70.0 0.000000 0.000000 10.50000\n",
"25% 0.0 1.250000 70.0 0.250000 0.250000 25.62500\n",
"50% 0.0 1.500000 70.0 0.500000 0.500000 40.75000\n",
"75% 0.0 1.750000 70.0 0.750000 0.750000 55.87500\n",
"max 0.0 2.000000 70.0 1.000000 1.000000 71.00000\n",
"--------------------------------------------------------------------------------\n",
"age = 70.5 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.00\n",
"mean 0.0 3.0 70.5 0.0 0.0 7.75\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 0.0 3.0 70.5 0.0 0.0 7.75\n",
"25% 0.0 3.0 70.5 0.0 0.0 7.75\n",
"50% 0.0 3.0 70.5 0.0 0.0 7.75\n",
"75% 0.0 3.0 70.5 0.0 0.0 7.75\n",
"max 0.0 3.0 70.5 0.0 0.0 7.75\n",
"--------------------------------------------------------------------------------\n",
"age = 71.0 has:\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" survived pclass age sibsp parch fare\n",
"count 2.0 2.0 2.0 2.0 2.0 2.000000\n",
"mean 0.0 1.0 71.0 0.0 0.0 42.079200\n",
"std 0.0 0.0 0.0 0.0 0.0 10.500536\n",
"min 0.0 1.0 71.0 0.0 0.0 34.654200\n",
"25% 0.0 1.0 71.0 0.0 0.0 38.366700\n",
"50% 0.0 1.0 71.0 0.0 0.0 42.079200\n",
"75% 0.0 1.0 71.0 0.0 0.0 45.791700\n",
"max 0.0 1.0 71.0 0.0 0.0 49.504200\n",
"--------------------------------------------------------------------------------\n",
"age = 74.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.000\n",
"mean 0.0 3.0 74.0 0.0 0.0 7.775\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 0.0 3.0 74.0 0.0 0.0 7.775\n",
"25% 0.0 3.0 74.0 0.0 0.0 7.775\n",
"50% 0.0 3.0 74.0 0.0 0.0 7.775\n",
"75% 0.0 3.0 74.0 0.0 0.0 7.775\n",
"max 0.0 3.0 74.0 0.0 0.0 7.775\n",
"--------------------------------------------------------------------------------\n",
"age = 80.0 has:\n",
" survived pclass age sibsp parch fare\n",
"count 1.0 1.0 1.0 1.0 1.0 1.0\n",
"mean 1.0 1.0 80.0 0.0 0.0 30.0\n",
"std NaN NaN NaN NaN NaN NaN\n",
"min 1.0 1.0 80.0 0.0 0.0 30.0\n",
"25% 1.0 1.0 80.0 0.0 0.0 30.0\n",
"50% 1.0 1.0 80.0 0.0 0.0 30.0\n",
"75% 1.0 1.0 80.0 0.0 0.0 30.0\n",
"max 1.0 1.0 80.0 0.0 0.0 30.0\n",
"--------------------------------------------------------------------------------\n"
]
}
],
"source": [
"feat = 'age'\n",
"for val in sorted(df_titanic[feat].unique()):\n",
" # extract only rows coresponding to a particular value of feature\n",
" s_bool = df_titanic[feat] == val\n",
" df_titanic_subset = df_titanic.loc[s_bool, :]\n",
" \n",
" print(f'{feat} = {val} has:')\n",
" print(df_titanic_subset.describe(), end='\\n' + '-' * 80 + '\\n')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}