{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# DS2500 Lesson4\n", "\n", "Jan 24, 2023\n", "\n", "Content:\n", "- defaultdict\n", "- imports\n", " - random.choices\n", "- numpy & arrays\n", "\n", "Admin:\n", "- lab\n", " - due: tomorrow weds @ 11:59 PM\n", " - still stuck? see lab digest session this weds\n", "- hw resource: office hours & piazza\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `defaultdict`\n", "\n", "Allows you to add a default value for any key which is not in the dictionary\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# ordinarily, if you try to lookup a key not in the dictionary -> KeyError\n", "normal_dict = {'a': 3, 'b': 65}\n", "# normal_dict['c']\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a defaultdict allows you to specify a default value \n", "# this default is used if one attempts to access a key not in the dictionary\n", "from collections import defaultdict\n", "\n", "def_dict = defaultdict(lambda: 10)\n", "\n", "# look, there isn't an error ... 
even though 'a' isn't a key\n", "def_dict['a']\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def_dict['aisduhfaidsuhf']" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def_dict['a'] = 300" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "defaultdict(<function __main__.<lambda>()>, {'a': 300, 'aisduhfaidsuhf': 10})" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# notice that once a missing key is accessed, it's stored with the default value\n", "# ('a' holds 300 because we reassigned it above)\n", "def_dict\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defaultdict(<function __main__.<lambda>()>,\n", "            {'a': 4, 'o': 1, 's': 4, 'u': 4, 'i': 2, 'f': 3, 'd': 3})" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# default dictionaries are useful for counting (hw0 hint ...)\n", "\n", "char_count = defaultdict(lambda: 0)\n", "for char in 'aosuifasduiasdfuasduf':\n", "    # add 1 to total number of times character is seen\n", "    char_count[char] = char_count[char] + 1\n", "    \n", "char_count\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Imports\n", "\n", "`import` statements allow us to use code stored in another file (our own local file, or maybe some module we installed).\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## `import`ing a local file\n", "\n", "Let's first make a `.py` file, adjacent to this `.ipynb`, which has some code in it:\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "s_file_contents = '''\n", "some_secret_api_key = 18451982\n", "\n", "def print_greeting(name, 
language='english'):\n", "    \"\"\" prints a greeting in english or spanish\n", "    \n", "    Args:\n", "        name (str): name to greet\n", "        language (str): 'english' or 'spanish'\n", "    \"\"\"\n", "    \n", "    str_greet_dict = {'english': 'hello {name}!',\n", "                      'spanish': 'hola {name}!'}\n", "    \n", "    # print message\n", "    str_greet = str_greet_dict[language]\n", "    print(str_greet.format(name=name))\n", "'''\n", "\n", "# this writes the string above into the file \"greet.py\" (overwriting it if the file exists)\n", "with open('greet.py', 'w') as f:\n", "    print(s_file_contents, file=f)\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "18451982" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# run the local file greet.py, put its contents in the variable \"greet\"\n", "import greet\n", "\n", "# you can access the contents of greet with a period character\n", "greet.some_secret_api_key\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello sal!\n" ] } ], "source": [ "greet.print_greeting('sal')\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hola sal!\n" ] } ], "source": [ "greet.print_greeting('sal', language='spanish')\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `import`ing is \"lazy\"\n", "\n", "In a particular Python session, a module's code is only run the first time it is imported; later imports reuse the already-loaded module. \n", "\n", "For example, the `import greet` below does not run `greet.py`, but just re-uses the same `greet` variable created by the first import. 
\n", "\n", "(note to self: modify greet.py to demonstrate)\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import greet\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'this is a new api key'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "greet.some_secret_api_key\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `import`ing a module somebody else wrote\n", "\n", "We use `import` to access code from some python module.\n", "\n", "For example, we can import the random module and its function [random.choices](https://docs.python.org/3/library/random.html#random.choices) (useful for HW0!)\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['red pill']" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import random\n", "\n", "population = 'red pill', 'blue pill'\n", "random.choices(population)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### tip: skim the documentation for other keyword arguments ... 
they're often helpful\n", "\n", "- `k`: how many random samples we draw\n", "- `weights`: how likely each item in population is to be drawn\n", "    - `weights[0]` is the weight of choosing `population[0]`\n", "    - weights are relative, so a probability distribution (values summing to 1) works as you'd expect\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['red pill', 'red pill', 'blue pill', 'blue pill']" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random.choices(population, k=4)\n", "\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('red pill', 'blue pill')" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "population" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['red pill',\n", " 'red pill',\n", " 'red pill',\n", " 'red pill',\n", " 'red pill',\n", " 'red pill',\n", " 'red pill',\n", " 'red pill',\n", " 'red pill',\n", " 'red pill']" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# red pill occurs 99% of the time, blue pill 1% of the time\n", "random.choices(population, k=10, weights=[.99, .01])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `import`\n", "\n", "Convenience syntaxes for shortening code\n", "\n", "- `from random import choices`\n", "- `import numpy as np`\n" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['a', 'c', 'c', 'c', 'b', 'a', 'c', 'b', 'c', 'b']" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# by importing with `from ... import ...` we can call the item directly (no `random.` prefix)\n", "from random import choices\n", "\n", "choices(['a', 'b', 'c'], k=10)\n" ] }, { "cell_type": "code", "execution_count": 34, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# imports numpy, but stores it as a local module variable \"np\"\n", "import numpy as np\n", "\n", "# make a numpy array (we'll see this again shortly)\n", "np.array([[1, 2],\n", " [3, 4]])\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# In Class Activity A\n", "\n", "In a particular game a die is rolled such that:\n", "- a player earns 10 points 1/2 of the time\n", "- a player earns 20 points 1/3 of the time\n", "- a player earns 30 points 1/6 of the time\n", "\n", "Create a single call to `random.choices` which simulates 4 players being assigned points as above. (Your output should be a list of 4 items)\n" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[30, 10, 10, 30]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import random\n", "\n", "population = [10, 20, 30]\n", "weights = [1/2, 1/3, 1/6]\n", "k = 4\n", "\n", "random.choices([10, 20, 30], weights=weights, k=k)\n" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "\n", "pop_weight_dict = {10: 1/2,\n", " 20: 1/3,\n", " 30: 1/6}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Rows vs Columns\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Why do we make such a fuss to represent data as arrays?\n", "\n", "Its often a convenient to consider a dataset as a two dimensional array (see below). Where:\n", "- every row corresponds to a particular **sample**\n", " - e.g. a penguin\n", "- every column corresponds to a particular **feature**\n", " - e.g. 
how heavy all penguins are\n", "- the intersection of a row and column contains the feature corresponding to a particular sample:\n", " - e.g. how heavy a particular penguin is\n", "\n", " \n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>species</th><th>island</th><th>bill_length_mm</th><th>bill_depth_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>sex</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>Adelie</td><td>Torgersen</td><td>39.1</td><td>18.7</td><td>181.0</td><td>3750.0</td><td>Male</td></tr>\n",
"<tr><th>1</th><td>Adelie</td><td>Torgersen</td><td>39.5</td><td>17.4</td><td>186.0</td><td>3800.0</td><td>Female</td></tr>\n",
"<tr><th>2</th><td>Adelie</td><td>Torgersen</td><td>40.3</td><td>18.0</td><td>195.0</td><td>3250.0</td><td>Female</td></tr>\n",
"<tr><th>3</th><td>Adelie</td><td>Torgersen</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>4</th><td>Adelie</td><td>Torgersen</td><td>36.7</td><td>19.3</td><td>193.0</td><td>3450.0</td><td>Female</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " species island bill_length_mm bill_depth_mm flipper_length_mm \\\n", "0 Adelie Torgersen 39.1 18.7 181.0 \n", "1 Adelie Torgersen 39.5 17.4 186.0 \n", "2 Adelie Torgersen 40.3 18.0 195.0 \n", "3 Adelie Torgersen NaN NaN NaN \n", "4 Adelie Torgersen 36.7 19.3 193.0 \n", "\n", " body_mass_g sex \n", "0 3750.0 Male \n", "1 3800.0 Female \n", "2 3250.0 Female \n", "3 NaN NaN \n", "4 3450.0 Female " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# (we'll cover this code next lesson, for today I just want us all to\n", "# look at a dataset together)\n", "\n", "import seaborn as sns\n", "\n", "# data source: https://github.com/mwaskom/seaborn-data/blob/master/penguins.csv\n", "df_penguin = sns.load_dataset('penguins')\n", "df_penguin.head()\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## **NumPy** (**Numerical Python**) Library\n", "* First appeared in 2006 and is the **preferred Python array implementation**.\n", "* High-performance, richly functional **_n_-dimensional array** type called **`ndarray`**. \n", "* **Written in C** and **up to 100 times faster than lists**.\n", "* Critical in big-data processing, AI applications and much more. \n", "* According to `libraries.io`, **over 450 Python libraries depend on NumPy**. \n", "* Many popular data science libraries such as Pandas, SciPy (Scientific Python) and Keras (for deep learning) are built on or depend on NumPy. \n", "\n", "Big Question:\n", "```\n", "What is an array? 
(and how is it different from a list or list of lists?)\n", "```\n", "\n", "| Array | List (Python: Dynamic Array) |\n", "|---------------------------------------|------------------------------------------------------|\n", "| Size is static (contiguous memory) | Size can be modified quickly (items are references to objects, not stored contiguously) |\n", "| Quick to compute (esp Linear Algebra) | Slower to compute (and clumsy looking code) |\n", "| contains 1 datatype | may contain different data types |\n", "\n", "In summary, arrays are faster, but more restrictive than lists.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Initializing arrays:\n", "- 1d from list / tuple\n", "- 2d from list / tuple\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3])" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "# x is a 1d array, shape (3,)\n", "x = np.array((1, 2, 3))\n", "x\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", "       [4, 5, 6]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# y is a 2d array, shape (2, 3)\n", "y = np.array([[1, 2, 3],\n", "              [4, 5, 6]])\n", "y\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Building some special matrices\n", "- zeros\n", "    - dtype\n", "    - shape\n", "- ones\n", "    - dtype\n", "    - shape\n", "- full \n", "    - dtype\n", "    - shape\n", "    - fill_value\n", "- eye\n", "    - dtype\n", "    - N\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "#### Convention: Rows First!\n", "- we describe array shape as `(n_rows, n_cols)`\n", "- we index into an array as `x[row_idx, col_idx]`\n" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 0., 0., 0., 
0.],\n", "       [0., 0., 0., 0., 0.]])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# shape = (n_rows, n_cols)\n", "# shape = (height, width)\n", "z = np.zeros((2, 5)) # wide array: 2 rows, 5 columns\n", "z\n" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1., 1., 1.],\n", "       [1., 1., 1., 1., 1.],\n", "       [1., 1., 1., 1., 1.],\n", "       [1., 1., 1., 1., 1.],\n", "       [1., 1., 1., 1., 1.],\n", "       [1., 1., 1., 1., 1.]])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "one_array = np.ones((6, 5))\n", "one_array\n" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2., 2., 2., 2., 2.],\n", "       [2., 2., 2., 2., 2.]])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# np.full(shape=(2,5), fill_value=2)\n", "two_array = np.full(shape=(2, 5), fill_value=2.0)\n", "two_array\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 0., 0.],\n", "       [0., 1., 0.],\n", "       [0., 0., 1.]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# identity matrix\n", "# square matrix with 1s on the diagonal, 0s elsewhere\n", "np.eye(3)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Building arrays whose values follow a sequence: \n", "- `arange()`\n", "- `linspace()`\n", "- `geomspace()`\n", "- `logspace()`\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 2, 4, 6, 8])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# np.arange(start (inclusive), stop (exclusive), step)\n", "np.arange(0, 10, 2)\n" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { 
"data": { "text/plain": [ "array([0. , 0.25, 0.5 , 0.75, 1. ])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# linearly spaced values np.linspace(start (inclusive), stop (inclusive), size)\n", "np.linspace(0, 1, 5)\n" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1., 3., 9., 27.])" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# geom spaced values np.geomspace(start (inclusive), stop (inclusive), size)\n", "np.geomspace(1, 27, 4)\n" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# log spaced value np.logspace(start_exp, stop_exp, size)\n", "# start = 10^start_exp, stop = 10^stop_exp\n", "np.logspace(0, 6, 7)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Array Attributes\n", "- shape\n", "- size\n", "- ndim\n", "\n", "Numpy can build arrays out of many different number types (bool, int, float). 
([see also](https://numpy.org/doc/stable/user/basics.types.html#:~:text=There%20are%205%20basic%20numerical,point%20(float)%20and%20complex.&text=NumPy%20knows%20that%20int%20refers,int_%20%2C%20bool%20means%20np.))\n", "- dtype\n", " - astype\n", "- nbytes\n" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "x = np.array([[1, 2, 3],\n", " [4, 5, 6]]) \n" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.dtype\n" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.ndim\n" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 3)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.shape\n" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# size is total number of elements\n", "x.size\n" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "48" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.nbytes\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Manipulating array shape\n", "\n", "### Diagonal\n", "\n", "The diagonal of each array is shaded below, the unshaded elements are not on the diagonal of the matrix:\n", "\n", "$$ \\begin{bmatrix}\n", "\\blacksquare & \\square & \\square\\\\\n", "\\square & \\blacksquare & \\square\\\\\n", "\\square & \\square & \\blacksquare\\\\\n", "\\square & 
\\square & \\square\\\\\n", "\\end{bmatrix} \n", "\\hspace{2cm}\n", "\\begin{bmatrix}\n", "\\blacksquare & \\square & \\square & \\square & \\square\\\\\n", "\\square & \\blacksquare & \\square& \\square & \\square\\\\\n", "\\square & \\square & \\blacksquare& \\square & \\square\\end{bmatrix}\n", "\\hspace{2cm}\n", "\\begin{bmatrix}\n", "\\blacksquare & \\square & \\square\\\\\n", "\\square & \\blacksquare & \\square\\\\\n", "\\square & \\square & \\blacksquare\n", "\\end{bmatrix} \n", "$$\n", "\n", "### Numpy methods\n", "- transpose\n", "- .reshape()\n", " - order of reshape (row or column first?)\n" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6]])" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([[1, 2, 3],\n", " [4, 5, 6]]) \n", "x\n" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 4],\n", " [2, 5],\n", " [3, 6]])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# transpose: flip across the diagonal\n", "y = x.T\n", "y\n" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6]])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x\n" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3, 4, 5, 6]])" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reshape allows us to change shape of matrix\n", "# (new matrix must have same total number of elements)\n", "x.reshape((1, 6))\n" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "# x.reshape((1, 8))\n" ] }, { "cell_type": "code", "execution_count": 61, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "z = np.arange(0, 12)\n", "z\n" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3],\n", " [ 4, 5, 6, 7],\n", " [ 8, 9, 10, 11]])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "z.reshape((3, 4))\n" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3],\n", " [ 4, 5, 6, 7],\n", " [ 8, 9, 10, 11]])" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# -1 may be used at most in the shape argument\n", "# its value will be chosen to ensure output array has same number of elements\n", "z.reshape((3, -1))\n" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "# be mindful that -1 can be replaced by some integer to keep same number of elements in array\n", "# z.reshape((5, -1))" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3],\n", " [ 4, 5, 6, 7],\n", " [ 8, 9, 10, 11]])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can fill the array across the rows first (order='C') ...\n", "z.reshape((3, 4), order='C')\n" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 3, 6, 9],\n", " [ 1, 4, 7, 10],\n", " [ 2, 5, 8, 11]])" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# or down columns first (order='F') ...\n", "z.reshape((3, 4), order='F')\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## In Class 
Activity B\n", "1. Build an array by:\n", "- getting 100 equally spaced values from 11 to 42\n", "- reshaping it into an array with 5 columns\n", "2. How much memory does the computer use to store the array above if ...\n", " - ... each item in array is a `float`\n", " - ... each item in array is an 8 bit unsigned integer `np.uint8`\n", " - is anything lost in this representation? (explain in comment please)\n", "3. (++) Build an `11x11` checkerboard matrix. A `3x3` checkerboard is shown below for reference:\n", "$$ \\begin{bmatrix} 0 & 1 & 0 \\\\ 1 & 0 & 1 \\\\ 0 & 1 & 0 \\end{bmatrix} $$\n", "- hint: try `[0, 1] * 3`, how could you use this?\n", "- hint: you can slice matrices just like tuples / lists\n", " - `x[1:3]` gets the second and third items in an array\n" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# part 1\n", "x = np.linspace(11, 42, 100).reshape((-1, 5))" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "800" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# part 2 (float)\n", "x = np.linspace(11, 42, num=100, dtype=float)\n", "x.nbytes\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "100" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# part 2 (uint8)\n", "x = np.linspace(11, 42, num=100, dtype=np.uint8)\n", "x.nbytes\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Array Indexing (slicing)\n", "\n", "You can index arrays, everything we've previously shown about `start:stop:step` indexing works for arrays too!\n" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])" ] }, "execution_count": 82, "metadata": 
{}, "output_type": "execute_result" } ], "source": [ "x = np.arange(11)\n", "x\n" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[5]\n" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 3, 4, 5])" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[2:6]\n" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 8,  9, 10])" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[-3:]\n" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4])" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[:5]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A two dimensional array requires two indices to get a value: `x[row_idx, col_idx]`\n", "\n", "(Just like our convention for rows first in shape, the row index comes first as we index into the array)\n" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0,  1,  2,  3,  4],\n", "       [ 5,  6,  7,  8,  9],\n", "       [10, 11, 12, 13, 14],\n", "       [15, 16, 17, 18, 19]])" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.arange(20).reshape((4, 5))\n", "x\n" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# row_idx=1 (second row since python starts counting at 0)\n", "# col_idx=2 (third column since python starts counting at 0)\n", "x[1, 2]\n" ] }, { "cell_type": "code", 
"execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 7])" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can start:stop:step slice either index\n", "\n", "# get a slice of rows and a constant column\n", "x[0:2, 2]\n" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10, 11])" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get a slice of columns and a constant row\n", "x[2, 0:2]\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Super useful slice syntax on arrays:\n", "(so useful it deserves its own title)\n" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3])" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# by default, the slice indexing chooses start:stop to give the entire object\n", "x = np.array([1, 2, 3])\n", "x[:]\n" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19]])" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can use this to get an entire rows or columns as needed\n", "x = np.arange(20).reshape((4, 5))\n", "x\n" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 5, 10, 15])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get the first column\n", "x[:, 0]\n" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5, 6, 7, 8, 9])" ] }, "execution_count": 58, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "# get the second row\n", "x[1, :]\n" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 3, 4],\n", " [ 8, 9],\n", " [13, 14],\n", " [18, 19]])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get the last two columns\n", "x[:, -2:]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## In Class Activity C\n", "\n", "Using a single array slice, extract all values which match each value below in the matrix `x`\n", "- 1\n", "- 2\n", "- 3\n", "- 4\n", "- 5\n", "\n", "- extract the last column of x\n", "- extract the last row of x\n", "- extract the first three elements of the last column of x\n" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [], "source": [ "x = np.array([[0., 1., 3., 0., 5., 5.],\n", " [0., 0., 3., 0., 0., 0.],\n", " [0., 2., 3., 0., 0., 0.],\n", " [0., 0., 3., 4., 4., 4.],\n", " [0., 0., 3., 4., 4., 4.],\n", " [0., 0., 3., 4., 4., 4.]])\n" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0, 1]" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[2, 1]" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3., 3., 3., 3., 3., 3.])" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[:, 2]" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "array([[4., 4., 4.],\n", " [4., 4., 4.],\n", " [4., 4., 4.]])" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[3:, 3:]" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5., 5.])" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0, -2:]" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5., 0., 0., 4., 4., 4.])" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[:, -1]" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0., 0., 3., 4., 4., 4.])" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[-1, :]" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5., 0., 0.])" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[:3, -1]" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }