#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Felix Muzny
11/29/2022
DS 2000
Lecture 22 - pandas and DataFrames

Logistics:
- Take the final quiz
- OH for the rest of the semester
    - 4 - 8pm
    - we're happy to help you with DS 2001 projects AND expect
    to explain your project/goal to the TA a bit to get help
    to start with :)
- remote attendance (https://bit.ly/remote-ds2000-muzny)

Three ways to participate in multiple choice questions
1) via the PollEverywhere website: https://pollev.com/muzny
2) via text: text "muzny" to the number 22333 to join the session
3) via Poll Everywhere app (available for iOS or Android)
"""

"""
Warm-up 0
----
What is an API?

A. Apple Pie Incident
B. Android Program Intervention
C. Application Programming Interface
D. Abstract Python Information
E. Automated Port Issuer
"""

"""
Warm-up 1
----
What is an API?

A. A way to programmatically get information from servers on the internet
B. A way to find the location of files on a server
C. A way to remotely boot your software so that it can be accessed online
D. A way to write programs that have no functions
E. A way to clean data files so that all the data is nice to work with
"""

# What did we do on Tuesday (11/22/22)?

"""
Warm-up 2
---
Please spend 10 minutes filling out your trace evaluations.
-> you have *separate* evaluations for DS 2000 & DS 2001
-> I do read these at the end of the semester!
-> tell me specifically:
    -> what did you like?
    -> what did you wish had been different?

If you have already done your trace evaluations, consider the below problem:

The time module allows you to ask your computer what time it currently is.

How much time does it take to sort a list? Does it matter if
the list is already sorted or not?
"""

import time
import random

# ls = list(range(100))
# # to shuffle a list into a random order
# # with a mutator function
# # random.shuffle(ls)
# # print(ls)

# start = time.time()

# # your code here

# end = time.time()
# duration = end - start
# print("That took:", duration)

# We'll work from here on Friday!
# we'll take a look at different data structures
# and functions and test out how fast/efficient they are/are not


"""
Data Science in the real world: a case study
---
All data science projects are motivated by a question.

We'll take a look at the question "is this map
unconstitutionally gerrymandered?"
"""


"""
pandas & DataFrames
---
what is it most useful for?
-> if you're going on to DS 2500
    -> you'll likely be working with dataframes for ~80% of the semester
    -> https://course.ccs.neu.edu/ds2500/
-> if you're doing other data science projects in the future and want
a bit more analytical power
-> if you're curious about working with big data sets efficiently
"""

"""
pandas & DataFrames
----
What is pandas?

What is a DataFrame?
"""

import pandas as pd

# read in a file
# movies.csv, boston_earnings.csv, trips.csv
# (assumes the chosen file sits in the working directory;
# either of the other two files listed above works the same way)
df = pd.read_csv("movies.csv")

# ask the data frame what its shape is

# what columns does it have

# get a summary of the data frame as a whole

# access a single column

# get the max, min, average of a certain column

# base stats for every column in the dataframe

# find rows that meet a specific condition


"""
DataFrames + Jupyter notebooks
---
Jupyter Notebooks will automatically display DataFrames nicely for us.

This makes Jupyter Notebooks a natural choice to use when doing
data science investigations that use DataFrames.
"""

# Next Time
# ---
# - Timing Experiments
# - More Data Science Applications
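
"""
Timing sketch: sorting a list
---
One possible way to fill in the Warm-up 2 timing skeleton (start, "your code
here", end, duration). This is a sketch under stated assumptions, not the
in-class solution. Not in the original notes: the helper name time_sort, the
list size n = 100_000, and time.perf_counter() swapped in for time.time()
because it is designed for measuring short durations.
"""

```python
import random
import time


def time_sort(values):
    """Sort a copy of `values` once; return (sorted list, seconds elapsed).

    time_sort is a hypothetical helper name chosen for this sketch.
    """
    start = time.perf_counter()
    result = sorted(values)  # sorted() builds a new list; `values` is untouched
    end = time.perf_counter()
    return result, end - start


# n is an arbitrary size picked for illustration (an assumption)
n = 100_000

already_sorted = list(range(n))

shuffled = list(range(n))
# shuffle the list into a random order with a mutator function
random.shuffle(shuffled)

_, sorted_duration = time_sort(already_sorted)
result, shuffled_duration = time_sort(shuffled)

print("Sorting an already-sorted list took:", sorted_duration)
print("Sorting a shuffled list took:", shuffled_duration)
```

# A single timing is noisy, so it is worth running this several times
# (and with different values of n) before deciding whether the starting
# order matters.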