#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Felix Muzny
10/25/2022
DS 2000
Lecture 14 - dictionaries, part 1

Logistics:
- Homework 6 is due Friday @ 9pm
    - Dictionaries are not required. We don't think that using them
      will make your lives easier.
- NO Quiz this week
- Friday lecture materials will be released for asynchronous viewing:
    - weds or thurs
- *IMPORTANT* On Friday, I will be holding office hours from
  9:50 - 10:55am and 1:35 - 2:40pm in RI 236. No class or office hours
  during Section 4's regular meeting time. (I will be on a train.)
- remote attendance (https://bit.ly/remote-ds2000-muzny)

Three ways to participate (please do one of these!)
1) via the PollEverywhere website: https://pollev.com/muzny
2) via text: text "muzny" to the number 22333 to join the session
3) via Poll Everywhere app (available for iOS or Android)
"""

"""
Warm-up 1: If I have the following code, what will the output be?
"""
# we do have a nested loop structure, which typically corresponds
# to a list of lists
ls = []
for i in range(3):
    # ls = []  # oops! is this an error?
    for j in range(5):
        ls.append(j)
print(ls)

"""
A. [0, 1, 2, 3, 4] <--- we reset ls each time the outer loop iterates
B. [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
    to get this, the code would have to be:
    ls = []
    for i in range(3):
        ls_inner = []
        for j in range(5):
            ls_inner.append(j)
        ls.append(ls_inner)
    print(ls)
C. [[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
D. [0, 1, 2]
E. Error
"""

"""
Warm-up 2: If I have the following code, what will its effect be?
Assume the file contains:
line 0
% line 1
line 2
% line 3
line 4
line 5
"""
file = open("assumethisfileexists.txt", "r")
for line in file:
    # line = file.readline()
    # tests to see if the line doesn't start with %
    if line[0] != "%":
        print("actual print", line)
    else:
        print(line)
        line = file.readline()
    print("at the end of the loop:", line)
file.close()

# lines that reach the "actual print" branch:
# line 0
# line 5

# Moral of the story:
# you probably (almost always) don't want
# to have file.readline() inside of a for loop
# that is reading a file
# (this will skip lines)

"""
A. This will print all lines in the file
B. This will print all lines in the file that don't start with % (not correct)
C. This will print no lines in the file
D. This will do something else
E. Error
"""

"""
Dictionaries (from lecture 12)
---
A dictionary is a *data structure*, similar to a list.
A dictionary links a *key* to a *value*.

Examples:
    student emails -> graduation year
    words -> definitions
    words -> counts of how often they occur
    names -> ages

Keys: any immutable data type (ints, floats, strings, booleans)
    keys are *unique* (no key can occur more than once)
Values: any data type
    values are not unique
"""

# Dictionary examples
# Creating a dictionary
# an empty dictionary
print("dict examples")
ages = {}

# a dictionary w/ some starting values
ages = {"Felix": 31, "Dylan": 33}
print(ages)
# number of key-value pairs
print(len(ages))

# adding a key/value pair to a dictionary
ages["Donald Duck"] = 5
print(ages)
# number of key-value pairs
print(len(ages))

# updating a key/value pair in a dictionary
ages["Felix"] = 32
ages["Felix"] = ages["Felix"] + 1
ages["Felix"] += 1  # equivalent to the line above
print(ages)
# number of key-value pairs
print(len(ages))
print()

# what if I look up a key/value that doesn't exist?
# KeyError
# print(ages["Lizzo"])

# test for a key (not a value!)
# in a dict
if "Lizzo" in ages:
    print(ages["Lizzo"])
else:
    print("oh no!")
print()

# iterating through all keys and values in a dictionary
for key in ages:
    print(key)  # key
    print(ages[key])  # value it is linked to
print()

for key, value in ages.items():
    # key = next key in the dict
    # value = ages[key]
    print(key)  # key
    print(value)  # value it is linked to
print()

"""
How is a list similar to a dictionary?
---
- we can look up values in them with [] notation
    - accessing items that exist
- and store values in them
    - for dicts: []
    - for lists: append or [] if resetting
- dicts have distinct keys and lists have distinct indexes
      0      1
    [31,    33]
    Felix  Dylan
    [31,    33]
- error if we look for a key or index that doesn't exist
- we can apply the len() function to both of them
"""

"""
Writing a function
----
Write a function, count_words, that takes in a string text as input
and returns a dictionary of word counts.
"""


def count_words(text):
    """
    Count each word in a given string.

    Parameters
    ----------
    text : str
        words separated by whitespace.

    Returns
    -------
    dict of word counts.
    """
    words = text.split()
    counts = {}
    # to update or insert a key-value pair in a dictionary
    # dict[key] = value
    for word in words:
        # I have never seen this word
        if word not in counts:
            counts[word] = 1
        # this word is in the dictionary
        else:
            counts[word] = counts[word] + 1
    return counts


test1 = "This is a sentence."
counts = count_words(test1)
# {"This": 1, "is": 1, "a": 1, "sentence.": 1}
print(counts)

test2 = "hat hat hat bat cat hat"
counts = count_words(test2)
# {"hat": 4, "bat": 1, "cat": 1}
print(counts)

"""
Writing a larger program
---
Using the movie review data (blackadam.txt and tickettoparadise.txt),
write a larger program that reads in each review for the given movie,
line by line, and does two things:
1) Lets the user ask how many times a word occurs in the reviews
2) Reports the combined word counts for all reviews for that movie

https://www.rottentomatoes.com/m/black_adam
https://www.rottentomatoes.com/m/ticket_to_paradise_2022
"""

"""
Cool-down 1
---
Go pull up the HW 6 write-up. What cleaning must be done to the
data that you are working with?
- only need to convert ints to strings where appropriate
- you probably want to skip the header
- you don't need to replace any data with 0s
"""

"""
Next time (what to expect for the asynchronous installment)
- combining dictionaries with lists
- writing another, larger program!
"""
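
"""
Sketch: one possible starting point for the larger program
---
This is a rough sketch of the "Writing a larger program" task above,
not the official solution. It shows one way to combine word counts
across several lines and to look a word up safely.
Assumptions (none of these come from the assignment):
- each line of a review file is one review
- add_counts, count_reviews, and lookup are hypothetical helper names
- words are split on whitespace only, with no cleaning of
  punctuation or capitalization
"""


def add_counts(total, text):
    """Add the word counts for one string of text into the dict total."""
    for word in text.split():
        if word in total:
            total[word] += 1
        else:
            total[word] = 1


def count_reviews(lines):
    """Return the combined word counts for a sequence of review lines."""
    total = {}
    for line in lines:
        add_counts(total, line)
    return total


def lookup(counts, word):
    """Return how many times word occurs, or 0 if it is not a key."""
    if word in counts:
        return counts[word]
    return 0


# a small in-memory stand-in for the lines of a review file
sample_reviews = ["hat hat bat", "cat hat"]
sample_counts = count_reviews(sample_reviews)
print(sample_counts)                   # {'hat': 3, 'bat': 1, 'cat': 1}
print(lookup(sample_counts, "hat"))    # 3
print(lookup(sample_counts, "Lizzo"))  # 0 (no KeyError)

# With a real file (assumed: one review per line), the same helper
# could be used directly, because a for loop over a file yields lines:
# file = open("blackadam.txt", "r")
# counts = count_reviews(file)
# file.close()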