DS2000 (Spring 2019, NCH) :: Lecture 8b¶

0. Administrivia¶

Due today @ 9pm: HW7 (submit via Blackboard)
Due before Monday's lecture: pre-class quiz (via Blackboard; feel free to use book/notes/Python)
Wednesday (in practicum): in-class quiz 5 (Dictionaries, OOP; last ICQ!)
Due next Friday @ 9pm: HW8 (submit via Blackboard)
No Derbinsky office hour next Friday :(

OOP Example: Sentiment Analysis¶

Source: http://ai.stanford.edu/~amaas/data/sentiment/

Recall our goal: sentiment analysis (categorize movie reviews as positive/negative based upon positivity/negativity of the words used in the review)

Recall also that in order to score a review we had to perform several (complex) steps...

Read in the vocabulary from a file (vocab = read_vocab())
Read in the weights from a file (weights = read_weights())
Create a dictionary from these two (vocab_dict = make_word_weight_dict(vocab, weights))
Use the dictionary to score a word/collection of words

So let's think about doing this in an OOP fashion (SentimentAnalyzer)...

On construction (__init__), does steps 1-3 above (dictionary is stored as object state)
Provide a method to score a word

class SentimentAnalyzer:
    
    def __init__(self, vocab_fpath, weights_fpath):
        with open(vocab_fpath, 'r', encoding='utf8') as vocab, open(weights_fpath, 'r', encoding='utf8') as weights:
            self.vocab_dict = {word:float(weight) for word, weight in zip(vocab.read().split(), weights.read().split())}

    def score_word(self, word):
        return self.vocab_dict.get(word, 0)
    
    def score_collection(self, phrase):
        if not phrase:
            return 0
        else:
            return sum([self.score_word(word) for word in phrase]) / len(phrase)

import os

imdb_sentiment = SentimentAnalyzer(os.path.join('aclImdb', 'imdb.vocab'), os.path.join('aclImdb', 'imdbEr.txt'))

print(imdb_sentiment.score_word('terrible'))

-2.18077869986

print(imdb_sentiment.score_word('hilarious'))

0.993189582589

review = '''
A surprisingly beautiful movie. 
Beautifully conceived, beautifully directed, beautifully acted, beautifully acted and most beautifully photographed.....the cinematography is nothing short of splendid. 
It is a war movie but is epic in it's scope and blends romance, tragedy and comedy into a story that is as harrowing as it is provoking.
'''

print(imdb_sentiment.score_collection(review.split()))

0.36281647436158926

Isn't this nice and convenient? Benefits:

It can be used with other vocab/weight files
You don't have to keep track of lists/dictionaries to use it

OOP Example: Writing Tests!¶

Let's build up to you writing your own tests using Python's unittest module. If you look, every test file you've used has had the following class TestAssignment(unittest.TestCase):. Let's take a detour to explain this...

Detour: Inheritance¶

The basic idea is to support creating hierarchies of related classes, whereby more specialized classes retain functions from more general classes. For example...

class Rectangle:
    
    def __init__(self, length, width):
        self.length = length
        self.width = width
        
    def __str__(self):
        return "{} x {} (area={}, perimeter={})".format(self.length, self.width, self.area(), self.perimeter())
        
    def perimeter(self):
        return 2 * (self.length + self.width)
    
    def area(self):
        return self.length * self.width

##

print(Rectangle(3, 5))
print(Rectangle(4, 4))

3 x 5 (area=15, perimeter=16)
4 x 4 (area=16, perimeter=16)

What actually is that second example? It's a square! Which is a special kind of rectangle...

# Fancy way of saying that every instance of
# a square is an instance of a rectangle
# Technically: the square class *extends* the rectangle class
class Square(Rectangle):
    
    def __init__(self, side):
        Rectangle.__init__(self, side, side)

##

print(Square(4))

4 x 4 (area=16, perimeter=16)

Back to Tests!¶

Import unittest
Create a class that extends unittest.TestCase
Any method(s) you write whose name begins with the word test are considered tests to be run
Call unittest.main() to run the tests and provide feedback

A test is any code you want, but the most important lines create assertions, which claim something should be true and will cause a test failure if that's not the case.

import unittest

class TestShapes(unittest.TestCase):
    
    def test_rectangle(self):
        r = Rectangle(3, 5)
        
        self.assertEqual(r.perimeter(), 16)
        self.assertEqual(r.area(), 15)
        self.assertEqual(str(r), "3 x 5 (area=15, perimeter=16)")
    
    def test_square(self):
        r = Square(4)
        
        self.assertEqual(r.perimeter(), 16)
        self.assertEqual(r.area(), 16)
        self.assertEqual(str(r), "4 x 4 (area=16, perimeter=16)")
        
    def test_made_to_fail(self):
        self.assertFalse(True)

# There's some extra stuff here specific to notebooks
unittest.main(argv=['first-arg-is-ignored'], exit=False, verbosity=2)

# In general...
# if __name__ == '__main__':
#     unittest.main() # verbosity only if you want extra detail

test_made_to_fail (__main__.TestShapes) ... FAIL
test_rectangle (__main__.TestShapes) ... ok
test_square (__main__.TestShapes) ... ok

======================================================================
FAIL: test_made_to_fail (__main__.TestShapes)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-32-ad0a93504f8b>", line 20, in test_made_to_fail
    self.assertFalse(True)
AssertionError: True is not false

----------------------------------------------------------------------
Ran 3 tests in 0.003s

FAILED (failures=1)

<unittest.main.TestProgram at 0x113dd1860>