DS2000 (Spring 2019, NCH) :: Lecture 8b

0. Administrivia

  1. Due today @ 9pm: HW7 (submit via Blackboard)
  2. Due before Monday's lecture: pre-class quiz (via Blackboard; feel free to use book/notes/Python)
  3. Wednesday (in practicum): in-class quiz 5 (Dictionaries, OOP; last ICQ!)
  4. Due next Friday @ 9pm: HW8 (submit via Blackboard)
  5. No Derbinsky office hour next Friday :(

OOP Example: Sentiment Analysis

Recall our goal: sentiment analysis (categorize movie reviews as positive/negative based upon positivity/negativity of the words used in the review)

Recall also that in order to score a review we had to perform several (complex) steps...

  1. Read in the vocabulary from a file (vocab = read_vocab())
  2. Read in the weights from a file (weights = read_weights())
  3. Create a dictionary from these two (vocab_dict = make_word_weight_dict(vocab, weights))
  4. Use the dictionary to score a word/collection of words

So let's think about doing this in an OOP fashion (SentimentAnalyzer)...

  • On construction (__init__), does steps 1-3 above (dictionary is stored as object state)
  • Provide a method to score a word
In [11]:
class SentimentAnalyzer:
    
    def __init__(self, vocab_fpath, weights_fpath):
        with open(vocab_fpath, 'r', encoding='utf8') as vocab, open(weights_fpath, 'r', encoding='utf8') as weights:
            self.vocab_dict = {word:float(weight) for word, weight in zip(vocab.read().split(), weights.read().split())}

    def score_word(self, word):
        return self.vocab_dict.get(word, 0)
    
    def score_collection(self, phrase):
        if not phrase:
            return 0
        else:
            return sum([self.score_word(word) for word in phrase]) / len(phrase)
In [12]:
import os

imdb_sentiment = SentimentAnalyzer(os.path.join('aclImdb', 'imdb.vocab'), os.path.join('aclImdb', 'imdbEr.txt'))
In [13]:
print(imdb_sentiment.score_word('terrible'))
-2.18077869986
In [14]:
print(imdb_sentiment.score_word('hilarious'))
0.993189582589
In [15]:
review = '''
A surprisingly beautiful movie. 
Beautifully conceived, beautifully directed, beautifully acted, beautifully acted and most beautifully photographed.....the cinematography is nothing short of splendid. 
It is a war movie but is epic in it's scope and blends romance, tragedy and comedy into a story that is as harrowing as it is provoking.
'''

print(imdb_sentiment.score_collection(review.split()))
0.36281647436158926

Isn't this nice and convenient? Benefits:

  • It can be used with other vocab/weight files
  • You don't have to keep track of lists/dictionaries to use it

OOP Example: Writing Tests!

Let's build up to you writing your own tests using Python's unittest module. If you look, every test file you've used has had the following class TestAssignment(unittest.TestCase):. Let's take a detour to explain this...

Detour: Inheritance

The basic idea is to support creating hierarchies of related classes, whereby more specialized classes retain functions from more general classes. For example...

In [22]:
class Rectangle:
    
    def __init__(self, length, width):
        self.length = length
        self.width = width
        
    def __str__(self):
        return "{} x {} (area={}, perimeter={})".format(self.length, self.width, self.area(), self.perimeter())
        
    def perimeter(self):
        return 2 * (self.length + self.width)
    
    def area(self):
        return self.length * self.width

##

print(Rectangle(3, 5))
print(Rectangle(4, 4))
3 x 5 (area=15, perimeter=16)
4 x 4 (area=16, perimeter=16)

What actually is that second example? It's a square! Which is a special kind of rectangle...

In [23]:
# Fancy way of saying that every instance of
# a square is an instance of a rectangle
# Technically: the square class *extends* the rectangle class
class Square(Rectangle):
    
    def __init__(self, side):
        Rectangle.__init__(self, side, side)

##

print(Square(4))
4 x 4 (area=16, perimeter=16)

Back to Tests!

  1. Import unittest
  2. Create a class that extends unittest.TestCase
  3. Any method(s) you write whose name begins with the word test are considered tests to be run
  4. Call unittest.main() to run the tests and provide feedback

A test is any code you want, but the most important lines create assertions, which claim something should be true and will cause a test failure if that's not the case.

In [32]:
import unittest

class TestShapes(unittest.TestCase):
    
    def test_rectangle(self):
        r = Rectangle(3, 5)
        
        self.assertEqual(r.perimeter(), 16)
        self.assertEqual(r.area(), 15)
        self.assertEqual(str(r), "3 x 5 (area=15, perimeter=16)")
    
    def test_square(self):
        r = Square(4)
        
        self.assertEqual(r.perimeter(), 16)
        self.assertEqual(r.area(), 16)
        self.assertEqual(str(r), "4 x 4 (area=16, perimeter=16)")
        
    def test_made_to_fail(self):
        self.assertFalse(True)

# There's some extra stuff here specific to notebooks
unittest.main(argv=['first-arg-is-ignored'], exit=False, verbosity=2)

# In general...
# if __name__ == '__main__':
#     unittest.main() # verbosity only if you want extra detail
test_made_to_fail (__main__.TestShapes) ... FAIL
test_rectangle (__main__.TestShapes) ... ok
test_square (__main__.TestShapes) ... ok

======================================================================
FAIL: test_made_to_fail (__main__.TestShapes)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-32-ad0a93504f8b>", line 20, in test_made_to_fail
    self.assertFalse(True)
AssertionError: True is not false

----------------------------------------------------------------------
Ran 3 tests in 0.003s

FAILED (failures=1)
Out[32]:
<unittest.main.TestProgram at 0x113dd1860>