Recall our goal: sentiment analysis (categorize movie reviews as positive/negative based upon positivity/negativity of the words used in the review)
Recall also that in order to score a review we had to perform several (complex) steps...
vocab = read_vocab()
)weights = read_weights()
)vocab_dict = make_word_weight_dict(vocab, weights)
)So let's think about doing this in an OOP fashion (SentimentAnalyzer
)...
__init__
), does steps 1-3 above (dictionary is stored as object state)class SentimentAnalyzer:
def __init__(self, vocab_fpath, weights_fpath):
with open(vocab_fpath, 'r', encoding='utf8') as vocab, open(weights_fpath, 'r', encoding='utf8') as weights:
self.vocab_dict = {word:float(weight) for word, weight in zip(vocab.read().split(), weights.read().split())}
def score_word(self, word):
return self.vocab_dict.get(word, 0)
def score_collection(self, phrase):
if not phrase:
return 0
else:
return sum([self.score_word(word) for word in phrase]) / len(phrase)
import os
imdb_sentiment = SentimentAnalyzer(os.path.join('aclImdb', 'imdb.vocab'), os.path.join('aclImdb', 'imdbEr.txt'))
print(imdb_sentiment.score_word('terrible'))
print(imdb_sentiment.score_word('hilarious'))
review = '''
A surprisingly beautiful movie.
Beautifully conceived, beautifully directed, beautifully acted, beautifully acted and most beautifully photographed.....the cinematography is nothing short of splendid.
It is a war movie but is epic in it's scope and blends romance, tragedy and comedy into a story that is as harrowing as it is provoking.
'''
print(imdb_sentiment.score_collection(review.split()))
Isn't this nice and convenient? Benefits:
Let's build up to you writing your own tests using Python's unittest
module. If you look, every test file you've used has had the following class TestAssignment(unittest.TestCase):
. Let's take a detour to explain this...
The basic idea is to support creating hierarchies of related classes, whereby more specialized classes retain functions from more general classes. For example...
class Rectangle:
def __init__(self, length, width):
self.length = length
self.width = width
def __str__(self):
return "{} x {} (area={}, perimeter={})".format(self.length, self.width, self.area(), self.perimeter())
def perimeter(self):
return 2 * (self.length + self.width)
def area(self):
return self.length * self.width
##
print(Rectangle(3, 5))
print(Rectangle(4, 4))
What actually is that second example? It's a square! Which is a special kind of rectangle...
# Fancy way of saying that every instance of
# a square is an instance of a rectangle
# Technically: the square class *extends* the rectangle class
class Square(Rectangle):
def __init__(self, side):
Rectangle.__init__(self, side, side)
##
print(Square(4))
unittest
unittest.TestCase
test
are considered tests to be rununittest.main()
to run the tests and provide feedbackA test is any code you want, but the most important lines create assertions, which claim something should be true and will cause a test failure if that's not the case.
import unittest
class TestShapes(unittest.TestCase):
def test_rectangle(self):
r = Rectangle(3, 5)
self.assertEqual(r.perimeter(), 16)
self.assertEqual(r.area(), 15)
self.assertEqual(str(r), "3 x 5 (area=15, perimeter=16)")
def test_square(self):
r = Square(4)
self.assertEqual(r.perimeter(), 16)
self.assertEqual(r.area(), 16)
self.assertEqual(str(r), "4 x 4 (area=16, perimeter=16)")
def test_made_to_fail(self):
self.assertFalse(True)
# There's some extra stuff here specific to notebooks
unittest.main(argv=['first-arg-is-ignored'], exit=False, verbosity=2)
# In general...
# if __name__ == '__main__':
# unittest.main() # verbosity only if you want extra detail