#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Felix Muzny
11/8/2022
DS 2000
Lecture 18 - sentiment analysis,
            sets,
            moving averages

Logistics:
    - Homework 8 is out
        - this is your last HW for DS 2000
        - there is a significant amount of extra credit available
        (so start early if this is something that you need)
        - due 11/18
        - yes, you are doing sentiment analysis for this homework
    
    - checking your grades:
        - most accurate: calculate it yourself from your Gradescope grades
            - HW is 90% (total HW points you got/ total HW points)
            - Quizzes are 10% (drop your lowest quiz)
        - you can also look in Canvas, but know that it lags behind Gradescope

    - No quiz this week
    
    - No lecture on Friday (Veterans Day!)
        - I'll see you all next week!
        
    - remote attendance (https://bit.ly/remote-ds2000-muzny)
    
    
 Three ways to participate (please do one of these!)
 1) via the PollEverywhere website: https://pollev.com/muzny
 2) via text: text "muzny" to the number 22333 to join the session
 3) via Poll Everywhere app (available for iOS or Android)

"""

"""
HW 7 reflection
---

Look at the sample speeds.pdf figure. 
(Or the one that you produced)

How do we feel about it?

A. looks good
B. I have questions
C. looks bad
D. That line at 0 makes me queasy
E. This graph is wrong?

Why does this look bad?
    - tons of values at zero
    - this doesn't make sense
    - means that speed was 0 for a ton of trips, we
    don't think that this many trips were just people
    getting a bike and not moving

How could we answer "what's going on with 0"?
    - investigate if this is the trips that we
    don't have station data for
    - speed is distance/duration
    -> look at both of these variables to find "where" this 
    result comes from
    - is this correct? Go investigate our calculation for bugs
    - write a little code to display trips with 0 mph speed
    -> look at these with your eyes and see if you can find patterns
"""
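
# A minimal sketch of the last idea above: filter for trips whose computed
# speed is 0 and print them so that we can inspect them. The trip records,
# the dictionary keys, and zero_speed_trips() are hypothetical, made up for
# illustration; they are not the HW 7 data or starter code.
def zero_speed_trips(trips):
    """ Return the trips whose speed (distance / duration) is 0.
    trips: list of dicts with "distance_mi" and "duration_hr" keys
    """
    zeros = []
    for trip in trips:
        # skip trips with no duration so that we never divide by zero
        if trip["duration_hr"] > 0:
            speed = trip["distance_mi"] / trip["duration_hr"]
            if speed == 0:
                zeros.append(trip)
    return zeros

example_trips = [
    {"start": "A", "end": "B", "distance_mi": 1.2, "duration_hr": 0.2},
    {"start": "C", "end": "C", "distance_mi": 0.0, "duration_hr": 0.5},
    {"start": "D", "end": "D", "distance_mi": 0.0, "duration_hr": 0.1},
]

for trip in zero_speed_trips(example_trips):
    print(trip)
# in this made-up data, the 0 mph trips start and end at the same station,
# which is one pattern worth checking for in the real data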


"""
Sentiment
----

Is anti-hero (Taylor Swift) positive, negative, neutral?
A. positive
B. negative    
C. neutral
Hypothesis: (15 pos, 80 neg, 5 neutral)

Are the following reviews:
A. positive
B. negative    
C. neutral

The movie ____________ is-forgive the critical jargon-pretty good 
(72 pos)
- most significant words are "pretty good" which is positive

Clearly, _________'s film, while riddled with glaringly 
awful mistakes, is not bad at all. 
(30 pos, 30 neg, 30 neutral)
- "glaringly awful" + "not bad" = neutral ?
- this seems hard because maybe sarcasm/something else?

You can't believe what you're looking at because it's 
so hideous to behold. The best thing here is that it's 
at least under two hours
(75 neg)
- hideous
- the best thing is actually "a bad thing"

The film is a triple-decker weirdburger from the 
twitching ears to the too-long tails that make the 
ensemble look like lemurs.

-> Cats (2019)


To measure sentiment:
    start with the easiest thing first:
        -> count how many positive and negative words we have
        -> download word lists from the internet/past research
        to load in those words
"""
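
# A minimal sketch of the word-counting approach above. The two tiny word
# lists and count_sentiment() are hypothetical stand-ins for illustration;
# real word lists are much longer and would be loaded from files.
example_pos_words = ["good", "best", "great"]
example_neg_words = ["bad", "awful", "hideous"]

def count_sentiment(text, pos_words, neg_words):
    """ Count the positive and negative words in a string.
    return - a tuple of (positive count, negative count)
    """
    pos_count = 0
    neg_count = 0
    for word in text.lower().split():
        # strip punctuation stuck to the word, like "mistakes,"
        word = word.strip(".,!?\"'")
        if word in pos_words:
            pos_count += 1
        elif word in neg_words:
            neg_count += 1
    return pos_count, neg_count

review = "riddled with glaringly awful mistakes, is not bad at all."
print(count_sentiment(review, example_pos_words, example_neg_words))
# simple counting treats the "bad" in "not bad" as negative, which is one
# reason that this review was hard to classify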


"""
Sets
---

- a data structure
    -> like: lists, dictionaries
    -> these are for storing a collection of values
    
- a set is a collection of unique values (no duplicates)
- it has no order (no indices, no keys)
- checking whether a value is in a set is super fast, even for big sets

# not so fast
if value in list:

# super fast
if value in set:
"""
# create a new set
s1 = set()

# sets don't have indices
# TypeError: 'set' object does not support item assignment
# s1[0] = 0

# can't append to a set
# AttributeError: 'set' object has no attribute 'append'
# s1.append(0)

# Add to a set!
s1.add(0)
print(s1)
s1.add(31)
print(s1)

# adding a value that's already in the set: no error, but the set doesn't change
s1.add(31)
print(s1)
s1.add(31)
s1.add(31)
print(s1) # still {0, 31}

# make a set from a list
ls = [1, 1, 1, 2, 3]
s2 = set(ls)
print(s2)

# test to see if a value is in a set
print(1 in s2)
print(9 in s2)
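
# A rough sketch of the "not so fast" vs. "super fast" claim above, using
# the standard library's timeit module. The collection size and the number
# of repetitions are arbitrary choices, and exact timings vary by machine.
import timeit

big_list = list(range(100000))
big_set = set(big_list)
target = 99999  # at the end of the list, so the list search is at its slowest

list_time = timeit.timeit(lambda: target in big_list, number=100)
set_time = timeit.timeit(lambda: target in big_set, number=100)
print("list lookups:", list_time, "seconds")
print("set lookups: ", set_time, "seconds")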

"""
Moving Averages
---
"""

# from dataproc.py (Prof. Rachlin's version of
# data_utils.py)

def avg(L):
    """ Compute the numerical average of a list of numbers.
    If list is empty, return 0.0 """
    
    if len(L) > 0:
        return sum(L) / len(L)
    else:
        return 0.0

def get_window(L, idx, window_size=1):
    """ Extract a window of values of specified size
    centered on the specified index
    L: List of values
    idx: Center index
    window_size: window size
    """
    minrange = max(idx - window_size // 2, 0)
    maxrange = idx + window_size // 2 + (window_size % 2)
    return L[minrange:maxrange]
    
def moving_average(L, window_size=1):
    """ Compute a moving average over the list L
    using the specified window size
    L: List of values
    window_size - The window size (default=1)
    return - A new list with smoothed values
    """
    mavg = []
    for i in range(len(L)):
        window = get_window(L, i, window_size)
        mavg.append(avg(window))

    return mavg

ls = [1, 2, 3, 7, 10, 12]
print(ls)
print(avg(ls)) # 5.833333333333333

# a moving average smooths out variations due to
# daily fluctuation (or specific lines that are very pos/neg)
print(get_window(ls, 2, window_size = 3)) # [2, 3, 7]

# [1.0, 2.0, 3.0, 7.0, 10.0, 12.0]
print(moving_average(ls, window_size = 1))

# index: description
# 0: average of just 1
# 1: average of 1 and 2
# 2: average of 2 and 3
# 3: average of 3 and 7
# [1.0, 1.5, 2.5, 5.0, 8.5, 11.0]
print(moving_average(ls, window_size = 2))

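# what happens w/ big window sizes?
# if window_size was 10, get_window still centers each window on its index:
# index 0 averages indexes 0-4 (5 values), index 1 averages 0-5 (6 values),
# and so on; windows near either end of the list hold fewer than 10 values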
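
# For contrast, a sketch of a trailing (backward-looking) moving average,
# where each window ends at the current index instead of being centered on it.
# trailing_moving_average() is a hypothetical helper written for these notes;
# it is not part of dataproc.py, and it assumes window_size is at least 1.
def trailing_moving_average(L, window_size=1):
    """ Average each value with up to window_size - 1 values before it. """
    result = []
    for i in range(len(L)):
        start = max(i - window_size + 1, 0)
        window = L[start:i + 1]
        result.append(sum(window) / len(window))
    return result

# with a window bigger than the list, the windows grow by one value at a time:
# 1 value, then 2, then 3, ... so the last entry is the plain average
print(trailing_moving_average([1, 2, 3, 7, 10, 12], window_size = 10))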


"""
Next time:
    - Jupyter Notebooks
    - Classes and objects
"""