# DS2500 Day 21

Mar 31, 2023

### Content
- Weather API
    - timestamps
    - representing a tree via nested dictionaries (json format)
    - making API calls
- üêÇüí© visualizations
    - the colorful language, [it isn't mine](https://www.callingbullshit.org/), though it does add a certain excitement to our ordinarily dry technical lingo

### Admin
- focus on the project!
    - presentation preferences due next monday @ 9AM
        - https://piazza.com/class/lbxsbawi9yq2f9/post/403
    - lab sessions next week will support hw7
        - it's optional, no need to go if you don't feel the need
        - hw7 is due next monday (April 3) to allow you to ask questions in lab

    - no classes next week, sign up for a project team meeting with Prof Higger
        - link on course website

# Timestamps
## Unix Time

- [UTC 00:00](https://en.wikipedia.org/wiki/UTC%2B00:00) Coordinated Universal Time's "zero" timezone
    - time zone at 0 deg longitude
        - how is 0 deg longitude defined?  
            - A successfully warring empire (United Kingdom) chose it
                - (personally, I'd find it convenient if a metric system loving empire had been more successful at war ...)
- [Unix Time](https://en.wikipedia.org/wiki/Unix_time) is the number of seconds which have passed since 00:00:00 UTC on 1 Jan 1970 (ignoring leap seconds)
- Unix Time is time zone agnostic
    - (more on this next lesson...)

## Python's `datetime` & `timedelta`
- helpful for all those pesky unit conversions

In [2]:
from datetime import datetime, timedelta

utc_example = 1613286000

# WARNING! assumes the time zone of the machine it's running on!
# (we'll see this issue again later ...)
dt0 = datetime.fromtimestamp(utc_example)
dt0

datetime.datetime(2021, 2, 14, 2, 0)

[further reading](https://docs.python.org/3/library/datetime.html#aware-and-naive-objects) on why the datetime above is timezone "naive" (it carries no time zone information).
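
The Unix Time definition above can be checked by hand: subtracting the epoch from a timezone-aware datetime should give the timestamp back. A minimal sketch using only the standard library (the names `epoch` and `dt_utc` are chosen here for illustration):

```python
from datetime import datetime, timezone

# the Unix epoch as a timezone-aware datetime in UTC
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# the instant stored in utc_example above, written out in UTC
dt_utc = datetime(2021, 2, 14, 7, 0, tzinfo=timezone.utc)

# seconds elapsed since the epoch
seconds_since_epoch = (dt_utc - epoch).total_seconds()
print(seconds_since_epoch)

# aware datetimes convert back to Unix Time without guessing a time zone
print(dt_utc.timestamp())
```

The naive `dt0` above shows 02:00 rather than 07:00, which is consistent with a machine running five hours behind UTC.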

In [3]:
# we can access meaningful date attributes of a datetime object
# year, month, day, hour, minute, second
dt0.month, dt0.day

(2, 14)

In [4]:
dt0.second

0

In [5]:
dt0.year

2021

In [26]:
dt1 = datetime.now()
dt1

datetime.datetime(2023, 3, 31, 10, 8, 31, 404973)

In [7]:
dt2 = datetime(year=2222, month=2, day=2, hour=2)
dt2

datetime.datetime(2222, 2, 2, 2, 0)

In [8]:
# timedeltas measure the difference between two datetimes:
dt2 - dt1

datetime.timedelta(days=72625, seconds=57279, microseconds=96793)

In [16]:
# you can build them explicitly:
offset = timedelta(days=2, seconds=123456)
offset

datetime.timedelta(days=3, seconds=37056)

In [17]:
# and operate with them (datetime + timedelta = datetime)
dt2 + offset

datetime.datetime(2222, 2, 5, 12, 17, 36)

In [27]:
# how many seconds old are you?
(datetime.now() - datetime(year=2000, month=1, day=1)).total_seconds()

733572534.414235

In [29]:
# you've got a billionth second birthday coming up around age 31 or so:
billion_sec = timedelta(seconds=1e9)
billion_sec.days/365

31.70958904109589

In [30]:
# be sure to compute below so you can plan that party 10 years from now
datetime(year=2000, month=1, day=1) + billion_sec

datetime.datetime(2031, 9, 9, 1, 46, 40)

# `pd.to_datetime()`

Use pandas's [pd.to_datetime()](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) function to convert a column of your dataframe to datetime objects.  

`to_datetime()` does a pretty good job guessing your format, but if it runs into trouble you've always got [strftime & strptime](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)

In [33]:
import pandas as pd

s = pd.Series(['dec 23, 2000', 'jan 1, 2039 3AM', 'jan 14, 2039 14:00', 'not really a toHFDSOAIUSHFIAUime'])

s

0                        dec 23, 2000
1                     jan 1, 2039 3AM
2                  jan 14, 2039 14:00
3    not really a toHFDSOAIUSHFIAUime
dtype: object

In [35]:
# the errors='coerce' argument yields "NaT" (not-a-time) objects for inputs which
# can't be converted.  without it, to_datetime raises an error
pd.to_datetime(s, errors='coerce')

0   2000-12-23 00:00:00
1   2039-01-01 03:00:00
2   2039-01-14 14:00:00
3                   NaT
dtype: datetime64[ns]
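
When guessing fails or is ambiguous (is `03/04/2039` in March or April?), an explicit format string removes the doubt. A small sketch with the standard library's `strptime` and `strftime`; the date string and the day-first reading are assumptions made up for this example:

```python
from datetime import datetime

# parse: the format string spells out day / month / year order explicitly
dt = datetime.strptime('03/04/2039 14:00', '%d/%m/%Y %H:%M')

# format: write the same instant back out in a different layout
text = dt.strftime('%Y-%m-%d %H:%M')

print(dt.month, text)
```

`pd.to_datetime()` accepts the same format codes through its `format` argument.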

# Do you know what time it is?
[after reading this, I'm not sure that I do ...](https://www.creativedeletion.com/2015/01/28/falsehoods-programmers-date-time-zones.html)

takeaways:
- don't underestimate the difficulty of describing time unambiguously, it's hard!
- use a library where you might run into time issues:
    - timezones
    - leap year / second
    - varying days of month / year
    - time formatting 'Feb' vs 'February' etc
- Unix Time isn't human readable ... but it is unambiguous.  

### Punchline: measuring time is hard, don't underestimate it (I certainly have!)

## Representing Trees as Lists & Dictionaries
- useful for representing a tree of data
- (our API calls will return nested dictionaries)

<img src="https://i.ibb.co/Pmxqpb3/tree-ex.png" alt="Drawing" style="width: 400px;"/>

In [39]:
red_branch_dict = {'a': 0, 'b': 1, 'c': 2}
blu_branch_dict = {'x': 24, 'y': 25, 'z': 26}
tree_dict = {'f': red_branch_dict,
             'g': blu_branch_dict}
tree_dict

{'f': {'a': 0, 'b': 1, 'c': 2}, 'g': {'x': 24, 'y': 25, 'z': 26}}

In [40]:
tree_dict['f']['b']

1

<img src="https://i.ibb.co/4SSH4mm/tree-ex2.png" alt="Drawing" style="width: 600px;"/>

In [44]:
dict0 = {'num': [3, 1, 4, 1, 5, 9, 2, 6],
        'letter': 'C'}
dict1 = {'num': 17,
        'letter': 'R'}
dict2 = {'num': 21,
        'letter': 'S'}
dict_of_dict = {0: dict0,
                1: dict1,
                2: dict2}
list_of_dict = [dict0, dict1, dict2]

In [46]:
dict_of_dict[1]['letter']

'R'

In [45]:
list_of_dict[1]['letter']

'R'
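
Indexing works when you already know the path to a value. To see every path in an unfamiliar tree, a recursive walk helps. The helper below, `iter_leaves`, is a hypothetical name written for this sketch (it is not part of any library):

```python
def iter_leaves(node, path=()):
    """Yield (path, value) pairs for every leaf of a nested dict / list tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_leaves(child, path + (key,))
    elif isinstance(node, list):
        for idx, child in enumerate(node):
            yield from iter_leaves(child, path + (idx,))
    else:
        # anything that isn't a dict or list is treated as a leaf
        yield path, node


tree_dict = {'f': {'a': 0, 'b': 1, 'c': 2},
             'g': {'x': 24, 'y': 25, 'z': 26}}

leaves = list(iter_leaves(tree_dict))
for path, value in leaves:
    print(path, value)
```

Each printed path is exactly the sequence of keys you would chain in square brackets, e.g. `('f', 'b')` corresponds to `tree_dict['f']['b']`.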

## In class activity 1:
1. Express all of the following penguin group's height and weight as a list of dictionaries:
<img src="https://i.ibb.co/XXzX4Wk/penguin-tree.png" alt="Drawing" style="width: 700px;"/>

In [51]:
grn_dict = {'height': [2, 3, 2],
            'weight': [4, 6, 5]}
blu_dict = {'height': [4, 6, 5],
            'weight': [2, 3, 2]}
red_dict = {'height': [10, 7, 8],
            'weight': [11, 6, 5]}
peng_data = [grn_dict,
             blu_dict,
             red_dict]
peng_data2 = {'penguin0': grn_dict,
              'penguin1': blu_dict,
              'penguin2': red_dict}

In [49]:
peng_data[0]['height']

[2, 3, 2]

# API
###  Definitions
**API** Application Programming Interface
 - within DS: a server which gives out data (often over the internet)
 - note: 'API', in general, refers to the boundary between two pieces of software; within DS it has the more specific meaning above
 
 
 **JSON** JavaScript Object Notation
  - a method of storing objects as text
  - much like the nested dictionaries ... JSON and similar formats are often trees
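
To make "storing objects as text" concrete, here is a small round trip with the standard `json` module, reusing the `tree_dict` example from the previous section:

```python
import json

tree_dict = {'f': {'a': 0, 'b': 1, 'c': 2},
             'g': {'x': 24, 'y': 25, 'z': 26}}

# dict -> JSON text -> dict
tree_text = json.dumps(tree_dict)
restored = json.loads(tree_text)
print(type(tree_text), restored == tree_dict)

# caveat: JSON object keys are always strings, so int keys come back as str
int_keyed = json.loads(json.dumps({0: 'C', 1: 'R'}))
print(int_keyed)
```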

## OpenWeather API
What information does this offer?

- [https://openweathermap.org/api](https://openweathermap.org/api)
- (note, you won't have access to all that, see the "free" column [here](https://openweathermap.org/price))

How do I get ready to use it?
- sign up for an account
    - [https://home.openweathermap.org/users/sign_up](https://home.openweathermap.org/users/sign_up)
- get an api key
    - [https://home.openweathermap.org/api_keys](https://home.openweathermap.org/api_keys)
        
Think of APIs as a hybrid of a website and a function.  It's a website where your query is stored in the address:

In [53]:
api_key = 'eea5fcef9a7ea19505dc1c165bacac4a'

# north = positive, south = negative
lat = 42.3601
# east = positive, west = negative
lon = -71.0589

units = 'imperial'

url = f'https://api.openweathermap.org/data/2.5/forecast?lat={lat}&lon={lon}&units={units}&appid={api_key}'
print(url)

https://api.openweathermap.org/data/2.5/forecast?lat=42.3601&lon=-71.0589&units=imperial&appid=eea5fcef9a7ea19505dc1c165bacac4a
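
The f-string above works because none of the values contain special characters. As a hedged alternative, the standard library's `urlencode` builds the same query string from a dictionary and escapes anything that needs it. `'YOUR_API_KEY'` is a placeholder, not a real key:

```python
from urllib.parse import urlencode

# query parameters as a plain dictionary (placeholder key, for illustration)
params = {'lat': 42.3601,
          'lon': -71.0589,
          'units': 'imperial',
          'appid': 'YOUR_API_KEY'}

base_url = 'https://api.openweathermap.org/data/2.5/forecast'
url = f'{base_url}?{urlencode(params)}'
print(url)
```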


In [55]:
import requests

# get url as a string
url_text = requests.get(url).text    
url_text

'{"cod":"200","message":0,"cnt":40,"list":[{"dt":1680274800,"main":{"temp":40.91,"feels_like":37.31,"temp_min":40.91,"temp_max":43.12,"pressure":1027,"sea_level":1027,"grnd_level":1026,"humidity":36,"temp_kf":-1.23},"weather":[{"id":803,"main":"Clouds","description":"broken clouds","icon":"04d"}],"clouds":{"all":75},"wind":{"speed":5.32,"deg":252,"gust":10.31},"visibility":10000,"pop":0,"sys":{"pod":"d"},"dt_txt":"2023-03-31 15:00:00"},{"dt":1680285600,"main":{"temp":44.42,"feels_like":41.05,"temp_min":44.42,"temp_max":51.44,"pressure":1026,"sea_level":1026,"grnd_level":1023,"humidity":32,"temp_kf":-3.9},"weather":[{"id":803,"main":"Clouds","description":"broken clouds","icon":"04d"}],"clouds":{"all":81},"wind":{"speed":5.95,"deg":213,"gust":12.77},"visibility":10000,"pop":0,"sys":{"pod":"d"},"dt_txt":"2023-03-31 18:00:00"},{"dt":1680296400,"main":{"temp":43.93,"feels_like":37.71,"temp_min":43.93,"temp_max":45.45,"pressure":1023,"sea_level":1023,"grnd_level":1020,"humidity":48,"temp_kf

The resulting JSON text encodes a tree of nested dictionaries and lists, like those in the previous section.

You can convert it from string to dicts and lists via:

In [67]:
import json

# convert json to a nested dict
weather_dict = json.loads(url_text)

weather_dict;

In [70]:
weather_dict['list'][2]

{'dt': 1680296400,
 'main': {'temp': 43.93,
  'feels_like': 37.71,
  'temp_min': 43.93,
  'temp_max': 45.45,
  'pressure': 1023,
  'sea_level': 1023,
  'grnd_level': 1020,
  'humidity': 48,
  'temp_kf': -0.84},
 'weather': [{'id': 500,
   'main': 'Rain',
   'description': 'light rain',
   'icon': '10d'}],
 'clouds': {'all': 92},
 'wind': {'speed': 12.28, 'deg': 194, 'gust': 18.52},
 'visibility': 10000,
 'pop': 0.4,
 'rain': {'3h': 0.11},
 'sys': {'pod': 'd'},
 'dt_txt': '2023-03-31 21:00:00'}

## Cleaning up data from one instant

In [71]:
d = weather_dict['list'][0]
d

{'dt': 1680274800,
 'main': {'temp': 40.91,
  'feels_like': 37.31,
  'temp_min': 40.91,
  'temp_max': 43.12,
  'pressure': 1027,
  'sea_level': 1027,
  'grnd_level': 1026,
  'humidity': 36,
  'temp_kf': -1.23},
 'weather': [{'id': 803,
   'main': 'Clouds',
   'description': 'broken clouds',
   'icon': '04d'}],
 'clouds': {'all': 75},
 'wind': {'speed': 5.32, 'deg': 252, 'gust': 10.31},
 'visibility': 10000,
 'pop': 0,
 'sys': {'pod': 'd'},
 'dt_txt': '2023-03-31 15:00:00'}

In [72]:
# let's convert from unix time to a datetime (easier to use)
d['datetime'] = datetime.fromtimestamp(d['dt'])

d

{'dt': 1680274800,
 'main': {'temp': 40.91,
  'feels_like': 37.31,
  'temp_min': 40.91,
  'temp_max': 43.12,
  'pressure': 1027,
  'sea_level': 1027,
  'grnd_level': 1026,
  'humidity': 36,
  'temp_kf': -1.23},
 'weather': [{'id': 803,
   'main': 'Clouds',
   'description': 'broken clouds',
   'icon': '04d'}],
 'clouds': {'all': 75},
 'wind': {'speed': 5.32, 'deg': 252, 'gust': 10.31},
 'visibility': 10000,
 'pop': 0,
 'sys': {'pod': 'd'},
 'dt_txt': '2023-03-31 15:00:00',
 'datetime': datetime.datetime(2023, 3, 31, 11, 0)}

In [82]:
some_dict = {'a': 1, 'b': 2}
some_dict

{'a': 1, 'b': 2}

In [84]:
some_other_dict = {'c': 3, 'd': 1000}

# equivalent to: some_dict.update(some_other_dict)

for key, val in some_other_dict.items():
    some_dict[key] = val

some_dict

{'a': 1, 'b': 2, 'c': 3, 'd': 1000}

In [75]:
# let's "flatten" the main dictionary
d.update(d['main'])
del d['main']

d

{'dt': 1680274800,
 'weather': [{'id': 803,
   'main': 'Clouds',
   'description': 'broken clouds',
   'icon': '04d'}],
 'clouds': {'all': 75},
 'wind': {'speed': 5.32, 'deg': 252, 'gust': 10.31},
 'visibility': 10000,
 'pop': 0,
 'sys': {'pod': 'd'},
 'dt_txt': '2023-03-31 15:00:00',
 'datetime': datetime.datetime(2023, 3, 31, 11, 0),
 'temp': 40.91,
 'feels_like': 37.31,
 'temp_min': 40.91,
 'temp_max': 43.12,
 'pressure': 1027,
 'sea_level': 1027,
 'grnd_level': 1026,
 'humidity': 36,
 'temp_kf': -1.23}

# Why doesn't our datetime match the "dt_txt"?

[let's examine the API's documentation](https://openweathermap.org/forecast5)

(spoiler: `dt_txt` is given in UTC 00:00, while `datetime.fromtimestamp()` converts to our computer's timezone ... see how sneaky measuring time can be?)
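
One way to make the two agree is to ask for UTC explicitly. A minimal sketch using the `dt` value from the first forecast entry above:

```python
from datetime import datetime, timezone

dt_unix = 1680274800  # the 'dt' value of the first forecast entry above

# tz=timezone.utc pins the conversion to UTC, whatever machine runs this
dt_utc = datetime.fromtimestamp(dt_unix, tz=timezone.utc)

dt_utc_text = dt_utc.strftime('%Y-%m-%d %H:%M:%S')
print(dt_utc_text)
```

The printed string should match that entry's `dt_txt`.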

# Storing an API key in another file

- security of your account
- easily swappable with another user

In [77]:
from weather_api import api_key

# nice to keep it a bit more hidden in your code
# (though you'd do better not to parrot it on a jupyter output cell,
# I just want to show you all how this works)
api_key

'eea5fcef9a7ea19505dc1c165bacac4a'
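
Another common option, sketched here as an alternative to a separate `.py` file, is to read the key from an environment variable so it never appears in any file you might share. The function name `load_api_key` and the variable name `OPENWEATHER_API_KEY` are assumptions made for this example:

```python
import os


def load_api_key(env_var='OPENWEATHER_API_KEY'):
    """Return the API key stored in the given environment variable.

    Raises:
        RuntimeError: if the environment variable is not set
    """
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f'set the {env_var} environment variable first')
    return key

# usage (after setting the variable in your shell):
# api_key = load_api_key()
```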

## In Class Activity 2
    
1. Make a function `get_forecast` which accepts:
    - `lat`
    - `lon`
    - `api_key`
    - `units` (default = 'imperial')
    
    and returns a dataframe of the forecasted weather
    
```python
# clean it up
row_list = list()
for d in weather_dict['list']:
    # process dictionary into a row

    # store
    row_list.append(d)

# convert list of dictionaries to dataframe
df = pd.DataFrame(row_list)
```

2. "flatten" the parts of the dictionary which make sense (see "main" example a few cells above).


## Test Case

Osaka, Japan is located at:

    34.6937° N, 135.5023° E
    
The first few rows of my call 

```python
df_osaka = get_forecast(lat=34.6937, lon=135.5023, api_key=api_key)
```

yielded the following dataframe:

|   | dt         | visibility | pop | dt_txt              | all | speed | deg | gust | pod | temp  | feels_like | temp_min | temp_max | pressure | sea_level | grnd_level | humidity | temp_kf | id  | main   | description | icon |
|---|------------|------------|-----|---------------------|-----|-------|-----|------|-----|-------|------------|----------|----------|----------|-----------|------------|----------|---------|-----|--------|-------------|------|
| 0 | 1680264000 | 10000      | 0   | 2023-03-31 12:00:00 | 0   | 4.79  | 222 | 7.87 | n   | 60.35 | 58.24      | 60.04    | 60.35    | 1016     | 1016      | 1016       | 46       | 0.17    | 800 | Clear  | clear sky   | 01n  |
| 1 | 1680274800 | 10000      | 0   | 2023-03-31 15:00:00 | 11  | 3.18  | 211 | 6.51 | n   | 59.38 | 57.56      | 57.43    | 59.38    | 1017     | 1017      | 1016       | 54       | 1.08    | 801 | Clouds | few clouds  | 02n  |
| 2 | 1680285600 | 10000      | 0   | 2023-03-31 18:00:00 | 21  | 1.72  | 78  | 2.37 | n   | 57.18 | 55.42      | 55.6     | 57.18    | 1017     | 1017      | 1015       | 60       | 0.88    | 801 | Clouds | few clouds  | 02n  |
| 3 | 1680296400 | 10000      | 0   | 2023-03-31 21:00:00 | 4   | 3     | 58  | 4.52 | d   | 53.69 | 51.82      | 53.69    | 53.69    | 1017     | 1017      | 1015       | 65       | 0       | 800 | Clear  | clear sky   | 01d  |
| 4 | 1680307200 | 10000      | 0   | 2023-04-01 00:00:00 | 2   | 3.69  | 50  | 4.34 | d   | 61.18 | 59.25      | 61.18    | 61.18    | 1018     | 1018      | 1016       | 48       | 0       | 800 | Clear  | clear sky   | 01d  |

In [79]:
from weather_api import api_key


def get_forecast(lat, lon, api_key, units='imperial'):
    """ returns forecast
    
    https://openweathermap.org/forecast5
    
    Args:
        lat (float): latitude (positive is north)
        lon (float): longitude (positive is east)
        api_key (str): api key for openweather
        units (str): standard, metric or imperial.  see link for
            details
            
    Returns:
        df (pd.DataFrame): forecasted weather, one row per 
            instant and one column per feature
    """
    
    # get data from api
    url = f'https://api.openweathermap.org/data/2.5/forecast?lat={lat}&lon={lon}&units={units}&appid={api_key}'
    url_text = requests.get(url).text 
    weather_dict = json.loads(url_text)
    
    # clean it up
    row_list = list()
    for d in weather_dict['list']:
        # process dictionary into a row
        
        # let's "flatten" the main dictionary
        for feat in ['clouds', 'wind', 'sys', 'main']:
            d.update(d[feat])
            del d[feat]
            
        # flattening "weather" is funny, it's a list (of length 1) of dicts
        d.update(d['weather'][0])
        del d['weather']

        row_list.append(d)
        
    return pd.DataFrame(row_list)

In [80]:
get_forecast(lat=42.21, lon=-71, api_key=api_key)

Unnamed: 0,dt,visibility,pop,dt_txt,all,speed,deg,gust,pod,temp,...,pressure,sea_level,grnd_level,humidity,temp_kf,id,main,description,icon,rain
0,1680274800,10000,0.0,2023-03-31 15:00:00,0,5.64,255,10.49,d,42.44,...,1028,1028,1024,36,-0.39,800,Clear,clear sky,01d,
1,1680285600,10000,0.0,2023-03-31 18:00:00,31,6.42,222,12.33,d,45.3,...,1026,1026,1021,33,-3.19,802,Clouds,scattered clouds,03d,
2,1680296400,10000,0.35,2023-03-31 21:00:00,67,13.44,196,21.12,d,43.3,...,1023,1023,1019,51,-0.24,500,Rain,light rain,10d,{'3h': 0.1}
3,1680307200,10000,0.91,2023-04-01 00:00:00,100,10.33,168,31.03,n,40.3,...,1019,1019,1017,81,0.0,500,Rain,light rain,10n,{'3h': 2.83}
4,1680318000,10000,1.0,2023-04-01 03:00:00,100,11.41,182,34.87,n,44.56,...,1014,1014,1012,93,0.0,501,Rain,moderate rain,10n,{'3h': 4.21}
5,1680328800,10000,1.0,2023-04-01 06:00:00,100,4.94,235,16.51,n,46.24,...,1011,1011,1009,97,0.0,500,Rain,light rain,10n,{'3h': 0.41}
6,1680339600,10000,0.0,2023-04-01 09:00:00,100,2.57,234,8.03,n,45.07,...,1008,1008,1006,98,0.0,804,Clouds,overcast clouds,04n,
7,1680350400,787,0.16,2023-04-01 12:00:00,100,4.27,171,17.85,d,46.47,...,1005,1005,1003,99,0.0,804,Clouds,overcast clouds,04d,
8,1680361200,10000,1.0,2023-04-01 15:00:00,100,13.89,208,40.06,d,51.55,...,1000,1000,998,96,0.0,501,Rain,moderate rain,10d,{'3h': 5.22}
9,1680372000,8290,1.0,2023-04-01 18:00:00,100,16.98,215,42.14,d,55.0,...,996,996,994,93,0.0,500,Rain,light rain,10d,{'3h': 1.98}
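
Notice the `rain` column above still holds dictionaries, and is missing entirely for dry intervals. A hedged sketch of one way to flatten such optional keys; `flatten_prefixed` is a hypothetical helper, and prefixing the new key with the parent's name is a design choice made here to avoid name collisions:

```python
def flatten_prefixed(d, key):
    """Move the entries of the nested dict d[key] to the top level of d.

    New keys are named '<key>_<sub_key>'.  If key is absent, d is unchanged.
    """
    nested = d.pop(key, None)
    if isinstance(nested, dict):
        for sub_key, val in nested.items():
            d[f'{key}_{sub_key}'] = val
    return d


# a trimmed-down row with rain, and one without (values from the table above)
wet_row = flatten_prefixed({'dt': 1680296400, 'pop': 0.35, 'rain': {'3h': 0.1}}, 'rain')
dry_row = flatten_prefixed({'dt': 1680274800, 'pop': 0.0}, 'rain')
print(wet_row)
print(dry_row)
```

Rows without rain simply lack the `rain_3h` key, which `pd.DataFrame` fills with `NaN` when building the table.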


In [27]:
df_osaka = get_forecast(lat=34.6937, lon=135.5023, api_key=api_key)
df_osaka

Unnamed: 0,dt,visibility,pop,dt_txt,all,speed,deg,gust,pod,temp,...,temp_max,pressure,sea_level,grnd_level,humidity,temp_kf,id,main,description,icon
0,1680264000,10000,0.0,2023-03-31 12:00:00,0,4.79,222,7.87,n,58.62,...,60.04,1017,1017,1016,44,-0.79,800,Clear,clear sky,01n
1,1680274800,10000,0.0,2023-03-31 15:00:00,11,3.18,211,6.51,n,58.23,...,58.23,1017,1017,1016,52,0.44,801,Clouds,few clouds,02n
2,1680285600,10000,0.0,2023-03-31 18:00:00,21,1.72,78,2.37,n,56.61,...,56.61,1017,1017,1015,60,0.56,801,Clouds,few clouds,02n
3,1680296400,10000,0.0,2023-03-31 21:00:00,4,3.0,58,4.52,d,53.69,...,53.69,1017,1017,1015,65,0.0,800,Clear,clear sky,01d
4,1680307200,10000,0.0,2023-04-01 00:00:00,2,3.69,50,4.34,d,61.18,...,61.18,1018,1018,1016,48,0.0,800,Clear,clear sky,01d
5,1680318000,10000,0.0,2023-04-01 03:00:00,4,4.27,21,6.55,d,70.41,...,70.41,1016,1016,1014,24,0.0,800,Clear,clear sky,01d
6,1680328800,10000,0.0,2023-04-01 06:00:00,15,2.8,346,5.66,d,73.67,...,73.67,1013,1013,1011,18,0.0,801,Clouds,few clouds,02d
7,1680339600,10000,0.0,2023-04-01 09:00:00,66,10.29,356,13.22,d,67.93,...,67.93,1014,1014,1012,28,0.0,803,Clouds,broken clouds,04d
8,1680350400,10000,0.0,2023-04-01 12:00:00,81,8.9,14,13.04,n,61.41,...,61.41,1016,1016,1014,41,0.0,803,Clouds,broken clouds,04n
9,1680361200,10000,0.0,2023-04-01 15:00:00,45,7.72,23,12.24,n,57.07,...,57.07,1016,1016,1014,50,0.0,802,Clouds,scattered clouds,03n
