# DS2500 Day 21

Mar 31, 2023

### Content
- Weather API
    - timestamps
    - representing a tree via nested dictionaries (json format)
    - making API calls
- üêÇüí© visualizations
    - the colorful language, [it isn't mine](https://www.callingbullshit.org/), though it does add a certain excitement to our ordinarily dry technical lingo

### Admin
- focus on the project!
    - presentation preferences due next monday @ 9AM
        - https://piazza.com/class/lbxsbawi9yq2f9/post/403
    - lab sessions next week will support hw7
        - it's optional, no need to go if you don't feel the need
        - hw7 is due next monday (April 3) to allow you to ask questions in lab

    - no classes next week, sign up for a project team meeting with Prof Higger
        - link on course website

# Timestamps
## Unix Time

- [UTC 00:00](https://en.wikipedia.org/wiki/UTC%2B00:00) Coordinated Universal Time's "zero" time zone
    - time zone at 0 deg longitude
        - how is 0 deg longitude defined?  
            - A successfully warring empire (United Kingdom) chose it 
                - (personally, I'd find it convenient if a metric system loving empire had been more successful at war ...)
- [Unix Time](https://en.wikipedia.org/wiki/Unix_time) is the number of seconds which have passed since 00:00:00 UTC on 1 Jan 1970 (ignoring leap seconds)
- Unix Time is time zone agnostic (the same instant has the same Unix Time everywhere)
    - (more on this next lesson...)

## Python's `datetime` & `timedelta`
- helpful for all those pesky unit conversions

In [1]:
from datetime import datetime, timedelta

utc_example = 1613286000

# WARNING! assumes the time zone of the machine it's running on!
# (we'll see this issue again later ...)
dt0 = datetime.fromtimestamp(utc_example)
dt0

datetime.datetime(2021, 2, 14, 2, 0)

[further reading](https://docs.python.org/3/library/datetime.html#aware-and-naive-objects) on the datetime above being timezone "naive" (it carries no time zone information).
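One way around the warning above is the `tz` argument of `datetime.fromtimestamp()`. A minimal sketch, reusing the same timestamp, which should give the same timezone-aware result no matter which machine runs it:

```python
from datetime import datetime, timezone

utc_example = 1613286000

# passing tz makes the result timezone-aware and machine independent
dt_utc = datetime.fromtimestamp(utc_example, tz=timezone.utc)
print(dt_utc)

# an aware datetime converts back to the same Unix Time on any machine
print(dt_utc.timestamp())
```

The aware datetime reads 07:00 UTC, which is consistent with the 2:00 output above if the notebook ran on a machine 5 hours behind UTC.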

In [3]:
# we can access meaningful date attributes of a datetime object
# year, month, day, hour, minute, second
dt0.month, dt0.day

(2, 14)

In [10]:
dt1 = datetime.now()
dt1

datetime.datetime(2023, 3, 31, 15, 52, 33, 44509)

In [11]:
dt2 = datetime(year=2222, month=2, day=2, hour=2, minute=22, second=22)
dt2

datetime.datetime(2222, 2, 2, 2, 22, 22)

In [12]:
# a timedelta measures the difference between two datetimes:
dt2 - dt1

datetime.timedelta(days=72625, seconds=37788, microseconds=955491)

In [13]:
# you can build them explicitly:
offset = timedelta(days=2, seconds=123456)
offset

datetime.timedelta(days=3, seconds=37056)

In [15]:
# and operate with them (datetime + timedelta = datetime)
dt2 + offset

datetime.datetime(2222, 2, 5, 12, 39, 58)

In [17]:
# how many seconds old are you?
(datetime.now() - datetime(year=2000, month=1, day=1)).total_seconds()

733593284.490704

In [18]:
# you've got a billionth second birthday coming up around age 31 or so:
billion_sec = timedelta(seconds=1e9)
billion_sec.days / 365

31.70958904109589

In [11]:
# be sure to compute below so you can plan that party 10 years from now
datetime(year=2000, month=1, day=1) + billion_sec

datetime.datetime(2031, 9, 9, 1, 46, 40)

# `pd.to_datetime()`

Use pandas's [pd.to_datetime()](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) function to convert a column of your dataframe to datetime objects.  

`to_datetime()` does a pretty good job guessing your format, but if it runs into trouble you've always got [strftime & strptime](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)

In [19]:
import pandas as pd

s = pd.Series(['dec 23, 2000', 'jan 1, 2039 3AM', 'jan 14, 2039 14:00', 'not reallysakdjfaposdiufpsaid8yfa time'])

s

0                              dec 23, 2000
1                           jan 1, 2039 3AM
2                        jan 14, 2039 14:00
3    not reallysakdjfaposdiufpsaid8yfa time
dtype: object

In [22]:
# the errors='coerce' argument yields "NaT" (not-a-time) for inputs which
# can't be converted.  without it, an unparseable input raises an error
pd.to_datetime(s, errors='coerce')

0   2000-12-23 00:00:00
1   2039-01-01 03:00:00
2   2039-01-14 14:00:00
3                   NaT
dtype: datetime64[ns]
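When the guessing goes wrong, an explicit format string removes the ambiguity. A minimal sketch with the standard library's `strptime` and `strftime`; the day-first input string is a made-up example:

```python
from datetime import datetime

# hypothetical day-first string: '03/04/2023' is 3 April here, not March 4
raw = '03/04/2023 14:30'

# strptime: string -> datetime, using an explicit format
dt = datetime.strptime(raw, '%d/%m/%Y %H:%M')
print(dt.month, dt.day)

# strftime: datetime -> string, in whatever format you like
iso_like = dt.strftime('%Y-%m-%d %H:%M')
print(iso_like)
```

`pd.to_datetime()` accepts the same format codes through its `format` argument.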

# Do you know what time it is?
[after reading this, I'm not sure that I do ...](https://www.creativedeletion.com/2015/01/28/falsehoods-programmers-date-time-zones.html)

takeaways:
- don't underestimate the difficulty of describing time unambiguously, it's hard!
- use a library where you might run into time issues:
    - timezones
    - leap year / second
    - varying days of month / year
    - time formatting 'Feb' vs 'February' etc
- Unix Time isn't human readable ... but it is unambiguous.  
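As a small illustration of letting a library handle the calendar, the sketch below adds one day to Feb 28 in a leap year and in a non-leap year (the years are chosen only for the example):

```python
from datetime import datetime, timedelta

one_day = timedelta(days=1)

# 2024 is a leap year, 2023 is not
after_feb28_2024 = datetime(2024, 2, 28) + one_day
after_feb28_2023 = datetime(2023, 2, 28) + one_day

print(after_feb28_2024.month, after_feb28_2024.day)
print(after_feb28_2023.month, after_feb28_2023.day)
```

`datetime` lands on Feb 29 in one case and Mar 1 in the other without any hand-written leap year logic.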

### Punchline: measuring time is hard, don't underestimate it (I certainly have!)

## Representing Trees as Lists & Dictionaries
- useful for representing a tree of data
- (our API calls will return nested dictionaries)

<img src="https://i.ibb.co/Pmxqpb3/tree-ex.png" alt="Drawing" style="width: 400px;"/>

In [27]:
red_branch_dict = {'a': 0, 'b': 1, 'c': 2}

In [28]:
blu_branch_dict = {'x': 24, 'y': 25, 'z': 26}

In [29]:
tree_dict = {'f': red_branch_dict,
             'g': blu_branch_dict}
tree_dict

{'f': {'a': 0, 'b': 1, 'c': 2}, 'g': {'x': 24, 'y': 25, 'z': 26}}

In [31]:
tree_dict['f']['b']

1

<img src="https://i.ibb.co/4SSH4mm/tree-ex2.png" alt="Drawing" style="width: 600px;"/>

In [33]:
dict0 = {'num': 14,
        'letter': 'C'}
dict1 = {'num': 17,
        'letter': 'R'}
dict2 = {'num': 21,
        'letter': 'S'}

dict_of_dict = {0: dict0,
                1: dict1,
                2: dict2}

In [34]:
dict_of_dict[0]['letter']

'C'

In [37]:
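# the same data as a list of dictionaries (the index replaces the integer key)
list_of_dict = [dict0, dict1, dict2]
list_of_dict[0]['num']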

14
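Since nested dictionaries and lists form a tree, a short recursive function can visit every leaf. The helper below is a sketch of one way to do it; the name `leaf_paths` is made up for this example:

```python
def leaf_paths(tree, path=()):
    """ yields (path, value) for every leaf of a nested dict / list tree """
    if isinstance(tree, dict):
        children = tree.items()
    elif isinstance(tree, list):
        children = enumerate(tree)
    else:
        # not a container, so this is a leaf
        yield path, tree
        return

    for key, child in children:
        yield from leaf_paths(child, path + (key,))


tree_dict = {'f': {'a': 0, 'b': 1, 'c': 2},
             'g': {'x': 24, 'y': 25, 'z': 26}}

for path, value in leaf_paths(tree_dict):
    print(path, value)
```

Each path is the sequence of keys (or list indices) you would index with to reach that value, e.g. `('f', 'b')` corresponds to `tree_dict['f']['b']`.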

## In class activity 1:
1. Express the height and weight of each of the following penguin groups as a list of dictionaries:
<img src="https://i.ibb.co/XXzX4Wk/penguin-tree.png" alt="Drawing" style="width: 700px;"/>

In [39]:
grn_dict = {'height': [2, 3, 2],
            'weight': [4, 6, 5]}
blu_dict = {'height': [4, 6, 5],
            'weight': [2, 3, 2]}
red_dict = {'height': [10, 7, 8],
            'weight': [11, 6, 5]}
peng_data = [grn_dict,
             blu_dict,
             red_dict]

In [42]:
# what is the height of the last penguin in group 0 (green)?
peng_data[0]['height'][-1]

2
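The layout above stores one dictionary per group. An alternative, sketched below for the green group only, stores one dictionary per penguin; which is more convenient likely depends on what you do with the data next:

```python
grn_dict = {'height': [2, 3, 2],
            'weight': [4, 6, 5]}

# zip pairs up the i-th height with the i-th weight
penguin_list = [{'height': h, 'weight': w}
                for h, w in zip(grn_dict['height'], grn_dict['weight'])]

print(penguin_list[-1])
```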

# API
###  Definitions
**API** Application Programming Interface
 - within DS: a server which gives out data (often over the internet)
 - note: in general, 'API' refers to the boundary between two pieces of software; within DS it has this more specific meaning
 
 
 **JSON** JavaScript Object Notation
  - a method of storing objects as text
  - much like the nested dictionaries ... JSON and similar formats are often trees
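To make "storing objects as text" concrete, here is a small round trip through Python's `json` module, using a two-entry nested dictionary like the ones in the previous section:

```python
import json

dict_of_dict = {0: {'num': 14, 'letter': 'C'},
                1: {'num': 17, 'letter': 'R'}}

# dumps: python object -> JSON text
text = json.dumps(dict_of_dict)
print(text)

# loads: JSON text -> python object
round_trip = json.loads(text)

# JSON object keys are always strings, so the integer keys came back as str
print(list(round_trip.keys()))
```

One gotcha worth knowing: the round trip isn't always exact, because JSON only allows string keys.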

## OpenWeather API
What information does this offer?

- [https://openweathermap.org/api](https://openweathermap.org/api)
- (note, you won't have access to all that, see the "free" column [here](https://openweathermap.org/price))

How do I get ready to use it?
- sign up for an account
    - [https://home.openweathermap.org/users/sign_up](https://home.openweathermap.org/users/sign_up)
- get an api key
    - [https://home.openweathermap.org/api_keys](https://home.openweathermap.org/api_keys)
        
Think of APIs as a hybrid of a website and a function.  It's a website where your query is stored in the address:

In [23]:
api_key = 'eea5fcef9a7ea19505dc1c165bacac4a'

# north = positive, south = negative
lat = 42.3601
# east = positive, west = negative
lon = -71.0589

units = 'imperial'

url = f'https://api.openweathermap.org/data/2.5/forecast?lat={lat}&lon={lon}&units={units}&appid={api_key}'
print(url)

https://api.openweathermap.org/data/2.5/forecast?lat=42.3601&lon=-71.0589&units=imperial&appid=eea5fcef9a7ea19505dc1c165bacac4a
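The part of the address after the `?` is a list of `name=value` pairs joined by `&`, much like keyword arguments to a function. As a sketch, the same address can be built from a dictionary with the standard library's `urlencode` (the key below is a placeholder, not a real key):

```python
from urllib.parse import urlencode

# 'YOUR_API_KEY' is a placeholder, swap in your own key
params = {'lat': 42.3601,
          'lon': -71.0589,
          'units': 'imperial',
          'appid': 'YOUR_API_KEY'}

base_url = 'https://api.openweathermap.org/data/2.5/forecast'

# urlencode joins name=value pairs with '&' and escapes special characters
url = base_url + '?' + urlencode(params)
print(url)
```

`requests.get()` also accepts a `params` dictionary and builds the query string for you.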


In [43]:
import requests

# get url as a string
url_text = requests.get(url).text    
url_text

'{"cod":"200","message":0,"cnt":40,"list":[{"dt":1680296400,"main":{"temp":47.37,"feels_like":42.04,"temp_min":46.42,"temp_max":47.37,"pressure":1020,"sea_level":1020,"grnd_level":1020,"humidity":45,"temp_kf":0.53},"weather":[{"id":804,"main":"Clouds","description":"overcast clouds","icon":"04d"}],"clouds":{"all":100},"wind":{"speed":12.24,"deg":190,"gust":19.26},"visibility":10000,"pop":0.02,"sys":{"pod":"d"},"dt_txt":"2023-03-31 21:00:00"},{"dt":1680307200,"main":{"temp":44.46,"feels_like":39.24,"temp_min":38.62,"temp_max":44.46,"pressure":1020,"sea_level":1020,"grnd_level":1019,"humidity":59,"temp_kf":3.24},"weather":[{"id":500,"main":"Rain","description":"light rain","icon":"10n"}],"clouds":{"all":100},"wind":{"speed":9.82,"deg":160,"gust":26.66},"visibility":6644,"pop":0.92,"rain":{"3h":2.47},"sys":{"pod":"n"},"dt_txt":"2023-04-01 00:00:00"},{"dt":1680318000,"main":{"temp":43.47,"feels_like":38.66,"temp_min":41.52,"temp_max":43.47,"pressure":1017,"sea_level":1017,"grnd_level":1015

The resulting JSON text encodes a tree of nested dictionaries and lists, like those seen in the previous section.

You can convert it from string to dicts and lists via:

In [48]:
import json

# convert json to a nested dict
weather_dict = json.loads(url_text)

weather_dict['list'][0]

{'dt': 1680296400,
 'main': {'temp': 47.37,
  'feels_like': 42.04,
  'temp_min': 46.42,
  'temp_max': 47.37,
  'pressure': 1020,
  'sea_level': 1020,
  'grnd_level': 1020,
  'humidity': 45,
  'temp_kf': 0.53},
 'weather': [{'id': 804,
   'main': 'Clouds',
   'description': 'overcast clouds',
   'icon': '04d'}],
 'clouds': {'all': 100},
 'wind': {'speed': 12.24, 'deg': 190, 'gust': 19.26},
 'visibility': 10000,
 'pop': 0.02,
 'sys': {'pod': 'd'},
 'dt_txt': '2023-03-31 21:00:00'}

In [22]:
weather_dict['list']

[{'dt': 1680264000,
  'main': {'temp': 28.09,
   'feels_like': 21.51,
   'temp_min': 28.09,
   'temp_max': 33.62,
   'pressure': 1027,
   'sea_level': 1027,
   'grnd_level': 1027,
   'humidity': 61,
   'temp_kf': -3.07},
  'weather': [{'id': 801,
    'main': 'Clouds',
    'description': 'few clouds',
    'icon': '02d'}],
  'clouds': {'all': 17},
  'wind': {'speed': 6.08, 'deg': 258, 'gust': 10.69},
  'visibility': 10000,
  'pop': 0,
  'sys': {'pod': 'd'},
  'dt_txt': '2023-03-31 12:00:00'},
 {'dt': 1680274800,
  'main': {'temp': 33.33,
   'feels_like': 27.72,
   'temp_min': 33.33,
   'temp_max': 43.79,
   'pressure': 1027,
   'sea_level': 1027,
   'grnd_level': 1026,
   'humidity': 49,
   'temp_kf': -5.81},
  'weather': [{'id': 802,
    'main': 'Clouds',
    'description': 'scattered clouds',
    'icon': '03d'}],
  'clouds': {'all': 41},
  'wind': {'speed': 6.13, 'deg': 242, 'gust': 11.68},
  'visibility': 10000,
  'pop': 0,
  'sys': {'pod': 'd'},
  'dt_txt': '2023-03-31 15:00:00'},
 {

## Cleaning up data from one instant

In [50]:
d = weather_dict['list'][0]
d

{'dt': 1680296400,
 'main': {'temp': 47.37,
  'feels_like': 42.04,
  'temp_min': 46.42,
  'temp_max': 47.37,
  'pressure': 1020,
  'sea_level': 1020,
  'grnd_level': 1020,
  'humidity': 45,
  'temp_kf': 0.53},
 'weather': [{'id': 804,
   'main': 'Clouds',
   'description': 'overcast clouds',
   'icon': '04d'}],
 'clouds': {'all': 100},
 'wind': {'speed': 12.24, 'deg': 190, 'gust': 19.26},
 'visibility': 10000,
 'pop': 0.02,
 'sys': {'pod': 'd'},
 'dt_txt': '2023-03-31 21:00:00'}

In [51]:
# let's convert from unix time to a datetime (easier to use)
d['datetime'] = datetime.fromtimestamp(d['dt'])

d

{'dt': 1680296400,
 'main': {'temp': 47.37,
  'feels_like': 42.04,
  'temp_min': 46.42,
  'temp_max': 47.37,
  'pressure': 1020,
  'sea_level': 1020,
  'grnd_level': 1020,
  'humidity': 45,
  'temp_kf': 0.53},
 'weather': [{'id': 804,
   'main': 'Clouds',
   'description': 'overcast clouds',
   'icon': '04d'}],
 'clouds': {'all': 100},
 'wind': {'speed': 12.24, 'deg': 190, 'gust': 19.26},
 'visibility': 10000,
 'pop': 0.02,
 'sys': {'pod': 'd'},
 'dt_txt': '2023-03-31 21:00:00',
 'datetime': datetime.datetime(2023, 3, 31, 17, 0)}

In [None]:
# let's "flatten" the main dictionary
d.update(d['main'])

# for key, val in d['main'].items():
#     d[key] = val

In [52]:
# removing key "main" and its associated value from d
del d['main']

d

{'dt': 1680296400,
 'weather': [{'id': 804,
   'main': 'Clouds',
   'description': 'overcast clouds',
   'icon': '04d'}],
 'clouds': {'all': 100},
 'wind': {'speed': 12.24, 'deg': 190, 'gust': 19.26},
 'visibility': 10000,
 'pop': 0.02,
 'sys': {'pod': 'd'},
 'dt_txt': '2023-03-31 21:00:00',
 'datetime': datetime.datetime(2023, 3, 31, 17, 0),
 'temp': 47.37,
 'feels_like': 42.04,
 'temp_min': 46.42,
 'temp_max': 47.37,
 'pressure': 1020,
 'sea_level': 1020,
 'grnd_level': 1020,
 'humidity': 45,
 'temp_kf': 0.53}

# Why doesn't our datetime match the "dt_txt"?

[let's examine the API's documentation](https://openweathermap.org/forecast5)

(spoiler: the forecast's `dt_txt` is given in UTC 00:00, while `datetime.fromtimestamp()` converts `dt` to our computer's time zone ... see how sneaky measuring time can be?)
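A quick check of this explanation, using the `dt` and `dt_txt` values from the first forecast entry above. The fixed UTC-4 offset is an assumption standing in for Boston's daylight saving time on that date:

```python
from datetime import datetime, timedelta, timezone

dt = 1680296400
dt_txt = '2023-03-31 21:00:00'

# interpret the Unix Time in UTC: this matches the API's dt_txt
dt_utc = datetime.fromtimestamp(dt, tz=timezone.utc)
print(dt_utc.strftime('%Y-%m-%d %H:%M:%S') == dt_txt)

# assumption: a fixed UTC-4 offset stands in for Boston's daylight time
boston_tz = timezone(timedelta(hours=-4))
print(dt_utc.astimezone(boston_tz))
```

Both datetimes describe the same instant; only the time zone used to display it differs.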

# Storing an API key in another file

- security of your account
- easily swappable with another user

In [25]:
from weather_api import api_key

# nice to keep it a bit more hidden in your code
# (though you'd do better not to parrot it on a jupyter output cell,
# I just want to show you all how this works)
api_key

'eea5fcef9a7ea19505dc1c165bacac4a'
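The contents of `weather_api.py` aren't shown here; presumably it just assigns the key to a variable named `api_key`. Another common option, sketched below, is to keep the key in an environment variable. Both the helper `load_api_key` and the variable name `OPENWEATHER_API_KEY` are hypothetical choices for this example:

```python
import os


def load_api_key(env_name='OPENWEATHER_API_KEY'):
    """ reads an api key from an environment variable

    Args:
        env_name (str): name of the environment variable (hypothetical,
            pick whatever name you like)

    Returns:
        api_key (str): the stored key
    """
    api_key = os.environ.get(env_name)
    if api_key is None:
        raise RuntimeError(f'set the {env_name} environment variable first')
    return api_key
```

Either way, the key stays out of the notebook itself, so the notebook can be shared without sharing the account.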

## In Class Activity 2
    
1. Make a function `get_forecast` which accepts:
    - `lat`
    - `lon`
    - `api_key`
    - `units` (default = 'imperial')
    
    and returns a dataframe of the forecasted weather
    
```python
# clean it up
row_list = list()
for d in weather_dict['list']:
    # process dictionary into a row

    # store
    row_list.append(d)

# convert list of dictionaries to dataframe
df = pd.DataFrame(row_list)
```

2. "flatten" the parts of the dictionary which make sense (see "main" example a few cells above).


## Test Case

Osaka, Japan is located at:

    34.6937° N, 135.5023° E
    
The first few rows of my call 

```python
df_osaka = get_forecast(lat=34.6937, lon=135.5023, api_key=api_key)
```

yielded the following dataframe:

|   | dt         | visibility | pop | dt_txt              | all | speed | deg | gust | pod | temp  | feels_like | temp_min | temp_max | pressure | sea_level | grnd_level | humidity | temp_kf | id  | main   | description | icon |
|---|------------|------------|-----|---------------------|-----|-------|-----|------|-----|-------|------------|----------|----------|----------|-----------|------------|----------|---------|-----|--------|-------------|------|
| 0 | 1680264000 | 10000      | 0   | 2023-03-31 12:00:00 | 0   | 4.79  | 222 | 7.87 | n   | 60.35 | 58.24      | 60.04    | 60.35    | 1016     | 1016      | 1016       | 46       | 0.17    | 800 | Clear  | clear sky   | 01n  |
| 1 | 1680274800 | 10000      | 0   | 2023-03-31 15:00:00 | 11  | 3.18  | 211 | 6.51 | n   | 59.38 | 57.56      | 57.43    | 59.38    | 1017     | 1017      | 1016       | 54       | 1.08    | 801 | Clouds | few clouds  | 02n  |
| 2 | 1680285600 | 10000      | 0   | 2023-03-31 18:00:00 | 21  | 1.72  | 78  | 2.37 | n   | 57.18 | 55.42      | 55.6     | 57.18    | 1017     | 1017      | 1015       | 60       | 0.88    | 801 | Clouds | few clouds  | 02n  |
| 3 | 1680296400 | 10000      | 0   | 2023-03-31 21:00:00 | 4   | 3     | 58  | 4.52 | d   | 53.69 | 51.82      | 53.69    | 53.69    | 1017     | 1017      | 1015       | 65       | 0       | 800 | Clear  | clear sky   | 01d  |
| 4 | 1680307200 | 10000      | 0   | 2023-04-01 00:00:00 | 2   | 3.69  | 50  | 4.34 | d   | 61.18 | 59.25      | 61.18    | 61.18    | 1018     | 1018      | 1016       | 48       | 0       | 800 | Clear  | clear sky   | 01d  |

In [53]:

import json

import pandas as pd
import requests

from weather_api import api_key


def get_forecast(lat, lon, api_key, units='imperial'):
    """ returns forecast
    
    https://openweathermap.org/forecast5
    
    Args:
        lat (float): latitude (positive is north)
        lon (float): longitude (positive is east)
        api_key (str): api key for openweather
        units (str): standard, metric or imperial.  see link for
            details
            
    Returns:
        df (pd.DataFrame): forecasted weather, one row per 
            instant and one column per feature
    """
    
    # get data from api
    url = f'https://api.openweathermap.org/data/2.5/forecast?lat={lat}&lon={lon}&units={units}&appid={api_key}'
    url_text = requests.get(url).text 
    weather_dict = json.loads(url_text)
    
    # clean it up
    row_list = list()
    for d in weather_dict['list']:
        # process dictionary into a row
        
        # let's "flatten" the nested dictionaries
        for feat in ['clouds', 'wind', 'sys', 'main']:
            d.update(d[feat])
            del d[feat]
            
        # flattening "weather" is funny, it's a list (of length 1) of dicts
        d.update(d['weather'][0])
        del d['weather']

        row_list.append(d)
        
    return pd.DataFrame(row_list)
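Note that the loop above modifies each dictionary in place, and it only works because `'main'` is deleted before `'weather'` (which has its own `'main'` key) is flattened. A possible alternative, sketched below, builds a new dictionary and prefixes each nested key with its parent's name. The helper `flatten_record` is hypothetical and its column names differ from the table above:

```python
def flatten_record(record, sep='_'):
    """ flattens one level of nesting, returning a new dictionary """
    flat = dict()
    for key, val in record.items():
        # a list of length 1 holding a dict (like "weather") is unwrapped
        if isinstance(val, list) and len(val) == 1 and isinstance(val[0], dict):
            val = val[0]

        if isinstance(val, dict):
            for sub_key, sub_val in val.items():
                flat[f'{key}{sep}{sub_key}'] = sub_val
        else:
            flat[key] = val
    return flat


# a trimmed down record in the shape the API returns
record = {'dt': 1680296400,
          'main': {'temp': 47.37, 'humidity': 45},
          'weather': [{'id': 804, 'main': 'Clouds'}],
          'wind': {'speed': 12.24, 'deg': 190}}

print(flatten_record(record))
```

With prefixes, `main_temp` and `weather_main` can't collide, and the original `record` is left untouched.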

In [55]:
# note: a negative latitude is in the southern hemisphere (so this isn't Osaka)
get_forecast(lat=-34.6937, lon=135.5023, api_key=api_key)

|   | dt | visibility | pop | dt_txt | all | speed | deg | gust | pod | temp | ... | pressure | sea_level | grnd_level | humidity | temp_kf | id | main | description | icon | rain |
|---|----|------------|-----|--------|-----|-------|-----|------|-----|------|-----|----------|-----------|------------|----------|---------|----|------|-------------|------|------|
| 0 | 1680296400 | 10000 | 0.0 | 2023-03-31 21:00:00 | 93 | 11.32 | 135 | 15.82 | n | 57.56 | ... | 1022 | 1022 | 1014 | 68 | 0.0 | 804 | Clouds | overcast clouds | 04n |  |
| 1 | 1680307200 | 10000 | 0.0 | 2023-04-01 00:00:00 | 88 | 13.04 | 131 | 13.94 | d | 58.66 | ... | 1022 | 1022 | 1015 | 63 | -1.23 | 804 | Clouds | overcast clouds | 04d |  |
| 2 | 1680318000 | 10000 | 0.0 | 2023-04-01 03:00:00 | 70 | 11.88 | 138 | 11.92 | d | 61.86 | ... | 1022 | 1022 | 1014 | 54 | -1.2 | 803 | Clouds | broken clouds | 04d |  |
| 3 | 1680328800 | 10000 | 0.0 | 2023-04-01 06:00:00 | 43 | 14.99 | 138 | 14.32 | d | 63.3 | ... | 1021 | 1021 | 1013 | 48 | 0.0 | 802 | Clouds | scattered clouds | 03d |  |
| 4 | 1680339600 | 10000 | 0.0 | 2023-04-01 09:00:00 | 10 | 10.11 | 128 | 14.56 | n | 58.93 | ... | 1022 | 1022 | 1014 | 54 | 0.0 | 800 | Clear | clear sky | 01n |  |
| 5 | 1680350400 | 10000 | 0.0 | 2023-04-01 12:00:00 | 6 | 7.49 | 128 | 13.24 | n | 56.77 | ... | 1023 | 1023 | 1014 | 62 | 0.0 | 800 | Clear | clear sky | 01n |  |
| 6 | 1680361200 | 10000 | 0.0 | 2023-04-01 15:00:00 | 8 | 9.73 | 117 | 17.74 | n | 55.72 | ... | 1021 | 1021 | 1013 | 68 | 0.0 | 800 | Clear | clear sky | 01n |  |
| 7 | 1680372000 | 10000 | 0.0 | 2023-04-01 18:00:00 | 6 | 6.96 | 112 | 11.97 | n | 54.39 | ... | 1020 | 1020 | 1012 | 72 | 0.0 | 800 | Clear | clear sky | 01n |  |
| 8 | 1680382800 | 10000 | 0.0 | 2023-04-01 21:00:00 | 51 | 9.42 | 97 | 13.47 | n | 58.78 | ... | 1020 | 1020 | 1012 | 61 | 0.0 | 803 | Clouds | broken clouds | 04n |  |
| 9 | 1680393600 | 10000 | 0.0 | 2023-04-02 00:00:00 | 76 | 9.1 | 92 | 11.9 | d | 61.36 | ... | 1020 | 1020 | 1012 | 59 | 0.0 | 803 | Clouds | broken clouds | 04d |  |


In [54]:
df_osaka = get_forecast(lat=34.6937, lon=135.5023, api_key=api_key)
df_osaka

|   | dt | visibility | pop | dt_txt | all | speed | deg | gust | pod | temp | ... | temp_max | pressure | sea_level | grnd_level | humidity | temp_kf | id | main | description | icon |
|---|----|------------|-----|--------|-----|-------|-----|------|-----|------|-----|----------|----------|-----------|------------|----------|---------|----|------|-------------|------|
| 0 | 1680296400 | 10000 | 0.0 | 2023-03-31 21:00:00 | 4 | 3.22 | 56 | 4.0 | d | 51.21 | ... | 53.69 | 1017 | 1017 | 1015 | 53 | -1.38 | 800 | Clear | clear sky | 01d |
| 1 | 1680307200 | 10000 | 0.0 | 2023-04-01 00:00:00 | 3 | 4.36 | 37 | 5.21 | d | 54.7 | ... | 61.7 | 1017 | 1017 | 1015 | 49 | -3.89 | 800 | Clear | clear sky | 01d |
| 2 | 1680318000 | 10000 | 0.0 | 2023-04-01 03:00:00 | 2 | 4.18 | 359 | 6.67 | d | 64.56 | ... | 71.24 | 1016 | 1016 | 1014 | 32 | -3.71 | 800 | Clear | clear sky | 01d |
| 3 | 1680328800 | 10000 | 0.0 | 2023-04-01 06:00:00 | 6 | 6.42 | 325 | 6.67 | d | 73.65 | ... | 73.65 | 1013 | 1013 | 1011 | 19 | 0.0 | 800 | Clear | clear sky | 01d |
| 4 | 1680339600 | 10000 | 0.0 | 2023-04-01 09:00:00 | 98 | 11.34 | 352 | 14.52 | d | 67.26 | ... | 67.26 | 1014 | 1014 | 1012 | 27 | 0.0 | 804 | Clouds | overcast clouds | 04d |
| 5 | 1680350400 | 10000 | 0.0 | 2023-04-01 12:00:00 | 87 | 8.79 | 12 | 13.44 | n | 60.53 | ... | 60.53 | 1016 | 1016 | 1014 | 41 | 0.0 | 804 | Clouds | overcast clouds | 04n |
| 6 | 1680361200 | 10000 | 0.0 | 2023-04-01 15:00:00 | 100 | 8.03 | 18 | 12.17 | n | 56.41 | ... | 56.41 | 1016 | 1016 | 1014 | 50 | 0.0 | 804 | Clouds | overcast clouds | 04n |
| 7 | 1680372000 | 10000 | 0.0 | 2023-04-01 18:00:00 | 89 | 7.65 | 24 | 12.62 | n | 53.76 | ... | 53.76 | 1015 | 1015 | 1014 | 54 | 0.0 | 804 | Clouds | overcast clouds | 04n |
| 8 | 1680382800 | 10000 | 0.0 | 2023-04-01 21:00:00 | 97 | 7.47 | 23 | 13.58 | d | 52.2 | ... | 52.2 | 1016 | 1016 | 1014 | 56 | 0.0 | 804 | Clouds | overcast clouds | 04d |
| 9 | 1680393600 | 10000 | 0.0 | 2023-04-02 00:00:00 | 98 | 9.91 | 27 | 13.76 | d | 59.56 | ... | 59.56 | 1016 | 1016 | 1015 | 48 | 0.0 | 804 | Clouds | overcast clouds | 04d |
