We want to see how temperature affects Blue Bike usage across each of our measures: the number, duration, and distance of trips. We hypothesize that warmer months will see more and longer rides, and that usage will fall as the temperature decreases. We also want to compare usage against temperature directly, because temperature adds context to the number and length of rides. On a warm day, for example, some people may prefer to walk, which would lower usage, while others may worry about getting too warm and sweaty on foot, which would raise it. The same ambiguity applies in the cold: on a winter day there may be little Blue Bike usage because few people are out and about, but one could also argue that usage would rise because people want to get indoors as fast as possible. We want to see how the weather and the bike rides track each other month by month. A great deal of context sits behind each day and each ride, so the comparison should be interesting to observe.
How does temperature affect Blue Bike usage?
We will take each large monthly Blue Bike dataset and aggregate the following values: the number of trips (the number of rows in each dataset), the total minutes and total distance of all trips, and the averages of the latter two. We will not necessarily analyze the averages further, but they will be useful numbers to have. All of these values will go into a new, smaller spreadsheet that holds only the numbers we need. For our temperature data, we will use the spreadsheet we built from the weather website, which lists the minimum, maximum, and average temperature for each day of the year, and clean it up into a daily temperature series. After extracting these values, we will plot them in a line graph and observe how Blue Bike usage changes along with the temperature.
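The per-month aggregation could be sketched roughly as follows. This is a minimal illustration in plain Python, not our final code: the function names `summarize_month` and `summarize_file` are hypothetical, and we assume the `tripduration` column is recorded in seconds, which the values in the January file suggest but which should be checked against the data documentation.

```python
import csv
import io


def summarize_month(rows):
    """Summarize one month of trips.

    `rows` is an iterable of dicts with a 'tripduration' field,
    assumed here to be in seconds.
    """
    durations = [float(row["tripduration"]) for row in rows]
    n_trips = len(durations)
    total_minutes = sum(durations) / 60.0
    # Guard against an empty file so we never divide by zero.
    avg_minutes = total_minutes / n_trips if n_trips else 0.0
    return {
        "trips": n_trips,
        "total_minutes": total_minutes,
        "avg_minutes": avg_minutes,
    }


def summarize_file(path):
    """Read one monthly CSV (e.g. '202101.csv') and summarize it."""
    with open(path, newline="") as f:
        return summarize_month(csv.DictReader(f))


# Small in-memory check using the first three durations of the January file.
sample = io.StringIO("tripduration\n914\n1085\n946\n")
print(summarize_month(csv.DictReader(sample)))
```

Calling `summarize_file` once per monthly file and collecting the results in a list would give the twelve rows of the smaller spreadsheet.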
There are two main datasets that we will be using for this project. The first is Boston’s daily temperature data, taken from this weather website. The site does not export full datasets, so we copied and pasted the columns we wanted into our own simple spreadsheet; some of its rows are shown below. We chose these four columns because they are the only ones our project needs: the website provides other data, but we only want daily temperature. The second dataset comes from Blue Bike's website; we downloaded all trips for 2021, which are organized into monthly files. From each file we will extract three values: the number of trips in the month (the number of rows, excluding the header), the total trip time (the sum of the trip durations), and the total trip length (the sum of the distances traveled). From those 12 files we will build another dataset that is, again, smaller and simpler.
Months | maximum | minimum | average |
---|---|---|---|
2021-01 | 36 | 29 | 32.5 |
2021-02 | 42 | 32 | 37 |
2021-03 | 36 | 29 | 32.5 |
2021-04 | 39 | 31 | 35 |
2021-05 | 33 | 30 | 31.5 |
2021-06 | 40 | 30 | 35 |
2021-07 | 42 | 29 | 35.5 |
# One of the Blue Bike datasets we need to use
import pandas as pd
df_bikes = pd.read_csv('202101.csv')
df_bikes
| | tripduration | start station name | start station latitude | start station longitude | end station name | end station latitude | end station longitude |
---|---|---|---|---|---|---|---|
0 | 914 | One Kendall Square at Hampshire St / Portland St | 42.366277 | -71.091690 | Dartmouth St at Newbury St | 42.350961 | -71.077828 |
1 | 1085 | Dartmouth St at Newbury St | 42.350961 | -71.077828 | Edwards Playground - Main St at Eden St | 42.378965 | -71.068607 |
2 | 946 | Christian Science Plaza - Massachusetts Ave at... | 42.343666 | -71.085824 | Prudential Center - 101 Huntington Ave | 42.346520 | -71.080658 |
3 | 355 | MIT Pacific St at Purrington St | 42.359573 | -71.101295 | Ames St at Main St | 42.362500 | -71.088220 |
4 | 511 | Sennott Park Broadway at Norfolk Street | 42.368605 | -71.099302 | Kennedy-Longfellow School 158 Spring St | 42.369553 | -71.085790 |
... | ... | ... | ... | ... | ... | ... | ... |
71800 | 181 | Ames St at Main St | 42.362500 | -71.088220 | Kennedy-Longfellow School 158 Spring St | 42.369553 | -71.085790 |
71801 | 408 | MIT Stata Center at Vassar St / Main St | 42.362131 | -71.091156 | MIT Stata Center at Vassar St / Main St | 42.362131 | -71.091156 |
71802 | 535 | Harvard Stadium: N. Harvard St at Soldiers Fie... | 42.368019 | -71.124200 | Innovation Lab - 125 Western Ave at Batten Way | 42.363145 | -71.122986 |
71803 | 2552 | Sidney Research Campus/Erie Street at Waverly | 42.357753 | -71.103934 | Watertown Sq | 42.365260 | -71.185733 |
71804 | 525 | Harvard University Gund Hall at Quincy St / Ki... | 42.376369 | -71.114025 | Harvard University Radcliffe Quadrangle at She... | 42.380287 | -71.125107 |
71805 rows × 7 columns
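The trip file above has station coordinates but no distance column, so the distance of each trip has to be derived. One possible approach, shown here only as a sketch, is the haversine (great-circle) distance between the start and end stations. This is an assumption on our part: it measures the straight-line distance, not the route actually ridden, and it returns zero for round trips that start and end at the same station. The function name `haversine_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Row 0 of the January file:
# One Kendall Square -> Dartmouth St at Newbury St
d = haversine_km(42.366277, -71.091690, 42.350961, -71.077828)
print(round(d, 2), "km")
```

Summing this value over all rows of a monthly file would give a lower bound on the total distance traveled that month.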
bike01 = {'trip_duration': '914, 1085, 946, 355, 511...',
'start station latitude': '42.366277, 42.350961,42.343666, 42.359573... ',
'start station longitude': '-71.091690,-71.077828,-71.085824,-71.101295... ',
'end station latitude': '42.350961,42.378965,42.346520,42.362500... ',
'end station longitude': '-71.077828,-71.068607, -71.080658,-71.088220... '}
bike01
{'trip_duration': '914, 1085, 946, 355, 511...', 'start station latitude': '42.366277, 42.350961,42.343666, 42.359573... ', 'start station longitude': '-71.091690,-71.077828,-71.085824,-71.101295... ', 'end station latitude': '42.350961,42.378965,42.346520,42.362500... ', 'end station longitude': '-71.077828,-71.068607, -71.080658,-71.088220... '}
# One of the weather datasets we need to use
df_weather = pd.read_csv('weather01.csv')
df_weather
| | date | average |
---|---|---|
0 | 2021-01-01 | 32.5 |
1 | 2021-01-02 | 37.0 |
2 | 2021-01-03 | 32.5 |
3 | 2021-01-04 | 35.0 |
4 | 2021-01-05 | 31.5 |
5 | 2021-01-06 | 35.0 |
6 | 2021-01-07 | 35.5 |
7 | 2021-01-08 | 33.5 |
8 | 2021-01-09 | 30.0 |
9 | 2021-01-10 | 34.0 |
10 | 2021-01-11 | 30.0 |
11 | 2021-01-12 | 34.5 |
12 | 2021-01-13 | 35.5 |
13 | 2021-01-14 | 35.0 |
14 | 2021-01-15 | 37.0 |
15 | 2021-01-16 | 44.5 |
16 | 2021-01-17 | 38.5 |
17 | 2021-01-18 | 38.0 |
18 | 2021-01-19 | 34.0 |
19 | 2021-01-20 | 31.5 |
20 | 2021-01-21 | 26.5 |
21 | 2021-01-22 | 36.5 |
22 | 2021-01-23 | 25.5 |
23 | 2021-01-24 | 23.5 |
24 | 2021-01-25 | 27.5 |
25 | 2021-01-26 | 28.5 |
26 | 2021-01-27 | 32.0 |
27 | 2021-01-28 | 24.5 |
28 | 2021-01-29 | 12.5 |
29 | 2021-01-30 | 14.0 |
30 | 2021-01-31 | 15.0 |
weather01 = {'2021-01-01': '32.5', '2021-01-02':'37.0', '2021-01-03': '32.5', '2021-01-04':'35.0...'}
weather01
{'2021-01-01': '32.5', '2021-01-02': '37.0', '2021-01-03': '32.5', '2021-01-04': '35.0...'}
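Because the bike data is summarized per month, the daily temperatures need to be rolled up to the same granularity before the two can be compared. A minimal sketch of that step, assuming a dictionary keyed by `YYYY-MM-DD` date strings like the one above (the function name `monthly_average` is hypothetical):

```python
from collections import defaultdict


def monthly_average(daily_temps):
    """Average daily temperatures per month.

    `daily_temps` maps 'YYYY-MM-DD' to a temperature, given either as a
    number or as a numeric string. Returns a dict keyed by 'YYYY-MM'.
    """
    by_month = defaultdict(list)
    for date, temp in daily_temps.items():
        by_month[date[:7]].append(float(temp))  # 'YYYY-MM' prefix
    return {month: sum(v) / len(v) for month, v in sorted(by_month.items())}


# First four daily averages from the January weather table.
first_days = {
    "2021-01-01": "32.5",
    "2021-01-02": "37.0",
    "2021-01-03": "32.5",
    "2021-01-04": "35.0",
}
print(monthly_average(first_days))  # {'2021-01': 34.25}
```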
ML: We will group the daily average temperatures into a small set of temperature ranges. Doing this allows us to discover which conditions people prefer to ride in.
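We have not yet settled on how the temperatures will be grouped. As a placeholder, the sketch below uses simple fixed-width bands (10 degrees wide by default) instead of a learned clustering, and averages a daily usage value within each band. The band width and the names `temp_band` and `mean_by_band` are our own assumptions for illustration.

```python
import math
from collections import defaultdict


def temp_band(temp, width=10):
    """Return the (low, high) band that contains `temp`."""
    low = math.floor(temp / width) * width
    return (low, low + width)


def mean_by_band(daily_temp, daily_value, width=10):
    """Average `daily_value` over the days in each temperature band.

    Both arguments are dicts keyed by date; only dates present in both
    are used. `daily_value` could hold trips per day, for example.
    """
    groups = defaultdict(list)
    for date, temp in daily_temp.items():
        if date in daily_value:
            groups[temp_band(temp, width)].append(daily_value[date])
    return {band: sum(v) / len(v) for band, v in sorted(groups.items())}


print(temp_band(32.5))  # (30, 40)
print(temp_band(12.5))  # (10, 20)
```

Comparing the per-band averages would show which temperature ranges see the most riding, without assuming that usage rises or falls steadily with temperature.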