This project examines the potential viability of a hypothetical cross-country high-speed rail system. \
The proposed rail would take a passenger from Los Angeles to New York in under 10 hours and could connect most major US cities, forming a fast, sustainable mode of transportation for a wide range of travelers. It could also ease the strain on air travel, which pollutes the air and often suffers major delays. \
\
We will use the following map of the hypothetical rail, made by First Cultural at UC Berkeley. Because the plans are not official, there is no official map to draw from; this one is just an idea that connects many of the major cities.
\
\
\
To investigate this topic further, see Vox's article about high-speed rail and its new popularity, or the Congressional Research Service paper on its issues and recent events.
First, we look at US air travel to identify how many trips could be substituted by hypothetical trains. We will look at flights that run between cities on the hypothetical rail, along with their passenger counts, and use an analysis of the rail system in Europe to predict how many of those plane passengers could become rail passengers. \
\
The following dataset is available on Kaggle; we will be using the Destination_city, Origin_city, and Passengers columns.
import pandas as pd
df_usa = pd.read_csv("Airports2.csv")
df_usa.head(6)
| | Origin_airport | Destination_airport | Origin_city | Destination_city | Passengers | Seats | Flights | Distance | Fly_date | Origin_population | Destination_population | Org_airport_lat | Org_airport_long | Dest_airport_lat | Dest_airport_long |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | MHK | AMW | Manhattan, KS | Ames, IA | 21 | 30 | 1 | 254 | 2008-10-01 | 122049 | 86219 | 39.140999 | -96.670799 | NaN | NaN |
1 | EUG | RDM | Eugene, OR | Bend, OR | 41 | 396 | 22 | 103 | 1990-11-01 | 284093 | 76034 | 44.124599 | -123.211998 | 44.254101 | -121.150002 |
2 | EUG | RDM | Eugene, OR | Bend, OR | 88 | 342 | 19 | 103 | 1990-12-01 | 284093 | 76034 | 44.124599 | -123.211998 | 44.254101 | -121.150002 |
3 | EUG | RDM | Eugene, OR | Bend, OR | 11 | 72 | 4 | 103 | 1990-10-01 | 284093 | 76034 | 44.124599 | -123.211998 | 44.254101 | -121.150002 |
4 | MFR | RDM | Medford, OR | Bend, OR | 0 | 18 | 1 | 156 | 1990-02-01 | 147300 | 76034 | 42.374199 | -122.873001 | 44.254101 | -121.150002 |
5 | MFR | RDM | Medford, OR | Bend, OR | 11 | 18 | 1 | 156 | 1990-03-01 | 147300 | 76034 | 42.374199 | -122.873001 | 44.254101 | -121.150002 |
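As a rough sketch of the filtering step described above, the snippet below keeps only flights whose origin and destination are both on the rail map and totals passengers per city pair. The `rail_cities` set and the helper `rail_route_passengers` are placeholders assumed for illustration; the real city list would come from the map, and the real frame from `Airports2.csv`.

```python
import pandas as pd

# Hypothetical subset of cities on the rail map (an assumption, not the real list).
rail_cities = {"Eugene, OR", "Bend, OR"}

# Tiny stand-in for df_usa, built from rows shown in the preview above.
sample = pd.DataFrame({
    "Origin_city": ["Manhattan, KS", "Eugene, OR", "Eugene, OR", "Medford, OR"],
    "Destination_city": ["Ames, IA", "Bend, OR", "Bend, OR", "Bend, OR"],
    "Passengers": [21, 41, 88, 0],
})

def rail_route_passengers(df, cities):
    """Total passengers per city pair where both endpoints are rail cities."""
    on_rail = df["Origin_city"].isin(cities) & df["Destination_city"].isin(cities)
    return (
        df.loc[on_rail]
        .groupby(["Origin_city", "Destination_city"], as_index=False)["Passengers"]
        .sum()
    )

route_totals = rail_route_passengers(sample, rail_cities)
print(route_totals)
```

Requiring both endpoints to be rail cities is one possible design choice; flights with only one endpoint on the map are dropped here because a train could not replace the whole trip.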
The machine learning aspect I propose is to use transportation data from Europe to predict how rail travel could translate to the US. Europe has a strong rail system, and while China's is bigger, I believe Europe is a better geographic and cultural model for the US. While I don't yet have a solid understanding of ML, I think we can take the following data on train and plane use, along with the corresponding population of each country, to get a picture of how Europeans travel, then develop a model of what such travel would look like in the US if we had high-speed rail. \
\
The following dataset covers European plane travel. We will use the country and passengers columns; the measure column specifies the type of travel. In our case we will only be using PAS_BRD, which is commercial passengers. It can also be found on Kaggle.
df_eur_planes = pd.read_csv("Passengers_Year_Transit.csv")
df_eur_planes.head(6)
| | country | measure | year | passengers |
---|---|---|---|---|
0 | AUT | PAS_BRD | 2021 | 11187400.0 |
1 | BEL | PAS_BRD | 2021 | 13516263.0 |
2 | BGR | PAS_BRD | 2021 | 5146280.0 |
3 | CHE | PAS_BRD | 2021 | 19293409.0 |
4 | CYP | PAS_BRD | 2021 | 4993689.0 |
5 | CZE | PAS_BRD | 2021 | 4796559.0 |
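Since only PAS_BRD rows are of interest, a minimal sketch of that selection might look like the following. The helper name `boarded_passengers` is hypothetical, and the sample frame reuses rows from the preview plus one clearly made-up row with a placeholder measure code, included only to show that the filter drops it.

```python
import pandas as pd

# Stand-in for df_eur_planes. The first three rows come from the preview above;
# the last row is made up, with a placeholder measure code "OTHER" (assumption).
sample = pd.DataFrame({
    "country": ["AUT", "BEL", "BGR", "AUT"],
    "measure": ["PAS_BRD", "PAS_BRD", "PAS_BRD", "OTHER"],
    "year": [2021, 2021, 2021, 2021],
    "passengers": [11187400.0, 13516263.0, 5146280.0, 1.0],
})

def boarded_passengers(df, year):
    """Return PAS_BRD passenger counts for one year, indexed by country code."""
    mask = (df["measure"] == "PAS_BRD") & (df["year"] == year)
    return df.loc[mask].set_index("country")["passengers"]

air_2021 = boarded_passengers(sample, 2021)
print(air_2021)
```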
There is data on train passengers for every country in the world (it will be cleaned down to just Europe), but the file is too big for read_csv. The relevant columns are Country Name and the passengers served each year since 1961. You can find it on The World Bank's website. \
\
This is population data for countries in Europe; we will be using country_name and population. It can also be found on Kaggle.
# European population data; we will be using the country_name and population columns
df_pops = pd.read_csv("europe populations.csv")
df_pops.head(6)
| | Unnamed: 0 | country_name | Continent | region | local_name | capital | area | population | population_per_sq_km | male_life_expectancy | female_life_expectancy | birth_rate | death_rate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Austria | Europe | Western Europe | Österreich | Vienna | 83,879 km² | 8,917,000 | 106.3 | 78.9 | 83.6 | 9.4 | 10.3 |
1 | 1 | Belgium | Europe | Western Europe | België / Belgique | Brussels | 30,530 km² | 11,544,000 | 378.1 | 78.6 | 83.1 | 9.9 | 11.0 |
2 | 2 | France | Europe | Western Europe | France | Paris | 549,087 km² | 67,380,000 | 122.7 | 79.2 | 85.3 | 10.9 | 9.9 |
3 | 3 | Germany | Europe | Western Europe | Deutschland | Berlin | 357,580 km² | 83,161,000 | 232.6 | 78.6 | 83.4 | 9.3 | 11.9 |
4 | 4 | Liechtenstein | Europe | Western Europe | Liechtenstein | Vaduz | 161 km² | 38,137 | 237.6 | 80.1 | 83.6 | 9.1 | 8.2 |
5 | 5 | Luxembourg | Europe | Western Europe | Luxembourg/Lëtzebuerg | Luxembourg | 2,590 km² | 630,419 | 243.4 | 79.4 | 84.2 | 10.2 | 7.3 |
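To relate plane passengers to population, the two tables need a shared key and a numeric population column. The sketch below assumes the population column is read as text (as the thousands separators in the preview suggest) and uses a hypothetical `code_to_name` mapping from the plane data's three-letter codes to `country_name`; neither assumption has been checked against the full files.

```python
import pandas as pd

# Hypothetical mapping from country codes in the plane data to country_name
# values in the population data (assumption; would need to cover every country).
code_to_name = {"AUT": "Austria", "BEL": "Belgium"}

# Stand-ins for df_eur_planes and df_pops, using values from the previews above.
planes = pd.DataFrame({
    "country": ["AUT", "BEL"],
    "passengers": [11187400.0, 13516263.0],
})
pops = pd.DataFrame({
    "country_name": ["Austria", "Belgium"],
    "population": ["8,917,000", "11,544,000"],
})

# Strip thousands separators so the population can be used in arithmetic.
pops["population"] = pops["population"].str.replace(",", "", regex=False).astype(int)

# Join on country name and compute boarded air passengers per resident.
planes["country_name"] = planes["country"].map(code_to_name)
merged = planes.merge(pops, on="country_name", how="inner")
merged["air_trips_per_capita"] = merged["passengers"] / merged["population"]
print(merged[["country_name", "air_trips_per_capita"]])
```

A per-capita rate like this is one candidate feature for the model; the same join could be repeated for the train-passenger data once it is loaded.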