(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).
The National Park Service recommends hiking and other outdoor activities for improving and maintaining mental, physical, and relational health (https://www.nps.gov/subjects/trails/benefits-of-hiking.htm). I'm interested in making trails and outdoor activities more accessible in order to encourage people to spend time outside. Features such as 'kid-friendly' or 'wheelchair accessible' are important to advertise for accessibility! There are many reasons why trail research matters: it is helpful not only for users but also for the trails themselves (funding, etc.) (https://www.americantrails.org/resources/five-reasons-trail-research-matters).
The data set I have chosen was extracted from AllTrails, a mobile app that provides information on hiking and outdoor recreational activities. It includes all of the National Park Service trails. Its columns describe the name of the trail, geographic location (park, city, state, country, coordinates), popularity, rating, features, and activities.
I want to investigate a way to determine the best national park, city, or region to visit according to specific interests (birdwatching, backpacking, hiking, kid-friendly, difficulty level, etc.). The overarching problem I'm investigating is the general inaccessibility of trails and the lack of information about trail accessibility. This extends to wanting to know which activities a trail supports: a birdwatcher, for example, would want to hike a trail that is known for good birdwatching. The dataset is long and overwhelming, and it could be sorted in a way that helps users make travel decisions that fit their interests and abilities.
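As a rough illustration of sorting trails by interest and ability, the sketch below filters a handful of trails by activity and maximum difficulty, then orders the matches by average rating. The rows are copied from the dataset preview later in this notebook; the `recommend` function and its parameters are hypothetical names introduced here, and the list cells are assumed to be already parsed into Python lists.

```python
# A few rows copied from the dataset preview, with the activities cell
# assumed to be already parsed into a Python list.
trails = [
    {"name": "Exit Glacier Trail", "difficulty_rating": 1, "avg_rating": 4.5,
     "activities": ["hiking", "walking"]},
    {"name": "Silversword Loop Via Halemau'u Trail", "difficulty_rating": 5,
     "avg_rating": 4.5, "activities": ["birding", "hiking", "nature-trips"]},
    {"name": "Keonehe'ehe'e Trail", "difficulty_rating": 5, "avg_rating": 5.0,
     "activities": ["backpacking", "camping", "hiking"]},
    {"name": "Red Hill Overlook Summit Trail", "difficulty_rating": 1,
     "avg_rating": 4.5, "activities": ["hiking", "walking"]},
]


def recommend(trails, activity, max_difficulty):
    """Return trails offering `activity` at or below `max_difficulty`,
    ordered from highest to lowest average rating (hypothetical helper)."""
    matches = [
        t for t in trails
        if activity in t["activities"]
        and t["difficulty_rating"] <= max_difficulty
    ]
    return sorted(matches, key=lambda t: t["avg_rating"], reverse=True)


easy_walks = recommend(trails, "walking", max_difficulty=2)
print([t["name"] for t in easy_walks])
```

The same filter could later be grouped by park, city, or state to compare locations rather than individual trails.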
(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.
trail_id (int): trail identification number
name (str): name of trail
area_name (str): name of national park that the trail is in
city_name (str): name of city the trail is located in
state_name (str): name of state the trail is located in
country_name (str): name of country the trail is located in
_geoloc (dict): latitude and longitude coordinates of trail location
popularity (float): unit unknown; appears to measure how popular the trail is by visits
length (float): length of trail in meters
elevation_gain (float): elevation gain in meters
difficulty_rating (int): ranges from 1-7, measures how difficult the trail is
route_type (str): out-and-back, point-to-point or loop – the format of the trail
visitor_usage (float): ranges from 1-4; I assume it measures something similar to popularity
avg_rating (float): ranges from 0-5, describes average of ratings made on All Trails for that specific trail
num_reviews (int): number of reviews for trail
features (list of str): different elements of the trail – mixture of suitability and attractions of the trail
activities (list of str): list of activities that can be done on the trail
units (str): meaning unclear; takes the values 'i' and 'm' in the preview below, possibly indicating imperial vs. metric display units
import pandas as pd
# note: trails.csv must be in the same folder as this Jupyter notebook
trails = pd.read_csv('trails.csv')
trails  # display the loaded DataFrame
| | trail_id | name | area_name | city_name | state_name | country_name | _geoloc | popularity | length | elevation_gain | difficulty_rating | route_type | visitor_usage | avg_rating | num_reviews | features | activities | units |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10020048 | Harding Ice Field Trail | Kenai Fjords National Park | Seward | Alaska | United States | {'lat': 60.18852, 'lng': -149.63156} | 24.8931 | 15610.598 | 1161.8976 | 5 | out and back | 3.0 | 5.0 | 423 | ['dogs-no', 'forest', 'river', 'views', 'water... | ['birding', 'camping', 'hiking', 'nature-trips... | i |
1 | 10236086 | Mount Healy Overlook Trail | Denali National Park | Denali National Park | Alaska | United States | {'lat': 63.73049, 'lng': -148.91968} | 18.0311 | 6920.162 | 507.7968 | 3 | out and back | 1.0 | 4.5 | 260 | ['dogs-no', 'forest', 'views', 'wild-flowers',... | ['birding', 'camping', 'hiking', 'nature-trips... | i |
2 | 10267857 | Exit Glacier Trail | Kenai Fjords National Park | Seward | Alaska | United States | {'lat': 60.18879, 'lng': -149.631} | 17.7821 | 2896.812 | 81.9912 | 1 | out and back | 3.0 | 4.5 | 224 | ['dogs-no', 'partially-paved', 'views', 'wildl... | ['hiking', 'walking'] | i |
3 | 10236076 | Horseshoe Lake Trail | Denali National Park | Denali National Park | Alaska | United States | {'lat': 63.73661, 'lng': -148.915} | 16.2674 | 3379.614 | 119.7864 | 1 | loop | 2.0 | 4.5 | 237 | ['dogs-no', 'forest', 'lake', 'kids', 'views',... | ['birding', 'hiking', 'nature-trips', 'trail-r... | i |
4 | 10236082 | Triple Lakes Trail | Denali National Park | Denali National Park | Alaska | United States | {'lat': 63.73319, 'lng': -148.89682} | 12.5935 | 29772.790 | 1124.7120 | 5 | out and back | 1.0 | 4.5 | 110 | ['dogs-no', 'lake', 'views', 'wild-flowers', '... | ['birding', 'fishing', 'hiking', 'nature-trips... | i |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3308 | 10008302 | Silversword Loop Via Halemau'u Trail | Haleakala National Park | Kula | Maui | Hawaii | {'lat': 20.75275, 'lng': -156.22884} | 9.3861 | 20116.750 | 1105.8144 | 5 | loop | 2.0 | 4.5 | 43 | ['dogs-no', 'views', 'wild-flowers'] | ['birding', 'hiking', 'nature-trips'] | m |
3309 | 10236001 | Keonehe'ehe'e Trail | Haleakala National Park | Kula | Maui | Hawaii | {'lat': 20.714480000000002, 'lng': -156.25072} | 9.1555 | 28324.384 | 1171.9560 | 5 | out and back | 2.0 | 5.0 | 22 | ['dogs-no', 'views', 'wildlife'] | ['backpacking', 'camping', 'hiking'] | m |
3310 | 10258707 | Red Hill Overlook Summit Trail | Haleakala National Park | Kula | Maui | Hawaii | {'lat': 20.71007, 'lng': -156.25357} | 8.5066 | 321.868 | 3.9624 | 1 | out and back | NaN | 4.5 | 31 | ['dogs-no', 'kids', 'views'] | ['hiking', 'walking'] | m |
3311 | 10014989 | Kaupo Trail | Haleakala National Park | Kula | Maui | Hawaii | {'lat': 20.64981, 'lng': -156.137} | 8.3240 | 19312.080 | 1670.9136 | 5 | out and back | 1.0 | 4.0 | 8 | ['dogs-no', 'views', 'wildlife'] | ['hiking'] | m |
3312 | 10259465 | Ka Lu'u o ka O'o Cinder Cone via Crater and Sl... | Haleakala National Park | Kula | Maui | Hawaii | {'lat': 20.71449, 'lng': -156.25085} | 2.4176 | 8368.568 | 510.8448 | 3 | loop | 2.0 | 4.5 | 45 | ['views'] | ['hiking'] | m |
3313 rows × 18 columns
This data set has many variables to choose from. I'm interested in focusing on the activities and features columns to help identify areas of interest (geographic locations) for specific users (birders, backpackers, families, people with dogs, etc.). I think difficulty_rating, elevation_gain, length, and avg_rating are all good variables to cross-reference against them.
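Because a CSV stores the features and activities lists as text, they likely need to be parsed before they can be analyzed. The sketch below shows one way to do that with the standard library and then compute, per park, the share of trails carrying a given tag. The sample rows are hypothetical, and `parse_list_cell` and `tag_share_by_park` are illustrative names, not part of the dataset or of pandas.

```python
import ast
from collections import defaultdict

# Hypothetical sample rows mimicking how the CSV stores list columns as text.
rows = [
    {"area_name": "Denali National Park",
     "features": "['dogs-no', 'forest', 'kids', 'views']"},
    {"area_name": "Denali National Park",
     "features": "['dogs-no', 'views']"},
    {"area_name": "Haleakala National Park",
     "features": "['dogs-no', 'kids', 'views']"},
]


def parse_list_cell(cell):
    """Turn a cell such as "['kids', 'views']" into a real Python list."""
    if isinstance(cell, list):
        return cell
    if not isinstance(cell, str) or not cell.strip():
        return []  # treat missing values (e.g. NaN) as "no tags"
    return list(ast.literal_eval(cell))


def tag_share_by_park(rows, column, tag):
    """Fraction of each park's trails whose `column` list contains `tag`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        park = row["area_name"]
        totals[park] += 1
        if tag in parse_list_cell(row[column]):
            hits[park] += 1
    return {park: hits[park] / totals[park] for park in totals}


kid_share = tag_share_by_park(rows, "features", "kids")
# Parks ordered from the highest to the lowest share of 'kids' trails.
ranking = sorted(kid_share.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

With the real data, the same parser could be applied to the DataFrame returned by `pd.read_csv('trails.csv')` through the `apply` method of its features and activities columns.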
(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do.
I want to cluster national parks according to areas of interest (features or activities, or both). Then, I want to take those findings and sort them according to region, state or national park. This will serve as a sort of geographic, visual recommendation for users according to their desired trail features, activities, or accessibility requirements.
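To give a general idea of what the clustering step could look like, here is a minimal k-means sketch. Choosing k-means is an assumption, since no method has been selected yet, and the park names and profile values are hypothetical. Each park is represented by the share of its trails carrying a tag, and parks with similar profiles end up in the same cluster.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=50, seed=0):
    """Minimal k-means sketch: returns one cluster label per point."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical park profiles:
# [share of trails tagged 'kids', share of trails tagged 'backpacking']
park_profiles = {
    "Park A": [0.9, 0.1],
    "Park B": [0.8, 0.2],
    "Park C": [0.1, 0.9],
    "Park D": [0.2, 0.7],
}

names = list(park_profiles)
labels = kmeans([park_profiles[name] for name in names], k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

In the actual project, a library implementation such as scikit-learn's `KMeans` would probably replace the hand-written loop, and the resulting cluster labels could then be grouped by state or region for the visual recommendation.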