Making Trails More Accessible & Travel Decisions Easier

Part 1

(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

The National Park Service recommends hiking and other outdoor activities for improving and maintaining mental, physical, and relational health (https://www.nps.gov/subjects/trails/benefits-of-hiking.htm). I'm interested in making trails and outdoor activities more accessible in order to encourage people to spend time outside. Features such as 'kid-friendly' or 'wheelchair accessible' are important to advertise for accessibility! Trail research matters for many reasons: it is helpful not only for users but also for the trails themselves (funding, etc.) (https://www.americantrails.org/resources/five-reasons-trail-research-matters).

The data set I have chosen is extracted from AllTrails, a mobile app that provides information on hiking and outdoor recreational activities. It includes all of the trails in the National Park Service system. The data set has columns that describe the name of the trail, geographic location (park, city, state, country, coordinates), length, elevation gain, difficulty, popularity, rating, features, and activities.

I want to investigate a way to determine the best national park, city, or region to visit according to specific interests (birdwatching, backpacking, hiking, kid-friendly, difficulty level, etc.). The overarching problem I'm investigating is the general inaccessibility of trails and the lack of information surrounding trail accessibility. This extends to knowing which activities a trail supports: a birdwatcher, for example, would want to hike a trail that is known for good birdwatching. As it stands, the dataset is long and overwhelming, but it could be sorted in a way that helps users make travel decisions that fit their interests and abilities.

Part 2

(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

trail_id (int): trail identification number

name (str): name of trail

area_name (str): name of national park that the trail is in

city_name (str): name of city the trail is located in

state_name (str): name of state the trail is located in

country_name (str): name of country the trail is located in

_geoloc (dict): latitude and longitude coordinates of trail location

popularity (float): unit unknown; appears to measure how popular the trail is by visits

length (float): length of trail in meters

elevation_gain (float): elevation gain in meters

difficulty_rating (int): ranges from 1-7, measures how difficult the trail is

route_type (str): the format of the trail (out and back, point to point, or loop)

visitor_usage (float): ranges from 1-4, assumed to measure something similar to popularity; missing (NaN) for some trails

avg_rating (float): ranges from 0-5, the average of the ratings given on AllTrails for that specific trail

num_reviews (int): number of reviews for trail

features (list of str): different elements of the trail, a mixture of suitability tags (e.g. 'dogs-no', 'kids') and attractions (e.g. 'views', 'lake')

activities (list of str): list of activities that can be done on the trail

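units (str): meaning unclear; the rows shown below contain 'i' and 'm', which may indicate imperial vs. metric display units (an assumption, not confirmed by the source)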

In [2]:
import pandas as pd

# note: file must be next to jupyter notebook in same folder
pd.read_csv('trails.csv')
Out[2]:
| | trail_id | name | area_name | city_name | state_name | country_name | _geoloc | popularity | length | elevation_gain | difficulty_rating | route_type | visitor_usage | avg_rating | num_reviews | features | activities | units |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 10020048 | Harding Ice Field Trail | Kenai Fjords National Park | Seward | Alaska | United States | {'lat': 60.18852, 'lng': -149.63156} | 24.8931 | 15610.598 | 1161.8976 | 5 | out and back | 3.0 | 5.0 | 423 | ['dogs-no', 'forest', 'river', 'views', 'water... | ['birding', 'camping', 'hiking', 'nature-trips... | i |
| 1 | 10236086 | Mount Healy Overlook Trail | Denali National Park | Denali National Park | Alaska | United States | {'lat': 63.73049, 'lng': -148.91968} | 18.0311 | 6920.162 | 507.7968 | 3 | out and back | 1.0 | 4.5 | 260 | ['dogs-no', 'forest', 'views', 'wild-flowers',... | ['birding', 'camping', 'hiking', 'nature-trips... | i |
| 2 | 10267857 | Exit Glacier Trail | Kenai Fjords National Park | Seward | Alaska | United States | {'lat': 60.18879, 'lng': -149.631} | 17.7821 | 2896.812 | 81.9912 | 1 | out and back | 3.0 | 4.5 | 224 | ['dogs-no', 'partially-paved', 'views', 'wildl... | ['hiking', 'walking'] | i |
| 3 | 10236076 | Horseshoe Lake Trail | Denali National Park | Denali National Park | Alaska | United States | {'lat': 63.73661, 'lng': -148.915} | 16.2674 | 3379.614 | 119.7864 | 1 | loop | 2.0 | 4.5 | 237 | ['dogs-no', 'forest', 'lake', 'kids', 'views',... | ['birding', 'hiking', 'nature-trips', 'trail-r... | i |
| 4 | 10236082 | Triple Lakes Trail | Denali National Park | Denali National Park | Alaska | United States | {'lat': 63.73319, 'lng': -148.89682} | 12.5935 | 29772.790 | 1124.7120 | 5 | out and back | 1.0 | 4.5 | 110 | ['dogs-no', 'lake', 'views', 'wild-flowers', '... | ['birding', 'fishing', 'hiking', 'nature-trips... | i |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3308 | 10008302 | Silversword Loop Via Halemau'u Trail | Haleakala National Park | Kula | Maui | Hawaii | {'lat': 20.75275, 'lng': -156.22884} | 9.3861 | 20116.750 | 1105.8144 | 5 | loop | 2.0 | 4.5 | 43 | ['dogs-no', 'views', 'wild-flowers'] | ['birding', 'hiking', 'nature-trips'] | m |
| 3309 | 10236001 | Keonehe'ehe'e Trail | Haleakala National Park | Kula | Maui | Hawaii | {'lat': 20.714480000000002, 'lng': -156.25072} | 9.1555 | 28324.384 | 1171.9560 | 5 | out and back | 2.0 | 5.0 | 22 | ['dogs-no', 'views', 'wildlife'] | ['backpacking', 'camping', 'hiking'] | m |
| 3310 | 10258707 | Red Hill Overlook Summit Trail | Haleakala National Park | Kula | Maui | Hawaii | {'lat': 20.71007, 'lng': -156.25357} | 8.5066 | 321.868 | 3.9624 | 1 | out and back | NaN | 4.5 | 31 | ['dogs-no', 'kids', 'views'] | ['hiking', 'walking'] | m |
| 3311 | 10014989 | Kaupo Trail | Haleakala National Park | Kula | Maui | Hawaii | {'lat': 20.64981, 'lng': -156.137} | 8.3240 | 19312.080 | 1670.9136 | 5 | out and back | 1.0 | 4.0 | 8 | ['dogs-no', 'views', 'wildlife'] | ['hiking'] | m |
| 3312 | 10259465 | Ka Lu'u o ka O'o Cinder Cone via Crater and Sl... | Haleakala National Park | Kula | Maui | Hawaii | {'lat': 20.71449, 'lng': -156.25085} | 2.4176 | 8368.568 | 510.8448 | 3 | loop | 2.0 | 4.5 | 45 | ['views'] | ['hiking'] | m |

3313 rows × 18 columns
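Because pd.read_csv does not parse nested values, the _geoloc, features, and activities columns arrive as plain strings rather than dicts and lists. Below is a minimal sketch of one way to convert them, using three rows copied from the output above in place of the full file. The column names lat, lng, and length_km are hypothetical helpers introduced only for this illustration.

```python
import ast
import io

import pandas as pd

# Three rows copied from the output above, standing in for trails.csv
csv_text = '''trail_id,name,_geoloc,length,features,activities
10258707,Red Hill Overlook Summit Trail,"{'lat': 20.71007, 'lng': -156.25357}",321.868,"['dogs-no', 'kids', 'views']","['hiking', 'walking']"
10014989,Kaupo Trail,"{'lat': 20.64981, 'lng': -156.137}",19312.080,"['dogs-no', 'views', 'wildlife']","['hiking']"
10259465,Ka Lu'u o ka O'o Cinder Cone,"{'lat': 20.71449, 'lng': -156.25085}",8368.568,"['views']","['hiking']"
'''
df = pd.read_csv(io.StringIO(csv_text))

# Turn the string representations into real Python dicts and lists
for col in ["_geoloc", "features", "activities"]:
    df[col] = df[col].apply(ast.literal_eval)

# Split the coordinates into numeric columns (hypothetical column names)
df["lat"] = df["_geoloc"].apply(lambda g: g["lat"])
df["lng"] = df["_geoloc"].apply(lambda g: g["lng"])

# The data dictionary lists length in meters; convert to kilometers
df["length_km"] = df["length"] / 1000

# Example query: trails tagged as suitable for kids
kid_friendly = df[df["features"].apply(lambda tags: "kids" in tags)]
print(kid_friendly[["name", "length_km"]])
```

If the full file has missing values in any of these columns, they would need to be handled before calling ast.literal_eval.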

This data set has a multitude of variables to choose from. I'm interested in focusing on the activities and features columns to help determine areas of interest (geographic locations) for specific users (birders, backpackers, families, people with dogs, etc.). I think difficulty_rating, elevation_gain, length, and avg_rating are all good variables to cross-reference against those two columns.
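As a rough check that these columns can support the idea, here is a sketch that computes, for each park, the share of its trails offering each activity, and then cross-references an activity with difficulty_rating. It uses only the five rows from the output above whose activities list is not truncated, and it assumes the list column has already been parsed into Python lists. The variable names share and easy_walks are illustrative.

```python
import pandas as pd

# Rows copied from the output above (activities lists shown in full there)
trails = pd.DataFrame({
    "name": [
        "Exit Glacier Trail",
        "Silversword Loop Via Halemau'u Trail",
        "Keonehe'ehe'e Trail",
        "Red Hill Overlook Summit Trail",
        "Kaupo Trail",
    ],
    "area_name": ["Kenai Fjords National Park"] + ["Haleakala National Park"] * 4,
    "difficulty_rating": [1, 5, 5, 1, 5],
    "avg_rating": [4.5, 4.5, 5.0, 4.5, 4.0],
    "activities": [
        ["hiking", "walking"],
        ["birding", "hiking", "nature-trips"],
        ["backpacking", "camping", "hiking"],
        ["hiking", "walking"],
        ["hiking"],
    ],
})

# One row per (trail, activity) pair
exploded = trails.explode("activities")

# Count trails per park and activity, then divide by the trails in each park
counts = pd.crosstab(exploded["area_name"], exploded["activities"])
trails_per_park = trails.groupby("area_name").size()
share = counts.div(trails_per_park, axis=0)
print(share)

# Cross-reference: easy trails (difficulty 1) that list walking
easy_walks = trails[
    (trails["difficulty_rating"] <= 1)
    & trails["activities"].apply(lambda acts: "walking" in acts)
]
print(easy_walks[["name", "area_name", "avg_rating"]])
```

The same pattern would apply to the features column, for example to find parks with a high share of trails tagged 'kids'.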

Part 3

(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do.

I want to cluster national parks according to areas of interest (features, activities, or both). Then, I want to take those findings and sort them by region, state, or national park. This will serve as a geographic, visual recommendation for users based on their desired trail features, activities, or accessibility requirements.
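To make the clustering idea concrete, here is a minimal sketch that assumes k-means from scikit-learn as the method. That choice is an assumption for illustration, since the plan above does not name a specific algorithm. The park names and numbers are made up: each row stands for a park, and each column for the share of that park's trails with a given activity or feature.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up park profiles (NOT real data): share of trails per activity/feature
profiles = pd.DataFrame(
    {
        "birding": [0.8, 0.7, 0.1, 0.2],
        "backpacking": [0.1, 0.2, 0.9, 0.8],
        "kids": [0.6, 0.7, 0.1, 0.0],
    },
    index=["Park A", "Park B", "Park C", "Park D"],
)

# Group parks with similar profiles; n_clusters=2 is an arbitrary choice here
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(profiles)

clustered = profiles.assign(cluster=labels)
print(clustered.sort_values("cluster"))
```

In this toy example, the family- and birding-oriented parks should land in one cluster and the backpacking-oriented parks in the other. On the real data, the number of clusters and the choice of columns would need to be explored.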