College Tuition Cost and Salary Potential¶

Problem¶

College tuition cost is constantly rising in the United States, at almost 8 times faster than wages. With this rising tuition, it is very important that students go to the college that both fits them most and sets them up for the best possible future. There are many factors to consider with this problem, including average salary after graduating a college, and the comparison of the cost of college against the salary earned.
Price of College Increasing Almost 8 Times Faster Than Wages: https://www.forbes.com/sites/camilomaldonado/2018/07/24/price-of-college-increasing-almost-8-times-faster-than-wages/?sh=372ef3b866c1
Does College Pay Off? Tuition Costs vs. Earning Power: http://www.educationplanner.org/students/career-planning/explore-salary-pay/does-college-pay-off.shtml

Solution¶

There are datasets that contain the average room and board, in-state tuition, and out of state tuition for a huge number of colleges across the US. In addition to this, there are datasets that have the early career pay and mid career pay of a wide variety of colleges across the US. Using these datasets, we could potentially identify and use a relationship between a colleges features (state, type of college, tuition, stem percent) to estimate the salary of the students after they graduate.

Impact¶

This could produce a classifier that predicts a students salary based on the features of a college. This could help provide students with insight on different factors that affect their long-term career and financial stability, and perhaps help them narrow down the types of colleges they want to apply to.

Potential Problems¶

The one downside of the classifier is that this is an average of all students. This doesn't take into account different majors (some of which inherently pay less money than others). There is a lot of data on this already on the internet, and this looks at it in some way by testing whether the percent of STEM students in the college affects the future salary of the students.

Additionally, the data doesn't take into account the size of the colleges, or admission rates. The more prestigious a college, generally the higher salary a student earns after leaving college. We chose to focus more on features outside of this.

Loaded Dataset¶

In [16]:
import pandas as pd
tuition = pd.read_csv('tuition_cost.csv')
tuition.head()
Out[16]:
name state state_code type degree_length room_and_board in_state_tuition in_state_total out_of_state_tuition out_of_state_total
0 Aaniiih Nakoda College Montana MT Public 2 Year NaN 2380 2380 2380 2380
1 Abilene Christian University Texas TX Private 4 Year 10350.0 34850 45200 34850 45200
2 Abraham Baldwin Agricultural College Georgia GA Public 2 Year 8474.0 4128 12602 12550 21024
3 Academy College Minnesota MN For Profit 2 Year NaN 17661 17661 17661 17661
4 Academy of Art University California CA For Profit 4 Year 16648.0 27810 44458 27810 44458

Data Dictionary:¶

name: Name of the college
state: Name of the state the college is based in
state_code: The abbreviation of the state name
type: The type of college it is (for profit, public, or private)
degree_length: How long it takes to get a degree at the college
room_and_board: Average room and board costs combined at the college
in_state_tuition: Average tuition for students who live in the same state as the college
in_state_total: Average total costs for students who live in the same state as the college (room, board, tuition)
in_state_tuition: Average tuition for students who don't live in the same state as the college
in_state_total: Average total costs for students who don't live in the same state as the college (room, board, tuition)

  • All of the data involving pay is measured in US dollars.
In [17]:
salary = pd.read_csv('salary_potential.csv')
salary.head()
Out[17]:
rank name state_name early_career_pay mid_career_pay make_world_better_percent stem_percent
0 1 Auburn University Alabama 54400 104500 51.0 31
1 2 University of Alabama in Huntsville Alabama 57500 103900 59.0 45
2 3 The University of Alabama Alabama 52300 97400 50.0 15
3 4 Tuskegee University Alabama 54500 93500 61.0 30
4 5 Samford University Alabama 48400 90500 52.0 3

Data Dictionary:¶

rank: The number of the row the college is in
name: Name of the college
state_name: Name of the state the college is based in
early_career_pay: The average pay one receives in the years immediately following college
mid_career_pay: The average pay one receives in the middle of their career (when they are generally most stable)
make_world_better_percent: The percent of students who think they're helping make the world a better place
stem_percent: The percent of STEM students at the college

How will data be used to solve the problem?¶

Method¶

This problem will be a regression problem, where we use the features of each college to estimate the salary of college students after they graduate (using all of the coninuous data, so excluding the college names, states, and types). If we wish to also find the relationship between the states and types of college each are located in with post-graduation salary, we could use classification. This will help us discover if there are certain traits that affect post-graduation salary more than others.