College tuition in the United States is rising almost 8 times faster than wages. With costs climbing this quickly, it is important that students choose a college that both fits them well and sets them up for the best possible future. There are many factors to consider, including the average salary of a college's graduates and how the cost of attending compares with the salary earned afterward.
Price of College Increasing Almost 8 Times Faster Than Wages: https://www.forbes.com/sites/camilomaldonado/2018/07/24/price-of-college-increasing-almost-8-times-faster-than-wages/?sh=372ef3b866c1
Does College Pay Off? Tuition Costs vs. Earning Power: http://www.educationplanner.org/students/career-planning/explore-salary-pay/does-college-pay-off.shtml
There are datasets that contain the average room and board, in-state tuition, and out-of-state tuition for a large number of colleges across the US. There are also datasets with the early-career and mid-career pay for a wide variety of US colleges. Using these datasets, we could potentially identify a relationship between a college's features (state, type of college, tuition, STEM percent) and the salary of its students after they graduate, and then use that relationship to estimate the salary.
This could produce a model that predicts a student's salary based on the features of a college. It could give students insight into the factors that affect their long-term career and financial stability, and perhaps help them narrow down the types of colleges they want to apply to.
The main downside of the model is that it predicts an average over all students at a college. It doesn't take into account different majors (some of which inherently pay less than others). There is already a lot of data on this online, and our analysis touches on it indirectly by testing whether the percent of STEM students at a college affects the future salary of its students.
Additionally, the data doesn't take into account the size of the colleges or their admission rates. Generally, the more prestigious a college, the higher the salary a student earns after graduating. We chose to focus on features other than these.
import pandas as pd
tuition = pd.read_csv('tuition_cost.csv')
tuition.head()
| | name | state | state_code | type | degree_length | room_and_board | in_state_tuition | in_state_total | out_of_state_tuition | out_of_state_total |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aaniiih Nakoda College | Montana | MT | Public | 2 Year | NaN | 2380 | 2380 | 2380 | 2380 |
| 1 | Abilene Christian University | Texas | TX | Private | 4 Year | 10350.0 | 34850 | 45200 | 34850 | 45200 |
| 2 | Abraham Baldwin Agricultural College | Georgia | GA | Public | 2 Year | 8474.0 | 4128 | 12602 | 12550 | 21024 |
| 3 | Academy College | Minnesota | MN | For Profit | 2 Year | NaN | 17661 | 17661 | 17661 | 17661 |
| 4 | Academy of Art University | California | CA | For Profit | 4 Year | 16648.0 | 27810 | 44458 | 27810 | 44458 |
name: Name of the college
state: Name of the state the college is based in
state_code: The abbreviation of the state name
type: The type of college it is (for profit, public, or private)
degree_length: How long it takes to get a degree at the college
room_and_board: Average room and board costs combined at the college
in_state_tuition: Average tuition for students who live in the same state as the college
in_state_total: Average total costs for students who live in the same state as the college (room, board, tuition)
out_of_state_tuition: Average tuition for students who don't live in the same state as the college
out_of_state_total: Average total costs for students who don't live in the same state as the college (room, board, tuition)
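As a quick sanity check on the column descriptions above, the sketch below rebuilds the five rows shown in the `tuition.head()` output and tests whether each total equals tuition plus room and board. Treating a missing `room_and_board` as zero is an assumption; it is consistent with these five rows, but it has not been verified against the full file. The names `tuition_head`, `in_state_ok`, and `out_of_state_ok` are illustrative.

```python
import pandas as pd

nan = float("nan")

# The five rows displayed by tuition.head(), re-entered by hand.
tuition_head = pd.DataFrame({
    "name": [
        "Aaniiih Nakoda College",
        "Abilene Christian University",
        "Abraham Baldwin Agricultural College",
        "Academy College",
        "Academy of Art University",
    ],
    "room_and_board": [nan, 10350.0, 8474.0, nan, 16648.0],
    "in_state_tuition": [2380, 34850, 4128, 17661, 27810],
    "in_state_total": [2380, 45200, 12602, 17661, 44458],
    "out_of_state_tuition": [2380, 34850, 12550, 17661, 27810],
    "out_of_state_total": [2380, 45200, 21024, 17661, 44458],
})

# Assumption: a missing room_and_board value contributes nothing to the total.
room = tuition_head["room_and_board"].fillna(0)

# Row-by-row check that total == tuition + room and board.
in_state_ok = (tuition_head["in_state_tuition"] + room) == tuition_head["in_state_total"]
out_of_state_ok = (tuition_head["out_of_state_tuition"] + room) == tuition_head["out_of_state_total"]

print(in_state_ok.all(), out_of_state_ok.all())
```

If the same check holds on the full dataset, the `*_total` columns carry no information beyond tuition and room and board, which is worth keeping in mind when choosing features.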
salary = pd.read_csv('salary_potential.csv')
salary.head()
| | rank | name | state_name | early_career_pay | mid_career_pay | make_world_better_percent | stem_percent |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Auburn University | Alabama | 54400 | 104500 | 51.0 | 31 |
| 1 | 2 | University of Alabama in Huntsville | Alabama | 57500 | 103900 | 59.0 | 45 |
| 2 | 3 | The University of Alabama | Alabama | 52300 | 97400 | 50.0 | 15 |
| 3 | 4 | Tuskegee University | Alabama | 54500 | 93500 | 61.0 | 30 |
| 4 | 5 | Samford University | Alabama | 48400 | 90500 | 52.0 | 3 |
rank: The college's rank within its state by salary potential
name: Name of the college
state_name: Name of the state the college is based in
early_career_pay: The average pay one receives in the years immediately following college
mid_career_pay: The average pay one receives in the middle of their career (when they are generally most stable)
make_world_better_percent: The percent of students who think they're helping make the world a better place
stem_percent: The percent of STEM students at the college
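To relate a college's costs to its graduates' pay, the two tables need to be combined. The sketch below shows one way this might be done, assuming the shared `name` column is a usable join key. It uses tiny stand-in tables: "Example College" and all of its numbers are made-up placeholders, and `tuition_demo`, `salary_demo`, `merged`, and `matched` are illustrative names. An outer merge with `indicator=True` makes it easy to see which colleges appear in only one file.

```python
import pandas as pd

# Stand-in tables. "Example College" and its values are hypothetical placeholders.
tuition_demo = pd.DataFrame({
    "name": ["Abilene Christian University", "Example College"],
    "type": ["Private", "Public"],
    "in_state_total": [45200, 20000],
    "out_of_state_total": [45200, 30000],
})
salary_demo = pd.DataFrame({
    "name": ["Auburn University", "Example College"],
    "early_career_pay": [54400, 50000],
    "mid_career_pay": [104500, 90000],
    "stem_percent": [31, 20],
})

# Outer join on the college name; the _merge column records where each row came from.
merged = tuition_demo.merge(salary_demo, on="name", how="outer", indicator=True)

# Only colleges found in both tables have cost and salary information.
matched = merged[merged["_merge"] == "both"]

print(merged["_merge"].value_counts())
print(matched[["name", "in_state_total", "early_career_pay"]])
```

With the real files, the count of `left_only` and `right_only` rows would show how many colleges are lost by joining on the name alone.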
This will be a regression problem, where we use the features of each college to estimate the salary of its students after they graduate (using all of the continuous data, so excluding the college names, states, and types). If we also wish to find the relationship between post-graduation salary and the state and type of each college, we could use classification. This will help us discover whether certain traits affect post-graduation salary more than others.
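A minimal sketch of the regression setup described above, using only the five Alabama rows shown in the `salary.head()` output. Five rows are far too few to draw any conclusion, so this only illustrates the mechanics. Ordinary least squares via NumPy is a choice made here for self-containment, and the choice of `early_career_pay` as the target and the two columns in `feature_cols` are assumptions; the real analysis could use a different target, more features, and a library such as scikit-learn.

```python
import numpy as np
import pandas as pd

# The five rows displayed by salary.head(), re-entered by hand.
salary_head = pd.DataFrame({
    "early_career_pay": [54400, 57500, 52300, 54500, 48400],
    "mid_career_pay": [104500, 103900, 97400, 93500, 90500],
    "make_world_better_percent": [51.0, 59.0, 50.0, 61.0, 52.0],
    "stem_percent": [31, 45, 15, 30, 3],
})

feature_cols = ["stem_percent", "make_world_better_percent"]  # continuous features only
target_col = "early_career_pay"  # assumed target for this illustration

# Design matrix with a leading column of ones for the intercept.
features = salary_head[feature_cols].to_numpy(dtype=float)
X = np.column_stack([np.ones(len(salary_head)), features])
y = salary_head[target_col].to_numpy(dtype=float)

# Ordinary least squares: find coef minimizing ||X @ coef - y||^2.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

predicted = X @ coef
residuals = y - predicted

print(dict(zip(["intercept"] + feature_cols, coef)))
```

The fitted coefficients give the estimated change in pay per percentage point of each feature, which is the kind of comparison between traits that the analysis is after.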