According to research.com, US college students experience a 40% dropout rate per year, and only 41% of college students graduate within 4 years without delay. This places the United States 19th out of 28 countries in graduation rates, according to the Organization for Economic Co-operation and Development (OECD). While a college education is not necessary to succeed in modern American life, a bachelor's degree is directly linked to a higher average salary, a higher job level, and greater financial success.
Find a way to predict possible college dropouts based on socioeconomic factors as well as other influences.
We will use this dataset to identify possible relationships between the dropout rate of US college students and socioeconomic and other outside factors. These factors include, but are not limited to, the marital status of the students or their parents, whether the student holds a scholarship, and the age of the student at enrollment.
If successful, we may be able to use these indicators to identify struggling college students and better prepare them for the change they are about to experience before they enter the higher education system.
import pandas as pd
df_dropout_pred = pd.read_csv('dataset.csv')
df_dropout_pred
Index | Marital status | Application mode | Application order | Course | Daytime/evening attendance | Previous qualification | Nacionality | Mother's qualification | Father's qualification | Mother's occupation | ... | Curricular units 2nd sem (credited) | Curricular units 2nd sem (enrolled) | Curricular units 2nd sem (evaluations) | Curricular units 2nd sem (approved) | Curricular units 2nd sem (grade) | Curricular units 2nd sem (without evaluations) | Unemployment rate | Inflation rate | GDP | Target |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 8 | 5 | 2 | 1 | 1 | 1 | 13 | 10 | 6 | ... | 0 | 0 | 0 | 0 | 0.000000 | 0 | 10.8 | 1.4 | 1.74 | Dropout |
1 | 1 | 6 | 1 | 11 | 1 | 1 | 1 | 1 | 3 | 4 | ... | 0 | 6 | 6 | 6 | 13.666667 | 0 | 13.9 | -0.3 | 0.79 | Graduate |
2 | 1 | 1 | 5 | 5 | 1 | 1 | 1 | 22 | 27 | 10 | ... | 0 | 6 | 0 | 0 | 0.000000 | 0 | 10.8 | 1.4 | 1.74 | Dropout |
3 | 1 | 8 | 2 | 15 | 1 | 1 | 1 | 23 | 27 | 6 | ... | 0 | 6 | 10 | 5 | 12.400000 | 0 | 9.4 | -0.8 | -3.12 | Graduate |
4 | 2 | 12 | 1 | 3 | 0 | 1 | 1 | 22 | 28 | 10 | ... | 0 | 6 | 6 | 6 | 13.000000 | 0 | 13.9 | -0.3 | 0.79 | Graduate |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4419 | 1 | 1 | 6 | 15 | 1 | 1 | 1 | 1 | 1 | 6 | ... | 0 | 6 | 8 | 5 | 12.666667 | 0 | 15.5 | 2.8 | -4.06 | Graduate |
4420 | 1 | 1 | 2 | 15 | 1 | 1 | 19 | 1 | 1 | 10 | ... | 0 | 6 | 6 | 2 | 11.000000 | 0 | 11.1 | 0.6 | 2.02 | Dropout |
4421 | 1 | 1 | 1 | 12 | 1 | 1 | 1 | 22 | 27 | 10 | ... | 0 | 8 | 9 | 1 | 13.500000 | 0 | 13.9 | -0.3 | 0.79 | Dropout |
4422 | 1 | 1 | 1 | 9 | 1 | 1 | 1 | 22 | 27 | 8 | ... | 0 | 5 | 6 | 5 | 12.000000 | 0 | 9.4 | -0.8 | -3.12 | Graduate |
4423 | 1 | 5 | 1 | 15 | 1 | 1 | 9 | 23 | 27 | 6 | ... | 0 | 6 | 6 | 6 | 13.000000 | 0 | 12.7 | 3.7 | -1.70 | Graduate |
4424 rows × 35 columns
This dataset relies heavily on categorical data and, as such, may need to be paired with another dataset or examined more closely to determine how useful the categorical data is. It also already records its own conclusion, in the Target column, on whether each student drops out, based on the data it has gathered. Nevertheless, the concept and the data provided remain intriguing and worth exploring.
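As a first look at how much of the data is categorical, the sketch below flags integer columns with few distinct values as likely category codes and reports the share of each Target label. The helper name `split_column_kinds` and the `max_levels=20` cutoff are assumptions made for illustration, and the heuristic will also flag genuine small counts (such as curricular units), so its output should be checked by hand. The small `preview` frame copies the first five rows of three columns from the table above; in the notebook, `df_dropout_pred` would be passed instead.

```python
import pandas as pd


def split_column_kinds(df, target="Target", max_levels=20):
    """Heuristically separate integer-coded categorical columns from the rest.

    A column counts as categorical when it has an integer dtype and at most
    `max_levels` distinct values. Both the rule and the cutoff are assumptions.
    """
    categorical, continuous = [], []
    for name in df.columns.drop(target):
        column = df[name]
        is_code = (
            pd.api.types.is_integer_dtype(column)
            and column.nunique() <= max_levels
        )
        (categorical if is_code else continuous).append(name)
    return categorical, continuous


# First five rows of three columns, copied from the preview table above.
preview = pd.DataFrame({
    "Marital status": [1, 1, 1, 1, 2],
    "Unemployment rate": [10.8, 13.9, 10.8, 9.4, 13.9],
    "Target": ["Dropout", "Graduate", "Dropout", "Graduate", "Graduate"],
})

categorical, continuous = split_column_kinds(preview)

# Share of each label in the Target column.
class_share = preview["Target"].value_counts(normalize=True)

print("Likely categorical:", categorical)
print("Likely continuous:", continuous)
print(class_share)
```

Running the same helper on the full frame would show how many of the 35 columns are coded categories, which helps decide whether a second dataset is needed.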
If we choose to analyze the categorical data to predict possible US college dropouts, we propose using k-means clustering (k-means groups similar students rather than classifying them directly) together with a multiple regression analysis to examine the correlation between the given variables and the final outcome of whether a student is predicted to drop out.
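The proposed analysis could be sketched roughly as follows, assuming scikit-learn is available. Two choices here are ours rather than part of the proposal: logistic regression stands in for the multiple regression because the outcome is a label rather than a number, and the features are standardized before clustering. The `preview` frame holds only the ten rows visible in the table above, so the fitted numbers are illustrative and say nothing about the full dataset.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# The ten rows visible in the preview table (two feature columns plus Target).
preview = pd.DataFrame({
    "Curricular units 2nd sem (approved)": [0, 6, 0, 5, 6, 5, 2, 1, 5, 6],
    "Curricular units 2nd sem (grade)": [
        0.0, 13.666667, 0.0, 12.4, 13.0, 12.666667, 11.0, 13.5, 12.0, 13.0,
    ],
    "Target": [
        "Dropout", "Graduate", "Dropout", "Graduate", "Graduate",
        "Graduate", "Dropout", "Dropout", "Graduate", "Graduate",
    ],
})

features = preview.drop(columns=["Target"])
is_dropout = (preview["Target"] == "Dropout").astype(int)

# Put both features on the same scale so neither dominates the distances.
X = StandardScaler().fit_transform(features)

# Unsupervised step: group students into two clusters without using Target,
# then compare the clusters with the recorded outcome.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = pd.Series(kmeans.fit_predict(X), name="cluster")
cluster_vs_target = pd.crosstab(clusters, preview["Target"])

# Supervised step: logistic regression of dropout (1) vs. graduate (0).
model = LogisticRegression().fit(X, is_dropout)
coefficients = pd.Series(model.coef_[0], index=features.columns)

print(cluster_vs_target)
print(coefficients)
```

A negative coefficient means that higher values of that feature are associated with a lower predicted chance of dropping out. Columns that hold category codes would need one-hot encoding (for example with `pd.get_dummies`) before either step, since their numeric values carry no order.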