Factors on College Interest Among High Schoolers

Dataset: https://www.kaggle.com/datasets/saddamazyazy/go-to-college-dataset

1

Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

Problem Definition:

High school education, family status, and living area could all be factors in whether a student has the access, resources, or desire to get a college education. I have chosen to analyze this dataset because I believe it offers many different angles for analysis. The quality of the school a student attends could affect their grades and, as a result, whether they are able to get into a college. School accreditation is based on multiple factors; however, higher-rated schools generally have better resources for students to learn [1]. Although college is not necessary to become successful, it should still be an option for anyone who wants to go, and being prepared with the right education can greatly drive this desire. College has become increasingly expensive and unaffordable for many families [2]. I would like to know whether a student's level of interest in going to college is influenced by their parents' salary and by whether their parents ever attended college, or whether their desire is linked to their grades. If it is linked to grades, the quality of the school and the area could also affect a student's grades through access to better education and facilities. Overall, college can provide a wider range of career opportunities, networking, and earning potential, and should be an option for students [3]. This dataset will help identify the factors that may lead to a student's desire to attend college.

[1] https://www.accreditedschoolsonline.org/resources/how-college-accreditation-works/#:~:text=Why%20Does%20Accreditation%20Matter%20So%20Much%3F%20Accreditation%20is,students%20for%20jobs%2C%20faculty%20quality%2C%20and%20curriculum%20strength.

[2] https://www.cnbc.com/2021/03/14/fewer-kids-going-to-college-because-of-cost.html

[3] https://cew.georgetown.edu/cew-reports/valueofcollegemajors/

2

Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

In [5]:
import pandas as pd
data = pd.read_csv('data.csv')
data.head()
Out[5]:
   type_school  school_accreditation  gender  interest         residence  parent_age  parent_salary  house_area  average_grades  parent_was_in_college  will_go_to_college
0  Academic     A                     Male    Less Interested  Urban      56          6950000        83.0        84.09           False                  True
1  Academic     A                     Male    Less Interested  Urban      57          4410000        76.8        86.91           False                  True
2  Academic     B                     Female  Very Interested  Urban      50          6500000        80.6        87.43           False                  True
3  Vocational   B                     Male    Very Interested  Rural      49          6600000        78.2        82.12           True                   True
4  Academic     A                     Female  Very Interested  Urban      57          5250000        75.1        86.79           False                  False
Data Measure           Meaning
type_school            The type of school the student attends (academic or vocational)
school_accreditation   Quality of the school (A is better than B)
gender                 Gender of the student
interest               The student's level of interest in attending college
residence              Type of area the student lives in (urban or rural)
parent_age             Age of the parent
parent_salary          Parent's salary per month
house_area             Area of the parent's house in square meters
average_grades         Average grade of the student on a scale of 0 to 100
parent_was_in_college  Whether the parent ever attended college (True or False)
will_go_to_college     Whether the student will go to college (True or False)

This data is sufficient to make progress on the problem because it records each student's level of interest in college alongside information about their family, residence, and school. Comparing these measures can show whether any of them are correlated with interest in college.
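As a rough illustration of what comparing these measures could look like, the sketch below groups students by interest level and summarizes parent salary, grades, and parent college attendance. It re-enters only the five rows shown in the data.head() output so that it runs without data.csv; the names sample, by_interest, and college_counts are placeholders chosen here, and five rows are far too few to support any conclusion.

```python
import pandas as pd

# The five rows shown by data.head() above, re-entered by hand so the
# sketch runs without data.csv. For real use, replace `sample` with the
# full DataFrame loaded from the file.
sample = pd.DataFrame({
    "interest": ["Less Interested", "Less Interested", "Very Interested",
                 "Very Interested", "Very Interested"],
    "parent_salary": [6950000, 4410000, 6500000, 6600000, 5250000],
    "average_grades": [84.09, 86.91, 87.43, 82.12, 86.79],
    "parent_was_in_college": [False, False, False, True, False],
})

# Mean parent salary and mean grade within each interest level.
by_interest = sample.groupby("interest")[["parent_salary", "average_grades"]].mean()

# Number of students for each (interest level, parent attended college) pair.
college_counts = pd.crosstab(sample["interest"], sample["parent_was_in_college"])

print(by_interest)
print(college_counts)
```

Running the same two summaries on the full dataset would give a first look at whether salary, grades, or parent education differ across interest levels.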

3

Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do. For example:

We'll cluster the students into groups that share the same level of interest in going to college. Doing so will allow us to discover whether certain grades, residence areas, school types, or parent salaries group together by interest level.
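One possible way to sketch this idea is shown below. It is only an assumption about how the clustering might be done: k-means from scikit-learn is swapped in as an example method, the choice of two clusters and of three numeric features is arbitrary, and the data is again just the five rows from the data.head() output, re-entered by hand. The clusters are built from the features alone and then compared against the interest column.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# The five rows shown by data.head() above, re-entered by hand
# (illustration only; use the full DataFrame in practice).
sample = pd.DataFrame({
    "interest": ["Less Interested", "Less Interested", "Very Interested",
                 "Very Interested", "Very Interested"],
    "parent_salary": [6950000, 4410000, 6500000, 6600000, 5250000],
    "house_area": [83.0, 76.8, 80.6, 78.2, 75.1],
    "average_grades": [84.09, 86.91, 87.43, 82.12, 86.79],
})

# Assumed feature set: numeric columns only, standardized so that the
# large salary values do not dominate the distance calculation.
features = ["parent_salary", "house_area", "average_grades"]
scaled = StandardScaler().fit_transform(sample[features])

# Assumed k=2; random_state is fixed so repeated runs give the same labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
sample["cluster"] = kmeans.fit_predict(scaled)

# Count how the interest levels are distributed across the clusters.
comparison = pd.crosstab(sample["cluster"], sample["interest"])
print(comparison)
```

If the clusters found on the full dataset line up with interest levels, that would suggest the chosen features carry information about a student's interest in college.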