Underlying Causes and Effects of Alcohol Use in High School

Project Overview

I will be looking at secondary school (high school) students, specifically at alcohol consumption and its underlying causes and effects, such as its effect on school performance. The data set, titled "Student Alcohol Consumption", comes from Kaggle and contains survey data from two Portuguese secondary schools (coded GP and MS in the data), with a lot of interesting social, gender, and study information about the students.

I chose to look into alcohol consumption in secondary school because it is not only prevalent but also has long-lasting personal and societal impacts. There have been many studies of alcohol's effect on adolescents, specifically on brain development. Here I reference research from McLean Hospital. First, it shows that because the adolescent brain is still developing, it is significantly more vulnerable to alcohol than the adult brain. It also indicates that the earlier a person starts drinking, the more likely that person is to develop serious problems with alcohol or drug addiction later in life. Underage drinking also has significant financial implications.

According to the CDC's report "Current and Binge Drinking Among High School Students — United States, 1991–2015," excessive alcohol consumption was responsible for approximately 4,300 deaths among people under 21 and cost the US over $24.3 billion in 2010, which is about $33 billion in today's dollars. Alcohol is not only stunting the brain development of adolescents; it also carries a substantial economic cost.

With this project I want to see what effect a student's alcohol consumption, or even drinking within their family, has on their overall school performance. In addition, I want to look into possible underlying causes that may predict alcohol consumption in secondary school. Some initial ideas include, but are not limited to, parental status (living together vs. apart), mother's and father's education levels, extracurricular activities, free time after school, and quality of family relationships.

Data Set Overview

In [11]:
import pandas as pd
pd.read_csv('student-mat.csv')
Out[11]:
     school  sex  age  address  famsize  Pstatus  Medu  Fedu      Mjob      Fjob  ...  famrel  freetime  goout  Dalc  Walc  health  absences  G1  G2  G3
  0      GP    F   18        U      GT3        A     4     4   at_home   teacher  ...       4         3      4     1     1       3         6   5   6   6
  1      GP    F   17        U      GT3        T     1     1   at_home     other  ...       5         3      3     1     1       3         4   5   5   6
  2      GP    F   15        U      LE3        T     1     1   at_home     other  ...       4         3      2     2     3       3        10   7   8  10
  3      GP    F   15        U      GT3        T     4     2    health  services  ...       3         2      2     1     1       5         2  15  14  15
  4      GP    F   16        U      GT3        T     3     3     other     other  ...       4         3      2     1     2       5         4   6  10  10
...
390      MS    M   20        U      LE3        A     2     2  services  services  ...       5         5      4     4     5       4        11   9   9   9
391      MS    M   17        U      LE3        T     3     1  services  services  ...       2         4      5     3     4       2         3  14  16  16
392      MS    M   21        R      GT3        T     1     1     other     other  ...       5         5      3     3     3       3         3  10   8   7
393      MS    M   18        R      LE3        T     3     2  services     other  ...       4         4      1     3     4       5         0  11  12  10
394      MS    M   19        U      LE3        T     1     1     other   at_home  ...       3         2      3     3     3       5         5   8   9   9

395 rows × 33 columns

Data Set Columns and Format

  1. sex - student's sex (binary: 'F' - female or 'M' - male)
  2. age - student's age (numeric: from 15 to 22)
  3. address - student's home address type (binary: 'U' - urban or 'R' - rural)
  4. famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
  5. Pstatus - parents' cohabitation status (binary: 'T' - living together or 'A' - apart)
  6. Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
  7. Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
  8. Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
  9. Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
  10. guardian - student's guardian (nominal: 'mother', 'father' or 'other')
  11. studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
  12. failures - number of past class failures (numeric: n if 1<=n<3, else 4)
  13. schoolsup - extra educational support (binary: yes or no)
  14. famsup - family educational support (binary: yes or no)
  15. activities - extra-curricular activities (binary: yes or no)
  16. nursery - attended nursery school (binary: yes or no)
  17. higher - wants to take higher education (binary: yes or no)
  18. internet - Internet access at home (binary: yes or no)
  19. romantic - with a romantic relationship (binary: yes or no)
  20. famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
  21. freetime - free time after school (numeric: from 1 - very low to 5 - very high)
  22. goout - going out with friends (numeric: from 1 - very low to 5 - very high)
  23. Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
  24. Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
  25. health - current health status (numeric: from 1 - very bad to 5 - very good)
  26. absences - number of school absences (numeric: from 0 to 93)

Math grades for each given student

  • G1 - first period grade (numeric: from 0 to 20)
  • G2 - second period grade (numeric: from 0 to 20)
  • G3 - final grade (numeric: from 0 to 20, output target)
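
Before modeling, the coded columns above usually need to be turned into numbers. The following is a minimal sketch of one way to do that, not a fixed part of the analysis. The helper name add_derived_columns and the new columns total_alc (Dalc + Walc) and parents_together are assumptions introduced here for illustration. The sample rows are copied from the first five rows of the output above, using only columns that are visible there.

```python
import pandas as pd

# Binary yes/no columns listed in the column description above.
YES_NO_COLUMNS = ("schoolsup", "famsup", "activities", "nursery",
                  "higher", "internet", "romantic")


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric helper columns added.

    total_alc and parents_together are hypothetical names chosen for
    this sketch; they are not columns of the original data set.
    """
    out = df.copy()
    # Combine workday and weekend drinking into one score (range 2 to 10).
    out["total_alc"] = out["Dalc"] + out["Walc"]
    # Pstatus: 'T' (living together) -> 1, 'A' (apart) -> 0.
    out["parents_together"] = (out["Pstatus"] == "T").astype(int)
    # Map yes/no answers to 1/0 for whichever of those columns are present.
    for col in YES_NO_COLUMNS:
        if col in out.columns:
            out[col] = out[col].map({"yes": 1, "no": 0})
    return out


# First five rows from the output above (visible columns only).
sample = pd.DataFrame({
    "Pstatus": ["A", "T", "T", "T", "T"],
    "famrel": [4, 5, 4, 3, 4],
    "Dalc": [1, 1, 2, 1, 1],
    "Walc": [1, 1, 3, 1, 2],
    "G3": [6, 6, 10, 15, 10],
})

encoded = add_derived_columns(sample)
print(encoded[["Pstatus", "parents_together", "Dalc", "Walc", "total_alc"]])
```

On the full data set the same helper could be applied to the frame returned by pd.read_csv('student-mat.csv').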

How will we use this data

First, to examine what is associated with alcohol consumption, we can run linear regressions of students' total alcohol consumption against factors such as parental cohabitation status, quality of family relationships, and family education level. Heavy drinking is reported to be most common among students who have a family member who drinks too much. Tests like these will allow us to see whether there is any meaningful relationship between alcohol consumption and a student's family circumstances. Using machine learning, we can also try to predict a student's alcohol consumption from the strength of their family relationships.
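
The regression described above can be sketched as follows. This is one possible setup under stated assumptions: it uses ordinary least squares through numpy rather than a dedicated statistics library, the helper name fit_ols is hypothetical, and total_alc and parents_together are assumed derived columns (Dalc + Walc, and Pstatus coded as 1 for 'T' and 0 for 'A'). The toy frame is synthetic and built from an exact formula, only to check that the fit recovers known coefficients. It is not survey data and says nothing about the real students.

```python
import numpy as np
import pandas as pd


def fit_ols(df: pd.DataFrame, predictors: list, target: str):
    """Fit target ~ intercept + predictors by ordinary least squares.

    Returns (coefficients as a pandas Series, R squared).
    """
    X = df[predictors].to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    y = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return pd.Series(coef, index=["intercept", *predictors]), r2


# Synthetic check data: total_alc = 8 - famrel - parents_together, exactly.
toy = pd.DataFrame({
    "famrel": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "parents_together": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})
toy["total_alc"] = 8 - toy["famrel"] - toy["parents_together"]

coefs, r2 = fit_ols(toy, ["famrel", "parents_together"], "total_alc")
print(coefs)
print("R squared:", r2)
```

With the real data, the same call would take the encoded survey frame and a predictor list such as famrel, parents_together, Medu, and Fedu. A coefficient near zero together with a low R squared would suggest little linear relationship for that factor.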

Finally, we can try to gauge a student's success by using machine learning to predict their final grade from their alcohol consumption patterns, or from any other factor we find to be correlated with heavier drinking.
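
As a rough illustration of the grade prediction idea, the sketch below fits a linear model on a training split and compares its error on held-out rows with a baseline that always predicts the mean training grade. Everything here is an assumption made for the example: the helper name holdout_rmse, the 25% test fraction, the choice of a plain least-squares model, and the synthetic frame, whose G3 values are generated from an exact formula and are not real grades.

```python
import numpy as np
import pandas as pd


def holdout_rmse(df, features, target="G3", test_fraction=0.25, seed=0):
    """Fit a linear model on a random training split and score it on the rest.

    Returns the model's RMSE and the RMSE of a predict-the-mean baseline,
    both measured on the held-out rows.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))
    n_test = max(1, int(round(test_fraction * len(df))))
    test_idx, train_idx = order[:n_test], order[n_test:]

    def design(rows):
        # Feature matrix with an intercept column for the given row positions.
        X = df.iloc[rows][features].to_numpy(dtype=float)
        return np.column_stack([np.ones(len(X)), X])

    y = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(design(train_idx), y[train_idx], rcond=None)
    pred = design(test_idx) @ coef

    model_rmse = float(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
    baseline_rmse = float(
        np.sqrt(np.mean((y[train_idx].mean() - y[test_idx]) ** 2))
    )
    return {"model_rmse": model_rmse, "baseline_rmse": baseline_rmse}


# Synthetic stand-in data: G3 = 20 - Dalc - Walc, exactly (not real grades).
synthetic = pd.DataFrame({
    "Dalc": np.tile(np.arange(1, 6), 8),
    "Walc": np.repeat(np.arange(1, 6), 8),
})
synthetic["G3"] = 20 - synthetic["Dalc"] - synthetic["Walc"]

scores = holdout_rmse(synthetic, ["Dalc", "Walc"])
print(scores)
```

On the real survey data, alcohol consumption would only be judged useful for predicting G3 if the model error came out clearly below the baseline error.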
