Predicting mental health of medical students based on demographic factors

Motivation:

Problem

A large percentage of medical students face heavy pressure and stress caused by a multitude of factors, such as difficult coursework, long working hours, and exposure to patients with high-risk health conditions. According to extensive research by the American Health Organization, many medical students are at risk of developing symptoms of depression. For instance, the odds of a first-year medical student developing depression by their fourth year increase by 50%.

Solution

By analyzing how being a medical student relates to high levels of depression or burnout, we can gain a better understanding of how different demographic groups are affected by the field. By isolating a few variables (for example, age, sex, or health status), we can analyze which groups experience the most burnout or depression.

Impact

If successful, this work may offer insight into the different psychological profiles of medical students. It can also help medical programs create mental health support resources tailored to specific groups. Additionally, understanding the unique challenges faced by different demographic groups can improve communication between medical students, faculty, and administrators.

Dataset

Detail

We will use a Kaggle Dataset of Medical Students Mental Health to observe the following features for each student:

In [10]:
import pandas as pd
data = pd.read_csv("medical.csv")
data.head(10)
Out[10]:
   id  age  year  sex  glang  part  job  stud_h  health  psyt  jspe  qcae_cog  qcae_aff  amsp  erec_mean  cesd  stai_t  mbi_ex  mbi_cy  mbi_ea
0   2   18     1    1    120     1    0      56       3     0    88        62        27    17   0.738095    34      61      17      13      20
1   4   26     4    1      1     1    0      20       4     0   109        55        37    22   0.690476     7      33      14      11      26
2   9   21     3    2      1     0    0      36       3     0   106        64        39    17   0.690476    25      73      24       7      23
3  10   21     2    2      1     0    1      51       5     0   101        52        33    18   0.833333    17      48      16      10      21
4  13   21     3    1      1     1    0      22       4     0   102        58        28    21   0.690476    14      46      22      14      23
5  14   26     5    2      1     1    1      10       2     0   102        48        37    17   0.690476    14      56      18      15      18
6  17   23     5    2      1     1    0      15       3     0   117        58        38    23   0.714286    45      56      28      17      16
7  21   23     4    1      1     1    1       8       4     0   118        65        40    32   0.880952     6      36      11      10      27
8  23   23     4    2      1     1    1      20       2     0   118        69        46    23   0.666667    43      43      26      21      22
9  24   22     2    2      1     1    0      20       5     0   108        56        36    22   0.690476    11      43      18       6      23
  • age: Age of the participant. (Integer)
  • year: Year of study of the participant. (Integer)
  • sex: Gender of the participant. (Integer code)
  • glang: Language spoken by the participant. (Integer code)
  • job: Job of the participant. (Integer code)
  • stud_h: Hours of study per week of the participant. (Integer)
  • health: Self-reported health status of the participant. (Integer code)
  • psyt: Psychological distress score of the participant. (Integer)
  • jspe: Job satisfaction score of the participant. (Integer)
  • qcae_cog: Cognitive empathy score of the participant. (Integer)
  • qcae_aff: Affective empathy score of the participant. (Integer)
  • amsp: Academic motivation score of the participant. (Integer)
  • erec_mean: Mean empathy rating score of the participant. (Float)
  • cesd: Score of the participant on the Center for Epidemiologic Studies Depression scale. (Integer)
  • stai_t: Score of the participant on the State-Trait Anxiety Inventory scale. (Integer)
  • mbi_ex: Score of the participant on the Maslach Burnout Inventory - Exhaustion scale. (Integer)
  • mbi_cy: Score of the participant on the Maslach Burnout Inventory - Cynicism scale. (Integer)
  • mbi_ea: Score of the participant on the Maslach Burnout Inventory - Professional Efficacy scale. (Integer)
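Before modelling, it can help to confirm how pandas actually stores each column, since coded categories such as sex or glang may load as integers rather than strings. The snippet below is a minimal sketch of that check. It assumes a two-row stand-in copied from the data.head(10) output above (the name sample is hypothetical) so that it runs without medical.csv; with the real file, the same calls would apply to data.

```python
import io

import pandas as pd

# ASSUMPTION: two rows copied from the data.head(10) output, used as a
# stand-in for medical.csv so the snippet is self-contained.
sample_csv = io.StringIO(
    "id,age,year,sex,glang,part,job,stud_h,health,psyt,jspe,qcae_cog,"
    "qcae_aff,amsp,erec_mean,cesd,stai_t,mbi_ex,mbi_cy,mbi_ea\n"
    "2,18,1,1,120,1,0,56,3,0,88,62,27,17,0.738095,34,61,17,13,20\n"
    "4,26,4,1,1,1,0,20,4,0,109,55,37,22,0.690476,7,33,14,11,26\n"
)
sample = pd.read_csv(sample_csv)

# Column types as pandas infers them from the file contents.
dtypes = sample.dtypes
print(dtypes)

# Number of missing values per column, worth knowing before any regression.
missing = sample.isna().sum()
print(missing)
```

If a coded column should be treated as a category rather than a number, it could be converted with sample["sex"].astype("category") before analysis.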

Potential Problems

The scope of this analysis is limited because the data was gathered in Switzerland. The results of this analysis may not apply to medical students everywhere, as the participants in the study come from only one geographic region.

Method:

By isolating a few demographic variables, we can study the relationship between each variable and the level of depression a student typically experiences. The likelihood of depression can be measured with psyt, mbi_ex, and mbi_cy. A regression can then be fitted to analyze the relationship between two or more variables, and the fitted line can be plotted as a line graph. This visualization will show whether each variable is associated with an increase or decrease in the likelihood of depression.
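The regression step described above can be sketched as follows. This is only an illustration under stated assumptions: it uses ordinary least squares with a single predictor (the proposal does not name a specific regression), the helper fit_line is hypothetical, and the ten (age, mbi_ex) pairs are copied from the data.head(10) output as a stand-in for the full file. Ten rows are far too few to draw any conclusion.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: the ten (age, mbi_ex) pairs shown in data.head(10), used as a
# stand-in; with the real file this would be pd.read_csv("medical.csv").
sample = pd.DataFrame(
    {
        "age": [18, 26, 21, 21, 21, 26, 23, 23, 23, 22],
        "mbi_ex": [17, 14, 24, 16, 22, 18, 28, 11, 26, 18],
    }
)


def fit_line(df, predictor, outcome):
    """Hypothetical helper: least-squares fit of outcome = slope * predictor + intercept."""
    # np.polyfit with deg=1 returns coefficients from highest degree down.
    slope, intercept = np.polyfit(df[predictor], df[outcome], deg=1)
    return slope, intercept


slope, intercept = fit_line(sample, "age", "mbi_ex")

# Fitted values along the line, ready to draw over a scatter plot.
fitted = slope * sample["age"] + intercept
print(f"mbi_ex = {slope:.3f} * age + {intercept:.3f}")
```

The sign of slope indicates whether the outcome tends to rise or fall with the predictor in this sample. The fitted values can be drawn as a line over a scatter plot of the raw points with any plotting library, and the same helper could be called with mbi_cy or psyt as the outcome.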