The real-world problem I would like to focus on for my DS2500 final project is sleep. Although many people may think that insufficient sleep is not a major issue, a consistent lack of sleep can lead to a variety of health problems such as heart disease, kidney disease, high blood pressure, diabetes, stroke, obesity, and depression (NIH). Many people also count the time from when they get into bed to when they wake up as the number of hours of sleep they are getting, when in reality there is a major difference in quality between a night of light sleep with frequent awakenings and a night with a proper balance of light, REM, and deep sleep. Each stage of sleep has its own benefits, and it is especially important that people consistently get enough hours of REM and deep sleep (Pacheco). Factors such as a person's bedtime, gender, age, caffeine consumption, alcohol consumption, smoking status, and exercise can all play a role in a person's ability to get both enough sleep and good-quality sleep. For my DS2500 project, using a Kaggle data set, I hope to identify which factors help or hinder sleep quality so that people can improve their sleep by adjusting their current lifestyle habits.
Works Cited
NIH. “Brain Basics: Understanding Sleep.” National Institute of Neurological Disorders and Stroke, U.S. Department of Health and Human Services, 10 Feb. 2023, https://www.ninds.nih.gov/health-information/public-education/brain-basics/brain-basics-understanding-sleep#:~:text=There%20are%20two%20basic%20types,brain%20waves%20and%20neuronal%20activity.
Pacheco, Danielle. “Deep Sleep: What It Is and How Much You Need.” Sleep Foundation, 13 Feb. 2023, https://www.sleepfoundation.org/stages-of-sleep/deep-sleep#:~:text=What%20Is%20Deep%20Sleep%3F,View%20Source%20.
import pandas as pd
df_sleep = pd.read_csv('sleep_efficiency.csv')
df_sleep
| | ID | Age | Gender | Bedtime | Wakeup time | Sleep duration | Sleep efficiency | REM sleep percentage | Deep sleep percentage | Light sleep percentage | Awakenings | Caffeine consumption | Alcohol consumption | Smoking status | Exercise frequency |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 65 | Female | 2021-03-06 01:00:00 | 2021-03-06 07:00:00 | 6.0 | 0.88 | 18 | 70 | 12 | 0.0 | 0.0 | 0.0 | Yes | 3.0 |
1 | 2 | 69 | Male | 2021-12-05 02:00:00 | 2021-12-05 09:00:00 | 7.0 | 0.66 | 19 | 28 | 53 | 3.0 | 0.0 | 3.0 | Yes | 3.0 |
2 | 3 | 40 | Female | 2021-05-25 21:30:00 | 2021-05-25 05:30:00 | 8.0 | 0.89 | 20 | 70 | 10 | 1.0 | 0.0 | 0.0 | No | 3.0 |
3 | 4 | 40 | Female | 2021-11-03 02:30:00 | 2021-11-03 08:30:00 | 6.0 | 0.51 | 23 | 25 | 52 | 3.0 | 50.0 | 5.0 | Yes | 1.0 |
4 | 5 | 57 | Male | 2021-03-13 01:00:00 | 2021-03-13 09:00:00 | 8.0 | 0.76 | 27 | 55 | 18 | 3.0 | 0.0 | 3.0 | No | 3.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
447 | 448 | 27 | Female | 2021-11-13 22:00:00 | 2021-11-13 05:30:00 | 7.5 | 0.91 | 22 | 57 | 21 | 0.0 | 0.0 | 0.0 | No | 5.0 |
448 | 449 | 52 | Male | 2021-03-31 21:00:00 | 2021-03-31 03:00:00 | 6.0 | 0.74 | 28 | 57 | 15 | 4.0 | 25.0 | 0.0 | No | 3.0 |
449 | 450 | 40 | Female | 2021-09-07 23:00:00 | 2021-09-07 07:30:00 | 8.5 | 0.55 | 20 | 32 | 48 | 1.0 | NaN | 3.0 | Yes | 0.0 |
450 | 451 | 45 | Male | 2021-07-29 21:00:00 | 2021-07-29 04:00:00 | 7.0 | 0.76 | 18 | 72 | 10 | 3.0 | 0.0 | 0.0 | No | 3.0 |
451 | 452 | 18 | Male | 2021-03-17 02:30:00 | 2021-03-17 10:00:00 | 7.5 | 0.63 | 22 | 23 | 55 | 1.0 | 50.0 | 0.0 | No | 1.0 |
452 rows × 15 columns
Field Name | Data Type | Description |
---|---|---|
ID | int | a unique identifier for each test subject |
Age | int | age of the test subject |
Gender | str | male or female |
Bedtime | str | the time the test subject goes to bed each night |
Wakeup Time | str | the time the test subject wakes up each morning |
Sleep Duration | float | the total amount of time the test subject slept (in hours) |
Sleep Efficiency | float | a measure of the proportion of time in bed spent asleep |
REM Sleep Percentage | int | the percentage of total sleep time spent in REM sleep |
Deep Sleep Percentage | int | the percentage of total sleep time spent in deep sleep |
Light Sleep Percentage | int | the percentage of total sleep time spent in light sleep |
Awakenings | float | the number of times the test subject wakes up during the night |
Caffeine Consumption | float | the amount of caffeine consumed in the 24 hours prior to bedtime (in mg) |
Alcohol Consumption | float | the amount of alcohol consumed in the 24 hours prior to bedtime (in oz) |
Smoking Status | str | whether or not the test subject smokes |
Exercise Frequency | float | the number of times the test subject exercises each week |
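
Because Bedtime and Wakeup time are stored as strings, they would probably need to be parsed before they can be used as features. The sketch below is one possible way to do that, using three rows copied from the preview. The helper name `hours_in_bed` and the new column `Hours in bed` are my own and not part of the data set. It also assumes that a wakeup earlier than the bedtime means the subject woke up the next day, since some preview rows show the same calendar date for both timestamps.

```python
import pandas as pd

# Three rows copied from the preview (index 0, 2, and 447); the full data
# would come from pd.read_csv('sleep_efficiency.csv').
sample = pd.DataFrame({
    "Bedtime": ["2021-03-06 01:00:00", "2021-05-25 21:30:00", "2021-11-13 22:00:00"],
    "Wakeup time": ["2021-03-06 07:00:00", "2021-05-25 05:30:00", "2021-11-13 05:30:00"],
    "Sleep duration": [6.0, 8.0, 7.5],
})


def hours_in_bed(df):
    """Hours between Bedtime and Wakeup time, wrapping past midnight."""
    bed = pd.to_datetime(df["Bedtime"])
    wake = pd.to_datetime(df["Wakeup time"])
    hours = (wake - bed).dt.total_seconds() / 3600
    # Assumption: a negative gap means the wakeup timestamp kept the
    # bedtime's calendar date, so the subject actually woke the next day.
    return hours.where(hours >= 0, hours + 24)


sample["Hours in bed"] = hours_in_bed(sample)
print(sample[["Sleep duration", "Hours in bed"]])
```

For these three rows the computed value matches the Sleep duration column, which is worth checking on the full data set before relying on either column.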
The data above is sufficient to make progress on the real-world problem I described earlier. Basic, relevant information about each subject, such as their age, gender, and sleep schedule, is listed. Potential factors affecting sleep quality, such as caffeine consumption, alcohol consumption, smoking status, and exercise frequency, are measured for each subject. The number of awakenings and the REM, deep, and light sleep percentages are also given for each subject, so both the quantity and the quality of their sleep can be tracked.
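Before relying on these columns, a couple of quick sanity checks may be useful. The sketch below uses the last five rows of the preview (index 447 to 451) and is only an illustration: it counts missing values per column (the preview shows one NaN) and checks that the three sleep stage percentages add up to 100 for each subject. The variable names are my own.

```python
import numpy as np
import pandas as pd

# Rows 447-451 as shown in the preview; the real checks would run on df_sleep.
preview = pd.DataFrame({
    "REM sleep percentage": [22, 28, 20, 18, 22],
    "Deep sleep percentage": [57, 57, 32, 72, 23],
    "Light sleep percentage": [21, 15, 48, 10, 55],
    "Caffeine consumption": [0.0, 25.0, np.nan, 0.0, 50.0],
    "Alcohol consumption": [0.0, 0.0, 3.0, 0.0, 0.0],
    "Smoking status": ["No", "No", "Yes", "No", "No"],
    "Exercise frequency": [5.0, 3.0, 0.0, 3.0, 1.0],
})

# Number of missing entries in each column.
missing = preview.isna().sum()
print(missing[missing > 0])

# The three stage percentages should add up to 100 for every subject.
stage_cols = [
    "REM sleep percentage",
    "Deep sleep percentage",
    "Light sleep percentage",
]
stage_total = preview[stage_cols].sum(axis=1)
stages_ok = bool((stage_total == 100).all())
print("Stage percentages sum to 100:", stages_ok)
```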
For the project, I plan on clustering subjects together based on their age and lifestyle habits (caffeine consumption, alcohol consumption, smoking status, and exercise frequency). Clustering by both lifestyle habits and age will allow us to see whether sleep quality is related to these lifestyle factors while reducing the possibility that differences in sleep quality are simply due to differences in age.
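A minimal sketch of how this clustering could look is shown below. It is not a final method: it assumes scikit-learn's `KMeans` as the clustering algorithm, uses only the ten rows visible in the preview, picks two clusters arbitrarily, and drops subjects with a missing feature value. The function name `cluster_subjects` and the `Smokes` and `Cluster` columns are my own additions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# The ten rows visible in the preview; the real analysis would use df_sleep.
rows = pd.DataFrame({
    "Age": [65, 69, 40, 40, 57, 27, 52, 40, 45, 18],
    "Caffeine consumption": [0.0, 0.0, 0.0, 50.0, 0.0, 0.0, 25.0, float("nan"), 0.0, 50.0],
    "Alcohol consumption": [0.0, 3.0, 0.0, 5.0, 3.0, 0.0, 0.0, 3.0, 0.0, 0.0],
    "Smoking status": ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No", "No"],
    "Exercise frequency": [3.0, 3.0, 3.0, 1.0, 3.0, 5.0, 3.0, 0.0, 3.0, 1.0],
    "Sleep efficiency": [0.88, 0.66, 0.89, 0.51, 0.76, 0.91, 0.74, 0.55, 0.76, 0.63],
})

FEATURES = [
    "Age",
    "Caffeine consumption",
    "Alcohol consumption",
    "Smokes",
    "Exercise frequency",
]


def cluster_subjects(df, n_clusters=2, seed=0):
    """Group subjects by age and lifestyle habits with k-means."""
    data = df.copy()
    # Encode smoking status as 1 (smoker) or 0 (non-smoker).
    data["Smokes"] = (data["Smoking status"] == "Yes").astype(int)
    # Assumption: subjects with a missing feature are dropped, not imputed.
    data = data.dropna(subset=FEATURES)
    # Standardize so that no single feature dominates the distances.
    scaled = StandardScaler().fit_transform(data[FEATURES])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    data["Cluster"] = model.fit_predict(scaled)
    return data


clustered = cluster_subjects(rows)
# Sleep efficiency is not a clustering feature, so it can be compared across clusters.
summary = clustered.groupby("Cluster")["Sleep efficiency"].agg(["count", "mean"])
print(summary)
```

Sleep efficiency is deliberately left out of the features so that it can be compared between clusters afterward. On the full data set, the number of clusters would need to be chosen more carefully than the placeholder value used here.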