sleep¶

1.¶

A real-world problem I would like to focus on for my DS2500 final project is insufficient sleep. Although many people may not consider insufficient sleep a major issue, a consistent lack of sleep can lead to a variety of health problems, such as heart disease, kidney disease, high blood pressure, diabetes, stroke, obesity, and depression (NIH). Many people also treat the time between getting into bed and waking up as the number of hours they slept, when in reality there is a major difference in quality between a night of light sleep with frequent awakenings and a night with a proper balance of light, REM, and deep sleep. Each stage of sleep has its own benefits, and it is especially important that people consistently get enough hours of REM and deep sleep (Pacheco). Factors such as bedtime, gender, age, caffeine consumption, alcohol consumption, smoking status, and exercise can all affect a person's ability to get both enough sleep and good-quality sleep. For my DS2500 project, using a Kaggle data set, I hope to identify which factors help or hinder sleep quality so that people can improve their sleep by adjusting their current lifestyle habits.

                                   Works Cited

NIH. “Brain Basics: Understanding Sleep.” National Institute of Neurological Disorders and Stroke, U.S. Department of Health and Human Services, 10 Feb. 2023, https://www.ninds.nih.gov/health-information/public-education/brain-basics/brain-basics-understanding-sleep#:~:text=There%20are%20two%20basic%20types,brain%20waves%20and%20neuronal%20activity.

Pacheco, Danielle. “Deep Sleep: What It Is and How Much You Need.” Sleep Foundation, 13 Feb. 2023, https://www.sleepfoundation.org/stages-of-sleep/deep-sleep#:~:text=What%20Is%20Deep%20Sleep%3F,View%20Source%20.

2.¶

In [6]:
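import pandas as pd

# Load the Kaggle sleep efficiency data set (the CSV is expected in the working directory)
df_sleep = pd.read_csv('sleep_efficiency.csv')
df_sleep  # display a preview of the DataFrame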
Out[6]:
ID Age Gender Bedtime Wakeup time Sleep duration Sleep efficiency REM sleep percentage Deep sleep percentage Light sleep percentage Awakenings Caffeine consumption Alcohol consumption Smoking status Exercise frequency
0 1 65 Female 2021-03-06 01:00:00 2021-03-06 07:00:00 6.0 0.88 18 70 12 0.0 0.0 0.0 Yes 3.0
1 2 69 Male 2021-12-05 02:00:00 2021-12-05 09:00:00 7.0 0.66 19 28 53 3.0 0.0 3.0 Yes 3.0
2 3 40 Female 2021-05-25 21:30:00 2021-05-25 05:30:00 8.0 0.89 20 70 10 1.0 0.0 0.0 No 3.0
3 4 40 Female 2021-11-03 02:30:00 2021-11-03 08:30:00 6.0 0.51 23 25 52 3.0 50.0 5.0 Yes 1.0
4 5 57 Male 2021-03-13 01:00:00 2021-03-13 09:00:00 8.0 0.76 27 55 18 3.0 0.0 3.0 No 3.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
447 448 27 Female 2021-11-13 22:00:00 2021-11-13 05:30:00 7.5 0.91 22 57 21 0.0 0.0 0.0 No 5.0
448 449 52 Male 2021-03-31 21:00:00 2021-03-31 03:00:00 6.0 0.74 28 57 15 4.0 25.0 0.0 No 3.0
449 450 40 Female 2021-09-07 23:00:00 2021-09-07 07:30:00 8.5 0.55 20 32 48 1.0 NaN 3.0 Yes 0.0
450 451 45 Male 2021-07-29 21:00:00 2021-07-29 04:00:00 7.0 0.76 18 72 10 3.0 0.0 0.0 No 3.0
451 452 18 Male 2021-03-17 02:30:00 2021-03-17 10:00:00 7.5 0.63 22 23 55 1.0 50.0 0.0 No 1.0

452 rows × 15 columns

Data Dictionary¶

Field Name               Data Type   Description
ID                       int         a unique identifier for each test subject
Age                      int         age of the test subject
Gender                   str         male or female
Bedtime                  str         the time the test subject goes to bed each night
Wakeup Time              str         the time the test subject wakes up each morning
Sleep Duration           float       the total amount of time the test subject slept (in hours)
Sleep Efficiency         float       a measure of the proportion of time in bed spent asleep
REM Sleep Percentage     int         the percentage of total sleep time spent in REM sleep
Deep Sleep Percentage    int         the percentage of total sleep time spent in deep sleep
Light Sleep Percentage   int         the percentage of total sleep time spent in light sleep
Awakenings               float       the number of times the test subject wakes up during the night
Caffeine Consumption     float       the amount of caffeine consumed in the 24 hours prior to bedtime (in mg)
Alcohol Consumption      float       the amount of alcohol consumed in the 24 hours prior to bedtime (in oz)
Smoking Status           str         whether or not the test subject smokes
Exercise Frequency       float       the number of times the test subject exercises each week
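
Since Bedtime and Wakeup Time are stored as strings, they would likely need to be parsed before bedtime can be used as a factor. The following is a minimal sketch of that step, assuming the timestamp format shown in the DataFrame preview. It uses three rows copied from that preview instead of the CSV so that it runs on its own, and `bedtime_hour` is a hypothetical column name introduced only for illustration.

```python
import pandas as pd

# Three rows (ID 1, 2, 3) copied from the DataFrame preview, so the sketch
# runs without the CSV. Column names follow the DataFrame, not the dictionary.
sample = pd.DataFrame({
    "ID": [1, 2, 3],
    "Bedtime": ["2021-03-06 01:00:00", "2021-12-05 02:00:00", "2021-05-25 21:30:00"],
    "Wakeup time": ["2021-03-06 07:00:00", "2021-12-05 09:00:00", "2021-05-25 05:30:00"],
    "Sleep duration": [6.0, 7.0, 8.0],
})

# Parse the timestamp strings into datetime values (format is assumed from the preview).
for col in ["Bedtime", "Wakeup time"]:
    sample[col] = pd.to_datetime(sample[col], format="%Y-%m-%d %H:%M:%S")

# Hypothetical derived column: bedtime as a decimal hour on a 24-hour clock.
sample["bedtime_hour"] = sample["Bedtime"].dt.hour + sample["Bedtime"].dt.minute / 60

print(sample.dtypes)
print(sample[["ID", "bedtime_hour"]])
```

In the preview, the row with ID 3 has a wake-up timestamp that falls earlier than its bedtime on the same calendar date, so it may be safer to rely on the Sleep duration column than to subtract the two timestamps.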

The data set above is sufficient to make progress on the real-world problem described in section 1. It lists basic, relevant information about each subject, such as age, gender, and sleep schedule. Potential factors affecting sleep quality, such as caffeine consumption, alcohol consumption, smoking status, and exercise frequency, are measured for each subject. Awakenings and the REM, deep, and light sleep percentages are also given for each subject, so both the quantity and the quality of their sleep can be tracked.
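
One way to track quantity and quality together is to convert the stage percentages into hours per stage, and to count missing values before any analysis. The sketch below assumes the column names shown in the DataFrame preview and uses three rows copied from it (ID 1, 2, and 450, the last of which has a missing caffeine value). The `... sleep hours` columns are hypothetical names introduced for illustration.

```python
import pandas as pd

# Rows with ID 1, 2 and 450 copied from the DataFrame preview.
sample = pd.DataFrame({
    "ID": [1, 2, 450],
    "Sleep duration": [6.0, 7.0, 8.5],
    "REM sleep percentage": [18, 19, 20],
    "Deep sleep percentage": [70, 28, 32],
    "Light sleep percentage": [12, 53, 48],
    "Caffeine consumption": [0.0, 0.0, float("nan")],
})

# Convert each stage's share of total sleep time into hours.
for stage in ["REM", "Deep", "Light"]:
    sample[f"{stage} sleep hours"] = (
        sample["Sleep duration"] * sample[f"{stage} sleep percentage"] / 100
    )

# Count missing values per column; these need handling before modeling.
missing_counts = sample.isna().sum()

print(sample[["ID", "REM sleep hours", "Deep sleep hours", "Light sleep hours"]])
print(missing_counts)
```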

3.¶

For the project, I plan to cluster subjects based on their age and lifestyle habits (caffeine consumption, alcohol consumption, smoking status, and exercise frequency). Clustering on both lifestyle habits and age will let us see whether sleep quality is related to these lifestyle factors while reducing the possibility that differences in sleep quality are simply due to differences in age.
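
A rough sketch of what this clustering could look like is below. It is not a final method: k-means from scikit-learn, the choice of two clusters, median imputation for missing values, and the `Smoker` and `cluster` column names are all assumptions made for illustration. The ten rows are the ones visible in the DataFrame preview, standing in for the full data set.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

nan = float("nan")

# The ten rows visible in the DataFrame preview, standing in for the full CSV.
sample = pd.DataFrame({
    "Age": [65, 69, 40, 40, 57, 27, 52, 40, 45, 18],
    "Caffeine consumption": [0.0, 0.0, 0.0, 50.0, 0.0, 0.0, 25.0, nan, 0.0, 50.0],
    "Alcohol consumption": [0.0, 3.0, 0.0, 5.0, 3.0, 0.0, 0.0, 3.0, 0.0, 0.0],
    "Smoking status": ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No", "No"],
    "Exercise frequency": [3.0, 3.0, 3.0, 1.0, 3.0, 5.0, 3.0, 0.0, 3.0, 1.0],
    "Sleep efficiency": [0.88, 0.66, 0.89, 0.51, 0.76, 0.91, 0.74, 0.55, 0.76, 0.63],
})

# Cluster only on age and lifestyle habits; sleep efficiency is held out
# so it can be compared across clusters afterwards.
features = sample[
    ["Age", "Caffeine consumption", "Alcohol consumption", "Exercise frequency"]
].copy()
features["Smoker"] = (sample["Smoking status"] == "Yes").astype(int)

# Assumption: fill missing values with the column median.
features = features.fillna(features.median())

# Standardize so that no single feature dominates the distance calculation.
scaled = StandardScaler().fit_transform(features)

# Assumption: two clusters; the real project would need to choose k deliberately.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
sample["cluster"] = kmeans.fit_predict(scaled)

# Compare the held-out sleep quality measure across clusters.
cluster_summary = sample.groupby("cluster")["Sleep efficiency"].agg(["mean", "count"])
print(cluster_summary)
```

Keeping Sleep efficiency out of the clustering features means the clusters are defined by age and habits alone, so any difference in mean sleep efficiency between clusters is not built in by construction.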
