Sleep Quality Project

Problem Statement

Sleep is essential to vigilance, learning, hand-eye coordination, mood, memory and more, yet people of all ages and socioeconomic backgrounds struggle to get enough good-quality sleep each night. Factors like caffeine and alcohol consumption, exercise, early bedtimes and sleep duration are often linked to sleep, but do they truly reflect sleep quality, or just duration?

Solution

The goal of this project is to estimate how much deep (quality) sleep a person can expect to receive based on how long they sleep, whether they have caffeine, when they get up in the morning, and how often they exercise.

Impact

If successful, this work will yield a regression model that predicts the amount of deep sleep (in hours) a person can expect to receive based on the features in the dataset.

A potential negative outcome of such a machine learning tool is that it may give people an inaccurate understanding of the health effects of factors like caffeine, alcohol, exercise, sleep duration, and bedtime. These factors affect each person differently, and sleep quality also depends on additional factors not captured here, including stress, noise pollution, and health.

Dataset:

Kaggle: Sleep Efficiency

This dataset contains the following features, which are either factors in or measures of sleep:

ID = unique subject identifier (int)
Age = age in years (int)
Gender = sex (str)
Bedtime = date and time they go to bed, 24-hour clock (str timestamp, e.g. 2021-03-06 01:00:00)
Wakeup time = date and time they wake up, 24-hour clock (str timestamp, e.g. 2021-03-06 07:00:00)
Sleep duration = number of hours spent in bed (float)
Sleep efficiency = proportion of time in bed actually spent asleep (time slept out of time in bed) (float)
REM sleep percentage = percentage of REM sleep (int)
Deep sleep percentage = percentage of deep sleep (int)
Light sleep percentage = percentage of light sleep (int)
Awakenings = number of times they reported waking up during the night (float)
Caffeine consumption = caffeine consumption in the 24 hours prior to bedtime (float)
Alcohol consumption = alcohol consumption in the 24 hours prior to bedtime (float)
Smoking status = smoking status (str)
Exercise frequency = days per week exercised (float)

In [1]:

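import pandas as pd

# Load the Kaggle Sleep Efficiency data (CSV expected in the working directory)
df_sleep_efficiency = pd.read_csv('sleep_efficiency.csv')
df_sleep_efficiency  # display the DataFrame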

Out[1]:

	ID	Age	Gender	Bedtime	Wakeup time	Sleep duration	Sleep efficiency	REM sleep percentage	Deep sleep percentage	Light sleep percentage	Awakenings	Caffeine consumption	Alcohol consumption	Smoking status	Exercise frequency
0	1	65	Female	2021-03-06 01:00:00	2021-03-06 07:00:00	6.0	0.88	18	70	12	0.0	0.0	0.0	Yes	3.0
1	2	69	Male	2021-12-05 02:00:00	2021-12-05 09:00:00	7.0	0.66	19	28	53	3.0	0.0	3.0	Yes	3.0
2	3	40	Female	2021-05-25 21:30:00	2021-05-25 05:30:00	8.0	0.89	20	70	10	1.0	0.0	0.0	No	3.0
3	4	40	Female	2021-11-03 02:30:00	2021-11-03 08:30:00	6.0	0.51	23	25	52	3.0	50.0	5.0	Yes	1.0
4	5	57	Male	2021-03-13 01:00:00	2021-03-13 09:00:00	8.0	0.76	27	55	18	3.0	0.0	3.0	No	3.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
447	448	27	Female	2021-11-13 22:00:00	2021-11-13 05:30:00	7.5	0.91	22	57	21	0.0	0.0	0.0	No	5.0
448	449	52	Male	2021-03-31 21:00:00	2021-03-31 03:00:00	6.0	0.74	28	57	15	4.0	25.0	0.0	No	3.0
449	450	40	Female	2021-09-07 23:00:00	2021-09-07 07:30:00	8.5	0.55	20	32	48	1.0	NaN	3.0	Yes	0.0
450	451	45	Male	2021-07-29 21:00:00	2021-07-29 04:00:00	7.0	0.76	18	72	10	3.0	0.0	0.0	No	3.0
451	452	18	Male	2021-03-17 02:30:00	2021-03-17 10:00:00	7.5	0.63	22	23	55	1.0	50.0	0.0	No	1.0

452 rows × 15 columns
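
Because `pd.read_csv` is called without `parse_dates`, the Bedtime and Wakeup time columns load as plain strings, and the output above already shows at least one missing value (Caffeine consumption in row 449). The following is a minimal sketch of how the timestamps could be parsed and the missing values counted. It uses three rows copied from Out[1] so that it runs without the CSV file; the `df_sample` frame and the "... hour" column names are illustrative assumptions, not part of the original notebook.

```python
import io
import pandas as pd

# Rows 0, 2 and 449 copied from Out[1], reduced to a few columns,
# so the sketch runs without sleep_efficiency.csv.
sample_csv = io.StringIO(
    "ID,Bedtime,Wakeup time,Sleep duration,Caffeine consumption\n"
    "1,2021-03-06 01:00:00,2021-03-06 07:00:00,6.0,0.0\n"
    "3,2021-05-25 21:30:00,2021-05-25 05:30:00,8.0,0.0\n"
    "450,2021-09-07 23:00:00,2021-09-07 07:30:00,8.5,\n"
)

# parse_dates converts the two timestamp columns to datetime64 on load.
df_sample = pd.read_csv(sample_csv, parse_dates=["Bedtime", "Wakeup time"])

# Hour of day as a float, e.g. 07:30 -> 7.5 (hypothetical helper columns).
for col in ["Bedtime", "Wakeup time"]:
    df_sample[col + " hour"] = df_sample[col].dt.hour + df_sample[col].dt.minute / 60

# Number of missing values per column.
missing_counts = df_sample.isna().sum()

print(df_sample.dtypes)
print(missing_counts)
```

Note that in some rows (for example row 2) the wakeup timestamp falls earlier on the same calendar date than the bedtime, so the hour of day is likely a safer feature than the raw timestamp difference.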

Solution Methodology

The sleep features of interest include: sleep efficiency, caffeine consumption, exercise, and wakeup time. These features will be used in a regression model that predicts the amount of deep sleep in hours (sleep duration * deep sleep percentage / 100) that a person can expect to receive based on their lifestyle and sleep habits. These features are measured on very different scales (sleep efficiency is a fraction between 0 and 1, while caffeine consumption takes values such as 25 or 50), so I will normalize them to a common scale so that differences in units do not distort their respective weights in the model.
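
As a rough illustration of the two steps described above, the sketch below computes the deep-sleep target and rescales the features. It uses rows 0, 1 and 3 from Out[1]. The methodology does not say which normalization will be used, so z-score standardization is an assumption here (min-max scaling would be another option), and the "Wakeup hour" column is a hypothetical feature derived from the hour of day of "Wakeup time".

```python
import pandas as pd

# Rows 0, 1 and 3 from Out[1]. "Wakeup hour" is a hypothetical derived
# column: the hour of day of "Wakeup time" (e.g. 08:30 -> 8.5).
df_sample = pd.DataFrame({
    "Sleep duration": [6.0, 7.0, 6.0],
    "Sleep efficiency": [0.88, 0.66, 0.51],
    "Deep sleep percentage": [70, 28, 25],
    "Caffeine consumption": [0.0, 0.0, 50.0],
    "Exercise frequency": [3.0, 3.0, 1.0],
    "Wakeup hour": [7.0, 9.0, 8.5],
})

# Target: hours of deep sleep = sleep duration * (deep sleep % / 100).
deep_sleep_hours = (
    df_sample["Sleep duration"] * df_sample["Deep sleep percentage"] / 100
)

feature_cols = [
    "Sleep efficiency",
    "Caffeine consumption",
    "Exercise frequency",
    "Wakeup hour",
]
features = df_sample[feature_cols]

# Z-score standardization (an assumed choice): after this step every
# column has mean 0 and sample standard deviation 1.
features_scaled = (features - features.mean()) / features.std()

print(deep_sleep_hours)
print(features_scaled)
```

When this is applied to the full dataset with a train/test split, the mean and standard deviation would normally be computed on the training rows only and then reused for the test rows, so that no information from the test set leaks into the model.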