Sleep Quality Project

Problem Statement

Sleep is essential to vigilance, learning abilities, hand-eye coordination, mood, memory, and more, yet people of all ages and socioeconomic backgrounds struggle to get sufficient, good-quality sleep each night. Factors such as caffeine and alcohol consumption, exercise, early bedtimes, and sleep duration are often associated with good sleep, but do they actually reflect sleep quality, or only how long a person sleeps?

Solution

The goal of this project is to estimate how much deep (quality) sleep a person can expect to receive based on how long they sleep, whether they consume caffeine, when they get up in the morning, and how often they exercise.

Impact

If successful, this work may yield a regression model that predicts the amount of deep sleep (in hours) a person can expect to receive based on the features in the dataset.

A potential negative outcome of such a machine learning tool is that it may give people an inaccurate understanding of the health effects of factors like caffeine, alcohol, exercise, sleep duration, and bedtime. These factors affect each person differently, and sleep quality also depends on many factors not captured here, including stress, noise pollution, and overall health.

Dataset

Kaggle: Sleep Efficiency

This dataset contains the following features, covering both factors that may influence sleep and measures of sleep:

  • ID = unique subject identifier (int)
  • Age = age in years (int)
  • Gender = sex (str)
  • Bedtime = date and time they went to bed, in 24-hour format (str, e.g. 2021-03-06 01:00:00)
  • Wakeup time = date and time they woke up, in 24-hour format (str, e.g. 2021-03-06 07:00:00)
  • Sleep duration = number of hours spent in bed (float)
  • Sleep efficiency = fraction of time in bed actually spent asleep (float, 0 to 1)
  • REM sleep percentage = percentage of sleep spent in REM sleep (int)
  • Deep sleep percentage = percentage of sleep spent in deep sleep (int)
  • Light sleep percentage = percentage of sleep spent in light sleep (int)
  • Awakenings = number of times they reported waking up during the night (float)
  • Caffeine consumption = caffeine consumed in the 24 hours prior to bedtime (float)
  • Alcohol consumption = alcohol consumed in the 24 hours prior to bedtime (float)
  • Smoking status = whether the subject smokes (str: Yes/No)
  • Exercise frequency = days per week exercised (float)
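The preview of the data shows that Bedtime and Wakeup time are stored as timestamp strings rather than floats, and that both share the same calendar date even when the night spans midnight (row 2: bed at 21:30, wake at 05:30, both on 2021-05-25). Subtracting full timestamps would therefore give misleading negative durations. A minimal sketch, assuming pandas and hypothetical column names `bedtime_hour`, `wakeup_hour`, and `bedtime_rel_midnight`, that extracts only the time of day as a decimal hour:

```python
import pandas as pd


def add_clock_hours(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric time-of-day columns added.

    Assumes 'Bedtime' and 'Wakeup time' hold strings like '2021-03-06 01:00:00'.
    The new column names are hypothetical, chosen for illustration.
    """
    out = df.copy()
    for src, dst in [("Bedtime", "bedtime_hour"), ("Wakeup time", "wakeup_hour")]:
        ts = pd.to_datetime(out[src])
        # Decimal hour of day, e.g. 21:30 -> 21.5
        out[dst] = ts.dt.hour + ts.dt.minute / 60
    # Bedtime relative to midnight (21:30 -> -2.5, 01:00 -> 1.0), so bedtimes on
    # either side of midnight stay close together. Assumes nobody goes to bed
    # between noon and midnight "late" enough to be misread (threshold of 12).
    out["bedtime_rel_midnight"] = out["bedtime_hour"].where(
        out["bedtime_hour"] < 12, out["bedtime_hour"] - 24
    )
    return out
```

Treating evening bedtimes as negative hours is one design choice; without it, a regression would see 23:00 and 01:00 as 22 hours apart instead of 2.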
In [1]:
import pandas as pd

# Load the Kaggle Sleep Efficiency CSV (assumed to be in the working directory)
df_sleep_efficiency = pd.read_csv('sleep_efficiency.csv')
df_sleep_efficiency
Out[1]:
| | ID | Age | Gender | Bedtime | Wakeup time | Sleep duration | Sleep efficiency | REM sleep percentage | Deep sleep percentage | Light sleep percentage | Awakenings | Caffeine consumption | Alcohol consumption | Smoking status | Exercise frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 65 | Female | 2021-03-06 01:00:00 | 2021-03-06 07:00:00 | 6.0 | 0.88 | 18 | 70 | 12 | 0.0 | 0.0 | 0.0 | Yes | 3.0 |
| 1 | 2 | 69 | Male | 2021-12-05 02:00:00 | 2021-12-05 09:00:00 | 7.0 | 0.66 | 19 | 28 | 53 | 3.0 | 0.0 | 3.0 | Yes | 3.0 |
| 2 | 3 | 40 | Female | 2021-05-25 21:30:00 | 2021-05-25 05:30:00 | 8.0 | 0.89 | 20 | 70 | 10 | 1.0 | 0.0 | 0.0 | No | 3.0 |
| 3 | 4 | 40 | Female | 2021-11-03 02:30:00 | 2021-11-03 08:30:00 | 6.0 | 0.51 | 23 | 25 | 52 | 3.0 | 50.0 | 5.0 | Yes | 1.0 |
| 4 | 5 | 57 | Male | 2021-03-13 01:00:00 | 2021-03-13 09:00:00 | 8.0 | 0.76 | 27 | 55 | 18 | 3.0 | 0.0 | 3.0 | No | 3.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 447 | 448 | 27 | Female | 2021-11-13 22:00:00 | 2021-11-13 05:30:00 | 7.5 | 0.91 | 22 | 57 | 21 | 0.0 | 0.0 | 0.0 | No | 5.0 |
| 448 | 449 | 52 | Male | 2021-03-31 21:00:00 | 2021-03-31 03:00:00 | 6.0 | 0.74 | 28 | 57 | 15 | 4.0 | 25.0 | 0.0 | No | 3.0 |
| 449 | 450 | 40 | Female | 2021-09-07 23:00:00 | 2021-09-07 07:30:00 | 8.5 | 0.55 | 20 | 32 | 48 | 1.0 | NaN | 3.0 | Yes | 0.0 |
| 450 | 451 | 45 | Male | 2021-07-29 21:00:00 | 2021-07-29 04:00:00 | 7.0 | 0.76 | 18 | 72 | 10 | 3.0 | 0.0 | 0.0 | No | 3.0 |
| 451 | 452 | 18 | Male | 2021-03-17 02:30:00 | 2021-03-17 10:00:00 | 7.5 | 0.63 | 22 | 23 | 55 | 1.0 | 50.0 | 0.0 | No | 1.0 |

452 rows × 15 columns
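The preview already contains a missing value (NaN in Caffeine consumption for row 449), so it is worth counting gaps before any modeling. A minimal sketch using standard pandas calls; the helper names are hypothetical, and dropping incomplete rows is only one simple choice (imputing, for example with the column median, is a common alternative):

```python
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.Series:
    """Number of missing values in each column, largest first."""
    return df.isna().sum().sort_values(ascending=False)


def drop_incomplete(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Keep only rows with no missing value in any of `columns`, renumbered from 0."""
    return df.dropna(subset=columns).reset_index(drop=True)
```

For this notebook, one might call `summarize_missing(df_sleep_efficiency)` and then `drop_incomplete(df_sleep_efficiency, ["Sleep efficiency", "Caffeine consumption", "Exercise frequency", "Wakeup time", "Sleep duration", "Deep sleep percentage"])`, restricting the check to the columns the model would actually use.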

Solution Methodology

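The sleep features of interest are sleep efficiency, caffeine consumption, exercise frequency, and wake-up time. These features will be used in a regression model that predicts the amount of deep sleep in hours (sleep duration × deep sleep percentage ÷ 100) that a person can expect to receive based on their lifestyle and sleep habits. Wake-up time is stored as a timestamp string, so it would first need converting to a numeric hour of day. Because the features are measured on very different scales (sleep efficiency is a fraction between 0 and 1, while caffeine consumption takes values such as 0, 25, and 50 in the rows shown), I plan to standardize them before fitting. Standardizing does not make the features equally important; it expresses each one in standard-deviation units, so that the fitted coefficients can be compared directly and no feature is favored merely because of its units.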
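A minimal sketch of the planned pipeline, assuming scikit-learn and ordinary linear regression as a baseline (the text specifies only "a regression model"); the function names are hypothetical. The target follows the definition above: for the first row of the preview, 6.0 h × 70 ÷ 100 = 4.2 h of deep sleep. Putting `StandardScaler` inside the pipeline means the fitted coefficients are per one standard deviation of each feature.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def deep_sleep_hours(df: pd.DataFrame) -> pd.Series:
    """Target: hours of deep sleep = Sleep duration x Deep sleep percentage / 100."""
    # Deep sleep percentage is on a 0-100 scale, so divide by 100.
    return df["Sleep duration"] * df["Deep sleep percentage"] / 100


def fit_deep_sleep_model(X: pd.DataFrame, y: pd.Series):
    """Standardize numeric features, then fit an ordinary least-squares model."""
    model = make_pipeline(StandardScaler(), LinearRegression())
    model.fit(X, y)
    return model


def standardized_coefficients(model, columns) -> pd.Series:
    """Coefficient per one standard deviation of each feature, largest magnitude first."""
    coefs = pd.Series(model.named_steps["linearregression"].coef_, index=columns)
    return coefs.reindex(coefs.abs().sort_values(ascending=False).index)
```

With wake-up time converted to a numeric column (say, a hypothetical `wakeup_hour`) and incomplete rows dropped, usage would look like `fit_deep_sleep_model(df[["Sleep efficiency", "Caffeine consumption", "Exercise frequency", "wakeup_hour"]], deep_sleep_hours(df))`. Note that for plain least squares, scaling changes only how the coefficients read, not the predictions; it starts to affect the fit itself if a regularized model such as `Ridge` is swapped in.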