Parkinson's Detection

Part 1

Disease detection is becoming increasingly important in data science. If we can detect the symptoms of a disease, or predict the disease early, treatment can begin sooner, which may improve outcomes for the patient. Here we use data science to try to predict whether a person has Parkinson's disease based on measurements taken from their voice recordings.

Part 2

In [4]:
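import pandas as pd

# Load the voice-measurement dataset (comma-separated despite the .data extension)
df_parks = pd.read_csv('parkinsons.data')
# Drop any rows that contain missing values
df_parks.dropna(how='any', inplace=True)
df_parks.head()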
Out[4]:
name MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer ... Shimmer:DDA NHR HNR status RPDE DFA spread1 spread2 D2 PPE
0 phon_R01_S01_1 119.992 157.302 74.997 0.00784 0.00007 0.00370 0.00554 0.01109 0.04374 ... 0.06545 0.02211 21.033 1 0.414783 0.815285 -4.813031 0.266482 2.301442 0.284654
1 phon_R01_S01_2 122.400 148.650 113.819 0.00968 0.00008 0.00465 0.00696 0.01394 0.06134 ... 0.09403 0.01929 19.085 1 0.458359 0.819521 -4.075192 0.335590 2.486855 0.368674
2 phon_R01_S01_3 116.682 131.111 111.555 0.01050 0.00009 0.00544 0.00781 0.01633 0.05233 ... 0.08270 0.01309 20.651 1 0.429895 0.825288 -4.443179 0.311173 2.342259 0.332634
3 phon_R01_S01_4 116.676 137.871 111.366 0.00997 0.00009 0.00502 0.00698 0.01505 0.05492 ... 0.08771 0.01353 20.644 1 0.434969 0.819235 -4.117501 0.334147 2.405554 0.368975
4 phon_R01_S01_5 116.014 141.781 110.655 0.01284 0.00011 0.00655 0.00908 0.01966 0.06425 ... 0.10470 0.01767 19.649 1 0.417356 0.823484 -3.747787 0.234513 2.332180 0.410335

5 rows × 24 columns

In [ ]:
data_dict = {'name': 'subject name and recording number',
             'MDVP:Fo(Hz)': 'Average vocal fundamental frequency',
             'MDVP:Fhi(Hz)': 'Maximum vocal fundamental frequency',
             'MDVP:Flo(Hz)': 'Minimum vocal fundamental frequency',
             'MDVP:Jitter(%)': 'measure of variation in fundamental frequency',
             'MDVP:Jitter(Abs)': 'measure of variation in fundamental frequency',
             'MDVP:RAP': 'measure of variation in fundamental frequency',
             'MDVP:PPQ': 'measure of variation in fundamental frequency',
             'Jitter:DDP': 'measure of variation in fundamental frequency',
             'MDVP:Shimmer': 'measure of variation in amplitude',
             'Shimmer:DDA': 'measure of variation in amplitude',
             'NHR': 'measure of ratio of noise to tonal components in the voice',
             'HNR': 'measure of ratio of noise to tonal components in the voice',
             'status': "Health status of the subject: 1 = Parkinson's, 0 = healthy",
             'RPDE': 'nonlinear dynamical complexity measure',
             'DFA': 'Signal fractal scaling exponent',
             'spread1': 'nonlinear measure of fundamental frequency variation',
             'spread2': 'nonlinear measure of fundamental frequency variation',
             'D2': 'nonlinear dynamical complexity measure',
             'PPE': 'nonlinear measure of fundamental frequency variation'}
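
The table above reports 24 columns, while data_dict describes only 20 of them. A small check like the one below can list the columns that still lack a description. This is a minimal sketch: the helper name undocumented_columns and the sample inputs are hypothetical, and in the notebook one would pass df_parks.columns and data_dict instead.

```python
def undocumented_columns(columns, descriptions):
    """Return the column names that have no entry in the descriptions dict."""
    return [col for col in columns if col not in descriptions]


# Hypothetical stand-ins for df_parks.columns and data_dict
sample_columns = ['name', 'MDVP:Fo(Hz)', 'example_extra_column']
sample_dict = {'name': 'subject name and recording number',
               'MDVP:Fo(Hz)': 'Average vocal fundamental frequency'}

print(undocumented_columns(sample_columns, sample_dict))
# prints ['example_extra_column']
```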

Link to dataset: https://archive.ics.uci.edu/ml/datasets/parkinsons

Part 3

We'll build and train a classifier to predict whether a person has Parkinson's disease from their voice recordings. The voice measurements serve as the input features, and the status column is the target: Parkinson's (1) or healthy (0).
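
One possible shape for that training step is sketched below. Several things here are assumptions rather than part of the original notebook: the choice of a random forest, the 25% test split, the function name train_status_classifier, and the rule that a subject id is the name with its trailing recording number removed (for example phon_R01_S01_1 becomes phon_R01_S01). Because each subject contributes several recordings, the sketch splits by subject so that recordings from one person do not end up in both the training and test sets.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GroupShuffleSplit


def train_status_classifier(df, seed=0):
    """Train a classifier on the voice features and return (model, test accuracy)."""
    # Assumption: 'name' ends in '_<recording number>', so stripping that
    # suffix yields one id per subject.
    groups = df['name'].str.rsplit('_', n=1).str[0]

    # Every column except the identifier and the label is treated as a feature.
    X = df.drop(columns=['name', 'status'])
    y = df['status']

    # Hold out 25% of the subjects (not 25% of the rows) for testing.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=seed)
    train_idx, test_idx = next(splitter.split(X, y, groups))

    # Assumption: a random forest is used here only as a reasonable default.
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])

    predictions = model.predict(X.iloc[test_idx])
    return model, accuracy_score(y.iloc[test_idx], predictions)
```

In the notebook this could be called as model, accuracy = train_status_classifier(df_parks). Accuracy alone can be misleading if the two classes are unevenly represented, so other metrics may be worth reporting alongside it.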