Cancer

Cancer is one of the world's deadliest diseases, killing almost 10 million people annually. Whether it develops depends on a combination of genetic and environmental factors. Because cancer is both complicated and a massive public health issue, it is important that people be made more aware of the specific factors that affect their risk and of the ways they can reduce their chances of developing the disease.

In [1]:
import pandas as pd

# Load the patient dataset (an Excel file hosted on data.world) and display it
pd.read_excel('https://query.data.world/s/4vvd4j2sbjbfecu7asuoavllk654ku')
Out[1]:
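| | Patient Id | Age | Gender | Air Pollution | Alcohol use | Dust Allergy | OccuPational Hazards | Genetic Risk | chronic Lung Disease | Balanced Diet | ... | Fatigue | Weight Loss | Shortness of Breath | Wheezing | Swallowing Difficulty | Clubbing of Finger Nails | Frequent Cold | Dry Cough | Snoring | Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | P1 | 33 | 1 | 2 | 4 | 5 | 4 | 3 | 2 | 2 | ... | 3 | 4 | 2 | 2 | 3 | 1 | 2 | 3 | 4 | Low |
| 1 | P10 | 17 | 1 | 3 | 1 | 5 | 3 | 4 | 2 | 2 | ... | 1 | 3 | 7 | 8 | 6 | 2 | 1 | 7 | 2 | Medium |
| 2 | P100 | 35 | 1 | 4 | 5 | 6 | 5 | 5 | 4 | 6 | ... | 8 | 7 | 9 | 2 | 1 | 4 | 6 | 7 | 2 | High |
| 3 | P1000 | 37 | 1 | 7 | 7 | 7 | 7 | 6 | 7 | 7 | ... | 4 | 2 | 3 | 1 | 4 | 5 | 6 | 7 | 5 | High |
| 4 | P101 | 46 | 1 | 6 | 8 | 7 | 7 | 7 | 6 | 7 | ... | 3 | 2 | 4 | 1 | 4 | 2 | 4 | 2 | 3 | High |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | P995 | 44 | 1 | 6 | 7 | 7 | 7 | 7 | 6 | 7 | ... | 5 | 3 | 2 | 7 | 8 | 2 | 4 | 5 | 3 | High |
| 996 | P996 | 37 | 2 | 6 | 8 | 7 | 7 | 7 | 6 | 7 | ... | 9 | 6 | 5 | 7 | 2 | 4 | 3 | 1 | 4 | High |
| 997 | P997 | 25 | 2 | 4 | 5 | 6 | 5 | 5 | 4 | 6 | ... | 8 | 7 | 9 | 2 | 1 | 4 | 6 | 7 | 2 | High |
| 998 | P998 | 18 | 2 | 6 | 8 | 7 | 7 | 7 | 6 | 7 | ... | 3 | 2 | 4 | 1 | 4 | 2 | 4 | 2 | 3 | High |
| 999 | P999 | 47 | 1 | 6 | 5 | 6 | 5 | 5 | 4 | 6 | ... | 8 | 7 | 9 | 2 | 1 | 4 | 6 | 7 | 2 | High |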

1000 rows × 25 columns

Factors are ranked on a scale of 1-10, with 10 being the most evident in the patient.

- patient_id: string
- age: integer
- gender: integer
- air_pollution: integer
- alcohol_use: integer
- dust_allergy: integer
- occupational_hazards: integer
- genetic_risk: integer
- chronic_lung_disease: integer
- balanced_diet: integer
- obesity: integer
- smoking: integer
- passive_smoker: integer
- chest_pain: integer
- coughing_of_blood: integer
- fatigue: integer
- weight_loss: integer
- shortness_of_breath: integer
- wheezing: integer
- swallowing_difficulty: integer
- clubbing_of_finger_nails: integer
- frequent_cold: integer
- dry_cough: integer
- snoring: integer
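The field names listed here are lower-case with underscores, while the headers shown in Out[1] use mixed case and spaces (for example `OccuPational Hazards`). The following is a minimal sketch of one way to map one onto the other. The helper `to_snake_case` is a hypothetical name introduced for illustration, not something the notebook defines, and only the ten headers visible before the `...` in the preview are used.

```python
def to_snake_case(label: str) -> str:
    """Lower-case a column label and join its words with underscores."""
    return "_".join(label.lower().split())


# Headers copied from the visible left-hand part of Out[1].
displayed = [
    "Patient Id", "Age", "Gender", "Air Pollution", "Alcohol use",
    "Dust Allergy", "OccuPational Hazards", "Genetic Risk",
    "chronic Lung Disease", "Balanced Diet",
]

# Normalized names, intended to line up with the field list above.
normalized = [to_snake_case(h) for h in displayed]

for old, new in zip(displayed, normalized):
    print(f"{old!r} -> {new!r}")
```

Because pandas' `DataFrame.rename` accepts a function for its `columns` argument, the same helper could be applied to the loaded table with `rename(columns=to_snake_case)`.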

Although cancer risk varies from person to person, meaning that some environmental factors may predispose certain people to cancer more strongly than others, we can use data to infer which factors are most strongly linked to cancer. We can treat each factor on its own and see how prevalent it is among the patients in the dataset, and also look at factors in conjunction to see whether the combination affects the likelihood of developing cancer. Because each factor is ranked by severity on the 1-10 scale, the data will come in handy when trying to distinguish direct causation from circumstantial correlation.
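As a rough sketch of the two views described above (one factor at a time, then two factors in conjunction), the snippet below works on the ten rows visible in the Out[1] preview rather than the full 1000-row table. The snake_case column names, the choice of three factors, and the cut-off `HIGH_SCORE = 6` are assumptions made for illustration only.

```python
import pandas as pd

# Ten rows copied from the Out[1] preview; only three factor columns are kept.
preview = pd.DataFrame({
    "air_pollution": [2, 3, 4, 7, 6, 6, 6, 4, 6, 6],
    "alcohol_use":   [4, 1, 5, 7, 8, 7, 8, 5, 8, 5],
    "genetic_risk":  [3, 4, 5, 6, 7, 7, 7, 5, 7, 5],
    "level": ["Low", "Medium", "High", "High", "High",
              "High", "High", "High", "High", "High"],
})

# Order the levels so that grouped results sort Low < Medium < High.
preview["level"] = pd.Categorical(
    preview["level"], categories=["Low", "Medium", "High"], ordered=True
)

factors = ["air_pollution", "alcohol_use", "genetic_risk"]

# View 1: each factor on its own, as the mean score per cancer level.
mean_by_level = preview.groupby("level", observed=True)[factors].mean()

# View 2: two factors in conjunction.
HIGH_SCORE = 6  # hypothetical cut-off for a "high" score, not from the source
both_high = (preview["air_pollution"] >= HIGH_SCORE) & (
    preview["alcohol_use"] >= HIGH_SCORE
)
group = both_high.map({True: "both_high", False: "other"})

# Share of patients labelled "High" inside each group.
share_high = (preview["level"] == "High").groupby(group).mean()

print(mean_by_level)
print(share_high)
```

With only ten preview rows, eight of them labelled High, the numbers are purely illustrative. The same calls would apply to the full table once its columns are renamed. A difference between groups shows association only, so it does not by itself separate causation from correlation.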