Recently, I came across a news article reporting that breast cancer survivors can pause treatment in order to pursue pregnancy. I found this intriguing and decided to look into datasets relating to breast cancer. I came across a promising dataset on Kaggle.
Link to dataset: https://www.kaggle.com/datasets/0248260fceaaaab93ceb231f0deb49f979a9ce4ed30f54260c8a18d9270bbcb0?resource=download
Link to article: https://abcnews.go.com/Health/video/breast-cancer-survivors-pause-treatments-babies-study-95763252
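import matplotlib.pyplot as plt  # not used yet in this section
import pandas as pd
from google.colab import files  # Colab-only helper; not used yet in this section

# Load the CSV downloaded from Kaggle and preview it
df = pd.read_csv("BRCA 2.csv")
df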
| | Patient_ID | Age | Gender | Protein1 | Protein2 | Protein3 | Protein4 | Tumour_Stage | Histology | ER status | PR status | HER2 status | Surgery_type | Date_of_Surgery | Date_of_Last_Visit | Patient_Status |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | TCGA-D8-A1XD | 36.0 | FEMALE | 0.080353 | 0.42638 | 0.54715 | 0.273680 | III | Infiltrating Ductal Carcinoma | Positive | Positive | Negative | Modified Radical Mastectomy | 15-Jan-17 | 19-Jun-17 | Alive |
1 | TCGA-EW-A1OX | 43.0 | FEMALE | -0.420320 | 0.57807 | 0.61447 | -0.031505 | II | Mucinous Carcinoma | Positive | Positive | Negative | Lumpectomy | 26-Apr-17 | 09-Nov-18 | Dead |
2 | TCGA-A8-A079 | 69.0 | FEMALE | 0.213980 | 1.31140 | -0.32747 | -0.234260 | III | Infiltrating Ductal Carcinoma | Positive | Positive | Negative | Other | 08-Sep-17 | 09-Jun-18 | Alive |
3 | TCGA-D8-A1XR | 56.0 | FEMALE | 0.345090 | -0.21147 | -0.19304 | 0.124270 | II | Infiltrating Ductal Carcinoma | Positive | Positive | Negative | Modified Radical Mastectomy | 25-Jan-17 | 12-Jul-17 | Alive |
4 | TCGA-BH-A0BF | 56.0 | FEMALE | 0.221550 | 1.90680 | 0.52045 | -0.311990 | II | Infiltrating Ductal Carcinoma | Positive | Positive | Negative | Other | 06-May-17 | 27-Jun-19 | Dead |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
336 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
337 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
338 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
339 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
340 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
341 rows × 16 columns
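The preview shows that rows 336 to 340 are entirely NaN and that the dates are stored as text such as 15-Jan-17. A minimal cleaning sketch follows, assuming the fully empty rows carry no information and that every date uses the day-month abbreviation-two digit year pattern seen in the preview. The helper name `clean_brca` and the tiny `sample` frame are hypothetical and only there for illustration; the sample reuses two rows from the preview plus one empty row.

```python
import pandas as pd


def clean_brca(df: pd.DataFrame) -> pd.DataFrame:
    """Drop fully empty rows and parse the two date columns (hypothetical helper)."""
    # Remove rows in which every column is missing, like rows 336 to 340 above
    cleaned = df.dropna(how="all").copy()
    # Assumption: dates look like "15-Jan-17"; anything else becomes NaT
    for col in ("Date_of_Surgery", "Date_of_Last_Visit"):
        cleaned[col] = pd.to_datetime(cleaned[col], format="%d-%b-%y", errors="coerce")
    return cleaned


# Illustrative frame: two rows copied from the preview plus one empty row
sample = pd.DataFrame({
    "Patient_ID": ["TCGA-D8-A1XD", "TCGA-EW-A1OX", None],
    "Age": [36.0, 43.0, None],
    "Date_of_Surgery": ["15-Jan-17", "26-Apr-17", None],
    "Date_of_Last_Visit": ["19-Jun-17", "09-Nov-18", None],
    "Patient_Status": ["Alive", "Dead", None],
})

cleaned_sample = clean_brca(sample)
print(len(sample), "rows before,", len(cleaned_sample), "rows after")
```

On the real data the same call would be `df = clean_brca(df)`, after which the difference between the two date columns gives the follow-up time per patient.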
# Named data_dictionary so that Python's built-in dict is not shadowed
data_dictionary = {"Patient_ID": "Unique identifier of the patient",
"Age": "Age of the patient at time of record",
"Gender": "Gender of the patient at time of record",
"Protein1": "Expression level (undefined units)",
"Protein2": "Expression level (undefined units)",
"Protein3": "Expression level (undefined units)",
"Protein4": "Expression level (undefined units)",
"Tumour_Stage": "Stage I, II or III",
"Histology": "Microscopic structure of tissues. Types: Infiltrating Ductal Carcinoma, Infiltrating Lobular Carcinoma, Mucinous Carcinoma",
"ER status": "Negative/Positive",
"PR status": "Negative/Positive",
"HER2 status": "Negative/Positive",
"Surgery_type": "Lumpectomy, Simple Mastectomy, Modified Radical Mastectomy, Other",
"Date_of_Surgery": "Date when the surgery was performed (DD-Mon-YY)",
"Date_of_Last_Visit": "Date of last visit (DD-Mon-YY), null if the patient didn’t visit again after the surgery",
"Patient_Status": "Dead/Alive"}
data_dictionary
{'Patient_ID': 'Unique identifier of the patient', 'Age': 'Age of the patient at time of record', 'Gender': 'Gender of the patient at time of record', 'Protein1': 'Expression level (undefined units)', 'Protein2': 'Expression level (undefined units)', 'Protein3': 'Expression level (undefined units)', 'Protein4': 'Expression level (undefined units)', 'Tumour_Stage': 'Stage I, II or III', 'Histology': 'Microscopic structure of tissues. Types: Infiltrating Ductal Carcinoma, Infiltrating Lobular Carcinoma, Mucinous Carcinoma', 'ER status': 'Negative/Positive', 'PR status': 'Negative/Positive', 'HER2 status': 'Negative/Positive', 'Surgery_type': 'Lumpectomy, Simple Mastectomy, Modified Radical Mastectomy, Other', 'Date_of_Surgery': 'Date when the surgery was performed (DD-Mon-YY)', 'Date_of_Last_Visit': 'Date of last visit (DD-Mon-YY), null if the patient didn’t visit again after the surgery', 'Patient_Status': 'Dead/Alive'}
I could use machine learning to predict whether a patient will survive based on characteristics such as age, protein expression levels, tumour stage, histology, ER/PR/HER2 status, and surgery type. Hopefully, this will also help show which type of surgery is associated with the best outcomes. I am really excited about the research I will be doing to understand the different types of surgeries, histologies, and proteins, and how they affect patients in general.
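As a first idea of what such a model could look like, here is a minimal sketch using scikit-learn, which is my assumption; the post has not chosen a library or an algorithm yet. It one-hot encodes the categorical columns, scales the numeric ones, and fits a logistic regression on `Patient_Status`. The function name `build_survival_model` is hypothetical, and the `preview` frame holds only the five rows shown in the table above so that the code runs on its own. Five rows are far too few to learn anything from, so no scores are reported.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["Age", "Protein1", "Protein2", "Protein3", "Protein4"]
CATEGORICAL = ["Tumour_Stage", "Histology", "ER status", "PR status",
               "HER2 status", "Surgery_type"]


def build_survival_model() -> Pipeline:
    """Preprocessing plus logistic regression (one possible choice, not the only one)."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        # Ignore categories at prediction time that were not seen during fitting
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# The five preview rows from the table above, used only to show the mechanics
preview = pd.DataFrame({
    "Age": [36.0, 43.0, 69.0, 56.0, 56.0],
    "Protein1": [0.080353, -0.420320, 0.213980, 0.345090, 0.221550],
    "Protein2": [0.42638, 0.57807, 1.31140, -0.21147, 1.90680],
    "Protein3": [0.54715, 0.61447, -0.32747, -0.19304, 0.52045],
    "Protein4": [0.273680, -0.031505, -0.234260, 0.124270, -0.311990],
    "Tumour_Stage": ["III", "II", "III", "II", "II"],
    "Histology": ["Infiltrating Ductal Carcinoma", "Mucinous Carcinoma",
                  "Infiltrating Ductal Carcinoma", "Infiltrating Ductal Carcinoma",
                  "Infiltrating Ductal Carcinoma"],
    "ER status": ["Positive"] * 5,
    "PR status": ["Positive"] * 5,
    "HER2 status": ["Negative"] * 5,
    "Surgery_type": ["Modified Radical Mastectomy", "Lumpectomy", "Other",
                     "Modified Radical Mastectomy", "Other"],
    "Patient_Status": ["Alive", "Dead", "Alive", "Alive", "Dead"],
})

features = preview[NUMERIC + CATEGORICAL]
model = build_survival_model()
model.fit(features, preview["Patient_Status"])
predictions = model.predict(features)
print(list(predictions))
```

On the full dataset, the rows with a missing `Patient_Status` would need to be dropped first, and the model should be evaluated on a held-out split (for example with `sklearn.model_selection.train_test_split`) rather than on the rows it was trained on.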