May 2020


Machine learning has interesting applications in healthcare, including disease diagnosis and prognosis prediction. Researchers can feed a model the relevant patient data and receive a prediction in return. It's interesting to see how these predictive tools could be used to assist clinicians in the future :o

The aim of this project is to predict the survival of cancer patients using data from cBioPortal.

Using the patients' clinical and genetic data, we predict their 1-year, 3-year, and 5-year survival.
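To make the three prediction targets concrete, here is a minimal sketch of how binary labels for each horizon could be derived from follow-up data. It assumes each patient has an overall-survival time in months and a deceased/living flag (cBioPortal clinical files commonly expose these as `OS_MONTHS` and `OS_STATUS`, but the exact fields depend on the study). The function names and the choice to leave censored patients unlabeled are illustrative assumptions, not necessarily what this project does.

```python
from typing import Dict, Optional

HORIZONS_YEARS = (1, 3, 5)  # the three horizons mentioned above


def survival_label(os_months: float, deceased: bool, horizon_years: int) -> Optional[int]:
    """Hypothetical helper: label one patient for one horizon.

    Returns 1 if the patient is known to have survived past the horizon,
    0 if they died before it, and None if follow-up ended before the
    horizon while the patient was still alive (censored, outcome unknown).
    """
    horizon_months = 12 * horizon_years
    if os_months >= horizon_months:
        return 1  # followed (or survived) at least as long as the horizon
    if deceased:
        return 0  # died before reaching the horizon
    return None  # alive at last contact, but not followed long enough


def all_labels(os_months: float, deceased: bool) -> Dict[int, Optional[int]]:
    """Labels for every horizon, keyed by horizon in years."""
    return {h: survival_label(os_months, deceased, h) for h in HORIZONS_YEARS}
```

For example, a patient who died 40 months after diagnosis counts as a survivor at 1 and 3 years but not at 5 years. Patients labeled `None` would typically be excluded from training for that particular horizon.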

Because we are assessing sixteen different cancer types, we ran through many different workflows to see how best to clean and preprocess the data, train the model, and validate the accuracy and ROC scores.
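As a rough illustration of the two validation metrics, the sketch below computes accuracy and the area under the ROC curve from true labels and predicted probabilities in plain Python. The function names and the 0.5 decision threshold are assumptions made for this example. In practice a library would normally be used instead (scikit-learn, for instance, provides `accuracy_score` and `roc_auc_score`).

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of patients whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for pred, truth in zip(preds, y_true) if pred == truth)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one (ties count half).
    This is O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Accuracy depends on the chosen threshold and can look deceptively high when one class dominates (for example, when most patients survive the first year), whereas ROC AUC measures how well the model ranks patients regardless of threshold. That difference is one reason to report both.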


There were many decisions to make along the preprocess -> train -> validate pipeline, so this tree-based model helped with visualization.

We ran several different pipeline combinations and tried to identify the bottlenecks in accuracy. This is still an ongoing project and will be updated as we gain further insights.