Monday 6 August 2018

Machine Learning | Cancer prediction

In a previous post, a number of machine learning models were trained on a cancer data set. Details of the data set are here. The relative importance of each of the 32 dimensions was also deduced from a multi-dimensional visualisation. This post builds on those results and finally puts the models to use for prognosis.

video for post content

The models that were trained included a Gradient boosting classifier, a Random forest, an SVC and a K-neighbour classifier. In addition to those, a second Gradient boosting classifier was trained with parameter tuning. For this tuned model, a range of values for learning_rate, max_features and max_depth was specified in a parameter grid. A model selection and evaluation tool such as GridSearchCV, together with a scoring parameter, was then used to pick the parameter combination that scores best.
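The tuning step described above could look roughly like the sketch below. It is not the code from the previous post: scikit-learn's bundled breast cancer data stands in for the data set, and the grid values, the "accuracy" scoring choice and the number of folds are all assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: the bundled breast cancer data stands in for the post's data set.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid only; the values searched in the post are not stated.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_features": [5, 10],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_grid,
    scoring="accuracy",  # assumed scoring parameter
    cv=3,                # assumed number of cross-validation folds
)
search.fit(X_train, y_train)

# The estimator refitted on the full training split with the best parameters.
best_model = search.best_estimator_
print(search.best_params_)
print(best_model.score(X_test, y_test))
```

Every combination in the grid is cross-validated on the training split only, so the held-back test split still gives an independent check of the chosen model.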

Using the generated models:

Once the models have been trained and generated, we need to put them to use. And when there are a number of algorithms, samples and models, it would be great to be able to select a model and run a new, previously unseen sample through it.

To do that with this cancer data set, we serialise the models and samples and serve them through an application. The models and their details are stored in a database, each with a link to its file. The same is done for samples. Since each model uses a particular algorithm, a list of algorithms is also maintained. Algorithms, models and samples can all be added to the application. From there, a user interface lets the user choose a sample and the model to apply to it.
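As a rough sketch of that storage layout, the snippet below pickles a model and a sample to files and records them in a small SQLite database. The table and column names, the file names and the use of pickle and SQLite are all hypothetical; the application's actual schema and serialisation format are not described here.

```python
import pickle
import sqlite3
import tempfile
from pathlib import Path

from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled breast cancer data stands in for the post's data set.
X, y = load_breast_cancer(return_X_y=True)
store = Path(tempfile.mkdtemp())  # hypothetical storage directory

# Hypothetical schema: algorithms, models and samples, with file links.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE algorithm (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE model (
        id INTEGER PRIMARY KEY,
        name TEXT,
        algorithm_id INTEGER REFERENCES algorithm(id),
        file_path TEXT
    );
    CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT, file_path TEXT);
""")


def save(obj, filename):
    """Pickle obj into the storage directory and return the file path."""
    path = store / filename
    path.write_bytes(pickle.dumps(obj))
    return str(path)


# Train on all rows but the last, which is kept back as a stored sample.
knn = KNeighborsClassifier(n_neighbors=5).fit(X[:-1], y[:-1])

algorithm_id = db.execute(
    "INSERT INTO algorithm (name) VALUES (?)", ("K-neighbour classifier",)
).lastrowid
db.execute(
    "INSERT INTO model (name, algorithm_id, file_path) VALUES (?, ?, ?)",
    ("knn-v1", algorithm_id, save(knn, "knn-v1.pkl")),
)
db.execute(
    "INSERT INTO sample (name, file_path) VALUES (?, ?)",
    ("sample-1", save(X[-1], "sample-1.pkl")),
)
db.commit()

# List the stored models together with their algorithm and file link.
rows = db.execute(
    "SELECT model.name, algorithm.name, model.file_path "
    "FROM model JOIN algorithm ON algorithm.id = model.algorithm_id"
).fetchall()
print(rows)
```

Keeping only a file path in the database keeps the rows small and lets a model file be replaced when the model is retrained.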

This enables users in, say, a medical institution to add new models and samples as they are generated and to conduct a prognosis. Models can also be revisited and looked up as they are updated.
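Conducting a prognosis then comes down to loading the chosen model file and sample file and calling the model's predict method. The sketch below assumes both were serialised with pickle; the file names and the run_prognosis helper are hypothetical, and scikit-learn's bundled breast cancer data again stands in for the real data set.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Assumption: the bundled breast cancer data stands in for the post's data set.
data = load_breast_cancer()
X, y = data.data, data.target

# Set-up for the sketch: store one model and one held-back sample as files.
store = Path(tempfile.mkdtemp())  # hypothetical storage directory
model_path = store / "random-forest.pkl"
sample_path = store / "sample.pkl"
forest = RandomForestClassifier(n_estimators=50, random_state=0)
model_path.write_bytes(pickle.dumps(forest.fit(X[:-1], y[:-1])))
sample_path.write_bytes(pickle.dumps(X[-1]))


def run_prognosis(model_file, sample_file):
    """Load a stored model and sample, and return the predicted class name."""
    model = pickle.loads(Path(model_file).read_bytes())
    sample = pickle.loads(Path(sample_file).read_bytes())
    # predict expects a 2-D array: one row per sample.
    predicted = model.predict(sample.reshape(1, -1))[0]
    return str(data.target_names[predicted])


result = run_prognosis(model_path, sample_path)
print(result)
```

Because unpickling can execute arbitrary code, files should only be loaded this way when they come from a trusted source, such as models the institution has generated itself.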

Screenshots of conducting a prognosis with stored models and samples are shown below.

1) Selecting a sample

2) Selecting a model

3) Result

Another prognosis, this time with the K-neighbour classifier on a different sample.

Details of a particular model can also be viewed.