Tuesday 19 November 2019

Deep learning | TensorFlow | Building a neural net *Updated with hyperparameter tuning

Here we look at training a neural network to classify UCI cancer data as benign or malignant. The project has multiple machine learning models for this purpose and is described here. In this post a neural network is built using Keras with a TensorFlow backend, and scikit-learn is used for pre-processing the data. The final network, after hyperparameter tuning, looks like this.


Initial hyperparameters used:
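As a stand-in sketch, the starting values mentioned in step 4 below are collected in one place; the epoch count is an assumption, since the post only gives the tuned range of 50-55.

```python
# Sketch of the initial hyperparameter values quoted later in this post.
initial_hparams = {
    "optimizer": "SGD",       # selected over RMSprop and Adam in step 4
    "learning_rate": 0.027,   # starting value, later tuned to 0.24
    "momentum": 0.44,         # starting value, later tuned to 0.41
    "batch_size": 32,         # starting value, later tuned to 5
    "epochs": 50,             # assumption; only the tuned range 50-55 is given
}
```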


1) Visualise the data: There are 36 dimensions, of which 10 do not contribute to the prediction. A visualisation is already in the project here. Those 10 dimensions are removed to decrease noise, as sketched below.
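A minimal sketch of this step, assuming the data is loaded into a pandas DataFrame; the file name and column names are hypothetical, since the post does not list which 10 dimensions were removed.

```python
import pandas as pd

# Load the raw data; "cancer_data.csv" is a hypothetical file name.
df = pd.read_csv("cancer_data.csv")

# Hypothetical names for the 10 non-contributing dimensions found in the
# visualisation step; the real names come from the project's analysis.
low_value_columns = ["feature_27", "feature_28", "feature_29"]  # ... and 7 more
df = df.drop(columns=low_value_columns)
```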

2) Pre-process the data: The data distribution is examined: for each dimension, check the distribution of values. The objective is to figure out how to pre-process the data. Neural network layers have activation functions, and common ones like ReLU compute max(0, value). Also, in each epoch the weights are tweaked by a small amount, so unnormalized input can hurt learning. Here the data is normalized using a MinMax scaler so that each dimension lies between 0 and 1; this medical data lends itself well to that. Not every dimension had the following type of distribution; the figure shows cell texture.
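A minimal sketch of the scaling step with scikit-learn; X and y are placeholders standing in for the cleaned feature matrix (26 remaining dimensions) and the benign/malignant labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Placeholder data standing in for the cleaned dataset from step 1.
X = np.random.rand(500, 26) * 100      # 26 dimensions remain after step 1
y = np.random.randint(0, 2, 500)       # 0 = benign, 1 = malignant

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Scale each dimension into [0, 1]; fit only on the training split so the
# test split does not leak into the scaling statistics.
scaler = MinMaxScaler(feature_range=(0, 1))
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```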



3) Build the layers (dropouts, layer count and neurons in each layer): The decision of how many layers, and how many neurons in each layer, is based mostly on the accuracy/loss plots of the fit (see the end of this post) and on the visualisation from step 1. The model looks like this.
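A sketch of what such a network might look like in Keras, assuming TensorFlow 2.x; the layer sizes and dropout ratios here are illustrative assumptions, not the post's exact final values.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD

# Illustrative network: binary classifier over the 26 kept dimensions.
model = Sequential([
    Dense(32, activation="relu", input_shape=(26,)),
    Dropout(0.3),                    # each dropout layer has its own ratio
    Dense(16, activation="relu"),
    Dropout(0.2),
    Dense(1, activation="sigmoid"),  # probability of malignant
])

# Tuned values from step 4: learning rate 0.24, momentum 0.41.
model.compile(
    optimizer=SGD(learning_rate=0.24, momentum=0.41),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```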



4) Parameter tuning: Using GridSearchCV, the following parameters and hyperparameters were tuned (a sketch of the tuning setup follows the list):

a) Optimizer: SGD was selected after comparison with RMSprop and Adam.



b) Epochs: 50-55 was selected using GridSearchCV with stratified k-fold cross-validation.

c) Batch size: Initially 32; 5 was chosen after GridSearchCV with 30, 35 and 40 among other options.

d) Learning rate: Initially kept at 0.027 during cross validation. This was changed after looking at the loss and accuracy curves from the initial training; the final value from GridSearchCV was 0.24.

e) Momentum: Initially 0.44, tweaked based on the loss and accuracy curve plots. Final value: 0.41.
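A minimal sketch of the tuning setup, using the Keras scikit-learn wrapper that shipped with TensorFlow at the time; build_model reuses the illustrative architecture from step 3, the grid values follow the numbers quoted above, and X_train/y_train come from the pre-processing sketch in step 2.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def build_model(learning_rate=0.027, momentum=0.44):
    # SGD is used here since it was selected over RMSprop and Adam.
    model = Sequential([
        Dense(32, activation="relu", input_shape=(26,)),
        Dropout(0.3),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=SGD(learning_rate=learning_rate, momentum=momentum),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

clf = KerasClassifier(build_fn=build_model, verbose=0)

# Candidate values taken from the list above.
param_grid = {
    "epochs": [50, 55],
    "batch_size": [5, 30, 35, 40],
    "learning_rate": [0.027, 0.24],
    "momentum": [0.41, 0.44],
}

grid = GridSearchCV(clf, param_grid, cv=StratifiedKFold(n_splits=5))
grid_result = grid.fit(X_train, y_train)
print(grid_result.best_params_, grid_result.best_score_)
```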

A screenshot of the grid search with the best parameters:



The number of intermediate layers, the neuron count in each layer and the dropouts were decided based on the fit shown by the accuracy and loss curves during training. Each dropout layer also has a different dropout ratio.

Avoiding overfitting, underfitting and an unknown fit: An unnecessarily high epoch count resulted in overfitting, while a lower epoch count gave underfitting. The epoch count was increased after the intermediate layers with more neurons were added, which resulted in an acceptable fit in the loss-curve behaviour. An example of overfitting is shown below; notice that the validation error goes up while the training error stays low towards the right.
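A short sketch of how such curves can be produced from the Keras training history; model is the compiled network from step 3 and the tuned values come from step 4.

```python
import matplotlib.pyplot as plt

# Train while holding out part of the training data for validation.
history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=55, batch_size=5, verbose=0)

plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
# Overfitting shows up as validation loss rising while training loss stays low.
```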


