Saturday, 30 November 2019

Deep Learning | TensorFlow | GPU (CUDA/cuDNN) vs CPU, with and without EarlyStopping

In the previous post we built a neural net and tuned its hyperparameters using GridSearchCV. Here we train the same network on a GPU and compare that with training on a CPU.

Tools used
- TensorFlow-GPU 2.0.0
- Keras 2.3.1
- pandas 0.25.3
- scikit-learn 0.21.3

- NVIDIA GTX 960M, 2 GB
- Intel Core i7-6700HQ, 16 GB RAM

The number of epochs needed was tuned separately last time. The loss/accuracy vs. epochs curves raise the question of whether that many epochs are really needed for this network and data, and that directly affects the time needed to train the network. Given the low-end GPU, the timings below are as expected.

Could we train the network in less time with an acceptable loss of accuracy?

Early stopping is used here to answer this question; results are shown below. Instead of going through all 55 epochs, training stops once the loss cannot be reduced further, which happens around 17-27 epochs. The early stopping parameters used are shown below.
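As an illustration, a minimal sketch of how such a callback might be set up with Keras is given below; the monitored quantity, min_delta and patience values here are assumptions for the sketch, not the exact settings used.

from keras.callbacks import EarlyStopping

# Hypothetical settings: stop once validation loss stops improving.
early_stop = EarlyStopping(
    monitor="val_loss",           # quantity to watch
    min_delta=1e-4,               # smallest change counted as an improvement
    patience=5,                   # epochs to wait before stopping
    restore_best_weights=True,    # roll back to the best epoch seen
)

# Passed to fit() alongside the other training arguments:
# model.fit(X_train, y_train, epochs=55, batch_size=5,
#           validation_split=0.2, callbacks=[early_stop])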


The hyperparameter grid is kept small to begin with.
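For instance, a grid over learning rate and momentum with three values each could be expressed as below; the specific values are placeholders, not the exact grid used.

# Hypothetical 3 x 3 grid over the two tuned hyperparameters.
param_grid = {
    "learning_rate": [0.024, 0.03, 0.036],
    "momentum": [0.39, 0.41, 0.43],
}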

Results (2 parameters, 3 values each)

Setup                 Wall time      Accuracy   Learning rate   Momentum
CPU                   1 min 28 sec   0.978685   0.03            0.43
CPU, early stopping   35.8 sec       0.973357   0.024           0.39
GPU                   9 min 1 sec    0.978685   0.024           0.41
GPU, early stopping   2 min 54 sec   0.971580   0.024           0.41

Results (2 parameters, 5 values each)

The parameter grid is extended with additional ranges.

Setup                 Wall time      Accuracy   Learning rate   Momentum
CPU, early stopping   1 min 35 sec   0.9822     0.03            0.39
GPU, early stopping   9 min 37 sec   0.9751     0.033           0.41

Tuesday, 19 November 2019

Deep Learning | TensorFlow | Building a neural net *Updated with hyperparameter tuning

Here we look at training a neural network to classify UCI cancer data as benign or malignant. The project has multiple machine learning models for this purpose and is described here. In this post a neural network is built using Keras with the TensorFlow backend; scikit-learn is used for pre-processing the data. The final network, with hyperparameter tuning, looks like this.


Initial hyperparameters used:


1) Visualise the data: There are 36 dimensions, of which 10 do not contribute to the prediction. A visualisation is already in the project here. Those 10 dimensions are removed to reduce noise, roughly as in the sketch below.
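A minimal sketch of dropping such columns with pandas; the file and column names are placeholders, and the actual 10 columns come from the visualisation in the project.

import pandas as pd

df = pd.read_csv("cancer_data.csv")              # hypothetical file name
columns_to_drop = ["id", "unused_feature_1"]     # placeholders for the 10 dropped dimensions
df = df.drop(columns=columns_to_drop)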

2) Pre-process the data: The data distribution is examined, checking the value distribution of each dimension. The objective is to figure out how to pre-process the data. Neural network layers have activation functions; common ones like relu work as max(0, value), and in each epoch the weights are tweaked by a small amount, so unnormalized input can hurt learning. Here the data is normalized with a MinMax scaler so that each dimension lies between 0 and 1; this medical data lends itself well to that. Not every dimension had the following type of distribution; the figure shown is for cell texture.
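A minimal sketch of this scaling step with scikit-learn, assuming the features and labels are already in X and y:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)

scaler = MinMaxScaler()                    # maps each dimension to [0, 1]
X_train = scaler.fit_transform(X_train)    # fit only on training data
X_test = scaler.transform(X_test)          # reuse the same scaling for test data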



3) Build the layers (dropouts, layer count and neurons in each layer): The decision of how many layers and how many neurons in each layer is based mostly on the accuracy/loss plots for fit (see the end of the post) and the visualisation in step 1. The model looks like this.
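The exact architecture is in the figure; as a hedged sketch, a model with dropout between dense layers could be built along these lines. The layer sizes, dropout ratios and input dimension below are assumptions, not the tuned values.

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD

def build_model(learning_rate=0.024, momentum=0.41):
    model = Sequential()
    model.add(Dense(32, activation="relu", input_shape=(26,)))   # 36 - 10 dropped dimensions
    model.add(Dropout(0.3))                                      # hypothetical dropout ratio
    model.add(Dense(16, activation="relu"))
    model.add(Dropout(0.2))                                      # different ratio per dropout layer
    model.add(Dense(1, activation="sigmoid"))                    # benign / malignant
    model.compile(optimizer=SGD(lr=learning_rate, momentum=momentum),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model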



4) Parameter tuning: Using GridSearchCV the following parameters and hyperparameters were tuned (a sketch of the search setup follows this list):

a) Optimizer: SGD was selected after comparison with RMSprop and Adam.



b) Epochs: 50-55 was selected after GridSearchCV and Stratified k-fold cross validation.

c) Batch size: Initially 32, but 5 was chosen after GridSearchCV over 30, 35, 40 and other options.

d) Learning rate: Initially kept at 0.027 during cross-validation. This was changed after looking at the loss and accuracy curves from initial training. The final value from GridSearchCV was 0.024.

e) Momentum: Initially 0.44, tweaked based on the loss and accuracy curve plots. Final value 0.41.
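A hedged sketch of wiring this search up with the KerasClassifier wrapper and GridSearchCV; build_model is the sketch from step 3 and the grid values are illustrative, not the full grid that was run.

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

estimator = KerasClassifier(build_fn=build_model, epochs=55, batch_size=5, verbose=0)

param_grid = {
    "learning_rate": [0.024, 0.03],      # illustrative values
    "momentum": [0.39, 0.41, 0.44],
}

grid = GridSearchCV(estimator, param_grid,
                    cv=StratifiedKFold(n_splits=5),
                    scoring="accuracy")
grid_result = grid.fit(X_train, y_train)
print(grid_result.best_score_, grid_result.best_params_)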

A screenshot of the grid search with the best parameters:



The number of intermediate layers, the neuron count per layer and the dropouts in each layer were decided based on the fit seen in the accuracy and loss curves during training. Each dropout layer also has a different dropout ratio.

Avoiding overfitting and underfitting: an unnecessarily high epoch count resulted in overfitting, while too few epochs gave underfitting. The epoch count was increased after the intermediate layers with more neurons were added, which resulted in an acceptable fit in the loss behaviour. An example of overfitting is shown below; notice that the validation error goes up while the training error stays low towards the right.



Sunday, 6 October 2019

Streaming sensor data to dashboard | 100 ms sampling

In this post we look at adding streaming sensor data visualisation to the dashboard. The data is the temperature sensor measurement from the CPU, for each core. Data is sampled every 100 ms, i.e. 10 samples per second, and each data point has a timestamp.



Tools used:

- Django 2.1.13 and Python 3.7.4 on Docker images for the back end.
- The front end is built using JavaScript and libraries like dc.js and d3.js.

When the CPU load goes up, we can see the data update on the dashboard. Some points to consider when building a visualisation that updates this frequently include:

1) Data size and network distance: It is a good idea to check the impact on the network and whether the network can handle it. Data is streamed in small units, but 4 TB per 24 hours translates to roughly 280 to 370 Mbps sustained (4e12 bytes x 8 bits / 86,400 s is about 370 Mbps). Would the network be able to handle this if there are multiple data sources? As the distance to stream over gets longer, latency will start to show.

2) Continuous connection: Streaming response means a continuous connection is maintained between the client and server.

3) Data storage: Here the data is continuously sent to the client as and when it becomes available; this depends on the data type. Dashboards often fetch data from a database and update the visualisation every 5 or 10 seconds. If a database is involved, the polling interval puts that load on the database, and fetching data from the database every time can add disk latency. So although the data can be saved to disk asynchronously, here it is simply streamed back to the client.
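As a hedged sketch, a Django view that streams per-core CPU temperature readings every 100 ms could look like the following. psutil's sensors_temperatures() is used here purely as an illustration; the actual sensor source and payload format in the project may differ.

import json
import time

import psutil
from django.http import StreamingHttpResponse

def _temperature_stream():
    while True:
        readings = psutil.sensors_temperatures().get("coretemp", [])
        payload = {
            "timestamp": time.time(),
            "temps": [r.current for r in readings],   # one reading per core/sensor
        }
        yield json.dumps(payload) + "\n"
        time.sleep(0.1)                               # 100 ms sampling interval

def cpu_temperature_view(request):
    # Keeps the connection open and sends a line of JSON every 100 ms.
    return StreamingHttpResponse(_temperature_stream(),
                                 content_type="application/x-ndjson")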

Tuesday, 17 September 2019

Upgrading to Python 3.7.4 + Django 2.1.12 and moving all microservices to Docker

Here we look at two major updates to my web application.

1. Upgrading to Python 3.7.4 and Django 2.1.12.
2. Moving microservices to Docker.

1. Upgrading to Python 3.7.4 and Django 2.1.12

We will look into the prerequisites and the steps to make the upgrade in a project. The previous Python version was 3.5.3. The release notes, changelog and new features are linked below. Python 3.7.4 is not yet available on Ubuntu 18.04 and was installed from source. In addition to all the major new features and the extended lifespan of the project, Python 3.7.4/3.7.8 is to Python 3 what 2.7 was to Python 2.

Python 3.7.4 release
Release schedule for Python 3.7.4
What's new in Python 3.7.4
Changelog

Django 2.1.12 release notes

Prerequisites


1) Tests: You should have reliable tests for the project. If there are no tests, upgrading any language or software will be difficult.

2) Read the changelog and release notes: These help to gauge and estimate the effort required. The objective is to identify changes that have an impact, especially backwards-incompatible ones. The Python version chosen also depends on the libraries used in the project and the versions they support.

Django 2.1.12 has changes that had an impact here. Examples are the new model view permission created on db migrations, the database router allow_relation method and the contrib.auth views. These were in the release notes and did show up as expected during the upgrade; a generic router sketch is shown below.
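As a generic illustration (not the project's actual router), a database router with an allow_relation method looks roughly like this; the database aliases are placeholders.

class PrimaryReplicaRouter:
    """Generic example router; 'default' and 'replica' are placeholder aliases."""

    def db_for_read(self, model, **hints):
        return "replica"

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Allow relations when both objects live in one of the known databases.
        db_set = {"default", "replica"}
        if obj1._state.db in db_set and obj2._state.db in db_set:
            return True
        return None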

Steps 

1. Build Python 3.7.4 from source. Install to a specific location. For steps see this. Here it was built with --enable-optimizations, --enable-shared and LDFLAGS for the target system.



2. Run the new Python interpreter and check the version. Also run ldd on the Python executable to make sure it resolves the libpython shared object from the correct location.

3. Identify the Python version requirements of the third-party libraries in the pip requirements file. Create a Python 3.7.4 virtual environment and install the requirements. One of the issues encountered was that numpy 1.14.0 does not have Python 3.7 wheels; similar version upgrades for celery, kombu and scikit-learn were also necessary. This takes care of the requirement changes for the Python virtual environment.

4. Add Django 2.1.12 to the pip requirements.

5. Run the tests in the project.
    5a) Identify each failure in the tests. Some of the breaks encountered were scikit-learn deprecations (like cross_validation, and a serialization incompatibility in the joblib module used for the machine learning model store), Django auth app migrations, db routers, the djangorestframework upgrade to 3.10.3 and password security rules in Django.
    5b) Change the code to use new features, move away from deprecated methods and changed method signatures, and fix the failures encountered.
    5c) Repeat until all tests pass.

    Coverage can also be used during testing.


6. Once all tests pass, modify the Dockerfile for Django to build Python from source and set up the Python virtual environment. Check inside the container that it uses the correct interpreter and environment.

The docker image is now ready with the required Python version and requirements.

2. Microservices on Docker Swarm

The web application has 10 services:

1) Web app: the Django app running under uWSGI. It contains 6 Django apps, each of which can be run alone or in groups within a web app service.

2) Memcached: minimises database hits and improves overall web app performance. Three instance types for three item sizes are used.

3) Celery workers: These run long-running tasks asynchronously.

4) Celery beat: This is for initiating periodic tasks.

5) Flower: a tool used to monitor Celery.

6) Rabbitmq: Message broker for celery.

7) Loadbalancer: nginx sitting in front of the web app service

8) Static server: Serves assets like js, css and static content.

9) Media server: Serves media that users have uploaded. Examples are avatar pictures, machine learning models, out of bag data and the like.

10) Database: Two Postgres instances

Of these, all except the database have been moved to Docker. The reasons for not moving the database services are similar to those listed below. Also, if a database needs to be scaled, the database itself provides options for that. Scaling the services individually using containers, and the database using its own options, keeps things predictable.

The directory structure looks like this:


Starting the application involves initialising a Docker swarm and then deploying the stack on it. The two commands are:

$ docker swarm init --advertise-addr <network>


$ docker stack deploy -c <the-stack.yml file> <a name for your application>


Check the services that make up the stack with:

$ docker stack services <name of your application>


As mentioned in the previous post, scaling any individual service involves increasing the replicas count in the stack's yml file and redeploying the stack.

Saturday, 31 August 2019

Asset & media serving with Docker Swarm | JMeter results before and after scaling

Here we look at web application asset and media serving with a Docker swarm. Assets include JavaScript, CSS, images and other files required by the web application. Media means files uploaded by users; this includes trained machine learning model files and out-of-bag data samples from this post.

Previously, assets and media were served by nginx on separate virtual machines, and scaling meant adding a new virtual machine to the group. Now these services have been moved to the swarm and are orchestrated with a docker-compose yml file, so content distribution can be scaled faster. The whole application stack will move to this swarm one by one. All required Docker images are built and stored locally. For more on using Docker swarms see this official Docker page.

The changes to the architecture are highlighted in orange. There is a loadbalancer container, multiple (8) load-balanced asset service containers and multiple (8) load-balanced media service containers. As more capacity (number of replicas) is added to each of the services, the Docker stack just needs to be redeployed.

* JMeter test results before (4 replicas) and after scaling the services (8 replicas) are given at the end of this post. Note: JMeter is set to follow and access embedded (media/asset) links on each page.


Steps

1. Create docker images for each component and verify that the containers work as expected.

The loadbalancer, asset and media services are based on the nginx Docker image. How to set up, build and test nginx Docker images can be found here. For this web application, each service container is configured as needed with separate nginx conf files.

2. Create the docker-compose yaml file which will orchestrate the Docker services.

Detailed steps on how to do this are here. This is essentially a file that lists the Docker services that make up the application stack. Here loadbalancer, asset serving and media serving are the services for the web application. Each service is configurable with the number of replicas (load-balanced containers) and resource constraints per container. Each load-balanced service is exposed via ports on the host machine; these ports should be made available to downstream clients as needed. Here only the loadbalancer is exposed, on port 5000. The Docker folder structure is as follows.



3. Deploying the stack:

$ docker stack deploy -c docker-compose.yml hud-cdn



4. Verifying services


5. Verifying the application by accessing it via the exposed port on the Docker host.



6. JMeter load test results before and after scaling

6a. With 4 replicas each for the asset service and the media service, throughput is 156 requests/sec.



6b. With 8 replicas each for the asset service and the media service, throughput is 428 requests/sec.