Saturday 30 November 2019

Deep Learning | Tensor flow | GPU (cuda/cuDNN) Vs CPU w and w/o EarlyStopping

In the previous post we built a neural net and tuned its hyper parameters using GridSearchCV. Here we look at training the same network on a GPU and compare that with training on a CPU.

Tools used
Tensorflow-gpu: 2.0.0
Keras version 2.3.1
Pandas version 0.25.3
Scikitlearn version 0.21.3

Nvidia GTX 960M 2GB
Intel i7 6700HQ 16 GB

Now, the number of epochs needed was tuned separately last time. The loss/accuracy Vs epochs curves raise the question of whether that many epochs are needed for this network and data. This has an impact on the time needed to train the network. Given the low end GPU the numbers are as expected.

Could we train the network in a lower duration with an acceptable loss of accuracy?

Early stopping is used here to answer this question. Results are shown below. Instead of going through 55 epochs, training stops at around 17-27 epochs, when the loss cannot be minimised beyond a certain point. The early stopping parameters used are shown below.
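The stopping rule itself can be sketched in plain Python. This is only an illustration of a patience-based rule and not the configuration used in this post: the function name, `patience`, `min_delta` and the loss values are all assumptions made for the example. Keras offers a comparable rule through its `EarlyStopping` callback.

```python
def early_stopping_epoch(losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the monitored loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. If that never
    happens, the number of epochs supplied is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss > min_delta:
            # Meaningful improvement: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(losses)


# Made-up validation losses: improvement stalls after epoch 4.
example_losses = [0.60, 0.45, 0.38, 0.35, 0.35, 0.36, 0.35, 0.34]
stop_at = early_stopping_epoch(example_losses, patience=3, min_delta=0.001)
print(stop_at)
```

A larger `patience` tolerates longer plateaus at the cost of extra epochs, which is the trade-off between wall time and accuracy examined in the results.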

Hyper parameter grid is small to begin with

Results (2 parameters and 3 values each)

Run                  Wall time      Accuracy   Learning rate   Momentum
CPU                  1 min 28 sec   0.978685   0.03            0.43
CPU Early Stopping   35.8 sec       0.973357   0.024           0.39
GPU                  9 min 1 sec    0.978685   0.024           0.41
GPU Early Stopping   2 min 54 sec   0.971580   0.024           0.41
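The trade-off in these numbers can be worked out with a few lines of arithmetic. The sketch below uses only the wall times and accuracies reported above; the helper names are made up for the example.

```python
def to_seconds(minutes, seconds):
    """Convert a 'min sec' wall time into seconds."""
    return minutes * 60 + seconds


def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# Wall times and accuracies from the results above.
wall_times = {
    "CPU": to_seconds(1, 28),
    "CPU Early Stopping": to_seconds(0, 35.8),
    "GPU": to_seconds(9, 1),
    "GPU Early Stopping": to_seconds(2, 54),
}
accuracy = {
    "CPU": 0.978685,
    "CPU Early Stopping": 0.973357,
    "GPU": 0.978685,
    "GPU Early Stopping": 0.971580,
}

cpu_saving = reduction_percent(wall_times["CPU"], wall_times["CPU Early Stopping"])
gpu_saving = reduction_percent(wall_times["GPU"], wall_times["GPU Early Stopping"])
cpu_accuracy_drop = accuracy["CPU"] - accuracy["CPU Early Stopping"]
gpu_accuracy_drop = accuracy["GPU"] - accuracy["GPU Early Stopping"]

print(f"CPU: {cpu_saving:.1f}% less wall time, accuracy drop {cpu_accuracy_drop:.4f}")
print(f"GPU: {gpu_saving:.1f}% less wall time, accuracy drop {gpu_accuracy_drop:.4f}")
```

On these figures early stopping cuts wall time by roughly 59% on the CPU and 68% on the GPU, for an accuracy drop of under one percentage point in both cases.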

Results (2 parameters 5 values each)

Parameter grid is modified with additional ranges.

Run                  Wall time      Accuracy   Learning rate   Momentum
CPU Early Stopping   1 min 35 sec   0.9822     0.03            0.39
GPU Early Stopping   9 min 37 sec   0.9751     0.033           0.41

Tuesday 19 November 2019

Deep learning | Tensor Flow | Building a neural net *Updated with hyper parameter tuning

Here we look at training a neural network for classifying UCI cancer data as benign or malignant. The project has multiple machine learning models for this purpose and is described here. In this post a neural network is built using Keras with a TensorFlow backend. Scikit-learn is used for pre-processing data. The final network with hyper parameter tuning looks like this.

Initial hyper parameters used:

1) Visualise the data: There are 36 dimensions of which 10 are not contributing to the prediction. A visualisation is already in the project here. Those 10 dimensions are removed to decrease noise.

2) Pre-process data: The data distribution is examined by checking the value distribution of each dimension. The objective here is to decide how to pre-process the data. Neural network layers have activation functions, and common ones like relu work as max(0, value). Also, in each epoch the weights are tweaked by a small amount, so if the input is not normalized it can affect the learning. Here the data is normalized using MinMax scaler so that each dimension is between 0 and 1. The medical data lends itself to this better. Not every dimension had the following type of distribution. The figure is for cell texture.
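For reference, min-max scaling maps each value x of a dimension to (x - min) / (max - min). The sketch below shows the calculation in plain Python for one dimension; the function name and sample values are made up for illustration, and in the project the scaling is done with scikit-learn's MinMax scaler rather than this code.

```python
def min_max_scale(values):
    """Scale one dimension to the range 0..1 using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant dimension carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up values for a single dimension.
texture = [9.71, 14.2, 18.9, 27.5, 39.28]
scaled = min_max_scale(texture)
print([round(v, 3) for v in scaled])
```

After scaling, the smallest value in the dimension becomes 0 and the largest becomes 1, so no single dimension dominates the weight updates purely because of its units.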

3) Build the layers (dropouts, layer count and neurons in each layer): The decision of how many layers and how many neurons on each layer is mostly based on accuracy/loss plots for fit (see end of post) and visualisation in step 1. The model looks like this.

4) Parameter tuning: Using GridSearchCV the following parameters and hyper parameters were tuned:

a) Optimizer: SGD was selected after comparison with RMSprop and Adam.

b) Epochs: 50-55 was selected after GridSearchCV and Stratified k-fold cross validation.

c) Batch size: Initially 32 but 5 was chosen after GridSearchCV with 30, 35, 40 among other options.

d) Learning rate: Initially kept at 0.027 during cross validation. This was changed after looking at loss and accuracy curves from initial training. The final value from GridSearchCV is 0.024.

e) Momentum: 0.44 initially tweaked on loss and accuracy curve plots. Final value 0.41.
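The idea behind the grid search can be sketched without any library: try every combination of candidate values and keep the best scoring one. The code below is a simplified illustration, not the GridSearchCV setup used here. The grid values are assumptions, and `toy_score` is a hypothetical stand-in for the cross-validated accuracy that GridSearchCV would compute by training the network.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively score every combination in param_grid, return the best."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(learning_rate, momentum):
    # Hypothetical scoring function that peaks at (0.024, 0.41).
    # A real search would train and cross-validate the model here.
    return 1.0 - abs(learning_rate - 0.024) - abs(momentum - 0.41)


# Assumed grid: 2 parameters with 3 candidate values each (9 combinations).
param_grid = {
    "learning_rate": [0.024, 0.027, 0.03],
    "momentum": [0.39, 0.41, 0.43],
}
best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)
```

The number of combinations is the product of the list lengths, which is why widening each range from 3 to 5 values raises the work from 9 to 25 fits per cross-validation fold.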

A screen of the grid search with best params

Number of intermediate layers, layer neuron counts and dropouts in each layer were decided based on fit from accuracy and loss curves during training. Also each dropout layer has a different dropout ratio. 

Avoiding over fitting, under fitting and unknown fitting: An unnecessarily high epoch count resulted in over fitting, while a lower epoch count gave under fitting. The epoch count was increased after the intermediate layers with more neurons were added. This resulted in an acceptable fit in loss function behaviour. An example of over fitting is shown below. Notice that the validation error goes up while the training error stays low towards the right.


Sunday 6 October 2019

Streaming sensor data to dashboard | 100 ms sampling

In this post we look at adding streaming sensor data visualisation to the dashboard. The data we look at is the temperature sensor measurements from the CPU, for each core. Data is sampled at 100 ms, i.e. 10 samples per second. Each data point has a time stamp.
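A sampling loop of this kind could look roughly like the sketch below. It is a minimal illustration under stated assumptions: `read_core_temperatures` is a hypothetical placeholder that returns fixed values instead of reading real sensors, and the JSON field names are made up for the example.

```python
import json
import time
from datetime import datetime, timezone


def read_core_temperatures():
    """Hypothetical sensor read; a real version would query the hardware."""
    return {"core0": 45.0, "core1": 47.0}


def sample_stream(read_fn, interval_s=0.1, max_samples=None):
    """Yield one time-stamped JSON line per sample, every interval_s seconds."""
    sent = 0
    while max_samples is None or sent < max_samples:
        sample = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "cores": read_fn(),
        }
        yield json.dumps(sample) + "\n"
        sent += 1
        time.sleep(interval_s)


# Collect three samples at the 100 ms interval used in this post.
lines = list(sample_stream(read_core_temperatures, interval_s=0.1, max_samples=3))
print(lines[0].strip())
```

Because the function is a generator, it could in principle be handed to Django's `StreamingHttpResponse`, which accepts an iterator and sends each chunk as it is produced.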

Tools used:

- Django 2.1.13 and Python 3.7.4 on docker images for back end.
- Front end is built using JavaScript and libraries like dc.js and d3.js

When the CPU load goes up we can see the data updated on the dashboard. Some points to consider when building a visualisation that updates this frequently include:

1) Data size and network distance: It is a good idea to check the impact on the network and whether the network is capable of carrying the load. Data is streamed in small units, but 4 TB per 24 hours translates to roughly 370 Mbps. Would the network be able to handle this if there are multiple sources of data? As the distance to stream over gets longer, latency will start to show.

2) Continuous connection: Streaming response means a continuous connection is maintained between the client and server.

3) Data storage: Here the data is continuously sent to the client as and when it becomes available. This depends on the data type. At times dashboards fetch data from a database and update the visualisation every 5 seconds or 10 seconds. If the database is involved then the polling interval will affect the database with that load. Also, fetching data from the database every time can include disk latency. So although the data can be saved to disk asynchronously, here it is just streamed back to the client.
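The bandwidth figure in point 1 can be checked with a short calculation. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second), and the number of sources in the second part is an arbitrary example.

```python
def average_mbps(bytes_per_day):
    """Average bit rate in Mbps needed to move bytes_per_day in 24 hours."""
    seconds_per_day = 24 * 60 * 60
    return bytes_per_day * 8 / seconds_per_day / 1_000_000


four_tb = 4 * 10**12  # 4 TB in bytes, decimal units
rate = average_mbps(four_tb)
print(f"{rate:.1f} Mbps for one source")

# With several sources the required capacity grows linearly.
sources = 3  # arbitrary example
print(f"{rate * sources:.1f} Mbps for {sources} sources")
```

This is an average rate; bursts above it, protocol overhead and other traffic on the link would need additional headroom.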

Tuesday 17 September 2019

Upgrading to Python 3.7.4 + Django 2.1.12 and moving all micro services to Docker

Here we look at two major updates to my web application.

1. Upgrading to Python 3.7.4 and Django 2.1.12.
2. Move micro services to docker.

1. Upgrading to Python 3.7.4 and Django 2.1.12

We will look into the prerequisites and steps to make the upgrade in a project. The Python version was 3.5.3. Release notes, change logs and new features are linked below. 3.7.4 is not yet available on Ubuntu 18.04 and was installed from source. In addition to all the new major features and extending the lifespan of the project, Python 3.7.4/3.7.8 is to Python 3 what 2.7 was to Python 2.

Python 3.7.4 release
Release schedule for Python 3.7.4
What's new in Python 3.7.4

Django 2.1.12 release notes


Prerequisites

1) Tests: You should have reliable tests for the project. If there are no tests for a project, then upgrading any language or software will be difficult.

2) Read change log and release notes: These help to get an idea of effort required and estimation. The objective here is to identify changes that have impacts and backward incompatible changes. Python version chosen also depends on supported libraries used in project. 

Django 2.1.12 has changes that had an impact. Examples are the new model view permission created on db migrations, the database router allow_relation method and the contrib.auth views. These were in the release notes and did show up as expected during the upgrade.


Steps

1) Build Python 3.7.4 from source and install it to a specific location. For steps see this. Here it was built with --enable-optimizations, --enable-shared and LDFLAGS for the target system.

2) Run the new Python interpreter and check the version. Also run ldd on the Python executable to make sure it is looking at the correct place for the libpython .so file.

3) Identify the Python version requirements of third party libraries in the pip requirements file. Create a Python 3.7.4 virtual environment and install the requirements. One of the issues encountered was that numpy 1.14.0 does not have Python 3.7 wheels. Similar version upgrades for celery, kombu and scikit-learn were necessary and were also applied. This takes care of changes to requirements for the Python virtual environment.

4) Add Django 2.1.12 to pip requirements.

5) Run tests in project.
    5 a) Identify each failure in tests. Some of the breaks encountered were scikit-learn deprecated methods (like cross_validation), serialization incompatibility in the machine learning model store (joblib module), Django auth app migrations, db routers, the djangorestframework upgrade to 3.10.3 and password security rules in Django.
    5 b) Change the code to use new features, move away from deprecated methods and method signatures, and fix the failures encountered.
    5 c) Do this until all tests pass.

    Coverage can also be used during test.

6) Once all tests pass modify Dockerfile for Django to build Python from source and set Python virtual environment. Check in container that it is using the correct interpreter and environment.

The docker image is now ready with the required Python version and requirements.
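The interpreter checks in steps 2 and 6 can also be done from inside Python. The following is a small sketch using only the standard library; the function name is made up, the `in_virtualenv` test assumes an environment created with the standard `venv` module, and `LDLIBRARY` may be empty or missing on some platforms.

```python
import sys
import sysconfig


def interpreter_report():
    """Collect basic facts about the interpreter that is actually running."""
    return {
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "executable": sys.executable,
        # In a venv, sys.prefix points at the environment, base_prefix at the install.
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        # Name of the libpython library this interpreter was built against.
        "libpython": sysconfig.get_config_var("LDLIBRARY"),
    }


report = interpreter_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this inside the container gives a quick confirmation that the expected version and virtual environment are in use; it complements rather than replaces the ldd check on the executable.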

2. Micro services on docker swarm

The web application has 10 services. These are as follows

1) web app: This is the Django app running under uWSGI. Within each of these there are 6 Django apps, each of which can be run alone or in groups within a web app service.

2) Memcache: This helps to minimise database hits and improve overall web app performance. Three instance types for three item sizes are used.

3) Celery workers: These do the long running tasks asynchronously.

4) Celery beat: This is for initiating periodic tasks.

5) Flower: a tool used to monitor celery

6) Rabbitmq: Message broker for celery.

7) Loadbalancer: nginx sitting in front of the web app service

8) Static server: Serves assets like js, css and static content.

9) Media server: Serves media that users have uploaded. Examples are avatar pictures, machine learning models, out of bag data and the like.

10) Database: Two Postgres instances

Of these, all except the database have been moved to docker. The reasons for not moving the database services are similar to those listed below.
Also, if a database needs to be scaled, there are options available in the database that can be used. Scaling the services individually using containers and the database using its own options keeps things predictable.

Directory structure looks like

Starting the application involves initiating a docker swarm and then deploying the stack on it. The two commands are

$ docker swarm init --advertise-addr <network>

$ docker stack deploy -c <the-stack.yml file> <a name for your application>

Checking the services that make up the stack with

$ docker stack services <name of your application>

As mentioned in the previous post, scaling any individual service involves increasing replicas count in the stack's yml file and redeploying the stack.
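To make the scaling step concrete, the sketch below models the relevant part of a stack definition as a Python dictionary and raises the replica count of one service. It is an illustration only: the service names and image names are hypothetical, and just the `deploy` / `replicas` keys follow the compose file format. The real change is made by editing the yml file and redeploying the stack.

```python
import copy
import json

# Hypothetical fragment of a stack definition, mirrored as a dictionary.
stack = {
    "version": "3.7",
    "services": {
        "web": {"image": "example/web:latest", "deploy": {"replicas": 2}},
        "celery_worker": {"image": "example/worker:latest", "deploy": {"replicas": 1}},
    },
}


def scale_service(stack_def, service, replicas):
    """Return a copy of the stack definition with one service rescaled."""
    scaled = copy.deepcopy(stack_def)
    scaled["services"][service].setdefault("deploy", {})["replicas"] = replicas
    return scaled


scaled_stack = scale_service(stack, "web", 4)
print(json.dumps(scaled_stack["services"]["web"], indent=2))
```

Only the `replicas` value of the chosen service changes; the other services keep their counts, which is what makes scaling each service independently straightforward.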

Saturday 31 August 2019

Asset & media serving with Docker Swarm | Jmeter results before and after scaling

Here we look at web application asset and media serving with a docker swarm. Assets include JavaScript, css, images and files required by the web application. Media means files that are uploaded by users. This media includes trained machine learning model files and out of bag data samples from this post.

Previously, assets and media were also served by nginx on separate virtual machines. Scaling meant adding a new virtual machine to the group. Now these services have been moved to a swarm and are orchestrated with a docker compose yml file. This way content distribution can be scaled faster. The whole application stack will move one-by-one to this swarm. All required docker images are built and stored locally. For more on using docker swarms see this official docker page.

The changes to architecture are highlighted in orange. There is a loadbalancer container, multiple (8) load balanced asset service containers and multiple (8) load balanced media service containers. As more capacity (# of replicas) is added to each of the services, the docker stack just needs to be updated. 

* Jmeter test results before (4 replicas) and after scaling the services (8 replicas) are given at the end of this post. Note: Jmeter is set to follow and access embedded (media/asset) links on each page.


1. Create docker images for each component and verify that the containers work as expected.

Loadbalancer, Asset and Media services are based on nginx docker image. How to setup, build and test nginx docker images can be found here. For this web application, each service container is configured as needed with separate nginx conf files.

2. Create docker compose yaml file which will orchestrate the docker services. 

Detailed steps on how to do this are here. This is essentially a file that lists out docker services that make up the application stack. Here loadbalancer, asset serving and media serving are services for the web application. Each service is configurable with the number of replicas (load balanced containers) and resource constraints per container. Each load balanced service is exposed via ports on the host machine. These ports as needed should be made available for access from downstream clients. Here only the loadbalancer is exposed over port 5000. The docker folder structure is as follows

3. Deploying the stack:

$ docker stack deploy -c docker-compose.yml hud-cdn

4. Verifying services

5. Verifying the application by accessing via exposed port on docker host.

6. Jmeter load test results before and after scaling

6a. with 4 replicas each for asset service and media service, throughput is 156/sec

6b. with 8 replicas each for asset service and media service, throughput is 428/sec
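The effect of doubling the replicas can be quantified from the two throughput figures above. This is plain arithmetic on the reported numbers; the variable names are made up for the example.

```python
replicas_before, throughput_before = 4, 156  # requests per second
replicas_after, throughput_after = 8, 428    # requests per second

speedup = throughput_after / throughput_before
per_replica_before = throughput_before / replicas_before
per_replica_after = throughput_after / replicas_after

print(f"Throughput increased {speedup:.2f}x for a 2x increase in replicas")
print(f"Per replica: {per_replica_before:.1f}/sec before, {per_replica_after:.1f}/sec after")
```

Throughput grew by about 2.74x, more than the 2x increase in replicas, so throughput per replica also went up in this test.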

Tuesday 19 March 2019

RESTful APIs | Provisioning data

Topical content analysis using word counts / tag clouds is a feature in my Head-Up Data project. The tag clouds are presented in html pages using visualisation tools. The same data may need to be presented on a mobile device or may be requested by other web services. In such cases only the data is requested. How the data is presented by a third party is up to them. The web service must provision its data in formats that clients can understand and use.

Here we look at a set of REST APIs that provision page word-count data. Normally a user would navigate to the app url in a browser, look at the list of web pages, select a web page, examine the list of word counts/clouds and finally select one tag cloud. The REST APIs are designed along the same lines using Django REST framework.

First we take a quick look at the user interface for the steps mentioned. Then we look at the REST APIs to achieve the same.

Web UI presenting a list of pages.

Web interface presenting page details and selecting a cloud from its list

The REST APIs and their browse-able screens are given below. 

Having a browse-able API helps people to learn the APIs faster. Developers can copy the url endpoints to a browser and see the details for themselves.

When a mobile app or another service accesses these APIs, only the json responses are sent back.

1) The data format used is JSON.

2) APIs include

a) An API index to start with.
b) API to get the list of available Web pages.
c) API to get details of a particular web page.
d) API to get a list of all Word counts.
e) API to get a specific word count data.

3) Pagination: Each API that presents a list of data points will show only a fixed number of data points per page. The responses will have next and previous links for navigation. This is a better option than sending all the data points with each response.

4) Linked relations: A related object is always presented as a link. For example, the 'GET web page details' API has links to its word clouds, not the actual word count/cloud data. If the requesting service wants to get that word count data, it can simply go to the link. On top of saving response data length, this approach also allows the data to be cached on the client side, say on a mobile device. The url can be used as a key.
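Points 3 and 4 can be illustrated with a small sketch of what a paginated, link-based response looks like. This is not the project's code: the function, the example.com URLs and the page size are assumptions, and only the `count` / `next` / `previous` / `results` layout follows the shape that Django REST framework's page number pagination returns.

```python
def paginate(items, page, page_size, base_url):
    """Build one page of a list response with next and previous links."""
    total = len(items)
    start = (page - 1) * page_size
    end = start + page_size
    next_url = f"{base_url}?page={page + 1}" if end < total else None
    previous_url = f"{base_url}?page={page - 1}" if page > 1 else None
    return {
        "count": total,
        "next": next_url,
        "previous": previous_url,
        "results": items[start:end],
    }


# Related objects are represented by their URLs, not by embedded data.
base = "https://example.com/api/wordcounts/"  # hypothetical endpoint
word_counts = [{"url": f"{base}{i}/"} for i in range(1, 8)]

page_two = paginate(word_counts, page=2, page_size=3, base_url=base)
print(page_two["previous"], page_two["next"])
print([item["url"] for item in page_two["results"]])
```

A client can follow `next` until it is empty to walk the whole list, and can use each item's url as a cache key, as described in point 4.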

API index

API for Web Pages list. The json response also has next and previous links to navigate the list.

API for a single web page. This has links to the page's word clouds.

API for a specific word count.

Wednesday 13 February 2019

Connection pooling | Database performance | pgBouncer | DB Server side Vs AppServer Side

In this iteration a database connection pooling solution is implemented. This enables the application to reuse database connections thus increasing performance at the database end. The pooling software used is pgBouncer.

> If the reader wants to compare the load test reports on pgBouncer on DB hosts Vs pgBouncer on Web app server scroll down to the end of this post.

The content of this post is discussed in the following video for those who prefer that to reading.

The changes at the db side are shown in the application architecture below. Instead of connecting directly to the database, applications connect to pgBouncer.

1) Setting up a connection consumes server resources and time. It requires a round trip to the database, authenticating a user and checking user privileges among others. Then the query is executed. Some client side frameworks hold on to a connection for a period of time and then release it, i.e. the connection is closed after use. This means that a new request with the same credentials, even from the same client, has to go through the entire process again. This can be avoided by returning the used connection to a pool and requesting new connections from a pool of active connections.

2) Each database has a configurable maximum concurrent connections limit. For Postgresql this is specified using max_connections in settings /var/lib/pgsql/data/postgresql.conf and has a default value of 100. Beyond this number connections are refused.

3) The load on the database can increase when a) the user base increases, resulting in increased activity, b) new application features are added or c) the application scales using load balanced web application servers to meet user load.

Whatever the reason, an application that used to see a maximum of 60 concurrent connections to the database ends up getting 120. Some of those extra connections end up refused at the database end. Pooling solutions also have a maximum client connection limit. The objective is to configure and use a set number of database connections efficiently. Say we have configured the database to allow 150 concurrent connections. The server may have one or more databases for different users, for example database A and B for userA and userB respectively. Also assume database A sees more hits/usage than database B. Of the total connections allowed by the database, say 10 may be needed for backend database administration and the rest are free for application servers. On pgBouncer we can set the maximum allowed connections to say 280 (greater than the database maximum) and configure pgBouncer pools for each user + database pair. In this case we can have a userA + database A pool with a pool size of 90 and a userB + database B pool with a pool size of 50, and thus regulate connections.
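The numbers in this example can be sanity-checked with a few lines. The sketch below only does the arithmetic; the function and key names are made up, and the values are the ones from the paragraph above (in pgBouncer they would correspond to settings such as `max_client_conn` and per-database `pool_size`).

```python
def check_pool_budget(db_max_connections, reserved_for_admin, pool_sizes, pgbouncer_max_clients):
    """Compare the configured pool sizes with what the database can accept."""
    available = db_max_connections - reserved_for_admin
    pooled = sum(pool_sizes.values())
    return {
        "available_for_apps": available,
        "pooled_total": pooled,
        # Pools must fit inside what the database leaves for applications.
        "fits_database_limit": pooled <= available,
        # pgBouncer may accept more clients than the database has connections.
        "clients_exceed_db_limit": pgbouncer_max_clients > db_max_connections,
    }


budget = check_pool_budget(
    db_max_connections=150,
    reserved_for_admin=10,
    pool_sizes={("userA", "database A"): 90, ("userB", "database B"): 50},
    pgbouncer_max_clients=280,
)
print(budget)
```

The two pools add up to 140, exactly the 150 allowed connections minus the 10 kept for administration, while up to 280 clients can be connected to pgBouncer and share those 140 server connections.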

Adding a pooling solution is an overhead. An immediate increase in throughput is not evident until the application goes through a situation when there is contention for database connections. In that situation instead of getting rejected by the database and failing there, the request is fulfilled by the next available connection in pool. The application is configured to minimise database accesses using caching. So throughput may not be a direct indicator of performance gains. A contention can be simulated only by reducing the cache timeouts which is not ideal. Finally, if the maximum number of connections at pgBouncer is exceeded they are still refused.
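The behaviour described here, reusing a fixed set of connections and waiting for the next available one, can be shown with a toy in-process pool. This is only a sketch of the idea: pgBouncer is a separate process speaking the Postgres protocol, and the class, the fake connection factory and the timeouts below are assumptions for illustration.

```python
import queue


class ConnectionPool:
    """Toy pool: opens `size` connections once and hands them out for reuse."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=None):
        # Waits for the next available connection; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Return the connection to the pool instead of closing it.
        self._idle.put(conn)


opened = []


def fake_connect():
    """Stand-in for an expensive database connection setup."""
    opened.append(object())
    return opened[-1]


pool = ConnectionPool(fake_connect, size=2)
for _ in range(5):
    conn = pool.acquire(timeout=1)
    pool.release(conn)

print(len(opened))
```

Five requests are served while only two connections are ever opened. If all connections are in use, `acquire` waits and eventually gives up with `queue.Empty`, which loosely mirrors a client waiting for a free connection in the pool.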

The question of putting pgBouncer on the application server hosts or on the database hosts

This depends on what works out in the end and is needed. For a small application with multiple databases we can host pgBouncer on the application server and tangibly save the round trip time to the db hosts. If there are many application servers we may be configuring large number of pooled connections at the source end and still end up with a deluge at the database end. This is like having the club bouncer stopping you at your door step because a headcount is reached at the club Vs you going to the club and getting stopped there.

Aggregate reports on load test with two runs are shown below. One with pgBouncer on DB hosts and second with pgBouncer on web server hosts. Both tests take 3-5 minutes to finish. 

Notice that with pgBouncer on the web server hosts throughput increased by ~21.4% and errors dropped to near 0%. This is due to the time saved by not having to set up a connection to a remote host each time one is needed and having only a localhost connection. However, each web server host then runs pgBouncer pools and has 100s of pooled connections ready to be used. As already mentioned, multiple web servers with pools like this can deluge the database host.

Load test results with pgBouncer on DB hosts
Load test results with pgBouncer on web server hosts