Thursday 28 June 2018

Load testing and insights

In this post the visualisation/ML web application is load tested with JMeter. More about the application here. The setup is described in the following video:



The application architecture is here

The first run load tests a single deployment cell, i.e. only one application server in the pool; the rest of the architecture remains the same. In the second run, two deployment cells are load balanced and JMeter is pointed at the load balancer.

In both runs the application is tested with a load slightly higher than the expected load. In addition to monitoring and understanding server resources and the application stack under load, we can also observe expected and unexpected application behaviour. As we will see in this post, monitoring the application logs during such tests can also help identify and address blind spots.

Topology

#Load balancer: 1 Nginx deployment
#Web app servers: 2 VMs
#Database hosts: 2 VMs
#Media and assets hosts: 1 VM
#Memcached hosts: 2
#Celery hosts: 3 (2 shared with the Memcached hosts)

JMeter test details

#Users (threads): 100
#Ramp-up period: 10 seconds
#Loop count: 4
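A quick way to see what these settings mean: JMeter starts the threads of a thread group evenly across the ramp-up period, so 100 users over 10 seconds means a new user roughly every 0.1 seconds, and each user runs the whole test plan 4 times. The sketch below is plain Python arithmetic, not JMeter code; `SAMPLERS_PER_LOOP` is a hypothetical placeholder because the exact number of page requests per loop is not listed here.

```python
from __future__ import annotations

USERS = 100
RAMP_UP_SECONDS = 10
LOOPS = 4
# Hypothetical placeholder: the real number of HTTP samplers (page requests)
# per loop in this test plan is not stated.
SAMPLERS_PER_LOOP = 10


def thread_start_offsets(users: int, ramp_up_s: float) -> list[float]:
    """Seconds after test start at which each thread is launched.

    JMeter spreads thread starts evenly: each thread begins
    ramp_up / users seconds after the previous one.
    """
    delay = ramp_up_s / users
    return [i * delay for i in range(users)]


def total_samples(users: int, loops: int, samplers_per_loop: int) -> int:
    """Every thread runs every sampler once per loop iteration."""
    return users * loops * samplers_per_loop


if __name__ == "__main__":
    offsets = thread_start_offsets(USERS, RAMP_UP_SECONDS)
    print(f"new thread every {offsets[1]:.1f} s, last one starts at {offsets[-1]:.1f} s")
    print(f"total samples: {total_samples(USERS, LOOPS, SAMPLERS_PER_LOOP)}")
```

With the placeholder of 10 samplers per loop this works out to 4000 samples in total; substitute the real sampler count from the test plan.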

Listeners

Graph Results: gives an overall throughput number.
Response Time Graph: plots the response time of each sample label (one per app in this setup). This helps compare apps, for example the Equake app and the WordCloud app in the project.
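The numbers behind these listeners can also be recomputed from a saved results file. A minimal sketch, assuming the results were written as CSV with JMeter's `timeStamp` (epoch milliseconds), `elapsed` (milliseconds) and `label` columns; the sample rows and labels below are made up for illustration. Throughput follows JMeter's definition: the number of samples divided by the time from the start of the first sample to the end of the last one.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; labels are hypothetical.
SAMPLE = """timeStamp,elapsed,label
0,1000,equake-home
500,4000,equake-weekly
1000,200,wordcloud
59000,1000,wordcloud
"""


def summarise(jtl_csv: str) -> tuple[float, dict[str, tuple[float, int]]]:
    """Return (throughput per minute, {label: (mean ms, max ms)})."""
    rows = list(csv.DictReader(io.StringIO(jtl_csv)))
    starts = [int(r["timeStamp"]) for r in rows]
    ends = [int(r["timeStamp"]) + int(r["elapsed"]) for r in rows]
    # Window runs from the first sample's start to the last sample's end.
    span_minutes = (max(ends) - min(starts)) / 60_000
    throughput = len(rows) / span_minutes

    # Group response times by label, as the Response Time Graph does.
    by_label: dict[str, list[int]] = defaultdict(list)
    for r in rows:
        by_label[r["label"]].append(int(r["elapsed"]))
    per_label = {label: (mean(t), max(t)) for label, t in by_label.items()}
    return throughput, per_label


if __name__ == "__main__":
    tp, per_label = summarise(SAMPLE)
    print(f"throughput: {tp:.1f} samples/min")
    for label, (avg_ms, max_ms) in per_label.items():
        print(f"{label}: mean {avg_ms} ms, max {max_ms} ms")
```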

Each of the application pages is accessed twice as many times as the number of pages in each loop.



In the Django project, the Equake app is heavier than the others. It uses Leaflet for map tiles, calls remote external REST APIs and caches the results. A few of its requests (the ones that trigger cache misses) are expected to be slow. The timeout for accessing external APIs is set to 4 seconds for the Equake app. The load test will reveal the impact the Equake app has on server resources and on other apps like WordCloud, tax statistics and Dota2.
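The caching described here follows the cache-aside pattern: look in the cache first and only call the remote API on a miss, with the 4-second timeout bounding how long a miss can take. A sketch under stated assumptions: `TTLCache` is a hypothetical in-process stand-in for Memcached with the same `get(key)` / `set(key, value, timeout)` shape as Django's cache API, and any key or URL passed in is a placeholder, not the app's actual one.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional, Tuple

EXTERNAL_API_TIMEOUT_S = 4  # the 4-second timeout described above


def http_get_json(url: str) -> Any:
    """Fetch JSON from a remote REST API, giving up after 4 seconds."""
    with urllib.request.urlopen(url, timeout=EXTERNAL_API_TIMEOUT_S) as resp:
        return json.load(resp)


class TTLCache:
    """Hypothetical in-process stand-in for Memcached, for illustration only."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: Any, timeout: float) -> None:
        self._store[key] = (time.monotonic() + timeout, value)


def cached_fetch(cache, key: str, url: str, ttl: float = 300,
                 fetch: Callable[[str], Any] = http_get_json) -> Any:
    """Cache-aside: serve from the cache, else fetch remotely and store the result."""
    data = cache.get(key)
    if data is None:  # cache miss: this is the slow request
        data = fetch(url)
        cache.set(key, data, timeout=ttl)
    return data
```

Only the first request for a key pays the remote API cost. If the cache never actually stores the value, every request takes that slow path.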

Results

Without load balancing

Test time: 30 mins
Throughput: 922.498 requests per minute
Response time graph is shown below



As expected, the Equake app home page for the live earthquake view (yellow) has spikes. Other Equake app pages, like the monthly, weekly and daily earthquake views, time out at 4 seconds. The rest of the apps also show high response times alongside Equake, but not as high.

With load balancing

Test time: 5 min 22 seconds
Throughput: 1707.32 requests per minute
Response time graph is shown below



All apps except Equake drop to response times at or below 200-300 milliseconds. The Equake pages that access external APIs time out at 4 seconds. The maximum response time of the Equake home page (live earthquake view) is about 17 seconds, compared to 1.5 minutes without load balancing.

Insights

1) Throughput increases by a factor of about 1.85 (from 922.498 to 1707.32 requests per minute), close to 2.

2) All apps except the Equake app behave as expected under load. They start off with a response time of close to 1 second and quickly drop to <= 200 milliseconds. Caching also works predictably, as seen in the application logs.

3) So what does the load test reveal about the Equake app?

The remote external REST API has a high response time, but since caching is enabled, Equake's response times should drop like those of the other apps. The application logs, however, reveal cache misses for the fetched external REST API data. The data was being fetched, but the monthly and weekly earthquake GeoJSON data are too large: around 5-7 MB when saved to a file. The default maximum item size in Memcached is 1 MB, so these entries were silently discarded, causing the entire data set to be fetched again.

The Memcached item size limit can be increased with the -I flag. However, this means that ~7-8 MB of data will be fetched from the cache each time, and the size of this data also varies: for example, 5 MB for this week's Equake data and 8 MB for next week's. It would be better to use a dedicated cache with a larger item size limit. Memcached pools with different item sizes can be set up: regular apps can use the default pool, and apps that need larger entries can use the other pool. Another approach would be to host the Equake app separately on dedicated app servers. Yet another approach would be to chunk the data and store it in the regular Memcached pool.
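The chunking approach can be sketched as follows, assuming a cache object with Django-style `get(key)` / `set(key, value, timeout)` methods. The key naming scheme (`<key>:chunk:<n>` plus a `<key>:chunks` count) and the 900 KB chunk size are hypothetical choices, picked to stay under Memcached's default 1 MB item limit with some headroom for the key and per-item overhead.

```python
import pickle
from typing import Any, Optional

# Hypothetical chunk size: below Memcached's default 1 MB item limit,
# leaving headroom for the key and per-item overhead.
CHUNK_BYTES = 900 * 1024


def set_chunked(cache, key: str, value: Any, timeout: int = 300) -> int:
    """Store a large value as several sub-1 MB chunks plus a chunk-count entry."""
    blob = pickle.dumps(value)
    chunks = [blob[i:i + CHUNK_BYTES] for i in range(0, len(blob), CHUNK_BYTES)]
    for n, chunk in enumerate(chunks):
        cache.set(f"{key}:chunk:{n}", chunk, timeout)
    # The count entry tells readers how many chunks to read back.
    cache.set(f"{key}:chunks", len(chunks), timeout)
    return len(chunks)


def get_chunked(cache, key: str) -> Optional[Any]:
    """Reassemble a chunked value; any missing chunk counts as a cache miss."""
    count = cache.get(f"{key}:chunks")
    if count is None:
        return None
    parts = [cache.get(f"{key}:chunk:{n}") for n in range(count)]
    if any(p is None for p in parts):
        return None  # a chunk was evicted: refetch the whole value
    return pickle.loads(b"".join(parts))
```

Design note: Memcached can evict each chunk independently, so a single missing chunk is treated as a miss for the whole value. The sketch also does not guard against two writers updating the same key at once. With Django, `cache.get_many` / `cache.set_many` would cut the number of round trips per value.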