<h1>Programming Communications</h1>
<p>Harisankar Krishna Swamy (http://www.blogger.com/profile/08640482899790313232). Feed last updated 2024-03-04.</p>
<h2>Django HMAC Authentication 3.0.0 - Camellia Cipher support</h2>
<p>Published 2023-12-16</p>
<p>The django-hmac-authentication package now supports the Camellia cipher alongside the existing AES. User HMAC secrets are secured with AES or Camellia 256, with the cipher chosen at random. To use the feature, just update the package and run migrations.</p>
<pre>pip install -U django-hmac-authentication
python manage.py migrate</pre>
<p>Version: 3.0.0</p>
<p>PyPI: https://pypi.org/project/django-hmac-authentication/</p>
<p>GitHub: https://github.com/harisankar-krishna-swamy/django_hmac_authentication</p>
<h2>Django HMAC Authentication 2.0.0</h2>
<p>Published 2023-08-25, updated 2023-08-30</p>
<p>Check out the new version of Django HMAC Authentication on PyPI, with the code on GitHub. The update namespaces the settings from previous versions.</p>
<p>Built on Debian and KDE, with CI/CD on GitLab.</p>
<p>GitHub: https://github.com/harisankar-krishna-swamy/django_hmac_authentication</p>
<p>PyPI: https://pypi.org/project/django-hmac-authentication/</p>
<h2>Django HMAC Authentication</h2>
<p>Published 2023-05-21, updated 2023-05-23</p>
<p>Check out a Python package for Django HMAC authentication. Features include:</p>
<p>1. Django model with HMAC shared encrypted secret</p>
<p>2. Authentication class HMACAuthentication</p>
<p>3. 
Request timeout</p>
<p>4. Management command or a configured URL to get a key</p>
<p>5. JavaScript and Python client examples</p>
<p>Check it out on:</p>
<p>PyPI: https://pypi.org/project/django-hmac-authentication/</p>
<p>GitHub: https://github.com/harisankar-krishna-swamy</p>
<h2>Django salted api tokens</h2>
<p>Published 2023-04-11</p>
<p>Django token authentication with hashed, salted tokens.</p>
<p>Django model with token id and token protected with hash and salt.</p>
<p>Authentication class using protected API tokens.</p>
<p>Code on GitHub: https://github.com/harisankar-krishna-swamy/django_salted_api_tokens</p>
<p>PyPI: https://pypi.org/project/django-salted-api-tokens/</p>
<h2>A Django Cacheable Model</h2>
<p>Published 2023-03-19</p>
<p>A cacheable model for Django.</p>
<p>Features:</p>
<p>A generic way of creating cache keys from Django model fields</p>
<p>Retrieve Django models from the cache by field values (the cache is populated on a miss)</p>
<p>Retrieve all the model instances (suitable for a small set of models)</p>
<p><b>GitHub</b>: https://github.com/harisankar-krishna-swamy/django_cacheable_model</p>
<p><b>PyPI</b>: https://pypi.org/project/django-cacheable-model/</p>
<h2>Data structures using Python: Coils 3.0.1</h2>
<p>Published 2020-11-12</p>
<div>1) Follow Hari's Python data structure library on GitHub</div><div><br 
/></div>
<div><a href="https://github.com/harisankar-krishna-swamy/coils">https://github.com/harisankar-krishna-swamy/coils</a></div>
<div><br /></div>
<div><b>2) Install</b></div>
<div><br /></div>
<div>pip install pycoils</div>
<div><br /></div>
<div><b>3) License</b></div>
<div><br /></div>
<div>Apache2 License</div>
<div><br /></div>
<div><b>4) New Features</b></div>
<div><br /></div>
<div>11 November 2020:</div>
<div><br /></div>
<div>Bit vector data structure</div>
<div><br /></div>
<div><b>5) List of data structures</b></div>
<div><br /></div>
<div>Stack using Python list</div>
<div>Queue using Python list</div>
<div>Heap (Min &amp; Max) using Python list</div>
<div>Binary Search Tree with link inversion traversal</div>
<div>SplayTree with link inversion traversal</div>
<div>LinkedList</div>
<div>DoublyLinkedList</div>
<div>SeperateChainHashTable (3 types of chaining using LinkedList, SplayTree, BinarySearchTree)</div>
<div>DisjointSetWithUnion (uses uptree nodes and path compression)</div>
<div>PriorityQueues</div>
<div>InternStore</div>
<div>Bit Vector</div>
<div><br /></div>
<div><b>6) Examples and usage</b></div>
<div><br /></div>
<div>See the pycoils/examples package</div>
<h2>Model Store: Adding Convolutional Neural Nets</h2>
<p>Published 2020-04-30</p>
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vRGPpCsa8dCUfDrioLZIHqwez6a7-Fl329KsRIx85JT6W0VwvmIM-CTld_Y9bAkIOiH32SriZEBUsz7/embed?start=false&loop=false&delayms=3000" frameborder="0" width="480" height="299" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
<h2>Automated model training | Luigi | Python</h2>
<p>Published 2020-04-21</p>
<iframe 
src="https://docs.google.com/presentation/d/e/2PACX-1vQQDeEHUSvKURQKrkBObWRqXnpWHKpX3Lcjcgc5wRd0x7Xx03Ps60eo7mjsilzi89lDo4D9MJR5GN6_/embed?start=false&loop=false&delayms=3000" frameborder="0" width="480" height="299" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
<h2>Ore identification from images | Convolutional neural network</h2>
<p>Published 2020-03-28</p>
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vT-pj7iDt0PEHUiC6jh2t5ijq_7Ig1JJGivw2FVbpJb3yVSY2aGn9PCit7mDn1mPswoqT3McxpxizL-/embed?start=false&loop=false&delayms=3000" frameborder="0" width="480" height="299" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
<h2>Model store v2 | TensorFlow tango with Django | Machine learning/Deep learning as a Service</h2>
<p>Published 2020-03-06</p>
<br />
<br />
<br />
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vQbzdHnryIJtdXqbqJW5fRQIO0inOe8OkGzgGs-VoWhwGd9nmDXQEz6zg5y27vFgG5iUgyltKxOkDoz/embed?start=false&loop=false&delayms=10000" frameborder="0" width="480" height="299" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
<h2>Word cloud of privacy statements: Google Vs Microsoft</h2>
<p>Published 2020-03-01</p>
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPUTeC8YQa46stoMj6TwMcYzh6W-3kpkIYPc9RjBbZxPNEssnlKh5YACkA773KnFhpvt-namFj68igJNIoFQwmvnzx5pKGmvBHWV-qjm4jbcKo6jOTR5OnnUo9VxLhXL-hi9VIbtATesE/s1600/google-privacy.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="706" data-original-width="1600" height="282" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPUTeC8YQa46stoMj6TwMcYzh6W-3kpkIYPc9RjBbZxPNEssnlKh5YACkA773KnFhpvt-namFj68igJNIoFQwmvnzx5pKGmvBHWV-qjm4jbcKo6jOTR5OnnUo9VxLhXL-hi9VIbtATesE/s640/google-privacy.png" width="640" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZRcyAgGJd7CFknxPVapSGqquSWTHbH3WcUOVOxPMAzhRHBaRW7bHB2uYZAlf0gLu7LPv9PBuevegzFgfLS5bbJMVdbaCreNasSJflEAJsEBYhgsp_XMhIh-aaTYFll-bJU6NreEmM9CY/s1600/microsoft-privacy.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="739" data-original-width="1600" height="295" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZRcyAgGJd7CFknxPVapSGqquSWTHbH3WcUOVOxPMAzhRHBaRW7bHB2uYZAlf0gLu7LPv9PBuevegzFgfLS5bbJMVdbaCreNasSJflEAJsEBYhgsp_XMhIh-aaTYFll-bJU6NreEmM9CY/s640/microsoft-privacy.png" width="640" /></a></div>
<h2>HUD: Model store | Machine learning &amp; deep learning model store, rest api for interaction and machine learning prediction, token authentication</h2>
<p>Published 2020-03-01</p>
<iframe allowfullscreen="true" frameborder="0" height="569" mozallowfullscreen="true" src="https://docs.google.com/presentation/d/e/2PACX-1vRpDqcoQCLZjLYrxdc8bEzb1lRjGiErF-NQhyYbV0FmNPxQSHo-mrCX9itSg3iKn4keyvjaNtkKOrbX/embed?start=false&loop=false&delayms=3000" webkitallowfullscreen="true" width="960"></iframe>
<h2>Deep Learning | Tensor flow | GPU (cuda/cuDNN) Vs CPU w and w/o EarlyStopping</h2>
<p>Published 2019-11-30</p>
<a href="http://harisankar-krishnaswamy.blogspot.com/2019/11/deep-learning-tensor-flow-building.html" style="text-align: justify;" target="_blank">In the previous post</a><span style="text-align: justify;"> we built a neural net and tuned its hyper parameters using GridSearchCV. Here we train the same network on a GPU and compare that with training on a CPU. </span><br />
<span style="text-align: justify;"><br /></span>
<span style="text-align: justify;"><b>Tools used</b></span><br />
<pre>tensorflow-gpu 2.0.0
Keras 2.3.1
pandas 0.25.3
scikit-learn 0.21.3</pre>
<span style="text-align: justify;">Nvidia GTX 960M 2GB</span><br />
<span style="text-align: justify;">Intel i7 6700HQ 16 GB</span><br />
<span style="text-align: justify;"><br /></span>
<span style="text-align: justify;">The number of epochs needed was tuned separately last time. The loss/accuracy vs epochs curves raise the question of whether that many epochs are needed for this network and data. This has an impact on the time needed to train the network. </span><span style="text-align: justify;">Given the low-end GPU, the numbers are as expected.</span><br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Could we train the network in less time with an acceptable loss of accuracy?<br />
<br />
Early stopping is used here to answer this question. Results are shown below. Instead of going through all 55 epochs, training stops once the loss can no longer be reduced, which happens at around 17-27 epochs. The early stopping parameters used are shown below.</div>
<br />
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHmoP009DrUw1Jrxhsn6IturLkxueMufemIAioVIEfROEeHzqhUmw2de079JxHn5Bb1E7XMyxcgN-y-4mzSiQVytH2-_gtlMGfD5JkHLonmrymTTtbAv79udvSTXXVtZtSAgxM-o2qXL8/s1600/es.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="84" data-original-width="1112" height="48" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHmoP009DrUw1Jrxhsn6IturLkxueMufemIAioVIEfROEeHzqhUmw2de079JxHn5Bb1E7XMyxcgN-y-4mzSiQVytH2-_gtlMGfD5JkHLonmrymTTtbAv79udvSTXXVtZtSAgxM-o2qXL8/s640/es.png" width="640" /></a></div>
<br />
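<div style="text-align: justify;">
The stopping rule described above can be sketched in a few lines of plain Python. This is a minimal sketch and not the code used for the results: the function name <code>stopping_epoch</code> and the sample loss values are made up for illustration, and <code>patience</code> and <code>min_delta</code> only mirror the names of the Keras <code>EarlyStopping</code> arguments. The values actually used are the ones in the screenshot.</div>

```python
def stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the monitored loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. If that never
    happens, all epochs are run.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember it and reset the waiting counter.
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses, one value per epoch.
example_losses = [0.9, 0.7, 0.6, 0.6, 0.61, 0.6, 0.5]
print(stopping_epoch(example_losses, patience=3))
```

<div style="text-align: justify;">
A larger <code>patience</code> tolerates longer plateaus before giving up, at the cost of more training time.</div>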
The hyper parameter grid is small to begin with:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvQvXacEVYThtsRgRwVRdptZuYpCFtd5jW9C8ttENMicR0zX8bikyaBEigmas-U9nEEyIfiKwr70EG58zur7QBV_HVokCXEJYeLs6cj4xs0WLyOf5VPnHnO3Hr7sNoHWLAR985X51yoLI/s1600/old-params.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="108" data-original-width="510" height="67" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvQvXacEVYThtsRgRwVRdptZuYpCFtd5jW9C8ttENMicR0zX8bikyaBEigmas-U9nEEyIfiKwr70EG58zur7QBV_HVokCXEJYeLs6cj4xs0WLyOf5VPnHnO3Hr7sNoHWLAR985X51yoLI/s320/old-params.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<br />
<b>Results (2 parameters and 3 values each)</b><br />
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;border-color:#aaa;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#aaa;color:#333;background-color:#fff;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#aaa;color:#fff;background-color:#f38630;}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<br />
<div class="nobr">
<table class="tg">
<tbody>
<tr>
<th class="tg-0lax"></th>
<th class="tg-0lax">Wall time</th>
<th class="tg-0lax">Accuracy</th>
<th class="tg-0lax">learning rate</th>
<th class="tg-0lax">momentum</th>
</tr>
<tr>
<td class="tg-0lax">CPU</td>
<td class="tg-0lax">1 min 28 sec</td>
<td class="tg-0lax">0.978685</td>
<td class="tg-0lax">0.03</td>
<td class="tg-0lax">0.43</td>
</tr>
<tr>
<td class="tg-0lax">CPU Early Stopping</td>
<td class="tg-0lax">35.8 sec</td>
<td class="tg-0lax">0.973357</td>
<td class="tg-0lax">0.024</td>
<td class="tg-0lax">0.39</td>
</tr>
<tr>
<td class="tg-0lax">GPU</td>
<td class="tg-0lax">9 min 1 sec</td>
<td class="tg-0lax">0.978685</td>
<td class="tg-0lax">0.024</td>
<td class="tg-0lax">0.41</td>
</tr>
<tr>
<td class="tg-0lax">GPU Early Stopping</td>
<td class="tg-0lax">2 min 54 sec</td>
<td class="tg-0lax">0.971580</td>
<td class="tg-0lax">0.024</td>
<td class="tg-0lax">0.41</td>
</tr>
</tbody></table>
</div>
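<div style="text-align: justify;">
To make the trade-off in the table easier to read, the sketch below converts the wall times to seconds and computes the speed-up and the accuracy given up by early stopping. The numbers are copied from the table above. The helper names <code>to_seconds</code> and <code>compare</code> are illustrative only.</div>

```python
import re


def to_seconds(wall_time):
    """Convert strings such as '1 min 28 sec' or '35.8 sec' to seconds."""
    minutes = re.search(r"([\d.]+)\s*min", wall_time)
    seconds = re.search(r"([\d.]+)\s*sec", wall_time)
    total = 0.0
    if minutes:
        total += 60 * float(minutes.group(1))
    if seconds:
        total += float(seconds.group(1))
    return total


# (wall time, accuracy) pairs taken from the results table.
RESULTS = {
    "CPU": ("1 min 28 sec", 0.978685),
    "CPU Early Stopping": ("35.8 sec", 0.973357),
    "GPU": ("9 min 1 sec", 0.978685),
    "GPU Early Stopping": ("2 min 54 sec", 0.971580),
}


def compare(baseline, early):
    """Return (speed-up factor, accuracy lost) for an early stopping run."""
    base_time, base_acc = RESULTS[baseline]
    early_time, early_acc = RESULTS[early]
    speedup = to_seconds(base_time) / to_seconds(early_time)
    return round(speedup, 2), round(base_acc - early_acc, 6)


print(compare("CPU", "CPU Early Stopping"))
print(compare("GPU", "GPU Early Stopping"))
```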
<b><br /></b>
<b>Results (2 parameters and 5 values each)</b><br />
<b><br /></b>
The parameter grid is extended with additional values.<br />
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimrKWa1DRCrWT9bTnsdDgUoBz6Ugw3ljGDBRoyzFC747s2L_Hg2Z8wX7uI3cPspOy4nPuJ8JuQUQIsna8awMV6blr5rLEUWbfDHf5DiYp-XneipMezAlExu7KtiaYiZhy3krPxHoClBhI/s1600/large_params.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="112" data-original-width="672" height="66" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimrKWa1DRCrWT9bTnsdDgUoBz6Ugw3ljGDBRoyzFC747s2L_Hg2Z8wX7uI3cPspOy4nPuJ8JuQUQIsna8awMV6blr5rLEUWbfDHf5DiYp-XneipMezAlExu7KtiaYiZhy3krPxHoClBhI/s400/large_params.png" width="400" /></a></div>
<br />
<div class="nobr">
<table class="tg">
<tbody>
<tr>
<th class="tg-0pky"></th>
<th class="tg-0pky">Wall time</th>
<th class="tg-0pky">Accuracy</th>
<th class="tg-0pky">learning rate</th>
<th class="tg-0pky">momentum</th>
</tr>
<tr>
<td class="tg-0pky">CPU Early Stopping</td>
<td class="tg-0pky">1 min 35 sec</td>
<td class="tg-0pky">0.9822</td>
<td class="tg-0pky">0.03</td>
<td class="tg-0pky">0.39</td>
</tr>
<tr>
<td class="tg-0pky">GPU Early Stopping</td>
<td class="tg-0pky">9 min 37 sec</td>
<td class="tg-0pky">0.9751</td>
<td class="tg-0pky">0.033</td>
<td class="tg-0pky">0.41</td>
</tr>
</tbody></table>
</div>
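<div style="text-align: justify;">
A grid search trains one model per parameter combination (times the number of cross validation folds), so the grid size drives the wall time. The sketch below counts the combinations for a 3 x 3 and a 5 x 5 grid and compares that growth with the early stopping wall times from the two tables. The grid values are placeholders, since the real ones are only shown in the screenshots. The wall times are the ones reported above, converted to seconds.</div>

```python
from itertools import product


def grid_size(param_grid):
    """Number of parameter combinations a full grid search will try."""
    return len(list(product(*param_grid.values())))


# Placeholder values: only the number of values per parameter matters here.
small_grid = {
    "learning_rate": [0.01, 0.02, 0.03],
    "momentum": [0.3, 0.4, 0.5],
}
large_grid = {
    "learning_rate": [0.01, 0.02, 0.03, 0.04, 0.05],
    "momentum": [0.3, 0.4, 0.5, 0.6, 0.7],
}

growth = grid_size(large_grid) / grid_size(small_grid)

# Early stopping wall times in seconds, from the two results tables.
cpu_growth = 95.0 / 35.8    # 1 min 35 sec vs 35.8 sec
gpu_growth = 577.0 / 174.0  # 9 min 37 sec vs 2 min 54 sec

print(grid_size(small_grid), grid_size(large_grid), round(growth, 2))
print(round(cpu_growth, 2), round(gpu_growth, 2))
```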
<h2>Deep learning | Tensor Flow | Building a neural net *Updated with hyper parameter tuning</h2>
<p>Published 2019-11-19, updated 2019-11-22</p>
<div style="text-align: justify;">
Here we look at training a neural network to classify <a href="https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)" target="_blank">UCI cancer data</a> as benign or malignant. The project has multiple machine learning models for this purpose and is described <a href="http://harisankar-krishnaswamy.blogspot.com/2018/08/machine-learning-cancer-prognosis.html" target="_blank">here</a>. In this post a neural network is built using Keras with the TensorFlow backend. Scikit-learn is used for pre-processing the data. The final network, after hyper parameter tuning, looks like this.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7i04o9Kb7Mq4QYVXznQouOgDntmWutQevI79vjfEpSRLHFfVL-OMxhDIcLlggFkTBLS0ikEiJ7sX6I4qEMHNCxp4JH5BlQp0V0ofvhcWlxs7Qb1CrdE1rHv9_EHBGvipCbjccuC7tiZg/s1600/index.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="737" data-original-width="403" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7i04o9Kb7Mq4QYVXznQouOgDntmWutQevI79vjfEpSRLHFfVL-OMxhDIcLlggFkTBLS0ikEiJ7sX6I4qEMHNCxp4JH5BlQp0V0ofvhcWlxs7Qb1CrdE1rHv9_EHBGvipCbjccuC7tiZg/s640/index.png" width="347" /></a></div>
<br /></div>
<div style="text-align: justify;">
Initial hyper parameters used:</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrooJ09CqQSdzmCJYRO7hf8rENU_Z3yB8qXqASxCIYpBZ73YB6PYVkBFByBl6eUBdW6UY6YbUPMzUZrO3vipMxt4G_l_ftADQzdARW97nBBh1U_RYfbzEBb7XwaVd5VINjQ3KzbIjzXao/s1600/hyperparameters.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="130" data-original-width="782" height="105" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrooJ09CqQSdzmCJYRO7hf8rENU_Z3yB8qXqASxCIYpBZ73YB6PYVkBFByBl6eUBdW6UY6YbUPMzUZrO3vipMxt4G_l_ftADQzdARW97nBBh1U_RYfbzEBb7XwaVd5VINjQ3KzbIjzXao/s640/hyperparameters.png" width="640" /></a></div>
<div style="text-align: justify;">
<br />
<div class="separator" style="clear: both;">
1) <b>Visualise the data</b>: There are 36 dimensions, of which 10 do not contribute to the prediction. A visualisation is already in the project <a href="http://harisankar-krishnaswamy.blogspot.com/2018/02/multidimensional-data-visualisation-for.html" target="_blank">here</a>. Those 10 dimensions are removed to decrease noise.</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
2) <b>Pre-process data</b>: The distribution of values in each dimension is examined. The objective here is to decide how to pre-process the data. Neural network layers have activation functions, and common ones such as relu compute max(0, value). Also, in each epoch the weights are adjusted by a small amount. If the input is not normalized, this can affect the learning. Here the data is normalized using a MinMax scaler so that each dimension lies between 0 and 1. The medical data lends itself better to this. Not every dimension had the following type of distribution. The figure is for cell texture.</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg97qBbnhyphenhyphenO56uYyqSHxyDUAI_14H-ZgmG6Dwl5ybOC9yFppPmFPeK5phz1QWQyfuMKrrnR8-D_BBtQZJTKQJbTypR0JPXxLHjdIccUxbrZiNNRO8DlYXKwe5aLcLZZIWClx7yCpf3z3lw/s1600/distribution.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1067" data-original-width="1600" height="213" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg97qBbnhyphenhyphenO56uYyqSHxyDUAI_14H-ZgmG6Dwl5ybOC9yFppPmFPeK5phz1QWQyfuMKrrnR8-D_BBtQZJTKQJbTypR0JPXxLHjdIccUxbrZiNNRO8DlYXKwe5aLcLZZIWClx7yCpf3z3lw/s320/distribution.png" width="320" /></a></div>
<br />
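<div style="text-align: justify;">
The scaling step can be illustrated without any library. The sketch below is a hand-written equivalent of min-max scaling for a single column, together with relu as described above. It is not the scikit-learn scaler named in the post, and the sample values are made up. Mapping a constant column to zeros is a choice made here for the sketch.</div>

```python
def min_max_scale(values):
    """Rescale a column of numbers linearly so it spans 0 to 1."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - lowest) / span for value in values]


def relu(value):
    """Rectified linear unit: max(0, value)."""
    return max(0.0, value)


# Hypothetical raw measurements for one dimension.
column = [10.0, 15.0, 20.0]
print(min_max_scale(column))
print(relu(-2.0), relu(0.7))
```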
<br />
<div style="text-align: justify;">
3) <b>Build the layers</b> (dropouts, layer count and neurons in each layer): The decision of how many layers and how many neurons on each layer is mostly based on accuracy/loss plots for fit (see end of post) and visualisation in step 1. The model looks like this.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXVx9ZUUvoWhHowva_c700YFZo2_fzDKhYGgPeAvPHhM85ZFaQgZRgZEv8NMiKjyDYgDabcrH_SbwguQWiBWAP2dRULY0hcxooey19Q8GXAVVNNSuVZo82vTsAVxChZdzxiN_asnqmdj4/s1600/model.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="242" data-original-width="922" height="166" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXVx9ZUUvoWhHowva_c700YFZo2_fzDKhYGgPeAvPHhM85ZFaQgZRgZEv8NMiKjyDYgDabcrH_SbwguQWiBWAP2dRULY0hcxooey19Q8GXAVVNNSuVZo82vTsAVxChZdzxiN_asnqmdj4/s640/model.png" width="640" /></a></div>
<br />
<br /></div>
4) <b>Parameter Tuning</b>: <span style="text-align: justify;">Using GridSearchCV, the following parameters and hyper parameters were tuned:</span><br />
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
a) <b>Optimizer</b>: SGD was selected after comparison with RMSprop and Adam.</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggdctW3MR7M1JABPDaraQt2I9G6nNsvJZ4anhcR4drtnNdlNP2VY4zoQylVMTEzg7bA2VKP09NJ2hO1nq8qcFDn3yqtRfJhCDkqhCv84UzrW5JZRhtQma5NG42_Fc0PDpHGdSS0LuY1fg/s1600/optimizer+selection.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="424" data-original-width="1440" height="187" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggdctW3MR7M1JABPDaraQt2I9G6nNsvJZ4anhcR4drtnNdlNP2VY4zoQylVMTEzg7bA2VKP09NJ2hO1nq8qcFDn3yqtRfJhCDkqhCv84UzrW5JZRhtQma5NG42_Fc0PDpHGdSS0LuY1fg/s640/optimizer+selection.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
b) <b>Epochs</b>: 50-55 was selected after GridSearchCV and Stratified k-fold cross validation.</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
c) <b>Batch size</b>: Initially 32, but 5 was chosen after GridSearchCV with 30, 35 and 40 among the other options.</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
d) <b>Learning rate</b>: Initially kept at 0.027 during cross validation. This was changed after looking at the loss and accuracy curves from initial training. The final value, found using GridSearchCV, was 0.24.</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
e) <b>Momentum</b>: Initially 0.44, tweaked based on the loss and accuracy curve plots. The final value was 0.41.</div>
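<div style="text-align: justify;">
To show what the learning rate and momentum control, here is a minimal sketch of an SGD update with momentum on a toy one-dimensional problem. The update follows the usual form velocity = momentum * velocity - learning_rate * gradient. The toy loss, the function names and the parameter values are illustrative assumptions, not the network or the tuned values from this post.</div>

```python
def sgd_momentum_minimise(gradient, start, learning_rate, momentum, steps):
    """Minimise a function of one weight using SGD with momentum."""
    weight = start
    velocity = 0.0
    for _ in range(steps):
        # Momentum keeps part of the previous step; the learning rate
        # scales how far the current gradient moves the weight.
        velocity = momentum * velocity - learning_rate * gradient(weight)
        weight += velocity
    return weight


def toy_gradient(weight):
    """Gradient of the toy loss (weight - 3)^2, whose minimum is at 3."""
    return 2.0 * (weight - 3.0)


final_weight = sgd_momentum_minimise(
    toy_gradient, start=0.0, learning_rate=0.03, momentum=0.4, steps=300
)
print(final_weight)
```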
<div>
<br /></div>
<div>
A screenshot of the grid search with the best parameters:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhChjb0QB8Erhbx8LK0YttX_jJ4aIEFirqoN2yCuoDG8DtWE-cX7O85jXcIXpf4iT405vHRfw4BQ_dIb5fySq19OJ5w27sPIvpYtYv89uU7eIrDdZzXDWhfLLw0L50L9CNFuI8A1GQH3uw/s1600/tuning.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="122" data-original-width="1230" height="60" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhChjb0QB8Erhbx8LK0YttX_jJ4aIEFirqoN2yCuoDG8DtWE-cX7O85jXcIXpf4iT405vHRfw4BQ_dIb5fySq19OJ5w27sPIvpYtYv89uU7eIrDdZzXDWhfLLw0L50L9CNFuI8A1GQH3uw/s640/tuning.png" width="640" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Number of intermediate layers, layer neuron counts and dropouts in each layer were decided based on fit from accuracy and loss curves during training. Also each dropout layer has a different dropout ratio. </div>
<div>
<br /></div>
<b>Avoiding</b> over fitting, under fitting and unknown fitting: An unnecessarily high epoch count resulted in over fitting, while a low epoch count gave under fitting. The epoch count was increased after intermediate layers with more neurons were added. This resulted in an acceptable fit in the loss function behaviour. An example of over fitting is shown below. Notice that the validation error goes up while the training error stays low towards the right.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzSWUuVauyReOP7SBcpAjJobY05Qrimf7lLE-dJ-o0QDVBhYrPOAVJNIF7ahZp6rcK7pCvwaFUFnAyMIqg6JIDhOlYEvJxvkqoZ2O-lkXuwMQtWipcVsRoZfp0OWXpaF3MubXNYbGHbX0/s1600/overfitting-1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="872" data-original-width="1322" height="263" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzSWUuVauyReOP7SBcpAjJobY05Qrimf7lLE-dJ-o0QDVBhYrPOAVJNIF7ahZp6rcK7pCvwaFUFnAyMIqg6JIDhOlYEvJxvkqoZ2O-lkXuwMQtWipcVsRoZfp0OWXpaF3MubXNYbGHbX0/s400/overfitting-1.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIOIEMZ5OiTWSE9xqzOp3S3dZKkyIfTZN3Kl84l8z36NwNEAnX7i-61BO0390IUMxvC7AopfvEPzHQasqEYgYJ3nroMrJ-oW5dYwic-Z0sJGu5pVkeDSudbQOXSXFEXzvs3_F14Rdf5DI/s1600/loss2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="882" data-original-width="1342" height="262" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIOIEMZ5OiTWSE9xqzOp3S3dZKkyIfTZN3Kl84l8z36NwNEAnX7i-61BO0390IUMxvC7AopfvEPzHQasqEYgYJ3nroMrJ-oW5dYwic-Z0sJGu5pVkeDSudbQOXSXFEXzvs3_F14Rdf5DI/s400/loss2.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiAce33bFYfEtsD0kB0sBSmvbuujXnVgVawDUextvZMAsTo1ie06BzwNzVoiuGNEX1GhRUR8ASk81s8n3vjD_z4ww2kKUrs8MSnMIrBGDKPBlSIy0yFq3bdQUR3ApzCKXTg_MYo6f8FFzI/s1600/accuracy2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="888" data-original-width="1332" height="266" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiAce33bFYfEtsD0kB0sBSmvbuujXnVgVawDUextvZMAsTo1ie06BzwNzVoiuGNEX1GhRUR8ASk81s8n3vjD_z4ww2kKUrs8MSnMIrBGDKPBlSIy0yFq3bdQUR3ApzCKXTg_MYo6f8FFzI/s400/accuracy2.png" width="400" /></a><span style="text-align: justify;">.</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<h2>Streaming sensor data to dashboard | 100 ms sampling</h2>
<p>Published 2019-10-06</p>
<div style="text-align: justify;">
In this post we look at adding streaming sensor data visualisation to the dashboard. The data we look at is the temperature sensor measurements from the CPU, for each core. Data is sampled every 100 ms, i.e. 10 samples per second. Each data point has a time stamp. </div>
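<div style="text-align: justify;">
The sampling described above can be sketched as a Python generator that yields one time-stamped reading per interval. This is a sketch under assumptions: <code>sample_stream</code>, the JSON field names and the <code>read_temperatures</code> callback are hypothetical and not taken from the project. In a Django view, a generator like this could be handed to <code>StreamingHttpResponse</code>.</div>

```python
import json
import time


def sample_stream(read_temperatures, interval=0.1, count=None,
                  sleep=time.sleep, clock=time.time):
    """Yield one JSON line per sample, each with a time stamp.

    read_temperatures is a callable returning the per-core readings.
    interval=0.1 gives 10 samples per second. count=None streams
    until the consumer stops reading.
    """
    sent = 0
    while count is None or sent < count:
        sample = {"timestamp": clock(), "cores": read_temperatures()}
        yield json.dumps(sample) + "\n"
        sent += 1
        sleep(interval)


def fake_sensor():
    """Stand-in for a real sensor read; returns fixed values."""
    return [40.0, 41.5]


for line in sample_stream(fake_sensor, count=3):
    print(line, end="")
```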
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/32JbOcTqIR8/0.jpg" src="https://www.youtube.com/embed/32JbOcTqIR8?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Tools used:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
- Django 2.1.13 and Python 3.7.4 on Docker images for the back end.</div>
<div style="text-align: justify;">
- The front end is built using JavaScript and libraries like dc.js and d3.js.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
When the CPU load goes up we can see the data updated on the dashboard. Some points to consider when building a visualisation that updates this frequently:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
1) Data size and network distance: It is a good idea to check the impact on the network and whether the network is capable of carrying the stream. Data is streamed in small units, but does it add up to 4 TB per 24 hours, translating to 370 Mbps or 280 Mbps? Would the network be able to handle this if there are multiple sources of data? As the distance to stream over gets longer, latency will start to show.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
2) Continuous connection: Streaming response means a continuous connection is maintained between the client and server.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
3) Data storage: Here the data is continuously sent to the client as and when it becomes available. This depends on the data type. At times dashboards fetch data from a database and update the visualisation every 5 or 10 seconds. If a database is involved, the polling interval determines the load placed on it. Also, fetching data from the database every time can include disk latency. So although the data can be saved to disk asynchronously, here it is just streamed back to the client. </div>
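<div style="text-align: justify;">
The bandwidth figures in point 1 can be checked with a short calculation. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second). The function name is illustrative. Under these assumptions 4 TB per day is about 370 Mbps sustained, and the 280 Mbps figure corresponds to roughly 3 TB per day.</div>

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_mbps(terabytes_per_day):
    """Average rate in megabits per second for a daily data volume."""
    bits_per_day = terabytes_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


for volume in (4, 3):
    print(volume, "TB/day is about", round(sustained_mbps(volume)), "Mbps")
```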
<h2>Upgrading to Python 3.7.4 + Django 2.1.12 and moving all micro services to Docker</h2>
<p>Published 2019-09-17, updated 2019-09-18</p>
Here we look at two major updates to my web application.<br />
<br />
1. Upgrading to Python 3.7.4 and Django 2.1.12.<br />
2. Moving micro services to Docker.<br />
<h3>
1. Upgrading to Python 3.7.4 and Django 2.1.12</h3>
<div style="text-align: justify;">
We will look into the prerequisites and steps for making the upgrade in a project. The previous Python version was 3.5.3. The release, change logs and new features are linked below. 3.7.4 is not yet available on Ubuntu 18.04 and was installed from source. In addition to bringing major new features and extending the lifespan of the project, Python 3.7.4/3.7.8 is to Python 3 what 2.7 was to Python 2. </div>
<div style="text-align: justify;">
<br /></div>
<a href="https://www.python.org/downloads/release/python-374/" target="_blank">Python 3.7.4 release</a><br />
<a href="https://www.python.org/dev/peps/pep-0537/" target="_blank">Release schedule for Python 3.7.4</a><br />
<a href="https://docs.python.org/3.7/whatsnew/3.7.html" target="_blank">What's new in Python 3.7</a><br />
<a href="https://docs.python.org/3.7/whatsnew/changelog.html#python-3-7-4-final" target="_blank">Changelog</a><br />
<br />
<a href="https://docs.djangoproject.com/en/2.2/releases/2.1.12/" target="_blank">Django 2.1.12 release notes</a><br />
<h4>
Prerequisites</h4>
<div style="text-align: justify;">
<br />
1) Tests: You should have reliable tests for the project. If a project has no tests, upgrading any language or software it depends on will be difficult.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
2) Read the changelog and release notes: These help to get an idea of the effort required and to estimate it. The objective here is to identify changes that have an impact, especially backward incompatible changes. The Python version chosen also depends on what the libraries used in the project support. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The move to Django 2.1.12 brought changes that had an impact. Examples are the new model view permission created during database migrations, the database router allow_relation method and the contrib.auth views. These were in the release notes and did show up as expected during the upgrade.</div>
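<div style="text-align: justify;">
For readers who have not worked with database routers, the sketch below shows the four methods Django calls on a router class listed in the DATABASE_ROUTERS setting. It is not the router from this project: the app label "analytics" and the database alias "analytics_db" are hypothetical placeholders, and the stand-in objects at the end exist only so the sketch runs without a Django project.</div>

```python
from types import SimpleNamespace


class AppLabelRouter:
    """Route models of selected apps to a second database.

    Hypothetical example: the app label "analytics" and the database
    alias "analytics_db" are placeholders, not names from the project.
    """

    route_app_labels = {"analytics"}
    db_alias = "analytics_db"

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.db_alias
        return None  # no opinion; Django asks the next router

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)

    def allow_relation(self, obj1, obj2, **hints):
        # Explicitly allow relations between objects in the same database.
        if obj1._state.db == obj2._state.db:
            return True
        return None  # no opinion for anything else

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Routed apps migrate only on their own database.
        if app_label in self.route_app_labels:
            return db == self.db_alias
        return None


if __name__ == "__main__":
    # Stand-ins that mimic the attributes Django passes to a router.
    first = SimpleNamespace(_state=SimpleNamespace(db="default"))
    second = SimpleNamespace(_state=SimpleNamespace(db="default"))
    print(AppLabelRouter().allow_relation(first, second))
```

<div style="text-align: justify;">
Tests that call these methods directly make signature changes between Django versions show up early in an upgrade.</div>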
<br />
<h4 style="text-align: justify;">
Steps </h4>
<div style="text-align: justify;">
1. Build Python 3.7.4 from source and install it to a specific location. For the steps see <a href="https://docs.python.org/3/using/unix.html#building-python" target="_blank">this</a>. Here it was built with --enable-optimizations, --enable-shared and LDFLAGS set for the target system.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggLHNM6WEUi2EcuH953MM43j9fzgRcN1nBlic_q-LxMXZoDwm76rTO09sJChz089a6qkQpTMQEF40ehnENWvnE2YCU1P8YV47H9kD-L_dbAzDbm7K3kBILhngp0m9alB4NHZIicb0KjcI/s1600/pycompile.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="118" data-original-width="1600" height="47" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggLHNM6WEUi2EcuH953MM43j9fzgRcN1nBlic_q-LxMXZoDwm76rTO09sJChz089a6qkQpTMQEF40ehnENWvnE2YCU1P8YV47H9kD-L_dbAzDbm7K3kBILhngp0m9alB4NHZIicb0KjcI/s640/pycompile.png" width="640" /></a></div>
<div style="text-align: justify;">
<br /></div>
<br />
<div style="text-align: justify;">
2. Run the new Python interpreter and check its version. Also run ldd on the Python executable to make sure it resolves the libpython .so file from the correct location.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwrvp9LfqHRJ_DB6-1clUSHgjGZklhmmcn2IcrkjUH6lpqwRrrdlh171IvFW7PVuducIzy3ZtjL-demARgsAz8zMIO9ZmKnFFGLynVqZ3XkJJDVZW2Ruhymh-AN5Rwzy5NbVEiv8erz-0/s1600/ldd.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="286" data-original-width="1600" height="113" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwrvp9LfqHRJ_DB6-1clUSHgjGZklhmmcn2IcrkjUH6lpqwRrrdlh171IvFW7PVuducIzy3ZtjL-demARgsAz8zMIO9ZmKnFFGLynVqZ3XkJJDVZW2Ruhymh-AN5Rwzy5NbVEiv8erz-0/s640/ldd.png" width="640" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
3. Identify the Python version requirements of the third party libraries in the pip requirements file. Create a Python 3.7.4 virtual environment and install the requirements. One of the issues encountered was that <a href="https://github.com/numpy/numpy/issues/12026" target="_blank">numpy 1.14.0 does not have Python 3.7 wheels</a>. Similar version upgrades for <a href="https://github.com/celery/celery/pull/4902" target="_blank">celery</a>, kombu and scikit-learn were necessary and were also applied. This takes care of the changes to the requirements for the Python virtual environment.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
4. Add Django 2.1.12 to the pip requirements file.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
5. Run the project's tests.</div>
<div style="text-align: justify;">
5 a) Identify each failure in the tests. Some of the breaks encountered were deprecated scikit-learn modules (like cross_validation), serialization incompatibility of stored machine learning models in the joblib module, Django auth app migrations, db routers, the djangorestframework upgrade to 3.10.3 and password security rules in Django.</div>
<div style="text-align: justify;">
5 b) Change the code to use new features, move away from deprecated methods, update method signatures and fix the failures encountered.</div>
<div style="text-align: justify;">
5 c) Do this until all tests pass.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Coverage can also be measured while running the tests.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYzvUsQgUa_QjQHAE4ldM0ZKobRG7vhwyJDFwrF41DPhu3XzmbFGIvvGzImOoofz9M015PDuwcZC2gplfGgl0jfjq_NfhyphenhyphenCARGf1wi_A61it8Q72n6s-9jt6tPZ1Vs34Bc-vNrqBz2uiw/s1600/coverage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="555" data-original-width="1600" height="219" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYzvUsQgUa_QjQHAE4ldM0ZKobRG7vhwyJDFwrF41DPhu3XzmbFGIvvGzImOoofz9M015PDuwcZC2gplfGgl0jfjq_NfhyphenhyphenCARGf1wi_A61it8Q72n6s-9jt6tPZ1Vs34Bc-vNrqBz2uiw/s640/coverage.png" width="640" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
6. Once all tests pass, modify the Dockerfile for Django to build Python from source and set up the Python virtual environment. Check in the container that it is using the correct interpreter and environment.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj0aUU73_9CYLT2jnU2Qc27cSOzMk4v3pzJ_sxk6wTCk73tW6Z18jhlVg9ysXz6tWZxmzBEjtId4qUTJ8GHfIWYDXdMCkewEsdGbAlDvAfKA7rQlKUfHrE_zXCeUx0rbUwaubKQFhXBXM/s1600/container.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="182" data-original-width="1118" height="104" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj0aUU73_9CYLT2jnU2Qc27cSOzMk4v3pzJ_sxk6wTCk73tW6Z18jhlVg9ysXz6tWZxmzBEjtId4qUTJ8GHfIWYDXdMCkewEsdGbAlDvAfKA7rQlKUfHrE_zXCeUx0rbUwaubKQFhXBXM/s640/container.png" width="640" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The docker image is now ready with the required Python version and requirements.</div>
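<div style="text-align: justify;">
The interpreter checks in steps 2 and 6 can also be scripted so the same check runs on the host and inside the container. The sketch below is one possible way to do it, assuming the virtual environment was created with the standard venv module; the function name is illustrative and this is not the check shown in the screenshots.</div>

```python
import sys
import sysconfig


def interpreter_report() -> dict:
    """Collect facts that show which Python and environment are in use."""
    return {
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "executable": sys.executable,
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # 1 when built with --enable-shared on Unix; may be None elsewhere.
        "enable_shared": sysconfig.get_config_var("Py_ENABLE_SHARED"),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

<div style="text-align: justify;">
Comparing the reported version and executable path against the expected install location catches a container that silently falls back to the system Python.</div>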
<h3>
2. Micro services on docker swarm</h3>
<div style="text-align: justify;">
The web application has 10 services. These are as follows:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
1) Web app: This is the Django app running under uWSGI. Each web app service contains 6 Django apps, each of which can be run alone or in groups within a web app service.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
2) Memcache: This helps to minimise database hits and improve overall web app performance. Three instance types for three item sizes are used. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
3) Celery workers: These do the long running tasks asynchronously.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
4) Celery beat: This is for initiating periodic tasks.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
5) Flower: A tool used to monitor Celery.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
6) Rabbitmq: Message broker for celery.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
7) Loadbalancer: nginx sitting in front of the web app service</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
8) Static server: Serves assets like js, css and static content.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
9) Media server: Serves media that users have uploaded. Examples are avatar pictures, machine learning models, out of bag data and the like.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
10) Database: Two Postgres instances</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Of these, all except the database have been moved to Docker. The reasons for not moving the database services are similar to those listed below.</div>
<div style="text-align: justify;">
<a href="https://blogs.oracle.com/cloudnative/running-a-mysql-database-in-containers-the-right-way" target="_blank">https://blogs.oracle.com/cloudnative/running-a-mysql-database-in-containers-the-right-way</a></div>
<div style="text-align: justify;">
Also, if a database needs to be scaled, there are options available in the database that can be used. Scaling the services individually using containers and the database using its own options keeps things predictable.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The directory structure looks like this:</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiB1fKMlXC5fb1dZfknrOB7oNgCTRW94g5gUGjHd1Aq9aZDJBjWQbKwW1mzrusNhFIe1W7mZdBbAnMly6NCy2pwYPu8vQEVqHhz6mNtmzaSXMlz4-FwVGOQrosXcZ1E0p_erO8dTrS47v4/s1600/tree-0.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="433" data-original-width="532" height="259" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiB1fKMlXC5fb1dZfknrOB7oNgCTRW94g5gUGjHd1Aq9aZDJBjWQbKwW1mzrusNhFIe1W7mZdBbAnMly6NCy2pwYPu8vQEVqHhz6mNtmzaSXMlz4-FwVGOQrosXcZ1E0p_erO8dTrS47v4/s320/tree-0.png" width="320" /></a></div>
<br />
Starting the application involves initiating a docker swarm and then deploying the stack on it. The two commands are<br />
<br />
$ docker swarm init --advertise-addr &lt;network&gt;<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVBPQ-oh4htfyR6W5pa-Nd_z8QCekgzf89SsSII6UrelJPTiIJzu6iiCdwcqk6hemiB5NewSzwmeRIzevI-FRetgWxW3Upjq2oyV8eEHMR9tquv_QsBx24-DK5V-HXpTd12D0LqdXugeY/s1600/swarm-init.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="173" data-original-width="1600" height="42" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVBPQ-oh4htfyR6W5pa-Nd_z8QCekgzf89SsSII6UrelJPTiIJzu6iiCdwcqk6hemiB5NewSzwmeRIzevI-FRetgWxW3Upjq2oyV8eEHMR9tquv_QsBx24-DK5V-HXpTd12D0LqdXugeY/s400/swarm-init.png" width="400" /></a></div>
<br />
$ docker stack deploy -c &lt;the-stack.yml file&gt; &lt;a name for your application&gt;<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzL8i381J4CTjUx2PfQcCRyUDeoGwnAgMJs-GaIiIxNNEmpG-8aMy6p5NYxTMG9O_rODD7EmECVOlkrWPXQW-kEj11WMDi8PJWSgt5L8tOiSA370XFSP6Zed44KlnmrA9hM0AgG4vciCU/s1600/services.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="544" data-original-width="912" height="237" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzL8i381J4CTjUx2PfQcCRyUDeoGwnAgMJs-GaIiIxNNEmpG-8aMy6p5NYxTMG9O_rODD7EmECVOlkrWPXQW-kEj11WMDi8PJWSgt5L8tOiSA370XFSP6Zed44KlnmrA9hM0AgG4vciCU/s400/services.png" width="400" /></a></div>
<br />
Check the services that make up the stack with<br />
<br />
$ docker stack services &lt;name of your application&gt;<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1nC8KGFz01XSpWV19FMZv2yw2w4CVr_z8IbUZbsqfMk3ptkQuffAVUPkmIGxOmFfDddboh5ZvDjjpLDYoDUe3ZaIwzZduN3YghJSiaDTzukorQv4JYzMbhx2UGGZ0mWG1xPVGOB6xH2Q/s1600/ls-services.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="406" data-original-width="1512" height="169" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1nC8KGFz01XSpWV19FMZv2yw2w4CVr_z8IbUZbsqfMk3ptkQuffAVUPkmIGxOmFfDddboh5ZvDjjpLDYoDUe3ZaIwzZduN3YghJSiaDTzukorQv4JYzMbhx2UGGZ0mWG1xPVGOB6xH2Q/s640/ls-services.png" width="640" /></a></div>
<br />
As mentioned in the <a href="http://harisankar-krishnaswamy.blogspot.com/2019/08/asset-media-serving-with-docker-swarm.html" target="_blank">previous post</a>, scaling any individual service involves increasing the replica count in the stack's yml file and redeploying the stack.
Harisankar Krishna Swamyhttp://www.blogger.com/profile/08640482899790313232noreply@blogger.comtag:blogger.com,1999:blog-3584889867142519041.post-6321936766143511622019-08-31T02:35:00.005-07:002019-08-31T07:58:10.479-07:00
Asset & media serving with Docker Swarm | Jmeter results before and after scaling
<div style="text-align: justify;">
Here we look at web application asset and media serving with a docker swarm. Assets include JavaScript, CSS, images and other files required by the web application. Media means files that are uploaded by users. This media includes trained machine learning model files and out of bag data samples from <a href="https://harisankar-krishnaswamy.blogspot.com/2018/08/machine-learning-cancer-prognosis.html" target="_blank">this post</a>. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Previously, assets and media were also served by nginx, but on separate virtual machines. Scaling meant adding a new virtual machine to the group. Now these services have been moved to a swarm and are orchestrated with a docker compose yml file. This way content distribution can be scaled faster. The whole application stack will move to this swarm one service at a time. All required docker images are built and stored locally. For more on using docker swarms see <a href="https://docs.docker.com/engine/swarm/swarm-tutorial/" target="_blank">this official docker page</a>. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The changes to the architecture are highlighted in orange. There is a loadbalancer container, multiple (8) load balanced asset service containers and multiple (8) load balanced media service containers. As more capacity (# of replicas) is added to each of the services, the docker stack just needs to be updated. </div>
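<div style="text-align: justify;">
Updating the stack can be wrapped in a small helper when it has to be repeated often. The sketch below only builds the command lines. The stack name hud-cdn and the file name docker-compose.yml are the ones used in the deploy step of this post, the service name asset is a hypothetical example, and docker service scale is shown as an alternative to editing the replica count in the yml file, not as the method used here.</div>

```python
import subprocess


def stack_deploy_cmd(compose_file: str, stack: str) -> list:
    """Build the command that deploys or updates a stack from a compose file."""
    return ["docker", "stack", "deploy", "-c", compose_file, stack]


def service_scale_cmd(stack: str, service: str, replicas: int) -> list:
    """Build the command that scales one service of a deployed stack.

    Services created by `docker stack deploy` are named stack_service.
    """
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    return ["docker", "service", "scale", f"{stack}_{service}={replicas}"]


def run(cmd: list) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Printed rather than executed, so the sketch is safe to run anywhere.
    print(" ".join(stack_deploy_cmd("docker-compose.yml", "hud-cdn")))
    print(" ".join(service_scale_cmd("hud-cdn", "asset", 8)))
```

<div style="text-align: justify;">
Keeping the replica count in the yml file has the advantage that the file stays the single record of the deployed state, whereas a manual scale is overwritten by the next redeploy.</div>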
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
* Jmeter test results before (4 replicas) and after scaling the services (8 replicas) are given at the end of this post. Note: Jmeter is set to follow and access embedded (media/asset) links on each page.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvKVvBHfv9ILt3h6TOiCw_gZ8AF5wLDhDPZFHtyvh6NLHHBjbpaF3nngiz3n_nWhRWjDz7ApM2dqtoA4tTMFZWyuq58N9IPalBD1XnDfWOjBfDFuh5qUrYkDtuZvoRMfWm_HF4ndvTWvw/s1600/arch.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1225" data-original-width="1600" height="305" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvKVvBHfv9ILt3h6TOiCw_gZ8AF5wLDhDPZFHtyvh6NLHHBjbpaF3nngiz3n_nWhRWjDz7ApM2dqtoA4tTMFZWyuq58N9IPalBD1XnDfWOjBfDFuh5qUrYkDtuZvoRMfWm_HF4ndvTWvw/s400/arch.png" width="400" /></a></div>
<br />
<b>Steps</b><br />
<br />
<div style="text-align: justify;">
1. Create docker images for each component and verify that the containers work as expected.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The loadbalancer, asset and media services are based on the nginx docker image. How to set up, build and test nginx docker images can be found <a href="https://hub.docker.com/_/nginx" target="_blank"><b>here</b></a>. For this web application, each service container is configured as needed with separate nginx conf files.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
2. Create a docker compose yaml file which will orchestrate the docker services. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Detailed steps on how to do this are <a href="https://docs.docker.com/compose/gettingstarted/" target="_blank"><b>here</b></a>. This is essentially a file that lists the docker services that make up the application stack. Here the loadbalancer, asset serving and media serving are the services for the web application. Each service is configurable with the number of replicas (load balanced containers) and resource constraints per container. Each load balanced service can be exposed via ports on the host machine. These ports should be made accessible to downstream clients only as needed. Here only the loadbalancer is exposed, on port 5000. The docker folder structure is as follows:</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUrhLgeqfGJBr9G0pKT7HdBXOrsqEpm0Q-77Ur9HCoTYEiH5wiI9DGojdCjLlmrfaxqzeqnLWmEDR4qm1ZdCLflfCXYHi4m-Sf2evJf9imfFzBw9eL9OvzA9-c-M4CYOW6eeehKEeu82Q/s1600/tree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="566" data-original-width="540" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUrhLgeqfGJBr9G0pKT7HdBXOrsqEpm0Q-77Ur9HCoTYEiH5wiI9DGojdCjLlmrfaxqzeqnLWmEDR4qm1ZdCLflfCXYHi4m-Sf2evJf9imfFzBw9eL9OvzA9-c-M4CYOW6eeehKEeu82Q/s200/tree.png" width="190" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
3. Deploying the stack:<br />
<br />
$ docker stack deploy -c docker-compose.yml hud-cdn<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEid05r-WjLxKZe_RDv56LUsSUHaW5qjdQtnbO1k2CkU_Gisg392wQT1C4C8WzNUAT8rAPI7k4seShuGA4Hb-4-CPOtsCO35CNsF42ljtJf6RzxIVht7ONQOyjy2ooORuBu5XWLaaa0J4W8/s1600/deploying.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="220" data-original-width="766" height="113" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEid05r-WjLxKZe_RDv56LUsSUHaW5qjdQtnbO1k2CkU_Gisg392wQT1C4C8WzNUAT8rAPI7k4seShuGA4Hb-4-CPOtsCO35CNsF42ljtJf6RzxIVht7ONQOyjy2ooORuBu5XWLaaa0J4W8/s400/deploying.png" width="400" /></a></div>
<br />
<br />
4. Verifying services<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_e5EyVK566VM0hZnzAlHGI4zpdLwQ5hN8dIM1-hONxPMwjNed9q1sVQ2iIpUExGH0hbWhSxiYFacCPR5QtbgIA_TmpK8ASWXmlfY07mn7NJVjMgtKC6MU3n-STiOV5YRdujaKyLMKRcE/s1600/services.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="108" data-original-width="1600" height="42" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_e5EyVK566VM0hZnzAlHGI4zpdLwQ5hN8dIM1-hONxPMwjNed9q1sVQ2iIpUExGH0hbWhSxiYFacCPR5QtbgIA_TmpK8ASWXmlfY07mn7NJVjMgtKC6MU3n-STiOV5YRdujaKyLMKRcE/s640/services.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
5. Verifying the application by accessing via exposed port on docker host.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUWcs6RkMnXUt9cEbbrlv0wvQ16TdWkIM_uqyDAxcU9D4yFFalSVNDmri7OykPS6yjOMZv8UCmejHmEcUTy_sCQ4FJdzXYJk8HmbaIyeqK7JVrGUSoe6I_9Eu3DYG1CXUXdXX1gl0xwoY/s1600/app.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1414" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUWcs6RkMnXUt9cEbbrlv0wvQ16TdWkIM_uqyDAxcU9D4yFFalSVNDmri7OykPS6yjOMZv8UCmejHmEcUTy_sCQ4FJdzXYJk8HmbaIyeqK7JVrGUSoe6I_9Eu3DYG1CXUXdXX1gl0xwoY/s400/app.png" width="352" /></a></div>
<br />
<br />
6. Jmeter load test results before and after scaling<br />
<br />
6a. with 4 replicas each for asset service and media service, throughput is <b>156/sec</b><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMQ-d4NhxxyiOVT_5ikqTADLnh_Ia0jTULfHXeL6RPHQdYQKs0gvdNToGB4_pJEqeI0xLHmf_zFT9xRMa4fxu2PCWG6QaOxxWzcO4SUtawc4ltD2iZvyWr_Lr62urgn4mIIkPbnDneiSs/s1600/unscaled+throughput.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="594" data-original-width="1600" height="147" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMQ-d4NhxxyiOVT_5ikqTADLnh_Ia0jTULfHXeL6RPHQdYQKs0gvdNToGB4_pJEqeI0xLHmf_zFT9xRMa4fxu2PCWG6QaOxxWzcO4SUtawc4ltD2iZvyWr_Lr62urgn4mIIkPbnDneiSs/s400/unscaled+throughput.png" width="400" /></a></div>
<br />
<br />
6b. with 8 replicas each for asset service and media service, throughput is <b>428/sec</b><br />
<b><br /></b>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDghYC5oJeOUtPzMGZ-Q65Xdd6-DHmr7XBFXpeua8LbHzP5Ir97svIT6iXVb_UfdNjTNADooEbHhBR1cBRWkN5o7AaltEnZ5f6cbuaUigGaUSBLSLmQUELIJbylZ4OZfqhfyVjXfv_9GE/s1600/throughput2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="595" data-original-width="1600" height="148" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDghYC5oJeOUtPzMGZ-Q65Xdd6-DHmr7XBFXpeua8LbHzP5Ir97svIT6iXVb_UfdNjTNADooEbHhBR1cBRWkN5o7AaltEnZ5f6cbuaUigGaUSBLSLmQUELIJbylZ4OZfqhfyVjXfv_9GE/s400/throughput2.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
Harisankar Krishna Swamyhttp://www.blogger.com/profile/08640482899790313232noreply@blogger.comPerth WA, Australia-31.9505269 115.86045719999993-32.8119469 114.56956369999993 -31.0891069 117.15135069999992tag:blogger.com,1999:blog-3584889867142519041.post-86051563793250120672019-03-19T01:54:00.001-07:002019-03-19T01:56:54.185-07:00
RESTful APIs | Provisioning data<br />
Topical content analysis using word counts / tag clouds is a feature in my Head-Up Data project. The tag clouds are presented in HTML pages using visualisation tools. The same data may need to be presented on a mobile device, or it may be requested by other web services. In such cases only the data is requested; how it is presented by a third party is up to them. A web service must therefore provision its data in formats that clients can understand and use.<br />
<br />
Here we look at a set of REST APIs that provision page word-count data. Normally a user would navigate to the app url in a browser, look at the list of web pages, select a web page, examine its list of word counts/clouds and finally select one tag cloud. The REST APIs follow the same flow and are built with Django REST framework.<br />
<br />
First we take a quick look at the user interface for the steps mentioned. Then we look at the REST APIs to achieve the same.<br />
<br />
Web UI presenting a list of pages.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDYGZ6FEOAPz6LBx_u8hmJKPqc_MNlODXJHsiQ6rKwglhV19pNES61aK1tZ_QAhjmhCAwu7slz2l7SdR9DnfLsn9_Grqfpbn0UM8polp4pUMQKkH7TnLYT6XS5lnTn_qnMRRQYGYcGy1c/s1600/html-webpage-list.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1446" data-original-width="1600" height="576" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDYGZ6FEOAPz6LBx_u8hmJKPqc_MNlODXJHsiQ6rKwglhV19pNES61aK1tZ_QAhjmhCAwu7slz2l7SdR9DnfLsn9_Grqfpbn0UM8polp4pUMQKkH7TnLYT6XS5lnTn_qnMRRQYGYcGy1c/s640/html-webpage-list.png" width="640" /></a></div>
<br />
<br />
<br />
<div>
Web interface presenting page details and selecting a cloud from its list</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEism2vkOeTj7NNUjOTwv4GR6Le2x9ZO9Xgxh_gjUrnq2OLA54rqLOAKaXb5cgyGIFcbjT4fwlz4Rk0bR0PtZZZWf4b-eGwCXWKCCZBqJRqPZH2D8qeIHcnmKwSWQxVu8yea8YA1To_mpmE/s1600/ui-abc-news.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEism2vkOeTj7NNUjOTwv4GR6Le2x9ZO9Xgxh_gjUrnq2OLA54rqLOAKaXb5cgyGIFcbjT4fwlz4Rk0bR0PtZZZWf4b-eGwCXWKCCZBqJRqPZH2D8qeIHcnmKwSWQxVu8yea8YA1To_mpmE/s1600/ui-abc-news.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1584" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEism2vkOeTj7NNUjOTwv4GR6Le2x9ZO9Xgxh_gjUrnq2OLA54rqLOAKaXb5cgyGIFcbjT4fwlz4Rk0bR0PtZZZWf4b-eGwCXWKCCZBqJRqPZH2D8qeIHcnmKwSWQxVu8yea8YA1To_mpmE/s640/ui-abc-news.png" width="630" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The REST APIs and their browsable screens are given below. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Having a browsable API helps people learn the APIs faster. Developers can copy the url endpoints to a browser and see the details for themselves.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
When a mobile app or another service accesses these APIs, only the JSON responses are sent back.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
1) The data format used is JSON.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
2) APIs include</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
a) An API index to start with.</div>
<div style="text-align: justify;">
b) API to get the list of available Web pages.</div>
<div style="text-align: justify;">
c) API to get details of a particular web page.</div>
<div style="text-align: justify;">
d) API to get a list of all Word counts.</div>
<div style="text-align: justify;">
e) API to get a specific word count data.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
3) Pagination: Each API that presents a list of data points shows only a fixed number of data points per page. The responses have next and previous links for navigation. This is a better option than sending all the data points with each response.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
4) Linked relations: A related object is always presented as a link. For example, the 'GET web page details' API returns links to the page's word clouds, not the actual word count/cloud data. If the requesting service wants that word count data, it can simply follow the link. On top of reducing the response size, this approach also allows the data to be cached on the client side, say on a mobile device. The url can be used as the cache key.</div>
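<div style="text-align: justify;">
Points 3 and 4 can be sketched in plain Python. This is not the project's code: the envelope keys (count, next, previous, results) mirror the ones Django REST framework's page number pagination uses, while the function name, the cache class and the example urls are hypothetical.</div>

```python
def paginate(items: list, page: int, page_size: int, base_url: str) -> dict:
    """Return one page of items plus next/previous links."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    count = len(items)
    start = (page - 1) * page_size
    end = start + page_size

    def link(number: int) -> str:
        return f"{base_url}?page={number}"

    return {
        "count": count,
        "next": link(page + 1) if count > end else None,
        "previous": link(page - 1) if page > 1 else None,
        "results": items[start:end],
    }


class UrlKeyedCache:
    """Client-side cache that uses the resource url as the key."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: url -> decoded JSON
        self._store = {}

    def get(self, url: str):
        # Only the first request for a url reaches the server.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]


if __name__ == "__main__":
    pages = [f"page-{n}" for n in range(25)]
    print(paginate(pages, page=2, page_size=10, base_url="https://example.com/api/pages/"))
```

<div style="text-align: justify;">
A client walks the list by following next until it is null, and fetches a linked word count only when it is actually needed.</div>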
<br />
API index<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5O1HhHhGmZ01CXMOLarpAkIAF8qTjsycRuUD0XS9xZOKNH99Zb43Y5BgddUiByhcuqoPsBWBd-LKytcdLyQ0glzrardGnvIQOzrvnQ_VdBp9Y2CQYdiTcSEev0fEhU_38VAMG56PJRMo/s1600/api-index.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="470" data-original-width="1478" height="201" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5O1HhHhGmZ01CXMOLarpAkIAF8qTjsycRuUD0XS9xZOKNH99Zb43Y5BgddUiByhcuqoPsBWBd-LKytcdLyQ0glzrardGnvIQOzrvnQ_VdBp9Y2CQYdiTcSEev0fEhU_38VAMG56PJRMo/s640/api-index.png" width="640" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
API for the list of web pages. The JSON response also has next and previous links to navigate the list.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrKxBebdaTBwk9sbU4FFs1Y6Ng1o1UYKUUo2G9kAVJpcE8FZczx_XiPBv563q6TaVBLO5uRlFRc-QRgj6mz43SlKjxLDo9m3IV1gIcT0z1YQonmVh7haXRib2dghZ1e7sLFvZhDY8ZY2A/s1600/navigable+list+of+webpages.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="645" data-original-width="1600" height="257" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrKxBebdaTBwk9sbU4FFs1Y6Ng1o1UYKUUo2G9kAVJpcE8FZczx_XiPBv563q6TaVBLO5uRlFRc-QRgj6mz43SlKjxLDo9m3IV1gIcT0z1YQonmVh7haXRib2dghZ1e7sLFvZhDY8ZY2A/s640/navigable+list+of+webpages.png" width="640" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
API for a single web page. This has links to the page's word clouds.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdR7GbgWDVWJdIQbZp0mg_F2kw86951BCyw1FUjm8LPsCB1agGUn3yEqsLjOELfwT1r0dddGkyRJV0_E9ApCGLlaX-1vpEUu_bilRk76jUlqsqyBFh34dRMR-GOIwMMsiW2w-xOuQO1oo/s1600/abc-news-detail.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="616" data-original-width="1296" height="304" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdR7GbgWDVWJdIQbZp0mg_F2kw86951BCyw1FUjm8LPsCB1agGUn3yEqsLjOELfwT1r0dddGkyRJV0_E9ApCGLlaX-1vpEUu_bilRk76jUlqsqyBFh34dRMR-GOIwMMsiW2w-xOuQO1oo/s640/abc-news-detail.png" width="640" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
API for a specific word count.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCeVMOaU3-LW63vsZzKGc4YrJmp0O1nTE4mBkT_2LExxKg8CzgFeRnzMPDcLT1WelUiximJi9FH_xQazOMOzTEvzZoLOw4ZIZhOEeyNpPL2CeLe_M5PjViZZMYlzUjXZQKQZKkTYaaoH4/s1600/pwc-api-abc-news.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="808" data-original-width="1088" height="473" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCeVMOaU3-LW63vsZzKGc4YrJmp0O1nTE4mBkT_2LExxKg8CzgFeRnzMPDcLT1WelUiximJi9FH_xQazOMOzTEvzZoLOw4ZIZhOEeyNpPL2CeLe_M5PjViZZMYlzUjXZQKQZKkTYaaoH4/s640/pwc-api-abc-news.png" width="640" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />Harisankar Krishna Swamyhttp://www.blogger.com/profile/08640482899790313232noreply@blogger.com0tag:blogger.com,1999:blog-3584889867142519041.post-16604082029057546882019-02-13T01:42:00.001-08:002019-03-12T23:16:42.471-07:00Connection pooling | Database performance | pgBouncer | DB Server side Vs AppServer Side In this iteration a database connection pooling solution is implemented. This enables the application to reuse database connections thus increasing performance at the database end. The pooling software used is <a href="https://wiki.postgresql.org/wiki/PgBouncer" target="_blank">pgBouncer</a>.<br />
<br />
> To compare the load test reports for <b>pgBouncer on DB hosts vs. pgBouncer on the web app server</b>, scroll down to the end of this post.<br />
<br />
The content of this post is discussed in the following video for those who prefer that to reading.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/B2B9cjiovYE/0.jpg" src="https://www.youtube.com/embed/B2B9cjiovYE?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
<br />
The changes on the database side are shown in the application architecture below. Instead of connecting directly to the database, applications connect to pgBouncer.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgA_eu3_53U4_tTWXOF4oYVwi_m7hkem-IH7rKnrh0fEIKRN8poe2xEsuix5WjKmkzSsAUW0anAgoJ0DGxxh_QJ1IBKR7J_QlVcQ8kIHN4tTc1TfYg6hkDVNhu37kt6DYOGS0WoX4H9nLY/s1600/HUD+_+Application+Architecture.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="808" data-original-width="984" height="262" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgA_eu3_53U4_tTWXOF4oYVwi_m7hkem-IH7rKnrh0fEIKRN8poe2xEsuix5WjKmkzSsAUW0anAgoJ0DGxxh_QJ1IBKR7J_QlVcQ8kIHN4tTc1TfYg6hkDVNhu37kt6DYOGS0WoX4H9nLY/s320/HUD+_+Application+Architecture.jpg" width="320" /></a></div>
<br />
<div style="text-align: justify;">
1) Setting up a connection consumes server resources and time. It requires a round trip to the database, authenticating the user and checking user privileges, among other steps. Only then is the query executed. Some client-side frameworks hold on to a connection for a period of time and then release it, i.e. the connection is closed after use. This means that a new request with the same credentials, even from the same client, has to go through the entire process again. This can be avoided by returning the used connection to a pool and serving new requests from that pool of active connections.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
2) Each database server has a configurable limit on concurrent connections. For PostgreSQL this is set with max_connections in postgresql.conf (here /var/lib/pgsql/data/postgresql.conf) and defaults to 100. Beyond this number, connections are refused.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
3) The load on the database can increase when a) the user base grows, resulting in increased activity, b) new application features are added, or c) the application scales out with load-balanced web application servers to meet user load. </div>
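<div style="text-align: justify;">
The reuse described in point 1 can be illustrated with a toy pool. This is a minimal sketch of the idea only, not how pgBouncer is implemented; <code>ConnectionPool</code> and <code>fake_connect</code> are hypothetical names, and the "connection" is a plain object standing in for an expensive connect-and-authenticate round trip.</div>

```python
import queue


class ConnectionPool:
    """Toy pool: hands out already-open connections instead of opening new ones."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the setup cost once, up front

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for reuse rather than closing it.
        self._idle.put(conn)


opened = []


def fake_connect():
    """Stand-in for an expensive connect + authenticate round trip."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(10):  # ten "requests" served one after another
    conn = pool.acquire()
    pool.release(conn)
```

<div style="text-align: justify;">
Ten requests are served while only two connections are ever opened, which is the saving a pooler aims for.</div>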
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Whatever the reason, an application that used to see a maximum of 60 concurrent connections to the database ends up getting 120. Some of those extra connections end up refused at the database end. Pooling solutions also have a maximum client connection limit. The objective is to configure a set number of database connections and use them efficiently. Say we have configured the database to allow 150 concurrent connections. The server may have one or more databases for different users, for example databases A and B for userA and userB respectively. Also assume database A sees more usage than database B. Of the total connections allowed by the database, say 10 are needed for backend database administration and the remaining 140 are free for application servers. On pgBouncer we can set the maximum allowed client connections to, say, 280 (greater than the database maximum) and configure a pgBouncer pool for each user + database pair. In this case we can give the userA + database A pool a size of 90 and the userB + database B pool a size of 50. This regulates the connections that reach the database.</div>
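<div style="text-align: justify;">
The arithmetic in this example can be checked with a few lines of Python. The helper name <code>check_pool_budget</code> is made up for illustration; the figures are the ones used in the paragraph above.</div>

```python
def check_pool_budget(max_connections, reserved_for_admin, pool_sizes):
    """Return the spare server connections; raise if the pools oversubscribe."""
    available = max_connections - reserved_for_admin
    requested = sum(pool_sizes.values())
    if requested > available:
        raise ValueError(
            f"pools need {requested} connections, only {available} available"
        )
    return available - requested


# Figures from the example: 150 allowed, 10 kept for administration,
# and one pool per (user, database) pair.
pools = {("userA", "A"): 90, ("userB", "B"): 50}
spare = check_pool_budget(max_connections=150, reserved_for_admin=10,
                          pool_sizes=pools)
```

<div style="text-align: justify;">
With pools of 90 and 50 the budget of 140 is used exactly, so nothing is left over and nothing is oversubscribed.</div>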
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Adding a pooling solution is an overhead. An immediate increase in throughput is not evident until the application runs into contention for database connections. In that situation, instead of being rejected by the database and failing there, the request is served by the next available connection in the pool. The application is configured to minimise database accesses using caching, so throughput may not be a direct indicator of performance gains. Contention can be simulated only by reducing the cache timeouts, which is not ideal. Finally, connections beyond the maximum configured at pgBouncer are still refused.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b>The question of putting pgBouncer on the application server hosts or on the database</b>: </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This depends on what the deployment needs. For a small application with multiple databases we can host pgBouncer on the application server and save the round trip time to the db hosts. If there are many application servers, we may end up configuring a large number of pooled connections at the source end and still cause a deluge at the database end. This is like the club bouncer stopping you at your doorstep because the headcount at the club has been reached, versus you going to the club and getting stopped there.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b>Aggregate reports</b> from two load test runs are shown below: one with pgBouncer on the DB hosts and one with pgBouncer on the web server hosts. Both tests take 3-5 minutes to finish. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Notice that with pgBouncer on the web server hosts throughput increased by ~21.4% and errors dropped to near 0%. This is due to the time saved by not having to set up a connection to a remote host each time one is needed; only a localhost connection is made. However, the web server now hosts the pgBouncer pools and holds hundreds of pooled connections ready to be used. As already mentioned, multiple web servers with pools like this can deluge the database host.</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Load test results with pgBouncer on DB hosts</div>
<div style="text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhywk6Cv37JGOy0254MhDs1Amnw6fSrEXMs4fOPcXA7sVnVPijQ_qlPdCFB5N6YAqHDa0t10hEIlnxMK8LhiqYvyTuEg2cD_5g0R5APPeVU6zPwarbB0xBLgTptovr_BKJk8Wnt8uZ1sRs/s1600/new-aggregate+report.png" imageanchor="1" style="clear: left; display: inline !important; margin-bottom: 1em; margin-right: 1em; text-align: center;"><img border="0" data-original-height="667" data-original-width="1600" height="260" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhywk6Cv37JGOy0254MhDs1Amnw6fSrEXMs4fOPcXA7sVnVPijQ_qlPdCFB5N6YAqHDa0t10hEIlnxMK8LhiqYvyTuEg2cD_5g0R5APPeVU6zPwarbB0xBLgTptovr_BKJk8Wnt8uZ1sRs/s640/new-aggregate+report.png" width="640" /></a> </div>
<div style="text-align: justify;">
Load test results with pgBouncer on web server hosts</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7fo4kZySk0XG1-vgNpFdcIKFcpQvOMSW8CLDaJKfWRqf9syAg7ulb4z7wDQ5Y59XgV1Q6ELvXG7WTUj2fFOBVh6p0dmjwX83H8t-djctDo7j8Q8NJwYvI6zPCwxDvNeF4TnTHcTOtshk/s1600/aggregate-pg-on-client.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="658" data-original-width="1600" height="262" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7fo4kZySk0XG1-vgNpFdcIKFcpQvOMSW8CLDaJKfWRqf9syAg7ulb4z7wDQ5Y59XgV1Q6ELvXG7WTUj2fFOBVh6p0dmjwX83H8t-djctDo7j8Q8NJwYvI6zPCwxDvNeF4TnTHcTOtshk/s640/aggregate-pg-on-client.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
Harisankar Krishna Swamyhttp://www.blogger.com/profile/08640482899790313232noreply@blogger.comtag:blogger.com,1999:blog-3584889867142519041.post-6274616458727485922018-12-23T04:08:00.001-08:002018-12-23T04:08:38.698-08:00Throughput | Reaching 250 requests per second<div style="text-align: justify;">
In this post we look at how the throughput of my data visualization web application was improved to 267 requests per second in this iteration. This significant improvement comes from additional caching and modifications to the application architecture. Note that the new throughput is for a web application with many more features than before, such as language support and machine learning models.</div>
<div style="text-align: justify;">
<br /></div>
New throughput on load test: 267 requests per second<br />
<br />
<b>Background</b><br />
<br />
Previous JMeter test results with load balanced deployment and model caching are <a href="http://harisankar-krishnaswamy.blogspot.com/2018/06/load-testing-and-insights.html" target="_blank">here</a>. The same load test is used again to see the improvement.<br />
<br />
<br />
The load balanced architecture is described <a href="https://docs.google.com/drawings/d/15teW3_OiFE7dgimsoEpIBYivIevW1ejOET9OEvJHieU/edit" target="_blank">here</a>.<br />
<br />
<b>JMeter test results with new throughput</b><br />
<b> </b><br />
The JMeter response time graph is shown below.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhA7rVrOZl3sZmp_1bIxQdz062WBTfqJioE_rte1nj637SI_feSd4N7Bz_6klpRU_hNLcyZzRTGHzIW10gquCo_04mAKh-g1VIfHgoohdXujc0zsnJtBd_wSX_aJtLxlbqlw7ch__wosxU/s1600/respons-time-graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="893" data-original-width="1600" height="356" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhA7rVrOZl3sZmp_1bIxQdz062WBTfqJioE_rte1nj637SI_feSd4N7Bz_6klpRU_hNLcyZzRTGHzIW10gquCo_04mAKh-g1VIfHgoohdXujc0zsnJtBd_wSX_aJtLxlbqlw7ch__wosxU/s640/respons-time-graph.png" width="640" /></a></div>
<br />
<br />
The JMeter graph results are shown below.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGZPQkJu2OP74U2Sgc82YxhTMS4HRUZRtn-U213HlkyKwjsLLabGIZlAgqO7qh_oRUrPyUPOVhwV8pS62b3-WAFubuU16KzRiL1XDh2GgKkcHd7PBWv4xzTeW1iV4eGqUUTYekTKnErNI/s1600/master-throughput.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1600" height="432" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGZPQkJu2OP74U2Sgc82YxhTMS4HRUZRtn-U213HlkyKwjsLLabGIZlAgqO7qh_oRUrPyUPOVhwV8pS62b3-WAFubuU16KzRiL1XDh2GgKkcHd7PBWv4xzTeW1iV4eGqUUTYekTKnErNI/s640/master-throughput.png" width="640" /></a></div>
<br />
<br />
<b>Techniques so far and modifications applied</b><br />
<b><br /></b>
<div style="text-align: justify;">
The previous caching technique focused on avoiding database hits. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
That employed a custom caching library based on Django models. In addition, static files like JS and images were marked with downstream cache-control headers so that the browser does not download them each time. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
So far so good. However:</div>
<div style="text-align: justify;">
<br /><b></b></div>
<div style="text-align: justify;">
1) The application has scope for improvement, especially since a set of new features has been added, among which language support is prominent. The Django application renders HTML templates to serve to clients. This includes all the HTML templates, such as the navigation bar, user profile and page footer templates. Some template contents have to change based on, say, time, user or user language. However, most templates, once generated for a given combination of these, can remain the same and be reused. That's where template fragment caching comes in. A few things that can trip you up if not understood are:</div>
<ul style="text-align: justify;">
<li>The gain from caching a single template fragment is tiny. The return on investing in template caching only shows as the number of concurrent requests on the application goes up.</li>
<li>Locality of the cache also matters for template fragments. The time saved on each request is small, so even a trip to a cache on a different host can cost more than just building the template. This mandates a local cache and is a modification to the architecture.</li>
</ul>
<div style="text-align: justify;">
2) Each view generates a response based on the request. For many requests the response does not change between calls. For example, the response to a request for 'Word counts for www.cnn.com at 3 PM on 25 Dec 2018' is always going to be the same. Such views need to be identified and cached. This helps with improving throughput.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
3) Finally, one physical aspect of your deployment that can affect performance is thermal throttling of CPUs. It is a good idea to check this too.</div>
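<div style="text-align: justify;">
For the cache locality concern in point 1, one possible arrangement is a dedicated local cache for template fragments. Django's <code>{% cache %}</code> template tag uses a cache alias named <code>template_fragments</code> when one is configured and falls back to <code>default</code> otherwise. The settings below are a sketch under assumptions: the host name is hypothetical, the backend class names depend on the Django version, and this is not necessarily the configuration the project uses.</div>

```python
# settings.py (sketch, not the project's actual configuration)
CACHES = {
    # Shared cache on a separate host, e.g. for model and view caching.
    "default": {
        # Backend class name depends on the Django version in use.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "cache-host.example:11211",  # hypothetical host
    },
    # In-process cache on each app server: no network hop per fragment.
    "template_fragments": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "template-fragments",
    },
}
```

<div style="text-align: justify;">
A local-memory cache is private to each process, so fragments are rebuilt once per worker. That is usually an acceptable trade for avoiding a remote round trip on every render.</div>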
<div style="text-align: justify;">
</div>
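<div style="text-align: justify;">
The view caching in point 2 can be illustrated without Django. In a Django project the built-in <code>cache_page</code> decorator from <code>django.views.decorators.cache</code> does this job; the standard-library sketch below only shows the idea, and <code>cache_response</code> and <code>word_counts</code> are hypothetical names.</div>

```python
import time
from functools import wraps


def cache_response(timeout_seconds):
    """Cache a view's response per argument tuple for timeout_seconds."""
    def decorator(view):
        store = {}  # args -> (expires_at, response)

        @wraps(view)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]  # still fresh: skip the view entirely
            response = view(*args)
            store[args] = (now + timeout_seconds, response)
            return response
        return wrapper
    return decorator


calls = []


@cache_response(timeout_seconds=300)
def word_counts(site, timestamp):
    """Hypothetical view: pretend this hits the database and renders JSON."""
    calls.append((site, timestamp))
    return f"word counts for {site} at {timestamp}"


first = word_counts("www.cnn.com", "2018-12-25T15:00")
second = word_counts("www.cnn.com", "2018-12-25T15:00")
```

<div style="text-align: justify;">
The second call returns the stored response, so the underlying view runs only once for identical requests.</div>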
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
<br /></div>
Harisankar Krishna Swamyhttp://www.blogger.com/profile/08640482899790313232noreply@blogger.comtag:blogger.com,1999:blog-3584889867142519041.post-10535177850879900662018-12-11T00:25:00.002-08:002018-12-11T00:25:49.230-08:00Lang support Dutch, Swedish... for WebApp | HUD<div style="text-align: justify;">
Supporting multiple languages in a Django web application is straightforward. 1) Add LocaleMiddleware to the list of middlewares. 2) Supply language files with translations in <a href="https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html" target="_blank">PO files</a>. 3) Compile the PO files into <a href="https://www.gnu.org/software/gettext/manual/html_node/MO-Files.html" target="_blank">MO files</a>. When a request arrives, Django checks the following places for the required language: the URL for a language prefix, the session, a language cookie and the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language" target="_blank">Accept-Language</a> header, in that order. This order can be seen in the locale middleware code at django > middleware > locale. Once the required language is identified, it is activated.</div>
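<div style="text-align: justify;">
The lookup order can be sketched in plain Python. This is an illustration of the order described above, not Django's implementation: the <code>"language"</code> session and cookie keys and the language codes in <code>SUPPORTED</code> are assumptions made for the example.</div>

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0
        if params.strip().startswith("q="):
            try:
                quality = float(params.strip()[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, tag.strip().lower()))
    # sorted() is stable, so tags with equal weight keep header order.
    ranked = sorted(weighted, key=lambda w: -w[0])
    return [tag for quality, tag in ranked if quality > 0]


def pick_language(path, session, cookies, accept_language, supported,
                  default="en"):
    """Check URL prefix, session, cookie, then header, in that order."""
    prefix = path.strip("/").split("/")[0].lower()
    candidates = [prefix, session.get("language"), cookies.get("language")]
    candidates += parse_accept_language(accept_language)
    for candidate in candidates:
        if not candidate:
            continue
        # Accept "fr-fr" when only "fr" is supported.
        for code in (candidate, candidate.split("-")[0]):
            if code in supported:
                return code
    return default


# Assumed codes for English, Dutch, Swedish, Norwegian and French.
SUPPORTED = {"en", "nl", "sv", "nb", "fr"}
```

<div style="text-align: justify;">
For example, <code>pick_language("/home/", {}, {}, "de;q=0.9, fr-FR;q=0.8", SUPPORTED)</code> skips the unsupported German entry and settles on French.</div>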
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Within the application a user can have a preferred language. This is implemented the same way as preferred timezone. Users can go into their preferences page and choose from a list of supported languages. Currently English, Dutch, Swedish, Norwegian and French languages are supported. Two users with Dutch and Swedish language preferences are shown below.</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKHspzY_E-Rrmug7zsGBPnCtymzl_G0qQlqg9nye9DHu5-RxeIrujNl6kfq7voH7mL_p_lk5gYgzoy-vHdexNwG6ajxueO3Fi-y8jHeYeZ8nLgEiv6JMPQcQ07Q3hkseykSAVG0xDv0pI/s1600/1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1402" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKHspzY_E-Rrmug7zsGBPnCtymzl_G0qQlqg9nye9DHu5-RxeIrujNl6kfq7voH7mL_p_lk5gYgzoy-vHdexNwG6ajxueO3Fi-y8jHeYeZ8nLgEiv6JMPQcQ07Q3hkseykSAVG0xDv0pI/s320/1.png" width="280" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikG-A54dYdhGSHQSRZ3RWmSL2eNDG0fo-ONHiakE87IqsgvQykFek0mHQGJb_mbZPvipJULkRJDaKaiqNB-tkUtHvtqd7UZdOuJ3lprtKVhxHEDpx6EJlbJL5TGd06swmakibzbjIbtWk/s1600/2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1413" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikG-A54dYdhGSHQSRZ3RWmSL2eNDG0fo-ONHiakE87IqsgvQykFek0mHQGJb_mbZPvipJULkRJDaKaiqNB-tkUtHvtqd7UZdOuJ3lprtKVhxHEDpx6EJlbJL5TGd06swmakibzbjIbtWk/s320/2.png" width="282" /></a></div>
<br />
<div style="text-align: justify;">
When a user accesses the application for the first time, the login page is shown in the language given by the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language" target="_blank">Accept-Language header</a> from the browser. Here Chrome was set to French (browser settings are shown at the end) and Opera was left on its default language settings. The login page based on request headers (for the users above) is shown below.</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZVmpJdqSOQhCzISABP7zFitSH4KKnT_PtO9fEzb-jWstXFIGSFb5T3n5uXgHPcW_eXKeJZUsPGyI0Ig9txgq_ENriPea7SzE5_boZ1Y6gHcj-HFQLANz0Yp7LUgayYd8jDCK4SwoCCXg/s1600/4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="584" data-original-width="1600" height="232" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZVmpJdqSOQhCzISABP7zFitSH4KKnT_PtO9fEzb-jWstXFIGSFb5T3n5uXgHPcW_eXKeJZUsPGyI0Ig9txgq_ENriPea7SzE5_boZ1Y6gHcj-HFQLANz0Yp7LUgayYd8jDCK4SwoCCXg/s640/4.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="text-align: justify;">
The following screens show the two users accessing the application after logging in. In both browsers the initial language was the one the browser sent in the request header. Once a user logs in, that user's preferred language is activated. This is shown below, where the preferred language of the user on the left is Dutch and that of the user on the right is Swedish.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYY9QAF4o9mS6rO8kDX3Z15QBVCc40CQiXPaDBacY1prPPNDLD5P5wsamn_mdKmeqgNdqK1XGafdKXdWKBZS54MYcUgx-WCj4ExoQRHY0bfZPnWtT0kMqTNDnxhW-i-DBhUm56GuBDmPg/s1600/5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="905" data-original-width="1600" height="361" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYY9QAF4o9mS6rO8kDX3Z15QBVCc40CQiXPaDBacY1prPPNDLD5P5wsamn_mdKmeqgNdqK1XGafdKXdWKBZS54MYcUgx-WCj4ExoQRHY0bfZPnWtT0kMqTNDnxhW-i-DBhUm56GuBDmPg/s640/5.png" width="640" /></a></div>
<br />
The Chrome language settings with French selected are as follows.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvkYGRtHAKgyxjQVbc3vaazFZ4Ut1UDqMJTYDZPjbGBQsuu8yA1_5D2g_KuK2lRmKo9Ge37nfa7Wyi6C9imEV1mbuhn3lgMQhrGH8-aihxJg0GjWs498viE5VPbyTezcyB6BAr4Qbhjbs/s1600/3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="937" data-original-width="1600" height="231" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvkYGRtHAKgyxjQVbc3vaazFZ4Ut1UDqMJTYDZPjbGBQsuu8yA1_5D2g_KuK2lRmKo9Ge37nfa7Wyi6C9imEV1mbuhn3lgMQhrGH8-aihxJg0GjWs498viE5VPbyTezcyB6BAr4Qbhjbs/s400/3.png" width="400" /></a></div>
<br />
<br />
<b>References</b><br />
<br />
Django docs on translation<br />
<br />
https://docs.djangoproject.com/en/2.1/topics/i18n/translation/Harisankar Krishna Swamyhttp://www.blogger.com/profile/08640482899790313232noreply@blogger.comtag:blogger.com,1999:blog-3584889867142519041.post-67036363246545978812018-11-28T06:28:00.004-08:002018-11-30T02:34:05.154-08:00*Updates with Runtimes* Web Search Engine with Word Counts<div style="text-align: justify;">
Here we look at the implementation of a web search engine. The project already has data on word counts for web pages. Pages added to the project are crawled periodically and their content word counts are stored. This was primarily for generating word clouds and text content analysis. However, the word counts can also be used to build a search index for the set of web pages. Given a bunch of words, the search index can give back the list of pages in which the words occur. In addition, each word count carries the timestamp at which the page was processed. This helps to find more recent occurrences.</div>
<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/Kl4QrnFuoU0/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/Kl4QrnFuoU0?feature=player_embedded" width="320"></iframe></div>
<br /></div>
<div style="text-align: justify;">
<div>
Quick overview of steps involved: </div>
<div>
<br /></div>
<div>
A) Filter word counts to a time period, e.g. the past 4 (or N) days.</div>
<div>
B) Build a trie data structure with the data. </div>
<div>
C) Compress the trie so that it can be held in memory. </div>
<div>
D) For a given search string made up of multiple words, find the set of web pages where the words occur. The compressed trie helps with this. Time complexity is described below. </div>
<div>
E) Find the intersection of the sets of web pages. </div>
<div>
F) Extract required information and send back results. This information includes as in other search engines the full url of the page, time of crawling (word count generation) and a title. </div>
<div>
G) Cache information as necessary to speed up the web view.</div>
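<div>
Steps D to F can be sketched as follows. This is a minimal illustration under assumptions: a plain dict stands in for the compressed trie, and the page records, URLs and field names are invented for the example.</div>

```python
from datetime import datetime


def search(index, pages, query):
    """Return pages containing every query word, most recently crawled first.

    index: word -> set of page ids (a plain dict standing in for the trie)
    pages: page id -> {"url": ..., "title": ..., "crawled_at": datetime}
    """
    words = query.lower().split()
    if not words:
        return []
    # Steps D and E: one page set per word, then their intersection.
    page_sets = [index.get(word, set()) for word in words]
    matches = set.intersection(*page_sets)
    # Step F: attach display info and sort by crawl time, newest first.
    return sorted((pages[p] for p in matches),
                  key=lambda page: page["crawled_at"], reverse=True)


INDEX = {"election": {1, 2}, "results": {2, 3}, "weather": {3}}
PAGES = {
    1: {"url": "https://a.example/", "title": "A",
        "crawled_at": datetime(2018, 11, 26)},
    2: {"url": "https://b.example/", "title": "B",
        "crawled_at": datetime(2018, 11, 27)},
    3: {"url": "https://c.example/", "title": "C",
        "crawled_at": datetime(2018, 11, 28)},
}

hits = search(INDEX, PAGES, "Election results")
```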
<div>
<br />
<b>Runtimes</b> with cProfile are as follows:<br />
<br />
1) <b>Building the trie</b> takes 3.583 seconds for 173693 words. Pickled size is 119.2 MB.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgH_V7JXyez0ay7G_EQRVHECRrXbsskGqaMv7_HcKUefQRc6vAQBcVbiNhOHXdkPi4r5ZOzh6QwP50iFMCMafgGp6FkyuijCRa67oDwEwd1uXGaZvWRwQ42a70iHExM-CypA1NaGewLjmQ/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="539" data-original-width="1600" height="214" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgH_V7JXyez0ay7G_EQRVHECRrXbsskGqaMv7_HcKUefQRc6vAQBcVbiNhOHXdkPi4r5ZOzh6QwP50iFMCMafgGp6FkyuijCRa67oDwEwd1uXGaZvWRwQ42a70iHExM-CypA1NaGewLjmQ/s640/1.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
2) <b>Compressing the trie</b> takes 2.38 seconds and the pickled size is 6.2 MB.<br />
<br />
3) <b>Searching</b>, including fetching the resulting web pages, takes ~4 ms.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFAakEpZzLqUkaAzYGtYac4YKw80vaETWZnVXrC5XgT4tTLYneIWaQxlMLHwddcHa2046JKpawnX3Q-Z-7z97rNkdqKiOrU6JpsEyKEMFGlF8GlLhOm-cbbZLQPSlH2GDTU9jxfe4IHG4/s1600/3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="396" data-original-width="1600" height="156" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFAakEpZzLqUkaAzYGtYac4YKw80vaETWZnVXrC5XgT4tTLYneIWaQxlMLHwddcHa2046JKpawnX3Q-Z-7z97rNkdqKiOrU6JpsEyKEMFGlF8GlLhOm-cbbZLQPSlH2GDTU9jxfe4IHG4/s640/3.png" width="640" /></a></div>
<br /><b></b>
<br />
4) <b>Searching all 10 strings above</b>, including fetching results, takes ~116 ms.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbzCLG7NazTzFd48VhU2kDPOb0czp8hRREiWeWE1nySaiv28c5oJ7hULGDw3z8_59dGPmmFi__RX_MEomig6hmwuCDT89nNGkskuAVYXo9j41B3kBuO-vHm5bSnVZwXc6tZHUEYSNPxr8/s1600/4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="289" data-original-width="1600" height="113" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbzCLG7NazTzFd48VhU2kDPOb0czp8hRREiWeWE1nySaiv28c5oJ7hULGDw3z8_59dGPmmFi__RX_MEomig6hmwuCDT89nNGkskuAVYXo9j41B3kBuO-vHm5bSnVZwXc6tZHUEYSNPxr8/s640/4.png" width="640" /></a></div>
<b> </b> </div>
</div>
<div style="text-align: justify;">
Some screenshots of the engine at work are shown below.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiAU4w7B-Y8jmPYUVOfEjlx52ss0xxR7_9BzYPHSJrXsM1Y0vLMZfavhA27oNQVQfl8d9s2ySXuUh1ss5LrdwDSvf5hGTYOenbdbin4y91bKZq3f_4vjzDQd92UC3bim02bQnNOrPM-KY/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="879" data-original-width="1600" height="348" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiAU4w7B-Y8jmPYUVOfEjlx52ss0xxR7_9BzYPHSJrXsM1Y0vLMZfavhA27oNQVQfl8d9s2ySXuUh1ss5LrdwDSvf5hGTYOenbdbin4y91bKZq3f_4vjzDQd92UC3bim02bQnNOrPM-KY/s640/1.png" width="640" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizH6ZxsOOZ0zj9S4cFBgR5RCtFvSIf6dH2wCL9PRya71BsNtw0A74zkypgdM87qkNqEz-OaKQznAYBhbkYFY6Z2UXwwUfC625v7da-W5a5EjRjE9nhjeV3fnGsnSMYA0Q-aKUqITXtQiA/s1600/2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1458" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizH6ZxsOOZ0zj9S4cFBgR5RCtFvSIf6dH2wCL9PRya71BsNtw0A74zkypgdM87qkNqEz-OaKQznAYBhbkYFY6Z2UXwwUfC625v7da-W5a5EjRjE9nhjeV3fnGsnSMYA0Q-aKUqITXtQiA/s400/2.png" width="363" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh04AaM9HM7RDvXLyy7YHLIisuS5crHOdO1AJMvjfjFZViw_iynoSUaUT2FSW9NS155TgQnRnAg26QlgCdu-b2y2tdeH32DJNnmJdBE5bntc7BSqNAXTl6atxRZveWQcL4lHnSP6r_kU3k/s1600/7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="878" data-original-width="1542" height="227" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh04AaM9HM7RDvXLyy7YHLIisuS5crHOdO1AJMvjfjFZViw_iynoSUaUT2FSW9NS155TgQnRnAg26QlgCdu-b2y2tdeH32DJNnmJdBE5bntc7BSqNAXTl6atxRZveWQcL4lHnSP6r_kU3k/s400/7.png" width="400" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Apart from the trie index the rest of the data is already part of the project database. However the trie is not part of the database. It is generated when required, compressed and held in memory.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Quick overview of tries: At the core of the index is a (compressed) data structure called a trie. A trie is an m-ary tree where each node branches out based on the character encountered in a key. The interesting thing about tries is this: for an alphabet of K unique characters, a node has K+1 pointers. The number of nodes depends on the keys inserted into the trie. For a given trie, if S is the node count, N is the key count and L is the length of the longest key, then a search for any key takes O(L) time, independent of K and N. The storage requirement is (K+1) x S x P bits, independent of N, where P is the number of bits in a pointer.</div>
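<div style="text-align: justify;">
A minimal trie that maps words to page ids might look like the sketch below. It is not the project's code: the class names are illustrative, and each node keeps a dict of children rather than a fixed array of K+1 pointers, which is the more common choice in Python.</div>

```python
class TrieNode:
    __slots__ = ("children", "pages")

    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.pages = set()   # page ids stored where a word ends


class Trie:
    def __init__(self):
        self.root = TrieNode()
        self.node_count = 1  # S in the text; the root counts as a node

    def insert(self, word, page_id):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
                self.node_count += 1
            node = node.children[ch]
        node.pages.add(page_id)

    def search(self, word):
        """Walk one node per character: O(len(word)) regardless of key count."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return set()
        return node.pages


trie = Trie()
trie.insert("news", 1)
trie.insert("new", 2)
trie.insert("net", 3)
```

<div style="text-align: justify;">
The three words share the prefix "ne", so the trie needs only six nodes including the root, and a lookup never touches more nodes than the word has characters.</div>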
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Compressing the trie: Once the trie has been constructed it can be compressed. Multiple techniques such as Patricia tries and de la Briandais trees can be used. However, the project uses a different technique. Any trie with M nodes over a K-character set can be represented by an M x K table, and the table can be shrunk further by storing it as a sparse matrix. The difference in serialised size of the trie index is shown below.</div>
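<div style="text-align: justify;">
One way to picture the table form is a dictionary keyed by (state, character) that stores only the transitions that exist. The sketch below is an assumption about how such a table could be built, not the project's actual compression code; it uses a nested-dict trie with "$" as an end-of-word marker, so it assumes "$" never occurs inside a word.</div>

```python
def build_trie(words):
    """Nested-dict trie; the key '$' marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def to_sparse_table(root):
    """Flatten the trie into {(state, char): next_state} plus accepting states.

    A dense table needs one row per node and one column per character;
    keeping only the transitions that exist is the sparse form of that table.
    """
    transitions, accepting = {}, set()
    next_state = 1
    stack = [(root, 0)]  # state 0 is the root
    while stack:
        node, state = stack.pop()
        for ch, child in node.items():
            if ch == "$":
                accepting.add(state)
                continue
            transitions[(state, ch)] = next_state
            stack.append((child, next_state))
            next_state += 1
    return transitions, accepting


def contains(transitions, accepting, word):
    """Follow one table entry per character, starting at the root state."""
    state = 0
    for ch in word:
        state = transitions.get((state, ch))
        if state is None:
            return False
    return state in accepting


table, finals = to_sparse_table(build_trie(["new", "news", "net"]))
```

<div style="text-align: justify;">
For these three words the sparse form holds five entries, whereas a dense table over a 26-letter alphabet would reserve 6 x 26 cells.</div>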
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For the 79 web pages in the project there can be a minimum of 2 crawls within a week, so ~160 word count data rows for the web pages. Object sizes were also monitored using pympler trackers for Python 3.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Uncompressed Trie size 25628388 bytes ~ 25MB</div>
<div style="text-align: justify;">
Compressed Trie size 21586721 bytes ~ 21 MB</div>
<div style="text-align: justify;">
Compressed Trie with minimum selected data in leaf nodes 7281173 bytes ~ 7.2 MB</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This 7.2 MB trie index can be held in memory or cached. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The search results are in decreasing order of timestamps. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The architecture of the crawler project was discussed previously <a href="https://www.youtube.com/watch?v=OXcx1ZKmN-E" target="_blank">here</a>. Crawling and word counts are executed in celery async tasks. This architecture is shown below.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhupqZcaL0wyWXgHeE_wIAc2aNQRToAS1g_wNwyNwsJMN9tajxq59uHskvbdq9xlNCku-jgRG4I7Mnd6Xz6hJHdXjmQmu5VlEaDTYkG40eOeJLGWD1BUpwJ26mOND1FmQ9ZqRlDBYxkrvg/s1600/HUD+_+Application+Architecture.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="808" data-original-width="984" height="262" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhupqZcaL0wyWXgHeE_wIAc2aNQRToAS1g_wNwyNwsJMN9tajxq59uHskvbdq9xlNCku-jgRG4I7Mnd6Xz6hJHdXjmQmu5VlEaDTYkG40eOeJLGWD1BUpwJ26mOND1FmQ9ZqRlDBYxkrvg/s320/HUD+_+Application+Architecture.jpg" width="320" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Future work: </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
1) Currently a set intersection of the words is used. More options like OR and NOT can be supported using expression trees. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
2) Storing the pages themselves in the filesystem for reference would be great. But this is not feasible at the present disk allowance.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
3) Since the word counts are time stamped, a date time search window option can be given to users. Holding the index over an increased period of time raises the size of the index too.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
4) It would be feasible to rank the pages on a more complex parameter than just timestamps. Relevant visits and counts can be used along with timestamps.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
5) Word edit distance can be used to correct words as in popular search engines.</div>
<div style="text-align: justify;">
<br /></div>
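<div style="text-align: justify;">
For reference, the current behaviour described above (set intersection of the query words, results in decreasing order of timestamps) can be sketched as below. The index layout, a mapping from word to {page: crawl timestamp}, and the function name search are assumptions for illustration, not the project's actual data model.</div>

```python
from datetime import datetime


def search(index, query_words):
    """Return pages containing every query word, newest first.

    index is assumed to map word -> {page_id: timestamp of the crawl}.
    """
    postings = [index.get(word, {}) for word in query_words]
    if not postings:
        return []
    # Set intersection: keep only pages present for every word.
    common = set(postings[0])
    for posting in postings[1:]:
        common &= set(posting)
    # Order by the newest timestamp seen for the page across the query words.
    return sorted(
        common,
        key=lambda page: max(posting[page] for posting in postings),
        reverse=True,
    )


index = {
    "django": {
        "a": datetime(2018, 11, 1),
        "b": datetime(2018, 11, 5),
        "c": datetime(2018, 11, 3),
    },
    "cache": {"b": datetime(2018, 11, 2), "c": datetime(2018, 11, 6)},
}
results = search(index, ["django", "cache"])
```

<div style="text-align: justify;">
Supporting OR and NOT would mean replacing the single intersection step with an evaluation over an expression tree of set operations.</div>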
Harisankar Krishna Swamyhttp://www.blogger.com/profile/08640482899790313232noreply@blogger.comtag:blogger.com,1999:blog-3584889867142519041.post-8841856538847745872018-11-14T07:59:00.000-08:002018-11-14T08:06:46.464-08:00HUD | UI updates to web pages and earth quake visualisations<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/7-TdaN0ufAg/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/7-TdaN0ufAg?feature=player_embedded" width="320"></iframe></div>
<br />Harisankar Krishna Swamyhttp://www.blogger.com/profile/08640482899790313232noreply@blogger.com0tag:blogger.com,1999:blog-3584889867142519041.post-53385370295286734592018-11-06T09:00:00.004-08:002018-11-06T09:01:59.552-08:00Django custom caching library v2<div style="text-align: justify;">
In a <a href="http://harisankar-krishnaswamy.blogspot.com/2017/10/django-models-i-will-cache-you-if-i-can.html" target="_blank">previous post</a> we looked at a very early version of a caching library used in my Django project. It has since been enhanced with new features as requirements came up. Although the library grew out of practical requirements, its two primary apis are documented well, so that the user is aware of what the library handles well and can avoid performance degradation. The main purpose of coding up this library has been to keep caching code DRY. Compared to the previous version there are no changes to the models. There are three additions.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
i) Prefetched relation support</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Django documentation on Prefetch is available <a href="https://docs.djangoproject.com/en/1.10/ref/models/querysets/#prefetch-objects" target="_blank">here</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In Django it is common practice to prefetch related objects while querying a model. While this is a good idea, it can seriously degrade performance by adding O(N) sql queries, where N is the number of prefetched rows. To address prefetching, both apis accept a tuple of Prefetch objects, not the prefetch related names. The reason is that Prefetch objects allow more control over what is prefetched. This helps with performance, especially when using the .only(*fields) api of the queryset as shown below.</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5NshgdEiHeSKVp7eDoFyFo-tv5VeDol47_YKivCXARTPQooOdMVLyeNshtKGiNzya9bR_pA63mf9tM-h6p9UAt_CQlqg30ZJ-uv7wcqmU80mAZnuL4BHDIauNqpcQ-9dJmWJeZM2lOZ4/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="164" data-original-width="1600" height="65" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5NshgdEiHeSKVp7eDoFyFo-tv5VeDol47_YKivCXARTPQooOdMVLyeNshtKGiNzya9bR_pA63mf9tM-h6p9UAt_CQlqg30ZJ-uv7wcqmU80mAZnuL4BHDIauNqpcQ-9dJmWJeZM2lOZ4/s640/1.png" width="640" /></a></div>
<br />
<div style="text-align: justify;">
In the code we want to get a web page and prefetch its related page word counts. We control which columns are needed from the prefetched relation, PageWordCount, using a queryset. Then we pass the Prefetch to the api. This is important for caching, as too much prefetched data will not only consume memory at the database and web server but also cause Django to silently fail when the data is set to memcached. Memcached has a configurable object size limit, 1 MB by default. Notice the foreign key reference to web page in the only fields. </div>
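<div style="text-align: justify;">
One way to notice oversized values before they are silently dropped is to measure the pickled size first. The helper below is only a rough sketch and is not part of the library: it assumes values are serialised with pickle and that the default 1 MB limit is in effect, and it ignores the key and per-item overhead.</div>

```python
import pickle

# Assumption: memcached is running with its default 1 MB item size limit.
MEMCACHED_ITEM_LIMIT = 1024 * 1024


def fits_in_memcached(value, limit=MEMCACHED_ITEM_LIMIT):
    """Approximate check: is the pickled value within the item size limit?"""
    return len(pickle.dumps(value, pickle.HIGHEST_PROTOCOL)) <= limit


small_ok = fits_in_memcached({"url": "https://example.com", "words": ["a", "b"]})
large_ok = fits_in_memcached(b"x" * (2 * 1024 * 1024))
```

<div style="text-align: justify;">
A check like this can be logged during development to catch prefetches that pull in more columns than intended.</div>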
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In order to understand the pitfall that causes extra sql to be fired, we need to understand how Django handles prefetch. On the primary relation Django brings in the web pages and then uses an IN SQL query to bring in the PageWordCounts. It then does the join in Python, i.e. it tries to find the PageWordCounts that belong to each WebPage. For that it needs the foreign key field. If the field is not mentioned in only(*fields), Django will send out an sql query for exactly that field, for each prefetched row. </div>
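<div style="text-align: justify;">
The Python-side join can be pictured with plain dictionaries. This is a simplified illustration and not Django's implementation; the column name web_page_id and the function attach_prefetched are made up for the example.</div>

```python
from collections import defaultdict


def attach_prefetched(pages, word_counts):
    """Group child rows under their parent using only the foreign key value."""
    by_page = defaultdict(list)
    for row in word_counts:
        # 'web_page_id' plays the role of the foreign key column. If it were
        # left out of the fetched columns, reading it here would need one more
        # lookup per row, which is where the extra queries come from.
        by_page[row["web_page_id"]].append(row)
    return {page["id"]: by_page.get(page["id"], []) for page in pages}


pages = [{"id": 1}, {"id": 2}, {"id": 3}]
word_counts = [
    {"web_page_id": 1, "word": "django", "count": 3},
    {"web_page_id": 1, "word": "cache", "count": 5},
    {"web_page_id": 2, "word": "trie", "count": 2},
]
joined = attach_prefetched(pages, word_counts)
```

<div style="text-align: justify;">
The grouping needs nothing from the child rows except the foreign key value, which is why that one column has to be part of the only fields.</div>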
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Prefetch support in the other api is shown below. Here we are pre-loading the cache with a list of all WebPages. This is a better example of where forgetting the above point will cost a lot.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiF8YgfS9p9NGYI4vuVdDJF4Y5zr_ic1nekpbUh1xZDXnfiROqCsVHlWIZO49umPxoHDeMiKbrdd8H7MYtuz5b8xlZYXGs5KIYv5k697w17PSfKv9rQDS64sNKJsGxsjiXhracqyPG6GoQ/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="193" data-original-width="1600" height="76" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiF8YgfS9p9NGYI4vuVdDJF4Y5zr_ic1nekpbUh1xZDXnfiROqCsVHlWIZO49umPxoHDeMiKbrdd8H7MYtuz5b8xlZYXGs5KIYv5k697w17PSfKv9rQDS64sNKJsGxsjiXhracqyPG6GoQ/s640/1.png" width="640" /></a></div>
<br />
The api signatures are shown below. The first one allows fetching rows based on fields; the cache entry is set based on the specified fields. The second fetches all rows.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjR4hMkI9bh7JmyK5h2CzW4vEaO5Phyphenhyphen0oLcfxTjCXvIS4doCyLX8t8l7fC991Sqpc06y4b4obyQ3-Y0O9PLbXSLbyyV1LX0Ie13f1kXmGjLrd25zWpPq5iJpyw71L-VOFU_p7Lw6PC2bYw/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="426" data-original-width="1600" height="169" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjR4hMkI9bh7JmyK5h2CzW4vEaO5Phyphenhyphen0oLcfxTjCXvIS4doCyLX8t8l7fC991Sqpc06y4b4obyQ3-Y0O9PLbXSLbyyV1LX0Ie13f1kXmGjLrd25zWpPq5iJpyw71L-VOFU_p7Lw6PC2bYw/s640/1.png" width="640" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOQ2GtErg8fXCZ4Si1_uGCXecTJSXurIOE1U2O3GSkP_aW-PmHF7gS_blssB3t4-Aas_ehQ6U_FgPycsV5o9jVoMn7zlQL6QrJWHQ9aRcnUXlYGQK0Hsd7kNwRmYmG0cagfb7bM9lrWfY/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="345" data-original-width="1600" height="138" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOQ2GtErg8fXCZ4Si1_uGCXecTJSXurIOE1U2O3GSkP_aW-PmHF7gS_blssB3t4-Aas_ehQ6U_FgPycsV5o9jVoMn7zlQL6QrJWHQ9aRcnUXlYGQK0Hsd7kNwRmYmG0cagfb7bM9lrWfY/s640/1.png" width="640" /></a></div>
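<div style="text-align: justify;">
The idea behind the first api (build a cache key from the given fields, then fall back to the database on a miss) might look roughly like this. The names make_cache_key, get_or_fetch and DictCache are hypothetical stand-ins and are not the library's signatures. DictCache only mimics the get and set calls of a cache backend so that the sketch runs on its own.</div>

```python
class DictCache:
    """In-memory stand-in offering the get/set calls used below."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value


def make_cache_key(model_name, fields):
    """Build a deterministic key from field names and values.

    Field names are sorted so the same fields always give the same key.
    Values are assumed to contain no spaces or control characters.
    """
    parts = [f"{name}={fields[name]}" for name in sorted(fields)]
    return f"{model_name}:" + ":".join(parts)


def get_or_fetch(cache, model_name, fields, fetch):
    """Return the cached value, or call fetch(**fields) and cache the result."""
    key = make_cache_key(model_name, fields)
    value = cache.get(key)
    if value is None:
        value = fetch(**fields)
        if value is not None:
            cache.set(key, value)
    return value


calls = []


def fetch_page(**fields):
    # Stands in for a database query; records each call for the demo.
    calls.append(fields)
    return {"url": fields["url"], "title": "Example"}


cache = DictCache()
first = get_or_fetch(cache, "WebPage", {"url": "https://example.com"}, fetch_page)
second = get_or_fetch(cache, "WebPage", {"url": "https://example.com"}, fetch_page)
```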
<br />
<br />
<div style="text-align: justify;">
ii) select_related</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Django doc on this is <a href="https://docs.djangoproject.com/en/2.1/ref/models/querysets/#select-related" target="_blank">here</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This is a simple forwarding of the required fields. It is similar to prefetch but for one-to-one and foreign key relations.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
iii) Chunked bulk updates to memcached</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Once all the rows are fetched using the all_ins_from_cache api, we have a list of instances. This list can be huge. The api loops through the list and sets the individual cache entries using set_many. However, set_many was silently failing with 100-120 entries, possibly due to the large amount of data being passed in a single call. To avoid this, the instances list is broken into manageable chunks and each chunk is passed to set_many. The chunk size can be configured.</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhA4W6k6dwVTiphgUvWCDkHe-02vKx2GQNRN2sI8A708D-dtDt24HdEpUp5go_jmjc4BH1AwmvxxTudqIfLr05HZhi54rOQf76jiqRcFO4AeAuC4AgopZ_8qy4UhSPJdqW2SZK0q_fXtxA/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="178" data-original-width="1294" height="88" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhA4W6k6dwVTiphgUvWCDkHe-02vKx2GQNRN2sI8A708D-dtDt24HdEpUp5go_jmjc4BH1AwmvxxTudqIfLr05HZhi54rOQf76jiqRcFO4AeAuC4AgopZ_8qy4UhSPJdqW2SZK0q_fXtxA/s640/1.png" width="640" /></a></div>
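<div style="text-align: justify;">
A minimal version of the chunking could look like the sketch below. It is not the library's code: chunked, set_many_in_chunks and the default chunk size of 50 are assumptions, and RecordingCache is a stand-in that only records how many entries each set_many call received. Django's cache.set_many returns the keys that failed to be inserted on backends that support it, so the sketch collects that return value.</div>

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def set_many_in_chunks(cache, entries, chunk_size=50, timeout=300):
    """Call cache.set_many once per chunk; return any keys reported as failed."""
    keys = list(entries)
    failed = []
    for chunk in chunked(keys, chunk_size):
        result = cache.set_many({key: entries[key] for key in chunk}, timeout)
        failed.extend(result or [])
    return failed


class RecordingCache:
    """Stand-in cache that records the size of every set_many call."""

    def __init__(self):
        self.call_sizes = []

    def set_many(self, data, timeout=None):
        self.call_sizes.append(len(data))
        return []


demo_cache = RecordingCache()
demo_entries = {f"WebPage:id={n}": {"id": n} for n in range(120)}
demo_failed = set_many_in_chunks(demo_cache, demo_entries, chunk_size=50)
```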
<br />
<br />
The resulting library is more usable in the Django project data set. Cache set/get code is more sophisticated and helps to keep code DRY.<br />
<br />
<br />Harisankar Krishna Swamyhttp://www.blogger.com/profile/08640482899790313232noreply@blogger.comtag:blogger.com,1999:blog-3584889867142519041.post-45494457594799404642018-10-17T10:57:00.000-07:002018-10-17T12:42:25.935-07:00HUD | Enable/Disable Django Apps<div style="text-align: justify;">
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/3hck-3yM2b8/0.jpg" src="https://www.youtube.com/embed/3hck-3yM2b8?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
This post describes how Django apps in a project can be enabled/disabled via settings. The requirement is to be able to enable/disable Django apps with flags. If an app is enabled in the project then it is loaded and its urls and templates are available to users. On the other hand, if an app is disabled then its templates and urls are not available to the user.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For example, in the screenshot below the word cloud app is enabled on the left. The app is available on the navbar and homepage. The deployment on the right does not have the word cloud app enabled. Notice that the templates have adjusted themselves based on configuration.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1QwpH7g5x3Wbc6CDAKRz3mFzBhaIopO4GePd2oa0JEj8zmj6idh_olYay64Cv9t-zTDn1IdRQsxNrquoJOln9zd487XJpFaPJax07Dew8AJjyg3cWJWAWOpxGyV1BVboisn2xAmFQFIs/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="847" data-original-width="1600" height="338" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1QwpH7g5x3Wbc6CDAKRz3mFzBhaIopO4GePd2oa0JEj8zmj6idh_olYay64Cv9t-zTDn1IdRQsxNrquoJOln9zd487XJpFaPJax07Dew8AJjyg3cWJWAWOpxGyV1BVboisn2xAmFQFIs/s640/1.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Preventing Django from loading an app is easy: just do not add it to the list of INSTALLED_APPS in Django settings. However, the root url conf, ROOT_URLCONF, will need to be changed accordingly. Also, url references in templates cannot be enabled/disabled using just INSTALLED_APPS. Editing templates and root url confs to tailor them for a specific deployment is not recommended. This creates additional effort as that particular deployment will need to be tracked and maintained separately.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A better implementation is to specify whether an app is enabled or not and have the project load accordingly. For each app we need its</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
a) Switch name: This tells the rest of the project whether the app is enabled or disabled, i.e. a flag like wordcloud_enabled to check against. This is particularly useful for templates. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
b) Url regex: This is the base url pattern for the app. For example, all urls for the articles app will have the "articles/" base pattern as prefix, and the url for posting articles can be https://www.server.com/<b>articles/</b>post</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
c) Urls module: The Python module that holds the app's url patterns. In the above example the url patterns for posting, editing and deleting articles etc are specified in this module.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
These 3 details are easily configurable in named tuples as shown.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVJ8AODaeU15zEMIqosjyLnVfhbR6AeWqwarcCcjD1JSBKMXp7KDu3zty2S7x8x-29lleZOhNMySARzwkiK67rSdkeDgv8gG5ZDfj0-hjmwrh2jobAyLPObE-L90yw6i_tul04Lo9nfQY/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="108" data-original-width="1520" height="44" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVJ8AODaeU15zEMIqosjyLnVfhbR6AeWqwarcCcjD1JSBKMXp7KDu3zty2S7x8x-29lleZOhNMySARzwkiK67rSdkeDgv8gG5ZDfj0-hjmwrh2jobAyLPObE-L90yw6i_tul04Lo9nfQY/s640/1.png" width="640" /></a></div>
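<div style="text-align: justify;">
In text form, such a configuration might look like the sketch below. The field names, the APP_FLAGS dict and the example apps are assumptions for illustration and may differ from the screenshot above.</div>

```python
from collections import namedtuple

# One tuple per app: the three details listed above.
Application = namedtuple("Application", ["switch_name", "url_regex", "urls_module"])

APPLICATIONS = [
    Application("wordcloud_enabled", r"^wordcloud/", "wordcloud.urls"),
    Application("ml_enabled", r"^ml/", "ml.urls"),
]

# Assumption: the on/off state lives in a separate settings dict keyed by switch name.
APP_FLAGS = {"wordcloud_enabled": True, "ml_enabled": False}


def enabled_applications(applications, flags):
    """Return only the applications whose switch is turned on."""
    return [app for app in applications if flags.get(app.switch_name, False)]


enabled = enabled_applications(APPLICATIONS, APP_FLAGS)
```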
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
1) Controlling apps in templates: </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Users should not be able to view or use urls of a disabled app. In order to achieve this templates should know whether an app is disabled. This is made possible using a list of Application tuples and a template context processor. The switches tell the template whether an app is enabled or not. The context that contains the app's state is generated from the Application tuples and made available by a context processor. </div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpKEoB1lrK40LPfgUdi7TRXZ0_f5a8S28GPG7x7K9h-6hANxoT2GCJ1GRpxxXPyBOyZrjnUzIYylHC67Q7UZHG1As-FVi3pfsW2HxfaWbYxU3SaHisIBfwNQ7OOsiCF6vQIX7uve4JkeY/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="226" data-original-width="1600" height="90" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpKEoB1lrK40LPfgUdi7TRXZ0_f5a8S28GPG7x7K9h-6hANxoT2GCJ1GRpxxXPyBOyZrjnUzIYylHC67Q7UZHG1As-FVi3pfsW2HxfaWbYxU3SaHisIBfwNQ7OOsiCF6vQIX7uve4JkeY/s640/1.png" width="640" /></a></div>
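<div style="text-align: justify;">
A context processor is a callable that takes the request and returns a dict that is merged into the template context. A minimal sketch of one that exposes the switches is given below. The names app_switches, APPLICATIONS and ENABLED_SWITCHES are assumptions and the code in the screenshot above may differ.</div>

```python
from collections import namedtuple

Application = namedtuple("Application", ["switch_name", "url_regex", "urls_module"])

# Hypothetical configuration; in a real project this would live in settings.
APPLICATIONS = [
    Application("wordcloud_enabled", r"^wordcloud/", "wordcloud.urls"),
    Application("ml_enabled", r"^ml/", "ml.urls"),
]
ENABLED_SWITCHES = {"wordcloud_enabled"}


def app_switches(request):
    """Context processor: expose every app switch as a boolean template variable."""
    return {
        app.switch_name: app.switch_name in ENABLED_SWITCHES
        for app in APPLICATIONS
    }


# The request argument is unused here, so None is enough for a quick check.
context = app_switches(None)
```

<div style="text-align: justify;">
To take effect, a function like this is listed by its dotted path under the context_processors option of the TEMPLATES setting.</div>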
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In this solution, so long as the templates are as modular as possible, i.e. utilising template hierarchy to separate template fragments, we can enable/disable parts of the user interface. The solution becomes as simple as the following check in a navbar template for the ml (machine learning) application.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxGFa8ZZLCQCSANbrD7nxMknoZKdxDcnNHGBIIporBWSg97m-AhdnHQWtLNkKV8p2vmtpPXzCZcW3dw5r7N7VvX0geWFIY6kmPzqBZ8gvCixL0X6WvZosWVJj4VXWfnBegookdxXN4RXI/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="298" data-original-width="1210" height="155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxGFa8ZZLCQCSANbrD7nxMknoZKdxDcnNHGBIIporBWSg97m-AhdnHQWtLNkKV8p2vmtpPXzCZcW3dw5r7N7VvX0geWFIY6kmPzqBZ8gvCixL0X6WvZosWVJj4VXWfnBegookdxXN4RXI/s640/1.png" width="640" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Notice that the template uses the flag 'ml_enabled' to check whether the ml app is enabled. Each application needs a flag that describes its state. This flag/label is also configurable.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
2) Controlling apps in the root url conf: </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For each enabled application we need the url regex and the urls module for the application. These are added to urlpatterns in root url conf. This is shown below.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOFuWWwPwkHzsW2hrbFHJ-JZZGJhmuLL4BOOj__FsZyPHTil-jn9h4C30FbvfarwRJigewyie-4wzIxP3vEnt-zeLIg_6Rl-eMfZrVSkhMcNfhYT_4ookR3PSOHgwRVefnwJaaIUmTbwM/s1600/1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="158" data-original-width="1600" height="62" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOFuWWwPwkHzsW2hrbFHJ-JZZGJhmuLL4BOOj__FsZyPHTil-jn9h4C30FbvfarwRJigewyie-4wzIxP3vEnt-zeLIg_6Rl-eMfZrVSkhMcNfhYT_4ookR3PSOHgwRVefnwJaaIUmTbwM/s640/1.png" width="640" /></a></div>
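<div style="text-align: justify;">
The selection step can be sketched without Django as below. The function enabled_routes and the sample data are assumptions, and the commented lines at the end show only one way the pairs might be turned into url patterns in a project's urls.py.</div>

```python
def enabled_routes(applications, enabled_switches):
    """Return (url_regex, urls_module) pairs for the enabled apps only.

    applications is a list of (switch_name, url_regex, urls_module) tuples.
    """
    return [
        (url_regex, urls_module)
        for switch_name, url_regex, urls_module in applications
        if switch_name in enabled_switches
    ]


applications = [
    ("wordcloud_enabled", r"^wordcloud/", "wordcloud.urls"),
    ("ml_enabled", r"^ml/", "ml.urls"),
    ("articles_enabled", r"^articles/", "articles.urls"),
]
routes = enabled_routes(applications, {"wordcloud_enabled", "articles_enabled"})

# In the root url conf the pairs could then be appended, for example:
#   from django.urls import include, re_path
#   urlpatterns += [re_path(regex, include(module)) for regex, module in routes]
```

<div style="text-align: justify;">
Because disabled apps never reach urlpatterns, their urls are not routed in that deployment.</div>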
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
</div>
Harisankar Krishna Swamyhttp://www.blogger.com/profile/08640482899790313232noreply@blogger.com0