Tuesday, 24 April 2018

Implementing Captchas

"Completely Automated Public Turing Tests to Tell Computers And Humans Apart (CAPTCHAs) are challenge-response tests used to determine whether or not the user is a human" -- Wikipedia

These are used to ensure that a human is indeed in front of the screen making a web request. Sensitive parts of application are protected with captchas. A form that adds data into a database may need to be protected from accepting automated postings. Without protection, a stream of automated posts can not only just swamp the application but can also fill the database/disk with valid junk. For example, gibberish in a feedback form. 

Contemporary captchas are usually one of the following

1) One image or a set of images with a challenge question
An image will have text in it. The user has to type the text into a box. Another type is a set of images and the user has to pick a subset of the images. Say you are shown 4 images and there are animals in only 2 of them. You have to select the correct ones with the animals.

2) A click request on a box
3) An audio playback 
4) Even a short video playback can be used

The last two have storage and bandwidth impact on a web application.


A pluggable captcha that can be used in any Django web application. A mechanism to add and configure captchas with challenges. Once a captcha is added, the system must pick it from there. Captchas have to be one level up in difficulty. i.e something more than just 'enter the text', although these can also be used. 


A single image captcha with additional semantic requirements is implemented. A reusable Django app 'captchas' holds the model, form etc to select and process captchas. The template can be included in any HTML form. The default display is as a Bootstrap4 card. How and where this card renders on a form is up to the page designer. Django views just need to send a form in responses for GET and process the submitted form in  a POST. The validation of the captcha is isolated in its form.

The add web page functionality in HUD application is protected with these captchas. This implementation can not only ask for just the text but can also ask for anything based on the image. This iteration includes captchas with challenges like 

- Type the text
- Type only the numbers seen in image
- Type only the letters
- Type the last four letters in reverse

Examples are

Or anything that can be inferred from the image i.e the challenge is configurable as shown. In this iteration basic colored images were used. Using strike throughs, blurs and other effects and on the images can further confuse models. It is also important to change the size of the image as it will slightly increase processing cost.


1) There is a one-to-many relation between the images and challenges. With many images and challenges this approach can mitigate the effect of a sweatshop. A captcha image will show up with a different challenge thus mitigating image signature based attacks. If an attacker is getting past the security then it has to be on expensive discipline.

2) There are online free captcha services that can be easily integrated  to sites. However, these tend to have one or another pattern. The popular services may have already been subjected to continuous automated machine learning to created models. Such models are posed with a custom unfamiliar challenge thus making it difficult.

3) Ability to change the challenge over time allows for reuse. This is because it is the challenge that can hold a semantic requirement on a static image.

4) Even if the captcha images are harvested from the application, the challenge remains unknown. The challenge on a harvested image can be changed to a more complicated question.