Monday, 16 December 2013

Cloud Computing Application Development Paradigm: App Engine From Google

App Engine is an execution environment as a service. Behind the scenes it is a superset of the other 'as a service' models, namely infrastructure as a service, software as a service and platform as a service. Developers just build and deploy, while the software, infrastructure and platform are taken care of implicitly.

In a traditional software deployment scenario, the production servers are prepped with application server software, databases, user accounts, recovery/backup in the form of standby mirrors and so on. App Engine deployment needs none of that. Users still get to provision websites and web apps that scale with demand. Unlike on Amazon's cloud, developers do not get hold of a virtual machine. Without a hardware or virtual instance, there is nothing to install separately for production. Memcache and load balancing are built in to App Engine. The database is replaced by the High Replication Datastore. A separate web server is out of the question because handling URLs is what App Engine does best; everything is built around handling URLs, much like a web server. Cron jobs, task handlers and back-end instances are all triggered and handled via URL handlers. All in all, this brings an application development paradigm targeting an execution environment in the cloud.
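To make the "everything is a URL handler" idea concrete, here is a minimal sketch in plain Python WSGI. It does not use the App Engine SDK; the paths (`/cron/report`, `/tasks/step`) and handler names are made up for illustration. The point is only that a page view, a cron job and a queued task all arrive the same way, as a request for a URL.

```python
# Hypothetical handlers: each returns (status, body) for one kind of work.
def home(environ):
    return "200 OK", "hello"


def nightly_report(environ):
    # In this sketch a scheduler would simply request /cron/report.
    return "200 OK", "report built"


def run_task(environ):
    # In this sketch a task queue would simply request /tasks/step.
    return "200 OK", "task done"


# One routing table covers web pages, cron jobs and background tasks.
ROUTES = {
    "/": home,
    "/cron/report": nightly_report,
    "/tasks/step": run_task,
}


def app(environ, start_response):
    """Plain WSGI application that dispatches on the request path."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        status, body = "404 Not Found", "no handler"
    else:
        status, body = handler(environ)
    start_response(status, [("Content-Type", "text/plain")])
    return [body.encode("utf-8")]
```

Any WSGI server can host `app`; on a platform like App Engine the scheduling and queueing services would be the ones issuing the requests.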

Since apps do not get a virtual machine in the cloud on which they can install anything they wish, they need to stick to the ground rules and use the tools and community guidelines to be fast, reliable and scalable. Otherwise they will not cut it. Serving web pages and performing offline tasks like scheduled reports or upgrades to data store models all require that apps stick to the rules. For example, if an app breaches its memory limit it is terminated. Web apps run on process instances rather than machine instances. These process instances are one of the core enablers for scaling with requests. Instances come in different types with different capacities for memory, CPU, web request/response size, memcache size and offline data storage including logs. Instances can go from 128 MB and 600 MHz to 1024 MB and 2.4 GHz. Choosing among these options directly impacts your app's performance. Billing depends, where applicable, on the services in which you overrun the quota you have already paid for.

On a lighter note, the billing mechanism can be summed up like this. You rent a Ferrari. But then you pay an extra 10 cents for every turn of each tyre if you go over 40 miles per hour. And you pay 20 cents if you reverse and 15 cents for every oscillation of the windscreen wiper. Every now and then there is a chance that your Ferrari will be terminated and restarted remotely. The point is, the billing and execution environment can be frustrating at first (possibly eternally) if not understood correctly.

The data store will experience contention if an app throws quite a lot of stuff into it sequentially. Contention leads to timeouts. App Engine does retry data store ops itself, but it is better to have a retry in app logic too, with a back-off mechanism. This is officially advised as well. Data models are really flexible, but queries are learned during development and App Engine readies the indices beforehand.
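An app-level retry with back-off could look roughly like the sketch below. It is plain Python and not the App Engine API: `TransientError` is a hypothetical stand-in for whatever timeout or contention exception the data store call raises, and the attempt count and delays are arbitrary defaults.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a data store timeout or contention error."""


def retry_with_backoff(op, max_attempts=5, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Call op() until it succeeds, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller decide what to do
            # Exponential back-off: 0.1s, 0.2s, 0.4s ... capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

Usage would be something like `retry_with_backoff(lambda: save(entity))`, where `save` is your own write function. The operation should be safe to repeat, since a write that timed out may still have gone through.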

Task recovery in case of failure becomes priority one, if it was not already. With servers in warehouse-scale data centers you will see failures far more often than with servers in house, mainly due to the sheer numbers involved. If you have 10 high-end machines in house, the chances of one failing within a year could be remote. But in a cloud data center with 10000+ servers, the probability of some server going down or needing replacement/maintenance just goes up. One start-up noticed over the years that its Amazon instances had an average life span of 200 days! What about App Engine then? We are dealing with process instances, so disruptions are pretty much normal and immediate. Instances, especially back ends, will be terminated without much warning. So either way, having your processes running on machines/instances somewhere far away means that there will be disruptions. Apps can register shutdown event handlers to do something before going down, but there is no guarantee that these will be triggered.
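To put rough numbers on the in-house versus data center comparison above, here is a back-of-the-envelope calculation. The 1% yearly failure chance per server is purely an assumption for illustration, as is the idea that servers fail independently of each other.

```python
def p_any_failure(n_servers, p_single):
    """Chance that at least one of n independent servers fails in the period."""
    return 1.0 - (1.0 - p_single) ** n_servers


def expected_failures(n_servers, p_single):
    """Average number of failed servers in the period."""
    return n_servers * p_single


P_SINGLE = 0.01  # assumed yearly failure chance per server, not a measured value

for n in (10, 10000):
    print(n, round(p_any_failure(n, P_SINGLE), 4), expected_failures(n, P_SINGLE))
```

Under these assumptions, 10 machines give roughly a 1 in 10 chance of seeing any failure in a year, while 10000 machines make a failure a practical certainty, with about 100 expected per year. The individual server is no worse; there are just many more of them.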

Again, if an app is running a huge task, say collecting information from the web to process offline and deliver to the client later, it needs to be able to recover from the point where the task failed due to a cloud infrastructure problem. This makes you do two things in particular on App Engine: break up tasks for independent execution, and handle breaks using the platform mechanism. App Engine allows this with Tasks and Task Queues. The idea is that apps must break a huge task into manageable units which run independently. Otherwise they risk continuous disruptions from quota overruns such as memory limits, request timeouts or data store contention. One or the other.
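The "recover from the point of failure" part can be sketched as a unit of work that reads a saved offset, processes one chunk, and only then records progress. This is plain Python with a dict standing in for persistent storage; `run_unit`, `checkpoint` and `handle` are illustrative names, not platform APIs.

```python
def run_unit(items, checkpoint, handle, chunk_size=100):
    """Process one chunk starting at the saved offset.

    Returns True if more work remains after this unit.
    """
    start = checkpoint.get("offset", 0)
    chunk = items[start:start + chunk_size]
    for item in chunk:
        handle(item)
    # Progress is saved only after the whole chunk succeeded, so a crash
    # mid-chunk means the chunk is redone from its beginning.
    checkpoint["offset"] = start + len(chunk)
    return checkpoint["offset"] < len(items)
```

Each call is small enough to stay within memory and time limits, and a restart simply calls `run_unit` again with the same checkpoint. Note that items from a half-finished chunk get handled twice, so `handle` should be safe to repeat.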

One robust solution is to have task chains. Each task does a small amount of work and, before finishing, queues the next task to be executed. Failed tasks in a queue will be retried. App Engine prevents the same task from being triggered over and over again by using tombstoned task names. Simply put, it remembers a task name, say for a day, and does not allow another task to be queued under the same name. So imagine your taskA queued taskB and then failed for some reason before spawning the rest of the tasks, taskC and taskD. When taskA is retried it won't be able to add taskB again. taskB will be executed from the queue as and when its time comes. There are a host of retry controls, like token buckets and back-offs.
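Here is a toy model of a task chain with tombstoned names, to show why a retried task cannot queue its successor twice. It is an in-memory simulation, not the App Engine Task Queue API: `ToyQueue`, `TombstonedNameError` and `make_step` are invented for this sketch, and unlike a real queue it never forgets a used name.

```python
from collections import deque


class TombstonedNameError(Exception):
    """Raised by the toy queue when a task name was already used."""


class ToyQueue:
    """In-memory stand-in for a task queue that remembers used task names."""

    def __init__(self):
        self._pending = deque()
        self._used_names = set()  # the tombstones

    def add(self, name, func):
        if name in self._used_names:
            raise TombstonedNameError(name)
        self._used_names.add(name)
        self._pending.append((name, func))

    def run_all(self):
        while self._pending:
            _name, func = self._pending.popleft()
            func(self)


def make_step(index, total, log):
    """Build link number `index` of a chain with `total` links."""
    def step(queue):
        log.append("task%d" % index)  # the small unit of work
        if index + 1 < total:
            try:
                queue.add("task%d" % (index + 1),
                          make_step(index + 1, total, log))
            except TombstonedNameError:
                pass  # an earlier attempt already queued the next link
    return step
```

Running `q = ToyQueue(); q.add("task0", make_step(0, 3, log)); q.run_all()` walks the chain once. If the first step is then invoked again, as a retry would do, its attempt to add `task1` is refused, so the rest of the chain is not duplicated.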