Monday 16 December 2013

Cloud Computing Application Development Paradigm: App Engine From Google

App Engine is an execution environment as a service. Behind the scenes this is a super set of all the other 'as a service' models namely infrastructure as a service, software as a service and platform as a service. Developers just build and deploy while all the software, infrastructure, platform and the like are addressed implicitly. 

In a traditional software deployment scenario, the production servers are prepped with application server software, databases, user accounts, recovery/backup in the form of standby mirrors etc. Here app deployment does not need any of that. Users still get to provision websites and web apps which will scale with demand. Plus, developers do not get hold of a virtual machine like Amazon cloud. Without a hardware/virtual instance, developers need not install anything separately for production. Memcache and load balancing is built-in to App Engine. Database is replaced by the high replication data store. Web server is out of the question because handling urls is what App Engine does best, everything is built around handling urls (web server like). Cron jobs, task handlers and back-end instances are all triggered and handled via url handlers. All in all, this brings an application development paradigm targeting an execution environment in the cloud.

Since Apps do not get a virtual machine in the cloud on which you can install anything apps wish, applications need to stick to ground rules, use the tools and community guidelines to be fast, reliable and scalable. Otherwise Apps will not be able to cut it. Serving web pages, performing offline tasks like scheduled reports, upgrades to data store models, all require that Apps stick to the rules. For example if an app breaches its memory limit it is terminated. Web apps run on process instances rather than machine instances. These process instances are one of the core enablers for scaling with requests. Instances are of different types with different capacities for memory, CPU, web request/response size, memcache size and offline data storage including logs. Instances can go from 128MB 600MHz to 1024 MB 2.4GHz. Choosing these options directly impacts your app's performance. Billing depends on, where applicable, the services in which you overrun your quota above what you already paid for. 

On a lighter note the billing mechanism can be summed like this. You rent a Ferrari. But then you pay extra 10 cents for every turn of each tyre if you go over 40 miles per hour. And you pay 20 cents if you do a reverse and 15 cents for every oscillation of the windscreen wiper. Every now and then there is a chance that your Ferrari will be terminated and restarted remotely.  The point is, billing and execution environment can be frustrating at first (possibly eternally) if not understood correctly.

The data store will experience contention if an App throws quite a lot of stuff into it sequentially. Contention leads to timeouts. But, App Engine retires data store ops as such. But it is better to have a retry in app logic too with a back off mechanism. This is officially advised too. Data models are really flexible. But queries are learned during development and App Engine readies indices before hand. 

Task recovery mechanisms in case of failure take priority one, if it was not already that. Servers in warehouses have a higher failure probability than servers in house. This is mainly due to the sheer numbers involved. If you have 10 high end machines in-house the chances on one failing within a year could be remote. But on a cloud data center with 10000+ servers the probability of servers going down or needing replacement/maintenance just goes up. As a start up noticed over years that their Amazon instances' life span is on average 200 days! What about App Engine then? We are dealing with process instances so disruptions are pretty much normal and immediate. Instances especially back ends will be terminated without much warning. So either way having your processes running on machines/instances somewhere far away means that there will be disruptions. Apps can register for shutdown event handlers to do something before going down. Also there is no guarantee that this will be triggered. 

Again if Apps are running a huge task say collecting information from the web to process offline and then deliver to the client later; Apps need to address disruptions in the sense that apps need to be able to recover from the point where a huge task failed due to a cloud infrastructure problem. This makes you do two things in particular on App Engine, breakup tasks for independent execution and handle breaks using the platform mechanism. App Engine allows this with Tasks and Task queues. The idea is Apps must break a huge task into manageable units which will be run independently. Otherwise apps risk continuous disruptions from quota overrun such as memory limitations or request timeouts or data store contention. One or the other. 

One robust solution is to have a task chains. Tasks will do a small amount of work and before finishing, queue the next task to be executed. Failed tasks in a queue will be re-tried. App Engine prevents triggering the same task over and over again by using Tombstoned task names. Simply, it remembers a task name say for a day and does not allow a task to be queued in the same name. So imagine your taskA triggered taskB and taskA failed for some reason without spawning the rest of the tasks taskC and taskD. When it comes back online it won't be able to add taskB. taskB will be executed from the queue as and when its time comes. There are a host of retry mechanisms like token buckets and back offs.

Tuesday 26 November 2013

Google AppEngine BackEnds in Python

AppEngine backends allow us to perform long term, memory and processor intensive work. As the name suggests these are for back-end type work loads. Say for example you are crawling and scraping websites to collect price points from multiple sources to serve later or just doing a report. These types of tasks can be done on the back end. AppEngine backends are separate instances from the front end instances. In terms of quotas, this means a lot. Instead of clubbing all your work on the front-end instances, you can do work on the back end and save hours on the front end. 

An important thing to bear in mind is that AppEngine backends share URL and code with your main app. Backends are part of your app but seperate instances. They are accessible via their names plus the main front end app as shown in the screenshot below. All the urls that your front end app exposes are available to the backends too. For example if your AppName is backendstest2013, you access the app as backendtest2013.appspot.com. And, if you have a back end named backendone, you trigger/invoke the urls on the backend as backendone.backendtest2013.appspot.com/<same URL here>

There are two types of back-ends dynamic and resident. Dynamic backends are started on the first request and stay alive for the request. After some idle time, they are taken down. Resident backends are always on. 

Here is an example of a python backend for AppEngine. A zip of source code is available here. The folder structure for this example is shown below. Project folder is backend_trial.

The App takes two URLs "/" and "/scripttest". The handlers are defined in app.yaml as shown here
There is a controller servlet like handler defined in backendtrial.py. This handler for the main page or default app page just shows some settings. The handler for "/scripttest" inside backendtrial.py is shown below.
AppEngine backends for python are configured in backends.yaml. The file is shown below.

We declare one instance of class B1. It is dynamic and publicly available for access from outside the main App. Class puts limits on your CPU and memory power. B1 is least with 128MB and 600MHz. Class B8 has 1024MB and 4.2GHz. 

The code for handling URL "/scripttest" is shown below. We will access this from both the app instance and back end instance. The code just prints the backend name from the backends.yaml file, if the code is executed from a backend.
Upload the app using command
$ appcfg.py update backend_trial

Upload backend info using command
$ appcfg.py backends backend_trial update

Test the main app and back end by Accessing the app using URLs
1) Main App
2) Backend call
Source for the project is available ->> here

Thursday 24 October 2013

A coder post : Link inversion traversal to scan tree in constant space

This is a coder's post and discusses a binary tree traversal technique.

Binary tree is a popular data structure and there are a couple of popular tree traversal techniques. One based on recursion. This includes pre-order, in-order and post-order traversals. The second by using an auxiliary data structure to travel the tree level by level. A pre-order traversal visits a node first then its left child and then its right child. For the example tree below, we visit in pre-order NY then Chicago then Seattle. If it was post order we would see the nodes/places as we travel the tree in this order Chicago, Seattle, NY.



A recursive pre-order traversal code is shown below. i.e we visit a node first then, its left and right children in that order.
The second technique uses a stack as an auxiliary data structure to hold the next nodes to visit. This algorithm pops a node of the stack. Visits the node and pushes its children onto the stack; right child first. To start off push the root of tree on to the stack. For the example tree, we push NY onto the stack. Then, until the stack is empty we pop a node i.e NY here off the stack. Visit NY and then push Seattle and Chicago onto the stack. And repeat. You get the idea.

Recursion has memory, performance overhead for recursive function calls. The auxiliary data structure also has memory overhead. Link inversion traversal goes through the tree as if the links form a solid wall. i.e Imagine walking through your home with one hand always touch the wall. You would have followed the wall(s) one after the other. Similarly link inversion starts from the root walks through the links as if they are walls. An example shown below starts at New York and ends at New York. The exclamation shows the visits. Each node is visited thrice. One each for the pre, post and in-order concepts. (Refer the code below).



Link inversion traversal, uses the tree pointers themselves to store backtracking  information. i.e while going down the tree the links are reversed to aid back track instead of an additional data structure. The links will be restored on the way back through the tree thus leaving the tree as it was. Link inversion traversal needs a couple of extra node references. Also, a bit field on each node to signal the direction we went after first looking at the node. So link inversion generally is referred to as scanning the tree in constant space.  A link inversion code snippet is shown below. The full code for the post is available at the end.


Profiling performance: Link inversion traversal is definitely faster than a pre-order traversal.
Input: A balanced binary search tree of 300+ city names. The link inversion code shows up faster on Netbeans profiler. Notice the recursive time and link inversion time in the screen shot below.


Source code for Link inversion traversal is available at the link below:
https://drive.google.com/folderview?id=0BxhHg0qy5gi4dTlWUC1KU0RUUUU&usp=sharing

References:
1) Data structures and their algorithms by Harry R. Lewis & Larry Denenberg.



Thursday 12 September 2013

Big Data Analytics: Using Python UDFs for better managing code and development

Apache Pig is a data flow scripting language. We specify how we want the data to be grouped, filtered etc. It generates hadoop mapreduce jobs corresponding to pig statements. But Pig does not have functionalities of a general purpose programming language. User defined functions help by allowing us to extend Pig. We can specify our data manipulation along with Pig operators using UDFs. For example if we are filtering data, we can write a UDF to do this our way. Then, ask Pig to use it by saying FILTER by. In a previous post we saw how to write User defined functions in Java.  As mentioned in the post, since Hadoop and pig are already in Java it make sense to write a UDF in Java. Also it was mentioned that writing a UDF in Python could mean less effort and better code management.  

For Java we create a jar file which contains our code for the UDF. We register that jar with Pig. As the number of UDFs goes up and changes become frequent even in development stage this adds to the effort. Changing the code, testing, jar it, then see how it works with pig. This could end up taking a lot of time.(Refer the previous post).

To save time on development iterations, we can embed Apache pig in python scripts. We have seen this too. UDFs can also be in Python which make things a lot easier. There is our pig script embedded in a python file. UDFs can be in the same file or as seperate modules in the same project. The net effect is that you avoid the overhead of heading over to another project to change and jar your UDF. There is also the case of specifying schemas of your UDF's output. This is extra code in Java but, in python it boils down to annotations. Basically less code to write and maintain. Deployment also becomes easier. Python UDFs and simplicity involved are demonstrated below.

There are two ways to use python and pig.

Examples: We have a dump of data from seismic sensors. We need to find all the locations where there has been an earthquake of magnitude > 5 and we want the number of such quakes over the data. We filter data a UDF and we use another UDF to align data properly.  Python UDFs are used are for sake of demo. The full code for the project is available here. Notice that everything is under one project. In the code there are two folders QuakeDataRunner and QuakeDataRunner2 which demonstrate both these approaches.

A) One is to seperate the python (UDF) and the Pig script that used the python UDFs.
In this case, import the UDF file into the Pig script using jython. Pig uses the internal Jython engine for this purpose. The files are shown below. First the Python UDF is as follows


The Pig script which uses the UDF above is as follows
Run the code as follows
If you have pig in your PATH you can run it as pig -x <pig_file> from the folder.

B) Embedding Pig script in Python
Here also the program structure does not change. All that needs to be done is to use the Java wrappers for Pig in Python. This code is also run with pig command. The UDF file is the same as above, but the main program is a Python script with Pig as follows

Run the code as follows. As before you can add pig to your PATH to avoid referring to the pig executable in every command.


Click for Code used this post

References:
1. Programming Pig by Alan Gates.
2.Embedding Pig in scripting Languages by Julien Le Dem.

Sunday 25 August 2013

Graph Databases: Analysing a wide variety of datasets, relationships, events, behavior - Part 1

Graph databases allow us to analyse and query connected data. Doing the same with relational databases although possible becomes cumbersome. Relationships are at the centre of Graph modelling and are themselves data. Consider the information that 'Charlie called Betty over phone'. We can model this using relational data or others. The question always when data modelling is what queries do you want to support. If we expand the above example to include sms, emails (with CC, BCC) and then we want to find who called Charlie and email'ed Betty from the Finance department to which they returned the call/email? The relations emailed, called, sms'ed are easy to model in graphs. This type of data modelling is applicable to real world scenarios such as communication analysis, behavior analysis, networks, asset management and social networks. 

Another example on social networks. Suppose we want to find who are friends with Charlie, who like Heineken, watched the movie 'Avatar' and also friends with  Betty? This is where Graphs are better at modelling the data. In a graph database you deal with entities/nodes, relationships and attributes. Both nodes and relationships can have attributes. Attributes are simply key-value pairs. Have a look at the code that creates the example graph below in this post. Based on your query you walk the graph to find nodes that satisfy your criteria. Social networks such as Facebook, Google Plus and Linkedin use such datasets to suggest connections, recommendations and serving ads.   

The following is an example of a small graph of 10 people, their likes and connections to one another. We use Neo4j graph database and its query interface Cypher for this example. The entities involved are listed below.

a) List of friends: Hari, Catherine, Ted Lawson, Brandon Bringle, Barney Bringle, Jamie Lawson, Vickie Lawson, Harriet Bringle, Julie Crawford, Mathew Alan. 

b) Other entities:
Hobbies: Coding, Walking
Food: Burger, Icecream
Beverage: Heineken, Coke
Sport: Badminton, Rugby



c) Sample relationships for our small social network

Catherine LIKES Badminton, Coding, Walking
Catherine is Friends with Jamie and Hari on the network.

Ted Lawson LIKES rugby, Ice cream 
Ted is Friends with Vickie, Catherine and Brandon.
......etc. You get the idea. The Cypher code to create this graph is listed at the end.

d) We can run queries such as "Who are friends with Hari and like Coding?".
Who are friends with Hari, Mathew and like coding and Icecream?

Screen shots: a) Creating the database on Neo4j


b) Running a query1 mentioned above

c) Running query2 discussed above

Sample Code: a) Creating the database

CREATE (Hari {name: "Hari", born: "1980"}),
(Cath {name: "Catherine", born: "1982"}),
(Ted {name: "Ted Lawson"}),
(Bran {name: "Brandon Bringle"}),
(Barney {name: "Barney Bringle"}),
(Jamie {name: "Jamie Lawson"}),
(Viki {name: "Vickie Lawson"}),
(Harriet {name: "Harriet Bringle"}),
(Julie {name: "Julie Crawford"}),
(Matt {name: "Mathew Alan"}),

(Hobby1 {name: "Coding", type: "tech"}),
(Food1 {name: "Veggie Beany Burger", type: "food"}),
(Beverage1 {name: "Heineken", type: "beverage"}),
(Sport1 {name: "Badminton", type: "game"}),

(Hobby2 {name: "Walking", type: "activity"}),
(Food2 {name: "Black Current", type: "icecream"}),
(Beverage2 {name: "Coke", type: "beverage"}),
(Sport2 {name: "Rugby", type: "game"}),

(Cath)-[:LIKES]->(Sport1), (Cath)-[:LIKES]->(Hobby1), (Cath)-[:LIKES]->(Hobby2),
(Cath)-[:FRIEND_WITH]->(Hari), (Cath)-[:FRIEND_WITH]->(Jamie),

(Ted)-[:LIKES]->(Sport2), (Ted)-[:LIKES]->(Food2), 
(Ted)-[:FRIEND_WITH]->(Viki), (Ted)-[:FRIEND_WITH]->(Cath),  (Ted)-[:FRIEND_WITH]->(Bran),

(Bran)-[:LIKES]->(Food1),  (Bran)-[:LIKES]->(Food2), (Bran)-[:LIKES]->(Beverage1),
(Bran)-[:FRIEND_WITH]->(Hari), (Bran)-[:FRIEND_WITH]->(Ted),  (Bran)-[:FRIEND_WITH]->(Harriet),

(Barney)-[:LIKES]->(Beverage1), (Barney)-[:LIKES]->(Beverage2),
(Barney)-[:FRIEND_WITH]->(Hari), (Barney)-[:FRIEND_WITH]->(Ted),  (Barney)-[:FRIEND_WITH]->(Harriet),
(Barney)-[:FRIEND_WITH]->(Cath),

(Jamie)-[:LIKES]->(Hobby1), (Jamie)-[:LIKES]->(Food2),
(Jamie)-[:FRIEND_WITH]->(Hari), (Jamie)-[:FRIEND_WITH]->(Viki),  (Jamie)-[:FRIEND_WITH]->(Harriet),
(Jamie)-[:FRIEND_WITH]->(Julie),

(Viki)-[:LIKES]->(Hobby1), (Viki)-[:LIKES]->(Food2), (Viki)-[:LIKES]->(Food1),
(Viki)-[:FRIEND_WITH]->(Hari), (Viki)-[:FRIEND_WITH]->(Jamie),  (Viki)-[:FRIEND_WITH]->(Harriet),
(Viki)-[:FRIEND_WITH]->(Julie),

(Harriet)-[:LIKES]->(Food2), (Harriet)-[:LIKES]->(Food1),
(Harriet)-[:FRIEND_WITH]->(Matt), (Harriet)-[:FRIEND_WITH]->(Jamie),  (Harriet)-[:FRIEND_WITH]->(Julie),

(Julie)-[:LIKES]->(Food2), (Julie)-[:LIKES]->(Food1),(Julie)-[:LIKES]->(Hobby2), (Julie)-[:LIKES]->(Beverage2),
(Julie)-[:FRIEND_WITH]->(Matt), (Julie)-[:FRIEND_WITH]->(Jamie),  (Julie)-[:FRIEND_WITH]->(Harriet),

(Matt)-[:LIKES]->(Food1), (Matt)-[:LIKES]->(Beverage2),
(Matt)-[:FRIEND_WITH]->(Harriet), (Matt)-[:FRIEND_WITH]->(Julie);


b) Query 1
START hari=node(*) MATCH (hari)<-[:FRIEND_WITH]-(friends), (friends)-[:LIKES]->(Hobby)
WHERE Hobby.type! = 'tech'
RETURN DISTINCT friends.name;


c) Query 2
START hari=node(*) MATCH (hari)<-[:FRIEND_WITH]-(friends), (friends)-[:LIKES]->(Hobby), (friends)-[:LIKES]->(Food), (friends)-[:FRIEND_WITH]->(target)
WHERE target.name! = "Mathew Alan" AND Food.type! = "icecream" AND Hobby.name! = "Walking"
RETURN DISTINCT friends.name; 


References:
1) www.neo4j.org




Friday 9 August 2013

Celatum Pro is now on Google Play: The power of encryption in the palm of your hand for 2 Dollars


Android app on Google Play
Celatum Pro is the commercial version of my android app Celatum. Celatum Pro offers a lot more than the free version.  Check it out ;-)

A) Celatum Pro feature List

1) Choose from range of encryption algorithms and key sizes during setup.
2) Type and save encrypted notes.
3) Send/Communicate your encrypted data to your friends as GMail attachments.
4) Your communication i.e attachments are protected using one-time session keys.
5) Maintain a list of trusted contacts with whom you can communicate securely using Celatum Pro.
6) Send your Celatum Pro handle to friends so that they can send you secure notes.
7) Receive your Friend's Celatum Pro details so that you can send them secure notes.
8) Change your password.
9) Erase your encryption keys and delete your data files.

B) What is better in Celatum Pro compared to the free version?

1) Celatum Pro allows you to select from a range of encryption algorithms and key sizes during setup. Check the available algorithms, key sizes below.

2) It also sports a better UI. Notable is the fluid bouncing list.

3) Works on tablets*. 

4) Celatum Pro uses the original Bouncy Castle Open Source Java Security Library for encryption. The free version of Celatum used Google's (Android) modified version of Bouncy Castle. i.e this uses the UN-modified version of Java Security library from Bouncy Castle.
 
C) The security options available are

I) Encryption algorithms available

1) AES
2) TwoFish
3) Serpent

These work on 256 bit keys within Celatum Pro.

II) Hash algorithms
SHA256, SHA 384, SHA 512, RIPEMD256, RIPEMD320, Whirlpool, MD5

III) RSA Key sizes
1024, 2048, 3072, 4096

D) Celatum Pro support email is celatum.pro2013@gmail.com.

E) "Celatum Pro How To" Videos are available on Youtube at channel
http://www.youtube.com/playlist?list=PLdlYcCzFdEQZOgv9KLCAVvkewkHnxF515

F) Celatum Pro FAQ is at
https://docs.google.com/file/d/0B45t0ZTBY2rASEdWX3VYTkRmUEk/edit?usp=sharing

G) Celatum Pro License Agreement is at
https://docs.google.com/document/d/1XnOZ58nLqtmgmm-BWmfLdG9UZe-Vu9DbpAUESTjxaA8/edit?usp=sharing

H) Note:

* Testing on tablets was done using simulator and all functionalities were found satisfactory.
* Celatum is priced differently in different countries.
* Currently only English version is available for the app. Other language packs will be made available soon.

I) Want to buy Celatum Pro Source Project from the developer? Get in touch on the email provided!

Screens:






Keywords: Encryption, Celatum, notes, secure, send, rsa, aes, twofish, ripemd, sha, security, android, app, communication.

Monday 29 July 2013

Defensive coding : A Little effort to avoid a bug

Programmer's version of Murphy's law: Any code that can go wrong, will go wrong.

Defensive coding means you put a little bit of effort to think how your code will be used and when/where it can fail. Think of interfaces to your code. Then avoid the buggy situations. You think of where your code can fail while you write it. Test suites can help once your code takes some shape and also when you make changes to your API. 

An example: Suppose your code is returning an array or a collection. In the following code the list to be returned is initialised to an empty list. This helps to avoid a NullPointerException.How?

class SomeName
{
   ArrayList<String> values;
   
  public SomeName()
  {
     //List with 0 elements
     values = new ArrayList<String>();
  }//constructor 

   public void doWork()
  {
     //We do some work which generates an array.
   }//

   public String[] getResults()
   {
        return values;
   }//
}//class

If the list object was not initialised and it was not created, the calling code could fail. Suppose the calling code did this. The null value returned (if it was never set) would fail on list.size(). list would be null.

//Calling code
list = someNameObj.getResults();
for(int i = 0; i < list.size(); i++)
{
   //..............
}//for


If this is buried in tons of code with re-throws (does happen) then we have lost time tracking it. As trivial as this might seem, this scenario comes up frequently during reviews and even in standard APIs. A small foresight can avoid a nasty bug.  Defensive programming helps save time as you move forward in your project.

Sunday 7 July 2013

Securing data on AppEngine datastore and other clouds. Encryption? Then what?

Recently there is a lot of discussion on programming sites about encrypting data on cloud environments like App Engine. This post is a Sunday evening rumination on the topic and a small experiment on the concept. Test code is available at the end of the post. 

Assuming the following,

1) You have decided to bear the extra computing to store encrypted data in your AppEngine datastore table.
2) You have decided to bear the extra space and computing that may be needed to subsequently process your encrypted data. After all you just don't want to store the data. You want to be able to access it on some query.

If we encrypt the stuff we are putting on to AppEngine like infrastructure, how do we compare data on tables to answer queries? A thought is to hash the data bytes too and store it in a shadow table. Then use a one dimensional pattern recognition algorithm to compare.



Hashing your data: Suppose you have 10 bytes of data. You encrypted it and put it on your AppEngine table. Hash the 10 bytes taking 2 consecutive bytes at a time. i.e if your data is say "abcdef" hash(a,b) then hash(b,c) and so on. Store this sequence of hash on another table possibly with a foreign key  reference. If you use MD5 hashing then, it mean you end up with 9 * 16 bytes of hash for 10 bytes of data. But as per the assumption you have decided to go this far to protect data on your cloud. This type of hashing (using signature of) 2 consecutive characters/bytes comes in Donald Knuth's art of programming book.


Comparing data now comes down to comparing the hashes generated above. Hashes of the same 2 bytes should match. So how similar are the two hashes can be answered by one dimensional pattern recognition? After hashing the original table data and input data, we have 2 byte sequences with negative, positive and zero values. They can be compared by finding the maximum contiguous sum on each byte sequence.  If the two sums are same or near then the underlying data is also same or similar. This throws up the question of what does 'near' mean.

A small test: We have two strings. We generate a hash for each of the strings. The hash is a sequence of bytes. It is obtained by applying MD5 hash on every 2 consecutive bytes of the strings. From the output we can see that this could work. But, how to compare two byte sequences and its implementation on the underlying database also comes into question.

Snapshot of hashing code is here. This type of using signatures for document matching is mentioned in Knuth's art of programming.

Snapshot of finding maximum contiguous sum in a byte array

Complete source code is available here.
https://drive.google.com/folderview?id=0BxhHg0qy5gi4SzVWdXJfS2NGOWs&usp=sharing

Sample input strings
Sample output
So maybe we can have a secure library for Appengine? What about comparison operators like < & >?

Thursday 4 July 2013

Android UI: Scrollable Edit text with OK Cancel Buttons at the bottom

This post is part of the Android UI series. While designing my app I was searching for a GUI which had a) an edit box where user can type in stuff. More than one line i.e a multi line edit text box. b) The edit text box is scrollable c) A bottom panel or footer of 2 buttons which can say OK or Cancel.

One unique requirement I have seen on stackoverflow is this edit text box to grow as much as possible but stops short of blocking the panel of buttons. i.e the buttons will be shown irrespective of the amount of text in the box.The UI needed is as show below.

a) UI with buttons panel. Edit Text has not filled the view adequately.

b) UI with scrollable Edit Text and Button panel. Text box has a lot of content but will not block the Button panel.

The layout is as follows
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
              xmlns:tools="http://schemas.android.com/tools"
              android:layout_width="match_parent"
             android:layout_height="match_parent" >
    <LinearLayout
          android:orientation="vertical"
          android:layout_width="fill_parent"
          android:layout_height="wrap_content">
         
        <LinearLayout
              android:layout_width="fill_parent"
              android:layout_height="wrap_content"
            android:orientation="horizontal">
       
        <TextView
               android:id="@+id/note_taker_title_prompt"
               android:layout_width="wrap_content"
               android:layout_height="wrap_content"
               android:paddingLeft="6dip"
              android:paddingRight="6dip"
              android:paddingTop="6dip"
               android:text="@string/note_taker_title_prompt" />
           
            <EditText
                android:id="@+id/notetaker_title_et"
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:layout_marginBottom="4dp"
                android:layout_marginLeft="4dp"
                android:layout_marginRight="4dp"
                android:layout_marginTop="16dp"
                android:gravity="center"
                android:singleLine="true" />
        </LinearLayout>
       
   <LinearLayout
          android:orientation="vertical"
          android:layout_width="fill_parent"
          android:layout_height="wrap_content">
       
        <EditText
            android:id="@+id/notetaker_text_et"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:scrollHorizontally="false"
            android:scrollbars="vertical"
            android:layout_weight="1"
            android:layout_marginBottom="4dp"
            android:layout_marginLeft="4dp"
            android:layout_marginRight="4dp"
            android:layout_marginTop="16dp"
            android:singleLine="false" />
   
            <LinearLayout
                      android:layout_width="fill_parent"
                      android:layout_height="wrap_content"
                      android:orientation="vertical"
                      android:gravity="center">
              <View
                      android:layout_width="match_parent"
                   android:layout_height="1dip"
                   android:background="#393b3e"
                   android:layout_alignParentTop="true"/>
                 <LinearLayout
                    android:layout_width="fill_parent"
                    android:layout_height="wrap_content"
                    android:gravity="center_vertical"
                    android:orientation="horizontal">

                    <Button
                        android:id="@+id/notetaker_cancel_btn"
                        style="?android:attr/borderlessButtonStyle"
                        android:layout_width="wrap_content"
                        android:layout_height="wrap_content"
                        android:layout_gravity="center_horizontal"
                        android:layout_weight="1"
                        android:gravity="center"
                        android:paddingLeft="6dip"
                        android:paddingRight="6dip"
                        android:paddingTop="6dip"
                        android:text="@string/get_btn_back" />

                     <View
                        android:layout_width="1dip"
                        android:layout_height="match_parent"
                        android:background="#393b3e"
                        android:layout_centerHorizontal="true"/>

                    <Button
                        android:id="@+id/notetaker_save_btn"
                        style="?android:attr/borderlessButtonStyle"
                        android:layout_width="wrap_content"
                        android:layout_height="wrap_content"
                        android:layout_gravity="center_horizontal"
                        android:layout_weight="1"
                        android:gravity="center"
                        android:paddingLeft="6dip"
                        android:paddingRight="6dip"
                        android:paddingTop="6dip"
                        android:text="@string/note_taker_save_btn" />

                    </LinearLayout>
                   
                <View android:layout_width="match_parent"
                         android:layout_height="1dip"
                         android:background="#393b3e"
                         android:layout_alignParentBottom="true"/>   
        </LinearLayout>
        </LinearLayout>   
</LinearLayout>
</RelativeLayout>

Android UI: Flat Halo dark buttons like Sony Xperia ion

This is also part of the Android UI series. Here we see how to get the look and feel of the black Halo flat buttons seen on Sony ion or Xperia phones. Quite good looking buttons.

The objective UI is as follows. Notice the Back button at the bottom.

The layout xml for this button is as follows.
<?xml version="1.0" encoding="utf-8"?>   
        <LinearLayout
                android:id="@+id/about_footerlayout"
                android:layout_marginTop="3dip"
                android:layout_height="wrap_content"
                android:orientation="vertical"
                android:layout_width="fill_parent"
                android:gravity="center">
                <View
                    android:layout_width="match_parent"
                    android:layout_height="1dip"
                    android:background="#393b3e"
                    android:layout_alignParentTop="true"
                />
                <Button
                    android:id="@+id/about_back_btn"
                    style="?android:attr/borderlessButtonStyle"
                    android:layout_width="match_parent"
                    android:layout_height="wrap_content"
                    android:layout_gravity="center_horizontal"
                    android:gravity="center"
                    android:paddingBottom="6dip"
                    android:paddingLeft="6dip"
                    android:paddingRight="6dip"
                    android:paddingTop="6dip"
                    android:text="@string/get_btn_back" />
        </LinearLayout>

Android UI: Activity to show a license agreement

This post is also part of the Android series. Here we see a way to show license agreements on android UI with OK Cancel or Accept Reject buttons at the bottom. The license text ofcourse is scrollable and must occupy the left over space. We also throw in a check box for the user to disable the screen after reading/accepting the agreement.

The end UI looks like this
The layout Xml file is as follows. Notice the textview inside scroll view and the checkbox. There are suggestions on web to not use LinearLayout in a nested fashion. But, this one works quite well and is reasonably fast on a virtual device too.

<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
        xmlns:tools="http://schemas.android.com/tools"
        android:orientation="vertical"
        android:layout_width="fill_parent"
        android:layout_height="match_parent">
  
        <ScrollView
                android:id="@+id/license_agree_scrollView"
                android:layout_width="fill_parent"
                android:layout_height="wrap_content"
                android:layout_weight="1"
                android:fillViewport="true">
           
                    <TextView
                        android:id="@+id/license_agree_textview"
                        android:layout_width="match_parent"
                        android:layout_height="match_parent"
                        android:paddingLeft="6dip"
                        android:paddingRight="6dip"
                        android:paddingTop="6dip" />
            </ScrollView>
   
    <LinearLayout
        android:orientation="vertical"
        android:layout_width="fill_parent"
        android:layout_height="wrap_content">
   
          <LinearLayout
            android:orientation="vertical"
            android:layout_width="fill_parent"
            android:layout_height="wrap_content">
            <View
                    android:layout_width="match_parent"
                    android:layout_height="1dip"
                    android:background="#393b3e"
                    android:layout_alignParentTop="true"/>
            <TextView
                android:id="@+id/license_agree_prompt"
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:paddingLeft="6dip"
                android:paddingRight="6dip"
                android:paddingTop="6dip"
                android:text="@string/license_accept_prompt" />
            <CheckBox
                android:id="@+id/license_agree_check_noshow"
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:text="@string/license_check_no_show_again" />
        </LinearLayout>
       
        <LinearLayout
                android:layout_width="fill_parent"
                android:layout_height="wrap_content"
                android:orientation="vertical"
                android:gravity="center_vertical">
                <View
                    android:layout_width="match_parent"
                    android:layout_height="1dip"
                    android:background="#393b3e"
                    android:layout_alignParentTop="true"/>
                
                        <LinearLayout
                        android:layout_width="fill_parent"
                        android:layout_height="wrap_content"
                        android:layout_weight="1"
                        android:gravity="center_vertical"
                        android:orientation="horizontal">   
                        
                            <Button
                                android:id="@+id/license_agree_reject"
                                style="?android:attr/borderlessButtonStyle"
                                android:layout_width="wrap_content"
                                android:layout_height="wrap_content"
                                android:layout_gravity="center_horizontal"
                                android:layout_weight="1"
                                android:gravity="center"
                                android:paddingBottom="6dip"
                                android:paddingLeft="6dip"
                                android:paddingRight="6dip"
                                android:paddingTop="6dip"
                                android:text="@string/license_agree_reject_btn" />

                             <View
                                android:layout_width="1dip"
                                android:layout_height="match_parent"
                                android:background="#393b3e"
                                android:layout_centerHorizontal="true"/>
                            
                            <Button
                                android:id="@+id/license_agree_accept"
                                style="?android:attr/borderlessButtonStyle"
                                android:layout_width="wrap_content"
                                android:layout_height="wrap_content"
                                android:layout_gravity="center_horizontal"
                                android:layout_weight="1"
                                android:gravity="center"
                                android:paddingBottom="6dip"
                                android:paddingLeft="6dip"
                                android:paddingRight="6dip"
                                android:paddingTop="6dip"
                                android:text="@string/license_agree_accept_btn" />

                         </LinearLayout>       
             <View
                    android:layout_width="match_parent"
                    android:layout_height="1dip"
                    android:background="#393b3e"
                    android:layout_alignParentBottom="true"/>   
        </LinearLayout>
    </LinearLayout>
</LinearLayout>


Android: How to show an activity as a dialog

This post is part of a series on Android UI nuances. After releasing my app on playstore I thought I would add my answers to the contributions on stackoverflow.

How do you make an activity behave like a dialog? For example the figure shows a dialog which is an activity.

a) Have your activity ready.
b) On your app's manifest file change the activity as shown below
c) Where DialogTheme is defined in your styles.xml file as follows

That is all to get an activity to behave like a dialog.

Saturday 29 June 2013

Celatum for Android : Secure notes and GMail attachments

Celatum is, an Android application that I developed, is now available at Google Play Store. Celatum allows you to store and send secure notes using GMail attachments. Check it out ;-)

Get it on Google Play

* Read the document on how Celatum works
https://docs.google.com/file/d/0BxhHg0qy5gi4SWRiQ0JwRXU2RWc/edit?usp=sharing
* An FAQ is here
https://docs.google.com/file/d/0BxhHg0qy5gi4cnliX3YxV2s5UEU/edit?usp=sharing

Saturday 8 June 2013

Hadoop Streaming in Python : Nuances & Sample

Hadoop streaming allows us to write mappers and reducers in languages like python, ruby and even C. Using python, the amount of code involved in writing a mapper and reducer is less compared to Java. Also, the streaming has some nuances to watch for.

1) Streaming works like Unix command line pipes. i.e data is written and read in text. Apart form the first input and last output, standard input and standard output are used. i.e your mapper in python does a print on its standard output and the reducer gets it on its standard input.

2) The mapper and reducer work on splits. Splits are the chunk of data that a mapper is given. It need not be the same as the HDFS chunk size. From observation it is seen that, this value changes as the program proceeds.

3) Streaming works on lines of input rather than (key, value) pairs. Although the notion of (key,value ) pairs can be used in streaming too. If you don't specify the (key, value) pairs and also the separator, defaults will be used. These are tab for the separator and the whole input line for key and an empty value. What this means to us developers is that, if you don't use/impose the notion of key value pairs, you do the sorting within a group in your reducer code. i.e when the whole line is used as key, your reducer may not necessarily receive the shuffled & sorted input it expects. When you write the mappers and reducers in Java your code actually receives the key value pair. Here you won't. So watch out for the sorting.

For this use the following parameters when running your streaming program.
-D stream.map.output.field.separator=, means "," is the separator used. Default is tab.
-D stream.num.map.output.key.fields=4 means all of the output line until the 4th separator is used as key and the rest as value.

4) Re-order the map outputs with the key parameter above to ensure grouping of input to reducer (as you want it to be).

5) Within your python code add #!/usr/bin/python at the very beginning to tell the program which interpreter to use for the mapper code and reducer code.

6) Check the mapper and reducer using the command line and sample data. If it works there, it should work on Hadoop. For example, this checks the working of a mapper and reducer written in python
cat ./datagen/small-earthquake.data | ./stream/mapper.py | sort | ./stream/reducer.py


Sample scenario: A 2GB file of earthquake information is available. Here, we write a mapper to filter the input and a reducer to do a count on the data it receives. File includes the following columns in order year, city name, quake magnitude, latitude, longitude. The mapper filters the input between years 1960 and 2000. The reducer counts the number of quakes for each city in that data.


The mapper  and reducer code in python are shown below

Notice that, the mapper rearranges the order of columns to <city, year> and tab is used as the separator since this is the default one. Plus, since city is the first column in map output, map output will be grouped on the basis of the first column i.e up until the first delimiter tab. Refer the nuances above.

Run the streaming command on Hadoop as follows:


hadoop jar ~/Development/hadoop-1.0.3/contrib/streaming/hadoop-streaming-1.0.3.jar \
-input /hadoop/streampython/large-earthquake.data \
-output  /hadoop/streampython/filtered \
-file ./stream/mapper.py \
-mapper 'mapper.py' \
-file ./stream/reducer.py  \
-reducer 'reducer.py'


Sample code: The git, eclipse project for this is available at this link. It also has a data gen script that, can generate 2GB plus test earthquake data.

https://drive.google.com/folderview?id=0B-oeIeog2xb7RWlpRDBROEdFT2c&usp=sharing

Screens:
a) Input files:

b) The SPLIT_RAW_BYTES for the map task.

c) Job Status with two reducers

Some Errors to watch out for:
1) You see the following on the job tracker -> Task tracker for you maps. Your maps seem to have been killed more times than allowed.


java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
 at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
 at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
 at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
 at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:416)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)


For myself, this was due to the fact that, I had not added  #!/usr/bin/python to my python files. Once added, there were no issues. There was no need to change permissions on any files.

2) Basic Hadoop Nuances:

a) Add the following property to your core-site.xml. This tells hadoop which location to use for its tmp folder. A location that does not get modified over re-boots.

<property>
<name>hadoop.tmp.dir</name>
<value>/home/harisankar/Hadoop/tmp</value>
</property>

b) Add the following two properties to your hdfs-site.xml. These tell hadoop where to store its name node details and also the data chunks. You can see the data chunks for your files here.

        <property>
  <name>dfs.name.dir</name>
  <value>/home/harisankar/Hadoop/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/home/harisankar/Hadoop/data</value>
</property>

Otherwise your datanode and namenode playup after a re-boot. You may see exceptions like, could replicate to only 0 nodes and Data node's Connection to namenode timedout after n tries. in the log files.

Saturday 18 May 2013

Knapsack 0/1 problem and algorithm: Implementation in Python, Dynamic programming and Memoization

This post is on the Knapsack algorithm which does the following. You have a list of items each with a value and weight. You have a total capacity as limit. You have to take as many items to increase value within the limit. A good example, similar to the one given by MIT Professor John Guttag (in open courseware) is, you break into a bank vault with a sack which can hold 100Kg of weight. You are faced with a pile of gold coins,  million dollar bonds, a huge stack of Marlboro cigarettes, stacks of red bull and Heineken. How much of each should you take  given your limit? A better example is you are going to Mars and your allotment in the shuttle is 400Kg and you have to take items that are valuable in that limit.

The Algorithm is recursive but, uses memoization to cut short the search through the solution tree. i.e for a pair (number of things remaining to take from, remaining capacity) i.e you have 5 more things to take from and weight limit is now 30Kg then, what do you take for this? If this is already solved in the recursion somewhere we use that, instead going down to that. This is where memoization comes in and Python dictionary is well suited for this. So, you have sub problems and find optimal solutions to those problems and solve the parent problem.

The result of the algorithm is a pair (V, I) where V is the accrued value of items we decided to take. I is the list of items we decided to take.

Knapsack 0/1 Algorithm Pseudo code:

knapsack(<L: list of items>, <W: weight limit now>, <S: solutions accumulated over sub-problems so far>)
{
        IF S is empty i.e first call THEN
           Initialize S the set/dictionary to hold solutions to sub-problems

       IF  combination of (L, W) is already in S THEN
           (V,I) for this sub-problem =  S(L, W)
       ELSE IF L is empty and W is 0 THEN
           (V, I) is (0, nothing)    
       ELSE IF we take the first item on list and its weight, when added, exceeds the limit THEN
              (V,I) = knapsack(L - first item at this call, W, S)
       ELSE
               /* What we do here is. We assume we take the first item. Then what is (V,I) for the remaining
                   walks. Also, what is the result (V,I) id we don't take the item at this point. Choose which
                   ever solution that maximises the value.
               */
               We assume we take the first item of the list.
                Let V1 = Value of first item + (V, I) for knapsack(L - first item, W - weight of first Item, S)
                We also find (V2, I) if we don't take the item.
                       i.e (V2, I) for knapsack(L - first item, W, S)
             
                IF V1 is better than V2 THEN
                       go with that result (V1, I)
                ELSE
                       go with the other result (V2, I)
       Add the corresponding result (V, I) to S
}

Sample with the following items. <Item Name, Weight, Value>

('Gold Biscuits', 20, 100)
('Heineken', 2, 30)
('Marlboro', 1, 10)
('Currency in $10 bills', 10, 10)
('Currency in 50$ bills', 45, 40)
('Million$ Govt Bonds', 1, 1000)
('Diamonds', 25, 80)
('Lays Chips', 1, 4)
('RedBull Cans', 20, 35)
('Predictions on WallStreet', 1, 200)
('Key Combination of Vault', 1, 200)
('Harry Potters List of real life Spells', 2, 1)

Sample Output: As you can see the program takes the things of value :))


Solution Python Code: Is available at
https://drive.google.com/folderview?id=0BxhHg0qy5gi4TkpuTHQ4OWZGYnc&usp=sharing

References:

1. MIT Open courseware on Introduction to Computer Science and Programming, Prof John Guttag