Friday 23 May 2014

Accomodate data via interning - Java

Software receives data from multiple sources and holds necessary portions of it in memory. Data may be read from disks and networks. More data can be fit into a program's memory if, only one copy of relevant data is stored. For example, lets say a software keeps track of vehicles and related information like number plates. Vehicles' number plates are registered in a state. The software does not need to have multiple copies of a Registered State object for each vehicle registered in the same state. If there are a 100000 vehicles registered in Colorado, then only one instance of the Registered State of Colorado is needed. If the Object's state does not change for the execution time, this saves space. This approach called Flyweight Pattern or interning is used to hold 'immutables' in a program. 

The following screen shows the profiling of the example described above. A collection of 100000 vehicles is built and all of them are set to Registered State (Colorado). On the profiler output we see that, once the data loading is finished there is only one Live Registered State representing Colorado. 

Profile Output:

Code for intern store
Here we use a synchronized map to hold weak references to an instance. Each class is again mapped to these 'maps of instances'. For the purpose of storing the objects in synchronized maps, we override the hashcode method for the data object. For this the program we use HashCodeBuilder from Apache libraries (commons-lang-2.4.jar).

Source for the sample is here

This is the case with Python too. An example on the Python shell to count the active reference to a string object and object ids is shown below.

1) ACM Webinar on Taking the Big out of Big Data By Kate Matsurdia
2) Apache Commons library

Tuesday 20 May 2014

Java Non-blocking IO - Part 1

Non blocking Input Output package aims to increase performance and improve responses. In regular blocking IO you read from a socket's stream and block on the read() until you get data. In NIO, you register to be notified of events and respond only when that event occurs. Programs can avoid blocking and take action on being notified. NIO uses channels and 'a channel represents a connection to stuff like sockets, files etc and can perform multiple io operations like read, write'. Instead of an inputstream and outputstream we have a single channel to both. A profiling of NIO servers here shows that a design with NIO may not always be better than a normal multi-threaded blocking IO design for applications. Using NIO on a client like mobile devices may not (or may) be useful but for a server it may be different. Registering for events is done via 'Selectors'. Multiplexing channels is made possible via Selectors. A Selectable (multiplexable) Channel's registration with a Selector for events is represented by Keys in the Selector for different operations. When a bunch of operations on the SelectableChannel is available, the selector gives keys representing those operations.

To write a simple NIO server which accepts connections
1) open a ServerSocketChannel & set is as non-blocking
2) get the underlying socket for the channel just created & bind it to an address.
3) open a  selector
4) register the ServerSocketChannel with the Selector for a OP_ACCEPT operation. 

To handle different events
1) check the selectors for the number of ready operations (keys set)
2) if it is -1 continue to wait for an event.
3) Once there are keys to go with, retrieve the keySet which holds the keys whose channels are ready.
4) Loop through each key and handle the corresponding IO.
4a) Filter operations types by comparing the keys' ready operation code.
5) mark key as handled. (remove it)

To Write and Read from Channel:
Channels perform two-way communication and handle blocks of Data. These blocks of data are read 'into' and written 'out from' Buffers. Buffers have internal byte arrays to hold the data and can be retrieved.
Buffers have a current position from where data will be next read or written into. Flipping a buffer sets the limit as the current position and returns the position to the beginning. So if we want to use the data so far read into a buffer b and send the same data out using the same buffer, we need to flip it. This code just sends the data back.

The full code for the project is here. The client app in the project simply keeps track of time taken to get a response from the server.