Friday 23 May 2014

Accommodate data via interning - Java

Software receives data from multiple sources, such as disks and networks, and holds the necessary portions of it in memory. More data fits into a program's memory if only one copy of each distinct value is stored. For example, say a program keeps track of vehicles and related information such as number plates. Each vehicle's number plate is registered in a state. The program does not need a separate Registered State object for every vehicle registered in the same state: if 100,000 vehicles are registered in Colorado, a single instance representing Colorado is enough. This saves space as long as the object's state does not change during execution. The approach, known as the Flyweight pattern or interning, is used to hold 'immutables' in a program.
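The idea can be sketched in a few lines of Java. This is only an illustration under assumptions: the class names (FlyweightSketch, RegisteredState, Vehicle), the factory method of() and the helper load() are made up for this sketch and are not taken from the sample source. Every vehicle asks a factory for its state, and the factory hands back the one shared instance per state name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FlyweightSketch {

    // Immutable value object, so it is safe to share between any number of vehicles.
    static final class RegisteredState {
        // Hypothetical cache: one canonical instance per state name.
        private static final Map<String, RegisteredState> CACHE = new ConcurrentHashMap<>();

        private final String name;

        private RegisteredState(String name) {
            this.name = name;
        }

        // Returns the shared instance for this name, creating it on first use.
        static RegisteredState of(String name) {
            return CACHE.computeIfAbsent(name, RegisteredState::new);
        }

        String getName() {
            return name;
        }
    }

    static final class Vehicle {
        final String numberPlate;
        final RegisteredState state;

        Vehicle(String numberPlate, RegisteredState state) {
            this.numberPlate = numberPlate;
            this.state = state;
        }
    }

    // Builds 'count' vehicles that are all registered in the same state.
    static List<Vehicle> load(int count, String stateName) {
        List<Vehicle> vehicles = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            vehicles.add(new Vehicle("PLATE-" + i, RegisteredState.of(stateName)));
        }
        return vehicles;
    }

    public static void main(String[] args) {
        List<Vehicle> vehicles = load(100_000, "Colorado");
        // Reference comparison: first and last vehicle point at the very same object.
        boolean shared = vehicles.get(0).state == vehicles.get(vehicles.size() - 1).state;
        System.out.println("same instance: " + shared);
    }
}
```

Note that this simple cache holds strong references, so a state object is never released once created. That is acceptable for a small, fixed set of values such as states, but not for open-ended data.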

The following screenshot shows the profile of the example described above. A collection of 100,000 vehicles is built, and all of them are set to the same Registered State (Colorado). The profiler output shows that once data loading has finished, there is only one live Registered State instance, representing Colorado.

Profile Output:



Code for intern store
Here we use a synchronized map to hold weak references to each instance. Each class is in turn mapped to its own map of instances. To store the objects in these synchronized maps, we override the hashCode method of the data object. For this, the program uses HashCodeBuilder from the Apache Commons Lang library (commons-lang-2.4.jar).
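A minimal sketch of such a store is given here; it is not the sample's actual source. The names InternStoreSketch and intern() are hypothetical, and to keep the sketch free of external jars it uses java.util.Objects.hash in place of HashCodeBuilder. It also assumes the data object overrides equals together with hashCode, since a hash-based map needs both to recognise two equal objects as the same key.

```java
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

class InternStoreSketch {

    // Class -> synchronized map of (instance -> weak reference to the canonical instance).
    private static final Map<Class<?>, Map<Object, WeakReference<Object>>> STORES =
            new ConcurrentHashMap<>();

    // Returns the canonical instance equal to 'candidate', registering it if none exists yet.
    static <T> T intern(T candidate) {
        Map<Object, WeakReference<Object>> store = STORES.computeIfAbsent(
                candidate.getClass(),
                c -> Collections.synchronizedMap(new WeakHashMap<Object, WeakReference<Object>>()));

        // Lock the map so the lookup and the insert happen as one step.
        synchronized (store) {
            WeakReference<Object> ref = store.get(candidate);
            Object canonical = (ref == null) ? null : ref.get();
            if (canonical == null) {
                // The value must be weak too: a strong reference to the key
                // would stop the WeakHashMap entry from ever being collected.
                store.put(candidate, new WeakReference<Object>(candidate));
                canonical = candidate;
            }
            @SuppressWarnings("unchecked")
            T result = (T) canonical;
            return result;
        }
    }

    // Example data object; equals and hashCode are based on the state name.
    static final class RegisteredState {
        private final String name;

        RegisteredState(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof RegisteredState && name.equals(((RegisteredState) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name); // stand-in for HashCodeBuilder
        }
    }

    public static void main(String[] args) {
        RegisteredState a = intern(new RegisteredState("Colorado"));
        RegisteredState b = intern(new RegisteredState("Colorado"));
        System.out.println("same instance: " + (a == b));
    }
}
```

Because both the key and the value are held weakly, an interned object can be garbage collected once no vehicle refers to it any more, so the store does not grow without bound.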

Source for the sample is here

The same applies to Python. An example in the Python shell that counts the active references to a string object and compares object ids is shown below.



References
------------------
1) ACM Webinar on Taking the Big out of Big Data by Kate Matsudaira
2) Apache Commons library
