Objective : To Parallelise merge sort algorithm.
The part of merge sort that presents opportunity for parallel programming is the merge part. This takes two sorted arrays and merges them. The usual algorithm compares each element of the two arrays and writes them in order and finally writing the array with more elements, if any.
For parallel merging there is another algorithm, for two arrays A and B. Choose an element X of A so that it divides A into two almost equal parts A1 and A2. Find where X will fit into B (using binary search). Split B at this point into B1 and B2. Merge A1 and B, A2 and B2 recursively. The advantage is that, these two merges can be implemented by independant hardware such a multiple processors since the loop dependancy of the serial algorithm is removed in the parallel algorithm. The program will provide a visible speed up on Cilk plus for intel. But, on Java vm on my 2 core processor machine the speed up is overrun with the overhead for scheduling, Executor service in Java concurrency package.
The run times of parallel and serial merges are shown below. The overhead of Future, Callable, Executor in java concurrency package is coupled with the recursive calls.
The parallel merge is 10 times slower with two cores But, it is up to the jvm to parallelise. In Java, programmers use concurrency package for parallel executions whereas C++ has a number of opensource options for targeting multi core parallelism. The speedup approaches serial speeds depending on the judicious choice of when to stop the recurssion tree (ie the number of elements at which we switch to serial merge!).
The Java code part spawns a new thread for one of the merges described above and the calling thread continues with the current merge. We can use two threads for each of the sub-merges but, it adds to the overhead. The first approach is shown below with the latter commented.
This is a fork and join approach/pattern for parallel programming. We can also use a map pattern for parallel programming by dividing the array conveniently and merging at multiple stages using a reducer, maybe that will give a lot more direct speed up than fork-join pattern.
Reference: This technique for parallel merge sort originally
appearing in 'Structured Parallel Programming' from Morgan Kaufman. If interested read this book.