Home Blog/Rants Downloads C++ Papers GitHub LinkedIn
Michael-Maniscalco.com
Pecking at the keyboard until something clever emerges
MSufSort 4 - update (fully parallel suffix array construction)
November 16, 2016

It took a lot longer than I wanted, although not long than I had anticipated, to produce a version of MSufSort which was fully parallel. There is still work to be done to make it worthy of 'Return of The King' status but I'm certainly happy that a lot of the road map has been achieved. There is a single component of MSufSort 4 remaining before the algorithm can clearly differentiate itself as best in class again. Specifically, induction sorting. This is a bit odd, really, considering that induced sorting is the single characteristic trait that every version of MSufSort has in common as a core algorithm. At it's heart the entire MSufSort family is a direct comparison, induced sort SACA. So, it is unexpected that it should be the only component of MSufSort 4 which remains on the road map.

The theoretical work has been done for a long time - how to do parallel induced sorting. But there are many variations to try out in practice and I'm not sure that I want to invest my time in that work at this point. I'm much more enthusiastic about Glimpse to be honest. It's not easy to decide that my personal project of a decade and a half is no longer a priority, but it just isn't. I guess it is true that with age comes wisdom. But I've come to realize that Glimpse has far more potential in the long term. It will become a product and not a project.

Anyhow, MSufSort 4/demo is posted on https://github.com/michaelmaniscalco/msufsort.

MSufSort 4 - multi-threaded performance analysis
November 14, 2016

The chart below graphs the performance of MSufSort 4 on enwik8 using an increasing number of CPUs. The test machine has 4 physical cores. Using virtual cores (hyper threading) has very little effect on performance and so are not presented in the graph. The red line line represents the 'theoretical' best performance assuming that increasing the number of cores from 1 to N would result in an overall decrease in time required to compute the suffix array by a factor of N. The blue line represents the 'actual' performance of MSufSort 4. The chart demonstrates that MSufSort scales almost exactly at the theoretical best as the number of cores increases. I'm reasonably pleased with this performance.

MSufSort 4
June 2, 2015
I've had MSufSort 4 algorithm improvements fleshed out for years now but simply have not had any free time to work on the implementation in any way. I was able to put together a prototype several years back which demonstrated that it clearly outperformed existing solutions.

Road map:   (* indicates new for version 4)

  • Three pivot multikey quicksort
  • Insertion sort specialized for strings *
  • Fully parallel quicksort/insertion sort
  • Cache friendly improved two stage
  • Tandem repeat sort
  • Opportunistic induction sort *
  • Fully parallel second stage for improved two stage *
Site Redo
June 1, 2015
Finally getting the site and its content back in order. Project work will resume on MSufSort 4 as well as M03 with context skipping. I'll be starting with MSufSort 4 and the content will be placed on github. Since I do not have a lot of time available I'll be adding to the project feature by feature until the project is complete. Updates will be documented here.