www.Michael-Maniscalco.com    : The MSufSort Algorithm
Home Blog Compression Sorting Papers References Links History Sudoku About me Contact
MSufSort 3.1.1 beta is available! Results will be posted soon. MSufSort.dll is also available. It exports the BWT, reverse BWT and suffix array functionality of 3.1 beta.

MSufSort 3.1.1 is a high speed and light weight suffix sorting algorithm. The comparisons below show the speed of this suffix sorting algorithm vs. several other leading suffix sorting algorithms including Archon 4r0 (Dima Malyshev), Deep-Shallow (Manzini & Ferragina), QSufSort (Larsson & Sadakane) and DivSufSort 1.2.2 (Mori).

The first table is the timing results for each of the algorithms when applied to the files of Manzini's large file corpus. The machine used during testing was a Toshiba A75 S2112 laptop with 1.5GB and 3.08GHz clock speed. Processor speed was set to maximum with silent mode selected.


Manzini's Larger File Corpus
File: MSufSort 3.1.1 Archon 4r0 DeepShallow DivSufSort 1.2.2 QSufSort
chr22 30.14 34.73 36.97 32.03 102.64
etext99 101.23 132.79 177.23 112.36 352.41
gcc 63.95 84.02 165.19 60.89 243.99
howto 28.42 35.91 41.84 32.03 114.22
jdk13c 60.44 105.81 146.86 61.02 258.91
linux 84.37 109.50 128.36 84.81 320.38
reuters 114.29 220.73 292.64 123.84 440.33
rfc 91.01 125.21 146.52 97.44 379.41
sprot34 98.08 135.79 158.75 111.39 346.57
w3c2 86.12 167.33 250.72 87.37 484.50
Totals: 758.05 1,151,82 1,545.08 803.18 3,043.36


This second table shows the results for each suffix sorting algorithm on the new universal robustness corpus called "The Gauntlet." This test set is comprised of files which are typically difficult to sort and is intended to test the algorithms for both robustness and seek out catastrophic performance cases. This test set may be added to by anyone provided the file to be added demonstrates that any one of the suffix sorting algorithms listed demonstrates clear catastrophic performance. The file needs to be at least large enough to measure the algorithms overall performance. Smaller files can show times which are more reflective of a program's initialization time etc rather than actual algorithm robustness. The machine used during testing was a Toshiba A75 S2112 laptop with 1.5GB and 3.08GHz clock speed. Processor speed was set to maximum with silent mode selected.


The Gauntlet (Universal Robustness Corpus)
File: MSufSort 3.1.1 Archon 4r0 DeepShallow DivSufSort 1.2.2 QSufSort
abba 11.57 18.78 113.77 11.19 100.87
book1x20 13.05 25.11 370.63 13.75 156.99
fib_s14930352 25.17 77.59 713.55 25.52 133.38
paper5x80 0.51 0.73 3.02 0.32 5.70
fss9 3.22 8.88 28.41 3.63 22.41
fss10 17.55 71.69 369.70 20.39 108.83
houston 0.97 1.47 452.06 0.47 6.81
abac 0.04 0.04 103.55 0.03 0.16
test1 2.51 9.56 4.11
test2 5.53 20.22 1.77
test3 2.05 11.44 1.69
Totals: 82.17 245.51 82.87


Gauntlet corpus contributors:
abba (10,500,600 bytes) - Michael Mansicalco
book1x20 (15,375,420 bytes) - Michael Maniscalco
fib_s14930352 (14,930,352 bytes) - Simon Puglisi
fss9 (2,851,443 bytes) - Simon Puglisi
fss10 (12,078,908 bytes) - Simon Puglisi
houston (3,840,000 bytes) - Graham Houston
paper5x80 (981,924 bytes) - Michael Maniscalco
abac (200,000 bytes) - Yuta Mori
test1 (2,097,152 bytes) - Yuta Mori
test2 (2,097,152 bytes) - Yuta Mori
test3 (2,097,152 bytes) - Yuta Mori