.Net's Array.Sort (up to at least version 4.0) has serious weaknesses:
1. It is insecure, and using it leaves you open to attack. .Net uses an ordinary quicksort with the pivot selected by the median-of-three method. It is easy to provoke quicksort's worst-case (quadratic) behavior and increase running times by multiple orders of magnitude. An attacker can exploit this as an effective denial-of-service attack.
2. It is inflexible. You cannot supply a delegate for the swap operation, so it cannot sort data structures that need extra synchronization to stay consistent as items are moved. It also sorts only items on the .Net heap, so it cannot sort unmanaged memory.
3. It is slower than it should be even in the absence of an attacker.
Zimbry.Introsort addresses each of these problems.
1. It is secure. It is based on David Musser's Introsort algorithm. Introsort is essentially a quicksort that, if it recurses too deep (a sign that it is sliding toward quadratic behavior), falls back to a heapsort, which has no quadratic worst case.
2. It is flexible. Both the compare and swap operations are provided by the user. You can use it to sort anything.
3. It is faster. This wasn't an explicit objective but it's nice that we don't have to trade away performance to get a secure and flexible sort.
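To make the flexibility point concrete, here is a minimal sketch of what sorting through user-supplied compare and swap delegates can look like. The `DelegateSortSketch.Sort` signature is hypothetical and is not taken from the Zimbry.Introsort source, and the body is a plain insertion sort chosen only to keep the example short. The point is that the sort never touches the data directly: it only asks the caller to compare or swap two positions, so the caller can keep two parallel arrays in step.

```csharp
using System;

public static class DelegateSortSketch
{
    // Hypothetical signature (an assumption for illustration): sorts the
    // positions [0, count) using only the caller's index-based callbacks.
    public static void Sort(int count, Func<int, int, int> compare, Action<int, int> swap)
    {
        // Plain insertion sort, used here only because it is short.
        for (int i = 1; i < count; i++)
            for (int j = i; j > 0 && compare(j - 1, j) > 0; j--)
                swap(j - 1, j);
    }
}

public static class DelegateSortDemo
{
    // Sorts two parallel arrays by age and returns the reordered names.
    public static string[] Run()
    {
        int[] ages = { 42, 17, 30 };
        string[] names = { "Ann", "Bob", "Cy" };

        DelegateSortSketch.Sort(
            ages.Length,
            (i, j) => ages[i].CompareTo(ages[j]),
            (i, j) =>
            {
                // Swap both arrays together so each name stays with its age.
                int age = ages[i]; ages[i] = ages[j]; ages[j] = age;
                string name = names[i]; names[i] = names[j]; names[j] = name;
            });

        return names;
    }
}
```

The same shape would work for unmanaged memory: the callbacks can read and write through pointers, and the sort itself never needs to know.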
Click the links to see the benchmarks:
Let's look at the worst-case of dealing with an adversary.
It takes .Net over 26 minutes to sort one million integers when they are provided by an adversary. Zimbry.Introsort does it in half a second.
Those are the worst-case results. We can disable the adversary and benchmark it again:
Zimbry.Introsort is twice as fast in the average case and rarely less than 13% faster in any case.
(Each test was run only once, so the timings for small arrays contain noticeable sampling noise. A more robust benchmark would take multiple samples of each test and filter out the noise.)
I am releasing the source under the MIT license: Click here for the source
Some notes on the source:
You'll find many alternative sort algorithms in the Zimbry.Sort.OtherSorts project. I experimented with these along the way. You can enable them in the benchmark if you have a great deal of patience.
The class in QuicksortAdversary.cs was derived from Doug McIlroy's paper, A Killer Adversary for Quicksort. Be careful. It will beat up quicksort and steal its lunch money.
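For readers who have not seen the paper, the following is a rough sketch of the adversary idea as McIlroy describes it. It is not the code in QuicksortAdversary.cs, and the class and member names are made up for this example. The adversary hands the sort a list of item indices and decides each item's value lazily: every item starts as "gas" (undecided but large), and whenever two gas items are compared, the one that looks like the pivot is frozen to the next smallest value. The answers stay consistent with the final values, so the sort finishes correctly, but its pivots keep turning out to be small.

```csharp
using System;

public sealed class AdversarySketch
{
    readonly int[] val;   // val[i] is item i's value, or gas while undecided
    readonly int gas;     // placeholder larger than any frozen value
    int nsolid;           // number of items frozen so far
    int candidate;        // item that currently looks like the pivot

    public int Comparisons { get; private set; }

    public AdversarySketch(int n)
    {
        val = new int[n];
        gas = n - 1;
        for (int i = 0; i < n; i++) val[i] = gas;
    }

    // Compares items x and y (indices into val), freezing values on demand.
    public int Compare(int x, int y)
    {
        Comparisons++;
        if (val[x] == gas && val[y] == gas)
        {
            // Freeze the likely pivot to the next smallest unused value.
            if (x == candidate) val[x] = nsolid++;
            else val[y] = nsolid++;
        }
        if (val[x] == gas) candidate = x;
        else if (val[y] == gas) candidate = y;
        return val[x] - val[y];
    }

    // Runs the supplied sort once over the item indices and returns the
    // input values the adversary ended up committing to.
    public int[] Build(Action<int[], Comparison<int>> sort)
    {
        int[] items = new int[val.Length];
        for (int i = 0; i < items.Length; i++) items[i] = i;
        sort(items, Compare);
        return (int[])val.Clone();
    }
}
```

A call such as `new AdversarySketch(1000).Build((items, cmp) => Array.Sort(items, cmp))` returns an array built against that particular sort. Each instance is meant to be used for a single `Build` call.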
Zimbry.Introsort contains four sort algorithms layered together:
1. Quicksort with pivot selected by median-of-nine: For large partitions.
2. Quicksort with pivot selected by median-of-five: For small partitions.
3. Heapsort as a fall-back when quicksort recurses too deep: Heapsort is slower than quicksort in the best case but it has no quadratic behavior to exploit so it provides effective protection against an adversary.
4. Insertion sort: For tiny partitions where quicksort is inefficient.
Using these four algorithms lets us enjoy the performance advantage of quicksort for the typical case with protection against a malicious attacker in the worst case.
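The layering can be sketched in a few dozen lines. This is a simplified illustration and not the Zimbry.Introsort code: it works on `int[]` directly instead of going through delegates, uses a single median-of-three pivot in place of the median-of-nine and median-of-five pair, uses an ordinary two-way partition, and its thresholds (`TinyThreshold`, the depth limit of about 2·log2(n)) are assumed values picked for the example.

```csharp
using System;

public static class IntrosortSketch
{
    const int TinyThreshold = 16; // assumed cutoff for insertion sort

    public static void Sort(int[] a)
    {
        if (a.Length < 2) return;
        // Allow roughly 2*log2(n) levels of quicksort before giving up on it.
        int depthLimit = 2 * (int)Math.Floor(Math.Log(a.Length, 2));
        SortRange(a, 0, a.Length - 1, depthLimit);
    }

    static void SortRange(int[] a, int lo, int hi, int depthLimit)
    {
        while (hi - lo + 1 > TinyThreshold)
        {
            if (depthLimit-- == 0)
            {
                HeapSort(a, lo, hi); // quicksort is degenerating: fall back
                return;
            }
            int p = Partition(a, lo, hi);
            // Recurse into the smaller side and loop on the larger one.
            if (p - lo < hi - p) { SortRange(a, lo, p - 1, depthLimit); lo = p + 1; }
            else { SortRange(a, p + 1, hi, depthLimit); hi = p - 1; }
        }
        InsertionSort(a, lo, hi); // tiny partition
    }

    static int Partition(int[] a, int lo, int hi)
    {
        int mid = lo + (hi - lo) / 2;
        // Median-of-three: order a[lo], a[mid], a[hi]; the median ends up at mid.
        if (a[mid] < a[lo]) Swap(a, mid, lo);
        if (a[hi] < a[lo]) Swap(a, hi, lo);
        if (a[hi] < a[mid]) Swap(a, hi, mid);
        Swap(a, mid, hi); // park the pivot at the end
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++)
            if (a[i] < pivot) Swap(a, i, store++);
        Swap(a, store, hi); // move the pivot to its final position
        return store;
    }

    static void HeapSort(int[] a, int lo, int hi)
    {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) SiftDown(a, lo, i, n);
        for (int end = n - 1; end > 0; end--)
        {
            Swap(a, lo, lo + end); // move the current maximum to its final slot
            SiftDown(a, lo, 0, end);
        }
    }

    static void SiftDown(int[] a, int lo, int root, int n)
    {
        while (true)
        {
            int child = 2 * root + 1;
            if (child >= n) return;
            if (child + 1 < n && a[lo + child] < a[lo + child + 1]) child++;
            if (a[lo + root] >= a[lo + child]) return;
            Swap(a, lo + root, lo + child);
            root = child;
        }
    }

    static void InsertionSort(int[] a, int lo, int hi)
    {
        for (int i = lo + 1; i <= hi; i++)
        {
            int x = a[i], j = i - 1;
            while (j >= lo && a[j] > x) { a[j + 1] = a[j]; j--; }
            a[j + 1] = x;
        }
    }

    static void Swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

The depth counter is what makes the fallback cheap: a well-behaved quicksort never reaches the limit, so heapsort only runs on inputs that were heading toward quadratic time.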
Both quicksorts use Bentley & McIlroy's "fat-pivot" partitioning method from their paper, Engineering a Sort Function, for better performance. This is a big part of why Zimbry.Introsort performs better than .Net's quicksort in many tests.
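The idea behind a fat pivot is that every item equal to the pivot is placed in one pass, so inputs with many duplicates shrink quickly. The sketch below shows that idea using the simpler Dijkstra-style three-way partition. It is not Bentley & McIlroy's exact scheme (theirs parks equal items at both ends and swaps them into the middle afterwards), and the names here are invented for the example.

```csharp
using System;

public static class FatPartitionSketch
{
    // Rearranges a[lo..hi] into [ < pivot | == pivot | > pivot ].
    // On return, a[lt..gt] holds every item equal to the pivot, so only
    // a[lo..lt-1] and a[gt+1..hi] still need sorting.
    public static void Partition(int[] a, int lo, int hi, int pivot, out int lt, out int gt)
    {
        lt = lo;
        gt = hi;
        int i = lo;
        while (i <= gt)
        {
            if (a[i] < pivot) Swap(a, lt++, i++);  // grow the "less" block
            else if (a[i] > pivot) Swap(a, i, gt--); // grow the "greater" block
            else i++;                               // equal: leave in the middle
        }
    }

    static void Swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

With a two-way partition, an array of identical values only loses one item per pass; with a three-way partition the whole array lands in the middle block and the sort is done after a single pass.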
While this is an improvement, it is far from the last word in sorting. Some ideas to consider:
Better performance may be found with Vladimir Yaroslavskiy's dual-pivot quicksort.
It really needs special versions for known data types, so that the compare and swap delegates are not required in every case. This would give a significant speed improvement.
There's more room for performance tuning. I tried to leave the code in a fairly readable state and some sacrifices could be made to buy a little more performance.
It would be nice to add support for stable sorting.