Quicksort identifies a pivot element in the list and then partitions the list into two sublists, those elements less than the pivot and those greater than the pivot. Spreadsort generalizes this idea by partitioning the list into n/c partitions at each step, where n is the total number of elements in the list and c is a small constant (in practice usually between 4 and 8 when comparisons are slow, or much larger in situations where they are fast). It uses distribution-based techniques to accomplish this, first locating the minimum and maximum value in the list, and then dividing the region between them into n/c equal-sized bins. Where caching is an issue, it can help to have a maximum number of bins in each recursive division step, causing this division process to take multiple steps. Though this causes more iterations, it reduces cache misses and can make the algorithm run faster overall.
In the case where the number of bins is at least the number of elements, spreadsort degenerates to bucket sort and the sort completes. Otherwise, each bin is sorted recursively. The algorithm uses heuristic tests to determine whether each bin would be more efficiently sorted by spreadsort or some other classical sort algorithm, then recursively sorts the bin.
Like other distribution-based sorts, spreadsort has the weakness that the programmer is required to provide a means of converting each element into a numeric key, for the purpose of identifying which bin it falls in. Although it is possible to do this for arbitrary-length elements such as strings by considering each element to be followed by an infinite number of minimum values, and indeed for any datatype possessing a total order, this can be more difficult to implement correctly than a simple comparison function, especially on complex structures. Poor implementation of this value function can result in clustering that harms the algorithm's relative performance.
The worst-case performance of spreadsort ultimately depends on what sort it switches to on smaller bins — O(n log n) if it uses a worst-case O(n log n) sort such as mergesort or heapsort, and O(n2) if it uses quicksort. In the case of distributions where the size of the key in bits k times 2 is roughly the square of the log of the list size n or smaller (2k < log(n)2), it does better in the worst case, achieving O(n·(k - log(n)).5) worst-case time.
Experiments were done comparing an optimized version of spreadsort to the highly-optimized C++ std::sort, typically implemented with quicksort switching to insertion sort on small sublists. On lists of integers spreadsort shows a 30-50% time improvement for random data on linux operating systems.
In space performance, spreadsort is worse than most in-place algorithms: in its simplest form, it is not an in-place algorithm, using O(n) extra space; in experiments, about 20% more than quicksort using a c of 4-8. With a cache-aware form, less memory is used and there is an upper bound on memory usage of the maximum bin count times the maximum number of recursions, which ends up being a few kilobytes times the size of the key in bytes. Although it uses asymptotically more space than the O(log n) overhead of quicksort or the O(1) overhead of heapsort, it uses considerably less space than the basic form of mergesort, which uses auxiliary space equal to the space occupied by the list.
Spreadsort also works efficiently on problems too large to fit in memory and thus requiring disk access.
void FindExtremes(DATATYPE *Array, SIZETYPE uCount, DATATYPE & piMax, DATATYPE & piMin) { SIZETYPE u = 1; piMin = piMax = Array[0]; for(u < uCount; ++u){ if(Array[u] > piMax) piMax=Array[u]; else if(Array[u] < piMin) piMin= Array[u]; } }
//---------------------SpreadSort Source-----------------
Bin * SpreadSortCore(DATATYPE *Array, SIZETYPE uCount, SIZETYPE & uBinCount, DATATYPE &iMax, DATATYPE &iMin) { //This step is roughly 10% of runtime, but it helps avoid worst-case behavior and improve behavior with real data //If you know the maximum and minimum ahead of time, you can pass those values in and skip this step for the first iteration FindExtremes((DATATYPE *) Array, uCount, iMax, iMin); if(iMax == iMin) return NULL; DATATYPE divMin,divMax; SIZETYPE u; int LogDivisor; Bin * BinArray; Bin* CurrentBin; unsigned logRange; logRange = RoughLog2Size((SIZETYPE)iMax-iMin); if((LogDivisor = logRange - RoughLog2Size(uCount) + LOG_MEAN_BIN_SIZE) < 0) LogDivisor = 0; //The below if statement is only necessary on systems with high memory latency relative to their processor speed //Otherwise known as modern processors if((logRange - LogDivisor) > MAX_SPLITS) LogDivisor = logRange - MAX_SPLITS; divMin = iMin >> LogDivisor; divMax = iMax >> LogDivisor; uBinCount = divMax - divMin + 1; //Allocate the bins and determine their sizes BinArray = (Bin *) calloc(uBinCount, sizeof(Bin)); //Memory allocation failure check and clean return with sorted results
if(!BinArray) {
printf("Using std::sort because of memory allocation failuren");
std::sort(Array, Array + uCount);
return NULL;
}
//Calculating the size of each bin; this takes roughly 10% of runtime
for(u = 0; u < uCount; ++u)
BinArray[(Array[u] >> LogDivisor) - divMin].uCount++;
//Assign the bin positions
BinArray[0].CurrentPosition = (DATATYPE *)Array;
for(u = 0; u < uBinCount - 1; u++) {
BinArray[u + 1].CurrentPosition = BinArray[u].CurrentPosition + BinArray[u].uCount;
BinArray[u].uCount = BinArray[u].CurrentPosition - Array;
}
BinArray[u].uCount = BinArray[u].CurrentPosition - Array;
//Swap into place
//This dominates runtime, especially in the swap; std::sort calls are the other main time-user
for(u = 0; u < uCount; ++u) {
for(CurrentBin = BinArray + ((Array[u] >> LogDivisor) - divMin); (CurrentBin->uCount > u);
CurrentBin = BinArray + ((Array[u] >> LogDivisor) - divMin))
SWAP(Array + u, CurrentBin->CurrentPosition++);
//Now that we've found the item belonging in this position, increment the bucket count
if(CurrentBin->CurrentPosition == Array + u)
++(CurrentBin->CurrentPosition);
}
//If we've bucketsorted, the array is sorted and we should skip recursion
if(!LogDivisor) {
free(BinArray);
return NULL;
}
return BinArray;
}void SpreadSortBins(DATATYPE *Array, SIZETYPE uCount, SIZETYPE uBinCount, const DATATYPE &iMax , const DATATYPE &iMin, Bin * BinArray, SIZETYPE uMaxCount) { SIZETYPE u; for(u = 0; u < uBinCount; u++){ SIZETYPE count = (BinArray[u].CurrentPosition - Array) - BinArray[u].uCount; //don't sort unless there is at least two items to compare if(count < 2) continue; if(count < uMaxCount) std::sort(Array + BinArray[u].uCount, BinArray[u].CurrentPosition); else SpreadSortRec(Array + BinArray[u].uCount, count); } free(BinArray); }
void SpreadSortRec(DATATYPE *Array, SIZETYPE uCount) { if(uCount < 2) return; DATATYPE iMax, iMin; SIZETYPE uBinCount; Bin * BinArray = SpreadSortCore(Array, uCount, uBinCount, iMax, iMin); if(!BinArray) return; SpreadSortBins(Array, uCount, uBinCount, iMax, iMin, BinArray, GetMaxCount(RoughLog2Size((SIZETYPE)iMax-iMin), uCount)); }