
A Survey of Adaptive Sorting Algorithms

VLADIMIR ESTIVILL-CASTRO

LANIA, Rébsamen 80, Xalapa, Veracruz 91000, Mexico

DERICK WOOD

Department of Computer Science, University of Western Ontario, London, Ontario N6A 5B7, Canada

The design and analysis of adaptive sorting algorithms has made important contributions to both theory and practice. The main contributions from the theoretical point of view are: the description of the complexity of a sorting algorithm not only in terms of the size of a problem instance but also in terms of the disorder of the given problem instance; the establishment of new relationships among measures of disorder; the introduction of new sorting algorithms that take advantage of the existing order in the input sequence; and the proofs that several of the new sorting algorithms achieve maximal (optimal) adaptivity with respect to several measures of disorder. The main contributions from the practical point of view are: the demonstration that several algorithms currently in use are adaptive; and the development of new algorithms, similar to currently used algorithms, that perform competitively on random sequences and are significantly faster on nearly sorted sequences. In this survey, we present the basic notions and concepts of adaptive sorting and the state of the art of adaptive sorting algorithms.

Categories and Subject Descriptors: A.1 [General Literature]: Introductory and Survey; E.2 [Data]: Data Storage Representations—composite structures; linked representations; E.5 [Data]: Files—sorting/searching; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems—sorting and searching; G.3 [Mathematics of Computing]: Probability and Statistics—probabilistic algorithms

General Terms: Algorithms, Theory

Additional Key Words and Phrases: Adaptive sorting algorithms, comparison trees,

measures of disorder, nearly sorted sequences, randomized algorithms

INTRODUCTION

Sorting is the computational process of rearranging a given sequence of items into ascending or descending order [Knuth 1973]. After a first course in Data Structures or Algorithms, the impression is that for comparison-based algorithms, we cannot do better than optimal (so-called n log n) sorting algorithms. However, when the sorting algorithm takes advantage of existing order in the input, the time taken by the algorithm to sort is a smoothly growing function of the size of the sequence and the disorder in the sequence.

This work was done while the first author was a Postdoctoral Fellow and the second author was a full-time faculty member in the Department of Computer Science, University of Waterloo, Waterloo, Canada. It was carried out under Natural Sciences and Engineering Research Council of Canada Grant No. A-5692 and under an Information Technology Research Centre grant.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1992 ACM 0360-0300/92/1200-0441 $01.50


CONTENTS

INTRODUCTION
  1.1 Historical Background
  1.2 Examples of Measures of Disorder
  1.3 Organization of the Paper
1. WORST-CASE (INTERNAL) ADAPTIVE SORTING ALGORITHMS
  1.1 Generic Sorting Algorithms
  1.2 Cook-Kim Division
  1.3 Partition Sort
  1.4 Exponential Search
  1.5 Adaptive Merging
2. EXPECTED-CASE (INTERNAL) ADAPTIVE SORTING ALGORITHMS
  2.1 Distributional Analysis
  2.2 Randomized Algorithms
  2.3 Randomized Generic Sort
  2.4 Randomized Partition Sort
  2.5 Skip Sort
3. EXTERNAL SORTING
  3.1 Replacement Selection
4. FINAL REMARKS
REFERENCES

In this case, we say that the algorithm is adaptive [Mehlhorn 1984]. Adaptive sorting algorithms are attractive because nearly sorted

sequences are common in practice [Knuth

1973, p. 339; Sedgewick 1980, p. 126;

Mehlhorn 1984, p. 54]; thus, we have the

possibility of improving on algorithms

that are oblivious to the existing order in

the input.

Straight Insertion Sort (see Figure 1) is a classic example of an adaptive sort-

ing algorithm. In Straight Insertion Sort

we scan the input once, from left to right,

repeatedly finding the correct position of

the current item, inserting it into an

array of previously sorted items. The

closer a sequence is to being sorted, the

less is the time taken by Straight Inser-

tion Sort to sort it. In fact, the perfor-

mance of Straight Insertion Sort can be

described in terms of the size of the input

and the number of inversions in the

input. More formally, let Inv(X) denote the number of inversions in a sequence X = (x_1, ..., x_n), where (i, j) is an inversion if i < j and x_i > x_j.

procedure Straight Insertion Sort(X, n);
    X[0] := −∞;
    for j := 2 to n do
    begin
        i := j − 1;
        t := X[j];
        while t < X[i] do
        begin
            X[i + 1] := X[i];
            i := i − 1;
        end;
        X[i + 1] := t;
    end;

Figure 1. A pseudo-Pascal implementation of Straight Insertion Sort.

Intuitively, Inv(X) measures disorder, since its value is minimized when X is sorted and its value depends on only the relative order of the elements in X. For a sequence X with n elements, Straight Insertion Sort performs exactly Inv(X) + n − 1 comparisons and Inv(X) + 2n − 1 data moves and uses constant extra space. We say that Straight Insertion Sort is an adaptive algorithm with respect to Inv. Empirical evidence confirms that Straight Insertion Sort is efficient for both small sequences and nearly sorted sequences [Cook and Kim 1980; Knuth 1973].
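The comparison count is easy to check empirically. The following Python sketch is our own illustration (it is not from the survey, and the function names are ours): it mirrors Figure 1 with a sentinel and counts key comparisons, which total exactly Inv(X) + n − 1.

import math

def inversions(xs):
    """Brute-force Inv(X): the number of pairs i < j with xs[i] > xs[j]."""
    n = len(xs)
    return sum(xs[i] > xs[j] for i in range(n) for j in range(i + 1, n))

def straight_insertion_sort(xs):
    """Sort a copy of xs; return (sorted list, number of comparisons).

    Element j walks left past each of its inversion partners and stops
    with one failing comparison, so the total is Inv(X) + n - 1; the
    sentinel at index 0 absorbs the failing comparison at the left end.
    """
    a = [-math.inf] + list(xs)          # a[0] is the sentinel
    comparisons = 0
    for j in range(2, len(a)):
        t, i = a[j], j - 1
        while True:
            comparisons += 1            # one comparison per loop test
            if t >= a[i]:
                break
            a[i + 1] = a[i]             # shift the larger element right
            i -= 1
        a[i + 1] = t
    return a[1:], comparisons

xs = [6, 2, 4, 7, 3, 1, 9, 5, 10, 8]
_, comps = straight_insertion_sort(xs)
assert comps == inversions(xs) + len(xs) - 1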

In this survey we present the basic notions and concepts of adaptive sorting and the state of the art of adaptive sorting algorithms. The initial motivation for adaptive sorting algorithms is the high frequency with which nearly sorted inputs occur in practical applications. Sorting a nearly sorted sequence should require less work than sorting a randomly permuted sequence. For example, if we know that, apart from one element in the wrong position, the input sequence is sorted, then the sequence can be sorted by scanning the sequence to find the misplaced element and performing a binary search to find its correct position. In this case, sorting takes linear time without resorting to the full power of a general sorting algorithm. In the general case, we do not know anything in advance about the existing order in the input, but we want the sorting algorithm to


"discover" existing order and profit from it to accelerate the sorting.

We focus our attention on comparison-based sorting algorithms for sequential models of computation. The reason for this choice is that most of the work on adaptive sorting falls within this framework. However, in an effort to be comprehensive, we mention briefly in Sections 3 and 4 results that do not fit into this framework.

• Notational Conventions. The cardinality of a set S is denoted by ||S||, and the length of a sequence X is denoted by |X|. We assume throughout the survey only that the elements to be sorted belong to a total order; however, we always use integer examples. The collection of all finite sequences of distinct nonnegative integers is denoted by N^{<N}. The base-2 logarithm is denoted by log.

1.1 Historical Background

As early as 1958, Burge observed that the best algorithm for a sorting problem depends on the order already in the data, and he proposed measures of disorder to evaluate the extent to which elements are already sorted [Burge 1958]. Initial research on adaptive sorting algorithms concentrated on internal sorting algorithms and followed three directions. First, during 1977–1980, data structures were designed to represent sorted sequences and provide fast insertions for elements whose position was close to the position of previous insertions. Different adaptive variants of insertion-based sorting algorithms were obtained by using these data structures to maintain the sorted portion of the sequence [Brown and Tarjan 1980; Guibas et al. 1977; Mehlhorn 1979]. Second, Cook and Kim [1980] and Wainwright [1985] performed empirical studies. Third, Dijkstra [1982] introduced Smooth Sort. The intuitive ideas arising from these efforts were formalized by Mannila who, in 1985, introduced a formal framework for the analysis of adaptive sorting algorithms in the worst case [Mannila 1985b]. This framework consists of two parts: first, the introduction of the concept of a measure of presortedness as a function that evaluates disorder; second, the concept of optimal adaptivity of an algorithm with respect to a measure of presortedness. Because of the developments of Petersson [1990] and Estivill-Castro [1991], adaptive sorting algorithms are now well understood.

We evaluate the disorder in a sequence by a real-valued function that we call a measure of disorder; it is a function from N^{<N} to ℜ. The measure of efficiency of a sorting algorithm is the number of comparisons it performs. The number of comparisons provides not only a reasonable estimate of the relative time requirements of all implementations but also enables lower bounds to be obtained under the decision tree model of computation. Mannila [1985a] defined a sorting algorithm to be optimally adaptive (or maximally adaptive) with respect to a measure of disorder if it takes a number of comparisons that is within a constant factor of the lower bound.

A general lower bound for sorting with respect to a measure M of disorder is obtained as follows. Let below(z, n, M) denote the set of permutations of n items with no more disorder than z with respect to M; then, we have

    below(z, n, M) = {Y ∈ N^{<N} : |Y| = n and M(Y) ≤ z}.

Consider sorting algorithms having as input not only a sequence X of length n but also an upper bound z on the value of M(X). We analyze the comparison or decision trees for such algorithms [Knuth 1973, p. 182; Mehlhorn 1984, p. 68]; we give an example of a comparison tree in Figure 2. These trees are binary and must have at least ||below(z, n, M)|| leaves, the number of possible inputs; so the height of a tree is at least Ω(log ||below(z, n, M)||). Therefore, such algorithms take at least Ω(log ||below(z, n, M)||) time. If an upper bound on M(X) is not available, the algorithm cannot be faster than one that is provided with the best possible bound z = M(X).


Figure 2. A comparison tree for sorting the sequence x_1, x_2, x_3.

Finally, since testing for sortedness requires linear time, a sorting algorithm should be allowed at least a linear number of comparisons.

The following definition captures the notion of optimal adaptivity in the worst case [Mannila 1985b].

Definition 1.1. Let M be a measure of disorder and S be a sorting algorithm that uses T_S(X) comparisons on input X. We say that S is optimal with respect to M (or M optimal) if, for some c > 0, we have, for all X ∈ N^{<N},

    T_S(X) ≤ c · max{|X|, log ||below(M(X), |X|, M)||}.

An example of a sorting algorithm that is optimal with respect to inversions is Mannila's Local Insertion Sort [Mannila 1985b]; it is an insertion-based sorting algorithm. On the ith pass, x_1, ..., x_{i−1} have been sorted, and x_i is to be inserted into its correct position. Mannila uses a level-linked balanced (a, b)-tree with a finger to represent the initial sorted segment. Let d_i(X) denote the distance from the (i − 1)th insertion point to the ith insertion point; that is, d_i(X) = ||{ j : 1 ≤ j < i and (x_{i−1} < x_j < x_i or x_i < x_j < x_{i−1}) }||. Level-linked balanced (a, b)-trees support the search for x_i's position in c(1 + log[d_i(X) + 1]) amortized time, where c is a constant [Brown and Tarjan 1980]. We use Mannila's argument to show that Local Insertion Sort is Inv optimal. Let I_i(X) denote the number of inversion pairs in

X in which x_i is the second element; that is,

    I_i(X) = ||{ j : 1 ≤ j < i and x_j > x_i }||.

Now Σ_{i=1}^{|X|} I_i(X) = Inv(X), and it is not hard to see that

    d_i(X) ≤ max{I_i(X), I_{i−1}(X)}.     (1)

Thus, Σ_{i=1}^{|X|} log[d_i(X) + 1] ≤ 2 Σ_{i=1}^{|X|} log[I_i(X) + 1]. Since the algorithm inserts |X| elements into an initially empty tree, the amortized bound gives rise to a worst-case upper bound on the time taken by Local Insertion Sort to sort X. Using Equation (1), properties of the logarithm, and the fact that the geometric mean is never larger than the arithmetic mean, we obtain the following derivation:

    c Σ_{i=1}^{|X|} (1 + log[d_i(X) + 1])
        = c|X| + c log Π_{i=1}^{|X|} (d_i(X) + 1)
        ≤ c|X| + 2c|X| log ( Π_{i=1}^{|X|} [I_i(X) + 1] )^{1/|X|}
        ≤ c|X| ( 1 + 2 log[ Inv(X)/|X| + 1 ] ).

Thus, the time taken by Local Insertion Sort is a smooth function of |X| and the disorder of X. The more inversions there are in X, the more work is performed by Local Insertion Sort.


Guibas et al. [1977] obtained the following lower bound for adaptive sorting with respect to inversions:

    log ||below(Inv(X), |X|, Inv)|| = Ω(|X| log[1 + Inv(X)/|X|]);

thus, Local Insertion Sort is Inv optimal. Note that if a sorting algorithm is optimal with respect to a measure M, then it is optimal in the worst case; in other words, optimally adaptive algorithms are optimal.
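The comparison behavior of Local Insertion Sort can be imitated without the tree machinery. The sketch below is our own simplification, not Mannila's algorithm: it keeps the sorted prefix in a plain Python list and locates each insertion point by galloping (doubling steps) from the previous insertion point, so searching costs O(1 + log[d_i(X) + 1]) comparisons; the list insertion itself costs O(n) data moves, which is exactly the cost the level-linked (a, b)-trees avoid.

import bisect

def local_insertion_sort(xs):
    """Insertion sort with a 'finger' kept at the previous insertion point."""
    a, finger = [], 0
    for x in xs:
        n = len(a)
        f = min(finger, n - 1) if n else 0
        if n and a[f] < x:                 # gallop right from the finger
            step = 1
            while f + step < n and a[f + step] < x:
                step *= 2
            lo, hi = f + step // 2, min(n, f + step)
        else:                              # gallop left from the finger
            step = 1
            while f - step >= 0 and a[f - step] >= x:
                step *= 2
            lo, hi = max(0, f - step), f - step // 2
        pos = bisect.bisect_left(a, x, lo, hi)
        a.insert(pos, x)                   # O(n) move; a finger tree is O(log)
        finger = pos
    return a

data = [6, 2, 4, 7, 3, 1, 9, 5, 10, 8]
assert local_insertion_sort(data) == sorted(data)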

1.2 Examples of Measures of Disorder

Although the number of inversions is an important measure of disorder, it does not capture all types of disorder. For example, the sequence

    W_n = (⌊n/2⌋ + 1, ⌊n/2⌋ + 2, ..., n, 1, ..., ⌊n/2⌋)

has a quadratic number of inversions although it consists of two ascending runs; therefore, it is nearly sorted in a sense not captured by inversions.

We now describe 10 other measures of disorder that have been used in the study of adaptivity.

(1) Dis. We may consider that, in terms of the disorder it represents, an inversion pair in which the elements are far apart is more significant than an inversion pair in which the elements are close together. One measure of this kind is Dis, defined as the largest distance determined by an inversion [Estivill-Castro and Wood 1989a]. For example, let

    W_1 = (6, 2, 4, 7, 3, 1, 9, 5, 10, 8);

then (6, 5) is an inversion whose elements are farthest apart, and Dis(W_1) = 7.

(2) Max. We may also consider that local disorder is not as important as global disorder. For example, in a library, if a book is one slot away from its correct position, we are still able to find it, since the call number will get us close enough to where it is; however, a book that is very far from its correct position is difficult to find. A measure that evaluates this type of disorder is Max, defined as the largest distance an element must travel to reach its sorted position [Estivill-Castro and Wood 1989a]. In W_1, 1 must travel five positions; thus, Max(W_1) = 5.

(3) Exc. The number of operations required to rearrange a sequence into sorted order may be our primary concern. A simple operation to rearrange the elements in a sequence is an exchange; thus, we may define Exc as the minimum number of exchanges required to sort a sequence [Mannila 1985b]. It is impossible to sort the example sequence W_1 with fewer than seven exchanges; thus, Exc(W_1) = 7.

(4) Rem. We may consider that disorder is produced by the incorrect insertion of some records and evaluate disorder by Rem, defined as the minimum number of elements that must be removed to obtain a sorted subsequence [Knuth 1973, Sect. 5.2.1, Exer. 39]. By removing five elements from W_1 we obtain the sorted subsequence (2, 4, 7, 9, 10). Removing fewer than five elements cannot give a sorted subsequence; thus, Rem(W_1) = 5. We can also define Rem(X) as |X| − Las(X), where Las(X) is the length of a longest ascending subsequence; that is, Las(X) = max{t : 1 ≤ i(1) < i(2) < ... < i(t) ≤ n and x_{i(1)} < ... < x_{i(t)}}.

(5) Runs. Since ascending runs represent sorted portions of the input and since a sorted sequence has the minimum number of runs, it is natural to define a measure that is the number of ascending runs. In order to make the function zero for a sequence with no disorder, we define Runs as the number of boundaries between runs. These boundaries are called step-downs [Knuth 1973, p. 161], at which a smaller element follows a larger one. For example, W_1 = (6 | 2, 4, 7 | 3 | 1, 9 | 5, 10 | 8) has Runs(W_1) = 5.

(6) SUS. A natural generalization of Runs is the minimum number of ascending subsequences into which we can partition the given sequence. We denote this measure by SUS, for Shuffled Up-Sequences [Levcopoulos and Petersson 1990].

(7) SMS. SUS can be generalized further by defining SMS(X) (for Shuffled Monotone Subsequences) as the minimum number of monotone (ascending or descending) subsequences into which we can partition the given sequence [Levcopoulos and Petersson 1990]. For example,

    W_2 = (6, 5, 8, 7, 10, 9, 12, 11, 4, 3, 2)

has Runs(W_2) = 7, whereas

    SUS(W_2) = ||{(6, 8, 10, 12), (5, 7, 9, 11), (4), (3), (2)}|| = 5

and

    SMS(W_2) = ||{(6, 8, 10, 12), (5, 7, 9, 11), (4, 3, 2)}|| = 3.

(8) Enc. Several researchers have designed sorting algorithms, determined on which permutations they do well, and then have defined new measures as the result of their investigations. Skiena [1988] proposed the measure Enc(X), defined as the number of sorted lists constructed by Melsort when applied to X.

(9) Osc. Levcopoulos and Petersson [1989a] defined the measure Osc from a study of Heapsort. The measure Osc evaluates, in some sense, the "oscillation" of large and small elements in a given sequence.

(10) Reg. Moffat and Petersson [1991; 1992] defined a new measure called Reg; any Reg-optimal sorting algorithm is optimally adaptive with respect to the other 10 measures.

Are these 10 measures and Inv different? From the algorithmic point of view, if two measures M_1 and M_2 partition the set of permutations into exactly the same classes of below sets, then any algorithm that is M_1 optimal is also M_2 optimal and conversely; therefore, the measures are indistinguishable. We capture these ideas in the following definition.

Definition 1.2. Let M_1, M_2: N^{<N} → ℜ be two measures of disorder. We say that

(1) M_1 is algorithmically finer than M_2 (denoted M_1 ≤_alg M_2) if and only if any M_1-optimal algorithm is also M_2 optimal.

(2) M_1 and M_2 are algorithmically equivalent (denoted M_1 =_alg M_2) if and only if M_1 ≤_alg M_2 and M_2 ≤_alg M_1.

Figure 3 displays the partial ordering, with respect to ≤_alg, of the measures that we have defined.

Using a complex data structure called a historical search tree, Moffat and Petersson designed an insertion-based sorting algorithm that makes an optimal number of comparisons with respect to Reg but does not make an optimal number of data moves; therefore, it takes Ω(log Reg(X) + |X| log log |X|) time and is not Reg optimal [Moffat and Petersson 1991; 1992]. Their work constitutes an important theoretical contribution and raises the question of the existence of a universal measure U that is finer than all measures. However, even if such a universal measure U exists, there may be neither a U-optimal algorithm nor a practical implementation [Mannila 1985a].

A sequence is nearly sorted if either it requires few operations to sort it or it was created from a sorted sequence with a few perturbations.


Figure 3. The partial order of the 11 measures of disorder. The ordering with respect to ≤_alg is up the page; for example, Inv optimality implies Dis optimality.

Each measure that we have defined illustrates that disorder can be measured in many different ways; each is important because it formalizes the intuitive notion of a nearly sorted sequence. For ease of reference, Table 1 presents tight lower bounds for optimality with respect to 11 different measures of disorder. The lower bound for Inv was established by Guibas et al. [1977]; the lower bounds for Rem and Runs were established by Mannila [1985b]; the lower bounds for Enc, Exc, Osc, SMS, and SUS were established by Levcopoulos and Petersson [1989a; 1990; 1991b]; the lower bounds for Dis and Max were established by Estivill-Castro and Wood [1989]; and the lower bound for Reg was established by Moffat and Petersson [1991; 1992].

Measures of disorder are a fundamental concept, but little can be said about them because of their generality. Moreover, measures of disorder appear in contexts other than adaptive sorting. In Statistics, measures of disarray or right-invariant metrics are used to obtain coefficients of correlation for rank correlation methods. These coefficients of correlation are used to test the significance of observed rank correlations. For example, Inv appears in the definition of Kendall's τ, the most popular coefficient of correlation [Kendall 1970]. Right-invariant metrics have applications in cryptography where they are used to build tests for random permutations [Sloane 1983].

Table 1. The Worst-Case Lower Bounds for Different Measures of Disorder

M       Lower bound: log ||below(M(X), |X|, M)||
Dis     Ω(|X| (1 + log[Dis(X) + 1]))
Exc     Ω(|X| + Exc(X) log[Exc(X) + 1])
Enc     Ω(|X| (1 + log[Enc(X) + 1]))
Inv     Ω(|X| (1 + log[Inv(X)/|X| + 1]))
Max     Ω(|X| (1 + log[Max(X) + 1]))
Osc     Ω(|X| (1 + log[Osc(X)/|X| + 1]))
Reg     Ω(|X| + log[Reg(X) + 1])
Rem     Ω(|X| + Rem(X) log[Rem(X) + 1])
Runs    Ω(|X| (1 + log[Runs(X) + 1]))
SMS     Ω(|X| (1 + log[SMS(X) + 1]))
SUS     Ω(|X| (1 + log[SUS(X) + 1]))

Other applications of measures of disorder include the study of error sensitivity of sorting algorithms [Islam and Lakshman 1990]. The study of the relationships between right-invariant metrics and measures of disorder has resulted in general algorithms for the pseudorandom generation of nearly sorted sequences with respect to a measure of disorder [Estivill-Castro 1991].
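As a crude stand-in for the general generation algorithms cited above, the following Python sketch (our own, not from the cited work) produces pseudorandom sequences that are nearly sorted with respect to Rem: the n − k untouched elements remain in ascending order, so Rem is at most k.

import random

def nearly_sorted(n, k, seed=None):
    """Permutation of 0..n-1 with Rem at most k: start sorted, remove k
    random elements, and reinsert each at a random position."""
    rng = random.Random(seed)
    xs = list(range(n))
    picked = [xs.pop(rng.randrange(len(xs))) for _ in range(k)]
    for x in picked:
        xs.insert(rng.randrange(len(xs) + 1), x)
    return xs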

1.3 Organization of the Paper

Most studies of adaptive sorting have had as their goal a guarantee of optimal worst-case performance. This approach has resulted in a large family of adaptive sorting algorithms and elegant theoretical results. Section 1 studies adaptivity from the point of view of worst-case analysis. Rather than presenting a list of sorting algorithms and their adaptivity properties, we present well-developed design tools and generic sorting algorithms. The adaptivity results are presented in a general form with respect to an abstract measure of disorder. From this general scheme, specific algorithms are obtained as particular applications of the design tools. The adaptivity properties of particular sorting algorithms


appear as simple consequences of the general result.

Worst-case analysis requires us to guarantee best possible performance for all inputs even if some cases are unlikely to occur. This guarantee can result in complex sorting algorithms that use sophisticated data structures with large overhead. We suggest an alternative approach: Design algorithms that are adaptive in the expected case. Section 2 studies adaptivity from the point of view of expected-case analysis.

In Section 3, we discuss external sorting algorithms, that is, algorithms that are used to sort sequences that cannot be stored entirely in main memory. Current technology allows the sorting of large files to be performed on disk drives [Salzberg 1988; 1989]. Since Replacement Selection allows full overlapping of I/O operations during initial run creation for external sorting and since it creates runs that are larger than the available memory, Replacement Selection is a fundamental practical tool and is almost the only algorithm in use. We present results that demonstrate the adaptive performance of Replacement Selection.

Finally, in Section 4 we discuss some open problems and some directions for further research. In particular, we survey the efforts that have been made to develop adaptive in-place sorting algorithms (that is, with constant extra space) and adaptive sorting algorithms for sequences with repeated keys. We also mention the efforts that have been made to define adaptivity of parallel sorting algorithms.

1. WORST-CASE (INTERNAL) ADAPTIVE SORTING ALGORITHMS

It is certainly useful to design an adaptive sorting algorithm for a particular measure; however, the diversity of measures of disorder suggests that an algorithm that is adaptive to several measures is much more useful, since in a general setting, the type of disorder in the input is unknown. Initially, many discoveries were made by studying a particular measure, but researchers have increasingly focused their attention on sorting algorithms that are adaptive with respect to several measures. Unfortunately, it appears that we cannot increase the number of measures without adding complex machinery with large overhead that renders the algorithm impractical. The tradeoff between the number of measures and a practical implementation is central to the selection of an adaptive sorting method. We illustrate this tradeoff by presenting progressively more sophisticated algorithms that are adaptive with respect to more measures, although we favor adaptive algorithms that are practical.

1.1 Generic Sorting Algorithms

We present a generic adaptive sorting algorithm that enables us to focus attention on the combinatorial properties of measures of disorder rather than the combinatorial properties of the algorithm [Estivill-Castro and Wood 1991b; 1992b]. Using it, we obtain a practical adaptive sorting algorithm, optimal with respect to several important measures of disorder and smoothly adaptive for other common measures.

The structure of the generic sorting algorithm, Generic Sort, should not be surprising. It uses divide and conquer and balancing to ensure O(n log n) worst-case time. What is novel, however, is that we can establish adaptivity with respect to a measure M of disorder by ensuring that the method of division, the division protocol, satisfies three requirements. First, division should take linear time in the worst case; second, the sizes of the sequences that it generates (and that are not sorted) should be almost the same; and, third, it should not introduce too much disorder. We formalize what is meant by too much disorder in Theorem 1.1. The generic sorting algorithm has the form shown in Figure 4. If a sequence cannot be divided into smaller sequences or is so close to sorted order that it can be sorted in linear time by an alternative sorting algorithm, then it is considered to be simple.


Generic Sort(X)

• X is sorted. Terminate.

• X is simple. Sort it using an alternative sorting algorithm for simple sequences.

• X is neither sorted nor simple.
  – Apply a division protocol to divide X into s ≥ 2 disjoint sequences.
  – Recursively sort the sequences using Generic Sort.
  – Merge the sorted sequences to give X in sorted order.

Figure 4. Generic Sort. The first generic sorting algorithm.

Generic Sort leaves us with two problems: What are reasonable division protocols, and what is meant by too much disorder? Three example division protocols are:

Straight division. Divide a sequence X = (x_1, ..., x_n) into two halves X_L = X[1..⌊|X|/2⌋] and X_R = X[⌊|X|/2⌋+1..|X|].

Odd-Even division. Divide X into the subsequence X_even of elements in even positions and the subsequence X_odd of elements in odd positions.

Median division. Divide X into the sequence of all elements smaller than the median of X and the sequence of all elements larger than the median of X (denoted by X_< and X_>, respectively).
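The three protocols can be transcribed directly. This Python sketch is our own (median_division uses a library median for brevity where a linear-time selection algorithm is intended, keeps the median itself with the lower part, and assumes distinct keys):

import statistics

def straight_division(X):
    """Split X into its first half and second half."""
    return X[:len(X) // 2], X[len(X) // 2:]

def odd_even_division(X):
    """Split X by 1-indexed position parity: odd positions, even positions."""
    return X[0::2], X[1::2]      # X[0::2] holds positions 1, 3, 5, ...

def median_division(X):
    """Split X around its median value."""
    m = statistics.median_low(X)  # O(n log n) here; O(n) selection is meant
    return [x for x in X if x <= m], [x for x in X if x > m]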

Observe that each of these division protocols satisfies the time and size requirements. The notion of too much disorder is made precise in the following theorem [Estivill-Castro and Wood 1991b; 1992b].

THEOREM 1.1. Let M be a measure of disorder such that M(X) = 0 implies X is simple, let D ∈ ℜ and s ∈ N be constants such that 0 < D < 2 and s > 1, and let DP be a linear-time division protocol that divides a sequence into s sequences of almost equal sizes.

(1) Generic Sort is worst-case optimal; it takes O(|X| log |X|) time in the worst case.

(2) If there is an n_0 ∈ N such that, for all sequences X with |X| > n_0, DP satisfies

    Σ_{j=1}^{s} M(jth sequence) ≤ D ⌈s/2⌉ M(X),

then Generic Sort is adaptive with respect to the measure M; it takes

    O(|X| (1 + log[M(X) + 1]))

time in the worst case.

(3) D < 2 is necessary.

1.1.1 Applications of Generic Sort

Consider the variant of Mergesort displayed in Figure 5. The straight-division protocol takes linear time, and Inv(X[1..⌊|X|/2⌋]) + Inv(X[⌊|X|/2⌋+1..|X|]) accounts for all the inversions except those inversion pairs with one element in X[1..⌊|X|/2⌋] and the other in X[⌊|X|/2⌋+1..|X|]. Thus, we have

    Inv(X) ≥ Inv(X[1..⌊|X|/2⌋]) + Inv(X[⌊|X|/2⌋+1..|X|]).

Similarly,

    Rem(X) ≥ Rem(X[1..⌊|X|/2⌋]) + Rem(X[⌊|X|/2⌋+1..|X|]).

Therefore, Straight Merge Sort is adaptive with respect to Inv and Rem. Now, Runs(X[1..⌊|X|/2⌋]) + Runs(X[⌊|X|/2⌋+1..|X|]) accounts for all the step-downs in X except possibly for a step-down between x_⌊|X|/2⌋ and x_{⌊|X|/2⌋+1}. Thus, we have

    Runs(X) ≥ Runs(X[1..⌊|X|/2⌋]) + Runs(X[⌊|X|/2⌋+1..|X|]),

and Theorem 1.1 and the lower bound imply that Straight Merge Sort is optimal with respect to Runs.

Now, for the measure Dis, we have

    Dis(X) ≥ Dis(X[1..⌊|X|/2⌋])


Straight Merge Sort(X);
if not sorted(X) then
begin
    Straight Merge Sort(X[1..⌊|X|/2⌋]);
    Straight Merge Sort(X[⌊|X|/2⌋+1..|X|]);
    Merge(X[1..⌊|X|/2⌋], X[⌊|X|/2⌋+1..|X|]);
end;

Figure 5. A pseudo-Pascal implementation of Straight Merge Sort.

and

    Dis(X) ≥ Dis(X[⌊|X|/2⌋+1..|X|]);

therefore,

    2 Dis(X) ≥ Dis(X[1..⌊|X|/2⌋]) + Dis(X[⌊|X|/2⌋+1..|X|]).

This bound is tight because, for the sequence W = (2, 1, 4, 3, ...), we have Dis(W) = Dis(W[1..⌊|W|/2⌋]) = Dis(W[⌊|W|/2⌋+1..|W|]) = 1. Assume that Theorem 1.1 holds when D = 2. This assumption implies that Straight Merge Sort should take O(|W|(1 + log 2)) = O(|W|) time. But clearly, Straight Merge Sort requires Ω(|W| log |W|) comparisons to sort W. Thus, D < 2 is necessary for Theorem 1.1 to hold, and Straight Merge Sort is not adaptive with respect to Dis.

To construct an algorithm that is adaptive with respect to Dis, we use the odd-even division protocol. It can be shown that Dis(X)/2 ≥ Dis(X_odd) and Dis(X)/2 ≥ Dis(X_even), and immediately we have

    Dis(X) ≥ Dis(X_even) + Dis(X_odd).

Moreover, it is easy to prove that, for any sequence X,

    Inv(X) ≥ Inv(X_even) + Inv(X_odd)

and

    Rem(X) ≥ Rem(X_even) + Rem(X_odd).

Thus, the modified version of Mergesort presented in Figure 6 takes fewer comparisons than the minimum of

Odd-Even Merge Sort(X);
if not sorted(X) then
begin
    Odd-Even Merge Sort(X_even);
    Odd-Even Merge Sort(X_odd);
    Merge(X_even, X_odd);
end;

Figure 6. A pseudo-Pascal implementation of Odd-Even Merge Sort.

6|X|(log[Inv(X) + 1] + 1), 6|X|(log[Dis(X) + 1] + 1), 6|X|(log[Rem(X) + 1] + 1), and 6|X|(log[Max(X) + 1] + 1). In other words, if Odd-Even Merge Sort is given a sequence that is nearly sorted with respect to any of the above measures, it will adapt its time requirements accordingly. In fact, for Dis and Max, it is optimal; see Table 1.

As a final application of Theorem 1.1, we combine the two previous algorithms and obtain an algorithm that is adaptive with respect to Inv, Exc, and Rem and optimal with respect to Runs, Dis, and Max. We merely combine the two division protocols to give Odd-Even Straight Merge Sort; see Figure 7. With the same design tool we have obtained a sorting algorithm that is:

• Simple and leads to implementations that are competitive with traditional methods on random sequences.

• Adaptive with respect to several measures; thus, it leads to implementations that are faster on nearly sorted inputs.

The reader may ask why we do not use a sorting algorithm that is Reg optimal and covers all the measures we have introduced. As we have pointed out, there is no known sorting algorithm that is Reg optimal. The message we are attempting to communicate is that more machinery is required to obtain adaptivity with respect to more measures, and we have to be very careful that the overhead needed does not outweigh the savings obtained by the gained adaptivity. We hope to give you a feeling for those methods whose overhead is small that result in adaptivity for many measures.


Odd-Even Straight Merge Sort(X);
if not sorted(X) then
begin
    O-E S M Sort(X_evenL);
    O-E S M Sort(X_oddL);
    O-E S M Sort(X_evenR);
    O-E S M Sort(X_oddR);
    Merge(X_evenL, X_oddL, X_evenR, X_oddR);
end;

Figure 7. A pseudo-Pascal implementation of Odd-Even Straight Merge Sort.
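Transcribed into Python, these Mergesort variants are only a few lines. The sketch below is our own rendering of Figure 6 (Odd-Even Merge Sort); replacing the division line with the straight division, or alternating the two, yields the algorithms of Figures 5 and 7.

import heapq

def odd_even_merge_sort(X):
    """Generic Sort instantiated with the odd-even division protocol."""
    if all(X[i] <= X[i + 1] for i in range(len(X) - 1)):
        return list(X)                      # sortedness test: linear time
    odd = odd_even_merge_sort(X[0::2])      # 1-indexed odd positions
    even = odd_even_merge_sort(X[1::2])     # 1-indexed even positions
    return list(heapq.merge(odd, even))     # linear two-way merge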

We now present a division protocol that allows us to achieve optimality in Exc, Inv, and Rem with little increase in the complexity of the code.

1.2 Cook-Kim Division

In their empirical studies, Cook and Kim [1980] chose the measure Rem because it "is an easily computed measure that coincides with our intuitive notion [of nearly sorted]." They designed a division procedure to make Quickersort adaptive with respect to Rem. The resulting sorting algorithm is called CKsort. Later, Levcopoulos and Petersson [1991b] observed that Cook-Kim division is very powerful, since we can use it to add Rem and Exc to the list of measures for which a sorting algorithm is optimal.

Let A be any sorting algorithm that takes O(n log n) comparisons in the worst case. We describe how Cook-Kim division can be used to obtain a sorting algorithm that is Rem- and Exc-optimal and also M optimal, for any measure M for which algorithm A is M optimal. Let X = (x_1, ..., x_n) be the sequence to be sorted. Cook-Kim division divides X into two subsequences such that one of the subsequences is sorted, and the other has length at most 2 Rem(X). We sort the unsorted subsequence with algorithm A and merge the resulting sequences, in linear time, to obtain a sorted sequence. In total, Cook-Kim division adds a linear-time term to the time taken by algorithm A. The division procedure is defined as follows. Initially, x_1 is the only element in the sorted subsequence. Now, examine x_i, for i = 2, ..., |X|. If x_i is larger than the last element in the sorted subsequence, then x_i is appended to the sorted subsequence; otherwise, the last element in the sorted subsequence and x_i are placed in a second subsequence, and the first sorted subsequence shrinks.

For example, Cook-Kim division can be combined with the test for sortedness in Odd-Even Straight Merge Sort to obtain an algorithm that is optimal with respect to Dis, Exc, Max, Rem, and Runs. Moreover, Levcopoulos and Petersson [1991b] generalized Cook-Kim division to obtain a division protocol that satisfies the linear-time and equal-size requirements of Generic Sort. They demonstrated that the resulting algorithm is Inv- and Rem-optimal. We now present their division protocol.

LP division protocol. Given a sequence X = (x_1, ..., x_n), initially place x_1 in a sequence S and create two empty sequences X_L and X_R. Now, examine x_i, for i = 2, ..., |X|. If x_i is larger than the last element in S, append x_i to S; otherwise append x_i to X_L and remove the last element of S and append it to X_R. We obtain a sorted sequence S and two sequences X_R and X_L of the same length.
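In Python, the LP division protocol is a single scan. This sketch is our own (the behavior when S becomes empty is not specified in the text, so restarting S is our assumption):

def lp_division(X):
    """LP division: return (S, X_L, X_R), with S sorted and |X_L| == |X_R|."""
    S, XL, XR = [], [], []
    for x in X:
        if not S or x > S[-1]:
            S.append(x)          # x extends the sorted sequence S
        else:
            XL.append(x)         # x breaks the order ...
            XR.append(S.pop())   # ... and drags the top of S into X_R
    return S, XL, XR

On a nearly sorted input most elements stay in S, so the recursive calls of Generic Sort see only the short sequences X_L and X_R.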

When we combine straight division with Odd-Even division and LP division in Generic Sort, we obtain an algorithm, which we call LP Odd-Even Straight Merge Sort, that is Exc, Dis, Inv, Rem, and Runs optimal. Harris [1981] has shown that when Natural Mergesort is implemented using linked lists, as suggested by Knuth [1973, Sec. 5.2, Ex. 12], it is a competitive (with respect to comparisons and data movements) adaptive sorting algorithm that behaves well on random sequences. In fact, LP Odd-Even Straight Merge Sort can be implemented with linked lists without difficulty. With this implementation, LP Odd-Even Straight Merge Sort results in a practical and simple utility. Comparisons of CPU time, for nearly sorted sequences (with respect to Dis, Inv, Rem, and Runs), between LP Odd-Even Straight Merge Sort and


other algorithms indicate that LP Odd-Even Straight Merge Sort is the most effective alternative. Moreover, CPU timings of LP Odd-Even Straight Merge Sort show that it is competitive with other sorting algorithms on random data and much faster on nearly sorted data.

1.3 Partition Sort

In Section 1.1, we presented a generic sorting algorithm that divides the input into s parts of almost equal size, where s is a constant. We now describe an alternative generic sorting algorithm in which the number s of parts can vary and depends on the disorder in the input [Estivill-Castro and Wood 1992c]. Moreover, the sizes of the individual parts are not restricted, although the total size of the simple parts must be at least a fixed fraction of the total size of the parts. This approach makes the code for the generic algorithm more complex, but, in exchange, we obtain optimal adaptivity with respect to more measures. In particular, the division protocol is more sophisticated and may require more than linear time, since it must ensure that the disorder has been significantly reduced in a constant fraction of the parts. The parts where the disorder is small are considered simple.

Letting X be a sequence, we say that a set P(X) of nonempty sequences X_1, ..., X_p is a partition of X if every element of X is in one and only one sequence in P(X) and if the sequences in P(X) have elements only from X. The number of sequences in P(X) is called the size of the partition.

The structure of the generic sorting algorithm, Partition Sort, is again based on divide and conquer. Given a sequence X, a partition protocol computes a partition P(X) = {X_1, ..., X_{||P(X)||}}. Note that the size of the partition depends on the input X. Each X_i that is simple is sorted using a secondary sorting algorithm that sorts simple sequences. Each part that is not simple is sorted by a recursive call of Partition Sort. In the final step, all the sorted parts are merged; see Figure 8.

The merging of ||P(X)|| sorted parts takes O(|X|(1 + log[||P(X)|| + 1])) time (pairing the parts X_{2i−1} and X_{2i}, for i = 1, ..., ⌈||P(X)||/2⌉, and merging them reduces the number of parts by half in linear time). Partition Sort can take as much time to compute the partition as it does to merge the sequences resulting from the recursive calls. Our goal, however, is to obtain an algorithm that takes O(|X|(1 + log[||P(X)|| + 1])) time. If ||P(X)|| is related to a measure M of disorder, then the algorithm will be adaptive with respect to M. However, we will achieve our goal only if a constant fraction of the recursive calls are simple and only if the sizes of the partitions obtained during further recursive calls do not increase arbitrarily.

We make these notions precise in the following theorem [Estivill-Castro and Wood 1992c].

THEOREM 1.2. Let c ∈ ℜ be a constant with 0 < c < 1. Let PP be a partition protocol and d > 0 be a constant such that, for all sequences X ∈ N^{<N}:

• The partition protocol PP creates a partition P(X) = {X_1, ..., X_{||P(X)||}} of X making no more than d|X|(1 + log[||P(X)|| + 1]) comparisons.

• The sum of the lengths of the simple sequences in P(X) is at least c|X|.

• If PP is applied to any X_i in P(X), then no more than ||P(X)|| sequences are obtained.

If there is a constant k > 0 and a sorting algorithm S such that, for all simple sequences Y resulting from partitions of X and its parts, algorithm S sorts Y by making no more than k|Y| log(||P(X)|| + 1) comparisons, then Partition Sort makes O(|X|(1 + log[||P(X)|| + 1])) comparisons. In other words, Partition Sort is adaptive with respect to the size of the initial partition.

Before we present some applications of Partition Sort, we make two observations. First, Generic Sort can be regarded as a variant of Partition Sort. The specific form of Generic Sort, however, results in


Partition Sort(X)

• X is sorted. Terminate.

• X is simple. Sort it using an algorithm for simple sequences.

• X is neither sorted nor simple.
  – Create a partition P(X) = {X_1, ..., X_{||P(X)||}}.
  – For i = 1, ..., ||P(X)||, sort X_i recursively.
  – Merge the sorted sequences to give X in sorted order.

Figure 8. Partition Sort. The second generic sorting algorithm.

a stronger theorem (Theorem 1.1) that is easier to apply. Second, Carlsson and Chen [1991] attempted to use Partition Sort as a general framework for adaptive sorting algorithms and defined the "minimal size of a specific type r of partitions" as a generic measure of disorder; thus the relationship between the size of the partition and the measure would be immediate.

1.3.1 Applications of Partition Sort

The first application is to Natural Mergesort [Knuth 1973, p. 161]. Natural Mergesort partitions the input into ascending runs. A sequence is considered simple if it is sorted, and, in fact, in this example all parts of the partition are simple. Clearly, a partition into ascending runs can be obtained in linear time by scanning the input and finding all step-downs. Since the number of ascending runs is directly related to the measure Runs, Natural Mergesort takes O(|X|(1 + log[Runs(X) + 1])) time and is, therefore, Runs optimal; a new proof of Mannila's known result [Mannila 1985b].

Our second application is to Skiena's Melsort, which we now describe [Skiena 1988]. When Melsort is applied to an input sequence X, it constructs a partition of the input that consists of a set of sorted lists called the encroaching lists of X. Since encroaching lists are sorted, the simple parts are sorted sequences, and all parts are simple. In the final step, Melsort merges the lists to obtain the elements of X in sorted order. The encroaching set of a sequence X = (x_1, ..., x_n) is defined by the following procedure: We say that x_i fits a double-ended queue D if x_i can be added to either the beginning or the end of D to maintain D in sorted order. We start with x_1 as the only element in the first double-ended queue, and, for i = 2, ..., |X|, we insert x_i into the first double-ended queue in which it fits. We create a new double-ended queue if x_i does not fit any existing double-ended queue. An example should make this process clear. Consider the sequence W_3 = (4, 6, 5, 2, 9, 1, 3, 8, 0, 7). Initially, D_1 consists of 4. The second element, 6, fits at the end of D_1. The third element, 5, is between 4 and 6, so 5 is added to an empty D_2. The next three elements all fit D_1 and are placed there. The element 3 does not fit D_1, but it fits D_2. Similarly, 8 fits D_2, and 0 fits D_1; but the last element requires a new double-ended queue. The final encroaching set is

    {D_1 = [0, 1, 2, 4, 6, 9], D_2 = [3, 5, 8], D_3 = [7]}.

The number of lists in the encroaching set is the measure Enc of disorder. Thus, in our example, Enc((4, 6, 5, 2, 9, 1, 3, 8, 0, 7)) = 3. Since the encroaching set can be constructed in O(|X|(1 + log[Enc(X) + 1])) time, Melsort takes O(|X|(1 + log[Enc(X) + 1])) time; therefore, Melsort is Enc optimal, and we have obtained a new proof of Skiena's result [Skiena 1988].
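The encroaching-set construction is short in Python. The sketch below is our own; it searches the deques linearly for the first fit, which is simple but slower than the O(|X|(1 + log[Enc(X) + 1])) bound quoted above.

from collections import deque

def encroaching_set(xs):
    """Build the encroaching lists: each element goes to the first deque it
    fits (front if <= head, back if >= tail); otherwise a new deque opens.
    Enc(X) is the number of deques."""
    deques = []
    for x in xs:
        for d in deques:
            if x <= d[0]:
                d.appendleft(x)
                break
            if x >= d[-1]:
                d.append(x)
                break
        else:
            deques.append(deque([x]))
    return deques

w3 = [4, 6, 5, 2, 9, 1, 3, 8, 0, 7]
assert len(encroaching_set(w3)) == 3      # Enc(W_3) = 3

Merging the resulting deques (for instance, with heapq.merge) completes Melsort.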

Our third application is to Slab Sort [Levcopoulos and Petersson 1990]. Slab Sort is a sorting algorithm that achieves optimality with respect to SMS(X) (the minimum number of shuffled monotone subsequences of X) and, therefore, optimality with respect to Dis, Max, Runs, and SUS. Although Slab Sort is an important theoretical breakthrough, it


has limited practical value because it requires repeated median finding.

Slab Sort sorts sequences X with SMS(X) ≤ z using O(|X|(1 + log[z + 1])) comparisons. The input sequence X is stably partitioned into p = ⌈z²/2⌉ parts of almost equal size using the ⌊1 + |X|/p⌋-th, ⌊1 + 2|X|/p⌋-th, ..., ⌊1 + (p − 1)|X|/p⌋-th smallest elements as pivots. These elements are found by repeated median finding that makes a total of O(|X| log[p + 1]) comparisons. The partitioning is similar to the partitioning in Quicksort, but with p − 1 pivots, and it can be carried out in O(|X|(1 + log[p + 1])) time, as is shown by Frazer and McKellar [1970] for the partitioning in Samplesort.

Before describing Slab Sort, we must define zig-zag shuffled sequences because, in this application, zig-zag sequences correspond to simple sequences.

Definition 1.3. Letting X ∈ N^{<N}, we denote a subsequence of X by (x_{i(1)}, ..., x_{i(s)}), where i: {1, ..., s} → {1, ..., |X|} is injective and monotonically increasing. We say that a subsequence (x_{i(1)}, ..., x_{i(s)}) is an up-sequence if x_{i(1)} < x_{i(2)} < ... < x_{i(s)}. Similarly, we say that a subsequence (x_{i(1)}, ..., x_{i(s)}) is a down-sequence if x_{i(1)} > x_{i(2)} > ... > x_{i(s)}. A subsequence is monotone if it is either a down-sequence or an up-sequence. We say that two subsequences (x_{i(1)}, ..., x_{i(s)}) and (x_{j(1)}, ..., x_{j(t)}) intersect if {i(1), i(1) + 1, ..., i(s)} ∩ {j(1), j(1) + 1, ..., j(t)} ≠ ∅.

For example, if W_4 = (6, 5, 8, 7, 10, 9, 4, 3, 2, 1), the subsequences (6, 8, 10) and (5, 7, 9) intersect, but (6, 8, 10) and (4, 3, 2, 1) do not.

Definition 1.4. A sequence X is a zig-zag shuffled sequence if there is a partition of X into SMS(X) monotone subsequences such that no up-sequence intersects a down-sequence.

A geometric interpretation of zig-zag shuffled sequences can be obtained as follows. Given a sequence X, for each x_i we plot a point (i, rank(x_i)) in the plane. Given the minimum decomposition of X into monotone subsequences, we join the consecutive points of each monotone subsequence with line segments. Then, X is a zig-zag shuffled sequence if no down-sequence curve intersects an up-sequence curve; for example, see Figure 9.

Let X_1, X_2, ..., X_p be the subsequences given by the Slab Sort partition of X. Zig-zag shuffled sequences are important because if X_i is a zig-zag shuffled sequence and SMS(X_i) ≤ z, then Enc(X_i) ≤ z and Melsort sorts X_i in O(|X_i|(1 + log[z + 1])) time; hence, Melsort can be used to sort zig-zag shuffled sequences. Moreover, for any X_i, if Melsort attempts to construct an encroaching set with more than z sorted lists, then we know that X_i is not simple and should be sorted by a recursive invocation of Slab Sort.

Finally, we must ensure that a constant fraction of the elements in X belongs to a simple subsequence. Let X be a sequence with SMS(X) ≤ z. Assume that a minimum partition by monotone sequences has z_1 down-sequences and z_2 up-sequences with z_1 + z_2 ≤ z. Thus, the number of intersections among up-sequences and down-sequences is bounded by z_1 z_2 ≤ z²/4. Since Slab Sort stably partitions X into p = ⌈z²/2⌉ intervals X_1, ..., X_p, at least half of these intervals are zig-zag shuffled sequences with SMS(X_i) ≤ z. Hence, half of the intervals are simple, and, since all intervals are almost the same size, about half of the elements in the original sequence belong to simple sequences. We conclude that given a sequence X and an integer z such that SMS(X) ≤ z, Slab Sort sorts X in O(|X|(1 + log[z + 1])) time.

1.4 Exponential Search

Let M be a measure of disorder. Suppose that we have a sorting algorithm A that has two inputs, a sequence X and a bound z on the disorder of X, and uses the bound z to sort X in O(log ||below(z, |X|, M)||) time.


Figure 9. A plot of a zig-zag shuffled sequence.

Although algorithm A is adaptive, it requires z as input, and thus we do not consider it to be a general sorting algorithm. We describe a design tool, however, that allows us to use algorithm A as a building block for a general sorting algorithm that is M adaptive.

The idea is to estimate a bound on the disorder in the input and then apply algorithm A [Estivill-Castro and Wood 1989]. We describe a scheme to approximate the disorder based on a variant of exponential search [Mehlhorn 1984, p. 184].

The sorting algorithm Search Sort uses a variant of exponential search to find an upper approximation to M(X). In other words, we find a bound z such that (1) M(X) ≤ z and (2) z is not too far from M(X), and then give this bound to algorithm A [Estivill-Castro and Wood 1992d].

Let k be a constant such that, for all X ∈ N^{<N} and all z with M(X) ≤ z, algorithm A on inputs X and z takes no more than k × f(|X|, z) comparisons, where

    f(|X|, z) = max{|X|, log ||below(z, |X|, M)||}.

The general algorithm Search Sort is shown in Figure 10. Note that f(|X|, z) is a nondecreasing function of z. If the updating of the estimate of the bound on the disorder is carefully selected, the time taken by Search Sort is dominated by the last invocation of algorithm A, and thus, we obtain an algorithm that requires only the sequence X as input and, up to a constant factor, is as fast as algorithm A.

Search Sort(X)

• Set an initial bound z = c, where c > 0 is a small constant.

• Repeat
  – Call algorithm A with inputs X and z, and count the comparisons A performs.
  – If A makes more than k × f(|X|, z) comparisons, then interrupt A and update the guess z (usually by squaring z).
  – If A finishes successfully with at most k × f(|X|, z) comparisons, then terminate.

Figure 10. Search Sort. The third generic sorting algorithm.

The following theorem formalizes the details.

THEOREM 1.5. Let k > 0 be a constant such that, for all X ∈ N^{<N} and all z with M(X) ≤ z, algorithm A on inputs X and z takes no more than k × f(|X|, z) comparisons. If c > 1 is a constant such that, for all z and all i ≥ 0,

    c × f(|X|, z^{2^i}) ≤ f(|X|, z^{2^{i+1}}),

and if Search Sort updates the guess z by squaring it, then Search Sort makes

    O(f(|X|, M(X)))

comparisons to sort X; Search Sort is M optimal.
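As a concrete rendering of Figure 10, the following Python sketch (our own framing; the callback signature and the exception are assumptions, not the paper's interface) wraps any bound-consuming algorithm A:

class ComparisonBudgetExceeded(Exception):
    """Raised by A as soon as it exceeds its comparison budget."""

def search_sort(xs, A, f, k, z0=2):
    """Run A(xs, z, budget) with guesses z0, z0**2, z0**4, ...

    A must sort xs within the budget whenever M(xs) <= z and raise
    ComparisonBudgetExceeded otherwise. Under the growth condition of
    Theorem 1.5, the last (successful) call dominates the total cost.
    """
    z = z0
    while True:
        try:
            return A(xs, z, k * f(len(xs), z))
        except ComparisonBudgetExceeded:
            z = z * z                 # square the guess and retry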


1.4.1 An Application of Search Sort

As an application of Search Sort, we transform Slab Sort (see Section 1.3), which requires a bound on the disorder in the input, into a general sorting algorithm that is SMS optimal.

We let the initial guess be z = 2, and, rather than counting the comparisons performed by Slab Sort, we test whether the bound z is adequate by verifying that at least half of the parts in the partition are simple. If at any time we do not obtain this many simple parts, then we square the value of z, and the recursive calls of Slab Sort use the new value of z. It is not hard to verify that the new algorithm makes O(|X|(1 + log[1 + SMS(X)])) comparisons, so it is SMS optimal. This example illustrates that combining Partition Sort and Search Sort gives a general sorting algorithm whose division protocol is sophisticated and powerful.

1.5 Adaptive Merging

We have presented three generic sorting algorithms that are based on divide and conquer. Their adaptive behavior is a direct consequence of a carefully designed division protocol, whereas the merging phase is simple and oblivious to the disorder in the input. However, if the input is nearly sorted, then it is intuitively clear that there should be little disorder among the sorted sequences that result from the recursive calls. There have been two efforts to design sorting algorithms whose merge phase profits from such existing order.

Carlsson et al. [1990] presented the first adaptive sorting algorithm based on adaptive merging. Their work extended the notion of adaptivity and optimal adaptivity to merging algorithms; they proposed a new merging algorithm, Adaptmerge. Using this merging algorithm to obtain a sorted sequence from the runs in the input, they obtained a variant of Natural Mergesort that is Dis, Exc, Rem, and Runs optimal. Unfortunately, Adaptmerge represents its output as a linked list of sorted segments, which complicates its implementation.

The second sorting algorithm based on adaptive merging was designed by Van Gelder [1991]. The merging strategy is similar to that of Adaptmerge and is based on a simple idea: Let X = (x_1, ..., x_n) and Y = (y_1, ..., y_m) be sorted sequences stored in an array with X before Y. Assume n is known, and the goal is to merge X and Y. Their overlap (given by indexes l and r such that x_l ≤ y_1 < x_{l+1} and y_r < x_n ≤ y_{r+1}) is first found. Second, the smaller of x_{l+1}, ..., x_n and y_1, ..., y_r is copied to another array and then merged with the larger sequence in the original array. To obtain a sorting algorithm, Van Gelder uses straight division for the divide phase and the above merging strategy to combine sorted sequences. The resulting algorithm has little overhead and is Dis- and Runs optimal. It is also adaptive for the measures Rem and Exc, but not optimal.
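The overlap idea is easy to express in Python; the sketch below is our own rendering, not Van Gelder's routine. Two binary searches locate the overlap, only the overlapping slices are merged, and the disjoint prefix and suffix are passed through untouched.

    import bisect
    from heapq import merge

    def overlap_merge(X, Y):
        # X and Y are sorted lists, with X conceptually stored before Y.
        if not X or not Y or X[-1] <= Y[0]:
            return X + Y                      # no overlap: concatenation suffices
        l = bisect.bisect_right(X, Y[0])      # X[l:] lies above min(Y)
        r = bisect.bisect_left(Y, X[-1])      # Y[:r] lies below max(X)
        # Only the overlapping slices X[l:] and Y[:r] need real merging.
        return X[:l] + list(merge(X[l:], Y[:r])) + Y[r:]

For nearly sorted inputs the overlap is short, so little merging is done; for example, overlap_merge([1, 2, 7, 9], [3, 8, 10, 20]) merges only [7, 9] with [3, 8].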

2. EXPECTED-CASE (INTERNAL) ADAPTIVE SORTING ALGORITHMS

In contrast to the pessimistic view taken by worst-case analysis, expected-case analysis provides a more practical view, because normally the worst-case instances of a problem are unlikely. We now discuss sorting algorithms that are adaptive in the expected case. There are two approaches to expected-case complexity [Karp 1986; Yao 1977], the distributional approach and the randomized approach. In the distributional approach a "natural" distribution of the problem instances is assumed, and the expected time taken by the algorithm over the different instances is evaluated. This approach may be inaccurate, since the probabilistic assumptions needed to carry out the analysis may be false. We show that, based on the distributional approach, sound definitions of optimality with respect to a measure of disorder can be made, but little new insight is obtained from them. This difficulty is circumvented with randomized sorting algorithms, since their behavior is independent of the distribution of the instances to be solved. (The expected execution time of a randomized algorithm is evaluated on each individual instance.) We demonstrate that adaptive randomized sorting algorithms exist and that they are simple and practical. In contrast, worst-case time sorting algorithms that are adaptive for sophisticated measures like SMS use complex data structures or require median finding.

2.1 Distributional Analysis

Our goal here is to argue that distributional analysis provides no new insight into the behavior of adaptive sorting algorithms. As we have pointed out, the distributional approach to expected-case complexity analysis assumes a distribution of the problem instances. The standard assumption for distributional analysis of the sorting problem is that all permutations of the keys are equally likely to occur (clearly, an assumption that can hardly be true in practice; however, it is accepted, for example, in the textbooks that analyze Quicksort). Moreover, nearly sorted sequences are common in practice, and adaptive sorting algorithms are more efficient [Knuth 1973, p. 339; Mehlhorn 1984, p. 54]. Under the assumption that all permutations are equally likely, Katajainen and Mannila [1989] defined optimality for adaptive algorithms, in the expected case, using the distributional approach. Although their definition is sound, it is unclear whether it provides any new insights. They have been unable to construct an example of a sorting algorithm that is optimal in the expected case and suboptimal in the worst case. Moreover, the definition does not shed light on the behavior of Cook and Kim's CKsort. Recall that CKsort applies Cook-Kim division to a sequence X and splits X into two sequences, one of which is sorted. Quickersort is used to sort the unsorted sequence, and, finally, the two sequences are merged [Cook and Kim 1980]. CKsort takes O(|X|²) time in the worst case, and the computer science fraternity has assumed that it is adaptive with respect to Rem in the expected case [Cook and Kim 1980; Wainwright 1985]. In the second phase of the algorithm, however, the sequences of length at most 2 Rem(X) that are given to Quickersort are not equally likely to occur; therefore, CKsort is not expected-case optimal with respect to any nontrivial measure and Katajainen and Mannila's [1989] definition of optimality.

Li and Vitányi [1989] show that the universal distribution is as reasonable as the uniform distribution, and they are able to explain why nearly sorted sequences appear more frequently in practice. Moreover, they show that if an expected-case analysis is carried out for the universal distribution, then the expected-case complexity of an algorithm is of the same order as its worst-case complexity. Their result implies that Quicksort requires Θ(|X|²) time in the expected case, and it shows that distributional-complexity analysis depends heavily on the assumed distribution. More realistic distributions are usually mathematically intractable. In addition, not only are nearly sorted sequences more likely, but also their distribution may change over time. Because of the difficulties posed by the distributional approach, we consider the randomized approach.

2.2 Randomized Algorithms

We use randomization as a tool for the design of sorting algorithms that are both adaptive and practical. With randomization, we can enlarge the set of measures for which a sorting algorithm is adaptive without increasing the difficulty of a practical implementation. The difficulties with the distributional approach are circumvented by randomized algorithms because their behavior is independent of the distribution of the instances to be solved. Janko [1976] and Bentley et al. [1981] have successfully used randomization for a constrained sorting problem.

Using the terminology introduced by Mehlhorn [1984] and Yao [1977] for comparison-based algorithms, we translate the Ω(log ||below(·, ·, ·)||) lower bounds for the worst case to lower bounds for the distributional expected case, which, in turn, result in lower bounds for the randomized expected case. These implications allow us to capture the notion of optimal adaptivity for randomized sorting algorithms.

Again let M be a measure of disorder, and consider sorting algorithms having as input not only the sequence X but also an upper bound z on the value of M(X). Recall that the corresponding decision trees for such algorithms have at least ||below(z, |X|, M)|| leaves, since ||below(z, |X|, M)|| is the number of possible inputs. The average depth of a decision tree D with at least ||below(z, |X|, M)|| leaves is denoted by Average-depth(D) and is

    Average-depth(D) = Σ_{Y ∈ below(z, |X|, M)} (depth of Y in D) / ||below(z, |X|, M)||.    (2)

This formula implies that there is a constant c > 0 such that, for every sorting algorithm D that sorts sequences in below(z, |X|, M),

    Average-depth(D) ≥ c × log ||below(z, |X|, M)||.    (3)

Now, a randomized sorting algorithm corresponds to a randomized decision tree [Mehlhorn 1984]. In a randomized decision tree there are two types of nodes; we give an example of a randomized comparison tree in Figure 11. The first type of node is the ordinary comparison node. The second type is a coin-tossing node that has two outgoing edges labeled 0 and 1, which are each taken with a probability of 1/2. Without loss of generality we can restrict attention to finite randomized decision trees; thus, we assume that the number of coin tosses on inputs of size n is bounded by some function k(n). Which leaf is reached depends not only on the input but also on the sequence s ∈ {0, 1}^{k(n)} of outcomes of coin tosses. For any permutation Y and sequence s ∈ {0, 1}^{k(n)}, let d^T_{Y,s} denote the depth of the leaf reached with the sequence s of coin tosses and input Y. Let T be a randomized decision tree for the sequence X; then, the expected number of comparisons performed by T on X is denoted by E[T(X)] and is given by

    E[T(X)] = Σ_{s ∈ {0,1}^{k(n)}} d^T_{X,s} / 2^{k(n)};

that is, E[T(X)] is the average number of comparisons d^T_{X,s} over all random sequences s ∈ {0, 1}^{k(n)}.

Let W be the family of all randomized algorithms, and let R be a specific randomized algorithm. We denote by T_R(X) the number of comparisons performed by algorithm R on input X; T_R(X) is a random variable. A lower bound on the largest expected value of T_R(X), for X in below(z, |X|, M), is given by the following equation:

    max_{X ∈ below(z, |X|, M)} E[T_R(X)] ≥ min_{T ∈ W} max_{Y ∈ below(z, |X|, M)} E[T(Y)].

Observe that once the sequence s ∈ {0, 1}^{k(n)} is fixed, a randomized decision tree becomes an ordinary decision tree D_s that sorts sequences in below(z, |X|, M). Since the maximum value of E[T(Y)] is never smaller than its average value,

    max_{X ∈ below(z, |X|, M)} E[T_R(X)] ≥ min_{T ∈ W} Σ_{Y ∈ below(z, |X|, M)} E[T(Y)] / ||below(z, |X|, M)||.


Figure 11. A randomized comparison tree for sorting the sequence x_1, x_2, x_3.

Now, replacing ,?3[Z’(Y )] with

reordering the summations, since theyare independent, and using Equations (2)and (3) we obtain

max E[TR(X)]X= belo20(2,1Xl, M)

2 minT=@

,e{:,~(. z~n’

Average depth[ D, ]

>dlogl\ below (z, lXl, M)ll.

A randomized adaptive sorting algorithm cannot be faster than the fastest randomized sorting algorithm, which uses a bound on the disorder, when it is provided with the best possible bound, namely z = M(X). Intuitively, an adaptive randomized sorting algorithm R is optimal in the expected case with respect to a measure M of disorder if, for every sequence X, the expected value of T_R(X) is within a positive constant of the minimum value. Finally, since testing for sortedness requires linear time, a sorting algorithm should be allowed at least a linear number of comparisons. Therefore, we capture the notion of optimal adaptivity for randomized sorting algorithms as follows.

Definition 2.1. Let M be a measure of disorder, R be a randomized sorting algorithm, and E[T_R(X)] denote the expected number of comparisons performed by R on input X. We say that R is optimal with respect to M (or M optimal) if, for some c > 0, we have, for all X ∈ N^{<N},

    E[T_R(X)] ≤ c × max{|X|, log ||below(M(X), |X|, M)||}.

Thus, for randomized algorithms the lower bounds for optimal adaptivity coincide with the lower bounds for optimal adaptivity in the worst case as displayed in Table 1; however, with randomization it is possible to obtain adaptivity for more measures with simpler implementations. Moreover, the relationships ≤_alg and =_alg between measures are preserved for randomized algorithms.

2.3 Randomized Generic Sort

We now present a randomized generic adaptive sorting algorithm, Randomized Generic Sort, that facilitates the design of an adaptive algorithm by enabling us to focus our attention, once more, on the combinatorial properties of measures of disorder rather than on the combinatorial properties of the algorithm. The randomized generic adaptive algorithm is based on divide and conquer. The division phase is performed, however, by a randomized division protocol that, in an expectation sense that we formalize later, does not introduce disorder. The generic algorithm gives rise to randomized sorting algorithms that are adaptive in the expected case; moreover, they are simple and practical.

We denote by Pr[P(X)] the probability that, on input X, the randomized division protocol produces a partition P(X) = {X_1, ..., X_{||P(X)||}}. Let T_RDP(X, P(X)) be the number of comparisons performed by the randomized division protocol to create P(X) when partitioning X. We say that the protocol is linear in the expected case if there is a constant k such that, for all X ∈ N^{<N},

    E[T_RDP(X)] = Σ_{P(X)} Pr[P(X)] × T_RDP(X, P(X)) ≤ k|X|.

An example of a randomized division protocol is:

● Randomized median division: Select, with equal probability, an element x from X as a pivot to partition X into X_{<x} (a sequence of elements smaller than x) and X_{>x} (a sequence of elements larger than x). The size of the partition is either two or three (the pivot counts as one element of the partition).

The structure of Randomized Generic Sort is presented in Figure 12.

The following theorem formally establishes that Randomized Generic Sort results in a sorting algorithm that is adaptive in the expected case with respect to a measure M, when we use a division protocol that reduces the disorder in the expected case. Observe that

    Σ_{i=1}^{||P(X)||} (|X_i|/|X|) M(X_i)

is the sum of the disorder in the parts of the partition weighted by their corresponding fractions of |X|. Observe also that

    Σ_{P(X)} Pr[P(X)] Σ_{i=1}^{||P(X)||} (|X_i|/|X|) M(X_i)

is the expected disorder in the parts with respect to the set of partitions generated by the division protocol.

Randomized Generic Sort(X)

● X is sorted. Terminate.

● X is simple. Sort X using an alternative sorting algorithm.

● X is neither sorted nor simple.

  – Apply a randomized division protocol to divide X into at least two disjoint sequences.

  – Recursively sort the sequences using Randomized Generic Sort.

  – Merge the sorted sequences to give X in sorted order.

Figure 12. Randomized Generic Sort. A randomized generic sorting algorithm.
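A compact Python rendering of Figure 12 may help; it is our sketch, not code from the literature. Figure 12 leaves the division protocol and the simplicity test abstract; here randomized median division is plugged in, so the final merge degenerates to concatenation around the pivot, and is_simple and sort_simple are caller-supplied placeholders.

    import random

    def randomized_generic_sort(X, is_simple, sort_simple):
        # X is sorted: terminate.
        if all(X[i] <= X[i + 1] for i in range(len(X) - 1)):
            return list(X)
        # X is simple: use the alternative sorting algorithm.
        if is_simple(X):
            return sort_simple(X)
        # Otherwise, apply randomized median division and recurse.
        pivot = random.choice(X)
        smaller = [x for x in X if x < pivot]
        equal = [x for x in X if x == pivot]
        larger = [x for x in X if x > pivot]
        left = randomized_generic_sort(smaller, is_simple, sort_simple)
        right = randomized_generic_sort(larger, is_simple, sort_simple)
        return left + equal + right  # merging is concatenation here

With is_simple(X) = len(X) <= 1 and sort_simple the identity, this is essentially the Randomized Quicksort of Section 2.3.1, except that the sortedness test here is a separate linear scan rather than being folded into the partitioning pass.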

THEOREM 2.2. Let M be a measure of disorder such that M(X) = 0 implies X is simple, c be a constant with 0 < c < 1, and RDP be an expected-case linear-time randomized division protocol that randomly partitions a sequence X into a partition P(X) = {X_1, ..., X_{||P(X)||}} with probability Pr[P(X)].

(1) Randomized Generic Sort is expected-case optimal; on a sequence of length n, it takes O(n log n) expected time.

(2) If there is an n_0 ∈ N and a constant p ≥ 2 such that, for all sequences X with |X| > n_0, RDP satisfies 2 ≤ ||P(X)|| ≤ p and

    Σ_{P(X)} Pr[P(X)] Σ_{i=1}^{||P(X)||} (|X_i|/|X|) M(X_i) ≤ c × M(X),

then Randomized Generic Sort is adaptive with respect to M in the expected case; it takes

    O(|X|(1 + log[M(X) + 1]))

time in the expected case.


Since it is the first time that the result has been presented for a generic algorithm, as compared with a similar result for a specific algorithm by Estivill-Castro and Wood [1992a], we provide the main ideas of its proof.

Sketch of the proof of Theorem 2.2. The proof requires an induction on |X|. Let b > 0 be a constant such that:

● the expected time taken by Randomized Generic Sort to sort all simple sequences of length n is at most bn;

● the expected time taken to discover that a sequence of length n is not simple and, in this case, to partition it and carry out any merging after the recursive calls is at most bn.

Let d be greater than b/log[2/(1 + c)], T_RGS(X) denote the number of comparisons performed by Randomized Generic Sort on input X, E[T_RGS(X)] denote the expected value of this random variable, and E[T_RGS](n, k) denote the maximum value of E[T_RGS(X)] over all X of length n with M(X) = k. We show by induction that

    E[T_RGS](n, k) ≤ dn(1 + log[k + 1]),

which establishes the theorem. We supply only the induction step, since the basis is immediate. Thus, we assume k ≥ 1, and we let X be a sequence of length n with M(X) = k. The induction hypothesis is

    E[T_RGS](n′, k) ≤ dn′(1 + log[k + 1]),

for all n′ < n. Now,

    E[T_RGS(X)] ≤ bn + Σ_{P(X)} Pr[P(X)] [ Σ_{i=1}^{||P(X)||} E[T_RGS(X_i)] ],

since the right-hand side is the average over all possible partitions of the sum of the cost of sorting each part. By the definition of E[T_RGS](·, ·),

    E[T_RGS(X)] ≤ bn + Σ_{P(X)} Pr[P(X)] Σ_{i=1}^{||P(X)||} E[T_RGS](|X_i|, M(X_i)).

The induction hypothesis gives

    E[T_RGS(X)] ≤ bn + Σ_{P(X)} Pr[P(X)] Σ_{i=1}^{||P(X)||} d|X_i|(1 + log[1 + M(X_i)]).

Algebraic manipulation enables us to show that this inequality implies that

    E[T_RGS(X)] ≤ (b + d)n + dn log[ 1 + Σ_{P(X)} Pr[P(X)] Σ_{i=1}^{||P(X)||} (|X_i|/|X|) M(X_i) ].

Now, by hypothesis (2) and since d > b/log[2/(1 + c)], we have

    E[T_RGS(X)] ≤ (b + d)n + dn log[1 + cM(X)]
                ≤ (b + d)n + dn log[1 + M(X)] - dn log[2/(1 + c)]
                ≤ dn(1 + log[1 + M(X)]),

as required.


2.3.1 Applications of Randomized Generic Sort

Different variants of Quicksort have been obtained by choosing different pivot-selection strategies. Standard Randomized Quicksort selects the pivot randomly and uniformly from the elements in the sequence [Mehlhorn 1984]. Standard Randomized Quicksort can take quadratic time on every sequence; moreover, the number of comparisons performed by Standard Randomized Quicksort is a random variable T_SRQ(X) such that T_SRQ(X) = Ω(|X| log |X|) and E[T_SRQ(X)] = Θ(|X| log |X|), for every sequence X [Knuth 1973]. Thus, Standard Randomized Quicksort is oblivious to the order in the input.

As an application of Randomized Generic Sort, we modify Standard Randomized Quicksort such that, while it partitions the input, it checks whether the input is already sorted, and, if so, no recursive calls are made. We call this algorithm Randomized Quicksort. The number of comparisons performed by the algorithm on an input sequence X is a random variable between c_1|X| and c_2|X|², for some constants c_1, c_2 > 0. Since Quicksort uses exchanges to rearrange the elements, we use Exc, the minimum number of exchanges required to sort a sequence, to measure disorder. It turns out that this measure characterizes precisely the adaptive behavior of Randomized Quicksort, as the following result shows [Estivill-Castro and Wood 1992a].

THEOREM 2.3. The expected number of comparisons performed by Randomized Quicksort on a sequence X is denoted by E[T_RQ(X)] and is Θ(|X|(1 + log[Exc(X) + 1])).

For k = 2, 3, ..., ⌊n/2⌋, the sequence

    W(k) = (2, 1, 4, 3, ..., 2k, 2k - 1)

has Exc(W(k)) = k. We can construct a sequence of length n and measure k by appending the sorted sequence

    Y_0 = (2k + 1, 2k + 2, ..., n)

to W(k). Since T_RQ(W(k)Y_0) = Ω(n(1 + log[k + 1])), we have shown that E[T_RQ(X)] is Ω(|X|(1 + log[Exc(X) + 1])). The proof of the upper bound follows from Theorem 2.2. We now show that Randomized Quicksort fulfills the hypothesis of Theorem 2.2 [Estivill-Castro and Wood 1992a].

LEMMA 2.4. Let X = (x_1, ..., x_n), X_l(s) = (x′_1, ..., x′_{s-1}) be the left subsequence produced by Randomized Quicksort's partition routine when the s-th element is used as the pivot, and X_r(s) = (x′_{s+1}, ..., x′_n) be the corresponding right subsequence. Then,

    (2/3) Exc(X) ≥ (1/n) Σ_{s=1}^{n} [ ((s - 1)/(n - 1)) Exc(X_l(s)) + ((n - s)/(n - 1)) Exc(X_r(s)) ].

The proof of this lemma, which uses classic results by Cayley [1849], is laborious because there are sequences in which most of the elements, when used as pivots, increase the disorder; that is, a pivot s can give Exc(X_l(s)) + Exc(X_r(s)) > Exc(X). However, Lemma 2.4 says that, on average, the partition given by a randomly selected pivot does not increase the disorder as measured by Exc.
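The Cayley result the proof uses is that a sequence of n distinct keys needs exactly n minus the number of cycles of its rank permutation exchanges, so Exc is easy to compute, and the flavor of Lemma 2.4 can be checked empirically. The helpers below are our own illustration; they assume distinct keys and use the weights |X_l|/|X| of Theorem 2.2 rather than the lemma's exact coefficients.

    def exc(X):
        # Exc(X) = n - (number of cycles), by Cayley's [1849] characterization.
        n = len(X)
        order = sorted(range(n), key=lambda i: X[i])  # index of the i-th smallest
        seen, cycles = [False] * n, 0
        for i in range(n):
            if not seen[i]:
                cycles += 1
                j = i
                while not seen[j]:
                    seen[j] = True
                    j = order[j]
        return n - cycles

    def avg_weighted_subproblem_exc(X):
        # Weighted disorder of the two sides, averaged over a uniform pivot.
        n = len(X)
        total = 0.0
        for s in range(n):
            left = [x for x in X if x < X[s]]
            right = [x for x in X if x > X[s]]
            total += (len(left) / n) * exc(left) + (len(right) / n) * exc(right)
        return total / n

For example, exc([2, 1, 4, 3]) is 2, while avg_weighted_subproblem_exc([2, 1, 4, 3]) is 0.625: on average, a random pivot leaves strictly less weighted disorder than this input had.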

Cook and Kim [1980], Dromey [1984], Wainwright [1985], and Wegner [1985] have designed adaptive versions of Quicksort that are deterministic algorithms with a worst-case complexity of O(|X|²), but their distributional expected-case analysis seems mathematically intractable. Thus, apart from simulation results, no other description of their adaptive behavior is known. Estivill-Castro and Wood [1992b] have shown that a worst-case bound of Θ(|X|(1 + log[Exc(X) + 1])) comparisons is achievable if the median is used as the pivot. Observe that, although Randomized Quicksort can be coded succinctly and efficiently, it is not Exc optimal, since Exc optimality implies that its running time would be O(|X| + Exc(X)[1 + log Exc(X)]).

The second application of Randomized Generic Sort is to Randomized Mergesort, a variant of Mergesort that is Runs optimal in the expected case. It is not optimal in the worst case, with respect to any nontrivial measure, for the same reason that Randomized Quicksort is not optimal; that is, it takes Θ(|X|²) time in the worst case on some unsorted sequence X. Randomized Mergesort first determines whether the input is already sorted, and, if so, it halts. Otherwise, Randomized Mergesort uniformly selects a split position and then proceeds in the same way as the usual Mergesort. The time complexity of Randomized Mergesort for a sequence X is a random variable independent of the distribution of the problem instances. From Table 1 we know that log ||below(X, Runs)|| is Ω(|X|[1 + log(Runs(X) + 1)]); thus, the following theorem implies that Randomized Mergesort is Runs optimal in the expected case.

THEOREM 2.5 lfEIT’~~( X)] is the ex-

pected number of comparisons performed

by Randomized Mergesort on a sequence

X, then

EITRll~(x)] = 6)1X1(1 +

log[l + Runs(X)])).

Theorem 2.5 follows by verifying that Randomized Mergesort is an application of Randomized Generic Sort that fulfills the hypotheses of Theorem 2.2. Although the uniform selection of the split position is the most practical alternative, Randomized Mergesort may use other distributions to select the split position [Estivill-Castro and Wood 1992a].
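A minimal Python sketch of Randomized Mergesort as just described (our code; the sortedness test costs one extra linear scan per call):

    import random

    def randomized_mergesort(X):
        # Already sorted: halt with no recursive calls.
        if all(X[i] <= X[i + 1] for i in range(len(X) - 1)):
            return list(X)
        split = random.randint(1, len(X) - 1)  # uniformly chosen split position
        left = randomized_mergesort(X[:split])
        right = randomized_mergesort(X[split:])
        merged, i, j = [], 0, 0                # standard two-way merge
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]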

2.4 Randomized Partition Sort

Naturally, we can define a randomized version of Partition Sort, which we call Randomized Partition Sort. It is also based on divide and conquer through a randomized division protocol. We can relax the requirements, however, that the randomized division protocol takes linear time in the expected case and that deeper recursive levels have no more parts than the partition at the first level. For Randomized Partition Sort, given a sequence X, the randomized partition protocol obtains a partition P(X) = {X_1, ..., X_{||P(X)||}} with probability Pr[P(X)]. Each simple part X_i is sorted using an alternative sorting algorithm for simple sequences, and each part that is not simple is sorted by a recursive call. In the final step, all sorted parts are merged; see Figure 13. The number of parts in the partition, ||P(X)||, is a random variable; thus, the expected time taken for the merge is no more than

    Σ_{P(X)} c × Pr[P(X)] |X|(1 + log[||P(X)|| + 1]) ≤ c|X|(1 + log[E[||P(X)||] + 1]),

where c is a constant and where E[||P(X)||] is the expected value of ||P(X)||.

Thus, the expected time taken by the partition protocol to obtain a partition of X is bounded from above by c|X|(1 + log[E[||P(X)||] + 1]). If the expected time taken by the recursive calls is also bounded by c|X|(1 + log[E[||P(X)||] + 1]), then we obtain an algorithm that is adaptive with respect to E[||P(X)||], and if E[||P(X)||] is related to a measure M of disorder, we obtain adaptivity with respect to M.

This approach works, however, only if the expected length of the simple sequences is a constant fraction of the total length of the sequence to be partitioned, as is made precise in the following theorem.

THEOREM 2.6. Let c be a constant with 0 < c < 1, RDP be a randomized division protocol, and d be a constant such that, for all sequences X ∈ N^{<N}, the RDP creates a partition P(X) = {X_1, ..., X_{||P(X)||}} of X in no more than d|X|(1 + log[E[||P(X)||] + 1]) comparisons, and

    c|X| ≥ Σ_{P(X)} Pr[P(X)] Σ_{i ∉ J(P(X))} |X_i|,

where J(P(X)) is the set of indices of simple parts.

If there is a constant k > 0 and a sorting algorithm S such that, for all possible simple sequences X_i in P(X), algorithm S sorts X_i by making no more than k|X_i|(1 + log[E[||P(X)||] + 1]) comparisons, then the expected number of comparisons to sort X by Randomized Partition Sort is

    O(|X|(1 + log[E[||P(X)||] + 1])).

Randomized Partition Sort is adaptive with respect to the expected size of the partition.

Randomized Partition Sort(X)

● X is sorted. Terminate.

● X is simple. Sort it using an algorithm for simple sequences.

● X is neither sorted nor simple.

  – Using a randomized division protocol, construct a partition P(X) = {X_1, ..., X_{||P(X)||}}.

  – For i = 1, ..., ||P(X)||, sort X_i recursively.

  – Merge the sorted sequences to give X in sorted order.

Figure 13. A relaxed, randomized division protocol results in Randomized Partition Sort.

One application of Randomized Partition Sort is to Randomized Slab Sort. Recall that Slab Sort sorts a sequence X with SMS(X) ≤ z using no more than O(|X|(1 + log[1 + z])) comparisons. It requires, however, p = ⌈z²/2⌉ pivots that are found using median finding, which makes O(|X|(1 + log[1 + p])) comparisons. In contrast, Randomized Slab Sort uses random selection of a sample of p pivots, as in Samplesort [Frazer and McKellar 1970]. Estivill-Castro and Wood [1991c] show that Randomized Slab Sort satisfies the requirements of Theorem 2.6. Thus, for a sequence X with SMS(X) ≤ z, Randomized Slab Sort takes O(|X|(1 + log[1 + z])) expected time.

2.5 Skip Sort

We have argued that sorting algorithms which are adaptive with respect to several measures take advantage of existing order even when it appears in different forms. Such algorithms are, however, hard to obtain without the use of complex data structures when worst-case adaptivity is desired. In contrast, randomization enables us to achieve expected-case optimality with respect to many measures with simpler data structures. To illustrate this point once more, we recall that Moffat and Petersson [1991] designed an insertion-sorting algorithm that, in the worst case, performs an optimal number of comparisons with respect to all measures of presortedness appearing in the literature, an important theoretical contribution. The data structure that they use to represent the sorted portion of the input is a historical search tree that provides fast local insertions. A local insertion is an insertion at a position in a sorted sequence that is not too far from previous insertions. The additional virtue of Moffat and Petersson's historical search tree is that it implements two types of local insertions efficiently: those insertions for which distance is measured by space (the number of links followed) in the data structure and those for which distance is measured by time in the sequence of insertions (the number of elements in the sequence between the successor and the element currently being inserted). Unfortunately, the use of a historical search tree implies not only that the time taken by the sorting algorithm is not proportional to the number of comparisons, but also, and more seriously, that the possible applications of the algorithm are limited because of the overhead.


Figure 14. A skip list of 8 elements with the search path when searching for the 6th element. The update vector for the 6th element is [5th, 5th, 4th, 4th, 4th, 4th, HEADER], since these positions define the nodes whose pointers may need to be modified if the height of the node at the 6th position changes.

We present Skip Sort, a practical sorting algorithm that is optimal with respect to Dis, Exc, Inv, Max, Rem, and Runs in the expected case. Skip Sort is an insertion-sorting algorithm that uses a skip list to represent a sorted subsequence. Informally, a skip list is a singly linked list with additional forward pointers at different levels; an example skip list is shown in Figure 14. It is a simple and practical probabilistic data structure that has the same asymptotic expected-time bounds as a balanced binary search tree and allows fast local insertions [Pugh 1990]. The price we pay for this performance is that skip lists are no longer worst-case optimal, but only expected-case optimal. Note that the same behavior can be achieved with randomized treaps¹ [Aragon and Seidel 1989]; we use skip lists because updates in a treap require rotations, and care must be taken to preserve the heap order of the priorities.

Since Skip Sort is an insertion-sorting algorithm, the elements are considered one at a time, and each new element is inserted at its appropriate position relative to the previously inserted elements. The standard skip-list insertion of x_i [Pugh 1990] begins at the header; however, for Skip Sort, the insertion of x_i into the skip list begins at the position of x_{i-1}, the previously inserted element.

¹ Treaps are search trees with keys and priorities stored in their nodes. The keys are arranged in in-order and the priorities in heap order.

During the insertion of each element into (and the deletion of an element from) the skip list, Pugh [1990] maintains what he calls the update vector. It satisfies the invariant: update[j] points to the rightmost node of level j or higher that is to the left of the position of x_{i-1} in the skip list; see Figure 14.

Since level-1 links are the standard list links, update[1] points to the node just before x_{i-1}. The search for the insertion position of x_i begins by comparing x_i and x_{i-1}. If x_i < x_{i-1}, then we search to the left; otherwise, we search to the right. If we must search to the left, then we repeatedly increase the level and compare x_i with the nodes pointed to by update[1], update[2], ..., until we find a level j such that the element of the node pointed to by update[j] is smaller than x_i. At this stage, x_i can be inserted after the node given by update[j] using the skip-list insertion algorithm. If we ever reach the top level of the skip list, we insert x_i by the skip-list insertion algorithm beginning at the header. For example, suppose we insert 18 immediately after 30 has been inserted into the skip list of Figure 14. We compare 18 with 30 and determine that we must go to the left. After comparing 18 with update[1] = 19 and update[2] = 19, we find that update[3] = 15 is smaller than 18. Thus, skip-list insertion proceeds from the fourth element and down from level 3. A search to the right is performed similarly.
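The search just described can be mimicked without the full skip-list machinery. The sketch below, our own simplification, keeps the sorted prefix in a plain Python list and gallops away from a finger at the previous insertion position, doubling the step until the new element is bracketed and then binary searching. It reproduces the O(1 + log d) comparison counts discussed next, but not the skip list's update time, since inserting into a Python list moves elements.

    import bisect

    def local_insertion_sort(X):
        S, finger = [], 0        # sorted prefix and previous insertion position
        for x in X:
            step, lo, hi = 1, finger, finger
            if finger < len(S) and x > S[finger]:
                while hi < len(S) and S[hi] < x:       # gallop to the right
                    lo, hi = hi, min(len(S), hi + step)
                    step *= 2
            else:
                while lo > 0 and S[lo - 1] >= x:       # gallop to the left
                    hi, lo = lo, max(0, lo - step)
                    step *= 2
            finger = bisect.bisect_left(S, x, lo, hi)  # finish with binary search
            S.insert(finger, x)
        return S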

To insert an element into a skip list that is d positions from the last insertion position takes O(1 + log d) expected time [Papadakis et al. 1990; Pugh 1990]. Let CI(d) denote the number of comparisons performed by an insertion d positions away from the last position and T_SS(X) denote the number of comparisons performed by Skip Sort on input X = (x_1, ..., x_n). Since E[T_SS(X)] is the sum of the expected times of the |X| insertions,

    E[T_SS(X)] = Σ_{i=1}^{|X|} E[CI(d_i(X))] = O( Σ_{i=1}^{|X|} (1 + log[1 + d_i(X)]) ),

where d_i(X) = ||{j | 1 ≤ j < i and (x_{i-1} < x_j ≤ x_i or x_i ≤ x_j < x_{i-1})}||. From this point onwards, the proofs of worst-case optimality [Estivill-Castro and Wood 1989a; Levcopoulos and Petersson 1991b; Mannila 1985b] can be modified to obtain the following theorem.

THEOREM 2.7. Skip Sort is expected-case optimal with respect to Inv, Runs, Rem, Exc, Max, and Dis.

Sketch of the proof. To illustrate the technique, we give the proof for Runs. We analyze Σ_{i=1}^{|X|} (1 + log[1 + d_i(X)]) as we did in the proof for the worst-case time of Local Insertion Sort and use

    E[T_SS(X)] = O( Σ_{i=1}^{|X|} (1 + log[1 + d_i(X)]) )    (4)

to bound the expected value of T_SS(X) by a function of |X| and the disorder in X. Since log ||below(Runs(X), |X|, Runs)|| = Ω(|X|[1 + log(Runs(X) + 1)]), to show that Skip Sort is Runs optimal in the expected case we must show that, for all X, E[T_SS(X)] = O(|X|[1 + log(Runs(X) + 1)]). Let R = Runs(X) + 1 and t(r) be the set of indices of the r-th run. Thus, Σ_{r=1}^{R} ||t(r)|| = |X|. Moreover, for each run r, Σ_{j ∈ t(r)} d_j(X) ≤ 2|X|. Now,

    Σ_{i=1}^{|X|} (1 + log[1 + d_i(X)]) = |X| + Σ_{r=1}^{R} Σ_{j ∈ t(r)} log(1 + d_j(X)).

For integers d_j ≥ 0 such that Σ_{j ∈ t(r)} d_j ≤ 2|X|, the sum Σ_{j ∈ t(r)} log(1 + d_j) is maximized when d_j = 2|X|/||t(r)||. Therefore,

    Σ_{i=1}^{|X|} (1 + log[1 + d_i(X)]) ≤ |X| + Σ_{r=1}^{R} ||t(r)|| log(1 + 2|X|/||t(r)||).

The right-hand side is

    |X| + Σ_{r=1}^{R} ||t(r)|| log(1 + 2|X|/||t(r)||)
        = |X| + |X| Σ_{r=1}^{R} log([1 + 2|X|/||t(r)||]^{||t(r)||/|X|})
        = |X| + |X| log Π_{r=1}^{R} [1 + 2|X|/||t(r)||]^{||t(r)||/|X|}.

Since the geometric mean is no larger than the arithmetic mean, we obtain

    |X| + |X| log Π_{r=1}^{R} [1 + 2|X|/||t(r)||]^{||t(r)||/|X|}
        ≤ |X| + |X| log Σ_{r=1}^{R} [2 + ||t(r)||/|X|]
        = |X|(1 + log[2R + 1]).

This inequality and Equation (4) give E[T_SS(X)] = O(|X|(1 + log[1 + Runs(X)])), as required.


Furthermore, because of the simplicity of skip lists, Skip Sort is a practical alternative to Local Insertion Sort.

Tables 2 and 3 present simulation results that imply that Skip Sort is twice as fast as Local Insertion Sort. Both algorithms were coded in C, and simulations were performed on a VAX-8650 running UNIX 4.3BSD and measured using gprof [Graham et al. 1982]. Each algorithm sorted the same set of 100 nearly sorted sequences for each input length. The Rem nearly sorted sequences were generated using Cook and Kim's method [Cook and Kim 1980; Wainwright 1985] with Rem(X)/|X| ≤ 0.2. The percentage of disorder in Table 3 is given by Max(X)/|X| ≤ 0.2, and the permutations were generated on the basis that Max is a normal measure of disorder [Estivill-Castro 1991].

3. EXTERNAL SORTING

Currently, the sorting of sequences of records that are too large to be held in main memory is performed on disk drives [Salzberg 1988; 1989]. Initial runs are created during the first phase of external sorting, and, during a second phase, the runs are merged. With more than one disk drive, Replacement Selection allows full overlapping of I/O operations and the calculations performed in main memory to create initial runs that are usually larger than the available main memory. If only one disk drive is available, Heapsort also allows full overlapping, but the length of the created runs is the size of available main memory [Salzberg 1988]. During the second phase of external sorting, the initial runs are combined by Multiway Merge, and the number of runs merged in each pass (the order of the merge) is chosen to optimize the seek time and the number of passes [Salzberg 1988; 1989]. Indeed, it is possible to achieve a two-pass sort, one pass to create the initial runs and one pass to sort the runs with Multiway Merge [Zheng and Larson 1992].

Sorting a very large sequence implies that the records must be read from disk at least once; thus, full overlapping of I/O with the operations performed in main memory produces initial runs for free. Replacement Selection is of central importance because longer initial runs result in fewer initial runs and, therefore, fewer passes over the sequence in the second phase. Thus, larger initial runs reduce the overall sorting time.

The classic result on the performance of Replacement Selection establishes that, when all input sequences are assumed to be equally likely, the asymptotic expected length of the produced runs is twice the size of available main memory [Knuth 1973, p. 254]. Similar results concerning the length of the first, second, and third initial run, as well as the last and penultimate initial run, have been obtained by Knuth [1973, p. 262]. Other researchers have modified Replacement Selection such that, asymptotically, the expected length of the initial runs is more than twice the size of available memory [Dinsmore 1965; Dobosiewicz 1984; Frazer and Wong 1972; Ting and Wang 1977]. These methods have received limited acceptance because they require more sophisticated I/O operations and prohibit full overlapping; thus, the possible benefits hardly justify the added complexity of the method. Similarly, attempts to design new external sorting methods that profit from the existing order in the input face inefficient overlapping of I/O operations.

Several researchers have observed, however, that the lengths of the runs created by Replacement Selection increase as the disorder in the input sequence decreases. For fixed input length, as the lengths of the initial runs increase, the number of passes to merge these runs decreases. Thus, external sorting using Replacement Selection is sensitive to the disorder in the input when two or more disk drives are available. Let P be the maximum number of records that can be stored in main memory. In the worst case, Replacement Selection produces runs no larger than P. Replacement Selection facilitates the merging phase when it produces runs of more than P elements. Based on this idea, Estivill-Castro and Wood [1991a]


Table 2. A Comparison of Local Insertion Sort and Skip Sort on Rem Nearly Sorted Sequences (CPU time in milliseconds)

    |X|          64     128     256     512     1024      2048      4096      8192
    LI Sort      57.8   120.2   247.0   517.1   1086.56   2191.28   4618.76   9224.77
    Skip Sort    25.2   51.2    105.6   224.4   483.11    979.32    2094.97   4181.15

Table 3. A Comparison of Local Insertion Sort and Skip Sort on Max Nearly Sorted Sequences (CPU time in milliseconds)

    |X|          64     128     256     512     1024      2048      4096      8192
    LI Sort      63.5   127.4   245.6   497.9   1052.76   2163.80   3115.64   9552.80
    Skip Sort    27.1   54.2    109.5   234.4   507.61    1080.16   2370.37   4881.20

describe mathematically the worst-case performance of Replacement Selection on nearly sorted sequences. They introduce two measures to assess the performance of Replacement Selection. These measures are minimized when Replacement Selection does not create runs with more than P elements and are maximized when Replacement Selection creates a sorted sequence. Their results show how the performance of Replacement Selection smoothly decreases as the disorder in the input sequence increases. In their analysis, they show that, as is common in practice, it is essential to consider input sequences that are much larger than the size of main memory; otherwise, the disorder in the input sequence has little effect. The reason is that, for short nearly sorted sequences, Replacement Selection does not produce long runs because there are simply not enough elements. We now summarize these results.

3.1 Replacement Selection

The environment for Replacement Selection consists of two disk drives; see Figure 15. The input sequence is read sequentially from the IN disk drive using double buffering. While one of the input buffers, say I_i, is being filled, i ∈ {0, 1}, Replacement Selection reads elements from the other input buffer I_{1-i}. The roles of I_i and I_{1-i} are interchanged when one buffer is full and the other one is empty. The output sequence is written sequentially to the second disk drive using a similar buffering technique.

For the purposes of our discussion we concentrate on Replacement Selection based on selection trees [Knuth 1973, p. 254]. We assume that there is room in main memory for P elements, together with buffers, program variables, and code. Replacement Selection organizes the P elements in main memory into a selection tree with P nodes.

Notational convention: We use X_I to denote the input sequence and X_O to denote the corresponding output sequence produced by Replacement Selection with P nodes. P is omitted throughout; we should really write X_O^P.

Figure 15. Environment for Replacement Selection.

Replacement Selection operates as follows. Initially, the selection tree is filled with the first P elements of X_I; the smallest element is at the root. The element at the root is the smallest element in the selection tree that belongs to the current run. Repeatedly, the root element is removed from the tree, appended to X_O, and replaced by the next element from X_I. In this way, P elements are kept in main memory, except during initialization and termination. To make certain that the elements that enter and leave the tree do so in proper order, we number the runs in X_O. After an element has been deleted from the selection tree and is replaced by a new element x, we discover the run number that x belongs to by the following reasoning: If x is smaller than the element that was just added to the current run, then x must be in the next run; otherwise, x belongs to the current run.
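The run-numbering rule lends itself to a compact sketch with a binary heap ordered on (run number, key) pairs; this is our illustration using Python's heapq in place of the selection tree, but it produces the same runs.

    import heapq

    def replacement_selection(X, P):
        # Create initial runs from the sequence X with room for P elements.
        heap = [(1, x) for x in X[:P]]
        heapq.heapify(heap)
        stream = iter(X[P:])
        runs = []
        while heap:
            run, x = heapq.heappop(heap)
            if run > len(runs):
                runs.append([])              # the root element starts a new run
            runs[-1].append(x)
            nxt = next(stream, None)
            if nxt is not None:
                # Smaller than the element just output: it must wait for the
                # next run; otherwise it joins the current run.
                heapq.heappush(heap, (run + 1 if nxt < x else run, nxt))
        return runs

On a sorted input this produces a single run of length |X_I|; on random inputs the average run length approaches the classic 2P quoted above.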

Consider the length of the input to be fixed and the disorder in the input to be variable. It is intuitively clear that Replacement Selection creates initial runs with more than P elements when X_I is nearly sorted. However, it is unclear how many initial runs will be larger than P. It is even less clear, for example, how sorted X_I should be to guarantee that at least one in four of the initial runs is larger than P. This observation leads us to a fundamental question: Letting c be a positive integer, how sorted should X_I be to guarantee that, on average, at least one in c of the initial runs is larger than P? The first measure of the performance of Replacement Selection is Runs-Ratio(X_O), the fraction of the runs that are larger than P; it is defined as follows:

    Runs-Ratio(X_O) = (number of runs in X_O that are larger than P) / (number of runs in X_O).

This measure is naive, but we are able to analyze it. The second measure of the performance of Replacement Selection is the average run length of X_O. It is denoted by ARL(X_O) and is defined as the average of the lengths of the runs in X_O:

    ARL(X_O) = (sum of the lengths of the runs in X_O) / (number of runs in X_O) = |X_O| / (number of runs in X_O).

In fact, the time taken by the second phase of external sorting is a function of the average run length of X_O. The number of passes required by a multiway merge is

    1 + ⌊log_o(number of runs in X_O)⌋ = 1 + ⌊log_o(|X_O| / ARL(X_O))⌋,    (5)

where o is the order of the merge.

The measure Runs-Ratio(X_O) can be as small as 0. We expect that if M(X_I) is small with respect to P, for some measure M of disorder, then Runs-Ratio(X_O) is at least positive. Some consideration must be given, however, to the length of X_I with respect to P, because if X_I is shorter than P, then the value Runs-Ratio(X_O) is 0 independently of how sorted X_I is. Similarly, we expect that if M(X_I) is small, then ARL(X_O) is somewhat larger than P; however, this property does not hold for all measures of disorder. Fortunately, disorder can be evaluated in many ways. Intuitively, Replacement Selection is oblivious to local disorder, since it can place elements that are no more than P positions away from their correct position. Replacement Selection is sensitive, however, to global disorder. Recall that one measure that evaluates this type of disorder is Max, defined as the largest distance an element must travel to reach its sorted position, and another is Dis, defined as the largest distance determined by an inversion. In fact, Dis and Max are equivalent since, for all unsorted X, Max(X) ≤ Dis(X) ≤ 2 Max(X) [Estivill-Castro and Wood 1989].

Figure 16. As the disorder in X_I grows, the performance of Replacement Selection, quantified by the smallest value of Runs-Ratio, decreases.
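Both measures are simple to compute, which makes the bounds that follow easy to explore numerically. The helpers below are our own (quadratic time for clarity, distinct keys assumed).

    def max_measure(X):
        # Max: the largest distance an element must travel to its sorted position.
        pos = {x: i for i, x in enumerate(sorted(X))}
        return max((abs(pos[x] - i) for i, x in enumerate(X)), default=0)

    def dis(X):
        # Dis: the largest distance determined by an inversion.
        n = len(X)
        return max((j - i for i in range(n) for j in range(i + 1, n)
                    if X[i] > X[j]), default=0)

For every unsorted X, max_measure(X) <= dis(X) <= 2 * max_measure(X), matching the equivalence just stated.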

Theorem 3.1 characterizes the Runs-Ratio with respect to Dis. We present the result in graphical form in Figure 16, which illustrates how the performance of Replacement Selection decreases as the disorder increases.

THEOREM 3.1. Let c > 0 be a constant, and let X_I be such that |X_I| ≥ (⌈c⌉ + 1)P. If Dis(X_I) < cP, then

(1) Runs-Ratio(X_O) = 1 when c ≤ 1; thus, when the disorder according to Dis is less than P, Replacement Selection maximizes Runs-Ratio.

(2) Runs-Ratio(X_O) ≥ 1/⌈c⌉ when c > 1; thus, Runs-Ratio is at least P/Dis(X_I).

Moreover, these bounds are tight.

It is not surprising that if the distance determined by an inversion pair that is farthest apart is less than P, then X_O is sorted, and Runs-Ratio(X_O) is maximized. However, it is not immediate that Dis(X_I) < 2P implies that X_O is sorted. As the bound on the disorder grows, the performance of Replacement Selection decreases. Now we can determine how sorted X_I should be to guarantee that at least one in c runs in X_O is larger than P. For example, if we want to make sure that one in two runs produced by Replacement Selection is larger than P, then Dis(X_I) should be less than 3P. What was unclear now seems simple.

As we have mentioned, Runs-Ratio is a simple measure. We have used it as a first approach to the study of the average run length. Recall (see Equation (5)) that the overall cost of external sorting is a function of the average run length.

As promised, Theorem 3.2 gives the corresponding bounds on the average run length of X_O when X_I is nearly sorted with respect to Dis; Figure 17 is the corresponding graphical interpretation of the theorem. This result confirms that the performance of Replacement Selection, measured by the average run length of X_O, decreases smoothly as the disorder in X_I (measured by Dis) increases.

THEOREM 3.2. Let X_I be such that |X_I| ≥ (⌈c⌉ + 2)P, r = |X_I|/P, and Dis(X_I) < cP. Then,

    ARL(X_O) ≥ (3 - c)P / (1 + (2 - c)/r)       when 1 < c ≤ 3/2,
    ARL(X_O) ≥ (3/2)P / (1 + 1/(2r))            when 3/2 < c ≤ 2,
    ARL(X_O) ≥ (1 + 1/⌊c⌋)P / (1 + ⌊c⌋/r)       when 2 ≤ c.

Moreover, for all integers c ≥ 2, these bounds are tight.

Figure 17. The smallest values of ARL(X_O) define a decreasing function of the disorder. These values, which lie in the shaded area, show how the performance of Replacement Selection decreases as the disorder increases.

Theorems 3.1 and 3.2 provide bounds on the performance of Replacement Selection and formally establish that if the input is nearly sorted, then the runs that are produced must be longer. Figures 16 and 17 are graphical representations of Theorems 3.1 and 3.2, respectively, that show that the performance of Replacement Selection decreases slowly as a function of the disorder measured by Dis.

4. FINAL REMARKS

The objectives of this survey are primarily:

● to present the basic notions and concepts of adaptive sorting;

● to describe the state of the art of adaptive sorting.

We believe that our survey provides insight into the theory of adaptive sorting algorithms and brings important points to the attention of practitioners. Research on adaptive sorting algorithms is still active; moreover, adaptive sorting offers several directions for further research. There have been, and continue to be, attempts to modify well-known sorting algorithms to achieve adaptivity. Moffat [1990] studied Insertion Sort using a splay tree; however, the characterization of the adaptive behavior of this algorithm is open. Sophisticated variants of Natural Mergesort that increase the number of measures to which the algorithm is adaptive have also been proposed [Moffat 1991; Moffat et al. 1992].

Other researchers have studied measures of disorder and their mathematical structure [Estivill-Castro et al. 1989; Chen and Carlsson 1989]. In addition, the discoveries that have been made for sequential models have prompted other researchers to design parallel sorting algorithms with respect to Dis [Altman and Igarashi 1989] and other measures [Levcopoulos and Petersson 1988; 1989b].

We now discuss, in some detail, some other directions for further research. First, we believe that researchers should develop universal methods for large classes of measures of disorder that lead to practical implementations. If X_1, X_2, ..., X_s is a partition of X into disjoint subsequences, then

    Σ_{i=1}^{s} M(X_i) ≤ D ⌈s/2⌉ M(X)

normally holds, for a measure M of disorder; that is, all division protocols result in a division with D ≤ 2. With Generic Sort, we showed that it is necessary and sufficient for a protocol to satisfy D < 2 to obtain an adaptive algorithm with respect to M. We conjecture that it is always possible to design a linear-time division protocol that partitions a sequence X into subsequences X_1, X_2, ..., X_s of almost the same length such that D < 2. If this conjecture holds, then an adaptive sorting algorithm that makes O(|X|(1 + log[1 + M(X)])) comparisons, in the worst case, could always be obtained.

Second, Estivill-Castro and Wood [1991b; 1992b] obtained adaptivity results using the median-division protocol. These results suggest a generalization of Theorem 1.1 in which the bound on the disorder introduced at the division phase is a function of the input size rather than constant, but is still bounded above by 2. In other words, the results suggest that the condition in Theorem 1.1 can be weakened to:

Let D(n) be a function such that, for all n, D(n) < 2 and, for all sequences X of size n, the division protocol satisfies

    Σ_{i=1}^{s} M(ith subsequence) ≤ D(n) ⌈s/2⌉ M(X).

If this generalization holds, then more sorting algorithms could be shown to have the performance claimed in Theorem 1.1.

Third, Shell Sort is a very efficient sorting method in practice. There are many ways of choosing the sequence of increments for Shell Sort [Gonnet 1984; Knuth 1973]; however, Shell Sort requires Ω(n log n) comparisons to sort every sequence of length n. Shell Sort shows some adaptivity, but it has not been shown to be optimal with respect to any measure of disorder. It would be useful if we could describe the behavior of Shell Sort as a function of the input size and the disorder in the input. Furthermore, a variant of Shell Sort that adapts the sequence of increments to the existing order in the input would be even more interesting.

Fourth, adaptive variants of Heapsort have been proposed [Dijkstra 1982; Leong 1989; Levcopoulos and Petersson 1989a; Petersson 1991]. But either the in-place property of Heapsort is lost, or their adaptivity is insignificant. A natural question is: Are there any in-place sorting algorithms that are optimally adaptive with respect to important measures? Levcopoulos and Petersson [1991b] have modified Cook-Kim division to obtain an in-place Rem-optimal algorithm. Huang and Langston [1988] have developed an algorithm to merge two sorted sequences in place. Their linear-time in-place merging algorithm can be used with Generic Sort to obtain Runs-, Dis-, Max-, Exc-, and Rem-optimal sorting algorithms that sort in place such that the number of data moves is proportional to the number of comparisons. The resulting sorting algorithms are bottom up rather than recursive. Finally, using linear-time in-place merging, linear-time in-place selection, and encoding pointers by swapping elements, Levcopoulos and Petersson [1991a] have recently designed an in-place sorting algorithm that is Inv and Rem optimal.

Fifth, Carlsson and Chen [1991] consider adaptivity with respect to the number of distinct keys in a sequence rather than to its disorder. Their results are not new [Munro and Spira 1976; Wegner 1985]; they suggest, however, a new direction for research, namely, design sorting algorithms that are adaptive with respect to both the disorder and the number of distinct keys.

Finally, we mention adaptive parallel sorting. There are several models of parallel computers, and many different sorting algorithms have been designed for different architectures [Akl 1985; Quinn 1987]. An SIMD computer consists of a number of processors operating under the control of a single instruction stream issued by a central control unit. The processors each have a small private memory for storing programs and data and operate synchronously. A number of metrics have been proposed for evaluating parallel algorithms; Akl [1985] describes the most important ones. The parallel running time of a parallel algorithm is defined as the total number of the two kinds of steps executed by the algorithm: routing steps and computational steps. For a problem of size n, the parallel worst-case running time of an algorithm is denoted by w(n), and the number of processors it requires is denoted by P(n). The cost c(n) of a parallel algorithm is defined as the product w(n)P(n). A parallel algorithm is said to be cost optimal if c(n) is of the same order of time complexity as an optimal sequential algorithm.

When sorting on a shared-memory par-allel computer, it is usually assumed thatthe data is stored in the shared memory.Since the introduction of Ajtai et al.[1983] O(rz log n)-cost sorting network,there are many parallel sorting algo-rithms that use 0(n) processors and takeO(log n) time; hence, they are cost opti-mal. On a CRCW PRAM it takes con-stant time to test whether a sequence is

usual notion of the number of processorsrequired to solve a problem, they rede-fine this complexity measure as follows.At step t, the parallel algorithm maydecide to declare several processors idleand not be charged for them. The algo-rithm has full power to request moreprocessors or to declare some processorsto be idle. The cost of the algorithm isredefined to be the sum, over all timesteps t,, of the number of processorsactive at time t,.In this model, they havedescribed parallel sorting algorithms thatare cost optimal with respect to themeasures lrw and Rem. Although it isusuallv assumed that the number ofproces~ors does not change dynamicallyduring execution of an algorithm, the ideathat the algorithm can adapt its resourcerequirements (processors) according tothe difficulty of the input is a step for-ward. With one mocessor. the al~orithm’stime complexit~ grows ‘ smoot~ly fromO(n) to O(n log n) according to the disor-der in the input sequence. When we haveO(n) processors, the time complexity isalways Cl(log n) for all problem instances.Thus, a basic question is: Given P pro-cessors, can we design parallel sortingalgorithms whose running time is a non-decreasirm function of the disorder in theinput seq~ence, and, in this case, whatdoes it mean to be optimal? We havealmost no answers. For the measureRuns and the special case P = n/log n,Carlsson and Chen [1991] providea cost-optimal adaptive algorithm. For amore general case, McGlinn [19891 hasgeneralized CKsort and studied its adap-tive behavior empirically.

REFERENCESsorted; however, in most othe~ modelswith n processors, fl(log n) time is

AJTAI, M., KOML6S, J., AND SZEMERfiDI,E. 1983.An O(n log n) sorting network. In Proceedings

needed to test whether the input is sor- of the llth Annual ACM Symposium on the

ted. Thus, the cost of verifying whether ‘Theory of Computing (April). ACM, New York,

the input is sorted is the same as the cost 1-9.

of sorting. Am, S. G. 1985. Parallel Sorting Algorithms.

Because of these observations, it seemsAcademic Press, Orlando.

that there is no raison d%tre for parallelALTW, T., AND IGARASHI, Y. 1989. Roughly sort-

adaptive sorting. However, Levcopoulosing Sequential and parallel approach. J. InfiProcess. 12.2.154-158.

and Petersson [ 1988; 1989b] have taken ASAGON,c., AND SEIDEL, R. 1989. Randomized

an unusual approach. Rather than the search trees. In proceedings of the 30th IEEE

ACM Computing Surveys, Vol 24, No 4, December 1992

Page 34: A survey of adaptive sorting algorithms

474 “ V. Estivill-Castro and D. Wood

Symposium on Foundations of Computer Scz-ence. IEEE, New York, 540–545.

BENTLEY, J. L., ST~NAT, D F , AND STEELE, J M.

1981. Analysls of a randomized data struc-ture for representing ordered sets. In Proceed-ings of the 19th Annual Allerton Conference onCommunzcat~on, Control and Compatlng,364-372.

BROWN, M. R., AND TARJAN, R. E 1980. Design and

analysis of data structures for representingsorted lists SIAM J Comput 9, 594–614.

BURGE, W. H. 1958. Sorting. trees and measuresof order, Inf. Contr, 1, 181–197.

CARLSSON, S., AND CHEN, J. 1991 An optimal par-allel adaptive sorting algorithm Inf ProcessLett 39, 195-200.

CARLSSON, S., LEVCOPOULOS, C., AND PETERSSON, O.

1990. Subhnear mergmg and natural merge

sort. In Proceedings of SIGAL InternationalSymposium on Algorithms, Lecture Notes inComputer Science 450, Springer-Verlag, NewYork.

CAYLEY, A. 1849. Note on the theory of permuta-

tions. London, Edinburgh and Dublin Phllos.Mug. J. SCL. 34, 527-529,

CHEN, J , AND CARLSSON, S. 1989. A group-theoretic

apprOach to measures of presortedness. Tech

Rep Dept. of Computer Science, Lund Univ ,Lund, Sweden.

CHEN, J., AND CARLSSON, S. 1991. On partitions

and presortedness of sequences. In Proceedingsof the 2nd ACM-SIAM Symposium on DzscreteAlgorithms. ACM, New York, 63-71.

CooK, C. R., AND KIM, D. ,J 1980. Best sorting

algorithms for near}y sorted lists. Commun.ACM 23, 620-624.

DIJKSTRA, E, W. 1982. Smoothsort, an alternative

to sorting in situ. Scz. Comput. Program, 1,223–233

DINSMORE, R. J. 1965. Longer strings for sorting. Commun. ACM 8, 1 (Jan.), 48.

DOBOSIEWICZ, W. 1984. Replacement selection in 3-level memories. Comput. J. 27, 4, 334-339.

DROMEY, R. G. 1984. Exploiting partial order with Quicksort. Softw.—Pract. Exper. 14, 6, 509-518.

ESTIVILL-CASTRO, V. 1991. Sorting and measures of disorder. Ph.D. dissertation, Univ. of Waterloo, Ontario, Canada. Available as Department of Computer Science Research Report CS-91-07.

ESTIVILL-CASTRO, V., AND WOOD, D. 1989. A new measure of presortedness. Inf. Comput. 83, 111-119.

ESTIVILL-CASTRO, V., AND WOOD, D. 1991a. External sorting, initial runs creation, and nearly sortedness. Res. Rep. CS-91-36, Dept. of Computer Science, Univ. of Waterloo, Ontario, Canada.

ESTIVILL-CASTRO, V., AND WOOD, D. 1991b. Practical adaptive sorting. In Advances in Computing and Information—Proceedings of the International Conference on Computing and Information. Lecture Notes in Computer Science 497, Springer-Verlag, New York, 47-54.

ESTIVILL-CASTRO, V., AND WOOD, D. 1991c. Randomized sorting of shuffled monotone sequences. Res. Rep. CS-91-24, Dept. of Computer Science, Univ. of Waterloo, Ontario, Canada.

ESTIVILL-CASTRO, V., AND WOOD, D. 1992a. Randomized adaptive sorting. Rand. Struct. Alg. Available as Department of Computer Science Research Report CS-91-21, Univ. of Waterloo. To be published.

ESTIVILL-CASTRO, V., AND WOOD, D. 1992b. A generic adaptive sorting algorithm. Comput. J. To be published.

ESTIVILL-CASTRO, V., AND WOOD, D. 1992c. An adaptive generic sorting algorithm that uses variable partitioning. In preparation.

ESTIVILL-CASTRO, V., AND WOOD, D. 1992d. The use of exponential search to obtain generic sorting algorithms. In preparation.

ESTIVILL-CASTRO, V., MANNILA, H., AND WOOD, D. 1989. Right invariant metrics and measures of presortedness. Discrete Appl. Math. To be published.

FRAZER, W. D., AND MCKELLAR, A. C. 1970. Samplesort: A sampling approach to minimal storage tree sorting. J. ACM 17, 3 (July), 496-507.

FRAZER, W. D., AND WONG, C. K. 1972. Sorting by natural selection. Commun. ACM 15, 10 (Oct.), 910-913.

GONNET, G. H. 1984. Handbook of Algorithms and Data Structures. Addison-Wesley, Reading, Mass.

GRAHAM, S. L., KESSLER, P. B., AND MCKUSICK, M. K. 1982. gprof: A call graph execution profiler. SIGPLAN Not. 17, 6, 120-126.

GUIBAS, L. J., MCCREIGHT, E. M., AND PLASS, M. F. 1977. A new representation of linear lists. In Proceedings of the 9th Annual ACM Symposium on Theory of Computing. ACM, New York, 49-60.

HARRIS, J. D. 1981. Sorting unsorted and partially sorted lists using the natural merge sort. Softw.—Pract. Exper. 11, 1339-1340.

HUANG, B., AND LANGSTON, M. 1988. Practical in-place merging. Commun. ACM 31, 3 (Mar.), 348-352.

ISLAM, T., AND LAKSHMAN, K. B. 1990. On the error-sensitivity of sort algorithms. In Proceedings of the International Conference on Computing and Information. Canadian Scholar's Press, Toronto, 81-85.

JANKO, W. 1976. A list insertion sort for keys with arbitrary key distributions. ACM Trans. Math. Softw. 2, 2 (June), 143-153.

KARP, R. M. 1986. Combinatorics, complexity and randomness. Commun. ACM 29, 2 (Feb.), 98-109.

KATAJAINEN, J., AND MANNILA, H. 1989. On average case optimality of a presorting algorithm. Unpublished manuscript.

KATAJAINEN, J., LEVCOPOULOS, C., AND PETERSSON, O. 1989. Local insertion sort revisited. In Proceedings of Optimal Algorithms. Lecture Notes in Computer Science 401, Springer-Verlag, New York, 239-253.

KENDALL, M. G. 1970. Rank Correlation Methods, 4th ed. Griffin, London.

KNUTH, D. E. 1973. The Art of Computer Programming. Vol. 3, Sorting and Searching. Addison-Wesley, Reading, Mass.

LEONG, H. W. 1989. Preorder Heapsort. Tech. Rep., National Univ. of Singapore.

LEVCOPOULOS, C., AND PETERSSON, O. 1988. An optimal parallel algorithm for sorting presorted files. In Proceedings of the 8th Conference on Foundations of Software Technology and Theoretical Computer Science. Lecture Notes in Computer Science 338, Springer-Verlag, New York, 154-160.

LEVCOPOULOS, C., AND PETERSSON, O. 1989a. Heapsort—adapted for presorted files. In Proceedings of the Workshop on Algorithms and Data Structures. Lecture Notes in Computer Science 382, Springer-Verlag, New York, 499-509.

LEVCOPOULOS, C., AND PETERSSON, O. 1989b. A note on adaptive parallel sorting. Inf. Process. Lett. 33, 187-191.

LEVCOPOULOS, C., AND PETERSSON, O. 1990. Sorting shuffled monotone sequences. In Proceedings of the 2nd Scandinavian Workshop on Algorithm Theory. Lecture Notes in Computer Science 447, Springer-Verlag, New York, 181-191.

LEVCOPOULOS, C., AND PETERSSON, O. 1991a. An optimal in-place sorting algorithm. In Proceedings of the 11th Conference on Foundations of Software Technology and Theoretical Computer Science.

LEVCOPOULOS, C., AND PETERSSON, O. 1991b. Splitsort—an adaptive sorting algorithm. Inf. Process. Lett. 39, 205-211.

LI, M., AND VITÁNYI, P. M. B. 1989. A theory of learning simple concepts under simple distributions and average case complexity for the universal distribution. In Proceedings of the 30th IEEE Symposium on Foundations of Computer Science. IEEE, New York, 34-39.

MANNILA, H. 1985a. Instance complexity for sorting and NP-complete problems. Ph.D. dissertation, Univ. of Helsinki, Dept. of Computer Science.

MANNILA, H. 1985b. Measures of presortedness and optimal sorting algorithms. IEEE Trans. Comput. C-34, 318-325.

MCGLINN, R. 1989. A parallel version of Cook and Kim's algorithm for presorted lists. Softw.—Pract. Exper. 19, 10 (Oct.), 917-930.

MEHLHORN, K. 1979. Sorting presorted files. In Proceedings of the 4th GI Conference on Theory of Computer Science. Lecture Notes in Computer Science 67, Springer-Verlag, New York, 199-212.

MEHLHORN, K. 1984. Data Structures and Algorithms, Vol. 1, Sorting and Searching. EATCS Monographs on Theoretical Computer Science, Springer-Verlag, Berlin/Heidelberg.

MOFFAT, A. 1990. How good is splay sort. Tech. Rep., Dept. of Computer Science, The Univ. of Melbourne, Parkville, Australia.

MOFFAT, A. 1991. Adaptive merging and a naturally natural merge sort. In Proceedings of the 14th Australian Computer Science Conference, 08.1-08.8.

MOFFAT, A., AND PETERSSON, O. 1991. Historical searching and sorting. In Proceedings of the 2nd Annual International Symposium on Algorithms. Lecture Notes in Computer Science, Springer-Verlag, New York.

MOFFAT, A., AND PETERSSON, O. 1992. A framework for adaptive sorting. In Proceedings of the 3rd Scandinavian Workshop on Algorithm Theory. Lecture Notes in Computer Science, Springer-Verlag, New York.

MOFFAT, A., PETERSSON, O., AND WORMALD, N. 1992. Further analysis of an adaptive sorting algorithm. In Proceedings of the 15th Australian Computer Science Conference, 603-613.

MUNRO, J. I., AND SPIRA, P. M. 1976. Sorting and searching in multisets. SIAM J. Comput. 5, 1 (Mar.).

PAPADAKIS, T., MUNRO, J. I., AND POBLETE, P. V. 1990. Analysis of the expected search cost in skip lists. In Proceedings of the 2nd Scandinavian Workshop on Algorithm Theory. Lecture Notes in Computer Science 447, Springer-Verlag, New York, 160-172.

PETERSSON, O. 1990. Adaptive sorting. Ph.D. dissertation, Lund Univ., Dept. of Computer Science.

PETERSSON, O. 1991. Adaptive selection sorts. Tech. Rep. LU-CS-TR:91-82, Dept. of Computer Science, Lund Univ., Lund, Sweden.

PUGH, W. 1990. Skip lists: A probabilistic alternative to balanced trees. Commun. ACM 33, 6 (June), 668-676.

QUINN, M. J. 1987. Designing Efficient Algorithms for Parallel Computers. McGraw-Hill, New York.

SALZBERG, B. 1988. File Structures: An Analytic Approach. Prentice-Hall, Englewood Cliffs, N.J.

SALZBERG, B. 1989. Merging sorted runs using large main memory. Acta Inf. 27, 195-215.

SEDGEWICK, R. 1980. Quicksort. Garland Publishing, New York.

SKIENA, S. S. 1988. Encroaching lists as a measure of presortedness. BIT 28, 755-784.


SLOANE, N. J. A. 1983. Encrypting by random rotations. In Proceedings of Cryptography, Burg Feuerstein 82. Lecture Notes in Computer Science 149, Springer-Verlag, New York, 71-128.

TING, T. C., AND WANG, Y. W. 1977. Multiway replacement selection sort with a dynamic reservoir. Comput. J. 20, 4, 298-301.

VAN GELDER, A. 1991. Simple adaptive merge sort. Tech. Rep., Univ. of California, Santa Cruz, Calif.

WAINWRIGHT, R. L. 1985. A class of sorting algorithms based on Quicksort. Commun. ACM 28, 396-402.

WEGNER, L. M. 1985. Quicksort for equal keys. IEEE Trans. Comput. C-34, 362-367.

YAO, A. C. 1977. Probabilistic computations—toward a unified measure of complexity. In Proceedings of the 18th IEEE Symposium on Foundations of Computer Science, 222-227.

ZHENG, L.-Q., AND LARSON, P.-Å. 1992. Speeding up external mergesort. Res. Rep. CS-92-40, Dept. of Computer Science, Univ. of Waterloo, Ontario, Canada.

Received November 1991; final revision accepted May 1992.
