Upload
biku06326
View
228
Download
0
Embed Size (px)
Citation preview
8/8/2019 Bi Tonic Sorting
1/32
Introduction toParallel Computing
George Karypis
Sorting
8/8/2019 Bi Tonic Sorting
2/32
Outline
Background
Sorting Networks
Quicksort
Bucket-Sort & Sample-Sort
8/8/2019 Bi Tonic Sorting
3/32
Background
Input Specification
Each processor has n/p elements A ordering of the processors
Output Specification
Each processor will get n/p consecutive elements ofthe final sorted array.
The chunk is determined by the processor ordering.
Variations Unequal number of elements on output.
In general, this is not a good idea and it may require a shift toobtain the equal size distribution.
8/8/2019 Bi Tonic Sorting
4/32
Basic Operation:Compare-Split OperationSingle element per processor
Multiple elements per processor
8/8/2019 Bi Tonic Sorting
5/32
Sorting Networks
Sorting is one of the fundamental problems in
Computer Science For a long time researchers have focused on the
problem of how fast can we sort nelements? Serial
nlog(n) lower-bound for comparison-based sorting
Parallel O(1), O(log(n)), O(???)
Sorting networks Custom-made hardware for sorting!
Hardware & algorithm
Mostly of theoretical interest but fun to study!
8/8/2019 Bi Tonic Sorting
6/32
Elements of Sorting Networks Key Idea:
Perform many comparisons in
parallel. Key Elements:
Comparators: Consist of two-input, two-output
wires
Take two elements on the inputwires and outputs them in sortedorder in the output wires.
Network architecture: The arrangement of the
comparators into interconnectedcomparator columns similar to multi-stage networks
Many sorting networks have beendeveloped. Bitonic sorting network
(log2
(n)) columns ofcomparators.
8/8/2019 Bi Tonic Sorting
7/32
Bitonic Sequence
Bitonic sequences are
graphically represented
by lines as follows:12
7
0
46
8/8/2019 Bi Tonic Sorting
8/32
Why Bitonic Sequences? A bitonic sequence can be easily sorted in
increasing/decreasing order.
s s1 s2
Every element ofs1 will be less than or equal to every element ofs2
Both s1 and s2are bitonic sequences. So how can a bitonic sequence be sorted?
Bitonic
Split
8/8/2019 Bi Tonic Sorting
9/32
An example
8/8/2019 Bi Tonic Sorting
10/32
Bitonic Merging Network
A comparator network that
takes as input a bitonicsequence and performs a
sequence of bitonic splits
to sort it.
+BM[n]
A bitonic merging
network for sorting in
increasing order an n-
element bitonic
sequence.
-BM[n]
Similar sort in decreasing
order.
8/8/2019 Bi Tonic Sorting
11/32
Are we done? Given a set of elements, how do we re-arrange them into
a bitonic sequence? Key Idea:
Use successively larger bitonic networks to transform the set into
a bitonic sequence.
8/8/2019 Bi Tonic Sorting
12/32
An example
8/8/2019 Bi Tonic Sorting
13/32
Complexity
How many columns of
comparators are requiredto sort n=2lelements? i.e., depth d(n) of the
network?
8/8/2019 Bi Tonic Sorting
14/32
Bitonic Sort on a Hypercube One-element-per-processor case
How do we map the algorithm onto a hypercube? What is the comparator?
How do the wires get mapped?
What can you say about the
pairs of wires that are inputs
to the various comparators?
8/8/2019 Bi Tonic Sorting
15/32
Illustration
8/8/2019 Bi Tonic Sorting
16/32
8/8/2019 Bi Tonic Sorting
17/32
Algorithm
Complexity?
8/8/2019 Bi Tonic Sorting
18/32
Bitonic Sort on a Mesh
One-element-per-processor case
How do the wires get mapped?
Which one is better?Why?
8/8/2019 Bi Tonic Sorting
19/32
Row-Major Shuffled Mapping
Complexity?
Can we do better?
What is the lowest bound of sorting on a mesh?communication performed by each process
8/8/2019 Bi Tonic Sorting
20/32
More than one element perprocessor Hypercube
Mesh
8/8/2019 Bi Tonic Sorting
21/32
Bitonic Sort Summary
8/8/2019 Bi Tonic Sorting
22/32
Quicksort
8/8/2019 Bi Tonic Sorting
23/32
Parallel Formulation
How about recursive decomposition?
Is it a good idea?
We need to do the partitioning of the array around
a pivot element in parallel.
What is the lower bound of parallel
quicksort?
What will it take to achieve this lower bound?
8/8/2019 Bi Tonic Sorting
24/32
Optimal for CRCW PRAM One element per processor
Arbitrary resolution of the concurrent writes. Views the sorting as a two-step process:
(i) Constructing a binary tree of pivot elements
(ii) Obtaining the sorted sequence by performing an inorder
traversal of this binary tree.
8/8/2019 Bi Tonic Sorting
25/32
Building the Binary Tree
Complexity?
8/8/2019 Bi Tonic Sorting
26/32
Practical Quicksort Shared-memory
Data resides on a shared array. During a partitioning each
processor is responsible for a
certain portion.
Array Partitioning: Select & Broadcast pivot.
Local re-arrangement.
Is this required?
Global re-arrangement.
8/8/2019 Bi Tonic Sorting
27/32
Efficient Global Rearrangement
8/8/2019 Bi Tonic Sorting
28/32
Practical Quicksort
Complexity
Complexity for message-passing is similar assuming that the all-to-all
personalized communication is not cross-bisection bandwidth limited.
8/8/2019 Bi Tonic Sorting
29/32
A word on Pivot Selection
Selecting pivots that lead to balanced
partitions is importanceheight of the tree
effective utilization of processors
8/8/2019 Bi Tonic Sorting
30/32
Sample Sort Generalization of bucket sort with data-driven sampling
n/p elements per-processor. Each processor sorts is local elements.
Each processor selects p-1 equally spaced elements from itsown list.
The combined p(p-1) set of elements are sorted and p-1 equally
spaced elements are selected from that list.
Each processor splits its own list according to these splitters intopbuckets.
Each processor sends its ithbucket to the ithprocessor.
Each processor merges the elements that it receives. Done.
8/8/2019 Bi Tonic Sorting
31/32
Sample Sort Illustration
8/8/2019 Bi Tonic Sorting
32/32
Sample Sort Complexity
Assumes
a serial sort