SORTING LARGE DATA
Harish Chetty, Brian Finelli and Jacob Kattampilly
MODIFIED MERGE SORT
We modified merge sort to gain more fine-grained control over the levels at which merging is performed.
E.g., we first split the data into 9 parts so that each machine sorts its own part independently, and we then combine all 9 sorted parts in a final 9-way merge to produce the fully sorted data.
SERVER DIVISION OF DATA
[Diagram: one server connected to Clients 1 through 8]
Server: Sends 1/9th of the data to each of the 8 clients and sorts the remaining 1/9th itself, thus also acting as a client.
Client: Each client sorts its 1/9th of the data and returns it to the server.
The clients and the server are connected via TCP.
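As a minimal sketch of the 1/9 split, the partitioning step could look like the following Python. The function name `split_into_parts` is hypothetical, and the slides do not say how a remainder is handled when the data size is not a multiple of 9; here the first few slices are assumed to absorb it.

```python
def split_into_parts(data, parts=9):
    """Split data into `parts` contiguous slices whose sizes differ by at most one."""
    base, extra = divmod(len(data), parts)
    slices = []
    start = 0
    for i in range(parts):
        # Assumption: the first `extra` slices take one additional element each.
        size = base + (1 if i < extra else 0)
        slices.append(data[start:start + size])
        start += size
    return slices
```

Under this sketch, one slice would stay on the server and the other eight would be sent to the clients over TCP.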
CLIENT DIVISION OF DATA
[Diagram: Client Data is split into Client Data 1 and Client Data 2; each half is handled by Threads 1 through 16]
Each client divides its data into 2 parts to avoid wasting memory.
The chunks are sorted in parallel on multiple cores.
THREAD SORTING
Thread i sorts block i.
A combination of Merge Sort (MS) and Network Sort (NS) is used inside each block: MS if len(blk) >= 16, else NS.
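The rule above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the slides do not say which sorting network is used, so an odd-even transposition network stands in for it, and the names `network_sort`, `merge`, and `hybrid_sort` are hypothetical.

```python
THRESHOLD = 16  # from the slides: merge sort if len(blk) >= 16, else network sort


def network_sort(block):
    """Sort a small list in place with an odd-even transposition network.

    Assumption: the slides do not name the network; this one uses n rounds
    of fixed compare-exchange steps, which is enough to sort n elements.
    """
    n = len(block)
    for round_no in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        for i in range(round_no % 2, n - 1, 2):
            if block[i] > block[i + 1]:
                block[i], block[i + 1] = block[i + 1], block[i]
    return block


def merge(left, right):
    """Standard two-way merge of two sorted lists into a new list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def hybrid_sort(block):
    """Merge sort that hands sub-blocks shorter than THRESHOLD to the network."""
    if len(block) < THRESHOLD:
        return network_sort(list(block))
    mid = len(block) // 2
    return merge(hybrid_sort(block[:mid]), hybrid_sort(block[mid:]))
```

The compare-exchange pattern of a sorting network does not depend on the data, which is one common reason to prefer it for very short blocks.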
MERGE BETWEEN THREADS
[Diagram: binary merge tree in which threads merge Data1 through Data16 pairwise, level by level, into one sorted run]
The 16 sorted data blocks, one per thread, are merged together.
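A pairwise, level-by-level merge of the sorted blocks might be sketched like this in Python. The function name `tree_merge` is hypothetical, the standard-library `heapq.merge` stands in for the two-way merge, and the sketch runs sequentially, whereas in the scheme described each pair at a level would be merged by its own thread.

```python
import heapq


def tree_merge(blocks):
    """Merge sorted blocks pairwise, level by level, until one run remains."""
    runs = [list(b) for b in blocks]
    if not runs:
        return []
    while len(runs) > 1:
        next_level = []
        # Merge neighbours (0,1), (2,3), ...; each pair is independent work.
        for k in range(0, len(runs) - 1, 2):
            next_level.append(list(heapq.merge(runs[k], runs[k + 1])))
        if len(runs) % 2 == 1:
            # An unpaired run is carried up to the next level unchanged.
            next_level.append(runs[-1])
        runs = next_level
    return runs[0]
```

With 16 blocks this takes 4 levels: 16 runs become 8, then 4, then 2, then 1.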
MERGE BETWEEN 2 DATA SETS WITHIN A CLIENT
We have two 4 GB data sets to merge. A straightforward merge would require 8 GB of temporary space, bringing the total to 16 GB, which is our entire RAM capacity.
Trick: Use only 4 GB of temporary space to store the first half of the result, then use one of the data arrays to store the rest.
[Diagram: Data 1 and Data 2 are merged into Temp + Data 1]
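The trick works because, by the time the temporary buffer is full, enough of Data 1 has already been consumed for its front to be reused as output space, so the total footprint is 8 GB of input plus 4 GB of temp, i.e. 12 GB. A minimal Python sketch, assuming two sorted lists of equal length (the name `merge_with_half_temp` is hypothetical):

```python
def merge_with_half_temp(data1, data2):
    """Merge two equal-length sorted lists using a temp buffer of one list's size.

    Returns (temp, data1): temp holds the smaller half of the merged output,
    and data1, overwritten in place, holds the larger half.
    """
    n = len(data1)
    assert len(data2) == n
    temp = [None] * n
    i = j = 0  # read positions in data1 and data2
    for out in range(2 * n):
        # Always read the next smallest element before writing anything.
        if j >= n or (i < n and data1[i] <= data2[j]):
            value = data1[i]
            i += 1
        else:
            value = data2[j]
            j += 1
        if out < n:
            temp[out] = value
        else:
            # Safe to overwrite: out + 1 elements have been read and at most
            # n of them came from data2, so slot out - n of data1 was read.
            data1[out - n] = value
    return temp, data1
```

For example, merging `[1, 4, 6, 9]` with `[2, 3, 5, 10]` leaves `[1, 2, 3, 4]` in the temp buffer and `[5, 6, 9, 10]` in the first array.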
9 WAY MERGE AT SERVER
[Diagram: the server merges from buffer arrays, one holding its own data and one per client, with each of Clients 1 through 8 feeding its buffer array through a TCP/IP buffer]
Data is transmitted in chunks from the clients to the server in order to avoid delays due to network latency.
9 WAY MERGE AT SERVER (EACH STEP)
• Check 9 elements: one from the server and one from each of the 8 clients.
• Find the minimum of the 9 values.
• Only store the minimum value if it is the 10th item (or a multiple of 10) in the final sorted data.
In this way we completely eliminated all intermediate disk reads and writes.
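The steps above can be sketched in Python as follows. This is an illustration under assumptions: plain iterables stand in for the chunked buffers that are refilled over TCP, and the names `sampled_k_way_merge` and `_DONE` are hypothetical.

```python
_DONE = object()  # sentinel marking an exhausted stream


def sampled_k_way_merge(streams, step=10):
    """Merge sorted streams, keeping only every `step`-th element of the merged order."""
    iterators = [iter(s) for s in streams]
    # heads[k] is the current front element of stream k (or _DONE).
    heads = [next(it, _DONE) for it in iterators]
    kept = []
    position = 0  # 1-based position in the final sorted order
    while True:
        live = [k for k, h in enumerate(heads) if h is not _DONE]
        if not live:
            break
        # Find the minimum among the current front elements.
        k = min(live, key=lambda idx: heads[idx])
        position += 1
        if position % step == 0:
            kept.append(heads[k])
        # Advance only the stream that supplied the minimum.
        heads[k] = next(iterators[k], _DONE)
    return kept
```

Because only the front element of each stream is held at any time and only every 10th merged value is stored, the merged sequence never has to be materialized in full.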
FINAL RESULTS
Best: test0 = 20:16, test1 = 20:48
Average: test0 = ~22-23, test1 = ~22-23
Worst: test0 = ~25-28, test1 = ~25-28