SORTING LARGE DATA Harish Chetty, Brian Finelli and Jacob Kattampilly

N way merge sort



MODIFIED MERGE SORT

We modified merge sort to gain finer-grained control over the levels at which merging is performed.

For example, we first split the data into 9 parts so that each machine (the server and the 8 clients) sorts one part independently. In the final step, we merge all 9 sorted parts with a 9-way merge to obtain the fully sorted data.
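The overall data flow can be simulated in a single process. This is a minimal sketch, assuming the data fits in one Python list; the function name `n_way_merge_sort` is hypothetical, and each part is sorted locally with `sorted()` instead of on a separate machine.

```python
import heapq


def n_way_merge_sort(data, n_parts=9):
    """Single-process simulation: split, sort each part, then one n-way merge.

    In the real system each part is sorted on a different machine; here every
    part is sorted locally purely to show the data flow.
    """
    # Ceiling division so every element lands in one of at most n_parts parts;
    # max(1, ...) keeps the range() step valid for empty input.
    size = max(1, -(-len(data) // n_parts))
    parts = [sorted(data[i:i + size]) for i in range(0, len(data), size)]
    # heapq.merge lazily combines already-sorted iterables into one sorted stream.
    return list(heapq.merge(*parts))
```

Because the merge happens only once, at the top level, no intermediate merged results need to be written anywhere.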


SERVER DIVISION OF DATA

[Figure: the server connected to Clients 1 to 8]

Server: sends 1/9th of the data to each of the 8 clients and also sorts 1/9th of the data itself, thus acting as a client as well.
Client: sorts its 1/9th of the data and returns it to the server.
The clients and the server are connected via TCP.


CLIENT DIVISION OF DATA

[Figure: Client Data is split into Client Data 1 and Client Data 2, each sorted by Threads 1 to 16]

The client then divides its data into 2 parts (Data1 and Data2) to avoid wasting memory on temporary buffers when the sorted data is merged.

Parallel sorting of chunks on multiple cores (applied to each of the two parts).
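The client-side split can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the two halves are processed one after the other (the slides do not say), and each thread uses `sorted()` as a stand-in for the per-block sort. In CPython the global interpreter lock means these threads will not actually sort in parallel, so the sketch shows the structure only; a native implementation or separate processes would be needed for the multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor

THREADS_PER_PART = 16  # from the slides: 16 threads per data part


def split_into_chunks(data, n_chunks):
    """Split data into n_chunks contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        end = start + q + (1 if k < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def client_sort_chunks(client_data, threads=THREADS_PER_PART):
    """Halve the client's data, then sort each half's chunks on a thread pool.

    Returns two lists of sorted chunks, one per half; merging them is a later step.
    """
    half = len(client_data) // 2
    parts = [client_data[:half], client_data[half:]]
    result = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for part in parts:
            # One chunk per thread; sorted() stands in for the per-block sort.
            result.append(list(pool.map(sorted, split_into_chunks(part, threads))))
    return result
```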


THREAD SORTING

[Figure: Thread i sorts block i with Merge Sort and Network Sort]

A combination of merge sort (MS) and network sort (NS, a sorting network) is used inside each block: thread i sorts block i, using MS if len(blk) >= 16 and NS otherwise.
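The per-block rule can be sketched as a merge sort with a sorting-network base case. The slides do not say which sorting network was used; this sketch assumes odd-even transposition sort because it is a valid sorting network for any length n (n rounds of fixed compare-exchange pairs). The function names are hypothetical.

```python
NETWORK_THRESHOLD = 16  # from the slides: MS if len(blk) >= 16, else NS


def odd_even_transposition_sort(block):
    """Sorting network for any length n: n rounds of fixed compare-exchange pairs."""
    n = len(block)
    for rnd in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds compare (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            if block[i] > block[i + 1]:
                block[i], block[i + 1] = block[i + 1], block[i]
    return block


def hybrid_sort(block):
    """Merge sort that hands blocks shorter than 16 elements to the sorting network."""
    if len(block) < NETWORK_THRESHOLD:
        return odd_even_transposition_sort(list(block))
    mid = len(block) // 2
    left, right = hybrid_sort(block[:mid]), hybrid_sort(block[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

A sorting network always performs the same comparisons regardless of the data, which keeps very small blocks cheap and branch-predictable in a native implementation.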


MERGE BETWEEN THREADS

[Figure: a binary tree of threads merging the sorted blocks Data1 to Data16 pairwise into one]


The 16 sorted data blocks, one per thread, are merged together two at a time.
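The pairwise merge tree can be sketched level by level, with each pair merged by a pool thread. This is a sketch under assumptions: the function names are hypothetical, and as with any CPU-bound Python code, the threads illustrate the tree shape rather than real parallel speedup. With 16 blocks the loop performs 8 + 4 + 2 + 1 = 15 two-way merges.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_two(left, right):
    """Standard two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_tree(blocks):
    """Merge sorted blocks pairwise, one tree level at a time."""
    if not blocks:
        return []
    level = list(blocks)
    with ThreadPoolExecutor() as pool:
        while len(level) > 1:
            # map() stops at the shorter iterable, so an odd last block is skipped here.
            merged = list(pool.map(merge_two, level[0::2], level[1::2]))
            if len(level) % 2:
                merged.append(level[-1])  # carry the odd block up unchanged
            level = merged
    return level[0]
```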


MERGE BETWEEN 2 DATA SETS WITHIN A CLIENT

We have two 4 GB data sets to merge, so a plain merge would require 8 GB of temporary space, for a total of 16 GB, which is our entire RAM capacity.

Trick: use only 4 GB of temporary space to store the first half of the result, then use one of the data arrays to store the rest of the result.

[Figure: Data 1 and Data 2 are merged into Temp + Data 1]
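The memory arithmetic is 4 + 4 GB of data plus 8 GB of temporary space = 16 GB for a plain merge, versus 4 + 4 + 4 = 12 GB with the trick. A minimal sketch of the trick on Python lists, assuming both inputs have the same length n (the function name is hypothetical): the first n outputs go to a temporary buffer, and the remaining n outputs are written back into the first array from index 0. After phase 1, `i + j = n`, so the write index in phase 2 is `w = i + j - n <= i` and never overwrites an unread element of `a`.

```python
def merge_with_half_buffer(a, b):
    """Merge sorted lists a and b (equal length n) using only n extra slots.

    Returns (temp, a): temp holds the n smallest elements and a is overwritten
    with the n largest, so temp followed by a is the full sorted result.
    """
    n = len(a)
    assert len(b) == n, "sketch assumes both data sets have the same length"
    temp = [None] * n  # the single temporary buffer (4 GB in the slides)
    i = j = 0
    # Phase 1: fill temp with the n smallest elements.
    for k in range(n):
        if j >= n or (i < n and a[i] <= b[j]):
            temp[k] = a[i]
            i += 1
        else:
            temp[k] = b[j]
            j += 1
    # Phase 2: write the remaining n elements into a, starting at index 0.
    # Invariant w == i + j - n <= i: we only overwrite slots of a already consumed.
    w = 0
    while i < n or j < n:
        if j >= n or (i < n and a[i] <= b[j]):
            a[w] = a[i]
            i += 1
        else:
            a[w] = b[j]
            j += 1
        w += 1
    return temp, a
```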


9 WAY MERGE AT SERVER

[Figure: the server merges its own data with data from Clients 1 to 8; each client's data arrives through a TCP/IP buffer into a buffer array that feeds the merge]

Data is transmitted from the clients to the server in chunks, in order to reduce the cost of network latency.
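A chunked transfer can be sketched with the standard `socket` and `struct` modules. The wire format is an assumption (the slides do not specify one): sorted values are sent as little-endian signed 64-bit integers, and `CHUNK_RECORDS` is a hypothetical chunk size. The receiver yields one value at a time to the merge but reads the socket a whole chunk at a time, keeping any partial record until the next read, since `recv` may return fewer bytes than requested.

```python
import socket
import struct

RECORD = struct.Struct("<q")  # assumption: little-endian signed 64-bit records
CHUNK_RECORDS = 4096          # hypothetical chunk size, in records


def send_sorted(sock, values, chunk_records=CHUNK_RECORDS):
    """Client side: send already-sorted values in fixed-size chunks, then signal EOF."""
    for start in range(0, len(values), chunk_records):
        part = values[start:start + chunk_records]
        sock.sendall(struct.pack(f"<{len(part)}q", *part))
    sock.shutdown(socket.SHUT_WR)


def recv_stream(sock, chunk_records=CHUNK_RECORDS):
    """Server side: yield records one by one, reading the socket a chunk at a time."""
    pending = b""
    while True:
        data = sock.recv(chunk_records * RECORD.size)
        if not data:  # peer shut down its write side
            break
        pending += data
        usable = len(pending) - len(pending) % RECORD.size
        for (value,) in RECORD.iter_unpack(pending[:usable]):
            yield value
        pending = pending[usable:]  # keep a trailing partial record, if any
```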


9 WAY MERGE AT SERVER (EACH STEP)

• Check 9 elements: one from the server and one from each of the 8 clients.

• Find the minimum of the 9 values.

• Store the minimum value only if it is the 10th item (or a multiple of 10) in the final sorted data.

In this way we eliminated all intermediate disk reads and writes.
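One step of that loop, repeated until all inputs are exhausted, can be sketched as below. The sources are assumed to be any 9 sorted iterables (in the real system, the server's own sorted part and the 8 client streams); the function name is hypothetical. It follows the slide literally with a linear scan over the 9 current heads, which is cheap for so few inputs; `heapq.merge` would be the usual choice for a much larger N.

```python
_EMPTY = object()  # sentinel marking an exhausted source


def nway_merge_every_kth(sources, every=10):
    """Merge sorted sources, keeping only items at positions every, 2*every, ..."""
    its = [iter(s) for s in sources]
    heads = [next(it, _EMPTY) for it in its]  # current element of each source
    kept = []
    position = 0  # 1-based position in the final sorted order
    while True:
        # Find the minimum among the current heads (one per source).
        best = -1
        for idx, head in enumerate(heads):
            if head is not _EMPTY and (best < 0 or head < heads[best]):
                best = idx
        if best < 0:
            break  # every source is exhausted
        position += 1
        if position % every == 0:
            kept.append(heads[best])
        heads[best] = next(its[best], _EMPTY)  # advance the source we took from
    return kept
```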


FINAL RESULTS

Case       test0     test1
Best       20:16     20:48
Average    ~22-23    ~22-23
Worst      ~25-28    ~25-28