IBM Research
© 2007 IBM Corporation
Introduction to Map-Reduce and Join Processing
IBM Research | India Research Lab
Hadoop – A Very Brief Introduction
A framework for creating distributed applications that process huge amounts of data.
Scalability, Fault Tolerance, Ease of Programming
Two main components:
• HDFS – Hadoop Distributed File System
• Map-Reduce
How is data organized on HDFS?
How is data processed using Map-Reduce?
HDFS
Stores files in blocks across many nodes in a cluster (default block size – 64 MB)
Replicates the blocks across nodes for durability
Master/Slave Architecture
HDFS Master – NameNode
• Runs on a single node as a master process
• Directs client access to files in HDFS
HDFS Slave – DataNode
• Runs on all nodes in the cluster
• Handles block creation/replication/deletion
• Takes orders from the NameNode
HDFS
A table with columns A, B, C and rows R1–R15, stored as three 64 MB blocks of five rows each:
Block 1: R1 1 2 3 | R2 2 3 5 | R3 2 4 6 | R4 6 4 2 | R5 1 3 6
Block 2: R6 8 9 1 | R7 2 3 1 | R8 9 9 2 | R9 1 7 4 | R10 1 2 2
Block 3: R11 2 3 4 | R12 4 5 6 | R13 6 7 8 | R14 9 8 3 | R15 3 2 1
Replication Factor = 3
All these blocks are distributed across the cluster
HDFS
[Diagram: a "Put File" request for File1.txt. The client contacts the NameNode, which directs the write; the file's blocks 1–6 are spread over the DataNodes, each holding a subset (blocks 1, 4, 5; 2, 5, 6; 2, 3, 4).]
HDFS
Blocks are read in parallel, so aggregate read bandwidth = per-node transfer rate × number of machines
[Diagram: a "Read File" request. The NameNode tells the client which DataNodes hold blocks 1–6, and the client reads the blocks (1, 4, 5; 2, 5, 6; 2, 3, 4) from the DataNodes in parallel.]
HDFS
Fault-Tolerant – handles node failures
Self-Healing – rebalances files across the cluster; data from the remaining two replicas is automatically copied
Scalable – grows just by adding new nodes
[Diagram: a DataNode fails during a read. The NameNode detects the failure and has the surviving replicas copied to other nodes, so every block returns to three copies (block lists such as 1, 4, 5; 2, 5, 6; 2, 3, 4; 3, 5, 6; 2, 3, 6).]
Map-Reduce
Logical Functions : Mappers and Reducers
Developers write map and reduce functions, then submit a jar to the Hadoop cluster
Hadoop handles distributing the Map and Reduce tasks across the cluster
Map-Reduce
A map task is started for each split / 64 MB block. Each map task generates some intermediate data.
Hadoop collects the output of all map tasks, reorganizes them and passes the reorganized data to Reduce tasks
Reduce tasks process this re-organized data and generate the final output
Flow:
• HDFS block to Map task
• Map task output to the Hadoop engine
• Hadoop shuffles and sorts the map output
• Hadoop engine to Reduce tasks, which perform the reduce processing
HDFS to Map Tasks
Records are read one by one from each block and passed to the map function for processing.
This component is called the InputFormat / RecordReader.
A record is passed as a key-value pair: the key is the record's byte offset and the value is the record itself.
The offset is usually ignored by the map.
MAP-1
MAP-2
MAP-3
( 0, R1 1 2 3)(10, R2 2 3 5)(20, R3 2 4 6)(30, R4 6 4 2)(40, R5 1 3 6)
( 50, R6 8 9 1)(60, R7 2 3 1)(70, R8 9 9 2)(80, R9 1 7 4)(90, R10 1 2 2)
(100, R11 2 3 4)(110, R12 4 5 6)(120, R13 6 7 8)(130, R14 9 8 3)(140, R15 3 2 1)
Input-Format
Input-Format
Input-Format
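The record-reading step above can be sketched in Python. This is a simplified stand-in for Hadoop's InputFormat/RecordReader (which are Java classes); the function name `read_records` is ours.

```python
def read_records(block):
    """Yield (key, value) pairs for one HDFS block.

    The key is the byte offset of the record; the value is the
    record text itself (the map function usually ignores the key).
    """
    offset = 0
    for line in block.splitlines(keepends=True):
        yield (offset, line.rstrip("\n"))
        offset += len(line)

# A tiny block with two 10-byte records:
block = "R1 1 2 3 \nR2 2 3 5 \n"
print(list(read_records(block)))
# [(0, 'R1 1 2 3 '), (10, 'R2 2 3 5 ')]
```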
Map Task
Takes in a key-value pair and transforms it to a set of key-value pairs
{K1, V1} ==> [{K2, V2}]
( 0, R1 1 2 3)(10, R2 2 3 5)(20, R3 2 4 6)(30, R4 6 4 2)(40, R5 1 3 6)
( 50, R6 8 9 1)(60, R7 2 3 1)(70, R8 9 9 2)(80, R9 1 7 4)(90, R10 1 2 2)
(100, R11 2 3 4)(110, R12 4 5 6)(120, R13 6 7 8)(130, R14 9 8 3)(140, R15 3 2 1)
MAP-1
MAP-2
MAP-3
(2, 3)(2, 4)(2, 4)(6, 4)
(2, 9)(4, 9)(8, 9)(2, 3)
(2, 3)(2, 5)(4, 5)(2, 7)
Example: if the second column is odd, emit nothing. If the second column is even, emit one key-value pair per even divisor of the second-column value: the key is the divisor and the value is the third column.
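A minimal Python sketch of the map rule as stated (the function name `divisor_map` is ours, and this simulates, rather than reproduces, a Hadoop Mapper):

```python
def divisor_map(key, record):
    """Map function for the divisor example.

    record is a tuple (a, b, c). If b is odd, emit nothing; if b is
    even, emit one (divisor, c) pair for every even divisor of b.
    """
    a, b, c = record
    if b % 2 != 0:
        return []
    return [(d, c) for d in range(2, b + 1, 2) if b % d == 0]

print(divisor_map(0, (1, 2, 3)))   # [(2, 3)]
print(divisor_map(10, (2, 3, 5)))  # [] -- second column is odd
print(divisor_map(20, (2, 4, 6)))  # [(2, 6), (4, 6)]
```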
Hadoop Sorting And Shuffling
Hadoop processes the key-value pairs output by map in a fashion so that the values in all pairs with the same key are grouped together
These groups are then passed to reducers for processing
MAP-1
MAP-2
MAP-3
(2, 3)(2, 4)(2, 4)(6, 4)
(2, 9)(4, 9)(8, 9)(2, 3)
(2, 3)(2, 5)(4, 5)(2, 7)
(2, [3, 3, 3, 4, 4, 5, 7, 9])(4, [5, 9])(6, [4])(8, [9])
Hadoop
Shuffle
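The grouping shown above can be simulated with a dictionary. This is only a sketch of the effect; Hadoop's real shuffle is a distributed merge-sort, not an in-memory dict.

```python
from collections import defaultdict

def shuffle(map_outputs):
    """Group the values of all (key, value) pairs by key, as
    Hadoop's shuffle/sort phase does, returning groups sorted
    by key (values sorted within each group for readability)."""
    groups = defaultdict(list)
    for pairs in map_outputs:
        for k, v in pairs:
            groups[k].append(v)
    return sorted((k, sorted(vs)) for k, vs in groups.items())

map1 = [(2, 3), (2, 4), (2, 4), (6, 4)]
map2 = [(2, 9), (4, 9), (8, 9), (2, 3)]
map3 = [(2, 3), (2, 5), (4, 5), (2, 7)]
print(shuffle([map1, map2, map3]))
# [(2, [3, 3, 3, 4, 4, 5, 7, 9]), (4, [5, 9]), (6, [4]), (8, [9])]
```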
Hadoop Engine to Reduce Tasks and Reduce Processing
Let the number of distinct keys (groups) be m, and the number of reduce tasks be k.
The m groups are distributed across the k reduce tasks using a hash function.
Each reduce task processes its groups and generates the output. Example – sum all the values in each group.
REDUCER 1
(2, [3, 4, 4, 9, 3, 3, 5, 7])(6, [4])
REDUCER 2
(4, [9, 5])(8, [9])
(2, 38)(6, 4)
(4, 14)(8, 9)
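The summing reducer above, plus a hash-style group-to-reducer assignment, can be sketched in Python (function names are ours; Hadoop's default partitioner computes the key's hash code modulo the number of reducers, which is what `assign_reducer` imitates):

```python
def assign_reducer(key, k):
    """Pick one of k reduce tasks for a group, hash-partition
    style; the exact assignment depends on the hash function."""
    return hash(key) % k

def sum_reduce(key, values):
    """Example reduce function from the slide: sum all values."""
    return (key, sum(values))

groups = [(2, [3, 4, 4, 9, 3, 3, 5, 7]), (4, [9, 5]), (6, [4]), (8, [9])]
print([sum_reduce(k, vs) for k, vs in groups])
# [(2, 38), (4, 14), (6, 4), (8, 9)]
```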
Word-Count
Hadoop Uses Map-Reduce
There is a Map-Phase
There is a Reduce Phase
(Hadoop, 1)(Uses, 1)(Map, 1)(Reduce, 1)
(There, 1)(is, 1)(a, 1)(Map, 1)(Phase, 1)
(There, 1)(is, 1)(a, 1)(Reduce, 1)(Phase, 1)
(a, [1,1])(Hadoop, 1)
(is, [1,1])
(map, [1,1])(phase, [1,1])
(reduce, [1,1])(there, [1,1])
(uses, 1)
A-I
J-Q
R-Z
(a, 2)(hadoop, 1)
(is, 2)
(map, 2)(phase, 2)
(reduce, 2)(there, 2)(uses, 1)
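The whole word-count pipeline above fits in a few lines of Python. This simulates the three phases in one process (the real job is distributed Java code); splitting on letters makes "Map-Reduce" yield two words and lower-casing merges variants, matching the grouped output above.

```python
import re
from collections import defaultdict

def wc_map(doc):
    """Map: emit (word, 1) for each word in one document."""
    return [(w.lower(), 1) for w in re.findall(r"[A-Za-z]+", doc)]

def wc_reduce(word, counts):
    """Reduce: total occurrences of one word."""
    return (word, sum(counts))

docs = ["Hadoop Uses Map-Reduce", "There is a Map-Phase",
        "There is a Reduce phase"]
groups = defaultdict(list)
for doc in docs:                   # map phase
    for w, one in wc_map(doc):
        groups[w].append(one)      # shuffle: group by word
print(sorted(wc_reduce(w, c) for w, c in groups.items()))
# [('a', 2), ('hadoop', 1), ('is', 2), ('map', 2), ('phase', 2),
#  ('reduce', 2), ('there', 2), ('uses', 1)]
```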
Map-Reduce Example: Aggregation
Compute the average of B for each distinct value of A
A B C
R1 1 10 12
R2 2 20 34
R3 1 10 22
R4 1 30 56
R5 3 40 17
R6 2 10 49
R7 1 20 44
MAP 1
MAP 2
(1, 10)(2, 20)(1, 10)
(1, 30)(3, 40)(2, 10)(1, 20)
(1, 17.5)
(2, 15) (3, 40)
(1, [10, 10, 30, 20])
(2, [10, 20])(3, [40])
Reducer 1
Reducer 2
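The averaging job above can be sketched as follows (helper names are ours). Note that, unlike a sum, an average cannot be pre-combined per mapper without carrying (sum, count) pairs, which is why the reducer here sees the full list of B values.

```python
from collections import defaultdict

def avg_map(row):
    """Map: key on column A, value is column B."""
    a, b, c = row
    return (a, b)

def avg_reduce(key, values):
    """Reduce: average of B for one distinct value of A."""
    return (key, sum(values) / len(values))

rows = [(1, 10, 12), (2, 20, 34), (1, 10, 22), (1, 30, 56),
        (3, 40, 17), (2, 10, 49), (1, 20, 44)]
groups = defaultdict(list)
for row in rows:
    k, v = avg_map(row)
    groups[k].append(v)
print(sorted(avg_reduce(k, vs) for k, vs in groups.items()))
# [(1, 17.5), (2, 15.0), (3, 40.0)]
```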
Designing a Map-Reduce Algorithm
Thinking in terms of Map and Reduce:
• What data should be the key?
• What data should be the values?
Minimizing cost:
• Reading and map processing cost
• Communication cost
• Processing cost at the reducers
Load balancing:
• All reducers should receive a similar volume of traffic
• It should not happen that a few machines are overloaded while the rest sit idle
Join On Point Data
Select R.A, R.B, S.D where R.A == S.A
A B C
R1 1 10 12
R2 2 20 34
R3 1 10 22
R4 1 30 56
R5 3 40 17
A D E
S1 1 20 22
S2 2 30 36
S3 2 10 29
S4 3 50 16
S5 3 40 37
MAP 1
MAP 2
(1, [R, 10])(2, [R, 20])(1, [R, 10])(1, [R, 30])(3, [R, 40])
(1, [S, 20])(2, [S, 30])(2, [S, 10])(3, [S, 50])(3, [S, 40])
(1, 10, 20)(1, 10, 20)(1, 30, 20)
(2, 20, 30)(2, 20, 10)(3, 40, 50)(3, 40, 40)
(1, [(R, 10), (R, 10),(R, 30), (S, 20)] )
(2, [(R, 20), (S, 30),(S, 10)] )
(3, [(R, 40), (S, 50), (S, 40)])
Reducer 1
Reducer 2
Join On Point Data
Select R.A, R.B, S.D where R.A == S.A
The range of attribute A is divided into k parts (buckets 1, 2, …, k); a hash function maps each value of A to one bucket.
A reducer is defined for each of the k buckets.
A tuple from R or S is sent to reducer i if its A value hashes to bucket i.
Each reducer computes its part of the join output.
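This hash-partitioned equi-join can be simulated in Python (a sketch with our own names; the "R"/"S" tags mark the source relation so the reducer can tell the two sides apart, as in the example that follows):

```python
from collections import defaultdict

def join_map(tag, tuples):
    """Map: key each tuple on the join attribute A and tag it
    with its relation name."""
    return [(t[0], (tag, t)) for t in tuples]

def join_reduce(key, tagged):
    """Reduce: cross the R-tuples with the S-tuples that share
    this join key, emitting (A, R.B, S.D)."""
    r_side = [t for tag, t in tagged if tag == "R"]
    s_side = [t for tag, t in tagged if tag == "S"]
    return [(key, r[1], s[1]) for r in r_side for s in s_side]

R = [(1, 10, 12), (2, 20, 34), (1, 10, 22), (1, 30, 56), (3, 40, 17)]
S = [(1, 20, 22), (2, 30, 36), (2, 10, 29), (3, 50, 16), (3, 40, 37)]
groups = defaultdict(list)
for key, tv in join_map("R", R) + join_map("S", S):
    groups[key].append(tv)            # shuffle on the join key
out = [row for k in sorted(groups) for row in join_reduce(k, groups[k])]
print(out)
# [(1, 10, 20), (1, 10, 20), (1, 30, 20),
#  (2, 20, 30), (2, 20, 10), (3, 40, 50), (3, 40, 40)]
```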
Join On Point Data
Assume k = 3, h(1) = 0, h(2) = 1, h(3) = 2
A B C
R1 1 10 12
R2 2 20 34
R3 1 10 22
R4 1 30 56
R5 3 40 17
A D E
S1 1 20 22
S2 2 30 36
S3 2 10 29
S4 3 50 16
S5 3 40 37
Bucket 0: R1 1 10 12 | R3 1 10 22 | R4 1 30 56 | S1 1 20 22
Bucket 1: R2 2 20 34 | S2 2 30 36 | S3 2 10 29
Bucket 2: R5 3 40 17 | S4 3 50 16 | S5 3 40 37
Join output at bucket 0: (R1, S1) (R3, S1) (R4, S1)
Join output at bucket 1: (R2, S2) (R2, S3)
Join output at bucket 2: (R5, S4) (R5, S5)
Map-Reduce Example: Inequality Join
Select R.A, R.B, S.D where R.A <= S.A
Consider a 3-node cluster
A B C
R1 1 10 12
R2 2 20 34
R3 1 10 22
R4 1 30 56
R5 3 40 17
A D E
S1 1 20 22
S2 2 30 36
S3 2 10 29
S4 3 50 16
S5 3 40 37
MAP 2
(r1, [S, 1, 20])(r2, [S, 2, 30])(r2, [S, 2, 10])(r3, [S, 3, 50])(r3, [S, 3, 40])
(1, 10, 20)(1, 10, 20)(1, 30, 20)
(1, 10, 50)(1, 10, 40)(2, 20, 50)(2, 20, 40)(1, 10, 50)(1, 10, 40)(1, 30, 50)(1, 30, 40)(3, 40, 50)(3, 40, 40)
MAP 1
(r1, [R, 1, 10])(r2, [R, 1, 10])(r3, [R, 1, 10])(r2, [R, 2, 20])(r3, [R, 2, 20]) ….. …..(r3, [R, 3, 40])
(r1, ([R, 1, 10], [R, 1, 10], [R, 1, 30], [S, 1, 20]))
(r3, ([R, 1, 10], [R, 2, 20], [R, 1, 10], [R, 1, 30], [R, 3, 40], [S, 3, 50], [S, 3, 40]))
Reducer 1
Reducer 3
……
Reducer 2
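The replication pattern above can be sketched in Python: an S-tuple goes only to the reducer for its own bucket, while an R-tuple with value a is replicated to that bucket and every higher one, so each reducer sees all R-tuples that can join with its S-tuples. Names and the bucket function `h` are ours; with a three-value domain the identity function matches the slide's reducers r1–r3.

```python
def ineq_map_R(t, k, h):
    """An R-tuple with join value a can match any S-tuple in its
    own bucket or a higher one, so replicate it to reducers
    h(a)..k."""
    return [(j, ("R", t)) for j in range(h(t[0]), k + 1)]

def ineq_map_S(t, k, h):
    """An S-tuple is needed only at the reducer for its bucket."""
    return [(h(t[0]), ("S", t))]

def ineq_reduce(tagged):
    """Join R-tuples against S-tuples at one reducer, keeping
    pairs with R.A <= S.A; emit (R.A, R.B, S.D)."""
    r_side = [t for tag, t in tagged if tag == "R"]
    s_side = [t for tag, t in tagged if tag == "S"]
    return [(r[0], r[1], s[1]) for r in r_side for s in s_side
            if r[0] <= s[0]]

R = [(1, 10, 12), (2, 20, 34), (1, 10, 22), (1, 30, 56), (3, 40, 17)]
S = [(1, 20, 22), (2, 30, 36), (2, 10, 29), (3, 50, 16), (3, 40, 37)]
k, h = 3, lambda a: a                 # 3 reducers, identity bucketing
buckets = {j: [] for j in range(1, k + 1)}
for t in R:
    for j, tv in ineq_map_R(t, k, h):
        buckets[j].append(tv)
for t in S:
    for j, tv in ineq_map_S(t, k, h):
        buckets[j].append(tv)
total = sum(len(ineq_reduce(buckets[j])) for j in buckets)
print(total)   # 21 output tuples satisfy R.A <= S.A
```

The R-side replication is what makes inequality joins expensive: here each R-tuple is copied to up to k reducers, which is the extra communication cost the next slide discusses.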
Why Join On Map-Reduce Is A Complex Task?
Data for multiple relations is distributed across different machines, while Map-Reduce is inherently designed for processing a single dataset.
An output tuple can be generated only when all of its input tuples are collected at a common machine.
This needs to happen for every output tuple, which is non-trivial.
A priori, we don't know which tuples are going to join to form an output tuple; that is precisely the join problem.
Ensuring it may involve a lot of replication and hence a lot of communication.
Tuples from every candidate combination need to be collected at the reducers, and the join predicates need to be checked there.