Upload
buibao
View
235
Download
8
Embed Size (px)
Citation preview
DISTRIBUTED RESEARCH ON EMERGING APPLICATIONS & MACHINESdream-lab.in | Indian Institute of Science, BangaloreDISTRIBUTED RESEARCH ON EMERGING APPLICATIONS & MACHINESdream-lab.in | Indian Institute of Science, Bangalore
DREAM:LabDREAM:Lab
©DREAM:Lab, 2014This work is licensed under a Creative Commons Attribution 4.0 International License
DISTRIBUTED RESEARCH ON EMERGING APPLICATIONS & MACHINESdream-lab.in | Indian Institute of Science, Bangalore
DREAM:Lab
SE252:Lecture 13-14, Feb 24/25ILO3:Algorithms and Programming
Patterns for Cloud Applications (Hadoop)
DREAM:Lab
DREAM:LabDREAM:LabDREAM:Lab
ILO 3
•
•
•
•
•
DREAM:LabDREAM:LabDREAM:Lab
Patterns & Technologies
DREAM:LabDREAM:LabDREAM:Lab
MapReduce
DREAM:LabDREAM:LabDREAM:Lab
MapReduce Design Pattern
DREAM:LabDREAM:LabDREAM:Lab
MapReduce: Data-parallel Programming Model
map(ki, vi) List<km, vm>[]••
reduce(km, List<vm>[]) List<kr, vr>[]••
DREAM:LabDREAM:LabDREAM:Lab
MR Borrows from Functional Programming
••
•
•
•
•
MapReduce & MPI Scatter-Gather
DREAM:LabDREAM:LabDREAM:Lab
MapReduce: Programming Model
How nowBrown cow
How doesIt work now
brown 1cow 1does 1How 2it 1now 2work 1
<How,1><now,1><brown,1><cow,1><How,1><does,1><it,1><work,1><now,1>
<How,1 1><now,1 1><brown,1><cow,1><does,1><it,1><work,1>
Map(k1,v1) → list(k2,v2)Reduce(k2, list(v2)) → list(v2)
DREAM:LabDREAM:LabDREAM:Lab
Map
•
•
DREAM:LabDREAM:LabDREAM:Lab
Map
map(String input_key, String input_value):
// input_key: line number
// input_value: line of text
for each Word w in input_value.tokenize()
EmitIntermediate(w, "1");
(0, “How now brown cow”) →
[(“How”, 1), (“now”, 1), (“brown”, 1), (“cow”, 1)]
DREAM:LabDREAM:LabDREAM:Lab
let map(k, v) = emit(k.toUpper(), v.toUpper())
(“foo”, “bar”) → (“FOO”, “BAR”)
(“Foo”, “other”) → (“FOO”, “OTHER”)
(“key2”, “data”) → (“KEY2”, “DATA”)
let map(k, v) =
if (isPrime(v)) then emit(k, v)
(“foo”, 7) → (“foo”, 7)
(“test”, 10) → (nothing)
DREAM:LabDREAM:LabDREAM:Lab
Reduce
DREAM:LabDREAM:LabDREAM:Lab
Reduce
reduce(String output_key, Iterator intermediate_values)
// output_key: a word
// output_values: a list of counts
int sum = 0;
for each v in intermediate_values
sum += ParseInt(v);
Emit(output_key, AsString(sum));
(“A”, [1, 1, 1]) → (“A”, 3)
(“B”, [1, 1]) → (“B”, 2)
for each w
in value do
emit(w,1)
How nowBrown cow
How doesIt work now
for all w in
value do
emit(w,1)
<How,1><now,1><brown,1><cow,1>
<How,1><does,1><it,1><work,1><now,1>
<How,1 1><now,1 1>
<brown,1><cow,1>
<does,1><it,1><work,1>
How 2now 2
does 1it 1work 1
brown 1cow 1
sum =
sum + value
emit(key,sum)
DREAM:LabDREAM:LabDREAM:Lab
Anagram Example
public class AnagramMapper extends MapReduceBase implementsMapper<LongWritable, Text, Text, Text> {
private Text sortedText = new Text();private Text orginalText = new Text(); public void map(LongWritable key, Text value,
OutputCollector<Text, Text> outputCollector, Reporter reporter) {
String word = value.toString();char[] wordChars = word.toCharArray();Arrays.sort(wordChars);String sortedWord = new String(wordChars);sortedText.set(sortedWord);orginalText.set(word);// Sort word and emit <sorted word, word>outputCollector.collect(sortedText, orginalText);
}}
Anagram Example…public void reduce(Text anagramKey, Iterator<Text> anagramValues,
OutputCollector<Text, Text> results, Reporter reporter) {String output = "";while(anagramValues.hasNext()) {
Text anagram = anagramValues.next();output = output + anagram.toString() + "~";
}StringTokenizer outputTokenizer =
new StringTokenizer(output,"~");// if the values contain more than one word // we have spotted a anagram.if(outputTokenizer.countTokens()>=2) {
output = output.replace("~", ",");outputKey.set(anagramKey.toString());outputValue.set(output);results.collect(outputKey, outputValue);
}}
DREAM:LabDREAM:LabDREAM:Lab
5-min Assignment
DREAM:LabDREAM:LabDREAM:Lab
MapReduce for Histogram
int bucketWidth = 4Map(k, v) {
emit(v/bucketWidth, 1)}
Reduce(k, v[]){sum=0;foreach(w in v[]) sum++;emit(k, sum)
}
7296025
21103540
111162181
246810110
1,10,12,11,10,10,11,1
0,10,12,10,11,11,10,1
2,12,11,10,10,12,10,1
0,11,11,12,12,12,10,1
2,12,12,12,12,12,12,12,1
0,10,10,10,10,10,1
1,11,11,11,11,11,11,11,1
0,10,10,10,10,10,1
2,8 0,12 1,8
DREAM:LabDREAM:LabDREAM:Lab
MapReduce for Histogramint bucketWidth = 4Map(k, v) {
emit(v/bucketWidth, 1)}
Combine(k, v[]) {// same code as reducer
}
Reduce(k, v[]){sum=0;foreach(w in v[]) sum+=w;emit(k, sum)
}
8296025
21103540
111162181
246810110
2,10,12,11,1
0,10,12,10,1
2,12,11,10,1
0,11,11,12,1
0,10,11,1
1,11,10,1
0,12,10,1
2,12,10,1
2,3 1,4 0,7 2,6 1,3 0,5
2,12,12,1
1,11,11,11,1
0,10,10,10,1
0,10,10,1
2,12,12,1
1,11,11,1
0,10,10,10,10,1
2,12,12,1
2,32,6
1,41,3
0,70,5
2,9 1,7 0,12
DREAM:LabDREAM:LabDREAM:Lab
Hadoop Execution Model
DREAM:LabDREAM:LabDREAM:Lab
Hadoop MapReduce & HDFS
•
DREAM:LabDREAM:LabDREAM:Lab
HDFS Read/Write
DREAM:LabDREAM:LabDREAM:Lab
Scheduling a MR Job
DREAM:LabDREAM:LabDREAM:Lab
MapReduce w/ 1 & N Reducers
DREAM:LabDREAM:LabDREAM:Lab
Map only job
DREAM:LabDREAM:LabDREAM:Lab
Pipelining during Shuffle & Sort
DREAM:LabDREAM:LabDREAM:Lab
Sorting using MapReduce (Map Only)
DREAM:LabDREAM:LabDREAM:Lab
Sorting using MapReduce
MapReduce Execution Overview
DREAM:LabDREAM:LabDREAM:Lab
MapReduce Execution Overview
DREAM:LabDREAM:LabDREAM:Lab
MapReduce Execution Overview
DREAM:LabDREAM:LabDREAM:Lab
MapReduce Resources
•
•
DREAM:LabDREAM:LabDREAM:Lab
MapReduce Resources
•
DREAM:LabDREAM:LabDREAM:Lab
MapReduce Execution Overview
DREAM:LabDREAM:LabDREAM:Lab
MapReduce Execution Overview
DREAM:LabDREAM:LabDREAM:Lab
MapReduce Execution Overview
DREAM:LabDREAM:LabDREAM:Lab
MapReduce Execution Overview
DREAM:LabDREAM:LabDREAM:Lab
Locality
•
DREAM:LabDREAM:LabDREAM:Lab
Fault Tolerance
•
•
•
DREAM:LabDREAM:LabDREAM:Lab
Optimizations
DREAM:LabDREAM:LabDREAM:Lab
Reminder