MapReduce
Vitaly Shmatikov
CS 5450
BackRub (Google), 1997
NEC Earth Simulator, 2002
Conventional HPC Machine
• Compute nodes
  – High-end processors
  – Lots of RAM
• Network
  – Specialized
  – Very high performance
• Storage server
  – RAID disk array
HPC Programming Model
• Programs described at a very low level
  – Detailed control of processing and communication
• Rely on a small number of software packages
  – Written by specialists
  – Limits classes of problems and solution methods
Typical HPC Operation
• Characteristics
  – Long-lived processes
  – Make use of spatial locality
  – Hold all data in memory (no disk access)
  – High-bandwidth communication
• Strengths
  – High utilization of resources
  – Effective for many scientific applications
• Weaknesses
  – Requires careful tuning of applications to resources
  – Intolerant of any variability
HPC Fault Tolerance
• Checkpoint
  – Periodically store the state of all processes
  – Significant I/O traffic
• Restore after failure
  – Reset state to the last checkpoint
  – All intervening computation wasted
• Performance scaling
  – Very sensitive to the number of failing components
Google Data Center, 2016
Dozens of these!
Ideal Cluster Programming Model
• Applications written in terms of high-level operations on the data
• Runtime system controls scheduling, load balancing
MapReduce, 2004
MapReduce Programming Model
• Map computation across many data objects
  – For example, 10^10 web pages
• Aggregate results in many different ways
• System deals with resource allocation and availability
Programming in Lisp
Computing the sum of squares:

(map square ‘(1 2 3 4))
  – Processes each record sequentially and independently
  – Output: (1 4 9 16)

(reduce + ‘(1 4 9 16))
  – Computes (+ 16 (+ 9 (+ 4 1)))
  – Output: 30
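The same computation in Python, using the built-in map and functools.reduce:

    from functools import reduce

    squares = map(lambda x: x * x, [1, 2, 3, 4])    # -> 1, 4, 9, 16
    print(reduce(lambda a, b: a + b, squares))      # (+ 16 (+ 9 (+ 4 1))) = 30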
Application: Word Count
SELECT word, count(*) FROM data GROUP BY word
Partial Aggregation
1. In parallel, each worker computes word counts from individual files
2. Collect results, wait until all finished
3. Merge intermediate output
4. Compute word count on merged intermediates (see the sketch below)
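A minimal single-machine sketch of these steps in Python (the "files" here are just in-memory strings; in the real setting, step 1 runs in parallel on separate workers):

    from collections import Counter

    files = ["Welcome Everyone", "Hello Everyone"]   # hypothetical file contents

    # Step 1: each worker computes word counts for one file
    partials = [Counter(f.split()) for f in files]

    # Steps 2-4: collect, merge, and total the intermediate counts
    merged = Counter()
    for c in partials:
        merged.update(c)
    print(merged)   # Counter({'Everyone': 2, 'Welcome': 1, 'Hello': 1})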
Map
Process individual records to generate (key, value) pairs

Input:
  Welcome Everyone
  Hello Everyone

Output:
  Key       Value
  Welcome   1
  Everyone  1
  Hello     1
  Everyone  1
Parallel Map
Process individual records to generate (key, value) pairs in parallel

Map task 1:
  Input: Welcome Everyone
  Output:
    Key       Value
    Welcome   1
    Everyone  1

Map task 2:
  Input: Hello Everyone
  Output:
    Key       Value
    Hello     1
    Everyone  1

Process a large number of records in parallel
Reduce
Merge all intermediate values per key

Input:
  Key       Value
  Welcome   1
  Everyone  1
  Hello     1
  Everyone  1

Output:
  Key       Value
  Everyone  2
  Welcome   1
  Hello     1
Partitioning
Merge all intermediate values in parallel:
partition keys, assigning each key to one Reduce task

Input:
  Key       Value
  Welcome   1
  Everyone  1
  Hello     1
  Everyone  1

Reduce task 1:
  Key       Value
  Everyone  2

Reduce task 2:
  Key       Value
  Welcome   1
  Hello     1
MapReduce API
map(key, value) -> list(<k’, v’>)
  – Applies a function to each (key, value) pair and produces a set of intermediate pairs

reduce(key, list<value>) -> <k’, v’>
  – Applies an aggregation function to the values
  – Outputs the result
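To make the API concrete, here is a minimal single-machine sketch in Python (run_mapreduce and all names are mine, not the real system's API); the actual implementation distributes the map, shuffle, and reduce phases across many workers:

    from itertools import groupby
    from operator import itemgetter

    def run_mapreduce(map_fn, reduce_fn, records):
        # Map phase: apply map_fn to each input (key, value) record
        intermediate = []
        for k, v in records:
            intermediate.extend(map_fn(k, v))
        # Shuffle: bring all values for the same key together (sort, then group)
        intermediate.sort(key=itemgetter(0))
        # Reduce phase: aggregate the list of values for each key
        return [reduce_fn(k, [v for _, v in group])
                for k, group in groupby(intermediate, key=itemgetter(0))]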
MapReduce WordCount
map(key, value):
  // key: document name, value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(key, values):
  // key: a word, values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
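The same WordCount expressed against the run_mapreduce sketch above:

    def wc_map(key, value):
        # key: document name, value: document contents
        return [(w, 1) for w in value.split()]

    def wc_reduce(key, values):
        return (key, sum(values))

    docs = [("doc1", "Welcome Everyone"), ("doc2", "Hello Everyone")]
    print(run_mapreduce(wc_map, wc_reduce, docs))
    # [('Everyone', 2), ('Hello', 1), ('Welcome', 1)]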
Optimizations
combine(list<key, value>) -> list<k, v>
  – Perform partial aggregation on each mapper node:
    <the, 1>, <the, 1>, <the, 1> → <the, 3>
  – reduce() should be commutative and associative

partition(key, int) -> int
  – Need to aggregate intermediate values with the same key
  – Given n partitions, map each key to a partition 0 ≤ i < n
  – Typically via hash(key) mod n, but not always! (sketch below)
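A sketch of both hooks in Python (names are mine; in practice the combiner is often the reduce function itself, applied locally on each mapper):

    def combine(pairs):
        # Partial aggregation on one mapper node: sum counts per key locally
        local = {}
        for k, v in pairs:
            local[k] = local.get(k, 0) + v
        return list(local.items())

    def partition(key, n):
        # Default partitioner: hash the key into one of n reduce tasks
        # (note: Python's str hash is randomized per process; a real system
        # would use a stable hash function)
        return hash(key) % n

    print(combine([("the", 1), ("the", 1), ("the", 1)]))   # [('the', 3)]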
Putting It Together
map → combine → partition → reduce
Synchronization Barrier
App: grep
• Input: large set of files
• Output: lines that match a pattern
• Map: output a line if it matches the supplied pattern
• Reduce: copy the intermediate data to the output (sketch below)
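A sketch of grep against the run_mapreduce toy engine above (the pattern and log lines are hypothetical):

    import re

    PATTERN = re.compile(r"error")               # hypothetical search pattern

    def grep_map(filename, line):
        # Emit the line itself if it matches; the value is unused
        return [(line, "")] if PATTERN.search(line) else []

    def grep_reduce(line, values):
        return line                              # identity: copy match to output

    records = [("log1", "an error occurred"), ("log1", "all good")]
    print(run_mapreduce(grep_map, grep_reduce, records))
    # ['an error occurred']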
App: Reverse Web-Link Graph
• Input: web graph, i.e., tuples (A, B) where page A links to page B
• Output: for each page, the list of pages that link to it
• Map: for each (source, target), output (target, source)
• Reduce: output (target, list(source)) (sketch below)
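The same app against the run_mapreduce sketch (the edge list is a hypothetical input):

    def rev_map(source, target):
        return [(target, source)]                # invert each edge

    def rev_reduce(target, sources):
        return (target, sources)                 # all pages linking to target

    edges = [("A", "B"), ("C", "B"), ("A", "C")]
    print(run_mapreduce(rev_map, rev_reduce, edges))
    # [('B', ['A', 'C']), ('C', ['A'])]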
App: URL Access Frequency

• Input: log of accessed URLs (e.g., from a proxy)
• Output: for each URL, % of total accesses for that URL
• Map: for each URL, output (URL, 1)
• Multiple reducers: output (URL, urlCount)

Chain another MapReduce job:
• Map: for each (URL, urlCount), output (1, (URL, urlCount))
• Reduce: compute TotalCount, output multiple (URL, urlCount/TotalCount) (sketch below)
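A sketch of the chained (second) job against run_mapreduce; its input is the hypothetical (URL, count) output of the first job:

    def freq_map(url, count):
        # Single key: all (URL, count) pairs meet at one reducer
        return [(1, (url, count))]

    def freq_reduce(key, url_counts):
        total = sum(c for _, c in url_counts)
        return [(url, c / total) for url, c in url_counts]

    counts = [("a.com", 3), ("b.com", 1)]        # hypothetical output of job 1
    print(run_mapreduce(freq_map, freq_reduce, counts))
    # [[('a.com', 0.75), ('b.com', 0.25)]]       # one reducer, one result list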
App: Sorting
• Input: series of values
• Output: sorted values
• Map: for each value, output (value, _)
• Partitioning: partition keys across reducers based on ranges
  – Take data distribution into account to balance reducer tasks (sketch below)
• Reduce: copy the intermediate data to the output
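A sketch of range partitioning in Python (the split points are hypothetical; a real job samples the data to choose balanced boundaries):

    import bisect

    SPLITS = ["g", "n", "t"]                     # boundaries from a data sample

    def range_partition(key, n=4):
        # Reducer 0 gets keys < "g", reducer 1 gets ["g", "n"), and so on
        i = bisect.bisect(SPLITS, key)
        assert 0 <= i < n
        return i

    print(range_partition("apple"), range_partition("zebra"))   # 0 3

Each reducer then receives a contiguous key range, so concatenating the reducers' sorted outputs yields a globally sorted result.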
MapReduce Features
• Characteristics
  – Computation broken up into many short-lived tasks (mapping, reducing)
  – Disk storage holds intermediate results
• Strengths
  – Very flexible placement, scheduling, load balancing
  – Can access large datasets
• Weaknesses
  – Higher overhead
  – Lower raw performance
MapReduce Execution
Fault Tolerance in MapReduce
• Map worker writes intermediate output to its local disk, separated by partition; once completed, it tells the master node
• Reduce worker is told the locations of the mapper outputs, pulls its partition’s data, and executes the reduce function across it
• Note:
  – “All-to-all” shuffle between mappers and reducers
  – Written to disk (“materialized”) before each stage
Fault Tolerance in MapReduce
• Master node monitors the state of the system
  – If the master fails, the job aborts
• Map worker failure
  – In-progress and completed tasks are marked as idle
  – Reduce workers are notified when a map task is re-executed on another map worker
• Reduce worker failure
  – In-progress tasks are reset and re-executed
  – Completed tasks have already been written to the global file system
Stragglers
• Straggler = a task that takes a long time to execute
  – Bugs, flaky hardware, poor partitioning
• For slow map tasks, execute in parallel on a second “map” worker as a backup; the two race to complete the task
• When done with most tasks, reschedule any remaining in-progress tasks
  – Keep track of redundant executions
  – Significantly reduces overall run time
Straggler Mitigation
Goodbye MapReduce
Modern Data Processing
Big driver: machine learning