Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
Principles of Data Management
Lecture #16 (MapReduce & DFS for Big Data)
Instructor: Mike Carey [email protected]
Today’s News Bulletin
- Project dates
  - Query execution layer is due on 3/17
- Upcoming lectures
  - Today: MapReduce (and distributed file systems)
  - Next week: Wrap-up & review, in-class endterm
- Other upcoming events
  - The long-lost midterms will appear on Tuesday!
- Class participation opportunities
  - Teaching evaluations (after Tuesday :-))
  - End-of-term opinion survey (watch for it!)
Motivation
- Google needed to process web-scale data
  - Data much larger than what fits on one machine
  - Needed parallel processing to get results in a reasonable time
  - Wanted to use cheap commodity machines to do the job
- Credits: Some of the following slide content is excerpted from the Google OSDI 2004 talk where MapReduce was publicly born and/or Google’s SOSP 2003 talk on the underlying DFS.
Requirements
- Solution must
  - Scale to 1000s of compute nodes
  - Automatically handle faults
  - Provide monitoring of jobs
  - Be easy for programmers to use
MapReduce Programming model
- Input and output are sets of key/value pairs
- Programmer provides two functions
  - map(K1, V1) -> list(K2, V2)
    - Produces a list of intermediate key/value pairs for each input key/value pair
  - reduce(K2, list(V2)) -> list(K3, V3)
    - Produces a list of result values for all intermediate values that are associated with the same intermediate key
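The two signatures above can be exercised with a small single-process stand-in for the framework. This is only a sketch: `run_mapreduce`, the sales example, and all names in it are hypothetical, and a real system would run the map and reduce calls in parallel over data stored in the DFS.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def run_mapreduce(
    inputs: Iterable[Pair],
    map_fn: Callable[[Any, Any], Iterable[Pair]],
    reduce_fn: Callable[[Any, List[Any]], Iterable[Pair]],
) -> List[Pair]:
    """Single-process stand-in for a MapReduce job (no parallelism, no DFS)."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    # Map phase: each input pair yields zero or more intermediate pairs.
    for k1, v1 in inputs:
        for k2, v2 in map_fn(k1, v1):
            # "Shuffle": collect all values that share an intermediate key.
            groups[k2].append(v2)
    # Reduce phase: one call per distinct intermediate key.
    output: List[Pair] = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output


# Hypothetical job: total sales per region from (order_id, (region, amount)).
def sales_map(order_id, record):
    region, amount = record
    yield region, amount


def sales_reduce(region, amounts):
    yield region, sum(amounts)


orders = [(1, ("east", 10)), (2, ("west", 5)), (3, ("east", 7))]
totals = run_mapreduce(orders, sales_map, sales_reduce)
print(totals)
```

The programmer writes only `sales_map` and `sales_reduce`; grouping by intermediate key is the framework's job.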
MapReduce Pipeline
Read from DFS -> Map -> Shuffle -> Reduce -> Write to DFS
MapReduce in Action
- Map: (k1, v1) -> list(k2, v2)
  - Processes one input key/value pair
  - Produces a set of intermediate key/value pairs
- Reduce: (k2, list(v2)) -> list(k3, v3)
  - Combines intermediate values for one particular key
  - Produces a set of merged output values (usually one)
MapReduce Architecture
[Figure: four MapReduce nodes layered over the Distributed File System, connected by the Network to the MapReduce Job Tracker]
Software Components
- Job Tracker (Master)
  - Maintains cluster membership of workers
  - Accepts MR jobs from clients and dispatches tasks to workers
  - Monitors workers’ progress
  - Restarts tasks in the event of failure
- Task Tracker (Worker)
  - Provides an environment to run a task
  - Maintains and serves intermediate files between Map and Reduce phases
MapReduce Parallelism
Hash Partitioning
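In the original MapReduce design, the default partitioning function is hash(key) mod R, where R is the number of reduce tasks, so every pair with the same intermediate key reaches the same reducer. The sketch below illustrates the idea; the function names are hypothetical, and `zlib.crc32` is an assumption chosen only because it gives the same value on every run (Python's built-in `hash()` is salted per process for strings).

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick the reduce task for a key: hash(key) mod R."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Route intermediate (key, value) pairs to per-reducer buckets."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


pairs = [("this", 1), ("is", 1), ("a", 1), ("line", 1), ("this", 1)]
buckets = shuffle(pairs, num_reducers=2)
for reducer_id in sorted(buckets):
    print(reducer_id, buckets[reducer_id])
```

Because the partition depends only on the key, map tasks on different machines agree on the destination reducer without coordinating.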
Example 1: Count Word Occurrences
- Input: Set of (Document name, Document contents)
- Output: Set of (Word, Count(Word))
- map(k1, v1):
    for each word w in v1
      emit(w, 1)
- reduce(k2, v2_list):
    int result = 0;
    for each v in v2_list
      result += v;
    emit(k2, result)
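The pseudocode above translates almost line for line into Python. In this sketch the shuffle is simulated by sorting the intermediate pairs and grouping adjacent equal keys; the document names `d1` to `d4` and the helper `word_count` are hypothetical, and everything runs in one process.

```python
from itertools import groupby
from operator import itemgetter


def wc_map(doc_name, contents):
    # Emit (word, 1) for every word occurrence; the document name is unused.
    for word in contents.split():
        yield word, 1


def wc_reduce(word, counts):
    yield word, sum(counts)


def word_count(documents):
    # Map phase over every (name, contents) pair.
    intermediate = [kv for name, text in documents for kv in wc_map(name, text)]
    # Shuffle: sort by key so equal words become adjacent, then group them.
    intermediate.sort(key=itemgetter(0))
    result = []
    for word, group in groupby(intermediate, key=itemgetter(0)):
        result.extend(wc_reduce(word, [count for _, count in group]))
    return result


docs = [
    ("d1", "this is a line"),
    ("d2", "this is another line"),
    ("d3", "another line"),
    ("d4", "yet another line"),
]
counts = word_count(docs)
print(counts)
```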
Example 1: Count Word Occurrences
Input splits:
  Map 1: "this is a line", "this is another line"
  Map 2: "another line", "yet another line"
Map output:
  Map 1: this,1  is,1  a,1  line,1  this,1  is,1  another,1  line,1
  Map 2: another,1  line,1  yet,1  another,1  line,1
Sorted and partitioned map output:
  Map 1 -> Reduce 1: a,1  another,1  is,1  is,1
  Map 1 -> Reduce 2: line,1  line,1  this,1  this,1
  Map 2 -> Reduce 1: another,1  another,1
  Map 2 -> Reduce 2: line,1  line,1  yet,1
Reduce output:
  Reduce 1: a,1  another,3  is,2
  Reduce 2: line,4  this,2  yet,1
(Picture borrowed from Shiv Babu @ Duke University)
MapReduce Pipeline Revisited
Example 2: Equijoins
- Input: Rows of Relation R, Rows of Relation S
- Output: R join S on R.x = S.y
- map(k1, v1)
    if (input == R)
      emit(v1.x, [“R”, v1])
    else
      emit(v1.y, [“S”, v1])
- reduce(k2, v2_list)
    for r in v2_list where r[1] == “R”
      for s in v2_list where s[1] == “S”
        emit(1, result(r[2], s[2]))
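The join pseudocode above can be sketched in Python as follows. Rows are modeled as dictionaries, the relation name plays the role of the map input key, and the driver `equijoin` is a hypothetical single-process substitute for the framework. Unlike the slide, this sketch emits the join key (rather than the constant 1) as the output key; the choice does not affect the joined rows.

```python
from collections import defaultdict


def join_map(source, row):
    # Tag each row with its relation so the reducer can tell them apart.
    if source == "R":
        yield row["x"], ("R", row)
    else:
        yield row["y"], ("S", row)


def join_reduce(join_key, tagged_rows):
    r_rows = [row for tag, row in tagged_rows if tag == "R"]
    s_rows = [row for tag, row in tagged_rows if tag == "S"]
    # Cross product of the R rows and S rows that share this join key.
    for r in r_rows:
        for s in s_rows:
            yield join_key, {**r, **s}


def equijoin(r_rows, s_rows):
    groups = defaultdict(list)
    tagged_inputs = [("R", r) for r in r_rows] + [("S", s) for s in s_rows]
    for source, row in tagged_inputs:
        for key, value in join_map(source, row):
            groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(row for _, row in join_reduce(key, groups[key]))
    return out


R = [{"x": 1, "a": "r1"}, {"x": 2, "a": "r2"}]
S = [{"y": 1, "b": "s1"}, {"y": 1, "b": "s2"}, {"y": 3, "b": "s3"}]
joined = equijoin(R, S)
print(joined)
```

Keys that appear in only one relation (2 and 3 here) reach a reducer but produce no output, which is the expected inner-join behavior.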
Other Examples
- Distributed grep
- Inverted index construction
- Machine learning
- Distributed sort
- Fuzzy join
- …
- Or: A Pig script or a Hive query (which are then auto-converted to a Hadoop MapReduce job series under the covers), e.g., at Netflix
Fault Tolerant Evaluation
- Task fault tolerance is achieved through re-execution (roll forward, not back!)
- Consumers read data only after the producer has completely generated it
  - This is an important property for isolating faults to one task
- Task completion is committed through the master
- Cannot handle master failure
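The re-execution and commit rules above can be illustrated with a toy model. Everything here is hypothetical (the `Master` class, the simulated failures, the task id); it only shows that a failed attempt is discarded and rerun from scratch, and that output becomes visible once, through the master, after the task has fully completed.

```python
class Master:
    """Records which tasks have committed; only the first commit counts."""

    def __init__(self):
        self.committed = {}

    def commit(self, task_id, output):
        # Duplicate or late commits for the same task are ignored.
        if task_id in self.committed:
            return False
        self.committed[task_id] = output
        return True


def run_with_reexecution(master, task_id, task_fn, max_attempts=5):
    """Re-run a failed task from scratch; nothing is rolled back."""
    for attempt in range(1, max_attempts + 1):
        try:
            output = task_fn()  # output is complete before it becomes visible
        except RuntimeError:
            continue  # simulated worker crash: discard partial work and retry
        master.commit(task_id, output)
        return attempt
    raise RuntimeError(f"task {task_id} failed {max_attempts} times")


def flaky_task_factory(failures_before_success):
    """Hypothetical task that fails a fixed number of times, then succeeds."""
    state = {"remaining": failures_before_success}

    def task():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError("simulated worker failure")
        return [("line", 4)]

    return task


master = Master()
attempts = run_with_reexecution(master, "reduce-0", flaky_task_factory(2))
print(attempts, master.committed)
```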
Task granularity and pipelining
- Fine granularity tasks
  - Many more map tasks than machines
    - Minimizes time for fault recovery
    - Pipelines shuffling with map execution
    - Better load balancing
Optimization: Combiners
- Sometimes partial aggregation is possible on the Map side
- May cut down the amount of data needing to be transferred to the reducer (significantly in some cases, like grouped aggregation in Hive)
- combine(K2, list(V2)) -> K2, list(V2)
- For the Word Occurrence Count example, Combine == Reduce (Q: Why?)
Example 1: Word Count Revisited (With Combiners)
Input splits:
  Map 1: "this is a line", "this is another line"
  Map 2: "another line", "yet another line"
Map output:
  Map 1: this,1  is,1  a,1  line,1  this,1  is,1  another,1  line,1
  Map 2: another,1  line,1  yet,1  another,1  line,1
Intermediate results (after combining):
  Map 1 -> Reduce 1: a,1  another,1  is,2
  Map 1 -> Reduce 2: line,2  this,2
  Map 2 -> Reduce 1: another,2
  Map 2 -> Reduce 2: line,2  yet,1
Reduce output:
  Reduce 1: a,1  another,3  is,2
  Reduce 2: line,4  this,2  yet,1
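The effect of a combiner on this example can be checked with a short sketch. The helper names are hypothetical and the two input splits are assumed to match the two map tasks in the trace; the point is that summing is associative and commutative, so partial sums per map task give the same final counts while fewer pairs cross the network.

```python
from collections import Counter


def map_task(lines):
    """Emit (word, 1) for every word in this map task's input split."""
    return [(word, 1) for line in lines for word in line.split()]


def combine(pairs):
    """Map-side partial aggregation: sum counts per word within one task."""
    partial = Counter()
    for word, count in pairs:
        partial[word] += count
    return sorted(partial.items())


def reduce_all(pair_lists):
    """Sum counts per word across everything the map tasks shipped."""
    total = Counter()
    for pairs in pair_lists:
        for word, count in pairs:
            total[word] += count
    return sorted(total.items())


splits = [
    ["this is a line", "this is another line"],  # map task 1
    ["another line", "yet another line"],        # map task 2
]
raw = [map_task(s) for s in splits]
combined = [combine(p) for p in raw]

shipped_without = sum(len(p) for p in raw)     # pairs sent with no combiner
shipped_with = sum(len(p) for p in combined)   # pairs sent after combining
print(shipped_without, shipped_with, reduce_all(combined))
```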
Optimization: Redundant Execution
- Slow workers lengthen completion time
- Slowness happens because of
  - Other jobs consuming resources
  - Bad disks/network, etc.
- Solution: Near the end of the job, spawn extra copies of long-running tasks
  - Whichever copy finishes first wins
  - Kill the rest
- In Hadoop this is called “speculative execution”
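The "first copy wins, kill the rest" rule can be simulated with threads. This is a toy model under stated assumptions, not how Hadoop implements it: task copies are simulated by sleeping, and "killing" is modeled with a shared flag that each copy polls, since Python threads cannot be forcibly stopped.

```python
import concurrent.futures as cf
import threading
import time


def task_copy(copy_id, duration, cancelled):
    """Simulated task copy that checks a shared flag so it can be 'killed'."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        if cancelled.is_set():
            return None  # another copy already won
        time.sleep(0.005)
    return copy_id


def run_speculatively(durations):
    """Run one copy per entry in durations; return the id of the first to finish."""
    cancelled = threading.Event()
    with cf.ThreadPoolExecutor(max_workers=len(durations)) as pool:
        futures = [
            pool.submit(task_copy, copy_id, duration, cancelled)
            for copy_id, duration in enumerate(durations)
        ]
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        cancelled.set()  # "kill the rest"
        return next(iter(done)).result()


# Copy 0 is the straggler (1 s); copy 1 is the backup on a healthy worker.
winner = run_speculatively([1.0, 0.02])
print(winner)
```

The job finishes in roughly the time of the fastest copy, at the cost of some duplicated work.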
Optimization: Locality
- Task scheduling policy
  - Ask DFS (next topic!) for locations of replicas of input file blocks
  - Map tasks scheduled so that input blocks are machine local or rack local
- Effect: Tasks read data at local disk speeds
- Without this, rack switches limit data rate
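The scheduling preference described above (machine local first, then rack local) can be sketched as a simple selection function. The function name, node names, and rack map are hypothetical, and a real scheduler would weigh many more factors.

```python
def pick_worker(replica_nodes, idle_workers, rack_of):
    """Choose an idle worker for a map task, preferring data locality.

    replica_nodes: nodes holding a replica of the task's input block
    idle_workers:  nodes currently free to run a task
    rack_of:       dict mapping node name -> rack name
    Returns (worker, locality), or (None, None) if no worker is idle.
    """
    # 1. Machine local: an idle worker that stores the block itself.
    for worker in idle_workers:
        if worker in replica_nodes:
            return worker, "machine-local"
    # 2. Rack local: an idle worker sharing a rack with some replica.
    replica_racks = {rack_of[node] for node in replica_nodes}
    for worker in idle_workers:
        if rack_of[worker] in replica_racks:
            return worker, "rack-local"
    # 3. Fall back to any idle worker; the read crosses rack switches.
    if idle_workers:
        return idle_workers[0], "remote"
    return None, None


rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackC"}
replicas = ["n1", "n3"]
print(pick_worker(replicas, ["n4", "n2", "n3"], rack_of))
print(pick_worker(replicas, ["n4", "n2"], rack_of))
```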
Distributed (Big!) Filesystem
- Used as the “store” for MapReduce data
- MapReduce reads its input from DFS and writes its output to DFS
- Provides a “shared disk” view to applications using local storage on shared-nothing hardware
- Provides redundancy by replication to protect from node/disk failures
DFS Architecture
(Figure taken from Ghemawat et al.’s SOSP ’03 paper, “The Google File System”)
- Single master (with backups) that tracks the DFS file name to chunk mapping
- Several chunk servers that store chunks on local disks
- Chunk size ~ 64MB or larger
- Chunks are replicated
- Master only used for chunk lookups
  - Does not participate in transfer of data
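The division of labor above (metadata from the master, data from the chunk servers) can be sketched with the offset arithmetic a client would perform. The `Master` class, file name, and chunk server names are hypothetical; only the 64 MB chunk size comes from the slide.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size quoted on the slide


class Master:
    """Holds metadata only: (file name, chunk index) -> replica locations."""

    def __init__(self):
        self.chunk_table = {}

    def lookup(self, file_name, chunk_index):
        return self.chunk_table[(file_name, chunk_index)]


def locate(master, file_name, byte_offset):
    """Translate a file offset into (chunk index, offset in chunk, replicas)."""
    chunk_index = byte_offset // CHUNK_SIZE
    offset_in_chunk = byte_offset % CHUNK_SIZE
    replicas = master.lookup(file_name, chunk_index)
    # The client would now read the bytes directly from one of the replicas;
    # the master is not on the data path.
    return chunk_index, offset_in_chunk, replicas


master = Master()
# Hypothetical file and chunk server names.
master.chunk_table[("/logs/web.log", 0)] = ["cs1", "cs2", "cs3"]
master.chunk_table[("/logs/web.log", 1)] = ["cs2", "cs4", "cs5"]

location = locate(master, "/logs/web.log", 100 * 1024 * 1024)
print(location)
```

A read at byte offset 100 MB falls in chunk 1, at 36 MB into that chunk, so one small lookup at the master is followed by a bulk transfer from a chunk server.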
Chunk Replication
- Several replicas of each chunk
  - Replicas usually spread across racks and data centers to maximize availability
  - 3 replicas common (local, same rack, different rack)
- Master tracks location of each replica of a chunk
- When chunk failure is detected, master automatically rebuilds new replica to maintain replication level
- Automatically picks chunk servers for new replicas based on utilization
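The re-replication behavior above can be sketched as follows. This is a simplified illustration with hypothetical names and numbers: it picks the least-utilized live chunk server that does not already hold the chunk, and it deliberately ignores rack placement and does not update utilization after copying.

```python
REPLICATION_LEVEL = 3  # "3 replicas common" on the slide


def rereplicate(chunk_locations, utilization, failed_server):
    """Drop a failed server and restore each chunk's replication level.

    chunk_locations: dict chunk id -> list of chunk servers holding it
    utilization:     dict chunk server -> fraction of disk in use (0.0-1.0)
    New replicas go to the least-utilized live server lacking the chunk.
    """
    live = [s for s in utilization if s != failed_server]
    for chunk_id, servers in chunk_locations.items():
        holders = [s for s in servers if s != failed_server]
        while len(holders) < REPLICATION_LEVEL:
            candidates = [s for s in live if s not in holders]
            if not candidates:
                break  # not enough servers to reach the target level
            target = min(candidates, key=lambda s: utilization[s])
            holders.append(target)
        chunk_locations[chunk_id] = holders
    return chunk_locations


utilization = {"cs1": 0.70, "cs2": 0.40, "cs3": 0.90, "cs4": 0.20, "cs5": 0.55}
chunks = {"c0": ["cs1", "cs2", "cs3"], "c1": ["cs2", "cs4", "cs5"]}
chunks = rereplicate(chunks, utilization, failed_server="cs2")
print(chunks)
```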
MapReduce & DFS: Summary
- Google laid a foundation for a new flurry of large-scale data storage and processing with its MR and DFS work in the early 2000s
- Apache open source versions soon sprang up outside of Google: Hadoop MapReduce & HDFS
- Today, Big Data use cases are addressed with a mix of parallel RDBMS technologies as well as more “flexible” Hadoop-based technologies
- So where are we now…?
(Pig)
Today’s Tangled World
Additional Reading
- Original MapReduce Paper (*** MUST READ! ***)
  - “MapReduce: Simplified Data Processing on Large Clusters” by Jeffrey Dean and Sanjay Ghemawat in OSDI ’04
- Original DFS Paper
  - “The Google File System” by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung in SOSP ’03
- MapReduce vs. Parallel DBMS Papers in CACM (Jan. 2010)
  - “MapReduce and Parallel DBMSs: Friends or Foes?” by Michael Stonebraker, Daniel Abadi, David DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, and Alexander Rasin
  - “MapReduce: A Flexible Data Processing Tool” by Jeffrey Dean and Sanjay Ghemawat
- EDBT “Ogres & Onions Keynote” Paper
  - “Inside "Big Data Management": Ogres, Onions, or Parfaits?” by Vinayak Borkar, Michael J. Carey, and Chen Li in EDBT ’12 (or watch the movie)