kellytechnologies
Best Hadoop Institutes: Kelly Technologies is the best Hadoop training institute in Bangalore, providing Hadoop courses taught by real-time faculty.

INTRODUCTION TO HADOOP
Presented by www.kellytechno.com

ACK
Thanks to all the authors who left their slides on the Web.
I own the errors, of course.

WHAT IS HADOOP?
Distributed computing framework
For clusters of computers
Thousands of compute nodes
Petabytes of data
Open source, Java
Google's MapReduce inspired Yahoo's Hadoop.
Now part of the Apache group

WHAT IS HADOOP?
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop includes:
  Hadoop Common utilities
  Avro: A data serialization system with scripting languages.
  Chukwa: managing large distributed systems.
  HBase: A scalable, distributed database for large tables.
  HDFS: A distributed file system.
  Hive: data summarization and ad hoc querying.
  MapReduce: distributed processing on compute clusters.
  Pig: A high-level data-flow language for parallel computation.
  ZooKeeper: coordination service for distributed applications.

THE IDEA OF MAP REDUCE

MAP AND REDUCE
The idea of Map and Reduce is 40+ years old.
Present in all functional programming languages. See, e.g., APL, Lisp and ML.
Alternate name for Map: Apply-All
Higher-order functions take function definitions as arguments, or return a function as output.
Map and Reduce are higher-order functions.
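
To make "higher-order" concrete, here is a minimal sketch in Haskell. The names applyTwice, makeAdder and compose are hypothetical, chosen for illustration only; they are not taken from the slides.

```haskell
-- Takes a function as an argument: applies f twice to a value.
applyTwice :: (a -> a) -> a -> a
applyTwice f x = f (f x)

-- Returns a function as output: builds an "add n" function.
makeAdder :: Int -> (Int -> Int)
makeAdder n = \x -> x + n

-- Does both: takes two functions and returns their composition.
compose :: (b -> c) -> (a -> b) -> (a -> c)
compose g f = \x -> g (f x)
```

For instance, applyTwice (makeAdder 3) 10 evaluates to 16.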

MAP: A HIGHER ORDER FUNCTION
F(x: int) returns r: int
Let V be an array of integers.
W = map(F, V)
W[i] = F(V[i]) for all i
i.e., apply F to every element of V

MAP EXAMPLES IN HASKELL
  map (+1) [1,2,3,4,5] == [2, 3, 4, 5, 6]
  map toLower "abcDEFG12!@#" == "abcdefg12!@#"
  map (`mod` 3) [1..10] == [1, 2, 0, 1, 2, 0, 1, 2, 0, 1]
(toLower comes from Data.Char.)

REDUCE: A HIGHER ORDER FUNCTION
Reduce is also known as fold, accumulate, compress or inject.
Reduce/fold takes in a function and folds it in between the elements of a list.

FOLD-LEFT IN HASKELL
Definition
  foldl f z [] = z
  foldl f z (x:xs) = foldl f (f z x) xs
Examples
  foldl (+) 0 [1..5] == 15
  foldl (+) 10 [1..5] == 25
  foldl (div) 7 [34,56,12,4,23] == 0

FOLD-RIGHT IN HASKELL
Definition
  foldr f z [] = z
  foldr f z (x:xs) = f x (foldr f z xs)
Example
  foldr (div) 7 [34,56,12,4,23] == 8
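
The same function and list give 0 with a left fold and 8 with a right fold because the two folds nest the calls differently. The sketch below exposes the intermediate values with the standard scanl and scanr functions; the names leftSteps and rightSteps are illustrative only.

```haskell
-- Left fold: the accumulator starts at 7 and moves left to right.
-- foldl div 7 [34,56,12,4,23]
--   = ((((7 `div` 34) `div` 56) `div` 12) `div` 4) `div` 23
-- The last element of leftSteps is the foldl result.
leftSteps :: [Int]
leftSteps = scanl div 7 [34, 56, 12, 4, 23]
-- [7,0,0,0,0,0]

-- Right fold: evaluation starts at the right end with 23 `div` 7.
-- foldr div 7 [34,56,12,4,23]
--   = 34 `div` (56 `div` (12 `div` (4 `div` (23 `div` 7))))
-- The first element of rightSteps is the foldr result.
rightSteps :: [Int]
rightSteps = scanr div 7 [34, 56, 12, 4, 23]
-- [8,4,12,1,3,7]
```

Since 7 `div` 34 is already 0, every later step of the left fold stays at 0.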

EXAMPLES OF THE MAP REDUCE IDEA

WORD COUNT EXAMPLE
Read text files and count how often words occur.
The input is text files.
The output is a text file.
  Each line: word, tab, count
Map: Produce pairs of (word, count)
Reduce: For each word, sum up the counts.
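
A minimal in-memory sketch of word count in Haskell, not Hadoop code. It assumes documents are plain strings, that words are runs of letters compared case-insensitively, and that sorting and grouping by key stands in for the framework's shuffle. The names mapper, reducer, wordCount and render are hypothetical.

```haskell
import Data.Char (isAlpha, toLower)
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Map: emit a (word, 1) pair for every word in one document.
mapper :: String -> [(String, Int)]
mapper doc = [(w, 1) | w <- words (map normalize doc)]
  where
    -- Assumption: lower-case the letters, treat anything else as a separator.
    normalize c = if isAlpha c then toLower c else ' '

-- Reduce: for one word, sum up the counts.
reducer :: String -> [Int] -> (String, Int)
reducer word counts = (word, sum counts)

-- Driver: map every document, sort and group the pairs by word,
-- then reduce each group.
wordCount :: [String] -> [(String, Int)]
wordCount docs =
  [ reducer w (map snd g)
  | g@((w, _) : _) <- groupBy ((==) `on` fst) (sortOn fst (concatMap mapper docs))
  ]

-- Output format from the slide: word, tab, count on each line.
render :: [(String, Int)] -> String
render counts = unlines [w ++ "\t" ++ show n | (w, n) <- counts]
```

In GHCi, putStr (render (wordCount ["the cat", "The dog, the end"])) prints one line per word, with "the" counted three times.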

GREP EXAMPLE
Search input files for a given pattern.
Map: emits a line if the pattern is matched
Reduce: copies results to output
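
A small sketch of the grep example in Haskell. As a simplifying assumption, the pattern is a plain substring rather than a regular expression, and each input file is a single string. The names grepMapper, grepReducer and grep are hypothetical.

```haskell
import Data.List (isInfixOf)

-- Map: emit the line if it contains the pattern, otherwise emit nothing.
grepMapper :: String -> String -> [String]
grepMapper pat line = [line | pat `isInfixOf` line]

-- Reduce: identity; it only copies the matched lines to the output.
grepReducer :: [String] -> [String]
grepReducer = id

-- Driver: split every file into lines, map each line, reduce the result.
grep :: String -> [String] -> [String]
grep pat files = grepReducer (concatMap (grepMapper pat) (concatMap lines files))
```

Here the reducer does no real work, which is why grep is often given as an example of a job dominated by its map phase.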

INVERTED INDEX EXAMPLE
Generate an inverted index of words from a given set of files.
Map: parses a document and emits (word, docId) pairs
Reduce: takes all pairs for a given word, sorts the docId values, and emits a (word, list of docIds) pair
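
The inverted index can be sketched the same way in Haskell. This is an illustration under assumptions: documents are (docId, text) pairs with integer ids, words are split on whitespace, and repeated docIds for one word are dropped. The names DocId, indexMapper, indexReducer and invertedIndex are hypothetical.

```haskell
import Data.Function (on)
import Data.List (groupBy, nub, sort, sortOn)

type DocId = Int

-- Map: parse one document and emit a (word, docId) pair per word.
indexMapper :: (DocId, String) -> [(String, DocId)]
indexMapper (docId, text) = [(w, docId) | w <- words text]

-- Reduce: take all docIds for one word, sort them, and emit
-- (word, sorted docIds). Dropping duplicates is an assumption.
indexReducer :: String -> [DocId] -> (String, [DocId])
indexReducer word docIds = (word, sort (nub docIds))

-- Driver: map every document, group the pairs by word, reduce each group.
invertedIndex :: [(DocId, String)] -> [(String, [DocId])]
invertedIndex docs =
  [ indexReducer w (map snd g)
  | g@((w, _) : _) <- groupBy ((==) `on` fst) (sortOn fst (concatMap indexMapper docs))
  ]
```

For example, invertedIndex [(2, "b a"), (1, "a c a")] yields [("a",[1,2]),("b",[2]),("c",[1])].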

MAP/REDUCE IMPLEMENTATION IDEA

EXECUTION ON CLUSTERS
Input files split (M splits)
Assign Master & Workers
Map tasks
Writing intermediate data to disk (R regions)
Intermediate data read & sort
Reduce tasks
Return
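
The flow above can be imitated on a single machine. The sketch below is a sequential Haskell model, not a cluster implementation: there is no master, no workers and no disk, and the split size is given directly instead of the number of splits M. The names splitsOf and mapReduce are hypothetical.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Split the input records into chunks of at most n records each.
-- A size below 1 is treated as 1 so the recursion always terminates.
splitsOf :: Int -> [a] -> [[a]]
splitsOf _ [] = []
splitsOf n xs = let (h, t) = splitAt (max 1 n) xs in h : splitsOf n t

-- Run one map task per split, sort the intermediate pairs by key,
-- group them, and run the reduce function once per key.
mapReduce :: Ord k => Int -> (a -> [(k, v)]) -> (k -> [v] -> r) -> [a] -> [r]
mapReduce n mapFn reduceFn input =
  [ reduceFn k (map snd g)
  | g@((k, _) : _) <- groupBy ((==) `on` fst) (sortOn fst intermediate)
  ]
  where
    mapTask split = concatMap mapFn split
    intermediate = concatMap mapTask (splitsOf n input)
```

As a usage example, mapReduce 2 (\x -> [(x `mod` 3, x)]) (\k vs -> (k, sum vs)) [1..10] sums the numbers 1 to 10 by remainder modulo 3.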

MAP/REDUCE CLUSTER IMPLEMENTATION
[Figure: input files (split 0 to split 4) feed M map tasks, which write intermediate files read by R reduce tasks, producing output files (Output 0, Output 1).]
Several map or reduce tasks can run on a single computer.
Each intermediate file is divided into R partitions, by a partitioning function.
Each reduce task corresponds to one partition.
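
A common choice of partitioning function is a hash of the key taken modulo R. The Haskell sketch below assumes string keys and uses a made-up hash, hashKey, purely for illustration; it is not the hash used by Hadoop. It also assumes R is at least 1. The names partitionOf and partitionPairs are hypothetical.

```haskell
import Data.Char (ord)

-- Hypothetical string hash, for illustration only. Always non-negative.
hashKey :: String -> Int
hashKey = foldl (\h c -> (h * 31 + ord c) `mod` 1000003) 7

-- Partitioning function: maps a key to one of r partitions (0 .. r-1).
partitionOf :: Int -> String -> Int
partitionOf r key = hashKey key `mod` r

-- Divide one map task's intermediate pairs into r partitions.
-- Partition i holds the input for reduce task i.
partitionPairs :: Int -> [(String, v)] -> [[(String, v)]]
partitionPairs r pairs =
  [ [p | p@(k, _) <- pairs, partitionOf r k == i] | i <- [0 .. r - 1] ]
```

Because the partition depends only on the key, all pairs with the same key end up at the same reduce task, whichever map task produced them.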

EXECUTION
[Figure: execution diagram]

FAULT RECOVERY
Workers are pinged by the master periodically.
Non-responsive workers are marked as failed.
All tasks in progress on, or completed by, a failed worker become eligible for rescheduling.
Master could periodically checkpoint.
Current implementations abort on master failure.
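
The worker-failure rules above can be modelled with a few pure functions. This Haskell sketch is a toy model under assumptions: time is an integer number of seconds, a worker counts as failed when its last ping reply is older than a timeout, and a rescheduled task simply returns to an idle state. All names (WorkerId, Time, TaskState, failedWorkers, reschedule) are hypothetical.

```haskell
type WorkerId = Int
type Time = Int -- hypothetical clock, in seconds

data TaskState = Idle | InProgress WorkerId | Completed WorkerId
  deriving (Eq, Show)

-- Workers whose last ping reply is older than the timeout are marked failed.
-- Arguments: timeout, current time, (worker, time of last reply) pairs.
failedWorkers :: Time -> Time -> [(WorkerId, Time)] -> [WorkerId]
failedWorkers timeout now lastSeen =
  [w | (w, t) <- lastSeen, now - t > timeout]

-- Any task in progress on, or completed by, a failed worker goes back
-- to Idle so that the master can schedule it again.
reschedule :: [WorkerId] -> [TaskState] -> [TaskState]
reschedule failed = map reset
  where
    reset (InProgress w) | w `elem` failed = Idle
    reset (Completed w) | w `elem` failed = Idle
    reset s = s
```

The model covers worker failure only; it does not attempt master checkpointing.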