INTRODUCTION TO HADOOP. Presented by www.kellytechno.com

Hadoop Training Institutes in Bangalore

DESCRIPTION

Best Hadoop institutes: Kelly Technologies is the best Hadoop training institute in Bangalore, providing Hadoop courses taught by real-time faculty.

  • INTRODUCTION TO HADOOP
      Presented by www.kellytechno.com

  • ACK
      Thanks to all the authors who left their slides on the Web. I own the errors, of course.

  • WHAT IS HADOOP?
      • Distributed computing framework
      • For clusters of computers
      • Thousands of compute nodes
      • Petabytes of data
      • Open source, written in Java
      • Google's MapReduce inspired Yahoo's Hadoop
      • Now part of the Apache group

  • WHAT IS HADOOP?
      The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop includes:
      • Hadoop Common: the common utilities
      • Avro: a data serialization system that integrates with scripting languages
      • Chukwa: a data collection system for managing large distributed systems
      • HBase: a scalable, distributed database for large tables
      • HDFS: a distributed file system
      • Hive: a data warehouse infrastructure for data summarization and ad hoc querying
      • MapReduce: a framework for distributed processing on compute clusters
      • Pig: a high-level data-flow language for parallel computation
      • ZooKeeper: a coordination service for distributed applications

  • THE IDEA OF MAP REDUCE

  • MAP AND REDUCE
      • The idea of map and reduce is more than 40 years old
      • Present in all functional programming languages; see, e.g., APL, Lisp and ML
      • Alternate name for map: apply-all
      • Higher-order functions take function definitions as arguments, or return a function as output
      • Map and reduce are higher-order functions

  • MAP: A HIGHER ORDER FUNCTION
      • F(x: int) returns r: int
      • Let V be an array of integers
      • W = map(F, V)
      • W[i] = F(V[i]) for all i
      • i.e., apply F to every element of V
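
The definition W[i] = F(V[i]) can be written out directly. The following is a minimal sketch, not the Prelude's actual source: the name `myMap` is chosen only to avoid clashing with the built-in `map`, and `square` is an arbitrary stand-in for F.

```haskell
-- A hand-written map; myMap is a hypothetical name used to avoid the Prelude's map.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs

-- An arbitrary F(x: int) returns r: int, chosen only for illustration.
square :: Int -> Int
square x = x * x

main :: IO ()
main = print (myMap square [1, 2, 3, 4])  -- prints [1,4,9,16]
```

Because `myMap` takes the function as an argument, the same traversal works for any F, which is what makes it a higher-order function.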

  • MAP EXAMPLES IN HASKELL
      • map (+1) [1,2,3,4,5] == [2, 3, 4, 5, 6]
      • map toLower "abcDEFG12!@#" == "abcdefg12!@#"   (toLower comes from Data.Char)
      • map (`mod` 3) [1..10] == [1, 2, 0, 1, 2, 0, 1, 2, 0, 1]

  • REDUCE: A HIGHER ORDER FUNCTION
      • Reduce is also known as fold, accumulate, compress or inject
      • Reduce/fold takes in a function and folds it in between the elements of a list

  • FOLD-LEFT IN HASKELL
      Definition
      • foldl f z [] = z
      • foldl f z (x:xs) = foldl f (f z x) xs
      Examples
      • foldl (+) 0 [1..5] == 15
      • foldl (+) 10 [1..5] == 25
      • foldl (div) 7 [34,56,12,4,23] == 0

  • FOLD-RIGHT IN HASKELL
      Definition
      • foldr f z [] = z
      • foldr f z (x:xs) = f x (foldr f z xs)
      Example
      • foldr (div) 7 [34,56,12,4,23] == 8
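
The two `div` examples give different answers because the folds nest the operator in opposite directions. The sketch below expands both by hand; `leftNested` and `rightNested` are names introduced here for illustration only.

```haskell
-- foldl nests to the left, starting from the seed 7.
leftNested :: Int
leftNested = ((((7 `div` 34) `div` 56) `div` 12) `div` 4) `div` 23

-- foldr nests to the right, ending at the seed 7.
rightNested :: Int
rightNested = 34 `div` (56 `div` (12 `div` (4 `div` (23 `div` 7))))

main :: IO ()
main = do
  print (leftNested  == foldl div 7 [34, 56, 12, 4, 23])  -- True
  print (rightNested == foldr div 7 [34, 56, 12, 4, 23])  -- True
  print (leftNested, rightNested)                         -- (0,8)
```

For an associative operator with a matching identity, such as (+) with 0, both directions agree; integer division is not associative, so the nesting matters.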

  • EXAMPLES OF THE MAP REDUCE IDEA

  • WORD COUNT EXAMPLE
      • Read text files and count how often words occur
      • The input is text files
      • The output is a text file; each line: word, tab, count
      • Map: produce pairs of (word, count)
      • Reduce: for each word, sum up the counts
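
The word count job can be modelled in a single process. This is only a sketch of the idea, not Hadoop's API: the names `mapper`, `reducer` and `wordCount` are invented here, lower-casing the words is an added assumption, and the grouping by key that a real framework performs between the two phases is folded into `Map.fromListWith`.

```haskell
import qualified Data.Map.Strict as Map
import Data.Char (toLower)

-- Map phase: emit a (word, 1) pair for every word in one line of text.
mapper :: String -> [(String, Int)]
mapper line = [(map toLower w, 1) | w <- words line]

-- Reduce phase: sum all the counts that share a word.
reducer :: [(String, Int)] -> Map.Map String Int
reducer = Map.fromListWith (+)

wordCount :: [String] -> [(String, Int)]
wordCount = Map.toList . reducer . concatMap mapper

-- Print one "word<TAB>count" line per word, as the slide describes.
main :: IO ()
main = mapM_ (\(w, c) -> putStrLn (w ++ "\t" ++ show c))
             (wordCount ["the quick brown fox", "the lazy dog", "The end"])
```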

  • GREP EXAMPLE
      • Search input files for a given pattern
      • Map: emits a line if the pattern is matched
      • Reduce: copies results to the output
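
A single-process sketch of the grep job follows. It assumes plain substring matching rather than regular expressions, and the function names are hypothetical; the point is only that the map phase filters while the reduce phase is the identity.

```haskell
import Data.List (isInfixOf)

-- Map phase: emit the line only if it contains the pattern (substring match).
grepMapper :: String -> String -> [String]
grepMapper pat line = [line | pat `isInfixOf` line]

-- Reduce phase: the identity; matched lines are copied through unchanged.
grepReducer :: [String] -> [String]
grepReducer = id

grep :: String -> [String] -> [String]
grep pat = grepReducer . concatMap (grepMapper pat)

main :: IO ()
main = mapM_ putStrLn (grep "map" ["map tasks", "reduce tasks", "a mapper"])
```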

  • INVERTED INDEX EXAMPLE
      • Generate an inverted index of words from a given set of files
      • Map: parses a document and emits (word, docId) pairs
      • Reduce: takes all pairs for a given word, sorts the docId values, and emits a (word, list of docIds) pair
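
The inverted index can be sketched the same way. Everything named here (`DocId`, `indexMapper`, `indexReducer`, `invertedIndex`) is hypothetical, documents are assumed to arrive as (docId, text) pairs, and removing duplicate docIds is an extra assumption that the slide does not state.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (nub, sort)

type DocId = Int

-- Map phase: parse one document and emit a (word, [docId]) pair per word.
indexMapper :: (DocId, String) -> [(String, [DocId])]
indexMapper (docId, text) = [(w, [docId]) | w <- words text]

-- Reduce phase: per word, collect the docIds, drop duplicates and sort them.
indexReducer :: [(String, [DocId])] -> Map.Map String [DocId]
indexReducer = Map.map (sort . nub) . Map.fromListWith (++)

invertedIndex :: [(DocId, String)] -> [(String, [DocId])]
invertedIndex = Map.toList . indexReducer . concatMap indexMapper

main :: IO ()
main = mapM_ print (invertedIndex [(2, "hadoop map"), (1, "map reduce map")])
```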

  • MAP/REDUCE IMPLEMENTATION IDEA

  • EXECUTION ON CLUSTERS
      • Input files split (M splits)
      • Assign master and workers
      • Map tasks
      • Writing intermediate data to disk (R regions)
      • Intermediate data read and sort
      • Reduce tasks
      • Return
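
Leaving out the cluster entirely, the data flow of these steps can be modelled as one generic function. This is a single-process sketch under that simplification: `mapReduce` is a name invented here, each list element stands for one input split, and the sorted `Map` stands in for the read-and-sort step on intermediate data.

```haskell
import qualified Data.Map.Strict as Map

-- A single-process model of the pipeline: map, group and sort by key, reduce.
mapReduce :: Ord k
          => (a -> [(k, v)])   -- map function, applied to each input split
          -> (k -> [v] -> r)   -- reduce function, applied once per key
          -> [a]               -- input splits
          -> [r]
mapReduce mapper reducer splits =
  [reducer k vs | (k, vs) <- Map.toList grouped]
  where
    intermediate = concatMap mapper splits
    -- flip (++) keeps the values of each key in emission order.
    grouped = Map.fromListWith (flip (++)) [(k, [v]) | (k, v) <- intermediate]

main :: IO ()
main = print (mapReduce (\line -> [(w, 1 :: Int) | w <- words line])
                        (\w counts -> (w, sum counts))
                        ["a b", "a"])
-- prints [("a",2),("b",1)]
```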

  • MAP/REDUCE CLUSTER IMPLEMENTATION
      [Diagram: input files (split 0 to split 4), M map tasks, intermediate files, R reduce tasks, output files (Output 0, Output 1)]
      • Several map or reduce tasks can run on a single computer
      • Each intermediate file is divided into R partitions by a partitioning function
      • Each reduce task corresponds to one partition
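
A partitioning function is commonly a hash of the key taken modulo R. The sketch below assumes that scheme; `toyHash` is a made-up string hash for illustration, not the one Hadoop uses, and `partitionFor` and `partitionOutput` are likewise hypothetical names.

```haskell
import Data.Char (ord)
import qualified Data.Map.Strict as Map

-- A toy string hash (an assumption for illustration; not Hadoop's hash function).
toyHash :: String -> Int
toyHash = foldl (\h c -> (31 * h + ord c) `mod` 1000003) 0

-- Partitioning function: choose one of r reduce partitions (0 .. r-1) for a key.
partitionFor :: Int -> String -> Int
partitionFor r key = toyHash key `mod` r

-- Divide one map task's output into partitions, keyed by partition number.
partitionOutput :: Int -> [(String, Int)] -> Map.Map Int [(String, Int)]
partitionOutput r pairs =
  Map.fromListWith (flip (++)) [(partitionFor r k, [(k, v)]) | (k, v) <- pairs]

main :: IO ()
main = print (partitionOutput 2 [("a", 1), ("b", 1), ("a", 1)])
```

Since the partition depends only on the key, every pair with the same key lands in the same partition, so a single reduce task sees all the values for that key.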

  • EXECUTION

  • FAULT RECOVERY
      • Workers are pinged by the master periodically
      • Non-responsive workers are marked as failed
      • All tasks in progress on, or completed by, a failed worker become eligible for rescheduling
      • The master could checkpoint its state periodically
      • Current implementations abort the job if the master fails
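
The rescheduling rule can be expressed as a small piece of master-side bookkeeping. This is a sketch of the rule as stated on the slide, with invented types (`WorkerId`, `TaskState`) and an invented function `reschedule`; it does not model pinging, timeouts or checkpointing.

```haskell
type WorkerId = Int

-- The state the master tracks for each task (hypothetical model).
data TaskState = Idle | InProgress WorkerId | Completed WorkerId
  deriving (Eq, Show)

-- Reset every task that was running on, or had finished on, the failed worker,
-- so that it becomes eligible for scheduling again.
reschedule :: WorkerId -> [TaskState] -> [TaskState]
reschedule failed = map reset
  where
    reset (InProgress w) | w == failed = Idle
    reset (Completed w)  | w == failed = Idle
    reset s                            = s

main :: IO ()
main = print (reschedule 2 [InProgress 1, InProgress 2, Completed 2, Idle, Completed 1])
-- prints [InProgress 1,Idle,Idle,Idle,Completed 1]
```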
