Review of Calculation Paradigm and its Components Namuk Park Nov 18, 2014


Page 1: Review of Calculation Paradigm and its Components

Review of Calculation Paradigm and its Components

Namuk Park

Nov 18, 2014

Page 2: Review of Calculation Paradigm and its Components

Hadoop File System

Page 3: Review of Calculation Paradigm and its Components

Hadoop: MapReduce

Page 4: Review of Calculation Paradigm and its Components

Hadoop 2.0

• improve scalability

• support non-MapReduce jobs

• support heterogeneous machines

• fix a common cause of low cluster utilization: map slots might be full while reduce slots are empty, and vice versa
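The utilization problem in the last bullet can be made concrete with a toy model. The sketch below is not Hadoop code: the class name, method names, and slot counts are hypothetical, and it only compares fixed map/reduce slot pools with a single pool of generic containers.

```java
// Toy model only; class and method names are hypothetical, not Hadoop APIs.
class SlotUtilizationSketch {

    // Fixed pools: a map task can only use a map slot,
    // and a reduce task can only use a reduce slot.
    static int runnableWithFixedSlots(int mapSlots, int reduceSlots,
                                      int pendingMaps, int pendingReduces) {
        return Math.min(mapSlots, pendingMaps) + Math.min(reduceSlots, pendingReduces);
    }

    // One shared pool: any task can use any free container.
    static int runnableWithContainers(int containers,
                                      int pendingMaps, int pendingReduces) {
        return Math.min(containers, pendingMaps + pendingReduces);
    }

    public static void main(String[] args) {
        // Ten pending map tasks and no pending reduce tasks on a ten-unit node.
        System.out.println(runnableWithFixedSlots(6, 4, 10, 0)); // 6: four reduce slots sit idle
        System.out.println(runnableWithContainers(10, 10, 0));   // 10: all capacity is used
    }
}
```

With the same total capacity, the fixed-slot model leaves the reduce slots idle while map tasks wait.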

Page 5: Review of Calculation Paradigm and its Components

Hadoop 2.0

Page 6: Review of Calculation Paradigm and its Components

Hadoop 2.0: Service Layers

Page 7: Review of Calculation Paradigm and its Components

YARN

• split up the two functions of the JobTracker: resource management and job scheduling/monitoring

• have a global ResourceManager (RM) and a per-application ApplicationMaster (AM)
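A minimal sketch of this split, assuming a toy model in which every task needs exactly one container and finishes within one scheduling wave. The classes below are hypothetical stand-ins that borrow the YARN role names; they are not the YARN API.

```java
// Toy model of the RM/AM split; these classes are hypothetical, not YARN classes.
class YarnSplitSketch {

    /** Global role: tracks cluster capacity and hands out containers, nothing else. */
    static final class ResourceManager {
        private int freeContainers;

        ResourceManager(int totalContainers) { this.freeContainers = totalContainers; }

        int allocate(int requested) {
            int granted = Math.min(requested, freeContainers);
            freeContainers -= granted;
            return granted;
        }

        void release(int containers) { freeContainers += containers; }

        int free() { return freeContainers; }
    }

    /** Per-application role: schedules and monitors the tasks of one application. */
    static final class ApplicationMaster {
        private int pendingTasks;

        ApplicationMaster(int tasks) { this.pendingTasks = tasks; }

        /** Requests containers, runs one task per container, then returns them. */
        int runWave(ResourceManager rm) {
            int granted = rm.allocate(pendingTasks);
            pendingTasks -= granted;
            rm.release(granted);
            return granted;
        }

        boolean done() { return pendingTasks == 0; }
    }

    /** Number of waves one application needs on a cluster of the given size. */
    static int wavesNeeded(int totalContainers, int tasks) {
        ResourceManager rm = new ResourceManager(totalContainers);
        ApplicationMaster am = new ApplicationMaster(tasks);
        int waves = 0;
        while (!am.done()) {
            if (am.runWave(rm) == 0) throw new IllegalStateException("no capacity");
            waves++;
        }
        return waves;
    }

    public static void main(String[] args) {
        System.out.println(wavesNeeded(4, 10)); // 3 waves: 4 + 4 + 2 tasks
    }
}
```

The ResourceManager here knows nothing about tasks or job progress; that bookkeeping lives entirely in the per-application object.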

Page 8: Review of Calculation Paradigm and its Components

YARN: MapReduce

Page 9: Review of Calculation Paradigm and its Components

Storm

Page 10: Review of Calculation Paradigm and its Components

Storm

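// Package names below are those of Storm 0.9.x; Storm 1.0+ uses org.apache.storm instead.
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
// Assumed location of the spout, as in the storm-starter project.
import storm.starter.spout.RandomSentenceSpout;

public class WordCountTopology {
    {……}
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // One spout emitting sentences, with a parallelism hint of 5.
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        // Sentences are distributed randomly across the 8 splitter tasks.
        builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
        // Tuples with the same "word" value always reach the same counter task.
        builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(true);

        // With a topology name on the command line, submit to the cluster.
        if (args != null && args.length > 0) {
            conf.setNumWorkers(3);
            StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology());
        }
    }
}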
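The reason the "count" bolt uses fieldsGrouping on "word" can be sketched without Storm: if every occurrence of a word is routed to the same task, each task holds complete counts for its words. The code below is plain Java, not the Storm API, and it assumes a simple hash-modulo partitioner; Storm's actual routing may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of fields grouping; names are hypothetical, not Storm classes.
class FieldsGroupingSketch {

    // Assumption: route by hash of the grouping field, modulo the number of tasks.
    static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Splits sentences into words and counts each word on the task it is routed to.
    static List<Map<String, Integer>> countWords(List<String> sentences, int numTasks) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new HashMap<String, Integer>());
        }
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                if (word.isEmpty()) continue;
                tasks.get(taskFor(word, numTasks)).merge(word, 1, Integer::sum);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "the cow jumped over the moon",
                "the man went to the store");
        List<Map<String, Integer>> tasks = countWords(sentences, 12);
        // All four occurrences of "the" were counted by one task.
        System.out.println(tasks.get(taskFor("the", 12)).get("the"));
    }
}
```

With shuffle grouping instead, occurrences of one word would be spread over several tasks and no single task would hold the full count.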

Page 11: Review of Calculation Paradigm and its Components

Storm Architecture

Page 12: Review of Calculation Paradigm and its Components

Storm Architecture

Page 13: Review of Calculation Paradigm and its Components

Lambda Architecture

query = function(all data)
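One way to read this equation, sketched under the usual Lambda Architecture assumption that a batch view is precomputed over older data and a realtime view covers the data that arrived since. The class, method names, and sample events are hypothetical, and counting events by key stands in for an arbitrary query function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: query = function(all data), answered by merging two views.
class LambdaQuerySketch {

    // The "function": count events per key. Used to build both views.
    static Map<String, Long> countByKey(List<String> events) {
        Map<String, Long> view = new HashMap<>();
        for (String event : events) {
            view.merge(event, 1L, Long::sum);
        }
        return view;
    }

    // A query merges the precomputed batch view with the realtime view.
    static long query(Map<String, Long> batchView, Map<String, Long> realtimeView, String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        List<String> batchData = Arrays.asList("/home", "/about", "/home");
        List<String> recentData = Arrays.asList("/home", "/contact");

        long merged = query(countByKey(batchData), countByKey(recentData), "/home");

        // The same answer computed directly over all data.
        List<String> allData = new ArrayList<>(batchData);
        allData.addAll(recentData);
        long direct = countByKey(allData).getOrDefault("/home", 0L);

        System.out.println(merged == direct); // true
    }
}
```

The merge is a plain sum only because counts are additive; other query functions need their own merge rule.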

Page 14: Review of Calculation Paradigm and its Components

Lambda Architecture

Page 15: Review of Calculation Paradigm and its Components

Tez

Low Level DAG Framework

• to execute a complex DAG of tasks

• more general-purpose resource management framework
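Executing a DAG of tasks means running each task only after all of its inputs have finished. A minimal sketch using a topological sort (Kahn's algorithm) is shown here; it is not the Tez API, and the class name and vertex names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of DAG scheduling; not Tez classes.
class DagExecutionSketch {

    // edges maps each task to the tasks that consume its output.
    static List<String> executionOrder(Map<String, List<String>> edges) {
        // Count unfinished inputs per task (TreeMap keeps the result deterministic).
        Map<String, Integer> pendingInputs = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            pendingInputs.putIfAbsent(e.getKey(), 0);
            for (String consumer : e.getValue()) {
                pendingInputs.merge(consumer, 1, Integer::sum);
            }
        }
        // Tasks with no inputs can start immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pendingInputs.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.remove();
            order.add(task);
            // Finishing a task may unblock its consumers.
            for (String consumer : edges.getOrDefault(task, Collections.<String>emptyList())) {
                if (pendingInputs.merge(consumer, -1, Integer::sum) == 0) {
                    ready.add(consumer);
                }
            }
        }
        if (order.size() != pendingInputs.size()) {
            throw new IllegalArgumentException("graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new LinkedHashMap<>();
        edges.put("load_a", Arrays.asList("join"));
        edges.put("load_b", Arrays.asList("join"));
        edges.put("join", Arrays.asList("group"));
        edges.put("group", Arrays.asList("store"));
        System.out.println(executionOrder(edges)); // [load_a, load_b, join, group, store]
    }
}
```

A real engine would run all ready tasks in parallel; the sketch runs them one at a time to keep the ordering logic visible.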

Page 16: Review of Calculation Paradigm and its Components

Tez: Runtime API

Page 17: Review of Calculation Paradigm and its Components

Pig: Concepts

Non-blocking operators

• LOAD / STORE

• FOREACH __ GENERATE __

• FILTER __ BY __

Blocking operators

• GROUP __ BY __

• ORDER __ BY __

• JOIN __ BY __

Each blocking operator is translated to a MapReduce shuffle
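The distinction can be sketched by walking a script's operator list and cutting it at every blocking operator. This is a simplified model, not Pig's planner: the class and method names are hypothetical, operators are identified only by their leading keyword, and one shuffle per blocking operator is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of how blocking operators split a Pig script; not Pig's planner.
class PigShuffleSketch {

    private static final Set<String> BLOCKING =
            new HashSet<>(Arrays.asList("GROUP", "ORDER", "JOIN"));

    static boolean isBlocking(String operator) {
        return BLOCKING.contains(operator);
    }

    // Assumption: one shuffle per blocking operator.
    static int shuffleCount(List<String> plan) {
        int shuffles = 0;
        for (String operator : plan) {
            if (isBlocking(operator)) shuffles++;
        }
        return shuffles;
    }

    // Non-blocking operators stay in the current stage; a blocking one starts a new stage.
    static List<List<String>> stages(List<String> plan) {
        List<List<String>> stages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String operator : plan) {
            if (isBlocking(operator) && !current.isEmpty()) {
                stages.add(current);
                current = new ArrayList<>();
            }
            current.add(operator);
        }
        if (!current.isEmpty()) stages.add(current);
        return stages;
    }

    public static void main(String[] args) {
        List<String> plan = Arrays.asList("LOAD", "FILTER", "GROUP", "FOREACH", "ORDER", "STORE");
        System.out.println(shuffleCount(plan)); // 2
        System.out.println(stages(plan)); // [[LOAD, FILTER], [GROUP, FOREACH], [ORDER, STORE]]
    }
}
```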

Page 18: Review of Calculation Paradigm and its Components

Pig: Problems

Restrictions by MapReduce

• Extra intermediate output on HDFS

• Artificial synchronization barriers

• Inefficient use of resources

• Multi-query optimization

Page 19: Review of Calculation Paradigm and its Components

Pig: on Tez

Page 20: Review of Calculation Paradigm and its Components

Pig: Tez DAG

Page 21: Review of Calculation Paradigm and its Components

Pig: Strategies

• AM/Container Reuse

• Broadcast Edge, Object Cache

• Vertex Group

• Slow Start, Pre-launch

Page 22: Review of Calculation Paradigm and its Components

Pig: Performance

Page 23: Review of Calculation Paradigm and its Components

Pig: Performance

Page 24: Review of Calculation Paradigm and its Components

Pig: Performance

Page 25: Review of Calculation Paradigm and its Components

Complex Event Processing: Problems

• fungible data

• EDA: event-driven SOA

• EDA requires non-pipeline complex

Page 26: Review of Calculation Paradigm and its Components

Complex Event Processing: Paradigm

[Figure: two diagrams. "pipeline": a Job Tracker, four Task Trackers, and data. "independent": a Job Tracker, four Task Trackers, data, and a Message Coordinator.]

Page 27: Review of Calculation Paradigm and its Components

References

• Hadoop YARN: The Architectural Center of Enterprise Hadoop

• Lambda Architecture

• Developing a Tez execution engine for Apache Pig