04/10/23  IT0483 - PRINCIPLES OF CLOUD COMPUTING, N. ARIVAZHAGAN
IT0483 - PRINCIPLES OF CLOUD COMPUTING
Unit 5. CASE STUDY: Amazon Case Study. Introduction to MapReduce: Discussion of the Google Paper, GFS, HDFS, Hadoop Framework.
AMAZON WEB SERVICES
Using Amazon Web Services, an e-commerce web site can weather unforeseen demand with ease; a pharmaceutical company can "rent" computing power to execute large-scale simulations; a media company can serve unlimited videos, music, and more; and an enterprise can deploy bandwidth-consuming services and training to its mobile workforce.
BENEFITS
No contracts or commitments
Pay as you go
Transparent pricing
Better economics
Better use of your time
Better environmental impact
MAP REDUCE
The ideas of Map and Reduce are 40+ years old and are present in all functional programming languages; see, e.g., APL, Lisp, and ML.
An alternate name for Map is Apply-All.
Higher-order functions take function definitions as arguments, or return a function as output.
Map and Reduce are higher-order functions.
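As a small illustration of a higher-order function in plain Java (an illustrative sketch, not part of the Hadoop API; the names `twice` and `inc` are hypothetical), the method below takes a function as an argument and returns a new function as its result:

```java
import java.util.function.Function;

public class HigherOrder {
    // A higher-order function: it takes a function F as an argument
    // and returns a new function that applies F twice.
    static Function<Integer, Integer> twice(Function<Integer, Integer> f) {
        return f.andThen(f);
    }

    public static void main(String[] args) {
        Function<Integer, Integer> inc = x -> x + 1;
        Function<Integer, Integer> inc2 = twice(inc);
        System.out.println(inc2.apply(5)); // prints 7
    }
}
```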
MAP REDUCE
F(x: int) returns r: int
Let V be an array of integers.
W = map(F, V)
W[i] = F(V[i]) for all i
i.e., apply F to every element of V
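The definition above can be sketched directly with Java streams (an illustrative sketch; the squaring function F is an assumed example, not from the slides):

```java
import java.util.Arrays;

public class MapDemo {
    // F(x: int) returns r: int -- here, an example F that squares its input.
    static int F(int x) { return x * x; }

    public static void main(String[] args) {
        int[] V = {1, 2, 3, 4};
        // W = map(F, V): W[i] = F(V[i]) for all i
        int[] W = Arrays.stream(V).map(MapDemo::F).toArray();
        System.out.println(Arrays.toString(W)); // prints [1, 4, 9, 16]
    }
}
```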
reduce: A Higher Order Function
reduce is also known as fold, accumulate, compress, or inject.
Reduce/fold takes in a function and folds it in between the elements of a list.
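A quick sketch of folding `+` in between the elements of a list, again using Java streams (illustrative only; the helper name `sum` is assumed):

```java
import java.util.Arrays;

public class ReduceDemo {
    // reduce/fold: insert '+' between the elements of V,
    // starting from the identity value 0:
    // ((((0 + 1) + 2) + 3) + 4) = 10
    static int sum(int[] V) {
        return Arrays.stream(V).reduce(0, (acc, x) -> acc + x);
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{1, 2, 3, 4})); // prints 10
    }
}
```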
Map/Reduce Implementation Idea
MapReduce and the Distributed File System form a framework for large commodity clusters, built on a master/slave relationship.
The JobTracker handles all scheduling and data flow between TaskTrackers.
A TaskTracker handles all worker tasks on a node.
An individual worker task runs a map or reduce operation.
The framework integrates with HDFS for data locality.
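The data flow above can be sketched in miniature as a single-process Java program (illustrative only, not the Hadoop API): a map phase emits (key, value) pairs, a shuffle groups the pairs by key, and a reduce phase collapses each group. The word-count use in main is an assumed example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MiniMapReduce {
    // Toy MapReduce data flow, ignoring distribution entirely.
    static Map<String, Integer> run(List<String> input) {
        // Map phase: emit (word, 1) for every token.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : input) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        // Shuffle phase: group the emitted values by key.
        Map<String, List<Integer>> groups = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        // Reduce phase: sum the counts for each word.
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            int sum = 0;
            for (int v : g.getValue()) sum += v;
            result.put(g.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or", "not to be")));
    }
}
```

In real Hadoop, the map and reduce phases run on different TaskTracker nodes and the shuffle moves data across the network; this sketch only shows the logical pipeline.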
Hadoop Supported File Systems
HDFS: Hadoop's own file system.
Amazon S3 file system: targeted at clusters hosted on the Amazon Elastic Compute Cloud server-on-demand infrastructure; not rack-aware.
CloudStore: previously the Kosmos Distributed File System; like HDFS, this is rack-aware.
FTP file system: stored on remote FTP servers.
Read-only HTTP and HTTPS file systems.
HDFS: Hadoop Distributed File System
Designed to scale to petabytes of storage, and to run on top of the file systems of the underlying OS.
The master ("NameNode") handles replication, deletion, and creation.
Slaves ("DataNodes") handle data retrieval.
Files are stored in many blocks, and each block has a block id.
A block id is associated with several nodes (hostname:port), depending on the level of replication.
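The block-to-node mapping can be pictured as a simple data structure (an illustrative sketch of NameNode-style metadata, not real HDFS code; the block ids and hostnames are made up):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlockMap {
    // Simplified NameNode metadata: each block id maps to the
    // hostname:port of every DataNode holding a replica.
    // With a replication level of 3, each block appears on three nodes.
    static Map<Long, List<String>> sampleMetadata() {
        Map<Long, List<String>> blockLocations = new HashMap<>();
        blockLocations.put(1001L, List.of("node1:50010", "node2:50010", "node3:50010"));
        blockLocations.put(1002L, List.of("node2:50010", "node4:50010", "node5:50010"));
        return blockLocations;
    }

    public static void main(String[] args) {
        // A client asks the NameNode where block 1001 can be read from:
        System.out.println(sampleMetadata().get(1001L));
    }
}
```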
Hadoop v. MapReduce
MapReduce is also the name of a framework developed by Google.
Hadoop was initially developed by Yahoo and is now part of the Apache group.
Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
                          MapReduce    Hadoop
Organization              Google       Yahoo/Apache
Implementation            C++          Java
Distributed file system   GFS          HDFS
Database                  Bigtable     HBase
Distributed lock manager  Chubby       ZooKeeper
WordCount: A Simple Hadoop Example
http://wiki.apache.org/hadoop/WordCount
Read text files and count how often words occur.
The input is text files.
The output is a text file; each line contains a word, a tab, and a count.
Map: produce pairs of (word, count).
Reduce: for each word, sum up the counts.
WordCount Overview

import ...

public class WordCount {

    public static class Map extends MapReduceBase implements Mapper ... {
        public void map ...
    }

    public static class Reduce extends MapReduceBase implements Reducer ... {
        public void reduce ...
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        ...
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
WordCount Mapper

public static class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
WordCount Reducer

public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
WordCount JobConf

JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);

conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
WordCount main

public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
}
Invocation of WordCount
1. /usr/local/bin/hadoop dfs -mkdir <hdfs-dir>
2. /usr/local/bin/hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>
3. /usr/local/bin/hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>
GFS
Google File System (GFS or GoogleFS) is a proprietary distributed file system developed by Google Inc. for its own use. It is designed to provide efficient, reliable access to data using large clusters of commodity hardware.
HDFS
The Hadoop Distributed File System (HDFS™) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations.
Review Questions: Part A
1) What is Apache Hadoop?
2) Mention the uses of Amazon EC2 cloud computing services.
3) What is meant by MapReduce?
4) Mention the hot spots of the MapReduce framework.
5) What are the different steps in the MapReduce framework?
6) What is the use of the Map partition function?
7) Mention the uses of the MapReduce function.
8) Differentiate the JobTracker and the TaskTracker.
9) What algorithm is used in Hadoop's scheduling?
10) What is meant by the fair scheduler? Mention its uses.
11) What is meant by the capacity scheduler?
12) Mention any four applications of Hadoop.
13) Who are the main users of Hadoop?
14) Mention any four commercially supported Hadoop-related products.
Part B
1) Draw and explain the Hadoop architecture.
2) Explain the Hadoop file system.
3) Explain the Amazon EC2 cloud computing case study for a financial organization.
4) Explain the concept of MapReduce.
5) Explain the concept of the Google File System.