5 billion GB of data was created:
- in total, from the beginning of time until 2003
- every 2 days in 2011
- every 10 minutes in 2013
- 2016: live stats
Where does all this data come from?
- Activity: listening to music, reading a book, searching, shopping, etc.
- Conversation: our conversations on social media are now digitally recorded.
- Photo and Video: we upload and share hundreds of thousands of them on social media sites every second.
- Sensor: we are increasingly surrounded by sensors that collect and share data.
- The Internet of Things: we now have smart TVs that are able to collect and process data.
The basic idea behind the phrase 'Big Data' is that everything we do is increasingly leaving a digital trace (or data), which we (and others) can use and analyse.
Big Data means exactly that: really big data. It is a collection of large datasets that cannot be processed using traditional computing techniques.
Big Data involves huge volume, high velocity, and a wide variety of data.
Types of data:
- Structured: databases, census records, economic data, phone numbers
- Semi-structured: JSON, XML
- Unstructured: Word documents, PDFs, plain text, media logs
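To make the distinction concrete, here is a minimal Python sketch of the same (made-up) customer record in all three forms:

```python
import json

# Structured: fixed schema, like a row in an RDBMS table.
structured_row = ("C-001", "Alice", "+1-555-0100")  # (id, name, phone) - made-up values

# Semi-structured: self-describing, but the shape can vary per record.
semi_structured = json.loads('{"id": "C-001", "name": "Alice", "tags": ["vip"]}')

# Unstructured: free text with no schema at all.
unstructured = "Alice called on Tuesday and asked about her order status."

print(structured_row[1], semi_structured["name"], len(unstructured.split()))
```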
Big Data Technologies

Operational Big Data: systems like MongoDB that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored.
NoSQL Big Data systems are designed to let massive computations run inexpensively and efficiently. This makes operational big data workloads much easier to manage, cheaper, and faster to implement.
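As a minimal sketch of such an operational workload, here is an example using the pymongo driver against a MongoDB instance; the connection URI, database, and collection names are assumptions for illustration only:

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (the URI is an assumption;
# adjust it for a real deployment).
client = MongoClient("mongodb://localhost:27017/")
events = client["demo_db"]["events"]  # hypothetical database/collection names

# Operational workloads capture individual records in real time...
events.insert_one({"user": "alice", "action": "login", "ts": "2016-05-01T10:00:00Z"})

# ...and read them back interactively.
print(events.find_one({"user": "alice"}))
```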
Analytical Big Data: systems like Massively Parallel Processing (MPP) databases and MapReduce that provide analytical capabilities for retrospective and complex analysis.
A system based on MapReduce can be scaled up from a single server to thousands of machines, both high-end and low-end.
Traditional Approach

In this approach, an enterprise has a computer to store and process big data. Data is stored in an RDBMS, which processes the required data and presents it to users for analysis.
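As a toy illustration of this traditional, single-machine approach, the sketch below stores and queries rows in an RDBMS (SQLite as a stand-in; the table and values are made up):

```python
import sqlite3

# A single machine holds both storage and processing: the classic RDBMS model.
conn = sqlite3.connect(":memory:")  # in-memory DB as a stand-in for a real server
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 55.5)],
)

# The database engine processes the data and hands the result to the user.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
):
    print(region, total)
```

This works well until the volume of data outgrows a single machine, which is exactly the limitation the next slide addresses.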
Google's Solution

Google solved this problem with an algorithm called MapReduce. The algorithm divides the task into small parts, assigns those parts to many computers connected over the network, and collects their results to form the final result dataset.
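Here is a minimal single-machine sketch of that divide-assign-collect idea, using a Python process pool as a stand-in for computers connected over a network:

```python
from multiprocessing import Pool

def count_words(chunk):
    # Each worker handles one small part of the overall task.
    return sum(len(line.split()) for line in chunk)

if __name__ == "__main__":
    lines = ["big data is big"] * 1000  # made-up input
    # Divide: split the input into small parts.
    parts = [lines[i:i + 250] for i in range(0, len(lines), 250)]
    # Assign: hand each part to a worker; collect: combine the partial results.
    with Pool(4) as pool:
        print(sum(pool.map(count_words, parts)))  # 4000
```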
Hadoop

Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes. In short, the Hadoop framework makes it possible to develop applications that run on clusters of computers and perform complete statistical analysis of huge amounts of data.
MapReduce
- Map: converts the input data into another set of data, where individual elements are broken down into tuples (key/value pairs).
- Reduce: takes the Map output through a shuffle stage and a reduce stage, producing a new, smaller set of output that is stored in HDFS.
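Here is a minimal pure-Python word-count sketch that mimics the Map, shuffle, and Reduce stages; it illustrates the data flow only and is not a real Hadoop job:

```python
from collections import defaultdict

docs = ["big data is big", "data is everywhere"]  # made-up input

# Map: emit (key, value) tuples - here, (word, 1) for every word.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group all values by key, as Hadoop does between Map and Reduce.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: combine each group into a smaller set of output tuples.
reduced = {word: sum(counts) for word, counts in groups.items()}
print(reduced)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```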
HDFS
- Fault detection and recovery: HDFS should have mechanisms for quick, automatic fault detection and recovery (see the sketch after this list).
- Huge datasets: HDFS should scale to hundreds of nodes per cluster to manage applications with huge datasets.
- Hardware at data: a requested task can be done efficiently when the computation takes place near the data, which reduces network traffic for huge datasets.
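As a conceptual sketch only (not the real HDFS implementation), the snippet below splits a file into blocks and replicates each block across nodes, which is what lets the system recover automatically when a node fails; the block size, replication factor, and node names are made up:

```python
import itertools

BLOCK_SIZE = 8       # bytes per block; real HDFS defaults to tens of MB
REPLICATION = 3      # copies of each block, mirroring HDFS's default factor
NODES = ["node1", "node2", "node3", "node4"]

data = b"a huge dataset, split into fixed-size blocks"
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

# Place each block's replicas on distinct nodes, round-robin style.
node_cycle = itertools.cycle(NODES)
placement = {
    i: [next(node_cycle) for _ in range(REPLICATION)] for i in range(len(blocks))
}

# If one node fails, every block still survives on the other nodes.
failed = "node2"
for i, nodes in placement.items():
    survivors = [n for n in nodes if n != failed]
    assert survivors, "block lost"  # never triggers: 3 replicas over 4 nodes
    print(f"block {i}: still available on {survivors}")
```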
References & Sources
- http://www.tutorialspoint.com/hadoop/
- http://www.wired.com/2013/02/the-decades-that-invented-the-future-part-11-2001-2010/
- http://www.slideshare.net/BernardMarr/140228-big-data-slide-share/3-The_basic_idea_behind_the
- https://www.youtube.com/watch?v=HqsBensINkE
- http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
- http://noviardisyamsuir.blogspot.co.id/2016/03/hadoop-mapreduce-adalah.html
- http://www.slideshare.net/lynnlangit/hadoop-mapreduce-fundamentals-21427224/5-What_types_of_business_problems
- https://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/