Handling Big Data Deddy Setyadi www.elakiri.com


... - 2003

2 days in 2011

10 minutes in 2013

5 billion GB Live stats

2016

Where does the data come from?

Activity: Listening to music, reading a book, searching, shopping, etc.

Conversation: Our conversations on social media are now digitally recorded.

Photo and Video: We upload and share hundreds of thousands of them on social media sites every second.

Sensor: We are increasingly surrounded by sensors that collect and share data.

The Internet of Things: We now have smart TVs that are able to collect and process data.

The basic idea behind the phrase 'Big Data' is that everything we do is increasingly leaving a digital trace (or data), which we (and others) can use and analyse.

Big data: a collection of large datasets that cannot be processed using traditional computing techniques.

Big Data involves huge volume, high velocity, and a wide variety of data.

Structured

● Database
● Census records
● Economic data
● Phone numbers

Semi-structured

● JSON
● XML

Unstructured

● Word
● PDF
● Text
● Media Logs
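The practical difference between the three categories shows up in how much work it takes to get a value out. The sketch below is only an illustration: the sample records, names, and phone numbers are invented for this example and do not come from the slides.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured = "name,phone\nAlice,555-0100\nBob,555-0101\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, but fields may be nested or optional.
semi_structured = '{"name": "Alice", "tags": ["music", "books"]}'
record = json.loads(semi_structured)

# Unstructured: free text, so values have to be searched for (here with a regex).
unstructured = "Alice called 555-0100 and later 555-0101 about her order."
phones = re.findall(r"\d{3}-\d{4}", unstructured)

print(rows[0]["phone"])   # looked up by column name
print(record["tags"])     # looked up by key, may be nested
print(phones)             # extracted by pattern matching
```

Structured data can be addressed by column, semi-structured data by key, and unstructured data only after some form of extraction.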

Benefits of Big Data

https://www.youtube.com/watch?v=HqsBensINkE

Big Data Technologies

Operational Big Data

This includes systems like MongoDB that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored.

NoSQL Big Data systems are designed to allow massive computations to be run inexpensively and efficiently. This makes operational big data workloads much easier to manage, cheaper, and faster to implement.

Analytical Big Data

This includes systems like Massively Parallel Processing (MPP) database systems and MapReduce that provide analytical capabilities for retrospective and complex analysis.

A system based on MapReduce can scale from a single server to thousands of high-end and low-end machines.

Big Data Solutions

Traditional Approach

In this approach, an enterprise has a single computer to store and process big data. The data is stored in an RDBMS, where the required data is processed and then presented to users for analysis.

tutorialspoint.com

Google’s Solution

Google solved this problem using an algorithm called MapReduce. This algorithm divides the task into small parts, assigns those parts to many computers connected over the network, and collects the results to form the final result dataset.
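The divide, assign, and collect idea can be sketched on a single machine. This is a minimal illustration, not Google's implementation: worker threads stand in for the networked computers, and the function names `split_into_parts`, `process_part`, and `run_job` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(items, n_parts):
    """Divide the input into roughly equal, contiguous parts."""
    size = max(1, -(-len(items) // n_parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_part(part):
    """Work done independently on one part (here: a partial sum)."""
    return sum(part)


def run_job(items, n_workers=4):
    """Split the task, hand each part to a worker, then combine the results."""
    parts = split_into_parts(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_part, parts))
    return sum(partial_results)


total = run_job(list(range(1, 101)))
print(total)  # 5050
```

Each part is processed without knowledge of the others, which is what allows the work to be spread across many machines in a real cluster.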

tutorialspoint.com

Hadoop

Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes. In short, the Hadoop framework can be used to develop applications that run on clusters of computers and perform complete statistical analysis on huge amounts of data.

tutorialspoint.com


Hadoop Architecture

tutorialspoint.com

MapReduce

Data

Map: Converts a set of data into another set of data, in which individual elements are broken down into tuples (key/value pairs).

Reduce: Combines the Shuffle stage and the Reduce stage. It processes the data coming from the Map stage and produces a new set of output, which is stored in HDFS.
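The stages above can be sketched as a word count, the usual introductory MapReduce example. This is a plain-Python illustration under simplifying assumptions: everything runs in one process, the result is kept in a dictionary instead of being written to HDFS, and the function names are chosen for this example and are not part of the Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: break one input line into (key, value) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single output."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

In a cluster, the map calls run on the nodes that hold the input, and the shuffle moves each key's values to the node that runs its reducer.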


MapReduce

http://mm-tom.s3.amazonaws.com/blog/MapReduce.png

MapReduce

noviardisyamsuir.blogspot.com

HDFS

● Fault detection and recovery: HDFS should have mechanisms for quick and automatic fault detection and recovery.

● Huge datasets: HDFS should have hundreds of nodes per cluster to manage applications with huge datasets.

● Hardware at data: A requested task can be done efficiently when the computation takes place near the data.

tutorialspoint.com

Demo

Closing ...

blog.cloudera.com

References & Source

http://www.tutorialspoint.com/hadoop/

http://www.wired.com/2013/02/the-decades-that-invented-the-future-part-11-2001-2010/

http://www.slideshare.net/BernardMarr/140228-big-data-slide-share/3-The_basic_idea_behind_the

https://www.youtube.com/watch?v=HqsBensINkE

http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php

http://noviardisyamsuir.blogspot.co.id/2016/03/hadoop-mapreduce-adalah.html

http://www.slideshare.net/lynnlangit/hadoop-mapreduce-fundamentals-21427224/5-What_types_of_business_problems

https://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/

Thank you!