
Big data


Description: Learn about big data and the basics of Hadoop.


Page 1: Big data

PYOTR SMIRNOV, 1860

Page 2: Big data

©copyright Ankur Raina 2012

Industry in Transition
IT stands for…

Page 3: Big data


Did you know?

• 3 million lines of code are tracking your checked baggage.

• A billion lines of code are involved in the working of the latest Airbus plane.

• A billion transistors per person.

• 4 billion mobile phone subscribers.

• St. Anthony Falls Bridge (Minneapolis) is fitted with 200 embedded sensors.

Page 4: Big data


Oops!!! Data Exhaust

Page 5: Big data


• 2001: 8 lakh petabytes of data

• 2020: 35 zettabytes of data (projected; see the conversion below)
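
A quick conversion puts the two figures in the same unit, using 1 ZB = 10^6 PB:

\[
8\ \text{lakh PB} = 8 \times 10^{5}\ \text{PB} = 0.8\ \text{ZB},
\qquad
\frac{35\ \text{ZB}}{0.8\ \text{ZB}} \approx 44
\]

That is roughly a 44-fold growth in stored data over about two decades.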

Page 6: Big data


Data Exhaust

• 7 TB/day

• 10 TB/day

Page 7: Big data


The trouble begins here…

• 80% of the world’s information is unstructured.

• Unstructured information is growing at 15 times the rate of structured information.

Are we prepared?

Page 8: Big data


BIG DATA

Ankur Raina (09-IT-4505)

Page 9: Big data


Contents
• What is Big Data?
• The 3Vs.
• What is a Big Data platform?
• Needle in a haystack problem.
• Big Data & social media.
• The Call Centre mantra.
• ABCs of Hadoop.

Page 10: Big data


Big Data
Information that cannot be processed or analyzed using traditional processes or tools.

• Instrumentation
• Interconnection
• M2M interconnectivity
• Intelligent machines

Page 11: Big data


The 3Vs: Volume, Velocity & Variety

Page 12: Big data


Big Data Platform
• Lets you store data in its native business-object format and get value out of it through massive parallelism on readily available components.

• It is not a replacement for a data warehouse.

Page 13: Big data


Is it worth it ?

This is what I need !!!

IT yearns for log longevity
Service Oriented Architecture (SOA)

Page 14: Big data


Social Media: the business perspective
We know…
• What are people saying?

But…
• Why are people saying what they are saying, and behaving the way they are behaving?

Page 15: Big data


Twitter tweets per second (TPS)

• Super Bowl 2011 (Feb 2011): 4,064 TPS
• Bin Laden’s death: 5,106 TPS
• Japan earthquake: 6,939 TPS
• Paraguay’s football penalty-shootout win over Brazil in the Copa America quarter-final: peaked at 7,166 TPS
• The U.S. match win in the FIFA Women’s World Cup the same day: 7,196 TPS
• Singer Beyoncé’s pregnancy announcement: 8,868 TPS

Page 16: Big data


The Call Centre mantra: “This call may be recorded for quality assurance purposes.”

• In-motion analytics (Streams computing)
• At-rest analytics (BigInsights)

Page 17: Big data


Hadoop
• Creator: Doug Cutting
• Top-level Apache project.
• Inspired by Google’s work on GFS (the Google File System).
• Function-to-data model, not a data-to-function model.

Page 18: Big data


Does the word “Hadoop” mean anything?

Page 19: Big data


Hadoop
• HDFS
• MapReduce
• Hadoop Common Components

Page 20: Big data


Hadoop Distributed File System (HDFS)
• Data is broken into blocks and distributed throughout the cluster (see the block-inspection sketch below).
• Data locality.
• Mean Time To Failure (MTTF).
• Block size: 64 MB by default.
• Higher block sizes are used for larger files to reduce the amount of metadata (BigInsights default: 128 MB).
• Redundancy (replication).
• NameNode server.
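
As a small illustration of how block size, replication and block placement surface to a client, here is a minimal sketch against Hadoop’s Java FileSystem API. The class name and the default path are made up for the example, and a configured Hadoop client (core-site.xml / hdfs-site.xml and the Hadoop libraries on the classpath) is assumed.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        // Picks up the cluster settings from the configuration files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file; pass any HDFS path as the first argument instead.
        Path file = new Path(args.length > 0 ? args[0] : "/user/ankur/sample.txt");

        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size : " + status.getBlockSize() + " bytes");
        System.out.println("Replication: " + status.getReplication());

        // One entry per block, listing the DataNodes that hold a replica;
        // this is the locality information the framework can schedule against.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset " + b.getOffset() + ", length " + b.getLength()
                    + ", hosts " + String.join(",", b.getHosts()));
        }
        fs.close();
    }
}

The per-block host list is what the function-to-data model relies on: computation is shipped to the nodes that already hold the blocks.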

Page 21: Big data


Page 22: Big data


MapReduce
• The Map job takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key/value pairs).

• The Reduce job takes the output of a Map as its input and combines those tuples into a smaller set of tuples (see the word-count sketch below).
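
To make the two steps concrete, here is a minimal word-count sketch using the Hadoop Java MapReduce API; the class names are illustrative and a plain whitespace tokenizer is assumed. The map step turns each input line into (word, 1) tuples, and the reduce step combines the tuples for a given word into a single (word, count) tuple.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: break each input line into (word, 1) tuples.
class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce: combine the tuples for one word into a single (word, total) tuple.
class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}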

Page 23: Big data


MapReduce terminology
• Job
• Tasks
• JobTracker
• TaskTracker agents
• Shuffle
• Combiner (the driver sketch below shows where the Job and Combiner are configured)
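
Tying the terms above to code: a Job bundles the map and reduce tasks, the Combiner performs a reducer-like local aggregation on the map side before the shuffle, and the JobTracker with its TaskTracker agents schedules and runs the tasks across the cluster. Below is a minimal driver sketch reusing the TokenizerMapper and IntSumReducer from the previous page; the input and output paths are hypothetical command-line arguments.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");      // the Job
        job.setJarByClass(WordCount.class);

        job.setMapperClass(TokenizerMapper.class);          // map tasks
        job.setCombinerClass(IntSumReducer.class);          // combiner: local pre-aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);           // reduce tasks

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Hypothetical HDFS paths, e.g. /user/ankur/input and /user/ankur/output.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit to the cluster and wait; the individual tasks are scheduled
        // and executed by the JobTracker and its TaskTracker agents.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, this driver would typically be submitted with the standard hadoop jar command.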

Page 24: Big data


Page 25: Big data


Hadoop Common Components
• A set of libraries that support the various Hadoop subprojects.
• File system shell: /bin/hdfs dfs <args>

Command and function:
• chmod - changes the permissions for reading and writing a given file or set of files
• chown - changes the owner of a given file or set of files
• copyFromLocal - copies a file from the local file system into HDFS

Page 26: Big data


Command and function (continued; example invocations follow below):
• copyToLocal - copies a file from HDFS to the local file system
• cp - copies HDFS files from one directory to another
• expunge - empties the trash
• cat - copies the file contents to standard output
• ls - displays a listing of files in a given directory
• mkdir - creates a directory in HDFS
• mv - moves files from one directory to another
• rm - deletes a file and sends it to the trash (use the -skipTrash option to delete permanently)
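
For instance, in the /bin/hdfs dfs <args> form listed above (the directory and file names here are hypothetical):

hdfs dfs -mkdir -p /user/ankur/input
hdfs dfs -copyFromLocal report.txt /user/ankur/input
hdfs dfs -ls /user/ankur/input
hdfs dfs -cat /user/ankur/input/report.txt
hdfs dfs -chmod 644 /user/ankur/input/report.txt
hdfs dfs -rm -skipTrash /user/ankur/input/report.txt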

Page 27: Big data


References
• www.ibm.com
• www.hadoop.apache.org
• Understanding Big Data by Chris Eaton, Dirk deRoos, Tom Deutsch, George Lapis & Paul Zikopoulos (McGraw-Hill)
• Oracle Magazine