gagan-agrawal
Netherlands | USA | India | UK | France
SOFTWARE DEVELOPMENT DONE RIGHT
Big Data generally refers to data that cannot be processed efficiently by traditional systems, mainly because of its size.
Twitter/Facebook example:
• Facebook – 500 TB of data daily
• Twitter – 250 million tweets daily
90% of data has been generated in the last 2-3 years.
What is Big Data?
Sources:
• Social networking sites such as Twitter and Facebook
• Smartphones
• Trading platforms
• Machines
• Log files
This data is used for different purposes, such as:
• Product trends
• Market analysis
Big Data Sources
Apache Hadoop is a framework for running applications on large clusters built of commodity hardware.
• Transparently provides applications with both reliability and data motion.
• Implements a computational paradigm named Map/Reduce, in which an application is divided into small fragments of work.
• Provides a distributed file system (HDFS).
• Moves code close to the data.
• Hadoop opened the gates for processing Big Data.
What is Hadoop?
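To make the Map/Reduce paradigm concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration. It only shows how a job divides into small fragments of work whose results are grouped by key and then combined.

```python
from collections import defaultdict


def map_phase(fragment):
    """Map: emit a (word, 1) pair for every word in one fragment of input."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(fragments):
    """Run map over every fragment, then shuffle and reduce the emitted pairs."""
    pairs = []
    for fragment in fragments:
        # Each fragment is independent, so a cluster could map them in parallel.
        pairs.extend(map_phase(fragment))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data near code"]))
```

In a real cluster the fragments would be blocks of a file stored in HDFS, and the map function would be shipped to the machines that hold those blocks rather than the data being moved to the code.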
Hadoop is based on work done by Google
GFS – HDFS
Google Map Reduce – Hadoop Map Reduce
BigTable – HBase
Hadoop's History
Partial Failure Support
Data Recoverability
Component Recovery
Consistency
Scalability
Hadoop Features
Core Components
• HDFS – Hadoop Distributed File System
• Map Reduce
Projects in Hadoop Ecosystem
• Pig, Hive, HBase, Flume, Oozie, Sqoop etc.
Hadoop Components
HDFS
Map/Reduce
Product: data quality and cleansing solutions.
Before Hadoop
• Two-node DB cluster
• Multi-threaded Java application for de-duplication
• 1 million records took 10 hours to process
After Hadoop
• 8 GB RAM, 4 cores, 4 machines in the cluster
• 1 million records took 30 minutes to process
Case Study
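The case study does not say how the de-duplication job was written, so the following is only a hedged sketch of how de-duplication can be phrased as a map/reduce job: map each record to a matching key, group by that key, and keep one record per group. The record fields (`name`, `email`), the `normalize` rule, and the "keep the first record" policy are hypothetical. The `speedup` helper just restates the slide's figures: 10 hours against 30 minutes.

```python
from collections import defaultdict


def normalize(record):
    """Hypothetical matching key: trimmed, lower-cased name and e-mail."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


def map_record(record):
    """Map: emit (matching key, record) so that duplicates share a key."""
    return normalize(record), record


def reduce_group(key, records):
    """Reduce: keep the first record seen for each key (an assumed policy)."""
    return records[0]


def deduplicate(records):
    """Group records by matching key, then keep one record per group."""
    groups = defaultdict(list)
    for record in records:
        key, value = map_record(record)
        groups[key].append(value)
    return [reduce_group(key, values) for key, values in groups.items()]


def speedup(before_minutes, after_minutes):
    """Ratio of the old processing time to the new one."""
    return before_minutes / after_minutes


if __name__ == "__main__":
    records = [
        {"name": "Ann Lee", "email": "ann@example.com"},
        {"name": " ann lee ", "email": "ANN@example.com"},
        {"name": "Bob Ray", "email": "bob@example.com"},
    ]
    print(deduplicate(records))
    print(speedup(10 * 60, 30))  # figures from the case study
```

Because records with different keys never need to be compared with each other, each group can be reduced on a different machine, which is what lets this kind of job spread across a cluster.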
Any application that:
• Has more than 10 TB of data
• Needs fast and cheap processing
Typical uses:
• Log Analysis
• Recommendation Engine
• Feed Analysis
• Data Mining
• Statistical Analysis
• ETL Processing
• Business Intelligence
Hadoop In Use
Cloudera is “The commercial Hadoop company”.
Founded by leading experts on Hadoop from Facebook, Google, Oracle and Yahoo.
Provides consulting and training services for Hadoop users.
Staff includes committers to virtually all Hadoop projects.
Cloudera
Books
• Hadoop: The Definitive Guide (by Tom White)
• HBase: The Definitive Guide (by Lars George)
• MapReduce Design Patterns (by Donald Miner)
Web
• http://hadoop.apache.org/
• http://hbase.apache.org/
• http://research.google.com/archive/bigtable.html
• http://research.google.com/archive/mapreduce-osdi04.pdf
Resources
Contact us @
Xebia India
Website
• www.xebia.com
• www.xebia.in
• www.xebia.fr
Thought Leadership
• http://blog.xebia.com
• http://podcast.xebia.com