BIG data ecosystem
Mariusz Gil
/ ABOUT ME /
This talk is about BIG DATA
What is... BIG DATA?
VOLUME: large amounts of data
VELOCITY: needs to be analyzed quickly
VARIETY: different types of structured and unstructured data
Big Data is data that is too large, complex and dynamic for any conventional data tools to capture, store, manage and analyze.
30 billion pieces of content were added in the past month
more than 2 billion videos were watched yesterday
more than 58 million messages were sent yesterday
/ MAIN QUESTIONS /
WHY?
IMPROVED RISK MANAGEMENT
49%
IMPROVED MANAGEMENT CONTROL
36%
IT ANALYSIS
40%
MARKET-ORIENTED PRODUCT DEVELOPMENT
43%
INCREASED SALES FIGURES
32%
FINANCES AND ECONOMICS
27%
690-node Hadoop cluster for predictions and analytics
HOW?
HDFS: HADOOP DISTRIBUTED FILE SYSTEM
YARN / MapReduce v2: DISTRIBUTED PROCESSING FRAMEWORK
HBASE: COLUMNAR STORAGE
HIVE: SQL DATA WAREHOUSE ENGINE
AVRO: DATA SERIALIZATION
MAHOUT: SCALABLE MACHINE LEARNING
PIG: SCRIPTING FOR LARGE DATA SETS
OOZIE: WORKFLOWS ORCHESTRATION
ZOOKEEPER: DISTRIBUTED COORDINATION SERVICE
FLUME: LOG COLLECTOR
SQOOP: DATA EXCHANGE
AMBARI: PROVISIONING, MANAGING AND MONITORING CLUSTERS
WHIRR: RUNNING CLOUD SERVICES
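
To make the MapReduce processing model named in this list more concrete, here is a minimal sketch in plain Python, with no Hadoop involved. It counts words in three steps: map, shuffle, reduce. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, chosen for illustration, and are not part of any Hadoop API. On a real cluster the map and reduce steps would run in parallel on many nodes, and the framework would handle the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data needs tools"]
    print(word_count(sample))
```

The mapper and reducer never share state, which is what would let a framework spread them across machines; only the shuffle step needs to see all pairs for a given key.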
We can choose from multiple VENDORS
like Cloudera, Hortonworks or Amazon
Even from...
Can we get results FASTER?
Apache Drill
Storm
Cloudera Impala
thanks