COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS

SPEAKER: Vipul Sharma, Director of Data Engineering, Eventbrite

Monday, April 1, 13

COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013

DESCRIPTION

Presentation from Vipul Sharma, Eventbrite #dataconf More at http://event.gigaom.com/structuredata/

Page 2

Real Time Data Processing at Scale

Vipul Sharma – Director of Data Engineering

Page 3

Eventbrite by the Numbers

Page 4

Eventbrite by the Numbers

1.5 million events
80 million tickets sold
$1 billion in gross ticket sales
Events in 179 countries

Page 5

Who am I?

Director of Data Engineering at Eventbrite
Infrastructure, Data Science, Analytics, Spam and Fraud

linkedin.com/in/vipulsharma3@[email protected]

Page 6

Real Time

• Definition of real time varies with use case
• Real time at scale is a challenge
• Active learning requires real time data processing
  • Spam/Fraud
  • Discovery
  • Search
• Analytics
  • Real time analytics
• Data Changes
  • Changes in inventory, user settings etc.

Page 7

Scaling for Growth

• Decouple Services
  • Decouple services based on CAP, Size and Growth
  • NoSQL attractive for out of the box sharding, replication and multi data center support along with high write speeds
  • Multiple data stores pose a challenge of data flow between services in real time
• Batch Processing
  • Batch processing for big data e.g. data science, analytics etc.
  • MapReduce is not built for real time
  • Data locality requires data to be stored on HDFS
  • Data Sync to Hadoop in real time is a challenge
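
To make the contrast between batch processing and real time concrete, here is a minimal sketch in plain Python. It is not MapReduce and not anything from the talk: the record shape `Sale` and the names `batch_totals` and `IncrementalTotals` are hypothetical. It only illustrates that a batch job rescans the whole dataset on each run, while a real-time path folds each new record into running state.

```python
from collections import Counter
from typing import Iterable, Tuple

Sale = Tuple[str, int]  # (event_id, tickets); hypothetical record shape


def batch_totals(all_sales: Iterable[Sale]) -> Counter:
    """Batch style: rescan the full dataset every time the job runs."""
    totals: Counter = Counter()
    for event_id, tickets in all_sales:
        totals[event_id] += tickets
    return totals


class IncrementalTotals:
    """Real-time style: fold each new record into running state."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def update(self, sale: Sale) -> None:
        event_id, tickets = sale
        self.totals[event_id] += tickets
```

Both paths arrive at the same totals. The difference is that the batch result is only as fresh as the last completed run, while the incremental state is current after every `update` call.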

Page 9

Challenges with Real Time

• Data Flow
  • How to transfer data captured in logs to services in real time
  • How to transfer data captured in database to services in real time
• Data Processing
  • How to process significant data in real time
  • Distributed data processing for real time

Page 10

Data Flow

• Database polling
  • Rather than each application polling, build a single polling service
  • Downstream applications poll from this service
  • Built for consistency and read scalability
  • Example: Event Cache
  • Excited about LinkedIn’s Databus - http://data.linkedin.com/projects/databus
• Persisted Queues
  • Transfer logs via a distributed persisted message queue
  • Downstream applications subscribe to these queues, getting a stream of data
  • Example: Firehose
  • Excited about LinkedIn’s Kafka - http://kafka.apache.org/index.html
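
The two data-flow patterns on this slide can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not Eventbrite's Event Cache or Firehose, and not the Databus or Kafka APIs. The names `PollingService`, `PersistedQueue`, `fetch_since`, and `TABLE` are hypothetical, and the version-numbered row shape is an assumption made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]  # (version, payload); hypothetical row shape


class PollingService:
    """Single poller: queries the source once and fans changes out."""

    def __init__(self, fetch_since: Callable[[int], List[Row]]) -> None:
        self._fetch_since = fetch_since  # stand-in for a database query
        self._watermark = 0              # highest version already delivered
        self._subscribers: List[Callable[[Row], None]] = []

    def subscribe(self, callback: Callable[[Row], None]) -> None:
        self._subscribers.append(callback)

    def poll_once(self) -> int:
        """Deliver rows newer than the watermark; return how many."""
        rows = sorted(self._fetch_since(self._watermark))
        for row in rows:
            for callback in self._subscribers:
                callback(row)
            self._watermark = max(self._watermark, row[0])
        return len(rows)


class PersistedQueue:
    """Append-only log; each consumer keeps its own read offset."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def publish(self, message: str) -> None:
        self._log.append(message)

    def consume(self, consumer: str, max_messages: int = 10) -> List[str]:
        start = self._offsets[consumer]
        batch = self._log[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch


# In-memory stand-in for a database table of (version, payload) rows.
TABLE: List[Row] = [(1, "event created"), (2, "ticket sold")]


def fetch_since(version: int) -> List[Row]:
    return [row for row in TABLE if row[0] > version]
```

With the poller, the database sees one query per cycle no matter how many downstream applications subscribe. With the queue, messages stay in the log after being read, so a consumer named `"search"` and one named `"fraud"` each receive the full stream at their own pace.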

Page 11

Data Processing

• Denormalization
  • Write data ready to serve
  • NoSQL built for denormalization
  • Example: See who’s visiting
• Distributed Data Processing
  • Complex business logic needs more than denormalization
  • Example: API stats using Storm
  • http://storm-project.net/
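
As a rough illustration of both ideas on this slide, the sketch below uses plain single-process Python. It does not use the Storm API and is not the implementation behind the examples named above. The class names `VisitorView` and `ApiStats`, their parameters, and the assumption that timestamps arrive in increasing order are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, List, Tuple


class VisitorView:
    """Denormalized view: stored in the shape the page will read."""

    def __init__(self, max_visitors: int = 3) -> None:
        self._max = max_visitors
        self._recent: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=self._max)
        )

    def record_visit(self, event_id: str, user: str) -> None:
        # All the work happens at write time.
        visitors = self._recent[event_id]
        if user in visitors:
            visitors.remove(user)
        visitors.append(user)  # oldest entry drops off when full

    def whos_visiting(self, event_id: str) -> List[str]:
        # Most recent first; no join or aggregation at read time.
        return list(reversed(self._recent.get(event_id, ())))


class ApiStats:
    """Per-endpoint call counts over a sliding time window."""

    def __init__(self, window_seconds: int = 60) -> None:
        self._window = window_seconds
        self._calls: Deque[Tuple[float, str]] = deque()
        self._counts: Counter = Counter()

    def record(self, timestamp: float, endpoint: str) -> None:
        # Assumes timestamps arrive in increasing order.
        self._calls.append((timestamp, endpoint))
        self._counts[endpoint] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop calls that have aged out of the window.
        while self._calls and self._calls[0][0] <= now - self._window:
            _, old = self._calls.popleft()
            self._counts[old] -= 1
            if self._counts[old] == 0:
                del self._counts[old]

    def counts(self, now: float) -> Dict[str, int]:
        self._expire(now)
        return dict(self._counts)
```

`VisitorView` shows the "write data ready to serve" idea: reads are a single lookup. `ApiStats` needs state and time-based expiry, which is the kind of logic that goes beyond denormalization. A distributed stream processor would partition this state across workers, for example by endpoint.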

Page 12

Questions?

See it in action. Download our app:

eventbrite.com/eventbriteapp

Page 13

Thank You!

@vipulsharma / [email protected]
