DESCRIPTION
A short overview of the data infrastructure at Bazaarvoice. We use a combination of many different data stores, such as MySQL, SOLR, Infobright, MongoDB and Hadoop.
I CAN HAS BIG DATA?
Small and Big Data at Bazaarvoice
Alex Pinkin, @apinkin
whois apinkin
● Alex Pinkin, Software Engineering Lead, Data Infrastructure team, Bazaarvoice
● Loves both SQL and NoSQL. Can't commit to one! :-)
Big Data?
A few facts about Bazaarvoice
● Bazaarvoice is a SaaS company powering user-generated content, such as ratings and reviews, on thousands of websites
● Over 75 million reviews
● 280 billion impressions
● 5 billion page views per month
How Do We Do It?
● Client-side integration
● Code and Servers :)
What Do We Run in Prod?
● SQL
  ○ MySQL
  ○ Infobright
● NoSQL
  ○ SOLR
  ○ Elasticsearch
  ○ MongoDB
  ○ CouchDB
  ○ Hadoop
Four Pillars
MySQL and Big Data?!!
● Yes, MySQL is our master data store, mostly used as a key/value (K/V) store.
● Scaling reads: replication
● Scaling writes: sharding
● HA: hot backup, multiple data centers
● Pros
  ○ Rock solid
  ○ SQL
● Cons
  ○ Inflexible schema
  ○ Replication lag
  ○ Sharding not built in
  ○ High availability (HA)
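The slide names the scaling techniques but not how a request finds its data. A minimal sketch, assuming hash-based sharding done by the client, with in-memory dicts standing in for MySQL servers: writes go to the primary of the shard that owns the key, reads are spread across that shard's replicas, and a replica that has not caught up yet shows replication lag. Node, Shard and ShardedKV are hypothetical names, not Bazaarvoice code or a MySQL API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for one MySQL server; the dict plays the role of a K/V table."""
    name: str
    rows: dict = field(default_factory=dict)


@dataclass
class Shard:
    """One shard: a primary that takes writes plus read replicas."""
    primary: Node
    replicas: list

    def replicate(self) -> None:
        # MySQL replication is asynchronous, which is where replication lag
        # comes from. Here replicas only catch up when this is called.
        for replica in self.replicas:
            replica.rows = dict(self.primary.rows)


class ShardedKV:
    """Hypothetical client-side router: hash the key, pick the shard."""

    def __init__(self, shards):
        self.shards = shards
        self._next = 0  # round-robin counter for replica reads

    def shard_for(self, key: str) -> Shard:
        # A stable hash, so every client maps a key to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key: str, value) -> None:
        # Scaling writes: each shard's primary only sees its slice of the keys.
        self.shard_for(key).primary.rows[key] = value

    def get(self, key: str):
        # Scaling reads: spread reads across the shard's replicas.
        shard = self.shard_for(key)
        replica = shard.replicas[self._next % len(shard.replicas)]
        self._next += 1
        return replica.rows.get(key)


shards = [Shard(Node(f"db{i}"), [Node(f"db{i}-r1"), Node(f"db{i}-r2")])
          for i in range(4)]
kv = ShardedKV(shards)
kv.put("review:42", {"rating": 5})
print(kv.get("review:42"))  # None: the replicas have not caught up yet
for shard in shards:
    shard.replicate()
print(kv.get("review:42"))  # {'rating': 5}
```

Plain modulo hashing keeps the router trivial, but changing the number of shards remaps most keys; consistent hashing or a key-to-shard directory are common alternatives when resharding matters.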
Search: SOLR/Lucene
● Document store
● Inverted index

Term              Document IDs
rating:5          1, 2
rating:4          3
productId:12345   1, 2, 3
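The table above is the inverted index for three documents. A minimal sketch of building it and answering an AND query by intersecting posting lists; build_index and search_all are hypothetical helpers, and Lucene's real index format is far more elaborate (term dictionaries, compressed posting lists, scoring).

```python
from collections import defaultdict


def build_index(docs):
    """Map each 'field:value' term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, fields in docs.items():
        for field_name, value in fields.items():
            postings[f"{field_name}:{value}"].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, *terms):
    """AND query: intersect the posting lists of every term."""
    result = None
    for term in terms:
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])


# The three documents behind the table above.
docs = {
    1: {"rating": 5, "productId": 12345},
    2: {"rating": 5, "productId": 12345},
    3: {"rating": 4, "productId": 12345},
}
index = build_index(docs)
print(index["rating:5"])                                   # [1, 2]
print(index["productId:12345"])                            # [1, 2, 3]
print(search_all(index, "productId:12345", "rating:4"))    # [3]
```

The query never scans documents: it only looks up terms and intersects their ID lists, which is why search stays fast as the document count grows.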
Analytics
Analytics - Infobright
● Columnar storage
  ○ Compression (10x+)
  ○ Reduced disk I/O
● Partitioning
  ○ Horizontal: Data Packs
  ○ Vertical: columns
● Knowledge grid
  ○ MIN(C), MAX(C), SUM(C), AVG(C), COUNT(DISTINCT(C))
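The slide lists Data Packs and the knowledge grid without showing how they work together. A minimal sketch, assuming an integer column and a toy pack size of 4 rows (real Data Packs are far larger): split one column into packs, keep min/max/sum/count per pack as a stand-in for the knowledge grid, and answer SUM(C) WHERE C > t while opening as few packs as possible. PackStats, build_packs and sum_where_greater are hypothetical names, not Infobright APIs, and compression is left out.

```python
from dataclasses import dataclass

PACK_SIZE = 4  # toy value for illustration; real Data Packs are far larger


@dataclass
class PackStats:
    """Per-pack metadata, a simplified stand-in for the knowledge grid."""
    min_value: int
    max_value: int
    total: int
    count: int


def build_packs(column, pack_size=PACK_SIZE):
    """Horizontally partition one column into packs and summarize each pack."""
    packs = [column[i:i + pack_size] for i in range(0, len(column), pack_size)]
    stats = [PackStats(min(p), max(p), sum(p), len(p)) for p in packs]
    return packs, stats


def sum_where_greater(packs, stats, threshold):
    """SUM(C) WHERE C > threshold; returns (result, packs_opened)."""
    result, opened = 0, 0
    for pack, s in zip(packs, stats):
        if s.max_value <= threshold:
            continue                 # irrelevant: no row in the pack can match
        if s.min_value > threshold:
            result += s.total        # relevant: every row matches, metadata suffices
            continue
        opened += 1                  # suspect: the pack itself must be read
        result += sum(v for v in pack if v > threshold)
    return result, opened


column = [1, 2, 2, 3, 10, 12, 11, 15, 4, 9, 6, 8]
packs, stats = build_packs(column)
print(sum_where_greater(packs, stats, 5))  # (71, 1)
```

Only one of the three packs has to be read; the other two are answered from metadata alone. That is the reduced disk I/O the slide refers to.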
Infobright - Pros and Cons
● Pros
  ○ 30x faster than MySQL on analytics queries
  ○ Open source
● Cons
  ○ No DML in the open-source version
  ○ No MPP (massively parallel processing); good for up to 5 TB
Hadoop Use Case
Bazaarvoice EMR - Phase 1
Bazaarvoice EMR - Phase 2
Summary
● We use the best tool for the job
● NoSQL is maturing quickly. Query languages are still in flux though.
● Hadoop is here to stay
● We are (slowly) moving away from MySQL
@apinkin