Building BI App on Cloud

Rohit Chatter, Sr. Architect @ Yahoo! (rohitc@yahoo-inc.com)



Yahoo! BigData Scale

• Yahoo is the most visited site on the Internet
  – 600M+ unique visitors per month
  – Billions of page views per day
  – Billions of searches per month
  – Billions of emails per month
  – Terabytes of data per day!
• And we crawl the Web
  – 100+ billion pages
  – 5+ trillion links
  – Petabytes of data
• Reading 100 terabytes can be overwhelming
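A quick calculation shows why reading 100 terabytes is overwhelming on one machine and manageable on a grid. The 100 MB/s sequential read rate per disk and the 1,000-disk cluster are illustrative assumptions, not figures from the talk. The estimate also ignores seek, network and scheduling overhead.

```python
# Back-of-envelope: time to scan 100 TB on one disk vs. spread across many disks.
# Assumed, illustrative figures: 100 MB/s sequential read per disk, 1,000 disks.
DATA_BYTES = 100 * 10**12          # 100 TB, decimal units
DISK_BYTES_PER_SEC = 100 * 10**6   # assumed per-disk read rate
NUM_DISKS = 1_000                  # assumed cluster size


def scan_seconds(data_bytes: int, bytes_per_sec: int, disks: int = 1) -> float:
    """Seconds to read data_bytes when the work is split evenly across `disks`."""
    return data_bytes / (bytes_per_sec * disks)


single = scan_seconds(DATA_BYTES, DISK_BYTES_PER_SEC)
parallel = scan_seconds(DATA_BYTES, DISK_BYTES_PER_SEC, NUM_DISKS)

print(f"one disk:    {single / 86_400:.1f} days")      # about 11.6 days
print(f"{NUM_DISKS} disks: {parallel / 60:.1f} minutes")  # about 16.7 minutes
```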


Yahoo! Search Scale

• User: types in a search query on Yahoo! or an affiliate site (aka the Publisher)
• Publisher: passes the search query to the ad platform for servable ad listings
• Advertiser: manages campaigns, creates ad listings, bids for keywords
• Ad serving: returns relevant & available ads matching the search query
• Publisher: shows the ads returned by ad serving
• User: clicks on an ad
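The flow above can be sketched as a toy simulation. Everything in it is hypothetical. `Ad`, `serve_ads` and `handle_search` are illustrative names, matching is exact on a single query term, and ranking uses the bid alone, whereas a production ad server also weighs relevance and quality score. The sketch shows that every served ad becomes an impression event and every click becomes a click event. These events are the raw material for the metrics in the rest of this deck.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Ad:
    """A listing created by an advertiser (hypothetical, simplified)."""
    advertiser: str
    keyword: str  # keyword the advertiser bid on
    bid: float    # maximum cost-per-click bid, in dollars


def serve_ads(query: str, listings: List[Ad], max_ads: int = 3) -> List[Ad]:
    """Ad serving: listings whose keyword matches a query term, highest bid first."""
    terms = set(query.lower().split())
    matches = [ad for ad in listings if ad.keyword.lower() in terms]
    return sorted(matches, key=lambda ad: ad.bid, reverse=True)[:max_ads]


def handle_search(query: str, listings: List[Ad],
                  clicked_index: Optional[int] = None) -> Tuple[List[Ad], list]:
    """Publisher: forward the query, show the served ads, log impression and click events."""
    shown = serve_ads(query, listings)
    events = [("impression", query, ad.advertiser) for ad in shown]
    if clicked_index is not None and 0 <= clicked_index < len(shown):
        events.append(("click", query, shown[clicked_index].advertiser))
    return shown, events
```

For example, with listings from advertisers A ("shoes", $0.50), B ("shoes", $0.80) and C ("hats", $1.20), the query "Running Shoes" shows B then A. A click on the first ad produces two impression events and one click event for B.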


Business Model

Reporting frequencies:
• Daily, Weekly, Monthly & Yearly
• Daily, Weekly, Monthly & Yearly
• Daily, Hourly, Weekly, Monthly & Yearly
• Daily, Weekly, Monthly & Yearly
• Daily, Hourly, Weekly, Monthly & Yearly

Analyses:
• Performance, Credit Summary
• Performance, Budget Headroom, AM performance, competitive analysis
• Performance, Feature Adoption
• Competitive analysis, cross-sell, upsell, performance
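The frequencies listed above, from hourly through yearly, can all be produced from a single hourly fact feed. The sketch below is a minimal illustration, not the pipeline used at Yahoo!. It assumes purely additive metrics such as clicks and revenue. It also numbers weeks per ISO 8601, because the deck does not say which week definition applies. Ratios such as CTR should be recomputed from the rolled-up totals rather than summed.

```python
from collections import defaultdict
from datetime import datetime


def bucket_keys(ts):
    """Map an hourly timestamp to its bucket label at every reporting grain."""
    iso_year, iso_week, _ = ts.isocalendar()  # ISO year can differ from calendar year near Jan 1
    return {
        "hourly": ts.strftime("%Y-%m-%d %H:00"),
        "daily": ts.strftime("%Y-%m-%d"),
        "weekly": f"{iso_year}-W{iso_week:02d}",
        "monthly": ts.strftime("%Y-%m"),
        "yearly": ts.strftime("%Y"),
    }


def roll_up(hourly_rows):
    """Sum additive metrics from (timestamp, {metric: value}) rows into every grain.

    Returns totals[grain][bucket][metric].
    """
    totals = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for ts, metrics in hourly_rows:
        for grain, bucket in bucket_keys(ts).items():
            for name, value in metrics.items():
                totals[grain][bucket][name] += value
    return totals
```

An hour on 30 June 2011 and an hour on 1 July 2011 land in different monthly buckets. Both fall in ISO week 2011-W26, so the weekly bucket sums them together.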


Hour Glass Model – A Perspective

• Business performance monitoring on RDBMS facts – tactical & operational reporting
• Home-grown app for Level 1 & 2 analysis on granular aggregates – improvement & alignment
• Home-grown app for what-if and deep-dive data analysis on the most granular data, an event-level model – excellence & strategic


BI on Cloud [1000ft view]

Functional View
• Data Source – dimensions & facts, 100+ gigabytes/day
• Cloud – Hadoop Grid + Pig; utility computing builds the aggregates
• Aggregates & Metadata layer – Oracle RDBMS holding BI aggregates (H, D, W, M: hourly, daily, weekly, monthly)
• App Server – BI layer: BI tool/home grown
• Load-balanced web – Apache Web Server

What Is Computed Where
• Metrics – impressions, revenue, clicks, conversions, quality score, top keywords
• Rollups, Type 2 dimensions, alerts & messaging
• Derived metrics – CTR, depth, RPM, coverage
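The deck lists CTR, depth, RPM and coverage as derived metrics but does not define them. The sketch below uses common sponsored-search definitions, which are assumptions here. In particular, RPM is sometimes quoted per 1,000 impressions rather than per 1,000 searches. `SearchTotals` and `derived_metrics` are hypothetical names. The design point is that derived metrics are computed from already rolled-up base totals and are never averaged across rows.

```python
from dataclasses import dataclass


@dataclass
class SearchTotals:
    """Additive base metrics for one aggregate cell (hypothetical layout)."""
    searches: int           # search queries received
    searches_with_ads: int  # queries where at least one ad was shown
    impressions: int        # total ads shown
    clicks: int
    revenue: float          # dollars


def safe_div(num, den):
    """Ratio that returns 0.0 instead of failing on an empty bucket."""
    return num / den if den else 0.0


def derived_metrics(t: SearchTotals) -> dict:
    return {
        "ctr": safe_div(t.clicks, t.impressions),               # clicks per ad impression
        "rpm": safe_div(t.revenue * 1000, t.searches),          # revenue per 1,000 searches (assumed)
        "coverage": safe_div(t.searches_with_ads, t.searches),  # share of searches showing ads
        "depth": safe_div(t.impressions, t.searches_with_ads),  # ads shown per ad-bearing search
    }
```

With 1,000 searches, 600 of them showing ads, 1,800 impressions, 54 clicks and $27 revenue, the sketch gives CTR 3%, RPM $27, coverage 60% and depth 3.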


BI on Cloud – Screen Shots


CUBE on Hadoop?


Traditional

• Oracle
• ETL/Aggregation
• I-CUBE
• HADOOP
• MicroStrategy
• Home Grown Tools
• ART
• APOLLO FEEDS


Game Changer – HBase & Schema

• I-CUBE
• HADOOP
• BI Tool
• Home Grown Tools
• ART
• HBASE
• Aggregation in HIVE
• HiveServer – JDBC/ODBC


How We Do It

RowKey       | Day Metrics | Week Metrics | MTD Metrics | SCD Info    | Offer Stats
OrderId-MMYY | D1 D2 … Dn  | Wx Wx+1 … Wy | Imp, Clicks | Name, Email | …

• HTable – schema-less
• Use the HBase incrementer (incrementColumnValue) for Weekly & MTD metrics
• Hive windowing UDF to generate the flattened daily row
• Carefully choose the row key
• SCD – comes free
• Performance – physical HFiles per table & column family
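The row layout and bullets above can be sketched in plain Python. `WideTableSketch`, `row_key` and `load_daily` are hypothetical names. They only mimic HBase semantics (versioned cells, server-side counters) in memory and are not the HBase client API. Column names such as `day:D01` and `week:W22` assume a family:qualifier naming scheme, and weeks use ISO numbering. In the deck, the flattened daily row comes from a Hive windowing UDF rather than being written cell by cell. In real HBase, keeping attribute history also requires the column family to retain more than one version.

```python
from collections import defaultdict
from datetime import date


class WideTableSketch:
    """Hypothetical in-memory stand-in for an HBase table (not the HBase client API).

    Layout: row key -> "family:qualifier" column -> list of (timestamp, value),
    newest version first, mirroring how HBase returns cell versions.
    """

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts):
        cells = self.rows[row][column]
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column, as_of=None):
        """Newest value, or the newest value written at or before `as_of`."""
        for ts, value in self.rows[row][column]:
            if as_of is None or ts <= as_of:
                return value
        return None

    def increment(self, row, column, amount):
        """Add `amount` to a counter cell, in the spirit of HBase incrementColumnValue."""
        cells = self.rows[row][column]
        new_value = (cells[0][1] if cells else 0) + amount
        self.rows[row][column] = [(0, new_value)]  # counters keep one version in this sketch
        return new_value


def row_key(order_id, day):
    """OrderId-MMYY: one wide row per order per calendar month."""
    return f"{order_id}-{day:%m%y}"


def load_daily(table, order_id, day, clicks):
    """Write one day's clicks into the flattened daily column and bump the rollups."""
    key = row_key(order_id, day)
    iso_week = day.isocalendar()[1]
    table.put(key, f"day:D{day.day:02d}", clicks, ts=0)     # D1..Dn daily columns
    table.increment(key, f"week:W{iso_week:02d}", clicks)  # Wx..Wy weekly counters
    table.increment(key, "mtd:clicks", clicks)             # month-to-date counter
```

Suppose a new `scd:name` version is written with a later timestamp. Reading with `as_of` then returns the attribute as it was at that time, which is the Type 2 history the slide says comes free. The small integer timestamps in this sketch stand in for HBase's millisecond timestamps.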

Number Game
• Size – 360 GB
• Format – RCFile
• Rows – 14.7 billion
• Mappers – 562
• Reducers – 436
• Elapsed time – ≤ 30 mins


Challenge @ Hand

• Hadoop/RDBMS
• BIG DATA
• SLA


What Do Users Love? – Excel & Pivot


But “Hang” on a Minute – BIG DATA?

What if I need to pivot a few million records?
Or maybe billions of records?


Our Answer – Hadoop Pivot

Number Game
• Size – 360 GB
• Format – RCFile
• Rows – 14.7 billion
• Mappers – 670
• Reducers – 30
• Elapsed time – 251 secs (< 5 mins)
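The two Number Game runs in this deck (the "How We Do It" job and the Hadoop Pivot) processed the same 360 GB and 14.7 billion rows, and their figures imply very different throughputs. The helper below is only arithmetic on those published figures. It assumes decimal gigabytes. Because the 30-minute figure is an upper bound, the rate computed for that run is a lower bound.

```python
def throughput(size_bytes, rows, seconds):
    """Return (GB per second, million rows per second) using decimal units."""
    return size_bytes / seconds / 1e9, rows / seconds / 1e6


# Figures from the two "Number Game" lists: 360 GB of RCFile, 14.7 billion rows.
for label, seconds in [("How We Do It run, <= 30 mins", 30 * 60),
                       ("Hadoop Pivot, 251 secs", 251)]:
    gb_s, mrows_s = throughput(360e9, 14.7e9, seconds)
    print(f"{label}: {gb_s:.2f} GB/s, {mrows_s:.1f} M rows/s")
```

This prints 0.20 GB/s (8.2 M rows/s) for the first run and 1.43 GB/s (58.6 M rows/s) for the pivot.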

Voila – Back to Excel
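The deck does not say how the Hadoop pivot is implemented. A generic way to pivot with MapReduce can be simulated in plain Python without a Hadoop dependency. The mapper keys each fact row by the pivot-row dimension, the shuffle groups rows that share a key, and the reducer sums the measure per column label. All function names are hypothetical. The output has one row per distinct row-dimension value rather than one per fact, so a pivot over billions of records can shrink to something a spreadsheet can open. `to_csv` then writes that result in a format Excel reads.

```python
import csv
import io
from collections import defaultdict


def pivot_map(record, row_dim, col_dim, measure):
    """Mapper: key by the pivot row, carry (column label, value)."""
    yield record[row_dim], (record[col_dim], record[measure])


def pivot_reduce(row_value, pairs):
    """Reducer: sum the measure per column label to build one pivot row."""
    cells = defaultdict(float)
    for col_value, amount in pairs:
        cells[col_value] += amount
    return row_value, dict(cells)


def hadoop_style_pivot(records, row_dim, col_dim, measure):
    shuffled = defaultdict(list)  # shuffle: group mapper output by key
    for record in records:
        for key, value in pivot_map(record, row_dim, col_dim, measure):
            shuffled[key].append(value)
    return dict(pivot_reduce(k, v) for k, v in sorted(shuffled.items()))


def to_csv(pivot, row_dim):
    """Render the pivot as CSV text; missing cells become 0.0."""
    columns = sorted({col for cells in pivot.values() for col in cells})
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([row_dim, *columns])
    for row_value, cells in pivot.items():
        writer.writerow([row_value, *(cells.get(col, 0.0) for col in columns)])
    return out.getvalue()
```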


Questions?


• Hadoop HDFS – Hourly Feeds
• Hadoop HDFS Grid – Daily Feeds & Aggregates
• Oracle RAC – 8 nodes, 60 TB
• Oracle ETL Server
• BI App Server
• BI Web Server
• App Server, Grid Launcher Box
• GRID-Based Report Web Server
• Metadata
• Unified Web BI Portal
• Web Services Data Access Layer [ODBC/PL/SQL API]
• Dimensions on HBase
• Facts on HDFS [RCFile]
• Other Tools
• Hive + Pig – Query Engine
• Scheduler

Two serving paths: TRADITIONAL and GRID