© 2014 PayPal Inc. All rights reserved. Confidential and proprietary.
Open Source Real Time BI using Storm, Hadoop, Titan, Druid & D3
Anil Madan, Sr. Director Engineering, PayPal
$1 in every $6 spent on e-commerce is spent through PayPal.*
*Source: Morgan Stanley, “eCommerce Disruption: A Global Theme,” January 6, 2013, p.21.
Creating Tomorrow’s Mobile Payment Experiences
25 countries with live PayPal fingerprint authentication on Samsung devices.
Helping Developers Innovate & Monetize New Mobile Apps
Braintree launches its new API, including Pay with PayPal.
PayPal Now Available in 203 Markets
10 new markets added in the second quarter, making PayPal available to 80 million new internet users.
Paraguay
Côte d’Ivoire
Nigeria
Monaco
Belarus
Moldova
Cameroon
Zimbabwe
Montenegro
Macedonia
Business Problem
We need to better understand our customers…
Awareness: How do prospective customers learn about PayPal?
Acquisition: Where do prospects sign up for accounts?
Activation: How can we help them to complete their 1st payment?
Adoption: How can we help them use PayPal even more?
How we solved it…
Direct/Home Page
Product Experiences
Search Engine Marketing
Transaction Emails
Tracking Metadata Tool
Taxonomy
Tracking Event Service
Tracking Servers
Tag Catalog
Tracking Validation Service
Marketing
Segmentation
Real Time Systems
Experimentation
Metadata
Attribution
Exploratory Analytics
Predictive Analytics
Big Data
Mobile
Logical View
Stages: Metadata, Instrumentation, Collection, Processing, Analytics
Client Side Events
Server Side Events
Page Performance Events
Collection Service
Sessionization
Behavioral Metrics
Marketing Metrics
Performance Metrics
Operational Metrics (OpenTSDB)
Real Time Event Metrics
Druid Metrics Store
Pathing Store
Reporting & Visualization
Metadata – Logical Entity Model
COMPONENTS
PAGE TEMPLATE
TAGS
LINK
Metadata – Logical Event Model
Impression Event
Tracking Event
Reaction Event
Component Impression Event
Ad Impression Event
Click Event
Click-Through Event
Mouse-over Event
Entry Event
Exit Event
Outcome Event
Page Impression Event
Client Page Impression Event
Server Page Impression Event
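One way to picture the event model is as a small class hierarchy. The sketch below is illustrative only: the slide lists the event names but the parent/child links are not recoverable from the text, so the inheritance shown here (for example, Click Event under Reaction Event) is an assumption inferred from the names, and the `page_name` and `timestamp` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrackingEvent:
    """Assumed base type; the fields are hypothetical placeholders."""
    page_name: str
    timestamp: float = 0.0


# Assumed hierarchy, inferred only from the event names on the slide.
class ImpressionEvent(TrackingEvent): pass
class PageImpressionEvent(ImpressionEvent): pass
class ClientPageImpressionEvent(PageImpressionEvent): pass
class ServerPageImpressionEvent(PageImpressionEvent): pass
class ComponentImpressionEvent(ImpressionEvent): pass
class AdImpressionEvent(ImpressionEvent): pass
class ReactionEvent(TrackingEvent): pass
class ClickEvent(ReactionEvent): pass
class ExitEvent(TrackingEvent): pass


def count_by_family(events):
    """Count events per top-level family (impression, reaction, other)."""
    counts = {"impression": 0, "reaction": 0, "other": 0}
    for event in events:
        if isinstance(event, ImpressionEvent):
            counts["impression"] += 1
        elif isinstance(event, ReactionEvent):
            counts["reaction"] += 1
        else:
            counts["other"] += 1
    return counts
```

Grouping by a shared base type is what would let a report treat client and server page impressions uniformly while still keeping them distinguishable.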
Metadata - Self-Service Management Workflow…
Data Pipeline
Stages: Metadata, Collection, Processing, Analysis & Visualization
Sources: Client Side, Server Side (HTTP, REST Proxy)
Processing: REST Spout, Bot Flagging Bolt, Sessionization, Aggregation, Geo Enrichment Bolt
Data Stores: Druid, Titan
Outputs: Reporting, Performance Metrics, Tools
Metadata: Metadata Service
Audiences: Developers, Product Owners, Customers, Reporting Consumers
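The processing stage can be pictured as a chain of small per-event steps. The following is a minimal sketch in plain Python, not Storm code: the bot markers, the IP-prefix geo table, and the event field names (`ip`, `user_agent`) are all hypothetical stand-ins for whatever the real bolts use.

```python
# Hypothetical bot markers and geo lookup table (documentation IP ranges).
BOT_MARKERS = ("bot", "crawler", "spider")
GEO_TABLE = {"203.0.113.": "US", "198.51.100.": "DE"}


def flag_bot(event):
    """Bot flagging step: mark events whose user agent looks automated."""
    agent = event.get("user_agent", "").lower()
    return {**event, "is_bot": any(marker in agent for marker in BOT_MARKERS)}


def enrich_geo(event):
    """Geo enrichment step: attach a country derived from the IP prefix."""
    ip = event.get("ip", "")
    country = next(
        (c for prefix, c in GEO_TABLE.items() if ip.startswith(prefix)),
        "unknown",
    )
    return {**event, "country": country}


def run_pipeline(events):
    """Apply the steps in order, dropping flagged bot traffic."""
    for event in events:
        event = flag_bot(event)
        if event["is_bot"]:
            continue
        yield enrich_geo(event)
```

Each step returns a new dictionary rather than mutating its input, which keeps the steps independent of one another in the same way separate bolts would be.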
Druid Architecture
• Open-source
• Distributed
• Real-time
• Highly-available data store
• Column-oriented
• Approximate or exact
Real Time Nodes
• Ingest data and buffer events in memory
• Incremental indexing
• Query data as soon as it is ingested
• Periodically persist collected events to disk
• Combine multiple disk indexes to create immutable ‘segments’
• Log-structured merge-tree
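The buffer, persist, and merge cycle listed above can be sketched in a few lines. This is a toy model under stated assumptions, not Druid's implementation: "disk" indexes are just sorted tuples held in a list, the `max_in_memory` threshold is hypothetical, and events are `(timestamp, page)` tuples.

```python
import heapq


class RealTimeBuffer:
    """Toy model of a real-time node: buffer, persist, merge."""

    def __init__(self, max_in_memory=3):
        self.max_in_memory = max_in_memory  # hypothetical persist threshold
        self.memory = []      # in-memory events, queryable immediately
        self.persisted = []   # sorted, immutable "disk" indexes

    def ingest(self, event):
        self.memory.append(event)
        if len(self.memory) >= self.max_in_memory:
            self.persist()

    def persist(self):
        """Flush the in-memory buffer as one sorted, immutable index."""
        if self.memory:
            self.persisted.append(tuple(sorted(self.memory)))
            self.memory = []

    def query(self, predicate):
        """Query persisted indexes and the live buffer together."""
        rows = [e for index in self.persisted for e in index] + self.memory
        return [e for e in rows if predicate(e)]

    def merge_segment(self):
        """Merge all sorted indexes into one immutable segment."""
        self.persist()
        segment = tuple(heapq.merge(*self.persisted))
        self.persisted = []
        return segment
```

Because every persisted index is already sorted, the final merge is a streaming k-way merge, which is the same idea that makes log-structured merge-trees cheap to compact.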
Historical Nodes
• Load immutable read-optimized data from deep storage
• Memory mapped storage engine
• Caches segments
• Supports tiered storage
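To illustrate what a memory-mapped storage engine buys, the sketch below maps a column file of fixed-width integers and reads single values without loading the whole file. The file layout (little-endian 32-bit integers) and the `write_segment` and `MappedColumn` names are assumptions made for this example, not Druid's segment format.

```python
import mmap
import struct


def write_segment(path, values):
    """Write a column of 32-bit little-endian integers to a file."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}i", *values))


class MappedColumn:
    """Read-only view over a column file; the OS pages data in on demand."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // 4

    def __getitem__(self, index):
        # Decode only the 4 bytes that hold the requested value.
        return struct.unpack_from("<i", self._map, index * 4)[0]

    def close(self):
        self._map.close()
        self._file.close()
```

Since the segment is immutable, the mapping can stay read-only and be shared safely, and the operating system's page cache effectively acts as the segment cache.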
Druid Systems Overview
Metrics & Dimensions

Standard Metrics (doubleSum) and Estimated Metrics (hyperUnique, HyperLogLog):
  { "type": "doubleSum", "name": "pageviews", "fieldName": "PV" },
  { "type": "doubleSum", "name": "bounces", "fieldName": "bnc" },
  …
  { "type": "hyperUnique", "name": "unique_visits", "fieldName": "user_session_guid" },
  { "type": "hyperUnique", "name": "unique_visitors", "fieldName": "user_guid" }

Dimensions:
  … 2014/06/11/10",
  "filter": "part-",
  "parser": {
    "type": "string",
    "timestampSpec": { "column": "timestamp", "format": "auto" },
    "data": {
      "format": "json",
      "dimensions": [ "timestamp", "USER_GUID", "USER_SESSION_GUID", "PAGE_GROUP", "PAGE_NAME", "PAGEGROUP_LINK_NAME", "PAGE_LINK_NAME",
      …
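The estimated metrics above rely on HyperLogLog, which approximates a distinct count from a small fixed array of registers. The following is a minimal sketch of the general algorithm, assuming a SHA-1 based 64-bit hash and 2^12 registers; it is not Druid's hyperUnique implementation, and the class and parameter names are chosen for this example only.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch for approximate distinct counts."""

    def __init__(self, p=12):
        self.p = p                    # number of index bits (assumed value)
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash
        index = h >> (64 - self.p)                   # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw
```

Adding the same value twice never changes a register, so duplicates are free; the trade-off is a small relative error in exchange for constant memory per metric.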
Sessionization

Events
Visitor ID  Session ID  Timestamp         Event Payload
V1          S1          2014-10-16 05:12  E1
V2          S2          2014-10-16 05:14  E2
V1          S1          2014-10-16 05:15  E3
V1          S1          2014-10-16 05:20  E4
V2          S2          2014-10-16 05:21  E5
V1          S3          2014-10-16 05:25  E6
…           …           …                 …

Visit Container
Visitor ID  Session ID  Payload                                                  Events
V1          S1          sf, mac, {flash, quicktime}, {ca, usa}, 480 secs, …      E1, E3, E4
V2          S2          ff, win, {acrobat, mediaplayer}, {wb, in}, 420 secs, …   E2, E5
V1          S3          sf, mac, {quicktime, java}, {on, ca}, 60 secs            E6
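The step from the event stream to visit containers can be sketched as a group-by on visitor and session. This is a simplified illustration: it only collects event payloads and computes the first-to-last duration, and it does not derive the browser, OS, plugin, or location attributes shown in the container payloads. The function and field names are assumptions.

```python
from datetime import datetime

# Events from the slide: (visitor, session, timestamp, payload).
EVENTS = [
    ("V1", "S1", "2014-10-16 05:12", "E1"),
    ("V2", "S2", "2014-10-16 05:14", "E2"),
    ("V1", "S1", "2014-10-16 05:15", "E3"),
    ("V1", "S1", "2014-10-16 05:20", "E4"),
    ("V2", "S2", "2014-10-16 05:21", "E5"),
    ("V1", "S3", "2014-10-16 05:25", "E6"),
]


def build_visit_containers(events):
    """Group events by (visitor, session) and compute visit duration."""
    visits = {}
    for visitor, session, stamp, payload in events:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        visit = visits.setdefault(
            (visitor, session), {"events": [], "first": when, "last": when}
        )
        visit["events"].append(payload)
        visit["first"] = min(visit["first"], when)
        visit["last"] = max(visit["last"], when)
    return {
        key: {
            "events": v["events"],
            "duration_secs": int((v["last"] - v["first"]).total_seconds()),
        }
        for key, v in visits.items()
    }


visits = build_visit_containers(EVENTS)
```

For session S1 this yields events E1, E3, E4 and a duration of 480 seconds, matching the first row of the visit container table.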
Druid Storage – Columns & Dictionaries

Timestamp (Hr)  SessionID  Country  OS   UserAgent  Page Name                               Page Name (encoded)
2014-10-16 05   S1         US       MAC  SF         Login, AccountOverview                  0 1
2014-10-16 05   S2         DE       WIN  IE         Login, PaymentReview, AccountHistory    0 2 3
2014-10-16 05   S3         US       LNX  FF         Login, PaymentReview, Checkout          0 2 4
2014-10-16 05   S4         UK       LNX  FF         Login, Profile, Checkout                0 5 4
2014-10-16 05   S5         DE       WIN  CR         Login, Profile                          0 5
2014-10-16 05   S6         UK       MAC  SF         Login, AccountOverview, Checkout        0 1 4

Dictionary
Login            0
AccountOverview  1
PaymentReview    2
AccountHistory   3
Checkout         4
Profile          5

Compression: LZF
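The dictionary and the encoded Page Name column shown on this slide can be reproduced with a short routine. This is a sketch of plain dictionary encoding, assigning IDs in order of first appearance; the function name is an assumption, and the LZF compression step is not included.

```python
# Page Name values per session, as listed on the slide.
PAGE_NAMES = [
    ["Login", "AccountOverview"],
    ["Login", "PaymentReview", "AccountHistory"],
    ["Login", "PaymentReview", "Checkout"],
    ["Login", "Profile", "Checkout"],
    ["Login", "Profile"],
    ["Login", "AccountOverview", "Checkout"],
]


def dictionary_encode(rows):
    """Replace each string with a small integer ID from a shared dictionary."""
    dictionary = {}
    encoded = []
    for row in rows:
        ids = []
        for value in row:
            # A new value gets the next free ID; known values reuse theirs.
            ids.append(dictionary.setdefault(value, len(dictionary)))
        encoded.append(ids)
    return dictionary, encoded


dictionary, encoded = dictionary_encode(PAGE_NAMES)
```

Storing small integers instead of repeated strings shrinks the column and makes equality filters a matter of comparing IDs.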
Druid Data Structure - Bitmap Indices
Herald – Self Service Analytics
Druid Metrics
Enter
Pathing
Fallout Reports
Pathing

Paths: A->B->C->D->X->A->M (S1) and A->B->C->D->E (S2)

Visitor ID  Current Page  Next Page 1  Next Page 2  Prev Page 1  Prev Page 2
S1          A             B            C            null         null
S1          B             C            D            A            null
S1          C             D            X            B            A
S1          D             X            A            C            B
S1          X             A            M            D            C
S1          A             M            null         X            D
S1          M             null         null         A            X
S2          A             B            C            null         null
S2          B             C            D            A            null
S2          C             D            E            B            A
S2          D             E            null         C            B
S2          E             null         null         D            C
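Rows of this shape can be derived mechanically from an ordered page path. The sketch below is one way to do it, assuming a lookahead and lookbehind of two pages; the function name and the `depth` parameter are illustrative, and `None` stands in for null.

```python
def pathing_rows(session_id, pages, depth=2):
    """Emit one row per page view with its next and previous pages."""
    rows = []
    for i, page in enumerate(pages):
        # Next Page 1..depth, padded with None past the end of the path.
        nxt = [pages[i + k] if i + k < len(pages) else None
               for k in range(1, depth + 1)]
        # Prev Page 1..depth, padded with None before the start of the path.
        prv = [pages[i - k] if i - k >= 0 else None
               for k in range(1, depth + 1)]
        rows.append((session_id, page, *nxt, *prv))
    return rows


s1_rows = pathing_rows("S1", list("ABCDXAM"))
s2_rows = pathing_rows("S2", list("ABCDE"))
```

Denormalizing the neighbours into each row is what allows next-page and previous-page questions to be answered with a simple filter and group-by instead of a self-join.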
Pathing

Next Page
{
  "queryType": "groupBy",
  "dimensions": ["current_page", "dimensions like country, segmentation etc"],
  "aggregations": [
    { "type": "count", "name": "next_page_count", "fieldName": "next_page1, next_page2" }
  ],
  "filter": { "type": "selector", "dimension": "current_page", "value": "C" }
}

Previous Page
{
  "queryType": "groupBy",
  "dimensions": ["current_page", "dimensions like country, segmentation etc"],
  "aggregations": [
    { "type": "count", "name": "prev_page_count", "fieldName": "prev_page1, prev_page2" }
  ],
  "filter": { "type": "selector", "dimension": "current_page", "value": "C" }
}
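The effect of these queries can be mimicked locally to see what they return. The sketch below runs the same filter-then-count logic over the two example paths in plain Python; it is not a Druid client, and the field names `next_page1` and `prev_page1` are assumed column names.

```python
from collections import Counter

# (session, current_page, next_page1, prev_page1) for the two example paths.
ROWS = [
    ("S1", "A", "B", None), ("S1", "B", "C", "A"), ("S1", "C", "D", "B"),
    ("S1", "D", "X", "C"), ("S1", "X", "A", "D"), ("S1", "A", "M", "X"),
    ("S1", "M", None, "A"),
    ("S2", "A", "B", None), ("S2", "B", "C", "A"), ("S2", "C", "D", "B"),
    ("S2", "D", "E", "C"), ("S2", "E", None, "D"),
]
FIELDS = {"next_page1": 2, "prev_page1": 3}


def group_count(rows, current_page, field):
    """Selector filter on current_page, then count values of `field`."""
    column = FIELDS[field]
    return Counter(
        row[column]
        for row in rows
        if row[1] == current_page and row[column] is not None
    )


next_after_c = group_count(ROWS, "C", "next_page1")
prev_before_c = group_count(ROWS, "C", "prev_page1")
```

For current page C, both sessions continue to D and both arrived from B, so each counter holds a single entry with a count of 2.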
Fallout

Fallout path A->D->X->M, matched against the full path A->B->C->D->X->A->M

• Apply the filter pattern to the dictionary
• Figure out the values that match
• Take those bitmap indices
• OR the bitmap indices together
• Use the output bitmap as the filter

{
  "queryType": "search",
  "dimensions": ["current_page_path_count", "dimensions like country, segmentation etc"],
  "filter": { "type": "regex", "dimension": "next_page_path", "pattern": "^A.*D.*X.*M$" }
}
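The filtering steps listed on this slide can be sketched with Python integers standing in for bitmaps. This is an illustration of the general technique under stated assumptions: the sample column values are made up for the example, the pattern is written with `.*` wildcards so that it behaves as a standard regular expression, and real systems use compressed bitmaps rather than plain integers.

```python
import re

# Hypothetical values of a path dimension, one per row.
PATH_COLUMN = [
    "A->B->C->D->X->A->M",
    "A->B->C->D->E",
    "A->B->C->D->X->A->M",
    "A->D->X->M",
]


def build_bitmaps(column):
    """One bitmap per distinct value; bit i is set if row i has that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def regex_filter(bitmaps, pattern):
    """Match the pattern against dictionary values, then OR their bitmaps."""
    matcher = re.compile(pattern)
    result = 0
    for value, bitmap in bitmaps.items():
        if matcher.search(value):
            result |= bitmap
    return result


def rows_of(bitmap):
    """Row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


matching_rows = rows_of(regex_filter(build_bitmaps(PATH_COLUMN), r"^A.*D.*X.*M$"))
```

The regex runs once per distinct value rather than once per row, which is why applying it to the dictionary first pays off on columns with many repeated values.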
Herald Architecture
Model
View
Controller
NVD3 Directives
Client
Server
SSO
Druid
Herald Deployment
Adhoc Graph Analytics
[Figure: graph of page vertices with Country and Count property labels]
Vertices (Name): Login_2014101611, AccountOverview_2014101611, PaymentReview_2014101611, Checkout_2014101611
Property labels: Country: US, Count: 15; Count: 8; Country: US, Count: 5; Count: 7; Country: US, Count: 5; Country: US, Count: 10; Count: 5; Count: 5
Other labels: 5, 8, 7, 6
gremlin> g.V('name', 'Login_2014101611').as('x').
           outE.inV.loop('x'){it.loops < 4}
           {it.object.getProperty('name') == 'Checkout_2014101611'}.path
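The traversal above asks for paths from the Login vertex to the Checkout vertex within a bounded number of hops. The sketch below shows the same idea in plain Python over an adjacency list. The edges are hypothetical, since the figure's edges are not recoverable from the text, and `max_hops` is a plain parameter that does not reproduce the exact counting rules of `it.loops`.

```python
# Hypothetical edges between the page vertices named on the slide.
GRAPH = {
    "Login_2014101611": ["AccountOverview_2014101611", "PaymentReview_2014101611"],
    "AccountOverview_2014101611": ["PaymentReview_2014101611", "Checkout_2014101611"],
    "PaymentReview_2014101611": ["Checkout_2014101611"],
    "Checkout_2014101611": [],
}


def paths_to(graph, start, target, max_hops):
    """Collect every path from start to target using at most max_hops edges."""
    results = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target and len(path) > 1:
            results.append(path)
        if len(path) - 1 < max_hops:
            # Extend the path by one outgoing edge.
            for neighbour in graph.get(node, []):
                stack.append(path + [neighbour])
    return results


checkout_paths = paths_to(GRAPH, "Login_2014101611", "Checkout_2014101611", max_hops=3)
```

Bounding the hop count keeps the search finite even when the page graph contains cycles, which real navigation graphs usually do.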
Summary
• Problem
  • Understand our customer behavior
  • Across disparate channels & experiences
• Solution
  • Democratize data
  • Consistent standardized metadata
  • Disciplined instrumentation
  • Distributed scalable backend for adhoc & interactive analytics
  • Self-service BI through modern visualization tools
Questions?