EPAM BI Competency Center
Near Real-time Marketing Support System
Implementation Details
by Kiryl Sultanau, Yauheni Yushyn & Dzmitry Maskayeu
Preamble
Bidding Support
Improve Ad campaigns
NRT Data Visualization:
● Near-real-time visualization of bids matched with clicks, leads, etc.
● Detect the best ad type/place/size/position... for different users/devices/regions...
● Quick reaction and better estimation for newly started ad campaigns.
Improve Ad campaigns:
● Improve keyword campaigns with more relevant keywords and a better specialized target group, region, etc.
● Create new short-term campaigns for special events or occasions
● Collect specific users and information about them
Prerequisites
External bidding info (impressions, clicks)
Ad Campaign info
Publisher log streams (impressions, click, search...)
Other Dictionaries
Architectural Overview
Pipeline
Source data
ETL layer
Presentation layer
Stream Processing
server log [cookies, user_agent, city_id, log_type_id …]
Ad Exchange
- Google DoubleClick AdX
- TANX Alibaba
- Baidu
- Google Mobile...
JOIN
City
US city names
Log Type
- bid-impression
- bid-click
- site-open
- site-search
- site-impression
- site-click
Site Pages
Owner URL & google tag
User Tag
External URL & user search keyword
State
US state names
Keywords
User keywords as the union of Google tags and user search keywords
Spark Cache
Kafka → RDD → DataFrame (apply schema)
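The join on this slide can be pictured as a set of dictionary lookups applied to each log record. What follows is a minimal pure-Python sketch of that idea, not the Spark job itself. The sample dictionaries, the `state_id` link from city to state, the numeric log type ids, and the helper `enrich` are all hypothetical.

```python
# Hypothetical in-memory dictionaries standing in for the cached lookup tables.
CITIES = {1: {"city": "Los Angeles", "state_id": 5}}   # assumed layout
STATES = {5: "California"}                              # assumed ids
LOG_TYPES = {1: "bid-impression", 2: "bid-click", 3: "site-open",
             4: "site-search", 5: "site-impression", 6: "site-click"}


def enrich(record, google_tags=(), search_keywords=()):
    """Return a copy of a raw log record joined with the dictionaries.

    Unknown ids resolve to None (left-join-like behavior, assumed here).
    """
    city = CITIES.get(record.get("city_id"), {})
    out = dict(record)
    out["city"] = city.get("city")
    out["state"] = STATES.get(city.get("state_id"))
    out["log_type"] = LOG_TYPES.get(record.get("log_type_id"))
    # User keywords: union of Google tags and user search keywords.
    out["keywords"] = sorted(set(google_tags) | set(search_keywords))
    return out


raw = {"cookies": "abc", "user_agent": "Mozilla/5.0", "city_id": 1, "log_type_id": 4}
row = enrich(raw, google_tags=["suv", "2015"], search_keywords=["suv", "crossover"])
```

Keeping the dictionaries small and in memory mirrors the role the slide gives to the Spark cache: every incoming record is enriched without a round trip to an external store.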
joined server log DataFrame
Parse User Agent String
Browser: Group, Manufacturer, Rendering engine, Version (major, minor), Name
OS: Name, Platform, Device, Manufacturer
DataFrame
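The user agent parsing step can be illustrated with a small regex-based sketch. It is an assumption-laden simplification: the slides do not say which parser is used, a production system would normally rely on a dedicated user agent library, and the function `parse_user_agent` and its output keys are hypothetical. Only browser name, major/minor version, and OS name are extracted here.

```python
import re

# Order matters: Chrome user agents also contain "Safari", so Chrome is checked first.
_BROWSERS = [
    ("Firefox", re.compile(r"Firefox/(\d+)\.(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)\.(\d+)")),
    ("Safari", re.compile(r"Version/(\d+)\.(\d+).*Safari/")),
]
# Order matters here too: iPhone user agents mention "Mac OS X",
# and Android user agents mention "Linux".
_OSES = [
    ("Windows", "Windows NT"),
    ("Android", "Android"),
    ("iOS", "iPhone"),
    ("Mac OS X", "Mac OS X"),
    ("Linux", "Linux"),
]


def parse_user_agent(ua):
    """Extract browser name, major/minor version and OS name from a UA string."""
    result = {"browser": None, "major": None, "minor": None, "os": None}
    for name, pattern in _BROWSERS:
        match = pattern.search(ua)
        if match:
            result.update(browser=name,
                          major=int(match.group(1)),
                          minor=int(match.group(2)))
            break
    for name, marker in _OSES:
        if marker in ua:
            result["os"] = name
            break
    return result


chrome = parse_user_agent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
)
```

The parsed fields would then be appended as new columns to the joined server log record.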
Stream Processing
joined server log + user agent DataFrame
JOIN
Cassandra table UNPIVOT DataFrame
id bid_click_kw site_open_kw site_click_kw site_lead_kw site_search_kw
joined server log + user agent + previous user behavior DataFrame
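The UNPIVOT step can be sketched in plain Python. This assumes the wide row with the columns listed above is the input and that each `*_kw` column holds a list of keywords; the slides do not state either point, and the helper `unpivot` is hypothetical.

```python
# Keyword columns as listed for the Cassandra table.
KEYWORD_COLUMNS = ["bid_click_kw", "site_open_kw", "site_click_kw",
                   "site_lead_kw", "site_search_kw"]


def unpivot(row):
    """Turn one wide row into (id, source_column, keyword) tuples.

    Missing or empty columns contribute no output rows.
    """
    return [(row["id"], column, keyword)
            for column in KEYWORD_COLUMNS
            for keyword in (row.get(column) or [])]


wide = {"id": "u1", "bid_click_kw": ["suv"], "site_search_kw": ["suv", "crossover"]}
long_rows = unpivot(wide)
```

The long form has one row per user, action type, and keyword, which is convenient to join against the incoming log stream.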
Stream Processing
joined server log + user agent + previous user behavior DataFrame
joined server log + user agent + previous user behavior + target group marker
Stream Processing
Saving data
Users Dimension
Analytics
Service API
Saving data
Visualisation
Discover
Visualize
Dashboard
Tags Analyser Tool
● real-time data
● slices by any collected metric (time, geo-location, action type, make, model, user behavior …)
● apply filters on the fly, as easy as pie
● combine and manage filters
● share dashboards
● add new visualisations on the fly
● serve all of this from the UI
NRT Data Visualization
Question: How can we recognize users who will potentially bring profit to the provider?
Input data: Logs of searches and clicks on the site, and logs from partner sites. The data will be merged and split into three parts: 60% training, 20% test, 20% validation.
Features: The input variables used for model training are: region, city, user actions and searches on the site.
Algorithms: The Deep Learning algorithm from the H2O package.
Evaluation: The model will be evaluated based on the number of predicted clicks + N * the number of predicted conversions.
Model Usage: Once trained, the model will receive data on a user and their actions on the site and will return the probability that this user will click on an ad.
Lead Prediction Using Machine Learning
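The 60/20/20 split and the evaluation score described for lead prediction can be sketched in a few lines. This is an illustration only: the shuffling seed, the helper names, and the value of N are assumptions, since the slides give no value for N and do not show how the split is implemented.

```python
import random


def split_dataset(rows, seed=0):
    """Shuffle rows and split them 60% / 20% / 20% (train / test / validation)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 6 // 10
    n_test = len(rows) * 2 // 10
    train = rows[:n_train]
    test = rows[n_train:n_train + n_test]
    validation = rows[n_train + n_test:]   # remainder, about 20%
    return train, test, validation


def evaluation_score(predicted_clicks, predicted_conversions, n):
    """Score = number of predicted clicks + N * number of predicted conversions."""
    return predicted_clicks + n * predicted_conversions


train, test, validation = split_dataset(range(100))
# n=10 is a made-up weight for conversions, used only for illustration.
score = evaluation_score(predicted_clicks=40, predicted_conversions=5, n=10)
```

Weighting conversions by N reflects that a conversion is worth more to the provider than a click; the right value would depend on the business case.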
NRT Bidding
region: LA, CA; sex: male; age: 31; stream: google.com > Edmunds.com > search: SUV
region: CA; tags: top, SUV, 2015; price: 90$ CPM; limit: 200$ day
region: CA, NY; tags: SUV, crossover; price: 70$ CPM; limit: 300$ day
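The matching of a user profile against the campaigns above can be sketched as follows. The selection rule (pick the eligible campaign with the highest CPM), the campaign names "A" and "B", and the `spent_today_usd` field are assumptions made for illustration; the slides only show the profile and campaign attributes.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str                  # hypothetical label
    regions: set
    tags: set
    cpm_usd: float             # price per 1000 impressions
    daily_limit_usd: float
    spent_today_usd: float = 0.0   # assumed bookkeeping field


def eligible(campaign, user_regions, user_keywords):
    """A campaign is eligible if region and tags match and budget remains."""
    cost = campaign.cpm_usd / 1000.0   # cost of a single impression
    return (bool(campaign.regions & user_regions)
            and bool(campaign.tags & user_keywords)
            and campaign.spent_today_usd + cost <= campaign.daily_limit_usd)


def pick_campaign(campaigns, user_regions, user_keywords):
    """Return the eligible campaign with the highest CPM, or None."""
    candidates = [c for c in campaigns if eligible(c, user_regions, user_keywords)]
    return max(candidates, key=lambda c: c.cpm_usd, default=None)


campaigns = [
    Campaign("A", {"CA"}, {"top", "SUV", "2015"}, 90.0, 200.0),
    Campaign("B", {"CA", "NY"}, {"SUV", "crossover"}, 70.0, 300.0),
]
best = pick_campaign(campaigns, user_regions={"CA"}, user_keywords={"SUV"})
```

For the user shown above (region CA, search keyword SUV), both campaigns match on region and tags, so under this rule the 90$ CPM campaign is chosen until its daily limit is reached.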
Crawl Social Networks (events, places, posts, feeds...)
Crawl Social Networks (attendees, followers, likers...)