conor-duke
Fabrikatyr Analytics
Uncover tangible truths amidst the noise of modern media
Using predictive modelling to increase campaign response rate
PyCon Dublin - 2015
@Conr @fabrikatyr
Oct 2015 - PyCon Dublin 2015 Fabrikatyr – Increasing Customer Response Rate
Agenda
➔ Business problem
➔ Data and quality checks
➔ Tools used
➔ Methods
➔ Outcomes
➔ Next steps
Business Problem
The client currently executes successful SMS campaigns across the globe. However, it wants to leverage its existing data to increase response rate
This presents the following operational challenges:
What consumer behaviours can be modelled to predict engagement?
Which behaviours occur across campaigns, regions & demographics?
How can these behaviours be leveraged to drive value?
Data and quality checks
We took 1.5 million user profiles across 5 brands and 8 campaigns across the globe and analysed over 6.2 million SMS transactions to understand what drives response
Regions Brands
● Database 1 (User Campaign): List of all people surveyed + how they respond
● Database 2 (Outbound Messages): Every SMS sent during the campaign
● Database 3 (Campaign Details): Details of the Campaign
● Database 4 (Inbound Messages): All the people who respond
● List (Features): Features we think are important
We faced a number of challenges analysing large volumes of data which reduced the visibility of the predictive patterns
Duplicate responses
No unique consumer ID
Incomplete profiles
Non-uniform dates
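The challenges listed above map onto fairly mechanical clean-up steps. The following is a minimal sketch using only the Python standard library. The field names (`msisdn`, `campaign`, `text`, `received`), the list of date formats and the idea of hashing the phone number into a surrogate ID are all assumptions made for illustration; the slides do not describe the real schema or the clean-up actually used. Incomplete profiles are left untouched here.

```python
import hashlib
from datetime import datetime

# Assumed date formats; extend this tuple to match the real data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

def parse_date(raw):
    """Return a date for any known format, or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None

def consumer_id(msisdn):
    """Derive a stable surrogate ID from the phone number (hypothetical key)."""
    digits = "".join(ch for ch in msisdn if ch.isdigit())
    return hashlib.sha1(digits.encode("utf-8")).hexdigest()[:12]

def clean(rows):
    """Normalise dates, add a surrogate ID and drop duplicate responses."""
    seen, out = set(), []
    for row in rows:
        rec = dict(row)
        rec["consumer_id"] = consumer_id(rec["msisdn"])
        rec["received"] = parse_date(rec["received"])
        key = (rec["consumer_id"], rec["campaign"],
               rec["text"].strip().lower(), rec["received"])
        if key in seen:
            continue  # same person, campaign, text and day: a duplicate
        seen.add(key)
        out.append(rec)
    return out

# Toy rows: the first two are the same response written two different ways.
rows = [
    {"msisdn": "+353 87 000 0001", "campaign": "c1", "text": "YES", "received": "2015-01-05"},
    {"msisdn": "353870000001", "campaign": "c1", "text": "yes ", "received": "05/01/2015"},
    {"msisdn": "+353 87 000 0002", "campaign": "c1", "text": "NO", "received": "20150106"},
]
cleaned = clean(rows)
```

Keeping the parsed date as `None` rather than dropping the row makes it possible to count unparseable dates later as a data quality metric.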
Tools Used
https://gist.github.com/iamatypeofwalrus/5183133
Deploy a free, scalable data science toolkit
● Python 2.7
● Anaconda for Python
● Jupyter Notebook
● MySQL
● Ubuntu AMI
● Amazon RDS
● AWS EC2 Instance
Packages
Useful packages for visualising
Methods
Feature Preparation
User Clustering
Predictive modelling
Feature selection
Feature preparation
Naming convention
Kanban for processing
Synthetic features
Mise en place
French culinary phrase meaning "putting in place", or the arrangement of a chef’s workspace before the beginning of dinner service
Meaningful and standardised naming conventions are critical
Kanbans are perfect for loading, naming and mapping data
Synthetic features are key to uncovering meaningful results
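As an illustration of what a synthetic feature can look like, the sketch below derives three behavioural values from raw per-user fields. The field names and the three features themselves are assumptions chosen for illustration, not the features used in the project.

```python
from datetime import date

def synthetic_features(user):
    """Derive behavioural features from raw per-user fields (hypothetical names)."""
    sent = user["sms_sent"]
    received = user["sms_received"]
    first_out = user["first_outbound"]
    first_in = user.get("first_inbound")  # None if the user never replied
    return {
        # Replies per outbound message; 0.0 when nothing was sent.
        "response_ratio": received / sent if sent else 0.0,
        # Days between the first outbound message and the first reply, if any.
        "days_to_first_reply": (first_in - first_out).days if first_in else None,
        # Flag separating repeat responders from one-off responders.
        "repeat_responder": received > 1,
    }

user = {
    "sms_sent": 8,
    "sms_received": 2,
    "first_outbound": date(2014, 12, 22),
    "first_inbound": date(2014, 12, 25),
}
features = synthetic_features(user)
```

Under Python 2.7, as listed in the toolkit, the ratio would need `from __future__ import division` to avoid integer division.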
Comparison between frequency of inbound and outbound messaging
Always check for distributions that are in line with expectations
Chart axes: SMS Sent, SMS Received, User; period 2014-12-22 to 2015-07-03 (193 days)
Chart axes: Count of Valid Users; Count of Valid inbound responses
The frequencies of consumer responses indicate a non-normal* response rate, so outliers need to be removed
Preliminary analysis
Filtering for outliers: 6 of the 8 campaigns have relatively normal distributions
Chart: Consumer response rate distribution by campaign (panels: Campaign 1 to Campaign 6; x-axis: Response per Consumer)
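The slides do not say which outlier rule was applied. As one possible approach, the sketch below uses Tukey's interquartile-range rule from the standard library; the rule, the multiplier `k=1.5` and the toy counts are all assumptions.

```python
from statistics import quantiles

def iqr_filter(counts, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(counts, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [c for c in counts if low <= c <= high]

# Toy data: responses per consumer, with one implausibly active number.
responses_per_consumer = [1, 1, 2, 1, 3, 2, 1, 2, 4, 1, 250]
filtered = iqr_filter(responses_per_consumer)
```

Applying the filter per campaign rather than globally keeps a very active campaign from masking outliers in a quiet one.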
Dealing with descriptive variables: one-hot encoding!
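One-hot encoding turns a descriptive (categorical) variable into one 0/1 column per category so that a model can consume it. A minimal sketch in plain Python follows; the `language` field and its values are made up for illustration.

```python
def one_hot(records, field):
    """Replace a categorical field with one 0/1 column per observed category."""
    categories = sorted({r[field] for r in records})
    encoded = []
    for r in records:
        # Copy every other field unchanged.
        row = {k: v for k, v in r.items() if k != field}
        for cat in categories:
            row["{}_{}".format(field, cat)] = int(r[field] == cat)
        encoded.append(row)
    return encoded

users = [
    {"id": 1, "language": "en"},
    {"id": 2, "language": "fr"},
    {"id": 3, "language": "en"},
]
encoded = one_hot(users, "language")
```

With pandas, `pandas.get_dummies` performs the same transformation on a DataFrame column.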
Starting with 45 pieces of information per consumer, we added 30 pieces of campaign information and 20 items of behavioural information
Synthetic features
Campaign features
Time based feature
User characteristics
Key Features
User
Outbound
Campaign
Inbound Key Features
User Clustering
Predictive modelling
Iterative loop
Histograms versus K-means clustering for user groupings
Chart panels: K-means of continuous variables; Histogram of response rate (axis: Response count)
How to make usable clusters?
Continuous and discrete variables don’t make clustering easy
Two solutions
● Histograms (easy to generate)
or
● Spectral clustering methods (robust to new data)
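The histogram option amounts to assigning each user to a bin of response counts. The sketch below shows that idea with the standard library only; the bin edges are an assumption, chosen to separate users with 0 responses, exactly 1 response and more than 1 response. Spectral clustering is not shown.

```python
from bisect import bisect_right
from collections import Counter

def assign_bins(values, edges):
    """Map each non-negative value to the index of the last edge <= the value."""
    return [bisect_right(edges, v) - 1 for v in values]

# Lower edges of the bins: 0 responses, exactly 1, and 2 or more.
edges = [0, 1, 2]
response_counts = [0, 0, 1, 3, 0, 1, 7, 2]  # toy data
groups = assign_bins(response_counts, edges)
group_sizes = Counter(groups)
```

Because the edges are fixed up front, a new user lands in a group without refitting anything.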
Predicting on 100 features can be expensive! Trees help
Using random forest models and decision trees can speed things up
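One common way tree models help with a wide feature set is to rank features by importance and keep a shortlist. The sketch below assumes NumPy and scikit-learn are installed (both ship with Anaconda) and uses synthetic data in which only one feature drives the label. It illustrates the technique and is not the model built for the client.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 10 random features, only feature 3 determines the label.
rng = np.random.RandomState(0)
n = 500
X = rng.rand(n, 10)
y = (X[:, 3] > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Rank features by impurity-based importance, most important first.
ranking = np.argsort(forest.feature_importances_)[::-1]
top_features = ranking[:3]
```

Impurity-based importances can favour high-cardinality features, so a shortlist built this way is worth checking against held-out performance.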
We can then use a confusion matrix to test the ‘predictability’ of the behaviour we are interested in: responding!
0: Frequency=0
1: Frequency=1
2: Frequency>1
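A confusion matrix counts, for each actual class, how often each class was predicted. A minimal sketch for the three frequency classes, using made-up actual counts and predictions:

```python
def frequency_class(n_responses):
    """Map a response count to 0 (none), 1 (exactly one) or 2 (more than one)."""
    return min(n_responses, 2)

def confusion_matrix(actual, predicted, n_classes=3):
    """Rows are actual classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Toy data: observed response counts and hypothetical model predictions.
actual = [frequency_class(n) for n in [0, 0, 1, 4, 1, 0, 2]]
predicted = [0, 1, 1, 2, 0, 0, 2]

cm = confusion_matrix(actual, predicted)
# Accuracy is the share of cases on the diagonal.
accuracy = sum(cm[i][i] for i in range(3)) / len(actual)
```

Off-diagonal cells show which classes the model confuses, which a single accuracy figure hides.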
If a model is too good to be true, it usually is, so be wary of adding synthetic features which strongly imply the behaviour being predicted
0: Frequency=0
1: Frequency=1
2: Frequency>1
Including a vector ‘invalid response’ clearly indicates they will respond
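A crude way to catch this kind of leakage is to check whether a single feature almost fully determines the target on its own. The sketch below is a rough heuristic under assumed field names and an arbitrary threshold. It will also flag identifier-like columns and is not meaningful on heavily imbalanced targets, so treat a flag as a prompt to inspect the feature, not a verdict.

```python
from collections import defaultdict

def leaky_features(rows, target, threshold=0.99):
    """Flag features where each value maps almost entirely to one target class."""
    flagged = []
    features = [k for k in rows[0] if k != target]
    for f in features:
        by_value = defaultdict(lambda: defaultdict(int))
        for r in rows:
            by_value[r[f]][r[target]] += 1
        # Share of rows explained by guessing the majority class per value.
        explained = sum(max(c.values()) for c in by_value.values()) / len(rows)
        if explained >= threshold:
            flagged.append(f)
    return flagged

# Toy data: an 'invalid response' can only exist if the user responded.
rows = [
    {"invalid_response": 1, "region": "EU", "responded": 1},
    {"invalid_response": 0, "region": "EU", "responded": 0},
    {"invalid_response": 1, "region": "US", "responded": 1},
    {"invalid_response": 0, "region": "US", "responded": 0},
    {"invalid_response": 0, "region": "EU", "responded": 0},
    {"invalid_response": 1, "region": "US", "responded": 1},
]
suspects = leaky_features(rows, "responded")
```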
Cross-validation is a good idea
● The data is randomly partitioned into k equal-sized subsamples.
● A single subsample is retained as the validation data for testing, and the remaining k − 1 subsamples are used as training data.
● The k results from the folds can then be averaged to produce a single estimate.
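The three steps above can be sketched in a few lines of plain Python. The scoring function is a placeholder supplied by the caller; `k_fold_indices` and `cross_validate` are names introduced here for illustration.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random partition
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

def cross_validate(score_fn, n_samples, k=5):
    """Average the score returned by score_fn(train, validation) over k folds."""
    scores = [score_fn(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / k

splits = list(k_fold_indices(10, 5))
```

In practice `score_fn` would fit a model on the training indices and return a metric computed on the validation indices.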
Features which influence consumer response
● We identified 7 features which strongly influence a user's likelihood of response.
● User behaviour & language dominate, campaign features are important and time features also have an influence*
Campaigns
18% of users are responsible for 78% of responses. These users have distinctly different influencing characteristics compared to the general population
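A concentration figure of this kind can be computed by ranking users by response count and summing the top slice. The sketch below uses made-up counts, so its result is not the 18%/78% split reported here.

```python
def top_share(response_counts, user_fraction):
    """Share of all responses produced by the most active fraction of users."""
    ranked = sorted(response_counts, reverse=True)
    n_top = max(1, int(round(len(ranked) * user_fraction)))
    total = sum(ranked)
    return sum(ranked[:n_top]) / total if total else 0.0

# Toy data: responses per user for ten users.
counts = [40, 30, 8, 5, 5, 4, 3, 3, 1, 1]
share = top_share(counts, 0.2)
```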
Campaigns
Data collection can be a quick win
Some feature variables were unusable
Valid date of birth collection (total records, valid records, valid share):
● 14,944 total, 569 valid, 3.8%
● 669,589 total, 348,917 valid, 51%
● 4,968 total, 399 valid, 8%
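A date-of-birth check at collection time can be as simple as parsing the value and testing that the implied age is plausible. In the sketch below the expected format, the age bounds and the reference date are all assumptions made for illustration.

```python
from datetime import date, datetime

def valid_dob(raw, today, min_age=13, max_age=110):
    """True if raw parses as YYYY-MM-DD and implies a plausible age."""
    try:
        dob = datetime.strptime(raw, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return False  # missing, empty or wrongly formatted value
    # Subtract one year if this year's birthday has not happened yet.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return min_age <= age <= max_age

today = date(2015, 10, 24)  # assumed reference date
dobs = ["1985-03-14", "1900-01-01", "", None, "2014-06-01", "14/03/1985"]
valid_share = sum(valid_dob(d, today) for d in dobs) / len(dobs)
```

Tracking `valid_share` per campaign gives a direct measure of whether a change to the collection form is working.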
Increasing profile collection data quality, targeting high-value users by behaviour and encouraging people to respond more than once will increase campaign revenue by T%
Find the whales
Apply automated data quality checks on user profile gathering
Generate and target ‘high-value’ cohorts based on behaviour
Identify offers to encourage low-value cohorts to respond more than once
User profile revenue
Applying predictive modelling to 100 campaigns with average earnings of €XK will yield an extra €YK
The following activities will deliver percentage benefits in inbound messages
R%
P%
Q%