Database Marketing
Dr. Ron Rymon
Marketing Communications Program
IDC, Herzliya
Overview
Goal: describe the framework, and touch on the current trends and buzzwords
Outline:
• Uses of the marketing database
• The data
• Implementation technologies
• Analysis techniques
• Modeling techniques
Uses of the Marketing Database
The Marketing Database
• Comprehensive collection of interrelated data ...
• Arranged around each customer ...
• Allow timely and accurate retrieval ...
• Support analytical, predictive, operational needs ...
• Serving multiple applications …
Active Database: An Integrated Business Resource
[Diagram: a central Database shared by Marketing, Distribution, Customer Service, Sales, Research, and Finance]
Information is Power: Active databases drive the business…
• Identify your best customers
– profitability analysis, clustering
• Develop new customers and cross-sell
– similar to current, identify competitors’ customers
• Improve delivery of sales promotion
– response modeling / targeting
• Personalize message
– based on purchase patterns, volume
• Use as a research tool across the organization
– customer, product, market research
Key Building Blocks
• Data
– a database is only as powerful as its data
• Implementation technologies
– hardware, networking, warehouses
• Analysis techniques
– RFM, LTV, OLAP, Segmentation, Visualization
• Modeling
– Regressions, Artificial Intelligence, Data Mining
The Data
A database is only as powerful as the data it houses.
Customer-centric Database
• Every database is a collection of records
• Each record is a collection of fields
• Here, one record per customer and/or prospect
– unique identifier
– general customer/account information
  • including demographic, psychographic, socio-economic
– our offers + communications to the customer
– customer’s actions: response, purchase, payment
What Data To Hold
• Too often, data is collected based on availability, and not based on projected need
• Should accumulate internally
– data that can be used to support current and future strategies (marketing and otherwise, e.g., operations)
– data that may be valuable to other organizations
• Should source external data when it is
– unavailable internally
– too expensive to maintain/update
What’s in YOUR database?
Data Requirements
[Diagram: data types (Basic, Demo, Usage, Payment) set against marketing and other applications (Targeting, Retention, Usage upgrading)]
Data Sources: Internal
• Operations / Sales
– past usage/purchase, e.g., amount, variability
• Finance
– payments, e.g., timeliness, amounts
• Customer service
– e.g., inquiries, complaints
• Other data collection methods:
– sales/orders, promotions/drawings, inquiries, surveys, warranty cards, research panels...
Data Sources: Distribution Channels
• Many companies use distributors, retailers
• Problem: lack of direct communications with end-customer, no “relationship”
• Partial solution: keep tabs on the channel and aggregate statistics on customers
• More aggressive solution: special marketing programs to reach customers
Data Sources: External Lists
• 50% of U.S. direct marketers (DMers) sell their lists
• Use to enlarge universe: new names
– can buy segments by specific features (model)
• Enhance data: cross information
– U.S. census data
– Credit bureau
– Various marketers of related products
– List compilers / maintainers / sellers
[Diagram: a sample list (John Smith, 60, 10.2; Eric Cohen, 35, 1.3; Jack Marshal, 20, 0) annotated with "enhance data" and "enlarge universe"]
Other Data Sources
• Mass-advertised offers
– TV call-ins, direct response
• Joint offers with other merchants
– take-one brochures in banks, restaurants
– drawings
• Trade shows, happenings, community activity
• Referrals!
Data Management
• Many sources:
– conversions, transformations, cleaning, merge-purge
• Many “clients”
– marketing, sales, product managers, operations
• Temporal issues
– updates, audits, archives/deltas
• Quantities: huge databases
– Storage, access, processing, communications
• Resolution, Enhancement
Merge-Purge : Example
• Palmer, Robert and Mary, 123 Sun Avenue, Apt 7, Key West, FL 31250
• Dr. Robert C. Palmer, Custom Engineers LLC, 123 Sun Avenue, 7th Floor, Key West, FL 31250
• Rob Charles Palmer, CE Inc., 123 Sun Ave #7, Key West FL 31250
• Bob Palmer Jr., 123 Sun #7, Key West, FL 31252
• Maria Palmer, 123 Sun Avenue, Suite 7, Key West, FL 31250
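As a rough illustration of what a merge-purge pass has to do with records like these, the sketch below builds a household-level match key from the surname, house number, street name, and unit number. The key design, the `match_key` and `surname` helpers, and the decision to ignore the ZIP code and company names are all assumptions made for this example; a production system would use far richer rules (nicknames, phonetic matching, postal standardization).

```python
import re
from collections import defaultdict

# The five records from the example, reduced to (name, street address, ZIP).
# Company names are left out of this sketch.
RECORDS = [
    ("Palmer, Robert and Mary", "123 Sun Avenue, Apt 7", "31250"),
    ("Dr. Robert C. Palmer", "123 Sun Avenue, 7th Floor", "31250"),
    ("Rob Charles Palmer", "123 Sun Ave #7", "31250"),
    ("Bob Palmer Jr.", "123 Sun #7", "31252"),
    ("Maria Palmer", "123 Sun Avenue, Suite 7", "31250"),
]

SUFFIXES = {"jr", "sr", "ii", "iii"}


def surname(name):
    """Guess the surname: text before a comma, else the last non-suffix word."""
    if "," in name:
        return name.split(",")[0].strip().lower()
    words = [w.strip(".").lower() for w in name.split()]
    words = [w for w in words if w not in SUFFIXES]
    return words[-1]


def match_key(name, street):
    """Hypothetical household key: (surname, house number, street name, unit)."""
    numbers = re.findall(r"\d+", street)  # e.g. ['123', '7']
    house = numbers[0]
    unit = numbers[1] if len(numbers) > 1 else ""
    street_name = re.search(r"\d+\s+([A-Za-z]+)", street).group(1).lower()
    return (surname(name), house, street_name, unit)


# Group records that share a key; each group is a candidate household.
households = defaultdict(list)
for name, street, zip_code in RECORDS:
    households[match_key(name, street)].append((name, street, zip_code))

for key, members in households.items():
    print(key, "->", len(members), "records")
```

With this deliberately loose key, all five records fall into one candidate household. Whether Robert, Rob, and Bob are the same person, or whether Bob Palmer Jr. and the differing ZIP code indicate a separate individual, is exactly the judgment a real merge-purge process still has to make.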
Other Issues
• Legal
– Privacy Act
– Anti-discrimination
– Advertising Code
– Telephone Consumer Protection Act
• Consumer groups
– Right to be omitted (just write to the DMA, the Direct Marketing Association)
– Environmental issues
• DMA invests in education:
– DMers: best practice
– Customers: better image
Implementation Technologies
Computing Platforms
• Issues / Needs:
– Information sources + integration
– Storage/access, maintenance, completeness, update
– Computation: process queries, algorithms
– Analyses and reports, feed to operations, customer/user interaction
• Trends:
– Traditionally, all DMers used mainframes
– Today, some migration to mid-range (UNIX)
– PC-based computers gaining power (NT)
– Client/Server architectures
– Everything networked
Applications
• The database is foundational software
• Must support a variety of applications:
– transaction processing
– analyses
– on-line interaction
• Trends:
– Relational databases
– Data warehouses
– Data redundancy/multiplicity
[Diagram: software stack with layers labeled Database, Database Management, and O.S.]
Relational Database
Tables

Customers:
ID    Cust Name   Address      …
1234  John Brown  123 Main St  ….

Transactions:
ID     Date    Cust  Product  Quant.  Price
98765  3.5.98  1234  A703     5       150.00
98766  4.5.98  1234  A707     2       240.00
98767  4.5.98  1235  A703     1       30.00

Products:
ID    Product      Supplier  …..
A703  Levis Jeans  S7003     ….

Reports (SQL Queries)

Transactions for John Brown:
3.5.98  5  Levis Jeans  $150
4.5.98  2  CK Jacket    $240

Purchases of Levis Jeans:
3.5.98  John Brown  5  $150
4.5.98  Jane Doe    1  $30
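The two reports can be produced by joining the three tables. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names are assumptions chosen for the example, as are the rows the slide elides: Jane Doe's customer ID (1235) and the product name for A707 (CK Jacket) are inferred from the reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE products (id TEXT PRIMARY KEY, product TEXT, supplier TEXT);
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY, date TEXT, cust INTEGER,
    product TEXT, quant INTEGER, price REAL
);
-- Rows from the slide; Jane Doe's ID and the A707 product name are inferred.
INSERT INTO customers VALUES
    (1234, 'John Brown', '123 Main St'),
    (1235, 'Jane Doe', NULL);
INSERT INTO products VALUES
    ('A703', 'Levis Jeans', 'S7003'),
    ('A707', 'CK Jacket', NULL);
INSERT INTO transactions VALUES
    (98765, '3.5.98', 1234, 'A703', 5, 150.00),
    (98766, '4.5.98', 1234, 'A707', 2, 240.00),
    (98767, '4.5.98', 1235, 'A703', 1, 30.00);
""")

# Report 1: all transactions for one customer (price as stored in the table).
brown = conn.execute("""
    SELECT t.date, t.quant, p.product, t.price
    FROM transactions t
    JOIN customers c ON c.id = t.cust
    JOIN products p ON p.id = t.product
    WHERE c.name = ?
    ORDER BY t.id
""", ("John Brown",)).fetchall()

# Report 2: all purchases of one product.
levis = conn.execute("""
    SELECT t.date, c.name, t.quant, t.price
    FROM transactions t
    JOIN customers c ON c.id = t.cust
    JOIN products p ON p.id = t.product
    WHERE p.product = ?
    ORDER BY t.id
""", ("Levis Jeans",)).fetchall()

for row in brown:
    print(row)
for row in levis:
    print(row)
```

The point of the relational layout is that the same three tables serve both reports; only the `WHERE` clause changes.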
Data Warehouse
• Stores data for informational and analytical processing
– Separate from operations
– Subject-oriented
– Integrated
– Historical
[Diagram: Operational systems (loans, credit card, savings, investments) contrasted with the Data Warehouse (customer, product)]
Example: Computer-by-Mail Inc.
[Diagram components: House Files; Telemarketers (Client-Server); operations; Data Warehouse; Analyst; Mktg Executive]
Analysis Techniques
Data Limitations
• Important: The data is a limited encoding of reality
• Many potholes:
– Omission
– Errors, noise
– Representation
– Sampling bias
• One cannot be too careful!
Exploratory data analysis: Single-variable
• Descriptive statistics
– Mean, Median
– Variance
• Histograms
– Show distribution
[Chart: histogram over the bins 0-20, 20-30, 30-40, 40+; y-axis from 0% to 40%]
Exploratory data analysis: Multi-variable
• Examine relationship between two or more variables
– Cross tabs
– Correlation
– Scatter plots
– Clustered histograms
[Chart: clustered histogram comparing Buyer and NoBuy over the bins 0-20, 20-30, 30-40, 40+; y-axis from 0% to 45%]
RFM Analysis
• RFM score
– Recency: how recent the last purchase is
– Frequency: number of recent purchases
– Monetary: dollars spent recently
• Example:
– Recency: 10 pts if within 3 months, 1 pt lower per additional month, down to a minimum of 1
– Frequency: 1 pt for each purchase within 12 months
– Monetary: 1 pt for each $100 in past year, up to a maximum of 10
– Score = R*F*M, the higher the better
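The example scoring rule translates directly into a few lines of code. The sketch below is one reading of it: the function name `rfm_score` and its parameters are hypothetical, months are assumed to be whole numbers, and partial hundreds of dollars are assumed to earn no point.

```python
def rfm_score(months_since_last_purchase, purchases_last_12_months,
              dollars_last_12_months):
    """Score = R * F * M under the example rule (higher is better)."""
    # Recency: 10 points within 3 months, one point less per extra month, floor of 1.
    if months_since_last_purchase <= 3:
        recency = 10
    else:
        recency = max(1, 10 - (months_since_last_purchase - 3))

    # Frequency: one point per purchase in the last 12 months.
    frequency = purchases_last_12_months

    # Monetary: one point per full $100 spent in the past year, capped at 10.
    monetary = min(10, int(dollars_last_12_months // 100))

    return recency * frequency * monetary


# A customer who last bought 5 months ago, 4 times this year, spending $650:
# recency 8, frequency 4, monetary 6.
print(rfm_score(5, 4, 650))  # 192
```

Because the score is a product, a customer with no purchases in the past 12 months, or with less than $100 spent, scores zero under this reading regardless of the other components.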
Life-Time Value of Customers
• LTV Goal:
– recognize each customer’s contribution
• Method:
– calculate the “expected” net revenue
– discounted for:
  • risk of attrition
  • probability of sales
  • rate of money (the time value of money)
• Typically computed per 1,000 customers, if possible by segment
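One simple way to combine the three discounting factors is a year-by-year sum of expected, discounted net revenue. The sketch below assumes a constant yearly retention rate, purchase probability, and discount rate, and one purchase opportunity per year; the function `lifetime_value` and all of its parameters are hypothetical, and the numbers in the usage lines are illustrative only.

```python
def lifetime_value(net_revenue_per_sale, purchase_prob, retention_rate,
                   discount_rate, years):
    """Expected discounted net revenue from one customer over `years` years."""
    total = 0.0
    for year in range(1, years + 1):
        # Risk of attrition: share of customers still active at the start of the year.
        still_active = retention_rate ** (year - 1)
        # Probability of sales: expected revenue from an active customer.
        expected = still_active * purchase_prob * net_revenue_per_sale
        # Rate of money: discount back to present value.
        total += expected / (1 + discount_rate) ** year
    return total


# Illustrative inputs: $100 net per sale, 50% chance of a sale per year,
# 80% yearly retention, 10% discount rate, 5-year horizon.
per_customer = lifetime_value(100, 0.5, 0.8, 0.10, 5)
per_thousand = 1000 * per_customer
print(round(per_customer, 2), round(per_thousand, 2))
```

Running the same calculation with segment-specific retention and purchase rates gives the per-segment figures the slide recommends.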
OLAP Tools
• OLAP: On-Line Analytical Processing
• Goal: a database-driven system that provides Fast Analysis of Shared Multi-dimensional Information (definition from “The OLAP Report”)
– Fast
– Analysis: common business reports, statistics
– Shared: same information available to many users
– Multi-dimensional: every piece of information is multiply categorized
– Information
OLAP Tools
• Data represented internally as multi-dimensional cube
– e.g., customer’s attributes, purchases, payments, etc.
• The user chooses two dimensions to present at a time
– e.g., show $-sales, by geographic region and income
• Heavy use of hierarchical variables, with drilling capabilities:
– time: year, quarter, month, week, day, hour
– product: hardware, printers, small printers, PX-1000
– dollars: by ranges 0-1000, 1000-5000, 5000-25000, etc.
• Analyses, highlights of interesting cases, etc.
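The idea of viewing two dimensions of a cube at a time, then drilling down, can be sketched without any OLAP product. In the example below the fact records, the dimension names, and the `two_dim_view` helper are all hypothetical; a real OLAP tool would precompute and index these aggregates rather than scan the facts on every request.

```python
from collections import defaultdict

# Hypothetical fact records: each sale is categorized along several dimensions.
FACTS = [
    {"region": "North", "income": "high", "quarter": "Q1", "sales": 1200.0},
    {"region": "North", "income": "low", "quarter": "Q1", "sales": 300.0},
    {"region": "South", "income": "high", "quarter": "Q2", "sales": 800.0},
    {"region": "North", "income": "high", "quarter": "Q2", "sales": 500.0},
]


def two_dim_view(facts, row_dim, col_dim, measure="sales"):
    """Sum a measure over every (row_dim, col_dim) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for fact in facts:
        table[fact[row_dim]][fact[col_dim]] += fact[measure]
    return {row: dict(cols) for row, cols in table.items()}


# Show $-sales by geographic region and income.
view = two_dim_view(FACTS, "region", "income")
print(view)

# Drill down: restrict to one region, then break it out by quarter.
north_only = [f for f in FACTS if f["region"] == "North"]
drill = two_dim_view(north_only, "quarter", "income")
print(drill)
```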
OLAP Tools: Example Screen
Data Visualization Tools
• Many relationships are best communicated visually:
– histograms, pie-charts, scatter plots, graphs
– use color/texture, shapes
– temporal animations
• Visualization software allows
– single-variable over time
– one variable as a function of another
– interaction detection
– segmentation
Modeling Techniques
Modeling Behavior
• Target variable
– a.k.a. dependent/modeled/explained variable
– typically, whether bought/responded or not
• Goal:
– Use other variables in a model to classify/predict
– other variables: a.k.a. independent, observable, explaining
– model: formula, algorithm
• Success criterion: future performance
Modeling and Validation Framework
• Data flow:
– Historical data
– Modeling software
– Constructs model
– Tested on more historical data
– Repeat until satisfied
– Use model to predict
[Diagram: Training Set → Model; Production or Test Set → Model → Output]
Critical Success Factors
• Choice of data
– scope: same/similar period, audience, offer, communication
– explaining variables: available, useful, well-represented
• Choice of modeling technique
– appropriate for the goal
– powerful: good fitting power
• Careful and “pessimistic” testing and validation
Validation
• NEVER test on the same data set used to build the model
– avoid “memorizing” the data, overfitting
• Out-of-sample methods
– separate training set and test set
– cross-validation (closely related to jack-knifing)
– remember temporal aspect
• Evaluate the model’s robustness
– estimate chance probability, bootstrapping
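Two of the out-of-sample methods can be sketched in a few lines: k-fold cross-validation, where every record is held out exactly once, and a temporal split, where the model is built on earlier data and tested on later data. Both helpers below (`k_fold_splits`, `temporal_split`) and the sample records are hypothetical; in practice records would usually be shuffled before folding.

```python
def k_fold_splits(n_records, k):
    """Yield (train_indices, test_indices) pairs; each record is tested once."""
    folds = [list(range(start, n_records, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


def temporal_split(records, cutoff_date):
    """Train on records before the cutoff, test on records from the cutoff on.

    Each record is a tuple whose first element is an ISO date string.
    """
    train = [r for r in records if r[0] < cutoff_date]
    test = [r for r in records if r[0] >= cutoff_date]
    return train, test


splits = list(k_fold_splits(10, 3))
for train, test in splits:
    print("train:", train, "test:", test)

# Hypothetical mailing history: (date, responded).
history = [("1998-01-10", 1), ("1998-03-05", 0), ("1998-05-04", 1)]
earlier, later = temporal_split(history, "1998-04-01")
print(len(earlier), "training records,", len(later), "test records")
```

The temporal split matters for the slide's "remember temporal aspect" point: a model that will be used on future mailings is most honestly tested on data that postdates its training data.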
Classification v. Prediction Systems
• Classification systems:
– distinguish few types of customers, e.g., responded or not
– technically, target variable is discrete/categorical
– validation through “hit rate”
• Prediction systems
– predict probability of purchase, or purchase dollars
– technically, target variable is continuous
– validation through “closeness” measures
Linear Scoring Systems
• Use linear regression
• Coefficients evaluated using historical data
• Higher score interpreted as greater likelihood of responding
• Every coefficient measures “independent” contribution
• Classification variant: discriminant analysis
– e.g., predict response if score is above 0.3

$$\text{Score} = a \cdot \text{Income} + b \cdot \text{PastPurchase} + c \cdot \text{Age} + d$$
Logistic Regressions
• Logistic regression (logit)
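• Target variable
– historical data: 0 or 1
– future application: used as probability
• Independent variables: continuous or categorical
• Probit: variation that relies on normal distribution

$$\text{Probability} = \frac{e^{a \cdot \text{Income} + b \cdot \text{PastPurchase} + c \cdot \text{Age} + d}}{1 + e^{a \cdot \text{Income} + b \cdot \text{PastPurchase} + c \cdot \text{Age} + d}}$$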
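To make the logit form concrete, the sketch below computes a linear score from income, past purchases, and age, and passes it through the logistic function to obtain a response probability. The coefficient values are invented for illustration (a real model would estimate them from historical data), and the pairing of coefficients with variables is an assumption.

```python
import math

# Hypothetical coefficients; in practice these are fitted to historical data.
A_INCOME = 0.00002
B_PAST_PURCHASE = 0.3
C_AGE = -0.01
D_INTERCEPT = -2.0


def linear_score(income, past_purchases, age):
    """Weighted sum of the explaining variables plus an intercept."""
    return (A_INCOME * income + B_PAST_PURCHASE * past_purchases
            + C_AGE * age + D_INTERCEPT)


def logistic(z):
    """e^z / (1 + e^z), written in the equivalent form 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def response_probability(income, past_purchases, age):
    """Logit model: squash the linear score into the range (0, 1)."""
    return logistic(linear_score(income, past_purchases, age))


p = response_probability(income=50_000, past_purchases=2, age=40)
print(round(p, 3))
```

Unlike the raw linear score, the output is always between 0 and 1, which is why it can be read as a probability when the model is applied to new customers.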
Presenting and Evaluating Results
• Lift table

Top Scoring   % Respond   % Non-Respond
5%            26.8%       4.9%
10%           41.2%       9.8%
15%           52.4%       14.8%
20%           62.6%       19.8%
50%           87.9%       49.7%
75%           96.5%       74.8%
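A lift table of this kind can be built by ranking customers by model score and asking, for each cutoff, what share of all responders and of all non-responders falls in the top-scoring group. The sketch below assumes both groups are non-empty; the `lift_table` function, the scores, and the outcomes are hypothetical.

```python
def lift_table(scores, responded, cutoffs=(0.25, 0.5, 0.75, 1.0)):
    """Return (cutoff, share of responders, share of non-responders) rows."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    total_resp = sum(1 for _, r in ranked if r)
    total_non = len(ranked) - total_resp
    rows = []
    for cutoff in cutoffs:
        top = ranked[: int(cutoff * len(ranked))]   # top-scoring fraction
        resp = sum(1 for _, r in top if r)
        non = len(top) - resp
        rows.append((cutoff, resp / total_resp, non / total_non))
    return rows


# Hypothetical model scores and actual outcomes (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
responded = [1, 1, 0, 1, 0, 0, 1, 0]

table = lift_table(scores, responded)
for cutoff, resp_share, non_share in table:
    print(f"top {cutoff:.0%}: {resp_share:.0%} of responders, "
          f"{non_share:.0%} of non-responders")
```

A useful model captures responders faster than non-responders, so the second column should run well ahead of the third in the top rows.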
Presenting and Evaluating Results
• Lift curve, plotted here as a Receiver Operating Characteristic (ROC) curve
[Chart: % Responders (y-axis, 0 to 1) against % Non-Responders (x-axis, 0 to 1)]
Presenting and Evaluating Results
• Confusion Matrix (given a specific threshold)

                     Predicted Respond   Predicted Non-Respond
Actual Respond       Pr,r                Pn,r
Actual Non-Respond   Pr,n                Pn,n

• Accuracy = (Pr,r + Pn,n) / Total
• Detection = Pr,r / (Pr,r + Pr,n)
• Two error types: Pr,n and Pn,r
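The four cells and the two summary measures can be computed directly from scores and outcomes once a threshold is chosen. In the sketch below the cell keys follow the slide's notation (first letter is the prediction, second the actual outcome); the function names, scores, and outcomes are hypothetical. Note that the slide's "Detection" formula, taken as written, is the share of predicted responders who actually respond, which is often called precision.

```python
def confusion_matrix(scores, responded, threshold):
    """Count the cells: key = predicted letter + actual letter (r or n)."""
    cells = {"rr": 0, "nr": 0, "rn": 0, "nn": 0}
    for score, actual in zip(scores, responded):
        predicted = "r" if score >= threshold else "n"
        cells[predicted + ("r" if actual else "n")] += 1
    return cells


def accuracy(cells):
    """(Pr,r + Pn,n) / Total."""
    return (cells["rr"] + cells["nn"]) / sum(cells.values())


def detection(cells):
    """Pr,r / (Pr,r + Pr,n), as defined on the slide."""
    return cells["rr"] / (cells["rr"] + cells["rn"])


# Hypothetical model scores and actual outcomes (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
responded = [1, 1, 0, 1, 0, 0, 1, 0]

cells = confusion_matrix(scores, responded, threshold=0.55)
print(cells, accuracy(cells), detection(cells))
```

Moving the threshold trades one error type for the other: a lower threshold mails more non-responders (Pr,n), a higher one misses more responders (Pn,r).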
Break-even Analysis
• Issue: how much to mail?
• Solution: find break-even point
• e.g.:

% mail   % respond   Income   Cost of product   Cost of mail   Marginal Profit
5%       25%         $1,250   $750              $50            $450
10%      21%         $1,050   $630              $50            $370
15%      16%         $800     $480              $50            $270
20%      11%         $550     $330              $50            $170
25%      6%          $300     $180              $50            $70
30%      3%          $150     $90               $50            $10
35%      2%          $100     $60               $50            -$10

• Caution: use held-out data!
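The break-even point is the last mailing depth at which the marginal profit of the additional group mailed is still positive. The sketch below applies that rule to the figures from the table, with marginal profit computed as income minus cost of product minus cost of mail; the function names are hypothetical.

```python
# Rows from the table: (% mail, income, cost of product, cost of mail).
ROWS = [
    (5, 1250, 750, 50),
    (10, 1050, 630, 50),
    (15, 800, 480, 50),
    (20, 550, 330, 50),
    (25, 300, 180, 50),
    (30, 150, 90, 50),
    (35, 100, 60, 50),
]


def marginal_profit(income, product_cost, mail_cost):
    """Profit contributed by one additional group of mailed names."""
    return income - product_cost - mail_cost


def break_even_depth(rows):
    """Deepest % mailed whose marginal profit is still positive."""
    depth = 0
    for pct, income, product_cost, mail_cost in rows:
        if marginal_profit(income, product_cost, mail_cost) <= 0:
            break
        depth = pct
    return depth


profits = [marginal_profit(*row[1:]) for row in ROWS]
print(profits)                 # [450, 370, 270, 170, 70, 10, -10]
print(break_even_depth(ROWS))  # 30
```

With these figures, mailing should stop at the top-scoring 30%: the next group costs more than it brings in.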
Non-Linear Systems
• In regressions, a change in one independent variable always affects the target in the same direction
– e.g., if age affects positively, then the older the better, always
• One solution: transformations
– e.g., if U-shaped relation, use quadratic form
• Or, use non-linear techniques:
– Neural networks
– Decision trees
– Other: rule-based systems, genetic algorithms, Bayesian nets
Neural Networks
• Motivated by biological nervous system
• Perceptron = a model of a neuron
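$$\text{activation} = \frac{e^{\sum_i w_i x_i}}{1 + e^{\sum_i w_i x_i}}$$

[Diagram: five inputs x1…x5, each connected to a single neuron through a weight w1…w5]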
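A single neuron of this kind takes a weighted sum of its inputs and squashes it with a logistic function. The sketch below is a minimal illustration with five inputs; the weights and inputs are hypothetical, and the function name `activation` simply mirrors the label on the slide.

```python
import math


def activation(weights, inputs):
    """Logistic activation of the weighted input sum.

    Computes e^z / (1 + e^z) with z = sum of w_i * x_i,
    written in the equivalent form 1 / (1 + e^-z).
    """
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights w1..w5 and inputs x1..x5.
weights = [0.5, -0.2, 0.1, 0.0, 0.3]
inputs = [1.0, 2.0, 0.0, 4.0, 1.0]
print(round(activation(weights, inputs), 3))
```

With all weights at zero the neuron outputs 0.5 whatever the inputs; training consists of moving the weights away from such uninformative values.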
Classical Neural Net
• Multi-layer network of perceptrons
• Proper weights are “discovered”, starting from random values:
– forward propagation of training set
– compare output to actual target variable
– back propagation of error to adapt weights
Decision Trees
• Partition the data based on one attribute...

A  B  C  D  Resp.
0  0  1  0  Buy
0  1  1  1  Buy
1  0  1  0  No
1  1  0  0  Buy
1  1  1  1  No

[Diagram: the root is split on A into two branches, A=0 and A=1]
Induction of Decision Trees
Recursively, partition each of the nodes
[Diagram: the A=0 node holds rows (0 0 1 0 Buy) and (0 1 1 1 Buy); the A=1 node holds rows (1 0 1 0 No), (1 1 0 0 Buy), and (1 1 1 1 No)]
Induction of Decision Trees
…until the node is homogeneous
[Diagram: the A=1 node is split on C; the C=0 node holds row (1 1 0 0 Buy); the C=1 node holds rows (1 0 1 0 No) and (1 1 1 1 No)]
Classification
• Go down a matching path...
[Diagram: the instance (A=1, B=0, C=0, D=1) enters the tree at the root; the tree has leaves A=0 → Buy, A=1 and C=0 → Buy, A=1 and C=1 → No]
Classification
Continue...
[Diagram: the instance (A=1, B=0, C=0, D=1) follows the A=1 branch]
Classification
…until reaching a leaf. Use the leaf’s probability.
[Diagram: the instance (A=1, B=0, C=0, D=1) follows the C=0 branch and reaches a Buy leaf]
Set of Rules, or Market Segments
[Diagram: the decision tree, with each root-to-leaf path read off as a rule]
– A=0 => Buy
– A=1 and C=0 => Buy
– A=1 and C=1 => No
• Each rule represents a market segment
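The whole cycle of inducing a tree, classifying a new instance, and reading the tree as rules can be sketched on the five-row example table. The slides do not say which criterion picks the splitting attribute; the sketch below assumes an ID3-style choice (lowest weighted entropy after the split), and every function name is hypothetical. Leaves store only the majority class, not a probability.

```python
import math
from collections import Counter

# The example table: attribute values and the response.
ROWS = [
    ({"A": 0, "B": 0, "C": 1, "D": 0}, "Buy"),
    ({"A": 0, "B": 1, "C": 1, "D": 1}, "Buy"),
    ({"A": 1, "B": 0, "C": 1, "D": 0}, "No"),
    ({"A": 1, "B": 1, "C": 0, "D": 0}, "Buy"),
    ({"A": 1, "B": 1, "C": 1, "D": 1}, "No"),
]


def entropy(rows):
    counts = Counter(label for _, label in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def split(rows, attr):
    """Partition rows by their value for one attribute."""
    parts = {}
    for features, label in rows:
        parts.setdefault(features[attr], []).append((features, label))
    return parts


def induce(rows, attrs):
    """Recursively partition until a node is homogeneous (or attributes run out)."""
    labels = Counter(label for _, label in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]          # leaf: majority class

    def remainder(attr):                            # weighted entropy after split
        return sum(len(p) / len(rows) * entropy(p) for p in split(rows, attr).values())

    best = min(attrs, key=remainder)
    rest = [a for a in attrs if a != best]
    return {"attr": best,
            "branches": {v: induce(p, rest) for v, p in split(rows, best).items()}}


def classify(tree, features):
    """Go down the matching path until reaching a leaf."""
    while isinstance(tree, dict):
        tree = tree["branches"][features[tree["attr"]]]
    return tree


def rules(tree, path=()):
    """List each root-to-leaf path as a (condition, outcome) rule."""
    if not isinstance(tree, dict):
        return [(" and ".join(path), tree)]
    found = []
    for value, subtree in sorted(tree["branches"].items()):
        found += rules(subtree, path + (f"{tree['attr']}={value}",))
    return found


tree = induce(ROWS, ["A", "B", "C", "D"])
print(classify(tree, {"A": 1, "B": 0, "C": 0, "D": 1}))
for condition, outcome in rules(tree):
    print(condition, "=>", outcome)
```

On this table the entropy criterion happens to choose the same splits as the slides (A first, then C under A=1), so the instance (A=1, B=0, C=0, D=1) lands on a Buy leaf and the three extracted rules match the three market segments.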
Which Modeling Technique?
• Decision trees (CHAID, CART, C4.5)
– symbolic: model is interpretable as a set of rules
– essentially a segmentation
– useful when there are few “classes”, e.g., based on action (send/not, or few offer types)
• Regressions, neural nets
– numeric: allows fine-tuning, e.g., for prediction or ranking
– model is hard to interpret and is used as a “black box”
– useful when target is continuous
Data Mining
• Knowledge Discovery in Databases (KDD)
• KDD is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data
• Includes all steps of data preparation + management
• Data mining step uses statistical techniques, decision trees, neural nets, etc.
Summary
• Customer data can be leveraged to better understand and manage current customers, and target new ones
• Data analysis and visualization
– insights about our customers
– business economics
• Modeling
– “mined” insights
– classify/predict behavior