minha-hwang
Promotion Analytics in Consumer Electronics and High-tech Industries: Module 1 - Data
Workshop Overview
• Module 1: Data
• Module 2: Model and Estimation
• Module 3: Sample Output and Empirical Generalization
Outline
• Ideal Data for Promotion/Pricing Analytics: Scanner Data (in CPG)
• Data and Inference: What Can Go Wrong?
• Challenges and Common Mistakes in Consumer Electronics
• Data Requirement and Potential Data Source
Scanner Data (Store Audit Data)
How is Data Collected?
• Syndicated data providers: IRI and A.C. Nielsen
• Sample of stores (Grocery, Drug, Convenience, Mass Merchandiser, Warehouse stores)
• Scanner data
– UPC info (product features), (Retail) price, Quantity (Volume) all recorded
• Features
– Centrally collected and coded (daily)
• Displays
– Collected by store auditors (1X/week)
4 Data Dimensions
• The Data Cube
– Geography (Market) x Product x Time x Variable (Measure)
– G x P x T x V > 1,000,000 even for one category
• Aggregation (chain/regions, SKU groups, temporal)
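To get a feel for the size of the data cube and for what aggregation does to it, here is a minimal sketch. The dimension sizes, chain names, and records are hypothetical illustrations, not figures from IRI or Nielsen.

```python
from collections import defaultdict

# Hypothetical dimension sizes for a single category (illustrative only).
n_markets, n_upcs, n_weeks, n_measures = 50, 200, 104, 10
cube_cells = n_markets * n_upcs * n_weeks * n_measures
print(cube_cells)  # 10400000

# Toy store-level records: (chain, store, sku, week, units). All made up.
records = [
    ("ChainA", "s1", "sku1", 1, 12),
    ("ChainA", "s2", "sku1", 1, 8),
    ("ChainB", "s3", "sku1", 1, 5),
    ("ChainA", "s1", "sku1", 2, 20),
]

# Aggregate the geography dimension from store level to chain level.
chain_units = defaultdict(int)
for chain, _store, sku, week, units in records:
    chain_units[(chain, sku, week)] += units
```

The same pattern applies to SKU groups or to temporal aggregation by changing the grouping key.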
Scanner Data: Advantages
• Completeness
– Linking aggregate sales movements to marketing instruments (price, feature, display, etc.)
– Obtaining a richer set of performance measures beyond market share and factory shipments
• Timeliness
– Getting the data within a window that allows for meaningful managerial action (i.e., less than the old lag time of 8 weeks or more)
• Accuracy
Scanner Data: Limitations
• Not a complete sampling frame: excluded stores
– Small shops, Walmart!
• Hard to make causal statements without careful modeling: non-random assignment
• No information on consumer behaviors before purchases (e.g. search, consideration) and consumption after purchases
• No information on psychographics
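Promotion Analytics from Scanner Data
• A simplistic picture
[Figure: market share by week, weeks 1 to 8. Baseline share is 5%; share rises to 8% in the promotion week and falls to 4.8% and 4.5% in the surrounding weeks, labeled "Purchase Deceleration" and "Purchase Acceleration".]
Net Effect = (8 - 5) - 0.2 - 0.5 = 2.3%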
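The net-effect arithmetic on this slide can be retraced step by step. The sketch below uses only the market-share figures shown in the chart; the variable names are ours.

```python
# Market-share figures (in %) read off the stylized chart.
baseline = 5.0
promo_week = 8.0
dip_weeks = [4.8, 4.5]  # the two weeks with below-baseline share

gross_lift = promo_week - baseline                        # 3.0 points
borrowed = sum(baseline - share for share in dip_weeks)   # 0.2 + 0.5 points
net_effect = gross_lift - borrowed
print(round(net_effect, 1))  # 2.3
```

The gross lift of 3 points overstates the promotion's contribution because part of it is borrowed from the weeks around the promotion.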
Promotions: Actual data
[Figure: weekly market share and price over about 30 weeks, with promotion weeks annotated by type (FDC, FD).]
F = Feature, D = Display, C = Store Coupon
Promotion Types
• (End of Aisle) Display
• Feature
• Price-cut (BOGO)
• Coupon
1. Size of Data ≠ Information in Data
• Consider the following two options:
(1) Wal-Mart with 4,000 stores, 52 weeks of data, 500 SKUs (104 million observations!)
(2) Best Buy with 1,500 stores, 52 weeks of data, 500 SKUs (39 million observations)
• Which dataset would be more useful to measure price responses?
[Figure, two panels: weekly values (axis 0 to 25) for P1, P2, and P3 over weeks 1 to 52. First panel: Wal-Mart (EDLP). Second panel: Best-Buy (Hi-Lo).]
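A quick way to see why the larger dataset is not necessarily the more informative one is to compare observation counts with price variation. In the sketch below, the store, week, and SKU counts come from the slide; the two 8-week price paths are hypothetical stand-ins for an EDLP and a Hi-Lo retailer.

```python
from statistics import pvariance

def n_obs(stores, weeks, skus):
    """Number of store x week x SKU observations."""
    return stores * weeks * skus

walmart_obs = n_obs(4000, 52, 500)   # 104,000,000
bestbuy_obs = n_obs(1500, 52, 500)   # 39,000,000

# Hypothetical price paths for one SKU (illustrative only).
edlp_prices = [20.0] * 8                                          # never changes
hilo_prices = [20.0, 20.0, 15.0, 20.0, 20.0, 14.0, 20.0, 20.0]    # periodic deals

edlp_var = pvariance(edlp_prices)   # 0.0: no variation to learn from
hilo_var = pvariance(hilo_prices)   # positive

# An OLS price slope is cov(price, sales) / var(price). With zero price
# variance it is undefined, no matter how many observations there are.
price_effect_identified = {"EDLP": edlp_var > 0, "Hi-Lo": hilo_var > 0}
```

Under these assumptions the smaller Hi-Lo dataset is the one that can identify a price response.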
2. Pay Attention to Signal-to-Noise Ratio
• Consider the following measurement. Is there a significant impact from the marketing event?
• Well, it depends on the signal-to-noise ratio!
Revenue    Before Event    After Event    % Change
Average    10              13             30
[Figure, two panels, both titled "Revenue before/after Event", over about 120 observations: one with a revenue axis from 0 to 16, the other with a revenue axis from 0 to 120.]
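One common way to quantify the signal-to-noise ratio is a two-sample t statistic: the change in the average divided by its standard error. The sketch below uses Welch's version on two hypothetical pairs of series that both average 10 before and 13 after the event; the data and the function name are illustrative, not taken from the slides.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(before, after):
    """Welch two-sample t statistic for the difference in means."""
    se = sqrt(variance(before) / len(before) + variance(after) / len(after))
    return (mean(after) - mean(before)) / se

# Hypothetical low-noise revenue series (averages 10 and 13).
quiet_before = [9.5, 10.0, 10.5, 10.0, 9.8, 10.2]
quiet_after = [12.6, 13.0, 13.4, 13.0, 12.8, 13.2]

# Hypothetical high-noise revenue series with the same averages.
noisy_before = [1.0, 25.0, 4.0, 18.0, 2.0, 10.0]
noisy_after = [30.0, 2.0, 20.0, 1.0, 22.0, 3.0]

t_quiet = welch_t(quiet_before, quiet_after)   # large: clear signal
t_noisy = welch_t(noisy_before, noisy_after)   # small: lost in the noise
```

Both cases show the same 30% change in the average, yet only the low-noise case would pass a conventional significance threshold of about 2.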
3. Careful about Reverse Causality
• Imagine the following data generating process.
• If you run the analysis ignoring the reverse causality, you may conclude the following.
[Figure: scatter plot "Sales (m,t) vs. Adv (m,t)"; x-axis Adv (m,t) from 0 to 6, y-axis Sales (m,t) from 0 to 60.]
                   Coefficients    Standard Error    P-value
Intercept          -3.357941562    1.121241507       0.004875
Advertising (t)     9.716546286    0.354635984       3.58E-26
R square: 0.95 -> Good fit!
Significant impact of advertising?
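A small simulation shows how such a regression can look convincing even when advertising has no effect at all. The data generating process below is an assumption made for illustration (the one on the original slide is not reproduced here): sales are drawn independently of advertising, and the advertising budget is then set to roughly 10% of sales.

```python
import random

def ols(x, y):
    """Simple OLS of y on x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy ** 2 / (sxx * syy)

random.seed(0)

# Assumed process: sales do NOT depend on advertising.
sales = [random.uniform(10, 60) for _ in range(200)]

# Assumed budget rule: advertising is about 10% of sales (reverse causality).
adv = [0.1 * s + random.gauss(0, 0.2) for s in sales]

# Naive regression of sales on advertising.
intercept, slope, r2 = ols(adv, sales)
```

Under this assumed process the slope comes out large and the fit is tight, although the true effect of advertising on sales is zero by construction.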
4. Omitted Variables Can Be Dangerous
• Oftentimes, we don't have data on some important variables that can impact sales, revenue, or profits.
– Doing analytics while ignoring these "omitted variables" can lead to "biased" estimates of marketing mix effects.
• Think about the graph below (from NYT). Is family income really responsible for better academic achievement? What would be the potential omitted variable bias here?
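Omitted variable bias can be illustrated with a simulation in which the true effect is known. In the sketch below, the variable names, coefficients, and noise levels are all assumptions: a driver z affects both the marketing variable x and the outcome y, and the true effect of x on y is 1. Controlling for z is done by residualizing both x and y on z (the Frisch-Waugh-Lovell approach).

```python
import random

def slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

def residuals(x, y):
    """Residuals of y after a simple OLS on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

random.seed(1)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]       # driver that gets omitted
x = [zi + random.gauss(0, 1) for zi in z]        # marketing variable, correlated with z
y = [1.0 * xi + 3.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

naive = slope(x, y)                                     # z omitted: biased upward
controlled = slope(residuals(z, x), residuals(z, y))    # z controlled: close to 1
```

With these assumed coefficients the naive slope lands near 2.5 rather than 1, because x picks up the effect of the omitted z.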
5. Selection by Outcome: Bad Idea!
• Problem: Oftentimes, two groups that are defined by an outcome variable are compared to infer the causal impact of the marketing mix
• Example
– To calculate the ROI of a paid search campaign, advertisers compare the "conversion rates" of each "search" keyword. Usually, branded keywords are shown to have high conversion rates (> 6%) compared to generic keywords (~ 1%).
• How to fix the problem?
– Use a proper "control" condition!
– In the paid search example, all the traffic and conversions from consumers who click on branded keywords, and the resulting sales and profit, are attributed to paid search. An implicit assumption here is that all of those sales and profits would be lost without paid search. Really?
– It's possible that consumers who use branded keywords are already quite committed to purchase, and they may simply switch to unpaid (organic) search links if paid search is turned off.
– A proper control in this case is "halting selected search engine marketing keywords"
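The difference between attributed and incremental conversions can be made concrete with a holdout comparison. Every number in the sketch below is hypothetical; it assumes the periods with and without the branded keywords are otherwise comparable.

```python
# All figures are hypothetical and for illustration only.
paid_clicks = 10_000
paid_conversions = 600                 # conversions following a paid branded click
conversion_rate = paid_conversions / paid_clicks   # 6%, looks impressive

# Total conversions from branded queries (paid + organic).
total_conv_ads_on = 1_000    # period with branded keywords running
total_conv_ads_off = 950     # holdout period with those keywords halted

# Naive view: every paid conversion would be lost without the ads.
naive_attributed = paid_conversions

# Control-based view: only the drop in total conversions is incremental.
incremental = total_conv_ads_on - total_conv_ads_off
share_incremental = incremental / naive_attributed
```

In this made-up example only 50 of the 600 attributed conversions are incremental; the rest would have arrived through organic links anyway.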
Key Challenges
• There are no syndicated data providers such as IRI and Nielsen in Consumer Electronics
• Slightly better situation in North America or Europe
– NPD (U.S.) and GfK (Europe) provide market (or retail channel) level unit sales and price data by SKU
– However, they do not provide promotion details
– Even with promotion data, the use of market (or channel) level data can cause aggregation bias (i.e., overestimation of promotion effects)
• You have to assemble multiple datasets on your own
– At least 2 ~ 3 datasets need to be merged
– SKU-level unit sales data from ERP + external tracking service data (on price and promotion): half-blind (no sales info for competitors)
– Better data access if you are a category captain
– Most painful and time-consuming step: organizational silos
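The merge of ERP sales with external tracking data can be sketched as a join on SKU and week. The SKU codes, week labels, field names, and values below are hypothetical; real extracts will need their own key cleaning before they line up.

```python
# Hypothetical ERP extract: own-SKU unit sales keyed by (sku, week).
erp_sales = {
    ("TV-55A", "2012-W01"): 120,
    ("TV-55A", "2012-W02"): 310,
}

# Hypothetical external tracking data: price and promotion flags for
# own and competitor SKUs.
tracking = [
    {"sku": "TV-55A", "week": "2012-W01", "price": 999.0, "promo": 0},
    {"sku": "TV-55A", "week": "2012-W02", "price": 899.0, "promo": 1},
    {"sku": "COMP-55X", "week": "2012-W02", "price": 949.0, "promo": 0},
]

# Left join: keep every tracking row, attach units where ERP has them.
merged = []
for row in tracking:
    units = erp_sales.get((row["sku"], row["week"]))  # None for competitor SKUs
    merged.append({**row, "units": units})

# Rows without units are the "half-blind" part: competitor prices and
# promotions are observed, competitor sales are not.
half_blind_rows = [r for r in merged if r["units"] is None]
```

Competitor rows are still worth keeping, since their prices and promotions can enter the model as causal variables for the focal company's sales.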
Common Mistakes: For Discussion
• Use factory shipment data instead of retail sales data
– Biased promotion effect estimates due to forward buying by retailers
• Use cross-sectional data to measure price/promotion effects
– Biased price or promotion effect estimates due to omitted variable bias
– Better to use panel data and identify effects from within-store (or within-chain) variation
• Use market (or channel) level data
– Promotion effects are not homogeneous within a market (or channel)
– Due to aggregation bias, promotion effects will be overstated
– Better to use store, account, or chain-level data where promotion activities do not vary across units
• Use data with a short history (1 year or less)
– At least 2 ~ 3 years of data are required to properly measure seasonality
• Ignore price changes and promotions from competitors
– Biased estimates of baseline sales and price/promotion effects
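The advice to identify effects from within-store variation can be illustrated with a simulated panel. The setup below is an assumption for illustration: each store has an unobserved quality level that raises both its price level and its sales, and the true price effect is -2. The cross-sectional estimate uses store averages; the within-store estimate subtracts each store's own mean first.

```python
import random
from collections import defaultdict

def slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

random.seed(2)
true_effect = -2.0
by_store = defaultdict(list)
for store in range(50):
    quality = random.gauss(0, 1)        # unobserved store effect (assumed)
    base_price = 10 + 2 * quality       # better locations charge more
    for _week in range(52):
        price = base_price + random.uniform(-1.5, 0)   # occasional discounts
        sales = 100 + 20 * quality + true_effect * price + random.gauss(0, 1)
        by_store[store].append((price, sales))

mean_p, mean_s, within_p, within_s = [], [], [], []
for obs in by_store.values():
    mp = sum(p for p, _ in obs) / len(obs)
    ms = sum(s for _, s in obs) / len(obs)
    mean_p.append(mp)
    mean_s.append(ms)
    within_p += [p - mp for p, _ in obs]   # deviations from the store mean
    within_s += [s - ms for _, s in obs]

cross_sectional = slope(mean_p, mean_s)   # contaminated by store quality
within_store = slope(within_p, within_s)  # close to the true effect
```

In this setup the cross-sectional slope even has the wrong sign, while the within-store slope recovers a value near -2.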
Consumer Sales vs. Factory Shipments
[Figure: shipments and retail sales, 1978 to 1982, on an axis from 20,000 to 100,000, with promotion periods marked.]
Data Requirement
• Key elements of data
– Unit sales by SKU (outcome): ideally for the entire category (including competitors), but feasible with data for only the focal company's own SKUs
– Price measures by SKU (causal): focal company + competitors
– Promotion measures by SKU/product line/brand (causal): focal company + competitors
• Duration
– Ideally 3 years (of weekly data); at least 2 years of data
– To properly control for seasonality
• Level of aggregation
– Ideally store-level data; chain or account (chain-market combination) data can be used as long as promotion/price policies are uniform (within chain or account)
– Using market or channel-level data can overstate promotion effects due to aggregation bias
• Type of response data: Retail sales data (do not use factory shipment data)
– Due to forward buying by retailers
Potential Data Source: For Discussion
• Key elements of data
– Unit sales by SKUs (outcome)
– Price measures by SKU (causal)
– Promotion measures by SKU/product line/brand (causal)