Promotion Analytics in Consumer Electronics - Module 1: Data

Workshop Overview

• Module 1: Data

• Module 2: Model and Estimation

• Module 3: Sample Output and Empirical Generalization

Outline

• Ideal Data for Promotion/Pricing Analytics: Scanner Data (in CPG)

• Data and Inference: What Can Go Wrong?

• Challenges and Common Mistakes in Consumer Electronics

• Data Requirement and Potential Data Source

Scanner Data (Store Audit Data): How Is Data Collected?

• Syndicated data providers: IRI and A.C. Nielsen

• Sample of stores (Grocery, Drug, Convenience, Mass Merchandiser, Warehouse stores)

• Scanner data

– UPC info (product features), (Retail) price, Quantity (Volume) all recorded

• Features

– Centrally collected and coded (daily)

• Displays

– Collected by store auditors (1X/week)

4 Data Dimensions

• The Data Cube

– Geography (Market) x Product x Time x Variable (Measure)

– G x P x T x V > 1,000,000 even for one category

• Aggregation (chain/regions, SKU groups, temporal)

Scanner Data: Advantages

• Completeness

– Linking aggregate sales movements to marketing instruments (price, feature, display, etc.)

– Obtaining a richer set of performance measures beyond market share and factory shipments

• Timeliness

– Getting the data within a window that allows for meaningful managerial action (i.e. less than the old lag time of 8 weeks or more)

• Accuracy

Scanner Data: Limitations

• Not a complete sampling frame: excluded stores

– Small shops, Walmart!

• Hard to make causal statements without careful modeling: non-random assignment

• No information on consumer behaviors before purchases (e.g. search, consideration) and consumption after purchases

• No information on psychographics

Promotion Analytics from Scanner Data

• A simplistic picture

[Figure: market share by week, weeks 1 to 8. Baseline share is 5% and share in the promotion week is 8%; share dips to 4.8% and 4.5% in the surrounding weeks, labeled Purchase Deceleration and Purchase Acceleration.]

Net Effect = (8 - 5) - 0.2 - 0.5 = 2.3%
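The net-effect arithmetic in this simplistic picture can be written out as a small sketch. The function and variable names are illustrative assumptions; the numbers are the stylized market shares from the slide.

```python
# Stylized market shares from the slide, in percent.
baseline_share = 5.0
promo_week_share = 8.0
dip_shares = [4.8, 4.5]  # the two below-baseline weeks around the promotion


def net_promotion_effect(baseline, promo, dips):
    """Gross lift in the promotion week minus share lost in the dip weeks."""
    gross_lift = promo - baseline
    borrowed = sum(baseline - share for share in dips)
    return gross_lift - borrowed


net = net_promotion_effect(baseline_share, promo_week_share, dip_shares)
print(f"Net effect: {net:.1f} share points")
```

The gross lift of 3 points shrinks to about 2.3 once the sales borrowed from the neighboring weeks are subtracted.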

Promotions: Actual data

[Figure: weekly market share and price over about 30 weeks, with promotion weeks marked FDC or FD.]

F = Feature, D = Display, C = Store Coupon

Promotion Types

(End of Aisle) Display

Feature

Price-cut (BOGO)

Coupon

1. Size of Data ≠ Information in Data

• Consider the following two options:

(1) Wal-Mart with 4,000 stores, 52 weeks of data, 500 SKUs (104 million observations!)

(2) Best Buy with 1,500 stores, 52 weeks of data, 500 SKUs (39 million observations)

• Which dataset would be more useful to measure price responses?

[Figure: weekly prices of three products (P1, P2, P3) over 52 weeks, in two panels: Wal-Mart (EDLP) and Best Buy (Hi-Lo). Vertical axis from 0 to 25.]
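One way to see why the smaller dataset can be more informative: the textbook standard error of an OLS price slope is the noise standard deviation divided by the square root of the summed squared price deviations, so it depends on price variation, not just on the row count. The sketch below uses made-up price paths and a scaled-down number of stores; every number in it is an assumption for illustration.

```python
import math


def slope_standard_error(prices, noise_sd):
    """Textbook OLS slope standard error: noise sd / sqrt(sum of squared price deviations)."""
    mean_price = sum(prices) / len(prices)
    sxx = sum((p - mean_price) ** 2 for p in prices)
    return math.inf if sxx == 0 else noise_sd / math.sqrt(sxx)


WEEKS = 52
# Hypothetical EDLP retailer: more stores, but only one small permanent price change.
edlp_prices = [10.0 if week < 26 else 9.8 for _ in range(40) for week in range(WEEKS)]
# Hypothetical Hi-Lo retailer: fewer stores, but a deep discount every fourth week.
hilo_prices = [8.0 if week % 4 == 0 else 10.0 for _ in range(15) for week in range(WEEKS)]

noise_sd = 5.0  # assumed standard deviation of unexplained weekly sales
se_edlp = slope_standard_error(edlp_prices, noise_sd)
se_hilo = slope_standard_error(hilo_prices, noise_sd)

print(f"EDLP:  {len(edlp_prices)} obs, slope std. error {se_edlp:.3f}")
print(f"Hi-Lo: {len(hilo_prices)} obs, slope std. error {se_hilo:.3f}")
```

Under these assumptions the Hi-Lo data pin down the price slope more tightly even though the EDLP data have more rows.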

2. Pay Attention to Signal-to-Noise Ratio

• Consider the following measurement. Is there a significant impact from the marketing event?

Revenue    Before Event    After Event    % Change
Average    10              13             30

• Well, it depends on the signal-to-noise ratio!

[Figure: two panels, each titled "Revenue before/after event", with a horizontal axis from 0 to 120. The vertical axis runs from 0 to 16 in the first panel and from 0 to 120 in the second.]
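A minimal sketch of the point: the same jump in average revenue from 10 to 13 gives very different t-statistics depending on the period-to-period noise. The group size and the two noise levels below are assumptions chosen for illustration, not values read from the charts.

```python
import math


def two_sample_t(mean_before, mean_after, sd, n_per_group):
    """t-statistic for a difference in means with equal spread and equal group sizes."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return (mean_after - mean_before) / standard_error


mean_before, mean_after = 10.0, 13.0  # averages from the table
n_per_group = 60                      # assumed: 60 periods before, 60 after

t_low_noise = two_sample_t(mean_before, mean_after, sd=1.0, n_per_group=n_per_group)
t_high_noise = two_sample_t(mean_before, mean_after, sd=30.0, n_per_group=n_per_group)

print(f"t with sd = 1:  {t_low_noise:.2f}")
print(f"t with sd = 30: {t_high_noise:.2f}")
```

With a t-statistic of roughly 2 as the usual 5% threshold, the 30% change is clearly significant in the low-noise case and indistinguishable from noise in the high-noise case.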

3. Careful about Reverse Causality

• Imagine the following data generating process.

• If you run the analysis ignoring reverse causality, you may conclude the following.

[Figure: scatter plot "Sales (m,t) vs. Adv (m,t)"; horizontal axis Adv (m,t) from 0 to 6, vertical axis Sales (m,t) from 0 to 60.]

                   Coefficients    Standard Error    P-value
Intercept          -3.357941562    1.121241507       0.004875
Advertising (t)    9.716546286     0.354635984       3.58E-26

R square: 0.95 -> Good fit!

Significant impact of advertising?
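The simulation below sketches how such a regression can arise when advertising has no effect at all. The data generating process is an assumption made for illustration (sales move for unrelated reasons, and the ad budget is then set at roughly 10% of sales); it is not necessarily the process behind the numbers above.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for a one-regressor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)


rng = random.Random(0)
# Assumed process: sales are driven by factors unrelated to advertising ...
sales = [rng.uniform(5, 55) for _ in range(200)]
# ... and advertising is then budgeted at about 10% of sales, plus small noise.
adv = [0.1 * s + rng.gauss(0, 0.3) for s in sales]

intercept, slope, r_squared = simple_ols(adv, sales)
print(f"slope = {slope:.2f}, R squared = {r_squared:.2f}")
```

The fit looks excellent and the slope is large, yet by construction the causal effect of advertising on sales is zero: the regression recovers the budgeting rule, not an advertising response.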

4. Omitted Variables Can Be Dangerous

• Oftentimes, we don’t have data on some important variables that can affect sales, revenue, or profits.

– Doing analytics that ignores these “omitted variables” can lead to “biased” estimates of marketing mix effects.

• Think about the graph below (from NYT). Is family income really responsible for better academic achievement? What omitted variables could be biasing this comparison?
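The same mechanism can be sketched in a promotion setting. In the hypothetical example below, an unobserved holiday season raises sales and also makes promotions more likely; all effect sizes and week counts are assumptions chosen so the arithmetic is easy to follow.

```python
def mean(values):
    return sum(values) / len(values)


TRUE_PROMO_EFFECT = 2.0  # assumed causal lift from a promotion
HOLIDAY_EFFECT = 5.0     # assumed lift from the omitted variable (holiday season)

# Hypothetical weeks: promotions run in 80% of holiday weeks but only 20% of other weeks.
weeks = []
for holiday, n_promo, n_no_promo in [(1, 80, 20), (0, 20, 80)]:
    for promo, count in [(1, n_promo), (0, n_no_promo)]:
        sales = 10.0 + TRUE_PROMO_EFFECT * promo + HOLIDAY_EFFECT * holiday
        weeks += [(holiday, promo, sales)] * count


def promo_gap(rows):
    """Mean sales in promotion weeks minus mean sales in non-promotion weeks."""
    on = [s for _, p, s in rows if p == 1]
    off = [s for _, p, s in rows if p == 0]
    return mean(on) - mean(off)


naive = promo_gap(weeks)  # ignores the holiday variable
controlled = mean([promo_gap([w for w in weeks if w[0] == h]) for h in (0, 1)])

print(f"naive estimate: {naive:.1f}, holding holiday fixed: {controlled:.1f}")
```

Ignoring the holiday variable gives an estimate of 5, while comparing promotion and non-promotion weeks within the same season recovers the assumed effect of 2.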

5. Selection by Outcome: Bad Idea!

• Problem: Often, two groups that are defined by an outcome variable are compared to infer the causal impact of the marketing mix.

• Example

– To calculate the ROI of a paid search campaign, advertisers compare the “conversion rates” of each “search” keyword. Usually, branded keywords are shown to have high conversion rates (> 6%) compared to generic keywords (~ 1%).

• How to fix the problem?

– Use a proper “control” condition!

– In the paid search example, all the sales and profit resulting from the traffic and conversions of consumers who click on branded keywords are attributed to paid search. An implicit assumption here is that all of those sales and profits would be lost without paid search. Really?

– It’s possible that consumers who use branded keywords are already quite committed to purchase, and they may simply switch to unpaid (organic) search links if paid search is turned off.

– A proper control in this case is “halting selected search engine marketing keywords”.
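With such a control period, the incremental contribution of paid search is the change in total conversions, not the conversions that happened to arrive through paid clicks. The counts below are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical weekly conversions for branded keywords.
paid_on = {"paid_clicks": 600, "organic_clicks": 400}  # paid ads running
paid_off = {"paid_clicks": 0, "organic_clicks": 950}   # control: paid ads halted

# Naive attribution credits every conversion that came through a paid click.
naive_attributed = paid_on["paid_clicks"]

# Incremental conversions: total with paid search minus total without it.
incremental = sum(paid_on.values()) - sum(paid_off.values())
share_incremental = incremental / naive_attributed

print(f"naively attributed: {naive_attributed}")
print(f"incremental:        {incremental} ({share_incremental:.0%} of attributed)")
```

In this made-up case most paid-click customers simply move to organic links when the ads are halted, so only 50 of the 600 attributed conversions are truly incremental.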

Key Challenges

• There are no syndicated data providers such as IRI and Nielsen in Consumer Electronics

• Slightly better situation in North America or Europe

– NPD (U.S.) and GfK (Europe) provide market (or retail channel) level unit sales and price data by SKU

– However, they do not provide promotion details

– Even with promotion data, the use of market (or channel) level data can cause aggregation bias (i.e. overestimation of promotion effects)

• You have to assemble multiple datasets on your own

– At least 2 to 3 datasets need to be merged

– SKU-level unit sales data from ERP + external tracking service data (on price and promotion): half-blind (no sales info for competitors)

– Better data access if you are a category captain

– Most painful and time-consuming step: organizational silos
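The merge step and the "half-blind" outcome can be sketched as a join on SKU and week. The SKU names, field names, and values below are all hypothetical; real ERP and tracking extracts will need their own key cleaning before they line up.

```python
# Hypothetical ERP extract: own-SKU unit sales keyed by (sku, week).
erp_sales = {("OWN-TV-55", 1): 120, ("OWN-TV-55", 2): 310}

# Hypothetical tracking-service extract: price and promotion for own and competitor SKUs.
tracking = [
    {"sku": "OWN-TV-55", "week": 1, "price": 799.0, "promo": 0},
    {"sku": "OWN-TV-55", "week": 2, "price": 699.0, "promo": 1},
    {"sku": "RIVAL-TV-55", "week": 1, "price": 749.0, "promo": 0},
    {"sku": "RIVAL-TV-55", "week": 2, "price": 699.0, "promo": 1},
]


def merge_panel(tracking_rows, sales_by_key):
    """Left-join unit sales onto tracking rows; rows without ERP sales get units=None."""
    merged = []
    for row in tracking_rows:
        units = sales_by_key.get((row["sku"], row["week"]))
        merged.append({**row, "units": units})
    return merged


panel = merge_panel(tracking, erp_sales)
missing = [row for row in panel if row["units"] is None]

for row in panel:
    print(row)
```

Competitor rows keep their price and promotion measures (usable as causal variables) but carry no unit sales, which is exactly the half-blind situation described above.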

Common Mistakes: For Discussion

• Use factory shipment data instead of retail sales data

– Biased promotion effect estimates due to forward buying by retailers

• Use cross-sectional data to measure price/promotion effects

– Biased price or promotion effect estimates due to omitted variable bias

– Better to use panel data and identify effects from within-store (or within-chain) variation

• Use market (or channel) level data

– Promotion effects are not homogeneous within a market (or channel)

– Due to aggregation bias, promotion effects will be overstated

– Better to use store, account, or chain-level data, where promotion activities do not vary across the stores within each unit

• Use data with short history (1 year or less)

– At least 2 to 3 years of data are required to properly measure seasonality

• Ignore price changes and promotions from competitors

– Biased estimates of baseline sales and price/promotion effects
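A small worked example of the aggregation bias mentioned above, under stated assumptions: each store responds multiplicatively to its own promotion, only some stores promote in a given week, and the analyst fits a log-linear model to market totals using the share of stores on promotion. The baseline, multiplier, and two-store market are illustrative choices.

```python
import math

BASE_SALES = 100.0     # assumed weekly baseline per store
TRUE_MULTIPLIER = 3.0  # assumed: a promoted store sells 3x its baseline


def market_sales(promo_flags):
    """Total market sales when each store responds only to its own promotion."""
    return sum(BASE_SALES * (TRUE_MULTIPLIER if on else 1.0) for on in promo_flags)


def promo_share(promo_flags):
    """Share of stores on promotion, the usual market-level promotion measure."""
    return sum(promo_flags) / len(promo_flags)


# Two hypothetical stores; in promotion weeks only one of them promotes.
no_promo_week = [False, False]
half_promo_week = [True, False]

# Log-linear market-level model: log(total sales) = a + b * promo_share.
slope = (math.log(market_sales(half_promo_week)) - math.log(market_sales(no_promo_week))) / (
    promo_share(half_promo_week) - promo_share(no_promo_week)
)
implied_multiplier = math.exp(slope)

print(f"true store-level multiplier:      {TRUE_MULTIPLIER:.1f}")
print(f"multiplier implied by market fit: {implied_multiplier:.1f}")
```

The market-level fit implies a multiplier of about 4 against an assumed store-level value of 3, because the log of a sum across stores is not the sum of store-level logs.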

Consumer Sales vs. Factory Shipments

[Figure: promotion periods, factory shipments, and retail sales over time, 1978 to 1982; vertical axis from 20,000 to 100,000.]

Data Requirement

• Key elements of data

– Unit sales by SKU (outcome): ideally for the entire category (including competitors), but often feasible only for the focal company’s own SKUs

– Price measures by SKU (causal): focal company + competitors

– Promotion measures by SKU/product line/brand (causal): focal company + competitors

• Duration

– Ideally 3 years (of weekly data); at least 2 years of data

– To properly control for seasonality

• Level of aggregation

– Ideally store-level data; chain or account (chain-market combination) data can be used as long as promotion/price policies are uniform (within chain or account)

– Using market or channel-level data can overstate promotion effects due to aggregation bias

• Type of response data: Retail sales data (do not use factory shipment data)

– Due to forward buying by retailers
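The duration requirement above is easy to check before any modeling starts. The sketch below assumes 52 weeks per year and a hypothetical count of weekly observations per store-SKU series; the labels and series names are illustrative.

```python
WEEKS_PER_YEAR = 52
MIN_WEEKS = 2 * WEEKS_PER_YEAR    # at least 2 years of weekly data
IDEAL_WEEKS = 3 * WEEKS_PER_YEAR  # ideally 3 years of weekly data


def history_status(n_weeks):
    """Classify a weekly series by how well it supports seasonality controls."""
    if n_weeks >= IDEAL_WEEKS:
        return "ideal"
    if n_weeks >= MIN_WEEKS:
        return "minimum"
    return "too short"


# Hypothetical number of weekly observations per (store, SKU) series.
series_lengths = {
    ("store_1", "sku_a"): 160,
    ("store_1", "sku_b"): 110,
    ("store_2", "sku_a"): 48,
}

report = {key: history_status(n) for key, n in series_lengths.items()}
for key, status in report.items():
    print(key, status)
```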


Potential Data Source: For Discussion

• Key elements of data

– Unit sales by SKUs (outcome)

– Price measures by SKU (causal)

– Promotion measures by SKU/product line/brand (causal)
