Consumer Behavior Prediction using Parametric and Nonparametric Methods


Elena Eneva, CALD Masters Presentation

19 August 2002

Advisors: Alan Montgomery, Rich Caruana, Christos Faloutsos

Outline

Introduction
Data
Economics Overview
Baseline Models
New Hybrid Models
Results
Conclusions and Future Work

Background

Retail chains are aiming to customize prices in individual stores

Pricing strategies should adapt to the neighborhood demand

Stores can increase operating profit margins by 33% to 83%

Price Elasticity

the consumer’s response to a price change:

    E = (percent change in Q) / (percent change in P)

where Q is the quantity purchased and P is the price of the product. Demand is inelastic when |E| < 1 and elastic when |E| > 1.
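As a quick numeric sketch of the elasticity definition above (the function name and the numbers are illustrative, not from the talk):

```python
def price_elasticity(q0, q1, p0, p1):
    """Arc price elasticity: percent change in quantity
    divided by percent change in price."""
    return ((q1 - q0) / q0) / ((p1 - p0) / p0)

# Price rises 10% (2.00 -> 2.20) and quantity falls 15% (100 -> 85):
e = price_elasticity(100, 85, 2.00, 2.20)
# e is about -1.5, so |e| > 1 and demand is elastic.
```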

Data Example

[Scatter plot: quantity (0 to 100,000) against price (0.02 to 0.06).]

Data Example – Log Space

[Scatter plot: ln(quant) (2.75 to 5.25) against ln(price) (−1.58 to −1.28).]

Assumptions

Independence – substitutes (fresh fruit, other juices), other stores
Stationarity – change over time, holidays

“The” Model

Inputs: the category’s prices – Price of Product 1, Price of Product 2, Price of Product 3, . . ., Price of Product N

Predictor (“I know your customers”)

Outputs: Quantity bought of Product 1, Quantity bought of Product 2, Quantity bought of Product 3, . . ., Quantity bought of Product N

Need to multiply this across many stores and many categories.

Converting to Original Space

The model is fit in log space (convert to ln space, predict, then convert back to the original space):

    ln(q) = f(ln(p)) + ε,   ε ~ N(0, σ²)

so the log-space prediction and its distribution are

    hat(ln q) = f(ln(p))
    ln(q) | f(ln(p)) ~ N(f(ln(p)), σ²)

Because q is lognormally distributed, the mean prediction in the original space picks up a variance correction:

    hat(q) = E[q] = exp(hat(ln q)) · exp(hat(σ)² / 2)
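The lognormal mean correction can be sketched in a few lines of Python (function name and the numbers are illustrative):

```python
import numpy as np

def to_original_space(ln_q_hat, sigma2_hat):
    """Mean of a lognormal variable: E[q] = exp(mu + sigma^2 / 2).
    Naively exponentiating the log-space prediction underestimates demand."""
    return np.exp(ln_q_hat + sigma2_hat / 2.0)

# With a log-space prediction ln(q) = 10 and residual variance 0.25,
# the corrected estimate exceeds the naive exp(10) by a factor exp(0.125).
naive = np.exp(10.0)
corrected = to_original_space(10.0, 0.25)
```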

Existing Methods

Traditionally – using parametric models (linear regression)

Recently – using non-parametric models (neural networks)

Our Goal

Advantage of LR: known functional form (linear in log space), extrapolation ability

Advantage of NN: flexibility, accuracy

[Diagram: models placed on robustness vs. accuracy axes – LR is robust, NN is accurate, and the new hybrids aim for both.]

Take Advantage: use the known functional form to bias the NN

Build hybrid models from the baseline models

Datasets

weekly store-level cash register data at the product level

Chilled Orange Juice category

2 years, 12 products, 10 randomly selected stores

Evaluation Measure

Root Mean Squared Error (RMS): the average deviation between the predicted quantity and the true quantity:

    RMS error = sqrt( (1/N) Σ_{i=1..N} (hat(q)_i − q_i)² )
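A minimal implementation of this error measure, assuming NumPy (the function name is mine, not from the talk):

```python
import numpy as np

def rms_error(q_pred, q_true):
    """Root mean squared error between predicted and true quantities."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_true = np.asarray(q_true, dtype=float)
    return np.sqrt(np.mean((q_pred - q_true) ** 2))

# rms_error([3, 4], [0, 0]) is sqrt((9 + 16) / 2) = sqrt(12.5)
```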

Models

Hybrids – Smart Prior, MultiTask Learning, Jumping Connections, Frozen Jumping Connections

Baselines – Linear Regression, Neural Networks

Baselines

Linear Regression

Neural Networks

q is the quantity demanded, p_i is the price of the i-th product, and there are K products overall. The coefficients a and b_i are determined by minimizing the sum of squared residuals.

Linear Regression

    ln(q) = a + Σ_{i=1..K} b_i ln(p_i) + ε,   ε ~ N(0, σ²)
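The log-log regression can be sketched with ordinary least squares in NumPy; the coefficients, noise level, and price range below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 500                        # K products, n weekly observations
a_true = 8.0
b_true = np.array([-2.0, 0.5, 0.3])  # own-price coefficient is negative

ln_p = rng.uniform(-1.6, -1.2, size=(n, K))          # log prices
ln_q = a_true + ln_p @ b_true + rng.normal(0, 0.1, size=n)

# Ordinary least squares in log space: ln(q) = a + sum_i b_i ln(p_i)
X = np.column_stack([np.ones(n), ln_p])
coef, *_ = np.linalg.lstsq(X, ln_q, rcond=None)
a_hat, b_hat = coef[0], coef[1:]
```

With enough data the fitted coefficients recover the generating ones up to noise.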

Linear Regression

Results (RMS): [bar chart of RMS error, scale 0–12,000, for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]

Neural Networks

generic nonlinear function approximators

a collection of basic units (neurons), computing a (non)linear function of their input

backpropagation

Neural Networks

1 hidden layer, 100 units, sigmoid activation function
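A forward pass through this baseline architecture might look as follows in NumPy (the weights here are random placeholders; the real network is trained with backpropagation):

```python
import numpy as np

def mlp_forward(ln_p, W1, b1, W2, b2):
    """One hidden layer with sigmoid activation and a linear output:
    the baseline architecture (1 hidden layer, 100 units)."""
    h = 1.0 / (1.0 + np.exp(-(ln_p @ W1 + b1)))   # sigmoid hidden layer
    return h @ W2 + b2                            # ln(q) prediction

rng = np.random.default_rng(0)
K, H = 12, 100                                    # 12 products, 100 hidden units
W1 = rng.normal(0, 0.1, (K, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 1)); b2 = np.zeros(1)

ln_q_hat = mlp_forward(rng.normal(size=(5, K)), W1, b1, W2, b2)  # 5 observations
```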

Results (RMS): [bar chart of RMS error, scale 0–12,000, for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]

Hybrids

Smart Prior MultiTask Learning Jumping Connections Frozen Jumping Connections

Smart Prior

Idea: start the NN at a “good” set of weights – help it start from a “smart” prior.

Take this prior from the known “linearity”: the NN is first trained on synthetic data generated by the LR model, then trained on the real data.
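The synthetic-data step might be sketched like this (the LR coefficients are placeholders, and the actual NN pretraining step is only indicated in comments):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
a_hat = 8.0
b_hat = np.array([-2.0, 0.5, 0.3])   # coefficients of a fitted LR model (illustrative)

# Step 1: generate synthetic training data by labelling random price
# vectors with the LR model's (noiseless) predictions.
ln_p_syn = rng.uniform(-1.6, -1.2, size=(2000, K))
ln_q_syn = a_hat + ln_p_syn @ b_hat

# Step 2 (not shown): train the NN on (ln_p_syn, ln_q_syn) until it mimics
# the linear model, then continue training on the real data.
```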

Smart Prior

Results (RMS): [bar chart of RMS error, scale 0–12,000, for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]

Multitask Learning

Idea: learn an additional related task in parallel, using a shared representation.

Add the output of the LR model (built over the same inputs) as an extra output of the NN, and make the net share its hidden nodes between both tasks. This requires a custom halting function and a custom RMS function.
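The shared-representation idea can be sketched as a two-headed forward pass (NumPy, placeholder weights; the auxiliary head is trained against the LR model's output):

```python
import numpy as np

def mtl_forward(x, W1, b1, W2, b2, W_aux, b_aux):
    """Two output heads share one hidden layer: the main head predicts the
    real ln(q), the auxiliary head predicts the LR model's output."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))      # shared sigmoid hidden layer
    return h @ W2 + b2, h @ W_aux + b_aux

rng = np.random.default_rng(0)
K, H = 12, 100
x = rng.normal(size=(4, K))
y_main, y_aux = mtl_forward(x,
                            rng.normal(0, 0.1, (K, H)), np.zeros(H),
                            rng.normal(0, 0.1, (H, 1)), np.zeros(1),
                            rng.normal(0, 0.1, (H, 1)), np.zeros(1))
```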

MultiTask Learning

Results (RMS): [bar chart of RMS error, scale 0–12,000, for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]

Jumping Connections

Idea: fuse LR and NN.

Change the architecture: add connections which “jump” over the hidden layer. This gives the effect of simulating an LR and an NN together.
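A sketch of the jumping-connection forward pass (NumPy, placeholder weights). Note that if the hidden-to-output weights are zero, the network reduces exactly to a linear model:

```python
import numpy as np

def jc_forward(x, W_skip, W1, b1, W2, b2):
    """Jumping connections: a direct linear path from inputs to output is
    added alongside the sigmoid hidden layer, so the net can represent LR
    exactly and only has to learn the nonlinear residual."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    return x @ W_skip + h @ W2 + b2

rng = np.random.default_rng(0)
K, H = 12, 100
W_skip = rng.normal(0, 0.1, (K, 1))
W1 = rng.normal(0, 0.1, (K, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 1)); b2 = np.zeros(1)

x = rng.normal(size=(4, K))
out = jc_forward(x, W_skip, W1, b1, W2, b2)
```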

Jumping Connections

Results (RMS): [bar chart of RMS error, scale 0–12,000, for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]

Frozen Jumping Connections

Idea: you have the linearity – now use it!

Same architecture as Jumping Connections, but with the linearity really emphasized: freeze the weights of the jumping layer, so the network can’t “forget” the linearity.

Frozen Jumping Connections


Results (RMS): [bar chart of RMS error, scale 0–12,000, for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]

Models

Hybrids – Smart Prior, MultiTask Learning, Jumping Connections, Frozen Jumping Connections

Baselines – Linear Regression, Neural Networks

Combinations – Voting, Weighted Average

Combining Models

Idea: Ensemble Learning

Committee Voting – equal weights for each model’s prediction

Weighted Average – optimal weights determined by a linear regression model

2 baseline and 3 hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections)
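Both combination schemes can be sketched on synthetic predictions (all numbers below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
q_true = rng.uniform(50.0, 150.0, size=200)        # "true" quantities (synthetic)

# Five component models: each predicts the truth plus its own noise level.
noise = np.array([5.0, 8.0, 4.0, 6.0, 7.0])
P = q_true[:, None] + rng.normal(size=(200, 5)) * noise

# Committee voting: equal weights for each model's prediction.
vote = P.mean(axis=1)

# Weighted average: regress the true quantity on the models' predictions.
X = np.column_stack([np.ones(len(q_true)), P])
w, *_ = np.linalg.lstsq(X, q_true, rcond=None)
wav = X @ w
```

In-sample, the regression-weighted combination can do no worse than equal-weight voting, since voting is one of the linear combinations it optimizes over.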

Committee Voting

Average the predictions of the models

Results (RMS): [bar chart of RMS error, scale 0–12,000, for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]

Weighted Average – Model Regression

Linear regression on the baseline and hybrid models’ predictions to determine the vote weights

Results (RMS): [bar chart of RMS error, scale 0–12,000, for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]

Normalized RMS Error

Compare model performance across stores. Stores differ in size, age, location, etc., so the errors must be normalized against the baselines: take the error of the LR benchmark as unit error.

Normalized RMS Error: [bar chart, scale 0.75–1.10, with LR = 1.00, for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV]
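The normalization itself is a one-liner; the numbers below are illustrative:

```python
def normalized_rms(rms_model, rms_lr):
    """Express a model's RMS error in units of the LR benchmark's error,
    so stores of different sizes can be compared (LR = 1.0)."""
    return rms_model / rms_lr

# e.g. a hybrid with RMS 7600 in a store where LR scores 9500 -> 0.8,
# i.e. a 20% error reduction relative to the LR benchmark.
```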

Conclusions

Clearly improved models for customer choice prediction

Will allow stores to price the products more strategically and optimize profits

Maintain better inventories. Understand product interactions.

Future Work Ideas

analyze the Weighted Average model
compare the extrapolation ability of the new models
use other domain knowledge – e.g. a shrinkage model: a “super” store model with data pooled across all stores

Acknowledgements

I would like to thank my advisors

and

my CALDling friends and colleagues

The Most Important Slide

for this presentation and the paper:

www.cs.cmu.edu/~eneva/research.htm

eneva@cs.cmu.edu

References

Montgomery, A. (1997). Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data.

West, P., Brockett, P. and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice.

Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data.

Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.
