Boosted Tree-based Multinomial Logit Model for Aggregated Market Data

Jianqiang (Jay) Wang & Trevor Hastie

Hewlett-Packard Labs & Stanford University

Dec 2, 2012

Disclaimer: I, myself, take sole responsibility for any errors and omissions in this presentation.

Hewlett-Packard Labs

HPL Charter:

DELIVER; CREATE; ADVANCE; ENGAGE

Information Analytics Lab:

Statistical Demand Modeling

Pricing and Portfolio Management

Predictive analytics-based PPM decision support system.

2012 INFORMS Revenue Management & Pricing Practice Award.

Demand: How do consumers value products?

Product Selection and Pricing: What products should we offer? What is the right pricing?

Competitive Product Similarity: What products are we competing with on the market?

Leveraging Intelligence: Can we infer market intelligence from current prices, and learn?

Estimating Aggregated Market Demand

Aggregated mobile computer sales data on all brands.

Market sales data reveals customer selection.

Aggregated mobile PC sales.

Brands, country, region, attributes, period, channel, price, volume.

Complexity of model estimation:

40+ different key features (memory, CPU, display, storage, OS, ...).

Price sensitivity varies with attributes, time, and region.

High-dimensional prediction problem.

Discrete Choice Model

Modeling Sales Volume vs Consumer choice (McFadden 1974):

Choice set: products to choose from.

Utility: overall attractiveness given attributes, brand and price.

Better attributes, higher utility; higher price, lower utility.

Challenges:

Sparse selection.

Nonlinearity.

Interactions among (attributes, price).

Semiparametric Multinomial Logit Model (MNL):

Linear MNLs: Train (2003); Semiparametric MNLs: p-splines (Tutz & Scholz 2004).

Flexibly model customers’ valuation without specifying a functional form.

Estimation: Functional gradient boosting with partitioned regression trees as base learners.

Aggregated Market Multinomial Logit Model

Single market with K products; products $i = 1, \dots, K$ with sales volumes $(n_1, \dots, n_K)$; latent utilities

$$u_i = f_i + \varepsilon_i.$$

Assuming $\varepsilon_i \overset{iid}{\sim}$ standard Gumbel distribution, utility maximization leads to

$$p_i = \frac{\exp(f_i)}{\sum_{j=1}^{K} \exp(f_j)}.$$

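Minimize $-2 \log(\text{multinomial likelihood})$:

$$\phi(f) = -2 \sum_{i=1}^{K} n_i \log(g(f_i)) + 2N \log\left\{\sum_{i=1}^{K} g(f_i)\right\} + \text{const},$$

where $N = \sum_{i=1}^{K} n_i$ is the total sales volume.

$g(\cdot)$: link function, e.g., $g(u) = \exp(u)$.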
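
As a minimal sketch of the two formulas on this slide, assuming the exponential link $g(u) = \exp(u)$, the choice probabilities and the $-2$ log-likelihood (up to the constant) can be computed as follows. The function names and the numbers in `f` and `n` are hypothetical and chosen only for illustration.

```python
import numpy as np

def choice_probabilities(f):
    """MNL probabilities p_i = exp(f_i) / sum_j exp(f_j) for one market."""
    f = np.asarray(f, dtype=float)
    w = np.exp(f - f.max())  # subtracting the max leaves the ratios unchanged
    return w / w.sum()

def neg2_loglik(f, n):
    """-2 * multinomial log-likelihood, up to a constant, with g(u) = exp(u)."""
    f = np.asarray(f, dtype=float)
    n = np.asarray(n, dtype=float)
    N = n.sum()  # total sales volume
    log_sum = f.max() + np.log(np.exp(f - f.max()).sum())  # log sum_i exp(f_i)
    return -2.0 * np.dot(n, f) + 2.0 * N * log_sum

f = np.array([0.5, 0.0, -1.0])  # hypothetical mean utilities
n = np.array([50, 30, 20])      # hypothetical sales volumes
p = choice_probabilities(f)
phi = neg2_loglik(f, n)
```

With this link, `phi` coincides with $-2 \sum_i n_i \log p_i$, which is a convenient sanity check on any implementation.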

Model Variations

Notation: $s_i$: attributes, brand and channel; $\mathbf{x}_i = (1, x_i)'$, where $x_i$ is price.

Utility Specifications:

Varying coefficient-MNL (price*attribute interaction):

$$f_i = \mathbf{x}_i' \beta(s_i).$$

Partially linear-MNL (price & attribute additive):

$$f_i = \beta_0(s_i) + x_i \beta_1.$$

Nonparametric-MNL:

$$f_i = \beta(s_i, x_i).$$

Boosted trees:

Partition the products into homogeneous groups in a way that respects the mean utility function.

Iteratively fit simple trees to explain errors not captured in the previous iteration.

Building Block: VC Trees

Underlying VCM model:

$$\xi_i = \mathbf{x}_i' \beta(s_i) + \varepsilon_i,$$

Piecewise constant approximation:

$$\xi_i = \sum_{m=1}^{M} \mathbf{x}_i' \beta_m I(s_i \in C_m) + \varepsilon_i,$$

M: number of partitions.

$\{C_m\}_{m=1}^{M}$: a partition of the space of $s_i$.

Piecewise constant approximation to the unknown high-dimensional function & data-driven partitioning method to obtain homogeneous regression relationships.

Algorithm:

Heuristics: greedy algorithm based on binary splits of the space of $s_i$ (similar to CART).

Splitting criterion: reduction in SSE.
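
The split search can be illustrated with a small sketch. It assumes a single ordered partition variable `s`, a scalar regressor `x`, and a linear fit (intercept and slope) in each node; the helper names `linear_sse` and `best_split`, the `min_leaf` parameter, and the simulated data are hypothetical and are not taken from the slides.

```python
import numpy as np

def linear_sse(x, y):
    """SSE of the least-squares fit y ~ b0 + b1 * x within one node."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def best_split(s, x, y, min_leaf=5):
    """Greedy search for the threshold on s with the largest SSE reduction."""
    parent = linear_sse(x, y)
    best = (None, 0.0)  # (threshold, SSE reduction)
    for t in np.unique(s)[:-1]:
        left = s <= t
        if left.sum() < min_leaf or (~left).sum() < min_leaf:
            continue
        gain = parent - linear_sse(x[left], y[left]) - linear_sse(x[~left], y[~left])
        if gain > best[1]:
            best = (float(t), gain)
    return best

rng = np.random.default_rng(0)
s = rng.uniform(0, 1, 200)              # one ordered partition variable
x = rng.normal(size=200)                # regressor (e.g. price)
slope = np.where(s <= 0.5, -1.0, -3.0)  # coefficient changes at s = 0.5
y = 1.0 + slope * x + 0.1 * rng.normal(size=200)
threshold, gain = best_split(s, x, y)
```

Applying `best_split` recursively to the resulting child nodes would grow a tree with M terminal nodes; categorical partition variables would need a different enumeration of candidate splits.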

Boosted VC-MNL

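Boosted VC-MNL:

$$\phi(f) = -2 \sum_{i=1}^{K} n_i \log(g(\mathbf{x}_i' \beta(s_i))) + 2N \log\left\{\sum_{i=1}^{K} g(\mathbf{x}_i' \beta(s_i))\right\} + \text{const}.$$

1. Start with naive fit $f^{(0)} = (\mathbf{x}_1' \beta^{(0)}, \dots, \mathbf{x}_K' \beta^{(0)})'$.

2. For $b = 1, \dots, B$, repeat:

   Compute the “pseudo observations”: $\xi_i = -\left.\frac{\partial \phi}{\partial f_i}\right|_{f = f^{(b-1)}}$.

   Fit $\xi_i$ on $s_i$ and $x_i$ using the “PartReg” algorithm to obtain partitions $(C_1^{(b)}, \dots, C_M^{(b)})$.

   Let $z_i = (I(s_i \in C_1^{(b)}), \dots, I(s_i \in C_M^{(b)}), x_i I(s_i \in C_1^{(b)}), \dots, x_i I(s_i \in C_M^{(b)}))'$, and use IRLS to estimate $\beta^{(b)}$ by minimizing

   $$J(\beta^{(b)}) = -2 \sum_{i=1}^{K} n_i \log(g(f_i^{(b-1)} + z_i' \beta^{(b)})) + 2N \log\left\{\sum_{i=1}^{K} g(f_i^{(b-1)} + z_i' \beta^{(b)})\right\}.$$

   Update the fitted model by $f_i^{(b)} = f_i^{(b-1)} + \nu \sum_{m=1}^{M} (\beta_{0m}^{(b)} + \beta_{1m}^{(b)} x_i) I(s_i \in C_m^{(b)})$.

3. Output the fitted model $f = f^{(B)}$.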
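
For the exponential link $g(u) = \exp(u)$, differentiating $\phi$ gives $\partial \phi / \partial f_i = -2 n_i + 2 N p_i$, so the pseudo observations reduce to $\xi_i = 2 (n_i - N p_i)$. The sketch below, which assumes that link and uses hypothetical values for `f` and `n`, computes them and compares the result with central finite differences of $\phi$.

```python
import numpy as np

def neg2_loglik(f, n):
    """phi(f) up to a constant, assuming g(u) = exp(u)."""
    m = f.max()
    return -2.0 * n @ f + 2.0 * n.sum() * (m + np.log(np.exp(f - m).sum()))

def pseudo_observations(f, n):
    """xi_i = -d(phi)/d(f_i) = 2 * (n_i - N * p_i) when g(u) = exp(u)."""
    w = np.exp(f - f.max())
    p = w / w.sum()
    return 2.0 * (n - n.sum() * p)

f = np.array([0.2, -0.4, 0.1, 0.0])     # hypothetical current fit f^(b-1)
n = np.array([40.0, 10.0, 30.0, 20.0])  # hypothetical sales volumes
xi = pseudo_observations(f, n)

# Central finite differences of phi as an independent check of the formula.
eps = 1e-6
numeric = np.array([
    -(neg2_loglik(f + eps * e, n) - neg2_loglik(f - eps * e, n)) / (2 * eps)
    for e in np.eye(len(f))
])
```

The pseudo observations sum to zero within a market, since observed and fitted volumes both add up to N.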

Boosted VC-MNL

Start with naive fit: e.g., simple linear MNL.

Begin the iteration process:

Compute pseudo observations/residuals.

Fit an appropriate tree to predict pseudo residuals.

Generate design matrix based on tree partitions, and fit linear MNL model.

Additive model of trees, not of predictors.

Iteratively fit linear MNL models based on data-driven piecewise constant “bases”.
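
The iteration above can be sketched end to end under several simplifying assumptions that are not from the slides: one ordered partition variable `s`, two-leaf trees (M = 2), a zero starting fit instead of a linear MNL, the link $g(u) = \exp(u)$, and SciPy's general-purpose BFGS optimizer standing in for IRLS. The helper `stump` is a stand-in for the “PartReg” step, and all names and simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def deviance(f, n):
    """-2 log-likelihood up to a constant, assuming g(u) = exp(u)."""
    m = f.max()
    return -2.0 * n @ f + 2.0 * n.sum() * (m + np.log(np.exp(f - m).sum()))

def pseudo_obs(f, n):
    w = np.exp(f - f.max())
    return 2.0 * (n - n.sum() * w / w.sum())

def node_sse(x, y):
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def stump(s, x, y, min_leaf=10):
    """Best single threshold on s by SSE of per-leaf linear fits (M = 2)."""
    best_t, best_sse = None, np.inf
    for t in np.unique(s)[:-1]:
        left = s <= t
        if min(left.sum(), (~left).sum()) < min_leaf:
            continue
        sse = node_sse(x[left], y[left]) + node_sse(x[~left], y[~left])
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t

def boost(s, x, n, B=20, nu=0.1):
    f = np.zeros_like(x)  # simplified naive start: equal shares
    path = [deviance(f, n)]
    for _ in range(B):
        t = stump(s, x, pseudo_obs(f, n))  # tree on the pseudo observations
        if t is None:
            break
        left = (s <= t).astype(float)
        # Design matrix: leaf indicators, then price times leaf indicators.
        Z = np.column_stack([left, 1 - left, x * left, x * (1 - left)])
        fit = minimize(lambda b: deviance(f + Z @ b, n), np.zeros(4), method="BFGS")
        f = f + nu * (Z @ fit.x)  # shrunken update
        path.append(deviance(f, n))
    return f, path

rng = np.random.default_rng(1)
K = 120
s = rng.uniform(0, 1, K)
x = rng.normal(size=K)
true_f = np.where(s <= 0.5, 1.0 - 1.0 * x, -0.5 - 2.0 * x)
p = np.exp(true_f) / np.exp(true_f).sum()
n = rng.multinomial(5000, p).astype(float)
f_hat, path = boost(s, x, n)
```

Because the objective is convex in the leaf coefficients, a shrunken step with $0 < \nu \le 1$ cannot increase the training deviance, so `path` is non-increasing; held-out choice sets would still be needed to choose B.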

Mobile Computer Sales in Australia

6 months, 5 states; 30 choice sets (25 training, 5 test); use price residuals instead of price.

Varying coefficient-MNL: $f_i = \mathbf{x}_i' \beta(s_i)$.

Partially linear-MNL: $f_i = \beta_0(s_i) + x_i \beta_1$.

Nonparametric-MNL: $f_i = \beta(s_i, x_i)$.

[Figure: three panels, "Varying coefficient-MNL, Boosted", "Partially linear, Boosted" and "Nonparametric, Boosted", each showing training and test curves against the number of boosting iterations (0 to 1000).]

Competitor Method – Elastic Net MNL

Models: fi = x′iβ(si ).

Linear-MNL: linear β(si ).

Quadratic-MNL (first-order interaction).

Quadratic-MNL: Initial features si .

⇒ Quadratic & first-order interaction among si , obtain design matrix zi .

⇒ Linear specification: $\beta_0(s_i) = z_i' \gamma_0$ and $\beta_1(s_i) = z_i' \gamma_1$.

Elastic net (Zou & Hastie 2005) MNL:

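$$\arg\min_{\gamma_0, \gamma_1} \; -2 \sum_{i=1}^{K} n_i \log(g(z_i' \gamma_0 + (z_i' x_i) \gamma_1)) + 2N \log\left\{\sum_{i=1}^{K} g(z_i' \gamma_0 + (z_i' x_i) \gamma_1)\right\} + \alpha \sum_{i,j} |\gamma_{ij}| + (1 - \alpha) \sum_{i,j} \gamma_{ij}^2$$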

α = 0: Ridge regression; α = 1: LASSO.

g(·) : link function.

Sparse and stable coefficient estimates, penalized IRLS.
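
A rough sketch of the penalized objective follows. It assumes the exponential link, a squared ridge term $\sum \gamma_{ij}^2$, and an overall penalty weight `lam`; `lam`, the function names, and the simulated data are hypothetical and are not taken from the slides.

```python
import numpy as np

def elastic_net_penalty(gamma, alpha):
    """alpha * sum|gamma| + (1 - alpha) * sum gamma^2 (alpha=1: LASSO, alpha=0: ridge)."""
    gamma = np.asarray(gamma, dtype=float)
    return alpha * np.abs(gamma).sum() + (1.0 - alpha) * np.square(gamma).sum()

def penalized_objective(gamma0, gamma1, Z, x, n, alpha, lam=1.0):
    """-2 log-likelihood with g(u) = exp(u) plus the elastic net penalty.

    lam is a hypothetical overall penalty weight, not taken from the slides.
    """
    f = Z @ gamma0 + (Z * x[:, None]) @ gamma1  # f_i = z_i'gamma0 + x_i * z_i'gamma1
    m = f.max()
    loss = -2.0 * n @ f + 2.0 * n.sum() * (m + np.log(np.exp(f - m).sum()))
    return loss + lam * elastic_net_penalty(np.concatenate([gamma0, gamma1]), alpha)

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))                     # hypothetical expanded features z_i
x = rng.normal(size=6)                          # hypothetical price residuals
n = np.array([5.0, 8.0, 2.0, 10.0, 4.0, 1.0])   # hypothetical sales volumes
zero = np.zeros(3)
baseline = penalized_objective(zero, zero, Z, x, n, alpha=0.5)
```

With all coefficients at zero every product has the same utility and the penalty vanishes, so `baseline` equals $2 N \log K$; a penalized IRLS or coordinate-descent solver would then minimize this objective over $(\gamma_0, \gamma_1)$.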

Summary of Results

| Utility specification | Estimation | Optimal R² (training) | Optimal R² (test) | Time (min) | Interactions among attributes |
|---|---|---|---|---|---|
| Linear (α = 1) | penalized IRLS | .399 | .357 | .17 | X |
| Linear (α = 1/2) | penalized IRLS | .419 | .379 | .48 | X |
| Quadratic (α = 1) | penalized IRLS | .582 | .499 | 76.91 | 1st-order |
| Quadratic (α = 1/2) | penalized IRLS | .554 | .53 | 52.78 | 1st-order |
| Varying-coef. | boosted trees | .734 | .697 | 186.47 (B=1000) | 2nd-order (M=4) |
| Partially linear | boosted trees | .493 | .455 | 24.63 (B=1000) | 2nd-order (M=4) |
| Nonparametric | boosted trees | .52 | .502 | 23.43 (B=1000) | 2nd-order (M=4) |

M: size of each base tree; B: the number of boosting iterations.

Nonparametric MNL specifies a larger model space than VC-MNL, but piecewise constant trees fail to find the particular interactions.

Discussion

Semiparametric MNL models, estimated by boosted tree methods.

Learning from large-scale market data to a) make predictions and b) gain insights: econometrics & statistical learning.

Statistical questions:

Assessing errors in R² and coefficient surface.

Split selection in tree partitioning (variable importance).

Model validation & diagnostics (standardized pseudo residuals).

Choice of link functions.

Jianqiang (Jay) Wang

Information Analytics Lab

Hewlett-Packard Labs

jianqiang.jay.wang@hp.com

Thank you very much!
