Endogeneity - University of California, Berkeley

Chapter 13

Endogeneity

13.1 Overview

So far we have assumed that the explanatory variables that enter a discrete choice model are independent of the unobserved factors. In many situations, however, the explanatory variables are endogenous, that is, are correlated or otherwise not independent of the unobserved factors. Examples include:

1. Unobserved attributes of a product can affect its price.

In modeling consumers’ choices among products, it might be impossible to measure all of the relevant attributes of the various products. In the case of cars, for example, the researcher can obtain information about the fuel efficiency, length, width, horsepower, weight, and many other attributes of each car that is offered by manufacturers, but attributes such as comfort, beauty of the design, smoothness of the ride, handling in curves, expected resale value, and prestige cannot be measured directly. Yet the price of the product can be expected to reflect these unobserved (i.e., unmeasured) attributes as well as the observed ones. There are two reasons that price is affected. First, insofar as the unobserved attributes are costly for the manufacturer, the price of the product can be expected to reflect these costs. Second, insofar as the unobserved attributes affect demand for the product, a price that is determined by the interaction of demand and supply can be expected to reflect these differences in demand. The end result is that price is correlated with unobserved attributes, rather than independent as we have assumed so far in this book.

2. Marketing efforts can be related to prices.

Advertising and sales promotions, such as coupons and discounts, are ubiquitous. Often the marketing practices of firms create a correlation between the price of products, which the researcher observes, and non-price promotional activities, which the researcher generally cannot measure directly. The correlation can go in either direction. A product might be promoted through an advertising blitz coupled with discounts. The advertising is then negatively correlated with price: greater advertising occurs simultaneously with lower prices. Alternatively, firms can raise the price of their products to pay for the advertising, which creates a positive correlation. In either case, the price of the product is no longer independent of the unobserved factors affecting consumers’ choices.

3. Inter-related choices of decision-makers.

In many situations, the observed factors that affect one choice that a person makes are determined by another choice by that person. Travel mode and housing location are an important example. In a standard mode choice model for the commute to work, the observed explanatory variables are usually the cost and time of travel from home to work on each mode (car, bus, train). However, people who tend to like traveling on public transit (or dislike it less than the average person) might also tend to buy or rent homes that are near public transit. Travel time by transit is therefore lower for these people than for people who locate further from transit. Stated in terms of the observed and unobserved factors in the mode choice model, unobserved attitudes towards public transit, which affect mode choice but cannot be measured completely by the researcher, are (negatively) correlated with the observed time of travel by public transit.

In situations such as these, estimation without regard to the correlation between observed and unobserved factors is inconsistent. The direction of bias can often be determined logically. For example, if desirable unobserved attributes are positively correlated with price, then estimation without regard to this correlation will result in an estimated price coefficient that is biased downward in magnitude. The reason is clear: since higher prices are associated with desirable attributes, consumers avoid the higher-priced products less than they would if the higher prices occurred without any compensating change in unobserved attributes. Essentially, the estimated price coefficient picks up both the price effect (which is negative) and the effect of the desirable unobserved attributes (which is positive), with the latter muting the former. A similar bias, though in the opposite direction, occurs if marketing consists of advertising coupled with price discounts. The increased demand for marketed products comes from both the lower price and the nonprice advertising. The estimated price coefficient picks up both effects, and their combined impact is greater than the impact of lower prices by themselves.

Several methods have been developed to estimate choice models in the presence of endogenous explanatory variables. In this chapter, we describe these methods, delineating the advantages and limitations of each approach. We first describe the BLP approach, developed by Berry, Levinsohn and Pakes (hence the initials) through a series of publications. Berry (1994) pointed out that constants can be included in the choice model to capture the average effect of product attributes (both observed and unobserved). The estimated constants can then be regressed against the observed attributes in a linear regression, where endogeneity is handled in the usual way by instrumental variables estimation of the linear regression. Essentially, he showed that the endogeneity could be taken out of the choice model, which is inherently non-linear, and put into a linear regression model, where endogeneity can be handled through standard instrumental variables estimation. To apply this method, it is often necessary to estimate a very large number of constants in the choice model, which can be difficult using standard gradient-based methods for maximization. To address this issue, Berry, Levinsohn, and Pakes (1995) provided a procedure, called “the contraction,” that facilitates estimation of these constants. These two early papers were based on aggregate models, i.e., models estimated on aggregate share data. However, the concepts are applicable to individual-level choice data or a combination of aggregate and individual-level data, as utilized in a later paper by Berry, Levinsohn and Pakes (2004). Applications of the BLP approach include Nevo (2001), Petrin (2002), Goolsbee and Petrin (2004), Chintagunta, Dube and Goh (2005), and Train and Winston (2007), to name only a few. A Bayesian form of the procedure has been developed by Yang, Chen, and Allenby (2003) and Jiang, Manchanda, and Rossi (2007).

The second procedure that we describe is the control function approach. The concepts motivating this approach date back to Heckman (1978) and Hausman (1978), though the first use of the term “control function” seems to have been by Heckman and Robb (1985). Endogeneity arises when observed variables are correlated with unobserved factors. This correlation implies that the unobserved factors conditional on the observed variables do not have a zero mean, as is usually required for standard estimation. A control function is a variable that captures this conditional mean, essentially “controlling” for the correlation. Rivers and Vuong (1988) adapted these ideas for handling endogeneity in a binary probit model with fixed coefficients, and Petrin and Train (2009) generalized the approach to multinomial choice models with random coefficients. The procedure is implemented in two steps. First, the endogenous explanatory variable (such as price) is regressed against exogenous variables. The estimated regression is used to create a new variable (the control function) that is entered into the choice model. The choice model is then estimated with the original variables plus the new one, accounting appropriately for the distribution of unobserved factors conditional on both the new variable and the original variables. Applications include Ferreira (2004) and Guevara and Ben-Akiva (2006).

The third procedure is a full maximum likelihood approach, as applied by Villas-Boas and Winer (1999) to a multinomial logit with fixed coefficients and generalized by Park and Gupta (2008) to random coefficient choice models. The procedure is closely related to the control function approach, in that it accounts for the non-zero conditional mean of the unobserved factors. However, instead of implementing the two steps sequentially (i.e., estimate the regression model to create the control function and then estimate the choice model with this control function), the two steps are combined into a joint estimation criterion. Additional assumptions are required to allow the estimation to be performed simultaneously; however, the procedure is more efficient when those assumptions are met.

In the sections below, we discuss each of these procedures, followed by a case study using the BLP approach.


13.2 The BLP approach

This procedure is most easily described for choice among products where the price of the product is endogenous. A set of products is sold in each of several markets, and each market contains numerous consumers. The attributes of the products vary over markets but not over consumers within each market (that is, all consumers within a given market face the same products with the same attributes). The definition of a market depends on the application. The market might be a geographical area, as in Goolsbee and Petrin’s (2004) analysis of households’ choice among TV options. In this application, the price of cable TV and the features offered by cable TV (such as number of channels) vary over cities, since cable franchises are granted by local governments. Alternatively, the market might be defined temporally, as in BLP’s (1995 and 2004) analysis of new vehicle demand, where each model-year consists of a set of makes and models of new vehicles with prices and other attributes. Each year constitutes a market in this application. If the analysis is over products whose attributes are the same for all the consumers, then there is only one market, and differentiating by market is unnecessary.

13.2.1 Specification

Let $M$ be the number of markets and $J_m$ be the number of options available to each consumer in market $m$. $J_m$ is the number of products available in market $m$ plus perhaps, depending on the analysis, the option of not buying any of the products, which is sometimes called “the outside good.”[1] The price of product $j$ in market $m$ is denoted $p_{jm}$. Some of the nonprice attributes of the products are observed by the researcher and some are not. The observed nonprice attributes of product $j$ in market $m$ are denoted by vector $x_{jm}$. The unobserved attributes are denoted collectively as $\xi_{jm}$, whose precise meaning is discussed in greater detail below.

[1] If the outside good is included, then the model can be used to predict the total demand for products under changed conditions. If the outside good is not included, then the analysis examines the choice of consumers among products conditional on their buying one of the products. The model can be used to predict changes in shares among those consumers who originally purchased the products, but cannot be used to predict changes in total demand, since it does not include changes in the number of consumers who decided not to buy any of the products. If the outside good is included, its price is usually considered to be zero.


The utility that consumer $n$ in market $m$ obtains from product $j$ depends on observed and unobserved attributes of the product. Assume that utility takes the form:

$$U_{njm} = V(p_{jm}, x_{jm}, s_n, \beta_n) + \xi_{jm} + \varepsilon_{njm}$$

where $s_n$ is a vector of demographic characteristics of the consumer, $V(\cdot)$ is a function of the observed variables and the tastes of the consumer as represented by $\beta_n$, and $\varepsilon_{njm}$ is iid extreme value. Note that $\xi_{jm}$ enters the same for all consumers; in this set-up, therefore, $\xi_{jm}$ represents the average, or common, utility that consumers obtain from the unobserved attributes of product $j$ in market $m$.

The basic issue that motivates the estimation approach is the endogeneity of price. In particular, the price of each product depends in general on all of its attributes, both those that are observed by the researcher and those that are not measured by the researcher but nevertheless affect the demand and/or costs for the product. As a result, price $p_{jm}$ depends on $\xi_{jm}$.

Suppose now that we were to estimate this model without regard for this endogeneity. The choice model would include price $p_{jm}$ and the observed attributes $x_{jm}$ as explanatory variables. The unobserved portion of utility, conditional on $\beta_n$, would be $\varepsilon^*_{njm} = \xi_{jm} + \varepsilon_{njm}$, which includes the average utility from unobserved attributes. However, since $p_{jm}$ depends on $\xi_{jm}$, this unobserved component, $\varepsilon^*_{njm}$, is not independent of $p_{jm}$. To the contrary, one would expect a positive correlation, with more desirable unobserved attributes being associated with higher prices.

The BLP approach to this problem is to move $\xi_{jm}$ into the observed portion of utility. This is accomplished by introducing a constant for each product in each market. Let $\bar V$ be the portion of $V(\cdot)$ that varies over products and markets but is the same for all consumers. Let $\tilde V$ be the portion that varies over consumers as well as markets and products. Then $V(\cdot) = \bar V(p_{jm}, x_{jm}, \beta) + \tilde V(p_{jm}, x_{jm}, s_n, \beta_n)$, where $\beta$ are parameters that are the same for all consumers and $\beta_n$ are parameters that vary over consumers. Note that $\bar V$ does not depend on $s_n$ since it is constant over consumers.[2] It is most natural to think of $\bar V$ as representing the average $V$ in the population; however, it need not. All that is required is that $\bar V$ is constant over consumers. Variation in utility from observed attributes around this constant is captured by $\tilde V$, which can depend on observed demographics and on coefficients that vary randomly.

[2] It is possible for $\bar V$ to include aggregate demographics, such as average income for the market, which varies over markets but not over consumers in each market. However, we abstract from this possibility in our notation.

Utility is then:

$$U_{njm} = \bar V(p_{jm}, x_{jm}, \beta) + \tilde V(p_{jm}, x_{jm}, s_n, \beta_n) + \xi_{jm} + \varepsilon_{njm}.$$

Rearranging the terms, we have:

$$U_{njm} = [\bar V(p_{jm}, x_{jm}, \beta) + \xi_{jm}] + \tilde V(p_{jm}, x_{jm}, s_n, \beta_n) + \varepsilon_{njm}.$$

Note that the term in brackets does not vary over consumers. It is constant for each product in each market. Denote this constant as:

$$\delta_{jm} = \bar V(p_{jm}, x_{jm}, \beta) + \xi_{jm} \tag{13.1}$$

and substitute it into utility:

$$U_{njm} = \delta_{jm} + \tilde V(p_{jm}, x_{jm}, s_n, \beta_n) + \varepsilon_{njm}. \tag{13.2}$$

A choice model based on this utility specification does not entail any endogeneity. A constant is included for each product in each market, which absorbs $\xi_{jm}$. The remaining unobserved portion of utility, $\varepsilon_{njm}$, is independent of the explanatory variables. The constants are estimated along with the other parameters of the model. Essentially, the term that caused the endogeneity, namely $\xi_{jm}$, has been subsumed into the product-market constant such that it is no longer part of the unobserved component of utility.

The choice model is completed by specifying how $\beta_n$ varies over consumers. Denote the density of $\beta_n$ as $f(\beta_n \mid \theta)$, where $\theta$ are parameters of this distribution representing, e.g., the variance of coefficients around the common values. Given that $\varepsilon_{njm}$ is iid extreme value, the choice probability is a mixed logit:

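$$P_{nim} = \int \left[ \frac{e^{\delta_{im} + \tilde V(p_{im}, x_{im}, s_n, \beta_n)}}{\sum_j e^{\delta_{jm} + \tilde V(p_{jm}, x_{jm}, s_n, \beta_n)}} \right] f(\beta_n \mid \theta) \, d\beta_n. \tag{13.3}$$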

Usually, $\tilde V$ is linear with coefficients $\beta_n$ and explanatory variables that are the observed attributes, $p_{jm}$ and $x_{jm}$, interacted perhaps with demographics, $s_n$. Other distributions can be specified for $\varepsilon_{njm}$; for example, Goolsbee and Petrin (2004) assume $\varepsilon_{njm}$ is jointly normal over products, such that the choice probability is a probit. Also, if information is available on the consumers’ ranking or partial ranking of the alternatives, then the probability of the ranking is specified analogously; for example, Berry, Levinsohn, and Pakes (2004) and Train and Winston (2007) had data on the vehicle that each consumer bought as well as their second-choice vehicle and represented this partial ranking by inserting the exploded logit formula from section 7.3 inside the brackets of equation (13.3) instead of the standard logit formula.
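The integral in (13.3) generally has no closed form, so it is approximated by simulation: draw values of the random coefficients, compute the logit formula at each draw, and average. The following is a minimal sketch of that idea, not the specification used in any of the cited studies. It assumes a hypothetical linear form in which the consumer-specific part of utility is `(sigma * eta + gamma * income) * price` with `eta` standard normal, so that θ consists of `sigma` and `gamma`; all names and numbers are illustrative.

```python
import math
import random

def logit_probs(utilities):
    """Standard logit probabilities, computed stably by subtracting the max."""
    top = max(utilities)
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def simulated_mixed_logit(delta, price, income, sigma, gamma, n_draws=500, seed=0):
    """Average logit probabilities over draws of a random price coefficient.

    Hypothetical specification (an assumption for this sketch): the
    consumer-specific utility is (sigma * eta + gamma * income) * price_j,
    with eta ~ N(0, 1).
    """
    rng = random.Random(seed)
    avg = [0.0] * len(delta)
    for _ in range(n_draws):
        eta = rng.gauss(0.0, 1.0)
        coef = sigma * eta + gamma * income
        probs = logit_probs([d + coef * p for d, p in zip(delta, price)])
        avg = [a + pr / n_draws for a, pr in zip(avg, probs)]
    return avg

# Illustrative constants and prices; alternative 0 plays the outside good.
delta = [0.0, 0.5, 1.0]
price = [0.0, 1.0, 2.0]
probs = simulated_mixed_logit(delta, price, income=1.0, sigma=0.5, gamma=0.2)
print([round(p, 3) for p in probs])
```

Setting `sigma` and `gamma` to zero removes the consumer-specific term, and the simulated probabilities collapse to the standard logit formula evaluated at the constants.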

Estimation of the choice model in (13.3) provides estimates of the constants and the distribution of tastes. However, it does not provide estimates of the parameters that enter the part of utility that is constant over consumers; that is, it does not provide estimates of $\beta$ in $\bar V$. These parameters enter the definition of the constants in equation (13.1), which constitutes a regression model that can be used to estimate the average tastes. It is customary to express $\bar V$ as linear in parameters, such that (13.1) becomes

$$\delta_{jm} = \beta' v(p_{jm}, x_{jm}) + \xi_{jm} \tag{13.4}$$

where $v(\cdot)$ is a vector-valued function of the observed attributes. A regression can be estimated where the dependent variable is the constant for each product in each market and the explanatory variables are the price and other observed attributes of the product. The error term for this regression is $\xi_{jm}$, which is correlated with price. However, procedures for handling endogeneity in linear regression models are well developed and are described in any standard econometrics textbook. In particular, regression (13.4) is estimated by instrumental variables rather than ordinary least squares. All that is required for this estimation is that the researcher has, or can calculate, some additional exogenous variables that are used as instruments in lieu of the endogenous price. The selection of instruments is discussed as part of the estimation procedure below; however, first we need to address an important issue that has been implicit in our discussion so far, namely, how to handle the fact that there might be (and usually are) a very large number of constants to be estimated, one for each product in each market.

13.2.2 The contraction

As described above, the constants $\delta_{jm} \; \forall j, m$ are estimated along with the other parameters of the choice model. When there are numerous products and/or markets, estimation of this large number of constants can be difficult or infeasible numerically if one tries to estimate them in the standard way. For example, for vehicle choice, there are over 200 makes and models of new vehicles each year, requiring the estimation of over 200 constants for each year of data. With 5 years of data, over a thousand constants would need to be estimated. If the procedures in Chapter 8 were used for such a model, each iteration would entail calculating the gradient with respect to, say, 1000+ parameters and inverting a 1000+ by 1000+ Hessian; also, numerous iterations would be required since the search is over a 1000+ dimensional parameter space.

Luckily, we do not need to estimate the constants in the standard way. BLP provided an algorithm for estimating them quickly, within the iterative process for the other parameters. This procedure rests on the realization that constants determine predicted market shares for each product and therefore can be set such that the predicted shares equal actual shares. To be precise, let $S_{jm}$ be the share of consumers in market $m$ who choose product $j$. For a correctly specified model, the predicted shares in each market should equal these actual shares (at least asymptotically). We can find the constants that enforce this equality, that is, that cause the model to predict shares that match the actual shares. Let the constants be collected into a vector $\delta = \langle \delta_{jm} \; \forall j, m \rangle$. The predicted shares are $\hat S_{jm}(\delta) = \sum_n P_{njm} / N_m$, where the summation is over the $N_m$ sampled consumers in market $m$. These predicted shares are expressed as a function of the constants $\delta$ because the constants affect the choice probabilities, which in turn affect the predicted shares.

Recall that in section 2.8 an iterative procedure was described for recalibrating constants in a model so that the predicted shares equal the actual shares. Starting with any given values of the constants, labeled $\delta^t_{jm} \; \forall j, m$, the constants are adjusted iteratively by the formula:

$$\delta^{t+1}_{jm} = \delta^t_{jm} + \ln\left( \frac{S_{jm}}{\hat S_{jm}(\delta^t)} \right).$$

This adjustment process moves each constant in the “right” direction, in the following sense. If, at the current value of the constant, the actual share for a product exceeds the predicted share, then the ratio of actual to predicted (i.e., $S_{jm} / \hat S_{jm}(\delta^t)$) is greater than one and $\ln(\cdot)$ is positive. In this case, the constant is adjusted upward, to raise the predicted share. When the actual share is below the predicted share, the ratio is below one and $\ln(\cdot) < 0$, such that the constant is adjusted downward. The adjustment is repeated iteratively until predicted shares equal actual shares (within a tolerance) for all products in all markets.

This algorithm can be used to estimate the constants instead of estimating them by the usual gradient-based methods. The other parameters of the model are estimated by gradient-based methods, and at each trial value of these other parameters (i.e., each iteration in the search for the optimizing values of the other parameters), the constants are adjusted such that the predicted shares equal actual shares at this trial value. Essentially, the procedure that had been used for many years for post-estimation re-calibration of constants is used during estimation, at each iteration for the other parameters.

Berry (1994) showed that, for any values of the other parameters in the choice model (i.e., of $\theta$), there exists a unique set of constants at which predicted shares equal actual shares. Then BLP (1995) showed that the iterative adjustment process is a contraction, such that it is guaranteed to converge to that unique set of constants. When used in the context of estimation instead of post-estimation calibration, the algorithm has come to be known as “the contraction.”
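The adjustment formula can be sketched for a single market as follows. This is an illustrative sketch under stated assumptions rather than BLP's own code: alternative 0 is treated as the outside good with its constant held at zero, the only taste variation is a hypothetical random price coefficient whose draws are held fixed across iterations, and the shares and prices are made-up numbers.

```python
import math
import random

def predicted_shares(delta, price, coefs):
    """Simulated shares: logit probabilities averaged over the taste draws."""
    shares = [0.0] * len(delta)
    for c in coefs:
        v = [d + c * p for d, p in zip(delta, price)]
        top = max(v)
        expv = [math.exp(x - top) for x in v]
        total = sum(expv)
        for j, e in enumerate(expv):
            shares[j] += e / total / len(coefs)
    return shares

def contraction(actual, price, coefs, tol=1e-10, max_iter=5000):
    """Iterate delta <- delta + ln(actual / predicted) until the step is tiny."""
    delta = [0.0] * len(actual)
    for iteration in range(1, max_iter + 1):
        pred = predicted_shares(delta, price, coefs)
        step = [math.log(s / sp) for s, sp in zip(actual, pred)]
        step[0] = 0.0  # assumption: alternative 0 is the outside good, fixed at 0
        delta = [d + a for d, a in zip(delta, step)]
        if max(abs(a) for a in step) < tol:
            return delta, iteration
    raise RuntimeError("contraction did not converge")

rng = random.Random(0)
coefs = [0.5 * rng.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical taste draws
actual = [0.4, 0.3, 0.2, 0.1]   # illustrative actual shares
price = [0.0, 1.0, 2.0, 3.0]    # illustrative prices
delta, n_iter = contraction(actual, price, coefs)
fitted = predicted_shares(delta, price, coefs)
print(n_iter, [round(d, 4) for d in delta])
```

Holding the draws fixed matters: if new draws were taken at each iteration, the predicted shares would change for reasons unrelated to the constants and the iteration would chase simulation noise.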

A few additional notes are useful regarding the contraction. First, we defined the shares $S_{jm}$ above as the “actual” shares. In practice, either aggregate market shares or sample shares can be used. In some situations, data on aggregate shares are not available or not reliable. Sample shares are consistent for market shares provided that the sampling is exogenous. Second, the procedure imposes a constraint or condition on estimation, namely, that predicted shares equal actual shares. As discussed in section 3.7.1, maximum likelihood estimation of a standard logit model with alternative specific constants for each product in each market necessarily gives predicted shares that equal sample shares. The condition that predicted shares equal sample shares is therefore consistent with (or more precisely, is a feature of) maximum likelihood on a standard logit. However, for other models, including probit and mixed logit, the maximum likelihood estimator does not equate predicted shares with sample shares even when a full set of constants is included. The estimated constants that are obtained through the contraction are therefore not the maximum likelihood estimates. Nevertheless, since the condition holds asymptotically for a correctly specified model, imposing it seems reasonable.

13.2.3 Estimation by MSL and IV

There are several ways that the other parameters of the model (i.e., $\theta$ and $\beta$) can be estimated. The easiest procedure to conceptualize is that used by Goolsbee and Petrin (2004) and Train and Winston (2007). In these studies, the choice model in equation (13.3) is estimated first, using maximum simulated likelihood (MSL) with the contraction. This step provides estimates of the parameters that enter equation (13.2), namely the constants $\delta_{jm} \; \forall j, m$ and the parameters $\theta$ of the distribution of tastes around these constants. The contraction is used for the constants, such that the maximization of the log-likelihood function is only over $\theta$.

To be more precise, since the choice probabilities depend on both $\delta$ and $\theta$, this dependence can be denoted functionally as $P_{njm}(\delta, \theta)$. However, for any given value of $\theta$, the constants $\delta$ are completely determined: they are the values that equate predicted and actual shares when this value of $\theta$ is used in the model. The calibrated constants can therefore be considered a function of $\theta$, denoted $\delta(\theta)$. Substituting into the choice probability, the probability becomes a function of $\theta$ alone: $P_{njm}(\theta) = P_{njm}(\delta(\theta), \theta)$. The log-likelihood function is also defined as a function of $\theta$: with $i_n$ denoting $n$’s chosen alternative, the log-likelihood function is $LL(\theta) = \sum_n \ln P_{n i_n m}(\theta)$, where $\delta$ is recalculated appropriately for any $\theta$. As such, the estimator is MSL subject to the constraint that predicted shares equal actual shares (either market or sample shares, whichever are used).[3]
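The structure of this estimator, an outer search over θ with the contraction run inside each evaluation of the log-likelihood, can be sketched as below. Several simplifications are assumptions made only to keep the sketch short: there is one market, θ is a single coefficient on a hypothetical income-price interaction with no random coefficients (so no simulation is needed), alternative 0 is an outside good with its constant fixed at zero, a coarse grid search stands in for a gradient-based optimizer, and the data are simulated from made-up "true" values.

```python
import math
import random

def choice_probs(delta, price, theta, income):
    """Logit probabilities with utility delta_j + theta * income * price_j."""
    v = [d + theta * income * p for d, p in zip(delta, price)]
    top = max(v)
    expv = [math.exp(x - top) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

def contraction(theta, shares, price, incomes, tol=1e-8, max_iter=2000):
    """Inner loop: find delta(theta) equating predicted and sample shares."""
    delta = [0.0] * len(price)
    for _ in range(max_iter):
        pred = [0.0] * len(price)
        for s in incomes:
            for j, pr in enumerate(choice_probs(delta, price, theta, s)):
                pred[j] += pr / len(incomes)
        step = [math.log(a / b) for a, b in zip(shares, pred)]
        step[0] = 0.0  # alternative 0 (outside good) is normalized to zero
        delta = [d + a for d, a in zip(delta, step)]
        if max(abs(a) for a in step) < tol:
            return delta
    raise RuntimeError("contraction did not converge")

def concentrated_ll(theta, choices, shares, price, incomes):
    """LL(theta): log-likelihood with delta recalculated for this theta."""
    delta = contraction(theta, shares, price, incomes)
    return sum(math.log(choice_probs(delta, price, theta, s)[i])
               for s, i in zip(incomes, choices))

# Simulated data from hypothetical "true" values.
rng = random.Random(0)
price = [0.0, 1.0, 2.0]
true_delta, true_theta = [0.0, -0.5, -1.0], 0.5
incomes = [rng.uniform(0.0, 2.0) for _ in range(1000)]
choices = []
for s in incomes:
    probs = choice_probs(true_delta, price, true_theta, s)
    choices.append(rng.choices(range(3), weights=probs)[0])
shares = [choices.count(j) / len(choices) for j in range(3)]

# Outer loop: evaluate LL(theta) on a grid, running the contraction each time.
grid = [k / 5 for k in range(9)]  # 0.0, 0.2, ..., 1.6
ll_values = [concentrated_ll(t, choices, shares, price, incomes) for t in grid]
theta_hat = grid[ll_values.index(max(ll_values))]
delta_hat = contraction(theta_hat, shares, price, incomes)
print(theta_hat, [round(d, 3) for d in delta_hat])
```

The optimizer never sees the constants: each call to `concentrated_ll` takes only θ and recomputes them, which is what keeps the search low-dimensional even when there are many products and markets.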

Once the choice model is estimated, the estimated constants are used in the linear regression (13.4), which we repeat here for convenience:

$$\delta_{jm} = \beta' v(p_{jm}, x_{jm}) + \xi_{jm}$$

The estimated constants from the choice model are the dependent variable in this regression, and the price and other observed attributes of the products are the explanatory variables. Since price is endogenous in this regression, it is estimated by instrumental variables instead of ordinary least squares. The instruments include the observed nonprice attributes of the products, $x_{jm}$, plus at least one additional instrument in lieu of price.

[3] From a programming perspective, the maximization entails iterations within iterations. The optimization procedure iterates over values of $\theta$ in searching for the maximum of the log-likelihood function. At each trial value of $\theta$, the contraction iterates over values of the constants, adjusting them until predicted shares equal actual shares at that trial value of $\theta$.

Denoting all the instruments as the vector $z_{jm}$, the instrumental variables estimator is the value of $\beta$ that satisfies:

$$\sum_j \sum_m [\delta_{jm} - \beta' v(p_{jm}, x_{jm})] z_{jm} = 0$$

where $\delta_{jm}$ is the estimated constant from the choice model. We can rearrange this expression to give the estimator in closed form, as it is usually shown in regression textbooks and as given in section 10.2.2:

$$\hat\beta = \left( \sum_j \sum_m z_{jm} v(p_{jm}, x_{jm})' \right)^{-1} \left( \sum_j \sum_m z_{jm} \delta_{jm} \right)$$

If the researcher chooses, efficiency can be enhanced by taking the covariance among the estimated constants into account, through generalized least squares (GLS); see, e.g., Greene (2000) on GLS estimation of linear regression models.
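A small simulation illustrates why the regression of the constants on price calls for instruments. The sketch below assumes a hypothetical setting with one regressor (price) plus an intercept and one excluded instrument (a made-up cost shifter), and it treats the constants as known rather than estimated. In this case the closed-form estimator reduces to a ratio of sample covariances, and using price as its own instrument gives ordinary least squares.

```python
import random

def iv_line(y, x, z):
    """Return (slope, intercept) solving sum_i (1, z_i)' (y_i - a - b * x_i) = 0.

    This is the closed-form IV estimator with regressors (1, x) and
    instruments (1, z); with z = x it is ordinary least squares.
    """
    n = len(y)
    ybar, xbar, zbar = sum(y) / n, sum(x) / n, sum(z) / n
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    slope = szy / szx
    return slope, ybar - slope * xbar

rng = random.Random(1)
n = 5000
cost = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical exogenous cost shifter
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]    # unobserved attribute
# Price rises with the unobserved attribute, which creates the endogeneity.
price = [1.0 + c + 0.8 * e + 0.5 * rng.gauss(0.0, 1.0) for c, e in zip(cost, xi)]
delta = [2.0 - 1.0 * p + e for p, e in zip(price, xi)]  # true price coefficient: -1

ols_slope, _ = iv_line(delta, price, price)  # price instrumenting itself is OLS
iv_slope, _ = iv_line(delta, price, cost)
print(round(ols_slope, 3), round(iv_slope, 3))
```

Under this design the least squares slope is pulled toward zero, because price is positively correlated with the unobserved attribute, while the instrumented slope should land near the true value of -1.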

The issue necessarily arises of what variables to use for instruments. It is customary, as already stated, to use the observed non-price attributes as instruments under the assumption that they are exogenous.[4] BLP (1995) suggested instruments that are based on pricing concepts. In particular, each manufacturer will price each of its products in a way that takes consideration of substitution with its other products as well as substitution with other firms’ products. For example, when a firm is considering a price increase for one of its products, consumers who will switch away from this product to another of the same firm’s products do not represent as much of a loss (and might, depending on profit margins, represent a gain) as consumers who will switch to other firms’ products. Based on these ideas, BLP proposed two instruments: the average nonprice attributes of other products by the same manufacturer, and the average nonprice attributes of other firms’ products. For example, in the context of vehicle choice where, say, vehicle weight is an observed attribute, the two instruments for the Toyota Camry in a given year are (1) the average weight of all other makes and models of Toyota vehicles in that year, and (2) the average weight of all non-Toyota vehicles in that year.[5] Train and Winston (2007) used an extension of these instruments that reflects the extent to which each product differs from other products by the same and different firms. In particular, they took the sum of squared differences between the product and each other product by the same and other firms. These two instruments for product $j$ in market $m$ are $z^1_{jm} = \sum_{k \in K_{jm}} (x_{jm} - x_{km})^2$, where $K_{jm}$ is the set of products offered in market $m$ by the firm that produced product $j$, and $z^2_{jm} = \sum_{k \in S_{jm}} (x_{jm} - x_{km})^2$, where $S_{jm}$ is the set of products offered in market $m$ by all firms except the firm that produced product $j$.

[4] This assumption is largely an expedient, since in general one would expect the unobserved attributes of a product to be related not only to price but to the observed non-price attributes. However, a model in which all observed attributes are treated as endogenous leaves little to use for instruments.
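Both sets of instruments are simple functions of the product list in a market. The sketch below computes them for one attribute in one market; the product names, firms, and weights are invented for illustration, and returning 0.0 for the average when a firm has no other products is an arbitrary choice made for this sketch.

```python
# Hypothetical products in one market (names, firms, and weights are made up).
products = [
    {"name": "A1", "firm": "A", "weight": 3.0},
    {"name": "A2", "firm": "A", "weight": 3.4},
    {"name": "A3", "firm": "A", "weight": 4.0},
    {"name": "B1", "firm": "B", "weight": 2.8},
    {"name": "B2", "firm": "B", "weight": 3.6},
]

def mean(values):
    """Average of a list; 0.0 for an empty list (an arbitrary convention here)."""
    return sum(values) / len(values) if values else 0.0

def instruments(products, attr):
    """Average-attribute and squared-difference instruments for each product."""
    rows = []
    for j, prod in enumerate(products):
        # Other products of the same firm (the product itself is excluded).
        same = [q[attr] for k, q in enumerate(products)
                if k != j and q["firm"] == prod["firm"]]
        # All products of other firms.
        other = [q[attr] for q in products if q["firm"] != prod["firm"]]
        rows.append({
            "name": prod["name"],
            "avg_same_firm": mean(same),
            "avg_other_firms": mean(other),
            "ssd_same_firm": sum((prod[attr] - v) ** 2 for v in same),
            "ssd_other_firms": sum((prod[attr] - v) ** 2 for v in other),
        })
    return rows

rows = instruments(products, "weight")
for row in rows:
    print(row)
```

Excluding the product itself from the same-firm set does not change the squared-difference instrument, since a product's difference from itself is zero.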

In other contexts, other instruments are appropriate. Goolsbee and Petrin (2004) examined households’ choices among TV options, with each city constituting a market with different prices and nonprice attributes of cable and over-the-air TV. Following the practice suggested by Hausman (1997), they used the prices in other cities by the same company as instruments for each city. This instrument for city $m$ is $z_{jm} = \sum_{m' \in K_m} p_{jm'}$, where $K_m$ is the set of other cities that are served by the franchise operator in city $m$. The motivating concept is that the unobserved attributes of cable TV in a given city (such as the quality of programming for that city’s franchise) are correlated with the price of cable TV in that city but are not correlated with the price of cable TV in other cities. They also included the city-imposed cable franchise fee (i.e., tax) and the population density of the city.
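The other-city price instrument is equally direct to compute. In this sketch the city names, operators, and prices are invented; the function sums the prices charged by the same operator in every other city it serves, following the formula above.

```python
# Hypothetical cable prices by city and franchise operator (made-up data).
cities = [
    {"city": "c1", "operator": "X", "price": 30.0},
    {"city": "c2", "operator": "X", "price": 34.0},
    {"city": "c3", "operator": "X", "price": 32.0},
    {"city": "c4", "operator": "Y", "price": 28.0},
    {"city": "c5", "operator": "Y", "price": 31.0},
]

def other_city_price_instrument(cities):
    """For each city, sum the operator's prices in the other cities it serves."""
    result = {}
    for row in cities:
        result[row["city"]] = sum(
            other["price"] for other in cities
            if other["operator"] == row["operator"] and other["city"] != row["city"]
        )
    return result

z = other_city_price_instrument(cities)
print(z)
```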

13.2.4 Estimation by GMM

As stated above, several methods can be used for estimation, not just maximum likelihood. BLP (1995, 2004), Nevo (2001), and Petrin (2002) utilized a generalized method of moments (GMM) estimator. This procedure is a generalized version of the method of simulated moments (MSM) described in Chapter 10, augmented by moments for the regression equation.

[5] Instead of averages, sums can be used, with the number of products by the same and by other firms (i.e., the denominators in the averages) also entering as instruments.

Moment conditions are created from the choice probabilities as:

$$\sum_n \sum_j (d_{njm} - P_{njm}) z_{njm} = 0 \tag{13.5}$$

where $d_{njm}$ is the dependent variable, which is 1 if consumer $n$ in market $m$ chose alternative $j$ and 0 otherwise, and $z_{njm}$ is a vector of instruments that vary over products and markets (such as the observed nonprice attributes and functions of them) as well as over consumers in each market (such as demographic variables interacted with nonprice attributes). In estimation, the exact probabilities are replaced with their simulated version. Note that this is the same formula as in section 10.2.2, on MSM estimation of choice models. These moment conditions, when satisfied, imply that the observed mean of the instruments for the chosen alternatives, $\sum_n \sum_j d_{njm} z_{njm} / N_m$, is equal to the mean predicted by the model, $\sum_n \sum_j P_{njm} z_{njm} / N_m$. Note that, as described in section 3.7, for a standard logit model, these moment conditions are the first-order condition for maximum likelihood estimation, where the instruments are the explanatory variables in the model. In other models, this moment condition is not the same as the first-order condition for maximum likelihood, such that there is a loss of efficiency. However, as described in Chapter 10 in comparing MSL with MSM, simulation of these moments is unbiased given an unbiased simulator of the choice probability, such that MSM is consistent for a fixed number of draws in simulation. In contrast, MSL is consistent only when the number of draws is considered to rise with sample size.

Moment conditions for the regression equation are created as:

$$\sum_j \sum_m \xi_{jm} z_{jm} = 0$$

where $z_{jm}$ are instruments that vary over products and markets but not over people in each market (such as the observed nonprice attributes and functions of them). These moments can be rewritten to include the parameters of the model explicitly, assuming a linear specification for $\bar V$:

$$\sum_j \sum_m [\delta_{jm} - \beta' v(p_{jm}, x_{jm})] z_{jm} = 0 \tag{13.6}$$

As discussed in the previous section, these moments define the standard instrumental variables estimator of the regression coefficients.


The parameters of the system are $\beta$, capturing elements of preferences that are the same for all consumers, and $\theta$, representing variation in preferences over consumers. The constants $\delta_{jm}$ can also be considered parameters, since they are estimated along with the other parameters. Alternatively, they can be considered functions of the other parameters, as discussed in the previous subsection, calculated to equate predicted and actual shares at any given values of the other parameters. For the distinctions in the next paragraph, we consider the parameters to be $\beta$ and $\theta$, without the constants. Under this terminology, estimation proceeds as follows.

If the number of moment conditions in (13.5) and (13.6) combined is equal to the number of parameters (and the conditions have a solution), then the estimator is defined as the value of the parameters that satisfies all the moment conditions. The estimator is the MSM described in Chapter 10 augmented with additional moments for the regression. If the number of moment conditions exceeds the number of parameters, then no set of parameter values can satisfy all of them. In this case, the number of independent conditions is reduced through the use of a generalized method of moments (GMM).⁶ The GMM estimator is described most easily by defining observation-specific moments:

$$g^1_{njm} = (d_{njm} - P_{njm}) z_{njm}$$

which are the terms in (13.5) and

$$g^2_{jm} = [\delta_{jm} - \beta' v(p_{jm}, x_{jm})] z_{jm}$$

which are the terms in (13.6). Stack these two vectors into one, $g_{njm} = \langle g^1_{njm}, g^2_{jm} \rangle$, noting that the second set of moments is repeated for each consumer in market m. The moment conditions can then be written succinctly as $g = \sum_n \sum_j g_{njm} = 0$. The GMM estimator is the parameter value that minimizes the quadratic form $g' \Theta^{-1} g$ where Θ is a positive definite weighting matrix. The asymptotic covariance of g is the optimal weighting matrix, calculated as $\sum_n \sum_j g_{njm} g'_{njm}$. Ruud (2000, Ch. 21) provides a useful discussion of GMM estimators.
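The quadratic form can be illustrated with a short sketch. The example below is deliberately simpler than the choice model: it assumes a linear equation with one endogenous regressor and two instruments (so that moments outnumber parameters), simulated data, and a weighting matrix recomputed at each trial value rather than held fixed at a first-step estimate. All names and numbers are hypothetical.

```python
import numpy as np

def gmm_objective(moments):
    """Quadratic form g' Theta^{-1} g for an (observations x moments) array."""
    g = moments.sum(axis=0)          # g: sum of observation-specific moments
    theta = moments.T @ moments      # sum of outer products g_i g_i'
    return float(g @ np.linalg.solve(theta, g))

# Hypothetical data: endogenous regressor x, two instruments z, true slope 2.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
x = z @ np.array([1.0, 0.5]) + u
y = 2.0 * x + 0.5 * u + rng.normal(size=n)   # error is correlated with x

def moments_at(b):
    """Observation-specific moments (y - b x) z, one row per observation."""
    return (y - b * x)[:, None] * z

# Minimize the quadratic form over a grid of trial values.
grid = np.linspace(0.0, 4.0, 401)
b_hat = grid[np.argmin([gmm_objective(moments_at(b)) for b in grid])]
```

In the setting of this chapter, each row would instead stack $g^1_{njm}$ and $g^2_{jm}$, and the minimization would run over β and θ with a numerical optimizer rather than a grid.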

⁶ When simulated probabilities are used in the conditions, the procedure can perhaps more accurately be called a generalized method of simulated moments (GMSM) estimator. However, I have never seen this term used; instead, GMM is used to denote both simulated and unsimulated moments.


13.3 Supply side

The supply side of a market can be important for several reasons. First, insofar as market prices are determined by the interaction of demand and supply, forecasting of market outcomes under changed conditions requires an understanding of supply as well as demand. An important example arises in the context of antitrust analysis of mergers. Prior to a merger, each firm sets the prices of its own products so as to maximize its own profits. When two firms merge, they set the prices of their products to maximize their joint profit, which can, and usually does, entail different prices than when the firms competed with each other. A central task of the Department of Justice and the Federal Trade Commission when deciding whether to approve or disallow a merger is to forecast the impact of the merger on prices. This task generally entails modeling the supply side, namely the marginal costs and the pricing behavior of the firms, as well as the demand for each product as a function of price.

Second, firms in many contexts can be expected to set their prices not just at marginal cost or some fixed markup over marginal cost, but also on the basis of the demand for their product and the impact of price changes on their demand. In these contexts, the observed prices contain information about the demand for products and price elasticities. This information, if correctly extracted, can be used in the estimation of demand parameters. The researcher might, therefore, choose to examine the supply side as an aspect of demand estimation, even when the researcher’s objectives do not entail forecasting the supply side per se.

In the paragraphs below, we describe several types of pricing behavior that can arise in markets and show how the specification of this behavior can be combined with the demand estimation discussed above. In each case, some assumptions about the behavior of firms are required. The researcher must decide, therefore, whether to incorporate the supply side into the analysis under these assumptions or estimate demand without the supply side, using the methods in the previous section. The researcher faces an inherent tradeoff. Incorporation of the supply side has the potential to improve the estimates of demand and expand the use of the model. However, it entails assumptions about pricing behavior that might not hold, such that estimating demand without the supply side might be safer.


13.3.1 Marginal cost

A basic tenet of economic theory is that prices depend on marginal cost. The marginal cost of a product depends on its attributes, including the attributes that are observed by the researcher, $x_{jm}$, as well as those that are not, $\xi_{jm}$. Marginal cost also depends on input prices, such as rents and wages, and other “cost shifters.” The researcher observes some of the cost shifters, labeled $c_{jm}$, and does not observe others. It is often assumed that marginal cost is separable in unobserved terms, such that it takes the form of a regression:

$$MC_{jm} = W(x_{jm}, c_{jm}, \gamma) + \mu_{jm} \tag{13.7}$$

where $W(\cdot)$ is a function with parameters γ. The error term in this equation, $\mu_{jm}$, depends in general on unobserved attributes $\xi_{jm}$ and other unobserved cost shifters.

Marginal costs relate to prices differently, depending on the pricing mechanism that is operative in the market. We examine the prominent possibilities below.

13.3.2 Marginal cost pricing

In perfectly competitive markets, the price of each product is driven down, in equilibrium, to its marginal cost. Pricing in a competitive market under separable marginal costs is therefore:

$$p_{jm} = W(x_{jm}, c_{jm}, \gamma) + \mu_{jm} \tag{13.8}$$

This pricing equation can be used in conjunction with either of the two ways described above for estimating the demand system, each of which we discuss below.

MSL/IV with MC pricing

Suppose that the method in section (13.2.3) is used, i.e., that the choice model is estimated by maximum likelihood and then the estimated constants are regressed against product attributes using instrumental variables to account for the endogeneity of price. With this estimation, the variables that enter the MC pricing equation are the appropriate instruments to use in the instrumental variables regression. Any observed cost shifters $c_{jm}$ serve as instruments along with the observed non-price attributes $x_{jm}$. In this set-up, the pricing equation simply provides information about the appropriate instruments. Since the pricing equation does not depend on demand parameters, it provides no information, other than the instruments, that can improve the estimation of demand parameters.

The researcher might want to estimate the pricing equation even though it does not include demand parameters. For example, the researcher might want to be able to forecast prices and demand under changes in cost shifters. In this case, equation (13.8) is estimated by ordinary least squares. The estimation of demand and supply then constitutes three steps: (1) estimate the choice model by MSL, (2) estimate the regression of constants on observed product attributes by instrumental variables, and (3) estimate the pricing equation by ordinary least squares. In fact, if the variables that enter the pricing equation are the only instruments, then the pricing equation is estimated implicitly as part of step (2) even if the researcher does not estimate the pricing equation explicitly in step (3). Recall from standard regression textbooks that instrumental variables estimation is equivalent to two-stage least squares. In our context, the instrumental variables estimation in step (2) can be accomplished in two stages: (i) estimate the price equation by ordinary least squares and use the estimated coefficients to predict price, and then (ii) regress the constants on product attributes by ordinary least squares using the predicted price as an explanatory variable instead of the observed price. The overall estimation process is then: (1) estimate the choice model, (2) estimate the pricing equation by ordinary least squares, and (3) estimate the equation for the constants by ordinary least squares using predicted prices instead of actual prices.
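The equivalence of instrumental variables and the two-stage procedure can be seen in a small simulation. This is a sketch under stated assumptions: the constants are generated directly rather than estimated from a choice model, W(·) is linear, there is one non-price attribute and one cost shifter, and all coefficient values are made up.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)        # observed non-price attribute
cost = rng.normal(size=n)     # observed cost shifter, used as an instrument
xi = rng.normal(size=n)       # unobserved attribute
# Marginal cost pricing: price depends on x, cost, and the unobserved attribute.
price = 1.0 + 0.5 * x + 1.0 * cost + 0.8 * xi + 0.5 * rng.normal(size=n)
# Stand-in for the estimated constants; the true price coefficient is -1.5.
delta = 2.0 + 1.0 * x - 1.5 * price + xi

ones = np.ones(n)
Z = np.column_stack([ones, x, cost])     # instruments
X = np.column_stack([ones, x, price])    # regressors, with price endogenous

# Instrumental variables estimator in one step.
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ delta)

# Two stages: (i) predict price from the pricing equation, (ii) OLS on predicted price.
price_hat = Z @ ols(Z, price)
beta_2sls = ols(np.column_stack([ones, x, price_hat]), delta)

# OLS that ignores endogeneity, for comparison.
beta_ols = ols(X, delta)
```

In this simulated design `beta_iv` and `beta_2sls` coincide up to rounding, while the price coefficient in `beta_ols` is pulled toward zero because price is positively correlated with the unobserved attribute.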

GMM with MC pricing

If GMM estimation is used, then the explanatory variables in the pricing equation become instruments $z_{jm} = \langle x_{jm}, c_{jm} \rangle$, which enter the moments given in equations (13.5) and (13.6). The pricing equation provides additional moment conditions based on these instruments:

$$\sum_j \sum_m (p_{jm} - W(x_{jm}, c_{jm}, \gamma)) z_{jm} = 0$$

The demand parameters can be estimated without these additional moments. Or these additional moments can be included in the GMM estimation, along with those defined in (13.5) and (13.6). The estimator is the value of the parameters of the demand model and pricing equation that minimizes $g' \Theta^{-1} g$ where now g includes the moments from the pricing equation and Θ includes their covariance.

13.3.3 Fixed markup over marginal cost

Firms might price at some fixed markup over marginal cost, with this markup not depending on demand. This form of pricing is considered by business people to be ubiquitous. For example, Shim and Sudit (1995) report that this form of pricing is used by over 80 percent of managers at manufacturing firms. Of course, it is difficult to know how to interpret managers’ statements about their pricing, since each manager can think they set their prices at a fixed markup over marginal cost and yet the size of the “fixed” markup can be found to vary over managers in relation to demand for the different manufacturers’ products.

This form of pricing has the same implications for demand estimation as marginal cost pricing. Price is $p_{jm} = k \, MC_{jm}$ for some constant k. The pricing equation is the same as above, equation (13.8), but with the coefficients now incorporating k. All other aspects of estimation, using either MSL/IV or GMM, are the same as with marginal cost pricing.

13.3.4 Monopoly pricing and Nash equilibrium for single-product firms

Consider a situation in which each product is offered by a separate firm. If there is only one product in the market, then the firm is a monopolist. The monopolist is assumed to set its price to maximize profits, holding all other prices in the economy fixed. If there are multiple products, then the firms are oligopolists. We assume that each oligopolist maximizes its profits given the prices of the other firms. This assumption implies that each oligopolist utilizes the same profit-maximization condition as a monopolist, holding all other prices (including its rivals’ prices) fixed. Nash equilibrium occurs when no firm is induced to change its price given the prices of other firms.

Let $p_j$ and $q_j$ be the price and quantity of product j, where the subscript for markets is omitted for convenience since we are examining the behavior of firms in a given market. The firm’s profits are $\pi_j = p_j q_j - TC(q_j)$ where TC is total cost. Profits are maximized when:

$$\frac{d\pi_j}{dp_j} = 0$$

$$\frac{d(p_j q_j)}{dp_j} - MC_j \frac{dq_j}{dp_j} = 0$$

$$q_j + p_j \frac{dq_j}{dp_j} = MC_j \frac{dq_j}{dp_j}$$

$$p_j + q_j \left(\frac{dq_j}{dp_j}\right)^{-1} = MC_j \tag{13.9}$$

$$p_j + p_j \frac{q_j}{p_j} \left(\frac{dq_j}{dp_j}\right)^{-1} = MC_j$$

$$p_j + (p_j/e_j) = MC_j$$

where $MC_j$ is the marginal cost of product j and $e_j$ is the elasticity of demand for product j with respect to its price. This elasticity depends on all prices, because the elasticity for a product is different at different prices (and hence different quantities) for that product as well as other products. Note that the elasticity is negative, which implies that $p_j + p_j/e_j$ is less than $p_j$ such that price is above marginal cost, as expected.⁷ Substituting the specification in (13.7) for MC, and adding subscripts for markets, we have:

$$p_{jm} + (p_{jm}/e_{jm}) = W(x_{jm}, c_{jm}, \gamma) + \mu_{jm} \tag{13.10}$$

which is the same as the pricing equation under marginal cost pricing, (13.8), except that the left-hand side has $(p_{jm}/e_{jm})$ added to price.

⁷ The condition can be rearranged to take the form that is often used for monopolists and one-product oligopolists: $(p_j - MC_j)/p_j = -1/e_j$, where the markup as a percent of price is inversely related to the magnitude of the elasticity.

The marginal cost parameters can be estimated after the demand parameters. In the context of MSL and IV, the process consists of the same three steps as under marginal cost pricing: (1) estimate the choice model by MSL, (2) estimate the regression of constants on price and other attributes by instrumental variables, and (3) estimate the pricing equation by ordinary least squares. The only change is that now the dependent variable in the pricing equation is not price itself, but $p_{jm} + (p_{jm}/e_{jm})$. This term includes the elasticity of demand, which is not observed directly. However, given estimated parameters of the demand model from steps (1) and (2), the elasticity of demand can be calculated and used for step (3).
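As an illustration of how the dependent variable for step (3) might be constructed, the sketch below assumes a standard logit demand model with a single price coefficient `alpha`, for which the own-price elasticity is $e_j = \alpha p_j (1 - s_j)$ with $s_j$ the share of product j. This is a simplification: with random coefficients the elasticity would be obtained by integrating over the mixing distribution. The function names and numbers are hypothetical.

```python
import numpy as np

def logit_shares(v):
    """Logit shares from a vector of representative utilities."""
    ev = np.exp(v - v.max())
    return ev / ev.sum()

def own_price_elasticity(prices, other_utility, alpha):
    """Own-price elasticities under standard logit: alpha * p_j * (1 - s_j)."""
    shares = logit_shares(alpha * prices + other_utility)
    return alpha * prices * (1.0 - shares)

def pricing_lhs(prices, other_utility, alpha):
    """Dependent variable of the pricing equation: p_j + p_j / e_j."""
    return prices + prices / own_price_elasticity(prices, other_utility, alpha)

# Hypothetical market with three single-product firms.
prices = np.array([10.0, 12.0, 8.0])
other_utility = np.array([4.0, 5.5, 3.0])   # non-price part of utility
alpha = -0.5                                # price coefficient from the demand model
lhs = pricing_lhs(prices, other_utility, alpha)
```

Because each elasticity is negative, every element of `lhs` lies below the corresponding price; `lhs` is then regressed on the observed cost variables in step (3).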

The fact that the pricing equation includes the elasticity of demand implies that observed prices contain information about demand parameters. The sequential estimation procedure just described does not utilize this information. In the context of GMM, utilizing this extra information is straightforward, at least conceptually. Additional moment conditions are defined from the pricing equation as:

$$\sum_j \sum_m (p_{jm} + (p_{jm}/e_{jm}) - W(x_{jm}, c_{jm}, \gamma)) z_{jm} = 0$$

The only difference from the procedure used under marginal cost pricing is that now the moment conditions include the additional term, $(p_{jm}/e_{jm})$. At each trial value of the parameters in the GMM estimation, the elasticity is calculated and inserted into this moment.

13.3.5 Monopoly pricing and Nash equilibrium for multi-product firms

We now generalize the analysis of the previous section to allow each firm to sell more than one product in a market. If there is only one firm that offers all the products in the market, that firm is a multi-product monopolist. Otherwise, the market is an oligopoly with multi-product firms. The market for new vehicles is a prominent example, where each manufacturer, such as Toyota, offers numerous makes and models of vehicles. The pricing rule differs from the situation with only one product per firm because now each firm must consider the impact of its price for one product on the demand for its other products. We make the standard assumption, as before, that each firm prices so as to maximize profits given the prices of other firms’ products.

Consider a firm that offers a set K of products. The firm’s profits are $\pi = \sum_{j \in K} p_j q_j - TC(q_j \ \forall j \in K)$. The firm chooses the price of product $j \in K$ that maximizes its profits:

$$d\pi/dp_j = 0$$

$$q_j + \sum_{k \in K} (p_k - MC_k)(dq_k/dp_j) = 0$$

Analogous conditions apply simultaneously for all the firm’s products. The conditions for all the firm’s products can be expressed in a simple form by use of matrix notation. Let the prices of the K products be stacked in vector p, quantities in vector q, and marginal costs in vector MC. Define a K×K matrix of derivatives of demand with respect to price: D, where the i, j-th element is $(dq_j/dp_i)$. Then the profit-maximizing prices for the firm’s products satisfy

$$q + D(p - MC) = 0$$

$$D^{-1} q + p - MC = 0$$

$$p + D^{-1} q = MC$$

Note that this last equation is just a generalized version of the one-product monopolist’s rule given in (13.9). It is handled the same way in estimation. With MSL/IV estimation, it is estimated after the demand model, using the demand parameters to calculate $D^{-1}$. With GMM, the pricing equation is included as an extra moment condition, calculating $D^{-1}$ at each trial value of the parameters.
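The matrix rule can be sketched in a few lines. The example assumes standard logit demand, for which $dq_j/dp_i$ is proportional to $\alpha s_j (1[i=j] - s_i)$, and works with shares instead of quantities, since market size cancels in $D^{-1} q$. The function name, the ownership lists, and all numbers are hypothetical.

```python
import numpy as np

def implied_marginal_costs(prices, shares, alpha, owned):
    """Solve p + D^{-1} q = MC for the products that one firm owns.

    Assumes standard logit demand with price coefficient alpha, so that
    D[i, j] = d s_j / d p_i = alpha * s_j * (1[i == j] - s_i).
    """
    p = prices[owned]
    s = shares[owned]
    D = alpha * (np.diag(s) - np.outer(s, s))
    return p + np.linalg.solve(D, s)

# Hypothetical market: four products; the remaining share is an outside good.
prices = np.array([10.0, 12.0, 8.0, 9.0])
shares = np.array([0.2, 0.3, 0.1, 0.15])
alpha = -0.5

mc_firm = implied_marginal_costs(prices, shares, alpha, [0, 1])   # two-product firm
mc_single = implied_marginal_costs(prices, shares, alpha, [2])    # one-product firm
```

With one product the rule collapses to (13.9). Under these logit assumptions the two-product firm charges the same absolute markup on both of its products, and that markup is larger than it would be if each product were priced by a separate firm, because the firm internalizes substitution between its own products.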

13.4 Control Functions

The BLP approach is not always applicable. If observed shares for some products in some markets are zero, then the BLP approach cannot be implemented, since the constants for these product-markets are not identified. (Any finite constant gives a strictly positive predicted share, which exceeds the actual share of zero.) An example is Martin’s (2008) study of consumers’ choice between incandescent and compact fluorescent light bulbs (CFLs), where advertising and promotions occurred on a weekly basis and varied over stores, and yet it was common for a store not to sell any CFLs in a given week. Endogeneity can also arise over the decision-makers themselves rather than over markets (i.e., groups of decision-makers), such that the endogeneity is not absorbed into product-market constants. For example, suppose people who like public transit choose to live near transit such that transit time in their mode choice is endogenous. Constants cannot be estimated for each decision-maker, since the constants would be infinity (for the chosen alternative) and negative infinity (for the nonchosen alternatives), perfectly predicting the choices and leaving no information for estimation of parameters. Even when the BLP approach can be implemented, the researcher might want to avoid the complication of applying the contraction that is usually needed in the BLP approach.


Alternatives that have been implemented include the control function approach, which we discuss in this section, and a more highly specified version of the control function approach that utilizes full-information maximum likelihood, which we discuss in the following section.

The set-up for the control function approach follows quite closely the specification of simultaneous equation regression models. However, since choice models are nonlinear, an additional layer of complication must be addressed, and this complication can restrict the applicability of the method more than would be immediately expected given the analogy to linear models.

Let the endogenous explanatory variable for decision-maker n and alternative j be denoted as $y_{nj}$. We do not differentiate markets, as in the BLP approach, because we allow the possibility that the endogenous variables vary over decision-makers rather than just over markets. The endogenous variable might be price, transit time, or whatever is relevant in a given application. In the sections below, we discuss issues that can arise when the endogenous variable is price.

The utility that consumer n obtains from product j is expressed as

$$U_{nj} = V(y_{nj}, x_{nj}, \beta_n) + \varepsilon_{nj} \tag{13.11}$$

where $x_{nj}$ are observed exogenous variables relating to person n and product j (including observed demographics). The unobserved term $\varepsilon_{nj}$ is not independent of $y_{nj}$ as required for standard estimation. Let the endogenous explanatory variable be expressed as a function of observed instruments and unobserved factors:

$$y_{nj} = W(z_{nj}, \gamma) + \mu_{nj} \tag{13.12}$$

where $\varepsilon_{nj}$ and $\mu_{nj}$ are independent of $z_{nj}$ but $\mu_{nj}$ and $\varepsilon_{nj}$ are correlated. The correlation between $\mu_{nj}$ and $\varepsilon_{nj}$ implies that $y_{nj}$ and $\varepsilon_{nj}$ are correlated, which is the motivating concern. We assume for our initial discussion that $\mu_{nj}$ and $\varepsilon_{nk}$ are independent for all $k \neq j$.

Consider now the distribution of $\varepsilon_{nj}$ conditional on $\mu_{nj}$. If this conditional distribution takes a convenient form, then a control function approach can be used to estimate the model. Decompose $\varepsilon_{nj}$ into its mean conditional on $\mu_{nj}$ and deviations around this mean: $\varepsilon_{nj} = E(\varepsilon_{nj} \mid \mu_{nj}) + \tilde{\varepsilon}_{nj}$. By construction, the deviations are not correlated with $\mu_{nj}$ and therefore not correlated with $y_{nj}$. The conditional expectation is a function of $\mu_{nj}$ (and perhaps other variables); it is called the control function and denoted $CF(\mu_{nj}, \lambda)$, where λ are the parameters of this function. The simplest case is when $E(\varepsilon_{nj} \mid \mu_{nj}) = \lambda \mu_{nj}$, such that the control function is simply $\mu_{nj}$ times a coefficient to be estimated. We discuss motivations for various control functions below. Substituting the conditional mean and deviations into the utility equation, we have:

$$U_{nj} = V(y_{nj}, x_{nj}, \beta_n) + CF(\mu_{nj}, \lambda) + \tilde{\varepsilon}_{nj}. \tag{13.13}$$

The choice probabilities are derived from the conditional distribution of the deviations $\tilde{\varepsilon}_{nj}$. Let $\tilde{\varepsilon}_n = \langle \tilde{\varepsilon}_{nj} \ \forall j \rangle$ and $\mu_n = \langle \mu_{nj} \ \forall j \rangle$. The conditional distribution of $\tilde{\varepsilon}_n$ is denoted $g(\tilde{\varepsilon}_n \mid \mu_n)$, and the distribution of $\beta_n$ is $f(\beta_n \mid \theta)$. The choice probability is then

$$P_{nj} = \text{Prob}(U_{nj} > U_{nk} \ \forall k \neq j)$$

$$= \int \int I(V_{nj} + CF_{nj} + \tilde{\varepsilon}_{nj} > V_{nk} + CF_{nk} + \tilde{\varepsilon}_{nk} \ \forall k \neq j) \, g(\tilde{\varepsilon}_n \mid \mu_n) f(\beta_n \mid \theta) \, d\tilde{\varepsilon}_n \, d\beta_n \tag{13.14}$$

where these abbreviations are used:

$$V_{nj} = V(y_{nj}, x_{nj}, \beta_n)$$

$$CF_{nj} = CF(\mu_{nj}, \lambda)$$

This is a choice model just like any other, with the control function entering as an extra explanatory variable. Note that the inside integral is over the conditional distribution of $\tilde{\varepsilon}$ rather than the original ε. By construction, $\tilde{\varepsilon}$ is not correlated with the endogenous variable, while the original ε was correlated. Essentially, the part of ε that is correlated with $y_{nj}$ is entered explicitly as an extra explanatory variable, namely the control function, such that the remaining part is not correlated.

The model is estimated in two steps. First, equation (13.12) is estimated. This is a regression with the endogenous variable as the dependent variable and with exogenous instruments serving as explanatory variables. The residuals for this regression provide estimates of the $\mu_{nj}$’s. These residuals are calculated as $\hat{\mu}_{nj} = y_{nj} - W(z_{nj}, \hat{\gamma})$ using the estimated parameters $\hat{\gamma}$. Second, the choice model is estimated with $\hat{\mu}_{nj}$ entering the control function. That is, the choice probabilities in (13.14) are estimated by maximum likelihood, with $\hat{\mu}_{nj}$ and/or a parametric function of it entering as extra explanatory variables.
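The two steps can be sketched on simulated data. The example makes strong simplifying assumptions that are not part of the text: W(·) is linear in a single instrument, the coefficients are fixed rather than random, the control function is linear, and the deviations around it are iid extreme value so that the second step is a standard logit. The Newton routine `fit_logit` and all numbers are illustrative.

```python
import numpy as np

def fit_logit(X, chosen, iters=25):
    """Conditional logit by Newton's method; X is (N, J, K), chosen is (N,)."""
    N, J, K = X.shape
    d = np.eye(J)[chosen]                         # indicators of chosen alternatives
    beta = np.zeros(K)
    for _ in range(iters):
        v = X @ beta
        p = np.exp(v - v.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        grad = np.einsum('nj,njk->k', d - p, X)   # score
        dev = X - np.einsum('nj,njk->nk', p, X)[:, None, :]
        info = np.einsum('nj,njk,njl->kl', p, dev, dev)   # minus the Hessian
        beta = beta + np.linalg.solve(info, grad)
    return beta

# Simulated data: y is endogenous because mu enters both y and utility.
rng = np.random.default_rng(2)
N, J = 4000, 3
z = rng.normal(size=(N, J))                       # instrument
mu = rng.normal(size=(N, J))                      # unobserved factor
y = 1.0 * z + mu                                  # equation for the endogenous variable
utility = -1.0 * y + 0.8 * mu + rng.gumbel(size=(N, J))
chosen = utility.argmax(axis=1)

# Step 1: regress y on the instrument and keep the residuals.
Z = np.column_stack([np.ones(N * J), z.ravel()])
gamma_hat = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0]
mu_hat = (y.ravel() - Z @ gamma_hat).reshape(N, J)

# Step 2: choice model with the residual entering as an extra variable.
beta_cf = fit_logit(np.stack([y, mu_hat], axis=2), chosen)
beta_naive = fit_logit(y[:, :, None], chosen)     # ignores endogeneity
```

In this design the first element of `beta_cf` is the coefficient on the endogenous variable (true value -1.0) and the second is the estimate of λ (true value 0.8), while `beta_naive` understates the magnitude of the coefficient on y.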


The central issue with the control function approach is the specification of the control function and the conditional distribution of $\tilde{\varepsilon}_n$. In some situations, there are natural ways to specify these elements of the model. In other situations, it is difficult, or even impossible, to specify them in a way that meaningfully represents reality. The applicability of the approach depends on the researcher being able to meaningfully specify these terms.

Some examples of control function specifications are as follows:

1. Let

$$U_{nj} = V(y_{nj}, x_{nj}, \beta_n) + \varepsilon_{nj}$$

$$y_{nj} = W(z_{nj}, \gamma) + \mu_{nj}$$

and assume that $\varepsilon_{nj}$ and $\mu_{nj}$ are jointly normal with zero mean and constant covariance matrix for all j. By the properties of normals, the expectation of $\varepsilon_{nj}$ conditional on $\mu_{nj}$ is $\lambda \mu_{nj}$, where λ reflects the covariance, and deviations around the mean, $\tilde{\varepsilon}_{nj}$, are normal with constant variance. In this case, the control function is $CF(\mu_{nj}, \lambda) = \lambda \mu_{nj}$. Utility is:

$$U_{nj} = V(y_{nj}, x_{nj}, \beta_n) + \lambda \mu_{nj} + \tilde{\varepsilon}_{nj}.$$

The choice model is probit with the residual from the price equation as an extra variable. Note however that the variance of $\tilde{\varepsilon}_{nj}$ differs from the variance of $\varepsilon_{nj}$, such that the scale in the estimated probit is different from the original scale. If $\beta_n$ is random, then the model is a mixed probit.

2. Suppose $\varepsilon_{nj}$ consists of a normally distributed part that is correlated with $y_{nj}$ and a part that is iid extreme value. In particular, let:

$$U_{nj} = V(y_{nj}, x_{nj}, \beta_n) + \varepsilon^1_{nj} + \varepsilon^2_{nj}$$

$$y_{nj} = W(z_{nj}, \gamma) + \mu_{nj}$$

where $\varepsilon^1_{nj}$ and $\mu_{nj}$ are jointly normal and $\varepsilon^2_{nj}$ is iid extreme value. The conditional distribution of $\varepsilon^1_{nj}$ is, as in the previous example, normal with mean $\lambda \mu_{nj}$ and constant variance. However, the conditional distribution of $\varepsilon^2_{nj}$ is the same as its unconditional distribution since $\varepsilon^2_{nj}$ and $\mu_{nj}$ are independent. Utility becomes

$$U_{nj} = V(y_{nj}, x_{nj}, \beta_n) + \lambda \mu_{nj} + \tilde{\varepsilon}^1_{nj} + \varepsilon^2_{nj}$$

where $\tilde{\varepsilon}^1_{nj}$ is normal with zero mean and constant variance. This error component can be expressed as $\tilde{\varepsilon}^1_{nj} = \sigma \eta_{nj}$ where $\eta_{nj}$ is standard normal. Utility becomes:

$$U_{nj} = V(y_{nj}, x_{nj}, \beta_n) + \lambda \mu_{nj} + \sigma \eta_{nj} + \varepsilon^2_{nj}.$$

The choice probability is a mixed logit, mixed over the normal error components $\sigma \eta_{nj} \ \forall j$ as well as the random elements of $\beta_n$. The standard deviation, σ, of the conditional errors is estimated, unlike in the previous example.

3. The notation can be generalized to allow for correlation between $\varepsilon_{nj}$ and $\mu_{nk}$ for $k \neq j$. Let $\varepsilon_{nj} = \varepsilon^1_{nj} + \varepsilon^2_{nj}$ as in the previous example, except that now we assume that the stacked vectors $\varepsilon^1_n$ and $\mu_n$ are jointly normal. Then $\varepsilon^1_n$ conditional on $\mu_n$ is normal with mean $M \mu_n$ and variance Ω where M and Ω are matrices of parameters. Stacking utilities and the explanatory functions, we have

$$U_n = V(y_n, x_n, \beta_n) + M \mu_n + L \eta_n + \varepsilon^2_n$$

where L is the lower-triangular Choleski factor of Ω and $\eta_n$ is a vector of iid standard normal deviates. Since the elements of $\varepsilon^2_n$ are iid extreme value, the model is mixed logit, mixed over error components $\eta_n$ and the random elements of $\beta_n$. The residuals for all products enter the utility for each product.
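The link between joint normality and the matrices M, Ω, and L in the third example can be made concrete. In estimation M and Ω are parameters to be estimated; the sketch below instead starts from a known, made-up joint covariance of the stacked normal terms and applies the standard conditional-normal formulas, purely to show how the pieces relate. With one alternative, M reduces to the scalar λ of the first two examples, namely the covariance divided by the variance of μ. All names and numbers are hypothetical.

```python
import numpy as np

def conditional_normal(cov_ee, cov_em, cov_mm):
    """Mean matrix M and variance Omega of eps1 given mu under joint normality."""
    M = np.linalg.solve(cov_mm.T, cov_em.T).T     # cov_em @ inv(cov_mm)
    omega = cov_ee - M @ cov_em.T
    return M, omega

# Build a joint covariance for two alternatives from chosen M and Omega.
M_true = np.array([[0.6, 0.1], [0.0, 0.5]])
omega_true = np.array([[1.0, 0.3], [0.3, 0.8]])
cov_mm = np.array([[1.0, 0.2], [0.2, 1.5]])       # covariance of mu
cov_em = M_true @ cov_mm                          # covariance of eps1 with mu
cov_ee = M_true @ cov_mm @ M_true.T + omega_true  # covariance of eps1

M, omega = conditional_normal(cov_ee, cov_em, cov_mm)
L = np.linalg.cholesky(omega)                     # lower-triangular factor of Omega

# One simulation draw of the normal part of utility for one person.
rng = np.random.default_rng(3)
mu_n = np.array([0.4, -1.0])                      # residuals from the first step
eta_n = rng.standard_normal(2)                    # iid standard normal deviates
error_component = M @ mu_n + L @ eta_n
```

In a mixed logit simulation, draws of `eta_n` like this one would be repeated and the logit formula averaged over them.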

13.4.1 Relation to pricing behavior

As stated above, the primary limitation of the control function approach is the need to specify the control function and the conditional distribution of the new unobserved term $\tilde{\varepsilon}$. In some situations, the true conditional distribution is so complex that it cannot be derived and any specification using standard distributions, like normal, is necessarily incorrect. These issues are explained most readily in relation to the pricing behavior of firms, where price is the endogenous variable, but they can arise under any type of endogenous variable, depending on the way that the endogenous variable is determined.

Let the choice be among products and the endogenous variable $y_{nj}$ be price $p_{nj}$, so that we can discuss alternative pricing behaviors and whether they can be accommodated in the control function approach. Consider first a situation where the control function can be readily applied: marginal cost pricing. The utility that consumer n obtains from product j is:

$$U_{nj} = V(p_{nj}, x_{nj}, \beta_n) + \varepsilon^1_{nj} + \varepsilon^2_{nj}$$

where $\varepsilon^1_{nj}$ is correlated with price and $\varepsilon^2_{nj}$ is iid extreme value. For example, $\varepsilon^1_{nj}$ might represent unobserved attributes of the product. Prices vary over people because different people are in different markets or prices are set separately for people or groups of people (e.g., to account for transportation costs to each customer). Suppose further that firms set prices at marginal cost, which is specified as $MC_{nj} = W(z_{nj}, \gamma) + \mu_{nj}$ where $z_{nj}$ are exogenous variables that affect marginal cost (including the observed attributes of the product) and $\mu_{nj}$ captures the effect of unobserved cost shifters (including the unobserved attributes of the product). The price equation is then:

$$p_{nj} = W(z_{nj}, \gamma) + \mu_{nj}$$

Assume that $\varepsilon^1_{nj}$ and $\mu_{nj}$ are jointly normal with the same covariance matrix for all j. Correlation might arise, e.g., because unobserved attributes affect utility as well as costs, thereby entering both $\varepsilon^1_{nj}$ and $\mu_{nj}$. As in the second example above, utility becomes:

$$U_{nj} = V(p_{nj}, x_{nj}, \beta_n) + \lambda \mu_{nj} + \sigma \eta_{nj} + \varepsilon^2_{nj}$$

where $\eta_{nj}$ is iid standard normal. The model is estimated in two steps. First, the pricing equation is estimated and its residuals, $\hat{\mu}_{nj}$, are retained. Second, the choice model is estimated with these residuals entering as explanatory variables. The model is a mixed logit, mixed over the new error components $\eta_{nj}$. The same specification is applicable if firms price at a constant markup over marginal cost. With this pricing behavior, the pricing equation is sufficiently simple that reasonable assumptions on the observed terms in the model give a conditional distribution for unobserved terms in utility that can be derived and is convenient.

Consider, now, monopoly pricing or Nash equilibrium where price depends on the elasticity of demand as well as marginal cost. As shown above in relation to the BLP approach, the pricing equation for a single-product monopolist or Nash oligopolist is:

$$p_{nj} + (p_{nj}/e_{nj}) = MC_{nj}$$

$$p_{nj} = -(p_{nj}/e_{nj}) + W(z_{nj}, \gamma) + \mu_{nj}$$

Suppose now that $\varepsilon^1_{nj}$ and $\mu_{nj}$ are jointly normal. The distribution of $\varepsilon^1_{nj}$ conditional on $\mu_{nj}$ is still normal. However, $\mu_{nj}$ is no longer the only error in the pricing equation, and so we cannot obtain an estimate of $\mu_{nj}$ to condition on. Unlike marginal cost pricing, the unobserved component of demand, $\varepsilon^1_{nj}$, enters the pricing equation through the elasticity. The pricing equation includes two unobserved terms: $\mu_{nj}$ and a highly nonlinear transformation of $\varepsilon^1_{nj}$ entering through $e_{nj}$.

If we rewrite the pricing equation in a form that can be estimated, with an observed part and a separable error, we would have:

$$p_{nj} = Z_j(z_n, \gamma) + u^*_{nj}$$

where $z_n$ is a vector of all of the observed exogenous variables that affect marginal cost and the elasticity, $Z_j(\cdot)$ is a parametric function of these variables, and $u^*_{nj}$ is the unobserved deviation of price around this function. We can estimate this equation and retain its residual, which is an estimate of $u^*_{nj}$. However, $u^*_{nj}$ is not $\mu_{nj}$; rather $u^*_{nj}$ incorporates both $\mu_{nj}$ and the unobserved components of the elasticity-based markup $p_{nj}/e_{nj}$. The distribution of $\varepsilon^1_{nj}$ conditional on this $u^*_{nj}$ has not been derived, and may not be derivable. Given the way that $\varepsilon^1_{nj}$ enters the pricing equation through the elasticity, its conditional distribution is certainly not normal if its unconditional distribution is normal. In fact, its conditional distribution is not even independent of the exogenous variables.

Villas-Boas (2008) has proposed an alternative direction of derivation in this situation. He points out that the difficulty that we encountered above in specifying an appropriate control function arises from the assumption that marginal costs are separable in unobserved terms, rather than from the assumption that prices are related to elasticities. Let us assume instead that marginal cost is a general function of observed and unobserved terms: $MC_{nj} = W^*(z_{nj}, \gamma, \mu_{nj})$. In many ways, the assumption of non-separable unobserved terms is more realistic, since unobserved cost shifters can be expected in general to interact with observed cost shifters. Under this general cost function, Villas-Boas shows that for any specification of the control function and distribution of $\varepsilon^1_{nj}$ conditional on this control function, there exists a marginal cost function $W^*(\cdot)$ and distribution of unobserved terms $\mu_{nj}$ and $\varepsilon^1_{nj}$ that are consistent with them. This result implies that the researcher can apply the control function approach even when prices depend on elasticities, knowing that there is some marginal cost function and distribution of unobserved terms that make the approach consistent. Of course, this existence proof does not provide guidance on what control function and conditional distribution are most reasonable, which must still remain an important issue for the researcher.

13.5 Maximum likelihood approach

The maximum likelihood approach is similar to the control function approach except that the parameters of the model are estimated simultaneously rather than sequentially. As with the control function approach, utility is given in equation (13.13) and the endogenous explanatory variable is specified by (13.12). To allow more compact notation, stack each of the terms over alternatives, such that the two equations are:

$$U_n = V(y_n, x_n, \beta_n) + \varepsilon_n \tag{13.15}$$

$$y_n = W(z_n, \gamma) + \mu_n. \tag{13.16}$$

Rather than specifying the conditional distribution of $\varepsilon_n$ given $\mu_n$, the researcher specifies their joint distribution, denoted $g(\varepsilon_n, \mu_n)$. From equation (13.16), $\mu_n$ can be expressed as a function of the data and parameters: $\mu_n = y_n - W(z_n, \gamma)$. So the joint distribution of $\varepsilon_n$ and $y_n$ is $g(\varepsilon_n, y_n - W(z_n, \gamma))$. Denote the chosen alternative as i. The probability of the observed data for person n is the probability that the endogenous explanatory variable takes the value $y_n$ and that alternative i is chosen. Conditional on $\beta_n$, this probability is:

$$P_n(\beta_n) = \int I(U_{ni} > U_{nj} \ \forall j \neq i) \, g(\varepsilon_n, y_n - W(z_n, \gamma)) \, d\varepsilon_n$$

If $\beta_n$ is random, then $P_n(\beta_n)$ is mixed over its distribution. The resulting probability $P_n$ is inserted into the log-likelihood function: $LL = \sum_n \ln(P_n)$. This LL is maximized over the parameters of the model. Instead of estimating (13.16) first and using the residuals in the choice probability, the parameters of (13.16) and the choice model are estimated simultaneously.

The third example of the control function approach (which is themost general of the examples) can be adapted to this maximum likeli-hood procedure. Stacked utility is

Un = V (yn, xn, βn) + ε1n + ε2

n


with the stacked endogenous variable:

yn = W (zn, γ) + μn.

where W (·) is now vector-valued. Each element of ε2n is assumed to be

iid extreme value. Assume that ε1n and μn are jointly normal with zero

mean and covariance Ω. Their density is denoted φ(ε1n, μn | Ω). The

probability that enters the log-likelihood function for person n whochose alternative i is:

Pn =∫ ∫

eV (yni,xni,βn)+ε1ni∑

eV (ynj ,xnj ,βn)+ε1nj

φ(εn, (yn−W (zn, γ)) | Ω)f(βn | θ)dεndβn

This probability is inserted into the log-likelihood function, which is maximized with respect to $\gamma$ (the parameters that relate the endogenous explanatory variable to instruments), $\theta$ (which describes the distribution of preferences affecting utility), and $\Omega$ (the covariance of the correlated unobserved terms $\varepsilon^1_n$ and $\mu_n$). Of course, in any particular application, restrictions can be placed on $\Omega$ to reduce the number of parameters. For example, Park and Gupta (2009) assume that $\varepsilon^1_{nj}$ and $\mu_{nk}$ are not correlated for $k \neq j$.

The control function and maximum likelihood approaches present a tradeoff that is common in econometrics: generality versus efficiency. The maximum likelihood approach requires a specification of the joint distribution of $\varepsilon_n$ and $\mu_n$, while the control function approach requires a specification of the conditional distribution of $\varepsilon_n$ given $\mu_n$. Any joint distribution implies a particular conditional distribution, but a given conditional distribution does not necessarily imply a particular joint distribution: there may be numerous joint distributions that have the specified conditional distribution. The control function approach is therefore more general than the maximum likelihood approach: it is applicable under any joint distribution that is consistent with the specified conditional distribution. However, if the joint distribution can be correctly specified, the maximum likelihood approach is more efficient than the control function approach, because it is the maximum likelihood estimator for all the parameters.


13.6 Case Study: Consumers’ choice among new vehicles

A useful illustration of the BLP approach is given by Train and Winston (2007). Their study examined the question of why the Big Three automakers have been losing market share. As part of the study, they estimated a model of buyers’ choices among new vehicles. Since many attributes of new vehicles are unobserved by the researcher and yet affect price, it is expected that price is endogenous.

Estimation was performed on a sample of consumers who bought or leased new vehicles in the year 2000, along with data on the aggregate market shares for each make and model in that year. The analysis did not include an outside good, and, as such, gives the demand conditional on new vehicle purchase. For each sampled buyer, the survey information included the make and model of the vehicle that the person bought plus a list of the vehicles that the person said that they considered. The chosen and considered vehicles were treated as a ranking, with the chosen vehicle first and the considered vehicles ranked in the order that the person listed them. The choice model was specified as an “exploded logit” for the probability of the person’s ranking, mixed over a distribution of random coefficients. See section 7.3 for an extended discussion of this specification for rankings data. The choice model included constants for each make and model of vehicle, as well as explanatory variables and random coefficients that capture variations in preferences over consumers. The analysis distinguished 200 makes and models and used the contraction to calculate the 199 constants (with one normalized to zero) within the maximum likelihood estimation of the other parameters. The estimated constants were then regressed on observed attributes of the vehicles, including price. Since price was considered endogenous, this regression was estimated by instrumental variables rather than ordinary least squares.
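The contraction step can be sketched as follows. This is a toy illustration rather than the study's code: it uses a plain logit probability for each consumer instead of the exploded logit, three alternatives instead of 200, and made-up consumer-specific utility terms `dev` and market shares `obs`. The constants are updated by adding the log of the observed share and subtracting the log of the predicted share, with the first constant normalized to zero at each iteration.

```python
import math

def predicted_shares(delta, dev):
    """Average logit probabilities over consumers.

    delta[j]  : constant for alternative j
    dev[n][j] : consumer-specific part of utility for consumer n, alternative j
    """
    n_alt = len(delta)
    shares = [0.0] * n_alt
    for row in dev:
        expu = [math.exp(delta[j] + row[j]) for j in range(n_alt)]
        total = sum(expu)
        for j in range(n_alt):
            shares[j] += expu[j] / total / len(dev)
    return shares

def contraction(obs_shares, dev, tol=1e-12, max_iter=1000):
    """Iterate delta <- delta + ln(S) - ln(S_hat(delta)) until it settles."""
    delta = [0.0] * len(obs_shares)
    for _ in range(max_iter):
        pred = predicted_shares(delta, dev)
        new = [d + math.log(s) - math.log(p)
               for d, s, p in zip(delta, obs_shares, pred)]
        new = [d - new[0] for d in new]        # normalize first constant to zero
        if max(abs(a - b) for a, b in zip(new, delta)) < tol:
            return new
        delta = new
    return delta

# Made-up inputs: three consumers, three alternatives
dev = [[0.0, 0.5, -0.3], [0.2, -0.4, 0.1], [-0.1, 0.3, 0.6]]
obs = [0.5, 0.3, 0.2]
delta = contraction(obs, dev)
```

In the estimation described above, a routine of this kind would be called inside the likelihood evaluation, so that each trial value of the other parameters is paired with the constants that reproduce the observed market shares.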

Table 13.1 gives the estimates of the parameters that relate to variation in preferences over consumers. These are the parameters that were estimated by maximum likelihood on the probability of each buyer’s ranking of makes and models, using the contraction for the constants. The estimated coefficients have the following implications:

• Price divided by income enters as an explanatory variable, to capture the concept that higher income households place less importance on price than households with less income. The variable is given a normally distributed random coefficient, whose mean and standard deviation are estimated. The estimated mean is negative, as expected, and the estimated standard deviation is fairly large and significant (with a t-statistic over two), indicating considerable variation in response to price.

• The Consumer Reports repair index enters for women who are at least 30 years old, and not for men and younger women. This distinction was discovered through testing of alternative demographic variables interacted with the repair index.

• People who lease their vehicle are found to have a stronger preference for luxury and sports vehicles than people who buy.

• Households with adolescents are found to have a stronger preference for vans, SUVs, and station wagons than other households.

• Dealership locations are found to affect households’ choices. One of the purposes of Train and Winston’s analysis was to investigate the impact of dealership locations on vehicle choice, to see whether changes in dealership locations can explain part of the Big Three’s loss in share. The demand model indicates that, as expected, the probability of buying a given make and model rises when there are more dealerships within a 50-mile radius of the household that sell that make and model.

• Horsepower, fuel consumption, and a truck dummy enter with random coefficients. These variables do not vary over consumers, only over makes and models. As a result, the mean coefficient times the variable is absorbed into the make/model constants, such that only the standard deviation is estimated in the choice model. To be precise: each of these variables enters utility as $\beta_n x_j$, with random $\beta_n$. Decompose $\beta_n$ into its mean $\bar\beta$ and deviations $\tilde\beta_n$. The mean impact $\bar\beta x_j$ does not vary over consumers and becomes part of the constant for vehicle $j$. The deviations $\tilde\beta_n x_j$ enter the choice model separately from the constants, and the standard deviation of $\tilde\beta_n$ is estimated. The estimates imply considerable variation in preferences for power, fuel efficiency, and trucks.

• The last set of variables captures the loyalty of consumers towards manufacturers. Each variable is specified as the number of past consecutive purchases from a given manufacturer (or group of manufacturers). The estimated coefficients imply that loyalty to European manufacturers is largest, followed by loyalty to Ford. Interestingly, rural households are found to be more loyal to GM than urban households. These loyalty variables are a type of lagged dependent variable, which introduces econometric issues due to the possibility of serially correlated unobserved factors. Train and Winston discuss these issues and account for the serial correlation, at least partly, in their estimation procedure.

Table 13.1: Mixed Exploded Logit Model of New Vehicle Choice

| Variable | Parameter | Std. error |
|---|---|---|
| Price (MSRP) divided by respondent’s income: mean coefficient | -1.6025 | 0.4260 |
| Price (MSRP) divided by respondent’s income: standard deviation of coefficient | 0.8602 | 0.4143 |
| Consumer Reports repair index, for women aged 30+ | 0.3949 | 0.0588 |
| Luxury or sports car, for lessees | 0.6778 | 0.4803 |
| Van, for households with an adolescent | 3.2337 | 0.5018 |
| SUV or station wagon, for households with an adolescent | 2.0420 | 0.4765 |
| ln(1+Number of dealers within 50 miles of person’s home) | 1.4307 | 0.2714 |
| Horsepower: std dev of coefficient | 0.0045 | 0.0072 |
| Fuel consumption (1/mpg): std dev of coefficient | -102.15 | 20.181 |
| Light truck, van, or pickup: std dev of coefficient | 6.8505 | 2.5572 |
| Number of previous consecutive GM purchases | 0.3724 | 0.1471 |
| Number of previous consecutive GM purchases, if rural | 0.3304 | 0.2221 |
| Number of previous consecutive Ford purchases | 1.1822 | 0.1498 |
| Number of previous consecutive Chrysler purchases | 0.9652 | 0.2010 |
| Number of previous consecutive Japanese purchases | 0.7560 | 0.2255 |
| Number of previous consecutive European purchases | 1.7252 | 0.4657 |
| Manufacturer loyalty, error component: std dev | 0.3453 | 0.1712 |
| SLL at convergence | -1994.93 | |
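The treatment of horsepower, fuel consumption, and the truck dummy described in the list above can be written out as a short derivation. The notation is assumed here for illustration: $\bar\beta$ is the mean of the coefficient, $\tilde\beta_n$ is the consumer's deviation from the mean, $\delta_j$ is the make/model constant, and $\sigma$ is the standard deviation. The sketch also assumes that the deviation is normally distributed, so that $\tilde\beta_n = \sigma\eta_n$ with standard normal $\eta_n$.

```latex
\begin{align*}
\beta_n x_j &= (\bar\beta + \tilde\beta_n)\, x_j \\
            &= \underbrace{\bar\beta x_j}_{\text{absorbed into } \delta_j}
             + \underbrace{\sigma \eta_n x_j}_{\text{enters the choice model}},
             \qquad \eta_n \sim N(0, 1).
\end{align*}
```

Because $\eta_n$ is symmetric around zero, $\sigma\eta_n$ and $-\sigma\eta_n$ have the same distribution, so only the absolute value of $\sigma$ is identified. Under this reading, a negative estimate for a standard deviation, such as the one reported for fuel consumption in Table 13.1, would be interpreted through its magnitude.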

The choice model in Table 13.1 included constants for each make and model of vehicle. The estimated constants were regressed on vehicle attributes to estimate the aspects of demand that are common to all consumers. Table 13.2 gives the estimated coefficients of this regression, using instrumental variables to account for the endogeneity of price.


As expected, price enters with a negative coefficient, indicating that consumers dislike higher price, all else held equal (i.e., assuming there are no changes in a vehicle’s attributes to compensate for the higher price). Interestingly, when the regression was estimated by ordinary least squares, ignoring endogeneity, the estimated price coefficient was considerably smaller in magnitude: −0.0434 compared to the estimate of −0.0733 using instrumental variables. This direction of difference is expected, since a positive correlation of price with unobserved attributes creates a downward bias in the magnitude of the price coefficient. The size of the difference indicates the importance of accounting for endogeneity.
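The direction of this bias can be illustrated with simulated data. The sketch below is not the study's data or procedure. It generates hypothetical constants equal to a true price coefficient of −1 times price, plus an unobserved attribute `xi` that also raises price, plus noise. It then compares the ordinary least squares slope with a simple one-instrument estimator, the ratio of the covariance of the instrument with the constants to its covariance with price. The instrument `z`, the sample size, and all coefficients are assumptions chosen for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def simulate(n=5000, true_coef=-1.0, seed=1):
    """Return (OLS slope, IV slope) for simulated constants regressed on price."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]      # instrument, unrelated to xi
    xi = [rng.gauss(0, 1) for _ in range(n)]     # unobserved attribute
    price = [zj + xj for zj, xj in zip(z, xi)]   # price rises with xi
    delta = [true_coef * p + x + 0.5 * rng.gauss(0, 1)
             for p, x in zip(price, xi)]
    ols = cov(price, delta) / cov(price, price)  # ignores the endogeneity
    iv = cov(z, delta) / cov(z, price)           # uses only variation from z
    return ols, iv

ols, iv = simulate()
```

In this design the OLS slope is pulled toward zero because price proxies for the desirable unobserved attribute, while the instrumental variables slope stays close to the true coefficient.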

The other estimated coefficients have expected signs. Consumers are estimated to value additional power, as evidenced by the positive coefficient for the horsepower to weight ratio. Consumers also prefer to have an automatic transmission as standard equipment (holding the price of the vehicle constant). The size of the vehicle is measured both by its wheelbase and its length beyond the wheelbase. Both measures enter positively, and the wheelbase obtains a larger coefficient than the length beyond wheelbase. This relation is expected, since the wheelbase is generally a better indicator of the interior space of the vehicle than the length beyond the wheelbase. The negative coefficient of fuel consumption implies that consumers prefer greater fuel efficiency, which reduces fuel consumption per mile.

Notice that price enters both parts of the model: the regression in Table 13.2, which captures the impacts that are constant over consumers, and the choice model in Table 13.1, which captures impacts that differ over consumers. Taking both parts together, price enters utility with a coefficient: $-0.0733 - 1.602/\text{Income} + 0.860\,\eta/\text{Income}$, where $\eta$ is a standard normal random term. The price coefficient has a constant component, a part that varies with the income of the household, and a part that varies randomly over households with the same income.
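The combined coefficient can be evaluated for a given household with a few lines of code. The sketch uses the estimates reported in the tables (−0.0733 from Table 13.2, and −1.6025 and 0.8602 from Table 13.1). It assumes that price and income are scaled as they were in the study, which this section does not state, so the function name and the example inputs are purely illustrative.

```python
def price_coefficient(income, eta):
    """Combined price coefficient for one household.

    income : household income, in the (unstated) units used in the study
    eta    : the household's draw from a standard normal distribution
    """
    constant_part = -0.0733               # Table 13.2: common to all consumers
    income_part = -1.6025 / income        # Table 13.1: mean coefficient
    random_part = 0.8602 * eta / income   # Table 13.1: std dev of coefficient
    return constant_part + income_part + random_part

# Hypothetical households: same income, different draws of eta
average_household = price_coefficient(income=2.0, eta=0.0)
less_sensitive = price_coefficient(income=2.0, eta=1.0)
```

Holding income fixed, a positive draw of `eta` moves the coefficient toward zero, which represents a household that is less responsive to price than the average household at that income.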

The endogeneity of price has been appropriately handled in this case study by including constants in the choice model to absorb the unobserved attributes and then using instrumental variables when regressing the constants against price and other observed attributes, i.e., by the BLP approach. A case study of the control function approach is provided by Petrin and Train (2009) and of the maximum likelihood approach by Park and Gupta (2009).


Table 13.2: Regression of constants on vehicle attributes

| Variable | Parameter | Std. error |
|---|---|---|
| Price (MSRP, in thousands of dollars) | -0.0733 | 0.0192 |
| Horsepower divided by weight (in tons) | 0.0328 | 0.0117 |
| Automatic transmission standard | 0.6523 | 0.2807 |
| Wheelbase (inches) | 0.0516 | 0.0127 |
| Length minus wheelbase (inches) | 0.0278 | 0.0069 |
| Fuel consumption (in gallons per mile) | -31.641 | 23.288 |
| Luxury or sports car | -0.0686 | 0.2711 |
| SUV or station wagon | 0.7535 | 0.4253 |
| Van | -1.1230 | 0.3748 |
| Pickup truck | 0.0747 | 0.4745 |
| Chrysler | 0.0228 | 0.2794 |
| Ford | 0.1941 | 0.2808 |
| General Motors | 0.3169 | 0.2292 |
| European | 2.4643 | 0.3424 |
| Korean | 0.7340 | 0.3910 |
| Constant | -7.0318 | 1.4884 |
| R-squared | 0.394 | |


Bibliography

Berry, S. (1994), ‘Estimating discrete choice models of product differentiation’, RAND Journal of Economics 25, 242–262.

Berry, S., J. Levinsohn and A. Pakes (1995), ‘Automobile prices in market equilibrium’, Econometrica 63, 841–889.

Berry, S., J. Levinsohn and A. Pakes (2004), ‘Differentiated products demand system from a combination of micro and macro data: The new car market’, Journal of Political Economy 112, 68–105.

Chintagunta, P., J. Dube and K. Goh (2005), ‘Beyond the endogeneity bias: The effect of unmeasured brand characteristics on household-level brand choice models’, Management Science 52, 832–849.

Ferreira, F. (2004), ‘You can take it with you: Transferability of Proposition 13 tax benefits, residential mobility, and willingness to pay for housing amenities’, Center for Labor Economics Working Paper No. 72, http://ssrn.com/abstract=661421.

Goolsbee, A. and A. Petrin (2004), ‘The consumer gains from direct broadcast satellites and the competition with cable TV’, Econometrica 72, 351–382.

Greene, W. (2000), Econometric Analysis, 4th edn, Prentice Hall, Upper Saddle River, New Jersey.

Guevara, C. and M. Ben-Akiva (2006), ‘Endogeneity in residential location choice models’, Transportation Research Record 77, 60–66.

Hausman, J. (1978), ‘Specification tests in econometrics’, Econometrica 46, 1251–1272.


Hausman, J. (1997), Valuation of new goods under perfect and imperfect competition, in R. Gordon and T. Bresnahan, eds, ‘The Economics of New Goods’, University of Chicago Press, Chicago.

Heckman, J. (1978), ‘Dummy endogenous variables in a simultaneous equation system’, Econometrica 46, 931–959.

Heckman, J. and R. Robb (1985), ‘Alternative methods for evaluating the impacts of interventions: An overview’, Journal of Econometrics 30, 239–267.

Jiang, R., P. Manchanda and P. Rossi (2007), ‘Bayesian analysis of random coefficient logit models using aggregate data’, Working Paper, Graduate School of Business, University of Chicago.

Martin, L. (2008), ‘Consumer demand for compact fluorescent light bulbs’, working paper, Department of Agricultural and Resource Economics, University of California, Berkeley.

Nevo, A. (2001), ‘Measuring market power in the ready-to-eat cereal industry’, Econometrica 69, 307–342.

Park, S. and S. Gupta (2009), ‘A simulated maximum likelihood estimator for the random coefficient logit model using aggregate data’, Journal of Marketing Research forthcoming.

Petrin, A. (2002), ‘Quantifying the benefits of new products; the case of the minivan’, Journal of Political Economy 110, 705–729.

Petrin, A. and K. Train (2009), ‘A control function approach to endogeneity in consumer choice models’, Journal of Marketing Research forthcoming.

Rivers, D. and Q. Vuong (1988), ‘Limited information estimators and exogeneity tests for simultaneous probit models’, Journal of Econometrics 39, 347–366.

Ruud, P. (2000), An Introduction to Classical Econometric Theory, Oxford University Press, New York.

Shim, E. and E. Sudit (1995), ‘How manufacturers price products’, Management Accounting 76, 37–39.


Train, K. and C. Winston (2007), ‘Vehicle choice behavior and the declining market share of U.S. automakers’, International Economic Review 48, 1469–1496.

Villas-Boas, M. (2007), ‘A note on limited versus full information estimation in non-linear models’, working paper, Haas School of Business, University of California, Berkeley.

Villas-Boas, M. and R. Winer (1999), ‘Endogeneity in brand choice models’, Management Science 45, 1324–1338.

Yang, S., Y. Chen and G. Allenby (2003), ‘Bayesian analysis of simultaneous demand and supply’, Quantitative Marketing and Economics 1, 251–275.
