Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Moral Hazard, Adverse Selection and HealthExpenditures: A Semiparametric Analysis∗
Patrick BajariDepartment of Economics
University of Minnesota and NBER
Han HongDepartment of Economics
Stanford University
Ahmed KhwajaSchool of Management
Yale University
Christina MarshDepartment of Economics
University of Georgia
Abstract
Theoretical models predict asymmetric information in health insurance marketsmay generate inefficient outcomes due to adverse selection and moral hazard. How-ever, previous empirical research has found it difficult to disentangle adverse selectionfrom moral hazard in health care. We empirically study this question using a uniqueclaims-level dataset with confidential information from a large self-insured employerto estimate a structural model of the demand for health care. We propose a two-step semiparametric estimation strategy to identify and estimate a canonical model ofasymmetric information in health care markets. We find significant evidence of moralhazard and adverse selection.
JEL CLASSIFICATION: C14, D82, I11KEYWORDS: Adverse Selection, moral hazard, health insurance, semiparametric es-timation
∗We have benefited from the comments of participants of the Conference on Structural Models in Labor,Aging and Health, and the Seventeenth Annual Health Economics Conference. We acknowledge excellentresearch assistance from Ivan Shaliastovich, and especially Alvin Murphy. Bajari and Hong would like tothank the National Science Foundation for generous research support. The authors may be contacted at thefollowing email addresses respectively: [email protected], [email protected], [email protected],[email protected].
1
1 Introduction
A large theoretical literature predicts that adverse selection and moral hazard may gener-
ate inefficient outcomes in insurance markets due to asymmetric information (Arrow 1963,
Akerlof 1970, Spence and Zeckhauser 1971). The predictions of insurance theory and the
efficient regulation of insurance markets can depend on whether adverse selection or moral
hazard is more important. However, it is well recognized that it is empirically difficult to
distinguish between moral hazard and adverse selection and empirical research that disen-
tangles moral hazard from adverse selection is almost non-existent.1As a consequence there
is little consensus on which of these two sources of inefficiency is more important to address
for public policy. In this paper, we introduce an analytical approach that can separately
identify the two effects.
Our research builds on a canonical model of demand for health care. We begin by formu-
lating a model of consumers’ health care choices given insurance plan offerings. The model is
motivated by theoretical models proposed by Spence and Zeckhauser (1971) and Blomqvist
(1997). A consumer has preferences over both health care expenditure and aggregate con-
sumption, which are influenced by the consumer’s latent health status. A consumer’s optimal
choice of health care expenditure depends on his latent health status and his budget con-
straint. The budget constraint accounts for the characteristics of the chosen health insurance
plan and a plan-specific probability distribution of reimbursement for health expenditure.
Consumers have private information about their latent health status which is unobserved by
the insurer, leading to adverse selection. The consumers do not pay the full costs of their
health care coverage, which induces a moral hazard problem. We estimate risk parameters
semiparametrically and use these parameters to recover the latent health status distribution.
Given the estimated health status distribution, utility parameters, and observed consump-
tion choices, we are able to infer the level of moral hazard with a counterfactual replicating
1A notable exception is the work of Einav, Jenkins and Levin (2010), who estimate a model that incor-porates both these phenomena in the context of automobile loan markets.
2
a social planner’s optimal consumption allocation. Lastly, we use the estimated plan specific
latent health distributions to test for adverse selection across plans.
Our empirical research is made possible by access to a unique confidential claims-level
dataset from a large self-insured employer. The data contains a high level of detail and
includes all information that was necessary to process a claim. Since this data was used
directly for insurance processing by the employer, the data also has a high level of com-
pleteness and accuracy. Each consumer is matched to corresponding health expenditures,
reimbursements, salary, age, and health status indicators.
Our work contributes to the growing body of empirical work on the structural estimation
of models of medical utilization and health insurance choice (Cameron et al. 1988, Cardon
and Hendel 2001, Vera-Hernandez 2003, Blau and Gilleskie 2008, Khwaja 2001, 2010). We
differ from the existing earlier literature in three ways. First, we allow for both adverse selec-
tion and moral hazard due to asymmetric information in estimating the model. Second, our
estimation strategy is semiparametric, in contrast to earlier work relying on parsimoniously
specified parametric models. This is important because theory provides little guidance about
which parametric distributions for latent health shocks are a priori most plausible. Semi-
parametric estimation also is a major contribution because it allows us to specify plans’
reimbursement schedules flexibly, allowing for non-linearities in common characteristics such
as deductibles and copays, which could not be captured by more restrictive specifications in
previous literature. Third and most notably, our method only necessitates a single, relatively
weak identification assumption. Previous structural approaches rely on strong identifying
assumptions. In this paper, the sole identifying assumption is that the distribution of health
shocks is invariant over a short time span, i.e., the quantiles of the underlying health status
distribution are invariant across the three years of our data. This assumption is reasonable
given a large enough population of study. In particular, our data is from the largest em-
ployer in the state of Minnesota, which has retained the top position for many years with
a stable population of employees over our period of study. Additionally, we use consecutive
3
years in our estimation, which further supports the validity of this identifying assumption by
limiting the possibility of long-term changes in the local population or restructuring of the
employee base. Hence, our research extends the nascent empirical literature on examining
the economic effects of adverse selection in health insurance (see e.g., Carlin and Town 2009,
Bundorf, Levin and Mahoney 2010, Einav, Finkelstein and Cullen 2010, Handel 2010).
More generally, our paper contributes to the literature on analyzing distortions due to
asymmetric information in insurance markets. A common method in previous research to
detect asymmetric information is to examine the correlation between risk outcomes and a
measure of the generosity of a contract. In the context of health insurance, Cutler and Zeck-
hauser (2000) review an extensive literature that finds evidence of adverse selection based
on the positive correlation between generosity of the insurance contract and adverse out-
comes, and moral hazard based on the coinsurance elasticity of the demand for medical care.
However, Chiappori and Salanie (2003) show that, under moral hazard, the generosity of
the contract will lead to adverse risk outcomes, while under adverse selection the causality
is reversed. This leads to observational equivalence between the two hypotheses. Einav,
Finkelstein, and Levin (2010) and Abbring et al(2003) show that disentangling the two ef-
fects requires a modeling framework in the absence of experimental data.2 Moreover, the
nature of health insurance markets necessitates a more structured approach than reduced
form methods. This is because the price for health care services, in the form of deductibles
or copays, is often based on the amount of health care consumed. Thus, any conventional
regression analysis between health care quantities and incentives will be susceptible to endo-
geneity bias. Our framework, in addition to documenting the presence of moral hazard and
adverse selection, permits policy analysis using counterfactual simulations. Furthermore, it
allows estimation of the level of distortions of insurance markets and the welfare impact of
market interventions.
2Abbring et al. (2003) suggest exploiting the dynamic consequences of experience rating to distinguishbetween adverse selection and moral hazard. U.S. health insurance market regulation precludes this empiricalstrategy.
4
Our results indicate both significant moral hazard and adverse selection. Our estimates
of risk aversion on health expenditures are slightly higher than for composite good consump-
tion, which is consistent with the irrecoverable nature of health. Second, we find substantial
evidence of moral hazard or “overconsumption” of medical care. In our plans, overconsump-
tion due to insurance may be as high as 40 percent of health expenditure within our plans.
Third, we devise a nonparametric test for adverse selection, and find that individuals are
indeed sorting across different insurance categories based on their latent health status. We
find that plans with low out-of-pocket costs but restrictive provider choice attract a larger
proportion of healthy consumers vis a vis more expensive but flexible plans.
Our research is novel in that it develops a tractable estimation procedure under minimal
parametric assumptions to simultaneously examine adverse selection and moral hazard in
health insurance contracts. It also provides an important framework for similar analysis in
other contexts, especially with cross-section data, where distortions exist due to asymmetric
information. The rest of the paper is organized as follows. The model is discussed in Section 2
and the data in Section 3. Details of the estimation procedure are provided in Section 4.
Estimation results including analysis of moral hazard and adverse selection are discussed in
Section 5. Section 6 concludes.
2 Model
Our focus is on disentangling adverse selection and moral hazard in health insurance mar-
kets. This is difficult because the market for health insurance and health care is rich with
institutional detail and complex. Hence, in order to encapsulate the essential features of
asymmetric information in health insurance markets , we develop a canonical model of de-
mand for health care that builds on the work of Spence and Zeckhauser (1971), Bloomqvist
(1997) and Cardon and Hendel (2001). The objective of consumers acting directly, or in
concert with physicians who are their agents, is to maximize consumer utility. This is done
by choosing the optimal amount of health services m and consumption of a composite com-
5
modity c subject to a budget constraint. This budget constraint requires that out-of-pocket
expenses for m plus the value of c must be less than or equal to income y minus the insurance
premium p.
Our model differs from the standard neoclassical model of consumer choice in two impor-
tant respects. First, the optimal choice of m and c depends on a consumer’s latent health
status, which we denote as the scalar θ. We interpret θ as a preference shock for health
services, that is, the higher the value of θ, the higher the utility from health services. While
θ is observed to the consumer, it is not observed by the insurer. As a result, our model allows
for adverse selection, which plays a central role in the economic analysis of health insurance
and insurance markets. Second, the budget constraint in our model is more complicated
than the standard neoclassical model. Health insurance plans introduce nonlinearities in the
budget constraint through features such as deductibles and coverage gaps (Keeler, Newhouse
and Phelps 1977). Also, the consumer’s out-of-pocket expenses are uncertain at the time
that the health care choice m is made. We allow this out-of-pocket expense to be stochastic,
generalizing previous research which typically assumes that the consumer’s costs of health
care are known at the time m is chosen. Given this budget constraint, moral hazard arises
in the model when the consumers do not face the full cost of health services.
An important reason for adopting this framework to disentangle moral hazard and ad-
verse selection in health insurance is that there is strong evidence that consumers’ health
care choices do respond to health insurance incentive structures (see Manning et al. 1987,
Kowalski 2010, Marsh 2010). Conversely, insurance plans are designed with enrollee char-
acteristics, utilization and expenditures in mind. Hence, in the absence of the framework
described above, it would be extremely difficult to disentangle adverse selection from moral
hazard because consumer choices and plan characteristics are determined jointly in response
to each other. We describe our model in more detail below.
6
2.1 Consumer Preferences
The consumer’s utility function is specified as:
U(c,m; θ, γ) = (1− θ) c1−γ1
1− γ1
+ θm1−γ2
1− γ2
The consumer’s utility depends on the consumption of m and c. We assume consumption
to be additively separable in these two terms. The latent health status parameter θ lies
between [0,1] and indexes the weight that the consumer places on consumption of health
services m and non-health aggregate commodity c. If θ is close to one (zero), the consumer
has a greater valuation for m (c).
Consumers in our environment are heterogenous because θ varies across individuals. Let
g(θ) and G(θ) denote the density and cdf of health status across individuals in our data
set. In principle, we could let g depend on observed covariates considered to be associated
with health status such as age, education level or income. However, in the absence of a
theoretical foundation for the choice of such covariates, any ad hoc specification involves the
risk of misspecification. As we shall discuss in the estimation section, we do not need to
specify this part of the model. In our results section, we examine the relationship between θ
and such covariates, which has the additional advantage that it helps evaluate the plausibility
of the estimated distribution of θ.
Another feature of our model is that we allow for multidimensional risk aversion. We
assume constant relative risk aversion in c and m. The parameters γ1 and γ2 describe the
consumer’s attitude towards risk with respect to aggregate consumption and health respec-
tively. This is important in our framework because the out-of-pocket expenses associated
with m are stochastic.
2.2 Budget Constraint
The budget constraint specifies that the consumer’s total expenditure on the composite
commodity plus health must be less than her income after deducting medical premiums.
7
The price to the consumer of m depends on two things. The first is pj, the fixed premium of
the insurance plan j. There are three plans in our data, a Health Maintenance Organization
(HMO) and two Preferred Provider Organizations (PPO).
We do not model the choice of j. We could imbed our framework in a richer model that
includes the choice of a health plan in an earlier stage.3 However, the optimal choice of m
will be sufficient to identify our model parameters so we abstract from this complication.
The choice of j does not influence the consistency of our estimates. In our results section,
we will use our estimates of the model to ask how consumers with different values of θ sort
across plans and if their observed sorting is consistent with adverse selection. The second
part of our model that determines the consumer’s cost is the reimbursement rate of insurance
plan j, which we label as the scalar aj. If a consumer chooses health services that lead to
medical expenditure m then the insurer will cover ajm of these expenses. As a result, the
consumer must pay for m(1− aj) out-of-pocket.
The consumer’s budget constraint is:
c+m(1− aj) ≤ y − pj
A difficulty the consumer faces is that aj will be uncertain at the time that m is cho-
sen. Medical plans are long and typically quite complicated documents that are frequently
written by insurers, their executives, and attorneys. It is unlikely that a typical consumer
invests the resources to understand what these medical plans cover in all states of the world.
Furthermore, the final determination of reimbursement is done by administrators and is of-
ten the outcome of a negotiation between the health provider and the insurer, which is a
complicated process.
What is important to model is the determination of aj from the perspective of the con-
sumer since we are ultimately interested in consumer demand. The other issues raised above,
while interesting, are secondary to our research question. Therefore, we shall model aj as a
3See e.g., Khwaja (2001, 2010) for a dynamic model of health insurance and health care decisions.
8
random variable with a distribution fj(aj|m). This allows the generosity of the benefits to
depend on the plan j chosen by the consumer. Furthermore, we allow aj to depend on m.
This is natural in the context of health care. For example, many plans require a consumer to
pay a fixed fee for a doctor’s visit. Plans may display features such as deductibles or coverage
gaps which do not reimburse a certain range of expenditures. Our framework accommodates
these complications by allowing the reimbursement rate to depend on m.
We shall use the observed distribution of reimbursements and flexible, nonparametric
methods to identify fj(aj|m). We view this is as preferable to a strict, ex ante specification
of consumers’ expectations about aj. We allow these expectations to be data driven and con-
sistent with the standard economic assumption that consumers have rational expectations,
in the sense that their beliefs about aj must be consistent with the observed outcomes.4
2.3 Expected Utility and First Order Conditions
The consumer makes her choice under uncertainty. A time line for the consumer’s choice is:
1. A value of θ is drawn from g.
2. After the consumer observes θ, the consumer makes a choice of m.
3. The reimbursement rate aj is realized from the distribution fj(aj|m).
4. Since preferences are strictly increasing, the budget constraint binds and c is deter-
mined by the equation c = y − pj −m(1− aj).
Let EU(m; pj, y, θ, γ) denote the consumer’s expected utility at step 2 in the time line
above, given pj, y, θ, and γ.
4As an alternative it might be interesting to allow consumers to have biased or irrational beliefs aboutthe determination of aj . However, our model is very flexible and comes close to exhausting the degrees offreedom in the data. The identification of such irrational beliefs would therefore be tenuous. There is noconsensus about a theoretical framework which would provide a plausible basis for the a priori specificationof how consumers bias their beliefs in the context of health care. As a consequence, we use the more commonassumption that consumers have beliefs that are consistent with ex post outcomes.
9
EU(m; pj, y, θ, γ) =
∫(1− θ)(y − pj −m(1− aj))1−γ1
1− γ1
fj(aj|m)daj + θm1−γ2
1− γ2
(2.1)
The above expression substitutes the choice of m for c using the consumer’s budget con-
straint. In computing expected utility, the consumer integrates over aj using the distribu-
tion fj(aj|m). If the realization of aj is close to one (zero), the value of c will be larger
(smaller) all else held equal. The value of EU(m) depends crucially on γ, the consumer’s
attitude towards risk. For example, the more risk averse the consumer is toward uncertainty
in consumption of c, we would expect her to consume less m. The utility also depends on
income and premiums through y− pj. For example, households with lower income are more
adversely impacted by a low realization of aj.
The optimal choice of m is determined by a first order condition that sets the derivative
of EU(m) in m equal to zero. Formally, we can write this condition as:
∂EU(.)
∂m=
∂
∂m
[∫(1− θ)(y − pj −m(1− aj))1−γ1
1− γ1
fj(aj|m)daj
]+
∂
∂m
[θm1−γ2
1− γ2
]
=1− θ1− γ1
∫[−(1− γ1)(1− aj)(y − pj −m(1− aj))−γ1fj(aj|m)
+ (y − pj −m(1− aj))1−γ1 ∂fj(aj|m)
∂m]da+
[θ ·m−γ2
]= 0 (2.2)
The first line of the equation 2.2 describes the marginal effect of increase in health care
expenditure on expected utility. The left bracketed term shows the marginal change in a con-
sumer’s utility due to a marginal change in consumption of the aggregate commodity c. The
presence of the conditional probability of reimbursement fj(aj|m) reflects the uncertainty
in reimbursement and the effect this has on the income that is available for consumption of
the aggregate commodity. The right bracketed term shows the marginal increase in utility
10
due to the consumption of health care services. Each bracketed term is weighted by θ, so
that the weight on a marginal increase in health care services reflects the private value the
consumer places on health care. The total effect on expected utility is the combined terms,
which are expanded in the second line of equation 2.2.
2.4 Inferring θ from observed choices
In this section we provide intuition about how our estimation procedure allows us to estimate
the density of health shocks g and the preference parameters γ from the observed choices.
Let us rewrite 2.2 in the following way:
θ =I
I − (1−γ2)mγ2
(2.3)
where
I =
∫[−(1− γ1)(1− aj)(y − pj −m(1− aj))−γ1fj(aj|m)
+(y − pj −m(1− aj))1−γ1 ∂fj(aj|m)
∂m]daj (2.4)
The rationale underlying our estimator is similar to that used in auction models, espe-
cially Guerre, Perrigne and Vuong (2000) and Campo, Guerre, Perrigne and Vuong (2011).
The left hand side of the above equation is the consumer’s private information, θ. For a
fixed γ, the remaining terms on the right hand side of this equation are potentially observ-
able. For example, from the data, we will be able to construct an estimate of fj(aj|m) by
observing the distribution of reimbursements conditional on medical expenditures m. Also,
our data set contains information about y, a particular consumer’s income level, pj, medical
premiums for plan j, and m the total value of consumption of medical services.
Under suitable regularity conditions, for a fixed value of γ, we shall always be able
to find a value of θ that rationalizes the consumer’s choice. As a result, it follows that
the assumption of utility maximization alone will not be adequate to identify both g and
11
γ. Our approach to identification will be to impose additional moment restrictions. This
identification strategy is similar to the analysis of the risk averse auction model in Campo,
Guerre, Perrigne and Vuong (2011). We note that in nonlinear parametric models, global
identification is generally difficult to verify. Campo et al. (2011) also make use of a set
of nonlinear moment conditions to estimate the risk aversion parameters. Their “high-
level” parametric identifying condition A1 (iv) implies the order condition for identification,
but does not explicitly provide a rank condition. For a special case of the CRRA utility
function, their moment conditions become linear so that the rank condition can be directly
tested using the observed data.5 Our approach to identification will be to impose additional
moment restrictions. In our application, we will use the moment restriction that g does not
depend on time. Our data is a three year panel on incomes and health care choices from
a large Minnesota-based employer. Intuitively, this restriction means that the severity of
illnesses for this large population of employees does not change over an interval of three
years. Below, we shall argue that this is reasonable since the population of employees is very
large and there is no reason to expect large fluctuations in the severity of illness as reflected
in g among this group within a reasonably short three year time period.6 However, there
will be considerable variation in individual incomes as individuals experience promotions or
change jobs within the organization. Also, premiums and reimbursement rates fj(aj|m) will
vary from year to year. In theory, another source of variation in the latent health distribution
could be the option of not insuring at all. However, given that our data set comes from a
large self-insured employer, selection effects are limited because most employees are covered
by the plans offered by the employer. Finally, the plans used in estimation capture most of
the employee base of the large employer, insuring over 80 percent of employees.
5However, Campo et al. (2011) candidly acknowledged that “considering another parametric specification... would lead to a nonlinear system of equations in (the risk aversion parameters) for which local identificationconditions can be obtained through the usual ‘rank’ conditions,” without providing explicit details for theverification of the rank conditions.
6To our knowledge, there was no epidemic nor any significant changes such as innovations in medicaltechnology that might have led to a large shift in the health distribution in this population during this timeinterval.
12
2.5 Discussion
The two primary issues in health insurance that are addressed in this paper, moral hazard
and adverse selection, result from the fact that consumer’s health, θ, is unobserved to the
insurer, yet the insurer offers a menu of different plans from which a consumer may choose.
Adverse selection occurs when plans are not able to distinguish between consumers with
different values of θ. In such a situation, a consumer with a high value of θ can choose a
plan j with generous features such as high reimbursement or better access to care that was
designed for consumers with low values of θ. Since θ is private information to the consumer
and the insurance plan cannot correct for this selection. Equation 2.3 will be used to back
out the distribution of θ for each plan and test how these distributions differ across plans,
i.e., whether there is evidence of sorting in the data.
Moral hazard occurs in the model because consumers pay some proportion aj for each
unit of health care m consumed, but do not pay the full cost incurred by the insurer. This
definition of moral hazard is commonly used in the health economics literature, although
other fields, such as contract theory, may use instead a definition that focuses on “hidden
actions.” The approach we use for moral hazard in this paper follows a longstanding def-
inition, as established by Pauly (1968).7 In our model, insurers establish pj and fj(aj|m),
but once consumers have chosen plan j, the insurer cannot contract the amount of care m a
consumer chooses within plan j. In equation 2.2, a consumer’s marginal increase in expected
utility with respect to health care must be equal to the price ratio between health care and
composite good consumption. The reimbursement aj lowers the relative price of health care
relative to composite good consumption, leading to moral hazard or “overconsumption” of
medical care compared to a situation in which the price of medical care is not subsidized by
insurance.
7Pauly discusses moral hazard stating: “Medicare insurance, by lowering the marginal cost of care to theindividual, may increase usage; this characteristic has been termed “moral hazard”.”
13
3 Data
To estimate the model we use a detailed confidential claims level dataset from a large self-
insured employer.8 The claims level data is linked to enrollment and demographic history
from the employer. Since the employer is self-insured, any medical care with an associated
claim is included in the dataset for every employee in the firm. Full claims data is available
for the years 2002-2004. The employer has offices in several locations and the complete
claims data covers over 19,000 employees and over 39,000 total beneficiaries.
The employer offers health insurance coverage for individuals, spouse/domestic partners
and families. Employees had a choice of four types of plan during 2002-2004, a traditional
health maintenance organization (HMO), a preferred provider organization (PPO), a tiered
network product based on care systems, and a consumer driven health plan (CDHP). The
HMO featured generous coverage for network physicians and hospitals but no coverage of
out-of-network care. The PPO had nominal copayments for services in-network, with lower
coverage out-of-network after meeting a deductible. In the tiered network product, employees
chose from three cost tiers with standardized benefits but varying premiums which changed
depending on the bids submitted by each of the facilities in the tiers. The CDHP option
featured an employer-funded health savings account and a high deductible.
Each claim includes total approved health care spending, total reimbursement, and cost
variables such as the coinsurance amount, copayment, and amount of deductible used. The
claim also includes information such as primary and secondary diagnosis codes, procedure
codes, type of treatment facility, and type of provider. Cost and spending amounts are ag-
gregated for each beneficiary over the year for a precise measure of total spending by the
employer and total out-of-pocket cost to the beneficiary. Claims data is linked to employee
information to include demographic information. Age and gender are taken directly from the
claims files. Salary was imputed based on birthdate, gender, home zipcode, work location,
and job classification using a separate file provided directly by the employer. Plan charac-
8We thank Robert Town and Caroline Carlin for their assistance in accessing this data.
14
teristics such as premium, employee contribution, and cost structure were obtained directly
from the employer’s enrollment materials.
For the empirical analysis, we focus on the three most populous plans which comprise
over 80 percent of enrollment. These plans are the HMO plan, hereafter referred to as “HP,”
and two of the plans in the tiered network product, referred to as “PC2” and “PC3.” PC2
and PC3 have similar cost sharing parameters, however PC2 has a network of providers of
lower cost than PC3. We use only employees enrolled under single coverage, and only those
employees with continuous enrollment over the entire year.9 Table 1 displays enrollment
statistics for each year and plan for single coverage full-year enrollment. Over the three-year
period, the dataset has over 14,000 single-coverage enrollees. The HP plan has the largest
enrollment with over 80 percent of enrollees each year. Enrollments in the PC2 and PC3
plans range between 230 and 539.
Table 1: Plan Total Enrollees
Plan 2002 2003 2004 TotalHP 4,032 4,126 3,845 12,003PC2 483 469 388 1,340PC3 523 539 230 1,292Total 5,038 5,134 4,463 14,635
The remaining variables included in the dataset are a current health status proxy and a
measure for the probability of high future health costs. These two variables are created for
each individual enrollee using the Johns Hopkins University ACG Case-Mix System (v6),
which was developed by the Health Services Research and Development Center. This is a
commercial algorithm used to predict future illness severity and health care spending. The
health proxy variable is constructed around a national average of 1.0 with higher values
indicating greater illness severity. The diagnosis codes and health status proxy are then
used to create a measure of the probability of high costs next year.
9Continuous single coverage replicates our specified model, but more categories of consumer could beanalyzed with simple modifications to our framework.
15
Table 2 displays summary statistics for each of the three plans for total health spending,
total reimbursement, salary, age, the health status proxy, next year’s high cost probability,
and percentage female. Statistics are displayed for the over 14,000 individual observations
in the dataset. Average health care spending is less than $5,000 in all three of the plans,
but the tiered PPO plans have higher average spending than the HMO plan. The highest
average spending is in the PC3 plan at $4,700 per consumer across all years. PC2’s average
spending is approximately $700 lower than PC3. The HMO’s average spending is significantly
lower than the PPO’s spending, at $2,845 per consumer. For a better understanding of
the spending distribution in each plan, Table 2 also lists the 25th and 75th percentiles of
spending. In the 25th percentile, the HMO plan has again significantly less expenditure
than the two PPO plans. The HMO’s 25th percentile of spending is $306 as compared to
$774 and $864 of PC2 and PC3, respectively. This pattern continues in the 75th percentile
where HMO expenditure is $2,710, compared to PC2 at $4,402 and PC3 at $5,023. For each
plan, total reimbursement is slightly less than total health spending, which reflects enrollees’
out-of-pocket costs.
Comparing demographic variables across the plans, there is preliminary evidence of the
existence of adverse selection. The average salary in the dataset is $43,896. The PC3 plan
has the highest average salary of $50,357, compared with the PC2’s $45,946 and HP’s $42,972
average salaries. The PC3 plan retains the highest distribution of salaries when looking at
both the 25th and 75th percentiles. The HP plan’s enrollees are younger, with an average
age of 41.7, compared to PC2 and PC3 with averages ages of 46.4 and 47.0, respectively
The average of the health status proxy variable is also much lower for the HMO plan. HP
has an average health status proxy value of 1.4. The two PPO plans average health status
proxy values are both well over 2. The PC2 plan has the highest enrollment of females at 73
percent. The HMO plan enrolls the lowest percentage of females at 59 percent. While these
differences in demographic characteristics do not prove adverse selection on unobservable
illness levels, they offer evidence that plans do vary in observable characteristics.
16
Table 2: Summary Statistics by Plan
Plan Variable Mean s.d. p25 p75 N
HP Total health spending $2,845 $7,459 $306 $2,710 12,003Total reimbursement $2,659 $7,139 $267 $2,470 12,003
Salary $42,972 $21,180 $30,659 $50,000 12,003Age 41.7 12.0 31.0 52.0 12,003
Health status proxy 1.4 2.8 0.2 1.4 12,003Prob. high cost next year 0.08 0.09 0.04 0.09 12,003
Indicator for female 0.59 0.49 0.00 1.00 12,003
PC2 Total health spending $3,964 $7,559 $774 $4,402 1,340Total reimbursement $3,657 $7,555 $643 $3,972 1,340
Salary $45,946 $21,533 $32,802 $52,008 1,340Age 46.4 11.5 38.0 55.0 1,340
Health status proxy 2.2 3.2 0.5 2.9 1,340Prob. high cost next year 0.12 0.11 0.05 0.14 1,340
Indicator for female 0.73 0.44 0.00 1.00 1,340
PC3 Total health spending $4,700 $9,243 $864 $5,023 1,292Total reimbursement $4,449 $9,151 $758 $4,639 1,292
Salary $50,357 $26,795 $33,615 $57,811 1,292Age 47.0 10.7 39.0 55.0 1,292
Health status proxy 2.5 3.8 0.5 3.4 1,292Prob. high cost next year 0.12 0.12 0.05 0.14 1,292
Indicator for female 0.66 0.47 0.00 1.00 1,292
Total Total health spending $3,111 $7,664 $374 $3,029 14,635Total reimbursement $2,909 $7,397 $328 $2,734 14,635
Salary $43,896 $21,874 $31,210 $50,898 14,635Age 42.6 12.0 32.0 52.0 14,635
Health status proxy 1.6 3.0 0.2 1.5 14,635Prob. high cost next year 0.09 0.10 0.04 0.10 14,635
Indicator for female 0.61 0.49 0.00 1.00 14,635
17
Table 3 displays the health status proxy summary statistics for each year. The average
value of the health status proxy over all plans is stable across the three years in the sample
at approximately 1.6. The other descriptors of the distribution are also stable from year
to year with an average standard deviation of 2.96 and the same values for the 25th and
the 75th percentile in each year. We use this stability across years in the underlying health
status to construct our estimator, which is described in the next section.
Table 3: Health Status Proxy by Year
Year Mean s.d. p25 p75 N2002 1.64 3.12 0.20 1.49 5,0382003 1.60 2.81 0.20 1.49 5,1342004 1.57 2.92 0.20 1.49 4,463Total 1.60 2.96 0.20 1.49 14,635
4 Estimation
We use a two-step semiparametric estimation strategy to recover the underlying parameters
of the consumer’s utility function and the distribution of latent health status. An individual
observation is a full-year single coverage enrollee i, in a plan j, for a given year t. Equation
2.2 specifies the optimal decision rule for an individual with the latent health status θi
which can be transformed in to Equation 2.3. We estimate an empirical version of the latter
equation. Components which come directly from the data are health care spending mijt,
income yit, and premiums pjt. Health care spending and income are unique to an individual
i in plan j in a given year t. Premiums are the same for all individuals in a plan j in a given
year t. Components which are estimated from the data are the conditional reimbursement
distribution fjt(ajt|mjt), its derivative∂fjt(ajt|mjt)
∂mjt, and risk parameters γ1, γ2. The estimated
version of Equation 2.3 is as follows:
θi =I
I − (1−γ2)
mγ2ijt
(4.1)
18
where
I =
∫[−(1− γ1)(1− ajt)(yit − pjt −mijt(1− ajt))−γ1 fjt(ajt|mjt)
+ (yit − pjt −mijt(1− ajt))1−γ1 ∂fjt(ajt|mjt)
∂mjt
]dajt (4.2)
Estimation of Equation 4.1 proceeds in three steps:
1. Estimate conditional reimbursement distributions fjt(ajt|mjt) and∂fjt(ajt|mjt)
∂mjt.
2. Given fjt(ajt|mjt) and∂fjt(ajt|mjt)
∂mjt, substitute the observed yit, pjt, mijt and ajt and find
θi in terms of I and γ1, γ2.
3. Estimate utility parameters γ1, γ2 using GMM.
Intuitively, the unobserved health status θ is the analogue of the unobserved error term
in a structural linear demand equation. Given that we observe data on income, insurance
premium, the reimbursement rate, and the medical expenditures, the identifying assumption
that the unconditional distribution of the health status does not vary across years allows
us to recover utility parameters and the distribution of the latent health status. The effec-
tive instruments, which are the year dummies, are associated with exogenous shifts in the
distribution of income and the relative price of medical care that are uncorrelated with the
health shock. Thus intuitively we can hold the health status “fixed” while the relative prices
are shifted by the instruments, which leads to co-movement in medical expenditures and
consumption. Such co-movements allow us to identify the utility parameters. Once we have
estimated the utility parameters, the distribution of health status can be backed out from
the first order condition. The difference between our method and the conventional two-stage
least squares estimator is that instead of relying on a reduced-form specification of a linear
functional form for the demand equation, we use the optimality condition for a risk averse
consumer to derive the functional form of the demand equation.
19
This estimated distribution of health status will then be used to identify adverse selec-
tion and moral hazard. Adverse selection identification comes from how the distribution is
sorted into plans. Moral hazard estimation will come from a counterfactual choice of health
expenditures, in the absence of insurance, of an individual given his health status estimate
and the derived functional form of the demand equation.
4.1 Estimating Conditional Reimbursement Distributions
In the first step, we nonparametrically estimate the conditional reimbursement distributions
for each plan in each year. First, the ex-post realized reimbursement is calculated for each
enrollee as a percentage of total health spending. The nonparametric kernel estimation
uses the realized reimbursements to calculate the joint probability of a given reimbursement
percentage and a given level of health spending, fjt(ajt,mmt).10 To estimate the conditional
distribution, we then nonparametrically calculate the health spending distribution fjt(mjt).
The estimated conditional distribution fjt(ajt|mjt) is then the ratio of the joint distribution
of reimbursement and spending, and the distribution of spending. That is,
fjt(ajt|mjt) =fjt(ajt,mjt)
fjt(mjt).
Figure 1 reveals some of the uncertainty between levels of spending and reimbursement
levels in our data. Each scatter plot maps patients’ spending levels for a given year in a given
plan, during which reimbursement policies of the plan were fixed, with the reimbursement
that the patient finally received. There is substantial variation between expenditure and re-
imbursement, which would lead to uncertainty when a patient chooses a level of expenditure.
The conditional distribution is calculated for a grid of reimbursement percentages, ajt,
and health expenditure categories.11 The conditional distributions fjt(ajt|mjt) are displayed
by plan for each year in Figure 2, Figure 3, and Figure 4. Across all plans and years, the
10The bandwidth is the optimal bandwidth rule of thumb suggested by Bowman and Azzalini (1997).11These results use a 128x128 grid of ajs and mts. Both more and fewer categories produced very little
change in the resulting estimates.
20
Figure 1: Reimbursement and Spending Levels
lowest reimbursement percentages occur at the lowest level of health spending, and vice versa.
To illustrate the diagrams, Figure 4 shows that PC3 in 2002 has a positive probability of
zero percent reimbursement across all expenditure ranges because the reimbursement curve
is flush with the lower axis. Figure 2 shows that the HP plan has the lowest out-of-pocket
cost schedule; the conditional probabilities corresponding to over 90 percent reimbursement
are higher in HP than the other two plans. In our last step, we obtain∂fjt(ajt|mjt)
∂mjtby applying
an approximate derivative to all grid points of the conditional distribution. These estimation
results use an adaptive Simpson quadrature.
4.2 Estimating the Latent Health Status Distribution
We use the estimated conditional reimbursement probabilities to estimate the latent health
status distribution. The underlying identification assumption is that the aggregate distri-
bution of latent health status of consumers enrolled in the three plans is the same for the
21
Figure 2: Conditional Reimbursement Distributions, HP Plan 2002-2004
22
Figure 3: Conditional Reimbursement Distributions, PC2 Plan 2002-2004
23
Figure 4: Conditional Reimbursement Distributions, PC3 Plan 2002-2004
24
three years. This is a reasonable assumption for the short range of years, 2002-2004. It is
also supported by the proxy health status variable constructed in the data from diagnosis
codes. This health proxy status variable has a similar distribution across all three years, as
reported in Table 3. Although we assume that the entire distribution of latent health status
remains constant, it is possible that individual realizations of θi may be different over this
time period, and there may also be changes in plan variables such as premiums and cost
sharing levels. Alternatively put, our identifying assumption is that there was no change in
the health distribution of the population (e.g., due to an epidemic or medical innovation) in
our sample over the three year period.
For each individual enrollee, θi is recovered using Equation 4.1. Given the estimated
conditional reimbursement probabilities and the data on spending and premiums, θi can be
recovered in terms of the remaining parameters to be estimated, γ1 and γ2.
4.3 Solving for Utility Parameters
For each single-coverage full year enrollee i in a given plan j and a given year t, we now
have an expression for the enrollee’s latent health status in terms of the utility parameters
of the enrollee’s maximization problem. We estimate the utility parameters using a GMM
framework. Based on our identifying assumption, we estimate the model requiring the latent
health status distributions to be equal to each other across years. Health expenditures
often have a small number of extreme outliers, so, since we are matching the shape of the
distribution, we trim the distribution. The nonparametric approach benefits from trimming
the tails of the distribution and the data points that are dropped are not needed for a
consistent estimator. First, we drop enrollees with zero yearly expenditures. This amounts
to approximately 12 percent of the HP sample in each year, 7 percent of the PC2 sample,
and 4 percent of the PC3 sample. Second, we drop enrollees whose expenditures fell in the
top 20 percent of the entire sample of expenditures over all three plans. Finally, a very small
number of enrollees, less than 1 percent in each year, were dropped if their out-of-pocket
25
expenditures were larger than their salaries. Table 4 displays the resulting estimation sample.
Over 10,000 employer-year observations remain in the estimation sample.
Table 4: Estimation Sample Size
Plan 2002 2003 2004 TotalHP 2,690 2,791 2,631 8,112PC2 351 341 295 987PC3 392 392 174 967Total 3,433 3,533 3,100 10,066
The GMM estimation procedure to recover utility parameters is implemented as fol-
lows. We explicitly describe the estimation procedure for the first moment, i.e., the mean.
The other moments are constructed similarly.12 The latent health status of each individ-
ual enrollee i, θi, is as specified in Equation 4.1. Once we substitute the data on observed
health expenditure, income, insurance plan premium, and reimbursement probabilities, i.e.,
wijt = (mijt, yit, pjt, fjt), each individual i has an expression for her latent health status solely
in terms of the utility parameters, γ. Equation 4.3 displays the mean of the health status
distribution for the years 2002 and 2003, i.e., µθ02 , where N02 is the number of enrollees in
2002, as well as the corresponding definition of mean in 2003, µθ03 .
µθ02(γ) =
N02∑i=1
1
N02
θi(γ), µθ03(γ) =
N03∑i=1
1
N03
θi(γ) (4.3)
Let h1(w, γ0) denote the first moment condition for our sample. Statistical inference
using the GMM estimator is based on the property that E[h1(w, γ0)] = 0. For each of the
four distribution moments we have 3 moment conditions in our sample. These are the 3
pairs generated by 3 years of data. The 3 sample moment conditions associated with the
distribution mean are:
12In recovering the utility parameters we match the distributions using the first four moments of mean,variance, kurtosis, and skewness within each year.
26
h1(w, γ) =(µθ02(γ)− µθ03(γ)
)2
h2(w, γ) =(µθ03(γ)− µθ04(γ)
)2
h3(w, γ) =(µθ02(γ)− µθ04(γ)
)2
Using the other 3 distribution moments we generate 9 additional sample moment condi-
tions for a total of 12 sample moment conditions. The GMM estimator minimizes the sum
of all 12 sample moment conditions. This minimum value is found through a grid search
over the possible values of the two utility parameters. Grids of varying size were used for
γ1 ∈ [1, 6] and γ2 ∈ [1, 6]. We substituted each combination of grid values for γ1 and γ2 into
the sample moments and found the resulting sum of squared differences for that combina-
tion. The optimal γ that was chosen is the grid combination of γ1 and γ2 with the sum of
squared differences that is the closest to zero.
5 Asymmetric Information Results and Analysis
5.1 Test of Identifying Assumptions
The number of GMM conditions, four distribution moments times 3 sample years, is greater
than the dimension of the risk parameter vector. Because the model is overidentified, this
allows us to test our identifying assumptions. If the assumption of equal θ distributions
between all years is valid, then using only two years instead of all three years should yield
the same predictions. We perform a test by estimating parameters using a subset of data for
only two years and then using the estimated parameters from this subset to construct the
health status distribution for all three years. If resulting health status distributions for all
three years are statistically similar, even the year that was not included in estimating the
parameters, this lends strong support to our identifying assumption and we conclude our
GMM model fits the data well.
27
We perform this test by treating each of the three years as a holdout year in turn. For
example, we estimate risk parameters, γ based on data from 2002 and 2003, and then use
the estimated γs to construct the in-sample “estimated” health shocks θ02 and θ03. The γs
estimated using 2002 and 2003 data are applied to the 2004 data to construct a “predicted”
θ04 for the holdout sample. We then perform a Kolmogorov-Smirnov test for equality of
distributions between each in-sample “estimated” θ distribution versus the holdout sample
“predicted” θ distribution. We repeat this exercise holding out data from 2003, and then
again holding out data from 2002.
Table 5 lists the results for each of these tests. The bottom row lists results for the
test described above, where 2004 is the holdout year. The K-S statistics in the bottom row
demonstrate the “estimated” distributions, θ02 and θ03, cannot be statistically distinguished
from the predicted θ02 distribution. That is, the K-S test for equality of distributions is
not rejected. The test holding out all 2003 data also finds that the resulting estimated
and predicted distributions are not statistically distinct. However, the K-S test between the
predicted 2002 health shock distribution based on the 2003 and 2004 data rejects the equality
of distributions between the predicted year and the estimated years. It should be noted that
our sample size is very large – each year is over 3,000 observations – hence the statistical
precision of these tests is quite high. It may be possible that, although the predicted health
distribution for the holdout year 2002 is statistically different from the in-sample estimated
2003 and 2004 distributions, their economic significance may not be very different. Figure
5 displays the predicted 2002 vs. estimated 2003 and 2004 health shock distributions in
histograms. In spite of K-S statistical rejection when comparing the three histograms, the
general shape of all three distributions remains very similar. Hence, we conclude that our
assumption of a constant health status distribution seems plausible.
28
Figure 5: Predicted 2002 vs. Estimated 2003, 2004 Health Shock Distributions
29
Table 5: Predicted vs. Estimated Health Shock Distributions
H0: X1 and X2 are from the same continuous distribution.HA: X1 and X2 are from different distributions.
(X1) Predicted Year (X2) Estimated Year 1 Estimated Year 22002 0.0343 Reject 0.0404 Reject2003 0.0262 Do Not Reject 0.0236 Do Not Reject2004 0.0313 Do Not Reject 0.0275 Do Not Reject
In each year, left column is the K-S Statistic and right column is the result.
5.2 Utility Parameter Estimates
We estimate coefficients of relative risk aversion for aggregate consumption (γ1) and health
(γ2) respectively. Table 6 displays values of γ using increasingly fine grids over the parameter
space. The standard errors were computed using the method described in the Appendix.
Table 6: Estimated Risk Coefficients γ1, γ2
Grid Size γ1 γ2
20 1.98 3.27(0.76) (1.20)
40 1.88 3.12(0.86) (1.35)
50 1.93 3.23(0.86) (1.35)
Standard errors in parentheses.
Estimates from 200 bootstrap iterations.
The resulting risk coefficients are in the range of [1.88, 1.98] for γ1 and in the range of [3.12,
3.27] for γ2. Higher values in the γ1 range tend to be associated with correspondingly higher
values of γ2. This result implies that individuals are more risk averse with respect to health
status than to the aggregate consumption commodity. The aggregate consumption estimates
are within the range found in the literature on consumption (see Shea 1995, Gourinchas and
Parker 2002). Risk parameters between 1 and 2 have been found to approximate empirical
data in previous literature on risk in consumption versus investment (Prescott 1986). Our
estimated health risk coefficients are slightly higher than the risk coefficients for aggregate
30
consumption. This is consistent with the notion that people are more risk averse with respect
to their health as it often cannot be regained once lost.
5.3 Moral Hazard
Consumers’ health care consumption behavior, or “excess” consumption, has been cited as
a factor in rising health care costs. In the context of employer-sponsored insurance, the
consumer chooses her plan for a given year, and then chooses her health care consumption
throughout the year based on the plan contract. The concept of moral hazard we develop and
compute addresses the counterfactual difference in health expenditures due to the presence of
this insurance contract, where a consumer’s out-of-pocket expense is not matched to the full
cost of providing the care. This approach incorporates the insights of sequential contracting,
where an agent selects her best action within a previously chosen contract (See Courty and
Li 2000, Dai, Lewis and Lopomo 2006).13
This measure follows previous health economics work which defines moral hazard as
the change in a consumer’s choices that was induced by lower costs of health care through
insurance. Our measurement approach, however, focuses more on consumer behavior than
some of the previous empirical literature, such as Feldstein (1973) and Feldman and Dowd
(1991). The first advantage of our measure of moral hazard is that it allows for considerable
nonlinearities in the change in behaviors with respect to changes in reimbursement policy,
e.g., consumer responses to nonlinear changes in reimbursement policies due to deductibles.
The second is that the consumer’s choice is income neutral.
Our measure of moral hazard is based on a counterfactual which allocates the consumer
the same resources previously consumed in both health care and aggregate consumption, but
now allows the consumer to choose the allocation. A more precise mathematical description
is provided below but we first describe the counterfactual graphically in Figure 6. The figure
displays a consumer’s budget constraint and indifference curves in both the observed (1)
13Note that the conventional measure of moral hazard in the health economics literature differs from thatin contract theory, where this term is reserved for situations in which agent’s behavior cannot be directlyobserved by the agent, i.e., “hidden actions.”
31
and counterfactual scenarios (2). The horizontal axis is health care expenditures, m, and
the vertical axis is composite good consumption, c. The line BC1, shows the consumer’s
budget constraint observed in the data. The flatter slope of BC1 reflects the fact that
health expenditures are subsidized through insurance, so the same income buys more health
expenditure compared to the composite good. The intersection of BC1 and the indifference
curve IC1, shows the resulting optimal consumption bundle (m1, c1) of health expenditures
and composite good consumption observed in the data.
In the counterfactual scenario, we remove subsidized consumption of health care through
insurance. The relative prices of health care and the composite good are each set to one, so an
additional dollar of income buys the same additional amount of either good. The consumer is
given a lump sum income transfer equal to the amount observed in the data that insurance
paid for her health care. This ensures the original consumption bundle (m1, c1) remains
affordable to the consumer. This is shown in Figure 6 where BC2 runs through the original
consumption bundle (m1, c1). The slope of BC2 shows a one-to-one tradeoff between health
expenditures and composite good consumption.
We then examine how consumption of health expenditures and the composite good might
adjust without the price differential caused by insurance coverage. Although the consumer
can choose to consume the same bundle, in this scenario, she may also choose a different
allocation of her total budget between health care and the composite good that corresponds
to a higher indifference curve. The extent of moral hazard is captured by the subsequent
change in the optimal consumption bundle, the tangency of BC2 and IC2. In the figure, the
consumer reaches a higher indifference curve in the counterfactual allocation by consuming a
greater amount of composite good and reducing her health care expenditures at (m2, c2). The
measure of overconsumption of health care is the difference between the two consumption
bundles, m1 −m2.
To calculate the extent of overconsumption, we first calculate the insurer’s total health
care spending for each consumer in each year. Denote this observed amount as T = aj ·m1.
32
Composite good
Health expenditure
IC1
IC2
BC1 BC2
c2
c1
m2 m1
Figure 6: Moral Hazard Counterfactual
The counterfactual budget constraint is then:
c2 +m2 ≤ y − pj + T
where T represents the lump sum transfer to an individual consumer. The variables c2 and
m2 are the composite good consumption and health expenditure choices in the counterfactual
environment.
Next, we calculate the distribution of the health shocks, θ, in each year using the data
and the estimated utility parameters from the most detailed grid search of [γ1, γ2]. The
consumer’s utility maximization problem is then:
U(m2, pj, y, θ; γ) = (1− θ)(y − pj −m2 + T )1−γ1
1− γ1
+ θm1−γ2
2
1− γ2
(5.1)
The consumer’s maximization problem sets her marginal rate of substitution between
aggregate good consumption and health care expenditure equal to the ratio of prices. The
consumer no longer faces a reimbursement schedule aj, but receives a lump sum transfer of
T and pays for all care out-of-pocket. The ratio of the price of health care to the price of
the composite good is simply equal to one. In our estimation, we calculate the MRS:
MRS =θ
(1− θ)(y − pj + T −m2)γ1
mγ22
33
Since [θ, y, pj, T, γ1, γ2] are all known, we then solve for the value of m2 where MRS = 1.
The difference between the counterfactual m2 and the actual observed health expenditure
m1 is a measure of the amount that a consumer is overconsuming as a result of the change
in the price ratio. This is because m2 is the amount a consumer chooses if forced to pay the
full cost of care, given the same realized net income as when insured. In the counterfactual
scenario, consumers are no longer restricted to spend their reimbursement income, T , on
health expenditure. If the consumer chooses to spend part of the unrestricted income T on
consumption of the aggregate good, then this reveals how purchasing health care through
insurance influences her behavior. In the estimation results, we will refer to m2 as first-best
consumption.
We find that the magnitude of overconsumption in our data is substantial. For each
year, we calculate the differences for each consumer between the observed amount of health
expenditures and the first best level of expenditures, as described above. Figure 7 shows the
resulting distributions of overconsumption estimates for each of the years 2002-2004. The
horizontal axis is the difference between observed expenditure and first best expenditure. The
vertical axis is a count of the number of consumers. The general shape of the distribution
is similar for each of the three years. Each year shows approximately half of the consumers
have overconsumption between $0 and $500, after which the distribution drops rapidly with
a long right tail essentially finishing after $3,000 in overconsumption.
Table 7 displays summary statistics for the estimated overconsumption. The three right-
most columns display percentiles of the overconsumption for the three years. The 25th
percentile of overconsumption was approximately $190 in 2003 and 2004, and $177 in 2002.
The median level of overconsumption was near $400 in the three years, and the 75th per-
centile was over $700 in 2002, and over $900 in both 2003 and 2004.
Summing overconsumption over all consumers, the total yearly estimate of overconsump-
tion across plans is large. The sum of all consumers’ differences between observed and first
best is $2,125,480 in 2002. The sum is $2,504,380 in 2003 and is $2,270,059 in 2004. To
34
Figure 7: Estimated Overconsumption, 2002-2004
35
Table 7: Summary Statistics on Overconsumption
Year N Percentilep25 p50 p75
2002 3,433 $177 $392 $7842003 3,533 $193 $437 $9352004 3,100 $190 $446 $984
put individual overconsumption in perspective, Table 8 list summary statistics by year for
overconsumption as a percentage of a consumer’s original health care expenditure. On av-
erage, a consumer’s overconsumption was 45 percent of the consumer’s original health care
expenditure. The median overconsumption percentage is also over 40 percent of the original
choice of health expenditure, with a standard deviation of 10 percentage points or less in all
years.
Table 8: Overconsumption as Percentage of Original Health Care Expenditure
Year Mean Median s.d. N2002 45.14 42.09 8.83 3,4332003 46.22 42.14 8.85 3,5332004 46.22 42.49 10.04 3,100
To place our results on moral hazard in the context of previous literature, we must be
clear on our definition of moral hazard. We measure the consumer’s health expenditure
choice in response to the availability of health insurance once an illness has occurred. Our
approach also incorporates an aspect left out of many previous estimates by making the
original consumption bundle available to the consumer in our counterfactual. For this reason,
our counterfactual is able to determine the consumer’s choice when there is no price subsidy
for health care, but with a budget constraint that allows the consumer’s utility to remain
the same. Our results show that, given the same net observed income, consumers consume
over 40 percent less health care when they pay out-of-pocket for all health care.
Much of the literature on moral hazard takes advantage of natural experiments of a
36
change in the consumer’s out-of-pocket price to determine the change in the level of health
expenditures. These natural experiments feature a change in consumers’ out-of-pocket cost,
without an accompanying income adjustment. Scheffler (1984) used a pre-post approach
around a natural experiment introducing 40 percent cost sharing to consumers and found
a 38 percent reduction in health expenditures. This result appears to imply a higher level
of moral hazard than in our counterfactual, because Scheffler (1984)’s expenditures dropped
by nearly the same percentage in response to a much less severe change in price. However,
recall that this natural experiment does not control for income effects, which may explain
Scheffler’s more severe drop in expenditures.
Several important moral hazard estimates come from the RAND Health Insurance Exper-
iment (HIE). This was a large government-sponsored experiment that randomized consumers
into health insurance plans with different levels of coinsurance. HIE estimates are able to
isolate moral hazard effects because the randomization of participants into plans removes
confounding adverse selection concerns. Although the HIE did give lump sum payments
to cover worst-case scenarios from participating, the HIE estimates still do not attain the
income neutrality of our estimates. Manning et al. (1987) found that expenditures fell by 15
percent given an increase from 0 percent to 25 percent coinsurance. However, this estimate
does not take into account the deductibles in the HIE plans. Keeler and Rolph (1988) con-
ducted a subsequent study recognizing nonlinear reimbursement, and estimated that going
from no insurance to full coverage results in a 50 percent increase in expenditures. This
estimate is closer to our finding where dropping from substantial reimbursement to full out-
of-pocket expenditures reduced health expenditure by 40 percent. The income neutrality of
our counterfactual may explain why consumers in our sample reduce expenditures by slightly
less than that estimated by Keeler and Rolph. Our results suggest that failing to control for
income effects may overstate the level of moral hazard.
37
5.4 Adverse Selection
Adverse selection is a substantial concern in the health insurance literature. In this section,
we propose a distribution-free test for adverse selection. The presence of adverse selection
causes consumers to sort across different plans based on their latent health status (e.g.
Rothschild and Stiglitz 1976). In our framework, this would imply that the distribution of
the latent health status variable varies across health plans. To determine if the latent health
status distribution is different across the three plans, we first compute the yearly distribution
of latent health status using the estimated parameters. Figure 8 displays the distribution
of health status, θ, for each of the years 2002-2004. Health status is on the horizontal axis,
i.e., between 0 and 1. On the vertical axis is the number of consumers at each point in the
interval. The distribution appears to be bimodal – many consumers in each year had very
low values of θ, but there is a small clustering at the value of 0.8. This is a typical health
distribution for a large general population – many healthy individuals but a small number
of very sick individuals at the extreme tail of the distribution.
To place our estimated latent health status measure in the context of observable charac-
teristics, we examine the relationship between our estimated latent health status and various
consumer characteristics in our data. Table 9 reports the results of two regressions on log
of the estimated latent health status. Regression 1 includes age, an indicator variable equal
to 1 for female, log salary, and dummies for Plan 2 and Plan 3 with the omitted category
being the HP Plan. Regression 2 also includes the previous variables plus the health status
proxy variable constructed from diagnoses. The signs of the resulting coefficients generally
plausible. Age has a positive relationship with log health status in Regression 1. A ten year
increase in age is associated with a 0.14 percent increase in the magnitude of our estimated
latent health status. Individuals with higher salaries are associated with lower levels of θ,
or better latent health status, in both regressions. A negative relationship between income
and illness is well-documented in both the economics and medical literatures, such as Smith
(1999), Ettner (1996), and Deaton and Paxton (1998). Both Plan 2 and Plan 3 have higher
38
Figure 8: Health status Distribution, 2002-2004
39
predicted average values of the estimated health status θ. Plan 3 has a slightly higher pre-
dicted contribution to health status, at 0.571 more than the HP plan average, compared
with the 0.504 from Plan 2. Age is no longer significant when including in Regression 2 the
observed health status proxy – the variable provided in the data using weighted diagnoses
codes. Recall, the observed health status proxy is not used in estimating the model param-
eters. A one percent increase in the health status proxy variable is associated with a 1.23
percent increase in the value of our estimated latent health status. Beyond the positive re-
lationship between our estimated latent health status and the observed health status proxy,
these regressions demonstrate that our latent health status estimates are able to capture a
more general measure of health status than limited diagnosis codes.
Table 9: Estimated Health status Regression
Dependent variable: ln(health status, θ)Covariate Regression 1 Regression 2Age 0.014 0.002
(0.002) (0.002)Indicator =1 if female 1.120 0.854
(0.049) (0.040)ln(salary) -0.851 -0.721
(0.059) (0.048)Plan 2 indicator 0.504 -0.095
(0.080) (0.065)Plane 3 indicator 0.571 -0.064
(0.081) (0.066)ln(health status proxy) 1.237
(0.017)Constant 5.756 5.793
(0.604) (0.484)R-squared 0.087 0.412N 10,066 10,066
Standard errors in parentheses.
Table 10 estimates the relationship between various consumer characteristics and plan
type using a multinomial logit specification. It reports the marginal effects of the regressors.
The first logit specification includes only demographic characteristics, the others contain an
40
indicator for health: specification 2 includes our estimated value of θ, specification 3 includes
the data’s health proxy, and the fourth specification includes both. The marginal effects of
age, female, and salary are negative for the HP Plan. This means that the HP plan is more
likely to have younger, male enrollees, which is indicative of advantageous selection, as we
found above. The corresponding marginal effects for the PC2 and PC3 plans are positive,
more likely to have adverse selection on the observable characteristics in the data. In the
specifications including health indicators, the HP plan has negative marginal effects on all
health variables, where lower values of the health variable indicates healthier patients. The
PC2 and PC3 plans have positive marginal effects for the health indicators, supporting our
finding that these plans have patients with more severe health shocks.
To test for sorting among plans, we break down the estimated yearly latent health status
distribution by plan to examine whether individuals with a more severe latent health status
appear to self-select into more generous plans. Figure 9 shows the distribution of latent
health status over all years in each of the three plans. The horizontal axis is the 0 to 1 range
of the θ, and the vertical axis is the fraction of consumers in each plan corresponding to the θ
value. A simple visual analysis of the HMO plan, HP, versus the PPO plans, PC2 and PC3,
shows that the HMO plan has a much larger fraction of very healthy individuals – those
with θ near zero. Over 20 percent of HP consumers had a θ value less than 0.05, compared
to the PPO plans with approximately 15 percent of consumers in this range. Both PC2 and
PC3 show a larger clustering than the HMO plan around the health status value of 0.8.
In adverse selection models, consumers strategically sort across plans. The sickest con-
sumers will choose the plan with the most generous benefit structure that facilitates their
care. If adverse selection is present, we would expect the distribution of latent health sta-
tus severity to vary by plan, since higher θ consumers will choose a different type of plan
than lower θ consumers. If no adverse selection is present, we would expect the distribution
of latent health status to be the same across different plans. To formally test for adverse
selection among the plans, we perform a hypothesis test. The null hypothesis is that the
41
Table 10: Multinomial Plan Logit on Observables
Variable 1 2 3 4
HP Plan Age -0.004*** -0.004*** -0.003*** -0.004***(0.000) (0.000) (0.000) (0.000)
Indicator = 1 if female -0.035*** -0.055*** -0.018*** -0.057***(0.004) (0.008) (0.004) (0.008)
ln(salary) -0.032*** -0.090*** -0.026*** -0.074***(0.003) (0.010) (0.003) (0.010)
Estimated θ -0.177*** -0.078***(0.012) (0.015)
ln(health status proxy) -0.025*** -0.046***(0.001) (0.004)
PC2 Plan Age 0.002*** 0.002*** 0.001*** 0.002***(0.000) (0.000) (0.000) (0.000)
Indicator = 1 if female 0.025*** 0.043*** 0.017*** 0.044***(0.003) (0.006) (0.003) (0.006)
ln(salary) 0.017*** 0.028*** 0.014*** 0.021***(0.002) (0.007) (0.002) (0.007)
Estimated θ 0.081*** 0.029***(0.009) (0.011)
ln(health status proxy) 0.011*** 0.024***(0.001) (0.003)
PC3 Plan Age 0.002*** 0.002*** 0.002*** 0.002***(0.000) (0.000) (0.000) (0.000)
Indicator = 1 if female 0.009*** 0.012** 0.001 0.013**(0.002) (0.006) (0.002) (0.006)
ln(salary) 0.014*** 0.061*** 0.011*** 0.054***(0.002) (0.007) (0.002) (0.007)
Estimated θ 0.096*** 0.049***(0.009) (0.011)
ln(health status proxy) 0.013*** 0.022***(0.001) (0.003)
Marginal effects reported. Standard errors in parentheses.
***, ** indicates significant at the 1,5 percent level.
42
Figure 9: Health Status Distribution, by plan
43
estimated latent health distribution within one plan looks similar to the distribution of the
other plan. The alternative hypothesis then, is that the distributions were actually drawn
from different populations, so that one plan has a sicker population than the other. We use
K-S test statistics to compare the distributions between plans.
Comparing the HMO and PPO plans, Table 12 reports the K-S statistics testing the null
hypothesis that both plans’ latent health status distributions come from the same continuous
distribution. The three possible combinations of plan comparisons (i.e., HP and PC2, HP
and PC3, PC2 and PC3) are presented. The null hypothesis is rejected between the HMO
plan and a PPO plan in all three years. These results are significant at the 1 percent level.
The insignificance of the result between PC2 and PC3 in 2002 and 2004 may be expected
because the pricing structure of the two plans is the same, with the only difference being
the network providers. These results confirm the presence of adverse selection within our
sample, showing that the consumer population is indeed different across the three plans.
Table 11: K-S Test Statistics, Inequality
H0: X1 and X2 are from the same continuous distribution.HA: X1 and X2 are from different distributions.
Plan (X1 and X2) 2002 2003 2004HP and PC2 0.1948 Reject 0.1778 Reject 0.2347 RejectHP and PC3 0.1638 Reject 0.3036 Reject 0.1864 RejectPC2 and PC3 0.0459 Do Not Reject 0.1819 Reject 0.1050 Do Not Reject
In each year, left column is the K-S Statistic and right column is the result.
Table 12: K-S Test Statistics, Inequality
H0: X1 and X2 are from the same continuous distribution.HA: X1 and X2 are from different distributions.
Plan (X1 and X2) 2002 2003 2004HP and PC2 0.1948 Reject 0.1778 Reject 0.2347 RejectHP and PC3 0.1638 Reject 0.3036 Reject 0.1864 RejectPC2 and PC3 0.0459 Do Not Reject 0.1819 Reject 0.1050 Do Not Reject
In each year, left column is the K-S Statistic and right column is the result.
Given that Table 12 establishes that the distribution of estimated latent health status
44
is not the same across plans, we next examine which plans seem to contain the higher tail
of the distribution. The K-S test can be used to check if one plan’s cdf is greater than
another’s, implying that the plan with the greater cdf has an estimated latent health status
distribution which contains more consumers at the lower (healthier) end of the distribution.
Table 13 looks in greater detail at the latent health status distribution between plans.
The tradeoff for consumers between HMO plans and PPO plans is that PPO plans provide
greater flexibility in provider choice in exchange for more cost sharing by consumers. The
HMO restricts provider choice the most tightly, but has the lowest premium and cost sharing
structure of all the plan choices. Consumers with specific health concerns likely have the
greatest preference for flexibility in provider choice. In contrast, healthy consumers who do
not anticipate using much care will choose the lowest cost plan. Relatively healthy consumers
who do not expect to incur more than a yearly checkup may also be satisfied with the more
limited network of providers in the HMO plan. A more severe health status enters utility as
a larger θ. Thus, if a consumer expects to be relatively healthy, a low value of θ, she will
choose the cheapest option – the HMO plan. If a consumer expects to be in more severe
health, a larger θ, she may select into the PC3 plan for the most flexibility. Therefore, the
cdf of the plan with lower values of θ (better health status) should stochastically dominate
the cdf of a plan with greater values of θ (worse health status).
The first two rows of Table 13 report results of the K-S comparison test between the HMO
and PPO plans’ cdfs, which support the above hypotheses. When comparing the HMO plan,
HP, with either PPO plan, PC2 or PC3, the null hypothesis of equality of distributions is
rejected in favor HP, meaning that both PC2 and PC3 are stochastically dominated. This
implies that the portion of relatively healthy consumers is larger in the HP plan than in the
PC2 or PC3 plan.
Within the tiered PPO plans, PC2 and PC3 differ only in the types of facilities that
consumers may visit. PC3 providers include the PC2 providers plus additional providers.
Once again, consumers with more complicated health conditions, a larger value of θ, may
45
select into the PC3 plan to take advantage of the additional available providers. In 2003,
the hypothesis of equality of distributions between PC2 and PC3 is rejected in favor of PC2
having a larger cdf, meaning the distribution of consumers is healthier in the PC2 plan.
However, this hypothesis cannot be rejected for the other two years, 2002 and 2004. This
inability to reject may be due to the fact that, despite the difference in provider choice, these
two plans do have the same cost structure.
Table 13: K-S Test Statistics, Larger
H0: X1 and X2 are from the same continuous distribution.HA: X1’s cdf is greater than X2’s cdf.
Plan (X1 and X2) 2002 2003 2004HP and PC2 0.1948 Reject 0.1778 Reject 0.2347 RejectHP and PC3 0.1638 Reject 0.3036 Reject 0.1864 RejectPC2 and PC3 0.0097 Do Not Reject 0.1819 Reject 0.0204 Do Not Reject
In each year, left column is the K-S Statistic and right column is the result.
Finally, to check the validity of our tests for adverse selection, we check the same hy-
potheses of equality in distributions, but within a plan across years instead of across plans.
If distributions within a plan are equal across years, this supports the persistence of adverse
selection. Additionally, equality of distributions within a plan across years means the plan’s
population is relatively stable, and that the differences between plans in the previous tests
can be attributed to sorting. The results are displayed in Table 14. In general, the null hy-
pothesis cannot be rejected that a plan’s latent health status distribution is different across
years. The two exceptions are the HP plan between 2003-2004 and the PC3 plan between
2002-2003. These results support the assertion that the differences across plans in Table
13 are due to fundamental distribution differences that persist across years. Overall, the
K-S tests show that the underlying distributions of health status are different between the
different types of plans offered to enrollees, and that these differences persist over several
years of testing.
Our results demonstrate adverse selection by testing for differences between three employer-
46
Table 14: K-S Test Statistics, Within plans
H0: X1 and X2 are from the same continuous distribution.HA: X1 and X2 are from different distributions.
(X1 and X2) HP PC2 PC32002 and 2003 0.0212 Do Not Reject 0.0645 Do Not Reject 0.1795 Reject2003 and 2004 0.0281 Do Not Reject 0.1098 Do Not Reject 0.1495 Reject2002 and 2004 0.0332 Do Not Reject 0.0984 Do Not Reject 0.1035 Do Not Reject
In each year, left column is the K-S Statistic and right column is the result.
sponsored plans’ latent health status distributions. We find that the HMO attracts healthier
consumers compared with two PPO options, and within the PPO options, healthier con-
sumers sort into the more provider-restrictive tiers. Although the empirical literature on
adverse selection is mixed, these results fall solidly in support of the existence of adverse
selection in health insurance.
Our results on HMO sorting are similar to results of Juba, Lave, and Shaddy (1980),
who find, in logit estimates, that lower self-reported health status decreases the chance of
selecting HMO enrollment. When comparing cost and utilization rates, Jackson-Breeck and
Keinman (1983) find that HMO enrollees averaged 53 percent fewer inpatient days before
joining the HMO than those who stayed in a fee-for-service plan. Cutler and Reber (1998)
also find plans with low costs but restrictive provider choice attract healthier consumers. Our
results build on these earlier papers because this paper’s approach uses a structural model,
which enables the counterfactual experiment in moral hazard to complement the findings on
adverse selection.
The sorting we find within the PPO plans, between cost tiers, is similar to Marquis
and Phelps (1987). Using probit estimation for the purchase of supplementary insurance,
Marquis and Phelps found that families in the highest quartile of illness had a 42 percent
greater probability of purchasing supplementary insurance compared with those in the lowest
quartile. Our results show that the consumers in the highest quartiles of illness sort into
the PPO plan with the greatest provider coverage, essentially supplementing insurance. In
47
a similar finding across years, Ellis (1985) finds that age and a worse health status in the
previous year are associated with purchasing more generous coverage in the next year.
Within structural approaches to measuring adverse selection, Cardon and Hendel (2001)
do not find evidence of informational asymmetries. After accounting for observables, the
authors find no link between health care consumption and insurance choices that can be
attributed to unobservable characteristics. An advantage of our structural approach is that it
matches insurance structures more closely. Although Cardon and Hendel do explicitly model
deductibles, our semiparametric approach is more flexible given the complicated nature of
insurance coverage that includes not only deductibles but also coinsurance and copays. In
addition, our approach captures the reality that consumers often have only a general idea
of reimbursement probabilities, so insurance plan’s structures should not enter the utility
function deterministically.
6 Conclusions
To conclude, we present a new approach to measuring moral hazard and adverse selection.
This is based on a model of demand for health care and health expenditures in the presence
of unobserved heterogeneity in the health status of individuals. We make three key contri-
butions to the existing literature. First, we disentangle moral hazard and adverse selection
through estimation of the latent health status distribution using detailed claims level data.
We compute a measure of moral hazard by performing a counterfactual to isolate the effect
of insurance on health expenditure. Adverse selection is examined by testing for differences
in latent health status distributions across plans. The second contribution is based on our
semiparametric estimation method. We estimate insurance reimbursement schedules non-
parametrically which has two advantages: (i) it allows for complex insurance plans including
copays, deductibles, and other nonlinear features, and (ii) nonparametric estimation of a
conditional probability of reimbursement relaxes a common assumption in the previous lit-
erature by incorporating the reality that plans are complex and difficult for consumers to
48
predict in every state of the world. Finally, the third key contribution is that we present a
measure of the magnitude of both moral hazard and adverse selection in a self-insured em-
ployer insurance pool. Our estimates indicate that moral hazard may account for as much
as 40 percent of current health spending in these plans, and that consumers with a relatively
less complex health status do sort into the lowest cost sharing plan.
Although, our proposed semiparametric method provides a more flexible and robust al-
ternative for analyzing adverse selection and moral hazard there are caveats. We assume
that the utility function is separable in the aggregate consumption commodity and medi-
cal care. Although, this captures risk aversion in health status, it rules out more flexible
interactions between aggregate consumption and health status (see e.g., Viscusi and Evans
1990). However, Spence and Zeckhauser (1971) and Blomqvist (1997) use a similar spec-
ification, and Campo, Guerre, Perrigne and Vuong (2011) also require similar restrictions
on utility in an auctions context. In spite of these limitations, our research is novel in that
it develops a tractable estimation procedure under parsimonious parametric assumptions to
simultaneously examine adverse selection and moral hazard. Our research is also important
as it provides a framework for similar analysis in other contexts where distortions exist due
to asymmetric information, especially when the data is limited to being cross sectional.
References
[1] Abbring, J. H., P. A. Chiappori, J. H. Heckman and J. Pinquet. Adverse Selection and
Moral Hazard in Insurance: Can Dynamic Data Help to Distinguish? Journal of the
European Economic Association. 2003, 1(Papers and Proceedings): 512-521.
[2] Ackerberg, D., X. Chen, and J. Hahn. A Practical Asymptotic Variance Estimator for
Two-Step Semiparametric Estimators. Working Paper. UCLA. 2009.
[3] Akerlof, George. The Market for “Lemons”: Quality Uncertainty and the Market Mech-
anism. The Quarterly Journal of Economics. 1970, 84(3): 488-500.
49
[4] Arrow, K.J. Uncertainty and the Welfare Economics of Medical Care. American Eco-
nomic Review. 1963, 53:941-973.
[5] Blau, D. and Donna B. Gilleskie. The Role of Retiree Health Insurance in the Em-
ployment Behavior of Older Males. International Economic Review. 2008, 49(2), p.
475-514.
[6] Blomqvist, A.G. Optimal Nonlinear Health Insurance. Journal of Health Economics.
1997, 16(3):303-321.
[7] Bowman, A.W. and Azzalini, A. Applied Smoothing Techniques for Data Analysis:
the Kernel Approach with S-Plus Illustrations. Oxford University Press, Oxford. 1997.
[8] Bundorf, K., Levin, J., and Mahoney, N. Pricing and Welfare in Health Plan Choice.
Working Paper. Stanford University. 2010.
[9] Cameron. A. C., P. K. Trivedi, Frank Milne, and J. Piggott. A Microeconometric
Model of the Demand for Health Care and Health Insurance in Australia. Review of
Economic Studies. 1988, 55(1):85-106.
[10] Campo, Sandra M., E. Guerre, I. Perrigne, and Q. Vuong. Semiparametric Estimation
of First-Price Auctions with Risk Averse Bidders. Review of Economic Studies. 2011,
volume 78(1): 112–147.
[11] Cardon, James H. and Igal Hendel. Asymmetric Information in Health Insurance:
Evidence from the National Medical Expenditure Survey. RAND Journal of Economics.
2001, 32(3):408-27.
[12] Carlin, C., and Town, R. Adverse Selection, Welfare and Optimal Pricing of Employer
Sponsored Health Plans. Working Paper. University of Minnesota. 2009.
[13] Chiappori, Pierre-Andre and Bernard Salanie. Testing Contract Theory: a Survey
of Some Recent Work, in Advances in Economics and Econometrics - Theory and
50
Applications, Eighth World Congress, M. Dewatripont, L. Hansen and P. Turnovsky,
ed., Econometric Society Monographs, Cambridge University Press, Cambridge, 2003:
115-149.
[14] Courty, P., and Hao Li. Sequential Screening. Review of Economic Studies. 2000,
67:697-717.
[15] Cutler, D., and Reber, S. Paying for Health Insurance: The Trade-off Between Com-
petition and Adverse Selection. The Quarterly Journal of Economics. 1998, 113(2):
433-466.
[16] Cutler, David M. and Richard J. Zeckhauser. The Anatomy of Health Insurance. Cu-
lyer, Anthony J. and Joseph P. Newhouse (eds) Handbook of Health Economics Vol.
1A. Amsterdam , New York : Elsevier, 2000: 563-643.
[17] Dai, Chifeng, Tracy Lewis, and Giuseppe Lopomo. Delegating Management to Experts.
RAND Journal of Economics. 2006, 27(3): 403-420.
[18] Deaton, Angus and Christina Paxton. Aging and Inequality in Health and Income.
American Economic Review. 1998: 88(2):248-253
[19] Einav, L., Finkelstein, A., and Cullen, M. Using Price Variation to Estimate the Wel-
fare Effects of Adverse Selection. The Quarterly Journal of Economics. 2010, 125(3):
877-921.
[20] Einav, L., Finkelstein, A., and Levin, J. Beyond Testing: Empirical Models of Insur-
ance Markets. Annual Review of Economics. 2010, 2:311-36.
[21] Einav, L., Jenkins, M., and Levin, J. Contract Pricing in Consumer Credit Markets.
Working Paper. Stanford University. 2010.
[22] Ellis, Randall P., “The effect of prior-year health expenditures on health coverage pian
51
choice,” Advances in Health Economics and Health Services Research, Vol. 6, Richard
M. Scheffler and Louis F. Rossiter, eds. (Greenwich, CT: JAI Press, 1985), pp 127-47.
[23] Ettner, Susan L. New Evidence on the Relationship Between Income and Health.
Journal of Health Economics. 1996, 15:67-85.
[24] Fang, H., and Gavazza, A. Dynamic Inefficiencies in Employment-Based Health Insur-
ance System: Theory and Evidence. NBER Working Paper 13371. 2010
[25] Feldman, Roger and Bryan Dowd. A New Estimate of the Welfare Loss of Excess
Health Insurance. The American Economic Review, 1991, 81(1): 297-301.
[26] Feldstein, Martin. The Welfare Loss of Excess Health Insurance. Journal of Polical
Economy. 1973, 81(2)251-280.
[27] Gourinchas, P. O. and J. Parker. Consumption Over the Life Cycle. Econometrica.
2002. 70(1): 47-89.
[28] Guerre, E., I. Perrigne, and Q. Vuong. Optimal Nonparametric Estimation of First-
Price Auctions. Econometrica. 2000, 68(3):525-574.
[29] Handel, B. ‘Adverse Selection and Switching Costs in Health Insurance Markets: When
Nudging Hurts. Working Paper. Northwestern University. 2010.
[30] Jackson-Beeck, Marilyn and John H. Kleinman, “Evidence for self-selection among
health maintenance organization enrollees,” Journal of the American Medical Associ-
ation, Vol. 250, No. 20 (November 25. 1983), pp. 2826-29.
[31] Juba, David. A, Judith R. Lave and Jonathan Shaddy, “An analysis of the choice of
health benefits plans,” Inquiry, Vol. 17 (Spring 1980), pp. 62-71.
[32] Keeler, E., J. Newhouse and C. Phelps. Deductibles and the Demand for Medical
Care Services: The Theory of a Consumer Facing a Variable Price Schedule under
Uncertainty. Econometrica, 1977, 45: 641-655.
52
[33] Keeler, E., and J. Rolph. The Demand for Episodes of Treatment in the Health Insur-
ance Experiment. Journal of Health Economics, 1988, 7:337-367.
[34] Khwaja, A. Health Insurance Habits and Health Outcomes: A Dynamic Stochastic
Model of Investment in Health. Ph.D Dissertation, University of Minnesota. 2001.
[35] Khwaja, A. “Estimating Willingness to Pay for Medicare Using a Dynamic Life-Cycle
Model of Demand for Health Insurance,” Journal of Econometrics, May, 2010, Vol.
156.
[36] Kowalski, Amanda. “Censored Quantile Instrumental Variable Estimates of the Price
Elasticity of Expenditure on Medical Care” NBER Working Paper 15085. June 2010.
[37] Marquis, M. Susan, Adverse selection with a multiple choice among health insurance
plans: A simulation analysis, Journal of Health Economics, Vol. 11, No. 2 (August
1992), pp. 129-51.
[38] Manning, W.G., J.P. Newhouse, N.Duan et al. Health Insurance and the Demand for
Medical Care: Evidence from a Randomized Experiment. American Economic Review.
1987, 77(3):251-277.
[39] Marquis, M. Susan and Charles E. Phelps,“Price elasticity and adverse selection in the
demand for supplemental health insurance.” Economic Inquiry, Vol. 25, No. 2 (April
1987), pp. 299-313.
[40] Marsh, Christina. “Estimating Health Expenditure Elasticities Using Nonlinear Reim-
bursment.” Working paper. June 2010.
[41] Newey, W. K. The Asymptotic Variance of Semiparametric Estimators. Econometrica.
1994, 62(6):1349-1382.
[42] Pauly, M. The Economics of Moral Hazard: Comment. American Economic Review.
1968, 58:531-536.
53
[43] Prescott, E., Theory ahead of business cycle measurement, Federal Reserve Bank of
Minneapolis Quarterly Review, 10, Fall, 9-22. 1986
[44] Rothschild, M. and J.E. Stiglitz. Equilibrium in Competitive Insurance Markets: An
Essay on the Economics of Imperfect Information. Quarterly Journal of Economics.
1976, 90(4):630-649.
[45] Scheffler, R.M. “The United Mine Workers’ health plan: an analysis of the cost-sharing
program,” Medical Care 22(3):247-254. 1984.
[46] Shea, J. Union Contracts and the Life-Cycle/Permanent-Income Hypothesis. American
Economic Review. 1995, 85(1): 186-200.
[47] Smith, James P. Healthy Bodies and Thick Wallets: The dual relation between health
and economic status. Journal of Economic Perspectives. 1999, 13(2):145-166.
[48] Spence, M. and R. Zeckhauser. Insurance, Information, and Individual Action. Amer-
ican Economic Review. 1971, 61(2):380-387.
[49] Viscusi, K. and W. Evans. Utility Functions that Depend on Health Status: Estimates
and Economic Implications. American Economic Review. 1990, 80: 353-374.
[50] Vera-Hernandez, Marcos. Estimation of a Principal-Agent Model: Moral Hazard in
Medical Insurance. The Rand Journal of Economics. 2003, 34(4):670-693.
54
A Appendix: Asymptotics - Not for Publication
The difficulty of computing standard errors and obtaining a valid statistical inference pro-
cedure in semiparametric models is well known. The estimator proposed in this paper is a
semiparametric GMM estimator that depends on first stage nonparametric estimation of the
conditional distribution of the rate of reimbursement given the amount of medical expendi-
ture. Statistical inference presents a major challenge just as in other similar semiparametric
models which require nonparametric estimation of conditional mean or distribution functions
in the first stage.
There are several conventional approaches to compute the correct standard errors to take
into account the statistical uncertainty introduced by nonparametric estimation in the first
stage. The first one is to derive the asymptotic distribution of the estimator analytically and
replace the asymptotic variance with a consistent estimate based on the sample data. This
is in principle possible by following the pathwise derivative calculation in Newey (1994). A
second approach is resampling. Either bootstrap or subsampling will provide a valid infer-
ence procedure for this particular estimator. A third approach is to make use of an insight
by Newey (1994) that the asymptotic variance of the second stage estimator does not de-
pend on how the first stage nonparametric estimation method is implemented. While the
first stage estimator is currently implemented using kernel density smoothers, the second
stage estimator will have approximately the same asymptotic variance even if the first stage
is estimated using a sieve parametric approximation instead of the kernel smoother. If the
implementation of the first stage estimator can be modified to be a sieve parametric approx-
imation method then the (overidentifying) moment conditions that are used in obtaining
the estimator can potentially be modified to a set of exactly identifying moment conditions
for both the first stage sieve parameters and the second stage structural parameters of the
model. According to Newey (1994), if this is possible, then the approximate variance of the
second stage estimator can be read off from the lower diagonal of the variance-covariance
matrix of the entire generalized method of moment estimator that includes both the first
55
stage and second stage estimators. Computing the overall variance-covariance matrix is
straightforward using the conventional sandwich formula for GMM estimators.
Unfortunately each of these approaches has its own disadvantages. The pathwise deriva-
tion calculation in Newey (1994) is often tedious and prone to errors in analytic computation.
The resulting asymptotic variance estimate can also be complex and is sensitive to errors
introduced through numerical calculations and computations. Resampling methods require
recomputing the estimators repeatedly over many bootstrap iterations. Given the nonlinear
nature of the method of moment estimators, this might not be computationally feasible. Re-
placing the first stage kernel smoother with a sieve parametric approach also seems at odds
with the current implementation of the semiparametric estimator and might also lead to dif-
ferent point estimates for the second stage structural parameters. In addition, implementing
a first stage sieve parametric approach appears to be more difficult than implementing the
kernel smoother.
The intuition underlying the computation of the standard errors is as follows. In com-
puting the standard errors for a two step estimator the first stage is treated in a parametric
fashion (somewhat like in sieve estimation). Then the standard error for the second stage
can be read off from the lower diagonal components of the entire variance-covariance matrix
of both the first and second stage parameters. The paper by Ackerberg, Chen, and Hahn
(2009) formally proves the validity of this approach.
In the following we formally outline the pathwise derivative argument of Newey (1994)
which justifies the asymptotic normality of the semiparametric two step estimator. First
note that the proposed estimator in this paper takes a general form for most two step
semiparametric estimators:
γ = arg minγ∈Γ
gn(γ)′Wngn(γ),
where
gn(γ) =1
n
n∑i=1
g(zi, fa,m(·, ·), γ
).
56
In the above expression, fa,m(·, ·) denotes the first stage kernel smoothing nonparametric
estimate of the joint density function of a and m. The zi denotes the collection of all the
variables for observation i. The functional notation is used to emphasize that the second
stage moment condition depends on the entire joint density because θ is recovered through
an equation that depends on integrating against the conditional distribution of a given m,
and on the derivative of the conditional density fa|m(·|·).
It can be shown through standard Taylor expansion arguments that under suitable reg-
ularity conditions (which include the condition that the first stage nonparametric regression
should make use of an undersmoothing sequence of bandwidth parameters):
√n(γ − γ) = G−1W
1√n
n∑i=1
g(zi, fa,m(·, ·), γ0
)+ op(1) ,
where op(1) denotes a term that converges to zero in probability. The W is the probability
limit of Wn, and
G =∂
∂γg (zi, fa,m(·, ·), γ0) .
When the nonparametric component fa,m(·, ·) is known, asymptotic normality of γ follows
immediately from a central limit theorem applied to the normalized sum of the moment
conditions evaluated at γ0:
1√n
n∑i=1
g (zi, fa,m(·, ·), γ0)d−→ N(0,Ω) where Ω = V ar(g (zi, fa,m(·, ·), γ0)).
However, when fa,m(·, ·) has to be estimated in the first stage its impact on the second
stage asymptotic variance needs to be taken into account. Specifically, it can be shown under
additional regularity conditions that
1√n
∑ni=1 g
(zi, fa,m(·, ·), γ0
)− 1√
n
∑ni=1 g (zi, fa,m(·, ·), γ0)
=√nE[g(zi, fa,m(·, ·), γ0
)− g (zi, fa,m(·, ·), γ0)
]+ op(1).
The pathwise derivation calculation in Newey (1994) seeks a linear influence function
57
ψ(zi) such that
√nE[g(zi, fa,m(·, ·), γ0
)− g (zi, fa,m(·, ·), γ0)
]=
1√n
n∑i=1
ψ(zi) + op(1).
Conditional on knowledge of this influence function, it is easy to see that
1√n
n∑i=1
g(zi, fa,m(·, ·), γ0
)d−→ N(0, V ar(g
(zi, fa,m(·, ·), γ0
)+ ψ(zi))).
Then we can conclude that
√n(γ − γ)
d−→ N(0, (G′WG)−1G′WΩWG(G′WG)−1)).
58