Moral Hazard, Adverse Selection and Health Expenditures: A

Moral Hazard, Adverse Selection and HealthExpenditures: A Semiparametric Analysis∗

Patrick BajariDepartment of Economics

University of Minnesota and NBER

Han HongDepartment of Economics

Stanford University

Ahmed KhwajaSchool of Management

Yale University

Christina MarshDepartment of Economics

University of Georgia

Abstract

Theoretical models predict asymmetric information in health insurance marketsmay generate inefficient outcomes due to adverse selection and moral hazard. How-ever, previous empirical research has found it difficult to disentangle adverse selectionfrom moral hazard in health care. We empirically study this question using a uniqueclaims-level dataset with confidential information from a large self-insured employerto estimate a structural model of the demand for health care. We propose a two-step semiparametric estimation strategy to identify and estimate a canonical model ofasymmetric information in health care markets. We find significant evidence of moralhazard and adverse selection.

JEL CLASSIFICATION: C14, D82, I11KEYWORDS: Adverse Selection, moral hazard, health insurance, semiparametric es-timation

∗We have benefited from the comments of participants of the Conference on Structural Models in Labor,Aging and Health, and the Seventeenth Annual Health Economics Conference. We acknowledge excellentresearch assistance from Ivan Shaliastovich, and especially Alvin Murphy. Bajari and Hong would like tothank the National Science Foundation for generous research support. The authors may be contacted at thefollowing email addresses respectively: [email protected], [email protected], [email protected],[email protected].

1

1 Introduction

A large theoretical literature predicts that adverse selection and moral hazard may gener-

ate inefficient outcomes in insurance markets due to asymmetric information (Arrow 1963,

Akerlof 1970, Spence and Zeckhauser 1971). The predictions of insurance theory and the

efficient regulation of insurance markets can depend on whether adverse selection or moral

hazard is more important. However, it is well recognized that it is empirically difficult to

distinguish between moral hazard and adverse selection and empirical research that disen-

tangles moral hazard from adverse selection is almost non-existent.1As a consequence there

is little consensus on which of these two sources of inefficiency is more important to address

for public policy. In this paper, we introduce an analytical approach that can separately

identify the two effects.

Our research builds on a canonical model of demand for health care. We begin by formu-

lating a model of consumers’ health care choices given insurance plan offerings. The model is

motivated by theoretical models proposed by Spence and Zeckhauser (1971) and Blomqvist

(1997). A consumer has preferences over both health care expenditure and aggregate con-

sumption, which are influenced by the consumer’s latent health status. A consumer’s optimal

choice of health care expenditure depends on his latent health status and his budget con-

straint. The budget constraint accounts for the characteristics of the chosen health insurance

plan and a plan-specific probability distribution of reimbursement for health expenditure.

Consumers have private information about their latent health status which is unobserved by

the insurer, leading to adverse selection. The consumers do not pay the full costs of their

health care coverage, which induces a moral hazard problem. We estimate risk parameters

semiparametrically and use these parameters to recover the latent health status distribution.

Given the estimated health status distribution, utility parameters, and observed consump-

tion choices, we are able to infer the level of moral hazard with a counterfactual replicating

1A notable exception is the work of Einav, Jenkins and Levin (2010), who estimate a model that incor-porates both these phenomena in the context of automobile loan markets.

2

a social planner’s optimal consumption allocation. Lastly, we use the estimated plan specific

latent health distributions to test for adverse selection across plans.

Our empirical research is made possible by access to a unique confidential claims-level

dataset from a large self-insured employer. The data contains a high level of detail and

includes all information that was necessary to process a claim. Since this data was used

directly for insurance processing by the employer, the data also has a high level of com-

pleteness and accuracy. Each consumer is matched to corresponding health expenditures,

reimbursements, salary, age, and health status indicators.

Our work contributes to the growing body of empirical work on the structural estimation

of models of medical utilization and health insurance choice (Cameron et al. 1988, Cardon

and Hendel 2001, Vera-Hernandez 2003, Blau and Gilleskie 2008, Khwaja 2001, 2010). We

differ from the existing earlier literature in three ways. First, we allow for both adverse selec-

tion and moral hazard due to asymmetric information in estimating the model. Second, our

estimation strategy is semiparametric, in contrast to earlier work relying on parsimoniously

specified parametric models. This is important because theory provides little guidance about

which parametric distributions for latent health shocks are a priori most plausible. Semi-

parametric estimation also is a major contribution because it allows us to specify plans’

reimbursement schedules flexibly, allowing for non-linearities in common characteristics such

as deductibles and copays, which could not be captured by more restrictive specifications in

previous literature. Third and most notably, our method only necessitates a single, relatively

weak identification assumption. Previous structural approaches rely on strong identifying

assumptions. In this paper, the sole identifying assumption is that the distribution of health

shocks is invariant over a short time span, i.e., the quantiles of the underlying health status

distribution are invariant across the three years of our data. This assumption is reasonable

given a large enough population of study. In particular, our data is from the largest em-

ployer in the state of Minnesota, which has retained the top position for many years with

a stable population of employees over our period of study. Additionally, we use consecutive

3

years in our estimation, which further supports the validity of this identifying assumption by

limiting the possibility of long-term changes in the local population or restructuring of the

employee base. Hence, our research extends the nascent empirical literature on examining

the economic effects of adverse selection in health insurance (see e.g., Carlin and Town 2009,

Bundorf, Levin and Mahoney 2010, Einav, Finkelstein and Cullen 2010, Handel 2010).

More generally, our paper contributes to the literature on analyzing distortions due to

asymmetric information in insurance markets. A common method in previous research to

detect asymmetric information is to examine the correlation between risk outcomes and a

measure of the generosity of a contract. In the context of health insurance, Cutler and Zeck-

hauser (2000) review an extensive literature that finds evidence of adverse selection based

on the positive correlation between generosity of the insurance contract and adverse out-

comes, and moral hazard based on the coinsurance elasticity of the demand for medical care.

However, Chiappori and Salanie (2003) show that, under moral hazard, the generosity of

the contract will lead to adverse risk outcomes, while under adverse selection the causality

is reversed. This leads to observational equivalence between the two hypotheses. Einav,

Finkelstein, and Levin (2010) and Abbring et al(2003) show that disentangling the two ef-

fects requires a modeling framework in the absence of experimental data.2 Moreover, the

nature of health insurance markets necessitates a more structured approach than reduced

form methods. This is because the price for health care services, in the form of deductibles

or copays, is often based on the amount of health care consumed. Thus, any conventional

regression analysis between health care quantities and incentives will be susceptible to endo-

geneity bias. Our framework, in addition to documenting the presence of moral hazard and

adverse selection, permits policy analysis using counterfactual simulations. Furthermore, it

allows estimation of the level of distortions of insurance markets and the welfare impact of

market interventions.

2Abbring et al. (2003) suggest exploiting the dynamic consequences of experience rating to distinguishbetween adverse selection and moral hazard. U.S. health insurance market regulation precludes this empiricalstrategy.

4

Our results indicate both significant moral hazard and adverse selection. Our estimates

of risk aversion on health expenditures are slightly higher than for composite good consump-

tion, which is consistent with the irrecoverable nature of health. Second, we find substantial

evidence of moral hazard or “overconsumption” of medical care. In our plans, overconsump-

tion due to insurance may be as high as 40 percent of health expenditure within our plans.

Third, we devise a nonparametric test for adverse selection, and find that individuals are

indeed sorting across different insurance categories based on their latent health status. We

find that plans with low out-of-pocket costs but restrictive provider choice attract a larger

proportion of healthy consumers vis a vis more expensive but flexible plans.

Our research is novel in that it develops a tractable estimation procedure under minimal

parametric assumptions to simultaneously examine adverse selection and moral hazard in

health insurance contracts. It also provides an important framework for similar analysis in

other contexts, especially with cross-section data, where distortions exist due to asymmetric

information. The rest of the paper is organized as follows. The model is discussed in Section 2

and the data in Section 3. Details of the estimation procedure are provided in Section 4.

Estimation results including analysis of moral hazard and adverse selection are discussed in

Section 5. Section 6 concludes.

2 Model

Our focus is on disentangling adverse selection and moral hazard in health insurance mar-

kets. This is difficult because the market for health insurance and health care is rich with

institutional detail and complex. Hence, in order to encapsulate the essential features of

asymmetric information in health insurance markets , we develop a canonical model of de-

mand for health care that builds on the work of Spence and Zeckhauser (1971), Bloomqvist

(1997) and Cardon and Hendel (2001). The objective of consumers acting directly, or in

concert with physicians who are their agents, is to maximize consumer utility. This is done

by choosing the optimal amount of health services m and consumption of a composite com-

5

modity c subject to a budget constraint. This budget constraint requires that out-of-pocket

expenses for m plus the value of c must be less than or equal to income y minus the insurance

premium p.

Our model differs from the standard neoclassical model of consumer choice in two impor-

tant respects. First, the optimal choice of m and c depends on a consumer’s latent health

status, which we denote as the scalar θ. We interpret θ as a preference shock for health

services, that is, the higher the value of θ, the higher the utility from health services. While

θ is observed to the consumer, it is not observed by the insurer. As a result, our model allows

for adverse selection, which plays a central role in the economic analysis of health insurance

and insurance markets. Second, the budget constraint in our model is more complicated

than the standard neoclassical model. Health insurance plans introduce nonlinearities in the

budget constraint through features such as deductibles and coverage gaps (Keeler, Newhouse

and Phelps 1977). Also, the consumer’s out-of-pocket expenses are uncertain at the time

that the health care choice m is made. We allow this out-of-pocket expense to be stochastic,

generalizing previous research which typically assumes that the consumer’s costs of health

care are known at the time m is chosen. Given this budget constraint, moral hazard arises

in the model when the consumers do not face the full cost of health services.

An important reason for adopting this framework to disentangle moral hazard and ad-

verse selection in health insurance is that there is strong evidence that consumers’ health

care choices do respond to health insurance incentive structures (see Manning et al. 1987,

Kowalski 2010, Marsh 2010). Conversely, insurance plans are designed with enrollee char-

acteristics, utilization and expenditures in mind. Hence, in the absence of the framework

described above, it would be extremely difficult to disentangle adverse selection from moral

hazard because consumer choices and plan characteristics are determined jointly in response

to each other. We describe our model in more detail below.

6

2.1 Consumer Preferences

The consumer’s utility function is specified as:

U(c,m; θ, γ) = (1− θ) c1−γ1

1− γ1

+ θm1−γ2

1− γ2

The consumer’s utility depends on the consumption of m and c. We assume consumption

to be additively separable in these two terms. The latent health status parameter θ lies

between [0,1] and indexes the weight that the consumer places on consumption of health

services m and non-health aggregate commodity c. If θ is close to one (zero), the consumer

has a greater valuation for m (c).

Consumers in our environment are heterogenous because θ varies across individuals. Let

g(θ) and G(θ) denote the density and cdf of health status across individuals in our data

set. In principle, we could let g depend on observed covariates considered to be associated

with health status such as age, education level or income. However, in the absence of a

theoretical foundation for the choice of such covariates, any ad hoc specification involves the

risk of misspecification. As we shall discuss in the estimation section, we do not need to

specify this part of the model. In our results section, we examine the relationship between θ

and such covariates, which has the additional advantage that it helps evaluate the plausibility

of the estimated distribution of θ.

Another feature of our model is that we allow for multidimensional risk aversion. We

assume constant relative risk aversion in c and m. The parameters γ1 and γ2 describe the

consumer’s attitude towards risk with respect to aggregate consumption and health respec-

tively. This is important in our framework because the out-of-pocket expenses associated

with m are stochastic.

2.2 Budget Constraint

The budget constraint specifies that the consumer’s total expenditure on the composite

commodity plus health must be less than her income after deducting medical premiums.

7

The price to the consumer of m depends on two things. The first is pj, the fixed premium of

the insurance plan j. There are three plans in our data, a Health Maintenance Organization

(HMO) and two Preferred Provider Organizations (PPO).

We do not model the choice of j. We could imbed our framework in a richer model that

includes the choice of a health plan in an earlier stage.3 However, the optimal choice of m

will be sufficient to identify our model parameters so we abstract from this complication.

The choice of j does not influence the consistency of our estimates. In our results section,

we will use our estimates of the model to ask how consumers with different values of θ sort

across plans and if their observed sorting is consistent with adverse selection. The second

part of our model that determines the consumer’s cost is the reimbursement rate of insurance

plan j, which we label as the scalar aj. If a consumer chooses health services that lead to

medical expenditure m then the insurer will cover ajm of these expenses. As a result, the

consumer must pay for m(1− aj) out-of-pocket.

The consumer’s budget constraint is:

c+m(1− aj) ≤ y − pj

A difficulty the consumer faces is that aj will be uncertain at the time that m is cho-

sen. Medical plans are long and typically quite complicated documents that are frequently

written by insurers, their executives, and attorneys. It is unlikely that a typical consumer

invests the resources to understand what these medical plans cover in all states of the world.

Furthermore, the final determination of reimbursement is done by administrators and is of-

ten the outcome of a negotiation between the health provider and the insurer, which is a

complicated process.

What is important to model is the determination of aj from the perspective of the con-

sumer since we are ultimately interested in consumer demand. The other issues raised above,

while interesting, are secondary to our research question. Therefore, we shall model aj as a

3See e.g., Khwaja (2001, 2010) for a dynamic model of health insurance and health care decisions.

8

random variable with a distribution fj(aj|m). This allows the generosity of the benefits to

depend on the plan j chosen by the consumer. Furthermore, we allow aj to depend on m.

This is natural in the context of health care. For example, many plans require a consumer to

pay a fixed fee for a doctor’s visit. Plans may display features such as deductibles or coverage

gaps which do not reimburse a certain range of expenditures. Our framework accommodates

these complications by allowing the reimbursement rate to depend on m.

We shall use the observed distribution of reimbursements and flexible, nonparametric

methods to identify fj(aj|m). We view this is as preferable to a strict, ex ante specification

of consumers’ expectations about aj. We allow these expectations to be data driven and con-

sistent with the standard economic assumption that consumers have rational expectations,

in the sense that their beliefs about aj must be consistent with the observed outcomes.4

2.3 Expected Utility and First Order Conditions

The consumer makes her choice under uncertainty. A time line for the consumer’s choice is:

1. A value of θ is drawn from g.

2. After the consumer observes θ, the consumer makes a choice of m.

3. The reimbursement rate aj is realized from the distribution fj(aj|m).

4. Since preferences are strictly increasing, the budget constraint binds and c is deter-

mined by the equation c = y − pj −m(1− aj).

Let EU(m; pj, y, θ, γ) denote the consumer’s expected utility at step 2 in the time line

above, given pj, y, θ, and γ.

4As an alternative it might be interesting to allow consumers to have biased or irrational beliefs aboutthe determination of aj . However, our model is very flexible and comes close to exhausting the degrees offreedom in the data. The identification of such irrational beliefs would therefore be tenuous. There is noconsensus about a theoretical framework which would provide a plausible basis for the a priori specificationof how consumers bias their beliefs in the context of health care. As a consequence, we use the more commonassumption that consumers have beliefs that are consistent with ex post outcomes.

9

EU(m; pj, y, θ, γ) =

∫(1− θ)(y − pj −m(1− aj))1−γ1

1− γ1

fj(aj|m)daj + θm1−γ2

1− γ2

(2.1)

The above expression substitutes the choice of m for c using the consumer’s budget con-

straint. In computing expected utility, the consumer integrates over aj using the distribu-

tion fj(aj|m). If the realization of aj is close to one (zero), the value of c will be larger

(smaller) all else held equal. The value of EU(m) depends crucially on γ, the consumer’s

attitude towards risk. For example, the more risk averse the consumer is toward uncertainty

in consumption of c, we would expect her to consume less m. The utility also depends on

income and premiums through y− pj. For example, households with lower income are more

adversely impacted by a low realization of aj.

The optimal choice of m is determined by a first order condition that sets the derivative

of EU(m) in m equal to zero. Formally, we can write this condition as:

∂EU(.)

∂m=

∂

∂m

[∫(1− θ)(y − pj −m(1− aj))1−γ1

1− γ1

fj(aj|m)daj

]+

∂

∂m

[θm1−γ2

1− γ2

]

=1− θ1− γ1

∫[−(1− γ1)(1− aj)(y − pj −m(1− aj))−γ1fj(aj|m)

+ (y − pj −m(1− aj))1−γ1 ∂fj(aj|m)

∂m]da+

[θ ·m−γ2

]= 0 (2.2)

The first line of the equation 2.2 describes the marginal effect of increase in health care

expenditure on expected utility. The left bracketed term shows the marginal change in a con-

sumer’s utility due to a marginal change in consumption of the aggregate commodity c. The

presence of the conditional probability of reimbursement fj(aj|m) reflects the uncertainty

in reimbursement and the effect this has on the income that is available for consumption of

the aggregate commodity. The right bracketed term shows the marginal increase in utility

10

due to the consumption of health care services. Each bracketed term is weighted by θ, so

that the weight on a marginal increase in health care services reflects the private value the

consumer places on health care. The total effect on expected utility is the combined terms,

which are expanded in the second line of equation 2.2.

2.4 Inferring θ from observed choices

In this section we provide intuition about how our estimation procedure allows us to estimate

the density of health shocks g and the preference parameters γ from the observed choices.

Let us rewrite 2.2 in the following way:

θ =I

I − (1−γ2)mγ2

(2.3)

where

I =

∫[−(1− γ1)(1− aj)(y − pj −m(1− aj))−γ1fj(aj|m)

+(y − pj −m(1− aj))1−γ1 ∂fj(aj|m)

∂m]daj (2.4)

The rationale underlying our estimator is similar to that used in auction models, espe-

cially Guerre, Perrigne and Vuong (2000) and Campo, Guerre, Perrigne and Vuong (2011).

The left hand side of the above equation is the consumer’s private information, θ. For a

fixed γ, the remaining terms on the right hand side of this equation are potentially observ-

able. For example, from the data, we will be able to construct an estimate of fj(aj|m) by

observing the distribution of reimbursements conditional on medical expenditures m. Also,

our data set contains information about y, a particular consumer’s income level, pj, medical

premiums for plan j, and m the total value of consumption of medical services.

Under suitable regularity conditions, for a fixed value of γ, we shall always be able

to find a value of θ that rationalizes the consumer’s choice. As a result, it follows that

the assumption of utility maximization alone will not be adequate to identify both g and

11

γ. Our approach to identification will be to impose additional moment restrictions. This

identification strategy is similar to the analysis of the risk averse auction model in Campo,

Guerre, Perrigne and Vuong (2011). We note that in nonlinear parametric models, global

identification is generally difficult to verify. Campo et al. (2011) also make use of a set

of nonlinear moment conditions to estimate the risk aversion parameters. Their “high-

level” parametric identifying condition A1 (iv) implies the order condition for identification,

but does not explicitly provide a rank condition. For a special case of the CRRA utility

function, their moment conditions become linear so that the rank condition can be directly

tested using the observed data.5 Our approach to identification will be to impose additional

moment restrictions. In our application, we will use the moment restriction that g does not

depend on time. Our data is a three year panel on incomes and health care choices from

a large Minnesota-based employer. Intuitively, this restriction means that the severity of

illnesses for this large population of employees does not change over an interval of three

years. Below, we shall argue that this is reasonable since the population of employees is very

large and there is no reason to expect large fluctuations in the severity of illness as reflected

in g among this group within a reasonably short three year time period.6 However, there

will be considerable variation in individual incomes as individuals experience promotions or

change jobs within the organization. Also, premiums and reimbursement rates fj(aj|m) will

vary from year to year. In theory, another source of variation in the latent health distribution

could be the option of not insuring at all. However, given that our data set comes from a

large self-insured employer, selection effects are limited because most employees are covered

by the plans offered by the employer. Finally, the plans used in estimation capture most of

the employee base of the large employer, insuring over 80 percent of employees.

5However, Campo et al. (2011) candidly acknowledged that “considering another parametric specification... would lead to a nonlinear system of equations in (the risk aversion parameters) for which local identificationconditions can be obtained through the usual ‘rank’ conditions,” without providing explicit details for theverification of the rank conditions.

6To our knowledge, there was no epidemic nor any significant changes such as innovations in medicaltechnology that might have led to a large shift in the health distribution in this population during this timeinterval.

12

2.5 Discussion

The two primary issues in health insurance that are addressed in this paper, moral hazard

and adverse selection, result from the fact that consumer’s health, θ, is unobserved to the

insurer, yet the insurer offers a menu of different plans from which a consumer may choose.

Adverse selection occurs when plans are not able to distinguish between consumers with

different values of θ. In such a situation, a consumer with a high value of θ can choose a

plan j with generous features such as high reimbursement or better access to care that was

designed for consumers with low values of θ. Since θ is private information to the consumer

and the insurance plan cannot correct for this selection. Equation 2.3 will be used to back

out the distribution of θ for each plan and test how these distributions differ across plans,

i.e., whether there is evidence of sorting in the data.

Moral hazard occurs in the model because consumers pay some proportion aj for each

unit of health care m consumed, but do not pay the full cost incurred by the insurer. This

definition of moral hazard is commonly used in the health economics literature, although

other fields, such as contract theory, may use instead a definition that focuses on “hidden

actions.” The approach we use for moral hazard in this paper follows a longstanding def-

inition, as established by Pauly (1968).7 In our model, insurers establish pj and fj(aj|m),

but once consumers have chosen plan j, the insurer cannot contract the amount of care m a

consumer chooses within plan j. In equation 2.2, a consumer’s marginal increase in expected

utility with respect to health care must be equal to the price ratio between health care and

composite good consumption. The reimbursement aj lowers the relative price of health care

relative to composite good consumption, leading to moral hazard or “overconsumption” of

medical care compared to a situation in which the price of medical care is not subsidized by

insurance.

7Pauly discusses moral hazard stating: “Medicare insurance, by lowering the marginal cost of care to theindividual, may increase usage; this characteristic has been termed “moral hazard”.”

13

3 Data

To estimate the model we use a detailed confidential claims level dataset from a large self-

insured employer.8 The claims level data is linked to enrollment and demographic history

from the employer. Since the employer is self-insured, any medical care with an associated

claim is included in the dataset for every employee in the firm. Full claims data is available

for the years 2002-2004. The employer has offices in several locations and the complete

claims data covers over 19,000 employees and over 39,000 total beneficiaries.

The employer offers health insurance coverage for individuals, spouse/domestic partners

and families. Employees had a choice of four types of plan during 2002-2004, a traditional

health maintenance organization (HMO), a preferred provider organization (PPO), a tiered

network product based on care systems, and a consumer driven health plan (CDHP). The

HMO featured generous coverage for network physicians and hospitals but no coverage of

out-of-network care. The PPO had nominal copayments for services in-network, with lower

coverage out-of-network after meeting a deductible. In the tiered network product, employees

chose from three cost tiers with standardized benefits but varying premiums which changed

depending on the bids submitted by each of the facilities in the tiers. The CDHP option

featured an employer-funded health savings account and a high deductible.

Each claim includes total approved health care spending, total reimbursement, and cost

variables such as the coinsurance amount, copayment, and amount of deductible used. The

claim also includes information such as primary and secondary diagnosis codes, procedure

codes, type of treatment facility, and type of provider. Cost and spending amounts are ag-

gregated for each beneficiary over the year for a precise measure of total spending by the

employer and total out-of-pocket cost to the beneficiary. Claims data is linked to employee

information to include demographic information. Age and gender are taken directly from the

claims files. Salary was imputed based on birthdate, gender, home zipcode, work location,

and job classification using a separate file provided directly by the employer. Plan charac-

8We thank Robert Town and Caroline Carlin for their assistance in accessing this data.

14

teristics such as premium, employee contribution, and cost structure were obtained directly

from the employer’s enrollment materials.

For the empirical analysis, we focus on the three most populous plans which comprise

over 80 percent of enrollment. These plans are the HMO plan, hereafter referred to as “HP,”

and two of the plans in the tiered network product, referred to as “PC2” and “PC3.” PC2

and PC3 have similar cost sharing parameters, however PC2 has a network of providers of

lower cost than PC3. We use only employees enrolled under single coverage, and only those

employees with continuous enrollment over the entire year.9 Table 1 displays enrollment

statistics for each year and plan for single coverage full-year enrollment. Over the three-year

period, the dataset has over 14,000 single-coverage enrollees. The HP plan has the largest

enrollment with over 80 percent of enrollees each year. Enrollments in the PC2 and PC3

plans range between 230 and 539.

Table 1: Plan Total Enrollees

Plan 2002 2003 2004 TotalHP 4,032 4,126 3,845 12,003PC2 483 469 388 1,340PC3 523 539 230 1,292Total 5,038 5,134 4,463 14,635

The remaining variables included in the dataset are a current health status proxy and a

measure for the probability of high future health costs. These two variables are created for

each individual enrollee using the Johns Hopkins University ACG Case-Mix System (v6),

which was developed by the Health Services Research and Development Center. This is a

commercial algorithm used to predict future illness severity and health care spending. The

health proxy variable is constructed around a national average of 1.0 with higher values

indicating greater illness severity. The diagnosis codes and health status proxy are then

used to create a measure of the probability of high costs next year.

9Continuous single coverage replicates our specified model, but more categories of consumer could beanalyzed with simple modifications to our framework.

15

Table 2 displays summary statistics for each of the three plans for total health spending,

total reimbursement, salary, age, the health status proxy, next year’s high cost probability,

and percentage female. Statistics are displayed for the over 14,000 individual observations

in the dataset. Average health care spending is less than $5,000 in all three of the plans,

but the tiered PPO plans have higher average spending than the HMO plan. The highest

average spending is in the PC3 plan at $4,700 per consumer across all years. PC2’s average

spending is approximately $700 lower than PC3. The HMO’s average spending is significantly

lower than the PPO’s spending, at $2,845 per consumer. For a better understanding of

the spending distribution in each plan, Table 2 also lists the 25th and 75th percentiles of

spending. In the 25th percentile, the HMO plan has again significantly less expenditure

than the two PPO plans. The HMO’s 25th percentile of spending is $306 as compared to

$774 and $864 of PC2 and PC3, respectively. This pattern continues in the 75th percentile

where HMO expenditure is $2,710, compared to PC2 at $4,402 and PC3 at $5,023. For each

plan, total reimbursement is slightly less than total health spending, which reflects enrollees’

out-of-pocket costs.

Comparing demographic variables across the plans, there is preliminary evidence of the

existence of adverse selection. The average salary in the dataset is $43,896. The PC3 plan

has the highest average salary of $50,357, compared with the PC2’s $45,946 and HP’s $42,972

average salaries. The PC3 plan retains the highest distribution of salaries when looking at

both the 25th and 75th percentiles. The HP plan’s enrollees are younger, with an average

age of 41.7, compared to PC2 and PC3 with averages ages of 46.4 and 47.0, respectively

The average of the health status proxy variable is also much lower for the HMO plan. HP

has an average health status proxy value of 1.4. The two PPO plans average health status

proxy values are both well over 2. The PC2 plan has the highest enrollment of females at 73

percent. The HMO plan enrolls the lowest percentage of females at 59 percent. While these

differences in demographic characteristics do not prove adverse selection on unobservable

illness levels, they offer evidence that plans do vary in observable characteristics.

16

Table 2: Summary Statistics by Plan

Plan Variable Mean s.d. p25 p75 N

HP Total health spending $2,845 $7,459 $306 $2,710 12,003Total reimbursement $2,659 $7,139 $267 $2,470 12,003

Salary $42,972 $21,180 $30,659 $50,000 12,003Age 41.7 12.0 31.0 52.0 12,003

Health status proxy 1.4 2.8 0.2 1.4 12,003Prob. high cost next year 0.08 0.09 0.04 0.09 12,003

Indicator for female 0.59 0.49 0.00 1.00 12,003

PC2 Total health spending $3,964 $7,559 $774 $4,402 1,340Total reimbursement $3,657 $7,555 $643 $3,972 1,340

Salary $45,946 $21,533 $32,802 $52,008 1,340Age 46.4 11.5 38.0 55.0 1,340



PC3 Total health spending $4,700 $9,243 $864 $5,023 1,292Total reimbursement $4,449 $9,151 $758 $4,639 1,292

Salary $50,357 $26,795 $33,615 $57,811 1,292Age 47.0 10.7 39.0 55.0 1,292



Total Total health spending $3,111 $7,664 $374 $3,029 14,635Total reimbursement $2,909 $7,397 $328 $2,734 14,635

Salary $43,896 $21,874 $31,210 $50,898 14,635Age 42.6 12.0 32.0 52.0 14,635



17

Table 3 displays the health status proxy summary statistics for each year. The average

value of the health status proxy over all plans is stable across the three years in the sample

at approximately 1.6. The other descriptors of the distribution are also stable from year

to year with an average standard deviation of 2.96 and the same values for the 25th and

the 75th percentile in each year. We use this stability across years in the underlying health

status to construct our estimator, which is described in the next section.

Table 3: Health Status Proxy by Year

Year Mean s.d. p25 p75 N2002 1.64 3.12 0.20 1.49 5,0382003 1.60 2.81 0.20 1.49 5,1342004 1.57 2.92 0.20 1.49 4,463Total 1.60 2.96 0.20 1.49 14,635

4 Estimation

We use a two-step semiparametric estimation strategy to recover the underlying parameters

of the consumer’s utility function and the distribution of latent health status. An individual

observation is a full-year single coverage enrollee i, in a plan j, for a given year t. Equation

2.2 specifies the optimal decision rule for an individual with the latent health status θi

which can be transformed in to Equation 2.3. We estimate an empirical version of the latter

equation. Components which come directly from the data are health care spending mijt,

income yit, and premiums pjt. Health care spending and income are unique to an individual

i in plan j in a given year t. Premiums are the same for all individuals in a plan j in a given

year t. Components which are estimated from the data are the conditional reimbursement

distribution fjt(ajt|mjt), its derivative∂fjt(ajt|mjt)

∂mjt, and risk parameters γ1, γ2. The estimated

version of Equation 2.3 is as follows:

θi =I

I − (1−γ2)

mγ2ijt

(4.1)

18

where

I =

∫[−(1− γ1)(1− ajt)(yit − pjt −mijt(1− ajt))−γ1 fjt(ajt|mjt)

+ (yit − pjt −mijt(1− ajt))1−γ1 ∂fjt(ajt|mjt)

∂mjt

]dajt (4.2)

Estimation of Equation 4.1 proceeds in three steps:

1. Estimate conditional reimbursement distributions fjt(ajt|mjt) and∂fjt(ajt|mjt)

∂mjt.

2. Given fjt(ajt|mjt) and∂fjt(ajt|mjt)

∂mjt, substitute the observed yit, pjt, mijt and ajt and find

θi in terms of I and γ1, γ2.

3. Estimate utility parameters γ1, γ2 using GMM.

Intuitively, the unobserved health status θ is the analogue of the unobserved error term

in a structural linear demand equation. Given that we observe data on income, insurance

premium, the reimbursement rate, and the medical expenditures, the identifying assumption

that the unconditional distribution of the health status does not vary across years allows

us to recover utility parameters and the distribution of the latent health status. The effec-

tive instruments, which are the year dummies, are associated with exogenous shifts in the

distribution of income and the relative price of medical care that are uncorrelated with the

health shock. Thus intuitively we can hold the health status “fixed” while the relative prices

are shifted by the instruments, which leads to co-movement in medical expenditures and

consumption. Such co-movements allow us to identify the utility parameters. Once we have

estimated the utility parameters, the distribution of health status can be backed out from

the first order condition. The difference between our method and the conventional two-stage

least squares estimator is that instead of relying on a reduced-form specification of a linear

functional form for the demand equation, we use the optimality condition for a risk averse

consumer to derive the functional form of the demand equation.

19

This estimated distribution of health status will then be used to identify adverse selec-

tion and moral hazard. Adverse selection identification comes from how the distribution is

sorted into plans. Moral hazard estimation will come from a counterfactual choice of health

expenditures, in the absence of insurance, of an individual given his health status estimate

and the derived functional form of the demand equation.

4.1 Estimating Conditional Reimbursement Distributions

In the first step, we nonparametrically estimate the conditional reimbursement distributions

for each plan in each year. First, the ex-post realized reimbursement is calculated for each

enrollee as a percentage of total health spending. The nonparametric kernel estimation

uses the realized reimbursements to calculate the joint probability of a given reimbursement

percentage and a given level of health spending, fjt(ajt,mmt).10 To estimate the conditional

distribution, we then nonparametrically calculate the health spending distribution fjt(mjt).

The estimated conditional distribution fjt(ajt|mjt) is then the ratio of the joint distribution

of reimbursement and spending, and the distribution of spending. That is,

fjt(ajt|mjt) =fjt(ajt,mjt)

fjt(mjt).

Figure 1 reveals some of the uncertainty between levels of spending and reimbursement

levels in our data. Each scatter plot maps patients’ spending levels for a given year in a given

plan, during which reimbursement policies of the plan were fixed, with the reimbursement

that the patient finally received. There is substantial variation between expenditure and re-

imbursement, which would lead to uncertainty when a patient chooses a level of expenditure.

The conditional distribution is calculated for a grid of reimbursement percentages, ajt,

and health expenditure categories.11 The conditional distributions fjt(ajt|mjt) are displayed

by plan for each year in Figure 2, Figure 3, and Figure 4. Across all plans and years, the

10The bandwidth is the optimal bandwidth rule of thumb suggested by Bowman and Azzalini (1997).11These results use a 128x128 grid of ajs and mts. Both more and fewer categories produced very little

change in the resulting estimates.

20

Figure 1: Reimbursement and Spending Levels

lowest reimbursement percentages occur at the lowest level of health spending, and vice versa.

To illustrate the diagrams, Figure 4 shows that PC3 in 2002 has a positive probability of

zero percent reimbursement across all expenditure ranges because the reimbursement curve

is flush with the lower axis. Figure 2 shows that the HP plan has the lowest out-of-pocket

cost schedule; the conditional probabilities corresponding to over 90 percent reimbursement

are higher in HP than the other two plans. In our last step, we obtain∂fjt(ajt|mjt)

∂mjtby applying

an approximate derivative to all grid points of the conditional distribution. These estimation

results use an adaptive Simpson quadrature.

4.2 Estimating the Latent Health Status Distribution

We use the estimated conditional reimbursement probabilities to estimate the latent health

status distribution. The underlying identification assumption is that the aggregate distri-

bution of latent health status of consumers enrolled in the three plans is the same for the

21

Figure 2: Conditional Reimbursement Distributions, HP Plan 2002-2004

22

Figure 3: Conditional Reimbursement Distributions, PC2 Plan 2002-2004

23

Figure 4: Conditional Reimbursement Distributions, PC3 Plan 2002-2004

24

three years. This is a reasonable assumption for the short range of years, 2002-2004. It is

also supported by the proxy health status variable constructed in the data from diagnosis

codes. This health proxy status variable has a similar distribution across all three years, as

reported in Table 3. Although we assume that the entire distribution of latent health status

remains constant, it is possible that individual realizations of θi may be different over this

time period, and there may also be changes in plan variables such as premiums and cost

sharing levels. Alternatively put, our identifying assumption is that there was no change in

the health distribution of the population (e.g., due to an epidemic or medical innovation) in

our sample over the three year period.

For each individual enrollee, θi is recovered using Equation 4.1. Given the estimated

conditional reimbursement probabilities and the data on spending and premiums, θi can be

recovered in terms of the remaining parameters to be estimated, γ1 and γ2.

4.3 Solving for Utility Parameters

For each single-coverage full year enrollee i in a given plan j and a given year t, we now

have an expression for the enrollee’s latent health status in terms of the utility parameters

of the enrollee’s maximization problem. We estimate the utility parameters using a GMM

framework. Based on our identifying assumption, we estimate the model requiring the latent

health status distributions to be equal to each other across years. Health expenditures

often have a small number of extreme outliers, so, since we are matching the shape of the

distribution, we trim the distribution. The nonparametric approach benefits from trimming

the tails of the distribution and the data points that are dropped are not needed for a

consistent estimator. First, we drop enrollees with zero yearly expenditures. This amounts

to approximately 12 percent of the HP sample in each year, 7 percent of the PC2 sample,

and 4 percent of the PC3 sample. Second, we drop enrollees whose expenditures fell in the

top 20 percent of the entire sample of expenditures over all three plans. Finally, a very small

number of enrollees, less than 1 percent in each year, were dropped if their out-of-pocket

25

expenditures were larger than their salaries. Table 4 displays the resulting estimation sample.

Over 10,000 employer-year observations remain in the estimation sample.

Table 4: Estimation Sample Size

Plan 2002 2003 2004 TotalHP 2,690 2,791 2,631 8,112PC2 351 341 295 987PC3 392 392 174 967Total 3,433 3,533 3,100 10,066

The GMM estimation procedure to recover utility parameters is implemented as fol-

lows. We explicitly describe the estimation procedure for the first moment, i.e., the mean.

The other moments are constructed similarly.12 The latent health status of each individ-

ual enrollee i, θi, is as specified in Equation 4.1. Once we substitute the data on observed

health expenditure, income, insurance plan premium, and reimbursement probabilities, i.e.,

wijt = (mijt, yit, pjt, fjt), each individual i has an expression for her latent health status solely

in terms of the utility parameters, γ. Equation 4.3 displays the mean of the health status

distribution for the years 2002 and 2003, i.e., µθ02 , where N02 is the number of enrollees in

2002, as well as the corresponding definition of mean in 2003, µθ03 .

µθ02(γ) =

N02∑i=1

1

N02

θi(γ), µθ03(γ) =

N03∑i=1

1

N03

θi(γ) (4.3)

Let h1(w, γ0) denote the first moment condition for our sample. Statistical inference

using the GMM estimator is based on the property that E[h1(w, γ0)] = 0. For each of the

four distribution moments we have 3 moment conditions in our sample. These are the 3

pairs generated by 3 years of data. The 3 sample moment conditions associated with the

distribution mean are:

12In recovering the utility parameters we match the distributions using the first four moments of mean,variance, kurtosis, and skewness within each year.

26

h1(w, γ) =(µθ02(γ)− µθ03(γ)

)2

h2(w, γ) =(µθ03(γ)− µθ04(γ)

)2

h3(w, γ) =(µθ02(γ)− µθ04(γ)

)2

Using the other 3 distribution moments we generate 9 additional sample moment condi-

tions for a total of 12 sample moment conditions. The GMM estimator minimizes the sum

of all 12 sample moment conditions. This minimum value is found through a grid search

over the possible values of the two utility parameters. Grids of varying size were used for

γ1 ∈ [1, 6] and γ2 ∈ [1, 6]. We substituted each combination of grid values for γ1 and γ2 into

the sample moments and found the resulting sum of squared differences for that combina-

tion. The optimal γ that was chosen is the grid combination of γ1 and γ2 with the sum of

squared differences that is the closest to zero.

5 Asymmetric Information Results and Analysis

5.1 Test of Identifying Assumptions

The number of GMM conditions, four distribution moments times 3 sample years, is greater

than the dimension of the risk parameter vector. Because the model is overidentified, this

allows us to test our identifying assumptions. If the assumption of equal θ distributions

between all years is valid, then using only two years instead of all three years should yield

the same predictions. We perform a test by estimating parameters using a subset of data for

only two years and then using the estimated parameters from this subset to construct the

health status distribution for all three years. If resulting health status distributions for all

three years are statistically similar, even the year that was not included in estimating the

parameters, this lends strong support to our identifying assumption and we conclude our

GMM model fits the data well.

27

We perform this test by treating each of the three years as a holdout year in turn. For

example, we estimate risk parameters, γ based on data from 2002 and 2003, and then use

the estimated γs to construct the in-sample “estimated” health shocks θ02 and θ03. The γs

estimated using 2002 and 2003 data are applied to the 2004 data to construct a “predicted”

θ04 for the holdout sample. We then perform a Kolmogorov-Smirnov test for equality of

distributions between each in-sample “estimated” θ distribution versus the holdout sample

“predicted” θ distribution. We repeat this exercise holding out data from 2003, and then

again holding out data from 2002.

Table 5 lists the results for each of these tests. The bottom row lists results for the

test described above, where 2004 is the holdout year. The K-S statistics in the bottom row

demonstrate the “estimated” distributions, θ02 and θ03, cannot be statistically distinguished

from the predicted θ02 distribution. That is, the K-S test for equality of distributions is

not rejected. The test holding out all 2003 data also finds that the resulting estimated

and predicted distributions are not statistically distinct. However, the K-S test between the

predicted 2002 health shock distribution based on the 2003 and 2004 data rejects the equality

of distributions between the predicted year and the estimated years. It should be noted that

our sample size is very large – each year is over 3,000 observations – hence the statistical

precision of these tests is quite high. It may be possible that, although the predicted health

distribution for the holdout year 2002 is statistically different from the in-sample estimated

2003 and 2004 distributions, their economic significance may not be very different. Figure

5 displays the predicted 2002 vs. estimated 2003 and 2004 health shock distributions in

histograms. In spite of K-S statistical rejection when comparing the three histograms, the

general shape of all three distributions remains very similar. Hence, we conclude that our

assumption of a constant health status distribution seems plausible.

28

Figure 5: Predicted 2002 vs. Estimated 2003, 2004 Health Shock Distributions

29

Table 5: Predicted vs. Estimated Health Shock Distributions

H0: X1 and X2 are from the same continuous distribution.HA: X1 and X2 are from different distributions.

(X1) Predicted Year (X2) Estimated Year 1 Estimated Year 22002 0.0343 Reject 0.0404 Reject2003 0.0262 Do Not Reject 0.0236 Do Not Reject2004 0.0313 Do Not Reject 0.0275 Do Not Reject

In each year, left column is the K-S Statistic and right column is the result.

5.2 Utility Parameter Estimates

We estimate coefficients of relative risk aversion for aggregate consumption (γ1) and health

(γ2) respectively. Table 6 displays values of γ using increasingly fine grids over the parameter

space. The standard errors were computed using the method described in the Appendix.

Table 6: Estimated Risk Coefficients γ1, γ2

Grid Size γ1 γ2

20 1.98 3.27(0.76) (1.20)

40 1.88 3.12(0.86) (1.35)

50 1.93 3.23(0.86) (1.35)

Standard errors in parentheses.

Estimates from 200 bootstrap iterations.

The resulting risk coefficients are in the range of [1.88, 1.98] for γ1 and in the range of [3.12,

3.27] for γ2. Higher values in the γ1 range tend to be associated with correspondingly higher

values of γ2. This result implies that individuals are more risk averse with respect to health

status than to the aggregate consumption commodity. The aggregate consumption estimates

are within the range found in the literature on consumption (see Shea 1995, Gourinchas and

Parker 2002). Risk parameters between 1 and 2 have been found to approximate empirical

data in previous literature on risk in consumption versus investment (Prescott 1986). Our

estimated health risk coefficients are slightly higher than the risk coefficients for aggregate

30

consumption. This is consistent with the notion that people are more risk averse with respect

to their health as it often cannot be regained once lost.

5.3 Moral Hazard

Consumers’ health care consumption behavior, or “excess” consumption, has been cited as

a factor in rising health care costs. In the context of employer-sponsored insurance, the

consumer chooses her plan for a given year, and then chooses her health care consumption

throughout the year based on the plan contract. The concept of moral hazard we develop and

compute addresses the counterfactual difference in health expenditures due to the presence of

this insurance contract, where a consumer’s out-of-pocket expense is not matched to the full

cost of providing the care. This approach incorporates the insights of sequential contracting,

where an agent selects her best action within a previously chosen contract (See Courty and

Li 2000, Dai, Lewis and Lopomo 2006).13

This measure follows previous health economics work which defines moral hazard as

the change in a consumer’s choices that was induced by lower costs of health care through

insurance. Our measurement approach, however, focuses more on consumer behavior than

some of the previous empirical literature, such as Feldstein (1973) and Feldman and Dowd

(1991). The first advantage of our measure of moral hazard is that it allows for considerable

nonlinearities in the change in behaviors with respect to changes in reimbursement policy,

e.g., consumer responses to nonlinear changes in reimbursement policies due to deductibles.

The second is that the consumer’s choice is income neutral.

Our measure of moral hazard is based on a counterfactual which allocates the consumer

the same resources previously consumed in both health care and aggregate consumption, but

now allows the consumer to choose the allocation. A more precise mathematical description

is provided below but we first describe the counterfactual graphically in Figure 6. The figure

displays a consumer’s budget constraint and indifference curves in both the observed (1)

13Note that the conventional measure of moral hazard in the health economics literature differs from thatin contract theory, where this term is reserved for situations in which agent’s behavior cannot be directlyobserved by the agent, i.e., “hidden actions.”

31

and counterfactual scenarios (2). The horizontal axis is health care expenditures, m, and

the vertical axis is composite good consumption, c. The line BC1, shows the consumer’s

budget constraint observed in the data. The flatter slope of BC1 reflects the fact that

health expenditures are subsidized through insurance, so the same income buys more health

expenditure compared to the composite good. The intersection of BC1 and the indifference

curve IC1, shows the resulting optimal consumption bundle (m1, c1) of health expenditures

and composite good consumption observed in the data.

In the counterfactual scenario, we remove subsidized consumption of health care through

insurance. The relative prices of health care and the composite good are each set to one, so an

additional dollar of income buys the same additional amount of either good. The consumer is

given a lump sum income transfer equal to the amount observed in the data that insurance

paid for her health care. This ensures the original consumption bundle (m1, c1) remains

affordable to the consumer. This is shown in Figure 6 where BC2 runs through the original

consumption bundle (m1, c1). The slope of BC2 shows a one-to-one tradeoff between health

expenditures and composite good consumption.

We then examine how consumption of health expenditures and the composite good might

adjust without the price differential caused by insurance coverage. Although the consumer

can choose to consume the same bundle, in this scenario, she may also choose a different

allocation of her total budget between health care and the composite good that corresponds

to a higher indifference curve. The extent of moral hazard is captured by the subsequent

change in the optimal consumption bundle, the tangency of BC2 and IC2. In the figure, the

consumer reaches a higher indifference curve in the counterfactual allocation by consuming a

greater amount of composite good and reducing her health care expenditures at (m2, c2). The

measure of overconsumption of health care is the difference between the two consumption

bundles, m1 −m2.

To calculate the extent of overconsumption, we first calculate the insurer’s total health

care spending for each consumer in each year. Denote this observed amount as T = aj ·m1.

32

Composite good

Health expenditure

IC1

IC2

BC1 BC2

c2

c1

m2 m1

Figure 6: Moral Hazard Counterfactual

The counterfactual budget constraint is then:

c2 +m2 ≤ y − pj + T

where T represents the lump sum transfer to an individual consumer. The variables c2 and

m2 are the composite good consumption and health expenditure choices in the counterfactual

environment.

Next, we calculate the distribution of the health shocks, θ, in each year using the data

and the estimated utility parameters from the most detailed grid search of [γ1, γ2]. The

consumer’s utility maximization problem is then:

U(m2, pj, y, θ; γ) = (1− θ)(y − pj −m2 + T )1−γ1

1− γ1

+ θm1−γ2

2

1− γ2

(5.1)

The consumer’s maximization problem sets her marginal rate of substitution between

aggregate good consumption and health care expenditure equal to the ratio of prices. The

consumer no longer faces a reimbursement schedule aj, but receives a lump sum transfer of

T and pays for all care out-of-pocket. The ratio of the price of health care to the price of

the composite good is simply equal to one. In our estimation, we calculate the MRS:

MRS =θ

(1− θ)(y − pj + T −m2)γ1

mγ22

33

Since [θ, y, pj, T, γ1, γ2] are all known, we then solve for the value of m2 where MRS = 1.

The difference between the counterfactual m2 and the actual observed health expenditure

m1 is a measure of the amount that a consumer is overconsuming as a result of the change

in the price ratio. This is because m2 is the amount a consumer chooses if forced to pay the

full cost of care, given the same realized net income as when insured. In the counterfactual

scenario, consumers are no longer restricted to spend their reimbursement income, T , on

health expenditure. If the consumer chooses to spend part of the unrestricted income T on

consumption of the aggregate good, then this reveals how purchasing health care through

insurance influences her behavior. In the estimation results, we will refer to m2 as first-best

consumption.

We find that the magnitude of overconsumption in our data is substantial. For each

year, we calculate the differences for each consumer between the observed amount of health

expenditures and the first best level of expenditures, as described above. Figure 7 shows the

resulting distributions of overconsumption estimates for each of the years 2002-2004. The

horizontal axis is the difference between observed expenditure and first best expenditure. The

vertical axis is a count of the number of consumers. The general shape of the distribution

is similar for each of the three years. Each year shows approximately half of the consumers

have overconsumption between $0 and $500, after which the distribution drops rapidly with

a long right tail essentially finishing after $3,000 in overconsumption.

Table 7 displays summary statistics for the estimated overconsumption. The three right-

most columns display percentiles of the overconsumption for the three years. The 25th

percentile of overconsumption was approximately $190 in 2003 and 2004, and $177 in 2002.

The median level of overconsumption was near $400 in the three years, and the 75th per-

centile was over $700 in 2002, and over $900 in both 2003 and 2004.

Summing overconsumption over all consumers, the total yearly estimate of overconsump-

tion across plans is large. The sum of all consumers’ differences between observed and first

best is $2,125,480 in 2002. The sum is $2,504,380 in 2003 and is $2,270,059 in 2004. To

34

Figure 7: Estimated Overconsumption, 2002-2004

35

Table 7: Summary Statistics on Overconsumption

Year N Percentilep25 p50 p75

2002 3,433 $177 $392 $7842003 3,533 $193 $437 $9352004 3,100 $190 $446 $984

put individual overconsumption in perspective, Table 8 list summary statistics by year for

overconsumption as a percentage of a consumer’s original health care expenditure. On av-

erage, a consumer’s overconsumption was 45 percent of the consumer’s original health care

expenditure. The median overconsumption percentage is also over 40 percent of the original

choice of health expenditure, with a standard deviation of 10 percentage points or less in all

years.

Table 8: Overconsumption as Percentage of Original Health Care Expenditure

Year Mean Median s.d. N2002 45.14 42.09 8.83 3,4332003 46.22 42.14 8.85 3,5332004 46.22 42.49 10.04 3,100

To place our results on moral hazard in the context of previous literature, we must be

clear on our definition of moral hazard. We measure the consumer’s health expenditure

choice in response to the availability of health insurance once an illness has occurred. Our

approach also incorporates an aspect left out of many previous estimates by making the

original consumption bundle available to the consumer in our counterfactual. For this reason,

our counterfactual is able to determine the consumer’s choice when there is no price subsidy

for health care, but with a budget constraint that allows the consumer’s utility to remain

the same. Our results show that, given the same net observed income, consumers consume

over 40 percent less health care when they pay out-of-pocket for all health care.

Much of the literature on moral hazard takes advantage of natural experiments of a

36

change in the consumer’s out-of-pocket price to determine the change in the level of health

expenditures. These natural experiments feature a change in consumers’ out-of-pocket cost,

without an accompanying income adjustment. Scheffler (1984) used a pre-post approach

around a natural experiment introducing 40 percent cost sharing to consumers and found

a 38 percent reduction in health expenditures. This result appears to imply a higher level

of moral hazard than in our counterfactual, because Scheffler (1984)’s expenditures dropped

by nearly the same percentage in response to a much less severe change in price. However,

recall that this natural experiment does not control for income effects, which may explain

Scheffler’s more severe drop in expenditures.

Several important moral hazard estimates come from the RAND Health Insurance Exper-

iment (HIE). This was a large government-sponsored experiment that randomized consumers

into health insurance plans with different levels of coinsurance. HIE estimates are able to

isolate moral hazard effects because the randomization of participants into plans removes

confounding adverse selection concerns. Although the HIE did give lump sum payments

to cover worst-case scenarios from participating, the HIE estimates still do not attain the

income neutrality of our estimates. Manning et al. (1987) found that expenditures fell by 15

percent given an increase from 0 percent to 25 percent coinsurance. However, this estimate

does not take into account the deductibles in the HIE plans. Keeler and Rolph (1988) con-

ducted a subsequent study recognizing nonlinear reimbursement, and estimated that going

from no insurance to full coverage results in a 50 percent increase in expenditures. This

estimate is closer to our finding where dropping from substantial reimbursement to full out-

of-pocket expenditures reduced health expenditure by 40 percent. The income neutrality of

our counterfactual may explain why consumers in our sample reduce expenditures by slightly

less than that estimated by Keeler and Rolph. Our results suggest that failing to control for

income effects may overstate the level of moral hazard.

37

5.4 Adverse Selection

Adverse selection is a substantial concern in the health insurance literature. In this section,

we propose a distribution-free test for adverse selection. The presence of adverse selection

causes consumers to sort across different plans based on their latent health status (e.g.

Rothschild and Stiglitz 1976). In our framework, this would imply that the distribution of

the latent health status variable varies across health plans. To determine if the latent health

status distribution is different across the three plans, we first compute the yearly distribution

of latent health status using the estimated parameters. Figure 8 displays the distribution

of health status, θ, for each of the years 2002-2004. Health status is on the horizontal axis,

i.e., between 0 and 1. On the vertical axis is the number of consumers at each point in the

interval. The distribution appears to be bimodal – many consumers in each year had very

low values of θ, but there is a small clustering at the value of 0.8. This is a typical health

distribution for a large general population – many healthy individuals but a small number

of very sick individuals at the extreme tail of the distribution.

To place our estimated latent health status measure in the context of observable charac-

teristics, we examine the relationship between our estimated latent health status and various

consumer characteristics in our data. Table 9 reports the results of two regressions on log

of the estimated latent health status. Regression 1 includes age, an indicator variable equal

to 1 for female, log salary, and dummies for Plan 2 and Plan 3 with the omitted category

being the HP Plan. Regression 2 also includes the previous variables plus the health status

proxy variable constructed from diagnoses. The signs of the resulting coefficients generally

plausible. Age has a positive relationship with log health status in Regression 1. A ten year

increase in age is associated with a 0.14 percent increase in the magnitude of our estimated

latent health status. Individuals with higher salaries are associated with lower levels of θ,

or better latent health status, in both regressions. A negative relationship between income

and illness is well-documented in both the economics and medical literatures, such as Smith

(1999), Ettner (1996), and Deaton and Paxton (1998). Both Plan 2 and Plan 3 have higher

38

Figure 8: Health status Distribution, 2002-2004

39

predicted average values of the estimated health status θ. Plan 3 has a slightly higher pre-

dicted contribution to health status, at 0.571 more than the HP plan average, compared

with the 0.504 from Plan 2. Age is no longer significant when including in Regression 2 the

observed health status proxy – the variable provided in the data using weighted diagnoses

codes. Recall, the observed health status proxy is not used in estimating the model param-

eters. A one percent increase in the health status proxy variable is associated with a 1.23

percent increase in the value of our estimated latent health status. Beyond the positive re-

lationship between our estimated latent health status and the observed health status proxy,

these regressions demonstrate that our latent health status estimates are able to capture a

more general measure of health status than limited diagnosis codes.

Table 9: Estimated Health status Regression

Dependent variable: ln(health status, θ)Covariate Regression 1 Regression 2Age 0.014 0.002

(0.002) (0.002)Indicator =1 if female 1.120 0.854

(0.049) (0.040)ln(salary) -0.851 -0.721

(0.059) (0.048)Plan 2 indicator 0.504 -0.095

(0.080) (0.065)Plane 3 indicator 0.571 -0.064

(0.081) (0.066)ln(health status proxy) 1.237

(0.017)Constant 5.756 5.793

(0.604) (0.484)R-squared 0.087 0.412N 10,066 10,066

Standard errors in parentheses.

Table 10 estimates the relationship between various consumer characteristics and plan

type using a multinomial logit specification. It reports the marginal effects of the regressors.

The first logit specification includes only demographic characteristics, the others contain an

40

indicator for health: specification 2 includes our estimated value of θ, specification 3 includes

the data’s health proxy, and the fourth specification includes both. The marginal effects of

age, female, and salary are negative for the HP Plan. This means that the HP plan is more

likely to have younger, male enrollees, which is indicative of advantageous selection, as we

found above. The corresponding marginal effects for the PC2 and PC3 plans are positive,

more likely to have adverse selection on the observable characteristics in the data. In the

specifications including health indicators, the HP plan has negative marginal effects on all

health variables, where lower values of the health variable indicates healthier patients. The

PC2 and PC3 plans have positive marginal effects for the health indicators, supporting our

finding that these plans have patients with more severe health shocks.

To test for sorting among plans, we break down the estimated yearly latent health status

distribution by plan to examine whether individuals with a more severe latent health status

appear to self-select into more generous plans. Figure 9 shows the distribution of latent

health status over all years in each of the three plans. The horizontal axis is the 0 to 1 range

of the θ, and the vertical axis is the fraction of consumers in each plan corresponding to the θ

value. A simple visual analysis of the HMO plan, HP, versus the PPO plans, PC2 and PC3,

shows that the HMO plan has a much larger fraction of very healthy individuals – those

with θ near zero. Over 20 percent of HP consumers had a θ value less than 0.05, compared

to the PPO plans with approximately 15 percent of consumers in this range. Both PC2 and

PC3 show a larger clustering than the HMO plan around the health status value of 0.8.

In adverse selection models, consumers strategically sort across plans. The sickest con-

sumers will choose the plan with the most generous benefit structure that facilitates their

care. If adverse selection is present, we would expect the distribution of latent health sta-

tus severity to vary by plan, since higher θ consumers will choose a different type of plan

than lower θ consumers. If no adverse selection is present, we would expect the distribution

of latent health status to be the same across different plans. To formally test for adverse

selection among the plans, we perform a hypothesis test. The null hypothesis is that the

41

Table 10: Multinomial Plan Logit on Observables

Variable 1 2 3 4

HP Plan Age -0.004*** -0.004*** -0.003*** -0.004***(0.000) (0.000) (0.000) (0.000)

Indicator = 1 if female -0.035*** -0.055*** -0.018*** -0.057***(0.004) (0.008) (0.004) (0.008)

ln(salary) -0.032*** -0.090*** -0.026*** -0.074***(0.003) (0.010) (0.003) (0.010)

Estimated θ -0.177*** -0.078***(0.012) (0.015)

ln(health status proxy) -0.025*** -0.046***(0.001) (0.004)

PC2 Plan Age 0.002*** 0.002*** 0.001*** 0.002***(0.000) (0.000) (0.000) (0.000)

Indicator = 1 if female 0.025*** 0.043*** 0.017*** 0.044***(0.003) (0.006) (0.003) (0.006)

ln(salary) 0.017*** 0.028*** 0.014*** 0.021***(0.002) (0.007) (0.002) (0.007)

Estimated θ 0.081*** 0.029***(0.009) (0.011)

ln(health status proxy) 0.011*** 0.024***(0.001) (0.003)

PC3 Plan Age 0.002*** 0.002*** 0.002*** 0.002***(0.000) (0.000) (0.000) (0.000)

Indicator = 1 if female 0.009*** 0.012** 0.001 0.013**(0.002) (0.006) (0.002) (0.006)

ln(salary) 0.014*** 0.061*** 0.011*** 0.054***(0.002) (0.007) (0.002) (0.007)

Estimated θ 0.096*** 0.049***(0.009) (0.011)

ln(health status proxy) 0.013*** 0.022***(0.001) (0.003)

Marginal effects reported. Standard errors in parentheses.

***, ** indicates significant at the 1,5 percent level.

42

Figure 9: Health Status Distribution, by plan

43

estimated latent health distribution within one plan looks similar to the distribution of the

other plan. The alternative hypothesis then, is that the distributions were actually drawn

from different populations, so that one plan has a sicker population than the other. We use

K-S test statistics to compare the distributions between plans.

Comparing the HMO and PPO plans, Table 12 reports the K-S statistics testing the null

hypothesis that both plans’ latent health status distributions come from the same continuous

distribution. The three possible combinations of plan comparisons (i.e., HP and PC2, HP

and PC3, PC2 and PC3) are presented. The null hypothesis is rejected between the HMO

plan and a PPO plan in all three years. These results are significant at the 1 percent level.

The insignificance of the result between PC2 and PC3 in 2002 and 2004 may be expected

because the pricing structure of the two plans is the same, with the only difference being

the network providers. These results confirm the presence of adverse selection within our

sample, showing that the consumer population is indeed different across the three plans.

Table 11: K-S Test Statistics, Inequality


Plan (X1 and X2) 2002 2003 2004HP and PC2 0.1948 Reject 0.1778 Reject 0.2347 RejectHP and PC3 0.1638 Reject 0.3036 Reject 0.1864 RejectPC2 and PC3 0.0459 Do Not Reject 0.1819 Reject 0.1050 Do Not Reject


Table 12: K-S Test Statistics, Inequality




Given that Table 12 establishes that the distribution of estimated latent health status

44

is not the same across plans, we next examine which plans seem to contain the higher tail

of the distribution. The K-S test can be used to check if one plan’s cdf is greater than

another’s, implying that the plan with the greater cdf has an estimated latent health status

distribution which contains more consumers at the lower (healthier) end of the distribution.

Table 13 looks in greater detail at the latent health status distribution between plans.

The tradeoff for consumers between HMO plans and PPO plans is that PPO plans provide

greater flexibility in provider choice in exchange for more cost sharing by consumers. The

HMO restricts provider choice the most tightly, but has the lowest premium and cost sharing

structure of all the plan choices. Consumers with specific health concerns likely have the

greatest preference for flexibility in provider choice. In contrast, healthy consumers who do

not anticipate using much care will choose the lowest cost plan. Relatively healthy consumers

who do not expect to incur more than a yearly checkup may also be satisfied with the more

limited network of providers in the HMO plan. A more severe health status enters utility as

a larger θ. Thus, if a consumer expects to be relatively healthy, a low value of θ, she will

choose the cheapest option – the HMO plan. If a consumer expects to be in more severe

health, a larger θ, she may select into the PC3 plan for the most flexibility. Therefore, the

cdf of the plan with lower values of θ (better health status) should stochastically dominate

the cdf of a plan with greater values of θ (worse health status).

The first two rows of Table 13 report results of the K-S comparison test between the HMO

and PPO plans’ cdfs, which support the above hypotheses. When comparing the HMO plan,

HP, with either PPO plan, PC2 or PC3, the null hypothesis of equality of distributions is

rejected in favor HP, meaning that both PC2 and PC3 are stochastically dominated. This

implies that the portion of relatively healthy consumers is larger in the HP plan than in the

PC2 or PC3 plan.

Within the tiered PPO plans, PC2 and PC3 differ only in the types of facilities that

consumers may visit. PC3 providers include the PC2 providers plus additional providers.

Once again, consumers with more complicated health conditions, a larger value of θ, may

45

select into the PC3 plan to take advantage of the additional available providers. In 2003,

the hypothesis of equality of distributions between PC2 and PC3 is rejected in favor of PC2

having a larger cdf, meaning the distribution of consumers is healthier in the PC2 plan.

However, this hypothesis cannot be rejected for the other two years, 2002 and 2004. This

inability to reject may be due to the fact that, despite the difference in provider choice, these

two plans do have the same cost structure.

Table 13: K-S Test Statistics, Larger

H0: X1 and X2 are from the same continuous distribution.HA: X1’s cdf is greater than X2’s cdf.



Finally, to check the validity of our tests for adverse selection, we check the same hy-

potheses of equality in distributions, but within a plan across years instead of across plans.

If distributions within a plan are equal across years, this supports the persistence of adverse

selection. Additionally, equality of distributions within a plan across years means the plan’s

population is relatively stable, and that the differences between plans in the previous tests

can be attributed to sorting. The results are displayed in Table 14. In general, the null hy-

pothesis cannot be rejected that a plan’s latent health status distribution is different across

years. The two exceptions are the HP plan between 2003-2004 and the PC3 plan between

2002-2003. These results support the assertion that the differences across plans in Table

13 are due to fundamental distribution differences that persist across years. Overall, the

K-S tests show that the underlying distributions of health status are different between the

different types of plans offered to enrollees, and that these differences persist over several

years of testing.

Our results demonstrate adverse selection by testing for differences between three employer-

46

Table 14: K-S Test Statistics, Within plans


(X1 and X2) HP PC2 PC32002 and 2003 0.0212 Do Not Reject 0.0645 Do Not Reject 0.1795 Reject2003 and 2004 0.0281 Do Not Reject 0.1098 Do Not Reject 0.1495 Reject2002 and 2004 0.0332 Do Not Reject 0.0984 Do Not Reject 0.1035 Do Not Reject


sponsored plans’ latent health status distributions. We find that the HMO attracts healthier

consumers compared with two PPO options, and within the PPO options, healthier con-

sumers sort into the more provider-restrictive tiers. Although the empirical literature on

adverse selection is mixed, these results fall solidly in support of the existence of adverse

selection in health insurance.

Our results on HMO sorting are similar to results of Juba, Lave, and Shaddy (1980),

who find, in logit estimates, that lower self-reported health status decreases the chance of

selecting HMO enrollment. When comparing cost and utilization rates, Jackson-Breeck and

Keinman (1983) find that HMO enrollees averaged 53 percent fewer inpatient days before

joining the HMO than those who stayed in a fee-for-service plan. Cutler and Reber (1998)

also find plans with low costs but restrictive provider choice attract healthier consumers. Our

results build on these earlier papers because this paper’s approach uses a structural model,

which enables the counterfactual experiment in moral hazard to complement the findings on

adverse selection.

The sorting we find within the PPO plans, between cost tiers, is similar to Marquis

and Phelps (1987). Using probit estimation for the purchase of supplementary insurance,

Marquis and Phelps found that families in the highest quartile of illness had a 42 percent

greater probability of purchasing supplementary insurance compared with those in the lowest

quartile. Our results show that the consumers in the highest quartiles of illness sort into

the PPO plan with the greatest provider coverage, essentially supplementing insurance. In

47

a similar finding across years, Ellis (1985) finds that age and a worse health status in the

previous year are associated with purchasing more generous coverage in the next year.

Within structural approaches to measuring adverse selection, Cardon and Hendel (2001)

do not find evidence of informational asymmetries. After accounting for observables, the

authors find no link between health care consumption and insurance choices that can be

attributed to unobservable characteristics. An advantage of our structural approach is that it

matches insurance structures more closely. Although Cardon and Hendel do explicitly model

deductibles, our semiparametric approach is more flexible given the complicated nature of

insurance coverage that includes not only deductibles but also coinsurance and copays. In

addition, our approach captures the reality that consumers often have only a general idea

of reimbursement probabilities, so insurance plan’s structures should not enter the utility

function deterministically.

6 Conclusions

To conclude, we present a new approach to measuring moral hazard and adverse selection.

This is based on a model of demand for health care and health expenditures in the presence

of unobserved heterogeneity in the health status of individuals. We make three key contri-

butions to the existing literature. First, we disentangle moral hazard and adverse selection

through estimation of the latent health status distribution using detailed claims level data.

We compute a measure of moral hazard by performing a counterfactual to isolate the effect

of insurance on health expenditure. Adverse selection is examined by testing for differences

in latent health status distributions across plans. The second contribution is based on our

semiparametric estimation method. We estimate insurance reimbursement schedules non-

parametrically which has two advantages: (i) it allows for complex insurance plans including

copays, deductibles, and other nonlinear features, and (ii) nonparametric estimation of a

conditional probability of reimbursement relaxes a common assumption in the previous lit-

erature by incorporating the reality that plans are complex and difficult for consumers to

48

predict in every state of the world. Finally, the third key contribution is that we present a

measure of the magnitude of both moral hazard and adverse selection in a self-insured em-

ployer insurance pool. Our estimates indicate that moral hazard may account for as much

as 40 percent of current health spending in these plans, and that consumers with a relatively

less complex health status do sort into the lowest cost sharing plan.

Although, our proposed semiparametric method provides a more flexible and robust al-

ternative for analyzing adverse selection and moral hazard there are caveats. We assume

that the utility function is separable in the aggregate consumption commodity and medi-

cal care. Although, this captures risk aversion in health status, it rules out more flexible

interactions between aggregate consumption and health status (see e.g., Viscusi and Evans

1990). However, Spence and Zeckhauser (1971) and Blomqvist (1997) use a similar spec-

ification, and Campo, Guerre, Perrigne and Vuong (2011) also require similar restrictions

on utility in an auctions context. In spite of these limitations, our research is novel in that

it develops a tractable estimation procedure under parsimonious parametric assumptions to

simultaneously examine adverse selection and moral hazard. Our research is also important

as it provides a framework for similar analysis in other contexts where distortions exist due

to asymmetric information, especially when the data is limited to being cross sectional.

References

[1] Abbring, J. H., P. A. Chiappori, J. H. Heckman and J. Pinquet. Adverse Selection and

Moral Hazard in Insurance: Can Dynamic Data Help to Distinguish? Journal of the

European Economic Association. 2003, 1(Papers and Proceedings): 512-521.

[2] Ackerberg, D., X. Chen, and J. Hahn. A Practical Asymptotic Variance Estimator for

Two-Step Semiparametric Estimators. Working Paper. UCLA. 2009.

[3] Akerlof, George. The Market for “Lemons”: Quality Uncertainty and the Market Mech-

anism. The Quarterly Journal of Economics. 1970, 84(3): 488-500.

49

[4] Arrow, K.J. Uncertainty and the Welfare Economics of Medical Care. American Eco-

nomic Review. 1963, 53:941-973.

[5] Blau, D. and Donna B. Gilleskie. The Role of Retiree Health Insurance in the Em-

ployment Behavior of Older Males. International Economic Review. 2008, 49(2), p.

475-514.

[6] Blomqvist, A.G. Optimal Nonlinear Health Insurance. Journal of Health Economics.

1997, 16(3):303-321.

[7] Bowman, A.W. and Azzalini, A. Applied Smoothing Techniques for Data Analysis:

the Kernel Approach with S-Plus Illustrations. Oxford University Press, Oxford. 1997.

[8] Bundorf, K., Levin, J., and Mahoney, N. Pricing and Welfare in Health Plan Choice.

Working Paper. Stanford University. 2010.

[9] Cameron. A. C., P. K. Trivedi, Frank Milne, and J. Piggott. A Microeconometric

Model of the Demand for Health Care and Health Insurance in Australia. Review of

Economic Studies. 1988, 55(1):85-106.

[10] Campo, Sandra M., E. Guerre, I. Perrigne, and Q. Vuong. Semiparametric Estimation

of First-Price Auctions with Risk Averse Bidders. Review of Economic Studies. 2011,

volume 78(1): 112–147.

[11] Cardon, James H. and Igal Hendel. Asymmetric Information in Health Insurance:

Evidence from the National Medical Expenditure Survey. RAND Journal of Economics.

2001, 32(3):408-27.

[12] Carlin, C., and Town, R. Adverse Selection, Welfare and Optimal Pricing of Employer

Sponsored Health Plans. Working Paper. University of Minnesota. 2009.

[13] Chiappori, Pierre-Andre and Bernard Salanie. Testing Contract Theory: a Survey

of Some Recent Work, in Advances in Economics and Econometrics - Theory and

50

Applications, Eighth World Congress, M. Dewatripont, L. Hansen and P. Turnovsky,

ed., Econometric Society Monographs, Cambridge University Press, Cambridge, 2003:

115-149.

[14] Courty, P., and Hao Li. Sequential Screening. Review of Economic Studies. 2000,

67:697-717.

[15] Cutler, D., and Reber, S. Paying for Health Insurance: The Trade-off Between Com-

petition and Adverse Selection. The Quarterly Journal of Economics. 1998, 113(2):

433-466.

[16] Cutler, David M. and Richard J. Zeckhauser. The Anatomy of Health Insurance. Cu-

lyer, Anthony J. and Joseph P. Newhouse (eds) Handbook of Health Economics Vol.

1A. Amsterdam , New York : Elsevier, 2000: 563-643.

[17] Dai, Chifeng, Tracy Lewis, and Giuseppe Lopomo. Delegating Management to Experts.

RAND Journal of Economics. 2006, 27(3): 403-420.

[18] Deaton, Angus and Christina Paxton. Aging and Inequality in Health and Income.

American Economic Review. 1998: 88(2):248-253

[19] Einav, L., Finkelstein, A., and Cullen, M. Using Price Variation to Estimate the Wel-

fare Effects of Adverse Selection. The Quarterly Journal of Economics. 2010, 125(3):

877-921.

[20] Einav, L., Finkelstein, A., and Levin, J. Beyond Testing: Empirical Models of Insur-

ance Markets. Annual Review of Economics. 2010, 2:311-36.

[21] Einav, L., Jenkins, M., and Levin, J. Contract Pricing in Consumer Credit Markets.

Working Paper. Stanford University. 2010.

[22] Ellis, Randall P., “The effect of prior-year health expenditures on health coverage pian

51

choice,” Advances in Health Economics and Health Services Research, Vol. 6, Richard

M. Scheffler and Louis F. Rossiter, eds. (Greenwich, CT: JAI Press, 1985), pp 127-47.

[23] Ettner, Susan L. New Evidence on the Relationship Between Income and Health.

Journal of Health Economics. 1996, 15:67-85.

[24] Fang, H., and Gavazza, A. Dynamic Inefficiencies in Employment-Based Health Insur-

ance System: Theory and Evidence. NBER Working Paper 13371. 2010

[25] Feldman, Roger and Bryan Dowd. A New Estimate of the Welfare Loss of Excess

Health Insurance. The American Economic Review, 1991, 81(1): 297-301.

[26] Feldstein, Martin. The Welfare Loss of Excess Health Insurance. Journal of Polical

Economy. 1973, 81(2)251-280.

[27] Gourinchas, P. O. and J. Parker. Consumption Over the Life Cycle. Econometrica.

2002. 70(1): 47-89.

[28] Guerre, E., I. Perrigne, and Q. Vuong. Optimal Nonparametric Estimation of First-

Price Auctions. Econometrica. 2000, 68(3):525-574.

[29] Handel, B. ‘Adverse Selection and Switching Costs in Health Insurance Markets: When

Nudging Hurts. Working Paper. Northwestern University. 2010.

[30] Jackson-Beeck, Marilyn and John H. Kleinman, “Evidence for self-selection among

health maintenance organization enrollees,” Journal of the American Medical Associ-

ation, Vol. 250, No. 20 (November 25. 1983), pp. 2826-29.

[31] Juba, David. A, Judith R. Lave and Jonathan Shaddy, “An analysis of the choice of

health benefits plans,” Inquiry, Vol. 17 (Spring 1980), pp. 62-71.

[32] Keeler, E., J. Newhouse and C. Phelps. Deductibles and the Demand for Medical

Care Services: The Theory of a Consumer Facing a Variable Price Schedule under

Uncertainty. Econometrica, 1977, 45: 641-655.

52

[33] Keeler, E., and J. Rolph. The Demand for Episodes of Treatment in the Health Insur-

ance Experiment. Journal of Health Economics, 1988, 7:337-367.

[34] Khwaja, A. Health Insurance Habits and Health Outcomes: A Dynamic Stochastic

Model of Investment in Health. Ph.D Dissertation, University of Minnesota. 2001.

[35] Khwaja, A. “Estimating Willingness to Pay for Medicare Using a Dynamic Life-Cycle

Model of Demand for Health Insurance,” Journal of Econometrics, May, 2010, Vol.

156.

[36] Kowalski, Amanda. “Censored Quantile Instrumental Variable Estimates of the Price

Elasticity of Expenditure on Medical Care” NBER Working Paper 15085. June 2010.

[37] Marquis, M. Susan, Adverse selection with a multiple choice among health insurance

plans: A simulation analysis, Journal of Health Economics, Vol. 11, No. 2 (August

1992), pp. 129-51.

[38] Manning, W.G., J.P. Newhouse, N.Duan et al. Health Insurance and the Demand for

Medical Care: Evidence from a Randomized Experiment. American Economic Review.

1987, 77(3):251-277.

[39] Marquis, M. Susan and Charles E. Phelps,“Price elasticity and adverse selection in the

demand for supplemental health insurance.” Economic Inquiry, Vol. 25, No. 2 (April

1987), pp. 299-313.

[40] Marsh, Christina. “Estimating Health Expenditure Elasticities Using Nonlinear Reim-

bursment.” Working paper. June 2010.

[41] Newey, W. K. The Asymptotic Variance of Semiparametric Estimators. Econometrica.

1994, 62(6):1349-1382.

[42] Pauly, M. The Economics of Moral Hazard: Comment. American Economic Review.

1968, 58:531-536.

53

[43] Prescott, E., Theory ahead of business cycle measurement, Federal Reserve Bank of

Minneapolis Quarterly Review, 10, Fall, 9-22. 1986

[44] Rothschild, M. and J.E. Stiglitz. Equilibrium in Competitive Insurance Markets: An

Essay on the Economics of Imperfect Information. Quarterly Journal of Economics.

1976, 90(4):630-649.

[45] Scheffler, R.M. “The United Mine Workers’ health plan: an analysis of the cost-sharing

program,” Medical Care 22(3):247-254. 1984.

[46] Shea, J. Union Contracts and the Life-Cycle/Permanent-Income Hypothesis. American

Economic Review. 1995, 85(1): 186-200.

[47] Smith, James P. Healthy Bodies and Thick Wallets: The dual relation between health

and economic status. Journal of Economic Perspectives. 1999, 13(2):145-166.

[48] Spence, M. and R. Zeckhauser. Insurance, Information, and Individual Action. Amer-

ican Economic Review. 1971, 61(2):380-387.

[49] Viscusi, K. and W. Evans. Utility Functions that Depend on Health Status: Estimates

and Economic Implications. American Economic Review. 1990, 80: 353-374.

[50] Vera-Hernandez, Marcos. Estimation of a Principal-Agent Model: Moral Hazard in

Medical Insurance. The Rand Journal of Economics. 2003, 34(4):670-693.

54

A Appendix: Asymptotics - Not for Publication

The difficulty of computing standard errors and obtaining a valid statistical inference pro-

cedure in semiparametric models is well known. The estimator proposed in this paper is a

semiparametric GMM estimator that depends on first stage nonparametric estimation of the

conditional distribution of the rate of reimbursement given the amount of medical expendi-

ture. Statistical inference presents a major challenge just as in other similar semiparametric

models which require nonparametric estimation of conditional mean or distribution functions

in the first stage.

There are several conventional approaches to compute the correct standard errors to take

into account the statistical uncertainty introduced by nonparametric estimation in the first

stage. The first one is to derive the asymptotic distribution of the estimator analytically and

replace the asymptotic variance with a consistent estimate based on the sample data. This

is in principle possible by following the pathwise derivative calculation in Newey (1994). A

second approach is resampling. Either bootstrap or subsampling will provide a valid infer-

ence procedure for this particular estimator. A third approach is to make use of an insight

by Newey (1994) that the asymptotic variance of the second stage estimator does not de-

pend on how the first stage nonparametric estimation method is implemented. While the

first stage estimator is currently implemented using kernel density smoothers, the second

stage estimator will have approximately the same asymptotic variance even if the first stage

is estimated using a sieve parametric approximation instead of the kernel smoother. If the

implementation of the first stage estimator can be modified to be a sieve parametric approx-

imation method then the (overidentifying) moment conditions that are used in obtaining

the estimator can potentially be modified to a set of exactly identifying moment conditions

for both the first stage sieve parameters and the second stage structural parameters of the

model. According to Newey (1994), if this is possible, then the approximate variance of the

second stage estimator can be read off from the lower diagonal of the variance-covariance

matrix of the entire generalized method of moment estimator that includes both the first

55

stage and second stage estimators. Computing the overall variance-covariance matrix is

straightforward using the conventional sandwich formula for GMM estimators.

Unfortunately each of these approaches has its own disadvantages. The pathwise deriva-

tion calculation in Newey (1994) is often tedious and prone to errors in analytic computation.

The resulting asymptotic variance estimate can also be complex and is sensitive to errors

introduced through numerical calculations and computations. Resampling methods require

recomputing the estimators repeatedly over many bootstrap iterations. Given the nonlinear

nature of the method of moment estimators, this might not be computationally feasible. Re-

placing the first stage kernel smoother with a sieve parametric approach also seems at odds

with the current implementation of the semiparametric estimator and might also lead to dif-

ferent point estimates for the second stage structural parameters. In addition, implementing

a first stage sieve parametric approach appears to be more difficult than implementing the

kernel smoother.

The intuition underlying the computation of the standard errors is as follows. In com-

puting the standard errors for a two step estimator the first stage is treated in a parametric

fashion (somewhat like in sieve estimation). Then the standard error for the second stage

can be read off from the lower diagonal components of the entire variance-covariance matrix

of both the first and second stage parameters. The paper by Ackerberg, Chen, and Hahn

(2009) formally proves the validity of this approach.

In the following we formally outline the pathwise derivative argument of Newey (1994)

which justifies the asymptotic normality of the semiparametric two step estimator. First

note that the proposed estimator in this paper takes a general form for most two step

semiparametric estimators:

γ = arg minγ∈Γ

gn(γ)′Wngn(γ),

where

gn(γ) =1

n

n∑i=1

g(zi, fa,m(·, ·), γ

).

56

In the above expression, fa,m(·, ·) denotes the first stage kernel smoothing nonparametric

estimate of the joint density function of a and m. The zi denotes the collection of all the

variables for observation i. The functional notation is used to emphasize that the second

stage moment condition depends on the entire joint density because θ is recovered through

an equation that depends on integrating against the conditional distribution of a given m,

and on the derivative of the conditional density fa|m(·|·).

It can be shown through standard Taylor expansion arguments that under suitable reg-

ularity conditions (which include the condition that the first stage nonparametric regression

should make use of an undersmoothing sequence of bandwidth parameters):

√n(γ − γ) = G−1W

1√n

n∑i=1

g(zi, fa,m(·, ·), γ0

)+ op(1) ,

where op(1) denotes a term that converges to zero in probability. The W is the probability

limit of Wn, and

G =∂

∂γg (zi, fa,m(·, ·), γ0) .

When the nonparametric component fa,m(·, ·) is known, asymptotic normality of γ follows

immediately from a central limit theorem applied to the normalized sum of the moment

conditions evaluated at γ0:

1√n

n∑i=1

g (zi, fa,m(·, ·), γ0)d−→ N(0,Ω) where Ω = V ar(g (zi, fa,m(·, ·), γ0)).

However, when fa,m(·, ·) has to be estimated in the first stage its impact on the second

stage asymptotic variance needs to be taken into account. Specifically, it can be shown under

additional regularity conditions that

1√n

∑ni=1 g

(zi, fa,m(·, ·), γ0

)− 1√

n

∑ni=1 g (zi, fa,m(·, ·), γ0)

=√nE[g(zi, fa,m(·, ·), γ0

)− g (zi, fa,m(·, ·), γ0)

]+ op(1).

The pathwise derivation calculation in Newey (1994) seeks a linear influence function

57

ψ(zi) such that

√nE[g(zi, fa,m(·, ·), γ0

)− g (zi, fa,m(·, ·), γ0)

]=

1√n

n∑i=1

ψ(zi) + op(1).

Conditional on knowledge of this influence function, it is easy to see that

1√n

n∑i=1

g(zi, fa,m(·, ·), γ0

)d−→ N(0, V ar(g

(zi, fa,m(·, ·), γ0

)+ ψ(zi))).

Then we can conclude that

√n(γ − γ)

d−→ N(0, (G′WG)−1G′WΩWG(G′WG)−1)).

58

Documents

Moral Hazard, Adverse Selection and Health Expenditures: A