Do Risk Aversion and Wages Explain Educational Choices?*

Thomas O. Brodaty†, Robert J. Gary-Bobo‡ and Ana Prieto§

6 May 2014

Abstract

We study a model in which a student's investment in education maximizes expected utility conditional on public and private information. The model takes future wage risk into account and treats the direct and opportunity costs of education as additional sources of risk. In particular, the time needed to complete a degree is random. The data show that time-to-degree is substantially dispersed, conditional on the highest degree. We find significant and substantial effects of expected returns on individual education choices. The risk affecting education costs and, in particular, the randomness of time-to-degree, also plays an important role in explaining enrollment in higher education. We find precise estimates of the CRRA risk-aversion parameter, between 0.6 and 0.9. Heterogeneity in risk attitudes is significant, even in sub-populations. The sons of professionals bear more risk and are more risk averse than the rest of the sample. Yet, they study more because of higher returns and markedly lower expected investment costs. Simulations show a strong impact of changes in the probability of grade retention on educational achievement.

Keywords: Returns to Education, Wage Risk, Risk Aversion, Grade Retention, Education Costs, Family Background, Time to Degree.

* We thank Christian Belzil, Stephane Bonhomme, Denis Fougere, Marc Gurgand, Thierry Kamionka, Francis Kramarz, Guy Laroque, Thierry Magnac and Jean-Marc Robin for their help and remarks at various stages of this research. We also thank Orazio Attanasio, Jose-Victor Rios-Rull and the participants of the 2005 NBER Summer Workshop for their help and useful remarks. The present paper is a deeply revised version of a manuscript circulated in January 2005 and entitled: "Risk Aversion, Expected Earnings and Opportunity Costs: A Structural Econometric Model of Human Capital Investment".

† THEMA, University of Cergy-Pontoise, France. Email: [email protected]

15, boulevard Gabriel Péri, 92245 Malakoff cedex, France. Email: robert.gary-[email protected].


1 Introduction

Uncertainty affects not only the returns to, but also the direct and opportunity costs of education. The methods of diversification, risk-sharing and insurance are not easily applicable to human-capital investments. Thus, one would expect risk aversion to have substantial effects on the decision to invest. In the vast literature following the pioneering work of Becker (1964) and Mincer (1974), most of the contributions focused on returns to education¹, while a comparatively much smaller number of studies have been devoted to the risks of human-capital investment. More specifically, the impact of risk aversion on education choices has been studied in a handful of papers². There exists, of course, an important literature on the statistical analysis of wage dynamics, but most of its contributions are purely empirical or based on reduced-form models³. The theoretical literature on risk aversion and human capital is also sparse⁴.

We propose an econometric study of the risks of education and, at the same time, of the role of risk aversion in the demand for education. Our analysis is based on the standard assumption that education choices maximize the expected utility of individuals, conditional on public and private information. The choice model hinges on returns to education (i.e., skill premia), and the direct and opportunity costs (i.e., foregone earnings) of schooling. Both future wages and the costs of education are affected by risks. In this model, we also allow for unobserved heterogeneity in risk attitudes, family background and ability, and by assumption, the students' utility functions belong to the constant relative risk aversion (CRRA) family.

Students' beliefs, taking the form of probability distributions of education costs and future wages, are modeled with the help of two auxiliary equations. The first one is a Mincerian, log-wage equation, used by the student to predict future wages as a function of educational achievement. The second one is a delay equation explaining the number of years needed by the student to earn his (her) highest degree (i.e., the time-to-degree). This last point deserves further explanation.

¹ See the surveys of Card (1999), Heckman et al. (2003), Harmon et al. (2003).

² Friedman (1953) presented explanations for income inequalities in terms of risk aversion. Weiss (1972) studied the risk-premium element in compensation, modeling wages as log-normal variables and assuming CRRA utility functions. Palacios-Huerta (2003) applied financial econometric techniques to education. See also King (1974), Shaw (1996), Carneiro, Hansen and Heckman (2003). In the recent macroeconomics literature, see Krebs (2003), Huggett, Ventura and Yaron (2011).

³ See the survey of Meghir and Pistaferri (2010). For instance, Chen (2008) models the variance of wages under self-selection and unobserved heterogeneity. Low, Meghir and Pistaferri (2010) and Magnac, Pistolesi and Roux (2013) use a dynamic structural model to study wage risk, but they calibrate or ignore risk-aversion (and time-preference) and they focus on careers, not on initial education.

⁴ See the classic contributions of Levhari and Weiss (1974), Williams (1979), Dreze (1979) and Eaton and Rosen (1980).

Our approach doesn't rely on years-of-schooling as the only measure of human capital, as in a number of classic studies. An original feature of the model presented below is that it distinguishes degrees (or education levels) from the time needed to reach a certain level (or time-to-degree). This distinction is made possible by our data, which give detailed information on both the highest degree and school-leaving age for each individual.

Our results have been obtained with a rich sample, containing 12,500 young men who left the educational system in 1992, in France. In this dataset, the randomness of degree-completion dates is generated by the addition of several sources of delay. In France, as well as in many other countries, there are grade repetitions in primary and secondary schools. Delays, mainly due to flunking, can be accumulated during vocational or college studies as well⁵. Thus, the time needed to pass exams is substantially dispersed: students differ in their learning "speed" and the costs of education are random, conditional on the highest degree. A contribution of the present article is to show that this type of risk is a key factor in understanding enrollment in higher education.

One of the model's outputs is the risk-return "curve" that can be drawn by joining the points associated with each degree (or education level). The curves are typically increasing, higher returns being associated with higher wage-risk. These mean-variance plots vary with family background. The subsample including the sons of professionals enjoys higher returns, but bears more risk in higher-education levels than the rest of the sample. This is true not only on average, but also if we separate heterogeneity from uncertainty, that is, in our context, if risk-return curves are conditional on unobserved types. In addition, we find that individual choices are very sensitive to a number of model parameters, including risk aversion, and to expected returns to additional years of education. In contrast, choices seem to be less elastic with respect to the risk in wages itself (i.e., the variance of log-wages).

⁵ The empirical relevance of these delays depends on the institutions of the country under study. For instance, grade repetitions are common in France, Spain and Germany, while social promotion prevails in Great Britain and Scandinavian countries. Time-to-degree completion is substantially dispersed in higher education as well. See e.g., Ehrenberg and Mavros (1995), Brunello and Winter-Ebmer (2003) and Garibaldi et al. (2012). In France, the bulk of these differences in speed is generated when students fail their exams (and fail the exam retakes) and must repeat a semester or a full year to earn a given degree.

In our model, individuals are characterized by a speed parameter, which is the probability of being promoted to the next grade, at the end of each year. This parameter varies with family background and unobserved heterogeneity in an important way. We simulate the impact of changes in speed on educational achievement and enrollment in higher education. If students from less privileged backgrounds were given the average speed of the sons of educated professionals, a high fraction of high-school dropouts would earn a vocational degree, a substantially higher proportion of high-school graduates would successfully go to college, etc. This shows the importance of the random costs of education for investment decisions.

Finally, we find values of the Arrow-Pratt coefficient of relative risk aversion (RRA) between 0.6 and 0.9. Students seem to be more risk-averse than a decision-maker using a square-root utility, but less risk-averse than a decision-maker using logarithmic utility⁶. Since we use a model with a discrete number of unobservable student types, we are able to show that there is a significant amount of RRA heterogeneity. Student risk-aversion also depends on family background, and it is perhaps surprising to find that the sons of professionals are in fact more risk averse than male students from less privileged backgrounds. Yet, the sons of professionals study more and bear more risks than others because their costs of education are significantly smaller, while their returns to education are significantly higher than for the rest of the male students.

Our approach is close in spirit, if not in its technical details, to the recent work of Carneiro, Hansen and Heckman⁷ (2003). These authors separate the contribution of heterogeneity from that of genuine uncertainty in the distribution of wages, and thus identify the wage risks as perceived by students, before their decision to attend college. We use parallel analytical tools in the present paper. Carneiro, Hansen and Heckman (2003) found that "uncertainty in gains to schooling is substantial but knowledge of this uncertainty has a very small effect on the choice of schooling because the variance of gains is much smaller than the variance of psychic costs or benefits, and it is the latter that drives most of the heterogeneity in schooling decisions". We also find that wage-uncertainty has a small effect, and that education costs (determined in part by family background) matter considerably. But in contrast, we find strong effects of expected returns on enrollment and specifically, a strong impact of the risks affecting education costs⁸.

⁶ These figures are close to the estimates found by other education economists; see, e.g., Keane and Wolpin (2001), Belzil and Hansen (2004), Sauer (2004).

⁷ See also Cunha, Heckman and Navarro (2005), and Cunha and Heckman (2008).

In the following, Section 2 presents the model. Section 3 provides a description of the data used for estimation. Section 4 presents the estimation results. Simulations are presented in Section 5, and Section 6 contains concluding remarks. A number of computations, proofs and additional details and results are relegated to the appendix.

2 The Model

2.1 Risk-averse students

We start with the basic assumption that an individual's educational choices can be described as the result of expected utility maximization. We assume that the agents' Von Neumann-Morgenstern utility functions $u$ exhibit constant relative risk aversion (CRRA) with respect to earnings, that is, formally,

$$u(w) = \frac{w^{1-\gamma} - 1}{1-\gamma}, \tag{1}$$

where $w$ denotes the agent's earnings. The Arrow-Pratt index of relative risk aversion for this utility function is $\gamma \geq 0$.
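As a rough illustration of Eq. (1), the sketch below evaluates the CRRA utility and the certainty equivalent of a simple wage lottery for two benchmark values of the risk-aversion index, $\gamma = 0.5$ (square-root utility) and $\gamma = 1$ (logarithmic utility, taken as the limit of (1)). The function names and the lottery values are hypothetical and serve only as an example; they are not taken from the paper.

```python
import math


def crra_utility(w: float, gamma: float) -> float:
    """CRRA utility of earnings w > 0, as in Eq. (1)."""
    if w <= 0:
        raise ValueError("earnings must be positive")
    if math.isclose(gamma, 1.0):
        # Limit of (w**(1 - gamma) - 1) / (1 - gamma) as gamma -> 1
        return math.log(w)
    return (w ** (1.0 - gamma) - 1.0) / (1.0 - gamma)


def certainty_equivalent(wages, probs, gamma: float) -> float:
    """Sure wage that yields the same expected utility as the lottery."""
    eu = sum(p * crra_utility(w, gamma) for w, p in zip(wages, probs))
    if math.isclose(gamma, 1.0):
        return math.exp(eu)
    # Invert Eq. (1): w = (1 + (1 - gamma) * u) ** (1 / (1 - gamma))
    return (1.0 + (1.0 - gamma) * eu) ** (1.0 / (1.0 - gamma))


# Hypothetical 50/50 lottery over two wage levels (illustrative numbers only)
lottery_wages, lottery_probs = [1000.0, 2000.0], [0.5, 0.5]
ce_sqrt = certainty_equivalent(lottery_wages, lottery_probs, 0.5)
ce_log = certainty_equivalent(lottery_wages, lottery_probs, 1.0)
```

A more risk-averse agent accepts a lower sure wage in exchange for the lottery, so `ce_log` lies below `ce_sqrt`, and both lie below the expected wage of 1500.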

There is a sample of students indexed by $i$, with $i = 1, \ldots, N$. To model unobserved heterogeneity, we will rely on a discrete set of multidimensional, individual types. There are $K$ types indexed by $j = 1, \ldots, K$. A type value is denoted $\theta_j$; it is a 6-dimensional vector,

$$\theta_j = (\theta_{j\gamma}, \theta_{jc}, \theta_{jw}, \theta_{j\sigma}, \theta_{jA}, \theta_{jd}), \tag{2}$$

where $\theta_{j\gamma}$ affects the risk aversion coefficient $\gamma$; $\theta_{jc}$ affects the individual's direct costs of education; $\theta_{jw}$ affects the individual's wage; $\theta_{j\sigma}$ has an impact on the ex ante variance of an individual's wage; $\theta_{jA}$ has an impact on the age at which an individual enters junior high school (or grade 6); and $\theta_{jd}$ has an effect on the individual's time to degree (i.e., delay). We use $p_j = \Pr(\theta_j)$ to denote the probability of type $j$. Each coordinate in vector $\theta_j$ will enter a different equation of the model. The students are assumed to know their type $j$, and therefore know $\theta_j$, but the econometrician doesn't observe the types.

⁸ This is also in contrast to the recent literature on college-major choice; see Arcidiacono (2004), Beffy et al. (2012).

Education levels are discrete variables denoted $s$, i.e., $s = 0, 1, \ldots, n$. Let $w_s$ denote the wage obtained at schooling level $s$. We represent education costs as a forgone fraction of the individual's potential earnings. More precisely, when a student has already reached level $s$, the costs of investing in view of level $s+1$ are modeled as a fraction $(1 - h_{s+1}) w_s$ of the earnings that the student could obtain if he or she went to work.

We distinguish the education level $s$ (or degree) from the number of years of schooling needed to reach a given level (i.e., time to degree). If $x_s$ is any variable depending on $s$, denote $\Delta x_s = x_s - x_{s-1}$. Let $d_s$ denote the number of years used to reach level $s$. A student will spend $\Delta d_s$ years in school to increase his (her) education level from $s-1$ to $s$. Durations $d_s$ can be viewed as random variables, because of grade repetitions and other delay factors, but the educational investment level $s$ is assumed to be a choice variable.

To provide a model for the choice of $s$, we assume that there is a factor, denoted $\varepsilon$, that the student is assumed to know, in addition to his (her) personal type $\theta$, but that the econometrician does not observe. The $\varepsilon$ factor summarizes unobserved aspects of the student's environment and preferences that contribute to direct education costs. The individually optimal $s$ is the result of expected utility maximization, conditional on private information $\theta$ and $\varepsilon$. Given the assumptions listed above, the expected discounted sum of utilities, knowing $(\theta, \varepsilon)$ and knowing observed covariates $X$, can be defined as follows:

$$V(s \mid \theta, \varepsilon) = E\left[\sum_{t=1+d_s}^{T} \delta^t u(w_s) + \sum_{z=1}^{s} \sum_{t=1+d_{z-1}}^{d_z} \delta^t u(h_z w_{z-1}) \,\Big|\, X, \theta, \varepsilon\right], \tag{3}$$

where $\delta$ is a discount factor. In (3), wages $w_s$ and durations $d_s$ are random from the point of view of the student. The cost factors $h_s$ also depend on $(\theta, \varepsilon)$ and are random from the point of view of the econometrician. Given our dataset, in which only starting wages are recorded, during the first years of career, we have a very weak basis to identify the discount factor. We therefore focus on a model with a finite horizon $T < +\infty$ and $\delta = 1$. Appendix 1 shows how to derive the model's equations if $\delta < 1$. Under the assumption of a finite horizon and patient students, expression (3) boils down to

$$V(s \mid \theta, \varepsilon) = E\left[(T - d_s)\, u(w_s) + \sum_{z=1}^{s} (d_z - d_{z-1})\, u(h_z w_{z-1}) \,\Big|\, \theta, \varepsilon\right], \tag{4}$$

(we keep the conditioning with respect to $X$ implicit to simplify notations). The relative weight of the future career and of education years is determined by $T$ and the $h_s$. This point is discussed below.

We model wages at the beginning of a worker's career, and leave aside the problem of modeling life-cycle earnings. The above model is in practice equivalent to a simple specification in which students have the same deterministic career profiles as a function of potential experience, up to a translation given by the starting wage $w_s$. In essence, we study the behavior of students when their future social status is risky, and this riskiness is captured by the randomness in the starting point of career profiles. Our wage variable is an average of the full-time wages earned during the first five years of career, weighted by their respective employment-spell durations (see details below): this variable is a crucial predictor of future career profiles. It seems likely that the type of risk studied here is important for students. However, as mentioned by a referee, the model cannot study differences in, say, returns to experience, that would be determined by student characteristics and interact with discount factors in the choice model⁹.

We now specify equations determining (i), wages $w_s$; (ii), the variance of wages denoted $\sigma_s^2$; (iii), the cost factors $h_s$; (iv), the random durations $d_s$; (v), the individual's age at grade 6 entry, denoted $a$ (because it appears as an explanatory variable in the equation for $d$, as will be seen below); and (vi), risk aversion $\gamma$ itself (because it may vary across individuals).

⁹ This is clearly an interesting direction of research, but our dataset doesn't allow us to exploit differences in the steepness of age-earnings profiles (see Huggett et al. (2011)). As noted by a referee, this is a possible source of bias in our appreciation of the role of risk attitudes.

Firstly, yearly wages $w_s$ are determined as a function of education $s$ as follows,

$$\ln(w_s) = f_s + X_0 \beta_0 + \alpha_a + \theta_w + \sigma_s(\theta_\sigma)\, \nu, \tag{5}$$

where $f_s$ is a skill premium (depending on education $s$); $X_0$ is a vector of control variables; $\beta_0$ is an associated vector of parameters; $\theta_w$ is the relevant component of type-value $\theta$; $\sigma_s(\cdot)$ is the standard deviation of wages at education level $s$, which is assumed to be a function of the type component $\theta_\sigma$; $\alpha_a$ is the possible effect of the individual's age at grade 6 entry $a$; and finally, $\nu$ is a standard normal random noise variable, representing risk from the student's point of view. We assume a simple form for the standard deviation of wages, namely,

$$\sigma_s(\theta_\sigma) = \exp(\theta_\sigma + X_\sigma \beta_\sigma + z_s), \tag{6}$$

where $z_s$ is a parameter that depends on education level $s$, $X_\sigma$ is a vector of control variables and $\beta_\sigma$ is an associated vector of parameters. These first two equations, combined, form a standard Gaussian mixture model of log-wages: latent student types are characterized by different means and variances of their future wages. In other words, the unobserved aspects of an individual's labor-market ability and risk are represented by $(\theta_w, \theta_\sigma)$.
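To make the mixture structure of Eqs. (5) and (6) concrete, here is a minimal simulation sketch. It draws a latent type, then a log-wage conditional on that type. All numerical values, the two-type setup and the function names are hypothetical choices made for illustration; they are not estimates or code from the paper.

```python
import math
import random


def draw_log_wage(rng, f_s, x_beta, alpha_a, theta_w, theta_sigma, x_sigma_beta, z_s):
    """One draw of ln(w_s) from Eq. (5), with sigma_s given by Eq. (6)."""
    sigma_s = math.exp(theta_sigma + x_sigma_beta + z_s)  # Eq. (6)
    nu = rng.gauss(0.0, 1.0)                              # standard normal shock
    return f_s + x_beta + alpha_a + theta_w + sigma_s * nu


def simulate_mixture(n, types, probs, seed=0, **common):
    """Draw n log-wages; each draw first samples a latent type (theta_w, theta_sigma)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        theta_w, theta_sigma = rng.choices(types, weights=probs)[0]
        draws.append(
            draw_log_wage(rng, theta_w=theta_w, theta_sigma=theta_sigma, **common)
        )
    return draws


# Two hypothetical latent types (theta_w, theta_sigma) with probabilities p_j
TYPES = [(-0.1, math.log(0.2)), (0.2, math.log(0.4))]
PROBS = [0.6, 0.4]
sample = simulate_mixture(
    20000, TYPES, PROBS,
    f_s=7.0, x_beta=0.0, alpha_a=0.0, x_sigma_beta=0.0, z_s=0.0,
)
sample_mean = sum(sample) / len(sample)
```

In this toy setup the population mean of the log-wage is $7.0 + 0.6 \times (-0.1) + 0.4 \times 0.2 = 7.02$, and the simulated mean should be close to that value.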

The opportunity and direct costs of spending one more year in school, when the education level is already $s$, are assumed to be a fraction $(1 - h_{s+1}(X_1, \theta_c))\, w_s$ of the wage $w_s$. The function $h_s(\cdot)$ depends on the type-component $\theta_c$ and on the vector of variables $X_1$, for all $s = 1, \ldots, n$. We have in mind that $X_1$ includes some exogenous cost-shifting variables. We specify $h_s$ as follows,

$$h_s \equiv \exp(-X_1 \beta_1 - c_s + \theta_c + \varepsilon), \tag{7}$$

where $\beta_1$ is a vector of parameters¹⁰. We also assume that there is no constant in $X_1$. Note that this formulation is flexible: the $h_s$ functions depend on $s$ (so that education costs are not necessarily a fixed proportion of the potential wage); the $h_s$ functions also depend on many covariates $X_1$ and particularly, on family background (so that the son of a well-to-do family may in fact have lower costs than the son of a blue-collar family). The assumption is not unrealistic. We know, for instance, that in 1998, 80% of the French students had signed a labor contract that was not a required internship¹¹. But this flexible model is a possible description of costs, even if many students do not work during their college years.

¹⁰ The parameters $c_s$ can be interpreted as cost parameters for the following reason. It is easy to check that $\Delta \ln(h_{s+1} w_s) = -\Delta c_{s+1} + \Delta f_s$, for $s \geq 1$. Given that $h_{s+1} w_s$ is the fraction of $w_s$ which is not forgone while studying at level $s+1$, the $\Delta f_s$ represent the benefits of moving from level $s-1$ to level $s$ (i.e., a skill-premium) and $\Delta c_{s+1}$ is the cost increment incurred while moving from $s$ to $s+1$, expressed in percentage of the earnings.

¹¹ The majority of these contracts were of course part-time jobs or summer jobs, and the importance of work increases with the education level; see Beduwe and Giret (2004).

We now turn to the model of durations $d_s$, i.e., the delay equation. Let $\tau_s$ denote the theoretical (and minimal) number of years needed to reach education level $s$. For instance, if $s$ is high-school graduation (i.e., the French baccalauréat), then $\tau_s = 18$. We model the ratio $d_s/\tau_s$ as follows,

$$\frac{d_s}{\tau_s} = \exp\Big[X_2 \beta_2 + \sum_a \phi_a \chi_a + \theta_d + \zeta\Big], \tag{8}$$

$$\sigma_\zeta = \exp\Big(\sum_a \omega_a \chi_a\Big), \tag{8b}$$

where $X_2$ is a vector of covariates; $\beta_2$ is a vector of coefficients; $\theta_d$ is the relevant component of type $\theta$; $\chi_a$ is the indicator function for the discrete age-at-grade-6 variable $a$; $\phi_a$ is the associated coefficient; and $\zeta$ is a normal random noise term. The error term $\zeta$ is not observed by the agent: this is a risk for the student. The variance of this shock, denoted $\sigma_\zeta^2$, can potentially vary with covariates too. We chose to let it vary with age-at-grade-6, as specified by Equation (8b), where the $\omega_a$ are coefficients to be estimated.

This simple formulation for the delay equation is suggested by the formula for the mean of a Pascal distribution. Assume that each year, an individual $i$ is promoted to the next grade with a constant probability $\pi_i$ (so that $(1 - \pi_i)$ is the probability of repeating a grade). If we assume that $\pi_i$ is constant across years for each individual $i$, the probability of reaching level $s$ in $k$ years, with $k \geq \tau_s$, is given by the distribution of the number of independent trials needed to obtain $\tau_s$ successes¹². The expected duration of studies of level $s$ is simply $\tau_s/\pi_i$ (this is just the formula for the mean of the Pascal distribution). This suggests that, for each individual, $\pi_i$ is a measure of "speed", in fact a measure of ability. The recourse to Pascal probabilities is just a loose justification for this convenient model, where $d_s/\tau_s = 1/\pi_i$; Equation (8) obviously yields a linear regression form for the observed variable $\ln(d_s/\tau_s)$.
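The Pascal (negative binomial) argument can be checked numerically. The sketch below computes the probability of needing exactly $k$ years to accumulate $\tau_s$ promotions when each year ends in a promotion with probability $\pi_i$, and compares the numerical mean with $\tau_s/\pi_i$. The values of `tau_s` and `pi_i` are hypothetical and chosen only for illustration.

```python
from math import comb


def pascal_pmf(k: int, tau: int, pi: float) -> float:
    """Probability that exactly k yearly trials are needed to obtain tau promotions."""
    if k < tau:
        return 0.0
    return comb(k - 1, tau - 1) * pi**tau * (1.0 - pi) ** (k - tau)


def pascal_mean(tau: int, pi: float, k_max: int = 400) -> float:
    """Numerical mean of the distribution, truncated at k_max years."""
    return sum(k * pascal_pmf(k, tau, pi) for k in range(tau, k_max + 1))


# Hypothetical values: 12 grades to pass, 90% yearly promotion probability
tau_s, pi_i = 12, 0.9
mean_years = pascal_mean(tau_s, pi_i)
total_mass = sum(pascal_pmf(k, tau_s, pi_i) for k in range(tau_s, 401))
```

With these numbers the expected duration is $12/0.9 \approx 13.3$ years, so a 10% yearly retention risk adds about 1.3 expected years to a 12-year curriculum.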

¹² Pascal probabilities are given by

$$\Pr(d_s = k) = \binom{k-1}{\tau_s - 1}\, \pi_i^{\tau_s} (1 - \pi_i)^{k - \tau_s}.$$

With the help of this model, we can compute each individual's prediction of the average time needed to complete any degree of level $s$, different from his (her) observed highest level $s_i$. This is done simply by computing the expectation $E[d_s \mid \theta, X]$ for each $s$. Remark that these predictions depend on the assumption that the speed factor $1/\pi_i = \exp[X_{i2} \beta_2 + \sum_a \phi_a \chi_a + \theta_d + \zeta_i]$ doesn't depend on $s_i$, the observed highest level of $i$. The interpretation of this prediction formula is that, to predict individual $i$'s counterfactual durations (i.e., the average time needed by $i$ to complete level $s_i + 1$, for instance), we extrapolate, using the expected speed factor $E[(1/\pi_i) \mid \theta, X] = \exp[X_{i2} \beta_2 + \sum_a \phi_a \chi_a + \theta_d + \sigma_\zeta^2/2]$, which is itself estimated thanks to the observed highest degrees.
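The extrapolation step can be sketched as follows: a single expected speed factor, built from the log-normal mean formula, is applied to the theoretical duration of every level, including levels the individual did not reach. The argument `index` stands for the sum $X_{i2}\beta_2 + \sum_a \phi_a \chi_a + \theta_d$; this name, the dictionary of theoretical durations and all numbers are assumptions made for the example.

```python
import math


def expected_speed_factor(index: float, sigma_zeta: float) -> float:
    """E[1/pi_i | theta, X] = exp(index + sigma_zeta**2 / 2) for a normal shock zeta."""
    return math.exp(index + 0.5 * sigma_zeta**2)


def predicted_durations(tau_by_level, index: float, sigma_zeta: float):
    """Expected duration for every level s, observed or counterfactual."""
    factor = expected_speed_factor(index, sigma_zeta)
    return {s: tau * factor for s, tau in tau_by_level.items()}


# Hypothetical theoretical durations tau_s for four education levels
TAU = {0: 16, 1: 18, 2: 20, 3: 22}
d_hat = predicted_durations(TAU, index=0.03, sigma_zeta=0.1)
```

Because the factor does not depend on the level, the predicted durations are proportional to the theoretical ones, which is exactly the assumption that speed does not depend on $s_i$.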

Now, it may be that the age at grade 6 entry $a$ is endogenous in the above delay equation. We therefore add a model for this variable. The discrete values of $a$ are modeled by means of an ordered Probit equation,

$$\Pr(a) = \Pr(g_{a+1} \geq X_A \beta_A + \theta_A + \xi \geq g_a), \tag{9}$$

where $X_A$ is a vector of exogenous variables; $\beta_A$ is a vector of parameters; $\theta_A$ is the relevant component of type; the $g_a$ are "cut" parameters; and $\xi$ is a normal random noise term. Note that the age at grade 6 entry is observed by the agent and by the econometrician.
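For readers less familiar with ordered Probit models, the sketch below turns Eq. (9) into category probabilities, assuming a unit-variance normal shock and taking the lowest and highest cuts to be $-\infty$ and $+\infty$. The argument `index` stands for $X_A \beta_A + \theta_A$; this name and the cut values are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def age_category_probs(index: float, cuts):
    """Probability of each age category: Phi(g_{a+1} - index) - Phi(g_a - index)."""
    bounds = [-math.inf] + sorted(cuts) + [math.inf]
    cdf = [norm_cdf(b - index) for b in bounds]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]


# Three hypothetical categories (e.g. early, on time, late) separated by two cuts
probs = age_category_probs(index=0.2, cuts=[-1.0, 1.0])
```

The probabilities sum to one by construction, and a larger index shifts mass toward the higher categories.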

We have introduced four random disturbance terms, assumed normal. These random noise variables $(\nu, \varepsilon, \zeta, \xi)$ are assumed independent, with zero means and respective variances $(\sigma_\nu^2, \sigma_\varepsilon^2, \sigma_\zeta^2, \sigma_\xi^2)$. Given the above specification, we have the standard identification restrictions $\sigma_\nu = \sigma_\xi = 1$.

Finally, we specify risk aversion as follows:

$$\gamma = X_\gamma \beta_\gamma + \theta_\gamma, \tag{10}$$

where $X_\gamma$ is a vector of covariates and $\beta_\gamma$ is a vector of parameters.

The model doesn't describe borrowing and savings. But the implicit assumption that students are completely credit constrained should not be taken too literally. First, in the context of France at the end of the 20th century, tuition and fees are close to zero for most of the students. The latter are enrolled in public-sector institutions, essentially free of charge. Student loans are rare and the market for these loans is underdeveloped. The direct costs of education are financed through unobservable parental transfers and part-time jobs¹³. Parental occupation being a proxy for parental income, using the former variable, we can control for some of the factors that determine the students' budget constraints. But given our data, the effect of credit constraints in the usual sense cannot be identified¹⁴. A possible interpretation of our specification is therefore, as explained above, that we model individual choices between risky career profiles in which risk is essentially attached to the initial wage level. The utility function captures attitudes towards this type of risk. This is not the risk of unemployment. In contrast, it is clearly the ex ante risk affecting the student's future social position.

¹³ The institutional context is very similar in Italy (e.g., Belzil and Leonardi (2013)).

2.2 Education choices

Now, the choice of an education level $s$ is optimal for an individual with unobserved family factors $\varepsilon$ and type $\theta$ only if

$$\Delta V(s+1 \mid \theta, \varepsilon) \leq 0 \quad \text{and} \quad \Delta V(s \mid \theta, \varepsilon) \geq 0. \tag{11}$$

Our model is therefore determined by equations (5)-(10), giving wages, costs and delay as a function of observable variables, unobservable type-values and random terms, and by (11), which characterizes the optimal schooling choice as a function of expected skill premia $\Delta f_s$, observable characteristics $X$, type values $\theta$ and unobservable family factors $\varepsilon$. The students' wage-expectations are rational in the sense that they are based on a wage equation estimated with the help of the sample (so by assumption, the students know the true distribution of wages).

We show in Appendix 1 that several variants of this model can be estimated. One possibility is to let $T$ go to infinity and study the infinite horizon version of the model with a fixed value of $\delta < 1$. A second (unexplored) possibility would be to estimate $\delta$ in the infinite horizon model with a fixed value of the risk-aversion parameter¹⁵. In the following, we fix a finite value of $T$ and set $\delta = 1$. As explained above, given our data, there is no solid way of identifying the rate of time-preference. In the finite-horizon, undiscounted case, the maximum likelihood procedure can estimate (and thus identify) $\gamma$ in a relatively natural way.

¹⁴ For recent articles on the credit constraints faced by students and parental transfers, see Brown, Scholz and Seshadri (2012), and the survey by Lochner and Monge-Naranjo (2011).

¹⁵ The model is easily tractable if we set $\delta = 1$, $T < +\infty$ and utility is assumed logarithmic (i.e., $\gamma = 1$), but of course we want to estimate $\gamma$.

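It will be convenient to use the definition $\rho = \gamma - 1$. It is shown in Appendix 1 that the necessary condition for an optimal choice $s$, that is,

$$\Delta V(s+1 \mid \theta, \varepsilon) \leq 0 \leq \Delta V(s \mid \theta, \varepsilon)$$

is equivalent to

$$X_1 \beta_1 + k_s + c_s - \theta_c \leq \varepsilon \leq X_1 \beta_1 + k_{s+1} + c_{s+1} - \theta_c, \tag{12}$$

where by definition,

$$k_s = -\frac{1}{\rho} \ln\left(\frac{(T - \hat{d}_{s-1}) - (T - \hat{d}_s) \exp\big[-\rho\,(\Delta f_s - (\rho/2)\,\Delta\sigma_s^2)\big]}{\Delta \hat{d}_s}\right), \tag{13}$$

$$\hat{d}_s = E[d_s \mid \theta]; \qquad \Delta \hat{d}_{s+1} = E(\Delta d_{s+1} \mid \theta) = \hat{d}_{s+1} - \hat{d}_s; \tag{14}$$

and $\sigma_s = \sigma_s(\theta)$. The $\hat{d}_s$ functions are the student's expectations of the number of years needed to complete level $s$.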
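The cuts in (12) and (13) are easy to evaluate once expected durations, skill premia and wage variances are given. The following sketch computes $k_s$ and the interval of $\varepsilon$ values for which level $s$ satisfies the necessary condition (12). It assumes $\rho \neq 0$ (that is, $\gamma \neq 1$); the function names and all numerical inputs are hypothetical and are not estimates from the paper.

```python
import math


def cut_k(rho, T, d_prev, d_curr, delta_f, delta_sigma2):
    """k_s as in Eq. (13); d_prev and d_curr are expected durations for s-1 and s."""
    ratio = math.exp(-rho * (delta_f - 0.5 * rho * delta_sigma2))
    arg = ((T - d_prev) - (T - d_curr) * ratio) / (d_curr - d_prev)
    if arg <= 0.0:
        raise ValueError("the logarithm's argument in Eq. (13) must be positive")
    return -math.log(arg) / rho


def epsilon_interval(x1_beta1, theta_c, k_s, c_s, k_next, c_next):
    """Lower and upper bounds on epsilon from (12) for level s."""
    lower = x1_beta1 + k_s + c_s - theta_c
    upper = x1_beta1 + k_next + c_next - theta_c
    return lower, upper


# Hypothetical inputs: gamma = 0.7, horizon T = 60, two successive levels
RHO = 0.7 - 1.0
k_1 = cut_k(RHO, T=60.0, d_prev=18.0, d_curr=20.5, delta_f=0.15, delta_sigma2=0.01)
k_2 = cut_k(RHO, T=60.0, d_prev=20.5, d_curr=23.5, delta_f=0.12, delta_sigma2=0.01)
lower, upper = epsilon_interval(0.0, 0.0, k_1, 0.1, k_2, 0.3)
```

In this toy example a larger skill premium $\Delta f_s$ lowers $k_s$, so the corresponding level is reached for a wider range of $\varepsilon$, in line with the role of expected returns in the choice model.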

Education level $s$ is chosen by an individual only if her (his) unobserved factor $\varepsilon$ falls in the above interval. Therefore, conditional on type $\theta_c$, our theory has the structure of an ordered discrete-choice model¹⁶, but with a particular functional form imposed on the cuts $k_s + c_s$. The parameters of this ordered choice model are identified if, as usual in ordered Probit estimation, we impose the normalization $\sigma_\varepsilon^2 = 1$.

We assume that the logarithm's argument in the expression of $k_s$, i.e., (13), is positive. It has good chances of being positive if $-\rho\,\Delta f_{s+1} + (\rho^2/2)\,\Delta\sigma_{s+1}^2$ is negative, or positive, but small enough¹⁷. In practice, this has not posed many problems during the estimation phase.

¹⁶ The model can therefore be viewed as a structural generalization of Cameron and Heckman's (1998) ordered probit model.

¹⁷ If $\rho > 0$, this requires $\Delta f_{s+1}$ high enough and $\Delta\sigma_{s+1}^2$ small enough, i.e., in the risk-return, $(\sigma_s, f_s)$-plane, the slope of the risk-return curve should be steep enough. If, on the contrary, $\rho < 0$, we want the risk-return curve to be flat enough and $|\rho|$ should be small.

Using expression (8), we easily find analytical expressions for $\hat{d}_s$ and $\Delta \hat{d}_{s+1}$. Given the normality assumption and given that the age at grade 6 entry is observed by the student and the econometrician, we have,

$$\hat{d}_s = E(d_s \mid \theta) = \tau_s \exp[X_2 \beta_2 + \phi_a + \theta_d]\, \exp(\sigma_\zeta^2/2). \tag{15}$$

It follows that,

$$\Delta \hat{d}_{s+1} = E(\Delta d_{s+1} \mid \theta) = \Delta\tau_{s+1} \exp[X_2 \beta_2 + \phi_a + \theta_d]\, \exp(\sigma_\zeta^2/2), \tag{16}$$

and we make use of the fact that, under normality, $E(e^{\zeta}) = \exp(\sigma_\zeta^2/2)$. From these formulae, we can easily derive a closed-form expression for the crucial inequality $\Delta V(s+1 \mid \theta, \varepsilon) \leq 0$.
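The steps leading from (4) to the cut $k_s$ can be sketched as follows. This is only an outline under the assumptions already stated in the text ($\delta = 1$, durations and wages independent conditional on $\theta$, log-normal wages as in (5), and $\rho = \gamma - 1 \neq 0$); the shorthand $R_s$ and $A_s$ is introduced here for convenience and is not the paper's notation. The full argument is in Appendix 1.

```latex
\begin{aligned}
u(w) &= -\frac{w^{-\rho} - 1}{\rho}, \qquad
E\big[w_s^{-\rho} \mid \theta\big]
  = \exp\!\Big[-\rho\Big(E[\ln w_s \mid \theta] - \tfrac{\rho}{2}\,\sigma_s^2\Big)\Big], \\[4pt]
\Delta V(s \mid \theta, \varepsilon)
  &= E\big[(T - d_s)\,u(w_s) - (T - d_{s-1})\,u(w_{s-1})
     + \Delta d_s\, u(h_s w_{s-1}) \mid \theta, \varepsilon\big] \\
  &= -\frac{1}{\rho}\,E\big[w_{s-1}^{-\rho} \mid \theta\big]
     \Big[(T - \hat d_s)\,R_s - (T - \hat d_{s-1}) + \Delta\hat d_s\, h_s^{-\rho}\Big], \\[4pt]
R_s &= \exp\!\Big[-\rho\Big(\Delta f_s - \tfrac{\rho}{2}\,\Delta\sigma_s^2\Big)\Big], \qquad
A_s = \frac{(T - \hat d_{s-1}) - (T - \hat d_s)\,R_s}{\Delta\hat d_s}, \\[4pt]
\Delta V(s \mid \theta, \varepsilon) \ge 0
  &\iff \ln h_s \ge -\frac{1}{\rho}\,\ln A_s = k_s
  \iff \varepsilon \ge X_1\beta_1 + k_s + c_s - \theta_c .
\end{aligned}
```

The constant terms of $u$ cancel because $(T - \hat d_s) - (T - \hat d_{s-1}) + \Delta\hat d_s = 0$, and the equivalence in the last line holds for either sign of $\rho$, provided $A_s > 0$. Applying the same steps to $\Delta V(s+1 \mid \theta, \varepsilon) \leq 0$ gives the upper bound in (12).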

Inequalities (12) being only necessary conditions for an optimal education level $s$, we must also make sure that $c_s + k_s < k_{s+1} + c_{s+1}$ for all $s > 0$, to guarantee that the model's probability distributions are well-defined and that $s$ is indeed a maximum of $V$. This cut-monotonicity property will be satisfied if $\Delta f_{s+1} \leq \Delta f_s$ (i.e., "concave" returns), $\Delta d_{s+1} > \Delta d_s$ (i.e., "convex" opportunity costs), and $c_{s+1} \geq c_s$. But the latter conditions are not necessary for cut-monotonicity: this property can still hold when returns to education are increasing (i.e., if $\Delta f_{s+1} > \Delta f_s$), provided that they do not increase too much.

At this stage it is useful to remark that the model is essentially a static representation for a complex dynamic process that we do not observe. Everything is as if the students chose a level of investment $s$ at, say, the age of 13, clung to their project and bore the risks associated with variations in the time-to-credential $d_s$. Full sequential rationality would require the student to revise his plan and beliefs at the end of every year, deciding to continue or to quit next year, based on new information about success, failure and outside opportunities. Our data do not allow us to study this learning process, which is learning about one's own ability, because we do not observe the history of successes and failures¹⁸.

¹⁸ It would be possible to model the individual as choosing the duration of schooling $d$ and bearing the risk in the final outcome $s_d$. In this alternative model, the student decides to study during $d$ years, with the goal of earning the highest possible credentials during the planned period. Given that the education level $s$ is the measure of human capital here, it seems natural to choose $s$ as the decision variable, and consider the durations $d_s$ as random cost factors.

In essence, the model is a standard decision problem under uncertainty, based on plain expected utility theory, with intertemporally separable utility. We combine this with independence of durations and wages conditional on the unobservable types $\theta$. The conditional independence assumption is standard in the literature on unobserved heterogeneity. Then, using the CRRA specification of utility, we derive a mean-variance model, as shown by Eq. (12) and (13) above. A consequence of these standard assumptions is also that expected utility $V$ is linear with respect to expected durations $\hat{d}_s$, and, as noted by a referee, the variance of durations appears only indirectly, in the expression of conditional mean durations (Eq. (15) and (16)). Thus, we can exploit the inter-individual variations of $\sigma_\zeta^2$, $\sigma_s^2$, $\hat{d}_s$ and mean log-wages in estimation, but we are constrained by the specific form, derived from expected utility maximization¹⁹.

2.3 Identification

The identification of unobserved heterogeneity parameters and the exclusions needed for the identification of risk aversion are the interesting questions here. First of all, apart from the nonlinearities in the ordered choice equation determining education and the special structure implied by risk aversion, the model is a fairly standard collection of linear regressions and Ordered Probits. Hence, insofar as finite Gaussian mixtures are identified²⁰, the log-wage, the delay, and the age at grade 6 equations would be identified, equation by equation. If we now look at the model as a whole, our main identifying assumption is that the four error terms are independent normal variables, conditional on the type. In other words, the correlations between error terms in the various equations are assumed to be captured entirely by the types $j = 1, \ldots, K$ and their type values. If we put the covariates $X$ aside for a moment, the model has 4 random sources (giving rise to the four equations): log-wage, delay, education and age at grade 6. If we observe $Q$ random sources, we can estimate (at least) $Q$ means and $Q(Q+1)/2$ variances and covariances. When $Q = 4$, this amounts to 14 parameters. Suppose now that, in the same context, unobserved heterogeneity is modeled with the help of $K$ latent classes (plus independent normal noise). We need to estimate $(2 + Q)K$ parameters of the type values: there are $QK$ parameters shifting the constants in the $Q$ equations and we add $2K$ parameters for the type values entering the variance of wages (expression (6) above) and the risk aversion model (expression (10) above). To this we add $K - 1$ parameters because we also estimate the probabilities $p_j$ of the $K$ latent types and finally, one additional parameter because we estimate the variance of delay. In total, there are $(Q + 3)K = 7K$ parameters to be estimated, given that $Q = 4$. We find that $7K \le 14$ if and only if $K \le 2$, so that, in principle, we can estimate a model with two types without demanding more from the data than a very basic approach; and of course, we can in principle demand more, because we considered the first two moments of our random sources only. This provides some intuition for the reason why the model is identified²¹. In practice, we have been able to estimate three types with the full sample. A fourth type can be identified, but it has a very small probability. The identification is parametric, and rests on the finite mixture assumption and on the independence of the four error terms. It may be that a deeper identification result can be proved for the unobserved heterogeneity in the entire system of equations, but in this type of model, nonparametric identification is an open research question, out of the scope of the present paper²².

¹⁹ The means and variances of $\ln(w)$ and of log-delay appear in the choice model through the parameters $f_s$, $\hat{d}_s$, $\sigma_s^2$ and the delay variance. The variance of durations plays a role in expressions (15), (16) because durations are log-normal random variables. As we will see below, the delay variance plays a genuine role.

²⁰ See, e.g., McLachlan and Peel (2000), Jiang and Tanner (1999), Geweke and Keane (1997).
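The counting argument above can be checked mechanically. The following Python sketch is only an illustration: the function names `n_moments`, `n_type_parameters` and `max_types` are hypothetical and not part of the paper. It compares the number of first and second moments of $Q$ random sources with the $(Q+3)K$ parameters of a $K$-type mixture.

```python
def n_moments(q: int) -> int:
    """Q means plus Q(Q+1)/2 variances and covariances."""
    return q + q * (q + 1) // 2


def n_type_parameters(q: int, k: int) -> int:
    """(2+Q)K type values, K-1 type probabilities, 1 delay variance."""
    return (2 + q) * k + (k - 1) + 1


def max_types(q: int) -> int:
    """Largest K whose parameter count does not exceed the moment count."""
    k = 0
    while n_type_parameters(q, k + 1) <= n_moments(q):
        k += 1
    return k


if __name__ == "__main__":
    # With Q = 4 random sources, as in the text.
    print(n_moments(4), n_type_parameters(4, 2), max_types(4))
```

This is a count of first and second moments only; as the text notes, higher moments can in principle carry more information.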

The other nontrivial point is the identification of risk aversion parameters in the education choice equation. Remark first that the risk-aversion parameter can be interpreted in two ways. We use a standard structural model of choice. In this model, the CRRA risk-aversion parameter and the intertemporal elasticity of substitution, which is its inverse, are two aspects of the same thing. Standard economic theory uses this work-horse model in which a single parameter measures two apparently different things. Given this, if we identify risk aversion, we also identify the intertemporal substitution parameter, and conversely. Thanks to its role as a measure of intertemporal substitution, the parameter can be identified by variation in duration across individuals (due to factors such as the age at which school starts) even if there is no uncertainty in wages or durations. Since the data also exhibit variations in the riskiness of wages across individuals, we can exploit both the risk-aversion and intertemporal-substitution roles of this parameter in estimation.
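For reference, the double role of the parameter can be written out under the usual CRRA form. The notation below is an assumption made for illustration ($\gamma$ for the CRRA coefficient, $c$ for consumption); it is the textbook time-separable specification and may differ from the paper's own symbols.

```latex
u(c) = \frac{c^{1-\gamma}}{1-\gamma}, \quad \gamma \neq 1,
\qquad
-\frac{c\,u''(c)}{u'(c)} = \gamma ,
\qquad
\text{EIS} = \frac{1}{\gamma}.
```

In this form the same coefficient governs the curvature of $u$ across states of nature (relative risk aversion) and across dates (the inverse of the intertemporal elasticity of substitution).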

Although we offer no formal proof, it is easy to provide intuition for the reason why the risk-aversion function is identified. First of all, the risk-aversion parameter is identified through variations of $k_s$ that are sufficiently independent of variations in $X_1$. An inspection of expression (13) shows that variations of $k_s$ across individuals have two causes: inter-individual variations in the riskiness of wages $\sigma_s$ and variations in the predicted durations $\hat{d}_s$. It may seem surprising at first glance, but the riskiness of wages is not essential, because the other source of risk, due to delays, is able to identify risk aversion alone. To be more precise, variations in the individual's prediction of his school-leaving age, namely, variations in the $\hat{d}_s$, play the crucial role. Indeed, we estimated a variant of the model in which the variance of wages didn't depend on individual characteristics, and risk aversion was still identified. If we go back to expressions (15) and (16) above we see that $\hat{d}_s$ and $\Delta\hat{d}_s$ both vary with observed covariates $X_{i2}$ (including age at grade 6) and therefore vary with the individual $i$. Being expectations, $\hat{d}_s$ and $\Delta\hat{d}_s$ do not depend on the individual's error term, so the identifying power comes from variability in the observed covariates²³ $X_2$. But one variable at least must be excluded from the list of covariates in the education cost function, i.e., from $X_1$.

²¹ It is of course possible to write down the 14 equations with 14 unknowns.

²² See, e.g., Bonhomme, Jochmans and Robin (2013).

To clarify this point, we now show that a linear approximation of the education choice model can in theory be identified by means of just one reasonable exclusion (i.e., with the help of one instrument). Let us consider a simplified, linear specification of the model with two equations (the other two equations could be added without difficulty and without any essential change in the reasoning). Suppose that the delay equation was specified as follows,

$$\ln(d_i/\tau_{s_i}) = Z_i b_2 + A_i a_2 + \eta_i,$$

where $a_2$ and $b_2$ are the coefficients of age at grade 6 entry, denoted $A_i$, and of a vector of instruments, denoted $Z_i$, respectively. Estimation of the delay equation yields estimates $\hat{a}_2$, $\hat{b}_2$, and $\hat{\sigma}_\eta^2$. Assume now that the choice of an education level directly depends on $(A_i, Z_i)$ and also indirectly, through the expectation $E[(d_i/\tau_{s_i}) \mid s]$, as follows,

$$\Pr(s_i = s) = \Pr\{c_s \le Z_i b_1 + A_i a_1 + \gamma \ln E[(d_i/\tau_{s_i}) \mid s] + \varepsilon_i \le c_{s+1}\}.$$

Substituting the expression for $\ln E[(d_i/\tau_{s_i}) \mid s]$, we obtain,

$$\Pr(s_i = s) = \Pr\{c_s \le Z_i(b_1 + \gamma b_2) + A_i(a_1 + \gamma a_2) + \gamma(\sigma_\eta^2/2) + \varepsilon_i \le c_{s+1}\}.$$

We get estimates of the latent-index coefficients $\hat{\pi}_A$ and $\hat{\pi}_Z$, on $A_i$ and $Z_i$ respectively. Parameter $\gamma$ (and the whole model) is therefore identified if we can solve the system

$$\hat{\pi}_A = a_1 + \gamma\hat{a}_2, \qquad \hat{\pi}_Z = b_1 + \gamma\hat{b}_2,$$

with respect to $(a_1, b_1, \gamma)$. It is immediate that one exclusion is needed: either $a_1 = 0$ or one coordinate of $b_1$ should be set equal to zero²⁴. In this model, as remarked by a referee, any variable explaining education costs should also normally explain delay. It follows that a natural exclusion is to set $a_1 = 0$: the age at grade 6 should be excluded from the list of variables determining education costs directly. This exclusion is natural, given that our education costs are incurred in secondary and higher education. The age at grade 6 is a source of variation of delay that is predetermined and should not directly affect the subsequent costs of education²⁵. We will see that the age at grade 6 dummies do indeed very significantly affect delay, with the expected sign.

²³ In addition, the individual predictions given by (15) and (16) vary with $s$ and with the individual's type.
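The role of the exclusion restriction in the simplified system can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not the paper's estimation procedure: the function name, the argument names (`pi_A`, `pi_Z`, `a2_hat`, `b2_hat`), the name `g` for the coefficient on the log of expected delay, and the numbers in the usage example are all hypothetical. With $a_1 = 0$ imposed, that coefficient is recovered from the age-at-grade-6 coefficients, and $b_1$ follows.

```python
from typing import List, Tuple


def solve_with_exclusion(
    pi_A: float, pi_Z: List[float], a2_hat: float, b2_hat: List[float]
) -> Tuple[float, List[float]]:
    """Solve pi_A = a1 + g*a2_hat and pi_Z = b1 + g*b2_hat under a1 = 0.

    g is a hypothetical name for the coefficient on log expected delay.
    """
    if a2_hat == 0.0:
        # Age at grade 6 must shift delay, otherwise g is not recoverable.
        raise ValueError("a2_hat must be non-zero for identification")
    g = pi_A / a2_hat
    b1 = [pz - g * b2 for pz, b2 in zip(pi_Z, b2_hat)]
    return g, b1


if __name__ == "__main__":
    # Made-up values generated from g = 2 and b1 = [0.3, -0.1].
    g, b1 = solve_with_exclusion(1.0, [0.7, 0.7], 0.5, [0.2, 0.4])
    print(g, b1)
```

Without the restriction, the system has one more unknown than equations (one equation per covariate, plus the extra coefficient), which is why exactly one exclusion is needed in this linear variant.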

In theory, the exclusion of $A$ from the cost function should be sufficient for identification, but in practice, it was not enough: the maximization algorithm converged and we got parameter estimates, but the Hessian matrix could not be inverted (we couldn't obtain standard deviations for the estimates). This weak identifiability problem explains why we also excluded some cost-shifters from the delay equation. To be more precise, we excluded the distance-to-college and school-density instruments from $X_2$, as explained below. A possible interpretation for this more restrictive specification could be that delay is a measure of an individual's ability that is relatively independent of the individual's environment. In other words, the exclusion is justified if delay is mainly determined by cognitive skills, and not affected by observed real-life impediments like transportation costs.

The simplified variant studied above shows that the model would be identified in a semi-structural, linear framework and hence, that the identification of a single risk-aversion parameter would not fundamentally rely on functional form. But clearly, we still need to estimate the effect of observed family-background characteristics $X$ and that of unobserved types on risk aversion. This is permitted by the nonlinear functional form of the $k_s$ functions. But the nonlinear structural form is not arbitrary; it is entirely dictated by economic theory²⁶. However, these refinements are not essential. Our main results do not depend on the existence of heterogeneity in risk aversion, obtained in this manner. Subsample estimation shows that risk aversion parameters do vary with family background, as will be seen below. In the supplementary material appendix, we estimate a simpler variant of the model in which risk aversion doesn't depend on unobserved heterogeneity. Our dataset is rich and many things are possible.

²⁴ Of course, the coefficient on the log of expected delay is only related to risk aversion in this simplified model: it is not the risk-aversion index itself.

²⁵ Note that, in contrast, $A$ is not excluded from the wage equation.

²⁶ This has the obvious advantage that structural parameters, such as risk aversion, have a clear interpretation.

Finally, $A$ is potentially endogenous, since it may capture some aspects of unobserved "ability". This is why we jointly estimate an equation explaining $A$ itself, in which the month of birth is used as an instrument²⁷ (equation (9) above). The reasons why the month of birth has a positive impact on age at grade 6 are complex. There is a recent literature on these relative maturity effects²⁸. The main reason why this impact is positive and significant is probably that students who are relatively older in their first grades simply tend to have better performances, all other things equal. In any case, our results are robust to the introduction or deletion of the $A$ equation. In other words, with our data, it happens that age at grade 6 can be treated as exogenous without any essential change in the results. The addition of an equation explaining $A$ doesn't seem to change the type-vector distribution in an essential way. Given that some important social and family-background controls have been introduced in the delay equation, it is likely that the effect of $A$ on delay is estimated with little bias.

In the appendix, we also report on an attempt at estimating a discount factor by grid-search on a slightly simplified version of the model. This yields very high values, between 0.995 and 1²⁹. In any case, as explained above, there is no good empirical basis in our data to identify time preference, since we observe wages at the beginning of a worker's career only.

²⁷ The month of birth variable ranges from 1 for students born in January, to 12 for students born in December. To simplify the model, the age at grade 6 equation is a simple probit, instead of an ordered probit. We explain values of this age higher than 12 (or smaller than 12) by means of a probit.

²⁸ See, e.g., Bedard and Dhuey (2006), Grenet (2008), Mahjoub (2008).

²⁹ See Appendix 6.

3 Data

To perform the estimations presented below we used "Génération 92", a large scale survey conducted in France. The survey and associated data base have been produced by CEREQ (Centre d'Etudes et de Recherches sur les Qualifications), a public research agency, working under the aegis of the Ministry of Education³⁰. Génération 92 is a sample of 26,359 young workers of both sexes, whose education levels range from the lowest (i.e., high-school dropouts) to graduate studies, and who graduated in a wide array of sectors and disciplines. Observed individuals have left the educational system between January 1st and December 31st, 1992³¹. They have left the educational system for the first time, and for at least one year in 1992³². The labor market experience of these individuals has been observed during 5 years, until 1997. The survey provides detailed observations of individual employment and unemployment spells, of wages and occupation types, as well as geographical locations of the students at the age of entry into junior high-school (roughly 11), and in 1992, when they left school. The personal labor-market history of each survey respondent has been literally reconstructed, month after month, during the period 1993-1997, by means of an interview. Before 1992, the individual's educational achievement is also observed.

In the following, we distinguish education from years of education. A degree, or a group of degrees, is called an education level. Education levels are dummy variables; they are used below instead of the years-of-schooling to measure human capital and to estimate returns to education. We have created 6 education levels: (i) the high-school dropouts; (ii) the vocational high-school degree holders³³; (iii) those who passed the national high-school diploma, i.e., the baccalauréat³⁴; (iv) two years of college³⁵; (v) four years of college; (vi) graduate studies (5 years of higher education or more).

Some people finish school more quickly than others, given their highest degree³⁶. First define as normal age the "normal" number of years needed to reach the individual's grade, sit the exam and earn the corresponding degree (if there is one): it is a conventional age, associated with each individual's school-leaving degree. For a given degree or certificate, normal age is thus the age of those who earned this degree or certificate without any grade repetition or delay of any kind. It is not the average completion age. For instance, (i) the high-school dropouts have a normal age of 13 years; (ii) the vocational high-school degree holders have a normal age of 16 or 18 years, depending on the category of their certificate; (iii) those who passed the national high-school diploma have a normal age of 18; (iv) two years of college correspond to a normal age of 20, and so on. A substantial part of the variance of school-leaving age, conditional on education level or degrees (see Appendix 2), happens to be due to grade retention. Grade repeaters are quite common, even in college³⁷. Delays are thus generated by grade repetitions in primary, secondary and higher education. The efficiency of grade repetitions in primary and secondary education is of course a hotly debated issue, but until today, the institution has survived. To measure inter-individual differences in time-to-degree, we then created a variable called delay, defined as observed school-leaving age divided by normal age³⁸. We also observe the individual's age at grade 6 entry, allowing us to measure the delay accumulated during primary education³⁹.

³⁰ Articles and descriptive statistics, concerning various aspects of the survey, are available at www.cereq.fr.

³¹ To fix ideas, the number of inhabitants of France who left school for the first time in 1992 is estimated to be of the order of 640,000.

³² They did not return to school for more than one year after 1992, and they had not left school before 1992 except for compulsory military service, illness, or pregnancy.

³³ i.e., the so-called Certificats d'Aptitude and Brevet d'Etudes Professionnelles.

³⁴ Grade 12 students in the US correspond (roughly) to the French classe terminale, and the students of this grade sit an examination called baccalauréat. There exist vocational variants of the diploma.

³⁵ The corresponding exam is called DEUG (Diplôme d'Etudes Universitaires Générales), which is the equivalent of an Associate's degree, or DUT (Diplôme Universitaire de Technologie). There are exams at the end of each of the college years in French universities, and the DEUG or DUT correspond to the end of grade 14.

³⁶ We have studied this variability in depth in another paper, Brodaty et al. (2010).
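The construction of the delay variable described above can be sketched in Python. This is an illustration only: the function name `delay` and the dictionary keys are hypothetical labels, and the dictionary holds only the normal ages quoted in the text (the remaining levels are not listed there and are deliberately left out).

```python
import math

# Normal ages quoted in the text; the labels are illustrative assumptions.
NORMAL_AGE = {
    "dropout": 13,
    "baccalaureat": 18,
    "two_years_college": 20,
}


def delay(school_leaving_age: float, normal_age: float) -> float:
    """Delay = observed school-leaving age divided by normal age."""
    if normal_age <= 0:
        raise ValueError("normal_age must be positive")
    return school_leaving_age / normal_age


if __name__ == "__main__":
    # A student who obtained the baccalaureat at 19 instead of 18.
    d = delay(19, NORMAL_AGE["baccalaureat"])
    print(round(d, 4), round(math.log(d), 4))
```

A student with no grade repetition has a delay of exactly 1, so the logarithm of delay is zero in that case and positive for late finishers.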

Each individual's curriculum on the job market is an array of data including a number of jobs, with their corresponding wages and durations in months, and unemployment spells, again with a length in months. To estimate the returns to education, we rely on a single, scalar index of earnings for each worker. We simply take the arithmetic average of the full-time wages earned during full-time employment spells, weighted by their respective spell durations. In the following, this index is called the mean wage⁴⁰. For descriptive statistics and further details⁴¹, see Appendix 2. We also observe the student's age at grade 6 entry, and the student's month of birth (which will be used as an instrument for age at grade 6).

Part of our covariates are based on data with a geographical structure. Using a file from the National Geographical Institute⁴², we obtained a measure of local population density in the town of residence at the age of grade 6 entry, which we use as an additional control. A number of other variables are based on inter-county variation, where by county we mean the French département⁴³. In particular, we constructed a battery of school-opening instruments for education, using a file from the Ministry of Education (the Base Centrale des Etablissements) which lists all high-school and two-year college openings in the country since the 1950s. The file enables one to distinguish between vocational and general high-schools. In France, the 1980s witnessed a rapid growth in the number of vocational high-schools⁴⁴ (i.e., to be precise, of the lycées professionnels and lycées techniques). Both curves are strongly increasing and correlated in the 1970s and 1980s. Interestingly, the data exhibit a substantial degree of inter-county variability in the stock of vocational high-schools, per capita of 15-to-19-year-olds⁴⁵. In a recent paper, Currie and Moretti (2003) used the same kind of per capita school-opening measure, evaluated in the years when the individual was at a crucial age, say 17 or 18. Here, given the structure of our data, we must avoid a potential problem of negative correlation of the individual's education with the high-school stock. This correlation would simply reflect the fact that educated students are older at the end of their studies and therefore experienced an environment with fewer high-schools during their teens. To avoid this problem, we have chosen to fix the year at which the stock is evaluated. The choice of 1982 as a fixed point in time, ten years before the school-leaving year of students, characterizes the school-supply environments, roughly around the age of junior high-school entry. When used to explain years of schooling or the highest degree, the instruments based on vocational high-school openings in each county happened to be the strongest⁴⁶. Further details on the sample and sample selection are provided in Appendix 2.

³⁷ Freshmen repeating the first and second years of college are quite common.

³⁸ For instance, an individual who finished high school and passed the national examinations (i.e., the baccalauréat) at the age of 19 has a delay of 19/18 = 1.055, and the average age of those who left school with a baccalauréat is 20.78. The national high-school diploma is required for admission to colleges (i.e., Universités) in France. We adopt the following convention: a person who passed the baccalauréat at the age of 18 and spent two years in college but failed to pass an Associate's or any equivalent degree has a normal age of only 18 (which corresponds to that person's highest degree) and would have a delay of 20/18 = 1.111. Figure A2, in Appendix 2, depicts the empirical distribution of the logarithm of delay.

³⁹ The age at grade 6 entry depends on family background in a striking way. Figure A6, in Appendix 2, shows that the average value of the variable is much lower for the sons of professionals than for other male students.

⁴⁰ We have studied other statistics: the wage earned by the individual in his first full-time job, and the last wage, earned in the last full-time job observed. We also computed measures of earnings which take unemployment spells and unemployment benefits into account. The results obtained are similar and are not reported here.

⁴¹ On top of this, the survey provides information on family background. The father's and the mother's occupation in 1992 and the father's and the mother's education are the most important of these variables. We know the geographical location of the student's family at the age of junior high-school entry and the student's location at school-leaving age (i.e., in 1992). Location is rather precise since we know the code of each commune, and there are more than 36,000 communes in France.

⁴² i.e., Institut Geographique National.

⁴³ There are 95 départements in France.

⁴⁴ We used INSEE Census data to obtain the population shares of age groups. Figure A3 shows the historical development of the national stock of such schools, and displays a per capita version of this measure, namely, the stock divided by the number of 15-to-19-year-olds in the county.

⁴⁵ The density of this county-level per capita measure in the year 1982 is depicted on Figure A4.
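The mean wage index defined earlier in this section (the average of full-time wages weighted by spell durations) can be sketched as follows. The function name `mean_wage`, the tuple layout and the example figures are assumptions made for illustration; they do not describe the actual layout of the survey files.

```python
from typing import Iterable, Tuple


def mean_wage(spells: Iterable[Tuple[float, float]]) -> float:
    """Duration-weighted average of full-time wages.

    Each spell is (wage, duration_in_months) for one full-time job;
    unemployment spells are simply not passed in.
    """
    spells = list(spells)
    total_months = sum(months for _, months in spells)
    if total_months <= 0:
        raise ValueError("no full-time employment spell observed")
    return sum(wage * months for wage, months in spells) / total_months


if __name__ == "__main__":
    # Hypothetical worker: 12 months at 1000, then 24 months at 1500.
    print(mean_wage([(1000.0, 12), (1500.0, 24)]))
```

Weighting by duration means that a short, unusually well-paid spell moves the index less than a long one, which is the point of using a single scalar per worker.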

Finally, we used a distance-to-college instrument in combination with (or as an alternative to) school density. Based on detailed information about the geographical location of students at the time of grade 6 entry, the distance (in kilometers) between the individual's residence, at this particular moment of his life, and the nearest college (i.e., the nearest université) can be computed. This measure of distance really varies at the individual level.

4 Estimation Results

The model has been estimated with the help of the data set presented above, by straightforward Maximum Likelihood. It is easy to describe the estimation method: all equations have been estimated simultaneously using a likelihood maximization algorithm, as described in textbooks⁴⁷. The sample includes 12,310 young men. The model has been estimated with the entire sample first. We then used two subsamples to check for possible differences in risk aversion and other parameters in two social subgroups: the sons of professionals, and the rest of the population. Appendix 4 provides simulations allowing an assessment of the model's goodness of fit: the model performs well in that respect. We describe the main results below.

⁴⁶ Further details and tests on these variables are given in Brodaty et al. (2010). Now, one might argue that it is not the stock of high schools itself that plays a role, but its growth rate or first difference. We then also computed the variation of our county-level stocks of vocational schools between two fixed points in time, namely between 1982 and 1989, and used the variation as an instrument for education. These years cover the relevant time span during which most of our students were teenagers. Again, with this definition, the years at which temporal variations are evaluated do not depend on the individual's age; these variations depend only on the individual's county of residence at the age of junior high-school entry. We used these instruments in a preliminary version of the present paper: this doesn't lead to substantially different results.

⁴⁷ In particular, neither the use of an EM algorithm, nor inefficient equation-by-equation methods that would decompose estimation in steps, were necessary, except to obtain good preliminary estimates of most parameters.

4.1 Results obtained with the whole sample

We start with the choice of the number of types⁴⁸. The model has first been estimated without any unobserved heterogeneity, i.e., with no types; the log-likelihood is equal to $-1.1086$ for this variant; results are given in Table X1, which is displayed in the supplementary material section of this paper⁴⁹. The log-likelihood increases to $-0.9092$ when two types are introduced (see Table 1), and the model without unobserved heterogeneity is clearly rejected by the Likelihood ratio test (i.e., to be precise, by the Vuong test; see Vuong (1989)). Table 1 is described below and has the same structure as the tables presenting variants in the supplementary material section. It is possible to estimate a third type with the whole sample, the probabilities of the three types being (0.16, 0.46, 0.38). The third type raises the log-likelihood again, to $-0.8503$. The results for the three-types case are presented in Table X2, in the supplementary material section. A fourth type cannot be estimated in a useful way, since the probability of this fourth type is always very close to zero. We nevertheless chose to present the model with two types for the following reasons: first, with the complete sample, the differences in estimated parameters between two and three types are limited, and second, this permits a full comparison with the subsample estimates, since subsamples do not allow us to estimate three types. We conclude that introducing two or three types improves our description of the data, but this doesn't completely change the picture: there are some differences in estimated returns to education and in the variance of wages, but one cannot say that the main qualitative results are driven by a special way of modeling unobserved heterogeneity⁵⁰.

To push this robustness check further, another variant of the model has been estimated, this time with a limited effect of unobserved types. The results are displayed in Table X3, in the supplementary material section. In the latter variant, the types intervene in the log-wage and schooling choice equations only, but the other equations do not vary with types. This variant, with a log-likelihood equal to $-1.1020$, is not much better than the variant with no types at all. Again, changes in the other coefficient estimates are limited.

⁴⁸ Recall that each type is a vector of six given "constants".

⁴⁹ See also the corresponding author's web site.

⁵⁰ Types add flexibility by allowing the random disturbances to become mixtures of normal distributions instead of plain normal distributions. They of course also permit a treatment of endogeneity, because random disturbances are no longer independent.

We now comment on Table 1. First of all, the model estimated here is a slightly simplified version of the model described above, in which the variance of log-delay is a constant⁵¹. The results for the more sophisticated version of the model, in which this variance varies with some covariates, are presented in Table 1B (which is commented on below). In any case, the differences between the two variants are very small. Table 1 is divided into two panels, Table 1a and Table 1b, presenting the estimation results for the complete sample and two types⁵². From left to right, Table 1a gives the estimated coefficients for the log-wage equation, the variance of log-wages, and risk aversion. The t-statistics are given to the right of each coefficient estimate. The wage equation shows the significance of family background and lists the estimated $\Delta f_s$, the returns to education levels, which are very precisely estimated. Given that each level $s$ takes around 2 years, the estimated returns are between 5 and 8% per year, and they are somewhat decreasing. These values are in line with most of the literature on the subject. The last two lines of the first column give the estimated type values of the wage equation for types 1 and 2.

Next, we find the estimated parameters of the function explaining the variance of wages. Family background is no longer significant, but wage riskiness is clearly increasing with education. The reported figures are the estimated values of $\Delta\sigma_s^2/\sigma_{s-1}^2$, so they can be read as percentage increases. These values are substantial and highly significant⁵³. When expressed in standard-deviation terms, the wage-risk increase due to education remains non-negligible ($\Delta\sigma_s/\sigma_{s-1}$ is very roughly one half of $\Delta\sigma_s^2/\sigma_{s-1}^2$ for small values of $\Delta\sigma_s^2$). The last two lines of the third column give the estimated values of the type coordinates of the wage-variance function for types 1 and 2.

The two rightmost columns of Table 1a display the risk aversion function estimates. The father-went-to-college dummy is significant and increases risk aversion. This result will be confirmed below by subsample estimations of the model. The last two lines of the fifth column give the estimated values of the type coordinates of the risk-aversion function for types 1 and 2. Note that these parameters are very precisely estimated. It follows that risk aversion is 0.654 for a type 1 and 0.721 for a type 2; add 0.018 when the father went to college. The differences between the values are highly significant. We thus find a moderate value of risk aversion, the utility $u$ exhibiting less risk aversion than the logarithm but more than the square root.

⁵¹ In other words, the model specified by equation (8b) above is replaced by a constant coefficient.

⁵² Tables X1 to X7 in the supplementary material section have a similar structure.

⁵³ To have a better view, we express these changes in terms of percentage increase in the standard deviation. Some elementary algebra yields,

$$\frac{\Delta\sigma_s}{\sigma_{s-1}} = -1 + \sqrt{1 + \frac{\Delta\sigma_s^2}{\sigma_{s-1}^2}}.$$

Precise computations show that if $\Delta\sigma_s^2/\sigma_{s-1}^2 = 0.53$ then $\Delta\sigma_s/\sigma_{s-1} = 0.23$; $\Delta\sigma_s^2/\sigma_{s-1}^2 = 0.3$ yields $\Delta\sigma_s/\sigma_{s-1} = 0.14$ and $\Delta\sigma_s^2/\sigma_{s-1}^2 = 0.16$ yields $\Delta\sigma_s/\sigma_{s-1} = 0.07$.
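The conversion from a relative increase in the variance of wages to a relative increase in their standard deviation can be reproduced with a short Python sketch. The function name `sd_increase` is a hypothetical label; the three inputs are the variance increases quoted in the footnote on this conversion.

```python
import math


def sd_increase(var_increase: float) -> float:
    """Relative increase in the standard deviation implied by a relative
    increase in the variance: -1 + sqrt(1 + var_increase)."""
    if var_increase < -1.0:
        raise ValueError("variance cannot fall by more than 100%")
    return math.sqrt(1.0 + var_increase) - 1.0


if __name__ == "__main__":
    for x in (0.53, 0.30, 0.16):
        print(x, round(sd_increase(x), 3))
```

The results agree with the figures quoted in the text up to rounding or truncation in the second decimal, and for small increases they are close to one half of the variance increase, as the first-order expansion of the square root suggests.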

Table 1b displays the results for the remaining three equations of the model, namely, the education, delay and age-at-grade-6-entry equations. The first column of Table 1b lists the estimated parameters of the ordered probit on education levels. Family background has a significant impact on education. These results confirm a number of well-known facts⁵⁴. It is interesting to note that our instruments for education are reasonably strong: three out of four of them have a very significant impact on education. The 4 variables are excluded from the other equations. The first instrument is based on measures of local high-school supply as described above. France has two kinds of vocational high schools: the lycées techniques and the lycées professionnels. We translate these two categories into English as, respectively, the technical and vocational high schools, to simplify the presentation. To be more precise, the first instrument is based on the stocks of vocational and technical high schools per capita of 15-to-19-year-olds in the year 1982, in the county of residence at the age of grade 6 entry⁵⁵. The first two distance-to-college dummies, indicating the second and third quartiles of the distance distribution, are also significant, with the expected sign⁵⁶.

The cost parameters $c_s$ are also precisely estimated, and clearly increasing with $s$. The rest of the variation of education is due to the variability of schooling durations $d_s$, wage risk $\sigma_s$ and risk aversion, through the functions $k_s$.

The third column of Table 1b gives the results for the delay equation (i.e., for school-leaving age divided by normal age). The social-background controls have a significant impact on delay, more or less as expected: educated parents reduce delay. A larger age at grade 6 entry very significantly increases delay.

⁵⁴ We would have obtained a more detailed account of the impact of family background and of environmental variables on education with additional controls, but we have kept a small number of key variables only, to lighten the computational burden of ML estimation.

⁵⁵ The instrument is a dummy, taking value 1 when the stocks of vocational and technical high schools, divided by population aged 15-19, in the county of residence at grade 6 entry, are both greater than their median values.

⁵⁶ We know from Brodaty et al. (2010) that, as noted by a referee, the distance-to-college instruments have more impact on the education of the sons of professionals, while the school-supply or school-opening instruments work better with the rest of the sample.

The reader may be worried by the fact that the stock-of-high-schools instrument varies only at the county (i.e., département) level (there are more than 95 counties, though). This is, among others, a reason why we added the distance-to-college instrument⁵⁷. The distance varies at a much finer level, because we know the town (i.e., the commune) of residence of individuals in 1982, and the measure of distance is based on this commune of residence. There are 5,920 communes in the sample and 12,310 individuals. This yields roughly 2 students per commune. In the sample, 98% of the communes are represented by fewer than 10 individuals; 67% of the sampled individuals were located in communes including fewer than 5 other sample members; 90% of the sampled individuals are located in communes with fewer than 20 other sample members. To check for robustness, we re-estimated the model completely with different sets of instruments for education levels. In the supplementary material appendix, Table X4 shows a variant in which the stock-of-high-schools variable is not used (only the distance variable). Table X5, on the contrary, uses the stock-of-high-schools variable only. Table X6 shows a complete re-estimation of the model, based on the subsample of individuals that is obtained when we remove all communes that are drawn more than 20 times. This amounts to removing 10% of the communes, and this trimming leaves a sample with 11,107 students (90% of the initial sample). Table X7 gives an estimation of the model obtained while removing the controls for geographical characteristics (like the density of population). These variants do not show substantial changes in the precision with which parameters are estimated. For instance, adding the distance instrument seems to increase the standard errors of some estimates, but only slightly. The results of all these variants are in essence very close. We conclude that biases due to a potential clustering problem are not a concern here.

57 We also added the density of population in the commune as a control in the education and wage equations.

Table 1B presents a close variant of the model described by Table 1, in which the variances of log-wages and of log-durations vary with the age at grade 6 entry (as specified by Eq. (8b)). Table 1Ba is similar to Table 1a with one difference: in the middle columns, we now see that the variance of log-wages varies, very significantly, with age at grade 6. The latter age variable does not play an important role in the formation of expected wages but, in contrast, has a significant impact on conditional wage risk. The lower panel, Table 1Bb, has additional lines reporting the values of the coefficients $\omega_a$; to be more precise, they give the impact of the age-at-grade-6 dummies on the standard deviation of durations, expressed in standard-deviation units (age 11 being the reference). This impact is very significant too. With this model, we can clearly have variations in the mean of log-durations that are independent of variations in the variance of log-durations⁵⁸. Table 1B shows that the variances of delay and wages vary at the individual level, and that this inter-individual variability can be used to improve the identification of the education-choice model and of risk aversion, as discussed above. In practice, however, although it is reassuring to see that fine variations in risk play a genuine role in the identification of the risk-aversion parameters, the numerical differences with the (slightly) simpler version are small. This is why, in the following, we use the slightly simplified version as our benchmark. Intuitively, given the model and the data, the behavioral parameter denoted $\gamma$ captures the effect of wage and duration risks on individual decisions; it does not simply reflect intertemporal substitution or intertemporal tradeoffs⁵⁹.

Finally, the results of Table 1 have been obtained with a particular value of the horizon $T$, chosen equal to 64. There is an element of arbitrariness in this choice, so we tried many values of $T$, ranging from 60 to 115. This does not cause any important changes. Table 2 shows the estimated values of the risk-aversion parameter for the two types, obtained when $T$ varies. Risk aversion increases with $T$ and varies between 0.6 and 0.88, but the qualitative result of a risk aversion smaller than one is unchanged. The values of $c_s$ increase with $T$, but the other parameters of the model do not vary much, and are not reported. To sum up, estimated values of $\gamma$ stay in the same range for a large interval of reasonable values of $T$. Note that the horizon $T$ changes the relative weights of the education and work periods in a student's life, but that the constants of the education cost functions $h_s$ can be rescaled to counterbalance the change in $T$. It seems more "realistic" or more appropriate to use values of $T$ around 60 years, or more, than to use small values, because $T$ is like the length of life⁶⁰. Finally, the exact value of $\gamma$ is probably less important than the differences that we find in the values of $\gamma$ for different groups of individuals, with an impact on their investment behavior. This is why we study subsamples in the next subsection.

58 More precisely, we can vary the family background, while keeping age at grade 6 fixed. Thus, the mean of $\ln(d/\tau)$ and the variance of $\ln(d/\tau)$ may vary independently, but the variance of $d$ and the mean of $d$ cannot, because $d/\tau$ is assumed log-normal.

59 In this standard model, $\gamma$ measures both risk aversion and the intertemporal elasticity of substitution, as is well known.

Keane and Wolpin (2001), Belzil and Hansen (2004) and Sauer (2004) have estimated dynamic-programming models of educational choices, in which a risk-aversion parameter can be inferred from individual schooling decisions. In their models, individuals are heterogeneous with respect to ability, but share the same degree of constant relative risk aversion. Keane and Wolpin (2001) find an RRA coefficient equal to 0.49; Belzil and Hansen (2004) find an RRA coefficient around 0.9, and Sauer (2004) finds an RRA equal to 0.77. These values are all relatively close to ours and small as compared to the estimates obtained in the macroeconomic literature.

Chetty (2006) finds bounds on the RRA coefficient derived from labor supply behavior. Given the available evidence on labor supply, he finds that RRA should be smaller than 2 and is very likely around 1 (logarithmic utility). Finally, Belzil and Leonardi (2013) use Italian panel data in which individual differences in risk attitudes are measured by answers to a lottery pricing question; they also find that individual-specific risk aversion acts as a deterrent to higher education investment.

To sum up, we conclude from these first results that differences in risk aversion really matter and that letting these differences play a role substantially improves the description that we can make of the data. Unobserved heterogeneity can be captured by unobservable-type effects on wages, wage risk, education costs and delay and, in spite of this flexibility, the likelihood is maximized when we allow for an additional and specific role for risk aversion.
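The arithmetic behind footnote 60 (on the choice of the horizon $T$) can be checked with a short sketch. It assumes a CRRA utility of the form $w^{1-\gamma}/(1-\gamma)$, so that a wage growing at a constant rate scales utility by $(1+\text{growth})^{1-\gamma}$ each year; the function name and argument names are illustrative and do not come from the paper.

```python
import math


def effective_discount_factor(growth, discount, gamma=0.5):
    """Per-year weight on CRRA utility when wages grow at rate `growth`.

    Assumption: u(w) = w**(1 - gamma) / (1 - gamma), so a wage multiplied by
    (1 + growth) each year multiplies utility by (1 + growth)**(1 - gamma);
    discounting then divides by (1 + discount).
    """
    return (1.0 + growth) ** (1.0 - gamma) / (1.0 + discount)


# Values quoted in footnote 60: 3% wage growth, 1.5% discount rate, gamma = 0.5.
beta = effective_discount_factor(growth=0.03, discount=0.015)
print(f"effective discount factor: {beta:.4f}")
```

With these values the factor is very close to one, which is the footnote's reason for preferring a long horizon.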

60 To see this, assume for instance that the rate of increase of wages due to potential experience is $g$% per year while the discount rate is $r$% per year. Using our specification of utility with $\gamma = 0.5$, it is easy to check that this is tantamount to choosing $\beta = \sqrt{1+g}/(1+r)$. So, with the reasonable values $g \simeq 3\%$ and $r \simeq 1.5\%$, we get $\beta \simeq 1$. This tells us that large values of $T$ are probably more appropriate, because $T$ is roughly like the length of life.


4.2 Subsample estimations

To check whether risk aversion plays a different role in different social groups, we have re-estimated the model with two complementary subsamples. We have sorted out the sons of highly educated parents, called the sons of professionals. To be precise, any individual with at least one parent in the following occupational categories: executives, doctors, lawyers, engineers or teachers, is included in this subsample. There are 2,315 such individuals. The other subsample contains the rest of the individuals (and includes the sons of farmers, craftsmen, middle managers, white-collar employees and blue-collar workers), that is, 9,995 individuals.

Tables 3a-3b and Tables 4a-4b are the equivalents of Tables 1a-1b for the subsample of students whose parents are professionals and its complement, respectively. In other words, these tables have the same structure as Tables 1a-1b, allowing for easy comparisons between subgroups. In each subgroup, 2 types essentially exhaust the unobserved heterogeneity (a third type would have a very small probability).

It may be surprising to see that the sons of professionals are significantly more risk averse than the others. In this first subgroup (Tables 3a-3b), risk aversion ranges from 0.78 to 0.81. Risk aversion is only 0.69 among the sons of "non-professionals" (add 0.026 if the father went to college). The precision of these estimates is very high, and the differences are significant. In fact, as will be seen in the simulations below, these differences play an important part, and the overall picture is rather different for the sons of professionals and for the other group.

The returns to education are clearly higher for the sons of professionals, but the slope of the risk-return curve is smaller, since higher returns are associated with higher increases in risk. The socially privileged students also have lower estimated education costs, in the form of higher values of $\theta_c$ and strikingly lower values of the $c_s$. The latter parameters are estimated with good precision only in the relatively "disadvantaged" subgroup. The relatively disadvantaged group also has greater delay on average, and the instruments have a stronger impact. This is reasonable. To sum up, it seems that the professionals' sons are more exposed to wage risk and more risk averse than the average, but the other group, in spite of being less risk averse, is hampered by much higher cost parameters. In other words, environmental and family background disadvantages more than counterbalance the lower risk aversion.

The interpretation of these results depends on what is captured by our risk-aversion parameter $\gamma$. Given that we do not model insurance opportunities or contingent transfers of various resources from the parents to the students, our risk-aversion parameter is likely to reflect differences in access to insurance as well as individual psychological traits. It is particularly hard to disentangle the two things by means of econometric methods, given the currently available data sources. It then seems natural that relatively disadvantaged subsample members be less risk averse. Young males planning to become blue-collar workers are taking little risk in a society in which there is minimum wage legislation and unemployment insurance is generous: risk and return are both relatively small, as well as risk aversion. In contrast, going to college or to graduate school is a much riskier investment. Members of the disadvantaged group have nothing to lose, as compared to the privileged group, for which educational failure may mean a drop in social status. In spite of these arguments, the result remains surprising, because we are accustomed to thinking that children raised in wealthier families have better insurance opportunities than other children. Our model offers another reasonable explanation: their education costs are markedly lower. Presumably, this is due to unobservable intergenerational transfers.

4.3 Human Capital Risk-Return Curves

An important output of the model is a set of risk-return pairs for each education level and each type. Formally, for the agent characterized by $(\theta, X)$, these risk-return pairs are of the form $(\sigma_s(\theta), f_s(\theta))$, where $f_s(\theta) = f_s + \theta_w + X'\beta_0$. Recall that $\sigma_s(\theta) = \exp(\theta_\sigma + X\beta_\sigma + z_s)$. Given that wages are assumed lognormally distributed, the mean of $w_s$ is in fact $e^{m_s(\theta)}$, where $m_s(\theta) = f_s(\theta) + \sigma_s^2(\theta)/2$. Joining the points $(\sigma_s(\theta), m_s(\theta))_{s=0,\ldots,n}$ yields a "risk-return curve". It is not difficult to check that a decision-maker characterized by $(\theta, X)$ and willing to maximize expected utility on this set of points would equivalently like to maximize $m_s(\theta) - (\gamma/2)\sigma_s^2(\theta)$, under the CRRA assumption. This means that we can view the decision maker's indifference curves as quadratic in the $(\sigma, m)$ plane, as usual in portfolio theory.
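The equivalence between expected CRRA utility and the mean-variance criterion can be illustrated numerically. The sketch below assumes a utility of the form $w^{1-\gamma}/(1-\gamma)$ with $\gamma \neq 1$ and a wage whose logarithm is normal with mean $f$ and standard deviation $\sigma$; the three risk-return pairs and the value of $\gamma$ are hypothetical and are not estimates from the paper.

```python
import math


def log_certainty_equivalent(f, sigma, gamma):
    """ln of the CRRA certainty equivalent when ln(w) ~ N(f, sigma**2), gamma != 1."""
    a = 1.0 - gamma
    # Lognormal moment: E[w**a] = exp(a*f + a**2 * sigma**2 / 2).
    expected_power = math.exp(a * f + 0.5 * a * a * sigma * sigma)
    # The certainty equivalent CE solves CE**a = E[w**a].
    return math.log(expected_power) / a


def mean_variance_index(f, sigma, gamma):
    """The criterion m - (gamma/2)*sigma**2, with m = f + sigma**2/2."""
    m = f + 0.5 * sigma * sigma
    return m - 0.5 * gamma * sigma * sigma


# Hypothetical (f_s, sigma_s) pairs for three education levels.
risk_return = {0: (2.00, 0.30), 1: (2.20, 0.35), 2: (2.30, 0.60)}
gamma = 0.7  # hypothetical risk-aversion value

for s, (f, sigma) in risk_return.items():
    print(s, log_certainty_equivalent(f, sigma, gamma),
          mean_variance_index(f, sigma, gamma))

# Level preferred by an agent facing these three points.
best_level = max(risk_return,
                 key=lambda s: mean_variance_index(*risk_return[s], gamma))
```

The two columns printed for each level coincide, so ranking education levels by expected utility or by $m_s - (\gamma/2)\sigma_s^2$ gives the same choice under these assumptions.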

Figure 1 represents the average risk-return curves obtained with the entire sample, for each of the two types, averaging over the entire sample (while the type is fixed). Some interesting differences are uncovered if we compute the same risk-return curves with subsample estimates of the model. Figure 2 shows the average risk-return plots for each of the two types in the sons-of-professionals subpopulation. We see that type 1 has significantly higher returns and higher risks than type 2, and that risk increases substantially with higher education. Figure 3 shows the risk-return curves obtained with the rest of the sample. It is surprising to see that, in this subsample, risk and the highest returns are lower. Note also that type 2 is dominated by type 1, in the sense that type 2 has lower returns and higher risks for every value of $s$. It is bad news to be a type 2 in this sub-population. Figure 4 permits a comparison of the two sub-populations' risk-return curves. In this latter figure, the curves are averaged over the 2 types in each subsample separately.

Figures 1-4 exhibit a curious spike in the risk-return curve, which is more pronounced in subsample estimations and attracted the attention of referees. Figures 2 and 3 show that it is very risky to be a dropout, but a qualified blue-collar worker bears a markedly lower risk. The spike is visible in Figure 1 too: the transition from the high-school dropout level to the vocational degree level causes a smaller increase in risk than other transitions. Due to nonlinearities in the model, this smaller increase in risk may in fact become a reduction in risk with two types and in separate subsamples. This suggests that there is a lot to gain from earning a first degree in terms of reduced earnings risk. This result is reasonable given the French context: the first-level degree provides a form of insurance on the labor market.

Finally, Figures 5 and 6 show the ex ante density of wages, as seen from the point of view of the average member of a group, conditional on his type, in two potentially counterfactual situations: if he were a high-school dropout (to the left of the figure) and if he had earned a Master's degree or more (the rightmost densities). Figure 5 gives the densities for the less socially privileged subsample, and Figure 6 for the group of sons of professionals. It is visible that the sons of professionals take more risks in higher education. In Figure 6, for instance, it is easy to see that the dispersion of wages is highest for type 1. In Figures 5 and 6, the middle density in each group of three is the mixture of the densities for the two types.

5 Comparative Statics, Simulations and Discussion

We will now present a number of simulations of the model. The first simulations are mainly based on the numerical evaluation of some key elasticities. It is shown in Appendix 5 that the cuts $k_s$ have the following properties:

$$\frac{\partial k_s}{\partial \gamma} > 0, \qquad \frac{\partial k_s}{\partial \pi} < 0, \qquad \frac{\partial k_s}{\partial (\Delta f_s)} < 0.$$

The cuts can be viewed as hurdles. Risk aversion raises the hurdles, while speed characteristics $\pi$ and returns to education lower the hurdles. A higher hurdle means that the students need a better realization of $\nu$, given $X$ and $\theta$, to be able to jump to the next level. The probability of each education level, that is, the distribution of $s$, is given by the integral of the standard normal density between two consecutive hurdles. The slope of $k_s$ with respect to $\pi$ or $\gamma$ may change with $s$, and it follows that the predicted frequency of a given level, $\Pr(s)$, may be a nonmonotonic function of $\pi$ or $\gamma$, in spite of the fact that the derivatives of $k_s$ have an unambiguous sign.
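The link between hurdles and level probabilities can be sketched in a few lines. The code assumes the usual ordered-choice convention (the lowest level is chosen when the standard normal shock falls below the first hurdle, the next level between the first and second hurdles, and so on); the hurdle values are hypothetical and only illustrate why the frequency of an intermediate level need not move monotonically when every hurdle rises.

```python
import math


def norm_cdf(x):
    """Standard normal c.d.f., written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def level_probabilities(cuts):
    """Probabilities of levels 0..n given increasing hurdles k_1 < ... < k_n."""
    cdf = [0.0] + [norm_cdf(k) for k in cuts] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]


# Hypothetical hurdles for three education levels.
base = level_probabilities([-0.5, 0.5])
# Both hurdles rise (as after an increase in risk aversion), but by different amounts.
shifted = level_probabilities([-0.4, 1.0])

print("before:", [round(p, 3) for p in base])
print("after: ", [round(p, 3) for p in shifted])
```

In this example both hurdles are higher after the change, the top level becomes less frequent, and yet the middle level becomes more frequent, because the upper hurdle moved by more than the lower one.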

We have computed the elasticity of relevant probabilities with respect to some parameters. The probability of observing an education level $S$ of at least $s$ can be called survival. This function is denoted $\Gamma_s$. By definition, we have

$$\Gamma_s(\theta) = \Pr(S \geq s \mid \theta) = 1 - \Phi(\kappa_s(\theta)), \qquad (17)$$

where $\Phi$ is the c.d.f. of the standard normal distribution. The derivative of $\Gamma_s$ with respect to $\gamma$ is $\partial \Gamma_s / \partial \gamma = -\varphi(\kappa_s)(\partial \kappa_s / \partial \gamma) < 0$, where $\varphi$ is the density of the standard normal distribution. We define the elasticity⁶¹ of survival with respect to $\gamma$ as $\varepsilon(\Gamma \mid \gamma) = (\partial \Gamma / \partial \gamma)(\gamma / \Gamma)$.

In addition to $\gamma$, we have computed the mean elasticities of $\Gamma_s$ with respect to $c_s$, $\Delta f_s$, $\Delta \sigma_s^2$ and $\exp(\theta_d)$. Variations of $c_s$ are shocks to education costs; variations of $\Delta f_s$ and $\Delta \sigma_s^2$ represent changes in the returns to education and in the riskiness of wages; finally, variations of $\exp(\theta_d)$ directly affect the measure of speed.
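The elasticity of survival defined above can be illustrated with a small numerical sketch. The threshold is taken to be a hypothetical linear function of the risk-aversion parameter (the model's actual thresholds are given in Appendix 5 and are not reproduced here); the code compares the analytic expression, minus the normal density times the threshold's derivative times the parameter over survival, with a finite-difference approximation.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def survival(threshold):
    """Probability that the standard normal shock exceeds the threshold."""
    return 1.0 - norm_cdf(threshold)


def elasticity_analytic(threshold, d_threshold, gamma):
    """-(density at threshold) * (d threshold / d gamma) * gamma / survival."""
    return -norm_pdf(threshold) * d_threshold * gamma / survival(threshold)


def elasticity_numeric(threshold_fn, gamma, step=1e-6):
    """Central finite-difference version of the same elasticity."""
    up = survival(threshold_fn(gamma + step))
    down = survival(threshold_fn(gamma - step))
    return (up - down) / (2.0 * step) * gamma / survival(threshold_fn(gamma))


# Hypothetical threshold, increasing in risk aversion (slope 2.0 is an assumption).
def threshold_fn(gamma):
    return -1.0 + 2.0 * gamma


gamma = 0.7
analytic = elasticity_analytic(threshold_fn(gamma), 2.0, gamma)
numeric = elasticity_numeric(threshold_fn, gamma)
print(analytic, numeric)
```

Because the threshold rises with risk aversion, the elasticity is negative, and it grows in absolute value as survival becomes small, which is one reason elasticities can be large at the highest education levels.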

Table 5 gives the elasticities of $\Gamma_s$ with respect to the parameters listed above. These elasticities have been computed for the two subsamples: the sons of professionals and the rest of the sample. The first three columns on the left give the results in the former subsample; the rightmost three columns give the equivalent results for the latter. The first column in each group of three is the elasticity itself, which depends on education $s$, so we have 5 different values, given that there are 6 education levels. The second column in each group of three gives the observed value of the survival probability $\Gamma_s$ itself. The third column gives the simulated value of this probability following a one percent change in the value of the parameter. The top rows show the effect of risk aversion on $\Gamma_s$; it is unambiguously negative, with high values of the elasticity in absolute value.

61 To compute the numerical values, we first compute this elasticity for each individual $i$, then take the weighted average over possible types $\theta_j$, using the estimated probabilities $p_j$, and finally take the arithmetic average of all these values over $i$. The definition will be the same with other parameters.

To read Table 5, consider for instance the impact of risk aversion on the sons of professionals. Take the 5th line, corresponding to four years of college. The elasticity of $\Gamma_s$ at this level is -29.47 (given in the first column); the observed value of this probability is 0.41 in the second column, i.e., 41% of the sons of professionals reach at least the four-years-of-college level; a 1% increase in risk aversion for this group causes a drop of the probability to 32% (in the third column). The corresponding figures for the rest of the sample are, respectively, -35, 11% and 7% (columns 4, 5 and 6). The impact of risk aversion on enrollment in four-year college and graduate studies is sizeable in both subsamples. But cost parameters do not have such a dramatic effect on enrollment in higher education. The elasticity with respect to the costs of education $c_s$ is higher in the relatively disadvantaged subsample. The riskiness of wages, surprisingly, has little if any impact in both subsamples. The returns to education, in contrast, have a powerful, positive impact on enrollment. Finally, the elasticity with respect to grade-repetition risk (or risk of delay) is negative and very important. The orders of magnitude are the same as those of the elasticities with respect to risk aversion⁶². Enrollment in the highest education levels would suffer most from an increase in the probability of delay, and the impact is clearly bigger in the less socially privileged subsample. Risk aversion and education costs act as a powerful brake on individual educational investment, while returns to education are a powerful incentive. Our general conclusion is that, to explain enrollment in secondary and higher education, expected returns are at least as important as risk aversion and the costs of education.
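As a rough cross-check on how the columns of Table 5 relate to each other, one can apply a first-order approximation to the figures quoted in the text. This is only a sketch: the reported elasticities are averages of individual elasticities and the model is nonlinear, so the approximation is not expected to reproduce the simulated third column exactly. The function name is illustrative.

```python
def first_order_survival(prob, elasticity, pct_change=0.01):
    """Survival probability after a small relative change in a parameter,
    using prob * (1 + elasticity * pct_change)."""
    return prob * (1.0 + elasticity * pct_change)


# Figures quoted in the text for four years of college, 1% rise in risk aversion.
sons_of_professionals = first_order_survival(0.41, -29.47)
rest_of_sample = first_order_survival(0.11, -35.0)

print(f"sons of professionals: {sons_of_professionals:.3f}")
print(f"rest of the sample:    {rest_of_sample:.3f}")
```

The approximation gives about 0.29 for the sons of professionals, against a simulated 32% in the table, and about 0.07 for the rest of the sample, in line with the simulated 7%.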

In the college-major choice literature, the estimated impact of expected wages on chosen majors is weak, in spite of the fact that these majors command very different returns on the labor market. We suspect that there are several reasons for these differences, which can reconcile the results to a certain extent. One important difference is the role of utility nonlinearity due to risk aversion. Another difference is that we study an ordered discrete-choice structure with a hierarchy of 6 education levels ranging from high-school dropouts to graduate studies, but aggregate the majors or fields of study. The college-major choice papers of Arcidiacono (2004) and Beffy et al. (2012) typically use a multinomial (i.e., unordered) discrete-choice structure. So, the equations of interest are not the same. We also use dimensions of the data that other papers do not use (or do not use in the same way). For instance, Beffy et al. (2012) exploit the same data as us, but they restrict estimation to the subsample of students who went to college, whereas we use the entire sample: it is likely that this helps in identifying stronger effects of expected returns on education choices.

62 A glance at the formulae in Appendix 5 sheds light on the origin of this property.

To obtain a better view of the importance of speed differences in educational investment and achievement, we simulated changes in enrollment or, to be more precise, changes in the distribution of students over education levels, that can be induced by changes in speed. Setting $d_s = \tau_s$ for every individual is tantamount to imposing "social promotion" on the entire educational system, ceteris paribus. Every individual speed becomes "normal" and $\pi = 1$. Such an experiment has radical effects: 70% of the population would reach the level of graduate studies! This is not a marginal change. So we focused on a less brutal and probably more instructive counterfactual. The average probability of promotion, or "speed", of the sons of professionals, $\pi_{SoP} \simeq 0.89$, is higher than the average speed in the rest of the sample, $\pi_{RoS} \simeq 0.857$. The former average is 3.85% higher than the latter. This modest difference in speed operates every year, and will induce a substantial difference in the durations after several years of schooling. Recall that student $i$'s expected duration of an education of level $s$ is $\widehat{d}_s = \tau_s/\pi_i$, where $\pi_i$ is a personal "speed" parameter. Let us multiply all the $1/\pi_i$ terms in the rest of the sample by the factor $\lambda = \pi_{RoS}/\pi_{SoP} < 1$, so that the average speed factor is the same among the sons of professionals and in the rest of the sample⁶³. This yields a reduction of 0.76 years of the durations, on average.
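The speed-equalization experiment rests on simple arithmetic that can be sketched as follows. The two average speeds are the values quoted in the text; the individual student (normal duration and personal speed) is hypothetical and serves only to show how the rescaling shortens an expected duration.

```python
SPEED_SOP = 0.89    # average speed, sons of professionals (quoted in the text)
SPEED_ROS = 0.857   # average speed, rest of the sample (quoted in the text)

# Relative gap between the two averages (about 3.85%).
relative_gap = SPEED_SOP / SPEED_ROS - 1.0

# Factor applied to every inverse-speed term in the rest of the sample.
scale = SPEED_ROS / SPEED_SOP


def expected_duration(normal_duration, speed):
    """Expected time to degree: normal duration divided by personal speed."""
    return normal_duration / speed


def rescaled_duration(normal_duration, speed, factor=scale):
    """Expected time to degree once the inverse speed is multiplied by `factor`."""
    return factor * normal_duration / speed


# Hypothetical student: normal duration of 5 years, personal speed 0.80.
normal_duration, speed = 5.0, 0.80
before = expected_duration(normal_duration, speed)
after = rescaled_duration(normal_duration, speed)
saving = before - after  # equals (1 - scale) * normal_duration / speed

print(f"gap: {relative_gap:.4f}, scale: {scale:.4f}, saving: {saving:.3f} years")
```

Averaging the individual savings over the rest-of-sample sub-population is the computation described in footnote 63; the hypothetical student above is not meant to reproduce the 0.76-year average.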

63 We can express the difference in terms of average expected durations by computing the arithmetic average of the difference $(1-\lambda)(\tau_{s_i}/\pi_i)$ in the rest-of-sample sub-population.

Table 6 gives the simulation results for this experiment, scaling up the speed of the less privileged students to equalize the average speed in the two groups. The first six columns in Table 6 give the matrix of transitions from level $s$ to higher levels $s+1$, $s+2$, etc., following the change in speed. The three rightmost columns give, respectively, the simulated distribution of levels after the change, the distribution before the change in the rest of the sample, and the observed distribution of levels among the sons of professionals⁶⁴.

Table 6 shows some very important changes. For instance, column 1 shows that 50% of those who were initially dropouts stay dropouts after the change, but 49.9% of these individuals now earn a vocational degree. This is huge progress. Column 3 shows similar promotion effects: of those who initially finished school with the high-school degree, 99% now go to college, among which 36% complete 2 years of college, and another 39% complete 4 years of college, etc. Note that many of those who complete some college years may in fact have been college dropouts before the change, with the high-school degree as their highest certificate, because they never passed the college exams. The biggest effects in terms of enrollment are represented by columns 1 and 2, because these columns describe the re-dispatching of 57% (= 16.29% + 40.71%) of the studied population. The effects displayed in column 3 are impressive, but they apply to 15% of the subsample only.

Yet, these changes are not sufficient to put the rest of the sample on an equal footing with the sons of professionals. Comparing the simulated post-change distribution for the rest of the sample with the observed distribution for the sons of professionals, we see that the former do not fully catch up. In particular, the simulated distribution has a much bigger share of students leaving school with a vocational degree than the sons of professionals. This is because a smaller speed is not the only handicap of the less privileged students: they also have higher costs.
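The simulation procedure behind Table 6 (described in footnote 64) can be sketched for a single individual and a single type. The sketch assumes that a higher shock leads to a higher education level, draws 500 shocks from a standard normal truncated to the interval implied by the observed level, and reclassifies each draw with the post-reform hurdles. All hurdle values are hypothetical, and the averaging over the posterior distribution of types is left out.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_truncated_normal(lower, upper, rng):
    """Draw from N(0, 1) truncated to (lower, upper) by inverse-c.d.f. sampling."""
    u = rng.uniform(STD_NORMAL.cdf(lower), STD_NORMAL.cdf(upper))
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return STD_NORMAL.inv_cdf(u)


def level_of(shock, cuts):
    """Education level implied by a shock: the number of hurdles it clears."""
    return sum(1 for k in cuts if shock > k)


def simulate_transitions(observed_level, old_cuts, new_cuts, n_draws=500, seed=0):
    """Shares of post-reform levels for one individual observed at `observed_level`."""
    bounds = [float("-inf")] + list(old_cuts) + [float("inf")]
    lower, upper = bounds[observed_level], bounds[observed_level + 1]
    rng = random.Random(seed)
    counts = [0] * (len(new_cuts) + 1)
    for _ in range(n_draws):
        shock = draw_truncated_normal(lower, upper, rng)
        counts[level_of(shock, new_cuts)] += 1
    return [c / n_draws for c in counts]


# Hypothetical hurdles before and after a reform that lowers every hurdle.
old_cuts = [-0.5, 0.5, 1.5]
new_cuts = [-0.9, 0.0, 1.0]
shares = simulate_transitions(observed_level=1, old_cuts=old_cuts, new_cuts=new_cuts)
print(shares)
```

Each row of a transition matrix like the one in Table 6 would then be obtained by averaging such shares over the individuals observed at a given initial level.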

Are the estimated effects of differences in delay implausibly large? The distance between the schooling level distributions of the two social groups is a purely empirical fact. The gap between the two must be closed by a change in some parameters. Knowing the French context, we do not find the effect of differences in delay parameters so surprising. In the French educational system, grade repetitions and exams at the end of each college year constitute a major screening device⁶⁵. Held-back students progressively reduce their ambitions and many are eventually disheartened. Indeed, an additional year in school has very substantial direct and opportunity costs, and these costs increase with the education level. Many people would agree that the French system is based on what is sometimes called "selection by means of failure"⁶⁶. It is therefore not very surprising that a change in the speed parameter would push thousands of young men to go to college.

64 To compute the simulations, we use the posterior probability distribution of the type $\theta_i$ of each individual $i$ in the subsample, knowing the observed outcomes and covariates $(s_i, X_i)$. We also condition on the information revealed by each individual's observed schooling level $s_i$ on his random cost shock $\nu_i$, to compute the probability of choosing $s$ after the change. In practice, this is done by drawing 500 copies of individual $i$ in the distribution of $(\nu_i \mid s_i, X_i)$, which is a truncated normal distribution. After the reform, some of the randomly drawn values of $\nu_i$ fall above the new upper threshold for $s_i$ and the model predicts that $i$ would then study more and jump to the next levels. These changes are then averaged using the appropriate conditional distribution of types.

Another surprising result is the relatively limited effect of wage risk or return risk on student choices. As noted by a referee, it may be that our wage observations do not capture enough of these risks, because wages are observed during the first 5 years of career only, and risk may play out over many more years.

We conclude with a discussion of the merits and demerits of a static formulation, as opposed to a fully dynamic model in which students learn about their ability by observing their test results and condition on new information to decide if they continue to study or go to work. An important advantage of our formulation is its (relative) simplicity. Our model fits the data quite well, as shown in Appendix 4 and Tables A2-A6, on the quality of fit. Of course, we cannot decompose the choice of an education level in steps, based on dynamic optimization and a sequence of informative signals, but our data would not allow us to estimate such a model anyway. We may overestimate the risk in schooling costs and wages borne by students because they in fact have the option to quit every year. We provide a static representation of a dynamic process, mainly by bypassing the type-learning process, assuming that students know their type from the start instead of learning it step by step. Given the current state of knowledge, it is not easy to tell how this bypass may have biased some of the model's parameters.

65 This theme is developed further in Brodaty et al. (2010), and Gary-Bobo, Goussé and Robin (2013).

66 Sélection par l'échec.

6 Conclusion

We have used a rich set of micro-data on young workers in France to estimate a structural model of human capital investment. The model is based on the idea that students choose an education level so as to maximize their conditional expected utility. Students are risk averse, with a constant relative risk-aversion coefficient. They form rational expectations of future wages and of their time to degree completion. We assumed that the econometrician cannot observe a number of the individual characteristics that the students do observe and use to predict their future wages. The model captures the fact that the risks affecting time-to-degree and future wages play a role in their choice of educational investment. The model yields RRA parameters between 0.6 and 0.9, very precisely estimated. It also yields risk and return curves for investments in education. Risk aversion varies with parental occupation: the students whose parents are professionals with a higher education are more risk averse but bear more risk than the others, because their costs of education are smaller. Small increases in risk aversion and in the costs of education around the estimated values can lead to substantial changes in college enrollment. Simulations also show that the effects of higher education costs and of expected returns to education are equally important.

7 References

Arcidiacono, Peter (2004), "Ability Sorting and the Returns to College Major", Journal of Econometrics, 121, 343-375.

Becker, Gary S. (1964), Human Capital: A Theoretical and Empirical Analysis with Special Reference to Education, Third edition, 1993, NBER and the University of Chicago Press, Chicago, Illinois.

Bedard, Kelly, and Elizabeth Dhuey (2006), "The Persistence of Early Childhood Maturity: International Evidence of Long-Run Age Effects", Quarterly Journal of Economics, 121, 1437-1472.

Beduwe, Catherine, and Jean-François Giret (2004), "Le travail en cours d'études a-t-il une valeur professionnelle?", Economie et Statistique, 378-379, 55-83.

Beffy, Magali, Fougère, Denis, and Arnaud Maurel (2012), "Choosing the Field of Study in Postsecondary Education: Do Expected Earnings Matter?", Review of Economics and Statistics, 94, 334-347.

Belzil, Christian and Jörgen Hansen (2004), "Earnings Dispersion, Risk Aversion and Education", Research in Labor Economics, 23, 335-358.

Belzil, Christian and Marco Leonardi (2013), "Risk Aversion and Schooling Decisions", Annals of Economics and Statistics, 111-112, forthcoming.

Bonhomme, Stéphane, Jochmans, Koen and Jean-Marc Robin (2013), "Nonparametric Estimation of Finite Mixtures", manuscript, Sciences Po, Paris.

Brodaty, Thomas O., Gary-Bobo, Robert J. and Ana Prieto (2005), "Risk Aversion, Expected Earnings and Opportunity Costs: A Structural Econometric Model of Human Capital Investment", University Paris 1, Pantheon-Sorbonne, manuscript.

Brodaty, Thomas O., Gary-Bobo, Robert J. and Ana Prieto (2010), "Does Speed Signal Ability? The Impact of Grade Retention on Wages", CREST-ENSAE, http://ces.univ-paris1.fr/membre/Gary-Bobo.

Brown, Meta, Scholz, John K., and Ananth Seshadri (2012), "A New Test of Borrowing Constraints for Education", Review of Economic Studies, 79, 511-538.

Brunello, Giorgio, and Rudolf Winter-Ebmer (2003), "Why Do Students Expect to Stay Longer in College? Evidence from Europe", Economics Letters, 80, 247-253.

Cameron, Stephen V., and James J. Heckman (1998), "Life-Cycle Schooling and Dynamic Selection Bias: Models and Evidence for Five Cohorts of American Males", Journal of Political Economy, 106, 262-333.

Card, David (1999), "The Causal Effect of Education on Earnings", Chapter 30 in: Ashenfelter, O. and D. Card eds., Handbook of Labor Economics, Volume 3, Elsevier Science, Amsterdam.

Carneiro, Pedro, Hansen, Karsten T., and James J. Heckman (2003), "Estimating Distributions of Treatment Effects with an Application to the Returns to Schooling and Measurement of the Effects of Uncertainty on College Choice", International Economic Review, 44, 361-422.

Chen, Stacey H. (2008), "Estimating the Variance of Wages in the Presence of Selection and Unobserved Heterogeneity", Review of Economics and Statistics, 90, 275-289.

Chetty, Raj (2006), "A New Method of Estimating Risk Aversion", American Economic Review, 96, 1821-1834.

Cunha, Flavio, Heckman, James J., and Salvador Navarro (2005), "Separating Uncertainty from Heterogeneity in Life-Cycle Earnings", Oxford Economic Papers, 57, 191-261.

Cunha, Flavio and James J. Heckman (2008), "A New Framework for the Analysis of Inequality", Macroeconomic Dynamics, 12 (Supplement 2), 315-354.

Currie, Janet and Enrico Moretti (2003), "Mother's Education and the Intergenerational Transmission of Human Capital: Evidence from College Openings", Quarterly Journal of Economics, 118, 1495-1532.

Drèze, Jacques H. (1979), "Human Capital and Risk-Bearing", The Geneva Papers on Risk and Insurance, 12, 5-22; reprinted in: Drèze (1987), Essays on Economic Decisions under Uncertainty, Cambridge University Press, Cambridge, UK.

Eaton, Jonathan and Harvey S. Rosen (1980), "Taxation, Human Capital, and Uncertainty", American Economic Review, 70, 705-715.

Ehrenberg, Ronald G. and Panayiotis G. Mavros (1995), "Do Doctoral Students' Financial Support Patterns Affect Their Times-to-Degree and Completion Probabilities?", Journal of Human Resources, 30, 581-609.

Friedman, Milton (1953), "La théorie de l'incertitude et la distribution des revenus suivant leur grandeur", in Econométrie, Colloques internationaux du CNRS, 40, 65-79.

Garibaldi, Pietro, Giavazzi, Francesco, Ichino, Andrea and Enrico Rettore (2012), "College Cost and Time to Complete a Degree: Evidence from Tuition Discontinuities", Review of Economics and Statistics, 94, 699-711.

Gary-Bobo, Robert, Goussé, Marion and Jean-Marc Robin (2013), "Grade Retention and Unobserved Heterogeneity", manuscript, CREST-ENSAE.

Geweke, John and Michael P. Keane (1997), "Mixture of Normals Probit Models", Research Department Staff Report 237, Federal Reserve Bank of Minneapolis.

Grenet, Julien (2008), "Le mois de naissance influence-t-il les trajectoires scolaires et la vie professionnelle? Une évaluation sur données françaises", Paris School of Economics, manuscript.

Harmon, Colm, Oosterbeek, Hessel and Ian Walker (2003), "The Returns to Education: Microeconomics", Journal of Economic Surveys, 17, 115-155.

Heckman, James J., Lochner, Lance J., and Petra E. Todd (2003), "Fifty Years of Mincer Earnings Regressions", IZA DP no. 775, Institute for the Study of Labor, IZA, Bonn, Germany.

Huggett, Mark, Ventura, Gustavo and Amir Yaron (2011), "Sources of Lifetime Inequality", American Economic Review, 101, 2923-2954.

Jiang, Wenxin and Martin A. Tanner (1999), "On the Identifiability of Mixtures-of-Experts", Neural Networks, 12, 1253-1258.

Keane, Michael P. and Kenneth I. Wolpin (1997), "The Career Decisions of Young Men", Journal of Political Economy, 105, 473-522.

Keane, Michael P. and Kenneth I. Wolpin (2001), "The Effect of Parental Transfers and Borrowing Constraints on Educational Attainment", International Economic Review, 42, 1051-1103.

King, Allan G. (1974), "Occupational Choice, Risk Aversion and Wealth", Industrial and Labor Relations Review, 27, 586-596.

Krebs, Tom (2003), "Human Capital Risk and Economic Growth", Quarterly Journal of Economics, 118, 709-744.

Levhari, David and Yoram Weiss (1974), "The Effect of Risk on the Investment in Human Capital", American Economic Review, 64, 950-963.

Lochner, Lance and Alexander Monge-Naranjo (2011), "Credit Constraints in Education", NBER Working Paper 17435, National Bureau of Economic Research, Cambridge, Massachusetts.

Low, Hamish, Meghir, Costas and Luigi Pistaferri (2010), "Wage Risk and Employment Risk over the Life Cycle", American Economic Review, 100, 1432-1467.

Magnac, Thierry, Pistolesi, Nicolas and Sébastien Roux (2013), "Post-Schooling Human-Capital Investments and the Life-Cycle Variance of Earnings", IDEI Working papers, no. 765, University of Toulouse, France.

Meghir, Costas and Luigi Pistaferri (2010), "Earnings, Consumption and Life-Cycle Choices", in D. Card and O. Ashenfelter eds., Handbook of Labor Economics, vol 4B, Elsevier, Amsterdam.

McLachlan, Geoffrey and David Peel (2000), Finite Mixture Models, John Wiley and Sons, New York.

Mahjoub, M. Badrane (2007), "Grade Repetition as a Treatment", Paris School of Economics, manuscript.

Mincer, Jacob (1974), Schooling, Experience, and Earnings, Columbia University Press, New York.

Palacios-Huerta, Ignacio (2003), "An Empirical Analysis of the Risk Properties of Human Capital Returns", American Economic Review, 93, 948-964.

Sauer, Robert M. (2004), "Educational Financing and Lifetime Earnings", Review of Economic Studies, 71, 1189-1216.

Shaw, Kathryn L. (1996), "An Empirical Analysis of Risk Aversion and Income Growth", Journal of Labor Economics, 14, 626-653.

Vuong, Quang H. (1989), "Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses", Econometrica, 57, 307-334.

Weiss, Yoram (1972), "The Risk Element in Occupational and Educational Choices", Journal of Political Economy, 80, 1203-1213.

Williams, J. (1979), "Uncertainty and the Accumulation of Human Capital over the Life-Cycle", Journal of Business, 52, 521-548.

8 Appendix

8.1 Appendix 1. Derivation of the model

We use the following identity,

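$$\sum_{t=1+d_{z-1}}^{d_z}\delta^{t}=\delta^{1+d_{z-1}}\left(\frac{1-\delta^{\Delta d_z}}{1-\delta}\right). \tag{18}$$

A key assumption is that wages and durations are independent conditional on $\theta$. Using this conditional independence assumption, it is possible to simplify expression (3). We have,

$$V(s\mid\theta,\varepsilon)=E\left[\delta^{1+d_s}\left(\frac{1-\delta^{T-d_s}}{1-\delta}\right)\,\middle|\,\theta\right]E[u(w_s)\mid\theta,\varepsilon]+\sum_{z=1}^{s}E\left[\delta^{1+d_{z-1}}\left(\frac{1-\delta^{\Delta d_z}}{1-\delta}\right)\,\middle|\,\theta\right]E[u(h_z w_{z-1})\mid\theta,\varepsilon]. \tag{19}$$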

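To simplify notation, define the mapping

$$\Lambda(x,y)=E\left[\delta^{1+x}\left(\frac{1-\delta^{y}}{1-\delta}\right)\,\middle|\,\theta\right],$$

where $(x,y)$ are any random variables. Define $\Delta V(s+1\mid\theta,\varepsilon)=V(s+1\mid\theta,\varepsilon)-V(s\mid\theta,\varepsilon)$. Simple computations then yield,

$$\Delta V(s+1\mid\theta,\varepsilon)=\Lambda(d_{s+1},T-d_{s+1})\,E[u(w_{s+1})\mid\theta,\varepsilon]-\Lambda(d_s,T-d_s)\,E[u(w_s)\mid\theta,\varepsilon]+\Lambda(d_s,\Delta d_{s+1})\,E[u(h_{s+1}w_s)\mid\theta,\varepsilon]. \tag{20}$$

We need to compute terms of the form $E[u(\lambda w_s)\mid\theta,\varepsilon]$, with $\lambda=1$ or $\lambda=h_{s+1}$. Using the CRRA property of utility, combined with linearity of log-wages, and using the fact that $h_s$ depends on $(\theta,\varepsilon)$ only, we obtain,

$$\begin{aligned}
E[u(\lambda w_s)\mid\theta,\varepsilon]&=E\left[(1/\gamma)\left(1-\lambda^{-\gamma}\exp(-\gamma\ln(w_s))\right)\,\middle|\,\theta,\varepsilon\right]\\
&=\frac{1}{\gamma}\left\{1-\lambda^{-\gamma}E\left[\exp(-\gamma(f_s+X_0\beta_0+\theta_w+\sigma_s\nu))\mid\theta,\varepsilon\right]\right\}\\
&=\frac{1}{\gamma}\left\{1-\lambda^{-\gamma}\exp[-\gamma(f_s+X_0\beta_0+\theta_w)]\,E[\exp(-\gamma\sigma_s\nu)]\right\},
\end{aligned}$$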

where � = �1. A well-known property of the expectation of a log-normal random variable

then yields

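$$E[\exp(-\gamma\sigma_s\nu)]=\exp\{(1/2)\gamma^{2}\sigma_s^{2}\}.$$

From this expression, we derive

$$E[u(\lambda w_s)\mid\theta,\varepsilon]=\frac{1}{\gamma}\left\{1-\lambda^{-\gamma}\exp\left[-\gamma\left(f_s+X_0\beta_0+\theta_w-(1/2)\gamma\sigma_s^{2}\right)\right]\right\}, \tag{21}$$

for $\lambda=1$ or $\lambda=h_{s+1}$.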
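The closed form in (21) can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the utility to be $u(w)=(1-w^{-\gamma})/\gamma$, as in the derivation above, treats the wage shock as standard normal, and uses hypothetical function names and parameter values (`mean_log_wage` stands for $f_s+X_0\beta_0+\theta_w$). It compares the closed form with a plain trapezoidal integration.

```python
import math


def crra_utility(w, gamma):
    """u(w) = (1 - w**(-gamma)) / gamma, the form assumed in the derivation."""
    return (1.0 - w ** (-gamma)) / gamma


def expected_utility_closed_form(lam, mean_log_wage, sigma, gamma):
    """Equation (21); mean_log_wage stands for f_s + X_0*beta_0 + theta_w."""
    exponent = -gamma * (mean_log_wage - 0.5 * gamma * sigma ** 2)
    return (1.0 - lam ** (-gamma) * math.exp(exponent)) / gamma


def expected_utility_numeric(lam, mean_log_wage, sigma, gamma,
                             n=20001, bound=10.0):
    """Trapezoidal integration of u(lam * w) against the N(0, 1) density."""
    step = 2.0 * bound / (n - 1)
    total = 0.0
    for i in range(n):
        nu = -bound + i * step
        density = math.exp(-0.5 * nu * nu) / math.sqrt(2.0 * math.pi)
        wage = math.exp(mean_log_wage + sigma * nu)
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end points
        total += weight * crra_utility(lam * wage, gamma) * density * step
    return total


# Illustrative (not estimated) values.
closed = expected_utility_closed_form(0.8, 8.6, 0.3, -0.3)
numeric = expected_utility_numeric(0.8, 8.6, 0.3, -0.3)
print(abs(closed - numeric) < 1e-6)
```

With these illustrative values the two computations agree up to numerical integration error, which is what the log-normal property used above implies.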

Note that the mapping $\Lambda$ defined above has the following convenient property,

$$\Lambda(d_{s+1},T-d_{s+1})-\Lambda(d_s,T-d_s)+\Lambda(d_s,\Delta d_{s+1})=0.$$

Then, using the above equations, we find an explicit expression for $\Delta V$. Easy algebra shows, after some simplifications, that $\Delta V(s+1\mid\theta,\varepsilon)\geq0$ is equivalent to

$$\frac{1}{\gamma}\left\{\frac{\Lambda(d_s,T-d_s)-\Lambda(d_{s+1},T-d_{s+1})\exp[-\gamma(\Delta f_{s+1}-(\gamma/2)\Delta\sigma^{2}_{s+1})]}{\Lambda(d_s,\Delta d_{s+1})}-(h_{s+1})^{-\gamma}\right\}\geq0, \tag{22}$$

where, by definition, $\Delta f_{s+1}=f_{s+1}-f_s$, and $\Delta\sigma^{2}_{s+1}=\sigma^{2}_{s+1}-\sigma^{2}_s$. Note that $\theta_w$ and $X_0$ do not enter the above inequality: only $\theta_c$ and $X_1$ play a role as variables in the expression of $h_s$.

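8.1.1 The Finite Horizon, $\delta=1$ Case

Let us now consider the finite-horizon, undiscounted version of the model. We set $\delta=1$ and $T<\infty$. By l'Hôpital's rule, we get for any $x,y>0$,

$$\lim_{\delta\to1}\Lambda(x,y)=E\left[\lim_{\delta\to1}\delta^{1+x}\left(\frac{1-\delta^{y}}{1-\delta}\right)\,\middle|\,\theta\right]=E[y\mid\theta].$$

Therefore, we get $\lim_{\delta\to1}\Lambda(d_s,T-d_s)=T-E(d_s\mid\theta)$ and $\lim_{\delta\to1}\Lambda(d_s,\Delta d_{s+1})=E(\Delta d_{s+1}\mid\theta)$. In this particular case, $\Delta V(s+1\mid\theta,\varepsilon)\geq0$ is equivalent to,

$$\frac{1}{\gamma}\,\frac{(T-E(d_s\mid\theta))-(T-E(d_{s+1}\mid\theta))\,e^{-\gamma(\Delta f_{s+1}-(\gamma/2)\Delta\sigma^{2}_{s+1})}}{E(\Delta d_{s+1}\mid\theta)}\geq\frac{1}{\gamma}\,e^{\gamma(X_1\beta_1+c_{s+1}-\theta_c-\varepsilon)}. \tag{23}$$

If $\gamma>0$, it is easy to see that we can take logarithms on both sides of the inequality, provided that

$$(T-E(d_s\mid\theta))>(T-E(d_{s+1}\mid\theta))\exp[-\gamma(\Delta f_{s+1}-(\gamma/2)\Delta\sigma^{2}_{s+1})]. \tag{24}$$

Using the definition,

$$k_s=-\frac{1}{\gamma}\ln\left\{\frac{(T-E(d_{s-1}\mid\theta))-(T-E(d_s\mid\theta))\exp[-\gamma(\Delta f_s-(\gamma/2)\Delta\sigma^{2}_s)]}{E(\Delta d_s\mid\theta)}\right\}, \tag{25}$$

we then find that the crucial inequality is equivalent to,

$$\varepsilon\geq X_1\beta_1+c_{s+1}+k_{s+1}-\theta_c. \tag{26}$$

It is easy to see that the result is the same if $\gamma<0$.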
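As an illustration of how the cut in (25) and the decision rule in (26) could be evaluated, here is a minimal sketch. The function names and the numerical values are hypothetical, expected durations are passed directly as numbers, and the error raised when the argument of the logarithm is not positive corresponds to a failure of condition (24).

```python
import math


def cut_k(T, d_prev, d_curr, delta_f, delta_var, gamma):
    """k_s from equation (25).

    d_prev and d_curr stand for E(d_{s-1} | theta) and E(d_s | theta);
    delta_f and delta_var are the increments in f_s and sigma_s**2.
    """
    a = -gamma * (delta_f - 0.5 * gamma * delta_var)
    g = ((T - d_prev) - (T - d_curr) * math.exp(a)) / (d_curr - d_prev)
    if g <= 0.0:
        raise ValueError("the logarithm in (25) is undefined for these values")
    return -math.log(g) / gamma


def chooses_next_level(eps, x1_beta1, c_next, k_next, theta_c):
    """Decision rule (26): move up when eps clears the hurdle."""
    return eps >= x1_beta1 + c_next + k_next - theta_c


# Illustrative (not estimated) values.
k_example = cut_k(T=40.0, d_prev=3.0, d_curr=6.0,
                  delta_f=0.15, delta_var=0.01, gamma=-0.3)
print(k_example)
print(chooses_next_level(0.5, 0.2, 2.5, k_example, 0.0))
```

The hurdle is the sum of the observable index, the cost parameter and the cut, net of the type-specific term, so that any change in $k_{s+1}$ shifts the threshold one for one.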

8.1.2 The Logarithmic Utility Case (i.e., $\gamma=0$)

An interesting particular case is obtained by letting $\gamma\to0$. Using l'Hôpital's rule again, we get with (13),

$$\lim_{\gamma\to0^{+}}k_s=-\frac{\Delta f_s\,(T-\hat d_s)}{\Delta\hat d_s}, \tag{27}$$

where $\hat d_s=E(d_s\mid\theta)$. This yields the logarithmic utility model, which is characterized by the inequalities,

$$c_s+X_1\beta_1-\frac{(T-\hat d_s)\,\Delta f_s}{\Delta\hat d_s}-\theta_c\;\leq\;\varepsilon\;\leq\;c_{s+1}+X_1\beta_1-\frac{(T-\hat d_{s+1})\,\Delta f_{s+1}}{\Delta\hat d_{s+1}}-\theta_c.$$

The analytic expression of the cutoff points $\lim_{\gamma\to0}k_s$ is easily interpreted. Since agents are very patient, i.e., $\delta=1$, the marginal skill-premium gain (per year) of jumping from level $s$ to level $s+1$ is $\Delta f_{s+1}/\Delta\hat d_{s+1}$, multiplied by the expected number of years to go after the end of studies, i.e., $T-\hat d_{s+1}$. This expression of marginal benefit must be compared with an expression of marginal costs, which is simply $c_s+X_1\beta_1-\varepsilon-\theta_c$ here.

In the logarithmic utility case, the monotonicity condition $k_{s+1}>k_s$ is equivalent to

$$\frac{\Delta\hat d_{s+1}}{\Delta\hat d_s}>\frac{\Delta f_{s+1}}{\Delta f_s}\,\frac{(T-\hat d_{s+1})}{(T-\hat d_s)},$$

showing that the cut-monotonicity condition $c_{s+1}+k_{s+1}>c_s+k_s$ will be satisfied under "return concavity" and "cost convexity". But one can allow for increasing returns, i.e., for $\Delta f_{s+1}/\Delta f_s>1$, if $T$ is not too large, and if $\Delta\hat d_{s+1}$ is large enough, since the ratio $(T-\hat d_{s+1})/(T-\hat d_s)$ would then be sufficiently smaller than 1. By continuity, these remarks are still valid for values of $\gamma$ close to 0, even if $\gamma<0$.
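The limit in (27) and the monotonicity condition can be illustrated numerically. The following sketch uses hypothetical function names and illustrative values; it evaluates the general cut of (25) for a very small $\gamma$, compares it with the logarithmic-utility cut, and checks that the ratio condition agrees with a direct comparison of two consecutive cuts.

```python
import math


def finite_gamma_cut(T, d_prev, d_curr, delta_f, delta_var, gamma):
    """General cut k_s of equation (25), with expected durations as inputs."""
    a = -gamma * (delta_f - 0.5 * gamma * delta_var)
    g = ((T - d_prev) - (T - d_curr) * math.exp(a)) / (d_curr - d_prev)
    return -math.log(g) / gamma


def log_utility_cut(T, d_hat, delta_f, delta_d):
    """Limit cut of equation (27): -delta_f * (T - d_hat) / delta_d."""
    return -delta_f * (T - d_hat) / delta_d


def cuts_increase(T, d_hat_s, d_hat_next, delta_f_s, delta_f_next,
                  delta_d_s, delta_d_next):
    """Ratio form of the condition k_{s+1} > k_s in the logarithmic case."""
    lhs = delta_d_next / delta_d_s
    rhs = (delta_f_next / delta_f_s) * (T - d_hat_next) / (T - d_hat_s)
    return lhs > rhs


# Illustrative (not estimated) values: level s ends at 6 years, level s+1 at 8.
k_limit = log_utility_cut(40.0, 6.0, 0.15, 3.0)
k_small_gamma = finite_gamma_cut(40.0, 3.0, 6.0, 0.15, 0.0, 1e-6)
k_next = log_utility_cut(40.0, 8.0, 0.05, 2.0)
print(k_limit, k_small_gamma)
print(k_next > k_limit, cuts_increase(40.0, 6.0, 8.0, 0.15, 0.05, 3.0, 2.0))
```

Both checks are only consistency checks on the formulas; they say nothing about the estimated parameter values.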


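8.1.3 The Infinite Horizon, Discounted Utility Case

To understand the potential of our model, consider the case in which $T\to+\infty$ and $\delta<1$. The $\hat d_s$ are deterministic functions of $(X,\theta)$. Note that

$$\lim_{T\to+\infty}\Lambda(d_s,T-d_s)=E\left[\lim_{T\to+\infty}\delta^{1+d_s}\left(\frac{1-\delta^{T-d_s}}{1-\delta}\right)\,\middle|\,\theta\right]=\frac{\delta}{1-\delta}\,E(\delta^{d_s}\mid\theta),$$

and,

$$\lim_{T\to+\infty}\Lambda(d_s,\Delta d_{s+1})=\frac{\delta}{1-\delta}\,E\left[\delta^{d_s}(1-\delta^{\Delta d_{s+1}})\mid\theta\right].$$

Then, in this case, the crucial inequality (22) yields

$$\frac{1}{\gamma}\left\{\frac{E(\delta^{d_s}\mid\theta)-E(\delta^{d_{s+1}}\mid\theta)\,e^{-\gamma(\Delta f_{s+1}-(\gamma/2)\Delta\sigma^{2}_{s+1})}}{E(\delta^{d_s}\mid\theta)-E(\delta^{d_{s+1}}\mid\theta)}-e^{\gamma(X_1\beta_1+c_{s+1}-\theta_c-\varepsilon)}\right\}\geq0. \tag{28}$$

Now, provided that $\gamma$ is such that,

$$E(\delta^{d_s}\mid\theta)>E(\delta^{d_{s+1}}\mid\theta)\,e^{-\gamma(\Delta f_{s+1}-(\gamma/2)\Delta\sigma^{2}_{s+1})},$$

that is, for values of $\gamma$ which are sufficiently close to zero, taking logarithms yields an expression which is equivalent to:

$$\varepsilon\geq X_1\beta_1+l_{s+1}+c_{s+1}-\theta_c, \tag{29}$$

where,

$$l_s=-\frac{1}{\gamma}\ln\left(\frac{E(\delta^{d_{s-1}}\mid\theta)-E(\delta^{d_s}\mid\theta)\,e^{-\gamma(\Delta f_s-(\gamma/2)\Delta\sigma^{2}_s)}}{E(\delta^{d_{s-1}}\mid\theta)-E(\delta^{d_s}\mid\theta)}\right). \tag{30}$$

We again obtain an Ordered Discrete Choice structure with a particular functional form imposed on the cuts $l_s+c_s$.

The rather complicated expression of $l_s$ depends on $E(\delta^{d_s}\mid\theta)$. It can easily be shown that $E(\delta^{d_s}\mid\theta)=E[\exp(\omega\exp(\eta))]$, where $\eta$ is normal with mean 0 and variance $\sigma^{2}_\eta$, and $\omega=\ln(\delta)\,\tau_s\exp(X_2\beta_2+\mu_a+\theta_d)$. Since $\delta<1$, we have $\omega<0$, and this implies that $E[\exp(\omega\exp(\eta))]$ is well defined. To simplify the computations in this case, we have assumed that $\sigma^{2}_\eta=0$ and therefore used the expression $E(\delta^{d_s}\mid\theta)=\delta^{d_s}=e^{\omega}$.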
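Under the simplification that the delay shock has zero variance, the discounted weights reduce to $e^{\omega}$ and the cut $l_s$ of (30) can be computed directly. The sketch below is a hypothetical illustration: `delay_index` stands for the whole index inside the exponential that defines $\omega$, `tau_prev` and `tau_curr` for the nominal durations of levels $s-1$ and $s$, and all numerical values are made up.

```python
import math


def discounted_weight(delta, tau, delay_index):
    """E(delta**d_s | theta) with no delay shock: exp(omega),
    where omega = ln(delta) * tau * exp(delay_index)."""
    omega = math.log(delta) * tau * math.exp(delay_index)
    return math.exp(omega)


def cut_l(delta, tau_prev, tau_curr, delay_index, delta_f, delta_var, gamma):
    """l_s from equation (30) under the zero-variance simplification."""
    w_prev = discounted_weight(delta, tau_prev, delay_index)
    w_curr = discounted_weight(delta, tau_curr, delay_index)
    a = -gamma * (delta_f - 0.5 * gamma * delta_var)
    ratio = (w_prev - w_curr * math.exp(a)) / (w_prev - w_curr)
    if ratio <= 0.0:
        raise ValueError("the logarithm in (30) is undefined for these values")
    return -math.log(ratio) / gamma


# Illustrative (not estimated) values.
l_example = cut_l(delta=0.95, tau_prev=3.0, tau_curr=5.0, delay_index=0.1,
                  delta_f=0.15, delta_var=0.01, gamma=-0.3)
print(l_example)
```

When there is neither a return nor a risk increment the ratio equals one and the cut is zero, which is a quick sanity check on the implementation.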


8.2 Appendix 2. Data and Descriptive Statistics

Table A1 shows the empirical distribution of school-leaving age, conditional on the education level reached by male students (the displayed figures are frequencies). As can be seen, school-leaving age is substantially dispersed, even conditional on final education level [67].

A difficulty with wages is that we do not observe the hours worked (but we know whether the individual worked full-time or part-time). To solve this problem, we decided to select the individuals who experienced at least one full-time employment spell during the five-year observation period. More precisely, we first removed 717 individuals who had never worked (no employment spell recorded during 5 years). The remaining 25,642 individuals comprise 14,213 men and 11,429 women who worked at least once during the observation period. We then selected the individuals who experienced at least one full-time employment spell during the five years. As a consequence, we lost 11.7% of the male sub-sample, but still had 12,538 men. The final stage was to match the sample with geographical data from the National Geographical Institute and with data on schools from the Ministry of Education, in order to compute a number of controls and geography-related instruments. Some observations of the individual's location at the age of entry into junior high school (the code of the jurisdiction of residence) were missing. This left us with only 12,310 males. The possible bias introduced by this selection procedure is limited in the case of men [68]. In the present article, we focus on the male subsample.

A clear advantage of our selection procedure is that it permits us to compare earnings more precisely, given that full-time employment means a 39-hour working week for most wage-earning employees (and given the heavily regulated French labor market of the 90s). More importantly, it tends to select a relatively homogeneous population of youths willing to work full-time, which has some advantages.

The mean wage variable ignores the length of unemployment spells, and the difficulties faced by the individual in finding a stable (and well-paid) job. To capture the effect of job instability on average earnings, we defined a second average, simply called earnings. To compute this average, wages and unemployment benefits are weighted by the corresponding employment or unemployment spell duration [69]. Figure A1 presents a plot of the density of mean wages and earnings (in the men's subsample).

[67] For instance, the first line of Table A1 says that 33 percent of the high-school dropouts left at the age of 18.

[68] The same mode of selection would have left us with a sample of 8630 women, all willing to work full-time. It is therefore likely that there is a sizeable selection bias in the female sub-sample.

8.3 Appendix 3. Likelihood

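We can now derive individual contributions to likelihood, denoted $L_i$. Define log-wages as $x_i=\ln(w_i)$ and log-delay as $y_i=\ln(d_{s_i}/\tau_{s_i})$. Define the cuts,

$$\kappa_s(\theta)=X_1\beta_1+k_s+c_s-\theta_c. \tag{31}$$

These cuts determine the ordered choice of education levels $s$. Denote next,

$$g_a(\theta)=g_a-X_A\beta_A-\theta_A, \tag{32}$$

the cuts determining the discrete values $a_i$ of the age-at-grade-6 entry variable. Given these joint ordered Probit structures, we have,

$$\Pr(a_i=a,s_i=s,x_i,y_i\mid\theta)=\int_{g_a(\theta)}^{g_{a+1}(\theta)}\int_{\kappa_s(\theta)}^{\kappa_{s+1}(\theta)}\mathrm{pdf}(\varepsilon,\zeta\mid x_i,y_i,\theta)\,\mathrm{pdf}(x_i,y_i\mid\theta)\,d\varepsilon\,d\zeta, \tag{33}$$

using the decomposition $\mathrm{pdf}(\varepsilon,\zeta,x_i,y_i\mid\theta)=\mathrm{pdf}(\varepsilon,\zeta\mid x_i,y_i,\theta)\,\mathrm{pdf}(x_i,y_i\mid\theta)$, where the densities involved are normal. Now define,

$$\hat\nu_{is}=\frac{x_i-f_s-\alpha_a-X_{i0}\beta_0-\theta_w}{\sigma_s(\theta_\sigma)},\qquad\hat\eta_{is}=y_i-X_{i2}\beta_2-\mu_a-\theta_d. \tag{34}$$

Thus, given our conditional independence assumptions, on the integration domain, the vector $(x_i,y_i)$ is normal with the following conditional p.d.f., denoted $\psi(\cdot\mid\theta)$,

$$\psi(x_i,y_i\mid\theta)=\frac{1}{2\pi\,\sigma_s(\theta_\sigma)\,\sigma_\eta}\exp\left(-\frac{\hat\nu^{2}_{is}}{2}\right)\exp\left(-\frac{\hat\eta^{2}_{is}}{2\sigma^{2}_\eta}\right). \tag{35}$$

We can therefore factor out $\psi(x_i,y_i\mid\theta)$ in the expression of $\Pr(a_i=a,s_i=s,x_i,y_i\mid\theta)$. Let $\Phi(x)=\int_{-\infty}^{x}\phi(v)\,dv$ be the standard normal c.d.f., and $\phi(x)=(\sqrt{2\pi})^{-1}\exp(-x^{2}/2)$ be the standard normal p.d.f. The distributions of $\varepsilon_i$ and $\zeta_i$ are normal, with mean 0 and variance 1.

[69] A worker is eligible for unemployment benefits if he or she has worked in the recent past. Students thus get zero before their first job. The unemployment benefits are roughly a half of the lost job's wage.

Using conditional independence again, and

$$\mathrm{pdf}(\varepsilon,\zeta\mid x_i,y_i,\theta)=\mathrm{pdf}(\varepsilon,\zeta\mid\theta)=\mathrm{pdf}(\varepsilon,\zeta)=\phi(\varepsilon)\phi(\zeta),$$

we obtain,

$$\Pr(a_i=a,s_i=s,x_i,y_i\mid\theta)=\psi(x_i,y_i\mid\theta)\int_{\kappa_s(\theta)}^{\kappa_{s+1}(\theta)}\phi(\varepsilon)\,d\varepsilon\int_{g_a(\theta)}^{g_{a+1}(\theta)}\phi(\zeta)\,d\zeta.$$

Therefore, integration finally yields,

$$\Pr(a_i=a,s_i=s,x_i,y_i\mid\theta)=\psi(x_i,y_i\mid\theta)\,\bigl(\Phi^{s}_{s+1,i}(\theta)-\Phi^{s}_{s,i}(\theta)\bigr)\bigl(\Phi^{a}_{a+1,i}(\theta)-\Phi^{a}_{a,i}(\theta)\bigr), \tag{36}$$

where by definition,

$$\Phi^{s}_{s,i}(\theta)=\Phi\left[X_{i1}\beta_1+k_{is}+c_s-\theta_c\right],\qquad\Phi^{a}_{a,i}(\theta)=\Phi\left[g_a-X_A\beta_A-\theta_A\right]. \tag{37}$$

Define

$$L_i(\theta)=\psi(x_i,y_i\mid\theta)\,\bigl(\Phi^{s}_{s+1,i}(\theta)-\Phi^{s}_{s,i}(\theta)\bigr)\bigl(\Phi^{a}_{a+1,i}(\theta)-\Phi^{a}_{a,i}(\theta)\bigr),$$

with $a=a_i$ and $s=s_i$. Averaging over the $K$ possible types, the contribution to likelihood of individual $i$ is now simply $L_i=\sum_{j=1}^{K}p_jL_i(\theta_j)$.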
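The structure of an individual contribution, as described by (35) to (37) and the final mixture formula, can be sketched in a few lines. This is only an illustration with hypothetical function and argument names: the standardized wage residual, the delay residual and the four cuts are taken as given numbers, and no claim is made that the authors' estimation code is organized this way.

```python
import math


def normal_cdf(x):
    """Standard normal c.d.f., written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def type_contribution(nu_hat, eta_hat, sigma_s, sigma_eta,
                      cut_lo, cut_hi, g_lo, g_hi):
    """L_i(theta) for one type: density of (x_i, y_i) as in (35), times the
    two ordered-probit interval probabilities of (36)."""
    density = (math.exp(-0.5 * nu_hat ** 2)
               * math.exp(-0.5 * (eta_hat / sigma_eta) ** 2)
               / (2.0 * math.pi * sigma_s * sigma_eta))
    prob_education = normal_cdf(cut_hi) - normal_cdf(cut_lo)
    prob_age = normal_cdf(g_hi) - normal_cdf(g_lo)
    return density * prob_education * prob_age


def mixture_contribution(type_probs, type_terms):
    """L_i as the probability-weighted average over the K types."""
    return sum(p * term for p, term in zip(type_probs, type_terms))


# Illustrative (not estimated) values with K = 2 types.
term_1 = type_contribution(0.2, -0.1, 0.3, 0.07, -0.5, 0.4, -1.0, 1.0)
term_2 = type_contribution(0.6, 0.05, 0.16, 0.07, 0.1, 0.9, -0.8, 1.2)
L_i = mixture_contribution([0.29, 0.71], [term_1, term_2])
print(math.log(L_i))
```

Lowest and highest categories can be handled by passing `-math.inf` or `math.inf` as cuts, since the error function is well defined at infinity.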

8.4 Appendix 4. Goodness of fit

To assess the accuracy of the model's predictions, we simulated a number of key variables and compared them with their empirical counterparts. Table A2 reports the predicted distribution of educational choices in the population. Tables A3 and A4 report the predicted log-wages and the standard deviation of log-wages, respectively. These predictions are given for each of the two subsamples studied above. To perform the simulations, we first predict the education level $s$ and the associated wage of each individual, drawing in the distributions of $\varepsilon$ and $\nu$, and then take the average over possible types $\theta$. This is done 500 times and we take the average of the simulated distributions. The tables show that the performance of the model is very good.

Given that the distribution of log-wages is not easy to interpret, we have computed the means and standard deviations of the monthly wages, expressed in euros. These results are given in Tables A5 and A6. Given that the mean of a log-normal variable depends on the variance of the underlying normal variable, prediction errors are compounded: it follows that the predictions seem a bit less accurate.
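The compounding of errors can be made concrete with the standard log-normal moment formulas. In the sketch below the function names and the numbers are hypothetical; it shows that an error in the predicted standard deviation of log-wages moves the predicted mean wage in levels even when the predicted mean of log-wages is exact.

```python
import math


def lognormal_mean(mu, sigma):
    """Mean of w when ln(w) is normal with mean mu and std. deviation sigma."""
    return math.exp(mu + 0.5 * sigma ** 2)


def lognormal_sd(mu, sigma):
    """Standard deviation of w under the same assumption."""
    return lognormal_mean(mu, sigma) * math.sqrt(math.exp(sigma ** 2) - 1.0)


# Illustrative values: same mean of log-wages, two values of sigma.
mean_true = lognormal_mean(8.6, 0.30)
mean_pred = lognormal_mean(8.6, 0.33)
relative_error = mean_pred / mean_true - 1.0
print(round(100.0 * relative_error, 2), "percent error in the mean wage")
```

The relative error depends only on the gap between the two variances, through the factor $\exp((\sigma_2^2-\sigma_1^2)/2)$.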

8.5 Appendix 5. Comparative statics

To study our model's comparative statics, the main tools are the functions $\kappa_s=X_1\beta_1+c_s+k_s-\theta_c$. These functions determine the probabilities of choosing the various values of $s$ in the population, and therefore, the distribution of educational investment.

To simplify notation, note that we have,

$$k_s=-(1/\gamma)\ln(G_s), \tag{38}$$

where,

$$G_s=\frac{T-d_{s-1}-(T-d_s)\,e^{A_s}}{\Delta d_s}, \tag{39}$$

and

$$A_s=-\gamma\Delta f_s+\gamma^{2}\Delta\sigma^{2}_s/2. \tag{40}$$

We assume that $G_s>0$ throughout.

Straightforward computations yield,

$$\frac{\partial\kappa_s}{\partial\gamma}=\frac{\partial k_s}{\partial\gamma}=\frac{1}{\gamma^{2}}\left[\ln(G_s)+\frac{e^{A_s}}{G_s}\left(\frac{T-d_s}{\Delta d_s}\right)(\gamma^{2}\Delta\sigma^{2}_s-\gamma\Delta f_s)\right]. \tag{41}$$

Note that since the logarithm is a concave function [70], we have $\ln(G)\geq1-G^{-1}$. We can use this property to find a lower bound for $\partial\kappa/\partial\gamma$. After some easy algebra, we find,

$$\frac{\partial\kappa_s}{\partial\gamma}\geq\frac{1}{\gamma^{2}}\left[\frac{(T-d_s)\left(1-e^{A_s}+e^{A_s}(A_s+\gamma^{2}\Delta\sigma^{2}_s/2)\right)}{G_s\,\Delta d_s}\right].$$

It follows that $\partial\kappa_s/\partial\gamma>0$ if $1-e^{A_s}+e^{A_s}A_s>-e^{A_s}\gamma^{2}\Delta\sigma^{2}_s/2$. But this inequality holds, since $e^{A}$ is a strictly convex function. Indeed, convexity implies $1-e^{A_s}+e^{A_s}A_s>0$. We therefore conclude that an increase in risk aversion unambiguously raises the hurdles $\kappa_s$, thus decreasing the probability of choosing education above $s$, for all $s$: increasing risk aversion reduces educational investment. This result will thus hold, on average, if we look at numerical simulations based on the estimated values of the parameters.

[70] Concavity implies $\ln(1)-\ln(G)\leq G^{-1}(1-G)$.

Another important characteristic is the speed parameter $\pi$. Recall that we have $d_s=\tau_s/\pi$, and $1-\pi$ is the probability of repeating a grade, so $\pi$ is a measure of "speed". It is easy to check that,

$$\frac{\partial\kappa_s}{\partial\pi}=\frac{1}{\gamma\pi}\left[\frac{e^{A_s}\tau_s-\tau_{s-1}}{G_s\,\Delta\tau_s}-1\right]. \tag{42}$$

Given that $\gamma<0$, we have $A_s>0$ and $G_s<1$. Thus, we can find an upper bound,

$$\frac{\partial\kappa_s}{\partial\pi}<\frac{1}{\gamma\pi}\left[\frac{1}{G_s}-1\right]<0. \tag{43}$$

If $\gamma>0$, and $\Delta\sigma^{2}_s$ is sufficiently small, we have $A_s<0$ and $G_s>1$. In this latter case too, we get the upper bound and the same conclusion. We conclude that in the relevant range, more able individuals (i.e., those with a higher $\pi$) will invest more in education, since a higher $\pi$ lowers the hurdles $\kappa_s$.

The impact on $s$ of higher returns to education is unambiguously positive, since a higher $\Delta f_s$ lowers the hurdle $\kappa_s$, as shown by the following expression,

$$\frac{\partial\kappa_s}{\partial(\Delta f_s)}=-\frac{e^{A_s}}{G_s}\left(\frac{T-d_s}{\Delta d_s}\right)<0. \tag{44}$$

If $\gamma>0$, the model predicts that increasing the slope of the risk-return curve $s\mapsto(f_s,\sigma_s)$ in the $(f,\sigma)$ plane has the usual effect of discouraging investment in education, since we have,

$$\frac{\partial\kappa_s}{\partial(\Delta\sigma^{2}_s)}=\frac{\gamma\,e^{A_s}}{2G_s}\left(\frac{T-d_s}{\Delta d_s}\right). \tag{45}$$

Interestingly, since $\gamma<0$, the impact of higher wage risk (i.e., higher $\Delta\sigma^{2}_s$) is positive. So, if $\gamma<0$, individuals will tend to study more when $\Delta\sigma_s$ increases, but these effects are quite weak, because $|\gamma|$ is small. This effect is due to special properties of the log-normal distribution: when $\sigma$ increases, the mean of the wage distribution increases, since we have $E(w)=\exp(\mu+\sigma^{2}/2)$, where $\mu=E\ln(w)$.

Finally, $\partial\kappa_s/\partial c_s=1$. Increasing the cost parameters $c_s$ obviously raises the hurdles and discourages investment in education. To sum up, the comparative statics properties of the model are intuitively reasonable.
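The signs derived in this appendix can be checked by finite differences. The sketch below is an illustration only: the function names and parameter values are hypothetical, durations are treated as deterministic numbers, and the analytic slopes are those of (44) and (45).

```python
import math


def hurdle_k(T, d_prev, d_curr, delta_f, delta_var, gamma):
    """k_s = -(1/gamma) ln(G_s), with G_s and A_s as in (39) and (40)."""
    a = -gamma * delta_f + 0.5 * gamma ** 2 * delta_var
    g = ((T - d_prev) - (T - d_curr) * math.exp(a)) / (d_curr - d_prev)
    return -math.log(g) / gamma


def analytic_slopes(T, d_prev, d_curr, delta_f, delta_var, gamma):
    """Derivatives of k_s with respect to the return and risk increments,
    following (44) and (45)."""
    a = -gamma * delta_f + 0.5 * gamma ** 2 * delta_var
    g = ((T - d_prev) - (T - d_curr) * math.exp(a)) / (d_curr - d_prev)
    scale = math.exp(a) / g * (T - d_curr) / (d_curr - d_prev)
    return -scale, 0.5 * gamma * scale


def central_difference(func, x, step=1e-6):
    """Two-sided numerical derivative of a scalar function."""
    return (func(x + step) - func(x - step)) / (2.0 * step)


# Illustrative (not estimated) values.
T, D_PREV, D_CURR = 40.0, 3.0, 6.0
DELTA_F, DELTA_VAR, GAMMA = 0.15, 0.01, -0.3

slope_returns, slope_risk = analytic_slopes(T, D_PREV, D_CURR,
                                            DELTA_F, DELTA_VAR, GAMMA)
numeric_returns = central_difference(
    lambda x: hurdle_k(T, D_PREV, D_CURR, x, DELTA_VAR, GAMMA), DELTA_F)
numeric_risk = central_difference(
    lambda x: hurdle_k(T, D_PREV, D_CURR, DELTA_F, x, GAMMA), DELTA_VAR)
numeric_gamma = central_difference(
    lambda x: hurdle_k(T, D_PREV, D_CURR, DELTA_F, DELTA_VAR, x), GAMMA)

print(slope_returns, numeric_returns)
print(slope_risk, numeric_risk)
print(numeric_gamma)
```

With a negative value of the risk parameter, as in the discussion above, the sketch returns a negative slope with respect to returns, a negative slope with respect to wage risk, and a positive slope with respect to the risk parameter itself.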


8.6 Appendix 6. Estimation of the time preference parameter

We finally report on estimations of the infinite-horizon model with discounting. The infinite-horizon version of the model is derived in Appendix 1 above. We assumed that $\sigma^{2}_\eta=0$ to solve the model (see above). The discount factor $\delta$ can be estimated by grid search. Maximum-likelihood estimation and grid search have yielded two estimates of $\delta$ in the two subsamples considered, the sons of professionals and the rest of the population. In the less advantaged subsample, the best value of the discount parameter is $\delta=0.995$. In the sons-of-professionals subsample, the best value is $\delta=1$. The results are summarized in Table A7. The estimated values of risk aversion are only slightly smaller when $\delta=0.995$, in the less advantaged subsample. Vuong's test cannot reject $\delta=1$. So, it seems that choosing $\delta=1$ and a finite-horizon model is a good approximation.


Table 1: Estimation results (Whole sample)

Table 1a

                                     E(ln(W))              V(ln(W))            Risk Aversion
                                  Coeff.        t       Coeff.        t       Coeff.        t
Family Background
Father went to College             0.040     3.632       0.082     1.496       0.018     3.782
Mother went to College             0.033     2.651      -0.021    -0.319       0.006     1.439
Father is a Professional           0.020     2.506      -0.018    -0.499       0.006     1.710
Mother is a Professional           0.019     1.685      -0.042    -0.567      -0.002    -0.278
Density of Population              0.002     0.468           *         *           *         *
Paris Area                         0.068     8.700           *         *           *         *
Unemployment Rate                 -0.018    -3.877           *         *           *         *
Age at grade 6
10 years old                       0.025     2.050           *         *           *         *
11 years old (ref.)                    *         *           *         *           *         *
12 years old                       0.001     0.264           *         *           *         *
13 years old                      -0.015    -1.659           *         *           *         *
14 years old or more              -0.004    -0.175           *         *           *         *
Completed Education
High school dropouts (Ref.)
Vocational degree                  0.161    56.777       0.166     2.654           *         *
High school graduates              0.125    53.175       0.548    13.595           *         *
Two years of college               0.149    51.321       0.309    12.447           *         *
Four years of college              0.123    46.760       0.351    17.440           *         *
Graduate Studies                   0.108    43.099       0.260    15.883           *         *
Value of Types
Type 1                             8.630  1239.183       0.087    27.517       0.654    94.754
Type 2                             8.551  1185.422       0.025    15.952       0.721   107.612

Table 1b

                                                      Education (S)          Delay (d/τ)         Age at Grade 6
                                                    Coeff.        t       Coeff.        t       Coeff.        t
Standard deviation of η                                  *         *      0.0045    3.9947           *         *
Family Background
Father went to College                               0.990     5.903      -0.009    -2.858      -0.326    -5.465
Mother went to College                               0.173     0.936      -0.012    -3.404      -0.345    -4.637
Father is a Professional                             0.459     3.237      -0.013    -5.425      -0.434    -9.681
Paris Area                                           0.176     4.290           *         *           *         *
Age at grade 6
10 years old                                             *         *      -0.023   -12.765           *         *
11 years old (ref.)                                      *         *           *         *           *         *
12 years old                                             *         *       0.058    29.717           *         *
13 years old                                             *         *       0.085    26.052           *         *
14 years old or more                                     *         *       0.086     8.935           *         *
School Density and Distance to College
Stock of vocational and technical high schools       0.231     8.698           *         *           *         *
Distance to college 2d quartile                     -0.114    -3.274           *         *           *         *
Distance to college 4th quartile                    -0.067    -1.629           *         *           *         *
Month of birth                                           *         *           *         *       0.075     6.171
Education costs
C1                                                       *         *           *         *           *         *
C2                                                   2.465    14.566           *         *           *         *
C3                                                   3.203    16.583           *         *           *         *
C4                                                   3.848    17.606           *         *           *         *
C5                                                   4.085    16.088           *         *           *         *
Value of Types
Type 1                                               5.844    38.520       0.219   114.398       0.368    11.779
Type 2                                               3.452    18.790       0.094    70.555       0.477    26.661

Distribution of Types: Prob(type=1)                                          0.2925   (t = 38.2196)
Number of observations                                                       12310
Log likelihood                                                               -0.909197
Vuong Test of H0="No Unobserved Heterogeneity" (P-value)                     <.0001

Table 1Ba


Family Background






Paris Area 0.067 8.505 * * * *


Age at grade 6

10 years old 0.023 2.024 -0.154 -4.378 * *

11 years old (ref.) * * * * * *

12 years old 0.005 0.875 0.176 7.078 * *

13 years old -0.011 -1.197 0.156 3.898 * *

14 years old or more 0.011 0.330 1.196 7.431 * *

Completed Education







Value of Types

Type 1 8.630 1262.602 0.077 64.240 0.657 93.121

Type 2 8.547 1157.001 0.020 26.582 0.721 111.692

Table 1Bb


Standard deviation of η (intercept) * * 0.0048 4.4774 * *

Age at grade 6 in the stand. dev. of η

10 years old * * 0.132 1.714 * *

11 years old (ref.) * * * * * *

12 years old * * -0.142 -4.908 * *

13 years old * * -0.279 -5.007 * *

14 years old or more * * -0.074 -0.277 * *

Family Background






Paris Area 0.181 4.385 * * * *


Age at grade 6

10 years old * * -0.027 -12.854 * *

11 years old (ref.) * * * * * *

12 years old * * 0.058 29.671 * *

13 years old * * 0.085 27.051 * *

14 years old or more * * 0.100 7.518 * *






Month of birth * * * * 0.075 6.176

Education costs

C1 * * * * * *

C2 2.496 15.067 * * * *

C3 3.147 16.716 * * * *

C4 3.852 17.120 * * * *

C5 4.059 14.965 * * * *

Value of Types

Type 1 6.093 35.008 0.217 108.860 0.391 11.994

Type 2 3.593 18.463 0.094 67.497 0.467 25.495


Prob(type=1) 0.2943 47.7181


Log likelihood

Risk Aversion


12310

-0.904941

Table 1B: Variant of estimation results (Whole sample)

E(ln(W)) V(ln(W))

Table 2. Sensitivity. Estimated values of the risk-aversion parameter $\theta_\gamma$, for various values of horizon T.

                 Type 1                          Type 2
   T     $\theta_\gamma$   standard dev.   $\theta_\gamma$   standard dev.
  60         0.6081           0.0073           0.6783           0.0073
  65         0.6538           0.0065           0.721            0.0066
  70         0.6908           0.0059           0.7547           0.006
  75         0.7212           0.0055           0.782            0.0055
  80         0.7467           0.0050           0.8046           0.0051
  85         0.7755           0.0046           0.8136           0.0046
  95         0.8031           0.0042           0.8537           0.0042
 105         0.8274           0.0038           0.8746           0.0038
 115         0.8468           0.0034           0.8899           0.0034

Table 3: Subsample Estimation Results. Sons of Professionals

Table 3a


Family Background


Mother went to College 0.043 2.862 0.007 0.138 0.006 1.213

Father is a Professionnal * * * * * *

Mother is a Professionnal * * * * * *

Density of Population -0.004 -0.302 * * * *

Paris Area 0.052 3.157 * * * *


Age at grade 6

10 years old 0.034 1.625 * * * *

11 years old (ref.) * * * * * *

12 years old 0.053 2.632 * * * *

13 years old 0.121 2.575 * * * *

14 years old or more -0.133 -0.797 * * * *

Completed Education


Vocational degree 0.253 19.238 -0.524 -6.530 * *





Value of Types

Type 1 8.532 396.747 0.156 6.634 0.771 52.122

Type 2 8.429 290.198 0.069 4.917 0.797 73.806

Table 3b



Family Background






Paris Area 0.208 2.600 * * * *

Unemployment Rate 0.078 1.274 * * * *

Age at grade 6

10 years old * * -0.028 -7.499 * *

11 years old (ref.) * * * * * *

12 years old * * 0.067 12.036 * *

13 years old * * 0.113 7.799 * *

14 years old or more * * 0.148 3.440 * *





Distance to college 4th quartile 0.007 0.080 * * * *

Month of birth * * * * 0.112 3.306

Education costs (cuts)

C1 * * * * * *

C2 0.985 1.368 * * * *

C3 1.284 1.808 * * * *

C4 1.026 1.405 * * * *

C5 0.854 1.054 * * * *

Value of Types

Type 1 6.757 13.294 0.200 47.537 0.982 10.805

Type 2 6.389 8.591 0.076 27.302 0.987 17.713

Distribution of Types : Prob(type=h1) 0.2955 18.6646


Log likelihood


-0.866965

<.0001



2315

Table 4a


Family Background






Paris Area 0.074 7.931 * * * *


Age at grade 6

10 years old 0.012 0.887 * * * *

11 years old (ref.) * * * * * *

12 years old -0.004 -0.621 * * * *

13 years old -0.023 -2.030 * * * *

14 years old or more 0.008 0.256 * * * *

Completed Education




Two years of college 0.142 46.323 -0.092 -2.145 * *



Value of Types

Type 1 8.624 1170.764 0.052 20.221 0.692 64.673

Type 2 8.543 948.864 0.123 19.529 0.693 84.524

Table 4b



Family Background


Mother went to College 1.099 2.661 0.005 0.745 -0.584 -4.780




Paris Area 0.186 3.808 * * * *


Age at grade 6

10 years old * * -0.023 -9.918 * *

11 years old (ref.) * * * * * *

12 years old * * 0.048 23.102 * *

13 years old * * 0.075 22.722 * *

14 years old or more * * 0.107 13.251 * *






Month of birth * * * * 0.070 5.317


C1 * * * * * *

C2 0.674 3.120 * * * *

C3 1.315 5.995 * * * *

C4 1.989 6.759 * * * *

C5 2.330 7.195 * * * *

Value of Types

Type 1 5.387 27.594 0.231 100.914 0.228 5.956

Type 2 5.187 21.419 0.100 62.705 0.511 27.682


Prob(type=h1) 0.2578 26.5375


Log likelihood


9995

-0.923022

<.0001

E(ln(W)) V(ln(W))

Table 4 : Subsample Estimation Results. Rest of the Sample

Risk Aversion


Estimated Observed Simulated Estimated Observed Simulated

Elasticity Value 1%-impact Elasticity Value 1%-impact

High school dropouts - - - - - -

Vocational degree -2.98 0.95 0.92 -4.50 0.84 0.82

High school graduates -15.12 0.76 0.68 -17.01 0.42 0.36

Two years of college -23.74 0.60 0.51 -24.91 0.27 0.21

Four years of college -29.47 0.41 0.32 -35.74 0.11 0.07

Graduate Studies -28.46 0.30 0.24 -41.56 0.06 0.04









High school graduates 0.18 0.76 0.76 0.09 0.42 0.42

Two years of college 0.06 0.60 0.60 -0.11 0.27 0.27

Four years of college 0.37 0.41 0.41 0.37 0.11 0.11

Graduate Studies 0.14 0.30 0.30 0.85 0.06 0.06


Vocational degree 1.60 0.95 0.96 3.56 0.84 0.86


Two years of college 11.39 0.60 0.64 18.91 0.27 0.31









Table 5: Elasticity of Survival Functions Ψs with respect to Various Parameters

Elasticity w.r.t Variances of Wages

Elasticity w.r.t Returns to Education

Elasticity w.r.t Probability of Grade Repetition

Sons of Professionals Other Students

Elasticity w.r.t Relative Risk Aversion

Elasticity w.r.t Costs of Education

Table 6: Transition towards upper schooling levels following an increase in speed, in the rest of the sample*

               Matrix of simulated transitions (from level)             Distributions of schooling levels
                                                                       Rest of sample,  Rest of sample,   SOPs**,
To level       1        2        3        4        5        6            simulated         observed      observed
1          50.13%    0.00%    0.00%    0.00%    0.00%    0.00%              8.17%           16.29%         5.18%
2          49.87%   51.80%    0.00%    0.00%    0.00%    0.00%             29.19%           40.71%        18.75%
3           0.00%   34.56%    0.26%    0.00%    0.00%    0.00%             14.11%           15.16%        15.77%
4           0.00%   13.63%   35.88%    0.67%    0.00%    0.00%             11.11%           16.51%        19.52%
5           0.00%    0.00%   39.48%    7.16%    0.00%    0.00%              7.16%            4.67%        10.67%
6           0.00%    0.00%   24.38%   92.17%  100.00%  100.00%             30.26%            6.66%        30.11%
Total        100%     100%     100%     100%     100%     100%               100%             100%          100%

* Note: the increase in speed is proportional, so that the average speed in the rest of the sample becomes equal to that of the sons of professionals.

** Sons of professionals.

Figure 4: Risk-Return curves: comparison of sub-samples
(Vertical axis: Returns; horizontal axis: Risks. Series: Sons of Professionals, Rest of Sample.)

Figure 1: Risk-Return curves: whole sample
(Vertical axis: Returns; horizontal axis: Risks. Series: Type 1, Type 2.)

Figure 2: Risk-Return curves: sons of professionals
(Vertical axis: Returns; horizontal axis: Risks. Series: Type 1, Type 2.)

Figure 3: Risk-Return curves, rest of sample
(Vertical axis: Returns; horizontal axis: Risks. Series: Type 1, Type 2.)

Figure 5: Wage distributions: Rest of Sample
(Vertical axis: density; horizontal axis: monthly wages (euros). Series: High school dropouts and Graduate studies, each shown as mixture, Type 1 and Type 2.)

Figure 6: Wage distributions: sons of professionals
(Vertical axis: density; horizontal axis: monthly wages (euros). Series: High school dropouts and Graduate studies, each shown as mixture, Type 1 and Type 2.)

Appendix Table A1: Empirical Distribution of Male School-Leaving Age, Conditional on Education Level

Age while leaving school          15    16    17    18    19    20    21    22    23    24    25    26    27    28    29    30    31    32
High school dropouts              0.01  0.17  0.24  0.33  0.16  0.07  0.01  0.01  0     0     0     0     0     0     0     0     0     0
Vocational degree                 0     0     0.03  0.30  0.37  0.21  0.06  0.02  0.01  0     0     0     0     0     0     0     0     0
High school graduates (grade 12)  0     0     0     0.02  0.12  0.31  0.32  0.15  0.06  0.02  0.01  0     0     0     0     0     0     0
Two years of college (grade 14)   0     0     0     0     0     0.13  0.27  0.28  0.19  0.08  0.02  0.01  0.01  0     0     0     0     0
Four years of college (grade 16)  0     0     0     0     0     0     0.04  0.17  0.20  0.27  0.13  0.09  0.04  0.03  0.01  0.01  0.01  0.01
Graduate studies                  0     0     0     0     0     0     0     0.04  0.24  0.29  0.15  0.11  0.07  0.04  0.02  0.02  0.01  0

Figure A1: Wage Distributions (in Euros). (Vertical axis: density function; other labels: earnings, mean wage.)

Figure A2: Distribution of Delay (d/t). (Vertical axis: Percentage (%); horizontal axis: delay; series: Parents are Professionals, Rest of Sample.)

Figure A3: Historical Growth of Vocational Secondary Education. (Horizontal axis: Year, 1950 to 2000; vertical axes: Vocational High School Stock in units, and Stock divided by 15-year-olds in thousands; series: Stock, Stock per capita of 15-year-olds.)

Figure A4: Distribution of Stock of Vocational High Schools 1982. (Vertical axis: Density; horizontal axis: County-level stock per capita of 15-19 year-olds (in thousands).)

Figure A5: Distribution of Age at Grade 6 Entry. (Vertical axis: Percentage; horizontal axis: Age at Grade 6 Entry, in categories ≤10, 11, 12, 13, ≥14; series: Sons of professionals, Rest of the sample.)

TABLE A2: Chosen Education Levels and Speed of Completion: Predicted and Observed

                                                                        Parental Occupation
                                               Whole sample            Professionals           Others
Chosen Education Levels                        Observed   Predicted    Observed   Predicted    Observed   Predicted
High school dropouts                           14.20%     13.66%       5.18%      5.08%        16.29%     15.63%
Vocational degree                              36.58%     37.23%       18.75%     18.60%       40.71%     42.27%
High school graduates                          15.27%     15.71%       15.77%     16.02%       15.16%     15.24%
Two years of college                           17.08%     16.97%       19.52%     19.72%       16.51%     16.13%
Four years of college                          5.80%      5.67%        10.67%     10.66%       4.67%      4.47%
Graduate Studies                               11.07%     10.77%       30.11%     29.92%       6.66%      6.27%
Probability of entering grade 6 before 11      71.29%     71.24%       87.29%     87.30%       67.55%     67.52%
Overall probability of annual grade completion 86.28%     86.10%       89.03%     88.85%       85.67%     85.49%

TABLE A3: Mean log wages: Predicted and Observed

                          Whole sample            Professionals           Others
                          Observed   Predicted    Observed   Predicted    Observed   Predicted
High school dropouts      8.652      8.566        8.717      8.507        8.647      8.553

All levels                8.871      8.868        9.064      9.059        8.826      8.818

TABLE A4: Standard Deviation of Log Wages: Predicted and Observed

                          Whole sample            Professionals           Others
                          Observed   Predicted    Observed   Predicted    Observed   Predicted
High school dropouts      0.289      0.200        0.360      0.309        0.283      0.320

All levels                0.339      0.325        0.387      0.369        0.310      0.292

TABLE A5: Mean Monthly Wages (Euros): Predicted and Observed

                          Whole sample            Professionals           Others
                          Observed   Predicted    Observed   Predicted    Observed   Predicted
High school dropouts      905        821          997        801          898        837
Vocational degree         974        979          1016       982          969        976
High school graduates     1086       1143         1133       1149         1075       1113
Two years of college      1249       1369         1307       1405         1233       1286
Four years of college     1424       1603         1494       1687         1387       1462
Graduate Studies          1865       1892         1950       1960         1776       1652
All levels                1153       1153         1423       1419         1090       1082

TABLE A6: Standard Deviation of Monthly Wages (Euros): Predicted and Observed

                          Whole sample            Professionals           Others
                          Observed   Predicted    Observed   Predicted    Observed   Predicted
High school dropouts      258        169          466        254          235        274
Vocational degree         238        220          269        218          234        225
High school graduates     324        324          374        325          310        276
Two years of college      343        452          356        416          338        305
Four years of college     426        624          483        601          389        380
Graduate Studies          615        846          684        735          518        492
All levels                449        435          614        563          375        336

Column groups in Tables A2 to A6: Whole sample; Parental Occupation: Professionals, Others.


Discount Factor (d) 0.9950 1. 1. 1.

Estimated Mean Risk Aversion:

0.7458 0.7782 0.6967 0.6955

(48.43) (54.93) (63.92) (63.98)

0.7672 0.7969 0.6968 0.6957

(66.71) (75.95) (84.97) (84.69)

Prob(type=1) 0.2934 0.2933 0.2618 0.2618

(20.55) (17.37) (38.94) (26.31)

Mean Log-Likelihood -0.8825140 -0.8825490 -0.9307170 -0.9307340

Vuong Test of Discounted Utility Model (p-value)

Student t-statistics are in parentheses

Type 2

Table A7: Estimation of Discount Factor

Sons of Professionals Other Students

Type 1

0.55946 0.99997

Discounted

Utility Model

Discounted

Utility Model

Benchmark

Model

Benchmark

Model

SUPPLEMENTARY MATERIAL

Do risk aversion and wages explain educational choices?

T. Brodaty, R. Gary-Bobo and A. Prieto

June 2013

Table X1a


Family Background


Mother went to College 0.031 2.382 0.034 0.714 0.010 1.845

Father is a Professional 0.026 3.087 0.097 2.687 0.006 1.480

Mother is a Professional 0.025 2.167 0.058 1.213 -0.004 -0.799


Paris Area 0.070 8.461 * * * *


Age at grade 6

10 years old 0.024 2.056 * * * *

11 years old (ref.) * * * * * *

12 years old -0.013 -2.206 * * * *

13 years old -0.035 -3.270 * * * *

14 years old or more -0.022 -0.746 * * * *

Completed Education



High school graduates (grade 12) 0.113 58.271 0.386 7.126 * *

Two years of college (grade 14) 0.139 61.299 -0.131 -3.648 * *

Four years of college (grade 16) 0.114 51.087 0.271 5.454 * *


Constant (no unobserved heterogeneity)

Constant 8.602 1463.528 0.082 27.778 0.685 99.217

Table X1b



Family Background




Mother is a Professional -0.048 -0.333 -0.006 -1.654 -0.390 -5.760


Paris Area 0.181 5.181 * * * *


Age at grade 6

10 years old * * -0.028 -14.267 * *

11 years old (ref.) * * * * * *

12 years old * * 0.061 34.354 * *

13 years old * * 0.092 28.905 * *

14 years old or more * * 0.122 15.050 * *






Month of birth * * * * 0.076 6.185

Education costs

C1 * * * * * *

C2 2.219 21.359 * * * *

C3 2.750 23.093 * * * *

C4 3.488 22.818 * * * *

C5 3.550 21.425 * * * *

Constant (no unobserved heterogeneity)

Constant 3.075 26.876 0.130 124.775 0.444 33.986


Log likelihood -1.1086

E(ln(W)) V(ln(W))

Table X1 : Estimation of the model with no unobserved heterogeneity

Risk Aversion


12310

Table X2a


Family Background

Father went to College 0.028 2.550 -0.047 -1.124 0.009 2.458

Mother went to College 0.025 1.982 -0.071 -1.674 0.000 -0.049


Mother is a Professional 0.014 1.216 -0.030 -0.680 0.007 1.604

Density of Population 0.000 -0.088 * * * *

Paris Area 0.066 8.431 * * * *


Age at grade 6

10 years old 0.021 1.777 * * * *

11 years old (ref.) * * * * * *

12 years old -0.009 -1.528 * * * *

13 years old -0.004 -0.411 * * * *

14 years old or more 0.032 1.521 * * * *

Completed Education







Value of Types

type 1 8.656 1006.717 0.092 29.815 0.733 109.358

type 2 8.444 622.365 0.006 22.471 0.755 118.031

type 3 8.543 1132.436 0.014 21.478 0.743 109.265

Table X2b



Family Background


Mother went to College -0.139 -0.518 -0.014 -3.876 -0.349 -4.453




Paris Area 0.246 5.077 * * * *


Age at grade 6

10 years old * * -0.020 -10.078 * *

11 years old (ref.) * * * * * *

12 years old * * 0.024 9.279 * *

13 years old * * 0.052 11.981 * *

14 years old or more * * 0.074 9.459 * *






Month of birth * * * * 0.079 6.209


C1 * * * * * *

C2 1.586 4.827 * * * *

C3 2.519 7.280 * * * *

C4 2.372 6.409 * * * *

C5 2.629 6.404 * * * *

Value of Types

type 1 8.808 26.639 0.258 124.316 0.017 0.471

type 2 6.078 15.270 0.073 50.078 0.974 15.422

type 3 6.616 17.601 0.152 97.520 0.263 10.335

Distribution of types

Prob(type 1) 0.164 32.328

Prob(type 2) 0.378 33.520


Log likelihood


Vuong Test of H0="Unobserved Heterogeneity: Two Types" (P-value)

Delay (d/τ) Age at Grade 6

12310

-0.850357

Table X2 : Estimation of a model with 3 types on the full sample

<.0001

E(ln(W)) V(ln(W))

<.0001

Risk Aversion

Education (S)

Table X3: Estimation results (Whole sample). Limited role of unobserved heterogeneity

Table X3a


Family Background



Father is a Professional 0.010 1.129 0.090 2.674 -0.007 -1.924

Mother is a Professional 0.016 1.250 0.118 2.406 0.005 1.116

Density of Population -0.001 -0.202 * * * *

Paris Area 0.064 7.677 * * * *


Age at grade 6

10 years old 0.003 0.238 * * * *

11 years old (ref.) * * * * * *

12 years old 0.021 3.047 * * * *

13 years old 0.007 0.538 * * * *

14 years old or more 0.021 0.984 * * * *

Completed Education




Two years of college 0.167 47.847 -0.126 -4.351 * *



Value of Types

Type 1 8.676 761.906

Type 2 8.498 759.570

Table X3b



Family Background



Father is a Professional -0.083 -0.598 -0.015 -4.682 -0.435 -9.326



Paris Area 0.230 5.319 * * * *


Age at grade 6

10 years old * * -0.031 -15.651 * *

11 years old (ref.) * * * * * *

12 years old * * 0.060 30.156 * *

13 years old * * 0.091 25.433 * *

14 years old or more * * 0.122 14.904 * *






Month of birth * * * * 0.076 6.129


C1 * * * * * *

C2 1.915 11.792 * * * *

C3 2.263 12.897 * * * *

C4 2.841 14.178 * * * *

C5 2.921 13.659 * * * *

Value of Types

Type 1 6.414 23.168

Type 2 4.230 20.960


Prob(type=1) 0.2084 12.0539


Log likelihood -1.10204

12310



0.130 101.726 0.444 33.838

0.075 69.734 0.747 99.298

Table X4a


Family Background






Paris Area 0.068 8.704 * * * *


Age at grade 6

10 years old 0.025 2.047 * * * *

11 years old (ref.) * * * * * *

12 years old 0.002 0.370 * * * *

13 years old -0.015 -1.645 * * * *

14 years old or more -0.004 -0.150 * * * *

Completed Education







Value of Types

Type 1 8.629 1241.318 0.087 77.119 0.655 92.239

Type 2 8.550 1177.000 0.025 29.729 0.722 109.394

Table X4b



Family Background






Paris Area 0.103 2.549 * * * *


Age at grade 6

10 years old * * -0.023 -12.796 * *

11 years old (ref.) * * * * * *

12 years old * * 0.058 29.360 * *

13 years old * * 0.085 26.037 * *

14 years old or more * * 0.086 8.842 * *


Stock of vocational and technical high schools * * * * * *




Month of birth * * * * 0.075 6.172


C1 * * * * * *

C2 2.473 14.664 * * * *

C3 3.204 15.583 * * * *

C4 3.834 16.417 * * * *

C5 4.053 14.956 * * * *

Value of Types

Type 1 5.657 35.734 0.218 114.211 0.373 11.971

Type 2 3.279 17.217 0.093 70.233 0.475 26.407


Prob(type=1) 0.2957 48.077


Log likelihood -0.912281

12310

Table X4: Estimation results (Whole sample). Without the school-density instrument.



Table X5a


Family Background






Paris Area 0.068 8.799 * * * *


Age at grade 6

10 years old 0.025 1.973 * * * *

11 years old (ref.) * * * * * *

12 years old 0.002 0.291 * * * *

13 years old -0.015 -1.650 * * * *

14 years old or more -0.005 -0.237 * * * *

Completed Education







Value of Types

Type 1 8.630 1163.109 0.087 76.766 0.654 100.053

Type 2 8.551 1229.695 0.025 29.682 0.721 110.219

Table X5b



Family Background






Paris Area 0.160 4.079 * * * *


Age at grade 6

10 years old * * -0.023 -15.352 * *

11 years old (ref.) * * * * * *

12 years old * * 0.057 34.108 * *

13 years old * * 0.085 26.195 * *

14 years old or more * * 0.086 9.404 * *



Distance to college 2d quartile * * * * * *

Distance to college 3d quartile * * * * * *

Distance to college 4th quartile * * * * * *

Month of birth * * * * 0.075 6.104


C1 * * * * * *

C2 2.472 18.758 * * * *

C3 3.212 21.760 * * * *

C4 3.859 22.984 * * * *

C5 4.089 22.132 * * * *

Value of Types

Type 1 5.955 39.715 0.219 134.745 0.367 12.827

Type 2 3.562 23.534 0.094 84.273 0.478 28.084


Prob(type=1) 0.2924 47.7658


Log likelihood -0.909925

12310

Table X5: Estimation results (Whole sample). Without distance to college



Table X6a


Family Background






Paris Area 0.070 8.712 * * * *


Age at grade 6

10 years old 0.021 1.610 * * * *

11 years old (ref.) * * * * * *

12 years old 0.002 0.285 * * * *

13 years old -0.018 -1.894 * * * *

14 years old or more 0.003 0.139 * * * *

Completed Education



High school graduates (grade 12) 0.124 51.217 0.573 16.182 * *

Two years of college (grade 14) 0.148 51.264 0.316 13.194 * *

Four years of college (grade 16) 0.122 49.415 0.354 19.744 * *


Value of Types

Type 1 8.633 1106.465 0.084 74.479 0.654 94.812

Type 2 8.552 1189.283 0.023 27.987 0.719 102.771

Table X6b



Family Background






Paris Area 0.165 3.920 * * * *


Age at grade 6

10 years old * * -0.025 -15.286 * *

11 years old (ref.) * * * * * *

12 years old * * 0.057 32.137 * *

13 years old * * 0.083 24.466 * *

14 years old or more * * 0.086 8.636 * *






Month of birth * * * * 0.079 6.061


C1 * * * * * *

C2 2.496 17.826 * * * *

C3 3.202 20.395 * * * *

C4 3.784 20.925 * * * *

C5 3.958 20.006 * * * *

Value of Types

Type 1 5.849 35.017 0.220 128.856 0.373 12.262

Type 2 3.390 20.318 0.095 82.506 0.484 27.540


Prob(type=1) 0.2825 44.6083


Log likelihood -0.902332

11107

Education (S) Delay (d/τ) Age at Grade 6

Table X6: Estimation results; sub-sample of communes with a small number of students in the sample

Table X7a


Family Background




Mother is a Professional 0.021 1.871 -0.017 -0.352 0.000 -0.043

Density of Population * * * * * *

Paris Area * * * * * *

Unemployment Rate * * * * * *

Age at grade 6

10 years old 0.028 2.296 * * * *

11 years old (ref.) * * * * * *

12 years old 0.001 0.195 * * * *

13 years old -0.015 -1.583 * * * *

14 years old or more -0.002 -0.084 * * * *

Completed Education







Value of Types

Type 1 8.625 1384.986 0.088 27.651 0.657 97.601

Type 2 8.547 1271.287 0.026 16.494 0.725 112.423

Table X7b



Family Background





Density of Population * * * * * *

Paris Area * * * * * *

Unemployment Rate * * * * * *

Age at grade 6

10 years old * * -0.023 -12.839 * *

11 years old (ref.) * * * * * *

12 years old * * 0.058 29.962 * *

13 years old * * 0.085 25.921 * *

14 years old or more * * 0.085 8.849 * *






Month of birth * * * * 0.075 6.168

Education costs

C1 * * * * * *

C2 2.489 15.189 * * * *

C3 3.213 17.158 * * * *

C4 3.837 17.762 * * * *

C5 4.068 16.142 * * * *

Value of Types

Type 1 5.644 37.982 0.219 114.371 0.369 11.815

Type 2 3.246 18.478 0.094 70.700 0.477 26.595


Prob(type=1) 0.294 38.3248


Log likelihood -0.919831

12310

Education (S) Delay (d/τ) Age at Grade 6

Table X7: Estimation results (Whole sample). Without controls for geography
