
Demography, Volume 38, Number 1, February 2001: 115-132

ESTIMATING WEALTH EFFECTS WITHOUT EXPENDITURE DATA - OR TEARS:

    AN APPLICATION TO EDUCATIONAL ENROLLMENTS IN STATES OF INDIA*

    DEON FILMER AND LANT H. PRITCHETT

*Deon Filmer, Development Research Group, The World Bank, 1818 H Street NW, Washington, DC 20433; E-mail: [email protected]. Lant H. Pritchett, John F. Kennedy School of Government and The World Bank. This research was funded in part through World Bank research support grant (RPO 682-11). We would like to thank Harold Alderman, Zoubida Allaoua, Gunnar Eskeland, Jeffrey Hammer, Keith Hinchliffe, Valerie Kozel, Alan Krueger, Peter Lanjouw, Marlaine Lockheed, Berk Ozler, and Martin Ravallion for valuable discussions and comments, including some on an earlier version of this paper (Filmer and Pritchett 1998). The findings, interpretations, and conclusions expressed here are entirely those of the authors. They do not necessarily represent the views of the World Bank, its executive directors, or the countries they represent.

Using data from India, we estimate the relationship between household wealth and children's school enrollment. We proxy wealth by constructing a linear index from asset ownership indicators, using principal-components analysis to derive weights. In Indian data this index is robust to the assets included, and produces internally coherent results. State-level results correspond well to independent data on per capita output and poverty. To validate the method and to show that the asset index predicts enrollments as accurately as expenditures, or more so, we use data sets from Indonesia, Pakistan, and Nepal that contain information on both expenditures and assets. The results show large, variable wealth gaps in children's enrollment across Indian states. On average a rich child is 31 percentage points more likely to be enrolled than a poor child, but this gap varies from only 4.6 percentage points in Kerala to 38.2 in Uttar Pradesh and 42.6 in Bihar.

This paper has both an empirical and a methodological goal. The empirical goal is to investigate the effect of household economic status on children's educational attainment across the states of India. To accomplish this aim, we propose and defend a method for estimating the effect of household wealth on educational outcomes even in the absence of survey questions on income or expenditures. We use data on asset ownership (e.g., owning a bicycle or radio) and housing characteristics (e.g., number of rooms, type of toilet facilities), henceforth called asset indicators or asset variables, to construct an asset index. We handle the vexing problem of choosing the appropriate weights by using the statistical procedure of principal components. We demonstrate the empirical validity of this approach for India by comparing the state-level averages of the asset index with data on poverty rates and gross state product per capita. Going further, we use data sets from Indonesia, Nepal, and Pakistan, which contain both expenditures and asset variables for the same households, to show a reasonable correspondence between the classification of households based on the asset index and a classification based on consumption expenditures. The econometric evidence suggests that the asset index, as a proxy of economic status for use in predicting enrollments, is at least as reliable as conventionally measured consumption expenditures, and sometimes more so.

This straightforward and pragmatic method of constructing a proxy for household economic status produces results that are reassuringly consistent with other approaches (see Montgomery et al. 2000 and references therein) and potentially is broadly applicable. Demographic and Health Surveys (DHS) have been conducted with nearly identical survey instruments in more than 40 countries. (In India this survey is called the National Family Health Survey, or NFHS.) These surveys include assets and housing conditions but no consumption expenditures (except for an experimental consumption module included in Indonesia in 1994, a feature that we exploit later in this paper).

Our proposed method for estimating household wealth in the DHS/NFHS allows estimates of the association of wealth with education across households and permits a comparison of wealth gaps across countries (and in India across states). In separate papers we use the asset index to examine wealth and gender gaps in enrollment and educational attainment in more than 35 countries (Filmer 2000; Filmer and Pritchett 1999a).

We limit ourselves here to demonstrating the validity and usefulness of an asset index in analyzing education outcomes. The same method might be equally useful in examining wealth differences in other socioeconomic outcomes in the DHS data, such as mortality, morbidity, utilization of health facilities, fertility, and contraceptive use (Bonilla-Chacin and Hammer 1999; Gwatkin et al. 2000; Stecklov, Bommier, and Boerma 1999). Sahn and Stifel (1999) have taken a similar index approach to making comparisons of relative poverty over time and across countries, using DHS data for nine countries in Africa.

A proxy for wealth not only is useful in examining effects of wealth, but also is needed as a control variable in estimating effects of variables potentially correlated with household wealth, such as maternal education. The method proposed here provides a simple technique for creating a wealth proxy in the absence of either income or expenditure data (an issue discussed further in Montgomery et al. 2000).

OVERCOMING THE PROBLEM OF RANKING HOUSEHOLDS IN THE ABSENCE OF CONSUMPTION OR EXPENDITURE DATA

The DHS and NFHS data present both an opportunity and a challenge. The opportunity is a rich set of large, representative surveys with nearly identical questionnaires covering a large number of countries or Indian states. The challenge is that the DHS/NFHS surveys contain no data on income or on household consumption expenditures. The latter is unfortunate because a large literature establishes the theoretical underpinnings of consumption expenditures as a measure of current and long-run household (and implicitly individual) welfare. (For example, see Deaton 1997; Deaton and Muellbauer 1980; Deaton and Zaidi 1999.) As a result, expenditures are routinely used in measuring poverty. We overcome the absence of expenditure data by using the information collected on assets owned by household members and on housing characteristics. These data are used to generate an asset index that proxies for wealth and hence for long-run economic status.

Three clarifications will avoid confusion at the outset.

First, we are not proposing this asset index as a measure either of current welfare or of poverty (see note 1). We use the asset index as a determinant of current enrollment, which depends on long-run as well as current expenditures: households will smooth schooling expenditures over time and are unlikely to respond to temporary shocks by withdrawing children from school.

Second, unlike Montgomery et al. (2000), we are not creating an asset index as a proxy for current consumption expenditures. We view both the asset index and expenditures as proxies for something unobserved: a household's long-run economic status. Therefore, although it is reassuring that these two proxies are related, discrepancies in the classification of households cannot be assumed to be mistakes of the asset index. The two, as proxies, have conceptually and empirically distinct limitations; therefore discrepancies could just as easily indicate limitations of current consumption expenditures. The major problem with the asset index is that the weights on individual indicators are not grounded theoretically. The major problem with current expenditures as a proxy for long-run wealth is the presence of short-term fluctuations (see note 2).

Third, we emphasize that the principal-components approach is a pragmatic response to a data constraint problem. In this paper we set the modest goal of establishing the validity of this particular approach in this application; we do not establish that our approach is optimal because there may be other methods that possess superior statistical properties.

    Construction of an Asset Index

How does one aggregate various asset ownership indicators into one variable to proxy for household wealth? Even if the question is simplified by limiting the aggregation to a linear index, how should the weights be chosen? Equal weights have the appeal of simplicity and apparent objectivity, but these qualities only mask the fact that the imposition of numeric equality is completely arbitrary.

A second option would be to estimate the current value of household assets using explicit and implicit prices as weights. If the data did not include current values, then the purchase price, the date of purchase, and a suitable depreciation rate could be used to estimate them. The DHS and NFHS data, however, are typical in that they include binary indicators on asset ownership and housing characteristics but not current value, purchase price, or vintage of assets.

A third possible solution is not to build an index at all but simply to enter all of the asset variables separately in a linear multivariate regression equation. This procedure implicitly creates weights on the variables. Such an approach handles the problem of controlling for wealth in estimating the impact of other, nonwealth, variables. Yet the linear index of the assets using regression weights does not estimate the wealth effect because many assets exert both a direct and an indirect effect on outcomes. For instance, the household's use of electricity for lighting may serve as a proxy for wealth, but also may make study easier and hence may lower the opportunity costs of schooling. The availability of piped water not only may indicate greater wealth but also may reduce the time needed for water collection and thus may reduce the opportunity costs of schooling. This argument is even clearer with health outcomes because water and sanitation variables have strong independent effects on children's health (Bonilla-Chacin and Hammer 1999). Therefore, although linear regression coefficients implicitly produce weights for the linear index of the asset variables that predicts the dependent variable most closely, there is no way to infer from these unconstrained coefficients the impact of an increase in wealth.

    Using Principal Components

We implement a different approach: we use the statistical procedure of principal components to determine the weights for an index of the asset variables. Principal components is a technique for extracting from a set of variables those few orthogonal linear combinations of the variables that capture the common information most successfully. Intuitively, the first principal component of a set of variables is the linear index of all the variables that captures the largest amount of information that is common to all of the variables. (For readable and intuitive descriptions of principal components, see Lindeman, Merenda, and Gold 1980; StataCorp 1999.)

Suppose we have a set of $N$ variables, $a^*_{1j}$ to $a^*_{Nj}$, representing the ownership of $N$ assets by each household $j$. Principal components starts by specifying each variable normalized by its mean and standard deviation: for example, $a_{1j} = (a^*_{1j} - \bar{a}^*_1)/s^*_1$, where $\bar{a}^*_1$ is the mean of $a^*_{1j}$ across households and $s^*_1$ is its standard deviation. These selected variables are expressed as linear combinations of a set of underlying components for each household $j$:

$$
\begin{aligned}
a_{1j} &= v_{11} A_{1j} + v_{12} A_{2j} + \cdots + v_{1N} A_{Nj} \\
&\;\;\vdots \qquad\qquad\qquad\qquad\qquad j = 1, \ldots, J \\
a_{Nj} &= v_{N1} A_{1j} + v_{N2} A_{2j} + \cdots + v_{NN} A_{Nj},
\end{aligned}
\tag{1}
$$

where the $A$'s are the components and the $v$'s are the coefficients on each component for each variable (and do not vary across households). Because only the left-hand side of each line is observed, the solution to the problem is indeterminate.

Principal components overcomes this indeterminacy by finding the linear combination of the variables with maximum variance (the first principal component, $A_{1j}$) and then finding a second linear combination of the variables, orthogonal to the first, with maximal remaining variance, and so on. Technically the procedure solves the equations $(R - \lambda_n I) v_n = 0$ for $\lambda_n$ and $v_n$, where $R$ is the matrix of correlations between the scaled variables (the $a$'s) and $v_n$ is the vector of coefficients on the $n$th component for each variable. Solving the equation yields the characteristic roots of $R$, $\lambda_n$ (also known as eigenvalues), and their associated eigenvectors, $v_n$. The final set of estimates is produced by scaling the $v_n$'s so the sum of their squares sums to the total variance, another restriction imposed to achieve determinacy of the problem.

The scoring factors from the model are recovered by inverting the system implied by Eq. (1), and yield a set of estimates for each of the $N$ principal components:

$$
\begin{aligned}
A_{1j} &= f_{11} a_{1j} + f_{12} a_{2j} + \cdots + f_{1N} a_{Nj} \\
&\;\;\vdots \qquad\qquad\qquad\qquad\qquad j = 1, \ldots, J \\
A_{Nj} &= f_{N1} a_{1j} + f_{N2} a_{2j} + \cdots + f_{NN} a_{Nj}.
\end{aligned}
\tag{2}
$$

The first principal component, expressed in terms of the original (unnormalized) variables, is therefore an index for each household based on the expression

$$
A_{1j} = f_{11} (a^*_{1j} - \bar{a}^*_1)/s^*_1 + \cdots + f_{1N} (a^*_{Nj} - \bar{a}^*_N)/s^*_N.
\tag{3}
$$

The crucial assumption for our analysis (and it is just that, an assumption) is that household long-run wealth explains the maximum variance (and covariance) in the asset variables. There is no way to test this assumption directly, but in the next two sections we provide evidence that this method produces reasonable results. Below we illustrate the method using the states of India, and show the internal and external coherence of the asset index. Then we compare this method with the use of consumption expenditures in three countries; we find that this simple asset index has reasonable agreement with consumption expenditures and performs as well, or better, in predicting education outcomes in two of the countries.

1. The limitation in poverty analysis is twofold. First, the conventional notion of poverty is based on the flow of consumption relative to some predetermined poverty threshold, whereas we, by aggregating assets, are establishing only a measure of a stock. Second, the categorization used is based on a relative measure (that is, the household's ranking within the distribution), whereas poverty thresholds typically are based on the expenditures necessary for the consumption of a determined bundle of goods.

2. Current expenditures would be a perfect measure only under the unrealistically strong assumptions of perfect foresight and perfect capital markets. Current expenditures are a popular proxy for long-run wealth for two main reasons: the theoretical justification that expenditures are superior to current income as a proxy for long-run income because of consumption smoothing, and, perhaps even more important, the pragmatic justification that expenditures are easier to measure than income in most rural settings. Current expenditures, however, are not preferred unambiguously to asset ownership on either of these dimensions: asset ownership also reflects smoothing and is (if anything) easier to measure than either income or expenditures.

CONSTRUCTION AND INTERNAL VALIDITY OF THE ASSET INDEX IN INDIA

Construction of the Asset Index Using the NFHS Data

The NFHS survey covers about 88,000 households and about one-half million individuals. The number of households surveyed in each Indian state varies from 9,963 in Uttar Pradesh to around 1,000 in the small northeastern states. The survey includes data on 21 asset indicators that can be grouped into three types: household ownership of consumer durables, with eight questions (clock/watch, bicycle, radio, television, sewing machine, motorcycle/scooter, refrigerator, car); characteristics of the household's dwelling, with 12 indicators (three about toilet facilities, three about the source of drinking water, two about rooms in the dwelling, two about the building materials used, and one each about the main source of lighting and cooking); and household landownership.

Table 1 reports the scoring factors from the principal components analysis of the 21 variables (see Eq. (3)). The mean value of the index is 0; the standard deviation is 2.3. Because all the asset variables (except number of rooms) take only the values 0 or 1, the weights have an easy interpretation: a move from 0 to 1 changes the index by $f_{1i}/s^*_i$ (reported in column 4). A household that owns a clock has an asset index higher by 0.54 than one that does not; owning a car raises a household's asset index by 1.21 units; using biomass as the main cooking fuel lowers the asset index by 0.67.

We sort individuals by the asset index and establish cutoff values for percentiles of the population. We then assign households to a group on the basis of their value on the index. For expository convenience, we refer to the bottom 40% as "poor," the next 40% as "middle," and the top 20% as "rich," asking the reader to remember that this classification does not follow any of the usual definitions of poverty.

The difference in the average index between the poorest and the middle group is 2.07 units. One example of a combination of assets that would produce this difference is owning a radio (0.54), having a kitchen as a separate room (0.37), having electricity for lighting (0.57), and not having a dwelling of all low-quality materials (0.55). The average asset index is 3.78 units higher for the richest than for the middle group. This difference is equivalent to owning a motor scooter (0.91) and a television (0.83), having a flush toilet (0.75) and a house of all high-quality materials (0.73), and not using biomass as the main cooking fuel (0.67).

The Reliability of the Asset Index

For India the asset index performs well on three dimensions. First, it is internally coherent because average asset ownership differs markedly across the poor, middle, and rich households for each asset; second, it is robust to the assets included; third, it produces reasonable comparisons with measures of state-level poverty and output. The index has drawbacks, however, especially possible problems with urban/rural comparisons.

Internal coherence. The last three columns of Table 1 compare the average ownership of each asset across the poor, middle, and rich households. We find large differences across groups for almost all assets. Clock ownership is 16% for the poor versus 98% for the rich. Also, the poor cook with biomass fuel almost exclusively (96%), whereas only 22% of the rich do so. One might ask whether the asset index tends too much to reflect community variables (especially locally available infrastructure such as electricity for lighting or piped water) rather than household-specific variables. On this score, we are reassured by the clear separation between poor and rich on housing variables not related to infrastructure, such as all high-quality materials in the dwelling (less than 1% of the poor versus 82% of the rich) and having a kitchen as a separate room (31% of the poor versus 85% of the rich).
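The construction described in Eqs. (1) to (3), followed by the 40/40/20 grouping, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the data are simulated, the function names (`standardize`, `first_eigenvector`, `asset_index`, `classify`) are hypothetical, households rather than individuals are the unit that is sorted, and the first eigenvector is found by power iteration instead of a statistical package. It assumes the index is the household score on the unit-length first eigenvector of the correlation matrix, which appears consistent with Table 1, where the index standard deviation (2.32) is close to the square root of the first eigenvalue (5.37).

```python
import math
import random

def standardize(columns):
    """Normalize each asset variable by its mean and standard deviation."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out.append([(x - mean) / sd for x in col])
    return out

def correlation_matrix(z):
    """Correlation matrix R of already standardized variables."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zk)) / (n - 1) for zk in z] for zi in z]

def first_eigenvector(R, iters=500):
    """Power iteration for the largest eigenvalue of R and its unit-length eigenvector."""
    v = [1.0] * len(R)
    for _ in range(iters):
        w = [sum(r * x for r, x in zip(row, v)) for row in R]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Rv = [sum(r * x for r, x in zip(row, v)) for row in R]
    lam = sum(a * b for a, b in zip(v, Rv))  # Rayleigh quotient
    return lam, v

def asset_index(columns):
    """Score each household on the first principal component, as in Eq. (3)."""
    z = standardize(columns)
    lam, v = first_eigenvector(correlation_matrix(z))
    scores = [sum(v[i] * z[i][j] for i in range(len(z))) for j in range(len(z[0]))]
    return lam, v, scores

def classify(scores):
    """Label the bottom 40% 'poor', the next 40% 'middle', the top 20% 'rich'."""
    n = len(scores)
    order = sorted(range(n), key=lambda j: scores[j])
    groups = [""] * n
    for rank, j in enumerate(order, start=1):
        if 10 * rank <= 4 * n:
            groups[j] = "poor"
        elif 10 * rank <= 8 * n:
            groups[j] = "middle"
        else:
            groups[j] = "rich"
    return groups

# Simulated (hypothetical) data: 2,000 households and 5 binary assets whose
# ownership depends on an unobserved wealth variable plus noise.
random.seed(0)
thresholds = [-1.0, -0.5, 0.0, 0.5, 1.0]
wealth = [random.gauss(0, 1) for _ in range(2000)]
assets = [[1.0 if w + random.gauss(0, 1) > t else 0.0 for w in wealth]
          for t in thresholds]

lam, weights, scores = asset_index(assets)
groups = classify(scores)
print("first eigenvalue:", round(lam, 3))
print("weights:", [round(w, 3) for w in weights])
```

Under this construction the index has mean zero and variance equal to the first eigenvalue. With only a handful of binary assets many households tie on the index, so group boundaries can fall inside a block of tied households; a longer asset list reduces this.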

TABLE 1. SCORING FACTORS AND SUMMARY STATISTICS FOR VARIABLES ENTERING THE COMPUTATION OF THE FIRST PRINCIPAL COMPONENT

                                                Scoring  All India  All India    Scoring    Poorest     Middle    Richest
Variable                                         Factor       Mean         SD  Factor/SD   40% Mean   40% Mean   20% Mean
Own Clock/Watch                                   0.270      0.533      0.499       0.54      0.164      0.739      0.985
Own Bicycle                                       0.130      0.423      0.494       0.26      0.264      0.510      0.621
Own Radio                                         0.248      0.396      0.489       0.51      0.101      0.522      0.838
Own Television                                    0.339      0.209      0.407       0.83      0.000      0.127      0.866
Own Sewing Machine                                0.253      0.182      0.385       0.66      0.015      0.179      0.580
Own Motorcycle/Scooter                            0.249      0.082      0.274       0.91      0.001      0.031      0.375
Own Refrigerator                                  0.261      0.068      0.252       1.04      0.000      0.006      0.353
Own Car                                           0.129      0.012      0.107       1.21      0.000      0.001      0.059
Drinking Water From Pump/Well                     0.192      0.609      0.488       0.39      0.800      0.569      0.242
Drinking Water From Open Source                   0.041      0.040      0.195       0.21      0.057      0.036      0.005
Drinking Water From Other Source                  0.002      0.019      0.138       0.01      0.016      0.027      0.012
Flush Toilet                                      0.308      0.217      0.412       0.75      0.005      0.175      0.797
Pit Toilet/Latrine                                0.040      0.086      0.280       0.14      0.040      0.127      0.111
None/Other Toilet                                 0.001      0.001      0.029       0.03      0.001      0.001      0.001
Main Source of Lighting Electric                  0.284      0.510      0.500       0.57      0.143      0.700      0.989
Number of Rooms in Dwelling                       0.159      2.676      1.957       0.08      1.975      2.965      3.739
Kitchen a Separate Room                           0.183      0.536      0.499       0.37      0.312      0.643      0.848
Main Cooking Fuel Biomass (Wood/Dung/Coal)       -0.281      0.776      0.417      -0.67      0.956      0.841      0.224
Dwelling All High-Quality Materials               0.309      0.237      0.425       0.73      0.005      0.218      0.821
Dwelling All Low-Quality Materials               -0.273      0.483      0.500      -0.55      0.832      0.308      0.017
Own > 6 Acres Land                                0.031      0.115      0.319       0.10      0.075      0.155      0.126
Economic Status Index                                        0.000       2.32                 -2.00      0.071      3.857

Notes: Each variable besides number of rooms takes the value 1 if true, 0 otherwise. Scoring factor is the weight assigned to each variable (normalized by its mean and standard deviation) in the linear combination of the variables that constitute the first principal component. The percentage of the covariance explained by the first principal component is 26%. The first eigenvalue is 5.37; the second eigenvalue is 1.66.

Source: Authors' calculation from NFHS 1992-1993.
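The "Scoring Factor/SD" column and two figures in the table notes can be checked with simple arithmetic. The sketch below copies a few rows from Table 1 and recomputes the change in the index for a move from 0 to 1, the share of variance attributed to the first component (first eigenvalue divided by the 21 variables), and the index standard deviation. The last calculation assumes that the scoring factors form a unit-length eigenvector, so that the index variance equals the first eigenvalue; the variable names are illustrative only.

```python
import math

# (scoring factor, standard deviation) pairs copied from Table 1
table1 = {
    "clock/watch": (0.270, 0.499),
    "television": (0.339, 0.407),
    "refrigerator": (0.261, 0.252),
    "car": (0.129, 0.107),
}

# Change in the index when a binary asset indicator moves from 0 to 1
index_change = {name: round(f / sd, 2) for name, (f, sd) in table1.items()}

n_variables = 21
first_eigenvalue = 5.37

# Each standardized variable has variance 1, so total variance equals 21
share_explained = first_eigenvalue / n_variables

# Assumption: index variance equals the first eigenvalue
index_sd = math.sqrt(first_eigenvalue)

print(index_change)
print(round(share_explained, 2), round(index_sd, 2))
```

The recomputed values agree with the printed column (0.54, 0.83, 1.04, 1.21), with the 26% reported in the notes, and with the index standard deviation of 2.32.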

TABLE 2. CLASSIFICATION DIFFERENCES OF THE POOREST 40%: ALL INDIA

                                                    All Variables Except      Asset Ownership,
                                        Base Case:    Drinking Water and          Housing, and          Only 8 Asset
                                     All Variables     Toilet Facilities        Land Ownership   Ownership Variables
Poorest 40%                                  100.0                  95.1                  87.7                  80.2
Middle 40%                                     0.0                   4.9                  12.3                  19.7
Richest 20%                                    0.0                   0.0                   0.0                   0.1
Total                                        100.0                 100.0                 100.0                 100.0
Spearman Rank Correlation
  Coefficient, Ranking of
  Households                                   1.0                   .94                   .87                   .79

Source: Authors' calculation from NFHS 1992-1993.


Robustness. The asset index produces very similar classifications when different subsets of variables are used in its construction. Table 2 reports the percentage of households classified in the poorest 40% when all assets are used, compared with indices based on (1) all the variables except those related to drinking water and toilet facilities, (2) ownership of consumer durables, housing quality, number of rooms, and land ownership, and (3) only the ownership of consumer durables (e.g., watch, radio). Almost no households classified in the poorest group by the index using all variables would be classified as rich by any of the more limited measures. The robustness of the classification is similar for the middle and the rich groups.
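A comparison of the kind reported in Table 2 amounts to a cross-classification: among households that the base-case index places in the poorest group, what percentage does an alternative index place in each group? The sketch below shows one way to compute such shares. The function name `reclassification_shares` and the ten-household example are hypothetical and are not taken from the NFHS data.

```python
from collections import Counter

def reclassification_shares(base_groups, alt_groups, base_label="poor"):
    """Percentage of households in `base_label` (base index) that fall in
    each group under an alternative index."""
    alt_for_base = [a for b, a in zip(base_groups, alt_groups) if b == base_label]
    counts = Counter(alt_for_base)
    total = len(alt_for_base)
    return {g: 100.0 * counts.get(g, 0) / total for g in ("poor", "middle", "rich")}

# Hypothetical labels for ten households under two versions of the index
base = ["poor"] * 4 + ["middle"] * 4 + ["rich"] * 2
alt = ["poor", "poor", "poor", "middle",
       "middle", "middle", "middle", "poor",
       "rich", "rich"]

shares = reclassification_shares(base, alt)
print(shares)
```

Each column of Table 2 corresponds to one such set of shares, computed for a different subset of asset variables.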

A more general measure of the differences in rankings can be derived from the rank correlation coefficient, which compares the degree to which two methods produce the same ranking of households. Even when the index is constructed with only consumer durables, the correlation with the index that uses all the variables is .79 (all correlation coefficients in Table 2 are significant; p < .001, N = 87,175). Adding more variables in constructing the index only increases the similarity of the rankings.
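For readers who want to reproduce this kind of comparison, the Spearman rank correlation is the ordinary Pearson correlation applied to ranks, with tied values given the average of the ranks they span. The sketch below is a plain-Python illustration; the function names and the five index values are hypothetical, and ties matter here because an index built from a few binary indicators takes only a limited number of distinct values.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        k = i
        while k + 1 < len(order) and values[order[k + 1]] == values[order[i]]:
            k += 1
        avg = (i + k) / 2 + 1  # mean of positions i+1 .. k+1
        for m in range(i, k + 1):
            ranks[order[m]] = avg
        i = k + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical index values for five households under two index definitions
full_index = [-2.1, -0.4, 0.3, 1.8, 3.9]
durables_only = [-1.0, -1.0, 0.2, 0.9, 2.5]
rho = spearman(full_index, durables_only)
print(round(rho, 3))
```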

An additional check for robustness is made by using a different methodology for deriving the weights. Although the theoretical underpinnings and the algorithms used in factor analysis are close to those for principal components, the two methodologies differ sufficiently to make factor analysis a possible alternative approach. The first factor derived from a model analogous to that described above yields a household ranking that has a .988 Spearman rank correlation with a ranking derived from principal components. Clearly, the results drawn from the asset index approach are robust to whether one picks one or the other of these methods.

Comparisons across states. Because the poor, middle, and rich groups are defined on an all-India basis, states differ as to the percentage of households in each group. Thus we can compare state-by-state rankings with more conventional measures. The national poverty rate based on consumption expenditures is 36%, roughly comparable to defining the asset poor as the poorest 40% nationally by the asset index. The rank correlation of the poverty rate with the proportion asset poor across states is .794 (p