Statistical Analysis of Income and Living Conditions, Chapter 3: Poverty at the Local Level


Introductory concepts - Area-level random effects models - Poverty mapping - Empirical Best method for estimating traditional and fuzzy poverty measures for small areas* - Applications and developments

* This section is a more advanced, optional topic.

Introductory concepts - 1

• Small area estimation is a very useful tool when poverty and inequality must be measured at regional level but the sample data are available only at national level. In this case, statistical techniques and economic methodologies are needed to exploit auxiliary information.

• The term "small area" can refer (Rao, 2003) either to geographical areas of small size or to domains formed by sub-populations defined by particular demographic or social characteristics.

• In the literature, small area models are those models that use auxiliary information available at small-area level and at the level of the individual sample unit (household or individual).

• A wide range of small area estimation techniques exists, and the field of research is continually expanding. The suitability and efficiency of one technique relative to another depend on the specific situation and on the nature of the available data.

• Small area estimation methods can be classified into three groups according to the type of inference:

• i) design-based (or sampling) methods;

• ii) model-assisted methods;

• iii) model-based methods (predictive approach).

• For the methods in group (i), the parameter of interest is estimated with classical sampling procedures based on the probability distribution induced by the sampling design. Under this approach the parameter is treated as a constant and the estimators are unbiased with respect to the sampling design. Their variability, however, grows as the sample size decreases, and a small area may contain no sampled unit at all, in which case no estimate of the small area parameter can be obtained.

• This class consists only of direct methods and includes the classical estimators, the most widely used of which is the Horvitz-Thompson estimator.
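The direct estimator described above can be sketched in a few lines. This is a minimal illustration, assuming each sampled unit carries a known inclusion probability; the function names and the toy data are hypothetical and are not part of the course material.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a population total: sum of y_k / pi_k."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    return sum(y / pi for y, pi in zip(values, inclusion_probs))


def direct_domain_total(values, inclusion_probs, areas, area):
    """Direct estimate for one small area, using only the units sampled in it."""
    in_area = [(y, pi) for y, pi, a in zip(values, inclusion_probs, areas) if a == area]
    if not in_area:
        return None  # no sampled unit in the area: no direct estimate exists
    return sum(y / pi for y, pi in in_area)


# Hypothetical sample: poverty indicator (1 = poor), inclusion probability, area label
poor = [1, 0, 0, 1, 0]
pi = [0.01, 0.01, 0.02, 0.02, 0.02]
area = ["A", "A", "A", "B", "B"]

total_poor = horvitz_thompson_total(poor, pi)             # whole sample
total_poor_A = direct_domain_total(poor, pi, area, "A")   # area with sampled units
total_poor_C = direct_domain_total(poor, pi, area, "C")   # area with no sampled unit
```

The last call returns `None`, which mirrors the limitation noted in the text: a direct method cannot produce an estimate for an area in which no unit was sampled.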

• For the methods in group (ii), inference is based on both the design and the model. The aim is to obtain estimators that are unbiased regardless of the choice of model, by exploiting the information provided by the sampling design.

• This class comprises the direct regression estimator and many indirect estimators, including synthetic and composite estimators.

• For the methods in group (iii), the key point is that the parameter under study is treated not as a constant but as a random variable.

• Small area models belong to this category.

• These models include area-level random effects (Area Level Random Effects Model, Fay and Herriot, 1979), which are used when the auxiliary information is available only at area level.

Area-level random effects models - 1

• As already noted, these models can be used when the auxiliary information exists at the same level of territorial disaggregation for which the poverty and inequality indices must be computed.

• These models link the parameters of interest to the auxiliary variables at small-area level, treating the random effects as independent. The basic model includes a random effect specific to each area. The vector of p auxiliary variables at small-area level is:

(5.1) $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,p})$

• The parameters of interest $\theta_i$ (totals, means, proportions, and so on) can be written as:

(5.2) $\theta_i = \mathbf{x}_i \boldsymbol{\beta} + z_i v_i$

• where $i = 1, \ldots, m$, m is the number of small areas, the $z_i$ are known positive constants, $\boldsymbol{\beta}$ is the $p \times 1$ vector of regression parameters, and the $v_i$ are independent and identically distributed random variables with mean 0 and variance $\sigma_v^2$.

• It is further assumed that design-unbiased direct estimators $\hat{\theta}_i$ are available for the small areas and that the following model holds:

(5.3) $\hat{\theta}_i = \theta_i + e_i$

• where the $e_i$ are the sampling errors in area i, independent, with mean 0 and variance $\psi_i$; the zero mean expresses the fact that the estimators are design-unbiased.

• Combining equations (5.2) and (5.3) gives the following linear mixed-effects model of Fay and Herriot (1979):

(5.4) $\hat{\theta}_i = \mathbf{x}_i \boldsymbol{\beta} + z_i v_i + e_i$

• It includes the area random effects $v_i$ and the sampling errors $e_i$, and assumes that they are independent.

• This is a special case of the linear mixed model with a diagonal covariance structure, as are most of the small area estimation models proposed in the literature.

BLUP and EBLUP estimators

• Using the general results for the linear model with fixed and random effects, one can derive the Best Linear Unbiased Predictor (BLUP) of $\theta_i$ under the area-level small area model:

(5.5) $\hat{\theta}_i^{BLUP} = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, \mathbf{x}_i \hat{\boldsymbol{\beta}}$

• from which it follows that the BLUP is a weighted average of the direct estimator $\hat{\theta}_i$ and the regression synthetic estimator $\mathbf{x}_i \hat{\boldsymbol{\beta}}$, where $\hat{\boldsymbol{\beta}}$ is the Best Linear Unbiased Estimator (BLUE) of $\boldsymbol{\beta}$ and $\gamma_i$ is the weight given to the direct estimator.
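The weighted-average structure of the BLUP can be illustrated with a small sketch. It assumes an intercept plus a single auxiliary variable, $z_i = 1$, known sampling variances $\psi_i$ and a known random-effect variance $\sigma_v^2$, so that the weight is $\gamma_i = \sigma_v^2 / (\sigma_v^2 + \psi_i)$; in practice $\sigma_v^2$ is replaced by an estimate, which is what turns the BLUP into the EBLUP. The function name and the data are hypothetical.

```python
def fay_herriot_blup(direct, x, psi, sigma2_v):
    """BLUP under an area-level model with intercept and one covariate (z_i = 1).

    direct   : direct estimates, one per area
    x        : area-level auxiliary variable
    psi      : sampling variances of the direct estimates (treated as known)
    sigma2_v : variance of the area random effects (treated as known here)
    Returns ((b0, b1), [(gamma_i, blup_i), ...]).
    """
    # Weighted least squares with weights 1 / (sigma2_v + psi_i) gives the BLUE of beta
    w = [1.0 / (sigma2_v + p) for p in psi]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, direct))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, direct))
    det = sw * swxx - swx ** 2
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det

    results = []
    for yi, xi, p in zip(direct, x, psi):
        gamma = sigma2_v / (sigma2_v + p)      # weight of the direct estimate
        synthetic = b0 + b1 * xi               # regression synthetic estimate
        results.append((gamma, gamma * yi + (1.0 - gamma) * synthetic))
    return (b0, b1), results


# Hypothetical head count ratios for three areas
direct = [0.25, 0.47, 0.20]
x = [1.0, 2.0, 0.5]
psi = [0.001, 0.004, 0.002]
sigma2_v = 0.002

beta, results = fay_herriot_blup(direct, x, psi, sigma2_v)
```

Areas with a large sampling variance receive a small $\gamma_i$, so their prediction is pulled towards the synthetic estimate; areas with precise direct estimates stay close to them.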

Poverty mapping - 1

This methodology, which is one of the small area estimation methodologies, combines census and survey information to produce maps that are finely disaggregated geographically. Such maps are needed to describe the spatial distribution of poverty and inequality within a country; the output, however, is not only a set of maps but a highly disaggregated database.

Poverty mapping - 2

• The procedure is more demanding than the EBLUP method in terms of the data required (census micro-data), although census and survey records do not need to be matched at the micro level.

• The basic idea is to estimate a linear regression model with variance components at local (small area) level, using the information from the smaller samples and the aggregate census information and, where possible, supplementing them with other sources.

Poverty mapping - 3

• The dependent variable of the regression model is household disposable income or consumption. The estimated distribution of this variable can be used to generate its distribution in any census sub-population, conditional on the characteristics observed in that sub-population.

• From the estimated distribution of a monetary variable in the census data, or in any sub-population, poverty or inequality measures can be estimated.

• To assess the precision of the estimates, the standard errors of these measures must be computed with appropriate procedures, which are presented later.

Poverty mapping - 4

BASIC IDEA

Stage 1, ESTIMATION: a linear regression model with local variance components is estimated on the LSMS data (the dependent variable is a monetary variable).

Stage 2, IMPUTATION or SIMULATION: the estimated distribution of the dependent variable is used to generate the distribution for any subpopulation in the Census, conditional on the observed data.

Poverty mapping - 5

Stage 1: Estimation

The model is a linear approximation to the conditional distribution of the logarithm of consumption expenditure $y_{ch}$ of household h in cluster c:

$\ln y_{ch} = E[\ln y_{ch} \mid \mathbf{x}_{ch}^{T}] + u_{ch} = \mathbf{x}_{ch}^{T} \boldsymbol{\beta} + u_{ch}$

The error component $u_{ch}$ is specified to allow for within-cluster correlation in the disturbances.

IMPORTANT: six different models have been estimated.

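Poverty mapping - 6

Stage 2: Simulation

The estimates obtained are applied to the Census data to simulate the expenditure of each household in the Census. 100 simulations have been conducted. The simulated values are:

$\hat{y}_{ch} = \exp(\mathbf{x}_{ch}^{T} \tilde{\boldsymbol{\beta}} + \tilde{\eta}_c + \tilde{\varepsilon}_{ch})$

where $\tilde{\eta}_c$ and $\tilde{\varepsilon}_{ch}$ are the simulated cluster-level and household-level components of the disturbance. The coefficients $\tilde{\boldsymbol{\beta}}$ are drawn from a multivariate normal distribution with mean $\hat{\boldsymbol{\beta}}$ and variance-covariance matrix equal to the one associated with $\hat{\boldsymbol{\beta}}$.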

Poverty mapping - 7

For the residuals, no specific distributional form has been assumed, so they are drawn directly from the estimated residuals.

For each simulated distribution of consumption expenditure, a set of poverty and inequality measures has been calculated.

• Mean over all the simulations: point estimate.

• Standard deviation over all the simulations: bootstrap standard error.
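The simulation stage can be sketched as follows. This is only an illustration under simplifying assumptions: one covariate plus an intercept, a disturbance split into a cluster and a household component, and a covariance matrix of the estimated coefficients supplied through its 2×2 Cholesky factor. All names and numbers are hypothetical; the six models estimated for Albania are not reproduced here.

```python
import math
import random
import statistics


def simulate_head_count(census_x, beta_hat, chol_cov, cluster_resid, hh_resid,
                        poverty_line, n_sim=100, seed=0):
    """Simulate census expenditures; return (point estimate, bootstrap s.e.) of the HCR.

    census_x      : list of (cluster_id, x) pairs, one per census household
    beta_hat      : (intercept, slope) estimated in Stage 1
    chol_cov      : lower-triangular Cholesky factor ((l11, 0), (l21, l22)) of Var(beta_hat)
    cluster_resid : estimated cluster-level residuals from Stage 1
    hh_resid      : estimated household-level residuals from Stage 1
    """
    rng = random.Random(seed)
    clusters = sorted({c for c, _ in census_x})
    hcr = []
    for _ in range(n_sim):
        # Draw the coefficients from N(beta_hat, Var(beta_hat)) via the Cholesky factor
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        b0 = beta_hat[0] + chol_cov[0][0] * z1
        b1 = beta_hat[1] + chol_cov[1][0] * z1 + chol_cov[1][1] * z2
        # Resample residuals from the estimated ones (no distributional assumption)
        eta = {c: rng.choice(cluster_resid) for c in clusters}
        poor = 0
        for c, x in census_x:
            y = math.exp(b0 + b1 * x + eta[c] + rng.choice(hh_resid))
            poor += y < poverty_line
        hcr.append(poor / len(census_x))
    # Mean over simulations = point estimate; standard deviation = bootstrap s.e.
    return statistics.mean(hcr), statistics.stdev(hcr)


# Hypothetical inputs
census = [("c1", 1.0), ("c1", 2.0), ("c2", 1.5), ("c2", 0.5)]
beta_hat = (8.0, 0.3)
chol = ((0.05, 0.0), (0.01, 0.02))
cluster_resid = [-0.1, 0.0, 0.1]
hh_resid = [-0.3, -0.1, 0.0, 0.2, 0.4]

point, se = simulate_head_count(census, beta_hat, chol, cluster_resid, hh_resid,
                                poverty_line=4500.0)
```

The same loop can compute any other poverty or inequality measure in place of the head count ratio, since each simulation yields a full expenditure distribution.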

Case study: Albania

POVERTY AND INEQUALITY MEASURES

The procedure for estimating the poverty and inequality measures has been applied to the whole of Albania and at seven levels of disaggregation:

a) Rural-urban level;
b) The four strata used in sampling the LSMS;
c) The six strata for which the linear regression models have been estimated;
d) The 12 Prefectures;
e) The 36 Districts;
f) The 374 Communes/Municipalities;
g) The 11 Mini-municipalities into which the city of Tirana is divided.

Table 1: Head Count Ratio and Per-capita Consumption: comparison between LSMS and Census

| Area | Head count, LSMS | Head count, Census | Consumption, LSMS | Consumption, Census |
|---|---|---|---|---|
| ALBANIA | 25.39 (1.32) | 28.60 (1.28) | 7,800.82 (117.68) | 7,569.67 (120.21) |
| STRATUM 1 | 20.60 (2.22) | 26.64 (1.94) | 8,419.25 (218.07) | 8,148.48 (249.18) |
| STRATUM 2 | 25.57 (2.32) | 29.49 (2.32) | 7,496.12 (193.63) | 7,177.76 (222.95) |
| STRATUM 3 | 44.54 (2.51) | 40.85 (1.60) | 6,168.34 (149.86) | 6,181.78 (120.69) |
| STRATUM 4 | 17.82 (2.06) | 18.01 (1.09) | 9,042.59 (304.96) | 8,981.39 (140.85) |

• THE MAPS

• Maps 1, 2: marked spatial heterogeneity among Prefectures

• Maps 3, 4: low heterogeneity among Districts within the Prefecture to which they belong

• Maps 5, 6: heterogeneity among Municipalities within the same District

Analysis of the relationship between inequality in the whole country and inequality within and between its regions

• Maps 7, 8: two thirds of the Prefectures have a Head Count Ratio (HCR) and per-capita consumption (C) significantly different from the national level.

• Maps 9, 10: fewer than 20% of the Districts have HCR and C significantly different from those of the Prefecture to which they belong.

• Maps 11, 12: more than 40% of the Municipalities have HCR and C significantly different from those of the District to which they belong.

Figure 1. Head Count Ratio by Prefecture.

Figure 2. Per Capita Consumption by Prefecture.

Figure 3. Head Count Ratio by District.

Figure 4. Per Capita Consumption by District.

Figure 5. Head Count Ratio by Municipality.

Figure 6. Per Capita Consumption by Municipality.

Figure 7. Prefecture Level Head Count Ratio versus Albania Head Count Ratio

Figure 8. Prefecture Level Per Capita Consumption versus Albania Per Capita Consumption

Figure 9. District Level Head Count Ratio versus Prefecture Level Head Count Ratio

Figure 10. District Level Per Capita Consumption versus Prefecture Level Per Capita Consumption

Figure 11. Commune Level Head Count Ratio versus District Level Head Count Ratio

Figure 12. Commune Level Per Capita Consumption versus District Level Per Capita Consumption

Empirical Best* - 1

Consider a random vector y containing the values of a random variable for the units of a finite population, partitioned as $\mathbf{y} = (\mathbf{y}_s', \mathbf{y}_r')'$, where $\mathbf{y}_s$ is the sub-vector of sampled elements and $\mathbf{y}_r$ the sub-vector of non-sampled elements. The aim is to predict the value of a real measurable function $\delta = h(\mathbf{y})$ of the random vector y using the sample data $\mathbf{y}_s$. The best predictor (BP) of $\delta$ is the function of $\mathbf{y}_s$ that minimises the mean squared error of the predictor. Formally:

(3.1) $\hat{\delta}^{B} = E_{\mathbf{y}_r}(\delta \mid \mathbf{y}_s)$

* This section is a more advanced, optional topic.
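A short derivation may help to see why the conditional expectation is the minimiser. It is a standard argument, not taken from the slides; $g(\mathbf{y}_s)$ denotes any competing predictor with finite second moment and $\hat{\delta}^{B} = E(\delta \mid \mathbf{y}_s)$.

```latex
\begin{align*}
E\big[(g(\mathbf{y}_s) - \delta)^2\big]
  &= E\big[(g(\mathbf{y}_s) - \hat{\delta}^{B})^2\big]
   + E\big[(\hat{\delta}^{B} - \delta)^2\big]
   + 2\,E\big[(g(\mathbf{y}_s) - \hat{\delta}^{B})(\hat{\delta}^{B} - \delta)\big], \\
E\big[(g(\mathbf{y}_s) - \hat{\delta}^{B})(\hat{\delta}^{B} - \delta)\big]
  &= E\Big[(g(\mathbf{y}_s) - \hat{\delta}^{B})\,
     E\big(\hat{\delta}^{B} - \delta \mid \mathbf{y}_s\big)\Big] = 0 .
\end{align*}
```

The cross term vanishes because $E(\delta \mid \mathbf{y}_s) = \hat{\delta}^{B}$, so the mean squared error of any $g(\mathbf{y}_s)$ exceeds that of $\hat{\delta}^{B}$ by the non-negative amount $E[(g(\mathbf{y}_s) - \hat{\delta}^{B})^2]$.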

Empirical Best - 2

In general, $\hat{\delta}^{B}$ depends on a vector of unknown parameters $\boldsymbol{\theta}$, which can be replaced by a suitable estimator, giving an empirical BP (EB) of $\delta$. It is interesting to note that when y follows a Normal distribution with mean vector $\boldsymbol{\mu} = X\boldsymbol{\beta}$ for a known matrix X and positive definite covariance matrix V, and the quantity to be predicted is a linear function of y, the EB predictor equals the (empirical) BLUP seen in the lecture of Monday 12 April.

Case Study: Small Area Estimation of poverty and inequality measures: EBLUP and R software

Gianni Betti
SSCU – Kiev, Ukraine

8 April 2010


Scope of the presentation

• Introduce the problem

• Small area estimation techniques

• The BLUP and EBLUP

• EBLUP at Oblast level in Ukraine

• Codes in R software

Why small area estimators?

• Sample household surveys such as EU-SILC, ECHP and HBS are traditionally designed to produce estimates at national level.

• In certain cases, when the sample size is particularly large, the estimates may also be sufficiently precise at the level of (large) regions.

• Often, however, the sub-samples are large enough for large regions but not for smaller ones.

Example from yesterday: Gini coefficient at Oblast level. How large are the standard errors?

Why small area estimators?

• We have taken into account a simpler statistic:

• Head Count Ratio: HCR = FGT(0)

• Monetary variable: total “equivalent” consumption expenditure

• Equivalence scale: 70-70 Academy of Science
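The head count ratio is the member with α = 0 of the Foster-Greer-Thorbecke (FGT) family, $FGT(\alpha) = \frac{1}{N}\sum_{y_i < z} \left(\frac{z - y_i}{z}\right)^{\alpha}$, where z is the poverty line. A minimal sketch follows; the function name, the optional survey weights and the toy expenditures are illustrative assumptions, not the code used for the Ukrainian application.

```python
def fgt(expenditure, poverty_line, alpha=0, weights=None):
    """Foster-Greer-Thorbecke index; alpha = 0 gives the head count ratio."""
    if weights is None:
        weights = [1.0] * len(expenditure)
    total_weight = sum(weights)
    acc = 0.0
    for y, w in zip(expenditure, weights):
        if y < poverty_line:
            gap = (poverty_line - y) / poverty_line   # relative poverty gap
            acc += w * gap ** alpha
    return acc / total_weight


# Hypothetical equivalised consumption expenditures and poverty line
expenditure = [50.0, 80.0, 120.0, 200.0]
hcr = fgt(expenditure, poverty_line=100.0, alpha=0)          # share of poor units
poverty_gap = fgt(expenditure, poverty_line=100.0, alpha=1)  # average relative gap
```

With these four units, two lie below the line, so the head count ratio is 0.5; the poverty gap index averages the relative shortfalls 0.5 and 0.2 over all four units.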

Results of direct estimates and standard errors

| Code | Oblast | n (1) | Direct est (2) | Direct se (3) | gamma (4) | EBLUP est (5) | EBLUP stat_se (6) | ratio_est (7)=(5)/(2) | ratio_MSE (8)=(6)/(3) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | AR Crimea | 462 | 24.89% | 3.50% | 0.74 | 23.75% | 3.36% | 0.95 | 0.96 |
| 5 | Vinnytska | 426 | 24.30% | 3.38% | 0.75 | 26.79% | 3.22% | 1.10 | 0.95 |
| 7 | Volynska | 287 | 47.30% | 5.92% | 0.50 | 43.10% | 5.08% | 0.91 | 0.86 |
| 12 | Dnipropetrovska | 760 | 27.22% | 3.59% | 0.73 | 29.38% | 3.43% | 1.08 | 0.96 |
| 14 | Donetska | 734 | 25.58% | 2.85% | 0.81 | 25.02% | 2.79% | 0.98 | 0.98 |
| 18 | Zhytomyrska | 326 | 34.22% | 6.04% | 0.49 | 34.59% | 4.82% | 1.01 | 0.80 |
| 21 | Zakarpatska | 310 | 20.30% | 4.99% | 0.58 | 23.17% | 4.80% | 1.14 | 0.96 |
| 23 | Zaporizka | 441 | 24.39% | 4.04% | 0.68 | 22.91% | 3.70% | 0.94 | 0.92 |
| 26 | Ivano-Frankivska | 300 | 20.09% | 3.41% | 0.75 | 22.41% | 3.34% | 1.12 | 0.98 |
| 32 | Kyivska | 350 | 21.17% | 4.86% | 0.60 | 18.39% | 4.52% | 0.87 | 0.93 |
| 35 | Kirovogradska | 291 | 47.09% | 5.05% | 0.58 | 42.42% | 4.37% | 0.90 | 0.86 |
| 44 | Luganska | 566 | 29.20% | 3.12% | 0.78 | 29.45% | 3.08% | 1.01 | 0.99 |
| 46 | Lvivska | 563 | 29.25% | 2.81% | 0.82 | 28.96% | 2.73% | 0.99 | 0.97 |
| 48 | Mykolaivska | 312 | 19.81% | 3.53% | 0.74 | 20.34% | 3.30% | 1.03 | 0.93 |
| 51 | Odeska | 393 | 38.48% | 4.53% | 0.63 | 36.91% | 4.18% | 0.96 | 0.92 |
| 53 | Poltavska | 432 | 27.47% | 3.23% | 0.77 | 28.10% | 3.16% | 1.02 | 0.98 |
| 56 | Rivnenska | 287 | 39.28% | 6.02% | 0.49 | 39.23% | 5.22% | 1.00 | 0.87 |
| 59 | Sumska | 315 | 29.07% | 5.92% | 0.50 | 27.85% | 4.78% | 0.96 | 0.81 |
| 61 | Ternopilska | 250 | 42.77% | 7.16% | 0.41 | 38.32% | 5.86% | 0.90 | 0.82 |
| 63 | Kharkivska | 585 | 20.59% | 2.28% | 0.87 | 20.43% | 2.26% | 0.99 | 0.99 |
| 65 | Khersonska | 319 | 32.96% | 4.66% | 0.62 | 30.44% | 4.03% | 0.92 | 0.86 |
| 68 | Khmelnytska | 328 | 29.79% | 4.02% | 0.69 | 28.49% | 3.59% | 0.96 | 0.89 |
| 71 | Cherkaska | 394 | 18.37% | 3.72% | 0.72 | 20.23% | 3.44% | 1.10 | 0.93 |
| 73 | Chernivetska | 237 | 29.60% | 4.27% | 0.66 | 28.74% | 3.80% | 0.97 | 0.89 |
| 74 | Chernigivska | 366 | 28.62% | 4.34% | 0.65 | 29.22% | 3.94% | 1.02 | 0.91 |
| 80 | Kyiv | 494 | 9.63% | 1.63% | 0.93 | 9.56% | 1.66% | 0.99 | 1.02 |
| 85 | Sevastopil | 94 | 2.70% | 3.23% | 0.77 | 4.81% | 3.20% | 1.78 | 0.99 |
| | | | | | | | | 1.02 | 0.92 |

So: small area estimators

• Fundamental aspects of our approach:

• Making the best use of survey data (precise standard errors)

• Aggregated information from diverse sources (such as administrative registers or other surveys)

• Using them in combination: small area estimation

Choice of the “Region” - 1

• The Oblast level could be the first administrative level of disaggregation.

• It may be necessary to go further, i.e. to estimate measures at a finer level.

• So how should the unit that serves as a “region” be chosen?

Choice of the “Region” - 2

Basic choices:

• Geographical units based on or defined according to some functional criteria. Example: Labour Market Regions. Useful for specific policy purposes; less suited for general use and for comparisons across the regions of Europe (EU and non-EU countries).

• Units defined in terms of the urban-rural classification (more elaborate than a simple ‘urban-rural’ dichotomy), but there are no agreed criteria for the definition of urban and rural.

• Units based on administrative/political criteria, specifically NUTS regions. These are the most suitable, but must be supplemented by analysis using other types of units and also (non-geographical) population subgroups.



In the European Union the NUTS classification has been officially chosen by Eurostat

The Nomenclature of Territorial Units for Statistics (NUTS) was established by Eurostat more than 30 years ago in order to provide a single uniform breakdown of territorial units for the production of regional statistics for the European Union.

1. Most commonly used for social policy (e.g., National Action Plans/incl)

2. Comparability facilitated by a common framework

3. Exhaustive and non-overlapping coverage of the population

4. Hierarchical structure provides a framework for integrating information across levels

5. Communication: this type of unit is already widely understood, accepted, and used

6. Data availability, e.g. Eurostat Free Dissemination Database (NewCronos); links with information from many other sources based on the NUTS classification

Application to the HCR = FGT(0)

Table 1. Covariates available at NUTS1 (OBLAST) level

| No. | Covariate | Description |
|---|---|---|
| 1 | Disposable income | Average monthly wage, hrn |
| 2 | GDP | GDP per capita, 2007 |
| 3 | Activity rate | Activity rate, 2008; males, females and total |
| 4 | Unemployment rate | Unemployment rate, 2008; males, females and total |
| 5 | Urbanisation | Percentage of urban population |
| 6 | Population density | Population density, persons per km² |
| 7 | IMR | Infant mortality rate, 2008; death rate of children under 1 year old |
| 8 | HH Size | Mean household size, 2008 |
| 9 | Turnover | Turnover per person, 2008, hrn |
| 10 | Youths | Percentage of children under 14 years old |
| 11 | Elderly people | Percentage of people aged 65 and over |

Performance measures

• Table 2 below reports three performance measures of the SAE model:

• the model parameter gamma (γ): the ratio between the model variance and the total variance, which is the share of the weight given to the direct survey estimate in the final composite estimate;

• the ratio between the EBLUP estimate and the corresponding direct estimate, which shows how far the modelling changes the input direct estimates;

• the ratio between the mean square error (MSE) of the EBLUP estimate for the Oblast and the MSE of the direct survey estimate (which in this case is simply the variance, since the estimates are unbiased), which shows how far the modelling has improved the precision of the estimates.
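The two ratios can be checked by hand for a single Oblast. The sketch below uses the AR Crimea figures reported in Table 2 (direct estimate 24.89% with standard error 3.50%, EBLUP estimate 23.75% with standard error 3.36%); the function name is hypothetical, and the second ratio follows the table header, where column (8) is computed as (6)/(3), that is, from the two standard errors.

```python
def performance_ratios(direct_est, direct_se, eblup_est, eblup_se):
    """Return (ratio_est, ratio_mse) as defined in the header of Table 2.

    ratio_est = EBLUP estimate / direct estimate        -> column (7) = (5)/(2)
    ratio_mse = EBLUP std. error / direct std. error    -> column (8) = (6)/(3)
    """
    return eblup_est / direct_est, eblup_se / direct_se


# AR Crimea, values in percent as reported in Table 2
ratio_est, ratio_mse = performance_ratios(24.89, 3.50, 23.75, 3.36)
rounded = (round(ratio_est, 2), round(ratio_mse, 2))
```

Rounded to two decimals, the results are 0.95 and 0.96, matching the AR Crimea row of the table. A ratio_est close to 1 means that the model barely moves the direct estimate; a ratio_MSE below 1 indicates a gain in precision.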

Application to the HCR = FGT(0)

Table 2. Small area (EBLUP) estimates of at-risk-of-poverty rates for Oblasts

| Code | Oblast | n (1) | Direct est (2) | Direct se (3) | gamma (4) | EBLUP est (5) | EBLUP stat_se (6) | ratio_est (7)=(5)/(2) | ratio_MSE (8)=(6)/(3) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | AR Crimea | 462 | 24.89% | 3.50% | 0.74 | 23.75% | 3.36% | 0.95 | 0.96 |
| 5 | Vinnytska | 426 | 24.30% | 3.38% | 0.75 | 26.79% | 3.22% | 1.10 | 0.95 |
| 7 | Volynska | 287 | 47.30% | 5.92% | 0.50 | 43.10% | 5.08% | 0.91 | 0.86 |
| 12 | Dnipropetrovska | 760 | 27.22% | 3.59% | 0.73 | 29.38% | 3.43% | 1.08 | 0.96 |
| 14 | Donetska | 734 | 25.58% | 2.85% | 0.81 | 25.02% | 2.79% | 0.98 | 0.98 |
| 18 | Zhytomyrska | 326 | 34.22% | 6.04% | 0.49 | 34.59% | 4.82% | 1.01 | 0.80 |
| 21 | Zakarpatska | 310 | 20.30% | 4.99% | 0.58 | 23.17% | 4.80% | 1.14 | 0.96 |
| 23 | Zaporizka | 441 | 24.39% | 4.04% | 0.68 | 22.91% | 3.70% | 0.94 | 0.92 |
| 26 | Ivano-Frankivska | 300 | 20.09% | 3.41% | 0.75 | 22.41% | 3.34% | 1.12 | 0.98 |
| 32 | Kyivska | 350 | 21.17% | 4.86% | 0.60 | 18.39% | 4.52% | 0.87 | 0.93 |
| 35 | Kirovogradska | 291 | 47.09% | 5.05% | 0.58 | 42.42% | 4.37% | 0.90 | 0.86 |
| 44 | Luganska | 566 | 29.20% | 3.12% | 0.78 | 29.45% | 3.08% | 1.01 | 0.99 |
| 46 | Lvivska | 563 | 29.25% | 2.81% | 0.82 | 28.96% | 2.73% | 0.99 | 0.97 |
| 48 | Mykolaivska | 312 | 19.81% | 3.53% | 0.74 | 20.34% | 3.30% | 1.03 | 0.93 |
| 51 | Odeska | 393 | 38.48% | 4.53% | 0.63 | 36.91% | 4.18% | 0.96 | 0.92 |
| 53 | Poltavska | 432 | 27.47% | 3.23% | 0.77 | 28.10% | 3.16% | 1.02 | 0.98 |
| 56 | Rivnenska | 287 | 39.28% | 6.02% | 0.49 | 39.23% | 5.22% | 1.00 | 0.87 |
| 59 | Sumska | 315 | 29.07% | 5.92% | 0.50 | 27.85% | 4.78% | 0.96 | 0.81 |
| 61 | Ternopilska | 250 | 42.77% | 7.16% | 0.41 | 38.32% | 5.86% | 0.90 | 0.82 |
| 63 | Kharkivska | 585 | 20.59% | 2.28% | 0.87 | 20.43% | 2.26% | 0.99 | 0.99 |
| 65 | Khersonska | 319 | 32.96% | 4.66% | 0.62 | 30.44% | 4.03% | 0.92 | 0.86 |
| 68 | Khmelnytska | 328 | 29.79% | 4.02% | 0.69 | 28.49% | 3.59% | 0.96 | 0.89 |
| 71 | Cherkaska | 394 | 18.37% | 3.72% | 0.72 | 20.23% | 3.44% | 1.10 | 0.93 |
| 73 | Chernivetska | 237 | 29.60% | 4.27% | 0.66 | 28.74% | 3.80% | 0.97 | 0.89 |
| 74 | Chernigivska | 366 | 28.62% | 4.34% | 0.65 | 29.22% | 3.94% | 1.02 | 0.91 |
| 80 | Kyiv | 494 | 9.63% | 1.63% | 0.93 | 9.56% | 1.66% | 0.99 | 1.02 |
| 85 | Sevastopil | 94 | 2.70% | 3.23% | 0.77 | 4.81% | 3.20% | 1.78 | 0.99 |
| | | | | | | | | 1.02 | 0.92 |

Comments on the results

• The weights given to the direct estimates (gamma) are lower for the Oblasts with smaller sub-sample sizes.

• In these cases, the gain in terms of MSE can reach 20%, as for Zhytomyrska, Sumska and Ternopilska.

• Moreover, the direct estimate for the City of Sevastopil (2.70%) would be considered too low a value by any expert in poverty analysis. The final estimate (4.81%) should be a far more plausible value. Here the gain in terms of MSE is not large, since the reduction in the original standard error is offset by the increase in the real MSE, which is obviously proportional to the magnitude of the estimated measure.

Future research

• Define a level of disaggregation finer than the Oblast level;

• Estimate (direct) poverty and inequality measures at that level;

• Estimate standard errors with Jackknife Repeated Replications (or BRR);

• Identify variables available at that level of disaggregation to be used as regressors in the EBLUP model;

• Compute EBLUP estimates;

• Evaluate the gain in terms of variance (greater than at Oblast level).
