
Motivation Data Model Simulation studies Real data analysis

Model-based approach for household clustering with mixed scale variables

Luis E. Nieto Barajas

Department of Statistics, ITAM

Trinity College – 22 June 2018

Luis E. Nieto Barajas Household clustering Trinity College – 22 june 2018 1 / 31


Contents

1 Motivation

2 Data

3 Model

4 Simulation studies

5 Real data analysis


Context

SEDESOL: Ministry of Social Development (a government agency)

Aim of SEDESOL: to reduce social backwardness. To fulfil it, SEDESOL creates social programmes that target specific needs:

Social inclusion
Life insurance for single mothers
Feeding support
Pension for elderly people
Day care centers
Milk provision
etc.


Objective

Currently each existing programme has its own selection rules (mainly based on income)

Objective: unify the selection procedure by creating clusters of households with similar needs and socio-economic conditions.


CONEVAL

How do we measure poverty in households?

In 2009, CONEVAL proposed a new methodology based on multiple dimensions: an income dimension, social deprivations and social cohesion

Multidimensional poverty indicators:

1 income per capita
2 education deprivation
3 access to health services
4 access to social security
5 housing quality
6 access to basic public services
7 access to feeding


ENIGH

Every 2 years the National Institute for Official Statistics (INEGI) carries out a survey

The survey is based on a complex sampling design of households, i.e., households have different selection probabilities

The survey is nationwide and includes a module on socio-economic conditions

Clusters will be produced based on the 7 multidimensional poverty indicators plus a few other variables

Available data are mixed mode, i.e., continuous, discrete, ordinal and nominal

Challenge: produce clusters of households using mixed scale data, taking into account the complex sampling design of the survey


Notation

$y_i = (y_{ij})$, $j = 1, \ldots, p$, $i = 1, \ldots, n$, is the multivariate response

$p$ = total number of variables, such that $p = c + o + m$, where
$c$ = # continuous variables,
$o$ = # ordinal variables,
$m$ = # nominal variables

In summary, $y_i' = (y_{i,1}, \ldots, y_{i,c}, y_{i,c+1}, \ldots, y_{i,c+o}, y_{i,c+o+1}, \ldots, y_{i,c+o+m})$

For each $y_i$ of dimension $p$ $\Longleftrightarrow$ latent $z_i$ of dimension $q$, with $q \ge p$


Latent variables

If $y_{ij}$ is continuous $\Rightarrow z_{ij} = g_j(y_{ij})$, where $g_j(\cdot)$ is a normalising transformation

If $y_{ij}$ is ordinal with $K_j$ different values $\vartheta_1, \ldots, \vartheta_{K_j}$ $\Rightarrow z_{ij}$ satisfies $y_{ij} = \vartheta_k$ iff $\gamma_{j,k-1} < z_{ij} \le \gamma_{j,k}$, where $\gamma_{j,0}, \ldots, \gamma_{j,K_j}$ are $K_j + 1$ fixed thresholds

Note: a binary variable is a special case of an ordinal variable with $K_j = 2$

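If $y_{ij}$ is nominal with $L_j$ categories we need $L_j - 1$ latent variables $z_{il}$, with
$$l = c + o + \sum_{h=c+o+1}^{j-1} (L_h - 1) + 1, \; \ldots, \; c + o + \sum_{h=c+o+1}^{j} (L_h - 1),$$
such that
$$y_{ij} = \begin{cases} L_j, & \text{if } \max_l (z_{i,l}) < 0 \\ k, & \text{if } z_{i,s} = \max_l (z_{i,l}) \text{ and } z_{i,s} > 0 \end{cases}$$
with $s = c + o + \sum_{h=c+o+1}^{j-1} (L_h - 1) + k$, and $k = 1, \ldots, L_j - 1$

Finally, $q = c + o + \sum_{j=c+o+1}^{c+o+m} (L_j - 1)$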
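The mapping from latent values to observed ordinal and nominal values can be sketched in Python. This is only an illustration of the rules on this slide, not the code of any package; the function names and the example inputs are hypothetical, and a latent maximum exactly equal to zero (a probability-zero event) is assigned to the last category by assumption.

```python
import numpy as np

def ordinal_from_latent(z, thresholds):
    """Return k in 1..K such that thresholds[k-1] < z <= thresholds[k].

    `thresholds` is the increasing sequence (gamma_0, ..., gamma_K),
    with gamma_0 = -inf and gamma_K = +inf.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    # side="left" gives the index k with thresholds[k-1] < z <= thresholds[k]
    return int(np.searchsorted(thresholds, z, side="left"))

def nominal_from_latent(z_block):
    """Map the L-1 latents of one nominal variable to a category in 1..L."""
    z_block = np.asarray(z_block, dtype=float)
    n_categories = z_block.size + 1
    s = int(np.argmax(z_block))
    if z_block[s] > 0:
        return s + 1          # category k = position of the positive maximum
    return n_categories       # all latents non-positive: last category L

# Illustrative inputs (assumed, not taken from the survey data)
cuts = [-np.inf, 0.0, 4.0, np.inf]
ordinal_examples = [ordinal_from_latent(z, cuts) for z in (-1.0, 2.0, 4.0, 7.5)]
nominal_examples = [nominal_from_latent(z) for z in ([-0.5, -2.0, -1.0], [0.3, 1.2, -1.0])]
```

With three thresholds of interest the ordinal variable has three categories, and a nominal variable with four categories is driven by three latents.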


Non iid data

Data come from a complex survey sample

Individual $y_i$ has known sampling probability $\pi_i$

$\omega_i = 1/\pi_i$ are sampling design weights or expansion factors

[Lumley, 2010]: Weighted least squares
$$\min \sum_{i=1}^{n} \frac{1}{\pi_i} (y_i - \alpha - \beta x_i)^2$$

[Chambers and Skinner, 2003]: Likelihood re-weighting
$$\prod_{i=1}^{n} f(y_i \mid \theta)^{1/\pi_i}$$
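Both weighting ideas can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, the regression is solved through the weighted normal equations, and a normal density is assumed for $f$ in the re-weighted likelihood purely as an example.

```python
import numpy as np

def weighted_least_squares(x, y, pi):
    """Minimise sum_i (1/pi_i) * (y_i - alpha - beta * x_i)^2."""
    x, y, pi = (np.asarray(v, dtype=float) for v in (x, y, pi))
    w = 1.0 / pi                                  # design weights
    X = np.column_stack([np.ones_like(x), x])     # intercept and slope columns
    # Weighted normal equations: (X' W X) coef = X' W y
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    alpha, beta = np.linalg.solve(XtWX, XtWy)
    return float(alpha), float(beta)

def weighted_log_likelihood(y, pi, mean, sd):
    """log of prod_i f(y_i | theta)^(1/pi_i), with f assumed normal here."""
    y, pi = np.asarray(y, dtype=float), np.asarray(pi, dtype=float)
    log_f = -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2
    return float(np.sum(log_f / pi))
```

A unit with a small sampling probability receives a large exponent, so it counts as if it had been observed $1/\pi_i$ times.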


Model

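Observed $y$ $\Longleftrightarrow$ latent $z$

We propose
$$z_i \mid \mu_i, \Sigma \sim N_q(\mu_i, \kappa \pi_i \Sigma)$$

Plus some estimability constraints on $\Sigma$:

For $j = 1, \ldots, c$ $\Rightarrow$ $\sigma_j^2 > 0$
For $j = c+1, \ldots, c+o$ $\Rightarrow$ $\sigma_j^2 > 0$ if $K_j > 2$ and $\sigma_j^2 = 1$ if $K_j = 2$
For $j = c+o+1, \ldots, c+o+\sum_{h=c+o+1}^{c+o+m} (L_h - 1)$ $\Rightarrow$ $\sigma_j^2 = 1$

To ensure positive definiteness: separation strategy
$$\Sigma = \Lambda \Omega \Lambda$$
with $\Lambda = \mathrm{diag}(\sigma_1, \ldots, \sigma_q)$ and $\Omega$ a correlation matrix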
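As a small numerical illustration of the separation strategy, the sketch below builds $\Sigma = \Lambda \Omega \Lambda$ from standard deviations and a correlation matrix, then scales it by $\kappa \pi_i$ as in the proposed model. The function names and the numbers are assumptions chosen for the example.

```python
import numpy as np

def build_covariance(sigmas, omega):
    """Sigma = Lambda Omega Lambda with Lambda = diag(sigma_1, ..., sigma_q)."""
    lam = np.diag(np.asarray(sigmas, dtype=float))
    return lam @ np.asarray(omega, dtype=float) @ lam

def latent_covariance(kappa, pi_i, sigma):
    """Covariance of z_i in the model: kappa * pi_i * Sigma."""
    return kappa * pi_i * sigma

# Illustrative values: one free variance and two variances fixed at 1
sigmas = [2.0, 1.0, 1.0]
omega = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
sigma = build_covariance(sigmas, omega)
```

Because $\Omega$ is a valid correlation matrix and every $\sigma_j$ is positive, the product is positive definite, and fixing some $\sigma_j = 1$ leaves the correlations untouched.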


Priors

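For $\mu_i$: nonparametric prior [Pitman and Yor, 1997]
$$\mu_i \mid G \overset{iid}{\sim} G, \quad i = 1, \ldots, n, \quad \text{with } G \sim PD(a, b, G_0)$$
with $G_0(\mu) = N(0, \Sigma_\mu)$ and $\Sigma_\mu = \mathrm{diag}(\sigma^2_{\mu 1}, \ldots, \sigma^2_{\mu q})$

$G$ is a.s. discrete: $G(\cdot) = \sum_{k=1}^{\infty} w_k \delta_{\xi_k}(\cdot)$ with $\xi_k \overset{iid}{\sim} G_0$ and $w_1 = v_1$, $w_k = v_k \prod_{l<k} (1 - v_l)$, with $v_k \overset{ind}{\sim} Be(1 - a, b + ka)$

Marginally
$$f(\mu_i \mid \mu_{-i}) = \frac{b + a\, r_i}{b + n - 1}\, g_0(\mu_i) + \sum_{j=1}^{r_i} \frac{n^*_{j,i} - a}{b + n - 1}\, \delta_{\mu^*_{j,i}}(\mu_i)$$

For $\sigma_j^2$:
$$\sigma_j^2 \overset{iid}{\sim} IGa(d_0^z, d_1^z)$$

For $\Omega$: marginally uniform [Barnard et al., 2000]
$$f(\Omega) \propto |\Omega|^{q(q-1)/2 - 1} \prod_j |\Omega_{jj}|^{-(q+1)/2}$$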
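The stick-breaking weights and the marginal (predictive) rule for $\mu_i$ can be checked numerically. The sketch below truncates the infinite sum at a finite number of atoms, which is an approximation introduced here for illustration; the function names are hypothetical.

```python
import numpy as np

def pitman_yor_weights(a, b, n_atoms, rng):
    """First n_atoms stick-breaking weights with v_k ~ Be(1 - a, b + k a)."""
    k = np.arange(1, n_atoms + 1)
    v = rng.beta(1.0 - a, b + k * a)
    # w_1 = v_1 and w_k = v_k * prod_{l<k} (1 - v_l)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def predictive_probabilities(cluster_sizes, a, b):
    """Weights of the marginal rule given the sizes n*_j of the r clusters
    formed by the other n - 1 observations."""
    sizes = np.asarray(cluster_sizes, dtype=float)
    r = sizes.size
    denom = b + sizes.sum()            # equals b + n - 1
    p_new = (b + a * r) / denom        # weight on a fresh draw from g_0
    p_existing = (sizes - a) / denom   # weight on each existing value
    return float(p_new), p_existing
```

The weight on a new value grows with the number of existing clusters when $a > 0$, which is what distinguishes this prior from the Dirichlet process case $a = 0$.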


Hyper priors

For $(a, b)$: $a \in [0, 1)$ and $b > -a$
$$f(a, b) = f(a)\, f(b \mid a)$$
with
$$f(a) = \alpha I_0(a) + (1 - \alpha)\, Be(a \mid d_0^a, d_1^a)$$
and
$$f(b \mid a) = Ga(b + a \mid d_0^b, d_1^b)$$

For $\sigma^2_{\mu j}$:
$$\sigma^2_{\mu j} \overset{iid}{\sim} IGa(d_0^\mu, d_1^\mu)$$
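A draw from the hyper prior on $(a, b)$ can be sketched as follows. The point mass at zero is sampled with probability $\alpha$, and $b$ is obtained by shifting a gamma draw by $-a$, which guarantees $b > -a$. The function name is hypothetical, and the gamma distribution is assumed to be parametrised by shape $d_0^b$ and rate $d_1^b$; the slides do not state the parametrisation.

```python
import numpy as np

def sample_a_b(alpha, da0, da1, db0, db1, rng):
    """One draw of (a, b) from f(a) f(b | a)."""
    # f(a): spike at 0 with probability alpha, otherwise Be(da0, da1)
    if rng.random() < alpha:
        a = 0.0
    else:
        a = float(rng.beta(da0, da1))
    # f(b | a): b + a ~ Ga(db0, db1), assumed shape/rate, so scale = 1 / rate
    b = float(rng.gamma(shape=db0, scale=1.0 / db1)) - a
    return a, b
```

Setting $\alpha = 1$ forces $a = 0$ and recovers a Dirichlet process prior with a gamma prior on $b$.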


Posteriors

Posterior inference is based on MCMC

Apart from the conditional posterior distributions of all model parameters, we also sample from the conditional predictive distributions of the $z_{ij}$, which are truncated normals for the latent variables not associated with continuous variables

Implemented in the R-package BNPMIXcluster


Clustering selection

For each MCMC iteration we have an $n \times n$ adjacency matrix (1 if elements $i$ and $j$ share the same value of $\mu^*$ and 0 otherwise)

Compute a similarity matrix (Monte Carlo average of all the adjacency matrices). This represents the “average clustering”.

We select the adjacency matrix of the iteration with minimum squared distance from the average similarity matrix [Dahl, 2006]
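The selection rule above can be sketched in a few lines of Python. The input is assumed to be an array of cluster labels with one row per retained MCMC iteration; the function names are hypothetical and the code keeps all adjacency matrices in memory, which is only practical for small $n$.

```python
import numpy as np

def adjacency(labels):
    """n x n matrix with 1 where two elements share a cluster label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def select_clustering(label_draws):
    """Pick the iteration whose adjacency matrix is closest, in squared
    distance, to the Monte Carlo average of all adjacency matrices."""
    label_draws = np.asarray(label_draws)
    adj = np.stack([adjacency(row) for row in label_draws])   # (T, n, n)
    similarity = adj.mean(axis=0)                             # average clustering
    dist = ((adj - similarity) ** 2).sum(axis=(1, 2))
    best = int(np.argmin(dist))
    return best, label_draws[best], similarity

# Toy example with three iterations and four elements
draws = np.array([[0, 0, 1, 1],
                  [1, 1, 0, 0],    # same partition as the first row, relabelled
                  [0, 1, 1, 1]])
best, labels, similarity = select_clustering(draws)
```

Working with adjacency matrices rather than raw labels makes the rule insensitive to label switching, since relabelled partitions give identical matrices.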


Clustering comparison

To remove the scales of the variables, define new variables $y^*_{ij}$ as follows: for a numerical variable, $y_{ij}$ is standardized across all individuals; for a categorical variable, if the number of categories is two then $y^*_{ij} = y_{ij}$, otherwise define a latent indicator variable $y^*_{il}$ for each category $l = 1, \ldots, L_j$

$$HM(C_1, \ldots, C_r) = \sum_{k=1}^{r} n_k \sum_{j=1}^{p^*} S^2_{kj}, \quad \text{where} \quad S^2_{kj} = \sum_{i=1}^{n_k} w_i^{(k)} {y^*_{ij}}^2 - \left( \sum_{i=1}^{n_k} w_i^{(k)} y^*_{ij} \right)^2, \qquad (1)$$

with $w_i^{(k)} = w_i / \sum_{l \in C_k} w_l$ the normalized weights and $p^*$ the number of resulting $y^*_{ij}$ variables

We want small $HM$ and small $r$.
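The heterogeneity measure in (1) is a sum of weighted within-cluster variances, each scaled by the cluster size. A minimal sketch, assuming the rescaled variables $y^*$ are already available as an $n \times p^*$ array (the function name and the toy data are hypothetical):

```python
import numpy as np

def heterogeneity(y_star, labels, weights):
    """HM = sum_k n_k * sum_j S2_kj with weighted within-cluster variances."""
    y_star = np.asarray(y_star, dtype=float)     # shape (n, p*)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    total = 0.0
    for k in np.unique(labels):
        members = labels == k
        n_k = int(members.sum())
        w = weights[members] / weights[members].sum()   # normalized weights
        y_k = y_star[members]
        mean = w @ y_k                                  # weighted mean per variable
        s2 = w @ y_k**2 - mean**2                       # weighted variance per variable
        total += n_k * s2.sum()
    return float(total)

# Toy data: one variable, four units, equal weights
y_toy = np.array([[0.0], [2.0], [10.0], [10.0]])
hm_two = heterogeneity(y_toy, [0, 0, 1, 1], np.ones(4))
hm_one = heterogeneity(y_toy, [0, 0, 0, 0], np.ones(4))
```

Splitting the toy data into two clusters lowers the measure, which is why it has to be read together with the number of clusters $r$: putting every unit in its own cluster would drive it to zero.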


Simulation study 1

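3 underlying groups defined by three variables $z' = (z_1, z_2, z_3)$

$$f(z) = \frac{1}{3} N(z \mid \mu_1, \Sigma_1) + \frac{1}{3} N(z \mid \mu_2, \Sigma_2) + \frac{1}{3} N(z \mid \mu_3, \Sigma_3)$$

$$\mu_1 = \begin{pmatrix} 2 \\ 2 \\ 5 \end{pmatrix}, \quad \mu_2 = \begin{pmatrix} 6 \\ 4 \\ 2 \end{pmatrix}, \quad \mu_3 = \begin{pmatrix} 1 \\ 6 \\ 2 \end{pmatrix}$$

$$\Sigma_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix} 0.1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0.1 \end{pmatrix}, \quad \Sigma_3 = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.1 \end{pmatrix}$$

$n = 100$ data points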
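Data of this kind can be generated with a short script. The sketch below is one way to draw from the three-component mixture; since all covariance matrices are diagonal, each coordinate is drawn independently. The function name and the seed are assumptions, and the draws will not reproduce the data set used in the talk.

```python
import numpy as np

# Component means and diagonal variances of the mixture
MEANS = np.array([[2.0, 2.0, 5.0],
                  [6.0, 4.0, 2.0],
                  [1.0, 6.0, 2.0]])
VARIANCES = np.array([[1.0, 1.0, 1.0],
                      [0.1, 2.0, 0.1],
                      [2.0, 0.1, 0.1]])

def simulate_ss1(n, rng):
    """Draw n points from the equal-weight mixture; returns (z, groups)."""
    groups = rng.integers(0, 3, size=n)          # each component has weight 1/3
    noise = rng.standard_normal((n, 3))
    z = MEANS[groups] + np.sqrt(VARIANCES[groups]) * noise
    return z, groups

z, groups = simulate_ss1(100, np.random.default_rng(2018))
```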


SS1: Data

[Figure: plots of the simulated data: one panel with axes Z1, Z2 and Z3, and three pairwise panels (Z1, Z2), (Z1, Z3) and (Z2, Z3).]


SS1 : Variables

Considered three scenarios:
(I) Three continuous variables $(y_1, y_2, y_3)$: $y_i = z_i$, $i = 1, 2, 3$.
(II) Two binary variables $(y_1, y_3)$: $y_1 = \mathbb{1}(z_1 > 5)$ and $y_3 = \mathbb{1}(z_3 > 3)$.
(III) Two binary variables $(y_1, y_3)$, as in (II), plus one ordinal variable $y_2 = \mathbb{1}(4 < z_2 \le 5) + 2\, \mathbb{1}(z_2 > 5)$, and a continuous variable $y_4 \sim N(0, 1)$

Cut-off points $(\gamma_0, \gamma_1, \gamma_2) = (-\infty, 0, \infty)$ for the binary variables, and $(\gamma_0, \gamma_1, \gamma_2, \gamma_3) = (-\infty, 0, 4, \infty)$ for the ordinal variable with 3 categories

Priors on variances:
A) $d_0^z = d_0^\mu = 0.1$ and $d_1^z = d_1^\mu = 0.1$
B) $d_0^z = d_0^\mu = 1$ and $d_1^z = d_1^\mu = 1$
C) $d_0^z = d_0^\mu = 2.1$ and $d_1^z = d_1^\mu = 30$

$\alpha = 0.5$, $d_0^a = d_1^a = d_0^b = d_1^b = 1$, $\kappa = 1$ and $\pi_i = 1$.

Gibbs sampler with 4700 iterations, a burn-in of 200 and a thinning of 3. A total of 1500 MCMC draws were kept for inference
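The three scenarios are deterministic transformations of the latent $z$ (plus an independent noise variable in scenario III), and the number of retained draws follows from the burn-in and thinning. A minimal sketch, with hypothetical function names and a small hand-made $z$ used only to show the coding:

```python
import numpy as np

def scenario_variables(z, rng):
    """Observed variables for scenarios I, II and III from an (n, 3) array z."""
    z = np.asarray(z, dtype=float)
    y1 = (z[:, 0] > 5).astype(int)                       # binary
    y3 = (z[:, 2] > 3).astype(int)                       # binary
    # ordinal with values 0, 1, 2
    y2 = ((z[:, 1] > 4) & (z[:, 1] <= 5)).astype(int) + 2 * (z[:, 1] > 5).astype(int)
    y4 = rng.standard_normal(z.shape[0])                 # pure-noise continuous variable
    return {
        "I": z.copy(),
        "II": np.column_stack([y1, y3]),
        "III": np.column_stack([y1, y2, y3, y4]),
    }

def kept_draws(iterations, burn_in, thinning):
    """Number of MCMC draws retained after burn-in and thinning."""
    return (iterations - burn_in) // thinning

z_demo = np.array([[6.0, 4.5, 2.0],
                   [2.0, 2.0, 5.0],
                   [1.0, 6.0, 2.0]])
scenarios = scenario_variables(z_demo, np.random.default_rng(0))
n_kept = kept_draws(4700, 200, 3)
```

The count confirms the figure on the slide: $(4700 - 200) / 3 = 1500$ draws.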


SS1: Convergence speed

[Figure: four panels plotting model cluster against real cluster (1, 2, 3) at iterations 1, 11, 20 and 30.]


SS1: Results

[Figure: nine panels showing the distribution of the number of clusters, one per combination of scenario (I, II, III) and prior (A, B, C). Horizontal axis: number of clusters. Axis ranges shown: I (A) 81 to 97; I (B) 30 to 55; I (C) 3 to 6; II (A) 3 to 37; II (B) 3 to 38; II (C) 3 to 18; III (A) 74 to 98; III (B) 54 to 75; III (C) 4 to 16.]


Simulation study 2

[Figure: density $f(z)$ plotted for $z$ between 0 and 50.]

Took $n = 200$ mutually exclusive intervals $A_i = (\tau_{i-1}, \tau_i]$, where $\tau_0 = 0$ and $\tau_i = \tau_{i-1} + 0.25$, for $i = 1, \ldots, n$. Calculated $p_i = \Pr(A_i)$ under density $f(z)$

Simulated a single value $z_i$ uniformly from $A_i$, that is $z_i \sim \mathrm{Un}(\tau_{i-1}, \tau_i]$, and defined $y_i = z_i$, $i = 1, \ldots, n$. Clearly, the data $y_i$ would look like a uniform sample in the interval $(0, 50]$.


SS2 : Priors

Let $N$ be a hypothetical population size and define $w_i = N p_i$, as in a complex sampling design. Here $\sum_{i=1}^{n} w_i = N$ and $\bar{w} = N/n$.

We consider three scenarios:
(IV) Ignoring the sample design, $\pi_i = 1$ and $\kappa = 1$
(V) Acknowledging the sample design, $\pi_i = 1/w_i$ and $\kappa = \bar{w}/15$
(VI) Acknowledging the sample design, $\pi_i = 1/w_i$ and $\kappa = \bar{w}/25$

Note that $w_i$ and $\kappa$ affect multiplicatively the variance of the latent variables, and since $\bar{w}/w_i = \bar{p}/p_i$, where $\bar{p} = (1/n) \sum_{i=1}^{n} p_i$, there is no need to specify $N$

We took prior (C) for the variances and kept the same specification for all other priors

The Gibbs sampler was run for 4,700 iterations with a burn-in of 200 and a thinning of 3
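The reason $N$ drops out can be written in one line. Using only the definitions on this slide ($w_i = N p_i$, with $\bar{w}$ and $\bar{p}$ the averages of the $w_i$ and $p_i$), and taking the divisor 15 of scenario (V) as the example, the variance multiplier of the latent variables is:

```latex
\[
\bar{w} = \frac{1}{n}\sum_{i=1}^{n} N p_i = N \bar{p},
\qquad
\kappa\,\pi_i = \frac{\bar{w}}{15}\cdot\frac{1}{w_i}
             = \frac{N\bar{p}}{15\,N p_i}
             = \frac{\bar{p}}{15\,p_i}.
\]
```

Scenario (VI) gives the same expression with 25 in place of 15, so intervals with small probability $p_i$ receive a larger latent variance in both cases.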


SS2: Results

[Figure: three panels, (IV), (V) and (VI), showing the distribution of the number of clusters. Horizontal axis: number of clusters. Axis ranges shown: (IV) 1 to 6; (V) 3 to 15; (VI) 3 to 23.]


Real data : EdoMex

Facts of EdoMex:
16.2 million inhabitants (13.5% of the country's population); largest state
4.2 million households
ENIGH is a sample of 1,730 households (0.04% of the households)
Each household in the sample represents between 960 and 5,286 households

[Figure: density of the expansion factors $w_i$, with the horizontal axis running from about 1000 to 5000.]


Real data : Variables

Continuous:
$Y_1$ = Income per capita $\Rightarrow Z_1 = \log(Y_1 + c)$, $c = \xi_{0.01}$

Binary: $\gamma_{j0} = -\infty$, $\gamma_{j1} = 0$, $\gamma_{j2} = \infty$
$Y_2$ = Education deprivation (yes or no)
$Y_3$ = Access to health services (yes or no)
$Y_4$ = Access to social security (yes or no)
$Y_5$ = Housing quality (bad or good)
$Y_6$ = Access to basic public services (yes or no)
$Y_7$ = Access to feeding (yes or no)

Ordinal: $\gamma_{8,0} = -\infty$, $\gamma_{8,1} = 0$, $\gamma_{8,2} = 4$ and $\gamma_{8,3} = \infty$
$Y_8$ = Education level (0: incomplete primary, 1: incomplete secondary, 2: complete secondary or more)

Nominal:
$Y_9$ = Town size (1: $[100000, \infty)$, 2: $[15000, 100000)$, 3: $[2500, 15000)$, 4: $(0, 2500)$ inhabitants)


Real data : Prior specifications

Considered three cases:
i) Ignoring the sampling design, $\pi_i = 1$ and $\kappa = 1$
ii) Acknowledging the sample design, $\pi_i = 1/w_i$ and $\kappa = 2\bar{w}$
iii) Acknowledging the sample design, $\pi_i = 1/w_i$ and $\kappa = 4\bar{w}$

Prior (C) for the variances and the same specification for all other priors

Gibbs sampler was run for 3,200 iterations with a burn-in of 200 and a thinning of 3


Real data : (i)

r = 163, HM = 1246


Real data : (ii)

r = 35, HM = 2240


Real data : (iii)

r = 9, HM = 3000


Real data : Group means (iii)

group  income  feed  health  house  edu   serv  ss    hedu  ts:1  ts:2  ts:3  ts:4  size
1      5934    0.22  0.35    0.11   0.32  0.11  0.80  1.91  0.59  0.13  0.15  0.13  36.0%
2      11374   0.16  0.34    0.07   0.36  0.07  0.71  1.93  0.64  0.12  0.14  0.10  30.5%
3      22682   0.06  0.34    0.01   0.23  0.04  0.67  1.99  0.74  0.09  0.10  0.06  12.4%
4      3091    0.26  0.42    0.06   0.43  0.24  0.89  1.78  0.54  0.11  0.16  0.20  9.0%
5      1783    0.37  0.23    0.09   0.95  0.41  0.60  0.21  0.41  0.05  0.18  0.36  4.0%
6      5006    0.25  0.32    0.13   0.94  0.15  0.64  0.53  0.46  0.11  0.26  0.16  4.2%
7      44991   0.04  0.09    0.00   0.14  0.02  0.30  2.00  0.63  0.29  0.06  0.02  2.7%
8      570     0.69  0.24    0.46   0.68  0.29  1.00  1.46  0.16  0.06  0.48  0.30  0.7%
9      219578  0.16  0.27    0.00   0.00  0.00  0.27  2.00  0.89  0.11  0.00  0.00  0.5%

pop    11212   0.19  0.34    0.08   0.38  0.11  0.74  1.79  0.61  0.12  0.15  0.13  4240837


References

Barnard, J., McCulloch, R. and Meng, X.-L. (2000). Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Statistica Sinica 10, 1281–1311.

Carmona, C., Nieto-Barajas, L. E. and Canale, A. (2017). Model-based approach for household clustering with mixed scale variables. Submitted to Advances in Data Analysis and Classification.

Chambers, R. L. and Skinner, C. J. (2003). Analysis of Survey Data. Wiley, Chichester.

Coneval (2009). Metodología para la medición multidimensional de la pobreza en México. Consejo Nacional de Evaluación de la Política de Desarrollo Social, México. (In Spanish). Available at: http://www.coneval.org.mx/rw/resource/Metodologia_Medicion_Multidimensional.pdf

Dahl, D. B. (2006). Model-based clustering for expression data via a Dirichlet process mixture model. In Bayesian Inference for Gene Expression and Proteomics, Eds. M. Vannucci, K.-A. Do and P. Müller. Cambridge University Press, Cambridge.

Lumley, T. (2010). Complex Surveys. Wiley, New Jersey.

Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. The Annals of Probability 25, 855–900.
