Incomplete data: Indirect estimation of migration flows

Modelling approaches

Aim:

Synthetic database by effective combination of data from different sources

Requirements

• Data representation: a mathematical model of the ‘complete’ or desired migration data

• Data types: the different ways of measuring migration

– Data on events [relocations] (‘movement data’)

• Migrations

– Data on changes in status [place of residence]

• Migrants

Requirements

• Typology of missing or incomplete data

• Related to data types: what is missing?

• Typology of available data

• Related to data types: what is available?

– Primary data

– Auxiliary data (e.g. historical migration matrix)

• Measure of reliability of available data.

• Method to infer missing data from available statistical data and ‘soft’ information on migration

Existing approaches

• Net migration: residual method

• Gross migration flows: spatial interaction models

– Gravity model

– Entropy maximisation

– Information-theoretic approaches

– Iterative proportional fitting (bi- and multiproportional adjustment [RAS])

• Age profile: model migration schedules
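
The residual method for net migration can be illustrated with the demographic balancing equation: net migration is the population change that natural increase does not explain. The following is a minimal sketch; the function name and the figures are hypothetical and serve only as an illustration.

```python
def net_migration_residual(pop_start, pop_end, births, deaths):
    """Net migration as the residual of the demographic balancing equation."""
    natural_increase = births - deaths
    return (pop_end - pop_start) - natural_increase


# Hypothetical figures for one region over one observation period.
net = net_migration_residual(pop_start=100_000, pop_end=103_500,
                             births=1_200, deaths=900)
print(net)  # 3200
```

The residual yields only the net balance; it cannot separate gross inflows from gross outflows, which is why the spatial interaction models are needed for flows.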

The approach

• Migration is a manifestation of behavioural processes and random processes (choice and chance)

• Describe the processes and get plausible/accurate parameter estimates based on the (incomplete) data and additional information

• Apply the model to predict migration flows

Data types

• Micro-data

– Migration data (event data)

• Occurrence of migration in observation period

• Time at migration

– Migrant data (status data; transition data)

• Current status

• Status at two or more points in time (panel)

– Equal interval

– Unequal interval (e.g. place of birth and place of current residence)

• Grouped data

Data types

• Micro-data

• Grouped data (aggregate data; tabulations)

– Migrations (events)

– Migrants (transitions)

• Observation in continuous time (e.g. population register)

• Observation in discrete time

Types of incompleteness

• Non-response

• Net migration vs gross flows

• Migrants vs migrations (events)

• Single migration recorded instead of sequence of migrations (e.g. last migration)

• Partially missing data

– e.g. origin by age or covariates

– Some information missing for some persons

Solutions to incomplete data

• Collect missing information

• Use ancillary data and/or information on comparable population

• Live with it and minimise distortions caused by missing data

• Infer missing data from all the information you can get (combine sources)

Probability models of migration

Migration is a realisation of a Poisson process

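$$\Pr\{N_{ij} = n_{ij}\} = \frac{\lambda_{ij}^{n_{ij}}}{n_{ij}!} \exp[-\lambda_{ij}]$$

$$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}]$$

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

$$E[N_{ij}] = \lambda_{ij}, \qquad \mathrm{Var}[N_{ij}] = \lambda_{ij}$$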
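
A small numerical check may help here: under the Poisson model with a log-linear intensity, the mean and the variance of the flow both equal the intensity. The sketch below assumes hypothetical effect values for a single origin-destination pair; the function names are illustrative only.

```python
import math


def poisson_pmf(n, lam):
    """Pr{N = n} for a Poisson variable with mean lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)


def log_linear_intensity(mu, mu_a, mu_b, mu_ab):
    """lambda_ij = exp(mu + mu_i^A + mu_j^B + mu_ij^AB) for one cell."""
    return math.exp(mu + mu_a + mu_b + mu_ab)


# Hypothetical overall, origin, destination and interaction effects.
lam = log_linear_intensity(mu=1.0, mu_a=0.5, mu_b=-0.2, mu_ab=0.0)

# Mean and variance computed from the probabilities (tail beyond 59 is negligible).
probs = [poisson_pmf(n, lam) for n in range(60)]
mean = sum(n * p for n, p in enumerate(probs))
var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
print(round(lam, 4), round(mean, 4), round(var, 4))
```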

Log-rate model: rate = events/exposure

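$$\frac{E[N_{ij}]}{m_{ij}} = \frac{\lambda_{ij}}{m_{ij}} = \exp[u + u_i^A + u_j^B + u_{ij}^{AB}]$$

$$\lambda_{ij} = \alpha_i \beta_j m_{ij}$$

The term $m_{ij}$ may also be written as an exponential function of $c_{ij}$.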
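
The relation rate = events/exposure can be sketched in a few lines. The example assumes that exposure is measured in person-years at risk and uses hypothetical numbers; the parameter names mirror the effects of the log-rate model but are otherwise illustrative.

```python
import math


def observed_rate(events, exposure):
    """Occurrence-exposure rate: number of events per unit of exposure."""
    return events / exposure


def expected_count(exposure, u, u_a=0.0, u_b=0.0, u_ab=0.0):
    """Expected number of events: exposure times the log-linear rate."""
    return exposure * math.exp(u + u_a + u_b + u_ab)


# Hypothetical cell: 30 migrations during 1500 person-years of exposure.
rate = observed_rate(events=30, exposure=1500)

# A model whose overall effect equals ln(rate) reproduces the observed count.
fitted = expected_count(exposure=1500, u=math.log(rate))
print(rate, round(fitted, 6))
```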

Gravity model

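$$\lambda_{ij} = \alpha_i \beta_j m_{ij}$$

$$\Pr\{N_{ij} = n_{ij}\} = \frac{\lambda_{ij}^{n_{ij}}}{n_{ij}!} \exp[-\lambda_{ij}]$$

$$l(\alpha, \beta; n) = \sum_{i,j} \left[ n_{ij} \ln(\alpha_i \beta_j m_{ij}) - \alpha_i \beta_j m_{ij} - \ln n_{ij}! \right]$$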
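
The Poisson log-likelihood of the gravity model is easy to evaluate directly. The sketch below is an illustration under stated assumptions: all entries of `m` are positive, the tables are plain nested lists, and the data are hypothetical.

```python
import math


def gravity_loglik(alpha, beta, m, n):
    """Poisson log-likelihood with lambda_ij = alpha_i * beta_j * m_ij."""
    total = 0.0
    for i, a in enumerate(alpha):
        for j, b in enumerate(beta):
            lam = a * b * m[i][j]
            # math.lgamma(x + 1) equals ln(x!)
            total += n[i][j] * math.log(lam) - lam - math.lgamma(n[i][j] + 1)
    return total


# Hypothetical observed flows and a uniform prior matrix m.
n = [[3, 1], [2, 4]]
m = [[1.0, 1.0], [1.0, 1.0]]

# Parameters that reproduce the row and column totals of n ...
ll_fit = gravity_loglik([4.0, 6.0], [0.5, 0.5], m, n)
# ... give a higher log-likelihood than an arbitrary choice.
ll_other = gravity_loglik([1.0, 1.0], [1.0, 1.0], m, n)
print(ll_fit > ll_other)  # True
```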

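$$\frac{\partial l}{\partial \alpha_i} = \frac{n_{i+}}{\alpha_i} - \sum_j \beta_j m_{ij} = 0$$

$$\hat{\alpha}_i = \frac{n_{i+}}{\sum_j \hat{\beta}_j m_{ij}} \qquad (4.6)$$

$$\frac{\partial l}{\partial \beta_j} = \frac{n_{+j}}{\beta_j} - \sum_i \alpha_i m_{ij} = 0$$

$$\hat{\beta}_j = \frac{n_{+j}}{\sum_i \hat{\alpha}_i m_{ij}} \qquad (4.7)$$

where $n_{i+} = \sum_j n_{ij}$ and $n_{+j} = \sum_i n_{ij}$ are the marginal totals.

$$\alpha_i^{(2r-1)} = \frac{n_{i+}}{\sum_j \beta_j^{(2r-2)} m_{ij}}$$

$$\beta_j^{(2r)} = \frac{n_{+j}}{\sum_i \alpha_i^{(2r-1)} m_{ij}}$$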

DSF procedure (DSF = Deming, Stephan, Furness) (Sen and Smith, 1995, p. 374)

RAS, Biproportional adjustment, etc.
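
The alternating updates of the origin and destination factors can be sketched as follows. This is a minimal illustration rather than a reference implementation: it assumes a strictly positive prior matrix `m` (for example a historical migration matrix), row and column totals with the same grand total, and hypothetical data.

```python
def dsf_fit(m, row_totals, col_totals, iterations=500, tol=1e-10):
    """Alternately rescale alpha (rows) and beta (columns) until stable."""
    n_rows, n_cols = len(m), len(m[0])
    alpha = [1.0] * n_rows
    beta = [1.0] * n_cols
    for _ in range(iterations):
        # Row step: alpha_i = n_i+ / sum_j beta_j * m_ij
        alpha_new = [row_totals[i] / sum(beta[j] * m[i][j] for j in range(n_cols))
                     for i in range(n_rows)]
        # Column step: beta_j = n_+j / sum_i alpha_i * m_ij
        beta_new = [col_totals[j] / sum(alpha_new[i] * m[i][j] for i in range(n_rows))
                    for j in range(n_cols)]
        change = max(abs(old - new)
                     for old, new in zip(alpha + beta, alpha_new + beta_new))
        alpha, beta = alpha_new, beta_new
        if change < tol:
            break
    return [[alpha[i] * beta[j] * m[i][j] for j in range(n_cols)]
            for i in range(n_rows)]


# Hypothetical historical matrix and new marginal totals (both sum to 180).
m = [[10.0, 20.0, 30.0], [40.0, 10.0, 20.0], [20.0, 30.0, 10.0]]
fitted = dsf_fit(m, row_totals=[70.0, 60.0, 50.0], col_totals=[80.0, 50.0, 50.0])
row_sums = [sum(row) for row in fitted]
col_sums = [sum(row[j] for row in fitted) for j in range(3)]
print([round(x, 4) for x in row_sums], [round(x, 4) for x in col_sums])
```

The fitted flows keep the interaction structure of `m` while matching the new margins, which is how a historical matrix enters as auxiliary data.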

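Likelihood equations may be written as:

$$\frac{\partial l}{\partial \alpha_i} = \frac{n_{i+} - \lambda_{i+}}{\alpha_i} = 0$$

$$\frac{\partial l}{\partial \beta_j} = \frac{n_{+j} - \lambda_{+j}}{\beta_j} = 0$$

with $\lambda_{i+} = \sum_j \lambda_{ij}$ and $\lambda_{+j} = \sum_i \lambda_{ij}$.

Marginal totals are sufficient statistics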

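A different way of writing the spatial interaction model:

$$\lambda_{ij} = \alpha_i \beta_j m_{ij}$$

$$\hat{\alpha}_i = \frac{n_{i+}}{\sum_j \hat{\beta}_j m_{ij}} \qquad (4.6)$$

$$\hat{\lambda}_{ij} = \hat{\alpha}_i \hat{\beta}_j m_{ij} = \frac{\hat{\beta}_j m_{ij}}{\sum_{j'} \hat{\beta}_{j'} m_{ij'}} \, n_{i+} = \frac{\hat{\lambda}_{ij}}{\hat{\lambda}_{i+}} \, n_{i+} = \hat{\pi}_{ij} \, n_{i+} \qquad (5.1)$$

Link Poisson - Multinomial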
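
The link between the Poisson and the multinomial formulation can be made concrete: dividing each expected flow by its origin total gives destination probabilities, and multiplying these by the number of out-migrants per origin recovers the flows. The sketch below uses hypothetical values and illustrative function names.

```python
def destination_probabilities(lam):
    """pi_ij = lambda_ij / lambda_i+ : destination shares for each origin."""
    probs = []
    for row in lam:
        row_total = sum(row)
        probs.append([value / row_total for value in row])
    return probs


def flows_from_probabilities(probs, origin_totals):
    """lambda_ij = pi_ij * n_i+ : flows from shares and origin totals."""
    return [[p * total for p in row] for row, total in zip(probs, origin_totals)]


# Hypothetical expected flows; the origin totals equal the row sums.
lam = [[2.0, 6.0], [3.0, 1.0]]
probs = destination_probabilities(lam)
flows = flows_from_probabilities(probs, origin_totals=[8.0, 4.0])
print(probs)  # [[0.25, 0.75], [0.75, 0.25]]
print(flows)  # [[2.0, 6.0], [3.0, 1.0]]
```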

The gravity model is a log-linear model

The entropy model is a log-linear model

The RAS model is a log-linear (log-rate) model

Parameter estimation

• Maximise (log) likelihood function: probability that the model predicts the data

• Expectation: predict $E[N_{rs}] = \lambda_{rs}$ given the model and initial parameter estimates.

• Maximisation: maximise the ‘complete-data’ log-likelihood.

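$$f(y; p, \mu, \sigma^2) = p_1 f_1(y; \mu_1, \sigma^2) + p_2 f_2(y; \mu_2, \sigma^2) = \sum_i p_i f_i(y; \mu_i, \sigma^2) \qquad (5.6)$$

$$L(p; \mu, \sigma^2) = \prod_{k=1}^{m} \prod_{i=1}^{2} \left[ p_i f_i(y_k; \mu_i, \sigma^2) \right]^{z_{ki}}$$

$$\hat{p}_i = \frac{1}{m} \sum_{k=1}^{m} z_{ki}, \qquad i = 1, 2 \qquad (5.7)$$

$z_{ki}$: indicator that individual $k$ is a member of group $i$

$$l(p; \mu, \sigma^2) = \sum_k \sum_i z_{ki} \ln p_i + \sum_k \sum_i z_{ki} \ln f_i(y_k; \mu_i, \sigma^2) \qquad (5.8)$$

When $\mu_i$ and $\sigma^2$ are known, then, up to a constant that does not depend on $p_1$,

$$l = \sum_k \left[ z_{k1} \ln p_1 + z_{k2} \ln p_2 \right] = \sum_k \left[ z_{k1} \ln p_1 + z_{k2} \ln (1 - p_1) \right]$$

$$\hat{p}_1 = \frac{1}{n} \sum_k z_{k1} \qquad (5.9)$$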
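
When group membership is not observed, the EM algorithm replaces each indicator by its expected value and then applies the complete-data estimator of the mixing proportion. The sketch below assumes normal component densities with known means and a common known variance (the normal form is an assumption made for illustration) and uses hypothetical observations.

```python
import math


def normal_pdf(y, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def em_mixing_proportion(ys, mu1, mu2, sigma2, p1=0.5, iterations=200, tol=1e-10):
    """EM estimate of p1 in a two-component mixture with known components."""
    for _ in range(iterations):
        # E-step: expected membership of each observation in group 1.
        z1 = []
        for y in ys:
            w1 = p1 * normal_pdf(y, mu1, sigma2)
            w2 = (1.0 - p1) * normal_pdf(y, mu2, sigma2)
            z1.append(w1 / (w1 + w2))
        # M-step: complete-data estimator, the mean of the expected indicators.
        p1_new = sum(z1) / len(ys)
        converged = abs(p1_new - p1) < tol
        p1 = p1_new
        if converged:
            break
    return p1


# Hypothetical data: three observations near 0 and five near 5.
ys = [-0.2, 0.1, 0.3, 4.8, 5.1, 5.3, 4.9, 5.2]
p1_hat = em_mixing_proportion(ys, mu1=0.0, mu2=5.0, sigma2=1.0)
print(round(p1_hat, 3))  # close to 3/8 because the groups are well separated
```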

Conclusion

• A unified approach to the prediction of migration from different types of data and different data sources

• Approach based on probability theory and theory of statistical inference (not ad hoc)

• The EM algorithm has been studied extensively, and much experience has been gathered.

• ‘Soft’ data (e.g. expert opinions) can be added
