Incomplete data:
Indirect estimation of migration flows
Modelling approaches
Aim:
Synthetic database by effective combination of data from different sources
Requirements
• Data representation: a mathematical model of the ‘complete’ or desired migration data
• Data types: the different ways of measuring migration
  – Data on events [relocations] (‘movement data’)
    • Migrations
  – Data on changes in status [place of residence]
    • Migrants
Requirements (continued)
• Typology of missing or incomplete data
  • Related to data types: what is missing?
• Typology of available data
  • Related to data types: what is available?
    – Primary data
    – Auxiliary data (e.g. historical migration matrix)
• Measure of reliability of available data
• Method to infer missing data from available statistical data and ‘soft’ information on migration
Existing approaches
• Net migration: residual method
• Gross migration flows: spatial interaction models
  – Gravity model
  – Entropy maximisation
  – Information-theoretic approaches
  – Iterative proportional fitting (bi- and multiproportional adjustment [RAS])
• Age profile: model migration schedules
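As a small illustration of the residual method listed above, the sketch below derives net migration from the demographic balancing equation: population change minus natural increase. The function name and all figures are hypothetical and serve only to show the arithmetic.

```python
def net_migration_residual(pop_start, pop_end, births, deaths):
    """Net migration as the residual of the demographic balancing equation."""
    natural_increase = births - deaths
    # Whatever population change is not explained by births and deaths
    # is attributed to net migration.
    return (pop_end - pop_start) - natural_increase


# Hypothetical figures for one region and one period.
net = net_migration_residual(pop_start=100_000, pop_end=101_500,
                             births=1_200, deaths=900)
print(net)
```

The residual yields only the net balance for a region. It says nothing about the gross flows between origins and destinations, which is why the spatial interaction models are needed.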
The approach
• Migration is a manifestation of behavioural processes and random processes (choice and chance)
• Describe the processes and get plausible/accurate parameter estimates based on the (incomplete) data and additional information
• Apply the model to predict migration flows
Data types
• Micro-data
  – Migration data (event data)
    • Occurrence of migration in observation period
    • Time at migration
  – Migrant data (status data; transition data)
    • Current status
    • Status at two or more points in time (panel)
      – Equal interval
      – Unequal interval (e.g. place of birth and place of current residence)
• Grouped data
Data types (continued)
• Micro-data
• Grouped data (aggregate data; tabulations)
  – Migrations (events)
  – Migrants (transitions)
• Observation in continuous time (e.g. population register)
• Observation in discrete time
Types of incompleteness
• Non-response
• Net migration vs gross flows
• Migrants vs migrations (events)
• Single migration recorded instead of sequence of migrations (e.g. last migration)
• Partially missing data
  – e.g. Origin by age or covariates
  – Some information missing for some persons
Solutions to incomplete data
• Collect missing information
• Use ancillary data and/or information on comparable population
• Live with it and minimise distortions caused by missing data
• Infer missing data from all the information you can get (combine sources)
Probability models of migration
Migration is a realisation of a Poisson process
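$$\Pr\{N_{ij} = n_{ij}\} = \frac{\lambda_{ij}^{n_{ij}} \exp[-\lambda_{ij}]}{n_{ij}!}$$

$$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}]$$

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

$$E[N_{ij}] = \lambda_{ij}, \qquad \mathrm{Var}[N_{ij}] = \lambda_{ij}$$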
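The Poisson model with a log-linear rate can be checked numerically. The sketch below builds a rate from an overall effect, an origin effect, a destination effect and an interaction effect, then confirms that the mean and variance of the resulting Poisson distribution both equal that rate. The parameter values and function names are hypothetical.

```python
import math


def poisson_pmf(n, lam):
    """Pr{N = n} for a Poisson variable with mean lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)


def log_linear_rate(mu, mu_a, mu_b, mu_ab):
    """Rate = exp[overall + origin + destination + interaction effect]."""
    return math.exp(mu + mu_a + mu_b + mu_ab)


# Hypothetical effects for a single origin-destination pair.
lam = log_linear_rate(mu=1.0, mu_a=0.2, mu_b=-0.1, mu_ab=0.0)

# Truncate the support at 60; the tail mass beyond it is negligible here.
probs = [poisson_pmf(n, lam) for n in range(60)]
mean = sum(n * p for n, p in enumerate(probs))
var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
print(lam, mean, var)
```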
Log-rate model: rate = events/exposure
$$m_{ij} = \frac{N_{ij}}{E_{ij}}$$

$$m_{ij} = \exp[u + u_i^A + u_j^B + u_{ij}^{AB}]$$
Gravity model

$$\lambda_{ij} = \alpha_i \beta_j m_{ij}$$

$$\lambda_{ij} = \alpha_i \beta_j \exp[c_{ij}]$$
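With $\lambda_{ij} = \alpha_i \beta_j m_{ij}$ and

$$\Pr\{N_{ij} = n_{ij}\} = \frac{\lambda_{ij}^{n_{ij}} \exp[-\lambda_{ij}]}{n_{ij}!}$$

the log-likelihood is

$$l(\alpha, \beta; n) = \sum_i \sum_j \left[ n_{ij} \ln(\alpha_i \beta_j m_{ij}) - \alpha_i \beta_j m_{ij} - \ln n_{ij}! \right]$$

Setting the partial derivatives to zero, with $n_{i+} = \sum_j n_{ij}$ and $n_{+j} = \sum_i n_{ij}$:

$$\frac{\partial l}{\partial \alpha_i} = \frac{n_{i+}}{\alpha_i} - \sum_j \beta_j m_{ij} = 0 \quad\Rightarrow\quad \hat{\alpha}_i = \frac{n_{i+}}{\sum_j \hat{\beta}_j m_{ij}} \qquad (4.6)$$

$$\frac{\partial l}{\partial \beta_j} = \frac{n_{+j}}{\beta_j} - \sum_i \alpha_i m_{ij} = 0 \quad\Rightarrow\quad \hat{\beta}_j = \frac{n_{+j}}{\sum_i \hat{\alpha}_i m_{ij}} \qquad (4.7)$$

Iterative solution:

$$\alpha_i^{(2r-1)} = \frac{n_{i+}}{\sum_j \beta_j^{(2r-2)} m_{ij}}$$

$$\beta_j^{(2r)} = \frac{n_{+j}}{\sum_i \alpha_i^{(2r-1)} m_{ij}}$$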
DSF procedure (DSF = Deming, Stephan, Furness) (Sen and Smith, 1995, p. 374)
RAS, Biproportional adjustment, etc.
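A minimal sketch of the DSF iteration in plain Python follows. It alternates two updates: each origin factor is set to the observed row total divided by the sum over destinations of the destination factor times the auxiliary flow, and each destination factor is set analogously from the column totals. The auxiliary matrix, the marginal totals, the starting values of 1 and the fixed iteration count are all hypothetical choices for illustration.

```python
def dsf(m, row_totals, col_totals, iterations=200):
    """Alternate the two balancing-factor updates; return (alpha, beta, lam)."""
    n_rows, n_cols = len(m), len(m[0])
    alpha = [1.0] * n_rows
    beta = [1.0] * n_cols  # starting values of 1 are an assumption
    for _ in range(iterations):
        # Origin factors: row total / sum_j beta_j * m_ij
        alpha = [row_totals[i] / sum(beta[j] * m[i][j] for j in range(n_cols))
                 for i in range(n_rows)]
        # Destination factors: column total / sum_i alpha_i * m_ij
        beta = [col_totals[j] / sum(alpha[i] * m[i][j] for i in range(n_rows))
                for j in range(n_cols)]
    lam = [[alpha[i] * beta[j] * m[i][j] for j in range(n_cols)]
           for i in range(n_rows)]
    return alpha, beta, lam


# Hypothetical auxiliary (e.g. historical) matrix and new marginal totals.
m = [[10.0, 20.0, 10.0],
     [30.0, 5.0, 15.0],
     [5.0, 25.0, 10.0]]
row_totals = [40.0, 50.0, 60.0]
col_totals = [70.0, 45.0, 35.0]

alpha, beta, lam = dsf(m, row_totals, col_totals)
for row in lam:
    print([round(x, 2) for x in row])
```

The fitted flows reproduce both sets of marginal totals while keeping the interaction pattern of the auxiliary matrix. The row and column totals must sum to the same grand total for the iteration to settle.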
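Likelihood equations may be written as:

$$\frac{\partial l}{\partial \alpha_i} = \frac{n_{i+} - \lambda_{i+}}{\alpha_i} = 0$$

$$\frac{\partial l}{\partial \beta_j} = \frac{n_{+j} - \lambda_{+j}}{\beta_j} = 0$$

Marginal totals are sufficient statistics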
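A different way of writing the spatial interaction model:
Link Poisson - Multinomial

Starting from $\lambda_{ij} = \alpha_i \beta_j m_{ij}$ and $\hat{\alpha}_i = n_{i+} / \sum_{j'} \hat{\beta}_{j'} m_{ij'}$ (4.6):

$$\hat{\lambda}_{ij} = \hat{\alpha}_i \hat{\beta}_j m_{ij} = \frac{\hat{\beta}_j m_{ij}}{\sum_{j'} \hat{\beta}_{j'} m_{ij'}}\, n_{i+} = \frac{\hat{\lambda}_{ij}}{\hat{\lambda}_{i+}}\, n_{i+} = \hat{\pi}_{ij}\, n_{i+} \qquad (5.1)$$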
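The link between the Poisson and multinomial forms can be illustrated with a short sketch. Conditional on the number of out-migrants from an origin, each destination share is the destination factor times the auxiliary flow, normalised over all destinations, and the expected flow is that share times the origin total. The destination factors, auxiliary matrix and totals below are hypothetical.

```python
def destination_probabilities(beta, m):
    """pi_ij = beta_j * m_ij / sum over destinations of beta_j * m_ij."""
    pi = []
    for row in m:
        weights = [b * m_ij for b, m_ij in zip(beta, row)]
        total = sum(weights)
        pi.append([w / total for w in weights])
    return pi


# Hypothetical destination factors, auxiliary matrix and origin totals.
beta = [1.2, 0.8, 1.0]
m = [[10.0, 20.0, 10.0],
     [30.0, 5.0, 15.0],
     [5.0, 25.0, 10.0]]
row_totals = [40.0, 50.0, 60.0]

pi = destination_probabilities(beta, m)
# Expected flows: destination share times the observed origin total.
lam = [[p * n for p in pi_row] for pi_row, n in zip(pi, row_totals)]
print(pi[0], lam[0])
```

The origin factor drops out of the shares entirely: the destination probabilities depend only on the destination factors and the auxiliary matrix.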
The gravity model is a log-linear model
The entropy model is a log-linear model
The RAS model is a log-linear (log-rate) model
Parameter estimation
• Maximise (log) likelihood function: probability that the model predicts the data
• Expectation: predict E[N_rs] = λ_rs given the model and initial parameter estimates.
• Maximisation: maximise the ‘complete-data’ log-likelihood.
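$$f(y; p, \mu, \sigma^2) = p_1 f_1(y; \mu_1, \sigma^2) + p_2 f_2(y; \mu_2, \sigma^2) = \sum_i p_i f_i(y; \mu_i, \sigma^2) \qquad (5.6)$$

$$L(p; \mu, \sigma) = \prod_{k=1}^{m} \prod_{i=1}^{2} \left[ p_i f_i(y_k; \mu_i, \sigma^2) \right]^{z_{ki}}$$

$$\hat{p}_i = \frac{1}{m} \sum_{k=1}^{m} z_{ki}, \qquad i = 1, 2 \qquad (5.7)$$

$Z_{ki}$: individual $k$ is a member of group $i$

$$l = \sum_k \left[ z_{k1} \ln p_1 + z_{k2} \ln p_2 \right] = \sum_k \left[ z_{k1} \ln p_1 + z_{k2} \ln(1 - p_1) \right]$$

$$l(\mu, \sigma) = \sum_k \sum_i z_{ki} \ln p_i + \sum_k \sum_i z_{ki} \ln f_i(y_k; \mu_i, \sigma^2) \qquad (5.8)$$

When $\mu_i$ and $\sigma^2$ are known, then

$$\hat{p}_1 = \frac{1}{m} \sum_k z_{k1} \qquad (5.9)$$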
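The sketch below applies the EM idea to a two-group mixture in which the group means and the common standard deviation are treated as known and only the mixing proportion is estimated. Normal component densities are an assumption made here for illustration. The E-step replaces each unobserved membership indicator by its expected value given the current proportion. The M-step then averages those expected memberships. Data, means and starting value are hypothetical.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_mixing_proportion(y, mu1, mu2, sigma, p1=0.5, iterations=50):
    """Estimate the proportion in group 1; means and sigma are held fixed."""
    z1 = []
    for _ in range(iterations):
        # E-step: expected membership of each individual in group 1.
        z1 = []
        for y_k in y:
            a = p1 * normal_pdf(y_k, mu1, sigma)
            b = (1.0 - p1) * normal_pdf(y_k, mu2, sigma)
            z1.append(a / (a + b))
        # M-step: the new proportion is the mean expected membership.
        p1 = sum(z1) / len(y)
    return p1, z1


# Hypothetical observations: three near the first mean, two near the second.
y = [0.1, -0.2, 0.0, 5.1, 4.9]
p1_hat, z1 = em_mixing_proportion(y, mu1=0.0, mu2=5.0, sigma=1.0)
print(round(p1_hat, 4))
```

Because the two groups are well separated in this toy data set, the expected memberships are close to 0 or 1 and the estimate settles near the share of observations around the first mean.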
Conclusion
• A unified approach to the prediction of migration from different types of data and different data sources
• Approach based on probability theory and theory of statistical inference (not ad hoc)
• The EM algorithm has been studied extensively, and much experience has been gathered.
• ‘Soft’ data (e.g. expert opinions) can be added