Applying multiple imputation on waterbird census data
Comparing two imputation methods
ISEC, Montpellier, 1 July 2014
Thierry Onkelinx, Koen Devos & Paul Quataert
Overview
1 Introduction
2 Testing the Underhill index
3 An alternative way of imputing
4 Testing multiple imputation using INLA
5 Conclusions
1 Introduction
Waterbird census in Flanders (Belgium)
• Aim: monitor wintering birds
  • Total number
  • Average over months per winter
• Data collected by volunteers
  • 1200 sites
  • 23 winters
  • 6 months per winter
• Missing data: on average 26% (range 9% to 42%)
  • Imputation required
Underhill index
Described by Underhill and Prys-Jones (1994)
Algorithm
1 Replace all missing values with a starting value
2 Fit the model to the imputed dataset
3 Predict the missing data
4 Replace an imputed value with the rounded prediction when the prediction is larger
5 Iterate from step 2 until the imputations are stable
Model: negative binomial GLM with global effects for winter, month and site
Potential problems
• Imputed values can never decrease: risk of bias
• Imputing with model predictions: risk of underestimated standard errors
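The five steps above can be sketched in Python/NumPy as follows. This is a minimal sketch, not the implementation used in the study: as a stand-in for the negative binomial GLM it assumes a Poisson log-linear model with main effects for site, winter and month, because the maximum likelihood fit of that model has a closed form (the product of the one-way margins divided by the squared grand total). The function names `fit_main_effects` and `underhill_impute` are hypothetical.

```python
import numpy as np


def fit_main_effects(y):
    """Poisson MLE of log(mu) = site + winter + month (main effects only).

    Assumption: Poisson stand-in for the negative binomial GLM on the slide.
    For this independence log-linear model the fitted values equal the
    product of the three one-way margins divided by the squared grand total.
    y has shape (sites, winters, months).
    """
    total = y.sum()
    site = y.sum(axis=(1, 2))
    winter = y.sum(axis=(0, 2))
    month = y.sum(axis=(0, 1))
    return np.einsum("i,j,k->ijk", site, winter, month) / total**2


def underhill_impute(y, missing, start=0.0, altered=False, max_iter=100):
    """Iterative imputation in the spirit of the Underhill index.

    missing: boolean mask with the same shape as y.
    altered=False: an imputed value is only replaced when the rounded
        prediction is larger, so imputations can never decrease.
    altered=True: imputed values are always replaced by the rounded prediction.
    """
    filled = np.where(missing, start, y).astype(float)  # step 1: starting value
    for _ in range(max_iter):
        pred = np.rint(fit_main_effects(filled))  # steps 2 and 3: fit, predict
        if altered:
            update = missing
        else:
            update = missing & (pred > filled)  # step 4: only when larger
        new = np.where(update, pred, filled)
        if np.array_equal(new, filled):  # step 5: stop when stable
            break
        filled = new
    return filled
```

With `altered=False` the ratchet in step 4 lets imputed values only grow from their starting value, which is the mechanism behind the risk of bias listed above; `altered=True` corresponds to the altered index.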
2 Testing the Underhill index
Test setup
• Generate a dataset (40 sites, 24 winters, 6 months)
• Remove 25% of the data completely at random
• Impute the missing data
• Calculate the total per winter and month over all sites
• Model the totals
  • Poisson regression with overdispersion
  • Estimate per winter
• Relative bias: $\exp(\beta_{\text{Underhill}}) / \exp(\beta_{\text{complete}})$
• Relative SE: $\sigma_{\beta_{\text{Underhill}}} / \sigma_{\beta_{\text{complete}}}$
Four variants:
1 Original Underhill index, starting value = zero
2 Original Underhill index, starting value = geometric mean
3 Altered Underhill index, starting value = zero
4 Altered Underhill index, starting value = geometric mean
Altered index: always replace the imputed value with the rounded prediction, also when the prediction is smaller
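The evaluation steps can be sketched in NumPy under a few assumptions: cells are removed completely at random, the totals per winter and month are modelled as a quasi-Poisson GLM with winter and month as factors (our reading of "Poisson regression with overdispersion"), and the overdispersion is estimated with the Pearson statistic. For this main-effects model the fitted values have a closed form, so no iterative fitting is needed; the sketch breaks down if any winter or month total is zero. All function names are hypothetical.

```python
import numpy as np


def remove_mcar(shape, prop=0.25, seed=None):
    """Boolean mask with round(prop * n) cells missing completely at random."""
    rng = np.random.default_rng(seed)
    n = int(np.prod(shape))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=round(prop * n), replace=False)] = True
    return mask.reshape(shape)


def winter_effects(totals):
    """Quasi-Poisson fit of log(mu) = winter + month (both factors).

    totals: array of shape (winters, months). Returns the winter effects
    relative to the first winter and their standard errors.
    Assumption: Pearson-based dispersion, as in a quasi-Poisson GLM.
    """
    n_w, n_m = totals.shape
    y = totals.ravel().astype(float)
    # Main-effects Poisson MLE: row total * column total / grand total.
    mu = np.outer(totals.sum(axis=1), totals.sum(axis=0)).ravel() / totals.sum()
    win = np.repeat(np.arange(n_w), n_m)
    mon = np.tile(np.arange(n_m), n_w)
    # Design matrix: intercept, winter dummies, month dummies (first level = reference).
    X = np.column_stack(
        [np.ones_like(y)]
        + [(win == w).astype(float) for w in range(1, n_w)]
        + [(mon == m).astype(float) for m in range(1, n_m)]
    )
    # Overdispersion: Pearson chi-square divided by residual degrees of freedom.
    phi = np.sum((y - mu) ** 2 / mu) / (y.size - X.shape[1])
    # Covariance: phi times the inverse Fisher information X' diag(mu) X.
    cov = phi * np.linalg.inv(X.T @ (X * mu[:, None]))
    row = totals.sum(axis=1).astype(float)
    beta = np.log(row[1:] / row[0])
    se = np.sqrt(np.diag(cov)[1:n_w])
    return beta, se


def relative_bias(beta_imputed, beta_complete):
    return np.exp(beta_imputed) / np.exp(beta_complete)


def relative_se(se_imputed, se_complete):
    return se_imputed / se_complete
```

Pass `y.sum(axis=0)` for the complete counts and for an imputed copy of `y` (shape sites × winters × months) to `winter_effects`, then compare the two results with `relative_bias` and `relative_se`.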
Data generating model
Counts follow a negative binomial distribution
• Fixed size
• Variable mean (defined on the log scale)
  • Intercept
  • Linear trend and random walk along winter
  • Random intercept and random walk along winter per site
  • Sine wave within a winter with variable phase among winters
  • Gaussian noise at observation level
• Different dataset per simulation
• All datasets based on the same hyperparameters
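A minimal NumPy sketch of this data generating model. The slide lists the components but not their values, so every numeric value below (intercept, trend slope, random-walk and noise standard deviations, sine amplitude, size) is a hypothetical placeholder, and the sine period is assumed to span one winter. NumPy parameterises the negative binomial by `n` and `p`; setting `n = size` and `p = size / (size + mu)` gives mean `mu`. The function name is hypothetical.

```python
import numpy as np


def simulate_counts(n_site=40, n_winter=24, n_month=6, size=2.0, seed=None):
    """Simulate counts with shape (sites, winters, months).

    All numeric hyperparameters below are hypothetical placeholders.
    """
    rng = np.random.default_rng(seed)
    winter = np.arange(n_winter)
    month = np.arange(n_month)
    intercept = 2.0
    # Global linear trend plus random walk along winter.
    global_winter = 0.02 * winter + np.cumsum(rng.normal(0.0, 0.05, n_winter))
    # Per-site random intercept plus per-site random walk along winter.
    site_winter = rng.normal(0.0, 1.0, (n_site, 1)) + np.cumsum(
        rng.normal(0.0, 0.05, (n_site, n_winter)), axis=1
    )
    # Sine wave within a winter (assumed period: one winter); phase varies among winters.
    phase = rng.uniform(0.0, 2 * np.pi, (n_winter, 1))
    seasonal = 0.5 * np.sin(2 * np.pi * month / n_month + phase)
    # Gaussian noise at observation level.
    noise = rng.normal(0.0, 0.1, (n_site, n_winter, n_month))
    log_mean = (
        intercept
        + global_winter[None, :, None]
        + site_winter[:, :, None]
        + seasonal[None, :, :]
        + noise
    )
    mu = np.exp(log_mean)
    # Negative binomial with fixed size and mean mu.
    return rng.negative_binomial(size, size / (size + mu))
```

A different `seed` per simulation gives a different dataset drawn from the same hyperparameters.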
Example of simulated dataset
[Figure: Example of simulated dataset. Panels: true mean per site against time (winters 1 to 24); true mean per site against month for sites 12, 13, 18, 27, 30, 31, 34 and 8, by year (1 to 5).]
Evaluation of Underhill index
[Figure: Evaluation of the Underhill index. Panels: relative bias (80% to 160%) and relative SE (70% to 100%), with 100% = complete dataset, for the original and altered algorithm with starting value zero or mean.]
3 An alternative way of imputing
Requirements
• Choose a model that does not require starting values
  • We chose a negative binomial GLMM
    • Winter and month as fixed effects (factors)
    • Site as random intercept
    • Fitted in R (R Core Team 2014) with INLA (Rue et al. 2009)
• Take the uncertainty of the predictions into account
  • Sample from a negative binomial distribution
    • Size: sample from the distribution of the hyperparameter
    • Mean: sample from a Gaussian distribution with the mean and SE of the prediction on the link scale
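The sampling scheme above can be sketched as follows. It assumes the INLA fit has already produced, for each missing cell, the mean and SE of the prediction on the log (link) scale, plus a set of draws from the posterior of the size hyperparameter; extracting those from INLA is not shown. Two further simplifications are assumptions of this sketch: one size value is drawn per imputed dataset, and the link-scale means are sampled independently per cell, ignoring correlation between predictions. The function name `draw_imputations` is hypothetical.

```python
import numpy as np


def draw_imputations(eta_mean, eta_se, size_draws, n_imp, seed=None):
    """Draw n_imp sets of imputed counts for the missing cells.

    eta_mean, eta_se: mean and SE of the prediction on the link (log) scale,
        one value per missing cell.
    size_draws: draws from the posterior of the negative binomial size.
    Returns an integer array of shape (n_imp, n_missing).
    """
    rng = np.random.default_rng(seed)
    eta_mean = np.asarray(eta_mean, dtype=float)
    eta_se = np.asarray(eta_se, dtype=float)
    size_draws = np.asarray(size_draws, dtype=float)
    # Assumption: one size value per imputed dataset.
    size = rng.choice(size_draws, size=(n_imp, 1))
    # Mean: Gaussian on the link scale, sampled independently per cell (simplification).
    eta = rng.normal(eta_mean, eta_se, size=(n_imp, eta_mean.size))
    mu = np.exp(eta)
    # Negative binomial with mean mu: n = size, p = size / (size + mu).
    return rng.negative_binomial(size, size / (size + mu))
```

Row `m` fills one imputed dataset, e.g. `y_m = y.copy(); y_m[missing] = imputed[m]`, provided `eta_mean` and `eta_se` list the missing cells in the same (row-major) order as `y[missing]`.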
4 Testing multiple imputation using INLA
Test setup
• Same test datasets as for testing the Underhill index
• Fit the INLA model to the observed dataset
• Generate M sets of imputed values
• For each set m
  • Calculate the total per winter and month over all sites
  • Model the totals
  • Save the regression parameters $\beta_{im}$ and their SE $\sigma_{im}$
• Aggregate over all M sets (Rubin 1987)
$$\bar{\beta}_i = \frac{1}{M}\sum_{m=1}^{M}\beta_{im}$$
$$\bar{\sigma}_i = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\sigma_{im}^2 + \frac{M+1}{M}\,\frac{\sum_{m=1}^{M}\left(\beta_{im}-\bar{\beta}_i\right)^2}{M-1}}$$
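Rubin's rules as written above translate directly into NumPy. The function name `rubin_pool` is hypothetical; it needs at least two imputations (M ≥ 2) for the between-imputation variance.

```python
import numpy as np


def rubin_pool(beta, se):
    """Pool estimates from M imputed datasets with Rubin's rules.

    beta, se: arrays of shape (M, p) with the estimate beta_im and its
    standard error sigma_im for each imputation m and parameter i.
    Returns the pooled estimates and pooled standard errors (length p).
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    M = beta.shape[0]
    beta_bar = beta.mean(axis=0)
    within = np.mean(se**2, axis=0)  # (1/M) * sum_m sigma_im^2
    between = beta.var(axis=0, ddof=1)  # sum_m (beta_im - beta_bar)^2 / (M - 1)
    se_bar = np.sqrt(within + (M + 1) / M * between)
    return beta_bar, se_bar
```

For example, two imputations with estimates 1 and 3, each with SE 1, give a pooled estimate of 2 and a pooled SE of 2: the within-imputation variance is 1, the between-imputation variance is 2, and 1 + (3/2) × 2 = 4.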
Evaluation of multiple imputation
[Figure: Evaluation of multiple imputation. Panels: relative bias (60% to 140%) and relative SE (90% to 130%), with 100% = complete dataset, by proportion of missing data (1%, 5%, 25%, 50%, 75%).]
Effect of design
[Figure: Effect of design. Panels: relative bias (75% to 150%) and relative SE (80% to 160%), with 100% = complete dataset, by number of sites (10, 20, 40, 80) and number of winters (5, 10, 20).]
Effect of model for imputation
[Figure: Effect of imputation model. Panels: relative bias (70% to 130%) and relative SE (90% to 130%), with 100% = complete dataset, for three models for imputation: constant site, random walk per site, true mean.]
Real life examples
[Figure: Real life examples. Winter average against winter (1995 to 2015) for four species: Anser brachyrhynchus (10% missing, 18 sites, 4 months), Numenius arquata (24% missing, 208 sites, 6 months), Anas platyrhynchos (40% missing, 852 sites, 6 months) and Haematopus ostralegus (42% missing, 270 sites, 6 months).]
5 Conclusions
Underhill index
• Must use zero as starting value
  • Otherwise biased upward
• Underestimates standard errors
  • Type I error rate above the nominal level!
  • Too optimistic
Multiple imputation
• Unbiased estimates
• Increased standard errors
  • Imputation = more uncertainty
  • Implies lower power
  • Sample size of the actual dataset < sample size of the complete dataset
• The increase of the standard error depends on
  1 Proportion of missing data
  2 Size of the dataset
  3 Imputation model
• Only marginal improvement from increasing the number of imputations
Questions?
References
R Core Team. 2014. “R: A Language and Environment for Statistical Computing.” Vienna, Austria: R Foundation for Statistical Computing. http://www.r-project.org/.
Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York, NY: John Wiley & Sons, Ltd.
Rue, Håvard, Sara Martino, Finn Lindgren, Daniel Simpson, and Andrea Riebler. 2009. INLA: Functions Which Allow to Perform Full Bayesian Analysis of Latent Gaussian Models Using Integrated Nested Laplace Approximation.
Underhill, L. G., and R. P. Prys-Jones. 1994. “Index Numbers for Waterbird Populations. I. Review and Methodology.” Journal of Applied Ecology 31 (3): 463–480. doi:10.2307/2404443.