Transcript

Applying multiple imputation on waterbird census data

Comparing two imputation methods

ISEC, Montpellier, 1 July 2014

Thierry Onkelinx, Koen Devos & Paul Quataert

Overview

1 Introduction

2 Testing the Underhill index

3 An alternative way of imputing

4 Testing multiple imputation using INLA

5 Conclusions

1 Introduction

Waterbird census in Flanders (Belgium)

• Aim: monitor wintering birds

  • Total number
  • Average over months per winter

• Data collected by volunteers

  • 1200 sites
  • 23 winters
  • 6 months per winter

• Missing data: on average 26% (9% to 42%)

• Imputation required

Figure: map (latitude 35° to 60°, longitude −10° to 30°)

Underhill-index

Described by Underhill and Prys-Jones (1994)

Algorithm

1 Replace all missing values with a starting value

2 Fit model to imputed dataset

3 Predict missing data

4 Replace imputed value with rounded prediction when prediction is larger

5 Re-iterate from step 2 until imputations are stable

Model: negative binomial GLM with global effects for winter, month and site

Potential problems

• Imputed values can never decrease: risk of bias

• Imputing with model predictions: risk of reduced standard errors
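The five steps of the algorithm can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The counts form a small site × winter table without a month effect. The negative binomial GLM is swapped for a Poisson log-linear model with main effects, whose fitted values on a complete table are row total × column total / grand total. The names `fit_main_effects` and `underhill_impute` and the `altered` flag are hypothetical. With `altered=True` the imputation is always replaced by the rounded prediction, not only when the prediction is larger.

```python
def fit_main_effects(table):
    """Fitted values of a Poisson log-linear model with row and column main effects.

    Assumption: this stands in for the negative binomial GLM of the slides.
    On a complete two-way table the fit is row_total * col_total / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        return [[0.0 for _ in row] for row in table]
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def underhill_impute(counts, start=0, altered=False, max_iter=500):
    """Impute missing cells (None) in a site x winter table of counts."""
    missing = [(i, j) for i, row in enumerate(counts)
               for j, value in enumerate(row) if value is None]
    # Step 1: replace all missing values with the starting value.
    filled = [[start if value is None else value for value in row]
              for row in counts]
    for _ in range(max_iter):
        # Steps 2 and 3: fit the model and predict the missing cells.
        fitted = fit_main_effects(filled)
        changed = False
        for i, j in missing:
            prediction = round(fitted[i][j])
            # Step 4: the original rule only ever moves imputations upward.
            new_value = prediction if altered else max(filled[i][j], prediction)
            if new_value != filled[i][j]:
                filled[i][j] = new_value
                changed = True
        # Step 5: stop once the imputations are stable.
        if not changed:
            break
    return filled


table = [[10, 20], [30, None]]
print(underhill_impute(table, start=0)[1][1])
print(underhill_impute(table, start=100)[1][1])
print(underhill_impute(table, start=100, altered=True)[1][1])
```

With a starting value above the model prediction, the original rule leaves the imputation at the starting value because replacements only go upward. This is the mechanism behind the first potential problem listed above.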


2 Testing the Underhill index

Test setup

• Generate dataset (40 sites, 24 winters, 6 months)
• Remove 25% of the data completely at random
• Impute missing data
• Calculate total per winter and month over all sites
• Model totals

  • Poisson regression with overdispersion
  • Estimate per winter

    • Relative bias: $\exp(\beta_{\text{Underhill}}) / \exp(\beta_{\text{complete}})$
    • Relative SE: $\sigma_{\beta_{\text{Underhill}}} / \sigma_{\beta_{\text{complete}}}$

1 Original Underhill-index, starting value = zero

2 Original Underhill-index, starting value = geometric mean

3 Altered Underhill-index, starting value = zero

4 Altered Underhill-index, starting value = geometric mean

Altered index: always replace the imputation with the rounded prediction, also when the prediction is smaller
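The bookkeeping of this test setup can be sketched as follows. The sketch covers only the removal of data completely at random, the totals per winter and month, and the two evaluation ratios. It leaves out the imputation and the overdispersed Poisson regression. All function names and the dictionary layout `{(site, winter, month): count}` are assumptions made for illustration.

```python
import math
import random


def remove_completely_at_random(counts, proportion, seed=0):
    """Return a copy of counts with the given proportion of values set to None."""
    rng = random.Random(seed)
    keys = sorted(counts)
    n_missing = round(proportion * len(keys))
    missing = set(rng.sample(keys, n_missing))
    return {key: (None if key in missing else value)
            for key, value in counts.items()}


def totals_per_winter_month(counts):
    """Sum the (complete or imputed) counts over all sites."""
    totals = {}
    for (site, winter, month), value in counts.items():
        totals[(winter, month)] = totals.get((winter, month), 0) + value
    return totals


def relative_bias(beta_imputed, beta_complete):
    """Ratio of back-transformed estimates; 1.0 means no bias."""
    return math.exp(beta_imputed) / math.exp(beta_complete)


def relative_se(se_imputed, se_complete):
    """Ratio of standard errors; below 1.0 means the SE is underestimated."""
    return se_imputed / se_complete


# Toy dataset with the dimensions used in the test: 40 sites, 24 winters, 6 months.
complete = {(s, w, m): 1 for s in range(40) for w in range(24) for m in range(6)}
observed = remove_completely_at_random(complete, 0.25)
print(sum(value is None for value in observed.values()))
print(totals_per_winter_month(complete)[(0, 0)])
```

Both ratios use the estimate from the complete dataset as the reference, so a value of 100% in the evaluation figures corresponds to 1.0 here.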


Data generating model

Counts follow negative binomial distribution

• Fixed size
• Variable mean (defined on log-scale)

  • Intercept
  • Linear trend and random walk along winter
  • Random intercept and random walk along winter per site
  • Sine wave within a winter with variable phase among winters
  • Gaussian noise at observation level

• Different dataset per simulation
• All datasets based on the same hyperparameters
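A simulation along these lines might look like the sketch below. The slides do not report the hyperparameter values, so every default (intercept, trend, standard deviations, amplitude, size) is an assumed placeholder. The choice of one full sine cycle per winter is also an assumption. The negative binomial draw is written as a gamma-Poisson mixture so that only the standard library is needed. The function names are hypothetical.

```python
import math
import random


def rpois(rng, lam):
    """Poisson draw: count unit-rate exponential arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def rnbinom(rng, mu, size):
    """Negative binomial draw (mean mu, fixed size) as a gamma-Poisson mixture."""
    return rpois(rng, rng.gammavariate(size, mu / size))


def simulate(n_sites=40, n_winters=24, n_months=6, seed=1,
             intercept=2.0, trend=0.02, sd_rw_winter=0.1,
             sd_site=0.5, sd_rw_site=0.1, amplitude=0.5,
             sd_phase=0.5, sd_noise=0.1, size=5.0):
    """Return {(site, winter, month): count}. All default values are assumed."""
    rng = random.Random(seed)
    # Global winter effect: linear trend plus a random walk along winter.
    winter_effect, walk = [], 0.0
    for w in range(n_winters):
        walk += rng.gauss(0.0, sd_rw_winter)
        winter_effect.append(trend * w + walk)
    # The phase of the within-winter sine wave varies among winters.
    phase = [rng.gauss(0.0, sd_phase) for _ in range(n_winters)]
    counts = {}
    for s in range(n_sites):
        # Per site: random intercept plus a random walk along winter.
        site_intercept = rng.gauss(0.0, sd_site)
        site_walk = 0.0
        for w in range(n_winters):
            site_walk += rng.gauss(0.0, sd_rw_site)
            for m in range(n_months):
                season = amplitude * math.sin(
                    2 * math.pi * m / n_months + phase[w])
                # Linear predictor on the log scale, with observation-level noise.
                eta = (intercept + winter_effect[w] + site_intercept
                       + site_walk + season + rng.gauss(0.0, sd_noise))
                counts[(s, w, m)] = rnbinom(rng, math.exp(eta), size)
    return counts


data = simulate()
print(len(data))
```

Changing `seed` gives a different dataset from the same hyperparameters, which matches the last two bullets above.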

Example of simulated dataset

Figure: Example of simulated dataset. One panel shows the true mean per site against time (winters 1 to 24). The other panels show the true mean per site against month (1 to 6), by year (1 to 5), for sites 8, 12, 13, 18, 27, 30, 31 and 34.

Evaluation of Underhill index

Figure: Evaluation of the Underhill index. Two panels, relative bias and relative SE (100% = complete dataset), per algorithm (original, altered) and starting value (mean, zero).

3 An alternative way of imputing

Requirements

• Choose a model that doesn’t require starting values

  • We choose a negative binomial GLMM
  • Winter and month as fixed effects (factors)
  • Site as random intercept
  • Fitted in R (R Core Team 2014) with INLA (Rue et al. 2009)

• Take the uncertainty of predictions into account

  • Sample from negative binomial distribution

    • Size: sample from distribution of hyperparameter
    • Mean: sample from Gaussian distribution, using mean and SE of prediction on the link scale
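The sampling scheme in the last bullets can be sketched in Python. The sketch starts after the model has been fitted, so the inputs are assumed to be available already: a mean and SE on the link (log) scale for each missing observation, and a list of draws from the distribution of the size hyperparameter. The function names are hypothetical. The sketch draws the size once per imputation set and treats the observations as independent, which are simplifying assumptions not stated on the slides.

```python
import math
import random


def rpois(rng, lam):
    """Poisson draw: count unit-rate exponential arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def draw_imputation(rng, eta_mean, eta_se, size):
    """One imputed count for one missing observation."""
    # Mean: sample the linear predictor from a Gaussian on the link scale.
    mu = math.exp(rng.gauss(eta_mean, eta_se))
    # Count: negative binomial draw, written as a gamma-Poisson mixture.
    return rpois(rng, rng.gammavariate(size, mu / size))


def impute_sets(predictions, size_draws, n_sets, seed=0):
    """predictions: {key: (eta_mean, eta_se)} for the missing observations."""
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        # Size: sample from the distribution of the hyperparameter.
        size = rng.choice(size_draws)
        sets.append({key: draw_imputation(rng, mean, se, size)
                     for key, (mean, se) in predictions.items()})
    return sets


predictions = {("site 1", 3, 2): (3.0, 0.1), ("site 2", 3, 2): (1.5, 0.3)}
for imputed in impute_sets(predictions, [4.0, 5.0, 6.0], n_sets=3):
    print(imputed)
```

Because each set is an independent draw, the spread between sets carries the prediction uncertainty that a single rounded prediction would hide.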


4 Testing multiple imputation using INLA

Test setup

• Same test datasets as for testing Underhill-index
• Fit INLA model to observed dataset
• Generate M sets of imputed values
• For each set m

  • Calculate total per winter and month over all sites
  • Model totals
  • Save the regression parameters $\beta_{im}$ and their SE $\sigma_{im}$

• Aggregate over all M sets (Rubin 1987)

\[
\bar{\beta}_i = \frac{1}{M} \sum_{m=1}^{M} \beta_{im}
\]

\[
\bar{\sigma}_i = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \sigma_{im}^2 + \frac{M + 1}{M} \, \frac{\sum_{m=1}^{M} (\beta_{im} - \bar{\beta}_i)^2}{M - 1}}
\]
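The aggregation step follows Rubin's rules and is short enough to write out. The sketch below pools the estimates and standard errors of a single regression parameter over the M imputation sets. The function name `pool_rubin` is hypothetical.

```python
import math


def pool_rubin(estimates, standard_errors):
    """Pool one parameter over M imputation sets (Rubin 1987).

    Returns the pooled estimate and the pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two sets and matching lengths")
    pooled = sum(estimates) / m
    # Within-imputation variance: average of the squared standard errors.
    within = sum(se ** 2 for se in standard_errors) / m
    # Between-imputation variance: sample variance of the estimates.
    between = sum((b - pooled) ** 2 for b in estimates) / (m - 1)
    total = within + (m + 1) / m * between
    return pooled, math.sqrt(total)


estimate, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, se)
```

The pooled standard error is never smaller than the root mean square of the individual standard errors, because the between-imputation term only adds variance.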


Evaluation of multiple imputation

Figure: Evaluation of multiple imputation. Two panels, relative bias and relative SE (100% = complete dataset), per proportion of missing data (1%, 5%, 25%, 50%, 75%).

Effect of design

Figure: Effect of design. Two panels, relative bias and relative SE (100% = complete dataset), per number of sites (10, 20, 40, 80) and number of winters (5, 10, 20).

Effect of model for imputation

Figure: Effect of imputation model. Two panels, relative bias and relative SE (100% = complete dataset), per model for imputation (constant site, random walk per site, true mean).

Real life examples

Figure: Real life examples. Winter average against winter (about 1995 to 2015), one panel per species:

• Anser brachyrhynchus: 10% missing, 18 sites, 4 months
• Numenius arquata: 24% missing, 208 sites, 6 months
• Anas platyrhynchos: 40% missing, 852 sites, 6 months
• Haematopus ostralegus: 42% missing, 270 sites, 6 months

5 Conclusions

Underhill-index

• Must use zero as starting value

  • Otherwise biased upward

• Underestimates standard errors

  • Incorrect Type I errors!
  • Too optimistic

Multiple imputation

• Unbiased estimates

• Increased standard errors

  • Imputation = more uncertainty
  • Implies lower power
  • Sample size of actual dataset < sample size of complete dataset

• Increase of standard error depends on

  1 Proportion of missing data
  2 Size of dataset
  3 Imputation model

• Marginal improvement with increased number of imputations

Questions?

References

R Core Team. 2014. “R: A Language and Environment for Statistical Computing.” Vienna, Austria: R Foundation for Statistical Computing. http://www.r-project.org/.

Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York, NY: John Wiley & Sons, Ltd.

Rue, Håvard, Sara Martino, Finn Lindgren, Daniel Simpson, and Andrea Riebler. 2009. INLA: Functions Which Allow to Perform Full Bayesian Analysis of Latent Gaussian Models Using Integrated Nested Laplace Approximation.

Underhill, L. G., and R. P. Prys-Jones. 1994. “Index Numbers for Waterbird Populations. I. Review and Methodology.” Journal of Applied Ecology 31 (3): 463–480. doi:10.2307/2404443.
