Estimating the correlation coefficient with censored data


2013 ASA-CSP

Yanming Li (1), Brenda W. Gillespie (1), Kerby Shedden (1), John A. Gillespie (2)

1. University of Michigan
2. University of Michigan Dearborn

Motivation

• A study of Belgian barn owls* investigated how concentrations of perfluoroalkyl substances (PFASs), including perfluorooctane sulfonate (PFOS), in tail feathers and in soft tissues are correlated.

• Statistical methods for censored data, with user-friendly interfaces, are needed to handle concentrations below the limit of detection (LOD).

Examples of applying our method and R package to the Belgian barn owl data:

• Left panel: A scatter plot showing fully observed data, censored (left and interval censored) data, and partially missing data.

• Right panel: Profile likelihood function for the correlation coefficient, showing the point estimate and the 95% confidence interval.

* Perfluoroalkyl substances in soft tissues and tail feathers of Belgian barn owls using statistical methods for left-censored data to handle non-detects. Veerle J. et al., Environment International 53 (2013) 9-16.

Outline

• Estimating the correlation coefficient for bivariate Gaussian data with censored and/or missing values.

• Using parametric likelihood-based inference.

• Presenting an R package capable of handling different types of censoring (left, right, interval and mixtures of those types).

• Presenting ways of making scatterplots with censored bivariate data and graphing the profile likelihood function.


Censored Data: Likelihood, Maximum-likelihood Estimation and Confidence Interval Estimation

• Likelihood (for left-censored data only; o = observed, c = censored)*: the product of four factors, one per observation type:
    • complete data (x and y both observed)
    • x censored, y observed
    • x observed, y censored
    • both x and y censored

• Construct the confidence interval via likelihood ratio tests: a confidence interval with a given coverage probability is the set of correlation values that the likelihood ratio test does not reject, comparing the log profile likelihood (fixed at marginal maxima) against the chi-square critical value.

* Assumes missing completely at random and random censoring.
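The slide's formulas did not survive extraction. The block below is a sketch of the standard form such a likelihood and interval take, not a transcription of the slide. The notation is assumed: f is the bivariate normal density with parameters \theta = (\mu_x, \mu_y, \sigma_x, \sigma_y, \rho); d_{x,i} and d_{y,i} are detection limits; oo, co, oc and cc index the four observation types; \ell_p is the log profile likelihood; and \chi^2_{1,cp} is the cp quantile of the chi-square distribution with one degree of freedom. How the nuisance parameters are handled inside \ell_p is also an assumption.

```latex
\[
L(\theta) =
\prod_{i \in oo} f(x_i, y_i; \theta)
\prod_{i \in co} \int_{-\infty}^{d_{x,i}} f(u, y_i; \theta)\,du
\prod_{i \in oc} \int_{-\infty}^{d_{y,i}} f(x_i, v; \theta)\,dv
\prod_{i \in cc} \int_{-\infty}^{d_{x,i}} \int_{-\infty}^{d_{y,i}} f(u, v; \theta)\,dv\,du
\]

\[
\mathrm{CI}_{cp} =
\left\{ \rho_0 : 2\left[\ell_p(\hat{\rho}) - \ell_p(\rho_0)\right] \le \chi^2_{1,\,cp} \right\}
\]
```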

The R package: Clikcorr

Censored data likelihood based correlation inference

• Input data format

One variable        Lower bound   Upper bound
Observed            10.9          10.9
Left censored       NA            3.6
Right censored      16.7          NA
Interval censored   7.8           13.4
Missing             NA            NA
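As an illustration of the input format, the five rows above can be written as an R data frame with one lower-bound and one upper-bound column. This is a minimal sketch: the column names L1 and U1 follow the call shown on this slide, and `censor_type` is a hypothetical helper, not a function of the package.

```r
# One row per observation of a single variable, coded as (lower, upper) bounds.
dat <- data.frame(
  L1 = c(10.9, NA, 16.7, 7.8, NA),
  U1 = c(10.9, 3.6, NA, 13.4, NA)
)

# Hypothetical helper (not part of Clikcorr): recover the observation type
# from the pattern of NA values in the two bound columns.
censor_type <- function(lower, upper) {
  ifelse(is.na(lower) & is.na(upper), "missing",
  ifelse(is.na(lower), "left censored",
  ifelse(is.na(upper), "right censored",
  ifelse(lower == upper, "observed", "interval censored"))))
}

dat$type <- censor_type(dat$L1, dat$U1)
print(dat)
```

A second variable would add two more columns (for example L2 and U2) coded the same way.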

• Syntax to run the main estimating function

Clikcorr(Data, "L1", "U1", "L2", "U2", cp=.95)

    L1, U1: Lower and upper bounds for the 1st variable
    L2, U2: Lower and upper bounds for the 2nd variable
    cp: Coverage probability of the confidence interval

• Output
    • Maximum likelihood estimate of the correlation coefficient
    • Estimated bivariate variance-covariance matrix
    • Estimated means
    • p-value for the likelihood ratio test with null hypothesis r=0
    • Lower bound of the CI
    • Upper bound of the CI
    • Log likelihood value at the MLE
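To make the estimation step concrete, here is a minimal base-R sketch of maximum likelihood for left-censored bivariate normal data. It is not the package's implementation: `negloglik`, the simulated data, the single detection limit `lod`, the parameter transformations and the choice of `optim` settings are all assumptions made for illustration.

```r
# Hypothetical helper (not part of Clikcorr): negative log-likelihood when each
# coordinate is either observed (lower == upper) or left censored
# (lower is NA, upper holds the detection limit).
negloglik <- function(par, L1, U1, L2, U2) {
  mu1 <- par[1]; mu2 <- par[2]
  s1 <- exp(par[3]); s2 <- exp(par[4])   # log scale keeps the SDs positive
  rho <- tanh(par[5])                    # tanh keeps rho inside (-1, 1)
  k <- sqrt(1 - rho^2)
  cx <- is.na(L1); cy <- is.na(L2)       # TRUE where left censored
  zx <- (U1 - mu1) / s1                  # standardized value or limit
  zy <- (U2 - mu2) / s2
  ll <- numeric(length(U1))
  i <- !cx & !cy                         # both observed: joint density
  ll[i] <- dnorm(zx[i], log = TRUE) - log(s1) +
    dnorm((zy[i] - rho * zx[i]) / k, log = TRUE) - log(s2 * k)
  i <- cx & !cy                          # x censored: f(y) * P(X < limit | y)
  ll[i] <- dnorm(zy[i], log = TRUE) - log(s2) +
    pnorm((zx[i] - rho * zy[i]) / k, log.p = TRUE)
  i <- !cx & cy                          # y censored: f(x) * P(Y < limit | x)
  ll[i] <- dnorm(zx[i], log = TRUE) - log(s1) +
    pnorm((zy[i] - rho * zx[i]) / k, log.p = TRUE)
  for (j in which(cx & cy)) {            # both censored: bivariate normal CDF
    p <- integrate(function(v) dnorm(v) * pnorm((zx[j] - rho * v) / k),
                   -Inf, zy[j])$value
    ll[j] <- log(p)
  }
  -sum(ll)
}

# Simulated example: true correlation 0.7, both variables censored below lod.
set.seed(1)
n <- 200; rho <- 0.7; lod <- -0.5
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
dat <- data.frame(L1 = ifelse(x < lod, NA, x), U1 = pmax(x, lod),
                  L2 = ifelse(y < lod, NA, y), U2 = pmax(y, lod))

# Start from naive estimates that substitute the limit for censored values.
start <- c(mean(dat$U1), mean(dat$U2), log(sd(dat$U1)), log(sd(dat$U2)),
           atanh(cor(dat$U1, dat$U2)))
fit <- optim(start, negloglik, L1 = dat$L1, U1 = dat$U1,
             L2 = dat$L2, U2 = dat$U2, control = list(maxit = 2000))
rho_hat <- tanh(fit$par[5])
print(rho_hat)
```

The both-censored factor needs numerical integration, which is the expensive part of each likelihood evaluation in this sketch.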

The R package: Graphics

Clikcorr.profilePlot(Data, "L1", "U1", "L2", "U2", cp=0.95)

Clikcorr.scatterPlot(Data, c("L1","L2","L3"), c("U1","U2","U3"))


Results From Simulated Data

Table 1: 95% confidence interval coverage probabilities for different censoring proportions

        0% censored*               25% censored               75% censored
        r=0.00   r=0.50   r=0.95   r=0.00   r=0.50   r=0.95   r=0.00   r=0.50   r=0.95
n=50    0.938    0.962    0.946    0.938    0.946    0.968    0.958    0.954    0.956
        (0.956)  (0.968)  (0.954)  --       --       --       --       --       --
n=200   0.944    0.948    0.942    0.954    0.960    0.948    0.972    0.964    0.960
        (0.948)  (0.954)  (0.950)  --       --       --       --       --       --
n=500   0.932    0.934    0.952    0.946    0.948    0.964    0.952    0.944    0.964
        (0.938)  (0.948)  (0.954)  --       --       --       --       --       --

Coverage probabilities are estimated from 500 replications.
* Coverage probabilities in parentheses are calculated from the Fisher transformation in the case of no censoring.
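For reference, the Fisher transformation interval used as the uncensored benchmark can be sketched in a few lines of R. `fisher_ci` is a hypothetical helper name and the simulated data are illustrative only. The interval is the textbook one: atanh(r) is treated as approximately normal with standard error 1/sqrt(n - 3).

```r
# Hypothetical helper: Fisher z confidence interval for a Pearson correlation r
# computed from n complete (uncensored) pairs.
fisher_ci <- function(r, n, cp = 0.95) {
  z <- atanh(r)                      # Fisher z transform of r
  se <- 1 / sqrt(n - 3)              # approximate standard error of z
  q <- qnorm(1 - (1 - cp) / 2)       # two-sided normal critical value
  tanh(z + c(-1, 1) * q * se)        # back-transform to the correlation scale
}

set.seed(2013)
n <- 50; rho <- 0.5
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
r <- cor(x, y)
ci <- fisher_ci(r, n)
print(c(estimate = r, lower = ci[1], upper = ci[2]))
```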

Table 2: Run time (seconds) for different settings of r, n and censoring percentages

Censoring %:
        X 0%; XY 30%; Y 0%         X 15%; XY 15%; Y 15%       X 30%; XY 0%; Y 30%
        r=0.00   r=0.50   r=0.95   r=0.00   r=0.50   r=0.95   r=0.00   r=0.50   r=0.95
n=50    36.15    50.46    58.65    22.02    25.39    31.98    1.16     1.41     1.78
n=200   166.26   206.09   233.09   88.98    102.72   110.46   1.51     2.10     2.19
n=500   267.11   514.87   727.94   175.78   275.37   374.27   1.78     2.38     3.50

Results From Simulated Data

Table 3: Bias (MSE) for normally distributed detection limits, where data are simulated from an independent N(0,1) distribution

        N(-2,1); avg. 25% censored N(0,1); avg. 50% censored  N(2,1); avg. 75% censored
        r=0.00   r=0.50   r=0.95   r=0.00   r=0.50   r=0.95   r=0.00   r=0.50   r=0.95
n=50    0.006    -0.021   -0.018   0.031    0.007    0.008    --       --       --
        (0.019)  (0.030)  (0.053)  (0.035)  (0.064)  (0.076)  (--)     (--)     (--)
n=200   -0.005   -0.005   -0.026   0.020    0.003    -0.037   -0.017   0.053    0.021
        (0.008)  (0.012)  (0.014)  (0.012)  (0.010)  (0.014)  (0.024)  (0.034)  (0.029)
n=500   0.005    -0.008   -0.002   0.008    -0.001   -0.010   -0.006   -0.018   -0.010
        (0.002)  (0.003)  (0.005)  (0.003)  (0.006)  (0.007)  (0.010)  (0.009)  (0.013)

Bias and MSE are estimated from 50 replications.

Sensitivity to Misspecification

Table 4: 95% confidence interval coverage probabilities of bivariate normal estimates for bivariate t generated data

        n=50                       n=200                      n=500
        r=0.00   r=0.50   r=0.95   r=0.00   r=0.50   r=0.95   r=0.00   r=0.50   r=0.95
df=3    0.954    0.844    0.694    0.962    0.802    0.578    0.956    0.744    0.542
df=5    0.946    0.914    0.828    0.936    0.912    0.818    0.958    0.884    0.792
df=10   0.944    0.930    0.882    0.928    0.956    0.892    0.946    0.942    0.920
df=20   0.948    0.916    0.930    0.944    0.944    0.922    0.964    0.938    0.944

• Coverage probabilities are estimated from 500 replications.
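The slides do not say how the bivariate t data were generated. One common construction, sketched here as an assumption, divides correlated standard normal pairs by a shared chi-square scale factor. `rbivt` is a hypothetical helper name.

```r
# Hypothetical helper: draw n pairs from a bivariate t distribution with df
# degrees of freedom, zero location, unit scale and correlation parameter rho.
rbivt <- function(n, rho, df) {
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)  # correlated standard normals
  w <- sqrt(rchisq(n, df) / df)                # one scale draw shared per pair
  cbind(x = z1 / w, y = z2 / w)
}

set.seed(9)
xy <- rbivt(500, rho = 0.5, df = 3)
print(head(xy))
```

Smaller df gives heavier tails, which is the direction in which Table 4 shows coverage falling below the nominal 95% for nonzero r.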


CSCAR at the University of Michigan

The Center for Statistical Consultation and Research (CSCAR) provides support and training to University of Michigan researchers in a variety of areas relating to the management, collection, and analysis of data. CSCAR also supports the use of technical software and advanced computing in research.

Find us at: http://www.cscar.research.umich.edu/about/

• Yanming Li, Graduate Student Research Assistant. [email protected]
• Kerby Shedden, CSCAR Director. [email protected]
• Brenda W. Gillespie, CSCAR Associate Director. [email protected]
• John A. Gillespie, Professor of Mathematics and Statistics, University of Michigan Dearborn. [email protected]