Estimating the correlation coefficient with censored data
Yanming Li (1), Brenda W. Gillespie (1), Kerby Shedden (1), John A. Gillespie (2)
1. University of Michigan 2. University of Michigan Dearborn
2013 ASA-CSP
Motivation
• A study of Belgian barn owls* investigated how chemical concentrations of perfluoroalkyl substances (PFASs) and perfluorooctane sulfonate (PFOS) in tail feathers and soft tissues are correlated.
• Statistical methods for censored data with user-friendly interfaces are needed to cope with levels below the limit of detection (LOD).
Examples of using our novel method and R package to analyze the Belgian barn owl data:
• Left panel: A scatter plot showing fully observed data, censored (left and interval censored) data, and partially missing data.
• Right panel: profile likelihood function for the correlation coefficient, showing the point estimate and the 95% confidence interval.
* Perfluoroalkyl substances in soft tissues and tail feathers of Belgian barn owls using statistical methods for left-censored data to handle non-detects. Veerle J. et al., Environment International 53 (2013) 9-16.
Outline
• Estimating the correlation coefficient for bivariate Gaussian data with censored and/or missing values.
• Using parametric likelihood-based inference.
• Presenting an R package capable of handling different types of censoring (left, right, interval and mixtures of those types).
• Presenting ways of making scatterplots with censored bivariate data and graphing the profile likelihood function.
Censored Data: Their Likelihood, Maximum-likelihood Estimation and Confidence Interval Estimation
• Likelihood (for left-censored data only; o = observed, c = censored)*: a product of four kinds of terms: complete data; x censored, y observed; x observed, y censored; both x and y censored.
• Construct the confidence interval via likelihood ratio tests: a confidence interval with a given coverage probability is the set of correlation values not rejected by the likelihood ratio test, which compares the drop in the log profile likelihood (fixed at marginal maxima) with the chi-square critical value.
* Assumes missing completely at random and random censoring.
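The formulas on this slide were not preserved in the text. As a hedged reconstruction, a standard form that matches the four labelled terms and the likelihood ratio interval is sketched below. The notation is an assumption, not the authors' own: f is the bivariate normal density with parameters θ = (μx, μy, σx, σy, ρ), d_{x,i} and d_{y,i} are the detection limits for observation i, and ℓ_p is the log profile likelihood for ρ.

```latex
% Index sets: oo = both observed, co = x censored, oc = y censored, cc = both censored
L(\theta) =
  \prod_{i \in oo} f(x_i, y_i; \theta)
  \prod_{i \in co} \int_{-\infty}^{d_{x,i}} f(u, y_i; \theta)\, du
  \prod_{i \in oc} \int_{-\infty}^{d_{y,i}} f(x_i, v; \theta)\, dv
  \prod_{i \in cc} \int_{-\infty}^{d_{x,i}} \int_{-\infty}^{d_{y,i}} f(u, v; \theta)\, dv\, du

% Likelihood ratio confidence set with coverage probability 1 - \alpha
\left\{ \rho : 2\left[ \ell_p(\hat\rho) - \ell_p(\rho) \right] \le \chi^2_{1,\,1-\alpha} \right\}
```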
The R package: Clikcorr
Censored data likelihood based correlation inference
• Input data format

One variable        Lower bounds   Upper bounds
Observed            10.9           10.9
Left censored       NA             3.6
Right censored      16.7           NA
Interval censored   7.8            13.4
Missing             NA             NA
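A minimal sketch of this bound encoding in base R, using the five example rows from the table. The helper name `censor_type` is hypothetical and not part of the package; the column names `L1` and `U1` are borrowed from the package syntax for illustration.

```r
# Hypothetical helper (not part of the package): label each lower/upper
# bound pair with its censoring type, following the input format table.
censor_type <- function(lower, upper) {
  ifelse(is.na(lower) & is.na(upper), "missing",
  ifelse(is.na(lower), "left",
  ifelse(is.na(upper), "right",
  ifelse(lower == upper, "observed", "interval"))))
}

# The five example rows, stored as one variable's lower and upper bounds.
example <- data.frame(
  L1 = c(10.9, NA, 16.7, 7.8, NA),
  U1 = c(10.9, 3.6, NA, 13.4, NA)
)
example$type <- censor_type(example$L1, example$U1)
print(example)
```

An observed value repeats in both columns, and an `NA` bound marks the open side of a censored value, so one pair of columns can carry every censoring type.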
• Syntax to run the main estimating function
Clikcorr(Data, "L1", "U1", "L2", "U2", cp=.95)
L1, U1: Lower and upper bounds for the 1st variable
L2, U2: Lower and upper bounds for the 2nd variable
cp: Coverage probability of the confidence interval
• Output
  • Maximum likelihood estimate of the correlation coefficient
  • Estimated bivariate variance-covariance matrix
  • Estimated means
  • p-value for the likelihood ratio test with null hypothesis r=0
  • Lower bound of the CI
  • Upper bound of the CI
  • Log likelihood value at the MLE
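To show how the estimate, the likelihood ratio p-value and the CI bounds relate, here is a rough sketch for the special case of complete (uncensored) data, where the log profile likelihood of the correlation has a closed form. This is not the package's code. The function names `profile_loglik` and `lr_inference` are hypothetical, and with censoring the profile likelihood would have to be computed numerically.

```r
# Log profile likelihood of rho for complete bivariate normal data, up to an
# additive constant: means and variances are maximized out analytically.
# r is the sample correlation, n the sample size.
profile_loglik <- function(rho, r, n) {
  (n / 2) * log(1 - rho^2) - n * log(1 - rho * r)
}

# Likelihood ratio inference for rho (complete-data special case only).
lr_inference <- function(r, n, cp = 0.95) {
  crit <- qchisq(cp, df = 1)
  # Twice the drop in log profile likelihood from its maximum (at rho = r).
  drop2 <- function(rho) {
    2 * (profile_loglik(r, r, n) - profile_loglik(rho, r, n))
  }
  eps <- 1e-8
  # CI bounds: where the drop equals the chi-square critical value.
  lower <- uniroot(function(rho) drop2(rho) - crit,
                   c(-1 + eps, r), tol = 1e-8)$root
  upper <- uniroot(function(rho) drop2(rho) - crit,
                   c(r, 1 - eps), tol = 1e-8)$root
  list(estimate = r,
       p.value = pchisq(drop2(0), df = 1, lower.tail = FALSE),
       lower = lower, upper = upper)
}

res <- lr_inference(r = 0.5, n = 50)
print(res)
```

The interval is generally not symmetric around the estimate, because the profile likelihood is skewed when the correlation is away from zero.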
The R package: Graphics
Clikcorr.profilePlot(Data, "L1", "U1", "L2", "U2", cp=0.95)
Clikcorr.scatterPlot(Data, c("L1","L2","L3"), c("U1","U2","U3"))
Results From Simulated Data
         0% censored*              25% censored              75% censored
         r=0.00  r=0.50  r=0.95    r=0.00  r=0.50  r=0.95    r=0.00  r=0.50  r=0.95
n=50     0.938   0.962   0.946     0.938   0.946   0.968     0.958   0.954   0.956
         (0.956) (0.968) (0.954)   --      --      --        --      --      --
n=200    0.944   0.948   0.942     0.954   0.960   0.948     0.972   0.964   0.960
         (0.948) (0.954) (0.950)   --      --      --        --      --      --
n=500    0.932   0.934   0.952     0.946   0.948   0.964     0.952   0.944   0.964
         (0.938) (0.948) (0.954)   --      --      --        --      --      --

Table 1: 95% confidence interval coverage probabilities for different censoring proportions
Coverage probabilities are estimated from 500 replications.
* Coverage probabilities in parentheses are calculated from the Fisher transformation in the case of no censoring.
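For reference, the Fisher transformation interval used for the parenthesized entries can be sketched as follows, together with the Monte Carlo standard error of a coverage estimate. The function names `fisher_ci` and `coverage_se` are illustrative assumptions; the interval is the textbook form with standard error 1/sqrt(n - 3), which may differ in detail from what the authors ran.

```r
# Fisher z confidence interval for a correlation from complete data.
fisher_ci <- function(r, n, cp = 0.95) {
  z <- atanh(r)                                  # Fisher transformation
  half <- qnorm(1 - (1 - cp) / 2) / sqrt(n - 3)  # half-width on the z scale
  tanh(c(lower = z - half, upper = z + half))    # back to the r scale
}

# Monte Carlo standard error of an estimated coverage probability p
# based on reps independent replications.
coverage_se <- function(p, reps) sqrt(p * (1 - p) / reps)

print(fisher_ci(r = 0.5, n = 50))
print(coverage_se(p = 0.95, reps = 500))
```

With 500 replications and true coverage 0.95, the standard error is roughly 0.01, so differences of one or two hundredths between table entries are within simulation noise.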
Censoring %   X 0%; XY 30%; Y 0%        X 15%; XY 15%; Y 15%      X 30%; XY 0%; Y 30%
              r=0.00  r=0.50  r=0.95    r=0.00  r=0.50  r=0.95    r=0.00  r=0.50  r=0.95
n=50          36.15   50.46   58.65     22.02   25.39   31.98     1.16    1.41    1.78
n=200         166.26  206.09  233.09    88.98   102.72  110.46    1.51    2.10    2.19
n=500         267.11  514.87  727.94    175.78  275.37  374.27    1.78    2.38    3.50

Table 2: Run time (seconds) for different settings of r, n and censoring percentages
Results From Simulated Data
Table 3: Bias (MSE) for normally distributed detection limits, where data are simulated from an independent N(0,1) distribution
         N(-2,1); avg. 25% censored   N(0,1); avg. 50% censored    N(2,1); avg. 75% censored
         r=0.00  r=0.50  r=0.95       r=0.00  r=0.50  r=0.95       r=0.00  r=0.50  r=0.95
n=50     0.006   -0.021  -0.018       0.031   0.007   0.008        --      --      --
         (0.019) (0.030) (0.053)      (0.035) (0.064) (0.076)      (--)    (--)    (--)
n=200    -0.005  -0.005  -0.026       0.020   0.003   -0.037       -0.017  0.053   0.021
         (0.008) (0.012) (0.014)      (0.012) (0.010) (0.014)      (0.024) (0.034) (0.029)
n=500    0.005   -0.008  -0.002       0.008   -0.001  -0.010       -0.006  -0.018  -0.010
         (0.002) (0.003) (0.005)      (0.003) (0.006) (0.007)      (0.010) (0.009) (0.013)
Bias and MSE are estimated from 50 replications.
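As a small illustration of how each bias and MSE cell is computed from replicate estimates, the sketch below uses a hypothetical helper `bias_mse`. The toy estimates are made up for the example and are not results from the study.

```r
# Bias and mean squared error of replicate estimates around the true value.
bias_mse <- function(estimates, truth) {
  estimates <- estimates[!is.na(estimates)]   # drop failed replications
  c(bias = mean(estimates) - truth,
    mse  = mean((estimates - truth)^2))
}

# Toy replicate estimates of a correlation whose true value is 0.5
# (made up for illustration only).
est <- c(0.48, 0.55, 0.53, 0.46)
out <- bias_mse(est, truth = 0.5)
print(out)
```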
Sensitivity to Misspecification
         n=50                      n=200                     n=500
         r=0.00  r=0.50  r=0.95    r=0.00  r=0.50  r=0.95    r=0.00  r=0.50  r=0.95
df=3     0.954   0.844   0.694     0.962   0.802   0.578     0.956   0.744   0.542
df=5     0.946   0.914   0.828     0.936   0.912   0.818     0.958   0.884   0.792
df=10    0.944   0.930   0.882     0.928   0.956   0.892     0.946   0.942   0.920
df=20    0.948   0.916   0.930     0.944   0.944   0.922     0.964   0.938   0.944
Table 4: 95% confidence interval coverage probabilities of bivariate normal estimates for bivariate t generated data
• Coverage probabilities are estimated from 500 replications.
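One common way to generate bivariate t data for such a misspecification check is to divide a correlated normal pair by a shared chi-square scale. The sketch below assumes that construction; the slides do not say how the data were generated, and the function name `rbivt` is hypothetical.

```r
# Draw n pairs from a bivariate t distribution with df degrees of freedom:
# a correlated standard normal pair divided by sqrt(chi-square / df).
rbivt <- function(n, rho, df) {
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # corr(z1, z2) = rho
  w <- sqrt(rchisq(n, df) / df)                 # one shared scale per pair
  cbind(x = z1 / w, y = z2 / w)
}

set.seed(1)
xy <- rbivt(n = 5000, rho = 0.5, df = 10)
print(cor(xy[, "x"], xy[, "y"]))
```

Smaller df gives heavier tails, which is the setting in which the table shows the normal-based intervals losing coverage when the true correlation is nonzero.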
CSCAR at the University of Michigan
The Center for Statistical Consultation and Research (CSCAR) provides support and training to University of Michigan researchers in a variety of areas relating to the management, collection, and analysis of data. CSCAR also supports the use of technical software and advanced computing in research.
Find us at: http://www.cscar.research.umich.edu/about/
• Yanming Li, Graduate Student Research Assistant.
• Kerby Shedden, CSCAR Director.
• Brenda W. Gillespie, CSCAR Associate Director.
• John A. Gillespie, Professor of Mathematics and Statistics, University of Michigan Dearborn.