Human and Ecological Risk Assessment: An International Journal
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/bher20

Estimation of the Upper Confidence Limit on the Mean of Datasets with Count-Based Concentration Values
William Brattin (a), Timothy Barry (b) & Stiven Foster (c)
(a) SRC, Inc., Denver, CO, USA
(b) U.S. Environmental Protection Agency, Office of Policy, Economics, and Innovation, Washington, DC, USA
(c) U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response, Washington, DC, USA
Published online: 16 Mar 2012.

To cite this article: William Brattin, Timothy Barry & Stiven Foster (2012) Estimation of the Upper Confidence Limit on the Mean of Datasets with Count-Based Concentration Values, Human and Ecological Risk Assessment: An International Journal, 18:2, 435-455, DOI: 10.1080/10807039.2012.652469

To link to this article: http://dx.doi.org/10.1080/10807039.2012.652469
Human and Ecological Risk Assessment, 18: 435–455, 2012
Copyright © Taylor & Francis Group, LLC
ISSN: 1080-7039 print / 1549-7860 online
DOI: 10.1080/10807039.2012.652469
STATISTICAL MODELS AND METHODS ARTICLE
Estimation of the Upper Confidence Limit on the Mean of Datasets with Count-Based Concentration Values
William Brattin,1 Timothy Barry,2 and Stiven Foster3
1SRC, Inc., Denver, CO, USA; 2U.S. Environmental Protection Agency, Office of Policy, Economics, and Innovation, Washington, DC, USA; 3U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response, Washington, DC, USA
ABSTRACT

Mathematical approaches are not well established for calculating the upper confidence limit (UCL) of the mean of a set of concentration values that have been measured using a count-based analytical approach such as is commonly used for asbestos in air. This is because the uncertainty around the sample mean is determined not only by the authentic between-sample variation (sampling error), but also by random Poisson variation that occurs in the measurement of sample concentrations (measurement error). This report describes a computer-based application, referred to as CB-UCL, that supports the estimation of UCL values for asbestos and other count-based sample sets, with special attention to datasets with relatively small numbers of samples and relatively low counts (including datasets with all-zero count samples). Evaluation of the performance of the application with a range of test datasets indicates the application is useful for deriving UCL estimates for datasets of this type.
Key Words: UCL, asbestos, CB-UCL.
Received 1 November 2010; revised manuscript accepted 7 January 2011.
This project was sponsored by the U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response. SRC, Inc. is a contractor for USEPA. The authors declare there are no specific potential competing financial interests as a consequence of employment. The opinions expressed in this report are not necessarily those of the U.S. Environmental Protection Agency.
Address correspondence to Stiven Foster, U.S. Environmental Protection Agency, Ariel Rios Building, 1200 Pennsylvania Avenue, N.W., Mail Code: 5103T, Washington, DC 20460, USA. E-mail: [email protected]
INTRODUCTION
Human health risk from exposure to most environmental contaminants is related to the average exposure concentration, which is generally estimated by collecting a set of samples that are representative over space and time, and computing the mean of the sample set. However, because the sample mean is only an estimate of the true mean, and the true mean may be either higher or lower than the sample mean, the U.S. Environmental Protection Agency (USEPA) calls for use of the 95% upper confidence limit (UCL) of the sample mean rather than the sample mean itself when performing quantitative risk calculations (USEPA 1989). This approach ensures that there is only a small probability (5%) that the exposure concentration used in risk calculations will be lower than the true mean, which in turn helps limit the likelihood of making a false negative risk management decision (deciding that an exposure is safe when it is not).
The most appropriate statistical approach for calculating the UCL of a dataset depends on the nature of the distribution from which samples are drawn. As summarized in USEPA (1992), approaches are well known for computing the UCL of datasets that are known or assumed to be normal (Nelson 1982) or lognormal (Land 1971; Crow and Shimizu 1988). However, because not all datasets are well characterized by normal or lognormal distributions, the USEPA has developed a software application called ProUCL (USEPA 2007a) to assist risk assessors in determining the most appropriate approach and deriving the most appropriate UCL value for use. In brief, the user provides a dataset of concentration values to the application, and ProUCL estimates a number of UCL values for the data using a series of alternative strategies. These include fitting three parametric distributions (normal, lognormal, and gamma) to the data by the method of maximum likelihood estimation (MLE) and estimating the UCL from the MLE parameters, as well as several non-parametric approaches that utilize the Chebychev inequality or various types of bootstrapping. From the set of alternative UCL estimates, the ProUCL application then identifies which result is recommended for use.
Unfortunately, ProUCL is not well suited for calculation of UCL values for analytes such as asbestos where concentration values are estimated using a count-based analytical method. In the case of asbestos in air (the exposure medium of primary health concern for asbestos), samples for analysis are obtained by drawing air through a filter, and then examining the filter under a microscope to determine the number of asbestos structures on the filter. The concentration of asbestos in the sample is estimated as C = Y/V, in which C is the concentration (asbestos counts per unit volume), Y is the observed fiber count, and V is the volume of air that passed through the area of filter that was examined. Datasets derived by count-based analytical methods can be problematic for ProUCL because:
1. In count-based analyses, all samples with a count of zero are bona fide observations, and must be evaluated as such (Cameron et al. 2007; Haas et al. 1999; USEPA 1999). However, datasets with one or more samples with a count of zero (and
hence a concentration value of zero) prohibit ProUCL from evaluating either gamma or lognormal fits.
2. ProUCL assumes the data to be evaluated are drawn from a continuous distribution. However, in the case of concentration values derived by count-based methods, observed concentrations are not continuous, but must take on discrete values (0/V, 1/V, 2/V, etc.).
3. ProUCL assumes that the only source of variation among samples in a dataset is authentic spatial or temporal variation in the true concentration of the samples (i.e., sampling variation). However, in the case of asbestos samples, an additional source of variation arises because each estimate of concentration is based on an observed count that is a Poisson random variable: Y(observed) ∼ Poisson[C(true) · V].
The discrete nature of the "Poisson-filtered" dataset, as well as the increased variance in the data due to random variation in the observed count (Poisson "counting error"), combine to alter the mathematics by which the UCL is calculated. In special cases where there are no zero-count samples in the dataset and where the coefficient of variation due to Poisson counting error is small compared to the coefficient of variation in the underlying distribution, the effect of Poisson counting error is likely to be minimal, and the ProUCL application is likely to work approximately as well on asbestos datasets as on other datasets. However, when samples with zero counts are present and/or the coefficient of variation from analytical variability approaches or exceeds the coefficient of variation due to sampling variability, the effect of Poisson counting error can generally not be ignored.
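The variance inflation described above is easy to demonstrate with a small simulation. The sketch below is illustrative only (it is not part of CB-UCL), and all parameter values are invented: "true" concentrations vary between samples, and each measured concentration is a Poisson-filtered count divided by the volume examined.

```python
# Illustrative sketch (not part of CB-UCL): how Poisson counting error
# inflates the variance of count-based concentration estimates.
# All parameter values here are invented for the demonstration.
import numpy as np

rng = np.random.default_rng(0)

# "True" concentrations varying between samples (sampling variability)
true_conc = rng.lognormal(mean=np.log(0.02), sigma=0.5, size=10_000)
volume = 100.0  # cc of air examined per filter

counts = rng.poisson(true_conc * volume)  # Y(observed) ~ Poisson[C(true) * V]
est_conc = counts / volume                # C = Y / V

# The estimated concentrations carry sampling variance plus counting variance:
# Var(C_hat) ~= Var(C_true) + E[C_true] / V
print(true_conc.var(), est_conc.var())
```

The variance of the estimated concentrations exceeds that of the true concentrations by roughly the mean concentration divided by the volume, which is why the effect grows as the volume (and hence the average count) shrinks.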
Based on our experience at several Superfund1 sites where asbestos is an environmental contaminant of potential concern, most asbestos datasets for air exhibit relatively low mean counts and contain at least some samples with zero counts, and the coefficient of variation due to Poisson error is not small compared to the coefficient of variation in the underlying sampling distribution. Consequently, the effect of Poisson measurement error can usually not be ignored, and a new approach for estimating the UCL of this type of asbestos dataset is needed.
To this end, the objective of our efforts was to develop and test a computer-based application for estimating UCL values for asbestos datasets, with special attention to statistically "weak" datasets with small to moderate sample sizes and relatively low mean counts, including datasets with all zero counts. This is a particularly challenging objective, since estimating statistics for weak datasets is always inherently difficult. For asbestos datasets, this difficulty is compounded by the effect of random Poisson counting error. In addition, we considered it essential that the
1Editor's note: Superfund sites are uncontrolled hazardous waste sites in the United States that are designated for remediation under the authorities of the Federal Comprehensive Environmental Response, Compensation, and Liability Act, as amended (also called the Superfund Act).
computer-based application could be easily and reliably implemented by risk assessors without special expertise in statistics or computer programming.
DESCRIPTION OF THE APPLICATION
Overview
The application that we have developed implements an analysis strategy that is generally similar to that used by ProUCL. That is, the application seeks to fit a number of alternative models to the data and to determine the model that is most consistent with the data. Then, the UCL is calculated based on that model. However, all model fitting procedures and UCL calculations include the effect of random Poisson counting error. For this reason, we have called the application CB-UCL (Count-Based UCL). The main components of the CB-UCL application are described below.
Data Input
The dataset that is provided to the application consists of a set of N paired fiber counts and volumes, {Yk, Vk} for k = 1, 2, . . . , N. As noted above, counts of zero are valid and are entered as such, and Vk is the volume of air that passed through the area of filter examined. Volume is usually expressed in units of cubic centimeters (cc), but may be expressed in any unit that is convenient. Fiber counts are usually based on structures that satisfy phase-contrast microscopy (PCM) or transmission electron microscopy PCM-equivalent (PCME) counting rules, but may be based on any other counting rules of interest.
Module 1: Evaluate Data Adequacy
In Module 1, the dataset is evaluated to determine if the data are adequate to support fitting alternative distributions to the data. The factors that are considered in this evaluation include (a) the number of independent samples in the dataset and (b) the number of samples that have one or more fibers detected. Although the choice of what constitutes an "adequate" dataset is a matter of judgment, based on experience gained during testing of various trial datasets, the criteria for proceeding to a data-fitting phase have been set to the following:
• Minimum dataset size = 5
• Minimum number of samples with non-zero fiber counts = 3
If both of these criteria are satisfied, the data are then processed through implementation of Module 2. If either or both criteria are not satisfied, Module 3 is implemented.
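The Module 1 screen reduces to a simple check on the count data. The following is a minimal sketch of that logic; the function name and its arguments are our own, not part of CB-UCL.

```python
# Minimal sketch of the Module 1 adequacy screen described above.
# The function name and keyword defaults are illustrative, not CB-UCL's API.
def dataset_is_adequate(counts, min_n=5, min_nonzero=3):
    """Return True if the dataset can proceed to Module 2 fitting."""
    n = len(counts)
    n_nonzero = sum(1 for y in counts if y > 0)
    return n >= min_n and n_nonzero >= min_nonzero

print(dataset_is_adequate([0, 0, 1, 2, 3]))   # True: 5 samples, 3 non-zero
print(dataset_is_adequate([0, 0, 0, 0, 5]))   # False: only 1 non-zero sample
```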
Module 2: Data Fitting and UCL Estimation for Adequate Datasets
Module 2A: Fitting
Given a dataset that is adequate, the data are fit to four alternative parametric distributions, as follows:
• Poisson (P)
• Poisson-Exponential (PE)
• Poisson-Gamma (PG)
• Poisson-Lognormal (PLN)
The Poisson distribution is used to account for datasets in which the true concentration values are constant or nearly so, and the only significant source of variation among sample results is random Poisson counting variation. This situation is characterized by data in which the mean and variance are approximately equal.
For many environmental datasets, the variation in the count data is often larger than can be accounted for by ordinary Poisson variability. This is referred to as over-dispersion. In this event, it is common to model the count data as a Poisson mixture in order to accommodate the larger observed variance. In a Poisson mixture, the Poisson parameter is, itself, a random variable. Some commonly used mixing distributions include gamma, lognormal, inverse Gaussian, and generalized inverse Gaussian. For a more complete discussion, see Johnson et al. (1992, 1994).
In the CB-UCL application, the PE, PG, and PLN mixing distributions are included to deal with datasets that are over-dispersed. These three models, along with the Poisson model, allow the application to deal with a wide range of shapes and skewness in the data. Fitting of each model to the data is performed using the method of maximum likelihood estimation (MLE), as detailed in Appendix A.
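The over-dispersion that motivates the mixture models can be seen directly by simulation. The sketch below, with invented parameters, compares counts from a Poisson-Gamma mixture (where the Poisson rate is itself gamma-distributed) against a pure Poisson with the same mean.

```python
# Sketch: a Poisson-Gamma mixture produces over-dispersed counts
# (variance well above the mean), unlike a pure Poisson.
# Shape/scale/volume values are invented for the demonstration.
import numpy as np

rng = np.random.default_rng(1)
shape, scale, volume, n = 0.25, 0.08, 100.0, 50_000

lam = rng.gamma(shape, scale, size=n) * volume   # Poisson rate is random
mixed = rng.poisson(lam)                         # Poisson-Gamma counts
pure = rng.poisson(lam.mean(), size=n)           # pure Poisson, same mean

print(mixed.mean(), mixed.var())   # variance well above the mean
print(pure.mean(), pure.var())     # variance approximately equals the mean
```

A mean-variance comparison of this kind is exactly the diagnostic mentioned above for distinguishing a near-constant concentration (Poisson) from an over-dispersed one (Poisson mixture).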
Module 2B: Estimating the UCL
For datasets that are best characterized as Poisson, a conservative 95% UCL on the mean is given by (Nelson 1982):

UCL = [1 / (2 Σk Vk)] · invχ²(0.95; 2 Σk Yk + 2)

where invχ² is the inverse of the chi-squared distribution.

For datasets that are best characterized by one of the three Poisson mixture distributions (PE, PG, PLN), the UCL is estimated using a Bayesian Parametric Bootstrapping (BPBS) approach. This procedure is detailed in Appendix A, and is summarized in Text Box 1, using the PLN model as an example.
Text Box 1. Example BPBS Procedure for a Poisson-Lognormal Sample

a) Let {μ, σ} be the MLE parameter estimates derived by fitting a PLN model to the data. Assume that uncertainty around these parameter estimates is similar to that of a simple lognormal distribution (Table 1).
b) Utilizing the assumed uncertainty distribution for {μ, σ}, draw a random value for each:

   μ* ← μ + (σ/√N) · t(ν)        σ²* ← νσ² / χ²(ν)

   where ν is the degrees of freedom (N − 1).
c) Draw a set of k = 1, 2, . . . , N concentration values and a set of corresponding random fiber counts:

   Ck ← LN(μ*, σ*) followed by Yk ← Poisson(Ck · Vk)

d) Fit the simulated dataset {Yk, Vk} to a PLN model by maximum likelihood, yielding the new parameter estimates {μ′, σ′}. Estimate the mean from

   mean = exp(μ′ + ½σ′²)

e) Repeat steps b–d many times.¹
f) Estimate the UCL as the 95th percentile of the empiric distribution of the means.

¹Because of the long computational times, the number of iterations used in the CB-UCL application for the PLN fit is only 100. For the Poisson-Gamma, 2,500 iterations are performed. For the Poisson-Exponential, 10,000 iterations are performed.
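The structure of the BPBS loop in Text Box 1 can be sketched as follows. This is heavily simplified: the real application refits a full Poisson-Lognormal model by MLE at step (d), whereas here a crude log-moment refit on the simulated counts stands in for it, purely to show the shape of the bootstrap. Function name, iteration count, and parameter values are all illustrative.

```python
# Heavily simplified sketch of the BPBS loop in Text Box 1.
# Step (d) uses a crude log-moment stand-in for the PLN MLE refit
# described in the article's Appendix A.
import numpy as np

rng = np.random.default_rng(2)

def bpbs_ucl_sketch(mu, sigma, volumes, n_iter=500):
    n = len(volumes)
    nu = n - 1
    vols = np.asarray(volumes, dtype=float)
    means = []
    for _ in range(n_iter):
        # (b) draw parameters from their assumed uncertainty distributions
        mu_star = mu + (sigma / np.sqrt(n)) * rng.standard_t(nu)
        sigma_star = np.sqrt(nu * sigma**2 / rng.chisquare(nu))
        # (c) simulate concentrations and Poisson-filtered counts
        conc = rng.lognormal(mu_star, sigma_star, size=n)
        counts = rng.poisson(conc * vols)
        # (d) crude stand-in refit: log-moments of rough concentration estimates
        est = np.log((counts + 0.5) / vols)
        mu_p, sig_p = est.mean(), est.std(ddof=1)
        means.append(np.exp(mu_p + 0.5 * sig_p**2))
    # (f) UCL = 95th percentile of the bootstrap means
    return np.percentile(means, 95)

ucl = bpbs_ucl_sketch(mu=np.log(0.02), sigma=0.7, volumes=[100.0] * 10)
print(ucl)
```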
A similar approach as shown in Text Box 1 is used for the PE distribution, where the uncertainty around the MLE parameter estimate α is given by (Martz and Waller 1982):

2NC̄/α ∼ χ²(2N)
In the case of the gamma distribution, explicit small-sample uncertainty distributions are not available for the shape and scale parameters. In this case, we used Metropolis-Hastings/Markov Chain Monte Carlo (MCMC) simulation with vague priors to estimate the uncertainty in the mean (Bolstad 2010; Carlin and Louis 2000; Gelman et al. 2004; Gilks et al. 1996; Ntzoufras 2009).
Given the UCL values for each model, the preferred model fit and associated UCL is identified based on Akaike's Information Criterion (AIC) (Akaike 1974):

AIC = −2L + 2P

where L = log-likelihood value and P = number of model parameters.
In general, the UCL associated with the lowest AIC is preferred, although any model with an AIC value that does not exceed the minimum AIC value by more than
Table 1. Parameter uncertainty characterizations.

Poisson — λ (Poisson rate)
   Parameter uncertainty: 2λV ∼ χ²(2Y + 1) [Nelson (1982); Martz and Waller (1982)]
   UCL: UCL = [1/(2 Σ Vk)] · invχ²(0.95, 2 Σ Yk + 2) [Nelson (1982)]

Lognormal — μ (ln-mean), σ (ln-stdev)
   Parameter uncertainty: (μ̂ − μ)/(σ/√N) ∼ t(N − 1); (N − 1)σ̂²/σ² ∼ χ²(N − 1) [Box and Tiao (1992)]
   UCL: UCL = exp(μ + 0.5σ² + Hσ/√(N − 1)), where H = f(σ, N); see USEPA 1992 for a table of values [Land (1971)]

Gamma — α (shape), β (scale)
   Parameter uncertainty: from MCMC with flat vague priors [Gelman et al. (2004); Ntzoufras (2009)]
   UCL: UCL = 2Nα_bc·C̄ / invχ²(0.05, 2Nα_bc), where α_bc = (N − 3)α_MLE/N + 2/(3N) [Grice and Bain (1980); USEPA (2007b)]

Exponential — α (mean)
   Parameter uncertainty: 2NC̄/α ∼ χ²(2N) [Nelson (1982); Martz and Waller (1982)]
   UCL: UCL = 2NC̄ / invχ²(0.05, 2N) [Nelson (1982)]
about 2 may also be considered as an acceptable option (Burnham and Anderson 2002).
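The model-screening rule just described can be sketched in a few lines. The log-likelihood values below are invented purely to illustrate the arithmetic of the AIC comparison and the "within about 2 units" acceptance window.

```python
# Sketch of the AIC screening rule: lowest AIC is preferred, and any model
# within ~2 AIC units of the minimum is an acceptable option.
# The fitted log-likelihoods below are invented for illustration.
def aic(log_lik, n_params):
    return -2.0 * log_lik + 2.0 * n_params

# (log-likelihood, number of parameters) for each candidate model
fits = {"P": (-31.4, 1), "PE": (-28.9, 1), "PG": (-28.1, 2), "PLN": (-28.3, 2)}
scores = {name: aic(ll, p) for name, (ll, p) in fits.items()}

best = min(scores, key=scores.get)
acceptable = [m for m, s in scores.items() if s - scores[best] <= 2.0]
print(best, sorted(acceptable))
```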
Module 3: Evaluation of “Weak” Datasets
If the dataset is not adequate to support meaningful analysis using Module 2, the data are evaluated using Module 3. Note that this includes datasets that are all "non-detect." As noted earlier, zero-count samples are not true "non-detects," but are bona fide samples.
Because the data are not sufficient to support any meaningful evaluation of distributional form, it is necessary to make some assumptions in order to proceed. In order to be conservative, this module assumes the underlying distribution is not more strongly skewed than a lognormal distribution, and that the geometric standard deviation (GSD) of the assumed lognormal distribution is not larger than some specified value (GSDmax). Given these assumptions, Module 3 finds the highest value of the log-mean that has no more than a 5% probability of generating a dataset with more counts than the number observed in the dataset being evaluated. This is done using Monte Carlo simulation, as described in Text Box 2. If the assumptions used to calculate the CUB are conservative, then the UCL is less than the CUB.
Text Box 2. Evaluation of Weak Datasets

a) Given a dataset of counts and volumes, {Yk, Vk}, k = 1, 2, . . . , N, determine the total observed fiber count, Y0 = ΣYk.
b) Specify a plausible maximum geometric standard deviation, GSDmax, and a starting log-mean, μ. Calculate σmax = ln(GSDmax).
c) Draw a random set of N concentration values Ck, k = 1, 2, . . . , N, from LN(μ, σmax).
d) Draw an observed count for each sample: Yk ← Poisson(Ck · Vk).
e) Calculate the sum of the counts: YT = ΣYk.
f) Repeat steps b–e many times,¹ and find the frequency with which YT exceeds the observed count Y0. If the frequency is less than 5%, increase the assumed value of μ until the frequency equals 5%. If the observed frequency exceeds 5%, decrease the assumed value of μ until the frequency equals 5%.
g) After finding the value of μ that yields a 5% probability of having total counts equal to the dataset being evaluated, calculate a conservative upper bound (CUB) on the mean from the parameters: CUB = exp(μ5% + 0.5σmax²).

¹For the CB-UCL application described here, 50,000 iterations are employed for initial bracketing, and 100,000 iterations are used for final calculations.
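The search in Text Box 2 can be sketched as follows. This is a rough illustration, not the CB-UCL implementation: the iteration count is reduced, simple bisection stands in for the bracketing search, and the bracket for μ is an assumed range.

```python
# Rough sketch of the Module 3 search in Text Box 2: find the log-mean mu
# whose simulated total counts exceed the observed total only 5% of the
# time, then report CUB = exp(mu + 0.5 * sigma_max**2).
# Iteration counts and the mu bracket are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(3)

def cub_sketch(counts, volumes, gsd_max=4.0, n_iter=5_000):
    y0 = sum(counts)
    sigma = np.log(gsd_max)
    vols = np.asarray(volumes, dtype=float)

    def exceed_freq(mu):
        conc = rng.lognormal(mu, sigma, size=(n_iter, len(vols)))
        totals = rng.poisson(conc * vols).sum(axis=1)
        return (totals > y0).mean()

    lo, hi = np.log(1e-6), np.log(1.0)  # assumed bracket for mu
    for _ in range(40):                 # bisect on the exceedance frequency
        mid = 0.5 * (lo + hi)
        if exceed_freq(mid) < 0.05:
            lo = mid                    # too low: raise mu
        else:
            hi = mid                    # too high: lower mu
    mu5 = 0.5 * (lo + hi)
    return np.exp(mu5 + 0.5 * sigma**2)

# Example: an all-zero dataset of 4 filters, 100 cc each
cub = cub_sketch(counts=[0, 0, 0, 0], volumes=[100.0] * 4)
print(cub)
```

Even with all-zero counts the procedure returns a positive upper bound, which is the point of Module 3: the observed absence of fibers still constrains how high the true mean could plausibly be.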
EVALUATION OF CB-UCL PERFORMANCE
Ideally, the results of the CB-UCL application would be evaluated by comparing the UCL value generated by the application to the "true" UCL value for a series of
alternative datasets. However, in this case, we are not aware of any way to calculate the "true" value of the UCL for a Poisson mixture dataset. Consequently, performance of the application was evaluated as described below.
Evaluation of Module 2
In order to evaluate the performance of Module 2, a series of "synthetic" test datasets were generated from a number of specified "truths," and the UCL returned by the application was compared to the UCL that would have been obtained in the absence of Poisson counting error. The step-wise procedure is summarized in Text Box 3.
Text Box 3. Process for Evaluation of Module 2

a) Specify a "true" distributional form (exponential, gamma, lognormal) for the underlying concentration values, and specify the "true" parameters of the concentration distribution.
b) Specify a sample size N.
c) Draw a concentration dataset {C1, C2, . . . , CN} from the specified distribution.
d) Calculate the "true" UCL for that dataset, using the appropriate equation for the distributional form (Table 1).
e) Divide the "true" UCL by the true mean to generate a UCL ratio, Rtrue.
f) Specify the volume analyzed for each sample {V1, V2, . . . , VN}. For most tests, all volumes in a dataset were identical.
g) Draw an "observed" count for each sample: Yk ← Poisson(Ck · Vk).
h) Provide the simulated dataset {Yk, Vk} to the application and obtain the UCL estimate.
i) Divide the UCL returned by the application by the true mean to yield a UCL ratio, Rapp.
j) Repeat steps c–i many times (e.g., 400–2,000), and generate an empiric cumulative distribution function (cdf) for both Rtrue and Rapp.
k) Repeat steps a–j for a series of alternative "truths," selecting differing distributional forms, parameters, sample sizes, volumes, etc.
For each specified "truth," the performance of the application was evaluated by comparing the cdf for Rapp to that for Rtrue. Ideal behavior includes the following:
a) Coverage. In the absence of Poisson error, the cdf for Rtrue is expected to have 95% coverage (95% of all ratio values are ≥ 1.0). Ideally, the cdf for Rapp will also have approximately 95% coverage in all cases.
b) Effect of Sample Size. A priori, it is expected that, as the size of the dataset decreases, the cdfs of both Rapp and Rtrue should tend to tip to the right, maintaining about 5% of the values equal to or less than 1.
c) Effect of Volume. A priori, it is expected that, as the average volume analyzed decreases, the average count will decrease, and the relative contribution of Poisson
Figure 1. Evaluation of Poisson-Exponential truths (mean = 0.02). Panels show N = 40, 20, and 10; each plots cumulative probability against the ratio (UCL / true mean) for the TRUE cdf and for Ybar = 20, 2, and 0.2. (Color figure available online.)
measurement error will tend to increase. Thus, the cdf for Rapp should tend to be tipped further to the right compared to the cdf for Rtrue as volume tends to decrease.
Figure 2. Evaluation of Poisson-Gamma truths (mean = 0.02). Panel A: CV = 2.0, shape = 0.25, scale = 0.08; Panel B: CV = 0.9, shape = 1.235, scale = 0.016. Panels show N = 40, 20, and 10, plotting cumulative probability against the ratio (UCL / true mean) for the TRUE cdf and for Ybar = 20, 2, and 0.2. (Color figure available online.)
The results of this evaluation approach are shown in Figure 1 (Poisson-Exponential model), Figure 2 (Poisson-Gamma model), and Figure 3 (Poisson-Lognormal model). In all cases, the true mean of each underlying distribution is set to 0.02. This value is arbitrary, and identical results would be obtained for other means, assuming that the ratio of mean and analytical sensitivity (the product of mean times volume) remains constant.

In each figure, results are shown for three different sample sizes (40, 20, and 10). For PG (Figure 2) and PLN (Figure 3), results are shown for two differing degrees of skewness. In each panel, the cdf for Rtrue is shown by a solid red line labeled TRUE. This line reflects the UCL values that would have been obtained from the series of datasets generated under conditions where Poisson error is absent. Each of the other cdfs in each panel reflects the UCL ratios that were generated using the CB-UCL application (Rapp), with each cdf reflecting the UCLs for datasets with
Figure 3. Evaluation of Poisson-Lognormal truths (mean = 0.02). Panel A: CV = 4, GSD = 5.38; Panel B: CV = 8, GSD = 7.71. Panels show N = 40, 20, and 10, plotting cumulative probability against the ratio (UCL / true mean) for the TRUE cdf and for Ybar = 20, 2, and 0.2. (Color figure available online.)
three different volumes analyzed (1,000, 100, or 10), resulting in expected average counts (Ybar) of 20, 2, or 0.2.
Inspection of these figures reveals the following main observations:
• As expected, decreasing sample size tends to increase uncertainty, with the cdfs for both Rtrue and Rapp tending to become tipped more and more to the right as sample size decreases.
• As expected, the cdf for Rtrue is typically the left-most and steepest of each group. This is because the UCLtrue values are based on datasets that include only sampling variability, and do not include Poisson uncertainty. When Poisson uncertainty is included, in cases of high average count, the Rapp cdf may approach the Rtrue line quite closely. This is because Poisson uncertainty tends to become less important as average count increases. However, as mean count
decreases, the effect of Poisson uncertainty increases, and the Rapp cdfs tip further and further to the right.
• In some cases, the low end of the Rapp cdfs also tends to shift somewhat to the right, resulting in an increase in coverage from the target of 95% to some higher value. However, the degree to which the low end is right-shifted is usually relatively minor, which indicates that this application does not tend to generate UCLs that are unduly conservative.
• For gamma and lognormal distributions, increasing skewness (expressed as CV = stdev/mean) tends to increase uncertainty and shift the cdfs for both Rtrue and Rapp to the right. This is expected because increasing skewness increases the uncertainty due to random sampling variation.
Evaluation of Module 3
Figure 4 presents conservative upper bound (CUB) values calculated using Module 3 for a number of datasets with differing sample sizes, volumes (all samples were assumed to have the same volume), total counts, and GSDmax values. Results are expressed as a concentration value (asbestos structures per unit volume) rather than a ratio of the CUB to the true mean because each test dataset is consistent with a range of alternative "truths." Inspection of Figure 4 reveals the following main points:
• CUB values decrease with increasing sample size.
• CUB values decrease with increasing sample volume analyzed.
• CUB values increase with increasing total count.
• CUB values increase with increasing GSDmax.
DISCUSSION
Other Approaches Tested
The method that is implemented in the application described earlier was selected only after testing a range of alternative strategies for estimating the UCL of count-based datasets. Other methods that were evaluated but not selected are summarized in Table 2. While most approaches yielded reasonable results when presented with robust datasets (large sample size, high average counts, high detection frequency), these methods did not perform well when presented with relatively weak datasets (low sample size, low average count, low detection frequency). In some cases, the methods became mathematically unstable when the data were weak, and considerable expertise and programming skill was needed to obtain solutions. In other cases, the results (evaluated using the same approach as described above) were inferior to those of the current application, with cdfs of Rapp that did not have the desired attributes when compared to Rtrue.
Other Distributions Tested
The four distributions included in this application were selected only after evaluation of a wider range of mixing distributions, including two unbounded distributions
Hum. Ecol. Risk Assess. Vol. 18, No. 2, 2012 447
Downloaded by [University of New Mexico] at 08:33 23 November 2014
Table 2. Summary of alternative strategies investigated.

Method 1: Bayes MCMC, implemented using WinBUGS(a)
  Description: Specify priors for model parameters. Combine priors with data (a set of Yk and Vk pairs) using Bayes MCMC to estimate posteriors on model parameters. For each accepted pair in the posterior, calculate the mean from the parameters. UCL = 95th percentile of the means.
  Comments: Works well for strong datasets. For weak datasets, results become strongly dependent on priors (one can get nearly any result desired by altering the form or parameters of the priors). Requires familiarity with WinBUGS and considerable professional judgment to ensure output is reliable.

Method 2: Bayes MCMC, implemented using C code
  Description: Similar to Method 1, except the calculations are performed using an executable C program, so expertise with WinBUGS is not needed. The priors are assumed to be completely uninformed.
  Comments: Works well for strong datasets. For weak datasets, the program may experience numeric problems and UCL values tend to be very high. Obtaining solutions requires expert judgment and considerable skill in modifying the C code.

Method 3: MLE parameter estimation
  Description: Fit a dataset of Ck = Yk/Vk pairs to an assumed distribution (e.g., LN) using MLE. Estimate the UCL from the resulting MLE parameter estimates using the appropriate equation (Table 1).
  Comments: Tends to over-estimate the UCL at low counts and underestimate the UCL at high counts.

Method 4: Independent Poisson samples
  Description: Treat each sample as uncertain, with 2CkVk ~ χ²(2Yk + 1). Draw simulated sets of Ck values from chi-square pdfs, and calculate the mean. UCL = 95th percentile of the empiric means.
  Comments: Accounts for Poisson uncertainty only, and does not include variability (overdispersion) between the samples.

Method 5: Fitted chi-square
  Description: Treat each sample as uncertain, with 2CkVk ~ χ²(2Yk + 1). Draw simulated sets of Ck, fit to an assumed distribution (e.g., LN), and calculate the UCL from the MLE parameters using the appropriate equation (Table 1). Take some percentile of the UCL distribution as the UCL point estimate.
  Comments: The optimal percentile depends on the average count. For high average count, the optimum is near the 50th percentile. As average count decreases, the optimum percentile decreases, approaching the 5th percentile in some cases.

(a) WinBUGS is a freeware application that provides an interactive Windows version of the BUGS program for Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) techniques. Available from http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml
UCL of Asbestos Datasets
[Figure 4: six panels in two columns, Panel A (GSD = 10) and Panel B (GSD = 20), at V = 10, 100, and 1000; each panel plots CUB (s/cc) on a log scale (0.0001 to 10) against total count Y(total) (0 to 8) for N = 10, 20, and 40.]
Figure 4. Conservative upper bound (CUB) values for datasets with fewer than three detects.
(inverse Gaussian and generalized inverse Gaussian) and two bounded mixing distributions (beta and Kumaraswamy). After substantial testing, we concluded that it was not necessary to include all of these alternatives in the application: the datasets provided could usually be well fit by one or more of the four distributions that were selected, and no meaningful increase in fit quality was achieved with the other distributions. This is perhaps expected for relatively weak datasets, since the data usually are not sufficient to provide any strong distinction between similarly shaped distributions.
W. Brattin et al.
We did not evaluate a Poisson-normal distribution because this distribution allows concentration values to take on negative values.
Performance of Module 2
The results presented in Figures 1 to 3 demonstrate that Module 2 performs reasonably well, yielding UCL cdfs that approach the theoretical distribution (Rtrue) when datasets are robust, and that reflect increasing uncertainty as the effect of Poisson counting error increases. However, it is important to note that Module 2 is based on approximate methods, and that UCL values do not always have ideal characteristics. For example, the ideal coverage for Rapp is exactly 95%, but in many cases the actual coverage increases to 100% (i.e., no UCL value is lower than the true mean). For practical purposes, this is usually not a significant limitation, so long as the low end of the Rapp cdf is not too far right-shifted from Rtrue (5% at Rtrue = 1.0). Otherwise, the UCLs become overly conservative, which increases the risk of false-positive decision-making.
Performance of Module 3
Module 3 assumes that the underlying distribution of concentration values is not more skewed than a lognormal, and has a GSD value that does not exceed GSDmax. Ideally, the choice of GSDmax will be based on site-specific experience with other similar datasets for which GSD values for PLN fits could be estimated. In cases where no site-specific experience is available, extrapolation of results and experience from other sites may be appropriate. In the absence of any frame of reference, the choice of GSDmax must be based entirely on judgment. In this case, we suggest that GSDmax values of 10 and 20 be investigated, based on our expectation that most datasets are unlikely to be more skewed than this. However, the final choice must be made by site risk assessors and risk managers.
In cases where the CUB for a dataset is at or above a level of human health concern, it may be helpful either to collect more samples or to extend the analysis of existing samples to achieve a lower analytical sensitivity (a higher volume) in order to increase the accuracy of the UCL estimate. In cases where the CUB is less than a level of human health concern, further effort to improve on the UCL estimate may not be required.
Samples Analyzed to Constant Count
It is important to recognize that the CB-UCL application assumes that the volume of air analyzed for a sample (Vk) is either a constant or a random variable, and that the volume analyzed is not determined by concentration. This assumption will hold if all samples are analyzed to some constant target sensitivity, or if the dataset is composed of samples that have been analyzed to random target sensitivities. However, if samples are analyzed to a constant count, the mathematics of the current application are not appropriate, and the application should not be used for datasets in which a substantial fraction (e.g., >10–20%) of samples were counted to a specified number of counts.
Application to Other Count-Based Data Types
The approach we have developed is applicable not only to asbestos (the main focus of this report), but also to other types of data in which results are subject to significant Poisson counting variation (e.g., low levels of microorganisms in water).
OBTAINING AND USING THE APPLICATION
The CB-UCL application and a User's Guide may be obtained by submitting a request by e-mail to [email protected]. In the subject line, please enter the following: "Request CB-UCL."
In brief, the User's Guide describes how to enter datasets into input files, and how to provide each data input file to the application for processing. The results for each dataset are provided in a separate output file, using a file name specified by the user. For datasets that are evaluated using Module 2, the output file includes the MLE parameters for each of the alternative distributions and the UCL associated with each distribution, rank-ordered by AIC. For datasets evaluated using Module 3, the output includes the value of µ5% and the CUB.
REFERENCES
Akaike H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6):716–23
Bolstad WM. 2010. Understanding Computational Bayesian Statistics. John Wiley and Sons, New York, NY, USA
Box G and Tiao GC. 1992. Bayesian Inference in Statistical Analysis. John Wiley & Sons, Inc, New York, NY, USA
Burnham KP and Anderson DR. 2002. Model Selection and Multimodel Inference. Springer, New York, NY, USA
Cameron AC and Trivedi PK. 2007. Regression Analysis of Count Data. Cambridge University Press, New York, NY, USA
Carlin BP and Louis TA. 2000. Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall/CRC, New York, NY, USA
Crow EL and Shimizu K. 1988. Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, NY, USA
Gelman A, Carlin JB, Stern HS, et al. 2004. Bayesian Data Analysis. Chapman and Hall/CRC Press, New York, NY, USA
Gilks WR, Richardson S, and Spiegelhalter DJ. 1996. Markov Chain Monte Carlo in Practice. Chapman and Hall, New York, NY, USA
Grice JV and Bain LJ. 1980. Inferences concerning the mean of the gamma distribution. J Am Stat Assoc 75(372):929–33
Haas CN, Rose JB, and Gerba CP. 1999. Quantitative Microbial Risk Assessment. John Wiley & Sons, New York, NY, USA
Haight FA. 1967. Handbook of the Poisson Distribution. John Wiley & Sons, Inc, New York, NY, USA
Izsak RR. 2007. Maximum likelihood fitting of the Poisson lognormal distribution. Environ Ecol Stat 15:143–56
Johnson NL, Kotz S, and Kemp A. 1992. Univariate Discrete Distributions. John Wiley & Sons, Inc, New York, NY, USA
Johnson NL, Kotz S, and Balakrishnan N. 1994. Continuous Univariate Distributions, vol 1. John Wiley and Sons, New York, NY, USA
Land CE. 1971. Confidence intervals for linear functions of the normal mean and variance. Annals Math Stat 42(4):1187–205
Martz HF and Waller RA. 1982. Bayesian Reliability Analysis. John Wiley & Sons, Inc, New York, NY, USA
Nelson WB. 1982. Applied Life Data Analysis. Wiley Interscience, Hoboken, NJ, USA
Ntzoufras I. 2009. Bayesian Modeling Using WinBUGS. John Wiley & Sons, Inc, New York, NY, USA
Pawitan Y. 2001. In All Likelihood: Statistical Modeling and Inference Using Likelihood. Oxford University Press, Oxford, UK
USEPA (US Environmental Protection Agency). 1989. Risk Assessment Guidance for Superfund (RAGS), vol I: Human Health Evaluation Manual (Part A). EPA/540/1-89/002. Office of Solid Waste and Emergency Response, Washington, DC, USA. December. Available at http://www.epa.gov/oswer/riskassessment/ragsa/
USEPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. Office of Solid Waste and Emergency Response, Washington, DC, USA. Publication 9285.7-081. May. Available at http://www.deq.state.or.us/lq/pubs/forms/tanks/UCLsEPASupGuidance.pdf
USEPA. 1999. M/DBP Stakeholder Meeting Statistics Workshop Meeting Summary: November 19, 1998, Governor's House, Washington, DC. Final. Report prepared for US Environmental Protection Agency, Office of Ground Water and Drinking Water, by RESOLVE, Washington, DC, and SAIC, McLean, VA. EPA Contract No. 68-C6-0059. Available at http://water.epa.gov/lawsregs/rulesregs/sdwa/mdbp/st2nov98.cfm
USEPA. 2007a. ProUCL Version 4.00.02 User's Guide. Prepared for Office of Research and Development by Lockheed Martin Environmental Services, Las Vegas, NV. Publication EPA/600/R-07/038. Available at http://www.epa.gov/esd/tsc/TSC form.htm
USEPA. 2007b. ProUCL Version 4.0 Technical Guide. Prepared for Office of Research and Development by Lockheed Martin Environmental Services, Las Vegas, NV. Publication EPA/600/R-07/041. April 2007. Available at http://www.epa.gov/esd/tsc/ProUCL v4.00.02/ProUCL v4.0 Tech Guide.pdf
APPENDIX A: MAXIMUM LIKELIHOOD ESTIMATION (MLE) FITTING OF MODELS TO DATASETS
Poisson Mixtures (Compound Poisson Distributions)
A Poisson mixture model is used to describe a Poisson process in which the Poisson parameter is itself randomly distributed (Haight 1967; Johnson et al. 1992; Haas et al. 1999). A Poisson mixture may be defined by the general probability mass function (pmf)

$$p_M(y; V, \theta) = \int_0^\infty \frac{(CV)^y \exp(-CV)}{y!}\, dM(C; \theta)$$

in which y is the fiber count in a volume V, C is the mean mass density (concentration), and M is the mixing distribution with parameters θ characterizing the natural
heterogeneity in fiber concentrations. For some mixing distributions, the above integral can be evaluated in terms of known functions; for others, the integral must be numerically evaluated. In the following, we summarize the Poisson mixtures used in this analysis.
Poisson-Gamma Mixtures
If C follows a gamma distribution with scale parameter α and shape parameter β, C ∼ Gamma(α, β), the Poisson-gamma compound pmf is

$$p(y) = \int_0^\infty \frac{(CV)^y \exp(-CV)}{y!} \cdot \frac{(C/\alpha)^{\beta-1} \exp(-C/\alpha)}{\alpha\,\Gamma(\beta)}\, dC$$

which may be integrated in closed form (Haas et al. 1999):

$$p(y) = \frac{\Gamma(y+\beta)}{y!\,\Gamma(\beta)} \left(\frac{\alpha V}{1+\alpha V}\right)^y (1+\alpha V)^{-\beta}, \quad y = 0, 1, 2, \ldots$$

with mean and variance

$$E\{Y\} = \alpha\beta V, \qquad \mathrm{Var}\{Y\} = \beta(\alpha V)^2 + E\{Y\}$$

In the special case of integer shape parameters, the Poisson-gamma reduces to the negative binomial distribution.
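The reduction to a negative binomial can be checked numerically. The sketch below (assuming NumPy/SciPy are available; the parameter values are arbitrary) evaluates the closed-form pmf in log space and compares it to SciPy's nbinom with size β and success probability 1/(1 + αV):

```python
from math import exp, lgamma, log

from scipy.stats import nbinom

def poisson_gamma_pmf(y, alpha, beta, V):
    """Closed-form Poisson-gamma pmf, evaluated in log space for stability."""
    return exp(
        lgamma(y + beta) - lgamma(beta) - lgamma(y + 1)
        + y * log(alpha * V / (1.0 + alpha * V))
        - beta * log(1.0 + alpha * V)
    )

alpha, beta, V = 0.05, 2.0, 100.0   # E{Y} = alpha*beta*V = 10
p = 1.0 / (1.0 + alpha * V)         # negative binomial success probability
for y in range(25):
    assert abs(poisson_gamma_pmf(y, alpha, beta, V) - nbinom.pmf(y, beta, p)) < 1e-12
```

(Strictly, the negative binomial generalizes to non-integer shape parameters as well; SciPy's nbinom accepts a real-valued size parameter, which is why the check passes for β = 2.0 or any other positive β.)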
Poisson-Exponential Mixture
The exponential distribution is a gamma distribution with shape parameter β = 1. For this Poisson mixture, concentration is modeled as C ∼ Gamma(α, 1). From above, the Poisson-exponential pmf is

$$p(y) = \frac{(\alpha V)^y}{(1+\alpha V)^{y+1}}, \qquad E\{Y\} = \alpha V, \qquad \mathrm{Var}\{Y\} = (\alpha V)^2 + E\{Y\}$$
Poisson-Lognormal Mixture
If fiber concentrations follow a lognormal distribution, C ∼ Lognormal(µ, σ), the resulting Poisson-lognormal compound distribution may be expressed as

$$p(y) = \int_0^\infty \frac{(CV)^y \exp(-CV)}{y!} \cdot \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\log C - \mu}{\sigma}\right)^2\right] \frac{dC}{C}$$
with mean and variance (Crow and Shimizu 1988)

$$E\{Y\} = V \exp\left(\mu + \tfrac{1}{2}\sigma^2\right), \qquad \mathrm{Var}\{Y\} = E^2\{Y\}\left[\exp(\sigma^2) - 1\right] + E\{Y\}$$
The probability mass function for the Poisson-lognormal cannot be expressed in terms of known functions; numerical integration is required. Haas et al. (1999)
suggest using Gauss-Hermite quadrature to evaluate p(y), but we found that method insufficiently accurate for maximum likelihood fitting and simulation. To achieve the necessary accuracy for this analysis, we followed the suggestion of Izsak (2007), in which the infinite integration interval is broken into two finite integrals, each over the interval [0,1]:
$$p(y) = \int_0^1 \frac{(CV)^y \exp(-CV)}{y!} \cdot \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\log C - \mu}{\sigma}\right)^2\right] \frac{dC}{C} + \int_0^1 \frac{(V/C)^y \exp(-V/C)}{y!} \cdot \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{-\log C - \mu}{\sigma}\right)^2\right] \frac{dC}{C}$$
Maximum Likelihood Fitting of Poisson Mixtures
There are several techniques for estimating the parameters of Poisson mixtures, including moment matching, regression, and maximum likelihood. For this study, we investigated moment matching and maximum likelihood (Pawitan 2001). Our initial investigation showed that maximum likelihood performed better than moment matching, with moment matching tending to under-estimate the UCL and providing poor coverage. Consequently, based on these observations, we selected maximum likelihood for parameter estimation.
Given a set of observed fiber counts in known volumes, {Yk, Vk} for k = 1, 2, ..., N, we estimated the parameters of the mixing distributions by maximum likelihood:

$$\theta_{MLE} = \underset{\theta \in \Theta}{\arg\max} \left\{ \sum_{k=1}^{N} \log p(\theta \mid Y_k, V_k) \right\}$$
Numerical minimization/maximization methods must be used for the Poisson-exponential, Poisson-gamma, and Poisson-lognormal mixtures. However, numerical estimation for the Poisson-exponential mixture involves a one-parameter search rather than two and hence is considerably less difficult. To within a constant, the log-likelihood for the Poisson-exponential sample is

$$L(\alpha) \propto \ln(\alpha) \sum_{k=1}^{N} Y_k - \sum_{k=1}^{N} (1+Y_k)\ln(1+\alpha V_k)$$
The maximum likelihood solution is found by setting the derivative of the log-likelihood with respect to α equal to zero, i.e.,

$$\frac{dL(\alpha)}{d\alpha} \propto \frac{1}{\alpha}\sum_{k=1}^{N} Y_k - \sum_{k=1}^{N} \frac{(1+Y_k)V_k}{1+\alpha V_k} = 0$$

or equivalently,

$$\sum_{k=1}^{N} \frac{\alpha V_k}{1+\alpha V_k}(1+Y_k) - \sum_{k=1}^{N} Y_k = 0$$
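The left-hand side of this score equation is monotone increasing in α, negative as α → 0, and approaches N minus zero as α → ∞, so a root exists whenever the total count is positive and a bracketed root finder solves it directly. A minimal sketch (assuming NumPy/SciPy; the function name is ours):

```python
import numpy as np
from scipy.optimize import brentq

def fit_poisson_exponential_alpha(Y, V):
    """Solve sum_k (alpha*V_k / (1 + alpha*V_k)) * (1 + Y_k) = sum_k Y_k
    for alpha. Requires at least one nonzero count (otherwise the score
    has no positive root and the MLE degenerates to alpha = 0)."""
    Y = np.asarray(Y, dtype=float)
    V = np.asarray(V, dtype=float)

    def score(alpha):
        return np.sum(alpha * V / (1.0 + alpha * V) * (1.0 + Y)) - Y.sum()

    return brentq(score, 1e-12, 1e12)

# With constant volumes the root matches the closed form sum(Y) / (N*V)
alpha_hat = fit_poisson_exponential_alpha([0, 1, 3, 0, 2], [100.0] * 5)
```

For the constant-volume example above, the root agrees with the closed-form solution 6/(5 × 100) derived in the next paragraph of the text.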
In the special case of constant volume measurements, Vk = V, this becomes

$$\frac{\alpha V}{1+\alpha V}\sum_{k=1}^{N}(1+Y_k) - \sum_{k=1}^{N} Y_k = 0$$

which leads to

$$\alpha = \frac{1}{NV}\sum_{k=1}^{N} Y_k$$

Notice that the maximum likelihood solution for the Poisson-exponential with constant-volume samples is identical to that for the simple Poisson with constant-volume samples, that is,

$$\alpha(\text{Poisson}) = \frac{\sum_{k=1}^{N} Y_k}{\sum_{k=1}^{N} V_k} = \frac{1}{NV}\sum_{k=1}^{N} Y_k \quad \text{for constant } V$$
Approximate Bayesian Parametric Bootstrap for Confidence Bounds
MLE solution
Start with the maximum likelihood solution, θMLE. Use Bayesian posterior densities based on diffuse priors for the parameters of the mixing distributions to characterize uncertainty, θ* ∼ f(ϕ|Data), in which ϕ is the vector of distributional parameters for the posterior density.
Parametric bootstrap
Draw k = 1, 2, ..., N parametric bootstrap samples, first from the fitted mixing distribution and then from the Poisson count distribution:

$$C_k^* \leftarrow f(\phi_{MLE} \mid \text{Data}), \qquad Y_k^* \leftarrow \text{Poisson}(V_k C_k^*)$$

in which ← means to draw from the distribution. Find the maximum likelihood solution for the simulated dataset:
$$\theta_{MLE}^* = \underset{\theta^* \in \Theta}{\arg\max}\left\{\sum_{k=1}^{N} \log p\left(\theta^* \mid Y_k^*, V_k\right)\right\}$$
Use the new estimates of the mixing distribution parameters to calculate the mean concentration for that dataset. For R random draws from the uncertainty distributions of the mixing distribution parameters, there will be R estimates of the mean fiber concentration. The estimate of the sample UCL is taken as the 95th percentile of the empirical distribution of the R means.
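For the Poisson-exponential mixing model with constant volume, this procedure collapses to a few lines, since the refit is closed-form. The sketch below (assuming NumPy; the function name is ours, and plugging the MLE in for the mixing draw is a simplification of the approximate Bayesian posterior draw described above) illustrates the mechanics:

```python
import numpy as np

def pexp_bootstrap_ucl(alpha_mle, V, N, R=4000, seed=0):
    """Parametric bootstrap UCL for a Poisson-exponential mixing model with
    constant volume V. Each replicate draws C*_k ~ Exponential(scale=alpha),
    then Y*_k ~ Poisson(V * C*_k), refits alpha (closed form sum(Y)/(N*V),
    which is also the mean of the refit exponential mixing distribution);
    the UCL is the 95th percentile of the R refit means."""
    rng = np.random.default_rng(seed)
    means = np.empty(R)
    for r in range(R):
        C = rng.exponential(scale=alpha_mle, size=N)  # mixing-distribution draw
        Y = rng.poisson(V * C)                        # Poisson count draw
        means[r] = Y.sum() / (N * V)                  # refit mean concentration
    return np.percentile(means, 95.0)
```

With α = 0.01 s/cc, V = 100, and N = 20, the resulting UCL comes out roughly 50% above the point estimate; the authors' application instead draws the mixing parameters from a diffuse-prior posterior, which further widens the UCL for weak datasets.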