This article was downloaded by: [University of New Mexico] On: 23 November 2014, At: 08:33 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Human and Ecological Risk Assessment: An International Journal Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/bher20 Estimation of the Upper Confidence Limit on the Mean of Datasets with Count-Based Concentration Values William Brattin a , Timothy Barry b & Stiven Foster c a SRC, Inc. , Denver , CO , USA b U.S. Environmental Protection Agency, Office of Policy, Economics, and Innovation , Washington , DC , USA c U.S. Environmental Protection Agency , Office of Solid Waste and Emergency Response , Washington , DC , USA Published online: 16 Mar 2012. To cite this article: William Brattin , Timothy Barry & Stiven Foster (2012) Estimation of the Upper Confidence Limit on the Mean of Datasets with Count-Based Concentration Values, Human and Ecological Risk Assessment: An International Journal, 18:2, 435-455, DOI: 10.1080/10807039.2012.652469 To link to this article: http://dx.doi.org/10.1080/10807039.2012.652469 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. 
Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms &


Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Human and Ecological Risk Assessment, 18: 435–455, 2012
Copyright © Taylor & Francis Group, LLC
ISSN: 1080-7039 print / 1549-7860 online
DOI: 10.1080/10807039.2012.652469

STATISTICAL MODELS AND METHODS ARTICLE

Estimation of the Upper Confidence Limit on the Mean of Datasets with Count-Based Concentration Values

William Brattin,1 Timothy Barry,2 and Stiven Foster3

1SRC, Inc., Denver, CO, USA; 2U.S. Environmental Protection Agency, Office of Policy, Economics, and Innovation, Washington, DC, USA; 3U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response, Washington, DC, USA

ABSTRACT

Mathematical approaches are not well established for calculating the upper confidence limit (UCL) of the mean of a set of concentration values that have been measured using a count-based analytical approach such as is commonly used for asbestos in air. This is because the uncertainty around the sample mean is determined not only by the authentic between-sample variation (sampling error), but also by random Poisson variation that occurs in the measurement of sample concentrations (measurement error). This report describes a computer-based application, referred to as CB-UCL, that supports the estimation of UCL values for asbestos and other count-based sample sets, with special attention to datasets with relatively small numbers of samples and relatively low counts (including datasets with all-zero count samples). Evaluation of the performance of the application with a range of test datasets indicates the application is useful for deriving UCL estimates for datasets of this type.

Key Words: UCL, asbestos, CB-UCL.

Received 1 November 2010; revised manuscript accepted 7 January 2011.

This project was sponsored by the U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response. SRC, Inc. is a contractor for USEPA. The authors declare there are no specific potential competing financial interests as a consequence of employment. The opinions expressed in this report are not necessarily those of the U.S. Environmental Protection Agency.

Address correspondence to Stiven Foster, U.S. Environmental Protection Agency, Ariel Rios Building, 1200 Pennsylvania Avenue, N.W., Mail Code: 5103T, Washington, DC 20460, USA. E-mail: [email protected]


INTRODUCTION

Human health risk from exposure to most environmental contaminants is related to the average exposure concentration, which is generally estimated by collecting a set of samples that are representative over space and time, and computing the mean of the sample set. However, because the sample mean is only an estimate of the true mean, and the true mean may be either higher or lower than the sample mean, the U.S. Environmental Protection Agency (USEPA) calls for use of the 95% upper confidence limit (UCL) of the sample mean rather than the sample mean itself when performing quantitative risk calculations (USEPA 1989). This approach ensures that there is only a small probability (5%) that the exposure concentration used in risk calculations will be lower than the true mean, which in turn helps limit the likelihood of making a false negative risk management decision (deciding that an exposure is safe when it is not).

The most appropriate statistical approach for calculating the UCL of a dataset depends on the nature of the distribution from which samples are drawn. As summarized in USEPA (1992), approaches are well known for computing the UCL of datasets that are known or assumed to be normal (Nelson 1982) or lognormal (Land 1971; Crow and Shimizu 1988). However, because not all datasets are well characterized by normal or lognormal distributions, the USEPA has developed a software application called ProUCL (USEPA 2007a) to help risk assessors determine the most appropriate approach and derive the most appropriate UCL value for use. In brief, the user provides a dataset of concentration values to the application, and ProUCL estimates a number of UCL values for the data using a series of alternative strategies. These include fitting three parametric distributions (normal, lognormal, and gamma) to the data by the method of maximum likelihood estimation (MLE) and estimating the UCL from the MLE parameters, as well as several non-parametric approaches that utilize the Chebyshev inequality or various types of bootstrapping. From the set of alternative UCL estimates, the ProUCL application then identifies which result is recommended for use.

Unfortunately, ProUCL is not well suited for calculation of UCL values for analytes such as asbestos where concentration values are estimated using a count-based analytical method. In the case of asbestos in air (the exposure medium of primary health concern for asbestos), samples for analysis are obtained by drawing air through a filter, and then examining the filter under a microscope to determine the number of asbestos structures on the filter. The concentration of asbestos in the sample is estimated as C = Y/V, in which C is the concentration (asbestos counts per unit volume), Y is the observed fiber count, and V is the volume of air that passed through the area of filter that was examined. Datasets derived by count-based analytical methods can be problematic for ProUCL because:

1. In count-based analyses, all samples with a count of zero are bona fide observations, and must be evaluated as such (Cameron et al. 2007; Haas et al. 1999; USEPA 1999). However, datasets with one or more samples with a count of zero (and hence a concentration value of zero) prohibit ProUCL from evaluating either gamma or lognormal fits.

2. ProUCL assumes the data to be evaluated are drawn from a continuous distribution. However, in the case of concentration values derived by count-based methods, observed concentrations are not continuous, but must take on discrete values (0/V, 1/V, 2/V, etc.).

3. ProUCL assumes that the only source of variation among samples in a dataset is authentic spatial or temporal variation in the true concentration of the samples (i.e., sampling variation). However, in the case of asbestos samples, an additional source of variation arises because each estimate of concentration is based on an observed count that is a Poisson random variable: Y(observed) ∼ Poisson[C(true) · V].

The discrete nature of the “Poisson-filtered” dataset, as well as the increased variance in the data due to random variation in the observed count (Poisson “counting error”), combine to alter the mathematics by which the UCL is calculated. In special cases where there are no zero-count samples in the dataset and where the coefficient of variation due to Poisson counting error is small compared to the coefficient of variation in the underlying distribution, Poisson counting error is likely to have minimal effect, and the ProUCL application is likely to work approximately as well on asbestos datasets as on other datasets. However, when samples with zero counts are present and/or the coefficient of variation from analytical variability approaches or exceeds the coefficient of variation due to sampling variability, the effect of Poisson counting error generally cannot be ignored.
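The comparison of the two coefficients of variation can be made explicit with the law of total variance. The short derivation below follows from the model already stated, Y ∼ Poisson(C · V); the symbols \hat{C} (the concentration estimate Y/V), \mu_C, and \sigma_C^2 (mean and variance of the true concentration distribution) are notation introduced here for illustration and do not appear in the original.

```latex
\begin{aligned}
\hat{C} &= Y/V, \qquad Y \mid C \sim \mathrm{Poisson}(C\,V) \\
\mathrm{Var}(\hat{C})
  &= \mathrm{Var}\big(E[\hat{C}\mid C]\big) + E\big[\mathrm{Var}(\hat{C}\mid C)\big]
   = \sigma_C^2 + \frac{\mu_C}{V} \\
\mathrm{CV}^2(\hat{C})
  &= \frac{\sigma_C^2}{\mu_C^2} + \frac{1}{\mu_C V}
\end{aligned}
```

The second term is the reciprocal of the expected count, so under this model the Poisson contribution to the squared coefficient of variation grows as counts fall. For an expected count of 0.2, for example, it equals 5, which corresponds to a coefficient of variation of about 2.2 from counting error alone.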

Based on our experience at several Superfund¹ sites where asbestos is an environmental contaminant of potential concern, most asbestos datasets for air exhibit relatively low mean counts and contain at least some samples with zero counts, and the coefficient of variation due to Poisson error is not small compared to the coefficient of variation in the underlying sampling distribution. Consequently, the effect of Poisson measurement error usually cannot be ignored, and a new approach for estimating the UCL of this type of asbestos dataset is needed.

To this end, the objective of our efforts was to develop and test a computer-based application for estimating UCL values for asbestos datasets, with special attention to statistically “weak” datasets with small to moderate sample sizes and relatively low mean counts, including datasets with all zero counts. This is a particularly challenging objective, since estimating statistics for weak datasets is always inherently difficult. For asbestos datasets, this difficulty is compounded by the effect of random Poisson counting error. In addition, we considered it essential that the computer-based application could be easily and reliably implemented by risk assessors without special expertise in statistics or computer programming.

¹Editor’s note: Superfund sites are uncontrolled hazardous waste sites in the United States that are designated for remediation under the authorities of the Federal Comprehensive Environmental Response, Compensation, and Liability Act, as amended (also called the Superfund Act).

DESCRIPTION OF THE APPLICATION

Overview

The application that we have developed implements an analysis strategy that is generally similar to that used by ProUCL. That is, the application seeks to fit a number of alternative models to the data and to determine the model that is most consistent with the data. Then, the UCL is calculated based on that model. However, all model fitting procedures and UCL calculations include the effect of random Poisson counting error. For this reason, we have called the application CB-UCL (Count-Based UCL). The main components of the CB-UCL application are described below.

Data Input

The dataset that is provided to the application consists of a set of N paired fiber counts and volumes, {Yk, Vk} for k = 1, 2, ..., N. As noted above, counts of zero are valid and are entered as such, and Vk is the volume of air that passed through the area of filter examined. Volume is usually expressed in units of cubic centimeters (cc), but may be expressed in any unit that is convenient. Fiber counts are usually based on structures that satisfy phase-contrast microscopy (PCM) or transmission electron microscopy PCM-Equivalent (PCME) counting rules, but may be based on any other counting rules of interest.

Module 1: Evaluate Data Adequacy

In Module 1, the dataset is evaluated to determine if the data are adequate to support fitting alternative distributions to the data. The factors that are considered in this evaluation include (a) the number of independent samples in the dataset and (b) the number of samples that have one or more fibers detected. Although the choice of what constitutes an “adequate” dataset is a matter of judgment, based on experience gained during testing of various trial datasets, the criteria for proceeding to a data-fitting phase have been set to the following:

• Minimum dataset size = 5
• Minimum number of samples with non-zero fiber counts = 3

If both of these criteria are satisfied, the data are then processed through implementation of Module 2. If either or both criteria are not satisfied, Module 3 is implemented.
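The two adequacy criteria amount to a simple routing rule. The following is a minimal sketch of that rule; the function names `is_adequate` and `choose_module` are hypothetical and are not taken from the CB-UCL application.

```python
def is_adequate(counts, min_samples=5, min_nonzero=3):
    """Return True if the dataset meets both Module 1 criteria.

    counts: list of observed fiber counts, one per sample (zeros allowed).
    The default thresholds are the values stated in the text.
    """
    n_samples = len(counts)
    n_nonzero = sum(1 for y in counts if y > 0)
    return n_samples >= min_samples and n_nonzero >= min_nonzero


def choose_module(counts):
    """Route the dataset: Module 2 if adequate, otherwise Module 3."""
    return 2 if is_adequate(counts) else 3
```

Note that an all-zero dataset of any size fails the second criterion and is therefore routed to Module 3.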


Module 2: Data Fitting and UCL Estimation for Adequate Datasets

Module 2A: Fitting

Given a dataset that is adequate, the data are fit to four alternative parametric distributions, as follows:

• Poisson (P)
• Poisson-Exponential (PE)
• Poisson-Gamma (PG)
• Poisson-Lognormal (PLN)

The Poisson distribution is used to account for datasets in which the true concentration values are constant or nearly so, and the only significant source of variation among sample results is random Poisson counting variation. This situation is characterized by data in which the mean and variance are approximately equal.

For many environmental datasets, the variation in the count data is often larger than can be accounted for by ordinary Poisson variability. This is referred to as over-dispersion. In this event, it is common to model the count data as a Poisson mixture in order to accommodate the larger observed variance. In a Poisson mixture, the Poisson parameter is, itself, a random variable. Some commonly used mixing distributions include gamma, lognormal, inverse Gaussian, and generalized inverse Gaussian. For a more complete discussion, see Johnson et al. (1992, 1994).
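A quick way to see whether a set of counts looks Poisson or over-dispersed is the ratio of sample variance to sample mean. The sketch below is only a diagnostic, not part of the CB-UCL fitting procedure, and it assumes that all samples were analyzed with the same volume so that the raw counts are directly comparable; `dispersion_index` is a hypothetical name.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Values near 1 are consistent with pure Poisson variation;
    values well above 1 suggest over-dispersion (a Poisson mixture).
    Assumes equal analyzed volumes for all samples.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when all counts are zero")
    return variance(counts) / m
```

A dataset such as `[0, 0, 1, 0, 12, 0, 3, 0, 0, 25]` gives an index far above 1, the pattern that motivates the mixture models listed above.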

In the CB-UCL application, the PE, PG, and PLN mixing distributions are included to deal with datasets that are over-dispersed. These three different models, along with the Poisson model, allow the application to deal with a wide range of shapes and skewness in the data. Fitting of each model to the data is performed using the method of maximum likelihood estimation (MLE), as detailed in Appendix A.

Module 2B: Estimating the UCL

For datasets that are best characterized as Poisson, a conservative 95% UCL on the mean is given by (Nelson 1982):

$$
UCL = \frac{1}{2\sum_k V_k}\,\operatorname{Inv}\chi^2\!\left(0.95;\ 2\sum_k Y_k + 2\right)
$$

where Invχ² is the inverse of the chi-squared cumulative distribution function.

For datasets that are best characterized by one of the three Poisson mixture distributions (PE, PG, PLN), the UCL is estimated using a Bayesian Parametric Bootstrapping (BPBS) approach. This procedure is detailed in Appendix A, and is summarized in Text Box 1, using the PLN model as an example.
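The Poisson UCL expression given earlier in this subsection can be evaluated without a chi-squared library by using the standard identity P(Poisson(λ) ≤ y) = 1 − F(2λ), where F is the chi-squared cumulative distribution function with 2y + 2 degrees of freedom. Half of the 0.95 chi-squared quantile is therefore the rate λ at which the probability of seeing the observed total count or fewer is 5%. The sketch below finds that rate by bisection; the function names are hypothetical, and the term-by-term Poisson sum is only suitable for modest total counts (up to a few hundred).

```python
import math


def poisson_cdf(y, lam):
    """P(Poisson(lam) <= y), summed term by term (fine for modest y and lam)."""
    term = math.exp(-lam)
    total = term
    for j in range(1, y + 1):
        term *= lam / j
        total += term
    return total


def poisson_ucl(counts, volumes, conf=0.95):
    """Upper confidence limit on the mean concentration for Poisson data.

    Solves P(Poisson(lam) <= sum(counts)) = 1 - conf for lam by bisection,
    which equals 0.5 * InvChi2(conf; 2 * sum(counts) + 2), then divides by
    the total volume analyzed.
    """
    y_total = sum(counts)
    v_total = sum(volumes)
    alpha = 1.0 - conf
    lo, hi = 0.0, 1.0
    while poisson_cdf(y_total, hi) > alpha:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):  # the CDF decreases monotonically in lam
        mid = 0.5 * (lo + hi)
        if poisson_cdf(y_total, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / v_total
```

For an all-zero dataset the rate reduces to −ln(0.05), roughly 3.0, so the UCL is about 3 divided by the total volume analyzed.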


Text Box 1. Example BPBS Procedure for a Poisson-Lognormal Sample

a) Let {µ, σ} be the MLE parameter estimates derived by fitting a PLN model to the data. Assume that uncertainty around these parameter estimates is similar to that of a simple lognormal distribution (Table 1).

b) Utilizing the assumed uncertainty distribution for {µ, σ}, draw a random value for each:

$$
\mu^{*} \leftarrow \mu + \frac{\sigma}{\sqrt{N}}\, t(\nu), \qquad
\sigma^{2*} \leftarrow \frac{\nu \sigma^{2}}{\chi^{2}(\nu)}
$$

where ν is the degrees of freedom (N − 1).

c) Draw a set of k = 1, 2, ..., N concentration values and a set of corresponding random fiber counts:

$$
C_k \leftarrow LN(\mu^{*}, \sigma^{*}) \quad \text{followed by} \quad Y_k \leftarrow \mathrm{Poisson}(C_k V_k)
$$

d) Fit the simulated dataset {Yk, Vk} to a PLN model by maximum likelihood, yielding the new parameter estimates {µ′, σ′}. Estimate the mean from

$$
\text{mean} = \exp\!\left(\mu' + \tfrac{1}{2}\sigma'^{2}\right)
$$

e) Repeat steps b-d many times.¹

f) Estimate the UCL as the 95th percentile of the empirical distribution of the means.

¹Because of the long computational times, the number of iterations used in the CB-UCL application for the PLN fit is only 100. For the Poisson-Exponential, 2,500 iterations are performed. For the Poisson-Exponential, 10,000 iterations are performed.
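Step b of Text Box 1 can be sketched with the Python standard library alone. The code below draws µ* and σ* independently, as the text box describes, building the t and chi-squared variates from normal and gamma draws. The function name and the example values are hypothetical; the PLN maximum likelihood refit in step d is not shown.

```python
import math
import random


def draw_lognormal_parameters(mu_hat, sigma_hat, n, rng):
    """Draw (mu*, sigma*) from the uncertainty distributions in step b.

    mu_hat, sigma_hat: fitted ln-mean and ln-stdev; n: number of samples.
    rng: a random.Random instance, so results are reproducible.
    """
    nu = n - 1
    # chi-squared(nu) is a gamma variate with shape nu/2 and scale 2
    chi2_for_t = rng.gammavariate(nu / 2.0, 2.0)
    # Student t(nu) = standard normal / sqrt(chi-squared(nu) / nu)
    t_draw = rng.gauss(0.0, 1.0) / math.sqrt(chi2_for_t / nu)
    chi2_for_sigma = rng.gammavariate(nu / 2.0, 2.0)

    mu_star = mu_hat + sigma_hat / math.sqrt(n) * t_draw
    sigma_star = math.sqrt(nu * sigma_hat ** 2 / chi2_for_sigma)
    return mu_star, sigma_star


# Illustrative values only: ln-mean -4.0, ln-stdev 1.2, N = 10
rng = random.Random(1)
mu_star, sigma_star = draw_lognormal_parameters(-4.0, 1.2, 10, rng)
concentrations = [rng.lognormvariate(mu_star, sigma_star) for _ in range(10)]
```

Each pass through steps b-d would start from a fresh call to this function, so the spread of the simulated means reflects parameter uncertainty as well as sampling and counting variation.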

An approach similar to that shown in Text Box 1 is used for the PE distribution, where the uncertainty around the MLE parameter estimate α is given by (Martz and Waller 1982):

$$
2N\bar{C}/\alpha \sim \chi^{2}(2N)
$$
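For the Poisson-Exponential case the whole bootstrap loop is short enough to sketch end to end. The code below makes several simplifying assumptions that are not taken from the original: all samples share one volume (so the MLE of the mean concentration reduces to total count divided by total volume), Poisson counts are drawn with Knuth's multiplication method, and the 95th percentile is read directly from the sorted simulated means. All function names are hypothetical, and this is not the CB-UCL implementation.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count by Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def pe_bootstrap_ucl(counts, volume, n_iter=2500, seed=0):
    """Bootstrap 95% UCL on the mean for a Poisson-Exponential model.

    Assumes every sample was analyzed with the same volume.
    """
    rng = random.Random(seed)
    n = len(counts)
    alpha_hat = sum(counts) / (n * volume)  # MLE of the mean for equal volumes
    if alpha_hat == 0:
        raise ValueError("all-zero datasets belong in Module 3")
    boot_means = []
    for _ in range(n_iter):
        # Parameter uncertainty: 2N * alpha_hat / alpha ~ chi-squared(2N),
        # and chi-squared(2N) is a gamma variate with shape N and scale 2
        alpha_star = 2.0 * n * alpha_hat / rng.gammavariate(n, 2.0)
        # Sampling variation, then Poisson counting variation
        conc = [rng.expovariate(1.0 / alpha_star) for _ in range(n)]
        sim = [poisson_draw(c * volume, rng) for c in conc]
        # Refit: with equal volumes the MLE is again total count / total volume
        boot_means.append(sum(sim) / (n * volume))
    boot_means.sort()
    return boot_means[math.ceil(0.95 * n_iter) - 1]
```

With ten samples of volume 100 and counts `[0, 1, 0, 3, 2, 0, 5, 1, 0, 2]`, the sample mean concentration is 0.014 and the returned value lies above it, as a UCL should.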

In the case of the gamma distribution, explicit small sample uncertainty distributions are not available for the shape and scale parameters. In this case, we used Metropolis-Hastings/Markov Chain Monte Carlo (MCMC) simulation with vague priors to estimate the uncertainty in the mean (Bolstad 2010; Carlin and Louis 2000; Gelman et al. 2004; Gilks et al. 1996; Ntzoufras 2009).

Given the UCL values for each model, the preferred model fit and associated UCL is identified based on Akaike’s Information Criterion (AIC) (Akaike 1974):

$$
AIC = -2L + 2P
$$

where L = log-likelihood value and P = number of model parameters.

In general, the UCL associated with the lowest AIC is preferred, although any model with an AIC value that does not exceed the minimum AIC value by more than about 2 may also be considered as an acceptable option (Burnham and Anderson 2002).

Table 1. Parameter uncertainty characterizations.

| Distribution | Parameters | Parameter uncertainty | Reference | UCL | References |
|---|---|---|---|---|---|
| Poisson | λ (Poisson rate) | $2\lambda V \sim \chi^2(2Y + 1)$ | Nelson (1982); Martz and Waller (1982) | $UCL = \frac{1}{2\sum V_k}\operatorname{Inv}\chi^2(0.95,\ 2\sum Y_k + 2)$ | Nelson (1982) |
| Lognormal | µ (ln-mean) | $\frac{\hat{\mu} - \mu}{\hat{\sigma}/\sqrt{N}} \sim t(N-1)$ | Box and Tiao (1992) | $UCL = \exp\left(\mu + 0.5\sigma^2 + H\sigma/\sqrt{N-1}\right)$ | Land (1971) |
| | σ (ln-stdev) | $\frac{(N-1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2(N-1)$ | | H = f(σ, N); see USEPA 1992 for a table of values | |
| Gamma | α (shape) | From MCMC with flat vague priors | Gelman et al. (2004); Ntzoufras (2009) | $UCL = 2N\alpha_{bc}\bar{C} / \operatorname{Inv}\chi^2(0.05,\ 2N\alpha_{bc})$ | Grice and Bain (1980); USEPA (2007b) |
| | β (scale) | | | $\alpha_{bc} = (N-3)\alpha_{MLE}/N + 2/(3N)$ | |
| Exponential | α (mean) | $2N\bar{C}/\alpha \sim \chi^2(2N)$ | Nelson (1982); Martz and Waller (1982) | $UCL = 2N\bar{C} / \operatorname{Inv}\chi^2(0.05,\ 2N)$ | Nelson (1982) |
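The AIC selection rule can be sketched in a few lines. In the example below the log-likelihood values are invented purely for illustration, and the function names are hypothetical; the parameter counts follow from the model definitions (one for Poisson and Poisson-Exponential, two for Poisson-Gamma and Poisson-Lognormal).

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: AIC = -2L + 2P."""
    return -2.0 * log_likelihood + 2.0 * n_params


def rank_models(fits, tolerance=2.0):
    """Pick the lowest-AIC model and list all models within `tolerance` of it.

    fits: dict mapping model name -> (log_likelihood, n_params).
    Returns (best_name, acceptable_names, scores).
    """
    scores = {name: aic(ll, p) for name, (ll, p) in fits.items()}
    best = min(scores, key=scores.get)
    acceptable = sorted(
        name for name, s in scores.items() if s - scores[best] <= tolerance
    )
    return best, acceptable, scores


# Hypothetical log-likelihoods, for illustration only
example_fits = {
    "P": (-60.0, 1),
    "PE": (-41.2, 1),
    "PG": (-40.9, 2),
    "PLN": (-41.0, 2),
}
best, acceptable, scores = rank_models(example_fits)
```

In this invented example the PE model has the lowest AIC, while PG and PLN fall within 2 units of it and would also be acceptable options; the plain Poisson model is far worse.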

Module 3: Evaluation of “Weak” Datasets

If the dataset is not adequate to support meaningful analysis using Module 2, the data are evaluated using Module 3. Note that this includes datasets that are all “non-detect.” As noted earlier, zero count samples are not true “non-detects,” but are bona fide samples.

Because the data are not sufficient to support any meaningful evaluation of distributional form, it is necessary to make some assumptions in order to proceed. In order to be conservative, this module assumes the underlying distribution is not more strongly skewed than a lognormal distribution, and that the geometric standard deviation (GSD) of the assumed lognormal distribution is not larger than some specified value (GSDmax). Given these assumptions, Module 3 finds the highest value of the log-mean that has no more than a 5% probability of generating a dataset with more counts than the number observed in the dataset being evaluated. This is done using Monte Carlo simulation, as described in Text Box 2, and the result is reported as a conservative upper bound (CUB) on the mean. If the assumptions used to calculate the CUB are conservative, then the UCL is less than the CUB.

Text Box 2. Evaluation of Weak Datasets

a) Given a dataset of counts and volumes, {Yk, Vk}, k = 1, 2, ..., N, determine the total observed fiber count, Y0 = ΣYk.

b) Specify a plausible maximum geometric standard deviation, GSDmax, and a starting log-mean, µ. Calculate σmax = ln(GSDmax).

c) Draw a set of N random concentration values Ck, k = 1, 2, ..., N, from LN(µ, σmax).

d) Draw an observed count for each sample: Yk ← Poisson(Ck · Vk).

e) Calculate the sum of the counts: YT = ΣYk.

f) Repeat steps b-e many times¹, and find the frequency that YT exceeds the observed count Y0. If the frequency is less than 5%, increase the assumed value of µ until the frequency equals 5%. If the observed frequency exceeds 5%, decrease the assumed value of µ until the frequency equals 5%.

g) After finding the value of µ that yields a 5% probability of having total counts equal to the dataset being evaluated, calculate a conservative upper bound (CUB) on the mean from the parameters: CUB = exp(µ5% + 0.5σmax²).

¹For the CB-UCL application described here, 50,000 iterations are employed for initial bracketing, and 100,000 iterations are used for final calculations.
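The search in Text Box 2 can be sketched as a one-dimensional root find. Several choices below are assumptions of this sketch rather than details from the original. First, the 5% event is taken to be a simulated total count at or below the observed total; this convention is chosen because it reproduces the familiar −ln(0.05)/ΣVk bound for an all-zero dataset when GSDmax = 1. Second, instead of drawing Poisson counts, the exact Poisson probability is averaged over simulated concentration sets, which keeps the target function smooth. Third, µ is adjusted by bisection, and far fewer simulations are used than the 100,000 cited in the footnote. All names are hypothetical.

```python
import math
import random


def poisson_cdf(y, lam):
    """P(Poisson(lam) <= y), summed term by term (fine for small y)."""
    term = math.exp(-lam)
    total = term
    for j in range(1, y + 1):
        term *= lam / j
        total += term
    return total


def conservative_upper_bound(counts, volumes, gsd_max, n_sims=2000, seed=0):
    """Sketch of the Module 3 CUB under the assumptions stated above."""
    rng = random.Random(seed)
    y0 = sum(counts)
    sigma_max = math.log(gsd_max)
    # Fixed standard-normal draws, reused for every trial value of mu so the
    # estimated probability decreases monotonically as mu increases
    z_sets = [[rng.gauss(0.0, 1.0) for _ in volumes] for _ in range(n_sims)]

    def prob_at_most_observed(mu):
        total = 0.0
        for z in z_sets:
            rate = sum(
                math.exp(mu + sigma_max * zk) * vk for zk, vk in zip(z, volumes)
            )
            total += poisson_cdf(y0, rate)
        return total / n_sims

    lo, hi = -30.0, 30.0  # bracketing range for the log-mean
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if prob_at_most_observed(mid) > 0.05:
            lo = mid  # observed total still too likely: mu can go higher
        else:
            hi = mid
    mu_5 = 0.5 * (lo + hi)
    return math.exp(mu_5 + 0.5 * sigma_max ** 2)
```

As a sanity check, setting `gsd_max=1.0` removes all between-sample variation, and the result for an all-zero dataset collapses to −ln(0.05) divided by the total volume; larger values of `gsd_max` give a larger bound.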

EVALUATION OF CB-UCL PERFORMANCE

Ideally, the results of the CB-UCL application would be evaluated by comparing the UCL value generated by the application to the “true” UCL value for a series of alternative datasets. However, in this case, we are not aware of any way to calculate the “true” value of the UCL for a Poisson mixture dataset. Consequently, performance of the application was evaluated as described below.

Evaluation of Module 2

In order to evaluate the performance of Module 2, a series of “synthetic” test datasets were generated from a number of specified “truths,” and the UCL returned by the application was compared to the UCL that would have been obtained in the absence of Poisson counting error. The step-wise procedure is summarized in Text Box 3.

Text Box 3. Process for Evaluation of Module 2

a) Specify a “true” distributional form (exponential, gamma, lognormal) for the underlying concentration values, and specify the “true” parameters of the concentration distribution.

b) Specify a sample size N.

c) Draw a concentration dataset {C1, C2, ..., CN} from the specified distribution.

d) Calculate the “true” UCL for that dataset, using the appropriate equation for the distributional form (Table 1).

e) Divide the “true” UCL by the true mean to generate a UCL ratio, Rtrue.

f) Specify the volume analyzed for each sample {V1, V2, ..., VN}. For most tests, all volumes in a dataset were identical.

g) Draw an “observed” count for each sample: Yk ← Poisson(Ck · Vk).

h) Provide the simulated dataset {Yk, Vk} to the application and obtain the UCL estimate.

i) Divide the UCL returned by the application by the true mean to yield a UCL ratio, Rapp.

j) Repeat steps c-i many times (e.g., 400–2000), and generate an empirical cumulative distribution function (cdf) for both Rtrue and Rapp.

k) Repeat steps a-j for a series of alternative “truths”, selecting differing distributional forms, parameters, sample sizes, volumes, etc.

For each specified “truth,” the performance of the application was evaluated by comparing the cdf for Rapp to that for Rtrue. Ideal behavior includes the following:

a) Coverage. In the absence of Poisson error, the cdf for Rtrue is expected to have 95% coverage (95% of all ratio values are ≥1.0). Ideally, the cdf for Rapp will also have approximately 95% coverage in all cases.

b) Effect of Sample Size. A priori, it is expected that, as the size of the dataset decreases, the cdfs of both Rapp and Rtrue should tend to tip to the right, maintaining about 5% of the values equal to or less than 1.

c) Effect of Volume. A priori, it is expected that, as the average volume analyzed decreases, the average count will decrease, and the relative contribution of Poisson measurement error will tend to increase. Thus, the cdf for Rapp should tend to be tipped further to the right compared to the cdf for Rtrue as volume tends to decrease.

[Figure 1: three panels (N = 40, N = 20, N = 10), each plotting Cumulative Probability (0.0 to 1.0) against Ratio (UCL / True Mean) on a logarithmic axis from 0.1 to 100, with curves labeled TRUE, Ybar = 20, Ybar = 2, and Ybar = 0.2.]

Figure 1. Evaluation of Poisson-exponential truths (mean = 0.02). (Color figure available online.)
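The coverage criterion in item a) and the empirical cdfs used throughout the evaluation can be computed directly from a list of simulated UCL ratios. The helper names below are hypothetical and are offered only as a sketch of the bookkeeping, not as the authors' code.

```python
def coverage(ratios):
    """Fraction of UCL / true-mean ratios that are at least 1.0.

    A well-behaved 95% UCL should give a value near 0.95.
    """
    return sum(1 for r in ratios if r >= 1.0) / len(ratios)


def empirical_cdf(ratios):
    """Return sorted ratios paired with cumulative probabilities (i / n)."""
    ordered = sorted(ratios)
    n = len(ordered)
    return [(r, (i + 1) / n) for i, r in enumerate(ordered)]
```

Applying `coverage` separately to the Rtrue and Rapp ratios from the same simulated datasets gives the two numbers being compared in item a).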

[Figure 2: two columns of three panels each. Panel A: CV = 2.0, Shape = 0.25, Scale = 0.08. Panel B: CV = 0.9, Shape = 1.235, Scale = 0.016. Rows correspond to N = 40, N = 20, and N = 10. Each panel plots Cumulative Probability (0.0 to 1.0) against Ratio (UCL / True Mean) on a logarithmic axis from 0.1 to 100, with curves labeled TRUE, Ybar = 20, Ybar = 2, and Ybar = 0.2.]

Figure 2. Evaluation of Poisson-Gamma truths (mean = 0.02). (Color figure available online.)

The results of this evaluation approach are shown in Figure 1 (Poisson-Exponential model), Figure 2 (Poisson-Gamma model), and Figure 3 (Poisson-Lognormal model). In all cases, the true mean of each underlying distribution is set to 0.02. This value is arbitrary, and identical results would be obtained for other means, assuming that the ratio of mean and analytical sensitivity (the product of mean times volume) remains constant.

In each figure, results are shown for three different sample sizes (40, 20, and 10). For PG (Figure 2) and PLN (Figure 3), results are shown for two differing degrees of skewness. In each panel, the cdf for Rtrue is shown by a solid red line labeled TRUE. This line reflects the UCL values that would have been obtained from the series of datasets generated under conditions where Poisson error is absent. Each of the other cdfs in each panel reflects the UCL ratios that were generated using the CB-UCL application (Rapp), with each cdf reflecting the UCLs for datasets with three different volumes analyzed (1,000, 100, or 10), resulting in expected average counts (Ybar) of 20, 2, or 0.2.

[Figure 3: two columns of three panels each. Panel A: CV = 4, GSD = 5.38. Panel B: CV = 8, GSD = 7.71. Rows correspond to N = 40, N = 20, and N = 10. Each panel plots Cumulative Probability (0.0 to 1.0) against Ratio (UCL / True Mean) on a logarithmic axis from 0.1 to 10,000, with curves labeled TRUE, Ybar = 20, Ybar = 2, and Ybar = 0.2.]

Figure 3. Evaluation of Poisson-Lognormal truths (mean = 0.02). (Color figure available online.)

Inspection of these figures reveals the following main observations:

• As expected, decreasing sample size tends to increase uncertainty, with the cdfs for both Rtrue and Rapp tending to become tipped more and more to the right as sample size decreases.

• As expected, the cdf for Rtrue is typically the left-most and steepest of each group. This is because the UCLtrue values are based on datasets that include only sampling variability, and do not include Poisson uncertainty. When Poisson uncertainty is included, in cases of high average count, the Rapp cdf may approach the Rtrue line quite closely. This is because Poisson uncertainty tends to become less important as average count increases. However, as mean count decreases, the effect of Poisson uncertainty increases, and the Rapp cdfs tip further and further to the right.

• In some cases, the low end of the Rapp cdfs also tends to shift somewhat to the right, resulting in an increase in coverage from the target of 95% to some higher value. However, the degree to which the low end is right-shifted is usually relatively minor, which indicates that this application does not tend to generate UCLs that are unduly conservative.

• For gamma and lognormal distributions, increasing skewness (expressed as CV = stdev/mean) tends to increase uncertainty and shift cdfs for both Rtrue and Rapp to the right. This is expected because increasing skewness increases the uncertainty due to random sampling variation.

Evaluation of Module 3

Figure 4 presents conservative upper bound (CUB) values calculated using Module 3 for a number of datasets with differing sample sizes, volumes (all samples were assumed to have the same volume), total counts, and GSDmax values. Results are expressed as a concentration value (asbestos structures per unit volume) rather than a ratio of the CUB to the true mean because each test dataset is consistent with a range of alternative “truths.” Inspection of Figure 4 reveals the following main points:

• CUB values decrease with increasing sample size

• CUB values decrease with increasing sample volume analyzed

• CUB values increase with increasing total count

• CUB values increase with increasing GSDmax

DISCUSSION

Other Approaches Tested

The method that is implemented in the application described earlier was selected only after testing a range of alternative strategies for estimating the UCL of count-based datasets. Other methods that were evaluated but not selected are summarized in Table 2. While most approaches yielded reasonable results when presented with robust datasets (large sample size, high average counts, high detection frequency), these methods did not perform well when presented with relatively weak datasets (low sample size, low average count, low detection frequency). In some cases, the methods became mathematically unstable when the data were weak, and considerable expertise and programming skill were needed to obtain solutions. In other cases, the results (evaluated using the same approach as described above) were inferior to the current application, with cdfs of Rapp that did not have the desired attributes when compared to Rtrue.

Other Distributions Tested

The four distributions included in this application were selected only after evaluation of a wider range of mixing distributions, including two unbounded distributions (inverse Gaussian and generalized inverse Gaussian) and two bounded mixing distributions (beta and Kumaraswamy). After substantial testing, we concluded that it was not necessary to include all of these alternatives in the application, because the datasets provided could usually be well fit by one or more of the four that were selected, and that no meaningful increase in fit quality was achieved with the other distributions. This is perhaps expected for relatively weak datasets, since the data usually are not sufficient to provide any strong distinction between similarly shaped distributions. We did not evaluate a Poisson-Normal distribution because this distribution allows concentration values to take on negative values.

Table 2. Summary of alternative strategies investigated.

Method 1
Name: Bayes MCMC, implemented using WinBUGS (a)
Description: Specify priors for model parameters. Combine priors with data (a set of Yk and Vk pairs) using Bayes MCMC to estimate posteriors on model parameters. For each accepted pair in the posterior, calculate mean from parameters. UCL = 95th of means.
Comments: Works well for strong datasets. For weak datasets, results become strongly dependent on priors (can get nearly any result desired by altering form or parameters of priors). Requires familiarity with WinBUGS and considerable professional judgment to ensure output is reliable.

Method 2
Name: Bayes MCMC, implemented using C code
Description: Similar to Method 1, except the calculations are performed using an executable C program so expertise with WinBUGS is not needed. The priors are assumed to be completely uninformed.
Comments: Works well for strong datasets. For weak datasets, the program may experience numeric problems and UCL values tend to be very high. Obtaining solutions requires expert judgment and considerable skill in modifying the C code.

Method 3
Name: MLE parameter estimation
Description: Fit dataset of Ck = Yk/Vk pairs to an assumed distribution (e.g., LN) using MLE. Estimate the UCL from the resulting MLE parameter estimates using the appropriate equation (Table 1).
Comments: Tends to over-estimate UCL at low counts, underestimate UCL at high counts.

Method 4
Name: Independent Poisson Samples
Description: Treat each sample as uncertain, with 2CkVk ∼ χ2(2Yk + 1). Draw simulated sets of Ck values from chi-square pdfs, and calculate the mean. UCL = 95th percentile of the empiric means.
Comments: Accounts for Poisson uncertainty only, and does not include variability (overdispersion) between the samples.

Method 5
Name: Fitted ChiSquare
Description: Treat each sample as uncertain, with 2CkVk ∼ χ2(2Yk + 1). Draw simulated sets of Ck, fit to assumed distribution (e.g., LN), calculate UCL from the MLE parameters using the appropriate equation (Table 1). Take some percentile of the UCL distribution as the UCL point estimate.
Comments: The optimal percentile depends on average count. For high average count, the optimum is near the 50th percentile. As average count decreases, the optimum percentile decreases, approaching the 5th percentile in some cases.

(a) WinBUGS is a freeware application that provides an interactive Windows version of the BUGS program for Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) techniques. Available from http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml

[Figure 4: six panels of CUB (s/cc) on a log scale from 0.0001 to 10 versus Y(total) from 0 to 8. Panel A: GSD = 10. Panel B: GSD = 20. Each panel set is shown for V = 10, V = 100, and V = 1000. Curves: N = 10, N = 20, N = 40.]

Figure 4. Conservative upper bound (CUB) values for datasets with fewer than three detects.

Performance of Module 2

The results presented in Figures 1 to 3 demonstrate that Module 2 performs reasonably well, yielding UCL cdfs that approach the theoretical distribution (Rtrue) when datasets are robust, and which reflect increasing uncertainty as the effect of Poisson counting error increases. However, it is important to note that Module 2 is based on approximate methods, and that UCL values do not always have ideal characteristics. For example, the ideal coverage for Rapp is exactly 95%, but in many cases, the actual coverage increases to 100% (i.e., no UCL value is lower than the true mean). For practical purposes, this is usually not a significant limitation, so long as the low end of the Rapp cdf is not too far right-shifted from Rtrue (5% at Rtrue = 1.0). Otherwise, the UCLs become overly conservative, which increases the risk of false-positive decision-making.
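The coverage figures quoted above can be read directly off a set of simulated UCL ratios. The following is a minimal sketch, not part of the CB-UCL application: the function name `empirical_coverage` and the example values are hypothetical, and coverage is taken here to mean the fraction of simulated UCLs that are at or above the true mean (ratio of 1.0 or more).

```python
def empirical_coverage(ucl_values, true_mean):
    """Fraction of simulated UCLs at or above the true mean (ratio >= 1.0).

    Hypothetical helper for illustration; an ideal 95% UCL gives about 0.95.
    """
    if not ucl_values:
        raise ValueError("need at least one UCL value")
    hits = sum(1 for ucl in ucl_values if ucl >= true_mean)
    return hits / len(ucl_values)


if __name__ == "__main__":
    # Illustrative values only: four simulated UCLs for a true mean of 0.02.
    example_ucls = [0.01, 0.02, 0.03, 0.05]
    print(empirical_coverage(example_ucls, true_mean=0.02))
```

A coverage of 1.0 corresponds to the case described in the text where no UCL value falls below the true mean.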

Performance of Module 3

Module 3 assumes that the underlying distribution of concentration values is not more skewed than a lognormal, and has a GSD value that does not exceed GSDmax. Ideally, the choice of GSDmax will be based on site-specific experience with other similar datasets where GSD values for PLN fits could be estimated. In cases where no site-specific experience is available, extrapolation of results and experience from other sites may be appropriate. In the absence of any frame of reference, the choice of GSDmax must be based entirely on judgment. In this case, we suggest that GSDmax values of 10 and 20 be investigated, based on our expectation that most datasets are unlikely to be more skewed than this. However, the final choice must be made by site risk assessors and risk managers.

In cases where the CUB for a dataset is at or greater than a level of human health concern, it may be helpful to either collect more samples or to extend the analysis of existing samples to achieve a lower analytical sensitivity (a higher volume) in order to increase the accuracy of the UCL estimate. In cases where the CUB is less than a level of human health concern, further effort to improve on the UCL estimate may not be required.

Samples Analyzed to Constant Count

It is important to recognize that the CB-UCL application assumes that the volume of air analyzed for a sample (Vk) is either a constant or a random variable, and that volume analyzed is not determined by concentration. This assumption will be true if all samples are analyzed to some constant target sensitivity, or if the dataset is composed of samples that have been analyzed to random target sensitivities. However, if samples are analyzed to a constant count, the mathematics of the current application are not appropriate, and this application should not be used for datasets that have a substantial fraction (e.g., >10–20%) of samples that were counted to a specified number of counts.


Application to Other Count-Based Data Types

The approach we have developed is applicable not only to asbestos (the main focus of this report), but also to other types of data in which results are subject to significant Poisson counting variation (e.g., low levels of microorganisms in water).

OBTAINING AND USING THE APPLICATION

The CB-UCL application and a User’s Guide may be obtained by submitting a request by e-mail to: [email protected]. In the subject line, please enter the following: “Request CB-UCL.”

In brief, the User’s Guide describes how to enter datasets into input files, and how to provide each data input file to the application for processing. The results of each dataset are provided in a separate output file, using a file name specified by the user. For datasets that are evaluated using Module 2, the output file includes the MLE parameters for each of the alternative distributions and the UCL associated with each distribution, rank ordered by AIC. For datasets evaluated using Module 3, the output includes the value of µ5% and the CUB.

REFERENCES

Akaike H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6):716–23

Bolstad WM. 2010. Understanding Computational Bayesian Statistics. John Wiley & Sons, New York, NY, USA

Box G and Tiao GC. 1992. Bayesian Inference in Statistical Analysis. John Wiley & Sons, Inc, New York, NY, USA

Burnham KP and Anderson DR. 2002. Model Selection and Multimodel Inference. Springer, New York, NY, USA

Cameron AC and Trivedi PK. 2007. Regression Analysis of Count Data. Cambridge University Press, New York, NY, USA

Carlin BP and Louis TA. 2000. Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall/CRC, New York, NY, USA

Crow EL and Shimizu K. 1988. Lognormal Distributions. Theory and Applications. Marcel Dekker, New York, NY, USA

Gelman A, Carlin JB, Stern HS, et al. 2004. Bayesian Data Analysis. Chapman and Hall/CRC Press, New York, NY, USA

Gilks WR, Richardson S, and Spiegelhalter DJ. 1996. Markov Chain Monte Carlo in Practice. Chapman and Hall, New York, NY, USA

Grice JV and Bain LJ. 1980. Inferences concerning the mean of the gamma distribution. J Am Stat Assoc 75(372):929–33

Haas CN, Rose JB, and Gerba CP. 1999. Quantitative Microbial Risk Assessment. John Wiley & Sons, New York, NY, USA

Haight FA. 1967. Handbook of the Poisson Distribution. John Wiley & Sons, Inc, New York, NY, USA

Izsak RR. 2007. Maximum likelihood fitting of the Poisson lognormal distribution. Environ Ecol Stat 15:143–56

Johnson NL, Kotz S, and Kemp A. 1992. Univariate Discrete Distributions. John Wiley & Sons, Inc, New York, NY, USA

Johnson NL, Kotz S, and Balakrishnan N. 1994. Continuous Univariate Distributions, vol 1. John Wiley & Sons, New York, NY, USA

Land CE. 1971. Confidence intervals for linear functions of the normal mean and variance. Annals Math Stat 42(4):1187–205

Martz HF and Waller RA. 1982. Bayesian Reliability Analysis. John Wiley & Sons, Inc, New York, NY, USA

Nelson WB. 1982. Applied Life Data Analysis. Wiley Interscience, Hoboken, NJ, USA

Ntzoufras I. 2009. Bayesian Modeling Using WinBUGS. John Wiley & Sons, Inc, New York, NY, USA

Pawitan Y. 2001. In All Likelihood: Statistical Modeling and Inference Using Likelihood. Oxford Press, Oxford, UK

USEPA (US Environmental Protection Agency). 1989. Risk Assessment Guidance for Superfund (RAGS), vol I. Human Health Evaluation Manual (Part A). EPA/540/1-89/002. Office of Solid Waste and Emergency Response, Washington, DC, USA. Publication. December. Available at http://www.epa.gov/oswer/riskassessment/ragsa/

USEPA. 1992. Supplemental Guidance to RAGS: Calculating the Concentration Term. Office of Solid Waste and Emergency Response, Washington, DC, USA. Publication 9285.7-081. May. Available at http://www.deq.state.or.us/lq/pubs/forms/tanks/UCLsEPASupGuidance.pdf

USEPA. 1999. M/DBP Stakeholder Meeting Statistics Workshop Meeting Summary: November 19, 1998, Governor’s House, Washington DC. Final. Report prepared for U.S. Environmental Protection Agency, Office of Ground Water and Drinking Water by RESOLVE, Washington, DC, and SAIC, McLean, VA. EPA Contract No. 68-C6-0059. Available at http://water.epa.gov/lawsregs/rulesregs/sdwa/mdbp/st2nov98.cfm

USEPA. 2007a. ProUCL Version 4.00.02. Users Guide. Prepared for Office of Research and Development, by Lockheed Martin Environmental Services, Las Vegas, NV. Publication EPA/600/R-07/038. Available at http://www.epa.gov/esd/tsc/TSC form.htm

USEPA. 2007b. ProUCL Version 4.0. Technical Guide. Prepared for Office of Research and Development, by Lockheed Martin Environmental Services, Las Vegas, NV. Publication EPA/600/R-07/041. April 2007. Available at http://www.epa.gov/esd/tsc/ProUCL v4.00.02/ProUCL v4.0 Tech Guide.pdf

APPENDIX A: MAXIMUM LIKELIHOOD ESTIMATION (MLE) FITTING OF MODELS TO DATASETS

Poisson Mixtures (Compound Poisson Distributions)

A Poisson mixture model is used to describe a Poisson process in which the Poisson parameter is itself randomly distributed (Haight 1967; Johnson et al. 1992; Haas et al. 1999). A Poisson mixture may be defined by the general probability mass function (pmf)

$$
p_M(y; V, \theta) = \int_0^{\infty} \frac{(CV)^y \exp(-CV)}{y!} \, dM(C; \theta)
$$

in which y is the fiber count in a volume V, C is the mean mass density (concentration), and M is the mixing distribution with parameters θ characterizing the natural heterogeneity in fiber concentrations. For some mixing distributions, the above integral can be evaluated in terms of known functions; for others, the integral must be numerically evaluated. In the following, we summarize the Poisson mixtures used in this analysis.

Poisson-Gamma Mixtures

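If C follows a gamma distribution with scale parameter α and shape parameter β, C ∼ Gamma(α,β), the Poisson-Gamma compound pmf is:

$$
p(y) = \int_0^{\infty} \frac{(CV)^y \exp(-CV)}{y!} \, \frac{\left(\frac{C}{\alpha}\right)^{\beta-1} \exp\left(-\frac{C}{\alpha}\right)}{\alpha \, \Gamma(\beta)} \, dC
$$

which may be integrated (Haas et al. 1999)

$$
p(y) = \frac{\Gamma(y+\beta)}{y! \, \Gamma(\beta)} \left(\frac{\alpha V}{1+\alpha V}\right)^{y} (1+\alpha V)^{-\beta}, \qquad y = 0, 1, 2, \ldots
$$

with mean and variance

$$
E\{Y\} = \alpha\beta V, \qquad \mathrm{Var}\{Y\} = \beta(\alpha V)^2 + E\{Y\}
$$

In the special case of integer shape parameters, the Poisson-gamma reduces to the negative binomial distribution.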
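As a quick numerical check of the closed-form pmf and its moments, the sketch below evaluates the Poisson-gamma pmf on the log scale and sums it directly. The function names `poisson_gamma_pmf` and `summed_moments`, the truncation point `y_max`, and the parameter values in the example are assumptions made for illustration; they are not taken from the CB-UCL application.

```python
import math


def poisson_gamma_pmf(y, volume, alpha, beta):
    """P(Y = y) when C ~ Gamma(scale=alpha, shape=beta) and Y | C ~ Poisson(C * volume)."""
    av = alpha * volume
    log_p = (
        math.lgamma(y + beta) - math.lgamma(y + 1) - math.lgamma(beta)
        + y * math.log(av / (1.0 + av))
        - beta * math.log1p(av)
    )
    return math.exp(log_p)


def summed_moments(volume, alpha, beta, y_max=500):
    """Total probability, mean, and variance from a direct sum over y = 0..y_max."""
    probs = [poisson_gamma_pmf(y, volume, alpha, beta) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


if __name__ == "__main__":
    # Illustrative parameters: alpha * beta * V = 2 expected counts.
    total, mean, var = summed_moments(volume=100.0, alpha=0.01, beta=2.0)
    print(total, mean, var)  # compare with 1, alpha*beta*V, beta*(alpha*V)**2 + mean
```

Setting `beta = 1.0` in the same function gives the Poisson-exponential special case.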

Poisson-Exponential Mixture

The exponential distribution is a gamma distribution with a shape parameter β = 1. For this Poisson mixture, concentration is modeled as C ∼ Gamma(α,1). From above, the Poisson-exponential pmf is

$$
p(y) = \frac{(\alpha V)^y}{(1+\alpha V)^{y+1}}, \qquad E\{Y\} = \alpha V, \qquad \mathrm{Var}\{Y\} = (\alpha V)^2 + E\{Y\}
$$

Poisson-Lognormal Mixture

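If fiber concentrations follow a lognormal distribution, C ∼ lognormal(µ,σ), the resulting Poisson-lognormal compound distribution may be expressed as

$$
p(y) = \int_0^{\infty} \frac{(CV)^y \exp(-CV)}{y!} \, \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\log C - \mu}{\sigma}\right)^2\right] \frac{dC}{C}
$$

with mean and variance (Crow and Shimizu 1988)

$$
E\{Y\} = V \exp\left(\mu + \tfrac{1}{2}\sigma^2\right), \qquad \mathrm{Var}\{Y\} = E^2\{Y\}\left[\exp(\sigma^2) - 1\right] + E\{Y\}
$$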

The probability mass function for the Poisson-lognormal cannot be expressed in terms of known functions; numerical integration is required. Haas et al. (1999) suggest using Gauss-Hermite quadrature to evaluate p(y), but we found that method insufficiently accurate for maximum likelihood fitting and simulation. To achieve the necessary accuracy for this analysis, we followed the suggestion by Izsak (2007), in which the infinite integration interval is broken up into two integrals, each over the finite interval [0,1]. The second integral is obtained from the part of the range above C = 1 by the substitution C → 1/C:

$$
p(y) = \int_0^{1} \frac{(CV)^y \exp(-CV)}{y!} \, \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\log C - \mu}{\sigma}\right)^2\right] \frac{dC}{C}
+ \int_0^{1} \frac{(V/C)^y \exp(-V/C)}{y!} \, \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{-\log C - \mu}{\sigma}\right)^2\right] \frac{dC}{C}
$$
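The split integral can be evaluated with any quadrature rule on [0,1]. The sketch below uses a plain midpoint rule purely for illustration; the source does not say which rule the authors used, and the function name `poisson_lognormal_pmf`, the grid size `n_grid`, and the example parameters are assumptions. Each term is assembled on the log scale so that large counts do not overflow.

```python
import math


def poisson_lognormal_pmf(y, volume, mu, sigma, n_grid=2000):
    """Approximate P(Y = y) for C ~ lognormal(mu, sigma), Y | C ~ Poisson(C * volume).

    Uses the two-integral form over [0, 1] with a midpoint rule (an assumed,
    simple choice of quadrature; a uniform grid suits moderate mu and sigma).
    """
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    log_y_factorial = math.lgamma(y + 1)
    step = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        u = (i + 0.5) * step  # midpoint, strictly inside (0, 1)
        log_u = math.log(u)
        # First integral uses C = u; second uses C = 1/u, so log C = -log u.
        for log_c in (log_u, -log_u):
            rate = volume * math.exp(log_c)
            z = (log_c - mu) / sigma
            log_term = (
                y * math.log(rate) - rate - log_y_factorial
                - 0.5 * z * z - log_norm - log_u
            )
            total += math.exp(log_term)
    return total * step


if __name__ == "__main__":
    # Illustrative parameters only.
    probs = [poisson_lognormal_pmf(y, volume=2.0, mu=0.0, sigma=0.5) for y in range(61)]
    mean = sum(y * p for y, p in enumerate(probs))
    print(sum(probs), mean, 2.0 * math.exp(0.125))
```

For strongly skewed or very low-concentration cases the integrand becomes sharply peaked near zero, and a finer or adaptive rule would likely be needed.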

Maximum Likelihood Fitting of Poisson Mixtures

There are several techniques for estimating the parameters of Poisson mixtures, including moment matching, regression, and maximum likelihood. For this study, we investigated moment matching and maximum likelihood (Pawitan 2001). Our initial investigation showed that maximum likelihood performed better than matching moments, with moment matching tending to lead to under-estimates of the UCL and providing poor coverage. Consequently, based on these observations, we selected maximum likelihood for parameter estimation.

Given a set of observed fiber counts in known volumes, {Yk,Vk} for k = 1, 2, . . . , N, we estimated the parameters of the mixing distributions by maximum likelihood

$$
\theta_{MLE} = \arg\max_{\theta \in \Theta} \left\{ \sum_{k=1}^{N} \log p(\theta \mid Y_k, V_k) \right\}
$$

Numerical minimization/maximization methods must be used for the Poisson-exponential, Poisson-gamma, and Poisson-lognormal mixtures. However, numerical estimation for the Poisson-exponential mixture involves a one-parameter search rather than two and hence is considerably less difficult. To within a constant, the log-likelihood for the Poisson-exponential sample likelihood is

$$
L(\alpha) \propto \ln(\alpha) \sum_{k=1}^{N} Y_k - \sum_{k=1}^{N} (1 + Y_k) \ln(1 + \alpha V_k)
$$

The maximum likelihood solution is found by setting the derivative of the log-likelihood with respect to α equal to zero, i.e.,

$$
\frac{dL(\alpha)}{d\alpha} \propto \frac{1}{\alpha} \sum_{k=1}^{N} Y_k - \sum_{k=1}^{N} \frac{(1 + Y_k) V_k}{1 + \alpha V_k} = 0
$$

or equivalently,

$$
\sum_{k=1}^{N} \frac{\alpha V_k}{1 + \alpha V_k} (1 + Y_k) - \sum_{k=1}^{N} Y_k = 0
$$


In the special case of constant volume measurements, Vk = V = constant, then

$$
\frac{\alpha V}{1 + \alpha V} \sum_{k=1}^{N} (1 + Y_k) - \sum_{k=1}^{N} Y_k = 0
$$

that leads to

$$
\alpha = \frac{1}{N V} \sum_{k=1}^{N} Y_k
$$

Notice that the maximum likelihood solution for the Poisson-Exponential with constant volume samples is identical to the simple Poisson with constant volume samples, that is,

$$
\alpha(\mathrm{Poisson}) = \frac{\sum_{k=1}^{N} Y_k}{\sum_{k=1}^{N} V_k} = \frac{1}{N V} \sum_{k=1}^{N} Y_k \quad \text{for constant } V
$$
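When volumes differ between samples, the score equation has no closed-form solution, but its left-hand side increases monotonically in α (from minus the total count at α = 0 toward N as α grows), so a one-dimensional root search is enough. The sketch below uses bisection as one simple option; the source does not state which search method the application uses, and the names `score` and `fit_poisson_exponential` are hypothetical.

```python
def score(alpha, counts, volumes):
    """Left-hand side of the 'equivalent' score equation; zero at the MLE."""
    total = 0.0
    for y, v in zip(counts, volumes):
        av = alpha * v
        total += av / (1.0 + av) * (1.0 + y) - y
    return total


def fit_poisson_exponential(counts, volumes, rel_tol=1e-12):
    """Maximum likelihood estimate of alpha by bisection (an assumed solver choice)."""
    if not counts or len(counts) != len(volumes):
        raise ValueError("counts and volumes must be non-empty and equal in length")
    if sum(counts) == 0:
        # With no counts the log-likelihood decreases in alpha, so the
        # maximum sits on the boundary alpha = 0.
        return 0.0
    lo, hi = 0.0, 1.0
    while score(hi, counts, volumes) < 0.0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, counts, volumes) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= rel_tol * hi:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Constant volume: should agree with sum(Y) / (N * V) = 6 / 50.
    print(fit_poisson_exponential([0, 1, 3, 0, 2], [10.0] * 5))
    # Varying volumes: illustrative values only.
    print(fit_poisson_exponential([0, 2, 1, 5], [10.0, 100.0, 50.0, 1000.0]))
```

With constant volumes the search reproduces the closed-form result above, which gives a convenient check on any implementation.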

Approximate Bayesian Parametric Bootstrap for Confidence Bounds

MLE solution

Start with the maximum likelihood solution, θMLE. Use Bayesian posterior densities based on diffuse priors for the parameters of the mixing distributions to characterize uncertainty, θ* ∼ f(ϕ|Data), in which ϕ is the vector of distributional parameters for the posterior density.

Parametric bootstrap

Draw k = 1, 2, . . . , N parametric bootstrap samples, first from the fitted mixing distribution, then from the Poisson count distribution

$$
C_k^* \leftarrow f(\phi_{MLE} \mid \mathrm{Data}), \qquad Y_k^* \leftarrow \mathrm{Poisson}(V_k C_k^*)
$$

in which ← means to draw from the distribution.

Find the maximum likelihood solutions for the simulated dataset

$$
\theta^*_{MLE} = \arg\max_{\theta^* \in \Theta} \left\{ \sum_{k=1}^{N} \log p(\theta^* \mid Y_k^*, V_k) \right\}
$$

Use the new estimates of the mixing distribution parameters to calculate the mean concentration for that dataset. For R random draws from the uncertainty distributions for the parameters of the mixing distribution, there will be R estimates of the mean fiber concentration. The estimate of the sample UCL is taken as the 95th percentile of the empiric distribution of R means.
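The resampling loop can be illustrated for the simplest case, the Poisson-exponential mixture with constant volume, where the refit in each replicate is the closed-form estimate α = ΣYk/(NV) and the mean concentration equals α. This is a deliberately reduced sketch: it resamples from the MLE fit only and omits the posterior-based parameter uncertainty step described under "MLE solution", so it should not be read as the CB-UCL procedure. All function names, the Poisson sampler, the seed, and the example data are assumptions for illustration.

```python
import math
import random


def draw_poisson(rate, rng):
    """Knuth's multiplication method; adequate only for small to moderate rates."""
    limit = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def mle_mean_constant_volume(counts, volume):
    """Closed-form Poisson-exponential MLE of the mean concentration."""
    return sum(counts) / (len(counts) * volume)


def bootstrap_ucl(counts, volume, n_boot=2000, level=0.95, seed=1):
    """Return (alpha_hat, UCL) from a plain parametric bootstrap of the fitted model."""
    rng = random.Random(seed)
    alpha_hat = mle_mean_constant_volume(counts, volume)
    if alpha_hat == 0.0:
        raise ValueError("all counts are zero; the fitted exponential is degenerate")
    means = []
    for _ in range(n_boot):
        simulated = []
        for _ in range(len(counts)):
            conc = rng.expovariate(1.0 / alpha_hat)  # C*_k from the fitted mixing distribution
            simulated.append(draw_poisson(volume * conc, rng))  # Y*_k ~ Poisson(V * C*_k)
        means.append(mle_mean_constant_volume(simulated, volume))  # refit each replicate
    means.sort()
    index = min(n_boot - 1, math.ceil(level * n_boot) - 1)
    return alpha_hat, means[index]


if __name__ == "__main__":
    # Illustrative dataset: 20 samples, volume 100, total count 40.
    example_counts = [0, 1, 3, 0, 2, 4, 1, 0, 2, 5, 3, 1, 0, 2, 6, 1, 2, 3, 0, 4]
    print(bootstrap_ucl(example_counts, volume=100.0))
```

For the two-parameter mixtures the refit inside the loop would be a numerical maximization rather than a closed-form expression.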
