Nonparametric Analysis of Dose-Response Relationships


K. ULM (a)

Institut für Medizinische Statistik und Epidemiologie, Technische Universität München, Munich, Germany

ABSTRACT: A nonparametric method, isotonic regression, is proposed for analyzing a dose-response relationship and for assessing a threshold value. There are several advantages of this method compared to parametric models. No specific form of the relationship (type of model and use of the covariates) is required. The only assumption is monotonicity. Rejection of a specific hypothesis can be based on the result of a permutation test. Several applications (para-aramid, crystalline silica, and PNOC) are presented. In these examples the dose-response relationships are analyzed. Where a relationship is present the existence of a threshold is investigated.

INTRODUCTION

The analysis of the dose-response relationship is an important criterion needed in establishing causality. Furthermore, if a relationship is present the question of consequences needs to be discussed. In occupational medicine the most important question is whether a threshold can be assessed. There are several statistical methods available for analysis of a given set of data. In the case of an ordinal response, several of these methods have been described in the form of a tutorial (Ref. 1). In larger samples different methods tend to exhibit comparable results. In smaller samples, however, as well as for certain outcomes, the methods can lead to differing results (Ref. 2). In a binary response the logistic and the linear regression have been compared using the dose in the original form, and then transformed. The results are not always consistent. Assessment of a threshold value can also be based on a specific statistical model (Ref. 3). It is well known that different models can lead to contradictory results (Ref. 4). Very recently it was shown that even if the same model is applied, the result may depend on the form in which the dose is used (Ref. 5).

In order to obtain a solution that is independent of the form in which the dose is used, a nonparametric method, isotonic regression (Ref. 6), is proposed. First, isotonic regression is described very briefly and subsequently several applications are presented.

(a) Address for correspondence: Prof. Dr. K. Ulm, Institute for Medical Statistics and Epidemiology, Technical University Munich, Ismaninger Straße 22, 81675 Munich, Germany. +49-89-4140-4321 (voice); +49-89-4140-4850 (fax). e-mail: [email protected]

224 ANNALS NEW YORK ACADEMY OF SCIENCES

METHOD

Isotonic Regression

In many epidemiological studies, especially in occupational medicine, a monotonic dose-response relationship is assumed. Nearly all parametric models, for example, logistic regression, are based on this assumption. If only the dose is considered and $p_i$ is the risk associated with dose level $d_i$ ($i = 1, \dots, I$), the assumption of monotonicity is fulfilled if the following relation holds:

$$p_1 \le p_2 \le \dots \le p_I. \tag{1}$$

Using a model of the form

$$F(P(d)) = \alpha + \beta \cdot d,$$

where $F(\cdot)$ is a certain transformation of the response probability $P(d)$, relation (1) always holds, provided $F$ is increasing and $\beta \ge 0$. However, the estimates of $P(d)$ can depend on how $d$ is used in the model, that is, in the original or in a transformed form. Sometimes the dose is grouped into several categories and in this situation the same problems also occur. Therefore, a method independent of the form in which the dose is used has certain advantages. Isotonic regression provides a maximum likelihood estimate for the response probability $P(d)$ that satisfies relation (1) and is independent of the form of the dose. Any monotonically increasing transformation of the dose leads to an identical result.

With only one variable the algorithm is simple. If relation (1) fails to hold for the observed response rates within a pair $(j, j+1)$, both groups are pooled using the sizes $(n_j, n_{j+1})$ as weights:

$$p_j^* = p_{j+1}^* = \frac{p_j \cdot n_j + p_{j+1} \cdot n_{j+1}}{n_j + n_{j+1}}.$$

The step is repeated until relation (1) holds for all adjacent pairs. This method is called the pooling adjacent violators algorithm (Ref. 6).

Proof of a dose-response relationship is based on the likelihood ratio statistic $R$, by comparing the appropriate likelihood function under $H_0$ (all response rates are identical) and under $H_1$ (result of isotonic regression). The large sample distribution of the test statistic is nonstandard and is a weighted sum of $\chi^2$-distributions with different degrees of freedom (Ref. 6). For smaller samples the performance of a permutation test is recommended in order to calculate the appropriate $p$-value (Ref. 2).

The same procedure can also be used to take additional covariates into account. In the case of two covariates the algorithm provided by Dykstra and Robertson can be applied (Ref. 7). In this case the restriction of monotonicity means

$$p_{ij} \le p_{i'j'} \quad \text{with } i \le i' \text{ and } j \le j'.$$

To test the influence of both covariates the likelihood ratio statistic $R$ can again be used. However, if one is interested in the effect of a single covariate, dose for example, given the influence of the other covariate, for example, time since first exposure, no standard test procedure is available. The problem is related to the appropriate number of degrees of freedom. One solution to this problem is again the performance of a permutation test.

If more than one additional covariate is taken into account, no algorithm for estimating the response rates is available. In this case the model has to be specified, for example, an additive isotonic model (Ref. 8).
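The pooling step for a single dose variable, and the likelihood ratio statistic built on it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used for the analyses: the function names (`pava`, `binomial_loglik`, `likelihood_ratio`), the binomial form of the likelihood, and the example counts are all hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence


def pava(rates: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Nondecreasing weighted fit by pooling adjacent violators."""
    # Each block stores (pooled rate, total weight, number of groups pooled).
    blocks: List[tuple] = []
    for rate, weight in zip(rates, weights):
        blocks.append((float(rate), float(weight), 1))
        # Pool while the last two blocks violate p_j <= p_{j+1}.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            p_hi, n_hi, k_hi = blocks.pop()
            p_lo, n_lo, k_lo = blocks.pop()
            n = n_lo + n_hi
            blocks.append(((p_lo * n_lo + p_hi * n_hi) / n, n, k_lo + k_hi))
    fitted: List[float] = []
    for p, _, k in blocks:
        fitted.extend([p] * k)
    return fitted


def binomial_loglik(events, sizes, probs) -> float:
    """Binomial log-likelihood kernel (assumed form; constants omitted)."""
    total = 0.0
    for x, n, p in zip(events, sizes, probs):
        if x > 0:
            total += x * math.log(p)
        if n - x > 0:
            total += (n - x) * math.log(1.0 - p)
    return total


def likelihood_ratio(events, sizes) -> float:
    """R = 2 * (ln L under the isotonic fit - ln L under a common rate)."""
    fitted = pava([x / n for x, n in zip(events, sizes)], sizes)
    pooled = sum(events) / sum(sizes)
    return 2.0 * (binomial_loglik(events, sizes, fitted)
                  - binomial_loglik(events, sizes, [pooled] * len(sizes)))


# Hypothetical data: events and group sizes for five ordered dose groups.
events = [2, 5, 3, 8, 12]
sizes = [50, 50, 50, 50, 50]
fitted = pava([x / n for x, n in zip(events, sizes)], sizes)
R = likelihood_ratio(events, sizes)
print([round(p, 3) for p in fitted], round(R, 2))
```

In this example the second and third groups (rates 0.10 and 0.06) violate relation (1) and are pooled to 0.08. Only the ordering of the dose groups enters the computation, which is why replacing the dose by any increasing transformation leaves the fit unchanged.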


Permutation Test

Based on the observations, a large number of permutations (e.g., $m$ = 10,000) can be analyzed. Each individual is characterized by a data pair, $(d_i, \delta_i)$, $i = 1, \dots, n$, with $d_i$ denoting the dose and $\delta_i$ the status ($\delta_i = 0$ without event and $\delta_i = 1$ with event). For the permutation test this pair is separated and the dose and the status are combined randomly. Within each permutation $H_0$ (equal risk in all dose groups) is considered. Each permutation is analyzed by the test proposed and leads to a test statistic $t_{perm}$. If $t_{obs}$ is the observed value of the test statistic of the original data, the $p$-value is merely the probability that the result of a permutation is equal to $t_{obs}$ or exceeds it:

$$p = \Pr(t_{perm} \ge t_{obs}) = \sum I_{+}(t_{perm} \ge t_{obs}) / m,$$

where the indicator $I_{+}(\cdot)$ equals 1 if the condition holds and 0 otherwise, and the sum runs over the $m$ permutations. Sometimes the numerator and the denominator are increased by unity, which has no impact if $m$ is large. If the $p$-value is less than the predefined significance level $\alpha$, $H_0$ is rejected and a dose-response relationship can be assumed.

This idea can be extended to analyze the influence of a certain covariate, $d$, given the effect of other covariates, $x = (x_1, x_2, \dots, x_n)$. The data are then given in the form $(d_i, x_i, \delta_i)$, $i = 1, \dots, n$. This vector is now separated into two parts, $(d_i)$ and $(x_i, \delta_i)$, that are randomly combined. The same procedure as that described for the univariate situation can be applied.

Estimation of a Threshold Value

The use of isotonic regression can be extended to estimate a threshold value following a proposal of Schell and Singh (Ref. 9). The idea is simply to amalgamate adjacent groups and to compare the corresponding likelihood functions. If the difference between the values of the likelihood functions is too small both groups can be pooled, otherwise a threshold can be assessed.

For example, assume $p_1 < p_2 < \dots < p_I$ are the results of the isotonic regression with $\ln L$ the value of the corresponding likelihood function. In the first step, dose levels 1 and 2 are pooled leading to a response rate of $p_{12}$ with $\ln L_{12}$ the value of the corresponding likelihood function. The difference

$$D = 2 \cdot (\ln L - \ln L_{12})$$

is of interest. If $D$ is small the difference between $p_1$ and $p_2$ can be ignored. The test statistic $D$ should follow approximately a $\chi^2$-distribution with one degree of freedom. If $D$ is less than, for example, 2.71 ($= \chi^2_{1,\,90\%}$), both groups can be amalgamated and the procedure continued. If $D$ exceeds that critical level, dose $d_1$ can be assessed as threshold value.

RESULTS

Para-aramid

The classification of man-made mineral fibers as carcinogenic is still controversial. One type of fiber, para-aramid, is especially under discussion. The main source for classifying this type of fiber is an animal experiment that has been evaluated many times. The latest update is given in TABLE 1.


One of the methods used to analyze this study is logistic regression. Three forms of the dose assignment (dose, log-dose, and index) are applied (see TABLE 2). The best fit is obtained using the index (p = 0.024). However, if dose is used, the hypothesis of an association cannot be accepted at the 5% level (p = 0.053); the result narrowly misses significance. The log-dose assignment also leads to a significant value (p = 0.040). A similar result was obtained by applying the usual test for trend from Cochran and Armitage (Ref. 2).

The data were also analyzed by isotonic regression. The observed response rates already fulfill the criterion of monotonicity and are therefore the result of isotonic regression. The corresponding test statistic yields a value R = 4.99. The p-value obtained from the large sample approximation is slightly above 0.05 (p = 0.058). The hypothesis H0 cannot be rejected. The exact p-value based on the permutation test (m = 10,000 permutations) also exceeds 0.05 (p = 0.111).
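The permutation test described under METHOD can be sketched for the counts in TABLE 1 as follows. This is an illustration of the procedure rather than a reproduction of the analysis: the function names, the plain binomial likelihood, the seed, and the reduced number of permutations are assumptions of this sketch. Under the plain binomial likelihood the statistic evaluates to roughly 4.8, close to but not identical with the reported R = 4.99, so the resulting p-value should not be expected to match the reported one exactly.

```python
import math
import random
from typing import List, Sequence


def pava(rates: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Nondecreasing weighted fit by pooling adjacent violators."""
    blocks: List[tuple] = []  # (pooled rate, total weight, groups pooled)
    for rate, weight in zip(rates, weights):
        blocks.append((float(rate), float(weight), 1))
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            p_hi, n_hi, k_hi = blocks.pop()
            p_lo, n_lo, k_lo = blocks.pop()
            n = n_lo + n_hi
            blocks.append(((p_lo * n_lo + p_hi * n_hi) / n, n, k_lo + k_hi))
    return [p for p, _, k in blocks for _ in range(k)]


def lr_statistic(events: Sequence[int], sizes: Sequence[int]) -> float:
    """R = 2 * (ln L under the isotonic fit - ln L under a common rate)."""
    fitted = pava([x / n for x, n in zip(events, sizes)], sizes)
    pooled = sum(events) / sum(sizes)

    def loglik(probs: Sequence[float]) -> float:
        total = 0.0
        for x, n, p in zip(events, sizes, probs):
            if x > 0:
                total += x * math.log(p)
            if n - x > 0:
                total += (n - x) * math.log(1.0 - p)
        return total

    return 2.0 * (loglik(fitted) - loglik([pooled] * len(sizes)))


def permutation_p_value(events, sizes, m=10_000, seed=0) -> float:
    """Share of random dose/status recombinations with t_perm >= t_obs."""
    rng = random.Random(seed)
    t_obs = lr_statistic(events, sizes)
    # Dose-group index of every individual, in a fixed order.
    group_of = [g for g, n in enumerate(sizes) for _ in range(n)]
    n_events = sum(events)
    hits = 0
    for _ in range(m):
        # Recombine dose and status: draw which individuals carry the events.
        perm_events = [0] * len(sizes)
        for i in rng.sample(range(len(group_of)), n_events):
            perm_events[group_of[i]] += 1
        # Small tolerance so that ties are counted despite rounding error.
        if lr_statistic(perm_events, sizes) >= t_obs - 1e-9:
            hits += 1
    return hits / m


tumors = [1, 1, 1, 4, 3]             # TABLE 1, number of tumors
animals = [137, 133, 132, 137, 92]   # TABLE 1, number of animals
t_obs = lr_statistic(tumors, animals)
p_value = permutation_p_value(tumors, animals, m=2000)
print(round(t_obs, 2), p_value)
```

Drawing the positions of the events at random is equivalent to shuffling the status vector against the fixed dose vector, and it keeps each permutation cheap when events are rare.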

From this analysis one can conclude that there is no statistically significant correlation between dose and tumor rate. However, the results fall only slightly short of statistical significance. IARC concluded (Ref. 10) that an increased incidence of cystic keratinizing squamous-cell carcinomas was reported. The biological significance of these lesions is unclear. There is inadequate evidence from experimental animals for the carcinogenicity of para-aramid fibers.

Silica Dust

The question of whether silica dust itself is carcinogenic is still open. IARC has classified crystalline silica as category 1, which means that it is carcinogenic to humans (Ref. 10). One study that had great impact on this decision is that from Checkoway (Ref. 11). In order to investigate the dose-response relationship, cumulative exposure has been classified into five categories. The lowest category serves as a baseline. The data are presented in TABLE 3.

TABLE 1. Data from para-aramid study (Ref. 10)

dose [· 10^6 F/m3]       0      2.5    25     100    400
number of tumors (a)     1      1      1      4      3
number of animals        137    133    132    137    92

(a) Adenoma, bronchiolo-alveolar without squamous-cell carcinoma.

TABLE 2. Results of the analysis of the para-aramid data (see TABLE 1) with the logistic model and isotonic regression

                              Dose assignment
Type of regression            dose     log(dose + 0.01)   index
logistic, p-value             0.053    0.040              0.024
isotonic, p-value (exact)     0.11


TABLE 3. Result of isotonic regression analyzing the study by Checkoway (Ref. 11) of the association between crystalline silica and lung cancer

Cumulative exposure   lung cancer                            isotonic regression      reduced isotonic regression
(mg/m3 years)         obs   exp     SMR    RR     ln L(H0)   SMR    RR     ln L(H1)   SMR    RR     ln L(H2)
0–0.5                 17    15.25   1.12   1.00   4.27       0.99   1.00   −0.18      1.07   1.00   1.09
0.5–1.1               14    11.73   1.19   1.07   3.52       0.99   1.00   −0.14      1.07   1.00   0.90
1.1–2.1               7     11.41   0.61   0.55   1.76       0.99   1.00   −0.07      1.07   1.00   0.45
2.1–5.0               15    11.30   1.33   1.19   3.77       1.33   1.34   4.24       1.07   1.00   0.96
5.0+                  24    10.20   2.35   2.11   6.03       2.35   2.38   20.53      2.35   2.21   20.53
sum                   77    59.90   1.29          19.34                    24.38                    23.94
R                                                                          10.39                    9.21
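The columns of TABLE 3 can be retraced with a short Python sketch. It rests on two assumptions that are not stated in the text: the ln L columns are taken to be the kernel Σ obs · ln(SMR), which matches the tabulated entries, and the pooling uses the expected counts as weights. The names `pava`, `log_lik`, and the `smr_*` variables are hypothetical.

```python
import math
from typing import List, Sequence


def pava(rates: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Nondecreasing weighted fit by pooling adjacent violators."""
    blocks: List[tuple] = []  # (pooled rate, total weight, groups pooled)
    for rate, weight in zip(rates, weights):
        blocks.append((float(rate), float(weight), 1))
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            p_hi, n_hi, k_hi = blocks.pop()
            p_lo, n_lo, k_lo = blocks.pop()
            n = n_lo + n_hi
            blocks.append(((p_lo * n_lo + p_hi * n_hi) / n, n, k_lo + k_hi))
    return [p for p, _, k in blocks for _ in range(k)]


observed = [17, 14, 7, 15, 24]                   # lung cancer deaths (TABLE 3)
expected = [15.25, 11.73, 11.41, 11.30, 10.20]   # expected deaths (TABLE 3)


def log_lik(smr: Sequence[float]) -> float:
    """Assumed likelihood kernel: sum of obs * ln(SMR) over the categories."""
    return sum(o * math.log(s) for o, s in zip(observed, smr))


# H0: one common SMR for all categories.
smr_h0 = [sum(observed) / sum(expected)] * len(observed)
# H1: isotonic regression of the observed SMRs, weighted by expected counts.
smr_h1 = pava([o / e for o, e in zip(observed, expected)], expected)
# H2: reduced isotonic regression, the four lowest categories amalgamated.
pooled_1_to_4 = sum(observed[:4]) / sum(expected[:4])
smr_h2 = [pooled_1_to_4] * 4 + [smr_h1[4]]

R = 2 * (log_lik(smr_h1) - log_lik(smr_h0))        # isotonic fit against H0
D_step1 = 2 * (log_lik(smr_h1) - log_lik(smr_h2))  # pooling category 4
D_step2 = 2 * (log_lik(smr_h2) - log_lik(smr_h0))  # pooling category 5 as well

print([round(s, 2) for s in smr_h1])
print(round(R, 2), round(D_step1, 2), round(D_step2, 2))
```

With unrounded intermediate values the sketch gives R of about 10.09 and differences of about 0.89 and 9.21, in line with the values discussed for this study up to rounding of the tabulated log-likelihoods.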


A significant trend was found using Poisson regression (Ref. 11). The main problem associated with this type of regression analysis is quantification of dose. In the situation where each category contains a range of exposures, several dose-assignment options are possible. The mean or median of that category, or just the index, are frequently used. These can lead to different results.

The expected values are not given in Reference 11. They are estimated according to the total number of expected values and the relative risks. In the observed estimates for the relative risk, the monotonicity constraint fails between the second (RR = 1.07) and third (RR = 0.55) category. Pooling both categories, an SMR of 0.91 is observed, which is smaller than the risk in the first category. Therefore, the three lowest categories need to be pooled leading to an SMR of 0.99. The fourth category has an SMR of 1.33. The highest category exhibits an SMR of 2.35 based on 24 cases (see TABLE 3). The likelihood ratio statistic for isotonic regression yields a statistically significant value R = 10.09 (≈ 2 · (24.38 − 19.34); p < 0.01).

In order to assess a threshold value, in the first step the fourth category is amalgamated with the three lower categories already pooled by isotonic regression. The value of the corresponding likelihood function is reduced to ln L12 = 23.94. The difference between the values of both likelihood functions leads to D = 0.88. This difference is too small to indicate an increase in the risk from the three lower categories to the fourth category. Therefore, only the fifth category leads to a significant increase in risk. Amalgamating the fifth category as well gives a difference D = 9.20 = 2 · (23.94 − 19.34), which is too large to indicate equal risks. Based on this analysis a threshold value of about 5 mg/m3 years can be assumed.

PNOC

Recently the assessment of a threshold for particles not otherwise classified (PNOC) has been investigated. An analysis of one particular cohort is given below. The data of that cohort are described in TABLE 4 (for details see Ref. 5).

The data have been analyzed by various logistic models leading to different results (Ref. 5). The result is highly dependent on the form in which the dose is used (linear or log-transformed). Without any transformation, a nonsignificant threshold value of 2 mg/m3 is obtained (p = 0.07). Using a log-transformation, a statistically significant threshold of 3.75 mg/m3 is obtained (p < 0.01).

The same set of data has been analyzed by isotonic regression. To simplify the analysis the data are grouped into five-year time categories and dust intervals of 0.5 mg/m3. Without any covariate the likelihood function has a value of 2 · ln L(H0) = −1058.17. Using time as a covariate, this value increases to −1004.62. The difference, R = 53.55, indicates a statistically significant influence of time since first exposure.

TABLE 4. Description of smokers in the cohort (PNOC-study)

                               with CBR          without CBR       total
sample size                    241 (26.2%)       679 (73.8%)       920

median (min–max)
total inhalable dust
concentration (in mg/m3)       4.6 (0.3–12.1)    1.1 (0.2–15)      1.4 (0.2–15)
time since first exposure
(in years)                     28 (6–9)          24 (3–5)          23 (3–51)

If time is used without grouping the fit is only slightly better. Taking into account the possible influence from dust, the value of the likelihood function further increases to 2 · ln L(time, dust) = −954.08. The corresponding likelihood ratio statistic yields a value of R = 50.54, indicating a statistically significant influence from dust based on the permutation test (p < 0.01). The result of the isotonic regression is depicted in FIGURE 1.

In order to assess a threshold value, the dose groups are pooled starting with the two lowest groups. In the first step, the likelihood function is changed to −954.85 (see TABLE 5). The difference of D = 0.77 is too small to indicate a significant increase in risk. The change in risk over the time categories 10–15 and 15–20 years is either too small or the number of individuals in those subgroups is too low. Pooling the next dose group with the two lowest groups, the difference of D = 0.86 again is too small to indicate a threshold. In the next four dose groups up to 3 mg/m3 there is no change in the risk. Therefore, the value of the likelihood function remains the same. Pooling the next three groups (up to 4.5 mg/m3) the difference gives a value D = 0.26. If the next dose group (4.5–5 mg/m3) is aggregated the difference attains a value D = 7.68. This difference is large enough to indicate a significant increase in risk from that dose and beyond. Therefore, based on isotonic regression, a threshold level of 4.5 mg/m3 is obtained.
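The stepwise search for the threshold can be retraced from the −2 · ln L values listed in TABLE 5. The sketch below is a hypothetical helper, not the program used for the analysis: the names `find_threshold`, `BASELINE`, and `AGGREGATION_STEPS` are introduced here, and the critical value 2.71 is the one quoted in the METHOD section.

```python
from typing import List, Optional, Tuple

CRITICAL_VALUE = 2.71  # 90% quantile of chi-square with 1 df, as quoted in the text

BASELINE = 954.08  # -2 ln L of the isotonic fit with time + dust (TABLE 5)

# (upper limit of the aggregated dust range in mg/m3, -2 ln L after aggregation)
AGGREGATION_STEPS: List[Tuple[float, float]] = [
    (1.0, 954.85), (1.5, 955.71), (2.0, 955.71), (2.5, 955.71), (3.0, 955.71),
    (3.5, 955.71), (4.0, 955.97), (4.5, 955.97), (5.0, 963.65),
]


def find_threshold(
    baseline: float,
    steps: List[Tuple[float, float]],
    critical: float = CRITICAL_VALUE,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (threshold, D) for the first step whose increase D exceeds critical.

    The threshold is the upper limit reached before that step; it is None if
    the very first aggregation is already rejected or if no step is rejected.
    """
    previous_limit: Optional[float] = None
    previous_value = baseline
    for upper_limit, value in steps:
        d = value - previous_value  # increase in -2 ln L caused by this step
        if d > critical:
            return previous_limit, d
        previous_limit, previous_value = upper_limit, value
    return None, None


threshold, d_value = find_threshold(BASELINE, AGGREGATION_STEPS)
print(threshold, round(d_value, 2))
```

Applied to the tabulated values, the scan stops when the 4.5–5 mg/m3 group is aggregated (D = 963.65 − 955.97 = 7.68) and returns 4.5 mg/m3, which agrees with the threshold derived in the text.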

FIGURE 1. Isotonic regression analysis of the data presented in TABLE 4. (Axes: years since first exposure, total dust (mg/m3), and P(CBR).)


DISCUSSION

The analysis of a dose-response relationship, as well as the existence and value of a threshold, are of great importance in occupational epidemiology. These results are essential to the safety of the workforce and to the economic aspects. Several statistical methods are available for these analyses. However, the various methods can lead to different results. Some of the models may indicate a significant dose-response relationship, whereas others may fail to do so. The outcome of the analysis can depend on several aspects (form of the relationship and transformation of covariates). The isotonic regression method is independent of these assumptions. The only requirement is monotonicity. However, this seems to be no limitation, since all the parametric models imply this assumption.

There are at least two problems related to isotonic regression. The first concerns the appropriate test statistic. The likelihood ratio statistic, R, can be used. However, the distribution of R under H0 is known only for large samples and with an equal number of observations in each of the various subgroups. This problem can be solved by applying a permutation test. By performing a large number of permutations an adequate p-value is obtained. The other problem is related to the number of covariates used in the analysis. The program presently available can handle only up to two covariates. For more variables an additive model needs to be applied.

Isotonic regression can also be used to assess a threshold value. The main problem here is again related to the appropriate test statistic. In order to give a reasonable solution to that problem, the χ2-distribution is proposed. However, the open question concerns the number of degrees of freedom. In the examples presented, it was possible to assess a threshold value. More work remains to be done in order to use this method.

TABLE 5. Result of the isotonic regression on the data from TABLE 4.

                                        −2 · ln L
no covariate                            1058.17
time                                    1004.62
time + dust                             954.08
aggregating dust categories (mg/m3)
  up to 1.0                             954.85
  up to 1.5                             955.71
  up to 2.0                             955.71
  up to 2.5                             955.71
  up to 3.0                             955.71
  up to 3.5                             955.71
  up to 4.0                             955.97
  up to 4.5                             955.97
  up to 5.0                             963.65


REFERENCES

1. CHUANG-STEIN, C. & A. AGRESTI. 1997. Tutorial in biostatistics: a review of tests for detecting a monotone dose-response relationship with ordinal response data. Statistics in Medicine 16: 2599–2618.

2. ULM, K., F. DANNEGGER & U. BECKER. 1998. Test on trends in binary response. Discussion paper 115, SFB 386. University of Munich.

3. ULM, K. 1999. A statistical method for assessing a threshold in epidemiological studies. Statistics in Medicine 10: 341–349.

4. ULM, K. 1991. On the estimation of threshold values (correspondence). Biometrics 45: 1324–1326.

5. KÜCHENHOFF, H. & K. ULM. 1997. Comparison of statistical methods for assessing threshold limiting values in occupational epidemiology. Computational Statistics 12: 249–264.

6. ROBERTSON, T., F.T. WRIGHT & R.L. DYKSTRA. 1988. Order Restricted Statistical Inference. J. Wiley, New York.

7. DYKSTRA, R.L. & T. ROBERTSON. 1982. An algorithm for isotonic regression tests on multinomial and Poisson parameters: the sharpened restriction. Ann. Statistics 10: 1246–1252.

8. BACCHETTI, P. 1989. Additive isotonic models. J. Am. Stat. Assoc. 84: 289–294.

9. SCHELL, M.J. & B. SINGH. 1997. The Reduced Monotonic Regression Method. J. Am. Stat. Assoc. 92: 128–135.

10. IARC-MONOGRAPHS. Silica, Some silicates, coal dust and para-aramid fibrils. World Health Organization. International Agency for Research on Cancer.

11. CHECKOWAY, H., N.J. HEYER, N.S. SEIXAS et al. 1997. Dose-response associations of silica with nonmalignant respiratory disease and lung cancer mortality in the diatomaceous earth industry. Am. J. Epidemiol. 145: 680–688.
