
Objective assessment of image quality. III. ROC metrics, ideal observers, and likelihood-generating functions

Harrison H. Barrett, Craig K. Abbey, and Eric Clarkson

Department of Radiology, Optical Sciences Center and Program in Applied Mathematics, University of Arizona, Tucson, Arizona 85724-5067

Received October 30, 1997; accepted January 22, 1998; revised manuscript received February 18, 1998

We continue the theme of previous papers [J. Opt. Soc. Am. A 7, 1266 (1990); 12, 834 (1995)] on objective (task-based) assessment of image quality. We concentrate on signal-detection tasks and figures of merit related to the ROC (receiver operating characteristic) curve. Many different expressions for the area under an ROC curve (AUC) are derived for an arbitrary discriminant function, with different assumptions on what information about the discriminant function is available. In particular, it is shown that AUC can be expressed by a principal-value integral that involves the characteristic functions of the discriminant. Then the discussion is specialized to the ideal observer, defined as one who uses the likelihood ratio (or some monotonic transformation of it, such as its logarithm) as the discriminant function. The properties of the ideal observer are examined from first principles. Several strong constraints on the moments of the likelihood ratio or the log likelihood are derived, and it is shown that the probability density functions for these test statistics are intimately related. In particular, some surprising results are presented for the case in which the log likelihood is normally distributed under one hypothesis. To unify these considerations, a new quantity called the likelihood-generating function is defined. It is shown that all moments of both the likelihood and the log likelihood under both hypotheses can be derived from this one function. Moreover, the AUC can be expressed, to an excellent approximation, in terms of the likelihood-generating function evaluated at the origin. This expression is the leading term in an asymptotic expansion of the AUC; it is exact whenever the likelihood-generating function behaves linearly near the origin. It is also shown that the likelihood-generating function at the origin sets a lower bound on the AUC in all cases. © 1998 Optical Society of America [S0740-3232(98)02106-1]

OCIS codes: 110.3000, 110.4280, 110.5010.

1. INTRODUCTION

This is the third in a series of papers [1,2] on the theoretical basis of objective assessment of image quality (OAIQ). Fundamental to this theory is the idea that image quality must be defined by the performance of some observer on some specific task. The tasks of interest can be broadly categorized as classification and estimation. An important special case of classification is binary hypothesis testing. In image analysis it is often necessary to detect some object or abnormality, and the two hypotheses can be described as signal absent and signal present; in that case the task is called signal detection.

Previous papers in this series have considered signal-detection tasks as one possible basis for a definition of image quality. The tasks have, in principle, allowed for a degree of randomness in both signal and background, but the theory has been restricted to linear observers, i.e., ones that calculate a discriminant function that is a linear functional of the data. It is well known that linear observers are suboptimal for detecting random signals, so the theory presented previously does not accurately reflect the maximum achievable performance on such detection tasks. The figure of merit used was a signal-to-noise ratio (SNR) based on first- and second-order statistics of the discriminant function, and there was little discussion of how it related to other possible figures of merit.

This paper also deals with signal detection and other binary hypothesis testing, but the performance assessment is based on a construct from radar theory called the receiver operating characteristic, or ROC curve. The ROC curve depicts the trade-off between the probability of detection (true-positive fraction) and the false-alarm rate (false-positive fraction) as the decision threshold is varied. The area under the ROC curve (AUC) is often advocated as an overall figure of merit for the task.

For a normally distributed discriminant function, the relation between the AUC and the SNR used in previous papers in the series is well understood, but real-world images are seldom normal, and even if we model the image statistics by a multivariate normal (Gaussian) density, with nonlinear discriminants the normality may be lost.

It is well known, and shown in Section 2, that the ideal discriminant function is the likelihood ratio. Since monotonic transformations of the discriminant function do not change a decision, the logarithm of the likelihood ratio, or log likelihood for short, can also be used. A decision strategy based on the likelihood or log likelihood is referred to as the ideal observer. It sets an upper limit to the performance of any observer on a particular detection task and thus gives a measure of the quality of image data that is uncomplicated by consideration of the limitations of humans or other suboptimal observers.

On the other hand, the ideal observer must perform nonlinear operations on the image data in almost all cases, so it has been difficult to compute its performance.

In Section 3 we develop a general theory of the ROC curve that is applicable to an arbitrary linear or nonlinear discriminant function. We derive several new expressions for the AUC, depending on what is known about the statistical properties of the discriminant function.

Then in Section 4 we apply this theory to the ideal observer. First we examine the statistical properties of the log likelihood and demonstrate that there are strong constraints on the form of its probability density functions under the two hypotheses. This leads to definition of the likelihood-generating function, a function of one scalar variable from which all moments of both the likelihood and the log likelihood can be derived under both hypotheses. Moreover, the value of the likelihood-generating function at the origin will be shown to play a key role in determining the AUC.

2. BACKGROUND

A. Image Data

A digital image consists of a set of M real numbers, often called gray levels or pixel values. We can arrange these values as an M × 1 column vector g, with the mth component being the gray level associated with pixel m. We regard each g_m as a continuous random variable and thus g as a continuous random vector. In practice, of course, the gray levels are quantized by analog-to-digital conversion or the discrete number of photons contributing to a pixel value, but these effects are not treated here. In addition, real gray levels may be restricted to positive values, since they correspond to irradiances, but it is nevertheless convenient to take the range of each g_m as (−∞, ∞); the positivity restriction is then contained in the probability density function. We also assume that the sum of the squares of the elements of g is finite, so each image is a vector in an M-dimensional Euclidean space, which we refer to as data space.

An image is related to an object by the action of an imaging system, which can include image-forming elements, image sensors, and image-reconstruction algorithms or other computer processing. We do not consider any of these items here in detail. The viewpoint is that the imaging system delivers an image vector g, which we shall use to perform a specific task, and we need only to know the statistical properties of g to assess the quality of the imaging system.

We have already begun to describe these statistical properties in detail in the first paper in this series (Ref. 1, designated OAIQ I), which dealt with measurement noise and object variability, and in two papers on statistical properties of the expectation–maximization algorithm [3,4].

We shall continue the process in a later paper with a discussion of more general nonlinear reconstruction algorithms and their effects on image quality.

B. Binary Decisions and ROC Curves

For each image we must decide between two hypotheses, H_0 and H_1. For definiteness we refer to these hypotheses as, respectively, signal absent and signal present, but the mathematics applies as well to discrimination between two signals or two object classes. Moreover, we shall allow wide latitude in what is considered to be a signal; the signal is whatever component distinguishes H_1 from H_0.

Initially we impose only two restrictions on the decision strategy: it must be nonrandom (a particular image must always lead to the same decision), and it must not allow equivocation (every image must lead to some decision, either signal present or signal absent). These conditions imply that decision making is equivalent to partitioning the data space. For all data vectors g in one subspace Γ_1 the decision will be that the signal is present, and for its orthogonal complement Γ_0 the decision will be signal absent. Devising a decision strategy is equivalent to defining Γ_1 [5].

The decision strategy can also be expressed as a two-step process: first compute a discriminant function t = θ(g); then compare it to a threshold x. If t > x, choose hypothesis H_1; otherwise choose H_0. This process is equivalent to partitioning data space, since the contours θ(g) = x are the surfaces dividing Γ_0 from Γ_1. In this view the decision strategy amounts to computing a (generally nonlinear) discriminant function θ(g) and comparing it to a threshold.

There are four possible outcomes for each individual decision. If the decision is signal present and it really is present, the decision is a true positive (TP), while a decision of signal present for an image with no signal is a false positive (FP). The conditional probability of a positive decision, given that the signal is actually present, is called the true-positive fraction, or TPF. In the medical literature, TPF is called sensitivity; in the radar literature, it is the probability of detection.

True negatives (TN) and false negatives (FN) and associated fractions (TNF and FNF) are defined similarly. In radar, FPF is called the false-alarm rate. In medicine, TNF (which is the same as 1 − FPF) is called the specificity.

The TPF at threshold x is given by

TPF~x ! 5 Pr~t > xuH1! 5 Ex

`

dtpr~tuH1!, (2.1)

where Pr (t > xuH1) is the probability that t > x andpr (tuH1) is the probability density function of the con-tinuous random variable t; both of these quantities areconditional on hypothesis H1 being true or on the signalactually being present. In general, we use Pr (•) for prob-abilities and pr (•) for probability density functions,though other notations are also introduced for the latteras needed.

The FPF at threshold x is given by

FPF~x ! 5 Pr~t > xuH0! 5 Ex

`

dtpr~tuH0!. (2.2)

The threshold x controls the trade-off between TPF andFPF. Graphically, this trade-off is portrayed by the ROCcurve, which is a plot of TPF(x) versus FPF(x).
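This construction is easy to carry out numerically. The following sketch (our illustration, not part of the paper) sweeps a threshold x over sampled values of a discriminant t to trace out TPF(x) versus FPF(x); the Gaussian scores and all parameter values are assumptions chosen only for the example.

```python
# Minimal sketch (assumptions ours): empirical ROC curve and AUC from samples
# of a discriminant t under the two hypotheses, per Eqs. (2.1) and (2.2).
import numpy as np

rng = np.random.default_rng(0)
t_H0 = rng.normal(0.0, 1.0, 100_000)   # t | H0 (signal absent)
t_H1 = rng.normal(1.5, 1.0, 100_000)   # t | H1 (signal present)

thresholds = np.linspace(-6, 8, 400)
TPF = [(t_H1 > x).mean() for x in thresholds]   # Pr(t > x | H1)
FPF = [(t_H0 > x).mean() for x in thresholds]   # Pr(t > x | H0)

# Area under the ROC curve; the minus sign mirrors Eq. (3.6) below, since
# FPF decreases as the threshold x increases.
auc = -np.trapz(TPF, FPF)
print(f"empirical AUC ~ {auc:.4f}")
```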

C. Optimal Decisions

Most approaches to decision theory start by defining a cost or loss function. For a binary decision problem, the cost is a 2 × 2 matrix C, where element C_ij (i, j = 0 or 1) is the cost of making decision D_i when hypothesis H_j is true. Positive costs will usually be assigned to incorrect decisions and zero or negative costs to correct ones.

The risk is the average value of the cost, but Bayesian and frequentist approaches to decision theory [5–7] differ in how this average is computed. The frequentist would average over many realizations of the data for the same true hypothesis, thereby getting separate risks for the signal-absent and signal-present conditions. The pure Bayesian, on the other hand, would average with respect to (possibly subjective) prior probabilities on the hypotheses, saying nothing about other realizations of the data. This procedure leads to the Bayesian expected loss [6], which is a function of g.

A hybrid approach, used in many books and adopted here, is to average over prior probabilities and data, by use of both Pr(H_j) and pr(g|H_j). These probabilities are intended to be objective ones, readily interpretable in a sampling sense. To interpret Pr(H_1), for example, we envision repeated experiments in which the signal is present in this fraction of the repetitions. Devout Bayesians often object to this interpretation, but it is justified here since we can actually do such experiments, under controlled conditions with specified frequencies of occurrence. Such experiments would form the basis for an empirical measure of image quality, and the quantities computed here should be regarded as long-run, frequentist averages of those empirical measures. In fact, the ROC curve itself is a frequentist concept, even when it applies to a discriminant function derived from a Bayesian viewpoint. The ROC curve is a way of keeping score over repeated trials, and real Bayesians do not keep score.

The double averaging yields a quantity often called the Bayes risk, which can be viewed as a frequentist average of the Bayesian expected loss. The Bayes risk is given by

$$R = \sum_{i=0}^{1}\sum_{j=0}^{1} C_{ij}\Pr(D_i, H_j) = \sum_{i=0}^{1}\sum_{j=0}^{1} C_{ij}\Pr(D_i|H_j)\Pr(H_j). \tag{2.3}$$

Since decision D_i is a certainty for g in region Γ_i, the Bayes risk can also be written as

$$R = \sum_{i=0}^{1}\sum_{j=0}^{1} C_{ij}\Pr(H_j)\int_{\Gamma_i}\mathrm{d}^M g\,\mathrm{pr}(\mathbf{g}|H_j). \tag{2.4}$$

The objective is to minimize R through choice of Γ_0 and Γ_1.

Following van Trees [5], we rewrite Eq. (2.4) as an integral over Γ_1 alone. Since the decision rule does not allow equivocation, Γ_0 and Γ_1 together constitute all of the data space, and we must have

$$\int_{\Gamma_0}\mathrm{d}^M g\,\mathrm{pr}(\mathbf{g}|H_j) + \int_{\Gamma_1}\mathrm{d}^M g\,\mathrm{pr}(\mathbf{g}|H_j) = 1. \tag{2.5}$$

With this condition and a little algebra, Eq. (2.4) becomes

$$R = C_{01}\Pr(H_1) + C_{00}\Pr(H_0) + \int_{\Gamma_1}\mathrm{d}^M g\,\big[C_{11}\Pr(H_1)\mathrm{pr}(\mathbf{g}|H_1) + C_{10}\Pr(H_0)\mathrm{pr}(\mathbf{g}|H_0) - C_{01}\Pr(H_1)\mathrm{pr}(\mathbf{g}|H_1) - C_{00}\Pr(H_0)\mathrm{pr}(\mathbf{g}|H_0)\big]. \tag{2.6}$$

The first two terms are constants, independent of g, and hence do not affect the decision strategy. To minimize the integral, we must choose Γ_1 to include all portions of data space for which the integrand is negative and exclude those for which it is positive. A point g is then in Γ_1 if

$$(C_{00} - C_{10})\Pr(H_0)\,\mathrm{pr}(\mathbf{g}|H_0) > (C_{11} - C_{01})\Pr(H_1)\,\mathrm{pr}(\mathbf{g}|H_1). \tag{2.7}$$

In other words, we make decision D_1 if [8]

$$\frac{\mathrm{pr}(\mathbf{g}|H_1)}{\mathrm{pr}(\mathbf{g}|H_0)} > \frac{(C_{10} - C_{00})\Pr(H_0)}{(C_{01} - C_{11})\Pr(H_1)}. \tag{2.8}$$

The left-hand side of this relation is the likelihood ratio, defined by

$$\Lambda(\mathbf{g}) = \frac{\mathrm{pr}(\mathbf{g}|H_1)}{\mathrm{pr}(\mathbf{g}|H_0)}. \tag{2.9}$$

Thus the optimal discriminant function is the likelihood ratio, and the optimal threshold is

$$x = \frac{(C_{10} - C_{00})\Pr(H_0)}{(C_{01} - C_{11})\Pr(H_1)}. \tag{2.10}$$

We refer to any decision strategy based on computing the likelihood ratio and comparing it to a threshold as the ideal observer. Equivalently, the ideal observer can compute the logarithm of Λ, called the log likelihood λ, and compare it to the logarithm of the threshold. Since the logarithm is a monotonic function, exactly the same decision is reached with either Λ or λ, and the latter is frequently computationally easier.
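As a concrete illustration (ours, not the paper's), the sketch below implements this strategy for the textbook special case of a known signal s in i.i.d. Gaussian noise, where λ(g) reduces to a linear functional of g; the cost matrix and priors are assumed values.

```python
# Hedged sketch of the ideal-observer rule for a known signal s in i.i.d.
# Gaussian noise of variance sigma^2 (an assumed special case). Here
# lambda(g) = s.g/sigma^2 - s.s/(2 sigma^2), compared with the log of the
# optimal threshold of Eq. (2.10).
import numpy as np

def log_likelihood_ratio(g, s, sigma2):
    """lambda(g) for H1: g = s + n versus H0: g = n."""
    return (s @ g) / sigma2 - (s @ s) / (2.0 * sigma2)

def ideal_observer_decision(g, s, sigma2, C, prior_H0, prior_H1):
    """Decide D1 iff Lambda(g) exceeds the threshold x of Eq. (2.10)."""
    x = (C[1][0] - C[0][0]) * prior_H0 / ((C[0][1] - C[1][1]) * prior_H1)
    return log_likelihood_ratio(g, s, sigma2) > np.log(x)

rng = np.random.default_rng(1)
s, sigma2 = np.array([1.0, 0.5, 0.25]), 1.0
C = [[0.0, 1.0], [1.0, 0.0]]                  # unit cost for errors, zero otherwise
g = s + rng.normal(0.0, np.sqrt(sigma2), 3)   # a signal-present image
print(ideal_observer_decision(g, s, sigma2, C, 0.5, 0.5))
```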

3. FIGURES OF MERIT

A. General Considerations and Definitions

The Bayes risk is one possible figure of merit for binary decision problems. If the ideal observer, using data from system A, achieves a lower risk on a particular problem than it would using data from system B, system A can be ranked higher, at least for this task and observer. The risk value is, however, difficult to interpret. There are no generally accepted rules for defining costs, or even any standard units of measure for them.

Moreover, Bayes risk is seldom useful for suboptimal observers. For example, linear discriminants are frequently computationally tractable [9] when we do not have sufficient information to compute the Bayesian discriminant function or the Bayes risk. Another rationale for certain linear observers is that they may be good models

of human observers [10]. In this case the utility of the observer must be assessed by its correlation with psychophysical data.

An alternative figure of merit, applicable to both optimal and suboptimal observers, is the TPF at some specified FPF (or, in radar terminology, the probability of detection at a specified false-alarm rate). This figure of merit is called the Neyman–Pearson criterion. Its computation requires knowledge of the probability law for the discriminant function under the two hypotheses but not priors or costs.

Both the Bayes risk and the TPF at a specified FPF depend not only on the task and the quality of the data but also on the chosen operating point on the ROC curve. The operating point, however, is fairly arbitrary; different users of the image data will assign different costs and priors and hence use different operating points. For this reason, many workers in signal detection and image quality advocate using the entire ROC curve as the quality metric. A common scalar figure of merit is the area under the ROC curve, denoted AUC. AUC varies from 0.5 for a worthless system to 1.0 for a system that allows the task to be performed perfectly.

Since the likelihood ratio is the optimum discriminant function for any particular choice of operating point on the ROC curve, it gives the maximum TPF at any specified FPF, and it can also be shown [11] to maximize the area under the ROC curve. Thus the likelihood observer is ideal with respect to each of these figures of merit.

Another figure of merit, which can be defined for any discriminant function t, is the SNR, given by

$$\mathrm{SNR}_t^2 \equiv \frac{[\langle t\rangle_1 - \langle t\rangle_0]^2}{\tfrac{1}{2}\mathrm{var}_1(t) + \tfrac{1}{2}\mathrm{var}_0(t)}, \tag{3.1}$$

where ⟨t⟩_j denotes the conditional expectation of t, given that H_j is true, and var_j(·) denotes the corresponding conditional variance.

If t is normally distributed under both hypotheses, it is well known (and shown in Subsection 3.D) that SNR_t is related to AUC by

$$\mathrm{AUC} = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{\mathrm{SNR}_t}{2}\right), \tag{3.2}$$

where erf(·) is the error function. Thus in this case there is a simple monotonic relation between AUC and SNR_t, so it does not matter which we adopt as a figure of merit.

We can define another SNR simply by inverting Eq. (3.2):

$$\mathrm{SNR(AUC)} \equiv 2\,\mathrm{erf}^{-1}(2\,\mathrm{AUC} - 1), \tag{3.3}$$

where erf^{-1}(·) is the inverse of the error function. In the literature, SNR(AUC) is often referred to as d_a or d_A. Of course, knowledge of SNR(AUC) allows exact computation of AUC, but it is difficult to compute SNR(AUC) from first principles without normality assumptions. On the other hand, the results of psychophysical studies of human-observer performance are almost universally reported in terms of AUC or SNR(AUC), so methods of computing these metrics for the ideal observer are needed if we wish to see how nearly the human approximates the ideal.
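Both conversions are one-liners; the sketch below (ours) uses SciPy's erf and erfinv as stand-ins for the error function and its inverse.

```python
# Sketch (ours): converting between AUC and SNR via Eqs. (3.2) and (3.3).
import numpy as np
from scipy.special import erf, erfinv

def auc_from_snr(snr):
    """Eq. (3.2): AUC = 1/2 + 1/2 erf(SNR/2), exact for normal t."""
    return 0.5 + 0.5 * erf(snr / 2.0)

def snr_from_auc(auc):
    """Eq. (3.3): SNR(AUC) = 2 erf^{-1}(2 AUC - 1), often called d_a or d_A."""
    return 2.0 * erfinv(2.0 * auc - 1.0)

print(auc_from_snr(2.0))                 # ~0.9214
print(snr_from_auc(auc_from_snr(2.0)))   # recovers 2.0
```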

In the remainder of this section we consider various ways of computing the AUC, depending on what knowledge we have of the statistics of the problem.

B. Discriminant Function with Known Probability Law

Suppose that we are given a discriminant function t = θ(g) and that we know its densities pr(t|H_0) and pr(t|H_1). For notational convenience we define

$$\mathrm{pr}(t|H_j) \equiv p_j(t). \tag{3.4}$$

The area under the ROC curve is given by

$$\mathrm{AUC} = \int_0^1 \mathrm{TPF}\;\mathrm{d(FPF)}, \tag{3.5}$$

where TPF(x) and FPF(x) are given by Eqs. (2.1) and (2.2), respectively. Since FPF is a monotonic function of x, we can change the variable of integration from FPF(x) to x, thus obtaining

$$\mathrm{AUC} = -\int_{-\infty}^{\infty}\mathrm{d}x\,\mathrm{TPF}(x)\,\frac{\mathrm{d}}{\mathrm{d}x}\mathrm{FPF}(x), \tag{3.6}$$

where the minus sign arises since FPF(x) → 1 as x → −∞.

From Eq. (2.2) and Leibniz's rule, we have

$$\frac{\mathrm{d}}{\mathrm{d}x}\mathrm{FPF}(x) = -p_0(x), \tag{3.7}$$

so

$$\mathrm{AUC} = \int_{-\infty}^{\infty}\mathrm{d}x\,p_0(x)\int_x^{\infty}\mathrm{d}t\,p_1(t). \tag{3.8}$$

There are various ways of rewriting this expression. One is to recognize that the cumulative probability distribution function of t under H_1 is given by

$$P_1(x) \equiv \Pr(t < x|H_1) = \int_{-\infty}^{x}\mathrm{d}t\,p_1(t) = 1 - \int_x^{\infty}\mathrm{d}t\,p_1(t). \tag{3.9}$$

Thus

$$\mathrm{AUC} = 1 - \int_{-\infty}^{\infty}\mathrm{d}x\,p_0(x)P_1(x). \tag{3.10}$$

Another form for AUC is obtained by use of the step function to rewrite Eq. (3.8) as

$$\mathrm{AUC} = \int_{-\infty}^{\infty}\mathrm{d}x\int_{-\infty}^{\infty}\mathrm{d}t\,p_0(x)p_1(t)\,\mathrm{step}(t - x). \tag{3.11}$$

With a change of variables y = t − x, we obtain

$$\mathrm{AUC} = \int_{-\infty}^{\infty}\mathrm{d}x\int_{-\infty}^{\infty}\mathrm{d}y\,p_0(x)p_1(y + x)\,\mathrm{step}(y) = \int_0^{\infty}\mathrm{d}y\,[p_0 \star p_1](y), \tag{3.12}$$

where ⋆ denotes a one-dimensional correlation integral. Computation of AUC by this formula thus requires cross-correlating p_0 and p_1 and then integrating the result from 0 to ∞.
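For densities known on a grid, Eq. (3.8) is straightforward to evaluate by quadrature. The sketch below (our example, with assumed Gaussian densities) does this; the result can be checked against Eq. (3.2).

```python
# Numerical sketch of Eq. (3.8): AUC = int dx p0(x) int_x^inf dt p1(t),
# evaluated on a grid for two assumed Gaussian densities.
import numpy as np
from scipy.stats import norm

x = np.linspace(-10, 10, 4001)
p0 = norm.pdf(x, loc=0.0, scale=1.0)           # p0(x)
p1_survival = norm.sf(x, loc=1.5, scale=1.0)   # int_x^inf dt p1(t)

auc = np.trapz(p0 * p1_survival, x)
print(f"AUC = {auc:.4f}")   # here SNR_t = 1.5, so Eq. (3.2) gives ~0.8556
```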

We can also use the signum function, related to the step by

$$\mathrm{step}(x) = \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{sgn}(x), \tag{3.13}$$

so Eq. (3.11) becomes

$$\mathrm{AUC} = \frac{1}{2} + \frac{1}{2}\int_{-\infty}^{\infty}\mathrm{d}x\int_{-\infty}^{\infty}\mathrm{d}t\,p_0(x)p_1(t)\,\mathrm{sgn}(t - x). \tag{3.14}$$

The step function can be expressed in terms of its Fourier transform,

$$\mathcal{F}\{\mathrm{step}(x)\} = \frac{1}{2}\,\delta(\xi) + \mathcal{P}\left\{\frac{1}{2\pi i\xi}\right\}, \tag{3.15}$$

from which we easily find

$$\mathrm{step}(x) = \frac{1}{2} + \frac{1}{2\pi i}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\exp(2\pi i\xi x), \tag{3.16}$$

where 𝒫 indicates that the singular integral must be interpreted as a Cauchy principal value. With Eq. (3.16), Eq. (3.11) becomes

$$\mathrm{AUC} = \frac{1}{2} + \frac{1}{2\pi i}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\int_{-\infty}^{\infty}\mathrm{d}x\int_{-\infty}^{\infty}\mathrm{d}t\,p_0(x)p_1(t)\exp[2\pi i\xi(t - x)] = \frac{1}{2} + \frac{1}{2\pi i}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\,\psi_0(\xi)\psi_1^*(\xi), \tag{3.17}$$

where ψ_j(ξ) is the characteristic function for t under hypothesis H_j. Specifically,

$$\psi_j(\xi) \equiv \langle\exp(-2\pi i\xi t)\rangle_j = \int_{-\infty}^{\infty}\mathrm{d}t\,p_j(t)\exp(-2\pi i\xi t) = \mathcal{F}\{p_j(t)\}. \tag{3.18}$$

C. Discriminant Function with Known Moments

In the derivation in Subsection 3.B we assumed that the densities p_j(t) were known, but we rarely have that much information about a test statistic. Suppose now that we know only a few low-order moments of t under the two hypotheses.

The moments can, of course, be related to derivatives of the characteristic function, but it is somewhat more convenient to use the moment-generating function, defined by

$$M_j(\beta) = \langle\exp(\beta t)\rangle_j = \int_{-\infty}^{\infty}\mathrm{d}t\,p_j(t)\exp(\beta t) = \psi_j\!\left(\frac{i\beta}{2\pi}\right). \tag{3.19}$$

If we think of ξ and β as complex variables, the functions ψ_j(ξ) and M_j(β) are related by a 90° rotation in the complex plane and a scaling of the argument by 2π.

Except for an unconventional sign in the exponent, M_j(β) is the two-sided Laplace transform [12] of p_j(t). The integral in Eq. (3.19) converges for complex β in a strip parallel to and including the imaginary axis. The strip is defined by −c_1 < Re β < c_2, where Re denotes real part and c_1 and c_2 are some positive constants. In Appendix A we show by use of the Cauchy–Riemann conditions that M_j(β) is an analytic function in this strip.

A related function is the cumulant-generating function, which is simply the logarithm of the moment-generating function:

$$L_j(\beta) = \log[M_j(\beta)]. \tag{3.20}$$

Since M_j(β) is analytic at the origin, so is its logarithm, and L_j(β) can therefore be expanded in a Taylor series about the origin (Maclaurin series) as

$$L_j(\beta) = \sum_{n=0}^{\infty}\frac{1}{n!}\,L_j^{(n)}(0)\,\beta^n, \tag{3.21}$$

where L_j^{(n)}(β) denotes the nth derivative of L_j(β). Derivatives of L_j(β) (known as cumulants) are related to derivatives of M_j(β), which in turn are related to moments of t by derivatives of the moment-generating function:

$$\langle t^n\rangle_j = M_j^{(n)}(0). \tag{3.22}$$

The first five coefficients in Eq. (3.21) are

$$L_j^{(0)}(0) = \log[M_j(0)] = 0, \tag{3.23a}$$

$$L_j^{(1)}(0) = \frac{M_j^{(1)}(0)}{M_j(0)} = \langle t\rangle_j \equiv \bar{t}_j, \tag{3.23b}$$

$$L_j^{(2)}(0) = \langle(t - \bar{t}_j)^2\rangle_j \equiv \sigma_j^2, \tag{3.23c}$$

$$L_j^{(3)}(0) = \langle(t - \bar{t}_j)^3\rangle_j \equiv \sigma_j^3 S_j, \tag{3.23d}$$

$$L_j^{(4)}(0) = \langle(t - \bar{t}_j)^4\rangle_j - 3\langle(t - \bar{t}_j)^2\rangle_j^2 \equiv \sigma_j^4 K_j. \tag{3.23e}$$

Here t̄_j, σ_j², S_j, and K_j are, respectively, the mean, variance, skewness, and kurtosis of p_j(t). Different definitions of kurtosis appear in the literature, but with the one used here, K = 0 for a Gaussian.

With these moments, ψ_j(ξ) can be written as

$$\psi_j(\xi) = M_j(-2\pi i\xi) = \exp\!\left[-2\pi i\,\bar{t}_j\xi - 2\pi^2\sigma_j^2\xi^2 + \frac{4\pi^3 i}{3}\,\sigma_j^3 S_j\xi^3 + \frac{2\pi^4}{3}\,\sigma_j^4 K_j\xi^4 + \cdots\right]. \tag{3.24}$$

From Eqs. (3.17) and (3.24), AUC is given by

$$\mathrm{AUC} = \frac{1}{2} + \frac{1}{2\pi}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\,\sin\!\left[2\pi(\bar{t}_1 - \bar{t}_0)\xi - \frac{4\pi^3}{3}(\sigma_1^3 S_1 - \sigma_0^3 S_0)\xi^3 + \cdots\right]\exp\!\left[-2\pi^2(\sigma_0^2 + \sigma_1^2)\xi^2 + \frac{2\pi^4}{3}(\sigma_1^4 K_1 + \sigma_0^4 K_0)\xi^4 + \cdots\right]. \tag{3.25}$$

Care must be used in truncating this expansion. For example, suppose that the term in ξ⁴ has a negative coefficient but the one in ξ⁶ has a positive coefficient; then the integral would converge if terms through ξ⁴ were retained but not with terms through ξ⁶.

D. Univariate Normal Statistics

If p_0(t) and p_1(t) are both univariate normal (not necessarily with the same variance), then the skewness and kurtosis of each are zero, and all higher terms in the expansion (3.24) vanish identically. In that case, the moment-generating and characteristic functions are given by, respectively,

$$M_j(\beta) = \exp(\bar{t}_j\beta + \tfrac{1}{2}\sigma_j^2\beta^2), \tag{3.26}$$

$$\psi_j(\xi) = \exp(-2\pi i\,\bar{t}_j\xi - 2\pi^2\sigma_j^2\xi^2). \tag{3.27}$$

Note that ψ_j(ξ) falls off rapidly as Re ξ → ±∞, but M_j(β) blows up as Re β → ±∞.

From Eq. (3.25) we now have

$$\mathrm{AUC} = \frac{1}{2} + \frac{1}{2\pi}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\,\sin[2\pi(\bar{t}_1 - \bar{t}_0)\xi]\exp[-2\pi^2(\sigma_0^2 + \sigma_1^2)\xi^2]. \tag{3.28}$$

The principal value can be implemented as a limit:

$$\mathrm{AUC} = \frac{1}{2} + \frac{1}{2\pi}\lim_{\epsilon\to 0}\int_{-\infty}^{\infty}\mathrm{d}\xi\,\frac{\xi}{\xi^2 + \epsilon}\,\sin[2\pi(\bar{t}_1 - \bar{t}_0)\xi]\exp[-2\pi^2(\sigma_0^2 + \sigma_1^2)\xi^2]. \tag{3.29}$$

A tabulated integral [13] then yields the error-function relation, Eq. (3.2).
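The limit in Eq. (3.29) is also easy to check numerically. In the sketch below (ours), a small fixed ε regularizes the integrand, and the result is compared with the error-function relation; all parameter values are assumptions.

```python
# Sketch (ours): evaluate the principal-value form, Eq. (3.29), for normal
# statistics and compare it with the error-function relation, Eq. (3.2).
import numpy as np
from scipy.special import erf

t0, t1, s0, s1 = 0.0, 1.5, 1.0, 1.0   # assumed means and standard deviations
eps = 1e-12
xi = np.linspace(-2.0, 2.0, 200_001)  # the Gaussian factor kills larger |xi|

integrand = (xi / (xi**2 + eps)) * np.sin(2*np.pi*(t1 - t0)*xi) \
            * np.exp(-2*np.pi**2*(s0**2 + s1**2)*xi**2)
auc_pv = 0.5 + np.trapz(integrand, xi) / (2*np.pi)

snr = (t1 - t0) / np.sqrt(0.5*s0**2 + 0.5*s1**2)
print(auc_pv, 0.5 + 0.5*erf(snr/2))   # the two values should agree
```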

E. Arbitrary Test Statistic, Unknown Probability Law

When t is a complicated function of g, we may know neither its densities nor its moments. We may, however, know pr(g|H_j) from the basic physics of the image-forming process and from knowledge of the signal and background [14]. In those cases we can express AUC in terms of integrals over g rather than over t.

Even though t is a specific function θ(g), it is useful to regard this function as a probabilistic mapping. As a formal device [15] we can write

$$\mathrm{pr}(t|\mathbf{g}) = \delta[t - \theta(\mathbf{g})], \tag{3.30}$$

so that

$$\mathrm{pr}(t|H_j) = \int\mathrm{d}^M g\,\mathrm{pr}(t|\mathbf{g})\,\mathrm{pr}(\mathbf{g}|H_j) = \int\mathrm{d}^M g\,\mathrm{pr}(\mathbf{g}|H_j)\,\delta[t - \theta(\mathbf{g})]. \tag{3.31}$$

A shorthand form for this expression is obtained if we use the notation of Eq. (3.4) and also define

$$q_j(\mathbf{g}) \equiv \mathrm{pr}(\mathbf{g}|H_j). \tag{3.32}$$

Then Eq. (3.31) is

$$p_j(t) = \int\mathrm{d}^M g\,q_j(\mathbf{g})\,\delta[t - \theta(\mathbf{g})]. \tag{3.33}$$

The delta function defines an (M − 1)-dimensional surface in the M-dimensional data space; all points on this surface have θ(g) = t and hence contribute to the probability density at the same t.

From Eqs. (3.11) and (3.33) we have

$$\mathrm{AUC} = \int_{-\infty}^{\infty}\mathrm{d}x\int_{-\infty}^{\infty}\mathrm{d}t\int\mathrm{d}^M g\,q_0(\mathbf{g})\,\delta[x - \theta(\mathbf{g})]\int\mathrm{d}^M g'\,q_1(\mathbf{g}')\,\delta[t - \theta(\mathbf{g}')]\,\mathrm{step}(t - x). \tag{3.34}$$

The delta functions allow us to perform the integrals over t and x, with the result

$$\mathrm{AUC} = \int\mathrm{d}^M g\int\mathrm{d}^M g'\,q_0(\mathbf{g})q_1(\mathbf{g}')\,\mathrm{step}[\theta(\mathbf{g}') - \theta(\mathbf{g})]. \tag{3.35}$$

This expression demonstrates that AUC is unchanged by a monotonic point transformation. If we replace θ(g) with Θ(g) = h[θ(g)], where h(x) is a monotonically increasing function, then the step function remains unchanged: if step[θ(g′) − θ(g)] = 1 for some set of values of g′ and g, then step[Θ(g′) − Θ(g)] = 1 for precisely this same set.

As in Subsection 3.B, we can represent the step function by means of its Fourier transform and obtain

$$\mathrm{AUC} = \frac{1}{2} + \frac{1}{2\pi i}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\int\mathrm{d}^M g\int\mathrm{d}^M g'\,q_0(\mathbf{g})q_1(\mathbf{g}')\exp\{2\pi i\xi[\theta(\mathbf{g}') - \theta(\mathbf{g})]\} = \frac{1}{2} + \frac{1}{2\pi i}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\,\langle\exp[-2\pi i\xi\theta(\mathbf{g})]\rangle_0\langle\exp[2\pi i\xi\theta(\mathbf{g}')]\rangle_1 = \frac{1}{2} + \frac{1}{2\pi i}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\,\psi_0(\xi)\psi_1^*(\xi). \tag{3.36}$$

The final form of Eq. (3.36) is identical to Eq. (3.17); we have taken advantage of the fact that the expectations can be computed from either the probability density on g or the one on θ(g).

F. Two-Alternative Forced-Choice Interpretations

Equations (3.11) and (3.35) can be interpreted in terms of two-alternative forced-choice (2AFC) experiments. In a 2AFC experiment, two independent data vectors g and g′ are generated, with g drawn from pr(g|H_0) and g′ drawn from pr(g′|H_1). Two test statistics θ(g) and θ(g′) are computed, and the data vector that gives the higher value is assigned to H_1. This assignment is correct if θ(g′) > θ(g). Thus the probability of a correct decision is

$$\Pr(\mathrm{correct}) = \Pr[\theta(\mathbf{g}') > \theta(\mathbf{g})] = \int\mathrm{d}^M g\int\mathrm{d}^M g'\,q_0(\mathbf{g})q_1(\mathbf{g}')\,\mathrm{step}[\theta(\mathbf{g}') - \theta(\mathbf{g})], \tag{3.37}$$

which, by Eq. (3.35), is AUC.

A similar interpretation applies to Eq. (3.11). If we denote the test statistics by x = θ(g) and t = θ(g′), the 2AFC decision is correct if t > x, and Eq. (3.11) gives the probability of this event occurring.

G. Linear Discriminants

Now suppose that t = θ(g) is a linear function of the data, so we can write it as

$$\theta(\mathbf{g}) = \mathbf{w}^t\mathbf{g}, \tag{3.38}$$

where w is an M × 1 vector and w^t g is the scalar product of w and g.

With this form of the test statistic, Eq. (3.35) becomes

$$\mathrm{AUC}_{\mathrm{lin}} = \int\mathrm{d}^M g\int\mathrm{d}^M g'\,q_0(\mathbf{g})q_1(\mathbf{g}')\,\mathrm{step}[\mathbf{w}^t(\mathbf{g}' - \mathbf{g})]. \tag{3.39}$$

The change of variables g″ = g′ − g yields

$$\mathrm{AUC}_{\mathrm{lin}} = \int\mathrm{d}^M g''\int\mathrm{d}^M g\,q_0(\mathbf{g})q_1(\mathbf{g} + \mathbf{g}'')\,\mathrm{step}(\mathbf{w}^t\mathbf{g}'') = \int\mathrm{d}^M g''\,[q_0 \star q_1](\mathbf{g}'')\,\mathrm{step}(\mathbf{w}^t\mathbf{g}''), \tag{3.40}$$

where [q_0 ⋆ q_1](g″) denotes a multidimensional cross-correlation integral with shift g″. This equation shows that the AUC can be found by cross-correlating q_0 and q_1 and then integrating the result over the half-space w^t g″ > 0.

The similarity in form between Eqs. (3.12) and (3.40) should be noted; Eq. (3.12) holds for an arbitrary discriminant function (but requires the probability densities for that function), while Eq. (3.40) holds specifically for a linear discriminant and requires knowledge of the data densities.

With a linear discriminant, we can also relate AUC to the multivariate characteristic functions for g, defined by

$$\Psi_j(\boldsymbol{\rho}) = \int\mathrm{d}^M g\,\mathrm{pr}(\mathbf{g}|H_j)\exp(-2\pi i\boldsymbol{\rho}^t\mathbf{g}), \tag{3.41}$$

where j = 0 or 1 and ρ is an M-dimensional frequency vector conjugate to the data vector g.

From the definition of ψ_j(ξ) in Eq. (3.18), along with Eq. (3.38), we have

$$\psi_j(\xi) = \int\mathrm{d}^M g\,\mathrm{pr}(\mathbf{g}|H_j)\exp(-2\pi i\xi\,\mathbf{w}^t\mathbf{g}) = \Psi_j(\mathbf{w}\xi), \tag{3.42}$$

so that Eq. (3.36) becomes

$$\mathrm{AUC}_{\mathrm{lin}} = \frac{1}{2} + \frac{1}{2\pi i}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\,\Psi_0(\mathbf{w}\xi)\Psi_1^*(\mathbf{w}\xi). \tag{3.43}$$

Thus all we need to know to compute AUC for a linear discriminant is the behavior of the characteristic functions of the data along a line through the origin and parallel to w in the M-dimensional Fourier space. This statement is equivalent to saying that all we need are integrals of the data densities q_0 and q_1 over (M − 1)-dimensional hyperplanes normal to w. With nonlinear discriminants we need integrals over (M − 1)-dimensional surfaces defined by θ(g) = constant.

In practice, however, we can often get by with even less information for computing AUC_lin. A linear discriminant w^t g is univariate normal if g is multivariate normal, so SNR_t is easily computed and the conditions for the use of Eq. (3.2) are exactly satisfied. Moreover, even if g is not normally distributed, we can often appeal to the central-limit theorem and show that w^t g is approximately univariate normal anyway. In summary, it is usually safe to compute AUC from SNR by Eq. (3.2) for a linear discriminant.

4. IDEAL OBSERVER

As we saw in Section 2, the discriminant function for the ideal observer can be either the likelihood ratio Λ(g) or its logarithm λ(g). To compute the ideal-observer AUC with the formalism of Section 3, we need the probability density functions of either Λ or λ under both hypotheses. Equivalently, we can also get AUC from the characteristic functions or moment-generating functions. If these exact specifications prove difficult to get, we can attempt to compute some low-order moments of Λ or λ and relate them approximately to AUC.

With all of these approaches, the ideal observer differs fundamentally from other observers because its discriminant function (whether Λ or λ) already contains all of the relevant statistical information about the task. As we shall see, this fact imposes strong constraints on the forms of the densities, moments, or other statistical descriptors.

We shall illustrate these points first in Subsection 4.A with respect to moments and moment-generating functions and then in Subsection 4.B for the probability density functions. A particularly strong constraint will emerge when we consider normal log likelihoods in Subsection 4.C.

A. Moments of Λ and λ

The likelihood ratio is a ratio of two densities q_1(g) and q_0(g), and the same two densities are the ones needed to compute moments of Λ under the two hypotheses. It follows at once that the moments under H_0 are related to those under H_1 by

$$\langle\Lambda^{k+1}\rangle_0 = \int\mathrm{d}^M g\,q_0(\mathbf{g})\left[\frac{q_1(\mathbf{g})}{q_0(\mathbf{g})}\right]^{k+1} = \int\mathrm{d}^M g\,q_1(\mathbf{g})\left[\frac{q_1(\mathbf{g})}{q_0(\mathbf{g})}\right]^{k} = \langle\Lambda^k\rangle_1. \tag{4.1}$$

In particular, the mean of Λ under H_0 is always 1, since

$$\langle\Lambda\rangle_0 = \langle\Lambda^0\rangle_1 = \int\mathrm{d}^M g\,q_0(\mathbf{g})\,\frac{q_1(\mathbf{g})}{q_0(\mathbf{g})} = \int\mathrm{d}^M g\,q_1(\mathbf{g}) = 1, \tag{4.2}$$

and the variance of Λ under H_0 is easily expressed in terms of the mean under H_1:

$$\mathrm{var}_0(\Lambda) = \langle\Lambda^2\rangle_0 - \langle\Lambda\rangle_0^2 = \langle\Lambda\rangle_1 - 1. \tag{4.3}$$

Moreover, since Λ = e^λ, we can rewrite Eq. (4.1) as

$$\langle\exp[(k + 1)\lambda]\rangle_0 = \langle\exp(k\lambda)\rangle_1. \tag{4.4}$$

Since Eq. (4.4) holds for arbitrary (even complex) k, we see that

$$M_0(\beta + 1) = M_1(\beta), \tag{4.5}$$

where M_j(β) denotes the moment-generating function for λ under H_j. In fact, this property of the moment-generating functions holds only for log-likelihood ratios. Swensson and Green [16] showed that there is no other statistic whose moment-generating functions satisfy Eq. (4.5).

From Eqs. (4.5) and (3.19), the corresponding relation for characteristic functions is

$$\psi_0\!\left(\xi + \frac{i}{2\pi}\right) = \psi_1(\xi). \tag{4.6}$$

Notice that M_0(β) can be used to generate moments of both λ and Λ under both hypotheses. From Eq. (3.22) with t = λ we have

$$\langle\lambda^k\rangle_0 = M_0^{(k)}(0), \tag{4.7}$$

and with Eq. (4.5),

$$\langle\lambda^k\rangle_1 = M_1^{(k)}(0) = M_0^{(k)}(1). \tag{4.8}$$

Moments of Λ are found from M_0(β) even more simply; moments under H_0 are given by

$$\langle\Lambda^k\rangle_0 = \langle\exp(k\lambda)\rangle_0 = M_0(k) \tag{4.9}$$

and under H_1 by

$$\langle\Lambda^k\rangle_1 = \langle\exp[(k + 1)\lambda]\rangle_0 = M_0(k + 1). \tag{4.10}$$

B. Probability Density Functions

Since a probability density function is uniquely determined by the corresponding characteristic function [17], Eq. (4.6) implies that there is a relation between p_0(λ) and p_1(λ). To derive this relation, we write

$$p_1(\lambda) = \mathcal{F}^{-1}\{\psi_1(\xi)\} = \int_{-\infty}^{\infty}\mathrm{d}\xi\,\psi_0\!\left(\xi + \frac{i}{2\pi}\right)\exp(2\pi i\xi\lambda) = e^{\lambda}\int_{-\infty + i/2\pi}^{\infty + i/2\pi}\mathrm{d}z\,\psi_0(z)\exp(2\pi iz\lambda), \tag{4.11}$$

where z = ξ + i/(2π). If ψ_0(z) is analytic in the strip 0 ≤ Im(z) ≤ 1/(2π), we can shift the contour and get

$$p_1(\lambda) = e^{\lambda}\int_{-\infty}^{\infty}\mathrm{d}z\,\psi_0(z)\exp(2\pi iz\lambda). \tag{4.12}$$

We show in Appendix A that the shift is allowed as long as ⟨Λ⟩_1 is finite.

Equation (4.12) shows that we need only multiply p_0(λ) by e^λ to get p_1(λ):

$$p_1(\lambda) = e^{\lambda}p_0(\lambda). \tag{4.13}$$

Of course, both p_0(λ) and p_1(λ) must be properly normalized to unity, so the only functions that can be densities for the log likelihood under H_0 are ones that remain normalized after multiplication by e^λ, i.e.,

$$\int_{-\infty}^{\infty}\mathrm{d}\lambda\,e^{\lambda}p_0(\lambda) = 1. \tag{4.14}$$

If we know that p_0(λ) really is the density for a log likelihood under H_0, however, Eq. (4.14) is trivially satisfied, since it is equivalent to Eq. (4.2).

From Eq. (4.13) we readily find a relation between the densities for Λ under the two hypotheses. Since λ and Λ are related by a monotonic transformation, we can write

$$p_j(\lambda) = \frac{\mathrm{pr}(\Lambda|H_j)}{|\mathrm{d}\lambda/\mathrm{d}\Lambda|}. \tag{4.15}$$

The Jacobian |dλ/dΛ| is the same under H_0 and H_1, so Eq. (4.13) becomes

$$\mathrm{pr}(\Lambda|H_1) = e^{\lambda}\,\mathrm{pr}(\Lambda|H_0) = \Lambda\,\mathrm{pr}(\Lambda|H_0). \tag{4.16}$$

It is instructive to rewrite this equation as

$$\frac{\mathrm{pr}(\Lambda|H_1)}{\mathrm{pr}(\Lambda|H_0)} = \Lambda. \tag{4.17}$$

In this form, the relation was known to Green and Swets [18], who described it as follows: "To paraphrase Gertrude Stein, the likelihood ratio of the likelihood ratio is the likelihood ratio." To paraphrase Green and Swets, the likelihood ratio is a sufficient statistic for deciding between H_0 and H_1. If we were given any function of the data t(g) and we wanted to make an optimal decision based only on t(g) and not on the original g, we would form the likelihood ratio pr(t(g)|H_1)/pr(t(g)|H_0) and compare it with a threshold. In most cases this strategy, though optimal when only t(g) is available, would be inferior to forming the likelihood ratio from the original data, and there would thus be an information loss inherent in using t(g) in place of g. From Eq. (4.17) we see that there is no such information loss if t(g) is the sufficient statistic Λ(g).

C. Normal Log Likelihoods

Much of the literature on the ideal observer has proceeded from the assumption, implicit or explicit, that the log likelihood is normally distributed. One justification for this assumption is that λ is a linear functional of g for nonrandom signals and Gaussian noise, and it may be approximately linear even with random signals and/or non-Gaussian noise. As noted in Subsection 3.G, the central-limit theorem often leads to normality for a linear discriminant.

Even if λ is not a linear discriminant, however, it may still be approximately or asymptotically normal. It is common in statistics to consider data sets g that consist of many independent observations g_k. Because of the independence, we can write

$$\mathrm{pr}(\mathbf{g}|H_j) = \prod_{k=1}^{K}\mathrm{pr}(g_k|H_j). \tag{4.18}$$

The log likelihood is then given by

$$\lambda = \sum_{k=1}^{K}\left[\log\mathrm{pr}(g_k|H_1) - \log\mathrm{pr}(g_k|H_0)\right]. \tag{4.19}$$

By the central-limit theorem, λ is asymptotically normal as the number of observations K becomes large.

A similar argument can be made in an imaging context. Suppose that the image g can be divided up into K statistically independent subimages g_k, so that Eq. (4.18) again applies. If N of the subimages are different under the two hypotheses [so that the log probabilities in Eq. (4.19) do not simply cancel], then λ is a sum of N independent random variables; it is therefore normally distributed by the central-limit theorem if N is large.

There are situations in imaging in which normality is not a good approximation, and a detailed discussion of ideal-observer performance in those cases is given in Subsection 5.C, but first we explore some little-recognized consequences of a normality assumption for the log likelihood.

Suppose that λ is normally distributed under H_0 with mean λ̄_0 and variance var_0(λ). One might expect that λ̄_0 and var_0(λ) could be specified independently and that two additional independent parameters would be needed to specify the mean and variance under H_1; we shall show that this expectation is incompatible with Eq. (4.13).

The moment-generating functions for a general, normally distributed discriminant function are given by Eq. (3.26). If, however, the discriminant function is the log likelihood, then Eq. (4.2) requires that ⟨Λ⟩_0 = 1, or, with Eqs. (3.26) and (4.9),

$$M_0(1) = \exp[\bar{\lambda}_0 + \tfrac{1}{2}\mathrm{var}_0(\lambda)] = 1. \tag{4.20}$$

Thus λ̄_0 and var_0(λ) must be related by

$$\bar{\lambda}_0 = -\tfrac{1}{2}\mathrm{var}_0(\lambda), \tag{4.21}$$

and Eq. (3.26) must take the form

$$M_0(\beta) = \exp[\tfrac{1}{2}(\beta^2 - \beta)\mathrm{var}_0(\lambda)] \tag{4.22}$$

if it is to apply to a log likelihood.

For Eq. (4.13) to hold, p_1(λ) must also be normal, and we can apply Eq. (4.8) to Eq. (4.22) and determine the mean and variance of λ under H_1. The results are

$$\bar{\lambda}_1 = M_0^{(1)}(1) = \tfrac{1}{2}\mathrm{var}_0(\lambda) = -\bar{\lambda}_0, \tag{4.23}$$

$$\mathrm{var}_1(\lambda) = M_0^{(2)}(1) - [M_0^{(1)}(1)]^2 = \mathrm{var}_0(\lambda). \tag{4.24}$$

The moment-generating function under H_1 is then given by

$$M_1(\beta) = M_0(\beta + 1) = \exp[\tfrac{1}{2}(\beta^2 + \beta)\mathrm{var}_0(\lambda)]. \tag{4.25}$$

The probability density functions for a normal log likelihood are thus

$$p_0(\lambda) = \frac{1}{[2\pi\,\mathrm{var}_0(\lambda)]^{1/2}}\exp\!\left\{-\frac{[\lambda + \tfrac{1}{2}\mathrm{var}_0(\lambda)]^2}{2\,\mathrm{var}_0(\lambda)}\right\}, \tag{4.26}$$

$$p_1(\lambda) = \frac{1}{[2\pi\,\mathrm{var}_0(\lambda)]^{1/2}}\exp\!\left\{-\frac{[\lambda - \tfrac{1}{2}\mathrm{var}_0(\lambda)]^2}{2\,\mathrm{var}_0(\lambda)}\right\}. \tag{4.27}$$

It is easy to verify that Eq. (4.13) is satisfied.

Relations (4.23)–(4.27) were also derived by Fukunaga [8], who started with the assumption that the data are normally distributed with equal covariance, which implies that the log likelihood is normal under both hypotheses. The derivation given above uses only the weaker assumption that λ is normal under H_0. Since exact or approximate normality of λ can occur with decidedly nonnormal data, our approach is much more general.

With this normal model, all statistical properties of λ under both hypotheses are determined by the single parameter var_0(λ). From Eqs. (4.10) and (4.22), this parameter can also be expressed as

$$\mathrm{var}_0(\lambda) = \log\langle\Lambda\rangle_1. \tag{4.28}$$

Therefore, to fully characterize a normal log likelihood, we need only calculate the mean of the likelihood under H_1.

Next we examine the AUC for this model. Since the discriminant function is normal under both hypotheses, AUC is given exactly by Eq. (3.2), and the relevant SNR is given by

$$\mathrm{SNR}_{\lambda}^2 = \frac{[\langle\lambda\rangle_1 - \langle\lambda\rangle_0]^2}{\tfrac{1}{2}\mathrm{var}_1(\lambda) + \tfrac{1}{2}\mathrm{var}_0(\lambda)} = \mathrm{var}_0(\lambda) = \log\langle\Lambda\rangle_1. \tag{4.29}$$

Hence, with Eq. (3.2),

$$\mathrm{AUC} = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left[\frac{1}{2}\sqrt{\log\langle\Lambda\rangle_1}\right]. \tag{4.30}$$

This expression, like all the others in Subsection 4.C, follows rigorously once we make the initial assumption that λ is normally distributed under H_0; no further assumptions or approximations are required.
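A sketch (ours) that exercises these results: for an assumed known signal in i.i.d. unit-variance Gaussian noise, λ(g) = s^T g − s^T s/2 is exactly normal, so Eqs. (4.21), (4.23), and (4.30) can all be verified by sampling.

```python
# Sketch (assumptions ours): checks on the normal-log-likelihood constraints
# for a known signal s in i.i.d. unit-variance Gaussian noise.
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(4)
M, N = 8, 1_000_000
s = np.full(M, 0.4)

lam = lambda g: g @ s - (s @ s) / 2.0        # log likelihood, sigma^2 = 1
l0 = lam(rng.normal(0.0, 1.0, (N, M)))       # lambda | H0
l1 = lam(s + rng.normal(0.0, 1.0, (N, M)))   # lambda | H1

v = l0.var()
print(l0.mean(), -0.5 * v)                   # Eq. (4.21): mean_0 = -var_0/2
print(l1.mean(), +0.5 * v)                   # Eq. (4.23): mean_1 = +var_0/2
print(np.mean(l1 > l0),                      # 2AFC estimate of AUC
      0.5 + 0.5 * erf(0.5 * np.sqrt(v)))     # Eq. (4.30), log<Lambda>_1 = v
```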

5. LIKELIHOOD-GENERATING FUNCTION

A. Definitions and Basic Properties

In view of Eq. (4.13), both p_0(λ) and p_1(λ) can always be derived from a single nonnegative function f(λ) as follows:

$$p_0(\lambda) = \exp(-\tfrac{1}{2}\lambda)f(\lambda), \qquad p_1(\lambda) = \exp(\tfrac{1}{2}\lambda)f(\lambda). \tag{5.1}$$

Though it is easy to find the f(λ) associated with the normal densities of Eqs. (4.26) and (4.27), normality is not required for the existence of this function or for any of the other results in this section.

The characteristic functions and moment-generating functions are determined from f(λ) by

$$\psi_0(\xi) = F\!\left(\xi - \frac{i}{4\pi}\right), \qquad \psi_1(\xi) = F\!\left(\xi + \frac{i}{4\pi}\right), \tag{5.2}$$

$$M_0(\beta) = F_L(\beta - \tfrac{1}{2}), \qquad M_1(\beta) = F_L(\beta + \tfrac{1}{2}), \tag{5.3}$$

where F(ξ) is the Fourier transform of f(λ) and F_L(β) is its two-sided Laplace transform,

$$F_L(\beta) = \int_{-\infty}^{\infty}\mathrm{d}\lambda\,f(\lambda)\exp(\beta\lambda). \tag{5.4}$$

Normalization of the densities requires that ψ_j(0) = 1 and M_j(0) = 1, which means that F(±i/4π) = 1 and F_L(±½) = 1. We can enforce these conditions by defining new functions T(ξ) and G(β) such that

$$F(\xi) = \exp\!\left[\left(\xi + \frac{i}{4\pi}\right)\left(\xi - \frac{i}{4\pi}\right)T(\xi)\right], \tag{5.5}$$

$$F_L(\beta) = \exp[(\beta + \tfrac{1}{2})(\beta - \tfrac{1}{2})G(\beta)]. \tag{5.6}$$

The functions T(ξ) and G(β) are related to each other by

$$T(\xi) = -4\pi^2 G(-2\pi i\xi). \tag{5.7}$$

If we allow complex arguments, we can derive all statistical properties of the likelihood and log likelihood from either T(ξ) or G(β). We choose to work with G(β), which we call the likelihood-generating function.

In terms of G(β), the characteristic and moment-generating functions for the log likelihood are given by

$$\psi_0(\xi) = \exp\!\left[\xi\left(\xi - \frac{i}{2\pi}\right)T\!\left(\xi - \frac{i}{4\pi}\right)\right] = \exp\!\left[-4\pi^2\xi\left(\xi - \frac{i}{2\pi}\right)G\!\left(-2\pi i\xi - \tfrac{1}{2}\right)\right], \tag{5.8}$$

$$\psi_1(\xi) = \exp\!\left[\xi\left(\xi + \frac{i}{2\pi}\right)T\!\left(\xi + \frac{i}{4\pi}\right)\right] = \exp\!\left[-4\pi^2\xi\left(\xi + \frac{i}{2\pi}\right)G\!\left(-2\pi i\xi + \tfrac{1}{2}\right)\right], \tag{5.9}$$

$$M_0(\beta) = \exp[\beta(\beta - 1)G(\beta - \tfrac{1}{2})], \tag{5.10}$$

$$M_1(\beta) = \exp[\beta(\beta + 1)G(\beta + \tfrac{1}{2})]. \tag{5.11}$$

From these equations we see that the basic requirements ψ_j(0) = 1 and M_j(0) = 1 are satisfied.

Two equivalent expressions for G(β) can be derived from Eqs. (5.10) and (5.11) by changing variables:

$$G(\beta) = \frac{\log M_0(\beta + \tfrac{1}{2})}{(\beta - \tfrac{1}{2})(\beta + \tfrac{1}{2})} = \frac{\log M_1(\beta - \tfrac{1}{2})}{(\beta - \tfrac{1}{2})(\beta + \tfrac{1}{2})}. \tag{5.12}$$

Moments of the likelihood ratio are easily expressed in terms of G(β):

$$\log\langle\Lambda^k\rangle_1 = \log\langle\Lambda^{k+1}\rangle_0 = k(k + 1)G(k + \tfrac{1}{2}). \tag{5.13}$$

Furthermore, we can compute SNR_λ if we know G(β); the requisite moments are given by

$$\bar{\lambda}_0 = -G(-\tfrac{1}{2}), \qquad \bar{\lambda}_1 = G(\tfrac{1}{2}); \tag{5.14}$$

$$\mathrm{var}_0(\lambda) = 2[G(-\tfrac{1}{2}) - G'(-\tfrac{1}{2})], \qquad \mathrm{var}_1(\lambda) = 2[G(\tfrac{1}{2}) + G'(\tfrac{1}{2})], \tag{5.15}$$

where G′(β) is the derivative of G(β). Hence

$$\mathrm{SNR}_{\lambda}^2 = \frac{[G(\tfrac{1}{2}) + G(-\tfrac{1}{2})]^2}{G(\tfrac{1}{2}) + G(-\tfrac{1}{2}) + G'(\tfrac{1}{2}) - G'(-\tfrac{1}{2})}. \tag{5.16}$$

We see that SNR_λ² has the structure X²/X if the derivative of G is approximately the same at ½ and −½. In that case,

$$\mathrm{SNR}_{\lambda}^2 \simeq \bar{\lambda}_1 - \bar{\lambda}_0 = G(\tfrac{1}{2}) + G(-\tfrac{1}{2}) \simeq 2G(0). \tag{5.17}$$

It is interesting to compare these results with those for a normally distributed log likelihood, as discussed in Subsection 4.C. Comparing Eqs. (5.10) and (5.11) with Eqs. (4.22) and (4.25), we see that the likelihood-generating function for a normal log likelihood is G(β) = ½ var_0(λ) = constant. With that observation, Eq. (5.17) is in accord with Eq. (4.29); by Eq. (4.8), log M_1(1) = log⟨Λ⟩_1 = 2G(3/2), so Eqs. (4.29) and (5.17) agree exactly in the Gaussian case, in which G(3/2) = G(½) = G(−½).

B. Restrictions on the Likelihood-Generating Function

One restriction on the form of G(β) is Marcinkiewicz's theorem [17], which says that if exp(P) is a characteristic function and P is a polynomial, the order of the polynomial can be at most 2. This means that the only polynomial form for G(β) is the one assumed when λ is normally distributed, namely, G = constant. Nevertheless, it may be a useful approximation to treat G(β) as a low-order polynomial if we restrict attention to β near the origin.

Another restriction arises from the so-called hermiticity property of Fourier transforms (which has nothing to do with Hermitian operators). Since the function f(λ) defined in Eq. (5.1) is real, its Fourier transform must satisfy F(−ξ) = F*(ξ) for real ξ. In Eq. (5.5) the coefficient of T(ξ) in the exponent is real and even, so hermiticity of F(ξ) implies that T(−ξ) = T*(ξ), which in turn requires that

$$G(2\pi i\xi) = G^*(-2\pi i\xi), \qquad (\xi\ \text{real}). \tag{5.18}$$

Other restrictions on the behavior of G(β) arise from fundamental inequalities. Jensen's inequality [1,19] says that h(⟨x⟩) ≥ ⟨h(x)⟩, where h is any concave function (negative second derivative) and x is a random variable. Applying this inequality to Eq. (5.11) with β real and h(x) = log(x) yields

$$\beta(\beta + 1)G(\beta + \tfrac{1}{2}) = \log\langle\exp(\beta\lambda)\rangle_1 \geq \beta\bar{\lambda}_1 = \beta G(\tfrac{1}{2}). \tag{5.19}$$

With a change of variables, we find

$$(\beta + \tfrac{1}{2})G(\beta) \geq G(\tfrac{1}{2}), \qquad \beta\ \text{real},\ \beta \geq \tfrac{1}{2}, \tag{5.20}$$

which says that G(β) can fall off with increasing β along the real axis, but no faster than 1/(β + ½).

Other useful results can be obtained by applying Jensen's inequality to the expectation of Λ^a (a real and nonnegative):

$$\log\langle\Lambda^a\rangle_j \geq a\langle\lambda\rangle_j, \qquad j = 0, 1. \tag{5.21}$$

For a = 1 and j = 0, we find

$$-G(-\tfrac{1}{2}) = \bar{\lambda}_0 \leq 0, \tag{5.22}$$

and for a = 1 and j = 1,

$$\bar{\lambda}_1 \leq \log\langle\Lambda\rangle_1 = 2G(\tfrac{3}{2}). \tag{5.23}$$

C. Relation of the Likelihood-Generating Function to Area under an ROC Curve

The key expression for AUC is Eq. (3.17), which we can rewrite in terms of the likelihood-generating function as

$$\mathrm{AUC} = \frac{1}{2} + \frac{1}{2\pi i}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\exp\!\left\{-4\pi^2\left(\xi^2 - \frac{i\xi}{2\pi}\right)\left[G\!\left(-2\pi i\xi - \tfrac{1}{2}\right) + G\!\left(2\pi i\xi + \tfrac{1}{2}\right)\right]\right\} = \frac{1}{2} + \frac{1}{2\pi i}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\exp\!\left[-4\pi^2\left(\xi^2 - \frac{i\xi}{2\pi}\right)H(\xi)\right], \tag{5.24}$$

where

$$H(\xi) \equiv G(-2\pi i\xi - \tfrac{1}{2}) + G(2\pi i\xi + \tfrac{1}{2}). \tag{5.25}$$

The AUC is thus determined completely by a particular combination of likelihood-generating functions, denoted H(ξ). From the fact that ψ_0(ξ)ψ_1*(ξ) is the Fourier transform of a real quantity, we can show that H*(ξ) = H(−ξ) for real ξ. That means that we can write

$$H(\xi) = H_r(\xi) + iH_i(\xi), \tag{5.26}$$

where H_r(ξ) and H_i(ξ) are both real and

$$H_r(-\xi) = H_r(\xi), \qquad H_i(-\xi) = -H_i(\xi). \tag{5.27}$$

Moreover, from basic properties of characteristic functions [17] we know that |ψ_j(ξ)| ≤ 1 for real ξ. Hence |ψ_0(ξ)ψ_1*(ξ)| ≤ 1, from which we can show that

$$\xi^2 H_r(\xi) + \frac{\xi}{2\pi}H_i(\xi) \geq 0. \tag{5.28}$$

To make use of these properties, we recognize that the factor 1/ξ in the AUC integral is odd and the integral is over a symmetric interval, so only the odd part of ψ_0(ξ)ψ_1*(ξ) contributes to the integral. Thus

$$\mathrm{AUC} = \frac{1}{2} - \frac{1}{2\pi}\,\mathcal{P}\int_{-\infty}^{\infty}\frac{\mathrm{d}\xi}{\xi}\exp\!\left\{-4\pi^2\!\left[\xi^2 H_r(\xi) + \frac{\xi}{2\pi}H_i(\xi)\right]\right\}\sin\!\left\{4\pi^2\!\left[\xi^2 H_i(\xi) - \frac{\xi}{2\pi}H_r(\xi)\right]\right\}. \tag{5.29}$$

Because of relation (5.28), the exponential factor in Eq. (5.29) falls off rapidly as ξ increases unless H_r(ξ) decreases more rapidly than 1/ξ². In addition, the factor 1/ξ serves to emphasize small ξ, so the main contribution to the integral comes from ξ very near the origin. It is therefore reasonable to expand H(ξ) in a Taylor series about ξ = 0 (Maclaurin series). It is shown in Appendix A that H(ξ) is analytic for all real ξ, so this series is guaranteed to converge.

Since H(ξ) is a combination of likelihood-generating functions, we begin by expanding G(β) as

$$G(\beta) = \sum_{n=0}^{\infty}\frac{1}{n!}\,G^{(n)}(0)\,\beta^n. \tag{5.30}$$

The derivatives G^{(n)}(0) are all real, since G(β) is real for real β.

When we combine Eqs. (5.25) and (5.30), the terms with odd n cancel, so the expansion for H(ξ) is

$$H(\xi) = 2\sum_{k=0}^{\infty}\frac{1}{(2k)!}\,G^{(2k)}(0)\left(2\pi i\xi + \frac{1}{2}\right)^{2k}. \tag{5.31}$$

As a first approximation we might assume that the expansion for G(β) can be truncated after n = 1, so that G(β) is approximated by a linear function near the origin. If this approximation is valid, then H(ξ) ≃ 2G(0). The AUC integral then has the same structure as Eq. (3.28), and we find readily that

$$\mathrm{AUC} = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left[\frac{1}{2}\sqrt{2G(0)}\right]. \tag{5.32}$$

This result is compatible with relations (3.2) and (5.17). If G(β) is approximately linear, the derivative terms in the denominator of Eq. (5.16) cancel and SNR_λ² is 2G(0). In addition, AUC is determined solely by G(0) in this approximation.

The upshot of this calculation is that we have extended the range of validity of the error-function formula, Eq. (3.2). As originally derived, it held exactly only for Gaussian discriminant functions, which in the case of the ideal observer would mean G(β) = constant. Now we see that the formula also holds with non-Gaussian log likelihoods as long as we can approximate G(β) by a linear function near the origin.

Though motivated by the approximation that G(β) is linear near the origin, Eq. (5.32) is actually much more general. We demonstrate in Appendix B that Eq. (5.32) is the leading term in an asymptotic expansion for AUC. This term alone approaches the true AUC as G(0) increases. As shown in Eq. (B22), the first correction term is proportional to the second derivative G″(0), and it falls off as [G(0)]^{-3} exp[−½G(0)]. Thus, if the curvature of the likelihood-generating function at the origin is small or if G(0) is moderately large, Eq. (5.32) is an excellent approximation to AUC.

D. Lower Limit on Area under an ROC Curve

We shall now show that G(0) can be used to set a lower limit on AUC with no approximations at all. The starting point is Eq. (B8), which, with the change of variables a = 2πξ, becomes

$$\mathrm{AUC} = 1 - \frac{1}{2\pi}\int_0^{\infty}\frac{\mathrm{d}a}{a^2 + \tfrac{1}{4}}\exp\!\left[-2\left(a^2 + \tfrac{1}{4}\right)\mathrm{Re}\,G(ia)\right]. \tag{5.33}$$

From Eq. (5.10) with β = ½ + ia and Eq. (4.9), it follows that

$$\exp[-2(a^2 + \tfrac{1}{4})G(ia)] = [M_0(\tfrac{1}{2} + ia)]^2 = \langle\Lambda^{1/2}\Lambda^{ia}\rangle_0^2. \tag{5.34}$$

Now we can separate G(ia) into real and imaginary parts and recognize exp[−2i(a² + ¼)Im G(ia)] as a pure phase factor, so

$$\exp[-2(a^2 + \tfrac{1}{4})\,\mathrm{Re}\,G(ia)] = |M_0(\tfrac{1}{2} + ia)|^2 = |\langle\Lambda^{1/2}\Lambda^{ia}\rangle_0|^2, \qquad (a\ \text{real}). \tag{5.35a}$$

Using this equation with

$$|\langle\Lambda^{1/2}\Lambda^{ia}\rangle_0| \leq \langle|\Lambda^{1/2}\Lambda^{ia}|\rangle_0 = \langle\Lambda^{1/2}\rangle_0 = \exp[-\tfrac{1}{4}G(0)], \tag{5.35b}$$

we can infer that

$$\exp[-2(a^2 + \tfrac{1}{4})\,\mathrm{Re}\,G(ia)] \leq \exp[-\tfrac{1}{2}G(0)]. \tag{5.36}$$

Returning to the AUC integral in Eq. (5.33), we now have

$$\mathrm{AUC} \geq 1 - \tfrac{1}{2}\exp[-\tfrac{1}{2}G(0)]. \tag{5.37}$$

This inequality was derived without any approximations on the form of G(β). It provides a lower bound for the AUC obtained by an ideal observer, regardless of the probability laws for the data or for the likelihood ratio. As with the approximate formula (5.32), the bound depends solely on G(0).
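The sketch below (ours) tabulates the approximation (5.32) against the lower bound (5.37) as G(0) grows; for a normal log likelihood, G is constant and (5.32) is exact, so the comparison shows how conservative the bound is.

```python
# Sketch (ours): Eq. (5.32) versus the exact lower bound of Eq. (5.37).
import numpy as np
from scipy.special import erf

for G0 in (0.1, 0.5, 1.0, 2.0, 4.0):
    auc = 0.5 + 0.5 * erf(0.5 * np.sqrt(2.0 * G0))   # Eq. (5.32)
    bound = 1.0 - 0.5 * np.exp(-0.5 * G0)            # Eq. (5.37)
    print(f"G(0) = {G0:4.1f}: AUC = {auc:.4f}, lower bound = {bound:.4f}")
```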

E. Some Interpretations and Interrelations

We have just seen that G(0) plays a pivotal role in specifying the performance of an ideal observer. It fully specifies AUC if the linearity approximation holds, and it sets an exact lower bound in all cases. Because of its importance, we now look at some ways of interpreting G(0).

Note from Eqs. (5.1) and (5.6) that

$$G(0) = -4\log[F_L(0)] = -4\log\!\left[\int_{-\infty}^{\infty}\mathrm{d}\lambda\,f(\lambda)\right]. \tag{5.38}$$

The nonnegative function f(λ) introduced in Eq. (5.1) is not a normalized probability density function, so the logarithm does not vanish. Curiously, when f(λ) is multiplied by either exp(½λ) or exp(−½λ), it is a normalized density; but without the exponential factors it is not normalized, and, moreover, the ideal-observer performance is determined (or at least strongly influenced) by just how far away it is from being normalized.

Other forms for G(0) are obtained by expressing f(λ) in terms of p₀(λ) or p₁(λ), yielding

G(0) = −4 log[∫_{−∞}^∞ dλ p₁(λ) exp(−(1/2)λ)] = −4 log⟨Λ^{−1/2}⟩₁ = −4 log[∫_{−∞}^∞ dλ p₀(λ) exp((1/2)λ)] = −4 log⟨Λ^{1/2}⟩₀.   (5.39)

Thus, within the linear approximation, AUC is fully determined by ⟨Λ^{−1/2}⟩₁ or ⟨Λ^{1/2}⟩₀. This conclusion is to be compared with Eq. (4.30), in which we related AUC to ⟨Λ⟩₁. There is no contradiction in this difference; Eq. (4.30) was derived on the assumption that λ was normally distributed, which means G(β) = constant. From Eq. (5.13) we see that log⟨Λ⟩₁ = 2G(3/2), so Eqs. (4.30) and (5.32) agree if G(3/2) = G(0), as it must for a normal log likelihood. If normality fails, Eq. (5.32) will be the better approximation, since it approximates G(±1/2) by G(0) rather than by G(3/2).

The average ⟨Λ^{1/2}⟩₀ can also be expressed as an integral over data space, so Eq. (5.39) can be written as

G(0) = −4 log[∫ d^M g √(q₀(g) q₁(g))].   (5.40)

This form shows that G(0) increases as the overlap between q₀(g) and q₁(g) is reduced.
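As a worked special case of Eq. (5.40) (added here for illustration; it assumes multivariate normal data with hypothetical means μ₀ and μ₁ and a common covariance matrix K, none of which are defined elsewhere in this paper), a standard Gaussian integral gives

\[
\int d^{M}g\,\sqrt{q_0(\mathbf g)\,q_1(\mathbf g)}
= \exp\!\left[-\tfrac{1}{8}\,(\boldsymbol\mu_1-\boldsymbol\mu_0)^{t}\,\mathbf K^{-1}\,(\boldsymbol\mu_1-\boldsymbol\mu_0)\right],
\]

so that

\[
G(0) = \tfrac{1}{2}\,(\boldsymbol\mu_1-\boldsymbol\mu_0)^{t}\,\mathbf K^{-1}\,(\boldsymbol\mu_1-\boldsymbol\mu_0).
\]

Then 2G(0) is the familiar detectability index for this signal-known-exactly problem, and Eq. (5.32) reduces to the usual error-function relation between AUC and SNR.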

An interesting way to visualize the effect of G(0) is to plot M₀(β) (which is identical to ⟨Λ^β⟩₀) versus β for real β (see Fig. 1). From Eq. (5.10) this plot must necessarily go through unity at β = 0 and β = 1, but the value at β = 1/2 (which is not necessarily the minimum of the curve) is given by

M₀(1/2) = ⟨Λ^{1/2}⟩₀ = exp[−(1/4) G(0)] = ∫_{−∞}^∞ dλ f(λ).   (5.41)

It can be shown from the Schwarz inequality that G(0) ≥ 0 and hence that M₀(1/2) ≤ 1. At β = 2 the plot goes through ⟨Λ⟩₁, which is ≥1. From the behavior of the curve in Fig. 1, we see that larger values of ⟨Λ⟩₁ imply smaller values of ⟨Λ^{1/2}⟩₀.

In summary, AUC increases as any of the following things occurs: G(0) gets larger, the integral of f(λ) gets smaller, the expectation of the square root of the likelihood ratio under H₀ gets smaller, the expectation of the likelihood ratio under H₁ gets larger, or the overlap between q₀(g) and q₁(g) is reduced. In a signal-detection problem, all of these things occur when the signal contrast is increased or the noise level is reduced. Conversely, in the limit of very small signal contrast or large noise, f(λ) is indistinguishable from p₀(λ) or p₁(λ) and hence normalized to unity, ⟨Λ⟩₁ is indistinguishable from ⟨Λ⟩₀ and hence unity, q₀(g) is indistinguishable from q₁(g) so the integral of their geometric mean in Eq. (5.40) is unity, G(0) is zero, and AUC = 1/2. We note that inequality (5.37) becomes an equality in both limits, large and small G(0).

Fig. 1. Illustration of the behavior of the function M₀(β).


As a practical matter, we can use Eq. (5.39) to evaluate G(0) for any problem in which we have an analytic expression for Λ and a way of generating sample data vectors g under the null hypothesis. All we have to do is repeatedly generate g, form [Λ(g)]^{1/2} for each g, and approximate the ensemble average ⟨Λ^{1/2}⟩₀ with a sample average. The same approach gives us a way of numerically evaluating G(β) for any β, even complex ones.
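Here is a minimal sketch of that recipe, assuming a hypothetical signal-known-exactly model (a known signal s in i.i.d. Gaussian noise of variance σ², for which λ(g) = (sᵗg − |s|²/2)/σ² and G(β) is constant); all parameter values below are arbitrary choices for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical model: g = n under H0, g = s + n under H1, n ~ N(0, sigma^2 I).
    M, sigma = 64, 1.0
    s = 0.3 * np.ones(M)                  # assumed signal profile

    def log_likelihood(g):
        # lambda(g) = (s.g - |s|^2/2) / sigma^2 for this model
        return (g @ s - 0.5 * s @ s) / sigma**2

    # Generate data under H0 and form the sample average of Lambda^{1/2}.
    n_samples = 100_000
    g = rng.normal(0.0, sigma, size=(n_samples, M))
    lam = log_likelihood(g)
    G0_mc = -4.0 * np.log(np.mean(np.exp(0.5 * lam)))   # Eq. (5.39)

    # For this Gaussian model G(beta) is constant and G(0) = |s|^2/(2 sigma^2).
    G0_exact = 0.5 * (s @ s) / sigma**2
    print(G0_mc, G0_exact)

Replacing Λ^{1/2} by Λ^β in the sample average estimates M₀(β), and hence G(β) through Eq. (5.10), for any β for which the average converges.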

6. SUMMARY AND CONCLUSIONS

The main purpose of this paper has been to bridge the gap between two segments of the literature on task-based assessment of image quality. One segment, which includes the earlier papers in this series, has mainly expressed performance on detection and classification tasks in terms of simple SNR's determined from the first and second moments of the discriminant function. The other segment, which includes much of the psychophysical literature, has reported the full ROC curve and used the area under it as a summary figure of merit. In the past, contact between these two segments was made mainly by assuming normal statistics and relating AUC to SNR by the error-function formula (3.2). Little attention was paid to establishing the validity of the normality assumption or to investigating the relation between SNR and AUC if normality did not hold.

Here we began in Section 2 with a broad overview of decision theory. After demonstrating that any nonrandomized decision rule could be cast in the form of comparing a discriminant function to a threshold, we derived the well-known result that the optimum discriminant function was the likelihood ratio or its logarithm. The ideal observer was defined as one that used one of these discriminants.

In Section 3 we derived many different expressions for SNR and AUC metrics. Several of these expressions have not appeared previously in the literature. In particular, we demonstrated that AUC for an arbitrary discriminant function could be expressed by a principal-value integral involving the characteristic functions of the discriminant function. This formula was to play a key role in the subsequent discussion.

In Section 4 we examined the properties of the ideal observer from first principles. Several strong constraints on the moments of the likelihood ratio or the log likelihood were derived, and it was shown that the probability density functions for these test statistics were intimately related. In particular, we were able to derive some surprising results for the case in which the log likelihood is normally distributed under one hypothesis. We showed that it is then necessarily normal under the other hypothesis and that the two means and two variances could be expressed in terms of a single parameter. This led to unanticipated new expressions for AUC in the normal case. In particular, we found that AUC was determined exactly by the mean of Λ under H₁.

In Section 5 we attempted to unify these considerations by defining a new quantity called the likelihood-generating function G(β). We showed that all moments of both the likelihood and the log likelihood under both hypotheses could be derived from this one function. Moreover, the AUC could be expressed exactly as a principal-value integral involving the likelihood-generating function. Perhaps the most surprising result was that the AUC could be expressed, with one reasonable approximation, in terms of a single value, G(0). Moreover, this same value sets a lower bound to AUC without approximation.

The obvious next step in the exploitation of this new theory is to apply it to practical problems in signal detection and image quality. This will be the theme of a subsequent paper in this series.

APPENDIX A: ANALYTICITY CONSIDERATIONS

A function of a complex variable is analytic if and only if it satisfies the Cauchy–Riemann conditions.²⁰ To apply this test to the characteristic function ψ_j(z), we write

ψ_j(z) ≡ u_j(ξ, η) + iv_j(ξ, η) = ∫_{−∞}^∞ dλ p_j(λ) exp(−2πizλ) = ∫_{−∞}^∞ dλ p_j(λ) exp(−2πiξλ) exp(2πηλ),   (A1)

where z = ξ + iη. The function ψ_j(z) is analytic at the point z if and only if

∂u_j(ξ, η)/∂ξ = ∂v_j(ξ, η)/∂η,   ∂v_j(ξ, η)/∂ξ = −∂u_j(ξ, η)/∂η.   (A2)

Since p_j(λ) is real, separating real and imaginary parts of Eq. (A1) shows that

u_j(ξ, η) = ∫_{−∞}^∞ dλ p_j(λ) cos(2πξλ) exp(2πηλ),   (A3)

v_j(ξ, η) = −∫_{−∞}^∞ dλ p_j(λ) sin(2πξλ) exp(2πηλ).   (A4)

It is easy to show that Eq. (A2) is satisfied if we can differentiate under the integral sign. Lang²¹ gives the conditions under which this operation is permissible; in essence, the integral must be absolutely convergent before and after differentiation. The conditions are thus

∫_{−∞}^∞ dλ p_j(λ) exp(2πηλ) < ∞,   (A5)

∫_{−∞}^∞ dλ p_j(λ) |λ| exp(2πηλ) < ∞.   (A6)

If p_j(λ) is specifically the density for the log likelihood under H_j, relation (A5) reads

⟨Λ^{2πη}⟩_j < ∞.   (A7)

In addition, the Cauchy–Schwarz inequality shows that relation (A6) is satisfied if

⟨λ²⟩_j ⟨Λ^{4πη}⟩_j < ∞.   (A8)

For η = 0 (i.e., z on the real axis), relation (A7) is trivially satisfied, and relation (A8) is satisfied if the second moment of λ exists, so ψ_j(z) is analytic on the real axis under this weak assumption.


If |η| > 0, however, the exponential factor might cause one of the integrals to diverge. Some special cases are of interest. For example, in some problems the range of values of λ is restricted. Since the factor in question is exp(2πηλ), if p_j(λ) = 0 for λ ≥ b > 0, then ψ_j(z) is analytic in the upper half-plane, and if p_j(λ) = 0 for λ ≤ a < 0, then ψ_j(z) is analytic in the lower half-plane. And if p_j(λ) = 0 unless a ≤ λ ≤ b, then ψ_j(z) is an entire function (analytic for all finite z); this result is a statement of the Paley–Wiener theorem.²²,²³

In Subsection 4.B we need to assume that ψ₀(z) is analytic in the strip 0 ≤ Im(z) ≤ 1/(2π). For any point in this strip we can bound the integral in relation (A5) by using Jensen's inequality with the concave function h(x) = x^{2πη}. The result is

∫_{−∞}^∞ dλ p₀(λ) exp(2πηλ) ≤ ∫_{−∞}^∞ dλ p₀(λ) exp(λ) = ⟨exp(λ)⟩₀.   (A9)

This expectation is the same as ⟨Λ⟩₀, which we know to be unity by Eq. (4.2), so relation (A7) is always satisfied in the strip under consideration. By a similar argument, relation (A8) is satisfied if ⟨Λ²⟩₀ = ⟨Λ⟩₁ < ∞. Thus, as long as the mean of Λ is finite under H₁, we are safe in making the assumption required to derive Eq. (4.13).

If we have established a horizontal strip of analyticity for ψ_j(z), we get a vertical strip of analyticity for M_j(β) by dint of Eq. (3.19). If ψ_j(z) is analytic for a ≤ Im z ≤ b, then M_j(β) is analytic for 2πa ≤ Re β ≤ 2πb. With the minimal assumptions of the last paragraph, this argument shows that M₀(β) is analytic at least over 0 ≤ Re β ≤ 1.

Moreover, from the strip for M₀(β) we can get a strip of analyticity for G(β) by rewriting Eq. (5.10) as

G(β − 1/2) = log M₀(β) / [β(β − 1)].   (A10)

Appearances notwithstanding, the right-hand side of this equation is analytic in the same vertical strip where M₀(β) is. In spite of the denominator, there are no poles at β = 0 or β = 1; M₀(β) must be unity at these points for proper normalization of the densities (see Subsection 5.A), so the logarithm vanishes. The multiple-valued nature of the logarithm causes no problems either, since M₀(β) does not go to zero in the strip. The branch cut lies outside the strip, so the strip lies entirely in a single branch of the logarithm.

Thus G(β − 1/2) is analytic in the strip 0 ≤ Re β ≤ 1, and G(β) itself is analytic in the shifted strip −1/2 ≤ Re β ≤ 1/2. Specifically, this guarantees that G(2πiξ ± 1/2) is analytic for all real ξ, so the function H(ξ) defined in Eq. (5.25) is analytic for all real ξ.

APPENDIX B: AN ASYMPTOTIC EXPANSION FOR THE AREA UNDER AN ROC CURVE OF THE LIKELIHOOD RATIO

Beginning with Eq. (3.17) and using Eq. (4.6) together with ψ₁*(ξ) = ψ₁(−ξ), we may write

AUC = 1/2 + (1/2πi) P ∫_{−∞}^∞ (dξ/ξ) ψ₀(ξ) ψ₀(−ξ + i/(2π)).   (B1)

Let z = ξ + iη as in Appendix A. From that appendix we know that ψ₀(z) is analytic for 0 ≤ Im(z) ≤ 1/(2π). Therefore ψ₀(−z + i/(2π)) is analytic for 0 ≤ Im(−z + i/(2π)) ≤ 1/(2π), which simplifies to 0 ≤ Im(z) ≤ 1/(2π) also. Thus the integrand in Eq. (B1) is analytic in this region except for the simple pole at ξ = 0. Since ψ₀(0) = ψ₁(0) = 1, the residue at this pole is 1. Let C be the contour that traverses the negative real axis in the z plane from z = −∞ to z = −ε, follows the upper semicircle of radius ε centered at the origin to z = ε, and then continues over the rest of the real axis from z = ε to z = ∞. Then, letting ε → 0, we have

AUC = 1/2 + (1/2πi) ∫_C (dz/z) ψ₀(z) ψ₀(−z + i/(2π)) + (1/2πi) πi.   (B2)

The last term cancels the contribution from the integral over the semicircle, so that what is left is the principal-value integral in Eq. (B1).

Now let C′ be the contour that traverses the horizontal line Im(z) = 1/(4π) from Re(z) = −∞ to Re(z) = ∞. Since the integrand is analytic between these two contours, we may replace C with C′ in Eq. (B2). This gives

AUC = 1 + (1/2πi) ∫_{−∞}^∞ dξ/[ξ + i/(4π)] ψ₀(ξ + i/(4π)) ψ₀(−ξ + i/(4π)).   (B3)

By use of Eq. (5.2), this is

AUC = 1 + (1/2πi) ∫_{−∞}^∞ dξ/[ξ + i/(4π)] F(ξ) F(−ξ).   (B4)

From Eq. (5.5) we may express this in terms of T(ξ). Using the fact that F(−ξ) = F*(ξ), we know that T(−ξ) = T*(ξ). This means that T(−ξ) + T(ξ) = 2 Re T(ξ) and that this is an even function of ξ. Now we have

AUC = 1 + (1/2πi) ∫_{−∞}^∞ dξ/[ξ + i/(4π)] exp[−(ξ² + 1/(16π²)) Re T(ξ)].   (B5)

A little algebra gives us

1/[ξ + i/(4π)] = ξ/[ξ² + 1/(16π²)] − (i/4π)/[ξ² + 1/(16π²)].   (B6)

The odd part of the integrand drops out, and we are left with

AUC = 1 − (1/4π²) ∫₀^∞ dξ/[ξ² + 1/(16π²)] exp[−(ξ² + 1/(16π²)) Re T(ξ)].   (B7)

To express this in terms of G(z), we use Eq. (5.7) and the fact that Re T(ξ) = Re T(−ξ) to get

AUC = 1 − (1/4π²) ∫₀^∞ dξ/[ξ² + 1/(16π²)] exp[−8π² (ξ² + 1/(16π²)) Re G(2πiξ)].   (B8)

From Section 5 we know that G(0) is real and positive. Let 𝒢(z) = G(z) − G(0). Then we may write

AUC = 1 − I₀ + I,   (B9)

with

I₀ = (1/4π²) ∫₀^∞ dξ/[ξ² + 1/(16π²)] exp[−8π² (ξ² + 1/(16π²)) G(0)],   (B10)

I = (1/4π²) ∫₀^∞ dξ exp[−8π² (ξ² + 1/(16π²)) G(0)] b(ξ),   (B11)

b(ξ) = {1 − exp[−8π² (ξ² + 1/(16π²)) Re 𝒢(2πiξ)]} / [ξ² + 1/(16π²)].   (B12)

If we define

h(ξ) = exp(−8π²ξ²),   (B13)

then

I = (1/4π²) exp[−(1/2) G(0)] ∫₀^∞ dξ h[√G(0) ξ] b(ξ).   (B14)

Note that b(0) = 0 and b′(0) = 0. This means that the magnitude of I can be expected to be small if G(0) is not too small and b″(0) is not too large.

The integral I₀ can be computed analytically²² to give

AUC = 1/2 + (1/2) erf[(1/2)√(2G(0))] + I.   (B15)

For I there is an asymptotic expansion as √G(0) → ∞.²³ To compute this expansion, we need the Maclaurin series for b(ξ),

b(ξ) = Σ_{n=1}^∞ b_n ξ^{2n},   (B16)

and the Mellin transform of h at the odd integers,

M_h(2n + 1) = ∫₀^∞ (dξ/ξ) h(ξ) ξ^{2n+1}.   (B17)

The asymptotic expansion for I is then given by

I ∼ (1/4π²) exp[−(1/2) G(0)] Σ_{n=1}^∞ b_n M_h(2n + 1) [√G(0)]^{−2n−1}.   (B18)

Assuming that this expansion is valid, we have for any given integer N ≥ 1 positive numbers D_N and E_N such that, for √G(0) > D_N,

|4π² I exp[(1/2) G(0)] − Σ_{n=1}^{N−1} b_n M_h(2n + 1) [√G(0)]^{−2n−1}| < E_N [√G(0)]^{−2N−1}.   (B19)

(For N = 1 there is no sum on the left.) Generally speaking, we expect D_N to increase without bound as N increases, so that, for a fixed G(0), there are only a finite number of N with √G(0) > D_N. This is because asymptotic series do not necessarily converge. They are nevertheless useful approximations in many circumstances.

The N = 1 case in relation (B19) gives

|AUC − {1/2 + (1/2) erf[(1/2)√(2G(0))]}| ≤ [E₁/(4π²)] [√G(0)]^{−3} exp[−(1/2) G(0)]   (B20)

for √G(0) > D₁. The values for E₁ and D₁ depend ultimately on the probability density p₀(λ). Clearly, for large enough G(0), the error-function expression provides a good approximation to the AUC.

To compute this asymptotic expansion explicitly, we use²²

M_h(2n + 1) = [1 · 3 ⋯ (2n − 1)] / [4 (4π)^{2n}] √(1/(2π)),   (B21)

Re 𝒢(2πiξ) = −2π² G″(0) ξ² + (2π⁴/3) G⁽⁴⁾(0) ξ⁴ + ⋯.   (B22)

The latter expansion may be used to compute the numbers b_n. For example, b₁ = −16π⁴ G″(0). Using this value in the first term of the asymptotic expansion for I gives us the approximation

AUC ≈ 1/2 + (1/2) erf[(1/2)√(2G(0))] − [G″(0)/(16√(2π))] exp[−(1/2) G(0)] [√G(0)]^{−3}.   (B23)

We would expect this to be good for large G(0). There are also indications that the approximation

AUC ≈ 1/2 + (1/2) erf[(1/2)√(2G(0))]   (B24)

is good for small G(0).

ACKNOWLEDGMENTS

The authors have benefited greatly from discussions with many people. We especially thank Michel Defrise for pointing out the asymptotic properties of log likelihoods; Jack Denny for calling attention to relevant segments of the statistics literature; Brandon Gallas for advice on Monte Carlo simulation of likelihood ratios; Robert Wagner for convincing us that SNR_λ was more likely to be useful than SNR_Λ and for introducing us to Gertrude Stein; and Kyle Myers, Charles Metz, and Richard Swensson for very careful and critical reading of the manuscript. This work was supported by the National Institutes of Health (NIH) through grant RO1 CA52643, but it does not represent an official position of the NIH.

Address correspondence to Harrison H. Barrett, Department of Radiology, Radiology Research Laboratory, University of Arizona, Tucson, Arizona 85724. E-mail address, [email protected].

REFERENCES

1. H. H. Barrett, "Objective assessment of image quality: effects of quantum noise and object variability," J. Opt. Soc. Am. A 7, 1266–1278 (1990).
2. H. H. Barrett, J. L. Denny, R. F. Wagner, and K. J. Myers, "Objective assessment of image quality. II. Fisher information, Fourier crosstalk, and figures of merit for task performance," J. Opt. Soc. Am. A 12, 834–852 (1995).
3. H. H. Barrett, D. W. Wilson, and B. M. W. Tsui, "Noise properties of the EM algorithm I: theory," Phys. Med. Biol. 39, 833–846 (1994).
4. D. W. Wilson, B. M. W. Tsui, and H. H. Barrett, "Noise properties of the EM algorithm II: Monte Carlo simulations," Phys. Med. Biol. 39, 847–872 (1994).
5. H. L. Van Trees, Detection, Estimation, and Modulation Theory (Wiley, New York, 1968), Vol. I.
6. J. O. Berger, Statistical Decision Theory and Bayesian Analysis (Springer-Verlag, New York, 1985).
7. J. L. Melsa and D. L. Cohn, Decision and Estimation Theory (McGraw-Hill, New York, 1978).
8. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. (Academic, New York, 1990).
9. H. H. Barrett, T. A. Gooley, K. A. Girodias, J. P. Rolland, T. A. White, and J. Yao, "Linear discriminants and image quality," Image Vision Comput. 10, 451–460 (1992).
10. H. H. Barrett, J. Yao, J. Rolland, and K. J. Myers, "Model observers for assessment of image quality," Proc. Natl. Acad. Sci. USA 90, 9758–9765 (1993).
11. C. E. Metz, Department of Radiology, University of Chicago, Chicago, Ill. (personal communication, 1998).
12. R. Bracewell, The Fourier Transform and Its Applications (McGraw-Hill, New York, 1965).
13. I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, 4th ed., Formula 3.954 (Academic, New York, 1965).
14. H. H. Barrett and C. K. Abbey, "Bayesian detection of random signals on random backgrounds," in Information Processing in Medical Imaging, Proceedings of the 15th International Conference, IPMI '97, Poultney, Vt., June 9–13, 1997, Lecture Notes in Computer Science, J. Duncan and G. Gindi, eds. (Springer-Verlag, Berlin, 1997).
15. D. Middleton, An Introduction to Statistical Communication Theory (McGraw-Hill, New York, 1960).
16. R. M. Swensson and D. M. Green, "On the relations between random walk models for two-choice response times," J. Math. Psychol. 15, 282–291 (1977).
17. A. N. Shiryayev, Probability (Springer-Verlag, New York, 1984).
18. D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics (Wiley, New York, 1966).
19. L. Rade and B. Westergren, Beta Mathematics Handbook, 2nd ed. (CRC Press, Boca Raton, Fla., 1990).
20. R. V. Churchill, J. W. Brown, and R. F. Verhey, Complex Variables and Applications, 3rd ed. (McGraw-Hill, New York, 1974).
21. S. Lang, Real and Functional Analysis, 3rd ed. (Springer-Verlag, New York, 1993).
22. R. E. A. C. Paley and N. Wiener, Fourier Transforms in the Complex Domain (American Mathematical Society, New York, 1934), Vol. 19.
23. R. Strichartz, A Guide to Distribution Theory and Fourier Transforms (CRC Press, Boca Raton, Fla., 1994).