Analysis of Multibeam Data for Neutron Reflectivity †

N. F. Berk*,‡,§ and C. F. Majkrzak§

Department of Materials Science and Engineering, University of Maryland, and NIST Center for Neutron Research, National Institute of Standards and Technology, 100 Bureau Drive, Stop 6102, Gaithersburg, MD 20899-6102

Received August 26, 2008. Revised Manuscript Received December 3, 2008

We offer mathematical proof that multiple-beam neutron reflectivity, corresponding to simultaneous collection of data at multiple angles (wavevector transfers), does not perform better, errorwise for counting noise, than single-beam data collection for the same total number of reflected neutrons, and may perform much worse, depending on the beam modulation strategy used. The basic idea is that the nominal statistical benefit of summing data at, say, N different wavevector transfers is undone by needing to collect N differently modulated (i.e., weighted) sums in order to extract the reflectivities. To our knowledge, a general proof of this behavior for arbitrary strategies has been lacking. The formal result can be summarized by saying that the best nondiagonal matrix modulation strategies are orthogonal (unitary) matrices, or constant multiples thereof, and that these can do no better than diagonal (i.e., single-beam) strategies.

Introduction

We offer mathematical proof that multiple-beam (or “multiplexed”) neutron reflectivity, corresponding to simultaneous collection of data at multiple angles (wavevector transfers), does not perform better, errorwise for counting (shot) noise, than single-beam data collection for the same total number of reflected neutrons, and may perform much worse, depending on the beam modulation strategy used. The basic idea is that the nominal statistical benefit of summing data at, say, N different wavevector transfers is undone by needing to collect N differently modulated (i.e., weighted) sums in order to extract the reflectivities. Such behavior has been described before,1 but to our knowledge a general proof for arbitrary strategies has been lacking. The formal result can be summarized by saying that the best nondiagonal matrix modulation strategies are orthogonal (unitary) matrices, or constant multiples thereof, and that these can do no better than diagonal (i.e., single-beam) strategies.

In this paper we lay out precisely what we mean by multiple-beam measurements and discuss shot noise “error” propagation through such systems. We derive a multiplication factor for the average weight of observed reflectivity noise relative to the average weight of the associated single-beam reflectivity noise and prove a theorem that this factor cannot be less than unity for any beam modulation strategy. The smallest possible value, unity, of the noise factor is achieved only by “trivial” (diagonal matrix) strategies and by nontrivial modulations described by orthogonal (unitary) matrices. Ill-chosen (nonunitary) strategies can lead to noise that grows with the number of beams in the configuration, as we illustrate in the Examples section. In the Conclusions, we briefly comment on recent innovative uses of divergent beams (a form of multiplexing) in reflectivity studies.2,3

Inverting Multiple Beam Data

Noiseless Data. Consider an instrument consisting of N beams simultaneously incident on a reflecting film. These beams have known intensities, $B_n$, $n = 1, \ldots, N$, comprising the set denoted by $\{B_n\}_N$. The corresponding reflected beams are detected by N detectors configured to correspond to known scattering wavevectors $\{Q_s\}_N$, $s = 1, \ldots, N$, at known wavelengths. We label each such reflected beam by the index s, for “slot.” In the absence of counting noise, the ideal reflected intensities comprise the set $\{I_s\}_N$, where $I_s = B_s R_{0s}$, and where $\{R_{0s}\}_N$ are the theoretical reflectivities associated with the corresponding $\{Q_s\}_N$ (the “0” subscript denotes their theoretical nature). Thus each slot also is associated with a “perfect” reflectivity, $R_{0s} = R_0(Q_s)$. For specular reflection, each detector records only a single reflected signal (viz., the $I_s$ “in” slot s), and there is no problem in identifying the unique reflectivities $R_{0s} = I_s/B_s$ in each slot. For nonspecular reflection, however, a given detector may receive up to N reflected beams, corresponding to all the $\{Q_s\}_N$ in the configuration. The resulting signal in the given detector thus is the sum of signals

$$S = \sum_{s=1}^{N} I_s = \sum_{s=1}^{N} B_s R_{0s} \tag{1}$$

(Of course, some $R_{0s}$ may be zero in the given detector.) Clearly, a single measurement of S is insufficient to determine $\{R_{0s}\}_N$ uniquely, even though $\{B_n\}_N$ is known. However, by performing N separate measurements, $S_1, S_2, \ldots, S_N$, in the same detector, with each one corresponding to different sets of incident beams over the N slots, $\{B_{1s}\}_N, \{B_{2s}\}_N, \ldots, \{B_{Ns}\}_N$, we obtain the set of equations

$$S_m = \sum_{s=1}^{N} I_{ms} = \sum_{s=1}^{N} B_{ms} R_{0s} \tag{2}$$

for $m = 1, \ldots, N$. Here we use index m for “measurement.” In obvious matrix notation, these measurements are summarized as

$$\mathbf{S} = \mathbf{B}\mathbf{R}_0 \tag{3}$$

where $\mathbf{S}$ and $\mathbf{R}_0$ are N-dimensional column vectors and $\mathbf{B}$ (sometimes notated below as $\mathbf{B}_N$) is an N × N matrix consisting of N rows (the mth row representing a measurement $S_m$) and N columns (the sth column corresponding to the sth slot).

† Part of the Neutron Reflectivity special issue.
* Corresponding author. E-mail: [email protected].
‡ University of Maryland.
§ NIST.
(1) Harwit, M.; Sloane, N. J. Hadamard Transform Optics; Academic Press: New York, 1979. These authors show that other error types can benefit from multibeam strategies.
(2) Rekveldt, M. Th.; Bouman, W. G.; Kraan, W. H.; Uca, O.; Grigoriev, S. V.; Habicht, K.; Keller, T. Elastic Neutron Scattering Measurements Using Larmor Precession of Polarized Neutrons. In Lecture Notes in Physics, Neutron Spin Echo Spectroscopy, Basics, Trends, and Applications; Mezei, F., Gutberlet, T., Pappas, C., Eds.; Springer: Berlin, 2003; p 87.

Langmuir 2009, 25, 4145-4153
10.1021/la802780v CCC: $40.75 © 2009 American Chemical Society. Published on Web 01/30/2009

The beam matrix $\mathbf{B}$ can be said to represent a “modulation” of the incident beam. A particular choice of $\mathbf{B}$ will be called a modulation (or beam) strategy, and we will call the intersections of measurements and slots “channels.” Thus in a beam strategy, each channel $c = (m, s)$ is associated with an incident beam $B_c = B_{ms}$. If $\mathbf{B}$ is invertible (we assume $\mathbf{B}$ is perfectly known), i.e., if $\det \mathbf{B} \neq 0$, then (3) is uniquely solved by

$$\mathbf{R} = \mathbf{B}^{-1}\mathbf{S} = \mathbf{R}_0 \tag{4}$$

or (anticipating forthcoming notation)

$$R(r) = (\mathbf{B}^{-1}\mathbf{S})_r = R_{0s}\big|_{s=r} \tag{5}$$

where index r labels a “result” (in this case, a solution). That is, the rth result is the reflectivity in the s = rth slot.
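The noiseless round trip in (3)-(5) can be sketched in a few lines of Python. The 3 × 3 beam matrix `B`, the reflectivities in `R0`, and the helper names `inverse` and `matvec` are assumptions made for this illustration only; exact rational arithmetic is used so that the recovered values can be compared without rounding error.

```python
from fractions import Fraction


def inverse(a):
    """Gauss-Jordan inverse of a square matrix in exact rational arithmetic.

    Raises StopIteration if the matrix is singular (no nonzero pivot exists).
    """
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matvec(a, v):
    """Product of a list-of-rows matrix with a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


# Hypothetical beam strategy: rows are measurements m, columns are slots s.
B = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0]]
# Hypothetical "true" reflectivities R0s, one per slot.
R0 = [Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)]

S = matvec(B, R0)           # noiseless signals, as in (3)
R = matvec(inverse(B), S)   # solution by inversion, as in (4)

print([str(x) for x in S])
print(R == R0)
```

Any invertible `B` returns the same `R` here, which is the sense in which the details of the strategy do not matter for noiseless data.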

Notice that the details of the modulation strategy are unimportant for this purpose, as long as $\mathbf{B}$ is well conditioned. A square matrix $\mathbf{B}$ is called singular if it is not invertible, i.e., if $\det \mathbf{B} = 0$. It is called ill conditioned if $|\det \mathbf{B}| < \varepsilon$, where $\varepsilon$ is some small number relevant to computational accuracy (typically “machine accuracy”). Singular matrices contain at least one row or column that is a linear combination of the others, and it is more-or-less easy to eliminate such possibilities in designing a strategy. As we will learn in the context of noisy data, however, “good” conditioning of a modulation strategy $\mathbf{B}$ requires more than a not-too-small determinant. Roughly speaking, it is necessary that both $\mathbf{B}$ and $\mathbf{B}^{-1}$ be well conditioned. Since $\det(\mathbf{B}^{-1}) = (\det \mathbf{B})^{-1}$, this implies that $|\det \mathbf{B}|$ should not be allowed to be too large, as well as not too small. This is restrictive but, it turns out, not sufficient for a good strategy.

Noisy Data. Allowing for Poisson counting noise (shot noise) in the reflected intensities introduces more than nuisance computational problems. For known $B_{ms}$, the reflected intensity is $P_{\mathrm{ran}}(I_{ms}) = P_{\mathrm{ran}}(B_{ms}R_{0s})$, where $X = P_{\mathrm{ran}}(x)$ is a Poisson random variable with mean value $x$ and standard deviation $\sqrt{x}$. The observable reflectivity then is a random variable defined by

$$R_{ms} = P_{\mathrm{ran}}(B_{ms}R_{0s})/B_{ms} \tag{6}$$

and using (6), the mth measurement of S yields

$$S_m = \sum_{s=1}^{N} B_{ms}R_{ms} \tag{7}$$

Alternatively, one could require $R_{ms} = R_{0s}$ for all m, $R_{0s}$ being the theoretical or ideal reflectivity in slot s, and then assert $B_{ms} = P_{\mathrm{ran}}(I_{ms})/R_{0s}$ as the random variable expressing the counting noise. The latter viewpoint, however, entails the awkward notion of defining incident beams in terms of sample-dependent (and therefore unknown) reflectivities, which is not common practice in single-beam reflectometry (unless $R_{0s} = 1$). The noisy reflectivities appearing in (6), while not directly observed in a multibeam measurement, could be measured one at a time in a sequence of single-beam experiments; and it is natural, therefore, to identify them as being the essential random variables of the problem. Indeed, considering $\mathbf{B}$ as “perfectly” known greatly facilitates devising an approximate solution, as we see next. Moreover, any uncertainty or “error” in $B_{ms}$ can be transferred to $R_{ms}$, since, obviously, $(xB_{ms})(yR_{ms}) = B_{ms}(xyR_{ms})$, for any x and y. Thus, the ensuing analysis can be applied to the problem of solving a linear system of equations when only the matrix of coefficients is uncertain, i.e., $x \neq 1$, $y = 1$, by interchanging x and y.

It is clear we can never measure a sufficient number of $S_m$ to exactly determine all the $R_{ms}$, since each instance of S introduces N additional (noisy) reflectivities; thus N measurements generate $N^2$ unknowns, and it is impossible to catch up. However, if $\mathbf{B}$ is invertible, we can follow our nose and, in the manner of (4), define a solution as

$$R(r) = (\mathbf{B}^{-1}\mathbf{S})_r \tag{8}$$

for $r = 1, \ldots, N$, as if the $\{R_c\}_{N^2}$ were not random. We will call this result the nominal inverse; in the noiseless case above, the nominal inverse is the true inverse, i.e., the exact solution. The nominal inverse is the exact solution of the linear regression

$$\min_{\{R(r)\}_N} \sum_{m=1}^{N} \left|S_m - (\mathbf{B}\mathbf{R})_m\right|^p \tag{9}$$

for known $\mathbf{B}$, with term-by-term zero minimum for any objective power p. Thus $R(r)$ has the unique attribute

$$S_m = \sum_{r=1}^{N} B_{mr}R(r) = \sum_{s=1}^{N} B_{ms}R_{ms} \tag{10}$$

for $m = 1, 2, \ldots, N$. Now, because the linear system in (7) is strongly underdetermined, each $S_m$ is degenerate over sets $\{R_{ms}\}_{N^2}$, meaning that, given $\mathbf{B}$ and $\{S_m\}_N$, the number of solutions is infinite. Geometrically, for each m the sets $\{B_{ms}\}_N$ and $\{R_{ms}\}_N$ can be viewed as vectors $\vec{B}_m$ and $\vec{R}_m$ in the N-dimensional Cartesian space $V_N$. Then $\vec{B}_m \cdot \vec{R}_m = S_m$ defines the hyperplane normal to $\vec{B}_m$ that terminates all vectors $\vec{R}_m$ with specified projection $S_m$ onto $\vec{B}_m$. For nonsingular $\mathbf{B}$, the point intersection of these N hyperplanes terminates the unique vector $\vec{R}$ having the same $\{S_m\}_N$, which is the nominal inverse. Depending on $\mathbf{B}$, $\vec{R}$ may or may not lie close to the “cluster” $\{\vec{R}_m\}_N$ determining $\{S_m\}_N$. For singular $\mathbf{B}$, in particular, $\{\vec{B}_m\}_N$ cannot span $V_N$, since it contains linearly dependent vectors; the hyperplanes normal to $\{\vec{B}_m\}_N$ thus cannot all meet at a point. The noiseless limit, $\{R_{ms}\}_{N^2} \to \{R_{0s}\}_N = \{R(r)\}_N$, restores consistency for nonsingular $\mathbf{B}$. For noisy measurements, on the other hand, we must assess the degree of consistency of different strategies by gauging how close $\{R(r)\}_N$ comes to the desired but unobservable $\{R_{0s}\}_N$. With the geometric picture in mind, one may well imagine that orthogonal modulations, with $\vec{B}_m \cdot \vec{B}_{m'} = c\,\delta_{mm'}$ (constant c), do better in this regard than nonorthogonal strategies. For these latter, angular dispersion of the vectors within $\{\vec{B}_m\}_N$ tends to compress with increasing N, effectively degrading their linear independence. We will give examples of such behavior later in the discussion.

The results of nominal inversion easily can be written as

$$R(r) = \sum_{m,s=1}^{N} C_{ms}(r)\,R_{ms} \tag{11a}$$

where

$$C_{ms}(r) = (\mathbf{B}^{-1})_{rm}B_{ms} \tag{11b}$$

This form explicitly expresses the N nominal inverses $R(r)$ as weighted averages of the $N^2$ unknown $R_{ms}$. The $C_{ms}(r)$ satisfy trivial sum rules

$$\sum_{m=1}^{N} C_{ms}(r) = \delta_{sr} \tag{12a}$$

$$\sum_{s,m=1}^{N} C_{sm}(r) = 1 \tag{12b}$$

and

$$\sum_{r=1}^{N} C_{mr}(r) = 1 \tag{12c}$$

(3) Pynn, R.; Fitzsimmons, M. R.; Fritzsche, H.; Gierlings, M.; Major, J.; Jason, A. Rev. Sci. Inst. 2005, 76, 053902.

For noiseless data, we define $R_{ms} = R_{0s}$, for $m = 1, \ldots, N$, indicating that the reflectivities in each slot are equal to the “true” $R_0$ in every measurement. Then from (11) and (12),

$$R(r) = \sum_{m,s=1}^{N} C_{ms}(r)\,R_{0s} = R_{0r} \tag{13}$$

which agrees with (4). Thus the nominal inverse of $\mathbf{S}$ is a true inverse for noiseless data. We can consider the propagation of shot noise through the nominal inversion by defining shot noise and result deviations (“errors”), $\Delta_{ms}$ and $\Delta(r)$, respectively, by way of

$$R_{ms} = R_{0s} + \Delta_{ms} \tag{14a}$$

and

$$R(r) = R_{0r} + \Delta(r) \tag{14b}$$

so that from (11) and (12),

$$\Delta(r) = \sum_{m,s=1}^{N} C_{ms}(r)\,\Delta_{ms} \tag{15}$$

with the same weights as in (11). Using the channel concept, with $(m, s) = c$, this also is

$$\Delta(r) = \sum_{c=1}^{N^2} C_c(r)\,\Delta_c \tag{16}$$

viz., a sum over all channels. The first line gives $\Delta(r)$ as an accumulation of slot errors, while the second defines each of these as $\mathbf{B}$-dependent weighted sums of shot noise. Equation 16 relates the error in $R(r)$, relative to $R_{0r}$, to the channel shot noise. (At this point of the discussion, we believe that “error” is more appropriate than “uncertainty,” since the $R(r)$ are defined quantities.)

The simplest consequence of (16), by analogy with the derivation of (13), is that “systematic errors” in the $R_c$, viz., m-independent measurement errors $\Delta_{ms} = \Delta_{0s}$, simply shift the $R_{0s}$ by the channel bias; i.e., bias is “passed through” without augmentation or diminution.
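The weights of (11b), the sum rules (12a) and (12c), and the pass-through of an m-independent bias can be checked directly on a small case. The following is a minimal sketch: the 3 × 3 matrix `B`, the bias values, and the helper names are hypothetical choices for illustration, not quantities from the text.

```python
from fractions import Fraction


def inverse(a):
    """Gauss-Jordan inverse of a square matrix in exact rational arithmetic."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def weights(B):
    """Return C with C[r][m][s] = (B^-1)_{rm} * B_{ms}, following (11b)."""
    n = len(B)
    Binv = inverse(B)
    return [[[Binv[r][m] * B[m][s] for s in range(n)] for m in range(n)]
            for r in range(n)]


# Hypothetical beam strategy (rows = measurements, columns = slots).
B = [[1, 0, 0],
     [1, 1, 0],
     [1, 1, 1]]
n = len(B)
C = weights(B)

# Sum rule (12a): summing over measurements m gives a Kronecker delta in (s, r).
rule_12a = all(sum(C[r][m][s] for m in range(n)) == (1 if s == r else 0)
               for r in range(n) for s in range(n))

# Sum rule (12c): for each measurement m, the sum over r of C_{mr}(r) is 1.
rule_12c = all(sum(C[r][m][r] for r in range(n)) == 1 for m in range(n))

# An m-independent (systematic) error per slot, Delta_ms = Delta_0s.
bias = [Fraction(1, 50), Fraction(-1, 100), Fraction(3, 200)]
# Result error from (15) with Delta_ms replaced by the slot bias.
delta = [sum(C[r][m][s] * bias[s] for m in range(n) for s in range(n))
         for r in range(n)]

print(rule_12a, rule_12c, delta == bias)
```

The last comparison is the pass-through property: the result error in slot r equals the bias in slot r, whatever invertible `B` is used.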

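Now consider an ensemble of experiments on the same setup, and let $\langle \cdots \rangle$ represent the formal average over these. Clearly, for (unbiased) shot noise, we can assume

$$\langle \Delta_c \rangle = 0 \tag{17a}$$

and

$$\langle \Delta_c \Delta_{c'} \rangle = 0 \tag{17b}$$

for $c' \neq c$. Thus from (16),

$$\langle \Delta(r) \rangle = 0 \tag{18a}$$

and

$$\operatorname{var} \Delta(r) = \sum_{s=1}^{N} \langle \Delta_{\bullet s}^2 \rangle \sum_{m=1}^{N} C_{ms}^2(r) = \sum_{s=1}^{N} \Gamma_s(r)\,\operatorname{var} \Delta_{\bullet s} \tag{18b}$$

where $\operatorname{var}(\bullet) = \langle (\bullet - \langle \bullet \rangle)^2 \rangle$ and

$$\Gamma_s(r) = \sum_{m=1}^{N} C_{ms}^2(r) \tag{18c}$$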

In (18b), $\Delta_{\bullet s}$ refers to the shot noise in any fixed measurement position ($m = 1, \ldots, N$) in an experiment. (N ordered measurements of S comprise an experiment; “generating” the ensemble causes measurement number $m = \bullet$ to be repeated an infinite number of times.) Thus in (18b), $\operatorname{var} \Delta_{\bullet s}$ is the shot noise variance associated with (any measurement of) the sth slot in the setup. According to (18b), the ensemble variance of $\Delta(r)$ is a sum over slots of shot noise variances, weighted by “sensitivity” factors, $\Gamma_s(r)$. It is important to note in (18b) that both $\Gamma_s(r)$ and $\operatorname{var} \Delta_{\bullet s}$ depend on the modulation matrix $\mathbf{B}$: the former, manifestly so, of course, because of $C_{ms}$; the latter, because $\Delta_{ms} = R_{ms} - R_{0s}$, where $R_{ms}$ is a (scaled) Poisson random variable with mean $R_{0s}$ and variance $R_{0s}/B_{ms}$ for nonzero elements of $\mathbf{B}$. (For now, let us continue to assume that $B_{ms} \geq 0$. An effective beam strategy, $\mathbf{B} \to \mathbf{B}'$, can have negative elements in a modification of the setup in which reflectivity differences are measured, which will be discussed below in Differential Modulation Strategies.)

As a touchstone, consider a mathematically trivial modulation strategy consisting of N separate single-slot measurements. The corresponding beam matrix can be defined as diagonal, with nonzero elements $B_{nn} = b_n$, and thus

$$C_{ms}(r) = \delta_{mr}\delta_{sr} \tag{19}$$

Therefore, in (18b), $\Gamma_s(r) = \delta_{sr}$ for a trivial strategy, and so

$$\operatorname{var} \Delta(r) = \operatorname{var} \Delta_{\bullet s}\big|_{s=r} \tag{20}$$

as expected for single-beam measurements.

The mean result variance, or mean square result error (MSRE), is

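$$\mathrm{MSRE} = \overline{\operatorname{var} \Delta(r)} = \sum_{s=1}^{N} \Gamma_s'\,\operatorname{var} \Delta_{\bullet s} \tag{21a}$$

with

$$\Gamma_s' = N^{-1}\sum_{r=1}^{N} \Gamma_s(r) \tag{21b}$$

Thus MSRE is a weighted sum of noise variances. A useful figure of merit for a beam strategy is

$$\Omega = \sum_{s=1}^{N} \Gamma_s' = N^{-1}\sum_{r,s} \Gamma_s(r) \tag{22}$$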

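So if $\operatorname{var} \Delta_{\bullet s} = \sigma^2$ for all slots (a reasonable approximation when $\{R_{0s}\}_N$ covers a small dynamical range), then $\mathrm{MSRE} = \Omega\sigma^2$. With the definition of $C_{ms}(r)$ in (11b), we can recast (22) as

$$\Omega = N^{-1}\sum_{m,r,s} |C_{ms}(r)|^2 = N^{-1}\sum_{m=1}^{N} (\mathbf{B}\mathbf{B}^\dagger)_{mm}\,\big((\mathbf{B}^\dagger\mathbf{B})^{-1}\big)_{mm} \tag{23}$$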

Since $\mathbf{B}$ is real, $|C_{ms}(r)|^2 = C_{ms}^2(r)$, and $\mathbf{B}^\dagger = \mathbf{B}^T$; however, we prefer to cast these formulas in the more general terminology of complex linear algebra. It must be emphasized that $\Omega$ is a coarse-grained measure of result error. Nevertheless it is a useful indicator of the reliability of a multiple-beam strategy, while also enabling a rough estimate of measurement uncertainty: say, e.g., reporting results as

$$R(r) \pm k\sqrt{\mathrm{MSRE}} = R(r) \pm k\sqrt{\Omega}\,\sigma \tag{24}$$

where now $\sigma = (\operatorname{var} \Delta_{\bullet s})^{1/2}$, and where k is a “coverage factor,” usually k = 1, 2, or 3, depending on local practice. Below, we prove that $\Omega \geq 1$ for all $\mathbf{B}$.
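To make the reporting rule (24) concrete, the short sketch below computes the half-width $k\sqrt{\Omega}\,\sigma$ for given inputs. The function name `coverage_halfwidth` and the numerical values are hypothetical and serve only to show how $\Omega$ scales the single-beam noise level $\sigma$.

```python
import math


def coverage_halfwidth(omega, sigma, k=2):
    """Half-width k * sqrt(omega) * sigma of the interval reported in (24)."""
    return k * math.sqrt(omega) * sigma


# Hypothetical numbers: single-beam noise level sigma = 0.01, coverage factor k = 2.
single_beam = coverage_halfwidth(1.0, 0.01)   # strategy with Omega = 1
multibeam = coverage_halfwidth(4.0, 0.01)     # strategy with Omega = 4

# The ratio of the two half-widths is sqrt(Omega).
print(single_beam, multibeam, multibeam / single_beam)
```

A strategy with $\Omega = 4$ therefore doubles the reported half-width relative to a strategy with $\Omega = 1$ at the same $\sigma$.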

Differential Modulation Strategies. It will prove very useful to slightly expand our earlier definitions of setup and experiment. Let us now allow the application of two distinct modulation strategies, $\mathbf{B}_1$ and $\mathbf{B}_2$, over the same channels on the setup, calling the corresponding data sets $\{S_{1,m}\}_N$ and $\{S_{2,m}\}_N$, respectively. Thus, from (7), on the setup we have the two data sets

$$S_{i,m} = \sum_{s=1}^{N} B_{i,ms}R_{i,ms} \tag{25}$$

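for i = 1, 2, and their difference can be represented as

$$S_{1,m} - S_{2,m} = \sum_{s=1}^{N} (B_{1,ms} - B_{2,ms})\,\frac{R_{1,ms} + R_{2,ms}}{2} + \sum_{s=1}^{N} (B_{1,ms} + B_{2,ms})\,\frac{R_{1,ms} - R_{2,ms}}{2} \tag{26}$$

Or, from (14),

$$S_{1,m} - S_{2,m} = \sum_{s=1}^{N} (B_{1,ms} - B_{2,ms})\left(R_{0s} + \frac{\Delta_{1,ms} + \Delta_{2,ms}}{2}\right) + \sum_{s=1}^{N} (B_{1,ms} + B_{2,ms})\,\frac{\Delta_{1,ms} - \Delta_{2,ms}}{2} \tag{27}$$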

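To make this useful, define

$$\mathbf{B}_i = \frac{\mathbf{B}_0 - (-1)^i\,\mathbf{B}'}{2} \tag{28}$$

where $\mathbf{B}_0$ is a uniform matrix, $B_{0,ms} = b_0$ (i.e., no modulation), and $\mathbf{B}'$ is the modulation “of interest.” With this choice, $\mathbf{B}_1 - \mathbf{B}_2 = \mathbf{B}'$ and $\mathbf{B}_1 + \mathbf{B}_2 = \mathbf{B}_0$, so that (27) becomes

$$S_m' = \sum_{s=1}^{N} B_{ms}'\,(R_{0s} + \Delta_{ms}') + S_{0,m}' \tag{29a}$$

where

$$S_m' = S_{1,m} - S_{2,m} \tag{29b}$$

$$\Delta_{ms}' = \tfrac{1}{2}(\Delta_{1,ms} + \Delta_{2,ms}) \tag{29c}$$

and

$$S_{0,m}' = \frac{b_0}{2}\sum_{s=1}^{N} (\Delta_{1,ms} - \Delta_{2,ms}) \tag{29d}$$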

By analogy with (8) and ensuing equations, the nominal inversion of (29a), with respect to $\mathbf{B}'$, gives

$$R'(r) = \big((\mathbf{B}')^{-1}\mathbf{S}'\big)_r = R_{0r} + \Delta'(r) + \Delta_0'(r) \tag{30a}$$

with

$$\Delta'(r) = \sum_{m,s=1}^{N} C_{ms}'(r)\,\Delta_{ms}' \tag{30b}$$

and

$$\Delta_0'(r) = \big((\mathbf{B}')^{-1}\mathbf{S}_0'\big)_r \tag{30c}$$

In (30b), $C_{ms}'(r)$ is just the $C_{ms}(r)$ defined by (11b), evaluated here for matrix $\mathbf{B}'$. The extra error term $\Delta_0'(r)$ in (30) is new, a “price paid” for using differential modulation. We will call it difference-induced noise (DIN) and note that it is separately measurable. On the same setup, do two experiments with the uniform beam strategy $\mathbf{B}_0$, and compute the difference $\mathbf{S}_0'$; then with the given $\mathbf{B}'$, compute the DIN from (30c) or, more precisely, a value of DIN drawn from the setup ensemble.

If there is a cost to a differential strategy (aside from additional measurement time), what is the benefit? The answer is that while $\mathbf{B}$ for “regular” strategies must be non-negative in all channels, since the $B_{ms}$ are beam intensities, $\mathbf{B}'$ can have negative elements, since each is an intensity difference. The only restriction is that $\mathbf{B}_0$ be chosen so that $|B_{ms}'| \leq b_0$ in all channels of a given $\mathbf{B}'$. This extra freedom provides a much larger class of potentially useful modulation schemes to choose from than do regular strategies. For example, for N > 2, unitary (orthogonal) matrices tend to have roughly equal numbers of non-negative and negative elements, so regular strategies cannot be unitary or even sensibly approximate unitary matrices. Unitary modulations have the desirable property that $\Omega = 1$, so that gross error magnification is prevented. Adding DIN essentially doubles the error, since $\Delta'(r)$ and $\Delta_0'(r)$ depend on the conditioning of $\mathbf{B}'$ in roughly the same way, but for unitary $\mathbf{B}'$, this, more-or-less, is the only penalty.
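A small noiseless sketch may help fix the construction. It assumes that (28) reads $\mathbf{B}_i = [\mathbf{B}_0 - (-1)^i \mathbf{B}']/2$, which is the reading consistent with the restriction $|B_{ms}'| \leq b_0$ and with the prefactor in (29d). The 4 × 4 Hadamard matrix used for $\mathbf{B}'$, the value $b_0 = 1$, and the reflectivities are hypothetical choices for illustration.

```python
from fractions import Fraction

# Hypothetical modulation of interest: a 4 x 4 Hadamard matrix (entries +1/-1).
# Its rows are mutually orthogonal, so H H^T = 4 * identity.
B_prime = [[1,  1,  1,  1],
           [1, -1,  1, -1],
           [1,  1, -1, -1],
           [1, -1, -1,  1]]
n = len(B_prime)
b0 = 1  # uniform (unmodulated) level; must satisfy |B'_ms| <= b0

# Assumed reading of (28): B_1 = (B_0 + B')/2 and B_2 = (B_0 - B')/2.
B1 = [[Fraction(b0 + x, 2) for x in row] for row in B_prime]
B2 = [[Fraction(b0 - x, 2) for x in row] for row in B_prime]


def matvec(a, v):
    """Product of a list-of-rows matrix with a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


# Hypothetical "true" reflectivities, one per slot.
R0 = [Fraction(1, 2), Fraction(1, 5), Fraction(1, 20), Fraction(1, 100)]

# Two noiseless data sets, one per physical (non-negative) strategy, as in (25).
S1 = matvec(B1, R0)
S2 = matvec(B2, R0)
S_diff = [x - y for x, y in zip(S1, S2)]   # difference data, as in (29b)

# Because H H^T = 4 I, the inverse of B' is its transpose divided by 4.
B_prime_inv = [[Fraction(B_prime[s][m], n) for s in range(n)] for m in range(n)]
R = matvec(B_prime_inv, S_diff)            # nominal inversion with respect to B'

nonnegative = all(x >= 0 for row in B1 + B2 for x in row)
print(nonnegative, R == R0)
```

Both physical beam matrices contain only the intensities 0 and 1 here, while the effective modulation $\mathbf{B}'$ is a constant multiple of an orthogonal matrix, which no non-negative strategy with N > 2 can be.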

In the remaining discussion, we drop the (′) notation. Except for the appearance of DIN, there is no structural distinction between regular and differential modulations, as shown by (29); and the differential shot noise $\Delta_{ms}'$ is, in general terms, statistically indistinguishable from $\Delta_{ms}$ for regular strategies, as long as it is remembered that the scale of shot noise is determined by true beam intensities, not differential intensities, i.e., in (28), by the $B_{1,ms}$ and $B_{2,ms}$, not the $|B_{ms}'|$.

Theorem and Corollary on Noise Propagation

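Recalling (23), for any N × N matrix $\mathbf{B}$, let

$$\Omega[\mathbf{B}] = N^{-1}\sum_{m=1}^{N} (\mathbf{B}\mathbf{B}^\dagger)_{mm}\,\big((\mathbf{B}^\dagger\mathbf{B})^{-1}\big)_{mm} \tag{31}$$

It is easy to see that $\Omega = 1$ if $\mathbf{B}$ is diagonal or a constant scalar multiple c of a unitary matrix. In the latter case, $\mathbf{B}\mathbf{B}^\dagger = \mathbf{B}^\dagger\mathbf{B} = c^2 \times \mathbf{1}$, the constant then cancels out, and $\Omega = N^{-1}\sum_{m=1}^{N} 1 = 1$. We will not distinguish between unitary ($\det \mathbf{B} = 1$) and antiunitary ($\det \mathbf{B} = -1$), since the two are indistinguishable with respect to $\Omega$.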
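The two cases with $\Omega = 1$ can be confirmed numerically by evaluating the matrix form of (31) as written, for real matrices. In the sketch below the function names and all three test matrices are hypothetical choices; exact rational arithmetic avoids any rounding question.

```python
from fractions import Fraction


def inverse(a):
    """Gauss-Jordan inverse of a square matrix in exact rational arithmetic."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def omega(B):
    """Omega[B] = N^-1 * sum_m (B B^T)_mm * ((B^T B)^-1)_mm for a real matrix B."""
    n = len(B)
    # Diagonal of B B^T: squared norm of each row of B.
    bbt_diag = [sum(Fraction(x) ** 2 for x in row) for row in B]
    # Full matrix B^T B, then its inverse.
    btb = [[sum(Fraction(B[k][i]) * B[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    btb_inv = inverse(btb)
    return sum(bbt_diag[m] * btb_inv[m][m] for m in range(n)) / n


diagonal = [[2, 0, 0],
            [0, 5, 0],
            [0, 0, 7]]        # "trivial" single-beam strategy
scaled_rotation = [[3, 4],
                   [-4, 3]]   # 5 times an orthogonal (rotation) matrix
triangular = [[1, 1],
              [1, 0]]         # neither diagonal nor a multiple of an orthogonal matrix

print(omega(diagonal), omega(scaled_rotation), omega(triangular))
```

The first two values equal 1, while the third is larger, in line with the statement that only diagonal matrices and multiples of unitary matrices reach the lower limit.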

A diagonal $\mathbf{B}$ realizes a “trivial” strategy, one indistinguishable from N single-beam measurements, so it is not surprising that it does not magnify propagated shot noise. A unitary strategy is not trivial, in the sense we have defined it, but it also assures that (on average) shot noise is propagated without magnification. Of course, a prime motivation for considering nontrivial inversion strategies rests in the hope that they suppress shot noise propagation to some degree, thus taking “full advantage” of the multiplicity of beams contributing to the summed data in each detector element. That is, we desire modulation strategies for which $\Omega < 1$. While it is not immediately obvious from (31) that this goal is unattainable, computing $\Omega$ for various $\mathbf{B}$ of possible interest may lead one to suspect that, at the least, it is hard to reach. (We will survey some cases below.) In fact, we will now prove that there exist no matrix strategies $\mathbf{B}$ that produce $\Omega < 1$.

Theorem. $\Omega[\mathbf{B}] \geq 1$ for all square matrices $\mathbf{B}$, with $\Omega[\mathbf{B}] = 1$ only for $\mathbf{B}$ that are diagonal and for $\mathbf{B}$ that are constant scalar multiples of unitary (or antiunitary) matrices.

Corollary. No multiple-beam modulation strategy can, on average, propagate less shot noise through nominal inversion than a unitary (orthogonal) strategy or a diagonal (trivial) strategy. Thus, in this sense, no multiple-beam measurement can outperform equivalent single-beam measurements.


The corollary is an interpretive restatement of the theorem, given that $\Omega[\mathbf{B}]$ represents the ratio of the average variance of the error in nominal inversion to the average variance of the measurement shot noise. Thus, if the interpretation is accepted, proving the theorem proves the corollary.

Proof. That $\Omega[\mathbf{B}] = 1$ if $\mathbf{B}$ is diagonal or unitary is self-evident, as described above. Since $\Omega[c\mathbf{B}] = \Omega[\mathbf{B}]$ for scalar constant c, we can assume, for now, that c = 1. In order to show that for all other matrices, $\Omega[\mathbf{B}] > 1$, we use the method of singular value decomposition.

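Every square matrix $\mathbf{B}$ has a singular value decomposition (SVD), which is representable in Dirac notation as

$$\mathbf{B} = \sum_{\nu=1}^{N} |\psi_\nu\rangle\,\varsigma_\nu\,\langle\phi_\nu| \tag{32}$$

such that $B_{ms} = \langle m|\mathbf{B}|s\rangle = \sum_{\nu=1}^{N} \varsigma_\nu \langle m|\psi_\nu\rangle\langle\phi_\nu|s\rangle = \sum_{\nu=1}^{N} \varsigma_\nu \psi_\nu(m)\phi_\nu(s)$. The $\varsigma_\nu$, for $\nu = 1, 2, \ldots, N$, are the real-valued singular values of $\mathbf{B}$, and vectors $|\psi_\nu\rangle$ and $|\phi_\nu\rangle$ simultaneously satisfy

$$\mathbf{B}|\phi_\nu\rangle = \varsigma_\nu|\psi_\nu\rangle \quad\text{and}\quad \mathbf{B}^\dagger|\psi_\nu\rangle = \varsigma_\nu|\phi_\nu\rangle \tag{33}$$

with

$$\langle\psi_\nu|\psi_\mu\rangle = \langle\phi_\nu|\phi_\mu\rangle = \delta_{\mu\nu} \tag{34a}$$

It follows that

$$\mathbf{B}\mathbf{B}^\dagger|\psi_\nu\rangle = \lambda_\nu|\psi_\nu\rangle \quad\text{and}\quad \mathbf{B}^\dagger\mathbf{B}|\phi_\nu\rangle = \lambda_\nu|\phi_\nu\rangle \quad\text{with}\quad \lambda_\nu = \varsigma_\nu^2 \tag{35}$$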

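for $\nu = 1, 2, \ldots, N$. Thus the Hermitian (in this case, real-symmetric) matrices $\mathbf{B}\mathbf{B}^\dagger$ and $\mathbf{B}^\dagger\mathbf{B}$ share common real-valued, non-negative eigenvalues, and their eigenvectors, $|\psi_\nu\rangle$ and $|\phi_\nu\rangle$, derivable from the singular value decomposition (SVD) of $\mathbf{B}$, comprise (usually distinct) complete bases for the N-dimensional Hilbert space holding $\mathbf{B}$, with

$$\sum_{\mu=1}^{N} |\psi_\mu\rangle\langle\psi_\mu| = \sum_{\mu=1}^{N} |\phi_\mu\rangle\langle\phi_\mu| = \mathbf{1} \tag{36}$$

with the $\mathbf{1}$ on the right-hand side (rhs) representing the identity operator. The corresponding SVD of $\mathbf{B}^{-1}$ is

$$\mathbf{B}^{-1} = \sum_{\nu=1}^{N} |\phi_\nu\rangle\,\varsigma_\nu^{-1}\,\langle\psi_\nu| \tag{37}$$

which is easily checked by forming $\mathbf{B}\mathbf{B}^{-1}$ and $\mathbf{B}^{-1}\mathbf{B}$ and using the properties of the $\psi$ and $\phi$ bases.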

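Thus, $\Omega[\mathbf{B}]$ has the SVD representation

$$\Omega[\mathbf{B}] = \sum_{\mu,\nu} a_{\mu\nu}\,\frac{\lambda_\mu}{\lambda_\nu} \tag{38}$$

with $\sum_{\mu,\nu}$ standing for $\sum_{\mu=1}^{N}\sum_{\nu=1}^{N}$ and where

$$a_{\mu\nu} = N^{-1}\sum_{m=1}^{N} |\psi_\mu(m)|^2\,|\phi_\nu(m)|^2 \tag{39}$$

From the completeness of the $\psi$ and $\phi$ bases, the $a_{\mu\nu}$ satisfy the sum rules

$$\sum_{\mu=1}^{N} a_{\mu\nu} = \sum_{\nu=1}^{N} a_{\mu\nu} = N^{-1} \tag{40}$$

and therefore,

$$\sum_{\mu,\nu} a_{\mu\nu} = 1 \tag{41}$$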

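Thus

$$\Omega[\mathbf{B}] = 1 + \sum_{\mu,\nu} a_{\mu\nu}\left(\frac{\lambda_\mu}{\lambda_\nu} - 1\right) = 1 + \sum_{\mu<\nu}\left[a_{\mu\nu}\left(\frac{\lambda_\mu}{\lambda_\nu} - 1\right) + a_{\nu\mu}\left(\frac{\lambda_\nu}{\lambda_\mu} - 1\right)\right] \tag{42}$$

Next we wish to view $\Omega[\mathbf{B}]$ as a function of variables $\lambda_\nu$, $\nu = 1, 2, \ldots, N$, for fixed $a_{\mu\nu} = a_{\mu\nu}^{(0)}$; i.e.,

$$\Omega[\vec{\lambda}] = 1 + \sum_{\mu<\nu}\left[a_{\mu\nu}^{(0)}\left(\frac{\lambda_\mu}{\lambda_\nu} - 1\right) + a_{\nu\mu}^{(0)}\left(\frac{\lambda_\nu}{\lambda_\mu} - 1\right)\right] \tag{43}$$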

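where $\vec{\lambda} = (\lambda_1, \lambda_2, \ldots, \lambda_N)$. SVD provides a mechanism for this. For given $\mathbf{B} = \mathbf{B}^{(0)}$, construct the SVD

$$\mathbf{B}^{(0)} = \sum_{\nu=1}^{N} |\psi_\nu^{(0)}\rangle\,\varsigma_\nu^{(0)}\,\langle\phi_\nu^{(0)}| \tag{44}$$

and define the function

$$\mathbf{B}[\vec{\varsigma}] = \sum_{\nu=1}^{N} |\psi_\nu^{(0)}\rangle\,\varsigma_\nu\,\langle\phi_\nu^{(0)}| \tag{45}$$

such that $\mathbf{B}[\vec{\varsigma}^{(0)}] = \mathbf{B}^{(0)}$. This represents a continuous “deformation” of $\mathbf{B}^{(0)}$ to all matrices sharing the same sets of $\psi = \psi^{(0)}$ and $\phi = \phi^{(0)}$. In particular, there is always a continuous path to

$$\mathbf{B}[\vec{1}] = \sum_{\nu=1}^{N} |\psi_\nu^{(0)}\rangle\langle\phi_\nu^{(0)}| \tag{46}$$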

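where $\vec{1} = (1, 1, \ldots, 1)$, which is unitary. (In conventional SVD notation, $\mathbf{B}[\vec{1}] = \mathbf{U}\mathbf{V}^\dagger$, where $\mathbf{U}$ and $\mathbf{V}$ are unitary.) Since $\mathbf{B}^{(0)}$ can be arbitrary, for every matrix $\mathbf{B}$ there exists a unitary matrix $\mathbf{B}[\vec{1}]$ into which it can be continuously transformed by transforming its singular values, $\vec{\varsigma} \to \vec{1}$. Thus by taking any $\mathbf{B} = \mathbf{B}^{(0)} \to \mathbf{B}[\vec{\varsigma}]$, we arrive at $\Omega[\vec{\lambda}]$, with

$$a_{\mu\nu}^{(0)} = N^{-1}\sum_{m=1}^{N} |\psi_\mu^{(0)}(m)|^2\,|\phi_\nu^{(0)}(m)|^2 > 0 \tag{47}$$

and $\vec{\lambda} = \vec{\varsigma}^{\,2}$ (componentwise). Note that $\lambda_\nu \geq 0$ for $\nu = 1, 2, \ldots, N$; and by restricting ourselves to nonsingular matrices, we also can assert $\lambda_\nu > 0$. Thus it is easy to see that

$$\frac{\partial^2 \Omega[\vec{\lambda}]}{\partial \lambda_\mu^2} = \frac{2}{\lambda_\mu^3}\sum_{\nu \neq \mu} a_{\nu\mu}^{(0)}\,\lambda_\nu > 0 \tag{48}$$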

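for $\mu = 1, 2, \ldots, N$ and, therefore, that $\Omega[\vec{\lambda}]$ is always concave-upward. Thus its minimum value is attainable along any continuous $\lambda$-path. A convenient one for us is along the (N − 1)-dimensional diagonal $\vec{\lambda}_d = (\lambda, \lambda, \ldots, \lambda, 1)$, where we have defined $\lambda_N = 1$. This can be done without loss of generality, since it is equivalent to taking $\mathbf{B} \to \mathbf{B}/\varsigma_N$. Then

$$\Omega[\vec{\lambda}_d] = 1 + \sum_{\mu<N}\left[a_{\mu N}^{(0)}(\lambda - 1) + a_{N\mu}^{(0)}\left(\frac{1}{\lambda} - 1\right)\right] = 1 + b\,\tau(\lambda) \tag{49}$$

where, from the completeness property of $\psi^{(0)}$ and $\phi^{(0)}$,

$$b = N^{-1}\left[1 - \sum_{m=1}^{N} |\psi_N^{(0)}(m)|^2\,|\phi_N^{(0)}(m)|^2\right] \tag{50}$$

and where

$$\tau(\lambda) = \lambda + \lambda^{-1} - 2 \tag{51}$$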

Since $\psi^{(0)}$ and $\phi^{(0)}$ are normalized to unity, we necessarily have $0 < b < N^{-1}$ and $\tau(\lambda) \geq 0$ for $\lambda > 0$, with its unique minimum occurring at $\lambda = 1$. Therefore $\Omega[\vec{\lambda}_d]$, and thus also $\Omega[\vec{\lambda}]$, attains its minimum value at $\vec{\lambda} = \vec{1}$, corresponding to a unitary matrix $\mathbf{B}[\vec{1}]$. Since this is true for every matrix, the theorem is proved.
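The one-parameter form $1 + b\,\tau(\lambda)$ along the path with $\lambda_N = 1$ can be checked on a small case. The sketch below takes N = 2 and builds the deformed matrix from two fixed orthonormal bases. The bases (rational rotations from the 3-4-5 and 5-12-13 triangles), the sampled singular values, and all function names are assumptions for illustration; $\Omega$ is evaluated from the matrix form of (31) for real matrices.

```python
from fractions import Fraction


def inverse(a):
    """Gauss-Jordan inverse of a square matrix in exact rational arithmetic."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def omega(B):
    """Omega[B] = N^-1 * sum_m (B B^T)_mm * ((B^T B)^-1)_mm for a real matrix B."""
    n = len(B)
    bbt_diag = [sum(Fraction(x) ** 2 for x in row) for row in B]
    btb = [[sum(Fraction(B[k][i]) * B[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    btb_inv = inverse(btb)
    return sum(bbt_diag[m] * btb_inv[m][m] for m in range(n)) / n


N = 2
# Hypothetical fixed orthonormal bases with rational components: psi[nu][m], phi[nu][s].
psi = [[Fraction(3, 5), Fraction(-4, 5)],
       [Fraction(4, 5), Fraction(3, 5)]]
phi = [[Fraction(5, 13), Fraction(12, 13)],
       [Fraction(-12, 13), Fraction(5, 13)]]


def deformed(svals):
    """B_ms = sum_nu svals[nu] * psi_nu(m) * phi_nu(s), the deformation in (45)."""
    return [[sum(svals[v] * psi[v][m] * phi[v][s] for v in range(N))
             for s in range(N)] for m in range(N)]


def tau(lam):
    """tau(lambda) = lambda + 1/lambda - 2, as in (51)."""
    return lam + 1 / lam - 2


# Coefficient b from (50), built from the last (nu = N) basis vectors.
b = (1 - sum(psi[N - 1][m] ** 2 * phi[N - 1][m] ** 2 for m in range(N))) / N

# Walk along the path (lambda, 1), with lambda the square of the first singular value.
for s in [Fraction(1, 2), Fraction(1), Fraction(2), Fraction(3)]:
    lam = s * s
    print(lam, omega(deformed([s, Fraction(1)])), 1 + b * tau(lam))
```

The two printed columns of $\Omega$ values agree at every sampled point, and the value at $\lambda = 1$ (the orthogonal matrix) is the smallest.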

Comments

(1) For nonunitary diagonal matrices $\mathbf{B}_d$, the SVD can always be arranged to make $|\psi_\nu\rangle$ and $|\phi_\nu\rangle$ unit vectors along the $\nu$-axes, reducing the SVD to a trivial identity. This induces b = 0, so that the hypersurface $\Omega[\vec{\lambda}]$ degenerates to the hyperplane $\Omega[\vec{\lambda}] = 1$.

(2) A matrix $\mathbf{B}$ is normal (“diagonalizable”) if it can be brought into diagonal form by a unitary similarity transformation. A necessary and sufficient condition for normality is $\mathbf{B}\mathbf{B}^\dagger = \mathbf{B}^\dagger\mathbf{B}$. In this case, we can take $a_{\mu\nu} = N^{-1}\sum_{m=1}^{N} |\psi_\mu(m)|^2\,|\psi_\nu(m)|^2 = a_{\nu\mu}$, and then

$$\Omega[\vec{\lambda}] = 1 + \sum_{\mu<\nu} a_{\mu\nu}^{(0)}\,\tau(\lambda_\nu/\lambda_\mu) \tag{52}$$

Each term in the $\mu\nu$-summation now is manifestly non-negative for arbitrary $\vec{\lambda}$, in contrast to the general case, and has a unique minimum at $\lambda_\nu/\lambda_\mu = 1$. Therefore, $\Omega[\vec{\lambda}] \geq 1$, and the minimum occurs at $\vec{\lambda} = \vec{1}$, corresponding to (a multiple of) a unitary $\mathbf{B}$. Clearly, the restriction to normal $\mathbf{B}$ allows a somewhat more direct proof of the theorem, but modulation strategies of interest need not be normal, as we will see below.

(3) For N = 2, only $\mu = 1$ and $\nu = 2$ occur in $\Omega[\vec{\lambda}] = \Omega[(\lambda, 1)]$; therefore $a_{12} = a_{21}$, regardless of whether $\mathbf{B}$ is normal. Thus $\Omega[(\lambda, 1)] = 1 + a_{12}\tau(\lambda) \geq 1$ and $\Omega[(\lambda, 1)] = 1$ for $\lambda = 1$, i.e., for unitary $\mathbf{B}$. For diagonal $\mathbf{B}$, $a_{12} = 0$, since, as in the general case, we can arrange then to have $\psi_1(m)\phi_2(m) = 0$.

(4) One sees from (42) that large values of $\lambda_\nu/\lambda_\mu$ tend to produce large contributions to the $\mu\nu$-summation. Thus, we can expect that $\mathbf{B}$ values characterized by large condition numbers, i.e., large values of $\varsigma_{\max}/\varsigma_{\min}$, will also be characterized by large values of $\Omega$. Indeed many types of matrices produce at least one large singular value that increases with the dimension N and separates from the rest of the spectrum, thereby becoming ill-conditioned. We show examples of this phenomenon below.

Broken Symmetry. Let us say two matrix modulations are logically equivalent if they differ only by permutations of rows and columns. Permuting columns and rows corresponds to relabeling beam slots and measurements, respectively. Every $\mathbf{B}$ thus belongs to the logically equivalent class consisting of $\mathbf{B}$ and all $\mathbf{B}'$ obtainable from it by row and column permutations; $|\det \mathbf{B}|$ is an obvious invariant of the class. We would expect that the results of an experiment are invariant to mere relabeling of the components of the setup and thus, in particular, that $\Omega[\mathbf{B}]$ is the same for all logically equivalent $\mathbf{B}$ (hence our choice of terminology). This indeed is true of nominal inversion in the absence of noise. For then, a nominal inverse is a true inverse; for nonsingular $\mathbf{B}$, $\mathbf{B}^{-1}$ necessarily “finds” $\mathbf{B}$ to produce the identity matrix, regardless of the form of $\mathbf{B}$. Noise, however, inhibits the formation of $\mathbf{B}^{-1}\mathbf{B}$ in the nominal inverse. Thus logical equivalence is not a guaranteed symmetry of nominal inversion, and $\Omega[\mathbf{B}]$ may or may not be constant within a logically equivalent class of modulations.

This broken symmetry is a defect of nominal inversion but not one of particular practical consequence. A unitary matrix remains unitary under row and column permutations, and thus unitary strategies preserve logical equivalence symmetry. For putative nonunitary strategies, one can survey, or at least sample, $\Omega[\mathbf{B}]$ over the corresponding logically equivalent class and choose a $\mathbf{B}$ within the class that gives the smallest or near-smallest value of $\Omega$. “Almost” unitary strategies, i.e., modulations for which $\Omega[\mathbf{B}] \approx 1$, generally remain almost unitary across their logically equivalent classes. Similarly, poorly behaving modulations, marked by $\Omega[\mathbf{B}] \gg 1$, generally share that attribute across their logically equivalent classes, and such strategies likely would be abandoned altogether. A singular class of exceptions are nonunitary diagonal matrices $\mathbf{B}_d$, i.e., diagonal matrices $\mathbf{B}_d \neq \mathbf{1}$. In such a case, some members $\mathbf{B}_d'$ of the logically equivalent class may produce $\Omega[\mathbf{B}_d'] \gg 1$ even though, necessarily, $\Omega[\mathbf{B}_d] = 1$. This exception arises from the fact that $\Omega[\mathbf{B}_d] = 1$ is the only nonunitary case for which $\Omega[\mathbf{B}] = 1$. The logically equivalent class of the identity matrix contains only unitary matrices. One semantic consequence of broken symmetry is that our identification of trivial strategies $\mathbf{B}_d$ with single-beam measurements does not extend to all their logically equivalent nondiagonal versions, except, of course, in the unitary case.

Condition Number. The condition number $\kappa$ of a square matrix $\mathbf{B}$ is defined in general terms by $\kappa[\mathbf{B}] = \|\mathbf{B}\|\,\|\mathbf{B}^{-1}\|$, where $\|\mathbf{B}\|$ denotes a chosen norm for measuring the “magnitude” of $\mathbf{B}$, analogously, in some sense, to a vector norm. A commonly used norm for these purposes is the so-called “2-norm”, $\|\mathbf{B}\|_2 = \varsigma_{\max}$, i.e., the largest singular value of $\mathbf{B}$. The 2-norm condition number then is

$$\kappa_2[\mathbf{B}] = \frac{\varsigma_{\max}}{\varsigma_{\min}} \tag{53}$$

The condition number signifies a $\mathbf{B}$ that is “(very) well conditioned”, $\kappa \approx 1$, or “(very) ill conditioned”, $\kappa \gg 1$, for the purpose of solving a system of linear equations in which $\mathbf{B}$ is the matrix of coefficients, and where the input, or possibly $\mathbf{B}$ itself, is uncertain. This also is our concern, to be sure, but generally $\kappa_2[\mathbf{B}]$, unlike $\Omega[\mathbf{B}]$, does not actually emerge as the error scale from analyses of such systems; mostly it is invoked as a handy but imprecise predictor of error magnification. However, $\kappa_2[\mathbf{B}]$ shares some important properties with $\Omega[\mathbf{B}]$: $\kappa_2[c\mathbf{B}] = \kappa_2[\mathbf{B}]$ for scalar c, $\kappa_2[\mathbf{B}] \geq 1$, and $\kappa_2[\mathbf{B}] = 1$ for unitary $\mathbf{B}$. $\kappa_2$ also happens to be an invariant of logical equivalence symmetry, which $\Omega$ is not. On the other hand, $\kappa_2[\mathbf{B}_d] > 1$ for nonunitary diagonal $\mathbf{B}_d$, so $\kappa_2[\mathbf{B}]$ does not characterize trivial strategies appropriately.

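From (42), considering that $\lambda_\nu/\lambda_\mu \leq \max \lambda/\min \lambda$ in every term, we easily obtain

$$\Omega[\mathbf{B}] \leq \kappa_2[\mathbf{B}]^2 \tag{54}$$

Except for well-conditioned $\mathbf{B}$, for which $\Omega[\mathbf{B}] \approx \kappa_2[\mathbf{B}] \approx 1$, this bound is much too generous, which is not surprising, given the derivation. Typically, for ill-conditioned $\mathbf{B}$, $\Omega[\mathbf{B}] \approx \kappa_2[\mathbf{B}] = O(N)$. Also typically, $\Omega[\mathbf{B}] < \kappa_2[\mathbf{B}]$, i.e., $\kappa_2[\mathbf{B}]$ overestimates the error scale, but it is easy to exhibit examples to the contrary: e.g.,

$$\begin{pmatrix} 1.81941 & 1.03541 & 1.04588 \\ 1.03541 & 1.91493 & 1.06669 \\ 1.04588 & 1.06669 & 1.37565 \end{pmatrix} \tag{55}$$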

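for which (rounded) $\Omega = 11.22$ and $\kappa_2 = 8.28$.

Finally, we note in passing that an alternative approximation of $\Omega[\mathbf{B}]$ can be had from (38) by setting $|\phi_\mu(m)|^2 = |\psi_\mu(m)|^2 = N^{-1}$, minimally consistent with the required bases normalizations. Call the resulting summation

$$\tilde{\kappa}_2[\mathbf{B}] = N^{-2}\sum_{\mu=1}^{N} \lambda_\mu \sum_{\nu=1}^{N} \lambda_\nu^{-1} = \overline{\lambda_\mu}\;\overline{\lambda_\nu^{-1}} \tag{56}$$

where the overbars denote averages over the N eigenvalues. Clearly, $1 \leq \tilde{\kappa}_2[\mathbf{B}] \leq \kappa_2[\mathbf{B}]^2$, with equality for unitary $\mathbf{B}$.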
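The summation in (56) needs no explicit SVD, because $\sum_\mu \lambda_\mu$ is the trace of $\mathbf{B}\mathbf{B}^\dagger$ and $\sum_\nu \lambda_\nu^{-1}$ is the trace of $(\mathbf{B}^\dagger\mathbf{B})^{-1}$. The sketch below uses that observation to compare the approximation with $\Omega$ from the matrix form of (31) on two hypothetical 2 × 2 matrices; the function name `mean_eigenvalue_product` and the test matrices are assumptions for illustration.

```python
from fractions import Fraction


def inverse(a):
    """Gauss-Jordan inverse of a square matrix in exact rational arithmetic."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gram_parts(B):
    """Return the diagonal of B B^T and the diagonal of (B^T B)^-1."""
    n = len(B)
    bbt_diag = [sum(Fraction(x) ** 2 for x in row) for row in B]
    btb = [[sum(Fraction(B[k][i]) * B[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    btb_inv = inverse(btb)
    return bbt_diag, [btb_inv[m][m] for m in range(n)]


def omega(B):
    """Matrix form of (31): average of the products of matching diagonal entries."""
    d1, d2 = gram_parts(B)
    return sum(x * y for x, y in zip(d1, d2)) / len(B)


def mean_eigenvalue_product(B):
    """Summation (56): (mean of lambda) * (mean of 1/lambda), computed via traces."""
    d1, d2 = gram_parts(B)
    n = len(B)
    return (sum(d1) / n) * (sum(d2) / n)


upper_left = [[1, 1],
              [1, 0]]
lower_left = [[1, 0],
              [1, 1]]

for B in (upper_left, lower_left):
    print(omega(B), mean_eigenvalue_product(B))
```

The two matrices share their singular values, so the approximation returns the same number for both, whereas $\Omega$ differs between them; the approximation discards the dependence on the SVD bases.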


Examples

For some matrix types, it is possible to work out explicit formulas for $\Omega[\mathbf{B}]$.

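(1) Uniform Upper Left Triangular (Hankel) Matrices.

$$\mathbf{B}_1(N) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 0 \\ \vdots & \vdots & \cdots & \vdots \\ 1 & 0 & \cdots & 0 \end{pmatrix} \tag{57a}$$

$$\Omega[\mathbf{B}_1(N)] = N \tag{57b}$$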

$B_1(N)$ is normal for all $N$, since $B_1 B_1^\dagger = B_1^\dagger B_1$. Its singular values thus are the absolute values of its eigenvalues, which have been described by Blank,4 and, in decreasing order, behave asymptotically with large $N$ as

$$\sigma_\nu(N) = \frac{N}{(\nu - 1/2)\pi} + O(1) \tag{58}$$

Thus the largest singular value grows as $\sigma_1(N) = 2N/\pi$ to leading order. (Note that the leading term of Blank’s asymptotic formula cannot be used to compute $|\det B_1| = \prod_{\nu=1}^{N}\sigma_\nu(N) = 1$, since the missing terms of $O(1)$ are important for this purpose; however, the crude estimate “$O(1)$” $= N^{-1}\sum_{\nu=1}^{N}\left(\sigma_\nu(N) - N/[(\nu - 1/2)\pi]\right)$ gets the order of magnitude correctly for $N$ not too large.)

(2) Uniform Lower Left Triangular Matrices.

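$$B_2(N) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} \tag{59a}$$

$$\Omega[B_2] = N + (N-1)/N \tag{59b}$$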

Thus $\Omega \to N + O(1)$ with increasing $N$. Note that while $B_2(N)$ has only unity eigenvalues, as read off the diagonal, it is not normal. However, since $B_2^\dagger B_2 = B_1^\dagger B_1$, $B_1(N)$ and $B_2(N)$ have the same singular values, although different SVD bases.

One also sees that $B_2(N)$ is a row permutation of $B_1(N)$ and thus is its logical equivalent, but with $\Omega[B_2(N)] > \Omega[B_1(N)]$, illustrating the broken symmetry discussed above. The relative performance difference, however, decreases as $O(N^{-1})$ as $N$ increases. Using commonly available mathematical software, one will find that among all $N!$ row permutations of $B_1(N)$, there are only $N$ distinct values of $\Omega$, one for each subset of permutations sharing the same fixed row, which increase from $\Omega[B_1(N)]$ to $\Omega[B_2(N)]$ according to $N + (n-1)/N$ for $n = 1, 2, \ldots, N$. The column permutations of any row permutation of $B_1(N)$ yield the same set of distinct $\Omega$ values, corresponding now to fixed-column permutations. Thus, all told, the $(N!)^2$ logical equivalents of $B_1(N)$ are characterized by just $N$ values of $\Omega$, each occurring $N!(N-1)!$ times over the class. The class minimum $\Omega$ is shared by $B_1(N)$ and its corresponding lower right triangular matrix, while the maximum is shared by $B_2(N)$ and its upper right triangular counterpart. Of course, as $N$ increases, the entire class performs increasingly poorly by the $\Omega$ characteristic. We note in passing that $|\det B| = 1$ for the class, as for unitary $B$; clearly $\det B$ is not a useful quantifier of error propagation for this class. However, the “condition number” $\sigma_{\max}/\sigma_{\min} = O(N)$ does “track” $\Omega[B_1(N)]$ for large $N$.
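The permutation count described above can be reproduced with a short script. The sketch below assumes $\Omega[B] = N^{-1}\sum_m \left(\sum_s |B_{ms}|^2\right)\left(\sum_r |(B^{-1})_{mr}|^2\right)$. That form is not restated in this section; it is adopted here only because it reproduces (57b) and (59b), so it should be checked against the definition given earlier in the paper. The function names are hypothetical, and exact rational arithmetic is used so that equal values of $\Omega$ compare equal.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def inverse(B):
    """Gauss-Jordan inverse of a square matrix in exact rational arithmetic."""
    n = len(B)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(B)]
    for col in range(n):
        # Swap a row with a nonzero pivot into place (fails if B is singular).
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

def omega(B):
    """ASSUMED form: (1/N) * sum_m (sum_s B_ms^2) * (sum_r (B^-1)_mr^2)."""
    n = len(B)
    Binv = inverse(B)
    return sum(sum(Fraction(x) ** 2 for x in B[m]) * sum(y * y for y in Binv[m])
               for m in range(n)) / n

def B1(n):
    """Matrix of (57a): row m (0-based) has n - m leading ones."""
    return [[1 if s < n - m else 0 for s in range(n)] for m in range(n)]

N = 4
rows = B1(N)
# Tally Omega over all N! row permutations of B1(N).
counts = Counter(omega([rows[i] for i in perm]) for perm in permutations(range(N)))
for value, count in sorted(counts.items()):
    print(value, count)
```

For $N = 4$ the tally contains the four values 4, 17/4, 9/2, and 19/4, each occurring $(N-1)! = 6$ times among the $N!$ row permutations; reversing the row order of `B1(N)` gives $B_2(N)$ of (59a).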

(3) Zero-Diagonal Uniform Matrices.

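$$B_3 = \begin{pmatrix} 0 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 0 \end{pmatrix} \tag{60a}$$

$$\Omega[B_3] = \frac{N-1}{(1-N^{-1})^2} - 3 \tag{60b}$$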

Thus, again, $\Omega \to N$ with increasing $N$. $B_3$ is normal and has eigenvalues $(N-1, -1, \ldots, -1)$. Thus, its singular values are $(N-1, 1, \ldots, 1)$.

These examples exhibit the angular compression of nonorthogonal $\{\vec{B}_m\}_N$ mentioned in the geometric analysis following (10). In the triangular cases, the angle $\theta_{m,m+1}$ between neighboring vectors goes as $\cos\theta_{m,m+1} = 1 - (N-m)^{-1}$. Thus the vectors $\{\vec{B}_m\}_N$ gradually tend toward parallelism with increasing $N$. Similar behavior occurs in the zero-diagonal case, where $\cos\theta_{m,m+1} = 1 - (N-1)^{-1} \to 1$. The effect, of course, is the elementary decrease of the angle between a pair of adjacent vectors of fixed difference as their lengths increase. Nonetheless, it is worth noting that this compression is not predicted by the determinant; although $|\det B| = N-1$ for the uniform zero-diagonal matrices, $|\det B| = 1$ for the triangular matrices. The quantity $\Omega[B] - 1$ seems to be a useful number for characterizing this type of angular compression, with $\Omega[B] - 1 = 0$ signifying no such compression at all.
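For the zero-diagonal case both statements can be checked exactly. The sketch below makes three assumptions that should be verified against the earlier sections: the vectors $\vec{B}_m$ are taken to be the rows of $B_3$; $\Omega[B]$ has the form $N^{-1}\sum_m \left(\sum_s |B_{ms}|^2\right)\left(\sum_r |(B^{-1})_{mr}|^2\right)$; and (60b) is read as $(N-1)/(1-N^{-1})^2 - 3$. The inverse used is the closed form $J/(N-1) - \mathbf{1}$, with $J$ the all-ones matrix, which the script verifies by multiplication. All function names are hypothetical.

```python
from fractions import Fraction

def B3(n):
    """Zero-diagonal uniform matrix of (60a)."""
    return [[0 if m == s else 1 for s in range(n)] for m in range(n)]

def B3_inverse(n):
    """Closed-form inverse J/(n-1) - 1 of B3 = J - 1 (J is the all-ones matrix)."""
    return [[Fraction(1, n - 1) - (1 if m == s else 0) for s in range(n)]
            for m in range(n)]

def is_identity(A, B):
    """True if the matrix product A B is the identity."""
    n = len(A)
    return all(sum(A[i][k] * B[k][j] for k in range(n)) == (1 if i == j else 0)
               for i in range(n) for j in range(n))

def omega(B, Binv):
    """ASSUMED form: (1/N) * sum_m (sum_s B_ms^2) * (sum_r (B^-1)_mr^2)."""
    n = len(B)
    return sum(sum(Fraction(x) ** 2 for x in B[m]) * sum(y * y for y in Binv[m])
               for m in range(n)) / n

def cos_neighbors(B, m):
    """cos(theta) between rows m and m+1; valid here because all rows of B3
    have the same squared length, so no square root is needed."""
    dot = sum(a * b for a, b in zip(B[m], B[m + 1]))
    return Fraction(dot, sum(a * a for a in B[m]))

for n in range(2, 9):
    B, Binv = B3(n), B3_inverse(n)
    closed_form = Fraction(n - 1) / (1 - Fraction(1, n)) ** 2 - 3
    print(n, is_identity(B, Binv), omega(B, Binv), closed_form, cos_neighbors(B, 0))
```

For $N = 2$ the matrix is a permutation matrix, and both the computed $\Omega$ and the closed form equal 1, as expected for an orthogonal strategy.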

(4) Hadamard Matrices. Hadamard matrices, $H_N$, are $N \times N$ matrices composed only of elements $\pm 1$ and which are required to have orthogonal rows. Thus they satisfy

$$H_N H_N^T = N \times \mathbf{1} \tag{61}$$

Thus $\det H_N = \pm N^{N/2}$, and $H_N$ is nonsingular. Because of the restriction on its elements, $H_N$ can exist only for $N = 1$, 2, or $N = 4n$ with $n$ a positive integer. With $H_N = \sqrt{N}\,U_N$, $U_N U_N^T = \mathbf{1}$ from (61); thus $U_N$ is orthogonal. Therefore, if $B = H_N$, it is a constant multiple of an orthogonal matrix and consequently $\Omega[B] = \Omega[U_N] = 1$. With regard to shot noise propagation, therefore, Hadamard matrices perform as well as any other unitary strategy, although the binary nature of their elements may be useful in implementations of multiple-beam setups. A similar conclusion was reached without formal proof by Harwit and Sloan.1
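As an illustration, Hadamard matrices of order $N = 2^k$ can be generated by the Sylvester doubling construction, which is a swapped-in, standard construction and not one taken from the text; it covers only powers of 2, not every admissible $N$. The sketch checks (61) and evaluates $\Omega$ using $H_N^{-1} = H_N^T/N$, which follows from (61). The form assumed for $\Omega$, namely $N^{-1}\sum_m \left(\sum_s |B_{ms}|^2\right)\left(\sum_r |(B^{-1})_{mr}|^2\right)$, is not restated in this section and should be checked against the earlier definition.

```python
from fractions import Fraction

def sylvester(k):
    """Sylvester construction of the 2**k x 2**k Hadamard matrix."""
    H = [[1]]
    for _ in range(k):
        # Doubling step: [[H, H], [H, -H]].
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def satisfies_61(H):
    """Check H H^T = N * 1, i.e., eq. (61)."""
    n = len(H)
    return all(sum(a * b for a, b in zip(H[i], H[j])) == (n if i == j else 0)
               for i in range(n) for j in range(n))

def omega_hadamard(H):
    """ASSUMED Omega form, with the inverse taken as H^T / N from (61)."""
    n = len(H)
    total = Fraction(0)
    for m in range(n):
        row_B = sum(x * x for x in H[m])                              # sum_s H_ms^2
        row_Binv = sum(Fraction(H[r][m], n) ** 2 for r in range(n))   # sum_r (H^T/N)_mr^2
        total += row_B * row_Binv
    return total / n

for k in range(1, 5):
    H = sylvester(k)
    print(len(H), satisfies_61(H), omega_hadamard(H))
```

Each order printed satisfies (61) and gives $\Omega = 1$, in line with the statement that a constant multiple of an orthogonal matrix performs like a unitary strategy.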

(5) Discrete Fourier Transform Matrices (DFTMs). Generally speaking, a modulation $B$ effects a discrete linear transformation of noisy stimulus $\{R_{ms}\}_N$ to response $\{S_m\}_N$. The analysis up to this point has focused solely on inverting the transformation, but in some cases the response itself may be of interest. For example, the “modulation” defined by matrix elements

$$B^{\mathrm{DFT}}_{ms}[N] = \frac{1}{\sqrt{N}}\, e^{2\pi i (m-1)(s-1)/N} \tag{62}$$

for $m = 1, 2, \ldots, N$ and $s = 1, 2, \ldots, N$, performs a discrete Fourier transform on a Nyquist mesh and is unitary. A complex-valued modulation such as this is not feasible for neutron beams, obviously, but real-valued analogs of (62) are possible. The real and imaginary parts of (62) are singular matrices, however, because of the inversion symmetries of the sinusoids, taken separately. This problem is overcome by halving the interval of the Nyquist mesh, with or without offsets and end-point corrections; e.g., for discrete cosine transforms (in a standard ordering):

$$B^{\mathrm{DCT1}}_{ms}[N] = \frac{1}{\sqrt{N}}\left(1 - \frac{1}{2}\delta_{m,1} - \frac{1}{2}\delta_{m,N}\right)\cos\frac{\pi(m-1)(s-1)}{N} \tag{63a}$$

$$B^{\mathrm{DCT2}}_{ms}[N] = \frac{1}{\sqrt{N}}\cos\frac{\pi(m-1)(s-1/2)}{N} \tag{63b}$$

$$B^{\mathrm{DCT3}}_{ms}[N] = \frac{2-\delta_{m,1}}{\sqrt{N}}\cos\frac{\pi(m-1)(s-1/2)}{N} \tag{63c}$$

and

$$B^{\mathrm{DCT4}}_{ms}[N] = \frac{1}{\sqrt{N}}\cos\frac{\pi(m-1/2)(s-1/2)}{N} \tag{63d}$$

(4) Blank, M. L. Russ. Math. Surv. 2001, 56, 149.

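Of these, only $B^{\mathrm{DCT4}}$ is unitary (to within a constant factor), independently of $N$, and thus $\Omega[B^{\mathrm{DCT4}}] = 1$. However, $B^{\mathrm{DCT1}}$, $B^{\mathrm{DCT2}}$, and $B^{\mathrm{DCT3}}$ approach unitarity with increasing $N$, so $\Omega[B^{\mathrm{DCT}n}[N]] = 1 + O(N^{-1})$ as $N \to \infty$, for $n \in \{1, 2, 3\}$. For $n \in \{2, 3\}$, $\Omega[B^{\mathrm{DCT}n}[N]] \approx 1$ to within 1% at $N = 11$, while DCT1 does not achieve comparable unitarity until $N \gtrsim 375$. We note that

$$B^{\mathrm{DCT5}}_{ms}[N] = \sqrt{\frac{2-\delta_{m,1}}{N}}\cos\frac{\pi(m-1)(s-1/2)}{N} \tag{64}$$

also is unitary for all $N$, although it is not among the canonical DCTs. For completeness, we also note that, to within constant normalization factors, $(B^{\mathrm{DCT1}})^{-1} = B^{\mathrm{DCT1}}$, $(B^{\mathrm{DCT2}})^{-1} = B^{\mathrm{DCT3}}$, $(B^{\mathrm{DCT4}})^{-1} = (B^{\mathrm{DCT4}})^T = B^{\mathrm{DCT4}}$, and $(B^{\mathrm{DCT5}})^{-1} = (B^{\mathrm{DCT5}})^T$.

In practice, DFTs typically arise as approximations to Fourier transforms, e.g.,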

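$$g(x) = \int_0^\infty f(k)\cos kx\,dk \;\to\; g_m = \frac{k_{\max}}{N}\sum_{s=1}^{N} f_s \cos\frac{\pi(m-1/2)(s-1/2)}{N} = \frac{k_{\max}}{\sqrt{N}}\sum_{s=1}^{N} f_s B^{\mathrm{DCT4}}_{ms} \tag{65}$$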

where $g_m = g[(m-1/2)\Delta x]$, $f_s = f[(s-1/2)\Delta k]$, and $\Delta k\,\Delta x = \pi/N$. The real-space (more generally, the conjugate-space) function $g(x)$ can be interpreted as a “correlation function”, i.e., a correlation integral of some functional of scattering length density, depending on the nature of the reflectivity spectrum. Let us say now that this is the quantity of direct interest, rather than the “underlying” reflectivities being summed.
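The orthogonality of the cosine matrix entering (65) is easy to confirm numerically. The sketch below builds $B^{\mathrm{DCT4}}$ as in (63d), with prefactor $1/\sqrt{N}$, for which $B B^T$ comes out as $\frac{1}{2}\mathbf{1}$, so that $\sqrt{2}\,B^{\mathrm{DCT4}}$ is orthogonal. It also builds (64) under the assumption that the prefactor is $\sqrt{(2-\delta_{m,1})/N}$, for which $B B^T = \mathbf{1}$. It then inverts the discretized transform (65) with $k_{\max} = 1$ and arbitrary test values $f_s$. Function names and test values are hypothetical.

```python
import math

def dct4(N):
    """B^DCT4 of (63d): cos[pi (m-1/2)(s-1/2)/N] / sqrt(N), m, s = 1..N."""
    return [[math.cos(math.pi * (m - 0.5) * (s - 0.5) / N) / math.sqrt(N)
             for s in range(1, N + 1)] for m in range(1, N + 1)]

def dct5(N):
    """Eq. (64), ASSUMING the prefactor sqrt((2 - delta_{m,1}) / N)."""
    return [[math.sqrt((2 - (m == 1)) / N) * math.cos(math.pi * (m - 1) * (s - 0.5) / N)
             for s in range(1, N + 1)] for m in range(1, N + 1)]

def times_transpose(B):
    """Return B B^T as a list of lists."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in B] for r1 in B]

def max_abs_dev(A, scale):
    """Largest deviation of A from scale * identity."""
    n = len(A)
    return max(abs(A[i][j] - (scale if i == j else 0.0))
               for i in range(n) for j in range(n))

N = 8
B4 = dct4(N)
print(max_abs_dev(times_transpose(B4), 0.5))       # rounding level: B B^T = (1/2) 1
print(max_abs_dev(times_transpose(dct5(N)), 1.0))  # rounding level: B B^T = 1

# Forward transform (65) with k_max = 1, then inversion using B^-1 = 2 B^T.
f = [1.0 / s for s in range(1, N + 1)]
g = [sum(B4[m][s] * f[s] for s in range(N)) / math.sqrt(N) for m in range(N)]
f_back = [2.0 * math.sqrt(N) * sum(B4[m][s] * g[m] for m in range(N)) for s in range(N)]
print(max(abs(a - b) for a, b in zip(f, f_back)))  # rounding level
```

The last line illustrates that the $f_s$ are recovered from the $g_m$ by a purely computational step, which is the point used in the discussion of measurement pathways that follows.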

The propagation of shot noise $\Delta R_{ms}$ through DCT modulations can be estimated by writing $\Delta S_m = N^{-1}\sum_{s=1}^{N} c_{ms}\,\Delta R_{ms}$, where $c_{ms}$ is one of the cosine functions in (63). Then, assuming the random variables $\Delta R_{ms}$ are independent, the estimated standard deviations of the errors in the $S_m$ are

$$\mathrm{STD}(\Delta S_m) = \frac{\sqrt{\overline{\mathrm{var}(\Delta R_{ms})\,c_{ms}^2}}}{\sqrt{N}} \tag{66}$$

where the overbar denotes an average over the $s$-index. For simplicity, take $\mathrm{var}(\Delta R_{ms}) = \mathrm{var}(\Delta R_{m1})$ for all $s$, so that (66) becomes

$$\mathrm{STD}(\Delta S_m) = \frac{\mathrm{STD}(\Delta R_{m1})}{\sqrt{N}}\sqrt{\overline{c_{ms}^2}} \tag{67}$$

where $\overline{c_{ms}^2} = 0.5$, i.e., $(\overline{c_{ms}^2})^{1/2} = 1/\sqrt{2}$, for the DCT$n$ above. This is the familiar law-of-large-numbers benefit of averaging, viz., $\mathrm{STD}(\Delta S_m) = O(1/\sqrt{N})$. Clearly, however, it is unrelated to multibeam data collection, i.e., to how the reflectivities are acquired. Exactly the same benefit accrues to calculating $\{S_m\}_N$ in the computer from a given $\{R_r\}_N$. And as we have seen, there is no error benefit over single-beam acquisition to measuring $\{R_r = R(r)\}_N$ on a multibeam rig with a unitary modulation strategy.
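The scaling in (67) can be illustrated with a small Monte Carlo experiment. The sketch below is not a model of real counting statistics: it assumes independent Gaussian errors of equal standard deviation `sigma` as a stand-in for shot noise, uses the DCT4 cosines of (63d) for $c_{ms}$, and compares the sampled standard deviation of $\Delta S_m = N^{-1}\sum_s c_{ms}\,\Delta R_{ms}$ with the prediction of (67). The function names, the seed, and the values of `N`, `m`, `sigma`, and `trials` are hypothetical choices.

```python
import math
import random
import statistics

def dct4_cosines(N, m):
    """c_ms = cos[pi (m-1/2)(s-1/2)/N] for s = 1..N (the cosine in (63d))."""
    return [math.cos(math.pi * (m - 0.5) * (s - 0.5) / N) for s in range(1, N + 1)]

def simulate_std(N, m, sigma, trials, seed=0):
    """Sampled STD of Delta S_m = (1/N) sum_s c_ms * Delta R_ms.
    ASSUMPTION: Delta R_ms are independent Gaussians with equal STD sigma."""
    rng = random.Random(seed)
    c = dct4_cosines(N, m)
    samples = [sum(cs * rng.gauss(0.0, sigma) for cs in c) / N for _ in range(trials)]
    return statistics.pstdev(samples)

def predicted_std(N, m, sigma):
    """Eq. (67): sigma * sqrt(mean of c_ms^2 over s) / sqrt(N)."""
    c = dct4_cosines(N, m)
    mean_c2 = sum(x * x for x in c) / N
    return sigma * math.sqrt(mean_c2) / math.sqrt(N)

N, m, sigma = 16, 3, 1.0
measured = simulate_std(N, m, sigma, trials=20000)
predicted = predicted_std(N, m, sigma)
print(measured, predicted)
```

The same numbers would be obtained whether the sum is formed by the instrument or in the computer from separately acquired values, which is the sense in which the averaging benefit is unrelated to multibeam collection.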

We can also think of this in the following way. (We now will use $g_m$ for $S_m$.) It is true that a multibeam measurement of $g_1$, say, automatically “does” the summation in (65) for a particular $\{R_{1s}\}_N$; but this measurement alone does not inform us of the reflectivities, so we cannot use it to compute $g_2$. Instead, we must measure $g_2$, then $g_3$, and so on, to $g_N$. This entails the same overall measurement time (for the same total number of neutrons on the sample) as if we were only interested in measuring $N$ reflectivities, $\{R_r\}_N$, using a multibeam strategy. Indeed, we could invert the just-measured $\{g_m\}_N$ to obtain $\{R_r\}_N$ and (instantaneously) recompute $\{g_m\}_N$ to the same degree of statistical uncertainty. That is, for DCT$n$ modulations, we have the inferences $\{R_{ms}\}_{N^2} \to \{g_m\}_N \to \{R_r\}_N \to \{g_m\}_N$. The latter two are purely computational and thus take no time. The first involves $N$ explicit measurements, viz., either $N$ measurements of $\{g_m\}_N$ in the multibeam setup with DCT$n$ modulation, or $N$ single-beam measurements of $\{R_{m=s,s}\}_N$ with a diagonal (single-beam) strategy. Both pathways take the same instrumental time and yield $\{g_m\}_N$ to the same overall accuracy.

Conclusions

We have shown how shot noise is propagated through modulated multibeam configurations for measuring reflectivities. We have introduced a measure, $\Omega$, of the average noise error resulting from the quasi-inversion of summed data and have proven a theorem that $\Omega \ge 1$, with $\Omega = 1$ occurring only for “trivial” single-beam modulation strategies and for constant multiples of orthogonal (unitary) nontrivial strategies. The corollary to the theorem states, as a consequence, that modulated multibeam measurements of reflectivities can do no better, in regard to noise propagation, than single-beam measurements using the same average number of neutrons. We have also given some examples of “good” and “bad” modulation strategies. The most interesting lesson there is that strategies that are easy to implement, and which intuition might suggest are likely to perform well, often perform very badly, errorwise, as the number of beams increases.

The question can be asked: Does our analysis of noise propagation through a generic multiple-beam reflectometer apply to all such instruments? We believe it does, to the extent that we can consider the incident neutrons as being partitioned into discrete sets of convergent, well-collimated beams or, more generally, that measurements made on a multibeam instrument could also be made on a “corresponding” single-beam instrument. The mathematical issue is whether a given multiple-beam modulation strategy has an underlying, and realizable, single-beam strategy to be compared with. Consider the discussion in Example 5, for example, which illustrated emulation of a continuous cosine modulation by an $N$-beam DCT strategy. The cosine transform in (65) evokes the modulation that emerges in a natural way from spin-labeling of divergent beams in recently devised instruments.2,3 Certainly, for large enough $N$, the continuous transform can be approximated by a discrete transform as accurately as desired; but for fixed $\Delta x$, $\Delta k$ decreases as $L^{-1}$ with increasing length scale, $L = N\Delta x$, implying increasingly higher angular resolution. For sufficiently large length scales, therefore, the angular resolution dictated by the multibeam approximation in (65) may not be realizable by well-collimated beams without having to count for prohibitively long times. This, however, is a practical matter, albeit a very important one. The point to be made here is that intensity modulation, in and of itself, does not result in higher efficiency for given statistical accuracy. Instead, as in the SESAME technique,3 it may enable the requisite instrumental wavevector transfer resolution, for the length scales of interest, that otherwise might not be achievable without unacceptable reduction of usable incident beam intensity when “lossy” devices, such as absorbing collimators or crystal monochromators with narrow wavelength bands, must define extremely tight beams. The analysis of noise propagation through such rigs, however, remains within the formal scope of our theorem.

Acknowledgment. The authors acknowledge helpful discussions with Brian Maranville.

LA802780V

