
Copyright © by SIAM and ASA. Unauthorized reproduction of this article is prohibited.

SIAM/ASA J. UNCERTAINTY QUANTIFICATION, Vol. 1, pp. 297–318. © 2013 Society for Industrial and Applied Mathematics and American Statistical Association

Deterministic Sampling for Propagating Model Covariance∗

Jan Peter Hessling†

Abstract. Deterministic sampling can be used for nonlinear propagation of the statistics of signal processing models. Unlike Monte Carlo methods, random generators are not utilized in any stage. The samples are instead calculated deterministically. Our novel approach generalizes the deterministic sampling technique for propagating covariance in the unscented Kalman filter by introducing generic excitation matrices describing small discrete canonical ensembles. The approximation lies in how well the available statistical information is encoded in the discrete ensemble, not how each sample is propagated. The application and performance of deterministic sampling are illustrated for a typical step response analysis of an electrical device modeled with an uncertain digital filter.

Key words. propagation, uncertainty, unscented, nonlinear, filter, ensemble

AMS subject classifications. 62D99, 62F25, 62G15, 62H12, 62H25, 65C60, 68U01

DOI. 10.1137/120899133

1. Introduction. The evaluation of covariance of signals usually involves analytically and/or numerically complicated manipulations as well as coarse approximations. The established default method [9] is based on model approximation with linearization. The majority of modern state-of-the-art methods belong to the broad class of Monte Carlo simulations based on sampling with random sampling (RS) generators [15, 18]. What will be defined as deterministic sampling (DS) is a different class of methods. Both RS and DS apply comparable methodologies, originally phrased statistical sampling by Enrico Fermi in the 1930s [14]. The most well known example of DS is perhaps the propagation of covariance in the unscented Kalman filter (UKF) [20, 12, 11], which is widely utilized in the signal processing community. There are very few other examples. In the most general context presented here, DS utilizes small and thus highly effective ensembles of samples of models devised for particular purposes. This work targets parameterized models h(q, t) = g(q, t) ∗ x(t), which are identified from calibration measurements and thus have dependent parameters q. Such a model is usually nonlinear in parameters and describes the linear response of a system with impulse response g(q, t) to an excitation x(t). The convolution (∗) will here be evaluated with classical linear digital filters [6].

Once the samples are found, the procedure of DS is identical to that of RS—the model is evaluated for all samples of the ensemble, followed by calculation of the desired statistics. For instance, assume h(q) depends on one parameter q with mean ⟨q⟩ and variance ⟨δ²q⟩, where ⟨·⟩ denotes statistical expectation. The mean ⟨h⟩ and the variance ⟨δ²h⟩ of the model can

∗Received by the editors November 16, 2012; accepted for publication (in revised form) June 21, 2013; published electronically September 24, 2013. This work was supported by the National Metrology Program from VINNOVA (the Swedish Governmental Agency for Innovation Systems).

http://www.siam.org/journals/juq/1/89913.html
†SP Technical Research Institute of Sweden, Measurement Technology, SE-501 15 Borås, Sweden (peter.hessling@sp.se).

297

Downloaded 11/25/14 to 129.120.242.61. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php


be estimated with expectations over a discrete ensemble ⟨·⟩_E consisting of only two samples q^(1,2),

(1.1)  ⟨h⟩ ≈ ⟨h⟩_E = ⟨h(q^(k))⟩_{E={k}},    q^(1,2) = ⟨q⟩ ± √⟨δ²q⟩,
       ⟨δ²h⟩ ≈ ⟨δ²h⟩_E = ⟨[δh(q^(k))]²⟩_{E={k}},    δh(q) ≡ h(q) − ⟨h⟩.

If the model is linearized in parameters, δh(q) ≈ δq · ∂h/∂q(⟨q⟩), δq ≡ q − ⟨q⟩, this simple ansatz yields the Gauss approximation formula directly [9]. Avoiding the linearization as above, we expect a more accurate result. Indeed, the UKF utilizing DS is often superior to its linearized equivalent, the extended Kalman filter [19].
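As an illustration (not code from the paper), the two-sample rule in (1.1) can be sketched numerically; the quadratic model h(q) = q² used below is a hypothetical stand-in:

```python
import math

def ds_two_sample(h, q_mean, q_var):
    """Propagate mean and variance of h(q) through the two-sample
    deterministic ensemble q^(1,2) = <q> +/- sqrt(<d2q>) of (1.1)."""
    s = math.sqrt(q_var)
    ensemble = [q_mean + s, q_mean - s]   # deterministically placed samples
    hs = [h(q) for q in ensemble]
    h_mean = sum(hs) / len(hs)            # <h>_E
    h_var = sum((v - h_mean) ** 2 for v in hs) / len(hs)  # <d2h>_E
    return h_mean, h_var

# Nonlinear model h(q) = q^2 with <q> = 1 and <d2q> = 0.04: the ensemble
# mean picks up the nonlinear shift <h> = <q>^2 + <d2q> = 1.04 exactly,
# which plain linearization (the Gauss formula) would miss.
h_mean, h_var = ds_two_sample(lambda q: q ** 2, 1.0, 0.04)
```

For this quadratic model the two samples reproduce ⟨h⟩ exactly; for a general h the result is approximate in the sense discussed above.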

All statistical moments of the parameters influence the resulting modeling covariance. Only for linear-in-parameter models does the kth moment of the parameters influence the kth moment of the model. The representation accuracy can be improved by encoding more statistical moments. If the samples instead are chosen as

    q^(1...4) = ⟨q⟩ ± √⟨δ²q⟩ × √2 (cos φ, sin φ),

we still have ⟨δ²q⟩_E = ⟨δ²q⟩, while the requirement on the fourth moment ⟨δ⁴q⟩_E = ⟨δ⁴q⟩ yields a transcendental equation for φ, cos⁴φ + sin⁴φ = ⟨δ⁴q⟩/(2⟨δ²q⟩²). By padding with 2m “zero” samples,

    q^(1...2m+4) = ⟨q⟩ ± √⟨δ²q⟩ × √(m+2) (cos φ, 0_{1×m}, sin φ),

the possible range of ⟨δ⁴q⟩^{1/4}/⟨δ²q⟩^{1/2} is extended from [1, 2^{1/4}] to [(1 + m/2)^{1/4}, (2 + m)^{1/4}]. Dependencies (beyond correlations) are generally complex and strong for small ensembles. Controlling them is thus particularly important for DS ensembles. Another aspect is that samples of many parameters selected to reproduce only expectation values (mean, covariance, etc.) are not automatically allowed. Some samples may fall outside the range of the model and be prohibited. Attention should be paid to find consistently approximating and valid ensembles. This illustrates the ideas of DS, as well as the common tricks and difficulties to be discussed.
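A sketch of this padded construction (assuming, for illustration, a Gaussian-like target kurtosis ⟨δ⁴q⟩/⟨δ²q⟩² = 3, which requires padding; the function name is ours):

```python
import math

def padded_four_sample(q_mean, q_var, kurt, m):
    """Build the 2m+4 sample ensemble of section 1: pairs at
    +/- sqrt(m+2)*sigma*cos(phi) and +/- sqrt(m+2)*sigma*sin(phi) plus
    m zero-pairs, with phi chosen to reproduce variance and kurtosis.
    kurt = <d4q>/<d2q>^2 must lie in [1 + m/2, 2 + m]."""
    target = kurt / (m + 2)               # cos^4(phi) + sin^4(phi) must equal this
    # cos^4 + sin^4 = 1 - sin^2(2 phi)/2 gives a closed-form solution for phi
    phi = 0.5 * math.asin(math.sqrt(2.0 * (1.0 - target)))
    s = math.sqrt(q_var) * math.sqrt(m + 2)
    devs = [s * math.cos(phi), -s * math.cos(phi),
            s * math.sin(phi), -s * math.sin(phi)] + [0.0] * (2 * m)
    return [q_mean + d for d in devs]

# Gaussian kurtosis 3 lies outside [1, 2], so padding is needed; take m = 2.
ens = padded_four_sample(0.0, 1.0, 3.0, 2)
n_samples = len(ens)                      # 2*2 + 4 = 8 samples
m2 = sum(q ** 2 for q in ens) / n_samples # ensemble variance
m4 = sum(q ** 4 for q in ens) / n_samples # ensemble fourth moment
```

The ensemble reproduces the requested variance and fourth moment by construction.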

For the targeted complex signal processing applications, appropriate RS ensembles are too large to be evaluated. The low convergence rate of “brute force” RS may, however, be improved by partially controlling the distribution of samples deterministically using stratified sampling techniques, such as Latin hypercube sampling (LHS) [4]. Alternatively, large RS ensembles can be accommodated by substituting the complex model with a simple approximation, as in response surface methodology (RSM) [4]. None of these approaches should be confused with DS, which does not use random generators at all (as RS/LHS do) and makes no model approximations (as RSM does). Instead, DS samples with a definite rule at the expense of limited control of the statistics of the model.

The background for our novel concept of deterministic sampling will be given (section 2) before our method is presented (section 3). It is then applied to evaluate the mean and the covariance of an uncertain model of a step response of an electrical device (section 4). The results are then discussed and generalized (section 5) before the principal findings are summarized (section 6).

2. Primer.

2.1. The concept of deterministic sampling. On one hand, no further motivation of DS appears necessary: DS propagates the uncertainty of model parameters with deterministically calculated samples of these parameters. On the other hand, the UKF has been controversial


[3] ever since it was first proposed [12]. The criticism is often targeted towards specific sampling rules (which can always be improved) rather than the more general idea of calculating appropriate samples. The principal ideas have not spread or been generalized beyond a few variants of only one aspect of Kalman filtering, which is not required for application of DS. In fact, the DS sampling rules have a much broader applicability than the Kalman filter, and the principal ideas of DS are entirely independent of how Kalman filters work. Thus in many possible applications of DS, the UKF is not even known to exist. These circumstances complicate the discussion of DS and its broad justification. Disregarding the obligation to include relevant references, DS can (but will not) be described without any reference to the UKF (as in [19, Chapter 14.2]).

A source of misinterpretation is the falsely perceived contradiction of realizing statistical sampling with DS. The key aspect is to separate the problem from its solution and remember that their character may or may not be the same. The statistical problem is to find the result of a statistical model. The proposed deterministic solution is to calculate an optimized minimal set of samples of this model to infer its statistical output. The severity of different approximations should be rated according to their impact, not the subjective aspect of their character, which clearly is irrelevant. For any finite set of (DS as well as RS) samples it is impossible to calculate any resulting statistical moment exactly: since the infinite number of moments of the parameters influences every resulting moment, an exact evaluation would require an infinite set of samples. Likewise, no finite (RS or DS) ensemble can provide an exact complete representation of a distribution function with infinitely many degrees of freedom.

As long as the ensemble statistics are truthful, the actual method of sampling is irrelevant. The only principal difference between RS and DS lies in their sampling methods. The two types of ensembles are thus fully comparable, but the goals are nevertheless distinctly different. While RS targets an exact and asymptotic convergence to a given probability distribution, DS focuses on rapid but approximate convergence (with respect to the required number of samples). The priorities of convergence rate and accuracy are thus reversed in DS when compared to RS. The asymptotic convergence is in practice useless if the computational requirements for approaching this limit are unrealistic. Approximate convergence has a diffuse meaning if the available information about the correct distribution is incomplete, as usually is the case. The inevitable consequence of incompleteness is that different but valid DS ensembles have different properties and yield slightly different results. The lack of a unique result illustrates that the problem of propagating model uncertainty is not well defined, no matter what method of uncertainty propagation is used. Relatively bold assignment of precise unique distributions is common practice in RS to eliminate the ubiquitous lack of information, while in DS different ensembles are used to illustrate the same deficiency. To put them on equal footing, sets of plausible RS distributions are needed. The robustness or “uncertainty of the evaluated uncertainty” resembles but is different from the statistics of estimators: the former relates to incomplete information and the latter to incomplete or finite sampling. This variability will be a major concern here, but it is hardly ever illustrated in RS, despite being a key figure of reliability in practice.

Finally, the common nomenclature [11, 19] of unscented “transformation” (UT) appears slightly inappropriate, as no quantity is actually “transformed” without loss of information (as, e.g., with the Fourier transform). Rather, uncertain models are fundamentally sampled


from a (usually) continuous probability distribution according to a deterministic rule, with loss of statistical fidelity, just as in RS. The established convention is to label such a discrete set of samples an ensemble. Finding a deterministic ensemble by means of DS (our notation) will here be the equivalent of making a UT. The UT should in this respect not be equated with any particular method of generating sigma-points (samples), to conform with its original general definition [20]. The concept of DS is even more general than UT, as it embraces all methods for finite sampling of statistical information with a deterministic rule, e.g., sampling on confidence boundaries [8], which hardly qualifies as a UT.

2.2. Sampling with conservation of moments. A multivariate probability density function (pdf) f(q) may be sampled (indicated ≐ below) to form a finite discrete ensemble: the continuous function f(q) of all parameters q = (q₁ q₂ ⋯ qₙ)ᵀ is then substituted with the discrete set {q_k^(ν)}, where ν = 1, 2, …, m denotes the different samples and k = 1, 2, …, n the different parameters of the model. To be most relevant, the ensemble should preserve as many statistical moments as possible. Expressed in deviations δq_k ≡ q_k − ⟨q_k⟩ from the first moment,

(2.1)  0 = ⟨δq_k⟩ = ∫ δq_k f(q) dq₁ dq₂ … dqₙ ≐ (1/m) Σ_{ν=1}^m δq_k^(ν),

       ⟨δq_{k₁} δq_{k₂}⟩ = ∫ δq_{k₁} δq_{k₂} f(q) dq₁ dq₂ … dqₙ ≐ (1/m) Σ_{ν=1}^m δq_{k₁}^(ν) δq_{k₂}^(ν),

       ⟨δq_{k₁} δq_{k₂} δq_{k₃}⟩ = ∫ δq_{k₁} δq_{k₂} δq_{k₃} f(q) dq₁ dq₂ … dqₙ ≐ (1/m) Σ_{ν=1}^m δq_{k₁}^(ν) δq_{k₂}^(ν) δq_{k₃}^(ν),

       ⋮

All sparse sampling techniques (DS or RS) must involve all samples and be global—changing, adding, or removing one sample usually requires resampling of all other samples. The samples are consequently dependent and often highly nonuniformly distributed in parameter space. The acceptance criteria are identical for all sampling methods—accurate representation of the available statistical information, here formulated in statistical moments. Since these moments may, at least in principle, be estimated from experiments with arbitrary accuracy, this set of equations provides a well-defined matching or optimization problem.

2.3. Nonlinear propagation of covariance. Central for the performance of various ensembles is their ability to propagate covariance nonlinearly. To study the influence from various statistical moments of the parameters on the modeling result, the model can be expanded in its parameters in a Taylor expansion,

(2.2)  h(q, t) = Σ_{k=0}^{+∞} (1/k!) {(δqᵀ∇)^k h(q, t)}_{q=⟨q⟩},
       δq = (δq₁ δq₂ ⋯ δqₙ)ᵀ,
       ∇ = (∂/∂q₁ ∂/∂q₂ ⋯ ∂/∂qₙ)ᵀ.


If postevaluation at ⟨q⟩ is indicated with a bar ( ̄ ),

(2.3)  ⟨h⟩ = h̄ + Σ_{k=2}^{+∞} (1/k!) ⟨(δqᵀ∇)^k h̄⟩,

(2.4)  ⟨δ²h⟩ = ⟨(δqᵀ∇)h̄ (δqᵀ∇)h̄⟩ + 2 Σ_{k=2}^{+∞} (1/k!) ⟨(δqᵀ∇)h̄ (δqᵀ∇)^k h̄⟩
             + Σ_{m=2}^{+∞} Σ_{k=2}^{+∞} (1/(m! k!)) {⟨(δqᵀ∇)^m h̄ (δqᵀ∇)^k h̄⟩ − ⟨(δqᵀ∇)^m h̄⟩⟨(δqᵀ∇)^k h̄⟩},

where first order terms cancel since ⟨δq_j⟩ = 0 for all j. Truncations of these expansions may be inaccurate since they are not made in definite small parameters. On the contrary, the involved even marginal moments ⟨δ^{2k}q_j⟩^{1/2k} increase with k. The propagated mean and covariance can therefore not be approximated by truncated Taylor expansions, unless the effect of the higher order terms can be argued to be negligible. Indeed, an often cited but contradictory claim [11] is that encoding the mean and covariance of the parameters is sufficient to propagate the covariance to second order. In compliance with our observations, this was recently demonstrated to be incorrect [3]. Clearly, such arguments must involve the regularity of the model. For instance, the lowest order contribution to the variance of the model h = q⁴ with finite derivatives up to fourth order may come from any moment ⟨δ^k q⟩, k = 2, 3, 4, …, 8, depending on the actual values of the mean and covariance of the parameters. Even more difficult to analyze are models with steps, which are not differentiable at all. Apparently, Taylor expansions are of limited use for determining the accuracy of methods for propagating covariance. For this reason, the main arguments for our proposal of DS will not relate to Taylor expansions as in the prevailing literature, but rather to the truthfulness of encoding available statistical knowledge, in whatever form that might be. Accordingly, the hierarchical structure of representing various statistical moments from the lowest to the highest, as motivated by Taylor expansions, will not be strictly followed, as, e.g., the maximum parameter variation (denoted by range in section 3.2) usually is much better known than any moment higher than two. Taylor expansions will be utilized for referencing, illustrations, definitions, and classifications of statistical information but not for formulating statements of accuracy of proposed methods.
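The h = q⁴ example can be made concrete. For a Gaussian q (a distribution we assume here purely for illustration), the exact variance of q⁴ involves moments up to ⟨q⁸⟩ and visibly exceeds the Gauss approximation formula even for small relative uncertainty:

```python
from math import comb

def gaussian_moment(mu, sigma, p):
    """E[q^p] for q ~ N(mu, sigma^2), using E[Z^k] = (k-1)!! for even k."""
    total = 0.0
    for k in range(0, p + 1, 2):          # odd central moments vanish
        double_fact = 1
        for j in range(1, k, 2):
            double_fact *= j              # (k-1)!! = 1*3*...*(k-1)
        total += comb(p, k) * mu ** (p - k) * sigma ** k * double_fact
    return total

mu, sigma = 1.0, 0.1                      # 10% relative standard uncertainty
exact_var = gaussian_moment(mu, sigma, 8) - gaussian_moment(mu, sigma, 4) ** 2
gauss_var = (4 * mu ** 3) ** 2 * sigma ** 2   # Gauss formula: (dh/dq)^2 <d2q>
```

Here gauss_var = 0.16 while the exact variance is about 0.177, an underestimate of roughly 10% already at this modest uncertainty.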

The first term of the model variance (see (2.4)) can be expressed in matrix notation as

(2.5)  ⟨(δqᵀ∇)h̄ (δqᵀ∇)h̄⟩ = ∇h̄ᵀ ⟨δq ⊗ δqᵀ⟩ ∇h̄ = ∇h̄ᵀ cov(q) ∇h̄.

This term is the well-known Gauss approximation formula [9]. Here, the nonlinear adjustments are of particular interest,

(2.6)  ζ ≡ ⟨h⟩ − h̄,
(2.7)  ψ ≡ √⟨δ²h⟩ − √(∇h̄ᵀ cov(q) ∇h̄).

The nonlinear shift ζ of the mean or scent was previously introduced [8] to reflect the performance of the UKF. Its comparatively accurate calculation of scent is perhaps its most important aspect [19]. The nonlinear shift ψ of the standard deviation will here for short be


called warp to similarly distinguish its nonlinear “twist.” Note that none of these definitions is equivalent to bias—bias is a property of an estimator, while scent and warp are well-defined properties of models. According to the expansion in (2.3) and (2.4),

(2.8)  ζ = (1/2) Tr(cov(q) H̄) + R₃(δq),

(2.9)  ψ = Σ_{a,b,c} Γ_{abc} J̄_a H̄_{bc} / (2σ_R) + Σ_{a,b,c,d} Λ_{abcd} H̄_{ab} H̄_{cd} / (8σ_R)
          + Σ_{a,b,c,d} (Λ_{abcd} + Υ_{ab} Υ_{cd}) J̄_a K̄_{bcd} / (6σ_R) + R₅(δq),

(2.10)  Υ_{ab} ≡ ⟨δq_a δq_b⟩ / (σ_a σ_b) = cov(q_a/σ_a, q_b/σ_b),

(2.11)  Γ_{abc} ≡ ⟨δq_a δq_b δq_c⟩ / (σ_a σ_b σ_c),

(2.12)  Λ_{abcd} ≡ ⟨δq_a δq_b δq_c δq_d⟩ / (σ_a σ_b σ_c σ_d) − Υ_{ab} Υ_{cd}.

The rest term R_m(δq) denotes terms of order ⟨|δq|^m⟩. While σ_a² ≡ ⟨δ²q_a⟩, the quantities J̄_a ≡ σ_a ∂h̄/∂q_a and H̄_{ab} ≡ σ_a σ_b ∂²h̄/∂q_a∂q_b are the normalized Jacobian vector and Hessian matrix of the model h, respectively. The term with K̄_{abc} ≡ σ_a σ_b σ_c ∂³h̄/∂q_a∂q_b∂q_c introduces no further moments. Note that all terms are time-dependent signals. Clearly, a first order evaluation of the scent requires appropriate covariance of the ensemble. A lowest order calculation of the warp, however, also requires correct encoding of Γ and Λ. These tensors are direct generalizations of the skewness and kurtosis [17] to several parameters and arbitrary (non-Gaussian) distributions. The skewness Γ_{aaa} measures the asymmetry of the marginalized (in q_a) probability distribution and will here consequently be denoted by marginal skewness. Analogously, the marginal kurtosis Λ_{aaaa} is a lowest order indicator of the shape of the same marginalized distribution. It expresses the qualitative difference between, for instance, the uniform and Gaussian probability distributions. The mixed kurtosis Λ_{abcd} with a, b, c, d not all equal may be just as important since their number can be very large; see Table 1.
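For a quadratic model all derivatives beyond second order vanish, so the rest term R₃ in (2.8) is zero and the scent formula is exact. A minimal numeric check with the hypothetical model h(q) = q₁² + q₁q₂ and an assumed parameter covariance:

```python
# Scent of h(q) = q1^2 + q1*q2 via (2.8): zeta = (1/2) Tr(cov(q) H).
# The (unnormalized) Hessian of h is constant, so the expansion terminates:
H = [[2.0, 1.0],
     [1.0, 0.0]]
cov = [[0.04, 0.01],
       [0.01, 0.09]]                 # hypothetical parameter covariance
trace = sum(sum(cov[i][k] * H[k][i] for k in range(2)) for i in range(2))
zeta = 0.5 * trace
# Direct expectation for this model: <h> - hbar = <dq1^2> + <dq1 dq2>
zeta_direct = cov[0][0] + cov[0][1]
```

Both routes give the same nonlinear shift of the mean, independent of any higher moments of the parameters.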

Table 1
The number of kurtosis elements for n parameters. Below, the indices a, b, c, d ∈ [1, n] are all different and β(n, k) = n!/((n − k)! k!) is the binomial coefficient.

Type                Number of elements     n = 5     n = 20
Λ_aaaa              N₁ = n                 5         20
Λ_abcd              N₂ = β(n, 4)           5         4845
Λ_aacd              N₃ = β(n, 3) · 3!/2!   30        3420
Λ_aacc              N₄ = β(n, 2)           10        190
Λ_abab              N₅ = β(n, 2)           10        190
Λ_aaad              N₆ = β(n, 2) · 2       20        380
N₁/Σ_{k>1} N_k      –                      < 0.07    < 0.003
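The counts in Table 1 follow directly from the binomial coefficients; a quick bookkeeping check (our own, not from the paper):

```python
from math import comb

def kurtosis_counts(n):
    """Counts of kurtosis elements Lambda_abcd by index pattern (Table 1),
    where a, b, c, d are pairwise different."""
    return {
        "aaaa": n,
        "abcd": comb(n, 4),
        "aacd": comb(n, 3) * 3,   # beta(n, 3) * 3!/2!
        "aacc": comb(n, 2),
        "abab": comb(n, 2),
        "aaad": comb(n, 2) * 2,
    }

c5, c20 = kurtosis_counts(5), kurtosis_counts(20)
# ratio of marginal to mixed elements, last row of Table 1
ratio20 = c20["aaaa"] / sum(v for k, v in c20.items() if k != "aaaa")
```

For n = 20 the marginal elements are fewer than 0.3% of the mixed ones, which is the point of the last table row.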

Apparently, nonlinear propagation of covariance is nontrivial. The available statistical information is strongly limited; the majority of the skewness and kurtosis elements are usually not known. Instead, the standard information contains the first and second moments of the parameters but not more. This allows for a lowest order evaluation of the scent (for


sufficiently regular models) but not the warp. To proceed beyond linear propagation of the model covariance in most cases requires more or less vague hypotheses.

2.4. Propagation of covariance in the unscented Kalman filter. The reference for this study is the specific variant of DS used for propagating covariance in what will be referred to as the UKF [19]. In this context it deserves to be emphasized that the general UT/DS approach to statistical sampling should not be confused with this particular sampling rule, as is common in the UKF literature. The covariance of n uncertain parameters q = (q₁ q₂ ⋯ qₙ)ᵀ is then encoded into a DS ensemble of 2n samples called sigma-points [11],

(2.13)  q^(s,k) = ⟨q⟩ + δq^(s,k),    k = 1, 2, …, n,    s = ±,
        δq^(s,k) = s√n · Δ(:, k),    ΔΔᵀ = cov(q),

where Δ(:, k) denotes the kth column of Δ. The mean ⟨h⟩ and variance ⟨δ²h⟩ of the model h(q) are estimated with ensemble expectations, as in (1.1). The sampling rule is here manifested in the square root calculation of the covariance matrix (Δ). As suggested in [19], it may be found with a Cholesky factorization [2]. A more symmetric alternative is to make a symmetric singular value decomposition (as in principal component analysis), Δ = UᵀSU, where S_ij = 0 for i ≠ j, UᵀU = UUᵀ = I, and I is the identity matrix. Substituting this ansatz in the covariance matrix yields S = √(U cov(q) Uᵀ). It has been shown [11] how this approach can be extended to encode all marginal moments up to the third (asymmetric pdfs) or fourth (symmetric pdfs) order.
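The sigma-point rule (2.13) is easy to sketch with a Cholesky square root, one of the admissible choices of Δ; by construction the discrete ensemble reproduces ⟨q⟩ and cov(q) exactly:

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a (a valid square root Delta)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def ukf_sigma_points(mean, cov):
    """The 2n samples of (2.13): q^(s,k) = <q> + s*sqrt(n)*Delta(:,k)."""
    n = len(mean)
    L = cholesky(cov)
    pts = []
    for k in range(n):
        col = [math.sqrt(n) * L[i][k] for i in range(n)]
        pts.append([mean[i] + col[i] for i in range(n)])
        pts.append([mean[i] - col[i] for i in range(n)])
    return pts

mean = [1.0, 2.0]
cov = [[0.04, 0.01], [0.01, 0.09]]     # hypothetical parameter covariance
pts = ukf_sigma_points(mean, cov)
emean = [sum(p[i] for p in pts) / len(pts) for i in range(2)]
ecov = [[sum((p[i] - emean[i]) * (p[j] - emean[j]) for p in pts) / len(pts)
         for j in range(2)] for i in range(2)]
```

The ensemble mean and covariance equal the encoded ⟨q⟩ and cov(q); any higher ensemble moments are incidental to this particular rule.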

For many parameters with large covariance, the amplification with √n in (2.13) may cause failure. For parametric variations beyond their allowed range, the model may be strongly inaccurate, or even incorrect. Samples may be statistically (range of pdf) and/or physically (range of model) allowed. A possible solution to the scaling problem is provided by the scaled unscented transformation [10]. However, it is based on Taylor expansions and thus suffers from an approximation problem of the model. The UKF utilizes the mean and covariance of the parameters. According to section 2.3, that gives a correct first order calculation of the scent, provided the impact of higher order terms is small, but not the warp.

3. Method.

3.1. The excitation matrix. The samples or sigma-points of the UKF in section 2.4 are not uniquely defined. Any “half” unitary transformation Δ → ΔV, V : VVᵀ = I (VᵀV = I not required) of the variations δq is allowed but modifies the result. It yields another equally valid matrix Δ̃ ≡ ΔV since Δ̃Δ̃ᵀ = ΔV[ΔV]ᵀ = ΔVVᵀΔᵀ = ΔΔᵀ. The matrix V condenses the invariance of the UKF ensemble. Including nontrivial matrices V_{n×m},

(3.1)  Δ̃ = UᵀSUV,    Δ̃Δ̃ᵀ = cov(q),
       UUᵀ = UᵀU = I :  (U cov(q) Uᵀ)_{kl} = 0,  k ≠ l,
       VVᵀ = I,    S_{kl} = √((U cov(q) Uᵀ)_{kl}).

The unitary matrix U accounts for all correlations between the parameters, while the diagonal matrix S provides the scaling with the standard deviation in the principal directions of


cov(q). The matrix V of size n×m combines the n elementary normalized “UKF” excitations (columns) of Δ to m ≥ n + 1 ensemble variations. A complication for the interpretation of V is that the nonunitary matrix UᵀSU modifies the distribution of samples. Accounting for the scaling with S, the transformation reads as T = UᵀSUS⁻¹ = I + Uᵀ[S, U]S⁻¹, where [S, U] ≡ SU − US is the commutator of S and U. The commutator vanishes if all nonzero elements of S are equal or if U = I. The samples are modified only if the correlations are strong and the variances differ. The degree of modification can be measured by

(3.2)  Ψ(T) = (1/n) Σ_{r=1}^n (1 − (max_c T_{rc} − min_c T_{rc}) / ||T_{r,:}||) ∈ [0, 1],    T ≡ UᵀSUS⁻¹.

The distortion is here roughly indicated by the average (rows) maximum relative spread (columns) due to T, i.e., the deviation of T from unity disregarding pivots. As far as we know, this has no simple relation to common matrix norms. Right multiplication with U in Δ̃ = UᵀSUV (see (3.1)) is not needed but reduces Ψ significantly.
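A direct transcription of (3.2), under our reading of the formula: Ψ vanishes for the identity and grows toward 1 as the rows of T lose their pivot-dominated structure:

```python
import math

def distortion(T):
    """Psi(T) of (3.2): average over rows of 1 - (row spread)/(row norm)."""
    n = len(T)
    total = 0.0
    for row in T:
        spread = max(row) - min(row)
        norm = math.sqrt(sum(x * x for x in row))
        total += 1.0 - spread / norm
    return total / n

psi_identity = distortion([[1.0, 0.0], [0.0, 1.0]])  # samples untouched
psi_mixing = distortion([[1.0, 0.9], [0.9, 1.0]])    # strong sample mixing
```

The two illustrative matrices are our own: Ψ = 0 when T leaves the samples untouched, while a strongly mixing T pushes Ψ toward 1.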

To guarantee vanishing first moments of all variations, ⟨δq⟩ = 0,

(3.3)  V · 1_{m×1} = 0.

The normalization factor √m must also be included in the samples (their number has increased from n to m by the introduction of V_{n×m}). The generalized ensemble Σ of samples arranged in columns is then given by

(3.4)  Σ ≡ ⟨q⟩ ⊗ 1_{1×m} + UᵀSUṼ,    Ṽ ≡ √m · V.

The excitation matrix [6, 7] Ṽ_{n×m} : ṼṼᵀ = m · I provides efficient means for introducing more complex ensembles. Generic ensembles Ṽ are constructed from only the number of parameters and desirable properties. Since they are canonical, i.e., contain normalized and uncorrelated samples, they may be reused indefinitely. They can be listed and characterized in tables, similar to pdfs. Their differences are beyond the first and the second statistical moments. That is, they contain sampled information of the shapes of the marginal pdfs and dependencies of higher order than correlations. While the excitation matrix Ṽ contains generic samples, the ensemble Σ contains the specific “dressed” samples for each problem, as expressed by (3.4). The use of excitation matrices for DS also greatly simplifies comparisons between DS and RS, as common random generators primarily generate such matrices. The equal status of the REF (RS) and the STD/SPX/BIN (DS) ensembles in section 4 is a direct manifestation of this.
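A minimal sketch of the dressing in (3.4), assuming for simplicity a diagonal covariance so that UᵀSU reduces to S; the small canonical V below (VVᵀ = I, vanishing row sums) is our own example:

```python
import math

mean = [1.0, -2.0]                    # <q>, hypothetical
sigma = [0.2, 0.5]                    # standard deviations (diagonal cov)
m = 4
# Canonical excitation matrix V (n = 2, m = 4): V V^T = I and V.1 = 0
V = [[0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, 0.5, -0.5]]
Vt = [[math.sqrt(m) * v for v in row] for row in V]   # Vt = sqrt(m) V
# Dressed ensemble Sigma of (3.4): columns are samples <q> + S * Vt(:,k)
Sigma = [[mean[i] + sigma[i] * Vt[i][k] for k in range(m)] for i in range(2)]
emean = [sum(Sigma[i]) / m for i in range(2)]
evar = [sum((x - emean[i]) ** 2 for x in Sigma[i]) / m for i in range(2)]
```

The four dressed samples reproduce the encoded mean and variances; the generic V never changes and could be reused for any two-parameter problem of this type.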

The equivalent use of excitation matrices in DS and RS misleadingly suggests that higher order dependencies are just as difficult to encode in DS as in RS. DS techniques are based on rules that can be directly manipulated, while RS requires indirect modification of the sample generator. For instance, utilizing nonuniform sample weights in DS (see section 3.3.5), an arbitrary mixed moment of any finite number of parameters can easily be represented by solving a linear system of equations [1]. To represent every additional moment, the ensemble must be expanded with at least one sample. How to manipulate a random generator (RS) to fulfill an arbitrary additional mixed moment without modifying the others is unknown to us.
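How weights enter can be sketched in one dimension; the sample positions, moment targets, and elimination solver below are our own illustration, not the paper's rule. Fixing four positions and solving a linear system gives weights reproducing normalization, mean, variance, and a Gaussian-like fourth moment:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (no external deps)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

nodes = [-2.0, -0.5, 1.0, 2.0]            # fixed sample positions
targets = [1.0, 0.0, 1.0, 3.0]            # sum w, <q>, <q^2>, <q^4>
powers = [0, 1, 2, 4]
A = [[x ** p for x in nodes] for p in powers]
w = solve(A, targets)
moments = [sum(wi * x ** p for wi, x in zip(w, nodes)) for p in powers]
```

For these positions all four weights come out positive; poorly chosen positions can require negative weights or make the system singular, which is part of the ensemble design problem.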


3.2. Characterization of ensembles. A canonical DS ensemble or excitation matrix V is characterized by

(3.5)  Γ̃_abc ≡ sign(Γ_abc) |Γ_abc|^(1/3),

(3.6)  Λ̃_abcd ≡ sign(Λ_abcd) |Λ_abcd|^(1/4),

(3.7)  Γ_abc = (1/m) ∑_{k=1}^m V_ak V_bk V_ck,

(3.8)  Λ_abcd = (1/m) ∑_{k=1}^m V_ak V_bk V_ck V_dk − δ_ab δ_cd,

(3.9)  M_a ≡ max_k |V_ak|,

(3.10)  E_a ≡ √((VᵀV)_aa),   ‖E‖ = √((1/m) ∑_{k=1}^m E_k²) = √n.

The tensors Γ̃ and Λ̃ are the linearly scaled skewness and kurtosis of the uncorrelated and normalized "canonical" parameters q̃ = S⁻¹Uq, respectively, as defined in section 2.3. The range M, or maximum sample variation, is particularly important for controlling that all samples are physically allowed. The excitation level E measures the magnitude of the variations (columns of V) in parameter space. Their root-mean-square ‖E‖ must equal √n to fulfill V Vᵀ = m · I. The efficiency, or the number m of columns of V_{n×m}, is important, as it reflects the critical computation time.
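The characterization quantities above are straightforward to evaluate numerically. The sketch below (Python with NumPy, an assumption of this illustration) computes the scaled skewness and kurtosis, the range M, and the excitation levels E for a small STD-type ensemble, and verifies ‖E‖ = √n:

```python
import numpy as np

def characterize(V):
    """Scaled skewness/kurtosis, range, and excitation of a canonical V (n x m)."""
    n, m = V.shape
    Gamma = np.einsum('ak,bk,ck->abc', V, V, V) / m                  # raw (3.7)
    delta = np.eye(n)
    Lam = (np.einsum('ak,bk,ck,dk->abcd', V, V, V, V) / m
           - np.einsum('ab,cd->abcd', delta, delta))                 # raw (3.8)
    Gamma_t = np.sign(Gamma) * np.abs(Gamma) ** (1 / 3)              # (3.5)
    Lam_t = np.sign(Lam) * np.abs(Lam) ** (1 / 4)                    # (3.6)
    M = np.abs(V).max(axis=1)                                        # (3.9)
    E = np.sqrt(np.diag(V.T @ V))                                    # (3.10), per sample
    E_rms = np.sqrt((E ** 2).mean())                                 # equals sqrt(n)
    return Gamma_t, Lam_t, M, E, E_rms

n = 3
V = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])  # STD ensemble as input
G, L, M, E, E_rms = characterize(V)

assert np.isclose(E_rms, np.sqrt(n))        # ||E|| = sqrt(n)
assert np.allclose(G, 0.0)                  # symmetric ensemble: zero skewness
assert np.isclose(L[0, 0, 0, 0], 2 ** 0.25) # scaled kurtosis (n-1)^(1/4) for n = 3
assert np.allclose(M, np.sqrt(n))           # range sqrt(n)
```

The same routine applied to any candidate V yields exactly the kind of entries tabulated in Table 2.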

3.3. Ensembles. By no means will the brief survey to be presented exhaust all possible ensembles V. It deserves to be emphasized that excitation matrices are used effectively in both DS and RS, as mentioned in section 3.1. The REF ensemble described in section 3.3.1 below and used for benchmarking in section 4 is one such example. Excitation matrices are unique up to any unitary transformation, just as the covariance square root is. The model result is nevertheless generally not invariant, reflecting that the ensembles are constructed from incomplete statistical information. The variation is model dependent, as expressed by (2.3)–(2.4). With complete characterization of V (by, e.g., all marginal and mixed moments), this variation is removed. The fact that the specific choice of ensemble (as well as nonuniform weighting of samples; see section 3.3.5) affects the result motivates the use of different ensembles. The selection criteria are to be found in the incompleteness of the encoded information, here the first and second moments.

The lowest order calculation of the scent and warp, and obtaining allowed samples, will be prioritized. Other criteria for selecting the most appropriate ensemble can of course be formulated. For instance, the focus on the shape of the marginal distributions but not higher dependencies is common practice in RS. This perspective will not be adopted but is touched upon in section 3.3.5, illustrated in the example section, section 4, and further explored in the discussion section, section 5.

The STD and SPX ensembles presented in sections 3.3.2 and 3.3.3, respectively, were originally proposed for critical real-time filtering applications where the trade-off between accuracy and performance is important. Similar requirements apply for the numerically demanding complex models targeted here. The central question for DS is thus whether or not it pays off to use a larger ensemble than the minimal (SPX), and what statistical information the excess samples should carry.

3.3.1. The reference ensemble (REF). Traditionally, RS provides the indisputable reference for propagating covariance. For dependent parameters, it is not easier to control the mixed higher order moments in RS than it is in DS. For these reasons, the different ensembles will be evaluated against small randomly generated reference (REF) ensembles. Their first moments are canceled by means of subtraction, q → q − ⟨q⟩_E. Dependencies are nontrivial, as desired, and cannot be controlled in detail, but they decrease with increasing ensemble size. It is thus small rather than large RS ensembles which provide the best comparison. For benchmarking, such REFs can be defined to provide the correct result. Using multiple REFs with adjusted first and second moments, the robustness or variability of the result due to different higher (> 2) moments can be studied with ease.

3.3.2. The standard ensemble (STD). The standard UKF described in section 2.4 utilizes perhaps the simplest excitation matrix,

(3.11)  V_STD = √n · ( I_{n×n}  −I_{n×n} ).

This ensemble has the largest possible range M and is thus the most prone of all ensembles to generate prohibited samples. Its skewness vanishes since it is symmetric, while its marginal kurtosis diverges with the number of parameters,

(3.12)  m = 2n,   M = E = √n · 1_{n×1},

Γ_abc = 0,   Λ̃_abcd = { (n − 1)^(1/4) if a = b = c = d;  −1 if a = b ≠ c = d;  0 otherwise }.

The marginal kurtosis cannot be larger than it is for the STD. Already for n = 3 it is equal to that of the normal distribution (2^(1/4) ≈ 1.2). The range is also large: for uniform distributions with M = √12 ≈ 3.5 and n > 12 (as in the example section, section 4), samples are statistically prohibited. The evaluated warp is hence approximately correct for just a few parameters (as in [11]), but it may be grossly incorrect for many parameters, giving a result much worse than linearization. As for all ensembles, the scent is correctly evaluated to lowest order (for sufficiently regular models). The main advantage of the UKF is indeed its simplicity and evaluation of the mean result. Besides prohibited samples, a disqualifying property of the STD is the very dependence of its performance on the number n of parameters.
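The divergence of the STD's marginal kurtosis and the growth of its range with n can be checked directly; a sketch (Python with NumPy, not part of the paper) for n = 20, matching the values listed later in Table 2:

```python
import numpy as np

n = 20
V = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])  # STD ensemble, m = 2n
m = 2 * n

# Marginal kurtosis Lambda_aaaa = n - 1 diverges with n; linearly scaled,
# (n - 1)**(1/4) is about 2.09 for n = 20, cf. the STD row of Table 2
lam = (V[0] ** 4).sum() / m - 1.0
assert np.isclose(lam, n - 1)
lam_scaled = lam ** 0.25
assert 2.0 < lam_scaled < 2.1

# Range M = sqrt(n), about 4.47, exceeds the maximum deviation sqrt(3) of a
# normalized uniform distribution: such samples are statistically prohibited
M = np.abs(V).max()
assert M > np.sqrt(3)
```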

3.3.3. The simplex ensemble (SPX). The minimal simplex ensemble was introduced in [11] and motivated as the most efficient alternative of the UKF. A simplex ensemble can be generated from half the standard ensemble, complemented by one "cancellation unit sample" 1_{n×1} to cancel the first moments,

(3.13)  V_SPX = √(n + 1) · ⊥{ ( I_{n×n}  −1_{n×1} ) }.

The half-unitary constraint V Vᵀ = m · I is violated by the cancellation sample but is restored by the operator ⊥, performing classical Gram–Schmidt orthogonalization [2, 17] and normalization of rows. This operation complicates its characterization,

(3.14)  m = n + 1,   max M = max E = √n,
        Γ ≠ 0,   Λ ∼ Λ_STD.

The (large) nonzero skewness is due to the fact that it is impossible to construct n + 1 orthogonal vectors symmetrically in n dimensions. This is usually the most undesirable property of the SPX and is the price of being the most efficient. To profit from the finite skewness, it should be correct for almost all n³/3! skewness elements, which is unlikely for any model. Erroneous performance of the SPX due to its skewness is sometimes characterized as a robustness problem in the UKF literature. The principal difficulty is the fatal combination of uncontrollable and potentially large skewness. The systematic skewness error of the ensemble might thus vary dramatically depending on, for instance, the order of orthogonalization or pivoting of rows. This may dubiously be recognized as lack of robustness. Rather, the real problem is ignorance: the result depends on the skewness, but this has not been paid attention to. With knowledge of skewness/symmetry, it has likely not been properly encoded by the SPX. Without knowledge, the large variability is a true reflection of the uncertainty of the evaluated uncertainty/covariance. The kurtosis is less problematic and more or less resembles that of the STD, as the SPX is mainly generated from this ensemble.
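A sketch of the SPX construction (3.13), using a QR factorization as a stand-in for classical Gram–Schmidt orthonormalization of the rows (equivalent up to signs), confirms both the restored half-unitarity and the nonzero marginal skewness discussed above (Python with NumPy, an assumption of this illustration):

```python
import numpy as np

n = 4
A = np.hstack([np.eye(n), -np.ones((n, 1))])    # (I  -1): rows sum to zero

# Orthonormalize the rows via QR of A^T (same nested spans as Gram-Schmidt,
# up to signs), then rescale so that V V^T = m I with m = n + 1
Q, _ = np.linalg.qr(A.T)                         # Q: (n+1) x n, orthonormal cols
V = np.sqrt(n + 1) * Q.T
m = n + 1

assert np.allclose(V @ V.T, m * np.eye(n))       # half-unitarity restored
assert np.allclose(V.sum(axis=1), 0.0)           # first moments still cancel

# The price of minimality: the marginal skewness does not vanish
Gamma_diag = (V ** 3).sum(axis=1) / m
assert np.abs(Gamma_diag).max() > 0.1
```

Because QR and Gram–Schmidt can differ in signs and because the result depends on the ordering of rows, the skewness realized by any particular SPX is essentially arbitrary, which is precisely the robustness issue described above.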

3.3.4. The binary ensemble (BIN). The binary ensemble [7] is the only novel elementary ensemble to be presented. It is primarily constructed to have the lowest possible range M, guaranteeing physically allowed samples, disregarding correlations (see (3.2)). The BIN ensemble is essentially nothing but the generalization of the trivial univariate ensemble V_UNI = ( +1  −1 ), which encodes the first and second moments, to an ensemble of n uncorrelated parameters. Its size is a direct consequence of eliminating the correlations from the combined correlated ensemble V = ( V_UNIᵀ  V_UNIᵀ  · · ·  V_UNIᵀ )ᵀ. The marginal moments of the BIN have nothing to do with correlation and are hence the same as for V_UNI.

The binary ensemble without supplements can be extracted from any undergraduate textbook on digital design. To construct the BIN, arrange all digital combinations of n bits in rows in a "standard" arrangement with the following modifications:

• Assign the levels ±1 instead of 0, 1.
• Add two kinds of supplementary rows:
  – Make cyclic shifts of all rows except the first by a quarter of their periodicity.
  – Mirror image: change the sign of the second half of all original rows except the last two, and of all cyclically shifted rows except the last.

The supplementary rows reduce the size of the ensemble drastically, from 2^n to 2^⌈(n+5)/4⌉. For 20 parameters (section 4), this implies a reduction from about 10^6 to 128 samples. The BIN excitation matrix has the structure

(3.15)  V_BIN =
  ⎡ +1 −1 +1 −1 +1 −1 +1 −1 · · · ⎤
  ⎢ +1 +1 −1 −1 +1 +1 −1 −1 · · · ⎥
  ⎢ +1 +1 +1 +1 −1 −1 −1 −1 · · · ⎥
  ⎢ −1 +1 +1 −1 −1 +1 +1 −1 · · · ⎥
  ⎢ −1 −1 +1 +1 +1 +1 −1 −1 · · · ⎥
  ⎢ +1 −1 +1 −1 −1 +1 −1 +1 · · · ⎥
  ⎢ −1 +1 +1 −1 +1 −1 −1 +1 · · · ⎥
  ⎣  ⋮                         ⋱ ⎦ .


Rows 4 and 5 are here shifted versions of rows 2 and 3, while rows 6 and 7 are the mirror images of row 1 and the shifted row 4. The signature of the BIN is that all parameters are equally excited in all samples, with one standard deviation. The BIN is characterized by

(3.16)  m = 2^⌈(n+5)/4⌉,   M = 1_{n×1},   E = √n · 1_{n×1},

Γ_aaa = Γ_aac = 0,   Γ_abc ∈ {−1, 0, 1},
Λ_aaaa = Λ_aacc = Λ_aaad = 0,   Λ_abcd ∈ {−1, 0, 1}.

Its major advantage is the minimal range, opposite to the STD. A disadvantage is a generally lower efficiency than the SPX, and also than the STD for n > 11. For fewer parameters the efficiency is better (n = 3, 5, 6, 7, 9, 10) or the same (n = 2, 4, 8) as for the STD. In some cases (n = 3, 7) it is even the same as for the SPX. This could be of interest in critical signal processing applications. The BIN will always support an even number of parameters. If n is odd, there will be n + 1 possible BINs, or associations of the n parameters to the n + 1 rows of V.
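The BIN properties (3.16) can be verified on a minimal example. The matrix below is a BIN-type excitation matrix for n = 3 (m = 2^⌈8/4⌉ = 4), written out explicitly as an illustration rather than generated by the general cyclic-shift/mirror construction (Python with NumPy, an assumption of this sketch):

```python
import numpy as np

# BIN-type excitation matrix for n = 3 parameters: m = 2**ceil((3+5)/4) = 4
V = np.array([[+1, -1, +1, -1],
              [+1, +1, -1, -1],
              [+1, -1, -1, +1]])
n, m = V.shape

assert np.allclose(V @ V.T, m * np.eye(n))    # normalized and uncorrelated
assert np.abs(V).max() == 1                    # minimal range M = 1

# Marginal skewness and kurtosis vanish identically (V**3 = V, V**4 = 1) ...
assert np.allclose((V ** 3).sum(axis=1), 0.0)
assert np.allclose((V ** 4).sum(axis=1) / m - 1.0, 0.0)

# ... but the fully mixed skewness takes the values +/-1, cf. (3.16)
Gamma_123 = (V[0] * V[1] * V[2]).sum() / m
assert np.isclose(abs(Gamma_123), 1.0)
```

The last assertion makes the trade-off of the BIN concrete: the range and all marginal moments are tightly controlled, while the fully mixed higher moments are not.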

3.3.5. Combined ensembles (CMB). New ensembles can be found by combining elementary ensembles V_k, such as the STD, SPX, and BIN. In block-diagonal form, different sets of parameters utilize different ensembles,

(3.17)  V = ⎡ √(m/m₁) V₁       0        · · · ⎤
            ⎢      0       √(m/m₂) V₂  · · · ⎥
            ⎣      ⋮           ⋮        ⋱   ⎦ ,   m ≡ ∑_k m_k.

In "stacked" block-row form, each parameter makes use of all ensembles,

(3.18)  V = ( α₁V₁  α₂V₂  · · · ),   {α_k} : ∑_k α_k² m_k = ∑_k m_k.

Two identical BIN ensembles of m samples may, for instance, be combined to satisfy a given marginal kurtosis. To allow for a large kurtosis, zero padding may be applied, as discussed in the introduction (section 1),

(3.19)  V_CMB = √(2 + m_z/m) ( diag(cos φ) V_BIN   0_{n×m_z}   diag(sin φ) V_BIN ),

(3.20)  Λ_aaaa = (2 + m_z/m) [cos⁴ φ_a + sin⁴ φ_a] − 1 ≥ 0,

where diag(X)_jk = δ_jk X_j. Solving the transcendental equation (3.20) for φ and appropriate values of m_z, any value of the marginal kurtosis may be correctly represented by V_CMB. Note that mixed kurtosis elements generally differ from those of V_BIN and cannot be controlled individually.
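Solving (3.20) for φ is a one-dimensional root-finding problem, since cos⁴φ + sin⁴φ decreases monotonically from 1 to 1/2 on [0, π/4]. A sketch (Python, stdlib only) with a hypothetical target kurtosis Λ_aaaa = 2 and m_z = 2m, for which the exact solution is φ = π/8:

```python
import math

def solve_phi(lam_target, mz_over_m):
    """Solve (3.20) for phi by bisection on [0, pi/4], where
    cos^4(phi) + sin^4(phi) decreases monotonically from 1 to 1/2."""
    c = (1.0 + lam_target) / (2.0 + mz_over_m)   # required cos^4 + sin^4
    assert 0.5 <= c <= 1.0, "no solution: adjust mz"
    lo, hi = 0.0, math.pi / 4
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.cos(mid) ** 4 + math.sin(mid) ** 4 > c:
            lo = mid        # increase phi to decrease cos^4 + sin^4
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical target: marginal kurtosis Lambda_aaaa = 2 with mz = 2m
phi = solve_phi(2.0, 2.0)
assert abs(phi - math.pi / 8) < 1e-9     # exact solution is pi/8 here
lam = (2.0 + 2.0) * (math.cos(phi) ** 4 + math.sin(phi) ** 4) - 1.0
assert abs(lam - 2.0) < 1e-9             # (3.20) satisfied
```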

The ensemble V_CMB can easily be generalized to encode any finite number of marginal moments of symmetric pdfs. First, define a correlated excitation matrix with row k given by the 1 × m univariate excitation matrix (vector) for the parameter indexed k,

(3.21)  V_UNI(k, :) = ( v₁^(k)  v₂^(k)  · · ·  v_m^(k) ).


It is sufficient to include only positive elements, as the BIN will later perform symmetrization. Above, V_UNI(k, :) = W ( cos(φ_k)  0_{1×m_z}  sin(φ_k) ), W = √(2 + m_z/m). Finding excitation matrices satisfying a large number of moments ⟨ξ(UᵀS⁻¹Uq)⟩ of canonical parameters is grossly simplified by introducing nonuniform sample weights W_k,

(3.22)  ⟨ξ(UᵀS⁻¹Uq)⟩ = ξ(V) Wᵀ,   W ≡ ( W₁  W₂  · · ·  W_m ),   W · 1_{m×1} = 1,

as sometimes utilized in unscented transforms [11, 19]. Here, the operator ξ evaluates the statistic(s) or moment(s) of interest; e.g., (ξ(V))_ij = (V_ij)^k will return all n kth marginal moments of the excitation matrix V. Assigning rather than solving for the sample points V(:, k), and instead letting the weights W_k be the variables, the complicated, strongly nonlinear system of equations (2.1) is translated to a strictly linear system [1] given by (3.22), which is straightforward to solve. That also makes it possible to have full control of the range M, to avoid prohibited samples.

If only marginal moments are encoded with variable sample weights and collected in V_UNI, its correlations can be removed by combining V_UNI with the BIN into a generalized CMB. It is given by an outer product over rows (as in (3.19)),

(3.23)  V_CMB =
  ⎡ V_UNI(1,1)·V_BIN(1,:)   V_UNI(1,2)·V_BIN(1,:)   · · ·   V_UNI(1,m)·V_BIN(1,:) ⎤
  ⎢ V_UNI(2,1)·V_BIN(2,:)   V_UNI(2,2)·V_BIN(2,:)   · · ·   V_UNI(2,m)·V_BIN(2,:) ⎥
  ⎢          ⋮                        ⋮              ⋱               ⋮          ⎥
  ⎣ V_UNI(n,1)·V_BIN(n,:)   V_UNI(n,2)·V_BIN(n,:)   · · ·   V_UNI(n,m)·V_BIN(n,:) ⎦ .

This "decorrelation" can be made with the BIN (but not with, e.g., the STD), as all samples are reproduced, but not modified, exactly the same number of times. That leaves all marginal expectations, such as all marginal moments, invariant. In particular, the important range M is preserved. The size m · m_BIN of the generalized V_CMB distinctly illustrates the large computational cost of controlling dependencies. The elimination of only the correlation increases the ensemble size by the relatively large factor m_BIN/2 (accounting for symmetrization). To capture all moments up to, say, order 10 for 20 parameters requires only m = 11 using nonuniform weights W_k and one "zero" sample, but m_BIN/2 = 64. About six times as many samples are thus needed to decorrelate the samples as to capture these marginal moments (many more can in fact be approximately fulfilled by optimizing the sample points). The BIN does not, however, eliminate other kinds of dependencies (like the mixed skewness or kurtosis). Even though ensembles like the generalized V_CMB are widely used in RS, they are generally inconsistent representations of statistical information (as will be illustrated in section 4) and will therefore not be further explored here. Cracking the nut of controlling higher order dependencies is probably the main challenge of DS, as well as of RS. The scrambling of RS may eliminate dependencies but hardly provides control of the strong dependencies of identified models.

4. Example—step response model. The step response of an electrical device such as an oscilloscope can be modeled with a continuous time model, which can be sampled into a discrete time digital filter [5],

(4.1)  G(z) = ∏_{j=1}^u (z − z_j) / ∏_{k=1}^v (z − p_k),

q = ( Re(z_j)  Im(z_j) > 0  · · ·  Re(p_k)  Im(p_k) > 0  · · · )ᵀ.

The model parameters q consist of the zeros z_j and poles p_k of the z-transform G(z) of the digital filter. Since these are either real or come in complex-conjugated pairs, it is convenient to parameterize in Re(z_j, p_k) and Im(z_j, p_k) > 0. The task of identifying a linear-in-response model and its parametric covariance from measurements [13, 16] is here excluded for simplicity. The model is instead assumed to be given by a slightly distorted digital low-pass Butterworth filter of order 10 and cross-over frequency f_c = 0.1 f_n, f_n being the Nyquist frequency. The covariance matrix was found from an REF ensemble containing 100 samples generated from uniform distributions. That implies an ideal statistical range M = √3. The correlations were typically √(cov(q)_rc) ≈ 0.10 √(cov(q)_rr), r ≠ c, in normalized parameters. The transformation from canonical to actual samples resulted in Ψ = 0.21 (see (3.2)). The studied ensembles are characterized (see section 3.2) in Table 2. The large mixed skewness and kurtosis elements make it evident that decorrelated (and normalized) canonical parameters generally are strongly dependent (if the multivariate distribution is not Gaussian).

Table 2
Characterization of ensembles for modeling the step response. The numbers of different skewness and kurtosis elements are rough indicators of their importance and are therefore included in superscripts. Below, the indices a, b, c, d are all different. The elements Λ_abab are given by Λ_aacc and are therefore omitted, but their number is included in the latter. For vectors with more than three different elements, only the minimum and maximum are given.

       m     E                M            Γ_aaa          Γ_aac^(380)    Γ_abc^(1140)
REF   100   [3.4, 5.6]       [1.7, 2.3]   [−0.6, +0.6]   [−0.7, +0.7]   [−0.7, +0.7]
STD    40   [4.5, 4.5]       [4.5, 4.5]   [+0.0, +0.0]   [+0.0, +0.0]   [+0.0, +0.0]
SPX    21   [4.5, 4.5]       [3.2, 4.5]   [+0.0, +1.6]   [−1.2, +0.0]   [+0.0, +0.0]
BIN   128   [4.5, 4.5]       [1.0, 1.0]   [+0.0, +0.0]   [+0.0, +0.0]   [−1.0, +1.0]
CMB   257   [0.0, 3.7, 6.8]  [1.4, 1.6]   [+0.0, +0.0]   [+0.0, +0.0]   [−1.1, +1.1]

       Λ_aaaa^(20)   Λ_aacc^(190+190)   Λ_aaad^(380)   Λ_aacd^(3420)   Λ_abcd^(4845)
REF   [0.9, 1.1]    [−0.8, +0.7]       [−0.8, +0.8]   [−0.8, +0.8]    [−0.8, +0.8]
STD   [2.1, 2.1]    [−1.0, −1.0]       [+0.0, +0.0]   [+0.0, +0.0]    [+0.0, +0.0]
SPX   [1.8, 2.1]    [−1.0, +1.3]       [−1.3, +0.0]   [+0.0, +1.2]    [+0.0, +0.0]
BIN   [0.0, 0.0]    [+0.0, +0.0]       [+0.0, +0.0]   [+0.0, +0.0]    [−1.0, +1.0]
CMB   [0.9, 1.1]    [+0.9, +1.0]       [+0.0, +0.0]   [+0.0, +0.0]    [−1.2, +1.2]

The excitation levels (E) are identical for the different samples of the STD, SPX, and BIN. All their samples thus belong to the same hypersphere in parameter space. For the CMB there are three different levels, one for each BIN and zero for the m_z zero variations. Despite the high excitation levels (E) of the CMB, its range (M) is less than for the REF. As expected, no ensemble comes close to satisfying the skewness of the REF. The elements Γ_aaa and Γ_aac of the SPX are clearly displaced. The variation of Γ_abc is larger for the CMB than for the BIN, which in turn is larger than for the REF. According to its synthesis, the CMB has the proper marginal kurtosis Λ_aaaa for all n parameters indexed by a. Apparently, this comes at a cost, since Λ_aacc is strongly biased and the variation of Λ_abcd is larger than for the BIN, which in turn is larger than for the REF. Comparing with the BIN, the question is whether the improved marginal kurtosis of the CMB can compensate for its many poorer mixed kurtosis elements.

To propagate the covariance, a full dynamic simulation of the step response was made for every sample, i.e., filtering of a unit step signal with digital filters having transfer functions given by (4.1), for the different parameter sets Σ (see (3.4)). The mean and standard deviation were then calculated at all discrete time instants. Subtracting the linearization result (see (2.8)) yielded the scent ζ shown in Figure 1 and the warp ψ shown in Figure 2. The maximum magnitudes of the scent and the warp of the REF turned out to be almost identical and surprisingly small, max(|ζ|) ≈ max(|ψ|) ≤ 2 · 10⁻³. That suggests the model is effectively weakly nonlinear in its parameters.
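The propagation procedure can be illustrated on a deliberately simplified stand-in for the paper's model: a single uncertain pole p in a one-pole low-pass filter (a hypothetical example, not the order-10 Butterworth model). The ensemble is propagated exactly through the step response, and the scent and warp are obtained by subtracting the linearization (Python with NumPy, an assumption of this sketch):

```python
import numpy as np

def step_response(p, t):
    """Step response of the one-pole low-pass filter H(z) = (1-p)/(1 - p z^-1)."""
    return 1.0 - p ** (t + 1)

p_mean, p_std = 0.9, 0.02            # hypothetical pole statistics
t = np.arange(50)

# Minimal symmetric ensemble for a single parameter: p_mean +/- p_std
samples = np.array([p_mean + p_std, p_mean - p_std])
H = np.array([step_response(p, t) for p in samples])

h_mean = H.mean(axis=0)              # ensemble mean at every time instant
h_std = H.std(axis=0)                # ensemble standard deviation

# Linearization: h(p_mean) and |dh/dp| p_std, with dh/dp = -(t+1) p^t
h_lin = step_response(p_mean, t)
s_lin = (t + 1) * p_mean ** t * p_std

scent = h_mean - h_lin               # nonlinear displacement of the mean
warp = h_std - s_lin                 # nonlinear correction of the std

assert np.abs(scent).max() > 1e-3          # the model is nonlinear in p,
assert np.abs(scent).max() < h_std.max()   # but only weakly so
```

For the full model, the only change is that each sample is a 20-dimensional parameter vector and the propagation is a digital filtering of the unit step.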

Since the warp is just 3% of the maximum standard deviation, it is mostly a matter of limiting the warp rather than evaluating it accurately. For the STD and the SPX this control is very poor (Figure 2, top). Due to their defectively estimated warp, their standard deviation will be roughly 25% off, relative to the maximum. The reason for the gross exaggeration of the nonlinear effects is that their range max M_STD = max M_SPX = 4.47 > 2.03 is more than twice as large as for the REF. This failure of the STD and SPX is caused by the factor √n, or the large number of parameters. Contrarily, the dense BIN excitation matrix minimizes its range. Opposite to the STD and the SPX, the warp of the BIN (Figure 2, bottom) has the correct magnitude. As anticipated from the negligible influence of marginal elements of higher moments (section 2.3), the CMB did not generally improve the result beyond the BIN. On the contrary, the CMB almost always (using differently generated REFs) yielded a larger warp than the BIN. The scent is only roughly estimated with the STD and the SPX (Figure 1, top), while it is accurately evaluated with the BIN and the CMB ensembles (Figure 1, bottom).

For long-range ensembles such as the STD and the SPX, the poles may even exit the stable interior of the unit circle in the z-plane. The model then becomes physically prohibited, and the result may be completely unrealistic. For this purpose it is useful to study pole-zero plots. All poles and zeros of the REF are for comparison displayed in Figure 3, with all the samples of the pole marked "P" shown in Figure 4, for the different ensembles.

The STD has almost (since Ψ > 0) all its variation concentrated in the four samples p₁,₂,₃,₄, given by the plus and minus directions along the two principal directions (full, thin). The variation of all other samples is entirely due to the transformation UᵀSUS⁻¹ (see (3.4)), as quantified with Ψ (see (3.2)). Since the effects of this transformation are rather small, it is indeed meaningful to associate properties of V with the samples Σ. The concentration of the STD samples at the center compensates for the large excitations of the four p-samples. The sample p₁ is close to the stability limit |z| = 1 (full, thick) indicated in the upper right corner. It is evident that the SPX is constructed from half the STD, as two of the four distant samples (p₃, p₄), related to the omitted half of the STD, are missing. The symmetry is thus broken, giving a large and often unrealistic skewness; see Table 2. Compensating (by the diagonalization) minor displacements of the samples in the center are discernible in the


Figure 1. The scent ζ of the REF, the STD, and the SPX (top), and the scent error (for clarity) ζ − ζ_REF for the BIN and the CMB (bottom) ensembles. The downscaled mean response (thin, top) and the correct scent ζ_REF (thin, bottom) are included for comparison.

directions (left and downwards) of these missing samples. If the standard deviations were only slightly higher, the STD as well as the SPX might become unstable, despite the fact that the range of the REF shows that all (allowed) samples must be stable.

The BIN (Figure 4, bottom left) utilizes multiple excitations to limit the range and hence scatters the samples in the intermediate directions (dotted). The samples thus cluster in four distinct domains. The CMB is composed of two BINs, with two different ranges. The transformation UᵀSUS⁻¹ will mix samples into wider "clouds" than for the BIN. The zero samples used to increase the kurtosis of the CMB are not affected and are visible in the center. Even though the marginal kurtosis of the CMB is correct, its range is clearly too large.


Figure 2. The warp ψ of the REF, the STD, and the SPX (top), and for the BIN and CMB (bottom) ensembles. The correct downscaled standard deviation σ_REF is included for comparison.

To summarize, the BIN appears to be the most accurate of all considered ensembles. For several possible reasons, the CMB performed worse. If the model uncertainty is low or moderate, the more efficient STD and SPX may be utilized. The large skewness of the SPX is likely inappropriate for any model, and the SPX should be used with caution. It can be motivated only by its efficiency, hardly by its ability to propagate covariance nonlinearly. The size of the BIN may seem large, but for comparable accuracy it is considerably smaller than any RS ensemble up to at least n = 20 parameters. Most importantly, the outcome of the comparison could have been inferred already from the characterization of the ensembles (Table 2), without evaluating the step response. The ensemble can hence be selected without reference to any specific example. That is required for a wide and reliable utilization of DS.


Figure 3. The zeros (left) and the poles (right) of the REF ensemble. The marker "P" relates to the samples shown in Figure 4.

5. Discussion. The targeted signal processing models are assumed to be identified from calibration measurements. Their parameters are by necessity more or less dependent. Changing to canonical parameters eliminates correlations but does not generally cancel other dependencies. The large number of mixed statistical moments emphasizes the importance of addressing dependencies. Encoding the marginal moments at the expense of a poorer representation of mixed moments, as for the CMB in the example, is inconsistent and may not improve the result at all. Accounting for dependencies is even more difficult in random sampling, as it relies upon a random generator and not a rule that can be manipulated. The inconsistency thus cannot be resolved, or even shown, using traditional Monte Carlo simulations.

A consistent approximation beyond the second moment of the parameters is required for nonlinear evaluation of the covariance. For the targeted complex signal processing models with many dependent parameters, determining such large amounts of statistical information about the model appears intractable: for a typically sized model with 20 parameters, the marginal skewness elements constitute a fraction no larger than 1.3% of all skewness elements. The corresponding fraction of marginal kurtosis elements is less than 0.3% (Table 1). This curse of dimensionality sets in for very few parameters. It illustrates in a profound way that multivariate analysis is distinctly more complex than univariate analysis. It would also be a formidable task to encode all this information in any ensemble. Even so, the large number of degrees of freedom would almost certainly render a useless ensemble, far too large to be evaluated. Perhaps more important is that adjusting the skewness and the kurtosis of any ensemble will change its range. The range must not be too large, as it might trigger false nonlinear effects. In the worst case, samples may not even be physically allowed (resulting in, e.g., unstable digital filtering).

If it is insufficient to tune only the marginal kurtosis tensor elements, it is not worthwhile to adjust for any other higher marginal moments either. These moments are nothing but various


Figure 4. The samples (dots) of one pole, marked "P" in Figure 3, for the REF, STD, SPX, BIN, and CMB ensembles. Lines (full) connecting the four primary excitations p₁,₂,₃,₄ of the STD, as well as lines (dashed) to combined excitations of the BIN, are included for reference.

projections of the marginal pdfs and hence carry similar information. Our conclusion is hence that for many dependent parameters, the selection of marginal pdfs (uniform, Gaussian, etc.) is more or less irrelevant, as long as dependencies beyond correlations are not accounted for. This applies to RS, where the selection of these pdfs traditionally is considered highly important. It is hence questionable whether RS gives a more realistic result than any appropriate version of DS. One possible reason why this deficiency has not been reported is that RS usually is assigned golden reference status, per se.

For comparison to RS, the BIN ensemble provides DS with comparable modeling capabilities for symmetric pdfs. Any finite number of higher moments of the marginal distributions may first be encoded and combined in a correlated DS ensemble. It can then be decorrelated by expansion with the BIN, but with none of the other discussed ensembles, as explained in section 3.3.5. In the more general case of asymmetric marginal pdfs and/or the need to encode higher mixed moments, nonuniform weighting of samples can be utilized to satisfy or approximate an arbitrary number of marginal and mixed moments by solving a linear system of equations. Such alternatives have only been touched upon in this work but are often used in unscented transforms [19] and will be presented in greater depth elsewhere. This illustrates that DS is at least as flexible as RS, due to the introduction of the novel BIN ensemble and the use of nonuniform weighting of samples.

To the extent the ensemble is appropriate for a given signal processing model, it is a correct and, most importantly, complete representation. It can be propagated in one and only one trivial way and results in one unique answer, irrespective of what the model is used for. There is then no longer any issue of how to propagate the uncertainty. Rather, the question is how to find the most appropriate ensemble. Selecting a DS ensemble is equivalent to the assignment of a pdf: both represent statistical information about the model. DS ensembles may provide a more consistent approximation, as all their multivariate aspects can be evaluated. These aspects bring excitation matrices on a par with probability distributions as statistical representations.

It is possible to assign a particular discrete ensemble to the model as it is identified [13, 16] from calibration measurements. The statistical analysis is then effectively transferred from the use to the determination of the uncertain model. That is logical, as the calibration data provides the prime information for inferring the statistics of the model. The appropriateness of any ensemble can be verified against different REF ensembles, as shown in the example section (section 4). Imperfections of generic ensembles can also be compensated by more conservative estimates of the covariance of the parameters. This further emphasizes the advantage of combining ensemble selection and determination of covariance in the process of model identification.

The quality of DS as well as RS is mainly determined by whether the amount of encoded statistical information is sufficient for the complexity of the behavior of the actual model. The accuracy of encoding given information is entirely controlled in DS, but not in RS, as it depends on the convergence of random generation. The additional ensemble error due to incomplete convergence is the main problem for RS, but it never arises for DS with prescribed fixed ensemble sizes. In stark contrast to RS, DS is always completely reproducible.
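The contrast can be sketched with a toy model (the model and ensemble below are illustrative assumptions, not taken from the paper): a two-sample symmetric ensemble encodes zero mean and unit variance exactly and yields the identical propagated result on every run, whereas a random-sampling estimate carries an additional, seed-dependent convergence error.

```python
import numpy as np

def propagate_ds(model):
    # Two symmetric samples at +/-1 encode mean 0 and variance 1 exactly;
    # the propagated statistics are fixed, with no convergence error.
    y = model(np.array([-1.0, 1.0]))
    return y.mean(), y.var()

def propagate_rs(model, n, seed):
    # Random sampling: the estimate fluctuates with the seed until n is
    # large enough for the random generation to have converged.
    y = model(np.random.default_rng(seed).standard_normal(n))
    return y.mean(), y.var()

model = np.exp
assert propagate_ds(model) == propagate_ds(model)                # reproducible
assert propagate_rs(model, 100, 0) != propagate_rs(model, 100, 1)  # seed-dependent
```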

6. Conclusions. DS has been suggested as an efficient, robust method for propagating model covariance through complex signal processing models. The prime advantage of DS is not the accuracy; it is its simplicity, efficiency, and noninvasive character, which allow for virtually unlimited utilization. The nonlinear displacement of the expected mean result, or scent, can be accurately evaluated with appropriate ensembles. The nonlinear correction of the propagated covariance, or warp, is considerably more difficult to evaluate, as it requires knowledge of rarely known details of the statistics of the model. Fortunately, the scent is primarily utilized in applications, whereas the warp affects less critical figures of quality. In DS, computational efficiency is traded against resolution of vaguely known statistical information. From this perspective, DS appears to be optimized for uncertainty quantification of complex signal processing models.

Starting with the original standard and simplex ensembles of the UKF, the novel binary (BIN) ensemble was derived. By combining several BINs into a CMB ensemble, it is possible to encode any finite number of even marginal moments of uncorrelated parameters. Its control of sample range and ability to provide means for representing complex marginal statistical information are not shared by any of the other presented ensembles. In the example, the marginal kurtoses were encoded by fusing two BINs. Contrary to widespread belief, the CMB did not perform better than the BIN. The reason is that it is not a consistent approximation to encode the marginal but not the mixed kurtosis elements. The studied ensembles are nothing but examples of infinitely many alternatives for DS; the most appropriate ensembles are likely yet to be discovered. Preferably, DS ensembles are determined as the model is identified from calibration measurements. At present, there exists no competing method for propagating covariance with comparable efficiency and performance. Finding good generic DS ensembles is thus an important but complex topic for further investigation.
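A minimal numerical sketch of a binary-type construction (an assumed simplification for illustration, not the paper's exact excitation matrices): take all 2^n sign combinations of ±1 for n parameters and scale them by a matrix square root of the covariance, so that the discrete ensemble reproduces the mean and covariance exactly.

```python
from itertools import product

import numpy as np

def binary_ensemble(mean, cov):
    """All 2^n sign vectors, scaled by a Cholesky factor of cov.

    The sign columns are orthogonal with unit mean square, so the
    ensemble mean is `mean` and the ensemble covariance (with 1/N
    normalization) is exactly `cov`.
    """
    n = len(mean)
    signs = np.array(list(product([-1.0, 1.0], repeat=n)))  # shape (2^n, n)
    L = np.linalg.cholesky(cov)  # any square root of cov works here
    return mean + signs @ L.T    # one sample per row

mu = np.array([1.0, 2.0])
C = np.array([[2.0, 0.3],
              [0.3, 0.5]])
samples = binary_ensemble(mu, C)  # 4 samples encoding mu and C exactly
```

The ensemble grows as 2^n, so in practice smaller sign designs (or the simplex and standard UKF ensembles) are preferred for many parameters; the full factorial version above merely makes the moment-encoding mechanism explicit.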

REFERENCES

[1] L. Angrisani, M. D'Apuzzo, and R. S. L. Moriello, Unscented transform: A powerful tool for measurement uncertainty evaluation, IEEE Trans. Instrum. Meas., 55 (2006), pp. 737–743.

[2] Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.

[3] F. Gustafsson and G. Hendeby, Some relations between extended and unscented Kalman filters, IEEE Trans. Signal Process., 60 (2012), pp. 545–555.

[4] J. Helton and F. Davis, Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems, Reliab. Eng. Syst. Saf., 81 (2003), pp. 23–69.

[5] J. P. Hessling, Metrology for non-stationary dynamic measurements, in Advances in Measurement Systems, INTECH, Vukovar, Croatia, 2010, pp. 221–256.

[6] J. P. Hessling, Integration of digital filters and measurements, in Digital Filters, INTECH, Rijeka, Croatia, 2011, pp. 123–154.

[7] J. P. Hessling, Deterministic sampling for quantification of modeling uncertainty of signals, in Digital Filters and Signal Processing, INTECH, Rijeka, Croatia, 2013, pp. 53–79.

[8] J. P. Hessling and T. Svensson, Propagation of uncertainty by sampling on confidence boundaries, Int. J. Uncertainty Quantification, 3 (2013), pp. 421–444.

[9] ISO GUM, Guide to the Expression of Uncertainty in Measurement, International Organisation for Standardisation, Geneva, Switzerland, 1995.

[10] S. Julier and J. Uhlmann, The scaled unscented transformation, in Proceedings of the IEEE American Control Conference, 2002, pp. 4555–4559.

[11] S. Julier and J. Uhlmann, Unscented filtering and nonlinear estimation, Proc. IEEE, 92 (2004), pp. 401–422.

[12] S. Julier, J. Uhlmann, and H. Durrant-Whyte, A new approach for filtering non-linear systems, in Proceedings of the IEEE American Control Conference, 1995, pp. 1628–1632.

[13] L. Ljung, System Identification: Theory for the User, 2nd ed., Prentice–Hall, Upper Saddle River, NJ, 1999.

[14] N. Metropolis, The beginning of the Monte Carlo method, Los Alamos Sci., no. 15 (1987), pp. 125–130.

[15] N. Metropolis and S. Ulam, The Monte Carlo method, J. Amer. Statist. Assoc., 44 (1949), pp. 335–341.

[16] R. Pintelon and J. Schoukens, System Identification: A Frequency Domain Approach, IEEE Press, Piscataway, NJ, 2001.

[17] L. Råde and B. Westergren, Mathematics Handbook, 2nd ed., Studentlitteratur, Lund, Sweden, 1990.


[18] R. Y. Rubinstein and D. P. Kroese, Simulation and the Monte Carlo Method, 2nd ed., John Wiley & Sons, New York, 2007.

[19] D. Simon, Optimal State Estimation: Kalman, H∞, and Nonlinear Approaches, Wiley, Hoboken, NJ, 2006.

[20] J. K. Uhlmann, Dynamic Map Building and Localization: New Theoretical Foundations, Ph.D. thesis, Robotics Research Group, University of Oxford, Oxford, UK, 1995.
