Bounded Influence Magnetotelluric Response Function Estimation

Alan D. Chave
Deep Submergence Laboratory
Department of Applied Ocean Physics and Engineering
Woods Hole Oceanographic Institution
Woods Hole, MA 02543, USA

David J. Thomson
Department of Mathematics and Statistics
Queen's University
Kingston, Ontario K7L 3N6, Canada
Abstract
Robust magnetotelluric response function estimators are now in standard use by the electromagnetic induction community. Properly devised and applied, these have the ability to reduce the influence of unusual data (outliers) in the response (electric field) variables, but are often not sensitive to exceptional predictor (magnetic field) data, which are often termed leverage points. A bounded influence estimator is described which simultaneously limits the influence of both outliers and leverage points, and has proven to consistently yield more reliable magnetotelluric response function estimates than conventional robust approaches. The bounded influence estimator combines a standard robust M-estimator with leverage weighting based on the statistics of the hat matrix diagonal, which is a standard statistical measure of unusual predictors. Further extensions to magnetotelluric data analysis are proposed, including a generalization of the remote reference method which utilizes multiple sites instead of a single one and a two-stage bounded influence estimator which effectively removes correlated noise in the local electric and magnetic field variables using one or more uncontaminated remote references. These developments are illustrated using a variety of magnetotelluric data.
1. Introduction

The magnetotelluric (MT) method utilizes naturally-occurring fluctuations of electromagnetic fields observed at Earth's surface to map variations in subsurface electrical conductivity. The fundamental datum in MT is the site-specific, frequency-dependent tensor relationship between the measured electric and magnetic fields. Under general conditions on the morphology of the external source fields (e.g., Dmitriev and Berdichevsky 1979), in the absence of noise, and with precise data, this may be written

E = Z B    (1)

where E and B are two-vectors of the horizontal electric and magnetic field components at a specific site and frequency, and Z is the second rank, 2x2 MT response tensor connecting them. The solution to (1) at a given frequency is

Z = (E B^H)(B B^H)^-1    (2)

where the superscript H denotes the Hermitian (complex conjugate) transpose and the terms in parentheses are the exact cross- and auto-power spectra. Similar linear relationships can be derived between the vertical and horizontal magnetic field variations. Since the methodology is nearly identical, this paper will focus only on the MT case.

When E and B are actual measurements, (1) and (2) do not hold exactly due to the presence of noise, and it becomes necessary to estimate both Z and its uncertainty δZ in a statistical manner. Conventional MT data processing has historically been based on classical least squares regression approaches and Gaussian statistical models (e.g., Sims et al. 1971) which are well-known to be sensitive to small amounts of unusual data, uncorrelated noise in the magnetic field variables, and inadequacies of the model description itself. This often leads to estimates of Z which are heavily biased and/or physically uninterpretable, and can affect estimates of δZ even more seriously.

In recent years, this situation has been dramatically improved by three developments. First, the remote reference method (Gamble et al. 1979), in which the horizontal magnetic fields recorded simultaneously at an auxiliary site are combined with the fields at the local site, has proven quite effective at eliminating bias from uncorrelated local magnetic field noise. Second, a variety of data-adaptive robust weighting schemes, sometimes applied in conjunction with remote referencing, have been developed to eliminate the influence of data corresponding to large residuals which are usually called outliers (Egbert and Booker 1986; Chave et al. 1987; Chave and Thomson
1989; Larsen 1989; Sutarno and Vozoff 1989, 1991; Larsen et al. 1996; Egbert and Livelybrooks 1996; Egbert 1997; Oettinger et al. 2001; Smirnov 2003). The superior performance of robust methods was demonstrated in the comparison study of Jones et al. (1989), and these are now in general use by the MT community. Third, it is now recognized that conventional distribution-based estimates of the errors in the MT response tensor elements are often biased (e.g., Chave and Jones 1997), and a nonparametric jackknife method has been introduced (Chave and Thomson 1989) and rigorously justified (Thomson and Chave 1991) to eliminate this problem.

However, there remain unusual data sets, typically due either to natural source field complexity or man-made noise, where even robust remote reference methods sometimes fail to yield physically-interpretable MT responses. For example, Schultz et al. (1993) and Garcia et al. (1997) noted the serious bias that could be produced by short spatial scale auroral substorm source fields. MT tensor bias from cultural noise, especially that from DC electric rail systems, has been widely documented (Szarka 1988; Junge 1996; Larsen et al. 1996; Egbert et al. 2000). Some types of mid-latitude sources (notably those from Pc3 geomagnetic pulsations) have short and temporally-variable spatial scales (Andersen et al. 1976; Lanzerotti et al. 1981) which can also alter MT responses (Egbert et al. 2000). For such situations, extensions to robust processing have been proposed. Schultz et al. (1993) and Garcia et al. (1997) applied a bounded influence estimator introduced by Chave and Thomson (1992) and further developed by Chave and Thomson (2003) and in this paper to minimize the influence of extreme magnetic field data which do not produce large residuals, and hence may be missed by robust estimators. Larsen et al. (1996) and Oettinger et al. (2001) proposed multiple transfer function methods to remove cultural noise using distant, uncontaminated sites as references. Both methods are special cases of two-stage processing that is introduced in this paper. Egbert (1997) introduced a robust multivariate principal components approach to detect and characterize coherent cultural noise. This enables robust estimators to be tailored to the specific coherent noise characteristics derived from multiple station data when these are available.

In a further attempt to deal with MT processing problems, this paper updates Chave and Thomson (1989) through extensions to robust remote reference MT data processing which can control problems from both outliers in the response (electric field) and extreme data (leverage points) in the predictor (magnetic field) variables, as well as eliminate correlated noise in the response and predictor data. The next section contains a brief discussion of spectral analysis principles directly
relevant to this paper. Section 3 reviews robust estimation as applied to MT data. Section 4 discusses the role of the hat matrix as a leverage indicator. Section 5 introduces a generalization of the remote reference method utilizing multiple reference sites. Section 6 discusses bounded influence estimation as a means to simultaneously control both outlier and leverage effects. Section 7 describes bounded influence two-stage MT processing which can eliminate correlated noise in the local electromagnetic fields. Section 8 outlines some pre-processing steps that can improve MT processing results under special circumstances. Section 9 discusses statistical verification of robust or bounded influence procedures. Section 10 illustrates many of these concepts using a variety of field and synthetic MT data. Finally, Section 11 is a discussion of a few remaining issues.

2. Discourse on Spectral Analysis

It is assumed that contemporaneous, finite time sequences of the electric and magnetic field variations from one or more sites are available. It is also presumed that the time series have been digitized without aliasing and edited to remove gross errors like boxcar shifts or large spikes (although well-designed processing algorithms can often accommodate these problems). Finally, it is assumed that serious long-term trends have been removed by high pass filtering or least squares spline fitting procedures.

Because outlier detection is facilitated by comparing spectra estimated over distinct subsections in time, robust MT processing is typically based on the Welch overlapped section averaging (WOSA) approach (Welch 1967; see Percival and Walden 1993, Section 6.17, for a detailed discussion). The basic algorithm is computationally straightforward. After time domain prewhitening of each time series and starting at the lowest frequency, a subset length that is of order a few over the frequency of interest is selected. The data sequences are then divided into subsets and each is tapered with a data window. Data sections may be overlapped to improve statistical efficiency. Fourier transforms are taken, prewhitening is corrected for, and the instrument response is removed. In the context of this paper, the data are these Fourier transforms of the windowed data from each section at a single frequency. The section length is then repetitively reduced as higher frequencies are addressed. A variable section length WOSA analysis of this type is philosophically akin to wavelet analysis (e.g., Percival and Walden 2000) in that the time scale of the basis functions is changed along with the frequency scale to optimize resolution.
It is important to understand the bias (i.e., spectral leakage) and resolution properties of the selected data taper and design a frequency sampling strategy that accommodates both. Historically, data tapers have been chosen largely on the basis of computational simplicity or ad hoc criteria (e.g., Harris 1978). However, an optimal class of data tapers can be obtained by considering those finite time sequences (characterized by a length N) which have the largest possible concentration of energy in the frequency interval [-W, W]. The ensuing resolution bandwidth is 2W. Work on the continuous-time version of this problem goes back to Chalk (1950), and culminated in a set of landmark papers by Slepian and Pollak (1961) and Landau and Pollak (1961, 1962). The discrete time problem was solved by Slepian (1978), and the solution is the zeroth-order discrete prolate spheroidal or Slepian sequence with a single parameter, the time-bandwidth product NW. Numerical solutions for the Slepian sequences are obtained by solving an eigenvalue problem; a numerically robust tridiagonal form is given by Slepian (1978) and a more accurate variant is described in Appendix B of Thomson (1990). The superiority of Slepian sequences as data tapers has been thoroughly documented (e.g., Thomson 1977, 1982; Percival and Walden 1993), and these are employed exclusively in the present work. The time-bandwidth product determines both the amount of bias protection outside the main lobe of the window and the main lobe half width NW/N (Figure 1). A useful range of NW for MT processing is typically 1 to 4. An NW=1 window provides limited (about 25 dB) bias protection but yields raw estimates that are essentially independent on the standard discrete Fourier transform (DFT) grid with frequencies separated by 1/N. In this instance, prewhitening is essential to avoid spectral bias. An NW=4 window provides over 100 dB of bias protection but yields raw estimates that are highly correlated over [f-W, f+W], where f is the frequency of interest. As a rule of thumb, frequencies which are spaced 2NW apart on the DFT grid may be taken as independent; Thomson (1977) gives a more quantitative discussion of this point. It is also important to avoid frequencies within W of the DC value, as these are typically contaminated by unresolved low frequency components. The amount that data sections can be overlapped without introducing data dependence also varies, ranging from 50 to 71% as NW ranges from 1 to 4. Percival and Walden (1993, Section 6.17) give a quantitative treatment of this issue.

Prewhitening is most easily achieved by filtering with a short autoregressive (AR) sequence fit to the time series. Standard time domain AR estimators are based on least squares principles; for example, the Yule-Walker equations (Yule 1927; Walker 1931) may be solved for the autocovariance sequence (acvs), from which the AR filter may be obtained using the Levinson-Durbin recursion (Levinson 1947; Durbin 1960). In the presence of extreme data, such least squares estimators for the acvs are often seriously biased, yielding AR filters which may either be unstable (i.e., the zeros of the filter z-transform lie outside the unit circle) or highly persistent (i.e., the zeros of the filter z-transform lie inside but very close to the unit circle). In such instances, the filter may actually enhance spectral leakage by increasing the dynamic range of the data spectrum, which is opposite to the desired outcome. A robust approach to acvs estimation is essential to avoid these difficulties. A simple robust AR estimator may be obtained using acvs estimates obtained from the Fourier transform of a robust estimate of the power spectrum. The robust power spectrum may be computed using the WOSA method with a low bias (e.g., Slepian sequence with NW=4) data taper and taking the frequency-by-frequency median rather than mean of the raw section estimates. The AR filter follows from the robust acvs using the Levinson-Durbin recursion. The importance of robust AR filter estimation for prewhitening is illustrated in Section 10.

3. Robust Response Function Estimation

In the standard linear regression model for the row-by-row solution of (1), the equivalent set of matrix equations is

e = b z + ε    (3)

where there are N observations (i.e., N Fourier transforms of N independent data sections at a given frequency), so that e is the response N-vector, b is the Nx2, rank 2 predictor matrix, z is the solution 2-vector, and ε is an N-vector of random errors. The error power ε^H ε, or equivalently, the L2 norm of the random errors, is then minimized in the usual way, yielding

ẑ = (b^H b)^-1 (b^H e)    (4)

The elements of b^H b and b^H e are the averaged estimates of the auto- and cross-power spectra based on the available data. The regression residuals r are the differences between the measured values of the response variable e and those predicted by the linear regression ê = b ẑ, and serve as an estimate for the random errors ε.
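As a concrete illustration, the least squares solution (4) is a few lines of numpy; the inputs below are synthetic and hypothetical, and with noise-free data satisfying (3) with ε = 0 the solution recovers z to machine precision.

```python
import numpy as np

def ls_solution(e, b):
    """Least squares solution (4): z = (b^H b)^{-1} (b^H e), where b^H b and
    b^H e are the averaged auto- and cross-power spectral estimates."""
    bH = b.conj().T
    return np.linalg.solve(bH @ b, bH @ e)

# hypothetical noise-free data: N = 64 section Fourier transforms
rng = np.random.default_rng(0)
z_true = np.array([0.5 - 2.0j, 1.5 + 0.3j])
b = rng.normal(size=(64, 2)) + 1j * rng.normal(size=(64, 2))
e = b @ z_true
assert np.allclose(ls_solution(e, b), z_true)
```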
The conditions on the variables in (3) and their moments that yield a least squares solution (4) which is optimal in a well-defined sense are given by the Gauss-Markov theorem of classical statistics (e.g., Stuart et al. 1999, Chapter 29). The textbook version of the Gauss-Markov theorem applies when the predictor variables in (3) are fixed, but Shaffer (1991) has extended it to cover the cases where the joint distribution of e and b is multivariate normal with unknown parameters, the distribution of b is continuous and nondegenerate but otherwise unspecified, or under mild conditions, when b is a random sample from a finite population. It is as one of these cases that (3) will be considered in this paper. Under these circumstances, the linear regression solution (4) is the best linear unbiased estimate when the residuals r are uncorrelated and share a common variance independent of any assumptions about their statistical distribution except that the variance must exist. In addition, if the residuals are multivariate Gaussian, then the least squares result is a maximum likelihood, fully efficient, minimum variance estimate. These optimality properties, along with the empirical observation that most physical data yield Gaussian residuals, explain the appeal of a least squares approach.

However, with natural source electromagnetic data, the Gauss-Markov assumptions about the data and their error structure are rarely tenable for at least three reasons. First, the residual variance is often dependent on the data variance, especially when energetic intervals coincide with source field complexity, as occurs for many classes of geomagnetic disturbance. Second, the finite duration of many geomagnetic events causes data anomalies to occur in patches, violating the independent residual requirement. Third, due to marked nonstationarity, frequent large residuals are much more common than would be expected for a Gaussian model, and hence the residual distribution is typically very long-tailed with a Gaussian center. Any one of these issues can seriously bias the least squares solution (4) in unpredictable and sometimes insidious ways; in the presence of all three, problems are virtually guaranteed.

These observations have led to the development of procedures which are robust, in the sense that they are relatively insensitive to a moderate amount of bad data or to inadequacies in the model, and that react gradually rather than abruptly to perturbations of either. Robust statistics is an active area of research (e.g., Hampel et al. 1986; Wilcox 1997), and many of its techniques have been adapted to spectral analysis beginning with the work of Thomson (1977). Chave et al. (1987) presented a tutorial review of robust methods in the context of geophysical data processing, and only an outline relevant to the MT problem will be presented here.

The most suitable type of robust estimator for MT data is the M-estimator which is motivated by analogy to the maximum likelihood method of statistical inference (e.g., Stuart et al. 1999, Chapter 18). M-estimation is similar to least squares in that it minimizes a norm of the random errors in (3), but the misfit measure is chosen so that a few extreme values cannot dominate the
result. The M-estimate is obtained computationally by minimizing R^H R, where R is an N-vector whose i-th entry is [ρ(r_i/d)]^(1/2), d is a scale factor, and ρ(x) is a loss function. The term loss function originates in statistical decision theory (e.g., Stuart and Ord 1994, Section 26.52), and loosely speaking, yields a measure of the distance between the true and estimated values of a statistical parameter. For standard least squares, ρ(x) = x²/2 and (4) is immediately recovered. More generally, if ρ(x) is chosen to be -log f(x), where f(x) is the true probability density function (pdf) of the regression residuals r which includes extreme values, then the M-estimate is maximum likelihood. In practice, f(x) cannot be determined reliably from finite samples, and the loss function must be chosen on theoretical or empirical grounds. Details may be found in Chave et al. (1987) and Wilcox (1997).

Minimizing R^H R gives the M-estimator analog to the normal equations

b^H Ψ = 0    (5)

where Ψ is an N-vector whose i-th entry is the influence function ψ(x) = ∂ρ(x)/∂x evaluated at x = r_i/d. The solution to (5) is obtained using iteratively re-weighted least squares, so that at the k-th iteration, Ψ is replaced by v^[k] r^[k], where v^[k] is an NxN diagonal matrix whose i-th entry is

v_ii^[k] = ψ(r_i^[k-1]/d^[k-1]) / (r_i^[k-1]/d^[k-1])

The weights are computed using the residuals and scale from the previous iteration to linearize the problem; the solution is initialized using the residuals r^[0] and scale d^[0] from ordinary least squares, or for badly contaminated data, those from an L1 estimate obtained using a linear programming algorithm. The weighted least squares solution of (5) corresponding to (4) is

ẑ* = (b^H v b)^-1 (b^H v e)    (6)

Iteration ends when the solution does not change appreciably, typically after 3-5 steps. Since the weights are chosen to minimize the influence of data corresponding to large residuals in a well-defined sense, the M-estimator is data adaptive.

A simple form for the weights is due to Huber (1964). This has diagonal elements

v_ii = 1           |x_i| ≤ a
v_ii = a/|x_i|     |x_i| > a    (7)

A value of a = 1.5 gives better than 95% efficiency with outlier-free Gaussian data.
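The Huber-weighted iteration (4)-(7) can be sketched as follows. This is a simplified, hypothetical implementation: it runs a fixed number of iterations rather than testing the change in solution, and it normalizes the sample MAD of the residual magnitudes by 0.4485, an approximate theoretical MAD for a unit Rayleigh variate; the paper's full scale treatment is more careful.

```python
import numpy as np

def huber_weights(x, a=1.5):
    """Diagonal Huber weights (7): 1 for |x| <= a, a/|x| beyond."""
    ax = np.abs(x)
    return np.where(ax <= a, 1.0, a / np.maximum(ax, 1e-12))

def m_estimate(e, b, a=1.5, niter=5):
    """Iteratively re-weighted least squares solution (6) of e = b z + eps,
    with the scale d taken as the sample MAD of the residual magnitudes
    divided by 0.4485 (assumed unit-Rayleigh theoretical MAD)."""
    bH = b.conj().T
    z = np.linalg.solve(bH @ b, bH @ e)            # initial LS solution (4)
    for _ in range(niter):
        r = np.abs(e - b @ z)                      # residual magnitudes
        d = np.median(np.abs(r - np.median(r))) / 0.4485
        v = huber_weights(r / d, a)                # diagonal of v in (6)
        bHv = bH * v
        z = np.linalg.solve(bHv @ b, bHv @ e)      # z* = (b^H v b)^-1 b^H v e
    return z

# hypothetical demonstration: 10% gross outliers barely move the M-estimate
rng = np.random.default_rng(2)
z_true = np.array([2.0 + 0.0j, -1.0 + 0.0j])
b = rng.normal(size=(300, 2)) + 1j * rng.normal(size=(300, 2))
e = b @ z_true + 0.01 * (rng.normal(size=300) + 1j * rng.normal(size=300))
e[:30] += 100.0                                    # contaminated sections
z_rob = m_estimate(e, b)
```

The downweighting leaves the contaminated sections with negligible influence, whereas the plain least squares solution (4) on the same data is pulled far from the true response.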
Downweighting of data with (7) begins when |x_i| = |r_i/d| = a, so that the scale factor d determines which of the residuals are to be regarded as large. It is necessary because the weighted least squares problem is not scale invariant without it, in the sense that multiplication of the data by a constant will not produce a comparably affine change in the solution.

The scale estimate must itself be robust, and is typically chosen to be the ratio of the sample and theoretical values of some statistic based on a target distribution. An extremely robust scale statistic is the median absolute deviation from the median (MAD), whose sample value is

s_MAD = |r - r̃|_((N+1)/2)    (8)

where the subscript (i) denotes the i-th order statistic obtained by sorting the N values sequentially. The quantity r̃ is the middle order statistic or median of r. The theoretical MAD is the solution of

F(θ̃ + MAD) - F(θ̃ - MAD) = 1/2    (9)

where θ̃ is the theoretical median and F denotes the target cumulative distribution function (cdf).

The solution of (5) and hence the choice of d requires a target distribution for the residuals r. The data in MT processing are Fourier transforms and therefore complex, but the complex Gaussian is not necessarily the optimal choice. It is preferable to measure residual size using its magnitude since this is rotationally (phase) invariant, and Chave et al. (1987) showed that the appropriate distribution for the magnitude of a complex number is Rayleigh. The properties of the Rayleigh distribution are summarized in Johnson et al. (1994, Chapter 10). A Rayleigh statistical model for the residuals will be used exclusively in this paper.

Because the weights (7) fall off slowly for large residuals and never fully descend to zero, they provide inadequate protection against severe outliers. However, the loss function corresponding to (7) is convex and hence convergence to a global minimum is assured, making it safe for the initial iterations of a robust procedure to obtain a reliable estimate of the scale. Motivated by the form of the Gumbel extreme value distribution, Thomson (1977) and Chave and Thomson (1989) suggested using the more severe weight function

v_ii = exp(e^(-ξ²)) exp(-e^(ξ(x_i - ξ)))    (10)

for the final estimate, where the parameter ξ determines the residual size at which downweighting
begins and v_ii = 1 when x_i = 0. The weight (10) is essentially like a hard limit at x_i = ξ except that the truncation function is continuous and continuously differentiable. Chave et al. (1987) advocated setting ξ to the N-th quantile of the Rayleigh distribution, which automatically increases the allowed residual magnitude as the sample size rises. In an uncontaminated population, one expects the largest residual to increase slowly (approximately as √(2 log N)), but if the fraction ε of contaminated points is constant, a level at the N(1-ε)-th quantile would be more appropriate.

To summarize, robust processing of MT data utilizes the Fourier transforms of comparatively short sections of a long data series, where the section length is chosen to be of order a few over the frequency of interest. Data sections may be overlapped depending on the data window characteristics. At each frequency, an initial least squares solution is obtained from (4) and used to compute the residuals r in (3) as well as the scale d from the ratio of (8) and (9) using a Rayleigh model for the residual distribution. An iterative procedure is then applied using (6) with the Huber weights (7), where the residuals from the previous iteration are used to get the scale and weights. This continues until the weighted residual power r^H v r does not change below a threshold value, typically 1-2%. The scale is then fixed at the final Huber weight value and several iterations are performed using the more severe weight (10), again terminating when the weighted residual power does not change appreciably.

The algorithm outlined here is very similar to that introduced by Chave and Thomson (1989) except that the section length is variable and always maintained so that the frequency of interest is a few over the section length. It has been empirically shown that robust procedures are most effective at eliminating outliers without serious statistical efficiency reduction when this methodology is used. With variable section lengths, there is no need for band averaging, and it is recommended that its use be avoided to prevent errors on the confidence intervals of the response tensor caused by correlation. In addition, use of the interquartile distance for a scale estimate as discussed by Chave and Thomson (1989) has been discontinued since it has proven to be less robust than the MAD. Robust estimation as described here is capable of yielding reliable estimates of the MT response tensor in a semi-automatic manner for most data sets.

Finally, an essential feature of any method for computing MT responses is the provision of both an estimate and a measure of its accuracy. Traditionally, confidence intervals on the MT response are dependent on explicit statistical models which are ultimately based on a Gaussian
distribution. In addition to problems in computing the correct number of degrees of freedom in the presence of correlated spectral estimates and the practical requirement for numerous approximations to yield a tractable result, these parametric methods are even more sensitive to outliers or other data anomalies than the least squares estimates themselves. For a more complete discussion of the limitations of parametric uncertainty estimates in a spectral analysis context, see Thomson and Chave (1991). This has led to the increasing use of nonparametric confidence interval estimators which require fewer distributional assumptions, among which the simplest is the jackknife. Thomson and Chave (1991) give a detailed description of its implementation and performance in spectral analysis, while Chave and Thomson (1989) describe its use for estimating confidence intervals on the MT response tensor. This requires only that N estimates of the response tensor elements obtained by deleting a single data section at a time from the ensemble be computed and saved for later use. In a regression context, the jackknife offers the advantage that it is insensitive to violations of the assumption of variance homogeneity which is implicit to parametric error estimates (Hinkley 1977). The jackknife also yields conservative confidence interval estimates in the sense that they will tend to be slightly too large rather than too small (Efron and Stein 1981), as documented for MT by Eisel and Egbert (2001). Finally, the jackknife offers a way to propagate error through either linear (e.g., rotation) or nonlinear transformations of the MT response tensor which might otherwise be completely intractable; tensor decomposition to remove galvanic distortion as described in Chave and Smith (1994) is a prime example of the latter.

4. The Hat Matrix and Leverage

The hat or predictor matrix is an important auxiliary quantity in regression theory, and is widely used to detect unusually influential (i.e., high leverage) data (Hoaglin and Welsch 1978; Belsley et al. 1980; Chatterjee and Hadi 1988). For the regression problem defined by (3), the NxN hat matrix is given by

H = b (b^H b)^-1 b^H    (11)

The residual r is the difference between the observed electric field e and the value ê predicted from a linear regression on b. It follows that

ê = H e    (12)

and hence

r = (I - H) e    (13)
where I is the identity matrix. The hat matrix is a projection matrix which maps e onto ê. The diagonal elements of H (denoted by hii) are a measure of the amount of leverage which the i-th value of the response variable ei has in determining its corresponding predicted value êi independent of the actual size of ei, and this leverage depends only on the predictor variables b. The inverse of hii is in effect the number of observations in b which determine êi.

Some relevant properties of the hat matrix are described by Hoaglin and Welsch (1978). First, because it is a projection matrix, H is Hermitian and idempotent (H² = H). These two characteristics can be used to show that the diagonal elements satisfy 0 ≤ hii ≤ 1. Second, the eigenvalues of a projection matrix are either 0 or 1, and the number of nonzero eigenvalues equals its rank. Since rank(H) = rank(b) = p, where p is the number of columns in b (usually 2 for MT), the trace of H is p. Thus, the expected value of hii is p/N. Finally, the two extreme values 0 and 1 have special meaning. When hii = 0, then êi is fixed at zero and not affected by the corresponding datum in e. If hii = 1, then the predicted and observed values of the i-th entry in e are identical, and the model fits the data exactly. In this instance, only the i-th row of b has any influence on the regression problem, leading to the term high (or extreme) leverage to describe that point. More generally, if hii >> p/N, the i-th row of b will exert undue influence (or leverage) on the solution ẑ in an analogous manner to an outlier in e. Thus, the hat matrix diagonal elements are a measure of the amount of leverage exerted by a predictor datum. The factor by which the hat matrix diagonal elements must exceed the expected value to be considered a leverage point is not well defined, but statistical lore suggests that values which are more than 2-3 times p/N are a concern (Hoaglin and Welsch 1978).

In Chave and Thomson (2003), the exact distribution of the diagonal elements of (11) for complex Gaussian data was derived from first principles, extending previous work which yielded only asymptotic forms for real data (e.g., Rao 1973; Belsley et al. 1980; Chatterjee and Hadi 1988). The result is the beta distribution β(hii; p, N-p), where p is the number of predictor variables or columns in b and N is the number of data. The cumulative distribution function is the incomplete beta function ratio Ix(p, N-p), and a series expression suitable for numerical solution is given in Chave and Thomson (2003). For the MT situation (p = 2) and when N >> 2, some critical values ζ, where x = 2ζ/N and Ix = α, are given in Table 1.
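Both the leverage statistic and its beta calibration are inexpensive to compute. The sketch below is illustrative only: the predictors are hypothetical complex Gaussian draws with one row rescaled to manufacture a leverage point, the 0.95 rejection level is an example value, and SciPy's beta distribution supplies the cutoff and tail probabilities.

```python
import numpy as np
from scipy.stats import beta

def hat_diagonal(b):
    """Diagonal h_ii of the hat matrix (11), H = b (b^H b)^{-1} b^H,
    computed row by row without forming the full NxN matrix."""
    c = np.linalg.inv(b.conj().T @ b)
    return np.einsum('ij,jk,ik->i', b, c, b.conj()).real

N, p = 500, 2
rng = np.random.default_rng(3)
b = rng.normal(size=(N, p)) + 1j * rng.normal(size=(N, p))
b[0] *= 20.0                          # manufactured extreme predictor row
h = hat_diagonal(b)                   # sums to p, each entry in [0, 1]

# reject rows whose h_ii exceeds the inverse of the beta(p, N - p)
# distribution at a chosen probability level (here 0.95)
cutoff = beta.ppf(0.95, p, N - p)
flagged = np.flatnonzero(h > cutoff)  # includes the rescaled row 0

def leverage_exceedance(zeta, N, p=2):
    """Probability that h_ii ~ beta(p, N - p) exceeds zeta times the
    expected value p/N."""
    return beta.sf(zeta * p / N, p, N - p)
```

For p = 2 and large N, `leverage_exceedance(4.6, N)` and `leverage_exceedance(2.0, N)` evaluate to approximately 0.001 and 0.091, the probabilities discussed with Table 1.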
The table shows that the probability that a given hat matrix diagonal element will exceed the expected value of 2/N by a factor of 4.6 is only 0.001. However, data whose corresponding hat matrix diagonal is only twice the expected value will occur with probability 0.091. Rejecting values at this level will typically remove a significant number of valid data unless N is fairly small. Chave and Thomson (2003) advocate rejecting data corresponding to hat matrix diagonal elements larger than the inverse of the beta distribution at a reasonable probability level. For example, selecting the 0.95 level will carry a 5% penalty for Gaussian data, yet effectively protect against large leverage points. In data sets whose hat matrix is longer tailed than beta (a common occurrence with MT data from auroral latitudes), larger cutoff values should be used.

It is important to note that using the hat matrix diagonal elements as leverage statistics differs considerably from a simpler approach based on some measure of the relative variance or power in the rows of b. To see this, consider (11) with p = 2, as for a conventional single site MT sounding. Let S_jk^i denote the power between the j,k variables in the i-th row of b and let S_jk be the averaged cross spectrum of the j,k variables obtained from all N data in b. The i-th diagonal entry in (11) may be written

    h_ii = [S_xx^i/S_xx + S_yy^i/S_yy − 2γ² Re(S_xy^i/S_xy)] / [N(1 − γ²)]    (14)

where γ² is an estimate of the squared coherence between bx and by derived from the averaged cross- and auto-powers. The expected value of the right hand side of (14) is easily seen to be 2/N. In the special case where γ² → 0 so that bx and by are uncorrelated, (14) reduces to

    h_ii° = S_xx^i / Σ_{i=1..N} S_xx^i + S_yy^i / Σ_{i=1..N} S_yy^i    (15)

which is just the sum of the ratios of the power in the i-th row to the total power in each component of b.
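These leverage expressions are straightforward to verify numerically. The sketch below (synthetic correlated complex channels; all names are illustrative) checks the coherence form (14) and its γ² → 0 limit (15) against the direct definition (11):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 256
# Two correlated complex "magnetic field" channels, p = 2 as in the text.
bx = rng.standard_normal(N) + 1j * rng.standard_normal(N)
by = 0.5 * bx + rng.standard_normal(N) + 1j * rng.standard_normal(N)
b = np.column_stack([bx, by])

# Direct hat matrix diagonal from (11): h_ii = [b (b^H b)^{-1} b^H]_ii.
h_direct = np.einsum('ij,jk,ik->i', b, np.linalg.inv(b.conj().T @ b), b.conj()).real

# Row powers S_jk^i and averaged spectra S_jk over all N rows.
Sxx_i, Syy_i, Sxy_i = np.abs(bx)**2, np.abs(by)**2, np.conj(bx) * by
Sxx, Syy, Sxy = Sxx_i.mean(), Syy_i.mean(), Sxy_i.mean()
coh2 = abs(Sxy)**2 / (Sxx * Syy)          # squared coherence of bx and by

# Equation (14), and the uncorrelated-predictor statistic (15).
h14 = (Sxx_i/Sxx + Syy_i/Syy - 2*coh2*np.real(Sxy_i/Sxy)) / (N*(1 - coh2))
h15 = Sxx_i/Sxx_i.sum() + Syy_i/Syy_i.sum()

print(np.allclose(h_direct, h14))   # (14) reproduces the exact diagonal
print(h_direct.sum(), h15.sum())    # both sum to p = 2, so E[h_ii] = 2/N
```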
As γ² increases, (14) may be rewritten as

    h_ii = [h_ii° − 2γ² Re(S_xy^i / Σ_{i=1..N} S_xy^i)] / (1 − γ²)    (16)

This quantity may be either larger or smaller than (15), depending on the size of γ² and the size and sign of the ratio of the i-th row cross-power to the total cross-power. Thus, high leverage points may correspond to rows of b where γ² is small and the power in one or both of the predictor variables is much larger than the average power, or to rows of b where γ² is nonzero and the cross-power between bx and by is much larger than the average cross-power. Intermediate situations with high leverage are also possible.

This discussion has centered on the diagonal elements of the hat matrix, but the off-diagonal entries also have a leverage interpretation. The i,j off-diagonal element of H gives the amount of leverage which the j-th response variable e_j has on the i-th predicted variable ê_i, and the leverage again depends only on the predictor variables b. Thus, in the MT context of this paper, the off-diagonal elements of H give the amount of leverage exerted by magnetic field data which are non-local in time, and hence are expected to be small unless there is significant nonstationarity. The off-diagonal elements of H will not be considered further in this paper.

The hat matrix and its properties generalize easily to the robust algorithm of (5)-(6); using the definition in (11),

    H = √v b (b^H v b)^{-1} b^H √v    (17)

where v is the diagonal matrix of robust weights with entries v_ii.

5. A Generalization of the Remote Reference Method

Consider the regression equation for the MT response (3), and assume that an auxiliary set of remote reference variables Q are available, where for a given frequency, Q is N×q with q ≥ 2. For standard remote reference MT data, Q would typically consist of horizontal magnetic field measurements collected some distance from the base site, and hence q = 2. For more than one remote reference site, or where more than the auxiliary horizontal magnetic field is available at a single reference site, the rank q of Q may be larger.
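With the symmetric square-root form used in (17), H retains the defining properties of a projection matrix (Hermitian, idempotent, trace p) even for non-uniform weights. A brief check on synthetic data (the uniform random weights simply stand in for robust weights v_ii):

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 128, 2
b = rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))
v = rng.uniform(0.1, 1.0, N)        # stand-in robust weights v_ii

# H = sqrt(v) b (b^H v b)^{-1} b^H sqrt(v), as in (17).
sv = np.sqrt(v)
H = (sv[:, None] * b) @ np.linalg.inv(b.conj().T @ (v[:, None] * b)) \
    @ (b.conj().T * sv[None, :])

print(np.allclose(H @ H, H))        # idempotent
print(np.allclose(H, H.conj().T))   # Hermitian
print(np.trace(H).real)             # trace equals p
```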
In this instance, the local magnetic field b may be regressed on Q in the usual way by solving

    b = Q t + ε    (18)

where t is a q×2 transfer tensor between the local horizontal magnetic field b and the remote reference variables in Q, and ε is an N×2 random error matrix. In practice, (18) may be solved as a series of univariate regressions on the columns of b, t and ε. By analogy to (12), the predicted local magnetic field is

    b̂ = Q (Q^H Q)^{-1} Q^H b    (19)

which is just a projection operation. Substituting b̂ for b in (3) and solving yields the generalized remote reference analog to (4)

    ẑ_q = (b̂^H b̂)^{-1} b̂^H e    (20)

When q = 2, so that Q is the standard N×2 matrix of remote reference magnetic field variables b_r, (20) can be simplified using standard matrix identities to yield

    ẑ_r = (b_r^H b)^{-1} b_r^H e    (21)

which is identical to the remote reference solution of Gamble et al. (1979), where the local auto- and cross-spectra are replaced by cross-spectra with the reference magnetic field. In fact, it is easy to show that b̂^H b̂ = b̂^H b, and hence (20) is not downward biased by uncorrelated noise in b̂ because such noise is not present. Equation (20) may be thought of as the remote reference solution (21) with the two-channel reference magnetic field b_r replaced by the projection (19). Just as the remote reference method is equivalent to an econometric technique for solving errors-in-variables problems called the method of instrumental variables (Wald 1940; Reiersol 1941, 1945; Geary 1949), the generalized remote reference method in (19)-(20) is identical to a related econometric technique called two-stage least squares (Mardia et al. 1979). Both methods arose to handle violations of the Gauss-Markov theorem where the predictor variables b in (3) are correlated with the residuals r, in which case the solution (4) is neither unbiased nor statistically consistent.

From (20), the hat matrix for the two-stage least squares solution is

    H_q = b̂ (b̂^H b̂)^{-1} b̂^H    (22)

which reduces to

    H_r = b_r (b_r^H b_r)^{-1} b_r^H    (23)

when a single remote reference station is used.
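The projection argument behind (18)-(20) is easy to demonstrate. In the sketch below (synthetic data; the response tensor and noise levels are arbitrary illustrative choices), uncorrelated noise added to the local magnetic field biases the least squares solution (4) downward, while the two-stage solution built from a remote set Q via (19)-(20) does not:

```python
import numpy as np

rng = np.random.default_rng(11)
N = 4096
z_true = np.array([1.0 + 0.5j, -2.0 + 1.0j])   # illustrative 2-channel response

b0 = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))  # true field
e = b0 @ z_true + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
b = b0 + (rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2)))  # noisy local b
Q = b0 + (rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2)))  # remote channels

z_ols = np.linalg.solve(b.conj().T @ b, b.conj().T @ e)        # (4): biased down

bhat = Q @ np.linalg.solve(Q.conj().T @ Q, Q.conj().T @ b)     # (19): projection
z_q = np.linalg.solve(bhat.conj().T @ bhat, bhat.conj().T @ e) # (20)

print(np.linalg.norm(z_ols - z_true))   # large downward bias
print(np.linalg.norm(z_q - z_true))     # small
```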
Equation (23) differs from the mixed local and reference field form H_R proposed in Chave and Thomson (1989; see their equation 27) and further studied by Ghosh (1990). H_R is not formally a projection matrix, and hence shares the hat matrix properties outlined in Section 3 only approximately. The more natural version (23) should be adopted in remote reference applications.

Chave and Thomson (1989) combined a robust estimator like that described in Section 3 and (21) to yield a robust remote reference solution to the MT problem

    ẑ*_r = (b_r^H v b)^{-1} (b_r^H v e)    (24)

where the weights v are computed as for (6) based on the ordinary residuals r from (3). This is a straightforward extension of M-estimation that simultaneously eliminates downward bias in (6) from uncorrelated noise on the magnetic field channels and problems from outliers, and has been shown to perform well under general circumstances (e.g., Jones et al. 1989; Chave and Jones 1997). However, it provides limited protection from overly influential values in either the local or reference magnetic fields. Equation (24) is easily generalized to more than 2 remote reference channels at a single station or to multiple reference sites by substituting b̂ for b_r to give ẑ*_q. This can be very useful when remote reference sites contain uncorrelated noise with different frequency characteristics, as is demonstrated in Section 10.

6. Bounded Influence MT Response Function Estimation

Two statistical concepts which are useful in understanding problems with least squares and M-estimators are the breakdown point and the influence function. Loosely speaking, the breakdown point is the smallest fraction of gross errors which can carry an estimate beyond all statistical bounds that can be placed on it. Ordinary least squares has a breakdown point of 1/N, as a single outlying data point can inflict unlimited change on the result. The influence function measures the effect of an additional observation on the estimate of a statistic given a large sample, and hence is the functional derivative of the statistic evaluated at an underlying distribution. Influence functions may be unbounded or bounded as the estimator is or is not sensitive to bad data. Further details may be found in Hampel et al. (1986, Section 1.3).

The procedures described in Sections 3 and 5 are capable in a semi-automatic fashion of eliminating bias due to uncorrelated noise in the local magnetic field and general contamination from data that produce outlying residuals.
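A minimal sketch of the weighted remote reference solution (24) follows, using Huber-style residual weights and a median-based Rayleigh scale estimate; the specific weight function, iteration count, and noise model are simplifications chosen for this example, not the full scheme of Section 3:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 2048
z_true = np.array([2.0 - 1.0j, 0.5 + 1.5j])

b = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
br = b + 0.3 * (rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2)))
e = b @ z_true + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
bad = rng.choice(N, 100, replace=False)
e[bad] += 100.0 * (rng.standard_normal(100) + 1j * rng.standard_normal(100))  # outliers

def rr(v):
    """Weighted remote reference solution in the spirit of (24)."""
    return np.linalg.solve(br.conj().T @ (v[:, None] * b), br.conj().T @ (v * e))

v = np.ones(N)
for _ in range(10):                       # Huber iterations on the residuals
    r = np.abs(e - b @ rr(v))
    d = np.median(r) / 1.1774             # Rayleigh-consistent scale from the median
    v = np.minimum(1.0, 1.5 * d / np.maximum(r, 1e-30))

print(np.linalg.norm(rr(np.ones(N)) - z_true))  # plain (21): pulled by outliers
print(np.linalg.norm(rr(v) - z_true))           # robust (24): stable
```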
However, they do require that the reference channels in Q and the local magnetic field b be reasonably clear of extreme values or leverage points caused by instrument malfunctions, impulsive events in the natural source fields, or other natural or artificial phenomena. When this assumption is invalid, any conventional robust M-estimator can be severely biased. As for least squares estimators, M-estimators possess a breakdown point of only 1/N, so that a single leverage point can completely dominate the ensuing estimate, and their influence functions are unbounded.

The discussion of Section 4 indicates that the hat matrix diagonal elements serve as indicators of leverage, and hence extensions of robust estimation that can detect both outliers based on the regression residuals and leverage points based on the hat matrix diagonal elements can be devised (e.g., Mallows 1975; Handschin et al. 1975; Krasker and Welsch 1982). When properly implemented, these algorithms have bounded influence functions, and are often termed bounded influence estimators as a result, although the term GM (for generalized M) estimator is also in common use. The bounded influence form of the normal equations analogous to (5) is

    b^H w ψ = 0    (25)

where w is a leverage weight. Two versions of (25) are in common use (Wilcox 1997). If w_ii = 1 − h_ii and ψ = ψ(r_i/d) as in Section 3, then the form is that originally suggested by Mallows (1975), in which residual and leverage weighting are decoupled and leverage points are gently downweighted according to the size of the hat matrix diagonal elements. If ψ = ψ(r_i/(w_ii d)), then the approach is that proposed by Handschin et al. (1975), and is more efficient than the Mallows solution since large leverage points corresponding to small residuals are not heavily penalized. However, Carroll and Welsh (1988) have shown under general conditions that the Handschin et al. approach can lead to a solution which is not statistically consistent, and only the Mallows approach will be used here.

In practice, the cited estimators have proven to be less than satisfactory with MT measurements because downweighting of high leverage data is mild and limited rather than aggressive; their breakdown point is only slightly larger than for conventional M-estimators. More capable, high breakdown point (up to 0.5) bounded influence estimators have been proposed (Rousseeuw 1984; Coakley and Hettmansperger 1993), but these typically entail a substantial increase in computational overhead which limits their applicability to the large data sets that occur in MT processing.

An alternate bounded influence estimator that combines high asymptotic efficiency for Gaussian data, high breakdown point performance with contaminated data, and computational simplicity that is suitable for large data sets was proposed by Chave and Thomson (1992, 2003) and is easily adapted to MT estimation. This is accomplished by replacing the weight matrix v in (6) or (24) with the product of two diagonal matrices u = v w, where v is the M-estimator weight matrix defined in Section 3 whose elements are based on the regression residuals and w is a leverage weight matrix whose elements depend on the size of the hat matrix diagonal. The bounded influence MT response obtained from (6) with u in place of v will be denoted by ẑ#.

Based on numerous trials with actual data, the robust weights in (6) or (24) are frequently seen either to increase the amount of leverage exerted by existing leverage points or to create entirely new ones, as will be demonstrated in Section 10. This occurs because M-estimators do not have bounded influence, and probably accounts for many instances where robust procedures do not perform as well as might be expected. Even with Mallows weights in (25), v and w can interact to produce an unstable solution to the robust regression problem unless care is used in applying the leverage weights. This is especially true when the leverage weights are re-computed at each iteration step, which often results in oscillation between two incorrect solutions, each of which is dominated by a few extreme leverage points. Instead, the leverage weights are initialized with unity on the main diagonal, and at the k-th iteration the i-th diagonal element is given by

    w_ii^[k] = w_ii^[k−1] exp(e^{−ξ²}) exp(−e^{ξ(y_i − ξ)})    (26)

where the statistic y_i is M h_ii^[k]/p, with

    h_ii^[k] = x_i (X^H u^[k−1] X)^{-1} x_i^H u_ii^[k−1]    (27)

and M is the trace of u^[k−1], which is initially the number of data points N. ξ is a free parameter which determines the point where leverage point rejection begins. If the usual statistical rule of thumb, where rows of the regression (3) corresponding to hat matrix diagonal entries larger than twice the expected value are to be regarded as leverage points, is adopted, then ξ = 2. A better albeit empirical choice of ξ is the N-th quantile of the beta distribution, or the value of its inverse corresponding to a chosen probability level. These choices will tend to automatically increase the allowed size of leverage points as N increases. To further ensure a stable solution, downweighting of leverage points is applied in stages, with ξ stepped from just below the largest normalized hat matrix diagonal element down to its final value.
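The leverage weight factor in (26) behaves like a hard but smooth rejection filter in the normalized statistic y_i: it is exactly unity at y = 0, rolls off near the cutoff ξ, and is effectively zero beyond it. A small sketch:

```python
import math

def w26(y, xi):
    """One update factor of the leverage weight in (26):
    ~1 for y << xi, ~0 once y exceeds xi."""
    return math.exp(math.exp(-xi**2)) * math.exp(-math.exp(xi * (y - xi)))

xi = 2.0                        # the rule-of-thumb cutoff from the text
print(w26(0.0, xi))             # exactly 1.0 at y = 0
print(w26(xi, xi))              # ~0.375 at the cutoff itself
print(w26(2 * xi, xi))          # ~1e-24: essentially rejected
```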
To summarize this section, bounded influence response function estimation for ẑ# without a remote reference begins with Fourier transform estimates of short data sections as described in Section 2. At each frequency, an initial least squares solution is obtained from (4) and used to compute the residuals r as well as the scale d from (8) and (9) using a Rayleigh residual model and the hat matrix (11). A nested iterative procedure is then applied in which the outer loop steps over ξ ranging from immediately below the largest hat matrix diagonal element to the selected minimum value ξ_o, and the inner loop successively solves (6) using the composite weight matrix u computed at each step, where v is given by the Huber weights (7) using the residuals and scale from the previous inner iteration and the elements of w are given by (26). The inner loop terminates when the weighted residual power r^H u r does not change below a threshold value, typically 1-2%, or the solution ẑ does not change below a threshold value, typically a fraction of a standard error. After all of the outer loop iterations, a second pass is made with an identical outer loop and an inner one with the scale fixed at the final Huber value and the more severe robust weight (10), again terminating when the weighted residual power does not change appreciably. Extension to the ordinary remote reference solution (24) or its generalization is straightforward, and will be denoted by ẑ#_r and ẑ#_q, respectively.

7. Two-stage Bounded Influence MT Response Function Estimation

Correlated noise of cultural origin in the local electric and magnetic fields is quite common, and hence extending processing methods to attack such problems is of considerable interest. Robust remote reference processing using (24) can be sensitive to correlated noise on the local electric and magnetic field channels even if the reference channels are noise-free. However, if a reference site which is not contaminated by the noise source is available, this can be remedied by a two step procedure where the local and reference magnetic fields are first regressed using the bounded influence form of (18)-(19) to get a noise-free b̂, and then the bounded influence form of (6) is solved after substituting b̂ for b. At each step, the absence of correlated noise in the predictor variables makes the noise which remains in the response variables appear like outliers, so that it can be eliminated by the bounded influence weighting procedure.
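The nested scheme summarized above can be compressed into a few lines. The sketch below is a simplified single-ξ version (fresh leverage weights each pass, a fixed Huber stage, no second descending-weight pass, and a median-based scale), so it illustrates the structure of the iteration rather than reproducing the full algorithm:

```python
import numpy as np

def huber(r, d, k=1.5):
    """Residual weights in the spirit of (7): unity below k*d, k*d/|r| above."""
    a = r / d
    return np.minimum(1.0, k / np.maximum(a, 1e-30))

def lev_w(y, xi):
    """Leverage weights of (26), recomputed fresh each pass for simplicity
    (the text instead accumulates them and steps xi downward in stages)."""
    y = np.minimum(y, 50.0 / xi)               # avoid overflow in the inner exp
    return np.exp(np.exp(-xi**2)) * np.exp(-np.exp(xi * (y - xi)))

def bi_solve(b, e, xi=2.0, n_iter=15):
    """Compressed sketch of the Section 6 bounded influence scheme."""
    N, p = b.shape
    z = np.linalg.solve(b.conj().T @ b, b.conj().T @ e)   # OLS start, as in (4)
    u = np.ones(N)
    for _ in range(n_iter):
        r = np.abs(e - b @ z)
        d = np.median(r) / 1.1774                         # Rayleigh scale estimate
        v = huber(r, d)
        h = np.einsum('ij,jk,ik->i', b,
                      np.linalg.inv(b.conj().T @ (u[:, None] * b)),
                      b.conj()).real * u                  # weighted diagonal, per (27)
        w = lev_w(u.sum() * h / p, xi)                    # y_i = M h_ii / p
        u = v * w
        z = np.linalg.solve(b.conj().T @ (u[:, None] * b), b.conj().T @ (u * e))
    return z

# Demonstration: clean data plus a handful of high-power rows with a wrong response.
rng = np.random.default_rng(2)
N, z_true, z_bad = 500, np.array([1.0 + 1.0j, -2.0j]), np.array([3.0, 1.0 + 1.0j])
b = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
e = b @ z_true + 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
b[:10] *= 20.0                      # leverage points...
e[:10] = b[:10] @ z_bad             # ...carrying a different response

z_ols = np.linalg.solve(b.conj().T @ b, b.conj().T @ e)
z_bi = bi_solve(b, e)
print(np.linalg.norm(z_ols - z_true), np.linalg.norm(z_bi - z_true))
```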
The first stage is the straightforward bounded influence solution of (18) using the procedures described in Sections 2 and 5. The bounded influence estimate of the projected magnetic field is

    b̂ = √u1 Q (Q^H u1 Q)^{-1} Q^H √u1 b    (28)

where u1 is an N×N diagonal matrix which contains the first stage bounded influence weights as defined in Section 6. The second stage in the solution is the bounded influence regression of e on b̂ to give

    ẑ## = (b̂^H u b̂)^{-1} (b̂^H u e)    (29)

where u = u1 u2, u2 is the second stage bounded influence weight matrix computed as in Section 6, and u1 remains fixed at the final values from the first stage solution.

8. Pre-screening of Data

The procedures described in Sections 3-7 work well unless a large fraction of the data are quite noisy, as can occur when the natural signal level is comparable to or below the instrument noise level. This frequently occurs in the so-called dead bands between 0.5-5 Hz for MT and 1000-10000 Hz for audiomagnetotellurics (AMT), where natural source electromagnetic spectra display relative minima. In this instance, robust or bounded influence estimators can introduce bias into the response tensor beyond that which would ensue from the use of ordinary least squares. This occurs because most of the data are noisy, and the unusual values detected and removed by data-adaptive weighting are actually those rare ones containing useful information.

The solution to this difficulty is the addition of a pre-screening step, applied prior to a robust or bounded influence estimator, which selects only those data that have an adequate signal-to-noise ratio. Three approaches have been suggested, and others will undoubtedly evolve as future situations require. Egbert and Livelybrooks (1996) applied a coherence screening criterion to single station MT dead band data to select only those data segments with high electric to magnetic field multiple coherence for subsequent robust processing. Garcia and Jones (2002) pre-sorted AMT data based on the power in the magnetic field channels, selecting only those values which exceed the known instrument noise level by a specified amount. Jones and Spratt (2002) pre-selected data segments whose vertical field power was below a threshold value to minimize auroral source field bias in high latitude MT data. All of these studies demonstrated substantially better performance for data-adaptive weighting schemes after pre-sorting.
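A coherence pre-screen of the kind used by Egbert and Livelybrooks (1996) reduces, in essence, to a per-segment threshold test. The sketch below (synthetic segments and a single-channel coherence, both simplifications for brevity) keeps only the segments whose electric-to-magnetic coherence exceeds 0.9:

```python
import numpy as np

rng = np.random.default_rng(9)
nseg, nfreq = 40, 64
z = 2.0 - 1.0j
segs, good = [], []
for i in range(nseg):
    b = rng.standard_normal(nfreq) + 1j * rng.standard_normal(nfreq)
    if i % 2 == 0:       # even segments carry real signal plus mild noise
        e = z * b + 0.2 * (rng.standard_normal(nfreq) + 1j * rng.standard_normal(nfreq))
        good.append(i)
    else:                # odd segments are instrument noise only
        e = rng.standard_normal(nfreq) + 1j * rng.standard_normal(nfreq)
    segs.append((b, e))

def coh2(b, e):
    """Squared coherence between one predictor and one response channel."""
    return abs(np.vdot(b, e))**2 / ((np.abs(b)**2).sum() * (np.abs(e)**2).sum())

kept = [i for i, (b, e) in enumerate(segs) if coh2(b, e) > 0.9]
print(kept == good)   # True: the threshold keeps exactly the signal-bearing segments
```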
Similar pre-processing stages can be applied to data with single or multiple reference sites, although the number of data sets, and hence the number of candidate selection criteria, are increased.

9. Statistical Verification

Quantitative tests for the presence of outliers and leverage points do not presently exist for realistic situations where anomalous data occur with multiplicity and intercorrelation. In addition, the difficulty of devising such tests rises rapidly with the number of available data and when the actual distribution of the outliers and leverage points is unknown. This means that a priori examination of a data set for bad values is not feasible, and hence it is necessary to construct a posteriori tests to validate the result of a bounded influence analysis. Further, due to the inherently nonlinear form of bounded influence regression solutions like (24) and (29), and because they implicitly involve the elimination of a significant fraction of the original data set, it is prudent to devise methods for assessing the statistical veracity of the result. These can take many forms, as described in Belsley et al. (1980) and Chatterjee and Hadi (1988). In an MT context, the most useful have proven to be the multiple coherence, suitably modified to allow for remote reference variables and adaptive weights, and quantile-quantile plotting of the regression residuals and hat matrix diagonal elements against their respective target distributions for Gaussian data.

In classical spectral analysis, the multiple coherence is defined as the proportion of the power in the response variable at a given frequency which is attributable to a linear regression with all of the predictor variables (e.g., Koopmans 1995, p. 155). In MT, this is just the coherence between the observed electric field e and its prediction ê given by (11). Care must be taken to incorporate the bounded influence weights for the first and second stages, as appropriate. Let the weighted cross power between vector variables x and y be defined as

    S_xy = x^H u y    (30)

For two-stage processing, either x or y corresponds to b̂ and hence will also implicitly contain the corresponding first stage weights. Using this notation, the generalized multiple squared coherence between the observed electric field e and its predicted value ê is

    γ²_eê = [S_eb (S_b̂b̂)^{-1} S_b̂e]² / [S_ee S_b̂e^H (S_b̂b̂^H)^{-1} S_bb (S_b̂b̂)^{-1} S_b̂e]    (31)

where b̂ is replaced by b_r for single site remote reference processing.
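In the unweighted, single-stage limit (u equal to the identity and b̂ → b), (31) collapses to the standard multiple squared coherence, which can be computed directly. The sketch below verifies that it is unity for a noise-free linear relation and bounded by [0, 1] otherwise:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1024
z = np.array([1.0 + 2.0j, -0.5j])
b = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
noise = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def multi_coh2(b, e):
    """Standard multiple squared coherence: the unweighted b-hat -> b limit of (31)."""
    Seb = e.conj() @ b                   # 1 x p cross powers, u = identity here
    Sbb = b.conj().T @ b
    See = np.vdot(e, e).real
    return (Seb @ np.linalg.solve(Sbb, Seb.conj())).real / See

e_clean = b @ z
e_noisy = e_clean + 1.0 * noise
print(multi_coh2(b, e_clean))            # ~1 for a noise-free linear relation
g = multi_coh2(b, e_noisy)
print(0.0 < g < 1.0)                     # True: degraded but bounded by [0, 1]
```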
In either instance, (31) is a complex quantity. Its amplitude is analogous to the standard multiple squared coherence, and its phase is a measure of the similarity of the local and remote reference variables. In most instances, the phase should be very close to zero; a significant nonzero phase is indicative of source fields with spatial scales that are significant compared to the separation between the local and reference magnetic field sites. When b̂ is replaced by the local magnetic field b, (31) reduces to the standard multiple squared coherence, and implicitly has zero phase.

A second and essential tool for analyzing the results of a bounded influence MT analysis, either with or without two-stage processing, is comparison of the original and final quantile-quantile plots of the residuals and the hat matrix diagonal. Quantile-quantile or q-q plots compare the statistical distribution of an observed quantity with the corresponding theoretical entity, and hence provide a qualitative means for assessing the statistical outcome of robust processing. The N quantiles of a distribution divide the area under a pdf into N+1 equal area pieces, and hence define equal probability intervals. They are easily obtained by solving

    F(q_j) = (j − 1/2)/N    (32)

for q_j, where j = 1,...,N, and F(x) is the appropriate cdf. A q-q plot compares the quantiles with the order statistics obtained by ranking and sorting the data, along with a suitable scaling to make them data unit independent. The advantage of q-q plots over some alternatives is that they emphasize the distribution tails; most of a q-q plot covers only the last few percent of the distribution range.

For residual q-q plots, it is most useful to compare the quantiles of the Rayleigh distribution with the order statistics of the residuals scaled so their second moment is 2, corresponding to that expected of a Rayleigh variate. For hat matrix diagonal q-q plots, the quantiles of the β(p, N−p) distribution may be compared to the order statistics, with both quantities scaled by p/N so a value of 1 corresponds to the expected value.

The use of data-adaptive weighting which eliminates a fraction of the data requires that the quantiles be obtained from the truncated form of the original target distribution, or else the result will inevitably appear to be short-tailed. The truncated distribution is easily obtained from the original one.
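A Rayleigh q-q comparison following (32) takes only a few lines. The sketch below simulates residual amplitudes with second moment 2 (so sigma = 1) and shows the near-unit slope expected when the data match the target distribution:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 1000
# Rayleigh variates with second moment 2 (sigma = 1): X = sqrt(-2 ln U).
x = np.sort(np.sqrt(-2.0 * np.log(rng.uniform(size=N))))

# Quantiles from (32) with the Rayleigh cdf F(x) = 1 - exp(-x^2 / 2).
pp = (np.arange(1, N + 1) - 0.5) / N
q = np.sqrt(-2.0 * np.log(1.0 - pp))

# A straight, unit-slope q-q plot indicates agreement with the target distribution.
slope = np.polyfit(q, x, 1)[0]
print(float(slope))                        # close to 1
print(float(np.corrcoef(q, x)[0, 1]))      # close to 1
```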
Suppose data are censored in the process of robust or bounded influence weighting using an estimator such as those described in Sections 3, 5, 6 or 7. Let f_X(x) be the pdf of a random variable X prior to censoring; this may be the Rayleigh distribution for the residuals or the beta distribution for the hat matrix diagonal. After truncation, the pdf of the censored random variable X̃ is

    f_X̃(x) = f_X(x) / [F_X(b) − F_X(a)]    (33)

where a ≤ x ≤ b and F_X(x) is the cdf for f_X(x). Let N be the original and M be the final number of data, so that M = N − m1 − m2, where m1 and m2 are the number of data censored from below and above, respectively. Suitable choices for a and b are the m1-th and (N−m2)-th quantiles of the original distribution. The M quantiles of the truncated distribution can then be computed from that of the original one using

    F_X(q̃_j) = [F_X(b) − F_X(a)] (j − 1/2)/M + F_X(a)    (34)

where j = 1,...,M.

Whether the data have been censored or not, a straight line q-q plot indicates that the residuals or hat matrix diagonal elements are drawn from the target distribution. Data which are inconsistent with the target distribution are suggested by departures from linearity, which are usually manifest as sharp upward shifts in the order statistic. With MT data, this is often extreme, and is diagnostic of long tailed behavior where a small fraction of the data occur at improbable distances from the distribution center and hence will exert undue influence on the regression solution. The residual q-q plots from the output of a robust or bounded influence estimator should be approximately linear or slightly short tailed to be consistent with the optimality requirements of the Gauss-Markov theorem. However, the theorem does not prescribe a distribution for the predictor variables, and hence there is not a requirement for the hat matrix diagonal distribution to be β(p, N−p) unless the predictors are actually Gaussian. Nevertheless, hat matrix diagonal q-q plots are useful in detecting extreme values, as will be shown in the next section.

Distributions and confidence limit estimates for the quantiles are given by David (1981). The qualitative use of q-q plots can be quantified by testing the significance of a straight line fit utilizing these results. Alternately, nonparametric tests of the Kolmogorov-Smirnov type for the goodness-of-fit to a target distribution can be applied.
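Equations (33)-(34) are simple to apply when the target cdf has a closed-form inverse, as in the Rayleigh case. A sketch with illustrative censoring counts:

```python
import numpy as np

# Rayleigh target: F(x) = 1 - exp(-x^2 / 2), with closed-form inverse.
F = lambda x: 1.0 - np.exp(-0.5 * x**2)
Finv = lambda p: np.sqrt(-2.0 * np.log(1.0 - p))

N, m1, m2 = 1000, 50, 100              # censoring counts (illustrative choices)
M = N - m1 - m2
a, b = Finv((m1 - 0.5) / N), Finv((N - m2 - 0.5) / N)   # m1-th / (N-m2)-th quantiles

# Equation (34): quantiles of the truncated distribution via the original cdf.
j = np.arange(1, M + 1)
p34 = (F(b) - F(a)) * (j - 0.5) / M + F(a)
q_trunc = Finv(p34)

print(bool(a < q_trunc[0] and q_trunc[-1] < b))   # True: inside the censoring window
print(bool(np.all(np.diff(q_trunc) > 0)))         # True: strictly increasing
```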
10. Examples

The first example data set is taken from site 006 of a 150-km-long, east-west wideband MT Lithoprobe (BC87) transect in British Columbia obtained in 1987 and described by Jones et al. (1988) and Jones (1993). These data have been used extensively in studies of distortion and its removal; see Chave and Jones (1997) for a summary. The data were sampled at a 12 Hz rate for about 17 h and include a remote reference located 1 km from the local site. Each data series was processed into Fourier transform estimates at selected frequencies using variable section lengths (see Section 3) after prewhitening with a five term robust AR filter and tapering with a Slepian sequence with time-bandwidth product of 1. Figure 2 compares the xy (where x is north and y is east) apparent resistivity and phase estimates ẑ*_r and ẑ#_r obtained using the ordinary robust (hereafter OR) and bounded influence (hereafter BI) remote reference estimators, respectively, as described in Sections 5 and 6. The BI result was computed with the cutoff parameter ξ in (26) taken as the 0.99999 (or upper 0.001%) point of the beta distribution with (2, N−2) degrees of freedom. In general, the OR and BI results are similar except in the band between 2-20 s, where the OR estimates are biased relative to the BI ones and the OR jackknife confidence limits are sometimes dramatically larger, reflecting the effect of leverage. There are also more subtle but statistically significant differences between the two types of estimates at periods shorter than 2 s, as shown below. The differences seen in Figure 2 cannot be removed by coherence thresholding; pre-processing of the data to eliminate all data sections with electric to magnetic field squared multiple coherence under 0.9 has no effect beyond a slight increase in the confidence limits due to a decrease in the effective degrees of freedom.

Figure 3 shows an example of the severe bias that can ensue from prewhitening with a least squares AR filter rather than the robust AR filter described in Section 2. The data and BI methodology are those used for Figure 2, and the only difference between the two results is in the prewhitening. Note the erratic results at periods under 2 s (where the data power spectral density is weakest) obtained with the least squares prewhitening filter.

Figure 4 shows a complex plane view of various response estimates for these data at a period of 5.3 s. This period is in the middle of the band where substantial differences between the confidence limits from the OR and BI estimators obtain. The ordinary least squares (hereafter OLS) estimate given by (4) is compared to the OR and BI results; the latter are shown with the cutoff
parameter ξ in (26) taken as the 0.999, 0.9999, and 0.99999 (upper 0.1%, 0.01%, and 0.001%, respectively) points of the beta distribution for (2, N−2) degrees of freedom, where N = 4342 is the number of estimates. There is no statistically significant difference between the estimates with substantially different values of the BI cutoff parameter, indicating that its selection is not critical. In Figure 4, both the OLS and OR estimates display large uncertainties, reflecting heteroscedasticity that is not removed by data weighting based entirely on the size of the regression residuals, despite the elimination of 7.2% of the data. The BI estimator removes an additional 6.8%, 8.5%, and 11.5% of the data as the cutoff varies between 0.99999 and 0.999, indicating that only a small fraction of the data is responsible for the heteroscedasticity. Further, the standard error is reduced by more than a factor of 34 over the OR result from bounding the influence.

Figure 5 shows analogous complex plane responses from the region where more subtle response function differences are observed, at a period of 0.89 s. There are large distinctions between both the OLS and OR and the OR and all of the BI estimates, although the relative standard error is less variable than for Figure 4. In this instance, the M-estimator solution displays a substantially smaller standard error than both the OLS and BI values. It is also markedly biased compared to the BI result; the M-estimator solution differs from the BI one by more than 6 standard errors of the latter.

Figures 6 and 7 compare residual and hat matrix quantile-quantile plots for the OLS and 0.99999 BI estimates at 5.3 s shown in Figure 4. The OLS q-q plots in Figure 6 are both markedly long tailed, indicating the presence of severe outliers and leverage points, and in fact the hat matrix distribution appears to be more like a long-tailed version of log beta than long-tailed beta. The most serious outliers are about 50 standard deviations from the Rayleigh mean, while the most serious leverage points are over 1000 times the expected value of the hat matrix diagonal. In contrast, the BI residual q-q plot in Figure 7 is only weakly long-tailed, and the upward concavity of the result is indicative of a residual distribution which is slightly but pervasively longer tailed than Rayleigh, rather than the presence of significant outliers. The OR residual q-q plot is indistinguishable from that in Figure 7, while the OR hat matrix q-q plot is very similar to that in Figure 6, indicating that robust weighting has little influence on leverage points for this data set. It is failure to eliminate leverage effects which accounts for the large confidence limits on the OR estimate in Figures 2 and 4. The BI hat matrix q-q plot is approximately log beta with some smaller than expected values at the lower end, but the huge leverage points at the upper end of the distribution
have been eliminated. It is easy to modify the bounded influence estimator to eliminate unusual data at both the lower and upper ends of the distribution, but the ensuing estimates are not changed from those shown in Figure 4.

Figure 8 compares OLS and OR q-q plots for the hat matrix diagonal corresponding to the 0.89 s estimates shown in Figure 5. Comparison of the two shows that robust weighting has actually increased the size of the most extreme leverage points, possibly accounting for the difference between the OR and BI results in Figure 5. This phenomenon is typical of OR procedures rather than unusual, and is not surprising given the form of (24), which is independent of any leverage measure. In fact, all of the short period responses in Figure 2, where the OR and BI estimates show large statistical differences, display this effect, indicating serious bias in the absence of leverage control.

The second example is taken from the Southeast Appalachians (SEA) transect extending from South Carolina to Tennessee collected in 1994 (Wannamaker et al. 1996). The data sets used here were part of the long period MT effort described by Ogawa et al. (1996), and consist of 5 s sampled, 5 component MT data collected contemporaneously at many sites. The local site will be taken as SEA335 (36°12'17"N, 81°55'20"W), which is located within the Blue Ridge Mountain belt, and the remote sites will be taken as SEA320 (36°38'52"N, 82°19'05"W) and SEA360 (34°53'08"N, 80°16'30"W), located 61 and 209 km to the northwest and southeast of the local site, respectively. The data have been rotated into an interpretation coordinate system, so that the x-direction strikes N50°E (or approximately along the strike direction for the Appalachian Mountains) and the y-direction strikes N140°E. All of the data are fairly clean and free of obvious problems, although the local site does display strong current channeling such that the Zxy apparent resistivity is about 1/100 the Zyx apparent resistivity. In what follows, all spectral estimates were computed using a time-bandwidth 3 Slepian sequence after prewhitening with a five term robust AR filter, and the fourth and sixth frequencies in each subsection were selected for processing to avoid contamination from unresolved low frequency components.

In fact, these data are sufficiently clean that nearly identical processing results are obtained with most standard methods, yielding a baseline result for comparison with various artificially contaminated versions of the data. As a demonstration, Figure 9 compares the yx component of the ẑ#_r apparent resistivities and phases using sites 320 and 360 separately as remotes. The results
for different choices of remote references in Figure 9 are indistinguishable. The estimate computed with the two-stage formalism described in Sections 5 and 7 using sites 320 and 360 simultaneously as remotes is essentially identical, reflecting the high quality of the data and the low environmental noise level. Residual and hat matrix q-q plots show the presence of only weak outliers. In fact, the OR estimates are nearly identical to the BI estimates shown in Figure 9, so that leverage effects in these data are minimal. The generalized coherence (31) is very high across the band, while its phase remains near zero, reflecting the lack of source field complications at this mid-latitude site. Because of these data features, little has been gained by using multiple reference sites. Comparable results are obtained for the xy component (not shown), although the longest period estimates have large uncertainties due to the very weak electric field in the x direction. The response using site 320 as a remote will be used as a reference response in the sequel.

As a simulation of cultural effects, noise was added to both horizontal components of the site 360 magnetic field data. This consists of random samples drawn from a Cauchy distribution which have been filtered forward and backward with a fifth order Chebyshev Type 1 bandpass filter having cutoff points at about 200 and 1000 s. The MAD of the Cauchy noise was adjusted to be about 1/3 that of the data. The Cauchy distribution (Johnson et al. 1994, Chapter 16) is equivalent to Student's t distribution with one degree of freedom and is substantially longer tailed than the Gaussian. As a result, the additive noise appears to be quite impulsive, with peak values that exceed those of the original magnetic data by up to a factor of 10,000, and in fact the noise level is high enough to totally obscure the original data in a time series plot.

Figure 10 compares the Zyx BI response using site 320 as a reference as in Figure 9 with the OR estimates using site 360 (with noise) and sites 320+360 (with noise) as remote references, respectively. In the 200-1000 s band, both the apparent resistivities and phases for the site 360 remote estimates are totally useless, being wildly biased by the high noise level in the reference data. However, the dual remote reference method easily eliminates this effect, and is nearly identical to the result using clean remote reference data. This is a graphic illustration of the value of multiple remote references; good results will be obtained provided that at least one data set is uncontaminated at a given frequency.
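The contamination recipe just described can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function and variable names are our own, and the sampling interval (5 s), filter order, band edges (200-1000 s), and MAD ratio (1/3) are taken from the text; the 0.5 dB passband ripple is an assumed value not stated in the paper.

```python
import numpy as np
from scipy import signal
from scipy.stats import median_abs_deviation

def cauchy_bandpass_noise(data, dt=5.0, periods=(200.0, 1000.0),
                          mad_ratio=1.0 / 3.0, seed=0):
    """Simulate impulsive cultural noise: Cauchy samples filtered forward
    and backward (zero phase) with a fifth order Chebyshev Type 1 bandpass
    filter, then scaled so the noise MAD is mad_ratio times the data MAD."""
    rng = np.random.default_rng(seed)
    raw = rng.standard_cauchy(data.size)
    nyq = 0.5 / dt
    # band edges expressed as fractions of the Nyquist frequency
    wn = [1.0 / periods[1] / nyq, 1.0 / periods[0] / nyq]
    b, a = signal.cheby1(5, 0.5, wn, btype="bandpass")  # 0.5 dB ripple assumed
    noise = signal.filtfilt(b, a, raw)  # forward-backward filtering
    scale = mad_ratio * median_abs_deviation(data) / median_abs_deviation(noise)
    return scale * noise

# usage: contaminate a synthetic magnetic field channel
bx = np.random.default_rng(1).normal(size=4096)
bx_noisy = bx + cauchy_bandpass_noise(bx)
```

Filtering the Cauchy samples forward and backward (scipy's `filtfilt`) avoids introducing a phase shift, and scaling by the MAD rather than the variance is essential because the Cauchy distribution has no finite variance.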
Figure 11 shows the BI counterpart to Figure 10A, indicating that control of leverage has substantially reduced both the estimate bias and the size of the confidence limits even with only noisy site 360 as a remote. As for the OR approach, the dual remote estimate (not shown) is identical to that from uncontaminated data. While the multiple remote solution is clearly the preferred solution, bounding the influence of the remote data set can substantially improve the estimator performance and yield an interpretable response when the reference is noisy. However, the examples in Figures 10 and 11 suggest that multiple rather than single remote references do offer real advantages when one of the remote sites is noisy either over part of the period range of interest or part of the time interval. The two-stage regression approach which is implicit to the dual remote estimator automatically eliminates noisy frequency or time intervals, presuming that at least one clean remote is available.

Figure 12 shows the magnitude of the txx component of the BI magnetic transfer tensor relating the x-components of the local and remote magnetic fields from the first stage in this process for both noise-free and noisy site 360 data corresponding to Figures 9 and 11, respectively. With the original data, the site 360 transfer function is systematically smaller than its site 320 counterpart by up to a factor of 2 due to some combination of weaker coherence with or larger magnetic field variations at site 320. With the contaminated site 360 data, the site 360 transfer function essentially vanishes in the 200-1000 s band.

As a simulation of correlated cultural contamination, bandpassed Cauchy-distributed noise, filtered in the same way as for Figures 10-12, polarized 30° clockwise from the x-axis, and with a MAD comparable to the original data, was added to the local magnetic field variables at site 335. The noncausal impulse responses from the Zyx MT impedances for the uncontaminated data were computed by minimizing a functional of the impulse response subject to fitting the MT impedance; a maximum smoothness constraint in log frequency was imposed as described by Egbert (1992). Simulated electric field noise data were produced by convolving the impulse response with the noise time series. The size of the noise electric field was also varied by amplitude scaling. The resulting noise data were added to the site 335 data, varying the time duration of the noise from 0 to over 50% of the total. Unlike for the earlier examples, the additive noise is now correlated between the predictor and response variables, and will result in rising bias in the response functions as the noise duration increases unless it is eliminated during processing. Figure 13 compares BI processing on the original (clean) data with BI and two-stage processing on contaminated data with a noise fraction of 40% and the noise electric field scaled so that its MAD is about 80 times larger than the background. The BI result is erratic over the affected period band since the estimator has difficulty distinguishing good from bad data. Leverage weighting is of limited use because the hat matrix is effectively that from the remote reference data (see Section 5) at site 320, which are free of the noise, and the correlated noise on the local electric and magnetic field variables makes it difficult to distinguish data from noise as the noise fraction and amplitude rises. In contrast, the two-stage estimate remains stable with only modest increases in the confidence limits because the first stage robust weights distinguish and remove the noisy sections from the local magnetic field, and then the second stage robust weights eliminate any remaining noise in the electric field because the local magnetic field has been cleaned.

Figure 14 compares residual q-q plots for the BI and two-stage estimators at 853 s period, graphically illustrating the effectiveness of the latter. For both the BI and two-stage estimates, about 60% of the data have been eliminated in the contaminated band by weighting; the differences in Figure 13 reflect how completely the noise-contaminated sections have been removed. The two-stage algorithm operates effectively both with high noise amplitude and with a noise fraction approaching 50%.

11. Discussion

Based on the recent MT literature, it would not be controversial to suggest that all MT data processing should employ some type of robust estimator with a remote reference. The examples presented by Jones et al. (1989) provide graphic evidence of the value of a robust approach, and the state-of-the-art has certainly improved over the past decade to further strengthen the arguments. The authors are not aware of any instances where robust methods fail when applied intelligently rather than blindly. This means that individual attention must be paid to each data set: all of the time series and their power spectra need to be inspected, and the relevant physics of the induction process must be incorporated into the analysis plan (e.g., the nature of the source fields and any noise components must be understood in at least a gross sense). These steps lead to decisions about data editing, detrending, and the pre-processing stage (Section 8). A preliminary non-robust analysis is often useful because it allows the raw data coherences and residual/hat matrix statistics to be assessed (Section 9), and helps to further refine the requirement for pre-processing. It can also indicate whether a standard robust estimator will be sufficient.
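The hat matrix diagnostic used throughout these examples is easy to compute directly. The sketch below is illustrative only (the names and the QR route are our own choices): it forms the diagonal of the hat matrix H = B (B^H B)^(-1) B^H for a complex predictor matrix B, whose expected diagonal value is p/n, so that entries far above p/n flag leverage points of the kind discussed above.

```python
import numpy as np

def hat_diagonal(B):
    """Diagonal of the hat matrix H = B (B^H B)^{-1} B^H for an n x p
    complex predictor matrix B (e.g., two horizontal magnetic field
    Fourier coefficients per data section)."""
    # A thin QR factorization is numerically safer than forming
    # (B^H B)^{-1} explicitly; h_i is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(B)
    return np.sum(np.abs(Q) ** 2, axis=1)

# usage: flag leverage points well above the expected value p/n
rng = np.random.default_rng(0)
n, p = 500, 2
B = rng.normal(size=(n, p)) + 1j * rng.normal(size=(n, p))
B[3] *= 50.0  # plant one extreme leverage point
h = hat_diagonal(B)
leverage_flags = h > 10 * p / n  # flags row 3, the planted point
```

By construction the diagonal sums to p and each entry lies in [0, 1], so a q-q plot of h against its nominal beta (or, for magnetic data, log beta) distribution provides exactly the leverage diagnostic described in Section 9.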
Based on the results presented in this paper and the authors' experience over the past decade, it is suggested that standard use of a bounded influence remote reference estimator should supplant that of its robust counterpart. If the enumerated analysis principles are followed, the authors have not seen any examples where bounded influence estimators fail to yield results that are, at a minimum, comparable to robust estimates, and have seen numerous examples where bounded influence results are substantially better. This is especially true as the auroral oval is approached, where substorm effects can produce spectacular leverage (see Garcia et al. 1997, and the BC87 examples in Section 10), and in the presence of many types of cultural noise. Robust estimators which weight data sections on the basis of regression residuals frequently fail to detect leverage points, and can easily be dominated by a small fraction of high leverage data or even a single data point. Bounded influence estimators avoid this trap in a semi-automatic fashion. The bounded influence estimator of Section 6 has a breakdown point of ~0.5, and can handle data sets with an even larger fraction of high leverage values under some circumstances.

It is not difficult to rationalize the underlying log beta form for the non-BI hat matrix diagonal distribution seen in Figures 6 and 8. Ionospheric and magnetospheric processes are extremely non-linear, and more so at auroral than lower latitudes. As a result, their electromagnetic effects are the result of many multiplicative steps. This means that the statistical distribution of the magnetic variations will tend towards log normal rather than normal (Lanzerotti et al. 1991), and hence the resulting hat matrix diagonal will tend towards log beta rather than beta.

There are situations where correlated noise is present on both the local electric and magnetic fields where more elaborate processing algorithms may be required. The two-stage algorithm of Section 7 works well in many instances, as shown with a synthetic example in Section 10. Larsen et al. (1996) introduced a different two-stage procedure to deal with the correlated noise from DC electric trains that explicitly separates the MT and noise responses. In both instances, a remote magnetic reference which is free of the correlated noise source is required. In the Larsen et al. algorithm, a tensor relating the local and remote magnetic fields equivalent to t in (18) is first computed to give a correlated-noise-free b̂, so that the correlated noise in b is given by b − b̂. Both the magnetotelluric response z and the correlated noise tensor ζ are obtained from

e = b̂ z + (b − b̂) ζ   (35)

which explicitly separates e into magnetotelluric signal and correlated noise parts, plus an uncorrelated residual. Equation (35) is easily solved using the methods of this paper by recasting the problem so that b̂ and b − b̂ comprise a four-column b in (3), and z in (3) is replaced by a four-parameter solution containing z and ζ. However, uncorrelated noise in b is implicitly combined with the correlated noise b − b̂, and hence the estimates of z and ζ will always be biased to some degree. This bias will rise as the relative size of the correlated noise to signal increases, although this is hard to quantify. The two-stage estimate is not biased because the correlated noise tensor ζ is never explicitly computed. A third alternative is the robust principal components method of Egbert (1997), which can detect contamination from correlated noise and yield good MT responses in some cases. It is presently unclear which approach will yield the best results with real data; evaluating this will require direct comparisons using troublesome data sets.

Using data from a continuously operating MT array in central California, Eisel and Egbert (2001) compared the performance of parametric and jackknife estimators for the confidence limits on the MT response function. Thei
metric theory for
weight jackknife,
computationally m
their favoring the
data from locatio
than Gaussian eve
sets. In this instan
limits, which cou
cally long tailed f
from mid-latitude
uncertainty, as we
Chave (1991), the
essence, this favo
rifice in reliability

Acknowledgements

This work was