
Bounded Influence Magnetotelluric Response Function Estimation

Alan D. Chave
Deep Submergence Laboratory, Department of Applied Ocean Physics and Engineering
Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA

David J. Thomson
Department of Mathematics and Statistics, Queen's University
Kingston, Ontario K7L 3N6, Canada

Abstract

Robust magnetotelluric response function estimators are now in standard use by the electromagnetic induction community. Properly devised and applied, these have the ability to reduce the influence of unusual data (outliers) in the response (electric field) variables, but are often not sensitive to exceptional predictor (magnetic field) data, which are often termed leverage points. A bounded influence estimator is described which simultaneously limits the influence of both outliers and leverage points, and has proven to consistently yield more reliable magnetotelluric response function estimates than conventional robust approaches. The bounded influence estimator combines a standard robust M-estimator with leverage weighting based on the statistics of the hat matrix diagonal, which is a standard statistical measure of unusual predictors. Further extensions to magnetotelluric data analysis are proposed, including a generalization of the remote reference method which utilizes multiple sites instead of a single one and a two-stage bounded influence estimator which effectively removes correlated noise in the local electric and magnetic field variables using one or more uncontaminated remote references. These developments are illustrated using a variety of magnetotelluric data.

1. Introduction

The magnetotelluric (MT) method utilizes naturally-occurring fluctuations of electromagnetic fields observed at Earth's surface to map variations in subsurface electrical conductivity. The fundamental datum in MT is the site-specific, frequency-dependent tensor relationship between the measured electric and magnetic fields. Under general conditions on the morphology of the external source fields (e.g., Dmitriev and Berdichevsky 1979), in the absence of noise, and with precise data, this may be written

    E = ZB    (1)

where E and B are two-vectors of the horizontal electric and magnetic field components at a specific site and frequency, and Z is the second rank, 2x2 MT response tensor connecting them. The solution to (1) at a given frequency is

    Z = (EB^H)(BB^H)^-1    (2)

where the superscript H denotes the Hermitian (complex conjugate) transpose and the terms in parentheses are the exact cross- and auto-power spectra. Similar linear relationships can be derived between the vertical and horizontal magnetic field variations. Since the methodology is nearly identical, this paper will focus only on the MT case.

When E and B are actual measurements, (1) and (2) do not hold exactly due to the presence of noise, and it becomes necessary to estimate both Z and its uncertainty δZ in a statistical manner. Conventional MT data processing has historically been based on classical least squares regression approaches and Gaussian statistical models (e.g., Sims et al. 1971) which are well known to be sensitive to small amounts of unusual data, uncorrelated noise in the magnetic field variables, and inadequacies of the model description itself. This often leads to estimates of Z which are heavily biased and/or physically uninterpretable, and can affect estimates of δZ even more seriously.

In recent years, this situation has been dramatically improved by three developments. First, the remote reference method (Gamble et al. 1979), in which the horizontal magnetic fields recorded simultaneously at an auxiliary site are combined with the fields at the local site, has proven quite effective at eliminating bias from uncorrelated local magnetic field noise. Second, a variety of data-adaptive robust weighting schemes, sometimes applied in conjunction with remote referencing, have been developed to eliminate the influence of data corresponding to large residuals, which are usually called outliers (Egbert and Booker 1986; Chave et al. 1987; Chave and Thomson

1989; Larsen 1989; Sutarno and Vozoff 1989, 1991; Larsen et al. 1996; Egbert and Livelybrooks 1996; Egbert 1997; Oettinger et al. 2001; Smirnov 2003). The superior performance of robust methods was demonstrated in the comparison study of Jones et al. (1989), and these are now in general use by the MT community. Third, it is now recognized that conventional distribution-based estimates of the errors in the MT response tensor elements are often biased (e.g., Chave and Jones 1997), and a nonparametric jackknife method has been introduced (Chave and Thomson 1989) and rigorously justified (Thomson and Chave 1991) to eliminate this problem.

However, there remain unusual data sets, typically due either to natural source field complexity or man-made noise, where even robust remote reference methods sometimes fail to yield physically-interpretable MT responses. For example, Schultz et al. (1993) and Garcia et al. (1997) noted the serious bias that could be produced by short spatial scale auroral substorm source fields. MT tensor bias from cultural noise, especially that from DC electric rail systems, has been widely documented (Szarka 1988; Junge 1996; Larsen et al. 1996; Egbert et al. 2000). Some types of mid-latitude sources (notably those from Pc3 geomagnetic pulsations) have short and temporally-variable spatial scales (Andersen et al. 1976; Lanzerotti et al. 1981) which can also alter MT responses (Egbert et al. 2000). For such situations, extensions to robust processing have been proposed. Schultz et al. (1993) and Garcia et al. (1997) applied a bounded influence estimator introduced by Chave and Thomson (1992) and further developed by Chave and Thomson (2003) and in this paper to minimize the influence of extreme magnetic field data which do not produce large residuals, and hence may be missed by robust estimators. Larsen et al. (1996) and Oettinger et al. (2001) proposed multiple transfer function methods to remove cultural noise using distant, uncontaminated sites as references. Both methods are special cases of two-stage processing that is introduced in this paper. Egbert (1997) introduced a robust multivariate principal components approach to detect and characterize coherent cultural noise. This enables robust estimators to be tailored to the specific coherent noise characteristics derived from multiple station data when these are available.

In a further attempt to deal with MT processing problems, this paper updates Chave and Thomson (1989) through extensions to robust remote reference MT data processing which can control problems from both outliers in the response (electric field) and extreme data (leverage points) in the predictor (magnetic field) variables, as well as eliminate correlated noise in the response and predictor data. The next section contains a brief discussion of spectral analysis principles directly

relevant to this paper. Section 3 reviews robust estimation as applied to MT data. Section 4 discusses the role of the hat matrix as a leverage indicator. Section 5 introduces a generalization of the remote reference method utilizing multiple reference sites. Section 6 discusses bounded influence estimation as a means to simultaneously control both outlier and leverage effects. Section 7 describes bounded influence two-stage MT processing which can eliminate correlated noise in the local electromagnetic fields. Section 8 outlines some pre-processing steps that can improve MT processing results under special circumstances. Section 9 discusses statistical verification of robust or bounded influence procedures. Section 10 illustrates many of these concepts using a variety of field and synthetic MT data. Finally, Section 11 is a discussion of a few remaining issues.

2. Discourse on Spectral Analysis

It is assumed that contemporaneous, finite time sequences of the electric and magnetic field variations from one or more sites are available. It is also presumed that the time series have been digitized without aliasing and edited to remove gross errors like boxcar shifts or large spikes (although well-designed processing algorithms can often accommodate these problems). Finally, it is assumed that serious long-term trends have been removed by high pass filtering or least squares spline fitting procedures.

Because outlier detection is facilitated by comparing spectra estimated over distinct subsections in time, robust MT processing is typically based on the Welch overlapped section averaging (WOSA) approach (Welch 1967; see Percival and Walden 1993, Section 6.17, for a detailed discussion). The basic algorithm is computationally straightforward. After time domain prewhitening of each time series and starting at the lowest frequency, a subset length that is of order a few over the frequency of interest is selected. The data sequences are then divided into subsets and each is tapered with a data window. Data sections may be overlapped to improve statistical efficiency. Fourier transforms are taken, prewhitening is corrected for, and the instrument response is removed. In the context of this paper, the data are these Fourier transforms of the windowed data from each section at a single frequency. The section length is then repetitively reduced as higher frequencies are addressed. A variable section length WOSA analysis of this type is philosophically akin to wavelet analysis (e.g., Percival and Walden 2000) in that the time scale of the basis functions is changed along with the frequency scale to optimize resolution.
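The sectioning-and-tapering step described above can be sketched in a few lines. This is a minimal illustration in NumPy/SciPy with function and variable names of our own choosing, not the processing code used in the paper; prewhitening correction and instrument response removal are omitted.

```python
import numpy as np
from scipy.signal.windows import dpss

def section_ffts(x, n_sec, overlap=0.5, tau=1.0):
    """Divide x into overlapped sections, taper each with the zeroth-order
    Slepian (dpss) sequence, and return the FFT of every section."""
    step = max(1, int(n_sec * (1.0 - overlap)))
    taper = dpss(n_sec, tau)  # zeroth-order Slepian, time-bandwidth tau
    starts = range(0, len(x) - n_sec + 1, step)
    return np.array([np.fft.rfft(taper * x[s:s + n_sec]) for s in starts])

# Toy usage: a 50 Hz line in noise, sampled at 1 kHz.
rng = np.random.default_rng(0)
t = np.arange(20000) / 1000.0
x = np.sin(2 * np.pi * 50.0 * t) + 0.1 * rng.standard_normal(t.size)
F = section_ffts(x, n_sec=512)
# F[:, k] is the "data" at one frequency: one complex value per section.
k = int(np.argmax(np.abs(F).mean(axis=0)))
print(F.shape, k)
```

In a variable section length analysis, `n_sec` would shrink as higher frequencies are addressed, so that the frequency of interest stays a few multiples of 1/`n_sec`.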

It is important to understand the bias (i.e., spectral leakage) and resolution properties of the selected data taper and design a frequency sampling strategy that accommodates both. Historically, data tapers have been chosen largely on the basis of computational simplicity or ad hoc criteria (e.g., Harris 1978). However, an optimal class of data tapers can be obtained by considering those finite time sequences (characterized by a length N) which have the largest possible concentration of energy in the frequency interval [-W, W]. The ensuing resolution bandwidth is 2W. Work on the continuous-time version of this problem goes back to Chalk (1950), and culminated in a set of landmark papers by Slepian and Pollak (1961) and Landau and Pollak (1961, 1962). The discrete time problem was solved by Slepian (1978), and the solution is the zeroth-order discrete prolate spheroidal or Slepian sequence with a single parameter, the time-bandwidth product τ = NW. Numerical solutions for the Slepian sequences are obtained by solving an eigenvalue problem; a numerically robust tridiagonal form is given by Slepian (1978) and a more accurate variant is described in Appendix B of Thomson (1990). The superiority of Slepian sequences as data tapers has been thoroughly documented (e.g., Thomson 1977, 1982; Percival and Walden 1993), and these are employed exclusively in the present work.

The time-bandwidth product τ determines both the amount of bias protection outside the main lobe of the window and the main lobe half width τ/N (Figure 1). A useful range of τ for MT processing is typically 1 to 4. A τ = 1 window provides limited (about 25 dB) bias protection but yields raw estimates that are essentially independent on the standard discrete Fourier transform (DFT) grid with frequencies separated by 1/N. In this instance, prewhitening is essential to avoid spectral bias. A τ = 4 window provides over 100 dB of bias protection but yields raw estimates that are highly correlated over [f-W, f+W], where f is the frequency of interest. As a rule of thumb, frequencies which are spaced 2W apart on the DFT grid may be taken as independent; Thomson (1977) gives a more quantitative discussion of this point. It is also important to avoid frequencies within W of the DC value, as these are typically contaminated by unresolved low frequency components. The amount that data sections can be overlapped without introducing data dependence also varies, ranging from 50 to 71% as τ ranges from 1 to 4. Percival and Walden (1993, Section 6.17) give a quantitative treatment of this issue.
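The energy-concentration property that defines the Slepian sequences can be checked numerically. The sketch below is ours, using SciPy's `dpss` routine, and compares the fraction of taper energy inside the design band [-W, W] for τ = 1 and τ = 4.

```python
import numpy as np
from scipy.signal.windows import dpss

# Fraction of a zeroth-order Slepian taper's spectral energy inside the
# design band [-W, W], W = tau/N, for two time-bandwidth products tau.
N = 512
pad = 8  # zero-padding factor for a fine frequency grid
frac = {}
for tau in (1, 4):
    w = dpss(N, tau)
    S = np.abs(np.fft.fft(w, pad * N)) ** 2
    nW = pad * tau  # W expressed in bins of the padded grid: (tau/N)*pad*N
    frac[tau] = (S[:nW + 1].sum() + S[-nW:].sum()) / S.sum()
    print(f"tau={tau}: energy fraction in [-W, W] = {frac[tau]:.7f}")
```

The τ = 4 taper concentrates essentially all of its energy inside the band, which is the origin of its much stronger sidelobe (bias) suppression.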

Prewhitening is most easily achieved by filtering with a short autoregressive (AR) sequence fit to the time series. Standard time domain AR estimators are based on least squares principles; for example, the Yule-Walker equations (Yule 1927; Walker 1931) may be solved for the autocovariance sequence (acvs), from which the AR filter may be obtained using the Levinson-Durbin recursion (Levinson 1947; Durbin 1960). In the presence of extreme data, such least squares estimators for the acvs are often seriously biased, yielding AR filters which may either be unstable (i.e., the zeros of the filter z-transform lie outside the unit circle) or highly persistent (i.e., the zeros of the filter z-transform lie inside but very close to the unit circle). In such instances, the filter may actually enhance spectral leakage by increasing the dynamic range of the data spectrum, which is opposite to the desired outcome. A robust approach to acvs estimation is essential to avoid these difficulties. A simple robust AR estimator may be obtained using acvs estimates obtained from the Fourier transform of a robust estimate of the power spectrum. The robust power spectrum may be computed using the WOSA method with a low bias (e.g., Slepian sequence with τ = 4) data taper and taking the frequency-by-frequency median rather than mean of the raw section estimates. The AR filter follows from the robust acvs using the Levinson-Durbin recursion. The importance of robust AR filter estimation for prewhitening is illustrated in Section 10.

3. Robust Response Function Estimation

In the standard linear regression model for the row-by-row solution of (1), the equivalent set of matrix equations is

    e = bz + ε    (3)

where there are N observations (i.e., N Fourier transforms of N independent data sections at a given frequency), so that e is the response N-vector, b is the Nx2, rank 2 predictor matrix, z is the solution 2-vector, and ε is an N-vector of random errors. The error power ε^H ε, or equivalently, the L2 norm of the random errors, is then minimized in the usual way, yielding

    ẑ = (b^H b)^-1 (b^H e)    (4)

The elements of b^H b and b^H e are the averaged estimates of the auto- and cross-power spectra based on the available data. The regression residuals r are the differences between the measured values of the response variable e and those predicted by the linear regression ê = bẑ, and serve as an estimate for the random errors ε.
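For complex-valued section data, the least squares solution (4) is a one-liner in any linear algebra library. The sketch below, with synthetic data and names of our own choosing, recovers a known response and forms the residuals r.

```python
import numpy as np

def ls_response(e, b):
    """Least squares response of eq. (4): z = (b^H b)^{-1} b^H e, for a
    complex response N-vector e and an N x 2 complex predictor matrix b."""
    bH = b.conj().T
    z = np.linalg.solve(bH @ b, bH @ e)
    r = e - b @ z  # regression residuals: estimates of the random errors
    return z, r

# Synthetic check: recover a known response row from lightly noisy data.
rng = np.random.default_rng(1)
N = 256
b = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
z_true = np.array([2.0 - 1.0j, 0.5 + 0.3j])
e = b @ z_true + 0.01 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
z, r = ls_response(e, b)
print(np.round(z, 3))
```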

The conditions on the variables in (3) and their moments that yield a least squares solution (4) which is optimal in a well-defined sense are given by the Gauss-Markov theorem of classical statistics (e.g., Stuart et al. 1999, Chapter 29). The textbook version of the Gauss-Markov theorem applies when the predictor variables in (3) are fixed, but Shaffer (1991) has extended it to cover the cases where the joint distribution of e and b is multivariate normal with unknown parameters, the distribution of b is continuous and nondegenerate but otherwise unspecified, or, under mild conditions, when b is a random sample from a finite population. It is as one of these cases that (3) will be considered in this paper. Under these circumstances, the linear regression solution (4) is the best linear unbiased estimate when the residuals r are uncorrelated and share a common variance, independent of any assumptions about their statistical distribution except that the variance must exist. In addition, if the residuals are multivariate Gaussian, then the least squares result is a maximum likelihood, fully efficient, minimum variance estimate. These optimality properties, along with the empirical observation that most physical data yield Gaussian residuals, explain the appeal of a least squares approach.

However, with natural source electromagnetic data, the Gauss-Markov assumptions about the data and their error structure are rarely tenable for at least three reasons. First, the residual variance is often dependent on the data variance, especially when energetic intervals coincide with source field complexity, as occurs for many classes of geomagnetic disturbance. Second, the finite duration of many geomagnetic events causes data anomalies to occur in patches, violating the independent residual requirement. Third, due to marked nonstationarity, frequent large residuals are much more common than would be expected for a Gaussian model, and hence the residual distribution is typically very long-tailed with a Gaussian center. Any one of these issues can seriously bias the least squares solution (4) in unpredictable and sometimes insidious ways; in the presence of all three, problems are virtually guaranteed.

These observations have led to the development of procedures which are robust, in the sense that they are relatively insensitive to a moderate amount of bad data or to inadequacies in the model, and that react gradually rather than abruptly to perturbations of either. Robust statistics is an active area of research (e.g., Hampel et al. 1986; Wilcox 1997), and many of its techniques have been adapted to spectral analysis beginning with the work of Thomson (1977). Chave et al. (1987) presented a tutorial review of robust methods in the context of geophysical data processing, and only an outline relevant to the MT problem will be presented here.

The most suitable type of robust estimator for MT data is the M-estimator, which is motivated by analogy to the maximum likelihood method of statistical inference (e.g., Stuart et al. 1999, Chapter 18). M-estimation is similar to least squares in that it minimizes a norm of the random errors in (3), but the misfit measure is chosen so that a few extreme values cannot dominate the result.

The M-estimate is obtained computationally by minimizing R^H R, where R is an N-vector whose i-th entry is ρ(r_i/d), d is a scale factor, and ρ(x) is a loss function. The term loss function originates in statistical decision theory (e.g., Stuart and Ord 1994, Section 26.52), and loosely speaking, yields a measure of the distance between the true and estimated values of a statistical parameter. For standard least squares, ρ(x) = x²/2 and (4) is immediately recovered. More generally, if ρ(x) is chosen to be -log f(x), where f(x) is the true probability density function (pdf) of the regression residuals r which includes extreme values, then the M-estimate is maximum likelihood. In practice, f(x) cannot be determined reliably from finite samples, and the loss function must be chosen on theoretical or empirical grounds. Details may be found in Chave et al. (1987) and Wilcox (1997).

Minimizing R^H R gives the M-estimator analog to the normal equations

    b^H Ψ = 0    (5)

where Ψ is an N-vector whose i-th entry is the influence function ψ(x) = ∂ρ(x)/∂x evaluated at x = r_i/d. The solution to (5) is obtained using iteratively re-weighted least squares, so that at the k-th iteration, Ψ is replaced by v[k]r[k], where v[k] is an NxN diagonal matrix whose i-th entry is

    v_ii[k] = ψ(r_i[k-1]/d[k-1]) / (r_i[k-1]/d[k-1])

The weights are computed using the residuals and scale from the previous iteration to linearize the problem; the solution is initialized using the residuals and scale from ordinary least squares, or, for badly contaminated data, those from an L1 estimate obtained using a linear programming algorithm. The weighted least squares solution of (5) corresponding to (4) is

    ẑ* = (b^H v b)^-1 (b^H v e)    (6)

Iteration ends when the solution does not change appreciably, typically after 3-5 steps. Since the weights are chosen to minimize the influence of data corresponding to large residuals in a well-defined sense, the M-estimator is data adaptive.

A simple form for the weights is due to Huber (1964). This has diagonal elements

    v_ii = 1,        |x_i| ≤ a
    v_ii = a/|x_i|,  |x_i| > a    (7)

A value of a = 1.5 gives better than 95% efficiency with outlier-free Gaussian data.
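A bare-bones version of the iteration defined by (6) and (7) might look as follows. This sketch is ours, not the paper's implementation: for brevity it uses the median residual magnitude against the median of a unit Rayleigh distribution as the scale, rather than the full MAD machinery described below.

```python
import numpy as np

def huber_weights(x, a=1.5):
    """Diagonal weights of eq. (7): 1 for |x| <= a, a/|x| beyond."""
    ax = np.abs(x)
    return np.where(ax <= a, 1.0, a / np.maximum(ax, 1e-30))

def irls_huber(e, b, a=1.5, n_iter=10):
    """Iteratively re-weighted least squares (eq. 6) with Huber weights.
    Scale: median |r| divided by the median of a unit Rayleigh, a
    simplification of the MAD-based scale of eqs (8)-(9)."""
    bH = b.conj().T
    z = np.linalg.solve(bH @ b, bH @ e)  # eq. (4) starting point
    for _ in range(n_iter):
        r = e - b @ z
        d = np.median(np.abs(r)) / np.sqrt(2.0 * np.log(2.0))
        v = huber_weights(r / d, a)  # real, nonnegative weights
        bHv = (b.conj() * v[:, None]).T
        z = np.linalg.solve(bHv @ b, bHv @ e)  # eq. (6)
    return z

# One gross outlier in e barely moves the robust estimate.
rng = np.random.default_rng(2)
N = 128
b = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
z_true = np.array([1.0 + 0.0j, -0.5 + 0.25j])
e = b @ z_true + 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
e[0] += 50.0
z_rob = irls_huber(e, b)
print(np.round(z_rob, 2))
```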

Downweighting of data with (7) begins when |x_i| = |r_i/d| = a, so that the scale factor d determines which of the residuals are to be regarded as large. It is necessary because the weighted least squares problem is not scale invariant without it, in the sense that multiplication of the data by a constant will not produce a comparably affine change in the solution.

The scale estimate must itself be robust, and is typically chosen to be the ratio of the sample and theoretical values of some statistic based on a target distribution. An extremely robust scale statistic is the median absolute deviation from the median (MAD), whose sample value is

    s_MAD = |r - r̃|_((N+1)/2)    (8)

where the subscript (i) denotes the i-th order statistic obtained by sorting the N values sequentially. The quantity r̃ is the middle order statistic or median of r. The theoretical MAD is the solution of

    F(μ̃ + MAD) - F(μ̃ - MAD) = 1/2    (9)

where μ̃ is the theoretical median and F denotes the target cumulative distribution function (cdf).

The solution of (5), and hence the choice of d, requires a target distribution for the residuals r. The data in MT processing are Fourier transforms and therefore complex, but the complex Gaussian is not necessarily the optimal choice. It is preferable to measure residual size using its magnitude, since this is rotationally (phase) invariant, and Chave et al. (1987) showed that the appropriate distribution for the magnitude of a complex number is Rayleigh. The properties of the Rayleigh distribution are summarized in Johnson et al. (1994, Chapter 10). A Rayleigh statistical model for the residuals will be used exclusively in this paper.

Because the weights (7) fall off slowly for large residuals and never fully descend to zero, they provide inadequate protection against severe outliers. However, the loss function corresponding to (7) is convex, and hence convergence to a global minimum is assured, making it safe for the initial iterations of a robust procedure to obtain a reliable estimate of the scale. Motivated by the form of the Gumbel extreme value distribution, Thomson (1977) and Chave and Thomson (1989) suggested using the more severe weight function

    v_ii = exp(e^(-ξ²)) exp(-e^(ξ(x_i - ξ)))    (10)

for the final estimate, where the parameter ξ determines the residual size at which downweighting begins and v_ii = 1 when x_i = 0.
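The Rayleigh-target scale of (8)-(9) and the severe weight (10), in the form reconstructed above, can be written down directly. The helper names below are ours; the theoretical MAD is obtained by solving (9) numerically rather than in closed form.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import rayleigh

def mad_scale_rayleigh(r):
    """Scale d of eqs (8)-(9): the sample MAD of the residual magnitudes
    over the theoretical MAD of a unit Rayleigh target distribution."""
    m = np.abs(r)
    s_mad = np.median(np.abs(m - np.median(m)))  # eq. (8)
    med = rayleigh.median()
    # eq. (9): F(med + MAD) - F(med - MAD) = 1/2, solved numerically
    t_mad = brentq(lambda u: rayleigh.cdf(med + u) - rayleigh.cdf(med - u) - 0.5,
                   1e-6, 5.0)
    return s_mad / t_mad

def thomson_weights(x, xi):
    """Severe weight of eq. (10) as reconstructed here:
    v = exp(e^{-xi^2}) * exp(-e^{xi(|x| - xi)}); ~1 well below xi,
    dropping almost like a hard cutoff at |x| = xi."""
    arg = np.clip(xi * (np.abs(x) - xi), None, 50.0)  # avoid overflow
    return np.exp(np.exp(-xi ** 2)) * np.exp(-np.exp(arg))

d = mad_scale_rayleigh(rayleigh.rvs(size=20000, random_state=0))
w = thomson_weights(np.array([0.0, 2.0, 3.0]), xi=2.5)
print(round(d, 3), np.round(w, 3))
```

For a unit Rayleigh sample the scale estimate is close to one, and the weight falls from near unity at x = 2 to nearly zero at x = 3 for ξ = 2.5, illustrating the soft hard-limit behavior.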

The weight (10) is essentially like a hard limit at x_i = ξ, except that the truncation function is continuous and continuously differentiable. Chave et al. (1987) advocated setting ξ to the N-th quantile of the Rayleigh distribution, which automatically increases the allowed residual magnitude as the sample size rises. In an uncontaminated population, one expects the largest residual to increase slowly (approximately as √(2 log N)), but if the fraction β of contaminated points is constant, a level at the N(1-β)-th quantile would be more appropriate.

To summarize, robust processing of MT data utilizes the Fourier transforms of comparatively short sections of a long data series, where the section length is chosen to be of order a few over the frequency of interest. Data sections may be overlapped depending on the data window characteristics. At each frequency, an initial least squares solution is obtained from (4) and used to compute the residuals r in (3), as well as the scale d from the ratio of (8) and (9) using a Rayleigh model for the residual distribution. An iterative procedure is then applied using (6) with the Huber weights (7), where the residuals from the previous iteration are used to get the scale and weights. This continues until the change in the weighted residual power r^H v r falls below a threshold value, typically 1-2%. The scale is then fixed at the final Huber weight value and several iterations are performed using the more severe weight (10), again terminating when the weighted residual power does not change appreciably.

The algorithm outlined here is very similar to that introduced by Chave and Thomson (1989), except that the section length is variable and always maintained so that the frequency of interest is a few over the section length. It has been empirically shown that robust procedures are most effective at eliminating outliers without serious statistical efficiency reduction when this methodology is used. With variable section lengths, there is no need for band averaging, and it is recommended that its use be avoided to prevent errors on the confidence intervals of the response tensor caused by correlation. In addition, use of the interquartile distance for a scale estimate as discussed by Chave and Thomson (1989) has been discontinued, since it has proven to be less robust than the MAD. Robust estimation as described here is capable of yielding reliable estimates of the MT response tensor in a semi-automatic manner for most data sets.
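The quantile rule for ξ discussed above can be evaluated with any Rayleigh quantile function. The sketch below takes the "N-th quantile" of a sample of size N to mean the 1 - 1/(2N) point of the Rayleigh cdf, which is an assumption on our part, and shows the slow growth of the cutoff with sample size.

```python
import numpy as np
from scipy.stats import rayleigh

def xi_cutoff(N):
    """One reading of the 'N-th Rayleigh quantile' rule for the cutoff xi:
    the 1 - 1/(2N) point of the unit Rayleigh cdf (an assumption here),
    roughly the level a sample of N draws exceeds about half the time."""
    return float(rayleigh.ppf(1.0 - 1.0 / (2.0 * N)))

xs = [xi_cutoff(N) for N in (32, 128, 512, 2048)]
print([round(x, 3) for x in xs])  # cutoff grows only slowly with N
```

For the unit Rayleigh this quantile is √(2 log 2N), so a 64-fold increase in N raises the cutoff by well under 50%, consistent with the slow √(2 log N) growth of the largest residual noted above.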

Finally, an essential feature of any method for computing MT responses is the provision of both an estimate and a measure of its accuracy. Traditionally, confidence intervals on the MT response are dependent on explicit statistical models which are ultimately based on a Gaussian distribution. In addition to problems in computing the correct number of degrees of freedom in the presence of correlated spectral estimates and the practical requirement for numerous approximations to yield a tractable result, these parametric methods are even more sensitive to outliers or other data anomalies than the least squares estimates themselves. For a more complete discussion of the limitations of parametric uncertainty estimates in a spectral analysis context, see Thomson and Chave (1991). This has led to the increasing use of nonparametric confidence interval estimators which require fewer distributional assumptions, among which the simplest is the jackknife. Thomson and Chave (1991) give a detailed description of its implementation and performance in spectral analysis, while Chave and Thomson (1989) describe its use for estimating confidence intervals on the MT response tensor. This requires only that N estimates of the response tensor elements, obtained by deleting a single data section at a time from the ensemble, be computed and saved for later use. In a regression context, the jackknife offers the advantage that it is insensitive to violations of the assumption of variance homogeneity which is implicit to parametric error estimates (Hinkley 1977). The jackknife also yields conservative confidence interval estimates, in the sense that they will tend to be slightly too large rather than too small (Efron and Stein 1981), as documented for MT by Eisel and Egbert (2001). Finally, the jackknife offers a way to propagate error through either linear (e.g., rotation) or nonlinear transformations of the MT response tensor which might otherwise be completely intractable; tensor decomposition to remove galvanic distortion as described in Chave and Smith (1994) is a prime example of the latter.
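The delete-one recipe described above, applied for illustration to the least squares estimate (4), can be sketched as follows. This uses the standard jackknife variance formula; the function names and synthetic data are ours, not the paper's code.

```python
import numpy as np

def jackknife_z(e, b):
    """Delete-one jackknife for the least squares response of eq. (4):
    N estimates, each with one data section removed, combined with the
    standard jackknife variance formula."""
    N = len(e)
    idx = np.arange(N)
    z_del = np.array([np.linalg.lstsq(b[idx != i], e[idx != i], rcond=None)[0]
                      for i in range(N)])
    z_dot = z_del.mean(axis=0)
    var = (N - 1) / N * np.sum(np.abs(z_del - z_dot) ** 2, axis=0)
    return z_dot, var

rng = np.random.default_rng(3)
N = 64
b = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
z_true = np.array([1.0 + 1.0j, 0.2 - 0.4j])
e = b @ z_true + 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
z_dot, var = jackknife_z(e, b)
print(np.round(z_dot, 2), np.round(np.sqrt(var), 3))
```

The same delete-one loop applies unchanged to a robust or bounded influence estimator: only the inner estimate changes, which is what makes the jackknife attractive for error propagation through nonlinear transformations of Z.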

4. The Hat Matrix and Leverage

The hat or predictor matrix is an important auxiliary quantity in regression theory, and is widely used to detect unusually influential (i.e., high leverage) data (Hoaglin and Welsch 1978; Belsley et al. 1980; Chatterjee and Hadi 1988). For the regression problem defined by (3), the NxN hat matrix is given by

    H = b(b^H b)^-1 b^H    (11)

The residual r is the difference between the observed electric field e and the value ê predicted from a linear regression on b. It follows that

    ê = He    (12)

and hence

    r = (I - H)e    (13)

where I is the identity matrix. The hat matrix is a projection matrix which maps e onto ê. The diagonal elements of H (denoted by h_ii) are a measure of the amount of leverage which the i-th value of the response variable e_i has in determining its corresponding predicted value ê_i, independent of the actual size of e_i, and this leverage depends only on the predictor variables b. The inverse of h_ii is in effect the number of observations in b which determine ê_i.

Some relevant properties of the hat matrix are described by Hoaglin and Welsch (1978). First, because it is a projection matrix, H is Hermitian and idempotent (H² = H). These two characteristics can be used to show that the diagonal elements satisfy 0 ≤ h_ii ≤ 1. Second, the eigenvalues of a projection matrix are either 0 or 1, and the number of nonzero eigenvalues equals its rank. Since rank(H) = rank(b) = p, where p is the number of columns in b (usually 2 for MT), the trace of H is p. Thus, the expected value of h_ii is p/N. Finally, the two extreme values 0 and 1 have special meaning. When h_ii = 0, then ê_i is fixed at zero and not affected by the corresponding datum in e. If h_ii = 1, then the predicted and observed values of the i-th entry in e are identical, and the model fits the data exactly. In this instance, only the i-th row of b has any influence on the regression problem, leading to the term high (or extreme) leverage to describe that point. More generally, if h_ii >> p/N, the i-th row of b will exert undue influence (or leverage) on the solution in a manner analogous to an outlier in e. Thus, the hat matrix diagonal elements are a measure of the amount of leverage exerted by a predictor datum. The factor by which the hat matrix diagonal elements must exceed the expected value to be considered a leverage point is not well defined, but statistical lore suggests that values which are more than 2-3 times p/N are a concern (Hoaglin and Welsch 1978).

In Chave and Thomson (2003), the exact distribution of the diagonal elements of (11) for complex Gaussian data was derived from first principles, extending previous work which yielded only asymptotic forms for real data (e.g., Rao 1973; Belsley et al. 1980; Chatterjee and Hadi 1988). The result is the beta distribution β(h_ii; p, N−p), where p is the number of predictor variables or columns in b and N is the number of data. The cumulative distribution function is the incomplete beta function ratio I_x(p, N−p), and a series expression suitable for numerical solution is given in Chave and Thomson (2003). For the MT situation (p=2) and when N >> 2, some critical values ξ, where x = 2ξ/N and I_x = α, are given in Table 1. The table shows that the probability that a given hat matrix diagonal element will exceed the expected value of 2/N by a factor of 4.6 is only 0.001.

However, data whose corresponding hat matrix diagonal is only twice the expected value will occur with probability 0.091. Rejecting values at this level will typically remove a significant number of valid data unless N is fairly small. Chave and Thomson (2003) advocate rejecting data corresponding to hat matrix diagonal elements larger than the inverse of the beta distribution at a reasonable probability level. For example, selecting the 0.95 level will carry a 5% penalty for Gaussian data, yet effectively protect against large leverage points. In data sets whose hat matrix is longer tailed than beta (a common occurrence with MT data from auroral latitudes), larger cutoff values should be used.

It is important to note that using the hat matrix diagonal elements as leverage statistics differs considerably from a simpler approach based on some measure of the relative variance or power in the rows of b. To see this, consider (11) with p=2, as for a conventional single site MT sounding. Let S^i_jk denote the power between the j,k variables in the i-th row of b and let S̄_jk be the averaged cross spectrum of the j,k variables obtained from all N data in b. The i-th diagonal entry in (11) may be written

    h_ii = [ S^i_xx/S̄_xx + S^i_yy/S̄_yy − 2 Re( γ² S^i_xy/S̄_xy ) ] / [ N (1 − γ²) ]        (14)

where γ² is an estimate of the squared coherence between b_x and b_y derived from the averaged cross- and auto-powers. The expected value of the right hand side of (14) is easily seen to be 2/N. In the special case where γ² → 0, so that b_x and b_y are uncorrelated, (14) reduces to

    h°_ii = S^i_xx / Σ_{i=1}^{N} S^i_xx + S^i_yy / Σ_{i=1}^{N} S^i_yy        (15)

which is just the sum of the ratios of the power in the i-th row to the total power in each component of b. As γ² increases, (14) may be rewritten as

    h_ii = [ h°_ii − 2 Re( γ² S^i_xy / Σ_{i=1}^{N} S^i_xy ) ] / (1 − γ²)        (16)
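The hat matrix diagnostics above can be sketched numerically. The following is an illustrative example (not from the paper) that computes h_ii for simulated complex Gaussian predictors, checks the trace and bound properties, and flags leverage points with a beta(p, N−p) cutoff; the sizes and the 0.95 probability level are arbitrary choices.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
N, p = 500, 2
# simulated complex Gaussian predictors (e.g., two magnetic field channels)
b = rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))

# hat matrix diagonal h_ii = b_i (b^H b)^{-1} b_i^H, one value per row of b
G = np.linalg.inv(b.conj().T @ b)
h = np.einsum('ij,jk,ik->i', b, G, b.conj()).real

# leverage cutoff: inverse beta(p, N-p) cdf at a chosen probability level
cutoff = beta.ppf(0.95, p, N - p)
leverage_points = np.flatnonzero(h > cutoff)
```

The sum of `h` equals p (the trace of H), so its mean is the expected value p/N discussed above.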

This quantity may be either larger or smaller than (15), depending on the size of γ² and the size and sign of the ratio of the i-th row cross-power to the total cross-power. Thus, high leverage points may correspond to rows of b where γ² is small and the power in one or both of the predictor variables is much larger than the average power, or to rows of b where γ² is nonzero and the cross-power between b_x and b_y is much larger than the average cross-power. Intermediate situations with high leverage are also possible.

This discussion has centered on the diagonal elements of the hat matrix, but the off-diagonal entries also have a leverage interpretation. The i,j off-diagonal element of H gives the amount of leverage which the j-th response variable e_j has on the i-th predicted variable ê_i, and the leverage again depends only on the predictor variables b. Thus, in the MT context of this paper, the off-diagonal elements of H give the amount of leverage exerted by magnetic field data which are non-local in time, and hence are expected to be small unless there is significant nonstationarity. The off-diagonal elements of H will not be considered further in this paper.

The hat matrix and its properties easily generalize to the robust algorithm of (5)-(6) using the definition in (11)

    H = v^{1/2} b (b^H v b)^{-1} b^H v^{1/2}        (17)

where v is the diagonal matrix of robust weights with entries v_ii.

5. A Generalization of the Remote Reference Method

Consider the regression equation for the MT response (3), and assume that an auxiliary set of remote reference variables Q are available, where for a given frequency, Q is N×q with q ≥ 2. For standard remote reference MT data, Q would typically consist of horizontal magnetic field measurements collected some distance from the base site, and hence q=2. For more than one remote reference site or where more than the auxiliary horizontal magnetic field is available at a single reference site, the rank q of Q may be larger. In this instance, the local magnetic field b may be regressed on Q in the usual way by solving

    b = Q t + ε        (18)

where t is a q×2 transfer tensor between the local horizontal magnetic field b and the remote reference variables in Q and ε is an N×2 random error matrix. In practice, (18) may be solved as a series of univariate regressions on the columns of b, t and ε. By analogy to (12), the predicted local magnetic field is

    b̂ = Q (Q^H Q)^{-1} Q^H b        (19)

which is just a projection operation. Substituting b̂ for b in (3) and solving yields the generalized remote reference analog to (4)

    ẑ_q = (b̂^H b̂)^{-1} b̂^H e        (20)

When q=2, so that Q is the standard N×2 matrix of remote reference magnetic field variables b_r, (20) can be simplified using standard matrix identities to yield

    ẑ_r = (b_r^H b)^{-1} b_r^H e        (21)

which is identical to the remote reference solution of Gamble et al. (1979), where the local auto- and cross-spectra are replaced by cross-spectra with the reference magnetic field. In fact, it is easy to show that b̂^H b̂ = b̂^H b, and hence (20) is not downward biased by uncorrelated noise in b̂ because such noise is not present. Equation (20) may be thought of as the remote reference solution (21) with the two-channel reference magnetic field b_r replaced by the projection (19). Just as the remote reference method is equivalent to an econometric technique for solving errors-in-variables problems called the method of instrumental variables (Wald 1940; Reiersol 1941, 1945; Geary 1949), the generalized remote reference method in (19)-(20) is identical to a related econometric technique called two-stage least squares (Mardia et al. 1979). Both methods arose to handle violations of the Gauss-Markov theorem where the predictor variables b in (3) are correlated with the residuals r, in which case the solution (4) is neither unbiased nor statistically consistent.

From (20), the hat matrix for the two-stage least squares solution is

    H_q = b̂ (b̂^H b̂)^{-1} b̂^H        (22)

which reduces to

    H_r = b_r (b_r^H b_r)^{-1} b_r^H        (23)

when a single remote reference station is used. Equation (23) differs from the mixed local and reference field form H_R proposed in Chave and Thomson (1989; see their equation 27) and further studied by Ghosh (1990).
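The bias removal provided by (21) can be illustrated with a small simulation; this is a hedged sketch with arbitrary response values, noise levels, and seed, not data from the paper. Uncorrelated noise on the measured local magnetic field biases ordinary least squares downward, while cross-spectra with a remote reference do not share that noise.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4000
z = np.array([0.5 + 0.2j, -1.0 + 0.8j])       # hypothetical response row

b0 = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))  # true field
b = b0 + 0.5 * (rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2)))
br = b0 + 0.1 * (rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2)))
e = b0 @ z                                     # noise-free electric field for clarity

# least squares on the noisy local field: downward biased
z_ls = np.linalg.lstsq(b, e, rcond=None)[0]
# remote reference estimate (21): cross-spectra with b_r cancel the noise bias
z_rr = np.linalg.solve(br.conj().T @ b, br.conj().T @ e)
```

With these settings `z_rr` recovers `z` to within sampling error, while `z_ls` is shrunk by roughly the signal-to-(signal plus noise) power ratio.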

H_R is not formally a projection matrix, and hence shares the hat matrix properties outlined in Section 4 only approximately. The more natural version (23) should be adopted in remote reference applications.

Chave and Thomson (1989) combined a robust estimator like that described in Section 3 and (21) to yield a robust remote reference solution to the MT problem

    ẑ*_r = (b_r^H v b)^{-1} (b_r^H v e)        (24)

where the weights v are computed as for (6) based on the ordinary residuals r from (3). This is a straightforward extension of M-estimation that simultaneously eliminates downward bias in (6) from uncorrelated noise on the magnetic field channels and problems from outliers, and has been shown to perform well under general circumstances (e.g., Jones et al. 1989; Chave and Jones 1997). However, it provides limited protection from overly influential values in either the local or reference magnetic fields. Equation (24) is easily generalized to more than 2 remote reference channels at a single station or to multiple reference sites by substituting b̂ for b_r to give ẑ*_q. This can be very useful when remote reference sites contain uncorrelated noise with different frequency characteristics, as is demonstrated in Section 10.

6. Bounded Influence MT Response Function Estimation

Two statistical concepts which are useful in understanding problems with least squares and M-estimators are the breakdown point and the influence function. Loosely speaking, the breakdown point is the smallest fraction of gross errors which can carry an estimate beyond all statistical bounds that can be placed on it. Ordinary least squares has a breakdown point of 1/N, as a single outlying data point can inflict unlimited change on the result. The influence function measures the effect of an additional observation on the estimate of a statistic given a large sample, and hence is the functional derivative of the statistic evaluated at an underlying distribution. Influence functions may be unbounded or bounded as the estimator is or is not sensitive to bad data. Further details may be found in Hampel et al. (1986, Section 1.3).

The procedures described in Sections 3 and 5 are capable in a semi-automatic fashion of eliminating bias due to uncorrelated noise in the local magnetic field and general contamination from data that produce outlying residuals. However, they do require that the reference channels in Q and the local magnetic field b be reasonably clear of extreme values or leverage points caused by instrument malfunctions, impulsive events in the natural source fields, or other natural or artificial phenomena.

When this assumption is invalid, then any conventional robust M-estimator can be severely biased. As for least squares estimators, M-estimators possess a breakdown point of only 1/N, so that a single leverage point can completely dominate the ensuing estimate, and their influence functions are unbounded.

The discussion of Section 4 indicates that the hat matrix diagonal elements serve as indicators of leverage, and hence extensions of robust estimation that can detect both outliers based on the regression residuals and leverage points based on the hat matrix diagonal elements can be devised (e.g., Mallows 1975; Handschin et al. 1975; Krasker and Welsch 1982). When properly implemented, these algorithms have bounded influence functions, and are often termed bounded influence estimators as a result, although the term GM (for generalized M) estimator is also in common use. The bounded influence form of the normal equations analogous to (5) is

    b^H w ψ = 0        (25)

where w is a leverage weight. Two versions of (25) are in common use (Wilcox 1997). If w_ii = 1 − h_ii and ψ = ψ(r_i/d) as in Section 3, then the form is that originally suggested by Mallows (1975), in which residual and leverage weighting are decoupled and leverage points are gently downweighted according to the size of the hat matrix diagonal elements. If ψ = ψ(r_i/(w_ii d)), then the approach is that proposed by Handschin et al. (1975), and is more efficient than the Mallows solution since large leverage points corresponding to small residuals are not heavily penalized. However, Carroll and Welsh (1988) have shown under general conditions that the Handschin et al. approach can lead to a solution which is not statistically consistent, and only the Mallows approach will be used here.

In practice, the cited estimators have proven to be less than satisfactory with MT measurements because downweighting of high leverage data is mild and limited rather than aggressive; their breakdown point is only slightly larger than for conventional M-estimators. More capable, high breakdown point (up to 0.5) bounded influence estimators have been proposed (Rousseeuw 1984; Coakley and Hettmansperger 1993), but these typically entail a substantial increase in computational overhead which limits their applicability to the large data sets that occur in MT processing.

An alternate bounded influence estimator that combines high asymptotic efficiency for Gaussian data, high breakdown point performance with contaminated data, and computational simplicity that is suitable for large data sets was proposed by Chave and Thomson (1992, 2003) and is easily adapted to MT estimation.
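As a rough illustration of the Mallows idea, the sketch below combines Huber-type residual weights with a fixed leverage weight derived from the hat matrix diagonal. The weight forms, the 2p/N threshold, and the crude median-based scale are placeholder choices for illustration, not the specific weighting scheme developed by Chave and Thomson.

```python
import numpy as np

def hat_diagonal(b):
    """Diagonal of H = b (b^H b)^{-1} b^H."""
    G = np.linalg.inv(b.conj().T @ b)
    return np.einsum('ij,jk,ik->i', b, G, b.conj()).real

def mallows_fit(b, e, k=1.5, n_iter=25):
    """Mallows-type bounded influence regression sketch: Huber residual
    weights times fixed leverage weights from the hat matrix diagonal."""
    N, p = b.shape
    # downweight rows whose leverage exceeds twice the expected value p/N
    h = hat_diagonal(b)
    w = np.minimum(1.0, (2.0 * p / N) / np.maximum(h, 1e-12))
    z = np.linalg.lstsq(b, e, rcond=None)[0]
    for _ in range(n_iter):
        r = e - b @ z
        d = np.median(np.abs(r)) + 1e-12       # crude robust scale stand-in
        v = np.minimum(1.0, k * d / np.maximum(np.abs(r), 1e-12))  # Huber weights
        u = v * w                               # composite residual*leverage weight
        z = np.linalg.solve(b.conj().T @ (b * u[:, None]),
                            b.conj().T @ (u * e))
    return z
```

Because the leverage weights are decoupled from the residual weights, a high-leverage row is downweighted even when its residual happens to be small, which is the property that distinguishes the Mallows form from the Handschin et al. form discussed above.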

This is accomplished by replacing the weight matrix v in (6) or (24) with the product of two diagonal matrices u = v w, where v is the M-estimator weight matrix defined in Section 3 whose elements are based on the regression residuals and w is a leverage weight matrix whose elements depend on the size of the hat matrix diagonal. The bounded influence MT response obtained from (6) with u in place of v will be denoted by ẑ#.

Based on numerous trials with actual data, the robust weights in (6) or (24) are frequently seen to both increase the amount of leverage exerted by existing leverage points or create entirely new ones, as will be demonstrated in Section 10. This occurs because M-estimators do not have bounded influence, and probably accounts for many instances where robust procedures do not perform as well as might be expected. Even with Mallows weights in (25), v and w can interact to produce an unstable solution to the robust regression problem unless care is used in applying the leverage weights. This is especially true when the leverage weights are re-computed at each iteration step, which often results in oscillation between two incorrect solutions, each of which is dominated by a few extreme leverage points. Instead, the leverage weights are initialized with unity on the main diagonal, and at the k-th iteration the i-th diagonal element is given by

    w_ii^[k] = w_ii^[k−1] exp{ e^{−ξ²} − exp[ ξ (y_i − ξ) ] }        (26)

where the statistic y_i is

    y_i = M h_ii^[k] / p

with

    h_ii^[k] = u_ii^[k−1] x_i (X^H u^[k−1] X)^{-1} x_i^H        (27)

where x_i is the i-th row of the predictor matrix X, and M is the trace of u^[k−1], which is initially the number of data points N. ξ is a free parameter which determines the point where leverage point rejection begins. If the usual statistical rule of thumb, where rows of the regression (3) corresponding to hat matrix diagonal entries larger than twice the expected value are to be regarded as leverage points, is adopted, then ξ=2. A better albeit empirical choice of ξ is the N-th quantile of the beta distribution or the value of its inverse corresponding to a chosen probability level. These choices will tend to automatically increase the allowed size of leverage points as N increases. To further ensure a stable solution, downweighting of leverage points is applied in half decade stages, beginning with ξ set just below the largest normalized hat matrix diagonal element and ending with the final value, rather than all at once.

To summarize this section, bounded influence response function estimation for ẑ# without a remote reference begins with Fourier transform estimates of short data sections as described in Section 2. At each frequency, an initial least squares solution is obtained from (4) and used to compute the residuals r as well as the scale d from (8) and (9) using a Rayleigh residual model and the hat matrix (11). A nested iterative procedure is then applied in which the outer loop steps over ξ ranging from immediately below the largest hat matrix diagonal element to the selected minimum value ξo, and the inner loop successively solves (6) using the composite weight matrix u computed at each step, where v is given by the Huber weights (7) using the residuals and scale from the previous inner iteration and the elements of w are given by (26). The inner loop terminates when the weighted residual power r^H u r does not change below a threshold value, typically 1-2%, or the solution ẑ does not change below a threshold value, typically a fraction of a standard error. After all of the outer loop iterations, a second pass is made with an identical outer loop and an inner one with the scale fixed at the final Huber value and the more severe robust weight (10), again terminating when the weighted residual power does not change appreciably. Extension to the ordinary remote reference solution (24) or its generalization is straightforward, and will be denoted by ẑ#_r and ẑ#_q, respectively.

7. Two-stage Bounded Influence MT Response Function Estimation

Correlated noise of cultural origin in the local electric and magnetic fields is quite common, and hence extending processing methods to attack such problems is of considerable interest. Robust remote reference processing using (24) can be sensitive to correlated noise on the local electric and magnetic field channels even if the reference channels are noise-free. However, if a reference site which is not contaminated by the noise source is available, this can be remedied by a two step procedure where the local and reference magnetic fields are first regressed using the bounded influence form of (18)-(19) to get a noise-free b̂, and then the bounded influence form of (6) is solved after substituting b̂ for b. At each step, the absence of correlated noise in the predictor variables makes the noise which remains in the response variables appear like outliers, so that it can be eliminated by the bounded influence weighting procedure.

The first stage is the straightforward bounded influence solution of (18) using the procedures described in Sections 2 and 5.

The bounded influence estimate of the projected magnetic field is

    b̂ = u_1 Q (Q^H u_1 Q)^{-1} Q^H (u_1 b)        (28)

where u_1 is an N×N diagonal matrix which contains the first stage bounded influence weights as defined in Section 6. The second stage in the solution is the bounded influence regression of e on b̂ to give

    ẑ## = (b̂^H u b̂)^{-1} (b̂^H u e)        (29)

where u = u_1 u_2, u_2 is the second stage bounded influence weight matrix computed as in Section 6, and u_1 remains fixed at the final values from the first stage solution.

8. Pre-screening of Data

The procedures described in Sections 3-7 work well unless a large fraction of the data are quite noisy, as can occur when the natural signal level is comparable to or below the instrument noise level. This frequently occurs in the so-called dead bands between 0.5-5 Hz for MT and 1000-10000 Hz for audiomagnetotellurics (AMT), where natural source electromagnetic spectra display relative minima. In this instance, robust or bounded influence estimators can introduce bias into the response tensor beyond that which would ensue from the use of ordinary least squares. This occurs because most of the data are noisy and the unusual values detected and removed by data-adaptive weighting are actually those rare ones containing useful information.

The solution to this difficulty is the addition of a pre-screening step which selects only those data which have an adequate signal-to-noise ratio, applied prior to a robust or bounded influence estimator. Three approaches have been suggested, and others will undoubtedly evolve as future situations require. Egbert and Livelybrooks (1996) applied a coherence screening criterion to single station MT dead band data to select only those data segments with high electric to magnetic field multiple coherence for subsequent robust processing. Garcia and Jones (2002) pre-sorted AMT data based on the power in the magnetic field channels, selecting only those values which exceed the known instrument noise level by a specified amount. Jones and Spratt (2002) pre-selected data segments whose vertical field power was below a threshold value to minimize auroral source field bias in high latitude MT data. All of these studies demonstrated substantially better performance for data-adaptive weighting schemes after pre-sorting. Similar pre-processing stages can be applied to data with single or multiple reference sites, although the number of data sets and hence the number of candidate selection criteria are increased.
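A toy simulation (hypothetical noise model, not data from the paper) shows why the two-stage projection removes correlated noise that biases ordinary least squares: the cultural noise `nc` contaminates both the local magnetic and electric channels, but not the remote reference.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 2000
z = np.array([1.0 + 0.0j, 0.0 + 1.0j])          # hypothetical response row

b0 = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))  # natural field
Q = b0.copy()                                    # clean remote reference (assumed)
nc = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))  # cultural noise
b = b0 + 0.5 * nc                                # local magnetic field sees the noise
e = b0 @ z + 0.5 * (nc @ np.array([0.7, 0.3]))   # electric field shares the noise

z_ls = np.linalg.lstsq(b, e, rcond=None)[0]      # biased by the correlated noise

# first stage (19): project the local field onto the span of the reference
bh = Q @ np.linalg.solve(Q.conj().T @ Q, Q.conj().T @ b)
# second stage (20): regress e on the noise-free projection
z_2s = np.linalg.solve(bh.conj().T @ bh, bh.conj().T @ e)
```

After the first stage, the residual noise in `e` is uncorrelated with the projected predictor `bh`, which is the condition the Gauss-Markov theorem requires.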

9. Statistical Verification

Quantitative tests for the presence of outliers and leverage points do not presently exist for realistic situations where anomalous data occur with multiplicity and intercorrelation. In addition, the difficulty of devising such tests rises rapidly with the number of available data and when the actual distribution of the outliers and leverage points is unknown. This means that a priori examination of a data set for bad values is not feasible, and hence it is necessary to construct a posteriori tests to validate the result of a bounded influence analysis. Further, due to the inherently nonlinear form of bounded influence regression solutions like (24) and (29) and because they implicitly involve the elimination of a significant fraction of the original data set, it is prudent to devise methods for assessing the statistical veracity of the result. These can take many forms, as described in Belsley et al. (1980) and Chatterjee and Hadi (1988). In an MT context, the most useful have proven to be the multiple coherence suitably modified to allow for remote reference variables and adaptive weights, and quantile-quantile plotting of the regression residuals and hat matrix diagonal elements against their respective target distributions for Gaussian data.

In classical spectral analysis, the multiple coherence is defined as the proportion of the power in the response variable at a given frequency which is attributable to a linear regression with all of the predictor variables (e.g., Koopmans 1995, p. 155). In MT, this is just the coherence between the observed electric field e and its prediction ê given by (11). Care must be taken to incorporate the bounded influence weights for the first and second stages, as appropriate. Let the weighted cross power between vector variables x and y be defined as

    S_xy = x^H u y        (30)

For two-stage processing, either x or y correspond to b̂ and hence will also implicitly contain the corresponding first stage weights. Using this notation, the generalized multiple squared coherence between the observed electric field e and its predicted value is

    γ²_eê = [ S_eb̂ (S_b̂b̂)^{-1} S_b̂e ]² / [ S_ee S_b̂e^H (S_b̂b̂^H)^{-1} S_b̂b̂ (S_b̂b̂)^{-1} S_b̂e ]        (31)

where b̂ is replaced by b_r for single site remote reference processing. In either instance, (31) is a complex quantity.
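For the unweighted single-site case, the multiple squared coherence reduces to a simple ratio between the observed and predicted electric fields; a sketch with arbitrary simulated values (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1024
b = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
z = np.array([1.0 + 0.5j, -0.3 + 0.1j])
e = b @ z + 0.3 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

# predicted electric field e_hat = H e, with H the hat matrix of (11)
ehat = b @ np.linalg.solve(b.conj().T @ b, b.conj().T @ e)

# multiple squared coherence between observed and predicted electric field
coh2 = abs(e.conj() @ ehat) ** 2 / ((e.conj() @ e).real * (ehat.conj() @ ehat).real)
```

Because H is a projection, this ratio equals the fraction of the power in e explained by the regression, so it falls between 0 and 1 and approaches 1 as the noise vanishes.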

Its amplitude is analogous to the standard multiple squared coherence and its phase is a measure of the similarity of the local and remote reference variables. In most instances, the phase should be very close to zero; a significant nonzero phase is indicative of source fields with spatial scales that are significant compared to the separation between the local and reference magnetic field sites. When b̂ is replaced by the local magnetic field b, (31) reduces to the standard multiple squared coherence, and implicitly has zero phase.

A second and essential tool for analyzing the results of a bounded influence MT analysis, either with or without two-stage processing, is comparison of the original and final quantile-quantile plots of the residuals and the hat matrix diagonal. Quantile-quantile or q-q plots compare the statistical distribution of an observed quantity with the corresponding theoretical entity, and hence provide a qualitative means for assessing the statistical outcome of robust processing. The N quantiles of a distribution divide the area under a pdf into N+1 equal area pieces, and hence define equal probability intervals. They are easily obtained by solving

    F(q_j) = (j − 1/2) / N        (32)

for q_j, where j=1,...N, and F(x) is the appropriate cdf. A q-q plot compares the quantiles with the order statistics obtained by ranking and sorting the data, along with a suitable scaling to make them data unit independent. The advantage of q-q plots over some alternatives is that it emphasizes the distribution tails; most of a q-q plot covers only the last few percent of the distribution range.

For residual q-q plots, it is most useful to compare the quantiles of the Rayleigh distribution with the order statistics of the residuals scaled so their second moment is 2, corresponding to that expected of a Rayleigh variate. For hat matrix diagonal q-q plots, the quantiles of the β(p, N−p) distribution may be compared to the order statistics with both quantities scaled by p/N, so a value of 1 corresponds to the expected value.

The use of data-adaptive weighting which eliminates a fraction of the data requires that the quantiles be obtained from the truncated form of the original target distribution, or else the result will inevitably appear to be short-tailed. The truncated distribution is easily obtained from the original one. Suppose data are censored in the process of robust or bounded influence weighting using an estimator such as those described in Sections 3, 5, 6 or 7.
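The Rayleigh residual q-q comparison can be sketched as follows, using (32) together with the unit-scale Rayleigh cdf F(q) = 1 − exp(−q²/2); the residual sample here is simulated, and the straight-line check via a least squares slope is an illustrative stand-in for visual inspection.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 2000
# simulated residual magnitudes: unit-scale Rayleigh, so E[x^2] = 2
x = np.sort(rng.rayleigh(scale=1.0, size=N))

# theoretical quantiles from (32) with the Rayleigh cdf F(q) = 1 - exp(-q^2/2)
j = np.arange(1, N + 1)
q = np.sqrt(-2.0 * np.log1p(-(j - 0.5) / N))

# a straight, unit-slope q-q relation indicates the target distribution fits
slope = np.polyfit(q, x, 1)[0]
```

Long-tailed MT residuals would instead show sharp upward departures of the order statistics `x` from the quantiles `q` at the right end of the plot.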

Let f_X(x) be the pdf of a random variable X prior to censoring; this may be the Rayleigh distribution for the residuals or the beta distribution for the hat matrix diagonal. After truncation, the pdf of the censored random variable X is

    f̃_X(x) = f_X(x) / [ F_X(b) − F_X(a) ]        (33)

where a ≤ x ≤ b and F_X(x) is the cdf for f_X(x). Let N be the original and M be the final number of data, so that M = N − m1 − m2, where m1 and m2 are the number of data censored from below and above, respectively. Suitable choices for a and b are the m1-th and (N−m2)-th quantiles of the original distribution. The M quantiles of the truncated distribution can then be computed from that of the original one using

    F_X(q_j) = [ F_X(b) − F_X(a) ] (j − 1/2) / M + F_X(a)        (34)

where j=1,....M.

Whether the data have been censored or not, a straight line q-q plot indicates that the residuals or hat matrix diagonal elements are drawn from the target distribution. Data which are inconsistent with the target distribution are suggested by departures from linearity which are usually manifest as sharp upward shifts in the order statistic. With MT data, this is often extreme, and is diagnostic of long-tailed behavior where a small fraction of the data occur at improbable distances from the distribution center and hence will exert undue influence on the regression solution. The residual q-q plots from the output of a robust or bounded influence estimator should be approximately linear or slightly short tailed to be consistent with the optimality requirements of the Gauss-Markov theorem. However, the theorem does not prescribe a distribution for the predictor variables, and hence there is not a requirement for the hat matrix diagonal distribution to be β(p, N−p) unless the predictors are actually Gaussian. Nevertheless, hat matrix diagonal q-q plots are useful in detecting extreme values, as will be shown in the next section.

Distributions and confidence limit estimates for the quantiles are given by David (1981). The qualitative use of q-q plots can be quantified by testing the significance of a straight line fit utilizing these results. Alternately, nonparametric tests for the goodness-of-fit to a target distribution of the Kolmogorov-Smirnov type can be applied.
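Equations (33)-(34) translate directly into code; the censoring counts below are arbitrary assumptions for illustration, using the beta(p, N−p) target appropriate to hat matrix diagonal q-q plots.

```python
import numpy as np
from scipy.stats import beta as beta_dist

p, N = 2, 500
m1, m2 = 0, 25                 # suppose the 25 largest values were censored
M = N - m1 - m2

# a and b: the m1-th and (N - m2)-th quantiles of the original beta(p, N-p)
a = beta_dist.ppf(m1 / N, p, N - p)
b = beta_dist.ppf((N - m2) / N, p, N - p)
Fa, Fb = beta_dist.cdf(a, p, N - p), beta_dist.cdf(b, p, N - p)

# eq. (34): map (j - 1/2)/M onto [Fa, Fb], then invert the original cdf
j = np.arange(1, M + 1)
qj = beta_dist.ppf((Fb - Fa) * (j - 0.5) / M + Fa, p, N - p)
```

The resulting `qj` are the quantiles against which the M surviving order statistics should be plotted; using the untruncated quantiles instead would make the censored sample look artificially short-tailed.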

10. Examples

The first example data set is taken from site 006 of a 150-km-long, east-west wideband MT Lithoprobe (BC87) transect in British Columbia obtained in 1987 and described by Jones et al. (1988) and Jones (1993). These data have been used extensively in studies of distortion and its removal; see Chave and Jones (1997) for a summary. The data were sampled at a 12 Hz rate for about 17 h and include a remote reference located 1 km from the local site. Each data series was processed into Fourier transform estimates at selected frequencies using variable section lengths (see Section 3) after prewhitening with a five term robust AR filter and tapering with a Slepian sequence with time-bandwidth = 1. Figure 2 compares the xy (where x is north and y is east) apparent resistivity and phase estimates ẑ*_r and ẑ#_r obtained using the ordinary robust (hereafter OR) and bounded influence (hereafter BI) remote reference estimators, respectively, as described in Sections 5 and 6. The BI result was computed with the cutoff parameter in (26) taken as the 0.99999 (or upper 0.001%) point of the beta distribution with (2, N−2) degrees of freedom. In general, the OR and BI results are similar except in the band between 2-20 s, where the OR estimates are biased relative to the BI ones and the OR jackknife confidence limits are sometimes dramatically larger, reflecting the effect of leverage. There are also more subtle but statistically significant differences between the two types of estimates at periods shorter than 2 s, as shown below. The differences seen in Figure 2 cannot be removed by coherence thresholding; pre-processing of the data to eliminate all data sections with electric to magnetic field squared multiple coherence under 0.9 has no effect beyond a slight increase in the confidence limits due to a decrease in the effective degrees of freedom.

Figure 3 shows an example of the severe bias that can ensue from prewhitening with a least squares AR filter rather than the robust AR filter described in Section 2. The data and BI methodology are those used for Figure 2, and the only difference between the two results is in the prewhitening. Note the erratic results at periods under 2 s (where the data power spectral density is weakest) obtained with the least squares prewhitening filter.

Figure 4 shows a complex plane view of various response estimates for these data at a period of 5.3 s. This period is in the middle of the band where substantial differences between the confidence limits from the OR and BI estimators obtain. The ordinary least squares (hereafter OLS) estimate given by (4) is compared to the OR and BI results; the latter are shown with the cutoff

parameter in (26) taken as the 0.999, 0.9999, and 0.99999 (upper 0.1%, 0.01%, and 0.001%, respectively) points of the beta distribution for (2, N−2) degrees of freedom, where N=4342 is the number of estimates. There is no statistically significant difference between the estimates with different values of the BI cutoff parameter, indicating that its selection is not critical. In Figure 4, both the OLS and OR estimates display large uncertainties, reflecting heteroscedasticity that is not removed by data weighting based entirely on the size of the regression residuals despite the elimination of 7.2% of the data. The BI estimator removes an additional 6.8%, 8.5%, and 11.5% of the data as the cutoff varies between 0.99999 and 0.999, indicating that only a small fraction of the data is responsible for the heteroscedasticity. Further, the standard error is reduced by more than a factor of 34 over the OR result from bounding the influence.

Figure 5 shows analogous complex plane responses from the region where more subtle response function differences are observed at a period of 0.89 s. There are large distinctions between both the OLS and OR and the OR and all of the BI estimates, although the relative standard error is less variable than for Figure 4. In this instance, the M-estimator solution displays a substantially smaller standard error than both the OLS and BI values. It is also markedly biased compared to the BI result; the M-estimator solution differs from the BI one by more than 6 standard errors of the latter.

Figures 6 and 7 compare residual and hat matrix quantile-quantile plots for the OLS and 0.99999 BI estimates at 5.3 s shown in Figure 4. The OLS q-q plots in Figure 6 are both markedly long tailed, indicating the presence of severe outliers and leverage points, and in fact the hat matrix distribution appears to be more like a long-tailed version of log beta than long-tailed beta. The most serious outliers are about 50 standard deviations from the Rayleigh mean, while the most serious leverage points are over 1000 times the expected value of the hat matrix diagonal. In contrast, the BI residual q-q plot in Figure 7 is only weakly long-tailed, and the upward concavity of the result is indicative of a residual distribution which is slightly but pervasively longer tailed than Rayleigh rather than the presence of significant outliers. The OR residual q-q plot is indistinguishable from that in Figure 7, while the OR hat matrix q-q plot is very similar to that in Figure 6, indicating that robust weighting has little influence on leverage points for this data set. It is failure to eliminate leverage effects which accounts for the large confidence limits on the OR estimate in Figures 2 and 4. The BI hat matrix q-q plot is approximately log beta with some smaller than expected values at the lower end, but the huge leverage points at the upper end of the distribution

have been eliminated. It is easy to modify the bounded influence estimator to eliminate unusual data at both the lower and upper ends of the distribution, but the ensuing estimates are not changed from those shown in Figure 4.

Figure 8 compares OLS and OR q-q plots for the hat matrix diagonal corresponding to the 0.89 s estimates shown in Figure 5. Comparison of the two shows that robust weighting has actually increased the size of the most extreme leverage points, possibly accounting for the difference between the OR and BI results in Figure 5. This phenomenon is typical of OR procedures rather than unusual, and is not surprising given the form of (24), which is independent of any leverage measure. In fact, all of the short period responses in Figure 2, where the OR and BI estimates show large statistical differences, display this effect, indicating serious bias in the absence of leverage control.

The second example is taken from the Southeast Appalachians (SEA) transect extending from South Carolina to Tennessee collected in 1994 (Wannamaker et al. 1996). The data sets used here were part of the long period MT effort described by Ogawa et al. (1996), and consist of 5 s sampled, 5 component MT data collected contemporaneously at many sites. The local site will be taken as SEA335 (36°12′17″N, 81°55′20″W), which is located within the Blue Ridge Mountain belt, and the remote sites will be taken as SEA320 (36°38′52″N, 82°19′05″W) and SEA360 (34°53′08″N, 80°16′30″W), located 61 and 209 km to the northwest and southeast of the local site, respectively. The data have been rotated into an interpretation coordinate system, so that the x-direction strikes N50°E (or approximately along the strike direction for the Appalachian Mountains) and the y-direction strikes N140°E. All of the data are fairly clean and free of obvious problems, although the local site does display strong current channeling such that the Zxy apparent resistivity is about 1/100 the Zyx apparent resistivity. In what follows, all spectral estimates were computed using a time-bandwidth 3 Slepian sequence after prewhitening with a five term robust AR filter, and the fourth and sixth frequencies in each subsection were selected for processing to avoid contamination from unresolved low frequency components.

In fact, these data are sufficiently clean that nearly identical processing results are obtained with most standard methods, yielding a baseline result for comparison with various artificially contaminated versions of the data. As a demonstration, Figure 9 compares the yx component of the apparent resistivities and phases ẑ#_r using sites 320 and 360 separately as remotes. The results

for different choices of remote references in Figure 9 are indistinguishable. The estimate ẑ#_q computed with the two-stage formalism described in Sections 5 and 7 using sites 320 and 360 simultaneously as remotes is essentially identical, reflecting the high quality of the data and the low environmental noise level. Residual and hat matrix q-q plots show the presence of only weak outliers. In fact, the OR estimates ẑ*_r and ẑ*_q are nearly identical to the BI estimates shown in Figure 9, so that leverage effects in these data are minimal. The generalized coherence (31) is very high across the band while its phase remains near zero, reflecting the lack of source field complications at this mid-latitude site. Because of these data features, little has been gained by using multiple reference sites. Comparable results are obtained for the xy component (not shown), although the longest period estimates have large uncertainties due to the very weak electric field in the x direction. The response using site 320 as a remote will be used as a reference response in the sequel.

As a simulation of cultural effects, noise was added to both horizontal components of the site 360 magnetic field data. This consists of random samples drawn from a Cauchy distribution which have been filtered forward and backward with a fifth order Chebyshev Type 1 bandpass filter having cutoff points at about 200 and 1000 s. The MAD of the Cauchy noise was adjusted to be about 1/3 that of the data. The Cauchy distribution (Johnson et al. 1994, Chapter 16) is equivalent to Student's t distribution with one degree of freedom and is substantially longer tailed than the Gaussian. As a result, the additive noise appears to be quite impulsive, with peak values that exceed those of the original magnetic data by up to a factor of 10,000, and in fact the noise level is high enough to totally obscure the original data in a time series plot.

Figure 10 compares the Zyx BI response ẑ#_r using site 320 as a reference as in Figure 9 with the OR estimates ẑ*_r and ẑ*_q using site 360 (with noise) and sites 320+360 (with noise) as remote references, respectively. In the 200-1000 s band, both the apparent resistivities and phases for the site 360 remote estimates are totally useless, being wildly biased by the high noise level in the reference data. However, the dual remote reference method easily eliminates this effect, and is nearly identical to the result using clean remote reference data. This is a graphic illustration of the value of multiple remote references; good results will be obtained provided that at least one data set is uncontaminated at a given frequency.
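The contamination recipe above (impulsive Cauchy samples, zero-phase Chebyshev Type 1 bandpass, MAD rescaling) can be sketched as follows. This is an illustration, not the authors' code: the 1 dB passband ripple and the record length are assumed parameters not given in the text.

```python
import numpy as np
from scipy import signal, stats

def impulsive_bandpassed_noise(n, fs, f_lo, f_hi, target_mad, seed=0):
    """Cauchy-distributed samples filtered forward and backward with a
    fifth order Chebyshev Type 1 bandpass filter, then scaled to a
    chosen MAD.  The 1 dB ripple is an assumed filter parameter."""
    rng = np.random.default_rng(seed)
    raw = stats.cauchy.rvs(size=n, random_state=rng)
    sos = signal.cheby1(5, 1, [f_lo, f_hi], btype="bandpass",
                        fs=fs, output="sos")
    filtered = signal.sosfiltfilt(sos, raw)   # zero-phase (forward-backward)
    mad = np.median(np.abs(filtered - np.median(filtered)))
    return filtered * (target_mad / mad)

# Example: 5 s sampling (fs = 0.2 Hz); passband between 1/1000 and
# 1/200 Hz, i.e. periods of 200-1000 s; MAD set to 1/3 (in the paper,
# 1/3 of the MAD of the magnetic data being contaminated).
fs = 0.2
noise = impulsive_bandpassed_noise(20000, fs, 1 / 1000, 1 / 200,
                                   target_mad=1 / 3)
```

Because the Cauchy distribution has no finite moments, the scaling must use a robust scale such as the MAD; a standard deviation would be dominated by the largest spikes.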

Figure 11 shows the BI counterpart to Figure 10A, indicating that control of leverage has substantially reduced both the estimate bias and the size of the confidence limits even with only noisy site 360 as a remote. As for the OR approach, the dual remote estimate (not shown) is identical to that from uncontaminated data. While the multiple remote solution is clearly the preferred solution, bounding the influence of the remote data set can substantially improve the estimator performance and yield an interpretable response when the reference is noisy. However, the examples in Figures 10 and 11 suggest that multiple rather than single remote references do offer real advantages when one of the remote sites is noisy either over part of the period range of interest or part of the time interval. The two-stage regression approach which is implicit to the dual remote estimator automatically eliminates noisy frequency or time intervals, presuming that at least one clean remote is available.

Figure 12 shows the magnitude of the txx component of the BI magnetic transfer tensor relating the x-components of the local and remote magnetic fields from the first stage in this process for both noise-free and noisy site 360 data corresponding to Figures 9 and 11, respectively. With the original data, the site 360 transfer function is systematically smaller than its site 320 counterpart by up to a factor of 2 due to some combination of weaker coherence with or larger magnetic field variations at site 320. With the contaminated site 360 data, the site 360 transfer function essentially vanishes in the 200-1000 s band.

As a simulation of correlated cultural contamination, bandpassed Cauchy-distributed noise filtered in the same way as for Figures 10-12, polarized 30° clockwise from the x-axis and with a MAD comparable to the original data, was added to the local magnetic field variables at site 335. The noncausal impulse responses from the Zyx MT impedances for the uncontaminated data were computed by minimizing a functional of the impulse response subject to fitting the MT impedance; a maximum smoothness constraint in log frequency was imposed as described by Egbert (1992). Simulated electric field noise data were produced by convolving the impulse response with the noise time series. The size of the noise electric field was also varied by amplitude scaling. The resulting noise data were added to the site 335 data, varying the time duration of the noise from 0 to over 50% of the total. Unlike for the earlier examples, the additive noise is now correlated between the predictor and response variables, and will result in rising bias in the response functions as the noise duration increases unless it is eliminated during processing. Figure 13 compares BI processing on the original (clean) data with BI and two-stage processing on contami-

nated data with a noise fraction of 40% and the noise electric field scaled so that its MAD is about 80 times larger than the background. The BI result is erratic over the affected period band since the estimator has difficulty distinguishing good from bad data. Leverage weighting is of limited use because the hat matrix is effectively that from the remote reference data (see Section 5) at site 320 which are free of the noise, and the correlated noise on the local electric and magnetic field variables makes it difficult to distinguish data from noise as the noise fraction and amplitude rises. In contrast, the two-stage estimate remains stable with only modest increases in the confidence limits because the first stage robust weights distinguish and remove the noisy sections from the local magnetic field, and then the second stage robust weights eliminate any remaining noise in the electric field because the local magnetic field has been cleaned.

Figure 14 compares residual q-q plots for the BI and two-stage estimators at 853 s period, graphically illustrating the effectiveness of the latter. For both the BI and two-stage estimates, about 60% of the data have been eliminated in the contaminated band by weighting; the differences in Figure 13 reflect how completely the noise-contaminated sections have been removed. The two-stage algorithm operates effectively both with high noise amplitude and with a noise fraction approaching 50%.

11. Discussion

Based on the recent MT literature, it would not be controversial to suggest that all MT data processing should employ some type of robust estimator with a remote reference. The examples presented by Jones et al. (1989) provide graphic evidence of the value of a robust approach, and the state-of-the-art has certainly improved over the past decade to further strengthen the arguments. The authors are not aware of any instances where robust methods fail when applied intelligently rather than blindly. This means that individual attention must be paid to each data set, all of the time series and their power spectra need to be inspected, and the relevant physics of the induction process must be incorporated into the analysis plan (e.g., the nature of the source fields and any noise components must be understood in at least a gross sense). These steps lead to decisions about data editing, detrending, and the pre-processing stage (Section 8). A preliminary non-robust analysis is often useful because it allows the raw data coherences and residual/hat matrix statistics to be assessed (Section 9), and helps to further refine the requirement for pre-processing. It can also indicate whether a standard robust estimator will be sufficient.

Based on the results presented in this paper and the authors' experience over the past decade, it is suggested that standard use of a bounded influence remote reference estimator should supplant that of its robust counterpart. If the enumerated analysis principles are followed, the authors have not seen any examples where bounded influence estimators fail to yield results that are, at a minimum, comparable to robust estimates, and have seen numerous examples where bounded influence results are substantially better. This is especially true as the auroral oval is approached, where substorm effects can produce spectacular leverage (see Garcia et al. 1997, and the BC87 examples in Section 10), and in the presence of many types of cultural noise. Robust estimators which weight data sections on the basis of regression residuals frequently fail to detect leverage points, and can easily be dominated by a small fraction of high leverage data or even a single data point. Bounded influence estimators avoid this trap in a semi-automatic fashion. The bounded influence estimator of Section 6 has a breakdown point of ~0.5, and can handle data sets with an even larger fraction of high leverage values under some circumstances.

It is not difficult to rationalize the underlying log beta form for the non-BI hat matrix diagonal distribution seen in Figures 6 and 8. Ionospheric and magnetospheric processes are extremely non-linear, and more so at auroral than lower latitudes. As a result, their electromagnetic effects are the result of many multiplicative steps. This means that the statistical distribution of the magnetic variations will tend towards log normal rather than normal (Lanzerotti et al. 1991), and hence the resulting hat matrix diagonal will tend towards log beta rather than beta.

There are situations where correlated noise is present on both the local electric and magnetic fields where more elaborate processing algorithms may be required. The two-stage algorithm of Section 7 works well in many instances, as shown with a synthetic example in Section 10. Larsen et al. (1996) introduced a different two-stage procedure to deal with the correlated noise from DC electric trains that explicitly separates the MT and noise responses. In both instances, a remote magnetic reference which is free of the correlated noise source is required. In the Larsen et al. algorithm, a tensor relating the local and remote magnetic fields equivalent to t in (18) is first computed to give a correlated-noise-free b̂, so that the correlated noise in b is given by b − b̂. Both the magnetotelluric response z and the correlated noise tensor z̃ are obtained from

e = b̂ z + (b − b̂) z̃                                                    (35)

which explicitly separates e into magnetotelluric signal and correlated noise parts, plus an uncor-

related residual. Equation (35) is easily solved using the methods of this paper by recasting the problem so that b̂ and b − b̂ comprise a four-column b in (3), and z in (3) is replaced by a four parameter solution containing z and z̃. However, uncorrelated noise in b is implicitly combined with the correlated noise b − b̂, and hence the estimates of z and z̃ will always be biased to some degree. This bias will rise as the relative size of the correlated noise to signal increases, although this is hard to quantify. The two-stage estimate is not biased because the correlated noise tensor z̃ is never explicitly computed. A third alternative is the robust principal components method of Egbert (1997), which can detect contamination from correlated noise and yield good MT responses in some cases. It is presently unclear which approach will yield the best results with real data; evaluating this will require direct comparisons using troublesome data sets.

Using data from a continuously operating MT array in central California, Eisel and Egbert (2001) compared the performance of parametric and jackknife estimators for the confidence limits on the MT response function. Thei… metric theory for … weight jackknife, … computationally m… their favoring the … data from locatio… than Gaussian eve… sets. In this instan… limits, which cou… cally long tailed f… from mid-latitude … uncertainty, as we… Chave (1991), the … essence, this favo… rifice in reliability …

Acknowledgements

This work was …