A GENERALIZED KOLMOGOROV-SMIRNOV STATISTIC FOR DETRITAL ZIRCON ANALYSIS OF MODERN RIVERS
Oscar M. Lovera1 Department of Earth & Space Sciences, 3806 Geology Building, University of California, Los Angeles, Los Angeles, CA 90095, USA, Tel: 310-206-2657, FAX: (310) 825-2779 [email protected]
Marty Grove Department of Geological & Environmental Sciences, Stanford University, Stanford, CA, 94305, USA, [email protected]
Sara E. Cina Department of Earth & Space Sciences, University of California, Los Angeles, Los Angeles, CA 90095, USA, [email protected]
1Corresponding author
In preparation for Journal of Sedimentary Research (Feb. 5, 2009)
“Current Ripples” Format
Version 9 = 7100 Words (entire document)
ABSTRACT The Kolmogorov-Smirnov (K-S) statistic is widely used to test the null
hypothesis (i.e., are two distributions drawn from the same population?). In detrital
zircon provenance analysis of river systems, it is useful to have an equivalent statistical
measure for comparisons involving composite samples that collectively represent the
contributions of tributaries to regionally extensive river systems. We present a
generalized K-S statistic that depends on the proportional contribution and sample size of
individual distributions as well as the correlation between the respective individual
populations. Our generalized K-S statistic may be generally applied to analyze any type
of one dimensional data and can be calculated from either error weighted distributions
(i.e., cumulative probability density functions) or raw data series (i.e., cumulative
distribution functions). Analytical expressions are provided for end member cases in
which the individual populations are either completely independent or identical.
Although intermediate cases must still be tested by numerical analysis, the bounding end
members generally tightly constrain all possible solutions and thus permit conservative
evaluation of the null hypothesis. We demonstrate how the generalized K-S statistic can
be used with an example from the modern Marsyandi River system (central Nepal
Himalaya). Our results clarify the manner in which detrital zircon age
distributions from tectonically active areas may be used to constrain key parameters such
as erosion rates within the catchment of major rivers.
Key Words: Kolmogorov-Smirnov, mixing, detrital zircon, U-Pb age, modern river sand.
1. Introduction
Provenance analysis of fluvial depositional systems by detrital
geo/thermochronology is used to address a wide range of geologic problems within the
geosciences (Fedo et al., 2003; Bernet and Spiegel, 2004). Measurement of detrital
zircon age distributions has become the premier approach to characterize sedimentary
provenance with hundreds of studies now incorporating such data annually. Large data
sets that make provenance analysis a meaningful exercise became possible when
secondary ionization mass spectrometry (SIMS) methods began to be applied (e.g.,
Compston et al., 1984; Ireland et al., 1998; DeGraaff-Surpless et al., 2002; Grove et al.,
2003). Further advances were realized when detrital zircon data sets began to incorporate
trace element, Hf isotopic, (U-Th)-He age, and other types of information (e.g., Grimes et
al., 2007; Mueller et al., 2007; Campbell et al., 2005). Very high throughput
laser ablation, inductively coupled plasma mass spectrometry (LA-ICP-MS; e.g., Kosler
et al., 2002; Gehrels et al., 2006) techniques now routinely permit very large data sets to
be gathered with ease (e.g., Adams et al., 2007; Dickinson and Gehrels, 2008). Given
this explosion of data, it is of paramount importance that development of interpretative
methods for detrital zircon analysis keep pace.
Statistical analysis of detrital zircon age distributions has generally focused either
upon the extent of sampling required to adequately characterize an age distribution (e.g.,
Dodson et al., 1988; Vermeesch, 2004; Anderson, 2005) or upon methods of comparing
one measured detrital distribution with another (see discussion in Fedo et al., 2003). Of
the methods to compare distributions, the most widely applied approach for detrital
geochronology applications has been the Kolmogorov-Smirnov (K-S) statistic (e.g.,
Lovera et al., 1999; Berry et al., 2001; DeGraaff-Surpless et al., 2003; Amidon et al., 2005;
Fletcher et al., 2007; Prokopiev et al., 2008). The K-S test has a well understood
statistical basis but is insensitive to differences exhibited by the tails of distributions and
was not formulated to permit experimental error to be explicitly accounted for (Press et
al., 2002; Borradaile, 2003). Useful alternative approaches that have a less well-
understood statistical basis include measures of similarity and overlap (Gehrels, 2000),
multivariate analysis (Sircombe, 2000), and kernel functional estimation (Sircombe and
Hazelton, 2004). Sambridge and Compston (1994) have described approaches for
deconvolving overlapping age components. However, none of the above statistical
approaches can be readily applied to make comparisons involving mixtures of separately
measured age distributions.
Analysis of modern river systems is an important exercise for evaluating both our
understanding of present-day surficial processes (e.g., Brewer et al., 2003; Amidon et al.,
2005) as well as our ability to reconstruct past depositional systems (e.g., Fletcher et al.,
2007). With modern rivers, in which all critical details of the system are available for
study, it is possible to sample the detrital zircon age distributions of sands supplied by
tributaries with observable surface geology in their respective catchment basins to then
determine how these signals are combined to yield an integrated provenance signature
further downstream (Fig. 1). In order to meaningfully accomplish the task illustrated in
Figure 1, a valid statistical approach for making comparisons involving mixtures is
required. In this paper, we describe how the widely used K-S statistic can be
extended so that it is applicable to the analysis of mixtures. While our approach has
specifically been developed for provenance analysis, the method is general and can be
applied to compare natural and synthetic mixtures of one dimensional data in similar
problems in the geosciences and other disciplines.
2. Background
As summarized by Press et al. (2002), the Kolmogorov-Smirnov (K-S) statistic
can be used to determine whether two distributions are drawn from the same population
(i.e., the null hypothesis). The K-S test is specifically used to evaluate the validity of the
null hypothesis which holds that there is no significant difference between two
populations and that any observed difference relates to insufficient sampling or
experimental error. There are two types of comparisons that can be made. In the one-
sample test, a measured distribution is compared with a reference function. In the two-
sample test, two measured distributions are compared. The advantage of the K-S test
over more exacting statistical measures such as Student’s t test is that no assumptions
are made with respect to the nature of the distributions being compared (see Press et al.,
2002; Borradaile, 2003). The lack of a dependence upon the nature of the distribution is
essential when no expectations regarding the form of the distributions are possible.
In the K-S test, the null hypothesis is evaluated by measuring the absolute value
of the maximum difference between two cumulative distribution functions (CDF). In the
one-sample test, a sample CDF $S_N(x)$ is compared to a known distribution function $P(x)$ as
follows:
$$D = \max_{-\infty < x < \infty} \left| S_N(x) - P(x) \right|$$ [1]
In the two-sample test, D is defined as:
$$D = \max_{-\infty < x < \infty} \left| S_{N_1}(x) - S_{N_2}(x) \right|$$ [2]
where $S_{N_1}(x)$ and $S_{N_2}(x)$ are different data sets assumed to be drawn from the same
distribution function. Provided that the value of D is non-zero, its significance (QKS) can
be assessed from the following function (Press et al., 2002):
$$\mathrm{PROB} \equiv Q_{KS}(\lambda) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 \lambda^2}$$ [3]
where QKS varies from QKS(0)=1 to QKS(∞)=0. The significance of D given by [3] is the
maximum probability (PROB) of accidentally rejecting a true null hypothesis. Two
distributions are regarded as distinct (i.e., null hypothesis = false) if the probability that D
could be higher than the observed value is less than a specified significance level. In the
K-S test, the significance level is conventionally set at 0.05 (corresponds to 95%
confidence).
In the one sample test, the size of the sample is given by Ne. When Ne > 20, the
factor λ is well approximated by:
$$\lambda = \sqrt{N_e}\, D$$ [4]
For smaller data sets, λ is more accurately approximated by:
$$\lambda = \left( \sqrt{N_e} + 0.12 + 0.11/\sqrt{N_e} \right) D$$ [5]
In the two sample test, Ne is given by:
$$N_e = \frac{N_1 N_2}{N_1 + N_2}$$ [6]
where N1 and N2 are the respective sizes of samples 1 and 2.
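As a hedged illustration of [2] through [6], the following Python sketch evaluates D and PROB for two raw data series. The function names (`q_ks`, `effective_size`, `lam_from_d`, `ks_two_sample`) are our own illustrative choices, and the switch between [4] and [5] at Ne = 20 simply follows the rule of thumb stated above; this is a minimal sketch rather than the code used for the calculations in this paper.

```python
import math
from bisect import bisect_right

def q_ks(lam, max_terms=1000):
    """Significance function [3]: 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)."""
    if lam <= 0.0:
        return 1.0  # limiting value Q_KS(0) = 1
    total = 0.0
    for j in range(1, max_terms + 1):
        term = (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-15:
            break  # remaining terms are negligible
    return min(1.0, max(0.0, 2.0 * total))

def effective_size(n1, n2):
    """Effective sample size [6] for the two-sample test."""
    return n1 * n2 / (n1 + n2)

def lam_from_d(d, n_e):
    """Lambda from [4] when N_e > 20, otherwise from the small-sample form [5]."""
    root = math.sqrt(n_e)
    if n_e > 20:
        return root * d
    return (root + 0.12 + 0.11 / root) * d

def ks_two_sample(sample_1, sample_2):
    """Return (D, PROB) for two raw data series, following [2], [3] and [6]."""
    s1, s2 = sorted(sample_1), sorted(sample_2)
    # Both empirical CDFs are step functions, so the largest gap occurs at a data value.
    d = max(abs(bisect_right(s1, x) / len(s1) - bisect_right(s2, x) / len(s2))
            for x in s1 + s2)
    return d, q_ks(lam_from_d(d, effective_size(len(s1), len(s2))))
```

A call such as `ks_two_sample(ages_a, ages_b)` (with hypothetical lists of ages) returns the pair (D, PROB); the null hypothesis would be rejected at 95% confidence when PROB falls below 0.05.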
Figure 2A illustrates the dependence of PROB upon D and Ne. Curves are shown
for five effective sample sizes (Ne = 9, 25, 100, 400, and 900). Note that a “critical”
value for D (Dcrit) can be defined at PROB = 0.05. As Ne becomes larger, the curves shift
to lower Dcrit values to make the K-S test more sensitive. Figure 2B plots Dcrit as a
function of Ne. As indicated, the sensitivity of the K-S test rapidly increases for sample
sizes between 9 and 100. For comparisons between data sets defined by Ne ~ 100, the
threshold for distinguishing two distributions at 95% confidence involves a maximum
difference in the age distributions of 13% (Fig. 2B). Depending upon the geologic
questions being addressed, this sensitivity for the K-S test may be completely adequate.
However, if significantly greater sensitivity is demanded by the application, a much
larger increase in the size of the data set is required to achieve it. For example, a value
for Ne = 900 is needed to reduce the Dcrit value for distinguishing two distributions at
95% confidence to 4% (Fig. 2B).
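The dependence of Dcrit on Ne can be sketched numerically by solving QKS(λ) = 0.05 for λ and then inverting [4] (or [5] for small Ne). The Python fragment below is only an illustration under those assumptions; `d_crit` and the bisection bracket are our own hypothetical choices.

```python
import math

def q_ks(lam, max_terms=1000):
    """Significance function [3]."""
    if lam <= 0.0:
        return 1.0
    total = 0.0
    for j in range(1, max_terms + 1):
        term = (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-15:
            break
    return min(1.0, max(0.0, 2.0 * total))

def d_crit(n_e, alpha=0.05):
    """Critical D at significance level alpha for effective sample size n_e."""
    lo, hi = 0.0, 5.0  # Q_KS(0) = 1 and Q_KS(5) is effectively 0
    for _ in range(60):  # bisection: Q_KS decreases as lambda increases
        mid = 0.5 * (lo + hi)
        if q_ks(mid) > alpha:
            lo = mid
        else:
            hi = mid
    lam_crit = 0.5 * (lo + hi)
    root = math.sqrt(n_e)
    # Invert [4] for N_e > 20, otherwise invert the small-sample form [5].
    scale = root if n_e > 20 else root + 0.12 + 0.11 / root
    return lam_crit / scale

for n_e in (9, 25, 100, 400, 900):
    print(n_e, round(d_crit(n_e), 3))
```

For Ne = 100 and Ne = 900 this sketch gives values close to the 13% and 4% thresholds quoted above.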
3. Incorporating Analytical Error into Cumulative Distribution Functions
In the K-S test, the CDF is calculated from the one dimensional data series
without accounting for experimental errors. Although various authors including
Silverman (1986) and Sircombe (2000) have proposed using a Gaussian Kernel
Probability (GKP) function to represent the CDF, the impact of using error weighted data
for the K-S test has not been formally evaluated. Below we provide a method for
estimating the magnitude of the correction to Dcrit (ΔDcrit) that is required to correctly use
the K-S test when error weighted data are employed.
For a sample of size N, with values $\mu_i \pm \sigma_i$, the GKP function is defined as the
normalized sum of all the Gaussian functions corresponding to each data point
(Silverman, 1986). The cumulative probability density function (CPDF) is thus given by
the integral of the GKP function as follows:
$$\mathrm{CPDF}(t) = \int_{-\infty}^{t} \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\sigma_i \sqrt{2\pi}} \, e^{-\frac{(t' - \mu_i)^2}{2 \sigma_i^2}} \, dt'$$ [7]
Weighting the data according to error produces a much smoother function than the CDF.
Use of CPDF thus produces smaller D values than would have been obtained if D had
been calculated from CDF’s. Because this increases the value of PROB in [3], the
sensitivity of the K-S test is decreased and it becomes more difficult to disprove the null
hypothesis.
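The smoothing effect of error weighting can be illustrated with a short Python sketch. Because the integral of each Gaussian in the CPDF is a normal cumulative distribution, the CPDF reduces to an average of error functions. The names `cpdf` and `cdf` and the two-grain example are hypothetical and serve only to illustrate the definitions above.

```python
import math

def cpdf(t, ages, sigmas):
    """Error-weighted CPDF: the mean of the normal CDFs centred on each age."""
    total = 0.0
    for mu, sigma in zip(ages, sigmas):
        total += 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    return total / len(ages)

def cdf(t, ages):
    """Raw cumulative distribution function: fraction of ages <= t."""
    return sum(1 for a in ages if a <= t) / len(ages)

# Hypothetical example: two grains at 100 and 200 Ma, each with a 5 Ma error.
ages, sigmas = [100.0, 200.0], [5.0, 5.0]
for t in (95.0, 100.0, 105.0, 150.0):
    print(t, cdf(t, ages), round(cpdf(t, ages, sigmas), 3))
```

In this example the raw CDF jumps from 0 to 0.5 at 100 Ma, whereas the CPDF rises gradually through 0.25 at the same age, which is why D values computed from CPDFs tend to be smaller.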
To correct for this artifact and maintain the sensitivity of the K-S test, we have
undertaken Monte Carlo simulations to quantitatively determine how Dcrit is impacted by
the average percent error (σav) associated with the data sets being compared. Figure 3
plots the magnitude of the shift (ΔDcrit) in Dcrit vs. the square root of σav. Individual
Monte Carlo results are shown as open circles for simulations involving Ne =100. The
solid line through these data is a linear best-fit. Best fits for simulations involving Ne =
25, 400, and 900 are also shown. Based upon these results, we have empirically derived
the following expression:
$$\Delta D_{crit} = 0.163 \sqrt{\sigma_{av} / N_e}$$ [8]
For σav = 1, ΔDcrit varies from 0.033 to 0.005 when Ne is increased from 25 to 900.
Increasing σav to 4 shifts this range to 0.065 to 0.011 over the same increase in Ne.
Thus, although the K-S test is quite sensitive both to σav and Ne for small Ne,
the influence of σav decreases as Ne is increased.
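The quoted values can be checked with a few lines of Python, assuming the empirical expression takes the form ΔDcrit = 0.163 (σav/Ne)^(1/2), which is the form that reproduces the numbers given in the text. The function name `delta_d_crit` is our own.

```python
import math

def delta_d_crit(sigma_av, n_e):
    """Empirical shift in D_crit, assuming the form 0.163 * sqrt(sigma_av / N_e)."""
    return 0.163 * math.sqrt(sigma_av / n_e)

for sigma_av in (1, 4):
    for n_e in (25, 900):
        print(sigma_av, n_e, round(delta_d_crit(sigma_av, n_e), 3))
```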
4. Generalizing the K-S test to Mixtures
The K-S test applies only to single distributions and there is no established
protocol for comparing distributions consisting of pooled data. In detrital zircon
provenance analysis of river systems, there is a need to develop a statistically valid test to
evaluate how sand delivered by tributaries collectively supplies the trunk river. Let X1,
X2… Xm represent the cumulative distribution functions (CDF) of the age distributions
sampled from the tributaries of a river system (Fig. 1). The sample sizes and weighting
coefficients associated with these individual samples are n1, n2… nm and φ1, φ2… φm,
respectively. For rivers, values of φ would represent the proportional contributions of
each tributary as estimated by catchment area, discharge, bed load, etc. The CDF of the
mixture (Xw) is defined as:
$$X_w = \sum_{i=1}^{m} \phi_i X_i ; \quad \text{where } \sum_{i=1}^{m} \phi_i = 1 \text{ and } N = \sum_{i=1}^{m} n_i$$ [7]
where N is the total number of measurements. The river population, Xw, can be assumed
to be a linear combination of the individual populations. Applying [2], we can calculate
the parameter D to evaluate the null hypothesis:
$$D = \max_{-\infty < x < \infty} \left| X_w(x) - X_b(x) \right|$$ [8]
where Xb is a random sample from the main trunk of the river. Because Xw(x) depends
both upon the nature of the distributions being mixed and the weighting coefficients (φi)
used to calculate the mixture, the significance function for D is no longer independent of
the distributions being compared. However, we have identified two limiting cases in
which the added degrees of freedom associated with mixtures can be expressed
analytically. These limiting cases bound the spectrum of possible solutions and thus
allow conservative evaluation of the null-hypothesis.
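The construction of the weighted mixture and of D can be sketched as follows for raw data series. This is a minimal Python illustration; the names `ecdf`, `mixture_cdf`, and `mixture_d` are our own assumptions, as are the small example data sets.

```python
from bisect import bisect_right

def ecdf(sample):
    """Empirical CDF of one raw data series."""
    ordered = sorted(sample)
    return lambda x: bisect_right(ordered, x) / len(ordered)

def mixture_cdf(samples, weights):
    """Weighted mixture X_w = sum_i phi_i * X_i, with the phi_i summing to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    cdfs = [ecdf(s) for s in samples]
    return lambda x: sum(w * f(x) for w, f in zip(weights, cdfs))

def mixture_d(samples, weights, trunk):
    """D = max |X_w(x) - X_b(x)| between the mixture and the trunk sample."""
    x_w = mixture_cdf(samples, weights)
    x_b = ecdf(trunk)
    # Both curves are step functions, so it suffices to check every data value.
    points = set(trunk).union(*samples)
    return max(abs(x_w(x) - x_b(x)) for x in points)

# Hypothetical tributaries and trunk sample.
tributaries = [[1.0, 2.0], [3.0, 4.0]]
trunk = [1.0, 2.0, 3.0, 4.0]
print(mixture_d(tributaries, [0.5, 0.5], trunk))
print(mixture_d(tributaries, [1.0, 0.0], trunk))
```

Scanning the weights over all admissible combinations and recording D for each is the raw-data analogue of the mixing calculations discussed in the test case.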
4.1 Limiting Cases
In presenting the limiting cases below, we distinguish between identical and
independent populations. The definition of identical and independent populations
relevant to the generalized K-S test is established by the correlation coefficient rij
between the populations Fi and Fj:
$$r_{ij} = \frac{\int F_i F_j}{\sqrt{\int F_i^2 \int F_j^2}} ; \quad i \neq j$$ [9]
For the generalized K-S test, we define populations Fi and Fj to be identical if rij=1 ∀ i≠j
and independent if rij=0 ∀ i≠j in [9] above.
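A discretized version of the correlation coefficient can be sketched as below. We assume here that Fi and Fj are age probability densities sampled on a common, evenly spaced grid (so that the grid spacing cancels) and that the denominator is the square root of the product of the two integrals, which is the normalization that yields rij = 1 for identical populations. The function name `correlation` is hypothetical.

```python
import math

def correlation(f_i, f_j):
    """Discretized r_ij for two densities sampled on the same evenly spaced grid."""
    numerator = sum(a * b for a, b in zip(f_i, f_j))
    denominator = math.sqrt(sum(a * a for a in f_i) * sum(b * b for b in f_j))
    return numerator / denominator

# Identical densities give r = 1; densities with no overlap give r = 0.
print(correlation([0.0, 1.0, 2.0, 1.0, 0.0], [0.0, 1.0, 2.0, 1.0, 0.0]))
print(correlation([1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]))
```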
4.1.1 Mixtures Drawn from Identical Populations
A trivial though important limiting case exists for the situation in which the
CPDF’s of all of the individual data sets used to form the mixture are identical. In this
special case, the CPDF of the mixture ($X_w$) is independent of the individual contributions
φi and is represented by:
$$X_w = \sum_{i=1}^{m} \frac{n_i}{N} X_i ; \quad N = \sum_{i=1}^{m} n_i$$ [10]
where ni and Xi are the size and CPDF of the ith data set in the mixture, and N is the total
number of measurements contributed to the mixture by the m data sets. Note that while
D and PROB can be calculated from [2] and [3] respectively, the size of the mixture
required for [4]-[6] will generally differ from the sum total number of analyses
contributed by the mixing components. The effective size of the mixture (Neff) can be
calculated from the individual sample sizes (Ni) used to construct the mixture
using the following first order approximation:
$$\frac{1}{N_{eff}} = \sum_{i=1}^{m} \frac{\phi_i^2}{N_i}$$ [11]
Only when $\phi_i = n_i / N$ will Neff = N. We present a second order approximation for Neff in the
Appendix (see [A1]).
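The first order approximation for Neff is straightforward to evaluate. The following Python sketch (with the hypothetical function name `n_eff` and made-up sample sizes) shows that Neff equals the total number of analyses only when each weight is proportional to its sample size, and is smaller otherwise.

```python
def n_eff(sizes, weights):
    """First order effective size of a mixture: 1 / N_eff = sum_i phi_i^2 / N_i."""
    return 1.0 / sum(w * w / n for w, n in zip(weights, sizes))

sizes = [50, 150]                   # hypothetical sample sizes, N = 200
print(n_eff(sizes, [0.25, 0.75]))   # weights proportional to sizes
print(n_eff(sizes, [0.50, 0.50]))   # equal weights give a smaller N_eff
```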
4.1.2 Mixtures Drawn from Independent Populations
A second limiting case is presented by the situation in which the CDF’s of the
individual populations are independent. When individual populations Xi are independent,
the PROB value of the mixture would be a product of the individual PROB values
associated with each distribution. In the first-order approximation, equations [3] and [4]
become:
$$1 - \mathrm{PROB}(D > D_{obs}) = \prod_{i=1}^{m} \left( 1 - Q_{KS}(\lambda_i) \right)$$ [12]
$$\lambda_i = \frac{\sqrt{n_i}\, D}{\phi_i}$$ [13]
Second-order approximations are presented in the Appendix A ([A2] thru [A6]). Figure 4
compares the significance functions for the two end-member cases when all other
variables (ni, φi) remain constant. PROB values for intermediate cases will be bounded
by the end-member solutions.
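The two end-member significance functions can be compared with a short Python sketch. It assumes the first order forms given above, namely λi = (ni)^(1/2) D/φi for independent populations and λ = (Neff)^(1/2) D for identical populations; the function names and the example values are hypothetical.

```python
import math

def q_ks(lam, max_terms=1000):
    """Significance function [3]."""
    if lam <= 0.0:
        return 1.0
    total = 0.0
    for j in range(1, max_terms + 1):
        term = (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-15:
            break
    return min(1.0, max(0.0, 2.0 * total))

def prob_independent(d, sizes, weights):
    """PROB for independent populations: 1 - prod_i (1 - Q_KS(lambda_i))."""
    product = 1.0
    for n, phi in zip(sizes, weights):
        if phi <= 0.0:
            continue  # lambda_i tends to infinity, so the factor is 1
        product *= 1.0 - q_ks(math.sqrt(n) * d / phi)
    return 1.0 - product

def prob_identical(d, sizes, weights):
    """PROB for identical populations, using the first order N_eff."""
    n_eff = 1.0 / sum(w * w / n for w, n in zip(weights, sizes))
    return q_ks(math.sqrt(n_eff) * d)

sizes, weights = [100, 100], [0.5, 0.5]
for d in (0.05, 0.10, 0.15):
    print(d, prob_identical(d, sizes, weights), prob_independent(d, sizes, weights))
```

For these example values the independent case returns the lower probability at a given D, so it rejects more mixtures than the identical case.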
5. Test Case: Drainages of the Central Nepalese Himalaya
5.1 Background
The Himalayan orogen provides an excellent setting for studying inter-
relationships between tectonics and surface processes including river evolution. One of
the most significant and laterally persistent structures within the central Himalaya is the
Main Central thrust (MCT). Although the MCT is thought to have initiated in the Early
Miocene (Hodges et al., 2000; Harrison et al., 1995, 1997), it is also associated with Plio-
Pleistocene activity based upon mineral cooling patterns and analysis of river profiles
(Seeber and Gornitz, 1983; Burbank et al., 2003; Catlos et al., 2001). Whether this recent
uplift is due to reactivation of the MCT itself, or is driven by focused erosion south of the
MCT at the location of the greatest monsoonal precipitation has been widely debated
(Burbank et al., 2003; Wobus et al., 2003).
Amidon et al. (2005) set out to constrain erosion rates on either side of the MCT
by analyzing modern sands from the Marsyandi River in central Nepal. The Marsyandi
River flows southward, perpendicular to the Himalayan orogen across the MCT toward
the Himalayan foreland basin. The catchments of several of its tributaries are confined to
the major Himalayan units: the Tethyan Himalaya (TH) consists largely of Cambrian to
Jurassic sediments; the Greater Himalaya (GH) contains similar age lithologies
metamorphosed to amphibolite facies; and the Lesser Himalaya (LH) has generally lower
grade Early Proterozoic and Archean metasediments (Le Fort, 1975). The TH is bounded
to the south by the north-dipping South Tibetan Detachment (STD) and structurally
overlies the GH. The GH, in turn, structurally overlies the LH across the MCT (Gansser,
1964; Le Fort, 1975).
Amidon et al. (2005) compared the integrated downstream detrital age
populations with mixtures calculated from key tributaries. Detrital results from two or
more tributaries were linearly mixed and compared to the observed downstream
population using an iterative approach to determine the optimal contribution
proportions of the drainages. The estimates of relative erosion rates between catchment
basins were a function of catchment area and zircon concentration of source lithologies.
The iterative approach involved minimization of the percent area mismatch between
smoothed PDF’s corresponding to the mixture and the integrated Marsyandi signature.
5.2 Application of the Generalized KS statistic
To illustrate our approach, we have selected Amidon et al.’s (2005) site K
population of the Marsyandi River (Fig. 5). Amidon et al. (2005) modeled their site K
river sand in terms of three components (see Table 1 and Fig. 5). Their site E sample was
collected along the main trunk of the Marsyandi River approximately 85 km upstream
from site K. The sand at site E is supplied by the TH and formations II and III of the GH
units (Fig. 5). Amidon et al.’s site F sample is from a small tributary of the Marsyandi
(Syange K.) that enters the Marsyandi 8 km downstream from site E. Site F was selected by
Amidon et al. (2005) to represent sediment eroded from formation I of the GH which
structurally overlies the MCT. The Marsyandi flows across the MCT 7 km downstream
from Site F. To represent the LH unit that is overthrust by the MCT, Amidon et al.
(2005) selected two samples positioned 30 and 49 km south of the MCT (site H, Paudi K,
and site I, Chudi K, respectively). Because both tributaries exclusively drain the LH (Fig.
5), Amidon et al. (2005) combined the data from both to define the LH end member.
The PDF’s and CPDF’s of the three end members defined in Table 1 are shown
together with that of site K of the Marsyandi River in Figure 6A and 6B respectively.
Two points can be made. First, the Marsyandi River site K CPDF is bracketed by the
CPDF’s of the end members in Table 1. This is a necessary requirement for successful
mixing of these end member age distributions to reproduce that measured at site K. For
example, binary mixing of components 1 and 3 can potentially produce a mixture
indistinguishable from the site K age distribution, while mixtures of 1 and 2 or of 2
and 3 cannot. The second point is that the three mixing components are poorly
correlated with one another and thus easily distinguished on the basis of the conventional
K-S test (Table 2).
To illustrate the impact of both limiting cases defined above for the generalized
K-S statistic, we have computed PROB values associated with all possible mixtures of
Amidon et al.’s (2005) three end members in Figures 6C and 6D by varying the volume
contributions (φi) of each component subject to the constraint $\sum \phi_i = 1$. Figure 6C shows
the PROB values for the limiting case in which all of the mixing components are derived
from identical populations. The CPDFs of the mixtures were calculated from [10] and
D was calculated relative to the Site K CPDF. Corrections were applied for experimental
error using the ΔDcrit expression [8] and PROB values were calculated from [3] and [4] with
Neff determined from [11]. All mixtures outside the 5% probability contour (dashed line in
Fig. 6C) are distinguished from the Marsyandi site K age distribution at 95% confidence.
Based upon the results shown in Table 2, it is clearly more appropriate to apply
the limiting case in which the end members are assumed to be derived from independent
populations. Probabilities calculated under the assumption of independent populations
were calculated using [12]-[13] in place of [3]-[4]. Under the assumption of identical
populations, 17% of the solutions were indistinguishable from the Marsyandi site K
distribution. When the assumption of independent populations was applied, the
percentage of indistinguishable mixtures decreased by a factor of two to 8.4%.
6. DISCUSSION
Our generalization of the Kolmogorov-Smirnov statistic enables its use for
comparisons involving mixtures of one dimensional, error-weighted data. The approach
is general and broadly applicable to problems requiring statistical comparisons of
composite distributions consisting of weighted individual data sets for which no
assumptions can be made regarding how the data are distributed. While the approach is
general, it is also uniquely suited for detrital zircon U-Pb age provenance analysis that
involves modeling the net contributions of individual tributaries to large rivers. Such an
endeavor represents ground truth in provenance studies and is essential for testing
hypotheses that are posed in efforts to better understand the source regions of ancient
depositional systems for which far less geologic context has been preserved.
In the example provided, we have focused upon defining the family of statistically
equivalent solutions at 95% confidence. This is readily illustrated for three component
mixing as shown in Figures 6C-6D. While calculations involving a greater number of
mixing end members become progressively more laborious and difficult to represent
graphically, the same principles apply. These will be addressed in greater detail in a
subsequent publication. Below we address two simple concepts: best-fit solutions and
appropriate use of the calculations to constrain parameters such as differential erosion
within the river catchment.
6.1 Best-fit Mixtures
While it is possible to use the K-S test to algebraically identify the mixture which
yields maximum probability, this approach is ad hoc (Lovera et al., 1999; Amidon et al.,
2005; Fletcher et al., 2007). Because the statistical foundation of the K-S test only deals
with the evaluation of the null hypothesis, use of the K-S test for optimization (i.e.,
identifying the best-fit mixture within the 5% probability curve) is not firmly supported
by the underpinnings of the method. For example, because D generally depends upon the
extent to which the distributions defining the mixture are correlated, the composition of
the “best-fit” mixture corresponding to the lowest D value will not necessarily coincide
with that associated with the highest probability because the significance function has a
nonlinear dependence on both the sizes (Ni) and the relative contributions (φi).
In spite of the above considerations, it is possible to calculate best-fit solutions.
For the sake of illustration, we have plotted several key mixtures in Figures 6C and 6D.
Our “best-fit” (i.e., maximum probability) mixture is indicated by the filled red circle in
these ternary plots. As indicated in Table 1 our maximum probability mixture plots very
close to the best-fit mixture determined by Amidon et al. (2005; see filled blue square in
Fig. 6B). While the near coincidence of these results produced from two very different
approaches conveys the appearance of a robust conclusion, we emphasize that all
mixtures that yield PROB values in excess of 5% are indistinguishable solutions from the
Marsyandi site K age distribution at the 95% confidence level. This has important
implications for estimating erosion rates from “best-fit” solutions, as discussed below.
6.2 Estimating Erosion Rates
Amidon et al. (2005) sought to estimate erosion rates structurally above and
below the MCT. They considered that three primary factors influenced mixing of their
end member age distributions to yield the integrated composition of the sand at site K: (1)
exposure area of the contributing geologic units; (2) the average zircon concentration
within these units; and (3) relative erosion rates of these units. Amidon et al. (2005)
constrained (1) and (2) by measurement so that they could calculate the relative erosion
rates. When they weighted their measured detrital zircon age distributions by catchment
area alone, they obtained the filled star in Figures 6C, 6D. Our calculations indicate that
their result weighted by exposure area alone is statistically indistinguishable from the site
K age distribution at 95% confidence. This would imply that if exposure area alone were
the only important factor, there would be no valid statistical argument to presume that the
contributing geologic units were differentially eroded.
When Amidon et al. (2005) weighted their age distributions by both exposure area
and zircon concentration, they predicted a composition represented by the filled triangle in
Figs. 6C, 6D. They qualitatively concluded that their predicted age distribution was
significantly different from the site K age distribution. Our work confirms that Amidon
et al.’s predicted composition is easily distinguished from the measured composition at
site K (Table 3). To estimate differential erosion rates, Amidon et al. (2005) interpreted
misfit of their “predicted” and “measured” age distributions at site K as being due
entirely to differential erosion of the contributing geologic units represented by the
mixing components. For their site K example, they concluded that relative erosion rates
of LH were about twice as high as expected from exposure area alone relative to the other
end members.
While Amidon et al.’s (2005) conclusion of very high erosion rates of LH below
the MCT may be valid, the precision with which differential erosion of the contributing
geologic units can be estimated must be taken into account. Following our approach, the best
method for estimating the relative erosion rates of each contributing geologic unit
represented by the mixing end members is to calculate the locus of relative erosion rates
that correspond to mixtures yielding > 5% probability. In the case of the Marsyandi site
K age distribution, mixing is best approximated by the limiting case for the generalized
K-S statistic that assumes independently sourced age distributions for the end members
(i.e., Fig. 6D). The predicted contributions ($\phi_i^P$), relative erosion rates ($\phi_i^R$), and the
weighting factors that reflect relative drainage area and zircon concentration ($\phi_i$) are
related by:
$$\phi_i = \frac{\phi_i^P \phi_i^R}{\sum_{j=1}^{3} \phi_j^P \phi_j^R}$$ [14]
(see Appendix B). By applying Amidon et al.’s (2005) “predicted” contributions for
components 1, 2, and 3 respectively (see Table 3), we can then assign to any vector φP of
relative erosion rates, the probability associated to the relative contribution vector
φ resulting from [14]. The resulting probabilities are plotted in Figure 7. The “best-fit”
relative erosion rates associated with the maximum probability we determine agree quite
well with Amidon et al.’s (2005) estimate. The more important point, however, is that
because all relative erosion rates associated with > 5% probabilities are equally valid,
there is insufficient constraint from the detrital zircon results to conclude that erosion
rates from the LH were necessarily higher than from formation I of the overlying GH.
Equivalent acceptable solutions for relative erosion rates shown in Figure 7 range from
37:04:58 to 03:77:19 for components 1, 2, and 3 respectively (see Table 3). This range of
values permits both very high and anomalously low LH erosion rates and is thus
insufficient to arrive at a definitive conclusion on the basis of the existing detrital zircon
data set alone.
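The mapping in [14] from a trial set of relative erosion rates to the resulting mixing weights can be sketched in Python as follows. We assume the relation multiplies each predicted contribution by its relative erosion rate and renormalizes so that the weights sum to one. The function name `mixing_weights` and all numerical values are hypothetical and are not taken from Table 3.

```python
def mixing_weights(predicted, erosion):
    """Normalized products phi_i = phi_P_i * phi_R_i / sum_j (phi_P_j * phi_R_j)."""
    products = [p * r for p, r in zip(predicted, erosion)]
    total = sum(products)
    return [x / total for x in products]

predicted = [0.5, 0.3, 0.2]   # hypothetical predicted contributions
print(mixing_weights(predicted, [1.0, 1.0, 1.0]))   # uniform erosion leaves them unchanged
print(mixing_weights(predicted, [1.0, 1.0, 2.0]))   # doubling erosion of component 3
```

Each weight vector obtained this way can then be passed to the generalized K-S calculation, so that a probability is attached to every trial set of relative erosion rates.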
In order to more confidently address the issue of the spatial dependence of erosion
rates within the Marsyandi catchment, the sensitivity of the generalized K-S statistic
needs to be enhanced to further limit the range of possible solutions. The sensitivity of
the statistic depends significantly upon the relationship between the sample size defining
the end member age distributions and the manner in which the age distributions are
weighted. Optimum sampling for a mixture is obtained when $\phi_i = n_i / N$. In the present
example (see Tables 1 and 3), the sample sizes used to define the age distributions of end
members 1 (TH + Fm. II & III of GH) and 2 (Fm. I of GH) should be increased to bring
them into proper proportions with respect to end member 3 (LH). Once these
proportions have been achieved, overall sampling levels can be increased until the
sensitivity needed to adequately shrink the area of the region of 95% confidence has been
achieved.
7. CONCLUSIONS
1. Even though the K-S test was not developed for use with error-weighted distributions,
the artifacts that occur when cumulative probability density functions are used instead
of the raw age series (cumulative distribution functions) are effectively circumvented
if the measured value of D is corrected using [8] as described in the text.
2. While the K-S test was not designed to make comparisons involving pooled and
separately weighted age distributions, we have found two limiting analytical solutions
that allow the K-S test to be rigorously applied for mixtures of age distributions that
are either identical (correlation coefficient of one) or independent (correlation
coefficient of zero). Because these limiting solutions tend to tightly bound all
possible solutions, it is possible to conservatively apply the K-S test to make
comparisons involving mixtures of age distributions derived from populations that are
neither identical nor independent.
3. The K-S test can be used to define confidence limits in mixing calculations that bound
the range of indistinguishable solutions. Best-fit (i.e., maximum probability)
solutions based upon application of the K-S test provide an ad hoc basis for
constraining the outcome of mixing calculations (i.e., estimation of erosion rates). A
more conservative approach for using mixing calculations to constrain parameters
such as erosion rates involves use of the 95% confidence limit defined by the
generalized K-S test. The range of indistinguishable solutions bounded by the 95%
confidence limit can be reduced by optimizing sampling such that the number of
measurements defining an age distribution is proportional to the weighting coefficient
used during the mixing calculations.
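The sampling recommendation in conclusion 3 can be illustrated with a minimal Python sketch. It assumes the Max. PROB weights of Table 3 and the sample sizes of Table 1, and it holds fixed the end member that is most heavily sampled relative to its weight while raising the others. The function name `proportional_targets` and the round-up rule are illustrative choices, not part of the method described in the text.

```python
import math

def proportional_targets(sizes, weights):
    """Sample sizes >= the current ones with n_i (nearly) proportional to phi_i."""
    # The end member with the largest n_i / phi_i is already over-sampled
    # relative to its weight, so it sets the common scale.
    scale = max(n / w for n, w in zip(sizes, weights))
    # Small tolerance so floating-point round-off does not bump an exact integer up.
    return [math.ceil(w * scale - 1e-9) for w in weights]

# Table 1 sizes (Site E, Site F, Sites I + H) and Table 3 Max. PROB weights.
sizes = [89, 98, 178]
weights = [0.34, 0.35, 0.31]
targets = proportional_targets(sizes, weights)
print(targets)  # [196, 201, 178]
```

Under these assumptions, end members 1 and 2 would each need roughly 200 measurements to match the 178 already available for LH.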
Acknowledgments We thank Willy Amidon for discussions regarding his Marsyandi
River study and for making available details of the U-Pb analysis and modern sand
sampling. Jerome Gynum and George Gehrels are also thanked for discussions regarding
Monte Carlo simulations related to the K-S statistic. This work was made possible by
NSF grants to Lovera and Grove (Geochemistry-Petrology) and by an ExxonMobil grant
to Cina.
Appendix A
Second order approximation of the Generalized Kolmogorov-Smirnov Statistic
For small data sets (Ne < 20) a more accurate approximation to the K-S
significance function (see [3] and [4]) is obtained by introducing a second order
correction to the definition of the λ parameter. The mixture solutions for both cases
(independent and identical tributary populations) generalize straightforwardly as
follows:
1) Independent populations: The definition of λi in [12] is replaced by:
$$\lambda_i = \frac{D}{\phi_i}\left(\sqrt{n_i} + 0.12 + \frac{0.11}{\sqrt{n_i}}\right) \qquad [A1]$$
2) Identical populations: The definition of Neff in [10] is altered as follows. Since each
individual distribution is normalized by its weight φi, we first define an individual
effective sample size (Ne,i) as the value that will produce the same K-S significance value
for the measured D:
$$\frac{\lambda_i}{D} = \frac{1}{\phi_i}\left(\sqrt{n_i} + 0.12 + \frac{0.11}{\sqrt{n_i}}\right) \equiv \left(\sqrt{N_{e,i}} + 0.12 + \frac{0.11}{\sqrt{N_{e,i}}}\right) \qquad [A2]$$
Solving [A2] we can write Ne,i as:
$$N_{e,i} = \left[\frac{\lambda_i/D - 0.12 + \sqrt{(\lambda_i/D - 0.12)^2 - 4 \times 0.11}}{2}\right]^2 \qquad [A3]$$
The composite Neff is then defined as:
$$\frac{1}{N_{eff}} = \sum_{i=1}^{m} \frac{1}{N_{e,i}} \qquad [A4]$$
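The identical-populations recipe in [A2] to [A4] can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the larger root of the quadratic in the square root of Ne,i is taken, and the example inputs reuse the Table 1 sample sizes with the Max. PROB weights of Table 3.

```python
import math

def second_order_factor(n):
    """sqrt(n) + 0.12 + 0.11/sqrt(n), the bracketed term in [A1] and [A2]."""
    root = math.sqrt(n)
    return root + 0.12 + 0.11 / root

def effective_size(n_i, phi_i):
    """Individual effective sample size N_e,i from [A2], via the closed form [A3]."""
    ratio = second_order_factor(n_i) / phi_i          # lambda_i / D from [A2]
    b = ratio - 0.12
    root = (b + math.sqrt(b * b - 4 * 0.11)) / 2      # sqrt(N_e,i), larger root
    return root * root

def composite_neff(sizes, weights):
    """Composite effective sample size from [A4]: 1/N_eff = sum_i 1/N_e,i."""
    return 1.0 / sum(1.0 / effective_size(n, w) for n, w in zip(sizes, weights))

# Illustrative inputs: Table 1 sizes and Table 3 Max. PROB weights.
neff = composite_neff([89, 98, 178], [0.34, 0.35, 0.31])
print(round(neff, 1))
```

A quick consistency check is that a weight of one returns the original sample size, since [A2] then has the same form on both sides.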
In a similar way, the 2-sample extension of the statistic is straightforward for the identical
populations case, by defining Ne as the equivalent sample size (see [6]) formed from Neff and M:
$$N_e = \frac{N_{eff}\, M}{N_{eff} + M}.$$
However, the extension in the case of Independent populations is more complicated.
Using extensive Monte Carlo calculations we found that at least in the ternary case (m=3)
a good approximation is obtained when M is replaced by a weighted average Vi, where Vi
is calculated from the relative contributions φi as follows:
$$V_i = M \phi_i \prod_{j=1;\, j \neq i}^{m} (1 - \phi_j) \qquad [A5]$$
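The new definition of λi is then written as:
$$\lambda_i = \frac{D}{\phi_i}\sqrt{X_i} \quad \text{or} \quad \lambda_i = \frac{D}{\phi_i}\left(\sqrt{X_i} + 0.12 + \frac{0.11}{\sqrt{X_i}}\right); \quad \text{where } X_i = \frac{n_i V_i}{n_i + V_i} \qquad [A6]$$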
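A short Python sketch of [A5] and [A6] for the independent-populations case is given below. It is only an illustration: the function name is hypothetical, M is taken to be the size of the second sample, and the value M = 100 used in the example is a placeholder rather than a measured sample size. How the resulting λi values are combined into a probability follows [12-13] in the text and is not reproduced here.

```python
import math

def independent_lambdas(d, sizes, weights, m_size, second_order=True):
    """Return lambda_i for each end member following [A5] and [A6]."""
    lambdas = []
    for i, (n_i, phi_i) in enumerate(zip(sizes, weights)):
        # [A5]: V_i = M * phi_i * prod over j != i of (1 - phi_j)
        others = math.prod(1.0 - phi_j for j, phi_j in enumerate(weights) if j != i)
        v_i = m_size * phi_i * others
        # [A6]: X_i combines n_i and V_i like a two-sample equivalent size
        x_i = n_i * v_i / (n_i + v_i)
        root = math.sqrt(x_i)
        factor = root + 0.12 + 0.11 / root if second_order else root
        lambdas.append(d / phi_i * factor)
    return lambdas

# Illustrative inputs: D and weights from the Max. PROB row of Table 3,
# sizes from Table 1, and a hypothetical M = 100.
example = independent_lambdas(0.04, [89, 98, 178], [0.34, 0.35, 0.31], 100)
print([round(v, 3) for v in example])
```

For a single component (m = 1, φ1 = 1) the sketch reduces to the ordinary two-sample case with X = nM/(n + M), which is a useful sanity check on the reconstruction.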
Appendix B: Relationship between Relative Drainage Contributions φi, Predicted Area Contributions φiP, and Relative Erosion Rates φiR
The contribution of each drainage is assumed to be proportional to the product of the
predicted area contribution (Ai, a function of the exposed area and zircon
concentration) and the erosion rate of the area (Ei). Thus the relative contributions of
the drainages can be written as:
$$\phi_i = \frac{A_i E_i}{\sum_j A_j E_j},$$
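This relationship can be written in terms of the relative predicted area φiP and the relative
erosion rates φiR as follows:
$$\phi_i = \frac{\dfrac{A_i}{\sum_j A_j}\,\dfrac{E_i}{\sum_j E_j}}{\sum_k \dfrac{A_k}{\sum_j A_j}\,\dfrac{E_k}{\sum_j E_j}} \equiv \frac{\phi_i^P\, \phi_i^R}{\sum_k \phi_k^P\, \phi_k^R}$$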
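The relationship above can be inverted to estimate relative erosion rates from a mixing solution, since φiR is proportional to φi/φiP. The Python sketch below is a hedged illustration: the function names are hypothetical, and pairing the Max. PROB mixture of Table 3 with the "catchment area & zircon concentration" row as the predicted area contributions is an assumption made here purely for demonstration.

```python
def relative_erosion_rates(phi, phi_p):
    """Normalized phi_i^R, using phi_i^R proportional to phi_i / phi_i^P."""
    raw = [f / p for f, p in zip(phi, phi_p)]
    total = sum(raw)
    return [r / total for r in raw]

def drainage_contributions(phi_p, phi_r):
    """Forward relation: phi_i = phi_i^P phi_i^R / sum_k phi_k^P phi_k^R."""
    prod = [p * r for p, r in zip(phi_p, phi_r)]
    total = sum(prod)
    return [x / total for x in prod]

# Illustrative inputs taken from Table 3 (fractions rather than percent).
phi = [0.34, 0.35, 0.31]      # Max. PROB mixture (TH+GH II-III, GH I, LH)
phi_p = [0.43, 0.37, 0.20]    # area and zircon concentration weighting
phi_r = relative_erosion_rates(phi, phi_p)
print([round(r, 3) for r in phi_r])
```

Feeding the result back through `drainage_contributions` recovers the input mixture, which confirms that the two functions are mutually consistent.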
References Cited
Adams, C. J., Campbell, H. J., & Griffin, W. L., 2007. Provenance comparisons of Permian to Jurassic tectonostratigraphic terranes in New Zealand: perspectives from detrital zircon age patterns. Geol. Mag. 144, 701-729.
Amidon, W.H., Burbank, D.W., Gehrels, G.E., 2005. Construction of detrital mineral populations: insights from mixing of U-Pb zircon ages in Himalayan rivers, Basin Res. 17, 463-485.
Bernet, M., & Spiegel, C., 2004. Detrital thermochronology: Provenance analysis, exhumation, and landscape evolution of mountain belts. Boulder, Colo: Geological Society of America.
Borradaile, G. J., 2003, Statistics of Earth Science Data: Their Distribution in Time, Space, and Orientation 2nd edition, Springer.
Brewer, I. D., Burbank, D.W., & Hodges, K.V. (2003). Modelling detrital cooling-age populations: insights from two Himalayan catchments. Basin Research. 15 (3), 305-320.
Burbank, D.W., Blythe, A.E., Putkonen, J., Pratt-Sitaula, B., Gabet, E., Oskin, M., Barros, A., Ojha, T.P., 2003. Decoupling of erosion and precipitation in the Himalaya, Nature 426, 652-655.
Campbell, I.H., Reiners, P.W., Allen, C.M., Nicolescu, S., and Upadhyay, R., 2005. He-Pb double dating of detrital zircons from the Ganges and Indus Rivers: Implications for sediment recycling and provenance studies, Earth and Planetary Science Letters, 237, 402-432.
Catlos, E.J., Harrison, T.M., Kohn, M.J., Grove, M., Ryerson, F.J., Manning, C., Upreti, B.N., 2001. Geochronologic and thermobarometric constraints on the evolution of the Main central Thrust, central Nepal Himalaya, J. Geophys. Res., B 106 (8) 16177–16204.
Compston W., Williams I. S., Meyer C., 1984. U–Pb geochronology of zircons from lunar breccia 73217 using a sensitive high mass resolution ion microprobe. Journal of Geophysical Research 89, B525–B534.
DeGraaff-Surpless, K., Graham, S. A., Wooden, J. L., & McWilliams, M. O. (2002). Detrital zircon provenance analysis of the Great Valley Group, California: Evolution of an arc-forearc system. Geological Society of America Bulletin. 114, 1564-1580.
Dickinson, W.R. and Gehrels, G.E., 2008, U-Pb Ages of Detrital Zircons in Relation to Paleogeography: Triassic Paleodrainage. Journal of Sedimentary Research. 78, 745-764.
Dodson, M.H., Compston, W., Williams, I.S., Wilson, J.F., 1988. A search for ancient detrital zircons in Zimbabwean sediments. J. Geol. Soc. (Lond.) 145, 977– 983.
Fedo, C.M., Sircombe, K.N., Rainbird, R.H., 2003. Detrital zircon analysis of the sedimentary record. In: Hanchar, J.M., Hoskin, P.W.O. (Eds.), Zircon, Reviews in Mineralogy and Geochemistry, 53, 277–303.
Fletcher, J., Grove, M., Kimbrough, D., Lovera, O., & Gehrels, G., 2007. Ridge-trench interactions and the Neogene tectonic evolution of the Magdalena shelf and southern Gulf of California: Insights from detrital zircon U-Pb ages from the
Magdalena fan and adjacent areas. Geological Society of America Bulletin. 119, 1313-1336.
Gansser, A., 1964. Geology of the Himalayas. Wiley-Interscience, New York.
Gehrels, G.E., 2000. Introduction to detrital zircon studies of Paleozoic and Triassic strata in western Nevada and northern California. In: Soreghan, M.J., Gehrels, G.E. (Eds.), Paleozoic and Triassic Paleogeography and Tectonics of Western Nevada and Northern California, Geological Society of America Special Paper, 347, 1–17.
Grimes, C. B., John, B. E., Kelemen, P. B., Mazdab, F. K., Wooden, J. L., Cheadle, M. J., et al., 2007. Trace element chemistry of zircons From oceanic crust: A method for distinguishing detrital zircon provenance. Geology 35, 643-646.
Hodges, K.V., Hurtado, J.M., Whipple K.X., 2001. Southward extrusion of Tibetan crust and its effect on Himalayan tectonics, Tectonics 20 (6) 799– 809.
Ireland, T.R., Flöttmann, T., Fanning, C.M., Gibson, G.M., Preiss, V., 1998. Development of the early Paleozoic Pacific margin of Gondwana from detrital-zircon ages across the Delamerian orogen. Geology 26, 243–246.
Sircombe, K.N., 2000. Quantitative comparison of large sets of geochronological data using multivariate analysis: A provenance study example from Australia, GCA, 64, 1593-1616.
Kosler, J., Fonneland, H., Sylvester, P., Tubrett, M., & Pedersen, R. B., 2002. U-Pb dating of detrital zircons for sediment provenance studies-a comparison of laser ablation ICPMS and SIMS techniques. Chemical Geology. 182, 605-618.
Le Fort, P., 1975. Himalayas; the collided range; present knowledge of the continental arc, Am. J. Sci. 275A, 1– 44.
Mueller, P.A., Foster, D.A., Mogk, D.W., Wooden, J.L., Kamenov, G.D., and Vogl J.J., 2007, Detrital mineral chronology of the Uinta Mountain Group: Implications for the Grenville flood in southwestern Laurentia. Geology, 35, 431 - 434.
Press William H., Flannery, Brian P., Teukolsky, Saul A., and Vetterling, William T., 2002. Numerical Recipes: The Art of Scientific Computing, 2nd Ed., Cambridge University Press, pp. 997.
Prokopiev, A.V. Toro, J., Miller, E.L., and Gehrels, G.E., 2008; The paleo–Lena River—200 m.y. of transcontinental zircon transport in Siberia. Geology 36: 699-702.
Sambridge, M.S., Compston, W., 1994. Mixture modelling of multicomponent data sets with application to ion-probe zircon ages. Earth Planet. Sci. Lett. 128, 373– 390.
Seeber, L., Gornitz, V., 1983. River profiles along the Himalayan arc as indicators of active tectonics, Tectonophysics 92, 335–367.
Silverman, B. W., 1986. Density Estimation for Statistics and Data Analysis. CRC Press, Boca Raton, Fla. 175 pp.
Sircombe, K.N., 2000. Quantitative comparison of geochronological data using multivariate analysis: a provenance study example from Australia. Geochim. Cosmochim. Acta 64, 1593– 1619.
Sircombe, K.N., Hazelton, M.L., 2004. Comparison of detrital zircon age distributions by kernel functional estimation. Sediment. Geol. 171, 91– 111.
Vermeesch, P. (2005), Statistical uncertainty associated with histograms in the Earth sciences, J. Geophys. Res., 110, B02211, doi:10.1029/2004JB003479.
Vermeesch, P., 2004. How many grains are needed for a provenance study? Earth Planet. Sci. Lett., 224(3– 4), 441– 451.
Wobus, C., Hodges, K.V., Whipple, K.X., 2003. Has focused denudation sustained active thrusting at the Himalayan topographic front, Geology 31 (10) 861– 864.
Tables

Table 1 Conventional K-S Statistics of Individual Samples (Relative to Marsyandi Site K)

                                                        CPDF              CDF
Sample*       Units Sampled      σav (%)   Size Ni   D      %PROB      D      %PROB
Site E        TH + GH (II&III)   3.2       89        0.33   10^-4      0.34   10^-4
Site F        GH (I)             5.2       98        0.18   2.0        0.20   0.8
Sites I + H   LH                 2.6       178       0.58   10^-26     0.61   10^-28
*Amidon et al. 2005

Table 2: KS-Statistics and Correlation Values of Mixing Components

              PROB#                              Rij*
              Site E   Site F    Site I+H       Site E   Site F   Site I+H
Site E        1        0.00005   10^-37         1        0.51     0.07
Site F        -        1         10^-32         -        1        0.13
Site I+H      -        -         1              -        -        1
#KS-Test Probabilities
*Cross-correlation values

Table 3 Generalized K-S Statistics of Mixtures (Relative to Marsyandi Site K)

                                                      % Relative Drainage Contributions          Generalized K-S %PROB
Mixtures                                              TH+GH (II-III)   GH (I)   LH       D       Identical   Independent
weighted by catchment area*                           44               24       32       0.05    54          31
weighted by catchment area & zircon concentration*    43               37       20       0.12    1.2         0.03
Best-fit (Amidon et al. 2005)                         35               31       34       0.06    34          10
Max. PROB (This Study)                                34               35       31       0.04    71          50
*Amidon et al. 2005
Figure Captions
Figure 1. (A) Hypothetical drainage system illustrating how three distinct catchment
basins with disparate detrital zircon provenance mix to produce an integrated
population in the main stem of the river system. The proportional contribution of
each basin is represented by weighting coefficients φi. (B) – (D) Probability
density functions (PDF) for individual age distributions (Xi) measured for
catchments 1, 2, and 3 respectively. (E) PDF for the integrated downstream age
distribution (X). (F) PDF corresponding to a weighted mixture (XW) of individual
age distributions 1-3. (G) Cumulative probability density functions for (B) through
(F) above.
Figure 2. Illustration of key parameters in the Kolmogorov-Smirnov (K-S) test. (A)
Relationship between the maximum difference (D; see [1-2] in text) between two
cumulative distribution functions (CDF) and the probability (PROB; see [3-5])
that D could, by chance, be larger than the observed value. Equivalent sample
size in the K-S test is given by Ne (see [6]). Curves are shown for Ne = 9, 25, 100,
400, and 900. The dashed line represents 95% confidence that two distributions
are distinct (i.e., they are not drawn from the same population). The critical value
of D (Dcrit.) marks the 95% confidence threshold. (B) Relationship between the
square root of Ne and Dcrit.. The sensitivity of the K-S test improves by 67% when
Ne is increased by an order of magnitude.
Figure 3. Incorporation of experimental errors in the K-S test. ΔDcrit represents the
difference in Dcrit. obtained when cumulative probability density functions (CPDF;
see [7]) are compared instead of CDFs. Average
experimental error is given by σav. Results of individual simulations are
represented by open circles for Ne = 100. Least-squares regression of these results yielded
the indicated line. Regression lines for similar sets of Monte Carlo simulations
are also shown for Ne = 25, 400, and 900. For a given Ne, ΔDcrit is a linear
function of the square root of σav/Ne (see [8]).
Figure 4. The generalized K-S statistic for mixtures. Analytical solutions are available to
use in the K-S test when the mixtures are constructed from distributions drawn
from either identical (see [11]) or independent (see [12-13]) populations. These
two limiting cases are shown in the plot of PROB vs. D. The curves enclose the
spectrum of possible solutions.
Figure 5. Geologic map of the Marsyandi drainage, central Nepal (after Amidon et al.,
2005; see additional references cited within). Detrital zircon samples of river
sediments measured by Amidon et al. (2005) are from the main stem of the Marsyandi
River (E and K) and from tributaries draining single geologic units (F, H, and I). See
Table 1 and text for further details.
Figure 6. (A) CPDFs of Marsyandi river samples used in mixing calculations (see
locations in Figure 5). The maximum probability solution (red) is based upon the
result assuming that the mixing end members are derived from independent
populations (see below). (B) Ternary plots of all mixtures of the end members
defined in Table 1 (see also Amidon et al. 2005). Contours of probability assume
that the mixing components are derived from identical populations (see Table 3).
The dashed line encloses all solutions that are indistinguishable from Amidon et
al.’s (2005) site K distribution at 95% confidence. Compositions predicted from
exposure area only and exposure area plus zircon concentration are from Amidon
et al. (2005) and are discussed in the text. (C) Same as for (B) above but contours
of probability assume that the mixing components are derived from independent
populations (see Table 3).
Figure 7. Relative erosion rates of geologic units exposed in the catchment of the
Marsyandi River based upon mixing calculations that assume end members are
derived from independent populations. The inverted triangle corresponds to
uniform erosion rates. The dashed line encloses the locus of relative
erosion rates that are consistent with solutions indistinguishable from the site K
distribution at 95% confidence. While most solutions indicate higher erosion
rates for the Lesser Himalayan formations that structurally underlie the Main
Central thrust (MCT), some acceptable solutions are also consistent with higher
erosion rates structurally above the MCT. The range of acceptable solutions can be
better constrained by more optimal sampling (see text).
[Figure 1 (Lovera et al.): graphic not recoverable from the extracted text. Legible annotations: weighting coefficients φi and the mixture definition $X_w = \sum_{i=1}^{m} \phi_i X_i$.]
[Figure 2 (Lovera et al.): (A) PROB (%) versus D, with curves for Ne = 9, 25, 100, 400, and 900 and Dcrit marked. (B) Dcrit versus (Ne)^1/2.]
[Figure 3 (Lovera et al.): ΔDcrit versus (σav)^1/2, with regression lines for Ne = 25, 100, 400, and 900. Annotated relation: ΔDcrit = 0.163 (σav/Ne)^1/2.]
[Figure 4 (Lovera et al.): PROB (%) versus D, with curves for independent populations and identical populations and an intermediate solution between them. Regions are labeled "Distinct at 95% confidence" and "Indistinguishable at 95% confidence".]