A GENERALIZED KOLMOGOROV-SMIRNOV STATISTIC FOR DETRITAL ZIRCON ANALYSIS OF MODERN RIVERS

Oscar M. Lovera1 Department of Earth & Space Sciences, 3806 Geology Building, University of California, Los Angeles, Los Angeles, CA 90095, USA, Tel: 310-206-2657, FAX: (310) 825-2779 [email protected]

Marty Grove Department of Geological & Environmental Sciences, Stanford University, Stanford, CA, 94305, USA, [email protected]

Sara E. Cina Department of Earth & Space Sciences, University of California, Los Angeles, Los Angeles, CA 90095, USA, [email protected]

1Corresponding author

In preparation for Journal of Sedimentary Research (Feb. 5, 2009)

“Current Ripples” Format

Version 9 = 7100 Words (entire document)

ABSTRACT The Kolmogorov-Smirnov (K-S) statistic is widely used to test the null

hypothesis that two distributions are drawn from the same population. In detrital

zircon provenance analysis of river systems, it is useful to have an equivalent statistical

measure for comparisons involving composite samples that collectively represent the

contributions of tributaries to regionally extensive river systems. We present a

generalized K-S statistic that depends on the proportional contribution and sample size of

individual distributions as well as the correlation between the respective individual

populations. Our generalized K-S statistic may be generally applied to analyze any type

of one dimensional data and can be calculated from either error weighted distributions

(i.e., cumulative probability density functions) or raw data series (i.e., cumulative

distribution functions). Analytical expressions are provided for end member cases in

which the individual populations are either completely independent or identical.

Although intermediate cases must still be tested by numerical analysis, the bounding end

members generally tightly constrain all possible solutions and thus permit conservative

evaluation of the null hypothesis. We demonstrate how the generalized K-S statistic can

be used with an example from the modern Marsyandi River system (central Nepal

Himalaya). Our results clarify the manner in which detrital zircon age

distributions from tectonically active areas may be used to constrain key parameters such

as erosion rates within the catchment of major rivers.

Key Words: Kolmogorov-Smirnov, mixing, detrital zircon, U-Pb age, modern river sand.

1. Introduction

Provenance analysis of fluvial depositional systems by detrital

geo/thermochronology is used to address a wide range of geologic problems within the

geosciences (Fedo et al., 2003; Bernet and Spiegel, 2004). Measurement of detrital

zircon age distributions has become the premier approach to characterize sedimentary

provenance with hundreds of studies now incorporating such data annually. Large data

sets that make provenance analysis a meaningful exercise became possible when

secondary ionization mass spectrometry (SIMS) methods began to be applied (e.g.,

Compston et al., 1984; Ireland et al., 1998; DeGraaff-Surpless et al., 2002; Grove et al.,

2003). Further advances were realized when detrital zircon data sets began to incorporate

trace element, Hf isotopic, (U-Th)-He age, and other types of information (e.g., Grimes et

al., 2007; Mueller et al., 2007; Campbell et al., 2005). Presently, very high throughput

laser ablation, inductively coupled plasma mass spectrometry (LA-ICP-MS; e.g., Kosler

et al., 2002; Gehrels et al., 2006) techniques routinely permit very large data sets to

be gathered with ease (e.g., Adams et al., 2007; Dickinson and Gehrels, 2008). Given

this explosion of data, it is of paramount importance that development of interpretative

methods for detrital zircon analysis keep pace.

Statistical analysis of detrital zircon age distributions has generally focused either

upon the extent of sampling required to adequately characterize an age distribution (e.g.,

Dodson et al., 1988; Vermeesch, 2004; Anderson, 2005) or upon methods of comparing

one measured detrital distribution with another (see discussion in Fedo et al., 2003). Of

the methods to compare distributions, the most widely applied approach for detrital

geochronology applications has been the Kolmogorov-Smirnov (K-S) statistic (e.g.,

Lovera et al., 1999; Berry et al., 2001; Degraff-Surpless et al., 2003; Amidon et al., 2005;

Fletcher et al., 2007; Prokopiev et al., 2008). The K-S test has a well understood

statistical basis but is insensitive to differences exhibited by the tails of distributions and

was not formulated to permit experimental error to be explicitly accounted for (Press et

al., 2002; Borradaile, 2003). Useful alternative approaches that have a less well-

understood statistical basis include measures of similarity and overlap (Gehrels, 2000),

multivariate analysis (Sircombe, 2000), and kernel functional estimation (Sircombe and

Hazelton, 2004). Sambridge and Compston (1994) have described approaches for

deconvolving overlapping age components. However, none of the above statistical

approaches can be readily applied to make comparisons involving mixtures of separately

measured age distributions.

Analysis of modern river systems is an important exercise for evaluating both our

understanding of present-day surficial processes (e.g., Brewer et al., 2003; Amidon et al.,

2005) as well as our ability to reconstruct past depositional systems (e.g., Fletcher et al.,

2007). With modern rivers in which all critical details of the system are available for

study, it is possible to sample the detrital zircon age distributions of sands supplied by

tributaries with observable surface geology in their respective catchment basins to then

determine how these signals are combined to yield an integrated provenance signature

further downstream (Fig. 1). In order to meaningfully accomplish the task illustrated in

Figure 1, a valid statistical approach for making comparisons involving mixtures is

required. In this paper, we describe how the widely used K-S statistic can be

extended so that it is applicable to the analysis of mixtures. While our approach has

specifically been developed for provenance analysis, the method is general and can be

applied to compare natural and synthetic mixtures of one dimensional data in similar

problems in the geosciences and other disciplines.

2. Background

As summarized by Press et al. (2002), the Kolmogorov-Smirnov (K-S) statistic

can be used to determine whether two distributions are drawn from the same population

(i.e., the null hypothesis). The K-S test is specifically used to evaluate the validity of the

null hypothesis which holds that there is no significant difference between two

populations and that any observed difference relates to insufficient sampling or

experimental error. There are two types of comparisons that can be made. In the one-

sample test, a measured distribution is compared with a reference function. In the two-

sample test, two measured distributions are compared. The advantage of the K-S test

over more exacting statistical measures such as Student's t-test is that no assumptions

are made with respect to the nature of the distributions being compared (see Press et al.,

2002; Borradaile, 2003). The lack of a dependence upon the nature of the distribution is

essential when no expectations regarding the form of the distributions are possible.

In the K-S test, the null hypothesis is evaluated by measuring the absolute value

of the maximum difference between two cumulative distribution functions (CDF). In the

one sample test, a sample CDF SN(x) is compared to a known distribution function P(x) as

follows:

$$D = \max_{-\infty < x < \infty} \left| S_N(x) - P(x) \right| \qquad [1]$$

In the two-sample test, D is defined as:

$$D = \max_{-\infty < x < \infty} \left| S_{N_1}(x) - S_{N_2}(x) \right| \qquad [2]$$

where $S_{N_1}(x)$ and $S_{N_2}(x)$ are different data sets assumed to be drawn from the same

distribution function. Provided that the value of D is non-zero, its significance (QKS) can

be assessed from the following function (Press et al., 2002):

$$PROB \equiv Q_{KS}(\lambda) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 \lambda^2} \qquad [3]$$

where QKS varies from QKS(0)=1 to QKS(∞)=0. The significance of D given by [3] is the

maximum probability (PROB) of accidentally rejecting a true null hypothesis. Two

distributions are regarded as distinct (i.e., null hypothesis = false) if the probability that D

could be higher than the observed value is less than a specified significance level. In the

K-S test, the significance level is conventionally set at 0.05 (corresponds to 95%

confidence).

In the one sample test, the size of the sample is given by Ne. When Ne > 20, the

factor λ is well approximated by:

$$\lambda = \sqrt{N_e}\, D \qquad [4]$$

For smaller data sets, λ is more accurately approximated by:

$$\lambda = \left( \sqrt{N_e} + 0.12 + 0.11 / \sqrt{N_e} \right) D \qquad [5]$$

In the two sample test, Ne is given by:

$$N_e = \frac{N_1 N_2}{N_1 + N_2} \qquad [6]$$

where N1 and N2 are the respective sizes of samples 1 and 2.
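
The two-sample procedure of equations [2], [3], [5], and [6] can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`q_ks`, `ecdf`, `ks_two_sample`) are hypothetical, the series in [3] is truncated at 100 terms, and [5] is applied for all sample sizes on the assumption that it approaches [4] as Ne grows.

```python
import math
from bisect import bisect_right


def q_ks(lam, terms=100):
    """Significance function Q_KS(lambda) of equation [3], truncated series."""
    if lam < 0.2:
        # The alternating series converges slowly here; Q_KS is 1 to within ~1e-12.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ecdf(sample):
    """Return the empirical CDF of a raw data series as a step function."""
    xs = sorted(sample)
    return lambda x: bisect_right(xs, x) / len(xs)


def ks_two_sample(a, b):
    """Return (D, N_e, PROB) following equations [2], [6], [5], and [3]."""
    cdf_a, cdf_b = ecdf(a), ecdf(b)
    # Both CDFs are step functions, so the maximum difference occurs at a data point.
    d = max(abs(cdf_a(x) - cdf_b(x)) for x in list(a) + list(b))
    n_e = len(a) * len(b) / (len(a) + len(b))
    root = math.sqrt(n_e)
    lam = (root + 0.12 + 0.11 / root) * d
    return d, n_e, q_ks(lam)
```

Calling `ks_two_sample(ages_1, ages_2)` on two lists of ages returns D, the effective sample size, and PROB; the null hypothesis would be rejected at the conventional level when PROB falls below 0.05.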

Figure 2A illustrates the dependence of PROB upon D and Ne. Curves are shown

for five effective sample sizes (Ne = 9, 25, 100, 400, and 900). Note that a “critical”

value for D (Dcrit) can be defined at PROB = 0.05. As Ne becomes larger, the curves shift

to lower Dcrit values to make the K-S test more sensitive. Figure 2B plots Dcrit as a

function of Ne. As indicated, the sensitivity of the K-S test rapidly increases for sample

sizes between 9 and 100. For comparisons between data sets defined by Ne ~ 100, the

threshold for distinguishing two distributions at 95% confidence involves a maximum

difference in the age distributions of 13% (Fig. 2B). Depending upon the geologic

questions being addressed, this sensitivity for the K-S test may be completely adequate.

However, if significantly greater sensitivity is demanded by the application, a much

larger increase in the size of the data set is required to achieve it. For example, a value

for Ne = 900 is needed to reduce the Dcrit value for distinguishing two distributions at

95% confidence to 4% (Fig. 2B).
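
The dependence of Dcrit on Ne can be reproduced numerically by solving QKS(λ) = 0.05 for λ and dividing by the square root of Ne. The sketch below assumes the large-sample form [4] and uses simple bisection; the names `lambda_crit` and `d_crit` are hypothetical and not part of the original analysis.

```python
import math


def q_ks(lam, terms=100):
    """Significance function Q_KS(lambda) of equation [3], truncated series."""
    if lam < 0.2:
        return 1.0  # Q_KS is indistinguishable from 1 for very small lambda
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def lambda_crit(alpha=0.05, lo=0.2, hi=5.0, iterations=60):
    """Solve Q_KS(lambda) = alpha by bisection; Q_KS decreases with lambda."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if q_ks(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def d_crit(n_e, alpha=0.05):
    """Critical D at significance alpha using lambda = sqrt(N_e) * D (equation [4])."""
    return lambda_crit(alpha) / math.sqrt(n_e)


for n_e in (9, 25, 100, 400, 900):
    print(n_e, round(d_crit(n_e), 3))
```

Because Dcrit scales as the inverse square root of Ne under [4], halving Dcrit requires roughly a fourfold increase in the effective sample size.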

3. Incorporating Analytical Error into Cumulative Distribution Functions

In the K-S test, the CDF is calculated from the one dimensional data series

without accounting for experimental errors. Although various authors including

Silverman (1986) and Sircombe (2000) have proposed using a Gaussian Kernel

Probability (GKP) function to represent the CDF, the impact of using error weighted data

for the K-S test has not been formally evaluated. Below we provide a method for

estimating the magnitude of the correction to Dcrit (ΔDcrit) that is required to correctly use

the K-S test when error weighted data are employed.

For a sample of size N, with values μi ± σi, the GKP function is defined as the

normalized sum of all the Gaussian functions corresponding to each data point

(Silverman, 1986). The cumulative probability density function (CPDF) is thus given by

the integral of GKP function as follows:

$$\mathrm{CPDF}(t) = \frac{1}{N} \sum_{i=1}^{N} \int_{-\infty}^{t} \frac{1}{\sigma_i \sqrt{2\pi}} \, e^{-\frac{(t' - \mu_i)^2}{2 \sigma_i^2}} \, dt' \qquad [7]$$

Weighting the data according to error produces a much smoother function than the CDF.

Use of CPDF thus produces smaller D values than would have been obtained if D had

been calculated from CDF’s. Because this increases the value of PROB in [3], the

sensitivity of the K-S test is decreased and it becomes more difficult to disprove the null

hypothesis.
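
Because each term in [7] is the integral of a Gaussian, the CPDF can be evaluated in closed form with the error function. The following sketch illustrates this and the resulting reduction in D; the function names and the use of a user-supplied evaluation grid are assumptions made for illustration only.

```python
import math


def cpdf(t, mus, sigmas):
    """Cumulative probability density function of equation [7].

    Each Gaussian integrates to 0.5 * (1 + erf((t - mu) / (sigma * sqrt(2)))).
    """
    total = sum(0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
                for mu, sigma in zip(mus, sigmas))
    return total / len(mus)


def max_difference(mus_a, sigmas_a, mus_b, sigmas_b, grid):
    """Largest absolute CPDF difference over the supplied grid of t values.

    The CPDF is smooth, so D is only approximated to the resolution of the grid.
    """
    return max(abs(cpdf(t, mus_a, sigmas_a) - cpdf(t, mus_b, sigmas_b))
               for t in grid)
```

As a simple illustration, two single-grain "samples" at 100 and 110 with errors of 10 have D = 1 when compared as raw CDFs, whereas `max_difference([100.0], [10.0], [110.0], [10.0], range(50, 161))` is considerably smaller, which is the smoothing effect described above.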

To correct for this artifact and maintain the sensitivity of the K-S test, we have

undertaken Monte Carlo simulations to quantitatively determine how Dcrit is impacted by

the average percent error (σav) associated with the data sets being compared. Figure 3

plots the magnitude of the shift (ΔDcrit) in Dcrit vs. the square root of σav. Individual

Monte Carlo results are shown as open circles for simulations involving Ne =100. The

solid line through these data is a linear best-fit. Best fits for simulations involving Ne =

25, 400, and 900 are also shown. Based upon these results, we have empirically derived

the following expression:

$$\Delta D_{crit} = 0.163 \sqrt{\sigma_{av} / N_e} \qquad [8]$$

For σav = 1, ΔDcrit varies from 0.033 to 0.005 when Ne is increased from 25 to 900.

Increasing σav to 4 raises these values to 0.065 and 0.011, respectively, over the same

increase in Ne from 25 to 900. Thus, although ΔDcrit depends strongly

upon both σav and Ne for small Ne, the influence of σav decreases as Ne is increased.
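
The values quoted above follow directly from the empirical expression for ΔDcrit when it is read as 0.163 times the square root of σav/Ne, which is the form consistent with all four quoted numbers. A short check, with the hypothetical function name `delta_d_crit`:

```python
import math


def delta_d_crit(sigma_av, n_e):
    """Empirical shift in D_crit for average percent error sigma_av and size N_e."""
    return 0.163 * math.sqrt(sigma_av / n_e)


for sigma_av in (1, 4):
    for n_e in (25, 900):
        print(sigma_av, n_e, round(delta_d_crit(sigma_av, n_e), 3))
```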

4. Generalizing the K-S test to Mixtures

The K-S test applies only to single distributions and there is no established

protocol for comparing distributions consisting of pooled data. In detrital zircon

provenance analysis of river systems, there is a need to develop a statistically valid test to

evaluate how sand delivered by tributaries collectively supplies the trunk river. Let X1,

X2… Xm represent the cumulative distribution functions (CDF) of the age distributions

sampled from the tributaries of a river system (Fig. 1). The sample sizes and weighting

coefficients associated with these individual samples are n1, n2… nm, and φ1, φ2… φm,

respectively. For rivers, values of φ would represent the proportional contributions of

each tributary as estimated by catchment area, discharge, bed load, etc. The CDF of the

mixture (Xw) is defined as:

$$X_w = \sum_{i=1}^{m} \phi_i X_i \, ; \quad \text{where} \quad \sum_{i=1}^{m} \phi_i = 1 \quad \text{and} \quad N = \sum_{i=1}^{m} n_i \qquad [7]$$

where N is the total number of measurements. The river population, Xw, can be assumed

to be a linear combination of the individual populations. Applying [2], we can calculate

the parameter D to evaluate the null hypothesis:

$$D = \max_{-\infty < x < \infty} \left| X_w(x) - X_b(x) \right| \qquad [8]$$

where Xb is a random sample from the main trunk of the river. Because Xw(x) depends

both upon the nature of the distributions being mixed and the weighting coefficients (φi)

used to calculate the mixture, the significance function for D is no longer independent of

the distributions being compared. However, we have identified two limiting cases in

which the added degrees of freedom associated with mixtures can be expressed

analytically. These limiting cases bound the spectrum of possible solutions and thus

allow conservative evaluation of the null-hypothesis.
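
The construction of the weighted mixture CDF and the corresponding D value relative to a trunk sample can be sketched as follows for raw (unweighted) age series. The names `mixture_cdf` and `mixture_d` are hypothetical, and the sketch covers only the calculation of D, not its significance.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a raw data series as a step function."""
    xs = sorted(sample)
    return lambda x: bisect_right(xs, x) / len(xs)


def mixture_cdf(samples, phis):
    """Mixture CDF X_w as the phi-weighted sum of the tributary CDFs X_i."""
    if abs(sum(phis) - 1.0) > 1e-9:
        raise ValueError("weighting coefficients must sum to 1")
    cdfs = [ecdf(s) for s in samples]
    return lambda x: sum(phi * cdf(x) for phi, cdf in zip(phis, cdfs))


def mixture_d(samples, phis, trunk):
    """Maximum absolute difference between X_w and the trunk CDF X_b."""
    x_w = mixture_cdf(samples, phis)
    x_b = ecdf(trunk)
    # Both functions are step functions, so only the data values need checking.
    points = {x for s in samples for x in s} | set(trunk)
    return max(abs(x_w(x) - x_b(x)) for x in points)
```

For example, `mixture_d([trib_1, trib_2, trib_3], [0.5, 0.3, 0.2], trunk)` gives D for one choice of weighting coefficients; repeating the call over many choices of φ maps out how D varies with the assumed contributions.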

4.1 Limiting Cases

In presenting the limiting cases below, we distinguish between identical and

independent populations. The definition of identical and independent populations

relevant to the generalized K-S test is established by the correlation coefficient rij

between the populations Fi and Fj:

$$r_{ij} = \frac{\int F_i F_j}{\sqrt{\int F_i^2 \int F_j^2}} \, ; \quad i \neq j \qquad [9]$$

For the generalized K-S test, we define populations Fi and Fj to be identical if rij=1 ∀ i≠j

and independent if rij=0 ∀ i≠j in [9] above.
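
A discrete version of the correlation coefficient in [9] is sketched below. It assumes that Fi and Fj are age probability densities sampled on a common, uniformly spaced grid (so that the grid spacing cancels) and that the denominator is the square root of the product of the two integrals, which is the normalization under which identical populations give rij = 1. The function name is hypothetical.

```python
import math


def correlation(f_i, f_j):
    """Discrete analogue of r_ij for two densities on the same uniform grid."""
    numerator = sum(a * b for a, b in zip(f_i, f_j))
    denominator = math.sqrt(sum(a * a for a in f_i) * sum(b * b for b in f_j))
    return numerator / denominator
```

Two densities with no overlapping age range return 0 (independent in the sense used here), whereas a density compared with itself returns 1 (identical).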

4.1.1 Mixtures Drawn from Identical Populations

A trivial though important limiting case exists for the situation in which the

CPDF’s of all of the individual data sets used to form the mixture are identical. In this

special case, the CPDF of the mixture (Xw) is independent of the individual contributions

φi and is represented by:

$$X_w = \sum_{i=1}^{m} \frac{n_i}{N} X_i \, ; \quad N = \sum_{i=1}^{m} n_i \qquad [10]$$

where ni and Xi are the size and CPDF of the ith data set in the mixture, and N is the total

number of measurements contributed to the mixture by the m data sets. Note that while

D and PROB can be calculated from [2] and [3] respectively, the size of the mixture

required for [4]-[6] will generally differ from the sum total number of analyses

contributed by the mixing components. The effective size of the mixture (Neff) can be

calculated from the individual sample sizes (Ni) used to construct the mixture

using the following first order approximation:

$$\frac{1}{N_{eff}} = \sum_{i=1}^{m} \frac{\phi_i^2}{N_i} \qquad [11]$$

Only when φi = ni/N will Neff = N. We present a second order approximation for Neff in the

Appendix (see [A1]).
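
The first order approximation in [11] is straightforward to evaluate. The sketch below, with the hypothetical function name `n_eff`, shows that Neff equals N only when the weighting coefficients match the sampling proportions and is smaller otherwise.

```python
def n_eff(phis, sizes):
    """Effective mixture size from equation [11]: 1 / N_eff = sum(phi_i^2 / N_i)."""
    return 1.0 / sum(phi * phi / n for phi, n in zip(phis, sizes))


sizes = [50, 30, 20]                      # hypothetical sample sizes, N = 100
print(n_eff([0.5, 0.3, 0.2], sizes))      # weights match n_i / N
print(n_eff([0.2, 0.3, 0.5], sizes))      # weights do not match the sampling
```

In the second call the most heavily weighted component is also the most sparsely sampled, so the effective size of the mixture drops well below the total number of analyses.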

4.1.2 Mixtures Drawn from Independent Populations

A second limiting case is presented by the situation in which the CDF’s of the

individual populations are independent. When individual populations Xi are independent,

the PROB value of the mixture would be a product of the individual PROB values

associated with each distribution. In the first-order approximation, equations [3] and [4]

become:

$$1 - PROB(D > D_{obs}) = \prod_{i=1}^{m} \left( 1 - Q_{KS}(\lambda_i) \right) \qquad [12]$$

$$\lambda_i = \frac{\sqrt{n_i}\, D}{\phi_i} \qquad [13]$$

Second-order approximations are presented in the Appendix A ([A2] thru [A6]). Figure 4

compares the significance functions for the two end-member cases when all other

variables (ni, φi) remain constant. PROB values for intermediate cases will be bounded

by the end-member solutions.
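
The first order significance function for independent populations ([12] and [13]) can be sketched as below. The function names are hypothetical, the series for QKS is truncated at 100 terms, and a component with φi = 0 is simply skipped on the assumption that it contributes a factor of one to the product (λi tends to infinity, so QKS tends to zero).

```python
import math


def q_ks(lam, terms=100):
    """Significance function Q_KS(lambda) of equation [3], truncated series."""
    if lam < 0.2:
        return 1.0  # Q_KS is indistinguishable from 1 for very small lambda
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def prob_independent(d, phis, sizes):
    """PROB for a mixture of independent populations, equations [12] and [13]."""
    product = 1.0
    for phi, n in zip(phis, sizes):
        if phi == 0.0:
            continue  # component absent from the mixture; factor of one
        lam = math.sqrt(n) * d / phi
        product *= 1.0 - q_ks(lam)
    return 1.0 - product
```

With a single component (m = 1, φ = 1) the expression reduces to the conventional QKS evaluated with [4], which provides a convenient consistency check.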

5. Test Case: Drainages of the Central Nepalese Himalaya

5.1 Background

The Himalayan orogen provides an excellent setting for studying inter-

relationships between tectonics and surface processes including river evolution. One of

the most significant and laterally persistent structures within the central Himalaya is the

Main Central thrust (MCT). Although the MCT is thought to have initiated in the Early

Miocene (Hodges et al., 2000; Harrison et al., 1995, 1997), it is also associated with Plio-

Pleistocene activity based upon mineral cooling patterns and analysis of river profiles

(Seeber and Gornitz, 1983; Burbank et al., 2003; Catlos et al., 2001). Whether this recent

uplift is due to reactivation of the MCT itself, or is driven by focused erosion south of the

MCT at the location of the greatest monsoonal precipitation has been widely debated

(Burbank et al., 2003; Wobus et al., 2003).

Amidon et al. (2005) set out to constrain erosion rates on either side of the MCT

by analyzing modern sands from the Marsyandi River in central Nepal. The Marsyandi

River flows southward, perpendicular to the Himalayan orogen across the MCT toward

the Himalayan foreland basin. The catchments of several of its tributaries are confined to

the major Himalayan units: the Tethyan Himalaya (TH) consists largely of Cambrian to

Jurassic sediments; the Greater Himalaya (GH) contains similar age lithologies

metamorphosed to amphibolite facies; and the Lesser Himalaya (LH) has generally lower

grade Early Proterozoic and Archean metasediments (Le Fort, 1975). The TH is bounded

to the south by the north-dipping South Tibetan Detachment (STD) and structurally

overlies the GH. The GH, in turn, structurally overlies the LH across the MCT (Gansser, 1964; Le

Fort, 1975).

Amidon et al. (2005) compared the integrated downstream detrital age

populations with mixtures calculated from key tributaries. Detrital results from two or

more tributaries were linearly mixed and compared to the observed downstream

population using an iterative approach to determine the optimal contribution

proportions of the drainages. The estimates of relative erosion rates between catchment

basins were a function of catchment area and zircon concentration of source lithologies.

The iterative approach involved minimization of the percent area mismatch between

smoothed PDF’s corresponding to the mixture and the integrated Marsyandi signature.

5.2 Application of the Generalized KS statistic

To illustrate our approach, we have selected Amidon et al.’s (2005) site K

population of the Marsyandi River (Fig. 5). Amidon et al. (2005) modeled their site K

river sand in terms of three components (see Table 1 and Fig. 5). Their site E sample was

collected along the main trunk of the Marsyandi River approximately 85 km upstream

from site K. The sand at site E is supplied by the TH and formations II and III of the GH

units (Fig. 5). Amidon et al.’s site F sample is from a small tributary of the Marsyandi (Syange

K.) that enters the Marsyandi 8 km downstream from site E. Site F was selected by

Amidon et al. (2005) to represent sediment eroded from formation I of the GH which

structurally overlies the MCT. The Marsyandi flows across the MCT 7 km downstream

from Site F. To represent the LH unit that is overthrust by the MCT, Amidon et al.

(2005) selected two samples positioned 30 and 49 km south of the MCT (site H, Paudi K,

and site I, Chudi K, respectively). Because both tributaries exclusively drain the LH (Fig.

5), Amidon et al. (2005) combined the data from both to define the LH end member.

The PDF’s and CPDF’s of the three end members defined in Table 1 are shown

together with that of site K of the Marsyandi River in Figure 6A and 6B respectively.

Two points can be made. First, the Marsyandi River site K CPDF is bracketed by the

CPDF’s of the end members in Table 1. This is a necessary requirement for successful

mixing of these end member age distributions to reproduce that measured at site K. For

example, binary mixing of components 1 and 3 can potentially produce a mixture

indistinguishable from the site K age distribution while mixtures of 1 and 2 or mixtures of 2

and 3 cannot. The second point is that the three mixing components are poorly

correlated with one another and thus easily distinguished on the basis of the conventional

K-S test (Table 2).

To illustrate the impact of both limiting cases defined above for the generalized

K-S statistic, we have computed PROB values associated with all possible mixtures of

Amidon et al.’s (2005) three end members in Figures 6C and 6D by varying the volume

contributions (φi) of each component subject to the constraint Σφi = 1. Figure 6C shows

the PROB values for the limiting case in which all of the mixing components are derived

from identical populations. The CPDFs of the mixtures were calculated from [10] and

D was calculated relative to the Site K CPDF. Corrections were applied for experimental

error using the ΔDcrit correction ([8] in Section 3) and PROB values were calculated from [3] and [4] with Neff determined

from [11]. All mixtures outside the 5% probability contour (dashed line in Fig. 6C) are

distinguished from the Marsyandi site K age distribution at 95% confidence.

Based upon the results shown in Table 2, it is clearly more appropriate to apply

the limiting case in which the end members are assumed to be derived from independent

populations. Probabilities calculated under the assumption of independent populations

were calculated using [12]-[13] in place of [3]-[4]. Under the assumption of identical

populations, 17% of the solutions were indistinguishable from the Marsyandi site K

distribution. When the assumption of independent populations was applied, the

percentage of indistinguishable mixtures decreased by a factor of two to 8.4%.
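
The percentage of indistinguishable three-component mixtures can be estimated by evaluating PROB on a regular grid over the ternary diagram. The sketch below is a generic illustration only: the grid resolution, the function names, and the callable `prob_fn` (which would wrap the D and PROB calculations for a given vector of contributions) are all assumptions and do not reproduce the sampling actually used for Figures 6C and 6D.

```python
def simplex_grid(steps):
    """Yield (phi_1, phi_2, phi_3) on a regular grid with phi_1 + phi_2 + phi_3 = 1."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            yield (i / steps, j / steps, k / steps)


def fraction_indistinguishable(prob_fn, steps=100, alpha=0.05):
    """Fraction of grid mixtures whose PROB exceeds the significance level alpha."""
    points = list(simplex_grid(steps))
    accepted = sum(1 for phi in points if prob_fn(phi) > alpha)
    return accepted / len(points)
```

Supplying a `prob_fn` based on the identical-population case and then one based on the independent-population case would give the two percentages that are compared in the text.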

6. DISCUSSION

Our generalization of the Kolmogorov-Smirnov statistic enables its use for

comparisons involving mixtures of one dimensional, error-weighted data. The approach

is general and broadly applicable to problems requiring statistical comparisons of

composite distributions consisting of weighted individual data sets for which no

assumptions can be made regarding how the data are distributed. While the approach is

general, it is also uniquely suited for detrital zircon U-Pb age provenance analysis that

involves modeling the net contributions of individual tributaries to large rivers. Such an

endeavor represents ground truth in provenance studies and is essential for testing

hypotheses that are posed in efforts to better understand the source regions of ancient

depositional systems for which far less geologic context has been preserved.

In the example provided, we have focused upon defining the family of statistically

equivalent solutions at 95% confidence. This is readily illustrated for three component

mixing as shown in Figures 6C-6D. While calculations involving a greater number of

mixing end members become progressively more laborious and difficult to represent

graphically, the same principles apply. These will be addressed in greater detail in a

subsequent publication. Below we address two simple concepts: best-fit solutions and

appropriate use of the calculations to constrain parameters such as differential erosion

within the river catchment.

6.1 Best-fit Mixtures

While it is possible to use the K-S test to algebraically identify the mixture which

yields maximum probability, this approach is ad hoc (Lovera et al., 1999; Amidon et al.,

2005; Fletcher et al., 2007). Because the statistical foundation of the K-S test only deals

with the evaluation of the null hypothesis, use of the K-S test for optimization (i.e.,

identifying the best-fit mixture within the 5% probability curve) is not firmly supported

by the underpinnings of the method. For example, because D generally depends upon the

extent to which the distributions defining the mixture are correlated, the composition of

the “best-fit” mixture corresponding to the lowest D value will not necessarily coincide

with that associated with the highest probability because the significance function has a

nonlinear dependence on both the sizes (Ni) and the relative contributions (φi).

In spite of the above considerations, it is possible to calculate best-fit solutions.

For the sake of illustration, we have plotted several key mixtures in Figures 6C and 6D.

Our “best-fit” (i.e., maximum probability) mixture is indicated by the filled red circle in

these ternary plots. As indicated in Table 1 our maximum probability mixture plots very

close to the best-fit mixture determined by Amidon et al. (2005; see filled blue square in

Fig. 6B). While the near coincidence of these results produced from two very different

approaches conveys the appearance of a robust conclusion, we emphasize that all

mixtures that yield PROB values in excess of 5% are indistinguishable solutions from the

Marsyandi site K age distribution at the 95% confidence level. This has important

implications for estimating erosion rates from “best-fit” solutions, as discussed below.

6.2 Estimating Erosion Rates

Amidon et al. (2005) sought to estimate erosion rates structurally above and

below the MCT. They considered that three primary factors influenced mixing of their

end member age distributions to yield the integrated composition of the sand at site K: (1)

exposure area of the contributing geologic units; (2) the average zircon concentration

within these units; and (3) relative erosion rates of these units. Amidon et al. (2005)

constrained (1) and (2) by measurement so that they could calculate the relative erosion

rates. When they weighted their measured detrital zircon age distributions by catchment

area alone, they obtained the filled star in Figures 6C, 6D. Our calculations indicate that

their result weighted by exposure area alone is statistically indistinguishable from the site

K age distribution at 95% confidence. This would imply that if exposure area alone were

the only important factor, there would be no valid statistical argument to presume that the

contributing geologic units were differentially eroded.

When Amidon et al. (2005) weighted their age distributions by both exposure area

and zircon concentration they predicted a composition represented by the filled triangle in

Figs. 6C, 6D. They qualitatively concluded that their predicted age distribution was

significantly different from the site K age distribution. Our work confirms that Amidon

et al.’s predicted composition is easily distinguished from the measured composition at

site K (Table 3). To estimate differential erosion rates, Amidon et al. (2005) interpreted

misfit of their “predicted” and “measured” age distributions at site K as being due

entirely to differential erosion of the contributing geologic units represented by the

mixing components. For their site K example, they concluded that relative erosion rates

of LH were about twice as high as expected from exposure area alone relative to the other

end members.

While Amidon et al.’s (2005) conclusion of very high erosion rates of LH below

the MCT may be valid, the precision with which differential erosion of the contributing

geologic units can be estimated must be taken into account. Following our approach, the best

method for estimating the relative erosion rates of each contributing geologic unit

represented by the mixing end members is to calculate the locus of relative erosion rates

that correspond to mixtures yielding > 5% probability. In the case of the Marsyandi site

K age distribution, mixing is best approximated by the limiting case for the generalized

K-S statistic that assumes independently sourced age distributions for the end members

(i.e., Fig. 6D). The relationship among the predicted contributions (φi^P), the relative

erosion rates (φi^R), and the weighting factors that reflect relative drainage area and

zircon concentration (φi) is given by:

$$\phi_i = \frac{\phi_i^P \phi_i^R}{\sum_{j=1}^{3} \phi_j^P \phi_j^R} \qquad [14]$$

(see Appendix B). By applying Amidon et al.’s (2005) “predicted” contributions for

components 1, 2, and 3 respectively (see Table 3), we can then assign to any vector φR of

relative erosion rates, the probability associated to the relative contribution vector

φ resulting from [14]. The resulting probabilities are plotted in Figure 7. The “best-fit”

relative erosion rates associated with the maximum probability we determine agree quite

well with Amidon et al.’s (2005) estimate. The more important point however, is that

because all relative erosion rates associated with > 5% probabilities are equally valid,

there is insufficient constraint from the detrital zircon results to conclude that erosion

rates from the LH were necessarily higher than from formation I of the overlying GH.

Equivalent acceptable solutions for relative erosion rates shown in Figure 7 range from

37:04:58 to 03:77:19 for components 1, 2, and 3 respectively (see Table 3). This range of

values permits both very high and anomalously low LH erosion rates and is thus

insufficient to arrive at a definitive conclusion on the basis of the existing detrital zircon

data set alone.
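
The mapping in [14] from predicted contributions and relative erosion rates to mixture contributions is a simple normalized product. A minimal sketch, with a hypothetical function name and made-up input values rather than those of Table 3:

```python
def contributions(phi_p, phi_r):
    """Relative contributions phi_i from equation [14].

    phi_p: predicted contributions; phi_r: relative erosion rates.
    """
    products = [p * r for p, r in zip(phi_p, phi_r)]
    total = sum(products)
    return [x / total for x in products]


# Illustrative values only: doubling the erosion rate of component 3
# raises its share of the mixture at the expense of the other two.
print(contributions([0.5, 0.25, 0.25], [1.0, 1.0, 2.0]))
```

Each trial vector of relative erosion rates yields a contribution vector φ whose PROB can then be evaluated, which is how a locus of acceptable erosion rates such as that in Figure 7 could be traced out.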

In order to more confidently address the issue of the spatial dependence of erosion

rates within the Marsyandi catchment, the sensitivity of the generalized K-S statistic

needs to be enhanced to further limit the range of possible solutions. The sensitivity of

the statistic depends significantly upon the relationship between the sample size defining

the end member age distributions and the manner in which the age distributions are

weighted. Optimum sampling for a mixture is obtained when φi = ni/N. In the present

example (see Tables 1 and 3), the sample size used to define the age distributions of end

members 1 (TH + Fm. II & III of GH) and 2 (Fm. I of GH) should be increased to bring

them into proper proportions with respect to end member 3 (LH). Once these

proportions have been achieved, overall sampling levels can be increased until the

sensitivity needed to adequately shrink the region of 95% confidence has been

achieved.
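
The sampling condition φi = ni/N suggests a simple way to estimate how many additional analyses each end member would need. The sketch below assumes that no existing analyses are discarded, so that the total N is set by the most over-sampled component; the function name and example values are hypothetical.

```python
import math


def target_sizes(phis, sizes):
    """Smallest sample sizes, not below the current ones, with n_i close to phi_i * N."""
    # N must be large enough that no component needs fewer analyses than it has.
    n_total = max(n / phi for n, phi in zip(sizes, phis) if phi > 0)
    # The small tolerance guards against floating point round-off before rounding up.
    return [max(n, math.ceil(phi * n_total - 1e-9)) for n, phi in zip(sizes, phis)]


# Illustrative values only: component 3 is over-sampled relative to its weight.
print(target_sizes([0.5, 0.3, 0.2], [50, 30, 40]))
```

Once the proportions are matched in this way, all sample sizes can be scaled up together to increase Neff and tighten the region of indistinguishable mixtures.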

7. CONCLUSIONS

1. Even though the K-S test was not developed for use with error-weighted distributions,

the artifacts that occur when cumulative probability density functions are used instead

of the raw age series (cumulative distribution functions) are effectively circumvented

if the measured value of D is corrected using [8] as described in the text.

2. While the K-S test was not designed to make comparisons involving pooled and

separately weighted age distributions, we have found two limiting analytical solutions

that allow the K-S test to be rigorously applied for mixtures of age distributions that

are either identical (correlation coefficient of one) or independent (correlation

coefficient of zero). Because these limiting solutions tend to tightly bound all

possible solutions, it is possible to conservatively apply the K-S test to make

comparisons involving mixtures of age distributions derived from populations that are

neither identical nor independent.

3. The K-S test can be used to define confidence limits in mixing calculations that bound

the range of indistinguishable solutions. Best-fit (i.e., maximum probability)

solutions based upon application of the K-S test provide an ad hoc basis for

constraining the outcome of mixing calculations (i.e., estimation of erosion rates). A

more conservative approach for using mixing calculations to constrain parameters

such as erosion rates involves use of the 95% confidence limit defined by the

generalized K-S test. The range of indistinguishable solutions bounded by the 95%

confidence limit can be reduced by optimizing sampling such that the number of

measurements defining an age distribution is proportional to the weighting coefficient

used during the mixing calculations.
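The 95% confidence threshold referred to in these conclusions can be illustrated with a short sketch. It assumes that the significance function takes the standard asymptotic form given by Press et al. (2002), Q(λ) = 2 Σ (-1)^(j-1) exp(-2 j² λ²), with λ = (√Ne + 0.12 + 0.11/√Ne) D; the function names are hypothetical and the code is not the authors' implementation.

```python
import math

def ks_prob(lam):
    """Asymptotic K-S significance Q(lambda)."""
    if lam < 0.2:
        return 1.0  # series converges slowly here; Q is within about 1e-12 of 1
    total = 0.0
    for j in range(1, 101):
        term = math.exp(-2.0 * j * j * lam * lam)
        total += term if j % 2 == 1 else -term
    return 2.0 * total

def lambda_factor(ne, second_order=True):
    """lambda / D for an equivalent sample size ne."""
    root = math.sqrt(ne)
    return root + 0.12 + 0.11 / root if second_order else root

def d_crit(ne, alpha=0.05, second_order=True):
    """Value of D at which PROB falls to alpha, found by bisection on lambda."""
    lo, hi = 0.2, 5.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ks_prob(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / lambda_factor(ne, second_order)

for ne in (9, 25, 100, 400, 900):
    print(ne, round(d_crit(ne), 3))
```

Because the critical D scales approximately as the inverse square root of Ne, larger and better-proportioned samples shrink the region of indistinguishable solutions.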

Acknowledgements We thank Willy Amidon for discussions regarding his Marsyandi

River study and for making available details of the U-Pb analysis and modern sand

sampling. Jerome Gynum and George Gehrels are also thanked for discussions regarding

Monte Carlo simulations related to the K-S statistic. This work was made possible by

NSF grants to Lovera and Grove (Geochemistry-Petrology) and by an ExxonMobil grant

to Cina.

Appendix A

Second order approximation of the Generalized Kolmogorov-Smirnov Statistic

For small data sets (Ne < 20) a more accurate approximation to the K-S

significance function (see [3] and [4]) is obtained by introducing a second order

correction on the definition of the λ parameter. Straightforward generalization of the

mixture solutions for both cases (independent and identical tributary populations) can

be obtained as follows:

1) Independent populations: The definition of λi in [12] is replaced by:

$$\lambda_i = \frac{D}{\phi_i}\left(\sqrt{n_i} + 0.12 + \frac{0.11}{\sqrt{n_i}}\right) \qquad [A1]$$

2) Identical populations: The definition of Neff in [10] is altered as follows. Since each

individual distribution is normalized by its weight φi, we first define an individual

effective sample size (Ne,i) as the value that will produce the same K-S significance value

for the measured D:

$$\frac{\lambda_i}{D} = \frac{1}{\phi_i}\left(\sqrt{n_i} + 0.12 + \frac{0.11}{\sqrt{n_i}}\right) \equiv \sqrt{N_{e,i}} + 0.12 + \frac{0.11}{\sqrt{N_{e,i}}} \qquad [A2]$$

Solving [A2] we can write Ne,i as:

$$N_{e,i} = \left[\frac{\lambda_i/D - 0.12 + \sqrt{(\lambda_i/D - 0.12)^2 - 4 \times 0.11}}{2}\right]^2 \qquad [A3]$$

The composite Neff is then defined as:

$$\frac{1}{N_{eff}} = \sum_{i=1}^{m} \frac{1}{N_{e,i}} \qquad [A4]$$
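The identical-populations recipe in [A2] through [A4] can be sketched as follows. This is a minimal illustration under the reading of those equations given above; the function names are hypothetical, and the sample sizes and weights in the usage lines are taken from Tables 1 and 3 purely as an example.

```python
import math

def second_order_factor(n):
    """lambda / D for a sample of size n: sqrt(n) + 0.12 + 0.11 / sqrt(n)."""
    root = math.sqrt(n)
    return root + 0.12 + 0.11 / root

def effective_size(n_i, phi_i):
    """Individual effective sample size N_{e,i} from [A3]."""
    ratio = second_order_factor(n_i) / phi_i   # lambda_i / D as in [A2]
    b = ratio - 0.12
    # Larger root of x^2 - b x + 0.11 = 0, where x = sqrt(N_{e,i}).
    root = (b + math.sqrt(b * b - 4.0 * 0.11)) / 2.0
    return root * root

def composite_neff(sizes, phis):
    """Composite N_eff from [A4]: reciprocal of the summed reciprocals."""
    return 1.0 / sum(1.0 / effective_size(n, p) for n, p in zip(sizes, phis))

sizes = [89, 98, 178]        # N_i from Table 1
phis = [0.34, 0.35, 0.31]    # illustrative weights (Max. PROB row of Table 3)
print([round(effective_size(n, p), 1) for n, p in zip(sizes, phis)])
print(round(composite_neff(sizes, phis), 1))
```

A useful check is that a single distribution with weight one recovers its own sample size, since [A3] then simply inverts [A2].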

In a similar way, the 2-sample extension of the statistic is straightforward for the identical

populations case, by defining the equivalent sample size Ne from Neff and M as $N_e = \frac{N_{eff} M}{N_{eff} + M}$.

However, the extension in the case of independent populations is more complicated.

Using extensive Monte Carlo calculations we found that, at least in the ternary case (m=3),

a good approximation is obtained when M is replaced by a weighted average Vi, where Vi

is calculated from the relative contributions φi as follows:

$$V_i = M \phi_i \prod_{j=1;\, j \neq i}^{m} (1 - \phi_j) \qquad [A5]$$

The new definition of λi is then written as:

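$$\lambda_i = \frac{D}{\phi_i}\sqrt{X_i} \quad \text{or} \quad \lambda_i = \frac{D}{\phi_i}\left(\sqrt{X_i} + 0.12 + \frac{0.11}{\sqrt{X_i}}\right); \quad \text{where } X_i = \frac{n_i V_i}{n_i + V_i} \qquad [A6]$$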
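The quantities in [A5] and [A6] can be evaluated with a short sketch. It assumes that M is the size of the second sample, as in the two-sample extension above, and it stops at the λi values because the way they enter the significance calculation is defined in the main text ([12] and [13]). The function names are hypothetical.

```python
import math

def weighted_sizes(phis, m_size):
    """V_i = M * phi_i * product over j != i of (1 - phi_j), as in [A5]."""
    values = []
    for i, phi_i in enumerate(phis):
        v_i = m_size * phi_i
        for j, phi_j in enumerate(phis):
            if j != i:
                v_i *= 1.0 - phi_j
        values.append(v_i)
    return values

def lambdas_independent(d, sizes, phis, m_size, second_order=True):
    """lambda_i from [A6], using X_i = n_i V_i / (n_i + V_i)."""
    result = []
    for n_i, phi_i, v_i in zip(sizes, phis, weighted_sizes(phis, m_size)):
        x_i = n_i * v_i / (n_i + v_i)
        root = math.sqrt(x_i)
        factor = root + 0.12 + 0.11 / root if second_order else root
        result.append(factor * d / phi_i)
    return result

# Illustrative inputs only: weights, sizes, D and M are assumed values.
phis = [0.5, 0.3, 0.2]
print(weighted_sizes(phis, 100))
print(lambdas_independent(0.05, [80, 60, 40], phis, 100))
```

Setting `second_order=False` gives the first expression in [A6], which makes it easy to compare the two forms for small samples.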

Appendix B: Relationship between Relative Drainage Contributions φi, Predicted Area Contributions φiP and Relative Erosion Rates φiR

Contribution of the drainages is assumed to be proportional to the product of the

Predicted Area contribution (Ai, a function of the Exposed Area + zircon

concentrations) times the erosion rate of the area (Ei). Thus the relative contributions of

the drainages can be written as:

$$\phi_i = \frac{A_i E_i}{\sum_j A_j E_j},$$

This relationship can be written in terms of the relative predicted area φiP and the relative

erosion rates φiR as follows:

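$$\phi_i = \frac{\dfrac{A_i}{\sum_j A_j}\,\dfrac{E_i}{\sum_j E_j}}{\sum_k \dfrac{A_k}{\sum_j A_j}\,\dfrac{E_k}{\sum_j E_j}} \equiv \frac{\phi_i^P \phi_i^R}{\sum_k \phi_k^P \phi_k^R}$$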
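The relationship above can be checked numerically, and it can be rearranged to recover relative erosion rates from a set of mixing weights. The rearrangement, φiR proportional to φi/φiP, is an algebraic step that is not written out in the text, and the function names are hypothetical. The example pairs the Max. PROB weights of Table 3 with the area and zircon concentration weights of the same table; that pairing is an illustrative assumption.

```python
def relative_contributions(phi_p, phi_r):
    """phi_i = phi_i^P * phi_i^R / sum_k(phi_k^P * phi_k^R)."""
    products = [p * r for p, r in zip(phi_p, phi_r)]
    total = sum(products)
    return [x / total for x in products]

def relative_erosion_rates(phi, phi_p):
    """Invert the relationship: phi_i^R is proportional to phi_i / phi_i^P."""
    ratios = [f / p for f, p in zip(phi, phi_p)]
    total = sum(ratios)
    return [x / total for x in ratios]

phi = [0.34, 0.35, 0.31]     # Max. PROB mixture (Table 3)
phi_p = [0.43, 0.37, 0.20]   # catchment area & zircon concentration (Table 3)
rates = relative_erosion_rates(phi, phi_p)
print([round(r, 3) for r in rates])

# Round trip: the recovered rates reproduce the original mixing weights.
print([round(f, 3) for f in relative_contributions(phi_p, rates)])
```

Uniform erosion corresponds to equal φiR for all units, in which case the mixing weights reduce to the predicted area contributions.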

References Cited

Adams, C. J., Campbell, H. J., & Griffin, W. L., 2007. Provenance comparisons of Permian to Jurassic tectonostratigraphic terranes in New Zealand: perspectives from detrital zircon age patterns. Geol. Mag. 144, 701-729.

Amidon, W.H., Burbank, D.W., Gehrels, G.E., 2005. Construction of detrital mineral populations: insights from mixing of U-Pb zircon ages in Himalayan rivers, Basin Res. 17, 463-485.

Bernet, M., & Spiegel, C., 2004. Detrital thermochronology: Provenance analysis, exhumation, and landscape evolution of mountain belts. Boulder, Colo: Geological Society of America.

Borradaile, G. J., 2003, Statistics of Earth Science Data: Their Distribution in Time, Space, and Orientation 2nd edition, Springer.

Brewer, I. D., Burbank, D.W., & Hodges, K.V., 2003. Modelling detrital cooling-age populations: insights from two Himalayan catchments. Basin Research. 15 (3), 305-320.

Burbank, D.W., Blythe, A.E., Putkonen, J., Pratt-Sitaula, B., Gabet, E., Oskin, M., Barros, A., Ojha, T.P., 2003. Decoupling of erosion and precipitation in the Himalaya, Nature 426, 652-655.

Campbell, I.H., Reiners, P.W., Allen, C.M., Nicolescu, S., and Upadhyay, R., 2005. He-Pb double dating of detrital zircons from the Ganges and Indus Rivers: Implications for sediment recycling and provenance studies, Earth and Planetary Science Letters, 237, 402-432.

Catlos, E.J., Harrison, T.M., Kohn, M.J., Grove, M., Ryerson, F.J., Manning, C., Upreti, B.N., 2001. Geochronologic and thermobarometric constraints on the evolution of the Main Central Thrust, central Nepal Himalaya, J. Geophys. Res., B 106 (8) 16177–16204.

Compston W., Williams I. S., Meyer C., 1984. U–Pb geochronology of zircons from lunar breccia 73217 using a sensitive high mass resolution ion microprobe. Journal of Geophysical Research 89, B525–B534.

DeGraaff-Surpless, K., Graham, S. A., Wooden, J. L., & McWilliams, M. O., 2002. Detrital zircon provenance analysis of the Great Valley Group, California: Evolution of an arc-forearc system. Geological Society of America Bulletin. 114, 1564-1580.

Dickinson, W.R. and Gehrels, G.E., 2008, U-Pb Ages of Detrital Zircons in Relation to Paleogeography: Triassic Paleodrainage. Journal of Sedimentary Research. 78, 745-764.

Dodson, M.H., Compston, W., Williams, I.S., Wilson, J.F., 1988. A search for ancient detrital zircons in Zimbabwean sediments. J. Geol. Soc. (Lond.) 145, 977– 983.

Fedo, C.M., Sircombe, K.N., Rainbird, R.H., 2003. Detrital zircon analysis of the sedimentary record. In: Hanchar, J.M., Hoskin, P.W.O. (Eds.), Zircon, Reviews in Mineralogy and Geochemistry, 53, 277– 303.

Fletcher, J., Grove, M., Kimbrough, D., Lovera, O., & Gehrels, G., 2007. Ridge-trench interactions and the Neogene tectonic evolution of the Magdalena shelf and southern Gulf of California: Insights from detrital zircon U-Pb ages from the Magdalena fan and adjacent areas. Geological Society of America Bulletin. 119, 1313-1336.

Gansser, A., 1964. Geology of the Himalayas. Wiley-Interscience, New York.

Gehrels, G.E., 2000. Introduction to detrital zircon studies of Paleozoic and Triassic strata in western Nevada and northern California. In: Soreghan, M.J., Gehrels, G.E. (Eds.), Paleozoic and Triassic Paleogeography and Tectonics of Western Nevada and Northern California, Geological Society of America Special Paper, 347, 1 – 17.

Grimes, C. B., John, B. E., Kelemen, P. B., Mazdab, F. K., Wooden, J. L., Cheadle, M. J., et al., 2007. Trace element chemistry of zircons From oceanic crust: A method for distinguishing detrital zircon provenance. Geology 35, 643-646.

Hodges, K.V., Hurtado, J.M., Whipple K.X., 2001. Southward extrusion of Tibetan crust and its effect on Himalayan tectonics, Tectonics 20 (6) 799– 809.

Ireland, Flötmann. T., Fanning C. M., Gibson G. M., Preiss, V., 1998. Development of the early Paleozoic Pacific margin of Gondwana from detrital-zircon ages across the Delamerian orogen. Geology 26, 243–246.

Sircombe, K.N., 2000. Quantitative comparison of large sets of geochronological data using multivariate analysis: A provenance study example from Australia, GCA, 64, 1593-1616.

Kosler, J., Fonneland, H., Sylvester, P., Tubrett, M., & Pedersen, R. B., 2002. U-Pb dating of detrital zircons for sediment provenance studies-a comparison of laser ablation ICPMS and SIMS techniques. Chemical Geology. 182, 605-618.

Le Fort, P., 1975. Himalayas; the collided range; present knowledge of the continental arc, Am. J. Sci. 275A, 1– 44.

Mueller, P.A., Foster, D.A., Mogk, D.W., Wooden, J.L., Kamenov, G.D., and Vogl J.J., 2007, Detrital mineral chronology of the Uinta Mountain Group: Implications for the Grenville flood in southwestern Laurentia. Geology, 35, 431 - 434.

Press William H., Flannery, Brian P., Teukolsky, Saul A., and Vetterling, William T., 2002. Numerical Recipes: The Art of Scientific Computing, 2nd Ed., Cambridge University Press, pp. 997.

Prokopiev, A.V., Toro, J., Miller, E.L., and Gehrels, G.E., 2008. The paleo–Lena River—200 m.y. of transcontinental zircon transport in Siberia. Geology 36, 699-702.

Sambridge, M.S., Compston, W., 1994. Mixture modelling of multicomponent data sets with application to ion-probe zircon ages. Earth Planet. Sci. Lett. 128, 373– 390.

Seeber, L., Gornitz, V., 1983. River profiles along the Himalayan arc as indicators of active tectonics, Tectonophysics 92, 335–367.

Silverman, B. W., 1986. Density Estimation for Statistics and Data Analysis. CRC Press, Boca Raton, Fla. 175 pp.

Sircombe, K.N., 2000. Quantitative comparison of geochronological data using multivariate analysis: a provenance study example from Australia. Geochim. Cosmochim. Acta 64, 1593– 1619.

Sircombe, K.N., Hazelton, M.L., 2004. Comparison of detrital zircon age distributions by kernel functional estimation. Sediment. Geol. 171, 91– 111.

Vermeesch, P., 2005. Statistical uncertainty associated with histograms in the Earth sciences, J. Geophys. Res., 110, B02211, doi:10.1029/2004JB003479.

Vermeesch, P., 2004. How many grains are needed for a provenance study? Earth Planet. Sci. Lett., 224(3– 4), 441– 451.

Wobus, C., Hodges, K.V., Whipple, K.X., 2003. Has focused denudation sustained active thrusting at the Himalayan topographic front, Geology 31 (10) 861– 864.

Tables

Table 1. Conventional K-S Statistics of Individual Samples (Relative to Marsyandi Site K)

                                                      CPDF               CDF
Sample*       Units Sampled      σav (%)   Size Ni    D      %PROB       D      %PROB
Site E        TH + GH (II&III)   3.2       89         0.33   10^-4       0.34   10^-4
Site F        GH (I)             5.2       98         0.18   2.0         0.20   0.8
Sites I + H   LH                 2.6       178        0.58   10^-26      0.61   10^-28

*Amidon et al. 2005

Table 2. K-S Statistics and Correlation Values of Mixing Components

              PROB#                             Rij*
              Site E   Site F    Site I+H       Site E   Site F   Site I+H
Site E        1        0.00005   10^-37         1        0.51     0.07
Site F        -        1         10^-32         -        1        0.13
Site I+H      -        -         1              -        -        1

#K-S test probabilities
*Cross-correlation values

Table 3. Generalized K-S Statistics of Mixtures (Relative to Marsyandi Site K)

                                                      % Relative Drainage Contributions    Generalized K-S
                                                                                                  %PROB
Mixtures                                              TH+GH (II-III)   GH (I)   LH         D      Identical   Independent
weighted by catchment area*                           44               24       32         0.05   54          31
weighted by catchment area & zircon concentration*    43               37       20         0.12   1.2         0.03
Best-fit (Amidon et al. 2005)                         35               31       34         0.06   34          10
Max. PROB (This Study)                                34               35       31         0.04   71          50

*Amidon et al. 2005

Figure Captions

Figure 1. (A) Hypothetical drainage system illustrating how three distinct catchment

basins with disparate detrital zircon provenance mix to produce integrated

population in main stem of the river system. The proportional contribution of

each basin is represented by weighting coefficients φi. (B) – (D) Probability

density functions (PDF) for individual age distributions (Xi) measured for

catchments 1, 2, and 3 respectively. (E) PDF for the integrated downstream age

distribution (X). (F) PDF corresponding to a weighted mixture (XW) of individual

age distributions 1-3. (G) Cumulative probability density functions for (B) through

(F) above.

Figure 2. Illustration of key parameters in the Kolmogorov-Smirnov (K-S) test. (A)

Relationship between the maximum difference (D; see [1-2] in text) between two

cumulative distribution functions (CDF) and the probability (PROB; see [3-5])

that D could, by chance, be larger than the observed value. Equivalent sample

size in the K-S test is given by Ne (see [6]). Curves are shown for Ne = 9, 25, 100,

400, and 900. The dashed line represents 95% confidence that two distributions

are distinct (i.e., they are not drawn from the same population). The critical value

of D (Dcrit.) marks the 95% confidence threshold. (B) Relationship between the

square root of Ne and Dcrit.. The sensitivity of the K-S test improves by 67% when

Ne is increased by an order of magnitude.

Figure 3. Incorporation of experimental errors in the K-S test. ΔDcrit represents the

difference in Dcrit. obtained when cumulative probability density functions

(CPDF; see [7]) are compared instead of CDFs. Average

experimental error is given by σav. Results of individual simulations are

represented by open circles for Ne = 100. Least-squares regression of these results yielded

the indicated line. Regression lines for similar sets of Monte Carlo simulations

are also shown for Ne = 25, 400, and 900. For a given Ne, ΔDcrit is a linear

function of the square root of σav/Ne (see [8]).

Figure 4. The generalized K-S statistic for mixtures. Analytical solutions are available to

use in the K-S test when the mixtures are constructed from distributions drawn

from either identical (see [11]) or independent (see [12-13]) populations. These

two limiting cases are shown in the plot of PROB vs. D. The curves enclose the

spectrum of possible solutions.

Figure 5. Geologic map of the Marsyandi drainage, central Nepal (after Amidon et al.,

2005; see additional references cited within). Detrital zircon samples of river

sediments measured by Amidon et al. (2005) from main stem of the Marsyandi

River (E and K) and tributaries draining single geologic units (F, H, and I). See

Table 1 and text for further details.

Figure 6. (A) CPDFs of Marsyandi river samples used in mixing calculations (see

locations in Figure 5). The maximum probability solution (red) is based upon the

result assuming that the mixing end members are derived from independent

populations (see below). (B) Ternary plots of all mixtures of the end members

defined in Table 1 (see also Amidon et al. 2005). Contours of probability assume

that the mixing components are derived from identical populations (see Table 3).

The dashed line encloses all solutions that are indistinguishable from Amidon et

al.’s (2005) site K distribution at 95% confidence. Compositions predicted from

exposure area only and exposure area plus zircon concentration are from Amidon

et al. (2005) and are discussed in the text. (C) Same as for (B) above but contours

of probability assume that the mixing components are derived from independent

populations (see Table 3).

Figure 7. Relative erosion rates of geologic units exposed in the catchment of the

Marsyandi River based upon mixing calculations that assume end members are

derived from independent populations. The inverted triangle corresponds to

uniform erosion rates. The dashed line encloses the locus of relative

erosion rates that are consistent with solutions indistinguishable from the site K

distribution at 95% confidence. While most solutions indicate higher erosion

rates for the Lesser Himalayan formations that structurally underlie the Main

Central thrust (MCT), some acceptable solutions are also consistent with higher

erosion structurally above the MCT. The range of acceptable solutions can be

better constrained by more optimal sampling (see text).

[Figure 1 graphic not reproduced. Legible annotations: weighting coefficients φi and the weighted mixture $X_W = \sum_{i=1}^{m} \phi_i X_i$.]

[Figure 2 (Lovera et al.) graphic not reproduced. (A) PROB (%) versus D, with curves for Ne = 9, 25, 100, 400, and 900 and Dcrit marked. (B) Dcrit versus (Ne)^(1/2).]

[Figure 3 graphic not reproduced. ΔDcrit versus (σav)^(1/2) for Ne = 25, 100, 400, and 900, with the fitted relation ΔDcrit = 0.163 (σav/Ne)^(1/2).]

[Figure 4 graphic not reproduced. PROB (%) versus D for independent and identical populations, with an intermediate solution; regions are labeled "Distinct at 95% confidence" and "Indistinguishable at 95% confidence".]
