Kolmogorov-Smirnov Class for Java


  • 8/15/2019 Clase Kolmogorov Smirnov para Java

    1/50

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.commons.math3.stat.inference;

import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;

import org.apache.commons.math3.distribution.EnumeratedRealDistribution;
import org.apache.commons.math3.distribution.RealDistribution;
import org.apache.commons.math3.distribution.UniformRealDistribution;
import org.apache.commons.math3.exception.InsufficientDataException;
import org.apache.commons.math3.exception.MathArithmeticException;
import org.apache.commons.math3.exception.MathInternalError;
import org.apache.commons.math3.exception.NullArgumentException;
import org.apache.commons.math3.exception.NumberIsTooLargeException;
import org.apache.commons.math3.exception.OutOfRangeException;
import org.apache.commons.math3.exception.TooManyIterationsException;
import org.apache.commons.math3.exception.util.LocalizedFormats;
import org.apache.commons.math3.fraction.BigFraction;
import org.apache.commons.math3.fraction.BigFractionField;
import org.apache.commons.math3.fraction.FractionConversionException;
import org.apache.commons.math3.linear.Array2DRowFieldMatrix;
import org.apache.commons.math3.linear.FieldMatrix;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.random.JDKRandomGenerator;
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;
import org.apache.commons.math3.util.CombinatoricsUtils;
import org.apache.commons.math3.util.FastMath;
import org.apache.commons.math3.util.MathArrays;
import org.apache.commons.math3.util.MathUtils;

/**
 * Implementation of the <a href="http://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test">
 * Kolmogorov-Smirnov (K-S) test</a> for equality of continuous distributions.
 * <p>
 * The K-S test uses a statistic based on the maximum deviation of the empirical distribution of
 * sample data points from the distribution expected under the null hypothesis. For one-sample tests
 * evaluating the null hypothesis that a set of sample data points follow a given distribution, the
 * test statistic is \(D_n=\sup_x |F_n(x)-F(x)|\), where \(F\) is the expected distribution and
 * \(F_n\) is the empirical distribution of the \(n\) sample data points. The distribution of
 * \(D_n\) is estimated using a method based on [1] with certain quick decisions for extreme values
 * given in [2].
 * </p>
 * <p>
 * Two-sample tests are also supported, evaluating the null hypothesis that the two samples
 * {@code x} and {@code y} come from the same underlying distribution. In this case, the test
 * statistic is \(D_{n,m}=\sup_t | F_n(t)-F_m(t)|\) where \(n\) is the length of {@code x}, \(m\) is
 * the length of {@code y}, \(F_n\) is the empirical distribution that puts mass \(1/n\) at each of
 * the values in {@code x} and \(F_m\) is the empirical distribution of the {@code y} values. The
 * default 2-sample test method, {@link #kolmogorovSmirnovTest(double[], double[])} works as
 * follows:
 * <ul>
 * <li>For small samples (where the product of the sample sizes is less than
 * {@value #LARGE_SAMPLE_PRODUCT}), the method presented in [4] is used to compute the
 * exact p-value for the 2-sample test.</li>
 * <li>When the product of the sample sizes exceeds {@value #LARGE_SAMPLE_PRODUCT}, the asymptotic
 * distribution of \(D_{n,m}\) is used. See {@link #approximateP(double, int, int)} for details on
 * the approximation.</li>
 * </ul></p><p>
 * If the product of the sample sizes is less than {@value #LARGE_SAMPLE_PRODUCT} and the sample
 * data contains ties, random jitter is added to the sample data to break ties before applying
 * the algorithm above. Alternatively, the {@link #bootstrap(double[], double[], int, boolean)}
 * method, modeled after <a href="http://sekhon.berkeley.edu/matching/ks.boot.html">ks.boot</a>
 * in the R Matching package [3], can be used if ties are known to be present in the data.
 * </p>
 * <p>
 * In the two-sample case, \(D_{n,m}\) has a discrete distribution. This makes the p-value
 * associated with the null hypothesis \(H_0 : D_{n,m} \ge d \) differ from \(H_0 : D_{n,m} > d \)
 * by the mass of the observed value \(d\). To distinguish these, the two-sample tests use a boolean
 * {@code strict} parameter. This parameter is ignored for large samples.
 * </p>
 * <p>
 * The methods used by the 2-sample default implementation are also exposed directly:
 * <ul>
 * <li>{@link #exactP(double, int, int, boolean)} computes exact 2-sample p-values</li>
 * <li>{@link #approximateP(double, int, int)} uses the asymptotic distribution. The {@code boolean}
 * arguments in the first two methods allow the probability used to estimate the p-value to be
 * expressed using strict or non-strict inequality. See
 * {@link #kolmogorovSmirnovTest(double[], double[], boolean)}.</li>
 * </ul>
 * </p>
 * <p>
 * References:
 * <ul>
 * <li>[1] <a href="http://www.jstatsoft.org/v08/i18/"> Evaluating Kolmogorov's Distribution</a> by
 * George Marsaglia, Wai Wan Tsang, and Jingbo Wang</li>
 * <li>[2] <a href="http://www.jstatsoft.org/v39/i11/"> Computing the Two-Sided Kolmogorov-Smirnov
 * Distribution</a> by Richard Simard and Pierre L'Ecuyer</li>
 * <li>[3] Jasjeet S. Sekhon. 2011. <a href="http://www.jstatsoft.org/article/view/v042i07">
 * Multivariate and Propensity Score Matching Software with Automated Balance Optimization:
 * The Matching package for R</a> Journal of Statistical Software, 42(7): 1-52.</li>
 * <li>[4] Wilcox, Rand. 2012. Introduction to Robust Estimation and Hypothesis Testing,
 * Chapter 5, 3rd Ed. Academic Press.</li>
 * </ul>
 * <br/>
 * Note that [1] contains an error in computing h, refer to <a
 * href="https://issues.apache.org/jira/browse/MATH-437">MATH-437</a> for details.
 * </p>
 *
 * @since 3.3
 */
public class KolmogorovSmirnovTest {

    /**
     * Bound on the number of partial sums in {@link #ksSum(double, double, int)}
     */
    protected static final int MAXIMUM_PARTIAL_SUM_COUNT = 100000;

    /** Convergence criterion for {@link #ksSum(double, double, int)} */
    protected static final double KS_SUM_CAUCHY_CRITERION = 1E-20;

    /** Convergence criterion for the sums in #pelzGood(double, double, int)} */
    protected static final double PG_SUM_RELATIVE_ERROR = 1.0e-10;

    /** No longer used. */
    @Deprecated
    protected static final int SMALL_SAMPLE_PRODUCT = 200;

    /**
     * When product of sample si[...]
     */
    [...]
    private final RandomGenerator rng;

    /**
     * Construct a KolmogorovSmirnovTest instance with a default random data generator.
     */
    public KolmogorovSmirnovTest() {
        rng = new Well19937c();
    }

    /**
     * Construct a KolmogorovSmirnovTest with the provided random data generator.
     * The #monteCarloP(double, int, int, boolean, int) that uses the generator supplied to this
     * constructor is deprecated as of version 3.6.
     *
     * @param rng random data generator used by {@link #monteCarloP(double, int, int, boolean, int)}
     */
    @Deprecated
    public KolmogorovSmirnovTest(RandomGenerator rng) {
        this.rng = rng;
    }

    /**
     * Computes the <i>p-value</i>, or <i>observed significance level</i>, of a one-sample <a
     * href="http://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test"> Kolmogorov-Smirnov test</a>
     * evaluating the null hypothesis that {@code data} conforms to {@code distribution}. If
     * {@code exact} is true, the distribution used to compute the p-value is computed using
     * extended precision. See {@link #cdfExact(double, int)}.
     *
     * @param distribution reference distribution
     * @param data sample being evaluated
     * @param exact whether or not to force exact computation of the p-value
     * @return the p-value associated with the null hypothesis that {@code data} is a sample from
     *         {@code distribution}
     * @throws InsufficientDataException if {@code data} does not have length at least 2
     * @throws NullArgumentException if {@code data} is null
     */
    public double kolmogorovSmirnovTest(RealDistribution distribution, double[] data, boolean exact) {
        return 1d - cdf(kolmogorovSmirnovStatistic(distribution, data), data.length, exact);
    }
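The statistic handed to `cdf` above is the largest gap between the empirical CDF of the sorted sample and the reference CDF. The following is a minimal standalone sketch of that computation, not the Commons Math implementation: the class name `OneSampleKsSketch` and the use of a `DoubleUnaryOperator` in place of a `RealDistribution` are assumptions made so the example runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper, for illustration only.
final class OneSampleKsSketch {

    /**
     * Returns D_n = sup_x |F_n(x) - F(x)| for a non-empty sample,
     * where cdf is the reference distribution function F.
     */
    static double statistic(DoubleUnaryOperator cdf, double[] data) {
        final int n = data.length;
        final double[] sorted = data.clone();
        Arrays.sort(sorted);
        double d = 0d;
        for (int i = 1; i <= n; i++) {
            final double yi = cdf.applyAsDouble(sorted[i - 1]);
            // The empirical CDF jumps from (i - 1)/n to i/n at the i-th order statistic,
            // so the gap has to be checked on both sides of the jump.
            final double currD = Math.max(yi - (i - 1) / (double) n, i / (double) n - yi);
            if (currD > d) {
                d = currD;
            }
        }
        return d;
    }
}
```

For example, with the uniform CDF on [0, 1] written as `x -> x`, the sample `{0.1, 0.4, 0.7}` gives a statistic of about 0.3, reached at the last order statistic.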

    /**
     * Computes the one-sample Kolmogorov-Smirnov test statistic, \(D_n=\sup_x |F_n(x)-F(x)|\) where
     * \(F\) is the distribution (cdf) function associated with {@code distribution}, \(n\) is the
     * length of {@code data} and \(F_n\) is the empirical distribution that puts mass \(1/n\) at
     * each of the values in {@code data}.
     *
     * @param distribution reference distribution
     * @param data sample being evaluated
     * @return Kolmogorov-Smirnov statistic \(D_n\)
     * @throws InsufficientDataException if {@code data} does not have length at least 2
     * @throws NullArgumentException if {@code data} is null
     */
    public double kolmogorovSmirnovStatistic(RealDistribution distribution, double[] data) {
        checkArray(data);
        final int n = data.length;
        final double nd = n;
        final double[] dataCopy = new double[n];
        System.arraycopy(data, 0, dataCopy, 0, n);
        Arrays.sort(dataCopy);
        double d = 0d;
        for (int i = 1; i <= n; i++) {
            final double yi = distribution.cumulativeProbability(dataCopy[i - 1]);
            final double currD = FastMath.max(yi - (i - 1) / nd, i / nd - yi);
            if (currD > d) {
                d = currD;
            }
        }
        return d;
    }

    /**
     * Computes the <i>p-value</i>, or <i>observed significance level</i>, of a two-sample <a
     * href="http://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test"> Kolmogorov-Smirnov test</a>
     * evaluating the null hypothesis that {@code x} and {@code y} are samples drawn from the same
     * probability distribution. Specifically, what is returned is an estimate of the probability
     * that the {@link #kolmogorovSmirnovStatistic(double[], double[])} associated with a randomly
     * selected partition of the combined sample into subsamples of si[...]
     * <ul>
     * <li>[...]
     * in [4], implemented in {@link #exactP(double, int, int, boolean)}. </li>
     * <li>When the product of the sample sizes exceeds {@value #LARGE_SAMPLE_PRODUCT}, the
     * asymptotic distribution of \(D_{n,m}\) is used. See {@link #approximateP(double, int, int)}
     * for details on the approximation.</li>
     * </ul><p>
     * If {@code x.length * y.length} < {@value #LARGE_SAMPLE_PRODUCT} and the combined set of values in
     * {@code x} and {@code y} contains ties, random jitter is added to {@code x} and {@code y} to
     * break ties before computing \(D_{n,m}\) and the p-value. The jitter is uniformly distributed
     * on (-minDelta / 2, minDelta / 2) where minDelta is the smallest pairwise difference between
     * values in the combined sample.</p>
     * <p>
     * If ties are known to be present in the data, {@link #bootstrap(double[], double[], int, boolean)}
     * may be used as an alternative method for estimating the p-value.</p>
     *
     * @param x first sample dataset
     * @param y second sample dataset
     * @param strict whether or not the probability to compute is expressed as a strict inequality
     *        (ignored for large samples)
     * @return p-value associated with the null hypothesis that {@code x} and {@code y} represent
     *         samples from the same distribution
     * @throws InsufficientDataException if either {@code x} or {@code y} does not have length at
     *         least 2
     * @throws NullArgumentException if either {@code x} or {@code y} is null
     * @see #bootstrap(double[], double[], int, boolean)
     */
    public double kolmogorovSmirnovTest(double[] x, double[] y, boolean strict) {
        final long lengthProduct = (long) x.length * y.length;
        double[] xa = null;
        double[] ya = null;
        if (lengthProduct < LARGE_SAMPLE_PRODUCT && hasTies(x,y)) {
            xa = MathArrays.copyOf(x);
            ya = MathArrays.copyOf(y);
            fixTies(xa, ya);
        } else {
            xa = x;
            ya = y;
        }
        if (lengthProduct < LARGE_SAMPLE_PRODUCT) {
            return exactP(kolmogorovSmirnovStatistic(xa, ya), x.length, y.length, strict);
        }
        return approximateP(kolmogorovSmirnovStatistic(x, y), x.length, y.length);
    }
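The jitter width described in the Javadoc above depends only on the smallest gap between distinct values of the combined sample. As a rough standalone illustration (the class `JitterWidthSketch` and its method name are hypothetical, and a `TreeSet` stands in for whatever the library uses to collect distinct values), the half-width could be computed like this. Following the listing further down, the gap is assumed to start at 1, so it is never larger than 1.

```java
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class JitterWidthSketch {

    /**
     * Returns half of the smallest gap between distinct values of the
     * combined sample (capped at 1 before halving, and 0.5 if all values coincide).
     */
    static double jitterHalfWidth(double[] x, double[] y) {
        final TreeSet<Double> distinct = new TreeSet<>();
        for (double v : x) {
            distinct.add(v);
        }
        for (double v : y) {
            distinct.add(v);
        }
        double minDelta = 1;
        Double prev = null;
        for (double v : distinct) {
            // Values are visited in ascending order, so v - prev is a positive gap.
            if (prev != null && v - prev < minDelta) {
                minDelta = v - prev;
            }
            prev = v;
        }
        return minDelta / 2;
    }
}
```

Jitter drawn uniformly from the open interval of plus or minus this half-width cannot reorder two distinct values, which is why it separates tied values without changing the relative order of the rest of the sample.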

    /**
     * Computes the <i>p-value</i>, or <i>observed significance level</i>, of a two-sample <a
     * href="http://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test"> Kolmogorov-Smirnov test</a>
     * evaluating the null hypothesis that {@code x} and {@code y} are samples drawn from the same
     * probability distribution. Assumes the strict form of the inequality used to compute the
     * p-value. See {@link #kolmogorovSmirnovTest(RealDistribution, double[], boolean)}.
     *
     * @param x first sample dataset
     * @param y second sample dataset
     * @return p-value associated with the null hypothesis that {@code x} and {@code y} represent
     *         samples from the same distribution
     * @throws InsufficientDataException if either {@code x} or {@code y} does not have length at
     *         least 2
     * @throws NullArgumentException if either {@code x} or {@code y} is null
     */
    public double kolmogorovSmirnovTest(double[] x, double[] y) {
        return kolmogorovSmirnovTest(x, y, true);
    }

    /**
     * Computes the two-sample Kolmogorov-Smirnov test statistic, \(D_{n,m}=\sup_x |F_n(x)-F_m(x)|\)
     * where \(n\) is the length of {@code x}, \(m\) is the length of {@code y}, \(F_n\) is the
     * empirical distribution that puts mass \(1/n\) at each of the values in {@code x} and \(F_m\)
     * is the empirical distribution of the {@code y} values.
     *
     * @param x first sample
     * @param y second sample
     * @return test statistic \(D_{n,m}\) used to evaluate the null hypothesis that {@code x} and
     *         {@code y} represent samples from the same underlying distribution
     * @throws InsufficientDataException if either {@code x} or {@code y} does not have length at
     *         least 2
     * @throws NullArgumentException if either {@code x} or {@code y} is null
     */
    public double kolmogorovSmirnovStatistic(double[] x, double[] y) {
        return integralKolmogorovSmirnovStatistic(x, y)/((double)(x.length * (long)y.length));
    }
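To make the relation between \(D_{n,m}\) and its integer form \(n m D_{n,m}\) concrete, here is a small standalone sketch of a merge-style walk over the two sorted samples. It is written from the definition in the Javadoc above rather than copied from the library; the class and method names are hypothetical, and the inputs are assumed to be non-empty and free of NaN.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class TwoSampleKsSketch {

    /** Returns n * m * D_{n,m} as an exact integer. */
    static long integralStatistic(double[] x, double[] y) {
        final double[] sx = x.clone();
        final double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        final int n = sx.length;
        final int m = sy.length;
        int rankX = 0;
        int rankY = 0;
        long curD = 0L; // n * m * (F_n(z) - F_m(z)) at the current point z
        long supD = 0L;
        while (rankX < n && rankY < m) {
            final double z = Math.min(sx[rankX], sy[rankY]);
            // Each x value at z raises F_n by 1/n, i.e. raises curD by m.
            while (rankX < n && sx[rankX] == z) {
                rankX++;
                curD += m;
            }
            // Each y value at z raises F_m by 1/m, i.e. lowers curD by n.
            while (rankY < m && sy[rankY] == z) {
                rankY++;
                curD -= n;
            }
            supD = Math.max(supD, Math.abs(curD));
        }
        return supD;
    }

    /** Returns D_{n,m} = sup_x |F_n(x) - F_m(x)|. */
    static double statistic(double[] x, double[] y) {
        return integralStatistic(x, y) / ((double) x.length * y.length);
    }
}
```

Working with the integer \(n m D_{n,m}\) keeps comparisons between statistics exact, which matters when counting how many resampled statistics are equal to the observed one.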

    /**
     * Computes the two-sample Kolmogorov-Smirnov test statistic, \(D_{n,m}=\sup_x |F_n(x)-F_m(x)|\)
     * where \(n\) is the length of {@code x}, \(m\) is the length of {@code y}, \(F_n\) is the
     * empirical distribution that puts mass \(1/n\) at each of the values in {@code x} and \(F_m\)
     * is the empirical distribution of the {@code y} values. Finally \(n m D_{n,m}\) is returned
     * as long value.
     *
     * @param x first sample
     * @param y second sample
     * @return test statistic \(n m D_{n,m}\) used to evaluate the null hypothesis that {@code x} and
     *         {@code y} represent samples from the same underlying distribution
     * @throws InsufficientDataException if either {@code x} or {@code y} does not have length at
     *         least 2
     * @throws NullArgumentException if either {@code x} or {@code y} is null
     */
    private long integralKolmogorovSmirnovStatistic(double[] x, double[] y) {
        checkArray(x);
        checkArray(y);
        // Copy and sort the sample arrays
        final double[] sx = MathArrays.copyOf(x);
        final double[] sy = MathArrays.copyOf(y);
        Arrays.sort(sx);
        Arrays.sort(sy);
        final int n = sx.length;
        final int m = sy.length;

        int rankX = 0;
        int rankY = 0;
        long curD = 0l;

        // Find the max difference between cdf_x and cdf_y
        long supD = 0l;
        do {
            double z = Double.compare(sx[rankX], sy[rankY]) <= 0 ? sx[rankX] : sy[rankY];
            while(rankX < n && Double.compare(sx[rankX], z) == 0) {
                rankX += 1;
                curD += m;
            }
            while(rankY < m && Double.compare(sy[rankY], [...]

    [...]
    public double kolmogorovSmirnovTest(RealDistribution distribution, double[] data) {
        return kolmogorovSmirnovTest(distribution, data, false);
    }

    /**
     * Performs a <a href="http://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test"> Kolmogorov-Smirnov
     * test</a> evaluating the null hypothesis that {@code data} conforms to {@code distribution}.
     *
     * @param distribution reference distribution
     * @param data sample being evaluated
     * @param alpha significance level of the test
     * @return true iff the null hypothesis that {@code data} is a sample from {@code distribution}
     *         can be rejected with confidence 1 - {@code alpha}
     * @throws InsufficientDataException if {@code data} does not have length at least 2
     * @throws NullArgumentException if {@code data} is null
     */
    public boolean kolmogorovSmirnovTest(RealDistribution distribution, double[] data, double alpha) {
        if ((alpha <= 0) || (alpha > 0.5)) {
            throw new OutOfRangeException(LocalizedFormats.OUT_OF_BOUND_SIGNIFICANCE_LEVEL, alpha, 0, 0.5);
        }
        return kolmogorovSmirnovTest(distribution, data) < alpha;
    }

    /**
     * Estimates the <i>p-value</i> of a two-sample
     * <a href="http://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test"> Kolmogorov-Smirnov test</a>
     * evaluating the null hypothesis that {@code x} and {@code y} are samples drawn from the same
     * probability distribution. This method estimates the p-value by repeatedly sampling sets of si[...]
     */
    [...]
        final int yLength = y.length;
        final double[] combined = new double[xLength + yLength];
        System.arraycopy(x, 0, combined, 0, xLength);
        System.arraycopy(y, 0, combined, xLength, yLength);
        final EnumeratedRealDistribution dist = new EnumeratedRealDistribution(rng, combined);
        final long d = integralKolmogorovSmirnovStatistic(x, y);
        int greaterCount = 0;
        int equalCount = 0;
        double[] curX;
        double[] curY;
        long curD;
        for (int i = 0; i < iterations; i++) {
            curX = dist.sample(xLength);
            curY = dist.sample(yLength);
            curD = integralKolmogorovSmirnovStatistic(curX, curY);
            if (curD > d) {
                greaterCount++;
            } else if (curD == d) {
                equalCount++;
            }
        }
        return strict ? greaterCount / (double) iterations :
                        (greaterCount + equalCount) / (double) iterations;
    }

    /**
     * Computes {@code bootstrap(x, y, iterations, true)}.
     * This is equivalent to ks.boot(x,y, nboots=iterations) using the R Matching
     * package function. See #bootstrap(double[], double[], int, boolean).
     *
     * @param x first sample
     * @param y second sample
     * @param iterations number of bootstrap resampling iterations
     * @return estimated p-value
     */
    public double bootstrap(double[] x, double[] y, int iterations) {
        return bootstrap(x, y, iterations, true);
    }

    /**
     * Calculates \(P(D_n < d)\) using the method described in [1] with quick decisions for extreme
     * values given in [2] (see above). The result is not exact as with
     * {@link #cdfExact(double, int)} because calculations are based on
     * {@code double} rather than {@link org.apache.commons.math3.fraction.BigFraction}.
     *
     * @param d statistic
     * @param n sample si[...]
     */
    [...]
        return cdf(d, n, false);
    }

    /**
     * Calculates {@code P(D_n < d)}. The result is exact in the sense that BigFraction/BigReal is
     * used everywhere at the expense of very slow execution time. Almost never choose this in real
     * applications unless you are very sure; this is almost solely for verification purposes.
     * Normally, you would choose {@link #cdf(double, int)}. See the class
     * javadoc for definitions and algorithm description.
     *
     * @param d statistic
     * @param n sample si[...]
     */
    [...]

    /**
     * [...]
     * @param d statistic
     * @param n sample si[...]
     */
    [...]
            return res;
        } else if (1 - ninv <= d && d < 1) {
            return 1 - 2 * Math.pow(1 - d, n);
        } else if (1 <= d) {
            return 1;
        }
        if (exact) {
            return exactK(d, n);
        }
        if (n <= 140) {
            return roundedK(d, n);
        }
        return pelzGood(d, n);
    }

    /**
     * Calculates the exact value of {@code P(D_n < d)} using the method described in [1] (reference
     * in class javadoc above) and {@link org.apache.commons.math3.fraction.BigFraction} (see
     * above).
     *
     * @param d statistic
     * @param n sample si[...]
     */
    [...]
        final int k = (int) Math.ceil(n * d);

        final FieldMatrix<BigFraction> H = this.createExactH(d, n);
        final FieldMatrix<BigFraction> Hpower = H.power(n);

        BigFraction pFrac = Hpower.getEntry(k - 1, k - 1);

        for (int i = 1; i <= n; ++i) {
            pFrac = pFrac.multiply(i).divide(n);
        }

        /*
         * BigFraction.doubleValue converts numerator to double and the denominator to double and
         * divides afterwards. That gives NaN quite easy. This does not (scale is the number of
         * digits):
         */
        return pFrac.bigDecimalValue(20, BigDecimal.ROUND_HALF_UP).doubleValue();
    }

    /**
     * Calculates {@code P(D_n < d)} using method described in [1] and doubles (see above).
     *
     * @param d statistic
     * @param n sample si[...]
     */
    [...]
        final RealMatrix Hpower = H.power(n);

        double pFrac = Hpower.getEntry(k - 1, k - 1);
        for (int i = 1; i <= n; ++i) {
            pFrac *= (double) i / (double) n;
        }

        return pFrac;
    }

    /**
     * Computes the Pelz-Good approximation for \(P(D_n < d)\) as described in [2] in the
     * class javadoc.
     *
     * @param d value of d-statistic (x in [2])
     * @param n sample si[...]
     */
    [...]
        int k = 1;
        for (; k < MAXIMUM_PARTIAL_SUM_COUNT; k++) {
            kTerm = 2 * k - 1;
            increment = FastMath.exp([...]
            if ([...] PG_SUM_RELATIVE_ERROR * sum) {
                break;
            }
        }
        if (k == MAXIMUM_PARTIAL_SUM_COUNT) {
            throw new TooManyIterationsException(MAXIMUM_PARTIAL_SUM_COUNT);
        }
        ret = sum * FastMath.sqrt(2 * FastMath.PI) / [...]

        [...]
        if (k == MAXIMUM_PARTIAL_SUM_COUNT) {
            throw new TooManyIterationsException(MAXIMUM_PARTIAL_SUM_COUNT);
        }
        final double sqrtHalfPi = FastMath.sqrt(FastMath.PI / 2);
        // Instead of doubling sum, divide by 3 instead of 6
        ret += sum * sqrtHalfPi / (3 * [...]

        [...]
        for (k = 0; k < MAXIMUM_PARTIAL_SUM_COUNT; k++) {
            kTerm = k + 0.5;
            kTerm2 = kTerm * kTerm;
            kTerm4 = kTerm2 * kTerm2;
            kTerm6 = kTerm4 * kTerm2;
            increment = (pi6 * kTerm6 * (5 - 30 * [...]

        [...]
            throw new TooManyIterationsException(MAXIMUM_PARTIAL_SUM_COUNT);
        }
        return ret + (sqrtHalfPi / (sqrtN * n)) * (sum / (3240 * [...]

        BigFraction h = null;
        try {
            h = new BigFraction(hDouble, 1.0e-20, 10000);
        } catch (final FractionConversionException e1) {
            try {
                h = new BigFraction(hDouble, 1.0e-10, 10000);
            } catch (final FractionConversionException e2) {
                h = new BigFraction(hDouble, 1.0e-5, 10000);
            }
        }
        final BigFraction[][] Hdata = new BigFraction[m][m];

        /*
         * Start by filling everything with either 0 or 1.
         */
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < m; ++j) {
                if (i - j + 1 < 0) {
                    Hdata[i][j] = BigFraction.ZERO;
                } else {
                    Hdata[i][j] = BigFraction.ONE;
                }
            }
        }

        /*
         * Setting up power-array to avoid calculating the same value twice: hPowers[0] = h^1 ...
         * hPowers[m-1] = h^m
         */
        final BigFraction[] hPowers = new BigFraction[m];
        hPowers[0] = h;
        for (int i = 1; i < m; ++i) {
            hPowers[i] = h.multiply(hPowers[i - 1]);
        }

        /*
         * First column and last row has special values (each other reversed).
         */
        for (int i = 0; i < m; ++i) {
            Hdata[i][0] = Hdata[i][0].subtract(hPowers[i]);
            Hdata[m - 1][i] = Hdata[m - 1][i].subtract(hPowers[m - i - 1]);
        }

        /*
         * [1] states: "For 1/2 < h < 1 the bottom left element of the matrix should be (1 - 2*h^m +
         * (2h - 1)^m )/m!" Since 0 <= h < 1, then if h > 1/2 is sufficient to check:
         */
        if (h.compareTo(BigFraction.ONE_HALF) == 1) {
            Hdata[m - 1][0] = Hdata[m - 1][0].add(h.multiply(2).subtract(1).pow(m));
        }

        /*
         * Aside from the first column and last row, the (i, j)-th element is 1/(i - j + 1)! if i -
         * j + 1 >= 0, else 0. 1's and 0's are already put, so only division with (i - j + 1)! is
         * needed in the elements that have 1's. There is no need to calculate (i - j + 1)! and then
         * divide - small steps avoid overflows. Note that i - j + 1 > 0 <=> i + 1 > j instead of
         * j'ing all the way to m. Also note that it is started at g = 2 because dividing by 1 isn't
         * really necessary.
         */
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < i + 1; ++j) {
                if (i - j + 1 > 0) {
                    for (int g = 2; g <= i - j + 1; ++g) {
                        Hdata[i][j] = Hdata[i][j].divide(g);
                    }
                }
            }
        }
        return new Array2DRowFieldMatrix<BigFraction>(BigFractionField.getInstance(), Hdata);
    }

    /***
     * Creates {@code H} of si[...]
     */
    [...]
        /*
         * Start by filling everything with either 0 or 1.
         */
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < m; ++j) {
                if (i - j + 1 < 0) {
                    Hdata[i][j] = 0;
                } else {
                    Hdata[i][j] = 1;
                }
            }
        }

        /*
         * Setting up power-array to avoid calculating the same value twice: hPowers[0] = h^1 ...
         * hPowers[m-1] = h^m
         */
        final double[] hPowers = new double[m];
        hPowers[0] = h;
        for (int i = 1; i < m; ++i) {
            hPowers[i] = h * hPowers[i - 1];
        }

        /*
         * First column and last row has special values (each other reversed).
         */
        for (int i = 0; i < m; ++i) {
            Hdata[i][0] = Hdata[i][0] - hPowers[i];
            Hdata[m - 1][i] -= hPowers[m - i - 1];
        }

        /*
         * [1] states: "For 1/2 < h < 1 the bottom left element of the matrix should be (1 - 2*h^m +
         * (2h - 1)^m )/m!" Since 0 <= h < 1, then if h > 1/2 is sufficient to check:
         */
        if (Double.compare(h, 0.5) > 0) {
            Hdata[m - 1][0] += FastMath.pow(2 * h - 1, m);
        }

        /*
         * Aside from the first column and last row, the (i, j)-th element is 1/(i - j + 1)! if i -
         * j + 1 >= 0, else 0. 1's and 0's are already put, so only division with (i - j + 1)! is
         * needed in the elements that have 1's. There is no need to calculate (i - j + 1)! and then
         * divide - small steps avoid overflows. Note that i - j + 1 > 0 <=> i + 1 > j instead of
         * j'ing all the way to m. Also note that it is started at g = 2 because dividing by 1 isn't
         * really necessary.
         */
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < i + 1; ++j) {
                if (i - j + 1 > 0) {
                    for (int g = 2; g <= i - j + 1; ++g) {
                        Hdata[i][j] /= g;
                    }
                }
            }
        }
        return MatrixUtils.createRealMatrix(Hdata);
    }

    /**
     * Verifies that {@code array} has length at least 2.
     *
     * @param array array to test
     * @throws NullArgumentException if array is null
     * @throws InsufficientDataException if array is too short
     */
    private void checkArray(double[] array) {
        if (array == null) {
            throw new NullArgumentException(Locali[...]

    /**
     * [...]
     * @return Kolmogorov sum evaluated at t
     * @throws TooManyIterationsException if the series does not converge
     */
    public double ksSum(double t, double tolerance, int maxIterations) {
        if (t == 0.0) {
            return 0.0;
        }

        // TODO: for small t (say less than 1), the alternative expansion in part 3 of [1]
        // from class javadoc should be used.

        final double x = -2 * t * t;
        int sign = -1;
        long i = 1;
        double partialSum = 0.5d;
        double delta = 1;
        while (delta > tolerance && i < maxIterations) {
            delta = FastMath.exp(x * i * i);
            partialSum += sign * delta;
            sign *= -1;
            i++;
        }
        if (i == maxIterations) {
            throw new TooManyIterationsException(maxIterations);
        }
        return partialSum * 2;
    }
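The loop in `ksSum` accumulates the alternating series \(1 + 2 \sum_{i \ge 1} (-1)^i e^{-2 i^2 t^2}\), starting the partial sum at 0.5 and doubling at the end. The sketch below mirrors that loop using only the JDK so it can be run in isolation; the class name `KsSumSketch` is hypothetical, and `IllegalStateException` is an assumption standing in for the library's `TooManyIterationsException`.

```java
// Hypothetical helper, for illustration only.
final class KsSumSketch {

    /**
     * Evaluates 1 + 2 * sum_{i>=1} (-1)^i * exp(-2 i^2 t^2), stopping once a
     * term drops to the tolerance or below.
     */
    static double ksSum(double t, double tolerance, int maxIterations) {
        if (t == 0.0) {
            return 0.0;
        }
        final double x = -2 * t * t;
        int sign = -1;
        long i = 1;
        double partialSum = 0.5; // half of the leading 1; doubled on return
        double delta = 1;
        while (delta > tolerance && i < maxIterations) {
            delta = Math.exp(x * i * i);
            partialSum += sign * delta;
            sign *= -1;
            i++;
        }
        if (i == maxIterations) {
            throw new IllegalStateException("series did not converge");
        }
        return partialSum * 2;
    }
}
```

Called as `KsSumSketch.ksSum(1.0, 1e-20, 100000)`, the terms shrink so quickly that only a handful of iterations are needed before the tolerance is met. For small `t` the terms decay slowly, which is the situation the TODO comment in the listing refers to.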

    /**
     * Given a d-statistic in the range [0, 1] and the two sample si[...]
     * comparison with other integral d-statistics. Depending whether {@code strict} is
     * {@code true} or not, the returned value divided by (n*m) is greater than
     * (resp greater than or equal to) the given d value (allowing some tolerance).
     *
     * @param d a d-statistic in the range [0, 1]
     * @param n first sample si[...]
     */
    [...]

    /**
     * [...]
     * {@link #kolmogorovSmirnovStatistic(double[], double[])} for the definition of \(D_{n,m}\).
     * <p>
     * The returned probability is exact, implemented by unwinding the recursive function
     * definitions presented in [4] (class javadoc).
     * </p>
     *
     * @param d D-statistic value
     * @param n first sample si[...]
     */
    [...]

    /**
     * [...]
     * {@link #kolmogorovSmirnovStatistic(double[], double[])} for the definition of \(D_{n,m}\).
     * <p>
     * Specifically, what is returned is \(1 - k(d \sqrt{mn / (m + n)})\) where \(k(t) = 1 + 2
     * \sum_{i=1}^\infty (-1)^i e^{-2 i^2 t^2}\). See {@link #ksSum(double, double, int)} for
     * details on how convergence of the sum is determined. This implementation passes {@code ksSum}
     * {@value #KS_SUM_CAUCHY_CRITERION} as {@code tolerance} and
     * {@value #MAXIMUM_PARTIAL_SUM_COUNT} as {@code maxIterations}.
     * </p>
     *
     * @param d D-statistic value
     * @param n first sample si[...]
     */
    [...]

    /**
     * [...]
     * The method uses a simplified version of the Fisher-Yates shuffle algorithm.
     * By processing first the {@code true} values followed by the remaining {@code false} values
     * less random numbers need to be generated. The method is optimi[...]
     */
    [...] RandomGenerator rng) {
        Arrays.fill(b, true);
        for (int k = numberOfTrueValues; k < b.length; k++) {
            final int r = rng.nextInt(k + 1);
            b[(b[r]) ? r : k] = false;
        }
    }

    /**
     * Uses Monte Carlo simulation to approximate \(P(D_{n,m} > d)\) where \(D_{n,m}\) is the
     * 2-sample Kolmogorov-Smirnov statistic. See
     * {@link #kolmogorovSmirnovStatistic(double[], double[])} for the definition of \(D_{n,m}\).
     * <p>
     * The simulation generates {@code iterations} random partitions of {@code m + n} into an
     * {@code n} set and an {@code m} set, computing \(D_{n,m}\) for each partition and returning
     * the proportion of values that are greater than {@code d}, or greater than or equal to
     * {@code d} if {@code strict} is {@code false}.
     * </p>
     *
     * @param d D-statistic value
     * @param n first sample si[...]
     */
    [...]

    /**
     * [...]
     * <p>
     * Here d is the D-statistic represented as long value.
     * The real D-statistic is obtained by dividing d by n*m.
     * See also {@link #monteCarloP(double, int, int, boolean, int)}.
     *
     * @param d integral D-statistic
     * @param n first sample si[...]
     */
    [...]

        final double[] values = MathArrays.unique(MathArrays.concatenate(x,y));
        if (values.length == x.length + y.length) {
            return;  // There are no ties
        }

        // Find the smallest difference between values, or 1 if all values are the same
        double minDelta = 1;
        double prev = values[0];
        double delta = 1;
        for (int i = 1; i < values.length; i++) {
            delta = prev - values[i];
            if (delta < minDelta) {
                minDelta = delta;
            }
            prev = values[i];
        }
        minDelta /= 2;

        // Add jitter using a fixed seed (so same arguments always give same results),
        // low-initiali[...]
        final RealDistribution dist =
                new UniformRealDistribution(new JDKRandomGenerator(100), -minDelta, minDelta);

        // It is theoretically possible that jitter does not break ties, so repeat
        // until all ties are gone.  Bound the loop and throw MIE if bound is exceeded.
        int ct = 0;
        boolean ties = true;
        do {
            jitter(x, dist);
            jitter(y, dist);
            ties = hasTies(x, y);
            ct++;
        } while (ties && ct < 1000);
        if (ties) {
            throw new MathInternalError(); // Should never happen
        }
    }

    /**
     * Returns true iff there are ties in the combined sample
     * formed from x and y.
     *
     * @param x first sample
     * @param y second sample
     * @return true if x and y together contain ties
     */
    private static boolean hasTies(double[] x, double[] y) {
        final HashSet<Double> values = new HashSet<Double>();
        for (int i = 0; i < x.length; i++) {
            if (!values.add(x[i])) {
                return true;
            }
        }
        for (int i = 0; i < y.length; i++) {
            if (!values.add(y[i])) {
                return true;
            }
        }
        return false;
    }

    /**
     * Adds random jitter to {@code data} using deviates sampled from {@code dist}.
     * <p>
     * Note that jitter is applied in-place - i.e., the array
     * values are overwritten with the result of applying jitter.</p>
     *
     * @param data input/output data array - entries overwritten by the method
     * @param dist probability distribution to sample for jitter values
     * @throws NullPointerException if either of the parameters is null
     */
    private static void jitter(double[] data, RealDistribution dist) {
        for (int i = 0; i < data.length; i++) {
            data[i] += dist.sample();
        }
    }

    /**
     * The function C(i, j) defined in [4] (class javadoc), formula (5.5).
     * defined to return 1 if |i/n - j/m| <= c; 0 otherwise. Here c is scaled up
     * and recoded as a long to avoid rounding errors in comparison tests, so what
     * is actually tested is |im - jn| <= cmn.
     *
     * @param i first path parameter
     * @param j second path parameter
     * @param m first sample si[...]
     */
    [...]
        /*
         * Compute n(1,1), n(1,2)...n(2,1), n(2,2)... up to n(i,j), one row at a time.
         * When n(i,*) are being computed, lag[] holds the values of n(i - 1, *).
         */
        final double[] lag = new double[n];
        double last = 0;
        for (int k = 0; k < n; k++) {
            lag[k] = c(0, k + 1, m, n, cnm, strict);
        }
        for (int k = 1; k <= i; k++) {
            last = c(k, 0, m, n, cnm, strict);
            for (int l = 1; l <= j; l++) {
                lag[l - 1] = c(k, l, m, n, cnm, strict) * (last + lag[l - 1]);
                last = lag[l - 1];
            }
        }
        return last;
    }
}