Lecture 1: Essential statistics
Shane Elipot
The Rosenstiel School of Marine and Atmospheric Science, University of Miami
Created with {Remark.js} using {Markdown} + {MathJax}
Foreword
Statistics (noun, plural in form but singular or plural in construction)
1. Merriam-Webster: a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data
2. Oxford Dictionary of English:
a. the practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
b. a collection of quantitative data (e.g. The statistics of the data are unknown.)
Statistic (noun) a single term or datum in a collection of statistics (e.g. The mean of the data is zero.)
References
[1] Bendat, J. S., & Piersol, A. G. (2011). Random data: analysis and measurement procedures (Vol. 729). John Wiley & Sons.
[2] Thomson, R. E., & Emery, W. J. (2014). Data analysis methods in physical oceanography. Newnes. dx.doi.org/10.1016/B978-0-12-387782-6.00003-X
[3] Taylor, J. (1997). Introduction to error analysis, the study of uncertainties in physical measurements.
[4] Press, W. H. et al. (2007). Numerical recipes 3rd edition: The art of scientific computing. Cambridge University Press.
[5] Kanji, G. K. (2006). 100 statistical tests. Sage.
[6] von Storch, H. and Zwiers, F. W. (1999). Statistical Analysis in Climate Research. Cambridge University Press.
Foreword
A quote from Data Analysis Methods in Physical Oceanography, Thomson and Emery, Third Edition, 2014:
"Statistical methods are essential to determining the value of the data and to decide how much of it can be considered useful for the intended analysis. This statistical approach arises from the fundamental complexity of the ocean, a multivariate system with many degrees of freedom in which nonlinear dynamics and sampling limitations make it difficult to separate scales of variability."
What do multivariate, degrees of freedom, sampling, and variability mean in this context?
Lecture 1: Outline
1. Introduction
2. Estimation vs Truth
3. Fundamental Statistics
4. Common Distributions
5. Errors, Uncertainties, and hypothesis testing
6. Extra slides
1. Introduction
Introduction
In oceanography and atmospheric science, we have at our disposal a number of physical theories, that is equations, that describe the spatial and temporal variability of the climate system.
Specifically, in geophysical fluid dynamics, the Navier-Stokes equations describe the motion of a fluid in 2D or 3D. Yet, we do not know if these equations have reasonable physical solutions (if you figure this out, there's a \$1M prize from the Clay Mathematics Institute). Assuming that they do, then the ocean, as an example, can be seen as being a deterministic system, which means that mathematical expressions can be used to describe completely the velocity and pressure field (e.g. $p(\mathbf{x}, t) = p_0 \cos(\omega t - \mathbf{k} \cdot \mathbf{x})$).
Yet, even if we do not know the analytical solutions, we can discretize the equations within computer models to obtain approximate solutions describing the flow deterministically... (in theory).
Introduction
A deterministic approach is not realistic because of three commonly acknowledged reasons. The ocean and the atmosphere are complex, nonlinear, and unpredictable.
As an example, there are an estimated $4.7 \times 10^{46}$ water molecules in the ocean (that's complexity) so that there are too many variables, and too many initial and boundary conditions to be specified (jointly forming the number of degrees of freedom of the system) in order to solve all equations numerically in a computer model. Because many variables cannot be observed, or are unspecified at the start of a simulation, outcomes will appear random to the observer (that's unpredictability).
Finally, the ocean and atmosphere are nonlinear, which means that you cannot really find a portion of the system (e.g. surface gravity waves) with a finite number of degrees of freedom whose evolution is isolated and can be made deterministic. Unknown perturbations will cause the observations to contain randomness, or noise.
Introduction
Once we accept that the climate is not a purely deterministic system, but contains randomness, we can rely on a suite of tools especially applicable to stochastic systems or processes.
stochastic (adjective, technical)
having a random probability distribution or pattern that may be analysed statistically but may not be predicted precisely.
As a consequence, in the rest of this lecture, we will often discuss random variables (hereafter r.v.), that is variables to which statistical theory can be applied. In addition, we will take the approach that our system, or our observational data, can be separated into signal plus noise. As an example, estimation of the seasonal cycle of ocean temperature (the sought-after signal) is disturbed by changes due to ocean currents, but also by changes due to the imperfections of your temperature sensors, etc.
2. Estimation vs truth
Estimation vs truth
We are in the business of estimation: we try to describe and/or understand the climate system by estimating the value of a random variable. This r.v. may be a physical variable such as air temperature, water vapor, sea ice area, sea surface temperature, sea level, snow cover, glacier volume, CO$_2$ concentration, etc.
Or it may be a variable derived from one or several other r.v., in essence a parameter. This parameter is itself a r.v. Examples are ocean heat content, the daily mean temperature, the decadal temperature trend, the acoustic travel time in water, the amplitude of the seasonal cycle of temperature, the adiabatic lapse rate, the power spectral density function of velocity, the precipitation rate, etc.
The distinction between variable and parameter is maybe semantic. In any case, let's call $\phi$ a r.v. of interest.
Estimation vs truth
Unfortunately, it is likely that we will never know $\phi$ exactly, but only access an estimate that we will denote $\hat{\phi}$ ($\phi$ "hat"). This estimate means we use a given method or a given instrument to measure or calculate $\phi$.
As an example, we want to know the temperature of the room. We can:
1. use one temperature sensor at one fixed location (in the middle of the room), repeatedly through time, $N$ times.
2. use $N$ temperature sensors, once.
3. use one temperature sensor, used repeatedly $N$ times, each time in a different corner of the room.
4. etc.
Each one of these methods leads to one estimate of "the temperature of the room".
Expectation of an estimate
Let's assume that we design an experiment and obtain $\hat{\phi}$, repeatedly $N$ times. The expectation value of $\hat{\phi}$, denoted $E[\hat{\phi}]$, is
$$E[\hat{\phi}] = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \hat{\phi}_n$$
where $\hat{\phi}_n$ is the estimate from the $n$-th experiment. Unfortunately, since we must have $N \to \infty$, the expectation cannot be known.
However, if we can reasonably make some assumptions about the statistics of the r.v. itself, and/or if we know the specifications of our instruments, then we can sometimes derive the expectation of our estimator.
Why is this important? To assess how good our estimate is.
Expectation of an estimate
Let's assume that we have access to the expectation $E[\hat{\phi}]$ of an estimator $\hat{\phi}$ of the r.v. $\phi$.
If we are lucky, $E[\hat{\phi}] = \phi$, and the estimator $\hat{\phi}$ is said to be unbiased. Otherwise, it is said to be biased and we define the bias of the estimator:
$$b[\hat{\phi}] = E[\hat{\phi}] - \phi,$$
which constitutes a systematic error of the estimate.
In addition, the value of the estimator will change from one experiment to the next, so we define the variance of the estimator, denoted $\mathrm{Var}[\hat{\phi}]$, as
$$\mathrm{Var}[\hat{\phi}] = E[(\hat{\phi} - E[\hat{\phi}])^2],$$
which constitutes the random error of the estimate.
Expectation of an estimate
The bias and the variance of the estimator both contribute to its total error, which can be assessed by the mean square error (MSE):
$$\mathrm{MSE}[\hat{\phi}] = E[(\hat{\phi} - \phi)^2] = \mathrm{Var}[\hat{\phi}] + (b[\hat{\phi}])^2,$$
usually reported as the root mean square error:
$$\mathrm{RMS} = \sqrt{\mathrm{MSE}[\hat{\phi}]} = \sqrt{\mathrm{Var}[\hat{\phi}] + (b[\hat{\phi}])^2}$$
Another sometimes useful quantity is the normalized rms error:
$$\varepsilon[\phi] = \frac{\sqrt{E[(\hat{\phi} - \phi)^2]}}{\phi} \quad \text{for } \phi \neq 0,$$
which is unitless and can be reported as a percentage.
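The decomposition of the MSE into variance plus squared bias can be checked numerically. The following is a minimal Python sketch (standard library only; the lecture's own snippets use Matlab). The "true" temperature, the sensor offset, and the noise level are hypothetical values chosen for illustration, and the expectation is approximated by an average over a large but finite number of simulated experiments.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

phi_true = 20.0      # hypothetical "true" room temperature
sensor_offset = 0.3  # hypothetical systematic error of the sensor
sensor_noise = 0.5   # hypothetical standard deviation of the random error

# Each simulated experiment returns one estimate phi_hat of phi_true.
estimates = [phi_true + sensor_offset + random.gauss(0.0, sensor_noise)
             for _ in range(10_000)]

expectation = statistics.fmean(estimates)                   # approximates E[phi_hat]
bias = expectation - phi_true                               # b[phi_hat]
variance = statistics.pvariance(estimates, mu=expectation)  # Var[phi_hat]
mse = statistics.fmean([(e - phi_true) ** 2 for e in estimates])
rms = mse ** 0.5

print(f"bias = {bias:.3f}, variance = {variance:.3f}")
print(f"MSE = {mse:.3f}, Var + bias^2 = {variance + bias ** 2:.3f}, RMS = {rms:.3f}")
```

For a finite set of estimates the identity MSE = Var + bias² holds exactly (up to rounding) when the variance is computed with the 1/N factor, which is why `pvariance` is used here.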
Estimation vs truth
From the International Organization for Standardization (ISO) publication 5725-1:1994 Accuracy (trueness and precision) of measurement methods and results, Part 1: General principles and definitions
Introduction 0.1 ISO 5725 uses two terms "trueness" and "precision" to describe the accuracy of a measurement method. "Trueness" refers to the closeness of agreement between the arithmetic mean of a large number of test results and the true or accepted reference value. "Precision" refers to the closeness of agreement between test results.
Example
Specification sheets of Seabird 911 CTD Plus sensors:
An interpretation is that the accuracy is the total error, or RMS error, which is originally only the random error. The stability implies that the bias, originally zero, increases with time.
3. Fundamental statistics
Fundamental statistics
Let's consider a random variable $x$, for which we obtain (that is measure, calculate) values $X_n$, dependent on an index $n$ along a given dimension, or scale (e.g. temperature as a function of time, sea level along a satellite track, oxygen concentration as a function of depth, etc.).
Statistical theory often considers r.v. that are continuous, as in $X(t)$. Since we are typically dealing with digital or numerical data that are discrete, we will take a discrete approach in this course, as in $X_n = X(t_n) = X(n \Delta t)$ where $\Delta t$ is the step of the record.
Sometimes, we will need to revert to continuous notation; as an example:
$$\sum_{n=0}^{N} X_n \Delta t \Longleftrightarrow \int_0^T X(t)\,dt, \quad T = N \Delta t$$
Mean
The first statistical quantity to consider is the true mean, population mean, or expectation of $x$:
$$\mu_x \equiv E[X] = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} X_n$$
Unfortunately, we only have a finite number of estimates $X_n,\ n = 1, \ldots, N$, so we compute the sample mean
$$\overline{X} \equiv \frac{1}{N} \sum_{n=1}^{N} X_n = \hat{\mu}_x$$
where the last equality means that the sample mean is an estimator of the true mean of $x$.
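Variance
The next statistical quantity of interest is the variance of $x$:
$$\sigma_x^2 \equiv E[(X - \mu_x)^2].$$
$\sigma_x = \sqrt{\sigma_x^2}$ is called the standard deviation.
An estimator of $\sigma_x^2$ is the sample variance $s_x^2$:
$$s_x^2 = \hat{\sigma}_x^2 = \frac{1}{N-1} \sum_{n=1}^{N} (X_n - \overline{X})^2 = \frac{1}{N-1} \sum_{n=1}^{N} \left( X_n - \frac{1}{N} \sum_{m=1}^{N} X_m \right)^2$$
Note the factor $\frac{1}{N-1}$ instead of $\frac{1}{N}$ in this expression; see section 4.1 of reference [1] for an explanation.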
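As a concrete illustration of the two normalizations, here is a minimal Python sketch (standard library only; the lecture's own snippets use Matlab). The record of values is made up, and the function name `sample_variance` is a hypothetical helper introduced only for this example.

```python
import statistics


def sample_variance(values):
    """Sample variance s_x^2 with the 1/(N-1) factor."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Hypothetical record of N = 8 values (made up for illustration).
record = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s2 = sample_variance(record)                  # divides by N - 1
s2_library = statistics.variance(record)      # same 1/(N-1) convention
s2_biased = statistics.pvariance(record)      # divides by N instead

print(s2, s2_library, s2_biased)
```

The difference between the two conventions shrinks as $N$ grows, since the ratio of the two estimates is $(N-1)/N$.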
Mean
Let's go back to the estimator $\overline{X}$ of $\mu_x$. It is an estimator, so is it biased? How much does it vary?
The expectation is a mathematical operator which is linear:
$$E[aX + bY] = aE[X] + bE[Y]$$
Let's use this rule for $\overline{X}$:
$$E[\overline{X}] = E\left[\frac{1}{N} \sum_{n=1}^{N} X_n\right] = \frac{1}{N} \sum_{n=1}^{N} E[X_n] = \frac{1}{N} (N \mu_x) = \mu_x$$
Note that $E[X_n] = \mu_x$ is a definition, valid for all $n$!
Since $E[\overline{X}] = \mu_x$, it is said that $\overline{X}$ is an unbiased estimator of $\mu_x$ (see slide on bias). This means that the more observations of $x$ we obtain, the more accurate the estimation of the mean will be. That does not mean that $\overline{X}$ is free of errors...
Mean
How much does the sample mean estimator vary? Recall the definition of the MSE of an estimator; for $\overline{X}$ it is
$$E[(\overline{X} - \mu_x)^2] = \mathrm{Var}[\overline{X}] + (b[\overline{X}])^2 = \mathrm{Var}[\overline{X}] + 0$$
Under some assumption (that the $X_n$ are independent), it can be shown (your homework, or section 4.1 of reference [1]) that
$$\mathrm{Var}[\overline{X}] = \frac{\sigma_x^2}{N} = \frac{\mathrm{Var}[X]}{N}$$
The variance of the mean estimator is the true variance of the data ($\sigma_x^2$) divided by the number of observations ($N$). Since we typically do not know the true variance, we substitute the sample variance for it to obtain the standard error of the mean, or random error for the mean:
$$\mathrm{s.e.}[\overline{X}] \equiv \sqrt{\mathrm{Var}[\overline{X}]} = \sqrt{\frac{\hat{\sigma}_x^2}{N}} = \frac{s_x}{\sqrt{N}}$$
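The standard error of the mean is straightforward to compute from a record. Below is a minimal Python sketch (standard library only; the lecture's own snippets use Matlab). The record is made up and `standard_error_of_mean` is a hypothetical helper name; the formula assumes independent values, as stated above.

```python
import math
import statistics


def standard_error_of_mean(values):
    """s.e. of the sample mean, s_x / sqrt(N), assuming independent values."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical record of N = 8 values (made up for illustration).
record = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(record)
se = standard_error_of_mean(record)
print(f"sample mean = {mean:.3f} +/- {se:.3f} (standard error)")
```

If successive values are correlated (as is common in time series), the effective number of independent observations is smaller than $N$ and this sketch would underestimate the uncertainty.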
Mean
$\mathrm{s.e.}[\overline{X}]$ is a measure of the uncertainty, or of our capability of estimating the mean value of $x$.
Example from Beal, L. M. et al. (2015), Capturing the Transport Variability of a Western Boundary Jet: Results from the Agulhas Current Time-series experiment (ACT), J. Phys. Oceanogr., 45, 1302-1324, doi:10.1175/JPO-D-14-0119.1
Example: Agulhas Current boundary transport from Beal, L. M. and S. Elipot, Broadening not strengthening of the Agulhas Current since the early 1990s, Nature, 540, 570-573, doi:10.1038/nature19853
Depending on your data, your point of view, and your interests, the mean and the variance may tell you a lot, or little, about your data. Thus, you may want to look at the frequency distribution plot, or histogram, which is a count of your data values in a number of discrete intervals.
Histogram
There is no general rule (only recommendations) on how to choose the size of the bins, or the number of bins to be used.
Histogram
The frequency of occurrences of a given value $x = a$ is a quantity derived from $x$ and can itself be estimated. The red line in this plot shows a kernel estimate of the histogram (see practical session this afternoon).
Probability function
Let's consider the probability, or relative count of occurrences, to obtain a value $X$ in the interval $[x_{k-1}, x_k]$. This defines a probability function:
$$P^*(x_{k-1} \le X \le x_k) = P^*_k = \frac{c_k}{N}, \quad c_k: \text{count in } [x_{k-1}, x_k]$$
$$\sum_k P^*_k = 1, \quad k: \text{interval index}$$
But the overall values of $P^*_k$ still depend on the width $\Delta X$ of the bins. See this example with $\Delta X = 10$ and $\Delta X = 5$.
PDF and CDF
Let's consider instead the discrete probability density function, or PDF:
$$P(x_{k-1} \le X \le x_k) = P_k = \frac{c_k}{N \Delta X}, \quad \Delta X = x_k - x_{k-1}$$
and the discrete cumulative (probability) distribution function, or CDF:
$$F(x_k) = P^*(x_0 \le X \le x_k) = P^*(X \le x_k) = \sum_{i \le k} P^*_i$$
Since all the values $X$ are contained between $\min[X]$ and $\max[X]$:
$$P(\min[X] \le X \le \max[X]) = 1 \quad \text{or} \quad \sum_k P_k \Delta X = \left(\frac{N}{N \Delta X}\right) \Delta X = 1$$
$$F(\max[X]) = 1$$
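The definitions of $P^*_k$, $P_k$ and $F(x_k)$ can be sketched in a few lines. This is a minimal Python illustration (standard library only; the lecture's own snippets use Matlab), assuming equal-width bins. The data values and the helper name `discrete_pdf_cdf` are hypothetical, and values falling outside the binned range are simply placed in the nearest end bin.

```python
def discrete_pdf_cdf(values, x_min, bin_width, n_bins):
    """Return (P_star, P, F) for n_bins equal-width bins starting at x_min."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - x_min) // bin_width)
        k = min(max(k, 0), n_bins - 1)  # clamp, so the right edge lands in the last bin
        counts[k] += 1
    n = len(values)
    p_star = [c / n for c in counts]            # P*_k = c_k / N
    p = [c / (n * bin_width) for c in counts]   # P_k = c_k / (N dX)
    cdf, running = [], 0.0
    for pk in p_star:
        running += pk
        cdf.append(running)                     # F(x_k) = sum over i <= k of P*_i
    return p_star, p, cdf


# Hypothetical data between 0 and 20 (made up for illustration).
data = [1, 3, 4, 6, 7, 7, 8, 11, 12, 13, 14, 16, 17, 18, 19, 20]

p_star_10, p_10, cdf_10 = discrete_pdf_cdf(data, 0, 10, 2)  # dX = 10
p_star_5, p_5, cdf_5 = discrete_pdf_cdf(data, 0, 5, 4)      # dX = 5

print(p_star_10, p_10, cdf_10)
print(p_star_5, p_5, cdf_5)
```

With these made-up data, halving the bin width roughly halves the $P^*_k$ values, while the density values $P_k$ stay of the same order, and the CDF ends at 1 in both cases.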
PDF and CDF
The overall values of the PDF and CDF do not depend on the bin width. However, their resolution (or detailed shape) does.
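PDF and CDF
As one reduces the size of the bins, $\Delta X \to 0$, the continuous PDF and CDF are approximated:
$$P(x \le X \le x + \Delta X) \longrightarrow p(x) = \frac{df(x)}{dx}$$
$$F(x) = P(X \le x)\,\Delta X \longrightarrow f(x) = \int_{-\infty}^{x} p(x')\,dx'$$
with the property
$$\sum_k P_k \Delta X = 1 \longrightarrow \int_{-\infty}^{+\infty} p(x)\,dx = f(+\infty) - f(-\infty) = 1 - 0 = 1$$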
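PDF and statistics
We can now give some formal definitions:
$$\mu_x \equiv \int_{-\infty}^{+\infty} x\,p(x)\,dx, \quad \sigma_x^2 \equiv \int_{-\infty}^{+\infty} (x - \mu_x)^2\,p(x)\,dx$$
$$\mu_n \equiv \int_{-\infty}^{+\infty} (x - \mu_x)^n\,p(x)\,dx$$
The last expression defines $\mu_n(x)$, the $n$-th central moment of $x$.
The discrete equivalent is
$$\mu_n \equiv E[(X - \mu_x)^n]$$
The second central moment $\mu_2$ is the variance, by definition. The mean is the first moment about the origin ($0$), that is
$$\mu_x = \int_{-\infty}^{+\infty} (x - 0)\,p(x)\,dx$$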
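3rd moment: skewness
The third normalized central moment is called the skewness. It describes the tendency for an asymmetry between positive excursions and negative excursions of the PDF:
$$\gamma_x \equiv \frac{\mu_3}{(\mu_2)^{3/2}} = \frac{\mu_3}{(\sigma_x^2)^{3/2}}$$
One (biased) estimator is
$$\hat{\gamma}_x \equiv \frac{\frac{1}{N} \sum_{n=1}^{N} (X_n - \overline{X})^3}{\left[\frac{1}{N} \sum_{n=1}^{N} (X_n - \overline{X})^2\right]^{3/2}}$$
[Figure: examples of PDFs with negative skew and positive skew]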
4th moment: kurtosis
The fourth normalized central moment is called the kurtosis. It describes the peakedness (concentration near $\mu_x$), or a tendency for long tails (concentration far from $\mu_x$):
$$\kappa_x \equiv \frac{\mu_4}{(\mu_2)^2} = \frac{\mu_4}{(\sigma_x^2)^2}$$
One (biased) estimator is
$$\hat{\kappa}_x \equiv \frac{\frac{1}{N} \sum_{n=1}^{N} (X_n - \overline{X})^4}{\left[\frac{1}{N} \sum_{n=1}^{N} (X_n - \overline{X})^2\right]^{4/2}}$$
Because the kurtosis of a normal, or Gaussian, distribution is equal to $3$, often the excess kurtosis $\kappa_x - 3$ is considered. See Moors (1986), "The Meaning of Kurtosis: Darlington Reexamined".
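The biased estimators of skewness and kurtosis given on these slides can be written compactly. This is a minimal Python sketch (standard library only; the lecture's own snippets use Matlab). The two small records are made up, and the function names are hypothetical helpers introduced for this example.

```python
def central_moment(values, order):
    """Biased (1/N) estimate of the central moment of the given order."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def skewness(values):
    """Third central moment normalized by the second one to the power 3/2."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth central moment normalized by the squared second one."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical records (made up for illustration).
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
long_right_tail = [0.0, 0.0, 0.0, 0.0, 10.0]

print(skewness(symmetric), kurtosis(symmetric) - 3.0)
print(skewness(long_right_tail), kurtosis(long_right_tail) - 3.0)
```

The symmetric record has zero skewness and negative excess kurtosis, while the record with one large positive excursion has positive skewness and positive excess kurtosis.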
Illustration of Kurtosis
Distributions corresponding to different values of excess kurtosis.
Positive excess kurtosis corresponds to long tails and peakedness.
Kurtosis: example
Hughes et al. (2010) Identification of jets and mixing barriers from sea level and vorticity measurements using simple statistics
They used the statistics and PDF of sea level anomalies and derived relative vorticity to show that strong oceanic jets tend to be identified by a zero contour in skewness coinciding with a low value of kurtosis.
Why is this useful?
I think that plotting the histogram of your data, and further estimating in detail its PDF, gives you a holistic, or global, view of the data population from which your sample is drawn. Maybe you will find that the estimated PDF of a sample on a given day is different from the estimated PDF on another day...
It also allows you to answer questions such as: what is the most probable value of the data? How often do we observe extreme values? etc.
In addition, the knowledge of your data distribution, and/or a choice of a model for your data distribution, will allow you to define confidence intervals for your estimated parameters, and to proceed to conduct hypothesis testing in your research.
Before looking at this, we need to review the theory of various probability distribution functions.
But first an example
Animation (#joyplot) by Gavin Schmidt showing global temperature distribution in 10-yr windows, see his blog post on realclimate.org. Data from the NASA GISS Surface Temperature Analysis (GISTEMP) dataset.
4. Common Distributions
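The uniform distribution
A random variable $x$ that is uniformly distributed between $x_1$ and $x_2$ has for PDF:
$$p(x) = \begin{cases} \dfrac{1}{x_2 - x_1}, & x_1 \le x \le x_2 \\ 0, & \text{otherwise} \end{cases}$$
The mean of this distribution is $(x_1 + x_2)/2$ and its standard deviation is $(x_2 - x_1)/(2\sqrt{3})$.
Example, with $x_1 = -1$ and $x_2 = 2$:
z = rand(1000,1);
x1 = -1; x2 = 2;
x = (x2-x1)*z + x1;
$$\mu_x = 0.5, \quad \sigma_x = \frac{2 - (-1)}{2\sqrt{3}} = 0.8660$$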
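The mean and standard deviation quoted for the uniform distribution follow from the formal definitions of $\mu_x$ and $\sigma_x^2$ as integrals over the PDF. A short worked derivation, given here as a sketch:

$$\mu_x = \int_{x_1}^{x_2} \frac{x}{x_2 - x_1}\,dx = \frac{x_2^2 - x_1^2}{2(x_2 - x_1)} = \frac{x_1 + x_2}{2}$$

$$\sigma_x^2 = \int_{x_1}^{x_2} \frac{(x - \mu_x)^2}{x_2 - x_1}\,dx = \frac{1}{x_2 - x_1} \left[\frac{(x - \mu_x)^3}{3}\right]_{x_1}^{x_2} = \frac{(x_2 - x_1)^2}{12}$$

so that $\sigma_x = (x_2 - x_1)/(2\sqrt{3})$. The last step uses $x_2 - \mu_x = -(x_1 - \mu_x) = (x_2 - x_1)/2$.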
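The normal distribution (or Gaussian)
A random variable $x$ that is normally distributed with mean $\mu_x$ and standard deviation $\sigma_x$ has for PDF:
$$p(x) = \frac{1}{\sigma_x \sqrt{2\pi}} \exp\left[-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right] \equiv \mathcal{N}(\mu_x, \sigma_x)$$
If $x \sim \mathcal{N}(\mu_x, \sigma_x)$ then the variable
$$z = \frac{x - \mu_x}{\sigma_x} \sim \mathcal{N}(0, 1)$$
Hereafter, $\sim$ will mean "distributed like".
In Matlab, the following generates a data vector $z$ containing 1000 samples from a r.v. $\sim \mathcal{N}(0, 1)$, and a data vector $x$ from a r.v. $\sim \mathcal{N}(2.1, 1.35)$:
z = randn(1000,1);
x = 1.35*z + 2.1;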
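For readers without Matlab, the same scaling and its inverse (the standardization) can be sketched in Python with the standard library only. This is an illustrative equivalent, not the lecture's own code; the seed and the variable names are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

mu_x, sigma_x = 2.1, 1.35
z = [random.gauss(0.0, 1.0) for _ in range(1000)]  # samples ~ N(0, 1)
x = [sigma_x * zi + mu_x for zi in z]              # samples ~ N(2.1, 1.35)

# Standardizing x with the known mean and standard deviation recovers z.
z_back = [(xi - mu_x) / sigma_x for xi in x]

x_mean = statistics.fmean(x)
x_std = statistics.stdev(x)
print(x_mean, x_std)
```

The sample mean and sample standard deviation of `x` should be close to, but not exactly equal to, 2.1 and 1.35, since they are themselves estimates.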
The normal distribution
The red curve is the theoretical normal PDF $\mathcal{N}(2.1, 1.35)$. The histogram is computed from a sample of size $N = 1000$.
The normal distribution
The normal distribution is of particular importance because of the central limit theorem (CLT), which asserts roughly that the normal distribution is the result of the sum of a large number of independent random variables acting together.
To be more specific, let $x_1, x_2, \ldots, x_i, \ldots, x_N$ be $N$ independent r.v. with individual means $\mu_i$ and variances $\sigma_i^2$. Now consider the new r.v.
$$x = a_1 x_1 + a_2 x_2 + \ldots + a_N x_N.$$
The central limit theorem states that, as $N \to +\infty$, $x$ will be normally distributed with mean $\sum_k a_k \mu_k$ and variance $\sum_k a_k^2 \sigma_k^2$. In practice, the CLT is used for $N$ "large".
In fact, we have already used the central limit theorem...
The normal distribution and the CLT
Recall that the sample mean of the record $X_1, X_2, \ldots, X_N$ is defined as
$$\overline{X} = \frac{1}{N} \sum_{n=1}^{N} X_n = \left(\frac{1}{N}\right) X_1 + \left(\frac{1}{N}\right) X_2 + \ldots + \left(\frac{1}{N}\right) X_N$$
Here, $\overline{X}$ can be seen as a new r.v. which is the sum of individual r.v. (for which we have only one value) with the same population mean $\mu_x$, and same population variance $\sigma_x^2$.
Hence, the CLT states that, for $N$ large enough, $\overline{X}$ is normally distributed with mean $\sum_{n=1}^{N} (1/N)\mu_x = (N/N)\mu_x = \mu_x$ and variance $\sum_{n=1}^{N} (1/N)^2 \sigma_x^2 = (N/N^2)\sigma_x^2 = (1/N)\sigma_x^2$, where this last result was given previously without explanation.
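The statement that the sample mean of $N$ independent values is approximately $\mathcal{N}(\mu_x, \sigma_x/\sqrt{N})$ can be explored with a small simulation. This is a minimal Python sketch (standard library only; the lecture's own snippets use Matlab), drawing records from the uniform distribution between $-1$ and $2$; the record length, number of records, and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

x1, x2 = -1.0, 2.0            # bounds of the uniform distribution
mu_x = (x1 + x2) / 2          # population mean, 0.5
var_x = (x2 - x1) ** 2 / 12   # population variance, 0.75
n_obs = 50                    # N, length of each record
n_records = 5000              # number of independent records

sample_means = [
    statistics.fmean([random.uniform(x1, x2) for _ in range(n_obs)])
    for _ in range(n_records)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

# Fraction of sample means within 1.96 standard errors of mu_x; this should
# be close to 0.95 if the sample mean is approximately normally distributed.
se = (var_x / n_obs) ** 0.5
inside = sum(abs(m - mu_x) <= 1.96 * se for m in sample_means) / n_records

print(mean_of_means, var_of_means, var_x / n_obs, inside)
```

Although each individual value is uniformly distributed, the variance of the simulated sample means should be close to $\sigma_x^2/N$ and about 95% of them should fall within $\pm 1.96$ standard errors of $\mu_x$.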
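The $\chi^2_n$ distribution
Let $z_1, z_2, \ldots, z_n$ be $n$ independent r.v. $\sim \mathcal{N}(0, 1)$. Let a new r.v. be defined as
$$\chi^2 = z_1^2 + z_2^2 + \ldots + z_n^2$$
It is said that this r.v. is a "chi-squared" variable with $n$ degrees of freedom (DOF). Such a r.v. has for PDF:
$$p(x) = \frac{x^{\frac{n}{2} - 1} \exp\left(-\frac{x}{2}\right)}{\Gamma(n/2)\, 2^{\frac{n}{2}}} \equiv \chi^2(n) \text{ or } \chi^2_n$$
The mean of this distribution is $n$ and its variance is $2n$.
Examples of $\chi^2$ variables are power spectral density function estimates (see Lecture 4 on time series analysis) or variance estimates. The sample variance of $x \sim \mathcal{N}(\mu_x, \sigma_x)$ is
$$s_x^2 \sim \frac{\sigma_x^2}{N - 1}\, \chi^2(N - 1)$$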
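The construction of a $\chi^2_n$ variable as a sum of squared standard normal variables can be simulated directly. This is a minimal Python sketch (standard library only; the lecture's own snippets use Matlab); the number of degrees of freedom, the number of draws, the seed, and the helper name `chi2_sample` are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

n_dof = 14        # n, number of degrees of freedom
n_samples = 5000  # number of simulated chi-squared values


def chi2_sample(n):
    """One draw of z_1^2 + ... + z_n^2 with independent z_i ~ N(0, 1)."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))


draws = [chi2_sample(n_dof) for _ in range(n_samples)]
draws_mean = statistics.fmean(draws)
draws_var = statistics.variance(draws)

print(draws_mean, draws_var)
```

The sample mean and sample variance of the draws should be close to the theoretical values $n = 14$ and $2n = 28$.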
The $\chi^2_n$ distribution
In this example, the red curve is the theoretical $\chi^2_n$ PDF for $n = 14$. The histogram is computed from a sample of size $N = 1000$.
The F distribution
If $x \sim \chi^2_n$ and $y \sim \chi^2_m$ are independent, then the following variable
$$\frac{x/n}{y/m} \sim F(n, m)$$
is $F$ distributed with $n$ and $m$ degrees of freedom. The mathematical expression for this distribution is very complicated and not very useful here, see reference [1].
The mean value of the $F$ distribution is $m/(m - 2)$ for $m > 2$ and its variance is
$$\frac{2 m^2 (n + m - 2)}{n (m - 2)^2 (m - 4)} \quad \text{for } m > 4$$
The $F$ distribution arises, as an example, when testing for the equality of two population variances (see practical this afternoon).
The Student's $t$ distribution
Let $y$ and $z$ be two independent r.v. with $y \sim \chi^2_n$ and $z \sim \mathcal{N}(0, 1)$. Let $t$ be a new r.v. defined as
$$t = \frac{z}{\sqrt{y/n}}$$
It is said that this r.v. is a Student's $t$ variable with $n$ degrees of freedom (DOF). Such a r.v. has for PDF:
$$p(x) = \frac{\Gamma[(n + 1)/2]}{\Gamma(n/2) \sqrt{\pi n}} \left[1 + \frac{x^2}{n}\right]^{-\frac{n+1}{2}} \equiv t(n) \text{ or } t_n$$
The mean is $0$ for $n > 1$ and the variance is $\frac{n}{n-2}$ for $n > 2$.
An example of a $t$ distributed r.v. is the estimate of the mean of a population with unknown variance, as we will see later.
The Student's $t$ distribution
In this example, the red curve is the theoretical $t_n$ PDF for $n = 10$. The histogram is computed from a sample of size $N = 1000$. The black curve is the theoretical PDF $\mathcal{N}(0, 1)$.
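The Gamma ($\Gamma$) distribution family
In fact, the $\chi^2_n$ distribution and the exponential distribution are two particular cases of the Gamma ($\Gamma$) distribution family. A random variable $x$ that is Gamma distributed with parameters $\alpha$ and $\beta$ has for PDF:
$$p(x) = \frac{x^{\alpha - 1} \exp[-x/\beta]}{\Gamma(\alpha)\, \beta^{\alpha}}, \quad \beta > 0;\ 0 \le x \le +\infty$$
with the $\Gamma$ function defined as
$$\Gamma(\alpha) = \int_0^{+\infty} x'^{\,\alpha - 1} \exp(-x')\,dx'$$
$$\Gamma(n) = (n - 1)! \quad \text{for } n \text{ integer}$$
$$\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1) \quad \text{for } \alpha \text{ continuous, with } \Gamma(1) = 1$$
$\alpha = 1 \longrightarrow$ exponential distribution
$\alpha = \frac{n}{2},\ \beta = 2 \longrightarrow \chi^2_n$ distribution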
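The two special cases can be checked numerically by evaluating the PDF formulas at a few points. This is a minimal Python sketch (standard library only; the lecture's own snippets use Matlab); the function names, the evaluation points, and the parameter values are hypothetical choices for illustration.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma PDF with parameters alpha and beta, for x > 0."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)


def chi2_pdf(x, n):
    """Chi-squared PDF with n degrees of freedom, for x > 0."""
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (math.gamma(n / 2) * 2 ** (n / 2))


def exponential_pdf(x, beta):
    """Exponential PDF with scale beta, for x >= 0."""
    return math.exp(-x / beta) / beta


n = 14
points = [0.5, 3.0, 14.0, 30.0]

# alpha = n/2, beta = 2 gives the chi-squared PDF with n degrees of freedom.
chi2_diff = max(abs(gamma_pdf(x, n / 2, 2.0) - chi2_pdf(x, n)) for x in points)
# alpha = 1 gives the exponential PDF (beta = 3 is an arbitrary choice).
exp_diff = max(abs(gamma_pdf(x, 1.0, 3.0) - exponential_pdf(x, 3.0)) for x in points)

print(chi2_diff, exp_diff, math.gamma(5))  # Gamma(5) = 4! = 24
```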
5. Uncertainties, errors and hypothesis testing
Probability statements
We use distributions to make probability statements about our r.v. estimates. It is useful to consider the following. For any given PDF $p(z)$, and associated CDF $f(z)$, of the variable $z$, let's denote $z_\alpha$ the value that corresponds to $f(z) = 1 - \alpha$, that is
$$f(z_\alpha) = \int_{-\infty}^{z_\alpha} p(z)\,dz = \mathrm{Prob}[z \le z_\alpha] = 1 - \alpha$$
BEWARE that this is the convention used here. It is notably different in Matlab where icdf('normal',1-$\alpha$,0,1) returns the value $z_\alpha$.
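The convention for $z_\alpha$ can be made explicit with an inverse CDF call. This is a minimal Python sketch using `statistics.NormalDist` from the standard library (the lecture itself refers to Matlab's `icdf`); the helper name `z_alpha` is hypothetical and only covers the standard normal case.

```python
from statistics import NormalDist


def z_alpha(alpha):
    """Value z_alpha such that Prob[z <= z_alpha] = 1 - alpha, for z ~ N(0, 1)."""
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(1.0 - alpha)


for alpha in (0.1, 0.05):
    upper = z_alpha(alpha / 2)      # z_{alpha/2}
    lower = z_alpha(1 - alpha / 2)  # z_{1 - alpha/2}
    print(alpha, round(upper, 4), round(lower, 4))
```

As in the Matlab call quoted above, the inverse CDF takes the cumulative probability $1 - \alpha$ as its argument, not $\alpha$ itself.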
Confidence intervals
PDFs are used to derive confidence intervals (CI), the interpretation of which is subtle. What is a CI for you?
Given an estimate $\hat{\phi}$ of a quantity $\phi$, and a chosen significance level $\alpha$, we construct an interval with lower bound $\phi_L$ and upper bound $\phi_U$ so that this interval is expected to cover the true, unknown, but fixed value of $\phi$, with probability $1 - \alpha$.
In other words, if we could repeat the estimation and calculation of the CI many times, we can expect that the true unknown parameter $\phi$ is covered by the calculated CI 95 out of 100 times (for $\alpha = 0.05$).
There is no probability statement about $\phi$, only about $\hat{\phi}$ and $[\phi_L, \phi_U]$.
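Confidence intervals
As a concrete example, consider the sample mean $\bar{X}$ of $x \sim N(\mu_x, \sigma_x)$.
We stated earlier that $\bar{X} \sim N(\mu_x, \sigma_x/\sqrt{N})$. As such, we can state that the new "transformed" variable

$$z = \frac{\bar{X} - \mu_x}{\sigma_x/\sqrt{N}} \sim N(0, 1)$$

and that we can find two $z$ values such that

$$\mathrm{Prob}\left[z_{1-\alpha/2} < \frac{\bar{X} - \mu_x}{\sigma_x/\sqrt{N}} \le z_{\alpha/2}\right] = 1 - \alpha$$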
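Confidence intervals: normal case

$$\mathrm{Prob}\left[z_{1-\alpha/2} < \frac{\bar{X} - \mu_x}{\sigma_x/\sqrt{N}} \le z_{\alpha/2}\right] = 1 - \alpha$$

Since the standard normal distribution is symmetric around zero,

$$z_{1-\alpha/2} = -z_{\alpha/2}$$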
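Confidence intervals: normal case
As a result, the normalized calculated variable $z$ is such that

$$-z_{\alpha/2} < z = \frac{\bar{X} - \mu_x}{\sigma_x/\sqrt{N}} \le z_{\alpha/2}$$

with probability $1 - \alpha$. After rearranging, we can state that the true mean $\mu_x$ of the r.v. $x$ is such that

$$\bar{X} - \frac{\sigma_x z_{\alpha/2}}{\sqrt{N}} \le \mu_x < \bar{X} + \frac{\sigma_x z_{\alpha/2}}{\sqrt{N}}$$

with a confidence of $100(1 - \alpha)$%.
In common parlance, a 95% CI for $\mu_x$ (that is, with $\alpha = 0.05$) is

$$\left[\bar{X} - \frac{\sigma_x z_{\alpha/2}}{\sqrt{N}},\ \bar{X} + \frac{\sigma_x z_{\alpha/2}}{\sqrt{N}}\right]$$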
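The repeated-sampling reading of this interval can be checked numerically. The following is a minimal Python sketch (standard library only), not part of the original lecture material. The population values `mu` and `sigma`, the sample size `n`, the number of `trials` and the random `seed` are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean


def ci_coverage(mu=4.0, sigma=1.0, n=9, alpha=0.05, trials=5000, seed=1):
    """Fraction of simulated known-sigma CIs that cover the true mean mu."""
    rng = random.Random(seed)
    # Convention of these notes: Prob[z <= z_{alpha/2}] = 1 - alpha/2
    z_half = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = sigma * z_half / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # The interval moves from sample to sample; mu stays fixed
        if xbar - half_width <= mu < xbar + half_width:
            hits += 1
    return hits / trials


coverage = ci_coverage()
print(f"fraction of intervals covering mu: {coverage:.3f}")
```

The printed fraction should be close to $1 - \alpha = 0.95$. It is the interval that is random from one repetition to the next, not the true mean.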
Confidence intervals: normal case
Typical intervals used are:
90% CI: $\alpha = 0.1 \longrightarrow z_{\alpha/2} = 1.6449$
95% CI: $\alpha = 0.05 \longrightarrow z_{\alpha/2} = 1.9600$
99% CI: $\alpha = 0.01 \longrightarrow z_{\alpha/2} = 2.5758$
Before the advent of advanced software, people relied on statistical tables for the values of $z$, such as the ones found in the Appendices of references [1], [2], [3], [6].
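With software, these critical values come from the inverse CDF of the standard normal distribution. A minimal Python sketch follows, using the standard-library `statistics.NormalDist`. The helper name `z_upper` is an assumption introduced here, and it follows the convention of these notes, $\mathrm{Prob}[z \le z_\alpha] = 1 - \alpha$.

```python
from statistics import NormalDist


def z_upper(alpha):
    """Return z_alpha such that Prob[z <= z_alpha] = 1 - alpha for z ~ N(0, 1)."""
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(1.0 - alpha)


for level, alpha in [("90%", 0.10), ("95%", 0.05), ("99%", 0.01)]:
    print(f"{level} CI: alpha = {alpha:.2f} -> z_(alpha/2) = {z_upper(alpha / 2):.4f}")
```

The three printed values should reproduce the tabulated ones (1.6449, 1.9600, 2.5758).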
Confidence intervals: normal case
Example from Bendat and Piersol (2011): 95% CI: $\alpha = 0.05 \longrightarrow \alpha/2 = 0.0250$
Confidence intervals: normal case; example:
Using a CTD record of temperature at 24 Hz, we estimate the mean temperature near the 70 db pressure level by averaging data points within 0.5 db of 70 db (falling rate is 1-2 m/s), $N = 11$. We find $\bar{T} = 19.21359$. From the specification sheet of the CTD 911Plus, the accuracy of the temperature sensor is $0.001$, which we interpret as being the random error or std of $T$, i.e. $\sigma_T$, ignoring a possible bias. We use $(\bar{T} - \mu_T)/(\sigma_T/\sqrt{N}) \sim N(0, 1)$, and based on the previous formula, the 95% CI for the true mean $\mu_T$ is:

$$19.21359 - \frac{0.001 \times 1.96}{\sqrt{11}} \le \mu_T < 19.21359 + \frac{0.001 \times 1.96}{\sqrt{11}}$$

$$\Rightarrow 19.21300 \le \mu_T < 19.21418$$

Alternatively, one can state that the estimate of the mean with 95% uncertainty is

$$\mu_T = 19.21359 \pm 0.00059$$
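Confidence intervals: $t$ case
Now imagine that you obtain the data from the previous example but do not know the specification of the sensor used, that is $\sigma_T$. Instead, you can consider the sample standard deviation $s_T$ as an estimate of the unknown $\sigma_T$. It can be shown (not obvious) that

$$\frac{\bar{T} - \mu_T}{s_T/\sqrt{N}} \sim t(N - 1)$$

Thus, a $100(1 - \alpha)$% CI for the true mean $\mu_T$ is:

$$\bar{T} - \frac{s_T\, t_{N-1;\alpha/2}}{\sqrt{N}} \le \mu_T < \bar{T} + \frac{s_T\, t_{N-1;\alpha/2}}{\sqrt{N}}$$

where $t_{N-1;\alpha/2}$ is the value of the $t_{N-1}$ variable such that

$$\mathrm{Prob}[t \le t_{N-1;\alpha/2}] = 1 - \frac{\alpha}{2} \quad \text{or} \quad \mathrm{Prob}[t > t_{N-1;\alpha/2}] = \frac{\alpha}{2}$$

The $t$ distribution is symmetric like the normal distribution so that

$$t_{N;1-\beta} = -t_{N;\beta}$$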
Confidence intervals: $t$ case
Going back to our CTD example, we still have $\bar{T} = 19.21359$ but now calculate $s_T = 0.02277$. Since $t_{11-1;0.05/2} = 2.2281$, the 95% CI for $\mu_T$ becomes:

$$19.21359 - \frac{0.02277 \times 2.2281}{\sqrt{11}} \le \mu_T < 19.21359 + \frac{0.02277 \times 2.2281}{\sqrt{11}}$$

$$\Rightarrow 19.19829 \le \mu_T < 19.22889$$

Alternatively, one can state that the estimate of the mean with 95% uncertainty is

$$\mu_T = 19.21359 \pm 0.01530$$

Previously, we stated that

$$\mu_T = 19.21359 \pm 0.00059$$

So what is the right answer? It is arguable...
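The two half-widths can be recomputed from the quantities quoted in the example. The sketch below is a minimal Python illustration (standard library only), not the code of the practical session. It obtains the $t$ critical value for $N - 1 = 10$ DOF by integrating the Student's $t$ PDF with Simpson's rule and inverting the CDF by bisection. The function names `t_pdf`, `t_cdf` and `t_upper`, the number of integration steps and the bisection bracket are assumptions made here.

```python
import math


def t_pdf(x, n):
    """Student's t PDF with n degrees of freedom."""
    c = math.gamma((n + 1) / 2) / (math.gamma(n / 2) * math.sqrt(math.pi * n))
    return c * (1 + x * x / n) ** (-(n + 1) / 2)


def t_cdf(x, n, steps=2000):
    """CDF for x >= 0: 0.5 plus the Simpson's-rule integral of the PDF over [0, x]."""
    h = x / steps
    s = t_pdf(0.0, n) + t_pdf(x, n)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, n)
    return 0.5 + s * h / 3


def t_upper(n, alpha):
    """t_{n;alpha} with Prob[t <= t_{n;alpha}] = 1 - alpha (alpha < 0.5), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, n) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Values quoted in the CTD example
T_bar, s_T, sigma_T, N = 19.21359, 0.02277, 0.001, 11

t_crit = t_upper(N - 1, 0.05 / 2)
half_t = s_T * t_crit / math.sqrt(N)     # unknown variance: t case
half_z = sigma_T * 1.96 / math.sqrt(N)   # known sensor std: normal case

print(f"t critical value for {N - 1} DOF: {t_crit:.4f}")
print(f"t case:      {T_bar} +/- {half_t:.5f}")
print(f"normal case: {T_bar} +/- {half_z:.5f}")
```

The large gap between the two half-widths comes mostly from $s_T$ being much larger than the sensor accuracy, with the wider $t$ critical value playing a smaller role.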
To attempt to answer this question, let's look at the data. The blue curve is the 24 Hz temperature data, and the red curve and shading show the 1 db bin averages and 95% CI using the $t$ distribution.
Instrumental vs sampling/model errors
When we use $\bar{T}$ as an estimate of the temperature at 70 db, we assumed that the $N = 11$ records of temperature at 24 Hz had the same expectation and variance.
However, we know from our oceanographic knowledge that the temperature is not necessarily constant with depth. Here, the obvious temperature gradient with depth makes us think that the expected temperature at 69.5 db is not the same as at 70.5 db.
Thus, when we calculate an arithmetic average ($\bar{T}$) of temperature that is a function of depth, it is an estimate of a temperature quantity that is variable because of 1) the accuracy of the sensor, and 2) the varying expectation value.
Traditionally, this second type of error is called a sampling error or a model error, which adds to the first type of error called instrumental error.
Part of the analysis of your data is to understand, or model, the sources of variance and hence of errors when calculating derived quantities such as mean, variance, etc.
Interlude: modeling signal and noise
Part of the problem of choosing the appropriate variance to estimate errors is choosing a model for the observations. As an example, the measured "process" $x$ may be the sum of a given signal $y$ plus instrumental noise $\varepsilon$.

$$x = y + \varepsilon$$

If the signal and noise are independent, then the total variance of the process is

$$\mathrm{Var}[x] = \mathrm{Var}[y] + \mathrm{Var}[\varepsilon]$$

$$\sigma_x^2 = \sigma_y^2 + \sigma_\varepsilon^2$$

In our previous example of the CTD profile, the sample variance of the measurements was likely the sum of the instrumental error and of the "error" from the background shear of temperature. I would tend to choose the second case ($t$ distribution with unknown variance) to derive CIs.
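The additivity of variances for independent signal and noise can be illustrated with synthetic data. This is a minimal Python sketch (standard library only). The Gaussian signal and noise, their standard deviations (2 and 1), the sample size and the seed are arbitrary assumptions and are not taken from the CTD record.

```python
import random
from statistics import pvariance

rng = random.Random(0)
n = 20000

y = [rng.gauss(0.0, 2.0) for _ in range(n)]     # "signal" (assumed Gaussian here)
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]   # independent instrumental noise
x = [yi + ei for yi, ei in zip(y, eps)]         # measured process x = y + eps

var_x, var_y, var_eps = pvariance(x), pvariance(y), pvariance(eps)
print(f"Var[x]            = {var_x:.3f}")
print(f"Var[y] + Var[eps] = {var_y + var_eps:.3f}")
```

The two printed numbers differ only by twice the sample covariance between `y` and `eps`, which shrinks as the sample grows.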
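Confidence intervals: $\chi^2$ case
The $\chi^2_n$ distribution is defined for positive values only, and is not symmetric:

$$\mathrm{Prob}\left[\chi^2_{n;1-\alpha/2} < \chi^2_n \le \chi^2_{n;\alpha/2}\right] = 1 - \alpha$$

$$\chi^2_{n;1-\beta} \ne -\chi^2_{n;\beta}$$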
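Confidence intervals: $\chi^2$ case example
The $\chi^2_n$ distribution can be used to derive CIs for variance estimates. It can be shown that for $N$ samples drawn from a normally distributed r.v. $x$ with variance $\sigma_x^2$, we have

$$\frac{(N - 1)\, s_x^2}{\sigma_x^2} \sim \chi^2_{N-1}$$

which can be used to derive a $100(1 - \alpha)$% CI for variance estimates $s_x^2$ as

$$\frac{(N - 1)\, s_x^2}{\chi^2_{N-1;\alpha/2}} \le \sigma_x^2 < \frac{(N - 1)\, s_x^2}{\chi^2_{N-1;1-\alpha/2}}$$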
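As an illustration, the variance CI can be evaluated without statistical tables. The Python sketch below (standard library only) is a hedged example, not the lecture's own code. It computes the $\chi^2_n$ CDF as the regularized lower incomplete Gamma function through its power series and inverts it by bisection. The function names are assumptions, and the CTD values $N = 11$, $s_T = 0.02277$ are reused purely as sample inputs.

```python
import math


def chi2_cdf(x, n):
    """CDF of chi-square with n DOF: regularized lower incomplete gamma P(n/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = n / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:      # power series of the lower incomplete gamma
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_upper(n, alpha):
    """chi2_{n;alpha} with Prob[chi2_n <= chi2_{n;alpha}] = 1 - alpha, by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n) < target:  # expand the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Sample inputs reused from the CTD example (illustration only)
N, s2, alpha = 11, 0.02277 ** 2, 0.05
lower = (N - 1) * s2 / chi2_upper(N - 1, alpha / 2)
upper = (N - 1) * s2 / chi2_upper(N - 1, 1 - alpha / 2)
print(f"{lower:.3e} <= sigma^2 < {upper:.3e}  (s^2 = {s2:.3e})")
```

Because the $\chi^2$ distribution is not symmetric, the resulting interval is not centered on $s_x^2$.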
Hypothesis testing
Confidence intervals are particular cases of hypothesis testing, a case of data analysis that occurs frequently. See the introduction of 100 Statistical Tests by G. K. Kanji (see reference [5]).
Hypothesis testing does not consist in proving or disproving hypotheses. Just like we will never know the true value of a r.v., we will never prove in an undeniable fashion that a hypothesis is true.
Hypothesis testing consists in showing that a hypothesis cannot be supported given its small probability. How small is your own choice, or the difference between getting published or not published, or a stakeholder taking a decision or action or inaction, etc.
In general, the hypothesis we are trying to denounce, decry, etc., is one with no change (i.e. $a = b$, "the mean temperature today is the same as yesterday"), so that it is typically called the null hypothesis, $H_0$. When $H_0$ is rejected because of insufficient probability, we accept the alternative hypothesis $H_1$ (i.e. $a \ne b$, "the mean temperature today is different from yesterday").
Hypothesis testing
Step 1
Define your practical problem in terms of simple hypotheses, a null hypothesis and an alternate hypothesis that typically leads to action. Decide if you are likely to conduct a one-tailed or two-tailed test.
As an example, a null hypothesis is that the population mean $\mu_x$ of a r.v. $x$ is equal to a given value $\mu_0$ (maybe $0$). Alternative hypotheses may be that $\mu_x$ is not equal to $\mu_0$ (case 1, two-tailed test), or that $\mu_x$ is greater or smaller than $\mu_0$ (cases 2, 3, one-tailed tests).
1. $H_0: \mu_x = \mu_0$, $H_1: \mu_x \ne \mu_0$
2. $H_0: \mu_x = \mu_0$, $H_1: \mu_x > \mu_0$
3. $H_0: \mu_x = \mu_0$, $H_1: \mu_x < \mu_0$
Hypothesis testing
Step 2
Derive a statistic, that is a number, that can be calculated from your data and your assumptions, typically under your null hypothesis $H_0$. Make sure that this number is going to be different when $H_0$ is true or when $H_1$ is true.
Following the previous example, we saw that if $x$ is normally distributed with known variance $\sigma_x^2$, then the statistic

$$z = \frac{\bar{X} - \mu_x}{\sigma_x/\sqrt{N}} \sim N(0, 1)$$

where $\mu_x$ is set to $\mu_0$ under $H_0$.
Hypothesis testing
Steps 3 & 4
Choose a critical region for your test statistic and a significance level $\alpha$ that determine the size of your critical region. Critical regions can be of three types: right-sided means that you reject $H_0$ if your test statistic is greater than or equal to some right critical value; left-sided, you get it; or both-sided, so that you reject $H_0$ if your test statistic is either greater than or equal to the right critical value or less than or equal to the left critical value.
Hypothesis testing
Steps 3 & 4: example

$$\alpha = 0.05,\ H_0: \mu_0 = 4,\ N = 9,\ \bar{X} = 4.6,\ \sigma_x = 1.0 \rightarrow z = \frac{4.6 - 4}{1/\sqrt{9}} = 1.8$$

Case 1:

$$z_{1-0.05/2} = -1.96 < z = 1.8 < z_{0.05/2} = 1.96$$

$z$ is outside of the critical region! No reason to reject $H_0$ (i.e. we accept that the mean is not different from $\mu_0$).
Case 2:

$$z = 1.8 > z_{0.05} = 1.64$$

$z$ is in the critical region for a right-sided test! We can reject $H_0$ (in the sense that the mean appears larger than $\mu_0$).
Case 3:

$$z_{1-0.05} = -1.64 \le z = 1.8$$

$z$ is outside the critical region for a left-sided test! No reason to reject $H_0$ (in the sense that the mean does not appear to be less than $\mu_0$).
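The three decisions can be reproduced in a few lines. This is a minimal Python sketch (standard library only) using the numbers of the example. The variable names are assumptions, and the critical values follow the convention $\mathrm{Prob}[z \le z_\alpha] = 1 - \alpha$ of these notes.

```python
from statistics import NormalDist

alpha, mu0, N, xbar, sigma_x = 0.05, 4.0, 9, 4.6, 1.0

# Test statistic under H0: mu_x = mu0, known sigma_x
z = (xbar - mu0) / (sigma_x / N ** 0.5)

std_normal = NormalDist()
z_two = std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}, two-tailed critical value
z_one = std_normal.inv_cdf(1 - alpha)       # z_{alpha}, one-tailed critical value

reject_two_sided = (z >= z_two) or (z <= -z_two)   # case 1: H1 is mu_x != mu0
reject_right = z >= z_one                          # case 2: H1 is mu_x > mu0
reject_left = z <= -z_one                          # case 3: H1 is mu_x < mu0

print(f"z = {z:.2f}")
print("case 1 (two-tailed):  reject H0?", reject_two_sided)
print("case 2 (right-sided): reject H0?", reject_right)
print("case 3 (left-sided):  reject H0?", reject_left)
```

Only the right-sided test rejects $H_0$ here, which shows how the choice made in step 1 affects the conclusion for the same data.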
Hypothesis testing
Please see the book by G. K. Kanji, 100 Statistical Tests (2006)! It is very handy...
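One last thing: Error propagation
We saw common cases where the statistics were $z$, $t$, or $\chi^2$ distributed r.v. What if we are trying to assess the error or uncertainty for a r.v. $y$ that is an arbitrary function of $N$ variables $x_n$ with independent random errors $\varepsilon_{x_n}$ (maybe the RMS error)?

$$y = y(x_1, x_2, \ldots, x_N)$$

An approximate formula for "small" errors is

$$\varepsilon_y^2 \approx \left(\frac{\partial y}{\partial x_1}\right)^2 \varepsilon_{x_1}^2 + \left(\frac{\partial y}{\partial x_2}\right)^2 \varepsilon_{x_2}^2 + \ldots + \left(\frac{\partial y}{\partial x_N}\right)^2 \varepsilon_{x_N}^2$$

See reference [3].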
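When the partial derivatives are tedious to derive by hand, they can be approximated numerically. Below is a minimal Python sketch (standard library only) of the first-order formula, using central differences. The helper `propagate_error`, its step size, and the product example with its values are hypothetical and serve only as an illustration.

```python
import math


def propagate_error(func, x, eps, rel_step=1e-6):
    """First-order error on y = func(x) for independent errors eps[n] on x[n]."""
    var_y = 0.0
    for n, (xn, en) in enumerate(zip(x, eps)):
        h = rel_step * max(abs(xn), 1.0)
        up = list(x)
        up[n] = xn + h
        down = list(x)
        down[n] = xn - h
        dy_dxn = (func(up) - func(down)) / (2 * h)   # central-difference dy/dx_n
        var_y += (dy_dxn * en) ** 2
    return math.sqrt(var_y)


# Hypothetical example: y = x1 * x2 with x1 = 2.0 +/- 0.1 and x2 = 3.0 +/- 0.2
err_y = propagate_error(lambda v: v[0] * v[1], [2.0, 3.0], [0.1, 0.2])
analytic = math.sqrt((3.0 * 0.1) ** 2 + (2.0 * 0.2) ** 2)
print(err_y, analytic)
```

For this product the analytic result is $\sqrt{(x_2 \varepsilon_{x_1})^2 + (x_1 \varepsilon_{x_2})^2} = 0.5$, which the numerical estimate should match closely.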
Practical session
Please download data at the following link:
Please download the Matlab code at the following link:
Make sure you have installed and tested the free jLab Matlab toolbox from Jonathan Lilly at www.jmlilly.net/jmlsoft.html
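Extra slides
$t$-test for two population means (variances unknown and unequal)
Following test #9 of Kanji (2006), reference [5]
The test statistic is

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{1/2}}$$

which is used to test $\mu_1 = \mu_2$, so that $t \sim t(\nu)$ with

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{s_1^4}{n_1^2 (n_1 - 1)} + \dfrac{s_2^4}{n_2^2 (n_2 - 1)}}$$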
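A minimal Python sketch of this statistic and of the effective degrees of freedom $\nu$ is given below (standard library only), under the null hypothesis $\mu_1 = \mu_2$. The function name `welch_t` and the two small samples are made up for illustration.

```python
import math
from statistics import fmean, variance


def welch_t(x1, x2):
    """Test statistic t and effective DOF nu for H0: mu1 = mu2, unequal variances."""
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1          # s1^2 / n1, with the sample (N - 1) variance
    v2 = variance(x2) / n2          # s2^2 / n2
    t = (fmean(x1) - fmean(x2)) / math.sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, nu


# Made-up samples, for illustration only
sample_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_2 = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
t_stat, nu = welch_t(sample_1, sample_2)
print(f"t = {t_stat:.4f}, nu = {nu:.3f}")
```

In general $\nu$ is not an integer. The statistic is then compared with the critical value of a $t$ distribution with $\nu$ degrees of freedom.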
Kolmogorov-Smirnov test for distribution
The Kolmogorov-Smirnov test compares an empirical distribution function $\hat{F}$ to a prescribed normal distribution function $F$ with mean $\mu$ and standard deviation $\sigma$. It considers the statistic

$$D = \max_{X_i} \left| \hat{F}(X_i) - F(\mu, \sigma) \right|$$

which measures the maximum distance between the two distributions (as seen on a Q-Q plot).
The issue is that this test is too conservative when the mean and std of $F$ are calculated from the data. An alternative test is called the Lilliefors test, which is more stringent. See also test 20 of Kanji (2006), reference [5], for another test.
In Matlab:
h = kstest(x); % compares x to the standard normal distribution by default
h = lillietest(x); % Lilliefors test, for a normal distribution with parameters estimated from x
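To make the statistic $D$ concrete, here is a minimal Python sketch (standard library only) that evaluates it for a prescribed normal distribution. It computes only $D$, not the test decision `h` returned by the Matlab functions. The function name `ks_statistic` and the three-point sample are assumptions for illustration.

```python
from statistics import NormalDist


def ks_statistic(sample, mu=0.0, sigma=1.0):
    """Maximum distance D between the empirical CDF of sample and the N(mu, sigma) CDF."""
    F = NormalDist(mu, sigma).cdf
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, xi in enumerate(xs, start=1):
        # The empirical CDF jumps from (i - 1)/n to i/n at xi: check both sides of the step
        d = max(d, i / n - F(xi), F(xi) - (i - 1) / n)
    return d


D = ks_statistic([-1.0, 0.0, 1.0])
print(f"D = {D:.6f}")
```

Checking both sides of each step matters because the empirical distribution function is a staircase, so the largest gap can occur just before or just after a data point.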