BASIC STATISTICAL CONCEPTS
Statistical Moments & Probability Density Functions
Ocean is not “stationary”
“Stationary” - statistical properties remain constant in time
Data collected have signal and noise
Both signal and noise are assumed to have random behavior
Population Sample
Most basic descriptive parameter for any set of measurements: the Sample Mean
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
taken over the duration of a time series (“time average”)
or over an ensemble of measurements (“ensemble mean”)
Sample mean $\bar{x}$ is an unbiased estimate of the population mean μ
The population mean, μ, can be regarded as the expected outcome E(y) of an event y.
If the measurement is repeated many times, the outcomes average out to μ, i.e., μ = E(y) (e.g., the weight printed on a bag of chips)
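Sample Mean - locates the center of mass of the data distribution such that:
$\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x}) = 0$, i.e. $\sum_{i=1}^{N} x_i' = 0$ with $x_i' = x_i - \bar{x}$
Weighted Sample Mean
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} f_i x_i$
$f_i / N$: relative frequency of occurrence of the i-th value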
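The mean, its weighted form, and the zero-sum property of the deviations can be checked on a small data set. This is a minimal sketch: the values in `x` are made up for illustration, and the weighted sum here runs over the distinct values, with `f` counting how often each one occurs.

```python
from collections import Counter

# Hypothetical measurements (illustrative values only, not from the slides)
x = [2.0, 3.0, 3.0, 5.0, 7.0]
N = len(x)

# Sample mean: (1/N) * sum of x_i
x_bar = sum(x) / N

# Deviations from the mean sum to zero (up to round-off)
dev_sum = sum(xi - x_bar for xi in x)

# Weighted form: f counts the occurrences of each distinct value v
counts = Counter(x)
x_bar_weighted = sum(f * v for v, f in counts.items()) / N

print(x_bar, dev_sum, x_bar_weighted)
```

Both forms give the same mean because grouping repeated values only reorders the sum.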
Variance - describes spread about the mean or sample variability
Sample variance: $s'^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$
Sample standard deviation: $s' = \sqrt{s'^2}$, the typical difference from the mean
Population variance (unbiased): $\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$
N needs to be > 1 to define variance and std dev
Only for N < 30 are s’ and σ significantly different
$\sigma^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2\right]$
Computationally more efficient (only one pass through the data)
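The two ways of computing the unbiased variance can be compared directly. This is a sketch only: the function names and the data are hypothetical, chosen for illustration.

```python
def variance_two_pass(x):
    """Unbiased variance: first pass for the mean, second for the squared deviations."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)

def variance_one_pass(x):
    """Unbiased variance from running sums of x and x**2 (single pass)."""
    n = 0
    s = 0.0    # running sum of x_i
    s2 = 0.0   # running sum of x_i**2
    for xi in x:
        n += 1
        s += xi
        s2 += xi * xi
    return (s2 - s * s / n) / (n - 1)

# Hypothetical measurements (illustrative values only)
x = [2.0, 3.0, 3.0, 5.0, 7.0]
print(variance_two_pass(x), variance_one_pass(x))
```

The one-pass form subtracts two large, nearly equal numbers when the mean is large compared with the spread, so it can lose precision in floating point; the two-pass form is safer in that case.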
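Population variance
$\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$
has one degree of freedom (dof) fewer than (<) the
Sample variance
$s'^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$
because we estimate the population variance with the sample variance (one fewer independent measure)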
d.o.f.: ν = number of independent pieces of data being used to make a calculation.
ν = measure of how certain we are that our sample is representative of the entire population
The larger ν is, the more certain we are that we have sampled the entire population
Example: we have 2 observations. When estimating the mean we have 2 independent observations: ν = 2
But when estimating the variance, we have one independent observation because the two observations are at the same distance from the mean: ν = 1
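The two-observation example can be checked numerically. A minimal sketch, using two made-up values:

```python
# Two hypothetical observations (illustrative values, not from the slides)
x = [3.0, 5.0]
N = len(x)

mean = sum(x) / N                      # uses both observations: nu = 2
deviations = [xi - mean for xi in x]   # equal and opposite, so only one is independent

# Dividing by nu = N - 1 = 1 rather than N gives the unbiased variance estimate
variance = sum(d ** 2 for d in deviations) / (N - 1)

print(deviations, variance)  # [-1.0, 1.0] 2.0
```

Once the mean is fixed, knowing one deviation determines the other, which is why only one degree of freedom remains for the variance.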
Other Values of Importance
Range - difference between the largest and smallest values: 0.66 - (-0.61) = 1.27
Median - equal number of values above and below = -0.007
Mode - value occurring most often = -0.3 (N = 1601)
Two modes: bimodal
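These descriptive values are available in Python's standard `statistics` module. The data below are hypothetical, picked so that two values tie for most frequent; they are not the N = 1601 record from the slide.

```python
import statistics

# Hypothetical data set with two equally frequent values (illustrative only)
x = [1, 2, 2, 3, 3, 4, 9]

data_range = max(x) - min(x)        # spread between the extremes
median = statistics.median(x)       # middle value of the sorted data
modes = statistics.multimode(x)     # all values tied for most frequent
is_bimodal = len(modes) == 2

print(data_range, median, modes, is_bimodal)  # 8 3 [2, 3] True
```

Note that the single outlier (9) stretches the range but leaves the median and the modes unchanged.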
Probability
Provides procedures to infer population distribution from sample distribution
and to determine how good the inference is
The probability of a particular event occurring is the ratio of the number of occurrences of that event to the total number of occurrences of all possible events
P (a die showing ‘6’) = 1/6
The probability of a continuous variable is defined by a PROBABILITY DENSITY FUNCTION (PDF), f(x)
$0 \le P(x) \le 1$
Probability is measured by the area underneath the PDF
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
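Probability Density Function: Gauss or Normal or Bell
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$
Probability of falling within 1, 2 and 3 standard deviations of the mean:
erf(1/√2) = 68.3%
erf(2/√2) = 95.4%
erf(3/√2) = 99.7%
Standardized form: $F(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$, where $z = \frac{x-\mu}{\sigma}$ is the standardized normal variable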
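The three percentages can be reproduced with the error function in Python's `math` module, and the unit area under the PDF can be checked by numerical integration. This is a sketch only; the function names and the integration limits (±8 standard deviations) are choices made here for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def prob_within(k):
    """P(|x - mu| < k * sigma) for a normally distributed variable."""
    return math.erf(k / math.sqrt(2.0))

# Percent of the distribution within 1, 2 and 3 standard deviations
percents = [round(100.0 * prob_within(k), 1) for k in (1, 2, 3)]
print(percents)  # [68.3, 95.4, 99.7]

# Midpoint-rule estimate of the area under the standard normal PDF on [-8, 8]
n = 16000
h = 16.0 / n
area = h * sum(normal_pdf(-8.0 + (k + 0.5) * h) for k in range(n))
print(round(area, 6))
```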
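Probability Density Function: Gamma
$f(x) = \frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}$, with $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$
[Figure: gamma PDFs for β = 1 and for β = 2, each with α = 1, 2, 3, 4]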
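A short numerical check that the gamma PDF encloses unit area for the parameter values in the figures. This is a sketch under stated assumptions: it takes the shape/scale form $f(x) = x^{\alpha-1} e^{-x/\beta} / (\beta^{\alpha}\Gamma(\alpha))$, reads the figure labels as β = 1, 2 and α = 1, 2, 3, 4, and truncates the integral at x = 60; the helper names are hypothetical.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma PDF with shape alpha and scale beta (zero for x < 0)."""
    if x < 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def integrate(f, a, b, n=60000):
    """Midpoint-rule integral of f over [a, b] with n equal steps."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Area under each curve; every value should be very close to 1
areas = {}
for beta in (1, 2):
    for alpha in (1, 2, 3, 4):
        areas[(alpha, beta)] = integrate(lambda x: gamma_pdf(x, alpha, beta), 0.0, 60.0)

for key, area in sorted(areas.items()):
    print(key, round(area, 6))
```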
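Probability Density Function: Chi Square
Special case of the gamma PDF for β = 2 and α = ν/2:
$f(x) = \frac{x^{\nu/2-1}\, e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$
[Figure: chi-square PDFs for ν = 2, 4, 6, 8, with variances σ² = 4, 8, 12, 16]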
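The chi-square PDF has mean ν and variance 2ν, which matches the variances 4, 8, 12, 16 for ν = 2, 4, 6, 8. A rough numerical check, assuming the gamma form with α = ν/2 and β = 2 and truncating the integrals at x = 200 (function names are hypothetical):

```python
import math

def chi2_pdf(x, nu):
    """Chi-square PDF: the gamma PDF with alpha = nu/2 and beta = 2."""
    alpha, beta = nu / 2.0, 2.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def moment(nu, g, upper=200.0, n=40000):
    """Midpoint-rule integral of g(x) * chi2_pdf(x, nu) over [0, upper]."""
    h = upper / n
    return h * sum(g(x) * chi2_pdf(x, nu) for x in ((k + 0.5) * h for k in range(n)))

results = {}
for nu in (2, 4, 6, 8):
    mean = moment(nu, lambda x: x)
    var = moment(nu, lambda x: x * x) - mean ** 2
    results[nu] = (mean, var)
    print(nu, round(mean, 3), round(var, 3))
```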
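CONFIDENCE INTERVALS
[Figure: normal PDF with central area 1 - α and tail areas α/2 on each side]
Confidence Interval for μ with known σ
For N > 30 (large enough sample)
the 100 (1 - α)% confidence interval is:
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{N}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{N}}$
$z = \frac{x-\mu}{\sigma}$: standardized normal variable
(1 - α/2) = 0.975
http://statistics.laerd.com/statistical-guides/normal-distribution-calculations.php
$z_{\alpha/2}$ = 1.96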
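100 (1 - α)% C.I. is:
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{N}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{N}}$
If α = 0.05, $z_{\alpha/2}$ = 1.96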
Suppose we have a CT sensor at the outlet of a spring into the ocean. We obtain a burst sample of 50 measurements, once per second, with a sample mean of 26.5 ºC and a stdev of 1.2 ºC for the burst.
What is the range of possible values, at the 95% confidence, for the population mean?
$z_{\alpha/2}\frac{\sigma}{\sqrt{N}} = 1.96 \times \frac{1.2}{\sqrt{50}} = 0.33$, so $26.17 < \mu < 26.83$
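The large-sample interval can be evaluated with the standard library, using `statistics.NormalDist` for the critical value instead of a table. This is a minimal sketch; the function name `z_confidence_interval` is hypothetical, and the inputs are the numbers from the CT sensor example (mean 26.5 ºC, stdev 1.2 ºC, N = 50).

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, alpha=0.05):
    """100(1 - alpha)% confidence interval for the mean with known sigma."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha = 0.05
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = z_confidence_interval(26.5, 1.2, 50)
print(round(lower, 2), round(upper, 2))  # 26.17 26.83
```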
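CONFIDENCE INTERVALS
Confidence Interval for μ with unknown σ
For N < 30 (small samples)
the 100 (1 - α)% confidence interval is:
$\bar{x} - t_{\alpha/2,\nu}\frac{s}{\sqrt{N}} < \mu < \bar{x} + t_{\alpha/2,\nu}\frac{s}{\sqrt{N}}$
$t = \frac{\bar{x}-\mu}{s/\sqrt{N}}$: Student’s t-distribution with ν = (N-1) degrees of freedom
[Figure: t-distribution table lookup for α/2 = 0.025, d.o.f. = 19]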
100 (1 - α)% C.I. is:
$\bar{x} - t_{\alpha/2,\nu}\frac{s}{\sqrt{N}} < \mu < \bar{x} + t_{\alpha/2,\nu}\frac{s}{\sqrt{N}}$
If α = 0.05, $t_{0.025,19}$ = 2.093
Suppose we do 20 CTD profiles at one station in St Augustine Inlet. We obtain a mean at the surface of 16.5 ºC and a stdev of 0.7 ºC .
What is the range of possible values, at the 95% confidence, for the population mean?
$t_{\alpha/2,\nu}\frac{s}{\sqrt{N}} = 2.093 \times \frac{0.7}{\sqrt{20}} = 0.33$, so $16.17 < \mu < 16.83$
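The small-sample interval for the CTD example can be evaluated the same way. Python's standard library has no Student's t quantile function, so this sketch simply takes the tabulated critical value quoted above (2.093 for α/2 = 0.025, ν = 19) as an input; the function name is hypothetical.

```python
import math

def t_confidence_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean with unknown sigma, given a tabulated t value."""
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# 20 CTD profiles, surface mean 16.5 degC, stdev 0.7 degC; t value from a table
lower, upper = t_confidence_interval(16.5, 0.7, 20, t_crit=2.093)
print(round(lower, 2), round(upper, 2))  # 16.17 16.83
```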
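CONFIDENCE INTERVALS
Confidence Interval for σ²
To determine reliability of spectral peaks
Need to know C.I. for σ² on the basis of s²
$\frac{(N-1)\,s^2}{\chi^2_{\alpha/2,\nu}} < \sigma^2 < \frac{(N-1)\,s^2}{\chi^2_{1-\alpha/2,\nu}}$
ν = (N-1) degrees of freedom
[Figure: chi-square PDF with central area 1 - α between χ²_L and χ²_U and tail areas α/2 on each side]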
Suppose that we have ν = 10 spectral estimates of a tidal record.
The background variance near a distinct spectral peak is 0.3 m²
95% C.I. for variance?
How large would the peak have to be to stand out, statistically, from background level?
100 (1 - α)% C.I. is:
$\frac{(N-1)\,s^2}{\chi^2_{\alpha/2,\nu}} < \sigma^2 < \frac{(N-1)\,s^2}{\chi^2_{1-\alpha/2,\nu}}$
α/2 = 0.025; 1 - α/2 = 0.975
Look at the Chi Square Table:
$P(3.25 < \chi^2_{10} < 20.48) = 1 - \alpha$
$\frac{10 \times 0.3}{20.48} < \sigma^2 < \frac{10 \times 0.3}{3.25}$
$0.15 < \sigma^2 < 0.92$
The background variance lies in this range
The spectral peak has to be greater than 0.92 m² to distinguish it from background levels
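The variance interval for the spectral example follows from the two tabulated chi-square values. A minimal sketch: the critical values 3.25 and 20.48 are taken from the table lookup above rather than computed, and the function name is hypothetical.

```python
def variance_confidence_interval(s2, nu, chi2_lower, chi2_upper):
    """Confidence interval for the variance from nu degrees of freedom.

    chi2_lower and chi2_upper are tabulated chi-square values that bracket
    the central 100(1 - alpha)% of the distribution.
    """
    return nu * s2 / chi2_upper, nu * s2 / chi2_lower

# nu = 10 spectral estimates, background variance 0.3 m^2, 95% table values
lower, upper = variance_confidence_interval(0.3, 10, chi2_lower=3.25, chi2_upper=20.48)
print(round(lower, 2), round(upper, 2))  # 0.15 0.92
```

Because the chi-square distribution is skewed, the interval is not symmetric about 0.3 m²: the upper limit lies much farther from the estimate than the lower one.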