STATISTICS: Sampling and Sampling Distributions

Professor Ke-Sheng Cheng
Department of Bioenvironmental Systems Engineering
National Taiwan University

Random sample

• Let the random variables X1, X2, …, Xn have a joint density $f_{X_1,X_2,\dots,X_n}(\cdot,\cdot,\dots,\cdot)$ that factors as follows:

$$f_{X_1,X_2,\dots,X_n}(x_1,x_2,\dots,x_n) = f(x_1)\,f(x_2)\cdots f(x_n)$$

where $f(\cdot)$ is the common density of each Xi. Then (X1, X2, …, Xn) is defined to be a random sample of size n from a population with density $f(\cdot)$.

04/10/23 2Laboratory for Remote Sensing Hydrology and Spatial Modeling, Dept of Bioenvironmental Systems Engineering, National Taiwan Univ.

• If X1, X2, …, Xn is a random sample of size n from $f(\cdot)$, then X1, X2, …, Xn are stochastically independent.

• Histogram -- A frequency (or relative frequency) plot of observed data is called a frequency histogram (or relative frequency histogram).


Frequency Histogram


Cumulative frequency


Relative cumulative frequency
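The four quantities named on these slides (frequency, relative frequency, cumulative frequency, relative cumulative frequency) can be tabulated from observed data as in the minimal sketch below. The function name `frequency_table`, the bin edges, and the data values are hypothetical and chosen only for illustration; they are not taken from the lecture.

```python
from bisect import bisect_right
from itertools import accumulate

def frequency_table(data, edges):
    """Tabulate data into bins [edges[k], edges[k+1]); the last bin also
    includes its right edge. Values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # outside the binning range
        k = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[k] += 1
    total = sum(counts)
    cumulative = list(accumulate(counts))
    return {
        "frequency": counts,
        "relative": [c / total for c in counts],
        "cumulative": cumulative,
        "relative_cumulative": [c / total for c in cumulative],
    }

# Hypothetical observations, for illustration only.
data = [2.1, 3.4, 3.9, 4.2, 4.8, 5.0, 5.5, 6.1, 6.3, 7.7]
table = frequency_table(data, edges=[2, 4, 6, 8])
for name, values in table.items():
    print(name, values)
```

Plotting the "frequency" column against the bins gives the frequency histogram; the last relative cumulative value is always 1.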


Statistic

• A statistic is a function of observable random variables, which is itself an observable random variable and does not contain any unknown parameters.

• A statistic must be observable because we intend to use it to make inferences about the density functions of the random variables.


• For example, if a random variable has a probability density function $N(\mu, \sigma^2)$ where $\mu$ and $\sigma^2$ are unknown, then a quantity such as $\sum_{i=1}^{n} X_i / \sigma^2$ is not a statistic, because it contains an unknown parameter.

• If a statistic is not observable, then it cannot be used to make inferences about the parameters of the density function.


• An observation of a random sample of size n can be regarded as n independent observations of a random variable.


• One of the central problems in statistics is to find suitable statistics to represent parameters of the probability distribution function of a random variable.

Sample $\{x_1, \dots, x_n\}$: statistics $(\bar{x}, s^2)$ (observable)

Population $N(\mu, \sigma^2)$: parameters $(\mu, \sigma^2)$ (unknown)

Sample moments

• Let X1, X2, …, Xn be a random sample from the density $f(\cdot)$. Then the rth sample moment about 0 is defined as

$$M_r' = \frac{1}{n}\sum_{i=1}^{n} X_i^{\,r}$$


• In particular, if r = 1, we have the sample mean $\bar{X}_n$; that is,

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

• Also, the rth sample moment about the sample mean is defined as

$$M_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^r$$
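As a small illustration of the two definitions, the sketch below computes the rth sample moment about 0 and about the sample mean for a list of observations. The function names and the data are hypothetical.

```python
def sample_moment_about_zero(xs, r):
    """r-th sample moment about 0: (1/n) * sum(x_i ** r)."""
    return sum(x ** r for x in xs) / len(xs)

def sample_moment_about_mean(xs, r):
    """r-th sample moment about the sample mean: (1/n) * sum((x_i - xbar) ** r)."""
    xbar = sample_moment_about_zero(xs, 1)  # the sample mean is the case r = 1
    return sum((x - xbar) ** r for x in xs) / len(xs)

# Hypothetical observations, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
print(sample_moment_about_zero(xs, 1))  # sample mean
print(sample_moment_about_zero(xs, 2))
print(sample_moment_about_mean(xs, 2))
```

For r = 2 the two kinds of moment are linked by the identity M_2 = M_2' - (M_1')^2, which is a convenient check on any implementation.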


• Theorem – Let X1, X2, …, Xn be a random sample from the density $f(\cdot)$. The expected value of the rth sample moment about 0 is equal to the rth population moment; i.e.,

$$E[M_r'] = \mu_r'$$

Also,

$$\mathrm{Var}[M_r'] = \frac{1}{n}\left\{E[X^{2r}] - (E[X^r])^2\right\} = \frac{1}{n}\left[\mu_{2r}' - (\mu_r')^2\right]$$
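A brief sketch of why the variance of the rth sample moment about 0 takes this form, assuming $E[X^{2r}]$ is finite. The second equality uses the independence of the $X_i$, the third uses the fact that they share a common distribution:

```latex
\begin{aligned}
\mathrm{Var}[M_r']
  &= \mathrm{Var}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i^{\,r}\right]
   = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}[X_i^{\,r}]
   = \frac{1}{n}\,\mathrm{Var}[X^{r}] \\
  &= \frac{1}{n}\left\{E[X^{2r}] - \left(E[X^{r}]\right)^2\right\}
\end{aligned}
```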


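• Special case: r=1

$$\mathrm{Var}[\bar{X}_n] = \frac{1}{n}\left\{E[X^2] - (E[X])^2\right\} = \frac{1}{n}\left[\mu_2' - (\mu_1')^2\right] = \frac{\mathrm{Var}(X)}{n} = \frac{\sigma^2}{n}$$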


Sample statistics

• Let X1, X2, …, Xn be a random sample from the distribution of a random variable X. The sample mean and sample variance are respectively defined to be

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
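The sample mean and the sample variance (with divisor n - 1) can be computed directly from their definitions. The sketch below uses hypothetical data and compares the result with Python's `statistics.variance`, which also uses the n - 1 divisor.

```python
import statistics

def sample_mean(xs):
    """Sample mean: (1/n) * sum(x_i)."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance: (1/(n-1)) * sum((x_i - xbar) ** 2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Hypothetical observations, for illustration only.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(xs), sample_variance(xs))
print(statistics.variance(xs))  # library value with the same n - 1 divisor
```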


Estimating the mean

• Given a random sample $x_1, x_2, \dots, x_n$ from a probability density function f( . ) with unknown mean μ and finite variance σ², we want to estimate the mean using the random sample.

• Using only a finite number of values of X (a random sample of size n), can any reliable inferences be made about E(X), the average of an infinite number of values of X?

• Will the estimate be more reliable if the size of the random sample is larger?


R-program demonstration


[Figure: Mean of sample means w.r.t. sample size. Horizontal axis: sample size, 0 to 5000. Vertical axis: mean of sample means, 59.8 to 60.2.]


[Figure: Mean of sample standard deviations w.r.t. sample size. Horizontal axis: sample size, 0 to 5000. Vertical axis: mean of sample standard deviations, 19.84 to 20.02.]


[Figure: Standard deviation of sample means w.r.t. sample size. Horizontal axis: sample size, 0 to 5000. Vertical axis: standard deviation of sample means, 0 to 5. Fitted curve: $y = 19.938\,x^{-0.4998}$, $R^2 = 0.9995$.]

Y = f(x) = ? What is the theoretical basis?
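One way to explore the question is to repeat the experiment numerically. Since the variance of the sample mean is σ²/n, the standard deviation of sample means should follow y = σ x^(-1/2), which matches the shape of the fitted curve. The sketch below is not the lecture's R program: it assumes a normal population with mean 60 and standard deviation 20 (values read off the plot axes, so an assumption), and the sample sizes, replication count, and function names are hypothetical.

```python
import math
import random
import statistics

def sd_of_sample_means(n, n_rep, rng, mu=60.0, sigma=20.0):
    """Standard deviation of n_rep sample means, each from a normal sample of size n.
    mu and sigma are assumed population values, not taken from the slides' code."""
    means = [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        for _ in range(n_rep)
    ]
    return statistics.stdev(means)

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b on log-log axes; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b

rng = random.Random(0)              # fixed seed so the run is repeatable
sizes = [10, 30, 100, 300, 1000]    # hypothetical sample sizes
sds = [sd_of_sample_means(n, 400, rng) for n in sizes]
a, b = fit_power_law(sizes, sds)
print(f"fitted y = {a:.3f} * x^{b:.4f}")
```

Under these assumptions the fitted coefficient should land near σ = 20 and the exponent near -0.5, up to simulation noise.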


Histograms of sample mean and standard deviation (ns = 30)


Histograms of sample mean and standard deviation (ns = 5000)


Weak Law of Large Numbers (WLLN)

• Let f( . ) be a density with mean μ and variance σ², and let $\bar{X}_n$ be the sample mean of a random sample of size n from f( . ). Let ε and δ be any two specified numbers satisfying ε>0 and 0<δ<1. If n is any integer greater than $\sigma^2/(\varepsilon^2\delta)$, then

$$P\left[\,|\bar{X}_n - \mu| < \varepsilon\,\right] \ge 1 - \delta$$
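A short sketch of how the bound follows, assuming the standard route through Chebyshev's inequality together with $\mathrm{Var}[\bar{X}_n] = \sigma^2/n$:

```latex
\begin{aligned}
P\left[\,|\bar{X}_n - \mu| \ge \varepsilon\,\right]
  &\le \frac{\mathrm{Var}[\bar{X}_n]}{\varepsilon^2}
   = \frac{\sigma^2}{n\,\varepsilon^2}
   < \delta
   \quad\text{whenever } n > \frac{\sigma^2}{\varepsilon^2\delta}, \\
\text{so}\quad
P\left[\,|\bar{X}_n - \mu| < \varepsilon\,\right]
  &\ge 1 - \frac{\sigma^2}{n\,\varepsilon^2}
   > 1 - \delta .
\end{aligned}
```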


Recall the theorem


• (Example) Suppose that some distribution with an unknown mean has its variance equal to 1. How large a random sample must be taken such that the probability will be at least 0.95 that the sample mean $\bar{X}_n$ will lie within 0.5 of the population mean?

$$\sigma^2 = 1, \qquad \varepsilon = 0.5, \qquad \delta = 1 - 0.95 = 0.05$$

$$n > \frac{\sigma^2}{\delta\,\varepsilon^2} = \frac{1}{(0.05)(0.5)^2} = 80$$


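• (Example) How large a random sample must be taken in order that you are 99% certain that $\bar{X}_n$ is within 0.5σ of μ?

$$\varepsilon = 0.5\sigma, \qquad \delta = 1 - 0.99 = 0.01$$

$$n > \frac{\sigma^2}{\delta\,\varepsilon^2} = \frac{\sigma^2}{(0.01)(0.5\sigma)^2} = 400$$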
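The arithmetic in both examples can be reproduced with a small helper that evaluates the weak-law bound σ²/(ε²δ). The function name `wlln_bound` is hypothetical; the inputs are the values stated in the two examples. In the second example σ is unknown, so an arbitrary value is used to show that it cancels.

```python
def wlln_bound(sigma2, eps, delta):
    """Weak-law bound sigma^2 / (eps^2 * delta).
    Any integer n greater than this value satisfies
    P[|sample mean - mu| < eps] >= 1 - delta."""
    if eps <= 0 or not (0 < delta < 1):
        raise ValueError("need eps > 0 and 0 < delta < 1")
    return sigma2 / (eps ** 2 * delta)

# Example 1: variance 1, within 0.5 of the mean, probability at least 0.95.
print(wlln_bound(sigma2=1.0, eps=0.5, delta=1 - 0.95))

# Example 2: within 0.5*sigma of the mean with probability 0.99.
sigma = 3.0  # arbitrary illustrative value; the bound does not depend on it
print(wlln_bound(sigma2=sigma ** 2, eps=0.5 * sigma, delta=1 - 0.99))
```

Because the bound comes from Chebyshev's inequality, it holds for any distribution with finite variance and is therefore conservative; a normal approximation would usually call for a much smaller n.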


Raingauge network design

• Assume there are already some raingauge stations in a catchment, and we are interested in determining the optimal number of stations that should exist to achieve a desired accuracy in the estimation of mean rainfall.

• Two approaches

– (1) The standard deviation of the sample mean should not exceed a certain portion of the population mean.

– (2) $P\left[\,|\bar{x}_n - \mu| < \varepsilon\,\right] \ge 1 - \delta$


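Criterion 1

Standard deviation of the sample mean should not exceed a certain portion of the population mean.

$$\bar{X}_n \sim N(\mu, \sigma^2/n), \qquad (\bar{X}_n - \mu) \sim N(0, \sigma^2/n)$$

$$\sigma_{\bar{X}_n} = \frac{\sigma}{\sqrt{n}} \le \varepsilon\mu \;\Rightarrow\; \sqrt{n} \ge \frac{C_V}{\varepsilon} \;\Rightarrow\; n \ge \frac{C_V^2}{\varepsilon^2}, \qquad C_V = \frac{\sigma}{\mu}$$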


Criterion 2

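• From the weak law of large numbers,

$$P\left[\,|\bar{x}_n - \mu| < \varepsilon\,\right] \ge 1 - \delta \quad\text{if}\quad n > \frac{\sigma^2}{\varepsilon^2\delta}$$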

What assumptions have we made in these approaches to network design?

What are the practical considerations in monitoring network design?

Data independence
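The two criteria can be turned into station counts as in the sketch below. It assumes the gauge readings behave like independent observations of one random variable (the data-independence assumption), that criterion 1 takes the form n ≥ (C_V/ε)² with C_V = σ/μ and ε a fraction of the mean, and that criterion 2 uses the weak-law bound σ²/(ε²δ) with ε in rainfall units. The function names and all numerical inputs are hypothetical.

```python
import math

def stations_criterion_1(cv, eps_fraction):
    """Smallest n with sigma/sqrt(n) <= eps_fraction * mu, i.e. n >= (cv/eps_fraction)^2,
    where cv = sigma/mu is the coefficient of variation."""
    # The small tolerance guards against floating-point round-off at exact integers.
    return math.ceil((cv / eps_fraction) ** 2 - 1e-9)

def stations_criterion_2(sigma, eps, delta):
    """Smallest integer n at or above the weak-law bound sigma^2 / (eps^2 * delta).
    Take one more station if the bound must be exceeded strictly."""
    return math.ceil(sigma ** 2 / (eps ** 2 * delta) - 1e-9)

# Hypothetical catchment: coefficient of variation 0.3, allowed error 10% of the mean.
print(stations_criterion_1(cv=0.3, eps_fraction=0.1))

# Hypothetical catchment: sigma = 30 mm, tolerance 10 mm, delta = 0.1.
print(stations_criterion_2(sigma=30.0, eps=10.0, delta=0.1))
```

Criterion 2 asks for far more stations in this illustration because the Chebyshev-type bound makes no distributional assumption. Spatially correlated rainfall would violate the independence assumption behind both counts.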


The Central Limit Theorem

• Let f( . ) be a density with mean μ and finite variance σ². Let $\bar{X}_n$ be the sample mean of a random sample of size n from f( . ). Then

$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$

approaches the standard normal distribution as n approaches infinity.


• The importance of the CLT is the fact that the mean $\bar{X}_n$ of a random sample from any distribution with finite variance σ² and mean μ is approximately distributed as a normal random variable with mean μ and variance $\sigma^2/n$.

$$\bar{X}_n = \mu + \frac{\sigma}{\sqrt{n}}\,Z_n \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
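The theorem can be checked numerically with a skewed parent distribution. The sketch below is not the lecture's R program: it assumes an exponential population with rate 1 (so μ = σ = 1), standardizes the sample means, and reports how often |Z_n| ≤ 1.96, which should approach 0.95 as n grows. The function names, seed, and replication count are hypothetical.

```python
import math
import random
import statistics

def standardized_means(n, n_rep, rng):
    """Z_n = (sample mean - mu) / (sigma / sqrt(n)) for Exp(1) samples (mu = sigma = 1)."""
    mu = sigma = 1.0
    zs = []
    for _ in range(n_rep):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs

def fraction_within(zs, c=1.96):
    """Fraction of standardized means with |Z_n| <= c (about 0.95 for a standard normal)."""
    return sum(abs(z) <= c for z in zs) / len(zs)

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (2, 10, 25, 50, 100):
    zs = standardized_means(n, 4000, rng)
    print(n, round(statistics.fmean(zs), 3), round(statistics.stdev(zs), 3),
          round(fraction_within(zs), 3))
```

The mean and standard deviation of Z_n are 0 and 1 for every n by construction; what changes with n is the shape, which is visible here through the coverage fraction and would be clearer in a histogram.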


R-program demonstration: Central Limit Theorem


[Figure panels: n = 2, n = 10, n = 25, n = 50, n = 100]


Sampling distributions

• Given random samples of certain probability densities, we often are interested in knowing the probability densities of sampling statistics.

– Poisson distribution

– Exponential distribution

– Normal distribution

– Chi-square distribution

– Standard normal and chi-square distributions

– Student’s t-distribution


Poisson distribution


Exponential distribution


Normal distribution


Chi-square distribution


Standard normal and chi-square distributions


Student’s t-distribution

Student’s t distribution with k degrees of freedom


• The "Student's" distribution was published in 1908 by W. S. Gosset. Gosset, however, was employed at a brewery that forbade the publication of research by its staff members. To circumvent this restriction, Gosset used the name "Student", and consequently the distribution was named the "Student's t-distribution".


Order statistics
