Statistical Inferences


Uncertainty Analysis for Engineers 1

Statistical Inferences
Jake Blanchard
Spring 2010

Introduction

Statistical inference = process of drawing conclusions from random data
Conclusions of this process are “propositions,” for example:
◦ Estimates
◦ Confidence intervals
◦ Credible intervals
◦ Rejecting a hypothesis
◦ Clustering data points
Part of this is the estimation of model parameters

Parameter Estimation

Point Estimation
◦ Calculate a single number from a set of observational data
Interval Estimation
◦ Determine the interval within which the true parameter lies (along with a confidence level)

Properties

Bias = the expected value of the estimator does not necessarily equal the parameter; the difference is the bias
Consistency = the estimator approaches the parameter as n approaches infinity
Efficiency = a smaller variance of the estimator implies higher efficiency
Sufficiency = the estimator utilizes all pertinent information in a sample
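These properties can be written compactly. The notation below is an assumption made for illustration (it does not appear on the slides): $\hat{\theta}_n$ is an estimator built from n samples and $\theta$ is the true parameter.

$$\text{bias}(\hat{\theta}_n) = E[\hat{\theta}_n] - \theta$$

$$\lim_{n\to\infty} P\left(|\hat{\theta}_n - \theta| > \varepsilon\right) = 0 \quad \text{for every } \varepsilon > 0 \quad \text{(consistency)}$$

$$E\left[(\hat{\theta}_n - \theta)^2\right] = Var(\hat{\theta}_n) + \text{bias}(\hat{\theta}_n)^2$$

The last identity shows why both bias and variance matter: for an unbiased estimator the mean squared error is the variance alone, so the more efficient estimator is also the more accurate one.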

Point Estimation

Start with a data sample of size N
Example: estimate the fraction of voters who will vote for a particular candidate (the estimate is based on a random sample of voters)
Other examples: quality control, clinical trials, software engineering, orbit prediction
Assume successive samples are statistically independent

Estimators

Maximum likelihood
Method of moments
Minimum mean squared error
Bayes estimators
Cramer-Rao bound
Maximum a posteriori
Minimum variance unbiased estimator
Best linear unbiased estimator
etc.

Maximum Likelihood

Suppose we have a random variable x with pdf $f(x;\theta)$
Take n samples of x
What value of $\theta$ will maximize the likelihood of obtaining these n observations?
Let L = likelihood of observing this set of values for x
Then maximize L with respect to $\theta$

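Maximum Likelihood

$$L(x_1, x_2, \ldots, x_n; \theta) = f(x_1;\theta)\, f(x_2;\theta) \cdots f(x_n;\theta)$$

$$\frac{\partial L(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta} = 0$$

$$\frac{\partial \log L(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta} = 0$$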

Example

Times between successive arrivals of vehicles at an intersection are 1.2, 3, 6.3, 10.1, 5.2, 2.4, and 7.2 seconds
Assume an exponential distribution
Find the MLE for $\lambda$, the mean time between arrivals

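Solution

$$f(t) = \frac{1}{\lambda} e^{-t/\lambda}$$

$$L = \prod_{i=1}^{7} \frac{1}{\lambda} \exp\left(-\frac{t_i}{\lambda}\right) = \lambda^{-7} \exp\left(-\frac{1}{\lambda}\sum_{i=1}^{7} t_i\right)$$

$$\log(L) = -7\log(\lambda) - \frac{1}{\lambda}\sum_{i=1}^{7} t_i = -7\log(\lambda) - \frac{35.4}{\lambda}$$

$$\frac{\partial \log(L)}{\partial \lambda} = -\frac{7}{\lambda} + \frac{35.4}{\lambda^2} = 0$$

$$\lambda = \frac{35.4}{7} = 5.06 \text{ seconds}$$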
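The closed-form result for the exponential example (the MLE of the mean equals the sample mean) can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `log_likelihood` and the search range of 1 to 15 seconds are assumptions chosen for illustration.

```python
import math

# Inter-arrival times (seconds) from the vehicle example
times = [1.2, 3, 6.3, 10.1, 5.2, 2.4, 7.2]


def log_likelihood(lam, data):
    """Log-likelihood for an exponential pdf f(t) = (1/lam) * exp(-t/lam)."""
    n = len(data)
    return -n * math.log(lam) - sum(data) / lam


# Closed form: setting d(log L)/d(lam) = 0 gives lam = sum(t_i) / n
lam_hat = sum(times) / len(times)

# Crude check: evaluate log L on a grid (assumed range 1..15 s, step 0.001)
grid = [1 + 0.001 * k for k in range(14001)]
lam_grid = max(grid, key=lambda lam: log_likelihood(lam, times))

print(f"closed form: {lam_hat:.3f} s, grid search: {lam_grid:.3f} s")
```

The grid search is deliberately simple; it only confirms that the log-likelihood peaks at the sample mean (the observations sum to 35.4 seconds).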

2-Parameter Example

Measure cycles to failure of saturated sand (25, 20, 28, 33, 26 cycles)
Assume a lognormal distribution

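Solution

Here $\lambda$ and $\zeta$ are the mean and standard deviation of $\ln(x)$.

$$f(x) = \frac{1}{\sqrt{2\pi}\,\zeta x} \exp\left[-\frac{1}{2}\left(\frac{\ln(x)-\lambda}{\zeta}\right)^2\right]$$

$$L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\zeta x_i} \exp\left[-\frac{1}{2}\left(\frac{\ln(x_i)-\lambda}{\zeta}\right)^2\right]$$

$$\ln(L) = -n\ln(\sqrt{2\pi}) - n\ln(\zeta) - \sum_{i=1}^{n}\ln(x_i) - \frac{1}{2\zeta^2}\sum_{i=1}^{n}\left(\ln(x_i)-\lambda\right)^2$$

$$\frac{\partial \ln(L)}{\partial \lambda} = \frac{1}{\zeta^2}\sum_{i=1}^{n}\left(\ln(x_i)-\lambda\right) = 0$$

$$\frac{\partial \ln(L)}{\partial \zeta} = -\frac{n}{\zeta} + \frac{1}{\zeta^3}\sum_{i=1}^{n}\left(\ln(x_i)-\lambda\right)^2 = 0$$

$$\lambda = \frac{1}{n}\sum_{i=1}^{n}\ln(x_i) = 3.26$$

$$\zeta^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\ln(x_i)-\lambda\right)^2 = 0.027$$

$$\zeta = \sqrt{0.027} = 0.164$$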
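The lognormal MLE reduces to the mean and the 1/n variance of the logged data, which is easy to verify. This is a minimal Python sketch, not part of the original slides; the variable names are assumptions. Carrying full precision gives a standard deviation of ln(x) close to 0.163, slightly below the 0.164 obtained by taking the square root of the rounded 0.027.

```python
import math

# Cycles to failure of saturated sand, from the 2-parameter example
cycles = [25, 20, 28, 33, 26]
n = len(cycles)
logs = [math.log(x) for x in cycles]

# MLE of the lognormal parameters: mean and (1/n) variance of ln(x)
lam_hat = sum(logs) / n
zeta2_hat = sum((v - lam_hat) ** 2 for v in logs) / n
zeta_hat = math.sqrt(zeta2_hat)

print(f"lambda = {lam_hat:.3f}, zeta^2 = {zeta2_hat:.4f}, zeta = {zeta_hat:.3f}")
```

Note the 1/n divisor: the MLE of the variance is not the usual 1/(n-1) sample variance.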

Method of Moments

Use sample moments (mean, variance, etc.) to set distribution parameters

Example

Times between successive arrivals of vehicles at an intersection are 1.2, 3, 6.3, 10.1, 5.2, 2.4, and 7.2 seconds
Assume an exponential distribution
Sample mean = 5.06 seconds, which is the method-of-moments estimate of the distribution mean

2-Parameter Example

Measure cycles to failure of saturated sand (25, 20, 28, 33, 26 cycles)
Assume a lognormal distribution
Mean = 26.4
Standard deviation = 4.72
Solve for $\lambda$ and $\zeta$: $\lambda$ = 3.26, $\zeta$ = 0.177
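The "solve for the parameters" step can be sketched in Python. The slides do not show the algebra, so the relations used here are the standard lognormal moment formulas, stated as an assumption: mean = exp(lam + zeta^2/2) and (std/mean)^2 = exp(zeta^2) - 1, where `lam` and `zeta` are the mean and standard deviation of ln(x).

```python
import math
import statistics

# Cycles to failure of saturated sand
cycles = [25, 20, 28, 33, 26]

xbar = statistics.mean(cycles)   # sample mean
s = statistics.stdev(cycles)     # sample standard deviation (n - 1 divisor)

# Invert the lognormal moment relations for the two parameters
zeta = math.sqrt(math.log(1 + (s / xbar) ** 2))
lam = math.log(xbar) - 0.5 * zeta ** 2

print(f"mean = {xbar:.1f}, std = {s:.2f}, lambda = {lam:.2f}, zeta = {zeta:.3f}")
```

Unlike maximum likelihood, this approach matches the moments of x itself rather than those of ln(x), which is why the two estimates of the second parameter differ.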


Minimum Mean Square Error

Choose parameters to minimize the mean squared error between measured data and a continuous distribution
Essentially a curve fit

Approach

Excel
◦ Guess parameters
◦ Calculate sum of squares of errors
◦ Vary guessed parameters to minimize error (use the Solver)
Matlab
◦ Use fminsearch function

Example

Solar insolation data
◦ Gather data
◦ Form histogram
◦ Normalize histogram by number of samples and width of bins

Scatter Plot and Histogram

[Figure: scatter plot and histogram of the solar insolation data]

Normal and Weibull Fits

[Figure: normalized histogram of the data with normal and Weibull fits]
Mean = 3980 (fit)
Mean = 3915 (data)

Excel Screen Shot

[Figure: two Excel screenshots]

Solver Set Up

[Figure: Excel Solver setup]

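Matlab Script

% Least-squares fit of a normal pdf to a normalized histogram
y=xlsread('matlabfit.xlsx','normal');    % data from the sheet named 'normal'
[s,t]=hist(y,8);                         % counts s and bin centers t for 8 bins
s=s/(t(2)-t(1))/numel(y);                % normalize by bin width and number of samples
zin(1)=mean(t); zin(2)=std(t);           % initial guess for [mu, sigma]
sumoferrs(zin,t,s)                       % squared error at the initial guess
zout=fminsearch(@(z) sumoferrs(z,t,s), zin)   % parameters that minimize the error
sumoferrs(zout,t,s)                      % squared error at the optimum
xplot=t(1):(t(end)-t(1))/(10*numel(t)):t(end);
yplot=curve(xplot,zout);
plot(t,s,'+',xplot,yplot)                % histogram points and fitted curve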

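Matlab Script

function f=curve(x,z)
% Normal pdf evaluated at x for z = [mu, sigma]
mu=z(1);
sig=z(2);
f=normpdf(x,mu,sig);
end

function f=sumoferrs(z, x, y)
% Sum of squared differences between the fitted pdf and the histogram values y
f=sum((curve(x,z)-y).^2);
end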

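Sampling Distributions

How do we assess inaccuracy in using the sample mean to estimate the population mean?

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$E(\bar{x}) = E\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\, n\mu = \mu$$

$$Var(\bar{x}) = Var\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n^2}\sum_{i=1}^{n} Var(x_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$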

Conclusions

Expected value of the sample mean is equal to the population mean
Mean of sample is an unbiased estimator of the mean of the population
Variance of the sample mean ($\sigma^2/n$) measures the sampling error
By the CLT, the sample mean is Gaussian for large n
Mean of x is $N(\mu, \sigma/\sqrt{n})$
Estimator for $\mu$ improves as n increases
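These conclusions can be illustrated by simulation. The sketch below is not from the slides: the exponential population, its mean of 5, the sample size of 25, and the number of trials are all hypothetical choices, and the seed is fixed only so the run is repeatable.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

mu = 5.0        # hypothetical population mean (exponential distribution)
sigma = 5.0     # for an exponential distribution the std. dev. equals the mean
n = 25          # size of each sample
trials = 20000  # number of repeated samples

# Draw many samples of size n and record each sample mean
sample_means = []
for _ in range(trials):
    sample = [random.expovariate(1.0 / mu) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = sum(sample_means) / trials
std_of_means = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / trials
)
predicted_std = sigma / math.sqrt(n)  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"std of sample means:  {std_of_means:.3f} (predicted {predicted_std:.3f})")
```

Even though the population here is strongly skewed, the spread of the sample means should track the predicted value closely, and a histogram of `sample_means` would look roughly Gaussian.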

Sample Mean with Unknown $\sigma$

In the previous derivation, $\sigma$ is the population standard deviation
This is generally not known
All we have is the sample variance ($s^2$)
If the sample size is small, the distribution will not be Gaussian
We can use a “Student’s t-distribution”

$$f_T(t) = \frac{\Gamma\left[(f+1)/2\right]}{\sqrt{\pi f}\,\Gamma(f/2)}\left(1+\frac{t^2}{f}\right)^{-\frac{f+1}{2}}$$

f = number of degrees of freedom

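Distribution of Sample Variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

$$E(s^2) = \frac{1}{n-1}E\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = \frac{1}{n-1}E\left[\sum_{i=1}^{n}\left((x_i-\mu)-(\bar{x}-\mu)\right)^2\right]$$

$$\sum_{i=1}^{n}\left[(x_i-\mu)-(\bar{x}-\mu)\right]^2 = \sum_{i=1}^{n}(x_i-\mu)^2 - 2(\bar{x}-\mu)\sum_{i=1}^{n}(x_i-\mu) + n(\bar{x}-\mu)^2$$

$$= \sum_{i=1}^{n}(x_i-\mu)^2 - 2n(\bar{x}-\mu)^2 + n(\bar{x}-\mu)^2 = \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2$$

$$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}E(x_i-\mu)^2 - nE(\bar{x}-\mu)^2\right] = \frac{1}{n-1}\left[n\sigma^2 - n\frac{\sigma^2}{n}\right] = \sigma^2$$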

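Conclusions

Sample variance is an unbiased estimator of population variance

$$Var(s^2) = \frac{\sigma^4}{n}\left[\frac{\mu_4}{\sigma^4} - \frac{n-3}{n-1}\right], \qquad \mu_4 = E\left[(x-\mu)^4\right]$$

For normal variates

$$(n-1)s^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2$$

$$\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^2 - \left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\right)^2$$

This quantity follows a Chi-Square distribution with n-1 dof
This approaches a normal distribution for large n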

Testing Hypotheses

Used to make decisions about a population based on a sample
Steps
◦ Define null and alternative hypotheses
◦ Identify test statistic
◦ Estimate test statistic, based on sample
◦ Specify level of significance
   Type I error: rejecting null hypothesis when it is true
   Type II error: accepting null hypothesis when it is false
◦ Define region of rejection (one tail or two?)

Level of Significance

Type I error
◦ Level of significance ($\alpha$)
◦ Typically 1-5%
Type II error ($\beta$) is seldom used

Example

We need the yield strength of rebar to be at least 38 psi
We order a sample of 25 rebars
Sample mean from 25 tests is 37.5 psi
Standard deviation of rebar strength $\sigma$ = 3 psi
Use a one-sided test
Hypotheses: null: $\mu = 38$; alt.: $\mu < 38$

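Solution

$$Z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} = \frac{37.5-38}{3/\sqrt{25}} = -0.833$$

$$z_\alpha = \Phi^{-1}(\alpha) = \Phi^{-1}(0.05) = \text{norminv}(0.05, 0, 1) = -1.64$$

Z = -0.833 is greater than -1.64, so we cannot reject the null hypothesis and the supplier is considered acceptable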
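The one-sided test with known standard deviation can be reproduced with Python's standard library. This is a minimal sketch rather than the original course script; `NormalDist().inv_cdf` stands in for Matlab's `norminv`, and the variable names are assumptions.

```python
import math
from statistics import NormalDist

mu0 = 38.0     # mean strength under the null hypothesis (psi)
xbar = 37.5    # sample mean (psi)
sigma = 3.0    # known population standard deviation (psi)
n = 25         # sample size
alpha = 0.05   # level of significance

# Test statistic and lower-tail critical value
z = (xbar - mu0) / (sigma / math.sqrt(n))
z_crit = NormalDist().inv_cdf(alpha)

# Reject the null hypothesis only if z falls in the lower tail
reject_null = z < z_crit

print(f"Z = {z:.3f}, critical value = {z_crit:.3f}, reject null: {reject_null}")
```

Because the alternative hypothesis is that the mean is below 38, only the lower tail forms the region of rejection.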

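Variation of This Example

Suppose the standard deviation is not known
Use Student’s t-distribution
Sample standard deviation = 3.5 psi

$$\bar{x} = 37.5 \text{ psi}, \qquad s = 3.5 \text{ psi}$$

$$T = \frac{\bar{x}-\mu}{s/\sqrt{n}} = \frac{37.5-38}{3.5/\sqrt{25}} = -0.714$$

$$f = \text{dof} = 25 - 1 = 24$$

$$t_\alpha = \text{tinv}(0.05, 24) = -1.711$$

T = -0.714 is greater than -1.711, so we cannot reject the null hypothesis and the supplier is considered acceptable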

Third Variation

Sample size increased to 41
Sample mean = 37.6 psi
Sample standard deviation = 3.75 psi
Null: variance = 9
Alternative: variance > 9
Use Chi-Square distribution

Solution

$$C = \frac{(n-1)s^2}{\sigma^2} = \frac{(41-1)(3.75)^2}{9} = 62.5$$

$$\alpha = 0.025, \qquad f = 40$$

$$c_{1-\alpha,f} = c_{0.975,40} = \text{chi2inv}(0.975, 40) = 59.34$$

C = 62.5 exceeds 59.34, so we reject the null hypothesis and the supplier is not acceptable
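The variance test can also be sketched in Python. The standard library has no chi-square inverse, so this sketch swaps in the Wilson-Hilferty approximation for the critical value; that substitution, the helper name `chi2_quantile_approx`, and the variable names are assumptions made here, not the method used on the slides (which call `chi2inv`).

```python
import math
from statistics import NormalDist


def chi2_quantile_approx(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile (not exact)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


n = 41            # sample size
s = 3.75          # sample standard deviation (psi)
sigma0_sq = 9.0   # variance under the null hypothesis
alpha = 0.025     # level of significance

c_stat = (n - 1) * s ** 2 / sigma0_sq            # test statistic
c_crit = chi2_quantile_approx(1 - alpha, n - 1)  # upper-tail critical value
reject_null = c_stat > c_crit

print(f"C = {c_stat:.1f}, critical value = {c_crit:.2f}, reject null: {reject_null}")
```

For 40 degrees of freedom the approximation agrees with the tabulated 59.34 to about two decimal places; with few degrees of freedom an exact routine is the safer choice.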

Confidence Intervals

In addition to mean, standard deviation, etc., confidence intervals can help us characterize populations
For example, the mean gives us a best estimate of the expected value of the population, but confidence intervals can help indicate the accuracy of the mean
A confidence interval is defined as the range within which a parameter will lie with a prescribed probability

CI of the Mean

First, we’ll assume the variance is known
The central limit theorem states that the pdf of the mean of n individual observations from any distribution with finite mean and variance approaches a normal distribution as n approaches infinity

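CI of the Mean

$$K = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$$

$$P\left(K_{\alpha/2} < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le K_{1-\alpha/2}\right) = 1-\alpha$$

$$K_{\alpha/2} = \Phi^{-1}\left(\frac{\alpha}{2}\right), \qquad K_{1-\alpha/2} = \Phi^{-1}\left(1-\frac{\alpha}{2}\right)$$

$$P\left(\bar{x} + K_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu < \bar{x} + K_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$

$$\text{CI}_{1-\alpha} = \left(\bar{x} + K_{\alpha/2}\frac{\sigma}{\sqrt{n}};\ \bar{x} + K_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

$\Phi$ is the CDF of the standard normal variate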

Example

Measure strength of rebar
25 samples
Mean = 37.5 psi
Standard deviation = 3 psi
Find the 95% confidence interval for the mean

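Solution

$$1-\alpha = 0.95$$

$$K_{\alpha/2} = \Phi^{-1}(0.025) = -K_{0.975} = -1.96$$

$$K_{1-\alpha/2} = \Phi^{-1}(0.975) = K_{0.975} = 1.96$$

$$\text{CI}_{0.95} = \left(37.5 - 1.96\frac{3}{\sqrt{25}};\ 37.5 + 1.96\frac{3}{\sqrt{25}}\right) = (36.3;\ 38.7) \text{ psi}$$

So the mean of the strength falls between 36.3 and 38.7 psi with a 95% confidence level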

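The Script

mu=37.5;                  % sample mean (psi)
sig=3;                    % known population standard deviation (psi)
n=25;                     % sample size
alpha=0.05;               % 1-alpha = 95% confidence level
ka=-norminv(1-alpha/2)    % lower standard normal value (-1.96)
k1ma=-ka                  % upper standard normal value (+1.96)
cil=mu+ka*sig/sqrt(n)     % lower confidence limit
ciu=mu-ka*sig/sqrt(n)     % upper confidence limit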
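For readers without Matlab, the same interval can be computed with Python's standard library. This is a minimal sketch under the same known-variance assumption; the function name `mean_ci_known_sigma` is hypothetical and `NormalDist().inv_cdf` plays the role of `norminv`.

```python
import math
from statistics import NormalDist


def mean_ci_known_sigma(xbar, sigma, n, alpha):
    """Two-sided (1 - alpha) confidence interval for the mean, sigma known."""
    k_lo = NormalDist().inv_cdf(alpha / 2)       # negative quantile
    k_hi = NormalDist().inv_cdf(1 - alpha / 2)   # positive quantile
    scale = sigma / math.sqrt(n)
    return xbar + k_lo * scale, xbar + k_hi * scale


# Rebar example: 25 samples, mean 37.5 psi, standard deviation 3 psi
ci_low, ci_high = mean_ci_known_sigma(37.5, 3.0, 25, 0.05)
print(f"95% CI for the mean: ({ci_low:.1f}, {ci_high:.1f}) psi")
```

The interval half-width scales with 1/sqrt(n), so quadrupling the number of samples halves the width.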

Variance Not Known

What if the variance of the population ($\sigma^2$) is not known?
That is, we only know the variance of the sample.
Let s = standard deviation of the sample
We can show that

$$\frac{\bar{x}-\mu}{s/\sqrt{n}}$$

does not conform to a normal distribution, especially for small n

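Variance Not Known

We can show that this quantity follows a Student’s t-distribution with n-1 degrees of freedom (f)

$$f_T(t) = \frac{\Gamma\left[(f+1)/2\right]}{\sqrt{\pi f}\,\Gamma(f/2)}\left(1+\frac{t^2}{f}\right)^{-\frac{f+1}{2}}$$

$$P\left(t_{\alpha/2,n-1} < \frac{\bar{x}-\mu}{s/\sqrt{n}} \le t_{1-\alpha/2,n-1}\right) = 1-\alpha$$

Here $t_{p,n-1}$ is the value of the t variate with n-1 degrees of freedom at cumulative probability p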

Example

Measure strength of rebar
25 samples
Mean = 37.5 psi
s = 3.5 psi
Find the 95% confidence interval for the mean

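Script

Result is 36.06, 38.94

xbar=37.5;                  % sample mean (psi)
s=3.5;                      % sample standard deviation (psi)
n=25;                       % sample size
alpha=0.05;                 % 1-alpha = 95% confidence level
ka=-tinv(1-alpha/2,n-1);    % lower t value (negative), n-1 dof
kb=-tinv(alpha/2,n-1);      % upper t value (positive), n-1 dof
cil=xbar+ka*s/sqrt(n)       % lower confidence limit
ciu=xbar+kb*s/sqrt(n)       % upper confidence limit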

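One-Sided Confidence Limit

Sometimes we only care about the upper or lower bounds

Lower

$$\langle\mu)_{1-\alpha} = \bar{x} - K_{1-\alpha}\frac{\sigma}{\sqrt{n}}$$

$$\langle\mu)_{1-\alpha} = \bar{x} - t_{1-\alpha,n-1}\frac{s}{\sqrt{n}}$$

Upper

$$(\mu\rangle_{1-\alpha} = \bar{x} + K_{1-\alpha}\frac{\sigma}{\sqrt{n}}$$

$$(\mu\rangle_{1-\alpha} = \bar{x} + t_{1-\alpha,n-1}\frac{s}{\sqrt{n}}$$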

Example

100 steel specimens; measure strength
Mean = 2200 kgf; s = 220 kgf
Specify the 95% lower confidence limit of the mean
Assume $\sigma$ = s = 220 kgf
$1-\alpha$ = 0.95; $\alpha$ = 0.05

$$K_{0.95} = \Phi^{-1}(0.95) = 1.65$$

$$\langle\mu)_{0.95} = 2200 - 1.65\frac{220}{\sqrt{100}} = 2164 \text{ kgf}$$

Manufacturer has 95% confidence that the mean yield strength is at least 2164 kgf

Example

Now only 15 steel specimens
Mean = 2200 kgf; s = 220 kgf
Specify the 95% lower confidence limit of the mean

$$t_{0.95,14} = 1.761$$

$$\langle\mu)_{0.95} = 2200 - 1.761\frac{220}{\sqrt{15}} = 2100 \text{ kgf}$$

Manufacturer has 95% confidence that the mean yield strength is at least 2100 kgf
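The two lower limits differ only in the quantile used, which a short Python sketch makes explicit. The helper name `lower_limit` is hypothetical. The normal quantile comes from the standard library, while the t value 1.761 is taken directly from the example because the standard library has no t-distribution inverse.

```python
import math
from statistics import NormalDist


def lower_limit(xbar, spread, n, quantile):
    """One-sided lower confidence limit: xbar - quantile * spread / sqrt(n)."""
    return xbar - quantile * spread / math.sqrt(n)


# 100 specimens: treat s as sigma and use the standard normal quantile
k_95 = NormalDist().inv_cdf(0.95)
limit_large_sample = lower_limit(2200, 220, 100, k_95)

# 15 specimens: use the t value quoted in the example (14 dof)
T_95_14 = 1.761
limit_small_sample = lower_limit(2200, 220, 15, T_95_14)

print(f"n = 100: at least {limit_large_sample:.0f} kgf")
print(f"n = 15:  at least {limit_small_sample:.0f} kgf")
```

The smaller sample lowers the limit for two reasons: the t quantile is larger than the normal one, and s/sqrt(n) grows as n shrinks.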

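Confidence Interval of Variance

$$P\left(c_{\alpha/2,n-1} < \frac{(n-1)s^2}{\sigma^2} \le c_{1-\alpha/2,n-1}\right) = 1-\alpha$$

$$\langle\sigma^2\rangle_{1-\alpha} = \left(\frac{(n-1)s^2}{c_{1-\alpha/2,n-1}};\ \frac{(n-1)s^2}{c_{\alpha/2,n-1}}\right)$$

Here $c_{p,n-1}$ is the value of the chi-square variate with n-1 dof at cumulative probability p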

Example

25 storms; the sample variance of the measured runoff is 0.36 in²
Find the upper 95% confidence limit for the variance

$$(\sigma^2\rangle_{1-\alpha} = \frac{(n-1)s^2}{c_{\alpha,n-1}} = \frac{24(0.36)}{c_{0.05,24}} = 0.624 \text{ in}^2$$

So, we can say, with 95% confidence, that the upper bound of the variance of the runoff is 0.624 in² and the upper bound of the standard deviation is 0.79 in

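Script

s2=0.36;                 % sample variance, in^2 (named s2 to avoid shadowing the built-in var)
n=25;                    % number of storms
alpha=0.05;              % 1-alpha = 95% confidence level
c=chi2inv(alpha,n-1)     % lower-tail chi-square value, n-1 dof
ci=1/c*s2*(n-1)          % upper confidence limit on the variance
si=sqrt(ci)              % upper confidence limit on the standard deviation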

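Measurement Theory

Suppose we are measuring distances
d1, d2, …, dn are measured distances
Distance estimate is

$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$$

Standard error is

$$\sigma_{\bar{d}} = \frac{s}{\sqrt{n}}$$

◦ s = standard deviation of sample
◦ d is the expected value of the mean

$$\langle d\rangle_{1-\alpha} = \left(\bar{d} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}};\ \bar{d} + t_{1-\alpha/2,n-1}\frac{s}{\sqrt{n}}\right)$$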
