Statistical Inferences
Jake Blanchard
Spring 2010
Uncertainty Analysis for Engineers

Introduction
• Statistical inference = process of drawing conclusions from random data
• Conclusions of this process are "propositions," for example
  ◦ Estimates
  ◦ Confidence intervals
  ◦ Credible intervals
  ◦ Rejecting a hypothesis
  ◦ Clustering data points
• Part of this is the estimation of model parameters

Parameter Estimation
• Point Estimation
  ◦ Calculate single number from a set of observational data
• Interval Estimation
  ◦ Determine interval within which true parameter lies (along with confidence level)

Properties
• Bias = expected value of estimator does not necessarily equal parameter (for an unbiased estimator the two are equal)
• Consistency = estimator approaches parameter as n approaches infinity
• Efficiency = smaller variance of estimator implies higher efficiency
• Sufficient = utilizes all pertinent information in a sample

Point Estimation
• Start with data sample of size N
• Example: estimate fraction of voters who will vote for particular candidate (estimate is based on random sample of voters)
• Other examples: quality control, clinical trials, software engineering, orbit prediction
• Assume successive samples are statistically independent

Estimators
• Maximum likelihood
• Method of moments
• Minimum mean squared error
• Bayes estimators
• Cramer-Rao bound
• Maximum a posteriori
• Minimum variance unbiased estimator
• Best linear unbiased estimator
• etc.

Maximum Likelihood
• Suppose we have a random variable x with pdf f(x; θ)
• Take n samples of x
• What is the value of θ that will maximize the likelihood of obtaining these n observations?
• Let L = likelihood of observing this set of values for x
• Then maximize L with respect to θ

Maximum Likelihood

$$L(x_1, x_2, \ldots, x_n; \theta) = f(x_1;\theta)\, f(x_2;\theta) \cdots f(x_n;\theta)$$

$$\frac{\partial L(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta} = 0$$

$$\frac{\partial \log L(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta} = 0$$

Example
• Times between successive arrivals of vehicles at an intersection are 1.2, 3, 6.3, 10.1, 5.2, 2.4, and 7.2 seconds
• Assume exponential distribution
• Find MLE for the distribution parameter λ

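Solution

$$f(t) = \frac{1}{\lambda} e^{-t/\lambda}$$

$$L = \prod_{i=1}^{7} \frac{1}{\lambda} \exp\left(-\frac{t_i}{\lambda}\right) = \lambda^{-7} \exp\left(-\frac{1}{\lambda}\sum_{i=1}^{7} t_i\right)$$

$$\log L = -7\log(\lambda) - \frac{1}{\lambda}\sum_{i=1}^{7} t_i = -7\log(\lambda) - \frac{35.4}{\lambda}$$

$$\frac{\partial \log L}{\partial \lambda} = -\frac{7}{\lambda} + \frac{35.4}{\lambda^2} = 0$$

$$\hat{\lambda} = \frac{35.4}{7} = 5.06 \text{ s}$$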
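The arrival-time example can be checked numerically. The following is a minimal Python sketch, assuming the pdf f(t) = (1/λ) exp(-t/λ); the names `times`, `log_likelihood`, and `lam_hat` are chosen here for illustration and do not come from the slides. The seven listed times sum to 35.4 s, so the script returns about 5.06 s.

```python
import math

# Interarrival times from the example, in seconds.
times = [1.2, 3, 6.3, 10.1, 5.2, 2.4, 7.2]

def log_likelihood(lam, data):
    """Log-likelihood for the exponential pdf f(t) = (1/lam) * exp(-t/lam)."""
    return -len(data) * math.log(lam) - sum(data) / lam

# Setting d(log L)/d(lam) = 0 gives lam = sum(t_i) / n, i.e. the sample mean.
lam_hat = sum(times) / len(times)

# Sanity check: the log-likelihood should drop on either side of lam_hat.
ll_at_hat = log_likelihood(lam_hat, times)
ll_below = log_likelihood(0.9 * lam_hat, times)
ll_above = log_likelihood(1.1 * lam_hat, times)

print(f"MLE of mean interarrival time: {lam_hat:.2f} s")
print(f"log L at MLE: {ll_at_hat:.3f}; at 0.9x: {ll_below:.3f}; at 1.1x: {ll_above:.3f}")
```

Comparing the log-likelihood at nearby values is only a spot check that the stationary point is a maximum, not a proof.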

2-Parameter Example
• Measure cycles to failure of saturated sand (25, 20, 28, 33, 26 cycles)
• Assume lognormal distribution

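Solution

Here λ and ζ denote the mean and standard deviation of ln(x).

$$f(x) = \frac{1}{\sqrt{2\pi}\,\zeta x} \exp\left[-\frac{1}{2}\left(\frac{\ln(x)-\lambda}{\zeta}\right)^2\right]$$

$$L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\zeta x_i} \exp\left[-\frac{1}{2}\left(\frac{\ln(x_i)-\lambda}{\zeta}\right)^2\right] = \left(\frac{1}{\sqrt{2\pi}\,\zeta}\right)^n \left(\prod_{i=1}^{n}\frac{1}{x_i}\right) \exp\left[-\frac{1}{2\zeta^2}\sum_{i=1}^{n}\left(\ln(x_i)-\lambda\right)^2\right]$$

$$\ln(L) = -n\ln\left(\sqrt{2\pi}\right) - n\ln(\zeta) - \sum_{i=1}^{n}\ln(x_i) - \frac{1}{2\zeta^2}\sum_{i=1}^{n}\left(\ln(x_i)-\lambda\right)^2$$

$$\frac{\partial \ln(L)}{\partial \lambda} = \frac{1}{\zeta^2}\sum_{i=1}^{n}\left(\ln(x_i)-\lambda\right) = 0$$

$$\frac{\partial \ln(L)}{\partial \zeta} = -\frac{n}{\zeta} + \frac{1}{\zeta^3}\sum_{i=1}^{n}\left(\ln(x_i)-\lambda\right)^2 = 0$$

$$\lambda = \frac{1}{n}\sum_{i=1}^{n}\ln(x_i) = 3.26$$

$$\zeta^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\ln(x_i)-\lambda\right)^2 = 0.027$$

$$\zeta = 0.164$$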
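The closed-form lognormal result can be reproduced in a few lines. This is a minimal Python sketch, assuming the two parameters are the mean and the divide-by-n standard deviation of ln(x); `lam_hat` and `zeta_hat` are names chosen here for illustration. Carrying full precision gives a standard deviation of about 0.163; the 0.164 quoted in the solution is the square root of the rounded 0.027.

```python
import math

# Cycles to failure from the example.
cycles = [25, 20, 28, 33, 26]
n = len(cycles)
logs = [math.log(x) for x in cycles]

# MLE of the lognormal parameters: mean of ln(x) and the
# divide-by-n standard deviation of ln(x).
lam_hat = sum(logs) / n
zeta_sq_hat = sum((v - lam_hat) ** 2 for v in logs) / n
zeta_hat = math.sqrt(zeta_sq_hat)

print(f"lambda = {lam_hat:.2f}")
print(f"zeta^2 = {zeta_sq_hat:.3f}")
print(f"zeta   = {zeta_hat:.3f}")
```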

Method of Moments
• Use sample moments (mean, variance, etc.) to set distribution parameters

Example
• Times between successive arrivals of vehicles at an intersection are 1.2, 3, 6.3, 10.1, 5.2, 2.4, and 7.2 seconds
• Assume exponential distribution
• Sample mean = 5.06 s, which is the method-of-moments estimate of the distribution mean

2-Parameter Example
• Measure cycles to failure of saturated sand (25, 20, 28, 33, 26 cycles)
• Assume lognormal distribution
• Mean = 26.4
• Standard Deviation = 4.72
• Solve for λ and ζ (mean and standard deviation of ln(x))
  ◦ λ = 3.26
  ◦ ζ = 0.177
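The step from the sample moments to the two lognormal parameters is not shown on the slide. A minimal Python sketch follows, assuming the standard lognormal moment relations mean = exp(λ + ζ²/2) and variance = mean²·(exp(ζ²) − 1); the variable names are illustrative.

```python
import math
import statistics

# Cycles to failure from the example.
cycles = [25, 20, 28, 33, 26]

m = statistics.mean(cycles)     # sample mean
s = statistics.stdev(cycles)    # sample standard deviation (n - 1 divisor)

# Invert the lognormal moment relations:
#   mean = exp(lam + zeta**2 / 2)
#   variance = mean**2 * (exp(zeta**2) - 1)
zeta_mom = math.sqrt(math.log(1 + (s / m) ** 2))
lam_mom = math.log(m) - zeta_mom ** 2 / 2

print(f"mean = {m:.1f}, standard deviation = {s:.2f}")
print(f"lambda = {lam_mom:.2f}, zeta = {zeta_mom:.3f}")
```

With only five data points, the moment-based and likelihood-based estimates of the spread parameter differ noticeably, mainly because one uses an n − 1 divisor and the other uses n.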

Minimum Mean Square Error
• Choose parameters to minimize mean squared error between measured data and continuous distribution
• Essentially a curve fit

Approach
• Excel
  ◦ Guess parameters
  ◦ Calculate sum of squares of errors
  ◦ Vary guessed parameters to minimize error (use the Solver)
• Matlab
  ◦ Use fminsearch function
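The guess, score, and adjust loop described above can be sketched in plain Python. This is an illustration only: it swaps in a simple compass (pattern) search in place of the Excel Solver or fminsearch, and it fits a normal pdf to hypothetical histogram points generated from a known curve (mean 3980, standard deviation 150) so the result can be checked. All function and variable names are assumptions made for this sketch.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal probability density function."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def sum_sq_error(params, xs, ys):
    """Sum of squared differences between the curve and the data points."""
    mu, sigma = params
    if sigma <= 0:
        return float("inf")   # reject non-physical guesses
    return sum((normal_pdf(x, mu, sigma) - y) ** 2 for x, y in zip(xs, ys))

def compass_search(cost, start, step=50.0, tol=1e-4, max_iter=100000):
    """Derivative-free minimizer: try +/- step along each parameter and
    accept any move that lowers the cost; if none does, halve the step."""
    best = list(start)
    best_cost = cost(best)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_cost = cost(trial)
                if trial_cost < best_cost:
                    best, best_cost, improved = trial, trial_cost, True
        if not improved:
            step /= 2
    return best, best_cost

# Hypothetical "normalized histogram": bin centers and densities taken
# from a known normal curve, so the correct answer is known in advance.
true_mu, true_sigma = 3980.0, 150.0
centers = [3500 + 100 * k for k in range(11)]
densities = [normal_pdf(x, true_mu, true_sigma) for x in centers]

guess = [3900.0, 200.0]
fit, sse = compass_search(lambda p: sum_sq_error(p, centers, densities), guess)
print(f"fitted mu = {fit[0]:.1f}, sigma = {fit[1]:.1f}, SSE = {sse:.3e}")
```

A compass search is slow compared with the simplex method behind fminsearch, but it makes the "vary guessed parameters to minimize error" step explicit.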

Example
• Solar insolation data
  ◦ Gather data
  ◦ Form histogram
  ◦ Normalize histogram by number of samples and width of bins

Scatter Plot and Histogram
[Figure: scatter plot of the insolation samples and histogram of their values, spanning roughly 3500 to 4500]

Normal and Weibull Fits
[Figure: normalized histogram, spanning roughly 3500 to 4500, with the fitted normal and Weibull curves]
• Mean = 3980 (fit)
• Mean = 3915 (data)

Excel Screen Shot
[Two spreadsheet screenshots not reproduced]

Solver Set Up
[Screenshot not reproduced]

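Matlab Script
    % Fit a normal pdf to a normalized histogram by least squares.
    % Requires the functions curve and sumoferrs defined separately.
    y=xlsread('matlabfit.xlsx','normal');
    [s,t]=hist(y,8);                 % s = bin counts, t = bin centers
    binwidth=t(2)-t(1);              % spacing of centers equals the bin width
    s=s/binwidth/numel(y);           % normalize counts to a density
    numpts=numel(t);
    zin(1)=mean(t); zin(2)=std(t);   % initial guess for mean and std. dev.
    sumoferrs(zin,t,s)               % error at the initial guess
    zout=fminsearch(@(z) sumoferrs(z,t,s), zin)
    sumoferrs(zout,t,s)              % error at the fitted parameters
    xplot=t(1):(t(end)-t(1))/(10*numel(t)):t(end);
    yplot=curve(xplot,zout);
    plot(t,s,'+',xplot,yplot)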

Matlab Script
    function f=curve(x,z)
    % Normal pdf with mean z(1) and standard deviation z(2)
    mu=z(1);
    sig=z(2);
    f=normpdf(x,mu,sig);

    function f=sumoferrs(z, x, y)
    % Sum of squared errors between the curve and the data
    f=sum((curve(x,z)-y).^2);

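Sampling Distributions
• How do we assess inaccuracy in using sample mean to estimate population mean?

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$E[\bar{x}] = E\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \frac{1}{n}(n\mu) = \mu$$

$$\mathrm{Var}(\bar{x}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(x_i) = \frac{1}{n^2}\left(n\sigma^2\right) = \frac{\sigma^2}{n}$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$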

Conclusions
• Expected value of mean is equal to population mean
• Mean of sample is unbiased estimator of mean of population
• Variance of sample mean is sampling error
• By CLT, sample mean is Gaussian for large n
• Mean of x is N(μ, σ/√n)
• Estimator for μ improves as n increases
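These conclusions can be checked by simulation. The sketch below is a minimal Python example under assumed values: a hypothetical normal population with mean 10 and standard deviation 2, samples of size 25, and a fixed random seed. It compares the mean and variance of many sample means with the population mean and with σ²/n.

```python
import random

random.seed(0)            # fixed seed so the run is repeatable

mu, sigma = 10.0, 2.0     # hypothetical population parameters
n = 25                    # size of each sample
trials = 20000            # number of repeated samples

# Draw many samples and record the mean of each one.
sample_means = [
    sum(random.gauss(mu, sigma) for _ in range(n)) / n
    for _ in range(trials)
]

mean_of_means = sum(sample_means) / trials
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / trials

print(f"mean of sample means     = {mean_of_means:.3f} (theory {mu})")
print(f"variance of sample means = {var_of_means:.4f} (theory {sigma ** 2 / n:.4f})")
```

Increasing n in the sketch shrinks the variance of the sample means in proportion to 1/n.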

Sample Mean with Unknown σ
• In previous derivation, σ is the population standard deviation
• This is generally not known
• All we have is the sample variance (s²)
• If sample size is small, distribution will not be Gaussian
• We can use a "Student's t-distribution"

$$f_T(t) = \frac{\Gamma\left[(f+1)/2\right]}{\sqrt{\pi f}\;\Gamma(f/2)}\left(1 + \frac{t^2}{f}\right)^{-\frac{1}{2}(f+1)}$$

f = number of degrees of freedom

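Distribution of Sample Variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

$$E[s^2] = \frac{1}{n-1}E\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]$$

$$\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})^2 &= \sum_{i=1}^{n}\left[(x_i-\mu)-(\bar{x}-\mu)\right]^2 \\
&= \sum_{i=1}^{n}(x_i-\mu)^2 - 2(\bar{x}-\mu)\sum_{i=1}^{n}(x_i-\mu) + n(\bar{x}-\mu)^2 \\
&= \sum_{i=1}^{n}(x_i-\mu)^2 - 2n(\bar{x}-\mu)^2 + n(\bar{x}-\mu)^2 \\
&= \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2
\end{aligned}$$

$$E[s^2] = \frac{1}{n-1}\left[\sum_{i=1}^{n}E(x_i-\mu)^2 - nE(\bar{x}-\mu)^2\right] = \frac{1}{n-1}\left[n\sigma^2 - n\frac{\sigma^2}{n}\right] = \sigma^2$$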

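Conclusions
• Sample variance is unbiased estimator of population variance

$$\mathrm{Var}(s^2) = \frac{\sigma^4}{n}\left(\frac{\mu_4}{\sigma^4} - \frac{n-3}{n-1}\right), \qquad \mu_4 = E\left[(x-\mu)^4\right]$$

• For normal variates

$$(n-1)s^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2$$

$$\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^2 - \left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\right)^2$$

• This quantity follows a Chi-Square Distribution with n-1 dof
• This approaches normal distribution for large n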

Testing Hypotheses
• Used to make decisions about population based on sample
• Steps
  ◦ Define null and alternative hypotheses
  ◦ Identify test statistic
  ◦ Estimate test statistic, based on sample
  ◦ Specify level of significance
    Type I error: rejecting null hypothesis when it is true
    Type II error: accepting null hypothesis when it is false
  ◦ Define region of rejection (one tail or two?)

Level of Significance
• Type I error
  ◦ Level of significance (α)
  ◦ Typically 1-5%
• Type II error (β) is seldom used

Example
• We need yield strength of rebar to be at least 38 psi
• We order sample of 25 rebars
• Sample mean from 25 tests is 37.5 psi
• Standard deviation of rebar strength σ = 3 psi
• Use one-sided test
• Hypotheses: null: μ = 38; alt.: μ < 38

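Solution

$$Z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} = \frac{37.5-38}{3/\sqrt{25}} = -0.833$$

$$z_\alpha = \Phi^{-1}(\alpha) = \Phi^{-1}(0.05) = \mathrm{norminv}(0.05,0,1) = -1.64$$

Since Z = -0.833 is greater than z_α = -1.64, the test statistic lies outside the region of rejection.

So we cannot reject the null hypothesis and the supplier is considered acceptable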
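The same one-sided test can be written out in Python. This is a minimal sketch using `statistics.NormalDist` from the standard library in the role of norminv; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x_bar = 37.5    # sample mean, psi
mu_0 = 38.0     # mean under the null hypothesis, psi
sigma = 3.0     # known population standard deviation, psi
n = 25          # sample size
alpha = 0.05    # level of significance

# Test statistic and lower-tail critical value.
z_stat = (x_bar - mu_0) / (sigma / math.sqrt(n))
z_crit = NormalDist().inv_cdf(alpha)   # same role as norminv(0.05,0,1)

# Lower-tail test: reject the null only if the statistic falls below z_crit.
reject_null = z_stat < z_crit

print(f"Z = {z_stat:.3f}, critical value = {z_crit:.2f}, reject null = {reject_null}")
```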

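Variation of This Example
• Suppose standard deviation is not known
• Use Student's t-distribution
• Sample stand. dev. = 3.5 psi

$$\bar{x} = 37.5 \text{ psi}, \qquad s = 3.5 \text{ psi}$$

$$T = \frac{\bar{x}-\mu}{s/\sqrt{n}} = \frac{37.5-38}{3.5/\sqrt{25}} = -0.714$$

$$f = \text{dof} = n-1 = 25-1 = 24$$

$$t_\alpha = \mathrm{tinv}(0.05,24) = -1.711$$

Since T = -0.714 is greater than t_α = -1.711, the test statistic lies outside the region of rejection.

So we cannot reject the null hypothesis and the supplier is considered acceptable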
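Python's standard library has no counterpart to tinv, so the sketch below builds one for illustration: it integrates the Student's t density with Simpson's rule and inverts the CDF by bisection. The helper names `t_pdf`, `t_cdf`, and `t_inv` are assumptions of this sketch, not library functions, and a statistics package would normally be used instead.

```python
import math

def t_pdf(t, f):
    """Student's t density with f degrees of freedom."""
    coeff = math.gamma((f + 1) / 2) / (math.sqrt(math.pi * f) * math.gamma(f / 2))
    return coeff * (1 + t * t / f) ** (-(f + 1) / 2)

def t_cdf(t, f, steps=2000):
    """CDF by Simpson's rule on [0, |t|], using symmetry about zero."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, f) + t_pdf(a, f)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, f)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_inv(p, f, lo=-50.0, hi=50.0):
    """Quantile by bisection on the CDF (plays the role of tinv)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, f) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x_bar, s, n = 37.5, 3.5, 25     # sample mean (psi), sample std. dev. (psi), size
mu_0, alpha = 38.0, 0.05        # null-hypothesis mean (psi), significance level

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
t_crit = t_inv(alpha, n - 1)    # lower-tail critical value, 24 dof
reject_null = t_stat < t_crit

print(f"T = {t_stat:.3f}, critical value = {t_crit:.3f}, reject null = {reject_null}")
```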

Third Variation
• Sample size increased to 41
• Sample mean = 37.6 psi
• Sample standard deviation = 3.75 psi
• Null: variance = 9
• Alternative: variance > 9
• Use Chi-Square distribution

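Solution

$$C = \frac{(n-1)s^2}{\sigma^2} = \frac{(41-1)(3.75)^2}{9} = 62.5$$

$$\alpha = 0.025, \qquad f = n-1 = 40$$

$$c_{1-\alpha,f} = c_{0.975,40} = \mathrm{chi2inv}(0.975,40) = 59.34$$

Since C = 62.5 exceeds c_{0.975,40} = 59.34, the test statistic lies in the region of rejection.

So we reject the null hypothesis and the supplier is not acceptable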
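The variance test can also be phrased as a p-value comparison. The sketch below is a minimal Python illustration in which the chi-square CDF is computed from the series for the regularized lower incomplete gamma function; `chi2_cdf` is a helper written for this sketch, not a library call. Rejecting when the p-value is below α is equivalent to comparing C with the critical value.

```python
import math

def chi2_cdf(x, dof):
    """Chi-square CDF via the series for the regularized lower
    incomplete gamma function P(dof/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = dof / 2, x / 2
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

n = 41              # sample size
s = 3.75            # sample standard deviation, psi
sigma0_sq = 9.0     # variance under the null hypothesis, psi^2
alpha = 0.025       # level of significance used in the solution

c_stat = (n - 1) * s ** 2 / sigma0_sq
p_value = 1 - chi2_cdf(c_stat, n - 1)   # upper-tail probability
reject_null = p_value < alpha

print(f"C = {c_stat:.1f}, p-value = {p_value:.4f}, reject null = {reject_null}")
```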

Confidence Intervals
• In addition to mean, standard deviation, etc., confidence intervals can help us characterize populations
• For example, the mean gives us a best estimate of the expected value of the population, but confidence intervals can help indicate the accuracy of the mean
• Confidence interval is defined as the range within which a parameter will lie with a prescribed probability

CI of the Mean
• First, we'll assume the variance is known
• The central limit theorem states that the pdf of the mean of n individual observations from any distribution with finite mean and variance approaches a normal distribution as n approaches infinity

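CI of the Mean

$$K = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$$

$$P\left(K_{\alpha/2} < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le K_{1-\alpha/2}\right) = 1-\alpha$$

$$P\left(\bar{x} + K_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu \le \bar{x} + K_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$

$$CI = \langle\mu\rangle_{1-\alpha} = \left(\bar{x} + K_{\alpha/2}\frac{\sigma}{\sqrt{n}}\;;\; \bar{x} + K_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

$$K_{\alpha/2} = -\Phi^{-1}\left(1-\frac{\alpha}{2}\right), \qquad K_{1-\alpha/2} = \Phi^{-1}\left(1-\frac{\alpha}{2}\right)$$

Φ is the CDF of the standard normal variate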

Example
• Measure strength of rebar
• 25 samples
• Mean = 37.5 psi
• Standard deviation = 3 psi
• Find 95% confidence interval for mean

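Solution

$$1-\alpha = 0.95, \qquad \frac{\alpha}{2} = 0.025, \qquad 1-\frac{\alpha}{2} = 0.975$$

$$K_{0.025} = -\Phi^{-1}(0.975) = -1.96$$

$$K_{0.975} = \Phi^{-1}(0.975) = 1.96$$

$$\langle\mu\rangle_{0.95} = \left(37.5 - 1.96\frac{3}{\sqrt{25}}\;;\; 37.5 + 1.96\frac{3}{\sqrt{25}}\right) = (36.3\;;\;38.7) \text{ psi}$$

So the mean of the strength falls between 36.3 and 38.7 psi with a 95% confidence level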

The Script
    mu=37.5                    % sample mean (psi)
    sig=3                      % known standard deviation (psi)
    n=25                       % sample size
    alpha=0.05
    ka=-norminv(1-alpha/2)     % K at alpha/2 (negative)
    k1ma=-ka                   % K at 1-alpha/2 (positive)
    cil=mu+ka*sig/sqrt(n)      % lower limit
    ciu=mu-ka*sig/sqrt(n)      % upper limit
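What the 95% refers to can be illustrated by simulation: if the sampling is repeated many times, about 95% of the computed intervals should contain the true mean. The Python sketch below assumes a hypothetical normal population with true mean 38 psi and standard deviation 3 psi, and a fixed random seed; none of these values come from the slides.

```python
import math
import random
from statistics import NormalDist

random.seed(1)     # fixed seed so the run is repeatable

true_mu, sigma = 38.0, 3.0   # hypothetical population (psi)
n, alpha = 25, 0.05

k = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
half_width = k * sigma / math.sqrt(n)     # half-width of each interval

trials = 5000
hits = 0
for _ in range(trials):
    # Draw one sample of size n and build its confidence interval.
    x_bar = sum(random.gauss(true_mu, sigma) for _ in range(n)) / n
    if x_bar - half_width <= true_mu <= x_bar + half_width:
        hits += 1

coverage = hits / trials
print(f"half-width = {half_width:.3f} psi")
print(f"fraction of intervals containing the true mean = {coverage:.3f}")
```

Each individual interval either contains the true mean or does not; the confidence level describes the long-run fraction that do.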

Variance Not Known
• What if the variance of the population (σ²) is not known?
• That is, we only know variance of sample.
• Let s = standard deviation of sample
• We can show that

$$\frac{\bar{x}-\mu}{s/\sqrt{n}}$$

does not conform to a normal distribution, especially for small n

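Variance Not Known
• We can show that this quantity follows a Student's t-distribution with n-1 degrees of freedom (f)

$$f_T(t) = \frac{\Gamma\left[(f+1)/2\right]}{\sqrt{\pi f}\;\Gamma(f/2)}\left(1 + \frac{t^2}{f}\right)^{-\frac{1}{2}(f+1)}$$

$$P\left(t_{\alpha/2,\,n-1} < \frac{\bar{x}-\mu}{s/\sqrt{n}} \le t_{1-\alpha/2,\,n-1}\right) = 1-\alpha$$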

Example
• Measure strength of rebar
• 25 samples
• Mean = 37.5 psi
• s = 3.5 psi
• Find 95% confidence interval for mean

Script
    xbar=37.5;                   % sample mean (psi)
    s=3.5;                       % sample standard deviation (psi)
    n=25;
    alpha=0.05;
    ka=-tinv(1-alpha/2,n-1);     % t at alpha/2 (negative)
    kb=-tinv(alpha/2,n-1);       % t at 1-alpha/2 (positive)
    cil=xbar+ka*s/sqrt(n)        % lower limit
    ciu=xbar+kb*s/sqrt(n)        % upper limit

Result is 36.06, 38.94

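One-Sided Confidence Limit
• Sometimes we only care about the upper or lower bounds
• Lower

$$\langle\mu)_{1-\alpha} = \bar{x} - K_{1-\alpha}\frac{\sigma}{\sqrt{n}} \quad (\sigma \text{ known})$$

$$\langle\mu)_{1-\alpha} = \bar{x} - t_{1-\alpha,\,n-1}\frac{s}{\sqrt{n}} \quad (\sigma \text{ not known})$$

• Upper

$$(\mu\rangle_{1-\alpha} = \bar{x} + K_{1-\alpha}\frac{\sigma}{\sqrt{n}} \quad (\sigma \text{ known})$$

$$(\mu\rangle_{1-\alpha} = \bar{x} + t_{1-\alpha,\,n-1}\frac{s}{\sqrt{n}} \quad (\sigma \text{ not known})$$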

Example
• 100 steel specimens, measure strength
• Mean = 2200 kgf; s = 220 kgf
• Specify 95% confidence limit of mean
• Assume σ = s = 220 kgf
• 1-α = 0.95; α = 0.05

$$K_{0.95} = \Phi^{-1}(0.95) = 1.65$$

$$\langle\mu)_{0.95} = 2200 - 1.65\frac{220}{\sqrt{100}} = 2164 \text{ kgf}$$

Manufacturer has 95% confidence that mean yield strength is at least 2164 kgf
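The lower limit can be recomputed in Python as a quick check. This is a minimal sketch using `statistics.NormalDist` for the normal quantile; because it uses the unrounded quantile (about 1.645 rather than 1.65) the result is 2163.8 kgf, which rounds to the same 2164 kgf.

```python
import math
from statistics import NormalDist

x_bar = 2200.0   # sample mean, kgf
s = 220.0        # sample standard deviation, used in place of sigma, kgf
n = 100          # number of specimens
alpha = 0.05

# One-sided limit: all of alpha goes into a single tail.
k = NormalDist().inv_cdf(1 - alpha)
lower_limit = x_bar - k * s / math.sqrt(n)

print(f"K = {k:.3f}, lower 95% confidence limit = {lower_limit:.1f} kgf")
```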

Example
• Now only 15 steel specimens
• Mean = 2200 kgf; s = 220 kgf
• Specify 95% confidence limit of mean

$$t_{0.95,14} = 1.761$$

$$\langle\mu)_{0.95} = 2200 - 1.761\frac{220}{\sqrt{15}} = 2100 \text{ kgf}$$

Manufacturer has 95% confidence that mean yield strength is at least 2100 kgf

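Confidence Interval of Variance

$$P\left(c_{\alpha/2,\,n-1} < \frac{(n-1)s^2}{\sigma^2} \le c_{1-\alpha/2,\,n-1}\right) = 1-\alpha$$

$$\langle\sigma^2\rangle_{1-\alpha} = \left(\frac{(n-1)s^2}{c_{1-\alpha/2,\,n-1}}\;;\; \frac{(n-1)s^2}{c_{\alpha/2,\,n-1}}\right)$$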

Example
• 25 storms, sample variance for measured runoff is 0.36 in²
• Find upper 95% confidence limit for variance

$$(\sigma^2\rangle_{1-\alpha} = \frac{(n-1)s^2}{c_{\alpha,\,n-1}} = 0.624 \text{ in}^2$$

So, we can say, with 95% confidence, that the upper bound of the variance of the runoff is 0.624 in² and the upper bound of the standard deviation is 0.79 in

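Script
    s2=0.36                 % sample variance (in^2); named s2 to avoid shadowing MATLAB's var function
    n=25                    % number of storms
    alpha=0.05
    c=chi2inv(alpha,n-1)    % lower-tail chi-square value with n-1 dof
    ci=(n-1)*s2/c           % upper confidence limit of the variance
    si=sqrt(ci)             % upper confidence limit of the standard deviation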

Measurement Theory
• Suppose we are measuring distances
• d1, d2, …, dn are measured distances
• Distance estimate is

$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$$

• Standard error is

$$\sigma_{\bar{d}} = \frac{s}{\sqrt{n}}$$

  ◦ s = standard deviation of sample
  ◦ $\bar{d}$ is the expected value of the mean

$$\langle d\rangle_{1-\alpha} = \left(\bar{d} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\;;\; \bar{d} + t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$$
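A short worked case may help. The Python sketch below uses five hypothetical distance readings (not from the slides) and computes the distance estimate, the standard error, and a 95% confidence interval. The t value 2.776 for 4 degrees of freedom is taken from a standard t table rather than computed.

```python
import math
import statistics

# Hypothetical repeated measurements of one distance, in meters.
d = [100.1, 99.9, 100.0, 100.2, 99.8]
n = len(d)

d_bar = statistics.mean(d)       # distance estimate
s = statistics.stdev(d)          # sample standard deviation (n - 1 divisor)
std_error = s / math.sqrt(n)     # standard error of the mean

# Two-sided 95% interval; 2.776 is the table value of t for
# probability 0.975 with n - 1 = 4 degrees of freedom.
t_value = 2.776
ci_lower = d_bar - t_value * std_error
ci_upper = d_bar + t_value * std_error

print(f"estimate = {d_bar:.3f} m, standard error = {std_error:.4f} m")
print(f"95% confidence interval = ({ci_lower:.3f}, {ci_upper:.3f}) m")
```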