
Final Exam Review

CE 3710, December 11, 2015

Monday, December 14, 10:15 AM – 12:15 PM, ChemSci 101

Open book and open notes

Exam will be cumulative

**EMPHASIS ON MATERIAL COVERED SINCE EXAM II

CE 3710 Final Exam Review

• Key topics from Exam I:

– Be able to compute probabilities of events using intersections, unions, complements, etc., as well as knowledge of independence/conditional probability

– Be able to compute percentiles, probabilities, mean and variance of a random variable X for given pdf or cdf

– Be able to compute mean and variance for linear functions of random variables

– Be able to apply Normal distribution

• Topics NOT on Final Exam:

– No counting

– No Bayes' Theorem or Law of Total Probability

– No Binomial distribution

• Key topics from Exam II:

– Be able to state Central Limit Theorem and know when it applies

– Be able to calculate moments, parameters and percentiles of a lognormal distribution

– Be able to derive method of moments estimators

– Understand how to formulate a statistical hypothesis, and be able to articulate what the Type I and Type II errors represent

– Be able to perform one‐sample hypothesis tests

– Be able to construct probability plots to assess fit of Normal/Lognormal distributions

– Be able to construct confidence intervals for population mean and variance

• Topics NOT on Final Exam: No Gumbel or Weibull distributions

See Problem Set #9 and Exam II Problem 3

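For X1, X2, …, Xn ~ iid(μ, σ²):

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

$$\mu_{\bar{X}} = E[X] = \mu, \qquad \sigma_{\bar{X}}^2 = \frac{\mathrm{Var}[X]}{n} = \frac{\sigma^2}{n}, \qquad E[S^2] = \sigma^2$$

For X1, X2, …, Xn ~ iid N(μ, σ²): $\bar{X} \sim N(\mu, \sigma^2/n)$ for any n

For X1, X2, …, Xn ~ iid(μ, σ²): $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately, for n ≥ 30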

Hypothesis Testing

H3. Students should be able to calculate the test statistic, rejection region for a specified α, and make a decision for one‐ and two‐sample z‐ and t‐tests. Students should also recognize when a paired analysis is appropriate and be able to perform the associated hypothesis test.

Two Sample Hypothesis Tests (Packet #22)

X ~ N[µX, σX²], {Xi} are independent, i = 1, ..., nX

Y ~ N[µY, σY²], {Yj} are independent, j = 1, ..., nY

X and Y are independent

Test for difference in means: µX – µY

Ho: µX = µY (µX ‐ µY = 0)

‐‐versus‐‐

Ha: µX ≠ µY (µX ‐ µY ≠ 0) Reject if T ≥ tα/2,ν or T ≤ ‐tα/2,ν

Ha: µX < µY (µX ‐ µY < 0) Reject if T ≤ ‐tα,ν

Ha: µX > µY (µX ‐ µY > 0) Reject if T ≥ tα,ν

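σX and σY known, X ~ N, Y ~ N, any n:

$$Z_{test} = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}}$$

σX and σY unknown, but nX and nY LARGE:

$$Z_{test} = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{S_X^2}{n_X} + \dfrac{S_Y^2}{n_Y}}}$$

σX and σY unknown, X ~ N, Y ~ N, small samples (< 30):

$$T_\nu = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{S_X^2}{n_X} + \dfrac{S_Y^2}{n_Y}}}, \qquad \nu = \frac{\left(\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}\right)^2}{\dfrac{s_X^4 / n_X^2}{n_X - 1} + \dfrac{s_Y^4 / n_Y^2}{n_Y - 1}}$$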
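As a rough illustration of the small-sample case, the sketch below computes the test statistic T and the degrees of freedom ν from two samples. The function name `welch_t` and the data are made up for this example. The critical value tα,ν would still come from a t table, since the Python standard library does not provide one.

```python
import math
from statistics import mean, variance

def welch_t(x, y, delta0=0.0):
    """Return (T, nu) for Ho: mu_X - mu_Y = delta0, variances unknown.

    Hypothetical helper for illustration; not part of the course materials.
    """
    nx, ny = len(x), len(y)
    vx = variance(x) / nx          # s_X^2 / n_X (sample variance, n-1 divisor)
    vy = variance(y) / ny          # s_Y^2 / n_Y
    t_stat = (mean(x) - mean(y) - delta0) / math.sqrt(vx + vy)
    # Degrees of freedom: (vx + vy)^2 / [vx^2/(nx-1) + vy^2/(ny-1)]
    nu = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t_stat, nu

# Made-up measurements, for illustration only
x = [10.1, 9.8, 10.4, 10.0, 9.9]
y = [9.5, 9.7, 9.4, 9.9, 9.6, 9.3]
t_stat, nu = welch_t(x, y)
print(f"T = {t_stat:.3f}, nu = {nu:.2f}")
```

In practice ν is usually rounded down to an integer before looking up tα,ν or tα/2,ν.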

Paired Tests

When two sets of observations are generated in pairs (Xi, Yi):

Let Di = Xi – Yi, so that µD = µX – µY. This reduces two samples to one.

Null hypothesis Ho: µX = µY becomes Ho: µD = 0

Possible alternative hypotheses:

Ha: µD < 0 (lower tail)

Ha: µD > 0 (upper tail)

Ha: µD ≠ 0 (two tailed)

Use $\bar{D}$, sD and nD to compute test statistics: Ztest or Tn‐1

Additional computations are equivalent to procedures for one‐sample hypothesis test.
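A minimal sketch of the paired analysis, assuming the differences are roughly Normal and the sample is small enough that the Tn‐1 statistic is used. The helper name `paired_t` and the data are invented for illustration.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Return (T, degrees of freedom) for Ho: mu_D = 0 using D_i = X_i - Y_i."""
    d = [xi - yi for xi, yi in zip(x, y)]   # reduce two samples to one
    n = len(d)
    t_stat = mean(d) / (stdev(d) / math.sqrt(n))  # D-bar / (s_D / sqrt(n_D))
    return t_stat, n - 1

# Made-up paired observations, for illustration only
x = [5, 7, 9, 11]
y = [4, 5, 8, 8]
t_stat, df = paired_t(x, y)
print(f"T = {t_stat:.3f} with {df} degrees of freedom")
```

From here the decision rule is the same as for a one‐sample t‐test on the Di values.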

Goodness‐of‐Fit Analysis (Packet #27)

Be able to compute the probability plot correlation test statistic, and perform a hypothesis test (Ryan Joiner Test of Normality) of whether the data set (or some transformation thereof) was drawn from a normal distribution.

[Figure: probability plots of x(i) versus zpi and of ln[x(i)] versus zpi]

See Problem 7 (Hmwk#11)

Ho: Population distribution is Normal (ρ = 1)

Ha: Population distribution is not Normal (ρ < 1)

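Test Statistic:

$$R_{xz} = \frac{\sum_{i=1}^{n} z_{p_i}\,(x_{(i)} - \bar{x})}{\sqrt{\sum_{i=1}^{n} z_{p_i}^2 \; \sum_{i=1}^{n} (x_{(i)} - \bar{x})^2}}$$

where

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ = average value of the observations

$\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_{p_i} = 0$ = average of the standard normal percentiles

Reject Ho if r ≤ ρc (ρc is obtained from the table below as a function of n and α)

Or consider Rez for residuals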

Critical Values (ρc) for the Ryan Joiner Test of Normality

            Significance Level (α)
   n      0.10       0.05       0.01
   5      0.9033     0.8804     0.8320
  10      0.9347     0.9180     0.8804
  15      0.9506     0.9383     0.9110
  20      0.9600     0.9503     0.9290
  25      0.9662     0.9582     0.9408
  30      0.9707     0.9639     0.9490
  40      0.9767     0.9715     0.9597
  50      0.9807     0.9764     0.9664
  60      0.9835     0.9799     0.9710
  75      0.9865     0.9835     0.9757
 100      0.9893     0.9870     0.9812
 300      0.99602    0.99525    0.99354
1000      0.99854    0.99824    0.99755
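The probability plot correlation statistic can be sketched as follows. This is an illustration under one stated assumption: the plotting positions use Blom's formula, p_i = (i − 3/8)/(n + 1/4), which may differ from the formula used in the course packet. The function name `ppcc` and the data are hypothetical. The critical value is the n = 10, α = 0.05 entry from the table above.

```python
import math
from statistics import NormalDist, mean

def ppcc(data, a=0.375):
    """Probability plot correlation coefficient R_xz against the standard Normal.

    Assumes plotting positions p_i = (i - a) / (n + 1 - 2a); a = 3/8 is Blom's
    formula (an assumption; the course packet may use another choice).
    """
    x = sorted(data)                       # order statistics x_(i)
    n = len(x)
    z = [NormalDist().inv_cdf((i - a) / (n + 1 - 2 * a)) for i in range(1, n + 1)]
    xbar = mean(x)
    num = sum(zi * (xi - xbar) for zi, xi in zip(z, x))
    den = math.sqrt(sum(zi ** 2 for zi in z) * sum((xi - xbar) ** 2 for xi in x))
    return num / den

# Made-up sample of n = 10 observations, for illustration only
sample = [2.3, 1.9, 3.1, 2.8, 2.2, 2.6, 3.4, 1.7, 2.9, 2.5]
r = ppcc(sample)
rho_c = 0.9180                             # table value for n = 10, alpha = 0.05
print(f"R_xz = {r:.4f}; reject Ho: {r <= rho_c}")
```

To test a lognormal fit, the same function can be applied to the logarithms of the data.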

Regression

Understand the assumptions employed with the linear model Y = β0 + β1 x + ε.

[Figure: residuals versus fitted y‐values]

εi ~ iid N[0, σ²]

Examples of possible violations from homework

[Figure: two plots of residuals versus fitted y‐values]

Be able to calculate the estimators of β0 and β1, and estimate the precision of these estimators.

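Regression Line: $\hat{y}_i = b_0 + b_1 x_i$ = predicted (fitted) y value

$$b_0 = \bar{y} - b_1 \bar{x}, \qquad \mathrm{Var}[b_0] = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]$$

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \mathrm{Var}[b_1] = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - b_1 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 2}$$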
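The estimators above translate fairly directly into code. The sketch below is one possible implementation, with σ² replaced by its estimate s_e² when computing the variances. The function name `fit_line` and the data are made up for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares estimates for y = b0 + b1*x, with estimated variances."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    s2e = (syy - b1 * sxy) / (n - 2)          # estimate of sigma^2
    var_b0 = s2e * (1 / n + xbar ** 2 / sxx)  # estimated Var[b0]
    var_b1 = s2e / sxx                        # estimated Var[b1]
    return {"b0": b0, "b1": b1, "s2e": s2e, "var_b0": var_b0, "var_b1": var_b1}

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_line(x, y)
print(fit)
```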

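Hypothesis Tests:

$$T = \frac{b_0 - \beta_0}{\sqrt{\mathrm{Var}[b_0]}} \qquad \text{or} \qquad T = \frac{b_1 - \beta_1}{\sqrt{\mathrm{Var}[b_1]}}$$

~ Student t distribution with n-2 degrees of freedom

Confidence Intervals:

$$b_0 \pm t_{\alpha/2,\,n-2} \sqrt{\mathrm{Var}(b_0)}$$

$$b_1 \pm t_{\alpha/2,\,n-2} \sqrt{\mathrm{Var}(b_1)}$$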

Be able to calculate confidence intervals for β0 and β1, and conduct relevant hypothesis tests.
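As an illustration of the slope test and interval, the sketch below takes the critical value tα/2,n‐2 as an input read from a t table (the standard library has no t quantile function). The name `slope_inference` and the data are hypothetical. The value 3.182 is the tabulated t0.025,3.

```python
import math
from statistics import mean

def slope_inference(x, y, t_crit, beta1_null=0.0):
    """Return (T, CI, reject) for a two-tailed test of Ho: beta1 = beta1_null."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = sxy / sxx
    s2e = (syy - b1 * sxy) / (n - 2)       # estimate of sigma^2
    se_b1 = math.sqrt(s2e / sxx)           # sqrt of estimated Var[b1]
    t_stat = (b1 - beta1_null) / se_b1
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return t_stat, ci, abs(t_stat) >= t_crit

# Made-up data; t_crit = t_{0.025, 3} = 3.182 from a t table
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t_stat, ci, reject = slope_inference(x, y, t_crit=3.182)
print(f"T = {t_stat:.2f}, 95% CI for beta1 = ({ci[0]:.3f}, {ci[1]:.3f}), reject Ho: {reject}")
```

The same pattern applies to β0 by swapping in b0 and its variance.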

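100(1−α)% CI for mean of y ($\hat{y} = b_0 + b_1 x_0$) given x = x0, with $\hat{\sigma}^2 = s_e^2$:

$$b_0 + b_1 x_0 \pm t_{\alpha/2,\,n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]}$$

100(1−α)% CI (or prediction interval) for a future observation of Y given x = xf:

$$b_0 + b_1 x_f \pm t_{\alpha/2,\,n-2} \sqrt{\hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x_f - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]}$$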
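The two intervals differ only by the extra "1 +" inside the square root, which the following sketch makes explicit. The critical value is again assumed to come from a t table. The name `regression_intervals` and the data are made up for illustration.

```python
import math
from statistics import mean

def regression_intervals(x, y, x0, t_crit):
    """Return (fitted value, CI half-width, PI half-width) at x = x0."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    s2e = (syy - b1 * sxy) / (n - 2)          # estimate of sigma^2
    h0 = 1 / n + (x0 - xbar) ** 2 / sxx       # grows as x0 moves away from x-bar
    ci_half = t_crit * math.sqrt(s2e * h0)        # mean of y at x0
    pi_half = t_crit * math.sqrt(s2e * (1 + h0))  # future observation at x0
    return b0 + b1 * x0, ci_half, pi_half

# Made-up data; t_crit = t_{0.025, 3} = 3.182 from a t table
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for x0 in (3, 8):
    fit, ci_half, pi_half = regression_intervals(x, y, x0, t_crit=3.182)
    print(f"x0 = {x0}: fit = {fit:.2f}, CI = ±{ci_half:.3f}, PI = ±{pi_half:.3f}")
```

Comparing x0 = 3 (the mean of x) with x0 = 8 (outside the data) shows both intervals widening under extrapolation.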

Be able to calculate confidence intervals for the mean of y for a given x, and of prediction intervals for a future observation of y.

• Confidence interval reflects how well we can estimate the observed data (yi), i.e., how good our model is based on the {xi, yi} data available

• Want to use model for prediction/extrapolation: the prediction interval reflects the precision of those estimates

• Intervals widen as x moves away from the mean; decreased precision for extremes and extrapolation beyond the data.

Other considerations:

• Outliers

• Leverage

• Influence

$$SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (n-1)\, s_y^2$$ = total variation in the observed y values

$$SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$ = variation explained by the regression model

$$SS_E = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (n-2)\, s_e^2$$ = unexplained variation attributed to random errors (noise)

Know the meaning of a sum‐of‐squares (ANOVA) table and the residual sum of squares.

Define SSxx, SSyy

Be able to calculate R2 and adjusted‐R2, and know what they represent.

Proportion of the observed variance in the yi explained by the fitted regression line:

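$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

$$\bar{R}^2 = 1 - \frac{SS_E / (n - k - 1)}{SS_T / (n - 1)} = 1 - \frac{s_e^2}{s_y^2}$$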
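A short sketch of both quantities, assuming k is the number of explanatory variables (k = 1 for the simple linear model). The function name `r_squared`, the data, and the fitted values are made up for illustration.

```python
def r_squared(y, y_hat, k=1):
    """Return (R^2, adjusted R^2) from observed y and fitted y_hat."""
    n = len(y)
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - ybar) ** 2 for yi in y)                # total variation
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return r2, r2_adj

# Made-up data with fitted values from the line y-hat = 0.05 + 1.99 x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [0.05 + 1.99 * xi for xi in x]
r2, r2_adj = r_squared(y, y_hat)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
```

The adjusted value is never larger than R², because it penalizes the degrees of freedom used by the model.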

Know the meaning of the correlation coefficient, be able to calculate its estimator Rxy, and be able to conduct a hypothesis test of whether ρxy = 0.

Ho: ρxy = 0 vs. Ha: ρxy ≠ 0

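Estimator:

$$\hat{\rho}_{xy} = R_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Test Statistic:

$$T = \frac{R \sqrt{n - 2}}{\sqrt{1 - R^2}}$$

~ Student t distribution with n‐2 degrees of freedom

Decision Rule: Reject Ho if T ≥ tα/2,n‐2 or T ≤ ‐tα/2,n‐2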
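The correlation estimator and its test statistic can be sketched as below. The function name `corr_t` and the data are made up for illustration, and the critical value would come from a t table with n‐2 degrees of freedom.

```python
import math
from statistics import mean

def corr_t(x, y):
    """Return (R_xy, T) for a test of Ho: rho_xy = 0."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # undefined if |r| = 1
    return r, t_stat

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r, t_stat = corr_t(x, y)
print(f"R_xy = {r:.4f}, T = {t_stat:.2f}")
```

For the simple linear model this T equals the statistic for Ho: β1 = 0, which is why the two tests agree.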

• Rxy is a measure of linear correlation

• Transform data (e.g. log‐linear, log‐log) to increase linearity of scatter plot

• Tests on Rxy are consistent with tests on β1

[Figure: Elongation (y) versus Tensile Force (x); fitted line y = 0.6861x + 3.7506, R² = 0.819]

[Figure: Elongation (y) versus ln(x); fitted line y = 2.661x + 3.5982 with ln(x) on the horizontal axis, R² = 0.9763]