Final Exam Review
CE 3710, December 11, 2015
Monday, December 14, 10:15 AM – 12:15 PM, ChemSci 101
Open book and open notes
Exam will be cumulative
**EMPHASIS ON MATERIAL COVERED SINCE EXAM II
• Key topics from Exam I:
– Be able to compute probabilities of events using intersections, unions, complements, etc., as well as knowledge of independence/conditional probability
– Be able to compute percentiles, probabilities, mean and variance of a random variable X for given pdf or cdf
– Be able to compute the mean and variance of linear functions of random variables
– Be able to apply Normal distribution
• Topics NOT on Final Exam:
– No counting
– No Bayes' Theorem or Law of Total Probability
– No Binomial distribution
• Key topics from Exam II:
– Be able to state Central Limit Theorem and know when it applies
– Be able to calculate moments, parameters and percentiles of a lognormal distribution
– Be able to derive method of moments estimators
– Understand how to formulate a statistical hypothesis, and be able to articulate what the Type I and Type II errors represent
– Be able to perform one‐sample hypothesis tests
– Be able to construct probability plots to assess fit of Normal/Lognormal distributions
– Be able to construct confidence intervals for population mean and variance
• Topics NOT on Final Exam: No Gumbel or Weibull distributions
See Problem Set #9 and Exam II Problem 3
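For X1, X2, …, Xn ~ iid(μ, σ²):
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
$$E[\bar{X}] = \mu, \qquad \mathrm{Var}[\bar{X}] = \sigma^2 / n, \qquad E[S^2] = \sigma^2$$
For X1, X2, …, Xn ~ iid N(μ, σ²): $\bar{X} \sim N(\mu, \sigma^2/n)$ for any n
For X1, X2, …, Xn ~ iid(μ, σ²): $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$ for n ≥ 30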
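
The estimators above can be checked numerically. The following is a minimal Python sketch using only the standard library; the function name `sample_summary` and the data values are made up for illustration and do not come from the course materials.

```python
from math import sqrt
from statistics import mean, variance

def sample_summary(xs):
    """Return (x_bar, s2, se) for a sample xs."""
    n = len(xs)
    x_bar = mean(xs)        # sample mean: (1/n) * sum of X_i
    s2 = variance(xs)       # sample variance with the n - 1 divisor
    se = sqrt(s2 / n)       # estimated standard deviation of X-bar
    return x_bar, s2, se

# Hypothetical data, for illustration only
x_bar, s2, se = sample_summary([4.0, 6.0, 8.0, 10.0])
print(x_bar, s2, se)
```

Note that `statistics.variance` divides by n − 1, which matches the definition of S² above; `statistics.pvariance` divides by n and would not.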
Hypothesis Testing
H3. Students should be able to calculate the test statistic, rejection region for a specified α, and make a decision for one‐ and two‐sample z‐ and t‐tests. Students should also recognize when a paired analysis is appropriate and be able to perform the associated hypothesis test.
Two Sample Hypothesis Tests (Packet #22)
X ~ N[µX, σX²]; {Xi} are independent, i = 1, ..., nX
Y ~ N[µY, σY²]; {Yj} are independent, j = 1, ..., nY
X and Y are independent
Test for difference in means: µX – µY
Ho: µX = µY (µX ‐ µY = 0)
‐‐versus‐‐
Ha: µX ≠ µY (µX ‐ µY ≠ 0) Reject if T ≥ tα/2,ν or T ≤ ‐tα/2,ν
Ha: µX < µY (µX ‐ µY < 0) Reject if T ≤ ‐tα,ν
Ha: µX > µY (µX ‐ µY > 0) Reject if T ≥ tα,ν
σX and σY known, X ~ N, Y ~ N, any n:
$$Z_{test} = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}}$$
σX and σY unknown, but nX and nY LARGE:
$$Z_{test} = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{S_X^2}{n_X} + \dfrac{S_Y^2}{n_Y}}}$$
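σX and σY unknown, X ~ N, Y ~ N, small samples (< 30):
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{S_X^2}{n_X} + \dfrac{S_Y^2}{n_Y}}}$$
with degrees of freedom
$$\nu = \frac{\left(\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}\right)^2}{\dfrac{s_X^4 / n_X^2}{n_X - 1} + \dfrac{s_Y^4 / n_Y^2}{n_Y - 1}}$$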
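
The small-sample case can be sketched in Python as follows. This is an illustrative sketch only: the function name `two_sample_t` and the data are hypothetical, and the critical value is typed in from a t table rather than computed.

```python
from math import floor, sqrt
from statistics import mean, variance

def two_sample_t(xs, ys, delta0=0.0):
    """T statistic and degrees of freedom nu for Ho: mu_X - mu_Y = delta0."""
    nx, ny = len(xs), len(ys)
    vx = variance(xs) / nx          # s_X^2 / n_X
    vy = variance(ys) / ny          # s_Y^2 / n_Y
    t_stat = (mean(xs) - mean(ys) - delta0) / sqrt(vx + vy)
    nu = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t_stat, nu

# Hypothetical data, for illustration only
xs = [10.0, 12.0, 14.0, 16.0]
ys = [9.0, 10.0, 11.0]
t_stat, nu = two_sample_t(xs, ys)

# Two-sided test at alpha = 0.05: nu is rounded down to 4 here,
# and t_{0.025,4} = 2.776 is read from a t table.
df = floor(nu)
t_crit = 2.776
reject = abs(t_stat) >= t_crit
print(t_stat, nu, df, reject)
```

Rounding ν down to an integer before using the table is a common conservative choice, assumed here.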
Paired Tests
When two sets of observations are generated in pairs (Xi, Yi):
Let Di = Xi – Yi, so that µD = µX – µY. This reduces two samples to one.
Null hypothesis Ho: µX = µY becomes Ho: µD = 0
Possible alternative hypotheses:
Ha: µD < 0 (lower tail)
Ha: µD > 0 (upper tail)
Ha: µD ≠ 0 (two tailed)
Use $\bar{D}$, sD and nD to compute test statistics: Ztest or Tn‐1
Additional computations are equivalent to the procedures for a one‐sample hypothesis test.
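
A paired analysis can be sketched in a few lines of Python. The function name `paired_t` and the data below are hypothetical, and the critical value is taken from a t table.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(xs, ys):
    """One-sample t statistic on the differences D_i = X_i - Y_i (Ho: mu_D = 0)."""
    d = [x - y for x, y in zip(xs, ys)]
    n_d = len(d)
    d_bar = mean(d)                 # D-bar
    s_d = stdev(d)                  # s_D, with the n - 1 divisor
    t_stat = d_bar / (s_d / sqrt(n_d))
    return t_stat, n_d - 1          # statistic and degrees of freedom

# Hypothetical paired measurements, for illustration only
xs = [12.0, 15.0, 11.0, 14.0]
ys = [10.0, 14.0, 10.0, 11.0]
t_stat, df = paired_t(xs, ys)

# Upper-tail test (Ha: mu_D > 0) at alpha = 0.05; t_{0.05,3} = 2.353 from a t table
reject = t_stat >= 2.353
print(t_stat, df, reject)
```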
Goodness‐of‐Fit Analysis (Packet #27)
Be able to compute the probability plot correlation test statistic, and perform a hypothesis test (Ryan Joiner Test of Normality) of whether the data set (or some transformation thereof) was drawn from a normal distribution.
[Figure: probability plots of x(i) versus zpi and of ln[x(i)] versus zpi]
See Problem 7 (Hmwk#11)
Ho: Population distribution is Normal (ρ = 1)
Ha: Population distribution is not Normal (ρ < 1)
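Test Statistic:
$$R_{xz} = \frac{\sum_{i=1}^{n} z_{p_i}\,(x_{(i)} - \bar{x})}{\sqrt{\sum_{i=1}^{n} z_{p_i}^2 \; \sum_{i=1}^{n} (x_{(i)} - \bar{x})^2}}$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the average value of the observations and $\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_{p_i} = 0$ is the average of the standard normal percentiles.
Reject Ho if r ≤ ρc (ρc is obtained from the table below as a function of n and α)
Or consider Rez for residuals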
Critical Values (ρc) for the Ryan Joiner Test of Normality, by Significance Level (α)

| n | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| 5 | 0.9033 | 0.8804 | 0.8320 |
| 10 | 0.9347 | 0.9180 | 0.8804 |
| 15 | 0.9506 | 0.9383 | 0.9110 |
| 20 | 0.9600 | 0.9503 | 0.9290 |
| 25 | 0.9662 | 0.9582 | 0.9408 |
| 30 | 0.9707 | 0.9639 | 0.9490 |
| 40 | 0.9767 | 0.9715 | 0.9597 |
| 50 | 0.9807 | 0.9764 | 0.9664 |
| 60 | 0.9835 | 0.9799 | 0.9710 |
| 75 | 0.9865 | 0.9835 | 0.9757 |
| 100 | 0.9893 | 0.9870 | 0.9812 |
| 300 | 0.99602 | 0.99525 | 0.99354 |
| 1000 | 0.99854 | 0.99824 | 0.99755 |
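
The probability plot correlation statistic can be computed as in the sketch below. Two things are assumptions rather than course material: the plotting position p_i = (i − 3/8)/(n + 1/4), which may differ from the one used in the packet, and the data values, which are invented. The function name `ryan_joiner_r` is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def ryan_joiner_r(data):
    """Correlation between the ordered data x_(i) and normal percentiles z_pi."""
    x = sorted(data)                                   # order statistics x_(i)
    n = len(x)
    std_normal = NormalDist()                          # standard normal, mean 0, sd 1
    # ASSUMPTION: plotting positions p_i = (i - 3/8) / (n + 1/4)
    z = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    x_bar = mean(x)
    num = sum(zi * (xi - x_bar) for zi, xi in zip(z, x))
    den = sqrt(sum(zi ** 2 for zi in z) * sum((xi - x_bar) ** 2 for xi in x))
    return num / den

# Hypothetical data (n = 10), for illustration only
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
r = ryan_joiner_r(data)

rho_c = 0.9180            # critical value for n = 10, alpha = 0.05 (table above)
reject_h0 = r <= rho_c    # reject normality if r is at or below rho_c
print(r, reject_h0)
```

To test a lognormal fit, the same function can be applied to the logarithms of the data.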
Regression
Understand the assumptions employed with the linear model Y = β0 + β1 x + ε.
[Figure: residuals versus fitted y‐values]
εi ~ iid N[0, σ²]
Examples of possible violations from homework
[Figures: two plots of residuals versus fitted y‐values]
Be able to calculate the estimators of β0 and β1, and estimate the precision of these estimators.
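Regression Line: $\hat{y}_i = b_0 + b_1 x_i$ = predicted (fitted) y value
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \mathrm{Var}[b_1] = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$b_0 = \bar{y} - b_1 \bar{x}, \qquad \mathrm{Var}[b_0] = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]$$
$$\hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - b_1 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 2}$$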
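
These estimators translate directly into code. The sketch below is illustrative only: the function name `fit_line` and the data are hypothetical, and σ² is replaced by its estimate s_e² when computing the variances.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares estimates for y = b0 + b1*x, with estimated variances."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    s2_e = (ss_yy - b1 * ss_xy) / (n - 2)          # estimate of sigma^2
    var_b0 = s2_e * (1 / n + x_bar ** 2 / ss_xx)   # sigma^2 replaced by s2_e
    var_b1 = s2_e / ss_xx
    return {"b0": b0, "b1": b1, "s2_e": s2_e, "var_b0": var_b0, "var_b1": var_b1}

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
fit = fit_line(x, y)
print(fit)
```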
Hypothesis Tests:
$$T = \frac{b_0 - \beta_0}{\sqrt{\mathrm{Var}[b_0]}} \quad \text{or} \quad T = \frac{b_1 - \beta_1}{\sqrt{\mathrm{Var}[b_1]}}$$
T ~ Student t distribution with n − 2 degrees of freedom
Confidence Intervals:
$$b_0 \pm t_{\alpha/2,\,n-2} \sqrt{\mathrm{Var}(b_0)}$$
$$b_1 \pm t_{\alpha/2,\,n-2} \sqrt{\mathrm{Var}(b_1)}$$
Be able to calculate confidence intervals for β0 and β1, and conduct relevant hypothesis tests.
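
As an illustration, the following sketch computes the t statistic and a confidence interval for the slope. The function name `slope_inference` and the data are hypothetical; the critical value is passed in by the caller from a t table, and σ² is estimated by SSE/(n − 2).

```python
from math import sqrt
from statistics import mean

def slope_inference(x, y, t_crit, beta1_null=0.0):
    """Return (b1, T, CI) for the slope; t_crit is t_{alpha/2, n-2} from a table."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(sse / (n - 2) / ss_xx)        # square root of estimated Var[b1]
    t_stat = (b1 - beta1_null) / se_b1
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, t_stat, ci

# Hypothetical data (n = 5); t_{0.025,3} = 3.182 from a t table
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
b1, t_stat, ci = slope_inference(x, y, t_crit=3.182)
reject = abs(t_stat) >= 3.182       # two-sided test of Ho: beta1 = 0
print(b1, t_stat, ci, reject)
```

The same pattern applies to the intercept, using the estimated Var[b0] in place of Var[b1].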
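100(1−α)% CI for mean of y ($\hat{y} = b_0 + b_1 x_0$) given x = x0:
$$b_0 + b_1 x_0 \pm t_{\alpha/2,\,n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]}$$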
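100(1−α)% CI (or prediction interval) for a future observation of Y given x = xf:
$$b_0 + b_1 x_f \pm t_{\alpha/2,\,n-2} \sqrt{\hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x_f - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]}$$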
Be able to calculate confidence intervals for the mean of y for a given x, and of prediction intervals for a future observation of y.
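
The two intervals differ only by the extra "1 +" term under the square root, which the sketch below makes explicit. The function name `intervals_at` and the data are hypothetical, and the critical value is supplied from a t table.

```python
from math import sqrt
from statistics import mean

def intervals_at(x, y, x0, t_crit):
    """Fitted value at x0 with CI for the mean of y and PI for a future Y."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    s2_e = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - x_bar) ** 2 / ss_xx          # grows as x0 moves away from x_bar
    half_ci = t_crit * sqrt(s2_e * h)              # half-width, mean of y
    half_pi = t_crit * sqrt(s2_e * (1 + h))        # half-width, future observation
    return y_hat, (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)

# Hypothetical data (n = 5); t_{0.025,3} = 3.182 from a t table
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
at_mean = intervals_at(x, y, x0=3.0, t_crit=3.182)   # x0 equal to x_bar
beyond = intervals_at(x, y, x0=6.0, t_crit=3.182)    # extrapolation past the data
print(at_mean)
print(beyond)
```

Comparing the two calls shows both intervals widening as x0 moves away from the mean of the x values.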
• Confidence interval reflects how well the model fits the observed data (yi): how good is our model, based on the {xi, yi} data available?
• Want to use the model for prediction/extrapolation: the prediction interval reflects the precision of those estimates
• Intervals widen as x moves away from the mean of the xi; precision decreases for extremes and for extrapolation beyond the data.
Other considerations:
• Outliers
• Leverage
• Influence
$$SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (n-1)\, s_y^2$$ = total variation in the observed y values
$$SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$ = variation explained by the regression model
$$SS_E = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (n-2)\, s_e^2$$ = unexplained variation attributed to random errors (noise)
Know the meaning of a sum‐of‐squares (ANOVA) table and the residual sum of squares.
Define SSxx, SSyy
Be able to calculate R2 and adjusted‐R2, and know what they represent.
Proportion of the observed variance in the yi explained by the fitted regression line:
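$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
Adjusted R²:
$$\bar{R}^2 = 1 - \frac{SS_E / (n - k - 1)}{SS_T / (n - 1)} = 1 - \frac{s_e^2}{s_y^2}$$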
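
The sums of squares and both versions of R² can be computed together, as in this sketch. The function name `r_squared` and the data are hypothetical, and k = 1 is assumed (one explanatory variable).

```python
from statistics import mean

def r_squared(x, y, k=1):
    """Return (SST, SSR, SSE, R^2, adjusted R^2) for a fitted straight line."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    y_fit = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_fit)           # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fit))  # unexplained variation
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return sst, ssr, sse, r2, r2_adj

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
sst, ssr, sse, r2, r2_adj = r_squared(x, y)
print(sst, ssr, sse, r2, r2_adj)
```

For a least-squares line, SST = SSR + SSE, so the two expressions for R² agree.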
Know the meaning of the correlation coefficient, be able to calculate its estimator Rxy, and be able to conduct a hypothesis test of whether ρxy = 0.
Ho: ρxy = 0 vs. Ha: ρxy ≠ 0
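$$R_{xy} = \hat{\rho}_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
Test Statistic:
$$T = \frac{R \sqrt{n - 2}}{\sqrt{1 - R^2}}$$
T ~ Student t distribution with n‐2 degrees of freedom
Decision Rule: Reject Ho if T ≥ tα/2,n‐2 or T ≤ ‐tα/2,n‐2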
• Rxy is a measure of linear correlation
• Transform data (e.g. log‐linear, log‐log) to increase linearity of scatter plot
• Tests on Rxy are consistent with tests on β1
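
The correlation test, and its agreement with the slope test, can be illustrated with the sketch below. The function name `correlation_t` and the data are hypothetical; the slope statistic is computed alongside only to show that the two values coincide.

```python
from math import sqrt
from statistics import mean

def correlation_t(x, y):
    """Return (R_xy, T for Ho: rho_xy = 0, T for Ho: beta1 = 0)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = ss_xy / sqrt(ss_xx * ss_yy)
    # Undefined when |r| = 1 exactly (perfectly collinear data)
    t_corr = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    # Slope test statistic, for comparison
    b1 = ss_xy / ss_xx
    s2_e = (ss_yy - b1 * ss_xy) / (n - 2)
    t_slope = b1 / sqrt(s2_e / ss_xx)
    return r, t_corr, t_slope

# Hypothetical data (n = 5), for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
r, t_corr, t_slope = correlation_t(x, y)
reject = abs(t_corr) >= 3.182       # t_{0.025,3} = 3.182 from a t table
print(r, t_corr, t_slope, reject)
```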
[Figure: Elongation (y) versus Tensile Force (x), with fitted line y = 0.6861x + 3.7506, R² = 0.819]
[Figure: Elongation (y) versus ln(x), with fitted line y = 2.661x + 3.5982, R² = 0.9763]