A problem set on Experimental Design covering the computations for a Completely Randomized Design.
Mindanao State University
Iligan Institute of Technology
Experimental Design
Problem Set
Student: Asaad, Al-Ahmadgaid B.
Instructor: Lopez, Rosadelima Visorbo
alstat.weebly.com
alstatr.blogspot.com
June 20, 2012
I. Questions
3-1 The tensile strength of Portland cement is being studied. Four different mixing techniques
can be used economically. The following data have been collected:
Mixing Technique    Tensile Strength (lb/in^2)
1                   3129    3000    2865    2890
2                   3200    3300    2975    3150
3                   2800    2900    2985    3050
4                   2600    2700    2600    2765
(a) Test the hypothesis that mixing techniques affect the strength of the cement. Use
$\alpha = 0.05$.
(b) Construct a graphical display as described in Section 3-5.3 to compare the mean
tensile strengths for the four mixing techniques. What are your conclusions?
(c) Use the Fisher LSD method with $\alpha = 0.05$ to make comparisons between pairs of
means.
(d) Construct a normal probability plot of the residuals. What conclusion would you draw
about the validity of the normality assumption?
(e) Plot the residuals versus the predicted tensile strength. Comment on the plot.
(f) Prepare a scatter plot of the results to aid the interpretation of the results of this
experiment.
3-2
(a) Rework part (b) of Problem 3-1 using Duncan’s multiple range test with $\alpha = 0.05$. Does
this make any difference in your conclusions?
(b) Rework part (b) of Problem 3-1 using Tukey’s test with $\alpha = 0.05$. Do you get the same
conclusions from Tukey’s test that you did from the graphical procedure and/or
Duncan’s multiple range test?
(c) Explain the difference between the Tukey and Duncan procedures.
3-3 Reconsider the experiment in Problem 3-1. Find a 95 percent confidence interval on the
mean tensile strength of the Portland cement produced by each of the four mixing
techniques. Also find a 95 percent confidence interval on the difference in means for
techniques 1 and 3. Does this aid you in interpreting the results of the experiment?
II. Computational and Graphical Section
3-1 The tensile strength of Portland cement is being studied. Four different mixing techniques
can be used economically. The following data have been collected:
Mixing Technique    Tensile Strength (lb/in^2)          Totals    Averages
1                   3129    3000    2865    2890        11884     2971.00
2                   3200    3300    2975    3150        12625     3156.25
3                   2800    2900    2985    3050        11735     2933.75
4                   2600    2700    2600    2765        10665     2666.25
(a) Test the hypothesis that mixing techniques affect the strength of the cement. Use
$\alpha = 0.05$.
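I. Hypotheses:
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
H1: some means are different
II. Level of significance: $\alpha = 0.05$
III. Test Statistic: $F_0 = MS_{Treatments} / MS_E$
IV. Rejection Region: reject H0 if $F_0 > F_{0.05, 3, 12} = 3.49$
V. Computation:
$SS_T = \sum_{i=1}^{4} \sum_{j=1}^{4} y_{ij}^2 - \frac{y_{..}^2}{N}$
$SS_T = (3129)^2 + (3000)^2 + \cdots + (2765)^2 - \frac{(46909)^2}{16} = 643648.44$
$SS_{Treatments} = \frac{1}{n} \sum_{i=1}^{4} y_{i.}^2 - \frac{y_{..}^2}{N}$
$SS_{Treatments} = \left(\frac{1}{4}\right) \left[ (11884)^2 + (12625)^2 + (11735)^2 + (10665)^2 \right] - \frac{(46909)^2}{16} = 489740.19$
$SS_E = SS_T - SS_{Treatments} = 643648.44 - 489740.19 = 153908.25$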
ANOVA Table
Source    Sum of Squares    Degrees of Freedom    Mean Square    F Value    P-Value
Model     489740.19         3                     163246.73      12.73      0.0005
Error     153908.25         12                    12825.69
Total     643648.44         15
The F-value of 12.73 implies that the model is significant, since it exceeds the
tabulated value $F_{0.05, 3, 12} = 3.49$. The p-value (0.0005) is also less than the level of
significance. We therefore reject the null hypothesis and conclude that the mixing
techniques significantly affect the strength of the cement.
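The hand computation above can be cross-checked in base R. The following is a minimal sketch under the assumption that the data are entered exactly as in the table; the variable names (strength, technique, ss_total, and so on) are illustrative and do not come from the original listing.

```r
# Tensile strength data from Problem 3-1, four observations per mixing technique
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))

N <- length(strength)     # 16 observations in total
a <- nlevels(technique)   # 4 treatments
n <- N / a                # 4 replicates per treatment

# Sums of squares using the shortcut formulas from the computation above
correction <- sum(strength)^2 / N
ss_total <- sum(strength^2) - correction
ss_treat <- sum(tapply(strength, technique, sum)^2) / n - correction
ss_error <- ss_total - ss_treat

# Mean squares and the F statistic
ms_treat <- ss_treat / (a - 1)
ms_error <- ss_error / (N - a)
f0 <- ms_treat / ms_error

# Critical value and p-value for alpha = 0.05
f_crit <- qf(0.95, a - 1, N - a)
p_value <- pf(f0, a - 1, N - a, lower.tail = FALSE)

round(c(SS_T = ss_total, SS_Treat = ss_treat, SS_E = ss_error,
        F0 = f0, F_crit = f_crit, p = p_value), 4)
```

The sums of squares should agree with the ANOVA table, and the decision rule is simply whether f0 exceeds f_crit.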
(b) Construct a graphical display as described in Section 3-5.3 to compare the mean tensile
strengths for the four mixing techniques. What are your conclusions?
$S_{\bar{y}_{i.}} = \sqrt{\frac{MS_E}{n}} = \sqrt{\frac{12825.69}{4}} = 56.63$
Dashed line in the plot by color: Red – Mean of Treatment 4 (2666.25)
Pink – Grand Mean (2931.81)
Brown – Mean of Treatment 3 (2933.75)
Green – Mean of Treatment 1 (2971.00)
Blue – Mean of Treatment 2 (3156.25)
Based on the plot, and on the data as well, we conclude that $\mu_1$ and $\mu_3$ are the
same (refer also to the plot for question 3-1 (f)). Moreover, $\mu_2$ differs from $\mu_1$ and
$\mu_3$, $\mu_4$ differs from $\mu_1$ and $\mu_3$, and $\mu_2$ and $\mu_4$ are different.
How did I do it?
First, plot a Student t distribution with N – 1 = 15 degrees of freedom. Next, mark the
four treatment means along the x-axis. Because the raw means are far too large to
appear on the t scale, they are first converted to t-values using the following formula,
$t_i = \frac{\bar{y}_{i.} - \bar{\bar{y}}}{s_{\bar{y}} / \sqrt{4}}$
where $\bar{\bar{y}}$ is the average of the four treatment means and $s_{\bar{y}}$ is their sample
standard deviation. You can confirm this in the R Codes Section.
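As a small illustration of this conversion, the sketch below applies the same scaling used in the R Codes Section to the four treatment means. The names trt_means and scaled are illustrative only.

```r
# Treatment means, labelled by mixing technique and listed in ascending order
trt_means <- c("4" = 2666.25, "3" = 2933.75, "1" = 2971.00, "2" = 3156.25)

# Centre on the average of the means, then divide by sd(means) / sqrt(4),
# mirroring the xvaltrn line in the R Codes Section
scaled <- (trt_means - mean(trt_means)) / (sd(trt_means) / sqrt(4))

round(scaled, 3)
```

The scaled values keep the ordering of the original means, so the relative positions of the dashed lines on the plot are unchanged.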
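(c) Use the Fisher LSD method with $\alpha = 0.05$ to make comparisons between pairs of means.
$LSD = t_{\alpha/2, N-a} \sqrt{\frac{2 MS_E}{n}} = t_{0.025, 12} \sqrt{\frac{2(12825.69)}{4}} = 2.179 \sqrt{6412.845} = 174.495$
Thus, any pair of treatment averages that differ in absolute value by more than 174.495
would imply that the corresponding pair of population means are significantly different.
The differences in averages are
$\bar{y}_{1.} - \bar{y}_{2.} = 2971.00 - 3156.25 = -185.25$*
$\bar{y}_{1.} - \bar{y}_{3.} = 2971.00 - 2933.75 = 37.25$
$\bar{y}_{1.} - \bar{y}_{4.} = 2971.00 - 2666.25 = 304.75$*
$\bar{y}_{2.} - \bar{y}_{3.} = 3156.25 - 2933.75 = 222.50$*
$\bar{y}_{2.} - \bar{y}_{4.} = 3156.25 - 2666.25 = 490.00$*
$\bar{y}_{3.} - \bar{y}_{4.} = 2933.75 - 2666.25 = 267.50$*
The starred values indicate pairs of means that are significantly different.
Means with the same letter are not significantly different at $\alpha = 0.05$.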
Data Layout for Fisher LSD Method
Group Treatment Means
a B 3156.25
b A 2971.00
b C 2933.75
c D 2666.25
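The LSD comparison can also be sketched in base R without the agricolae package. This is only an illustration: the treatment labels A to D follow the data layout above, the error mean square is taken from the ANOVA table, and the variable names are assumptions of this sketch.

```r
# Treatment means (A-D correspond to mixing techniques 1-4)
trt_means <- c(A = 2971.00, B = 3156.25, C = 2933.75, D = 2666.25)

ms_error <- 153908.25 / 12   # error mean square from the ANOVA table
n <- 4                       # replicates per treatment
df_error <- 12               # error degrees of freedom

# Least significant difference at alpha = 0.05
lsd <- qt(1 - 0.05 / 2, df_error) * sqrt(2 * ms_error / n)

# Pairwise differences: entry [i, j] is mean_i - mean_j
diffs <- outer(trt_means, trt_means, "-")

# TRUE where the absolute difference exceeds the LSD
significant <- abs(diffs) > lsd

lsd
diffs
significant
```

Using the unrounded t quantile gives an LSD very close to the hand value, which was computed with the tabulated 2.179.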
(d) Construct a normal probability plot of the residuals. What conclusion would you draw
about the validity of the normality assumption?
Nothing is unusual in the plot. The residuals appear to meet the normality assumption,
since the points fall within the 95 percent confidence bands.
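For a quick look at the same question without ggplot2, a plain normal Q-Q plot of the model residuals can be drawn in base R. Note that this sketch swaps in qqnorm() and qqline() for the Minitab-style plot with confidence bands used in this problem set, and the variable names are illustrative.

```r
# Data from Problem 3-1
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(c("A", "B", "C", "D"), each = 4))

# One-way ANOVA model; residuals are observation minus treatment mean
model <- aov(strength ~ technique)
res <- residuals(model)

# Normal Q-Q plot with a reference line through the quartiles
qqnorm(res, main = "Normal Q-Q Plot of Residuals")
qqline(res)
```

Taking the residuals from the fitted model avoids typing the predicted values by hand.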
(e) Plot the residuals versus the predicted tensile strength. Comment on the plot.
The points exhibit a slight outward-opening funnel (megaphone) pattern. It is not
pronounced, but it suggests that the error variance may not be constant.
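One rough numerical companion to the plot is the standard deviation within each treatment. The sketch below is a supplementary check under illustrative variable names, not a formal test of equal variances.

```r
# Data from Problem 3-1
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(c("A", "B", "C", "D"), each = 4))

# Treatment means (the predicted values) and within-treatment spread
trt_mean <- tapply(strength, technique, mean)
trt_sd <- tapply(strength, technique, sd)

round(cbind(mean = trt_mean, sd = trt_sd), 2)
```

Comparing the sd column against the mean column shows whether the spread changes with the predicted value.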
(f) Prepare a scatter plot of the results to aid the interpretation of the results of this
experiment.
3-2.
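(a) Rework part (b) of Problem 3-1 using Duncan’s multiple range test with $\alpha = 0.05$. Does this
make any difference in your conclusions?
Ranking the treatment averages in ascending order, we have
$\bar{y}_{4.} = 2666.25$, $\bar{y}_{3.} = 2933.75$, $\bar{y}_{1.} = 2971.00$, $\bar{y}_{2.} = 3156.25$
The standard error of each average is $S_{\bar{y}_{i.}} = \sqrt{MS_E / n} = \sqrt{12825.69 / 4} = 56.63$. From the table of
significant ranges for 12 degrees of freedom and $\alpha = 0.05$, we obtain $r_{0.05}(2, 12)$, $r_{0.05}(3, 12)$, and $r_{0.05}(4, 12)$. Thus, the least significant ranges are
$R_2 = r_{0.05}(2, 12) S_{\bar{y}_{i.}}$
$R_3 = r_{0.05}(3, 12) S_{\bar{y}_{i.}}$
$R_4 = r_{0.05}(4, 12) S_{\bar{y}_{i.}}$
(The duncan.test output in the R Codes Section reports these critical ranges as 174.48, 182.63, and 187.57.)
The comparison would yield
2 vs. 4: $3156.25 - 2666.25 = 490.00 > R_4$
2 vs. 3: $3156.25 - 2933.75 = 222.50 > R_3$
2 vs. 1: $3156.25 - 2971.00 = 185.25 > R_2$
1 vs. 4: $2971.00 - 2666.25 = 304.75 > R_3$
1 vs. 3: $2971.00 - 2933.75 = 37.25 < R_2$
3 vs. 4: $2933.75 - 2666.25 = 267.50 > R_2$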
From the analysis we observed that there are significant differences between all pairs of
means except 1 and 3.
Means with the same letter are not significantly different at $\alpha = 0.05$.
This makes no difference to the earlier conclusion from the LSD method: Duncan’s
multiple range test and the LSD method produce identical conclusions here.
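Duncan's least significant ranges can be approximated in base R instead of being read from a printed table. The sketch below assumes the usual definition in which the significant range for p means is a studentized-range quantile at protection level (1 - alpha)^(p - 1); the variable names are illustrative, and the results may differ slightly from values based on a table rounded to two decimals.

```r
ms_error <- 153908.25 / 12   # error mean square from the ANOVA table
n <- 4                       # replicates per treatment
df_error <- 12               # error degrees of freedom
alpha <- 0.05

# Standard error of a treatment average
se_mean <- sqrt(ms_error / n)

# Number of ranked means spanned by a comparison
p <- 2:4

# Significant ranges r_alpha(p, f) from the studentized range distribution
r_p <- qtukey((1 - alpha)^(p - 1), nmeans = p, df = df_error)

# Least significant ranges R_p = r_alpha(p, f) * standard error
R_p <- setNames(r_p * se_mean, paste0("R", p))

round(R_p, 4)
```

Each difference between ranked means is then compared against the R_p that matches the number of means it spans.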
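(b) Rework part (b) of Problem 3-1 using Tukey’s test with $\alpha = 0.05$. Do you get the same
conclusions from Tukey’s test that you did from the graphical procedure and/or
Duncan’s multiple range test?
$T_{0.05} = q_{0.05}(4, 12) \sqrt{\frac{MS_E}{n}} = 4.20 \sqrt{\frac{12825.69}{4}} = 4.20(56.625) = 237.825$
Thus, any pair of treatment averages that differ in absolute value by more than 237.825
would imply that the corresponding pair of population means is significantly different. The
four treatment averages are
$\bar{y}_{1.} = 2971.00$, $\bar{y}_{2.} = 3156.25$, $\bar{y}_{3.} = 2933.75$, $\bar{y}_{4.} = 2666.25$
And the differences in averages are
$\bar{y}_{1.} - \bar{y}_{2.} = 2971.00 - 3156.25 = -185.25$
$\bar{y}_{1.} - \bar{y}_{3.} = 2971.00 - 2933.75 = 37.25$
$\bar{y}_{1.} - \bar{y}_{4.} = 2971.00 - 2666.25 = 304.75$*
$\bar{y}_{2.} - \bar{y}_{3.} = 3156.25 - 2933.75 = 222.50$
$\bar{y}_{2.} - \bar{y}_{4.} = 3156.25 - 2666.25 = 490.00$*
$\bar{y}_{3.} - \bar{y}_{4.} = 2933.75 - 2666.25 = 267.50$*
The starred values indicate pairs of means that are significantly different.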
The conclusions are not the same. Under Duncan’s test, the mean of Treatment 4 differs
from the means of Treatments 1, 2, and 3, and the mean of Treatment 2 differs from the
means of Treatments 1 and 3. Under Tukey’s test, however, the mean of Treatment 2 does
not differ from the means of Treatments 1 and 3, although these were found to be
different using the graphical method and the Fisher LSD method.
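Tukey's single critical value can be sketched in base R with qtukey(). The variable names below are illustrative. Because qtukey() returns an unrounded quantile, the result is slightly smaller than the hand value of 237.825, which uses the tabulated 4.20.

```r
trt_means <- c(A = 2971.00, B = 3156.25, C = 2933.75, D = 2666.25)

ms_error <- 153908.25 / 12   # error mean square from the ANOVA table
n <- 4                       # replicates per treatment
df_error <- 12               # error degrees of freedom

# One critical value for every pairwise comparison
t_crit <- qtukey(0.95, nmeans = length(trt_means), df = df_error) *
  sqrt(ms_error / n)

# Pairwise differences and the Tukey decision for each pair
diffs <- outer(trt_means, trt_means, "-")
significant <- abs(diffs) > t_crit

t_crit
significant
```

Only the comparisons involving treatment D exceed the critical value, in line with the starred differences.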
Data Layout for Duncan’s Multiple Range Test
Group Treatment Means
a B 3156.25
b A 2971.00
b C 2933.75
c D 2666.25
(c) Explain the difference between the Tukey and Duncan procedures.
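Tukey’s test uses a single critical value, $T_\alpha = q_\alpha(a, f) S_{\bar{y}_{i.}}$, for every comparison, while
Duncan’s test uses several critical values, $R_p = r_\alpha(p, f) S_{\bar{y}_{i.}}$, one for each number p of ranked
means spanned by the comparison. Moreover, both procedures scale by the standard error of
each average, but Tukey’s multiplier comes from the studentized range statistic while
Duncan’s comes from the table of significant ranges.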
3-3 Reconsider the experiment in Problem 3-1. Find a 95 percent confidence interval on the
mean tensile strength of the Portland cement produced by each of the four mixing
techniques. Also find a 95 percent confidence interval on the difference in means for
techniques 1 and 3. Does this aid you in interpreting the results of the experiment?
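$\bar{y}_{i.} - t_{\alpha/2, N-a} \sqrt{\frac{MS_E}{n}} \le \mu_i \le \bar{y}_{i.} + t_{\alpha/2, N-a} \sqrt{\frac{MS_E}{n}}$
Treatment 1: $2971.00 \pm 2.179 \sqrt{\frac{12825.69}{4}} = 2971.00 \pm 123.387$
Thus, the desired 95 percent confidence interval is $2847.613 \le \mu_1 \le 3094.387$
Treatment 2: $3156.25 \pm 123.387$
Thus, the desired 95 percent confidence interval is $3032.863 \le \mu_2 \le 3279.637$
Treatment 3: $2933.75 \pm 123.387$
Thus, the desired 95 percent confidence interval is $2810.363 \le \mu_3 \le 3057.137$
Treatment 4: $2666.25 \pm 123.387$
Thus, the desired 95 percent confidence interval is $2542.863 \le \mu_4 \le 2789.637$
Treatment 1 - Treatment 3:
$\bar{y}_{1.} - \bar{y}_{3.} - t_{\alpha/2, N-a} \sqrt{\frac{2 MS_E}{n}} \le \mu_1 - \mu_3 \le \bar{y}_{1.} - \bar{y}_{3.} + t_{\alpha/2, N-a} \sqrt{\frac{2 MS_E}{n}}$
$2971.00 - 2933.75 \pm 2.179 \sqrt{\frac{2(12825.69)}{4}} = 37.25 \pm 174.495$
Thus, the desired 95 percent confidence interval on the difference between Treatment 1
and 3 is $-137.245 \le \mu_1 - \mu_3 \le 211.745$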
These intervals give a plausible range for the population mean of each treatment being
estimated. The interval on $\mu_1 - \mu_3$ includes zero, which agrees with the earlier finding
that techniques 1 and 3 do not differ significantly.
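The intervals can be reproduced with a few lines of base R. This is a minimal sketch that uses the pooled error mean square from the ANOVA table; the variable names are illustrative, and the unrounded t quantile gives half-widths that differ slightly from the hand values based on 2.179.

```r
# Treatment means, labelled by mixing technique
trt_means <- c("1" = 2971.00, "2" = 3156.25, "3" = 2933.75, "4" = 2666.25)

ms_error <- 153908.25 / 12   # pooled error mean square from the ANOVA table
n <- 4                       # replicates per treatment
df_error <- 12               # error degrees of freedom

t_crit <- qt(0.975, df_error)

# 95 percent confidence interval on each treatment mean
half_width <- t_crit * sqrt(ms_error / n)
ci_means <- cbind(lower = trt_means - half_width,
                  upper = trt_means + half_width)

# 95 percent confidence interval on the difference mu_1 - mu_3
d13 <- unname(trt_means["1"] - trt_means["3"])
half_width_diff <- t_crit * sqrt(2 * ms_error / n)
ci_diff <- c(lower = d13 - half_width_diff, upper = d13 + half_width_diff)

round(ci_means, 3)
round(ci_diff, 3)
```

If ci_diff straddles zero, the data do not separate the two techniques at the 5 percent level.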
III. R Codes Section
Note: The code for questions 3-1 (c), (d), and so on depends on the data entered in
3-1 (a). To avoid errors, run the code for each question in order, starting from
question 3-1 (a).
#(3-1.a) Test the hypothesis that mixing techniques affect the strength of
#the cement. Use alpha = 0.05.
#INPUT
TensileData <- read.table(header = TRUE, text = "
Treatment Observations Predicted
A 3129 2971
A 3000 2971
A 2865 2971
A 2890 2971
B 3200 3156.25
B 3300 3156.25
B 2975 3156.25
B 3150 3156.25
C 2800 2933.75
C 2900 2933.75
C 2985 2933.75
C 3050 2933.75
D 2600 2666.25
D 2700 2666.25
D 2600 2666.25
D 2765 2666.25")
attach(TensileData)
Model<-aov(Observations~Treatment, data=TensileData)
summary(Model)
#OUTPUT
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 3 489740 163247 12.73 0.000489 ***
Residuals 12 153908 12826
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#(3-1.b) Construct a graphical display as described in Section 3-5.3 to
#compare the mean tensile strengths for the four mixing techniques. What are
#your conclusions?
#INPUT
library(ggplot2) #Written for ggplot2 >= 3.4.0: theme() and element_*()
#replace the removed opts(), theme_text() and theme_rect()
x <- seq(-4.5, 4.5, length = 90)
xval <- c(2666.25, 2933.75, 2971, 3156.25) #treatment means, ascending
xvaltrn <- (xval - mean(xval))/(sd(xval)/sqrt(4)) #means converted to t-values
tvalues <- dt(x,15) #t density with 15 degrees of freedom
tdist <- data.frame(x = x, tvalues = tvalues)
#one dashed line per treatment mean, plus the grand mean at zero
vlines <- data.frame(xint = c(xvaltrn,mean(xvaltrn)),grp = letters[1:5])
ggplot(tdist, aes(x = x, y = tvalues)) +
geom_polygon(fill = "purple", colour = "purple", alpha = 0.5) +
geom_point(fill = "purple", colour = "purple", alpha = 0.2, pch = 21) +
geom_vline(data = vlines, aes(xintercept = xint, colour = grp),
linetype = "dashed", linewidth = 1) + theme_bw() +
xlab(bquote(bold('x values with intercept of Average Tensile Strength (lb/in'^'2'*')'))) +
ylab(expression(bold(P(x)))) +
ggtitle(expression(bold("Scaled t Distribution"))) +
theme(plot.title = element_text(size = 20, colour = "darkblue"),
panel.border = element_rect(fill = NA, linewidth = 2, colour = "red"))
#OUTPUT
#Refer to question 3-1 (b) of Computational and Graphical Section.
#(3-1.c) Use the Fisher LSD method with alpha = 0.05 to make comparisons between
#pairs of means.
#INPUT
library(agricolae)
LSD.test(Model,"Treatment", console = TRUE) #console = TRUE prints the report
#in recent versions of agricolae
#OUTPUT
Study:
LSD t Test for Observations
Mean Square Error: 12825.69
Treatment, means and individual ( 95 %) CI
Observations std.err replication LCL UCL
A 2971.00 60.27852 4 2839.664 3102.336
B 3156.25 67.98820 4 3008.116 3304.384
C 2933.75 54.13621 4 2815.797 3051.703
D 2666.25 40.48534 4 2578.040 2754.460
alpha: 0.05 ; Df Error: 12
Critical Value of t: 2.178813
Least Significant Difference 174.4798
Means with the same letter are not significantly different.
Groups, Treatments and means
a B 3156.25
b A 2971
b C 2933.75
c D 2666.25
#(3-1.d) Construct a normal probability plot of the residuals. What
#conclusion would you draw about the validity of the normality assumption?
#INPUT
Residuals <- Observations - Predicted #Make sure you run the
#attach(TensileData) first
library(ggplot2)
library(MASS)
df<-data.frame(x=sort(Residuals),y=qnorm(ppoints(length(Residuals))))
probs <- c(0.01, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.99)
qprobs<-qnorm(probs)
xl <- quantile(Residuals, c(0.25, 0.75))
yl <- qnorm(c(0.25, 0.75))
slope <- diff(yl)/diff(xl)
int <- yl[1] - slope * xl[1]
fd<-fitdistr(Residuals, "normal") #Maximum-likelihood Fitting of Univariate
#Dist from MASS
xp_hat<-fd$estimate[1]+qprobs*fd$estimate[2] #estimated perc. for the fitted
#normal
#var. of estimated perc
v_xp_hat<- fd$sd[1]^2+qprobs^2*fd$sd[2]^2+2*qprobs*fd$vcov[1,2]
xpl<-xp_hat + qnorm(0.025)*sqrt(v_xp_hat) #lower bound
xpu<-xp_hat + qnorm(0.975)*sqrt(v_xp_hat) #upper bound
df.bound<-data.frame(xp=xp_hat,xpl=xpl, xpu = xpu,nquant=qprobs)
#The code above was originally written by Julie B at stackoverflow.com.
#Link to her stackoverflow profile:
#http://stackoverflow.com/users/1200228/julie-b
#Link to the posted question in stackoverflow:
#http://stackoverflow.com/questions/3929611/recreate-minitab-normal-probability-plot
#Points, reference line through the quartiles, and 95 percent bounds
ggplot(data = df, aes(x = x, y = y)) + geom_point(colour = "darkred",
size = 3) + geom_abline(intercept = int,slope = slope, colour = "purple",
linewidth = 2, alpha = 0.5) +
scale_y_continuous(limits=range(qprobs), breaks=qprobs, labels =
100*probs) + geom_line(data=df.bound,aes(x = xpl, y = nquant), colour =
"darkgreen", alpha = 0.5, linewidth = 1) +
geom_line(data=df.bound,aes(x = xpu, y = nquant), colour = "darkgreen",
alpha = 0.5, linewidth = 1) +
xlab(expression(bold("Residuals"))) +
ylab(expression(bold("Normal % Probability"))) + theme_bw() +
ggtitle(expression(bold("Normal Probability Plot of Residuals"))) +
theme(plot.title = element_text(size = 20, colour = "darkblue"),
panel.border = element_rect(fill = NA, linewidth = 2, colour = "red"))
#OUTPUT
#Refer to question 3-1 (d) of Computational and Graphical Section.
#(3-1.e) Plot the residuals versus the predicted tensile strength. Comment on
#the plot.
#INPUT
library(colorRamps)
ggplot(data = TensileData, aes(x = Predicted, y = Residuals)) + ylim(c(-210,
210)) + geom_point(aes(size = 3, colour = matlab.like(16))) + theme_bw() +
xlab(expression(bold("Predicted Values"))) +
ylab(expression(bold("Residuals"))) +
ggtitle(expression(bold("Residuals versus Fitted"))) +
theme(plot.title = element_text(colour = "darkblue", size = 20),
panel.border = element_rect(fill = NA, linewidth = 2, colour = "red"),
legend.position = "none")
#OUTPUT
#Refer to question 3-1 (e) of Computational and Graphical Section.
#(3-1.f) Prepare a scatter plot of the results to aid the interpretation of
#the results of this experiment
#INPUT
ggplot(data = TensileData, aes(factor(Treatment), y = Observations)) +
geom_point(colour = "darkred", size = 3) +
geom_boxplot(aes(fill = factor(Treatment))) +
xlab(expression(bold("Mixing Technique"))) +
ylab(expression(bold("Strength"))) + theme_bw() +
ggtitle(bquote(bold('Mean of Tensile Strength (lb/in'^'2'*') by Treatment'))) +
theme(plot.title = element_text(size = 20, colour = "darkblue"),
panel.border = element_rect(fill = NA, linewidth = 2, colour = "red"),
legend.position = "none")
#OUTPUT
#Refer to question 3-1 (f) of Computational and Graphical Section.
#(3-2.a) Rework part (b) of Problem 3-1 using Duncan’s multiple range test
#with alpha = 0.05. Does this make any difference in your conclusions?
#INPUT
duncan.test(Model,"Treatment", console = TRUE) #console = TRUE prints the report
#in recent versions of agricolae
#OUTPUT
Study:
Duncan's new multiple range test
for Observations
Mean Square Error: 12825.69
Treatment, means
Observations std.err replication
A 2971.00 60.27852 4
B 3156.25 67.98820 4
C 2933.75 54.13621 4
D 2666.25 40.48534 4
alpha: 0.05 ; Df Error: 12
Critical Range
2 3 4
174.4798 182.6303 187.5686
Means with the same letter are not significantly different.
Groups, Treatments and means
a B 3156.25
b A 2971
b C 2933.75
c D 2666.25
#(3-2.b) Rework part (b) of Problem 3-1 using Tukey’s test with alpha = 0.05. Do
#you get the same conclusions from Tukey’s test that you did from the
#graphical procedure and/or Duncan’s multiple range test?
#INPUT
TukeyHSD(Model)
#OUTPUT
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Observations ~ Treatment, data = TensileData)
$Treatment
diff lwr upr p adj
B-A 185.25 -52.50029 423.00029 0.1493561
C-A -37.25 -275.00029 200.50029 0.9652776
D-A -304.75 -542.50029 -66.99971 0.0115923
C-B -222.50 -460.25029 15.25029 0.0693027
D-B -490.00 -727.75029 -252.24971 0.0002622
D-C -267.50 -505.25029 -29.74971 0.0261838