1/54
Statistics
Design of Experiment
2/54
Contents
An Introduction to Experimental Design
Completely Randomized Designs
Randomized Block Design
Factorial Experiments
3/54
An Introduction to Experimental Design
Statistical studies can be classified as being either experimental or observational.
In an experimental study, one or more factors are controlled so that data can be obtained about how the factors influence the variables of interest.
In an observational study, no attempt is made to control the factors.
4/54
An Introduction to Experimental Design
Cause-and-effect relationships are easier to establish in experimental studies than in observational studies.
5/54
An Introduction to Experimental Design
A factor is a variable that the experimenter has selected for investigation.
A treatment is a level of a factor.
Experimental units are the objects of interest in the experiment.
6/54
An Introduction to Experimental Design
Example: Suppose we want to examine whether there is a difference among working methods A, B, and C. 18 workers are randomly chosen.
The factor is the working method.
A treatment is one of A, B, and C.
The experimental units are the workers chosen.
7/54
An Introduction to Experimental Design
A completely randomized design is an experimental design in which the treatments are randomly assigned to the experimental units.
If the experimental units are heterogeneous, blocking can be used to form homogeneous groups, resulting in a randomized block design.
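The random assignment step of a completely randomized design can be sketched in a few lines. This is a minimal illustration for the 18-worker example, not a prescribed procedure: the function name `assign_treatments`, the worker labels, and the choice of equal group sizes (6 workers per method) are assumptions made here.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly assign treatments to units in equal-sized groups (assumed design)."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    group_size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * group_size:(i + 1) * group_size]
        for i, treatment in enumerate(treatments)
    }

# Hypothetical labels for the 18 randomly chosen workers
workers = [f"worker_{i}" for i in range(1, 19)]
assignment = assign_treatments(workers, ["A", "B", "C"], seed=0)
for method, group in assignment.items():
    print(method, group)
```

Shuffling the whole list once and then slicing it keeps the groups disjoint, so each worker receives exactly one working method.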
8/54
Completely Randomized Design
Recall Simple Random Sampling
Finite populations are often defined by lists such as:
Organization membership roster
Credit card account numbers
Inventory product numbers
9/54
Completely Randomized Design
A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
Replacing each sampled element before selecting subsequent elements is called sampling with replacement.
Sampling without replacement is the procedure used most often.
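The difference between the two procedures can be illustrated with Python's standard `random` module. This is only a sketch: the population of 100 numbered elements and the sample size of 10 are assumptions chosen for illustration.

```python
import random

rng = random.Random(42)

# Hypothetical finite population: elements numbered 1..100 (N = 100)
population = list(range(1, 101))

# Sampling without replacement: no element can be selected twice
without_replacement = rng.sample(population, k=10)

# Sampling with replacement: each draw is made from the full population,
# so the same element may appear more than once
with_replacement = rng.choices(population, k=10)

print(without_replacement)
print(with_replacement)
```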
10/54
Completely Randomized Design
In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.
Excel provides a function for generating random numbers in its worksheets.
Infinite populations are often defined by an ongoing process whereby the elements of the population consist of items generated as though the process would operate indefinitely.
11/54
Completely Randomized Design
A simple random sample from an infinite population is a sample selected such that the following conditions are satisfied:
Each element selected comes from the same population.
Each element is selected independently.
12/54
Completely Randomized Design
Random Numbers: the digits in the table are random, so all four-digit numbers are equally likely.
13/54
Completely Randomized Design
Most experiments make critical errors in random sampling.
Ex: sampling 8 samples from a production line in one day
Wrong method:
Get one sample every 3 hours: not random!
14/54
Completely Randomized Design
Ex: sampling 8 samples from a production line
Correct method:
You can take one sample within each 3-hour interval, but not exactly every 3 hours. This is correct, but it is not simple random sampling.
Simple random sampling: get 8 samples in 24 hours. The population size is 24 (hours), and 8 samples are drawn.
Read two-digit random numbers: 63, 27, 15, 99, 86, 71, 74, 45, 10, 21, 51, …
Numbers larger than 24 are discarded.
So the eight samples are collected at hours 15, 10, 21, …
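The discard rule for two-digit random numbers can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated on the slide: the number 00 is discarded along with numbers above 24, and repeated hours are skipped so that the result is a sample without replacement.

```python
import random

def select_hours(random_numbers, n_samples=8, n_hours=24):
    """Keep the first n_samples distinct numbers that fall in 1..n_hours."""
    hours = []
    for number in random_numbers:
        # Assumption: 00, numbers above n_hours, and repeats are discarded
        if 1 <= number <= n_hours and number not in hours:
            hours.append(number)
        if len(hours) == n_samples:
            break
    return hours

# The two-digit numbers listed in the example
table_numbers = [63, 27, 15, 99, 86, 71, 74, 45, 10, 21, 51]
first_hours = select_hours(table_numbers)
print(first_hours)  # [15, 10, 21]

def random_two_digit_numbers(seed=None):
    """Endless stream of two-digit random numbers (00-99), replacing the table."""
    rng = random.Random(seed)
    while True:
        yield rng.randint(0, 99)

hours = select_hours(random_two_digit_numbers(seed=1))
print(hours)
```

The listed numbers yield only three usable hours, so the table has to be read further (or a generator used, as in the second call) until eight hours are collected.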
15/54
Completely Randomized Design
In a completely randomized design, samples are collected by simple random sampling.
A completely randomized design involves only one factor, which has k levels.
16/54
Completely Randomized Design
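Data:
Treatment    Observations                                                Sample         Sample
(level)      1           2           …    i           …    n_j           mean           SD
1            $x_{11}$    $x_{12}$    …    $x_{1i}$    …    $x_{1n_1}$    $\bar{x}_1$    $s_1$
2            $x_{21}$    $x_{22}$    …    $x_{2i}$    …    $x_{2n_2}$    $\bar{x}_2$    $s_2$
.
j            $x_{j1}$    $x_{j2}$    …    $x_{ji}$    …    $x_{jn_j}$    $\bar{x}_j$    $s_j$
.
.
k            $x_{k1}$    $x_{k2}$    …    $x_{ki}$    …    $x_{kn_k}$    $\bar{x}_k$    $s_k$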
17/54
Completely Randomized Design
Between-Treatments Estimate of Population Variance
The between-samples estimate of $\sigma^2$ is referred to as the mean square due to treatments (MSTR).
$$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$
The numerator is called the sum of squares due to treatments (SSTR).
The denominator is the degrees of freedom associated with SSTR.
18/54
Completely Randomized Design
Within-Treatments Estimate of Population Variance
The second estimate of $\sigma^2$, the within-samples estimate, is referred to as the mean square due to error (MSE).
$$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$$
The numerator is called the sum of squares due to error (SSE).
The denominator is the degrees of freedom associated with SSE.
19/54
ANOVA Table
Completely Randomized Design
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares              F
Treatments            SSTR             k - 1                MSTR = SSTR/(k - 1)       MSTR/MSE
Error                 SSE              nT - k               MSE = SSE/(nT - k)
Total                 SST              nT - 1
20/54
Completely Randomized Design--Example
Example: AutoShine, Inc.
AutoShine, Inc. is considering marketing a long-lasting car wax. Three different waxes (Type 1, Type 2, and Type 3) have been developed.
21/54
Completely Randomized Design--Example
In order to test the durability of these waxes, 5 new cars were waxed with Type 1, 5 with Type 2, and 5 with Type 3. Each car was then repeatedly run through an automatic carwash until the wax coating showed signs of deterioration.
22/54
The number of times each car went through the carwash is shown on the next slide. AutoShine, Inc. must decide which wax to market. Are the three waxes equally effective?
Completely Randomized Design--Example
23/54
Observation        Wax Type 1   Wax Type 2   Wax Type 3
1                  27           33           29
2                  30           28           28
3                  29           31           30
4                  28           30           32
5                  31           30           31
Sample Mean        29.0         30.4         30.0
Sample Variance    2.5          3.3          2.5
Completely Randomized Design--Example
24/54
Hypotheses
H0: μ1 = μ2 = μ3
Ha: Not all the means are equal
where: μ1 = mean number of washes for Type 1 wax
μ2 = mean number of washes for Type 2 wax
μ3 = mean number of washes for Type 3 wax
Completely Randomized Design--Example
25/54
Mean Square Between Treatments
Because the sample sizes are all equal:
$\bar{\bar{x}} = (\bar{x}_1 + \bar{x}_2 + \bar{x}_3)/3 = (29 + 30.4 + 30)/3 = 29.8$
$\text{SSTR} = 5(29 - 29.8)^2 + 5(30.4 - 29.8)^2 + 5(30 - 29.8)^2 = 5.2$
MSTR = 5.2/(3 - 1) = 2.6
Mean Square Error
SSE = 4(2.5) + 4(3.3) + 4(2.5) = 33.2
MSE = 33.2/(15 - 3) = 2.77
Completely Randomized Design--Example
26/54
Rejection Rule
p-Value Approach: Reject H0 if p-value < .05
Critical Value Approach: Reject H0 if F > 3.89
where F.05 = 3.89 is based on an F distribution with 2 numerator degrees of freedom and 12 denominator degrees of freedom
Completely Randomized Design--Example
27/54
Test Statistic
F = MSTR/MSE = 2.6/2.77 = .939
Conclusion
The p-value is greater than .10, where F = 2.81. (Excel provides a p-value of .42.) Therefore, we cannot reject H0.
There is insufficient evidence to conclude that the mean numbers of washes for the three wax types are not all the same.
Completely Randomized Design--Example
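The AutoShine calculation can be reproduced with a short script. This is a sketch using only the Python standard library; the function name `one_way_anova` and the dictionary layout are choices made here, and the critical value 3.89 is taken from the rejection rule in the example rather than computed.

```python
from statistics import mean, variance

def one_way_anova(samples):
    """Return SSTR, SSE, MSTR, MSE and F for a completely randomized design."""
    k = len(samples)                          # number of treatments
    n_total = sum(len(s) for s in samples)    # nT
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Between-treatments sum of squares
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-treatments sum of squares, from the sample variances
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return {"SSTR": sstr, "SSE": sse, "MSTR": mstr, "MSE": mse, "F": mstr / mse}

# Number of washes for Wax Type 1, 2 and 3
wax = [
    [27, 30, 29, 28, 31],
    [33, 28, 31, 30, 30],
    [29, 28, 30, 32, 31],
]
result = one_way_anova(wax)
reject_h0 = result["F"] > 3.89  # critical value F.05 with 2 and 12 d.f.
print(result, reject_h0)
```

The F value comes out near 0.94, well below 3.89, so H0 is not rejected, in agreement with the conclusion above.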
28/54
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares   F
Treatments            5.2              2                    2.60           .939
Error                 33.2             12                   2.77
Total                 38.4             14
Completely Randomized Design--Example
29/54
Minitab Outputs
Completely Randomized Design--Example
One-way ANOVA: Observation versus Type
Source  DF     SS    MS     F      P
Type     2   5.20  2.60  0.94  0.418
Error   12  33.20  2.77
Total   14  38.40
S = 1.663 R-Sq = 13.54% R-Sq(adj) = 0.00%
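The summary line of the output can be related to the ANOVA table with the usual definitions: S is the square root of MSE and R-Sq is the share of SST explained by the treatments. The sketch below uses those definitions; the variable names are chosen here, and clipping a negative adjusted value to zero is an assumption about how the reported 0.00% arises.

```python
import math

# Values from the ANOVA table above
sstr, sse, sst = 5.20, 33.20, 38.40
df_error, df_total = 12, 14

s = math.sqrt(sse / df_error)                       # square root of MSE
r_sq = sstr / sst                                   # proportion of SST due to treatments
r_sq_adj = 1 - (sse / df_error) / (sst / df_total)  # adjusted for degrees of freedom
r_sq_adj_reported = max(0.0, r_sq_adj)              # assumption: negative values shown as 0

print(f"S = {s:.3f}")
print(f"R-Sq = {100 * r_sq:.2f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj_reported:.2f}%")
```

Here the unclipped adjusted value is slightly negative because MSE (2.77) exceeds SST/14, which is another sign that the wax type explains very little of the variation.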
30/54
Randomized Block Design
In a completely randomized design, to test for a difference among treatment means we compute an F value by using the ratio
$$F = \frac{\text{MSTR}}{\text{MSE}}$$
A problem can arise whenever differences due to extraneous factors (not considered in the experiment) cause the MSE to become large. The F value then becomes small, and a real difference among treatments may go undetected.
31/54
In the previous example, the time elapsed since the wax was produced is not considered, but it could make a difference.
When examining the effect of shift in a production line, shift is the factor but gender is not considered. It could also make a difference.
Randomized Block Design
32/54
To isolate a known effect caused by extraneous factors, one can “block” it by using the Randomized Block Design.
After blocking the extraneous effect, the value of MSE becomes smaller and the power of the F test increases.
Randomized Block Design
33/54
Randomized Block Design
ANOVA Procedure
For a randomized block design the sum of squares total (SST) is partitioned into three groups: sum of squares due to treatments, sum of squares due to blocks, and sum of squares due to error.
SST = SSTR + SSBL + SSE
The total degrees of freedom, nT - 1, are partitioned such that k - 1 degrees of freedom go to treatments, b - 1 go to blocks, and (k - 1)(b - 1) go to the error term.
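As a quick check that this partition uses up all the degrees of freedom, the three terms can be added. The identity below assumes one observation per treatment in each block, so that nT = kb:

$$(k - 1) + (b - 1) + (k - 1)(b - 1) = kb - 1 = n_T - 1$$

For instance, with k = 3 treatments and b = 5 blocks (the sizes used in the Crescent Oil example of this deck), the terms are 2 + 4 + 8 = 14 = 15 - 1.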
34/54
Randomized Block Design
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                     F
Treatments            SSTR             k - 1                MSTR = SSTR/(k - 1)              MSTR/MSE
Blocks                SSBL             b - 1                MSBL = SSBL/(b - 1)
Error                 SSE              (k - 1)(b - 1)       MSE = SSE/[(k - 1)(b - 1)]
Total                 SST              nT - 1
35/54
Randomized Block Design
Example: Crescent Oil Co.
Crescent Oil has developed three new blends of gasoline and must decide which blend or blends to produce and distribute. A study of the miles per gallon ratings of the three blends is being conducted to determine if the mean ratings are the same for the three blends.
36/54
Randomized Block Design
Five automobiles have been tested using each of the three gasoline blends and the miles per gallon ratings are shown on the next slide.
37/54
Randomized Block Design
                      Type of Gasoline (Treatment)
Automobile (Block)    Blend X   Blend Y   Blend Z   Block Means
1                     31        30        30        30.333
2                     30        29        29        29.333
3                     29        29        28        28.667
4                     33        31        29        31.000
5                     26        25        26        25.667
Treatment Means       29.8      28.8      28.4
38/54
Randomized Block Design
The overall sample mean is 29. Thus,
Mean Square Due to Treatments
$\text{SSTR} = 5[(29.8 - 29)^2 + (28.8 - 29)^2 + (28.4 - 29)^2] = 5.2$
MSTR = 5.2/(3 - 1) = 2.6
Mean Square Due to Blocks
$\text{SSBL} = 3[(30.333 - 29)^2 + \dots + (25.667 - 29)^2] = 51.33$
MSBL = 51.33/(5 - 1) = 12.8
Mean Square Due to Error
SSE = 62 - 5.2 - 51.33 = 5.47
MSE = 5.47/[(3 - 1)(5 - 1)] = .68
39/54
Randomized Block Design
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares   F
Treatments            5.20             2                    2.60           3.82
Blocks                51.33            4                    12.80
Error                 5.47             8                    .68
Total                 62.00            14
40/54
Rejection Rule
Randomized Block Design
p-Value Approach: Reject H0 if p-value < .05
Critical Value Approach: Reject H0 if F > 4.46
For α = .05, F.05 = 4.46 (2 d.f. numerator and 8 d.f. denominator)
41/54
Randomized Block Design
Test Statistic
F = MSTR/MSE = 2.6/.68 = 3.82
Conclusion
The p-value is greater than .05 (where F = 4.46) and less than .10 (where F = 3.11). (Excel provides a p-value of .07). Therefore, we cannot reject H0.
There is insufficient evidence to conclude that the miles per gallon ratings differ for the three gasoline blends.
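The Crescent Oil computation can be checked with a short script. This is a sketch in plain Python; the function name `randomized_block_anova` and the row-per-block data layout are choices made here. Because the script does not round MSE to .68 before dividing, it gives F of about 3.80 rather than 3.82; the conclusion is unchanged.

```python
def randomized_block_anova(table):
    """table[i][j] is the observation for block i under treatment j."""
    b = len(table)       # number of blocks
    k = len(table[0])    # number of treatments
    n_total = b * k
    grand_mean = sum(sum(row) for row in table) / n_total
    treatment_means = [sum(row[j] for row in table) / b for j in range(k)]
    block_means = [sum(row) / k for row in table]

    sst = sum((x - grand_mean) ** 2 for row in table for x in row)
    sstr = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssbl = k * sum((m - grand_mean) ** 2 for m in block_means)
    sse = sst - sstr - ssbl

    mstr = sstr / (k - 1)
    mse = sse / ((k - 1) * (b - 1))
    return {"SST": sst, "SSTR": sstr, "SSBL": ssbl, "SSE": sse,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}

# Rows are automobiles 1-5 (blocks); columns are Blend X, Y, Z (treatments)
mpg = [
    [31, 30, 30],
    [30, 29, 29],
    [29, 29, 28],
    [33, 31, 29],
    [26, 25, 26],
]
block_result = randomized_block_anova(mpg)
reject_h0_blocks = block_result["F"] > 4.46  # F.05 with 2 and 8 d.f.
print(block_result, reject_h0_blocks)
```

Most of the total variation (51.33 of 62) is attributed to the automobiles, which is why blocking on them shrinks the error term so much.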
42/54
Factorial Experiments
In some experiments we want to draw conclusions about more than one variable or factor.
Factorial experiments and their corresponding ANOVA computations are valuable designs when simultaneous conclusions about two or more factors are required.
43/54
Factorial Experiments
The term factorial is used because the experimental conditions include all possible combinations of the factors.
For example, for a levels of factor A and b levels of factor B, the experiment will involve collecting data on ab treatment combinations.
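Enumerating the ab treatment combinations is a Cartesian product of the factor levels. The sketch below borrows the levels from the wage survey example later in this deck (2 industries, 3 locations); the variable names are chosen here.

```python
from itertools import product

industries = ["I", "II"]                              # factor A, a = 2 levels
locations = ["Cincinnati", "Cleveland", "Columbus"]   # factor B, b = 3 levels

# Every level of A paired with every level of B: a * b combinations
treatment_combinations = list(product(industries, locations))
for combination in treatment_combinations:
    print(combination)
```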
44/54
Two-Factor Factorial Experiment
ANOVA Procedure
The ANOVA procedure for the two-factor factorial experiment is similar to that for the completely randomized experiment and the randomized block experiment.
We again partition the sum of squares total (SST) into its sources.
SST = SSA + SSB + SSAB + SSE
45/54
SST = SSA + SSB + SSAB + SSE
The total degrees of freedom, nT - 1, are partitioned such that (a - 1) d.f. go to Factor A, (b - 1) d.f. go to Factor B, (a - 1)(b - 1) d.f. go to Interaction, and ab(r - 1) d.f. go to Error.
Two-Factor Factorial Experiment
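The four degrees-of-freedom terms can be added to confirm that they account for the whole total. The identity assumes r replications in every one of the ab cells, so that nT = abr:

$$(a - 1) + (b - 1) + (a - 1)(b - 1) + ab(r - 1) = abr - 1 = n_T - 1$$

With a = 2, b = 3, and r = 3, the terms are 1 + 2 + 2 + 12 = 17 = 18 - 1.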
46/54
Two-Factor Factorial Experiment
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                      F
Factor A              SSA              a - 1                MSA = SSA/(a - 1)                 MSA/MSE
Factor B              SSB              b - 1                MSB = SSB/(b - 1)                 MSB/MSE
Interaction           SSAB             (a - 1)(b - 1)       MSAB = SSAB/[(a - 1)(b - 1)]      MSAB/MSE
Error                 SSE              ab(r - 1)            MSE = SSE/[ab(r - 1)]
Total                 SST              nT - 1
47/54
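Two-Factor Factorial Experiment
Step 1 Compute the total sum of squares
$$\text{SST} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{r} (x_{ijk} - \bar{\bar{x}})^2$$
Step 2 Compute the sum of squares for factor A
$$\text{SSA} = br \sum_{i=1}^{a} (\bar{x}_{i.} - \bar{\bar{x}})^2$$
Step 3 Compute the sum of squares for factor B
$$\text{SSB} = ar \sum_{j=1}^{b} (\bar{x}_{.j} - \bar{\bar{x}})^2$$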
48/54
Two-Factor Factorial Experiment
Step 4 Compute the sum of squares for interaction
$$\text{SSAB} = r \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{\bar{x}})^2$$
Step 5 Compute the sum of squares due to error
SSE = SST - SSA - SSB - SSAB
49/54
Example: State of Ohio Wage Survey
A survey was conducted of hourly wages for a sample of workers in two industries at three locations in Ohio. Part of the purpose of the survey was to determine if differences exist in both industry type and location. The sample data are shown on the next slide.
Two-Factor Factorial Experiment
50/54
Example: State of Ohio Wage Survey
Two-Factor Factorial Experiment
Industry   Cincinnati   Cleveland   Columbus
I          12.10        11.80       12.90
I          11.80        11.20       12.70
I          12.10        12.00       12.20
II         12.40        12.60       13.00
II         12.50        12.00       12.10
II         12.00        12.50       12.70
51/54
Factors
Factor A: Industry Type (2 levels)
Factor B: Location (3 levels)
Replications
Each experimental condition is repeated 3 times
Two-Factor Factorial Experiment
52/54
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares   F
Factor A              .50              1                    .50            4.19
Factor B              1.12             2                    .56            4.69
Interaction           .37              2                    .19            1.55
Error                 1.43             12                   .12
Total                 3.42             17
Two-Factor Factorial Experiment
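The five computation steps can be applied to the wage survey data with a short script. This is a sketch in plain Python; the function name `two_factor_anova` and the nested-list layout (industry, then location, then replicates) are choices made here.

```python
def two_factor_anova(data):
    """data[i][j] is the list of r replicates for level i of A and level j of B."""
    a, b, r = len(data), len(data[0]), len(data[0][0])
    n_total = a * b * r
    grand_mean = sum(x for row in data for cell in row for x in cell) / n_total
    a_means = [sum(x for cell in row for x in cell) / (b * r) for row in data]
    b_means = [sum(x for row in data for x in row[j]) / (a * r) for j in range(b)]
    cell_means = [[sum(cell) / r for cell in row] for row in data]

    # Steps 1-5: SST, SSA, SSB, SSAB, then SSE by subtraction
    sst = sum((x - grand_mean) ** 2 for row in data for cell in row for x in cell)
    ssa = b * r * sum((m - grand_mean) ** 2 for m in a_means)
    ssb = a * r * sum((m - grand_mean) ** 2 for m in b_means)
    ssab = r * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand_mean) ** 2
        for i in range(a) for j in range(b)
    )
    sse = sst - ssa - ssb - ssab

    mse = sse / (a * b * (r - 1))
    return {
        "SST": sst, "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
        "F_A": (ssa / (a - 1)) / mse,
        "F_B": (ssb / (b - 1)) / mse,
        "F_AB": (ssab / ((a - 1) * (b - 1))) / mse,
    }

# wages[industry][location] -> three replicates
# locations in order: Cincinnati, Cleveland, Columbus
wages = [
    [[12.10, 11.80, 12.10], [11.80, 11.20, 12.00], [12.90, 12.70, 12.20]],  # Industry I
    [[12.40, 12.50, 12.00], [12.60, 12.00, 12.50], [13.00, 12.10, 12.70]],  # Industry II
]
wage_result = two_factor_anova(wages)
for name, value in wage_result.items():
    print(f"{name} = {value:.2f}")
```

Rounded to two decimals, the sums of squares and F values agree with the ANOVA table for this example.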
53/54
Conclusions Using the p-Value Approach
Industries: p-value = .06 > α = .05
Mean wages do not differ by industry type.
Locations: p-value = .03 < α = .05
Mean wages differ by location.
Interaction: p-value = .25 > α = .05
Interaction is not significant.
(p-values were found using Excel)
Two-Factor Factorial Experiment
54/54
Conclusions Using the Critical Value Approach
Industries: F = 4.19 < F.05 = 4.75
Mean wages do not differ by industry type.
Locations: F = 4.69 > F.05 = 3.89
Mean wages differ by location.
Interaction: F = 1.55 < F.05 = 3.89
Interaction is not significant.
Two-Factor Factorial Experiment