54
1/54 Statistics Design of Experiment

Statistics

  • Upload
    vahe

  • View
    45

  • Download
    1

Embed Size (px)

DESCRIPTION

Statistics. Design of Experiment. Contents. An Introduction to Experimental Design Completely Randomized Designs Randomized Block Design Factorial Experiments. An Introduction to Experimental Design. Statistical studies can be classified as being either experimental or observational. - PowerPoint PPT Presentation

Citation preview

Page 1: Statistics

1/54

Statistics

Design of Experiment

Page 2: Statistics

2/54

An Introduction to Experimental Design Completely Randomized Designs Randomized Block Design Factorial Experiments

Contents

Page 3: Statistics

3/54

An Introduction to Experimental Design

Statistical studies can be classified as being either experimental or observational.

In an experimental study, one or more factors are controlled so that data can be obtained about how the factors influence the variables of interest.

In an observational study, no attempt is made to control the factors.

Page 4: Statistics

4/54

An Introduction to Experimental Design

Cause-and-effect relationships are easier to establish in experimental studies than in observational studies.

Page 5: Statistics

5/54

An Introduction to Experimental Design

A factor is a variable that the experimenter has selected for investigation.

A treatment is a level of a factor. Experimental units are the objects of

interest in the experiment.

Page 6: Statistics

6/54

An Introduction to Experimental Design

Example:Suppose we want to examine if there is a

difference among working methods A, B, and C. 18 workers are randomly chosen.

factor is the working methods. treatment is one of A, B, and C.Experimental units are the workers chosen

Page 7: Statistics

7/54

An Introduction to Experimental Design

A completely randomized design is an experimental design in which the treatments are randomly assigned to the experimental units.

If the experimental units are heterogeneous, blocking can be used to form homogeneous groups, resulting in a randomized block design.

Page 8: Statistics

8/54

Completely Randomized Design

Recall Simple Random Sampling Finite populations are often defined by lists

such as: Organization membership roster Credit card account numbers Inventory product numbers

Page 9: Statistics

9/54

Completely Randomized Design

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.

Replacing each sampled element before selecting subsequent elements is called sampling with replacement.

Sampling without replacement is the procedure used most often.

Page 10: Statistics

10/54

Completely Randomized Design

In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.

Excel provides a function for generating random numbers in its worksheets.

Infinite populations are often defined by an ongoing process whereby the elements of the population consist of items generated as though the process would operate indefinitely.

Page 11: Statistics

11/54

Completely Randomized Design

A simple random sample from an infinite population is a sample selected such that the following conditions are satisfied. Each element selected comes from the same

population. Each element is selected independently.

Page 12: Statistics

12/54

Completely Randomized Design

Random Numbers: the numbers in the table are random, these four-digit numbers are equally likely.

Page 13: Statistics

13/54

Completely Randomized Design

Most experiments have critical error on random sampling.

Ex: sampling 8 samples from a production line in one day Wrong method:

Get one sample every 3 hours not random!

Page 14: Statistics

14/54

Completely Randomized Design

Ex: sampling 8 samples from a production line Correct method:

You can get one sample at each 3 hours interval but not every 3 hours correct but not a simple random sampling

Get 8 samples in 24 hours Maximum population is 24, getting 8 samples two digits 63, 27, 15, 99, 86, 71, 74, 45, 10, 21, 51, … Larger than 24 is discarded So eight samples are collected at:

15, 10, 21, … hour

Page 15: Statistics

15/54

Completely Randomized Design

In Completely Randomized Design, samples are randomly collected by simple random sampling method.

Only one factor is concerned in Completely Randomized Design, and k levels in this factor.

Page 16: Statistics

16/54

Completely Randomized Design

Data:Treatment Observations sample sample(level) 1 2 i … nj mean SD

1 x11 x12 x1i … s1

2 x21 x22 x1i … s2

.

j xj1 xj2 xji … sj

.

.

k xk1 xk2 xki … sk

1x11nx

22nx

jjnx

kknx

2x

jx

kx

Page 17: Statistics

17/54

2

1( )

MSTR 1

k

j jj

n x x

k

Completely Randomized Design

denominator is thedenominator is thedegrees of freedomdegrees of freedomassociated with SSTRassociated with SSTR

numerator is callednumerator is calledthe the sum of squares sum of squares duedueto treatmentsto treatments (SSTR) (SSTR)

Between-Treatments Estimate of Population Variance

The between-samples estimate of σ2 is referred to as the mean square due to treatments (MSTR).

Page 18: Statistics

18/54

Within-Treatments Estimate of Population Variance.

The second estimate of σ2, the within-samples estimate, is referred to as the mean square due to error (MSE).

Completely Randomized Design

denominator is denominator is thethedegrees of degrees of freedomfreedomassociated with associated with SSESSE

numerator is numerator is calledcalledthe the sum of sum of squaressquaresdue to errordue to error (SSE)(SSE)

Page 19: Statistics

19/54

ANOVA Table

Completely Randomized Design

Source ofSource ofVariationVariation

Sum ofSum ofSquaresSquares

Degrees ofDegrees ofFreedomFreedom

MeanMeanSquaresSquares FF

TreatmentsTreatments

ErrorError

TotalTotal

kk - 1 - 1

nnTT - 1 - 1

SSTRSSTR

SSESSE

SSTSST

nnT T - - kk

Page 20: Statistics

20/54

Completely Randomized Design--Example

Example: AutoShine, Inc. AutoShine, Inc. is considering marketing a

long-lasting car wax. Three different waxes (Type 1, Type 2, and Type 3) have been developed.

Page 21: Statistics

21/54

Completely Randomized Design--Example

In order to test the durability of these waxes, 5 new cars were waxed with Type 1, 5 with Type 2, and 5 with Type 3. Each car was then repeatedly run through an automatic carwash until the wax coating showed signs of deterioration.

Page 22: Statistics

22/54

The number of times each car went through the carwash is shown on the next slide. AutoShine, Inc. must decide which wax to market. Are the three waxes equally effective?

Completely Randomized Design--Example

Page 23: Statistics

23/54

1122334455

27273030292928283131

33332828313130303030

29292828303032323131

Sample MeanSample MeanSample VarianceSample Variance

ObservationObservationWaxWax

Type 1Type 1WaxWax

Type 2Type 2WaxWax

Type 3Type 3

2.52.5 3.3 3.3 2.5 2.529.029.0 30.4 30.4 30.030.0

Completely Randomized Design--Example

Page 24: Statistics

24/54

Hypotheses

where: μ1 = mean number of washes for Type 1 wax

μ2 = mean number of washes for Type 2 wax

μ3 = mean number of washes for Type 3 wax

H0: 1=2=3

Ha: Not all the means are equal

Completely Randomized Design--Example

Page 25: Statistics

25/54

Because the sample sizes are all equal:

MSE = 33.2/(15 - 3) = 2.77

MSTR = 5.2/(3 - 1) = 2.6

SSE = 4(2.5) + 4(3.3) + 4(2.5) = 33.2

SSTR = 5(29–29.8)2 + 5(30.4–29.8)2 + 5(30–29.8)2 = 5.2

Mean Square Error

Mean Square Between Treatments 1 2 3( )/ 3x x x x= (29 + 30.4 + 30)/3 = 29.8

Completely Randomized Design--Example

Page 26: Statistics

26/54

Rejection Rule

where F.05 = 3.89 is based on an F distributionwith 2 numerator degrees of freedom and 12denominator degrees of freedom

p-Value Approach: Reject H0 if p-value < .05

Critical Value Approach: Reject H0 if F > 3.89

Completely Randomized Design--Example

Page 27: Statistics

27/54

Test Statistic

There is insufficient evidence to conclude thatthe mean number of washes for the three waxtypes are not all the same.

Conclusion

F = MSTR/MSE = 2.6/2.77 = .939

The p-value is greater than .10, where F = 2.81. (Excel provides a p-value of .42.) Therefore, we cannot reject H0.

Completely Randomized Design--Example

Page 28: Statistics

28/54

Source ofSource ofVariationVariation

Sum ofSum ofSquaresSquares

Degrees ofDegrees ofFreedomFreedom

MeanMeanSquaresSquares FF

TreatmentsTreatmentsErrorError

TotalTotal

22

1414

5.25.233.233.2

38.438.4

12122.602.602.772.77

.939.939

ANOVA Table

Completely Randomized Design--Example

Page 29: Statistics

29/54

Minitab Outputs

Completely Randomized Design--Example

One-way ANOVA: Observation versus Type

Source DF SS MS F PType 2 5.20 2.60 0.94 0.418Error 12 33.20 2.77Total 14 38.40

S = 1.663 R-Sq = 13.54% R-Sq(adj) = 0.00%

Page 30: Statistics

30/54

In Completely Randomized Experimental Design, the test for a difference among treatment means, we compute a F value by using the ratio

A problem can arise whenever differences due to extraneous factors (not considered in the experiment) cause the MSE become large.

Randomized Block Design

MSEMSTRF

Page 31: Statistics

31/54

In previous example, the time after wax is produced is not considered, but it could make differences.

To examine the effects on shift in a production line, shift is a factor but gender is not considered. It could also make differences.

Randomized Block Design

Page 32: Statistics

32/54

To isolate the effects of a known effect caused by extraneous factors, one can “block” it by using the Randomized Block Design.

After blocking the extraneous effect, the value of MSE becomes smaller and the ability of F test increases.

Randomized Block Design

Page 33: Statistics

33/54

ANOVA Procedure For a randomized block design the sum of squares

total (SST) is partitioned into three groups: sum of squares due to treatments, sum of squares due to blocks, and sum of squares due to error.

The total degrees of freedom, nT - 1, are partitioned such that k - 1 degrees of freedom go to treatments, b - 1 go to blocks, and (k - 1)(b - 1) go to the error term.

Randomized Block Design

SST = SSTR + SSBL + SSESST = SSTR + SSBL + SSE

Page 34: Statistics

34/54

Source ofSource ofVariationVariation

Sum ofSum ofSquaresSquares

Degrees ofDegrees ofFreedomFreedom

MeanMeanSquaresSquares FF

TreatmentsTreatments

ErrorError

TotalTotal

kk - 1 - 1

nnTT - 1 - 1

SSTRSSTR

SSESSE

SSTSST

Randomized Block Design

ANOVA Table

BlocksBlocks SSBLSSBL bb - 1 - 1

((k k – 1)(– 1)(bb – 1) – 1)

SSBLMSBL -1b

Page 35: Statistics

35/54

Randomized Block Design

Example: Crescent Oil Co.

Crescent Oil has developed three new blends of gasoline and must decide which blend or blends to produce and distribute. A study of the miles per gallon ratings of the three blends is being conducted to determine if the mean ratings are the same for the three blends.

Page 36: Statistics

36/54

Randomized Block Design

Five automobiles have been tested using each of the three gasoline blends and the miles per gallon ratings are shown on the next slide.

Page 37: Statistics

37/54

Randomized Block Design

29.8 29.8 28.8 28.8 28.4 28.4TreatmentTreatmentMeansMeans

1122334455

31313030292933332626

30302929292931312525

30302929282829292626

30.33330.33329.33329.33328.66728.66731.00031.00025.66725.667

Type of Gasoline (Treatment)Type of Gasoline (Treatment) BlockBlockMeansMeansBlend XBlend X Blend YBlend Y Blend ZBlend Z

AutomobileAutomobile(Block)(Block)

Page 38: Statistics

38/54

Mean Square Due to Error

Randomized Block Design

MSE = 5.47/[(3 - 1)(5 - 1)] = .68

SSE = 62 - 5.2 - 51.33 = 5.47MSBL = 51.33/(5 - 1) = 12.8

SSBL = 3[(30.333 - 29)2 + . . . + (25.667 - 29)2] = 51.33

MSTR = 5.2/(3 - 1) = 2.6 SSTR = 5[(29.8 - 29)2 + (28.8 - 29)2 + (28.4 - 29)2] = 5.2

The overall sample mean is 29. Thus, Mean Square Due to Treatments

Mean Square Due to Blocks

Page 39: Statistics

39/54

Source ofSource ofVariationVariation

Sum ofSum ofSquaresSquares

Degrees ofDegrees ofFreedomFreedom

MeanMeanSquaresSquares FF

TreatmentsTreatments

ErrorError

TotalTotal

22

1414

5.205.20

5.475.47

62.0062.00

88

2.602.60

.68.68

3.823.82

ANOVA Table

Randomized Block Design

BlocksBlocks 51.3351.33 12.8012.8044

Page 40: Statistics

40/54

Rejection Rule

Randomized Block Design

For = .05, F.05 = 4.46 (2 d.f. numerator and 8 d.f. denominator)

p-Value Approach: Reject H0 if p-value < .05

Critical Value Approach: Reject H0 if F > 4.46

Page 41: Statistics

41/54

Conclusion

Randomized Block Design

There is insufficient evidence to conclude thatthe miles per gallon ratings differ for the threegasoline blends.

The p-value is greater than .05 (where F = 4.46) and less than .10 (where F = 3.11). (Excel provides a p-value of .07). Therefore, we cannot reject H0.

F = MSTR/MSE = 2.6/.68 = 3.82

Test Statistic

Page 42: Statistics

42/54

Factorial Experiments

In some experiments we want to draw conclusions about more than one variable or factor.

Factorial experiments and their corresponding ANOVA computations are valuable designs when simultaneous conclusions about two or more factors are required.

Page 43: Statistics

43/54

Factorial Experiments The term factorial is used because the

experimental conditions include all possible combinations of the factors.

For example, for a levels of factor A and b levels of factor B, the experiment will involve collecting data on ab treatment combinations.

Page 44: Statistics

44/54

ANOVA ProcedureThe ANOVA procedure for the two-factor

factorial experiment is similar to the completely randomized experiment and the randomized block experiment

We again partition the sum of squares total (SST) into its sources.

SST = SSA + SSB + SSAB + SSESST = SSA + SSB + SSAB + SSE

Two-Factor Factorial Experiment

Page 45: Statistics

45/54

SST = SSA + SSB + SSAB + SSESST = SSA + SSB + SSAB + SSE

The total degrees of freedom, nT - 1, are partitioned such that (a – 1) d.f go to Factor A, (b – 1) d.f go to Factor B, (a – 1)(b – 1) d.f. go to Interaction, and ab(r – 1) go to Error.

Two-Factor Factorial Experiment

Page 46: Statistics

46/54

Two-Factor Factorial Experiment

SSAMSA -1a

MSAMSE

Source ofSource ofVariationVariation

Sum ofSum ofSquaresSquares

Degrees ofDegrees ofFreedomFreedom

MeanMeanSquaresSquares FF

Factor AFactor A

ErrorError

TotalTotal

aa - 1 - 1

nnTT - 1 - 1

SSASSA

SSESSE

SSTSST

Factor BFactor B SSBSSB bb - 1 - 1

abab((rr – 1) – 1)

SSEMSE ( 1)ab r

SSBMSB -1b

InteractionInteraction SSABSSAB ((a a – 1)(– 1)(bb – 1) – 1) SSABMSAB ( 1)( 1)a b

MSBMSE

MSABMSE

Page 47: Statistics

47/54

Step 3 Compute the sum of squares for factor B

2

1 1 1SST = ( )

a b r

ijki j k

x x

2

1SSA = ( . )

a

ii

br x x

2

1SSB = ( . )

b

jj

ar x x

Two-Factor Factorial Experiment

Step 1 Compute the total sum of squares

Step 2 Compute the sum of squares for factor A

Page 48: Statistics

48/54

Step 4 Compute the sum of squares for interaction

2

1 1SSAB = ( . . )

a b

ij i ji j

r x x x x

Two-Factor Factorial Experiment

SSE = SST – SSA – SSB - SSABSSE = SST – SSA – SSB - SSAB

Step 5 Compute the sum of squares due to error

Page 49: Statistics

49/54

Example: State of Ohio Wage Survey A survey was conducted of hourly wages for a

sample of workers in two industries at three locations in Ohio. Part of the purpose of the survey was to determine if differences exist in both industry type and location. The sample data are shown on the next slide.

Two-Factor Factorial Experiment

Page 50: Statistics

50/54

Example: State of Ohio Wage Survey

Two-Factor Factorial Experiment

Industry Cincinnati Cleveland ColumbusI 12.10 11.80 12.90I 11.80 11.20 12.70I 12.10 12.00 12.20II 12.40 12.60 13.00II 12.50 12.00 12.10II 12.00 12.50 12.70

Page 51: Statistics

51/54

Factors Factor A: Industry Type (2 levels) Factor B: Location (3 levels)

Replications Each experimental condition is repeated 3

times

Two-Factor Factorial Experiment

Page 52: Statistics

52/54

Source ofSource ofVariationVariation

Sum ofSum ofSquaresSquares

Degrees ofDegrees ofFreedomFreedom

MeanMeanSquaresSquares FF

Factor AFactor A

ErrorError

TotalTotal

11

1717

.50.50

1.431.43

3.423.42

1212

.50.50

.12.12

4.194.19

ANOVA Table

Factor BFactor B 1.121.12 .56.5622InteractionInteraction .37.37 .19.1922

4.694.691.551.55

Two-Factor Factorial Experiment

Page 53: Statistics

53/54

Conclusions Using the p-Value Approach Industries: p-value = .06 > = .05

Mean wages do not differ by industry type. Locations: p-value = .03 < = .05

Mean wages differ by location. Interaction: p-value = .25 > = .05

Interaction is not significant.(p-values were found using Excel)

Two-Factor Factorial Experiment

Page 54: Statistics

54/54

Conclusions Using the Critical Value ApproachIndustries: F = 4.19 < F = 4.75

Mean wages do not differ by industry type.Locations: F = 4.69 > F = 3.89

Mean wages differ by location.Interaction: F = 1.55 < F = 3.89

Interaction is not significant.

Two-Factor Factorial Experiment