Anova single factor

ANOVA

One-Way Single-Factor Models

KARAN DESAI - 11BIE001

DHRUV PATEL - 11BIE024

VISHAL DERASHRI - 11BIE030

HARDIK MEHTA - 11BIE037

MALAV BHATT - 11BIE056

DEFINITION

Analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, used to analyze the differences between group means (such as the "variation" among and between groups). It was developed by R. A. Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation.

Sir Ronald Aylmer Fisher, FRS, was an English statistician, evolutionary biologist, geneticist, and eugenicist.

Why ANOVA?

• How do we compare the means of more than two populations?

• How do we compare populations that each contain several subgroups or levels?

Problem with multiple t-tests

• One problem with this approach is that the number of tests grows quickly as the number of groups increases.

• The probability of making at least one Type I error increases as the number of tests increases.

• If the probability of a Type I error for each test is set at 0.05 and 10 independent t-tests are done, the overall probability of at least one Type I error for the set of tests is 1 – (0.95)^10 ≈ 0.40 instead of 0.05.
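The arithmetic in the last bullet can be checked with a short sketch. It assumes the tests are independent, which is what the formula 1 – (0.95)^10 implies; the function name `familywise_error_rate` is ours and does not come from the slides.

```python
def familywise_error_rate(alpha, num_tests):
    """Probability of at least one Type I error across independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** num_tests


# Overall error rate for 1, 3 and 10 tests at alpha = 0.05
for m in (1, 3, 10):
    print(m, round(familywise_error_rate(0.05, m), 4))
```

With 10 tests the rate is roughly 0.40, matching the figure quoted on the slide.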


In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. As doing multiple two-sample t-tests would result in an increased chance of committing a statistical type-I error, ANOVAs are useful in comparing (testing) three or more means (groups or variables) for statistical significance.


• Another way to describe the multiple comparisons problem is to think about the meaning of an alpha level = 0.05

• An alpha of 0.05 implies that, when the null hypothesis is true, we expect on average one Type I error in every 20 tests: 1/20 = 0.05.

• This means that, by chance alone, a true null hypothesis will be incorrectly rejected about once in every 20 tests.

• As the number of tests increases, the probability of finding a ‘significant’ result by chance increases.


Importance of ANOVA

• ANOVA is an important test because it enables us to see, for example, how effective two different types of treatment are and how durable they are.

• In effect, an ANOVA can tell us how well a treatment works, how long it lasts and how budget-friendly it will be.

CLASSIFICATION OF ANOVA MODEL

1. Fixed-effects models: The fixed-effects model of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see if the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

2. Random-effects models: Random-effects models are used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments (a multi-variable generalization of simple differences) differ from the fixed-effects model.

3. Mixed-effects models: A mixed-effects model contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.

Example: Teaching experiments could be performed by a university department to find a good introductory textbook, with each text considered a treatment. The fixed-effects model would compare a list of candidate texts. The random-effects model would determine whether important differences exist among a list of randomly selected texts. The mixed-effects model would compare the (fixed) incumbent texts to randomly selected alternatives.

ASSUMPTIONS

Normal distribution

Variances of dependent variable are equal in all populations

Random samples; independent scores

One-Way Single-Factor ANOVA

ONE-WAY ANOVA

One factor (manipulated variable)

One response variable

Two or more groups to compare

USEFULNESS

Similar to t-test

More versatile than t-test

Compare one parameter (response variable) between two or more groups


Remember that…

Standard deviation (s):

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

In this case: Degrees of freedom (df)

df = Number of observations or groups - 1
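As a quick illustration of the standard deviation formula, the sketch below computes s with the n - 1 divisor and compares it with Python's built-in `statistics.stdev`. The data values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation: squared deviations from the mean,
    divided by n - 1 (the degrees of freedom), then square-rooted."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(sample_std(scores), statistics.stdev(scores))
```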

ANOVA

ANOVA (ANalysis Of VAriance) is a natural extension of the two-sample t-test, used to compare the means of more than 2 populations.

Basic Question: Even if the true means of the populations were equal (i.e. μ1 = μ2 = μ3 = μ4), we cannot expect the sample means (x̄1, x̄2, x̄3, x̄4) to be equal. So when we get different values for the x̄'s: How much is due to randomness? How much is due to the fact that we are sampling from different populations with possibly different μj's?

ANOVA TERMINOLOGY

Response Variable (y) – What we are measuring

Experimental Units – The individual units that we will measure

Factors – Independent variables whose values can change to affect the outcome of the response variable, y

Levels of Factors – Values of the factors

Treatments – The combination of the levels of the factors applied to an experimental unit

Example

We want to know how combinations of different amounts of water (1 ac-ft, 3 ac-ft, 5 ac-ft) and different fertilizers (A, B, C) affect crop yields.

Response variable – crop yield (bushels/acre)

Experimental unit – each acre that receives a treatment

Factors (2) – water and fertilizer

Levels (3 for water; 3 for fertilizer) – Water: 1, 3, 5; Fertilizer: A, B, C

Treatments (9 = 3x3) – 1A, 3A, 5A, 1B, 3B, 5B, 1C, 3C, 5C

Total Treatments

Water \ Fertilizer    A              B              C
1 AC-FT               Treatment 1    Treatment 2    Treatment 3
3 AC-FT               Treatment 4    Treatment 5    Treatment 6
5 AC-FT               Treatment 7    Treatment 8    Treatment 9
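The nine treatments are simply every pairing of a water level with a fertilizer. A minimal sketch using `itertools.product` makes that explicit; the variable names are illustrative, not from the slides.

```python
from itertools import product

water_levels = [1, 3, 5]          # acre-feet of water
fertilizers = ["A", "B", "C"]

# Each treatment is one (water level, fertilizer) combination.
treatments = [f"{water}{fert}" for water, fert in product(water_levels, fertilizers)]
print(len(treatments), treatments)
```

The count is 3 x 3 = 9, and the order produced here follows the table row by row (1A, 1B, 1C, 3A, ...).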

Single Factor ANOVA: Basic Assumptions

If we focus on only one factor (e.g. fertilizer type in the previous example), this is called single factor ANOVA. In this case, levels and treatments are the same thing since there are no combinations between factors.

Assumptions for Single Factor ANOVA

1. Each population in the comparison has a normal distribution.

2. The standard deviations of the populations (although unknown) are assumed to be equal (i.e. σ1 = σ2 = … = σk).

3. Sampling is random and independent.

Example

The university would like to know if the delivery mode of the introductory statistics class affects the performance in the class as measured by the scores on the final exam.

The class is given in four different formats: Lecture, Text Reading, Videotape, Internet.

The final exam scores from random samples of students from each of the four teaching formats were recorded.

Samples

Summary

There is a single factor under observation: teaching format.

There are k = 4 different treatments (or levels of teaching format).

The numbers of observations (experimental units) are n1 = 7, n2 = 8, n3 = 6, n4 = 5; total number of observations, n = 26.

Treatment means: $\bar{x}_1 = 76$, $\bar{x}_2 = 65$, $\bar{x}_3 = 75$, $\bar{x}_4 = 74$

Grand mean (of all 26 observations): $\bar{x} = 72$
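The grand mean can be recovered from the treatment sizes and treatment means alone, as a weighted average. The sketch below uses only the numbers in the summary (sizes 7, 8, 6, 5 and means 76, 65, 75, 74); the variable names are ours.

```python
sizes = [7, 8, 6, 5]          # n1..n4 from the summary
means = [76, 65, 75, 74]      # treatment means from the summary

n_total = sum(sizes)
# Each treatment mean is weighted by its number of observations.
grand_mean = sum(n_j * mean_j for n_j, mean_j in zip(sizes, means)) / n_total
print(n_total, grand_mean)
```

This reproduces n = 26 and a grand mean of 72.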

Why aren't all the x̄'s the same?

There is variability due to the different treatments: Between Treatment Variability (Treatment).

There is variability due to randomness within each treatment: Within Treatment Variability (Error).

BASIC CONCEPT

If the average Between Treatment Variability is "large" compared to the average Within Treatment Variability, we can reasonably conclude that there really are differences among the population means (i.e. at least one μj differs from the others).

Basic Questions

Given this basic concept, the natural questions are:

• What is "variability" due to treatment and due to error, and how are they measured?

• What is "average variability" due to treatment and due to error, and how are they measured?

• What is "large"? How much larger than the observed average variability due to error does the observed average variability due to treatment have to be before we are convinced that there are differences in the true population means (the μ's)?

How Is "Total" Variability Measured?

Variability is defined as the Sum of Square Deviations (from the grand mean). So,

SST (Total Sum of Squares): Sum of Squared Deviations of all observations from the grand mean. (McClave uses SSTotal)

SSTr (Between Treatment Sum of Squares): Sum of Square Deviations Due to Different Treatments. (McClave uses SST)

SSE (Within Treatment Sum of Squares): Sum of Square Deviations Due to Error.

SST = SSTr + SSE
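To see the partition SST = SSTr + SSE hold numerically, here is a small sketch on made-up data. The three groups below are hypothetical and are not the exam scores from the slides; only the three definitions above are taken from the text.

```python
import math

# Hypothetical observations for three treatments (not the slide data).
groups = {
    "A": [4, 5, 6],
    "B": [7, 8, 9, 10],
    "C": [1, 2, 3],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Total: every observation against the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Between treatments: each treatment mean against the grand mean, weighted by size.
sstr = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within treatments: every observation against its own treatment mean.
sse = sum(
    (x - sum(values) / len(values)) ** 2
    for values in groups.values()
    for x in values
)

print(sst, sstr, sse)
```

For this toy data the three sums come out as 82.5, 73.5 and 9, so the between and within parts add up to the total.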

How is "Average" Variability Measured?

"Average" Variability is measured in Mean Square Values (MSTr and MSE), found by dividing SSTr and SSE by their respective degrees of freedom.

ANOVA TABLE

Variability                SS      DF                              Mean Square (MS)
Between Tr. (Treatment)    SSTr    k-1  (# treatments - 1)         MSTr = SSTr/DFTr
Within Tr. (Error)         SSE     n-k  (DFT - DFTr)               MSE = SSE/DFE
TOTAL                      SST     n-1  (# observations - 1)
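For the teaching-format example (k = 4 treatments, n = 26 observations), the degrees of freedom in the table work out as sketched below. The helper name `anova_degrees_of_freedom` is ours, introduced only for illustration.

```python
def anova_degrees_of_freedom(num_treatments, num_observations):
    """Return (DFTr, DFE, DFT) for a single-factor ANOVA."""
    df_treatment = num_treatments - 1        # k - 1
    df_total = num_observations - 1          # n - 1
    df_error = df_total - df_treatment       # n - k
    return df_treatment, df_error, df_total


df_tr, df_e, df_t = anova_degrees_of_freedom(4, 26)
print(df_tr, df_e, df_t)
```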

Formula for Calculating SST

Calculating SST

Just like the numerator of the variance, assuming all (26) entries come from one population:

$$ SST = \sum_{j}\sum_{i} (x_{ij} - \bar{x})^2 = (82 - 72)^2 + \dots + (81 - 72)^2 = 4394 $$

Formula for Calculating SSTr

Calculating SSTr (Between Treatment Variability)

Replace all entries within each treatment by its mean. Now all the variability is between (not within) treatments:

Treatment 1: 76 76 76 76 76 76 76

Treatment 2: 65 65 65 65 65 65 65 65

Treatment 3: 75 75 75 75 75 75

Treatment 4: 74 74 74 74 74

$$ SSTr = \sum_{j} n_j (\bar{x}_j - \bar{x})^2 = 7(76 - 72)^2 + 8(65 - 72)^2 + 6(75 - 72)^2 + 5(74 - 72)^2 = 578 $$
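The SSTr figure can be checked directly from the treatment sizes and means given in the summary. This is only a sketch of the arithmetic; the variable names are ours.

```python
sizes = [7, 8, 6, 5]          # n1..n4
means = [76, 65, 75, 74]      # treatment means
grand_mean = 72

# Each term is n_j * (treatment mean - grand mean)^2.
terms = [n_j * (mean_j - grand_mean) ** 2 for n_j, mean_j in zip(sizes, means)]
sstr = sum(terms)
print(terms, sstr)
```

The four terms are 112, 392, 54 and 20, which sum to 578.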

Formula for Calculating SSE

Calculating SSE (Within Treatment Variability)

The difference between SST and SSTr:

$$ SSE = SST - SSTr = 4394 - 578 = 3816 $$

Can We Conclude a Difference Among the 4 Teaching Formats?

We conclude that at least one population mean differs from the others if the average between treatment variability is large compared to the average within treatment variability, that is, if MSTr/MSE is "large".

The ratio of the two measures of variability for these normally distributed random variables has an F distribution, and the F-statistic (= MSTr/MSE) is compared to a critical F-value from an F distribution with:

Numerator degrees of freedom = DFTr

Denominator degrees of freedom = DFE

If the ratio of MSTr to MSE (the F-statistic) exceeds the critical F-value, we can conclude that at least one population mean differs from the others.

Can We Conclude Different Teaching Formats Affect Final Exam Scores?

The F-test

H0: μ1 = μ2 = μ3 = μ4

HA: At least one μj differs from the others

Select α = .05.

Reject H0 (Accept HA) if:

$$ F = \frac{MSTr}{MSE} > F_{\alpha,\,DFTr,\,DFE} = F_{.05,\,3,\,22} = 3.05 $$

Hand Calculations for the F-test

$$ MSTr = \frac{SSTr}{DFTr} = \frac{578}{3} = 192.67 $$

$$ MSE = \frac{SSE}{DFE} = \frac{3816}{22} = 173.45 $$

$$ F = \frac{MSTr}{MSE} = \frac{192.67}{173.45} = 1.11 $$

Since 1.11 < 3.05, we cannot conclude there is a difference among the μj's.
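The hand calculation can be reproduced in a few lines. The sums of squares, degrees of freedom and the critical value 3.05 are taken from the slides; the variable names are ours.

```python
sstr, sse = 578, 3816      # sums of squares from the slides
df_tr, df_e = 3, 22        # k - 1 and n - k for k = 4, n = 26
f_critical = 3.05          # F(.05, 3, 22) as quoted on the slide

mstr = sstr / df_tr        # mean square for treatments
mse = sse / df_e           # mean square for error
f_stat = mstr / mse

reject_h0 = f_stat > f_critical
print(round(mstr, 2), round(mse, 2), round(f_stat, 2), reject_h0)
```

The F-statistic falls well below the critical value, so H0 is not rejected.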

Excel Approach

EXCEL OUTPUT

p-value = .365975 > .05

Cannot conclude differences
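For readers without Excel, the p-value can be approximated with the standard library alone. The sketch below is not how Excel computes it; it is one possible approach that uses the identity P(F > f) = I_x(DFE/2, DFTr/2) with x = DFE / (DFE + DFTr·f), where I_x is the regularized incomplete beta function, evaluated here by Simpson's rule. The function name `f_survival` and the step count are our own choices.

```python
import math


def f_survival(f_value, df_num, df_den, steps=2000):
    """Approximate P(F > f_value) for an F(df_num, df_den) distribution.

    Numerically integrates the incomplete beta function with Simpson's
    rule. This simple sketch assumes f_value > 0 and df_den >= 2, so the
    integrand stays bounded on the integration interval; steps must be even.
    """
    a, b = df_den / 2.0, df_num / 2.0
    x = df_den / (df_den + df_num * f_value)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    incomplete_beta = total * h / 3.0

    complete_beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return incomplete_beta / complete_beta


f_stat = (578 / 3) / (3816 / 22)       # MSTr / MSE from the hand calculation
p_value = f_survival(f_stat, 3, 22)
print(round(p_value, 4))
```

The result should land close to the Excel p-value of .365975, which is well above α = .05.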

REVIEW

• ANOVA Situation and Terminology: Response variable, Experimental Units, Factors, Levels, Treatments, Error

• Basic Concept: If the "average variability" between treatments is "a lot" greater than the "average variability" due to error, conclude that at least one mean differs from the others.

• Single Factor Analysis: By Hand, By Excel
