BIOSTATISTICS CORRELATION AND REGRESSION, ANOVA SHRIVARDHAN DHEEMAN GURUKUL KANGRI UNIVERSITY HARIDWAR

# Correlation and Regression; ANOVA



CORRELATION / CORRELATION ANALYSIS

Correlation is the tool we use to find a relationship (if one exists) between the two variables (bivariate) under study.

Correlation analysis comprises the methods and techniques used for studying and measuring the extent of the relationship between two variables.

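FIRST, UNDERSTAND THE TERM BIVARIATE

Examples of bivariate distributions:

In a field of 10 plants: the height of each plant and the number of flowers on it

In a class of 60 students: the marks obtained in two subjects by each student

| S. No. | Height of plant | Flowers on plant |
|--------|-----------------|------------------|
| 1 | 4 | 12 |
| 2 | 3 | 10 |
| 3 | 4 | 13 |
| 4 | 5 | 15 |
| 5 | 5 | 16 |
| 6 | 4 | 11 |
| 7 | 6 | 18 |
| 8 | 3 | 9 |
| 9 | 5 | 14 |
| 10 | 4 | 12 |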

[Figure: chart of the plant data; series: Height of plant, Flowers on plant]

TYPES OF CORRELATION

Analytical: positive and negative

Graphical: linear and non-linear

POSITIVE CORRELATION

The two variables change in the same direction: as one increases, so does the other.

e.g.

Turbidity in a culture and OD (optical density)

Concentration of antibiotic and zone of clearance

NEGATIVE CORRELATION

The two variables change in opposite directions: as one increases, the other decreases.

e.g.

Volume and pressure of a gas

Demand for grain and price

LINEAR CORRELATION

This category is based on the graphical representation: a correlation whose graph is a straight line is called a linear correlation.

A unit change in one variable results in a constant corresponding change in the other variable over the entire range of values:

e.g.

| X | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|----|
| Y | 7 | 13 | 19 | 25 | 31 |

• For a unit change in the value of X there is a constant change in the corresponding value of Y, and the above data can be expressed by the relation

$$Y = 3X + 1$$

In general, two variables X and Y are said to be linearly related if there exists a relationship of the form

$$Y = a + bX$$

where a and b are real numbers.
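The check for a linear relationship can be sketched in a few lines of Python. This is only an illustration: the function name `linear_coefficients` and the tolerance are assumptions introduced here, and the data are the X/Y pairs from the example.

```python
# X and Y values copied from the linear correlation example.
xs = [2, 4, 6, 8, 10]
ys = [7, 13, 19, 25, 31]

def linear_coefficients(xs, ys):
    """Return (a, b) if every point satisfies y = a + b*x, otherwise None."""
    b = (ys[1] - ys[0]) / (xs[1] - xs[0])  # rate of change from the first two points
    a = ys[0] - b * xs[0]                  # intercept implied by the first point
    for x, y in zip(xs, ys):
        if abs((a + b * x) - y) > 1e-9:    # a point off the line: not linear
            return None
    return a, b

coefficients = linear_coefficients(xs, ys)
print(coefficients)
```

For data that change at a varying rate, such as Y = X squared, the same function returns `None`.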

[Figure: Linear Correlation Graph; series: X, Y]

NON-LINEAR CORRELATION

The relationship between two variables is non-linear if, for a unit change in one variable, the other variable does not change at a constant rate but at a fluctuating rate, so the graph is not a straight line.

[Figure: Non-Linear Correlation Graph; series: X, Y]

COEFFICIENT OF CORRELATION

The measure of the degree of association between two variables is called the coefficient of correlation (r). Its value lies between -1 and +1:

If the two sets of data have r = +1, there is a perfect positive correlation.

If the two sets of data have r = -1, there is a perfect negative correlation.

If the two sets of data have r = 0, there is no correlation.

SOLVED EXAMPLE

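Problem: Determine whether the number of flowers on a plant is correlated with the height of the plant.

| S. No. | Height of plant | Flowers on plant |
|--------|-----------------|------------------|
| 1 | 4 | 12 |
| 2 | 3 | 10 |
| 3 | 4 | 13 |
| 4 | 5 | 15 |
| 5 | 5 | 16 |
| 6 | 4 | 11 |
| 7 | 6 | 18 |
| 8 | 3 | 9 |
| 9 | 5 | 14 |
| 10 | 4 | 12 |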

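SOLUTION

| S. No. | Height of plant (x) | Flowers on plant (y) | x² | y² | xy |
|--------|---------------------|----------------------|----|----|----|
| 1 | 4 | 12 | 16 | 144 | 48 |
| 2 | 3 | 10 | 9 | 100 | 30 |
| 3 | 4 | 13 | 16 | 169 | 52 |
| 4 | 5 | 15 | 25 | 225 | 75 |
| 5 | 5 | 16 | 25 | 256 | 80 |
| 6 | 4 | 11 | 16 | 121 | 44 |
| 7 | 6 | 18 | 36 | 324 | 108 |
| 8 | 3 | 9 | 9 | 81 | 27 |
| 9 | 5 | 14 | 25 | 196 | 70 |
| 10 | 4 | 12 | 16 | 144 | 48 |
| Total | 43 | 130 | 193 | 1760 | 582 |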

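With n = 10 and the totals from the table:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

$$r = \frac{10(582) - (43)(130)}{\sqrt{\left[10(193) - (43)^2\right]\left[10(1760) - (130)^2\right]}}$$

$$r = \frac{5820 - 5590}{\sqrt{(1930 - 1849)(17600 - 16900)}}$$

$$r = \frac{230}{\sqrt{(81)(700)}} = \frac{230}{\sqrt{56700}} = \frac{230}{238.12}$$

$$r = 0.9659$$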
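The same calculation can be sketched in Python as a cross-check. The function name `pearson_r` is an assumption introduced here; the data are the ten height and flower pairs from the solved example, and the formula is the sum-based one used in the solution.

```python
import math

# Plant height (x) and flowers per plant (y), copied from the solved example.
heights = [4, 3, 4, 5, 5, 4, 6, 3, 5, 4]
flowers = [12, 10, 13, 15, 16, 11, 18, 9, 14, 12]

def pearson_r(x, y):
    """Coefficient of correlation from the raw sums of x, y, x^2, y^2 and xy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(heights, flowers)
print(round(r, 4))
```

A value this close to +1 indicates a strong positive correlation between plant height and number of flowers.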

REGRESSION

If the two variables are significantly correlated, and if there is some theoretical basis for doing so, it is possible to predict the value of one variable from the other. The method used for this is called regression analysis.

Regression is the "estimation or prediction of the unknown value of one variable from the known value of the other variable."

M. M. Blair stated that "regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data."

REGRESSION EQUATION

Size of sample = n

The two sets of measurements are denoted by X and Y.

We can predict the value of Y for a given value of X, denoted by X'.

The following is used as the regression equation:

$$Y = a + bX'$$

where a and b are the regression coefficients (a is the intercept and b is the slope).

EXAMPLE

Problem: Nitrogen produced by treatment plants was measured at mid term and at the final stage. Develop a regression equation that may be used to predict the final yield from the mid-term value.

| Treatment plant | Mid term | Final |
|-----------------|----------|-------|
| 1 | 98 | 90 |
| 2 | 66 | 74 |
| 3 | 100 | 98 |
| 4 | 96 | 88 |
| 5 | 88 | 80 |
| 6 | 45 | 62 |
| 7 | 76 | 78 |
| 8 | 60 | 74 |
| 9 | 74 | 86 |
| 10 | 82 | 80 |

SOLUTION

| Treatment plant | Mid term (x) | Final (y) | x² | xy |
|-----------------|--------------|-----------|----|----|
| 1 | 98 | 90 | 9604 | 8820 |
| 2 | 66 | 74 | 4356 | 4884 |
| 3 | 100 | 98 | 10000 | 9800 |
| 4 | 96 | 88 | 9216 | 8448 |
| 5 | 88 | 80 | 7744 | 7040 |
| 6 | 45 | 62 | 2025 | 2790 |
| 7 | 76 | 78 | 5776 | 5928 |
| 8 | 60 | 74 | 3600 | 4440 |
| 9 | 74 | 86 | 5476 | 6364 |
| 10 | 82 | 80 | 6724 | 6560 |
| Total | 785 | 810 | 64521 | 65074 |

The coefficients are computed as b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = (Σy - bΣx) / n.

Numerator of b = 10 × 65074 - 785 × 810

= 650740 - 635850

= 14890

Denominator of b = 10 × 64521 - (785)²

= 645210 - 616225

= 28985

Therefore b = 14890/28985

= 0.5137

Numerator of a = 810 - 785 × 0.5137

= 810 - 403.2545

= 406.7455

Denominator of a = 10

Thus,

Value of a = numerator of a / denominator of a

= 406.7455/10

= 40.6746

Considering the regression equation:

$$Y = a + b(X')$$

Y = predicted value

a = value obtained above

b = value obtained above

X' = value of X for which the prediction is desired

Thus, for X' = 50,

Y = 40.6746 + (0.5137)(50)

= 40.6746 + 25.685

= 66.3596
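As a cross-check, the regression coefficients can be recomputed directly from the ten data pairs. The sketch below is illustrative only: the names `fit_line` and `predict` are assumptions introduced here. It recomputes every sum from the raw data and keeps b at full floating-point precision instead of rounding to four decimals, so the last digits may differ slightly from a hand calculation.

```python
# Mid-term (x) and final (y) values copied from the example table.
mid_term = [98, 66, 100, 96, 88, 45, 76, 60, 74, 82]
final = [90, 74, 98, 88, 80, 62, 78, 74, 86, 80]

def fit_line(x, y):
    """Least-squares coefficients (a, b) of the line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(p * q for p, q in zip(x, y))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # intercept
    return a, b

def predict(a, b, x_new):
    """Predicted Y for a new X value, using Y = a + b*X'."""
    return a + b * x_new

a, b = fit_line(mid_term, final)
predicted_final = predict(a, b, 50)
print(f"a = {a:.4f}, b = {b:.4f}, Y(50) = {predicted_final:.4f}")
```

Recomputing the sums in code guards against the arithmetic slips that easily occur when squares and products are totalled by hand.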

ANOVA (ANALYSIS OF VARIANCE)

• A statistical hypothesis-testing method for the analysis of experimental data

• Used for making decisions from data

• Calculated from the null hypothesis and the sample data

Assuming the null hypothesis is true, the statistical result is used to decide whether to reject or accept it, and so to draw an inference about the variance of the data. If the null hypothesis is accepted, the variation is not significant, and vice versa.

When the ANOVA result is represented graphically, the graph is divided into two regions: an acceptance region, where the data support the hypothesis, and a rejection region, where the data do not support the hypothesis.

The null hypothesis is denoted by H0.

HISTORY OF ANOVA

In 1827, Laplace addressed ANOVA problems concerning measurements of atmospheric tides.

In 1918, Sir Ronald Fisher introduced the term variance in his article published that year under the title "The Correlation Between Relatives on the Supposition of Mendelian Inheritance".

Fisher presented the analysis of variance method in his book "Statistical Methods for Research Workers", published in 1925.

COMPONENT OF ANOVA: F TEST

The F test is used for the comparison of variances from a mixed population. It is recommended for ANOVA, where two estimates of the variance of the same sample are compared. While the F test is not generally robust against departures from normality, it has been found to be robust in the special case of ANOVA.

Moore and McCabe (2003) note that ANOVA uses F statistics, but these are not the same as the F statistic for comparing the standard deviations of two populations.

The F test is used for comparisons of the components of the total deviation. For example, in one-way (single-factor) ANOVA, statistical significance is tested by comparing the F test statistic, the ratio of the variance between treatments to the variance within treatments, with the F distribution.

WHAT IS ANOVA

The null hypothesis of ANOVA is that all groups are simple random samples from a single population, which implies that all treatments have the same effect.

ANOVA as a statistical design of experiments:

The experimenter adjusts the factors and measures the responses in an attempt to determine an effect.

ANOVA is the synthesis of several ideas and is used for multiple purposes. As a consequence, it is difficult to define concisely or precisely.

CHARACTERISTICS & LOGIC

Characteristics:

• Used in the analysis of comparative experiments

• Significance is determined by the ratio of two variances

Logic:

• The calculations of ANOVA can be characterized as computing a number of means and variances, dividing two variances, and comparing the ratio with a critical value to determine statistical significance.

• The effect of any treatment is estimated by taking the difference between the mean of the observations that receive the treatment and the general mean.
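The logic above (means, variances, then a ratio of two variances) can be sketched for a one-way layout as follows. The three groups of measurements are hypothetical values invented for illustration, not data from this presentation, and the function name `one_way_f` is likewise an assumption.

```python
# Hypothetical measurements for three treatment groups (not from the source).
groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

def mean(values):
    return sum(values) / len(values)

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: built from the treatment effects
    # (group mean minus general mean), weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scatter of each observation around its group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # F is the ratio of the two variance estimates (mean squares).
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

f_ratio, df_between, df_within = one_way_f(groups)
print(f_ratio, df_between, df_within)
```

The resulting F value would then be compared with the critical value of the F distribution for the two degrees of freedom, taken from an F table or a statistics library.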


TYPES OF ANOVA

One-way ANOVA:

This ANOVA tests a single hypothesis using the obtained data.

The hypothesis tested is the null hypothesis.

The single hypothesis concerns the effect of one factor on the variance in the data from random groups. The F-test then gives the limit between acceptance and rejection, and a graph is plotted to compare the critical F value with the F value obtained from the ANOVA.

Example:

Problem: Nitrogen produced by plants treated with fertilizer

H0: nitrogen is produced due to the fertilizer vs. by the plant itself

Two-way ANOVA:

This ANOVA differs significantly from one-way ANOVA in that two hypotheses can be tested simultaneously under the null hypothesis.

Of the two hypotheses, one is rejected and the other is accepted for the data.

Example:

Problem: Bacterial growth is observed as CFU on 28 solid-media plates, where temperature and pH are the factors affecting growth. To test these factors, we have to test two hypotheses:

H0: bacterial growth is inhibited due to temperature vs. pH

H0': bacterial growth is enhanced due to temperature vs. pH
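A two-factor analysis of this kind can be sketched as a two-way ANOVA without replication, which yields one F ratio per factor. The CFU counts below are hypothetical values invented for illustration (a 3 × 3 layout, not the 28 plates of the example), and the function name `two_way_f` is an assumption. In the conventional formulation each factor has its own null hypothesis, namely that the factor has no effect on growth.

```python
# Hypothetical CFU counts (not from the source): rows are temperature levels,
# columns are pH levels, with one plate per combination.
cfu = [
    [10.0, 12.0, 14.0],
    [13.0, 15.0, 20.0],
    [16.0, 21.0, 23.0],
]

def two_way_f(table):
    """Return (F_rows, F_columns) for a two-way ANOVA without replication."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    # Sum of squares attributable to each factor.
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    # Residual sum of squares: what neither factor explains.
    ss_error = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    ms_error = ss_error / ((n_rows - 1) * (n_cols - 1))
    f_rows = (ss_rows / (n_rows - 1)) / ms_error
    f_cols = (ss_cols / (n_cols - 1)) / ms_error
    return f_rows, f_cols

f_temperature, f_ph = two_way_f(cfu)
print(f_temperature, f_ph)
```

Each F ratio is compared with its own critical value, so the conclusion for temperature is reached independently of the conclusion for pH.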
