BIOSTATISTICS: CORRELATION AND REGRESSION, ANOVA
SHRIVARDHAN DHEEMAN
GURUKUL KANGRI UNIVERSITY
HARIDWAR
CORRELATION/CORRELATION ANALYSIS
Purpose: to find the relationship (if one exists) between two variables under study (bivariate data).
TOOL WE USE
Correlation analysis: the methods and techniques used for studying and measuring the extent of the relationship between two variables.
FIRST, UNDERSTAND THE TERM BIVARIATE
An example of a bivariate distribution will make the concept clear:
• 10 plants in a field: the height of each plant and the number of flowers on it
• 60 students in a class: the marks obtained by each student in two subjects
S. No.   Height of plant   Flowers on plant
1        4                 12
2        3                 10
3        4                 13
4        5                 15
5        5                 16
6        4                 11
7        6                 18
8        3                 9
9        5                 14
10       4                 12
[Figure: height of plant and flowers on plant for each of the 10 plants]
TYPES OF CORRELATION
• Analytical: positive, negative
• Graphical: linear, non-linear
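POSITIVE CORRELATION
Both variables change in the same direction: as one increases, the other also increases.
e.g.
Turbidity of a culture and its optical density (OD)
Concentration of an antibiotic and the zone of clearance
NEGATIVE CORRELATION
The variables change in opposite directions: as one increases, the other decreases.
e.g.
Volume and pressure of a gas (at constant temperature)
Demand for grain and its price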
LINEAR CORRELATION
This classification is based on the graphical representation: a correlation whose graph is a straight line is called a linear correlation.
A unit change in one variable results in a constant corresponding change in the other variable over the entire range of values:
e.g.
X   2   4    6    8    10
Y   7   13   19   25   31
• For each unit change in the value of X there is a constant change (here, 3 units) in the corresponding value of Y, so the above data can be expressed by the relation Y = 3X + 1.
In general, two variables X and Y are said to be linearly related if there exists a relationship of the form
Y = a + bX
where a and b are real numbers.
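As a quick sketch (the helper names `slopes` and `is_linear` are hypothetical, not from the source), the constant-rate property can be checked directly on the example data: every consecutive pair of points gives the same change in Y per unit change in X, and that constant slope b, together with one data point, fixes a in Y = a + bX.

```python
# A linear relation Y = a + bX gives the same change in Y per unit change
# in X between every pair of consecutive points.

def slopes(xs, ys):
    """Return the change in Y per unit change in X between consecutive points."""
    return [(y2 - y1) / (x2 - x1)
            for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])]

def is_linear(xs, ys, tol=1e-9):
    """True if every consecutive slope equals the first one (within tol)."""
    s = slopes(xs, ys)
    return all(abs(v - s[0]) <= tol for v in s)

X = [2, 4, 6, 8, 10]
Y = [7, 13, 19, 25, 31]

print(slopes(X, Y))      # [3.0, 3.0, 3.0, 3.0]
print(is_linear(X, Y))   # True

# Constant slope b, then intercept a = Y - bX from any one point.
b = slopes(X, Y)[0]
a = Y[0] - b * X[0]
print(a, b)              # 1.0 3.0  ->  Y = 1 + 3X
```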
[Figure: Linear Correlation Graph, Y plotted against X]
NON-LINEAR CORRELATION
The relationship between two variables is non-linear if, corresponding to a unit change in one variable, the other variable does not change at a constant rate but at a fluctuating rate. The graph is therefore not a straight line.
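For contrast, a hypothetical non-linear relation Y = X² (values made up for illustration, not taken from the source) shows a change in Y that differs from one unit step in X to the next:

```python
# Hypothetical data for illustration: Y = X**2 is a non-linear relation.
X = [1, 2, 3, 4, 5]
Y = [x ** 2 for x in X]            # [1, 4, 9, 16, 25]

# X increases in unit steps, so the first differences of Y are the
# change in Y per unit change in X.
changes = [y2 - y1 for y1, y2 in zip(Y, Y[1:])]
print(changes)                      # [3, 5, 7, 9]

# A linear relation would give one repeated value; here the rate keeps changing.
print(len(set(changes)) == 1)       # False
```

Plotted, these points curve upward instead of lying on a straight line.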
[Figure: Non-Linear Correlation Graph, Y plotted against X]
COEFFICIENT OF CORRELATION
The measure of the degree of association between two variables is called the coefficient of correlation (r). Its value always lies between -1 and +1:
If the two sets of data have r = +1, there is perfect positive correlation.
If the two sets of data have r = -1, there is perfect negative correlation.
If the two sets of data have r = 0, there is no (linear) correlation.
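A minimal sketch of Karl Pearson's product-moment formula for r, computed from raw paired data with the same sums (Σx, Σy, Σx², Σy², Σxy) used in the solved example. The function name `pearson_r` and the three toy data sets are illustrative assumptions, chosen so that r comes out exactly +1, -1 and 0.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation from raw paired data.

    Raises ZeroDivisionError if either variable is constant (zero spread).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = n * sxy - sx * sy
    den = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0 (perfect negative)
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))   # 0.0  (no linear correlation)
```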
SOLVED EXAMPLE
Problem: Determine whether the number of flowers on a plant is correlated with the height of the plant.
S. No.   Height of plant   Flowers on plant
1        4                 12
2        3                 10
3        4                 13
4        5                 15
5        5                 16
6        4                 11
7        6                 18
8        3                 9
9        5                 14
10       4                 12
SOLUTION
S. No.   Height of plant (x)   Flowers on plant (y)   x²    y²     xy
1        4                     12                     16    144    48
2        3                     10                     9     100    30
3        4                     13                     16    169    52
4        5                     15                     25    225    75
5        5                     16                     25    256    80
6        4                     11                     16    121    44
7        6                     18                     36    324    108
8        3                     9                      9     81     27
9        5                     14                     25    196    70
10       4                     12                     16    144    48
Total    43                    130                    193   1760   582
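With n = 10:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
$$r = \frac{10(582) - (43)(130)}{\sqrt{\left[10(193) - 43^2\right]\left[10(1760) - 130^2\right]}} = \frac{5820 - 5590}{\sqrt{(1930 - 1849)(17600 - 16900)}}$$
$$r = \frac{230}{\sqrt{(81)(700)}} = \frac{230}{\sqrt{56700}} = \frac{230}{238.12} \approx 0.9659$$
Since r ≈ 0.97 is close to +1, the number of flowers on a plant shows a strong positive correlation with the height of the plant.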
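To double-check the hand calculation, this short sketch recomputes the column totals and r directly from the ten (height, flowers) pairs in the table; variable names are illustrative.

```python
from math import sqrt

# Data from the solved example: height of plant (x) and flowers on plant (y).
height = [4, 3, 4, 5, 5, 4, 6, 3, 5, 4]
flowers = [12, 10, 13, 15, 16, 11, 18, 9, 14, 12]

n = len(height)
sum_x, sum_y = sum(height), sum(flowers)
sum_x2 = sum(x * x for x in height)
sum_y2 = sum(y * y for y in flowers)
sum_xy = sum(x * y for x, y in zip(height, flowers))
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)   # 43 130 193 1760 582

numerator = n * sum_xy - sum_x * sum_y                                      # 230
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))   # sqrt(56700)
r = numerator / denominator
print(round(r, 4))                            # 0.9659
```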
REGRESSION
If two variables are significantly correlated, and there is some theoretical basis for doing so, the value of one variable can be predicted from the other. The method used to do this is called regression analysis.
Regression is the estimation or prediction of the unknown value of one variable from the known value of the other variable.
M. M. Blair defined regression analysis as “a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.”
REGRESSION EQUATION
Let the sample size be n, and let the two sets of measurements be denoted by X and Y.
We can predict the value of Y for a chosen value of X, denoted X′.
The following equation is used as the regression equation:
Y = a + bX′
where
a and b = regression coefficients (a is the intercept and b is the slope)
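A minimal sketch of estimating a and b by least squares, using the same formulas applied in the worked example that follows: b = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²] and a = (Σy − bΣx) / n. The names `fit_line` and `predict` are hypothetical; the toy data are the linear-correlation example (X = 2, 4, ..., 10), so the expected result a = 1, b = 3 is easy to verify by eye.

```python
def fit_line(x, y):
    """Least-squares estimates of a (intercept) and b (slope) in Y = a + bX."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(p * q for p, q in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

def predict(a, b, x_new):
    """Predicted Y for a chosen value X' = x_new."""
    return a + b * x_new

a, b = fit_line([2, 4, 6, 8, 10], [7, 13, 19, 25, 31])
print(a, b)                 # 1.0 3.0
print(predict(a, b, 12))    # 37.0
```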
EXAMPLE
Problem: Nitrogen produced by each treatment plant was measured at mid-term and at the final stage. Develop a regression equation that can be used to predict the final yield from the mid-term value.
Treatment plant   Mid-term   Final
1                 98         90
2                 66         74
3                 100        98
4                 96         88
5                 88         80
6                 45         62
7                 76         78
8                 60         74
9                 74         86
10                82         80
SOLUTION
Treatment plant   Mid-term (x)   Final (y)   x²      xy
1                 98             90          9604    8820
2                 66             74          4356    4884
3                 100            98          10000   9800
4                 96             88          9216    8448
5                 88             80          7744    7040
6                 45             62          2025    2790
7                 76             78          5776    5928
8                 60             74          3600    4440
9                 74             86          5476    6364
10                82             80          6724    6560
Total             785            810         64521   65074
Numerator of b = 10 × 65074 - 785 × 810
= 650740 - 635850
= 14890
Denominator of b = 10 × 64521 - (785)²
= 645210 - 616225
= 28985
Therefore b = 14890/28985
≈ 0.5137
Numerator of a = 810 - 785 × 0.5137
= 810 - 403.2545
= 406.7455
Denominator of a = 10
Thus,
Value of a = numerator of a / denominator of a
= 406.7455/10
= 40.67455
Using the regression equation:
Y = a + b(X′)
where
Y = predicted value
a, b = values obtained above
X′ = the mid-term value for which a prediction is desired (here, 50)
Thus,
Y = 40.67455 + (0.5137)(50)
= 40.67455 + 25.685
= 66.35955
≈ 66.36
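As a cross-check, this sketch recomputes the totals, b, a and the prediction for X′ = 50 directly from the raw table. It carries full precision throughout, so a may differ in the last decimal places from a hand calculation that rounds b first; variable names are illustrative.

```python
# Mid-term (x) and final (y) values from the example table.
mid = [98, 66, 100, 96, 88, 45, 76, 60, 74, 82]
final = [90, 74, 98, 88, 80, 62, 78, 74, 86, 80]

n = len(mid)
sum_x, sum_y = sum(mid), sum(final)
sum_x2 = sum(x * x for x in mid)
sum_xy = sum(x * y for x, y in zip(mid, final))
print(sum_x, sum_y, sum_x2, sum_xy)                   # 785 810 64521 65074

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # 14890 / 28985
a = (sum_y - b * sum_x) / n
y_pred = a + b * 50                                             # X' = 50
print(round(b, 4), round(a, 4), round(y_pred, 2))     # 0.5137 40.6735 66.36
```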
ANOVA
ANALYSIS OF VARIANCE
• A statistical hypothesis-testing method for analyzing experimental data
• Used for making decisions from data
• Calculated from the null hypothesis and the sample data
Assuming the null hypothesis is true, the test result from the sample data justifies either rejecting or accepting the null hypothesis, and so supports an inference about the variance of the data. If the null hypothesis is accepted, the variation is not significant; if it is rejected, the variation is significant.
When the result of ANOVA is represented graphically, the graph is divided into two regions: an acceptance region, where the data support the hypothesis, and a rejection region, where the data do not support the hypothesis.
The null hypothesis is denoted by H0.
HISTORY OF ANOVA
In 1827, Laplace addressed an ANOVA problem concerning measurements of atmospheric tides.
In 1918, Sir Ronald Fisher introduced the term variance in his article published that year, “The Correlation between Relatives on the Supposition of Mendelian Inheritance”.
Fisher's analysis of variance became widely known through his 1925 book, Statistical Methods for Research Workers.
COMPONENT OF MEASURE OF ANOVA: F-TEST
The F-test for comparing variances has a mixed reputation. It is recommended for ANOVA, where two estimates of the variance of the same sample are compared. While the F-test is not generally robust against departures from normality, it has been found to be robust in the special case of ANOVA.
As Moore and McCabe (2003) note, ANOVA uses F statistics, but these are not the same as the F statistic for comparing the standard deviations of two populations.
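The F-test is used to compare the components of the total deviation. For example, in one-way (single-factor) ANOVA, statistical significance is tested by comparing the F test statistic
$$F = \frac{\text{variance between treatments}}{\text{variance within treatments}} = \frac{MS_{\text{treatments}}}{MS_{\text{error}}} = \frac{SS_{\text{treatments}}/(k-1)}{SS_{\text{error}}/(N-k)}$$
with the F-distribution with k − 1 and N − k degrees of freedom, where k is the number of groups and N is the total number of observations.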
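A minimal standard-library sketch of the one-way F statistic, F = MS_between / MS_within. The function name `one_way_anova_f` and the three groups of data are hypothetical, chosen so the arithmetic is easy to follow. The sketch stops at F and its degrees of freedom; the decision still needs a critical value from an F table (or a statistics package such as `scipy.stats.f_oneway`, which returns the statistic and a p-value).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of observations per treatment group.
    """
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total observations N
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups (treatment) sum of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups (error) sum of squares
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical data: three treatment groups
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
print(one_way_anova_f(groups))   # (12.0, 2, 6)
```

Here SS_between = 24 and SS_within = 6, so F = (24/2)/(6/6) = 12 on (2, 6) degrees of freedom.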
WHAT IS ANOVA?
ANOVA is applied to groups that are simple random samples from a single population, so that, under the null hypothesis, every treatment is expected to have the same effect.
ANOVA is used in the statistical design of experiments:
The experimenter adjusts factors and measures responses in an attempt to determine their effect.
ANOVA is a synthesis of several ideas and is used for multiple purposes. As a consequence, it is difficult to define concisely and precisely.
CHARACTERISTICS & LOGIC
Characteristics:
• Used in the analysis of comparative experiments
• Based on the ratio of two variances
Logic:
• The calculation of ANOVA can be characterized as computing a number of means and variances, dividing two variances, and comparing the ratio with a critical value to determine statistical significance.
• The effect of any treatment is estimated by taking the difference between the mean of the observations that receive the treatment and the general (grand) mean.
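The treatment-effect rule in the last bullet can be sketched in a few lines. The function name `treatment_effects` and the observations for treatments A, B and C are hypothetical.

```python
def treatment_effects(groups):
    """Effect of each treatment = treatment mean - general (grand) mean."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    return [sum(g) / len(g) - grand_mean for g in groups]

# Hypothetical observations for three treatments
groups = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10]}
effects = treatment_effects(list(groups.values()))
for name, effect in zip(groups, effects):
    print(name, effect)
# A -2.0
# B 0.0
# C 2.0
```

A negative effect means the treatment mean lies below the grand mean; a positive effect means it lies above.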
TYPE OF ANOVA
One-way ANOVA:
One-way ANOVA tests a single hypothesis, the null hypothesis, on the obtained data.
This single hypothesis concerns the effect of one factor on the variance in the random data of the groups. The F-test then gives the limits of the acceptance and rejection regions: the F distribution is plotted, and the F value obtained from the ANOVA is compared against the critical F value.
Example:
Problem: Nitrogen produced by plants treated with fertilizer
H0: nitrogen is produced due to the fertilizer vs. by the plant itself
Two-way ANOVA:
Two-way ANOVA differs from one-way ANOVA in that it tests two null hypotheses simultaneously, one for each factor.
Each of the two hypotheses is accepted or rejected separately on the basis of the data; for example, one may be rejected while the other is accepted.
Example:
Problem: Bacterial growth was observed (in CFU) on 28 solid-media plates, with temperature and pH as the growth factors. To test both factors, we have to test two hypotheses:
H0: bacterial growth is inhibited due to temperature vs. pH
H0′: bacterial growth is enhanced due to temperature vs. pH
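A minimal sketch of two-way ANOVA, assuming the simplest layout: one observation per cell (no replication), so no interaction term is estimated. The function name `two_way_anova_no_rep` and the 3 × 3 table of CFU counts are hypothetical; they are not the 28 plates of the example, whose layout is not given. Each factor gets its own F ratio, tested against its own critical value.

```python
def two_way_anova_no_rep(table):
    """Two-way ANOVA without replication (one observation per cell).

    table[i][j] = response at level i of factor A (rows) and level j of
    factor B (columns). Returns F ratios and (df_factor, df_error) pairs.
    """
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols      # residual sum of squares

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_rows": (ss_rows / df_rows) / ms_error, "df_rows": (df_rows, df_error),
        "F_cols": (ss_cols / df_cols) / ms_error, "df_cols": (df_cols, df_error),
    }

# Hypothetical CFU counts: rows = temperature levels, columns = pH levels
cfu = [
    [1, 2, 3],
    [3, 6, 6],
    [5, 7, 12],
]
print(two_way_anova_no_rep(cfu))
# {'F_rows': 10.8, 'df_rows': (2, 4), 'F_cols': 4.8, 'df_cols': (2, 4)}
```

With replicate plates per temperature/pH combination, the error term would instead come from the variation within cells, and an interaction F ratio could also be tested.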