7/31/2019 QBA Report
http://slidepdf.com/reader/full/qba-report 1/11
QBA Term Report
SECTION A
A New Way to Compute Pearson’s r Without Reliance on Cross-Products
Submitted by
Irfan Junejo
Kantesh Rathi
Vali Mohammad
Instructor: Sir Rizwan Ahmed
Table of Contents
INTRODUCTION
    BY IRFAN JUNEJO
THE TEACHING/COMPUTING STRATEGY
    BY KANTESH RATHI
COMMENTS ABOUT THE FORMULA
    BY VALI MOHAMMAD
EXAMPLE
    BY IRFAN JUNEJO
CONCLUSION
    BY VALI MOHAMMAD
INDEX
REFERENCES
INTRODUCTION
BY IRFAN JUNEJO
The given article is taken from the journal ‘Teaching Statistics’, an international journal for teachers published under the banner of the ‘Teaching Statistics Trust’, a registered charity since 1979 that has published the journal three times a year since then.
In a recent article published in the same journal, entitled ‘Correlation: From Picture to
Formula’, Peter Holmes1 (2001) accurately points out that scatter diagrams are very useful
when introducing students to the subject of correlation, because they make it easier for
students to judge the relation between the X and Y variables.
A scatter diagram is a tool for determining the potential relation between two
variables, i.e. how one variable changes with the other. The scatter diagram does not
indicate the exact relation, but it does indicate whether or not the variables are connected.
For example, the scatter diagram below shows no relation between X and Y,
because the data points do not form a distinguishable pattern.
In the next pair of graphs, however, the one on the left shows a positive relation between the two variables,
because as the value of one variable increases the other also increases, whereas the
scatter diagram on the right represents a negative relation.
1 Circulation Manager for ‘Teaching Statistics’
[Figure: scatter diagram of data points, Y-axis against X-axis]
In this manner scatter diagrams help to indicate the relation between the two variables,
which is explained further under the next heading.
The scatter diagram helps in making a rough guess of the value of r, which always lies between
-1 and +1. This r is the coefficient of correlation. Karl Pearson developed the coefficient from
a similar but slightly different idea by Francis Galton. The coefficient of correlation, r, is
also denoted by ρ when it refers to a population. The diagram that follows shows how a scatter diagram helps
students make a rough guess of the value of r.
Holmes states that a typical student can make reasonably fair predictions about the value of
‘r’, but has difficulty seeing how the formula for Pearson’s ‘r’ quantifies the
understanding that comes from observing the scatter diagram. Holmes tries to bridge the gap
between Pearson’s formula and the scatter diagram in a step-by-step fashion.
[Figure: two scatter diagrams of data points, Y-axis against X-axis]
Figure 1 - http://en.wikipedia.org/wiki/Correlation_and_dependence
In the book ‘Comprehending Behavioral Statistics’, Dr. Russell Hurlbert2 also tries to bridge
the same gap between the scatter diagram and r. Hurlbert first demonstrates how a tic-tac-toe
grid can be superimposed on the data of the scatter diagram.
Following this superimposition, Hurlbert argues that the data in the four corners (1, 3, 7,
and 9) of the grid have the most significant impact on the sign and magnitude of ‘r’. Lastly,
Hurlbert computes the z-score cross products and states that Pearson’s correlation
is equal to the mean of these ZxZy values.
Here’s how to compute the ZxZy values:
X          Zx       Y          Zy       ZxZy
25         0.15     80         0.00     0.00
14         -1.53    98         1.10     -1.68
33         1.38     50         -1.84    -2.53
28         0.61     82         0.12     0.08
20         -0.61    90         0.61     -0.38
∑ = 120             ∑ = 400             ∑ = -4.51
Mean = 24           Mean = 80           r = -0.90
SD = 6.54           SD = 16.3
The Z scores can be computed by subtracting the Mean3 from the cell value and dividing the
result by the Standard Deviation4. For example, the Z score for the first class size is (25-24)/6.54 = 0.15.
2 Professor of psychology, University of Nevada
3 For a data set, the mean is the sum of the observations divided by the number of observations.
4 Standard Deviation shows how much variation there is from the "average" (mean).
Figure 2
For the value of r we sum up the ZxZy values and divide the sum by the total number of
observations, i.e. 5 in this case.
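The calculation in the table can be reproduced with a short script. This is a minimal sketch in Python; the function name z_scores is an illustrative choice, and the standard deviation is assumed to be the population form (dividing by n), which is what reproduces SD = 6.54 and SD = 16.3 above.

```python
import math

# X and Y scores taken from the table above.
x = [25, 14, 33, 28, 20]
y = [80, 98, 50, 82, 90]


def z_scores(values):
    """Return Z scores using the population standard deviation (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


zx = z_scores(x)
zy = z_scores(y)

# Pearson's r as the mean of the Z-score cross products.
cross_products = [a * b for a, b in zip(zx, zy)]
r = sum(cross_products) / len(cross_products)

print("Zx:", [round(v, 2) for v in zx])
print("Zy:", [round(v, 2) for v in zy])
print("r :", round(r, 2))
```

Because the script works with unrounded Z scores, the sum of the cross products differs slightly from the -4.51 obtained from the rounded table entries, but r still rounds to -0.90.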
Both Holmes and Hurlbert try to connect the scatter diagram with the correlation coefficient
by making use of the Z scores. Although the sum of the cross products forms the numerator
for the value of r, the authors of the article state that there is a better way to show that the
formula for r truly does quantify the qualitative understanding that one gets from looking at
the scatter diagram. The advantage of this alternative approach is that it does not rely on
the cross products of the Z scores; instead it involves the creation of separate ‘direct’ and ‘indirect’ components of
each score. These components, it is argued, are far more in accord with the intuitive ‘feel’
that one gets when looking at a scatter diagram.
THE TEACHING/COMPUTING STRATEGY
BY KANTESH RATHI
The procedure for showing the ‘direct’ and ‘indirect’ influence of each data point is straightforward, and it helps in understanding closely the nature and strength of the relationship between two related variables.
This procedure can be understood easily in four steps:
First, convert all given scores on X and on Y into Z scores. This conversion will not affect the Pearson product-moment correlation coefficient (sometimes referred to as the PMCC, and typically denoted by r), which is a measure of the correlation (linear dependence) between two variables X and Y, giving a value between +1 and −1 inclusive. Students will become aware of this important feature of r if asked the question: ‘If we correlate temperature, or height and weight, does the choice of centigrade or Fahrenheit, metres or centimetres, feet or inches affect the value of the correlation coefficient?’
Second, draw a scatter diagram with the Z scores. Inside this scatter diagram, draw a line at a 45 degree angle, rising from lower left to upper right and passing through the centroid (the point whose coordinates are the means of the two variables, which for Z scores is the origin). This line represents a positive (direct) relationship; label it D. Also draw another line at 90 degrees to line D that passes through the centroid. This second line shows a negative (indirect) relationship; label it I.
Third, determine the projection of each data point on the positive and negative lines, and measure the distance from each of these projections on D and I to the centroid. Denote these distances d (direct) and i (indirect). The distance along the positive line indicates the direct influence on r, and the distance along the negative line indicates the indirect influence on r.
Finally, after obtaining the d and i distances, square these values, sum the squares, and put the two sums into the formula for the Pearson product-moment correlation coefficient r.
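The four steps above can be sketched in Python. This is an illustration under stated assumptions rather than the article's own worked formula: the projections are taken as d = (zx + zy)/√2 and i = (zy − zx)/√2, which is what the geometry of the 45 degree line implies, and r is taken as (∑d² − ∑i²)/(∑d² + ∑i²), a form consistent with the description of squaring and summing the distances. The function names are hypothetical, and the data are the X and Y scores from the earlier table.

```python
import math


def z_scores(values):
    """Step 1: convert raw scores to Z scores (population SD, divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def r_from_projections(x, y):
    """Steps 2 to 4: project each point on lines D and I, then combine the squares."""
    zx, zy = z_scores(x), z_scores(y)
    root2 = math.sqrt(2)
    # Assumed projections: signed distance from the centroid (the origin)
    # along the 45 degree line D and along the perpendicular line I.
    d = [(a + b) / root2 for a, b in zip(zx, zy)]
    i = [(b - a) / root2 for a, b in zip(zx, zy)]
    sum_d2 = sum(v * v for v in d)
    sum_i2 = sum(v * v for v in i)
    # Assumed formula: difference of the summed squares over their total.
    return (sum_d2 - sum_i2) / (sum_d2 + sum_i2)


x = [25, 14, 33, 28, 20]
y = [80, 98, 50, 82, 90]
r_projection = r_from_projections(x, y)
print(round(r_projection, 2))
```

For this data set the result agrees with the value of r obtained from the mean of the Z-score cross products, which is what one would expect if the assumed formula is the intended one.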
[Figure: two scatter diagrams, panel titles "Positive" and "Negative"]
COMMENTS ABOUT THE FORMULA
BY VALI MOHAMMAD
Observing the above formula, we can see that r will be positive when the distances d are large and the distances i are small. This happens when the data points form a compact path moving from lower left to upper right, so that the i distances are minute and much lower than the d distances. On the other hand, r will be negative when the data points form a cluster along the line I, in other words perpendicular to D (as can be seen from the figure above).
Another unique feature of the formula is that ∑d and ∑i will both equal zero no matter what the data are or what the relationship between X and Y is. This means that it would be useless to calculate the values of ∑d and ∑i, just as it is useless to calculate the sum of the unsquared deviation scores when measuring dispersion in the univariate case. So anyone wondering whether they could find the ds and is and then divide the difference between their sums has the answer in the above statement: the distances must be squared before they are summed.
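The reason both sums vanish can be written out in one line. The expressions for d and i below are an assumption (the signed projections of a point (z_x, z_y) on the 45 degree line D and on the perpendicular line I); the only fact used is that Z scores always sum to zero.

```latex
\sum d = \frac{1}{\sqrt{2}} \left( \sum z_x + \sum z_y \right) = \frac{0 + 0}{\sqrt{2}} = 0,
\qquad
\sum i = \frac{1}{\sqrt{2}} \left( \sum z_y - \sum z_x \right) = \frac{0 - 0}{\sqrt{2}} = 0 .
```

This mirrors the univariate case, where the deviations from the mean always sum to zero and must be squared before they say anything about dispersion.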
Looking at the above diagram, some people may find it confusing and may form the opinion that it is quite difficult to find the values of d and i, as they are measured along the perpendicular axes D and I rather than the vertical and horizontal axes
labeled zy and zx. If they take a closer look, however, they will realize that d and i are
simple functions of zx and zy.
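One way to write these functions is sketched below. The expressions are an assumption derived from the geometry described earlier (projection of the point (z_x, z_y) on a 45 degree line through the origin and on its perpendicular), not a transcription of the article's own equations, and the direction chosen as positive for i is arbitrary because only i² is used. n denotes the number of data points, and Z scores are assumed to be computed with the population standard deviation so that their squares sum to n.

```latex
d = \frac{z_x + z_y}{\sqrt{2}}, \qquad i = \frac{z_y - z_x}{\sqrt{2}}
\quad\Longrightarrow\quad
d^2 = \frac{(z_x + z_y)^2}{2}, \qquad i^2 = \frac{(z_y - z_x)^2}{2}

d^2 - i^2 = 2\, z_x z_y, \qquad d^2 + i^2 = z_x^2 + z_y^2

\frac{\sum d^2 - \sum i^2}{\sum d^2 + \sum i^2}
= \frac{2 \sum z_x z_y}{\sum z_x^2 + \sum z_y^2}
= \frac{2 \sum z_x z_y}{2n}
= \frac{\sum z_x z_y}{n} = r
```

Under these assumptions the ratio of the squared distances gives the same r as the mean of the Z-score cross products, even though no cross product is computed directly.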
Through the above formulas we can easily calculate the values of d² and i² and plug them into the formula to get the final answer. It should be kept in mind that we require the
values of d² and i², not d and i, to calculate r, so time need not be spent on something that is not required. However,
you can calculate the values of d and i and square them to get the values of d² and i².
To calculate the values of d and i, we must first determine what sign (positive or negative)
each d or i possesses. The signs of d and i can be determined using a set of rules:
The sign of d for any data point will be positive if that data point’s z-scores meet any one of these three conditions:
(a) both zx and zy are positive,
(b) zy is positive, zx is negative, and zy > |zx|, or
(c) zx is positive, zy is negative, and zx > |zy|.
If none of those conditions holds, then d will be negative. A similar set of rules can be applied to determine the sign of the i values. So we can see it is much easier to calculate the values of d² and i²
rather than d and i.
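The three conditions can be checked mechanically. The following Python sketch applies them as worded and compares the outcome with the sign of zx + zy, on the assumption that d is proportional to zx + zy. The function name and the last three sample points are hypothetical; the first four points are rounded Z scores from the earlier table. Points with a Z score of exactly zero are left out, because the conditions as worded do not cover them.

```python
def d_is_positive(zx, zy):
    """Return True if d is positive according to conditions (a), (b) and (c)."""
    both_positive = zx > 0 and zy > 0                    # condition (a)
    zy_dominates = zy > 0 and zx < 0 and zy > abs(zx)    # condition (b)
    zx_dominates = zx > 0 and zy < 0 and zx > abs(zy)    # condition (c)
    return both_positive or zy_dominates or zx_dominates


# (zx, zy) pairs with both scores non-zero.
points = [
    (-1.53, 1.10),
    (1.38, -1.84),
    (0.61, 0.12),
    (-0.61, 0.61),
    (-0.50, 1.20),   # hypothetical point satisfying (b)
    (1.20, -0.50),   # hypothetical point satisfying (c)
    (-1.00, -1.00),  # hypothetical point satisfying none of the conditions
]

# Under the assumed projection, d is positive exactly when zx + zy > 0.
agreement = [d_is_positive(zx, zy) == (zx + zy > 0) for zx, zy in points]
print(all(agreement))
```

If the assumption about d holds, the rules amount to asking whether the point lies above the line zy = −zx, which is why squaring (zx + zy)/√2 directly avoids the sign question altogether.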
EXAMPLE
BY IRFAN JUNEJO
INDEX
C
centroid
Correlation
F
Francis Galton
H
Hurlbert
K
Karl Pearson
P
Pearson
Pythagoras
S
scatter diagram
T
Teaching Statistics
Teaching Statistics Trust
Z
Z scores
REFERENCES
1. http://en.wikipedia.org/wiki/Karl_Pearson
2. http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
3. http://wps.prenhall.com/wps/media/objects/2497/2557809/MEDIA/Ch3/learnmorech3.pdf
4. Research Methods and Statistics: A Critical Thinking Approach by Sherri L. Jackson
5. Statistics for People Who (Think They) Hate Statistics: Excel 2007 Edition by Neil J. Salkind