Multivariate Data

Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many


Descriptive techniques for Multivariate data

In most research situations data is collected on more than one variable (usually many variables)


Graphical Techniques

• The scatter plot

• The two dimensional Histogram


The Scatter Plot

For two variables X and Y we have a measurement of each variable on each case:

$$(x_i, y_i)$$

where $x_i$ = the value of X for case i and $y_i$ = the value of Y for case i.

To construct a scatter plot we plot the point $(x_i, y_i)$ for each case on the X-Y plane.

[Figure: a single point $(x_i, y_i)$ in the X-Y plane, with $x_i$ marked on the horizontal axis and $y_i$ on the vertical axis]

Data Set #3

The following table gives data on Verbal IQ, Math IQ, Initial Reading Achievement Score, and Final Reading Achievement Score for 23 students who have recently completed a reading improvement program.

Student  Verbal IQ  Math IQ  Initial Reading Achievement  Final Reading Achievement
 1         86         94       1.1                          1.7
 2        104        103       1.5                          1.7
 3         86         92       1.5                          1.9
 4        105        100       2.0                          2.0
 5        118        115       1.9                          3.5
 6         96        102       1.4                          2.4
 7         90         87       1.5                          1.8
 8         95        100       1.4                          2.0
 9        105         96       1.7                          1.7
10         84         80       1.6                          1.7
11         94         87       1.6                          1.7
12        119        116       1.7                          3.1
13         82         91       1.2                          1.8
14         80         93       1.0                          1.7
15        109        124       1.8                          2.5
16        111        119       1.4                          3.0
17         89         94       1.6                          1.8
18         99        117       1.6                          2.6
19         94         93       1.4                          1.4
20         99        110       1.4                          2.0
21         95         97       1.5                          1.3
22        102        104       1.7                          3.1
23        102         93       1.6                          1.9

[Figure: Scatter plot of Math IQ (vertical axis) against Verbal IQ (horizontal axis) for the 23 students. It is drawn first with both axes running from 0 to 140, then with the point (84, 80) labelled, and finally with both axes rescaled to run from 60 to 130.]

Some Scatter Patterns

[Figure: scatter plot with a circular cloud of points]

• Circular

• No relationship between X and Y

• Unable to predict Y from X

[Figure: scatter plot with an ellipsoidal cloud of points sloping upward]

• Ellipsoidal

• Positive relationship between X and Y

• Increases in X correspond to increases in Y (but not always)

• Major axis of the ellipse has positive slope

Example: Verbal IQ and Math IQ

[Figure: Scatter plot of Math IQ (vertical axis) against Verbal IQ (horizontal axis), both axes running from 60 to 130]

Some More Patterns

[Figure: scatter plot with a thinner ellipsoidal cloud of points sloping upward]

• Ellipsoidal (thinner ellipse)

• Stronger positive relationship between X and Y

• Increases in X correspond to increases in Y (more frequently)

• Major axis of the ellipse has positive slope

• Minor axis of the ellipse much smaller

[Figure: scatter plot with a very thin ellipsoidal cloud of points sloping upward]

• Increased strength in the positive relationship between X and Y

• Increases in X correspond to increases in Y (almost always)

• Minor axis of the ellipse is extremely small relative to the major axis

[Figure: scatter plot with all points on a straight line of positive slope]

• Perfect positive relationship between X and Y

• Y perfectly predictable from X

• Data falls exactly along a straight line with positive slope

[Figure: scatter plot with an ellipsoidal cloud of points sloping downward]

• Ellipsoidal

• Negative relationship between X and Y

• Increases in X correspond to decreases in Y (but not always)

• Major axis of the ellipse has negative slope

[Figure: scatter plot]

• The strength of the relationship can increase until changes in Y can be perfectly predicted from X

[Figures: five scatter plots, each with a horizontal axis from 40 to 140 and a vertical axis from 0 to 140]

Some Non-Linear Patterns

[Figure: scatter plot of a non-linear pattern, X from -20 to 50 and Y from 0 to 1200]

• In a linear pattern, Y increases with respect to X at a constant rate

• In a non-linear pattern, the rate at which Y increases with respect to X varies


Growth Patterns

[Figures: scatter plots of growth patterns, X from 0 to 50]

• Growth patterns frequently follow a sigmoid curve

• Growth at the start is slow

• It then speeds up

• It slows down again as it reaches its limiting size

[Figure: sigmoid growth curve, X from 0 to 50 and Y from 0 to 120]
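The slow-fast-slow shape of a sigmoid can be illustrated with a logistic curve. This is only a sketch: the logistic form and the parameter values (`limit`, `rate`, `midpoint`) are assumptions chosen to resemble the figure, not values taken from the slides.

```python
import math

def logistic(t, limit=100.0, rate=0.3, midpoint=25.0):
    """One common sigmoid curve; all parameter values are illustrative."""
    return limit / (1.0 + math.exp(-rate * (t - midpoint)))

# Size at t = 0, 5, ..., 50 and the growth over each 5-unit step.
sizes = [logistic(t) for t in range(0, 51, 5)]
increments = [b - a for a, b in zip(sizes, sizes[1:])]

for t, size in zip(range(0, 51, 5), sizes):
    print(t, round(size, 1))
```

The increments are small at the start, largest around the midpoint, and small again as the size approaches its limit.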

Review: the scatter plot

Some Scatter Patterns

Circular

• No relationship between X and Y

• Unable to predict Y from X

Ellipsoidal

• Positive relationship between X and Y

• Increases in X correspond to increases in Y (but not always)

• Major axis of the ellipse has positive slope

Ellipsoidal

• Negative relationship between X and Y

• Increases in X correspond to decreases in Y (but not always)

• Major axis of the ellipse has negative slope

Non-Linear Patterns

[Figures: the scatter plots shown earlier for each of these patterns]

Measures of strength of a relationship (Correlation)

• Pearson’s correlation coefficient (r)

• Spearman’s rank correlation coefficient (rho, $\rho$)

Assume that we have collected data on two variables X and Y. Let

$$(x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \ldots,\ (x_n, y_n)$$

denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).

From this data we can compute summary statistics for each variable.

The means

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad\text{and}\quad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

The standard deviations

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \quad\text{and}\quad s_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$$

These statistics ($\bar{x}$, $\bar{y}$, $s_x$, $s_y$):

• give information for each variable separately

but

• give no information about the relationship between the two variables

Consider the statistics:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

The first two statistics,

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad\text{and}\quad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

• are used to measure variability in each variable

• are used to compute the sample standard deviations

$$s_x = \sqrt{\frac{S_{xx}}{n-1}} \quad\text{and}\quad s_y = \sqrt{\frac{S_{yy}}{n-1}}$$
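As a small illustration of how $S_{xx}$ leads to the sample standard deviation, the sketch below uses the Verbal IQ column of Data Set #3. The variable names are illustrative, and the standard-library `statistics.stdev` is used only as a cross-check.

```python
import math
import statistics

# Verbal IQ scores of the 23 students in Data Set #3.
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]

n = len(verbal_iq)
x_bar = sum(verbal_iq) / n                        # mean
s_xx = sum((x - x_bar) ** 2 for x in verbal_iq)   # sum of squared deviations
s_x = math.sqrt(s_xx / (n - 1))                   # sample standard deviation

print(round(x_bar, 3), round(s_xx, 3), round(s_x, 3))
print(math.isclose(s_x, statistics.stdev(verbal_iq)))
```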

The third statistic,

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

• is used to measure correlation

• If two variables are positively related, the sign of $x_i - \bar{x}$ will agree with the sign of $y_i - \bar{y}$:

• When $x_i - \bar{x}$ is positive, $y_i - \bar{y}$ will be positive (when $x_i$ is above its mean, $y_i$ will be above its mean).

• When $x_i - \bar{x}$ is negative, $y_i - \bar{y}$ will be negative (when $x_i$ is below its mean, $y_i$ will be below its mean).

The product $(x_i - \bar{x})(y_i - \bar{y})$ will be positive for most cases.

This implies that the statistic

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

• will be positive

• Most of the terms in this sum will be positive
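The sign argument can be checked on the Verbal IQ and Math IQ columns of Data Set #3. The sketch below counts how many products $(x_i - \bar{x})(y_i - \bar{y})$ are positive and how many are negative; the variable names are illustrative.

```python
# Verbal IQ (x) and Math IQ (y) for the 23 students in Data Set #3.
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]
math_iq = [94, 103, 92, 100, 115, 102, 87, 100, 96, 80, 87, 116,
           91, 93, 124, 119, 94, 117, 93, 110, 97, 104, 93]

n = len(verbal_iq)
x_bar = sum(verbal_iq) / n
y_bar = sum(math_iq) / n

# One product of deviations per student.
products = [(x - x_bar) * (y - y_bar) for x, y in zip(verbal_iq, math_iq)]
n_positive = sum(p > 0 for p in products)
n_negative = sum(p < 0 for p in products)
s_xy = sum(products)

print(n_positive, n_negative, round(s_xy, 3))
```

For these data most of the products are positive, so their sum $S_{xy}$ is positive as well.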

On the other hand

• If two variables are negatively related, the sign of $x_i - \bar{x}$ will be opposite to the sign of $y_i - \bar{y}$:

• When $x_i - \bar{x}$ is positive, $y_i - \bar{y}$ will be negative (when $x_i$ is above its mean, $y_i$ will be below its mean).

• When $x_i - \bar{x}$ is negative, $y_i - \bar{y}$ will be positive (when $x_i$ is below its mean, $y_i$ will be above its mean).

The product $(x_i - \bar{x})(y_i - \bar{y})$ will be negative for most cases.

Again, this implies that the statistic

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

• will be negative

• Most of the terms in this sum will be negative

Pearson’s correlation coefficient is defined as follows:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

The denominator,

$$\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}$$

is always positive.

The numerator,

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

• is positive if there is a positive relationship between X and Y, and

• is negative if there is a negative relationship between X and Y.

• This property carries over to Pearson’s correlation coefficient r

Properties of Pearson’s correlation coefficient r

1. The value of r is always between -1 and +1.

2. If the relationship between X and Y is positive, then r will be positive.

3. If the relationship between X and Y is negative, then r will be negative.

4. If there is no relationship between X and Y, then r will be zero.

5. The value of r will be +1 if the points $(x_i, y_i)$ lie on a straight line with positive slope.

6. The value of r will be -1 if the points $(x_i, y_i)$ lie on a straight line with negative slope.
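Properties 5 and 6 can be checked directly from the definition of r. The function name `pearson_r` and the small data sets below are illustrative assumptions, not part of the original slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the definition r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])       # straight line, positive slope
r_down = pearson_r(xs, [10 - 3 * x for x in xs])    # straight line, negative slope
r_mixed = pearson_r(xs, [2.0, 1.0, 4.0, 3.0, 5.0])  # positive but not perfect

print(r_up, r_down, r_mixed)
```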

[Figures: scatter plots illustrating r = 1, r = 0.95, r = 0.7, r = 0.4, r = 0, r = -0.4, r = -0.7, r = -0.8, r = -0.95 and r = -1]

Computing formulae for the statistics:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

To compute $S_{xx}$, $S_{yy}$ and $S_{xy}$, first compute

$$A = \sum_{i=1}^{n} x_i, \quad B = \sum_{i=1}^{n} y_i, \quad C = \sum_{i=1}^{n} x_i^2, \quad D = \sum_{i=1}^{n} y_i^2, \quad E = \sum_{i=1}^{n} x_i y_i$$

Then

$$S_{xx} = C - \frac{A^2}{n}, \quad S_{yy} = D - \frac{B^2}{n}, \quad S_{xy} = E - \frac{AB}{n}$$
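The recipe with the five sums A, B, C, D and E translates directly into code. This is a minimal sketch; the function name `sums_of_squares` and the five-point data set are hypothetical and serve only to show that the computing formulae agree with the definitions.

```python
def sums_of_squares(xs, ys):
    """Return (S_xx, S_yy, S_xy) using the computing formulae."""
    n = len(xs)
    A = sum(xs)
    B = sum(ys)
    C = sum(x * x for x in xs)
    D = sum(y * y for y in ys)
    E = sum(x * y for x, y in zip(xs, ys))
    return C - A * A / n, D - B * B / n, E - A * B / n

# Hypothetical data, used only to compare with the definitional formulae.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
s_xx, s_yy, s_xy = sums_of_squares(xs, ys)

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
s_xy_definition = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

print(s_xx, s_yy, s_xy, s_xy_definition)
```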

Example: Verbal IQ and Math IQ

The data are the Verbal IQ and Math IQ scores for the 23 students in Data Set #3.

[Figure: Scatter plot of Math IQ (vertical axis) against Verbal IQ (horizontal axis), both axes running from 60 to 130]

Now

$$\sum_{i=1}^{n} x_i = 2244, \quad \sum_{i=1}^{n} y_i = 2307, \quad \sum_{i=1}^{n} x_i^2 = 221494, \quad \sum_{i=1}^{n} y_i^2 = 234363, \quad \sum_{i=1}^{n} x_i y_i = 227199$$

Hence

$$S_{xx} = 221494 - \frac{(2244)^2}{23} = 2557.652$$

$$S_{yy} = 234363 - \frac{(2307)^2}{23} = 2960.87$$

$$S_{xy} = 227199 - \frac{(2244)(2307)}{23} = 2116.043$$

Thus Pearson’s correlation coefficient is:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{2116.043}{\sqrt{(2557.652)(2960.87)}} = 0.769$$

Thus r = 0.769

• Verbal IQ and Math IQ are positively correlated.

• If Verbal IQ is above (below) the mean then for most cases Math IQ will also be above (below) the mean.
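The hand calculation above can be reproduced in a few lines. This sketch applies the computing formulae to the Verbal IQ and Math IQ columns of Data Set #3; the variable names are illustrative.

```python
import math

# Verbal IQ (x) and Math IQ (y) for the 23 students in Data Set #3.
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]
math_iq = [94, 103, 92, 100, 115, 102, 87, 100, 96, 80, 87, 116,
           91, 93, 124, 119, 94, 117, 93, 110, 97, 104, 93]

n = len(verbal_iq)
A = sum(verbal_iq)
B = sum(math_iq)
C = sum(x * x for x in verbal_iq)
D = sum(y * y for y in math_iq)
E = sum(x * y for x, y in zip(verbal_iq, math_iq))

s_xx = C - A * A / n
s_yy = D - B * B / n
s_xy = E - A * B / n
r = s_xy / math.sqrt(s_xx * s_yy)

print(A, B, C, D, E)
print(round(r, 3))  # 0.769
```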


Is the improvement in reading achievement (RA) related to either Verbal IQ or Math IQ?

improvement in RA = Final RA – Initial RA

The Data

Student  Math IQ  Verbal IQ  Initial RA  Final RA  Imp RA
 1         86        94        1.1         1.7       0.6
 2        104       103        1.5         1.7       0.2
 3         86        92        1.5         1.9       0.4
 4        105       100        2.0         2.0       0.0
 5        118       115        1.9         3.5       1.6
 6         96       102        1.4         2.4       1.0
 7         90        87        1.5         1.8       0.3
 8         95       100        1.4         2.0       0.6
 9        105        96        1.7         1.7       0.0
10         84        80        1.6         1.7       0.1
11         94        87        1.6         1.7       0.1
12        119       116        1.7         3.1       1.4
13         82        91        1.2         1.8       0.6
14         80        93        1.0         1.7       0.7
15        109       124        1.8         2.5       0.7
16        111       119        1.4         3.0       1.6
17         89        94        1.6         1.8       0.2
18         99       117        1.6         2.6       1.0
19         94        93        1.4         1.4       0.0
20         99       110        1.4         2.0       0.6
21         95        97        1.5         1.3      -0.2
22        102       104        1.7         3.1       1.4
23        102        93        1.6         1.9       0.3

Correlation between Math IQ and RA Improvement: r = 0.48469

Correlation between Verbal IQ and RA Improvement: r = 0.68318

[Figure: Scatterplot of RA Improvement against Math IQ, r = 0.48469]

[Figure: Scatterplot of RA Improvement against Verbal IQ, r = 0.68318]
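The two correlations with RA improvement can be recomputed from the table. In this sketch the two IQ columns are named by position (`iq_col1`, `iq_col2`) and the comments record how the improvement table labels them; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the definition r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# First IQ column of the improvement table (labelled "Math IQ" there).
iq_col1 = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
           82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]
# Second IQ column of the improvement table (labelled "Verbal IQ" there).
iq_col2 = [94, 103, 92, 100, 115, 102, 87, 100, 96, 80, 87, 116,
           91, 93, 124, 119, 94, 117, 93, 110, 97, 104, 93]
initial_ra = [1.1, 1.5, 1.5, 2.0, 1.9, 1.4, 1.5, 1.4, 1.7, 1.6, 1.6, 1.7,
              1.2, 1.0, 1.8, 1.4, 1.6, 1.6, 1.4, 1.4, 1.5, 1.7, 1.6]
final_ra = [1.7, 1.7, 1.9, 2.0, 3.5, 2.4, 1.8, 2.0, 1.7, 1.7, 1.7, 3.1,
            1.8, 1.7, 2.5, 3.0, 1.8, 2.6, 1.4, 2.0, 1.3, 3.1, 1.9]

# improvement in RA = Final RA - Initial RA
improvement = [f - i for f, i in zip(final_ra, initial_ra)]

r_col1 = pearson_r(iq_col1, improvement)
r_col2 = pearson_r(iq_col2, improvement)
print(round(r_col1, 5), round(r_col2, 5))
```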

Spearman’s rank correlation coefficient (rho)

Spearman’s rank correlation coefficient is computed as follows:

• Arrange the observations on X in increasing order and assign them the ranks 1, 2, 3, …, n.

• Arrange the observations on Y in increasing order and assign them the ranks 1, 2, 3, …, n.

• For any case i, let $(x_i, y_i)$ denote the observations on X and Y and let $(r_i, s_i)$ denote the ranks on X and Y.

• If the variables X and Y are strongly positively correlated, the ranks on X should generally agree with the ranks on Y. (The largest X should be paired with the largest Y, and the smallest X with the smallest Y.)

• If the variables X and Y are strongly negatively correlated, the ranks on X should be in the reverse order to the ranks on Y. (The largest X should be paired with the smallest Y, and the smallest X with the largest Y.)

• If the variables X and Y are uncorrelated, the ranks on X should be randomly distributed relative to the ranks on Y.

Spearman’s rank correlation coefficient is defined as follows:

For each case let $d_i = r_i - s_i$ = difference in the two ranks.

Then Spearman’s rank correlation coefficient ($\rho$) is

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

Properties of Spearman’s rank correlation coefficient $\rho$

1. The value of $\rho$ is always between -1 and +1.

2. If the relationship between X and Y is positive, then $\rho$ will be positive.

3. If the relationship between X and Y is negative, then $\rho$ will be negative.

4. If there is no relationship between X and Y, then $\rho$ will be zero.

5. The value of $\rho$ will be +1 if the ranks of X completely agree with the ranks of Y.

6. The value of $\rho$ will be -1 if the ranks of X are in reverse order to the ranks of Y.

Example

$x_i$   25.0  33.9  16.7  37.4  24.6  17.3  40.2
$y_i$   24.3  38.7  13.4  32.1  28.0  12.5  44.9

Ranking the X’s and the Y’s we get:

$r_i$   4  5  1  6  3  2  7
$s_i$   3  6  2  5  4  1  7

Computing the differences in ranks gives us:

$d_i$   1  -1  -1  1  -1  1  0

$$\sum_{i=1}^{n} d_i^2 = 6$$

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{6(6)}{7(7^2 - 1)} = 1 - \frac{36}{7(48)} = 1 - \frac{3}{7(4)} = \frac{25}{28} = 0.893$$
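The ranking and the formula for $\rho$ can be sketched as follows. The helper names `ranks` and `spearman_rho` are illustrative, and the simple ranking used here assumes there are no tied observations, as in this example.

```python
def ranks(values):
    """Rank 1 for the smallest value; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho from the rank differences d_i = r_i - s_i."""
    n = len(xs)
    sum_d2 = sum((r - s) ** 2 for r, s in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

x = [25.0, 33.9, 16.7, 37.4, 24.6, 17.3, 40.2]
y = [24.3, 38.7, 13.4, 32.1, 28.0, 12.5, 44.9]

print(ranks(x))   # [4, 5, 1, 6, 3, 2, 7]
print(ranks(y))   # [3, 6, 2, 5, 4, 1, 7]
rho = spearman_rho(x, y)
print(round(rho, 3))  # 0.893
```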

Computing Pearson’s correlation coefficient, r, for the same problem:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

using the computing formulae

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \quad S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}, \quad S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

To compute $S_{xx}$, $S_{yy}$ and $S_{xy}$, first compute

$$A = \sum_{i=1}^{n} x_i = 195.1, \quad B = \sum_{i=1}^{n} y_i = 193.9$$

$$C = \sum_{i=1}^{n} x_i^2 = 5972.35, \quad D = \sum_{i=1}^{n} y_i^2 = 6254.41, \quad E = \sum_{i=1}^{n} x_i y_i = 6053.78$$

Then

$$S_{xx} = C - \frac{A^2}{n} = 5972.35 - \frac{(195.1)^2}{7} = 534.63$$

$$S_{yy} = D - \frac{B^2}{n} = 6254.41 - \frac{(193.9)^2}{7} = 883.38$$

$$S_{xy} = E - \frac{AB}{n} = 6053.78 - \frac{(195.1)(193.9)}{7} = 649.51$$

and

$$r = \frac{649.51}{\sqrt{(534.63)(883.38)}} = 0.945$$

Compare with

$$\rho = 0.893$$
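The same arithmetic for the seven-case example, written out as a short sketch with the sums A to E (variable names are illustrative):

```python
import math

x = [25.0, 33.9, 16.7, 37.4, 24.6, 17.3, 40.2]
y = [24.3, 38.7, 13.4, 32.1, 28.0, 12.5, 44.9]

n = len(x)
A = sum(x)
B = sum(y)
C = sum(v * v for v in x)
D = sum(v * v for v in y)
E = sum(a * b for a, b in zip(x, y))

s_xx = C - A * A / n
s_yy = D - B * B / n
s_xy = E - A * B / n
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(s_xx, 2), round(s_yy, 2), round(s_xy, 2))
print(round(r, 3))  # 0.945
```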

Comments: Spearman’s rank correlation coefficient $\rho$ and Pearson’s correlation coefficient r

1. The value of $\rho$ can also be computed from:

$$\rho = \frac{\sum_{i=1}^{n} (r_i - \bar{r})(s_i - \bar{s})}{\sqrt{\sum_{i=1}^{n} (r_i - \bar{r})^2 \sum_{i=1}^{n} (s_i - \bar{s})^2}}$$

2. Spearman’s $\rho$ is Pearson’s r computed from the ranks.
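Comment 2 can be verified on the ranks from the example: applying Pearson’s formula to the ranks $r_i$ and $s_i$ gives the same value as the formula based on the rank differences. The function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the definition r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Ranks from the example.
r_ranks = [4, 5, 1, 6, 3, 2, 7]
s_ranks = [3, 6, 2, 5, 4, 1, 7]
n = len(r_ranks)

rho_from_ranks = pearson_r(r_ranks, s_ranks)
sum_d2 = sum((r - s) ** 2 for r, s in zip(r_ranks, s_ranks))
rho_from_d = 1 - 6 * sum_d2 / (n * (n * n - 1))

print(round(rho_from_ranks, 3), round(rho_from_d, 3))
```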

3. Spearman's ρ is less sensitive to extreme observations (outliers).

4. The value of Pearson's r is much more sensitive to extreme outliers.

This is similar to the comparison between the median and the mean, or between the pseudo-standard deviation and the standard deviation. The mean and standard deviation are more sensitive to outliers than the median and pseudo-standard deviation.
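The comments above can be illustrated with a minimal Python sketch. The data values are hypothetical (they do not come from the slides), the helper names are illustrative, and the ranking helper assumes there are no tied values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return s_xy / sqrt(s_xx * s_yy)


def ranks(values):
    """Ranks 1..n (1 = smallest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed from the ranks."""
    return pearson_r(ranks(x), ranks(y))


def spearman_rho_from_d(x, y):
    """Spearman's rho from rank differences d_i = r_i - s_i (no ties)."""
    n = len(x)
    d_squared = sum((r - s) ** 2 for r, s in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data; y_outlier replaces the last value with an extreme one.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.1, 1.9, 3.2, 4.1, 4.8, 6.3, 7.0]
y_outlier = [2.1, 1.9, 3.2, 4.1, 4.8, 6.3, 70.0]

print("Pearson :", pearson_r(x, y), pearson_r(x, y_outlier))
print("Spearman:", spearman_rho(x, y), spearman_rho(x, y_outlier))
```

Because the outlier does not change the ordering of the observations, the ranks and therefore Spearman's ρ are unchanged, while Pearson's r drops noticeably.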

Page 106: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Scatter plots

Page 107: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Some Scatter Patterns

[Figure: example scatter plots illustrating the patterns described below]

Circular

• No relationship between X and Y

• Unable to predict Y from X

Ellipsoidal

• Positive relationship between X and Y

• Increases in X correspond to increases in Y (but not always)

• Major axis of the ellipse has positive slope

Page 108: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Ellipsoidal

• Negative relationship between X and Y

• Increases in X correspond to decreases in Y (but not always)

• Major axis of the ellipse has negative slope

[Figure: example scatter plots]

Page 109: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Non-Linear Patterns

[Figure: two scatter plots showing non-linear patterns]

Page 110: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Measuring correlation

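1. Pearson's correlation coefficient r

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

2. Spearman's rank correlation coefficient ρ

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = r_i - s_i$$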

Page 111: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Simple Linear Regression

Fitting straight lines to data

Page 112: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

The Least Squares Line (The Regression Line)

• When data is correlated it falls roughly about a straight line.

[Figure: scatter plot of correlated data falling roughly about a straight line]

Page 113: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

In this situation one wants to:

• Find the equation of the straight line through the data that yields the best fit.

The equation of any straight line is of the form:

Y = a + bX

b = the slope of the line
a = the intercept of the line

Page 114: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

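[Figure: a straight line with intercept a, showing Run = x2 − x1 and Rise = y2 − y1]

$$b = \frac{\text{Rise}}{\text{Run}} = \frac{y_2 - y_1}{x_2 - x_1}$$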

Page 115: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

• a is the value of Y when X is zero

• b is the rate at which Y increases per unit increase in X.

• For a straight line this rate is constant.

• For non-linear curves the rate at which Y increases per unit increase in X varies with X.

Page 116: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Linear

Page 117: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many


Non-linear

Page 118: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Example: In the following example both blood pressure and age were measured for each female subject. Subjects were grouped into age classes and the median blood pressure measurement was computed for each age class. The data are summarized below:

Age Class            30-40   40-50   50-60   60-70   70-80
Midpoint Age (X)        35      45      55      65      75
Median BP (Y)          114     124     143     158     166

Page 119: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Graph:

[Figure: median BP (Y) plotted against midpoint age (X), with the fitted line]

Y = 65.1 + 1.38 X

Page 120: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Interpretation of the slope and intercept

1. Intercept – value of Y at X = 0.
– Predicted blood pressure of a newborn (65.1).
– This interpretation remains valid only if linearity is true down to X = 0.

2. Slope – rate of increase in Y per unit increase in X.
– Blood pressure increases 1.38 units each year.

Page 121: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

The Least Squares Line

Fitting the best straight line to “linear” data

Page 122: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Reasons for fitting a straight line to data

1. It provides a precise description of the relationship between Y and X.

2. The interpretation of the parameters of the line (slope and intercept) leads to an improved understanding of the phenomenon that is under study.

3. The equation of the line is useful for prediction of the dependent variable (Y) from the independent variable (X).

Page 123: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Assume that we have collected data on two variables X and Y. Let

(x1, y1) (x2, y2) (x3, y3) … (xn, yn)

denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).

Page 124: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Let

Y = a + bX

denote an arbitrary equation of a straight line, where a and b are known values. This equation can be used to predict, for each value of X, the value of Y.

For example, if X = $x_i$ (as for the ith case) then the predicted value of Y is:

$$\hat{y}_i = a + b x_i$$

Page 125: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

For example, if

Y = a + bX = 25.2 + 2.0 X

is the equation of the straight line, and if X = $x_i$ = 20 (for the ith case), then the predicted value of Y is:

$$\hat{y}_i = a + b x_i = 25.2 + 2.0(20) = 65.2$$

Page 126: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

If the actual value of Y is $y_i$ = 70.0 for case i, then the difference

$$y_i - \hat{y}_i = 70 - 65.2 = 4.8$$

is the error in the prediction for case i.

$$r_i = y_i - \hat{y}_i = y_i - a - b x_i$$

is also called the residual for case i.
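The prediction and residual for this case can be reproduced in a few lines of Python. The function names `predict` and `residual` are illustrative; the numbers a = 25.2, b = 2.0, x = 20 and y = 70.0 are the ones used in the example above.

```python
def predict(a, b, x):
    """Predicted value of Y on the line Y = a + bX."""
    return a + b * x


def residual(a, b, x, y):
    """Residual r = y - y_hat: observed value minus predicted value."""
    return y - predict(a, b, x)


a, b = 25.2, 2.0          # intercept and slope from the example
x_i, y_i = 20, 70.0       # observed case i

y_hat_i = predict(a, b, x_i)
r_i = residual(a, b, x_i, y_i)

print(f"predicted = {y_hat_i:.1f}, residual = {r_i:.1f}")
```

A positive residual means the observed value lies above the line; a negative residual means it lies below.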

Page 127: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

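The residual

$$r_i = y_i - \hat{y}_i = y_i - a - b x_i$$

can be computed for each case in the sample:

$$r_1 = y_1 - \hat{y}_1, \quad r_2 = y_2 - \hat{y}_2, \quad \ldots, \quad r_n = y_n - \hat{y}_n$$

The residual sum of squares (RSS)

$$RSS = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

is a measure of the “goodness of fit” of the line Y = a + bX to the data. The smaller the RSS, the more closely the line fits the data.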

Page 128: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

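[Figure: the line Y = a + bX plotted with the points (x1, y1), (x2, y2), (x3, y3), (x4, y4); the residuals r1, r2, r3, r4 are the vertical deviations of the points from the line]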

Page 129: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

The optimal choice of a and b will result in the residual sum of squares

$$RSS = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

attaining a minimum.

If this is the case then the line

Y = a + bX

is called the Least Squares Line.
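As a rough illustration of how RSS compares candidate lines, the sketch below evaluates it for the blood pressure example given earlier (midpoint age X, median BP Y). The first two candidate lines are arbitrary choices made for this illustration; the third is the line Y = 65.1 + 1.38 X quoted on the slides.

```python
def rss(a, b, xs, ys):
    """Residual sum of squares of the line Y = a + bX for the data (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Blood pressure example from the slides.
age = [35, 45, 55, 65, 75]
bp = [114, 124, 143, 158, 166]

# (intercept, slope) pairs; the first two are arbitrary candidates.
candidates = [(60.0, 1.5), (70.0, 1.3), (65.1, 1.38)]

for a, b in candidates:
    print(f"Y = {a} + {b} X  ->  RSS = {rss(a, b, age, bp):.2f}")
```

Among these candidates the slides' line gives the smallest RSS, which is what the least squares criterion requires of the best-fitting line.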

Page 130: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

R.S.S = 3389.9

[Figure: scatter plot of the data with the line shown below]

Y = 10 + (0.5)X

Page 131: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

R.S.S = 1861.9

[Figure: scatter plot of the data with the line shown below]

Y = 15 + (0.5)X

Page 132: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

R.S.S = 833.9

[Figure: scatter plot of the data with the line shown below]

Y = 20 + (0.5)X

Page 133: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

R.S.S = 883.1

[Figure: scatter plot of the data with the line shown below]

Y = 20 + (1)X

Page 134: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

R.S.S = 303.98

[Figure: scatter plot of the data with the line shown below]

Y = 20 + (0.7)X

Page 135: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

R.S.S = 225.74

[Figure: scatter plot of the data with the line shown below]

Y = 26.46 + (0.55)X

Page 136: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

The equation for the least squares line

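Let

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$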

Page 137: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

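Computing Formulae:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$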

Page 138: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Then the slope of the least squares line can be shown to be:

$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Page 139: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

and the intercept of the least squares line can be shown to be:

$$a = \bar{y} - b\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\,\bar{x}$$
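The slope and intercept formulas translate directly into code. The following is a minimal sketch; the function name `least_squares_line` is an illustrative choice, and the data are the blood pressure example from earlier in the slides.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX.

    b = Sxy / Sxx and a = y_bar - b * x_bar.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Blood pressure example: midpoint age (X) and median BP (Y).
age = [35, 45, 55, 65, 75]
bp = [114, 124, 143, 158, 166]

a, b = least_squares_line(age, bp)
print(f"Y = {a:.1f} + {b:.2f} X")
```

The function requires at least two distinct X values; if all X values were equal, Sxx would be zero and the slope undefined.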

Page 140: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

The following data show the per capita consumption of cigarettes per month (X) in various countries in 1930, and the death rates from lung cancer for men in 1950.

TABLE: Per capita consumption of cigarettes per month (Xi) in n = 11 countries in 1930, and the death rates, Yi (per 100,000), from lung cancer for men in 1950.

Country (i)       Xi    Yi
Australia         48    18
Canada            50    15
Denmark           38    17
Finland          110    35
Great Britain    110    46
Holland           49    24
Iceland           23     6
Norway            25     9
Sweden            30    11
Switzerland       51    25
USA              130    20

Page 141: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

[Figure: scatter plot of death rates from lung cancer (1950) against per capita consumption of cigarettes, with each point labelled by country]

Page 142: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

[Figure: scatter plot of death rates from lung cancer (1950) against per capita consumption of cigarettes, with each point labelled by country]

Page 143: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

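Fitting the Least Squares Line

$$\sum_{i=1}^{n} x_i = 664, \qquad \sum_{i=1}^{n} y_i = 226$$

$$\sum_{i=1}^{n} x_i^2 = 54404, \qquad \sum_{i=1}^{n} y_i^2 = 6018, \qquad \sum_{i=1}^{n} x_i y_i = 16914$$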

Page 144: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Fitting the Least Squares Line

First compute the following three quantities:

$$S_{xx} = 54404 - \frac{(664)^2}{11} = 14322.55$$

$$S_{yy} = 6018 - \frac{(226)^2}{11} = 1374.73$$

$$S_{xy} = 16914 - \frac{(664)(226)}{11} = 3271.82$$

Page 145: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

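Computing Estimate of Slope and Intercept

$$b = \frac{S_{xy}}{S_{xx}} = \frac{3271.82}{14322.55} = 0.22844 \approx 0.228$$

$$a = \bar{y} - b\bar{x} = \frac{226}{11} - (0.22844)\left(\frac{664}{11}\right) = 6.756$$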

Page 146: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

[Figure: scatter plot of death rates from lung cancer (1950) against per capita consumption of cigarettes, with each point labelled by country and the fitted line drawn]

Y = 6.756 + (0.228)X

Page 147: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Interpretation of the slope and intercept

1. Intercept – value of Y at X = 0.
– Predicted death rate from lung cancer (6.756) for men in 1950 in countries with no smoking in 1930 (X = 0).

2. Slope – rate of increase in Y per unit increase in X.
– Death rate from lung cancer for men in 1950 increases by 0.228 units for each increase of 1 cigarette in per capita consumption in 1930.

Page 148: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Example: In the following example both blood pressure and age were measured for each female subject. Subjects were grouped into age classes and the median blood pressure measurement was computed for each age class. The data are summarized below:

Age Class            30-40   40-50   50-60   60-70   70-80
Midpoint Age (X)        35      45      55      65      75
Median BP (Y)          114     124     143     158     166

Page 149: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

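Fitting the Least Squares Line

$$\sum_{i=1}^{n} x_i = 275, \qquad \sum_{i=1}^{n} y_i = 705$$

$$\sum_{i=1}^{n} x_i^2 = 16125, \qquad \sum_{i=1}^{n} y_i^2 = 101341, \qquad \sum_{i=1}^{n} x_i y_i = 40155$$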

Page 150: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Fitting the Least Squares Line

First compute the following three quantities:

$$S_{xx} = 16125 - \frac{(275)^2}{5} = 1000$$

$$S_{yy} = 101341 - \frac{(705)^2}{5} = 1936$$

$$S_{xy} = 40155 - \frac{(275)(705)}{5} = 1380$$

Page 151: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Computing Estimate of Slope and Intercept

$$b = \frac{S_{xy}}{S_{xx}} = \frac{1380}{1000} = 1.38$$

$$a = \bar{y} - b\bar{x} = \frac{705}{5} - 1.380\left(\frac{275}{5}\right) = 65.1$$

Page 152: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Graph:

[Figure: median BP (Y) plotted against midpoint age (X), with the fitted line]

Y = 65.1 + 1.38 X

Page 153: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Relationship between correlation and Linear Regression

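1. Pearson's correlation.

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

• Takes values between –1 and +1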

Page 154: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

2. Least squares line Y = a + bX

– Minimises the Residual Sum of Squares:

$$RSS = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

– The Sum of Squares that measures the variability in Y that is unexplained by X.

– This can also be denoted by $SS_{\text{Unexplained}}$.

Page 155: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Some other Sums of Squares:

– The Sum of Squares that measures the total variability in Y (ignoring X).

$$SS_{\text{Total}} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Page 156: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

– The Sum of Squares that measures the total variability in Y that is explained by X.

$$SS_{\text{Explained}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

Page 157: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

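It can be shown:

(Total variability in Y) = (variability in Y explained by X) + (variability in Y unexplained by X)

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$SS_{\text{Total}} = SS_{\text{Explained}} + SS_{\text{Unexplained}}$$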

Page 158: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

It can also be shown:

$$r^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

= proportion of variability in Y explained by X.

= the coefficient of determination

Page 159: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Further:

$$1 - r^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

= proportion of variability in Y that is unexplained by X.
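The decomposition and the two proportions can be checked numerically. The sketch below uses the cigarette and lung cancer data from the slides; the variable names are illustrative, and the fit is computed with the slope and intercept formulas given earlier.

```python
# Cigarette consumption (X, 1930) and lung cancer death rates (Y, 1950)
# for the 11 countries listed on the slides.
cigarettes = [48, 50, 38, 110, 110, 49, 23, 25, 30, 51, 130]
deaths = [18, 15, 17, 35, 46, 24, 6, 9, 11, 25, 20]

n = len(cigarettes)
x_bar = sum(cigarettes) / n
y_bar = sum(deaths) / n

s_xx = sum((x - x_bar) ** 2 for x in cigarettes)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(cigarettes, deaths))

# Least squares slope and intercept.
b = s_xy / s_xx
a = y_bar - b * x_bar
fitted = [a + b * x for x in cigarettes]

# The three sums of squares.
ss_total = sum((y - y_bar) ** 2 for y in deaths)
ss_explained = sum((f - y_bar) ** 2 for f in fitted)
ss_unexplained = sum((y - f) ** 2 for y, f in zip(deaths, fitted))

r_squared = ss_explained / ss_total

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"SS_total = {ss_total:.2f}")
print(f"SS_explained + SS_unexplained = {ss_explained + ss_unexplained:.2f}")
print(f"r^2 = {r_squared:.3f}, 1 - r^2 = {ss_unexplained / ss_total:.3f}")
```

The identity SS_Total = SS_Explained + SS_Unexplained holds only when the fitted values come from the least squares line; for an arbitrary line the two parts need not add up to the total.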

Page 160: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Example

TABLE: Per capita consumption of cigarettes per month (Xi) in n = 11 countries in 1930, and the death rates, Yi (per 100,000), from lung cancer for men in 1950.

Country (i)       Xi    Yi
Australia         48    18
Canada            50    15
Denmark           38    17
Finland          110    35
Great Britain    110    46
Holland           49    24
Iceland           23     6
Norway            25     9
Sweden            30    11
Switzerland       51    25
USA              130    20

Page 161: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Fitting the Least Squares Line

First compute the following three quantities:

$$S_{xx} = 54404 - \frac{(664)^2}{11} = 14322.55$$

$$S_{yy} = 6018 - \frac{(226)^2}{11} = 1374.73$$

$$S_{xy} = 16914 - \frac{(664)(226)}{11} = 3271.82$$

Page 162: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

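Computing Estimate of Slope and Intercept

$$b = \frac{S_{xy}}{S_{xx}} = \frac{3271.82}{14322.55} = 0.22844 \approx 0.228$$

$$a = \bar{y} - b\bar{x} = \frac{226}{11} - (0.22844)\left(\frac{664}{11}\right) = 6.756$$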

Page 163: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

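Computing r and r²

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{3271.82}{\sqrt{(14322.55)(1374.73)}} = 0.737$$

$$r^2 = (0.737)^2 \approx 0.544$$

(The value 0.544 is obtained from the unrounded value of r.)

54.4% of the variability in Y (death rate due to lung cancer, 1950) is explained by X (per capita cigarette smoking in 1930).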

Page 164: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

[Figure: scatter plot of death rates from lung cancer (1950) against per capita consumption of cigarettes, with each point labelled by country and the fitted line drawn]

Y = 6.756 + (0.228)X

Page 165: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Comments

• Correlation will be +1 or −1 if the data lie on a straight line.

• Correlation can be zero or close to zero if the data are either
– not related, or
– in some situations, non-linearly related.

[Figure: scatter plot]

Page 166: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

Example

The data

X       Y
1.00    4.00
1.40    2.56
1.80    1.44
2.20    0.64
2.60    0.16
3.00    0.00
3.40    0.16
3.80    0.64
4.20    1.44
4.60    2.56
5.00    4.00

$S_{xx}$ = 17.6, $S_{yy}$ = 21.9648, $S_{xy}$ = 0

r = 0

[Figure: scatter plot of Y against X for the data above]

Page 167: Multivariate Data. Descriptive techniques for Multivariate data In most research situations data is collected on more than one variable (usually many

One should be careful in interpreting zero correlation. It does not necessarily imply that Y is not related to X. It could happen that Y is non-linearly related to X. One should plot Y vs X before concluding that Y is not related to X.
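The zero-correlation example can be reproduced with a short Python sketch. The data are the eleven (X, Y) pairs from the example above; the code also checks the observation (not stated on the slides, but easy to verify) that these values satisfy Y = (X − 3)², so Y is completely determined by X even though r = 0.

```python
from math import sqrt

# The (X, Y) pairs from the example.
xs = [1.00, 1.40, 1.80, 2.20, 2.60, 3.00, 3.40, 3.80, 4.20, 4.60, 5.00]
ys = [4.00, 2.56, 1.44, 0.64, 0.16, 0.00, 0.16, 0.64, 1.44, 2.56, 4.00]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
r = s_xy / sqrt(s_xx * s_yy)

# Check that every point lies on the parabola Y = (X - 3)^2.
on_parabola = all(abs(y - (x - 3) ** 2) < 1e-9 for x, y in zip(xs, ys))

print(f"Sxx = {s_xx:.4f}, Syy = {s_yy:.4f}, Sxy = {s_xy:.4f}, r = {r:.4f}")
print("All points on Y = (X - 3)^2:", on_parabola)
```

The positive and negative products in Sxy cancel because the parabola is symmetric about the mean of X, which is why a plot reveals the relationship while r does not.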