
UNIT-II

SKEWNESS, KURTOSIS, CORRELATION, REGRESSION

(i) Skewness: In a perfectly symmetrical distribution the mean, median and mode coincide. Skewness is a measure of the lack of symmetry of a statistical distribution. If a distribution is not symmetrical, we say that it is skewed.

(ii) Kurtosis: Kurtosis is a measure of the flatness or peakedness of a distribution.

(iii) Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma}$, where $\sigma$ is the standard deviation.

When the mode is not well defined:

(iv) Pearson's coefficient of skewness $= \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$

Bowley's formula for measuring skewness:

Bowley's coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$
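The three coefficients defined above can be written as small Python helpers. This is a minimal sketch; the function names are illustrative and not part of the notes, and the sample values are taken from the worked problems in this unit.

```python
def pearson_skew_mode(mean, mode, sd):
    """Pearson's coefficient using the mode: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_skew_median(mean, median, sd):
    """Pearson's coefficient when the mode is ill-defined: 3 (mean - median) / sd."""
    return 3 * (mean - median) / sd


def bowley_skew(q1, median, q3):
    """Bowley's quartile coefficient: (Q3 + Q1 - 2 median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# mean 65, median 70, standard deviation 25
print(pearson_skew_median(65, 70, 25))
# quartiles with sum 22 and difference 8, median 10.5
print(bowley_skew(q1=7, median=10.5, q3=15))
```

Bowley's coefficient always lies between -1 and 1, while Pearson's coefficients are not bounded in the same way, so values from the two measures are not directly comparable.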

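1. In a distribution, mean = 65, median = 70 and the coefficient of skewness is -0.6. Find the coefficient of variation.

Solution: Coefficient of skewness $= \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$

$-0.6 = \dfrac{3(65 - 70)}{\sigma} = \dfrac{-15}{\sigma}$

$\sigma = \dfrac{15}{0.6} = 25$

Coefficient of variation $= \dfrac{\sigma}{\text{Mean}} \times 100 = \dfrac{25}{65} \times 100 = 38.46\%$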

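2. In a distribution the sum of the two quartiles is 78.2, their difference is 14.3 and the median is 35.7. Find the coefficient of skewness.

Solution: Given $Q_3 + Q_1 = 78.2$, $Q_3 - Q_1 = 14.3$, Median $M = 35.7$.

Coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{78.2 - 2(35.7)}{14.3} = \dfrac{6.8}{14.3} = 0.4755$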

3. Pearson's coefficient of skewness is -0.7 and the values of the median and standard deviation are 12.8 and 6 respectively. Estimate the value of the mean.

Solution: Pearson's coefficient of skewness = -0.7, Median = 12.8, S.D. = 6.

Coefficient of skewness $= \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$

$-0.7 = \dfrac{3(\text{Mean} - 12.8)}{6}$

$-1.4 = \text{Mean} - 12.8$

Mean $= 12.8 - 1.4 = 11.4$

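4. In a frequency distribution, the coefficient of skewness based upon quartiles is 0.6. If the sum of the upper and lower quartiles is 100 and the median is 38, estimate the value of the upper quartile.

Solution: Given $S_K = 0.6$, $Q_3 + Q_1 = 100$, $M = 38$.

$S_K = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$

$0.6 = \dfrac{100 - 2(38)}{Q_3 - Q_1} = \dfrac{24}{Q_3 - Q_1}$

$Q_3 - Q_1 = \dfrac{24}{0.6} = 40$ ---(1)

$Q_3 + Q_1 = 100$ ---(2)

Adding (1) and (2): $2Q_3 = 140$, so $Q_3 = 70$.

5. Find the coefficient of skewness if the difference between the two quartiles is 8, the sum of the two quartiles is 22 and the median is 10.5.

Solution: Given $Q_3 + Q_1 = 22$, $Q_3 - Q_1 = 8$, $M = 10.5$.

$S_K = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{22 - 2(10.5)}{8} = \dfrac{1}{8} = 0.125$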

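6. Calculate the coefficient of variation, if Karl Pearson's coefficient of skewness is 0.42, the mean is 86 and the median is 80.

Solution: Given Pearson's coefficient of skewness = 0.42, Mean = 86, Median = 80.

$S_K = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$

$0.42 = \dfrac{3(86 - 80)}{\sigma} = \dfrac{18}{\sigma}$

$\sigma = \dfrac{18}{0.42} = 42.857$

Coefficient of variation $= \dfrac{\sigma}{\text{Mean}} \times 100 = \dfrac{42.857}{86} \times 100 = 49.83\%$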

7. The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75. Write the skewness and kurtosis of the distribution.

Solution: The coefficient of skewness is given by

$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{(0.7)^2}{(2.5)^3} = \dfrac{0.49}{15.625} = 0.0314$

Since $\mu_3$ is positive, the distribution is positively skewed.

The measure of kurtosis is given by

$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{18.75}{(2.5)^2} = \dfrac{18.75}{6.25} = 3$

Since $\beta_2 = 3$, the distribution is mesokurtic, that is, it has the same kurtosis as the normal distribution.

8. The Karl Pearson's coefficient of skewness of a distribution is 0.32, its standard deviation is 6.5 and the mean is 29.6. Calculate the mode and the median. (L3)

Solution: Given $S_K = 0.32$, $\sigma = 6.5$, Mean = 29.6.

$S_K = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$

$0.32 = \dfrac{3(29.6 - \text{Median})}{6.5}$

$0.32 \times 6.5 = 88.8 - 3\,\text{Median}$

$3\,\text{Median} = 88.8 - 2.08 = 86.72$

Median $= \dfrac{86.72}{3} = 28.91$

Mean - Mode = 3(Mean - Median) $= 0.32 \times 6.5 = 2.08$

Mode $= 29.6 - 2.08 = 27.52$

9. Compute the first four central moments for the following data: 8, 10, 11, 12, 14. (L3)

Solution:

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{55}{5} = 11$

With $d = x - \bar{x}$:

          x      d     d^2     d^3     d^4
          8     -3      9     -27      81
         10     -1      1      -1       1
         11      0      0       0       0
         12      1      1       1       1
         14      3      9      27      81
Total    55      0     20       0     164

The four central moments are

$\mu_1 = \dfrac{\sum (x - \bar{x})}{n} = \dfrac{0}{5} = 0$

$\mu_2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{20}{5} = 4$

$\mu_3 = \dfrac{\sum (x - \bar{x})^3}{n} = \dfrac{0}{5} = 0$

$\mu_4 = \dfrac{\sum (x - \bar{x})^4}{n} = \dfrac{164}{5} = 32.8$
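The central moments of a small ungrouped data set can be checked with a few lines of Python. This is a minimal sketch; `central_moments` is an illustrative name, not something defined in the notes.

```python
def central_moments(data, max_order=4):
    """Return [mu_1, ..., mu_max_order], the moments about the arithmetic mean."""
    n = len(data)
    mean = sum(data) / n
    return [sum((x - mean) ** r for x in data) / n
            for r in range(1, max_order + 1)]


mu1, mu2, mu3, mu4 = central_moments([8, 10, 11, 12, 14])
print(mu1, mu2, mu3, mu4)
```

For data that are symmetric about the mean, as here, every odd-order central moment is zero.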

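10. The first three moments of a distribution about the value 3 are 2, 10 and -30. Find the central moments $\mu_2$ and $\mu_3$. (L1)

Solution: About the value $x = 3$,

$\mu_1' = 2$, $\mu_2' = 10$, $\mu_3' = -30$

$\mu_2 = \mu_2' - (\mu_1')^2 = 10 - 4 = 6$

$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = -30 - 3(10)(2) + 2(2)^3 = -30 - 60 + 16 = -74$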
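The conversion from moments about an arbitrary point to central moments can be sketched as follows. The function name `raw_to_central` is hypothetical; the fourth-order relation is the standard one and is used later in the unit for kurtosis.

```python
def raw_to_central(m1, m2, m3, m4=None):
    """Convert moments about an arbitrary point A into central moments.

    m1..m4 are the moments about A; returns (mu2, mu3) or (mu2, mu3, mu4).
    """
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    if m4 is None:
        return mu2, mu3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


# Moments 2, 10 and -30 about the value 3
print(raw_to_central(2, 10, -30))
```

The mean itself is recovered as A plus the first moment about A, here 3 + 2 = 5.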

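Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma}$

1. Calculate Karl Pearson's coefficient of skewness. (L3)

Marks               0-10   10-20   20-30   30-40   40-50   50-60   60-70
No. of candidates     10      15      24      25      10      10       6

Solution:

Marks     Mid value x     f      d     fd    fd^2
0-10           5         10     -3    -30     90
10-20         15         15     -2    -30     60
20-30         25         24     -1    -24     24
30-40         35         25      0      0      0
40-50         45         10      1     10     10
50-60         55         10      2     20     40
60-70         65          6      3     18     54
Total                   100           -36    278

$A = 35$, $c = 10$, $d = \dfrac{x - 35}{10}$, $N = 100$

Mean $= A + \dfrac{\sum fd}{N} \times c = 35 + \dfrac{-36}{100} \times 10 = 31.4$

The modal class is 30-40, so $l = 30$, $f_1 = 25$, $f_0 = 24$, $f_2 = 10$.

Mode $= l + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c = 30 + \dfrac{25 - 24}{50 - 24 - 10} \times 10 = 30 + \dfrac{10}{16} = 30.625$

$\sigma = c\sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} = 10\sqrt{\dfrac{278}{100} - \left(\dfrac{-36}{100}\right)^2} = 10\sqrt{2.6504} = 16.28$

Coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{31.4 - 30.625}{16.28} = 0.0476$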
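The grouped-data calculation (mean, mode by interpolation within the modal class, standard deviation, then Pearson's coefficient) can be sketched in Python. This is an illustrative helper, not the method of the notes: it works directly from class mid-values instead of step deviations, which gives the same mean and standard deviation, and it assumes equal class widths and a single modal class.

```python
from math import sqrt


def grouped_pearson_skew(lower_bounds, width, freqs):
    """Return (mean, mode, sd, skewness) for equal-width classes."""
    n = sum(freqs)
    mids = [lb + width / 2 for lb in lower_bounds]
    mean = sum(f * m for f, m in zip(freqs, mids)) / n
    sd = sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / n)

    # Modal class: the class with the largest frequency (first one if tied).
    k = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    mode = lower_bounds[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

    return mean, mode, sd, (mean - mode) / sd


mean, mode, sd, sk = grouped_pearson_skew(
    lower_bounds=[0, 10, 20, 30, 40, 50, 60],
    width=10,
    freqs=[10, 15, 24, 25, 10, 10, 6],
)
print(round(mean, 3), round(mode, 3), round(sd, 2), round(sk, 4))
```

For inclusive classes such as 10-19, 20-29, pass the class boundaries (9.5, 19.5, and so on) as `lower_bounds`.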

1. Calculate the Pearson's coefficient of skewness for the following data. (L3)

Class        10-19   20-29   30-39   40-49   50-59   60-69   70-79   80-89
Frequency        5       9      14      20      25      15       8       4

Solution:

Class          Mid value x     f      d     fd    fd^2
9.5-19.5          14.5         5     -3    -15     45
19.5-29.5         24.5         9     -2    -18     36
29.5-39.5         34.5        14     -1    -14     14
39.5-49.5         44.5        20      0      0      0
49.5-59.5         54.5        25      1     25     25
59.5-69.5         64.5        15      2     30     60
69.5-79.5         74.5         8      3     24     72
79.5-89.5         84.5         4      4     16     64
Total                        100            48    316

Let $A = 44.5$, $c = 10$, $d = \dfrac{x - 44.5}{10}$.

Mean $= A + \dfrac{\sum fd}{N} \times c = 44.5 + \dfrac{48}{100} \times 10 = 49.3$

$\sigma = c\sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} = 10\sqrt{\dfrac{316}{100} - \left(\dfrac{48}{100}\right)^2} = 10\sqrt{2.9296} = 17.12$

The modal class is 49.5-59.5, so $l = 49.5$, $f_1 = 25$, $f_0 = 20$, $f_2 = 15$.

Mode $= l + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c = 49.5 + \dfrac{25 - 20}{50 - 20 - 15} \times 10 = 49.5 + \dfrac{50}{15} = 52.83$

Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{49.3 - 52.83}{17.12} = -0.206$

2. Calculate the Pearson's coefficient of skewness for the following data. (L3)

Class        3-7   8-12   13-17   18-22   23-27   28-32   33-37   38-42
Frequency      2    108     580     175      80      32      18       5

Solution:

Class          Mid value x      f      d      fd    fd^2
2.5-7.5             5           2     -3      -6      18
7.5-12.5           10         108     -2    -216     432
12.5-17.5          15         580     -1    -580     580
17.5-22.5          20         175      0       0       0
22.5-27.5          25          80      1      80      80
27.5-32.5          30          32      2      64     128
32.5-37.5          35          18      3      54     162
37.5-42.5          40           5      4      20      80
Total                        1000           -584    1480

$A = 20$, $c = 5$, $d = \dfrac{x - 20}{5}$

Mean $= A + \dfrac{\sum fd}{N} \times c = 20 + \dfrac{-584}{1000} \times 5 = 17.08$

The modal class is 12.5-17.5, so $l = 12.5$, $f_1 = 580$, $f_0 = 108$, $f_2 = 175$.

Mode $= l + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c = 12.5 + \dfrac{580 - 108}{1160 - 108 - 175} \times 5 = 12.5 + 2.69 = 15.19$

$\sigma = c\sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} = 5\sqrt{\dfrac{1480}{1000} - \left(\dfrac{-584}{1000}\right)^2} = 5\sqrt{1.1389} = 5.34$

Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{17.08 - 15.19}{5.34} = 0.354$

4. Calculate Pearson's coefficient of skewness for the following data. (L3)

Size          7    8    9   10   11   12   13   14
Frequency     2   11   36   64   39   39   22    2

Solution:

This is discrete data. The maximum frequency corresponds to $x = 10$, so Mode = 10. Let $A = 10$, $d = x - 10$.

          x      f      d     fd    fd^2
          7      2     -3     -6     18
          8     11     -2    -22     44
          9     36     -1    -36     36
         10     64      0      0      0
         11     39      1     39     39
         12     39      2     78    156
         13     22      3     66    198
         14      2      4      8     32
Total          215           127    523

Mean $= A + \dfrac{\sum fd}{N} = 10 + \dfrac{127}{215} = 10.59$

$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} = \sqrt{\dfrac{523}{215} - \left(\dfrac{127}{215}\right)^2} = \sqrt{2.0837} = 1.443$

Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{10.59 - 10}{1.443} = 0.409$

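Bowley's coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

5. Calculate Bowley's coefficient of skewness for the following data. (L4)

Weight more than (in kg)    40    50    60    70    80    90
No. of persons             185   167   132    82    38    12

Solution:

More than   No. of persons   Class           f     cf
40               185         40-50          18     18
50               167         50-60          35     53
60               132         60-70          50    103
70                82         70-80          44    147
80                38         80-90          26    173
90                12         90 and above   12    185

Here $N = 185$ and $c = 10$. In each formula, $l$ is the lower limit, $f$ the frequency and $cf$ the cumulative frequency preceding the class that contains the required item.

Median $= l + \dfrac{N/2 - cf}{f} \times c = 60 + \dfrac{92.5 - 53}{50} \times 10 = 67.9$

$Q_1 = l + \dfrac{N/4 - cf}{f} \times c = 50 + \dfrac{46.25 - 18}{35} \times 10 = 58.07$

$Q_3 = l + \dfrac{3N/4 - cf}{f} \times c = 70 + \dfrac{138.75 - 103}{44} \times 10 = 78.125$

Bowley's coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \dfrac{78.125 + 58.07 - 2(67.9)}{78.125 - 58.07} = \dfrac{0.395}{20.055} = 0.0197$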
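The quartile interpolation used for Bowley's coefficient can be sketched in Python. The helper names are illustrative. The sketch assumes equal class widths and treats the open class "90 and above" as having the same width as the others, which does not affect the result here because no quartile falls in that class.

```python
def grouped_quantile(lower_bounds, width, freqs, p):
    """Interpolated value below which a fraction p (0 < p < 1) of the items lie."""
    target = p * sum(freqs)
    cum = 0  # cumulative frequency before the current class
    for lb, f in zip(lower_bounds, freqs):
        if cum + f >= target:
            return lb + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie strictly between 0 and 1")


def grouped_bowley(lower_bounds, width, freqs):
    """Bowley's coefficient (Q3 + Q1 - 2 median) / (Q3 - Q1) for grouped data."""
    q1 = grouped_quantile(lower_bounds, width, freqs, 0.25)
    med = grouped_quantile(lower_bounds, width, freqs, 0.50)
    q3 = grouped_quantile(lower_bounds, width, freqs, 0.75)
    return (q3 + q1 - 2 * med) / (q3 - q1)


bounds = [40, 50, 60, 70, 80, 90]
freqs = [18, 35, 50, 44, 26, 12]
print(grouped_quantile(bounds, 10, freqs, 0.5))
print(round(grouped_bowley(bounds, 10, freqs), 4))
```

Because the coefficient is a ratio of two small differences, rounding the quartiles before dividing can change the third or fourth decimal place.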

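MOMENTS

For a frequency distribution with $N = \sum f$, the $r$-th central moment is $\mu_r = \dfrac{1}{N}\sum f(x - \bar{x})^r$, so that

$\mu_1 = \dfrac{1}{N}\sum f(x - \bar{x})$

$\mu_2 = \dfrac{1}{N}\sum f(x - \bar{x})^2$

$\mu_3 = \dfrac{1}{N}\sum f(x - \bar{x})^3$

$\mu_4 = \dfrac{1}{N}\sum f(x - \bar{x})^4$

6. Calculate the first four central moments for the following frequency distribution. (L3)

X     0    1    2    3    4    5    6    7    8
F     1    8   28   56   70   56   28    8    1

Solution:

$\bar{x} = \dfrac{\sum fx}{N} = \dfrac{1024}{256} = 4$. Let $d = x - \bar{x} = x - 4$.

          X      f      d     fd    fd^2    fd^3    fd^4
          0      1     -4     -4     16     -64     256
          1      8     -3    -24     72    -216     648
          2     28     -2    -56    112    -224     448
          3     56     -1    -56     56     -56      56
          4     70      0      0      0       0       0
          5     56      1     56     56      56      56
          6     28      2     56    112     224     448
          7      8      3     24     72     216     648
          8      1      4      4     16      64     256
Total          256             0    512       0    2816

$\mu_1 = \dfrac{\sum fd}{N} = \dfrac{0}{256} = 0$

$\mu_2 = \dfrac{\sum fd^2}{N} = \dfrac{512}{256} = 2$

$\mu_3 = \dfrac{\sum fd^3}{N} = \dfrac{0}{256} = 0$

$\mu_4 = \dfrac{\sum fd^4}{N} = \dfrac{2816}{256} = 11$

Since $\mu_3 = 0$, the distribution is symmetrical.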

7. Calculate the first four central moments for the following frequency distribution. (L4)

Marks less than    80    70    60    50    40    30    20    10
Frequency         100    90    80    60    32    20    13     5

Solution: The cumulative frequencies are first converted to class frequencies.

Marks     Mid value x     f      d     fd    fd^2    fd^3    fd^4
0-10           5          5     -4    -20     80    -320    1280
10-20         15          8     -3    -24     72    -216     648
20-30         25          7     -2    -14     28     -56     112
30-40         35         12     -1    -12     12     -12      12
40-50         45         28      0      0      0       0       0
50-60         55         20      1     20     20      20      20
60-70         65         10      2     20     40      80     160
70-80         75         10      3     30     90     270     810
Total                   100             0    342    -234    3042

Let $d = \dfrac{x - 45}{c}$, $c = 10$.

Since $\sum fd = 0$, the mean equals $A = 45$, and the moments of $d$ scaled by powers of $c$ are already the central moments:

$\mu_1 = 0$

$\mu_2 = c^2\,\dfrac{\sum fd^2}{N} = 100 \times \dfrac{342}{100} = 342$

$\mu_3 = c^3\,\dfrac{\sum fd^3}{N} = 1000 \times \dfrac{-234}{100} = -2340$

$\mu_4 = c^4\,\dfrac{\sum fd^4}{N} = 10000 \times \dfrac{3042}{100} = 304200$

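8. Calculate the moment measure of kurtosis from the following data. (L4)

X     2    4    6    8   10   12   14
f     4   11   18   27   20   16    8

Solution: Let $A = 8$, $c = 2$, $d = \dfrac{x - 8}{2}$.

          X      f      d     fd    fd^2    fd^3    fd^4
          2      4     -3    -12     36    -108     324
          4     11     -2    -22     44     -88     176
          6     18     -1    -18     18     -18      18
          8     27      0      0      0       0       0
         10     20      1     20     20      20      20
         12     16      2     32     64     128     256
         14      8      3     24     72     216     648
TOTAL          104            24    254     150    1442

Moments about $A = 8$:

$\mu_1' = c\,\dfrac{\sum fd}{N} = 2 \times \dfrac{24}{104} = 0.46$

$\mu_2' = c^2\,\dfrac{\sum fd^2}{N} = 4 \times \dfrac{254}{104} = 9.77$

$\mu_3' = c^3\,\dfrac{\sum fd^3}{N} = 8 \times \dfrac{150}{104} = 11.53$

$\mu_4' = c^4\,\dfrac{\sum fd^4}{N} = 16 \times \dfrac{1442}{104} = 221.84$

Central moments:

$\mu_2 = \mu_2' - (\mu_1')^2 = 9.77 - 0.2116 = 9.5584$

$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = 11.53 - 3(9.77)(0.46) + 2(0.46)^3 = 11.53 - 13.4826 + 0.1947 = -1.7579$

$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 221.84 - 21.2152 + 12.404 - 0.1343 = 212.89$

Measure of kurtosis based on moments:

$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{212.89}{(9.5584)^2} = 2.33$

Since $\beta_2 < 3$, the distribution is platykurtic.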
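The moment coefficients can also be computed directly from the frequency table, without the shortcut through an assumed mean. The sketch below is illustrative (the function names are not from the notes); it takes moments about the true mean, so rounding in intermediate steps does not accumulate.

```python
def freq_central_moments(xs, fs):
    """Return (mean, mu2, mu3, mu4) for values xs with frequencies fs."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    mu2, mu3, mu4 = (
        sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n for r in (2, 3, 4)
    )
    return mean, mu2, mu3, mu4


def beta_coefficients(xs, fs):
    """Return (beta1, beta2) = (mu3^2 / mu2^3, mu4 / mu2^2)."""
    _, mu2, mu3, mu4 = freq_central_moments(xs, fs)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


xs = [2, 4, 6, 8, 10, 12, 14]
fs = [4, 11, 18, 27, 20, 16, 8]
beta1, beta2 = beta_coefficients(xs, fs)
print(round(beta1, 4), round(beta2, 2))
```

Both coefficients are unchanged by a change of origin or scale, which is why the step-deviation shortcut gives the same kurtosis as the direct computation.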


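CORRELATION

Correlation: Let X and Y be two random variables. Correlation is a measure of their co-variability that takes the variances of X and Y into account.

Correlation coefficient: Let X and Y be two random variables. The correlation coefficient, denoted by $r$ or $\rho(X, Y)$, is defined by

$\rho(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$

Types of correlation: (i) positive and negative, (ii) simple, partial and multiple, (iii) linear and non-linear.

Lines of regression

Regression is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.

The line of regression of y on x is given by

$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$

The line of regression of x on y is given by

$x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y})$

Regression coefficients: The slopes $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ are the regression coefficients of y on x and of x on y respectively. They have the same sign as $r$, and $r^2 = b_{yx}\,b_{xy}$.

Covariance: A measure of association between two random variables, obtained as the expected value of the product of their deviations from their means; that is, $\mathrm{Cov}(X, Y) = E(XY) - E(X)\,E(Y)$.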

1. If the two regression coefficients are 0.8 and 0.6, find the coefficient of correlation. (L1)

Solution: Given $b_{yx} = 0.8$, $b_{xy} = 0.6$.

$r^2 = b_{yx}\,b_{xy} = (0.8)(0.6) = 0.48$

$r = \sqrt{0.48} = 0.693$, taking the positive root because both regression coefficients are positive.

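2. The two regression equations of the variables X and Y are x = 19.13 - 0.87y and … Find the correlation coefficient between X and Y. (L1)

Solution: Given that the regression equation of X on Y is x = 19.13 - 0.87y, the regression coefficient of X on Y is $b_{xy} = -0.87$.

The regression equation of Y on X is …, so the regression coefficient of Y on X is $b_{yx}$ = …

The correlation coefficient between X and Y is given by

$r = \pm\sqrt{b_{xy}\,b_{yx}}$

where the sign is the common sign of the two regression coefficients. Since $b_{xy}$ is negative, $r$ is negative here.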

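3. Calculate the coefficient of correlation between x and y from the following data. (L3)

x     1    3    5    8    9   10
y     3    4    8   10   12   11

Solution: $\bar{x} = \dfrac{36}{6} = 6$, $\bar{y} = \dfrac{48}{6} = 8$. Let $u = x - \bar{x}$ and $v = y - \bar{y}$.

          x      y      u      v    u^2    v^2     uv
          1      3     -5     -5     25     25     25
          3      4     -3     -4      9     16     12
          5      8     -1      0      1      0      0
          8     10      2      2      4      4      4
          9     12      3      4      9     16     12
         10     11      4      3     16      9     12
Total    36     48      0      0     64     70     65

$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \dfrac{65}{\sqrt{64}\,\sqrt{70}} = \dfrac{65}{66.93} = 0.971$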
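The deviation-from-means formula for the correlation coefficient translates directly into code. This is a minimal sketch with an illustrative function name; it assumes the two lists have equal length and that neither variable is constant.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 3, 5, 8, 9, 10]
y = [3, 4, 8, 10, 12, 11]
print(round(pearson_r(x, y), 4))
```

The result always lies between -1 and 1, which is a quick sanity check on any hand calculation.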

4. Calculate the coefficient of correlation between x and y. (L3)

x     1    2    3    4    5    6    7    8    9
y    12   11   13   15   14   17   16   19   18

Solution: $\bar{x} = \dfrac{45}{9} = 5$, $\bar{y} = \dfrac{135}{9} = 15$. Let $u = x - \bar{x}$ and $v = y - \bar{y}$.

          x      y      u      v    u^2    v^2     uv
          1     12     -4     -3     16      9     12
          2     11     -3     -4      9     16     12
          3     13     -2     -2      4      4      4
          4     15     -1      0      1      0      0
          5     14      0     -1      0      1      0
          6     17      1      2      1      4      2
          7     16      2      1      4      1      2
          8     19      3      4      9     16     12
          9     18      4      3     16      9     12
Total    45    135      0      0     60     60     56

$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \dfrac{56}{\sqrt{60}\,\sqrt{60}} = \dfrac{56}{60} = 0.933$

5. Ten competitors in a musical test were ranked by 3 judges X, Y, Z in the following order. (L2)

Competitor    A    B    C    D    E    F    G    H    I    J
Rank by X     1    6    5   10    3    2    4    9    7    8
Rank by Y     3    5    8    4    7   10    2    1    6    9
Rank by Z     6    4    9    8    1    2    3   10    5    7

Using the rank correlation method, discuss which pair of judges has the nearest approach.

Solution: Let $d_1 = x - y$, $d_2 = y - z$, $d_3 = x - z$.

          x      y      z     d1     d2     d3   d1^2   d2^2   d3^2
          1      3      6     -2     -3     -5      4      9     25
          6      5      4      1      1      2      1      1      4
          5      8      9     -3     -1     -4      9      1     16
         10      4      8      6     -4      2     36     16      4
          3      7      1     -4      6      2     16     36      4
          2     10      2     -8      8      0     64     64      0
          4      2      3      2     -1      1      4      1      1
          9      1     10      8     -9     -1     64     81      1
          7      6      5      1      1      2      1      1      4
          8      9      7     -1      2      1      1      4      1
Total                                             200    214     60

With $n = 10$, $n(n^2 - 1) = 990$.

The rank correlation between x and y is

$\rho(x, y) = 1 - \dfrac{6\sum d_1^2}{n(n^2 - 1)} = 1 - \dfrac{6(200)}{990} = -0.2121$

The rank correlation between y and z is

$\rho(y, z) = 1 - \dfrac{6\sum d_2^2}{n(n^2 - 1)} = 1 - \dfrac{6(214)}{990} = -0.2970$

The rank correlation between x and z is

$\rho(x, z) = 1 - \dfrac{6\sum d_3^2}{n(n^2 - 1)} = 1 - \dfrac{6(60)}{990} = 0.6364$

Since $\rho(x, z)$ is the maximum and also positive, we conclude that the pair of judges X and Z has the nearest approach to common likings in music.
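The comparison of the three pairs of judges can be reproduced with a short script. This is a sketch using the rank-difference formula, which is valid only when there are no tied ranks; the function name is illustrative.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation 1 - 6*sum(d^2) / (n(n^2 - 1)), no ties assumed."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rank_x = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
rank_y = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
rank_z = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

pairs = {
    "X and Y": spearman_rho(rank_x, rank_y),
    "Y and Z": spearman_rho(rank_y, rank_z),
    "X and Z": spearman_rho(rank_x, rank_z),
}
for name, rho in pairs.items():
    print(name, round(rho, 4))
print("Closest pair:", max(pairs, key=pairs.get))
```

When ranks are tied, the formula needs a correction term for each group of ties, which this sketch does not include.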

6. From the following data, calculate (L3)
(i) the two regression equations,
(ii) the coefficient of correlation between the marks in Economics and Statistics,
(iii) the most likely marks in Statistics when the marks in Economics are 30.

Marks in Economics     25   28   35   32   31   36   29   38   34   32
Marks in Statistics    43   46   49   41   36   32   31   30   33   39

Solution: Let x denote the marks in Economics and y the marks in Statistics, with $u = x - \bar{x} = x - 32$ and $v = y - \bar{y} = y - 38$.

          x      y      u      v    u^2    v^2     uv
         25     43     -7      5     49     25    -35
         28     46     -4      8     16     64    -32
         35     49      3     11      9    121     33
         32     41      0      3      0      9      0
         31     36     -1     -2      1      4      2
         36     32      4     -6     16     36    -24
         29     31     -3     -7      9     49     21
         38     30      6     -8     36     64    -48
         34     33      2     -5      4     25    -10
         32     39      0      1      0      1      0
Total   320    380      0      0    140    398    -93

Here $\bar{x} = \dfrac{\sum x}{n} = \dfrac{320}{10} = 32$ and $\bar{y} = \dfrac{\sum y}{n} = \dfrac{380}{10} = 38$.

Coefficient of regression of y on x is

$b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{-93}{140} = -0.6643$

Coefficient of regression of x on y is

$b_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \dfrac{-93}{398} = -0.2337$

(i) Equation of the line of regression of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$,

i.e. $x - 32 = -0.2337(y - 38) = -0.2337y + 0.2337 \times 38$

$x = -0.2337y + 40.8806$

Equation of the line of regression of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$,

i.e. $y - 38 = -0.6643(x - 32) = -0.6643x + 0.6643 \times 32$

$y = -0.6643x + 59.2576$

(ii) Coefficient of correlation:

$r^2 = b_{yx}\,b_{xy} = (-0.6643)(-0.2337) = 0.1552$

$r = -\sqrt{0.1552} = -0.394$, taking the negative root because both regression coefficients are negative.

(iii) When x = 30:

$y = -0.6643(30) + 59.2576 = 39.33 \approx 39$
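The whole regression calculation (means, both regression coefficients, the correlation coefficient and a prediction) can be sketched in Python. The function names `regression_summary` and `predict_y` are illustrative and not part of the notes.

```python
from math import copysign, sqrt


def regression_summary(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy, r) from paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sxy / sxx  # slope of the line of y on x
    b_xy = sxy / syy  # slope of the line of x on y
    # r takes the common sign of the two regression coefficients.
    r = copysign(sqrt(b_yx * b_xy), sxy)
    return mean_x, mean_y, b_yx, b_xy, r


def predict_y(x, mean_x, mean_y, b_yx):
    """Estimate y from the line of regression of y on x."""
    return mean_y + b_yx * (x - mean_x)


economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
statistics = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
mean_x, mean_y, b_yx, b_xy, r = regression_summary(economics, statistics)
print(round(b_yx, 4), round(b_xy, 4), round(r, 3))
print(round(predict_y(30, mean_x, mean_y, b_yx), 2))
```

To estimate y from x, the line of y on x is the one to use; the line of x on y would be used for the reverse estimate.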

7. Find the regression equation of capacity utilization on production from the following data. (L2)

                                         Average    Standard deviation
Production (in lakh units)                 35.6          10.5
Capacity utilization (in percentage)       84.8           8.5

r = 0.62. Estimate the production when the capacity utilization is 70 percent.

Solution: Let production be denoted by the variable x and capacity utilization by y. Then the regression equation of y on x is given by

$y - \bar{y} = b_{yx}(x - \bar{x})$ ----------------------(1)

where $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = 0.62 \times \dfrac{8.5}{10.5} = 0.5019$

and $\bar{x} = 35.6$, $\bar{y} = 84.8$.

(1) gives $y - 84.8 = 0.5019(x - 35.6)$

$y = 66.9324 + 0.5019x$

which is the required regression of capacity utilization on production.

To estimate production, the regression equation of x on y is needed:

$x - \bar{x} = b_{xy}(y - \bar{y})$ -------------------------(2)

where $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = 0.62 \times \dfrac{10.5}{8.5} = 0.7659$

(2) gives $x - 35.6 = 0.7659(y - 84.8)$

$x = 35.6 + 0.7659y - 64.9483 = 0.7659y - 29.3483$

When y = 70, $x = 0.7659(70) - 29.3483 = 24.2647$.

Hence the estimated production is 24.2647 lakh units when the capacity utilization is 70 percent.
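When only the means, standard deviations and r are known, the estimate follows from the line of regression of x on y. A minimal sketch, with an illustrative function name:

```python
def predict_x_from_y(y, mean_x, mean_y, sd_x, sd_y, r):
    """Estimate x from the line x - mean_x = r (sd_x / sd_y) (y - mean_y)."""
    b_xy = r * sd_x / sd_y
    return mean_x + b_xy * (y - mean_y)


# Production (lakh units): mean 35.6, s.d. 10.5
# Capacity utilization (%): mean 84.8, s.d. 8.5, with r = 0.62
estimate = predict_x_from_y(70, mean_x=35.6, mean_y=84.8, sd_x=10.5, sd_y=8.5, r=0.62)
print(round(estimate, 2))
```

The script keeps full precision for the regression coefficient, so its answer can differ from the hand calculation in the third or fourth decimal place.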

8. The two lines of regression are $8x - 10y = -66$ and $40x - 18y = 214$. (L6) The variance of x is 9. Evaluate
(i) the mean values of X and Y,
(ii) the correlation coefficient between X and Y.

Solution:

(i) Since both lines of regression pass through the mean values, the point $(\bar{x}, \bar{y})$ must satisfy the two given regression lines, i.e.

$8\bar{x} - 10\bar{y} = -66$ -----------------(1)

$40\bar{x} - 18\bar{y} = 214$ -----------------(2)

(1) × 5: $40\bar{x} - 50\bar{y} = -330$

(2): $40\bar{x} - 18\bar{y} = 214$

Subtracting: $-32\bar{y} = -544$, so $\bar{y} = 17$.

From (1): $8\bar{x} - 10(17) = -66$, so $8\bar{x} = 104$ and $\bar{x} = 13$.

(ii) From (1), $10y = 8x + 66$, so $y = 0.8x + 6.6$ and $b_{yx} = 0.8$.

From (2), $40x = 18y + 214$, so $x = 0.45y + 5.35$ and $b_{xy} = 0.45$.

$r^2 = b_{yx}\,b_{xy} = (0.8)(0.45) = 0.36$, so $r = \pm 0.6$.

Since both regression coefficients are positive, r must be positive: $r = 0.6$.
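The two steps of this solution, solving the pair of lines for the means and multiplying the two regression coefficients, can be sketched as follows. The function name and the tuple convention `(a, b, c)` for a line `a x + b y = c` are assumptions made for the example. If treating the first line as y on x gives a product greater than 1, the roles of the two lines are swapped.

```python
from math import copysign, sqrt


def analyse_regression_lines(line1, line2):
    """Each line is (a, b, c) meaning a*x + b*y = c. Returns (mean_x, mean_y, r)."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2

    # The lines intersect at the means (Cramer's rule).
    det = a1 * b2 - a2 * b1
    mean_x = (c1 * b2 - c2 * b1) / det
    mean_y = (a1 * c2 - a2 * c1) / det

    # Assume line1 is y on x and line2 is x on y.
    b_yx = -a1 / b1
    b_xy = -b2 / a2
    if b_yx * b_xy > 1:  # impossible for r^2, so the roles must be reversed
        b_yx = -a2 / b2
        b_xy = -b1 / a1

    r = copysign(sqrt(b_yx * b_xy), b_yx)
    return mean_x, mean_y, r


print(analyse_regression_lines((8, -10, -66), (40, -18, 214)))
```

With the variance of x known, the standard deviation of y would follow from $b_{yx} = r\,\sigma_y / \sigma_x$.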