Measures of Variation


For discrete variables: the Index of Qualitative Variation

Religious Preference   Percent

Protestant              65.6
Catholic                24.2
Jewish                   2.3
Other                    1.2
None                     6.1
No Answer                0.5

$$\mathrm{IQV} = \frac{1 - \sum_{i=1}^{K} p_i^2}{(K - 1)/K}$$

where p_i = proportion of cases in category i and K = number of categories

Religious Preference   Percent   Proportion   Proportion²

Protestant              65.6      0.656        0.430
Catholic                24.2      0.242        0.059
Jewish                   2.3      0.023        0.001
Other                    1.2      0.012        0.000
None                     6.1      0.061        0.004
No Answer                0.5      0.005        0.000

$$\sum_{i=1}^{K} p_i^2 = 0.494$$


IQV = (1 - 0.494) / [(6 - 1) / 6] = (0.506) / (5 / 6)

= (0.506) / (0.833) = 0.61
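The calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `iqv` and the variable names are hypothetical, and the percentages are the ones in the religious-preference table. Because the code squares unrounded proportions, its intermediate sum differs slightly from the table's 0.494, but the result still rounds to 0.61.

```python
def iqv(proportions):
    """Index of Qualitative Variation: (1 - sum of p_i^2) / ((K - 1) / K)."""
    k = len(proportions)
    if k < 2:
        raise ValueError("IQV needs at least two categories")
    sum_sq = sum(p * p for p in proportions)
    return (1 - sum_sq) / ((k - 1) / k)


# Percentages from the religious-preference table.
percents = [65.6, 24.2, 2.3, 1.2, 6.1, 0.5]
proportions = [pct / 100 for pct in percents]

religion_iqv = iqv(proportions)
print(round(religion_iqv, 2))  # 0.61
```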

What does this mean?

When there is perfect dispersion, IQV = 1.00

When there is no dispersion, IQV = 0.00

Religious Preference   Percent   Proportion   Proportion²

Protestant              16.67     0.1667       0.0279
Catholic                16.67     0.1667       0.0279
Jewish                  16.67     0.1667       0.0279
Other                   16.67     0.1667       0.0279
None                    16.67     0.1667       0.0279
No Answer               16.67     0.1667       0.0279

$$\sum_{i=1}^{K} p_i^2 = 0.1674$$

IQV = (1 - 0.1674) / [(6 - 1) / 6]

= (0.833) / (5 / 6) = (0.833) / (0.833)

= 1.00

Religious Preference   Percent   Proportion   Proportion²

Protestant             100.00     1.00         1.00
Catholic                 0.00     0.00         0.00
Jewish                   0.00     0.00         0.00
Other                    0.00     0.00         0.00
None                     0.00     0.00         0.00
No Answer                0.00     0.00         0.00

$$\sum_{i=1}^{K} p_i^2 = 1.000$$

IQV = (1 - 1.000) / [(6 - 1) / 6] = (0.000) / (5 / 6) = (0.000) / (0.833)

= 0.00
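The two limiting cases can be checked numerically. The sketch below is illustrative only: `iqv`, `uniform`, and `concentrated` are hypothetical names, and K = 6 is taken from the religious-preference example.

```python
def iqv(proportions):
    """Index of Qualitative Variation: (1 - sum of p_i^2) / ((K - 1) / K)."""
    k = len(proportions)
    if k < 2:
        raise ValueError("IQV needs at least two categories")
    sum_sq = sum(p * p for p in proportions)
    return (1 - sum_sq) / ((k - 1) / k)


k = 6
uniform = [1 / k] * k                    # cases spread evenly over all K categories
concentrated = [1.0] + [0.0] * (k - 1)   # every case in a single category

iqv_max = iqv(uniform)        # perfect dispersion, approximately 1.0
iqv_min = iqv(concentrated)   # no dispersion, 0.0
print(round(iqv_max, 2), round(iqv_min, 2))
```

Dividing by (K - 1) / K is what rescales the index so that its maximum is 1.00 regardless of the number of categories.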

For continuous variables:

1. range
2. interquartile range
3. standard deviation
4. variance

The Range

The distance across 100% of scores

Range = H – L + 1

For example, take the following 12 values (N = 12):

15, 2, 27, 32, 3, 5, 35, 7, 31, 42, 37, 39

To determine any of the so-called quantile statistics such as the range, the scores first must be ranked or ordered, here in descending order: 

[42.5]
 1st  42
 2nd  39
 3rd  37
 4th  35
 5th  32
 6th  31
 7th  27
 8th  15
 9th   7
10th   5
11th   3
12th   2
[1.5]

Range = 42 – 2 + 1 = 41.0
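A minimal Python sketch of the same calculation follows. The function name `inclusive_range` is hypothetical, and the sketch assumes whole-number scores, so that adding 1 is equivalent to measuring from the lower limit 1.5 to the upper limit 42.5.

```python
def inclusive_range(values):
    """Range as defined on the slide: highest score - lowest score + 1."""
    return max(values) - min(values) + 1


# The 12 ranked scores from the example.
scores = [42, 39, 37, 35, 32, 31, 27, 15, 7, 5, 3, 2]

score_range = inclusive_range(scores)
print(score_range)  # 41
```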

The Interquartile Range

The distance across the middle 50% of scores

IQR = Q3 – Q1

 1st  42
 2nd  39
 3rd  37
 4th  35
 5th  32
 6th  31
 7th  27
 8th  15
 9th   7
10th   5
11th   3
12th   2

[Figure: SAS stem-and-leaf display with boxplot, from "Univariate and EDA Statistics", PPD 404. Stem.Leaf values are multiplied by 10**+3.]

 1st  42
 2nd  39
 3rd  37
------------------------------- Q3 = (37.5 + 34.5)/2 = 36.0
 4th  35
 5th  32
 6th  31
 7th  27
 8th  15
 9th   7
------------------------------- Q1 = (7.5 + 4.5)/2 = 6.00
10th   5
11th   3
12th   2

IQR = Q3 – Q1 = 36.0 – 6.0 = 30.0
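The quartile locations used above can be sketched in Python. This is an illustration, not a general quantile routine: the function name `quartiles_by_position` is hypothetical, and the sketch assumes N is a multiple of 4, as it is here. Averaging the two scores on either side of each cut gives the same result as the slide's (37.5 + 34.5)/2, because the two half-unit adjustments cancel.

```python
def quartiles_by_position(values):
    """Return (Q1, Q3) as midpoints between the scores adjacent to each cut.

    Assumes len(values) is a multiple of 4, so each quarter holds N/4 scores.
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch assumes N is a positive multiple of 4")
    cut = n // 4
    # Q3 separates the top quarter from the rest; Q1 separates the bottom quarter.
    q3 = (ordered[cut - 1] + ordered[cut]) / 2
    q1 = (ordered[n - cut - 1] + ordered[n - cut]) / 2
    return q1, q3


scores = [42, 39, 37, 35, 32, 31, 27, 15, 7, 5, 3, 2]
q1, q3 = quartiles_by_position(scores)
iqr = q3 - q1
print(q3, q1, iqr)  # 36.0 6.0 30.0
```

Statistical packages use several different quantile definitions, so library routines may return slightly different quartiles for the same data.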

The Standard Deviation

  Y_i      Ȳ        Y_i − Ȳ

  33     19.333     13.667
  27     19.333      7.667
  19     19.333     -0.333
  14     19.333     -5.333
  12     19.333     -7.333
  11     19.333     -8.333

  Sum                0.000 (0.002 with the rounded values shown)

$$\sum_{i=1}^{N} (Y_i - \bar{Y}) = 0$$

The sum of the deviations will always be zero (except for rounding error).

The Sum of the Deviations

Values: 1, 2, 3, 4, 5 (Mean = 3.0)

Deviations from the mean: 1 is -2, 5 is +2, 2 is -1, 4 is +1, 3 is 0

Sum = (-2) + (+2) + (-1) + (+1) + (0) = 0.0
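A short Python check of this property, as a sketch with a hypothetical helper name `deviations_from_mean`. The first call uses the values 1 through 5; the second uses the six scores from the standard deviation example, where floating-point arithmetic may leave a negligible nonzero remainder.

```python
def deviations_from_mean(values):
    """Return each value minus the mean of all values."""
    mean = sum(values) / len(values)
    return [y - mean for y in values]


small = deviations_from_mean([1, 2, 3, 4, 5])
print(small)       # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(small))  # 0.0

six = deviations_from_mean([33, 27, 19, 14, 12, 11])
six_total = sum(six)  # zero up to floating-point rounding error
```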

  Y_i      Ȳ        Y_i − Ȳ    (Y_i − Ȳ)²

  33     19.333     13.667     186.787
  27     19.333      7.667      58.783
  19     19.333     -0.333       0.111
  14     19.333     -5.333      28.441
  12     19.333     -7.333      53.773
  11     19.333     -8.333      69.439

  Sum                0.000     397.334

The Variance

$$s_Y^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}$$

sY² = 397.334 / (6 - 1) = 79.467

The standard deviation

Simply the square root of the variance

sY = 8.914

$$s_Y = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}}$$
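The variance and standard deviation for the six scores can be reproduced with the sketch below. The variable names are hypothetical. The comparison with Python's `statistics.stdev` is included because that function also divides by N - 1.

```python
import math
import statistics

values = [33, 27, 19, 14, 12, 11]
n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean (about 397.333 before rounding).
sum_sq_dev = sum((y - mean) ** 2 for y in values)

variance = sum_sq_dev / (n - 1)   # sample variance, N - 1 in the denominator
std_dev = math.sqrt(variance)     # standard deviation = square root of the variance

print(round(variance, 3))  # 79.467
print(round(std_dev, 3))   # 8.914

# Cross-check against the standard library's sample standard deviation.
library_std_dev = statistics.stdev(values)
```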

Z-scores: pure numbers with a mean of 0.0 and a standard deviation of 1.00

z1 = (68 - 70.0) / 6.45 = (-2.00) / 6.45 = - 0.31

z1 = (68 - 70.0) / 12.88 = (-2.00) / 12.88 = - 0.16

$$z_i = \frac{Y_i - \bar{Y}}{s_Y}$$
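The two worked z-scores above can be reproduced with a small function. This is a sketch with hypothetical names (`z_score`, `z_narrow`, `z_wide`). It shows that the same raw score, 2 points below the mean, sits closer to the mean in standard-deviation units when the distribution is more spread out.

```python
def z_score(y, mean, std_dev):
    """Standardize a score: (Y_i - mean) / standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / std_dev


z_narrow = z_score(68, 70.0, 6.45)    # smaller standard deviation
z_wide = z_score(68, 70.0, 12.88)     # larger standard deviation

print(round(z_narrow, 2))  # -0.31
print(round(z_wide, 2))    # -0.16
```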

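Using SAS to Produce Z-Scores

libname old 'a:\';
libname library 'a:\';
options ps=66 nodate nonumber;

/* Copy POPULAT into POPSTD so the original values are kept */
data temp1;
  set old.cities;
  popstd = populat;
run;

/* Rescale POPSTD to mean 0 and standard deviation 1 */
proc standard data=temp1 mean=0.0 std=1.0 out=temp2;
  var popstd;
run;

/* List the raw values next to their z-scores */
proc print data=temp2;
  id populat;
  var popstd;
  title1 'Z-Scores Produced by PROC STANDARD';
  title2;
  title3 'PPD 404';
run;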

Z-Scores Produced by PROC STANDARD

PPD 404

POPULAT      POPSTD

    275    -0.28030
    116    -0.42296
    127    -0.41309
    497    -0.08112
    117    -0.42206
    301    -0.25698
     82    -0.45347
    641     0.04808
    453    -0.12060
    100    -0.43732
    241    -0.31081
     82    -0.45347
    101    -0.43642
     72    -0.46244
    393    -0.17443
     86    -0.44988
    175    -0.37002
     68    -0.46603
    108    -0.43014

libname mydata 'a:\';
libname library 'a:\';
options ps=66 nodate nonumber;

proc univariate data=mydata.cities;
  var populat;
  title1 'Univariate Statistics';
run;

Univariate Statistics

PPD 404

Univariate Procedure

Variable=POPULAT     NUMBER OF RESIDENTS, IN 1,000S

Moments

N                63    Sum Wgts         63
Mean       587.4127    Sum           37007
Std Dev    1114.554    Variance    1242231
Skewness   5.090201    Kurtosis   30.74326
USS        98756687    CSS        77018305
CV         189.7395    Std Mean   140.4206
T:Mean=0   4.183237    Pr>|T|       0.0001
Num ^= 0         63    Num > 0          63
M(Sign)        31.5    Pr>=|M|      0.0001
Sgn Rank       1008    Pr>=|S|      0.0001
W:Normal   0.468356    Pr<W         0.0001

Quantiles(Def=5)

100% Max    7896    99%    7896
 75% Q3      641    95%    1949
 50% Med     278    90%     906
 25% Q1      100    10%      72
  0% Min      56     5%      60
                     1%      56

Range       7840
Q3-Q1        541
Mode          56

Extremes

Lowest    Obs     Highest    Obs
    56  ( 30)        1511  ( 56)
    56  ( 24)        1949  ( 55)
    58  ( 46)        2816  ( 54)
    60  ( 21)        3367  ( 53)
    65  ( 51)        7896  ( 52)
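The Mean and Std Dev reported by PROC UNIVARIATE are enough to reproduce the POPSTD values listed in the PROC STANDARD output. The Python sketch below checks a few of them; the names `standardize`, `listed`, and `max_gap` are hypothetical, and only four of the printed rows are used.

```python
MEAN = 587.4127      # "Mean" from the PROC UNIVARIATE moments
STD_DEV = 1114.554   # "Std Dev" from the PROC UNIVARIATE moments


def standardize(value, mean=MEAN, std_dev=STD_DEV):
    """z-score of a city population (in 1,000s) relative to all 63 cities."""
    return (value - mean) / std_dev


# POPULAT -> POPSTD pairs copied from the PROC STANDARD listing.
listed = {275: -0.28030, 641: 0.04808, 82: -0.45347, 68: -0.46603}

recomputed = {pop: standardize(pop) for pop in listed}
max_gap = max(abs(recomputed[pop] - z) for pop, z in listed.items())

for pop, z in recomputed.items():
    print(pop, round(z, 5))
```

Only the city with 641 thousand residents has a positive z-score among these rows, because the mean (587.4) is pulled far above the median (278) by a few very large cities.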

Calculate the INDEX OF QUALITATIVE VARIATION for the data in the following table.

===============================================================
Service Branch         Frequency        P          P²
---------------------------------------------------------------
Air Force                  56
Army                      166
Marine Corps               14
Merchant Marines            1
Navy                       70
                        -------
Total                     307
---------------------------------------------------------------

===============================================================
Service Branch         Frequency        P          P²
---------------------------------------------------------------
Air Force                  56         0.182      0.033
Army                      166         0.541      0.292
Marine Corps               14         0.046      0.002
Merchant Marines            1         0.003      0.000
Navy                       70         0.228      0.052
                          ---                   ------
Total                     307                    0.379
---------------------------------------------------------------

INDEX OF QUALITATIVE VARIATION = 0.776 

IQV = (1 - 0.379) / [(5 - 1) / 5]

= (0.621) / (4 / 5)

= (0.621) / (0.800)

= 0.776
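The same exercise can be worked directly from the frequencies. In the sketch below the variable names are hypothetical and the counts are those in the service-branch table. Because it does not round P and P² to three decimals first, it yields about 0.775 rather than the 0.776 obtained from the rounded table entries.

```python
# Frequencies from the service-branch table.
counts = {
    "Air Force": 56,
    "Army": 166,
    "Marine Corps": 14,
    "Merchant Marines": 1,
    "Navy": 70,
}

total = sum(counts.values())                      # 307 cases
proportions = [c / total for c in counts.values()]
k = len(proportions)                              # K = 5 categories

sum_sq = sum(p * p for p in proportions)          # about 0.380 unrounded
branch_iqv = (1 - sum_sq) / ((k - 1) / k)

print(total, round(sum_sq, 3), round(branch_iqv, 3))
```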

Here are data once again from 16 European countries.

==============================================================================
                   Gross Domestic     Percent in      Crude Birth
Nation             Product (GDP)      Agriculture     Rate per
                   (in billion $)                     1,000
------------------------------------------------------------------------------
Austria                   3               18              18
Belgium                   4                7              16
Denmark                   6               23              18
Finland                   7               38              17
France                    8               25              18
Germany                 112                8              17
Great Britain            98                5              18
Greece                    9               48              18
Ireland                  10               42              22
Italy                    17               24              19
Netherlands              18               13              18
Norway                    7               24              18
Portugal                  4               48              23
Spain                    18               36              21
Sweden                   20               18              16
Switzerland              14               15              19
------------------------------------------------------------------------------

What is the RANGE for the PERCENT IN AGRICULTURE?

What is the INTERQUARTILE RANGE for the PERCENT IN AGRICULTURE?

First, rank the values in descending order. Find the difference between the HIGHEST and LOWEST values (and add 1). 

48, 48, 42, 38, 36, 25, 24, 24, 23, 18, 18, 15, 13, 8, 7, 5

  

RANGE = H – L + 1 = 48 – 5 + 1 = 44.0

Having ranked the values in descending order, determine the value at the location dividing the upper 4 values from the lower 12 values. Then determine the value at the location dividing the upper 12 values from the lower 4 values. Find the difference between these two values. 

48
48
42
38
-- Q3 = (38.5 + 35.5) / 2 = 37.0
36
25
24
24
23
18
18
15
-- Q1 = (15.5 + 12.5) / 2 = 14.0
13
 8
 7
 5

  

IQR = Q3 – Q1 = 37.0 – 14.0 = 23.0
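Both answers for PERCENT IN AGRICULTURE can be checked with the sketch below. The names are hypothetical, and the quartile step assumes N is a multiple of 4 (here N = 16), so each cut falls midway between two adjacent ranked values.

```python
# Percent in agriculture for the 16 countries, in table order.
agriculture = [18, 7, 23, 38, 25, 8, 5, 48, 42, 24, 13, 24, 48, 36, 18, 15]

ordered = sorted(agriculture, reverse=True)
n = len(ordered)
assert n % 4 == 0, "this sketch assumes N is a multiple of 4"
cut = n // 4

# Range as defined on the slides: H - L + 1.
agri_range = ordered[0] - ordered[-1] + 1

# Q3 divides the upper 4 values from the lower 12; Q1 the upper 12 from the lower 4.
q3 = (ordered[cut - 1] + ordered[cut]) / 2
q1 = (ordered[n - cut - 1] + ordered[n - cut]) / 2
agri_iqr = q3 - q1

print(agri_range, q3, q1, agri_iqr)  # 44 37.0 14.0 23.0
```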


What is the STANDARD DEVIATION for GDP?

What is Germany’s Z-SCORE for GDP?

First, determine the value of the mean.

$$\bar{Y} = \frac{355}{16} = 22.1875$$

Next, determine the deviations and squared deviations for each value. 

  Y_i      Y_i − Ȳ     (Y_i − Ȳ)²

    3     -19.1875      368.1602
    4     -18.1875      330.7852
    6     -16.1875      262.0352
    7     -15.1875      230.6602
    8     -14.1875      201.2852
  112      89.8125     8066.285
   98      75.8125     5747.535
    9     -13.1875      173.9102
   10     -12.1875      148.5352
   17      -5.1875       26.91016
   18      -4.1875       17.53516
    7     -15.1875      230.6602
    4     -18.1875      330.7852
   18      -4.1875       17.53516
   20      -2.1875        4.785156
   14      -8.1875       67.03516

  355                  16224.44

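$$s_Y = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}}$$

$$\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = 16{,}224.44$$

$$s_Y = \sqrt{\frac{16224.44}{15}} = \sqrt{1081.629} = 32.8881$$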

What is Germany’s Z-SCORE for GDP?

Germany’s GDP = 112
Mean GDP = 22.188

$$z_i = \frac{Y_i - \bar{Y}}{s_Y} = \frac{112 - 22.188}{32.888} = 2.731$$
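The whole GDP exercise (mean, standard deviation, and Germany's z-score) can be reproduced with the sketch below. The dictionary and variable names are hypothetical. The GDP figures are those in the table of 16 European countries.

```python
import math

# GDP (in billion $) for the 16 countries.
gdp = {
    "Austria": 3, "Belgium": 4, "Denmark": 6, "Finland": 7,
    "France": 8, "Germany": 112, "Great Britain": 98, "Greece": 9,
    "Ireland": 10, "Italy": 17, "Netherlands": 18, "Norway": 7,
    "Portugal": 4, "Spain": 18, "Sweden": 20, "Switzerland": 14,
}

values = list(gdp.values())
n = len(values)
mean = sum(values) / n                              # 355 / 16 = 22.1875

sum_sq_dev = sum((y - mean) ** 2 for y in values)   # about 16224.44
std_dev = math.sqrt(sum_sq_dev / (n - 1))           # N - 1 = 15 in the denominator

germany_z = (gdp["Germany"] - mean) / std_dev

print(mean, round(std_dev, 3), round(germany_z, 3))  # 22.1875 32.888 2.731
```

A z-score of about 2.73 places Germany's GDP nearly three standard deviations above the mean of these 16 countries.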