Measures of Variation

Measures of variation are used to determine the scatter of values in a distribution. In this chapter, we consider six measures of variation: the range, the quartile deviation, the mean deviation, the variance, the standard deviation and the coefficient of variation.

Range

The range is the difference between the highest and lowest values in the distribution.

Range = H − L

where H represents the highest value and L represents the lowest value.

Ungrouped data: subtract the lowest score from the highest score.

Example: Find the range of a distribution whose highest score is 100 and whose lowest score is 21.

Solution: Range = highest score − lowest score = 100 − 21 = 79

Grouped data: to find the range of a frequency distribution, subtract the lower class boundary of the lowest class interval from the upper class boundary of the highest class interval.

Example: Find the range of the following frequency distribution.

Class interval   Frequency
100-104          4
105-109          6
110-114          10
115-119          13
120-124          8
125-129          6
130-134          3
                 N = 50

Range = upper boundary of the highest class − lower boundary of the lowest class

= 134.5 − 99.5 = 35
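Both range rules can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the helper names `data_range` and `grouped_range` are hypothetical, and the grouped version assumes integer class limits one unit apart, so each class boundary lies 0.5 beyond its limit (as with 99.5 and 134.5 above). The ungrouped example only gives the two extreme scores, so only those are passed in.

```python
def data_range(values):
    """Range of ungrouped data: highest value minus lowest value."""
    return max(values) - min(values)


def grouped_range(classes):
    """Range of a frequency distribution given as (lower_limit, upper_limit, frequency).

    Assumes integer class limits one unit apart, so the boundaries sit
    0.5 below each lower limit and 0.5 above each upper limit.
    """
    lowest_boundary = min(lower for lower, _, _ in classes) - 0.5
    highest_boundary = max(upper for _, upper, _ in classes) + 0.5
    return highest_boundary - lowest_boundary


print(data_range([100, 21]))  # 79

table = [(100, 104, 4), (105, 109, 6), (110, 114, 10), (115, 119, 13),
         (120, 124, 8), (125, 129, 6), (130, 134, 3)]
print(grouped_range(table))  # 35.0
```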

Quartile Deviations and Mean Deviations

Quartile Deviation

The quartile deviation describes the dispersion of a data set in terms of the distance between selected observation points. The smaller the quartile deviation, the greater the concentration of observations in the middle half of the data set. Measures of this kind use percentiles, deciles or quartiles. The quartile deviation (QD), also called the semi-interquartile range, is half the difference between the upper quartile (Q3) and the lower quartile (Q1) of a distribution. The difference Q3 − Q1 is called the interquartile range.

Formula: $QD = \dfrac{Q_3 - Q_1}{2}$

where $Q_1$ and $Q_3$ are the first and third quartiles and $Q_3 - Q_1$ is the interquartile range.

 

A. Ungrouped Data

Example: Given the data below:

33, 52, 58, 41, 56, 71, 77, 74, 85, 45, 82, 50, 62, 51, 67, 79, 48, 83, 43, 81, 38, 79, 65, 68, 59

Solution: Arrange the 25 entries (n = 25) from lowest to highest.

33  38  41  43  45
48  50  51  52  56
58  59  62  65  67
68  71  74  77  79
79  81  82  83  85

(41 is the 3rd entry, 48 the 6th entry, 77 the 19th entry and 82 the 23rd entry.)

For the semi-interquartile range: since Q3 = P75 and Q1 = P25, we find P75 and P25.

For P75:

Cumulative frequency of P75 = (75/100) × 25 = 18.75, or 19

This means that P75 is the 19th entry. Therefore, P75 = 77.

For P25:

Cumulative frequency of P25 = (25/100) × 25 = 6.25, or 6

This means that P25 is the 6th entry. Therefore, P25 = 48.

Semi-interquartile range $= \dfrac{Q_3 - Q_1}{2} = \dfrac{P_{75} - P_{25}}{2} = \dfrac{77 - 48}{2} = \dfrac{29}{2}$

Hence the semi-interquartile range = 14.5
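The rank method used above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and positions are rounded half up (18.75 to 19, 6.25 to 6) to mirror the worked example. Other conventions, such as interpolating between neighbouring entries, can give slightly different quartiles.

```python
import math


def percentile_by_rank(sorted_values, p):
    """Return the entry at position p/100 * n, counting from 1.

    Assumption: the position is rounded half up, as in the worked example.
    """
    n = len(sorted_values)
    position = math.floor(p / 100 * n + 0.5)
    position = min(max(position, 1), n)  # keep the rank inside 1..n
    return sorted_values[position - 1]


def quartile_deviation(values):
    """Semi-interquartile range: (P75 - P25) / 2."""
    data = sorted(values)
    q3 = percentile_by_rank(data, 75)
    q1 = percentile_by_rank(data, 25)
    return (q3 - q1) / 2


scores = [33, 52, 58, 41, 56, 71, 77, 74, 85, 45, 82, 50, 62,
          51, 67, 79, 48, 83, 43, 81, 38, 79, 65, 68, 59]
print(quartile_deviation(scores))  # 14.5
```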

B. Grouped Data

Example:

Class interval   f    <cf (less-than cumulative frequency)
21-23            3    3
24-26            4    7
27-29            6    13
30-32            10   23
33-35            5    28
36-38            2    30
                 n = 30

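Solution: Note that Q3 − Q1 = P75 − P25. Each percentile is found by interpolating within the class that contains it:

$P_j = L + \dfrac{\frac{j}{100}\,n - F}{f} \times c$

where L is the lower boundary of the class containing $P_j$, F is the cumulative frequency below that class, f is the frequency of that class and c is the class width.

For P75:

Cumulative frequency of P75 = (75/100) × 30 = 22.5, which falls in the class 30-32

L = 29.5, f = 10, F = 13, c = 3, j = 75

P75 = 29.5 + ((22.5 − 13) / 10) × 3 = 32.35

For P25:

Cumulative frequency of P25 = (25/100) × 30 = 7.5, which falls in the class 27-29

L = 26.5, f = 6, F = 7, c = 3, j = 25

P25 = 26.5 + ((7.5 − 7) / 6) × 3 = 26.75

Finally, the interquartile range is P75 − P25 = 32.35 − 26.75 = 5.6, so the quartile deviation is 5.6 / 2 = 2.8.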
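The interpolation formula for grouped percentiles can be sketched in Python. This is an illustrative sketch, not a reference implementation: `grouped_percentile` is a hypothetical name, classes are given as (lower_limit, upper_limit, frequency) in ascending order, and integer limits one unit apart are assumed, so the lower boundary is lower_limit − 0.5 and the class width is upper_limit − lower_limit + 1.

```python
def grouped_percentile(classes, j):
    """Interpolated j-th percentile of a frequency distribution.

    classes: list of (lower_limit, upper_limit, frequency), ascending.
    Assumes integer class limits one unit apart.
    """
    n = sum(f for _, _, f in classes)
    target = j / 100 * n   # cumulative frequency of P_j
    cum_below = 0          # F: cumulative frequency below the current class
    for lower, upper, f in classes:
        if cum_below + f >= target:
            L = lower - 0.5             # lower class boundary
            c = upper - lower + 1       # class width
            return L + (target - cum_below) / f * c
        cum_below += f
    raise ValueError("j must be between 0 and 100")


table = [(21, 23, 3), (24, 26, 4), (27, 29, 6),
         (30, 32, 10), (33, 35, 5), (36, 38, 2)]
p75 = grouped_percentile(table, 75)
p25 = grouped_percentile(table, 25)
print(round(p75, 2), round(p25, 2))                    # 32.35 26.75
print(round(p75 - p25, 2), round((p75 - p25) / 2, 2))  # 5.6 2.8
```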

Mean Deviation

The mean deviation, or average deviation, is the arithmetic mean of the absolute deviations of the values from their mean:

$\text{Mean deviation} = \dfrac{\sum_{i=1}^{N} |x_i - \bar{x}|}{N}$

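Example: Calculate the mean deviation of the following distribution: 9, 3, 8, 8, 9, 8, 9, 18

Solution: The mean is x̄ = 72 / 8 = 9. The absolute deviations from the mean are 0, 6, 1, 1, 0, 1, 0 and 9, which sum to 18, so the mean deviation is 18 / 8 = 2.25.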
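The definition translates directly into code. A minimal sketch; `mean_deviation` is a hypothetical helper name.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


print(mean_deviation([9, 3, 8, 8, 9, 8, 9, 18]))  # 2.25
```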

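Mean Deviation for Grouped Data

If the data are grouped in a frequency table, the mean deviation is

$\text{Mean deviation} = \dfrac{\sum |x_i - \bar{x}|\, f_i}{N}$

where $x_i$ is the class mark (midpoint) of each class, $f_i$ is its frequency and $N = \sum f_i$.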

Example: Calculate the mean deviation of the following distribution:

Class      xi     fi    xi · fi   |xi − x̄|   |xi − x̄| · fi
[10, 15)   12.5   3     37.5      9.286      27.858
[15, 20)   17.5   5     87.5      4.286      21.43
[20, 25)   22.5   7     157.5     0.714      4.998
[25, 30)   27.5   4     110       5.714      22.856
[30, 35)   32.5   2     65        10.714     21.428
Total             21    457.5                98.57

x̄ = 457.5 / 21 ≈ 21.786

Mean deviation = 98.57 / 21 ≈ 4.694
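The grouped formula can be sketched in Python as well. This sketch assumes the classes are half-open intervals [lower, upper) given as (lower, upper, frequency) and uses the midpoint as each class mark; `grouped_mean_deviation` is a hypothetical name. It works with the unrounded mean, so it agrees with the hand calculation above to three decimals.

```python
def grouped_mean_deviation(classes):
    """Mean deviation of a frequency table of (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    marks = [((lower + upper) / 2, f) for lower, upper, f in classes]  # (x_i, f_i)
    mean = sum(x * f for x, f in marks) / n
    return sum(abs(x - mean) * f for x, f in marks) / n


table = [(10, 15, 3), (15, 20, 5), (20, 25, 7), (25, 30, 4), (30, 35, 2)]
print(round(grouped_mean_deviation(table), 3))  # 4.694
```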

Variance

In probability theory and statistics, the variance measures how far a set of numbers is spread out. A variance of zero indicates that all the values are identical. Variance is always non-negative: a small variance indicates that the data points tend to be very close to the mean (expected value) and hence to each other, while a high variance indicates that the data points are very spread out around the mean and from each other.

It is important to distinguish between the variance of a population and the variance of a sample. They have different notation, and they are computed differently.

The variance of a population is denoted by σ², and the variance of a sample by s².

The variance of a population is defined by the following formula:

$\sigma^2 = \dfrac{\sum (X_i - \mu)^2}{N}$

where σ² is the population variance, μ is the population mean, $X_i$ is the ith element of the population, and N is the number of elements in the population.

The variance of a sample is defined by a slightly different formula:

$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$

where s² is the sample variance, x̄ is the sample mean, $x_i$ is the ith element of the sample, and n is the number of elements in the sample. Using this formula, the sample variance is an unbiased estimate of the population variance.

For example, suppose you want to find the variance of the scores on a test: 67, 72, 85, 93 and 98.

1. Write down the formula for the variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$. There are five scores in total, so N = 5 and $\sigma^2 = \dfrac{\sum (x - \mu)^2}{5}$.
2. Find the mean of the five scores: μ = (67 + 72 + 85 + 93 + 98) / 5 = 415 / 5 = 83, so $\sigma^2 = \dfrac{\sum (x - 83)^2}{5}$.
3. Compare each score (x = 67, 72, 85, 93, 98) with the mean (μ = 83) by subtracting: 67 − 83 = −16, 72 − 83 = −11, 85 − 83 = 2, 93 − 83 = 10 and 98 − 83 = 15. The formula now looks like this: σ² = [(−16)² + (−11)² + (2)² + (10)² + (15)²] / 5
4. Square each term in parentheses: (−16) × (−16) = 256, (−11) × (−11) = 121, 2 × 2 = 4, 10 × 10 = 100 and 15 × 15 = 225, so σ² = [256 + 121 + 4 + 100 + 225] / 5
5. Add up the numbers inside the brackets: σ² = 706 / 5
6. To get the final answer, divide the sum by 5 (because there are five scores). This is the variance of the data set:

σ² = 141.2
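The two variance formulas can be compared side by side in Python. A minimal sketch with hypothetical helper names; Python's standard `statistics` module provides `pvariance` (divide by N) and `variance` (divide by n − 1), which are used here only as a cross-check.

```python
import statistics


def population_variance(values):
    """sigma^2: mean of the squared deviations from the mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


scores = [67, 72, 85, 93, 98]
print(population_variance(scores))  # 141.2
print(sample_variance(scores))      # 176.5

# Cross-check against the standard library.
print(statistics.pvariance(scores), statistics.variance(scores))
```

Treating the same five scores as a sample rather than a population divides 706 by 4 instead of 5, giving 176.5.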

Standard Deviation and Coefficient of Variation

The standard deviation is a measure of how spread out numbers are. The symbol for the standard deviation is σ (the Greek letter sigma). This is the formula for the standard deviation:

$\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$

Say we have a bunch of numbers like 9, 2, 5, 4, 12, 7, 8, 11. To calculate the standard deviation of those numbers:

1. Work out the mean (the simple average of the numbers).
2. Then for each number, subtract the mean and square the result.
3. Then work out the mean of those squared differences.
4. Take the square root of that and we are done!

First, let us have some example values to work on.

Example: Sam has 20 rose bushes. The number of flowers on each bush is

9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4

Work out the standard deviation.

Step 1. Work out the mean. In the formula above, μ (the Greek letter "mu") is the mean of all our values.

Example: 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4

The mean is (9 + 2 + 5 + 4 + 12 + 7 + 8 + 11 + 9 + 3 + 7 + 4 + 12 + 5 + 4 + 10 + 9 + 6 + 9 + 4) / 20 = 140 / 20 = 7

So: μ = 7

Step 2. Then for each number, subtract the mean and square the result. This is the part of the formula that says $(x_i - \mu)^2$. So what is $x_i$? They are the individual x values 9, 2, 5, 4, 12, 7, etc. In other words, $x_1 = 9$, $x_2 = 2$, $x_3 = 5$, etc.

So it says "for each value, subtract the mean and square the result", like this:

Example (continued):

(9 − 7)² = (2)² = 4
(2 − 7)² = (−5)² = 25
(5 − 7)² = (−2)² = 4
(4 − 7)² = (−3)² = 9
(12 − 7)² = (5)² = 25
(7 − 7)² = (0)² = 0
(8 − 7)² = (1)² = 1
... etc. ...

Step 3. Then work out the mean of those squared differences. To work out the mean, add up all the values, then divide by how many there are. First add up all the values from the previous step.

But how do we say "add them all up" in mathematics? We use "Sigma": Σ. The handy Sigma notation says to sum up as many terms as we want. We want to add up all the values from 1 to N, where N = 20 in our case because there are 20 values:

Example (continued):

$\sum_{i=1}^{N}(x_i - 7)^2$

which means: sum all values from $(x_1 - 7)^2$ to $(x_N - 7)^2$.

We already calculated $(x_1 - 7)^2 = 4$ etc. in the previous step, so just sum them up:

= 4 + 25 + 4 + 9 + 25 + 0 + 1 + 16 + 4 + 16 + 0 + 9 + 25 + 4 + 9 + 9 + 4 + 1 + 4 + 9 = 178

But that isn't the mean yet; we need to divide by how many, which is simply done by multiplying by 1/N:

Example (continued):

Mean of squared differences = (1/20) × 178 = 8.9

(Note: this value is called the "variance".)

Step 4. Take the square root of that.

Example (concluded):

σ = √8.9 = 2.983...

Sample Standard Deviation

Sometimes our data is only a sample of the whole population.

Example: Sam has 20 rose bushes, but what if Sam only counted the flowers on 6 of them? The "population" is all 20 rose bushes, and the "sample" is the 6 he counted. Let us say they are: 9, 2, 5, 4, 12, 7

We can still estimate the Standard Deviation.

But when we use the sample as an estimate of the whole population, the standard deviation formula changes. The formula for the sample standard deviation is:

$s = \sqrt{\dfrac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$

The important change is N − 1 instead of N (which is called "Bessel's correction"). The symbols also change to reflect that we are working on a sample instead of the whole population:

The mean is now x̄ (for sample mean) instead of μ (the population mean), and the answer is s (for sample standard deviation) instead of σ. But that does not affect the calculations; only N − 1 instead of N changes them.

OK, let us now calculate the sample standard deviation.

Step 1. Work out the mean.

Example 2: Using the sampled values 9, 2, 5, 4, 12, 7, the mean is (9 + 2 + 5 + 4 + 12 + 7) / 6 = 39 / 6 = 6.5

So: x̄ = 6.5

Step 2. Then for each number, subtract the mean and square the result.

Example 2 (continued):

(9 − 6.5)² = (2.5)² = 6.25
(2 − 6.5)² = (−4.5)² = 20.25
(5 − 6.5)² = (−1.5)² = 2.25
(4 − 6.5)² = (−2.5)² = 6.25
(12 − 6.5)² = (5.5)² = 30.25
(7 − 6.5)² = (0.5)² = 0.25

Step 3. Then work out the mean of those squared differences.

To work out the mean, add up all the values, then divide by how many there are. But hang on... we are calculating the sample standard deviation, so instead of dividing by how many (N), we divide by N − 1.

Example 2 (continued):

Sum = 6.25 + 20.25 + 2.25 + 6.25 + 30.25 + 0.25 = 65.5

Divide by N − 1: (1/5) × 65.5 = 13.1

(This value is called the "Sample Variance")

Step 4. Take the square root of that.

Example 2 (concluded):

s = √(13.1) = 3.619...

Comparing

When we used the whole population we got: Mean = 7, Standard Deviation = 2.983...

When we used the sample we got: Sample Mean = 6.5, Sample Standard Deviation = 3.619...

Our sample mean was off by about 7%, and our sample standard deviation was off by about 21%.
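The four steps, and the comparison between the population and sample results, can be reproduced in Python. A minimal sketch with hypothetical helper names; the standard library's `statistics.pstdev` and `statistics.stdev` compute the same two quantities.

```python
import math


def population_sd(values):
    """sigma: square root of the mean squared deviation from the mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """s: like population_sd, but divides by N - 1 (Bessel's correction)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


flowers = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]
sample = flowers[:6]  # the six bushes Sam counted: 9, 2, 5, 4, 12, 7

mu = sum(flowers) / len(flowers)
x_bar = sum(sample) / len(sample)
sigma = population_sd(flowers)
s = sample_sd(sample)

print(mu, round(sigma, 3))  # 7.0 2.983
print(x_bar, round(s, 3))   # 6.5 3.619

# Relative error of the sample estimates, as in the comparison above.
print(f"{abs(x_bar - mu) / mu:.0%} {abs(s - sigma) / sigma:.0%}")  # 7% 21%
```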

Why Would We Take a Sample?

Mostly because it is easier and cheaper.

Imagine you want to know what the whole country thinks ... you can't ask millions of people, so instead you ask maybe 1,000 people.

"You don't have to eat the whole ox to know that the meat is tough." 

This is the essential idea of sampling. To find out information about the population (such as the mean and standard deviation), we do not need to look at all members of the population; we only need a sample. But when we take a sample, we lose some accuracy.

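Summary

The population standard deviation:

$\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$

The sample standard deviation:

$s = \sqrt{\dfrac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$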

Coefficient of Variation (CV)

The coefficient of variation is a statistical measure of the dispersion of data points in a data series around the mean. It is the ratio of the standard deviation to the mean. The coefficient of variation is a helpful statistic for comparing the degree of variation of one data series with another, even when their means differ considerably.

In investing, the CV is used to compare the volatility (risk) assumed with the return expected from an investment. Put simply, a lower ratio of standard deviation to mean return indicates a better risk-return trade-off.

Coefficient of Variation Formula

The coefficient of variation, often abbreviated CV, is the ratio of the standard deviation to the mean. It measures the variability of the data: a higher coefficient of variation means that the data have higher variability and less stability, and a lower coefficient of variation means that the data have less variability and higher stability. The formula for the coefficient of variation is:

$CV = \dfrac{\text{Standard Deviation}}{\text{Mean}}$

Question: Find the coefficient of variation of 5, 10, 15 and 20.

Formula for the mean: $\bar{x} = \dfrac{\sum x}{n}$

$\bar{x} = \dfrac{50}{4} = 12.5$

x      x − x̄     (x − x̄)²
5      −7.5      56.25
10     −2.5      6.25
15     2.5       6.25
20     7.5       56.25
Σx = 50          Σ(x − x̄)² = 125

Formula for the population standard deviation:

$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{125}{4}} \approx 5.59$

Coefficient of variation = standard deviation / mean

$CV = \dfrac{5.59}{12.5} \approx 0.447$
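The CV calculation can be sketched with Python's standard `statistics` module. This is a minimal sketch; `coefficient_of_variation` and its `sample` flag are hypothetical. It uses the population standard deviation by default, as in the example, and refuses a zero mean, for which the ratio is undefined.

```python
import statistics


def coefficient_of_variation(values, sample=False):
    """Standard deviation divided by the mean.

    Uses the population standard deviation by default, as in the example;
    pass sample=True to use the sample standard deviation (N - 1) instead.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / mean


cv = coefficient_of_variation([5, 10, 15, 20])
print(round(cv, 3))  # 0.447
print(f"{cv:.1%}")   # 44.7%
```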