Numerical Measures of Variability

Deviations and the Spread of Data

A deviation is the difference between an observation value and the mean of its sample or population distribution.

$\text{Deviation of } x_i = x_i - \bar{x}$

The sum total of all deviations in a sample or population always equals zero.

$\text{Sum of deviations} = \sum_{i=1}^{n} (x_i - \bar{x}) = 0$
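Why the sum is always zero follows from the definition of the mean. A short derivation, writing n for the number of observations (the symbol n for the count is the only notation assumed here):

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})
  &= \sum_{i=1}^{n} x_i - n\bar{x} \\
  &= n\bar{x} - n\bar{x}
     && \text{since } \bar{x} = \tfrac{1}{n}\sum_{i=1}^{n} x_i \\
  &= 0
\end{align*}
```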

But deviations tell us something more about the sample or population than a measure of central tendency, such as the mean, does. The larger the deviations, both positive and negative, the more “spread out” the data are.

Example: Consider the difference between the ages of students in a KKU school bus and the ages of passengers in a typical city bus.

Let’s take a sample of the ages of n = 20 passengers (“observations”) in each of the two buses.

Note that, even though the spread of ages on the city bus is larger than that of the school bus, we can’t use the total deviations to show this!

Ages and deviations from the mean, n = 20 passengers per bus

        KKU School Bus         City Bus
No.     Age   Deviation     Age   Deviation
  1      23        2         14       -7
  2      20       -1         15       -6
  3      20       -1         22        1
  4      20       -1          6      -15
  5      22        1          2      -19
  6      22        1          7      -14
  7      21        0          5      -16
  8      21        0         27        6
  9      20       -1         35       14
 10      21        0         44       23
 11      21        0          7      -14
 12      20       -1         15       -6
 13      19       -2         16       -5
 14      19       -2         67       46
 15      22        1         12       -9
 16      20       -1         11      -10
 17      24        3         24        3
 18      20       -1         31       10
 19      24        3          4      -17
 20      21        0         56       35

Mean*    21                  21
Total              0                   0

*Mean = Σ Age / n = Σ Age / 20
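The point that total deviation cannot distinguish the two buses can be checked directly. The sketch below uses the ages from the two samples above; the names `kku_ages`, `city_ages`, and `deviations` are illustrative choices, not part of the original material.

```python
# Ages of the n = 20 sampled passengers on each bus, copied from the example.
kku_ages = [23, 20, 20, 20, 22, 22, 21, 21, 20, 21,
            21, 20, 19, 19, 22, 20, 24, 20, 24, 21]
city_ages = [14, 15, 22, 6, 2, 7, 5, 27, 35, 44,
             7, 15, 16, 67, 12, 11, 24, 31, 4, 56]


def deviations(values):
    """Return each value minus the mean of the list."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


kku_dev = deviations(kku_ages)
city_dev = deviations(city_ages)

# Both buses have the same mean age and the same (zero) total deviation,
# even though the individual city-bus deviations are far larger.
print("KKU:  mean =", sum(kku_ages) / len(kku_ages), " total deviation =", sum(kku_dev))
print("City: mean =", sum(city_ages) / len(city_ages), " total deviation =", sum(city_dev))
```

Because the positive and negative deviations cancel, the totals say nothing about spread; a different summary is needed.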

Measures of Spread or Variability

Range

Interquartile Range

Mean Absolute Deviation

Variance or Standard Deviation

Coefficient of Variation

The Range = Largest Value – Smallest Value

Returning to the bus example, the range R for the two buses is

R(KKU) = 24 – 19 = 5 and R(City) = 67 – 2 = 65
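The same calculation as a minimal Python sketch, using the bus ages from the example (the function name `value_range` is an illustrative choice):

```python
# Ages of the n = 20 sampled passengers on each bus, copied from the example.
kku_ages = [23, 20, 20, 20, 22, 22, 21, 21, 20, 21,
            21, 20, 19, 19, 22, 20, 24, 20, 24, 21]
city_ages = [14, 15, 22, 6, 2, 7, 5, 27, 35, 44,
             7, 15, 16, 67, 12, 11, 24, 31, 4, 56]


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


print("R(KKU)  =", value_range(kku_ages))   # 24 - 19
print("R(City) =", value_range(city_ages))  # 67 - 2
```

Note that the range depends on only two observations, the largest and the smallest, so a single unusual passenger can change it a great deal.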

To use more of the information in the data set, the Interquartile Range can be used instead.

The Interquartile Range is found by ordering the data set from smallest to largest, eliminating the lowest quarter (25%) and the highest quarter (25%) of the data set, and then finding the range of the middle half (50%) of the data set.
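The procedure just described can be sketched as follows. This is a literal reading of the steps (sort, drop a quarter from each end, take the range of what remains) and assumes the sample size divides evenly by four, as n = 20 does; the name `middle_half_range` is hypothetical. Statistical software often computes quartiles by interpolation instead, so library results may differ slightly from this sketch.

```python
# Ages of the n = 20 sampled passengers on each bus, copied from the example.
kku_ages = [23, 20, 20, 20, 22, 22, 21, 21, 20, 21,
            21, 20, 19, 19, 22, 20, 24, 20, 24, 21]
city_ages = [14, 15, 22, 6, 2, 7, 5, 27, 35, 44,
             7, 15, 16, 67, 12, 11, 24, 31, 4, 56]


def middle_half_range(values):
    """Range of the middle half after trimming a quarter from each end."""
    ordered = sorted(values)
    cut = len(ordered) // 4                  # observations dropped per end
    middle = ordered[cut:len(ordered) - cut]  # the middle 50% of the data
    return middle[-1] - middle[0]


print("IQR(KKU)  =", middle_half_range(kku_ages))
print("IQR(City) =", middle_half_range(city_ages))
```

Under these assumptions the school bus gives 2 and the city bus gives 20, so the city bus still shows the larger spread even after its youngest and oldest passengers are set aside.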

Back to deviations to measure spread

Deviations measure spread, but…

• They always sum to zero
• So we cannot use their sum total as a measure of spread

If all deviations could be made non-negative, then deviations could be used. But how?

• Take the absolute value of the deviations (ignore the minus signs)
• Square the deviations (a squared value is never negative!)

Mean Absolute Deviation (MAD)

In mathematics, a number’s absolute value is that same number, but expressed as a positive number without the minus sign (if it is a negative number). Straight vertical lines denote the absolute value.

Absolute value of 3 = |3| = 3
Absolute value of –3 = |–3| = 3

The Mean Absolute Deviation (MAD) takes the absolute value of each of the sample deviations, and then gets the mean of these values for the sample as a whole.

Because there are 7 sample values, the MAD = 32/7 = 4.57

Just ignore the minus signs!

Returning to the School Bus & City Bus Example

Note that the City Bus’s MAD is much larger than the KKU School Bus’s MAD, just as our intuition would suggest.

Ages, deviations, and absolute deviations, n = 20 passengers per bus

               KKU School Bus                       City Bus
No.     Age   Deviation   |Deviation|     Age   Deviation   |Deviation|
  1      23        2            2          14       -7            7
  2      20       -1            1          15       -6            6
  3      20       -1            1          22        1            1
  4      20       -1            1           6      -15           15
  5      22        1            1           2      -19           19
  6      22        1            1           7      -14           14
  7      21        0            0           5      -16           16
  8      21        0            0          27        6            6
  9      20       -1            1          35       14           14
 10      21        0            0          44       23           23
 11      21        0            0           7      -14           14
 12      20       -1            1          15       -6            6
 13      19       -2            2          16       -5            5
 14      19       -2            2          67       46           46
 15      22        1            1          12       -9            9
 16      20       -1            1          11      -10           10
 17      24        3            3          24        3            3
 18      20       -1            1          31       10           10
 19      24        3            3           4      -17           17
 20      21        0            0          56       35           35

Mean     21                                21
Total              0                                 0
MAD                           1.1                              13.8

MAD = Σ|Deviation| / n = Σ|Deviation| / 20
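The MAD formula translates almost word for word into code. A minimal sketch using the bus ages from the example (the function name `mean_absolute_deviation` is an illustrative choice):

```python
# Ages of the n = 20 sampled passengers on each bus, copied from the example.
kku_ages = [23, 20, 20, 20, 22, 22, 21, 21, 20, 21,
            21, 20, 19, 19, 22, 20, 24, 20, 24, 21]
city_ages = [14, 15, 22, 6, 2, 7, 5, 27, 35, 44,
             7, 15, 16, 67, 12, 11, 24, 31, 4, 56]


def mean_absolute_deviation(values):
    """MAD = sum of |x - mean| over all values, divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


print("MAD(KKU)  =", round(mean_absolute_deviation(kku_ages), 2))   # 22 / 20
print("MAD(City) =", round(mean_absolute_deviation(city_ages), 2))  # 276 / 20
```

The sums of absolute deviations are 22 for the school bus and 276 for the city bus, which reproduces the MAD values of 1.1 and 13.8 reported for the two samples.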
