Chapter 16 Measures of Dispersion

Contents

3.1 Range and Inter-quartile Range

3.2 Box-and-whisker Diagrams

3.3 Standard Deviation

3.4 Comparing the Dispersions of Different Sets of Data

3.5 Effects on the Dispersion with Change in Data

3 Measures of Dispersion


3.1 Range and Inter-quartile Range

A. Introduction to Dispersion

The mean, the median and the mode are measures of the central tendency of a set of data, but they cannot tell us how widely the data is dispersed. Dispersion is the statistical name for the spread or variability of data.

Consider the following two sets of data, which show the heights (in cm) of the members of two different teams.

Team A: 160, 161, 162, 162, 163, 164, 165, 167, 171, 175

Team B: 154, 156, 158, 159, 162, 164, 166, 172, 174, 185

Both teams have the same mean height of 165 cm, but the heights of the members of Team B are more spread out.
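As a quick check of this comparison, the following Python sketch computes the mean of each team and the largest distance of any height from that mean. The function and variable names are illustrative and are not part of the original text.

```python
# Heights (in cm) of the two teams, copied from the text.
team_a = [160, 161, 162, 162, 163, 164, 165, 167, 171, 175]
team_b = [154, 156, 158, 159, 162, 164, 166, 172, 174, 185]


def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


mean_a = mean(team_a)
mean_b = mean(team_b)

# Largest distance of any height from the team mean (an illustrative check only).
max_dev_a = max(abs(x - mean_a) for x in team_a)
max_dev_b = max(abs(x - mean_b) for x in team_b)

print(mean_a, mean_b)        # both means are 165.0
print(max_dev_a, max_dev_b)  # 10.0 for Team A, 20.0 for Team B
```

The means agree, yet the heights in Team B lie up to twice as far from the mean, which is what "more spread out" refers to.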

B. Range

The range is a simple measure of the dispersion of a set of data. For ungrouped data, the range is the difference between the largest value and the smallest value of the set of data.

Range = largest value – smallest value

For a set of grouped data, the range is the difference between the highest class boundary and the lowest class boundary.

Range = highest class boundary – lowest class boundary
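The two range formulas can be sketched in Python as follows. The team heights come from the text; the class boundaries are hypothetical values chosen only to illustrate the grouped case, and the function names are illustrative.

```python
def range_ungrouped(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


def range_grouped(class_boundaries):
    """Range = highest class boundary - lowest class boundary.

    class_boundaries is a list of (lower, upper) pairs, one per class.
    """
    lowest = min(lower for lower, _ in class_boundaries)
    highest = max(upper for _, upper in class_boundaries)
    return highest - lowest


team_a = [160, 161, 162, 162, 163, 164, 165, 167, 171, 175]
team_b = [154, 156, 158, 159, 162, 164, 166, 172, 174, 185]

range_a = range_ungrouped(team_a)  # 175 - 160 = 15
range_b = range_ungrouped(team_b)  # 185 - 154 = 31

# Hypothetical class boundaries (not taken from the text).
classes = [(149.5, 159.5), (159.5, 169.5), (169.5, 179.5), (179.5, 189.5)]
range_classes = range_grouped(classes)  # 189.5 - 149.5 = 40.0

print(range_a, range_b, range_classes)
```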

C. Inter-quartile Range


The inter-quartile range is defined as the difference between the upper quartile and the lower quartile of a set of data.

Inter-quartile range = Q3 – Q1

When the data is arranged in ascending order of magnitude, the quartiles divide the data into four equal parts. There are three quartiles in total, usually denoted by Q1 (the lower quartile), Q2 (the median) and Q3 (the upper quartile).
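A minimal Python sketch of the inter-quartile range is given below. The text does not state how the quartiles are located, so the sketch assumes one common convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle value left out of both halves when the number of data is odd. Other conventions can give slightly different values. The function name is illustrative.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]              # lower half of the sorted data
    upper = data[len(data) - half:]  # upper half of the sorted data
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


team_a = [160, 161, 162, 162, 163, 164, 165, 167, 171, 175]
team_b = [154, 156, 158, 159, 162, 164, 166, 172, 174, 185]

q1_a, q2_a, q3_a = quartiles(team_a)  # 162, 163.5, 167
q1_b, q2_b, q3_b = quartiles(team_b)  # 158, 163.0, 172

iqr_a = q3_a - q1_a  # 5
iqr_b = q3_b - q1_b  # 14

print(iqr_a, iqr_b)
```

Under this convention the inter-quartile range of Team B is larger than that of Team A, in line with the earlier observation that Team B is more spread out.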

3.2 Box-and-whisker Diagrams

A box-and-whisker diagram illustrates the spread of a set of data. It provides a graphical summary of the set of data by showing the quartiles and the extreme values of the data.

The difference between the two end-points of the line (represented by the highest and lowest marks) is the range.

The length of the box is the inter-quartile range.

Fig. 3.16

From the diagram in Fig. 3.16, we know that the range of the data is 22 and the inter-quartile range is 9.
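The five values that a box-and-whisker diagram displays can be computed as in the sketch below. The data behind Fig. 3.16 is not given in the text, so the sketch uses the Team B heights from Section 3.1 instead. It also assumes the median-of-halves convention for the quartiles, which the text does not specify. The function name is illustrative.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a box-and-whisker diagram."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])              # median of the lower half
    q3 = statistics.median(data[len(data) - half:])  # median of the upper half
    return data[0], q1, statistics.median(data), q3, data[-1]


team_b = [154, 156, 158, 159, 162, 164, 166, 172, 174, 185]
summary = five_number_summary(team_b)
minimum, q1, q2, q3, maximum = summary

whisker_span = maximum - minimum  # distance between the end-points: the range
box_length = q3 - q1              # length of the box: the inter-quartile range

print(summary)                   # (154, 158, 163.0, 172, 185)
print(whisker_span, box_length)  # 31 14
```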

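3.3 Standard Deviation

A. Standard Deviation for Ungrouped Data

For a set of ungrouped data $x_1, x_2, \ldots, x_n$,

$$
\text{Standard deviation}
= \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}}
= \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}
$$

where $\bar{x}$ is the mean and $n$ is the total number of data.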

Notes:

1. Two sets of data may have the same mean but different standard deviations.

2. The larger the standard deviation, the more spread out the data is.
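Both notes can be illustrated with the two teams from Section 3.1. The sketch below implements the ungrouped formula directly, dividing by the total number of data n. The function name is illustrative.

```python
import math


def standard_deviation(values):
    """Square root of the mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


team_a = [160, 161, 162, 162, 163, 164, 165, 167, 171, 175]
team_b = [154, 156, 158, 159, 162, 164, 166, 172, 174, 185]

sd_a = standard_deviation(team_a)  # sqrt(204 / 10), about 4.52
sd_b = standard_deviation(team_b)  # sqrt(828 / 10), about 9.10

print(round(sd_a, 2), round(sd_b, 2))
```

The two teams share the mean 165 cm, but Team B has roughly twice the standard deviation of Team A.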

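B. Standard Deviation for Grouped Data

For a set of grouped data, we have to consider the frequency of each group.

$$
\text{Standard deviation}
= \sqrt{\frac{f_1 (x_1 - \bar{x})^2 + f_2 (x_2 - \bar{x})^2 + \cdots + f_n (x_n - \bar{x})^2}{f_1 + f_2 + \cdots + f_n}}
= \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i}}
$$

where $f_i$ is the frequency of the $i$th group of data, $x_i$ is the value (class mark) of the $i$th group, $\bar{x}$ is the mean and $n$ is the number of groups.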
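The grouped formula can be sketched in Python as follows. The class marks and frequencies are hypothetical values chosen so that the arithmetic is easy to follow by hand; they do not come from the text. The function name is illustrative.

```python
import math


def grouped_standard_deviation(class_marks, frequencies):
    """Standard deviation of grouped data, weighting each class mark by its frequency."""
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(class_marks, frequencies)) / total
    squared = sum(f * (x - mean) ** 2 for x, f in zip(class_marks, frequencies))
    return math.sqrt(squared / total)


# Hypothetical grouped data (not from the text).
class_marks = [10, 20, 30]
frequencies = [1, 2, 1]

# Mean = (10 + 40 + 30) / 4 = 20
# Sum of f * (x - mean)^2 = 100 + 0 + 100 = 200, and 200 / 4 = 50
sd_grouped = grouped_standard_deviation(class_marks, frequencies)

print(round(sd_grouped, 2))  # sqrt(50), about 7.07
```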

C. Finding Standard Deviation by a Calculator

Consider the following data:

40, 36, 47, 53, 56.

The following steps demonstrate how to use a calculator to find the mean and the standard deviation of the data.

Step 1: Set the function mode of the calculator to standard deviation ‘SD’ by pressing MODE MODE 1. Then clear all the previous data in the ‘SD’ mode by pressing SHIFT CLR 1 EXE.

Step 2: Press the following keys in sequence: 40 DT 36 DT 47 DT 53 DT 56 DT

Step 3: Press SHIFT S-VAR 1 EXE to obtain the mean = 46.4. Press SHIFT S-VAR 2 EXE to obtain the standard deviation = 7.55.
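The calculator results can be cross-checked with Python's standard `statistics` module. This is only a verification sketch and not part of the calculator procedure. Note that `statistics.pstdev` divides by n, matching the formula in this section, whereas `statistics.stdev` divides by n - 1 and would give a different value.

```python
import statistics

data = [40, 36, 47, 53, 56]

mean_value = statistics.mean(data)  # arithmetic mean
sd_value = statistics.pstdev(data)  # standard deviation, dividing by n

print(round(mean_value, 1))  # 46.4
print(round(sd_value, 2))    # 7.55
```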

3.4 Comparing the Dispersions of Different Sets of Data

Measure of dispersion | Advantage | Disadvantage
1. Range | Only two data are involved, so it is the easiest one to calculate. | Only extreme values are considered, which may give a misleading impression of the dispersion.
2. Inter-quartile range | It only focuses on the middle 50% of data, thus avoiding the influence of extreme values. | It cannot show the dispersion of the whole group of data.
3. Standard deviation | It takes all the data into account. | It is difficult to compute without using a calculator.

Table 3.34

3.5 Effects on the Dispersion with Change in Data

A. Removal of a Certain Item from the Data

If the greatest or the least value (assuming both are unique) in a data set is removed, then

(1) the range will decrease;

(2) the standard deviation will usually decrease, as the data is spread less widely;

(3) the inter-quartile range may increase, decrease or remain unchanged.
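The effect of removing the greatest value can be checked on the Team B heights from Section 3.1. The sketch below assumes the median-of-halves convention for the quartiles, which the text does not specify, and the helper name `measures` is illustrative. The results hold for this data set only.

```python
import statistics


def measures(values):
    """Return (range, inter-quartile range, standard deviation) of a data set."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])              # median of the lower half
    q3 = statistics.median(data[len(data) - half:])  # median of the upper half
    return data[-1] - data[0], q3 - q1, statistics.pstdev(data)


team_b = [154, 156, 158, 159, 162, 164, 166, 172, 174, 185]
without_max = [x for x in team_b if x != max(team_b)]  # remove the unique greatest value

range_before, iqr_before, sd_before = measures(team_b)
range_after, iqr_after, sd_after = measures(without_max)

print(range_before, range_after)  # 31 20
print(iqr_before, iqr_after)      # 14 12.0
print(round(sd_before, 2), round(sd_after, 2))
```

For this data set all three measures become smaller, but the inter-quartile range need not behave this way for other data.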

B. Adding a Common Constant to the Whole Set of Data

If a constant k is added to every datum in a set of data, then the following measures of dispersion

(1) the range,

(2) the inter-quartile range and

(3) the standard deviation

will not change.
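A short Python check of this property is given below, using the data 40, 36, 47, 53, 56 from Section 3.3 and an arbitrarily chosen constant k = 100. The quartiles follow the median-of-halves convention, which is an assumption, and the helper name `measures` is illustrative.

```python
import statistics


def measures(values):
    """Return (range, inter-quartile range, standard deviation) of a data set."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])              # median of the lower half
    q3 = statistics.median(data[len(data) - half:])  # median of the upper half
    return data[-1] - data[0], q3 - q1, statistics.pstdev(data)


data = [40, 36, 47, 53, 56]
k = 100  # arbitrary constant chosen for illustration
shifted = [x + k for x in data]

range_0, iqr_0, sd_0 = measures(data)
range_k, iqr_k, sd_k = measures(shifted)

print(range_0, range_k)  # 20 20
print(iqr_0, iqr_k)      # 16.5 16.5
print(abs(sd_0 - sd_k) < 1e-9)
```

Adding k moves every datum, and therefore the mean and the quartiles, by the same amount, so all the differences that these measures are built from stay the same.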

C. Multiplying the Whole Set of Data by a Constant

If every datum in a set of data is multiplied by a positive constant k, then the range, the inter-quartile range and the standard deviation will each be k times their original values.
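The scaling property can be checked in the same way, here with the data 40, 36, 47, 53, 56 from Section 3.3 and an arbitrarily chosen constant k = 3. The quartiles follow the median-of-halves convention, which is an assumption, and the helper name `measures` is illustrative.

```python
import statistics


def measures(values):
    """Return (range, inter-quartile range, standard deviation) of a data set."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])              # median of the lower half
    q3 = statistics.median(data[len(data) - half:])  # median of the upper half
    return data[-1] - data[0], q3 - q1, statistics.pstdev(data)


data = [40, 36, 47, 53, 56]
k = 3  # arbitrary positive constant chosen for illustration
scaled = [k * x for x in data]

range_0, iqr_0, sd_0 = measures(data)
range_k, iqr_k, sd_k = measures(scaled)

print(range_0, range_k)  # 20 60
print(iqr_0, iqr_k)      # 16.5 49.5
print(abs(sd_k - k * sd_0) < 1e-9)
```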

D. Insertion of Zero in the Data Set

In general, statistical data is non-negative, for example, height, weight, score, etc. A zero value is therefore usually the smallest one in a set of data. This case is similar to the one we studied in Section A, but now we insert the smallest value into the set of data instead of removing it.

If a zero value is inserted in a positive data set, then

• the range will increase;

• the standard deviation will usually increase, as the data is spread more widely;

• the inter-quartile range may increase or decrease.
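The effect of inserting a zero can be checked on the data 40, 36, 47, 53, 56 from Section 3.3. The sketch assumes the median-of-halves convention for the quartiles, which the text does not specify, and the helper name `measures` is illustrative. The results hold for this data set only.

```python
import statistics


def measures(values):
    """Return (range, inter-quartile range, standard deviation) of a data set."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])              # median of the lower half
    q3 = statistics.median(data[len(data) - half:])  # median of the upper half
    return data[-1] - data[0], q3 - q1, statistics.pstdev(data)


data = [40, 36, 47, 53, 56]
with_zero = data + [0]  # insert a zero into the positive data set

range_before, iqr_before, sd_before = measures(data)
range_after, iqr_after, sd_after = measures(with_zero)

print(range_before, range_after)  # 20 56
print(iqr_before, iqr_after)      # 16.5 17
print(sd_after > sd_before)       # True for this data set
```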
