Twitter: @Owen134866 · Measures of location and spread A measure of location is a single value which is used to represent a set of data. Examples include the mean, median and mode

Twitter: @Owen134866

www.mathsfreeresourcelibrary.com

Prior Knowledge Check

1) State whether each variable is qualitative or quantitative:

a) Car colour

b) Miles travelled by a cyclist

c) Favourite pet

d) Number of siblings

2) State whether each of these variables is discrete or continuous

a) Number of pets owned

b) Distance walked by hikers

c) Fuel consumption of lorries

d) Number of peas in a pod

e) Times taken by a group of athletes to run 1500m.

3) Calculate the 3 averages and range for the data set below:

Peas in a pod 3 4 5 6 7

Frequency 4 7 11 18 6

Qualitative

Qualitative

Quantitative

Quantitative

Discrete

Discrete

Continuous

Continuous

Continuous

Mean 5.33

Median 6

Mode 6

Range 4

Measures of location and spread

A measure of location is a single value which is used to represent a set of data. Examples include the mean,

median and mode

These can also be known as ‘measures of central tendency’

The mode or modal class is the value or class with the highest frequency

The median is the middle value when the data is put into ascending (or descending)

order

The mean is calculated using the formula:

2A/B

ҧ𝑥 =σ𝑥

𝑛

The mean, ‘x-bar’

The sum of 𝑥 (𝑥 represents each number in the data

set)

The number of ‘bits’ of dataYou need to start using ‘proper’

notation!



median and mode

The mean of a sample of 25 observations is 6.4. The mean of a second sample of 30 observations is 7.2. Calculate the mean of

all 55 observations.

2A/B

ҧ𝑥 =σ𝑥

𝑛

6.4 + 7.2

2

Why is this calculation wrong?

Each data set has a different quantity

The mean will be ‘weighted’ towards the data set with the higher quantity



median and mode

The mean of a sample of 25 observations is 6.4. The mean of a second sample of 30 observations of 7.2. Calculate the mean of

all 55 observations.

2A/B

ҧ𝑥 =σ𝑥

𝑛

Sample 1

6.4 =σ𝑥

25

= 160σ𝑥

Multiply by 25

Sample 2

7.2 =σ𝑥

30

= 216σ𝑥

Multiply by 30

So the total sum of the data is 376

There were 55 ‘bits’ of data in total

ҧ𝑥 =σ𝑥

𝑛

ҧ𝑥 =376

55

Sub in values

Sub in values

ҧ𝑥 = 6.84



median and mode

You will need to decide which is the best average to use depending on the

situation…

2A/B

ҧ𝑥 =σ𝑥

𝑛

Mode Median Mean

The mode is used for qualitative data, or

where the data has a single mode or two modes (bi-modal)

This is used for quantitative data, and is usually used when the

data has some extreme values (since this will not affect the median

too much)

The mean is used for quantitative data and

uses all data in a set. It therefore gives a true measure, but can be skewed by extreme

values



median and mode

For data given in a frequency table, you can calculate the mean by using the

formula below:

2A/B

ҧ𝑥 =σ𝑥

𝑛

ҧ𝑥 =σ𝑓𝑥

σ𝑓

The sum of the products of the data values and their

frequencies

The sum of the frequencies

ҧ𝑥 =σ𝑓𝑥

σ𝑓



median and mode

Rebecca records the shirt collar size, x, of the male students in her year. The

results are shown in the table.

For the data, calculate:

a) The mode

b) The median

c) The mean

d) Explain why a shirt manufacturer might use the mode for setting their

production quota

2A/B

ҧ𝑥 =σ𝑥

𝑛

ҧ𝑥 =σ𝑓𝑥

σ𝑓

Collar SizeNumber of Students

15 3

15.5 17

16 29

16.5 34

17 12

= 𝟏𝟔. 𝟓𝑓 = 95

𝑓𝑥

95 + 1

2

= 48𝑡ℎ

3

20

49

= 𝟏𝟔

The median is the 48th value Add the frequencies up until you get beyond

this value So the median must be 16 as it is in that group



median and mode




a) The mode

b) The median

c) The mean


production quota

2A/B

ҧ𝑥 =σ𝑥

𝑛

ҧ𝑥 =σ𝑓𝑥

σ𝑓


15 3

15.5 17

16 29

16.5 34

17 12

= 𝟏𝟔. 𝟓

𝑓𝑥 𝑓𝑥

45

263.5

464

561

204

𝑓𝑥 = 1537.5

= 𝟏𝟔. 𝟐

= 𝟏𝟔

𝑓 = 95

ҧ𝑥 =σ𝑓𝑥

σ𝑓

ҧ𝑥 =1537.5

95

ҧ𝑥 = 16.2

Sub in values

Calculate



median and mode




a) The mode

b) The median

c) The mean


production quota

2A/B

ҧ𝑥 =σ𝑥

𝑛

ҧ𝑥 =σ𝑓𝑥

σ𝑓


15 3

15.5 17

16 29

16.5 34

17 12

= 𝟏𝟔. 𝟓

𝑓𝑥 𝑓𝑥

45

263.5

464

561

204

𝑓𝑥 = 1537.5

= 𝟏𝟔. 𝟐

= 𝟏𝟔

𝑓 = 95

The mode is in this case more useful as it tells the manufacturer what size shirt it

needs to produce the most of



median and mode

The length, 𝑥 mm, to the nearest mm, of a random sample of pine cones is measured.

The data is shown in the table to the right.

a) Write down the modal class

b) Estimate the mean

c) Find the median class

2A/B

ҧ𝑥 =σ𝑥

𝑛

ҧ𝑥 =σ𝑓𝑥

σ𝑓

Cone length (mm)

Frequency

30-31 2

32-33 25

34-36 30

37-39 13

34-36

𝑓𝑥 𝑓𝑥

61

𝑓𝑥 = 2417.5𝑓 = 70

494

1050

812.5

ҧ𝑥 =σ𝑓𝑥

σ𝑓

ҧ𝑥 =2417.5

70

ҧ𝑥 = 34.5

Sub in values

Calculate

To calculate the mean from a grouped table you need to use the midpoint of each group

30.5

32.5

35

38

34.5



median and mode

The length, 𝑥 mm, to the nearest mm, of a random sample of pine cones is measured.

The data is shown in the table to the right.

a) Write down the modal class

b) Estimate the mean

c) Find the median class

2A/B

ҧ𝑥 =σ𝑥

𝑛

ҧ𝑥 =σ𝑓𝑥

σ𝑓

Cone length (mm)

Frequency

30-31 2

32-33 25

34-36 30

37-39 13

34-36

𝑓𝑥

𝑓 = 70

34.5 70 + 1

2

= 35.5𝑡ℎ

The median is the 35.5th value Add the frequencies up until you get beyond

this value So the median class is 34-36

2

27

57

34-36


You need to be able to calculate quartiles and percentiles of a data

set

The median describes the middle of a set of data, splitting the data into two

halves with 50% in each.

You can also calculate quartiles and percentiles, which are also both

measures of location

The median is also known as the second quartile

2C

𝑄2𝑄1 𝑄3Lowestvalue

Highestvalue

50% 50%25% 25% 25% 25%10% 90%

MedianLower

QuartileUpper

Quartile

When combined with the median, the lower and upper quartiles split the data into 4 equal sections

The 10th percentile is the value with 10% of the data lower than it

The 90th percentile is the value with 90% of the data lower than it

So if your test score was in the 90th

percentile, that is a good thing!



set

The way the quartiles are calculated depends on whether the data is

discrete or continuous…

2C

Lower Quartile Upper Quartile

Discrete

Continuous

Divide 𝑛 by 4 If whole, the LQ is between this

value and the one above If not whole, round up and take

that data point

Divide 3𝑛 by 4 If whole, the UQ is between this

value and the one above If not whole, round up and take

that data point

Divide 𝑛 by 4 and take that data point

Divide 3𝑛 by 4 and take that data point



set

From the large data set, the daily maximum gust (knots) during the first 20 days of June 2015 is recorded in

Hurn. The data is shown below:

Find the median and quartiles for this data.

2C

14 15 17 17 18

18 19 19 22 22

23 23 23 24 25

26 27 28 36 39

𝑄2 =𝑛+1

2𝑡ℎ value

𝑄2 =20 + 1

2

𝑄2 = 10.5𝑡ℎ value

𝑄2 = 22.5

There are 20 values

Calculate

Find this value

𝑸𝟐 = 𝟐𝟐. 𝟓

Note that we treat this as discrete data since we have all

the actual values!



set

From the large data set, the daily maximum gust (knots) during the first 20 days of June 2015 is recorded in

Hurn. The data is shown below:

Find the median and quartiles for this data.

2C

14 15 17 17 18

18 19 19 22 22

23 23 23 24 25

26 27 28 36 39

𝑸𝟐 = 𝟐𝟐. 𝟓

The data is discrete

For 𝑄1𝑛

4

20

4= 5

So take the 5.5th value

𝑄1 = 18

For 𝑄33𝑛

4

60

4= 15

So take the 15.5th value

𝑄3 = 25.5𝑸𝟏 = 𝟏𝟖 𝑸𝟑 = 𝟐𝟓. 𝟓



set

The length of time (to the nearest minute) spent on the internet each evening by a group of students is

shown in the table below.

a) Find an estimate for the upper quartile

b) Find an estimate for the 10th

percentile

2C

Time spent on internet (mins)

Frequency

30-31 2

32-33 25

34-36 30

37-39 13

The data is continuous

3𝑛

4

210

4= 52.5𝑡ℎ value

Find the group that this is in, and use linear interpolation to estimate the median…

2

27

57

You can use this formula:

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

Lower boundary of the group

Classwidth of the groupGroup

Frequency

Places into group

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊



set





percentile

2C

Time spent in internet (mins)

Frequency

30-31 2

32-33 25

34-36 30

37-39 13

= 52.5𝑡ℎ value

2

27

57

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

33.5 +25.5

30× 3

The value is 25.5 places into the group (it is the 52.5th value,

and we have already had 27 before the group started)

Remember for continuous data, you will need to use 33.5

and 36.5 as the class boundaries

= 36.05

Calculate



set





percentile

2C


Frequency

30-31 2

32-33 25

34-36 30

37-39 13

2

27

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

= 𝟑𝟔. 𝟎𝟓

The data is continuous

10𝑛

100

700

100= 7𝑡ℎ value

Find the group that this is in, and use linear interpolation to estimate it…

The 10th percentile is calculated as follows



set





percentile

2C


Frequency

30-31 2

32-33 25

34-36 30

37-39 13

2

27

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

= 𝟑𝟔. 𝟎𝟓

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

31.5 +5

25× 2

The value is 5 places into the group (it is the 7th value, and we have already had 2 before the

group started)

Remember for continuous data, you will need to use 31.5

and 33.5 as the class boundaries

𝑃10 = 31.9

Calculate

This notation is usually used for the 10th percentile


A measure of spread is a value which indicated how spread out the data set is. Examples include the range and interquartile range.

The range is the difference between the largest and smallest

values, and measures the spread of all the data

The interquartile range is the difference between the upper and lower quartiles, and measures the spread of the middle 50% of the

data

The interpercentile range is the difference between 2 given

percentiles

2D

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊



The table to the right shows the masses (tonnes) of 120 African

elephants.

Find estimates for:

a) The range

b) The interquartile range

c) The 10th to 90th percentile range

2D

Mass, m (t) Frequency

4.0 ≤ 𝑚 < 4.5 13

4.5 ≤ 𝑚 < 5.0 23

5.0 ≤ 𝑚 < 5.5 31

5.5 ≤ 𝑚 < 6.0 34

6.0 ≤ 𝑚 ≤ 6.5 19

The range is the biggest possible value subtract the

smallest possible value

6.5 - 4.0 = 2.5

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

𝟐. 𝟓




elephants.

Find estimates for:

a) The range



2D


4.0 ≤ 𝑚 < 4.5 13

4.5 ≤ 𝑚 < 5.0 23

5.0 ≤ 𝑚 < 5.5 31

5.5 ≤ 𝑚 < 6.0 34

6.0 ≤ 𝑚 ≤ 6.5 19

13

120

101

67

36

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

= 4.5 +17

23× 0.5

For 𝑄1

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

𝑛

4

=120

4= 30𝑡ℎ value

= 4.87

𝑸𝟏 = 𝟒. 𝟖𝟕

Sub in values

Calculate

Now use linear interpolation𝟐. 𝟓




elephants.

Find estimates for:

a) The range



2D


4.0 ≤ 𝑚 < 4.5 13

4.5 ≤ 𝑚 < 5.0 23

5.0 ≤ 𝑚 < 5.5 31

5.5 ≤ 𝑚 < 6.0 34

6.0 ≤ 𝑚 ≤ 6.5 19

𝟐. 𝟓

13

120

101

67

36

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

= 5.5 +23

34× 0.5

For 𝑄3

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

3𝑛

4

=360

4= 90𝑡ℎ value

= 5.84

𝑸𝟏 = 𝟒. 𝟖𝟕

Sub in values

Calculate

Now use linear interpolation

𝑸𝟑 = 𝟓. 𝟖𝟒

𝟓. 𝟖𝟒 − 𝟒. 𝟖𝟕 = 𝟎. 𝟗𝟕

𝟎. 𝟗𝟕




elephants.

Find estimates for:

a) The range



2D


4.0 ≤ 𝑚 < 4.5 13

4.5 ≤ 𝑚 < 5.0 23

5.0 ≤ 𝑚 < 5.5 31

5.5 ≤ 𝑚 < 6.0 34

6.0 ≤ 𝑚 ≤ 6.5 19

𝟐. 𝟓

13

120

101

67

36

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

𝟎. 𝟗𝟕

= 4.0 +12

13× 0.5

For 𝑃10

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

10𝑛

100

=1200

100= 12𝑡ℎ value

= 4.46

Sub in values

Calculate


𝑷𝟏𝟎 = 𝟒. 𝟒𝟔




elephants.

Find estimates for:

a) The range



2D


4.0 ≤ 𝑚 < 4.5 13

4.5 ≤ 𝑚 < 5.0 23

5.0 ≤ 𝑚 < 5.5 31

5.5 ≤ 𝑚 < 6.0 34

6.0 ≤ 𝑚 ≤ 6.5 19

𝟐. 𝟓

13

120

101

67

36

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

𝟎. 𝟗𝟕

= 6.0 +7

19× 0.5

For 𝑃90

𝐿𝐵 +𝑃𝐿

𝐺𝐹× 𝐶𝑊

90𝑛

100

=10800

100= 108𝑡ℎ value

= 6.18

Sub in values

Calculate


𝑷𝟏𝟎 = 𝟒. 𝟒𝟔 𝑷𝟗𝟎 = 𝟔. 𝟏𝟖

𝟔. 𝟏𝟖 − 𝟒. 𝟒𝟔 = 𝟏. 𝟕𝟐

Measures of location and spreadThe variance and standard deviation can also be used to analyse a set of

data

The variance and standard deviation are both measures of spread, and

involve the fact that each data point deviates from the mean by the

amount:

𝑥 − ҧ𝑥

Where 𝑥 is a data point and ҧ𝑥 is the mean of the data as a whole

2E

The variance and standard deviation can also be used to analyse a set of

data

The variance is defined as:

“The average of the squared distances from the mean”

So the distances of each data point from the mean are all squared, and

divided by how many there are.

This gives the formula to the right:

2E

𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =σ 𝑥 − ҧ𝑥 2

𝑛

=σ𝑥2

𝑛−

σ𝑥

𝑛

2

=𝑆𝑥𝑥𝑛

This formula is equivalent

This formula is equivalent

“Mean of the squares subtract the square of the

mean”

The notation 𝑆𝑥𝑥 is short for σ 𝑥 − ҧ𝑥 2 or σ𝑥2 −σ 𝑥 2

𝑛


𝑛 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =𝑆𝑥𝑥𝑛



data

The standard deviation is the square root of the variance.

The symbol 𝜎 (lower case sigma) is used to represent standard deviation

Therefore 𝜎2 is usually used to represent the variance

The square root of any of these gives the Standard Deviation

2E


𝑛 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =𝑆𝑥𝑥𝑛

𝜎2 =σ 𝑥 − ҧ𝑥 2

𝑛 𝜎2 =σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝜎2 =𝑆𝑥𝑥𝑛 𝜎 =

σ 𝑥 − ҧ𝑥 2

𝑛𝜎 =

σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝜎 =𝑆𝑥𝑥𝑛


This is what it looks like in the formula booklet!

𝜎2 =σ 𝑥 − ҧ𝑥 2

𝑛


data

2E

𝜎2 =σ𝑥2

𝑛−

σ𝑥

𝑛

2



𝑛𝜎 =

σ𝑥2

𝑛−

σ𝑥

𝑛

2



The Standard Deviation tells you the range from the mean which contains around 68% of the data (if data is normally distributed – you will learn about this at a later date)

For example, if 100 students have a mean height of 150cm and a standard deviation of 10cm.

150

140 160

130 170

68 of the students are within one Standard Deviation

95 of the students are within two Standard Deviations

σ𝑥 = 36

σ𝑥2 = 218

𝜎2 =σ 𝑥 − ҧ𝑥 2

𝑛


data

The marks gained in a test by seven randomly selected students are:

3 4 6 2 8 8 5

Find the variance and standard deviation of the marks of the seven

students.

The middle of the 3 formulae above is most commonly used when you have

the ‘raw’ data

2E

𝜎2 =σ𝑥2

𝑛−

σ𝑥

𝑛

2



𝑛𝜎 =

σ𝑥2

𝑛−

σ𝑥

𝑛

2



𝑥

𝑥2 9 16 36 4 64 64 25

𝜎2 =σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝜎2 =218

7−

36

7

2

𝜎2 = 4.69

Sub in values – we need σ𝑥 and σ𝑥2

Calculate

𝜎 = 2.17

Square root

𝜎2 =σ 𝑥 − ҧ𝑥 2

𝑛


data

The marks gained in a test by seven randomly selected students are:

3 4 6 2 8 8 5

Find the variance and standard deviation of the marks of the seven

students.

So this means that 68% of the data is within 2.17 of the mean. Note

that 68% is not always a possible percentage though!

2E

𝜎2 =σ𝑥2

𝑛−

σ𝑥

𝑛

2



𝑛𝜎 =

σ𝑥2

𝑛−

σ𝑥

𝑛

2



𝜎 = 2.17

𝑀𝑒𝑎𝑛 = 5.14

2 3 4 5 6 8 85.14 7.312.97

So out of the original 7 values, 4 are within 1 standard deviation of the mean

This is only 57% rather than 68%, because the data size is very small!

−2.17 +2.17

𝜎2 =σ 𝑥 − ҧ𝑥 2

𝑛


data

Shamsa records the time spent out of school during the lunch hour to the

nearest minute, x, of the female students in her year. The results are

shown in the table.

Calculate the standard deviation of the time spent out of school.

For tabled data, we need to use a modified formula…

2E

𝜎2 =σ𝑥2

𝑛−

σ𝑥

𝑛

2



𝑛𝜎 =

σ𝑥2

𝑛−

σ𝑥

𝑛

2



Time (mins) Frequency

35 3

36 17

37 29

38 34

𝜎 =σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝜎 =σ𝑓𝑥2

σ𝑓−

σ𝑓𝑥

σ𝑓

2

σ𝑓 = 83

𝑥 𝑓 𝑓𝑥2𝑓𝑥

105

612

1073

1292

σ𝑓𝑥 = 3082 σ𝑓𝑥2 = 114504

3675

22032

39701

49096

𝜎 =σ𝑓𝑥2

σ𝑓−

σ𝑓𝑥

σ𝑓

2

𝜎 =114504

83−

3082

83

2

𝜎 = 0.861

Sub in values – we need to use the

table to calculate these!

Calculate

σ𝑓 = 27 σ𝑓𝑥 = 285 σ𝑓𝑥2 = 6487.5

𝜎2 =σ 𝑥 − ҧ𝑥 2

𝑛


data

Andy recorded the length, in minutes, of each telephone call he made for a month. The data is summarized in the

table below.

Calculate an estimate of the standard deviation of the length of the

phonecalls

2E

𝜎2 =σ𝑥2

𝑛−

σ𝑥

𝑛

2



𝑛𝜎 =

σ𝑥2

𝑛−

σ𝑥

𝑛

2



𝜎 =σ𝑓𝑥2

σ𝑓−

σ𝑓𝑥

σ𝑓

2

Length of call (mins) Frequency

0 < 𝑙 ≤ 5 4

5 < 𝑙 ≤ 10 15

10 < 𝑙 ≤ 15 5

15 < 𝑙 ≤ 20 2

20 < 𝑙 ≤ 60 0

60 < 𝑙 ≤ 70 1

2.5

7.5

12.5

17.5

40

65

𝑥 𝑓 𝑓𝑥2𝑓𝑥

10

112.5

62.5

35

0

65

25

843.75

781.25612.5

0

4225

For a grouped table we need to use the midpoints (as in the previous section)

σ𝑓 = 27

σ𝑓𝑥 = 285

σ𝑓𝑥2 = 6487.5

𝜎2 =σ 𝑥 − ҧ𝑥 2

𝑛


data

Andy recorded the length, in minutes, of each telephone call he made for a month. The data is summarized in the

table below.

Calculate an estimate of the standard deviation of the length of the

phonecalls

2E

𝜎2 =σ𝑥2

𝑛−

σ𝑥

𝑛

2



𝑛𝜎 =

σ𝑥2

𝑛−

σ𝑥

𝑛

2



𝜎 =σ𝑓𝑥2

σ𝑓−

σ𝑓𝑥

σ𝑓

2

Length of call (mins) Frequency

0 < 𝑙 ≤ 5 4

5 < 𝑙 ≤ 10 15

10 < 𝑙 ≤ 15 5

15 < 𝑙 ≤ 20 2

20 < 𝑙 ≤ 60 0

60 < 𝑙 ≤ 70 1

2.5

7.5

12.5

17.5

40

65

𝑥 𝑓

𝜎 =σ𝑓𝑥2

σ𝑓−

σ𝑓𝑥

σ𝑓

2

𝜎 =6487.5

27−

285

27

2

𝜎 = 11.35

Sub in values – we need to use the

table to calculate these!

Calculate

Measures of location and spreadCoding can be used to make a set

of values simpler to work with

If numbers in a data set are particularly large, they can all be altered in the same way to make

them smaller

This will however affect the measures of location and

dispersion that we have been calculating, and you need to be

aware of these effects…

2F

Imagine we have the following data on people’s heights (cm)

145 170 168 166 151 147 150 172

Mean = 158.625 Range = 27

If all the values were multiplied by 2, what would happen to the measures above?

Mean and range would double

If all the values above had 20 added to them, what would happen to the measures above?

Mean would increase by 20, but the range would stay the same

If you change a set of data by adding or subtracting an amount, this will not affect the range, or any other measures of spread, such as the IQR or

standard deviation

𝜎 =σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝜎 =σ𝑓𝑥2

σ𝑓−

σ𝑓𝑥

σ𝑓

2

ҧ𝑥 =σ𝑥

𝑛𝜎 =

𝑆𝑥𝑥𝑛

Coding can be used to make a set of values simpler to work with

A scientist measures the temperature, 𝑥℃ at five different

points in a nuclear reactor. Her results are given below:

332, 355, 306, 317, 340

a) Use the coding 𝑦 =𝑥−300

10to code

this data

b) Calculate the mean and standard deviation of the coded data

c) Use your answer to b) the calculate the mean and standard

deviation of the original data.

2F

𝑥 𝑦

332

355

306

317

340

σ𝑦 = 15

𝑦2


𝜎 =σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝜎 =σ𝑓𝑥2

σ𝑓−

σ𝑓𝑥

σ𝑓

2

ҧ𝑥 =σ𝑥

𝑛

10.24

30.25

0.36

2.89

16

σ𝑦2 = 59.74

To code the data. take each starting value, subtract 300 from it, and divide the answer by 10

Now we can calculate the mean and standard deviation of this new information…

ത𝑦 =σ𝑦

𝑛

ത𝑦 = 3

𝜎𝑦 =σ𝑦2

𝑛−

σ𝑦

𝑛

2

𝜎𝑦 =59.74

5−

15

5

2

𝜎𝑦 = 1.72

Sub in values

Sub in values

Work out

ത𝑦 =15

5 Work out

3.2

5.5

0.6

1.7

4



A scientist measures the temperature, 𝑥℃ at five different

points in a nuclear reactor. Her results are given below:

332, 355, 306, 317, 340

a) Use the coding 𝑦 =𝑥−300

10to code

this data

b) Calculate the mean and standard deviation of the coded data

c) Use your answer to b) the calculate the mean and standard

deviation of the original data.

2F


𝜎 =σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝜎 =σ𝑓𝑥2

σ𝑓−

σ𝑓𝑥

σ𝑓

2

ҧ𝑥 =σ𝑥

𝑛

3 × 10 + 300

𝝈𝒚 = 𝟏. 𝟕𝟐

The original data had 300 subtracted, and then was divided by 10

We need to reverse this, so multiply by 10, and then add 300

Original mean

= 330℃

1.72 × 10

The original data had 300 subtracted, and then was divided by 10

The subtracting 300 will not have affected the standard deviation, so we only need to multiply by

10

Original standard deviation

= 17.2℃

ഥ𝒚 = 𝟑



From the large data set, date on the maximum gust, 𝑔 knots, is

recorded in Leuchars during May and June 2015.

The data was coded using ℎ =𝑔−5

10

and the following statistics found:

𝑆ℎℎ = 43.58തℎ = 2

𝑛 = 61

Calculate the mean and standard deviation of the maximum gust in

knots.

2F


𝜎 =σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝜎 =σ𝑓𝑥2

σ𝑓−

σ𝑓𝑥

σ𝑓

2

ҧ𝑥 =σ𝑥

𝑛

The mean has had 5 subtracted and then been multiplied by 10

You can write this as a formula:

തℎ =ҧ𝑔 − 5

10

2 =ҧ𝑔 − 5

10

20 = ҧ𝑔 − 5

ҧ𝑔 = 25

We know the mean of h


Multiply by 10

Add 5

This is a better way to show your workings!

25 = ҧ𝑔


From the large data set, date on the maximum gust, 𝑔 knots, is

recorded in Leuchars during May and June 2015.

The data was coded using ℎ =𝑔−5

10

and the following statistics found:

𝑆ℎℎ = 43.58തℎ = 2

𝑛 = 61

Calculate the mean and standard deviation of the maximum gust in

knots.

2F


𝜎 =σ𝑥2

𝑛−

σ𝑥

𝑛

2

𝜎 =σ𝑓𝑥2

σ𝑓−

σ𝑓𝑥

σ𝑓

2

ҧ𝑥 =σ𝑥

𝑛

We need to find the standard deviation of h first!

Use the formula above…

ҧ𝑔 = 25


𝜎ℎ =𝑆ℎℎ𝑛

𝜎ℎ =43.58

61

𝜎ℎ = 0.845…

As the subtraction has not affected the standard deviation, we only need to undo the division by 10

Like with the previous example, we can write this as a formula

𝜎ℎ =𝜎𝑔

10

0.845 =𝜎𝑔

10

8.45 = 𝜎𝑔

Sub in values

Multiply by 10

Sub in values

Calculate

𝜎𝑔 = 8.45

Documents

Twitter: @Owen134866 · Measures of location and spread A measure of location is a single value which is used to represent a set of data. Examples include the mean, median and mode