Upload
others
View
3
Download
1
Embed Size (px)
Citation preview
Twitter: @Owen134866
www.mathsfreeresourcelibrary.com
Prior Knowledge Check
1) State whether each variable is qualitative or quantitative:
a) Car colour
b) Miles travelled by a cyclist
c) Favourite pet
d) Number of siblings
2) State whether each of these variables is discrete or continuous
a) Number of pets owned
b) Distance walked by hikers
c) Fuel consumption of lorries
d) Number of peas in a pod
e) Times taken by a group of athletes to run 1500m.
3) Calculate the 3 averages and range for the data set below:
Peas in a pod 3 4 5 6 7
Frequency 4 7 11 18 6
Qualitative
Qualitative
Quantitative
Quantitative
Discrete
Discrete
Continuous
Continuous
Continuous
Mean 5.33
Median 6
Mode 6
Range 4
Measures of location and spread
A measure of location is a single value which is used to represent a set of data. Examples include the mean,
median and mode
These can also be known as ‘measures of central tendency’
The mode or modal class is the value or class with the highest frequency
The median is the middle value when the data is put into ascending (or descending)
order
The mean is calculated using the formula:
2A/B
ҧ𝑥 =σ𝑥
𝑛
The mean, ‘x-bar’
The sum of 𝑥 (𝑥 represents each number in the data
set)
The number of ‘bits’ of dataYou need to start using ‘proper’
notation!
Measures of location and spread
A measure of location is a single value which is used to represent a set of data. Examples include the mean,
median and mode
The mean of a sample of 25 observations is 6.4. The mean of a second sample of 30 observations is 7.2. Calculate the mean of
all 55 observations.
2A/B
ҧ𝑥 =σ𝑥
𝑛
6.4 + 7.2
2
Why is this calculation wrong?
Each data set has a different quantity
The mean will be ‘weighted’ towards the data set with the higher quantity
Measures of location and spread
A measure of location is a single value which is used to represent a set of data. Examples include the mean,
median and mode
The mean of a sample of 25 observations is 6.4. The mean of a second sample of 30 observations of 7.2. Calculate the mean of
all 55 observations.
2A/B
ҧ𝑥 =σ𝑥
𝑛
Sample 1
6.4 =σ𝑥
25
= 160σ𝑥
Multiply by 25
Sample 2
7.2 =σ𝑥
30
= 216σ𝑥
Multiply by 30
So the total sum of the data is 376
There were 55 ‘bits’ of data in total
ҧ𝑥 =σ𝑥
𝑛
ҧ𝑥 =376
55
Sub in values
Sub in values
ҧ𝑥 = 6.84
Measures of location and spread
A measure of location is a single value which is used to represent a set of data. Examples include the mean,
median and mode
You will need to decide which is the best average to use depending on the
situation…
2A/B
ҧ𝑥 =σ𝑥
𝑛
Mode Median Mean
The mode is used for qualitative data, or
where the data has a single mode or two modes (bi-modal)
This is used for quantitative data, and is usually used when the
data has some extreme values (since this will not affect the median
too much)
The mean is used for quantitative data and
uses all data in a set. It therefore gives a true measure, but can be skewed by extreme
values
Measures of location and spread
A measure of location is a single value which is used to represent a set of data. Examples include the mean,
median and mode
For data given in a frequency table, you can calculate the mean by using the
formula below:
2A/B
ҧ𝑥 =σ𝑥
𝑛
ҧ𝑥 =σ𝑓𝑥
σ𝑓
The sum of the products of the data values and their
frequencies
The sum of the frequencies
ҧ𝑥 =σ𝑓𝑥
σ𝑓
Measures of location and spread
A measure of location is a single value which is used to represent a set of data. Examples include the mean,
median and mode
Rebecca records the shirt collar size, x, of the male students in her year. The
results are shown in the table.
For the data, calculate:
a) The mode
b) The median
c) The mean
d) Explain why a shirt manufacturer might use the mode for setting their
production quota
2A/B
ҧ𝑥 =σ𝑥
𝑛
ҧ𝑥 =σ𝑓𝑥
σ𝑓
Collar SizeNumber of Students
15 3
15.5 17
16 29
16.5 34
17 12
= 𝟏𝟔. 𝟓𝑓 = 95
𝑓𝑥
95 + 1
2
= 48𝑡ℎ
3
20
49
= 𝟏𝟔
The median is the 48th value Add the frequencies up until you get beyond
this value So the median must be 16 as it is in that group
Measures of location and spread
A measure of location is a single value which is used to represent a set of data. Examples include the mean,
median and mode
Rebecca records the shirt collar size, x, of the male students in her year. The
results are shown in the table.
For the data, calculate:
a) The mode
b) The median
c) The mean
d) Explain why a shirt manufacturer might use the mode for setting their
production quota
2A/B
ҧ𝑥 =σ𝑥
𝑛
ҧ𝑥 =σ𝑓𝑥
σ𝑓
Collar SizeNumber of Students
15 3
15.5 17
16 29
16.5 34
17 12
= 𝟏𝟔. 𝟓
𝑓𝑥 𝑓𝑥
45
263.5
464
561
204
𝑓𝑥 = 1537.5
= 𝟏𝟔. 𝟐
= 𝟏𝟔
𝑓 = 95
ҧ𝑥 =σ𝑓𝑥
σ𝑓
ҧ𝑥 =1537.5
95
ҧ𝑥 = 16.2
Sub in values
Calculate
Measures of location and spread
A measure of location is a single value which is used to represent a set of data. Examples include the mean,
median and mode
Rebecca records the shirt collar size, x, of the male students in her year. The
results are shown in the table.
For the data, calculate:
a) The mode
b) The median
c) The mean
d) Explain why a shirt manufacturer might use the mode for setting their
production quota
2A/B
ҧ𝑥 =σ𝑥
𝑛
ҧ𝑥 =σ𝑓𝑥
σ𝑓
Collar SizeNumber of Students
15 3
15.5 17
16 29
16.5 34
17 12
= 𝟏𝟔. 𝟓
𝑓𝑥 𝑓𝑥
45
263.5
464
561
204
𝑓𝑥 = 1537.5
= 𝟏𝟔. 𝟐
= 𝟏𝟔
𝑓 = 95
The mode is in this case more useful as it tells the manufacturer what size shirt it
needs to produce the most of
Measures of location and spread
A measure of location is a single value which is used to represent a set of data. Examples include the mean,
median and mode
The length, 𝑥 mm, to the nearest mm, of a random sample of pine cones is measured.
The data is shown in the table to the right.
a) Write down the modal class
b) Estimate the mean
c) Find the median class
2A/B
ҧ𝑥 =σ𝑥
𝑛
ҧ𝑥 =σ𝑓𝑥
σ𝑓
Cone length (mm)
Frequency
30-31 2
32-33 25
34-36 30
37-39 13
34-36
𝑓𝑥 𝑓𝑥
61
𝑓𝑥 = 2417.5𝑓 = 70
494
1050
812.5
ҧ𝑥 =σ𝑓𝑥
σ𝑓
ҧ𝑥 =2417.5
70
ҧ𝑥 = 34.5
Sub in values
Calculate
To calculate the mean from a grouped table you need to use the midpoint of each group
30.5
32.5
35
38
34.5
Measures of location and spread
A measure of location is a single value which is used to represent a set of data. Examples include the mean,
median and mode
The length, 𝑥 mm, to the nearest mm, of a random sample of pine cones is measured.
The data is shown in the table to the right.
a) Write down the modal class
b) Estimate the mean
c) Find the median class
2A/B
ҧ𝑥 =σ𝑥
𝑛
ҧ𝑥 =σ𝑓𝑥
σ𝑓
Cone length (mm)
Frequency
30-31 2
32-33 25
34-36 30
37-39 13
34-36
𝑓𝑥
𝑓 = 70
34.5 70 + 1
2
= 35.5𝑡ℎ
The median is the 35.5th value Add the frequencies up until you get beyond
this value So the median class is 34-36
2
27
57
34-36
Measures of location and spread
You need to be able to calculate quartiles and percentiles of a data
set
The median describes the middle of a set of data, splitting the data into two
halves with 50% in each.
You can also calculate quartiles and percentiles, which are also both
measures of location
The median is also known as the second quartile
2C
𝑄2𝑄1 𝑄3Lowestvalue
Highestvalue
50% 50%25% 25% 25% 25%10% 90%
MedianLower
QuartileUpper
Quartile
When combined with the median, the lower and upper quartiles split the data into 4 equal sections
The 10th percentile is the value with 10% of the data lower than it
The 90th percentile is the value with 90% of the data lower than it
So if your test score was in the 90th
percentile, that is a good thing!
Measures of location and spread
You need to be able to calculate quartiles and percentiles of a data
set
The way the quartiles are calculated depends on whether the data is
discrete or continuous…
2C
Lower Quartile Upper Quartile
Discrete
Continuous
Divide 𝑛 by 4 If whole, the LQ is between this
value and the one above If not whole, round up and take
that data point
Divide 3𝑛 by 4 If whole, the UQ is between this
value and the one above If not whole, round up and take
that data point
Divide 𝑛 by 4 and take that data point
Divide 3𝑛 by 4 and take that data point
Measures of location and spread
You need to be able to calculate quartiles and percentiles of a data
set
From the large data set, the daily maximum gust (knots) during the first 20 days of June 2015 is recorded in
Hurn. The data is shown below:
Find the median and quartiles for this data.
2C
14 15 17 17 18
18 19 19 22 22
23 23 23 24 25
26 27 28 36 39
𝑄2 =𝑛+1
2𝑡ℎ value
𝑄2 =20 + 1
2
𝑄2 = 10.5𝑡ℎ value
𝑄2 = 22.5
There are 20 values
Calculate
Find this value
𝑸𝟐 = 𝟐𝟐. 𝟓
Note that we treat this as discrete data since we have all
the actual values!
Measures of location and spread
You need to be able to calculate quartiles and percentiles of a data
set
From the large data set, the daily maximum gust (knots) during the first 20 days of June 2015 is recorded in
Hurn. The data is shown below:
Find the median and quartiles for this data.
2C
14 15 17 17 18
18 19 19 22 22
23 23 23 24 25
26 27 28 36 39
𝑸𝟐 = 𝟐𝟐. 𝟓
The data is discrete
For 𝑄1𝑛
4
20
4= 5
So take the 5.5th value
𝑄1 = 18
For 𝑄33𝑛
4
60
4= 15
So take the 15.5th value
𝑄3 = 25.5𝑸𝟏 = 𝟏𝟖 𝑸𝟑 = 𝟐𝟓. 𝟓
Measures of location and spread
You need to be able to calculate quartiles and percentiles of a data
set
The length of time (to the nearest minute) spent on the internet each evening by a group of students is
shown in the table below.
a) Find an estimate for the upper quartile
b) Find an estimate for the 10th
percentile
2C
Time spent on internet (mins)
Frequency
30-31 2
32-33 25
34-36 30
37-39 13
The data is continuous
3𝑛
4
210
4= 52.5𝑡ℎ value
Find the group that this is in, and use linear interpolation to estimate the median…
2
27
57
You can use this formula:
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
Lower boundary of the group
Classwidth of the groupGroup
Frequency
Places into group
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
Measures of location and spread
You need to be able to calculate quartiles and percentiles of a data
set
The length of time (to the nearest minute) spent on the internet each evening by a group of students is
shown in the table below.
a) Find an estimate for the upper quartile
b) Find an estimate for the 10th
percentile
2C
Time spent in internet (mins)
Frequency
30-31 2
32-33 25
34-36 30
37-39 13
= 52.5𝑡ℎ value
2
27
57
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
33.5 +25.5
30× 3
The value is 25.5 places into the group (it is the 52.5th value,
and we have already had 27 before the group started)
Remember for continuous data, you will need to use 33.5
and 36.5 as the class boundaries
= 36.05
Calculate
Measures of location and spread
You need to be able to calculate quartiles and percentiles of a data
set
The length of time (to the nearest minute) spent on the internet each evening by a group of students is
shown in the table below.
a) Find an estimate for the upper quartile
b) Find an estimate for the 10th
percentile
2C
Time spent in internet (mins)
Frequency
30-31 2
32-33 25
34-36 30
37-39 13
2
27
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
= 𝟑𝟔. 𝟎𝟓
The data is continuous
10𝑛
100
700
100= 7𝑡ℎ value
Find the group that this is in, and use linear interpolation to estimate it…
The 10th percentile is calculated as follows
Measures of location and spread
You need to be able to calculate quartiles and percentiles of a data
set
The length of time (to the nearest minute) spent on the internet each evening by a group of students is
shown in the table below.
a) Find an estimate for the upper quartile
b) Find an estimate for the 10th
percentile
2C
Time spent in internet (mins)
Frequency
30-31 2
32-33 25
34-36 30
37-39 13
2
27
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
= 𝟑𝟔. 𝟎𝟓
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
31.5 +5
25× 2
The value is 5 places into the group (it is the 7th value, and we have already had 2 before the
group started)
Remember for continuous data, you will need to use 31.5
and 33.5 as the class boundaries
𝑃10 = 31.9
Calculate
This notation is usually used for the 10th percentile
Measures of location and spread
A measure of spread is a value which indicated how spread out the data set is. Examples include the range and interquartile range.
The range is the difference between the largest and smallest
values, and measures the spread of all the data
The interquartile range is the difference between the upper and lower quartiles, and measures the spread of the middle 50% of the
data
The interpercentile range is the difference between 2 given
percentiles
2D
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
Measures of location and spread
A measure of spread is a value which indicated how spread out the data set is. Examples include the range and interquartile range.
The table to the right shows the masses (tonnes) of 120 African
elephants.
Find estimates for:
a) The range
b) The interquartile range
c) The 10th to 90th percentile range
2D
Mass, m (t) Frequency
4.0 ≤ 𝑚 < 4.5 13
4.5 ≤ 𝑚 < 5.0 23
5.0 ≤ 𝑚 < 5.5 31
5.5 ≤ 𝑚 < 6.0 34
6.0 ≤ 𝑚 ≤ 6.5 19
The range is the biggest possible value subtract the
smallest possible value
6.5 - 4.0 = 2.5
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
𝟐. 𝟓
Measures of location and spread
A measure of spread is a value which indicated how spread out the data set is. Examples include the range and interquartile range.
The table to the right shows the masses (tonnes) of 120 African
elephants.
Find estimates for:
a) The range
b) The interquartile range
c) The 10th to 90th percentile range
2D
Mass, m (t) Frequency
4.0 ≤ 𝑚 < 4.5 13
4.5 ≤ 𝑚 < 5.0 23
5.0 ≤ 𝑚 < 5.5 31
5.5 ≤ 𝑚 < 6.0 34
6.0 ≤ 𝑚 ≤ 6.5 19
13
120
101
67
36
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
= 4.5 +17
23× 0.5
For 𝑄1
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
𝑛
4
=120
4= 30𝑡ℎ value
= 4.87
𝑸𝟏 = 𝟒. 𝟖𝟕
Sub in values
Calculate
Now use linear interpolation𝟐. 𝟓
Measures of location and spread
A measure of spread is a value which indicated how spread out the data set is. Examples include the range and interquartile range.
The table to the right shows the masses (tonnes) of 120 African
elephants.
Find estimates for:
a) The range
b) The interquartile range
c) The 10th to 90th percentile range
2D
Mass, m (t) Frequency
4.0 ≤ 𝑚 < 4.5 13
4.5 ≤ 𝑚 < 5.0 23
5.0 ≤ 𝑚 < 5.5 31
5.5 ≤ 𝑚 < 6.0 34
6.0 ≤ 𝑚 ≤ 6.5 19
𝟐. 𝟓
13
120
101
67
36
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
= 5.5 +23
34× 0.5
For 𝑄3
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
3𝑛
4
=360
4= 90𝑡ℎ value
= 5.84
𝑸𝟏 = 𝟒. 𝟖𝟕
Sub in values
Calculate
Now use linear interpolation
𝑸𝟑 = 𝟓. 𝟖𝟒
𝟓. 𝟖𝟒 − 𝟒. 𝟖𝟕 = 𝟎. 𝟗𝟕
𝟎. 𝟗𝟕
Measures of location and spread
A measure of spread is a value which indicated how spread out the data set is. Examples include the range and interquartile range.
The table to the right shows the masses (tonnes) of 120 African
elephants.
Find estimates for:
a) The range
b) The interquartile range
c) The 10th to 90th percentile range
2D
Mass, m (t) Frequency
4.0 ≤ 𝑚 < 4.5 13
4.5 ≤ 𝑚 < 5.0 23
5.0 ≤ 𝑚 < 5.5 31
5.5 ≤ 𝑚 < 6.0 34
6.0 ≤ 𝑚 ≤ 6.5 19
𝟐. 𝟓
13
120
101
67
36
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
𝟎. 𝟗𝟕
= 4.0 +12
13× 0.5
For 𝑃10
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
10𝑛
100
=1200
100= 12𝑡ℎ value
= 4.46
Sub in values
Calculate
Now use linear interpolation
𝑷𝟏𝟎 = 𝟒. 𝟒𝟔
Measures of location and spread
A measure of spread is a value which indicated how spread out the data set is. Examples include the range and interquartile range.
The table to the right shows the masses (tonnes) of 120 African
elephants.
Find estimates for:
a) The range
b) The interquartile range
c) The 10th to 90th percentile range
2D
Mass, m (t) Frequency
4.0 ≤ 𝑚 < 4.5 13
4.5 ≤ 𝑚 < 5.0 23
5.0 ≤ 𝑚 < 5.5 31
5.5 ≤ 𝑚 < 6.0 34
6.0 ≤ 𝑚 ≤ 6.5 19
𝟐. 𝟓
13
120
101
67
36
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
𝟎. 𝟗𝟕
= 6.0 +7
19× 0.5
For 𝑃90
𝐿𝐵 +𝑃𝐿
𝐺𝐹× 𝐶𝑊
90𝑛
100
=10800
100= 108𝑡ℎ value
= 6.18
Sub in values
Calculate
Now use linear interpolation
𝑷𝟏𝟎 = 𝟒. 𝟒𝟔 𝑷𝟗𝟎 = 𝟔. 𝟏𝟖
𝟔. 𝟏𝟖 − 𝟒. 𝟒𝟔 = 𝟏. 𝟕𝟐
Measures of location and spreadThe variance and standard deviation can also be used to analyse a set of
data
The variance and standard deviation are both measures of spread, and
involve the fact that each data point deviates from the mean by the
amount:
𝑥 − ҧ𝑥
Where 𝑥 is a data point and ҧ𝑥 is the mean of the data as a whole
2E
The variance and standard deviation can also be used to analyse a set of
data
The variance is defined as:
“The average of the squared distances from the mean”
So the distances of each data point from the mean are all squared, and
divided by how many there are.
This gives the formula to the right:
2E
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =σ 𝑥 − ҧ𝑥 2
𝑛
=σ𝑥2
𝑛−
σ𝑥
𝑛
2
=𝑆𝑥𝑥𝑛
This formula is equivalent
This formula is equivalent
“Mean of the squares subtract the square of the
mean”
The notation 𝑆𝑥𝑥 is short for σ 𝑥 − ҧ𝑥 2 or σ𝑥2 −σ 𝑥 2
𝑛
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =σ 𝑥 − ҧ𝑥 2
𝑛 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =𝑆𝑥𝑥𝑛
Measures of location and spread
The variance and standard deviation can also be used to analyse a set of
data
The standard deviation is the square root of the variance.
The symbol 𝜎 (lower case sigma) is used to represent standard deviation
Therefore 𝜎2 is usually used to represent the variance
The square root of any of these gives the Standard Deviation
2E
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =σ 𝑥 − ҧ𝑥 2
𝑛 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 =𝑆𝑥𝑥𝑛
𝜎2 =σ 𝑥 − ҧ𝑥 2
𝑛 𝜎2 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎2 =𝑆𝑥𝑥𝑛 𝜎 =
σ 𝑥 − ҧ𝑥 2
𝑛𝜎 =
σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =𝑆𝑥𝑥𝑛
Measures of location and spread
This is what it looks like in the formula booklet!
𝜎2 =σ 𝑥 − ҧ𝑥 2
𝑛
The variance and standard deviation can also be used to analyse a set of
data
2E
𝜎2 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎2 =𝑆𝑥𝑥𝑛 𝜎 =
σ 𝑥 − ҧ𝑥 2
𝑛𝜎 =
σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =𝑆𝑥𝑥𝑛
Measures of location and spread
The Standard Deviation tells you the range from the mean which contains around 68% of the data (if data is normally distributed – you will learn about this at a later date)
For example, if 100 students have a mean height of 150cm and a standard deviation of 10cm.
150
140 160
130 170
68 of the students are within one Standard Deviation
95 of the students are within two Standard Deviations
σ𝑥 = 36
σ𝑥2 = 218
𝜎2 =σ 𝑥 − ҧ𝑥 2
𝑛
The variance and standard deviation can also be used to analyse a set of
data
The marks gained in a test by seven randomly selected students are:
3 4 6 2 8 8 5
Find the variance and standard deviation of the marks of the seven
students.
The middle of the 3 formulae above is most commonly used when you have
the ‘raw’ data
2E
𝜎2 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎2 =𝑆𝑥𝑥𝑛 𝜎 =
σ 𝑥 − ҧ𝑥 2
𝑛𝜎 =
σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =𝑆𝑥𝑥𝑛
Measures of location and spread
𝑥
𝑥2 9 16 36 4 64 64 25
𝜎2 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎2 =218
7−
36
7
2
𝜎2 = 4.69
Sub in values – we need σ𝑥 and σ𝑥2
Calculate
𝜎 = 2.17
Square root
𝜎2 =σ 𝑥 − ҧ𝑥 2
𝑛
The variance and standard deviation can also be used to analyse a set of
data
The marks gained in a test by seven randomly selected students are:
3 4 6 2 8 8 5
Find the variance and standard deviation of the marks of the seven
students.
So this means that 68% of the data is within 2.17 of the mean. Note
that 68% is not always a possible percentage though!
2E
𝜎2 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎2 =𝑆𝑥𝑥𝑛 𝜎 =
σ 𝑥 − ҧ𝑥 2
𝑛𝜎 =
σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =𝑆𝑥𝑥𝑛
Measures of location and spread
𝜎 = 2.17
𝑀𝑒𝑎𝑛 = 5.14
2 3 4 5 6 8 85.14 7.312.97
So out of the original 7 values, 4 are within 1 standard deviation of the mean
This is only 57% rather than 68%, because the data size is very small!
−2.17 +2.17
𝜎2 =σ 𝑥 − ҧ𝑥 2
𝑛
The variance and standard deviation can also be used to analyse a set of
data
Shamsa records the time spent out of school during the lunch hour to the
nearest minute, x, of the female students in her year. The results are
shown in the table.
Calculate the standard deviation of the time spent out of school.
For tabled data, we need to use a modified formula…
2E
𝜎2 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎2 =𝑆𝑥𝑥𝑛 𝜎 =
σ 𝑥 − ҧ𝑥 2
𝑛𝜎 =
σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =𝑆𝑥𝑥𝑛
Measures of location and spread
Time (mins) Frequency
35 3
36 17
37 29
38 34
𝜎 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =σ𝑓𝑥2
σ𝑓−
σ𝑓𝑥
σ𝑓
2
σ𝑓 = 83
𝑥 𝑓 𝑓𝑥2𝑓𝑥
105
612
1073
1292
σ𝑓𝑥 = 3082 σ𝑓𝑥2 = 114504
3675
22032
39701
49096
𝜎 =σ𝑓𝑥2
σ𝑓−
σ𝑓𝑥
σ𝑓
2
𝜎 =114504
83−
3082
83
2
𝜎 = 0.861
Sub in values – we need to use the
table to calculate these!
Calculate
σ𝑓 = 27 σ𝑓𝑥 = 285 σ𝑓𝑥2 = 6487.5
𝜎2 =σ 𝑥 − ҧ𝑥 2
𝑛
The variance and standard deviation can also be used to analyse a set of
data
Andy recorded the length, in minutes, of each telephone call he made for a month. The data is summarized in the
table below.
Calculate an estimate of the standard deviation of the length of the
phonecalls
2E
𝜎2 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎2 =𝑆𝑥𝑥𝑛 𝜎 =
σ 𝑥 − ҧ𝑥 2
𝑛𝜎 =
σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =𝑆𝑥𝑥𝑛
Measures of location and spread
𝜎 =σ𝑓𝑥2
σ𝑓−
σ𝑓𝑥
σ𝑓
2
Length of call (mins) Frequency
0 < 𝑙 ≤ 5 4
5 < 𝑙 ≤ 10 15
10 < 𝑙 ≤ 15 5
15 < 𝑙 ≤ 20 2
20 < 𝑙 ≤ 60 0
60 < 𝑙 ≤ 70 1
2.5
7.5
12.5
17.5
40
65
𝑥 𝑓 𝑓𝑥2𝑓𝑥
10
112.5
62.5
35
0
65
25
843.75
781.25612.5
0
4225
For a grouped table we need to use the midpoints (as in the previous section)
σ𝑓 = 27
σ𝑓𝑥 = 285
σ𝑓𝑥2 = 6487.5
𝜎2 =σ 𝑥 − ҧ𝑥 2
𝑛
The variance and standard deviation can also be used to analyse a set of
data
Andy recorded the length, in minutes, of each telephone call he made for a month. The data is summarized in the
table below.
Calculate an estimate of the standard deviation of the length of the
phonecalls
2E
𝜎2 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎2 =𝑆𝑥𝑥𝑛 𝜎 =
σ 𝑥 − ҧ𝑥 2
𝑛𝜎 =
σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =𝑆𝑥𝑥𝑛
Measures of location and spread
𝜎 =σ𝑓𝑥2
σ𝑓−
σ𝑓𝑥
σ𝑓
2
Length of call (mins) Frequency
0 < 𝑙 ≤ 5 4
5 < 𝑙 ≤ 10 15
10 < 𝑙 ≤ 15 5
15 < 𝑙 ≤ 20 2
20 < 𝑙 ≤ 60 0
60 < 𝑙 ≤ 70 1
2.5
7.5
12.5
17.5
40
65
𝑥 𝑓
𝜎 =σ𝑓𝑥2
σ𝑓−
σ𝑓𝑥
σ𝑓
2
𝜎 =6487.5
27−
285
27
2
𝜎 = 11.35
Sub in values – we need to use the
table to calculate these!
Calculate
Measures of location and spreadCoding can be used to make a set
of values simpler to work with
If numbers in a data set are particularly large, they can all be altered in the same way to make
them smaller
This will however affect the measures of location and
dispersion that we have been calculating, and you need to be
aware of these effects…
2F
Imagine we have the following data on people’s heights (cm)
145 170 168 166 151 147 150 172
Mean = 158.625 Range = 27
If all the values were multiplied by 2, what would happen to the measures above?
Mean and range would double
If all the values above had 20 added to them, what would happen to the measures above?
Mean would increase by 20, but the range would stay the same
If you change a set of data by adding or subtracting an amount, this will not affect the range, or any other measures of spread, such as the IQR or
standard deviation
𝜎 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =σ𝑓𝑥2
σ𝑓−
σ𝑓𝑥
σ𝑓
2
ҧ𝑥 =σ𝑥
𝑛𝜎 =
𝑆𝑥𝑥𝑛
Coding can be used to make a set of values simpler to work with
A scientist measures the temperature, 𝑥℃ at five different
points in a nuclear reactor. Her results are given below:
332, 355, 306, 317, 340
a) Use the coding 𝑦 =𝑥−300
10to code
this data
b) Calculate the mean and standard deviation of the coded data
c) Use your answer to b) the calculate the mean and standard
deviation of the original data.
2F
𝑥 𝑦
332
355
306
317
340
σ𝑦 = 15
𝑦2
Measures of location and spread
𝜎 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =σ𝑓𝑥2
σ𝑓−
σ𝑓𝑥
σ𝑓
2
ҧ𝑥 =σ𝑥
𝑛
10.24
30.25
0.36
2.89
16
σ𝑦2 = 59.74
To code the data. take each starting value, subtract 300 from it, and divide the answer by 10
Now we can calculate the mean and standard deviation of this new information…
ത𝑦 =σ𝑦
𝑛
ത𝑦 = 3
𝜎𝑦 =σ𝑦2
𝑛−
σ𝑦
𝑛
2
𝜎𝑦 =59.74
5−
15
5
2
𝜎𝑦 = 1.72
Sub in values
Sub in values
Work out
ത𝑦 =15
5 Work out
3.2
5.5
0.6
1.7
4
𝜎 =𝑆𝑥𝑥𝑛
Coding can be used to make a set of values simpler to work with
A scientist measures the temperature, 𝑥℃ at five different
points in a nuclear reactor. Her results are given below:
332, 355, 306, 317, 340
a) Use the coding 𝑦 =𝑥−300
10to code
this data
b) Calculate the mean and standard deviation of the coded data
c) Use your answer to b) the calculate the mean and standard
deviation of the original data.
2F
Measures of location and spread
𝜎 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =σ𝑓𝑥2
σ𝑓−
σ𝑓𝑥
σ𝑓
2
ҧ𝑥 =σ𝑥
𝑛
3 × 10 + 300
𝝈𝒚 = 𝟏. 𝟕𝟐
The original data had 300 subtracted, and then was divided by 10
We need to reverse this, so multiply by 10, and then add 300
Original mean
= 330℃
1.72 × 10
The original data had 300 subtracted, and then was divided by 10
The subtracting 300 will not have affected the standard deviation, so we only need to multiply by
10
Original standard deviation
= 17.2℃
ഥ𝒚 = 𝟑
𝜎 =𝑆𝑥𝑥𝑛
Coding can be used to make a set of values simpler to work with
From the large data set, date on the maximum gust, 𝑔 knots, is
recorded in Leuchars during May and June 2015.
The data was coded using ℎ =𝑔−5
10
and the following statistics found:
𝑆ℎℎ = 43.58തℎ = 2
𝑛 = 61
Calculate the mean and standard deviation of the maximum gust in
knots.
2F
Measures of location and spread
𝜎 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =σ𝑓𝑥2
σ𝑓−
σ𝑓𝑥
σ𝑓
2
ҧ𝑥 =σ𝑥
𝑛
The mean has had 5 subtracted and then been multiplied by 10
You can write this as a formula:
തℎ =ҧ𝑔 − 5
10
2 =ҧ𝑔 − 5
10
20 = ҧ𝑔 − 5
ҧ𝑔 = 25
We know the mean of h
𝜎 =𝑆𝑥𝑥𝑛
Multiply by 10
Add 5
This is a better way to show your workings!
25 = ҧ𝑔
Coding can be used to make a set of values simpler to work with
From the large data set, date on the maximum gust, 𝑔 knots, is
recorded in Leuchars during May and June 2015.
The data was coded using ℎ =𝑔−5
10
and the following statistics found:
𝑆ℎℎ = 43.58തℎ = 2
𝑛 = 61
Calculate the mean and standard deviation of the maximum gust in
knots.
2F
Measures of location and spread
𝜎 =σ𝑥2
𝑛−
σ𝑥
𝑛
2
𝜎 =σ𝑓𝑥2
σ𝑓−
σ𝑓𝑥
σ𝑓
2
ҧ𝑥 =σ𝑥
𝑛
We need to find the standard deviation of h first!
Use the formula above…
ҧ𝑔 = 25
𝜎 =𝑆𝑥𝑥𝑛
𝜎ℎ =𝑆ℎℎ𝑛
𝜎ℎ =43.58
61
𝜎ℎ = 0.845…
As the subtraction has not affected the standard deviation, we only need to undo the division by 10
Like with the previous example, we can write this as a formula
𝜎ℎ =𝜎𝑔
10
0.845 =𝜎𝑔
10
8.45 = 𝜎𝑔
Sub in values
Multiply by 10
Sub in values
Calculate
𝜎𝑔 = 8.45