2.4 Measures of Variation

Page 1: 2.4 Measures of Variation

What is variability in data?

Variability measures how much the data set as a whole deviates from its center.

It indicates how spread out the data are.

The common measures of variation are the range, deviation, variance and standard deviation.

Page 2: 2.4 Measures of Variation

Range

The range is the simplest measure of variation. It is the difference between the largest and smallest entries in the data set.

Range = Maximum value - Minimum value

The range has the advantage of being easy to compute. Its disadvantage, however, is that it uses only two entries from the entire data set.

Age based on class survey data: 26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23, 48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38.

Range = maximum - minimum = 48 - 16 = 32
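The range of the age data can be checked with a few lines of Python. This is a minimal sketch; the names `ages` and `data_range` are illustrative and do not come from the slides.

```python
# Ages from the class survey on this slide.
ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]


def data_range(values):
    """Return maximum - minimum; only the two extreme entries matter."""
    return max(values) - min(values)


print(data_range(ages))  # 48 - 16 = 32
```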

Page 3: 2.4 Measures of Variation

Deviation, Variance and Standard Deviation

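The deviation of an entry xi in a population data set is the difference between that entry and the mean μ of the data set, i.e. xi - μ.

The population variance of a population data set of N entries is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

The population standard deviation is the square root of the population variance, i.e.

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

The sample variance of a sample data set of n entries with sample mean x̄ is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

The sample standard deviation is the square root of the sample variance, i.e.

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$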
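The population and sample versions differ only in the divisor: N for a population, n - 1 for a sample. The sketch below writes both out by hand and compares them with Python's `statistics` module. The function names and the five-entry list `demo` are hypothetical, chosen only for illustration.

```python
import math
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data set, not taken from the slides.
demo = [2, 4, 4, 6, 9]

print(population_variance(demo), statistics.pvariance(demo))
print(sample_variance(demo), statistics.variance(demo))
print(math.sqrt(population_variance(demo)), math.sqrt(sample_variance(demo)))
```

For the same data the sample variance is always the larger of the two, because it divides the same sum by a smaller number.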

Page 4: 2.4 Measures of Variation

Deviation, Variance and Standard Deviation

Age based on class survey: 26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23, 48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38.

Population size N = 34, Population mean μ = 1024/34 = 30.11765

Age (xi)    xi - μ     (xi - μ)²
26          -4.1176    16.9550
25          -5.1176    26.1903
:           :          :
:           :          :
38          7.8824     62.1315
                       Σ = 2797.5294

σ² = 2797.5294/34 = 82.2803

σ = √82.2803 = 9.0708
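The worked example on this slide can be reproduced step by step. The sketch below treats the 34 ages as a population, as the slide does; variable names such as `sum_sq` are illustrative.

```python
import math

# Ages from the class survey, treated as a population.
ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

N = len(ages)
mu = sum(ages) / N                      # population mean, 1024 / 34

# First two rows and last row of the deviation table.
for x in ages[:2] + ages[-1:]:
    print(f"{x:3d} {x - mu:9.4f} {(x - mu) ** 2:9.4f}")

sum_sq = sum((x - mu) ** 2 for x in ages)   # sum of squared deviations
variance = sum_sq / N                       # population variance
sigma = math.sqrt(variance)                 # population standard deviation

print(f"sum of squares = {sum_sq:.4f}")
print(f"variance       = {variance:.4f}")
print(f"std deviation  = {sigma:.4f}")
```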

Page 5: 2.4 Measures of Variation

Deviation, Variance and Standard Deviation

Variance and standard deviation take all of the data into consideration. However, both are easily influenced by extreme values, because each deviation is squared.

Variance is hard to interpret because it is expressed in squared units. The standard deviation is in the original units of the data and can be interpreted as the typical deviation from the mean.
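To see how strongly a single extreme value affects both measures, one can append an outlier to the age data and recompute. The value 95 below is a hypothetical entry added only for this illustration; it is not part of the survey.

```python
import statistics

ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

# Hypothetical extreme entry, not in the original data.
outlier = 95

sd_before = statistics.pstdev(ages)
sd_after = statistics.pstdev(ages + [outlier])
var_before = statistics.pvariance(ages)
var_after = statistics.pvariance(ages + [outlier])

print(f"standard deviation: {sd_before:.4f} -> {sd_after:.4f}")
print(f"variance:           {var_before:.4f} -> {var_after:.4f}")
```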

Page 6: 2.4 Measures of Variation

Interpreting Standard Deviation

When interpreting the standard deviation, remember that it is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.

Page 7: 2.4 Measures of Variation

Interpreting Standard Deviation

Empirical Rule (the 68-95-99.7 rule): For a bell-shaped, symmetric distribution, about 68% of the data lies within one standard deviation of the mean, about 95% lies within two standard deviations of the mean, and about 99.7% lies within three standard deviations of the mean.
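The three percentages come from the normal distribution, where the probability of falling within k standard deviations of the mean equals erf(k/√2). A minimal sketch using only the standard library follows; the helper name `normal_within` is an assumption made for this example.

```python
import math


def normal_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_within(k):.4f}")
```

Real data that is only roughly bell shaped will match these proportions only approximately.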

Page 8: 2.4 Measures of Variation

Interpreting Standard Deviation

Chebychev’s theorem

When the distribution is not bell shaped or symmetric, this theorem gives a lower bound on the proportion of data that lies within k standard deviations of the mean. It states that:

The proportion of any data set lying within k standard deviations (k > 1) of the mean is at least

$$1 - \frac{1}{k^2}$$

• k = 2: In any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, i.e. 75%, of the data lies within 2 standard deviations of the mean.
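The bound 1 - 1/k² can be compared with the proportion actually observed in a data set. The sketch below does this for the age data from the earlier slides, treated as a population; `chebyshev_bound` and `proportion_within` are hypothetical helper names.

```python
import statistics

ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]


def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def proportion_within(values, k):
    """Observed proportion of values within k standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    inside = sum(1 for x in values if abs(x - mu) <= k * sigma)
    return inside / len(values)


for k in (1.5, 2, 3):
    print(k, round(chebyshev_bound(k), 4), round(proportion_within(ages, k), 4))
```

The observed proportion is never below the bound, and it is usually well above it, since the theorem has to hold for every possible data set.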

Page 9: 2.4 Measures of Variation

Standard Deviation of Grouped Data

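The sample standard deviation for a frequency distribution is:

$$s = \sqrt{\frac{\sum_{i=1}^{c} (x_i - \bar{x})^2 f_i}{n - 1}}$$

where c is the number of classes, xi is the ith data value (for classes that span an interval, typically the class midpoint), fi is the corresponding frequency, n = Σ fi is the sample size, and x̄ = Σ xi fi / n is the sample mean.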
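As an illustration of the frequency-based calculation, the sketch below groups the age data from the earlier slides by distinct value, so that each "class" is a single age and fi is how often it occurs. Treating the ages as a sample here is an assumption made for the example, and `grouped_sample_stdev` is a hypothetical name.

```python
import math
import statistics
from collections import Counter

ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

# Frequency distribution: distinct value x_i -> frequency f_i.
freq = Counter(ages)


def grouped_sample_stdev(freq):
    """Sample standard deviation computed from (value, frequency) pairs."""
    n = sum(freq.values())                                   # sample size
    x_bar = sum(x * f for x, f in freq.items()) / n          # sample mean
    ss = sum((x - x_bar) ** 2 * f for x, f in freq.items())  # weighted squares
    return math.sqrt(ss / (n - 1))


s_grouped = grouped_sample_stdev(freq)
print(round(s_grouped, 4), round(statistics.stdev(ages), 4))
```

Because every class here holds a single value, the grouped result agrees with the ungrouped sample standard deviation. With interval classes and midpoints it would only approximate it.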

Page 10: 2.4 Measures of Variation

2.5 Measures of Position

What are measures of position?

A measure of position gives you some idea of where a particular data value would rank in an ordering of the data set, or where it falls with respect to the mean of the sample or population.

Page 11: 2.4 Measures of Variation

Quartiles

Quartiles divide the data into 4 equal parts. We need three quartiles, Q1, Q2 and Q3, to divide any data set into 4 equal parts.

• About a quarter of the data falls below the first quartile, Q1.

• About half of the data falls below the second quartile, Q2.

• About three quarters of the data falls below the third quartile, Q3.

The interquartile range (IQR) of a data set is the difference between the third and first quartiles, Q3 - Q1.

Page 12: 2.4 Measures of Variation

Quartiles

In essence, five values can be used to describe a data set: the minimum data value, the three quartiles Q1, Q2 and Q3, and the maximum data value. These five numbers are called the five-number summary, since together they describe the central tendency, the spread and the variation in the data.

Drawing a box-and-whisker plot

• Find the five-number summary of the data set.

• Construct a horizontal scale that spans the range of the data.

• Plot the five numbers above the horizontal scale.

• Draw a box above the horizontal scale from Q1 to Q3 and draw a vertical line in the box at Q2.

• Draw whiskers from the box to the minimum and maximum entries.

For the age data: Min = 16, Q1 = 23.25, Q2 = 27.5, Q3 = 37.5, Max = 48

[Figure: box-and-whisker plot labelled, from left to right, Min entry, Q1, Q2 (median), Q3, Max entry. The box spans Q1 to Q3, with a whisker on each side.]
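The five-number summary for the age data can be computed with Python's `statistics.quantiles`. Quartile conventions differ between textbooks and software, and the slides do not say which one was used. Choosing `method="inclusive"` is an assumption; it happens to reproduce the quartiles quoted for the age data.

```python
import statistics

ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

# Three cut points that split the ordered data into four parts.
# method="inclusive" is an assumed convention (linear interpolation
# between order statistics, with min and max as the 0th and 100th percentiles).
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")

five_number_summary = (min(ages), q1, q2, q3, max(ages))
iqr = q3 - q1   # interquartile range, Q3 - Q1

print(five_number_summary)
print(iqr)
```

The box in the plot runs from `q1` to `q3`, so its width is the IQR, and the whiskers reach to `min(ages)` and `max(ages)`.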

Page 13: 2.4 Measures of Variation

Percentiles and Other Fractiles

Fractiles are numbers that divide an ordered data set into equal parts. Some commonly used fractiles are:

Fractiles     Summary                                   Symbols
Quartiles     Divide a data set into 4 equal parts      Q1, Q2, Q3
Deciles       Divide a data set into 10 equal parts     D1, D2, D3, ..., D9
Percentiles   Divide a data set into 100 equal parts    P1, P2, P3, ..., P99
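Quartiles, deciles and percentiles are the same idea with a different number of parts, so one function call with different `n` covers all three. The sketch below uses the age data from the earlier slides; the `inclusive` interpolation method is an assumed convention, not something the slides specify.

```python
import statistics

ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

# n parts need n - 1 cut points: 3 quartiles, 9 deciles, 99 percentiles.
quartiles = statistics.quantiles(ages, n=4, method="inclusive")
deciles = statistics.quantiles(ages, n=10, method="inclusive")
percentiles = statistics.quantiles(ages, n=100, method="inclusive")

print(len(quartiles), len(deciles), len(percentiles))

# Q2, D5 and P50 all mark the middle of the data, i.e. the median.
print(quartiles[1], deciles[4], percentiles[49])
```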

Page 14: 2.4 Measures of Variation

z-score

The standard score, or z-score, represents the number of standard deviations a given value x falls from the mean μ. To find the z-score for a given value, use

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$$

A z-score can be positive, negative or zero.

• If z is positive, the data point is greater than the mean.

• If z is negative, the data point is less than the mean.

• If z = 0, the data point equals the mean.
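As a short illustration, the z-score of any age in the class survey follows from the population mean and standard deviation computed on the earlier slides. The function name `z_score` and the three example values are choices made for this sketch.

```python
import statistics

ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

mu = statistics.fmean(ages)      # population mean
sigma = statistics.pstdev(ages)  # population standard deviation


def z_score(x, mean, stdev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / stdev


# Youngest entry, an age near the mean, and the oldest entry.
for x in (16, 30, 48):
    print(x, round(z_score(x, mu, sigma), 4))
```

The youngest age gets a negative z-score and the oldest a positive one, matching the sign rules on this slide.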