Chapter 5 Describing Distributions Numerically

Chapter 5 Describing Distributions Numerically. Describing the Distribution Center Median (.5 quantile, 2nd quartile, 50th percentile) Mean Spread Range

Describing the Distribution

Center
• Median (0.5 quantile, 2nd quartile, 50th percentile)
• Mean

Spread
• Range
• Interquartile Range
• Standard Deviation

Median

Literally = middle number (data value)

Has the same units as the data

When n (number of observations) is odd:

Order the data from smallest to largest

Median is the middle number on the list: the (n+1)/2-th number from the smallest value

• Ex: If n=11, median is the (11+1)/2 = 6th number from the smallest value

• Ex: If n=37, median is the (37+1)/2 = 19th number from the smallest value

Example – Frank Thomas

Career Home Runs: 4 7 15 18 24 28 29 32 35 38 40 40 41 42 43

Remember to order the values if they aren’t already in order!

• 15 observations: (15+1)/2 = 8th observation from bottom

• Median = 32 HRs

Median

When n is even:

Order the data from smallest to largest

Median is the average of the two middle numbers

(n+1)/2 will be halfway between these two numbers

• Ex: If n=10, (10+1)/2 = 5.5, median is the average of the 5th and 6th numbers from the smallest value

Example – Ryne Sandberg

Career Home Runs: 0 5 7 8 9 12 14 16 19 19 25 26 26 26 30 40

Remember to order the values if they aren’t already in order!

• 16 observations: (16 + 1)/2 = 8.5, so average the 8th and 9th observations from bottom

• Median = average of 16 and 19

• Median = 17.5 HRs
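
The odd and even rules for the median can be sketched in a few lines of Python. This is only an illustration: the function name `median` and the variable names are chosen here and do not come from the slides.

```python
def median(values):
    """Median via the (n + 1) / 2 position rule described in the slides."""
    ordered = sorted(values)              # always order the data first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the (n + 1) / 2-th value from the bottom (0-based index n // 2)
        return ordered[mid]
    # even n: (n + 1) / 2 falls halfway between two positions, so average them
    return (ordered[mid - 1] + ordered[mid]) / 2


thomas = [4, 7, 15, 18, 24, 28, 29, 32, 35, 38, 40, 40, 41, 42, 43]
sandberg = [0, 5, 7, 8, 9, 12, 14, 16, 19, 19, 25, 26, 26, 26, 30, 40]

print(median(thomas))    # 32
print(median(sandberg))  # 17.5
```

Sorting inside the function means the caller does not need to remember to order the values first.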

Mean

Ordinary average
• Add up all observations
• Divide by the number of observations

Has the same units as the data

Formula
• n observations
• y1, y2, y3, …, yn are the values

Mean

$$\bar{y} = \frac{y_1 + y_2 + y_3 + \cdots + y_n}{n} = \frac{\sum y}{n} = \frac{1}{n}\sum y$$

Examples

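Thomas

$$\bar{y} = \frac{4 + 7 + 15 + 18 + \cdots + 43}{15} = \frac{436}{15} \approx 29.07 \text{ HRs}$$

Sandberg

$$\bar{y} = \frac{0 + 5 + 7 + 8 + \cdots + 40}{16} = \frac{282}{16} = 17.625 \text{ HRs}$$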
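
As a quick check of the formula, the mean can be computed directly from the two home-run lists given earlier. This is a minimal sketch; the function name `mean` is chosen here for illustration.

```python
def mean(values):
    """Ordinary average: add up all observations, divide by their count."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)


thomas = [4, 7, 15, 18, 24, 28, 29, 32, 35, 38, 40, 40, 41, 42, 43]
sandberg = [0, 5, 7, 8, 9, 12, 14, 16, 19, 19, 25, 26, 26, 26, 30, 40]

print(f"Thomas:   sum = {sum(thomas)}, n = {len(thomas)}, mean = {mean(thomas):.2f} HRs")
print(f"Sandberg: sum = {sum(sandberg)}, n = {len(sandberg)}, mean = {mean(sandberg):.3f} HRs")
```

Unlike the median, the mean does not require the data to be ordered.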

Mean vs. Median

Median = middle number

Mean = value where histogram balances

Mean and median similar when
• Data are symmetric

Mean and median different when
• Data are skewed
• There are outliers

Mean vs. Median

Mean influenced by unusually high or unusually low values

Example: Income in a small town of 6 people

$25,000 $27,000 $29,000 $35,000 $37,000 $38,000

• The mean income is about $31,833
• The median income is $32,000

Mean vs. Median

Bill Gates moves to town

$25,000 $27,000 $29,000 $35,000 $37,000 $38,000 $40,000,000

• The mean income is $5,741,571
• The median income is $35,000

Mean is pulled by the outlier

Median is not

Mean is not a good center of these data
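
The effect of the single large income can be reproduced with Python's standard `statistics` module. This is a sketch only; the variable names are illustrative.

```python
from statistics import mean, median

incomes = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]
with_outlier = incomes + [40_000_000]   # the $40,000,000 income joins the town

for label, data in (("before", incomes), ("after", with_outlier)):
    print(f"{label:>6}: mean = ${mean(data):,.0f}, median = ${median(data):,.0f}")
# before: mean = $31,833, median = $32,000
#  after: mean = $5,741,571, median = $35,000
```

One new value moves the mean by millions of dollars, while the median shifts only to the next income in the ordered list.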

Mean vs. Median

Skewness pulls the mean in the direction of the tail
• Skewed to the right = mean > median
• Skewed to the left = mean < median

Outliers pull the mean in their direction
• Large outlier = mean > median
• Small outlier = mean < median

Spread

Range = maximum – minimum

Thomas
• Min = 4, Max = 43, Range = 43 - 4 = 39 HRs

Sandberg
• Min = 0, Max = 40, Range = 40 - 0 = 40 HRs

Spread

Range is a very basic measure of spread
• It is highly affected by outliers
• An outlier makes the spread appear larger than it really is

Ex. The annual numbers of deaths from tornadoes in the U.S. from 1990 to 2000:

53 39 39 33 69 30 25 67 130 94 40

• Range with outlier: 130 – 25 = 105 deaths
• Range without outlier: 94 – 25 = 69 deaths
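
The two ranges can be checked with a short sketch. The helper name `value_range` is hypothetical, and treating 130 as the outlier follows the example above.

```python
def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


deaths = [53, 39, 39, 33, 69, 30, 25, 67, 130, 94, 40]
without_outlier = [d for d in deaths if d != 130]   # drop the one extreme year

print(value_range(deaths))           # 105
print(value_range(without_outlier))  # 69
```

Because the range depends only on the two most extreme values, removing a single observation changes it by more than a third here.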

Spread

Interquartile Range (IQR)

First Quartile (Q1)
• Larger than about 25% of the data

Third Quartile (Q3)
• Larger than about 75% of the data

IQR = Q3 – Q1
• Spread of the center (middle) 50% of the values

Finding Quartiles

Order the data

Split into two halves at the median
• When n is odd, include the median in both halves
• When n is even, do not include the median in either half

Q1 = median of the lower half

Q3 = median of the upper half

Example – Frank Thomas

Order the values (15 values)

4 7 15 18 24 28 29 32 35 38 40 40 41 42 43

Lower Half = 4 7 15 18 24 28 29 32

Q1 = Median of lower half = 21 HRs

Upper Half = 32 35 38 40 40 41 42 43

Q3 = Median of upper half = 40 HRs

IQR = Q3 – Q1 = 40 – 21 = 19 HRs

Example – Ryne Sandberg

Order the values (16 values)

0 5 7 8 9 12 14 16 19 19 25 26 26 26 30 40

Lower Half = 0 5 7 8 9 12 14 16

Q1 = Median of lower half = 8.5 HRs

Upper Half = 19 19 25 26 26 26 30 40

Q3 = Median of upper half = 26 HRs

IQR = Q3 – Q1 = 26 – 8.5 = 17.5 HRs
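
The quartile rule used in these two examples (split at the median, and include the median in both halves when n is odd) can be sketched as follows. The function names are illustrative. Statistical software often uses other quartile conventions, so library functions may return slightly different values for the same data.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower = ordered[:half + 1]   # odd n: the median belongs to both halves
        upper = ordered[half:]
    else:
        lower = ordered[:half]       # even n: the halves do not overlap
        upper = ordered[half:]
    return median(lower), median(upper)


thomas = [4, 7, 15, 18, 24, 28, 29, 32, 35, 38, 40, 40, 41, 42, 43]
sandberg = [0, 5, 7, 8, 9, 12, 14, 16, 19, 19, 25, 26, 26, 26, 30, 40]

for name, data in (("Thomas", thomas), ("Sandberg", sandberg)):
    q1, q3 = quartiles(data)
    print(f"{name}: Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
# Thomas: Q1 = 21.0, Q3 = 40.0, IQR = 19.0
# Sandberg: Q1 = 8.5, Q3 = 26.0, IQR = 17.5
```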

Five Number Summary

• Minimum
• Q1
• Median
• Q3
• Maximum

Examples

Thomas
• Min = 4 HRs
• Q1 = 21 HRs
• Median = 32 HRs
• Q3 = 40 HRs
• Max = 43 HRs

Sandberg
• Min = 0 HRs
• Q1 = 8.5 HRs
• Median = 17.5 HRs
• Q3 = 26 HRs
• Max = 40 HRs

Graph of Five Number Summary

Boxplot
• Box between Q1 and Q3
• Line in the box marks the median
• Lines extend out to minimum and maximum

Best used for comparisons

Use this simpler method

Example – Thomas & Sandberg

Boxplot of Thomas Home Runs
• Box from 21 to 40
• Line in box at 32
• Lines extend out from box to 4 and 43

Boxplot of Sandberg Home Runs
• Box from 8.5 to 26
• Line in box at 17.5
• Lines extend out from box to 0 and 40

Side by Side Boxplots of Thomas & Sandberg Home Runs

Spread

Standard deviation
• “Average” spread from mean
• Most common measure of spread (although it is influenced by skewness and outliers)
• Denoted by letter s
• Make a table when calculating by hand

Standard Deviation

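$$s = \sqrt{\frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}} = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} = \sqrt{\frac{1}{n - 1}\sum (y - \bar{y})^2}$$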

Example – Deaths from Tornadoes

y      y - ȳ                    (y - ȳ)²
53     53 - 56.27 = -3.27        10.69
39     39 - 56.27 = -17.27       298.25
39     39 - 56.27 = -17.27       298.25
33     33 - 56.27 = -23.27       541.49
69     69 - 56.27 = 12.73        162.05
30     30 - 56.27 = -26.27       690.11
25     25 - 56.27 = -31.27       977.81
67     67 - 56.27 = 10.73        115.13
130    130 - 56.27 = 73.73       5436.11
94     94 - 56.27 = 37.73        1423.55
40     40 - 56.27 = -16.27       264.71

$$s = \sqrt{\frac{10.69 + 298.25 + \cdots + 264.71}{11 - 1}} = 31.97 \text{ deaths}$$
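
The hand table can be reproduced with a short sketch that prints one row per observation and then applies the n - 1 formula. The variable names are illustrative. The code keeps the mean unrounded, so individual squared deviations may differ in the last digit from a table built with the rounded mean 56.27.

```python
from math import sqrt

deaths = [53, 39, 39, 33, 69, 30, 25, 67, 130, 94, 40]

n = len(deaths)
y_bar = sum(deaths) / n                          # mean of the observations

# One table row per observation: y, deviation, squared deviation
squared_devs = []
for y in deaths:
    dev = y - y_bar
    squared_devs.append(dev ** 2)
    print(f"{y:>4} {dev:>8.2f} {dev ** 2:>9.2f}")

# Sample standard deviation: divide by n - 1, then take the square root
s = sqrt(sum(squared_devs) / (n - 1))
print(f"mean = {y_bar:.2f}, s = {s:.2f}")        # mean = 56.27, s = 31.97
```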

Example – Frank Thomas

Find the standard deviation of the number of home runs given the following statistic:

$$\sum (y - \bar{y})^2 = 2329.74$$

$$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} = \sqrt{\frac{2329.74}{15 - 1}} = 12.9 \text{ HRs}$$

Properties of s

s = 0 only when all observations are equal; otherwise, s > 0

s has the same units as the data

s is not resistant
• Skewness and outliers affect s, just as they affect the mean

Tornado Example:
• s with outlier: 31.97 deaths
• s without outlier: 21.70 deaths

Which summaries should you use?

What numbers are affected by outliers?
• Mean
• Standard deviation
• Range

What numbers are not affected by outliers?
• Median
• IQR
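
One way to see the difference is to compute all five summaries for the tornado data with and without the 130 value. This is a sketch using Python's standard `statistics` module. The helpers `iqr` and `summarize` are hypothetical names, and `iqr` assumes the quartile rule from the Finding Quartiles slide (the median goes in both halves when n is odd).

```python
from statistics import mean, median, stdev


def iqr(values):
    """IQR = Q3 - Q1, with quartiles taken as medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half + (n % 2)]   # odd n: include the median
    upper = ordered[half:]
    return median(upper) - median(lower)


def summarize(values):
    return {
        "mean": mean(values),
        "sd": stdev(values),           # sample standard deviation (n - 1)
        "range": max(values) - min(values),
        "median": median(values),
        "iqr": iqr(values),
    }


deaths = [53, 39, 39, 33, 69, 30, 25, 67, 130, 94, 40]
with_outlier = summarize(deaths)
without_outlier = summarize([d for d in deaths if d != 130])

for name in with_outlier:
    print(f"{name:>6}: {with_outlier[name]:7.2f} -> {without_outlier[name]:7.2f}")
```

The mean, standard deviation, and range all shift substantially when the outlier is removed, while the median and IQR move only slightly. "Not affected" in practice means resistant, not literally unchanged.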

Which summaries should you use?

Five Number Summary
• Skewed Data
• Data with outliers

Mean and Standard Deviation
• Symmetric Data

ALWAYS PLOT YOUR DATA!!