40
© 2002 Thomson / South-Western Slide 3-1 Chapter 3 Descriptive Statistics

© 2002 Thomson / South-Western Slide 3-1 Chapter 3 Descriptive Statistics

Embed Size (px)

Citation preview

© 2002 Thomson / South-Western Slide 3-1

Chapter 3

Descriptive Statistics

© 2002 Thomson / South-Western Slide 3-2

Learning ObjectivesLearning Objectives

• Distinguish between measures of central tendency, measures of variability, and measures of shape

• Understand the meanings of mean, median, mode, quartile, percentile, and range

• Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation

© 2002 Thomson / South-Western Slide 3-3

Learning Objectives -- ContinuedLearning Objectives -- Continued

• Differentiate between sample and population variance and standard deviation

• Understand the meaning of standard deviation as it is applied by using the empirical rule

• Understand box and whisker plots, skewness, and kurtosis

© 2002 Thomson / South-Western Slide 3-4

Measures of Central TendencyMeasures of Central Tendency

• Measures of central tendency yield information about “particular places or locations in a group of numbers.”

• Common Measures of Location–Mode–Median–Mean–Percentiles–Quartiles

© 2002 Thomson / South-Western Slide 3-5

ModeMode

• The most frequently occurring value in a data set

• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)

• Bimodal -- Data sets that have two modes

• Multimodal -- Data sets that contain more than two modes

© 2002 Thomson / South-Western Slide 3-6

• The mode is 44.• There are more 44s

than any other value.

35

37

37

39

40

40

41

41

43

43

43

43

44

44

44

44

44

45

45

46

46

46

46

48

Mode -- ExampleMode -- Example

© 2002 Thomson / South-Western Slide 3-7

Median (ΔΙΑΜΕΣΟΣ)Median (ΔΙΑΜΕΣΟΣ)

• Middle value in an ordered array of numbers.

• Applicable for ordinal, interval, and ratio data

• Not applicable for nominal data• Unaffected by extremely large and

extremely small values.

© 2002 Thomson / South-Western Slide 3-8

Median: Computational ProcedureMedian: Computational Procedure

• First Procedure– Arrange observations in an ordered array.– If number of terms is odd, the median is

the middle term of the ordered array.– If number of terms is even, the median is

the average of the middle two terms.

• Second Procedure– The median’s position in an ordered array

is given by (n+1)/2.

© 2002 Thomson / South-Western Slide 3-9

Median: Example with an Odd Number of TermsMedian: Example with

an Odd Number of TermsOrdered Array includes: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22

• There are 17 terms in the ordered array.• Position of median = (n+1)/2 = (17+1)/2 = 9• The median is the 9th term, 15.• If the 22 is replaced by 100, the median

remains at 15.• If the 3 is replaced by -103, the median

remains at 15.

© 2002 Thomson / South-Western Slide 3-10

Mean (ΜΕΣΟΣ)Mean (ΜΕΣΟΣ)

• Is the average of a group of numbers• Applicable for interval and ratio data,

not applicable for nominal or ordinal data

• Affected by each value in the data set, including extreme values

• Computed by summing all values in the data set and dividing the sum by the number of values in the data set

© 2002 Thomson / South-Western Slide 3-11

Population MeanPopulation Mean

X

N NX X X X N1 2 3

24 13 19 26 11

593

518 6

...

.

© 2002 Thomson / South-Western Slide 3-12

Sample MeanSample Mean

XX

n nX X X X n

1 2 3

57 86 42 38 90 66

6379

663 167

...

.

© 2002 Thomson / South-Western Slide 3-13

QuartilesQuartiles

Measures of central tendency that divide a group of data into four subgroups

• Q1: 25% of the data set is below the first quartile

• Q2: 50% of the data set is below the second quartile

• Q3: 75% of the data set is below the third quartile

© 2002 Thomson / South-Western Slide 3-14

Quartiles, continued

• Q1 is equal to the 25th percentile

• Q2 is located at 50th percentile and equals the median

• Q3 is equal to the 75th percentile

Quartile values are not necessarily members of the data set

© 2002 Thomson / South-Western Slide 3-15

QuartilesQuartiles

25% 25% 25% 25%

Q 3Q 2Q 1

© 2002 Thomson / South-Western Slide 3-16

• Ordered array: 106, 109, 114, 116, 121, 122, 125, 129

• Q1:• Q2:• Q3:

Quartiles: ExampleQuartiles: Example

i Q

25

1008 2

109 114

2111 51( ) .

i Q

50

1008 4

116 121

2118 52( ) .

i Q

75

1008 6

122 125

2123 53( ) .

© 2002 Thomson / South-Western Slide 3-17

Measures of Variability Measures of Variability • Measures of variability describe the spread

or the dispersion of a set of data.• Common Measures of Variability

–Range–Interquartile Range–Mean Absolute Deviation–Variance–Standard Deviation– Z scores–Coefficient of Variation

© 2002 Thomson / South-Western Slide 3-18

VariabilityVariability

Mean

Mean

MeanNo Variability in Cash Flow

Variability in Cash Flow Mean

© 2002 Thomson / South-Western Slide 3-19

VariabilityVariability

No Variability

Variability

© 2002 Thomson / South-Western Slide 3-20

RangeRange• The difference between the largest and

the smallest values in a set of data• Simple to compute• Ignores all data points

except the two extremes

• Example: Range = Largest - Smallest

= 48 - 35 = 13

35

37

37

39

40

40

41

41

43

43

43

43

44

44

44

44

44

45

45

46

46

46

46

48

© 2002 Thomson / South-Western Slide 3-21

Interquartile RangeInterquartile Range

• Range of values between the first and third quartiles

• Range of the “middle half”• Less influenced by extremes

Interquartile Range Q Q 3 1

© 2002 Thomson / South-Western Slide 3-22

Deviation from the MeanDeviation from the Mean

• Data set: 5, 9, 16, 17, 18• Mean:

• Deviations from the mean: -8, -4, 3, 4, 5

X

N

65

513

0 5 10 15 20

-8 -4 +3 +4+5

© 2002 Thomson / South-Western Slide 3-23

Mean Absolute DeviationMean Absolute Deviation

• Average of the absolute deviations from the mean

59

161718

-8-4+3+4+5

0

+8+4+3+4+524

X X X M A D

X

N. . .

.

24

54 8

© 2002 Thomson / South-Western Slide 3-24

Population VariancePopulation Variance

• Average of the squared deviations from the arithmetic mean

59

161718

-8-4+3+4+5

0

6416

916

25130

X X 2

X 2

2

130

526 0

XN

.

© 2002 Thomson / South-Western Slide 3-25

Population Standard DeviationPopulation Standard Deviation• Square root of the

variance

2

2

2

130

526 0

26 0

51

XN

.

.

.

59

161718

-8-4+3+4+5

0

6416

916

25130

X X 2

X

© 2002 Thomson / South-Western Slide 3-26

Empirical RuleEmpirical Rule

• Data are normally distributed (or approximately normal)

1 2 3

95

99.7

68

Distance from the Mean

Percentage of ValuesFalling Within Distance

© 2002 Thomson / South-Western Slide 3-27

Sample VarianceSample Variance

• Average of the squared deviations from the arithmetic mean

2,3981,8441,5391,3117,092

62571

-234-462

0

390,6255,041

54,756213,444663,866

X X X 2

X X 2

2

1663 866

3221 288 67

SX Xn

,

, .

© 2002 Thomson / South-Western Slide 3-28

Sample Standard DeviationSample Standard Deviation

• Square root of the sample variance 2

2

2

1663 866

3221 288 67

221 288 67

470 41

SX X

S

n

S

,

, .

, .

.

2,3981,8441,5391,3117,092

62571

-234-462

0

390,6255,041

54,756213,444663,866

X X X 2

X X

© 2002 Thomson / South-Western Slide 3-29

Coefficient of VariationCoefficient of Variation

• Ratio of the standard deviation to the mean, expressed as a percentage

• Measurement of relative dispersion

C V. .

100

© 2002 Thomson / South-Western Slide 3-30

Coefficient of VariationCoefficient of Variation

1 29

4 6

100

4 6

29100

1586

1

11

1

.

.

.

. .CV

2 84

10

100

10

84100

1190

2

22

2

CV. .

.

© 2002 Thomson / South-Western Slide 3-31

Measures of ShapeMeasures of Shape

• Skewness– Absence of symmetry– Extreme values in one side of a

distribution• Kurtosis

– Peakedness of a distribution• Box and Whisker Plots

– Graphic display of a distribution– Reveals skewness

© 2002 Thomson / South-Western Slide 3-32

SkewnessSkewness

NegativelySkewed

PositivelySkewed

Symmetric(Not Skewed)

© 2002 Thomson / South-Western Slide 3-33

SkewnessSkewness

NegativelySkewed

Mode

Median

Mean

Symmetric(Not Skewed)

MeanMedianMode

PositivelySkewed

Mode

Median

Mean

© 2002 Thomson / South-Western Slide 3-34

Coefficient of SkewnessCoefficient of Skewness• Summary measure for skewness

• If S < 0, the distribution is negatively skewed (skewed to the left).

• If S = 0, the distribution is symmetric (not skewed).

• If S > 0, the distribution is positively skewed (skewed to the right).

S

M d

3

© 2002 Thomson / South-Western Slide 3-35

Coefficient of SkewnessCoefficient of Skewness

1

1

1

1

1 1

1

23

26

12 3

3

3 23 26

12 3073

M

SM

d

d

.

..

2

2

2

2

2 2

2

26

26

12 3

3

3 26 26

12 30

M

SM

d

d

.

.

3

3

3

3

3 3

3

29

26

12 3

3

3 29 26

12 3073

M

SM

d

d

.

..

© 2002 Thomson / South-Western Slide 3-36

KurtosisKurtosis• Peakedness of a distribution

– Leptokurtic: high and thin– Mesokurtic: normal in shape– Platykurtic: flat and spread out

Leptokurtic

MesokurticPlatykurtic

© 2002 Thomson / South-Western Slide 3-37

Box and Whisker PlotBox and Whisker Plot

• Five specific values are used:– Median, Q2

– First quartile, Q1

– Third quartile, Q3

– Minimum value in the data set– Maximum value in the data set

© 2002 Thomson / South-Western Slide 3-38

Box and Whisker Plot, continued

• Inner Fences– IQR = Q3 - Q1

– Lower inner fence = Q1 - 1.5 IQR– Upper inner fence = Q3 + 1.5 IQR

• Outer Fences– Lower outer fence = Q1 - 3.0 IQR– Upper outer fence = Q3 + 3.0 IQR

© 2002 Thomson / South-Western Slide 3-39

Box and Whisker PlotBox and Whisker Plot

Q1 Q3Q2Minimum Maximum

© 2002 Thomson / South-Western Slide 3-40

Skewness: Box and Whisker Plots, and Coefficient of Skewness

Skewness: Box and Whisker Plots, and Coefficient of Skewness

NegativelySkewed

PositivelySkewed

Symmetric(Not Skewed)

S < 0 S = 0 S > 0