1 Descriptive Statistics Frequency Tables Visual Displays Measures of Center

1

Descriptive Statistics

Frequency Tables

Visual Displays

Measures of Center

2


Overview

Descriptive Statistics

summarize or describe the important characteristics of a known set of population data

Inferential Statistics

use sample data to make inferences (or generalizations) about a population

3

Important Characteristics of Data

1. Center: A representative or average value that indicates where the middle of the data set is located

2. Variation: A measure of the amount that the values vary among themselves

3. Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed)

4. Outliers: Sample values that lie very far away from the vast majority of other sample values

5. Time: Changing characteristics of the data over time

4

Summarizing Data With Frequency Tables

Frequency Table

lists classes (or categories) of values, along with frequencies (or counts) of the number of values that fall into each class

5

Table 2-1: Qwerty Keyboard Word Ratings

2 2 5 1 2 6 3 3 4 2

4 0 5 7 7 5 6 6 8 10

7 2 2 10 5 8 2 5 4 2

6 2 6 1 7 2 7 2 3 8

1 5 2 5 2 14 2 2 6 3

1 7

6

Table 2-3: Frequency Table of Qwerty Word Ratings

Rating     Frequency
0 - 2      20
3 - 5      14
6 - 8      15
9 - 11     2
12 - 14    1

7

Frequency Table

Definitions

8

Lower Class Limits

are the smallest numbers that can actually belong to different classes

In Table 2-3 the lower class limits are 0, 3, 6, 9, and 12.

9

Upper Class Limits

are the largest numbers that can actually belong to different classes

In Table 2-3 the upper class limits are 2, 5, 8, 11, and 14.

10

Class Boundaries

are the numbers that separate classes

In Table 2-3 the class boundaries are -0.5, 2.5, 5.5, 8.5, 11.5, and 14.5.

11

Class Midpoints

are the midpoints of the classes

In Table 2-3 the class midpoints are 1, 4, 7, 10, and 13.

12

Class Width

is the difference between two consecutive lower class limits or two consecutive class boundaries

In Table 2-3 every class has a width of 3.
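The definitions above can be checked with a short sketch. It assumes the class limits of Table 2-3 and integer-valued ratings; the variable names are illustrative and do not come from the slides.

```python
# Class limits taken from Table 2-3 (Qwerty word ratings).
lower_limits = [0, 3, 6, 9, 12]
upper_limits = [2, 5, 8, 11, 14]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoints: average of each class's lower and upper limit.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: halfway across the gap between one class's upper limit
# and the next class's lower limit, extended by the same half-gap at the ends.
half_gap = (lower_limits[1] - upper_limits[0]) / 2
boundaries = [lower_limits[0] - half_gap] + [hi + half_gap for hi in upper_limits]

print(class_width)
print(midpoints)
print(boundaries)
```

The width could equally be taken from two consecutive boundaries (2.5 and 5.5), which gives the same value.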

13

Guidelines For Frequency Tables

1. Be sure that the classes are mutually exclusive.

2. Include all classes, even if the frequency is zero.

3. Try to use the same width for all classes.

4. Select convenient numbers for class limits.

5. Use between 5 and 20 classes.

6. The sum of the class frequencies must equal the number of original data values.

14

Constructing A Frequency Table

1. Decide on the number of classes.

2. Determine the class width by dividing the range by the number of classes (range = highest score - lowest score) and round up.

$\text{class width} \approx \text{round up of } \frac{\text{range}}{\text{number of classes}}$

3. Select for the first lower limit either the lowest score or a convenient value slightly less than the lowest score.

4. Add the class width to the starting point to get the second lower class limit, add the width to the second lower limit to get the third, and so on.

5. List the lower class limits in a vertical column and enter the upper class limits.

6. Represent each score by a tally mark in the appropriate class. Total tally marks to find the total frequency for each class.
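The construction steps can be sketched in Python using the 52 Qwerty ratings of Table 2-1. This is a minimal illustration, not the textbook's procedure verbatim. The function name `build_frequency_table` is hypothetical, and the sketch assumes integer data, starts at the lowest score, and assumes the rounded-up width is large enough for the highest score to fall in the last class.

```python
import math

# The 52 Qwerty keyboard word ratings from Table 2-1.
ratings = [
    2, 2, 5, 1, 2, 6, 3, 3, 4, 2,
    4, 0, 5, 7, 7, 5, 6, 6, 8, 10,
    7, 2, 2, 10, 5, 8, 2, 5, 4, 2,
    6, 2, 6, 1, 7, 2, 7, 2, 3, 8,
    1, 5, 2, 5, 2, 14, 2, 2, 6, 3,
    1, 7,
]

def build_frequency_table(values, num_classes):
    """Return (lower limit, upper limit, frequency) for each class."""
    lowest, highest = min(values), max(values)
    # Step 2: class width = range / number of classes, rounded up.
    width = math.ceil((highest - lowest) / num_classes)
    table = []
    for i in range(num_classes):
        # Steps 3-4: start at the lowest score and keep adding the width.
        lower = lowest + i * width
        # Step 5: for integer data the upper limit is one less than the
        # next lower limit.
        upper = lower + width - 1
        # Step 6: tally the values that fall into this class.
        frequency = sum(1 for v in values if lower <= v <= upper)
        table.append((lower, upper, frequency))
    return table

for lower, upper, frequency in build_frequency_table(ratings, 5):
    print(f"{lower} - {upper}: {frequency}")
```

With 5 classes the range of 14 gives a width of 14/5 = 2.8, rounded up to 3, which reproduces the classes and frequencies of Table 2-3. When the range divides evenly by the number of classes, a slightly larger width or an extra class may be needed so that the highest score is still covered.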

15

Inputting a Frequency Table into the TI83 Calculator

• Push Stat

• Select Edit

• Input Class Marks in List 1

• Input Frequencies in List 2

16

Relative Frequency Table

$\text{relative frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}}$

17

Relative Frequency Table

Total frequency = 52

Rating     Frequency    Relative Frequency
0 - 2      20           20/52 = 38.5%
3 - 5      14           14/52 = 26.9%
6 - 8      15           15/52 = 28.8%
9 - 11     2            2/52 = 3.8%
12 - 14    1            1/52 = 1.9%

18

Cumulative Frequency Table

Rating     Frequency
0 - 2      20
3 - 5      14
6 - 8      15
9 - 11     2
12 - 14    1

Rating          Cumulative Frequency
Less than 3     20
Less than 6     34
Less than 9     49
Less than 12    51
Less than 15    52
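Both derived tables follow mechanically from the class frequencies. The sketch below assumes the frequencies of Table 2-3; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies from Table 2-3.
frequencies = [20, 14, 15, 2, 1]
total = sum(frequencies)

# Relative frequency = class frequency / sum of all frequencies,
# expressed here as a percentage rounded to one decimal place.
relative = [round(100 * f / total, 1) for f in frequencies]

# Cumulative frequency = running total of the class frequencies.
cumulative = list(accumulate(frequencies))

print(total)
print(relative)
print(cumulative)
```

The last cumulative frequency always equals the total number of data values, which is a quick check on the table.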

19

Frequency Tables

Rating     Frequency    Relative Frequency    Cumulative Frequency
0 - 2      20           38.5%                 20 (less than 3)
3 - 5      14           26.9%                 34 (less than 6)
6 - 8      15           28.8%                 49 (less than 9)
9 - 11     2            3.8%                  51 (less than 12)
12 - 14    1            1.9%                  52 (less than 15)

20

Pictures of Data

depict the nature or shape of the data distribution

Histogram

a bar graph in which the horizontal scale represents classes and the vertical scale represents frequencies

21

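Histogram of Qwerty Word Ratings

[Figure: histogram with rating classes on the horizontal scale and frequencies on the vertical scale]

Rating     Frequency
0 - 2      20
3 - 5      14
6 - 8      15
9 - 11     2
12 - 14    1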

22

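Relative Frequency Histogram of Qwerty Word Ratings

[Figure: histogram with rating classes on the horizontal scale and relative frequencies on the vertical scale]

Rating     Relative Frequency
0 - 2      38.5%
3 - 5      26.9%
6 - 8      28.8%
9 - 11     3.8%
12 - 14    1.9%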

23

Histogram and

Relative Frequency Histogram

24

Viewing a Histogram Using the TI83 Calculator

• First input class marks and frequencies in L1 & L2

• 2nd y= (Statplot)

• Select 1

• Select “on”

• Select histogram type (top row, 3rd graph)

• X list = 2nd 1 (L1)

• Freq = 2nd 2 (L2)

• Zoom

• 9 (ZoomStat)

25

Frequency Polygon

26

Viewing a Frequency Polygon Using the TI83 Calculator

• First input class marks and frequencies in L1 & L2

• 2nd y= (Statplot)



• Select 1

• Select “on”

• Select polygon type (top row, 2nd graph)

• X list = 2nd 1 (L1)

• Y list = 2nd 2 (L2)

• Zoom

• 9 (ZoomStat)

27

Ogive

28

Dot Plot

29

Stem-and-Leaf Plot

Raw Data (Test Grades)

67 72 85 75 89

89 88 90 99 100

Stem    Leaves
6       7
7       2 5
8       5 8 9 9
9       0 9
10      0
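A stem-and-leaf plot splits each value into a stem (here the tens) and a leaf (the units digit). The sketch below rebuilds the plot from the ten test grades; the function name `stem_and_leaf` is illustrative, and splitting at the tens digit is an assumption that suits two- and three-digit grades.

```python
from collections import defaultdict

# Raw test grades from the slide.
grades = [67, 72, 85, 75, 89, 89, 88, 90, 99, 100]

def stem_and_leaf(values):
    """Group sorted values by stem (tens) and collect their leaves (units)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(grades)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>2} | {leaves}")
```

Because the leaves keep every digit, the original data can be read back from the plot, which a histogram does not allow.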

30

Pareto Chart

Accidental Deaths by Type

[Figure: Pareto chart; vertical axis: Frequency, from 0 to 45,000; bars in descending order: Motor Vehicle, Falls, Poison, Drowning, Fire, Ingestion of food or object, Firearms]

31

Pie Chart

Accidental Deaths by Type

Type                           Deaths    Percent
Motor vehicle                  43,500    57.8%
Falls                          12,200    16.2%
Poison                         6,400     8.5%
Drowning                       4,600     6.1%
Fire                           4,200     5.6%
Ingestion of food or object    2,900     3.9%
Firearms                       1,400     1.9%

32

Scatter Diagram

[Figure: scatter diagram; horizontal axis: NICOTINE, from 0.0 to 1.5; vertical axis: TAR, from 0 to 20]

33

Other Graphs

Boxplots (textbook section 2-7)

Pictographs

Pattern of data over time

34

Measures of Center

a value at the center or middle of a data set

35

Definitions

Mean (Arithmetic Mean)

AVERAGE

the number obtained by adding the values and dividing the total by the number of values

36

Notation

Σ denotes the addition of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

37

Notation

$\bar{x}$ is pronounced ‘x-bar’ and denotes the mean of a set of sample values

$\bar{x} = \frac{\sum x}{n}$

µ is pronounced ‘mu’ and denotes the mean of all values in a population

$\mu = \frac{\sum x}{N}$

Calculators can calculate the mean of data

38

Finding a Mean Using the TI83 Calculator

• Push Stat

• Select Edit

• Enter data into L1

• 2nd mode (quit)

• Push Stat

• >Calc

• Select 1 (1-VarStats)

• 2nd 1 (L1)

• The first output item is the mean

39

Definitions

Median

the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude

often denoted by $\tilde{x}$ (pronounced ‘x-tilde’)

is not affected by an extreme value

40

6.72 3.46 3.60 6.44 26.70

3.46 3.60 6.44 6.72 26.70 (in order - odd number of values)

exact middle: MEDIAN is 6.44

41

6.72 3.46 3.60 6.44

3.46 3.60 6.44 6.72 (in order - even number of values)

no exact middle: shared by two numbers

$\frac{3.60 + 6.44}{2} = 5.02$

MEDIAN is 5.02
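Both median cases can be reproduced with a small function. This is a minimal sketch with an illustrative function name; Python's standard `statistics.median` applies the same rule and is used here only as a cross-check.

```python
import statistics

odd_values = [6.72, 3.46, 3.60, 6.44, 26.70]
even_values = [6.72, 3.46, 3.60, 6.44]

def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the pair

print(median(odd_values))
print(round(median(even_values), 2))

# The extreme value 26.70 pulls the mean well above the median.
print(round(statistics.mean(odd_values), 3))
```

Replacing 26.70 with any larger number leaves the median of the first data set unchanged, while the mean keeps growing.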

42

Finding a Median Using the TI83 Calculator

• Push Stat

• Select Edit

• Enter data into L1

• 2nd mode (quit)

• Push Stat

• >Calc

• Select 1 (1-VarStats)


• 2nd 1 (L1)

• Arrow down to “Med” to see the value for median

43

Definitions

Mode

the score that occurs most frequently

Bimodal

Multimodal

No Mode

denoted by M

the only measure of central tendency that can be used with nominal data

44

Examples

a. 5 5 5 3 1 5 1 4 3 5          Mode is 5

b. 1 2 2 2 3 4 5 6 6 6 7 9      Bimodal: 2 and 6

c. 1 2 3 6 7 8 9 10             No Mode
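The three examples can be checked by counting how often each value occurs. The sketch below uses an illustrative function `modes` and assumes the convention that a data set in which no value repeats has no mode.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value occurs more than once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

example_a = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]
example_b = [1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]
example_c = [1, 2, 3, 6, 7, 8, 9, 10]

print(modes(example_a))
print(modes(example_b))
print(modes(example_c))
```

Returning a list keeps the unimodal, bimodal, and no-mode cases in one result type. Because counting needs no arithmetic, the same function also works on nominal data such as category names.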

45

Definitions

Midrange

the value midway between the highest and lowest values in the original data set

$\text{Midrange} = \frac{\text{highest score} + \text{lowest score}}{2}$

46

Finding a Midrange Using the TI83 Calculator

• Push Stat

• Select Edit

• Enter data into L1

• 2nd mode (quit)

• Push Stat

• >Calc

• Select 1 (1-VarStats)

• 2nd 1 (L1)

• Arrow down to see the “min” and “max” values for the midrange calculation

47

Round-off Rule for Measures of Center

Carry one more decimal place than is present in the original set of values

48

Mean from a Frequency Table

use class midpoint of classes for variable x

$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$

x = class midpoint

f = frequency

$\sum f = n$
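As a worked instance of the formula, the sketch below applies it to the Qwerty frequency table (Table 2-3), using the class midpoints 1, 4, 7, 10, 13 as the x values. The variable names are illustrative.

```python
# Class midpoints and frequencies from Table 2-3.
midpoints = [1, 4, 7, 10, 13]
frequencies = [20, 14, 15, 2, 1]

n = sum(frequencies)                                       # sum of f
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))  # sum of f * x
grouped_mean = sum_fx / n

print(n, sum_fx)
print(round(grouped_mean, 1))
```

The result is an approximation of the mean of the raw ratings, because every value in a class is treated as if it sat at the class midpoint.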

49

Finding a Grouped Mean Using the TI83 Calculator

• Push Stat

• Select Edit

• Enter class marks into L1 and frequencies into L2

• 2nd mode (quit)

• Push Stat

• >Calc

• Select 1 (1-VarStats)


• 2nd 1, 2nd 2 (L1,L2)

• The first output item is the mean

50


Measures of Variability

Measures of Position

51

Definitions

Symmetric

Data is symmetric if the left half of its histogram is roughly a mirror of its right half.

Skewed

Data is skewed if it is not symmetric and if it extends more to one side than the other.

52

Skewness

SKEWED LEFT (negatively): the mean and median typically lie to the left of the mode

SYMMETRIC: Mode = Mean = Median

SKEWED RIGHT (positively): the mean and median typically lie to the right of the mode

53

Waiting Times of Bank Customers at Different Banks (in minutes)

Jefferson Valley Bank    6.5  6.6  6.7  6.8  7.1  7.3  7.4  7.7  7.7  7.7
Bank of Providence       4.2  5.4  5.8  6.2  6.7  7.7  7.7  8.5  9.3  10.0

            Jefferson Valley Bank    Bank of Providence
Mean        7.15                     7.15
Median      7.20                     7.20
Mode        7.7                      7.7
Midrange    7.10                     7.10
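The four measures of center for both banks can be reproduced with Python's standard `statistics` module. The helper name `measures_of_center` is illustrative.

```python
import statistics

jefferson = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

def measures_of_center(values):
    """Mean, median, mode, and midrange of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "midrange": (max(values) + min(values)) / 2,
    }

for name, data in [("Jefferson Valley", jefferson), ("Providence", providence)]:
    centers = measures_of_center(data)
    print(name, {key: round(value, 2) for key, value in centers.items()})
```

Both banks give the same four values even though the Providence times are visibly more spread out, which is why a measure of variation is needed in addition to a measure of center.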

54

Dotplots of Waiting Times

55

Measures of Variation

Range

$\text{range} = \text{highest value} - \text{lowest value}$

56

Standard Deviation

a measure of variation of the scores about the mean

(average deviation from the mean)

57

Sample Standard Deviation Formula

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

calculators can compute the sample standard deviation of data

58

Sample Standard Deviation Shortcut Formula

$s = \sqrt{\frac{n \sum (x^2) - (\sum x)^2}{n (n - 1)}}$

calculators can compute the sample standard deviation of data
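The defining formula and the shortcut formula give the same result, which the sketch below checks on the bank waiting times (Jefferson Valley Bank and Bank of Providence). The function names are illustrative.

```python
import math

jefferson = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

def sample_sd(values):
    """Defining formula: square root of sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def sample_sd_shortcut(values):
    """Shortcut formula: needs only n, sum(x), and sum(x^2)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

for name, data in [("Jefferson Valley", jefferson), ("Providence", providence)]:
    print(name, round(sample_sd(data), 2), round(sample_sd_shortcut(data), 2))
```

The shortcut avoids computing the mean first, which suits hand calculation. In floating-point arithmetic it can lose precision when the values are large relative to their spread, so the defining formula is the safer choice in code.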

59

Population Standard Deviation

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

calculators can compute the population standard deviation of data

60

Symbols for Standard Deviation

                                 Sample    Population
Textbook                         s         σ
Some graphics calculators        Sx        σx
Some non-graphics calculators    xσn-1     xσn

Articles in professional journals and reports often use SD for standard deviation and VAR for variance.

61

Finding a Standard Deviation Using the TI83 Calculator

• Push Stat

• Select Edit

• Enter data into L1

• 2nd mode (quit)

• Push Stat

• >Calc

• Select 1 (1-VarStats)

• 2nd 1 (L1)

• The 4th output item is the sample s.d. and the 5th output item is the population s.d.

62


Variance

standard deviation squared

Notation

$s^2$ (sample variance) and $\sigma^2$ (population variance): use the square key on the calculator

63

Variance

Sample Variance

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Population Variance

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

64

Round-off Rule for measures of variation

Carry one more decimal place than is present in the original set of values.

Round only the final answer, never in the middle of a calculation.

65

Standard Deviation from a Frequency Table

Use the class midpoints as the x values

$s = \sqrt{\frac{n [\sum (f \cdot x^2)] - [\sum (f \cdot x)]^2}{n (n - 1)}}$

Calculators can compute the standard deviation for frequency table
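As a worked instance, the sketch below applies the grouped formula to the Qwerty frequency table (Table 2-3), with the class midpoints 1, 4, 7, 10, 13 standing in for the x values. The variable names are illustrative.

```python
import math

# Class midpoints and frequencies from Table 2-3.
midpoints = [1, 4, 7, 10, 13]
frequencies = [20, 14, 15, 2, 1]

n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))       # sum of f * x
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))  # sum of f * x^2

grouped_sd = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(n, sum_fx, sum_fx2)
print(round(grouped_sd, 2))
```

The formula is the shortcut formula for the sample standard deviation applied to a data set in which each midpoint is repeated f times, so it approximates the standard deviation of the raw ratings rather than reproducing it exactly.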

66

Finding a Grouped Standard Deviation Using the TI83 Calculator

• Push Stat

• Select Edit

• Enter class marks into L1 and frequencies into L2

• 2nd mode (quit)

• Push Stat

• >Calc

• Select 1 (1-VarStats)

• 2nd 1, 2nd 2 (L1, L2)

• The 4th output item is the sample s.d. and the 5th output item is the population s.d.

67

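Estimation of Standard Deviation

Range Rule of Thumb

$\text{Range} \approx 4s$

or

$s \approx \frac{\text{Range}}{4} = \frac{\text{highest value} - \text{lowest value}}{4}$

[Figure: number line marking $\bar{x} - 2s$ (minimum usual value), $\bar{x}$, and $\bar{x} + 2s$ (maximum usual value)]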

68

Usual Sample Values

minimum ‘usual’ value ≈ (mean) - 2 (standard deviation)

$\text{minimum} \approx \bar{x} - 2s$

maximum ‘usual’ value ≈ (mean) + 2 (standard deviation)

$\text{maximum} \approx \bar{x} + 2s$
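The range rule of thumb and the limits for usual values are simple enough to sketch directly. The example uses the Bank of Providence waiting times; the function names are illustrative.

```python
import statistics

providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

def estimate_sd_from_range(values):
    """Range rule of thumb: s is roughly (highest - lowest) / 4."""
    return (max(values) - min(values)) / 4

def usual_value_limits(mean, sd):
    """Minimum and maximum 'usual' values: mean - 2s and mean + 2s."""
    return mean - 2 * sd, mean + 2 * sd

rough_sd = estimate_sd_from_range(providence)
exact_sd = statistics.stdev(providence)
low, high = usual_value_limits(statistics.mean(providence), exact_sd)

print(round(rough_sd, 2), round(exact_sd, 2))
print(round(low, 2), round(high, 2))
```

The rule gives only a rough estimate (about 1.45 here against a computed sample standard deviation of about 1.82), so it is best treated as a sanity check rather than a replacement for the formula.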

69

The Empirical Rule (applies to bell-shaped distributions)

68% of data are within 1 standard deviation of the mean (between $\bar{x} - s$ and $\bar{x} + s$)

95% of data are within 2 standard deviations of the mean (between $\bar{x} - 2s$ and $\bar{x} + 2s$)

99.7% of data are within 3 standard deviations of the mean (between $\bar{x} - 3s$ and $\bar{x} + 3s$)

Approximate share in each band on either side of the mean: 34% (within 1 standard deviation), 13.5% (between 1 and 2), 2.4% (between 2 and 3), 0.1% (beyond 3)

73

Chebyshev’s Theorem

applies to distributions of any shape.

the proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any positive number greater than 1.

at least 3/4 (75%) of all values lie within 2 standard deviations of the mean.

at least 8/9 (89%) of all values lie within 3 standard deviations of the mean.
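The two quoted bounds follow directly from the formula, as this short sketch shows. The function name `chebyshev_lower_bound` is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("K must be greater than 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, round(100 * chebyshev_lower_bound(k)), "%")
```

These are guaranteed minimums for any distribution, so they are much weaker than the empirical rule's 95% and 99.7%, which hold only for bell-shaped data.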

74

Measures of Variation Summary

For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.

75

z Score (or standard score)

the number of standard deviations that a given value x is above or below the mean


76

Measures of Position

z score

Sample: $z = \frac{x - \bar{x}}{s}$

Population: $z = \frac{x - \mu}{\sigma}$

Round to 2 decimal places
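A z score is a one-line computation once the mean and standard deviation are known. The sketch below uses the sample form on the Bank of Providence waiting times. The function name is illustrative, and the 2-standard-deviation cutoff for "unusual" follows the usual-values rule given earlier in these slides.

```python
import statistics

providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mean = statistics.mean(providence)
sd = statistics.stdev(providence)  # sample standard deviation s

for x in (4.2, 7.7, 10.0):
    z = z_score(x, mean, sd)
    label = "unusual" if abs(z) > 2 else "ordinary"
    print(x, round(z, 2), label)
```

Even the longest wait of 10.0 minutes is less than 2 standard deviations above the mean, so none of the Providence times counts as unusual by this rule.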

77

Interpreting Z Scores

[Figure: z-score scale from -3 to 3; values between -2 and 2 are labeled Ordinary Values, and values below -2 or above 2 are labeled Unusual Values]

78


Quartiles, Deciles, Percentiles

79

Quartiles

Q1, Q2, Q3 divide ranked scores into four equal parts

(minimum) 25% Q1 25% Q2 (median) 25% Q3 25% (maximum)

80

Deciles

D1, D2, D3, D4, D5, D6, D7, D8, D9 divide ranked data into ten equal parts

10% D1 10% D2 10% D3 10% D4 10% D5 10% D6 10% D7 10% D8 10% D9 10%

81

Percentiles

99 percentiles divide ranked data into 100 equal parts

82

Quartiles, Deciles, Percentiles

Fractiles (Quantiles)

partition data into approximately equal parts

83

Quartiles

Q1 = P25

Q2 = P50

Q3 = P75

Deciles

D1 = P10

D2 = P20

D3 = P30

• • •

D9 = P90

84

Viewing Quartiles Using the TI83 Calculator

• Push Stat

• Select Edit

• Enter data into L1

• 2nd mode (quit)

• Push Stat

• >Calc

• Select 1 (1-VarStats)

• 2nd 1 (L1)

• Arrow down to view the values for Q1 and Q3 (the median “med” is Q2)

85

Exploratory Data Analysis

the process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate the data sets in order to understand their important characteristics

86

Outliers

a value located very far away from almost all of the other values

an extreme value

can have a dramatic effect on the mean, standard deviation, and on the scale of the histogram so that the true nature of the distribution is totally obscured

87

Boxplots (Box-and-Whisker Diagram)

Reveals the:

center of the data

spread of the data

distribution of the data

presence of outliers

Excellent for comparing two or more data sets

88

Boxplots

5 - number summary

Minimum

first quartile Q1

Median (Q2)

third quartile Q3

Maximum
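The 5-number summary can be computed with the standard library. The sketch below uses the Qwerty ratings of Table 2-1 and `statistics.quantiles` (Python 3.8 or later) with its default method. Quartile conventions differ between textbooks, calculators, and software, so results may not match on other data sets; for these ratings the neighbouring sorted values at each quartile are equal, so the choice of convention does not matter.

```python
import statistics

# The 52 Qwerty keyboard word ratings from Table 2-1.
ratings = [
    2, 2, 5, 1, 2, 6, 3, 3, 4, 2,
    4, 0, 5, 7, 7, 5, 6, 6, 8, 10,
    7, 2, 2, 10, 5, 8, 2, 5, 4, 2,
    6, 2, 6, 1, 7, 2, 7, 2, 3, 8,
    1, 5, 2, 5, 2, 14, 2, 2, 6, 3,
    1, 7,
]

def five_number_summary(values):
    """Minimum, Q1, median (Q2), Q3, maximum."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

print(five_number_summary(ratings))
```

These five values are exactly what a boxplot draws: the box spans Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.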

89

Boxplots

Boxplot of Qwerty Word Ratings

[Figure: boxplot drawn above a scale from 0 to 14, marking the 5-number summary 0, 2, 4, 6, 14]

90

Viewing Boxplots Using the TI83 Calculator

• First input class marks and frequencies in L1 & L2

• 2nd y= (Statplot)



• Select 1

• Select “on”

• Select boxplot type (bottom row, 2nd graph)

• X list = 2nd 1 (L1)

• Freq = 2nd 2 (L2)

• Zoom

• 9 (ZoomStat)

91

Boxplots

[Figure: example boxplots for Bell-Shaped, Skewed, and Uniform distributions]

92

Exploring

Measures of center: mean, median, and mode

Measures of variation: standard deviation and range

Measures of spread & relative location: minimum value, maximum value, and quartiles

Unusual values: outliers

Distribution: histograms, stem-leaf plots, and boxplots
