Upload
imogene-banks
View
219
Download
0
Tags:
Embed Size (px)
Citation preview
2
Descriptive Statistics
summarize or describe the important characteristics of a known set of population data
Inferential Statistics
use sample data to make inferences (or generalizations) about a population
Overview
3
1. Center: A representative or average value that indicates where the middle of the data set is located
2. Variation: A measure of the amount that the values vary among themselves
3. Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed)
4. Outliers: Sample values that lie very far away from the vast majority of other sample values
5. Time: Changing characteristics of the data over time
Important Characteristics of Data
4
Frequency Table
lists classes (or categories) of values,
along with frequencies (or counts) of
the number of values that fall into each
class
Summarizing Data With Frequency Tables
5
Qwerty Keyboard Word RatingsTable 2-1
2 2 5 1 2 6 3 3 4 2
4 0 5 7 7 5 6 6 8 10
7 2 2 10 5 8 2 5 4 2
6 2 6 1 7 2 7 2 3 8
1 5 2 5 2 14 2 2 6 3
1 7
6
Frequency Table of Qwerty Word RatingsTable 2-3
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
8
Lower Class Limits
Lower ClassLimits
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
are the smallest numbers that can actually belong to different classes
9
Upper Class Limits
Upper ClassLimits
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
are the largest numbers that can actually belong to different classes
10
Class Boundaries
ClassBoundaries
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency- 0.5
2.5
5.5
8.5
11.5
14.5
number separating classes
11
midpoints of the classes
Class Midpoints
ClassMidpoints
0 - 1 2 20
3 - 4 5 14
6 - 7 8 15
9 - 10 11 2
12 - 13 14 1
Rating Frequency
12
Class Width
Class Width
3 0 - 2 20
3 3 - 5 14
3 6 - 8 15
3 9 - 11 2
3 12 - 14 1
Rating Frequency
is the difference between two consecutive lower class limits or two consecutive class boundaries
13
1. Be sure that the classes are mutually exclusive.
2. Include all classes, even if the frequency is zero.
3. Try to use the same width for all classes.
4. Select convenient numbers for class limits.
5. Use between 5 and 20 classes.
6. The sum of the class frequencies must equal the number of original data values.
Guidelines For Frequency Tables
14
3. Select for the first lower limit either the lowest score or a convenient value slightly less than the lowest score.
4. Add the class width to the starting point to get the second lower class limit, add the width to the second lower limit to get the
third, and so on.
5. List the lower class limits in a vertical column and enter the upper class limits.
6. Represent each score by a tally mark in the appropriate class.
Total tally marks to find the total frequency for each class.
Constructing A Frequency Table1. Decide on the number of classes .
2. Determine the class width by dividing the range by the number of classes (range = highest score - lowest score) and round
up.class width round up of
range
number of classes
15
Inputting a Frequency Table into the TI83 Calculator
• Push Stat
• Select Edit
• Input Class Marks in List 1
• Input Frequencies in List 2
17
Relative Frequency Table
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
0 - 2 38.5%
3 - 5 26.9%
6 - 8 28.8%
9 - 11 3.8%
12 - 14 1.9%
RatingRelative
Frequency
20/52 = 38.5%
14/52 = 26.9%
etc.
Total frequency = 52
18
Cumulative Frequency Table
CumulativeFrequencies
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
Less than 3 20
Less than 6 34
Less than 9 49
Less than 12 51
Less than 15 52
RatingCumulativeFrequency
19
Frequency Tables
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
0 - 2 38.5%
3 - 5 26.9%
6 - 8 28.8%
9 - 11 3.8%
12 - 14 1.9%
RatingRelative
Frequency
Less than 3 20
Less than 6 34
Less than 9 49
Less than 12 51
Less than 15 52
RatingCumulative Frequency
20
depict the nature or shape of the data distribution
Histograma bar graph in which the horizontal scale represents classes and the vertical scale
represents frequencies
Pictures of Data
22
Relative Frequency Histogram of Qwerty Word Ratings
0 - 2 38.5%
3 - 5 26.9%
6 - 8 28.8%
9 - 11 3.8%
12 - 14 1.9%
RatingRelative
Frequency
24
Viewing a HistogramUsing the TI83 Calculator
• First input class marks and frequencies in L1 & L2
• 2nd y= (Statplot)
• Select 1
• Select “on”
• Select histogram type (top row, 3rd graph)
• X list = 2nd 1 (L1)
• Freq = 2nd 2 (L2)
• Zoom
• 9 (ZoomStat)
26
Viewing a Frequency PolygonUsing the TI83 Calculator
• First input class marks and frequencies in L1 & L2
• 2nd y= (Statplot)
• Select 1
• Select “on”
• Select polygon type (top row, 2nd graph)
• X list = 2nd 1 (L1)
• Freq = 2nd 2 (L2)
• Zoom
• 9 (ZoomStat)
29
Stem-and Leaf Plot
Raw Data (Test Grades)
67 72 85 75 89
89 88 90 99 100
Stem Leaves
6 7 8 910
72 55 8 9 90 9 0
30
Pareto Chart
5,000
10,000
15,000
20,000
25,000
30,000
35,000
40,000
45,000
0
Mo
tor
Ve
hic
le
Fa
lls
Po
iso
n
Dro
wn
ing
Fir
e
Ing
es
tio
n o
f fo
od
or
ob
jec
t
Fir
ea
rms
Fre
quen
cy
Accidental Deaths by Type
31
Pie Chart
Firearms(1400. 1.9%)
Ingestion of food or object(2900. 3.9%
Fire(4200. 5.6%)
Drowning(4600. 6.1%)
Poison(6400. 8.5%)
Falls(12,200. 16.2%)
Motor vehicle(43,500. 57.8%)
Accidental Deaths by Type
35
Mean
(Arithmetic Mean)
AVERAGE
the number obtained by adding the values and dividing the total by the number of values
Definitions
36
Notation
denotes the addition of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
37
Notation
µ is pronounced ‘mu’ and denotes the mean of all values in a population
is pronounced ‘x-bar’ and denotes the mean of a set of sample values
Calculators can calculate the mean of data
x =n
xx
Nµ =
x
38
Finding a MeanUsing the TI83 Calculator
• Push Stat
• Select Edit
• Enter data into L1
• 2nd mode (quit)
• Push Stat
• >Calc
• Select 1 (1-VarStats)
• 2nd 1 (L1)
• The first output item is the mean
39
Definitions Median
the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude
often denoted by x (pronounced ‘x-tilde’)
is not affected by an extreme value
~
40
6.72 3.46 3.60 6.44 26.70
3.46 3.60 6.44 6.72 26.70
(in order - odd number of values)
exact middle MEDIAN is 6.44
41
6.72 3.46 3.60 6.44
3.46 3.60 6.44 6.72 no exact middle -- shared by two numbers
3.60 + 6.44
2
(even number of values)
MEDIAN is 5.02
42
Finding a MedianUsing the TI83 Calculator
• Push Stat
• Select Edit
• Enter data into L1
• 2nd mode (quit)
• Push Stat
• >Calc
• Select 1 (1-VarStats)
• 2nd 1 (L1)
• Arrow down to “Med” to see the value for median
43
Definitions Mode
the score that occurs most frequently
BimodalMultimodalNo Mode
denoted by M
the only measure of central tendency that can be used with nominal data
44
a. 5 5 5 3 1 5 1 4 3 5
b. 1 2 2 2 3 4 5 6 6 6 7 9
c. 1 2 3 6 7 8 9 10
Examples
Mode is 5
Bimodal 2 and 6
No Mode
45
Midrange
the value midway between the highest and lowest values in the original data set
Definitions
Midrange =highest score + lowest score
2
46
Finding a Midrange Using the TI83 Calculator
• Push Stat• Select Edit• Enter data into L1• 2nd mode (quit)• Push Stat• >Calc• Select 1 (1-VarStats)• 2nd 1 (L1)• Arrow down to see the “min” and “max” values
for the midrange calculation
47
Carry one more decimal place than is present in the original set of values
Round-off Rule for Measures of Center
48
use class midpoint of classes for variable x
Mean from a Frequency Table
x = class midpoint
f = frequency
f = n
x =f
(f • x)
49
Finding a Grouped MeanUsing the TI83 Calculator
• Push Stat
• Select Edit
• Enter class marks into L1 and frequencies into L2
• 2nd mode (quit)
• Push Stat
• >Calc
• Select 1 (1-VarStats)
• 2nd 1, 2nd 2 (L1,L2)
• The first output item is the mean
51
SymmetricData is symmetric if the left half of its histogram is roughly a mirror of its right half.
SkewedData is skewed if it is not symmetric and if it extends more to one side
than the other.
Definitions
52
Skewness
Mode = Mean = Median
SKEWED LEFT(negatively)
SYMMETRIC
Mean Mode Median
SKEWED RIGHT(positively)
Mean Mode Median
53
Jefferson Valley Bank
Bank of Providence
6.5
4.2
6.6
5.4
6.7
5.8
6.8
6.2
7.1
6.7
7.3
7.7
7.4
7.7
7.7
8.5
7.7
9.3
7.7
10.0
Jefferson Valley Bank
7.15
7.20
7.7
7.10
Bank of Providence
7.15
7.20
7.7
7.10
Mean
Median
Mode
Midrange
Waiting Times of Bank Customers at Different Banks
in minutes
56
a measure of variation of the scores about the mean
(average deviation from the mean)
Measures of Variation
Standard Deviation
57
Sample Standard Deviation Formula
calculators can compute the sample standard deviation of data
(x - x)2
n - 1S =
58
Sample Standard Deviation Shortcut Formula
n (n - 1)
s =n (x2) - (x)2
calculators can compute the sample standard deviation of data
59
Population Standard Deviation
calculators can compute the population standard deviation
of data
2 (x - µ)
N =
60
Symbolsfor Standard DeviationSample Population
x
xn
s
Sx
xn-1
Textbook
Some graphicscalculators
Somenon-graphicscalculators
Textbook
Some graphicscalculators
Somenon-graphics
calculators
Articles in professional journals and reports often use SD for standard deviation and VAR for variance.
61
Finding a Standard DeviationUsing the TI83 Calculator
• Push Stat• Select Edit• Enter data into L1• 2nd mode (quit)• Push Stat• >Calc• Select 1 (1-VarStats)• 2nd 1 (L1)• The 4th output item is the sample s.d. and the 5th
output item is the population s.d.
62
Measures of Variation
Variance
standard deviation squared
s
2
2
}
use square keyon calculatorNotation
64
Round-off Rulefor measures of variation
Carry one more decimal place than is present in the original set of
values.
Round only the final answer, never in the middle of a calculation.
65
Standard Deviation from a Frequency Table
Use the class midpoints as the x values
Calculators can compute the standard deviation for frequency table
n (n - 1)S
=
n [(f • x 2)] -[(f • x)]2
66
Finding a Grouped Standard DeviationUsing the TI83 Calculator
• Push Stat• Select Edit• Enter class marks into L1 and frequencies into L2• 2nd mode (quit)• Push Stat• >Calc• Select 1 (1-VarStats)• 2nd 1, 2nd 2 (L1, L2)• The 4th output item is the sample s.d and the 5th
output item is the population s.d
67
Estimation of Standard DeviationRange Rule of Thumb
x - 2s x x + 2s
Range 4sor
(minimumusual value)
(maximum usual value)
Range
4s =
highest value - lowest value
4
68
minimum ‘usual’ value (mean) - 2 (standard deviation)
minimum x - 2(s)
maximum ‘usual’ value (mean) + 2 (standard deviation)
maximum x + 2(s)
Usual Sample Values
70
x - s x x + s
68% within1 standard deviation
34% 34%
The Empirical Rule(applies to bell-shaped distributions)
71
x - 2s x - s x x + 2sx + s
68% within1 standard deviation
34% 34%
95% within 2 standard deviations
The Empirical Rule(applies to bell-shaped distributions)
13.5% 13.5%
72
x - 3s x - 2s x - s x x + 2s x + 3sx + s
68% within1 standard deviation
34% 34%
95% within 2 standard deviations
99.7% of data are within 3 standard deviations of the mean
The Empirical Rule(applies to bell-shaped distributions)
0.1% 0.1%
2.4% 2.4%
13.5% 13.5%
73
Chebyshev’s Theorem applies to distributions of any shape.
the proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least 1 - 1/K2 , where K is any positive number greater than 1.
at least 3/4 (75%) of all values lie within 2 standard deviations of the mean.
at least 8/9 (89%) of all values lie within 3 standard deviations of the mean.
74
Measures of Variation Summary
For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.
75
z Score (or standard score)
the number of standard deviations that a given value x is above or
below the mean
Measures of Position
79
Q1, Q2, Q3 divides ranked scores into four equal parts
Quartiles
25% 25% 25% 25%
Q3Q2Q1(minimum) (maximum)
(median)
80
D1, D2, D3, D4, D5, D6, D7, D8, D9
divides ranked data into ten equal parts
Deciles
10% 10% 10% 10% 10% 10% 10% 10% 10% 10%
D1 D2 D3 D4 D5 D6 D7 D8 D9
82
Quartiles, Deciles, Percentiles
Fractiles
(Quantiles)partitions data into approximately equal parts
84
Viewing QuartilesUsing the TI83 Calculator
• Push Stat• Select Edit• Enter data into L1• 2nd mode (quit)• Push Stat• >Calc• Select 1 (1-VarStats)• 2nd 1 (L1)• Arrow down to view the values for Q2 and Q3 (the
median “med” is Q2)
85
Exploratory Data Analysis
the process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate the data sets in order to understand their important characteristics
86
Outliers a value located very far away from almost all
of the other values
an extreme value
can have a dramatic effect on the mean, standard deviation, and on the scale of the histogram so that the true nature of the distribution is totally obscured
87
Boxplots(Box-and-Whisker Diagram)
Reveals the: center of the data spread of the data distribution of the data presence of outliers
Excellent for comparing two or more data sets
90
Viewing BoxplotsUsing the TI83 Calculator
• First input class marks and frequencies in L1 & L2
• 2nd y= (Statplot)
• Select 1
• Select “on”
• Select boxplot type (bottom row, 2nd graph)
• X list = 2nd 1 (L1)
• Freq = 2nd 2 (L2)
• Zoom
• 9 (ZoomStat)