Chapter 2: Frequency Distributions and Graphs
Objectives:
❑ Organizing data using frequency distributions.
❑ Illustrating data using graphs.
❑ Interpreting graphs.
© FAROUQ MOHAMMAD ALAM, DEPT OF STAT, KAU, 2019
Overview of Chapter 2
Sec. #   Title                                         Page(s)
2 – 1    Organizing data                               42 – 56
2 – 2    Histograms, frequency polygons, and ogives    57 – 74
2 – 3    Other types of graphs                         74 – 108
2 – 1: Organizing data
Inductee   Blood type
1          A
2          B
3          B
4          AB
⋮          ⋮
25         A
These data are called raw data! (i.e., they are in their original form)
2 – 1: Organizing data (cont.)
By using descriptive statistical methods, we
transform raw data into …
1. Frequency Distributions
Raw data (as above)  →  descriptive statistics  →  frequency distribution:
Blood type   Number of inductees
A            5
B            7
O            9
AB           4
2. Graphs
Raw data (as above)  →  descriptive statistics  →  graph:
[Figure: pie chart of blood types: A 20%, B 28%, O 36%, AB 16%]
2 – 1: Organizing data (cont.)
A frequency distribution is the organization of raw data in table form, using classes and frequencies.
A class is a quantitative or qualitative category into which the data values are sorted.
The frequency of a class is the number of data values that fall into that class.
2 – 1: Organizing data (cont.)
Types of frequency distributions:
A categorical frequency distribution is used for nominal-level or ordinal-level data.
A grouped frequency distribution is used for quantitative data with a large range.
An ungrouped frequency distribution is used for quantitative data with a relatively small range, or when the data are discrete.
Example 2 – 1: Distribution of Blood Types
(Categorical frequency distribution, page 43)
Twenty-five army inductees were given a blood test to determine their blood type. The data set is shown on page 43.
The number of observations, 25, is called the sample size ($n = 25$)!
Example 2 – 1 (cont.)
Step 1. Determine the classes.
There are four blood types, so there are four classes: A, B, O, and AB.
Step 2. Create a table with three columns: the first for the blood types (classes), the second for tallying, and the third for the number (#) of inductees (frequencies).
Step 3. Tally the data, then delete column 2.
Example 2 – 1 (cont.)
Tallying the raw data produces the frequency distribution:
Class   Tally         Frequency
A       ||||/         5
B       ||||/ ||      7
O       ||||/ ||||    9
AB      ||||          4
Total                 25
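Steps 1 to 3 can be mimicked in Python with `collections.Counter`. The full list of 25 blood types is on page 43 and is not reproduced on these slides, so the `raw_data` list below is a hypothetical stand-in that only has the same class counts as the table above (5, 7, 9, 4); the order of the values is made up. The helper name `categorical_frequency_distribution` is also just for illustration.

```python
from collections import Counter

# Hypothetical stand-in for the 25 blood types on page 43: same counts as the
# frequency table above, but NOT the textbook's actual order of inductees.
raw_data = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4


def categorical_frequency_distribution(data, classes):
    """Tally each value into its class and return {class: frequency}."""
    counts = Counter(data)
    # Keep the classes in the given order; a class with no data gets 0.
    return {c: counts.get(c, 0) for c in classes}


classes = ["A", "B", "O", "AB"]  # Step 1: the classes
freq = categorical_frequency_distribution(raw_data, classes)  # Steps 2-3
n = len(raw_data)  # sample size

for blood_type, f in freq.items():
    print(f"{blood_type:<3} {f}")
print("Total", sum(freq.values()))
```

Because the classes are passed in explicitly, the table keeps the slide's order (A, B, O, AB) rather than the order in which values first appear.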
Example 2 – 1 (cont.)
Blood type (Class)   # of inductees (Frequency)
A                    $f_1 = 5$
B                    $f_2 = 7$
O                    $f_3 = 9$
AB                   $f_4 = 4$
Total                $\sum f = f_1 + f_2 + f_3 + f_4 = 25$
Important Rules!
Regardless of the type of frequency distribution, if the sample size is denoted by $n$, then
$\sum f = n$
Regardless of the type of frequency distribution, the (cumulative) percentage of class number $i$ is defined as
$P_i = \frac{f_i}{n} \times 100\%$
Important Rules! (cont.)
When drawing a pie chart by hand, we need the angle (in degrees) of the sector for class number $i$, which is defined as
$D_i = \frac{f_i}{n} \times 360^\circ$
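A short sketch of the two rules, applied to the Example 2 – 1 frequencies. Multiplying before dividing (`f * 100 / n`) keeps the whole-number percentages exact in floating point; the function names are illustrative, not from the textbook.

```python
# Class frequencies from Example 2-1 (blood types of 25 inductees).
freq = {"A": 5, "B": 7, "O": 9, "AB": 4}
n = sum(freq.values())  # the frequencies always add up to the sample size n


def percentage(f, n):
    """P_i = f_i / n * 100 (result in percent)."""
    return f * 100 / n


def pie_degree(f, n):
    """D_i = f_i / n * 360 (sector angle in degrees)."""
    return f * 360 / n


for blood_type, f in freq.items():
    print(f"{blood_type:<3} {f:>2}  {percentage(f, n):5.1f}%  {pie_degree(f, n):6.1f} deg")
```

The percentages always add up to 100% and the angles to 360°, which is a quick check on hand calculations.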
Example 2 – 1 (Revisited)
Blood type (Class)   # of inductees (Frequency)   Percentage of inductees ($P_i$)   Angle ($D_i$)
A                    5                            20%                               72°
B                    7                            28%                               100.8°
O                    9                            36%                               129.6°
AB                   4                            16%                               57.6°
Total                25                           100%                              360°
Example 2 – 2: Record High Temperatures
(Grouped frequency distribution, page 47)
The data on page 47 represent the record high temperatures in degrees Fahrenheit (°F) for each of the 50 ($= n$) states. Construct a grouped frequency distribution for the data using 7 classes.
Example 2 – 2 (cont.)
Step 1. Determine the classes as follows:
Calculate the range ($R$), which is the difference between the highest value ($H$) and the lowest value ($L$):
$R = H - L = 134 - 100 = 34$
Given the number of classes $k$, the class width is
$\text{class width} = h = \frac{R}{\text{number of classes}} = \frac{R}{k} = \frac{34}{7} \approx 4.9$, which is rounded up to $h = 5$.
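A minimal sketch of Step 1, assuming the textbook convention of rounding $R / k$ up to the next whole number (here 4.857 becomes 5). The helper `range_and_width` is hypothetical.

```python
import math


def range_and_width(lowest, highest, k):
    """Return the range R = H - L and the class width h = R / k rounded up."""
    r = highest - lowest
    return r, math.ceil(r / k)


# Example 2-2: L = 100, H = 134, k = 7 classes.
R, h = range_and_width(lowest=100, highest=134, k=7)
print(R, 34 / 7, h)
```

Rounding up (rather than to the nearest integer) guarantees that $k$ classes of width $h$ cover the whole range.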
Example 2 – 2 (cont.)
Step 1. (cont.)
Each class of a grouped frequency distribution has two class limits: a lower limit, which is the smallest data value that can be included in the class, and an upper limit, which is the largest data value that can be included in the class.
Example 2 – 2 (cont.)
Step 1. (cont.) (textbook)
Lower limits: Use the lowest value, 100, as the lower limit of the first class, then repeatedly add the class width to get the lower limits of the next six classes: 105, 110, 115, 120, 125, and 130.
Upper limits: Subtract one unit from the lower limits of the second through seventh classes to get the upper limits of the first through sixth classes: 104, 109, 114, 119, 124, and 129. Finally, use the largest value, 134, as the upper limit of the last class.
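The textbook procedure for the limits can be sketched as follows, assuming whole-number data (so "one unit" is 1). The function `class_limits` is illustrative only.

```python
def class_limits(lowest, highest, width, k):
    """Lower limits start at the lowest value and step by the class width;
    each upper limit is one unit below the next lower limit, and the last
    upper limit is the largest data value."""
    lowers = [lowest + i * width for i in range(k)]
    uppers = [lo - 1 for lo in lowers[1:]] + [highest]
    return list(zip(lowers, uppers))


limits = class_limits(lowest=100, highest=134, width=5, k=7)
for lo, up in limits:
    print(f"{lo} - {up}")
```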
Example 2 – 2 (cont.)
Step 1. (cont.)
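The limits of the classes are:
100 – 104
105 – 109
110 – 114
115 – 119
120 – 124
125 – 129
130 – 134
Here the lower limit of the first class is the lowest value, $L = 100$, the upper limit of the last class is the highest value, $H = 134$, and, for example, the third class has lower limit $l_3 = 110$ and upper limit $u_3 = 114$.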
Example 2 – 2 (cont.)
Step 2. Create a table with three columns: the first for temperature (classes), the second for tallying, and the third for the number (#) of states (frequencies).
Step 3. Tally the data, then delete column 2.
Example 2 – 2 (cont.)
Temperature (Class)   # of states (Frequency)
100 – 104             2
105 – 109             8
110 – 114             18
115 – 119             13
120 – 124             7
125 – 129             1
130 – 134             1
Total                 50
Important Rules!
Sometimes we need the midpoint of each class, which is given by
$\text{midpoint of class } i = \frac{l_i + u_i}{2}$
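Applying the midpoint rule to the Example 2 – 2 class limits gives the midpoints 102, 107, ..., 132. A minimal sketch:

```python
# Class limits (l_i, u_i) from Example 2-2.
limits = [(100, 104), (105, 109), (110, 114), (115, 119),
          (120, 124), (125, 129), (130, 134)]


def midpoint(lower, upper):
    """Midpoint of a class: (l_i + u_i) / 2."""
    return (lower + upper) / 2


midpoints = [midpoint(lo, up) for lo, up in limits]
print(midpoints)
```

Consecutive midpoints differ by the class width (5 here), which is a handy consistency check.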
Important Rules! (cont.)
Grouped (and also ungrouped) frequency distributions use class boundaries so that there are no gaps between consecutive classes. For a grouped frequency distribution of whole-number data, the boundaries of class $i$ are
$l_i - 0.5$ and $u_i + 0.5$
Important Rules! (cont.)
Usually, grouped frequency distributions have equal class widths. The class width based on the limits of any class $i$ is given by
$\text{class width} = l_{i+1} - l_i$ or $\text{class width} = u_{i+1} - u_i$
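The boundary and class-width rules can be checked together on the Example 2 – 2 limits. This sketch assumes whole-number data, so the boundaries are $l_i - 0.5$ and $u_i + 0.5$; the helper name is hypothetical.

```python
# Class limits (l_i, u_i) from Example 2-2 (whole-number temperatures).
limits = [(100, 104), (105, 109), (110, 114), (115, 119),
          (120, 124), (125, 129), (130, 134)]


def class_boundaries(lower, upper):
    """Boundaries for whole-number data: l_i - 0.5 and u_i + 0.5."""
    return lower - 0.5, upper + 0.5


bounds = [class_boundaries(lo, up) for lo, up in limits]

# Class width from consecutive limits: l_{i+1} - l_i and u_{i+1} - u_i.
lower_widths = {nxt[0] - cur[0] for cur, nxt in zip(limits, limits[1:])}
upper_widths = {nxt[1] - cur[1] for cur, nxt in zip(limits, limits[1:])}

for (lo, up), (lb, ub) in zip(limits, bounds):
    print(f"{lo} - {up}   {lb} - {ub}")
print("widths:", lower_widths, upper_widths)
```

A single-element set means every class has the same width; the distance between a class's two boundaries equals that width as well.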
Example 2 – 2 (Revisited)
Temperature (Class)   Class boundaries   Class midpoint   # of states (Frequency)
100 – 104             99.5 – 104.5       102              2
105 – 109             104.5 – 109.5      107              8
110 – 114             109.5 – 114.5      112              18
115 – 119             114.5 – 119.5      117              13
120 – 124             119.5 – 124.5      122              7
125 – 129             124.5 – 129.5      127              1
130 – 134             129.5 – 134.5      132              1
Total                                                     50
Example 2 – 3: MPGs for SUVs
(Ungrouped frequency distribution, page 49)
The data shown on page 49 represent the number of miles per gallon (mpg) that 30 ($= n$) selected four-wheel-drive sports utility vehicles obtained in city driving. Construct a frequency distribution, and analyze the distribution.
Example 2 – 3 (cont.)
Step 1. Determine the classes.
Notice that the range of the data is small ($R = 19 - 12 = 7$), so each value gets its own class: 12, 13, 14, 15, 16, 17, 18, and 19.
Step 2. Create a table with three columns: the first for MPG (classes), the second for tallying, and the third for the # of SUVs (frequencies).
Step 3. Tally the data, then delete column 2.
Example 2 – 3 (cont.)
MPG (Class)   # of SUVs (Frequency)
12            6
13            1
14            3
15            6
16            8
17            2
18            3
19            1
Total         30
Remark!
Notice that class boundaries are calculated for this ungrouped frequency distribution because MPG is a continuous variable.
Only in the case of continuous data do we obtain class boundaries: subtract 0.5 from each class value to get the lower class boundary, and add 0.5 to each class value to get the upper class boundary.
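For an ungrouped distribution, each value from the lowest to the highest is its own class, with boundaries at the value plus or minus 0.5. The sketch below starts from the Example 2 – 3 frequency table (the raw mpg readings on page 49 are not reproduced here) and rebuilds the classes and boundaries.

```python
# Frequencies from the Example 2-3 table (MPG of 30 SUVs).
freq = {12: 6, 13: 1, 14: 3, 15: 6, 16: 8, 17: 2, 18: 3, 19: 1}

lowest, highest = min(freq), max(freq)
R = highest - lowest                          # range: 19 - 12 = 7 (small)
classes = list(range(lowest, highest + 1))    # one class per value, 12..19

# Each class value x gets boundaries x - 0.5 and x + 0.5.
table = [(x, x - 0.5, x + 0.5, freq.get(x, 0)) for x in classes]
for x, lb, ub, f in table:
    print(f"{x:>3}  {lb} - {ub}  {f}")
print("Total", sum(row[3] for row in table))
```

Using `range(lowest, highest + 1)` also lists any value in between that never occurs, with frequency 0, so the table has no gaps.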
Example 2 – 3 (Revisited)
MPG (Class)   Class boundaries   # of SUVs (Frequency)
12            11.5 – 12.5        6
13            12.5 – 13.5        1
14            13.5 – 14.5        3
15            14.5 – 15.5        6
16            15.5 – 16.5        8
17            16.5 – 17.5        2
18            17.5 – 18.5        3
19            18.5 – 19.5        1
Total                            30
2 – 2: Histograms, frequency polygons, and ogives
Histogram:
The definition is on page 57, and the corresponding illustrative Example 2 – 4 is on pages 57–58.
Frequency polygon:
The definition is on page 58, and the corresponding illustrative Example 2 – 5 is on pages 58–59.
Ogive:
The definition is on page 59, and the corresponding illustrative Example 2 – 6 is on pages 59–61.
Histogram
Frequency Polygon
Cumulative Frequency Distribution
A cumulative frequency distribution is a
distribution that shows the number of data values
less than or equal to each upper boundary.
The values are found by adding the frequencies of
the classes less than or equal to the upper class
boundary of a specific class. This gives an
ascending cumulative frequency. The last
cumulative frequency must be equal to 𝒏.
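The running totals in a cumulative frequency column can be computed with `itertools.accumulate`. This sketch uses the Example 2 – 2 class frequencies (2, 8, 18, 13, 7, 1, 1) and their upper class boundaries.

```python
from itertools import accumulate

# Example 2-2: upper class boundaries and class frequencies (n = 50 states).
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
freqs = [2, 8, 18, 13, 7, 1, 1]

cum_freqs = list(accumulate(freqs))  # running totals: "keep adding"
for ub, cf in zip(upper_boundaries, cum_freqs):
    print(f"Less than {ub}: {cf}")

assert cum_freqs[-1] == sum(freqs)  # the last cumulative frequency equals n
```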
Example 2 – 2 (Revisited)
Temp. (Class)   Class boundaries   Frequency   Cumulative frequency
                                               Less than 99.5     0
100 – 104       99.5 – 104.5       2           Less than 104.5    2
105 – 109       104.5 – 109.5      8           Less than 109.5    10
110 – 114       109.5 – 114.5      18          Less than 114.5    28
115 – 119       114.5 – 119.5      13          Less than 119.5    41
120 – 124       119.5 – 124.5      7           Less than 124.5    48
125 – 129       124.5 – 129.5      1           Less than 129.5    49
130 – 134       129.5 – 134.5      1           Less than 134.5    50 = $n$
Keep adding! Each cumulative frequency is the previous cumulative frequency plus the frequency of the current class.
Ogive
Why use ogives?
An ogive is mainly used to read off, approximately, the cumulative frequency (or cumulative percentage) of data values below a given upper class boundary, and vice versa.
Case 1: Getting the cumulative frequency based on an upper class boundary.
For example, reading the ogive at an upper boundary of 122 gives a cumulative frequency of approximately 45.
Case 2: Getting the upper class boundary based on a cumulative frequency.
For example, a cumulative frequency of 35 corresponds to an upper boundary of approximately 117.5.
Case 3: Getting the cumulative frequency between two upper class boundaries.
If the ogive gives cumulative frequencies of 35 and 45 at two upper boundaries, then the required cumulative frequency between them is 45 − 35 = 10.
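The cases above read values off the ogive by eye. As a sketch (not the slides' method), the same readings can be approximated by linear interpolation along the ogive's straight segments, assuming the ogive is the one for Example 2 – 2 (the readings are consistent with its cumulative frequencies 0, 2, 10, 28, 41, 48, 49, 50). Interpolation gives 44.5 at 122 (read as ≈ 45) and about 117.2 for a cumulative frequency of 35 (read as ≈ 117.5); Case 3 is then a subtraction of two such readings. The inverse lookup assumes the cumulative frequencies are strictly increasing, which holds here because no class is empty. The `interpolate` helper is hypothetical.

```python
from bisect import bisect_left

# Ogive points for Example 2-2: (upper boundary, cumulative frequency),
# starting from (99.5, 0).
xs = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
ys = [0, 2, 10, 28, 41, 48, 49, 50]


def interpolate(x, x_points, y_points):
    """Linear interpolation along the straight segments joining the points.
    x_points must be sorted in strictly increasing order."""
    if not x_points[0] <= x <= x_points[-1]:
        raise ValueError("value outside the ogive")
    i = max(bisect_left(x_points, x), 1)  # index of the segment's right end
    x0, x1 = x_points[i - 1], x_points[i]
    y0, y1 = y_points[i - 1], y_points[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


# Case 1: upper boundary -> cumulative frequency.
cf_122 = interpolate(122, xs, ys)
# Case 2: cumulative frequency -> upper boundary (swap the roles of the axes).
ub_35 = interpolate(35, ys, xs)
print(round(cf_122, 1), round(ub_35, 1))
```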
Remarks!
If the sample size and the (cumulative) percentage of a specific class are given, then the corresponding (cumulative) frequency of this class can be calculated by
$f_i = \frac{P_i}{100\%} \times n$
Remarks! (cont.)
If both the (cumulative) frequency and the (cumulative) percentage of a specific class are given, then the sample size can be calculated by
$n = \frac{f_i}{P_i} \times 100\%$
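Both remarks simply rearrange $P_i = \frac{f_i}{n} \times 100\%$. The sketch checks them against numbers from the slides: class O in Example 2 – 1 ($P = 36\%$, $n = 25$, $f = 9$) and the cumulative class "less than 114.5" in Example 2 – 2 (cumulative frequency 28, which is 56% of 50). Function names are illustrative.

```python
def frequency_from_percentage(p, n):
    """f_i = P_i / 100% * n  (p given in percent)."""
    return p * n / 100


def sample_size(f, p):
    """n = f_i / P_i * 100%  (p given in percent)."""
    return f * 100 / p


print(frequency_from_percentage(36, 25))  # class O in Example 2-1
print(sample_size(9, 36))                 # recovers n = 25
print(sample_size(28, 56))                # cumulative class, recovers n = 50
```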
2 – 3: Other types of graphs
Bar graph:
The definition is on page 75, and the corresponding illustrative Example 2 – 8 is on page 77.
Time series chart:
The definition is on page 78, and the corresponding illustrative Example 2 – 10 is on pages 78–79.
2 – 3: Other types of graphs (cont.)
Pie chart:
The definition is on page 80, with Example 2 – 11 on pages 80–81 and Example 2 – 12 on page 82.
Stem-and-leaf plot:
The definition is on page 84, with Example 2 – 14 on page 84 and Example 2 – 15 on page 85.
Summary of graphs
A bar graph is obtained from a categorical frequency distribution, i.e., it is typically used with qualitative data. Note that this graph is often preferred when the data are ordinal-level qualitative. Also, note that this graph can also be used when the data are
A time series chart is used when quantitative data are observed over a period of time (e.g., minutes, hours, etc.). Here, time is the independent variable, and the variable observed over time is the dependent variable.
Summary of graphs (cont.)
Like the bar graph, the pie chart is obtained from a categorical frequency distribution, and it is highly recommended for nominal-level qualitative data.
A stem-and-leaf plot is used only when the data are quantitative.
Bar graph
[Figure: bar graph with Grade (A+, A, B+, B, C+, C, D+, D, F, DN) on the horizontal axis and a vertical axis from 0 to 18]
Time series chart
[Figure: time series chart of the number of homicides by year, 2003–2007; vertical axis from 0 to 800]
Pie Chart
Remember: the degree of a class in a pie chart is defined as
$\frac{f}{n} \times 360^\circ$
[Figure: pie chart of blood types: A 20%, B 28%, O 36%, AB 16%]
Stem and leaf plot
A stem-and-leaf plot is similar to a histogram turned on its side!
Stem and leaf plot
Consider the numbers 1403 and 102.
The stem of 1403 is 140, and the leaf is 3.
The stem of 102 is 10, and the leaf is 2.
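Splitting a whole number into stem (all digits but the last) and leaf (the last digit) is integer division by 10, which `divmod` does in one step. The sketch also groups leaves by stem to build a small plot; the input list is hypothetical, for illustration only, and empty stems between the smallest and largest stem are printed so the "histogram on its side" shape is preserved.

```python
from collections import defaultdict


def stem_and_leaf(value):
    """Split a non-negative integer: leaf = last digit, stem = the rest."""
    stem, leaf = divmod(value, 10)
    return stem, leaf


def stem_and_leaf_plot(values):
    """Group leaves by stem, with leaves in ascending order."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = stem_and_leaf(v)
        plot[stem].append(leaf)
    return dict(plot)


print(stem_and_leaf(1403))  # (140, 3)
print(stem_and_leaf(102))   # (10, 2)

# Hypothetical values, for illustration only.
plot = stem_and_leaf_plot([41, 7, 33, 18, 25, 36, 12])
for stem in range(min(plot), max(plot) + 1):
    print(stem, "|", " ".join(str(leaf) for leaf in plot.get(stem, [])))
```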
Stem and leaf plot
By joining the stems and the leaves, we notice that the minimum is 02 (i.e., 2), while the maximum is 57.
Summary
Categorical frequency distribution: nominal and ordinal data.
Grouped frequency distribution: quantitative data (large range).
Ungrouped frequency distribution: discrete and continuous data (small range).
Summary (cont.)
Nominal Ordinal Discrete Continuous
Bar Chart ✓ ✓ ✓
Time Series
Independent variable:
time
Dependent variable:
discrete or continuous
Pie Chart ✓ ✓
Stem-and-
leaf plot✓ ✓