Name ____________________________________ Period _____________________________
HW Organizing and Describing Data
1. (a) Gender (female or male) Categorical
(b) Age (years) Quantitative - discrete
(c) Race Categorical
(d) Smoker (yes or no) Categorical
(e) Systolic blood pressure (millimeters of mercury) Quantitative - continuous
(f) Level of calcium in the blood (micrograms per milliliter) Quantitative - continuous
(g) Number of prior surgeries Quantitative - discrete
2. (a) What percent of spam would fall in the “other” category? 7%
(b) Display this data in a bar graph. Use graph paper and a formal neat presentation with all the
required elements.
3. (a) Present these data in a well-labeled bar graph.
(b) Suggest some possible reasons why there are fewer births on the weekends.
Labor is induced during the week, when doctors are in the office.
4. Based on these results, do you think there was a change in people’s attitudes during the 10 years
between these polls? Support your conclusions using a side-by-side bar graph.
There doesn’t appear to be much of a difference in people’s attitudes, as the heights of the bars are
approximately the same for each category.
5.
(a) Construct a segmented bar graph to show the different percentages of physical activity for the
three BMI Groups.
(b) Do these data prove that lack of exercise causes obesity? Explain. Cause cannot be determined
from an observational study.
6. a) Describe each of the four distributions.
Process A: shape is roughly symmetrical and bell shaped. The center is at 11.5 and the spread is from
about 11.4 to 11.6. There are no unusual features.
Process B: The shape is uniform. The center is around 12 and the spread is from 11.4 to 12.7. There are
clusters throughout that are separated by small gaps, at almost every .05 change in
diameter.
Process C: The shape is roughly bimodal. The center is around 11.7 and the data is spread from 11.1 to
12.2. There are many gaps all along the interval.
Process D: The shape is roughly symmetrical and bell shaped. The center is at 12 and the spread is from
11.8 to 12.2.
a) Which process is the best? Why? Process D because almost all of the dots appear to be in the acceptable
range of 11.8 to 12.2.
b) Which process is the most stable? Why? Process A is the most stable because it has the smallest spread
in the data, approximately .2 cm.
c) Which process is the least stable? Why? Process B is the least stable because it has the largest spread in
the data, approximately 1.3 cm.
7. a) Create a dotplot and a stemplot for these data.
b) Describe the distribution.
Fairly symmetrical and bell-shaped with a peak at 35. The center appears to be 35 with a range
from 14 to 54. Looking at the dotplot, we can see gaps at 17, 20, 21, and 36-37, which give it the
appearance of clusters of data scattered throughout.
8. a) Create a dotplot and a stemplot for these data.
b) Describe the distribution.
The distribution is skewed to the right with a center of 10 and a range from 1 to 31. Looking at
the dotplot, there are gaps at 15-18, 20, 24, 26, and 28-30. There is a cluster of data around 22,
and 31 could be an outlier.
9. a) Create a dotplot of this data.
The distribution of the number of hurricanes is skewed to the right with a center at 2 and a range
from 0 to 7. There are no unusual features to this distribution.
10.
The distribution of horsepower for autos is slightly skewed to the right with a center at 103 and a range
from 65 to 155. There aren’t any unusual features with this distribution.
11.
The distribution of the amount of money spent by shoppers in a store is skewed to the right with a
cluster of amounts around 86. The distribution has a center of 28 and a range from 3 to 94. The split
stem shows the details a bit better, where you are able to make out the cluster of data around 86.
12.
The distribution of division times for lung cells exposed to beryllium is skewed to the right with a
center around 28 and a spread from 14 to 73. There are no unusual features.
13.
The distribution of zinc intake is roughly symmetric and bell-shaped with a center around 11 and a
spread from 5 to 19. There are no unusual features.
14. Complete this frequency distribution.
Class Limits   Class Boundaries   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Freq.
14-20          13.5-20.5          6           .136                 6                      .136
21-27          20.5-27.5          7           .159                 13                     .295
28-34          27.5-34.5          7           .159                 20                     .455
35-41          34.5-41.5          11          .25                  31                     .705
42-48          41.5-48.5          8           .182                 39                     .886
49-55          48.5-55.5          5           .114                 44                     1.000
Create a cumulative relative frequency plot (ogive).
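The relative and cumulative columns follow mechanically from the class frequencies. Here is a minimal Python sketch that rebuilds them from the six frequencies in the table; the variable and function names are illustrative and not part of the assignment.

```python
# Class limits and frequencies copied from the table in problem 14.
classes = ["14-20", "21-27", "28-34", "35-41", "42-48", "49-55"]
frequencies = [6, 7, 7, 11, 8, 5]


def frequency_table(labels, counts):
    """Return rows of (class, freq, rel. freq, cum. freq, cum. rel. freq)."""
    total = sum(counts)
    rows = []
    running = 0
    for label, count in zip(labels, counts):
        running += count
        rows.append((label, count, round(count / total, 3),
                     running, round(running / total, 3)))
    return rows


table = frequency_table(classes, frequencies)
for row in table:
    print(*row, sep="\t")
```

The last entry of each row is the height of the ogive at that class's upper boundary.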
15) What DRP score is at the 20th percentile?
23
16) What DRP score is at the 90th percentile?
50
17) What is the median DRP score?
36
18) What is the IQR for the distribution of DRP
scores? 17
The following cumulative relative frequency plot shows the age of people enrolled in school in 1996.
19) What is the median age of school enrollment in 1996?
11
20) What is the interquartile range of school enrollment in
1996?
10
21) At or below what age is the bottom 10% of school
enrollment in 1996?
4
22) At or above what age is the top 20% of school enrollment in
1996?
19
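[Figure for 19-22: ogive of school enrollment in 1996. Horizontal axis: Age (5 to 40); vertical axis: cumulative relative frequency (.1 to 1).]
[Figure for 15-18: ogive of DRP scores. Horizontal axis: DRP Score (14 to 56); vertical axis: cumulative relative frequency.]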
The following ogive shows the grade point average (GPA) for students at a certain school.
23) What is the median GPA?
2.4
24) What is the IQR of GPA?
1.0
25) What GPA is at the 85th percentile?
3.0
26) What does the steepness of the line imply?
The cumulative relative frequency increased greatly because there were many values in that
class.
27) What does it mean if the line is horizontal between two points?
There were no values between those two points.
28) (a) Find the mean and median of each medal count (total, gold, silver, and bronze).
Total: mean = 31.29, median = 22 Gold: mean = 9.95, median = 7
Silver: mean = 10.1, median = 7 Bronze: mean = 11.24, median = 9
(b) Which is larger, the mean or the median? Is the difference considerably “large”?
The mean is larger for all four counts. The difference seems considerably large in the total category.
29) (a) Use the formula to calculate the mean. Mean = 85
(b) the fifteenth quiz and he receives a score of zero. Mean = 79.3; the mean is non-resistant.
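The drop from 85 to 79.3 can be checked without the individual quiz scores, because the mean fixes the total. This short sketch assumes the mean of 85 in part (a) was taken over 14 quizzes, which is inferred from the phrase "the fifteenth quiz" and not stated outright.

```python
# Assumption: "the fifteenth quiz" implies 14 quizzes were averaged in part (a).
quizzes_so_far = 14
mean_so_far = 85

total_points = quizzes_so_far * mean_so_far            # 14 * 85 = 1190
new_mean = (total_points + 0) / (quizzes_so_far + 1)   # add a zero, divide by 15

print(round(new_mean, 1))  # 79.3
```

A single zero moves the mean by almost six points, which is what "non-resistant" means here.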
30) (a) Find the mean score from the formula for the mean. Mean = 141.06
(b) Find the mean for the 17 observations when you drop the outlier. Mean = 137.59; the
mean was impacted severely by the outlier.
31) What is the team’s annual payroll for players? $30 million. If you only knew the median salary,
would you be able to answer the question? No.
[Figure for 23-27: ogive of grade point average. Horizontal axis: Grade Point Average (1 to 4); vertical axis: cumulative proportion (.1 to 1).]
32) Find the median of these scores. Median = 138.5, which is less than the mean because the distribution is skewed right.
33) What is the mean salary paid at this firm? $60,000. How many employees earn less? 7/8. They
could promote the $60,000 as the average even though most employees (7/8) earn
significantly less than that.
34) $490,000 (median) and $1,160,000 (mean). Salaries are skewed right.
35) a) State the IQR of this data set. Five-number summary: 79, 84.5, 88.5, 93, 111; IQR = 93 - 84.5 = 8.5
b) Find the mean and the median. Mean is 90.7, Median is 88.5
c) Based on the mean and median, describe the distribution. Skewed right
36) a) Construct a boxplot (modified if necessary) of the data. Five-number summary: 44, 61.5, 65.5, 71.5, 80
b) Find the value of the IQR. 71.5 - 61.5 = 10
c) Are there any outliers? Yes, 44 (possible reason: didn’t study)
37) a) Compute the 5 number summary. 1.12, 1.88, 2.23, 2.86, 4.69
b) Draw a modified boxplot if you suspect outliers. Are there any? Yes, 4.69
c) Shape of the distribution: skewed right, so expect the mean to fall distinctly above the median.
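The outliers named in problems 36 and 37 can be checked from the five-number summaries alone. The sketch below assumes the usual 1.5 x IQR rule behind a modified boxplot; the function names are illustrative.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def flagged_extremes(five_number_summary):
    """Return whichever of the minimum and maximum lie outside the fences.

    Only the extremes can be checked from a five-number summary; other
    outliers, if any, would need the raw data.
    """
    minimum, q1, _median, q3, maximum = five_number_summary
    lower, upper = outlier_fences(q1, q3)
    return [x for x in (minimum, maximum) if x < lower or x > upper]


problem_36 = (44, 61.5, 65.5, 71.5, 80)
problem_37 = (1.12, 1.88, 2.23, 2.86, 4.69)

print(outlier_fences(61.5, 71.5), flagged_extremes(problem_36))
print(flagged_extremes(problem_37))
```

For problem 36 the fences are 46.5 and 86.5, so 44 falls below the lower fence. For problem 37 the upper fence is about 4.33, so 4.69 falls above it.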
38) Find the range (58), mean (29.75), variance (613.07) and standard deviation (24.76)
39) Find the range (4.6), mean (6.681), variance (12.91) and standard deviation (3.59) for this data.
40) Find the range (11), mean (8.58), variance (11.72) and standard deviation (3.42).
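Since the standard deviation is the square root of the variance, each reported pair in problems 38 to 40 can be cross-checked without the raw data. A small sketch using only the values printed above:

```python
import math

# (variance, standard deviation) pairs as reported in problems 38-40.
reported = {38: (613.07, 24.76), 39: (12.91, 3.59), 40: (11.72, 3.42)}

for problem, (variance, std_dev) in reported.items():
    recomputed = round(math.sqrt(variance), 2)
    print(problem, recomputed, recomputed == std_dev)
```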
41) • Choose four numbers that have the smallest possible standard deviation. All four the same
• Choose four numbers that have the largest possible standard deviation. 0, 0, 10, 10
• Is more than one choice possible in either (a) or (b)? (a) Yes; (b) No
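These answers can be confirmed by brute force. The sketch below assumes the four numbers must be whole numbers from 0 to 10 with repeats allowed; the answer 0, 0, 10, 10 suggests that range, but the problem statement is not reproduced here. It ranks every candidate set by an integer score proportional to the variance.

```python
from itertools import combinations_with_replacement


def spread_score(values):
    """n^2 times the population variance, kept as an integer.

    Because it is a positive multiple of the variance, ranking by this
    score is the same as ranking by standard deviation.
    """
    n = len(values)
    return n * sum(v * v for v in values) - sum(values) ** 2


# Assumption: whole numbers from 0 to 10, repeats allowed.
candidates = list(combinations_with_replacement(range(11), 4))
scores = {c: spread_score(c) for c in candidates}
low, high = min(scores.values()), max(scores.values())

smallest = [c for c, s in scores.items() if s == low]
largest = [c for c, s in scores.items() if s == high]

print(len(smallest), "sets tie for the smallest spread, e.g.", smallest[0])
print("largest spread:", largest)
```

Eleven sets (one for each repeated value) tie for a standard deviation of zero, while only one set reaches the maximum, matching the Yes/No answer.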
42) SAT Verbal scores for a high school’s graduating class
[Figure: side-by-side boxplots for Males and Females on a common SAT Verbal axis from 300 to 800.]
In general, graduating females at this high school tend to slightly outperform their male
counterparts in SAT Verbal scores. Both distributions are skewed to the left, and for the males
any score below 312.5 is an outlier. The median score for females is
625, which is slightly higher than the median for males at 600. Female scores have a bit more variability in
the IQR (150 vs. 135), but the scores for males are more spread out overall, with a range of
490 compared to only 410 for females.
43) Number of C-sections performed in a year by doctors
     Male |   | Female
          | 0 | 5 7
          | 1 | 0 4 8 9
8 7 5 5 0 | 2 | 5 9
7 6 4 2 1 | 3 | 1 3
        4 | 4 |
      9 0 | 5 |
          | 6 |
          | 7 |
      6 5 | 8 |
44) Agility test performance by 4th-graders
         Male |   | Female
              | 1 | 2
        8 7 7 | * | 6 9
4 3 3 2 2 2 1 | 2 | 0 1 2 2 4
            9 | * | 5 5 6 7 8
45) Which of the following groups have outliers? Associate and Instructor
46) a) Which group has the largest individual salary of anyone listed? South; 125,000
b) Which group has the largest median salary? South; 90,000
c) Which group has the smallest interquartile range? Northeast; 17,000
d) Which group has the lowest Q3? West; 90,000
e) The top half of the South make the same as or more than 75% of the West.
f) What is the spread of the lower 25% of the Midwest? 65,000-75,000
g) Which region has the widest middle 50% of their data? Midwest
47) a) If the company decides to give every employee a $5000 raise, how will that affect the
summary statistics? Mean, median, mode: + 5,000. Range, variance, and standard deviation: unaffected.
b) Suppose instead the CEO decides to give everyone a 20% raise, how will that affect the
summary statistics? Mean, median, mode, range, and standard deviation: times 1.2. Variance: times 1.44.
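Both answers can be seen numerically on any small data set. The sketch below uses a hypothetical list of salaries, invented purely for illustration, and compares the summary statistics before and after each kind of raise with Python's standard `statistics` module.

```python
import statistics as st


def summarize(values):
    """Collect the six summary statistics named in problem 47."""
    return {
        "mean": st.mean(values),
        "median": st.median(values),
        "mode": st.mode(values),
        "range": max(values) - min(values),
        "variance": st.variance(values),
        "stdev": st.stdev(values),
    }


# Hypothetical salaries, invented for illustration only.
salaries = [30000, 40000, 40000, 50000, 90000]

original = summarize(salaries)
flat_raise = summarize([s + 5000 for s in salaries])
percent_raise = summarize([s * 1.2 for s in salaries])

for name in original:
    print(f"{name:>8}: +5000 changes it by {flat_raise[name] - original[name]:.2f}, "
          f"x1.2 multiplies it by {percent_raise[name] / original[name]:.2f}")
```

Adding a constant shifts the measures of center and leaves the measures of spread alone; multiplying by a constant scales everything, with the variance scaled by the square of the constant.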
Key for 43: 1 | 4 represents 14 c-sections performed.
In general, it appears that male doctors perform more c-sections in a year than female doctors. The
shape of the distribution of the number of c-sections performed by male doctors is skewed to the right
with a gap from 59 to 85 and two outliers in the 80s, while the shape of the distribution for female
doctors is roughly symmetric. The median for the males (34) is much higher than the median for females
(18). The males have a larger range of 86 – 20 = 66 in comparison to the range of the females, which is
33 – 5 = 28.
Key for 44: 1 | 2 represents an agility test score of 12.
In general, it appears that females outperform males on a 4th-grade agility test. The shape of the
distribution of agility test scores for females is skewed to the left, while the distribution of male
scores is skewed to the right. Neither distribution seems to have any unusual features. The median
score for both genders is 22; however, the females have more scores in the upper 20s. The females have
a slightly larger range of 28 – 12 = 16 in comparison to the range of the males, which is 29 – 17 = 12.
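The medians and ranges quoted for problem 44 can be recomputed from the stemplot. In this sketch the assignment of leaves to the male and female sides is a reading of the flattened stemplot, so treat the row data as an assumption; the helper name `expand` is illustrative.

```python
from statistics import median

# Back-to-back stemplot rows as (male leaves, stem value, female leaves).
# Split stems: the second row for each tens digit holds leaves 5-9.
# Assumption: leaves are assigned to sides as read from the flattened plot.
rows = [
    ([], 10, [2]),
    ([8, 7, 7], 10, [6, 9]),
    ([4, 3, 3, 2, 2, 2, 1], 20, [0, 1, 2, 2, 4]),
    ([9], 20, [5, 5, 6, 7, 8]),
]


def expand(stem_rows, side):
    """Turn stem-and-leaf rows into a sorted list of scores for one side."""
    values = []
    for male_leaves, stem, female_leaves in stem_rows:
        leaves = male_leaves if side == "male" else female_leaves
        values.extend(stem + leaf for leaf in leaves)
    return sorted(values)


males = expand(rows, "male")
females = expand(rows, "female")

for name, scores in (("Male", males), ("Female", females)):
    print(name, "median:", median(scores), "range:", max(scores) - min(scores))
```

Under that reading both medians come out to 22, with ranges of 12 for the males and 16 for the females.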
48) a) Assume all those families were able to use a $1.00 off coupon, how will that affect the
summary statistics? Mean, median, mode: minus $1. Range, variance, and standard deviation: unaffected.
b) Instead of a $1.00 off coupon they were able to save 20%, how will that affect the
summary statistics? Mean, median, mode, range, and standard deviation: times .8. Variance: times .64.
c) It is their lucky week and they can take $1.00 off and then also save 20%; what impact will
that have on the summary statistics? Mean, median, mode: each becomes .8(x – 1), where x is the original value.
Range and standard deviation: times .8. Variance: times .64.
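All three parts are cases of one rule. As a sketch, suppose each original amount $x_i$ becomes $y_i = a x_i + b$; the symbols $a$, $b$, $x_i$, $y_i$, and $s$ are notation introduced here for illustration and are not taken from the problem.

```latex
\begin{align*}
y_i &= a x_i + b \\
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} (a x_i + b) = a\bar{x} + b \\
s_y^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2
       = \frac{1}{n-1}\sum_{i=1}^{n} a^2 (x_i - \bar{x})^2 = a^2 s_x^2 \\
s_y &= |a|\, s_x, \qquad \text{range}_y = |a| \cdot \text{range}_x
\end{align*}
```

Part (a) is $a = 1$, $b = -1$; part (b) is $a = 0.8$, $b = 0$; part (c) is $y = 0.8(x - 1) = 0.8x - 0.8$, so $a = 0.8$ and $b = -0.8$. The median and mode transform like the mean when $a > 0$.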