AP Statistics Summer Assignment and Information
Hello future students!
I am so excited that you’ll be joining my AP Statistics course this fall, and I want to start off by giving you a bit of information
about the class.
What is AP Statistics?
AP Statistics is a college-level introductory statistics class taught in high school.
Statistics is much more than making graphs and calculating mean, median, and mode. Rather, it is using data to evaluate
claims and make predictions.
For example:
We will learn the principles of experimental design to answer the question “Does taking caffeine affect your pulse rate?”
We will learn the principles of survey construction to answer the question “Do the characteristics of an interviewer affect the responses to a survey question?”
We will learn how to analyze variables to answer questions like: “What is the relationship between fat and calories in McDonald’s products?”
We will learn the principles of statistical inference to answer questions like: “Do the majority of couples kiss to the right?” and “Does seat location affect performance in a math class?”
Why should you take AP Statistics?
It is a fresh start: This isn’t a typical math class. You don’t need to remember how to factor a polynomial, graph a
sine curve, or prove that triangles are congruent. However, this doesn’t mean the class will be easy! You will need
good critical thinking and communication skills.
It is great preparation for college. It is taught over a full year (instead of a semester in college) so you have more
time to learn the material. The skills you learn will help you in a wide variety of fields. Current students will tell you
that AP Statistics has already helped them in their psychology and science classes.
You will NEVER ask: “When will we ever use this?” Examples and exercises are based on real-world studies in a
variety of fields. We will also read articles from current newspapers and magazines and learn how to think critically
about claims made in the media.
College credit, for much less money than you would pay in college.
Who can take AP Statistics?
Any student who has completed Algebra 2 is eligible to take AP Statistics. It counts for Math credit towards
graduation, it is a weighted grade, and it will look great on your college applications. It is also possible to take AP
Statistics simultaneously with another math course, such as Precalculus or AP Calculus.
How hard is AP Stats?
It’s a college course, so the expectations are high. You’ll be expected to do homework every day, and not just math
problems. There is reading and writing involved. In fact, if you think math is all formulas, equations, and calculations, you’ll
soon find that this isn’t really a math course. It’s a course in reading, analyzing, thinking, and writing clearly.
Need more info?
Contact Ms. Landgraf.
Email: klandgraf@ccboe.com
Summer Assignment Directions:
Read Chapter 1.1 and take additional notes if needed. You have a PDF printout of the notes, and they are also available on the
school website.
Complete the two worksheets:
Worksheet 1 has you practice the information discussed in the notes.
Worksheet 2 gives you an idea of what multiple-choice questions look like for the course.
All work is due the first week of school.
CHAPTER 1
Exploring Data
CHAPTER 1.0
Data Analysis: Making Sense of Data
Vocabulary
Statistics: the science of data
Data Analysis: the act of organizing, displaying, summarizing, and questioning data.
Is the data biased?
Individuals
Definition: the objects described by the data.
Ex. People, animals, things
Variables
Definition: any characteristic of an individual
Ex. Gender, age, color, etc.
Types of Variables
Categorical Variable
Describes and categorizes individuals by placing them into one of several different groups or categories
Ex. Gender, race, color, breed, etc.
Quantitative Variable
Takes on numerical values for which it makes sense to take an average
Ex. Weight, age, number of pets, etc.
Non-examples?
Distribution
Definition: describes what values a variable takes on and how often
Ex.
Inference
Definition: the notion of drawing conclusions
What are the chances that these results will be repeated?
How can we know if our conclusions are accurate and trustworthy?
Variables take on many values. We are interested in how often a variable takes on each value… that’s distribution.
How to Explore Data
Examine each variable by itself. Then study relationships among the variables.
Start with a graph or graphs.
Add numerical summaries.
From Data Analysis to Inference
Collect data from a representative sample of the population.
Perform data analysis, keeping probability in mind.
Make an inference about the population.
CHAPTER 1.1
Data Analysis: Analyzing Categorical Data
Categorical Variables
Categorical variables place individuals into one of several groups or categories.
Frequency Table
Format                Count of Stations
Adult Contemporary    1556
Adult Standards       1196
Contemporary Hit      569
Country               2066
News/Talk             2179
Oldies                1060
Religious             2014
Rock                  869
Spanish Language      750
Other Formats         1579
Total                 13838
Relative Frequency Table
Format                Percent of Stations
Adult Contemporary    11.2
Adult Standards       8.6
Contemporary Hit      4.1
Country               14.9
News/Talk             15.7
Oldies                7.7
Religious             14.6
Rock                  6.3
Spanish Language      5.4
Other Formats         11.4
Total                 99.9
In both tables, Format is the variable and the format names are its values. The first table gives counts; the second gives percents.
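As a rough illustration of how the relative frequency table follows from the frequency table, the short Python sketch below divides each count by the total and rounds to one decimal place. The variable names (`counts`, `percents`) are chosen here for illustration and are not part of the course notes. Because each percent is rounded separately, the rounded percents add to 99.9 rather than exactly 100, which is ordinary roundoff error.

```python
# Counts of stations by format, copied from the frequency table.
counts = {
    "Adult Contemporary": 1556,
    "Adult Standards": 1196,
    "Contemporary Hit": 569,
    "Country": 2066,
    "News/Talk": 2179,
    "Oldies": 1060,
    "Religious": 2014,
    "Rock": 869,
    "Spanish Language": 750,
    "Other Formats": 1579,
}

# The total count is the denominator for every relative frequency.
total = sum(counts.values())

# Relative frequency (in percent) = 100 * count / total, rounded to 1 decimal.
percents = {fmt: round(100 * n / total, 1) for fmt, n in counts.items()}

for fmt, pct in percents.items():
    print(f"{fmt:<20}{pct:>6.1f}")

# Sum of the already-rounded percents; roundoff keeps it from reaching 100.
percent_total = round(sum(percents.values()), 1)
print(f"{'Total':<20}{percent_total:>6.1f}")
```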
Frequency tables can be difficult to read.
Sometimes it is easier to analyze a distribution by displaying it
with a bar graph or pie chart.
Displaying Categorical Data
[Pie chart: Percent of Stations, by format (Adult Contemporary, Adult Standards, Contemporary Hit, Country, News/Talk, Oldies, Religious, Rock, Spanish, Other)]
[Bar graph: Count of Stations, by format; vertical axis from 0 to 2500]
Graphs: Good and Bad
Bar graphs compare several quantities by comparing the heights of
bars that represent those quantities. Our eyes, however, react to the
area of the bars as well as to their height.
When you draw a bar graph, make the bars equally wide.
It is tempting to replace the bars with pictures for greater eye appeal.
Don’t do it!
There are two important lessons to keep in mind:
(1) beware the pictograph, and
(2) watch those scales.
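One way to see why pictographs mislead is to compare areas. The sketch below is only an illustration: the function names are made up for this example, and it assumes a picture is enlarged in both height and width in proportion to the value it represents. Under that assumption the picture's area grows with the square of the value, while an equal-width bar's area grows only in proportion to the value. The two counts used are the News/Talk and Rock counts from the frequency table.

```python
def bar_area_ratio(value_ratio):
    """Equal-width bars: only the height changes, so area scales with the value."""
    return value_ratio


def pictograph_area_ratio(value_ratio):
    """Pictures scaled in both height and width: area scales with the value squared."""
    return value_ratio ** 2


# Station counts taken from the frequency table.
news_talk = 2179
rock = 869

ratio = news_talk / rock  # how many times larger the News/Talk count is
print(f"Ratio of counts:          {ratio:.2f}")
print(f"Bar area ratio:           {bar_area_ratio(ratio):.2f}")
print(f"Pictograph area ratio:    {pictograph_area_ratio(ratio):.2f}")
```

The pictograph makes the larger category look far bigger than the data justify, which is why the bars should be equally wide.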
Two-Way Tables and Marginal Distributions
When a dataset involves two categorical variables, we begin by examining the counts or percents in various categories for one of the variables.
A two-way table describes two categorical variables, organizing counts according to a row variable and a column variable.
Young adults by gender and chance of getting rich
Response                         Female   Male   Total
Almost no chance                 96       98     194
Some chance, but probably not    426      286    712
A 50-50 chance                   696      720    1416
A good chance                    663      758    1421
Almost certain                   486      597    1083
Total                            2367     2459   4826
What are the variables described by this two-way table?
How many young adults were surveyed?
The marginal distribution of one of the categorical variables in a two-
way table of counts is the distribution of values of that variable among
all individuals described by the table.
Note: Percents are often more informative than counts, especially
when comparing groups of different sizes.
How to examine a marginal distribution:
1) Use the data in the table to calculate the marginal
distribution (in percents) of the row or column totals.
2) Make a graph to display the marginal distribution.
[Bar graph: Chance of being wealthy by age 30. Horizontal axis: Survey Response (Almost none, Some chance, 50-50 chance, Good chance, Almost certain); vertical axis: Percent, 0 to 35]
Examine the marginal distribution of chance of getting rich.
Response          Percent
Almost no chance  194/4826 = 4.0%
Some chance       712/4826 = 14.8%
A 50-50 chance    1416/4826 = 29.3%
A good chance     1421/4826 = 29.4%
Almost certain    1083/4826 = 22.4%
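The marginal distribution can also be checked with a few lines of Python. This is a minimal sketch, not part of the course notes: the dictionary layout and the names `table`, `grand_total`, and `marginal` are assumptions made for the example. Each row total is divided by the grand total of 4826 young adults.

```python
# Two-way table of counts: response -> counts by gender (from the notes).
table = {
    "Almost no chance": {"Female": 96, "Male": 98},
    "Some chance, but probably not": {"Female": 426, "Male": 286},
    "A 50-50 chance": {"Female": 696, "Male": 720},
    "A good chance": {"Female": 663, "Male": 758},
    "Almost certain": {"Female": 486, "Male": 597},
}

# Grand total: everyone described by the table.
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the response: row total / grand total, in percent.
marginal = {
    response: round(100 * sum(row.values()) / grand_total, 1)
    for response, row in table.items()
}

for response, pct in marginal.items():
    print(f"{response:<32}{pct:>5.1f}%")
```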
Relationships Between Categorical
Variables
A conditional distribution of a variable describes the values of that
variable among individuals who have a specific value of another
variable.
How to examine or compare conditional distributions:
1) Select the row(s) or column(s) of interest.
2) Use the data in the table to calculate the conditional
distribution (in percents) of the row(s) or column(s).
3) Make a graph to display the conditional distribution.
• Use a side-by-side bar graph or segmented bar
graph to compare distributions.
Calculate the conditional distribution of opinion among males. Examine the relationship between gender and opinion.
Response            Male                 Female
Almost no chance    98/2459 = 4.0%       96/2367 = 4.1%
Some chance         286/2459 = 11.6%     426/2367 = 18.0%
A 50-50 chance      720/2459 = 29.3%     696/2367 = 29.4%
A good chance       758/2459 = 30.8%     663/2367 = 28.0%
Almost certain      597/2459 = 24.3%     486/2367 = 20.5%
[Segmented bar graph: Chance of being wealthy by age 30. Horizontal axis: Opinion (Males, Females); vertical axis: Percent, 0% to 100%; segments: Almost certain, Good chance, 50-50 chance, Some chance, Almost no chance]
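The conditional distributions for males and females can be reproduced with the sketch below. It is only an illustration under stated assumptions: the helper name `conditional_distribution` and the dictionary layout are invented for this example. The key idea is that each count is divided by its own column total (2459 males or 2367 females), not by the grand total.

```python
# Two-way table of counts: response -> counts by gender (from the notes).
table = {
    "Almost no chance": {"Female": 96, "Male": 98},
    "Some chance, but probably not": {"Female": 426, "Male": 286},
    "A 50-50 chance": {"Female": 696, "Male": 720},
    "A good chance": {"Female": 663, "Male": 758},
    "Almost certain": {"Female": 486, "Male": 597},
}


def conditional_distribution(table, gender):
    """Distribution of response (in percent) among individuals of one gender."""
    column_total = sum(row[gender] for row in table.values())
    return {
        response: round(100 * row[gender] / column_total, 1)
        for response, row in table.items()
    }


male = conditional_distribution(table, "Male")
female = conditional_distribution(table, "Female")

# Print the two conditional distributions side by side for comparison.
print(f"{'Response':<32}{'Male':>8}{'Female':>8}")
for response in table:
    print(f"{response:<32}{male[response]:>7.1f}%{female[response]:>7.1f}%")
```

Comparing the two columns row by row is the numerical counterpart of reading a side-by-side or segmented bar graph.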
Caution!
Even a strong association between two categorical variables can
be influenced by other variables lurking in the background.
Can we say there is an association
between gender and opinion in the
population of young adults?
Making this determination requires
formal inference, which will have to
wait a few chapters.
Name: Hour: Date:
Lesson 1.1: What was your favorite toy as a child?
Is gender associated with certain favorite childhood toys? Collect class data using the following options: Barbies, Legos, Ninja Turtles, Video games, or dress-up clothes.
1. The table below displays the number of students who indicated the specified toy as their favorite.
Barbie   Legos   Ninja Turtles   Video games   Dress up clothes
8        9       2               8             2
2. Identify the individuals and the variable.
3. Is the variable categorical or quantitative?
4. Sketch a bar graph and pie chart of the data below.
5. Sometimes it is helpful to graph more than one variable. The table below indicates how many male and female students indicated each toy as their favorite.
                    Gender
Type of toy         Male   Female
Barbie              1      7
Legos               5      4
Ninja Turtles       2      0
Video games         5      3
Dress up clothes    0      2
Find each of the following:
% of students who chose Ninja Turtles:
% of students who are Male and chose Ninja Turtles:
% of Females who chose Legos:
Is gender associated with certain favorite childhood toys? Use the data below to
analyze this question.
6. How many variables does the table have? Are the variables categorical or
quantitative?
7. Which variable would best explain or predict the other variable?
8. Make a side-by-side bar graph and a segmented bar graph for the data from
number 5. Sketch them below.
9. How do the bars in the side-by-side-bar graph relate to the bars in the
segmented bar graph?
10. Is there an association between gender and type of toy? If so, describe it.
11. If there was not an association between gender and toy, what would the graphs
look like?
Name: Hour: Date:
Lesson 1.1 – Analyzing Categorical Data
Check Your Understanding:
1. Students at a local high school were asked which
gaming system they preferred: the Playstation 3,
the Xbox 360 or neither. The graph shown at right
shows the results. Explain why the graph may be
misleading.
2. An article in the Journal of the American Medical Association reports the results of a
study designed to see if the herb St. John’s wort is effective in treating moderately
severe cases of depression. The study involved 338 patients who were being treated
for major depression. The subjects were randomly assigned to receive one of three
treatments: St. John’s wort, Zoloft (a prescription drug), or placebo (an inactive
treatment) for an 8-week period. The two-way table summarizes the data from the
experiment.
a. What proportion of subjects in the study were
randomly assigned to take St. John’s wort?
Explain why this value makes sense.
b. Find the distribution of change in depression
for the subjects in this study using relative
frequencies.
c. What percent of subjects took Zoloft and showed a full response?
Important Ideas:
Chapter 1.1 Multiple Choice Practice Name:
Multiple-choice. Circle the letter corresponding to the best answer choice.
1. Sixteen percent of the students in the College of Engineering are majoring in Civil Engineering, 20% in
Chemical, 20% in Electrical, 35% in Biomedical, and 9% in Aeronautical. Which of the following could be
used to represent the data?
(A) a bar graph only. (B) a pie chart only. (C) a box plot only. (D) Either (A) or (B).
2. The pie chart above describes the distribution of favorite school subject for 300 high-school students. The
number of students whose favorite subject is math is:
(A) 26 (B) 45 (C) 60 (D) 78
3. You want to make a graph to display the distribution of the salaries of the 1500 professors at a large
university. The best choice is
(A) a boxplot.
(B) a histogram.
(C) a pie chart.
(D) a stemplot.
4. A common abuse of bar graphs is
(A) to use bars of equal width, so that we can't see differences among the bars.
(B) to replace bars by pictures and increase both height and width as the variable plotted increases.
(C) to use bars that don't touch each other.
(D) to stretch or squeeze the scale at the base of the bars, so that our eyes are misled.
5. A company database contains the following information about each employee: age, date
hired, sex (male or female), ethnic group (Asian, black, Hispanic, etc.), job category (clerical, management,
technical, etc.), yearly salary. Which of the following lists of variables
are all categorical?
(A) age, sex, ethnic group.
(B) ethnic group, job category, yearly salary.
(C) sex, ethnic group, job category.
(D) yearly salary, age, job category
[Pie chart: Favorite Subject. Arts 8%, English 32%, History 11%, Other 8%, Math 26%, Science 15%]
6. In a poll of 200 randomly selected students, males and females were asked their favorite type of pet. The
comparative bar chart shows the results.
Which statement below is true?
(A) Almost half the dog lovers are female.
(B) The number of male dog lovers in the data set is greater than the number of female dog lovers.
(C) The percentage of females who prefer cats is higher than the corresponding percentage for men.
(D) About 30% of the students in the sample who prefer cats are male.
7. One way economists measure the health of the real estate market is by counting “housing starts,” or the
number of permits issued for construction of new homes. Below is a graph displaying housing starts (in
thousands) in the United States from 2006 to 2009.
What is the principal weakness of this graphical presentation of data?
(A) The data only shows housing starts for four years, which is not enough time to
identify a meaningful trend.
(B) Data of this type should only be displayed in a pie chart.
(C) Using proportionally-sized pictograms exaggerates the difference between years.
(D) It is unclear which dimension of the house represents the number of housing starts
for that year.