AP statistics Chapter 2 Notes Name ______________________

Modeling Distributions of Data Per ___ Date ________________

2.1A The distribution of a variable is the ________ a variable takes and ____________ it takes that value. When working with quantitative data, we can calculate the _________ in the distribution by calculating the ______________ of that value. The percentile consists of the percentage of values that ______________ the value. Percentiles should be whole numbers, so if you get a decimal, use your rounding rules.

Sample: Wins in Major League Baseball The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2009.

 5 | 9
 6 | 2455
 7 | 00455589
 8 | 0345667778
 9 | 123557
10 | 3

Find the percentiles for the following teams:

(a) The Colorado Rockies, who won 92 games.

(b) The New York Yankees, who won 103 games.

(c) The Kansas City Royals and Cleveland Indians, who both won 65 games.
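One way to check answers to (a) through (c) is a short Python sketch. It assumes that a percentile counts the observations strictly below the given value, which is one common textbook convention; the helper name `percentile_of` is made up for this illustration.

```python
# Wins read off the stemplot (stem = tens digit, leaf = ones digit).
stemplot = {5: "9", 6: "2455", 7: "00455589", 8: "0345667778", 9: "123557", 10: "3"}
wins = [10 * stem + int(leaf) for stem, leaves in stemplot.items() for leaf in leaves]

def percentile_of(value, data):
    """Percent of observations strictly below `value` (assumed definition),
    rounded to a whole number."""
    below = sum(1 for x in data if x < value)
    return round(100 * below / len(data))

for team, w in [("Rockies", 92), ("Yankees", 103), ("Royals/Indians", 65)]:
    print(team, w, "wins ->", percentile_of(w, wins), "percentile")
```

If your class counts values "at or below" instead, change `x < value` to `x <= value`.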

2.1A In chapter 1, we talked about frequency charts/graphs (________________), relative frequency charts/graphs (_____________), and now we have cumulative frequency charts/graphs. The cumulative frequency chart/graph is directly related to percentiles. It tells you what _________________ of the values _______________ a certain value. We can also use this to find the 1st quartile (25% below), the median (50% below), and the 3rd quartile (75% below), along with the five-number summary and the interquartile range.

Sample: State Median Household Incomes Here is a table showing the distribution of median household incomes for the 50 states and the District of Columbia.

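Median Income ($1000s) | Frequency | Relative Frequency | Cumulative Frequency | Cumulative Relative Frequency
35 to < 40 | 1 | 1/51 = 0.020 | ____ | ____
40 to < 45 | 10 | 10/51 = 0.196 | ____ | ____
45 to < 50 | 14 | 14/51 = 0.275 | ____ | ____
50 to < 55 | 12 | 12/51 = 0.235 | ____ | ____
55 to < 60 | 5 | 5/51 = 0.098 | ____ | ____
60 to < 65 | 6 | 6/51 = 0.118 | ____ | ____
65 to < 70 | 3 | 3/51 = 0.059 | ____ | ____

Key (for the stemplot of wins above): 5|9 represents a team with 59 wins.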
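The two cumulative columns of the income table are running totals of the frequency column. A minimal Python sketch, using only the frequencies given in the table (the variable names are chosen for this illustration):

```python
from itertools import accumulate

classes = ["35 to < 40", "40 to < 45", "45 to < 50", "50 to < 55",
           "55 to < 60", "60 to < 65", "65 to < 70"]
freq = [1, 10, 14, 12, 5, 6, 3]
n = sum(freq)  # 50 states + District of Columbia = 51

# Cumulative frequency: running total of the counts.
cum_freq = list(accumulate(freq))
# Cumulative relative frequency: running total divided by n.
cum_rel = [c / n for c in cum_freq]

for label, c, r in zip(classes, cum_freq, cum_rel):
    print(f"{label:12s} {c:3d} {r:.3f}")
```

The last cumulative relative frequency should always be 1, which is a quick check on the arithmetic.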

Here is the cumulative relative frequency graph for the income data.

Determine the five number summary from the graph. Interpret the 3rd quartile from the graph:

***Remember the _____________ the segment of the graph, the more values that lie in that interval.

2.1A Standardized scores: The z-score

Another way to measure the ____________ of a value in a distribution is the z-score. It is a standardized ______________ that tells you how many standard deviations from the mean the value is. Having a standardized location allows you to __________________ from different distributions of the same shape. The z-score is not measured in the same units as the variable; it is measured in ___________________. The z-score is also directional: a positive z-score tells you it is ____________ the mean, whereas a negative z-score tells you it is __________ the mean. The number tells you how ________________________ above or below.

*****All standardized scores will have a mean of 0 and a standard deviation of 1.

******A standardized distribution will have the ____________________ as the original distribution.

Sample: Wins in Major League Baseball In 2009, the mean number of wins was 81 with a standard deviation of 11.4 wins. Find and interpret the z-scores for the following teams.

(a) The New York Yankees, with 103 wins.

(b) The New York Mets, with 70 wins.
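The standardization described above is z = (value - mean) / standard deviation. A small Python sketch for checking the two teams (the function name `z_score` is just an illustrative choice):

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

mean_wins, sd_wins = 81, 11.4  # from the sample statement

z_yankees = z_score(103, mean_wins, sd_wins)
z_mets = z_score(70, mean_wins, sd_wins)

print(f"Yankees: z = {z_yankees:.2f}")  # positive: above the mean
print(f"Mets:    z = {z_mets:.2f}")     # negative: below the mean
```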

2.1A Comparing z-scores Two basketball players are arguing about who is a better player. One is a top rebounder that averages 12 rebounds per game. The other is an assist leader with an average of 8.7 assists per game. The distribution of average rebounds per game in the league has a mean of 11.3 and a standard deviation of 1.74. The distribution of average assists per game in the league has a mean of 8.1 and a standard deviation of 1.47. Determine who is the better player based on their stats.
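Because rebounds and assists are measured on different scales, one reasonable approach is to compare each player's z-score within his own statistic. The sketch below does only that arithmetic; judging who is "better" from z-scores alone is an assumption of this approach, not a rule.

```python
def z_score(value, mean, sd):
    return (value - mean) / sd

# Values taken from the problem statement.
z_rebounder = z_score(12, 11.3, 1.74)   # rebounds per game
z_assister = z_score(8.7, 8.1, 1.47)    # assists per game

print(f"Rebounder: z = {z_rebounder:.3f}")
print(f"Assist leader: z = {z_assister:.3f}")
print("Larger z-score:", "assist leader" if z_assister > z_rebounder else "rebounder")
```

The two z-scores are very close, so the interpretation should mention that the difference is small.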

2.1A Sample: The characteristics of speed and strength are considered to be of equal importance to the team in selecting a player for the position. Based on the information about the means and standard deviations of the speed and strength data for all players and the measurements listed in the table below for Players A and B, which player should the team select if the team can only select one of the two players? Justify your answer.

Measurement | Player A | Player B | Mean (x̄) | Standard deviation (s_x)
Time to run 40 yds | 4.42 s | 4.57 s | 4.60 s | 0.15 s
Weight lifted | 370 lbs | 375 lbs | 310 lbs | 25 lbs
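A sketch of one way to justify the choice: standardize each measurement, flip the sign for the 40-yard time (a lower time is better), and add the two z-scores because speed and strength count equally. Adding z-scores is an assumption about how to combine the two traits; the names below are illustrative.

```python
def z_score(value, mean, sd):
    return (value - mean) / sd

players = {"A": {"time": 4.42, "weight": 370},
           "B": {"time": 4.57, "weight": 375}}

totals = {}
for name, p in players.items():
    speed = -z_score(p["time"], 4.60, 0.15)    # negated: faster = smaller time
    strength = z_score(p["weight"], 310, 25)
    totals[name] = speed + strength
    print(f"Player {name}: speed {speed:.1f}, strength {strength:.1f}, total {totals[name]:.1f}")

print("Higher combined z-score: Player", max(totals, key=totals.get))
```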

2.1B Transforming data The ______________ is just transformed data. We can analyze the ___________ of transformations on a distribution. Transformations occur when you add, subtract, multiply, or divide _________________ by a number.

When you add or subtract a constant, the shape and spread ___________________. The center will change in the same way (_______________________________________) and so will each location value.

When you multiply or divide by a constant, the shape will ________________________. The center, spread, and location will change in the same way (_______________________________________________).

Sample: Test Scores Here are a graph and table of summary statistics for a sample of 30 test scores. The maximum possible score on the test was 50 points.

Variable | n | x̄ | s_x | Min | Q1 | M | Q3 | Max | IQR | Range
Score | 30 | 35.8 | 8.17 | 12 | 32 | 37 | 41 | 48 | 9 | 36

a.) Suppose that the teacher was nice and added 5 points to each test score. How would this change the shape, center, and spread of the distribution?

b.) Suppose that the teacher wanted to convert the original test scores to percents. Since the test was out of 50 points, how would this change the shape, center, and spread of the distribution?
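To see both kinds of transformation in action, the sketch below uses a small made-up list of scores (hypothetical data, not the 30 scores from the table) and compares summaries before and after adding 5 and after converting to percents (multiplying by 2, since the test is out of 50).

```python
from statistics import mean, median, stdev

scores = [12, 28, 32, 35, 37, 38, 41, 44, 48]  # hypothetical scores out of 50

plus_five = [x + 5 for x in scores]   # add a constant
percents = [x * 2 for x in scores]    # multiply by a constant (100/50 = 2)

for label, data in [("original", scores), ("+5", plus_five), ("percent", percents)]:
    print(f"{label:9s} mean={mean(data):6.2f} median={median(data):5.1f} "
          f"sd={stdev(data):6.2f} range={max(data) - min(data)}")
```

Adding 5 shifts the mean and median by 5 and leaves the standard deviation and range alone; doubling multiplies all four by 2.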

When combining transformations, write an equation to illustrate what is happening. Solve the equation for the _____________________ then apply the transformations.

[Figure: dotplot of the test scores (Collection 1); horizontal axis: Score, from 10 to 50]

Sample: Taxi Cabs In 2010, Taxi Cabs in New York City charged an initial fee of $2.50 plus $2 per mile. In equation form, fare = 2.50 + 2(miles). At the end of a month a businessman collects all of his taxi cab receipts and calculates some numerical summaries. The mean fare he paid was $15.45 with a standard deviation of $10.20. What are the mean and standard deviation of the lengths of his cab rides in miles?
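Solving fare = 2.50 + 2(miles) for miles gives miles = (fare - 2.50) / 2. Subtracting 2.50 shifts the mean but not the standard deviation, and dividing by 2 divides both. A short sketch of that arithmetic:

```python
mean_fare, sd_fare = 15.45, 10.20  # dollars, from the problem statement

# miles = (fare - 2.50) / 2
mean_miles = (mean_fare - 2.50) / 2   # center: shifted, then scaled
sd_miles = sd_fare / 2                # spread: only the scaling matters

print(f"mean length = {mean_miles:.3f} miles")
print(f"standard deviation = {sd_miles:.2f} miles")
```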

2.1B Density curves In chapter 1, we developed a guideline for analyzing quantitative data.

1. Graph your data: dotplot, stemplot, or histogram
2. Analyze the distribution (shape, center, spread)
3. Calculate the numerical summaries

Now we add:

4. Sometimes the overall pattern of a large distribution is so regular that it can be described by a smooth curve. This curve is called a density curve.

A density curve will always be ________________ the horizontal axis.

A density curve will always have an _____________ below it.

Density curves come in many ____________ and will describe the overall pattern of the distribution.

No set of data is described ______________ by the density curve.

It is an approximation that is easy and ________________________ to use.

2.1B Mean and median of density curves The median of a density curve can be found by dividing the area under the curve into __________ _____________________. The mean of a density curve is the ____________________ of the curve, where it would balance if it were made of solid material.

If the distribution is ____________________, the mean and median will be the same. If the distribution is skewed, the mean will be pulled toward the ________________.

2.2A Normal curves and normal distributions

One special type of density curve is the _____________ which describes a normal distribution. Normal distributions have special properties and, contrary to their name, they are ______ normal; in fact, they are unusual.

All normal curves are __________________________________________________.

The mean is located in the ___________ of the curve and is the same as the median.

Changing the mean without changing the standard deviation just moves the distribution along the ___________________.

The standard deviation controls the ______________ of the distribution.

The mean and standard deviation define the ___________ of the normal curve.

We should be able to eyeball the curve to determine the _______________________. The point on the curve where the slope begins to ____________ will be the standard deviation.

Normal distributions are good approximations of many kinds of __________ outcomes. Many statistical inference procedures are based on ________________________.

2.2A Empirical rule In a normal distribution with mean μ and standard deviation σ:

68% of the observations fall within one standard deviation (1σ) of the mean (μ)

95% of the observations fall within two standard deviations of the mean

99.7% of the observations fall within three standard deviations of the mean
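The three percentages are rounded areas under the standard Normal curve. They can be checked with `statistics.NormalDist` from Python's standard library:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

areas = {}
for k in (1, 2, 3):
    # Area between k standard deviations below and above the mean.
    areas[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {100 * areas[k]:.1f}%")
```

The exact areas differ slightly from 68, 95, and 99.7, which is why the rule is only an approximation.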

Sample: Batting Averages In a previous example about batting averages for Major League Baseball players in 2009, the mean of the 432 batting averages was 0.261 with a standard deviation of 0.034. Suppose that the distribution is exactly Normal with μ = 0.261 and σ = 0.034.

(a) Sketch a Normal density curve for this distribution of batting averages. Label the points that are 1, 2, and 3 standard deviations from the mean.

(b) What percent of the batting averages are above 0.329? Show your work.

(c) What percent of the batting averages are between 0.227 and 0.295? Show your work.
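For parts (b) and (c), the first step is to see how many standard deviations each cutoff is from the mean; the empirical rule then gives the percent. A sketch of that reasoning (the dictionary `within` simply stores the three empirical-rule percentages):

```python
mu, sigma = 0.261, 0.034

def sds_from_mean(x):
    """How many standard deviations x lies from the mean (rounded to 2 places)."""
    return round((x - mu) / sigma, 2)

within = {1: 68.0, 2: 95.0, 3: 99.7}  # empirical rule percentages

# (b) 0.329 is 2 standard deviations above the mean, so the area above it
# is half of what lies outside the middle 95%.
k_b = sds_from_mean(0.329)
pct_above_329 = (100 - within[2]) / 2

# (c) 0.227 and 0.295 are 1 standard deviation below and above the mean.
k_low, k_high = sds_from_mean(0.227), sds_from_mean(0.295)
pct_between = within[1]

print(k_b, pct_above_329)
print(k_low, k_high, pct_between)
```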

2.2B Standard Normal distribution A normal distribution where the mean is ___ and the standard deviation is __ is called a _______ normal distribution. ________________ in this distribution has been translated into a standardized score (_____________).

2.2B Standard normal table

Usually we can use the Empirical rule to help us determine the area of a shaded region under the curve, but we have a ___________ that will allow us to more accurately find the area of a shaded region under the curve called a ________________.

1. Shade the appropriate region on the graph and use the table to find the area.

a. z < 0.58 b. z < -1.12

2. Shade the appropriate region on the graph and use the table to find the area.

a. z > 0.26 b. z > -1.58

3. Shade the appropriate region on the graph and use the table to find the area.

a. -1.19 < z < 0.26 b. -0.58 < z < 2.09

4. Use the table to find the z-score.

a. 20th percentile b. 76th percentile
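Table answers can be double-checked in Python, where `NormalDist().cdf(z)` plays the role of the table entry (area to the left of z) and `inv_cdf` reads the table backwards. This is a checking aid under that assumption, not a replacement for showing the shaded sketch.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

areas = {
    "z < 0.58": Z.cdf(0.58),
    "z < -1.12": Z.cdf(-1.12),
    "z > 0.26": 1 - Z.cdf(0.26),                # area to the right = 1 - area to the left
    "z > -1.58": 1 - Z.cdf(-1.58),
    "-1.19 < z < 0.26": Z.cdf(0.26) - Z.cdf(-1.19),   # difference of two left areas
    "-0.58 < z < 2.09": Z.cdf(2.09) - Z.cdf(-0.58),
}
for region, area in areas.items():
    print(f"{region:18s} {area:.4f}")

z_20 = Z.inv_cdf(0.20)  # z-score with 20% of the area below it
z_76 = Z.inv_cdf(0.76)  # z-score with 76% of the area below it
print(f"20th percentile: z = {z_20:.2f}; 76th percentile: z = {z_76:.2f}")
```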

2.2B Sample: Serving Speed In the 2008 Wimbledon tennis tournament, Rafael Nadal averaged 115 miles per hour (mph) on his first serves. Assume that the distribution of his first serve speeds is Normal with a mean of 115 mph and a standard deviation of 6 mph. About what proportion of his first serves would you expect to exceed 120 mph?

State: Nadal’s first serve speeds have a Normal distribution with a mean of 115 mph and a standard deviation of 6 mph. Find the percentage of serves that exceed 120 mph.

Plan: The graph to the left shows the distribution with the area of interest shaded.

Do: Standardize:

Conclude: About ____% of Nadal’s first serves travel faster than 120 mph.

2.2B About what proportion of his first serves are between 100 and 110 mph?

State: Nadal’s first serve speeds have a Normal distribution with a mean of 115 mph and a standard deviation of 6 mph. Find the percentage of serves that are between 100 and 110 mph.

Plan: The graph to the left shows the distribution with the area of interest shaded.

Do: Standardize:

Conclude: About ____% of Nadal’s first serves travel between 100 and 110 mph.
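The "Do" step for both serving-speed questions can be sketched as follows. To mimic a table lookup, the z-scores are rounded to two decimal places before the area is found; that rounding is an assumption about how the table is used.

```python
from statistics import NormalDist

mu, sigma = 115, 6   # mph, from the problem statement
Z = NormalDist()     # standard Normal, standing in for the table

def standardize(x):
    """z-score rounded to two decimals, as you would look it up in a table."""
    return round((x - mu) / sigma, 2)

# Faster than 120 mph: area to the right of z.
z_120 = standardize(120)
p_above_120 = 1 - Z.cdf(z_120)

# Between 100 and 110 mph: difference of two left-hand areas.
z_100, z_110 = standardize(100), standardize(110)
p_between = Z.cdf(z_110) - Z.cdf(z_100)

print(f"z = {z_120}: P(speed > 120) = {p_above_120:.4f}")
print(f"z = {z_100} and {z_110}: P(100 < speed < 110) = {p_between:.4f}")
```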

2.2C Normal Distribution Calculations with a calculator

To find the area under the curve:

Press 2nd VARS (DISTR)

Choose 2 : normalcdf

Type in the lower bound

Type in the upper bound

Type in the mean

Type in the standard deviation

(A z-score/standard normal curve will have a mean of 0 and a standard deviation of 1.)

Enter
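For anyone checking work without the calculator, the same computation can be sketched in Python. The function below is a hypothetical stand-in written for these notes that takes its inputs in the same order as the steps above; it is not the calculator's own code.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the Normal curve between `lower` and `upper`.
    Defaults to the standard Normal curve (mean 0, standard deviation 1)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Area between two z-scores on the standard Normal curve.
print(round(normalcdf(-1.19, 0.26), 4))

# For "greater than", use a very large upper bound (here 1e99).
print(round(normalcdf(0.26, 1e99), 4))
```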

2.2C Normal Distribution Calculations with a calculator

Starting with the area under the curve BELOW the value, finding the value:

Press 2nd VARS (DISTR)

Choose 3 : invNorm

Type in the area below the value and under the curve

Type in the mean

Type in the standard deviation

(A z-score/standard normal curve will have a mean of 0 and a standard deviation of 1.)

Enter
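The reverse calculation can be sketched the same way. The function `invnorm` below is a hypothetical stand-in defined for these notes; it wraps `NormalDist.inv_cdf` from Python's standard library and takes the area BELOW the value, as in the steps above.

```python
from statistics import NormalDist

def invnorm(area_below, mean=0.0, sd=1.0):
    """Value that has `area_below` of the Normal distribution to its left.
    Defaults to the standard Normal curve, so the result is a z-score."""
    return NormalDist(mean, sd).inv_cdf(area_below)

print(round(invnorm(0.20), 2))  # z-score at the 20th percentile
print(round(invnorm(0.76), 2))  # z-score at the 76th percentile
```

If a problem gives the area ABOVE a value, subtract it from 1 first.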

2.2C Normal Distribution Calculations with a calculator

1. Use your calculator to find the appropriate value, show your work.

a. z < 0.58 b. z < -1.12

c. z > 0.26 d. z > -1.58

e. -1.19 < z < 0.26 f. -0.58 < z < 2.09

2. Use the calculator to find the z-score.

a. 20th percentile b. 76th percentile

2.2C Sample: Serving Speed In the 2008 Wimbledon tennis tournament, Rafael Nadal averaged 115 miles per hour (mph) on his first serves. Assume that the distribution of his first serve speeds is Normal with a mean of 115 mph and a standard deviation of 6 mph. About what proportion of his first serves would you expect to exceed 120 mph? Use the four-step process.

State: Nadal’s first serve speeds have a Normal distribution with a mean of 115 mph and a standard deviation of 6 mph. Find the percentage of serves that exceed 120 mph.

Plan: The graph to the left shows the distribution with the area of interest shaded.

Do: Using the calculator:

Conclude: About ____% of Nadal’s first serves travel faster than 120 mph.

2.2C About what proportion of his first serves are between 100 and 110 mph?

State: Nadal’s first serve speeds have a Normal distribution with a mean of 115 mph and a standard deviation of 6 mph. Find the percentage of serves that are between 100 and 110 mph.

Plan: The graph to the left shows the distribution with the area of interest shaded.

Do: Calculator only:

Conclude: About ____% of Nadal’s first serves travel between 100 and 110 mph.
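Without rounding the z-scores first, the two serving-speed areas come out slightly different from the table answers. A Python sketch of the unrounded calculation, working directly with the N(115, 6) distribution:

```python
from statistics import NormalDist

serve_speed = NormalDist(mu=115, sigma=6)  # mph, from the problem statement

# Faster than 120 mph: everything to the right of 120.
p_over_120 = 1 - serve_speed.cdf(120)

# Between 100 and 110 mph: area left of 110 minus area left of 100.
p_100_to_110 = serve_speed.cdf(110) - serve_speed.cdf(100)

print(f"P(speed > 120)       = {p_over_120:.4f}")
print(f"P(100 < speed < 110) = {p_100_to_110:.4f}")
```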

2.2C Sample: Heights of three-year-old females According to http://www.cdc.gov/growthcharts/, the heights of 3 year old females are approximately Normally distributed with a mean of 94.5 cm and a standard deviation of 4 cm. What is the third quartile of this distribution?

State: The heights of 3 year old females have a distribution N(94.5, 4). The third quartile is the value with 75% of the distribution to the left.

Plan: The graph shows the distribution with the area of interest shaded.

Do: Using the calculator:

Conclude: About 75% of three year old females are shorter than ______ cm.
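The third quartile is an inverse calculation: start with an area of 0.75 to the left and find the height. A short Python sketch using the N(94.5, 4) model from the problem:

```python
from statistics import NormalDist

heights = NormalDist(mu=94.5, sigma=4)  # cm, from the problem statement

q3 = heights.inv_cdf(0.75)  # height with 75% of the distribution below it
print(f"Third quartile = {q3:.1f} cm")
```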

2.2D Assessing Normality Normal distributions provide good models for some distributions of real data. Examples include SAT scores, IQ scores, highway gas mileage of all 2009 Corvette convertibles, weight of a 9-oz bag of cookies, etc. It can be risky, though, to _________ that a distribution is normal without actually ________________ the data. We will use the empirical rule (________________) to determine if a distribution is approximately normal.

Sample: No Space in the Fridge? The measurements listed below describe the useable capacity (in cubic feet) of a sample of 36 side-by-side refrigerators (source: Consumer Reports, May 2010). Are the data close to Normal?

12.9 13.7 14.1 14.2 14.5 14.5 14.6 14.7 15.1 15.2 15.3 15.3 15.3 15.3 15.5 15.6 15.6 15.8 16.0 16.0 16.2 16.2 16.3 16.4 16.5 16.6 16.6 16.6 16.8 17.0 17.0 17.2 17.4 17.4 17.9 18.4

Step one: Graph the data. Here is a histogram of these data. It seems roughly symmetric and bell shaped.

Step two: On your calculator, calculate the numerical summaries.

Step three: Calculate the values one, two, and three standard deviations from the mean.

Step four: Calculate the percentage of observations for each interval. Conclude if it is approximately normal or not.
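Steps two through four can be sketched in Python with the 36 refrigerator capacities listed above. The sketch uses the sample standard deviation (`statistics.stdev`); the function name `percent_within` is made up for this illustration.

```python
from statistics import mean, stdev

capacity = [12.9, 13.7, 14.1, 14.2, 14.5, 14.5, 14.6, 14.7, 15.1, 15.2, 15.3, 15.3,
            15.3, 15.3, 15.5, 15.6, 15.6, 15.8, 16.0, 16.0, 16.2, 16.2, 16.3, 16.4,
            16.5, 16.6, 16.6, 16.6, 16.8, 17.0, 17.0, 17.2, 17.4, 17.4, 17.9, 18.4]

# Step two: numerical summaries.
xbar, s = mean(capacity), stdev(capacity)
print(f"mean = {xbar:.3f}, standard deviation = {s:.3f}")

def percent_within(k):
    """Percent of observations within k standard deviations of the mean."""
    low, high = xbar - k * s, xbar + k * s          # Step three: interval endpoints
    inside = sum(1 for x in capacity if low <= x <= high)
    return 100 * inside / len(capacity)             # Step four: percent in the interval

for k in (1, 2, 3):
    print(f"within {k} SD: {percent_within(k):.1f}%  (empirical rule: {[68, 95, 99.7][k - 1]}%)")
```

If the three percentages are close to 68, 95, and 99.7, that supports calling the distribution approximately normal.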

2.2D Normal probability plots

This is a plot that can also __________________.

It is a scatterplot with the ___________________ on the horizontal axis and the _____________ on the vertical axis.

If the points on the plot lie close to a _____________ _________, the plot indicates that the data are normal.

Sample: State land areas The histogram and Normal probability plot below display the land areas for the 50 states. Is this distribution approximately Normal?

Sample: NBA free throw percentage This is an example of a distribution that is skewed to the left. Notice that the lowest free throw percentages are to the left of what we would expect and the highest free throw percentages are not as far to the right as we would expect.

2.2C Sample: No Space in the Fridge?

The measurements listed below describe the useable capacity (in cubic feet) of a sample of 36 side-by-side refrigerators (source: Consumer Reports, May 2010). Are the data close to Normal?

12.9 13.7 14.1 14.2 14.5 14.5 14.6 14.7 15.1 15.2 15.3 15.3 15.3 15.3 15.5 15.6 15.6 15.8 16.0 16.0 16.2 16.2 16.3 16.4 16.5 16.6 16.6 16.6 16.8 17.0 17.0 17.2 17.4 17.4 17.9 18.4

Sketch a normal probability plot to determine if the distribution is approximately normal:
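The points for a normal probability plot can be computed as below. This sketch pairs each sorted data value with an expected z-score found from the plotting position (i - 0.5)/n, which is one common convention and an assumption here; calculators and software may use a slightly different formula. The correlation is computed only as a rough numerical check of how close the points are to a straight line.

```python
from statistics import NormalDist, mean

capacity = sorted([12.9, 13.7, 14.1, 14.2, 14.5, 14.5, 14.6, 14.7, 15.1, 15.2, 15.3,
                   15.3, 15.3, 15.3, 15.5, 15.6, 15.6, 15.8, 16.0, 16.0, 16.2, 16.2,
                   16.3, 16.4, 16.5, 16.6, 16.6, 16.6, 16.8, 17.0, 17.0, 17.2, 17.4,
                   17.4, 17.9, 18.4])
n = len(capacity)

# Expected z-score for the i-th smallest value (assumed plotting position).
expected_z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation, written out so no extra library is needed."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Points to plot: data value (horizontal) against expected z-score (vertical).
for x, z in zip(capacity[:3], expected_z[:3]):
    print(f"({x}, {z:.2f})")
r = correlation(capacity, expected_z)
print(f"correlation of the plotted points = {r:.3f}")
```

A correlation near 1 means the plotted points fall close to a line, but the sketch of the plot itself is still what the question asks for.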