Unit 1: Descriptive Statistics

Day 1: Dot Plots & Basic Statistics

Mean: The “average” of the numbers. Add all the numbers together and divide by how many numbers there are.

Median: The middle number. Order the numbers from least to greatest to find it (if there are two middle numbers, average them). 50% of the data is above it and 50% is below it.

Mode: The value that occurs most often. A data set can have one mode, more than one mode, or no mode.

Range: Highest value minus lowest value.

Dot Plots

Dot Plot – A data display that represents the data values as dots over a number line. Dot plots are useful for seeing both the most repeated data value and the spread of the data.

Things to remember when constructing dot plots:

- draw and label the horizontal axis and add a title

- draw a dot to represent a single occurrence of data

- choose an appropriate scale which goes up by a consistent value

Example 1: Students in Mr. Z's class were surveyed as to the number of brothers and sisters in their families (not counting themselves). The results are displayed in the table below.

a) Make a dot plot.

Student      Number of Brothers/Sisters
Allison      2
Bernard      4
Carlos       3
Catherine    2
Delia        2
Dion         1
Emma         0
Fiona        2
Harley       3
Ian          2
Justin       1
Paul         1
Rhianna      3
Stanley      0
Vincent      4

b) Which is the most common number of brothers and sisters in this class?

c) How many students have a total of 3 brothers and sisters?

d) How many students, in total, were surveyed?

e) How many students have more than two brothers and sisters?

f) What percent of students have more than two brothers and sisters? (Round to nearest tenth)
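The tally behind the dot plot in Example 1 can be sketched in Python. This is only an illustrative check, not part of the original worksheet: the variable names are made up, and the text plot is drawn sideways (one row per value) instead of over a horizontal number line.

```python
from collections import Counter

# Number of brothers/sisters for each student, copied from the table
siblings = [2, 4, 3, 2, 2, 1, 0, 2, 3, 2, 1, 1, 3, 0, 4]

counts = Counter(siblings)

# Text dot plot: one dot per occurrence of each value
for value in range(min(siblings), max(siblings) + 1):
    print(f"{value} | {'• ' * counts[value]}")

total = len(siblings)                                # part d
most_common = counts.most_common(1)[0][0]            # part b (the mode)
with_three = counts[3]                               # part c
more_than_two = sum(1 for s in siblings if s > 2)    # part e
percent_more_than_two = round(100 * more_than_two / total, 1)  # part f

print(total, most_common, with_three, more_than_two, percent_more_than_two)
```

Counting the dots in each row by hand should give the same answers as the printed summary line.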

How do I use this Calculator?

I am on some weird looking screen… how do I get back to my home screen?
Press 2nd then mode (the ________) button!

How do I put data into my calculator?
First you will need to put your data into your calculator. To do this press ________, then hit ________, and enter your data into L1.

How do I get rid of the numbers in L1?
If you already have data in L1 and you need to get rid of it, press ________________.

How do I find the mean, median, Q1, Q3 or standard deviation?
Press ________ and move right to the ________ menu and hit ______________.
If your data is only in L1, then your calculator should look like this: [calculator screen] OR [calculator screen]
If your data is in L1 and the frequencies are in L2, then your calculator will look like this: [calculator screen] OR [calculator screen]

How do I type L1 and L2 into my calculator?
Hit the 2nd button and ______ for L1 or 2nd _______ for L2.

1) The dot plot shows the number of miles that Mrs. Knapp traveled in her new pull-behind RV. Find the mean, median, mode, and range of the data set.

2) During each marking period, there are 5 tests. If John needs a 65 average to pass this marking period and his first four grades are 60, 72, 55, and 80, what is the lowest score he can earn on the last test to have a passing average?
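One way to reason about problem 2 is to work backward from the total: a mean of 65 over 5 tests requires the five scores to add up to 65 × 5. The short sketch below, with illustrative variable names, does that arithmetic.

```python
passing_average = 65
num_tests = 5
first_four = [60, 72, 55, 80]

# The five scores must total at least (average) x (number of tests)
required_total = passing_average * num_tests

# Whatever is still missing from that total must come from the last test
lowest_last_score = required_total - sum(first_four)

print(lowest_last_score)
```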

3) The prices of seven race cars sold last week are listed in the table below.

a) What is the mean value of these race cars, in dollars?

b) What is the median value of these race cars, in dollars?

Day 2: Histograms

Warm-Up:

1) A basketball team consists of 15 girls. The table shows the number of points each player

scored in one season. Which interval contains the median for these data?

2) Use the frequency table below, which represents the distribution of weight, in pounds, of 32 students, to answer the following questions.

a.) Which interval contains the median?

b.) Which interval contains the lower quartile?

c.) Which interval contains the upper quartile?

3) Maria could not remember her scores from five math tests. She did remember the mean was exactly 80, the median was 81, and the mode was 88. If all her scores were integers with 100 the highest score possible and 0 the lowest score possible, what was the lowest score she could have received on any one test?
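Problem 3 can be checked by brute force. The sketch below tries every sorted list of five integer scores that fits the stated mean and median and keeps the smallest possible lowest score. It assumes that "the mode was 88" means 88 is the only mode; that reading is an assumption, not something the problem states outright.

```python
from statistics import multimode

lowest = None
c = 81  # sorted scores a <= b <= c <= d <= e, so the median is c

for a in range(0, c + 1):
    for b in range(a, c + 1):
        for d in range(c, 101):
            # A mean of exactly 80 means the five scores total 400
            e = 400 - (a + b + c + d)
            if e < d or e > 100:
                continue
            scores = [a, b, c, d, e]
            # Assumption: 88 must be the single most frequent score
            if multimode(scores) == [88]:
                lowest = a if lowest is None else min(lowest, a)

print(lowest)
```

If ties for the mode were allowed, the condition on `multimode` would need to be loosened and the answer could change.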

Histograms – Also referred to as frequency histograms, these are connected bar graphs in which each interval is represented by the width of its bar and the frequency is represented by the height.

Things to remember when constructing histograms:

- complete any tables (if necessary)

- title the histogram

- draw and label the vertical and horizontal axes

- draw vertical bars for each interval (no spaces between the bars)

- if the first interval does not start at zero, include a break in the axis

Example 1: The following data represents the heights of 15 students in a certain class:

63, 59, 71, 63, 59, 68, 61, 60, 69, 55, 64, 70, 64, 68, 72

a) Complete the table:

b) Construct a frequency histogram based on the data.

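Interval    Frequency
________    ________
________    ________
________    ________
________    ________
________    ________

Interval    Cumulative Frequency
55 – 58     ________
55 – 62     ________
55 – 66     ________
55 – 70     ________
55 – 74     ________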
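The two tables in Example 1 can be filled in programmatically. The sketch below assumes class intervals of width 4 starting at 55 (55–58, 59–62, and so on); those intervals are inferred from the cumulative intervals listed above and are not given explicitly in the worksheet.

```python
heights = [63, 59, 71, 63, 59, 68, 61, 60, 69, 55, 64, 70, 64, 68, 72]

# Assumed class intervals, inferred from the cumulative-frequency table
intervals = [(55, 58), (59, 62), (63, 66), (67, 70), (71, 74)]

# Frequency: how many heights fall inside each interval (endpoints included)
frequencies = [sum(1 for h in heights if low <= h <= high)
               for low, high in intervals]

# Cumulative frequency: running total of the frequencies
cumulative = []
running_total = 0
for f in frequencies:
    running_total += f
    cumulative.append(running_total)

# Text histogram: one '#' per student, plus the cumulative count
for (low, high), f, c in zip(intervals, frequencies, cumulative):
    print(f"{low}-{high}: {'#' * f:<5} frequency {f}, cumulative (55-{high}) {c}")
```

The last cumulative frequency should equal the number of students, which is a quick way to check a completed table.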

Example 2: The following data represents the number of minutes late 17 students were to their math class:

9, 25, 30, 31, 34, 36, 37, 42, 45, 47, 49, 43, 55, 58, 61, 63, 67

a) Complete the table:

b) Construct a frequency histogram based on the data.

c) What percent of students, rounded to the nearest thousandth, were between 31 and 40 minutes late?

d) What percent of students were at least 41 minutes late? Justify your answer.

Interval    Frequency
0-9         ________
10-19       ________
20-29       ________
30-49       ________
50-59       ________
60-69       ________
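Parts c and d of Example 2 are both "count, then divide by the total" questions. The sketch below shows that pattern; it assumes "between 31 and 40" includes both endpoints, which the worksheet does not spell out.

```python
minutes_late = [9, 25, 30, 31, 34, 36, 37, 42, 45, 47, 49, 43, 55, 58, 61, 63, 67]
n = len(minutes_late)

# Part c: between 31 and 40 minutes late (endpoints included, an assumption)
between_31_and_40 = sum(1 for m in minutes_late if 31 <= m <= 40)
percent_c = round(100 * between_31_and_40 / n, 3)

# Part d: "at least 41" means 41 minutes or more
at_least_41 = sum(1 for m in minutes_late if m >= 41)
percent_d = round(100 * at_least_41 / n, 3)

print(f"{between_31_and_40} of {n} -> {percent_c}%")
print(f"{at_least_41} of {n} -> {percent_d}%")
```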

Practice Problems

_____ 1) The histogram shows the distribution of the number of children in the families of the students in

a ninth-grade class. The mode of the set of data in the histogram is

(1) 5 (2) 2

(3) 3 (4) 7

_____ 2) From January 3 to January 7, Buffalo recorded the following daily high temperatures: 5°, 7°, 6°, 5°, and 7°. Which statement is true?

(1) mean = median (2) median = mode

(3) mean = mode (4) mean < median

3) The histogram shows the grade distribution for a mathematics test given to Ms. Keith's class.

How many students are in the class? What is the mode?

4) The frequency table shows the distribution of weight, in pounds, of 32 students. What

percentage of students weigh more than 119 pounds?

Day 3: Box Plots

Box Plot – Shows the minimum, maximum, first quartile, median, and third quartile of numerical data. A box plot shows the range of a data set and makes the median easy to see.

Things to remember when constructing box plots:

- find Q1, median, Q3, min and max values

- choose and set up an appropriate scale on a horizontal axis

- the vertical lines in the box represent Q1, median, and Q3, in that order

- horizontal lines are needed to connect the box to the minimum and maximum values

The website Rate My Phone conducts reviews of smartphones. One aspect of the phones that is tested is battery life. The minutes of battery life for the newest 25 phones are recorded in the table below. Draw a box plot to represent the data.

Box Plot Vocabulary:

First Quartile: __________________________________________________________________ __________________________________________________________ _____________

Third Quartile: __________________________________________________________________ __________________________________________________________ _____________

Interquartile Range: __________________________________________________________________ __________________________________________________________ _____________

Outliers: __________________________________________________________________ __________________________________________________________ _____________

Example 1: The accompanying diagram shows a box-and-whisker plot of student test scores on

last year’s Algebra midterm examination.

According to the box and whisker plot shown above, what is:

a) the first quartile: ______ b) median: ______ c) the third quartile: ______

d) the range: ______ e) interquartile range: ______ f) maximum value: ______

Example 2: The heights, in inches, of 12 students are listed below.

61, 67, 72, 62, 65, 59, 60, 79, 60, 61, 64, 63

Peg thinks 79 is an outlier. Is she correct? Justify your answer.
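A common way to justify an outlier is the 1.5 × IQR rule: a value is flagged if it lies more than 1.5 interquartile ranges below Q1 or above Q3. The sketch below applies that rule to Example 2. It assumes this is the rule the class uses, and it finds quartiles as the medians of the lower and upper halves of the sorted data; the helper name `quartiles` is made up for illustration.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method.
    When the count is odd, the middle value is left out of both halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(ordered), median(upper)

heights = [61, 67, 72, 62, 65, 59, 60, 79, 60, 61, 64, 63]

q1, med, q3 = quartiles(heights)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # anything below this is an outlier
upper_fence = q3 + 1.5 * iqr   # anything above this is an outlier

outliers = [h for h in heights if h < lower_fence or h > upper_fence]

print(q1, med, q3, iqr)
print(lower_fence, upper_fence, outliers)
```

Other quartile conventions exist and can give slightly different fences, so the result is only as reliable as the match between this method and the one used in class.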

Example 3:

a) Using the following data sets, draw a box plot for each.

Data Set 1: 12, 14, 14, 12, 16, 13, 11, 14, 18

Data Set 2: 16, 14, 13, 13, 18, 12, 11, 12, 12

Data Set 1:

Data Set 2:

b) Answer the following questions using the given box plots. Explain your reasoning for each.

1) Which data set has the greater median?

2) Which data set has the greater interquartile range?

3) Describe all the similarities and differences between the two data sets.

4) Given just a box plot, can you determine the mean?

Day 4: Center, Shape, and Spread

Data sets can be compared by examining the differences and similarities between

measures of center and spread.

Measures of Center: Mean and Median

1.)

a.) Which player appears to be an outlier for the data above? Verify it using the outlier formula.

b.) Determine the mean and median.

c.) Which measure of center (mean or median) is a better representation of the data?

Key Concepts: 1) When analyzing a set of data which is closely related you can use ________________________.

2) If there are any data values in the set that are much larger or smaller than the rest of the set

use the _________________ to determine the measure of center.

Shape:

Skewed Left
Tail to the _______________.
Concentrated toward the _________________ part of the data.
Measure of Center: ______________
Mean _____________ Median
Example: x̄ = 30, Med = 31

Skewed Right
Tail to the _______________.
Concentrated toward the _________________ part of the data.
Measure of Center: ______________
Mean _____________ Median
Example: x̄ = 26, Med = 25

Symmetric
Measure of Center: ________________
Mean _____________ Median
Example: x̄ = 30, Med = 30

Uniform
Measure of Center: ______________
Mean _____________ Median
Example: x̄ = 30, Med = 30

Spread: Range, Interquartile Range and Standard Deviation

In order to get a more complete view of a set of data, it is sometimes helpful to consider the amount of spread (variation) within the set of data. Range and interquartile range are two measures of spread. Standard deviation is another.

The Deviant Bowler

a) The two tables below show Kate's and Gary's bowling scores for 6 different games. Both Kate and Gary have a bowling mean (average) of 185, but what differences in their scores do you notice?

b) Find the standard deviation for Kate and Gary. (Round to the nearest tenth.)

c) Based on your answer from part a, what does a larger standard deviation mean?

Kate’s Score (x)    Gary’s Score (x)
180                 135
182                 155
185                 185
185                 185
188                 200
190                 250
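The comparison in The Deviant Bowler can be sketched with Python's `statistics` module. The worksheet does not say whether the population or the sample standard deviation is intended, so the sketch prints both: `pstdev` divides by n and `stdev` divides by n − 1.

```python
from statistics import mean, pstdev, stdev

kate = [180, 182, 185, 185, 188, 190]
gary = [135, 155, 185, 185, 200, 250]

for name, scores in (("Kate", kate), ("Gary", gary)):
    print(
        f"{name}: mean = {mean(scores)}, "
        f"population SD = {round(pstdev(scores), 1)}, "   # divides by n
        f"sample SD = {round(stdev(scores), 1)}"          # divides by n - 1
    )
```

Whichever version is used, Gary's value comes out roughly ten times Kate's, which matches how much farther his scores sit from the shared mean of 185.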

2) Two hockey teams recorded the number of goals scored each game in the tables below. Find the standard deviation for each team. Then answer the questions below.

Ice Kings S.D. : ____________

Gliders S.D. : ____________

Key Concepts:

1) ___________, _______________________, and ______________________ are three measures of __________________ (variation, dispersion, etc.).

2) The larger the standard deviation, the larger the __________________.

3) When describing spread, use __________ or ___________ when the data is skewed.

4) When describing spread, use _____________ ________________ when the data is symmetric.

If I am asked to find shape, center, spread…

I MUST find _______________ first

Day 5: Putting it all Together

Warm-Up:

1) Find the following statistical measures for the given data. Round measures to the nearest tenth, if necessary: 7, 14, 6, 49, 15, 74, 24, 64, 75, 74, 24, 63, 24, 64, 5, 64, 12, 5, 6, 25

Mean: ______ Median: ______ First quartile: ______

Third quartile: _____ Range: ______ IQR: _____ Standard Dev: _____
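The warm-up measures can be cross-checked with a short script. The sketch below uses the median-of-halves method for the quartiles and prints both the population and the sample standard deviation, since the worksheet does not say which one is expected; the helper name `quartiles` is made up for illustration.

```python
from statistics import mean, median, pstdev, stdev

def quartiles(data):
    """Return (Q1, median, Q3) as medians of the lower half, the whole
    list, and the upper half of the sorted data."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered), median(ordered[-half:])

data = [7, 14, 6, 49, 15, 74, 24, 64, 75, 74,
        24, 63, 24, 64, 5, 64, 12, 5, 6, 25]

q1, med, q3 = quartiles(data)
data_range = max(data) - min(data)
iqr = q3 - q1

print("Mean:", round(mean(data), 1))
print("Median:", med, " Q1:", q1, " Q3:", q3)
print("Range:", data_range, " IQR:", iqr)
print("Population SD:", round(pstdev(data), 1))
print("Sample SD:", round(stdev(data), 1))
```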

How do I plot a histogram in my calculator?
First put your data into a list (L1). Then press the button called “stat plot”… it is 2nd ________! Click on any stat plot and turn it on. Move the arrow down and click on the picture of a histogram under “type”. Make sure the list is correct for where you put your data. If you did it correctly, your screen should look like this: [calculator screen]

To actually have your calculator draw the histogram, you will need to press __________________!!

Why would I need to do this?
You can see the shape of the data and start to decide which measure of center would be best to use.

How do I plot a box plot in my calculator?
First put your data into a list (L1). Then press the button called “stat plot”… it is 2nd ________! Click on any stat plot and turn it on. Move the arrow down and click on the _________ picture of the box plot under “type”. Make sure the list is correct for where you put your data. If you did it correctly, your screen should look like this: [calculator screen]

To actually have your calculator draw the box plot, you will need to press __________________!!

**Fun fact: You can press the trace button and move left and right on your box plot to have the calculator tell you the 5 important measures!

Why would I need to do this?
If the box plot shows an *, then your data includes an ______________.

Ex. 1) The auto repair shop is concerned about the reliability of its lathe. A lathe is a rotating

piece of equipment used in restoring parts. The shop’s owner measured the rotations per minute

at various times over the course of a month. These are his measurements:

250, 251, 253, 253, 253, 254, 257, 257, 259, 259, 261, 263, 265, 270, 291

a. Describe center, shape, and spread. (Round to nearest tenth)

b. When reporting the center of this data, or the “typical RPM,” which would be more

appropriate, the mean or the median? Why?

Ex. 2) Using the following dot plot, determine the center and spread.

Ex. 3) As the Managing Grower at All-Organic Orchards, you know the farmer’s market will

only accept fruit that is above average size for your orchard. However, they do not define how to

calculate “average,” with a mean or with a median. You want to sell as much fruit as possible

for premium prices at the farmer’s market.

a. Your apple crop is represented by the first histogram and box plot above. To get the maximum

amount of fruit to the market, should you use the mean or the median for the “average?”

b. Your lemon crop is represented by the second histogram and box plot above. Why do you suppose

the right “whisker” is missing? Which average should you calculate?

c. Why might the farmers at other orchards feel like you were being deceptive in reporting the mean?

d. To best represent numerically the center, or the “typical” value, of a large quantity of data, when

would you use the mean and when would you use the median?

Day 6: Putting it all Together Continued

Ex. 1) Speed Trap: A statistically-minded state trooper named Suzy wondered if the speed distributions are similar for cars traveling northbound and for cars traveling southbound on an isolated stretch of interstate highway. She used a radar gun to measure the speed of all northbound cars and all southbound cars passing a particular location during a fifteen-minute period. Here are her results.

Northbound Cars

60 62 62 63 63

63 64 64 64 65

65 65 65 66 66

67 68 70 83

Draw box plots of these two data sets, and then use the plots and appropriate numerical summaries of the data to write a few sentences comparing the speeds of northbound cars and southbound cars at this location during the fifteen-minute time period.

*Use 1 number line for both box plots and be sure to check for outliers.

Southbound Cars

55 56 57 57 58

60 61 61 62 63

64 65 65 67 67

68 68 68 68 71
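The numerical summaries for the Speed Trap comparison can be sketched as follows. The code assumes the median-of-halves quartile method and the 1.5 × IQR outlier rule; the function name `summarize` is hypothetical and only used for illustration.

```python
from statistics import median

def summarize(data):
    """Return the five-number summary, the IQR, and any 1.5 x IQR outliers."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])       # median of the lower half
    q3 = median(ordered[-half:])      # median of the upper half
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": ordered[0], "q1": q1, "median": median(ordered),
        "q3": q3, "max": ordered[-1], "iqr": iqr,
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

northbound = [60, 62, 62, 63, 63, 63, 64, 64, 64, 65,
              65, 65, 65, 66, 66, 67, 68, 70, 83]
southbound = [55, 56, 57, 57, 58, 60, 61, 61, 62, 63,
              64, 65, 65, 67, 67, 68, 68, 68, 68, 71]

north = summarize(northbound)
south = summarize(southbound)
print("Northbound:", north)
print("Southbound:", south)
```

Comparing the two dictionaries side by side gives the medians, IQRs, and outliers needed for the written comparison.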

Ex. 2) Kirsten plays softball in the spring. Each game, she records the number of times she

reaches first base without being called out. Use the data in the table to solve problems a – e.

a. Create a dot plot showing the number of times Kirsten reached first base in each game.

b. What is the median number of times Kirsten reached first base?

c. Find the minimum, maximum, first quartile, and third quartile of the data set.

d. Create a box plot showing the number of times Kirsten reached first base.

e. Kirsten wants to analyze her performance using this data. She wants to understand the range of her

data and the frequency of different results. Which graph, the dot plot or the box plot, will be most

useful to Kirsten? Explain.

Ex. 3) Dr. Singh is a veterinarian. He records the weight of each pet. The weights of 10 German shepherds, all 4-year-old males, are in the table below, rounded to the nearest pound. Use this information to solve problems a – c.

a. Create a histogram showing the weights of Dr. Singh’s German shepherds.

b. Create a box plot showing the weights of the German shepherds.

c. Dr. Singh wants to analyze the weights of the German shepherds. He wants to understand the

center and spread of his data, so that he has a better idea of an expected weight for a 4-year-old

male German shepherd. Which graph would be most useful to Dr. Singh? Explain.

Day 7: Two-Way Frequency Tables

Information about people who are surveyed can be captured in two-way frequency tables. A two-way

frequency table is a table of data that separates responses by a characteristic of the respondents.

Let’s gather some class data about gender vs. social media preference and analyze the results.

Think before we start: Why might a dot plot, histogram, or box plot not be the best form of

representing this data?

          Twitter    Instagram    Snapchat
Male      _______    _________    ________
Female    _______    _________    ________

Follow-up questions:

1. What percentage of students in the class preferred Snapchat?

2. What percentage of males in the class preferred Snapchat?

3. Based on the sample, if there are 350 freshmen at Victor High School, how many of them will prefer Instagram?

Ex. 1) Cameron surveys students in his school who play sports, and asks them which sport they

prefer. He records the responses in the table below.

a) How many male students prefer soccer?

b) How many female students prefer basketball?

These are both examples of ______________________.

c) How many students prefer baseball?

d) How many females were surveyed?

These are both examples of ______________________.

e) What is the percentage of students, to the nearest tenth of a percent, surveyed who prefer

basketball?

f) What is the percentage, to the nearest percent, of students surveyed who prefer soccer?

These are both examples of ______________________.

g) What is the percentage, to the nearest percent, of females who prefer soccer?

h) Of the students who prefer basketball, what percentage, to the nearest hundredth of a

percent, are male?

These are both examples of ______________________.

Ex. 2) Abigail surveys students in different grades, and asks each student which pet they prefer.

The responses are in the table below. Round all answers to the nearest tenth of a percent.

a) What percent of 9th graders prefer dogs?

b) True or False: “About 9% of 10th graders prefer fish”

Justify your answer.

c) What percentage of students (to the nearest percent) like birds?

d) To the nearest tenth of a percent, what is the conditional relative frequency of the 10th graders

who like cats?

Ex. 3) Mr. Smith keeps track of his students’ homework completion. He keeps track of how many boys

and girls do not complete their homework. He puts students who don’t complete their homework into two

categories: first time offenders and repeat offenders. He uses a table to keep track of the results.

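         First-Time Offenders    Repeat Offenders    Total
Boys     ____________________    ________________    _____
Girls    ____________________    ________________    _____
Total    ____________________    ________________    _____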

a) In one month 36 girls and 12 boys did not do their homework for the first time. 12 girls and 30 boys

did not do their homework again. Put these figures in the table above.

b) How many students did not complete all of their homework assignments this month?

c) What percentage of the students who did not complete their homework were boys who were

First-Time Offenders?

d) Are boys or girls more likely to not complete their homework? Explain.
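The table in Ex. 3 can be completed by adding across rows and down columns. The sketch below uses plain dictionaries with illustrative names. It treats each count as a separate student, which is an assumption the problem does not state.

```python
# Joint frequencies from part a
first_time = {"Boys": 12, "Girls": 36}
repeat = {"Boys": 30, "Girls": 12}

# Marginal frequencies: row totals, column totals, and the grand total
row_totals = {group: first_time[group] + repeat[group] for group in first_time}
col_totals = {
    "First-Time Offenders": sum(first_time.values()),
    "Repeat Offenders": sum(repeat.values()),
}
grand_total = sum(row_totals.values())

# Part c: boys who were first-time offenders, as a percent of everyone in the table
percent_boys_first_time = round(100 * first_time["Boys"] / grand_total, 1)

print(row_totals, col_totals, grand_total, percent_boys_first_time)
```

The grand total should come out the same whether the row totals or the column totals are added, which is a useful check on any two-way table.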

Ex. 4) Complete the two-way table for the 9th graders’ school transportation survey:

Male Female Total

Walk 46

Car 28 45

Bus 12 27

Bike 17 69

Total 200

a) What percentage of 9th grade girls, to the nearest hundredth of a percent, walk to school?

b) Of all the 9th graders surveyed, what percentage (to the nearest percent) are girls who walk to

school?

c) What is the joint frequency of males who bussed to school?

d) What percent of males walk to school? (Round to the nearest thousandth of a percent.)

e) Based on this survey, if the sophomores follow the same trend as the freshmen, of the 275

sophomores, about how many will walk to school?

f) If 50 out of the 170 juniors in the school ride their bike to school, is this percentage higher or

lower than the freshmen? Justify your answer below.

Ex 5)

Day 8: Linear Regression

Warm-up: Type the following data into your calculator, with body mass in L1 and bite force in L2.

Example 1: Crocodiles and Alligators

Scientists are interested in finding out how different species adapt to finding food sources. One

group studied crocodilians to find out how their bite force was related to body mass and diet. The

table below displays the information they collected on body mass (in pounds) and bite force (in

pounds).

Crocodilian Biting

Species                    Body Mass (pounds)    Bite Force (pounds)
Dwarf crocodile            35                    450
Crocodile F                40                    260
Alligator A                30                    250
Caiman A                   28                    230
Caiman B                   37                    240
Caiman C                   45                    255
Crocodile A                110                   550
Nile crocodile             275                   650
Crocodile B                130                   500
Crocodile C                135                   600
Crocodile D                135                   750
Caiman D                   125                   550
Indian gharial crocodile   225                   400
Crocodile G                220                   1,000
American crocodile         270                   900
Crocodile E                285                   750
Crocodile F                425                   1,650
American alligator         300                   1,150
Alligator B                325                   1,200
Alligator C                365                   1,450

Data Source: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0031781#pone-0031781-t001

(Note: Body mass and bite force have been converted to pounds from kilograms and newtons, respectively.)

The scatter plot below displays the data on body mass and bite force for the crocodilians in the study.

Answer the following questions.

1. Describe the relationship between body mass and bite force for the crocodilians shown in the

scatter plot.

2. On the scatter plot above draw a line to represent the trend in the data. This line is sometimes

referred to as a “line of best fit”. Will everyone in the class have the same line?

3. Several students decided to draw lines to represent the trend in the data. Consider the lines

drawn by Sol, Patti, Marrisa, and Taylor, which are shown below. Which line best fits the data?

4. Write a rule for the function that you selected above where x is the body mass and y is the

bite force. (HINT: Start by using two points on your line to calculate slope).

[Scatter plot: Bite Force (pounds) vs. Body Mass (pounds)]

[Scatter plot with Sol's Line: Bite Force (pounds) vs. Body Mass (pounds)]

[Scatter plot with Patti's Line: Bite Force (pounds) vs. Body Mass (pounds)]

5. Put your rule (equation) into slope-intercept form if it is not already in that form. Explain

what your slope and y-intercept mean in the context of the problem.

6. Using your calculator, determine the linear regression model by following the steps below. Round all values to the nearest hundredth.

Finding the Linear Regression Line, Least-Squares Line, or Line of Best Fit using your calculator

Step 1: From your home screen, press STAT.

Step 2: From the STAT menu, select the EDIT option.

Step 3: Enter the x-values of the data set in L1.

Step 4: Enter the y-values of the data set in L2.

Step 5: Select STAT. Move cursor to the menu item CALC and then move the cursor to option 4 and then press enter twice.
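For comparison with the calculator steps, the least-squares line and the correlation coefficient can be computed directly from their standard formulas. The sketch below uses the crocodilian data from the table (body mass as x, bite force as y); the variable names are illustrative, and this is not meant to describe how the calculator works internally.

```python
from math import sqrt

body_mass = [35, 40, 30, 28, 37, 45, 110, 275, 130, 135,
             135, 125, 225, 220, 270, 285, 425, 300, 325, 365]
bite_force = [450, 260, 250, 230, 240, 255, 550, 650, 500, 600,
              750, 550, 400, 1000, 900, 750, 1650, 1150, 1200, 1450]

n = len(body_mass)
mean_x = sum(body_mass) / n
mean_y = sum(bite_force) / n

# Sums of squared deviations and of cross-products
sxx = sum((x - mean_x) ** 2 for x in body_mass)
syy = sum((y - mean_y) ** 2 for y in bite_force)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(body_mass, bite_force))

slope = sxy / sxx                       # a in y = ax + b
intercept = mean_y - slope * mean_x     # b in y = ax + b
r = sxy / sqrt(sxx * syy)               # correlation coefficient

print(f"y = {slope:.2f}x + {intercept:.2f}")
print(f"r = {r:.2f}")
```

The least-squares line always passes through the point (mean of x, mean of y), which gives a quick check on any line of best fit drawn by hand.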

The graphing calculator can find a linear regression model (equation) for any two sets of data. Some sets of data have a stronger linear relationship than others, though, and the stronger that relationship is, the better the linear regression equation models the data.

1) How can you tell from a scatter plot if there is a strong linear relationship between the two sets of data?

In addition to creating a line of best fit, the calculator can also provide information about how

strong the linear relationship is between two variables. It calculates a value called a correlation

coefficient. Below are seven scatter plots and their correlation coefficients.

a) What do the correlation coefficients for graphs A, B and E have in common? How does

this relate to the graphs?

b) What do the correlation coefficients for graphs C, D and G have in common? How does

this relate to the graphs?

c) What is true about the points in the scatter plots for graphs A and D? Do they represent a

strong linear relationship?

d) Graph F has a correlation coefficient of zero associated with it. Looking at the scatter

plot, what do you think r = 0 tells you about the relationship between the input and output

values?

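[Seven scatter plots, labeled A through G, with their correlation coefficients]

Graph    Correlation coefficient
A        r = -1
B        r = -0.90
C        r = 0.90
D        r = 1
E        r = -0.4
F        r = 0
G        r = 0.60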

Key Concepts:

1) A ______________________ is a relationship between two events, where a change in one

event implies a change in another event. (It does not CAUSE a change, it implies a change)

2) The strength of the relationship between data that has a linear trend can be analyzed using the ______________ __________________________________, r.

3) A correlation coefficient falls between _____ and ______. It can be written as the inequality ___________________.

4) A correlation coefficient greater than zero but less than 0.5 is considered __________________.

5) A correlation coefficient between 0.5 and 0.8 is considered _________________________.

6) A correlation coefficient above 0.8 is considered _______________________.

Linear Correlation

Estimate the correlation coefficient for the following scatter plots.


1) Use the data below, representing class size and average student test score, to answer the questions that follow.

a) Write the linear regression equation for these data, rounding all values to the nearest thousandth.

b) Explain what the numbers in the equation above represent in the context of the problem.

c) State the correlation coefficient of the linear regression equation, to the nearest hundredth. Explain the meaning of this value in the context of these data.


Day 9: Linear Regression Continued

_____ 1) What is the correlation coefficient of the linear fit of the data shown below, to the nearest hundredth?

(1) 1.00 (3) –0.93

(2) 0.93 (4) –1.00

2) In a mathematics class of ten students, the teacher wanted to determine how a homework grade influenced a student’s performance on the subsequent test. The homework grade and subsequent test grade for each student are given in the accompanying table.

a) Write the equation of the linear regression line for this set of data. Round the coefficients to the nearest thousandth.

b) A new student comes to the class and earns a homework grade of 78. Based on the equation in part a, what grade would the teacher predict the student would receive on the subsequent test, to the nearest integer?


3)


4) A real estate agent plans to compare the price of a cottage, y, in a town on the seashore to the number of blocks, x, the cottage is from the beach. The accompanying table shows a random sample of sales and location data.

a) Write a linear regression equation that relates the price of a cottage to its distance from the beach. Round to the nearest whole number.

b) Use the equation to predict the price of a cottage, to the nearest dollar, located three blocks from the beach.

5) The availability of leaded gasoline is decreasing, as shown in the accompanying table.

a) State the least-squares line (linear regression equation) for the data in the table, where x = 0 represents the year 1984 and y represents gallons available. Round all values to the nearest tenth.

b) If this relationship continues, determine the number of gallons, to the nearest gallon, of leaded gasoline available in the year 2005.

c) If this relationship continues, during what year will leaded gasoline first become unavailable?


6) Five scatter plots are shown below and five correlation coefficients are given. Write the letter of the correlation coefficient that would go with each scatter plot on the lines below the correct graph.

a. 0.05 b. 0.97 c. -0.94 d. -0.49 e. 0.68

__________ __________ __________

___________ ___________


7) Given the information in the table below, answer the given questions.


8) Two different tests were designed to measure understanding of a topic. The two tests were given to ten students with the following results:

(a) Write an equation for the line of best fit, rounding to the nearest hundredth if necessary.

(b) Predict the score, to the nearest integer, on test y for a student who scored 87 on test x.

(c) State the correlation coefficient of the linear regression equation, to the nearest hundredth. Explain the meaning of this value in the context of these data.

_____ 9) What could be the approximate value of the correlation coefficient for the accompanying scatter plot?

(1) -0.85

(2) -0.16

(3) 0.21

(4) 0.90


Day 10: Causation & Correlation

The following graph depicts the relationship between the number of sunglasses sold and ice cream sales.

a) Is there a correlation between the two variables? If so, describe the correlation.

b) Remie thinks this graph shows that the number of sunglasses sold caused the amount of money earned in ice cream sales to increase. Do you agree? Why or why not?


Using data from reliable sources, Remie created the following scatter plot.

c) She notices a strong negative correlation and concludes that the number of highway fatalities between 1996 and 2000 was dependent on the number of fresh lemons imported from Mexico. Do you agree with her analysis of the data? Why or why not?

Yesterday in class, Remie’s health teacher said, “Scientific research shows that there is a strong positive correlation between the number of minutes a person spends exercising and the number of calories burned.”

d) Sketch a scatter plot representing the relationship described between calories burned and minutes spent exercising.

e) How is this relationship different from the correlations in the previous two examples?

[Blank axes for the sketch in part d: horizontal axis "Minutes Exercising", vertical axis "Calories Burned"]


Key Concepts:

_____________________ occurs when the change in one quantity causes the change in the other quantity.

Correlation does not imply _______________________________. In other words, just because two variables are strongly correlated, we cannot assume that one causes the other.

Note: Correlation can easily be shown with a scatter plot and/or a best fit line. Causation requires controlled experiments and more in-depth statistical study.

Example 1

There is a positive correlation between the amount of time a basketball player spends in a game and the number of points that s/he scores. Does this mean that spending more time in a game causes a player to score more points? Explain.

Example 2

Each of the following pairs of variables has a strong correlation. Which pair do you think also has causation?

(A) Shoe size and reading level

(B) Age and height

(C) Taking pain killers and level of pain

(D) Number of pets and age


Practice:

_____ 1) Which situation describes a correlation that is not a causal relationship?

(1) The rooster crows and the Sun rises.

(2) The more miles driven, the more gasoline needed.

(3) The more powerful the microwave, the faster the food cooks.

(4) The faster the pace of a runner, the quicker the runner finishes.

_____ 2) Which relationship can best be described as causal?

(1) height and intelligence

(2) shoe size and running speed

(3) number of correct answers on a test and test score

(4) number of students in a class and number of students with brown hair

_____ 3) Which situation describes a correlation that is not a causal relationship?

(1) the length of the edge of a cube and the volume of the cube

(2) the distance traveled and the time spent driving

(3) the age of a child and the number of siblings the child has

(4) the number of classes taught in a school and the number of teachers employed

_____ 4) What type of relationship exists between the number of miles driven in a car and the amount of gasoline used by that car?

(1) Positive correlation but not causal

(2) Positive correlation and causal

(3) Negative correlation but not causal

(4) Negative correlation and causal

_____ 5) What type of relationship exists between the number of miles driven in a car and the amount of gasoline left in the gas tank of the car?

(1) Positive correlation but not causal

(2) Positive correlation and causal

(3) Negative correlation but not causal

(4) Negative correlation and causal

_____ 6) Which phrase best describes the relationship between the number of miles driven and the amount of gasoline used?

(1) causal, but not correlated

(2) correlated, but not causal

(3) both correlated and causal

(4) neither correlated nor causal


_____ 8) The table below shows the number of prom tickets sold over a ten-day period.

The scatter plot of the data in the table has which type of correlation?

(1) Positive

(2) Negative

(3) Zero

(4) None

_____ 9) A scatter plot was constructed on the graph below and a line of best fit was drawn.

What is the equation of this line of best fit?

(1) y = x + 5

(2) y = x + 25

(3) y = 5x + 5

(4) y = 5x + 25

_____ 10) A linear regression equation of best fit between a student’s attendance and the degree of success in school is h = 0.5x + 68.5. The correlation coefficient, r, for these data would be

(1) 0 < r < 1

(2) −1 < r < 0

(3) r = 0

(4) r = −1


11) A hockey coach wants to determine if players who take many practice shots during practice have a higher shooting percentage. The shooting percentage is calculated by dividing the number of goals scored by the number of shots taken. The coach records the number of practice shots 10 players take each practice, and compares the number with each player’s shooting percentage over the season. Is there a linear relationship between the practice shots and shooting percentage? Use the correlation coefficient, r, to explain your answer.

_____ 12) Which graph represents a linear regression that produces a correlation coefficient closest to -1?

(1) (2)

(3) (4)


Day 11: Residuals and Residual Plots


(b) Fill in the table below with the predicted values (rounded to the nearest integer) and the residuals for each data point.

Hours Studying    3    7    2   11    8   16    5    9
GPA              78   80   75   94   89   92   80   84
Prediction
Residual

(c) Using your calculator, produce the residual graph and draw a sketch of it below. The sketch does not need to be exact.
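The table in part (b) can be checked without a graphing calculator. The following is a minimal sketch in Python, assuming the ordinary least-squares formulas for slope and intercept; the helper name `fit_line` is hypothetical, and the data are copied from the Hours Studying and GPA rows above.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data copied from the table in part (b).
hours = [3, 7, 2, 11, 8, 16, 5, 9]
gpa = [78, 80, 75, 94, 89, 92, 80, 84]

slope, intercept = fit_line(hours, gpa)

# Predicted values rounded to the nearest integer, as the table asks.
predictions = [round(slope * x + intercept) for x in hours]
# Residual = actual value - predicted value.
residuals = [actual - predicted for actual, predicted in zip(gpa, predictions)]

# One row per data point: (hours, actual, prediction, residual).
for row in zip(hours, gpa, predictions, residuals):
    print(row)
```

Plotting each residual against its Hours Studying value gives the residual graph that part (c) asks for.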


Key Concepts:

1) The fit of a linear function to a set of data can be assessed by analyzing __________________.

2) A residual is the ___________________ distance between the line of best fit and an actual data value.

Residual = ____________________ – ______________________

3) The predicted value can be found by using the ____________________________________ .

4) Representing residuals on a residual plot provides a visual representation of the residuals. A ____________________ residual plot, with both positive and negative residual values, indicates that the line is a good fit for the data.

5) A ____________________ residual plot indicates that the line is likely not a good fit for the data.


(c) Fill in the table below with the predicted values (rounded to the nearest integer) and the residuals for each data point.

Time (sec)       0    0.5   1.0   1.5   2.0   2.5   3.0    3.5    4.0
Distance (ft)    0    0.4   1.5   3.2   5.6   8.5   12.6   17.2   22.8
Prediction
Residual

(d) Using your calculator, produce the residual graph and draw a sketch of it below. The sketch does not need to be exact.
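For this time and distance data, the signs of the residuals are worth looking at in order. The sketch below is one way to do that in Python, assuming an ordinary least-squares line; the helper name `fit_line` and the `signs` list are illustrative choices, and the data are copied from the table in part (c).

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Data copied from the table in part (c).
time_sec = [0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
distance_ft = [0, 0.4, 1.5, 3.2, 5.6, 8.5, 12.6, 17.2, 22.8]

slope, intercept = fit_line(time_sec, distance_ft)

# Predictions rounded to the nearest integer, as the table asks.
predictions = [round(slope * t + intercept) for t in time_sec]
# Residual = actual value - predicted value.
residuals = [d - p for d, p in zip(distance_ft, predictions)]

# Record only the sign of each residual, in time order.
signs = ["+" if r > 0 else "-" if r < 0 else "0" for r in residuals]
print(" ".join(signs))
```

If the signs come in long runs (all positive, then all negative, then positive again) rather than mixing randomly, the residual plot has a pattern, which is the situation the Key Concepts describe as a line that is likely not a good fit.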


Day 12: Residuals and Residual Plots Continued

1.


2.

3.


4. Kendra wondered if the relationship between shoe length and height might be different for women. To investigate, she collected data on shoe length (in inches) and height (in inches) for 12 women.

Shoe Length (in)    Height (in)
 8.9                61
 9.6                61
 9.8                66
10.0                64
10.2                64
10.4                65
10.6                65
10.6                67
10.5                66
10.8                67
11.8                70
11.0                67

a) Is there a relationship between shoe length and height for these 12 women? Explain.

b) Find the equation of the least-squares line. (Round values to the nearest hundredth.)

c) Find the correlation coefficient for the data, rounded to the nearest thousandth.

d) Suppose that these 12 women are representative of adult women in general. Based on the least-squares line, what would you predict for the height of a woman whose shoe length is 10.5 inches, rounded to the nearest inch?

e) One of the women in the sample had a height of 63.55 inches. Based on the regression line, what would her shoe length be, rounded to the nearest tenth of an inch?


f) What is the value of the residual associated with the observation for the woman with the shoe length of 9.8?

g) Describe the slope of the least-squares line within the context of the problem.

h) Does the y-intercept for your least-squares line make sense in the context of this problem? Explain why or why not.
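The calculator steps for problem 4 can be mirrored in a short script. This is a minimal sketch in Python, assuming the usual least-squares and Pearson formulas; the variable names are illustrative, and the coefficients are rounded to the nearest hundredth before being reused, on the assumption that parts d) through f) are meant to use the rounded line from part b).

```python
from math import sqrt

# Data copied from the shoe length and height table in problem 4.
shoe = [8.9, 9.6, 9.8, 10.0, 10.2, 10.4, 10.6, 10.6, 10.5, 10.8, 11.8, 11.0]
height = [61, 61, 66, 64, 64, 65, 65, 67, 66, 67, 70, 67]

n = len(shoe)
mean_x = sum(shoe) / n
mean_y = sum(height) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(shoe, height))
sxx = sum((x - mean_x) ** 2 for x in shoe)
syy = sum((y - mean_y) ** 2 for y in height)

# Part b: least-squares coefficients, rounded to the nearest hundredth.
slope = round(sxy / sxx, 2)
intercept = round(mean_y - (sxy / sxx) * mean_x, 2)

# Part c: correlation coefficient.
r = sxy / sqrt(sxx * syy)

# Part d: predicted height for a shoe length of 10.5 inches.
height_at_10_5 = slope * 10.5 + intercept

# Part e: solve height = slope * shoe + intercept for the shoe length.
shoe_at_63_55 = (63.55 - intercept) / slope

# Part f: residual = actual - predicted for the woman with shoe length 9.8
# (her actual height in the table is 66).
residual_9_8 = 66 - (slope * 9.8 + intercept)

print(slope, intercept, round(r, 3))
print(round(height_at_10_5), round(shoe_at_63_55, 1), round(residual_9_8, 2))
```

Part e) runs the line backwards: instead of plugging in a shoe length to get a height, it solves the equation for the shoe length that produces the given height.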

5. A linear function is used to estimate a data set. The residuals of the line fitted to the data are determined and are included in the residual plot below, where the x-axis represents days.

a) What information does the residual plot tell you about your linear regression? Explain.

b) If the linear regression model for the data set is y = 4x + 3, then determine the actual value at day 4.


6. The curb weight of a car is the weight of the car without luggage or passengers. The table below shows the curb weights (in hundreds of pounds) and fuel efficiencies (in miles per gallon) of five compact cars.

Curb Weight (100 lb)    Fuel Efficiency (miles per gallon)
25.33                   43
26.94                   38
27.79                   30
30.12                   34
32.47                   30

Using a calculator, the least-squares line for this data set was found to have the equation:

y = 78.62 – 1.5290x

where x is the curb weight (in hundreds of pounds) and y is the predicted fuel efficiency (in miles per gallon).

The scatter plot of this data set is shown below, and the least-squares line is shown on the graph.

a) Before calculating the residual, look at the scatter plot. Will the residual for the car whose curb weight is 25.33 be positive or negative? Roughly what is the value of the residual for this point?

b) Will the residual for the car whose curb weight is 27.79 be positive or negative? Roughly what is the value of the residual for this point?


c) Calculate the residuals, rounded to the nearest tenth, and write them in the table.

d) Suppose that a car has a curb weight (in hundreds of pounds) of 31. What does the least-squares line predict for the fuel efficiency of this car?

Curb Weight (100 lb)    Fuel Efficiency (miles per gallon)    Predicted Value    Residual
25.33                   43
26.94                   38
27.79                   30
30.12                   34
32.47                   30
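Because the least-squares equation is already given for this problem, the predicted values and residuals only require substitution. The sketch below shows that arithmetic in Python; the function name `predict_mpg` is hypothetical, while the equation and the data are copied from the problem.

```python
# Data copied from the table in problem 6.
curb_weight = [25.33, 26.94, 27.79, 30.12, 32.47]   # hundreds of pounds
fuel_efficiency = [43, 38, 30, 34, 30]              # miles per gallon


def predict_mpg(weight):
    """Least-squares line given in the problem: y = 78.62 - 1.5290x."""
    return 78.62 - 1.5290 * weight


# Predicted fuel efficiency for each car in the table.
predicted = [predict_mpg(w) for w in curb_weight]

# Part c: residual = actual - predicted, rounded to the nearest tenth.
residuals = [round(actual - p, 1) for actual, p in zip(fuel_efficiency, predicted)]

# Part d: prediction for a car with a curb weight of 31 (hundreds of pounds).
prediction_at_31 = predict_mpg(31)

for w, actual, p, res in zip(curb_weight, fuel_efficiency, predicted, residuals):
    print(w, actual, round(p, 2), res)
print(round(prediction_at_31, 2))
```

A positive residual means the car's actual fuel efficiency sits above the line, and a negative residual means it sits below, which can be compared against the estimates made from the scatter plot in parts a) and b).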