Statistics: Analyzing 2 Quantitative Variables MIDDLE SCHOOL LEVEL Session #2 Presented by: Dr. Del Ferster Immaculata Week 2014 July 28—August 1, 2014



Some questions to get us started

Are 2 quantitative variables always related?

If there is a strong trend, can we assume a cause-and-effect relationship?

Why is linear regression important?

What does it let us do?


What’s in store for today?

We’re going to spend time today on QUANTITATIVE STATISTICS.

We’ll examine scatter plots and look for patterns and strength of relationships.

We’ll interpret regression lines in the context of the problem.

We’ll look at correlation, a measure of the linear trend of the data.

I’ve also included a “spiffy” activity that I think you can use with your students.


Analyzing Scatterplots


SCATTER PLOTS


Scatterplot

A graphical display of two quantitative variables

We plot the explanatory (independent) variable on the x-axis and the response (dependent) variable on the y-axis

Each dot represents a single observation and its ordered pair (x,y)


Describing Scatterplots

When we consider scatterplots, we focus on 4 things:
◦ Direction
◦ Form
◦ Scatter
◦ Unusual elements


Direction

Positive: as values of the explanatory variable increase, values in the response variable tend to increase.
As x gets larger, y gets larger.

Negative: as values of the explanatory variable increase, values in the response variable tend to decrease.
As x gets larger, y gets smaller.

Null: no discernible pattern of change in the response variable.


Form (Shape)

Linear: The shape has the appearance of a linear relationship. There doesn’t have to be a perfect fit.

Curved: The points follow a curve. We can often use logarithms to transform curved data into a linear form.

None: No discernible form.
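The logarithm idea for curved forms can be sketched in a few lines of Python. This is only an illustration with made-up data (y = 3 · 2^x is an assumption, not data from the session): the raw values climb in ever-larger steps, while their logarithms climb in equal steps, which is what a straight line looks like.

```python
import math

# Hypothetical curved data: y doubles each time x goes up by 1 (y = 3 * 2**x).
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

# Take the logarithm of the response variable.
log_ys = [math.log10(y) for y in ys]


def step_changes(values):
    """Differences between consecutive values; equal steps mean a straight line."""
    return [b - a for a, b in zip(values, values[1:])]


raw_steps = step_changes(ys)      # the steps keep growing, so the plot is curved
log_steps = step_changes(log_ys)  # every step is about log10(2), so the plot is linear

print(raw_steps)
print([round(step, 4) for step in log_steps])
```

Plotting x against log y (instead of y) would therefore show a linear form for data like this.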


Strength (Scatter)

Strong association: very little scatter

Moderate strength: a moderate amount of scatter

Weak strength: lots of scatter


Unusual Features

Outliers: they just don’t fit the trend.

Look for changes in the scatter, such as a horn shape.

Let’s check some out

How would you describe the following plots?


#1
The scatterplot shows a moderately strong, negative association.
There is a bit of a curve.

#2
The scatterplot shows a weak, positive linear association.
The scatter tends to decrease as the scores in Exam 1 increase.

#3
The scatterplot shows a moderately strong, positive linear association.
There appears to be an outlier around (9, 35).

#4
The scatterplot shows no apparent association.
There is great scatter and an outlier around (60, 8).

#5
The scatterplot shows a curved form in which the scatter increases as the explanatory variable increases.

Linear Regression

Determining the LINE that best fits our data.


Regression Line

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.

A regression line summarizes the relationship between two variables, but only in a specific setting: when one of the variables helps explain or predict the other.

We often use a regression line to predict the value of y for a given value of x.

Regression, unlike correlation, requires that we have an explanatory variable and a response variable.

Fitting a line to data means drawing a line that comes as close as possible to the points.

Extrapolation: the use of a regression line for prediction far outside the range of values of the explanatory variable x that you used to obtain the line.
◦ Such predictions are often not accurate.


Linear Regression

Regression analysis finds the equation of the line that best describes the relationship between the two variables.

In other words, which line best fits the data shown on our scatterplot?

While there are formulas to calculate this line, most of the time we’d use a graphing calculator or an iPad app.


Least-Squares Regression Line

The equation of the least-squares regression line of y on x (or more simply, the regression line) is

$\hat{y} = a + bx$

slope: $b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$

y-intercept: $a = \bar{y} - b\bar{x}$
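For anyone who wants to see the slope and y-intercept formulas at work without a graphing calculator, here is a minimal Python sketch. The function name `least_squares_line` and the four sample points are made up for illustration; the arithmetic follows the summation formulas for b and a.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # slope: b = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # y-intercept: a = y-bar - b * x-bar
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Made-up points that lie exactly on y = 1 + 2x, so we expect a = 1 and b = 2.
a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Because the sample points sit exactly on a line, the formulas recover that line; with real data the result is the line that comes closest to the points in the least-squares sense.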


Interpreting our line

The slope, b, is the amount by which y changes when x increases by one unit.

The intercept, a, is the value of y when x = 0.

LINEAR REGRESSION

Let’s look at an example


Linear Regression Example

It seems that people are living longer these days (I hope so!), so I’ve done some research to study this trend. According to US Government statistics, the following data represent the life expectancy for an infant born in the given year.

Year of Birth:           2001  2002  2003  2004  2005  2006  2007  2008  2009
Life Expectancy (yrs):   77.9  78.2  78.5  79.0  79.2  79.7  80.1  80.2  80.6


Scatter Plot of our Data

More information on our data

Regression Line: $\hat{y} = -612.46 + 0.345x$
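The regression line reported for the life-expectancy data (intercept -612.46, slope 0.345) can be checked by hand or with a short script. The sketch below is one way to do it in Python; the helper name `fit_line` is made up, and it uses the deviations-from-the-mean form of the slope, which is algebraically equivalent to the summation formula.

```python
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]
life_expectancy = [77.9, 78.2, 78.5, 79.0, 79.2, 79.7, 80.1, 80.2, 80.6]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(years, life_expectancy)
print(f"y-hat = {intercept:.2f} + {slope:.3f}x")  # prints: y-hat = -612.46 + 0.345x
```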


Some Questions to ponder

1. Does there appear to be a linear relationship between year of birth and life expectancy?

2. Based on the context of the problem, interpret the y-intercept of the line.

3. Based on the context of the problem, interpret the slope of the line.

4. According to your trend line, what is the predicted life expectancy for a baby born in 2012?

5. According to your trend line, what is the predicted life expectancy for a baby born in 2050?

6. Why might you be a bit skeptical about your response to the last question?


Solution to our Example

1. Yes, there appears to be a strong linear trend. The regression line has a positive slope, so as x (the year of birth) increases, so does y (the life expectancy).

2. The regression line is $\hat{y} = -612.46 + 0.345x$. The y-intercept means that in year 0 an infant’s life expectancy would be -612.46 years. NOTE: in the context of this problem this is MEANINGLESS!


Solution to our Example

3. The regression line is $\hat{y} = -612.46 + 0.345x$. The slope means that for each one-year increase in the year of birth, an infant’s predicted life expectancy increases by 0.345 years.

4. According to the regression line, an infant born in 2012 has a life expectancy of 81.68 years.

5. According to the regression line, an infant born in 2050 has a life expectancy of 94.79 years.
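The two predictions (81.68 years for 2012 and 94.79 years for 2050) come from plugging the year into the regression line. A minimal sketch in Python, using the rounded intercept and slope from the example; the function name `predict_life_expectancy` is made up for illustration.

```python
# Rounded coefficients of the regression line from the example.
INTERCEPT = -612.46
SLOPE = 0.345


def predict_life_expectancy(year_of_birth):
    """Plug a year of birth into y-hat = -612.46 + 0.345x."""
    return INTERCEPT + SLOPE * year_of_birth


for year in (2012, 2050):
    print(year, round(predict_life_expectancy(year), 2))
```

The same function also shows why the y-intercept is meaningless in context: `predict_life_expectancy(0)` simply returns -612.46.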


Solution to our Example

6. The year 2050 falls well outside the set of x values (from 2001 to 2009) upon which the regression line is based, so this is EXTRAPOLATION. We’re seeking to use our regression line to predict for an x value that is WELL OUTSIDE the set of data that was used to generate the equation of the line. This isn’t a good statistical practice, so this prediction would be met with a great deal of skepticism.
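One simple way to make students notice extrapolation is to measure how far a requested x value sits beyond the data. The sketch below is a hypothetical helper (the name `years_outside_data` is made up), using the 2001 to 2009 range from the example.

```python
# x values (years of birth) that the regression line was built from.
OBSERVED_YEARS = range(2001, 2010)


def years_outside_data(year, observed=OBSERVED_YEARS):
    """Return how many years the target lies beyond the observed range (0 if inside)."""
    lowest, highest = min(observed), max(observed)
    if year < lowest:
        return lowest - year
    if year > highest:
        return year - highest
    return 0


for year in (2005, 2012, 2050):
    print(year, "is", years_outside_data(year), "years outside the data")
```

Note that 2012 is also slightly outside the data (3 years), but 2050 is 41 years beyond it, which is much longer than the 8-year span the line was built from.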

Correlation

A way to measure the strength of a LINEAR trend.


Some facts about CORRELATION

CORRELATION, denoted by r, measures the direction and strength of the linear relationship between two quantitative variables.

General Properties
◦ It must be between -1 and 1 (-1 ≤ r ≤ 1).
◦ If r is negative, the relationship is negative.
◦ If r = -1, there is a perfect negative linear relationship (extreme case).
◦ If r is positive, the relationship is positive.
◦ If r = 1, there is a perfect positive linear relationship (extreme case).
◦ If r is 0, there is no linear relationship.
◦ r measures the strength of the linear relationship.
◦ If explanatory and response are switched, r remains the same.
◦ r has no units of measurement associated with it.
◦ Scale changes (such as a change of units) do not affect r.
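Two of these properties are easy to check numerically: switching explanatory and response leaves r unchanged, and so does a change of units. The sketch below uses made-up heights and weights (hypothetical data, not from the session) and a hand-written `correlation` helper based on the usual formula r = Sxy / sqrt(Sxx · Syy).

```python
import math


def correlation(xs, ys):
    """Correlation r computed from sums of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements: heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 70]
weights_lb = [115, 120, 140, 155, 160]

r = correlation(heights_in, weights_lb)

# Switch explanatory and response: r stays the same.
r_swapped = correlation(weights_lb, heights_in)

# Change units (inches to centimeters): r stays the same.
heights_cm = [h * 2.54 for h in heights_in]
r_rescaled = correlation(heights_cm, weights_lb)

print(round(r, 3), round(r_swapped, 3), round(r_rescaled, 3))
```

All three printed values agree, and the value lies between -1 and 1 as the first property requires.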


Relationships between 2 numeric variables

Examples of extreme cases

r = 1 r = 0 r = -1


Scatterplots with correlation values


Guess the Correlation Value (#1 through #6)

For each of the six scatterplots, choose the matching value of r:
◦ r = 0.07
◦ r = -0.768
◦ r = -0.944
◦ r = 0.936
◦ r = 0.496
◦ r = 1


Remember: Correlation measures LINEAR TREND only

It is possible for there to be a strong relationship between two variables and still have r ≈ 0.

EXAMPLE
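A quick numeric illustration of a strong relationship with r ≈ 0: in the made-up data below, y is completely determined by x (y = x²), yet the correlation comes out as zero because the pattern is a curve, not a line. The data and the `correlation` helper are assumptions for illustration only.

```python
import math


def correlation(xs, ys):
    """Correlation r computed from sums of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data on a perfect U-shaped curve: y = x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)
```

The falling left half of the U and the rising right half cancel each other out, so r reports no LINEAR trend even though the relationship is perfect.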


Another look at our previous example

What would you guess for the value of r for this data?


Another look at our previous example (with the regression line added)

What would you guess for the value of r for this data?

r = 0.996

[Figure: scatterplot of Life Expectancy (vertical axis, 76.5 to 81) versus Year (horizontal axis, 2000 to 2010)]
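The value r = 0.996 for the life-expectancy data can be verified with a few lines of Python. This is a sketch using a hand-written `correlation` helper (a made-up name); a graphing calculator should report the same value.

```python
import math

years = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]
life_expectancy = [77.9, 78.2, 78.5, 79.0, 79.2, 79.7, 80.1, 80.2, 80.6]


def correlation(xs, ys):
    """Correlation r computed from sums of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation(years, life_expectancy)
print(round(r, 3))  # prints 0.996
```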


Summary of Correlation
◦ Association does not imply causation.
◦ Correlation does not imply causation.
◦ Slope is not correlation.
◦ A scale change does not change the correlation.
◦ Correlation doesn’t measure the strength of a non-linear relationship.
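"Slope is not correlation" can be shown with two made-up data sets that share the same regression slope but have different amounts of scatter. The data and the helper name `slope_and_r` are hypothetical; the sketch simply computes both numbers side by side.

```python
import math


def slope_and_r(xs, ys):
    """Return (regression slope, correlation r) for the given points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
tight_ys = [2, 4, 6, 8, 10]  # points exactly on the line y = 2x
loose_ys = [3, 2, 8, 6, 11]  # same overall trend, but with scatter

tight_slope, tight_r = slope_and_r(xs, tight_ys)
loose_slope, loose_r = slope_and_r(xs, loose_ys)

print(tight_slope, round(tight_r, 3))
print(loose_slope, round(loose_r, 3))
```

Both lines rise 2 units of y per unit of x, but only the first data set has r = 1. The slope describes how steep the trend is; r describes how tightly the points follow it.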

Time for An Activity

Now, let’s see if we can apply some of those things that we learned today.


If you want a blank copy for future use, or if you want a copy of my answers, just let me know.

You’re more than welcome to have one!!

Answers are available


Wrapping it up for today

Questions or comments?

Remember, you make a difference in kids’ lives every day!!

Challenge your students, support them, and share their successes!!


The last slide!!!

Thanks for your attention, participation, and energy. (I know it’s a long time to sit!)

Head out and hit the links!

If I can be of help during the school year, please don’t hesitate to let me know.

EMAIL:
◦ Here at Immaculata: [email protected]
◦ My home email: [email protected]

PHONE: (610) 369-7344 (HOME), (610) 698-7615 (CELL)