BPS - 3rd Ed. Chapter 5: Regression



Objectives of Regression

To describe the change in Y per unit X

To predict the average level of Y at a given level of X


“Returning Birds” Example

Plot the data first to see whether the relationship can be described by a straight line (important!)

Illustrative data from Exercise 4.4

Y = number of adult birds joining the colony

X = percent of birds returning from the prior year


If the data can be described by a straight line

… describe the relationship with the equation Y = (intercept) + (slope)(X)

May also be written: Y = (slope)(X) + (intercept)

Intercept: where the line crosses the Y axis

Slope: “angle” (steepness) of the line


Linear Regression

Algebraic line: every point falls on the line:

exact y = intercept + (slope)(X)

Statistical line: scatter cloud suggests a linear trend:

“predicted y” = intercept + (slope)(X)


Regression Equation

ŷ = a + bx, where

– ŷ (“y-hat”) is the predicted value of Y

– a is the intercept

– b is the slope

– x is a value for X

Determine a & b for “best fitting line”

The TI calculators reverse a & b!


What Line Fits Best?

If we try to draw the line by eye, different people will draw different lines

We need a method to draw the “best line”

This method is called “least squares”


The “least squares” regression line

Each point has:

Residual = observed y – predicted y

= vertical distance of the point from the prediction line

The least squares line minimizes the sum of the squared residuals
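
Written in symbols, the least squares criterion might look like the following. This is only a sketch in the notation of the regression equation ŷ = a + bx; the index i, the residual symbol e_i, and the sample size n are notation assumed here and do not appear on the slides.

```latex
e_i = y_i - \hat{y}_i = y_i - (a + b x_i),
\qquad
\text{choose } a \text{ and } b \text{ to minimize }
\sum_{i=1}^{n} e_i^{2} = \sum_{i=1}^{n} \bigl( y_i - a - b x_i \bigr)^{2}
```

Squaring keeps positive and negative residuals from cancelling, so a line cannot look good simply by overshooting some points and undershooting others.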


Calculating Least Squares Regression Coefficients

Formula (next slide)

Technology

– TI-30XIIS

– Two variable Applet

– Other


Formulas

$$b = r\,\frac{s_y}{s_x} \qquad a = \bar{y} - b\,\bar{x}$$

b = slope coefficient

a = intercept coefficient

where s_x and s_y are the standard deviations of the two variables, and r is their correlation
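
As a rough illustration of these formulas, the sketch below computes the slope and intercept from summary statistics alone. The function name `least_squares_from_summary` and its parameter names are hypothetical, chosen for this example; `x_bar` and `y_bar` stand for the means of X and Y. The input values are the summary statistics of the life expectancy case study that appears later in this chapter.

```python
def least_squares_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for the line y-hat = a + b*x from summary statistics.

    Hypothetical helper for illustration only.
    """
    b = r * s_y / s_x        # slope: correlation times ratio of standard deviations
    a = y_bar - b * x_bar    # intercept: forces the line through (x_bar, y_bar)
    return a, b


# Summary statistics from the life expectancy case study in this chapter
a, b = least_squares_from_summary(
    r=0.809, s_x=1.532, s_y=0.795, x_bar=21.52, y_bar=77.754
)
print(f"y-hat = {a:.3f} + {b:.3f}x")
```

The intercept printed here can differ from a hand calculation in the third decimal place, because a hand calculation typically rounds the slope before using it to compute the intercept.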


Technology: Calculator

BEWARE!

TI calculators label the slope and intercept backwards!


Regression Line

For the “bird data”:

a = 31.9343

b = −0.3040

The linear regression equation is: ŷ = 31.9343 − 0.3040x

The slope (−0.3040) represents the average change in Y per unit X


Use of Regression for Prediction

Suppose an individual colony has 60% returning (x = 60). What is the predicted number of new birds for this colony?

Answer: ŷ = a + bx = 31.9343 − (0.3040)(60) = 13.69

Interpretation: the regression model predicts 13.69 new birds (ŷ) for a colony with x = 60.
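
The same prediction can be checked in a few lines of code. This is a minimal sketch: the function name `predict` and the constant names are assumptions made for this example, and the slope is entered as negative, matching the slope of −0.3040 stated for the bird data.

```python
def predict(a, b, x):
    """Predicted value y-hat = a + b*x (hypothetical helper for illustration)."""
    return a + b * x


A_BIRDS = 31.9343    # intercept for the bird data
B_BIRDS = -0.3040    # slope for the bird data (negative)

y_hat = predict(A_BIRDS, B_BIRDS, 60)   # colony with 60% of birds returning
print(round(y_hat, 2))                  # 13.69
```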


Prediction via Regression Line

[Scatterplot with regression line: Number of new birds and Percent returning]

When X = 60, the regression model predicts ŷ = 13.69


Case Study

Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe


Regression Calculation: Case Study

Country           Per Capita GDP (x)   Life Expectancy (y)
Austria           21.4                 77.48
Belgium           23.2                 77.53
Finland           20.0                 77.32
France            22.7                 78.63
Germany           20.8                 77.17
Ireland           18.6                 76.39
Italy             21.5                 78.51
Netherlands       22.0                 78.15
Switzerland       23.8                 78.99
United Kingdom    21.2                 77.37


Case Study (Life Expectancy)

[Scatterplot: Life Expectancy and GDP (Europe); x-axis: Per Capita GDP (18 to 24), y-axis: Life expectancy (yrs) (76 to 79)]


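Regression Calculation by Hand (Life Expectancy Study)

$$\bar{x} = 21.52 \qquad \bar{y} = 77.754 \qquad r = 0.809 \qquad s_x = 1.532 \qquad s_y = 0.795$$

Calculations:

$$b = r\,\frac{s_y}{s_x} = (0.809)\,\frac{0.795}{1.532} = 0.420$$

$$a = \bar{y} - b\,\bar{x} = 77.754 - (0.420)(21.52) = 68.716$$

ŷ = 68.716 + 0.420x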
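
The hand calculation can be cross-checked from the raw case-study data. The sketch below uses only Python's standard library and computes the correlation explicitly rather than relying on a library routine; the variable names are assumptions made for this example. The results should agree with the slide values up to rounding.

```python
from statistics import mean, stdev

# (per capita GDP, life expectancy) pairs from the case study table
data = {
    "Austria": (21.4, 77.48),
    "Belgium": (23.2, 77.53),
    "Finland": (20.0, 77.32),
    "France": (22.7, 78.63),
    "Germany": (20.8, 77.17),
    "Ireland": (18.6, 76.39),
    "Italy": (21.5, 78.51),
    "Netherlands": (22.0, 78.15),
    "Switzerland": (23.8, 78.99),
    "United Kingdom": (21.2, 77.37),
}

x = [gdp for gdp, _ in data.values()]
y = [life for _, life in data.values()]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)   # sample standard deviations (divisor n - 1)

# Correlation: sum of products of deviations, scaled by (n - 1) * s_x * s_y
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # intercept

print(f"x_bar={x_bar:.2f}  y_bar={y_bar:.3f}  s_x={s_x:.3f}  s_y={s_y:.3f}  r={r:.3f}")
print(f"y-hat = {a:.3f} + {b:.3f}x")
```

Because this version carries full precision through every step, the intercept may differ from 68.716 in the last decimal place.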


BPS/3e Two Variable Applet


Applet: Data Entry


Applet: Calculations


Applet: Scatterplot


Applet: least squares line


Interpretation: Life Expectancy Case Study

Model: ŷ = 68.716 + (0.420)X

Slope: for each one-unit increase in per capita GDP, predicted life expectancy increases by 0.420 years

Prediction example: What is the predicted life expectancy in a country with a per capita GDP of 20.0?

ANSWER: ŷ = 68.716 + (0.420)(20.0) = 77.12


Coefficient of Determination (R²) (Fact 4 on p. 111)

Coefficient of determination (R²): the square of the correlation r; quantifies the fraction of the variation in Y “mathematically explained” by X

Examples:

r = 1: R² = 1: regression line explains all (100%) of the variation in Y

r = 0.7: R² = 0.49: regression line explains almost half (49%) of the variation in Y
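
As a further worked example, the same calculation can be applied to the life expectancy case study, taking the correlation r = 0.809 from the hand calculation. The rounding to three decimals is a choice made here.

```latex
R^{2} = r^{2} = (0.809)^{2} \approx 0.654
```

Read this way, roughly 65% of the variation in life expectancy among these countries is "mathematically explained" by the straight-line relationship with per capita GDP.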


We are NOT going to cover the analysis of residual plots (pp. 113-116)


Outliers and Influential Points

An outlier is an observation that lies far from the regression line

Outliers in the y direction have large residuals

Outliers in the x direction are often influential

– removal of an influential point would markedly change the regression and correlation values


Outliers: Case Study

Gesell Adaptive Score and Age at First Word

From all the data: r² = 41%

After removing child 18: r² = 11%


Cautions About Correlation and Regression

Describe only linear relationships

Are influenced by outliers

Cannot be used to predict beyond the range of X (do not extrapolate)

Beware of lurking variables (variables other than X and Y)

– Association does not always equal causation!


Do not extrapolate (Sarah’s height)

Sarah’s height is plotted against her age

Can you predict her height at age 42 months?

Can you predict her height at age 30 years (360 months)?

[Scatterplot: Sarah’s height (cm), 80 to 100, against age (months), 30 to 65]


Do not extrapolate (Sarah’s height)

Regression equation: ŷ = 71.95 + 0.383(X)

At age 42 months: ŷ = 71.95 + 0.383(42) = 88 (Reasonable)

At age 360 months: ŷ = 71.95 + 0.383(360) = 209.8 (That’s 209.8 cm, nearly 6 feet 11 inches tall!)

[Scatterplot with regression line extended: height (cm), 70 to 210, against age (months), 30 to 390]
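
One way to make the "do not extrapolate" rule concrete is to have the prediction code warn whenever the requested age lies outside the observed range. The sketch below is illustrative only: the function name `predict_height` is hypothetical, and the bounds of 30 to 65 months are an assumption read from the age axis of the first plot, since the slides do not list the raw data.

```python
import warnings

A_HEIGHT = 71.95   # intercept (cm), from the regression equation above
B_HEIGHT = 0.383   # slope (cm per month), from the regression equation above

# ASSUMPTION: range taken from the plot's age axis, not from the raw data
AGE_MIN, AGE_MAX = 30, 65


def predict_height(age_months):
    """Predict height in cm, warning when the age is outside the observed range."""
    if not AGE_MIN <= age_months <= AGE_MAX:
        warnings.warn(
            f"age {age_months} months is outside the observed range "
            f"[{AGE_MIN}, {AGE_MAX}]; this prediction is an extrapolation",
            stacklevel=2,
        )
    return A_HEIGHT + B_HEIGHT * age_months


print(round(predict_height(42), 1))    # 88.0, inside the observed range
print(round(predict_height(360), 1))   # 209.8, with an extrapolation warning
```

The arithmetic is identical in both calls; only the warning signals that the second answer should not be trusted.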


Caution: Correlation does not always mean causation

Even very strong correlations may not correspond to a causal relationship between x and y

(Beware of the lurking variable!)