SECTION 4.1 – TRANSFORMING RELATIONSHIPS

Linear regression using the LSRL is not the only model for describing data.  Some data are not best described linearly.

In some cases the removal of outliers from data may cause a drop in the correlation, so that a linear model no longer does a satisfactory job of describing the data.

In some cases the bulk of the data may not be linear at all.

Non-linear relationships between two quantitative variables can sometimes be changed into linear relationships by transforming one or both variables. 


Transforming can be thought of as re-expressing the data.

In a scatterplot, we may want to transform the explanatory variable x, the response variable y, or both.

When talking about transforming in general, we will call the variable being transformed "t".

Many variables take only 0 or positive values, so we are particularly interested in how functions behave for positive values of t.


The following models are common functions whose shapes and equations you should be familiar with. In each model, t > 0.

Linear: $a + bt$, slope $b > 0$


The following scatterplot represents brain weight against body weight for 96 species of mammals.


The scatterplot is not very satisfactory, since most mammals are so small relative to elephants and hippos. In the lower-left corner of the plot, most of the species overlap, forming a “blob”.

The correlation with all 96 species is r = .86, but with the elephant removed it drops to r = .50.

To get a closer look at the observations in the lower-left corner, the 4 outliers were removed.


This scatterplot represents the 92 observations with the 4 outliers removed.

Instead of a linear relationship, you can see that as body weight increases, the graph bends to the right, which is characteristic of a logarithmic function.


The following plot includes the original 96 observations, but instead of plotting the y-value against the x-value, the logarithm of the brain weight (y-value) is plotted against the logarithm of the body weight (x-value).

There are no longer any extreme outliers or very influential observations, and the pattern is very linear, with r = .96.
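The effect of taking logarithms of both variables can be sketched in a few lines of Python. The values below are hypothetical and are not the mammal data: they follow an assumed power law y = 10x^0.75 exactly, chosen only to show that a curved power relationship becomes a straight line on the log-log scale. The helper name `correlation` is also an assumption of this sketch.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values following y = 10 * x**0.75 (NOT the mammal data).
body = [1, 10, 100, 1000, 10000]
brain = [10 * x ** 0.75 for x in body]

# Correlation on the original scale and after logging both variables.
r_raw = correlation(body, brain)
r_log = correlation([math.log10(x) for x in body],
                    [math.log10(y) for y in brain])

print(f"r on the original scale: {r_raw:.4f}")
print(f"r on the log-log scale:  {r_log:.4f}")
```

On the original scale the largest point dominates the correlation, much as the elephant does in the mammal data; on the log-log scale every point lies on the same line.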


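The ladder of power functions is in the form:

$\dfrac{t^p - 1}{p}$

Labeled members of the ladder: linear ($p = 1$), reciprocal square root ($p = -1/2$), logarithmic ($p = 0$, the limiting case), inverse ($p = -1$), and square ($p = 2$).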
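As a minimal sketch, the ladder can be written as a single Python function. The name `ladder` is an assumption of this example, and p = 0 is handled as the natural logarithm, which is the limit of (t^p - 1)/p as p approaches 0.

```python
import math

def ladder(t, p):
    """Power-family transformation (t**p - 1) / p for t > 0; log(t) when p == 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    if p == 0:
        return math.log(t)  # limiting case of (t**p - 1) / p as p -> 0
    return (t ** p - 1) / p

# Evaluate each rung of the ladder at a few positive values of t.
for p in (2, 1, 0, -0.5, -1):
    print(p, [round(ladder(t, p), 3) for t in (0.5, 1, 2, 4)])
```

Every rung passes through 0 at t = 1, which makes the curves easy to compare on one set of axes.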


CONCAVITY OF POWER FUNCTIONS


EXPONENTIAL GROWTH

Exponential growth occurs when a variable is multiplied by a fixed number in each time period. For example, consider a population of bacteria in which each bacterium splits into two each hour. Beginning with 1, we have 2 after one hour, 4 after two hours, 8 after three hours, 16 after four hours, then 32, 64, 128, and so on. After one day of doubling there are $2^{24}$, or 16,777,216, bacteria in the population.

Exponential growth increases by a fixed percentage of the previous total whereas linear growth increases by a fixed amount in each equal time period.

If a variable grows exponentially, its logarithm grows linearly.
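The bacteria example can be checked with a short Python sketch. The variable names are assumptions of this example. It shows that the ratio between consecutive counts is constant while the step between consecutive logarithms is also constant, which is what "its logarithm grows linearly" means.

```python
import math

# Bacteria count after each hour, starting from 1 and doubling hourly.
counts = [2 ** hour for hour in range(25)]  # hours 0 through 24

print(counts[:8])   # counts for the first few hours
print(counts[24])   # count after one day of doubling

# Ratios of consecutive counts are constant (exponential growth) ...
ratios = [counts[i + 1] / counts[i] for i in range(24)]

# ... so consecutive differences of log10(count) are constant (linear growth).
log_steps = [math.log10(counts[i + 1]) - math.log10(counts[i]) for i in range(24)]

print(set(ratios))
print(round(log_steps[0], 5), round(log_steps[-1], 5))
```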


TRANSFORMING DATA


REVIEW PROPERTIES OF LOGARITHMS

$\log_b x = y \iff b^y = x$

$\log(ab) = \log(a) + \log(b)$

$\log\left(\dfrac{a}{b}\right) = \log(a) - \log(b)$

$\log(x^p) = p\log(x)$
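These properties can be spot-checked numerically. The sketch below uses arbitrary positive values for a, b, and p (assumptions chosen only for illustration) and base-10 logarithms, although the product, quotient, and power rules hold in any base.

```python
import math

a, b, p = 8.0, 2.0, 3.0  # arbitrary positive values chosen for illustration

# Definition: log_b(x) = y  <=>  b**y = x
y = math.log(a, b)
definition_ok = math.isclose(b ** y, a)

# Product rule: log(ab) = log(a) + log(b)
product_ok = math.isclose(math.log10(a * b), math.log10(a) + math.log10(b))

# Quotient rule: log(a/b) = log(a) - log(b)
quotient_ok = math.isclose(math.log10(a / b), math.log10(a) - math.log10(b))

# Power rule: log(x**p) = p * log(x)
power_ok = math.isclose(math.log10(a ** p), p * math.log10(a))

print(definition_ok, product_ok, quotient_ok, power_ok)
```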


EXAMPLE 1 – GROWTH OF CELL PHONE USE

The cell phone industry enjoyed substantial growth in the 1990’s. One way to measure cell phone growth is to look at the number of subscribers. Find a linear model to predict the number of subscribers in the year 2000.

Year                      1990    1993    1994    1995    1996    1997    1998    1999
Subscribers (thousands)   5,283   16,009  24,134  33,786  44,043  55,312  69,209  86,047

There is an increasing trend, but the overall pattern is not linear. The pattern looks like an exponential curve. Is this exponential growth?


While the curve may appear to be exponential growth, we can’t simply depend on what our eyes see.

If you suspect exponential growth, first calculate the ratios of consecutive terms to see if they are the same fixed percentage of the previous total.

To avoid overflow in the calculator, it is good practice to code the years (let 1990 = 1). Don’t use 0, since you can’t take the log of 0.

Year   Subscribers   Ratios   log(y)
1      5,283         --       3.72288
4      16,009        --       4.20436
5      24,134        1.51     4.38263
6      33,786        1.40     4.52874
7      44,043        1.30     4.64388
8      55,312        1.26     4.74282
9      69,209        1.25     4.84016
10     86,047        1.24     4.93474

A ratio of 1.51 means that the number of subscribers in 1994 is 151% of, or 1.51 times, the number of subscribers in 1993. Equivalently, it is a 51% increase.

On average, subscribers were increasing around 35% each year.
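The ratio and log(y) columns of the table can be reproduced with a short Python sketch. The variable names are assumptions of this example. A ratio is computed only where two rows are one year apart, which is why 1993 has no ratio (the previous row is 1990).

```python
import math

years = [1990, 1993, 1994, 1995, 1996, 1997, 1998, 1999]
subscribers = [5283, 16009, 24134, 33786, 44043, 55312, 69209, 86047]  # thousands

# A ratio is only meaningful between consecutive years, so 1990 -> 1993 is skipped.
ratios = {}
for i in range(1, len(years)):
    if years[i] - years[i - 1] == 1:
        ratios[years[i]] = subscribers[i] / subscribers[i - 1]

for year, ratio in ratios.items():
    print(year, round(ratio, 2))

# Base-10 logarithm of each subscriber count, as in the log(y) column.
log_y = [math.log10(s) for s in subscribers]
print([round(v, 5) for v in log_y])
```

If the growth were exactly exponential, every ratio would be the same; here the ratios are similar but drift downward over time.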


Now that you have verified that the ratios are similar, the next step is to apply a mathematical transformation that changes exponential growth into linear growth.

We had hypothesized that an exponential model of the form $y = ab^x$ represented the cell phone growth; therefore we need to use properties of logarithms to transform it:

$\log y = \log a + \log b^x$

$\log y = \log a + x(\log b)$

$\log y = \log a + (\log b)x$


Since $\log y = \log a + (\log b)x$ looks like the equation of a line, we can plot $\log y$ versus $x$, and if the data are linear we would have better reason to believe that the cell phone growth is exponential.

The plot appears slightly concave down, but certainly more linear than the original scatterplot.

Applying least-squares regression, we get $r^2 = .982$.

This means that 98.2% of the variation in $\log y$ is explained by the least-squares regression of $\log y$ on $x$.
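The regression of log(y) on the coded year can be sketched in plain Python from the table values. The helper `least_squares` is an assumption of this example (a hand-written least-squares fit, standing in for the calculator's regression command), and base-10 logarithms are used to match the log(y) column.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope, r_squared) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy ** 2 / (sxx * syy)

x = [1, 4, 5, 6, 7, 8, 9, 10]  # coded years, 1990 = 1
subscribers = [5283, 16009, 24134, 33786, 44043, 55312, 69209, 86047]  # thousands
log_y = [math.log10(s) for s in subscribers]

intercept, slope, r_squared = least_squares(x, log_y)
print(f"log(y) = {intercept:.3f} + {slope:.3f} x, r^2 = {r_squared:.3f}")
```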


Although the model appears to be useful for prediction purposes because $r^2$ is so high, you should always check the residual plot.
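A residual check for the fit of log(y) on the coded year can be sketched as follows. This is a minimal hand-written least-squares fit on the table values, not the calculator's output, and the variable names are assumptions of this example.

```python
import math

x = [1, 4, 5, 6, 7, 8, 9, 10]  # coded years, 1990 = 1
subscribers = [5283, 16009, 24134, 33786, 44043, 55312, 69209, 86047]  # thousands
log_y = [math.log10(s) for s in subscribers]

# Least-squares line of log(y) on x.
n = len(x)
mean_x, mean_y = sum(x) / n, sum(log_y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, log_y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

# Residual = observed log(y) minus predicted log(y).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, log_y)]
for xi, res in zip(x, residuals):
    print(xi, f"{res:+.4f}")
```

The residuals are negative at both ends and positive in the middle, a curved pattern that suggests the straight line does not fully describe log(y) despite the high r².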

The purpose of finding a linear model is to be able to predict the number of subscribers in 2000. One approach would be to discard the first 4 data points since they are the oldest and furthest removed from the year 2000.


By removing the first 4 points, the $r^2$ improves to .99897, which is even better than the first.

The LSRL is represented by:

$\log(\text{subscribers}) = 3.966 + .097(\text{year})$


Now that we have the linear model, we can use it to predict the number of subscribers in the year 2000 by substituting an 11 in for “NewX” and then “undoing” the logarithm:

$\log(\text{subscribers}) = 3.966 + .097(\text{year})$

$10^{\log(\text{subscribers})} = 10^{3.966 + .097(\text{year})}$

$\text{subscribers} = (10^{3.966})(10^{.097(\text{year})})$

$\text{subscribers} = (9246.9817)(1.2503^{\text{year}})$

$\text{subscribers} = (9246.9817)(1.2503^{11})$

$\text{subscribers} = 107{,}933.6$
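The prediction can be retraced in Python. This sketch assumes the linear model was fitted to the last four data points (coded years 7 through 10), which reproduces the coefficients 3.966 and .097 to three decimals. It then back-transforms in two ways: with the unrounded fit, and with the rounded constants used in the arithmetic above. Variable names are assumptions of this example.

```python
import math

x = [7, 8, 9, 10]  # coded years 1996-1999 (1990 = 1)
subscribers = [44043, 55312, 69209, 86047]  # thousands
log_y = [math.log10(s) for s in subscribers]

# Least-squares line of log(y) on x.
n = len(x)
mean_x, mean_y = sum(x) / n, sum(log_y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, log_y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
print(f"log(subscribers) = {intercept:.3f} + {slope:.3f}(year)")

# "Undo" the logarithm: y = 10**intercept * (10**slope)**x, evaluated at x = 11.
prediction_2000 = (10 ** intercept) * (10 ** slope) ** 11

# The same arithmetic with the rounded constants from the worked steps.
rounded_prediction = 9246.9817 * 1.2503 ** 11

print(f"unrounded fit:     {prediction_2000:,.1f} thousand")
print(f"rounded constants: {rounded_prediction:,.1f} thousand")
```

The two results differ slightly because rounding the slope and the base 1.2503 is magnified when raised to the 11th power.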


Homework: p.212-213 #’s 6-8