Re-Expressing Data

Re-Expressing Data. Scatter Plot of: Weight of Vehicle vs. Fuel Efficiency Residual Plot of: Weight of Vehicle vs. Fuel Efficiency

Scatter Plot of: Weight of Vehicle vs. Fuel Efficiency

Residual Plot of: Weight of Vehicle vs. Fuel Efficiency

Re-Expression!!

Goal 1: Make the distribution more symmetric.

A symmetric, unimodal distribution may be close enough to Normal to allow use of the 68-95-99.7 Rule.

Example: take the log of the x variable.
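As a rough illustration of Goal 1, the sketch below compares the skewness of a small right-skewed data set before and after taking logs. The data values and the `skewness` helper are assumptions made up for this example; they do not come from the slides.

```python
import math
import statistics


def skewness(values):
    """Skewness as the mean cubed z-score (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed measurements (made-up numbers, all positive).
raw = [1, 2, 2, 3, 4, 5, 8, 13, 30, 100]
logged = [math.log10(v) for v in raw]

# A value nearer 0 suggests a more symmetric distribution.
print(f"skewness of raw data:   {skewness(raw):.2f}")
print(f"skewness of log10 data: {skewness(logged):.2f}")
```

A histogram of each version would show the same thing visually; the skewness number is only a convenient summary.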

Goal 2: Make the spread of several groups more alike, even if their centers are different.

Groups with a common spread are easier to compare.

Example: take the log of x.
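A minimal sketch of Goal 2, assuming two made-up groups in which the second is simply the first scaled by 10. Real groups will not be this tidy; the point is only that the log turns a difference in scale into a difference in center. The group names and the `spread_ratio` helper are hypothetical.

```python
import math
import statistics

# Hypothetical groups (made-up numbers): B is A multiplied by 10.
groups = {
    "A": [2, 3, 4, 6, 8],
    "B": [20, 30, 40, 60, 80],
}


def spread_ratio(data):
    """Largest group standard deviation divided by the smallest."""
    sds = [statistics.stdev(vals) for vals in data.values()]
    return max(sds) / min(sds)


# Re-express every group with log base 10.
logged = {name: [math.log10(v) for v in vals] for name, vals in groups.items()}

# A ratio near 1 means the groups have similar spread.
print(f"spread ratio, raw data:   {spread_ratio(groups):.2f}")
print(f"spread ratio, log10 data: {spread_ratio(logged):.2f}")
```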

Goal 3: Make the scatter plot nearly linear.

A linear relationship is easier to model.

Example: take the log of x.
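A sketch of Goal 3 on invented data: y is built to follow log10(x) plus a small fixed wiggle, so the raw scatter plot bends while the plot against log x is close to straight. The correlation is used here only as a rough numeric stand-in for looking at the scatter plot and residual plot; the data and the `pearson_r` helper are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y follows log10(x) with a small deterministic wiggle.
x = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
wiggle = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y = [3 + 2 * math.log10(xi) + w for xi, w in zip(x, wiggle)]

# Re-express x and compare how straight each relationship is.
log_x = [math.log10(xi) for xi in x]
r_raw = pearson_r(x, y)
r_log = pearson_r(log_x, y)

print(f"r using x:        {r_raw:.3f}")
print(f"r using log10(x): {r_log:.3f}")
```

A high correlation alone does not prove linearity, so the residual plot is still the check to trust.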

Goal 4: Make the scatter in a scatter plot spread out evenly rather than thick at one end or the other.

Example: take the log of x.

There is a family of simple re-expressions that move data toward our goals in a consistent way. This collection of re-expressions is called the Ladder of Powers.

The Ladder of Powers orders the effects that the re-expressions have on data.

Ladder of Powers

Power | Name | Comment
2 | Square of data values | Try with unimodal distributions that are skewed to the left.
1 | Raw data | Data with positive and negative values and no bounds are less likely to benefit from re-expression.
1/2 | Square root of data values | Counts often benefit from a square root re-expression.
“0” | We’ll use logarithms here | Measurements that cannot be negative often benefit from a log re-expression.
-1/2 | Reciprocal square root | An uncommon re-expression, but sometimes useful.
-1 | The reciprocal of the data | Ratios of two quantities (e.g., mph) often benefit from a reciprocal.
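The rungs of the ladder can be sketched as a single function. This is a minimal illustration, not a standard library routine: the name `re_express`, the choice of log base 10 for the “0” rung, and the restriction to positive data are all assumptions made for this example.

```python
import math

# The rungs listed in the table, from the top of the ladder to the bottom.
LADDER = [2, 1, 0.5, 0, -0.5, -1]


def re_express(values, power):
    """Apply one rung of the Ladder of Powers to positive data values.

    A power of 0 stands for the logarithm (base 10 here, by assumption).
    Every other power raises each value to that power.
    """
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive data")
    if power == 0:
        return [math.log10(v) for v in values]
    return [v ** power for v in values]


# Hypothetical data values, re-expressed at every rung.
data = [1, 4, 100]
for power in LADDER:
    print(power, re_express(data, power))
```

Note that the plain negative powers reverse the order of the values (the largest raw value becomes the smallest), which matters when reading the direction of a relationship after re-expression.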

Plan B: Attack of the Logarithms

We seek a “useful” model, not perfection!

Why Not Just Use a Curve?

If there’s a curve in the scatterplot, why not just fit a curve to the data?

The mathematics and calculations for “curves of best fit” are considerably more difficult than those for “lines of best fit.”

Besides, straight lines are easy to understand. We know how to think about the slope and the y-intercept.