Notebook File Chapter 8

  • 8/10/2019 Notebook File Chapter 8

    October 15, 20

    The least squares regression line (LSRL)

    Recall from yesterday:

    Scatterplot:

    Direction

    Form

    Strength

    Unusual features

    quantitative variables

    does not imply

    causation

    'r' applies to linear associations only

    a 'bent' association

    may be made linear by

    re-expressing the data

    After standardization

    Linear Regression Line

    Equation 'models' the relationship allowing us to predict y

    values for any given x value

    ŷ = 34x + 310 where ŷ = predicted rent with hotel

    x = spaces from 'go'

    For the data point (39, 2000):

    ŷ = 34(39) + 310

    ŷ = 1636

    'hat' means predicted
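
    A minimal Python sketch of the prediction step, using the slope 34 and intercept 310 from the equation in these notes. The function name `predict_rent` is a hypothetical label chosen for illustration.

```python
def predict_rent(spaces_from_go: float) -> float:
    """Predicted rent with a hotel (y-hat) from the model y-hat = 34x + 310."""
    slope = 34        # predicted change in rent per additional space from 'go'
    intercept = 310   # predicted rent at x = 0
    return slope * spaces_from_go + intercept


# Property 39 spaces from 'go'
print(predict_rent(39))  # 1636
```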

    Linear Regression Line

    Residual

    residual = y - ŷ

    residual = 2000 - 1636

    = 364

    If residual > 0, then the observed y value is greater than the predicted value, i.e. the actual data point is above the regression line

    residual = observed - predicted
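
    The residual calculation can be sketched in a few lines of Python. The values 2000 and 1636 come from the example in these notes; the function name `residual` and the variable names are assumptions made for illustration.

```python
def residual(observed: float, predicted: float) -> float:
    """Residual = observed y minus predicted y (y - y-hat)."""
    return observed - predicted


observed_rent = 2000   # actual rent at x = 39
predicted_rent = 1636  # value given by the line at x = 39

r = residual(observed_rent, predicted_rent)

# A positive residual means the data point lies above the line
position = "above" if r > 0 else "below" if r < 0 else "on"
print(r, position)  # 364 above
```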

    Calculating Slope of Least Squares Regression line

    Recall: r = .83

    for explanation, see

    Math Box on page 175

    Moving away from the 'standardized' equation to the actual equation

    The standardized equation is nice, but it means we have to change all of our data to z-scores, solve the equation, and then convert the z-scores back to regular data. This is too cumbersome!

    slope = r(sy/sx)

    y-intercept = ȳ - (slope)(x̄)
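
    One way to see where the slope and y-intercept come from, written as a short derivation. It assumes the standardized form of the line is the usual one (predicted z-score of y equals r times the z-score of x); the notes refer to the standardized equation but do not show it, so treat that starting line as an assumption.

```latex
\begin{align*}
\hat{z}_y &= r\,z_x \\
\frac{\hat{y} - \bar{y}}{s_y} &= r\,\frac{x - \bar{x}}{s_x} \\
\hat{y} &= \bar{y} + r\frac{s_y}{s_x}\,(x - \bar{x}) \\
\hat{y} &= \underbrace{\left(\bar{y} - r\frac{s_y}{s_x}\,\bar{x}\right)}_{\text{y-intercept}}
          + \underbrace{r\frac{s_y}{s_x}}_{\text{slope}}\,x
\end{align*}
```

    Setting x equal to the mean of x in the third line gives a prediction equal to the mean of y, which is why the line passes through the center of the data.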

    Revisiting yesterday's problem

    a =

    b =

    Remember: the least squares regression line always passes through the center of the data (x̄, ȳ)
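
    A small Python sketch that computes the slope as r(sy/sx) and then checks that the line passes through the point of means. The data set is hypothetical and chosen only for illustration; it is not the problem from class.

```python
from statistics import mean, stdev

# Hypothetical illustrative data, not taken from the notes
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)

# Correlation r: sum of products of z-scores, divided by n - 1
n = len(x)
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)

slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

# The prediction at the mean of x should equal the mean of y
prediction_at_center = intercept + slope * x_bar

print(round(slope, 3), round(intercept, 3))      # 0.6 2.2
print(abs(prediction_at_center - y_bar) < 1e-9)  # True
```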

    QUOTATION OF THE DAY

    "Thank God the research didn't find that novels increased tooth decay or blocked up your arteries."

    LOUISE ERDRICH,

    the novelist, on studies finding that after reading literary fiction people performed better on tests measuring empathy, social perception and emotional intelligence.

    To find the equation of a regression line, use the slope r(sy/sx) and the point (x̄, ȳ)

    What conditions must a scatterplot satisfy before we can discuss correlation and a regression line?

    no outliers

    no discernible pattern besides being linear

    both variables quantitative

    Important notes about the LSRL

    Before you use the LSRL, you must look at the

    scatterplot to make sure the relationship is

    linear

    Be sure to distinguish between actual data values and

    predicted values. Use words like 'predicted' and 'on

    average'

    The slope of the LSRL has the same sign as r, the correlation coefficient

    Be able to answer questions, such as:

    What is the equation of the LSRL?

    What is the interpretation of the slope and y-intercept?

    How well does the line fit the data?

    How appropriate is our line in making predictions?

    The answer? Residuals

    How? Create a scatterplot of the residuals vs. the x-values.

    What is it about R²?

    R²: the coefficient of determination

    Shows how well x predicts y with a linear model. The closer R² is to 1, the larger the share of the variation in y that the line accounts for.

    If R² is 0.80, then 80% of the variation in the values of y is explained by its linear relationship with the values of x
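
    A hedged sketch of the "percent of variation explained" idea: it computes the coefficient of determination as 1 minus (leftover variation in the residuals) / (total variation in y) and compares it with the square of r. The data set is hypothetical and used only for illustration.

```python
from statistics import mean

# Hypothetical illustrative data, not taken from the notes
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y

# Least squares slope and intercept
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)       # variation left in the residuals

r_squared = 1 - sse / s_yy
r = s_xy / (s_xx * s_yy) ** 0.5

print(round(r_squared, 3))  # 0.6
print(round(r ** 2, 3))     # 0.6
```

    Here 60% of the variation in y is accounted for by the linear relationship with x, and the remaining 40% is left in the residuals.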

    [Figure: residual plot of Residuals (about -10 to 10) vs. Registrations (400 to 1000)]

    Residuals (continued)

    A good residual plot - one that confirms the appropriate use of the LSRL - should:

    have similar scatter throughout

    have no interesting features

    have no direction or shape

    se = standard deviation of the residuals

    A small se implies residuals that are closely clustered about their mean

    The mean of the residuals = 0

    The units of the residuals are the same as the units of the y-values
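
    A short Python sketch of these facts about residuals. The data are hypothetical, and dividing by n - 2 when computing se is an assumption based on the common textbook convention; the notes only describe se as the standard deviation of the residuals.

```python
from math import sqrt
from statistics import mean

# Hypothetical illustrative data, not taken from the notes
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Residuals: observed minus predicted, in the same units as y
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

mean_residual = mean(residuals)

# Assumed convention: divide by n - 2 for a line with two fitted values
s_e = sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))

print(abs(mean_residual) < 1e-9)  # True
print(round(s_e, 3))              # 0.894
```

    Pairing each x with its residual, for example with `list(zip(x, residuals))`, gives the points for a residual plot.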

    Before you find the best-fit line:

    Step 1: Check that the two variables are quantitative

    Check that the association is linear

    Check for the presence of outliers

    Step 2: Do the math

    Find the least squares line

    r

    r²

    State the conclusion

    What does the y-intercept tell you? What does R² tell you?

    se = standard deviation of the residuals

    Finally: check the residual plot to make sure that the linear model is appropriate

    Note: you cannot work backwards.

    You cannot use the linear regression line of y on x to predict x from y
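
    One way to illustrate this note is to compare two answers on a small hypothetical data set: solving the y-on-x line backwards for x, and fitting a separate least squares line that predicts x from y. The data and variable names are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical illustrative data, not taken from the notes
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Line for predicting y from x
b_yx = s_xy / s_xx
a_yx = y_bar - b_yx * x_bar

# Separate least squares line for predicting x from y
b_xy = s_xy / s_yy
a_xy = x_bar - b_xy * y_bar

target_y = 5
x_by_inverting = (target_y - a_yx) / b_yx  # working backwards
x_by_regression = a_xy + b_xy * target_y   # proper x-on-y prediction

print(round(x_by_inverting, 3), round(x_by_regression, 3))  # 4.667 4.0
```

    The two answers differ because the y-on-x line minimizes squared vertical distances, while predicting x calls for minimizing squared horizontal distances.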

    slope of the LSRL = r(sy/sx)