REGRESSION Descriptions

Descriptions. Description Correlation – simply finding the relationship between two scores ○ Both the magnitude (how strong or how big) ○ And direction



REGRESSION Descriptions


Description

• Correlation – simply finding the relationship between two scores

○ Both the magnitude (how strong or how big)
○ And direction (positive / negative)


Description

• Whereas regression seeks to use one of the variables as the predictor
○ Therefore you have an X variable (IV) – predictor
○ And a Y variable (DV) – criterion


Description

• Predictor X variables – more flexible than ANOVA
○ Can be any combination of variables: continuous, Likert, categorical
• Dependent Y variables – usually continuous, but you can predict categorical variables
○ Better with discriminant analysis or logistic regression


Description

• Still not a causal design, unless you manipulate the X (IV) variable
• However, sometimes it is very obvious which variable would be predictive
○ Smoking predicts cancer


REGRESSION Research Questions


Research Questions

• Usually we want to know the relationship between the IVs and the DV, and the importance of each IV
• OR control for the variance of some variables and then see if other IVs add any additional prediction
• Compare sets of IVs and how predictive they are (which is better)


Research Questions

• How good is the equation?
○ Is it better than chance? Or better than using the mean to predict scores?


Research Questions

• Importance of IVs
○ Which IVs are the most important? Which contribute the most prediction to the equation?


Research Questions

• Adding IVs
○ For example, PTSD scores are predictive of alcohol use
○ After we control for these scores, do meaning in life scores help predict alcohol use?


Research Questions

• Non-linear relationships can be assessed and determined
○ So, you can use X² to help with curvilinear relationships that you might see when data screening


Research Questions

• Controlling for other sets of IVs
○ Using demographics to control for unequal groups or for additional variance due to differences between people
• Comparing sets of IVs
○ Using several IVs together to predict over and above another set of IVs


Research Questions

• Making an equation to predict new people’s scores
○ After you have shown that your IVs are predictive, using those scores to assess new people’s performance
○ Entrance exams for school, military, etc.


REGRESSION PARTS


Equation

Y-hat = A + B1X1 + B2X2 + …
• Y-hat = predicted value for each participant
• A = constant, the value added to each score; it is the predicted score when every X is zero (y-intercept)


Equation

Y-hat = A + B1X1 + B2X2 + …
• B = coefficient
○ Holding all other variables constant, for every one unit increase in X there is a B unit increase in Y
○ Slope for that X variable given all others are zero
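As a rough illustration of how the equation is used, the sketch below plugs scores into Y-hat = A + B1X1 + B2X2. The intercept, coefficients, and scores are made-up values chosen only for the example, and `predict` is a hypothetical helper name, not something from SPSS or the slides.

```python
def predict(intercept, coefs, xs):
    """Return Y-hat = A + B1*X1 + B2*X2 + ... for one participant."""
    if len(coefs) != len(xs):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefs, xs))

# Hypothetical values for illustration only (not from the slides).
A = 50.0
B = [2.0, 0.5]

at_zero = predict(A, B, [0, 0])     # every X at zero: prediction is the intercept
before = predict(A, B, [3, 10])
after = predict(A, B, [4, 10])      # X1 goes up one unit, X2 held constant

print(at_zero)
print(after - before)               # the change equals B1
```

Raising X1 by one unit while X2 stays fixed changes the prediction by exactly B1, which is the "holding all other variables constant" reading of B.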


Equation

Standardized Equation
Zy-hat = β1Zx1 + β2Zx2 + …
• Beta (β) = standardized B (or z-score B if you like)
• For each 1 standard deviation increase in X, there is a β standard deviation increase in Y
○ Difficult to interpret
○ BUT! β usually falls between -1 and 1, so you can treat it as if it were r (which means you can tell direction and magnitude)


Equation

• Multiple correlation = R
○ R is the Pearson product-moment correlation between Y and Y-hat
• R² = variance accounted for in the DV by all the IVs (not just one like r, but ALL of them).
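To make the link between R and R² concrete, here is a minimal sketch with invented data and a single predictor (kept to one IV so it runs on the standard library alone; the same identity holds for least-squares fits with several IVs). It checks that the squared correlation between Y and Y-hat equals the proportion of DV variance accounted for.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

# Invented data for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]

R = pearson_r(y, y_hat)                                   # correlation of Y with Y-hat
ss_res = sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat))  # unexplained variation
ss_tot = sum((yi - mean(y)) ** 2 for yi in y)             # total variation in the DV
r_squared = 1 - ss_res / ss_tot                           # proportion accounted for

print(round(R ** 2, 4), round(r_squared, 4))
```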


SR

• Semipartial correlation = sr = "part" in SPSS
○ Squared (sr²), it is the unique contribution of that IV to R²
○ Increase in the proportion of explained Y variance when X is added to the equation
○ A / DV variance
[Figure: Venn diagram of DV variance, IV 1, and IV 2, with region A marked]


PR

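• Partial correlation = pr = "partial" in SPSS
○ Squared (pr²), it is the proportion of the variance in Y not explained by the other predictors that is explained by this X only
○ A/B
○ pr ≥ sr (in absolute value)
[Figure: Venn diagram of DV variance, IV 1, and IV 2, with regions A and B marked]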
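For the two-predictor case, sr and pr can be computed directly from the three bivariate correlations using the standard textbook formulas. The sketch below uses invented correlations, and the function names are illustrative only; it checks that sr² equals the increase in R² when the IV is added and that pr is at least as large as sr.

```python
import math

def semipartial(r_y1, r_y2, r_12):
    """sr for IV 1: IV 2 is removed from IV 1 only."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)

def partial(r_y1, r_y2, r_12):
    """pr for IV 1: IV 2 is removed from both IV 1 and the DV."""
    return (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))

def r_squared_two_ivs(r_y1, r_y2, r_12):
    """R-squared for a regression of Y on two IVs, from correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Invented correlations for illustration only.
r_y1, r_y2, r_12 = 0.5, 0.4, 0.3

sr = semipartial(r_y1, r_y2, r_12)
pr = partial(r_y1, r_y2, r_12)
gain = r_squared_two_ivs(r_y1, r_y2, r_12) - r_y2 ** 2   # R2 gained by adding IV 1

print(round(sr, 3), round(pr, 3), round(gain, 3))
```

The denominator of pr is never larger than that of sr, which is why pr cannot be smaller than sr in absolute value.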


TYPES OF REGRESSION


ANOVA = Regression

• ANOVA = regression with discrete variables
• However, you cannot easily create an ANOVA from a regression
○ Must convert continuous variables into discrete variables, which causes you to lose variance
• More power with regression


Simple (SLR)

• SLR involves only one IV and one DV.
• It’s called simple because there’s only ONE thing predicting.
• In this case, beta = r.
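The claim that beta equals r with a single predictor can be checked numerically. This is a small sketch on invented books and grade scores (not the course data set): both variables are converted to z-scores, and the slope of the standardized regression is compared with the correlation.

```python
import math

def zscores(values):
    """Convert raw scores to z-scores using the sample SD."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Invented scores for illustration only.
books = [0, 1, 2, 2, 3, 4, 4, 5]
grade = [55, 60, 58, 70, 72, 69, 80, 85]

z_books, z_grade = zscores(books), zscores(grade)
beta = slope(z_books, z_grade)                                        # standardized slope
r = sum(a * b for a, b in zip(z_books, z_grade)) / (len(books) - 1)   # Pearson r

print(round(beta, 4), round(r, 4))
```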


Multiple (MLR)

• MLR uses several IVs and only one DV.
• You can use a mix of variables – continuous, categorical, Likert, etc.
• You can use MLR to figure out which IVs are the most important.
○ 3 types of MLR


Simultaneous/Standard

• All of the variables are entered “at once”
• Each variable is assessed as if it were the last variable entered
○ This “controls” for the other IVs, as we talked about with the interpretation of B.
○ Evaluates: is sr > 0?


Simultaneous/Standard

If you have two highly correlated IVs the one with the biggest sr gets all the variance

Therefore the other IV will get very little variance associated with it and look unimportant


Sequential/Hierarchical

IVs enter the regression equation in an order specified by the researcher

First IV is basically tested against r (since there’s nothing else in the equation it gets all the variance)

Next IVs are tested against pr (they only get the left over variance)


Sequential/Hierarchical

• What order?
○ Assigned by theoretical importance
○ Or you can control for nuisance variables in the first step


Sequential/Hierarchical

• Using SETS of IVs instead of individual IVs
○ So, say you have a group of IVs that are super highly correlated, but you don’t know how to combine them and don’t want to eliminate them.
○ Instead you will process each step as a SET, and you don’t care about each individual predictor


Stepwise/Statistical

• Entry into the equation is based solely on statistical relationships and has nothing to do with theory or your experiment


Stepwise/Statistical

Forward – biggest IV is added first, then each IV is added as long as it accounts for enough variance

Backward – all are entered in the equation at first, and then each one is removed if it doesn’t account for enough variance

Stepwise – mix between the two (adds them but then may later delete them if they are no longer important).


ASSUMPTIONS


Number of People

• Ratio of cases to IVs
○ If you have fewer cases than IVs, you will get a perfect solution (i.e., account for all the variance in the DV)
○ But that doesn’t mean anything…


Number of People

• Ratio of cases to IVs
○ G*Power – tells you how many cases you need given alpha, power, number of predictors, etc.
○ Rule of thumb = more than 50 + 8(K), where K is the number of IVs
○ Or 104 + K (for testing the importance of individual predictors)
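The two rules of thumb are simple enough to wrap in helper functions. The sketch below is only a convenience calculator for the formulas on this slide (the function names are made up); it does not replace a proper power analysis.

```python
def min_n_overall(k):
    """Rule of thumb for testing the overall equation: 50 + 8K."""
    return 50 + 8 * k

def min_n_predictors(k):
    """Rule of thumb for testing individual predictors: 104 + K."""
    return 104 + k

# Example with K = 3 IVs; you would want more cases than these thresholds.
k = 3
print(min_n_overall(k), min_n_predictors(k))
```

A cautious reading is to aim for more cases than the larger of the two thresholds when you care about both the overall model and the individual IVs.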


Number of People

• How many people?
○ However…you can have too many people.
○ Any correlation or predictor will be significant with a very large N
○ Practical versus statistical significance
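To see why a very large N makes almost anything significant, the sketch below computes the usual t statistic for a correlation, t = r·sqrt(N − 2) / sqrt(1 − r²), for the same tiny r at different sample sizes. The value r = .05 is invented for illustration, and 1.96 is used only as the approximate large-sample two-tailed cutoff at alpha = .05.

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.05   # a tiny correlation: only r**2 = 0.25% of the variance
for n in (30, 300, 10000):
    t = t_for_r(r, n)
    # 1.96 is an approximate cutoff; small samples need a slightly larger value
    print(n, round(t, 2), "significant" if abs(t) > 1.96 else "not significant")
```

The effect size (r²) never changes across the three runs; only N does, which is the gap between statistical and practical significance.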


Missing Data

Continuous data – linear trend at point, mean replace, etc.

Categorical data – best to leave it out because you can’t guess at it.


Outliers

• Now, since IVs are continuous, we want to make sure there are no outliers on either the IVs or the DVs
○ Mahalanobis distance


Outliers

• Leverage – how much influence over the slope a point has
○ Cut off rule of thumb = (2K+2)/N
• Discrepancy – how far away from other data points a point is (no influence)
• Cook’s – influence – combination of both leverage and discrepancy
○ Cut off rule of thumb = 4/(N-K-1)
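The two cut-off rules of thumb can be computed as below, where K is the number of IVs and N the number of cases. The function names and the example values (K = 2, N = 40) are assumptions made for illustration.

```python
def leverage_cutoff(k, n):
    """Rule-of-thumb leverage cut-off: (2K + 2) / N."""
    return (2 * k + 2) / n

def cooks_cutoff(k, n):
    """Rule-of-thumb Cook's distance cut-off: 4 / (N - K - 1)."""
    return 4 / (n - k - 1)

# Hypothetical study with 2 IVs and 40 cases.
k, n = 2, 40
print(round(leverage_cutoff(k, n), 3))
print(round(cooks_cutoff(k, n), 3))
```

Cases whose leverage or Cook's values exceed these numbers are candidates for a closer look, not automatic deletions.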


Multicollinearity

• If IVs are too highly correlated there are several issues
○ SPSS may not run
○ SPSS picks which variable to go first depending on the type of analysis
• Check – bivariate correlation table of IVs (you do want them to be correlated with the DV!)


Normal/Linear

• Normality – we want our IVs and DVs to be normally distributed
○ Residual histogram
• Linearity – relationships between IV and DV should be linear, or you will need a special X² term
○ Normality P-P plot


Homogeneity/Homoscedasticity

• Homogeneity – you want the IVs/DVs to have equal variances
○ Residual plot (equal spread up and down – raining)
• Homoscedasticity – you want the errors to be spread evenly across the values of the other variables
○ Residual plot (equal spread up and down across the bottom – megaphones)


Theoretical Assumption

• Independence of errors
○ You need to know that the scores of the first person tested are not affecting the scores of the last person tested
○ Mud on a scale


EXAMPLES


SLR

Data set 1
• IVs
○ Books – number of books people read
○ Attend – attendance for class
• DV
○ Grade – final grade in the class


SLR

• Research questions:
○ Does the number of books predict final grade in the course?
○ Does attendance predict final grade in the course?


MLR - Simultaneous

• Research question
○ Do books and attendance both predict final course grade?
○ Overall – together?
○ Individual predictors?


MLR – Hierarchical

• Research question: What predicts how well people take care of their cars?
○ We want to first control for demographics (age, gender)
○ And then use extroversion to predict how well people take care of their cars.


MLR Hierarchical

So after controlling for demographics, does extroversion predict?


Interactions

• Dummy Coding
• Types
○ Two categorical
○ One categorical, one continuous
○ Two continuous


Dummy Coding

• A way to do ANOVA in regression
○ If you have two levels, simply type them in as 0 and 1
○ If you have more than two levels, you need to enter each separately


Dummy Coding

• More than two levels:
○ You will need (number of levels – 1) columns
○ F-value – tells you the overall main effect
○ B value – compares that group to the group coded as all zeros


Dummy Coding

• After you code each variable separately, enter them as a set (in one simultaneous regression)
• The significance of the overall model will tell you if the main effect is significant
• B gives you differences between groups (two levels)


Dummy Coding

• How many friends do people have?
○ This example is from ANOVA.
○ IV: Health condition – excellent, fair, or poor.
○ DV: Number of friends.


Dummy Coding

• Since we have three groups or levels, we’ll need to recode this variable into 2 variables.
○ One for excellent
○ One for fair
○ Poor gets no column of its own (it is left blank, i.e., it is the group coded as all zeros).


Dummy Coding

• Why not three?
○ Because that would be repetitive.
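A minimal sketch of the recoding for the health condition example follows, assuming "poor" is treated as the group coded as all zeros. The `dummy_code` helper and the sample responses are hypothetical; in SPSS you would build the same columns with Recode or Compute.

```python
LEVELS = ["excellent", "fair", "poor"]
REFERENCE = "poor"   # assumed reference group: coded 0 on every column

def dummy_code(value):
    """Return the (levels - 1) dummy columns for one person's health condition."""
    if value not in LEVELS:
        raise ValueError(f"unknown level: {value}")
    return {level: int(value == level) for level in LEVELS if level != REFERENCE}

# Hypothetical responses for illustration only.
people = ["excellent", "poor", "fair", "fair"]
for person in people:
    print(person, dummy_code(person))

# Why not three columns? A third column for "poor" would always equal
# 1 - excellent - fair, so it adds no new information.
third = [1 - sum(dummy_code(p).values()) for p in people]
print(third)
```

The third column is fully determined by the first two, which is the sense in which a third dummy variable "would be repetitive".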


Interactions

• Interactions – well, we automatically test for interactions in ANOVA, so why not in regression?
○ In regression, an interaction says that there are differences in the slope of the line predicting Y from one IV depending on the level of the other IV


Interactions

• Nominal variable interactions:
○ So we have two categorical predictors.
○ Example – create interaction term
○ Testing Environment by Learning Environment.


Interactions - Nominal

• Now that we’ve created our interaction terms, we can test them using a hierarchical regression
○ Step one – main effects
○ Step two – main effects and interactions


Interactions - Nominal

• Now we examine Step 1 for main effects
• Step 2 for interactions
○ You ignore the main effects in Step 2


Interactions - Nominal

• What does all that mean?!
○ After a significant ANOVA, you do a post hoc, correct?
○ Simple slopes – post hoc analyses for interactions in regression
○ These are “harder to get” than in an ANOVA, but there are fewer “tests” to run, so technically more powerful/less Type 1 error


Interactions - Nominal

You will write out the equation and figure out the slopes/means/picture for each condition combination.

Y-hat = 30.8 + (-8)(learning) + (-14.1)(testing) + 20.5(learning × testing)


Interactions - Nominal

• Now we’ll fill in the equation for all the combinations.
○ Learning (0 or 1)
○ Testing (0 or 1)
○ Interaction (0 or 1 depending on the combination).
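Filling in the equation can be sketched as a small loop over the four condition combinations. The coefficients are the ones given in the equation for this example (30.8, -8, -14.1, 20.5); the function name is made up, and the interaction term is simply the product of the two 0/1 codes.

```python
def predicted_score(learning, testing):
    """Y-hat = 30.8 - 8(learning) - 14.1(testing) + 20.5(learning x testing)."""
    interaction = learning * testing   # 1 only when both codes are 1
    return 30.8 + -8 * learning + -14.1 * testing + 20.5 * interaction

for learning in (0, 1):
    for testing in (0, 1):
        print(learning, testing, round(predicted_score(learning, testing), 1))
# 0 0 30.8
# 0 1 16.7
# 1 0 22.8
# 1 1 29.2
```

These four predicted values are the cell means used to picture the interaction.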


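Interaction - Nominal

Predicted scores (rows: learning environment; columns: testing environment)

              Dry (0)   Wet (1)
Dry (0)        30.8      16.7
Wet (1)        22.8      29.2

[Figure: chart of Score (axis 0 to 35) by Learning Environment, Dry (0) vs. Wet (1), with separate series for Dry (0) and Wet (1)]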


Interactions - Mix

Data Set 4
• IVs
○ Events – number of events attended
○ Status – low (0) versus high (1)
• DV
○ Stress levels


How to

• Create interaction
○ Transform > Compute > multiply
• Run regression as before
○ Step 1 – main effects
○ Step 2 – main effects and interaction


Interactions - Mix

• LOW status, look at events slope.
○ B = .121, β = .52, t(57) = 3.94, p < .001, indicating that low status people feel more stress as the number of events they attend increases.
• HIGH status, look at events slope.
○ B = .02, β = .10, t(57) = .55, p = .58, indicating that high status people feel the same amount of stress no matter how many events they attend.


Interaction - Mix

              Low Events   High Events
Low Status      20.93        27.81
High Status     17.78        18.98

[Figure: chart of the values above (axis 0 to 30) at Low Events vs. High Events, with separate series for Low Status and High Status]


Interactions - continuous

• Most likely combination, since you are running a regression
○ Create the interaction term first (multiply them together)
○ Books * Attendance interaction to predict grades.


Interactions – continuous

• Pick ONE variable to examine. Let’s go with attendance.
• You can get the AVERAGE slope for attendance and books. Since we picked attendance, we will look at the slope for books, β = -.532, t(37) = -1.21, p = .24. So at average attendance, reading books does not increase your grade.
• Let’s create hi and lo terms for ONE of the variables.
○ AttendanceHI, AttendanceLO
○ AttendanceHI by Books, AttendanceLO by Books.


Interactions - continuous

• Now, we can’t just use 1 and 0 for different groups
○ So we have to create “hi” and “lo” groups for one variable
○ This theory is also backwards…for the hi group, you subtract 1 SD; for the lo group, you add 1 SD
○ Basically you are bringing them up or down to the mean
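The "backwards" hi and lo terms can be sketched as follows, assuming attendance is mean-centered first and that a sample SD is used. The scores are invented and the variable names only mirror AttendanceHI and AttendanceLO. Subtracting 1 SD moves the zero point to one SD above the mean, so in a regression that uses the hi terms, the books slope describes people with high attendance.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

# Invented scores for illustration only.
attendance = [10, 12, 15, 18, 20, 22, 25]
books = [0, 1, 1, 2, 3, 3, 4]

m, sd = mean(attendance), sample_sd(attendance)
attend_c = [a - m for a in attendance]        # centered: zero = average attendance
attend_hi = [a - sd for a in attend_c]        # zero now sits at +1 SD ("hi")
attend_lo = [a + sd for a in attend_c]        # zero now sits at -1 SD ("lo")

# Interaction terms to enter with the matching hi or lo variable.
hi_by_books = [a * b for a, b in zip(attend_hi, books)]
lo_by_books = [a * b for a, b in zip(attend_lo, books)]

print(round(mean(attend_hi), 3), round(mean(attend_lo), 3))
```

Running the regression once with the hi terms and once with the lo terms gives the books slope at high and at low attendance, respectively.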


Interaction

[Figure: chart of Grade (axis 0 to 90) by Books Read (low books vs. high books), with separate series for High Attendance, Average Attendance, and Low Attendance]


Mediation


Mediation

Mediation occurs when the relationship between an X variable and a Y variable is eliminated or lowered when an additional Mediator variable is added to the equation.


Mediation Steps

• Baron and Kenny
○ Step 1 – use X to predict Y to get the c pathway.
○ Step 2 – use X to predict M to get the a pathway.
○ Step 3 – use X and M to predict Y to get the b pathway.
○ Step 4 – use the same regression to look at the c’ pathway.
• Sobel test
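Once the a and b pathways and their standard errors are taken from the Step 2 and Step 3 regressions, the Sobel test can be computed by hand. The sketch below uses the common first-order Sobel formula, z = ab / sqrt(b²·SEa² + a²·SEb²); the path values are invented and `sobel_z` is a hypothetical helper name.

```python
import math

def sobel_z(a, se_a, b, se_b):
    """First-order Sobel z for the indirect effect a*b."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)

# Invented path estimates for illustration only.
a, se_a = 0.5, 0.1    # X -> M (Step 2)
b, se_b = 0.4, 0.1    # M -> Y controlling for X (Step 3)

z = sobel_z(a, se_a, b, se_b)
print(round(z, 3))    # compare |z| with 1.96 for a two-tailed test at alpha = .05
```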


Mediation Steps