© Prof. Andy Field, 2016, www.discoveringstatistics.com

Linear Models: Looking for Bias

The following sections have been adapted from Field (2013), Chapter 8. These sections have been edited down considerably and I suggest (especially if you're confused) that you read this chapter in its entirety. You will also need to read this chapter to help you interpret the output. If you're having problems there is plenty of support available: you can (1) email or see your seminar tutor, (2) post a message on the course bulletin board, or (3) drop into my office hour.

More on Bias

Outliers

We have seen that outliers can bias a model: they bias estimates of the regression parameters. We know that an outlier, by its nature, is very different from all of the other scores. Therefore, if we were to work out the differences between the data values that were collected and the values predicted by the model, we could detect an outlier by looking for large differences. The differences between the values of the outcome predicted by the model and the values of the outcome observed in the sample are called residuals. If a model is a poor fit to the sample data then the residuals will be large. Also, if any cases stand out as having a large residual, then they could be outliers.

The normal or unstandardized residuals described above are measured in the same units as the outcome variable and so are difficult to interpret across different models. All we can do is look for residuals that stand out as being particularly large: we cannot define a universal cut-off point for what constitutes a large residual. To overcome this problem, we use standardized residuals, which are the residuals converted to z-scores, which means they are converted into standard deviation units (i.e., they are distributed around a mean of 0 with a standard deviation of 1).
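To make this concrete, here is a minimal sketch in Python of converting residuals to z-scores and flagging cases whose standardized residual exceeds 1.96 in absolute value. The data and the one-predictor model are invented for illustration; they are not from this handout's data set.

```python
# Invented data for illustration; the final case is a deliberate outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 30.0]

# Fit a simple linear model y = b0 + b1*x by least squares.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx

# Residuals: observed values minus the values the model predicts.
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Standardize: divide by the residuals' standard deviation (their mean is 0).
sd = (sum(e ** 2 for e in resid) / (n - 1)) ** 0.5
z = [e / sd for e in resid]

# Cases beyond +/-1.96 stand out and deserve a closer look.
flagged = [i for i, zi in enumerate(z) if abs(zi) > 1.96]
print(flagged)
```

Whether you divide by n − 1 or by the residual degrees of freedom is a detail; statistical packages report slightly more refined 'studentized' versions, so expect small numerical differences from SPSS.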
By converting residuals into z-scores (standardized residuals) we can compare residuals from different models and use what we know about the properties of z-scores to devise universal guidelines for what constitutes an acceptable (or unacceptable) value. For example, in a normally distributed sample, 95% of z-scores should lie between −1.96 and +1.96, 99% should lie between −2.58 and +2.58, and 99.9% (i.e., nearly all of them) should lie between −3.29 and +3.29. Some general rules for standardized residuals are derived from these facts: (1) standardized residuals with an absolute value greater than 3.29 (we can use 3 as an approximation) are cause for concern because in an average sample a value this high is unlikely to occur; (2) if more than 1% of our sample cases have standardized residuals with an absolute value greater than 2.58 (we usually just say 2.5) there is evidence that the level of error within our model is unacceptable (the model is a fairly poor fit to the sample data); and (3) if more than 5% of cases have standardized residuals with an absolute value greater than 1.96 (we can use 2 for convenience) then there is also evidence that the model is a poor representation of the actual data.

Influential Cases

As well as testing for outliers by looking at the error in the model, it is also possible to look at whether certain cases exert undue influence over the parameters of the model. So, if we were to delete a certain case, would we obtain different regression coefficients? This type of analysis can help to determine whether the regression model is stable across the sample, or whether it is biased by a few influential cases. There are numerous ways to look for influential cases, all described in scintillating detail in Field (2013). We'll just look at one of them, Cook's distance, which quantifies the effect of a single case on the model as a whole. Cook and Weisberg (1982) suggested that values greater than 1 may be cause for concern.
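For a model with one predictor, Cook's distance has a closed form built from each case's residual, its leverage (hat value), and the model's mean squared error. The sketch below uses invented data, not the handout's data set, and applies the Cook and Weisberg cut-off of 1:

```python
# Invented data; the final case both has an extreme y value and sits at the
# edge of the x range, so it is influential.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 30.0]

n, p = len(x), 2                        # p = number of model parameters (b0, b1)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in resid) / (n - p)     # mean squared error

cooks = []
for xi, ei in zip(x, resid):
    h = 1 / n + (xi - mx) ** 2 / sxx           # leverage (hat value) of this case
    cooks.append((ei ** 2 / (p * mse)) * h / (1 - h) ** 2)

# Values greater than 1 may be cause for concern (Cook & Weisberg, 1982).
influential = [i for i, d in enumerate(cooks) if d > 1]
print(influential)
```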
Generalization

Remember from your lecture on bias that linear models assume:

• Linearity and additivity: the relationship you're trying to model is, in fact, linear, and with several predictors they combine additively.

• Normality: For b estimates to be optimal the residuals should be normally distributed. For p-values and confidence intervals to be accurate, the sampling distribution of bs should be normal.

• Homoscedasticity: necessary for b estimates to be optimal and for significance tests and CIs of the parameters to be accurate.

Linear Models: Looking for Bias - · PDF fileLinear Models: Looking for Bias ... the effect of a single case on the model as a whole. Cook and Weisberg (1982) ... The size of the Durbin-Watson

Embed Size (px)

Citation preview

Page 1: Linear Models: Looking for Bias - · PDF fileLinear Models: Looking for Bias ... the effect of a single case on the model as a whole. Cook and Weisberg (1982) ... The size of the Durbin-Watson

©Prof.AndyField,2016 www.discoveringstatistics.com Page1

Linear Models: Looking for Bias The following sections have been adapted from Field (2013) Chapter 8. These sections have been edited downconsiderablyandIsuggest(especiallyifyou’reconfused)thatyoureadthisChapterinitsentirety.Youwillalsoneedtoreadthischaptertohelpyouinterprettheoutput.Ifyou’rehavingproblemsthereisplentyofsupportavailable:youcan(1)emailorseeyourseminartutor(2)postamessageonthecoursebulletinboardor(3)dropintomyofficehour.

More on Bias Outliers Wehaveseenthatoutlierscanbiasamodel:theybiasestimatesoftheregressionparameters.weknowthatanoutlier,byitsnature,isverydifferentfromalloftheotherscores.Therefore,ifweweretoworkoutthedifferencesbetweenthedatavaluesthatwerecollected,andthevaluespredictedbythemodel,wecoulddetectanoutlierbylookingforlargedifferences.Thedifferencesbetweenthevaluesoftheoutcomepredictedbythemodelandthevaluesoftheoutcomeobservedinthesamplearecalledresiduals.Ifamodelisapoorfitofthesampledatathentheresidualswillbelarge.Also,ifanycasesstandoutashavingalargeresidual,thentheycouldbeoutliers.

Thenormalorunstandardizedresidualsdescribedabovearemeasuredinthesameunitsastheoutcomevariableandso are difficult to interpret across differentmodels. All we can do is to look for residuals that stand out as beingparticularly large:wecannotdefineauniversalcut-offpoint forwhatconstitutesa largeresidual.Toovercomethisproblem,weusestandardizedresiduals,whicharetheresidualsconvertedtoz-scores,whichmeanstheyareconvertedintostandarddeviationunits(i.e.,theyaredistributedaroundameanof0withastandarddeviationof1).Byconvertingresidualsintoz-scores(standardizedresiduals)wecancompareresidualsfromdifferentmodelsandusewhatweknowaboutthepropertiesof z-scorestodeviseuniversalguidelinesforwhatconstitutesanacceptable(orunacceptable)value.Forexample,inanormallydistributedsample,95%ofz-scoresshouldliebetween−1.96and+1.96,99%shouldliebetween−2.58and+2.58,and99.9%(i.e.,nearlyallofthem)shouldliebetween−3.29and+3.29.Somegeneralrulesforstandardizedresidualsarederivedfromthesefacts:(1)standardizedresidualswithanabsolutevaluegreaterthan3.29(wecanuse3asanapproximation)arecauseforconcernbecauseinanaveragesampleavaluethishighisunlikelytooccur;(2)ifmorethan1%ofoursamplecaseshavestandardizedresidualswithanabsolutevaluegreaterthan2.58(weusuallyjustsay2.5)thereisevidencethattheleveloferrorwithinourmodelisunacceptable(themodelisafairlypoorfitofthesampledata);and(3)ifmorethan5%ofcaseshavestandardizedresidualswithanabsolutevaluegreaterthan1.96(wecanuse2forconvenience)thenthereisalsoevidencethatthemodelisapoorrepresentationoftheactualdata.

Influential Cases Aswellastestingforoutliersbylookingattheerrorinthemodel,itisalsopossibletolookatwhethercertaincasesexertundue influenceover theparametersof themodel. So, ifwewere todeleteacertaincase,wouldweobtaindifferentregressioncoefficients?Thistypeofanalysiscanhelptodeterminewhethertheregressionmodelisstableacrossthesample,orwhetheritisbiasedbyafewinfluentialcases.Therearenumerouswaystolookforinfluentialcases,alldescribedinscintillatingdetailinField(2013).We’lljustlookat1ofthem,Cook’sdistance,whichquantifiestheeffectofasinglecaseonthemodelasawhole.CookandWeisberg(1982)havesuggestedthatvaluesgreaterthan1maybecauseforconcern.

Generalization Rememberfromyourlectureonbiasthatlinearmodelsassume:

• Linearityandadditivity:therelationshipyou’retryingtomodelis,infact,linearandwithseveralpredictors,theycombineadditively.

• Normality: For b estimates to be optimal the residuals should be normally distributed. For p-values andconfidenceintervalstobeaccurate,thesamplingdistributionofbsshouldbenormal.

• Homoscedasticity:necessaryforbestimatestobeoptimalandsignificancetestsandCIsoftheparameterstobeaccurate.

Page 2: Linear Models: Looking for Bias - · PDF fileLinear Models: Looking for Bias ... the effect of a single case on the model as a whole. Cook and Weisberg (1982) ... The size of the Durbin-Watson

©Prof.AndyField,2016 www.discoveringstatistics.com Page2

However, there are some other assumptions that are important if we want to generalize the model we fit beyond our sample. The most important is:

• Independent errors: For any two observations the residual terms should be uncorrelated (i.e., independent). This eventuality is sometimes described as a lack of autocorrelation. If we violate the assumption of independence then our confidence intervals and significance tests will be invalid. This assumption can be tested with the Durbin-Watson test (Durbin & Watson, 1951). The test statistic can vary between 0 and 4, with a value of 2 meaning that the residuals are uncorrelated. A value greater than 2 indicates a negative correlation between adjacent residuals, whereas a value below 2 indicates a positive correlation. The size of the Durbin-Watson statistic depends upon the number of predictors in the model and the number of observations. As a very conservative rule of thumb, values less than 1 or greater than 3 are definitely cause for concern; however, values closer to 2 may still be problematic depending on your sample and model.
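The Durbin-Watson statistic itself is simple to compute by hand: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. A short sketch, using two invented residual series to show the two directions of autocorrelation:

```python
def durbin_watson(resid):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); d ranges from 0 to 4."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

# Positively autocorrelated residuals (neighbours similar) push d below 2...
positive = [1, 1, 1, -1, -1, -1]
# ...negatively autocorrelated residuals (neighbours alternate) push d above 2.
negative = [1, -1, 1, -1, 1, -1]
print(durbin_watson(positive))   # below 2
print(durbin_watson(negative))   # above 2
```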

There are some other considerations that we have not yet discussed (see Berry, 1993):

• Predictors are uncorrelated with 'external variables': External variables are variables that haven't been included in the regression model but that influence the outcome variable.

• Variable types: All predictor variables must be quantitative or categorical (with two categories), and the outcome variable must be quantitative, continuous and unbounded.

• No perfect multicollinearity: If your model has more than one predictor then there should be no perfect linear relationship between two or more of the predictors. So, the predictor variables should not correlate too highly.

• Non-zero variance: The predictors should have some variation in value (i.e., they do not have variances of 0). This is self-evident really.
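Multicollinearity short of perfect is usually checked with the variance inflation factor (VIF); for two predictors it reduces to 1/(1 − r²), where r is their correlation. A sketch with invented predictor scores (the rule of thumb that VIF above 10 signals a problem comes from the wider literature, not this handout):

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

x1 = [1, 2, 3, 4, 5]
x2 = [1, 2, 3, 4, 6]          # nearly a linear function of x1
r12 = pearson_r(x1, x2)
vif = 1 / (1 - r12 ** 2)      # variance inflation factor for either predictor
print(round(vif, 1))          # well above the common cut-off of 10
```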

Figure 1: Plots of standardized residuals against predicted (fitted) values

The four most important conditions are linearity and additivity, normality, homoscedasticity, and independent errors. These can be tested graphically using a plot of standardized residuals (zresid) against standardized predicted values (zpred). Figure 1 shows several examples of the plot of standardized residuals against standardized predicted values. The top left panel shows a situation in which the assumptions of linearity, independent errors and homoscedasticity have been met. Independent errors are shown by a random pattern of dots. The top right panel shows a similar plot for a data set that violates the assumption of homoscedasticity. Note that the points form a funnel: they become more spread out across the graph. This funnel shape is typical of heteroscedasticity and indicates increasing variance across the residuals. The bottom left panel shows a plot of some data in which there is a non-linear relationship between the outcome and the predictor: there is a clear curve in the residuals. Finally, the bottom right panel illustrates data that not only have a non-linear relationship, but also show heteroscedasticity. Note first the curved trend in the residuals, and then also note that at one end of the plot the points are very close together whereas at the other end they are widely dispersed. When these assumptions have been violated you will not see these exact patterns, but hopefully these plots will help you to understand the general anomalies you should look out for.

Methods of Regression

Last week we looked at a situation where we forced predictors into the model. However, there are other options. We can select predictors in several ways:

• In hierarchical regression predictors are selected based on past work and the researcher decides in which order to enter the predictors into the model. As a general rule, known predictors (from other research) should be entered into the model first in order of their importance in predicting the outcome. After known predictors have been entered, the experimenter can add any new predictors into the model. New predictors can be entered either all in one go, in a stepwise manner, or hierarchically (such that the new predictor suspected to be the most important is entered first).

• Forced entry (or Enter as it is known in SPSS) is a method in which all predictors are forced into the model simultaneously. Like hierarchical, this method relies on good theoretical reasons for including the chosen predictors, but unlike hierarchical the experimenter makes no decision about the order in which variables are entered.

• Stepwise methods are generally frowned upon by statisticians. In stepwise regressions decisions about the order in which predictors are entered into the model are based on a purely mathematical criterion. In the forward method, an initial model is defined that contains only the constant (b0). The computer then searches for the predictor (out of the ones available) that best predicts the outcome variable: it does this by selecting the predictor that has the highest simple correlation with the outcome. If this predictor significantly improves the ability of the model to predict the outcome, then this predictor is retained in the model and the computer searches for a second predictor. The criterion used for selecting this second predictor is that it is the variable that has the largest semi-partial correlation with the outcome. In plain English, imagine that the first predictor can explain 40% of the variation in the outcome variable; then there is still 60% left unexplained. The computer searches for the predictor that can explain the biggest part of the remaining 60% (it is not interested in the 40% that is already explained). As such, this semi-partial correlation gives a measure of how much 'new variance' in the outcome can be explained by each remaining predictor. The predictor that accounts for the most new variance is added to the model and, if it makes a significant contribution to the predictive power of the model, it is retained and another predictor is considered.
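The squared semi-partial correlation is just the increase in R² when a predictor is added. With two predictors, R² for the full model can be computed from the three pairwise correlations, which makes the 'new variance' idea easy to see. A sketch using invented correlations (the formula is the standard two-predictor result; the numbers are made up to echo the 40%-explained example above):

```python
# Invented correlations for illustration
r_y1 = 0.632    # first predictor picked: highest simple correlation with y
r_y2 = 0.500    # second candidate's simple correlation with y
r_12 = 0.300    # the two predictors overlap somewhat

r2_first = r_y1 ** 2           # variance explained by predictor 1 alone (~40%)
# R^2 with both predictors, from the standard two-predictor formula:
r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
# Squared semi-partial correlation: the 'new' variance predictor 2 adds.
sr2_2 = r2_full - r2_first
print(round(r2_first, 3), round(sr2_2, 3))
```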

Many writers argue that stepwise methods take the important methodological decisions out of the hands of the researcher. What's more, the models derived by stepwise methods often take advantage of random sampling variation and so decisions about which variables should be included will be based upon slight differences in their semi-partial correlation. However, these slight statistical differences may contrast dramatically with the theoretical importance of a predictor to the model. There is also the danger of over-fitting the model (having too many variables that essentially make little contribution to predicting the outcome) and under-fitting it (leaving out important predictors). However, when little theory exists stepwise methods might be the only practical option.

The Example We’lllookatdatacollectedfromseveralquestionnairesrelatingtoclinicalpsychology,andwewillusethesemeasuresto predict social anxiety usingmultiple regression.Anxiety disorders takeondifferent shapes and forms, and eachdisorderisbelievedtobedistinctandhaveuniquecauses.Wecansummarisethedisordersandsomepopulartheoriesasfollows:


• Social Anxiety: Social anxiety disorder is a marked and persistent fear of one or more social or performance situations in which the person is exposed to unfamiliar people or possible scrutiny by others. This anxiety leads to avoidance of these situations. People with social phobia are believed to feel elevated feelings of shame.

• Obsessive Compulsive Disorder (OCD): OCD is characterised by the everyday intrusion into conscious thinking of intense, repetitive, personally abhorrent, absurd and alien thoughts (obsessions), leading to the endless repetition of specific acts or to the rehearsal of bizarre and irrational mental and behavioural rituals (compulsions).

Social anxiety and obsessive compulsive disorder are seen as distinct disorders having different causes. However, there are some similarities.

• They both involve some kind of attentional bias: attention to bodily sensation in social anxiety and attention to things that could have negative consequences in OCD.

• They both involve repetitive thinking styles: social phobics ruminate about social encounters after the event (known as post-event processing), and people with OCD have recurring intrusive thoughts and images.

• They both involve safety behaviours (i.e. trying to avoid the thing that makes you anxious).

This might lead us to think that, rather than being different disorders, they are manifestations of the same core processes. One way to research this possibility would be to see whether social anxiety can be predicted from measures of other anxiety disorders. If social anxiety disorder and OCD are distinct we should expect that measures of OCD will not predict social anxiety. However, if there are core processes underlying all anxiety disorders, then measures of OCD should predict social anxiety.

Figure 2: Data layout for multiple regression

The data are in the file SocialAnxietyRegression.sav, which can be downloaded from Study Direct. This file contains four variables:

• The Social Phobia and Anxiety Inventory (SPAI), which measures levels of social anxiety.

• The Interpretation of Intrusions Inventory (III), which measures the degree to which a person experiences intrusive thoughts like those found in OCD.

• The Obsessive Beliefs Questionnaire (OBQ), which measures the degree to which people experience obsessive beliefs like those found in OCD.

• The Test of Self-Conscious Affect (TOSCA), which measures shame.

Each of 134 people was administered all four questionnaires. You should note that each questionnaire has its own column and each row represents a different person (see Figure 2).


What analysis will we do?

We are going to do a multiple regression analysis. Specifically, we're going to do a hierarchical multiple regression analysis. All this means is that we enter variables into the regression model in an order determined by past research and expectations. So, for your analysis, we will enter variables in so-called 'blocks':

• Block 1: the first block will contain any predictors that we expect to predict social anxiety. These variables should be entered using forced entry. In this example we have only one variable that we expect, theoretically, to predict social anxiety, and that is shame (measured by the TOSCA).

• Block 2: the second block will contain our exploratory predictor variables (the ones we don't necessarily expect to predict social anxiety). This block should contain the measures of OCD (OBQ and III) because these variables shouldn't predict social anxiety if social anxiety is indeed distinct from OCD. These variables should be entered using a stepwise method because we are 'exploring them' (think back to your lecture).

Doing Multiple Regression on SPSS

Specifying the First Block in Hierarchical Regression

Theory indicates that shame is a significant predictor of social phobia, and so this variable should be included in the model first. The exploratory variables (obq and iii) should, therefore, be entered into the model after shame. This method is called hierarchical (the researcher decides in which order to enter variables into the model based on past research). To do a hierarchical regression in SPSS we enter the variables in blocks (each block representing one step in the hierarchy). To get to the main regression dialog box select the linear regression option from the menus. The main dialog box is shown in Figure 3.

Figure 3: Main dialog box for block 1 of the multiple regression

The main dialog box is fairly self-explanatory in that there is a space to specify the dependent variable (outcome), and a space to place one or more independent variables (predictor variables). As usual, the variables in the data editor are listed on the left-hand side of the box. Highlight the outcome variable (SPAI scores) in this list by clicking on it and then transfer it to the box labelled Dependent by clicking on the arrow button or dragging it across. We also need to specify the predictor variable for the first block. We decided that shame should be entered into the model first (because theory indicates that it is an important predictor), so highlight this variable in the list and transfer it to the box labelled Independent(s) by clicking on the arrow button or dragging it across. Underneath the Independent(s) box, there is a drop-down menu for specifying the Method of regression. You can select a different method of variable entry for each block using this drop-down list, next to where it says Method. The default option is forced entry, and this is the option we want, but if you were carrying out more exploratory work, you might decide to use one of the stepwise methods (forward, backward, stepwise or remove).

Specifying the Second Block in Hierarchical Regression

Having specified the first block in the hierarchy, we move on to the second. To tell the computer that you want to specify a new block of predictors you must click on Next. This process clears the Independent(s) box so that you can enter the new predictors (you should also note that above this box it now reads Block 2 of 2, indicating that you are in the second block of the two that you have so far specified). We decided that the second block would contain both of the new predictors and so you should click on obq and iii in the variables list and transfer them, one by one, to the Independent(s) box by clicking on the arrow button. The dialog box should now look like Figure 4. To move between blocks use the Next and Previous buttons (so, for example, to move back to block 1, click on Previous).

It is possible to select different methods of variable entry for different blocks in a hierarchy. So, although we specified forced entry for the first block, we could now specify a stepwise method for the second. Given that we have no previous research regarding the effects of obq and iii on SPAI scores, we might be justified in requesting a stepwise method for this block (see your lecture notes and my textbook). For this analysis select a stepwise method for this second block.

Figure 4: Main dialog box for block 2 of the multiple regression

Statistics

In the main regression dialog box click on Statistics to open a dialog box for selecting various important options relating to the model (Figure 5). Most of these options relate to the parameters of the model; however, there are procedures available for checking the assumptions of no multicollinearity (Collinearity diagnostics) and independence of errors (Durbin-Watson). When you have selected the statistics you require (I recommend all but the covariance matrix as a general rule) click on Continue to return to the main dialog box.

• Estimates: This option is selected by default because it gives us the estimated coefficients of the regression model (i.e. the estimated b-values).

• Confidence intervals: This option produces confidence intervals for each of the unstandardized regression coefficients.

• Model fit: This option is vital and is selected by default. It provides not only a statistical test of the model's ability to predict the outcome variable (the F-test), but also the value of R (or multiple R), the corresponding R², and the adjusted R².

• R squared change: This option displays the change in R² resulting from the inclusion of a new predictor (or block of predictors). This measure is a useful way to assess the unique contribution of new predictors (or blocks) to explaining variance in the outcome.


• Descriptives: If selected, this option displays a table of the mean, standard deviation and number of observations of all of the variables included in the analysis. A correlation matrix is also displayed showing the correlation between all of the variables and the one-tailed probability for each correlation coefficient. This correlation matrix can be used to establish whether there is multicollinearity.

• Part and partial correlations: This option produces the zero-order correlation (the Pearson correlation) between each predictor and the outcome variable. It also produces the partial correlation between each predictor and the outcome, controlling for all other predictors in the model.

• Collinearity diagnostics: This option is for obtaining collinearity statistics such as the VIF, tolerance, eigenvalues of the scaled, uncentred cross-products matrix, condition indexes and variance proportions (see Field, 2013, and your lecture notes).

• Durbin-Watson: This option produces the Durbin-Watson test statistic, which tests for correlations between errors.

• Casewise diagnostics: This option lists the observed value of the outcome, the predicted value of the outcome, the difference between these values (the residual) and this difference standardized. Furthermore, it will list these values either for all cases, or just for cases for which the standardized residual is greater than 3 (when the ± sign is ignored). This criterion value of 3 can be changed, and I recommend changing it to 2 for reasons that will become apparent.

Figure 5: Statistics dialog box for regression analysis

Regression Plots

Once you are back in the main dialog box, click on Plots to activate the regression plots dialog box shown in Figure 6. This dialog box provides the means to specify a number of graphs, which can help to establish the validity of some regression assumptions. Most of these plots involve various residual values. On the left-hand side of the dialog box is a list of several variables:

• DEPENDNT (the outcome variable).

• *ZPRED (the standardized predicted values of the dependent variable based on the model). These values are standardized forms of the values predicted by the model.

• *ZRESID (the standardized residuals, or errors). These values are the standardized differences between the observed data and the values that the model predicts.

• *DRESID (the deleted residuals).

• *ADJPRED (the adjusted predicted values).

• *SRESID (the Studentized residual).

• *SDRESID (the Studentized deleted residual). This value is the deleted residual divided by its standard error.

The variables listed in this dialog box all come under the general heading of residuals, and are discussed in detail in my book (sorry for all of the self-referencing, but I'm trying to condense a 60-page chapter into a manageable handout!). For a basic analysis it is worth plotting *ZRESID (Y-axis) against *ZPRED (X-axis), because this plot is useful to determine whether the assumptions of random errors and homoscedasticity have been met (see earlier). To create these plots select a variable from the list, and transfer it to the space labelled either X or Y (which refer to the axes) by clicking on the arrow button. When you have selected two variables for the first plot (as is the case in Figure 6) you can specify a new plot by clicking on Next. This process clears the spaces in which variables are specified. If you click on Next and would like to return to the plot that you last specified, then simply click on Previous.

You can also select the tick-box labelled Produce all partial plots, which will produce scatterplots of the residuals of the outcome variable and each of the predictors when both variables are regressed separately on the remaining predictors. Any obvious outliers on a partial plot represent cases that might have undue influence on a predictor's regression coefficient. Also, non-linear relationships between a predictor and the outcome variable are much more detectable using these plots. Finally, they are a useful way of detecting collinearity. There are several options for plots of the standardized residuals. First, you can select a histogram of the standardized residuals (this is extremely useful for checking the assumption of normality of errors). Second, you can ask for a normal probability plot, which also provides information about whether the residuals in the model are normally distributed. When you have selected the options you require, click on Continue to take you back to the main regression dialog box.

Figure 6: Linear regression: plots dialog box

Saving Regression Diagnostics

In this week's lecture we met two types of regression diagnostics: those that help us assess how well our model fits our sample and those that help us detect cases that have a large influence on the model generated. In SPSS we can choose to save these diagnostic variables in the data editor (so, SPSS will calculate them and then create new columns in the data editor in which the values are placed).

Click on Save in the main regression dialog box to activate the save new variables dialog box (see Figure 7). Once this dialog box is active, it is a simple matter to tick the boxes next to the required statistics. Most of the available options are explained in Field (2013) and Figure 7 shows what I consider to be a bare minimum set of diagnostic statistics. Standardized versions of these diagnostics are generally easier to interpret and so I suggest selecting them in preference to the unstandardized versions. Once the regression has been run, SPSS creates a column in your data editor for each statistic requested and it has a standard set of variable names to describe each one (zpr_1: standardized predicted value; zre_1: standardized residual; coo_1: Cook's distance). After the name, there will be a number that refers to the analysis that has been run. So, for the first regression run on a data set the variable names will be followed by a 1, if you carry out a second regression it will create a new set of variables with names followed by a 2, and so on. When you have selected the diagnostics you require (by clicking in the appropriate boxes), click on Continue to return to the main regression dialog box.


Figure 7: Dialog box for regression diagnostics

Bootstrapping

We can get bootstrapped confidence intervals for the regression coefficients by clicking Bootstrap (see last week's handout). However, this function doesn't work when we have used the Save option to save residuals, so we can't use it now. However, once you have run the analysis and inspected the residuals and influential cases, you might want to re-run the analysis selecting the bootstrap option (and remembering to deselect all of the options for saving variables).
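Bootstrapped CIs like these can be sketched by hand: resample cases with replacement, refit the model each time, and take percentiles of the resulting coefficients. A minimal percentile-bootstrap sketch with invented data (SPSS's default is the more refined BCa interval, so its numbers won't match exactly):

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Invented data with a true slope of about 2
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.8, 6.2, 7.9, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]

random.seed(1)                    # reproducible resampling
boots = []
for _ in range(2000):
    idx = [random.randrange(len(x)) for _ in x]   # resample cases with replacement
    boots.append(slope([x[i] for i in idx], [y[i] for i in idx]))
boots.sort()
ci = (boots[49], boots[1949])     # 2.5th and 97.5th percentiles of 2000 slopes
print(ci)
```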

A Brief Guide to Interpretation

Model Summary

The model summary (Output 1) contains two models. Model 1 refers to the first stage in the hierarchy when only TOSCA is used as a predictor. Model 2 refers to the final model (TOSCA, and OBQ and III if they end up being included).

• In the column labelled R are the values of the multiple correlation coefficient between the predictors and the outcome. When only TOSCA is used as a predictor, this is the simple correlation between SPAI and TOSCA (0.34).

• The next column gives us a value of R², which is a measure of how much of the variability in the outcome is accounted for by the predictors. For the first model its value is 0.116, which means that TOSCA accounts for 11.6% of the variation in social anxiety. However, for the final model (model 2), this value increases to 0.157, or 15.7% of the variance in SPAI. Therefore, whatever variables enter the model in block 2 account for an extra (15.7 − 11.6 =) 4.1% of the variance in SPAI scores (this is also the value in the column labelled R Square Change but expressed as a percentage).

• The adjusted R² gives us some idea of how well our model generalizes, and ideally we would like its value to be the same as, or very close to, the value of R². In this example the difference for the final model is small (0.157 − 0.143 = 0.014, or 1.4%). This shrinkage means that if the model were derived from the population rather than a sample it would account for approximately 1.4% less variance in the outcome.

• Finally, if you requested the Durbin-Watson statistic it will be found in the last column. This statistic informs us about whether the assumption of independent errors is tenable. The closer to 2 the value is, the better, and for these data the value is 2.084, which is so close to 2 that the assumption has almost certainly been met.


Output 1

ANOVA Table

Output 2 contains an analysis of variance (ANOVA) that tests whether the model is significantly better at predicting the outcome than using the mean as a 'best guess'. This table is again split into two sections: one for each model. If the improvement due to fitting the regression model is much greater than the inaccuracy within the model then the value of F will be greater than 1, and SPSS calculates the exact probability of obtaining a value of F at least this big if there were no effect. For the initial model the F-ratio is 16.52 (p < .001), and for the second model the value of F is 11.61, which is also highly significant (p < .001). We can interpret these results as meaning that the final model significantly improves our ability to predict the outcome variable.
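The F-ratio can be reproduced from the sums of squares the ANOVA table reports: divide each by its degrees of freedom to get mean squares, then divide the model mean square by the residual mean square. Using the model 1 values from Output 2:

```python
# Values taken from Output 2 (model 1)
ss_model, df_model = 13302.700, 1
ss_resid, df_resid = 101493.3, 126

ms_model = ss_model / df_model        # mean square for the model
ms_resid = ss_resid / df_resid        # mean square for the residuals (~805.5)
f_ratio = ms_model / ms_resid         # ~16.5, matching the table
print(round(f_ratio, 1))
```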

Output 2

Model Parameters

The next part of the output is concerned with the parameters of the model. The first step in our hierarchy included TOSCA and although these parameters are interesting up to a point, we're more interested in the final model because this includes all predictors that make a significant contribution to predicting social anxiety. So, we'll look only at the lower half of the table (Model 2).

In multiple regression the model takes the form of an equation that contains a coefficient (b) for each predictor. The first part of the table gives us estimates for these b values, and these values indicate the individual contribution of each predictor to the model.

The b values tell us about the relationship between social anxiety and each predictor. If the value is positive we can tell that there is a positive relationship between the predictor and the outcome, whereas a negative coefficient represents a negative relationship. For these data both predictors have positive b values, indicating positive relationships. So, as shame (TOSCA) increases, social anxiety increases, and as obsessive beliefs increase so does social anxiety. The b values also tell us to what degree each predictor affects the outcome if the effects of all other predictors are held constant.

Each of these b values has an associated standard error indicating to what extent these values would vary across different samples, and these standard errors are used to determine whether or not the b value differs significantly from zero (using the t-statistic). Therefore, if the t-test associated with a b value is significant (if the value in the column labelled Sig. is less than 0.05) then that predictor is making a significant contribution to the model.

Model Summary

Model   R       R Square   Adj. R Square   Std. Error   R Sq. Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .340a   .116       .109            28.38137     .116           16.515     1     126   .000
2       .396b   .157       .143            27.82969     .041           6.045      1     125   .015            2.084

a. Predictors: (Constant), Shame (TOSCA)
b. Predictors: (Constant), Shame (TOSCA), OCD (Obsessive Beliefs Questionnaire)
c. Dependent Variable: Social Anxiety (SPAI)

ANOVA

Model           Sum of Squares   df    Mean Square   F        Sig.
1  Regression    13302.700         1   13302.700     16.515   .000a
   Residual     101493.3         126     805.502
   Total        114796.0         127
2  Regression    17984.538         2    8992.269     11.611   .000b
   Residual      96811.431       125     774.491
   Total        114796.0         127

a. Predictors: (Constant), Shame (TOSCA)
b. Predictors: (Constant), Shame (TOSCA), OCD (Obsessive Beliefs Questionnaire)
c. Dependent Variable: Social Anxiety (SPAI)


For this model, shame (TOSCA), t(125) = 3.16, p = .002, and obsessive beliefs, t(125) = 2.46, p = .015, are significant predictors of social anxiety. From the magnitude of the t-statistics we can see that shame (TOSCA) had slightly more impact than obsessive beliefs. This conclusion is also borne out by the standardized beta values, which are measured in standard deviation units and so are directly comparable: the standardized beta value for shame (TOSCA) is 0.273, and for obsessive beliefs it is 0.213. This tells us that shame has slightly more impact in the model.
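The t-statistics reported here are simply each b value divided by its standard error. A quick check against the coefficients table (a plain-Python sketch, not SPSS output):

```python
def t_stat(b, se):
    """t-statistic testing whether a regression coefficient differs from zero."""
    return b / se

t_shame = t_stat(22.047, 6.978)  # about 3.16, as reported for shame (TOSCA)
t_obq = t_stat(6.920, 2.815)     # about 2.46, as reported for obsessive beliefs
```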

Output 3

Excluded Variables

At each stage of a regression analysis SPSS provides a summary of any variables that have not yet been entered into the model. In a hierarchical model, this summary has details of the variables that have been specified to be entered in subsequent steps, and in stepwise regression this table contains summaries of the variables that SPSS is considering entering into the model. The summary gives an estimate of each predictor's b value if it was entered into the equation at this point and calculates a t-test for this value. In a stepwise regression, SPSS should enter the predictor with the highest t-statistic and will continue entering predictors until there are none left with t-statistics that have significance values less than 0.05. Therefore, the final model might not include all of the variables you asked SPSS to enter.

In this case it tells us that if the interpretation of intrusions (III) is entered into the model it would not have a significant impact on the model's ability to predict social anxiety, t = −0.049, p = .961. In fact, the significance of this variable is almost 1, indicating it would have virtually no impact whatsoever (note also that its beta value is extremely close to zero!).

Output 4

Checking for Bias

SPSS produces a summary table of the residual statistics and these should be examined for extreme cases. Output 5 shows any cases that have a standardized residual less than −2 or greater than 2 (remember that we changed the default criterion from 3 to 2). In an ordinary sample we would expect 95% of cases to have standardized residuals within about ±2. We have a sample of 134, therefore it is reasonable to expect about 7 cases (approximately 5%) to have standardized residuals outside of these limits. From Output 5 we can see that we have 7 cases (5%) that are outside of the limits: therefore, our sample is basically what we would expect. In addition, 99% of cases should lie within ±2.5 and so we would expect only 1% of cases to lie outside of these limits. From the cases listed here, it is clear that two cases (1.5%)

Coefficients

Model                        B         Std. Error   Beta   t        Sig.   95% CI for B        Zero-order   Partial   Part   Tolerance   VIF
1  (Constant)                -54.368   28.618              -1.900   .060   (-111.002, 2.267)
   Shame (TOSCA)              27.448    6.754       .340    4.064   .000   (14.081, 40.814)    .340         .340      .340   1.000       1.000
2  (Constant)                -51.493   28.086              -1.833   .069   (-107.079, 4.094)
   Shame (TOSCA)              22.047    6.978       .273    3.160   .002   (8.237, 35.856)     .340         .272      .260   .901        1.110
   OCD (Obsessive Beliefs
   Questionnaire)              6.920    2.815       .213    2.459   .015   (1.350, 12.491)     .299         .215      .202   .901        1.110

a. Dependent Variable: Social Anxiety (SPAI)

Excluded Variables

Model                                            Beta In   t       Sig.   Partial Corr.   Tolerance   VIF     Min. Tolerance
1  OCD (Interpretation of Intrusions Inventory)   .132a     1.515   .132    .134           .917       1.091   .917
   OCD (Obsessive Beliefs Questionnaire)          .213a     2.459   .015    .215           .901       1.110   .901
2  OCD (Interpretation of Intrusions Inventory)  -.005b     -.049   .961   -.004           .541       1.849   .531

a. Predictors in the Model: (Constant), Shame (TOSCA)
b. Predictors in the Model: (Constant), Shame (TOSCA), OCD (Obsessive Beliefs Questionnaire)
c. Dependent Variable: Social Anxiety (SPAI)


lie outside of the limits (cases 8 and 45). Therefore, our sample appears to conform roughly to what we would expect for a fairly accurate model. There are also no standardized residuals greater than 3, which is good news.
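This screening step is easy to express as a short routine: flag any case whose standardized residual exceeds a cutoff and report what proportion of the sample that represents. The sketch below is plain Python with hypothetical residuals, not the SPAI data:

```python
def flag_large_residuals(std_residuals, cutoff=1.96):
    """Return indices of cases beyond +/-cutoff and the proportion they represent."""
    flagged = [i for i, r in enumerate(std_residuals) if abs(r) > cutoff]
    return flagged, len(flagged) / len(std_residuals)

# Hypothetical standardized residuals for five cases
cases, proportion = flag_large_residuals([0.1, -2.5, 1.0, 3.0, -0.3], cutoff=2.0)
# cases == [1, 3]; proportion == 0.4, far above the 5% or so expected by chance
```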

We should also scan the data editor to see if any cases have a Cook's distance (COO_1) greater than 1. [You could also use SPSS to find the maximum value of Cook's distance by using the descriptive statistics command.] You should find that all of the Cook's distances are below 1, which means that no cases are having an undue influence.
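Cook's distance combines the size of a case's residual with its leverage. For a single-predictor model it can be computed from scratch; the plain-Python sketch below (with made-up illustrative data, not the SPAI data) shows how one wild case dominates:

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a simple (one-predictor) OLS regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx  # slope
    b0 = my - b1 * mx                                              # intercept
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2                                      # parameters estimated (intercept + slope)
    mse = sum(e ** 2 for e in resid) / (n - p)
    dists = []
    for xi, ei in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx       # leverage of this case
        dists.append(ei ** 2 / (p * mse) * h / (1 - h) ** 2)
    return dists

# The last case sits far from the line implied by the others
d = cooks_distances([1, 2, 3, 4, 5], [1, 2, 3, 4, 15])
# d[4] is 2.25: above the cause-for-concern threshold of 1 suggested by Cook and Weisberg (1982)
```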

Output 5

Figure 8: P-P plot (top left), a plot of standardized residuals vs. standardized predicted values (top right), and partial plots of social anxiety against shame (bottom left) and OBQ (bottom right)

We can use histograms and P-P plots to look for normality of the residuals. Figure 8 (top left) shows the P-P plot for our model. The dots hover fairly close to the diagonal line, indicating normality in the residuals. We can look for heteroscedasticity and non-linearity using a plot of standardized residuals against standardized predicted values. If everything is OK then this graph should look like a random array of dots; if the graph funnels out then that is a sign of



heteroscedasticity, and any curve suggests non-linearity (see earlier). Figure 8 (top right) shows the plot for our model. Note how the points are randomly and evenly dispersed throughout the plot. This pattern is indicative of a situation in which the assumptions of linearity and homoscedasticity have been met. Compare this with the examples in Figure 1.
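The quantities on that plot's axes are essentially z-scores of the residuals (ZRESID) and of the predicted values (ZPRED). A minimal sketch of the standardization step (plain Python; SPSS does this for you):

```python
def standardize(values):
    """Convert scores to z-scores (mean 0, SD 1, using the sample SD with n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

z = standardize([1, 2, 3, 4, 5])
# z has mean 0; plotting standardize(residuals) against standardize(predicted values)
# reproduces the ZRESID vs. ZPRED scatterplot
```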

Figure 8 also shows the partial plots, which are scatterplots of the residuals of the outcome variable and each of the predictors when both variables are regressed separately on the remaining predictors. Obvious outliers on a partial plot represent cases that might have undue influence on a predictor's regression coefficient; non-linear relationships and heteroscedasticity can also be detected using these plots. For shame (Figure 8, bottom left) the partial plot shows the positive relationship to social anxiety. There are no obvious outliers on this plot, but the cloud of dots is a bit funnel-shaped, possibly indicating some heteroscedasticity. For OBQ (Figure 8, bottom right) the plot again shows a positive relationship to social anxiety. There are no obvious outliers on this plot.

Finally, the VIF values are well below 10, which reassures us that multicollinearity is not a problem.
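The VIF and tolerance columns in the coefficients table are reciprocals of one another: tolerance is the proportion of a predictor's variance not shared with the other predictors, and VIF = 1 / tolerance. A quick check against the reported values (plain Python, illustrative only):

```python
def vif_from_tolerance(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1.0 / tolerance

v = vif_from_tolerance(0.901)
# v is approximately 1.110, matching the VIF reported for both predictors in model 2
```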

Writing Up Multiple Regression Analysis

If your model has several predictors then you can't really beat a summary table as a concise way to report your model. As a bare minimum, report the betas, their confidence intervals, significance values and some general statistics about the model (such as the R²). The standardized beta values and the standard errors are also very useful. So, basically, you want to reproduce the table labelled Coefficients from the SPSS output and omit some of the non-essential information. For the example in this chapter we might produce a table like Table 1.

See if you can look back through the SPSS output in this chapter and work out from where the values came. Things to note are: (1) I've rounded off to 2 decimal places throughout because this is a reasonable level of precision given the variables measured; (2) for the standardized betas there is no zero before the decimal point (because these values shouldn't exceed 1) but for all other values less than 1 the zero is present; (3) often you'll see the significance of the variable denoted by an asterisk with a footnote to indicate the significance level being used, but it's better practice to report exact p-values; (4) the R² for the initial model and the change in R² (denoted as ΔR²) for each subsequent step of the model are reported below the table; and (5) in the title I have mentioned that confidence intervals and standard errors in the table are based on bootstrapping: this information is important for readers to know.

Table 1: Linear model of predictors of social anxiety (SPAI). 95% confidence intervals reported in parentheses.

                          b                         SE B     β      p
Step 1   Constant         -54.37 (-111.00, 2.27)    28.62           .060
         Shame (TOSCA)     27.45 (14.08, 40.81)      6.75    .34    < .001
Step 2   Constant         -51.49 (-107.08, 4.09)    28.09           .069
         Shame (TOSCA)     22.05 (8.24, 35.86)       6.98    .27    .002
         OCD (OBQ)          6.92 (1.35, 12.49)       2.82    .21    .015

Note. R² = .12 for Step 1; ΔR² = .04 for Step 2 (ps < .05).

Tasks

Task 1

A fashion student was interested in factors that predicted the salaries of catwalk models. She collected data from 231 models. For each model she asked them their salary per day on days when they were working (salary), their age (age), how many years they had worked as a model (years), and then got a panel of experts from modelling agencies to rate the attractiveness of each model as a percentage, with 100% being perfectly attractive (beauty). The data are in the file Supermodel.sav on the course website. Conduct a multiple regression to see which factors predict a model's salary. (Answers to this task can be found at www.uk.sagepub.com/field4e/study/smartalex/chapter8.pdf.)


How much variance does the final model explain?

Your Answers:

Which variables significantly predict salary?

Your Answers:

Fill in the values for the following APA format table of the results:

                    b       SE b    β       p
Constant
Age
Years as a Model
Attractiveness

Note. R² =

Write out the regression equation for the final model.

Your Answers:

Are the residuals as you would expect for a good model?

Your Answers:

Is there evidence of normality of errors, homoscedasticity and no multicollinearity?


Your Answers:

Task 2

Coldwell, Pike and Dunn (2006) investigated whether household chaos predicted children's problem behaviour over and above parenting. They collected data from 118 two-parent families. For each family they recorded the age and gender of both the older and younger sibling (age_child1, gender_child1, age_child2 and gender_child2 respectively). They then interviewed each child about their relationship with their parents using the Berkeley Puppet Interview (BPI). The interview measured each child's relationship with each parent along two dimensions: (1) warmth/enjoyment, and (2) anger/hostility. Higher scores indicate more anger/hostility and warmth/enjoyment respectively. Each parent was then interviewed about their relationship with each of their children using the Parent-Child Relationship Scale. This resulted in scores for parent-child relationship positivity and parent-child relationship negativity. Overall, these measures result in a lot of variables:

Measures                Mum: Child 1         Mum: Child 2         Dad: Child 1         Dad: Child 2
Warmth/Enjoyment        mum_warmth_child1    mum_warmth_child2    dad_warmth_child1    dad_warmth_child2
Anger/Hostility         mum_anger_child1     mum_anger_child2     dad_anger_child1     dad_anger_child2
Positive Relationship   mum_pos_child1       mum_pos_child2       dad_pos_child1       dad_pos_child2
Negative Relationship   mum_neg_child1       mum_neg_child2       dad_neg_child1       dad_neg_child2

Household chaos (chaos) was assessed using the Confusion, Hubbub, And Order Scale (CHAOS). There were two outcome variables (one for each child) that measured children's adjustment (sdq_child1 and sdq_child2) using the Strengths and Difficulties Questionnaire: the higher the score, the more problem behaviour the child is reported to be displaying.

The data are in the file CHAOS.sav on the course website. To test whether household chaos was predictive of children's problem behaviour over and above parenting, conduct four hierarchical regressions:

(1) Maternal relationship with child 1
(2) Maternal relationship with child 2
(3) Paternal relationship with child 1
(4) Paternal relationship with child 2

Each hierarchical regression consists of three steps. First, enter child age and child gender as control variables. In the second step add the variables measuring parent-child positivity, parent-child negativity, parent-child warmth and parent-child anger. Finally, in the third step, chaos should be added. The crucial test of the hypothesis lies in the final step. To confirm that household chaos is predictive of children's problem behaviour over and above parenting, this third step must result in a significant R² change.


What conclusions can you draw from these analyses?

Your Answers:

Look at Coldwell, J., Pike, A., & Dunn, J. (2006). Household chaos - links with parenting and child behaviour. Journal of Child Psychology and Psychiatry, 47, 1116-1122. (On the course website.) How do your results and interpretation compare to those reported? Reflect upon how you have used regression as a tool to answer an important psychological question.

Your Answers:

Fill in the values for the following APA format table of the results:

                   Mother-child relationship                   Father-child relationship
                   Older sibling SDQ    Younger sibling SDQ    Older sibling SDQ    Younger sibling SDQ
                   Total R² =           Total R² =             Total R² =           Total R² =
                   ΔR²      b           ΔR²      b             ΔR²      b           ΔR²      b

Step 1
  Child age
  Child gender


Step 2
  Child age
  Child gender
  Child rpt parent-child positivity
  Child rpt parent-child negativity
  Parent rpt parent-child positivity
  Parent rpt parent-child negativity

Step 3
  Child age
  Child gender
  Child rpt parent-child positivity
  Child rpt parent-child negativity
  Parent rpt parent-child positivity
  Parent rpt parent-child negativity
  CHAOS

* p < .05, ** p < .01, *** p < .001

Task 3

Complete the multiple choice questions for Chapter 8 on the companion website to Field (2013): https://studysites.uk.sagepub.com/field4e/study/mcqs.htm. If you get any wrong, re-read this handout (or Field, 2013, Chapter 8) and do them again until you get them all correct.

Task 4

Go back to the output for last week's task (does listening to heavy metal predict suicide risk?). Is the model valid (i.e., are all of the assumptions met)?

References

Berry, W. D. (1993). Understanding regression assumptions. Sage university paper series on quantitative applications in the social sciences, 07-092. Newbury Park, CA: Sage.

Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall.

Durbin, J., & Watson, G. S. (1951). Testing for serial correlation in least squares regression, II. Biometrika, 38, 159-178.

Field, A. P. (2013). Discovering statistics using IBM SPSS Statistics: And sex and drugs and rock 'n' roll (4th ed.). London: Sage.

Terms of Use

This handout contains material from:

Field, A. P. (2013). Discovering statistics using SPSS: and sex and drugs and rock 'n' roll (4th Edition). London: Sage.

This material is copyright Andy Field (2000-2016). This document is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/): basically, you can use it for teaching and non-profit activities but not meddle with it without permission from the author.