© Prof. Andy Field, 2016, www.discoveringstatistics.com

Linear Models: Looking for Bias

The following sections have been adapted from Field (2013), Chapter 8. These sections have been edited down considerably and I suggest (especially if you're confused) that you read this chapter in its entirety. You will also need to read this chapter to help you interpret the output. If you're having problems there is plenty of support available: you can (1) email or see your seminar tutor, (2) post a message on the course bulletin board, or (3) drop into my office hour.

More on Bias

Outliers

We have seen that outliers can bias a model: they bias estimates of the regression parameters. We know that an outlier, by its nature, is very different from all of the other scores. Therefore, if we were to work out the differences between the data values that were collected and the values predicted by the model, we could detect an outlier by looking for large differences. The differences between the values of the outcome predicted by the model and the values of the outcome observed in the sample are called residuals. If a model is a poor fit to the sample data then the residuals will be large. Also, if any cases stand out as having a large residual, then they could be outliers.

The normal or unstandardized residuals described above are measured in the same units as the outcome variable and so are difficult to interpret across different models. All we can do is look for residuals that stand out as being particularly large: we cannot define a universal cut-off point for what constitutes a large residual. To overcome this problem, we use standardized residuals, which are the residuals converted to z-scores, which means they are converted into standard deviation units (i.e., they are distributed around a mean of 0 with a standard deviation of 1).
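To make this concrete, here is a minimal sketch in Python of converting residuals to z-scores and flagging cases whose standardized residual exceeds 1.96 in absolute value. The data and the one-predictor model are invented for illustration; they are not from this handout's data set.

```python
# Invented data for illustration; the final case is a deliberate outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 30.0]

# Fit a simple linear model y = b0 + b1*x by least squares.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx

# Residuals: observed values minus the values the model predicts.
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Standardize: divide by the residuals' standard deviation (their mean is 0).
sd = (sum(e ** 2 for e in resid) / (n - 1)) ** 0.5
z = [e / sd for e in resid]

# Cases beyond +/-1.96 stand out and deserve a closer look.
flagged = [i for i, zi in enumerate(z) if abs(zi) > 1.96]
print(flagged)
```

Whether you divide by n − 1 or by the residual degrees of freedom is a detail; statistical packages report slightly more refined 'studentized' versions, so expect small numerical differences from SPSS.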
By converting residuals into z-scores (standardized residuals) we can compare residuals from different models and use what we know about the properties of z-scores to devise universal guidelines for what constitutes an acceptable (or unacceptable) value. For example, in a normally distributed sample, 95% of z-scores should lie between −1.96 and +1.96, 99% should lie between −2.58 and +2.58, and 99.9% (i.e., nearly all of them) should lie between −3.29 and +3.29. Some general rules for standardized residuals are derived from these facts: (1) standardized residuals with an absolute value greater than 3.29 (we can use 3 as an approximation) are cause for concern because in an average sample a value this high is unlikely to occur; (2) if more than 1% of our sample cases have standardized residuals with an absolute value greater than 2.58 (we usually just say 2.5) there is evidence that the level of error within our model is unacceptable (the model is a fairly poor fit to the sample data); and (3) if more than 5% of cases have standardized residuals with an absolute value greater than 1.96 (we can use 2 for convenience) then there is also evidence that the model is a poor representation of the actual data.

Influential Cases

As well as testing for outliers by looking at the error in the model, it is also possible to look at whether certain cases exert undue influence over the parameters of the model. So, if we were to delete a certain case, would we obtain different regression coefficients? This type of analysis can help to determine whether the regression model is stable across the sample, or whether it is biased by a few influential cases. There are numerous ways to look for influential cases, all described in scintillating detail in Field (2013). We'll just look at one of them, Cook's distance, which quantifies the effect of a single case on the model as a whole. Cook and Weisberg (1982) suggested that values greater than 1 may be cause for concern.
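For a model with one predictor, Cook's distance has a closed form built from each case's residual, its leverage (hat value), and the model's mean squared error. The sketch below uses invented data, not the handout's data set, and applies the Cook and Weisberg cut-off of 1:

```python
# Invented data; the final case both has an extreme y value and sits at the
# edge of the x range, so it is influential.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 30.0]

n, p = len(x), 2                        # p = number of model parameters (b0, b1)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in resid) / (n - p)     # mean squared error

cooks = []
for xi, ei in zip(x, resid):
    h = 1 / n + (xi - mx) ** 2 / sxx           # leverage (hat value) of this case
    cooks.append((ei ** 2 / (p * mse)) * h / (1 - h) ** 2)

# Values greater than 1 may be cause for concern (Cook & Weisberg, 1982).
influential = [i for i, d in enumerate(cooks) if d > 1]
print(influential)
```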
Generalization

Remember from your lecture on bias that linear models assume:

• Linearity and additivity: the relationship you're trying to model is, in fact, linear, and with several predictors they combine additively.

• Normality: For b estimates to be optimal the residuals should be normally distributed. For p-values and confidence intervals to be accurate, the sampling distribution of bs should be normal.

• Homoscedasticity: necessary for b estimates to be optimal and for significance tests and CIs of the parameters to be accurate.

Linear Models: Looking for Bias - · PDF fileLinear Models: Looking for Bias ... the effect of a single case on the model as a whole. Cook and Weisberg (1982) ... The size of the Durbin-Watson

Embed Size (px)

Citation preview

Page 1: Linear Models: Looking for Bias - · PDF fileLinear Models: Looking for Bias ... the effect of a single case on the model as a whole. Cook and Weisberg (1982) ... The size of the Durbin-Watson

©Prof.AndyField,2016 www.discoveringstatistics.com Page1

Linear Models: Looking for Bias The following sections have been adapted from Field (2013) Chapter 8. These sections have been edited downconsiderablyandIsuggest(especiallyifyou’reconfused)thatyoureadthisChapterinitsentirety.Youwillalsoneedtoreadthischaptertohelpyouinterprettheoutput.Ifyou’rehavingproblemsthereisplentyofsupportavailable:youcan(1)emailorseeyourseminartutor(2)postamessageonthecoursebulletinboardor(3)dropintomyofficehour.

More on Bias Outliers Wehaveseenthatoutlierscanbiasamodel:theybiasestimatesoftheregressionparameters.weknowthatanoutlier,byitsnature,isverydifferentfromalloftheotherscores.Therefore,ifweweretoworkoutthedifferencesbetweenthedatavaluesthatwerecollected,andthevaluespredictedbythemodel,wecoulddetectanoutlierbylookingforlargedifferences.Thedifferencesbetweenthevaluesoftheoutcomepredictedbythemodelandthevaluesoftheoutcomeobservedinthesamplearecalledresiduals.Ifamodelisapoorfitofthesampledatathentheresidualswillbelarge.Also,ifanycasesstandoutashavingalargeresidual,thentheycouldbeoutliers.

Thenormalorunstandardizedresidualsdescribedabovearemeasuredinthesameunitsastheoutcomevariableandso are difficult to interpret across differentmodels. All we can do is to look for residuals that stand out as beingparticularly large:wecannotdefineauniversalcut-offpoint forwhatconstitutesa largeresidual.Toovercomethisproblem,weusestandardizedresiduals,whicharetheresidualsconvertedtoz-scores,whichmeanstheyareconvertedintostandarddeviationunits(i.e.,theyaredistributedaroundameanof0withastandarddeviationof1).Byconvertingresidualsintoz-scores(standardizedresiduals)wecancompareresidualsfromdifferentmodelsandusewhatweknowaboutthepropertiesof z-scorestodeviseuniversalguidelinesforwhatconstitutesanacceptable(orunacceptable)value.Forexample,inanormallydistributedsample,95%ofz-scoresshouldliebetween−1.96and+1.96,99%shouldliebetween−2.58and+2.58,and99.9%(i.e.,nearlyallofthem)shouldliebetween−3.29and+3.29.Somegeneralrulesforstandardizedresidualsarederivedfromthesefacts:(1)standardizedresidualswithanabsolutevaluegreaterthan3.29(wecanuse3asanapproximation)arecauseforconcernbecauseinanaveragesampleavaluethishighisunlikelytooccur;(2)ifmorethan1%ofoursamplecaseshavestandardizedresidualswithanabsolutevaluegreaterthan2.58(weusuallyjustsay2.5)thereisevidencethattheleveloferrorwithinourmodelisunacceptable(themodelisafairlypoorfitofthesampledata);and(3)ifmorethan5%ofcaseshavestandardizedresidualswithanabsolutevaluegreaterthan1.96(wecanuse2forconvenience)thenthereisalsoevidencethatthemodelisapoorrepresentationoftheactualdata.

Influential Cases Aswellastestingforoutliersbylookingattheerrorinthemodel,itisalsopossibletolookatwhethercertaincasesexertundue influenceover theparametersof themodel. So, ifwewere todeleteacertaincase,wouldweobtaindifferentregressioncoefficients?Thistypeofanalysiscanhelptodeterminewhethertheregressionmodelisstableacrossthesample,orwhetheritisbiasedbyafewinfluentialcases.Therearenumerouswaystolookforinfluentialcases,alldescribedinscintillatingdetailinField(2013).We’lljustlookat1ofthem,Cook’sdistance,whichquantifiestheeffectofasinglecaseonthemodelasawhole.CookandWeisberg(1982)havesuggestedthatvaluesgreaterthan1maybecauseforconcern.

Generalization Rememberfromyourlectureonbiasthatlinearmodelsassume:

• Linearityandadditivity:therelationshipyou’retryingtomodelis,infact,linearandwithseveralpredictors,theycombineadditively.

• Normality: For b estimates to be optimal the residuals should be normally distributed. For p-values andconfidenceintervalstobeaccurate,thesamplingdistributionofbsshouldbenormal.

• Homoscedasticity:necessaryforbestimatestobeoptimalandsignificancetestsandCIsoftheparameterstobeaccurate.

Page 2: Linear Models: Looking for Bias - · PDF fileLinear Models: Looking for Bias ... the effect of a single case on the model as a whole. Cook and Weisberg (1982) ... The size of the Durbin-Watson

©Prof.AndyField,2016 www.discoveringstatistics.com Page2

However, there are some other assumptions that are important if we want to generalize the model we fit beyond our sample. The most important is:

• Independent errors: For any two observations the residual terms should be uncorrelated (i.e., independent). This eventuality is sometimes described as a lack of autocorrelation. If we violate the assumption of independence then our confidence intervals and significance tests will be invalid. This assumption can be tested with the Durbin-Watson test (Durbin & Watson, 1951). The test statistic can vary between 0 and 4, with a value of 2 meaning that the residuals are uncorrelated. A value greater than 2 indicates a negative correlation between adjacent residuals, whereas a value below 2 indicates a positive correlation. The size of the Durbin-Watson statistic depends upon the number of predictors in the model and the number of observations. As a very conservative rule of thumb, values less than 1 or greater than 3 are definitely cause for concern; however, values closer to 2 may still be problematic depending on your sample and model.
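The Durbin-Watson statistic itself is simple to compute by hand: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. A short sketch, using two invented residual series to show the two directions of autocorrelation:

```python
def durbin_watson(resid):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); d ranges from 0 to 4."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

# Positively autocorrelated residuals (neighbours similar) push d below 2...
positive = [1, 1, 1, -1, -1, -1]
# ...negatively autocorrelated residuals (neighbours alternate) push d above 2.
negative = [1, -1, 1, -1, 1, -1]
print(durbin_watson(positive))   # below 2
print(durbin_watson(negative))   # above 2
```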

There are some other considerations that we have not yet discussed (see Berry, 1993):

• Predictors are uncorrelated with 'external variables': External variables are variables that haven't been included in the regression model but that influence the outcome variable.

• Variable types: All predictor variables must be quantitative or categorical (with two categories), and the outcome variable must be quantitative, continuous and unbounded.

• No perfect multicollinearity: If your model has more than one predictor then there should be no perfect linear relationship between two or more of the predictors. So, the predictor variables should not correlate too highly.

• Non-zero variance: The predictors should have some variation in value (i.e., they do not have variances of 0). This is self-evident really.
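Multicollinearity short of perfect is usually checked with the variance inflation factor (VIF); for two predictors it reduces to 1/(1 − r²), where r is their correlation. A sketch with invented predictor scores (the rule of thumb that VIF above 10 signals a problem comes from the wider literature, not this handout):

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

x1 = [1, 2, 3, 4, 5]
x2 = [1, 2, 3, 4, 6]          # nearly a linear function of x1
r12 = pearson_r(x1, x2)
vif = 1 / (1 - r12 ** 2)      # variance inflation factor for either predictor
print(round(vif, 1))          # well above the common cut-off of 10
```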

Figure 1: Plots of standardized residuals against predicted (fitted) values

The four most important conditions are linearity and additivity, normality, homoscedasticity, and independent errors. These can be tested graphically using a plot of standardized residuals (zresid) against standardized predicted values (zpred). Figure 1 shows several examples of the plot of standardized residuals against standardized predicted values. The top left panel shows a situation in which the assumptions of linearity, independent errors and homoscedasticity have been met. Independent errors are shown by a random pattern of dots. The top right panel shows a similar plot for a data set that violates the assumption of homoscedasticity. Note that the points form a funnel: they become more spread out across the graph. This funnel shape is typical of heteroscedasticity and indicates increasing variance across the residuals. The bottom left panel shows a plot of some data in which there is a non-linear relationship between the outcome and the predictor: there is a clear curve in the residuals. Finally, the bottom right panel illustrates data that not only have a non-linear relationship, but also show heteroscedasticity. Note first the curved trend in the residuals, and then also note that at one end of the plot the points are very close together whereas at the other end they are widely dispersed. When these assumptions have been violated you will not see these exact patterns, but hopefully these plots will help you to understand the general anomalies you should look out for.

Methods of Regression

Last week we looked at a situation where we forced predictors into the model. However, there are other options. We can select predictors in several ways:

• In hierarchical regression predictors are selected based on past work and the researcher decides in which order to enter the predictors into the model. As a general rule, known predictors (from other research) should be entered into the model first in order of their importance in predicting the outcome. After known predictors have been entered, the experimenter can add any new predictors into the model. New predictors can be entered either all in one go, in a stepwise manner, or hierarchically (such that the new predictor suspected to be the most important is entered first).

• Forced entry (or Enter as it is known in SPSS) is a method in which all predictors are forced into the model simultaneously. Like hierarchical, this method relies on good theoretical reasons for including the chosen predictors, but unlike hierarchical the experimenter makes no decision about the order in which variables are entered.

• Stepwise methods are generally frowned upon by statisticians. In stepwise regressions decisions about the order in which predictors are entered into the model are based on a purely mathematical criterion. In the forward method, an initial model is defined that contains only the constant (b0). The computer then searches for the predictor (out of the ones available) that best predicts the outcome variable: it does this by selecting the predictor that has the highest simple correlation with the outcome. If this predictor significantly improves the ability of the model to predict the outcome, then this predictor is retained in the model and the computer searches for a second predictor. The criterion used for selecting this second predictor is that it is the variable that has the largest semi-partial correlation with the outcome. In plain English, imagine that the first predictor can explain 40% of the variation in the outcome variable; then there is still 60% left unexplained. The computer searches for the predictor that can explain the biggest part of the remaining 60% (it is not interested in the 40% that is already explained). As such, this semi-partial correlation gives a measure of how much 'new variance' in the outcome can be explained by each remaining predictor. The predictor that accounts for the most new variance is added to the model and, if it makes a significant contribution to the predictive power of the model, it is retained and another predictor is considered.
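The squared semi-partial correlation is just the increase in R² when a predictor is added. With two predictors, R² for the full model can be computed from the three pairwise correlations, which makes the 'new variance' idea easy to see. A sketch using invented correlations (the formula is the standard two-predictor result; the numbers are made up to echo the 40%-explained example above):

```python
# Invented correlations for illustration
r_y1 = 0.632    # first predictor picked: highest simple correlation with y
r_y2 = 0.500    # second candidate's simple correlation with y
r_12 = 0.300    # the two predictors overlap somewhat

r2_first = r_y1 ** 2           # variance explained by predictor 1 alone (~40%)
# R^2 with both predictors, from the standard two-predictor formula:
r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
# Squared semi-partial correlation: the 'new' variance predictor 2 adds.
sr2_2 = r2_full - r2_first
print(round(r2_first, 3), round(sr2_2, 3))
```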

Many writers argue that stepwise methods take the important methodological decisions out of the hands of the researcher. What's more, the models derived by stepwise methods often take advantage of random sampling variation and so decisions about which variables should be included will be based upon slight differences in their semi-partial correlation. However, these slight statistical differences may contrast dramatically with the theoretical importance of a predictor to the model. There is also the danger of over-fitting the model (having too many variables that essentially make little contribution to predicting the outcome) and under-fitting it (leaving out important predictors). However, when little theory exists stepwise methods might be the only practical option.

The Example We’lllookatdatacollectedfromseveralquestionnairesrelatingtoclinicalpsychology,andwewillusethesemeasuresto predict social anxiety usingmultiple regression.Anxiety disorders takeondifferent shapes and forms, and eachdisorderisbelievedtobedistinctandhaveuniquecauses.Wecansummarisethedisordersandsomepopulartheoriesasfollows:


• Social Anxiety: Social anxiety disorder is a marked and persistent fear of one or more social or performance situations in which the person is exposed to unfamiliar people or possible scrutiny by others. This anxiety leads to avoidance of these situations. People with social phobia are believed to feel elevated feelings of shame.

• Obsessive Compulsive Disorder (OCD): OCD is characterised by the everyday intrusion into conscious thinking of intense, repetitive, personally abhorrent, absurd and alien thoughts (obsessions), leading to the endless repetition of specific acts or to the rehearsal of bizarre and irrational mental and behavioural rituals (compulsions).

Social anxiety and obsessive compulsive disorder are seen as distinct disorders having different causes. However, there are some similarities.

• They both involve some kind of attentional bias: attention to bodily sensation in social anxiety and attention to things that could have negative consequences in OCD.

• They both involve repetitive thinking styles: social phobics ruminate about social encounters after the event (known as post-event processing), and people with OCD have recurring intrusive thoughts and images.

• They both involve safety behaviours (i.e. trying to avoid the thing that makes you anxious).

This might lead us to think that, rather than being different disorders, they are manifestations of the same core processes. One way to research this possibility would be to see whether social anxiety can be predicted from measures of other anxiety disorders. If social anxiety disorder and OCD are distinct we should expect that measures of OCD will not predict social anxiety. However, if there are core processes underlying all anxiety disorders, then measures of OCD should predict social anxiety.

Figure 2: Data layout for multiple regression

The data are in the file SocialAnxietyRegression.sav, which can be downloaded from Study Direct. This file contains four variables:

• The Social Phobia and Anxiety Inventory (SPAI), which measures levels of social anxiety.

• The Interpretation of Intrusions Inventory (III), which measures the degree to which a person experiences intrusive thoughts like those found in OCD.

• The Obsessive Beliefs Questionnaire (OBQ), which measures the degree to which people experience obsessive beliefs like those found in OCD.

• The Test of Self-Conscious Affect (TOSCA), which measures shame.

Each of 134 people was administered all four questionnaires. You should note that each questionnaire has its own column and each row represents a different person (see Figure 2).


What analysis will we do?

We are going to do a multiple regression analysis. Specifically, we're going to do a hierarchical multiple regression analysis. All this means is that we enter variables into the regression model in an order determined by past research and expectations. So, for your analysis, we will enter variables in so-called 'blocks':

• Block 1: the first block will contain any predictors that we expect to predict social anxiety. These variables should be entered using forced entry. In this example we have only one variable that we expect, theoretically, to predict social anxiety, and that is shame (measured by the TOSCA).

• Block 2: the second block will contain our exploratory predictor variables (the ones we don't necessarily expect to predict social anxiety). This block should contain the measures of OCD (OBQ and III) because these variables shouldn't predict social anxiety if social anxiety is indeed distinct from OCD. These variables should be entered using a stepwise method because we are 'exploring them' (think back to your lecture).

Doing Multiple Regression on SPSS

Specifying the First Block in Hierarchical Regression

Theory indicates that shame is a significant predictor of social phobia, and so this variable should be included in the model first. The exploratory variables (obq and iii) should, therefore, be entered into the model after shame. This method is called hierarchical (the researcher decides in which order to enter variables into the model based on past research). To do a hierarchical regression in SPSS we enter the variables in blocks (each block representing one step in the hierarchy). To get to the main regression dialog box select the linear regression option from the menus. The main dialog box is shown in Figure 3.

Figure 3: Main dialog box for block 1 of the multiple regression

The main dialog box is fairly self-explanatory in that there is a space to specify the dependent variable (outcome), and a space to place one or more independent variables (predictor variables). As usual, the variables in the data editor are listed on the left-hand side of the box. Highlight the outcome variable (SPAI scores) in this list by clicking on it and then transfer it to the box labelled Dependent by clicking on the arrow button or dragging it across. We also need to specify the predictor variable for the first block. We decided that shame should be entered into the model first (because theory indicates that it is an important predictor), so highlight this variable in the list and transfer it to the box labelled Independent(s) by clicking on the arrow button or dragging it across. Underneath the Independent(s) box, there is a drop-down menu for specifying the Method of regression. You can select a different method of variable entry for each block using this drop-down list, next to where it says Method. The default option is forced entry, and this is the option we want, but if you were carrying out more exploratory work, you might decide to use one of the stepwise methods (forward, backward, stepwise or remove).

Specifying the Second Block in Hierarchical Regression

Having specified the first block in the hierarchy, we move on to the second. To tell the computer that you want to specify a new block of predictors you must click on Next. This process clears the Independent(s) box so that you can enter the new predictors (you should also note that above this box it now reads Block 2 of 2, indicating that you are in the second block of the two that you have so far specified). We decided that the second block would contain both of the new predictors and so you should click on obq and iii in the variables list and transfer them, one by one, to the Independent(s) box by clicking on the arrow button. The dialog box should now look like Figure 4. To move between blocks use the Next and Previous buttons (so, for example, to move back to block 1, click on Previous).

It is possible to select different methods of variable entry for different blocks in a hierarchy. So, although we specified forced entry for the first block, we could now specify a stepwise method for the second. Given that we have no previous research regarding the effects of obq and iii on SPAI scores, we might be justified in requesting a stepwise method for this block (see your lecture notes and my textbook). For this analysis select a stepwise method for this second block.

Figure 4: Main dialog box for block 2 of the multiple regression

Statistics

In the main regression dialog box click on Statistics to open a dialog box for selecting various important options relating to the model (Figure 5). Most of these options relate to the parameters of the model; however, there are procedures available for checking the assumptions of no multicollinearity (Collinearity diagnostics) and independence of errors (Durbin-Watson). When you have selected the statistics you require (I recommend all but the covariance matrix as a general rule) click on Continue to return to the main dialog box.

• Estimates: This option is selected by default because it gives us the estimated coefficients of the regression model (i.e. the estimated b-values).

• Confidence intervals: This option produces confidence intervals for each of the unstandardized regression coefficients.

• Model fit: This option is vital and is selected by default. It provides not only a statistical test of the model's ability to predict the outcome variable (the F-test), but also the value of R (or multiple R), the corresponding R², and the adjusted R².

• R squared change: This option displays the change in R² resulting from the inclusion of a new predictor (or block of predictors). This measure is a useful way to assess the unique contribution of new predictors (or blocks) to explaining variance in the outcome.


• Descriptives: If selected, this option displays a table of the mean, standard deviation and number of observations of all of the variables included in the analysis. A correlation matrix is also displayed showing the correlation between all of the variables and the one-tailed probability for each correlation coefficient. This correlation matrix can be used to establish whether there is multicollinearity.

• Part and partial correlations: This option produces the zero-order correlation (the Pearson correlation) between each predictor and the outcome variable. It also produces the partial correlation between each predictor and the outcome, controlling for all other predictors in the model.

• Collinearity diagnostics: This option is for obtaining collinearity statistics such as the VIF, tolerance, eigenvalues of the scaled, uncentred cross-products matrix, condition indexes and variance proportions (see Field, 2013, and your lecture notes).

• Durbin-Watson: This option produces the Durbin-Watson test statistic, which tests for correlations between errors.

• Casewise diagnostics: This option lists the observed value of the outcome, the predicted value of the outcome, the difference between these values (the residual) and this difference standardized. Furthermore, it will list these values either for all cases, or just for cases for which the standardized residual is greater than 3 (when the ± sign is ignored). This criterion value of 3 can be changed, and I recommend changing it to 2 for reasons that will become apparent.

Figure 5: Statistics dialog box for regression analysis

Regression Plots

Once you are back in the main dialog box, click on Plots to activate the regression plots dialog box shown in Figure 6. This dialog box provides the means to specify a number of graphs, which can help to establish the validity of some regression assumptions. Most of these plots involve various residual values. On the left-hand side of the dialog box is a list of several variables:

• DEPENDNT (the outcome variable).

• *ZPRED (the standardized predicted values of the dependent variable based on the model). These values are standardized forms of the values predicted by the model.

• *ZRESID (the standardized residuals, or errors). These values are the standardized differences between the observed data and the values that the model predicts.

• *DRESID (the deleted residuals).

• *ADJPRED (the adjusted predicted values).

• *SRESID (the Studentized residual).

• *SDRESID (the Studentized deleted residual). This value is the deleted residual divided by its standard error.

The variables listed in this dialog box all come under the general heading of residuals, and are discussed in detail in my book (sorry for all of the self-referencing, but I'm trying to condense a 60-page chapter into a manageable handout!). For a basic analysis it is worth plotting *ZRESID (Y-axis) against *ZPRED (X-axis), because this plot is useful to determine whether the assumptions of random errors and homoscedasticity have been met (see earlier). To create these plots select a variable from the list, and transfer it to the space labelled either X or Y (which refer to the axes) by clicking on the arrow button. When you have selected two variables for the first plot (as is the case in Figure 6) you can specify a new plot by clicking on Next. This process clears the spaces in which variables are specified. If you click on Next and would like to return to the plot that you last specified, then simply click on Previous.

You can also select the tick-box labelled Produce all partial plots, which will produce scatterplots of the residuals of the outcome variable and each of the predictors when both variables are regressed separately on the remaining predictors. Any obvious outliers on a partial plot represent cases that might have undue influence on a predictor's regression coefficient. Also, non-linear relationships between a predictor and the outcome variable are much more detectable using these plots. Finally, they are a useful way of detecting collinearity. There are several options for plots of the standardized residuals. First, you can select a histogram of the standardized residuals (this is extremely useful for checking the assumption of normality of errors). Second, you can ask for a normal probability plot, which also provides information about whether the residuals in the model are normally distributed. When you have selected the options you require, click on Continue to take you back to the main regression dialog box.

Figure 6: Linear regression: plots dialog box

Saving Regression Diagnostics

In this week's lecture we met two types of regression diagnostics: those that help us assess how well our model fits our sample and those that help us detect cases that have a large influence on the model generated. In SPSS we can choose to save these diagnostic variables in the data editor (so, SPSS will calculate them and then create new columns in the data editor in which the values are placed).

Click on Save in the main regression dialog box to activate the save new variables dialog box (see Figure 7). Once this dialog box is active, it is a simple matter to tick the boxes next to the required statistics. Most of the available options are explained in Field (2013) and Figure 7 shows what I consider to be a bare minimum set of diagnostic statistics. Standardized versions of these diagnostics are generally easier to interpret and so I suggest selecting them in preference to the unstandardized versions. Once the regression has been run, SPSS creates a column in your data editor for each statistic requested and it has a standard set of variable names to describe each one (zpr_1: standardized predicted value; zre_1: standardized residual; coo_1: Cook's distance). After the name, there will be a number that refers to the analysis that has been run. So, for the first regression run on a data set the variable names will be followed by a 1, if you carry out a second regression it will create a new set of variables with names followed by a 2, and so on. When you have selected the diagnostics you require (by clicking in the appropriate boxes), click on Continue to return to the main regression dialog box.


Figure 7: Dialog box for regression diagnostics

Bootstrapping

We can get bootstrapped confidence intervals for the regression coefficients by clicking Bootstrap (see last week's handout). However, this function doesn't work when we have used the Save option to save residuals, so we can't use it now. However, once you have run the analysis and inspected the residuals and influential cases, you might want to re-run the analysis selecting the bootstrap option (and remembering to deselect all of the options for saving variables).
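Bootstrapped CIs like these can be sketched by hand: resample cases with replacement, refit the model each time, and take percentiles of the resulting coefficients. A minimal percentile-bootstrap sketch with invented data (SPSS's default is the more refined BCa interval, so its numbers won't match exactly):

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Invented data with a true slope of about 2
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.8, 6.2, 7.9, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]

random.seed(1)                    # reproducible resampling
boots = []
for _ in range(2000):
    idx = [random.randrange(len(x)) for _ in x]   # resample cases with replacement
    boots.append(slope([x[i] for i in idx], [y[i] for i in idx]))
boots.sort()
ci = (boots[49], boots[1949])     # 2.5th and 97.5th percentiles of 2000 slopes
print(ci)
```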

A Brief Guide to Interpretation

Model Summary

The model summary (Output 1) contains two models. Model 1 refers to the first stage in the hierarchy when only TOSCA is used as a predictor. Model 2 refers to the final model (TOSCA, and OBQ and III if they end up being included).

• In the column labelled R are the values of the multiple correlation coefficient between the predictors and the outcome. When only TOSCA is used as a predictor, this is the simple correlation between SPAI and TOSCA (0.34).

• The next column gives us a value of R², which is a measure of how much of the variability in the outcome is accounted for by the predictors. For the first model its value is 0.116, which means that TOSCA accounts for 11.6% of the variation in social anxiety. However, for the final model (model 2), this value increases to 0.157, or 15.7% of the variance in SPAI. Therefore, whatever variables enter the model in block 2 account for an extra (15.7 − 11.6 =) 4.1% of the variance in SPAI scores (this is also the value in the column labelled R Square Change but expressed as a percentage).

• The adjusted R² gives us some idea of how well our model generalizes, and ideally we would like its value to be the same as, or very close to, the value of R². In this example the difference for the final model is small (0.157 − 0.143 = 0.014, or 1.4%). This shrinkage means that if the model were derived from the population rather than a sample it would account for approximately 1.4% less variance in the outcome.

• Finally, if you requested the Durbin-Watson statistic it will be found in the last column. This statistic informs us about whether the assumption of independent errors is tenable. The closer to 2 the value is, the better, and for these data the value is 2.084, which is so close to 2 that the assumption has almost certainly been met.


Output 1

ANOVA Table

Output 2 contains an analysis of variance (ANOVA) that tests whether the model is significantly better at predicting the outcome than using the mean as a 'best guess'. This table is again split into two sections: one for each model. If the improvement due to fitting the regression model is much greater than the inaccuracy within the model then the value of F will be greater than 1, and SPSS calculates the exact probability of obtaining a value of F at least this big if there were no effect. For the initial model the F-ratio is 16.52 (p < .001), and for the second model the value of F is 11.61, which is also highly significant (p < .001). We can interpret these results as meaning that the final model significantly improves our ability to predict the outcome variable.
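The F-ratio can be reproduced from the sums of squares the ANOVA table reports: divide each by its degrees of freedom to get mean squares, then divide the model mean square by the residual mean square. Using the model 1 values from Output 2:

```python
# Values taken from Output 2 (model 1)
ss_model, df_model = 13302.700, 1
ss_resid, df_resid = 101493.3, 126

ms_model = ss_model / df_model        # mean square for the model
ms_resid = ss_resid / df_resid        # mean square for the residuals (~805.5)
f_ratio = ms_model / ms_resid         # ~16.5, matching the table
print(round(f_ratio, 1))
```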

Output 2

Model Parameters

The next part of the output is concerned with the parameters of the model. The first step in our hierarchy included TOSCA and although these parameters are interesting up to a point, we're more interested in the final model because this includes all predictors that make a significant contribution to predicting social anxiety. So, we'll look only at the lower half of the table (Model 2).

In multiple regression the model takes the form of an equation that contains a coefficient (b) for each predictor. The first part of the table gives us estimates for these b values, and these values indicate the individual contribution of each predictor to the model.

The b values tell us about the relationship between social anxiety and each predictor. If the value is positive we can tell that there is a positive relationship between the predictor and the outcome, whereas a negative coefficient represents a negative relationship. For these data both predictors have positive b values, indicating positive relationships. So, as shame (TOSCA) increases, social anxiety increases, and as obsessive beliefs increase so does social anxiety. The b values also tell us to what degree each predictor affects the outcome if the effects of all other predictors are held constant.

Each of these b values has an associated standard error indicating to what extent these values would vary across different samples, and these standard errors are used to determine whether or not the b value differs significantly from zero (using the t-statistic). Therefore, if the t-test associated with a b value is significant (if the value in the column labelled Sig. is less than 0.05) then that predictor is making a significant contribution to the model.

Model Summary

Model   R       R Square   Adj. R Square   Std. Error   R Sq. Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .340a   .116       .109            28.38137     .116           16.515     1     126   .000
2       .396b   .157       .143            27.82969     .041           6.045      1     125   .015            2.084

a. Predictors: (Constant), Shame (TOSCA)
b. Predictors: (Constant), Shame (TOSCA), OCD (Obsessive Beliefs Questionnaire)
c. Dependent Variable: Social Anxiety (SPAI)

ANOVA

Model           Sum of Squares   df    Mean Square   F        Sig.
1  Regression    13302.700         1   13302.700     16.515   .000a
   Residual     101493.3         126     805.502
   Total        114796.0         127
2  Regression    17984.538         2    8992.269     11.611   .000b
   Residual      96811.431       125     774.491
   Total        114796.0         127

a. Predictors: (Constant), Shame (TOSCA)
b. Predictors: (Constant), Shame (TOSCA), OCD (Obsessive Beliefs Questionnaire)
c. Dependent Variable: Social Anxiety (SPAI)


For this model, shame (TOSCA), t(125) = 3.16, p = .002, and obsessive beliefs, t(125) = 2.46, p = .015, are significant predictors of social anxiety. From the magnitude of the t-statistics we can see that shame (TOSCA) had slightly more impact than obsessive beliefs. This conclusion is also borne out by the standardized beta values, which are measured in standard deviation units and so are directly comparable: the standardized beta value for shame (TOSCA) is 0.273, and for obsessive beliefs it is 0.213. This tells us that shame has slightly more impact in the model.
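The t-statistics reported here are simply each b value divided by its standard error. A quick check against the coefficients table (a plain-Python sketch, not SPSS output):

```python
def t_stat(b, se):
    """t-statistic testing whether a regression coefficient differs from zero."""
    return b / se

t_shame = t_stat(22.047, 6.978)  # about 3.16, as reported for shame (TOSCA)
t_obq = t_stat(6.920, 2.815)     # about 2.46, as reported for obsessive beliefs
```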

Output 3

Excluded Variables

At each stage of a regression analysis SPSS provides a summary of any variables that have not yet been entered into the model. In a hierarchical model, this summary has details of the variables that have been specified to be entered in subsequent steps, and in stepwise regression this table contains summaries of the variables that SPSS is considering entering into the model. The summary gives an estimate of each predictor's b value if it was entered into the equation at this point and calculates a t-test for this value. In a stepwise regression, SPSS should enter the predictor with the highest t-statistic and will continue entering predictors until there are none left with t-statistics that have significance values less than 0.05. Therefore, the final model might not include all of the variables you asked SPSS to enter.

In this case it tells us that if the interpretation of intrusions (III) is entered into the model it would not have a significant impact on the model's ability to predict social anxiety, t = −0.049, p = .961. In fact, the significance of this variable is almost 1, indicating it would have virtually no impact whatsoever (note also that its beta value is extremely close to zero!).

Output 4

Checking for Bias

SPSS produces a summary table of the residual statistics and these should be examined for extreme cases. Output 5 shows any cases that have a standardized residual less than −2 or greater than 2 (remember that we changed the default criterion from 3 to 2). In an ordinary sample we would expect 95% of cases to have standardized residuals within about ±2. We have a sample of 134, therefore it is reasonable to expect about 7 cases (approximately 5%) to have standardized residuals outside of these limits. From Output 5 we can see that we have 7 cases (5%) that are outside of the limits: therefore, our sample is basically what we would expect. In addition, 99% of cases should lie within ±2.5 and so we would expect only 1% of cases to lie outside of these limits. From the cases listed here, it is clear that two cases (1.5%)

Coefficients

Model                        B         Std. Error   Beta   t        Sig.   95% CI for B        Zero-order   Partial   Part   Tolerance   VIF
1  (Constant)                -54.368   28.618              -1.900   .060   (-111.002, 2.267)
   Shame (TOSCA)              27.448    6.754       .340    4.064   .000   (14.081, 40.814)    .340         .340      .340   1.000       1.000
2  (Constant)                -51.493   28.086              -1.833   .069   (-107.079, 4.094)
   Shame (TOSCA)              22.047    6.978       .273    3.160   .002   (8.237, 35.856)     .340         .272      .260   .901        1.110
   OCD (Obsessive Beliefs
   Questionnaire)              6.920    2.815       .213    2.459   .015   (1.350, 12.491)     .299         .215      .202   .901        1.110

a. Dependent Variable: Social Anxiety (SPAI)

Excluded Variables

Model                                            Beta In   t       Sig.   Partial Corr.   Tolerance   VIF     Min. Tolerance
1  OCD (Interpretation of Intrusions Inventory)   .132a     1.515   .132    .134           .917       1.091   .917
   OCD (Obsessive Beliefs Questionnaire)          .213a     2.459   .015    .215           .901       1.110   .901
2  OCD (Interpretation of Intrusions Inventory)  -.005b     -.049   .961   -.004           .541       1.849   .531

a. Predictors in the Model: (Constant), Shame (TOSCA)
b. Predictors in the Model: (Constant), Shame (TOSCA), OCD (Obsessive Beliefs Questionnaire)
c. Dependent Variable: Social Anxiety (SPAI)


lie outside of the limits (cases 8 and 45). Therefore, our sample appears to conform roughly to what we would expect for a fairly accurate model. There are also no standardized residuals greater than 3, which is good news.
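This screening step is easy to express as a short routine: flag any case whose standardized residual exceeds a cutoff and report what proportion of the sample that represents. The sketch below is plain Python with hypothetical residuals, not the SPAI data:

```python
def flag_large_residuals(std_residuals, cutoff=1.96):
    """Return indices of cases beyond +/-cutoff and the proportion they represent."""
    flagged = [i for i, r in enumerate(std_residuals) if abs(r) > cutoff]
    return flagged, len(flagged) / len(std_residuals)

# Hypothetical standardized residuals for five cases
cases, proportion = flag_large_residuals([0.1, -2.5, 1.0, 3.0, -0.3], cutoff=2.0)
# cases == [1, 3]; proportion == 0.4, far above the 5% or so expected by chance
```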

We should also scan the data editor to see if any cases have a Cook's distance (COO_1) greater than 1. [You could also use SPSS to find the maximum value of Cook's distance by using the descriptive statistics command.] You should find that all of the Cook's distances are below 1, which means that no cases are having an undue influence.
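Cook's distance combines the size of a case's residual with its leverage. For a single-predictor model it can be computed from scratch; the plain-Python sketch below (with made-up illustrative data, not the SPAI data) shows how one wild case dominates:

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a simple (one-predictor) OLS regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx  # slope
    b0 = my - b1 * mx                                              # intercept
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2                                      # parameters estimated (intercept + slope)
    mse = sum(e ** 2 for e in resid) / (n - p)
    dists = []
    for xi, ei in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx       # leverage of this case
        dists.append(ei ** 2 / (p * mse) * h / (1 - h) ** 2)
    return dists

# The last case sits far from the line implied by the others
d = cooks_distances([1, 2, 3, 4, 5], [1, 2, 3, 4, 15])
# d[4] is 2.25: above the cause-for-concern threshold of 1 suggested by Cook and Weisberg (1982)
```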

Output 5

Figure 8: P-P plot (top left), a plot of standardized residuals vs. standardized predicted values (top right), and partial plots of social anxiety against shame (bottom left) and OBQ (bottom right)

We can use histograms and P-P plots to look for normality of the residuals. Figure 8 (top left) shows the P-P plot for our model. The dots hover fairly close to the diagonal line, indicating normality in the residuals. We can look for heteroscedasticity and non-linearity using a plot of standardized residuals against standardized predicted values. If everything is OK then this graph should look like a random array of dots; if the graph funnels out then that is a sign of



heteroscedasticity, and any curve suggests non-linearity (see earlier). Figure 8 (top right) shows the plot for our model. Note how the points are randomly and evenly dispersed throughout the plot. This pattern is indicative of a situation in which the assumptions of linearity and homoscedasticity have been met. Compare this with the examples in Figure 1.
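The quantities on that plot's axes are essentially z-scores of the residuals (ZRESID) and of the predicted values (ZPRED). A minimal sketch of the standardization step (plain Python; SPSS does this for you):

```python
def standardize(values):
    """Convert scores to z-scores (mean 0, SD 1, using the sample SD with n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

z = standardize([1, 2, 3, 4, 5])
# z has mean 0; plotting standardize(residuals) against standardize(predicted values)
# reproduces the ZRESID vs. ZPRED scatterplot
```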

Figure 8 also shows the partial plots, which are scatterplots of the residuals of the outcome variable and each of the predictors when both variables are regressed separately on the remaining predictors. Obvious outliers on a partial plot represent cases that might have undue influence on a predictor's regression coefficient; non-linear relationships and heteroscedasticity can also be detected using these plots. For shame (Figure 8, bottom left) the partial plot shows the positive relationship to social anxiety. There are no obvious outliers on this plot, but the cloud of dots is a bit funnel-shaped, possibly indicating some heteroscedasticity. For OBQ (Figure 8, bottom right) the plot again shows a positive relationship to social anxiety. There are no obvious outliers on this plot.

Finally, the VIF values are well below 10, which reassures us that multicollinearity is not a problem.
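The VIF and tolerance columns in the coefficients table are reciprocals of one another: tolerance is the proportion of a predictor's variance not shared with the other predictors, and VIF = 1 / tolerance. A quick check against the reported values (plain Python, illustrative only):

```python
def vif_from_tolerance(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1.0 / tolerance

v = vif_from_tolerance(0.901)
# v is approximately 1.110, matching the VIF reported for both predictors in model 2
```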

Writing Up Multiple Regression Analysis

If your model has several predictors then you can't really beat a summary table as a concise way to report your model. As a bare minimum, report the betas, their confidence intervals, significance values and some general statistics about the model (such as the R²). The standardized beta values and the standard errors are also very useful. So, basically, you want to reproduce the table labelled Coefficients from the SPSS output and omit some of the non-essential information. For the example in this chapter we might produce a table like Table 1.

See if you can look back through the SPSS output in this chapter and work out from where the values came. Things to note are: (1) I've rounded off to 2 decimal places throughout because this is a reasonable level of precision given the variables measured; (2) for the standardized betas there is no zero before the decimal point (because these values shouldn't exceed 1) but for all other values less than 1 the zero is present; (3) often you'll see the significance of the variable denoted by an asterisk with a footnote to indicate the significance level being used, but it's better practice to report exact p-values; (4) the R² for the initial model and the change in R² (denoted as ΔR²) for each subsequent step of the model are reported below the table; and (5) in the title I have mentioned that confidence intervals and standard errors in the table are based on bootstrapping: this information is important for readers to know.

Table 1: Linear model of predictors of social anxiety (SPAI). 95% confidence intervals reported in parentheses.

                          b                         SE B     β      p
Step 1   Constant         -54.37 (-111.00, 2.27)    28.62           .060
         Shame (TOSCA)     27.45 (14.08, 40.81)      6.75    .34    < .001
Step 2   Constant         -51.49 (-107.08, 4.09)    28.09           .069
         Shame (TOSCA)     22.05 (8.24, 35.86)       6.98    .27    .002
         OCD (OBQ)          6.92 (1.35, 12.49)       2.82    .21    .015

Note. R² = .12 for Step 1; ΔR² = .04 for Step 2 (ps < .05).

Tasks

Task 1

A fashion student was interested in factors that predicted the salaries of catwalk models. She collected data from 231 models. For each model she asked them their salary per day on days when they were working (salary), their age (age), how many years they had worked as a model (years), and then got a panel of experts from modelling agencies to rate the attractiveness of each model as a percentage, with 100% being perfectly attractive (beauty). The data are in the file Supermodel.sav on the course website. Conduct a multiple regression to see which factors predict a model's salary. (Answers to this task can be found at www.uk.sagepub.com/field4e/study/smartalex/chapter8.pdf.)


How much variance does the final model explain?

Your Answers:

Which variables significantly predict salary?

Your Answers:

Fill in the values for the following APA format table of the results:

                    b       SE b    β       p
Constant
Age
Years as a Model
Attractiveness

Note. R² =

Write out the regression equation for the final model.

Your Answers:

Are the residuals as you would expect for a good model?

Your Answers:

Is there evidence of normality of errors, homoscedasticity and no multicollinearity?


Your Answers:

Task 2

Coldwell, Pike and Dunn (2006) investigated whether household chaos predicted children's problem behaviour over and above parenting. They collected data from 118 two-parent families. For each family they recorded the age and gender of both the older and younger sibling (age_child1, gender_child1, age_child2 and gender_child2 respectively). They then interviewed each child about their relationship with their parents using the Berkeley Puppet Interview (BPI). The interview measured each child's relationship with each parent along two dimensions: (1) warmth/enjoyment, and (2) anger/hostility. Higher scores indicate more anger/hostility and warmth/enjoyment respectively. Each parent was then interviewed about their relationship with each of their children using the Parent-Child Relationship Scale. This resulted in scores for parent-child relationship positivity and parent-child relationship negativity. Overall, these measures result in a lot of variables:

Measures                Mum: Child 1         Mum: Child 2         Dad: Child 1         Dad: Child 2
Warmth/Enjoyment        mum_warmth_child1    mum_warmth_child2    dad_warmth_child1    dad_warmth_child2
Anger/Hostility         mum_anger_child1     mum_anger_child2     dad_anger_child1     dad_anger_child2
Positive Relationship   mum_pos_child1       mum_pos_child2       dad_pos_child1       dad_pos_child2
Negative Relationship   mum_neg_child1       mum_neg_child2       dad_neg_child1       dad_neg_child2

Household chaos (chaos) was assessed using the Confusion, Hubbub, And Order Scale (CHAOS). There were two outcome variables (one for each child) that measured children's adjustment (sdq_child1 and sdq_child2) using the Strengths and Difficulties Questionnaire: the higher the score, the more problem behaviour the child is reported to be displaying.

The data are in the file CHAOS.sav on the course website. To test whether household chaos was predictive of children's problem behaviour over and above parenting, conduct four hierarchical regressions:

(1) Maternal relationship with child 1
(2) Maternal relationship with child 2
(3) Paternal relationship with child 1
(4) Paternal relationship with child 2

Each hierarchical regression consists of three steps. First, enter child age and child gender as control variables. In the second step add the variables measuring parent-child positivity, parent-child negativity, parent-child warmth and parent-child anger. Finally, in the third step, chaos should be added. The crucial test of the hypothesis lies in the final step. To confirm that household chaos is predictive of children's problem behaviour over and above parenting, this third step must result in a significant R² change.


What conclusions can you draw from these analyses?

Your Answers:

Look at Coldwell, J., Pike, A., & Dunn, J. (2006). Household chaos - links with parenting and child behaviour. Journal of Child Psychology and Psychiatry, 47, 1116-1122. (On the course website.) How do your results and interpretation compare to those reported? Reflect upon how you have used regression as a tool to answer an important psychological question.

Your Answers:

Fill in the values for the following APA format table of the results:

                   Mother-child relationship                   Father-child relationship
                   Older sibling SDQ    Younger sibling SDQ    Older sibling SDQ    Younger sibling SDQ
                   Total R² =           Total R² =             Total R² =           Total R² =
                   ΔR²      b           ΔR²      b             ΔR²      b           ΔR²      b

Step 1
  Child age
  Child gender


Step 2
  Child age
  Child gender
  Child rpt parent-child positivity
  Child rpt parent-child negativity
  Parent rpt parent-child positivity
  Parent rpt parent-child negativity

Step 3
  Child age
  Child gender
  Child rpt parent-child positivity
  Child rpt parent-child negativity
  Parent rpt parent-child positivity
  Parent rpt parent-child negativity
  CHAOS

* p < .05, ** p < .01, *** p < .001

Task 3

Complete the multiple choice questions for Chapter 8 on the companion website to Field (2013): https://studysites.uk.sagepub.com/field4e/study/mcqs.htm. If you get any wrong, re-read this handout (or Field, 2013, Chapter 8) and do them again until you get them all correct.

Task 4

Go back to the output for last week's task (does listening to heavy metal predict suicide risk?). Is the model valid (i.e., are all of the assumptions met)?

References

Berry, W. D. (1993). Understanding regression assumptions. Sage university paper series on quantitative applications in the social sciences, 07-092. Newbury Park, CA: Sage.

Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall.

Durbin, J., & Watson, G. S. (1951). Testing for serial correlation in least squares regression, II. Biometrika, 38, 159-178.

Field, A. P. (2013). Discovering statistics using IBM SPSS Statistics: And sex and drugs and rock 'n' roll (4th ed.). London: Sage.

Terms of Use

This handout contains material from:

Field, A. P. (2013). Discovering statistics using SPSS: and sex and drugs and rock 'n' roll (4th Edition). London: Sage.

This material is copyright Andy Field (2000-2016). This document is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/): basically, you can use it for teaching and non-profit activities but not meddle with it without permission from the author.