
© Prof. Andy Field, 2012 www.discoveringstatistics.com

DSUR Errata

Wilcox Functions

Wilcox’s website address has changed since the book was published. The latest versions of Wilcox’s functions for robust analysis are available by executing:

source("http://dornsife.usc.edu/assets/sites/239/docs/Rallfun-v26.txt")

Code Changes/Package Updates

• Chapter 4 (ggplot2): After the book was published, Hadley Wickham updated ggplot2, and some of the syntax changed considerably (see http://docs.ggplot2.org/current/). Please let me know of anything that doesn’t work, but here are a few problems that I know about already.

• Line graphs not working: This is a bug introduced in ggplot2 0.9.3. There will likely be a fix soon (a version 0.9.3.1). In the meantime, a temporary fix can be applied by executing the following (I didn’t write this fix, and it could create other problems)¹. See https://github.com/hadley/ggplot2/issues/732.

install.packages("devtools")

library(devtools)

source_gist("https://gist.github.com/4578531")

• Page 155 (R’s Souls’ Tip 4.3): scale_fill_manual("Gender", c("Female" = "Blue", "Male" = "Green")) should be scale_fill_manual("Gender", values = c("Female" = "Blue", "Male" = "Green")). [Thanks Steffen Wild].

• The opts() function is deprecated and has been replaced by the theme() function. This affects anything in the chapter that uses opts(). There is a very good transition guide to help you move from opts() to theme() here. Needless to say, I will have to update the chapter/code at some point. If you correct any code, then please email it to me if you feel so inclined :) To get rid of the legend, use theme(legend.position = "none") instead of opts().
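Both ggplot2 fixes above (values = in scale_fill_manual(), and theme() in place of opts()) are combined in this minimal sketch. The data frame toyData and its values are hypothetical, and the sketch assumes a ggplot2 release recent enough to provide theme() and geom_col(). The legend title is passed as name = so that it does not depend on positional argument matching.

```r
library(ggplot2)

# Hypothetical data, invented purely for illustration
toyData <- data.frame(
  Gender = factor(c("Female", "Male")),
  Score  = c(3.2, 2.8)
)

p <- ggplot(toyData, aes(x = Gender, y = Score, fill = Gender)) +
  geom_col() +
  # The colours must be supplied through the values argument
  scale_fill_manual(name = "Gender", values = c("Female" = "blue", "Male" = "green")) +
  # theme() replaces the deprecated opts(); "none" removes the legend
  theme(legend.position = "none")

# print(p) draws the graph
```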

• P. 156: the factor() function has changed, so you’ll get an error using:

hiccups$Intervention_Factor<-factor(hiccups$Intervention, levels = hiccups$Intervention)

Instead, you need to execute this (to order the levels as they are in the book rather than alphabetically):

hiccups$Intervention_Factor<-factor(hiccups$Intervention, levels(hiccups$Intervention)[c(1, 4, 2, 3)])
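The sketch below reproduces the P. 156 problem with a made-up vector; the level names are assumptions chosen for illustration, not read from the hiccups file. Passing a column that contains repeated values as levels fails because factor levels must be unique, whereas indexing the (alphabetical, unique) levels with c(1, 4, 2, 3) simply reorders them. Since R 4.0.0, read.delim() no longer turns strings into factors by default, so the sketch calls factor() first; on a character column, levels() would return NULL.

```r
# Hypothetical stand-in for hiccups$Intervention: each label appears more than once
intervention <- c("Baseline", "Tongue", "Carotid", "Rectum",
                  "Baseline", "Tongue", "Carotid", "Rectum")

# Make sure we are working with a factor (character columns have no levels)
intervention <- factor(intervention)
levels(intervention)  # alphabetical: "Baseline" "Carotid" "Rectum" "Tongue"

# Old approach: duplicated values supplied as levels are rejected (an error in current R)
oldWay <- tryCatch(factor(intervention, levels = intervention),
                   error = function(e) conditionMessage(e))

# Corrected approach: reorder the existing unique levels by position
interventionFactor <- factor(intervention, levels(intervention)[c(1, 4, 2, 3)])
levels(interventionFactor)  # "Baseline" "Tongue" "Carotid" "Rectum"
```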

• P. 199 (R’s Souls’ Tip 5.4): the final command:

dlf$meanHygiene<-ifelse(dlf$daysMissing < 2, NA, rowMeans(cbind(dlf$day1, dlf$day2, dlf$day3), na.rm = TRUE))

should be (note the position of NA: it has moved to the end of the command):

dlf$meanHygiene<-ifelse(dlf$daysMissing < 2, rowMeans(cbind(dlf$day1, dlf$day2, dlf$day3), na.rm = TRUE), NA)
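A small made-up data frame makes the effect of moving NA visible. The hygiene values are invented, and computing daysMissing by counting is.na() per row is an assumption about how that column was created. ifelse(test, yes, no) returns yes where test is TRUE, so the mean has to be the second argument if it should be kept for people missing fewer than two days.

```r
# Hypothetical hygiene scores for four people over three days (NA = missing)
dlf <- data.frame(
  day1 = c(2.0, 1.0, NA, 3.0),
  day2 = c(NA, 1.5, NA, 2.0),
  day3 = c(NA, 2.0, 1.0, NA)
)

# Assumed definition: number of missing days per person (here 2, 0, 2, 1)
dlf$daysMissing <- rowSums(cbind(is.na(dlf$day1), is.na(dlf$day2), is.na(dlf$day3)))

dayMeans <- rowMeans(cbind(dlf$day1, dlf$day2, dlf$day3), na.rm = TRUE)

# Book version: sets NA for exactly the people who should get a mean
wrongMean <- ifelse(dlf$daysMissing < 2, NA, dayMeans)

# Corrected version: a mean only when fewer than 2 days are missing
dlf$meanHygiene <- ifelse(dlf$daysMissing < 2, dayMeans, NA)
# dlf$meanHygiene now holds NA, 1.5, NA, 2.5
```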

¹ I have used this patch on three different machines (Macs) and had no issues at all. However, Isaac van Patten emailed to say that the patch had messed up his system. He said he:

“… had to delete R 2.15.2 altogether and reinstall it. What will work is to remove ggplot2 0.9.3 from the library and then go to the archives and load ggplot2 0.9.1 from the source code … using the older version it draws the graphs as needed.”

Like I said, it’s not my patch, so use it at your own risk. It works fine for me, but it can cause problems. See


• P. 216 (the cor() function): If you throw the examData data frame into cor(), you’ll get an error saying that x must be numeric. The problem is the Gender variable, which is a non-numeric factor (Male and Female). One way around this is to select only the first four variables of the data frame (the numeric variables):

cor(examData[,1:4])

You could also convert Gender to a 0/1 dummy-coded variable and then run cor() on the whole data frame (in which case correlations involving gender will be point-biserial correlations). In the code below, as.numeric() converts the Gender variable to numbers, but R will use 1 and 2 by default, so the minus 1 changes these values to 0 and 1 as per dummy coding:

examData$Gender<-as.numeric(examData$Gender)-1

cor(examData)
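Both workarounds can be tried on a small hypothetical data frame (examToy and its values are invented; they are not the book’s exam data). Selecting columns with is.numeric() rather than by position is a design choice of this sketch, so it keeps working if the column order differs. The factor() call before as.numeric() is an added safeguard: if Gender was read as character (the read.delim() default since R 4.0.0), as.numeric() alone would return NA.

```r
# Hypothetical stand-in for examData: three numeric variables plus a Gender factor
examToy <- data.frame(
  Revise  = c(4, 11, 27, 53, 6, 22),
  Exam    = c(40, 65, 80, 85, 45, 70),
  Anxiety = c(86, 88, 70, 61, 90, 60),
  Gender  = factor(c("Male", "Female", "Male", "Male", "Female", "Female"))
)

# cor() refuses the whole data frame because Gender is not numeric
fullAttempt <- tryCatch(cor(examToy), error = function(e) conditionMessage(e))

# Option 1: keep only the numeric columns, chosen by type
numericCols <- vapply(examToy, is.numeric, logical(1))
corNumeric <- cor(examToy[, numericCols])

# Option 2: dummy-code Gender; level codes are Female = 1, Male = 2, so subtract 1
examToy$Gender <- as.numeric(factor(examToy$Gender)) - 1
corAll <- cor(examToy)  # correlations with Gender are now point-biserial
```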

• P. 226 (Section 6.5.7): bootTau<-function(liarData,i)cor(liarData… won’t run without a space before cor. There is a space in the book, but because of the typesetting that isn’t necessarily clear. It’s safer to bracket the function with {}, so you could write this function as (thanks Jan Dittrich):

bootTau<- function(liarData,i){cor(liarData … etc. )}
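Because the body of bootTau() is elided above, here is a self-contained sketch rather than the book’s code: two statistic functions for boot() (boot package) written with braces. One computes Kendall’s tau in the spirit of P. 226; the other returns regression coefficients in the spirit of the P. 299 entry, where the index argument must carry the same name (i) that the body uses. The data frame toyData and its columns x and y are hypothetical. boot() calls the statistic as statistic(data, indices, ...), so extra named arguments such as formula are passed through.

```r
library(boot)

# Hypothetical data; x and y are invented for illustration
set.seed(1)
toyData <- data.frame(x = 1:20, y = 1:20 + rnorm(20))

# Kendall's tau on the resampled rows; braces make the function body unambiguous
bootTauSketch <- function(data, i) {
  cor(data$x[i], data$y[i], method = "kendall")
}

# Regression coefficients on the resampled rows. The index argument is called i
# because the body uses data[i, ]; naming it "indices" would leave i undefined here
bootRegSketch <- function(formula, data, i) {
  d <- data[i, ]
  fit <- lm(formula, data = d)
  coef(fit)
}

tauResults <- boot(toyData, bootTauSketch, R = 200)

# formula travels through boot()'s ... argument; data and indices are matched by position
regResults <- boot(statistic = bootRegSketch, formula = y ~ x, data = toyData, R = 200)
```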

• P. 235 (Section 6.6.2): a required dependency for the ggm package, the graph package, is no longer supported by CRAN and is no longer available there. It is being maintained at Bioconductor.org but requires individual download and installation. It also requires some other dependencies from Bioconductor, BiocGenerics and RBGL, to be downloaded and installed in your library folder. To do this, execute:

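source("http://bioconductor.org/biocLite.R")

biocLite(c("BiocGenerics", "RBGL"))

# From Bioconductor 3.8 onwards, biocLite() has been replaced by the BiocManager package:
# install.packages("BiocManager"); BiocManager::install(c("BiocGenerics", "RBGL"))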

install.packages("ggm")

library(ggm)

Once this is done, the material in Section 6.6.2 will work. Without it, you cannot load the ggm package. One other thing that is not obvious is that the data frame you give to the var() argument must contain just the variables included in the partial correlation (e.g., it’ll choke if you forget to strip out the subject numbers!). [Thanks, Isaac T. Van Patten, Radford University, and Jeff P.]
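A minimal sketch of the "just the variables" point, using a hypothetical data frame with a participant-number column (Code) and invented variable names: select the variables first, then build the covariance matrix. ggm is not loaded here; the pcor() call appears only as a comment showing where such a matrix would go.

```r
# Hypothetical data: Code is a participant number that must not enter var()
set.seed(2)
partialData <- data.frame(
  Code    = 1:30,
  Exam    = rnorm(30, mean = 60, sd = 10),
  Anxiety = rnorm(30, mean = 70, sd = 15),
  Revise  = rnorm(30, mean = 20, sd = 8)
)

# Keep only the variables involved in the partial correlation
pcVars <- partialData[, c("Exam", "Anxiety", "Revise")]
covMatrix <- var(pcVars)

# With ggm installed, this matrix would be the second argument of pcor(), e.g.
# pcor(c("Exam", "Anxiety", "Revise"), covMatrix)
```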

• P. 299: in bootReg<-function(formula,data,indices), indices should be i to match the data[i,] two lines below. The code sample is correct; it’s just the book that’s wrong.

• P. 895: Growth models. If you use the file HoneymoonPeriodRestructured.sav, everything will be fine. However, if you use the Honeymoon Period.dat file and restructure the data in R (using melt()), then you will get an error message resulting from the fact that the variable Time is treated as a factor rather than a numeric variable. In my code sample the data are prepared as follows:

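library(reshape)  # provides melt()

satisfactionData = read.delim("Honeymoon Period.dat", header = TRUE)

# melt()'s arguments are id.vars and measure.vars ("measured =" is not one of them)
restructuredData<-melt(satisfactionData, id.vars = c("Person", "Gender"), measure.vars = c("Satisfaction_Base", "Satisfaction_6_Months", "Satisfaction_12_Months", "Satisfaction_18_Months"))

names(restructuredData)<-c("Person", "Gender", "Time", "Life_Satisfaction")

# Time comes out of melt() as a factor; turn its codes 1-4 into the numbers 0-3
restructuredData$Time<-as.numeric(restructuredData$Time)-1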

However, in the book I don’t talk about this in detail (because of space), and I really should have flagged the need for the final line, because it converts Time into a numeric variable. In fact, I also subtract 1 from the numeric values because the as.numeric() function will convert the Time factor into values of 1, 2, 3, 4 and I want them to be 0, 1, 2, 3 (because the baseline value of time is a meaningful zero point).
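The final line relies on the order of the factor levels, because as.numeric() returns level codes, not the text of the labels. The sketch assumes, as with melt(), that the levels follow the original column order, and builds such a Time factor directly for a hypothetical two-person layout. It also shows that if the labels were sorted alphabetically, baseline would not end up as time 0.

```r
timeLabels <- c("Satisfaction_Base", "Satisfaction_6_Months",
                "Satisfaction_12_Months", "Satisfaction_18_Months")

# Hypothetical long-format Time column for two people, levels in column (time) order
Time <- factor(rep(timeLabels, times = 2), levels = timeLabels)

# Level codes 1-4 shifted to 0-3 so that baseline is the zero point
timeNumeric <- as.numeric(Time) - 1

# With alphabetically sorted levels the codes no longer follow time order
timeSorted <- as.numeric(factor(rep(timeLabels, times = 2))) - 1
```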

Typos


• Page 14, line 11: 'j14the' >>> 'the'

• Page 58, subsection Statistical power: "...as long as we know three of these properties". Shouldn’t this mean "...two of these properties..."?

• Page 194: dlt should read dlf (thanks Bastian Wimmer).

• Page 212 (third variable problem): The reference to Jane Superbrain Box 1.1 should be 1.4.

• Page 218: Within the last two cor.test() functions there is one bracket too many after "less".

• Page 291: The parentheses in the formula calculating the average leverage are wrong; it should be (k+1)/n rather than k+1/n.

• Page 299, lines 3 and 5 from the bottom: advert >>> adverts; time. >>> time).

• Page 224, lines 3 and 6 from the top, mistyping: liarData= >>> liarData<-

• Page 329: The variable name Cured should say Intervention.

• Page 379, line 12 from the top: statistics). ------> statistics.

• Page 382, line 4 of paragraph 2: -40 and 47 ------> 40 and 47

• Page 388: The equation’s equals sign is omitted.

• Page 415, line 6: There should not be a double dot.

• Page 428, heading: "hoc" should also be in italics.

• Page 455, below: calculates.es does not exist; the name is compute.es ;-)

• Page 472: "...robust version of ANOVA, ..." should rather be "...robust version of ANCOVA, ..."

• Page 474, R’s Souls’ Tip: "...to the effects in the overall ANOVA..." should also be "...ANCOVA..."

• Page 475, Jane Superbrain, first sentence: As far as I can see, Type IV sums of squares have not been introduced.

• Page 476, last paragraph of Jane Superbrain: "...main choice in ANOVA designs is between Type II and Type III sums..." This contradicts Gramming Sam’s tips on page 491, where the third point says "you need to decide whether to use Type I or Type III sums of squares".

• Page 482, R code: should be "plot(viagraModel)" instead of "plots(viagraModel)".

• Page 488, last paragraph: What is the small "x"? Should this be the capital "X" appearing in the sentence before?

• Page 493, fourth R code "mes(5.988117, ...)": You have taken the wrong values here; these are not the mean and the adjusted mean but the values of the 95% confidence interval shown in Output 11.4.

• Page 537, last R code: This works (at least on my PC) only if we additionally specify "est=mom". Otherwise, only NAs are shown.

• Page 538, Output 12.8: The right-hand side of the output is completely missing.

• Page 543: "calculate.es" should rather be "compute.es".

• Page 556, Figure 13.2: On the second level, SS_B is placed above and SS_W below.

• Page 562, "General procedure...", point 4: "Depending on what you find in the previous step..." It is not the previous step but the step before that.

• Page 566, R code (last line): Should this be in blue, or is it rather part of Output 13.1?

• Page 579: I am not sure if the "hatPsi" symbol has been introduced yet.

• Page 595, first R code: This should be named "drinkModel"; "baseline" has already been defined before.

• Page 661, Output 15.2: I think we need the package "car" to perform Levene’s test. This package has not been mentioned at the beginning of the chapter.


• Page 678, second-to-last paragraph: "Output 15.8 shows that the Kruskal-Wallis test..." should rather be "...Shapiro-Wilk test..."

• Page 689, last paragraph: "Friedman’s ANOVA is significant..." should be replaced by "The Shapiro-Wilk test is significant..."

• Page 727, Figure 16.6: In the graph on the left-hand side below: Should the outlier (26) be in blue?

• Page 768, fourth-to-last line: The name of the file is "raq.dat" instead of "RAQ.dat".

• Page 778, "R’s Souls’ Tip": You should change the names "pc2" to "pc1", since these models are the same as the pc1 models above on this page. pc2, in contrast, is defined on page 781 as the rerun of pc1 using only relevant factors.

• Page 783, R code after the last sentence on this page: Should be in blue and separated from the output.

• Page 788, in the middle of the page: The R command "pc2" should be changed to "pc3".

• Page 818, assumption 2 for the chi-square test: The rules in the first two sentences refer to frequencies > 5 or < 5, which excludes the case = 5. So what happens if all frequencies are equal to 5?

• P. 818, line 3 from bottom, begins with "catData". Subsequently, when I refer to this data frame (e.g., on p. 821, line 7 from bottom), I call it "catsData". I meant to call it "catsData" throughout. [Ronald Wyllys]

• Page 839, R code: Here you suddenly use "=" instead of "<-" to define objects. A comment on whether "=" always works in the same way as "<-" would be nice.

• Page 845, Figure 18.5: The title within the figure is wrong; it should be "Cats: Expected values".
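The corrected page 291 formula can be checked numerically: the hat values of a linear model with k predictors plus an intercept sum to k + 1, so their mean is (k + 1)/n. The sketch uses made-up data with k = 2 predictors.

```r
# Made-up data: n cases, k = 2 predictors
set.seed(3)
n <- 25
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
k <- 2  # number of predictors (the intercept adds the +1)

averageLeverage <- mean(hatvalues(fit))

# Compare the observed mean leverage with (k + 1)/n and with the misprinted k + 1/n
c(observed = averageLeverage, corrected = (k + 1) / n, misprinted = k + 1 / n)
```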

Thanks to everyone for spotting mistakes.