Categorical Independent Variables and Multiple Comparisons

Categorical Independent Variablesbrunner/oldclass/appliedf11/... · 2011-10-23 · Indicator dummy variable coding with intercept • Need p‐1 indicators to represent a categorical



One-way Analysis of Variance

•  Categorical IV
•  Quantitative DV
•  p categories (groups)
•  H0: All population means equal
•  Normal conditional distributions
•  Equal variances

Analysis means to split up

•  With no IV, the best predictor is the overall mean
•  Variation to be explained is SSTO, the sum of squared differences from the overall mean
•  With an IV, the best predictor is the group mean
•  Variation still unexplained is SSW, the sum of squared differences from the group means

SSTO = SSB + SSW
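The three sums of squares can be written out explicitly. The notation here is an assumption, since the slide gives only the identity: $Y_{ij}$ is observation $i$ in group $j$, $n_j$ is the size of group $j$, $\bar{Y}_j$ is the mean of group $j$, and $\bar{Y}$ is the overall mean. SSB is the between-groups sum of squares, the part of the variation explained by the group means.

```latex
\begin{align*}
\mathrm{SSTO} &= \sum_{j=1}^{p} \sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}\right)^2 \\
\mathrm{SSB}  &= \sum_{j=1}^{p} n_j \left(\bar{Y}_j - \bar{Y}\right)^2 \\
\mathrm{SSW}  &= \sum_{j=1}^{p} \sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_j\right)^2
\end{align*}
```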

ANOVA Summary Table

Source            DF      Sum of Squares   Mean Square          F Value   Pr > F
Model             p - 1   SSB              MSB = SSB/(p - 1)    MSB/MSW   p-value
Error             n - p   SSW              MSW = SSW/(n - p)
Corrected Total   n - 1   SSTO

$$H_0: \mu_1 = \cdots = \mu_p$$

$R^2 = \mathrm{SSB}/\mathrm{SSTO}$ is the proportion of variation explained by the independent variable
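As a rough illustration, the entries of the summary table and $R^2$ can be computed directly from grouped data. This is a minimal sketch on made-up numbers; the function name `one_way_anova` is hypothetical, and the p-value is left out because it needs the F distribution, which the Python standard library does not provide.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA summary quantities for a list of groups."""
    all_values = [y for g in groups for y in g]
    n, p = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Total, between-group and within-group sums of squares
    ssto = sum((y - grand_mean) ** 2 for y in all_values)
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    msb = ssb / (p - 1)   # Model mean square
    msw = ssw / (n - p)   # Error mean square
    return {
        "df": (p - 1, n - p),
        "SSTO": ssto, "SSB": ssb, "SSW": ssw,
        "MSB": msb, "MSW": msw,
        "F": msb / msw,
        "R2": ssb / ssto,
    }


# Made-up data: three groups of three observations each
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
table = one_way_anova(groups)
for name, value in table.items():
    print(name, value)
```

The decomposition SSTO = SSB + SSW holds up to rounding error for any data passed in, which makes it a convenient sanity check on the arithmetic.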

Contrasts

A contrast is a linear combination of the population means whose weights sum to zero, $a_1 + a_2 + \cdots + a_p = 0$:

$$c = a_1\mu_1 + a_2\mu_2 + \cdots + a_p\mu_p$$

It is estimated by replacing the population means with sample means:

$$\hat{c} = a_1\bar{Y}_1 + a_2\bar{Y}_2 + \cdots + a_p\bar{Y}_p$$
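To make the estimate concrete, here is a small sketch that computes $\hat{c}$ and a t statistic for testing $H_0: c = 0$. The standard error formula $\sqrt{\mathrm{MSW}\sum_j a_j^2/n_j}$ is the usual one under the normal, equal-variance model but is not stated on the slide; the function name and the data are made up.

```python
import math


def contrast_t(groups, weights):
    """Estimate a contrast of group means and its t statistic (n - p df)."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    n = sum(len(g) for g in groups)
    p = len(groups)
    means = [sum(g) / len(g) for g in groups]

    # Pooled within-group variance estimate (MSW)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    msw = ssw / (n - p)

    c_hat = sum(a * m for a, m in zip(weights, means))
    se = math.sqrt(msw * sum(a * a / len(g) for a, g in zip(weights, groups)))
    return c_hat, se, c_hat / se


# Made-up data; the contrast compares the average of groups 1 and 2 with group 3
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
c_hat, se, t = contrast_t(groups, [0.5, 0.5, -1.0])
print(c_hat, se, t)
```

A pairwise comparison is the special case with weights such as `[1, -1, 0]`.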

Overall F-test is a test of p-1 contrasts

$$c = a_1\mu_1 + a_2\mu_2 + \cdots + a_p\mu_p$$

Multiple Comparisons

•  Most hypothesis tests are designed to be carried out in isolation
•  But if you do a lot of tests and all the null hypotheses are true, the chance of rejecting at least one of them can be a lot more than α. This is inflation of the Type I error rate.
•  Multiple comparisons (sometimes called follow-up tests, post hoc tests, probing) offer a solution.
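The size of the inflation is easy to see in one special case. Assuming the k tests are independent (an assumption made here for illustration, not something the slides require), the chance of at least one false rejection is 1 - (1 - α)^k. The function name is hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one rejection among k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k


# How the joint Type I error rate grows with the number of tests at alpha = 0.05
for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 4))
```

With dependent tests the exact rate differs, but the qualitative message is the same: many unprotected tests make a false rejection somewhere quite likely.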

Multiple comparisons

•  Protect a family of tests against Type I error at some joint significance level α
•  If all the null hypotheses are true, the probability of rejecting at least one is no more than α

Multiple comparisons of contrasts in a one-way design: Assume all means are equal in the population

•  Bonferroni
•  Tukey
•  Scheffé

Bonferroni

•  Based on Bonferroni's inequality

$$\Pr\left\{\bigcup_{j=1}^{k} A_j\right\} \leq \sum_{j=1}^{k} \Pr\{A_j\}$$

•  Applies to any collection of k tests
•  Assume all k null hypotheses are true
•  Event A_j is that null hypothesis j is rejected.
•  Do the tests as usual
•  Reject each H0 if p < 0.05/k
•  Or, adjust the p-values. Multiply them by k, and reject if pk < 0.05
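Both versions of the rule can be sketched in a few lines. The p-values below are made up, the function name is hypothetical, and capping the adjusted p-value at 1 is a common convention added here rather than something stated on the slide.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply the Bonferroni correction to a family of k p-values."""
    k = len(p_values)
    # Version 1: compare each raw p-value with alpha / k
    reject = [p < alpha / k for p in p_values]
    # Version 2: multiply each p-value by k (capped at 1) and compare with alpha
    adjusted = [min(1.0, k * p) for p in p_values]
    return adjusted, reject


# Made-up p-values from a family of four tests
p_values = [0.001, 0.012, 0.030, 0.200]
adjusted, reject = bonferroni(p_values)
for p, p_adj, r in zip(p_values, adjusted, reject):
    print(p, round(p_adj, 3), r)
```

The two versions lead to the same decisions; the adjusted p-values are just easier to report next to the usual 0.05 threshold.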

Bonferroni

•  Advantage: Flexibility
•  Advantage: Easy to do
•  Disadvantage: Must know what all the tests are before seeing the data
•  Disadvantage: A little conservative; the true joint significance level is less than α.

Tukey (HSD)

•  Based on the distribution of the largest mean minus the smallest.
•  Applies only to pairwise comparisons of means
•  If sample sizes are equal, it's most powerful, period
•  If sample sizes are not equal, it's a bit conservative

Scheffé

•  Find the usual critical value for the initial test. Multiply by p-1. This is the Scheffé critical value.
•  Family includes all contrasts: Infinitely many!
•  You don't need to specify them in advance
•  Based on the union-intersection principle; more details later, after F-tests.

Scheffé

•  Follow-up tests cannot be significant if the initial overall test is not. Not quite true of Bonferroni and Tukey.
•  If the initial test (of p-1 contrasts) is significant, there is a single contrast that is significant (not necessarily a pairwise comparison)
•  Adjusted p-value for a single contrast is the tail area of the initial test's null distribution, F(p-1, n-p), beyond F/(p-1), where F is the contrast's one-degree-of-freedom F statistic. This is the same as comparing F with the Scheffé critical value.
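The decision rule for a single contrast can be sketched as follows. The critical value of the initial test has to come from an F table or statistical software, so it is passed in as an input; the value 3.35 used below is approximately the α = 0.05 critical value for F with (2, 27) degrees of freedom and should be treated as an illustrative number. The function names are hypothetical.

```python
def scheffe_critical_value(f_crit_initial, p):
    """Scheffe critical value: the initial test's critical value times p - 1."""
    return (p - 1) * f_crit_initial


def scheffe_reject(f_contrast, f_crit_initial, p):
    """Compare a contrast's one-df F statistic with the Scheffe critical value."""
    return f_contrast > scheffe_critical_value(f_crit_initial, p)


# Illustration: p = 3 groups, n = 30, so the initial test has (2, 27) df.
f_crit_initial = 3.35  # approximate table value, supplied by the user
print(scheffe_critical_value(f_crit_initial, 3))
print(scheffe_reject(7.0, f_crit_initial, 3))
print(scheffe_reject(6.0, f_crit_initial, 3))
```

A contrast whose F statistic would be significant in isolation can easily fall short of the Scheffé critical value. That is the price of protecting infinitely many contrasts at once.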

Which method should you use?

•  If the sample sizes are nearly equal and you are only interested in pairwise comparisons, use Tukey because it's most powerful
•  If the sample sizes are not close to equal and you are only interested in pairwise comparisons, there is (amazingly) no harm in applying all three methods and picking the one that gives you the greatest number of significant results. (It's okay because this choice could be determined in advance based on number of treatments, α and the sample sizes.)

•  If you are interested in contrasts that go beyond pairwise comparisons and you can specify all of them before seeing the data, Bonferroni is almost always more powerful than Scheffé. (Tukey is out.)
•  If you want lots of special contrasts but you don't know exactly what they all are, Scheffé is the only honest way to go, unless you have a separate replication data set.

Dummy Variables

$$Y_i = \beta_0 + \beta_1 x_{i,1} + \epsilon_i$$

•  X = 1 means Drug, X = 0 means Placebo
•  Population mean is $E(Y \mid X = x) = \beta_0 + \beta_1 x$
•  For patients getting the drug, mean response is $E(Y \mid X = 1) = \beta_0 + \beta_1$
•  For patients getting the placebo, mean response is $E(Y \mid X = 0) = \beta_0$
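A quick numerical check of this interpretation: with a single 0/1 dummy variable, the least-squares intercept equals the placebo sample mean and the slope equals the difference between the two sample means. The data are made up and `simple_regression` is a hypothetical helper using the textbook closed-form estimates.

```python
def simple_regression(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up responses: x = 1 for Drug, x = 0 for Placebo
x = [1, 1, 1, 0, 0, 0, 0]
y = [12.0, 15.0, 13.5, 10.0, 9.0, 11.0, 10.0]
b0, b1 = simple_regression(x, y)

drug_mean = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
placebo_mean = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
print(b0, placebo_mean)              # intercept vs. placebo mean
print(b1, drug_mean - placebo_mean)  # slope vs. difference in means
```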

Regression test of H0: β1 = 0

•  Same as an independent t-test
•  Same as a one-way ANOVA with 2 categories
•  Same t, same F, same p-value.

Drug A, Drug B, Placebo

•  x1 = 1 if Drug A, zero otherwise
•  x2 = 1 if Drug B, zero otherwise
•  $E(Y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

Regression coefficients are contrasts with the category that has no indicator: the reference category.

$$H_0: \mu_1 = \mu_2 = \mu_3 \iff \beta_1 = \beta_2 = 0$$
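The equivalence follows from writing each group mean in terms of the regression coefficients. The working below assumes the model $E(Y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ with the indicator coding above, and assumes $\mu_1$, $\mu_2$, $\mu_3$ denote the means for Drug A, Drug B and Placebo respectively (the slide does not spell out the ordering).

```latex
\begin{align*}
\mu_1 &= \beta_0 + \beta_1 && \text{Drug A: } x_1 = 1,\ x_2 = 0 \\
\mu_2 &= \beta_0 + \beta_2 && \text{Drug B: } x_1 = 0,\ x_2 = 1 \\
\mu_3 &= \beta_0           && \text{Placebo: } x_1 = 0,\ x_2 = 0 \\
\beta_1 &= \mu_1 - \mu_3   && \text{Drug A versus the reference category} \\
\beta_2 &= \mu_2 - \mu_3   && \text{Drug B versus the reference category}
\end{align*}
```

So $\beta_1 = \beta_2 = 0$ says that both drug means equal the placebo mean, which is the same as all three means being equal.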

Indicator dummy variable coding with intercept

•  Need p-1 indicators to represent a categorical IV with p categories
•  If you use p dummy variables, trouble
•  Regression coefficients are contrasts with the category that has no indicator
•  Call this the reference category

Now add a quantitative variable (covariate)

•  x1 = Age
•  x2 = 1 if Drug A, zero otherwise
•  x3 = 1 if Drug B, zero otherwise
•  $E(Y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

What do you report?

•  x1 = Age
•  x2 = 1 if Drug A, zero otherwise
•  x3 = 1 if Drug B, zero otherwise
•  $E(Y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

Set all covariates to their sample mean values

•  And compute Y-hat for each group
•  Call it an “adjusted” mean, or something like “average university GPA adjusted for High School GPA.”
•  SAS calls it a least squares mean (lsmeans)
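As a sketch of the calculation, suppose a model with Age as the covariate and indicators for Drug A and Drug B has already been fitted. The coefficient values and the mean age below are invented for illustration, and `adjusted_means` is a hypothetical helper, not a SAS or library function.

```python
def adjusted_means(b0, b_cov, b_groups, cov_mean, reference="Placebo"):
    """Y-hat for each group with the covariate set to its sample mean.

    b_groups maps each non-reference group to its dummy-variable coefficient.
    """
    baseline = b0 + b_cov * cov_mean        # reference category: all dummies zero
    means = {name: baseline + b for name, b in b_groups.items()}
    means[reference] = baseline
    return means


# Hypothetical fitted coefficients: intercept, Age slope, Drug A and Drug B effects
means = adjusted_means(
    b0=20.0,
    b_cov=0.1,
    b_groups={"Drug A": 5.0, "Drug B": 3.0},
    cov_mean=40.0,                          # hypothetical sample mean of Age
)
for group, value in means.items():
    print(group, value)
```

Because the model has no interaction terms, the differences between adjusted means are the same at every value of the covariate; evaluating at the sample mean just gives numbers on a sensible scale.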


Test whether the average response to Drug A and Drug B is different from response to the placebo, controlling for age. What is the null hypothesis?


Show your work
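One way the algebra might go, assuming the indicator-coded model with x1 = Age, x2 for Drug A and x3 for Drug B, that is, $E(Y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$. This is a reconstruction under that assumption, since the original working is not preserved here.

```latex
\begin{align*}
\text{Drug A:}\quad  & E(Y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 \\
\text{Drug B:}\quad  & E(Y \mid x) = \beta_0 + \beta_1 x_1 + \beta_3 \\
\text{Placebo:}\quad & E(Y \mid x) = \beta_0 + \beta_1 x_1 \\[4pt]
H_0:\quad & \tfrac{1}{2}\left[(\beta_0 + \beta_1 x_1 + \beta_2) + (\beta_0 + \beta_1 x_1 + \beta_3)\right]
            = \beta_0 + \beta_1 x_1 \\
\iff\quad & \beta_0 + \beta_1 x_1 + \tfrac{1}{2}(\beta_2 + \beta_3) = \beta_0 + \beta_1 x_1 \\
\iff\quad & \beta_2 + \beta_3 = 0
\end{align*}
```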

We want to avoid this kind of thing. It can get complicated.

A common error

•  Categorical IV with p categories
•  p dummy variables (rather than p-1)
•  And an intercept
•  There are p population means represented by p+1 regression coefficients - not unique
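The non-uniqueness can be seen in a small sketch. With one indicator per category, the indicator columns add up to the intercept column, and two different coefficient vectors reproduce exactly the same group means. The labels and coefficient values are made up, and the helper names are hypothetical.

```python
def indicator_columns(labels, categories):
    """One 0/1 indicator per category (p columns) for each observation."""
    return [[1 if label == cat else 0 for cat in categories] for label in labels]


def group_means(coefs):
    """Group means implied by (intercept, one coefficient per category)."""
    b0, *b = coefs
    return [b0 + bj for bj in b]


labels = ["A", "A", "B", "P", "P", "B"]
categories = ["A", "B", "P"]
rows = indicator_columns(labels, categories)

# Every row of indicators sums to 1, which is exactly the intercept column
row_sums = [sum(r) for r in rows]
print(row_sums)

# Two different sets of p + 1 coefficients, same p group means
means_1 = group_means((0.0, 13.0, 11.0, 10.0))
means_2 = group_means((10.0, 3.0, 1.0, 0.0))
print(means_1, means_2)
```

Since the data can only inform the p group means, least squares has no way to choose between such coefficient vectors.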


But suppose you leave off the intercept

•  Now there are p regression coefficients and p population means

•  The correspondence is unique, and the model can be handy -- less algebra

•  Called cell means coding


Cell means coding: p indicators and no intercept


Add a covariate: x4

Effect coding

•  p-1 dummy variables for p categories
•  Include an intercept
•  Last category gets -1 instead of zero
•  What do the regression coefficients mean?


Meaning of the regression coefficients

With effect coding

•  Intercept is the Grand Mean
•  Regression coefficients are deviations of group means from the grand mean
•  Equal population means is equivalent to zero coefficients for all the dummy variables
•  Last category is not a reference category
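These interpretations can be checked numerically. The sketch below fits an effect-coded regression by solving the normal equations with a small hand-written solver (standard library only) and compares the estimates with the group sample means. The data and all function names are made up for illustration. Note that the intercept estimates the average of the group means, which differs from the overall sample mean when group sizes are unequal.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


def effect_code(label, categories):
    """Intercept plus p-1 columns; the last category gets -1 in every column."""
    if label == categories[-1]:
        return [1] + [-1] * (len(categories) - 1)
    return [1] + [1 if label == c else 0 for c in categories[:-1]]


# Made-up data with unequal group sizes
data = {"A": [12.0, 14.0, 13.0], "B": [11.0, 11.0], "P": [9.0, 10.0, 11.0, 10.0]}
categories = ["A", "B", "P"]
x_rows = [effect_code(g, categories) for g in categories for _ in data[g]]
y = [v for g in categories for v in data[g]]

b0, b_a, b_b = ols(x_rows, y)
group_means = {g: sum(v) / len(v) for g, v in data.items()}
grand_mean = sum(group_means.values()) / len(group_means)

print(b0, grand_mean)                               # intercept vs. mean of group means
print(b_a, group_means["A"] - grand_mean)           # deviation for A
print(b_b, group_means["B"] - grand_mean)           # deviation for B
print(-(b_a + b_b), group_means["P"] - grand_mean)  # implied deviation for P
```

The last category has no coefficient of its own, but its deviation is recovered as minus the sum of the others, which is why it is not a reference category.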

Sometimes speak of the “main effect” of a categorical variable

•  More than one categorical IV (factor)
•  Marginal means are average group means, averaging across the other factors
•  This is loose speech: There are actually p main effects for a variable, not one
•  Blends the “effect” of an experimental variable with the technical statistical meaning of effect.
•  It’s harmless


Add a covariate: Age = x1

Regression coefficients are deviations from the average conditional population mean (conditional on x1).

So if the regression coefficients for all the dummy variables equal zero, the categorical IV is unrelated to the DV, controlling for the covariates.


We will see later that effect coding is very useful when there is more than one categorical independent variable and we are interested in interactions --- ways in which the relationship of an independent variable with the dependent variable depends on the value of another independent variable.

What dummy variable coding scheme should you use?

•  Whichever is most convenient
•  They are all equivalent, if done correctly
•  Same test statistics, same conclusions

Interactions

•  Interaction between independent variables means “It depends.”
•  Relationship between one IV and the DV depends on the value of another IV.
•  Can have
    – Quantitative by quantitative
    – Quantitative by categorical
    – Categorical by categorical

Quantitative by Quantitative

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon$$

$$E(Y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$

For fixed x2,

$$E(Y \mid x) = (\beta_0 + \beta_2 x_2) + (\beta_1 + \beta_3 x_2) x_1$$

Both slope and intercept depend on the value of x2.

And for fixed x1, the slope and intercept relating x2 to E(Y) depend on the value of x1.
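A few lines of code make the "it depends" concrete. For a hypothetical set of coefficients, the sketch below prints the intercept and slope of the line relating x1 to E(Y) at several fixed values of x2; both the coefficient values and the function name are invented for illustration.

```python
def conditional_line(beta, x2):
    """Intercept and slope of E(Y | x) as a function of x1, holding x2 fixed."""
    b0, b1, b2, b3 = beta
    intercept = b0 + b2 * x2
    slope = b1 + b3 * x2
    return intercept, slope


# Hypothetical coefficients (beta0, beta1, beta2, beta3)
beta = (1.0, 2.0, 0.5, -0.25)
for x2 in (0.0, 4.0, 8.0):
    print(x2, conditional_line(beta, x2))
```

With these made-up numbers the slope for x1 shrinks as x2 grows and reaches zero at x2 = 8; without the product term (β3 = 0) the slope would be the same at every x2.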

Quantitative by Categorical

•  Interaction means slopes are not parallel
•  Form a product of quantitative variable by each dummy variable for the categorical variable
•  For example, three treatments and one covariate: x1 is the covariate and x2, x3 are dummy variables

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \epsilon$$

General principle

•  Interaction between A and B means
    – Relationship of A to Y depends on value of B
    – Relationship of B to Y depends on value of A
•  The two statements are formally equivalent

$$E(Y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3$$

Group   x2   x3   E(Y|x)
1       1    0    $(\beta_0 + \beta_2) + (\beta_1 + \beta_4) x_1$
2       0    1    $(\beta_0 + \beta_3) + (\beta_1 + \beta_5) x_1$
3       0    0    $\beta_0 + \beta_1 x_1$

Group   x2   x3   E(Y|x)
1       1    0    $(\beta_0 + \beta_2) + (\beta_1 + \beta_4) x_1$
2       0    1    $(\beta_0 + \beta_3) + (\beta_1 + \beta_5) x_1$
3       0    0    $\beta_0 + \beta_1 x_1$

What null hypothesis would you test for

•  Parallel slopes
•  Compare slopes for group one vs three
•  Compare slopes for group one vs two
•  Equal regressions
•  Interaction between group and x1
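One set of answers, read off from the table of group-specific intercepts and slopes. These are offered as a worked sketch rather than as the official solutions:

```latex
\begin{align*}
\text{Parallel slopes:}\quad & H_0: \beta_4 = \beta_5 = 0 \\
\text{Slopes, group one vs three:}\quad & H_0: \beta_4 = 0
   && \text{since } (\beta_1 + \beta_4) - \beta_1 = \beta_4 \\
\text{Slopes, group one vs two:}\quad & H_0: \beta_4 = \beta_5
   && \text{since } (\beta_1 + \beta_4) - (\beta_1 + \beta_5) = \beta_4 - \beta_5 \\
\text{Equal regressions:}\quad & H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \\
\text{Interaction between group and } x_1\text{:}\quad & H_0: \beta_4 = \beta_5 = 0
\end{align*}
```

The first and last hypotheses coincide: saying that the slopes are parallel is the same as saying there is no interaction between group and x1.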

What to do if H0: β4 = β5 = 0 is rejected

•  How do you test Group “controlling” for x1?
•  A good choice is to set x1 to its sample mean, and compare treatments at that point.
•  How about setting x1 to sample mean of the group (3 different values)?
•  With random assignment to Group, all three means just estimate E(X1), and the mean of all the x1 values is a better estimate.

Categorical by Categorical

•  Soon
•  But first, an example