Rerandomization,to,Improve, CovariateBalancein, Randomized...

Preview:

Citation preview

Rerandomization  to  Improve  Covariate  Balance  in  

Randomized  Experiments  Kari  Lock  

Harvard  Statistics  Advisor:  Don  Rubin  

4/28/11  

•   Randomized  experiments  are  the  “gold  standard”  for  estimating  causal  effects  

•   WHY?        

1.  They  yield  unbiased  estimates    2.  They  eliminate  confounding  factors  

The  Gold  Standard  

R  R   R   R  

R   R   R   R  

R   R   R   R  

R   R   R   R  

R   R   R   R  

R   R   R   R  

R   R   R   R  

R   R   R   R  

R   R   R  

R   R   R   R   R  

R   R   R   R   R  

R   R   R   R   R  

R   R   R  

R   R   R   R   R  

R   R   R   R   R  

R   R   R   R   R  

 Randomize  

R  R   R   R   R   R   R   R  

12  Females,  8  Males   8  Females,  12  Males  5  Females,  15  Males   15  Females,  5  Males  

•   Suppose  you  get  a  “bad”  randomization  •   What  would  you  do???  •   Can  you  rerandomize???    When?    How?    

Covariate  Balance  -­‐  Gender  

Rubin:  What  if,  in  a  randomized  experiment,  the  chosen  randomized  allocation  exhibited  substantial  imbalance  on  a  prognostically  important  baseline  covariate?  

Cochran:  Why  didn't  you  block  on  that  variable?  

Rubin:  Well,  there  were  many  baseline  covariates,  and  the  correct  blocking  wasn't  obvious;  and  I  was  lazy  at  that  time.  Cochran:  This  is  a  question  that  I  once  asked  Fisher,  and  his  reply  was  unequivocal:  

Fisher  (recreated  via  Cochran):  Of  course,  if  the  experiment  had  not  been  started,  I  would  rerandomize.  

The  Origin  of  the  Idea  

•   The  more  covariates,  the  more  likely  at  least  one  covariate  will  be  imbalanced  across  treatment  groups  

•   With  just  10  independent  covariates,  the  probability  of  a  signiWicant  difference  (at  level  α  =  .05)  for  at  least  one  covariate  is  1  –  (1-­‐.05)10  =  40%!  

•   Covariate  imbalance  is  not  limited  to  rare  “unlucky”  randomizations    

Covariate  Imbalance  

•   We  all  know  that  randomized  experiments  are  the  “gold  standard”  for  estimating  causal  effects  

•   WHY?        

1.  They  yield  unbiased  estimates    2.  They  eliminate  confounding  factors  

…  on  average!        For  any  particular  experiment,  covariate  imbalance  is  possible  (and  likely!),  and  conditional  bias  exists  

The  Gold  Standard  

Randomize  subjects  to  treatment  and  control  

Collect  covariate  data  Specify  criteria  determining  when  a  randomization  is  unacceptable;  based  

on  covariate  balance  

(Re)randomize  subjects  to  treatment  and  control  

Check  covariate  balance  

1)  

2)  

Conduct  experiment  

unacceptable   acceptable  

Analyze  results    (with  a  randomization  test)  

3)  

4)  

RERANDOMIZATION  

Theorem:  If  the  treatment  groups  are  exchangeable  for  each  randomization  and  for  the  rerandomization  criteria,  then  rerandomization  yields  an  unbiased  estimate  of  the  average  treatment  effect.    •   For  exchangeability:  •   Equal  sized  treatment  groups  •   Rerandomization  criteria  that  is  objective  and  does  not  favor  a  speciWic  group    

Unbiased  

Unbiased  

Options  for  unequal  sized  treatment  groups:  

•   Rerandomize  into  multiple  equally  sized  groups,  then  combine  groups  after  the  rerandomization  •   Discard  the  extra  units  to  form  equal  sized  groups  •   Rerandomize  to  lower  MSE,  with  perhaps  slight  bias  •   Do  not  rerandomize  

•   Randomization  Test:  •   Simulate  randomizations  to  see  what  the  statistic  would  look  like  just  by  random  chance,  if  the  null  hypothesis  were  true    

•   Rerandomization  Test:  •   A  randomization  test,  but  for  each  simulated  randomization,  follow  the  same  rerandomization  criteria  used  in  the  experiment  

•   As  long  as  the  simulated  randomizations  are  done  using  the  same  randomization  scheme  used  in  the  experiment,  this  will  give  accurate  p-­‐values  

 

Rerandomization  Test  

•   t-­‐test:  •   Too  conservative  •   SigniWicant  results  can  be  trusted    

•   Regression:  •   Regression  including  the  covariates  that  were  balanced  on  using  rerandomization  more  accurately  estimates  the  true  precision  of  the  estimated  treatment  effect  •   Assumptions  are  less  dangerous  after  rerandomization  because  groups  should  be  well  balanced  

   

Alternatives  for  Analysis  

We  use  Mahalanobis  Distance,  M,  to  represent  multivariate  distance  between  group  means:    

2Under adequate sample sizes and pure randomization: ~: Number of covariates to be balanced

kMk

χ

Choose  a  and  rerandomize  when  M  >  a  

( ) ( ) ( )1

'cov T CT C T CM−

− −= −X X X X X X

Criteria  for  Acceptable  Balance  

( ) ( ) ( )11

1 1 'covT C T CT Cn n

−−⎛ ⎞

= + − −⎜ ⎟⎝ ⎠

X X X X X

Ma

MMa

Ma

Distribution  of  M  

RERANDOMIZE  

Acceptable  Randomizations  

pa  =  Probability  of  accepting  a  randomization  

•   Choosing  the  acceptance  probability  is  a  tradeoff  between  a  desire  for  better  balance  and  computational  time  

•   The  number  of  randomizations  needed  to  get  one  successful  one  is  Geometric(pa),  so  the  expected  number  needed  is  1/pa    

•   Computational  time  must  be  considered  in  advance,  since  many  simulated  acceptable  randomizations  are  needed  to  conduct  the  randomization  test  

Choosing  pa  

•   The  obvious  choice  may  be  to  set  limits  on  the  acceptable  balance  for  each  covariate  individually  •   This  destroys  the  joint  distribution  of  the  covariates    

-3 -2 -1 0 1 2 3

-3-2

-10

12

3 D2

D1

Another  Measure  of  Balance  

•   Since  M  follows  a  known  distribution,  easy  to  specify  the  proportion  of  accepted  randomizations  •   M  is  afWinely  invariant  (unaffected  by  afWine  transformations  of  the  covariates)  •   Correlations  between  covariates  are  maintained  •   The  balance  improvement  for  each  covariate  is  the  same  (and  known)…  •   …  and  is  the  same  for  any  linear  combination  of  the  covariates  

Rerandomization  Based  on  M  

Theorem:  If  nT  =  nC,  the  covariate  means  are  normally  distributed,  and  rerandomization  occurs  when  M  >  a,  then     ( )|T CE M a− ≤ =X X 0

( ) ( )cov cov| .T C T CaM a v− ≤ = −X X X X

( )( )

22

2

1,2 2 2

,2 2

ka

k

k aP a

vk ak P a

γ χ

χγ

+

⎛ ⎞+⎜ ⎟ ≤⎝ ⎠≡ × =⎛ ⎞ ≤⎜ ⎟⎝ ⎠

Covariates  After  Rerandomization  

1

0where is the incomplete gamma function: ( , ) .

c b yb c y e dyγ γ − −≡ ∫

Percent  Reduction  in  Variance  

( )( )

, ,

, ,

rerandomizativar

var

onj T j C

j T j C

X X

X X

( ) ( )( )

, , , ,

, ,

rerandomizatiovar var n100

varj T j C j T j C

j T j C

X X X X

X X

⎛ ⎞−⎜ ⎟⎜⎝

− ⎟⎠

− ∣

•   For  each  covariate  xj,  the  ratio  of  variances  is:  

and  the  percent  reduction  in  variance  is  

av=

100(1 )av= −

Shadish  Data  

n  =  445  undergraduate  students  

Randomized  Experiment  235  

Observational  Study  210  

Vocab  Training  116  

Math  Training  119  

Vocab  Training  131  

Math  Training  79  

randomization  

randomization   students  choose  

Shadish,  M.  R.,  Clark,  M.  H.,  Steiner,  P.  M.  (2008).    Can  nonrandomized  experiments  yield  accurate  answers?    A  randomized  experiment  comparing  random  and  nonrandom  assignments.    JASA.    103(484):  1334-­‐1344.  

Shadish  Data  

k    =  10  covariates  

Standardized Difference in Covariate Means-3 -2 -1 0 1 2 3

maleage

collgpaaactcomp

preflitlikelit

likemathnumbmath

mathprevocabpre

Shadish  Data  

To  generate  1000  rerandomizations:    

 pa  =  0.1:  11.6  seconds    pa  =  0.01:  1.8  minutes    pa  =  0.001:  18.3  minutes    pa  =  0.0001:    3.4  hours  

 

 

Shadish  Data  

To  generate  1000  rerandomizations:    

 pa  =  0.1:  11.6  seconds    pa  =  0.01:  1.8  minutes    pa  =  0.001:  18.3  minutes    pa  =  0.0001:    3.4  hours  

 

 

Shadish  Data  

( )210 0.001 1.48P a aχ ≤ ⇒ ==

Rerandomize  when  M  >  1.48  

Ratio of Variances:1.48

1.4

101, 1,2 22 2 2 2 0.12

1010, ,2 2

82 2

a

k a

k akv

γ γ

γ γ

⎛ ⎞ ⎛ ⎞+ +⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠× = × =⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

=

k    =  10  covariates  pa  =  0.001      

     

Percent  Reduction  in  Variance:  100(1  –  va)  =  88%  

Shadish  Data  Vocab Pre-Test

XT-XC

-2 -1 0 1 2

Pure RandomizationRerandomization

Shadish  Data  

Standardized Difference in Covariate Means-4 -2 0 2 4

act+10gpa

male

age

gpa

act

preflit

likelit

likemath

numbmath

mathpre

vocabpre

Actual RandomizationPure RandomizationRerandomization PRIV

88%

88%

88%

88%

88%

88%

88%

88%

88%

88%

88%

Theorem:  If  nT  =  nC  ,  the  covariate  and  outcome  means  are  normally  distributed,  the  treatment  effect  is  additive,  and  rerandomization  occurs  when  M  >  a,  then        

( )|T CE Y aY M τ− ≤ =and  the  percent  reduction  in  variance  for  the  estimated  treatment  effect  is  

( ) 200 1 ,1 av R−where  R2  is  the  squared  canonical  correlation.  

Estimated  Treatment  Effect  After  Rerandomization  

va  

Outcome  Variance  Reduction  

Shadish  Data  

n  =  445  undergraduate  students  

Randomized  Experiment  235  

Observational  Study  210  

Vocab  Training  116  

Math  Training  119  

Vocab  Training  131  

Math  Training  79  

randomization  

randomization   students  choose  

Outcomes:    score  on  a  vocabulary  test,  score  on  a  mathematics  test    

Shadish  Data  

10, .001 Covariate Percent Reduction in Variance = 88%

ak p=⇒

=

Vocabulary:    R2  =  0.11    ⇒  Percent  Reduction  in  Variance  =  88(0.11)  =  9.68%      

Equivalent  to  increasing  the  sample  size  by    

1/(1-­‐0.28)  =  1.39  

2Percent Reduction in Varia ˆnce 88 for is T CY RYτ = −

Mathematics:    R2  =  0.32    ⇒  Percent  Reduction  in  Variance  =  88(0.32)  =  28.16%      

Shadish  Data  

Vocabulary

Estimated Treatment Effect-3 -2 -1 0 1 2 3

Mathematics

Estimated Treatment Effect-2 -1 0 1 2

•   Since  M  follows  a  known  distribution,  easy  to  specify  the  proportion  of  accepted  randomizations  •   M  is  afWinely  invariant  (unaffected  by  afWine  transformations  of  the  covariates)  •   Correlations  between  covariates  are  maintained  •   The  balance  improvement  for  each  covariate  is  the  same…  •   …  and  is  the  same  for  any  linear  combination  of  the  covariates  

Rerandomization  Based  on  M  

Is  that  good???  

•   Since  M  follows  a  known  distribution,  easy  to  specify  the  proportion  of  accepted  randomizations  •   M  is  afWinely  invariant  (unaffected  by  afWine  transformations  of  the  covariates)  •   Correlations  between  covariates  are  maintained  •   The  balance  improvement  for  each  covariate  is  the  same…  •   …  and  is  the  same  for  any  linear  combination  of  the  covariates  

•   In  practice,  covariates  are  often  of  varying  importance    •   We  can  place  covariates  into  tiers,  with  the  top  tier  including  the  most  important  covariates,  the  second  tier  less  important,  etc.  

•   Criteria  for  acceptable  balance  can  be  less  stringent  for  each  successive  tier    

     

Tiers  of  Covariates  

1)   Check  balance  for  covariates  in  tier  1    ⇒  calculate  M1  using  the  covariates  in  tier  1,    

rerandomize  if  M1  >  a1    2)  For  each  successive  tier,  regress  each  covariate  

on  all  covariates  in  previous  tiers,  and                check  balance  for  the  residuals  of  that  tier              ⇒  calculate  Mt  using  the  residuals  of  tier  t,  

rerandomize  if  Mt  >  at      at  denotes  the  acceptance  threshold  for  tier  t  

Tiers  of  Covariates  

Theorem:  If  the  nT  =  nC  and  the  covariate  means  are  normally  distributed,  and  rerandomization  occurs  if  any  Mt  >  at,  then  the  ratio  of  variances  for  covariate  xj  is          

     

1 if in tier 1,av

1 2

2 21 1(1 ) if in tier 2,a av R Rv+ −

Tiers  of  Covariates  

1

12 2 2 2

1 1 12

( ) (1 ), if in tier 2,i t

t

a a i i a ti

v R v R R v R t−

− −=

+ − + − >∑where  Rt2  denotes  the  squared  canonical  correlation  between  xj  and  all  covariates  in  tiers  1  through  t.  

Empirical Percent Reduction in Variance

0 20 40 60 80 100

actcomp*collgpaalikelit*collgpaalikelit*actcomp

likemath*collgpaalikemath*actcomp

likemath*likelitnumbmath*collgpaanumbmath*actcomp

numbmath*likelitnumbmath*likemath

mathpre*collgpaamathpre*actcomp

mathpre*likelitmathpre*likemath

mathpre*numbmathvocabpre*collgpaavocabpre*actcomp

vocabpre*likelitvocabpre*likemath

vocabpre*numbmathvocabpre*mathpre

mathpre2vocabpre2

TIER 4:aframcauc

marriedmaleage

daddegrmomdegr

parentsIncomepintellpemot

pconscpagreepextra

beckmars

majormicredit

hsgpaarpreflit

TIER 3:collgpaaactcomp

likelitlikemath

numbmathTIER 2:

mathprevocabpre

TIER 1: 94.8278.2440.7523.24

CovariatesResiduals

Standardized Difference in Covariate Means

-4 -2 0 2 4

actcomp*collgpaalikelit*collgpaalikelit*actcomp

likemath*collgpaalikemath*actcomp

likemath*likelitnumbmath*collgpaanumbmath*actcomp

numbmath*likelitnumbmath*likemath

mathpre*collgpaamathpre*actcomp

mathpre*likelitmathpre*likemath

mathpre*numbmathvocabpre*collgpaavocabpre*actcomp

vocabpre*likelitvocabpre*likemath

vocabpre*numbmathvocabpre*mathpre

mathpre2vocabpre2

TIER 4:aframcauc

marriedmaleage

daddegrmomdegr

parentsIncomepintellpemot

pconscpagreepextra

beckmars

majormicredit

hsgpaarpreflit

TIER 3:collgpaaactcomp

likelitlikemath

numbmathTIER 2:

mathprevocabpre

TIER 1:PRIV95%94%

79%80%79%85%79%67%52%44%45%46%38%47%43%43%41%53%31%41%44%43%42%39%51%55%

94%89%92%79%78%84%90%85%81%82%84%91%88%74%76%79%78%72%78%77%80%78%81%

Empirical  vs  Theoretical  Values  

0 20 40 60 80 100

020

4060

8010

0

Empirical PRIV

Theo

retic

al P

RIV

Tier 1Tier 2Tier 3Tier 4

•     Rerandomization  is  easily  extended  to  multiple  treatment  groups  

•   To  measure  balance,  one  of  the  MANOVA  test  statistics  (such  as  Wilks’  Λ)  can  be  used,  measuring  the  ratio  of  within  group  variability  to  between  group  variability  for  multiple  covariates  

•   This  is  equivalent  to  rerandomizing  based  on  Mahalanobis  distance  if  used  on  only  two  groups    

Multiple  Treatment  Groups  

•     This  is  not  meant  to  replace  blocking,  and  blocking  and  rerandomization  can  be  used  together  

•   Blocking  is  great  for  balancing  a  few  very  important  covariates,  but  is  not  easily  used  for  many  covariates  

•     “Block  what  you  can,  rerandomize  what  you  can’t”    

 

Blocking  

•   The  theoretical  results  given  depend  on  M  ~  χk2      

•   If  the  covariate  means  are  not  normally  distributed,  that  is  Wine!    The  theoretical  results  given  are  used  nowhere  in  analysis.    Rerandomization  does  NOT  depend  on  asymptotics.  

•   The  threshold  a  corresponding  to  pa,  and  the  percent  reduction  in  variance  for  the  covariates  can  be  estimated  via  simulation  

Small  Samples  or  Non-­‐Normality  

•   Rerandomization  improves  covariate  balance  between  the  treatment  groups,  giving  the  researcher  more  faith  that  an  observed  effect  is  really  due  to  the  treatment  

•   If  the  covariates  are  correlated  with  the  outcome,  rerandomization  also  increases  precision  in  estimating  the  treatment  effect,  giving  the  researcher  more  power  to  detect  a  signiWicant  result  

Conclusion  

Thank  you!!!  

Recommended