146
Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department of Political Science and Center for Statistics and the Social Sciences University of Washington, Seattle Chris Adolph (UW) Tufte without tears 1 / 106

Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

  • Upload
    others

  • View
    17

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Tufte Without Tears

Flexible tools for visual explorationand presentation of statistical models

Christopher Adolph

Department of Political Science

and

Center for Statistics and the Social Sciences

University of Washington, Seattle

Chris Adolph (UW) Tufte without tears 1 / 106

Page 2: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Edward Tufte & Scientific Visuals

In several beautifully illustrated books Edward Tufte gives the following advice:

1 Assume the reader’s interest and intelligence

2 Maximize information, minimize ink & space: the data-ink ratio

3 Show the underlying data and facilitate comparisons

4 Use small multiples: repetitions of a basic design

Tufte’s recommended medical chart illustrates many of these ideas:

Chris Adolph (UW) Tufte without tears 2 / 106

Page 3: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Edward Tufte & Scientific Visuals

In several beautifully illustrated books Edward Tufte gives the following advice:

1 Assume the reader’s interest and intelligence

2 Maximize information, minimize ink & space: the data-ink ratio

3 Show the underlying data and facilitate comparisons

4 Use small multiples: repetitions of a basic design

Tufte’s recommended medical chart illustrates many of these ideas:

Chris Adolph (UW) Tufte without tears 2 / 106

Page 4: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Edward Tufte & Scientific Visuals

In several beautifully illustrated books Edward Tufte gives the following advice:

1 Assume the reader’s interest and intelligence

2 Maximize information, minimize ink & space: the data-ink ratio

3 Show the underlying data and facilitate comparisons

4 Use small multiples: repetitions of a basic design

Tufte’s recommended medical chart illustrates many of these ideas:

Chris Adolph (UW) Tufte without tears 2 / 106

Page 5: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Edward Tufte & Scientific Visuals

In several beautifully illustrated books Edward Tufte gives the following advice:

1 Assume the reader’s interest and intelligence

2 Maximize information, minimize ink & space: the data-ink ratio

3 Show the underlying data and facilitate comparisons

4 Use small multiples: repetitions of a basic design

Tufte’s recommended medical chart illustrates many of these ideas:

Chris Adolph (UW) Tufte without tears 2 / 106

Page 6: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

This chart is annotated for pedagogical purposes

Lots of information; little distracting scaffolding

A model that can be repeated once learned. . .Chris Adolph (UW) Tufte without tears 3 / 106

Page 7: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

A complete layout using small multiples to convey lots of info

Elegant, information-rich. . . and hard to make

Chris Adolph (UW) Tufte without tears 4 / 106

Page 8: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

What most discussions of statistical graphics leave out

Tufte’s books have had a huge impact on information visualization

However, they have two important limits:

Modeling Most examples are either exploratory or very simple models;

Social scientists want cutting edge applications

Tools Need to translate aesthetic guidelines into software

Social scientists are unlikely to do this on their own—and shouldn’t have to!

Chris Adolph (UW) Tufte without tears 5 / 106

Page 9: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Key problem Ready-to-use techniques to visually present model results:

for many variables

for many robustness checks

showing uncertainty

without accidental extrapolation

for an audience without deep statistical knowledge

Not covered here The theory behind effective visual display of data

Visual displays for data (not model) exploration

For these and other topics, and a reading list, see my course atfaculty.washington.edu/cadolph/vis

Lots of examples. . . Too many if we need to discuss methods in detail

Chris Adolph (UW) Tufte without tears 6 / 106

Page 10: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Key problem Ready-to-use techniques to visually present model results:

for many variables

for many robustness checks

showing uncertainty

without accidental extrapolation

for an audience without deep statistical knowledge

Not covered here The theory behind effective visual display of data

Visual displays for data (not model) exploration

For these and other topics, and a reading list, see my course atfaculty.washington.edu/cadolph/vis

Lots of examples. . . Too many if we need to discuss methods in detail

Chris Adolph (UW) Tufte without tears 6 / 106

Page 11: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Key problem Ready-to-use techniques to visually present model results:

for many variables

for many robustness checks

showing uncertainty

without accidental extrapolation

for an audience without deep statistical knowledge

Not covered here The theory behind effective visual display of data

Visual displays for data (not model) exploration

For these and other topics, and a reading list, see my course atfaculty.washington.edu/cadolph/vis

Lots of examples. . . Too many if we need to discuss methods in detail

Chris Adolph (UW) Tufte without tears 6 / 106

Page 12: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Key problem Ready-to-use techniques to visually present model results:

for many variables

for many robustness checks

showing uncertainty

without accidental extrapolation

for an audience without deep statistical knowledge

Not covered here The theory behind effective visual display of data

Visual displays for data (not model) exploration

For these and other topics, and a reading list, see my course atfaculty.washington.edu/cadolph/vis

Lots of examples. . . Too many if we need to discuss methods in detail

Chris Adolph (UW) Tufte without tears 6 / 106

Page 13: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Key problem Ready-to-use techniques to visually present model results:

for many variables

for many robustness checks

showing uncertainty

without accidental extrapolation

for an audience without deep statistical knowledge

Not covered here The theory behind effective visual display of data

Visual displays for data (not model) exploration

For these and other topics, and a reading list, see my course atfaculty.washington.edu/cadolph/vis

Lots of examples. . . Too many if we need to discuss methods in detail

Chris Adolph (UW) Tufte without tears 6 / 106

Page 14: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Who votes in American elections?Source: King, Tomz, and WittenbergMethod: Logistic regression

What do alligators eat?Source: AgrestiMethod: Multinomial logit

How do Chinese leaders gain power?Source: Shih, Adolph, and LiuMethod: Bayesian model of partially observed ranks

When do governments choose liberal or conservative central bankers?Source: AdolphMethod: Zero-inflated compositional data model

What explains the tier of European governments controlling health policies?Source: Adolph, Greer, and FonsecaMethod: Multilevel multinomial logit

Chris Adolph (UW) Tufte without tears 7 / 106

Page 15: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Presenting Estimated Models in Social Science

Most empirical work in social science is regression model-driven,with a focus on conditional expectation

Our regression models are

full of covariates

often non-linear

usually involve interactions and transformations

If there is anything we need to visualize well, it is our models

Yet we often just print off tables of parameter estimates

Limits readers’ and analysts’ understanding of the results

Chris Adolph (UW) Tufte without tears 8 / 106

Page 16: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Coefficients are not enough

Some limits of typical presentations of statistical results:

Everything written in terms of arcane intermediate quantites(for most people, this includes logit coefficients)

Little effort to transform results to the scale of the quantities of interest→ really want the conditional expectation, E(y |x)

Little effort to make informative statements about estimation uncertainty→ really want to know how uncertain is E(y |x)

Little visualization at all, or graphs with low data-ink ratios

Chris Adolph (UW) Tufte without tears 9 / 106

Page 17: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Voting Example (Logit Model)

We will explore a simple dataset using a simple model of voting

People either vote (Votei = 1), or they don’t (Votei = 0)

Many factors could influence turn-out; we focus on age and education

Data from National Election Survey in 2000. “Did you vote in 2000 election?”

vote00 age hsdeg coldeg[1,] 1 49 1 0[2,] 0 35 1 0[3,] 1 57 1 0[4,] 1 63 1 0[5,] 1 40 1 0[6,] 1 77 0 0[7,] 0 43 1 0[8,] 1 47 1 1[9,] 1 26 1 1[10,] 1 48 1 0...

Chris Adolph (UW) Tufte without tears 10 / 106

Page 18: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Logit of Decision to Vote, 2000 Presidential NES

est. s.e. p-value

Age 0.074 0.017 0.000Age2 −0.0004 0.0002 0.009High School Grad 1.168 0.178 0.000College Grad 1.085 0.131 0.000Constant −3.05 0.418 0.000

Age enters as a quadratic to allow the probability of voting to first rise andeventually fall over the life course

Results look sensible, but what do they mean?

Which has the bigger effect, age or education?

What is the probability a specific person will vote?

Chris Adolph (UW) Tufte without tears 11 / 106

Page 19: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

An alternative to printing eye-glazing tables

1 Run your model as normal. Treat the output as an intermediate step.

2 Translate your model results back into the scale of the response variable

I Modeling war? Show the change in probability of war associated with X

I Modeling counts of crimes committed? Show how those counts vary with X

I Unemployment rate time series? Show how a change in X shifts theunemployment rate over the following t years

3 Calculate or simulate the uncertainty in these final quantities of interest

4 Present visually as many scenarios calculated from the model as needed

Chris Adolph (UW) Tufte without tears 12 / 106

Page 20: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

An alternative to printing eye-glazing tables

1 Run your model as normal. Treat the output as an intermediate step.

2 Translate your model results back into the scale of the response variable

I Modeling war? Show the change in probability of war associated with X

I Modeling counts of crimes committed? Show how those counts vary with X

I Unemployment rate time series? Show how a change in X shifts theunemployment rate over the following t years

3 Calculate or simulate the uncertainty in these final quantities of interest

4 Present visually as many scenarios calculated from the model as needed

Chris Adolph (UW) Tufte without tears 12 / 106

Page 21: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

An alternative to printing eye-glazing tables

1 Run your model as normal. Treat the output as an intermediate step.

2 Translate your model results back into the scale of the response variable

I Modeling war? Show the change in probability of war associated with X

I Modeling counts of crimes committed? Show how those counts vary with X

I Unemployment rate time series? Show how a change in X shifts theunemployment rate over the following t years

3 Calculate or simulate the uncertainty in these final quantities of interest

4 Present visually as many scenarios calculated from the model as needed

Chris Adolph (UW) Tufte without tears 12 / 106

Page 22: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

An alternative to printing eye-glazing tables

1 Run your model as normal. Treat the output as an intermediate step.

2 Translate your model results back into the scale of the response variable

I Modeling war? Show the change in probability of war associated with X

I Modeling counts of crimes committed? Show how those counts vary with X

I Unemployment rate time series? Show how a change in X shifts theunemployment rate over the following t years

3 Calculate or simulate the uncertainty in these final quantities of interest

4 Present visually as many scenarios calculated from the model as needed

Chris Adolph (UW) Tufte without tears 12 / 106

Page 23: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

A bit more formally. . .

We want to know the behavior of E(y |x) as we vary x .

In non-linear models with multiple regressors, this gets tricky.

The effect of x1 depends on all the other x ’s and β̂’s

Generally, we will need to make a set of “counterfactual” assumptions:x1 = a, x2 = b, x3 = c, . . .

Choose a,b, c, . . . to match a particular counterfactual case of interest or

Hold all but one of the x ’s at their mean values (or other referencebaseline), then systematically vary the remaining x .

The same trick works if we are after differences in y related to changes in x ,such as E(yscen1 − yscen2|xscen1, xscen2)

Chris Adolph (UW) Tufte without tears 13 / 106

Page 24: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Calculating quantities of interest

Our goal to obtain “quantities of interest,” like

Expected Values: E(Y |Xc)

Differences: E(Y |Xc2)− E(Y |Xc1)

Risk Ratios: E(Y |Xc2)/E(Y |Xc1)

or any other function of the above

for some counterfactual Xc ’s.

For our Voting example, that’s easy—just plug Xc into

E(Y |Xc) =1

1 + exp(−Xcβ)

Chris Adolph (UW) Tufte without tears 14 / 106

Page 25: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Getting confidence intervals is harder, but there are several options:

For maximum likelihood models,simulate the response conditional on the regressors

See King, Tomz, and Wittenberg, 2000, American Journal of PoliticalScience, and the Zelig package for R or Clarify for Stata.

These simulations can easily be summarized as CIs: sort them and takepercentiles

For Bayesian models, usual model output is a set of posterior draws

See Andrew Gelman and Jennifer Hill, 2006, Data Analysis UsingHierarchical/ Multilevel Models, Cambridge UP.

Once we have the quantities of interest and confidence intervals, we’re readyto make some graphs. . . but how?

Chris Adolph (UW) Tufte without tears 15 / 106

Page 26: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Here is the graph that King, Tomz, and Wittenberg created for this model

How would we make this?

Chris Adolph (UW) Tufte without tears 16 / 106

Page 27: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

We could use the default graphics in Zelig or Clarify (limiting, not as niceas the above)

Or we could do it by hand (hard)

Chris Adolph (UW) Tufte without tears 17 / 106

Page 28: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Wanted: an easy-to-use R package that

1 takes as input the output of estimated statistical models

2 makes a variety of plots for model interpretation

3 plots “triples” (lower, estimate, upper) from estimated models well

4 lays out these plot in a tiled arrangement (small multiples)

5 takes care of axes, titles, and other fussy details

With considerable work, one could

coerce R’s basic graphics to do this badly

or get lattice to do this fairly well for a specific case

But an easy-to-use, general solution is lacking

Chris Adolph (UW) Tufte without tears 18 / 106

Page 29: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Wanted: an easy-to-use R package that

1 takes as input the output of estimated statistical models

2 makes a variety of plots for model interpretation

3 plots “triples” (lower, estimate, upper) from estimated models well

4 lays out these plot in a tiled arrangement (small multiples)

5 takes care of axes, titles, and other fussy details

With considerable work, one could

coerce R’s basic graphics to do this badly

or get lattice to do this fairly well for a specific case

But an easy-to-use, general solution is lacking

Chris Adolph (UW) Tufte without tears 18 / 106

Page 30: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

The tile package

My answer is the tile package, written using R’s grid graphics

Some basic tile graphic types:

scatter Scatterplots with fits, CIs, and extrapolation checkinglineplot Line plots with fits, CIs, and extrapolation checkingropeladder Dot plots with CIs and extrapolation checking

Each can take as input draws from the posterior of a regression model

A call to a tile function makes a multiplot layout:

ideal for small multiples of model parameters

Chris Adolph (UW) Tufte without tears 19 / 106

Page 31: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

1 5 10 20 350

0.2

0.4

0.6

0.8

1

Row Title 1

Column Title 1Plot Title 1

Under Title 1

X Axis 1

Y A

xis

11 5 10 20 35

0

0.2

0.4

0.6

0.8

1

Column Title 2Plot Title 2

Under Title 2

Top Axis 2

Right A

xis 2

1 5 10 20 350

0.2

0.4

0.6

0.8

1

Column Title 3Plot Title 3

Under Title 3

X Axis 3

1 5 10 20 35

0

0.2

0.4

0.6

0.8

1

Row Title 2

Plot Title 4

Under Title 4

Top Axis 4

Right A

xis 4

1 5 10 20 350

0.2

0.4

0.6

0.8

1

Plot Title 5

Under Title 5

X Axis 5

1 5 10 20 35

0

0.2

0.4

0.6

0.8

1

Plot Title 6

Under Title 6

Top Axis 6

Right A

xis 6

Chris Adolph (UW) Tufte without tears 20 / 106

Page 32: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Plot simulations of QoI

Generally, we want to plot triples: lower, estimate, upperWe could do this for specific discrete scenarios, e.g.

Pr(Voting) given five distinct sets of x ’s

Recommended plot: Dotplot with confidence interval lines

Or for a continuous stream of scenarios, e.g.,

Hold all but Age constant, then calculate Pr(Voting) at every level of Age

Recommended plot: Lineplot with shaded confidence intervals

Chris Adolph (UW) Tufte without tears 21 / 106

Page 33: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Plot simulations of QoI

Generally, we want to plot triples: lower, estimate, upperWe could do this for specific discrete scenarios, e.g.

Pr(Voting) given five distinct sets of x ’s

Recommended plot: Dotplot with confidence interval lines

Or for a continuous stream of scenarios, e.g.,

Hold all but Age constant, then calculate Pr(Voting) at every level of Age

Recommended plot: Lineplot with shaded confidence intervals

Chris Adolph (UW) Tufte without tears 21 / 106

Page 34: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

This example is obviously superior to the table of logit coefficients

But is there anything wrong or missing here?

Chris Adolph (UW) Tufte without tears 22 / 106

Page 35: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

20 30 40 50 60 70 80 900

0.2

0.4

0.6

0.8

1

Age of Respondent

Pro

babi

lity

of V

otin

g

Less than HS

High School

College

Logit estimates:95% confidence

interval is shaded

18 year old college grads?! And what about high school dropouts?

Chris Adolph (UW) Tufte without tears 23 / 106

Page 36: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

20 30 40 50 60 70 80 900

0.2

0.4

0.6

0.8

1

Age of Respondent

Pro

babi

lity

of V

otin

g

Less than HS

High School

College

Logit estimates:95% confidence

interval is shaded

tile helps us systematize plotting model results,and helps avoid unwanted extrapolation by limiting results to the convex hull

Chris Adolph (UW) Tufte without tears 24 / 106

Page 37: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Three steps to make tile plots

1 Create data traces. Each trace contains the data and graphicalparameters needed to plot a single set of graphical elements to one ormore plots.

I Could be a set of points, or text labels, or lines, or a polygon

I Could be a set of points and symbols, colors, labels, fit line, CIs, and/orextrapolation limits

I Could be the data for a dotchart, with labels for each line

I Could be the marginal data for a rug

I All annotation must happen in this step

I Basic traces: linesTile(), pointsile(), polygonTile(),polylinesTile(), and textTile()

I Complex traces: lineplot(), scatter(), ropeladder(), andrugTile()

Chris Adolph (UW) Tufte without tears 25 / 106

Page 38: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Three steps to make tile plots

1 Create data traces. Each trace contains the data and graphicalparameters needed to plot a single set of graphical elements to one ormore plots.

I Could be a set of points, or text labels, or lines, or a polygon

I Could be a set of points and symbols, colors, labels, fit line, CIs, and/orextrapolation limits

I Could be the data for a dotchart, with labels for each line

I Could be the marginal data for a rug

I All annotation must happen in this step

I Basic traces: linesTile(), pointsile(), polygonTile(),polylinesTile(), and textTile()

I Complex traces: lineplot(), scatter(), ropeladder(), andrugTile()

Chris Adolph (UW) Tufte without tears 25 / 106

Page 39: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Three steps to make tile plots

1 Create data traces. Each trace contains the data and graphicalparameters needed to plot a single set of graphical elements to one ormore plots.

I Could be a set of points, or text labels, or lines, or a polygon

I Could be a set of points and symbols, colors, labels, fit line, CIs, and/orextrapolation limits

I Could be the data for a dotchart, with labels for each line

I Could be the marginal data for a rug

I All annotation must happen in this step

I Basic traces: linesTile(), pointsile(), polygonTile(),polylinesTile(), and textTile()

I Complex traces: lineplot(), scatter(), ropeladder(), andrugTile()

Chris Adolph (UW) Tufte without tears 25 / 106

Page 40: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Three steps to make tile plots

1 Create data traces. Each trace contains the data and graphicalparameters needed to plot a single set of graphical elements to one ormore plots.

I Could be a set of points, or text labels, or lines, or a polygon

I Could be a set of points and symbols, colors, labels, fit line, CIs, and/orextrapolation limits

I Could be the data for a dotchart, with labels for each line

I Could be the marginal data for a rug

I All annotation must happen in this step

I Basic traces: linesTile(), pointsile(), polygonTile(),polylinesTile(), and textTile()

I Complex traces: lineplot(), scatter(), ropeladder(), andrugTile()

Chris Adolph (UW) Tufte without tears 25 / 106

Page 41: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Three steps to make tile plots

1 Create data traces. Each trace contains the data and graphicalparameters needed to plot a single set of graphical elements to one ormore plots.

I Could be a set of points, or text labels, or lines, or a polygon

I Could be a set of points and symbols, colors, labels, fit line, CIs, and/orextrapolation limits

I Could be the data for a dotchart, with labels for each line

I Could be the marginal data for a rug

I All annotation must happen in this step

I Basic traces: linesTile(), pointsile(), polygonTile(),polylinesTile(), and textTile()

I Complex traces: lineplot(), scatter(), ropeladder(), andrugTile()

Chris Adolph (UW) Tufte without tears 25 / 106

Page 42: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Three steps to make tile plots

1 Create data traces. Each trace contains the data and graphicalparameters needed to plot a single set of graphical elements to one ormore plots.

I Could be a set of points, or text labels, or lines, or a polygon

I Could be a set of points and symbols, colors, labels, fit line, CIs, and/orextrapolation limits

I Could be the data for a dotchart, with labels for each line

I Could be the marginal data for a rug

I All annotation must happen in this step

I Basic traces: linesTile(), pointsile(), polygonTile(),polylinesTile(), and textTile()

I Complex traces: lineplot(), scatter(), ropeladder(), andrugTile()

Chris Adolph (UW) Tufte without tears 25 / 106

Page 43: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Trace functions in tile

Primitive trace functions:linesTile Plot a set of connected line segmentspointsTile Plot a set of pointspolygonTile Plot a shaded regionpolylinesTile Plot a set of unconnected line segmentstextTile Plot text labels

Complex traces for model or data exploration:

lineplot Plot lines with confidence intervals, extrapolation warningsropeladder Plot dotplots with confidence intervals, extrapolation warnings,

and shaded rangesrugTile Plot marginal data rugs to axes of plotsscatter Plot scatterplots with text and symbol markers,

fit lines, and confidence intervals

Chris Adolph (UW) Tufte without tears 26 / 106

Page 44: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Three steps to make tile plots

1 Create data traces. Each trace contains the data and graphicalparameters needed to plot a single set of graphical elements to one ormore plots.

2 Plot the data traces. Using the tile() function, simultaneously plot alltraces to all plots.

I This is the step where the scaffolding gets made: axes and titles

I Set up the rows and columns of plots

I Titles of plots, axes, rows of plots, columns of plots, etc.

I Set up axis limits, ticks, tick labels, logging of axes

Chris Adolph (UW) Tufte without tears 27 / 106

Page 45: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Three steps to make tile plots

1 Create data traces. Each trace contains the data and graphicalparameters needed to plot a single set of graphical elements to one ormore plots.

2 Plot the data traces. Using the tile() function, simultaneously plot alltraces to all plots.

I This is the step where the scaffolding gets made: axes and titles

I Set up the rows and columns of plots

I Titles of plots, axes, rows of plots, columns of plots, etc.

I Set up axis limits, ticks, tick labels, logging of axes

Chris Adolph (UW) Tufte without tears 27 / 106

Page 46: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Three steps to make tile plots

1 Create data traces. Each trace contains the data and graphicalparameters needed to plot a single set of graphical elements to one ormore plots.

2 Plot the data traces. Using the tile() function, simultaneously plot alltraces to all plots.

3 Examine output and revise. Look at the graph made in step 2, andtweak the input parameters for steps 1 and 2 to make a better graph.

Chris Adolph (UW) Tufte without tears 28 / 106

Page 47: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

R Syntax for Lineplot of Voting Logit

# Using simcf and tile to explore an estimated logistic regression# Voting example using 2000 NES data after King, Tomz, and Wittenberg# Chris Adolph

# Load librarieslibrary(RColorBrewer)library(MASS)library(simcf) # Download simcf and tile packageslibrary(tile) # from faculty.washington.edu/cadolph

# Load data (available from faculty.washington.edu/cadolph)file <- "nes00.csv"data <- read.csv(file, header=TRUE)

Chris Adolph (UW) Tufte without tears 29 / 106

Page 48: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

R Syntax for Lineplot of Voting Logit

# Set up model formula and model specific data framemodel <- vote00 ~ age + I(age^2) + hsdeg + coldegmdata <- extractdata(model, data, na.rm=TRUE)

# Run logit & extract resultslogit.result <- glm(model, family=binomial, data=mdata)pe <- logit.result$coefficients # point estimatesvc <- vcov(logit.result) # var-cov matrix

# Simulate parameter distributionssims <- 10000simbetas <- mvrnorm(sims, pe, vc)

Chris Adolph (UW) Tufte without tears 30 / 106

Page 49: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

R Syntax for Lineplot of Voting Logit

# Set up counterfactuals: all ages, each of three educationsxhyp <- seq(18,97,1)nscen <- length(xhyp)nohsScen <- hsScen <- collScen <- cfMake(model, mdata, nscen)for (i in 1:nscen) {

# No High school scenarios (loop over each age)nohsScen <- cfChange(nohsScen, "age", x = xhyp[i], scen = i)nohsScen <- cfChange(nohsScen, "hsdeg", x = 0, scen = i)nohsScen <- cfChange(nohsScen, "coldeg", x = 0, scen = i)

# HS grad scenarios (loop over each age)hsScen <- cfChange(hsScen, "age", x = xhyp[i], scen = i)hsScen <- cfChange(hsScen, "hsdeg", x = 1, scen = i)hsScen <- cfChange(hsScen, "coldeg", x = 0, scen = i)

# College grad scenarios (loop over each age)collScen <- cfChange(collScen, "age", x = xhyp[i], scen = i)collScen <- cfChange(collScen, "hsdeg", x = 1, scen = i)collScen <- cfChange(collScen, "coldeg", x = 1, scen = i)

}

Chris Adolph (UW) Tufte without tears 31 / 106

Page 50: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

R Syntax for Lineplot of Voting Logit

# Simulate expected probabilities for all scenariosnohsSims <- logitsimev(nohsScen, simbetas, ci=0.95)hsSims <- logitsimev(hsScen, simbetas, ci=0.95)collSims <- logitsimev(collScen, simbetas, ci=0.95)

# Get 3 nice colors for tracescol <- brewer.pal(3,"Dark2")

# Set up lineplot traces of expected probabilitiesnohsTrace <- lineplot(x=xhyp,

y=nohsSims$pe,lower=nohsSims$lower,upper=nohsSims$upper,col=col[1],extrapolate=list(data=mdata[,2:ncol(mdata)],

cfact=nohsScen$x[,2:ncol(hsScen$x)],omit.extrapolated=TRUE),

plot=1)

Chris Adolph (UW) Tufte without tears 32 / 106

Page 51: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

R Syntax for Lineplot of Voting Logit

hsTrace <- lineplot(x=xhyp,y=hsSims$pe,lower=hsSims$lower,upper=hsSims$upper,col=col[2],extrapolate=list(data=mdata[,2:ncol(mdata)],

cfact=hsScen$x[,2:ncol(hsScen$x)],omit.extrapolated=TRUE),

plot=1)

collTrace <- lineplot(x=xhyp,y=collSims$pe,lower=collSims$lower,upper=collSims$upper,col=col[3],extrapolate=list(data=mdata[,2:ncol(mdata)],

cfact=collScen$x[,2:ncol(hsScen$x)],omit.extrapolated=TRUE),

plot=1)

Chris Adolph (UW) Tufte without tears 33 / 106

Page 52: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

R Syntax for Lineplot of Voting Logit

# Set up traces with labels and legendlabelTrace <- textTile(labels=c("Less than HS", "High School",

"College"),x=c( 55, 49, 30),y=c( 0.26, 0.56, 0.87),col=col,plot=1)

legendTrace <- textTile(labels=c("Logit estimates:", "95% confidence","interval is shaded"),

x=c(82, 82, 82),y=c(0.2, 0.16, 0.12),plot=1)

Chris Adolph (UW) Tufte without tears 34 / 106

Page 53: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

R Syntax for Lineplot of Voting Logit

# Plot traces using tiletile(nohsTrace,

hsTrace,collTrace,labelTrace,legendTrace,width=list(null=5),limits=c(18,94,0,1),xaxis=list(at=c(20,30,40,50,60,70,80,90)),xaxistitle=list(labels="Age of Respondent"),yaxistitle=list(labels="Probability of Voting"),frame=TRUE)

Chris Adolph (UW) Tufte without tears 35 / 106

Page 54: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Chomp! Coefficient tables can hide the punchline

Agresti offers the following example of a multinominal data analysis

Alligators in a certain Florida lake were studied, and the following datacollected:

Principal Food 1 = Invertebrates,2 = Fish,3= “Other” (!!! Floridians?)

Size of alligator in metersSex of alligator male or female

The question is how alligator size influences food choice.

We fit the model in R using multinomial logit and get . . .

Chris Adolph (UW) Tufte without tears 36 / 106

Page 55: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Chomp! Coefficient tables can hide the punchline

Multinomial Logit of Alligator’s Primary Food Source

ln(πInvertebrates ln(πOther/πFish) /πFish)

Intercept 4.90 −1.95(1.71) (1.53)

Size −2.53 0.13(0.85) (0.52)

Female −0.79 0.38(0.71) (0.91)

Direct interpretation?

We could do it, using odds ratios and a calculator.

Invertebrates vs Fish: A 1 meter increase in length makes the odds that analligator will eat invertebrates rather than fish exp(−2.53− 0) = 0.08 timessmaller.

Chris Adolph (UW) Tufte without tears 37 / 106

Page 56: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Chomp! Coefficient tables can hide the punchline

Multinomial Logit of Alligator’s Primary Food Source

ln(πInvertebrates ln(πOther/πFish) /πFish)

Intercept 4.90 −1.95(1.71) (1.53)

Size −2.53 0.13(0.85) (0.52)

Female −0.79 0.38(0.71) (0.91)

Direct interpretation?

We could do it, using odds ratios and a calculator.

Invertebrates vs Fish: A 1 meter increase in length makes the odds that analligator will eat invertebrates rather than fish exp(−2.53− 0) = 0.08 timessmaller.

Chris Adolph (UW) Tufte without tears 37 / 106

Page 57: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Chomp! Coefficient tables can hide the punchline

Multinomial Logit of Alligator’s Primary Food Source

ln(πInvertebrates ln(πOther/πFish) /πFish)

Intercept 4.90 −1.95(1.71) (1.53)

Size −2.53 0.13(0.85) (0.52)

Female −0.79 0.38(0.71) (0.91)

Invertebrates vs Other: A 1 meter increase in length makes the odds that analligator will eat invertebrates rather than “other” foodexp(−2.53− 0.13) = 0.07 times smaller.

No reader is going to do this

Chris Adolph (UW) Tufte without tears 38 / 106

Page 58: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Chomp! Coefficient tables can hide the punchline

Multinomial Logit of Alligator’s Primary Food Source

ln(πInvertebrates ln(πOther/πFish) /πFish)

Intercept 4.90 −1.95(1.71) (1.53)

Size −2.53 0.13(0.85) (0.52)

Female −0.79 0.38(0.71) (0.91)

Invertebrates vs Other: A 1 meter increase in length makes the odds that analligator will eat invertebrates rather than “other” foodexp(−2.53− 0.13) = 0.07 times smaller.

No reader is going to do this

Chris Adolph (UW) Tufte without tears 38 / 106

Page 59: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Chomp! Coefficient tables can hide the punchline

Multinomial Logit of Alligator’s Primary Food Source

ln(πInvertebrates ln(πOther/πFish) /πFish)

Intercept 4.90 −1.95(1.71) (1.53)

Size −2.53 0.13(0.85) (0.52)

Female −0.79 0.38(0.71) (0.91)

Fish vs Other: A 1 meter increase in length makes the odds that an alligatorwill eat fish rather than “other” food exp(0− 0.13) = 0.87 times smaller.

There has to be a better way

Chris Adolph (UW) Tufte without tears 39 / 106

Page 60: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

1 2 3 40

0.2

0.4

0.6

0.8

1Food choice by alligator size and sex

MaleSize of alligator

Pr(F

ood

pref

eren

ce)

1 2 3 40

0.2

0.4

0.6

0.8

1

FemaleSize of alligator

InvertebratesFish

Other

Invertebrates Fish

Other

A simple lineplot made with tile shows the whole covariate space

Much more dramatic than the table

Chris Adolph (UW) Tufte without tears 40 / 106

Page 61: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

1 2 3 4

0

0.2

0.4

0.6

0.8

1Food choice by alligator size and sex

MaleSize of alligator

Pr(

Foo

d pr

efer

ence

)

1 2 3 4

0

0.2

0.4

0.6

0.8

1

FemaleSize of alligator

Invertebrates Fish

Other

Invertebrates Fish

Other

We can also add easy to interpret measures of uncertainty

Above are standard error regions

Chris Adolph (UW) Tufte without tears 41 / 106

Page 62: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

1 2 3 4

0

0.2

0.4

0.6

0.8

1Food choice by alligator size and sex

MaleSize of alligator

Pr(

Foo

d pr

efer

ence

)

1 2 3 4

0

0.2

0.4

0.6

0.8

1

FemaleSize of alligator

Invertebrates Fish

Other

Invertebrates Fish

Other

Easy to highlight where the model is interpolating into the observed data

. . . and where it’s extrapolating into Hollywood territory

Chris Adolph (UW) Tufte without tears 42 / 106

Page 63: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

When simulation is the only option: Chinese leadership

Shih, Adolph, and Liu investigate the advancement of elite Chinese leaders inthe Reform Period (1982–2002)

Explain (partially observed) ranks of the top 300 to 500 Chinese CommunistParty leaders as a function of:

Demographics age, sex, ethnicityEducation level of degreePerformance provincial growth, revenueFaction birth, school, career, and family ties to top leaders

Bayesian model of partially observed ranks of CCP officials

Model parameters difficult to interpret: on a latent scaleand individual effects are conditioned on all other ranked members

Only solution:Simulate ranks of hypothetical officials as if placed in the observed hierarchy

Chris Adolph (UW) Tufte without tears 43 / 106

Page 64: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

356 285 214 142 71 0

0 20 40 60 80 100

Expected rank

Expected percentile

< High SchoolFemale

High SchoolHan

Non−PrincelingAverage MemberGraduate School

CollegeJiang Zemin Faction

MaleRelative GDP Growth +1 sd

PrincelingParty Exp +1 sd

Relative Fiscal Growth +1 sdMinority

Hu Jintao FactionDeng Faction

Age +1 sd

Black circlesshowexpectedranks forotherwiseaverageChineseofficials withthecharacteristiclisted at left

Chris Adolph (UW) Tufte without tears 44 / 106

Page 65: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

356 285 214 142 71 0

0 20 40 60 80 100

Expected rank

Expected percentile

< High SchoolFemale

High SchoolHan

Non−PrincelingAverage MemberGraduate School

CollegeJiang Zemin Faction

MaleRelative GDP Growth +1 sd

PrincelingParty Exp +1 sd

Relative Fiscal Growth +1 sdMinority

Hu Jintao FactionDeng Faction

Age +1 sd

Thick blackhorizontallines are 1 stderror bars,and thin linesare 95% CIs

Chris Adolph (UW) Tufte without tears 45 / 106

Page 66: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

356 285 214 142 71 0

0 20 40 60 80 100

Expected rank

Expected percentile

< High SchoolFemale

High SchoolHan

Non−PrincelingAverage MemberGraduate School

CollegeJiang Zemin Faction

MaleRelative GDP Growth +1 sd

PrincelingParty Exp +1 sd

Relative Fiscal Growth +1 sdMinority

Hu Jintao FactionDeng Faction

Age +1 sd

Blue trianglesare officialswith randomeffects at ±1sd; how muchunmeasuredfactors matter

Chris Adolph (UW) Tufte without tears 46 / 106

Page 67: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

356 285 214 142 71 0

0 20 40 60 80 100

Expected rank

Expected percentile

< High SchoolFemale

High SchoolHan

Non−PrincelingAverage MemberGraduate School

CollegeJiang Zemin Faction

MaleRelative GDP Growth +1 sd

PrincelingParty Exp +1 sd

Relative Fiscal Growth +1 sdMinority

Hu Jintao FactionDeng Faction

Age +1 sd

Helps to sortrows of theplot fromsmallest tolargest effect

Chris Adolph (UW) Tufte without tears 47 / 106

Page 68: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

'82 '87 '92 '97 '02 '82 '87 '92 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40C

hang

e in

R

ank

Perc

entil

e

'82 '87 '92 '97 '02 '82 '87 '92 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

GDP Growth +1 sd Fiscal Revenue +1 sd

College Degree

Ethnic Minority

Female

Graduate Degree

Shih, Adolph, and Liu re-estimate the model for each year, leading to a largenumber of results

A complex lineplot helps organize them and facilitate comparisons

Chris Adolph (UW) Tufte without tears 48 / 106

Page 69: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

'82 '87 '92 '97 '02 '82 '87 '92 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40C

hang

e in

R

ank

Perc

entil

e

'82 '87 '92 '97 '02 '82 '87 '92 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

GDP Growth +1 sd Fiscal Revenue +1 sd

College Degree

Ethnic Minority

Female

Graduate Degree

Note that these results are now first differences:

the expected percentile change in rank for an otherwise average official whogains the characteristic noted

Chris Adolph (UW) Tufte without tears 49 / 106

Page 70: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

'82 '87 '82 '87 '82 '87 '82 '87 '92 '97 '02 '82 '87

−20

0

20

40C

hang

e in

R

ank

Perc

entil

e

'87 '92 '92 '97 '02 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

Mao Faction

Long Marcher 2nd Field Army Vet Deng Faction Hu Yaobang Fact'n

Zhao Ziyang Faction

Jiang Zemin Faction

Hu Jintao Faction Princeling

Over time, officials’ economic performance never matters, but factions oftendo

Runs counter to the conventional wisdom that meritocratic selection ofofficials lies behind Chinese economic success

Chris Adolph (UW) Tufte without tears 50 / 106

Page 71: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks

So far, we’ve presenting conditional expectations & differences fromregressions

But are we confident that these were the “right” estimates?

The language of inference usually assumes we

correctly specified our model

correctly measured our variables

chose the right probability model

don’t have influential outliers, etc.

We’re never completely sure these assumptions hold.

Most people present one model, and argue it was the best choice

Sometimes, a few alternatives are displayed

Chris Adolph (UW) Tufte without tears 51 / 106

Page 72: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks

So far, we’ve presenting conditional expectations & differences fromregressions

But are we confident that these were the “right” estimates?

The language of inference usually assumes we

correctly specified our model

correctly measured our variables

chose the right probability model

don’t have influential outliers, etc.

We’re never completely sure these assumptions hold.

Most people present one model, and argue it was the best choice

Sometimes, a few alternatives are displayed

Chris Adolph (UW) Tufte without tears 51 / 106

Page 73: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

The race of the variables

Model 1 Model 2 Model 3 Model 4 Model 5

My variable X.XX X.XX X.XX X.XXof interest, X1 (X.XX) (X.XX) (X.XX) (X.XX)

A control X.XX X.XX X.XX X.XX X.XXI "need" (X.XX) (X.XX) (X.XX) (X.XX) (X.XX)

A control X.XX X.XX X.XX X.XX X.XXI "need" (X.XX) (X.XX) (X.XX) (X.XX) (X.XX)

A candidate X.XX X.XXcontrol (X.XX) (X.XX)

A candidate X.XX X.XXcontrol (X.XX) (X.XX)

Alternate X.XXmeasure of X1 (X.XX)

Chris Adolph (UW) Tufte without tears 52 / 106

Page 74: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks

Problems with the approach above?1 Lots of space to show a few permutations of the model

Most space wasted or devoted to ancillary info

2 What if we’re really interested in E(Y |X ), not β̂?

E.g., because of nonlinearities, interactions, scale differences, etc.

3 The selection of permutations is ad hoc.

We’ll try to fix 1 & 2.

Objection 3 is harder, but worth thinking about.

Chris Adolph (UW) Tufte without tears 53 / 106

Page 75: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks

Problems with the approach above?1 Lots of space to show a few permutations of the model

Most space wasted or devoted to ancillary info

2 What if we’re really interested in E(Y |X ), not β̂?

E.g., because of nonlinearities, interactions, scale differences, etc.

3 The selection of permutations is ad hoc.

We’ll try to fix 1 & 2.

Objection 3 is harder, but worth thinking about.

Chris Adolph (UW) Tufte without tears 53 / 106

Page 76: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks

Problems with the approach above?1 Lots of space to show a few permutations of the model

Most space wasted or devoted to ancillary info

2 What if we’re really interested in E(Y |X ), not β̂?

E.g., because of nonlinearities, interactions, scale differences, etc.

3 The selection of permutations is ad hoc.

We’ll try to fix 1 & 2.

Objection 3 is harder, but worth thinking about.

Chris Adolph (UW) Tufte without tears 53 / 106

Page 77: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks

Problems with the approach above?1 Lots of space to show a few permutations of the model

Most space wasted or devoted to ancillary info

2 What if we’re really interested in E(Y |X ), not β̂?

E.g., because of nonlinearities, interactions, scale differences, etc.

3 The selection of permutations is ad hoc.

We’ll try to fix 1 & 2.

Objection 3 is harder, but worth thinking about.

Chris Adolph (UW) Tufte without tears 53 / 106

Page 78: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: An algorithm1 Identify a relation of interest between a concept X and a concept Y

2 Choose:

I a measure of X , denoted X ,

I a measure of Y, denoted Y ,

I a set of confounders, Z ,

I a functional form, g(·)

I a probability model of Y , f (·)

3 Estimate the probability model Y ∼ f (µ, α), µ = g(vec(X ,Z ), β).

4 Simulate the quantity of interest, e.g., E(Y |X ) or E(Y2 − Y1|X2,X1), toobtain a point estimate and confidence interval.

5 Repeat 2–4, changing at each iteration one of the choices in step 2.

6 Compile the results in a variant of the dot plot called a ropeladder.

Chris Adolph (UW) Tufte without tears 54 / 106

Page 79: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: An algorithm1 Identify a relation of interest between a concept X and a concept Y

2 Choose:

I a measure of X , denoted X ,

I a measure of Y, denoted Y ,

I a set of confounders, Z ,

I a functional form, g(·)

I a probability model of Y , f (·)

3 Estimate the probability model Y ∼ f (µ, α), µ = g(vec(X ,Z ), β).

4 Simulate the quantity of interest, e.g., E(Y |X ) or E(Y2 − Y1|X2,X1), toobtain a point estimate and confidence interval.

5 Repeat 2–4, changing at each iteration one of the choices in step 2.

6 Compile the results in a variant of the dot plot called a ropeladder.

Chris Adolph (UW) Tufte without tears 54 / 106

Page 80: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: An algorithm1 Identify a relation of interest between a concept X and a concept Y

2 Choose:

I a measure of X , denoted X ,

I a measure of Y, denoted Y ,

I a set of confounders, Z ,

I a functional form, g(·)

I a probability model of Y , f (·)

3 Estimate the probability model Y ∼ f (µ, α), µ = g(vec(X ,Z ), β).

4 Simulate the quantity of interest, e.g., E(Y |X ) or E(Y2 − Y1|X2,X1), toobtain a point estimate and confidence interval.

5 Repeat 2–4, changing at each iteration one of the choices in step 2.

6 Compile the results in a variant of the dot plot called a ropeladder.

Chris Adolph (UW) Tufte without tears 54 / 106

Page 81: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: An algorithm1 Identify a relation of interest between a concept X and a concept Y

2 Choose:

I a measure of X , denoted X ,

I a measure of Y, denoted Y ,

I a set of confounders, Z ,

I a functional form, g(·)

I a probability model of Y , f (·)

3 Estimate the probability model Y ∼ f (µ, α), µ = g(vec(X ,Z ), β).

4 Simulate the quantity of interest, e.g., E(Y |X ) or E(Y2 − Y1|X2,X1), toobtain a point estimate and confidence interval.

5 Repeat 2–4, changing at each iteration one of the choices in step 2.

6 Compile the results in a variant of the dot plot called a ropeladder.

Chris Adolph (UW) Tufte without tears 54 / 106

Page 82: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: An algorithm1 Identify a relation of interest between a concept X and a concept Y

2 Choose:

I a measure of X , denoted X ,

I a measure of Y, denoted Y ,

I a set of confounders, Z ,

I a functional form, g(·)

I a probability model of Y , f (·)

3 Estimate the probability model Y ∼ f (µ, α), µ = g(vec(X ,Z ), β).

4 Simulate the quantity of interest, e.g., E(Y |X ) or E(Y2 − Y1|X2,X1), toobtain a point estimate and confidence interval.

5 Repeat 2–4, changing at each iteration one of the choices in step 2.

6 Compile the results in a variant of the dot plot called a ropeladder.

Chris Adolph (UW) Tufte without tears 54 / 106

Page 83: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: An algorithm1 Identify a relation of interest between a concept X and a concept Y

2 Choose:

I a measure of X , denoted X ,

I a measure of Y, denoted Y ,

I a set of confounders, Z ,

I a functional form, g(·)

I a probability model of Y , f (·)

3 Estimate the probability model Y ∼ f (µ, α), µ = g(vec(X ,Z ), β).

4 Simulate the quantity of interest, e.g., E(Y |X ) or E(Y2 − Y1|X2,X1), toobtain a point estimate and confidence interval.

5 Repeat 2–4, changing at each iteration one of the choices in step 2.

6 Compile the results in a variant of the dot plot called a ropeladder.

Chris Adolph (UW) Tufte without tears 54 / 106

Page 84: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: An algorithm1 Identify a relation of interest between a concept X and a concept Y

2 Choose:

I a measure of X , denoted X ,

I a measure of Y, denoted Y ,

I a set of confounders, Z ,

I a functional form, g(·)

I a probability model of Y , f (·)

3 Estimate the probability model Y ∼ f (µ, α), µ = g(vec(X ,Z ), β).

4 Simulate the quantity of interest, e.g., E(Y |X ) or E(Y2 − Y1|X2,X1), toobtain a point estimate and confidence interval.

5 Repeat 2–4, changing at each iteration one of the choices in step 2.

6 Compile the results in a variant of the dot plot called a ropeladder.

Chris Adolph (UW) Tufte without tears 54 / 106

Page 85: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: An algorithm1 Identify a relation of interest between a concept X and a concept Y

2 Choose:

I a measure of X , denoted X ,

I a measure of Y, denoted Y ,

I a set of confounders, Z ,

I a functional form, g(·)

I a probability model of Y , f (·)

3 Estimate the probability model Y ∼ f (µ, α), µ = g(vec(X ,Z ), β).

4 Simulate the quantity of interest, e.g., E(Y |X ) or E(Y2 − Y1|X2,X1), toobtain a point estimate and confidence interval.

5 Repeat 2–4, changing at each iteration one of the choices in step 2.

6 Compile the results in a variant of the dot plot called a ropeladder.

Chris Adolph (UW) Tufte without tears 54 / 106

Page 86: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: An algorithm1 Identify a relation of interest between a concept X and a concept Y

2 Choose:

I a measure of X , denoted X ,

I a measure of Y, denoted Y ,

I a set of confounders, Z ,

I a functional form, g(·)

I a probability model of Y , f (·)

3 Estimate the probability model Y ∼ f (µ, α), µ = g(vec(X ,Z ), β).

4 Simulate the quantity of interest, e.g., E(Y |X ) or E(Y2 − Y1|X2,X1), toobtain a point estimate and confidence interval.

5 Repeat 2–4, changing at each iteration one of the choices in step 2.

6 Compile the results in a variant of the dot plot called a ropeladder.Chris Adolph (UW) Tufte without tears 54 / 106

Page 87: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: A central banks example

In The Dilemma of Discretion, I argue central bankers’ career backgroundsexplain their monetary policy choices

Central bankers with financial sector backgrounds choose more conservativepolicies, leading to lower inflation but potentially higher unemployment

I argue more conservative governments should prefer to appoint moreconservative central bankers, e.g., those with financial sector backgrounds

Chris Adolph (UW) Tufte without tears 55 / 106

Page 88: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: A central banks example

I test this claim using data on central bankers setting monetary policy over 20countries and 30 years

Conservatism of central bankers is measured by percentage share in“conservative” careers, and summarized by an index CBCC

Partisanship of government is measured by PCoG:higher values = more conservative Partisan“Center of Gravity”

I estimate a zero-inflated compositional regression on career shares

and find the expected relationship:conservative goverments pick conservative central bankers

Chris Adolph (UW) Tufte without tears 56 / 106

Page 89: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Checks: A central banks example

Conservative goverments pick conservative central bankers

Is this finding robust to specification assumptions?

In my case, the statistical model is so demanding it’s hard to include manyregressors at once.

So try one at a time, and show a long “ropeladder” plot. . .

Chris Adolph (UW) Tufte without tears 57 / 106

Page 90: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness Ropeladder: Partisan central banker appointment

237

Control added

[None]

Office appointed to

CBI (3 index avg.)

CBI (Cukierman)

Lagged inflation

Lagged unemployment

Trade Openness

Endebtedness

Financial Sector Employment

Financial Sector Score

Time Trend

Central Bank staff size

-0.4 -0.2 0.0 0.2 0.4 0.6 -0.4 -0.2 0.0 0.2 0.4 0.6

Estimated increase in Central Bank Conservatism (CBCC) resulting from . . .

Shifting control from

low (µ-1.5 sd)

to high (µ+1.5)

Shifting PCoG from

From left gov (µ-1.5 sd)

to right gov (µ+1.5 sd)

Figure 8.4: Partisanship of central banker appointment: Robustness, part 2. Each row presents adifferent specification, adding to the baseline model the variable listed at the left. To save space, neitherthe estimated parameters nor the simplex plots of expected values are shown; instead, I plot the firstdifference in career conservatism (CBCC) for a three standard deviation increase in the control variable(at the left) or partisan center of gravity (at the right). Horizontal bars show 95 percent confidenceintervals; the gray box shows the range of point estimates for the partisan effect across all robustnesschecks. In all cases, partisan effects on appointments are substantively the same as in Figure 8.2.

Chris Adolph (UW) Tufte without tears 58 / 106

Page 91: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Anatomy of a ropeladder plot

I call this a ropeladder plot.

The column of dots shows the relationship between Y and a specific X underdifferent model assumptions

Each entry corresponds to a different assumption about the specification, orthe measures, or the estimation method, etc.

If all the dots line up, with narrow, similar CIs, we say the finding is robust, andreflects the data under a range of reasonable assumptions

If the ropeladder is “blowing in the wind”, we may be skeptical of the finding.It depends on model assumptions that may be controversial

The shaded gray box shows the full range of the point estimates for the QoI.

Narrow is better.

Chris Adolph (UW) Tufte without tears 59 / 106

Page 92: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Anatomy of a ropeladder plot

I call this a ropeladder plot.

The column of dots shows the relationship between Y and a specific X underdifferent model assumptions

Each entry corresponds to a different assumption about the specification, orthe measures, or the estimation method, etc.

If all the dots line up, with narrow, similar CIs, we say the finding is robust, andreflects the data under a range of reasonable assumptions

If the ropeladder is “blowing in the wind”, we may be skeptical of the finding.It depends on model assumptions that may be controversial

The shaded gray box shows the full range of the point estimates for the QoI.

Narrow is better.

Chris Adolph (UW) Tufte without tears 59 / 106

Page 93: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Anatomy of a ropeladder plot

I call this a ropeladder plot.

The column of dots shows the relationship between Y and a specific X underdifferent model assumptions

Each entry corresponds to a different assumption about the specification, orthe measures, or the estimation method, etc.

If all the dots line up, with narrow, similar CIs, we say the finding is robust, andreflects the data under a range of reasonable assumptions

If the ropeladder is “blowing in the wind”, we may be skeptical of the finding.It depends on model assumptions that may be controversial

The shaded gray box shows the full range of the point estimates for the QoI.

Narrow is better.

Chris Adolph (UW) Tufte without tears 59 / 106

Page 94: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Why ropeladders?1 Anticipate objections on model assumptions, and have concrete answers.

Avoid: “I ran it that other way, and it came out the ‘same’.”

Instead: “I ran it that other way, and look —it made no substantive orstatistical difference worth speaking of.”

Or: “. . . it makes this much difference.”

2 Investigate robustness more thoroughly.

Traditional tabular presentation would have run to 7 pages, makingcomparison hard and discouraging a thorough search

3 Find patterns of model sensitivity.

Two seemingly unrelated changes in specification had the same effect.(Unemployment and Financial Sector Size)

Turned out to be a missing third covariate. (Time trend)

Chris Adolph (UW) Tufte without tears 60 / 106

Page 95: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Why ropeladders?1 Anticipate objections on model assumptions, and have concrete answers.

Avoid: “I ran it that other way, and it came out the ‘same’.”

Instead: “I ran it that other way, and look —it made no substantive orstatistical difference worth speaking of.”

Or: “. . . it makes this much difference.”

2 Investigate robustness more thoroughly.

Traditional tabular presentation would have run to 7 pages, makingcomparison hard and discouraging a thorough search

3 Find patterns of model sensitivity.

Two seemingly unrelated changes in specification had the same effect.(Unemployment and Financial Sector Size)

Turned out to be a missing third covariate. (Time trend)

Chris Adolph (UW) Tufte without tears 60 / 106

Page 96: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Why ropeladders?1 Anticipate objections on model assumptions, and have concrete answers.

Avoid: “I ran it that other way, and it came out the ‘same’.”

Instead: “I ran it that other way, and look —it made no substantive orstatistical difference worth speaking of.”

Or: “. . . it makes this much difference.”

2 Investigate robustness more thoroughly.

Traditional tabular presentation would have run to 7 pages, makingcomparison hard and discouraging a thorough search

3 Find patterns of model sensitivity.

Two seemingly unrelated changes in specification had the same effect.(Unemployment and Financial Sector Size)

Turned out to be a missing third covariate. (Time trend)

Chris Adolph (UW) Tufte without tears 60 / 106

Page 97: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Robustness for several QoIs at once 81

−3 −1 1 −3 −1 1 −1 1 3 −1 1 3 −3 −1 1

FinExp FMExp CBExp GovExp CBCC

Baseline

Robust Estimation

Add πworld

Use Cukierman CBI

Omit Imports/GDP

Add Exchange Regime

Add % Left-Appointees

Add Partisan CoG of Aptees

Add % with Econ PhDs

Change in inflation, five years after +1 s.d. in . . .

Specification

Figure 3.7: The career-inflation link under alternative specifications. Each plot shows the five-yearfirst difference in inflation resulting from a one standard deviation increase in a career variable (FinExp,GovExp, etc.), given the specification noted at left. See text for a description of the baseline model andalternatives. Circles and squares indicate point estimates of the first difference, and horizontal lines 90percent confidence intervals. The results marked with a square are the same as those used in Figure 3.6.The shaded areas highlight the range of point estimates across all alternatives. Effects of controls notshown (of those considered, only πworld had a significant effect on inflation).

A similar pattern of robustness emerges if we keep the baseline model in place andinstead vary the construction of the key explanatory variables. Most central banks havemultiple, legally established monetary policy makers who must make collective decisions.Modeling the aggregating of policy makers’ preferences is a key challenge. Taking the(tenure-weighted) means of all de jure policy makers seems like a good first approximation,though the median member is a better bet when a single dimension (like CBCC) is estab-lished. But as Figure 3.8 shows, it makes little difference if we take means or medians of thecareer components or CBCC. In constructing the CBCC index we have still more options.Rather than weighting FinExp, GovExp, FMExp, and CBExp equally, as in the main text,we could give only half-weight to the more ambiguous categories (FMExp and CBExp), orwe could use the point estimates of the coefficients on each category as weights. The middlerows of Figure 3.8 show that these alternative make little difference.

A final approach follows the common practice of ignoring all central bankers but thegovernor of the bank. The procedure eliminates the problem of aggregation by pretendingit does not exist, and is a mere half-step towards recognizing the importance of central bankofficials in setting monetary policy. (In particular, the governors-only focus is incompatible

Each ropeladder, or column,shows the effect of a different variable on the response

That is, reading across shows the results from a single model

Reading down shows the results for a single question across different models

Chris Adolph (UW) Tufte without tears 61 / 106

Page 98: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Chinese Officials Redux

Earlier, I showed the relationship between many different covariates and theexpected political rank of Chinese officials

I also showed how these relationships changed over time, making for acomplex plot

But our findings were controversial: countered the widely accepted belief thatChinese officials are rewarded for economic performance

Critics asked for lots of alternative specifications to probe our results

I can use tile to show how exactly what difference these robustness checksmade using overlapping lineplots

Chris Adolph (UW) Tufte without tears 62 / 106

Page 99: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

'82 '87 '92 '97 '02 '82 '87 '92 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

'82 '87 '92 '97 '02 '82 '87 '92 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

GDP Growth +1 sd Fiscal Revenue +1 sd

College Degree

Ethnic Minority

Female

Graduate Degree

Original Model Factional Tie Requires Job Overlap

Some critics worried that our measures of faction were too sensitive,so we considered a more specific alternative

This didn’t salvage the conventional wisdom on growth. . .

Chris Adolph (UW) Tufte without tears 63 / 106

Page 100: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

'82 '87 '82 '87 '82 '87 '82 '87 '92 '97 '02 '82 '87

−20

0

20

40C

hang

e in

R

ank

Perc

entil

e

'87 '92 '92 '97 '02 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

Mao Faction

Long Marcher 2nd Field Army Vet Deng Faction Hu Yaobang Fact'n

Zhao Ziyang Faction

Jiang Zemin Faction

Hu Jintao Faction Princeling

Original Model Factional Tie Requires Job Overlap

But did (unsuprisingly) strengthen our factional results

(Specific measures pick up the strongest ties)

Chris Adolph (UW) Tufte without tears 64 / 106

Page 101: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

'82 '87 '82 '87 '82 '87 '82 '87 '92 '97 '02 '82 '87

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

'87 '92 '92 '97 '02 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

Mao Faction

Long Marcher 2nd Field Army Vet Deng Faction Hu Yaobang Fact'n

Zhao Ziyang Faction

Jiang Zemin Faction

Hu Jintao Faction Princeling

Original Model Surprise Performance, ARMA(p,q)

Other critics worried about endogeneity or selection effects flowing frompolitical power to economic performance

We used measures of unexpected growth to zero in on an official’s ownperformance in office—which still nets zero political benefit

Chris Adolph (UW) Tufte without tears 65 / 106

Page 102: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

'82 '87 '92 '97 '02 '82 '87 '92 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40C

hang

e in

R

ank

Perc

entil

e

'82 '87 '92 '97 '02 '82 '87 '92 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

GDP Growth +1 sd Fiscal Revenue +1 sd

College Degree

Ethnic Minority

Female

Graduate Degree

Original Model Surprise Performance, ARMA(p,q)

But is there a more efficient way to show that our results stay essentially thesame?

Chris Adolph (UW) Tufte without tears 66 / 106

Page 103: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

'82 '87 '92 '97 '02 '82 '87 '92 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

'82 '87 '92 '97 '02 '82 '87 '92 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

GDP Growth +1 sd Fiscal Revenue +1 sd

College Degree

Ethnic Minority

Female

Graduate Degree

Original Model Ignore exact ACC ranks Job Overlap Only7 models controlling Surprise Performance, Five Factor Perform. & Schools

In our printed article, we show only this plot, which overlaps the full array ofrobustness checks

Chris Adolph (UW) Tufte without tears 67 / 106

Page 104: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

'82 '87 '82 '87 '82 '87 '82 '87 '92 '97 '02 '82 '87

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

'87 '92 '92 '97 '02 '97 '02 '82 '87 '92 '97 '02

−20

0

20

40

Cha

nge

in

Ran

k Pe

rcen

tile

Mao Faction

Long Marcher 2nd Field Army Vet Deng Faction Hu Yaobang Fact'n

Zhao Ziyang Faction

Jiang Zemin Faction

Hu Jintao Faction Princeling

Original Model Ignore exact ACC ranks Job Overlap Only7 models controlling Surprise Performance, Five Factor Perform. & Schools

Conveys hundreds of separate findings in a compact, readable form

No knowledge of Bayesian methods or partial rank coefficients required!

Chris Adolph (UW) Tufte without tears 68 / 106

Page 105: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Bonus Example: Allocation of Authority for Health Policy

Adolph, Greer, and Fonseca consider the problem of explaining whether local,regional, or national European government have power over specific healthpolicy areas and instruments

Areas: Pharamceuticals, Secondary/Tertiary, Primary Care, Public Health

Instruments: Frameworks, Finance, Implementation, Provision

Each combination for each country is a case

Fiscal federalism suggests lower levels for information-intensive policiesand higher levels for policies with spillovers or public goods

Also control for country characteristics and country random effects

With 3 nominal outcomes for each case, need a multilevel multinomial logit

Chris Adolph (UW) Tufte without tears 69 / 106

Page 106: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Bonus Example: Allocation of Authority for Health Policy

Adolph, Greer, and Fonseca consider the problem of explaining whether local,regional, or national European government have power over specific healthpolicy areas and instruments

Areas: Pharamceuticals, Secondary/Tertiary, Primary Care, Public Health

Instruments: Frameworks, Finance, Implementation, Provision

Each combination for each country is a case

Fiscal federalism suggests lower levels for information-intensive policiesand higher levels for policies with spillovers or public goods

Also control for country characteristics and country random effects

With 3 nominal outcomes for each case, need a multilevel multinomial logit

Chris Adolph (UW) Tufte without tears 69 / 106

Page 107: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Bonus Example: Allocation of Authority for Health Policy

Covariates:

Policy area NominalPolicy instrument NominalRegions old or new BinaryCountry size ContinuousNumber of regions ContinuousMountains CountinuousEthnic heterogeneity Continuous

Tricky part to the model:some cases have structural zeros for regions (when they don’t exist!)

Chris Adolph (UW) Tufte without tears 70 / 106

Page 108: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

How to set up counterfactuals?

We could set all but one covariate to the mean, then predict the probability ofeach level of authority given varied levels of the remaining covariate

We should do this separately for countries with and without regionalgovernments

Let’s fix everthing but policy instrument to the mean values, then simulate theprobability of authority at each level for each instrument

We show the results using a “nested” dot plot, made using ropeladder() inthe tile package

Chris Adolph (UW) Tufte without tears 71 / 106

Page 109: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

0 0.2 0.4 0.6 0.8 1

0 0.2 0.4 0.6 0.8 1

Two−tier countries

Probability of allocation

Probability of allocation

Provision

Implementation

Finance

Framework

0 0.2 0.4 0.6 0.8 1

0 0.2 0.4 0.6 0.8 1

Three−tier countries

Probability of allocation

Probability of allocation

state

state

state

state

region

region

region

region

local

local

local

local

state

state

state

state

local

local

local

local

Chris Adolph (UW) Tufte without tears 72 / 106

Page 110: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Special plots for compositional data

Probabilities have a special property: they sum to one

Variables that sum to a constraint are compositional

We can plot a two-part composition on a line,

and a three-part composition on a triangular plot

This makes it easier to show more complex counterfactuals, such as everycombination of policy area and instrument

But we also need to work harder to explain these plots

Chris Adolph (UW) Tufte without tears 73 / 106

Page 111: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Probability of allocation of authority by policy type

State Region

Local

0.8

2.08.

0

0.6

4.0

6.0

0.4

6.0

4.0

0.2

8.0

2.0

Public HealthPrimary CareSecondary/TertiaryPharmacueticals

•(Pha

rma-

ceut

ical

s)•

Finance

Framework

Implementation

Provision

Stat

e

Loca

l

●●

Prov

ision

Fina

nce

Impl

emen

tatio

n

Fram

ewor

k

Above holds country characteristics at their means

Chris Adolph (UW) Tufte without tears 74 / 106

Page 112: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Probability of allocation of authority by country type

State Region

Local

0.8

2 . 0 8 .

0

0.6

4 . 0

6 .

0

0.4

6 . 0

4 .

0

0.2

8 . 0

2 .

0

• Ethnically Homogeneous

Ethnically DiversePre-1973

Regions

SmallCountry

Post-1973 Regions

Typical Three Tier

Few Large Regions

Many SmallRegions

LargeCountry

Ethnically Homogeneous

Ethnically Diverse

LargeCountry

SmallCountry

Typical Two-TieredMountainous Country

Flat Country

Mountainous

Flat Country

Stat

e

Loca

l

Above holds policy area and instrument “at their means”

Chris Adolph (UW) Tufte without tears 75 / 106

Page 113: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Residual country effects

StateStat

e

Region

Local

Loca

l

0.8

2.08.

0

0.6

4.0

6.0

0.4

6.0

4.0

0.2

8.0

2.0

••

•••

Belgium

Hungary

Czech Republic

United Kingdom

Austria

Italy

Spain

Denmark

Germany

Sweden

Switzerland●

●●

Slovakia

Bulgaria

Latvia

Poland

Estonia

Lithuania

Netherlands

France

Finland

CyprusSlovenia

Ireland

Luxembourg

Romania

Malta

Portugal

Greece

Looking at the country random effects might suggest omitted variables

Chris Adolph (UW) Tufte without tears 76 / 106

Page 114: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Residual country effects

StateStat

e

Region

Local

Loca

l

0.8

2.08.

0

0.6

4.0

6.0

0.4

6.0

4.0

0.2

8.0

2.0

••

•••

Belgium

Hungary

Czech Republic

United Kingdom

Austria

Italy

Spain

Denmark

Germany

Sweden

Switzerland●

●●

Slovakia

Bulgaria

Latvia

Poland

Estonia

Lithuania

Netherlands

France

Finland

CyprusSlovenia

Ireland

Luxembourg

Romania

Malta

Portugal

Greece

On a previous iteration, mountainous countries clustered as high Pr(Regions)

Chris Adolph (UW) Tufte without tears 77 / 106

Page 115: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Lessons for practice of data analysis

Simulation + Graphics can summarize complex models for a broad audience

You might even find something you missed as an analyst

And even for fancy or complex models, we can and should show uncertainty

Payoff to programming: this is hard the first few times, but gets easier

Code is re-usable, and encourages more ambitious modeling

Chris Adolph (UW) Tufte without tears 78 / 106

Page 116: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Teaching with tile

tile helps clarify data and models in research

Also helps in teaching statistical models

I incorporate this software throughout our graduate statistics sequence

Greatly aids intuitive understanding of models

Find out much more, and download the software, from:

faculty.washington.edu/cadolph

Chris Adolph (UW) Tufte without tears 79 / 106

Page 117: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Building a scatterplot: Supplementary exploratory example

In my graphics class, I have students build a scatterplot “from scratch”

This helps us see the many choices to make, and implications for:

1 perception of the data

2 exploration of relationships

3 assessment of fit

A good warm up for tile before the main event (application to models)

See how tile helps follow Tufte recommendations

Chris Adolph (UW) Tufte without tears 80 / 106

Page 118: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Building a scatterplot: Redistribution example

Data on

political party systems

and

redistributive effort from various industrial countries

Source of data & basic plot:

Torben Iversen & David Soskice, 2002,“Why do some democracies redistribute more than others?” HarvardUniversity.

Chris Adolph (UW) Tufte without tears 81 / 106

Page 119: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Building a scatterplot: Redistribution example

Concepts for this example (electoral systems and the welfare state):

Effective number of parties:

# of parties varies across countries

electoral rules determine #

I Winner take all (US)→∼ 2 parties.

I Proportional representation→ more parties

To see this, need to discount trivial parties.

Poverty reduction:

Percent lifted out of poverty by taxes and transfers.

Poverty = an income below 50% of mean income.

Chris Adolph (UW) Tufte without tears 82 / 106

Page 120: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Building a scatterplot: Redistribution example

Concepts for this example (electoral systems and the welfare state):

Effective number of parties:

# of parties varies across countries

electoral rules determine #

I Winner take all (US)→∼ 2 parties.

I Proportional representation→ more parties

To see this, need to discount trivial parties.

Poverty reduction:

Percent lifted out of poverty by taxes and transfers.

Poverty = an income below 50% of mean income.

Chris Adolph (UW) Tufte without tears 82 / 106

Page 121: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Building a scatterplot: Redistribution example

Concepts for this example (electoral systems and the welfare state):

Effective number of parties:

# of parties varies across countries

electoral rules determine #

I Winner take all (US)→∼ 2 parties.

I Proportional representation→ more parties

To see this, need to discount trivial parties.

Poverty reduction:

Percent lifted out of poverty by taxes and transfers.

Poverty = an income below 50% of mean income.

Chris Adolph (UW) Tufte without tears 82 / 106

Page 122: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Building a scatterplot: Redistribution example

Concepts for this example (electoral systems and the welfare state):

Effective number of parties:

# of parties varies across countries

electoral rules determine #

I Winner take all (US)→∼ 2 parties.

I Proportional representation→ more parties

To see this, need to discount trivial parties.

Poverty reduction:

Percent lifted out of poverty by taxes and transfers.

Poverty = an income below 50% of mean income.

Chris Adolph (UW) Tufte without tears 82 / 106

Page 123: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

100

90

80

70

60

50

40

30

20

10

0

po

vred

ct

14121086420enpart

Initial plotting area is often oddly shaped (I’ve exaggerated)

Plotting area hiding relationship here. Sometimes can even exclude data!

Filled circles: okay for a little data; open is better when data overlap

Chris Adolph (UW) Tufte without tears 83 / 106

Page 124: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

70

60

50

40

30

20

po

vred

765432enpart

Sensible, data based plot limits

Appears to be a curvilinear relationship. Can bring that out with. . .

Chris Adolph (UW) Tufte without tears 84 / 106

Page 125: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

70

60

50

40

30

20

po

vred

ct

0.80.70.60.50.40.3logpart

Log scaling.

But what logarthmic base? And why print the exponents?

Chris Adolph (UW) Tufte without tears 85 / 106

Page 126: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

70

60

50

40

30

20

po

vred

ct

2 3 4 5 6 7 8 910

enpart

Combine a log scale with linear labels. Now everyone can read. . .

Next: Axis labels people can understand

Chris Adolph (UW) Tufte without tears 86 / 106

Page 127: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

70

60

50

40

30

20

Per

cen

t lif

ted

ou

t o

f po

vert

y b

y ta

xes

and

tra

nsf

ers

2 3 4 5 6 7 8 910

Effective number of political parties (log10 scale)

So what’s going on the data? What are those outliers?

Chris Adolph (UW) Tufte without tears 87 / 106

Page 128: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

70

60

50

40

30

20

Per

cen

t lif

ted

ou

t o

f po

vert

y b

y ta

xes

and

tra

nsf

ers

2 3 4 5 6 7 8 910

Effective number of political parties (log10 scale)

Australia

Belgium

Canada

DenmarkFinland

France

Germany

Italy

NetherlandsNorwaySweden

Switzerland

United Kingdom

United States

With little data & big outliers, show the name of each case

Now we can try to figure out what makes the US and Switzerland so different

Chris Adolph (UW) Tufte without tears 88 / 106

Page 129: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

80

70

60

50

40

30

20

10

srefsnart dn a sex at yb y tre vop fo t uo detf il tnecreP

2 3 4 5 6 7 8 910

Effective number of political parties (log10 scale)

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian Proportional Unanimity

Electoral system

Chris Adolph (UW) Tufte without tears 89 / 106

Page 130: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 70

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian

Proportional

Unanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

Plot redone using scatter (tile package in R)

Chris Adolph (UW) Tufte without tears 90 / 106

Page 131: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 70

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian

Proportional

Unanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

Scatterplots relate two distributions.

Why not make those marginal distributions explicit?

Chris Adolph (UW) Tufte without tears 91 / 106

Page 132: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 7

0

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian

Proportional

Unanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

Rugs accomplish this by replacing the axis lines with the plots

We could choose any plotting style: from the histogram-like dots. . .

Chris Adolph (UW) Tufte without tears 92 / 106

Page 133: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 7

0

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian

Proportional

Unanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

. . . to a strip of jittered data. . .

Chris Adolph (UW) Tufte without tears 93 / 106

Page 134: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 7

0

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian

Proportional

Unanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

. . . to a set of very thin lines marking each observation

Because we have so few cases, thin lines work best for this example

Chris Adolph (UW) Tufte without tears 94 / 106

Page 135: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 7

0

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian

Proportional

Unanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

Let’s add a parametric model of the data: a least squares fit line

tile can do this for us

Chris Adolph (UW) Tufte without tears 95 / 106

Page 136: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 7

0

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian

Proportional

Unanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

But we don’t have to be parametric

A local smoother, like loess, often helps show non-linear relationships

Chris Adolph (UW) Tufte without tears 96 / 106

Page 137: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 7

0

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian

Proportional

Unanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

M-estimators weight observations by an influence function to minimize theinfluence of outliers

Chris Adolph (UW) Tufte without tears 97 / 106

Page 138: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 7

0

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian

Proportional

Unanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

Even with an M-estimator, every outlier has some influence

Thus any one distant outlier can bias the result

Chris Adolph (UW) Tufte without tears 98 / 106

Page 139: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 7

0

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

Majoritarian

Proportional

Unanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

A robust and resistant MM-estimator, shown above, largely avoids thisproblem

Only a (non-outlying) fraction of the data influence this fit.rlm(method="MM")

Chris Adolph (UW) Tufte without tears 99 / 106

Page 140: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

2 3 4 5 6 7

0

20

40

60

80

Australia

Belgium

Canada

Denmark

Finland

France

Germany

Italy

NetherlandsNorway

Sweden

Switzerland

United Kingdom

United States

MajoritarianProportionalUnanimity

Effective number of parties

% li

fted

fro

m p

ove

rty

by

taxe

s &

tra

nsf

ers

In our final plot, we add 95 percent confidence intervals for the MM-estimator

A measure of uncertainty is essential to reader confidence in the result

Chris Adolph (UW) Tufte without tears 100 / 106

Page 141: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Syntax for redistribution scatter

library(tile)

# Load datadata <- read.csv("iver.csv", header=TRUE)attach(data)

Chris Adolph (UW) Tufte without tears 101 / 106

Page 142: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Syntax for redistribution scatter

# First, collect all the data inputs into a series of "traces"

# The actual scattered pointstrace1 <- scatter(x = enp, # X coordinate of the data

y = povred, # Y coordinate of the datalabels = cty, # Labels for each point

# Plot symbol for each pointpch = recode(system,"1=17;2=15;3=16"),

# Color for each pointcol = recode(system,"1=’blue’;2=’darkgreen’;3=’red’"),

# Offset text labelslabelyoffset = -0.03, # on npc scale

# Fontsizefontsize = 9,

Chris Adolph (UW) Tufte without tears 102 / 106

Page 143: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Syntax for redistribution scatter

# Marker sizesize = 1, # could be vector for bubble plot

# Add a robust fit line and CIfit = list(method = "mmest", ci = 0.95),

# Which plot(s) to plot toplot = 1)

# The rugs with marginal distributionsrugX1 <- rugTile(x=enp, type="lines", plot = 1)

rugY1 <- rugTile(y=povred, type="lines", plot = 1)

Chris Adolph (UW) Tufte without tears 103 / 106

Page 144: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Syntax for redistribution scatter

# A legendlegendSymbols1 <- pointsTile(x= c(1.8, 1.8, 1.8),

y= c(78, 74, 70),pch=c(17, 15, 16),col=c("blue", "darkgreen", "red"),fontsize = 9,size=1,plot=1)

legendLabels1 <- textTile(labels=c("Majoritarian","Proportional","Unanimity"),

x= c(2.05, 2.05, 2.05),y= c(78, 74, 70),pch=c(17, 15, 16),col=c("blue", "darkgreen", "red"),fontsize = 9,plot=1)

Chris Adolph (UW) Tufte without tears 104 / 106

Page 145: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Syntax for redistribution scatter

# Now, send that trace to be plotted with tiletile(trace1, # Could list as many

rugX1, # traces here as we wantrugY1, # in any orderlegendSymbols1,legendLabels1,

# Some generic options for tileRxC = c(1,1),#frame = TRUE,#output = list(file="iverson1", width=5, type="pdf"),height = list(plot="golden"),

# Limits of plotting regionlimits=c(1.6, 7.5, 0, 82),

Chris Adolph (UW) Tufte without tears 105 / 106

Page 146: Tufte Without Tears eserved@d = *@let@token @let@token ...Tufte Without Tears Flexible tools for visual exploration and presentation of statistical models Christopher Adolph Department

Syntax for redistribution scatter

# x-axis controlsxaxis=list(log = TRUE,at = c(2,3,4,5,6,7)),

# x-axis title controlsxaxistitle=list(labels=c("Effective number of parties")),

# y-axis title controlsyaxistitle=list(labels=c("% lifted from poverty by taxes & transfers")),

# Plot titlesplottitle=list(labels=("Party Systems and Redistribution")))

Chris Adolph (UW) Tufte without tears 106 / 106