

Labor Unions and Unequal Representation*

Michael Becher†

Daniel Stegmueller‡

This version: January 2019

Abstract

Recent research has documented that lawmakers are more responsive to the views of the affluent than to the less well-off. This raises the important question of whether there are institutions that can limit unequal representation. We argue that labor unions play this role and we provide evidence from the contemporary U.S. House of Representatives. Our novel dataset combines income-specific estimates of constituency preferences, based on 223,000 survey respondents matched to 27 roll-call votes, with measures of district-level union strength drawn from 350,000 administrative records. Exploiting within-district variation in preference polarization, within-state variation in union strength, and rich data on confounds, our analysis rules out a host of alternative explanations. In contrast to the view that unions have become too weak or fragmented to matter, they significantly dampen unequal responsiveness: a standard deviation increase in union membership increases legislative responsiveness towards the poor by about 6 to 8 percentage points.

* We thank John Ahlquist, Lucio Baccaro, Alexander Hertel-Fernandez, Patricia A. Kirkland, Clara Park, Jonas Pontusson, and Elizabeth Rigby for very helpful comments on earlier drafts of the paper. We are also grateful for feedback from conference/seminar participants at the annual meetings of APSA (2017) and MPSA (2018), IAST/TSE, and the Geneva Workshop on Unions and the Politics of Inequality. We thank Konstantin Käppner for contributing to an earlier version and Spencer Dorsey and Marco Morucci for excellent research assistance. Becher gratefully acknowledges financial support from the Agence Nationale de la Recherche (ANR)-Labex IAST. Stegmueller’s research was supported by the National Research Foundation of Korea (NRF-2017S1A3A2066657).

† Institute for Advanced Study in Toulouse, University of Toulouse 1 Capitole, michael.becher@iast.fr

‡ Duke University, daniel.stegmueller@duke.edu

I. Introduction

Over the last 15 years or so, political scientists have paid increasing attention to the link between economic inequality and political representation. In contrast to the principle of political equality that is central to the ideal of democratic governance, this vibrant strand of research has repeatedly found disparities in political representation by income. Specifically, elected officials and policy outcomes are more responsive to the views of affluent citizens than to middle-income and low-income citizens, and sometimes they are not responsive to low-income citizens at all.¹ As summarized by Bartels (2016, 235), evidence of unequal representation has been found for legislators, party platforms, national policy, and state policy. While scholarship has initially focused mostly on the United States, recent comparative work has revealed similar patterns of unequal representation across a larger range of political systems (Bartels 2017; Elsässer et al. 2017; Lupu and Warner 2017), including democracies with proportional electoral systems and multi-party governments that had been previously associated with kinder and gentler (and presumably more equal) representation (Lijphart 1999). Given these results, it is germane to ask whether there are institutions or organizations that dampen unequal responsiveness in the democratic process.

In this paper, we argue that stronger labor unions systematically decrease the extent of unequal representation by elected representatives in the U.S. They do so even in a context of high income inequality, expensive electoral campaigns, and comparatively low union membership. Existing social science research has documented that union membership is associated with lower income differentials in political participation (Leighley and Nagler 2007; Rosenfeld 2014). Moreover, unions tend to take positions favored by less affluent citizens (Gilens 2012), and they are one of the few organizations in national politics that advocate on the behalf of non-managerial workers, spending a substantial amount of resources in the process (Schlozman et al. 2012). Some also take costly strike action in the interest of others (Ahlquist and Levy 2013).²

However, the literature provides little evidence on whether unions actually cause a meaningful reduction in the pro-affluent bias of national politicians. Several scholars of representation suggest that unions have become too weak, too narrow, or too fragmented to have a significant egalitarian political impact in national policymaking (Gilens 2012, 175; Hacker and Pierson 2010, 143). Moreover, a key issue is that the relationship between union strength and more equal responsiveness by politicians may be spuriously driven by the same underlying determinants. For example, due to differences in social capital (Putnam 1993, 2000), workers in some electoral districts may be better at solving collective action problems than others. As a result, they would be more likely to unionize their workplace in the first place and, independently, politicians would be more responsive to them. Another concern is that the activity of unions may influence “parties and policy, but policy and institutions also affect unionization rates” (Ahlquist 2017, 427). While collective action problems dilute incentives of politicians to make politics using policies, in some circumstances they are overcome (Anzia and Moe 2016; Hacker and Pierson 2010). Thus, unequal representation may produce policies that make it more difficult to organize unions in the first place. In particular, ‘right-to-work’ and collective bargaining laws hamper unionization efforts, and recent research demonstrates that these laws can have profound political effects (Feigenbaum et al. 2018; Flavin and Hartney 2015).

¹ For instance, see Bartels (2008, ch. 9), Bartels (2016, ch. 8), Bhatti and Erikson (2011), Flavin (2012), Ellis (2013), Gilens (2012), Gilens and Page (2014), Rhodes and Schaffner (2017), Rigby and Wright (2013). For examples of different findings or interpretations, see Brunner et al. (2013), Enns (2015), Erikson (2015).

² See Ahlquist (2017) for a review of the large interdisciplinary literature on union effects.

Our empirical strategy addresses these problems based on a combination of fine-grained data, a within-district research design, and robust inferential models. We assess our argument using the contemporary Congress, where unequal responsiveness by elected representatives and their policy choices has been well documented (Bartels 2008, 2016; Ellis 2013; Gilens 2012; Rhodes and Schaffner 2017) and the playing field for organized interest is skewed against the less affluent (Schlozman et al. 2012). We focus on members of the House of Representatives during the 109th–112th Congresses (2005–2012), since this setting enables us to capture within-state variation in union strength as well as within-district variation in preference polarization by income across a large number of policy issues. Our design provides leverage to rule out alternative explanations using state and district fixed effects and allows us to measure theoretically important confounders not accounted for in previous work.

At its core, our dataset combines estimated income-specific measures of constituency preferences, based on 223,000 survey respondents matched to 27 roll-call votes, with information on local unions extracted from more than 350,000 administrative records. To measure district-level policy preferences, we use multiple waves of the Cooperative Congressional Election Study (CCES) and calculate preferences on 27 concrete policy issues for each income group in each congressional district. We employ small area estimation, as the CCES is not designed to be representative at the district level (we also show that our findings are robust to using alternative approaches such as multilevel regression and poststratification [MRP]). To measure the district-level strength of unions, we use mandatory reports filed by local unions to the Department of Labor. Following recent work by Becher et al. (2018), this largely neglected administrative data source is used to construct measures of union membership at the district level. This measurement strategy overcomes major limitations of standard survey data used to measure union strength.³

Our empirical analysis traces the legislative responsiveness of House members to the preferences of different income groups in their constituency, conditional on district-level union strength. We find that district-level union membership dampens unequal responsiveness by national legislators. In line with previous research, on average House members are significantly less responsive to the policy preferences of low-income constituents. However, this gap in responsiveness is smaller where unions are stronger, and it decreases significantly where union members are numerous. This moderating effect of unions is not an artifact of existing state-level union policies or largely time-invariant state-level or district-level unobservables (such as institutions, history, or culture). Extended specifications allow other district-level characteristics to also moderate legislative responsiveness to different income groups. They demonstrate that the union effect is not driven by district-specific levels of socio-economic factors such as education, race, gender, median household income, urbanization, or a district’s employment structure. We also rule out the possibility that our finding simply represents the general capacity of workers or people to organize (or be organized) by accounting for explicit measures of district-level organization capacity based on new data on unionization attempts from the National Labor Relations Board, the predominance of religious organizations, and behavioral measures of social capital. We also go beyond standard regression models and employ estimates from a Double Selection Estimator (Belloni et al. 2014) and Kernel Regularized Least Squares (Hainmueller and Hazlett 2014) to show that the moderating effect of unions is robust to relaxing potentially important modeling assumptions.

³ Prior research is almost exclusively based on survey data that are not suited for a district-level analysis due to missing identifiers or sampling design. In contrast to surveys, moreover, filing LM forms is mandatory for most unions, non-submission and incorrect submissions are penalized, reports are audited, and they contain precise geographic information.

An exploration of possible mechanisms points to campaign contributions and partisan selection as two relevant channels through which local unions enhance the representation of the less well-off. Relatedly, the equality-enhancing effect of unions is stronger for bills on which the largest union confederation, AFL-CIO, has staked out a clear position.

We are aware of only two previous investigations of the effect of organized labor on unequal responsiveness, and they differ considerably in their approach from ours. Focusing on a recent cross-section of 47 U.S. states, Flavin (2018) shows that states with stronger unions exhibit less unequal representation, as estimated from regressions of income-weighted voter preferences on state-level policy liberalism. Studying the 110th House of Representatives, Ellis (2013) finds mixed results: district-level unionization is related to a smaller rich-poor gap for key legislative votes, but there is no such effect for overall ideological representation. Our analysis addresses the problem that survey samples are not representative for congressional districts by design, which can lead to biased estimates of income biases in representation. It confirms the finding that unions are linked to more equal representation, covering four Congresses and three times as many roll call votes as studied by Ellis (2013). However, our main empirical contribution is that we can go much further in ruling out alternative explanations. Our research design leverages within-district variation in preference polarization, within-state variation in union strength, as well as extensive district-level data on alternative moderating factors that may be bundled with union strength. In contrast to the pure cross-sectional designs of these two previous studies, our analysis can thus account for state and district fixed effects that capture important sources of unobserved heterogeneity, and it directly measures many important confounders. As a result, we can state with more confidence that the impact of unions is not spurious.

Against the backdrop of current scientific and public debates about labor unions and political representation, these findings may come as somewhat of a surprise. While some strands of research and political discourse portray unions as an egalitarian force in politics, others see them as fatally weakened, effects as much as causes of unequal representation, or simply as just another organized group fighting for special interests (that do not generally overlap with those of lower income individuals). The latter view is held by a large strand of scholarship in economics (cf. Freeman and Medoff 1984) and by researchers studying the role of teachers’ unions in political science (Anzia 2011; Moe 2011). Most extant research on Congress simply does not have the required data to directly assess the effect of unions on representational equality. Numerous studies of union strength and congressional roll-call voting do not measure voter preferences, which makes it difficult to interpret who is being represented (Becher et al. 2018; Box-Steffensmeier et al. 1997; Freeman and Medoff 1984).

Altogether, our results suggest that unequal responsiveness is not an unavoidable feature of democratic capitalism. The results are especially striking given that recent cross-national studies have found consistent patterns of unequal representation across different political institutions (Bartels 2017; Lupu and Warner 2017). In contrast, we find considerable heterogeneity in differential responsiveness across districts affected by local labor unions, a fundamental economic institution. The moderating effect of unions uncovered in our analysis is large enough to swing key votes in Congress. That said, our results support the view that political efforts to (further) weaken unions, as evidenced in recent reforms in states like Michigan and Wisconsin, are, if anything, likely to exacerbate unequal responsiveness in representation. They may also explain why unions are (still) under attack.

II. Moderating biased responsiveness in Congress

While few studies have directly assessed the impact of labor unions on unequal responsiveness in Congress or elsewhere, various strands of scholarship in political science and related fields suggest that labor unions are one of the few mass-membership organizations that provide collective voice to lower income individuals in the political arena, with potentially important consequences for political representation (Ahlquist and Levy 2013; Bartels 2016; Freeman and Medoff 1984; Schlozman et al. 2012). Consistent with a central premise of the collective voice perspective, unions tend to take positions favored by less affluent citizens. Gilens (2012, 154-161) compares public positions of national unions with mass policy preferences across several hundred policy issues and finds that unions’ positions are most strongly correlated with the preferences of the less well-off (see also Hacker and Pierson 2010; Schlozman 2015).⁴ Similarly, Schlozman et al. (2012, 87) conclude that unions are one of the few organizations in national politics “that advocate on behalf of the economic interest of workers who are not professionals or managers.”

However, shared preferences between the less well-off and organized labor are by no means sufficient to alter inequalities in political representation in national politics. This requires an effective political transmission mechanism. To guide the empirical analysis, we sketch key elements of a framework of union organization and political responsiveness.

Labor unions are organizations formed to bargain collectively on behalf of their members with employers over wages and conditions. Unions are thus created at the local (i.e., establishment) level (Freeman and Medoff 1984). Once formed, unions may (and often do) enter the political arena. The ability of unions to increase the rate of political participation (including voting, contacting officials, attending rallies, or making donations) of low- and middle-income citizens is often considered to be their key channel of political influence. Importantly, unions may also increase participation among non-members with similar policy preferences through get-out-the-vote campaigns and social networks (Leighley and Nagler 2007; Rosenfeld 2014; Schlozman et al. 2012). Making contributions to favored candidates and campaigns complements the ability of unions to communicate with and mobilize members or to provide campaign volunteers. Indeed, unions are among the leading contributors to political action committees (PAC), accounting for a quarter of total PAC spending in 2009 (Schlozman et al. 2012, ch. 14). In contrast to corporations and business organizations, union contributions “represent the aggregation of a large number of small individual donations” (Schlozman et al. 2012, 428).⁵

The credible threat of political mobilization can affect policy decisions by representatives in two general ways. First, it may shape who is elected in a given electoral district. If politicians are not exchangeable (because they differ in their preferences and beliefs), political selection is important. In an age of elite polarization (McCarty et al. 2006), the partisan identity of a representative is often crucial for determining legislative voting (Bartels 2016; Lee et al. 2004). Since the New Deal era, unions and union members have largely allied with the Democratic Party, given its stronger support for many of their broader policy demands (Lichtenstein 2013; Schlozman 2015). Political selection might also shape other political characteristics of representatives, such as their class background or race (Butler 2014; Carnes 2013).

Second, unions’ mobilization potential shapes the incentives of elected representatives beyond their partisan affiliation and personal traits. Policymakers’ rational anticipation of public reactions plays a central role in theories of accountability and dynamic responsiveness (Arnold 1990; Stimson et al. 1995). While many individual legislative votes do not affect the reelection prospects of representatives, on potentially salient votes they can face hard choices between party ideology and competing constituency preferences. On international trade agreements, for instance, Democratic representatives have faced cross-pressures between a more skeptical stance taken by unions and low-income constituents versus that of their own party (Box-Steffensmeier et al. 1997). On the other side of the aisle, in the wake of the financial crisis, Republican legislators found themselves torn between their own partisan views on stimulus spending and the pressure from less well-off constituents (Mian et al. 2010).

⁴ This is consistent with the argument that organized labor fosters norms of solidarity and support for the less well-off through leadership (Ahlquist and Levy 2013; Kim and Margalit 2017) or social interactions (Berelson et al. 1954).

⁵ While evidence on the direct effect of contributions on legislative behavior is mixed, recent field-experimental results indicate that contributions help to provide access (Kalla and Broockman 2016) or sway congressional staffers (Hertel-Fernandez et al. 2018).

Politicians’ incentives are also linked to information. Theories of representation emphasize that members of Congress, and especially the House, face numerous voting decisions in each term, and it would be unrealistic to assume that they have access to reliable, unbiased polling data on constituency preferences on all the issues they face (Arnold 1990; Miller and Stokes 1963). Instead, representatives, with the help of their staffers, rely on alternative methods to assess public opinion, including constituent correspondence, town halls, contacts with community leaders, or local interest groups (Miler 2007). In this limited information context, the strength of local unions may enhance the visibility and perception of constituent preferences (Hertel-Fernandez et al. 2018).⁶

Following seminal theories of congressional action (Arnold 1990; Miller and Stokes 1963), our argument emphasizes that the strength of local unions underpins a credible mobilization threat that impacts the action of candidates and legislators. Anticipating mobilizing efforts by unions, a potential candidate may not even enter into the race; an elected, career-oriented politician might be pressured to alter his or her vote even without a full mobilization effort, as long as unions’ mobilization capacity is visible. Thus, both campaign contributions and candidate selection should matter as a channel linking local union strength and representation, since they are linked to credible threats of mobilization.

Our argument implies that the district-level strength of labor unions increases the responsiveness by members of Congress to the less affluent. While we know from previous work that politicians are considerably more responsive to the preferences of the affluent than those of the less well-off, this bias should be reduced in districts with relatively higher union membership. Substantively, it is crucial to assess how far the presence of unions can move responsiveness toward the ideal of political equality.⁷
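The implication above can be read as an interaction: responsiveness to low-income preferences should rise with district union membership. The following is a minimal sketch of that logic on simulated, noise-free data. It is not the specification estimated in this paper. The variable names (`low`, `high`, `low_x_union`), the coefficient values, and the continuous toy outcome are all assumptions made for illustration. District fixed effects are removed by demeaning within districts, which also absorbs the main effect of union membership because it is constant within a district.

```python
import random

def demean_within(rows, key, cols):
    """Within transformation: subtract each district's mean from every column."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    out = []
    for members in groups.values():
        means = {c: sum(m[c] for m in members) / len(members) for c in cols}
        for m in members:
            out.append({c: m[c] - means[c] for c in cols})
    return out

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical coefficients, chosen arbitrarily for this illustration.
TRUE = {"low": 0.10, "high": 0.50, "low_x_union": 0.07}

rng = random.Random(0)
rows = []
for district in range(40):
    union = rng.gauss(0, 1)   # standardized membership, constant within district
    alpha = rng.gauss(0, 1)   # unobserved district effect
    for _ in range(10):       # several roll calls per district
        low, high = rng.random(), rng.random()  # group support for the bill
        y = (alpha + TRUE["low"] * low + TRUE["high"] * high
             + TRUE["low_x_union"] * low * union)
        rows.append({"district": district, "y": y, "low": low,
                     "high": high, "low_x_union": low * union})

within = demean_within(rows, "district", ["y", "low", "high", "low_x_union"])
X = [[r["low"], r["high"], r["low_x_union"]] for r in within]
y = [r["y"] for r in within]
b_low, b_high, b_interaction = ols(X, y)
print(round(b_low, 3), round(b_high, 3), round(b_interaction, 3))
```

Because the toy outcome contains no noise, the within estimator recovers the assumed coefficients. The interaction coefficient is identified only because group preferences vary across roll calls inside each district.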

⁶ Butler and Nickerson (2011) find that politicians respond when provided with more accurate opinion data. However, behavioral biases may lead politicians to discount constituent preferences they disagree with (Butler and Dynes 2016).

⁷ In line with a large literature, we focus on union membership as a key component of union strength. In a study of the effect of unions on legislative ideology, rather than income-biased responsiveness, Becher et al. (2018) argue that the structure of local unions (i.e., the concentration of unions in a given locality) matters as well. However, they also show empirically that union density and concentration are separable dimensions. In this paper, we focus on union membership, but show in a robustness test that our results still obtain when accounting for union concentration (see Table E1).

III. Data and Empirical Strategy

Any effort to test the relevance of unions for unequal representation confronts major challenges of measurement and causal interpretation. The dataset we have compiled allows us to address these issues to an extent previously impossible. We have created a panel of legislators’ roll call votes matched to income-specific policy preferences at the district level and district-level measures of union membership. Our main empirical strategy to examine the influence of unions on unequal representation is built on two basic pillars: district fixed effects and interactive controls. The fact that we observe several roll calls within a given congressional district allows us to specify a model with district fixed effects, which capture unobservable characteristics of districts (and states) that are constant over roll calls, such as historical legacies or the strength of partisan organization. To provide for a stricter test of the moderating effect of unions, we also allow a rich set of other district characteristics to moderate the link between income groups and legislators’ voting behavior. This amounts to estimating models including interactions between observed district characteristics and group preferences. In our most flexible specification, we allow these to be non-linear (we describe our models in more detail below).

The data required to implement these models were constructed in three steps. First, we match information on roll call items for 223,000 CCES respondents to actual roll call votes cast in the House of Representatives in the 109th to the 112th Congress.⁸ Second, we estimate policy preferences for low and high income constituents in each district for 27 roll calls. To deal with the fact that the CCES is not a representative sample of district populations, we use a small area estimation strategy, combining the CCES sample with unit record Census data, matching the full distribution of age, education, gender, race, and income using a chained Random Forests algorithm (more below and in Appendix B). Third, we measure district-level union membership based on digitized administrative records from the Department of Labor.

III.A. CCES data and Congressional roll calls

The CCES is an ideal starting point for our analysis, since it is a nationally representative study, includes a considerable number of roll call questions, and provides us with a large enough sample size to decompose income-group preferences by district. It addresses several data concerns that plagued initial research on unequal responsiveness in Congress (Bhatti and Erikson 2011). The roll calls included in the CCES concern key votes as identified by Congressional Quarterly and the Washington Post and cover a broad range of issues (Ansolabehere and Jones 2010). Respondents are presented with the key wording of the bill (as used on the floor and in media reports) and are then asked to cast their own vote: “What about you? If you were faced with this decision, would you vote for, against, or not sure?” In contrast to the commonly used agree–disagree survey measures of issue preferences, matched roll call votes provide us with unequivocal evidence of policy congruence between respondent and legislator (Jessee 2009; Ansolabehere and Jones 2010, 585). We match 27 roll call items in the CCES to roll call votes cast in the House of the 109th to 112th Congress. These cover important legislative decisions, such as Dodd-Frank, the Affordable Care Act (and attempts to repeal it), the minimum wage increase, the ratification of the Central America Free Trade Agreement, or the Lilly Ledbetter Fair Pay Act. Table A1 in the Appendix lists all matched CCES items and House bills included in our estimation sample.

⁸ Our analysis focuses on one apportionment period, which generally holds district boundaries constant (we show that the results are robust to cases of mid-period redistricting).

III.B. Measuring constituency preferences by income group

The CCES provides us with a comparatively large sample size per district. However, an important potential issue is that it is not designed to be representative for congressional district populations. Thus, individuals with certain characteristics, such as particular combinations of income, race, and education, may be underrepresented in the CCES sample for a given district. If this is the case, unadjusted policy preferences from the CCES will not reflect the target population, and using them can lead to biased estimates of unequal representation in Congress, as politicians are held to the wrong benchmark. The solution to this issue is to employ some form of small area estimation to rebalance the survey sample to represent the district population. The machine-learning solution we propose is relatively new to the representation literature in political science, but it has some attractive features that merit its application to this topic. It does not require distributional and functional form assumptions, it allows for arbitrary higher-order interactions of covariates, and it can fully leverage fine-grained census data to construct representative samples of congressional districts. However, we stress that our findings do not depend on this particular approach. As shown in Online Appendix B, our approach leads to somewhat more conservative estimates of the impact of unions on the representation of different income groups compared to the MRP approach widely used by political scientists (Lax and Phillips 2009). Qualitatively, both approaches yield the same conclusions.

Our approach, small area estimation using chained random forests, matches CCES survey respondents to corresponding cases from unit record Census data. The design of the Census ensures an accurate representation of the distribution of population characteristics in a given district (Torrieri et al. 2014, Ch. 4). Matching these two data sources is essentially a prediction problem, which we address using a flexible non-parametric machine learning approach based on random forests (Stekhoven and Bühlmann 2011).⁹ Put simply, the idea is that rich census data exist for every district, whereas survey data on preferences are scarce in some districts and may not be fully representative. Using general machine learning tools, we can attach preferences to the Census by matching it to CCES respondents based on common demographic characteristics. The resulting data set of public preferences is representative of congressional districts.

⁹ Honaker and Plutzer (2016) use a similar approach (but relying on multivariate normal imputations) and further discuss its empirical performance in estimating small area attitudes and preferences.

Concretely, we use about 3 million individual-level records from a synthetic sample of the Census Bureau’s American Community Survey from 2006 to 2011. We stack both datasets, creating a structure where we have common district identifiers and individual covariates, while responses to policy preference questions are missing in the Census portion of the data. As common covariates bridging CCES and Census, we use the following demographic characteristics: gender, race (3 categories), education (5 categories), age (continuous), and family income (continuous).¹⁰ The latter is of particular relevance, as we are interested in producing district–income group specific preferences.

In the next step, we fill missing roll call preferences in the Census with matching data from CCES respondents. Since this is essentially a prediction problem, we can use powerful tools developed in the machine learning literature to achieve this task. We use an algorithm proposed by Stekhoven and Bühlmann (2011), which uses chained random forests (Breiman 2001) to impute missing cells. Compared to commonly used multivariate normal or regression imputation techniques, this strategy has the advantage that it is fully nonparametric, allowing for complex interactions between covariates, and deals with both continuous and categorical data (Tang and Ishwaran 2017). Our completed data set now contains preferences for 27 roll call items of synthetic ‘Census individuals’, which are a representative sample of each House district.
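The stack-and-fill structure described above can be illustrated with a deliberately simpler technique. The sketch below does not implement chained random forests. It swaps in cell-mean imputation with fallback to coarser cells, purely to show how survey preferences are attached to census records through shared covariates. All field names (`district`, `gender`, `educ`, `support`) and the toy records are hypothetical.

```python
from collections import defaultdict

def impute_cell_means(survey, census, keys=("district", "gender", "educ")):
    """Fill census preferences with survey cell means, coarsening cells as needed.

    This is a simplified stand-in for the imputation step, NOT a random forest.
    """
    # Build one lookup table per cell definition, from finest to coarsest.
    tables = []
    for n in range(len(keys), 0, -1):
        sums = defaultdict(lambda: [0.0, 0])
        for r in survey:
            cell = tuple(r[k] for k in keys[:n])
            sums[cell][0] += r["support"]
            sums[cell][1] += 1
        means = {cell: total / count for cell, (total, count) in sums.items()}
        tables.append((keys[:n], means))
    overall = sum(r["support"] for r in survey) / len(survey)

    completed = []
    for r in census:
        value = overall  # last resort when no matching cell exists
        for cell_keys, means in tables:
            cell = tuple(r[k] for k in cell_keys)
            if cell in means:
                value = means[cell]
                break
        completed.append({**r, "support": value})
    return completed

# Toy survey records: preferences are observed (1 = would vote for the bill).
survey = [
    {"district": "A", "gender": "f", "educ": "hs", "support": 1},
    {"district": "A", "gender": "f", "educ": "hs", "support": 0},
    {"district": "A", "gender": "m", "educ": "ba", "support": 1},
    {"district": "B", "gender": "f", "educ": "ba", "support": 0},
]
# Toy census records: same covariates, preferences missing.
census = [
    {"district": "A", "gender": "f", "educ": "hs"},  # exact cell available
    {"district": "A", "gender": "m", "educ": "hs"},  # falls back to (A, m)
    {"district": "B", "gender": "m", "educ": "hs"},  # falls back to (B,)
]
completed = impute_cell_means(survey, census)
print([r["support"] for r in completed])
```

A random forest replaces the fixed cell hierarchy with data-driven splits and can use continuous covariates such as age and income directly, which is the advantage the text attributes to the nonparametric approach.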

With these data in hand, we assign individuals to income groups and calculate group-specific preferences for each roll call in each district. Following previous work in the representation literature (Bartels 2008, 2016), we delineate low- and high-income respondents using the 33rd and 67th percentiles of the distribution of family incomes. Note that, in line with theories of constituency representation in Congress, we specify these income thresholds separately by congressional district. This accounts for the substantial differences in both average income and income inequality between U.S. districts. It also ensures that, within each district, income groups are of comparable size. Online Appendix Table A2 shows the distribution of income-group cutoffs. On average, our chosen cutoffs are close to those used in the established literature. The mean of our district-specific low-income cutoffs is around $39,000, while Bartels uses $40,000 (Bartels 2016, 240); our mean high-income cutoff is around $81,000, where Bartels employs a threshold of $80,000. However, beyond these averages lies considerable variation. In some districts, the 33rd percentile cutoff is as low as $16,500, while the 67th percentile reaches almost $160,000 in others.11
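The grouping step described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the linear-interpolation percentile rule, the choice to include individuals exactly at a cutoff, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def group_preferences(records):
    """records: iterable of (district, family_income, vote), vote 1 = 'yea', 0 = 'nay'.
    Returns, per district, the income cutoffs and the share voting 'yea'
    among low-income (<= 33rd pct.) and high-income (>= 67th pct.) individuals."""
    by_district = defaultdict(list)
    for district, income, vote in records:
        by_district[district].append((income, vote))
    out = {}
    for district, rows in by_district.items():
        incomes = sorted(inc for inc, _ in rows)
        low_cut = percentile(incomes, 33)    # cutoffs are district-specific
        high_cut = percentile(incomes, 67)
        low = [v for inc, v in rows if inc <= low_cut]
        high = [v for inc, v in rows if inc >= high_cut]
        out[district] = {
            "low_cut": low_cut,
            "high_cut": high_cut,
            "low": sum(low) / len(low),
            "high": sum(high) / len(high),
        }
    return out

# Hypothetical toy data for one roll call in two districts.
records = [
    ("A", 20000, 1), ("A", 30000, 1), ("A", 45000, 1),
    ("A", 60000, 0), ("A", 90000, 0), ("A", 150000, 0),
    ("B", 10000, 1), ("B", 15000, 0), ("B", 22000, 1),
    ("B", 30000, 1), ("B", 41000, 0), ("B", 55000, 1),
]
prefs = group_preferences(records)
```

Because the cutoffs are computed within each district, the richer toy district A ends up with higher thresholds than district B, which mirrors the rationale given in the text for district-specific thresholds.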

10 See Appendix B for more details on the construction of our Census sample and our matching/imputation procedure.

11 Results are relatively invariant to using alternative income thresholds (see Table C1).


[Figure I: six histograms of the district-level income gap in support, one panel per roll call: Increase Minimum Wage; Housing Crisis Assistance; Fair Pay Act; Affordable Care Act; CAFTA Ratification; Recovery and Reinvestment.]

Figure I. District-level income gap in public support for 6 selected policies

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district's income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by $(\theta^{l}, \theta^{h})$, as the proportion of individuals voting 'yea'. Since preference estimates are in [0, 1], they can be directly related to legislators' probability of voting 'yea' on a given roll call. Our data show considerable variation in the distance between the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences $(\theta^{h} - \theta^{l})$ in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, the vast majority of low-income constituents are more supportive than their high-income counterparts in each and every district. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly more in favor. In all examples, we find considerable across-district variation in the preference gap between low- and high-income constituents.12 We will employ this variation over both roll calls and districts to estimate legislators' differential

12 Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.


responsiveness to changes in policy preferences of different income groups and how it might be moderated by union strength.

III.C District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report or reporting falsified information is a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit nonresponse (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.14 Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
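The aggregation described here amounts to a group-by sum followed by an average across Congresses. The sketch below assumes a hypothetical input layout of one (congress, district, members) row per geocoded report; the district labels and counts are invented for illustration and are not taken from the LM data.

```python
from collections import defaultdict

def district_membership(reports):
    """reports: iterable of (congress, district, members), one row per union report.
    Returns {district: summed members per Congress, averaged over Congresses}."""
    totals = defaultdict(float)              # (congress, district) -> summed members
    for congress, district, members in reports:
        totals[(congress, district)] += members
    per_district = defaultdict(list)
    for (congress, district), total in totals.items():
        per_district[district].append(total)
    # Assumption: a district-Congress pair with no reports is simply absent,
    # rather than being counted as zero members.
    return {d: sum(v) / len(v) for d, v in per_district.items()}

# Hypothetical reports for two districts over two Congresses.
reports = [
    (109, "OH-01", 1200), (109, "OH-01", 300), (110, "OH-01", 1700),
    (109, "OH-02", 400), (110, "OH-02", 600),
]
membership = district_membership(reports)
```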

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.


[Figure II legend: 1st quartile, 2nd quartile, 3rd quartile, 4th quartile]

Figure II. Union membership in House districts, 109th-112th Congress

complete coverage, as during the latter part of the twentieth century unions increasingly organized workers across different sectors and occupations (Lichtenstein 2013, 249).15

III.D Statistical specifications

For each roll call vote $j$ ($j = 1, \ldots, J$), we have measured preferences of low- and high-income citizens in a given congressional district $d$ ($d = 1, \ldots, D$), denoted by $(\theta^{l}_{jd}, \theta^{h}_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts' workers to organize collective action, as well as (iv) non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House

15 While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with the widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, D.C., which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS, given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 0.86).


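members is the following linear probability specification:

$$
\begin{aligned}
y_{ijd} ={} & \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}(U_d \times \theta^{l}_{jd}) + \eta^{h}(U_d \times \theta^{h}_{jd}) \\
& + \beta^{l}(X_d \times \theta^{l}_{jd}) + \beta^{h}(X_d \times \theta^{h}_{jd}) + \alpha_d + \epsilon_{ijd}
\end{aligned}
$$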

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d\theta^{h}_{jd}$ and $U_d\theta^{l}_{jd}$. Thus, when $\eta^{l}$ and $\eta^{h}$ are zero, the group-specific preference coefficients $\mu^{l}$ and $\mu^{h}$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^{l}$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^{h}$. Our theoretical expectation is that $\eta^{l} > 0$ and $\eta^{h} \le 0$.
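To make this interpretation explicit, one can differentiate the conditional expectation of the linear probability specification with respect to each group's preferences. The short derivation below is a sketch under the assumption that $\epsilon_{ijd}$ has conditional mean zero and that $U_d$ and $X_d$ are held fixed; it only restates what the model above implies.

$$
\frac{\partial\, \mathrm{E}[y_{ijd} \mid \cdot]}{\partial \theta^{l}_{jd}} = \mu^{l} + \eta^{l} U_d + \beta^{l} X_d,
\qquad
\frac{\partial\, \mathrm{E}[y_{ijd} \mid \cdot]}{\partial \theta^{h}_{jd}} = \mu^{h} + \eta^{h} U_d + \beta^{h} X_d .
$$

Since all inputs are standardized, a district with average union membership and average covariates ($U_d = 0$, $X_d = 0$) has responsiveness $\mu^{l}$ to the poor, and a district one standard deviation above the mean in logged union membership has responsiveness $\mu^{l} + \eta^{l}$. The responsiveness gap between the affluent and the poor at a given level of union membership (with $X_d = 0$) is therefore $(\mu^{h} - \mu^{l}) + (\eta^{h} - \eta^{l})\,U_d$, which shrinks as $U_d$ increases whenever $\eta^{l} > 0$ and $\eta^{h} \le 0$.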

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects, $\alpha_d$.16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and the state level) and group preferences, $X_d\theta^{l}_{jd}$ and $X_d\theta^{h}_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses, detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
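A small numerical sketch may help show why district fixed effects leave the interaction terms identified while absorbing anything that is constant within a district. It applies the within (demeaning) transformation, which is one standard way to implement fixed effects; the variable names and values are hypothetical.

```python
from collections import defaultdict

def demean_by_group(values, groups):
    """Within transformation: subtract each group's mean from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical data: two districts, three roll calls each.
districts = ["d1", "d1", "d1", "d2", "d2", "d2"]
union = [0.5, 0.5, 0.5, -1.0, -1.0, -1.0]        # U_d: constant within district
theta_low = [0.2, 0.6, 0.7, 0.3, 0.5, 0.4]       # theta^l_jd: varies over roll calls
interaction = [u * t for u, t in zip(union, theta_low)]

union_within = demean_by_group(union, districts)              # all zeros
interaction_within = demean_by_group(interaction, districts)  # retains variation
```

After demeaning, the non-interacted union variable carries no information, while its interaction with preferences still varies across roll calls within a district, which is the variation the model relies on.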

IV Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above with district fixed effects but without accounting for local union organization (setting $\beta^{l}, \beta^{h}$ and $\eta^{l}, \eta^{h}$ to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase in the probability of legislators casting a corresponding vote of 13.6 (±1.2) percentage points. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.


confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.

IV.A Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preference-confounder interactions (setting $\beta^{l}$ and $\beta^{h}$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018;

17 Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors at the district level. See also specification (1) in Table I below.


[Figure III: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], with the p10, p25, p50, p75, and p90 sample percentiles marked; y-axis: Marginal effect.]

Figure III. District-level union membership as moderator of unequal representation

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^{l}$ and $\theta^{h}$. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or


Table I. Union density and representation. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)
District fixed effects          X         X         X         X         X         X
Group preferences
  × union policy                          X                                       X
  × state constants                                 X
  × organizational capacity                                   X                   X
  × district covariates                                                 X         X

Note: N = 15,780, $N_d$ = 534. 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^{l}$ and $\eta^{h}$. Specifications (2) to (5) include coefficients for interactions ($\beta^{l}$, $\beta^{h}$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements. (3) interacts preferences with state fixed effects. (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections. (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district, and (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to

18 Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59%


extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics, see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
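The text does not spell out the weighting scheme, so the following is only a sketch of one common form of population-based spatial weighting: each district value is the average of the county values it overlaps, weighted by the population living in each county-district overlap. The function name, the input layout, and the numbers are hypothetical.

```python
def interpolate_to_districts(county_values, overlaps):
    """county_values: {county: value of the measure}.
    overlaps: iterable of (district, county, population living in the overlap).
    Returns {district: population-weighted mean of county values}."""
    num, den = {}, {}
    for district, county, pop in overlaps:
        num[district] = num.get(district, 0.0) + pop * county_values[county]
        den[district] = den.get(district, 0.0) + pop
    return {d: num[d] / den[d] for d in num}

# Hypothetical example: district D1 spans two counties, D2 lies inside county c2.
county_values = {"c1": 2.0, "c2": 4.0}
overlaps = [("D1", "c1", 3000), ("D1", "c2", 1000), ("D2", "c2", 5000)]
district_values = interpolate_to_districts(county_values, overlaps)
```

Weighting by overlap population rather than land area keeps sparsely populated parts of a county from dominating the district value, which seems to be the motivation for a population-based scheme.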

Table II. Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

                                         Low income         High income
(1a) Social capital: bowling alleys     0.067 (0.014)     −0.051 (0.013)
(1b) Social capital: index              0.065 (0.014)     −0.048 (0.013)
(2)  Redistricting                      0.067 (0.014)     −0.051 (0.013)
(3)  MRP estimated preferences          0.115 (0.022)     −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for $\eta^{l}$ and $\eta^{h}$. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts. N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred. N = 14,150. Specification (3) uses preferences estimated using MRP. See Appendix B for more details. N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews, see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here, we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low-income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high-income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained random forests. In


the same table, we also show that our core results are obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher-order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we have considered so far and "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
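The logic of double selection can be sketched on simulated data. This is an illustration of the general idea only, under several simplifying assumptions that differ from the analysis described here: a single scalar treatment instead of union-preference interactions, a fixed LASSO penalty (`lam = 0.2`) instead of the data-driven root-LASSO penalty, no sample splitting, and no clustered standard errors. All names and the data-generating process are hypothetical.

```python
import random

def soft_threshold(z, t):
    return z - t if z > t else z + t if z < -t else 0.0

def lasso(X, y, lam, sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (centered inputs, no intercept)."""
    n, p = len(X), len(X[0])
    beta, resid = [0.0] * p, list(y)
    col_ss = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_ss[j] * beta[j]
            new = soft_threshold(rho, n * lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def ols(X, y):
    """Least squares via normal equations, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

# Simulated data: control 0 confounds treatment and outcome; true effect is 1.0.
random.seed(1)
n, p = 400, 6
raw = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
treat = center([0.8 * row[0] + random.gauss(0, 1) for row in raw])
y = center([1.0 * d + 1.0 * row[0] + random.gauss(0, 1) for d, row in zip(treat, raw)])
controls = [list(r) for r in zip(*[center([row[j] for row in raw]) for j in range(p)])]

# Step 1: select controls that predict the outcome; Step 2: those that predict the treatment.
sel_y = {j for j, b in enumerate(lasso(controls, y, 0.2)) if b != 0.0}
sel_d = {j for j, b in enumerate(lasso(controls, treat, 0.2)) if b != 0.0}
selected = sorted(sel_y | sel_d)

# Step 3: post-selection OLS of the outcome on the treatment plus the union of selected controls.
design = [[d] + [row[j] for j in selected] for d, row in zip(treat, controls)]
effect = ols(design, y)[0]
naive = ols([[d] for d in treat], y)[0]   # omits the confounder, so it is biased upward
```

Taking the union of the two selected sets is what gives the procedure its robustness: a confounder missed by one selection step can still be picked up by the other.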

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).


Table III. Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups.

                              (1)        (2)        (3)
Low income preferences       0.063      0.066      0.062
                            (0.014)    (0.017)    (0.016)
High income preferences     −0.054     −0.036     −0.040
                            (0.013)    (0.015)    (0.016)
Semi-parametric terms         144        312        624
Post-LASSO terms               18         45        112

Note: The double selection estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection is performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection is performed on a 50% sample, and parameter estimation is performed on the remaining 50% (N = 7,884). Table entries are estimates for $\eta^{l}$ and $\eta^{h}$ with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our $\eta$ terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive, linear form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data


speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
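The mechanics can be sketched with a bare-bones Gaussian-kernel regularized least squares fit and its analytic pointwise derivatives. This is a simplified illustration, not the estimator as configured in the paper: the bandwidth `sigma2` and penalty `lam` are fixed by hand rather than chosen from the data, the toy outcome is noiseless, and the variable names are hypothetical. The toy data contain an interaction (outcome = preference × union) that the model is never told about.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def kernel(a, b, sigma2):
    """Gaussian kernel exp(-||a - b||^2 / sigma2)."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / sigma2)

def krls_fit(X, y, sigma2, lam):
    """Coefficients c solving (K + lam*I) c = y, so that f(x) = sum_k c_k k(x, x_k)."""
    n = len(X)
    K = [[kernel(X[i], X[k], sigma2) + (lam if i == k else 0.0) for k in range(n)]
         for i in range(n)]
    return solve(K, y)

def partial_derivative(X, coefs, x, j, sigma2):
    """Analytic derivative of the fitted function with respect to x[j] at point x."""
    return sum(c * kernel(x, xk, sigma2) * (-2.0 / sigma2) * (x[j] - xk[j])
               for c, xk in zip(coefs, X))

# Toy data on a 7 x 7 grid: column 0 = preference, column 1 = union membership.
vals = [-1.5 + 0.5 * i for i in range(7)]
X = [[t, u] for t in vals for u in vals]
y = [t * u for t, u in X]

sigma2, lam = 2.0, 0.01
coefs = krls_fit(X, y, sigma2, lam)

# Marginal effect of preferences at high versus low union membership.
d_high = partial_derivative(X, coefs, [0.0, 1.0], 0, sigma2)
d_low = partial_derivative(X, coefs, [0.0, -1.0], 0, sigma2)
```

In this toy setting the estimated derivative with respect to the preference variable is positive where union membership is high and negative where it is low, even though no product term was entered, which is the kind of check the text describes.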

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], with percentile markers p10, p25, p50, p75, p90. y-axis: Partial effect.]

Figure IV
Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

²² See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V. Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, on the premise that public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV
Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups

                                  Low income          High income
(A) Private vs. public unions
    Public unions                 0.074 (0.016)       −0.058 (0.015)
    Non-public unions             0.054 (0.016)       −0.027 (0.016)
(B) Bill ideology
    Conservative bill             0.086 (0.017)       −0.086 (0.018)
    Liberal bill                  0.052 (0.014)       −0.028 (0.013)
(C) AFL-CIO endorsement
    No position                   0.054 (0.014)       −0.054 (0.013)
    Endorsement                   0.077 (0.015)       −0.040 (0.014)

Note: Estimates for $\eta_L$ and $\eta_H$ with cluster-robust standard errors in parentheses. N = 15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.²³ AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).²⁴ The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

²³ Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

²⁴ For high-income preferences, the estimate for $\eta_H$ is smaller for endorsed bills, but still significantly different from zero.

VI. Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan 1983), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
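The conversion from a log-scale regression to dollar amounts can be sketched with Duan's smearing retransformation: predictions on the log scale are exponentiated and multiplied by the average of the exponentiated residuals. The sketch below uses a one-regressor model and made-up numbers; the helper names and the toy data are assumptions for illustration and do not reproduce the paper's specification with state fixed effects and controls.

```python
import math

def ols_simple(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def duan_smearing_factor(residuals):
    """Mean of exp(residual): corrects exp(prediction) for retransformation bias."""
    return sum(math.exp(r) for r in residuals) / len(residuals)

def predict_dollars(intercept, slope, smear, x):
    """Expected outcome on the original (dollar) scale at regressor value x."""
    return smear * math.exp(intercept + slope * x)

# Hypothetical data: standardized union membership and log contributions.
union_std = [-1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5]
noise = [0.2, -0.2, 0.2, -0.2, 0.2, -0.2, 0.2, -0.2]
log_contrib = [10.0 + 0.5 * u + e for u, e in zip(union_std, noise)]

a, b = ols_simple(union_std, log_contrib)
resid = [y - (a + b * u) for u, y in zip(union_std, log_contrib)]
smear = duan_smearing_factor(resid)

# Dollar effect of moving from the mean to one standard deviation above it.
effect = predict_dollars(a, b, smear, 1.0) - predict_dollars(a, b, smear, 0.0)
print(round(smear, 4), round(effect, 2))
```

Because the residuals of a regression with an intercept average to zero, the smearing factor is at least one, so naive exponentiation of the log-scale prediction would understate the dollar amounts.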

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians, if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V
Labor contributions and selection of Democratic legislators

                                        (1)        (2)        (3)        (4)

A: Contributions channel
                                        DV: Contrib.          DV: roll call
Union membership                        0.056      0.046
                                        (0.012)    (0.014)
Contributions × low income prefs.                             0.946      0.865
                                                              (0.036)    (0.034)
Contributions × high income prefs.                            −0.735     −0.714
                                                              (0.029)    (0.031)

B: Selection channel
                                        DV: Democrat          DV: roll call
Union membership                        0.161      0.106
                                        (0.024)    (0.023)
Democrat × low income prefs.                                  0.576      0.542
                                                              (0.012)    (0.015)
Democrat × high income prefs.                                 −0.411     −0.423
                                                              (0.013)    (0.015)

District controls                                  X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N = 428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N = 15,780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N = 428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N = 15,776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger, to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII. Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed, but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, we consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A. Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds, separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each Congress, Table A2 shows the average of all district-specific thresholds, as well as the smallest and largest ones.
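As a rough illustration of this classification step, the sketch below computes district-specific 33rd and 67th percentile thresholds and assigns a respondent to an income group. The function names, the linear-interpolation percentile rule, the treatment of incomes exactly at a threshold, and the example incomes are assumptions; survey weights are ignored.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def district_thresholds(incomes_by_district):
    """Map each district to its (33rd, 67th) percentile income thresholds."""
    return {d: (percentile(v, 33), percentile(v, 67))
            for d, v in incomes_by_district.items()}

def income_group(income, thresholds):
    """Classify a respondent relative to his or her own district's thresholds."""
    low_cut, high_cut = thresholds
    if income <= low_cut:   # assumption: boundary cases count as low income
        return "low"
    if income >= high_cut:  # assumption: boundary cases count as high income
        return "high"
    return "middle"

# Hypothetical household incomes (in $1,000) for one made-up district.
acs_incomes = {"XX-01": list(range(1, 101))}
cuts = district_thresholds(acs_incomes)
print(cuts["XX-01"], income_group(20, cuts["XX-01"]), income_group(90, cuts["XX-01"]))
```

Because the thresholds are district-specific, the same dollar income can fall into different groups in different districts, which is what the spread between the minimum and maximum thresholds in Table A2 reflects.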

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include: the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1
Matched CCES-House roll calls included in our analysis

Match  Bill      Date        Name                                                                                      House vote (Yea-Nay)  Bill ideology†
(1)    HR 810    07/19/2006  Stem Cell Research Enhancement Act (Presidential Veto Override)                           235-193               L
(1)    HR 3      01/11/2007  Stem Cell Research Enhancement Act of 2007 (House)                                        253-174               L
(1)    S 5       06/07/2007  Stem Cell Research Enhancement Act of 2007                                                247-176               L
(2)    HR 2956   07/12/2007  Responsible Redeployment from Iraq Act                                                    223-201               L
(3)    HR 2      01/10/2007  Fair Minimum Wage Act                                                                     315-116               L
(4)    HR 4297   12/08/2005  Tax Relief Extension Reconciliation Act (Passage)                                         234-197               C
(4)    HR 4297   05/10/2006  Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)                   244-185               C
(5)    HR 3045   07/28/2005  Dominican Republic-Central America-United States Free Trade Agreement Implementation Act  217-215               C
(6)    S 1927    08/04/2007  Protect America Act                                                                       227-183               C
(6)    HR 6304   06/20/2008  FISA Amendments Act of 2008                                                               293-129               C
(7)    HR 3162   08/01/2007  Children's Health and Medicare Protection Act                                             225-204               L
(7)    HR 976    10/18/2007  Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)      273-156               L
(7)    HR 3963   01/23/2008  Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)      260-152               L
(7)    HR 2      02/04/2009  Children's Health Insurance Program Reauthorization Act                                   290-135               L
(8)    HR 3221   07/23/2008  Foreclosure Prevention Act of 2008                                                        272-152               L
(9)    HR 3688   11/08/2007  United States-Peru Trade Promotion Agreement                                              285-132               C
(10)   HR 1424   10/03/2008  Emergency Economic Stabilization Act of 2008                                              263-171               L
(11)   HR 3080   10/12/2011  To implement the United States-Korea Trade Agreement                                      278-151               C
(12)   HR 3078   10/12/2011  To implement the United States-Colombia Trade Promotion Agreement                         262-167               C
(13)   HR 2346   06/16/2009  Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report)              226-202               L
(14)   HR 2831   07/31/2007  Lilly Ledbetter Fair Pay Act                                                              225-199               L
(14)   HR 11     01/09/2009  Lilly Ledbetter Fair Pay Act of 2009 (House)                                              247-171               L
(14)   S 181     01/27/2009  Lilly Ledbetter Fair Pay Act of 2009                                                      250-177               L
(15)   HR 1913   04/29/2009  Local Law Enforcement Hate Crimes Prevention Act                                          249-175               L
(16)   HR 1      02/13/2009  American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)            246-183               L
(17)   HR 2454   06/26/2009  American Clean Energy and Security Act                                                    219-212               L
(18)   HR 3590   03/21/2010  Patient Protection and Affordable Care Act                                                220-212               L
(19)   HR 3962   11/07/2009  Affordable Health Care for America Act                                                    221-215               L
(20)   HR 4173   06/30/2010  Wall Street Reform and Consumer Protection Act of 2009                                    237-192               L
(21)   HR 2965   12/15/2010  Don't Ask, Don't Tell Repeal Act of 2010                                                  250-175               L
(22)   S 365     08/01/2011  Budget Control Act of 2011                                                                269-161               C
(23)   H CR 34   04/15/2011  House Budget Plan of 2011                                                                 235-193               C
(24)   H CR 112  03/28/2012  Simpson-Bowles/Cooper Amendment to House Budget Plan                                      38-382                C
(25)   HR 8      08/01/2012  American Taxpayer Relief Act of 2012 (Levin Amendment)                                    170-257               L
(26)   HR 2      01/19/2011  Repealing the Job-Killing Health Care Law Act                                             245-189               C
(26)   HR 6079   07/11/2012  Repeal the Patient Protection and Affordable Care Act and [...]                           244-185               C
(27)   HR 1938   07/26/2011  North American-Made Energy Security Act                                                   279-147               C

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2
Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value

            33rd percentile              67th percentile
Congress    Mean     Min      Max        Mean     Min      Max
109         38123    16800    73675      77964    39612    146870
110         40127    18000    77000      83047    43600    155113
111         39021    17500    78262      82440    46000    160050
112         37381    16500    81000      79868    38500    158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3
Descriptive statistics of analysis sample

                                      Mean     SD       Min      Max       N
Roll-call vote: yea                   0.568    0.495    0.000    1.000     15780
Constituent preferences
    Low income                        0.593    0.220    0.047    0.979     15934
    High income                       0.555    0.198    0.037    0.967     15934
    Low-High gap                      0.172    0.121    0.000    0.588     15934
Union membership [log]                9.705    1.046    6.094    13.619    15934
Population                            7.022    0.723    4.697    9.980     15934
Share African American                0.124    0.146    0.004    0.680     15934
Share Hispanic                        0.156    0.174    0.005    0.812     15934
Share BA or higher                    0.275    0.097    0.073    0.645     15934
Median income [$10,000]               5.177    1.356    2.282    10.439    15934
Share female                          0.508    0.010    0.462    0.543     15934
Manufacturing share                   0.110    0.047    0.025    0.281     15934
Urbanization                          0.790    0.199    0.213    1.000     15934
Certification elections [log]         3.347    0.861    0.000    5.100     15934
Congregations [per 1000 persons]      0.765    1.147    0.062    6.453     15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B. Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B.1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.²⁵

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $\hat{f}(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative of each Congressional district's population.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

²⁵ See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1: schematic of the stacked data matrix. Columns: covariates $Z_{i1}, \dots, Z_{iK}$ and preferences $Y_{i1}, \dots, Y_{iP}$. Rows $1, \dots, m$ are CCES units with observed covariates and observed preferences; rows $m+1, \dots, m+n$ are Census units with observed covariates and missing (NA) preferences. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict $Y^*_p = f(Z)$ for the Census rows.]

Figure B1
Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative on the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
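The sampling step can be sketched as follows: for each draw, a PUMA is chosen with probability equal to its share of the district's population, and one individual record is then drawn from that PUMA. The function name, the data layout, sampling with replacement, and the example shares are assumptions made for illustration; the text does not spell out these implementation details.

```python
import random

def synthetic_district_sample(puma_records, puma_shares, n_draws, seed=0):
    """Draw individuals for one district from the PUMAs that overlap it.

    puma_records: dict mapping PUMA id -> list of individual records
    puma_shares:  dict mapping PUMA id -> share of the district's population
                  living in that PUMA (taken from a PUMA-to-district crosswalk)
    """
    rng = random.Random(seed)
    pumas = list(puma_shares)
    weights = [puma_shares[p] for p in pumas]
    # Step 1: pick a PUMA for every draw, proportional to its population share.
    chosen = rng.choices(pumas, weights=weights, k=n_draws)
    # Step 2: pick one individual uniformly within each chosen PUMA
    # (assumption: sampling with replacement).
    return [rng.choice(puma_records[p]) for p in chosen]

# Hypothetical crosswalk: 75% of the district's population lives in PUMA "A".
records = {
    "A": [("A", i) for i in range(500)],
    "B": [("B", i) for i in range(500)],
}
shares = {"A": 0.75, "B": 0.25}

sample = synthetic_district_sample(records, shares, n_draws=4000)
share_a = sum(1 for puma, _ in sample if puma == "A") / len(sample)
print(round(share_a, 3))
```

Repeating this for every district yields a file of individual records that carries district identifiers, which is what the averaging step by district and income group requires.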

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.²⁶ We transform this discretized measure of income into a continuous one using a nonparametric midpoint

²⁶ The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
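A minimal sketch of a midpoint Pareto scoring rule is given below. It uses a commonly used form of the estimator, in which the Pareto shape parameter is estimated from the counts in the two highest bins; whether this matches the exact variant used in the paper is an assumption, as are the function names and the four-bin example (the CCES item has 14 bins). Upper bin limits are treated as exclusive, so that the bin from $20,000 to $29,999 is scored at $25,000.

```python
import math

def pareto_top_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail.

    The shape parameter alpha is estimated from the two highest bins:
    alpha = ln((n_prev + n_top) / n_top) / ln(lower_top / lower_prev).
    """
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev))
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    return lower_top * alpha / (alpha - 1)

def bin_scores(bins, counts):
    """Score closed bins by their midpoint and the open top bin by its Pareto mean.

    bins: list of (lower, upper) tuples; the last bin has upper=None.
    counts: number of respondents in each bin.
    """
    scores = [(lo + hi) / 2 for lo, hi in bins[:-1]]
    top = pareto_top_mean(bins[-2][0], bins[-1][0], counts[-2], counts[-1])
    return scores + [top]

# Hypothetical bins and respondent counts.
bins = [(0, 10000), (10000, 20000), (20000, 30000), (30000, None)]
counts = [40, 30, 20, 10]
scores = bin_scores(bins, counts)
print(scores[:3], round(scores[3], 1))
```

The top-bin score always lies above the bin's lower limit, and it moves further above it the heavier the estimated tail (the closer alpha is to one).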

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:²⁷

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$
• Missing values of $D_v$: $y^{(v)}_{mis}$
• Variables other than $D_v$ with observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$
• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

 1. Start with initial guesses of missing values in $D$
 2. $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3. while not $c$ do
 4.     $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5.     for $v$ in $w$ do
 6.         Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7.         Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8.         $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9.     Update stopping criterion $c$
10. Return completed $D^{imp}$
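The loop in Algorithm 1 can be illustrated with the following Python sketch. It is a simplified reading, not the authors' implementation: it handles numeric columns only, starts from column means, and stops when the filled-in values change by less than a tolerance. The function name, the `make_model` argument and the tolerance are assumptions introduced here. By default it would fit scikit-learn's `RandomForestRegressor` (assumed to be installed), but any object with `fit` and `predict` methods can be supplied.

```python
def chained_forest_impute(rows, make_model=None, max_iter=10, tol=1e-8):
    """Impute missing entries (None) in a numeric table, one column at a time.

    rows: list of equal-length lists of floats, with None marking a missing value.
    make_model: zero-argument factory returning an object with fit(X, y) and
        predict(X); defaults to a scikit-learn random forest (assumed installed).
    """
    if make_model is None:
        from sklearn.ensemble import RandomForestRegressor

        def make_model():
            return RandomForestRegressor(n_estimators=100, random_state=0)

    n, p = len(rows), len(rows[0])
    missing = [[rows[i][v] is None for v in range(p)] for i in range(n)]

    # Step 1: initial guesses (here: column means of the observed values).
    data = [list(r) for r in rows]
    for v in range(p):
        observed = [rows[i][v] for i in range(n) if not missing[i][v]]
        guess = sum(observed) / len(observed)
        for i in range(n):
            if missing[i][v]:
                data[i][v] = guess

    # Step 2: visit columns with missing entries by increasing fraction of NA.
    n_missing = [sum(missing[i][v] for i in range(n)) for v in range(p)]
    order = sorted((v for v in range(p) if n_missing[v] > 0), key=lambda v: n_missing[v])

    for _ in range(max_iter):                                  # Step 3
        previous = [list(r) for r in data]                     # Step 4
        for v in order:                                        # Step 5
            obs = [i for i in range(n) if not missing[i][v]]
            mis = [i for i in range(n) if missing[i][v]]
            x_obs = [[data[i][u] for u in range(p) if u != v] for i in obs]
            x_mis = [[data[i][u] for u in range(p) if u != v] for i in mis]
            model = make_model()
            model.fit(x_obs, [data[i][v] for i in obs])        # Step 6
            for i, y_hat in zip(mis, model.predict(x_mis)):    # Steps 7-8
                data[i][v] = float(y_hat)
        # Step 9: stop once the filled-in values no longer change.
        change = sum((data[i][v] - previous[i][v]) ** 2
                     for i in range(n) for v in range(p) if missing[i][v])
        if change < tol:
            break
    return data                                                # Step 10
```

Passing the learner as a factory keeps the chaining logic separate from the prediction model, so the same loop could be run with a classifier for categorical columns.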

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

²⁷ Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.²⁸

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).²⁹ No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.³⁰ Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

²⁸ See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

²⁹ This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

³⁰ We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


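model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$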

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_j \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_k \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_l \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_m \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_n \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_d \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
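The poststratification step can be sketched as follows. This is a minimal Python illustration under stated assumptions: the intercept, the group effects and the cell counts below are made-up numbers, not estimates from the paper, and the function and variable names are hypothetical. Each cell prediction follows the form of equation (B1); the district estimate is the population-weighted mean over cells.

```python
import math

FACTORS = ("race", "gender", "age", "educ", "income", "district")


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def cell_probability(cell, intercept, effects):
    """Predicted support in one demographic cell, following equation (B1)."""
    eta = intercept + sum(effects[f][cell[f]] for f in FACTORS)
    return inv_logit(eta)


def poststratify(cells, intercept, effects):
    """Population-weighted mean of cell predictions (one district, one income group)."""
    total = sum(c["count"] for c in cells)
    return sum(c["count"] * cell_probability(c, intercept, effects) for c in cells) / total


# Hypothetical effects: everything is zero except a gender effect of +/- log(3),
# so the two cells below have predicted support of 0.75 and 0.25.
effects = {f: {1: 0.0, 2: 0.0} for f in FACTORS}
effects["gender"] = {1: math.log(3), 2: -math.log(3)}

# Hypothetical Census cell counts for one district.
cells = [
    {"race": 1, "gender": 1, "age": 1, "educ": 1, "income": 1, "district": 1, "count": 30},
    {"race": 1, "gender": 2, "age": 1, "educ": 1, "income": 1, "district": 1, "count": 10},
]
estimate = poststratify(cells, intercept=0.0, effects=effects)
```

With these numbers the estimate is (30 × 0.75 + 10 × 0.25) / 40 = 0.625: the larger cell pulls the district estimate towards its own prediction.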

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences       0.106     0.082     0.098     0.084     0.068     0.062
                            (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences     −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                            (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences       0.182     0.158     0.181     0.162     0.115     0.115
                            (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences     −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                            (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences       0.080     0.061     0.063     0.072     0.043     0.045
                            (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences     −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                            (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as ‘low income’ relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts’ districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
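The effect of unit-specific thresholds can be made concrete with a short Python sketch. The incomes below are invented for illustration, the function names are hypothetical, and the percentile rule (linear interpolation) and the treatment of values exactly on a cut point are assumptions rather than the paper's definitions.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def income_groups(incomes_by_unit, low_q=33, high_q=66):
    """Classify each income as 'low', 'middle' or 'high' relative to its own unit.

    incomes_by_unit: dict mapping a unit label (district or state) to incomes.
    """
    groups = {}
    for unit, incomes in incomes_by_unit.items():
        low_cut = percentile(incomes, low_q)
        high_cut = percentile(incomes, high_q)
        groups[unit] = [
            "low" if x <= low_cut else "high" if x > high_cut else "middle"
            for x in incomes
        ]
    return groups


# Hypothetical incomes (in $1,000) for a richer and a poorer district.
incomes = {
    "d1": [30, 40, 50, 60, 70, 80, 90],
    "d2": [10, 20, 30, 40, 50, 60, 70],
}
groups = income_groups(incomes)                  # terciles, as in panel (A)
shifted = income_groups(incomes, 20, 80)         # p20 / p80, as in panel (C)
```

In this toy data a voter with an income of 40 falls into the low income group in district d1 but into the middle group in d2, which mirrors the Massachusetts example in the text. Passing state labels instead of district labels as keys gives the state-specific variant.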


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences       0.106     0.082     0.098     0.084     0.068     0.062
                            (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences     −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                            (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences       0.105     0.082     0.097     0.083     0.067     0.062
                            (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences     −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                            (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences       0.098     0.077     0.09      0.078     0.063     0.057
                            (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences     −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                            (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA) enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily.”

We use the NLRB’s database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level, using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency


Figure D1: Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 0−13, 13−26, 26−39, 39−62, 62−164.]

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
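A minimal Python sketch of the population-weighted aggregation follows. It assumes that each weight is the share of a county's population living in a given district and allocates county counts proportionally; the function name, county and district labels, and all numbers are hypothetical, and the weighting actually used in the paper may differ in detail.

```python
from collections import defaultdict


def interpolate_to_districts(county_counts, weights):
    """Aggregate county-level counts to districts with population weights.

    county_counts: dict county -> number of congregations.
    weights: dict (county, district) -> share of the county's population that
        lives in that district (the shares of one county sum to 1).
    """
    district_counts = defaultdict(float)
    for (county, district), share in weights.items():
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)


# Hypothetical example: county A is split evenly between two districts,
# county B lies entirely within district d1.
county_counts = {"A": 10, "B": 4}
weights = {("A", "d1"): 0.5, ("A", "d2"): 0.5, ("B", "d1"): 1.0}
district_counts = interpolate_to_districts(county_counts, weights)
```

When every county lies entirely within one district, all shares equal 1 and the computation reduces to summing county counts, as noted in the text for at-large districts.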


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                          Low income         High income
                                          preferences        preferences         N

(1) Injective preference roll call map    0.063 (0.013)     −0.041 (0.013)     11104
(2) Extreme preferences excl.             0.074 (0.016)     −0.048 (0.015)     13308
(3) New York excluded                     0.070 (0.015)     −0.048 (0.014)     14730
(4) Local Union Concentration             0.065 (0.014)     −0.047 (0.014)     15780
(5) Trimmed LPM estimator                 0.074 (0.015)     −0.055 (0.014)     15426
(6) Errors-in-variables                   0.062 (0.004)     −0.054 (0.004)     15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
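The trimming idea can be sketched in Python as follows. This is one reading of the procedure (fit the linear probability model, drop cases whose fitted value falls outside the unit interval, and re-fit until none remain), written with a small hand-rolled least squares solver so that it has no dependencies. The function names, the tolerance `eps` and the toy data are assumptions made for illustration, not the specification estimated in the paper.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients for a design matrix X (list of rows) and outcome y."""
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


def trimmed_lpm(X, y, eps=1e-12):
    """Re-fit the LPM after dropping cases whose fitted value leaves [0, 1]."""
    keep = list(range(len(y)))
    while True:
        beta = ols([X[i] for i in keep], [y[i] for i in keep])
        fitted = {i: sum(b * x for b, x in zip(beta, X[i])) for i in keep}
        inside = [i for i in keep if -eps <= fitted[i] <= 1 + eps]
        if len(inside) == len(keep):
            return beta, keep
        if len(inside) < len(X[0]):
            raise ValueError("too few observations left after trimming")
        keep = inside


# Toy data: an intercept column plus one regressor; the last case is an outlier
# in x whose first-round fitted probability exceeds one.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 5.0]]
y = [0.0, 0.0, 1.0, 1.0, 1.0]
beta, kept = trimmed_lpm(X, y)
```

In the toy data the first fit predicts a probability above one for the last case, which is then dropped; the second fit on the remaining four cases has all fitted values inside the unit interval, so the loop stops.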

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.³¹ We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

³¹ We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients, and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \quad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w'_d \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w'_d \beta_{m0} + r_{md} + v_d \tag{F4}$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).³² Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
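The two selection steps followed by OLS on the union of selected controls can be illustrated with a stripped-down Python sketch. It deliberately simplifies the setting described above: there is a single scalar treatment without preference interactions, a plain LASSO with a fixed penalty `lam` stands in for the root-LASSO, and the data are a small artificial example. All function and variable names are hypothetical.

```python
def soft_threshold(z, lam):
    return z - lam if z > lam else z + lam if z < -lam else 0.0


def lasso_select(X, y, lam, sweeps=200):
    """Indices of columns given non-zero weight by a plain LASSO.

    Minimises (1/2n)||y - Xb||^2 + lam * ||b||_1 on centred data by
    cyclic coordinate descent.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            scale = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                continue
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / scale
            step = new - b[j]
            for i in range(n):
                resid[i] -= Xc[i][j] * step
            b[j] = new
    return [j for j in range(p) if b[j] != 0.0]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def post_double_selection(y, u, W, lam):
    """Return (treatment coefficient, indices of the selected controls)."""
    # Selection step 1 (outcome on controls) and step 2 (treatment on controls).
    selected = sorted(set(lasso_select(W, y, lam)) | set(lasso_select(W, u, lam)))
    # Final step: OLS of the outcome on intercept, treatment and selected controls.
    Z = [[1.0, u[i]] + [W[i][j] for j in selected] for i in range(len(y))]
    k = len(Z[0])
    ztz = [[sum(r[a] * r[b] for r in Z) for b in range(k)] for a in range(k)]
    zty = [sum(r[a] * yi for r, yi in zip(Z, y)) for a in range(k)]
    return solve(ztz, zty)[1], selected


# Artificial data: three orthogonal controls; only the first one confounds.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, 1, 1, 1, -1, -1, -1, -1]
e = [1, -1, -1, 1, -1, 1, 1, -1]                 # treatment variation unrelated to controls
W = [[float(a), float(b), float(c)] for a, b, c in zip(x1, x2, x3)]
u = [a + 0.5 * b for a, b in zip(x1, e)]         # treatment depends on x1
y = [2.0 * ui + 3.0 * a for ui, a in zip(u, x1)]  # true treatment effect is 2
eta, selected = post_double_selection(y, u, W, lam=0.1)
```

By construction, only the first control is selected in both steps, and the final OLS step recovers a treatment coefficient of (approximately) 2.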

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$, with $z = [\theta^l_{jd}\ \theta^h_{jd}\ U_d\ X_d]$, by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via

³² For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

$$c^* = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
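The closed-form solution for the weights can be illustrated with a small, dependency-free Python sketch. It builds the Gaussian kernel of equation (G2), sets $\sigma^2$ to the number of columns by default, and solves $(K + \lambda I)c = y$ directly. The leave-one-out choice of $\lambda$ is not implemented here; $\lambda$ is simply passed in, and the function names and toy data are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gaussian_kernel(Z, sigma2):
    """K[i][j] = exp(-||z_i - z_j||^2 / sigma2), as in equation (G2)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [[math.exp(-sqdist(zi, zj) / sigma2) for zj in Z] for zi in Z]


def krls_fit(Z, y, lam, sigma2=None):
    """Return (c, K) with c = (K + lam * I)^{-1} y."""
    if sigma2 is None:
        sigma2 = len(Z[0])  # sigma^2 = D, the number of columns of z
    K = gaussian_kernel(Z, sigma2)
    n = len(Z)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, y), K


def krls_fitted(K, c):
    """Fitted values y_hat = K c, as in equation (G1)."""
    return [sum(kij * cj for kij, cj in zip(row, c)) for row in K]


# Toy data with a single covariate.
Z = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 0.0, 1.0]
c_small, K = krls_fit(Z, y, lam=1e-8)   # almost no smoothing
c_large, _ = krls_fit(Z, y, lam=1.0)    # heavier regularization
fit_small = krls_fitted(K, c_small)
fit_large = krls_fitted(K, c_large)
```

With a very small $\lambda$ the fitted values nearly interpolate the outcomes, while a larger $\lambda$ trades fit for a smoother surface, which is the role the penalty plays in equation (G3).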

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effect of group preferences changes with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z^{U_d}_j} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_d - z^{U_d}_j\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents’ responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University, CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

46

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem Deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, D.C.: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, D.C.: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(2), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

I. Introduction

Over the last 15 years or so, political scientists have paid increasing attention to the link between economic inequality and political representation. In contrast to the principle of political equality that is central to the ideal of democratic governance, this vibrant strand of research has repeatedly found disparities in political representation by income. Specifically, elected officials and policy outcomes are more responsive to the views of affluent citizens than to middle-income and low-income citizens, and sometimes they are not responsive to low-income citizens at all.¹ As summarized by Bartels (2016, 235), evidence of unequal representation has been found for legislators, party platforms, national policy, and state policy. While scholarship has initially focused mostly on the United States, recent comparative work has revealed similar patterns of unequal representation across a larger range of political systems (Bartels 2017; Elsässer et al. 2017; Lupu and Warner 2017), including democracies with proportional electoral systems and multi-party governments that had been previously associated with kinder and gentler (and presumably more equal) representation (Lijphart 1999). Given these results, it is germane to ask whether there are institutions or organizations that dampen unequal responsiveness in the democratic process.

In this paper, we argue that stronger labor unions systematically decrease the extent of unequal representation by elected representatives in the U.S. They do so even in a context of high income inequality, expensive electoral campaigns, and comparatively low union membership. Existing social science research has documented that union membership is associated with lower income differentials in political participation (Leighley and Nagler 2007; Rosenfeld 2014). Moreover, unions tend to take positions favored by less affluent citizens (Gilens 2012), and they are one of the few organizations in national politics that advocate on behalf of non-managerial workers, spending a substantial amount of resources in the process (Schlozman et al. 2012). Some also take costly strike action in the interest of others (Ahlquist and Levy 2013).²

However, the literature provides little evidence on whether unions actually cause a meaningful reduction in the pro-affluent bias of national politicians. Several scholars of representation suggest that unions have become too weak, too narrow, or too fragmented to have a significant egalitarian political impact in national policymaking (Gilens 2012, 175; Hacker and Pierson 2010, 143). Moreover, a key issue is that the relationship between union strength and more equal responsiveness by politicians may be spuriously driven by the same underlying determinants. For example, due to differences in social capital (Putnam 1993, 2000), workers in some electoral districts may be better at solving collective action problems than others. As a result, they would be more likely to unionize their workplace in the first place and, independently, politicians would be more responsive to them. Another concern is that the activity of unions may influence “parties and policy, but policy and institutions also affect unionization rates” (Ahlquist 2017, 427). While collective action problems dilute incentives of politicians to make politics using policies, in some circumstances they are overcome (Anzia and Moe 2016; Hacker and Pierson 2010). Thus, unequal representation may produce policies that make it more difficult to organize unions in the first place. In particular, ‘right-to-work’ and collective bargaining laws hamper unionization efforts, and recent research demonstrates that these laws can have profound political effects (Feigenbaum et al. 2018; Flavin and Hartney 2015).

¹ For instance, see Bartels (2008, ch. 9), Bartels (2016, ch. 8), Bhatti and Erikson (2011), Flavin (2012), Ellis (2013), Gilens (2012), Gilens and Page (2014), Rhodes and Schaffner (2017), Rigby and Wright (2013). For examples of different findings or interpretations, see Brunner et al. (2013), Enns (2015), Erikson (2015).

² See Ahlquist (2017) for a review of the large interdisciplinary literature on union effects.

Our empirical strategy addresses these problems based on a combination of fine-grained data, a within-district research design, and robust inferential models. We assess our argument using the contemporary Congress, where unequal responsiveness by elected representatives and their policy choices has been well documented (Bartels 2008, 2016; Ellis 2013; Gilens 2012; Rhodes and Schaffner 2017) and the playing field for organized interest is skewed against the less affluent (Schlozman et al. 2012). We focus on members of the House of Representatives during the 109–112th Congress (2005–2012), since this setting enables us to capture within-state variation in union strength as well as within-district variation in preference polarization by income across a large number of policy issues. Our design provides leverage to rule out alternative explanations using state and district fixed effects and allows us to measure theoretically important confounders not accounted for in previous work.

At its core, our dataset combines estimated income-specific measures of constituency preferences based on 223,000 survey respondents matched to 27 roll-call votes with information on local unions extracted from more than 350,000 administrative records. To measure district-level policy preferences, we use multiple waves of the Cooperative Congressional Election Study (CCES) and calculate preferences on 27 concrete policy issues for each income group in each congressional district. We employ small area estimation, as the CCES is not designed to be representative at the district level (we also show that our findings are robust to using alternative approaches, such as multilevel regression and poststratification [MRP]). To measure the district-level strength of unions, we use mandatory reports filed by local unions to the Department of Labor. Following recent work by Becher et al. (2018), this largely neglected administrative data source is used to construct measures of union membership at the district level. This measurement strategy overcomes major limitations of standard survey data used to measure union strength.³

Our empirical analysis traces the legislative responsiveness of House members to the preferences of different income groups in their constituency, conditional on district-level union strength. We find that district-level union membership dampens unequal responsiveness by national legislators. In line with previous research, on average House members are significantly less responsive to the policy preferences of low-income constituents. However, this gap in responsiveness is smaller where unions are stronger, and it decreases significantly where union members are numerous. This moderating effect of unions is not an artifact of existing state-level union policies or largely time-invariant state-level or district-level unobservables (such as institutions, history, or culture). Extended specifications allow other district-level characteristics to also moderate legislative responsiveness to different income groups. They demonstrate that the union effect is not driven by district-specific levels of socio-economic factors, such as education, race, gender, median household income, urbanization, or a district’s employment structure. We also rule out the possibility that our finding simply represents the general capacity of workers or people to organize (or be organized) by accounting for explicit measures of district-level organization capacity based on new data on unionization attempts from the National Labor Relations Board, the predominance of religious organizations, and behavioral measures of social capital. We also go beyond standard regression models and employ estimates from a Double Selection Estimator (Belloni et al. 2014) and Kernel Regularized Least Squares (Hainmueller and Hazlett 2014) to show that the moderating effect of unions is robust to relaxing potentially important modeling assumptions.

³ Prior research is almost exclusively based on survey data that are not suited for a district-level analysis due to missing identifiers or sampling design. In contrast to surveys, moreover, filing LM forms is mandatory for most unions; non-submission and incorrect submissions are penalized; reports are audited and contain precise geographic information.

An exploration of possible mechanisms points to campaign contributions and partisan selection as two relevant channels through which local unions enhance the representation of the less well-off. Relatedly, the equality-enhancing effect of unions is stronger for bills on which the largest union confederation, AFL-CIO, has staked out a clear position.

We are aware of only two previous investigations of the effect of organized labor on unequal responsiveness, and they differ considerably in their approach from ours. Focusing on a recent cross-section of 47 U.S. states, Flavin (2018) shows that states with stronger unions exhibit less unequal representation, as estimated from regressions of income-weighted voter preferences on state-level policy liberalism. Studying the 110th House of Representatives, Ellis (2013) finds mixed results: district-level unionization is related to a smaller rich-poor gap for key legislative votes, but there is no such effect for overall ideological representation. Our analysis addresses the problem that survey samples are not representative for congressional districts by design, which can lead to biased estimates of income biases in representation. It confirms the finding that unions are linked to more equal representation, covering four Congresses and three times as many roll call votes as studied by Ellis (2013). However, our main empirical contribution is that we can go much further in ruling out alternative explanations. Our research design leverages within-district variation in preference polarization, within-state variation in union strength, as well as extensive district-level data on alternative moderating factors that may be bundled with union strength. In contrast to the pure cross-sectional designs of these two previous studies, our analysis can thus account for state and district fixed effects that capture important sources of unobserved heterogeneity, and it directly measures many important confounders. As a result, we can state with more confidence that the impact of unions is not spurious.

Against the backdrop of current scientific and public debates about labor unions and political representation, these findings may come as somewhat of a surprise. While some strands of research and political discourse portray unions as an egalitarian force in politics, others see them as fatally weakened, as effects as much as causes of unequal representation, or simply as just another organized group fighting for special interests (that do not generally overlap with those of lower income individuals). The latter view is held by a large strand of scholarship in economics (cf. Freeman and Medoff 1984) and by researchers studying the role of teachers’ unions in political science (Anzia 2011; Moe 2011). Most extant research on Congress simply does not have the required data to directly assess the effect of unions on representational equality. Numerous studies of union strength and congressional roll-call voting do not measure voter preferences, which makes it difficult to interpret who is being represented (Becher et al. 2018; Box-Steffensmeier et al. 1997; Freeman and Medoff 1984).

Altogether, our results suggest that unequal responsiveness is not an unavoidable feature of democratic capitalism. The results are especially striking given that recent cross-national studies have found consistent patterns of unequal representation across different political institutions (Bartels 2017; Lupu and Warner 2017). In contrast, we find considerable heterogeneity in differential responsiveness across districts affected by local labor unions, a fundamental economic institution. The moderating effect of unions uncovered in our analysis is large enough to swing key votes in Congress. That said, our results support the view that political efforts to (further) weaken unions, as evidenced in recent reforms in states like Michigan and Wisconsin, are, if anything, likely to exacerbate unequal responsiveness in representation. They may also explain why unions are (still) under attack.

II. Moderating biased responsiveness in Congress

While few studies have directly assessed the impact of labor unions on unequal responsiveness in Congress or elsewhere, various strands of scholarship in political science and related fields suggest that labor unions are one of the few mass-membership organizations that provide collective voice to lower income individuals in the political arena, with potentially important consequences for political representation (Ahlquist and Levy 2013; Bartels 2016; Freeman and Medoff 1984; Schlozman et al. 2012). Consistent with a central premise of the collective voice perspective, unions tend to take positions favored by less affluent citizens. Gilens (2012, 154–161) compares public positions of national unions with mass policy preferences across several hundred policy issues and finds that unions’ positions are most strongly correlated with the preferences of the less well-off (see also Hacker and Pierson 2010; Schlozman 2015).⁴ Similarly, Schlozman et al. (2012, 87) conclude that unions are one of the few organizations in national politics “that advocate on behalf of the economic interest of workers who are not professionals or managers.”

However, shared preferences between the less well-off and organized labor are by no means sufficient to alter inequalities in political representation in national politics. This requires an effective political transmission mechanism. To guide the empirical analysis, we sketch key elements of a framework of union organization and political responsiveness.

Labor unions are organizations formed to bargain collectively on behalf of their members with employers over wages and conditions. Unions are thus created at the local (i.e., establishment) level (Freeman and Medoff 1984). Once formed, unions may (and often do) enter the political arena. The ability of unions to increase the rate of political participation (including voting, contacting officials, attending rallies, or making donations) of low- and middle-income citizens is often considered to be their key channel of political influence. Importantly, unions may also increase participation among non-members with similar policy preferences through get-out-the-vote campaigns and social networks (Leighley and Nagler 2007; Rosenfeld 2014; Schlozman et al. 2012). Making contributions to favored candidates and campaigns complements the ability of unions to communicate with and mobilize members or to provide campaign volunteers. Indeed, unions are among the leading contributors to political action committees (PAC), accounting for a quarter of total PAC spending in 2009 (Schlozman et al. 2012, ch. 14). In contrast to corporations and business organizations, union contributions “represent the aggregation of a large number of small individual donations” (Schlozman et al. 2012, 428).⁵

The credible threat of political mobilization can affect policy decisions by representatives in two general ways. First, it may shape who is elected in a given electoral district. If politicians are not exchangeable (because they differ in their preferences and beliefs), political selection is important. In an age of elite polarization (McCarty et al. 2006), the partisan identity of a representative is often crucial for determining legislative voting (Bartels 2016; Lee et al. 2004). Since the New Deal era, unions and union members have largely allied with the Democratic Party, given its stronger support for many of their broader policy demands (Lichtenstein 2013; Schlozman 2015). Political selection might also shape other political characteristics of representatives, such as their class background or race (Butler 2014; Carnes 2013).

Second, unions’ mobilization potential shapes the incentives of elected representatives beyond their partisan affiliation and personal traits. Policymakers’ rational anticipation of public reactions plays a central role in theories of accountability and dynamic responsiveness (Arnold 1990; Stimson et al. 1995). While many individual legislative votes do not affect the reelection prospects of representatives, on potentially salient votes they can face hard choices between party ideology and competing constituency preferences. On international trade agreements, for instance, Democratic representatives have faced cross-pressures between a more skeptical stance taken by unions and low-income constituents versus that of their own party (Box-Steffensmeier et al. 1997). On the other side of the aisle, in the wake of the financial crisis, Republican legislators found themselves torn between their own partisan views on stimulus spending and the pressure from less well-off constituents (Mian et al. 2010).

⁴ This is consistent with the argument that organized labor fosters norms of solidarity and support for the less well-off through leadership (Ahlquist and Levy 2013; Kim and Margalit 2017) or social interactions (Berelson et al. 1954).

⁵ While evidence on the direct effect of contributions on legislative behavior is mixed, recent field-experimental results indicate that contributions help to provide access (Kalla and Broockman 2016) or sway congressional staffers (Hertel-Fernandez et al. 2018).

Politicians’ incentives are also linked to information. Theories of representation emphasize that members of Congress, and especially the House, face numerous voting decisions in each term, and it would be unrealistic to assume that they have access to reliable unbiased polling data on constituency preferences on all the issues they face (Arnold 1990; Miller and Stokes 1963). Instead, representatives, with the help of their staffers, rely on alternative methods to assess public opinion, including constituent correspondence, town halls, contacts with community leaders, or local interest groups (Miler 2007). In this limited information context, the strength of local unions may enhance the visibility and perception of constituent preferences (Hertel-Fernandez et al. 2018).⁶

Following seminal theories of congressional action (Arnold 1990; Miller and Stokes 1963), our argument emphasizes that the strength of local unions underpins a credible mobilization threat that impacts the action of candidates and legislators. Anticipating mobilizing efforts by unions, a potential candidate may not even enter into the race; an elected, career-oriented politician might be pressured to alter his or her vote even without a full mobilization effort, as long as unions’ mobilization capacity is visible. Thus, both campaign contributions and candidate selection should matter as a channel linking local union strength and representation, since they are linked to credible threats of mobilization.

Our argument implies that the district-level strength of labor unions increases the responsiveness by members of Congress to the less affluent. While we know from previous work that politicians are considerably more responsive to the preferences of the affluent than those of the less well-off, this bias should be reduced in districts with relatively higher union membership. Substantively, it is crucial to assess how far the presence of unions can move responsiveness toward the ideal of political equality.⁷

⁶ Butler and Nickerson (2011) find that politicians respond when provided with more accurate opinion data. However, behavioral biases may lead politicians to discount constituent preferences they disagree with (Butler and Dynes 2016).

⁷ In line with a large literature, we focus on union membership as a key component of union strength. In a study of the effect of unions on legislative ideology, rather than income-biased responsiveness, Becher et al. (2018) argue that the structure of local unions (i.e., the concentration of unions in a given locality) matters as well. However, they also show empirically that union density and concentration are separable dimensions.

III. Data and Empirical Strategy

Any effort to test the relevance of unions for unequal representation confronts major challenges of measurement and causal interpretation. The dataset we have compiled allows us to address these issues to an extent previously impossible. We have created a panel of legislators’ roll call votes matched to income-specific policy preferences at the district level and district-level measures of union membership. Our main empirical strategy to examine the influence of unions on unequal representation is built on two basic pillars: district fixed effects and interactive controls. The fact that we observe several roll calls within a given congressional district allows us to specify a model with district fixed effects, which capture unobservable characteristics of districts (and states) that are constant over roll calls, such as historical legacies or the strength of partisan organization. To provide for a stricter test of the moderating effect of unions, we also allow a rich set of other district characteristics to moderate the link between income groups and legislators’ voting behavior. This amounts to estimating models including interactions between observed district characteristics and group preferences. In our most flexible specification, we allow these to be non-linear (we describe our models in more detail below).
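To make the logic of the two pillars concrete, the following is a minimal sketch under stated assumptions, not the estimation code used in the paper. It simulates roll calls nested in districts, removes district fixed effects with the within transformation, and then estimates the coefficients on group preferences and on their interactions with union membership by least squares. All names (`demean_within`, `ols`, `TRUE`), the coefficient values, and the data are hypothetical; the outcome is a noise-free linear index rather than a binary vote, so that the estimates can be checked against the assumed values.

```python
import random

def demean_within(rows, groups):
    """Subtract group (district) means from every column: the 'within' transformation."""
    sums, counts = {}, {}
    for row, g in zip(rows, groups):
        acc = sums.setdefault(g, [0.0] * len(row))
        for k, v in enumerate(row):
            acc[k] += v
        counts[g] = counts.get(g, 0) + 1
    return [[v - sums[g][k] / counts[g] for k, v in enumerate(row)]
            for row, g in zip(rows, groups)]

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] +
         [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Simulated data: 40 districts x 27 roll calls (all values are made up).
rng = random.Random(0)
TRUE = {"low": 0.1, "high": 0.6, "low_x_union": 0.3, "high_x_union": -0.2}
X, y, districts = [], [], []
for d in range(40):
    union = rng.gauss(0, 1)   # standardized district union membership
    alpha = rng.gauss(0, 1)   # unobserved district effect
    for _ in range(27):
        low, high = rng.random(), rng.random()  # group support for the bill
        # The union main effect is constant within a district, so the
        # district fixed effect absorbs it; only the interactions remain.
        x = [low, high, low * union, high * union]
        X.append(x)
        y.append(alpha + sum(b * v for b, v in zip(TRUE.values(), x)))
        districts.append(d)

Xw = demean_within(X, districts)
yw = [r[0] for r in demean_within([[v] for v in y], districts)]
estimates = dict(zip(TRUE, ols(Xw, yw)))
print(estimates)
```

Because the simulated outcome contains no noise, the within estimator recovers the assumed coefficients up to floating-point error even though the district effects `alpha` are never observed. A positive `low_x_union` coefficient corresponds to the pattern the argument predicts: responsiveness to low-income preferences rises with union membership.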

The data required to implement these models were constructed in three steps. First, we match information on roll call items for 223,000 CCES respondents to actual roll call votes cast in the House of Representatives in the 109th to the 112th Congress.8 Second, we estimate policy preferences for low- and high-income constituents in each district for 27 roll calls. To deal with the fact that the CCES is not a representative sample of district populations, we use a small area estimation strategy combining the CCES sample with unit record Census data, matching the full distribution of age, education, gender, race, and income using a chained Random Forests algorithm (more below and in Appendix B). Third, we measure district-level union membership based on digitized administrative records from the Department of Labor.

III.A CCES data and Congressional roll calls

The CCES is an ideal starting point for our analysis, since it is a nationally representative study, includes a considerable number of roll call questions, and provides us with a large enough sample size to decompose income-group preferences by district. It addresses several data concerns that plagued initial research on unequal responsiveness in Congress (Bhatti and Erikson 2011). The roll calls included in the CCES concern key votes as identified by Congressional Quarterly and the Washington Post and cover a broad range of issues

In this paper we focus on union membership, but show in a robustness test that our results still obtain when accounting for union concentration (see Table E1).

8 Our analysis focuses on one apportionment period, which generally holds district boundaries constant (we show that the results are robust to cases of mid-period redistricting).


(Ansolabehere and Jones 2010). Respondents are presented with the key wording of the bill (as used on the floor and in media reports) and are then asked to cast their own vote: “What about you? If you were faced with this decision, would you vote for, against, or not sure?” Contrary to widely used agree–disagree survey measures of issue preferences, matched roll call votes provide us with unequivocal evidence of policy congruence between respondent and legislator (Jessee 2009; Ansolabehere and Jones 2010, 585). We match 27 roll call items in the CCES to roll call votes cast in the House of the 109th to 112th Congress. These cover important legislative decisions, such as Dodd-Frank, the Affordable Care Act (and attempts to repeal it), the minimum wage increase, the ratification of the Central America Free Trade Agreement, or the Lilly Ledbetter Fair Pay Act. Table A1 in the Appendix lists all matched CCES items and House bills included in our estimation sample.

III.B Measuring constituency preferences by income group

The CCES provides us with a comparatively large sample size per district. However, an important potential issue is that it is not designed to be representative for congressional district populations. Thus, individuals with certain characteristics, such as particular combinations of income, race, and education, may be underrepresented in the CCES sample for a given district. If this is the case, unadjusted policy preferences from the CCES will not reflect the target population, and using them can lead to biased estimates of unequal representation in Congress, as politicians are held to the wrong benchmark. The solution to this issue is to employ some form of small area estimation to rebalance the survey sample to represent the district population. The machine-learning solution we propose is relatively new to the representation literature in political science, but it has some attractive features that merit its application to this topic. It does not require distributional and functional form assumptions, it allows for arbitrary higher-order interactions of covariates, and it can fully leverage fine-grained census data to construct representative samples of congressional districts. However, we stress that our findings do not depend on this particular approach. As shown in Online Appendix B, our approach leads to somewhat more conservative estimates of the impact of unions on the representation of different income groups compared to the MRP approach widely used by political scientists (Lax and Phillips 2009). Qualitatively, both approaches yield the same conclusions.

Our approach, small area estimation using chained random forests, matches CCES survey respondents to corresponding cases from unit record Census data. The design of the Census ensures an accurate representation of the distribution of population characteristics in a given district (Torrieri et al. 2014, Ch. 4). Matching these two data sources is essentially a prediction problem, which we address using a flexible non-parametric machine learning approach based on random forests (Stekhoven and Bühlmann 2011).9 Put simply, the idea is

9 Honaker and Plutzer (2016) use a similar approach (but relying on multivariate normal imputations) and further discuss its empirical performance in estimating small area attitudes and preferences.


that rich census data exist for every district, whereas survey data on preferences are scarce in some districts and may not be fully representative. Using general machine learning tools, we can attach preferences to the Census by matching it to CCES respondents based on common demographic characteristics. The resulting data set of public preferences is representative of congressional districts.

Concretely, we use about 3 million individual-level records from a synthetic sample of the Census Bureau’s American Community Survey from 2006 to 2011. We stack both datasets, creating a structure where we have common district identifiers and individual covariates, while responses to policy preference questions are missing in the Census portion of the data. As common covariates bridging CCES and Census, we use the following demographic characteristics: gender, race (3 categories), education (5 categories), age (continuous), and family income (continuous).10 The latter is of particular relevance, as we are interested in producing district–income group specific preferences.

In the next step, we fill missing roll call preferences in the Census with matching data from CCES respondents. Since this is essentially a prediction problem, we can use powerful tools developed in the machine learning literature to achieve this task. We use an algorithm proposed by Stekhoven and Bühlmann (2011), which uses chained random forests (Breiman 2001) to impute missing cells. Compared to commonly used multivariate normal or regression imputation techniques, this strategy has the advantage that it is fully nonparametric, allows for complex interactions between covariates, and deals with both continuous and categorical data (Tang and Ishwaran 2017). Our completed data set now contains preferences for 27 roll call items of synthetic ‘Census individuals’, which are a representative sample of each House district.

With these data in hand, we assign individuals to income groups and calculate group-specific preferences for each roll call in each district. Following previous work in the representation literature (Bartels 2008, 2016), we delineate low- and high-income respondents using the 33rd and 67th percentile of the distribution of family incomes. Note that, in line with theories of constituency representation in Congress, we specify these income thresholds separately by congressional district. This accounts for the substantial differences in both average income and income inequality between U.S. districts. It also ensures that within each district, income groups are of comparable size. Online Appendix Table A2 shows the distribution of income-group cutoffs. On average, our chosen cutoffs are close to those used in the established literature. The mean of our district-specific low-income cutoffs is around $39,000, while Bartels uses $40,000 (Bartels 2016, 240); our mean high-income cutoff is around $81,000, where Bartels employs a threshold of $80,000. However, beyond these averages lies considerable variation. In some districts the 33rd percentile cutoff is as low as $16,500, while the 67th percentile reaches almost $160,000 in others.11
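The district-specific thresholds described above can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout (`district`, `income`), the function names, and the linear-interpolation percentile rule are all assumptions made for illustration.

```python
from collections import defaultdict


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    if not sorted_vals:
        raise ValueError("empty input")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def assign_income_groups(records, low_q=33, high_q=67):
    """Label each record 'low', 'middle' or 'high' using income cutoffs
    computed separately within each district (hypothetical helper)."""
    by_district = defaultdict(list)
    for r in records:
        by_district[r["district"]].append(r["income"])

    # One (low, high) cutoff pair per district.
    cutoffs = {}
    for district, incomes in by_district.items():
        s = sorted(incomes)
        cutoffs[district] = (percentile(s, low_q), percentile(s, high_q))

    labelled = []
    for r in records:
        lo_cut, hi_cut = cutoffs[r["district"]]
        if r["income"] <= lo_cut:
            group = "low"
        elif r["income"] >= hi_cut:
            group = "high"
        else:
            group = "middle"
        labelled.append({**r, "group": group})
    return labelled, cutoffs


if __name__ == "__main__":
    # Toy data: the same income can fall into different groups
    # depending on the district's own income distribution.
    toy = [{"district": "A", "income": v * 1000} for v in range(10, 100, 10)]
    toy += [{"district": "B", "income": v * 1000} for v in range(50, 500, 50)]
    labelled, cutoffs = assign_income_groups(toy)
    print(cutoffs)
```

Because the cutoffs are computed within each district, an identical family income can count as middle-income in a poorer district and low-income in a richer one, which is the point of specifying thresholds separately by district.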

10 See Appendix B for more details on the construction of our Census sample and our matching/imputation procedure.

11 Results are relatively invariant to using alternative income thresholds (see Table C1).


[Six histogram panels: Increase Minimum Wage; Housing Crisis Assistance; Fair Pay Act; Affordable Care Act; CAFTA Ratification; Recovery and Reinvestment]

Figure I: District-level income gap in public support for 6 selected policies

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district’s income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by \((\theta^l, \theta^h)\), as the proportion of individuals voting ‘yea’. Since preference estimates are in [0, 1], they can be directly related to legislators’ probability of voting ‘yea’ on a given roll call. Our data shows considerable variation in the distance of the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences \((\theta^h - \theta^l)\) in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, the vast majority of low-income constituents are more supportive than their high-income counterparts in each and every district. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly in favor. In all examples, we find considerable across-district variation in the preference gap between low- and high-income constituents.12 We will employ this variation over both roll calls and districts to estimate legislators’ differential

12 Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.


responsiveness to changes in policy preferences of different income groups, and how it might be moderated by union strength.
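As a rough illustration of how group preferences and the preference gap could be computed from individual-level data, consider the sketch below. The field names (`district`, `roll_call`, `group`, `vote`) and both helper functions are hypothetical, and the gap follows the text's notation (high-income minus low-income share voting ‘yea’).

```python
from collections import defaultdict


def group_preferences(individuals):
    """Share voting 'yea' (vote == 1) among low- and high-income
    individuals, per (district, roll_call). Hypothetical helper."""
    yeas = defaultdict(int)
    totals = defaultdict(int)
    for person in individuals:
        if person["group"] not in ("low", "high"):
            continue  # the middle-income group is not used here
        key = (person["district"], person["roll_call"], person["group"])
        yeas[key] += person["vote"]
        totals[key] += 1

    prefs = {}
    for (district, roll_call, group), n in totals.items():
        share = yeas[(district, roll_call, group)] / n
        prefs.setdefault((district, roll_call), {})[group] = share
    return prefs


def preference_gap(prefs):
    """Signed gap theta_h - theta_l for every district-roll call pair
    in which both groups are observed."""
    return {
        key: groups["high"] - groups["low"]
        for key, groups in prefs.items()
        if "low" in groups and "high" in groups
    }


if __name__ == "__main__":
    toy = (
        [{"district": "D1", "roll_call": "min_wage", "group": "low", "vote": v}
         for v in (1, 1, 1, 0)]
        + [{"district": "D1", "roll_call": "min_wage", "group": "high", "vote": v}
           for v in (1, 0, 0, 0)]
    )
    prefs = group_preferences(toy)
    print(prefs, preference_gap(prefs))
```

Taking the absolute value of each gap and averaging over districts and roll calls would give a summary of the kind reported in footnote 12.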

III.C District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report, or reporting falsified information, is made a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit nonresponse (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents’ confidentiality) or only covered with insufficient sample sizes.14

Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
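The aggregation step can be sketched in a few lines of Python. This is only an illustration under assumed field names (`district`, `members`); it also shows one way to log and z-standardize the district totals, as the statistical specification later describes. The use of the population standard deviation is an assumption.

```python
import math
import statistics
from collections import defaultdict


def district_membership(reports):
    """Sum reported members over all union reports matched to a district."""
    totals = defaultdict(int)
    for rep in reports:
        totals[rep["district"]] += rep["members"]
    return dict(totals)


def standardized_log(totals):
    """Log district totals, then scale to mean zero and unit standard
    deviation (population SD; an assumption for this sketch)."""
    logs = {d: math.log(m) for d, m in totals.items() if m > 0}
    values = list(logs.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("no variation in membership across districts")
    return {d: (v - mean) / sd for d, v in logs.items()}


if __name__ == "__main__":
    # Hypothetical reports; district labels are illustrative only.
    toy_reports = [
        {"district": "NY-01", "members": 1200},
        {"district": "NY-01", "members": 300},
        {"district": "NY-02", "members": 500},
        {"district": "TX-03", "members": 100},
    ]
    totals = district_membership(toy_reports)
    print(totals)
    print(standardized_log(totals))
```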

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.


[Figure legend: 1st quartile, 2nd quartile, 3rd quartile, 4th quartile]

Figure II: Union membership in House districts, 109th-112th Congress

complete coverage, as during the latter part of the twentieth century unions are increasingly organizing workers across different sectors and occupations (Lichtenstein 2013, 249).15

III.D Statistical specifications

For each roll call vote \(j\) (\(j = 1, \dots, J\)), we have measured preferences of low- and high-income citizens in a given congressional district \(d\) (\(d = 1, \dots, D\)), denoted by \((\theta^l_{jd}, \theta^h_{jd})\). For each district, the level of (logged) union membership is denoted by \(U_d\). Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in \(X_d\). Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts’ workers to organize collective action, (iv) as well as non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House

15 While there is no “gold standard” of accurate union membership numbers, we can compare aggregate membership based on our LM form data with a widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, D.C., which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS, given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 0.86).


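members is the following linear probability specification:

\[
\begin{aligned}
y_{ijd} ={}& \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l (U_d \times \theta^l_{jd}) + \eta^h (U_d \times \theta^h_{jd}) \\
&+ \beta^l (X_d \times \theta^l_{jd}) + \beta^h (X_d \times \theta^h_{jd}) + \alpha_d + \epsilon_{ijd}
\end{aligned}
\]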

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, \(U_d \theta^h_{jd}\) and \(U_d \theta^l_{jd}\). Thus, when \(\eta^l\) and \(\eta^h\) are zero, the group-specific preference coefficients \(\mu^l\) and \(\mu^h\) indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient \(\eta^l\) indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators’ votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by \(\eta^h\). Our theoretical expectation is that \(\eta^l > 0\) and \(\eta^h \le 0\).
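To make the interpretation of these coefficients explicit, the following short derivation (a reader's aid that uses only the symbols of the specification above) differentiates the conditional expectation of the vote with respect to each group's preferences:

\[
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^l_{jd}} = \mu^l + \eta^l U_d + \beta^l X_d,
\qquad
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^h_{jd}} = \mu^h + \eta^h U_d + \beta^h X_d .
\]

Because all inputs are standardized, a district with average union membership and average covariates (\(U_d = 0\), \(X_d = 0\)) has responsiveness \(\mu^l\) and \(\mu^h\). Holding \(X_d\) at zero, the responsiveness gap between the affluent and the poor is \((\mu^h - \mu^l) + (\eta^h - \eta^l)\, U_d\), so it shrinks as \(U_d\) rises whenever \(\eta^l > 0\) and \(\eta^h \le 0\).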

In order to mitigate the influence of unobserved confounders affecting legislators’ voting behavior, we account for time-constant unobservables on the district level by including district fixed effects, \(\alpha_d\).16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both on the district and state level) and group preferences, \(X_d \theta^l_{jd}\) and \(X_d \theta^h_{jd}\). They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses, detailed below, we allow these confounds to be strongly non-linear as well. Finally, \(\epsilon_{ijd}\) are white-noise errors, assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
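The role of district fixed effects can be illustrated with the within transformation, which is one standard way to estimate them (whether the authors demean or include district dummies is not stated; the two are numerically equivalent for the slope coefficients). In the hypothetical sketch below, a variable that is constant within a district, such as union membership, is wiped out by demeaning, while its interaction with roll-call-specific preferences still varies.

```python
from collections import defaultdict


def demean_within(rows, group_key, value_keys):
    """Subtract the group mean from each listed variable
    (the within, or fixed-effects, transformation)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for r in rows:
        counts[r[group_key]] += 1
        for k in value_keys:
            sums[r[group_key]][k] += r[k]

    out = []
    for r in rows:
        g = r[group_key]
        new = dict(r)
        for k in value_keys:
            new[k] = r[k] - sums[g][k] / counts[g]
        out.append(new)
    return out


if __name__ == "__main__":
    # Toy panel: two districts, two roll calls each (made-up numbers).
    raw = [
        {"district": "A", "union": 1.0, "theta_l": 0.2},
        {"district": "A", "union": 1.0, "theta_l": 0.6},
        {"district": "B", "union": -0.5, "theta_l": 0.4},
        {"district": "B", "union": -0.5, "theta_l": 0.8},
    ]
    for r in raw:
        r["union_x_theta_l"] = r["union"] * r["theta_l"]

    demeaned = demean_within(raw, "district", ["union", "union_x_theta_l"])
    for r in demeaned:
        # 'union' is zero after demeaning; the interaction is not.
        print(r["district"], r["union"], round(r["union_x_theta_l"], 3))
```

This is why the non-interacted union term drops out of the model while the interaction coefficients remain identified from variation across roll calls within a district.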

IV. Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators’ responsiveness emerging from our data. Estimating a model as described above, with district fixed effects but without accounting for local union organization (setting \(\beta^l, \beta^h\) and \(\eta^l, \eta^h\) to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase in the probability of legislators to cast a corresponding vote of 13.6 (±1.2) percentage points. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators’ behavior of 1.6 (±1.4) percentage points. With a

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in \(\alpha_d\).


confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators’ non-responsiveness depends crucially on the strength of local unions.
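The reported interval for low-income responsiveness can be checked with a short calculation. The sketch below assumes that the values in parentheses are standard errors and that the interval is a normal-approximation 95% interval; both are assumptions about the notation, and the inputs are the rounded figures from the text (1.6 with standard error 1.4, and 13.6 with standard error 1.2, all in percentage points).

```python
def normal_ci(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval: estimate ± z * SE."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    low = normal_ci(1.6, 1.4)    # low-income responsiveness
    high = normal_ci(13.6, 1.2)  # high-income responsiveness
    print("low income: ", low)
    print("high income:", high)
    # Zero lies inside the low-income interval but not the high-income one.
    print(low[0] < 0 < low[1], high[0] > 0)
```

With these rounded inputs the low-income interval comes out close to the range reported in the text and includes zero; small discrepancies would be expected from rounding of the published estimate and standard error.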

IV.A Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives’ roll-call votes at varying levels of union membership, with 95% confidence intervals.17 It shows that legislators’ responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators’ responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preference-confounder interactions (setting \(\beta^l\) and \(\beta^h\) to zero). We find that a standard deviation increase in district union membership increases legislators’ responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018;

17 Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors on the district level. See also specification (1) in Table I below.


[Two panels: Low income constituents; High income constituents. x-axis: Union membership [std], with sample percentiles p10, p25, p50, p75, p90 marked; y-axis: Marginal effect]

Figure III: District-level union membership as moderator of unequal representation

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives’ roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter \(X_d\) and are interacted with income group preferences \(\theta^l\) and \(\theta^h\). In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators’ vote choice by including state-specific constants in \(X_d\), which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or


Table I: Union density and representation. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| District fixed effects | X | X | X | X | X | X |
| Group preferences × union policy | | X | | | | X |
| Group preferences × state constants | | | X | | | |
| Group preferences × organizational capacity | | | | X | | X |
| Group preferences × district covariates | | | | | X | X |

Note: N = 15,780; N_d = 534; 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, \(\eta^l\) and \(\eta^h\). Specifications (2) to (5) include coefficients for interactions (\(\beta^l\), \(\beta^h\)) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements; (3) interacts preferences with state fixed effects; (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections; (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB’s database to

18 Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59


extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district’s organizational capacity in specification (4).

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that, even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators’ responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts’ socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics see Table A3). This set of covariates excludes “bad controls” (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily”.

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in “Bowling Alone” (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.

Table II: Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

| | Low income | High income |
|---|---|---|
| (1a) Social capital: bowling alleys | 0.067 (0.014) | −0.051 (0.013) |
| (1b) Social capital index | 0.065 (0.014) | −0.048 (0.013) |
| (2) Redistricting | 0.067 (0.014) | −0.051 (0.013) |
| (3) MRP estimated preferences | 0.115 (0.022) | −0.091 (0.018) |

Note: Based on specification (5) in Table I. Entries are parameter estimates for \(\eta^l\) and \(\eta^h\). Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts; N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred; N = 14,150. Specification (3) uses preferences estimated using MRP; see Appendix B for more details; N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show, in specification (3), that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low-income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high-income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In

18

the same table we also show that our core results are also obtained when simply aggregatingraw preference data from the CCES
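To make the poststratification step of MRP concrete, the following minimal sketch may help. It assumes that a multilevel model has already produced a predicted level of support for each demographic cell of a district. The cell labels, predictions, and population counts are hypothetical and are not taken from our data, and the function name `poststratify` is ours for illustration only.

```python
def poststratify(cell_predictions, cell_counts):
    """Population-weighted average of cell-level predictions for one district."""
    total = sum(cell_counts[cell] for cell in cell_predictions)
    if total == 0:
        raise ValueError("no population in the listed cells")
    weighted = sum(cell_predictions[c] * cell_counts[c] for c in cell_predictions)
    return weighted / total

# Hypothetical cells (income group x education) for a single district.
predicted_support = {
    ("low", "no BA"): 0.70, ("low", "BA"): 0.60,
    ("high", "no BA"): 0.50, ("high", "BA"): 0.40,
}
census_counts = {
    ("low", "no BA"): 300, ("low", "BA"): 100,
    ("high", "no BA"): 200, ("high", "BA"): 400,
}

# District-wide estimate, weighting each cell by its population count.
district_estimate = poststratify(predicted_support, census_counts)

# Income-specific estimate: poststratify over the low income cells only.
low_cells = {c: p for c, p in predicted_support.items() if c[0] == "low"}
low_income_estimate = poststratify(low_cells, census_counts)
```

Restricting the cells to one income group before weighting is one simple way to obtain group-specific district preferences; the weighting scheme used for the estimates reported in Appendix B may differ in its details.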

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far and "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.[21] The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
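The logic of double selection can be sketched in a few lines. The example below is an illustration under simplifying assumptions and is not the code behind our estimates: it uses a single treatment variable, simulated data, a plain LASSO with a fixed penalty (`lam=0.2`, an arbitrary value) rather than the root-LASSO, and no sample splitting. All function and variable names are hypothetical.

```python
import numpy as np

def lasso(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    X = X - X.mean(axis=0)              # center so no intercept is needed
    resid = y - y.mean()
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid = resid + X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def post_double_selection(y, d, X, lam):
    """Select controls from outcome and treatment equations, then run OLS."""
    keep_outcome = np.flatnonzero(lasso(X, y, lam))
    keep_treatment = np.flatnonzero(lasso(X, d, lam))
    selected = np.union1d(keep_outcome, keep_treatment)
    Z = np.column_stack([d, X[:, selected], np.ones(len(y))])
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    return coef[0], selected

# Simulated data: the first control confounds treatment and outcome.
rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
d = X[:, 0] + rng.normal(size=n)
y = 0.5 * d + 2.0 * X[:, 0] + rng.normal(size=n)   # true effect of d is 0.5

naive = np.linalg.lstsq(np.column_stack([d, np.ones(n)]), y, rcond=None)[0][0]
effect, selected = post_double_selection(y, d, X, lam=0.2)
```

Taking the union of the two selected sets is what protects against omitting a control that matters strongly for the treatment but only weakly for the outcome, or vice versa. In this simulation, the naive regression that omits the confounder overstates the effect, while the post-selection estimate is close to the true value.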

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications, with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

[21] The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III
Post-double-selection estimator: Marginal effect of unionization on legislative responsiveness to low and high income groups

                              (1)        (2)        (3)

Low income preferences       0.063      0.066      0.062
                            (0.014)    (0.017)    (0.016)

High income preferences     -0.054     -0.036     -0.040
                            (0.013)    (0.015)    (0.016)

Semi-parametric terms         144        312        624
Post-LASSO terms               18         45        112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on a 50% sample, parameter estimates performed on the remaining 50% (N=7884). Table entries are estimates for ηL and ηH with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and might only be found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.[22] The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of the outcome with respect to district preferences at different levels of union membership (Hainmueller and Hazlett 2014, 156).
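The mechanics can be illustrated with a compact sketch. It is a simplified stand-in for the estimator described in Appendix G, not our estimation code: the data are simulated, the kernel bandwidth `sigma2` and the regularization parameter `lam` are set to assumed values rather than chosen by cross-validation, and all function names are hypothetical. The fitted surface is $f(x) = \sum_i c_i \exp(-\lVert x - x_i \rVert^2 / \sigma^2)$, so its pointwise partial derivatives are available in closed form.

```python
import numpy as np

def gaussian_kernel(A, B, sigma2):
    """Kernel matrix with entries exp(-||a - b||^2 / sigma2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def krls_fit(X, y, lam, sigma2):
    """Weights c solving the regularized system (K + lam * I) c = y."""
    K = gaussian_kernel(X, X, sigma2)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krls_predict(X_new, X, c, sigma2):
    """Fitted values: weighted sum of kernels centered at the observations."""
    return gaussian_kernel(X_new, X, sigma2) @ c

def krls_gradient(x0, X, c, sigma2):
    """Pointwise partial derivatives of the fitted surface at the point x0."""
    k = gaussian_kernel(x0[None, :], X, sigma2)[0]
    return (-2.0 / sigma2) * ((x0[None, :] - X) * (k * c)[:, None]).sum(axis=0)

# Simulated, standardized data (hypothetical): the effect of "preferences"
# on the outcome grows with "union membership".
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))            # columns: preferences, union membership
y = (0.2 + 0.3 * X[:, 1]) * X[:, 0] + rng.normal(scale=0.05, size=200)

sigma2, lam = 2.0, 0.1                   # assumed values, not tuned
c = krls_fit(X, y, lam, sigma2)

# Partial effect of preferences at low and high union membership.
effect_low_union = krls_gradient(np.array([0.0, -1.0]), X, c, sigma2)[0]
effect_high_union = krls_gradient(np.array([0.0, 1.0]), X, c, sigma2)[0]
```

No interaction term is specified anywhere in the fit. Whether the partial effect of preferences varies with union membership is read off from the derivatives afterwards, which is the logic behind the summary in Figure IV.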

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents." x-axis: union membership [std], with the p10, p25, p50, p75, and p90 percentiles marked; y-axis: partial effect.]

Figure IV
Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

[22] See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV
Effect heterogeneity: Marginal effects of unionization on legislative responsiveness to low and high income groups

                                 Low income        High income

(A) Private vs. public unions
    Public unions                0.074 (0.016)     -0.058 (0.015)
    Non-public unions            0.054 (0.016)     -0.027 (0.016)

(B) Bill ideology
    Conservative bill            0.086 (0.017)     -0.086 (0.018)
    Liberal bill                 0.052 (0.014)     -0.028 (0.013)

(C) AFL-CIO endorsement
    No position                  0.054 (0.014)     -0.054 (0.013)
    Endorsement                  0.077 (0.015)     -0.040 (0.014)

Note: Estimates for ηL and ηH with cluster-robust standard errors in parentheses. N=15780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.[23] AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).[24] The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

[23] Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

[24] For high-income preferences, the estimate for ηh is smaller for endorsed bills, but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
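The retransformation from the log scale to dollar amounts can be illustrated with a short sketch of Duan's smearing estimator, which multiplies the exponentiated log-scale prediction by the mean of the exponentiated residuals. The residuals, the baseline prediction, and the coefficient used below are hypothetical values chosen for illustration; they are not our estimates, and the function names are ours.

```python
import math

def smearing_factor(log_residuals):
    """Duan's smearing factor: the mean of the exponentiated residuals."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)

def predict_level(log_prediction, log_residuals):
    """Retransform a log-scale prediction to the original (dollar) scale."""
    return math.exp(log_prediction) * smearing_factor(log_residuals)

# Hypothetical residuals from a regression of log contributions.
residuals = [-0.5, 0.0, 0.5]

# Hypothetical log-scale predictions before and after a one standard
# deviation increase in union membership (assumed coefficient: 0.05).
baseline = predict_level(12.0, residuals)
shifted = predict_level(12.0 + 0.05, residuals)
dollar_change = shifted - baseline
```

Simply exponentiating the log-scale prediction would understate the expected dollar amount whenever the residuals have nonzero variance, which is why the smearing factor exceeds one here.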

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V
Labor contributions and selection of Democratic legislators

                                          (1)        (2)        (3)        (4)

A: Contributions channel
                                         DV: Contrib.           DV: roll call
Union membership                         0.056      0.046
                                        (0.012)    (0.014)
Contributions × low income prefs.                              0.946      0.865
                                                              (0.036)    (0.034)
Contributions × high income prefs.                            -0.735     -0.714
                                                              (0.029)    (0.031)

B: Selection channel
                                         DV: Democrat           DV: roll call
Union membership                         0.161      0.106
                                        (0.024)    (0.023)
Democrat × low income prefs.                                   0.576      0.542
                                                              (0.012)    (0.015)
Democrat × high income prefs.                                 -0.411     -0.423
                                                              (0.013)    (0.015)

District controls                                     X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed, but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts, as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences, as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables, and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds, separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds, as well as the smallest and largest ones.
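The classification rule can be sketched as follows. The income values are hypothetical, the use of Python's `statistics.quantiles` to obtain the two tercile cut points is an assumption for illustration (our thresholds are computed from ACS household files), and the treatment of incomes exactly at a threshold is likewise an assumption.

```python
import statistics

def tercile_thresholds(district_incomes):
    """Two cut points separating the lowest and highest income terciles."""
    lower, upper = statistics.quantiles(district_incomes, n=3)
    return lower, upper

def income_group(household_income, lower, upper):
    """Classify a respondent relative to district-specific thresholds."""
    if household_income < lower:
        return "low"
    if household_income > upper:
        return "high"
    return "middle"   # assumption: boundary values count as middle

# Hypothetical household incomes for one district.
incomes = [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000]
lower, upper = tercile_thresholds(incomes)

groups = [income_group(x, lower, upper) for x in (20000, 50000, 80000)]
```

Because the thresholds are computed separately for each district, the same household income can fall into different groups in different districts, which is why Table A2 reports the range of thresholds across districts.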

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1
Matched CCES-House roll calls included in our analysis

Match  Bill       Date        Name  House Vote (Yea-Nay)  Bill Ideology†

(1)    HR 810     07/19/2006  Stem Cell Research Enhancement Act (Presidential Veto Override)  235-193  L
(1)    HR 3       01/11/2007  Stem Cell Research Enhancement Act of 2007 (House)  253-174  L
(1)    S 5        06/07/2007  Stem Cell Research Enhancement Act of 2007  247-176  L
(2)    HR 2956    07/12/2007  Responsible Redeployment from Iraq Act  223-201  L
(3)    HR 2       01/10/2007  Fair Minimum Wage Act  315-116  L
(4)    HR 4297    12/08/2005  Tax Relief Extension Reconciliation Act (Passage)  234-197  C
(4)    HR 4297    05/10/2006  Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)  244-185  C
(5)    HR 3045    07/28/2005  Dominican Republic-Central America-United States Free Trade Agreement Implementation Act  217-215  C
(6)    S 1927     08/04/2007  Protect America Act  227-183  C
(6)    HR 6304    06/20/2008  FISA Amendments Act of 2008  293-129  C
(7)    HR 3162    08/01/2007  Children's Health and Medicare Protection Act  225-204  L
(7)    HR 976     10/18/2007  Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)  273-156  L
(7)    HR 3963    01/23/2008  Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)  260-152  L
(7)    HR 2       02/04/2009  Children's Health Insurance Program Reauthorization Act  290-135  L
(8)    HR 3221    07/23/2008  Foreclosure Prevention Act of 2008  272-152  L
(9)    HR 3688    11/08/2007  United States-Peru Trade Promotion Agreement  285-132  C
(10)   HR 1424    10/03/2008  Emergency Economic Stabilization Act of 2008  263-171  L
(11)   HR 3080    10/12/2011  To implement the United States-Korea Trade Agreement  278-151  C
(12)   HR 3078    10/12/2011  To implement the United States-Colombia Trade Promotion Agreement  262-167  C
(13)   HR 2346    06/16/2009  Supplemental Appropriations Fiscal Year 2009 (Agreeing to Conference Report)  226-202  L
(14)   HR 2831    07/31/2007  Lilly Ledbetter Fair Pay Act  225-199  L
(14)   HR 11      01/09/2009  Lilly Ledbetter Fair Pay Act of 2009 (House)  247-171  L
(14)   S 181      01/27/2009  Lilly Ledbetter Fair Pay Act of 2009  250-177  L
(15)   HR 1913    04/29/2009  Local Law Enforcement Hate Crimes Prevention Act  249-175  L
(16)   HR 1       02/13/2009  American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)  246-183  L
(17)   HR 2454    06/26/2009  American Clean Energy and Security Act  219-212  L
(18)   HR 3590    03/21/2010  Patient Protection and Affordable Care Act  220-212  L
(19)   HR 3962    11/07/2009  Affordable Health Care for America Act  221-215  L
(20)   HR 4173    06/30/2010  Wall Street Reform and Consumer Protection Act of 2009  237-192  L
(21)   HR 2965    12/15/2010  Don't Ask, Don't Tell Repeal Act of 2010  250-175  L
(22)   S 365      08/01/2011  Budget Control Act of 2011  269-161  C
(23)   H CR 34    04/15/2011  House Budget Plan of 2011  235-193  C
(24)   H CR 112   03/28/2012  Simpson-Bowles/Cooper Amendment to House Budget Plan  38-382  C
(25)   HR 8       08/01/2012  American Taxpayer Relief Act of 2012 (Levin Amendment)  170-257  L
(26)   HR 2       01/19/2011  Repealing the Job-Killing Health Care Law Act  245-189  C
(26)   HR 6079    07/11/2012  Repeal the Patient Protection and Affordable Care Act and [...]  244-185  C
(27)   HR 1938    07/26/2011  North American-Made Energy Security Act  279-147  C

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2
Distribution of district income-group reference points: Average threshold over all districts, smallest and largest value

              33rd percentile               67th percentile

Congress     Mean     Min      Max         Mean     Min      Max

109         38123   16800    73675        77964   39612   146870
110         40127   18000    77000        83047   43600   155113
111         39021   17500    78262        82440   46000   160050
112         37381   16500    81000        79868   38500   158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3
Descriptive statistics of analysis sample

                                      Mean      SD      Min      Max       N

Roll-call vote: yea                  0.568   0.495    0.000    1.000   15780
Constituent preferences
  Low income                         0.593   0.220    0.047    0.979   15934
  High income                        0.555   0.198    0.037    0.967   15934
  Low-High gap                       0.172   0.121    0.000    0.588   15934
Union membership [log]               9.705   1.046    6.094   13.619   15934
Population                           7.022   0.723    4.697    9.980   15934
Share African American               0.124   0.146    0.004    0.680   15934
Share Hispanic                       0.156   0.174    0.005    0.812   15934
Share BA or higher                   0.275   0.097    0.073    0.645   15934
Median income [$10,000]              5.177   1.356    2.282   10.439   15934
Share female                         0.508   0.010    0.462    0.543   15934
Manufacturing share                  0.110   0.047    0.025    0.281   15934
Urbanization                         0.790   0.199    0.213    1.000   15934
Certification elections [log]        3.347   0.861    0.000    5.100   15934
Congregations [per 1000 persons]     0.765   1.147    0.062    6.453   15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B.1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.[25]

Combining CCES and Census data using Random Forests. Figure B.1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \ldots, Z_K)$ for $p = 1, \ldots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $\hat{f}(Z_1, \ldots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B.1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

[25] See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


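[Figure: stacked data matrix with columns for covariates ($Z_{i1}, \ldots, Z_{iK}$) and preferences ($Y_{i1}, \ldots, Y_{iP}$). Rows $1, \ldots, m$ are CCES units with both covariates and preferences observed; rows $m+1, \ldots, m+n$ are Census units with covariates observed and preferences missing (NA). A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict $Y^*_p = f(Z)$ for the Census rows.]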

Figure B.1: Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative on the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey, 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
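The sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the data layout, and the toy PUMA shares are all hypothetical, and the sketch assumes that a crosswalk giving each PUMA's share of a district's population is already available.

```python
import random

def synthetic_district_sample(puma_records, crosswalk, n_draws, seed=0):
    """Draw a synthetic sample for one district (illustrative sketch).

    puma_records: dict mapping PUMA id -> list of individual records
    crosswalk:    dict mapping PUMA id -> share of the district's population
                  living in that PUMA (shares sum to 1)
    """
    rng = random.Random(seed)
    pumas = list(crosswalk)
    weights = [crosswalk[p] for p in pumas]
    sample = []
    for _ in range(n_draws):
        # Pick a PUMA with probability equal to its share of the district ...
        puma = rng.choices(pumas, weights=weights, k=1)[0]
        # ... then pick one individual from that PUMA's donor pool.
        sample.append(rng.choice(puma_records[puma]))
    return sample

# Hypothetical toy data: 70% of the district's population lives in PUMA "A".
records = {
    "A": [{"puma": "A", "age": 30}, {"puma": "A", "age": 55}],
    "B": [{"puma": "B", "age": 41}],
}
district = synthetic_district_sample(records, {"A": 0.7, "B": 0.3}, n_draws=2000)
share_a = sum(r["puma"] == "A" for r in district) / len(district)
print(round(share_a, 2))  # close to the crosswalk share of 0.7
```

Sampling with replacement keeps the sketch short; whether the original procedure sampled with or without replacement is not stated in the text.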

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.[26] We transform this discretized measure of income into a continuous one using a nonparametric midpoint

[26] The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
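A midpoint Pareto score can be computed along the following lines. This is a sketch under stated assumptions: the top-bin formula is the common textbook variant that estimates the Pareto shape from the two highest bins, which may differ in detail from the estimator used in the paper, and the bin bounds and counts in the example are hypothetical.

```python
import math

def midpoint_pareto_scores(lower_bounds, counts):
    """Assign a dollar score to each income bin (illustrative sketch).

    lower_bounds: ascending lower bound of each bin; the last bin is open-ended
    counts:       number of respondents in each bin
    """
    # Closed bins: midpoint between consecutive lower bounds.
    scores = [(lo + hi) / 2 for lo, hi in zip(lower_bounds[:-1], lower_bounds[1:])]
    # Open-ended top bin: Pareto shape estimated from the two highest bins.
    n_top, n_next = counts[-1], counts[-2]
    l_top, l_next = lower_bounds[-1], lower_bounds[-2]
    alpha = (math.log(n_top + n_next) - math.log(n_top)) / (
        math.log(l_top) - math.log(l_next)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    # Mean of a Pareto distribution truncated from below at l_top.
    scores.append(l_top * alpha / (alpha - 1))
    return scores

# Hypothetical bins: [0, 10k), [10k, 20k), [20k, 30k), [30k, 150k), 150k+
bounds = [0, 10_000, 20_000, 30_000, 150_000]
counts = [10, 20, 30, 35, 5]
scores = midpoint_pareto_scores(bounds, counts)
print(scores[2])  # 25000.0, the midpoint of the third bin
```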

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \ldots, N\}$, we can separate out four parts:[27]

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \ldots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables iteratively, fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1  Start with initial guesses of missing values in D
 2  w ← vector of column indices sorted by increasing fraction of NA
 3  while not c do
 4      D_imp_old ← previously imputed D
 5      for v in w do
 6          Fit Random Forest: y(v)_obs ~ x(v)_obs
 7          Predict y(v)_mis using x(v)_mis
 8          D_imp_new ← updated imputed matrix using predicted y(v)_mis
 9      Update stopping criterion c
10  Return completed D_imp
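The loop structure of Algorithm 1 can be sketched in plain Python. To stay self-contained, this sketch swaps in a simple k-nearest-neighbour mean as the learner instead of a random forest, handles numeric columns only, and uses a squared-change tolerance as the stopping criterion; all function names and the toy data are hypothetical.

```python
def knn_predict(train_x, train_y, query_x, k=3):
    """Stand-in learner (NOT a random forest): mean outcome of the k nearest rows."""
    dist = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query_x)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = [y for _, y in dist[:k]]
    return sum(nearest) / len(nearest)

def chained_imputation(data, predict=knn_predict, max_iter=10, tol=1e-8):
    """Fill None entries column by column until filled-in values stop changing."""
    n, p = len(data), len(data[0])
    missing = [[data[i][v] is None for v in range(p)] for i in range(n)]
    filled = [row[:] for row in data]
    # Step 1: initial guess is the column mean of the observed values.
    for v in range(p):
        obs = [data[i][v] for i in range(n) if not missing[i][v]]
        for i in range(n):
            if missing[i][v]:
                filled[i][v] = sum(obs) / len(obs)
    # Step 2: visit columns in order of increasing fraction of missing values.
    order = sorted(range(p), key=lambda v: sum(missing[i][v] for i in range(n)))
    for _ in range(max_iter):
        change = 0.0
        for v in order:
            # Predictors: all other columns, using the current filled-in values.
            x = [[row[u] for u in range(p) if u != v] for row in filled]
            rows_obs = [i for i in range(n) if not missing[i][v]]
            train_x = [x[i] for i in rows_obs]
            train_y = [filled[i][v] for i in rows_obs]
            for i in range(n):
                if missing[i][v]:
                    new = predict(train_x, train_y, x[i])
                    change += (new - filled[i][v]) ** 2
                    filled[i][v] = new
        if change < tol:  # stopping criterion c: no further change
            break
    return filled

toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None], [None, 10.0]]
completed = chained_imputation(toy)
```

Passing a different `predict` function (for instance, a wrapper around a random forest from a machine learning library) would recover the structure of the chained random forest scheme without changing the loop.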

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

[27] Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.[28]

B.2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).[29] No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.[30] Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

[28] See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

[29] This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

[30] We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


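model using penalized maximum likelihood (Chung et al. 2013):

\[
\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta^0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B.1}
\]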

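We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta^0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

\begin{align}
\alpha^{race}_{j} &\sim N(0, \sigma^2_{race}), & j &= 1, \ldots, 3 \tag{B.2} \\
\alpha^{gender}_{k} &\sim N(0, \sigma^2_{gender}), & k &= 1, 2 \tag{B.3} \\
\alpha^{age}_{l} &\sim N(0, \sigma^2_{age}), & l &= 1, \ldots, 6 \tag{B.4} \\
\alpha^{educ}_{m} &\sim N(0, \sigma^2_{educ}), & m &= 1, \ldots, 5 \tag{B.5} \\
\alpha^{income}_{n} &\sim N(0, \sigma^2_{income}), & n &= 1, \ldots, 13 \tag{B.6}
\end{align}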

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

\[
\alpha^{district}_{d} \sim N\left(\alpha^{state}_{s[d]}, \sigma^2_{state}\right) \tag{B.7}
\]

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
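The poststratification step amounts to a population-weighted average of cell-level predictions. A minimal sketch, with hypothetical cell definitions, predicted probabilities, and counts (a real application would cross all demographic categories of the model within each district and income group):

```python
def poststratify(cell_predictions, cell_counts):
    """Population-weighted average of cell-level predicted support.

    cell_predictions: dict cell -> predicted probability of support
    cell_counts:      dict cell -> population count of that cell in the district
    """
    total = sum(cell_counts.values())
    return sum(cell_predictions[c] * n for c, n in cell_counts.items()) / total

# Hypothetical cells for one district and one income group.
preds = {("white", "18-29"): 0.60, ("white", "30-39"): 0.50, ("black", "18-29"): 0.80}
counts = {("white", "18-29"): 200, ("white", "30-39"): 300, ("black", "18-29"): 500}
estimate = poststratify(preds, counts)
print(round(estimate, 2))  # (0.6*200 + 0.5*300 + 0.8*500) / 1000 = 0.67
```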

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B.1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit (one standard deviation) increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). The responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B.1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)        (2)        (3)        (4)        (5)        (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences         0.106      0.082      0.098      0.084      0.068      0.062
                              (0.013)    (0.015)    (0.016)    (0.013)    (0.014)    (0.014)
High income preferences       -0.063     -0.036     -0.053     -0.051     -0.050     -0.040
                              (0.012)    (0.013)    (0.014)    (0.013)    (0.013)    (0.014)

B: Multilevel Regression & Poststratification
Low income preferences         0.182      0.158      0.181      0.162      0.115      0.115
                              (0.021)    (0.024)    (0.026)    (0.020)    (0.022)    (0.022)
High income preferences       -0.136     -0.119     -0.139     -0.122     -0.091     -0.091
                              (0.017)    (0.019)    (0.021)    (0.017)    (0.018)    (0.018)

C: Raw CCES means
Low income preferences         0.080      0.061      0.063      0.072      0.043      0.045
                              (0.010)    (0.011)    (0.012)    (0.010)    (0.011)    (0.011)
High income preferences       -0.027     -0.013     -0.010     -0.027     -0.018     -0.024
                              (0.008)    (0.008)    (0.008)    (0.008)    (0.008)    (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C.1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
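The relative nature of these thresholds can be illustrated with a short sketch. The function name and the two toy income distributions are hypothetical, and the percentile interpolation rule (Python's "inclusive" method) is an assumption; the text does not say which rule was used.

```python
from statistics import quantiles

def income_group(income, district_incomes, lower_pct=33, upper_pct=66):
    """Classify an income relative to percentile thresholds of its own district."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(district_incomes, n=100, method="inclusive")
    low_cut, high_cut = cuts[lower_pct - 1], cuts[upper_pct - 1]
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Two hypothetical districts with evenly spaced incomes.
poorer = [1000 * k for k in range(1, 101)]   # $1,000 to $100,000
richer = [1000 * k for k in range(20, 120)]  # $20,000 to $119,000

print(income_group(40_000, richer))          # low: below the richer district's 33rd percentile
print(income_group(40_000, poorer))          # middle: above the poorer district's 33rd percentile
print(income_group(40_000, richer, 20, 80))  # middle under the narrower p20/p80 definition
```

The same $40,000 income falls into different groups depending on the reference distribution and on the percentile cutoffs, which is the variation the panels of Table C.1 explore.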


Table C.1: Model results using different definitions of income groups. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                               (1)        (2)        (3)        (4)        (5)        (6)

A: District-specific income thresholds
Low income preferences         0.106      0.082      0.098      0.084      0.068      0.062
                              (0.013)    (0.015)    (0.016)    (0.013)    (0.014)    (0.014)
High income preferences       -0.063     -0.036     -0.053     -0.051     -0.050     -0.040
                              (0.012)    (0.013)    (0.014)    (0.013)    (0.013)    (0.014)

B: State-specific income thresholds
Low income preferences         0.105      0.082      0.097      0.083      0.067      0.062
                              (0.013)    (0.015)    (0.016)    (0.013)    (0.014)    (0.014)
High income preferences       -0.062     -0.036     -0.052     -0.050     -0.049     -0.039
                              (0.012)    (0.013)    (0.014)    (0.013)    (0.013)    (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences         0.098      0.077      0.09       0.078      0.063      0.057
                              (0.012)    (0.013)    (0.014)    (0.012)    (0.013)    (0.013)
High income preferences       -0.054     -0.031     -0.046     -0.044     -0.044     -0.034
                              (0.011)    (0.012)    (0.012)    (0.011)    (0.012)    (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA) enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D.1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency


[Figure D.1 legend bins: 0 - 13, 13 - 26, 26 - 39, 39 - 62, 62 - 164]

Figure D.1: Total number of union certification elections in House districts (109th-112th Congress)

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
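A population-weighted allocation of county counts to districts can be sketched as follows. The allocation factors (the share of each county's population that lives in each district) are assumed to come from a block-level equivalence file; the function name, county names, and all numbers are hypothetical.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, allocation):
    """Allocate county-level counts to districts (illustrative sketch).

    county_counts: dict county -> count (e.g., number of congregations)
    allocation:    dict (county, district) -> share of the county's population
                   living in that district (shares sum to 1 within a county)
    """
    district_totals = defaultdict(float)
    for (county, district), share in allocation.items():
        district_totals[district] += share * county_counts[county]
    return dict(district_totals)

# Hypothetical example: County X is split 75/25 between two districts,
# County Y lies entirely within district D2.
counts = {"County X": 100, "County Y": 40}
alloc = {("County X", "D1"): 0.75, ("County X", "D2"): 0.25, ("County Y", "D2"): 1.0}
result = interpolate_to_districts(counts, alloc)
print(result)  # {'D1': 75.0, 'D2': 65.0}
```

When every county lies entirely within one district, all shares equal one and the computation reduces to a simple summation, matching the at-large case noted in the text.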


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E.1: Additional robustness tests

                                          Low income         High income
                                          preferences        preferences        N
(1) Injective preference roll call map    0.063 (0.013)      -0.041 (0.013)     11104
(2) Extreme preferences excl.             0.074 (0.016)      -0.048 (0.015)     13308
(3) New York excluded                     0.070 (0.015)      -0.048 (0.014)     14730
(4) Local Union Concentration             0.065 (0.014)      -0.047 (0.014)     15780
(5) Trimmed LPM estimator                 0.074 (0.015)      -0.055 (0.014)     15426
(6) Errors-in-variables                   0.062 (0.004)      -0.054 (0.004)     15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries for specification (6) are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical, specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E.1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
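The idea behind sequential trimming can be sketched with a single regressor. This is a simplified illustration in the spirit of Horrace and Oaxaca (2006), not the specification estimated in the paper: the helper names and the toy data are hypothetical, and the actual model includes many regressors and fixed effects.

```python
def ols(x, y):
    """Intercept and slope of a simple least squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return mean_y - slope * mean_x, slope

def trimmed_lpm(x, y, max_rounds=50):
    """Refit the LPM after dropping observations whose fitted value leaves [0, 1]."""
    for _ in range(max_rounds):
        intercept, slope = ols(x, y)
        keep = [0.0 <= intercept + slope * a <= 1.0 for a in x]
        if all(keep):
            break
        x = [a for a, k in zip(x, keep) if k]
        y = [b for b, k in zip(y, keep) if k]
    return intercept, slope, x

# Hypothetical binary outcome that becomes more likely as x grows.
x_all = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y_all = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
intercept, slope, x_kept = trimmed_lpm(x_all, y_all)
```

On the full toy sample, the fitted value at x = 9 exceeds one, so at least that observation is dropped before the final fit.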

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
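The expected downward bias can be made explicit in the textbook case of a single regressor with classical measurement error. This is an illustrative simplification, not the multi-regressor model estimated here. Suppose the observed preference estimate is $\tilde{\theta} = \theta + u$, where the error $u$ has variance $\sigma^2_u$ and is independent of both the true preference $\theta$ (with variance $\sigma^2_\theta$) and the regression error. Then the least squares coefficient on $\tilde{\theta}$ converges to an attenuated version of the true coefficient $\beta$:

```latex
\[
\operatorname{plim}\, \hat{\beta}_{OLS}
  = \beta \cdot \frac{\sigma^2_{\theta}}{\sigma^2_{\theta} + \sigma^2_{u}}
\]
```

When $\sigma^2_u$ is small relative to $\sigma^2_\theta$, the ratio is close to one, which is consistent with the small change in estimates reported above.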

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit

[31] We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

\[
y_{jd} = \mu^{l} \theta^{l}_{jd} + \mu^{h} \theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F.1}
\]

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

\[
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F.2}
\]

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

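\begin{align}
y_{jd} &= \mu^{l} \theta^{l}_{jd} + \mu^{h} \theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F.3} \\
U_d &= w_d' \beta_{m0} + r_{md} + v_d \tag{F.4}
\end{align}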

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F.3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F.4) for $U_d$ in (F.3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F.1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
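The role of the second selection step can be seen in a small sketch. It uses a plain LASSO solved by coordinate descent as a stand-in for the root-LASSO, a hand-picked penalty, and three artificial, mutually orthogonal controls; all names and numbers are hypothetical, and the final OLS step on the union of selected controls is omitted.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam; values within [-lam, lam] become zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_support(columns, target, lam, sweeps=100):
    """Indices of columns with non-zero LASSO coefficients (coordinate descent)."""
    n, p = len(target), len(columns)
    beta = [0.0] * p
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Partial residual: remove the current fit of all other columns.
            partial = [
                target[i] - sum(beta[k] * columns[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(c * r for c, r in zip(col, partial)) / n
            scale = sum(c * c for c in col) / n
            beta[j] = soft_threshold(rho, lam) / scale
    return {j for j, b in enumerate(beta) if b != 0.0}

# Three orthogonal toy controls (stored as columns).
w0 = [1, 1, 1, 1, -1, -1, -1, -1]
w1 = [1, 1, -1, -1, 1, 1, -1, -1]
w2 = [1, -1, 1, -1, 1, -1, 1, -1]
controls = [w0, w1, w2]

# Treatment depends strongly on w0; the outcome depends strongly on w1
# and only moderately on w0.
union_density = [2.0 * a for a in w0]
vote = [3.0 * b + 0.2 * a + 0.1 * c for a, b, c in zip(w0, w1, w2)]

i1 = lasso_support(controls, vote, lam=0.5)           # outcome equation
i2 = lasso_support(controls, union_density, lam=0.5)  # treatment equation
selected = i1 | i2
print(sorted(i1), sorted(i2), sorted(selected))  # [1] [0] [0, 1]
```

Selection on the outcome equation alone misses control 0, whose coefficient there is moderate, even though it drives the treatment. Taking the union of both selected sets brings it back in.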

is estimator has low bias and yields accurate condence intervals even under moderateselection mistakes (Belloni and Chernozhukov 2009 Belloni et al 2014)32 Responsible forthis robustness is the indirect LASSO step selecting the Ud-control set It nds controlswhose omission leads to ldquolargerdquo omied variable bias and includes them in the model Anyvariables that are not included (ldquoomiedrdquo) are therefore at most mildly associated to Ud andyjd which decidedly limits the scope of omied variable bias (Chernozhukov et al 2015)
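The two selection steps and the final OLS on the union of the selected controls can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a plain LASSO with a fixed, hand-picked penalty (solved by coordinate descent) for the root-LASSO used in the paper, it runs on simulated data, and all names (`lasso_cd`, `post_double_selection`, `simulate`, `alpha_y`, `alpha_u`) are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_sweeps=500, tol=1e-8):
    """Plain LASSO via cyclic coordinate descent.

    Minimizes (1/2n)*||y - X b||^2 + alpha*||b||_1.
    ASSUMPTION: stand-in for the root-LASSO used in the paper.
    """
    n, p = X.shape
    b = np.zeros(p)
    resid = y.astype(float).copy()
    col_ms = (X ** 2).sum(axis=0) / n  # mean square of each column
    for _ in range(n_sweeps):
        max_step = 0.0
        for k in range(p):
            if col_ms[k] == 0.0:
                continue
            # inner product (scaled by 1/n) of column k with the partial residual
            rho = X[:, k] @ resid / n + col_ms[k] * b[k]
            # soft-thresholding update for coordinate k
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_ms[k]
            step = new - b[k]
            if step != 0.0:
                resid -= X[:, k] * step
                b[k] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

def post_double_selection(y, u, W, alpha_y, alpha_u):
    """Select controls for the outcome and for the treatment, then run OLS."""
    Ws = (W - W.mean(axis=0)) / W.std(axis=0)  # standardize candidate controls
    I1 = np.flatnonzero(lasso_cd(Ws, y - y.mean(), alpha_y))  # outcome equation
    I2 = np.flatnonzero(lasso_cd(Ws, u - u.mean(), alpha_u))  # treatment equation
    I = np.union1d(I1, I2)
    X = np.column_stack([np.ones(len(y)), u, W[:, I]])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return coef[1], I  # treatment effect estimate and selected control indices

def simulate(n=1000, p=20, eta=1.0, seed=0):
    """Hypothetical data: control 0 drives the treatment, control 1 the outcome."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, p))
    u = 2.0 * W[:, 0] + rng.normal(size=n)
    y = eta * u + 0.3 * W[:, 0] + 2.0 * W[:, 1] + rng.normal(size=n)
    return y, u, W
```

Calling `post_double_selection(*simulate(), alpha_y=0.2, alpha_u=0.2)` returns the estimated treatment coefficient together with the indices of the selected controls. The penalty values here are arbitrary; a data-driven penalty, as in the root-LASSO, would replace them in practice.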

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}\; \theta^h_{jd}\; U_d\; X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\|Z_d - z_j\|^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$c^* = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1}y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

³² For a very general discussion, see Belloni et al. (2017).
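The closed-form estimator and the leave-one-out choice of $\lambda$ can be sketched as below. This is a minimal illustration, not the authors' code. It assumes the inputs are already standardized, uses a hypothetical grid of candidate penalties, and computes the leave-one-out residuals with the standard shortcut for ridge-type smoothers (the $i$-th leave-one-out residual equals $c_i$ divided by the $i$-th diagonal element of $(K + \lambda I)^{-1}$) instead of refitting $N$ times. The names `gaussian_kernel` and `krls_fit` are made up for this sketch.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq = (Z ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / sigma2)

def krls_fit(Z, y, lambdas=None):
    """Fit c = (K + lam*I)^{-1} y, picking lam by leave-one-out squared error.

    ASSUMPTIONS: Z and y are already standardized; the lambda grid is arbitrary.
    """
    if lambdas is None:
        lambdas = np.logspace(-3, 2, 30)  # hypothetical candidate penalties
    n, D = Z.shape
    sigma2 = float(D)  # bandwidth set to the number of columns, as in the text
    K = gaussian_kernel(Z, sigma2)
    evals, V = np.linalg.eigh(K)  # K = V diag(evals) V'
    Vty = V.T @ y
    best = None
    for lam in lambdas:
        inv = 1.0 / (evals + lam)
        c = V @ (inv * Vty)           # (K + lam*I)^{-1} y
        ginv_diag = (V ** 2) @ inv    # diagonal of (K + lam*I)^{-1}
        loo = np.sum((c / ginv_diag) ** 2)  # sum of squared leave-one-out residuals
        if best is None or loo < best[0]:
            best = (loo, lam, c)
    _, lam, c = best
    return {"c": c, "lam": float(lam), "sigma2": sigma2, "K": K}
```

Fitted values are then `fit["K"] @ fit["c"]`. The eigendecomposition is computed once and reused across the grid, so each candidate $\lambda$ costs only a few matrix-vector products.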

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z_j^{U_d}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\|Z_d - z_j\|^2}{\sigma^2}\right)\left(Z_d^{U_d} - z_j^{U_d}\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
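A hedged sketch of how such pointwise derivatives can be computed for a fitted Gaussian-kernel model follows. The function names are hypothetical, and the weights `c` and the bandwidth `sigma2` are assumed to come from an earlier KRLS fit. The sign convention is obtained by differentiating the fitted function $\hat{f}(z) = \sum_i c_i \exp(-\|z - z_i\|^2/\sigma^2)$ directly, so the difference is written as evaluation point minus kernel center.

```python
import numpy as np

def kernel_to_centers(z, centers, sigma2):
    """Gaussian kernel between one point z and every training point."""
    d2 = ((centers - z) ** 2).sum(axis=1)
    return np.exp(-d2 / sigma2)

def krls_predict(z, centers, c, sigma2):
    """Fitted value at z: weighted sum of kernels centered at the training points."""
    return kernel_to_centers(z, centers, sigma2) @ c

def krls_pointwise_derivative(centers, c, sigma2, col):
    """Partial derivative of the fitted function w.r.t. covariate `col`,
    evaluated at each training point (one value per observation)."""
    n = centers.shape[0]
    out = np.empty(n)
    for j in range(n):
        k = kernel_to_centers(centers[j], centers, sigma2)
        diff = centers[j, col] - centers[:, col]  # evaluation point minus center
        out[j] = (-2.0 / sigma2) * np.sum(c * k * diff)
    return out
```

The resulting vector has one entry per observation and could then be smoothed against the covariate of interest. A central finite difference of `krls_predict` at any training point gives a quick check of the analytic derivative.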

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levy (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents’ responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem Deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs. Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

workplace in the first place and, independently, politicians would be more responsive to them. Another concern is that the activity of unions may influence “parties and policy but policy and institutions also affect unionization rates” (Ahlquist 2017, 427). While collective action problems dilute incentives of politicians to make politics using policies, in some circumstances they are overcome (Anzia and Moe 2016; Hacker and Pierson 2010). Thus, unequal representation may produce policies that make it more difficult to organize unions in the first place. In particular, ‘right-to-work’ and collective bargaining laws hamper unionization efforts, and recent research demonstrates that these laws can have profound political effects (Feigenbaum et al. 2018; Flavin and Hartney 2015).

Our empirical strategy addresses these problems based on a combination of fine-grained data, a within-district research design, and robust inferential models. We assess our argument using the contemporary Congress, where unequal responsiveness by elected representatives and their policy choices has been well documented (Bartels 2008, 2016; Ellis 2013; Gilens 2012; Rhodes and Schaffner 2017) and the playing field for organized interest is skewed against the less affluent (Schlozman et al. 2012). We focus on members of the House of Representatives during the 109–112th Congress (2005-2012), since this setting enables us to capture within-state variation in union strength as well as within-district variation in preference polarization by income across a large number of policy issues. Our design provides leverage to rule out alternative explanations using state and district fixed effects and allows us to measure theoretically important confounders not accounted for in previous work.

At its core, our dataset combines estimated income-specific measures of constituency preferences, based on 223,000 survey respondents matched to 27 roll-call votes, with information on local unions extracted from more than 350,000 administrative records. To measure district-level policy preferences, we use multiple waves of the Cooperative Congressional Election Study (CCES) and calculate preferences on 27 concrete policy issues for each income group in each congressional district. We employ small area estimation, as the CCES is not designed to be representative at the district level (we also show that our findings are robust to using alternative approaches, such as multilevel regression and poststratification [MRP]). To measure the district-level strength of unions, we use mandatory reports filed by local unions to the Department of Labor. Following recent work by Becher et al. (2018), this largely neglected administrative data source is used to construct measures of union membership at the district level. This measurement strategy overcomes major limitations of standard survey data used to measure union strength.³

Our empirical analysis traces the legislative responsiveness of House members to the preferences of different income groups in their constituency conditional on district-level union strength. We find that district-level union membership dampens unequal responsiveness by national legislators. In line with previous research, on average House members are significantly less responsive to the policy preferences of low-income constituents. However, this gap in responsiveness is smaller where unions are stronger, and it decreases significantly where union members are numerous. This moderating effect of unions is not an artifact of existing state-level union policies or largely time-invariant state-level or district-level unobservables (such as institutions, history, or culture). Extended specifications allow other district-level characteristics to also moderate legislative responsiveness to different income groups. They demonstrate that the union effect is not driven by district-specific levels of socio-economic factors, such as education, race, gender, median household income, urbanization, or a district’s employment structure. We also rule out the possibility that our finding simply represents the general capacity of workers or people to organize (or be organized) by accounting for explicit measures of district-level organization capacity, based on new data on unionization attempts from the National Labor Relations Board, the predominance of religious organizations, and behavioral measures of social capital. We also go beyond standard regression models and employ estimates from a Double Selection Estimator (Belloni et al. 2014) and Kernel Regularized Least Squares (Hainmueller and Hazlett 2014) to show that the moderating effect of unions is robust to relaxing potentially important modeling assumptions.

³ Prior research is almost exclusively based on survey data that are not suited for a district-level analysis due to missing identifiers or sampling design. In contrast to surveys, moreover, filing LM forms is mandatory for most unions, non-submission and incorrect submissions are penalized, and reports are audited and contain precise geographic information.

An exploration of possible mechanisms points to campaign contributions and partisan selection as two relevant channels through which local unions enhance the representation of the less well-off. Relatedly, the equality-enhancing effect of unions is stronger for bills on which the largest union confederation, AFL-CIO, has staked out a clear position.

We are aware of only two previous investigations of the effect of organized labor on unequal responsiveness, and they differ considerably in their approach from ours. Focusing on a recent cross-section of 47 US states, Flavin (2018) shows that states with stronger unions exhibit less unequal representation, as estimated from regressions of income-weighted voter preferences on state-level policy liberalism. Studying the 110th House of Representatives, Ellis (2013) finds mixed results: District-level unionization is related to a smaller rich-poor gap for key legislative votes, but there is no such effect for overall ideological representation. Our analysis addresses the problem that survey samples are not representative for congressional districts by design, which can lead to biased estimates of income biases in representation. It confirms the finding that unions are linked to more equal representation, covering four Congresses and three times as many roll call votes as studied by Ellis (2013). However, our main empirical contribution is that we can go much further in ruling out alternative explanations. Our research design leverages within-district variation in preference polarization, within-state variation in union strength, as well as extensive district-level data on alternative moderating factors that may be bundled with union strength. In contrast to the pure cross-sectional designs of these two previous studies, our analysis can thus account for state and district fixed effects that capture important sources of unobserved heterogeneity, and it directly measures many important confounders. As a result, we can state with more confidence that the impact of unions is not spurious.

Against the backdrop of current scientific and public debates about labor unions and political representation, these findings may come as somewhat of a surprise. While some strands of research and political discourse portray unions as an egalitarian force in politics, others see them fatally weakened, effects as much as causes of unequal representation, or simply as just another organized group fighting for special interests (that do not generally overlap with those of lower income individuals). The latter view is held by a large strand of scholarship in economics (cf. Freeman and Medoff 1984) and by researchers studying the role of teachers’ unions in political science (Anzia 2011; Moe 2011). Most extant research on Congress simply does not have the required data to directly assess the effect of unions on representational equality. Numerous studies of union strength and congressional roll-call voting do not measure voter preferences, which makes it difficult to interpret who is being represented (Becher et al. 2018; Box-Steffensmeier et al. 1997; Freeman and Medoff 1984).

Altogether, our results suggest that unequal responsiveness is not an unavoidable feature of democratic capitalism. The results are especially striking given that recent cross-national studies have found consistent patterns of unequal representation across different political institutions (Bartels 2017; Lupu and Warner 2017). In contrast, we find considerable heterogeneity in differential responsiveness across districts affected by local labor unions, a fundamental economic institution. The moderating effect of unions uncovered in our analysis is large enough to swing key votes in Congress. That said, our results support the view that political efforts to (further) weaken unions, as evidenced in recent reforms in states like Michigan and Wisconsin, are, if anything, likely to exacerbate unequal responsiveness in representation. They may also explain why unions are (still) under attack.

II Moderating biased responsiveness in Congress

While few studies have directly assessed the impact of labor unions on unequal responsiveness in Congress or elsewhere, various strands of scholarship in political science and related fields suggest that labor unions are one of the few mass-membership organizations that provide collective voice to lower income individuals in the political arena, with potentially important consequences for political representation (Ahlquist and Levy 2013; Bartels 2016; Freeman and Medoff 1984; Schlozman et al. 2012). Consistent with a central premise of the collective voice perspective, unions tend to take positions favored by less affluent citizens. Gilens (2012, 154-161) compares public positions of national unions with mass policy preferences across several hundred policy issues and finds that unions’ positions are most strongly correlated with the preferences of the less well-off (see also Hacker and Pierson 2010; Schlozman 2015).⁴ Similarly, Schlozman et al. (2012, 87) conclude that unions are one of the few organizations in national politics “that advocate on behalf of the economic interest of workers who are not professionals or managers.”

However, shared preferences between the less well-off and organized labor are by no means sufficient to alter inequalities in political representation in national politics. This requires an effective political transmission mechanism. To guide the empirical analysis, we sketch key elements of a framework of union organization and political responsiveness.

Labor unions are organizations formed to bargain collectively on behalf of their members with employers over wages and conditions. Unions are thus created at the local (i.e., establishment) level (Freeman and Medoff 1984). Once formed, unions may (and often do) enter the political arena. The ability of unions to increase the rate of political participation (including voting, contacting officials, attending rallies, or making donations) of low- and middle-income citizens is often considered to be their key channel of political influence. Importantly, unions may also increase participation among non-members with similar policy preferences through get-out-the-vote campaigns and social networks (Leighley and Nagler 2007; Rosenfeld 2014; Schlozman et al. 2012). Making contributions to favored candidates and campaigns complements the ability of unions to communicate with and mobilize members or to provide campaign volunteers. Indeed, unions are among the leading contributors to political action committees (PAC), accounting for a quarter of total PAC spending in 2009 (Schlozman et al. 2012, ch. 14). In contrast to corporations and business organizations, union contributions “represent the aggregation of a large number of small individual donations” (Schlozman et al. 2012, 428).⁵

The credible threat of political mobilization can affect policy decisions by representatives in two general ways. First, it may shape who is elected in a given electoral district. If politicians are not exchangeable (because they differ in their preferences and beliefs), political selection is important. In an age of elite polarization (McCarty et al. 2006), the partisan identity of a representative is often crucial for determining legislative voting (Bartels 2016; Lee et al. 2004). Since the New Deal era, unions and union members have largely allied with the Democratic Party, given its stronger support for many of their broader policy demands (Lichtenstein 2013; Schlozman 2015). Political selection might also shape other political characteristics of representatives, such as their class background or race (Butler 2014; Carnes 2013).

Second, unions’ mobilization potential shapes the incentives of elected representatives beyond their partisan affiliation and personal traits. Policymakers’ rational anticipation of public reactions plays a central role in theories of accountability and dynamic responsiveness (Arnold 1990; Stimson et al. 1995). While many individual legislative votes do not affect the reelection prospects of representatives, on potentially salient votes they can face hard choices between party ideology and competing constituency preferences. On international trade agreements, for instance, Democratic representatives have faced cross-pressures between a more skeptical stance taken by unions and low-income constituents versus that of their own party (Box-Steffensmeier et al. 1997). On the other side of the aisle, in the wake of the financial crisis, Republican legislators found themselves torn between their own partisan views on stimulus spending and the pressure from less well-off constituents (Mian et al. 2010).

⁴ This is consistent with the argument that organized labor fosters norms of solidarity and support for the less well-off through leadership (Ahlquist and Levy 2013; Kim and Margalit 2017) or social interactions (Berelson et al. 1954).

⁵ While evidence on the direct effect of contributions on legislative behavior is mixed, recent field-experimental results indicate that contributions help to provide access (Kalla and Broockman 2016) or sway congressional staffers (Hertel-Fernandez et al. 2018).

Politicians’ incentives are also linked to information. Theories of representation emphasize that members of Congress, and especially the House, face numerous voting decisions in each term, and it would be unrealistic to assume that they have access to reliable, unbiased polling data on constituency preferences on all the issues they face (Arnold 1990; Miller and Stokes 1963). Instead, representatives, with the help of their staffers, rely on alternative methods to assess public opinion, including constituent correspondence, town halls, contacts with community leaders, or local interest groups (Miler 2007). In this limited information context, the strength of local unions may enhance the visibility and perception of constituent preferences (Hertel-Fernandez et al. 2018).⁶

Following seminal theories of congressional action (Arnold 1990; Miller and Stokes 1963), our argument emphasizes that the strength of local unions underpins a credible mobilization threat that impacts the action of candidates and legislators. Anticipating mobilizing efforts by unions, a potential candidate may not even enter into the race; an elected career-oriented politician might be pressured to alter his or her vote even without a full mobilization effort, as long as unions' mobilization capacity is visible. Thus both campaign contributions and candidate selection should matter as a channel linking local union strength and representation, since they are linked to credible threats of mobilization.

Our argument implies that the district-level strength of labor unions increases the responsiveness of members of Congress to the less affluent. While we know from previous work that politicians are considerably more responsive to the preferences of the affluent than to those of the less well-off, this bias should be reduced in districts with relatively higher union membership. Substantively, it is crucial to assess how far the presence of unions can move responsiveness toward the ideal of political equality.7

6 Butler and Nickerson (2011) find that politicians respond when provided with more accurate opinion data. However, behavioral biases may lead politicians to discount constituent preferences they disagree with (Butler and Dynes 2016).

7 In line with a large literature, we focus on union membership as a key component of union strength. In a study of the effect of unions on legislative ideology rather than income-biased responsiveness, Becher et al. (2018) argue that the structure of local unions (i.e., the concentration of unions in a given locality) matters as well. However, they also show empirically that union density and concentration are separable dimensions.


III. Data and Empirical Strategy

Any effort to test the relevance of unions for unequal representation confronts major challenges of measurement and causal interpretation. The dataset we have compiled allows us to address these issues to an extent previously impossible. We have created a panel of legislators' roll call votes matched to income-specific policy preferences at the district level and district-level measures of union membership. Our main empirical strategy to examine the influence of unions on unequal representation is built on two basic pillars: district fixed effects and interactive controls. The fact that we observe several roll calls within a given congressional district allows us to specify a model with district fixed effects, which capture unobservable characteristics of districts (and states) that are constant over roll calls, such as historical legacies or the strength of partisan organization. To provide for a stricter test of the moderating effect of unions, we also allow a rich set of other district characteristics to moderate the link between income groups and legislators' voting behavior. This amounts to estimating models including interactions between observed district characteristics and group preferences. In our most flexible specification, we allow these to be non-linear (we describe our models in more detail below).

The data required to implement these models were constructed in three steps. First, we match information on roll call items for 223,000 CCES respondents to actual roll call votes cast in the House of Representatives in the 109th to the 112th Congress.8 Second, we estimate policy preferences for low- and high-income constituents in each district for 27 roll calls. To deal with the fact that the CCES is not a representative sample of district populations, we use a small area estimation strategy combining the CCES sample with unit record Census data, matching the full distribution of age, education, gender, race, and income using a chained Random Forests algorithm (more below and in Appendix B). Third, we measure district-level union membership based on digitized administrative records from the Department of Labor.

III.A. CCES data and Congressional roll calls

The CCES is an ideal starting point for our analysis, since it is a nationally representative study, includes a considerable number of roll call questions, and provides us with a large enough sample size to decompose income-group preferences by district. It addresses several data concerns that plagued initial research on unequal responsiveness in Congress (Bhatti and Erikson 2011). The roll calls included in the CCES concern key votes as identified by Congressional Quarterly and the Washington Post and cover a broad range of issues

In this paper we focus on union membership, but show in a robustness test that our results still obtain when accounting for union concentration (see Table E1).

8 Our analysis focuses on one apportionment period, which generally holds district boundaries constant (we show that the results are robust to cases of mid-period redistricting).


(Ansolabehere and Jones 2010). Respondents are presented with the key wording of the bill (as used on the floor and in media reports) and are then asked to cast their own vote: "What about you? If you were faced with this decision, would you vote for, against, or not sure?" In contrast to the widely used agree–disagree survey measures of issue preferences, matched roll call votes provide us with unequivocal evidence of policy congruence between respondent and legislator (Jessee 2009; Ansolabehere and Jones 2010, 585). We match 27 roll call items in the CCES to roll call votes cast in the House of the 109th to 112th Congress. These cover important legislative decisions, such as Dodd-Frank, the Affordable Care Act (and attempts to repeal it), the minimum wage increase, the ratification of the Central America Free Trade Agreement, or the Lilly Ledbetter Fair Pay Act. Table A1 in the Appendix lists all matched CCES items and House bills included in our estimation sample.

III.B. Measuring constituency preferences by income group

The CCES provides us with a comparatively large sample size per district. However, an important potential issue is that it is not designed to be representative for congressional district populations. Thus, individuals with certain characteristics, such as particular combinations of income, race, and education, may be underrepresented in the CCES sample for a given district. If this is the case, unadjusted policy preferences from the CCES will not reflect the target population, and using them can lead to biased estimates of unequal representation in Congress, as politicians are held to the wrong benchmark. The solution to this issue is to employ some form of small area estimation to rebalance the survey sample to represent the district population. The machine-learning solution we propose is relatively new to the representation literature in political science, but it has some attractive features that merit its application to this topic. It does not require distributional and functional form assumptions, it allows for arbitrary higher-order interactions of covariates, and it can fully leverage fine-grained census data to construct representative samples of congressional districts. However, we stress that our findings do not depend on this particular approach. As shown in Online Appendix B, our approach leads to somewhat more conservative estimates of the impact of unions on the representation of different income groups compared to the MRP approach widely used by political scientists (Lax and Phillips 2009). Qualitatively, both approaches yield the same conclusions.

Our approach, small area estimation using chained random forests, matches CCES survey respondents to corresponding cases from unit record Census data. The design of the Census ensures an accurate representation of the distribution of population characteristics in a given district (Torrieri et al. 2014, Ch. 4). Matching these two data sources is essentially a prediction problem, which we address using a flexible non-parametric machine learning approach based on random forests (Stekhoven and Bühlmann 2011).9 Put simply, the idea is

9 Honaker and Plutzer (2016) use a similar approach (but relying on multivariate normal imputations) and further discuss its empirical performance in estimating small area attitudes and preferences.


that rich census data exist for every district, whereas survey data on preferences are scarce in some districts and may not be fully representative. Using general machine learning tools, we can attach preferences to the Census by matching it to CCES respondents based on common demographic characteristics. The resulting data set of public preferences is representative of congressional districts.

Concretely, we use about 3 million individual-level records from a synthetic sample of the Census Bureau's American Community Survey from 2006 to 2011. We stack both datasets, creating a structure where we have common district identifiers and individual covariates, while responses to policy preference questions are missing in the Census portion of the data. As common covariates bridging CCES and Census, we use the following demographic characteristics: gender, race (3 categories), education (5 categories), age (continuous), and family income (continuous).10 The latter is of particular relevance, as we are interested in producing district–income group specific preferences.

In the next step, we fill missing roll call preferences in the Census with matching data from CCES respondents. Since this is essentially a prediction problem, we can use powerful tools developed in the machine learning literature to achieve this task. We use an algorithm proposed by Stekhoven and Bühlmann (2011), which uses chained random forests (Breiman 2001) to impute missing cells. Compared to commonly used multivariate normal or regression imputation techniques, this strategy has the advantage that it is fully nonparametric, allows for complex interactions between covariates, and deals with both continuous and categorical data (Tang and Ishwaran 2017). Our completed data set now contains preferences for 27 roll call items of synthetic 'Census individuals', which are a representative sample of each House district.

With these data in hand, we assign individuals to income groups and calculate group-specific preferences for each roll call in each district. Following previous work in the representation literature (Bartels 2008, 2016), we delineate low- and high-income respondents using the 33rd and 67th percentile of the distribution of family incomes. Note that, in line with theories of constituency representation in Congress, we specify these income thresholds separately by congressional district. This accounts for the substantial differences in both average income and income inequality between US districts. It also ensures that within each district, income groups are of comparable size. Online Appendix Table A2 shows the distribution of income-group cutoffs. On average, our chosen cutoffs are close to those used in the established literature. The mean of our district-specific low-income cutoffs is around $39,000, while Bartels uses $40,000 (Bartels 2016, 240); our mean high-income cutoff is around $81,000, where Bartels employs a threshold of $80,000. However, beyond these averages lies considerable variation. In some districts the 33rd percentile cutoff is as low as $16,500, while the 67th percentile reaches almost $160,000 in others.11
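The district-specific grouping can be sketched as follows. This is a minimal illustration rather than the authors' code: the record layout, the function names, and the toy incomes are hypothetical, and the nearest-rank percentile rule is an assumption, since the text does not say which quantile definition is used.

```python
from collections import defaultdict

def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (assumed rule)."""
    n = len(sorted_values)
    rank = max(1, (pct * n + 99) // 100)  # integer ceiling of pct * n / 100
    return sorted_values[rank - 1]

def assign_income_groups(records, low_pct=33, high_pct=67):
    """Label each (district, income) record 'low', 'middle' or 'high'
    using cutoffs computed separately within each district."""
    by_district = defaultdict(list)
    for district, income in records:
        by_district[district].append(income)
    cutoffs = {}
    for district, incomes in by_district.items():
        incomes.sort()
        cutoffs[district] = (
            nearest_rank_percentile(incomes, low_pct),
            nearest_rank_percentile(incomes, high_pct),
        )
    labels = []
    for district, income in records:
        low_cut, high_cut = cutoffs[district]
        if income <= low_cut:
            labels.append("low")
        elif income >= high_cut:
            labels.append("high")
        else:
            labels.append("middle")
    return labels, cutoffs

# Hypothetical toy data: two districts with very different income levels.
records = [("A", 20_000), ("A", 35_000), ("A", 50_000),
           ("A", 70_000), ("A", 90_000), ("A", 120_000),
           ("B", 60_000), ("B", 90_000), ("B", 130_000),
           ("B", 160_000), ("B", 200_000), ("B", 300_000)]
labels, cutoffs = assign_income_groups(records)
```

In this toy example an income of $90,000 falls in the high group in district A but in the low group in district B, which is the point of setting thresholds district by district.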

10 See Appendix B for more details on the construction of our Census sample and our matching/imputation procedure.

11 Results are relatively invariant to using alternative income thresholds (see Table C1).


Figure I: District-level income gap in public support for 6 selected policies
[Six histogram panels: Increase Minimum Wage; Housing Crisis Assistance; Fair Pay Act; Affordable Care Act; CAFTA Ratification; Recovery and Reinvestment.]

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district's income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by $(\theta^l, \theta^h)$, as the proportion of individuals voting 'yea'. Since preference estimates are in $[0, 1]$, they can be directly related to legislators' probability of voting 'yea' on a given roll call. Our data show considerable variation in the distance between the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences $(\theta^h - \theta^l)$ in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, the vast majority of low-income constituents are more supportive than their high-income counterparts in each and every district. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly in favor. In all examples we find considerable across-district variation in the preference gap between low- and high-income constituents.12 We will employ this variation over both roll calls and districts to estimate legislators' differential

12 Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.


responsiveness to changes in policy preferences of different income groups and how it might be moderated by union strength.
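The construction of group preferences as 'yea' shares, and of the district-level preference gap, can be illustrated with a short sketch. The row layout, the function name `group_preferences`, and the toy votes are hypothetical; in the paper the inputs are the imputed synthetic Census individuals, not raw survey rows.

```python
from collections import defaultdict

def group_preferences(rows):
    """Return {(district, roll_call): {'low': share, 'high': share}}, where
    share is the proportion of group members voting 'yea'."""
    counts = defaultdict(lambda: {"low": [0, 0], "high": [0, 0]})
    for district, roll_call, group, yea in rows:
        if group not in ("low", "high"):
            continue  # the middle tercile is not used here
        cell = counts[(district, roll_call)][group]
        cell[0] += int(yea)  # number of 'yea' votes
        cell[1] += 1         # group size
    return {
        key: {g: c[0] / c[1] for g, c in groups.items() if c[1] > 0}
        for key, groups in counts.items()
    }

# Hypothetical toy rows: (district, roll call, income group, voted yea?)
rows = (
    [("D1", "minimum_wage", "low", True)] * 3
    + [("D1", "minimum_wage", "low", False)]
    + [("D1", "minimum_wage", "high", True)]
    + [("D1", "minimum_wage", "high", False)] * 3
    + [("D1", "minimum_wage", "middle", True)]
)
prefs = group_preferences(rows)

# Preference gap written as theta_h - theta_l, following the text's notation.
gaps = {key: p["high"] - p["low"]
        for key, p in prefs.items() if "high" in p and "low" in p}
```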

III.C. District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report, or reporting falsified information, is made a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit non-response (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.14

Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
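A minimal sketch of this aggregation step is given below. It assumes, hypothetically, that each cleaned report has already been reduced to a `(district, year, members)` tuple; the district labels and numbers are invented, and averaging over years before taking logs is an assumption about the order of operations, not something the text specifies.

```python
import math
from collections import defaultdict

def district_membership(reports):
    """Sum reported members by (district, year), then average over years."""
    totals = defaultdict(float)
    for district, year, members in reports:
        totals[(district, year)] += members
    per_district = defaultdict(list)
    for (district, _year), total in totals.items():
        per_district[district].append(total)
    return {d: sum(v) / len(v) for d, v in per_district.items()}

# Hypothetical geocoded reports: (district, year, reported members)
reports = [
    ("NY-01", 2009, 1200), ("NY-01", 2009, 800), ("NY-01", 2010, 2200),
    ("NY-02", 2009, 500), ("NY-02", 2010, 700),
]
membership = district_membership(reports)

# Logged membership, the form in which union strength enters the models.
log_membership = {d: math.log(m) for d, m in membership.items()}
```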

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.


Figure II: Union membership in House districts, 109th-112th Congress
[Legend: 1st quartile, 2nd quartile, 3rd quartile, 4th quartile.]

complete coverage, as during the latter part of the twentieth century unions are increasingly organizing workers across different sectors and occupations (Lichtenstein 2013, 249).15

III.D. Statistical specifications

For each roll call vote $j$ ($j = 1, \dots, J$) we have measured preferences of low- and high-income citizens in a given congressional district $d$ ($d = 1, \dots, D$), denoted by $(\theta^l_{jd}, \theta^h_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts' workers to organize collective action, (iv) as well as non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House

15 While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with the widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, DC, which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated ($r = 0.86$).


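members is the following linear probability specification:

\[
\begin{aligned}
y_{ijd} ={}& \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l (U_d \times \theta^l_{jd}) + \eta^h (U_d \times \theta^h_{jd}) \\
&+ \beta^l (X_d \times \theta^l_{jd}) + \beta^h (X_d \times \theta^h_{jd}) + \alpha_d + \epsilon_{ijd}
\end{aligned}
\]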

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d \theta^h_{jd}$ and $U_d \theta^l_{jd}$. Thus, when $\eta^l$ and $\eta^h$ are zero, the group-specific preference coefficients $\mu^l$ and $\mu^h$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^l$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^h$. Our theoretical expectation is that $\eta^l > 0$ and $\eta^h \le 0$.
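To make the interpretation of the interaction coefficients explicit, one can differentiate the linear probability specification with respect to each group's preferences. This is the standard calculation for interaction models, written in the paper's notation; it is not a formula quoted from the paper.

```latex
\[
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^l_{jd}}
  = \mu^l + \eta^l U_d + \beta^l X_d,
\qquad
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^h_{jd}}
  = \mu^h + \eta^h U_d + \beta^h X_d .
\]
```

Because all inputs are standardized, responsiveness to each group equals $\mu^l$ and $\mu^h$ at $U_d = 0$ and $X_d = 0$, and a one standard deviation increase in $U_d$ shifts it by $\eta^l$ and $\eta^h$ respectively.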

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects $\alpha_d$.16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and state level) and group preferences, $X_d \theta^l_{jd}$ and $X_d \theta^h_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses, detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors, assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
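The logic of district fixed effects combined with district-clustered standard errors can be illustrated with a deliberately simplified sketch. It handles a single regressor only, uses the plain sandwich formula without small-sample corrections (often labeled CR0), and runs on invented data; the function name and inputs are hypothetical, and it is not the estimation code behind the reported results.

```python
import math
from collections import defaultdict

def within_ols_cluster(y, x, cluster):
    """One-regressor fixed effects (within) estimator with a
    cluster-robust (CR0 sandwich) standard error."""
    groups = defaultdict(list)
    for i, g in enumerate(cluster):
        groups[g].append(i)
    # Demean y and x within each cluster to absorb the fixed effects.
    y_t, x_t = [0.0] * len(y), [0.0] * len(x)
    for idx in groups.values():
        y_bar = sum(y[i] for i in idx) / len(idx)
        x_bar = sum(x[i] for i in idx) / len(idx)
        for i in idx:
            y_t[i] = y[i] - y_bar
            x_t[i] = x[i] - x_bar
    sxx = sum(v * v for v in x_t)
    beta = sum(a * b for a, b in zip(x_t, y_t)) / sxx
    resid = [a - beta * b for a, b in zip(y_t, x_t)]
    # Sum the scores within each cluster before squaring, which allows
    # arbitrary within-cluster correlation and heteroscedasticity.
    meat = sum(sum(x_t[i] * resid[i] for i in idx) ** 2
               for idx in groups.values())
    se = math.sqrt(meat) / sxx
    return beta, se

# Hypothetical toy data: votes (0/1) on three roll calls in two districts,
# regressed on one group's preference share.
votes = [0, 1, 1, 0, 0, 1]
theta = [0.2, 0.5, 0.8, 0.1, 0.4, 0.9]
district = ["A", "A", "A", "B", "B", "B"]
beta_hat, se_hat = within_ols_cluster(votes, theta, district)
```

With only two clusters the standard error here is purely illustrative; cluster-robust inference relies on a large number of clusters.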

IV. Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above with district fixed effects, but without accounting for local union organization (setting $\beta^l$, $\beta^h$ and $\eta^l$, $\eta^h$ to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase in the probability of legislators casting a corresponding vote of 13.6 (±1.2) percentage points. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.


confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.
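As a quick check on how the reported interval relates to the bracketed values, the following arithmetic assumes (the text does not state this explicitly) that the ± figures are standard errors and that intervals use the normal critical value 1.96:

```latex
\[
1.6 \pm 1.96 \times 1.4 \approx [-1.1,\ 4.3],
\qquad
11.9 \pm 1.96 \times 2.5 = [7.0,\ 16.8].
\]
```

Under that assumption the first interval matches the reported range up to rounding of the underlying estimates and includes zero, while the second excludes it.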

IV.A. Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership, with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preference-confounder interactions (setting $\beta^l$ and $\beta^h$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018;

17 Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors at the district level. See also specification (1) in Table I below.


Figure III: District-level union membership as moderator of unequal representation
[Two panels: Low income constituents; High income constituents. x-axis: Union membership [std], with p10, p25, p50, p75, p90 marked; y-axis: Marginal effect.]

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^l$ and $\theta^h$. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or


Table I: Union density and representation. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)
Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)
District fixed effects           X         X         X         X         X         X
Group preferences
  × union policy                           X                                       X
  × state constants                                  X
  × organizational capacity                                    X                   X
  × district covariates                                                  X         X

Note: $N = 15{,}780$; $N_d = 534$; 27 roll call votes; 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^l$ and $\eta^h$. Specifications (2) to (5) include coefficients for interactions $(\beta^l, \beta^h)$ of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements. (3) interacts preferences with state fixed effects. (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections. (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to

18 Certification elections are not a foregone conclusion: during the 112th Congress unions won 59


extract all aempts to certify (or de-certify) a local union19 We geocode each individualcase report and locate it in a district We then use the (logged) average number of cases in adistrict over the last seven years to proxy organizational potential To count the number ofcongregations in a district we use county-level data from the 2000 Religious Congregationsand Membership Study and spatially interpolate it to districts Appendix D provides moredetails Both measures (interacted with group preferences) proxy a districtrsquos organizationalcapacity in specication (4)

Perhaps surprisingly we nd that accounting for organizational capacity only dampensthe union eect by a modest amount e estimated impact of unions on responsiveness isreduced by about 1 percentage point Note that this may also reect the fact that existingunion strength shapes aempts to organize new rms or establishments However spec-ication (4) in Table I makes clear that even aer accounting for organizational capacitywe nd that local union membership shapes responsiveness a standard deviation increasein union membership still increases legislatorsrsquo responsiveness to the preferences of thepoor by 9 (plusmn1) percentage points and lowers their responsiveness to the preferences of theauent is rules out the interpretation that the moderating eect of unions is merely anartifact of a broader propensity to overcome collective action problems

In specication (5) we measure a large number of districtsrsquo socio-economic charac-teristics and allow them to interact with constituency preferences population size race(share of African Americans and Hispanics) education (share with BA or higher) the shareof the working population employed in manufacturing median household income andthe degree of urbanization (for descriptive statistics see Table A3) is set of covariatesexcludes ldquobad controlsrdquo (Samii 2016) such as partisanship that are a mechanism throughwhich unions inuence representation20 Again our results point towards the existenceof a clear moderating eect of unions albeit at a somewhat smaller magnitude of about7 percentage points Our nal specication column (6) of Table I includes all previouscovariates and again conrms our core nding

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
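The aggregation step can be illustrated with a minimal sketch. It assumes that a county-to-district crosswalk with overlap populations is available and that the measure is an intensity (such as an index) that can be averaged; the function name, the identifiers, and the toy numbers are hypothetical, and the authors' exact interpolation procedure may differ.

```python
from collections import defaultdict

def aggregate_to_districts(county_values, overlaps):
    """Population-weighted mean of a county-level measure for each district.

    county_values: dict mapping county id -> measure (e.g. an index value)
    overlaps: iterable of (county_id, district_id, population_in_overlap)
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for county, district, population in overlaps:
        if county not in county_values:
            continue  # no data for this county: leave it out of the average
        weighted_sum[district] += population * county_values[county]
        weight_total[district] += population
    return {d: weighted_sum[d] / weight_total[d]
            for d in weight_total if weight_total[d] > 0}

# Hypothetical toy data: two counties, two districts.
county_index = {"county_a": 1.0, "county_b": 3.0}
overlaps = [
    ("county_a", "district_1", 100),
    ("county_b", "district_1", 300),
    ("county_b", "district_2", 50),
]
district_index = aggregate_to_districts(county_index, overlaps)
print(district_index)
```

For a count variable such as the number of bowling alleys, one would more plausibly apportion county totals by population share instead of averaging; the sketch only covers the averaging case.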

Table II
Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

| Specification | Low income | High income |
|---|---|---|
| (1a) Social capital: bowling alleys | 0.067 (0.014) | −0.051 (0.013) |
| (1b) Social capital: index | 0.065 (0.014) | −0.048 (0.013) |
| (2) Redistricting | 0.067 (0.014) | −0.051 (0.013) |
| (3) MRP estimated preferences | 0.115 (0.022) | −0.091 (0.018) |

Note: Based on specification (5) in Table I. Entries are parameter estimates for η_l and η_h. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts; N=15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred; N=14,150. Specification (3) uses preferences estimated using MRP; see Appendix B for more details; N=15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews, see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here, we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.
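The poststratification half of MRP can be sketched in a few lines. The example assumes that a multilevel model has already produced predicted support for each demographic cell within one district and income group; the cell definitions, predictions, and population counts below are made up for illustration and are not taken from the paper.

```python
def poststratify(cell_predictions, cell_counts):
    """Weight model-based cell predictions by each cell's population count."""
    total = sum(cell_counts[cell] for cell in cell_predictions)
    if total <= 0:
        raise ValueError("cells must have positive total population")
    weighted = sum(cell_predictions[cell] * cell_counts[cell]
                   for cell in cell_predictions)
    return weighted / total

# Hypothetical cells for one district's low-income group.
predicted_support = {("low income", "no BA"): 0.60, ("low income", "BA"): 0.50}
population = {("low income", "no BA"): 300, ("low income", "BA"): 100}

district_estimate = poststratify(predicted_support, population)
print(round(district_estimate, 3))
```

The multilevel regression that generates the cell predictions is the harder part and is not shown here.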

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
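The logic of double selection can be sketched on simulated data. The code below is an illustration under simplifying assumptions, not the authors' implementation: it uses a plain coordinate-descent LASSO with a fixed, hand-picked penalty (the paper reports root-LASSO with sample splitting), a single scalar treatment, and the usual recipe of keeping the union of controls selected in the outcome and treatment equations. All function names and the data-generating process are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO: minimize (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                              # residual y - Xb (b starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            r = r + X[:, j] * b[j]            # remove column j's contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r = r - X[:, j] * b[j]            # add the updated contribution back
    return b

def post_double_selection(y, d, W, lam):
    """Select controls in outcome and treatment equations, then run OLS."""
    Wc = (W - W.mean(axis=0)) / W.std(axis=0)
    sel_y = np.flatnonzero(lasso_cd(Wc, y - y.mean(), lam))   # outcome equation
    sel_d = np.flatnonzero(lasso_cd(Wc, d - d.mean(), lam))   # treatment equation
    selected = np.union1d(sel_y, sel_d)
    Z = np.column_stack([np.ones(len(y)), d, W[:, selected]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)              # post-LASSO OLS
    return coef[1], selected

# Simulated example: control 0 confounds the treatment-outcome relationship.
rng = np.random.default_rng(0)
n = 500
W = rng.normal(size=(n, 20))
d = W[:, 0] + rng.normal(size=n)
y = 0.5 * d + 2.0 * W[:, 0] + rng.normal(size=n)

effect, selected = post_double_selection(y, d, W, lam=0.2)
print(effect, selected)
```

In this simulation the true treatment coefficient is 0.5; omitting control 0 would bias a naive regression upward, which is the situation the selection step is meant to guard against.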

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III
Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups.

| | (1) | (2) | (3) |
|---|---|---|---|
| Low income preferences | 0.063 (0.014) | 0.066 (0.017) | 0.062 (0.016) |
| High income preferences | −0.054 (0.013) | −0.036 (0.015) | −0.040 (0.016) |
| Semi-parametric terms | 144 | 312 | 624 |
| Post-LASSO terms | 18 | 45 | 112 |

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on a 50% sample, parameter estimates performed on the remaining 50% (N=7,884). Table entries are estimates for η_L and η_H with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
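To make the idea of pointwise derivatives concrete, here is a minimal sketch of a Gaussian-kernel regularized least squares fit on simulated data. It is an illustration only: the regularization parameter is fixed by hand instead of being chosen by cross-validation, the bandwidth is set to the number of covariates, and the variable names and data-generating process are hypothetical. No interaction term is specified in the fit; the check is whether the estimated derivative with respect to preferences is nonetheless larger where union membership is higher.

```python
import numpy as np

def krls_fit(X, y, lam=0.1, sigma2=None):
    """Gaussian-kernel regularized least squares with pointwise derivatives."""
    n, p = X.shape
    sigma2 = float(p) if sigma2 is None else sigma2
    diff = X[:, None, :] - X[None, :, :]          # diff[j, i] = x_j - x_i
    K = np.exp(-(diff ** 2).sum(axis=2) / sigma2)  # Gaussian kernel matrix
    c = np.linalg.solve(K + lam * np.eye(n), y)    # choice coefficients
    fitted = K @ c
    # derivs[j, d]: derivative of the fitted surface w.r.t. covariate d at x_j
    derivs = (-2.0 / sigma2) * np.einsum("i,ji,jid->jd", c, K, diff)
    return fitted, derivs

# Simulated data: the effect of preferences grows with union membership.
rng = np.random.default_rng(0)
n = 200
pref = rng.normal(size=n)
union = rng.normal(size=n)
y = (0.5 + 0.3 * union) * pref + rng.normal(scale=0.1, size=n)

X = np.column_stack([pref, union])
fitted, derivs = krls_fit(X, y)

high_union = derivs[union > 0, 0].mean()   # avg. marginal effect of preferences
low_union = derivs[union < 0, 0].mean()
print(high_union, low_union)
```

Averaging or smoothing the first column of `derivs` against union membership gives a rough analogue of plotting partial effects at different levels of unionization.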

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: union membership [std], from −1.6 to 1.6, with percentile marks p10, p25, p50, p75, p90; y-axis: partial effect, from −0.4 to 0.4.]

Figure IV
Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV
Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups.

| | Low income | High income |
|---|---|---|
| (A) Private vs. public unions | | |
| Public unions | 0.074 (0.016) | −0.058 (0.015) |
| Non-public unions | 0.054 (0.016) | −0.027 (0.016) |
| (B) Bill ideology | | |
| Conservative bill | 0.086 (0.017) | −0.086 (0.018) |
| Liberal bill | 0.052 (0.014) | −0.028 (0.013) |
| (C) AFL-CIO endorsement | | |
| No position | 0.054 (0.014) | −0.054 (0.013) |
| Endorsement | 0.077 (0.015) | −0.040 (0.014) |

Note: Estimates for η_L and η_H with cluster-robust standard errors in parentheses. N=15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

23 Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for η_h is smaller for endorsed bills, but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
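The conversion from a log-scale regression to dollar amounts can be illustrated with Duan's smearing estimator, which rescales the exponentiated linear prediction by the average exponentiated residual. The sketch below uses made-up residuals, a made-up baseline prediction, and a made-up slope; it does not reproduce the figure reported in the text, and all names are hypothetical.

```python
import math

def duan_smearing_factor(log_scale_residuals):
    """Average of exp(residual) from a regression with a logged outcome."""
    return sum(math.exp(e) for e in log_scale_residuals) / len(log_scale_residuals)

def predict_level(linear_prediction, smearing_factor):
    """Retransform a log-scale prediction to the original (dollar) scale."""
    return math.exp(linear_prediction) * smearing_factor

# Hypothetical inputs: residuals from the log-contributions regression,
# a baseline log prediction, and the coefficient on standardized union size.
residuals = [-0.5, 0.0, 0.5]
smear = duan_smearing_factor(residuals)

baseline_log, slope = 11.0, 0.05
baseline = predict_level(baseline_log, smear)
shifted = predict_level(baseline_log + slope, smear)
effect_in_dollars = shifted - baseline
print(round(smear, 4), round(effect_in_dollars, 2))
```

Simply exponentiating the log prediction would understate the expected level whenever the residuals are dispersed, which is why the smearing factor exceeds one here.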

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V
Labor contributions and selection of Democratic legislators

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| A: Contributions channel | DV: Contrib. | DV: Contrib. | DV: roll call | DV: roll call |
| Union membership | 0.056 (0.012) | 0.046 (0.014) | | |
| Contributions × low income prefs. | | | 0.946 (0.036) | 0.865 (0.034) |
| Contributions × high income prefs. | | | −0.735 (0.029) | −0.714 (0.031) |
| B: Selection channel | DV: Democrat | DV: Democrat | DV: roll call | DV: roll call |
| Union membership | 0.161 (0.024) | 0.106 (0.023) | | |
| Democrat × low income prefs. | | | 0.576 (0.012) | 0.542 (0.015) |
| Democrat × high income prefs. | | | −0.411 (0.013) | −0.423 (0.015) |
| District controls | | X | | X |

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15,780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as a function of the interaction between district preferences and Democratic representative. N=15,776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle, documented by previous studies, that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed, but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds, separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
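The classification into income groups can be sketched as follows. This is a simplified illustration with hypothetical function names and toy incomes: it computes unweighted tercile cut points for a single district, whereas thresholds derived from survey microdata would normally use the survey's person or household weights.

```python
def tercile_thresholds(incomes):
    """Return the 1/3 and 2/3 quantiles using linear interpolation."""
    ordered = sorted(incomes)
    n = len(ordered)

    def quantile(p):
        position = p * (n - 1)
        lower = int(position)
        upper = min(lower + 1, n - 1)
        return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])

    return quantile(1 / 3), quantile(2 / 3)

def classify(income, low_cut, high_cut):
    """Assign a respondent to the low, middle, or high income group."""
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical household incomes (in $1,000) for one district.
district_incomes = [10, 20, 30, 40, 50, 60, 70]
low_cut, high_cut = tercile_thresholds(district_incomes)
groups = [classify(x, low_cut, high_cut) for x in (25, 40, 55)]
print(low_cut, high_cut, groups)
```

Because the cut points are computed per district, the same household income can fall into different groups in different districts.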

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1: Matched CCES–House roll calls included in our analysis

| Match | Bill | Date | Name | House Vote (Yea-Nay) | Bill Ideology† |
|---|---|---|---|---|---|
| (1) | HR 810 | 07/19/2006 | Stem Cell Research Enhancement Act (Presidential Veto override) | 235-193 | L |
| (1) | HR 3 | 01/11/2007 | Stem Cell Research Enhancement Act of 2007 (House) | 253-174 | L |
| (1) | S 5 | 06/07/2007 | Stem Cell Research Enhancement Act of 2007 | 247-176 | L |
| (2) | HR 2956 | 07/12/2007 | Responsible Redeployment from Iraq Act | 223-201 | L |
| (3) | HR 2 | 01/10/2007 | Fair Minimum Wage Act | 315-116 | L |
| (4) | HR 4297 | 12/08/2005 | Tax Relief Extension Reconciliation Act (Passage) | 234-197 | C |
| (4) | HR 4297 | 05/10/2006 | Tax Relief Extension Reconciliation Act (Agreeing to Conference Report) | 244-185 | C |
| (5) | HR 3045 | 07/28/2005 | Dominican Republic-Central America-United States Free Trade Agreement Implementation Act | 217-215 | C |
| (6) | S 1927 | 08/04/2007 | Protect America Act | 227-183 | C |
| (6) | HR 6304 | 06/20/2008 | FISA Amendments Act of 2008 | 293-129 | C |
| (7) | HR 3162 | 08/01/2007 | Children's Health and Medicare Protection Act | 225-204 | L |
| (7) | HR 976 | 10/18/2007 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 273-156 | L |
| (7) | HR 3963 | 01/23/2008 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 260-152 | L |
| (7) | HR 2 | 02/04/2009 | Children's Health Insurance Program Reauthorization Act | 290-135 | L |
| (8) | HR 3221 | 07/23/2008 | Foreclosure Prevention Act of 2008 | 272-152 | L |
| (9) | HR 3688 | 11/08/2007 | United States-Peru Trade Promotion Agreement | 285-132 | C |
| (10) | HR 1424 | 10/03/2008 | Emergency Economic Stabilization Act of 2008 | 263-171 | L |
| (11) | HR 3080 | 10/12/2011 | To implement the United States-Korea Trade Agreement | 278-151 | C |
| (12) | HR 3078 | 10/12/2011 | To implement the United States-Colombia Trade Promotion Agreement | 262-167 | C |
| (13) | HR 2346 | 06/16/2009 | Supplemental Appropriations, Fiscal Year 2009 (Agreeing to conference report) | 226-202 | L |
| (14) | HR 2831 | 07/31/2007 | Lilly Ledbetter Fair Pay Act | 225-199 | L |
| (14) | HR 11 | 01/09/2009 | Lilly Ledbetter Fair Pay Act of 2009 (House) | 247-171 | L |
| (14) | S 181 | 01/27/2009 | Lilly Ledbetter Fair Pay Act of 2009 | 250-177 | L |
| (15) | HR 1913 | 04/29/2009 | Local Law Enforcement Hate Crimes Prevention Act | 249-175 | L |
| (16) | HR 1 | 02/13/2009 | American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report) | 246-183 | L |
| (17) | HR 2454 | 06/26/2009 | American Clean Energy and Security Act | 219-212 | L |
| (18) | HR 3590 | 03/21/2010 | Patient Protection and Affordable Care Act | 220-212 | L |
| (19) | HR 3962 | 11/07/2009 | Affordable Health Care for America Act | 221-215 | L |
| (20) | HR 4173 | 06/30/2010 | Wall Street Reform and Consumer Protection Act of 2009 | 237-192 | L |
| (21) | HR 2965 | 12/15/2010 | Don't Ask, Don't Tell Repeal Act of 2010 | 250-175 | L |
| (22) | S 365 | 08/01/2011 | Budget Control Act of 2011 | 269-161 | C |
| (23) | H CR 34 | 04/15/2011 | House Budget Plan of 2011 | 235-193 | C |
| (24) | H CR 112 | 03/28/2012 | Simpson-Bowles/Cooper Amendment to House Budget Plan | 38-382 | C |
| (25) | HR 8 | 08/01/2012 | American Taxpayer Relief Act of 2012 (Levin Amendment) | 170-257 | L |
| (26) | HR 2 | 01/19/2011 | Repealing the Job-Killing Health Care Law Act | 245-189 | C |
| (26) | HR 6079 | 07/11/2012 | Repeal the Patient Protection and Affordable Care Act and [...] | 244-185 | C |
| (27) | HR 1938 | 07/26/2011 | North American-Made Energy Security Act | 279-147 | C |

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative is based on predominant support of the bill by Democratic or Republican representatives, respectively.


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

| Congress | 33rd percentile: Mean | Min | Max | 67th percentile: Mean | Min | Max |
|---|---|---|---|---|---|---|
| 109 | 38123 | 16800 | 73675 | 77964 | 39612 | 146870 |
| 110 | 40127 | 18000 | 77000 | 83047 | 43600 | 155113 |
| 111 | 39021 | 17500 | 78262 | 82440 | 46000 | 160050 |
| 112 | 37381 | 16500 | 81000 | 79868 | 38500 | 158654 |

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

| Variable | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Roll-call vote: yea | 0.568 | 0.495 | 0.000 | 1.000 | 15780 |
| Constituent preferences | | | | | |
| Low income | 0.593 | 0.220 | 0.047 | 0.979 | 15934 |
| High income | 0.555 | 0.198 | 0.037 | 0.967 | 15934 |
| Low-High Gap | 0.172 | 0.121 | 0.000 | 0.588 | 15934 |
| Union membership [log] | 9.705 | 1.046 | 6.094 | 13.619 | 15934 |
| Population | 7.022 | 0.723 | 4.697 | 9.980 | 15934 |
| Share African American | 0.124 | 0.146 | 0.004 | 0.680 | 15934 |
| Share Hispanic | 0.156 | 0.174 | 0.005 | 0.812 | 15934 |
| Share BA or higher | 0.275 | 0.097 | 0.073 | 0.645 | 15934 |
| Median income [$10,000] | 5.177 | 1.356 | 2.282 | 10.439 | 15934 |
| Share female | 0.508 | 0.010 | 0.462 | 0.543 | 15934 |
| Manufacturing share | 0.110 | 0.047 | 0.025 | 0.281 | 15934 |
| Urbanization | 0.790 | 0.199 | 0.213 | 1.000 | 15934 |
| Certification elections [log] | 3.347 | 0.861 | 0.000 | 5.100 | 15934 |
| Congregations [per 1000 persons] | 0.765 | 1.147 | 0.062 | 6.453 | 15934 |

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), and a second one that is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative of each Congressional district's population.
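The train, predict, and average steps described above can be sketched in a few lines. This is only an illustration under loud assumptions: the learner below is a simple cell-mean lookup standing in for the random forest, and the function names, covariate labels, and toy records are hypothetical rather than taken from the replication materials.

```python
from collections import defaultdict
from statistics import mean


def fit_cell_means(train_rows):
    """Stand-in learner (NOT a random forest): mean preference per covariate cell."""
    cells = defaultdict(list)
    for covariates, preference in train_rows:
        cells[covariates].append(preference)
    overall = mean(pref for _, pref in train_rows)
    table = {z: mean(prefs) for z, prefs in cells.items()}
    # Unseen covariate cells fall back to the overall survey mean.
    return lambda z: table.get(z, overall)


def district_group_preferences(predict, census_rows):
    """Average predicted preferences by (district, income group)."""
    groups = defaultdict(list)
    for district, income_group, covariates in census_rows:
        groups[(district, income_group)].append(predict(covariates))
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical survey sample: (covariates, observed support for one roll call)
survey = [
    (("young", "college"), 1),
    (("young", "college"), 1),
    (("old", "no_college"), 0),
    (("old", "no_college"), 1),
]
# Hypothetical district-representative sample: (district, income group, covariates)
census = [
    ("D1", "low", ("old", "no_college")),
    ("D1", "low", ("young", "college")),
    ("D1", "high", ("young", "college")),
]

predict = fit_cell_means(survey)
prefs = district_group_preferences(predict, census)
print(prefs)
```

Swapping `fit_cell_means` for a flexible learner changes only the prediction step; the aggregation by district and income group stays the same.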

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

[Figure B1: Illustration of Small Area Estimation of District Preferences. Schematic of a stacked data matrix. Units $1, \dots, m$ (CCES) have observed covariates $Z_{i1}, \dots, Z_{iK}$ and observed preferences $Y_{i1}, \dots, Y_{iP}$. Units $m+1, \dots, m+n$ (Census) have observed covariates, but their preferences are missing (NA). A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict $Y^{*}_p = f(Z)$ for the Census rows.]

Figure B1 note: We use a sample of $m$ individuals from the CCES that is not necessarily representative at the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^{*}_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample is the 1% extracts of the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the
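The midpoint Pareto scoring of binned income can be illustrated with a short sketch. It assumes the common textbook form of the estimator, in which the Pareto shape parameter is computed from the counts and lower bounds of the two highest bins; whether this matches the exact variant applied to the CCES bins is an assumption, and the bin boundaries and counts below are hypothetical.

```python
import math


def pareto_top_mean(lower_top, lower_prev, n_top, n_prev):
    """Mean of the open-ended top bin under a Pareto tail fitted to the two highest bins."""
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for a shape parameter <= 1")
    return lower_top * alpha / (alpha - 1)


def bin_scores(bins, counts):
    """bins: (lower, upper) pairs with upper exclusive; the last bin has upper=None."""
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]
    scores.append(
        pareto_top_mean(bins[-1][0], bins[-2][0], counts[-1], counts[-2])
    )
    return scores


# Hypothetical bins and respondent counts (not the CCES categories)
bins = [(0, 10000), (10000, 20000), (20000, 30000), (30000, None)]
counts = [40, 30, 20, 10]
scores = bin_scores(bins, counts)
print(scores)
```

Closed bins receive their midpoint, so the $20,000 to $29,999 category is scored at $25,000; only the open-ended bin depends on the Pareto assumption.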

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1. Start with initial guesses of missing values in $D$
 2. $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3. while not $c$ do
 4.     $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5.     for $v$ in $w$ do
 6.         Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7.         Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8.         $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9.     Update stopping criterion $c$
10. Return completed $D^{imp}$

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28
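For readers unfamiliar with the error metric, the following sketch shows one common definition of the normalized root mean squared error for a continuous variable: the mean squared prediction error divided by the variance of the true values, under a square root. The use of the population variance and the toy numbers are assumptions made for illustration. In the application described above, out-of-bag predictions would take the place of `predicted_values`.

```python
from statistics import mean, pvariance


def nrmse(true_values, predicted_values):
    """Root of (mean squared error / variance of the true values)."""
    squared_error = mean(
        (t - p) ** 2 for t, p in zip(true_values, predicted_values)
    )
    return (squared_error / pvariance(true_values)) ** 0.5


truth = [1.0, 2.0, 3.0, 4.0]
perfect = nrmse(truth, truth)                      # no prediction error
mean_only = nrmse(truth, [2.5, 2.5, 2.5, 2.5])     # always predict the mean
print(perfect, mean_only)
```

Under this definition, a value of 0 indicates perfect predictions and a value of 1 corresponds to always predicting the mean, which gives a reference scale for values around 0.11.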

B2 Multilevel Regression and Poststratication

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical model using penalized maximum likelihood (Chung et al. 2013):

$$
\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\text{race}}_{j[i]} + \alpha^{\text{gender}}_{k[i]} + \alpha^{\text{age}}_{l[i]} + \alpha^{\text{educ}}_{m[i]} + \alpha^{\text{income}}_{n[i]} + \alpha^{\text{district}}_{d[i]}\right) \tag{B1}
$$

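We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$
\begin{aligned}
\alpha^{\text{race}}_{j} &\sim N\left(0, \sigma^2_{\text{race}}\right), & j &= 1, \dots, 3 && \text{(B2)} \\
\alpha^{\text{gender}}_{k} &\sim N\left(0, \sigma^2_{\text{gender}}\right), & k &= 1, 2 && \text{(B3)} \\
\alpha^{\text{age}}_{l} &\sim N\left(0, \sigma^2_{\text{age}}\right), & l &= 1, \dots, 6 && \text{(B4)} \\
\alpha^{\text{educ}}_{m} &\sim N\left(0, \sigma^2_{\text{educ}}\right), & m &= 1, \dots, 5 && \text{(B5)} \\
\alpha^{\text{income}}_{n} &\sim N\left(0, \sigma^2_{\text{income}}\right), & n &= 1, \dots, 13 && \text{(B6)}
\end{aligned}
$$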

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\text{state}}_{s[d]}$ and freely estimated variance:

$$
\alpha_{d} \sim N\left(\alpha^{\text{state}}_{s[d]}, \sigma^2_{\text{state}}\right) \tag{B7}
$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
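The poststratification step amounts to a population-weighted average of cell-level predicted probabilities. A minimal sketch follows, assuming each demographic cell comes with a linear predictor on the logit scale and a Census population count; the function names and toy cells are hypothetical, and the hierarchical model itself is not fitted here.

```python
import math
from collections import defaultdict


def inv_logit(x):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def poststratify(cells):
    """cells: (district, income_group, linear_predictor, population_count) tuples."""
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for district, group, eta, count in cells:
        weighted_sum[(district, group)] += inv_logit(eta) * count
        population[(district, group)] += count
    return {key: weighted_sum[key] / population[key] for key in weighted_sum}


# Hypothetical cells: two demographic cells within one district and income group
cells = [
    ("D1", "low", 0.0, 100),           # predicted support 0.50, 100 persons
    ("D1", "low", math.log(3.0), 300), # predicted support 0.75, 300 persons
]
estimates = poststratify(cells)
print(estimates)
```

Because the larger cell carries three times the weight, the district estimate lies closer to 0.75 than to 0.50.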

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted with MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases the responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote. Standard errors in parentheses.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A. Small Area Estimation via Chained Random Forests | | | | | | |
| Low income preferences | 0.106 (0.013) | 0.082 (0.015) | 0.098 (0.016) | 0.084 (0.013) | 0.068 (0.014) | 0.062 (0.014) |
| High income preferences | −0.063 (0.012) | −0.036 (0.013) | −0.053 (0.014) | −0.051 (0.013) | −0.050 (0.013) | −0.040 (0.014) |
| B. Multilevel Regression & Poststratification | | | | | | |
| Low income preferences | 0.182 (0.021) | 0.158 (0.024) | 0.181 (0.026) | 0.162 (0.020) | 0.115 (0.022) | 0.115 (0.022) |
| High income preferences | −0.136 (0.017) | −0.119 (0.019) | −0.139 (0.021) | −0.122 (0.017) | −0.091 (0.018) | −0.091 (0.018) |
| C. Raw CCES means | | | | | | |
| Low income preferences | 0.080 (0.010) | 0.061 (0.011) | 0.063 (0.012) | 0.072 (0.010) | 0.043 (0.011) | 0.045 (0.011) |
| High income preferences | −0.027 (0.008) | −0.013 (0.008) | −0.010 (0.008) | −0.027 (0.008) | −0.018 (0.008) | −0.024 (0.009) |

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results even obtain when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
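The logic of district-relative income groups can be made concrete with a small sketch. The incomes below are hypothetical and chosen only so that the two districts have lower tercile cut points of $45,000 and $30,000; the helper names are likewise illustrative, and the tercile rule used by `statistics.quantiles` is one of several possible percentile conventions.

```python
from statistics import quantiles


def tercile_cuts(incomes):
    """Lower and upper tercile cut points of an income sample."""
    low_cut, high_cut = quantiles(incomes, n=3)
    return low_cut, high_cut


def income_group(income, cuts):
    """Classify an income relative to a pair of (low, high) cut points."""
    low_cut, high_cut = cuts
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"


# Hypothetical household incomes in two districts (not data from the paper)
district_incomes = {
    "district_a": [20000, 30000, 45000, 50000, 60000, 90000, 100000, 120000],
    "district_b": [15000, 20000, 30000, 35000, 50000, 70000, 80000, 90000],
}
cuts = {name: tercile_cuts(values) for name, values in district_incomes.items()}
groups = {name: income_group(40000, c) for name, c in cuts.items()}
print(groups)
```

The same $40,000 income falls into the low group in the richer district and into the middle group in the poorer one, which is the within-district relativity the thresholds are meant to capture.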


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote. Standard errors in parentheses.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A. District-specific income thresholds | | | | | | |
| Low income preferences | 0.106 (0.013) | 0.082 (0.015) | 0.098 (0.016) | 0.084 (0.013) | 0.068 (0.014) | 0.062 (0.014) |
| High income preferences | −0.063 (0.012) | −0.036 (0.013) | −0.053 (0.014) | −0.051 (0.013) | −0.050 (0.013) | −0.040 (0.014) |
| B. State-specific income thresholds | | | | | | |
| Low income preferences | 0.105 (0.013) | 0.082 (0.015) | 0.097 (0.016) | 0.083 (0.013) | 0.067 (0.014) | 0.062 (0.014) |
| High income preferences | −0.062 (0.012) | −0.036 (0.013) | −0.052 (0.014) | −0.050 (0.013) | −0.049 (0.013) | −0.039 (0.013) |
| C. Shifted income thresholds (p20 - p80) | | | | | | |
| Low income preferences | 0.098 (0.012) | 0.077 (0.013) | 0.09 (0.014) | 0.078 (0.012) | 0.063 (0.013) | 0.057 (0.013) |
| High income preferences | −0.054 (0.011) | −0.031 (0.012) | −0.046 (0.012) | −0.044 (0.011) | −0.044 (0.012) | −0.034 (0.012) |

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. As proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

[Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Map legend classes: 0 − 13, 13 − 26, 26 − 39, 39 − 62, 62 − 164.]

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v133 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
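The interpolation step can be sketched as a population-weighted allocation of county counts to districts. This is a simplified illustration rather than the exact procedure: it assumes each weight is the share of a county's population living in a given district, and the county names, counts, and weights are hypothetical.

```python
from collections import defaultdict


def interpolate_to_districts(county_counts, weights):
    """Allocate county counts to districts.

    weights maps (county, district) to the share of the county's population
    that lives in that district; shares for one county sum to 1.
    """
    totals = defaultdict(float)
    for (county, district), share in weights.items():
        totals[district] += county_counts[county] * share
    return dict(totals)


# Hypothetical inputs: county A is split across two districts, county B is not
county_counts = {"county_a": 100, "county_b": 40}
weights = {
    ("county_a", "D1"): 0.25,
    ("county_a", "D2"): 0.75,
    ("county_b", "D2"): 1.0,
}
district_counts = interpolate_to_districts(county_counts, weights)
print(district_counts)
```

When every county lies entirely within one district, all shares equal 1 and the calculation collapses to a plain sum over counties, as in the at-large case mentioned above.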


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

| | Low income preferences | High income preferences | N |
|---|---|---|---|
| (1) Injective preference roll call map | 0.063 (0.013) | −0.041 (0.013) | 11104 |
| (2) Extreme preferences excl. | 0.074 (0.016) | −0.048 (0.015) | 13308 |
| (3) New York excluded | 0.070 (0.015) | −0.048 (0.014) | 14730 |
| (4) Local Union Concentration | 0.065 (0.014) | −0.047 (0.014) | 15780 |
| (5) Trimmed LPM estimator | 0.074 (0.015) | −0.055 (0.014) | 15426 |
| (6) Errors-in-variables | 0.062 (0.004) | −0.054 (0.004) | 15345 |

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. For this model, table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
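The idea behind trimming can be sketched with a one-regressor linear probability model: fit by least squares, drop observations whose fitted value falls outside the unit interval, and refit until no fitted value leaves it. This is a simplified reading of the sequential procedure and not the implementation behind Table E1; the data are hypothetical and the helper names are illustrative.

```python
def ols(xs, ys):
    """Intercept and slope of a one-regressor least-squares fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trimmed_lpm(xs, ys):
    """Refit after dropping observations whose fitted value leaves [0, 1]."""
    xs, ys = list(xs), list(ys)
    while True:
        intercept, slope = ols(xs, ys)
        keep = [
            (x, y) for x, y in zip(xs, ys)
            if 0.0 <= intercept + slope * x <= 1.0
        ]
        if len(keep) == len(xs):
            return intercept, slope, xs
        if len(keep) < 2:
            raise ValueError("too few observations left after trimming")
        xs, ys = (list(column) for column in zip(*keep))


# Hypothetical data: two extreme regressor values pull fitted values outside [0, 1]
x = [-10, 1, 2, 3, 4, 5, 6, 7, 8, 19]
y = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]
intercept, slope, kept_x = trimmed_lpm(x, y)
print(intercept, slope, len(kept_x))
```

In this toy example, the two extreme observations are dropped after the first fit, and the second fit on the remaining eight observations keeps all fitted values inside the unit interval.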

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
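As a point of reference for the direction of the expected bias, the textbook single-regressor case with classical measurement error can be written as follows. This is a standard result given only for illustration: the symbols $\theta^{*}$, $u$, $\sigma^2_{\theta}$, and $\sigma^2_{u}$ are introduced here for exposition and are not the notation of the Bayesian model used in this robustness test.

```latex
% Observed preference = true preference + independent noise
\theta^{*} = \theta + u, \qquad u \perp \theta, \qquad \operatorname{Var}(u) = \sigma^2_{u}

% Least squares of y on the noisy measure is attenuated toward zero
\operatorname{plim}\, \hat{\beta} = \beta \cdot \frac{\sigma^2_{\theta}}{\sigma^2_{\theta} + \sigma^2_{u}}
```

The attenuation factor approaches one as the measurement error variance becomes small relative to the variance of the true preferences, which is the situation suggested by the small standard errors of the district-level estimates.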

F Post-Double-Selection Estimator

31 We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit district fixed effects in the notation and ignore $i$ subscripts):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}U_{d}\theta^{l}_{jd} + \eta^{h}U_{d}\theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0, \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.
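The selection steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation behind the reported results: it uses a single treatment rather than the interacted specification in (F1), a plain LASSO with a fixed penalty `lam` rather than the root-LASSO, simulated data, and hypothetical helper names (`lasso_select`, `ols`, `post_double_selection`).

```python
import random


def lasso_select(X, y, lam, n_iter=50):
    """Indices with non-zero coefficients from a coordinate-descent LASSO.

    Minimizes (1/2n) * ||y - Xb||^2 + lam * ||b||_1 (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual, adding back its own fit.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * b[j]
            new = max(abs(rho) - lam, 0.0) / col_ss[j]  # soft-thresholding
            new = new if rho > 0 else -new
            if new != b[j]:
                delta = new - b[j]
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return {j for j in range(p) if b[j] != 0.0}


def ols(X, y):
    """OLS coefficients via normal equations and Gauss-Jordan elimination."""
    n, k = len(X), len(X[0])
    M = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[c][k] / M[c][c] for c in range(k)]


def post_double_selection(y, d, W, lam):
    """Select controls in both equations, then run OLS on their union."""
    keep = sorted(lasso_select(W, y, lam) | lasso_select(W, d, lam))
    X = [[d[i]] + [W[i][j] for j in keep] for i in range(len(y))]
    return ols(X, y)[0], keep


# Simulated data (assumption): control 1 drives the treatment strongly
# but enters the outcome only weakly.
rng = random.Random(0)
n, p, effect = 300, 20, 0.5
W = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
d = [w[0] + w[1] + rng.gauss(0, 1) for w in W]
y = [effect * d[i] + W[i][0] + 0.15 * W[i][1] + rng.gauss(0, 1) for i in range(n)]

effect_hat, controls = post_double_selection(y, d, W, lam=0.2)
print(round(effect_hat, 2), controls)
```

Taking the union of both selected sets is the design choice that matters here: a control that predicts the treatment well is retained even if a LASSO on the outcome alone would have dropped it.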

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).³² Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

³² For a very general discussion see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}, \theta^h_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc. \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix with entries

$$K_{ij} = \exp\left(-\frac{\lVert z_i - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

and an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$c^* = \arg\min_{c \in \mathbb{R}^N} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right], \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
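The closed-form solution above can be sketched in a few lines. This is a minimal illustration, assuming a tiny simulated data set, a fixed `lam` instead of a leave-one-out search, and no standardization of covariates; the helper names (`gaussian_kernel`, `solve`, `krls_fit`) are hypothetical and do not refer to any particular package.

```python
import math


def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / sigma2)
             for zj in Z] for zi in Z]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def krls_fit(Z, y, lam):
    """Weights c* = (K + lam * I)^{-1} y with sigma2 = number of columns."""
    sigma2 = float(len(Z[0]))
    K = gaussian_kernel(Z, sigma2)
    n = len(y)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, y), K


# Toy data (assumption): 25 grid points; the outcome is a pure interaction.
Z = [[-2.0 + i, -2.0 + j] for i in range(5) for j in range(5)]
y = [z[0] * z[1] for z in Z]
lam = 0.1

c, K = krls_fit(Z, y, lam)
fitted = [sum(K[i][j] * c[j] for j in range(len(c))) for i in range(len(c))]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
print(round(sse, 4))
```

The outcome here is a product of the two covariates, yet no interaction term is specified: the kernel expansion picks it up from the data, which is the property the text relies on.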

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

$$\frac{\partial y}{\partial z_j^{(U_d)}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert z_i - z_j \rVert^2}{\sigma^2}\right)\left(z_j^{(U_d)} - z_i^{(U_d)}\right). \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
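The pointwise derivative follows from differentiating the kernel expansion f(z) = Σ_i c_i exp(−‖z_i − z‖² / σ²) with respect to one coordinate of z. The sketch below checks that analytic expression against a central finite difference. The training points, the weights `c`, and the evaluation point are made-up values (in practice `c` would come from the KRLS fit), and the second coordinate is assumed to stand in for union density.

```python
import math


def kernel(zi, zj, sigma2):
    """Gaussian kernel exp(-||zi - zj||^2 / sigma2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / sigma2)


def predict(Z, c, z, sigma2):
    """Fitted surface f(z) = sum_i c_i * k(z_i, z)."""
    return sum(ci * kernel(zi, z, sigma2) for ci, zi in zip(c, Z))


def partial_derivative(Z, c, z, d, sigma2):
    """Analytic derivative of f at z with respect to coordinate d."""
    return (-2.0 / sigma2) * sum(
        ci * kernel(zi, z, sigma2) * (z[d] - zi[d]) for ci, zi in zip(c, Z))


# Hypothetical training points and weights.
Z = [[0.0, 0.0], [1.0, 0.5], [-0.5, 1.5], [2.0, -1.0]]
c = [0.8, -0.3, 0.5, 0.2]
sigma2 = 2.0  # number of columns in z
point, d, h = [0.4, 0.7], 1, 1e-5

analytic = partial_derivative(Z, c, point, d, sigma2)
up = predict(Z, c, [point[0], point[1] + h], sigma2)
down = predict(Z, c, [point[0], point[1] - h], sigma2)
numeric = (up - down) / (2 * h)
print(abs(analytic - numeric) < 1e-6)
```

Evaluating `partial_derivative` at every observation gives one derivative per case, which is the quantity that is then smoothed and plotted against union membership.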

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levy (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs. Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


union strength. We find that district-level union membership dampens unequal responsiveness by national legislators. In line with previous research, on average House members are significantly less responsive to the policy preferences of low-income constituents. However, this gap in responsiveness is smaller where unions are stronger, and it decreases significantly where union members are numerous. This moderating effect of unions is not an artifact of existing state-level union policies or largely time-invariant state-level or district-level unobservables (such as institutions, history, or culture). Extended specifications allow other district-level characteristics to also moderate legislative responsiveness to different income groups. They demonstrate that the union effect is not driven by district-specific levels of socio-economic factors, such as education, race, gender, median household income, urbanization, or a district's employment structure. We also rule out the possibility that our finding simply represents the general capacity of workers or people to organize (or be organized) by accounting for explicit measures of district-level organization capacity based on new data on unionization attempts from the National Labor Relations Board, the predominance of religious organizations, and behavioral measures of social capital. We also go beyond standard regression models and employ estimates from a Double Selection Estimator (Belloni et al. 2014) and Kernel Regularized Least Squares (Hainmueller and Hazlett 2014) to show that the moderating effect of unions is robust to relaxing potentially important modeling assumptions.

An exploration of possible mechanisms points to campaign contributions and partisan selection as two relevant channels through which local unions enhance the representation of the less well-off. Relatedly, the equality-enhancing effect of unions is stronger for bills on which the largest union confederation, AFL-CIO, has staked out a clear position.

We are aware of only two previous investigations of the effect of organized labor on unequal responsiveness, and they differ considerably in their approach from ours. Focusing on a recent cross-section of 47 US states, Flavin (2018) shows that states with stronger unions exhibit less unequal representation, as estimated from regressions of income-weighted voter preferences on state-level policy liberalism. Studying the 110th House of Representatives, Ellis (2013) finds mixed results: district-level unionization is related to a smaller rich-poor gap for key legislative votes, but there is no such effect for overall ideological representation. Our analysis addresses the problem that survey samples are not representative for congressional districts by design, which can lead to biased estimates of income biases in representation. It confirms the finding that unions are linked to more equal representation, covering four Congresses and three times as many roll call votes as studied by Ellis (2013). However, our main empirical contribution is that we can go much further in ruling out alternative explanations. Our research design leverages within-district variation in preference polarization, within-state variation in union strength, as well as extensive district-level data on alternative moderating factors that may be bundled with union strength. In contrast to the pure cross-sectional designs of these two previous studies, our analysis can thus account for state and district fixed effects that capture important sources of unobserved heterogeneity, and it directly measures many important confounders. As a result, we can state with more confidence that the impact of unions is not spurious.

Against the backdrop of current scientific and public debates about labor unions and political representation, these findings may come as somewhat of a surprise. While some strands of research and political discourse portray unions as an egalitarian force in politics, others see them as fatally weakened, as effects as much as causes of unequal representation, or simply as just another organized group fighting for special interests (that do not generally overlap with those of lower income individuals). The latter view is held by a large strand of scholarship in economics (cf. Freeman and Medoff 1984) and by researchers studying the role of teachers' unions in political science (Anzia 2011; Moe 2011). Most extant research on Congress simply does not have the required data to directly assess the effect of unions on representational equality. Numerous studies of union strength and congressional roll-call voting do not measure voter preferences, which makes it difficult to interpret who is being represented (Becher et al. 2018; Box-Steffensmeier et al. 1997; Freeman and Medoff 1984).

Altogether, our results suggest that unequal responsiveness is not an unavoidable feature of democratic capitalism. The results are especially striking given that recent cross-national studies have found consistent patterns of unequal representation across different political institutions (Bartels 2017; Lupu and Warner 2017). In contrast, we find considerable heterogeneity in differential responsiveness across districts affected by local labor unions, a fundamental economic institution. The moderating effect of unions uncovered in our analysis is large enough to swing key votes in Congress. That said, our results support the view that political efforts to (further) weaken unions, as evidenced in recent reforms in states like Michigan and Wisconsin, are, if anything, likely to exacerbate unequal responsiveness in representation. They may also explain why unions are (still) under attack.

II Moderating biased responsiveness in Congress

While few studies have directly assessed the impact of labor unions on unequal responsiveness in Congress or elsewhere, various strands of scholarship in political science and related fields suggest that labor unions are one of the few mass-membership organizations that provide collective voice to lower income individuals in the political arena, with potentially important consequences for political representation (Ahlquist and Levy 2013; Bartels 2016; Freeman and Medoff 1984; Schlozman et al. 2012). Consistent with a central premise of the collective voice perspective, unions tend to take positions favored by less affluent citizens. Gilens (2012, 154-161) compares public positions of national unions with mass policy preferences across several hundred policy issues and finds that unions' positions are most strongly correlated with the preferences of the less well-off (see also Hacker and Pierson 2010; Schlozman 2015).⁴ Similarly, Schlozman et al. (2012, 87) conclude that unions are one of the few organizations in national politics “that advocate on behalf of the economic interest of workers who are not professionals or managers.”

However, shared preferences between the less well-off and organized labor are by no means sufficient to alter inequalities in political representation in national politics. This requires an effective political transmission mechanism. To guide the empirical analysis, we sketch key elements of a framework of union organization and political responsiveness.

Labor unions are organizations formed to bargain collectively on behalf of their members with employers over wages and conditions. Unions are thus created at the local (i.e., establishment) level (Freeman and Medoff 1984). Once formed, unions may (and often do) enter the political arena. The ability of unions to increase the rate of political participation (including voting, contacting officials, attending rallies, or making donations) of low- and middle-income citizens is often considered to be their key channel of political influence. Importantly, unions may also increase participation among non-members with similar policy preferences through get-out-the-vote campaigns and social networks (Leighley and Nagler 2007; Rosenfeld 2014; Schlozman et al. 2012). Making contributions to favored candidates and campaigns complements the ability of unions to communicate with and mobilize members or to provide campaign volunteers. Indeed, unions are among the leading contributors to political action committees (PAC), accounting for a quarter of total PAC spending in 2009 (Schlozman et al. 2012, ch. 14). In contrast to corporations and business organizations, union contributions “represent the aggregation of a large number of small individual donations” (Schlozman et al. 2012, 428).⁵

The credible threat of political mobilization can affect policy decisions by representatives in two general ways. First, it may shape who is elected in a given electoral district. If politicians are not exchangeable (because they differ in their preferences and beliefs), political selection is important. In an age of elite polarization (McCarty et al. 2006), the partisan identity of a representative is often crucial for determining legislative voting (Bartels 2016; Lee et al. 2004). Since the New Deal era, unions and union members have largely allied with the Democratic Party, given its stronger support for many of their broader policy demands (Lichtenstein 2013; Schlozman 2015). Political selection might also shape other political characteristics of representatives, such as their class background or race (Butler 2014; Carnes 2013).

Second, unions' mobilization potential shapes the incentives of elected representatives beyond their partisan affiliation and personal traits. Policymakers' rational anticipation of public reactions plays a central role in theories of accountability and dynamic responsiveness (Arnold 1990; Stimson et al. 1995). While many individual legislative votes do not affect the reelection prospects of representatives, on potentially salient votes they can face hard choices between party ideology and competing constituency preferences. On international trade agreements, for instance, Democratic representatives have faced cross-pressures between a more skeptical stance taken by unions and low-income constituents versus that of their own party (Box-Steffensmeier et al. 1997). On the other side of the aisle, in the wake of the financial crisis, Republican legislators found themselves torn between their own partisan views on stimulus spending and the pressure from less well-off constituents (Mian et al. 2010).

⁴ This is consistent with the argument that organized labor fosters norms of solidarity and support for the less well-off through leadership (Ahlquist and Levy 2013; Kim and Margalit 2017) or social interactions (Berelson et al. 1954).

⁵ While evidence on the direct effect of contributions on legislative behavior is mixed, recent field-experimental results indicate that contributions help to provide access (Kalla and Broockman 2016) or sway congressional staffers (Hertel-Fernandez et al. 2018).

Politicians' incentives are also linked to information. Theories of representation emphasize that members of Congress, and especially the House, face numerous voting decisions in each term, and it would be unrealistic to assume that they have access to reliable, unbiased polling data on constituency preferences on all the issues they face (Arnold 1990; Miller and Stokes 1963). Instead, representatives, with the help of their staffers, rely on alternative methods to assess public opinion, including constituent correspondence, town halls, contacts with community leaders, or local interest groups (Miler 2007). In this limited-information context, the strength of local unions may enhance the visibility and perception of constituent preferences (Hertel-Fernandez et al. 2018).6

Following seminal theories of congressional action (Arnold 1990; Miller and Stokes 1963), our argument emphasizes that the strength of local unions underpins a credible mobilization threat that affects the actions of candidates and legislators. Anticipating mobilizing efforts by unions, a potential candidate may not even enter the race; an elected, career-oriented politician might be pressured to alter his or her vote even without a full mobilization effort, as long as unions' mobilization capacity is visible. Thus, both campaign contributions and candidate selection should matter as channels linking local union strength and representation, since they are linked to credible threats of mobilization.

Our argument implies that the district-level strength of labor unions increases the responsiveness of members of Congress to the less affluent. While we know from previous work that politicians are considerably more responsive to the preferences of the affluent than to those of the less well-off, this bias should be reduced in districts with relatively higher union membership. Substantively, it is crucial to assess how far the presence of unions can move responsiveness toward the ideal of political equality.7

6 Butler and Nickerson (2011) find that politicians respond when provided with more accurate opinion data. However, behavioral biases may lead politicians to discount constituent preferences they disagree with (Butler and Dynes 2016).

7 In line with a large literature, we focus on union membership as a key component of union strength. In a study of the effect of unions on legislative ideology, rather than income-biased responsiveness, Becher et al. (2018) argue that the structure of local unions (i.e., the concentration of unions in a given locality) matters as well. However, they also show empirically that union density and concentration are separable dimensions.


III. Data and Empirical Strategy

Any effort to test the relevance of unions for unequal representation confronts major challenges of measurement and causal interpretation. The dataset we have compiled allows us to address these issues to an extent previously impossible. We have created a panel of legislators' roll call votes matched to income-specific policy preferences at the district level and district-level measures of union membership. Our main empirical strategy to examine the influence of unions on unequal representation is built on two basic pillars: district fixed effects and interactive controls. The fact that we observe several roll calls within a given congressional district allows us to specify a model with district fixed effects, which capture unobservable characteristics of districts (and states) that are constant over roll calls, such as historical legacies or the strength of partisan organization. To provide for a stricter test of the moderating effect of unions, we also allow a rich set of other district characteristics to moderate the link between income groups and legislators' voting behavior. This amounts to estimating models including interactions between observed district characteristics and group preferences. In our most flexible specification, we allow these to be non-linear (we describe our models in more detail below).

The data required to implement these models were constructed in three steps. First, we match information on roll call items for 223,000 CCES respondents to actual roll call votes cast in the House of Representatives in the 109th to the 112th Congress.8 Second, we estimate policy preferences for low- and high-income constituents in each district for 27 roll calls. To deal with the fact that the CCES is not a representative sample of district populations, we use a small area estimation strategy combining the CCES sample with unit record Census data, matching the full distribution of age, education, gender, race, and income using a chained Random Forests algorithm (more below and in Appendix B). Third, we measure district-level union membership based on digitized administrative records from the Department of Labor.

III.A. CCES data and Congressional roll calls

The CCES is an ideal starting point for our analysis, since it is a nationally representative study, includes a considerable number of roll call questions, and provides us with a large enough sample size to decompose income-group preferences by district. It addresses several data concerns that plagued initial research on unequal responsiveness in Congress (Bhatti and Erikson 2011). The roll calls included in the CCES concern key votes as identified by Congressional Quarterly and the Washington Post and cover a broad range of issues (Ansolabehere and Jones 2010). Respondents are presented with the key wording of the bill (as used on the floor and in media reports) and are then asked to cast their own vote: "What about you? If you were faced with this decision, would you vote for, against, or not sure?" In contrast to the widely used agree-disagree survey measures of issue preferences, matched roll call votes provide us with unequivocal evidence of policy congruence between respondent and legislator (Jessee 2009; Ansolabehere and Jones 2010, 585). We match 27 roll call items in the CCES to roll call votes cast in the House of the 109th to 112th Congress. These cover important legislative decisions, such as Dodd-Frank, the Affordable Care Act (and attempts to repeal it), the minimum wage increase, the ratification of the Central America Free Trade Agreement, or the Lilly Ledbetter Fair Pay Act. Table A1 in the Appendix lists all matched CCES items and House bills included in our estimation sample.

In this paper we focus on union membership, but show in a robustness test that our results still obtain when accounting for union concentration (see Table E1).

8 Our analysis focuses on one apportionment period, which generally holds district boundaries constant (we show that the results are robust to cases of mid-period redistricting).

III.B. Measuring constituency preferences by income group

The CCES provides us with a comparatively large sample size per district. However, an important potential issue is that it is not designed to be representative for congressional district populations. Thus, individuals with certain characteristics, such as particular combinations of income, race, and education, may be underrepresented in the CCES sample for a given district. If this is the case, unadjusted policy preferences from the CCES will not reflect the target population, and using them can lead to biased estimates of unequal representation in Congress, as politicians are held to the wrong benchmark. The solution to this issue is to employ some form of small area estimation to rebalance the survey sample to represent the district population. The machine-learning solution we propose is relatively new to the representation literature in political science, but it has some attractive features that merit its application to this topic. It does not require distributional and functional form assumptions, it allows for arbitrary higher-order interactions of covariates, and it can fully leverage fine-grained census data to construct representative samples of congressional districts. However, we stress that our findings do not depend on this particular approach. As shown in Online Appendix B, our approach leads to somewhat more conservative estimates of the impact of unions on the representation of different income groups compared to the MRP approach widely used by political scientists (Lax and Phillips 2009). Qualitatively, both approaches yield the same conclusions.

Our approach, small area estimation using chained random forests, matches CCES survey respondents to corresponding cases from unit record Census data. The design of the Census ensures an accurate representation of the distribution of population characteristics in a given district (Torrieri et al. 2014, Ch. 4). Matching these two data sources is essentially a prediction problem, which we address using a flexible non-parametric machine learning approach based on random forests (Stekhoven and Bühlmann 2011).9 Put simply, the idea is that rich census data exist for every district, whereas survey data on preferences are scarce in some districts and may not be fully representative. Using general machine learning tools, we can attach preferences to the Census by matching it to CCES respondents based on common demographic characteristics. The resulting data set of public preferences is representative of congressional districts.

9 Honaker and Plutzer (2016) use a similar approach (but relying on multivariate normal imputations) and further discuss its empirical performance in estimating small area attitudes and preferences.

Concretely, we use about 3 million individual-level records from a synthetic sample of the Census Bureau's American Community Survey from 2006 to 2011. We stack both datasets, creating a structure where we have common district identifiers and individual covariates, while responses to policy preference questions are missing in the Census portion of the data. As common covariates bridging CCES and Census, we use the following demographic characteristics: gender, race (3 categories), education (5 categories), age (continuous), and family income (continuous).10 The latter is of particular relevance, as we are interested in producing preferences specific to each district and income group.

In the next step, we fill missing roll call preferences in the Census with matching data from CCES respondents. Since this is essentially a prediction problem, we can use powerful tools developed in the machine learning literature to achieve this task. We use an algorithm proposed by Stekhoven and Bühlmann (2011), which uses chained random forests (Breiman 2001) to impute missing cells. Compared to commonly used multivariate normal or regression imputation techniques, this strategy has the advantage that it is fully nonparametric, allows for complex interactions between covariates, and deals with both continuous and categorical data (Tang and Ishwaran 2017). Our completed data set now contains preferences for 27 roll call items of synthetic 'Census individuals', which are a representative sample of each House district.
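The stacking-and-filling logic can be illustrated with a deliberately simplified stand-in. The sketch below is not the chained random forest algorithm used in the paper: it swaps in plain cell-mean imputation, filling each Census record's missing preference with the mean response of survey respondents who share the same demographic cell. The function name `impute_by_cell`, the field names, and the toy records are all hypothetical.

```python
from collections import defaultdict


def impute_by_cell(survey_rows, census_rows, keys):
    """Fill the missing 'support' value of each census row with the mean
    'support' of survey rows in the same demographic cell.

    NOTE: simplified cell-mean imputation, used here only as a stand-in
    for the chained random forest imputation described in the text.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in survey_rows:
        cell = tuple(row[k] for k in keys)
        sums[cell] += row["support"]
        counts[cell] += 1

    # Fallback for census cells that have no survey respondents at all.
    overall = sum(sums.values()) / sum(counts.values())

    completed = []
    for row in census_rows:
        cell = tuple(row[k] for k in keys)
        n = counts.get(cell, 0)
        value = sums[cell] / n if n else overall
        completed.append({**row, "support": value})
    return completed


# Hypothetical toy data: survey rows carry a 0/1 roll call preference,
# census rows carry only the bridging covariates.
survey = [
    {"gender": "f", "educ": "ba", "support": 1},
    {"gender": "f", "educ": "ba", "support": 0},
    {"gender": "m", "educ": "hs", "support": 1},
]
census = [
    {"gender": "f", "educ": "ba"},
    {"gender": "m", "educ": "hs"},
    {"gender": "m", "educ": "ba"},  # no matching survey cell
]
completed = impute_by_cell(survey, census, keys=("gender", "educ"))
```

A cell-mean rule cannot borrow strength across similar cells or handle continuous covariates such as age and income without binning, which is part of what the nonparametric approach described above is meant to address.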

With these data in hand, we assign individuals to income groups and calculate group-specific preferences for each roll call in each district. Following previous work in the representation literature (Bartels 2008, 2016), we delineate low- and high-income respondents using the 33rd and 67th percentiles of the distribution of family incomes. Note that, in line with theories of constituency representation in Congress, we specify these income thresholds separately by congressional district. This accounts for the substantial differences in both average income and income inequality between U.S. districts. It also ensures that, within each district, income groups are of comparable size. Online Appendix Table A2 shows the distribution of income-group cutoffs. On average, our chosen cutoffs are close to those used in the established literature. The mean of our district-specific low-income cutoffs is around $39,000, while Bartels uses $40,000 (Bartels 2016, 240); our mean high-income cutoff is around $81,000, where Bartels employs a threshold of $80,000. However, beyond these averages lies considerable variation. In some districts the 33rd percentile cutoff is as low as $16,500, while the 67th percentile reaches almost $160,000 in others.11
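To make the district-specific thresholds concrete, the following minimal sketch assigns individuals to income groups using the 33rd and 67th percentiles computed within each district. The linear-interpolation percentile rule, the function names, and the toy incomes are assumptions made for illustration; the paper does not state which percentile definition it uses.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty list, using linear
    interpolation between order statistics (an assumed convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def assign_income_groups(records):
    """Return one label per record ('low', 'middle', 'high'), with
    cutoffs computed separately for each district."""
    incomes_by_district = {}
    for rec in records:
        incomes_by_district.setdefault(rec["district"], []).append(rec["income"])

    cutoffs = {
        d: (percentile(v, 33), percentile(v, 67))
        for d, v in incomes_by_district.items()
    }

    groups = []
    for rec in records:
        low_cut, high_cut = cutoffs[rec["district"]]
        if rec["income"] <= low_cut:
            groups.append("low")
        elif rec["income"] >= high_cut:
            groups.append("high")
        else:
            groups.append("middle")
    return groups


# Hypothetical data: district A is richer than district B.
records = (
    [{"district": "A", "income": x} for x in range(20000, 140001, 20000)]
    + [{"district": "B", "income": x} for x in range(10000, 40001, 5000)]
)
groups = assign_income_groups(records)
labels = {(r["district"], r["income"]): g for r, g in zip(records, groups)}
```

In this toy example an income of $40,000 falls in the low group in district A but in the high group in district B, which is the kind of between-district difference that motivates district-specific cutoffs.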

10 See Appendix B for more details on the construction of our Census sample and our matching/imputation procedure.

11 Results are relatively invariant to using alternative income thresholds (see Table C1).


[Figure I: six histogram panels titled Increase Minimum Wage, Housing Crisis Assistance, Fair Pay Act, Affordable Care Act, CAFTA Ratification, and Recovery and Reinvestment.]

Figure I. District-level income gap in public support for 6 selected policies

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district's income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by $(\theta^l, \theta^h)$, as the proportion of individuals voting 'yea'. Since preference estimates are in $[0, 1]$, they can be directly related to legislators' probability of voting 'yea' on a given roll call. Our data show considerable variation in the distance between the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences $(\theta^h - \theta^l)$ in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, the vast majority of low-income constituents are more supportive than their high-income counterparts in each and every district. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly in favor. In all examples, we find considerable across-district variation in the preference gap between low- and high-income constituents.12 We will employ this variation over both roll calls and districts to estimate legislators' differential responsiveness to changes in policy preferences of different income groups, and how it might be moderated by union strength.

12 Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.
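As a rough illustration of how the group-specific preference estimates and the preference gap are formed, the sketch below computes the share of 'yea' responses among low- and high-income individuals for each district and roll call, and then the difference written as $\theta^h - \theta^l$ in the text. The record layout, the function name, and the toy data are hypothetical.

```python
def group_support(records):
    """Return {(district, roll_call): {"low": share, "high": share}},
    where each share is the proportion of individuals voting 'yea'."""
    tallies = {}
    for rec in records:
        if rec["group"] not in ("low", "high"):
            continue  # middle-income individuals are not used here
        key = (rec["district"], rec["roll_call"])
        cell = tallies.setdefault(key, {"low": [0, 0], "high": [0, 0]})
        cell[rec["group"]][0] += rec["yea"]  # number of 'yea' votes
        cell[rec["group"]][1] += 1           # number of individuals
    return {
        key: {g: yes / n for g, (yes, n) in cell.items() if n}
        for key, cell in tallies.items()
    }


def make(group, votes):
    """Build hypothetical records for one district and one roll call."""
    return [
        {"district": "A", "roll_call": "min_wage", "group": group, "yea": v}
        for v in votes
    ]


records = make("low", [1, 1, 1, 0]) + make("high", [1, 0, 0, 0]) + make("middle", [1, 0])
theta = group_support(records)[("A", "min_wage")]

# Signed gap following the text's notation (theta_h - theta_l), plus the
# absolute gap of the kind summarized in footnote 12.
gap = theta["high"] - theta["low"]
abs_gap = abs(gap)
```

Because both shares lie in the unit interval, the signed gap lies between -1 and 1.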

III.C. District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report, or reporting falsified information, is made a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit non-response (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.14 Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
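The aggregation step can be sketched as follows, assuming each geocoded report has already been matched to a district. The sketch sums reported members by district and then applies the log transformation and mean-zero, unit-variance scaling that the statistical specification section describes for union membership. District labels, field names, and membership counts are hypothetical, and a single reporting year is assumed for simplicity.

```python
import math
from statistics import mean, pstdev


def district_membership(reports):
    """Sum reported union members over all reports matched to a district."""
    totals = {}
    for report in reports:
        district = report["district"]
        totals[district] = totals.get(district, 0) + report["members"]
    return totals


def standardized_log(totals):
    """Log membership, scaled to mean zero and unit standard deviation."""
    logs = {d: math.log(m) for d, m in totals.items()}
    center = mean(list(logs.values()))
    spread = pstdev(list(logs.values()))
    return {d: (v - center) / spread for d, v in logs.items()}


# Hypothetical reports (one row per local union filing).
reports = [
    {"district": "D1", "members": 1200},
    {"district": "D1", "members": 800},
    {"district": "D2", "members": 500},
    {"district": "D3", "members": 8000},
    {"district": "D3", "members": 2000},
]
totals = district_membership(reports)
union_z = standardized_log(totals)
```

Whether the scaling uses the population or the sample standard deviation is not stated in the text; the population version (`pstdev`) is an arbitrary choice here.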

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost complete coverage, as during the latter part of the twentieth century unions are increasingly organizing workers across different sectors and occupations (Lichtenstein 2013, 249).15

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.

[Figure II: legend categories 1st quartile, 2nd quartile, 3rd quartile, 4th quartile.]

Figure II. Union membership in House districts, 109th-112th Congress

III.D. Statistical specifications

For each roll call vote $j$ ($j = 1, \dots, J$), we have measured preferences of low- and high-income citizens in a given congressional district $d$ ($d = 1, \dots, D$), denoted by $(\theta^l_{jd}, \theta^h_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts' workers to organize collective action, (iv) as well as non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House members is the following linear probability specification:

$$
\begin{aligned}
y_{ijd} ={}& \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l (U_d \times \theta^l_{jd}) + \eta^h (U_d \times \theta^h_{jd}) \\
&+ \beta^l (X_d \times \theta^l_{jd}) + \beta^h (X_d \times \theta^h_{jd}) + \alpha_d + \epsilon_{ijd}
\end{aligned}
$$

15 While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with the widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, DC, which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 0.86).

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d \theta^h_{jd}$ and $U_d \theta^l_{jd}$. Thus, when $\eta^l$ and $\eta^h$ are zero, the group-specific preference coefficients $\mu^l$ and $\mu^h$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^l$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^h$. Our theoretical expectation is that $\eta^l > 0$ and $\eta^h \le 0$.
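The interpretation of these coefficients can be made explicit by differentiating the linear probability specification with respect to each group's preferences, holding the district effect fixed. The expressions below are a worked restatement of the model under its stated assumptions (errors independent of covariates), not an additional result; when $X_d$ is a vector, $\beta^l X_d$ and $\beta^h X_d$ should be read as inner products.

```latex
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^l_{jd}}
  = \mu^l + \eta^l U_d + \beta^l X_d ,
\qquad
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^h_{jd}}
  = \mu^h + \eta^h U_d + \beta^h X_d .
```

Because all inputs are scaled to mean zero and unit standard deviation, $\mu^l$ and $\mu^h$ give responsiveness in a district with average logged union membership and average covariates, and a one standard deviation increase in $U_d$ shifts responsiveness to the poor by $\eta^l$ and to the affluent by $\eta^h$.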

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects, $\alpha_d$.16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and state level) and group preferences, $X_d \theta^l_{jd}$ and $X_d \theta^h_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors, assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
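One way to see why the interactions remain identified under district fixed effects while union membership on its own does not is the within transformation, which subtracts district means from every variable. The sketch below uses hypothetical values for two districts and three roll calls each; it illustrates the logic only and is not the paper's estimation code.

```python
def demean_within(values, districts):
    """Subtract the district mean from each value (within transformation)."""
    sums, counts = {}, {}
    for value, district in zip(values, districts):
        sums[district] = sums.get(district, 0.0) + value
        counts[district] = counts.get(district, 0) + 1
    return [
        value - sums[district] / counts[district]
        for value, district in zip(values, districts)
    ]


# Hypothetical panel: two districts observed on three roll calls each.
districts = ["A", "A", "A", "B", "B", "B"]
union = [0.8, 0.8, 0.8, -0.5, -0.5, -0.5]   # constant within district
theta_low = [0.2, 0.5, 0.8, 0.1, 0.4, 0.7]  # varies over roll calls
interaction = [u * t for u, t in zip(union, theta_low)]

union_within = demean_within(union, districts)
interaction_within = demean_within(interaction, districts)

# union_within is numerically zero everywhere: the main effect of union
# membership is absorbed by the district fixed effects.
# interaction_within still varies, so the interaction coefficient can be
# estimated from within-district variation over roll calls.
```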

IV. Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above with district fixed effects but without accounting for local union organization or any other moderators (setting $\beta^l$, $\beta^h$ and $\eta^l$, $\eta^h$ to zero), we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase in the probability of legislators casting a corresponding vote of 13.6 (±1.2) percentage points. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.

IV.A. Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership, with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preference-confounder interactions (setting $\beta^l$ and $\beta^h$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.
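The arithmetic behind these moderation effects can be traced with a small calculation. The sketch below takes the interaction estimates for specification (1) from Table I and computes responsiveness to each group at different levels of standardized union membership. The main-effect values `MU_LOW` and `MU_HIGH` are stand-ins borrowed from the model without moderators reported at the start of the Results section, since the main effects of specification (1) are not reported here; they are an assumption and should not be read as estimates from that specification.

```python
# Interaction coefficients from Table I, specification (1).
ETA_LOW = 0.106
ETA_HIGH = -0.063

# ASSUMPTION: stand-in main effects taken from the model without
# moderators; the specification (1) main effects are not reported.
MU_LOW = 0.016
MU_HIGH = 0.136


def marginal_effect(mu, eta, union_z):
    """Responsiveness to a group's preferences at standardized union
    membership union_z, with no further moderators in the model."""
    return mu + eta * union_z


def responsiveness_gap(union_z):
    """Responsiveness to the affluent minus responsiveness to the poor."""
    return (marginal_effect(MU_HIGH, ETA_HIGH, union_z)
            - marginal_effect(MU_LOW, ETA_LOW, union_z))


for z in (-1.0, 0.0, 1.0):
    low = marginal_effect(MU_LOW, ETA_LOW, z)
    high = marginal_effect(MU_HIGH, ETA_HIGH, z)
    print(f"union z={z:+.1f}: low={low:.3f}, high={high:.3f}, "
          f"gap={responsiveness_gap(z):.3f}")
```

Whatever the main effects are, each standard deviation increase in union membership narrows the gap by the difference between the two interaction coefficients, 0.106 + 0.063 = 0.169 on the probability scale.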

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws, regulating the formation and management of unions in the private or public sector, have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018; Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^l$ and $\theta^h$. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

17 Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors at the district level. See also specification (1) in Table I below.

[Figure III: two panels, Low income constituents and High income constituents. Horizontal axis: Union membership [std], with the p10, p25, p50, p75, and p90 percentiles marked; vertical axis: Marginal effect.]

Figure III. District-level union membership as moderator of unequal representation

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes, conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

Table I. Union density and representation. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)      (2)      (3)      (4)      (5)      (6)
Low income preferences       0.106    0.082    0.098    0.084    0.068    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences     −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)
District fixed effects         X        X        X        X        X        X
Group preferences
  × union policy                        X                                   X
  × state constants                              X
  × organizational capacity                               X                 X
  × district covariates                                            X        X

Note: N = 15,780; N_d = 534; 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^l$ and $\eta^h$. Specifications (2) to (5) include coefficients for interactions ($\beta^l$, $\beta^h$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements; (3) interacts preferences with state fixed effects; (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections; (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

18 Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics, see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
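
The population-based weighting step can be illustrated with a minimal sketch. It assumes that each county-district overlap comes with a population count and that the measure is an index to be averaged (a count such as bowling alleys would instead be allocated proportionally). The function name, the data layout, and all numbers are hypothetical and are not taken from our data or code.

```python
from collections import defaultdict


def aggregate_to_districts(overlaps, county_values):
    """Population-weighted average of county values for each district.

    overlaps: iterable of (county, district, population) tuples, where
        population counts the residents of the county who live inside the
        district (hypothetical layout).
    county_values: dict mapping county -> value (e.g. a social capital index).
    """
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for county, district, pop in overlaps:
        weighted_sum[district] += pop * county_values[county]
        total_pop[district] += pop
    return {d: weighted_sum[d] / total_pop[d] for d in weighted_sum}


# Toy example with made-up numbers: county_b is split across two districts.
overlaps = [
    ("county_a", "district_1", 300_000),
    ("county_b", "district_1", 100_000),
    ("county_b", "district_2", 400_000),
]
county_values = {"county_a": 1.0, "county_b": -1.0}
district_values = aggregate_to_districts(overlaps, county_values)
```

In this toy example, district_1 draws three quarters of its population from county_a, so its value lies closer to that county's value than to county_b's.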

Table II. Robustness tests: marginal effects of union membership on differential legislative responsiveness under alternative specifications

| Specification | Low income | High income |
|---|---|---|
| (1a) Social capital: bowling alleys | 0.067 (0.014) | -0.051 (0.013) |
| (1b) Social capital: index | 0.065 (0.014) | -0.048 (0.013) |
| (2) Redistricting | 0.067 (0.014) | -0.051 (0.013) |
| (3) MRP estimated preferences | 0.115 (0.022) | -0.091 (0.018) |

Note: Based on specification (5) in Table I. Entries are parameter estimates for ηl and ηh. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts. N=15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred. N=14,150. Specification (3) uses preferences estimated using MRP. See Appendix B for more details. N=15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews, see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here, we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988), using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
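
The logic of double selection can be sketched on simulated data. The sketch below is an illustration under simplifying assumptions, not our estimation code: it uses a plain LASSO with a fixed, hand-picked penalty (rather than the root-LASSO with sample splitting described in the note to Table III), a single scalar treatment `d` (rather than interactions of union membership with preferences), and hypothetical function names throughout.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """LASSO by cyclic coordinate descent.

    Minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)  # current residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # remove the contribution of column j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]  # add the updated contribution back
    return b


def post_double_selection(y, d, W, lam):
    """OLS effect of d on y, controlling for the union of LASSO-selected controls."""
    Wc = (W - W.mean(axis=0)) / W.std(axis=0)
    sel_y = lasso_cd(Wc, y - y.mean(), lam) != 0  # controls predicting the outcome
    sel_d = lasso_cd(Wc, d - d.mean(), lam) != 0  # controls predicting the treatment
    keep = sel_y | sel_d
    Z = np.column_stack([np.ones(len(y)), d, W[:, keep]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)  # post-LASSO least squares
    return coef[1], keep


# Simulated example: control 0 confounds the relationship between d and y.
rng = np.random.default_rng(0)
n, p = 500, 30
W = rng.normal(size=(n, p))
d = W[:, 0] + rng.normal(size=n)
y = 0.5 * d + 2.0 * W[:, 0] + rng.normal(size=n)  # true effect of d is 0.5
effect, keep = post_double_selection(y, d, W, lam=0.2)
```

Because the confounder predicts both the treatment and the outcome, it is retained by the selection step, and the post-selection estimate of the effect of `d` is close to the value used in the simulation.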

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).


Table III. Post-double-selection estimator: marginal effect of unionization on legislative responsiveness to low and high income groups

| | (1) | (2) | (3) |
|---|---|---|---|
| Low income preferences | 0.063 (0.014) | 0.066 (0.017) | 0.062 (0.016) |
| High income preferences | -0.054 (0.013) | -0.036 (0.015) | -0.040 (0.016) |
| Semi-parametric terms | 144 | 312 | 624 |
| Post-LASSO terms | 18 | 45 | 112 |

Note: The Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on a 50% sample, parameter estimates performed on the remaining 50% (N=7,884). Table entries are estimates for ηL and ηH with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications, with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of the outcome with respect to district preferences and examine how they vary with levels of union membership (Hainmueller and Hazlett 2014, 156).
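
To make the idea of pointwise partial derivatives concrete, the following sketch fits a Gaussian-kernel regularized least squares model to simulated data and differentiates the fitted function analytically. It is a simplified illustration, not the implementation used for our estimates: the regularization parameter and kernel bandwidth are fixed by hand (Appendix G describes how they are chosen in our analysis), the two simulated covariates merely stand in for preferences and union membership, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma2):
    """Kernel matrix with entries exp(-||a_j - b_i||^2 / sigma2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)


def krls_fit(X, y, lam, sigma2):
    """Solve (K + lam * I) c = y for the kernel weights c."""
    K = gaussian_kernel(X, X, sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return c, K


def pointwise_derivatives(X, c, K, sigma2, dim):
    """Derivative of the fitted function w.r.t. column `dim` at each observation."""
    diff = X[:, None, dim] - X[None, :, dim]  # x_j[dim] - x_i[dim]
    return (-2.0 / sigma2) * (K * diff) @ c


# Simulated data with a pure interaction that the model is never told about.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))  # column 0: "preferences", column 1: "union membership"
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=n)

c, K = krls_fit(X, y, lam=0.5, sigma2=2.0)
d_pref = pointwise_derivatives(X, c, K, sigma2=2.0, dim=0)

# If membership moderates the effect of preferences, the pointwise derivative
# with respect to preferences should co-vary with membership.
corr = np.corrcoef(d_pref, X[:, 1])[0, 1]
```

In the simulation, the true marginal effect of the first covariate equals the value of the second, so a clearly positive correlation indicates that the fitted surface recovered the interaction without it being specified.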

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: union membership [std], with percentiles p10 to p90 marked; y-axis: partial effect.]

Figure IV. Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result is that, using a non-parametric model that does not include an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, on the view that public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero (p = 0.172). This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills.


The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV. Effect heterogeneity: marginal effects of unionization on legislative responsiveness to low and high income groups

| | Low income | High income |
|---|---|---|
| (A) Private vs. public unions | | |
| Public unions | 0.074 (0.016) | -0.058 (0.015) |
| Non-public unions | 0.054 (0.016) | -0.027 (0.016) |
| (B) Bill ideology | | |
| Conservative bill | 0.086 (0.017) | -0.086 (0.018) |
| Liberal bill | 0.052 (0.014) | -0.028 (0.013) |
| (C) AFL-CIO endorsement | | |
| No position | 0.054 (0.014) | -0.054 (0.013) |
| Endorsement | 0.077 (0.015) | -0.040 (0.014) |

Note: Estimates for ηL and ηH with cluster-robust standard errors in parentheses. N=15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent

23 Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for ηh is smaller for endorsed bills but still significantly different from zero.


more so when issues are salient to unions bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
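
The conversion from a log-scale regression to dollar amounts can be sketched as follows. The example applies a smearing retransformation in the spirit of Duan (1983): the exponentiated linear prediction is multiplied by the average of the exponentiated residuals. The data are simulated, the coefficients are made up, and the function name is hypothetical; the sketch omits the state fixed effects and controls of the actual specification.

```python
import numpy as np


def smearing_predict(x, log_y, x_new):
    """Predict E[y | x_new] from an OLS fit of log(y) on x with a smearing factor."""
    Z = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(Z, log_y, rcond=None)
    resid = log_y - Z @ beta
    smear = np.mean(np.exp(resid))  # smearing factor, corrects for retransformation bias
    Z_new = np.column_stack([np.ones(len(x_new)), x_new])
    return np.exp(Z_new @ beta) * smear, smear


# Simulated districts with a standardized union membership measure.
rng = np.random.default_rng(2)
n = 428
union = rng.normal(size=n)
log_contrib = 11.0 + 0.3 * union + 0.5 * rng.normal(size=n)

# Predicted dollars at mean membership and one standard deviation above it.
dollars, smear = smearing_predict(union, log_contrib, np.array([0.0, 1.0]))
gain = dollars[1] - dollars[0]
```

Simply exponentiating the fitted log values would understate expected contributions, because the mean of an exponentiated error term exceeds one; the smearing factor estimates that mean from the residuals without assuming normal errors.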

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V. Labor contributions and selection of Democratic legislators

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| A. Contributions channel | DV: Contrib. | DV: Contrib. | DV: roll call | DV: roll call |
| Union membership | 0.056 (0.012) | 0.046 (0.014) | | |
| Contributions × low income prefs. | | | 0.946 (0.036) | 0.865 (0.034) |
| Contributions × high income prefs. | | | -0.735 (0.029) | -0.714 (0.031) |
| B. Selection channel | DV: Democrat | DV: Democrat | DV: roll call | DV: roll call |
| Union membership | 0.161 (0.024) | 0.106 (0.023) | | |
| Democrat × low income prefs. | | | 0.576 (0.012) | 0.542 (0.015) |
| Democrat × high income prefs. | | | -0.411 (0.013) | -0.423 (0.015) |
| District controls | | X | | X |

Note: Panel (A), column (1) shows a district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15,780. Panel (B), columns (1) and (2) show district-level LPMs with state fixed effects of presence of a Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and Democratic representative. N=15,776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are on average more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, we consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds, separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
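
As an illustration of the classification step, the sketch below computes two tercile thresholds from a list of incomes and assigns a respondent to an income group. It is a simplified, hypothetical example: it ignores survey weights, the exclusion of group quarters, and the fact that thresholds are computed separately for each district and year, and the handling of incomes exactly at a threshold is an arbitrary choice made here.

```python
import numpy as np


def tercile_thresholds(incomes):
    """Return the income values separating the lowest and highest terciles."""
    lower, upper = np.quantile(incomes, [1 / 3, 2 / 3])
    return float(lower), float(upper)


def classify(income, lower, upper):
    """Assign an income to the low, middle, or high group."""
    if income <= lower:
        return "low"
    if income > upper:
        return "high"
    return "middle"


# Made-up household incomes (in thousands of dollars) for one district.
district_incomes = [10, 20, 30, 40, 50, 60, 70]
lower, upper = tercile_thresholds(district_incomes)
groups = [classify(x, lower, upper) for x in (15, 40, 65)]
```

Computing the thresholds within each district means that the same dollar income can fall into different groups in different districts.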

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1: Matched CCES-House roll calls included in our analysis

| Match | Bill | Date | Name | House Vote (Yea-Nay) | Bill Ideology† |
|---|---|---|---|---|---|
| (1) | HR 810 | 07/19/2006 | Stem Cell Research Enhancement Act (Presidential Veto override) | 235-193 | L |
| (1) | HR 3 | 01/11/2007 | Stem Cell Research Enhancement Act of 2007 (House) | 253-174 | L |
| (1) | S 5 | 06/07/2007 | Stem Cell Research Enhancement Act of 2007 | 247-176 | L |
| (2) | HR 2956 | 07/12/2007 | Responsible Redeployment from Iraq Act | 223-201 | L |
| (3) | HR 2 | 01/10/2007 | Fair Minimum Wage Act | 315-116 | L |
| (4) | HR 4297 | 12/08/2005 | Tax Relief Extension Reconciliation Act (Passage) | 234-197 | C |
| (4) | HR 4297 | 05/10/2006 | Tax Relief Extension Reconciliation Act (Agreeing to Conference Report) | 244-185 | C |
| (5) | HR 3045 | 07/28/2005 | Dominican Republic-Central America-United States Free Trade Agreement Implementation Act | 217-215 | C |
| (6) | S 1927 | 08/04/2007 | Protect America Act | 227-183 | C |
| (6) | HR 6304 | 06/20/2008 | FISA Amendments Act of 2008 | 293-129 | C |
| (7) | HR 3162 | 08/01/2007 | Children's Health and Medicare Protection Act | 225-204 | L |
| (7) | HR 976 | 10/18/2007 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 273-156 | L |
| (7) | HR 3963 | 01/23/2008 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 260-152 | L |
| (7) | HR 2 | 02/04/2009 | Children's Health Insurance Program Reauthorization Act | 290-135 | L |
| (8) | HR 3221 | 07/23/2008 | Foreclosure Prevention Act of 2008 | 272-152 | L |
| (9) | HR 3688 | 11/08/2007 | United States-Peru Trade Promotion Agreement | 285-132 | C |
| (10) | HR 1424 | 10/03/2008 | Emergency Economic Stabilization Act of 2008 | 263-171 | L |
| (11) | HR 3080 | 10/12/2011 | To implement the United States-Korea Trade Agreement | 278-151 | C |
| (12) | HR 3078 | 10/12/2011 | To implement the United States-Colombia Trade Promotion Agreement | 262-167 | C |
| (13) | HR 2346 | 06/16/2009 | Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report) | 226-202 | L |
| (14) | HR 2831 | 07/31/2007 | Lilly Ledbetter Fair Pay Act | 225-199 | L |
| (14) | HR 11 | 01/09/2009 | Lilly Ledbetter Fair Pay Act of 2009 (House) | 247-171 | L |
| (14) | S 181 | 01/27/2009 | Lilly Ledbetter Fair Pay Act of 2009 | 250-177 | L |
| (15) | HR 1913 | 04/29/2009 | Local Law Enforcement Hate Crimes Prevention Act | 249-175 | L |
| (16) | HR 1 | 02/13/2009 | American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report) | 246-183 | L |
| (17) | HR 2454 | 06/26/2009 | American Clean Energy and Security Act | 219-212 | L |
| (18) | HR 3590 | 03/21/2010 | Patient Protection and Affordable Care Act | 220-212 | L |
| (19) | HR 3962 | 11/07/2009 | Affordable Health Care for America Act | 221-215 | L |
| (20) | HR 4173 | 06/30/2010 | Wall Street Reform and Consumer Protection Act of 2009 | 237-192 | L |
| (21) | HR 2965 | 12/15/2010 | Don't Ask, Don't Tell Repeal Act of 2010 | 250-175 | L |
| (22) | S 365 | 08/01/2011 | Budget Control Act of 2011 | 269-161 | C |
| (23) | H CR 34 | 04/15/2011 | House Budget Plan of 2011 | 235-193 | C |
| (24) | H CR 112 | 03/28/2012 | Simpson-Bowles/Cooper Amendment to House Budget Plan | 38-382 | C |
| (25) | HR 8 | 08/01/2012 | American Taxpayer Relief Act of 2012 (Levin Amendment) | 170-257 | L |
| (26) | HR 2 | 01/19/2011 | Repealing the Job-Killing Health Care Law Act | 245-189 | C |
| (26) | HR 6079 | 07/11/2012 | Repeal the Patient Protection and Affordable Care Act and [...] | 244-185 | C |
| (27) | HR 1938 | 07/26/2011 | North American-Made Energy Security Act | 279-147 | C |

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative is based on predominant support of the bill by Democratic or Republican representatives, respectively.


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

| Congress | 33rd percentile: Mean | 33rd percentile: Min | 33rd percentile: Max | 67th percentile: Mean | 67th percentile: Min | 67th percentile: Max |
|---|---|---|---|---|---|---|
| 109 | 38123 | 16800 | 73675 | 77964 | 39612 | 146870 |
| 110 | 40127 | 18000 | 77000 | 83047 | 43600 | 155113 |
| 111 | 39021 | 17500 | 78262 | 82440 | 46000 | 160050 |
| 112 | 37381 | 16500 | 81000 | 79868 | 38500 | 158654 |

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

| Variable | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Roll-call vote: yea | 0.568 | 0.495 | 0.000 | 1.000 | 15780 |
| Constituent preferences: Low income | 0.593 | 0.220 | 0.047 | 0.979 | 15934 |
| Constituent preferences: High income | 0.555 | 0.198 | 0.037 | 0.967 | 15934 |
| Constituent preferences: Low-High Gap | 0.172 | 0.121 | 0.000 | 0.588 | 15934 |
| Union membership [log] | 9.705 | 1.046 | 6.094 | 13.619 | 15934 |
| Population | 7.022 | 0.723 | 4.697 | 9.980 | 15934 |
| Share African American | 0.124 | 0.146 | 0.004 | 0.680 | 15934 |
| Share Hispanic | 0.156 | 0.174 | 0.005 | 0.812 | 15934 |
| Share BA or higher | 0.275 | 0.097 | 0.073 | 0.645 | 15934 |
| Median income [$10,000] | 5.177 | 1.356 | 2.282 | 10.439 | 15934 |
| Share female | 0.508 | 0.010 | 0.462 | 0.543 | 15934 |
| Manufacturing share | 0.110 | 0.047 | 0.025 | 0.281 | 15934 |
| Urbanization | 0.790 | 0.199 | 0.213 | 1.000 | 15934 |
| Certification elections [log] | 3.347 | 0.861 | 0.000 | 5.100 | 15934 |
| Congregations [per 1000 persons] | 0.765 | 1.147 | 0.062 | 6.453 | 15934 |

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B.1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.[25]

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \ldots, Z_K)$ for $p = 1, \ldots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \ldots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
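The train, predict, and aggregate logic described above can be sketched in a few lines. To keep the example self-contained, a simple cell-mean lookup stands in for the random forest; this is a deliberate simplification and not the estimator used in the paper. All function and variable names are hypothetical.

```python
from collections import defaultdict


def fit_cell_means(survey_rows):
    """Fit a stand-in predictor on survey data.

    survey_rows: iterable of (covariates_tuple, y) with y in {0, 1}.
    Returns a function mapping a covariate tuple to predicted support.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    total, n = 0.0, 0
    for z, y in survey_rows:
        sums[z] += y
        counts[z] += 1
        total += y
        n += 1
    overall = total / n
    cell_means = {z: sums[z] / counts[z] for z in sums}
    # Covariate combinations never seen in the survey fall back to the overall mean.
    return lambda z: cell_means.get(z, overall)


def district_preferences(census_rows, predict):
    """Average predicted preferences by (district, income group).

    census_rows: iterable of (district, income_group, covariates_tuple).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for district, group, z in census_rows:
        sums[(district, group)] += predict(z)
        counts[(district, group)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Replacing `fit_cell_means` with a flexible learner such as a random forest changes only the prediction step; the final averaging over the district-representative sample stays the same.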

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

[25] See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


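[Figure B1: Illustration of Small Area Estimation of District Preferences. The diagram shows a stacked data matrix. Rows $1, \ldots, m$ are CCES units with covariates $Z_{i1}, \ldots, Z_{iK}$ and observed preferences $Y_{i1}, \ldots, Y_{iP}$; rows $m+1, \ldots, m+n$ are Census units with the same covariates but missing (NA) preferences. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and then used to predict $Y^*_p = f(Z)$ for the Census rows.]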

We use a sample of $m$ individuals from the CCES that is not necessarily representative on the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
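One way to read the sampling step is sketched below: for a given district, a PUMA is drawn with probability equal to its share of the district population, and an individual is then drawn from that PUMA's records. The data structures, the function name, and the choice to sample with replacement are assumptions made for illustration; the text does not specify these details.

```python
import random


def synthetic_district_sample(puma_records, district_shares, size, seed=0):
    """Draw a synthetic sample of individuals for one district.

    puma_records: {puma_id: [person_record, ...]} (hypothetical donor pool)
    district_shares: {puma_id: share of the district's population in that PUMA}
    """
    rng = random.Random(seed)
    pumas = list(district_shares)
    weights = [district_shares[p] for p in pumas]
    sample = []
    # Pick a PUMA proportional to its population share, then a person within it.
    for puma in rng.choices(pumas, weights=weights, k=size):
        sample.append(rng.choice(puma_records[puma]))
    return sample
```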

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.[26] We transform this discretized measure of income into a continuous one using a nonparametric midpoint

[26] The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
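The midpoint Pareto idea can be sketched as follows. The sketch uses a commonly cited form of the estimator, in which the Pareto shape parameter is computed from the counts and lower bounds of the top two bins; whether this matches the exact implementation used for the paper is an assumption. Bin bounds are treated as half-open intervals, and all names are hypothetical.

```python
import math


def pareto_top_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail.

    alpha is estimated from the two highest bins; the mean of a Pareto
    distribution with scale lower_top is lower_top * alpha / (alpha - 1).
    """
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    return lower_top * alpha / (alpha - 1)


def bin_scores(bins, counts):
    """Assign a continuous score to every income bin.

    bins: list of (lower, upper) pairs; the last bin has upper=None.
    counts: number of respondents in each bin.
    """
    scores = [(lo + hi) / 2 for lo, hi in bins[:-1]]
    top = pareto_top_mean(bins[-2][0], bins[-1][0], counts[-2], counts[-1])
    return scores + [top]
```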

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \ldots, N\}$, we can separate out four parts:[27]

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$;

• Missing values of $D_v$: $y^{(v)}_{mis}$;

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \ldots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$;

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$.

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

    Start with initial guesses of missing values in $D$
    $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
    while not $c$ do
        $D^{imp}_{old} \leftarrow$ previously imputed $D$
        for $v$ in $w$ do
            Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
            Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
            $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
        Update stopping criterion $c$
    Return completed $D^{imp}$
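The control flow of Algorithm 1 can be sketched in plain Python. To stay self-contained, a k-nearest-neighbour average replaces the random forest, and the loop stops once the imputed values no longer change (or after `max_iter` rounds); both are simplifying assumptions and differ from the algorithm of Stekhoven and Bühlmann (2011). The sketch handles numeric columns only, assumes every column has at least one observed value, and uses hypothetical names throughout.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k nearest training rows (squared Euclidean distance)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def chained_impute(data, max_iter=10, tol=1e-8, k=3):
    """Chained imputation; data is a list of rows with None marking missing entries."""
    n_cols = len(data[0])
    missing = [[row[j] is None for j in range(n_cols)] for row in data]
    imputed = [list(row) for row in data]
    # Initial guesses: column means of the observed values.
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        col_mean = sum(observed) / len(observed)
        for i, row in enumerate(imputed):
            if missing[i][j]:
                row[j] = col_mean
    # Visit incomplete columns in order of increasing number of missing values.
    n_missing = [sum(m[j] for m in missing) for j in range(n_cols)]
    order = sorted((j for j in range(n_cols) if n_missing[j] > 0),
                   key=lambda j: n_missing[j])
    for _ in range(max_iter):
        previous = [list(row) for row in imputed]
        for v in order:
            others = [j for j in range(n_cols) if j != v]
            obs_rows = [i for i in range(len(imputed)) if not missing[i][v]]
            train_x = [[imputed[i][j] for j in others] for i in obs_rows]
            train_y = [imputed[i][v] for i in obs_rows]
            for i in range(len(imputed)):
                if missing[i][v]:
                    query = [imputed[i][j] for j in others]
                    imputed[i][v] = knn_predict(train_x, train_y, query, k)
        # Stopping criterion: total squared change in the filled-in matrix.
        change = sum((a - b) ** 2
                     for new, old in zip(imputed, previous)
                     for a, b in zip(new, old))
        if change < tol:
            break
    return imputed
```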

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

[27] Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.[28]
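For reference, a normalized root mean squared error can be computed as in the sketch below. It assumes normalization by the variance of the true values, which is one common convention; the exact normalization behind the reported figure is not stated in the text, and the function name is hypothetical.

```python
import math
from statistics import mean, pvariance


def nrmse(true_values, predicted):
    """Root mean squared error, normalized by the variance of the true values."""
    mse = mean([(t - p) ** 2 for t, p in zip(true_values, predicted)])
    return math.sqrt(mse / pvariance(true_values))
```

Under this convention, a value of 0 indicates perfect prediction, while always predicting the mean of the true values yields 1.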

B.2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).[29] No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.[30] Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

[28] See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

[29] This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

[30] We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


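model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\text{race}}_{j[i]} + \alpha^{\text{gender}}_{k[i]} + \alpha^{\text{age}}_{l[i]} + \alpha^{\text{educ}}_{m[i]} + \alpha^{\text{income}}_{n[i]} + \alpha^{\text{district}}_{d[i]}\right) \tag{B1}$$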

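We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{\text{race}}_{j} \sim N(0, \sigma^2_{\text{race}}), \quad j = 1, \ldots, 3 \tag{B2}$$

$$\alpha^{\text{gender}}_{k} \sim N(0, \sigma^2_{\text{gender}}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{\text{age}}_{l} \sim N(0, \sigma^2_{\text{age}}), \quad l = 1, \ldots, 6 \tag{B4}$$

$$\alpha^{\text{educ}}_{m} \sim N(0, \sigma^2_{\text{educ}}), \quad m = 1, \ldots, 5 \tag{B5}$$

$$\alpha^{\text{income}}_{n} \sim N(0, \sigma^2_{\text{income}}), \quad n = 1, \ldots, 13 \tag{B6}$$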

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\text{state}}_{s[d]}$ and freely estimated variance:

$$\alpha^{\text{district}}_{d} \sim N(\alpha^{\text{state}}_{s[d]}, \sigma^2_{\text{state}}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
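The prediction and poststratification steps can be illustrated with a short sketch. It takes already estimated coefficients as given (the penalized maximum likelihood fit itself is not shown), and the data layout and function names are hypothetical.

```python
import math


def inv_logit(x):
    """Inverse logit link, mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_probability(beta0, effects, cell):
    """Predicted support for one demographic cell.

    effects: {attribute: {category: estimated alpha}}
    cell: {attribute: category}, e.g. {"race": "white", "age": "18-29"}
    """
    return inv_logit(beta0 + sum(effects[a][c] for a, c in cell.items()))


def poststratify(cells):
    """Population-weighted average of cell predictions.

    cells: list of (predicted_probability, population_count) pairs for one
    district and income group.
    """
    total = sum(n for _, n in cells)
    return sum(p * n for p, n in cells) / total
```

For instance, two cells with predicted support 0.2 and 0.8 and Census counts of 100 and 300 combine to (0.2 × 100 + 0.8 × 300) / 400 = 0.65.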

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases the responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: Small Area Estimation via Chained Random Forests | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: Multilevel Regression & Poststratification | | | | | | |
| Low income preferences | 0.182 | 0.158 | 0.181 | 0.162 | 0.115 | 0.115 |
| | (0.021) | (0.024) | (0.026) | (0.020) | (0.022) | (0.022) |
| High income preferences | −0.136 | −0.119 | −0.139 | −0.122 | −0.091 | −0.091 |
| | (0.017) | (0.019) | (0.021) | (0.017) | (0.018) | (0.018) |
| C: Raw CCES means | | | | | | |
| Low income preferences | 0.080 | 0.061 | 0.063 | 0.072 | 0.043 | 0.045 |
| | (0.010) | (0.011) | (0.012) | (0.010) | (0.011) | (0.011) |
| High income preferences | −0.027 | −0.013 | −0.010 | −0.027 | −0.018 | −0.024 |
| | (0.008) | (0.008) | (0.008) | (0.008) | (0.008) | (0.009) |

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: District-specific income thresholds | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: State-specific income thresholds | | | | | | |
| Low income preferences | 0.105 | 0.082 | 0.097 | 0.083 | 0.067 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.062 | −0.036 | −0.052 | −0.050 | −0.049 | −0.039 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.013) |
| C: Shifted income thresholds, p20 - p80 | | | | | | |
| Low income preferences | 0.098 | 0.077 | 0.09 | 0.078 | 0.063 | 0.057 |
| | (0.012) | (0.013) | (0.014) | (0.012) | (0.013) | (0.013) |
| High income preferences | −0.054 | −0.031 | −0.046 | −0.044 | −0.044 | −0.034 |
| | (0.011) | (0.012) | (0.012) | (0.011) | (0.012) | (0.012) |

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there is also a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency


[Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Map legend bins: 0-13, 13-26, 26-39, 39-62, 62-164.]

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
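A population-weighted interpolation from counties to districts can be sketched as follows. The sketch assumes each weight is the share of a county's population living in a given district, so that a county's count is split across the districts it intersects; the exact weighting scheme used in the paper may differ, and the names are hypothetical.

```python
from collections import defaultdict


def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts using population weights.

    county_counts: {county: count}
    weights: {(county, district): share of the county's population in the district}
    """
    district_counts = defaultdict(float)
    for (county, district), share in weights.items():
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)
```

When a county lies entirely within one district (share 1.0), its full count is added to that district, which corresponds to the simple summation noted for at-large districts.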


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

| Specification | Low income preferences | High income preferences | N |
|---|---|---|---|
| (1) Injective preference roll call map | 0.063 (0.013) | −0.041 (0.013) | 11104 |
| (2) Extreme preferences excl. | 0.074 (0.016) | −0.048 (0.015) | 13308 |
| (3) New York excluded | 0.070 (0.015) | −0.048 (0.014) | 14730 |
| (4) Local Union Concentration | 0.065 (0.014) | −0.047 (0.014) | 15780 |
| (5) Trimmed LPM estimator | 0.074 (0.015) | −0.055 (0.014) | 15426 |
| (6) Errors-in-variables | 0.062 (0.004) | −0.054 (0.004) | 15345 |

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries for this model are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
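The trimming rule can be sketched for a single roll call as follows. Percentile cut points depend on the interpolation convention; this sketch relies on the default method of Python's `statistics.quantiles`, which is an assumption rather than the convention used in the paper. The function name and data layout are hypothetical.

```python
from statistics import quantiles


def trim_extreme_districts(preferences, lower_pct=5, upper_pct=95):
    """Keep districts whose preference estimate lies within the percentile band.

    preferences: {district: estimated preference} for one roll call.
    """
    cuts = quantiles(preferences.values(), n=100)  # 99 percentile cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return {d: p for d, p in preferences.items() if lo <= p <= hi}
```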

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
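The idea behind a trimmed linear probability model can be sketched with a single regressor: estimate by least squares, drop observations whose fitted values fall outside the unit interval, and re-estimate until none remain outside. This is a simplified reading of the sequential procedure discussed by Horrace and Oaxaca (2006), not the specification estimated in the paper, and the function names are hypothetical.

```python
def ols(xs, ys):
    """Least-squares intercept and slope for a single regressor."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("regressor has no variation")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def trimmed_lpm(xs, ys):
    """Re-estimate the LPM until all fitted values lie in [0, 1].

    Returns the intercept, the slope, and the indices of retained observations.
    """
    keep = list(range(len(xs)))
    while True:
        a, b = ols([xs[i] for i in keep], [ys[i] for i in keep])
        inside = [i for i in keep if 0.0 <= a + b * xs[i] <= 1.0]
        if len(inside) == len(keep):
            return a, b, keep
        keep = inside  # drop out-of-range observations and refit
```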

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
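The expected downward bias can be made explicit in the textbook case of a single regressor with classical measurement error. The notation below is introduced for illustration only and does not appear in the paper: $\theta$ is the true preference, $\tilde{\theta} = \theta + u$ its noisy estimate, and $u$ is assumed to be uncorrelated with both $\theta$ and the regression error.

```latex
% Attenuation under classical measurement error (single-regressor case).
% y = \beta \theta + \varepsilon, observed regressor \tilde{\theta} = \theta + u
\operatorname{plim} \hat{\beta}
  = \beta \, \frac{\sigma^2_{\theta}}{\sigma^2_{\theta} + \sigma^2_{u}}
```

Because the ratio is smaller than one whenever $\sigma^2_u > 0$, the least-squares coefficient is pulled towards zero, and the pull is weak when the measurement error variance is small relative to the variance of the true preferences. With several regressors and interaction terms, as in the models estimated here, the direction of the bias is less clear-cut, which is one reason to estimate the correction directly.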

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

[31] We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore i subscripts):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation,

$$
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0, \tag{F2}
$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
$$

$$
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}
$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.
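To make the substitution step concrete, the following sketch uses a simplified version of the system with a single treatment coefficient $\alpha$ in place of the two interaction terms in (F3). The symbols $\alpha$, $\bar{\beta}_0$, $\bar{r}_d$ and $\bar{\zeta}_{jd}$ are notation assumed here for illustration and do not appear in the original. Writing the outcome equation as $y_{jd} = \alpha U_d + w_d'\beta_{g0} + r_{gd} + \zeta_{jd}$ and inserting (F4) for $U_d$ gives

$$
y_{jd} = w_d'\underbrace{(\alpha\beta_{m0} + \beta_{g0})}_{\bar{\beta}_0} + \underbrace{(\alpha r_{md} + r_{gd})}_{\bar{r}_d} + \underbrace{(\alpha v_d + \zeta_{jd})}_{\bar{\zeta}_{jd}}.
$$

In this simplified case both the reduced form and (F4) have only the control terms $w_d$ on the right-hand side, and the composite approximation error $\bar{r}_d$ collects the two original approximation errors.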

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_0$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_0$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
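The selection-then-OLS logic described in this appendix can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: it uses a plain LASSO with a fixed penalty `lam` (the paper uses the root-LASSO), a single scalar treatment without preference interactions or fixed effects, and simulated data. All function and variable names are hypothetical.

```python
import random


def soft_threshold(value, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    b = [0.0] * p
    resid = list(y)  # y - Xb for the current b (b starts at zero)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            z = sum(x * x for x in col) / n
            rho = sum(x * (r + x * b[j]) for x, r in zip(col, resid)) / n
            new_bj = soft_threshold(rho, lam) / z
            step = new_bj - b[j]
            resid = [r - x * step for x, r in zip(col, resid)]
            b[j] = new_bj
    return b


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def ols(X, y):
    """OLS coefficients via the normal equations (no intercept)."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


def post_double_selection(y, d, W, lam):
    """Select controls from both predictive equations, then run OLS of y on d and the union."""
    b_y = lasso_cd(W, y, lam)  # controls that predict the outcome
    b_d = lasso_cd(W, d, lam)  # controls that predict the treatment
    selected = sorted({j for j, b in enumerate(b_y) if b != 0.0}
                      | {j for j, b in enumerate(b_d) if b != 0.0})
    X = [[di] + [row[j] for j in selected] for di, row in zip(d, W)]
    return ols(X, y)[0], selected


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


# Simulated data: control 0 drives the treatment, control 1 drives the outcome.
random.seed(1)
n, p = 300, 10
W = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
W = [list(row) for row in zip(*[center(col) for col in zip(*W)])]
d = center([row[0] + random.gauss(0, 1) for row in W])
y = center([0.5 * di + 0.3 * row[0] + row[1] + random.gauss(0, 0.5)
            for di, row in zip(d, W)])

alpha_hat, selected = post_double_selection(y, d, W, lam=0.1)
print(selected, round(alpha_hat, 3))
```

In the simulation the true treatment coefficient is 0.5. Control 0 enters the final regression because it predicts the treatment, which is the role the second selection step plays in the argument above.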

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\;\; \theta^{h}_{jd}\;\; U_d\;\; X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014),

$$
y = Kc. \tag{G1}
$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$
K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \tag{G2}
$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via

32 For a very general discussion see Belloni et al. (2017).


regularization by adding a squared $L_2$ penalty to the least squares criterion,

$$
c^{*} = \underset{c \in \mathbb{R}^{D}}{\arg\min}\; \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right], \tag{G3}
$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^{2}$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^{2} = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
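The closed-form estimator above is straightforward to compute. The sketch below is an illustration on a toy data set, not the estimation code used in the paper: it builds the Gaussian kernel matrix with $\sigma^{2} = D$, solves $(K + \lambda I)c = y$, and returns the fitted values $Kc$. The penalty `lam` is fixed by hand here (an assumption made for brevity), whereas the paper chooses $\lambda$ by minimizing leave-one-out loss. All names and data values are hypothetical.

```python
import math


def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / sigma2)
             for zj in Z] for zi in Z]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def krls_fit(Z, y, lam):
    """Return kernel matrix K, weights c = (K + lam*I)^{-1} y, and fitted values Kc."""
    n = len(Z)
    sigma2 = float(len(Z[0]))  # sigma^2 = D, the number of columns in z
    K = gaussian_kernel(Z, sigma2)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    c = solve(A, y)
    fitted = [sum(k * cj for k, cj in zip(row, c)) for row in K]
    return K, c, fitted


# Toy data: five observations, two covariates (so sigma^2 = 2).
Z = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 0.0], [4.0, 2.0]]
y = [0.0, 1.0, 0.5, -1.0, 2.0]
lam = 0.5  # fixed by hand in this sketch
K, c, fitted = krls_fit(Z, y, lam)
print([round(f, 3) for f in fitted])
```

Because $(K + \lambda I)c = y$, the residual of each observation equals $\lambda c_i$, so larger values of $\lambda$ trade fit for smoothness.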

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

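$$
\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^{2}} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \left( Z^{U_d}_{d} - z^{U_d}_{j} \right). \tag{G4}
$$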

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
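The pointwise derivative of a Gaussian-kernel fit can be checked numerically. The sketch below is an illustration only: the weights `c` are hypothetical rather than estimated, and the helper names are assumptions. It writes the difference inside the sum as evaluation point minus kernel center; with the opposite ordering the leading sign flips, so the sign convention should be checked against (G4) and the original reference before reuse. The analytic derivative is compared with a central finite difference.

```python
import math


def kernel(a, b, sigma2):
    """Gaussian kernel exp(-||a - b||^2 / sigma2)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / sigma2)


def krls_predict(z, Z, c, sigma2):
    """Fitted surface at point z: weighted sum of kernels centered at the rows of Z."""
    return sum(ci * kernel(z, zi, sigma2) for ci, zi in zip(c, Z))


def krls_partial(z, Z, c, sigma2, d):
    """Analytic partial derivative of the fitted surface at z w.r.t. column d."""
    return (-2.0 / sigma2) * sum(
        ci * kernel(z, zi, sigma2) * (z[d] - zi[d]) for ci, zi in zip(c, Z))


Z = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 0.0]]
c = [0.4, -0.2, 0.7, 0.1]  # hypothetical weights, not estimated from data
sigma2 = 2.0               # sigma^2 = D = 2 columns
d = 1                      # differentiate with respect to the second column
h = 1e-6

analytic = [krls_partial(z, Z, c, sigma2, d) for z in Z]
numeric = []
for z in Z:
    up, down = list(z), list(z)
    up[d] += h
    down[d] -= h
    numeric.append((krls_predict(up, Z, c, sigma2)
                    - krls_predict(down, Z, c, sigma2)) / (2 * h))

print([round(a, 4) for a in analytic])
```

One derivative is produced per observation, which is what allows the derivatives to be plotted against a covariate such as union membership.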

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levy (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents’ responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, D.C.: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, D.C.: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


heterogeneity, and it directly measures many important confounders. As a result, we can state with more confidence that the impact of unions is not spurious.

Against the backdrop of current scientific and public debates about labor unions and political representation, these findings may come as somewhat of a surprise. While some strands of research and political discourse portray unions as an egalitarian force in politics, others see them as fatally weakened, as effects as much as causes of unequal representation, or simply as just another organized group fighting for special interests (that do not generally overlap with those of lower income individuals). The latter view is held by a large strand of scholarship in economics (cf. Freeman and Medoff 1984) and by researchers studying the role of teachers’ unions in political science (Anzia 2011; Moe 2011). Most extant research on Congress simply does not have the required data to directly assess the effect of unions on representational equality. Numerous studies of union strength and congressional roll-call voting do not measure voter preferences, which makes it difficult to interpret who is being represented (Becher et al. 2018; Box-Steffensmeier et al. 1997; Freeman and Medoff 1984).

Altogether, our results suggest that unequal responsiveness is not an unavoidable feature of democratic capitalism. The results are especially striking given that recent cross-national studies have found consistent patterns of unequal representation across different political institutions (Bartels 2017; Lupu and Warner 2017). In contrast, we find considerable heterogeneity in differential responsiveness across districts affected by local labor unions, a fundamental economic institution. The moderating effect of unions uncovered in our analysis is large enough to swing key votes in Congress. That said, our results support the view that political efforts to (further) weaken unions, as evidenced in recent reforms in states like Michigan and Wisconsin, are, if anything, likely to exacerbate unequal responsiveness in representation. They may also explain why unions are (still) under attack.

II Moderating biased responsiveness in Congress

While few studies have directly assessed the impact of labor unions on unequal responsiveness in Congress or elsewhere, various strands of scholarship in political science and related fields suggest that labor unions are one of the few mass-membership organizations that provide collective voice to lower income individuals in the political arena, with potentially important consequences for political representation (Ahlquist and Levy 2013; Bartels 2016; Freeman and Medoff 1984; Schlozman et al. 2012). Consistent with a central premise of the collective voice perspective, unions tend to take positions favored by less affluent citizens. Gilens (2012, 154-161) compares public positions of national unions with mass policy preferences across several hundred policy issues and finds that unions’ positions are most strongly correlated with the preferences of the less well-off (see also Hacker and Pierson 2010; Schlozman 2015).4 Similarly, Schlozman et al. (2012, 87) conclude that unions are one of the few organizations in national politics “that advocate on behalf of the economic interest of workers who are not professionals or managers.”

However, shared preferences between the less well-off and organized labor are by no means sufficient to alter inequalities in political representation in national politics. This requires an effective political transmission mechanism. To guide the empirical analysis, we sketch key elements of a framework of union organization and political responsiveness.

Labor unions are organizations formed to bargain collectively on behalf of their members with employers over wages and conditions. Unions are thus created at the local (i.e., establishment) level (Freeman and Medoff 1984). Once formed, unions may (and often do) enter the political arena. The ability of unions to increase the rate of political participation (including voting, contacting officials, attending rallies, or making donations) of low- and middle-income citizens is often considered to be their key channel of political influence. Importantly, unions may also increase participation among non-members with similar policy preferences through get-out-the-vote campaigns and social networks (Leighley and Nagler 2007; Rosenfeld 2014; Schlozman et al. 2012). Making contributions to favored candidates and campaigns complements the ability of unions to communicate with and mobilize members or to provide campaign volunteers. Indeed, unions are among the leading contributors to political action committees (PACs), accounting for a quarter of total PAC spending in 2009 (Schlozman et al. 2012, ch. 14). In contrast to corporations and business organizations, union contributions “represent the aggregation of a large number of small individual donations” (Schlozman et al. 2012, 428).5

The credible threat of political mobilization can affect policy decisions by representatives in two general ways. First, it may shape who is elected in a given electoral district. If politicians are not exchangeable (because they differ in their preferences and beliefs), political selection is important. In an age of elite polarization (McCarty et al. 2006), the partisan identity of a representative is often crucial for determining legislative voting (Bartels 2016; Lee et al. 2004). Since the New Deal era, unions and union members have largely allied with the Democratic Party, given its stronger support for many of their broader policy demands (Lichtenstein 2013; Schlozman 2015). Political selection might also shape other political characteristics of representatives, such as their class background or race (Butler 2014; Carnes 2013).

Second, unions’ mobilization potential shapes the incentives of elected representatives beyond their partisan affiliation and personal traits. Policymakers’ rational anticipation of

4 This is consistent with the argument that organized labor fosters norms of solidarity and support for the less well-off through leadership (Ahlquist and Levy 2013; Kim and Margalit 2017) or social interactions (Berelson et al. 1954).

5 While evidence on the direct effect of contributions on legislative behavior is mixed, recent field-experimental results indicate that contributions help to provide access (Kalla and Broockman 2016) or sway congressional staffers (Hertel-Fernandez et al. 2018).


public reactions plays a central role in theories of accountability and dynamic responsiveness (Arnold 1990; Stimson et al. 1995). While many individual legislative votes do not affect the reelection prospects of representatives, on potentially salient votes they can face hard choices between party ideology and competing constituency preferences. On international trade agreements, for instance, Democratic representatives have faced cross-pressures between a more skeptical stance taken by unions and low-income constituents versus that of their own party (Box-Steffensmeier et al. 1997). On the other side of the aisle, in the wake of the financial crisis, Republican legislators found themselves torn between their own partisan views on stimulus spending and the pressure from less well-off constituents (Mian et al. 2010).

Politicians’ incentives are also linked to information. Theories of representation emphasize that members of Congress, and especially the House, face numerous voting decisions in each term, and it would be unrealistic to assume that they have access to reliable, unbiased polling data on constituency preferences on all the issues they face (Arnold 1990; Miller and Stokes 1963). Instead, representatives, with the help of their staffers, rely on alternative methods to assess public opinion, including constituent correspondence, town halls, and contacts with community leaders or local interest groups (Miler 2007). In this limited information context, the strength of local unions may enhance the visibility and perception of constituent preferences (Hertel-Fernandez et al. 2018).6

Following seminal theories of congressional action (Arnold 1990; Miller and Stokes 1963), our argument emphasizes that the strength of local unions underpins a credible mobilization threat that impacts the action of candidates and legislators. Anticipating mobilizing efforts by unions, a potential candidate may not even enter into the race; an elected career-oriented politician might be pressured to alter his or her vote even without a full mobilization effort, as long as unions’ mobilization capacity is visible. Thus, both campaign contributions and candidate selection should matter as a channel linking local union strength and representation, since they are linked to credible threats of mobilization.

Our argument implies that the district-level strength of labor unions increases the responsiveness of members of Congress to the less affluent. While we know from previous work that politicians are considerably more responsive to the preferences of the affluent than those of the less well-off, this bias should be reduced in districts with relatively higher union membership. Substantively, it is crucial to assess how far the presence of unions can move responsiveness toward the ideal of political equality.7

6 Butler and Nickerson (2011) find that politicians respond when provided with more accurate opinion data. However, behavioral biases may lead politicians to discount constituent preferences they disagree with (Butler and Dynes 2016).

7 In line with a large literature, we focus on union membership as a key component of union strength. In a study of the effect of unions on legislative ideology, rather than income-biased responsiveness, Becher et al. (2018) argue that the structure of local unions (i.e., the concentration of unions in a given locality) matters as well. However, they also show empirically that union density and concentration are separable dimensions.


III Data and Empirical Strategy

Any effort to test the relevance of unions for unequal representation confronts major challenges of measurement and causal interpretation. The dataset we have compiled allows us to address these issues to an extent previously impossible. We have created a panel of legislators’ roll call votes matched to income-specific policy preferences at the district level and district-level measures of union membership. Our main empirical strategy to examine the influence of unions on unequal representation is built on two basic pillars: district fixed effects and interactive controls. The fact that we observe several roll calls within a given congressional district allows us to specify a model with district fixed effects, which capture unobservable characteristics of districts (and states) that are constant over roll-calls, such as historical legacies or the strength of partisan organization. To provide for a stricter test of the moderating effect of unions, we also allow a rich set of other district characteristics to moderate the link between income groups and legislators’ voting behavior. This amounts to estimating models including interactions between observed district characteristics and group preferences. In our most flexible specification, we allow these to be non-linear (we describe our models in more detail below).
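The role of district fixed effects can be illustrated with a small within-district (demeaning) example. This is a minimal sketch with made-up numbers, not the paper's specification: it uses a single preference regressor, two hypothetical districts, and no noise, and all names are assumptions. Each district has its own constant shift in voting; demeaning votes and preferences within district removes that shift, so the slope is identified only from variation across roll calls within a district.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def slope(y, x):
    """OLS slope of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


# Three roll calls in each of two hypothetical districts.
districts = ["A"] * 3 + ["B"] * 3
pref = [0.2, 0.5, 0.9, 0.1, 0.4, 0.6]    # constituency preference per roll call
district_effect = {"A": 0.3, "B": -0.2}  # unobserved, constant over roll calls
vote = [0.8 * x + district_effect[g] for x, g in zip(pref, districts)]

pooled = slope(vote, pref)
within = slope(demean_by_group(vote, districts),
               demean_by_group(pref, districts))
print(round(pooled, 3), round(within, 3))
```

In this toy example the pooled slope is contaminated by the district-level shift, while the within-district slope recovers the value of 0.8 used to generate the data.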

The data required to implement these models were constructed in three steps. First, we match information on roll call items for 223,000 CCES respondents to actual roll call votes cast in the House of Representatives in the 109th to the 112th Congress.8 Second, we estimate policy preferences for low- and high-income constituents in each district for 27 roll calls. To deal with the fact that the CCES is not a representative sample of district populations, we use a small area estimation strategy combining the CCES sample with unit record Census data, matching the full distribution of age, education, gender, race, and income using a chained Random Forests algorithm (more below and in Appendix B). Third, we measure district-level union membership based on digitized administrative records from the Department of Labor.

III.A CCES data and Congressional roll calls

The CCES is an ideal starting point for our analysis, since it is a nationally representative study, includes a considerable number of roll call questions, and provides us with a large enough sample size to decompose income-group preferences by district. It addresses several data concerns that plagued initial research on unequal responsiveness in Congress (Bhatti and Erikson 2011). The roll calls included in the CCES concern key votes as identified by Congressional Quarterly and the Washington Post and cover a broad range of issues (Ansolabehere and Jones 2010). Respondents are presented with the key wording of the bill (as used on the floor and in media reports) and are then asked to cast their own vote: "What about you? If you were faced with this decision, would you vote for, against, or not sure?" In contrast to the widely used agree-disagree survey measures of issue preferences, matched roll call votes provide us with unequivocal evidence of policy congruence between respondent and legislator (Jessee 2009; Ansolabehere and Jones 2010, 585). We match 27 roll call items in the CCES to roll call votes cast in the House of the 109th to 112th Congress. These cover important legislative decisions, such as Dodd-Frank, the Affordable Care Act (and attempts to repeal it), the minimum wage increase, the ratification of the Central America Free Trade Agreement, or the Lilly Ledbetter Fair Pay Act. Table A1 in the Appendix lists all matched CCES items and House bills included in our estimation sample.

7 (continued) In this paper, we focus on union membership but show in a robustness test that our results still obtain when accounting for union concentration (see Table E1).

8 Our analysis focuses on one apportionment period, which generally holds district boundaries constant (we show that the results are robust to cases of mid-period redistricting).

III.B Measuring constituency preferences by income group

The CCES provides us with a comparatively large sample size per district. However, an important potential issue is that it is not designed to be representative of congressional district populations. Thus, individuals with certain characteristics, such as particular combinations of income, race, and education, may be underrepresented in the CCES sample for a given district. If this is the case, unadjusted policy preferences from the CCES will not reflect the target population, and using them can lead to biased estimates of unequal representation in Congress, as politicians are held to the wrong benchmark. The solution to this issue is to employ some form of small area estimation to rebalance the survey sample to represent the district population. The machine-learning solution we propose is relatively new to the representation literature in political science, but it has some attractive features that merit its application to this topic. It does not require distributional and functional form assumptions, it allows for arbitrary higher-order interactions of covariates, and it can fully leverage fine-grained census data to construct representative samples of congressional districts. However, we stress that our findings do not depend on this particular approach. As shown in Online Appendix B, our approach leads to somewhat more conservative estimates of the impact of unions on the representation of different income groups compared to the MRP approach widely used by political scientists (Lax and Phillips 2009). Qualitatively, both approaches yield the same conclusions.

Our approach, small area estimation using chained random forests, matches CCES survey respondents to corresponding cases from unit record Census data. The design of the Census ensures an accurate representation of the distribution of population characteristics in a given district (Torrieri et al. 2014, Ch. 4). Matching these two data sources is essentially a prediction problem, which we address using a flexible non-parametric machine learning approach based on random forests (Stekhoven and Bühlmann 2011).9 Put simply, the idea is that rich census data exist for every district, whereas survey data on preferences are scarce in some districts and may not be fully representative. Using general machine learning tools, we can attach preferences to the Census by matching it to CCES respondents based on common demographic characteristics. The resulting data set of public preferences is representative of congressional districts.

9 Honaker and Plutzer (2016) use a similar approach (but relying on multivariate normal imputations) and further discuss its empirical performance in estimating small area attitudes and preferences.

Concretely, we use about 3 million individual-level records from a synthetic sample of the Census Bureau's American Community Survey from 2006 to 2011. We stack both datasets, creating a structure where we have common district identifiers and individual covariates, while responses to policy preference questions are missing in the Census portion of the data. As common covariates bridging CCES and Census, we use the following demographic characteristics: gender, race (3 categories), education (5 categories), age (continuous), and family income (continuous).10 The latter is of particular relevance, as we are interested in producing preferences specific to each district-income group.

In the next step, we fill missing roll call preferences in the Census with matching data from CCES respondents. Since this is essentially a prediction problem, we can use powerful tools developed in the machine learning literature to achieve this task. We use an algorithm proposed by Stekhoven and Bühlmann (2011), which uses chained random forests (Breiman 2001) to impute missing cells. Compared to commonly used multivariate normal or regression imputation techniques, this strategy has the advantage that it is fully nonparametric, allows for complex interactions between covariates, and deals with both continuous and categorical data (Tang and Ishwaran 2017). Our completed dataset now contains preferences for 27 roll call items of synthetic 'Census individuals', which are a representative sample of each House district.

With these data in hand, we assign individuals to income groups and calculate group-specific preferences for each roll call in each district. Following previous work in the representation literature (Bartels 2008, 2016), we delineate low- and high-income respondents using the 33rd and 67th percentiles of the distribution of family incomes. Note that, in line with theories of constituency representation in Congress, we specify these income thresholds separately by congressional district. This accounts for the substantial differences in both average income and income inequality between U.S. districts. It also ensures that within each district, income groups are of comparable size. Online Appendix Table A2 shows the distribution of income-group cutoffs. On average, our chosen cutoffs are close to those used in the established literature. The mean of our district-specific low-income cutoffs is around $39,000, while Bartels uses $40,000 (Bartels 2016, 240); our mean high-income cutoff is around $81,000, where Bartels employs a threshold of $80,000. However, beyond these averages lies considerable variation. In some districts the 33rd percentile cutoff is as low as $16,500, while the 67th percentile reaches almost $160,000 in others.11
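The grouping step can be illustrated with a minimal sketch. It assumes individual-level records of the form (district, family income, vote) for a single roll call, computes the 33rd and 67th income percentiles separately within each district, and returns the share voting 'yea' in the bottom and top groups. The function names, the interpolation rule for percentiles, and the toy data are assumptions made for illustration; they are not taken from the replication code.

```python
from collections import defaultdict

def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("empty list")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def group_preferences(records, low_q=33, high_q=67):
    """records: iterable of (district, family_income, vote), vote 1 = 'yea', 0 = 'nay'.

    Returns {district: (theta_low, theta_high)}: the share voting 'yea' among
    individuals at or below the district's low cutoff and at or above its
    high cutoff. Cutoffs are computed separately for every district.
    """
    by_district = defaultdict(list)
    for district, income, vote in records:
        by_district[district].append((income, vote))

    result = {}
    for district, rows in by_district.items():
        incomes = sorted(income for income, _ in rows)
        low_cut = percentile(incomes, low_q)
        high_cut = percentile(incomes, high_q)
        low_votes = [v for inc, v in rows if inc <= low_cut]
        high_votes = [v for inc, v in rows if inc >= high_cut]
        result[district] = (sum(low_votes) / len(low_votes),
                            sum(high_votes) / len(high_votes))
    return result

# Toy data for one roll call (all values are made up for illustration).
toy = [
    ("A", 10_000, 1), ("A", 20_000, 1), ("A", 30_000, 1),
    ("A", 60_000, 0), ("A", 90_000, 0), ("A", 120_000, 1),
    ("B", 40_000, 1), ("B", 50_000, 0), ("B", 70_000, 0),
    ("B", 150_000, 0), ("B", 200_000, 0), ("B", 300_000, 0),
]
prefs = group_preferences(toy)
```

Because the cutoffs are district-specific, the same income can fall into the low group in one district and the middle group in another, which is the point of specifying thresholds by district.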

10 See Appendix B for more details on the construction of our Census sample and our matching/imputation procedure.

11 Results are relatively invariant to using alternative income thresholds (see Table C1).


[Figure I: six histogram panels, titled Increase Minimum Wage, Housing Crisis Assistance, Fair Pay Act, Affordable Care Act, CAFTA Ratification, and Recovery and Reinvestment.]

Figure I. District-level income gap in public support for 6 selected policies

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district's income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by $(\theta^l, \theta^h)$, as the proportion of individuals voting 'yea'. Since preference estimates are in $[0, 1]$, they can be directly related to legislators' probability of voting 'yea' on a given roll call. Our data show considerable variation in the distance between the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences $(\theta^h - \theta^l)$ in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, the vast majority of low-income constituents are more supportive than their high-income counterparts in each and every district. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly in favor. In all examples, we find considerable across-district variation in the preference gap between low- and high-income constituents.12 We will employ this variation over both roll calls and districts to estimate legislators' differential responsiveness to changes in policy preferences of different income groups, and how it might be moderated by union strength.

12 Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.

III.C District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report, or reporting falsified information, is a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit nonresponse (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.14 Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
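As a rough illustration of this aggregation step, the sketch below sums reported members over all union reports located in a district and then applies the log transformation and standardization that the statistical model uses for union membership. The record layout, function names, and numbers are hypothetical; the cleaning, validation, and geocoding of the LM forms are assumed to have happened upstream, and averaging over Congresses is omitted.

```python
import math
from collections import defaultdict

def district_union_membership(reports):
    """reports: iterable of (district_id, union_id, reported_members).

    Sums reported members over all union reports geocoded to a district.
    """
    totals = defaultdict(int)
    for district, _union_id, members in reports:
        totals[district] += members
    return dict(totals)

def log_and_standardize(totals):
    """Log-transform district totals, then scale to mean 0 and unit SD.

    Uses the population standard deviation; this choice is an assumption.
    """
    logged = {d: math.log(m) for d, m in totals.items()}
    mean = sum(logged.values()) / len(logged)
    sd = math.sqrt(sum((v - mean) ** 2 for v in logged.values()) / len(logged))
    return {d: (v - mean) / sd for d, v in logged.items()}

# Hypothetical reports for three districts (values are made up).
reports = [
    ("NY-01", "local 1", 1200),
    ("NY-01", "local 2", 800),
    ("NY-02", "local 3", 500),
    ("TX-05", "local 4", 8000),
]
totals = district_union_membership(reports)
union_std = log_and_standardize(totals)
```

Working with logged and standardized membership means that a one-unit change in `union_std` corresponds to a one standard deviation change in logged membership, which is the scale on which the union coefficients are reported.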

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost complete coverage, as during the latter part of the twentieth century unions have increasingly organized workers across different sectors and occupations (Lichtenstein 2013, 249).15

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.

[Figure II: legend categories are the 1st, 2nd, 3rd, and 4th quartiles of union membership.]

Figure II. Union membership in House districts, 109th-112th Congress

III.D Statistical specifications

For each roll call vote $j$ ($j = 1, \dots, J$), we have measured preferences of low- and high-income citizens in a given congressional district $d$ ($d = 1, \dots, D$), denoted by $(\theta^l_{jd}, \theta^h_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts' workers to organize collective action, as well as (iv) non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation.

15 While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with the widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, D.C., which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated ($r = 0.86$).

Our model for the voting behavior of House members is the following linear probability specification:

\[
\begin{aligned}
y_{ijd} ={}& \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l (U_d \times \theta^l_{jd}) + \eta^h (U_d \times \theta^h_{jd}) \\
&+ \beta^l (X_d \times \theta^l_{jd}) + \beta^h (X_d \times \theta^h_{jd}) + \alpha_d + \epsilon_{ijd}
\end{aligned}
\]

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d \theta^h_{jd}$ and $U_d \theta^l_{jd}$. Thus, when $\eta^l$ and $\eta^h$ are zero, the group-specific preference coefficients $\mu^l$ and $\mu^h$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^l$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^h$. Our theoretical expectation is that $\eta^l > 0$ and $\eta^h \le 0$.
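To make the interpretation of the interaction terms explicit, the responsiveness implied by the linear probability specification can be written out by differentiating the expected vote with respect to each group's preference. This is a direct consequence of the model as stated rather than an additional result; writing $\beta^l X_d$ as a single product (an inner product if $X_d$ is a vector) is a notational assumption.

```latex
\[
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^l_{jd}}
  = \mu^l + \eta^l U_d + \beta^l X_d ,
\qquad
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^h_{jd}}
  = \mu^h + \eta^h U_d + \beta^h X_d .
\]
```

Because all inputs are standardized, responsiveness to the poor in a district with average union membership and average covariates ($U_d = 0$, $X_d = 0$) equals $\mu^l$, and in an otherwise average district one standard deviation above the mean of logged membership it equals $\mu^l + \eta^l$. The district fixed effect $\alpha_d$ drops out of both derivatives.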

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects, $\alpha_d$.16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and state level) and group preferences, $X_d \theta^l_{jd}$ and $X_d \theta^h_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses, detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors, assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
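The mechanics of district fixed effects with district-clustered standard errors can be sketched for the simplest case of a single regressor. The code demeans outcome and regressor within districts (the within transformation), estimates the slope by least squares, and computes a cluster-robust sandwich variance without any small-sample correction. This is a stylized illustration under those assumptions, with made-up data and hypothetical function names; the models in the paper include several interaction terms and may apply a different finite-sample adjustment.

```python
import math
from collections import defaultdict

def within_demean(values, groups):
    """Subtract the group (district) mean from every observation."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def fe_slope_clustered(y, x, groups):
    """Fixed-effects slope of y on x with a cluster-robust standard error.

    Variance: sum over clusters of (sum of x_tilde * residual)^2,
    divided by (sum of x_tilde^2)^2, with no degrees-of-freedom correction.
    """
    y_t = within_demean(y, groups)
    x_t = within_demean(x, groups)
    sxx = sum(a * a for a in x_t)
    beta = sum(a * b for a, b in zip(x_t, y_t)) / sxx
    resid = [b - beta * a for a, b in zip(x_t, y_t)]
    scores = defaultdict(float)
    for a, e, g in zip(x_t, resid, groups):
        scores[g] += a * e
    variance = sum(s * s for s in scores.values()) / (sxx * sxx)
    return beta, math.sqrt(variance)

# Two districts, two roll calls each (toy values): vote y, group preference x.
districts = ["A", "A", "B", "B"]
x = [0.0, 1.0, 0.0, 1.0]
y = [0.0, 1.0, 1.0, 1.0]
beta, se = fe_slope_clustered(y, x, districts)
```

In this toy example the legislator in district A follows the preference while the one in district B always votes 'yea', so the pooled within-district slope is 0.5; the district-specific level of support in B is absorbed by the fixed effect rather than attributed to preferences.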

IV Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above, with district fixed effects but without accounting for local union organization (setting $\beta^l$, $\beta^h$ and $\eta^l$, $\eta^h$ to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase of 13.6 (±1.2) percentage points in the probability that legislators cast a corresponding vote. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a confidence interval ranging from -1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.

IV.A Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership, with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preference-confounder interactions (setting $\beta^l$ and $\beta^h$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018; Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^l$ and $\theta^h$. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

17 Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors at the district level. See also specification (1) in Table I below.

[Figure III: two panels, "Low income constituents" and "High income constituents". Vertical axis: marginal effect (-0.4 to 0.4). Horizontal axis: union membership [std] (-1.6 to 1.6), with p10, p25, p50, p75, and p90 marked.]

Figure III. District-level union membership as moderator of unequal representation

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

Table I. Union density and representation: marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote

                                (1)       (2)       (3)       (4)       (5)       (6)
Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)
District fixed effects           X         X         X         X         X         X
Group preferences
  × union policy                           X                                       X
  × state constants                                  X
  × organizational capacity                                    X                   X
  × district covariates                                                  X         X

Note: N = 15,780; $N_d$ = 534; 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^l$ and $\eta^h$. Specifications (2) to (5) include coefficients for interactions ($\beta^l$, $\beta^h$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements; (3) interacts preferences with state fixed effects; (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections; (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections, conducted by the National Labor Relations Board (NLRB), are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

18 Certification elections are not a foregone conclusion: during the 112th Congress unions won 59

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics, see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily".

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
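One simple way to think about population-based aggregation from counties to districts is a weighted mean, where each county's value is weighted by the population living in the overlap between the county and the district. The sketch below shows that scheme for an index-type measure. It is an assumption made for illustration, not the procedure documented in the appendix: the function name, input layout, and numbers are hypothetical, and count variables such as the number of bowling alleys would typically be apportioned to districts rather than averaged.

```python
from collections import defaultdict

def aggregate_to_districts(county_values, overlaps):
    """Population-weighted mean of county-level values for each district.

    county_values: {county: value}
    overlaps: iterable of (county, district, population), the population
        living in the intersection of that county and that district.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for county, district, population in overlaps:
        weighted_sum[district] += county_values[county] * population
        weight_total[district] += population
    return {d: weighted_sum[d] / weight_total[d] for d in weighted_sum}

# Hypothetical example: county c2 is split between districts d1 and d2.
county_index = {"c1": 2.0, "c2": 4.0}
overlaps = [("c1", "d1", 300), ("c2", "d1", 100), ("c2", "d2", 500)]
district_index = aggregate_to_districts(county_index, overlaps)
```

District d1 draws three quarters of its population from county c1, so its value lies closer to that county's index than to the index of c2.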

Table II. Robustness tests: marginal effects of union membership on differential legislative responsiveness under alternative specifications

                                          Low income         High income
(1a) Social capital: bowling alleys       0.067 (0.014)      -0.051 (0.013)
(1b) Social capital: index                0.065 (0.014)      -0.048 (0.013)
(2)  Redistricting                        0.067 (0.014)      -0.051 (0.013)
(3)  MRP estimated preferences            0.115 (0.022)      -0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for $\eta^l$ and $\eta^h$. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts; N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred; N = 14,150. Specification (3) uses preferences estimated using MRP (see Appendix B for more details); N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews, see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here, we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low-income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high-income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far, as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools, such as the LASSO.[21] The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
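The selection logic can be illustrated with a minimal sketch on simulated data. This is not the estimator used in the paper: root-LASSO, sample splitting, and cluster-robust standard errors are omitted, a plain coordinate-descent LASSO with a hand-picked penalty (lam) stands in for the selection step, and the function names and simulated variables are illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO minimizing (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                              # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]               # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            r -= X[:, j] * b[j]               # add the updated coordinate back
    return b

def post_double_selection(y, d, X, lam):
    """OLS coefficient on treatment d after LASSO selection in both equations."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    sel_y = np.flatnonzero(lasso_cd(Xs, y - y.mean(), lam))  # controls predicting the outcome
    sel_d = np.flatnonzero(lasso_cd(Xs, d - d.mean(), lam))  # controls predicting the treatment
    keep = np.union1d(sel_y, sel_d)                          # union of both selected sets
    W = np.column_stack([np.ones(len(y)), d, Xs[:, keep]])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)             # post-LASSO least squares
    return coef[1], keep

# Simulated example (hypothetical data): control 0 confounds treatment and outcome.
rng = np.random.default_rng(0)
n, p = 500, 40
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
y = 1.0 * d + 2.0 * X[:, 0] + rng.normal(size=n)   # true effect of d is 1.0
effect, selected = post_double_selection(y, d, X, lam=0.2)
```

Taking the union of the two selected sets is what guards against omitting a control that matters for the treatment but is only weakly related to the outcome, or vice versa.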

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

[21] The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III
Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups

                               (1)        (2)        (3)

Low income preferences        0.063      0.066      0.062
                             (0.014)    (0.017)    (0.016)
High income preferences      -0.054     -0.036     -0.040
                             (0.013)    (0.015)    (0.016)

Semi-parametric terms          144        312        624
Post-LASSO terms                18         45        112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on 50% sample, parameter estimates performed on remaining 50% (N=7,884). Table entries are estimates for ηL and ηH with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.[22] The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
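The mechanics can be sketched in a few lines, under simplifying assumptions: the regularization parameter (lam) is fixed by hand rather than chosen by cross-validation, the kernel bandwidth is set to the number of covariates, and the data are simulated. Function and variable names are illustrative, not taken from the paper or from any KRLS software.

```python
import numpy as np

def krls_fit(X, y, sigma2, lam):
    """Gaussian-kernel regularized least squares: solve (K + lam * I) c = y."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / sigma2)                 # K[i, k] = exp(-||x_i - x_k||^2 / sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return K, c

def pointwise_derivatives(X, K, c, sigma2, j):
    """Partial derivative of the fitted surface w.r.t. column j at every observation."""
    diff = X[:, j][:, None] - X[:, j][None, :]    # x_ij - x_kj
    return (-2.0 / sigma2) * ((K * diff) @ c)

# Simulated example (hypothetical data): the effect of preferences rises with unionization,
# but no interaction term is given to the model.
rng = np.random.default_rng(3)
n = 300
pref = rng.normal(size=n)
union = rng.normal(size=n)
y = pref * (0.5 + 0.5 * union) + rng.normal(scale=0.1, size=n)

X = np.column_stack([pref, union])
K, c = krls_fit(X, y, sigma2=X.shape[1], lam=0.5)
d_pref = pointwise_derivatives(X, K, c, sigma2=X.shape[1], j=0)

# Average partial effect of preferences at below- and above-average union membership.
low_union_effect = d_pref[union < 0].mean()
high_union_effect = d_pref[union > 0].mean()
```

Comparing the pointwise derivatives across levels of the second covariate is the same idea as plotting them against union membership: a moderating relationship shows up in the derivatives even though it was never specified.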

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], from -1.6 to 1.6, with percentiles p10, p25, p50, p75, p90 marked; y-axis: Partial effect, from -0.4 to 0.4.]

Figure IV
Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

[22] See Appendix G for details on the approach and parameter selection.

However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null-effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV
Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups

                                  Low income         High income

(A) Private vs. public unions
    Public unions                 0.074 (0.016)      -0.058 (0.015)
    Non-public unions             0.054 (0.016)      -0.027 (0.016)

(B) Bill ideology
    Conservative bill             0.086 (0.017)      -0.086 (0.018)
    Liberal bill                  0.052 (0.014)      -0.028 (0.013)

(C) AFL-CIO endorsement
    No position                   0.054 (0.014)      -0.054 (0.013)
    Endorsement                   0.077 (0.015)      -0.040 (0.014)

Note: Estimates for ηL and ηH with cluster-robust standard errors in parentheses. N=15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.[23] AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).[24] The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

[23] Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

[24] For high-income preferences, the estimate for ηh is smaller for endorsed bills, but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
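The conversion from a log-scale regression to dollar amounts can be sketched with the smearing retransformation of Duan (1983). The data below are simulated and the coefficient values are hypothetical, so the resulting figure bears no relation to the estimate reported above; state fixed effects and controls are omitted.

```python
import numpy as np

# Simulated district-level data (hypothetical values, not the paper's).
rng = np.random.default_rng(2)
n = 400
union = rng.normal(size=n)                                        # standardized union membership
log_contrib = 11.0 + 0.5 * union + rng.normal(scale=0.8, size=n)  # log dollars

# OLS on the log scale.
X = np.column_stack([np.ones(n), union])
beta, *_ = np.linalg.lstsq(X, log_contrib, rcond=None)
resid = log_contrib - X @ beta

# Smearing factor: the average of the exponentiated log-scale residuals.
smear = np.exp(resid).mean()

def dollars(u):
    """Expected contributions in dollars at standardized union membership u."""
    return np.exp(beta[0] + beta[1] * u) * smear

# Effect of a one standard deviation increase, evaluated at the mean.
effect = dollars(1.0) - dollars(0.0)
```

Simply exponentiating the fitted log values would understate the expected dollar amount; the smearing factor corrects for this without assuming normally distributed errors.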

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians, if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V
Labor contributions and selection of Democratic legislators

                                        (1)        (2)        (3)        (4)

A: Contributions channel
                                        DV: Contrib.          DV: roll call
Union membership                       0.056      0.046
                                      (0.012)    (0.014)
Contributions x low income prefs.                            0.946      0.865
                                                            (0.036)    (0.034)
Contributions x high income prefs.                          -0.735     -0.714
                                                            (0.029)    (0.031)

B: Selection channel
                                        DV: Democrat          DV: roll call
Union membership                       0.161      0.106
                                      (0.024)    (0.023)
Democrat x low income prefs.                                 0.576      0.542
                                                            (0.012)    (0.015)
Democrat x high income prefs.                               -0.411     -0.423
                                                            (0.013)    (0.015)

District controls                                   X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15,780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15,776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle, documented by previous studies, that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed, but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle-class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the wide-spread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are on average more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).

Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).

Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds, as well as the smallest and largest ones.
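The classification step can be sketched as follows for a single district. The income values are made up, the quantile interpolation rule is Python's default rather than whatever rule was applied to the ACS files, survey weights are ignored, and the treatment of incomes exactly at a threshold is an assumption.

```python
from statistics import quantiles

def tercile_thresholds(incomes):
    """Return the two cut points separating the lowest and highest income terciles."""
    low_cut, high_cut = quantiles(incomes, n=3)
    return low_cut, high_cut

def income_group(income, low_cut, high_cut):
    """Classify one respondent relative to the district-specific thresholds."""
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical household incomes for one district (stand-in for ACS records).
district_incomes = [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000]
low_cut, high_cut = tercile_thresholds(district_incomes)

# Hypothetical survey respondents from the same district (stand-in for CCES records).
respondents = [15000, 45000, 85000]
groups = [income_group(x, low_cut, high_cut) for x in respondents]
```

Because the thresholds are computed separately for each district, the same household income can fall into different groups in different districts.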

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County and Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.

Table A1
Matched CCES–House roll calls included in our analysis

Match  Bill      Date        House Vote  Bill       Name
                             (Yea-Nay)   Ideology†

(1)    HR 810    07/19/2006  235-193     L          Stem Cell Research Enhancement Act (Presidential Veto Override)
(1)    HR 3      01/11/2007  253-174     L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176     L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201     L          Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116     L          Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197     C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185     C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215     C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183     C          Protect America Act
(6)    HR 6304   06/20/2008  293-129     C          FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204     L          Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135     L          Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152     L          Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132     C          United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171     L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151     C          To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167     C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202     L          Supplemental Appropriations, Fiscal Year 2009 (Agreeing to Conference Report)
(14)   HR 2831   07/31/2007  225-199     L          Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171     L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177     L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175     L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183     L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212     L          American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212     L          Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215     L          Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192     L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175     L          Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161     C          Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193     C          House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382      C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257     L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189     C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185     C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147     C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.

Table A2
Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value

            33rd percentile                67th percentile
Congress    Mean      Min       Max        Mean      Min       Max

109         38,123    16,800    73,675     77,964    39,612    146,870
110         40,127    18,000    77,000     83,047    43,600    155,113
111         39,021    17,500    78,262     82,440    46,000    160,050
112         37,381    16,500    81,000     79,868    38,500    158,654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3
Descriptive statistics of analysis sample

                                      Mean      SD       Min      Max       N

Roll-call vote: yea                   0.568     0.495    0.000    1.000     15,780
Constituent preferences
    Low income                        0.593     0.220    0.047    0.979     15,934
    High income                       0.555     0.198    0.037    0.967     15,934
    Low-High Gap                      0.172     0.121    0.000    0.588     15,934
Union membership [log]                9.705     1.046    6.094    13.619    15,934
Population                            7.022     0.723    4.697    9.980     15,934
Share African American                0.124     0.146    0.004    0.680     15,934
Share Hispanic                        0.156     0.174    0.005    0.812     15,934
Share BA or higher                    0.275     0.097    0.073    0.645     15,934
Median income [$10,000]               5.177     1.356    2.282    10.439    15,934
Share female                          0.508     0.010    0.462    0.543     15,934
Manufacturing share                   0.110     0.047    0.025    0.281     15,934
Urbanization                          0.790     0.199    0.213    1.000     15,934
Certification elections [log]         3.347     0.861    0.000    5.100     15,934
Congregations [per 1,000 persons]     0.765     1.147    0.062    6.453     15,934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1,000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1: schematic of the stacked data matrix. Rows are units $1, \dots, m$ (CCES) and $m+1, \dots, m+n$ (Census). Columns are covariates $Z_{i1}, \dots, Z_{iK}$, observed for all units, and preferences $Y_{i1}, \dots, Y_{iP}$, observed for CCES units and missing (NA) for Census units. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict $Y^*_p = f(Z)$ for the Census rows.]

Figure B1: Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative at the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
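The midpoint-plus-Pareto scoring described above can be sketched as follows. This is a minimal illustration, not the authors' code: the bin boundaries and counts are hypothetical, bins are treated as half-open intervals so that the $20,000 bin scores $25,000, and the shape parameter is estimated from the two highest bins, which is one common way of implementing a Pareto top-bin imputation.

```python
import math


def pareto_shape(lower_prev, lower_top, n_prev, n_top):
    """Pareto shape estimated from the two highest bins (an assumed, common variant)."""
    return (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev))


def bin_scores(bins, counts):
    """Return one income score per bin.

    bins   -- list of (lower, upper) bounds; the last bin is open-ended (upper is None)
    counts -- number of respondents in each bin
    """
    # Closed bins: use the midpoint of the interval.
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]
    # Open-ended top bin: mean of a Pareto distribution above its lower bound.
    top_lower = bins[-1][0]
    alpha = pareto_shape(bins[-2][0], top_lower, counts[-2], counts[-1])
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    scores.append(top_lower * alpha / (alpha - 1))
    return scores


# Hypothetical bins and counts, for illustration only.
example_bins = [(0, 10000), (10000, 20000), (20000, 30000), (30000, None)]
example_counts = [10, 20, 30, 10]
example_scores = bin_scores(example_bins, example_counts)
```

With real data, the bins would be the 14 CCES income categories and the counts their observed frequencies.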

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1. Start with initial guesses of missing values in $D$
 2. $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3. while not $c$ do
 4.     $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5.     for $v$ in $w$ do
 6.         Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7.         Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8.         $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9.     Update stopping criterion $c$
10. Return completed $D^{imp}$
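The control flow of Algorithm 1 can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: to stay dependency-free it swaps the random forest for a simple k-nearest-neighbour average (`knn_predict`, a hypothetical stand-in), it handles numeric columns only, and it stops when the filled-in values change by less than a tolerance, which is one possible reading of the stopping criterion $c$.

```python
import math


def knn_predict(train_x, train_y, query_x, k=3):
    """Stand-in learner (NOT a random forest): average the k nearest targets."""
    predictions = []
    for q in query_x:
        nearest = sorted(range(len(train_x)),
                         key=lambda i: math.dist(train_x[i], q))[:k]
        predictions.append(sum(train_y[i] for i in nearest) / len(nearest))
    return predictions


def chained_impute(data, fit_predict=knn_predict, max_iter=10, tol=1e-8):
    """Fill None entries of a numeric table by cycling over its columns."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[value is None for value in row] for row in data]

    # Step 1: initial guesses (observed column means).
    filled = [list(row) for row in data]
    for v in range(n_cols):
        observed = [row[v] for row in data if row[v] is not None]
        column_mean = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][v]:
                filled[i][v] = column_mean

    # Step 2: columns with missing entries, by increasing fraction of NA.
    n_missing = [sum(row[v] for row in missing) for v in range(n_cols)]
    order = sorted((v for v in range(n_cols) if n_missing[v] > 0),
                   key=lambda v: n_missing[v])

    def other_columns(i, v):
        return [filled[i][u] for u in range(n_cols) if u != v]

    for _ in range(max_iter):                      # Step 3
        previous = [list(row) for row in filled]   # Step 4
        for v in order:                            # Step 5
            obs = [i for i in range(n_rows) if not missing[i][v]]
            mis = [i for i in range(n_rows) if missing[i][v]]
            predicted = fit_predict(               # Steps 6-7
                [other_columns(i, v) for i in obs],
                [filled[i][v] for i in obs],
                [other_columns(i, v) for i in mis])
            for i, value in zip(mis, predicted):   # Step 8
                filled[i][v] = value
        change = sum((filled[i][v] - previous[i][v]) ** 2
                     for i in range(n_rows) for v in range(n_cols)
                     if missing[i][v])
        if change < tol:                           # Step 9
            break
    return filled                                  # Step 10


# Toy table: the second column is missing for the last unit.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [2.0, None]]
completed = chained_impute(toy)
```

Any learner with the same `(train_x, train_y, query_x)` interface could be passed as `fit_predict`; a random forest would take that place in an actual application.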

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28
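For reference, a common definition of the normalized root mean squared error for a continuous variable is given below. The text does not spell out the formula, so this is an assumption about the metric; $X^{true}$ denotes held-out true values and $X^{imp}$ the corresponding predictions.

$$\mathrm{NRMSE} = \sqrt{\frac{\operatorname{mean}\big((X^{true} - X^{imp})^2\big)}{\operatorname{var}(X^{true})}}$$

Under this definition, a value near 0 indicates accurate predictions, while a value near 1 is no better than predicting the variable's mean for every unit.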

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f.).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


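model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta^0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$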

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta^0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_j \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_k \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_l \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_m \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_n \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_d \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
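The poststratification step can be illustrated with a short sketch. All numbers below are hypothetical and the model is reduced to one demographic variable plus a district intercept; the point is only to show how cell-level predictions from the inverse-logit model are averaged using Census cell counts.

```python
import math


def inv_logit(x):
    """Inverse logit link used in the hierarchical model."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_prediction(intercept, effects, cell):
    """Predicted support in one cell.

    effects -- {variable: {level: effect}} (hypothetical estimates)
    cell    -- {variable: level} describing the cell
    """
    linear = intercept + sum(effects[var][level] for var, level in cell.items())
    return inv_logit(linear)


def poststratify(predictions, counts):
    """Average cell predictions, weighted by population counts of the cells."""
    total = sum(counts.values())
    return sum(predictions[c] * counts[c] for c in counts) / total


# Hypothetical effects and Census counts for two cells in one district.
effects = {"educ": {"hs": -0.4, "ba": 0.3}, "district": {"D1": 0.1}}
cells = {
    "hs": {"educ": "hs", "district": "D1"},
    "ba": {"educ": "ba", "district": "D1"},
}
predictions = {name: cell_prediction(0.2, effects, cell)
               for name, cell in cells.items()}
census_counts = {"hs": 600, "ba": 400}
district_estimate = poststratify(predictions, census_counts)
```

In the application described above, this averaging would be carried out separately for each income group, roll call, and congressional district.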

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)      (2)      (3)      (4)      (5)      (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences       0.106    0.082    0.098    0.084    0.068    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences     −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: Multilevel Regression & Poststratification
Low income preferences       0.182    0.158    0.181    0.162    0.115    0.115
                            (0.021)  (0.024)  (0.026)  (0.020)  (0.022)  (0.022)
High income preferences     −0.136   −0.119   −0.139   −0.122   −0.091   −0.091
                            (0.017)  (0.019)  (0.021)  (0.017)  (0.018)  (0.018)

C: Raw CCES means
Low income preferences       0.080    0.061    0.063    0.072    0.043    0.045
                            (0.010)  (0.011)  (0.012)  (0.010)  (0.011)  (0.011)
High income preferences     −0.027   −0.013   −0.010   −0.027   −0.018   −0.024
                            (0.008)  (0.008)  (0.008)  (0.008)  (0.008)  (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
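The classification logic discussed in this section can be sketched as follows. The income values are made up, the function names are hypothetical, and the treatment of incomes exactly at a threshold is an assumption; the sketch only shows how the same income can fall into different groups depending on the reference distribution and the chosen percentiles.

```python
import statistics


def income_thresholds(incomes, low_pct=33, high_pct=66):
    """Lower and upper cut points at the given percentiles of a reference sample."""
    cuts = statistics.quantiles(incomes, n=100)  # 99 cut points: 1st..99th percentile
    return cuts[low_pct - 1], cuts[high_pct - 1]


def income_group(income, low_cut, high_cut):
    """Classify one income relative to the thresholds of its reference area."""
    if income < low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"


# Made-up reference distribution: incomes of $1,000, $2,000, ..., $99,000.
reference = [1000 * k for k in range(1, 100)]
terciles = income_thresholds(reference)            # panel (A)/(B)-style split
p20_p80 = income_thresholds(reference, 20, 80)     # panel (C)-style split

# The same $40,000 income under two hypothetical low-income thresholds.
in_richer_district = income_group(40000, 45000, 90000)
in_poorer_district = income_group(40000, 30000, 70000)
```

Passing a district's incomes yields district-specific thresholds; passing a state's incomes yields state-specific ones.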


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)      (2)      (3)      (4)      (5)      (6)

A: District-specific income thresholds
Low income preferences       0.106    0.082    0.098    0.084    0.068    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences     −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: State-specific income thresholds
Low income preferences       0.105    0.082    0.097    0.083    0.067    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences     −0.062   −0.036   −0.052   −0.050   −0.049   −0.039
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences       0.098    0.077    0.09     0.078    0.063    0.057
                            (0.012)  (0.013)  (0.014)  (0.012)  (0.013)  (0.013)
High income preferences     −0.054   −0.031   −0.046   −0.044   −0.044   −0.034
                            (0.011)  (0.012)  (0.012)  (0.011)  (0.012)  (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency


[Figure D1: legend bins for the number of elections per district: 0–13, 13–26, 26–39, 39–62, 62–164.]

Figure D1: Total number of union certification elections in House districts (109th-112th Congress)

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
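A population-weighted interpolation of this kind can be sketched as below. The county names, counts, and weights are hypothetical, and allocating each county's count in proportion to the share of its population living in a district is one plausible reading of the weighting described above, not a description of the exact procedure used.

```python
def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts.

    county_counts -- {county: count}
    weights       -- {(county, district): share of the county's population
                      that lives in the district}; shares of a county sum to 1
    """
    district_totals = {}
    for (county, district), share in weights.items():
        allocated = county_counts[county] * share
        district_totals[district] = district_totals.get(district, 0.0) + allocated
    return district_totals


# Hypothetical example: county A is split across two districts,
# county B lies entirely within district D2.
county_counts = {"A": 100, "B": 40}
weights = {("A", "D1"): 0.25, ("A", "D2"): 0.75, ("B", "D2"): 1.0}
district_counts = interpolate_to_districts(county_counts, weights)

# At-large case: every county is fully nested in one district,
# so the interpolation reduces to a simple sum.
at_large = interpolate_to_districts(
    {"A": 100, "B": 40}, {("A", "AL"): 1.0, ("B", "AL"): 1.0})
```

Dividing the resulting district counts by district population (in thousands) would give a per-1000-persons measure.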


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                           Low income        High income
                                           preferences       preferences          N
(1) Injective preference-roll call map     0.063 (0.013)    −0.041 (0.013)    11104
(2) Extreme preferences excl.              0.074 (0.016)    −0.048 (0.015)    13308
(3) New York excluded                      0.070 (0.015)    −0.048 (0.014)    14730
(4) Local Union Concentration              0.065 (0.014)    −0.047 (0.014)    15780
(5) Trimmed LPM estimator                  0.074 (0.015)    −0.055 (0.014)    15426
(6) Errors-in-variables                    0.062 (0.004)    −0.054 (0.004)    15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

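$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}$$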

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
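The selection logic can be restated compactly. The display below is only a schematic summary of the steps described above: the generic $\ell_1$-penalized criteria and the penalty levels $\lambda_1, \lambda_2$ are notational assumptions (the implementation described in the text uses the root-LASSO variant), and $\operatorname{supp}(\cdot)$ denotes the set of indices with non-zero coefficients.

$$I_1 = \operatorname{supp}\Big(\arg\min_{\beta} \sum_{j,d} (y_{jd} - w_d'\beta)^2 + \lambda_1 \lVert \beta \rVert_1\Big), \qquad I_2 = \operatorname{supp}\Big(\arg\min_{\beta} \sum_{d} (U_d - w_d'\beta)^2 + \lambda_2 \lVert \beta \rVert_1\Big)$$

$$I = I_1 \cup I_2, \qquad y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + \sum_{k \in I} w_{dk}\beta_k + \text{error} \quad \text{(estimated by OLS)}$$

A control that predicts union density well enters through $I_2$ even if its coefficient in the outcome equation is too small to be picked up in $I_1$, which is what guards against the selection errors discussed above.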

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).³² Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G. Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}\ \theta^h_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion

$$c^* = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

³² For a very general discussion, see Belloni et al. (2017).
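The closed-form estimator and the leave-one-out choice of $\lambda$ can be sketched as follows. This is an illustrative sketch on simulated data, not the paper's code. It assumes standardized covariates, sets $\sigma^2$ to the number of columns, searches a small hypothetical grid `lambdas`, and uses the standard ridge-type shortcut that the leave-one-out residual for observation $i$ equals $c_i$ divided by the $i$-th diagonal element of $(K + \lambda I)^{-1}$.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma2)

def krls_fit(Z, y, lambdas):
    """Return weights c, the chosen lambda, and the kernel matrix K."""
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    n, D = Z.shape
    K = gaussian_kernel(Z, sigma2=float(D))       # sigma^2 = number of columns
    best = None
    for lam in lambdas:
        G_inv = np.linalg.inv(K + lam * np.eye(n))
        c = G_inv @ y                              # c* = (K + lam I)^{-1} y
        loo_resid = c / np.diag(G_inv)             # leave-one-out residuals
        loss = float(loo_resid @ loo_resid)
        if best is None or loss < best[0]:
            best = (loss, c, lam)
    return best[1], best[2], K

# Hypothetical data: a nonlinear surface in two covariates plus noise.
rng = np.random.default_rng(2)
Z = rng.normal(size=(60, 2))
y = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] + rng.normal(scale=0.1, size=60)
lambdas = [0.01, 0.1, 1.0, 10.0]
c, lam, K = krls_fit(Z, y, lambdas)
fitted = K @ c                                     # in-sample predictions, as in (G1)
```

Inverting the $N \times N$ matrix once per candidate $\lambda$ is affordable at this scale; for large samples an eigendecomposition of $K$ computed once would be the more economical route.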

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

$$\frac{\partial y}{\partial z^{U_d}_j} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right)\left(Z^{U_d}_d - z^{U_d}_j\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
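A sketch of the pointwise derivative computation, under stated assumptions: the weights `c` below are random stand-ins rather than fitted KRLS weights, and the data are simulated. The code differentiates the fitted surface $f(z) = \sum_i c_i \exp(-\lVert z_i - z \rVert^2 / \sigma^2)$ directly, which gives the factor $(2/\sigma^2)(z_i^{(d)} - z_j^{(d)})$ inside the sum, and the result can be checked against a finite difference. The sign convention should be compared with (G4) before reusing the code.

```python
import numpy as np

def krls_surface(z, Z, c, sigma2):
    """Fitted value f(z) = sum_i c_i * exp(-||Z_i - z||^2 / sigma2)."""
    return float(c @ np.exp(-((Z - z) ** 2).sum(axis=1) / sigma2))

def krls_derivative(Z, c, sigma2, d):
    """Derivative of the fitted surface with respect to covariate d,
    evaluated at every observed row of Z (one value per observation)."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / sigma2)                   # K[i, j]
    diff = Z[:, None, d] - Z[None, :, d]       # diff[i, j] = z_i[d] - z_j[d]
    return (2.0 / sigma2) * (c[:, None] * K * diff).sum(axis=0)

rng = np.random.default_rng(3)
Z = rng.normal(size=(40, 3))       # hypothetical standardized covariates
c = rng.normal(size=40)            # stand-in for fitted KRLS weights
sigma2 = float(Z.shape[1])
deriv = krls_derivative(Z, c, sigma2, d=2)
```

Plotting `deriv` against the covariate of interest, with a smoother on top, is the kind of display the text describes for Figure IV.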

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levy (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, D.C.: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, D.C.: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public Opinion in State Politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The Collapse and Revival of American Community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs. Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

Pierson 2010; Schlozman 2015).⁴ Similarly, Schlozman et al. (2012, 87) conclude that unions are one of the few organizations in national politics "that advocate on behalf of the economic interest of workers who are not professionals or managers."

However, shared preferences between the less well-off and organized labor are by no means sufficient to alter inequalities in political representation in national politics. This requires an effective political transmission mechanism. To guide the empirical analysis, we sketch key elements of a framework of union organization and political responsiveness.

Labor unions are organizations formed to bargain collectively on behalf of their members with employers over wages and conditions. Unions are thus created at the local (i.e., establishment) level (Freeman and Medoff 1984). Once formed, unions may (and often do) enter the political arena. The ability of unions to increase the rate of political participation (including voting, contacting officials, attending rallies, or making donations) of low- and middle-income citizens is often considered to be their key channel of political influence. Importantly, unions may also increase participation among non-members with similar policy preferences through get-out-the-vote campaigns and social networks (Leighley and Nagler 2007; Rosenfeld 2014; Schlozman et al. 2012). Making contributions to favored candidates and campaigns complements the ability of unions to communicate with and mobilize members or to provide campaign volunteers. Indeed, unions are among the leading contributors to political action committees (PAC), accounting for a quarter of total PAC spending in 2009 (Schlozman et al. 2012, ch. 14). In contrast to corporations and business organizations, union contributions "represent the aggregation of a large number of small individual donations" (Schlozman et al. 2012, 428).⁵

The credible threat of political mobilization can affect policy decisions by representatives in two general ways. First, it may shape who is elected in a given electoral district. If politicians are not exchangeable (because they differ in their preferences and beliefs), political selection is important. In an age of elite polarization (McCarty et al. 2006), the partisan identity of a representative is often crucial for determining legislative voting (Bartels 2016; Lee et al. 2004). Since the New Deal era, unions and union members have largely allied with the Democratic Party, given its stronger support for many of their broader policy demands (Lichtenstein 2013; Schlozman 2015). Political selection might also shape other political characteristics of representatives, such as their class background or race (Butler 2014; Carnes 2013).

Second, unions' mobilization potential shapes the incentives of elected representatives beyond their partisan affiliation and personal traits. Policymakers' rational anticipation of public reactions plays a central role in theories of accountability and dynamic responsiveness (Arnold 1990; Stimson et al. 1995). While many individual legislative votes do not affect the reelection prospects of representatives, on potentially salient votes they can face hard choices between party ideology and competing constituency preferences. On international trade agreements, for instance, Democratic representatives have faced cross-pressures between a more skeptical stance taken by unions and low-income constituents versus that of their own party (Box-Steffensmeier et al. 1997). On the other side of the aisle, in the wake of the financial crisis, Republican legislators found themselves torn between their own partisan views on stimulus spending and the pressure from less well-off constituents (Mian et al. 2010).

⁴ This is consistent with the argument that organized labor fosters norms of solidarity and support for the less well-off through leadership (Ahlquist and Levy 2013; Kim and Margalit 2017) or social interactions (Berelson et al. 1954).

⁵ While evidence on the direct effect of contributions on legislative behavior is mixed, recent field-experimental results indicate that contributions help to provide access (Kalla and Broockman 2016) or sway congressional staffers (Hertel-Fernandez et al. 2018).

Politicians' incentives are also linked to information. Theories of representation emphasize that members of Congress, and especially the House, face numerous voting decisions in each term, and it would be unrealistic to assume that they have access to reliable, unbiased polling data on constituency preferences on all the issues they face (Arnold 1990; Miller and Stokes 1963). Instead, representatives, with the help of their staffers, rely on alternative methods to assess public opinion, including constituent correspondence, town halls, contacts with community leaders, or local interest groups (Miler 2007). In this limited information context, the strength of local unions may enhance the visibility and perception of constituent preferences (Hertel-Fernandez et al. 2018).⁶

Following seminal theories of congressional action (Arnold 1990; Miller and Stokes 1963), our argument emphasizes that the strength of local unions underpins a credible mobilization threat that impacts the action of candidates and legislators. Anticipating mobilizing efforts by unions, a potential candidate may not even enter into the race; an elected career-oriented politician might be pressured to alter his or her vote even without a full mobilization effort, as long as unions' mobilization capacity is visible. Thus, both campaign contributions and candidate selection should matter as a channel linking local union strength and representation, since they are linked to credible threats of mobilization.

Our argument implies that the district-level strength of labor unions increases the responsiveness by members of Congress to the less affluent. While we know from previous work that politicians are considerably more responsive to the preferences of the affluent than those of the less well-off, this bias should be reduced in districts with relatively higher union membership. Substantively, it is crucial to assess how far the presence of unions can move responsiveness toward the ideal of political equality.⁷

⁶ Butler and Nickerson (2011) find that politicians respond when provided with more accurate opinion data. However, behavioral biases may lead politicians to discount constituent preferences they disagree with (Butler and Dynes 2016).

⁷ In line with a large literature, we focus on union membership as a key component of union strength. In a study of the effect of unions on legislative ideology, rather than income-biased responsiveness, Becher et al. (2018) argue that the structure of local unions (i.e., the concentration of unions in a given locality) matters as well. However, they also show empirically that union density and concentration are separable dimensions.


III. Data and Empirical Strategy

Any effort to test the relevance of unions for unequal representation confronts major challenges of measurement and causal interpretation. The dataset we have compiled allows us to address these issues to an extent previously impossible. We have created a panel of legislators' roll call votes matched to income-specific policy preferences at the district level and district-level measures of union membership. Our main empirical strategy to examine the influence of unions on unequal representation is built on two basic pillars: district fixed effects and interactive controls. The fact that we observe several roll calls within a given congressional district allows us to specify a model with district fixed effects, which capture unobservable characteristics of districts (and states) that are constant over roll calls, such as historical legacies or the strength of partisan organization. To provide for a stricter test of the moderating effect of unions, we also allow a rich set of other district characteristics to moderate the link between income groups and legislators' voting behavior. This amounts to estimating models including interactions between observed district characteristics and group preferences. In our most flexible specification, we allow these to be non-linear (we describe our models in more detail below).
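The logic of combining district fixed effects with preference-by-union interactions can be illustrated on simulated data. This is a sketch of the general idea only, not the paper's specification: it uses a continuous outcome in place of a binary vote, a single moderator, and made-up coefficient values; `demean_by_group` is a hypothetical helper that applies the within transformation.

```python
import numpy as np

def demean_by_group(x, groups):
    """Within transformation: subtract each group's mean from its rows."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for g in np.unique(groups):
        rows = groups == g
        out[rows] -= x[rows].mean(axis=0)
    return out

# Simulated panel (all values hypothetical): 200 districts x 27 roll calls.
rng = np.random.default_rng(1)
n_dist, n_votes = 200, 27
district = np.repeat(np.arange(n_dist), n_votes)
union = rng.normal(size=n_dist)[district]        # standardized union density
alpha = rng.normal(size=n_dist)[district]        # unobserved district effect
theta_low = rng.uniform(size=district.size)      # low-income support
theta_high = rng.uniform(size=district.size)     # high-income support

# Order: low, high, low x union, high x union.
true = np.array([0.10, 0.60, 0.08, -0.05])
X = np.column_stack([theta_low, theta_high,
                     theta_low * union, theta_high * union])
y = alpha + X @ true + rng.normal(scale=0.1, size=district.size)

# Demeaning within districts removes alpha and anything else that is
# constant across roll calls, including the main effect of union density.
coef, *_ = np.linalg.lstsq(demean_by_group(X, district),
                           demean_by_group(y, district), rcond=None)
```

Because union density does not vary within a district, only its interactions with the group preferences are identified once fixed effects are included, which is why no main effect appears in `X`.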

The data required to implement these models were constructed in three steps. First, we match information on roll call items for 223,000 CCES respondents to actual roll call votes cast in the House of Representatives in the 109th to the 112th Congress.⁸ Second, we estimate policy preferences for low and high income constituents in each district for 27 roll calls. To deal with the fact that the CCES is not a representative sample of district populations, we use a small area estimation strategy combining the CCES sample with unit record Census data, matching the full distribution of age, education, gender, race, and income using a chained Random Forests algorithm (more below and in Appendix B). Third, we measure district-level union membership based on digitized administrative records from the Department of Labor.

III.A. CCES data and Congressional roll calls

The CCES is an ideal starting point for our analysis, since it is a nationally representative study, includes a considerable number of roll call questions, and provides us with a large enough sample size to decompose income-group preferences by district. It addresses several data concerns that plagued initial research on unequal responsiveness in Congress (Bhatti and Erikson 2011). The roll calls included in the CCES concern key votes as identified by Congressional Quarterly and the Washington Post and cover a broad range of issues (Ansolabehere and Jones 2010). Respondents are presented with the key wording of the bill (as used on the floor and in media reports) and are then asked to cast their own vote: "What about you? If you were faced with this decision, would you vote for, against, or not sure?" Contrary to widely used agree–disagree survey measures of issue preferences, matched roll call votes provide us with unequivocal evidence of policy congruence between respondent and legislator (Jessee 2009; Ansolabehere and Jones 2010, 585). We match 27 roll call items in the CCES to roll call votes cast in the House of the 109th to 112th Congress. These cover important legislative decisions, such as Dodd-Frank, the Affordable Care Act (and attempts to repeal it), the minimum wage increase, the ratification of the Central America Free Trade Agreement, or the Lilly Ledbetter Fair Pay Act. Table A1 in the Appendix lists all matched CCES items and House bills included in our estimation sample.

⁷ (continued) In this paper, we focus on union membership but show in a robustness test that our results still obtain when accounting for union concentration (see Table E1).

⁸ Our analysis focuses on one apportionment period, which generally holds district boundaries constant (we show that the results are robust to cases of mid-period redistricting).

III.B. Measuring constituency preferences by income group

The CCES provides us with a comparatively large sample size per district. However, an important potential issue is that it is not designed to be representative for congressional district populations. Thus, individuals with certain characteristics, such as particular combinations of income, race, and education, may be underrepresented in the CCES sample for a given district. If this is the case, unadjusted policy preferences from the CCES will not reflect the target population, and using them can lead to biased estimates of unequal representation in Congress, as politicians are held to the wrong benchmark. The solution to this issue is to employ some form of small area estimation to rebalance the survey sample to represent the district population. The machine-learning solution we propose is relatively new to the representation literature in political science, but it has some attractive features that merit its application to this topic. It does not require distributional and functional form assumptions, it allows for arbitrary higher-order interactions of covariates, and it can fully leverage fine-grained census data to construct representative samples of congressional districts. However, we stress that our findings do not depend on this particular approach. As shown in Online Appendix B, our approach leads to somewhat more conservative estimates of the impact of unions on the representation of different income groups compared to the MRP approach widely used by political scientists (Lax and Phillips 2009). Qualitatively, both approaches yield the same conclusions.

Our approach, small area estimation using chained random forests, matches CCES survey respondents to corresponding cases from unit record Census data. The design of the Census ensures an accurate representation of the distribution of population characteristics in a given district (Torrieri et al. 2014, Ch. 4). Matching these two data sources is essentially a prediction problem, which we address using a flexible non-parametric machine learning approach based on random forests (Stekhoven and Bühlmann 2011).⁹ Put simply, the idea is that rich census data exist for every district, whereas survey data on preferences are scarce in some districts and may not be fully representative. Using general machine learning tools, we can attach preferences to the Census by matching it to CCES respondents based on common demographic characteristics. The resulting data set of public preferences is representative of congressional districts.

⁹ Honaker and Plutzer (2016) use a similar approach (but relying on multivariate normal imputations) and further discuss its empirical performance in estimating small area attitudes and preferences.

Concretely, we use about 3 million individual-level records from a synthetic sample of the Census Bureau's American Community Survey from 2006 to 2011. We stack both datasets, creating a structure where we have common district identifiers and individual covariates, while responses to policy preference questions are missing in the Census portion of the data. As common covariates bridging CCES and Census, we use the following demographic characteristics: gender, race (3 categories), education (5 categories), age (continuous), and family income (continuous).¹⁰ The latter is of particular relevance, as we are interested in producing district–income group specific preferences.

In the next step, we fill missing roll call preferences in the Census with matching data from CCES respondents. Since this is essentially a prediction problem, we can use powerful tools developed in the machine learning literature to achieve this task. We use an algorithm proposed by Stekhoven and Bühlmann (2011), which uses chained random forests (Breiman 2001) to impute missing cells. Compared to commonly used multivariate normal or regression imputation techniques, this strategy has the advantage that it is fully nonparametric, allows for complex interactions between covariates, and deals with both continuous and categorical data (Tang and Ishwaran 2017). Our completed data set now contains preferences for 27 roll call items of synthetic 'Census individuals', which are a representative sample of each House district.

With these data in hand, we assign individuals to income groups and calculate group-specific preferences for each roll call in each district. Following previous work in the representation literature (Bartels 2008, 2016), we delineate low- and high-income respondents using the 33rd and 67th percentiles of the distribution of family incomes. Note that, in line with theories of constituency representation in Congress, we specify these income thresholds separately by congressional district. This accounts for the substantial differences in both average income and income inequality between U.S. districts. It also ensures that within each district, income groups are of comparable size. Online Appendix Table A2 shows the distribution of income-group cutoffs. On average, our chosen cutoffs are close to those used in the established literature. The mean of our district-specific low-income cutoffs is around $39,000, while Bartels uses $40,000 (Bartels 2016, 240); our mean high-income cutoff is around $81,000, where Bartels employs a threshold of $80,000. However, beyond these averages lies considerable variation. In some districts the 33rd percentile cutoff is as low as $16,500, while the 67th percentile reaches almost $160,000 in others.11
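A minimal sketch of district-specific income cutoffs follows. The percentile interpolation rule, the treatment of ties at the cutoffs, and the example incomes are assumptions made for illustration; the paper does not spell these details out here.

```python
def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def income_groups(records):
    """Label each record 'low', 'middle' or 'high' using the 33rd and 67th
    percentiles computed separately within its congressional district."""
    by_district = {}
    for r in records:
        by_district.setdefault(r["district"], []).append(r["income"])
    cutoffs = {d: (percentile(v, 33), percentile(v, 67))
               for d, v in by_district.items()}
    labeled = []
    for r in records:
        low_cut, high_cut = cutoffs[r["district"]]
        if r["income"] <= low_cut:        # assumption: ties count as low
            group = "low"
        elif r["income"] >= high_cut:     # assumption: ties count as high
            group = "high"
        else:
            group = "middle"
        labeled.append({**r, "group": group})
    return cutoffs, labeled

# Hypothetical incomes in a richer district "A" and a poorer district "B".
records = (
    [{"district": "A", "income": v} for v in (20000, 40000, 60000, 80000, 100000)]
    + [{"district": "B", "income": v} for v in (10000, 15000, 20000)]
)
cutoffs, labeled = income_groups(records)
```

In this toy example a family income of 20,000 falls into the low group in district A but into the high group in district B, which is the point of computing thresholds district by district.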

10 See Appendix B for more details on the construction of our Census sample and our matching/imputation procedure.

11 Results are relatively invariant to using alternative income thresholds (see Table C1).


Figure I: District-level income gap in public support for 6 selected policies
[Six histogram panels: Increase Minimum Wage; Housing Crisis Assistance; Fair Pay Act; Affordable Care Act; CAFTA Ratification; Recovery and Reinvestment.]

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district's income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by \((\theta^l, \theta^h)\), as the proportion of individuals voting 'yea'. Since preference estimates are in [0, 1], they can be directly related to legislators' probability of voting 'yea' on a given roll call. Our data show considerable variation in the distance between the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences \((\theta^l - \theta^h)\) in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, the vast majority of low-income constituents are more supportive than their high-income counterparts in each and every district. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly in favor. In all examples, we find considerable across-district variation in the preference gap between low- and high-income constituents.12 We will employ this variation over both roll calls and districts to estimate legislators' differential responsiveness to changes in policy preferences of different income groups and how it might be moderated by union strength.

12 Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.
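The group preference estimates and the district-level gap can be computed as simple shares. The sketch below assumes records that already carry an income-group label and a 0/1 `yea` indicator (both field names are hypothetical), and it defines the gap as low-income minus high-income support, which is the orientation implied by the Figure I axes.

```python
def group_preferences(rows):
    """Return {district: (theta_low, theta_high, gap)} for one roll call.

    theta is the share of a group's members supporting 'yea';
    gap = theta_low - theta_high (positive when the poor are more supportive).
    Assumes every district has at least one low- and one high-income record.
    """
    tally = {}
    for r in rows:
        if r["group"] not in ("low", "high"):
            continue  # the middle third does not enter the two estimates
        t = tally.setdefault(r["district"], {"low": [0, 0], "high": [0, 0]})
        t[r["group"]][0] += r["yea"]
        t[r["group"]][1] += 1
    prefs = {}
    for district, t in tally.items():
        theta_l = t["low"][0] / t["low"][1]
        theta_h = t["high"][0] / t["high"][1]
        prefs[district] = (theta_l, theta_h, theta_l - theta_h)
    return prefs

# Hypothetical district "A": 3 of 4 low-income and 1 of 4 high-income
# individuals support the bill.
rows = (
    [{"district": "A", "group": "low", "yea": y} for y in (1, 1, 1, 0)]
    + [{"district": "A", "group": "middle", "yea": 1}]
    + [{"district": "A", "group": "high", "yea": y} for y in (1, 0, 0, 0)]
)
prefs = group_preferences(rows)
```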

III.C District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report or reporting falsified information is a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit nonresponse (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.14 Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
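The aggregation step amounts to summing reported members over all locals matched to a district. The sketch below also applies the log and z-standardization used later in the model. The record layout, the example figures, the use of the population standard deviation, and the assumption that every district has a positive member count are all illustrative choices, and the averaging over Congresses is omitted.

```python
import math
from collections import defaultdict

def district_union_strength(reports):
    """Sum reported members per district, then log and z-standardize."""
    members = defaultdict(int)
    for rep in reports:
        members[rep["district"]] += rep["members"]
    # Assumes every district has a positive member count (log is undefined at 0).
    logged = {d: math.log(m) for d, m in members.items()}
    mean = sum(logged.values()) / len(logged)
    sd = math.sqrt(sum((v - mean) ** 2 for v in logged.values()) / len(logged))
    z = {d: (v - mean) / sd for d, v in logged.items()}
    return dict(members), z

# Hypothetical geocoded LM-form records for four union locals.
reports = [
    {"union": "Local 1", "district": "OH-09", "members": 1200},
    {"union": "Local 2", "district": "OH-09", "members": 800},
    {"union": "Local 3", "district": "TX-01", "members": 500},
    {"union": "Local 4", "district": "WA-07", "members": 4000},
]
members, z = district_union_strength(reports)
```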

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost complete coverage, as during the latter part of the twentieth century unions increasingly organized workers across different sectors and occupations (Lichtenstein 2013, 249).15

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.

Figure II: Union membership in House districts, 109th-112th Congress
[Legend: 1st quartile, 2nd quartile, 3rd quartile, 4th quartile]

III.D Statistical specifications

For each roll call vote \(j\) (\(j = 1, \dots, J\)), we have measured preferences of low- and high-income citizens in a given congressional district \(d\) (\(d = 1, \dots, D\)), denoted by \((\theta^l_{jd}, \theta^h_{jd})\). For each district, the level of (logged) union membership is denoted by \(U_d\). Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in \(X_d\). Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts' workers to organize collective action, as well as (iv) non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House members is the following linear probability specification:

\[
y_{ijd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l (U_d \times \theta^l_{jd}) + \eta^h (U_d \times \theta^h_{jd}) + \beta^l (X_d \times \theta^l_{jd}) + \beta^h (X_d \times \theta^h_{jd}) + \alpha_d + \epsilon_{ijd}
\]

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, \(U_d \theta^h_{jd}\) and \(U_d \theta^l_{jd}\). Thus, when \(\eta^l\) and \(\eta^h\) are zero, the group-specific preference coefficients \(\mu^l\) and \(\mu^h\) indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient \(\eta^l\) indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by \(\eta^h\). Our theoretical expectation is that \(\eta^l > 0\) and \(\eta^h \le 0\).

15 While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with the widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, D.C., which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 0.86).
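Differentiating the linear probability model with respect to each group's preferences makes the role of the interaction coefficients explicit. The display below is a worked restatement of the specification above, not an additional result; when \(X_d\) is a vector, \(\beta^l X_d\) and \(\beta^h X_d\) are read as inner products, and the label "gap" is introduced here for illustration.

```latex
\begin{align*}
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^l_{jd}}
  &= \mu^l + \eta^l U_d + \beta^l X_d, \\
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^h_{jd}}
  &= \mu^h + \eta^h U_d + \beta^h X_d, \\
\text{gap}_d
  &= (\mu^h - \mu^l) + (\eta^h - \eta^l)\, U_d + (\beta^h - \beta^l)\, X_d .
\end{align*}
```

Because all inputs are standardized to mean zero, the marginal effects in a district with average union membership and average covariates reduce to \(\mu^l\) and \(\mu^h\). With \(\eta^l > 0\) and \(\eta^h \le 0\), the term \((\eta^h - \eta^l) U_d\) is negative for above-average \(U_d\), so the responsiveness gap shrinks as union membership rises.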

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects \(\alpha_d\).16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and the state level) and group preferences, \(X_d \theta^l_{jd}\) and \(X_d \theta^h_{jd}\). They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses detailed below, we allow these confounds to be strongly non-linear as well. Finally, \(\epsilon_{ijd}\) are white-noise errors assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
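The mechanics of district fixed effects and district-clustered standard errors can be sketched for the simplest case of a single regressor. This is an illustration under stated simplifications, not the paper's estimation code: it uses one preference variable `x` instead of the full set of interactions, applies no finite-sample correction to the clustered variance, and runs on made-up data.

```python
import math
from collections import defaultdict

def within_ols_clustered(rows):
    """Slope of y on x with district fixed effects (within transformation)
    and a district-clustered standard error, for a single regressor."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["district"]].append((r["x"], r["y"]))
    # Demeaning within district absorbs the fixed effect alpha_d.
    demeaned = {}
    for d, obs in groups.items():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        demeaned[d] = [(x - mx, y - my) for x, y in obs]
    sxx = sum(x * x for obs in demeaned.values() for x, _ in obs)
    sxy = sum(x * y for obs in demeaned.values() for x, y in obs)
    slope = sxy / sxx
    # Cluster "meat": squared within-district sums of x times the residual.
    meat = sum(
        sum(x * (y - slope * x) for x, y in obs) ** 2
        for obs in demeaned.values()
    )
    se = math.sqrt(meat) / sxx  # no small-sample adjustment (simplification)
    return slope, se

# Hypothetical votes (y) and group preferences (x) over four roll calls
# in each of two districts.
rows = (
    [{"district": "A", "x": x, "y": y}
     for x, y in ((0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1))]
    + [{"district": "B", "x": x, "y": y}
       for x, y in ((0.1, 0), (0.3, 1), (0.5, 1), (0.7, 1))]
)
slope, se = within_ols_clustered(rows)
```

Only variation over roll calls within a district identifies the slope, which is why non-interacted district-level variables drop out of such a model.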

IV Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above with district fixed effects, but without accounting for local union organization (setting \(\beta^l, \beta^h\) and \(\eta^l, \eta^h\) to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase of 13.6 (±1.2) percentage points in the probability that legislators cast a corresponding vote. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in \(\alpha_d\).

IV.A Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership, with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between the affluent and the poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preference-confounder interactions (setting \(\beta^l\) and \(\beta^h\) to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.
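Under the baseline model, responsiveness to each group is a linear function of standardized union membership. The sketch below evaluates that schedule using the interaction coefficients of Table I, specification (1). The intercept terms are an assumption: they are borrowed from the no-moderator baseline reported earlier (0.016 and 0.136), because the main effects of specification (1) are not reported in this section, so the output is illustrative rather than a reproduction of Figure III.

```python
ETA_LOW, ETA_HIGH = 0.106, -0.063  # Table I, specification (1)
MU_LOW, MU_HIGH = 0.016, 0.136     # ASSUMPTION: taken from the no-moderator baseline

def responsiveness(u):
    """Marginal effect of low- and high-income preferences on the vote
    at standardized (logged) union membership u."""
    low = MU_LOW + ETA_LOW * u
    high = MU_HIGH + ETA_HIGH * u
    return low, high

# Evaluate one standard deviation below, at, and above the mean.
schedule = {u: responsiveness(u) for u in (-1.0, 0.0, 1.0)}
for u, (low, high) in schedule.items():
    print(f"U = {u:+.1f}: low = {low:.3f}, high = {high:.3f}, gap = {high - low:.3f}")
```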

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018; Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy, taken from Flavin and Hartney (2015): the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955. These enter \(X_d\) and are interacted with income group preferences \(\theta^l\) and \(\theta^h\). In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in \(X_d\), which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

17 Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors at the district level. See also specification (1) in Table I below.

Figure III: District-level union membership as moderator of unequal representation
[Two panels: Low income constituents; High income constituents. x-axis: Union membership [std], with p10, p25, p50, p75, p90 marked; y-axis: Marginal effect.]

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

Table I: Union density and representation. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)
Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)
District fixed effects           X         X         X         X         X         X
Group preferences
  × union policy                           X                                       X
  × state constants                                  X
  × organizational capacity                                    X                   X
  × district covariates                                                  X         X

Note: N = 15,780; \(N_d\) = 534; 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, \(\eta^l\) and \(\eta^h\). Specifications (2) to (5) include coefficients for interactions (\(\beta^l, \beta^h\)) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements; (3) interacts preferences with state fixed effects; (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections; (5) includes a large set of district-level characteristics (population size; degree of urbanization; shares of female, Black, Hispanic, BA degrees, employed in manufacturing; as well as median household income). Specification (6) includes all of the previously described measured variables.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

18 Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59%.

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.

Table II: Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

                                        Low income        High income
(1a) Social capital: bowling alleys     0.067 (0.014)     −0.051 (0.013)
(1b) Social capital: index              0.065 (0.014)     −0.048 (0.013)
(2)  Redistricting                      0.067 (0.014)     −0.051 (0.013)
(3)  MRP estimated preferences          0.115 (0.022)     −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for \(\eta^l\) and \(\eta^h\). Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts; N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred; N = 14,150. Specification (3) uses preferences estimated using MRP; see Appendix B for more details; N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here, we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low-income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high-income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained random forests. In the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher-order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far and "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
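The logic of the estimator can be summarized in generic form. The display below is a schematic sketch in the spirit of Belloni et al. (2014); the notation (\(d_i\), \(x_i\), \(\gamma_y\), \(\gamma_d\), \(\hat{I}_y\), \(\hat{I}_d\)) is introduced here for illustration and is not taken from the paper. Here \(d_i\) stands for one treatment term (for instance, the interaction of union membership with a group's preferences) and \(x_i\) for the high-dimensional control vector.

```latex
% Notation is illustrative and does not appear in the paper.
\begin{align*}
\text{Outcome:}   \quad & y_i = \alpha\, d_i + x_i^\top \gamma_y + \zeta_i \\
\text{Treatment:} \quad & d_i = x_i^\top \gamma_d + v_i \\[4pt]
\text{Step 1:} \quad & \hat{I}_y = \text{controls selected by a LASSO of } y_i \text{ on } x_i \\
\text{Step 2:} \quad & \hat{I}_d = \text{controls selected by a LASSO of } d_i \text{ on } x_i \\
\text{Step 3:} \quad & \text{OLS of } y_i \text{ on } d_i \text{ and }
                       \{x_{ik} : k \in \hat{I}_y \cup \hat{I}_d\}
\end{align*}
```

Taking the union of both selected sets is what protects against moderate selection mistakes: a control is dropped only if it is weakly related to both the outcome and the treatment, in which case omitting it induces little bias in \(\hat{\alpha}\).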

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III: Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups.

                              (1)       (2)       (3)
Low income preferences       0.063     0.066     0.062
                            (0.014)   (0.017)   (0.016)
High income preferences     −0.054    −0.036    −0.040
                            (0.013)   (0.015)   (0.016)
Semi-parametric terms         144       312       624
Post-LASSO terms               18        45       112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on a 50% sample, parameter estimates performed on the remaining 50% (N = 7,884). Table entries are estimates for \(\eta^l\) and \(\eta^h\) with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our \(\eta\) terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check whether the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of the predicted vote with respect to district preferences and examine how they vary with levels of union membership (Hainmueller and Hazlett 2014, 156).
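The idea of recovering an interaction from pointwise derivatives, without ever specifying a product term, can be sketched with a small Gaussian-kernel ridge fit. This is an illustration under assumptions, not the paper's implementation: the toy outcome is constructed as preference times union membership, the kernel bandwidth is set to the number of covariates, and the regularization parameter is fixed by hand rather than chosen by cross-validation.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def kernel(u, v, sigma2):
    """Gaussian kernel exp(-||u - v||^2 / sigma2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / sigma2)

def krls_fit(X, y, lam, sigma2):
    """Choice coefficients c solving (K + lam * I) c = y."""
    n = len(X)
    K = [[kernel(X[i], X[j], sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def partial_derivative(X, c, point, dim, sigma2):
    """d f / d x_dim at `point`, where f(x) = sum_i c_i * k(x, X_i)."""
    return sum(-2.0 / sigma2 * ci * kernel(point, xi, sigma2) * (point[dim] - xi[dim])
               for ci, xi in zip(c, X))

# Toy data on a 3 x 3 grid: covariates are (preference, union membership)
# and the outcome is their product, so the true effect of preferences
# equals the level of union membership. No product term is given to the model.
X = [(t, u) for t in (0.0, 0.5, 1.0) for u in (-1.0, 0.0, 1.0)]
y = [t * u for t, u in X]
SIGMA2, LAM = 2.0, 0.01  # assumptions: bandwidth = number of covariates; fixed penalty
c = krls_fit(X, y, lam=LAM, sigma2=SIGMA2)

d_low_union = partial_derivative(X, c, (0.5, -1.0), 0, SIGMA2)
d_high_union = partial_derivative(X, c, (0.5, 1.0), 0, SIGMA2)
```

In this toy setting the estimated derivative with respect to preferences is negative at low union membership and positive at high union membership, even though the fitted function contains only additive kernel terms.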

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], with markers at percentiles p10, p25, p50, p75, p90; y-axis: Partial effect.]

Figure IV: Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, on the view that public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly smaller. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV: Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups

                                  Low income         High income

(A) Private vs. public unions
    Public unions                 0.074 (0.016)      −0.058 (0.015)
    Non-public unions             0.054 (0.016)      −0.027 (0.016)

(B) Bill ideology
    Conservative bill             0.086 (0.017)      −0.086 (0.018)
    Liberal bill                  0.052 (0.014)      −0.028 (0.013)

(C) AFL-CIO endorsement
    No position                   0.054 (0.014)      −0.054 (0.013)
    Endorsement                   0.077 (0.015)      −0.040 (0.014)

Note: Estimates for ηL and ηH with cluster-robust standard errors in parentheses. N=15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In Panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

23 Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for ηH is smaller for endorsed bills but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that, under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
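The conversion from a log-scale coefficient to dollar amounts can be illustrated with Duan's (1983) smearing estimator. The sketch below is a minimal, self-contained illustration on made-up numbers; the helper names (`ols_fit`, `duan_smearing_factor`, `level_effect`) and the toy data are hypothetical and are not the paper's code or data.

```python
import math

def ols_fit(x, y):
    """One-regressor OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def duan_smearing_factor(residuals):
    """Duan's smearing factor: the average of exp(residual) on the log scale."""
    return sum(math.exp(r) for r in residuals) / len(residuals)

def level_effect(intercept, slope, residuals, x_from, x_to):
    """Change in the predicted level of the outcome when x moves from x_from to x_to.

    Predictions on the log scale are exponentiated and multiplied by the
    smearing factor, which corrects the retransformation bias of exp().
    """
    smear = duan_smearing_factor(residuals)
    return smear * (math.exp(intercept + slope * x_to)
                    - math.exp(intercept + slope * x_from))

# Hypothetical toy data: standardized union membership and log contributions.
union_std = [-1.5, -0.5, 0.0, 0.5, 1.5]
log_contrib = [10.9, 11.2, 11.1, 11.4, 11.5]

a, b = ols_fit(union_std, log_contrib)
resid = [yi - (a + b * xi) for xi, yi in zip(union_std, log_contrib)]
# Dollar change associated with a one standard deviation increase from the mean.
effect = level_effect(a, b, resid, 0.0, 1.0)
```

The paper's regressions additionally include state fixed effects and, in one specification, district controls; the retransformation step itself is unchanged by those additions.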

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V: Labor contributions and selection of Democratic legislators

                                        (1)        (2)        (3)        (4)

A: Contributions channel
                                        DV: Contrib.          DV: roll call
Union membership                        0.056      0.046
                                       (0.012)    (0.014)
Contributions × low income prefs                              0.946      0.865
                                                             (0.036)    (0.034)
Contributions × high income prefs                            −0.735     −0.714
                                                             (0.029)    (0.031)

B: Selection channel
                                        DV: Democrat          DV: roll call
Union membership                        0.161      0.106
                                       (0.024)    (0.023)
Democrat × low income prefs                                   0.576      0.542
                                                             (0.012)    (0.015)
Democrat × high income prefs                                 −0.411     −0.423
                                                             (0.013)    (0.015)

District controls                                  X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15,780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15,776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger, to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked: who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and that it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, we consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each Congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1: Matched CCES–House roll calls included in our analysis

Match Bill      Date        Yea-Nay  Ideology†  Name
(1)   HR 810    07/19/2006  235-193  L          Stem Cell Research Enhancement Act (Presidential Veto override)
(1)   HR 3      01/11/2007  253-174  L          Stem Cell Research Enhancement Act of 2007 (House)
(1)   S 5       06/07/2007  247-176  L          Stem Cell Research Enhancement Act of 2007
(2)   HR 2956   07/12/2007  223-201  L          Responsible Redeployment from Iraq Act
(3)   HR 2      01/10/2007  315-116  L          Fair Minimum Wage Act
(4)   HR 4297   12/08/2005  234-197  C          Tax Relief Extension Reconciliation Act (Passage)
(4)   HR 4297   05/10/2006  244-185  C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)   HR 3045   07/28/2005  217-215  C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)   S 1927    08/04/2007  227-183  C          Protect America Act
(6)   HR 6304   06/20/2008  293-129  C          FISA Amendments Act of 2008
(7)   HR 3162   08/01/2007  225-204  L          Children's Health and Medicare Protection Act
(7)   HR 976    10/18/2007  273-156  L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)   HR 3963   01/23/2008  260-152  L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)   HR 2      02/04/2009  290-135  L          Children's Health Insurance Program Reauthorization Act
(8)   HR 3221   07/23/2008  272-152  L          Foreclosure Prevention Act of 2008
(9)   HR 3688   11/08/2007  285-132  C          United States-Peru Trade Promotion Agreement
(10)  HR 1424   10/03/2008  263-171  L          Emergency Economic Stabilization Act of 2008
(11)  HR 3080   10/12/2011  278-151  C          To implement the United States-Korea Trade Agreement
(12)  HR 3078   10/12/2011  262-167  C          To implement the United States-Colombia Trade Promotion Agreement
(13)  HR 2346   06/16/2009  226-202  L          Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report)
(14)  HR 2831   07/31/2007  225-199  L          Lilly Ledbetter Fair Pay Act
(14)  HR 11     01/09/2009  247-171  L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)  S 181     01/27/2009  250-177  L          Lilly Ledbetter Fair Pay Act of 2009
(15)  HR 1913   04/29/2009  249-175  L          Local Law Enforcement Hate Crimes Prevention Act
(16)  HR 1      02/13/2009  246-183  L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)  HR 2454   06/26/2009  219-212  L          American Clean Energy and Security Act
(18)  HR 3590   03/21/2010  220-212  L          Patient Protection and Affordable Care Act
(19)  HR 3962   11/07/2009  221-215  L          Affordable Health Care for America Act
(20)  HR 4173   06/30/2010  237-192  L          Wall Street Reform and Consumer Protection Act of 2009
(21)  HR 2965   12/15/2010  250-175  L          Don't Ask, Don't Tell Repeal Act of 2010
(22)  S 365     08/01/2011  269-161  C          Budget Control Act of 2011
(23)  H CR 34   04/15/2011  235-193  C          House Budget Plan of 2011
(24)  H CR 112  03/28/2012  38-382   C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)  HR 8      08/01/2012  170-257  L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)  HR 2      01/19/2011  245-189  C          Repealing the Job-Killing Health Care Law Act
(26)  HR 6079   07/11/2012  244-185  C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)  HR 1938   07/26/2011  279-147  C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one. "Yea-Nay" is the House vote.
† Coding of a bill's ideological character as (L)iberal or (C)onservative, based on predominant support of the bill by Democratic or Republican representatives, respectively.


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value

            33rd percentile               67th percentile
Congress    Mean     Min      Max         Mean     Min      Max
109         38123    16800    73675       77964    39612    146870
110         40127    18000    77000       83047    43600    155113
111         39021    17500    78262       82440    46000    160050
112         37381    16500    81000       79868    38500    158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

                                      Mean     SD       Min      Max       N
Roll-call vote: yea                   0.568    0.495    0.000    1.000     15780
Constituent preferences
  Low income                          0.593    0.220    0.047    0.979     15934
  High income                         0.555    0.198    0.037    0.967     15934
  Low-High Gap                        0.172    0.121    0.000    0.588     15934
Union membership [log]                9.705    1.046    6.094    13.619    15934
Population                            7.022    0.723    4.697    9.980     15934
Share African American                0.124    0.146    0.004    0.680     15934
Share Hispanic                        0.156    0.174    0.005    0.812     15934
Share BA or higher                    0.275    0.097    0.073    0.645     15934
Median income [$10,000]               5.177    1.356    2.282    10.439    15934
Share female                          0.508    0.010    0.462    0.543     15934
Manufacturing share                   0.110    0.047    0.025    0.281     15934
Urbanization                          0.790    0.199    0.213    1.000     15934
Certification elections [log]         3.347    0.861    0.000    5.100     15934
Congregations [per 1,000 persons]     0.765    1.147    0.062    6.453     15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1,000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \ldots, Z_K)$ for $p = 1, \ldots, P$, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \ldots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
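The final averaging step can be sketched as follows. This is a minimal illustration that assumes preferences have already been filled in for each Census record; the record fields (`district`, `income`, `pref`), the helper names, and all numbers are hypothetical, and the treatment of incomes exactly at a threshold is an assumption of this sketch rather than something the text specifies.

```python
from collections import defaultdict

def income_group(income, low_cut, high_cut):
    """Classify an income relative to district-specific tercile thresholds.

    Assumption of this sketch: values at or below the lower threshold count
    as "low", values above the upper threshold count as "high".
    """
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

def district_group_means(records, thresholds):
    """Average imputed preferences by (district, income group)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of prefs, count]
    for rec in records:
        low_cut, high_cut = thresholds[rec["district"]]
        key = (rec["district"], income_group(rec["income"], low_cut, high_cut))
        totals[key][0] += rec["pref"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical completed Census records for one district.
records = [
    {"district": "XX-01", "income": 15000, "pref": 1.0},
    {"district": "XX-01", "income": 30000, "pref": 0.5},
    {"district": "XX-01", "income": 60000, "pref": 0.5},
    {"district": "XX-01", "income": 120000, "pref": 0.0},
    {"district": "XX-01", "income": 200000, "pref": 0.5},
]
thresholds = {"XX-01": (38000, 78000)}  # hypothetical 33rd/67th percentile cut points
means = district_group_means(records, thresholds)
```

Simple unweighted means suffice here only because, as the text notes, the Census sample is representative of each district by design.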

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1: schematic of a stacked data matrix. Rows are units $1, \ldots, m$ (CCES) followed by units $m+1, \ldots, m+n$ (Census). Columns are covariates $Z_{i1}, \ldots, Z_{iK}$ and preferences $Y_{i1}, \ldots, Y_{iP}$. Covariates are observed for all units; preferences are observed for CCES units and missing (NA) for Census units. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict $Y^*_p = f(Z)$ for the Census rows.]

Figure B1: Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative on the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The "donor pool" for this synthetic sample consists of the 1% extracts of the American Community Survey, 2006-2011. We limit the sample to non-group-quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).

26 The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the
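The midpoint Pareto transformation of binned income described above can be sketched as follows. The tail formula used here is the common textbook variant that fits a Pareto shape parameter from the last closed bin and the open-ended top bin; whether the paper uses exactly this variant is an assumption, and the bin boundaries and counts below are hypothetical rather than the actual CCES categories.

```python
import math

def pareto_top_bin_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail.

    The shape parameter alpha is estimated from the counts in the last closed
    bin (n_prev, starting at lower_prev) and the top bin (n_top, starting at
    lower_top). The Pareto mean above lower_top is lower_top * alpha / (alpha - 1),
    which is finite only for alpha > 1.
    """
    alpha = ((math.log(n_prev + n_top) - math.log(n_top))
             / (math.log(lower_top) - math.log(lower_prev)))
    if alpha <= 1:
        raise ValueError("estimated Pareto shape <= 1: tail mean is not finite")
    return lower_top * alpha / (alpha - 1)

def bin_scores(bins, counts):
    """Assign one continuous income score to every bin.

    bins: list of (lower, upper) pairs with exclusive upper bounds; the last
    bin is open-ended and has upper = None. counts: respondents per bin.
    """
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]
    scores.append(pareto_top_bin_mean(bins[-2][0], bins[-1][0],
                                      counts[-2], counts[-1]))
    return scores

# Hypothetical bins and counts (not the 14 CCES categories).
bins = [(0, 10000), (10000, 20000), (20000, 30000), (30000, 75000),
        (75000, 150000), (150000, None)]
counts = [50, 120, 200, 400, 300, 100]
scores = bin_scores(bins, counts)
```

With these made-up counts, the estimated shape parameter is 2, so the top bin is scored at twice its lower bound; closed bins receive their midpoints, matching the $25,000 example in the text.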

Algorithm details For easier exposition dene a matrix D that contains both individualcharacteristics and roll call preferences Let N be the number of rows of D For any givenvariable v of D Dv with missing entries at locations i(v)mis sube 1 N we can separate outfour parts27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion c (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1: Start with initial guesses of missing values in D
 2: w ← vector of column indices sorted by increasing fraction of NA
 3: while not c do
 4:     D^imp_old ← previously imputed D
 5:     for v in w do
 6:         Fit Random Forest: y^(v)_obs ∼ x^(v)_obs
 7:         Predict y^(v)_mis using x^(v)_mis
 8:         D^imp_new ← updated imputed matrix using predicted y^(v)_mis
 9:     Update stopping criterion c
10: Return completed D^imp
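The control flow of the chained scheme can be sketched in a few lines. This is an illustration under loud assumptions, not the implementation used for the paper: to stay dependency-free, a simple k-nearest-neighbour average is swapped in for the random forest, the toy data are hypothetical, and all function names are invented for this example. Only the cycling logic (initial guesses, ordering by missingness, refit and re-predict until the filled-in values stop changing) mirrors the steps above.

```python
def knn_predict(train_x, train_y, query, k=2):
    """Stand-in learner (NOT a random forest): mean of the k nearest rows."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    nearest = [y for _, y in ranked[:k]]
    return sum(nearest) / len(nearest)


def others(row, v):
    """All entries of a row except column v."""
    return [value for u, value in enumerate(row) if u != v]


def chained_impute(data, max_iter=10, tol=1e-8):
    """Fill None entries column by column until imputations stop changing."""
    n_cols = len(data[0])
    missing = {v: [i for i, row in enumerate(data) if row[v] is None]
               for v in range(n_cols)}
    filled = [list(row) for row in data]

    # Step 1: initial guesses (mean of the observed values in each column).
    for v, rows in missing.items():
        observed = [row[v] for row in data if row[v] is not None]
        for i in rows:
            filled[i][v] = sum(observed) / len(observed)

    # Step 2: visit columns by increasing number of missing entries.
    order = sorted((v for v in missing if missing[v]),
                   key=lambda v: len(missing[v]))

    # Steps 3-9: refit and re-predict until the stopping criterion is met.
    for _ in range(max_iter):
        change = 0.0
        for v in order:
            observed_rows = [i for i in range(len(filled))
                             if i not in missing[v]]
            train_x = [others(filled[i], v) for i in observed_rows]
            train_y = [filled[i][v] for i in observed_rows]
            for i in missing[v]:
                new_value = knn_predict(train_x, train_y, others(filled[i], v))
                change += (new_value - filled[i][v]) ** 2
                filled[i][v] = new_value
        if change < tol:
            break
    return filled


# Toy data: the second column is twice the first, with one missing entry.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [4.0, 8.0], [5.0, 10.0]]
completed = chained_impute(data)
```

In practice the learner would be a random forest fitted to the observed rows of each variable, and the stopping rule would compare successive imputed matrices as described in the algorithm.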

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


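model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$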

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
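The poststratification step amounts to a population-weighted average of cell-level predictions. The sketch below illustrates it for a single district and a single income group under stated assumptions: the coefficient values, the two demographic dimensions, and the Census cell counts are all hypothetical, and the function names are invented for this example.

```python
import math


def inv_logit(x):
    """Inverse logit link used in equation (B1)."""
    return 1.0 / (1.0 + math.exp(-x))


def poststratify(cell_predictions, cell_counts):
    """Population-weighted mean of predicted support across cells."""
    total = sum(cell_counts[cell] for cell in cell_predictions)
    weighted = sum(cell_predictions[cell] * cell_counts[cell]
                   for cell in cell_predictions)
    return weighted / total


# Hypothetical estimates for one roll call in one district.
beta0 = -0.2
alpha_educ = {"hs_or_less": -0.3, "college": 0.4}
alpha_gender = {"female": 0.1, "male": -0.1}
alpha_district = 0.25

# Cell-specific predicted probabilities from the (hypothetical) model.
predictions = {
    (educ, gender): inv_logit(beta0 + a_e + a_g + alpha_district)
    for educ, a_e in alpha_educ.items()
    for gender, a_g in alpha_gender.items()
}

# Hypothetical Census frequencies for the same cells in this district.
census_counts = {
    ("hs_or_less", "female"): 300, ("hs_or_less", "male"): 280,
    ("college", "female"): 220, ("college", "male"): 200,
}
district_support = poststratify(predictions, census_counts)
```

Because the weights come from the Census rather than from the survey, cells that are over-represented among survey respondents do not dominate the district estimate.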

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and -0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1
Model results using different strategies to estimate district-level preferences. Entries are marginal effects of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                            (1)      (2)      (3)      (4)      (5)      (6)

A. Small Area Estimation via Chained Random Forests
Low income preferences     0.106    0.082    0.098    0.084    0.068    0.062
                          (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences   -0.063   -0.036   -0.053   -0.051   -0.050   -0.040
                          (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B. Multilevel Regression & Poststratification
Low income preferences     0.182    0.158    0.181    0.162    0.115    0.115
                          (0.021)  (0.024)  (0.026)  (0.020)  (0.022)  (0.022)
High income preferences   -0.136   -0.119   -0.139   -0.122   -0.091   -0.091
                          (0.017)  (0.019)  (0.021)  (0.017)  (0.018)  (0.018)

C. Raw CCES means
Low income preferences     0.080    0.061    0.063    0.072    0.043    0.045
                          (0.010)  (0.011)  (0.012)  (0.010)  (0.011)  (0.011)
High income preferences   -0.027   -0.013   -0.010   -0.027   -0.018   -0.024
                          (0.008)  (0.008)  (0.008)  (0.008)  (0.008)  (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means, without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds, splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
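The classification logic can be illustrated with a short sketch. It is an assumption-laden example rather than the code behind the tables: the income data are simulated, the function names are hypothetical, and incomes exactly equal to a threshold are assigned to the middle group, a boundary convention the text does not specify.

```python
from statistics import quantiles


def income_thresholds(incomes, low_pct=33, high_pct=66):
    """Percentile cut points of an income distribution.

    quantiles(..., n=100) returns 99 cut points, so the k-th percentile
    sits at index k - 1.
    """
    cuts = quantiles(incomes, n=100)
    return cuts[low_pct - 1], cuts[high_pct - 1]


def income_group(income, low_cut, high_cut):
    """Classify a voter relative to the thresholds of a reference area."""
    if income < low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"


# Simulated district incomes: $1,000, $2,000, ..., $100,000.
district_incomes = list(range(1_000, 101_000, 1_000))
low_cut, high_cut = income_thresholds(district_incomes)

# The same voter can fall into different groups depending on the area
# whose thresholds are applied (threshold values here are illustrative).
group_in_richer_district = income_group(40_000, 45_000, 90_000)
group_in_poorer_district = income_group(40_000, 30_000, 70_000)
```

Passing `low_pct=20, high_pct=80` reproduces the shifted thresholds of panel (C), while computing the cut points on a state-wide income vector instead of a district-level one corresponds to panel (B).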


Table C1
Model results using different definitions of income groups. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                            (1)      (2)      (3)      (4)      (5)      (6)

A. District-specific income thresholds
Low income preferences     0.106    0.082    0.098    0.084    0.068    0.062
                          (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences   -0.063   -0.036   -0.053   -0.051   -0.050   -0.040
                          (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B. State-specific income thresholds
Low income preferences     0.105    0.082    0.097    0.083    0.067    0.062
                          (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences   -0.062   -0.036   -0.052   -0.050   -0.049   -0.039
                          (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.013)

C. Shifted income thresholds: p20 - p80
Low income preferences     0.098    0.077    0.09     0.078    0.063    0.057
                          (0.012)  (0.013)  (0.014)  (0.012)  (0.013)  (0.013)
High income preferences   -0.054   -0.031   -0.046   -0.044   -0.044   -0.034
                          (0.011)  (0.012)  (0.012)  (0.011)  (0.012)  (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency


Figure D1
Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 0–13, 13–26, 26–39, 39–62, 62–164.]

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
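A population-weighted interpolation of this kind can be sketched as follows. The example makes explicit assumptions: each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection, and all county names, district labels, and numbers are hypothetical. The exact weighting used for the paper may differ.

```python
from collections import defaultdict


def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts by population share.

    county_counts -- dict mapping county -> count (e.g. congregations)
    intersections -- list of (county, district, population) tuples, one
                     per county-district intersection
    """
    county_population = defaultdict(float)
    for county, _, population in intersections:
        county_population[county] += population

    district_counts = defaultdict(float)
    for county, district, population in intersections:
        weight = population / county_population[county]
        district_counts[district] += weight * county_counts[county]
    return dict(district_counts)


# Hypothetical example: county A is split between two districts,
# county B lies entirely within district d2.
county_counts = {"A": 100, "B": 40}
intersections = [
    ("A", "d1", 2_500),
    ("A", "d2", 7_500),
    ("B", "d2", 3_000),
]
district_counts = interpolate_to_districts(county_counts, intersections)
```

When every county is nested within a single district, all weights equal one and the procedure collapses to summing county counts, which matches the at-large case mentioned above.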


E Additional Robustness Tests

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1
Additional robustness tests

                                          Low income        High income
                                          preferences       preferences         N

(1) Injective preference roll call map    0.063 (0.013)    -0.041 (0.013)    11,104
(2) Extreme preferences excl.             0.074 (0.016)    -0.048 (0.015)    13,308
(3) New York excluded                     0.070 (0.015)    -0.048 (0.014)    14,730
(4) Local Union Concentration             0.065 (0.014)    -0.047 (0.014)    15,780
(5) Trimmed LPM estimator                 0.074 (0.015)    -0.055 (0.014)    15,426
(6) Errors-in-variables                   0.062 (0.004)    -0.054 (0.004)    15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
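As an illustration of the trimming rule, the sketch below drops districts outside the 5th to 95th percentile range for a single roll call. The preference values are simulated, the function name is hypothetical, and the percentile interpolation follows the default of Python's `statistics.quantiles`, which need not match the convention used for the paper.

```python
from statistics import quantiles


def trim_extreme_districts(preferences, lower_pct=5, upper_pct=95):
    """Keep districts whose estimate lies within the percentile range.

    preferences -- dict mapping district -> preference estimate for
                   one roll call
    """
    cuts = quantiles(list(preferences.values()), n=100)
    lower, upper = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return {district: value for district, value in preferences.items()
            if lower <= value <= upper}


# Simulated preferences for 100 districts on one roll call.
preferences = {f"d{i}": i / 100 for i in range(1, 101)}
trimmed = trim_extreme_districts(preferences)
```

The same function would be applied separately to each roll call, so that a district can be excluded on one vote and retained on another.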

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
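The idea behind a trimmed linear probability model can be sketched as a sequential procedure: estimate by OLS, drop observations whose fitted values fall outside the unit interval, and re-estimate until no such observations remain. The example below is a simplified illustration under stated assumptions (a single regressor, simulated data, hypothetical function names); it is not the specification estimated in Table E1.

```python
import random


def ols_line(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trimmed_lpm(x, y):
    """Re-estimate the LPM after dropping fitted values outside [0, 1]."""
    keep = list(range(len(x)))
    while True:
        intercept, slope = ols_line([x[i] for i in keep],
                                    [y[i] for i in keep])
        inside = [i for i in keep
                  if 0.0 <= intercept + slope * x[i] <= 1.0]
        if len(inside) == len(keep):
            return intercept, slope, keep
        keep = inside


# Simulated data (illustrative only): a binary outcome driven by one regressor.
random.seed(1)
x = [random.uniform(-3, 3) for _ in range(200)]
y = [1.0 if xi + random.gauss(0, 1) > 0 else 0.0 for xi in x]
intercept, slope, kept = trimmed_lpm(x, y)
```

On exit, every retained observation has a fitted probability inside the unit interval, which is the property the trimming is meant to guarantee.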

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
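The expected downward bias can be seen in the textbook case of classical measurement error in a bivariate linear model. The derivation below is only an illustration under simplifying assumptions (one regressor, measurement error $u$ independent of the true preference $\theta$ and of the outcome error); the symbols $\tilde{\theta}$, $u$, $\sigma^2_{\theta}$ and $\sigma^2_{u}$ are introduced here and do not appear in the original specification.

```latex
% Classical errors-in-variables in a bivariate linear model (illustrative).
% \tilde{\theta} is the estimated preference, \theta the true preference.
\begin{align*}
  y &= \beta\,\theta + \epsilon, \qquad
  \tilde{\theta} = \theta + u, \qquad u \perp (\theta, \epsilon), \\
  \operatorname{plim}\hat{\beta}_{OLS}
    &= \frac{\operatorname{Cov}(\tilde{\theta}, y)}{\operatorname{Var}(\tilde{\theta})}
     = \beta\,\frac{\sigma^2_{\theta}}{\sigma^2_{\theta} + \sigma^2_{u}} .
\end{align*}
```

Since the ratio on the right is below one whenever $\sigma^2_{u} > 0$, the uncorrected coefficient is attenuated towards zero, and the attenuation is small when the measurement error variance is small relative to the variance of the true preferences.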

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore i subscripts):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4}$$

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
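The three steps (select controls for the outcome, select controls for the treatment, run OLS on the union) can be sketched on simulated data. The example simplifies heavily and all of it should be read as assumption: there is a single treatment without preference interactions, an ordinary LASSO with a fixed penalty stands in for the root-LASSO, the data-generating process and all function names are invented, and the control with index 0 is built to be strongly related to the treatment but only weakly related to the outcome.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta, resid = [0.0] * p, list(y)
    col_ss = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n))
            new = soft_threshold(rho, n * lam) / col_ss[j]
            if new != beta[j]:
                resid = [resid[i] + X[i][j] * (beta[j] - new)
                         for i in range(n)]
                beta[j] = new
    return beta


def ols(X, y):
    """Least squares via the normal equations and Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def post_double_selection(y, d, W, lam):
    """Return the treatment coefficient and the selected control indices."""
    sel_y = {j for j, b in enumerate(lasso(W, y, lam)) if b != 0.0}  # I_1
    sel_d = {j for j, b in enumerate(lasso(W, d, lam)) if b != 0.0}  # I_2
    union = sorted(sel_y | sel_d)                                    # I
    X = [[d[i]] + [W[i][j] for j in union] for i in range(len(y))]
    return ols(X, y)[0], union


# Simulated data: the true treatment effect is 1.0.
random.seed(42)
n, p = 200, 20
W = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
d = [2.0 * w[0] + random.gauss(0, 1) for w in W]
y = [1.0 * d[i] + 0.2 * W[i][0] + 1.0 * W[i][1] + random.gauss(0, 1)
     for i in range(n)]
effect, controls = post_double_selection(y, d, W, lam=0.3)
```

The treatment-equation LASSO is what guarantees that control 0 enters the final regression even if its direct association with the outcome were too weak to survive selection in the outcome equation alone.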

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}\ \theta^h_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here, $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

32 For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

$$c^* = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
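The closed-form solution is easy to illustrate numerically. The sketch below computes the kernel matrix, solves $(K + \lambda I)c = y$, and returns fitted values. Several things are assumptions made for the example only: $\lambda$ is fixed by hand instead of being chosen by leave-one-out loss, covariates are not standardized, the data are a hypothetical one-dimensional sine curve, and a small Gauss-Jordan routine replaces a numerical library.

```python
import math


def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / sigma2)
             for zj in Z] for zi in Z]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def krls_fit(Z, y, lam):
    """Return the weight vector c and the fitted values K c."""
    sigma2 = len(Z[0])  # sigma^2 = D, the number of columns in z
    K = gaussian_kernel(Z, sigma2)
    n = len(y)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    c = solve(A, y)  # c = (K + lam * I)^{-1} y
    fitted = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
    return c, fitted


# Hypothetical one-dimensional example with a nonlinear relationship.
Z = [[i / 4] for i in range(-8, 9)]
y = [math.sin(z[0]) for z in Z]
lam = 0.1
c, fitted = krls_fit(Z, y, lam)
```

A useful check on any implementation is that the residuals satisfy $y - Kc = \lambda c$, which follows directly from the closed-form solution; larger values of $\lambda$ therefore trade fit for smoothness.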

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

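$$\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G4}$$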

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, P. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodology 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(2), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction
public reactions plays a central role in theories of accountability and dynamic responsiveness (Arnold 1990; Stimson et al. 1995). While many individual legislative votes do not affect the reelection prospects of representatives, on potentially salient votes they can face hard choices between party ideology and competing constituency preferences. On international trade agreements, for instance, Democratic representatives have faced cross-pressures between a more skeptical stance taken by unions and low-income constituents versus that of their own party (Box-Steffensmeier et al. 1997). On the other side of the aisle, in the wake of the financial crisis, Republican legislators found themselves torn between their own partisan views on stimulus spending and the pressure from less well-off constituents (Mian et al. 2010).

Politicians' incentives are also linked to information. Theories of representation emphasize that members of Congress, and especially the House, face numerous voting decisions in each term, and it would be unrealistic to assume that they have access to reliable, unbiased polling data on constituency preferences on all the issues they face (Arnold 1990; Miller and Stokes 1963). Instead, representatives, with the help of their staffers, rely on alternative methods to assess public opinion, including constituent correspondence, town halls, and contacts with community leaders or local interest groups (Miler 2007). In this limited-information context, the strength of local unions may enhance the visibility and perception of constituent preferences (Hertel-Fernandez et al. 2018).[6]

Following seminal theories of congressional action (Arnold 1990; Miller and Stokes 1963), our argument emphasizes that the strength of local unions underpins a credible mobilization threat that impacts the action of candidates and legislators. Anticipating mobilizing efforts by unions, a potential candidate may not even enter the race; an elected, career-oriented politician might be pressured to alter his or her vote even without a full mobilization effort, as long as unions' mobilization capacity is visible. Thus, both campaign contributions and candidate selection should matter as a channel linking local union strength and representation, since they are linked to credible threats of mobilization.

Our argument implies that the district-level strength of labor unions increases the responsiveness of members of Congress to the less affluent. While we know from previous work that politicians are considerably more responsive to the preferences of the affluent than to those of the less well-off, this bias should be reduced in districts with relatively higher union membership. Substantively, it is crucial to assess how far the presence of unions can move responsiveness toward the ideal of political equality.[7]

[6] Butler and Nickerson (2011) find that politicians respond when provided with more accurate opinion data. However, behavioral biases may lead politicians to discount constituent preferences they disagree with (Butler and Dynes 2016).

[7] In line with a large literature, we focus on union membership as a key component of union strength. In a study of the effect of unions on legislative ideology, rather than income-biased responsiveness, Becher et al. (2018) argue that the structure of local unions (i.e., the concentration of unions in a given locality) matters as well. However, they also show empirically that union density and concentration are separable dimensions.

III. Data and Empirical Strategy

Any effort to test the relevance of unions for unequal representation confronts major challenges of measurement and causal interpretation. The dataset we have compiled allows us to address these issues to an extent previously impossible. We have created a panel of legislators' roll call votes matched to income-specific policy preferences at the district level and district-level measures of union membership. Our main empirical strategy to examine the influence of unions on unequal representation is built on two basic pillars: district fixed effects and interactive controls. The fact that we observe several roll calls within a given congressional district allows us to specify a model with district fixed effects, which capture unobservable characteristics of districts (and states) that are constant over roll calls, such as historical legacies or the strength of partisan organization. To provide for a stricter test of the moderating effect of unions, we also allow a rich set of other district characteristics to moderate the link between income groups and legislators' voting behavior. This amounts to estimating models including interactions between observed district characteristics and group preferences. In our most flexible specification, we allow these to be non-linear (we describe our models in more detail below).

The data required to implement these models were constructed in three steps. First, we match information on roll call items for 223,000 CCES respondents to actual roll call votes cast in the House of Representatives in the 109th to the 112th Congress.[8] Second, we estimate policy preferences for low- and high-income constituents in each district for 27 roll calls. To deal with the fact that the CCES is not a representative sample of district populations, we use a small area estimation strategy combining the CCES sample with unit record Census data, matching the full distribution of age, education, gender, race, and income using a chained Random Forests algorithm (more below and in Appendix B). Third, we measure district-level union membership based on digitized administrative records from the Department of Labor.

III.A. CCES data and Congressional roll calls

The CCES is an ideal starting point for our analysis, since it is a nationally representative study, includes a considerable number of roll call questions, and provides us with a large enough sample size to decompose income-group preferences by district. It addresses several data concerns that plagued initial research on unequal responsiveness in Congress (Bhatti and Erikson 2011). The roll calls included in the CCES concern key votes as identified by Congressional Quarterly and the Washington Post and cover a broad range of issues

In this paper we focus on union membership, but show in a robustness test that our results still obtain when accounting for union concentration (see Table E1).

[8] Our analysis focuses on one apportionment period, which generally holds district boundaries constant (we show that the results are robust to cases of mid-period redistricting).

(Ansolabehere and Jones 2010). Respondents are presented with the key wording of the bill (as used on the floor and in media reports) and are then asked to cast their own vote: "What about you? If you were faced with this decision, would you vote for, against, or not sure?" In contrast to widely used agree–disagree survey measures of issue preferences, matched roll call votes provide us with unequivocal evidence of policy congruence between respondent and legislator (Jessee 2009; Ansolabehere and Jones 2010, 585). We match 27 roll call items in the CCES to roll call votes cast in the House of the 109th to 112th Congress. These cover important legislative decisions, such as Dodd-Frank, the Affordable Care Act (and attempts to repeal it), the minimum wage increase, the ratification of the Central America Free Trade Agreement, and the Lilly Ledbetter Fair Pay Act. Table A1 in the Appendix lists all matched CCES items and House bills included in our estimation sample.

III.B. Measuring constituency preferences by income group

The CCES provides us with a comparatively large sample size per district. However, an important potential issue is that it is not designed to be representative of congressional district populations. Thus, individuals with certain characteristics, such as particular combinations of income, race, and education, may be underrepresented in the CCES sample for a given district. If this is the case, unadjusted policy preferences from the CCES will not reflect the target population, and using them can lead to biased estimates of unequal representation in Congress, as politicians are held to the wrong benchmark. The solution to this issue is to employ some form of small area estimation to rebalance the survey sample to represent the district population. The machine-learning solution we propose is relatively new to the representation literature in political science, but it has some attractive features that merit its application to this topic. It does not require distributional and functional form assumptions, it allows for arbitrary higher-order interactions of covariates, and it can fully leverage fine-grained census data to construct representative samples of congressional districts. However, we stress that our findings do not depend on this particular approach. As shown in Online Appendix B, our approach leads to somewhat more conservative estimates of the impact of unions on the representation of different income groups compared to the MRP approach widely used by political scientists (Lax and Phillips 2009). Qualitatively, both approaches yield the same conclusions.

Our approach, small area estimation using chained random forests, matches CCES survey respondents to corresponding cases from unit record Census data. The design of the Census ensures an accurate representation of the distribution of population characteristics in a given district (Torrieri et al. 2014, Ch. 4). Matching these two data sources is essentially a prediction problem, which we address using a flexible non-parametric machine learning approach based on random forests (Stekhoven and Bühlmann 2011).[9] Put simply, the idea is

[9] Honaker and Plutzer (2016) use a similar approach (but relying on multivariate normal imputations) and further discuss its empirical performance in estimating small area attitudes and preferences.

that rich census data exist for every district, whereas survey data on preferences are scarce in some districts and may not be fully representative. Using general machine learning tools, we can attach preferences to the Census by matching it to CCES respondents based on common demographic characteristics. The resulting data set of public preferences is representative of congressional districts.

Concretely, we use about 3 million individual-level records from a synthetic sample of the Census Bureau's American Community Survey from 2006 to 2011. We stack both datasets, creating a structure where we have common district identifiers and individual covariates, while responses to policy preference questions are missing in the Census portion of the data. As common covariates bridging CCES and Census, we use the following demographic characteristics: gender, race (3 categories), education (5 categories), age (continuous), and family income (continuous).[10] The latter is of particular relevance, as we are interested in producing district–income group specific preferences.

In the next step, we fill missing roll call preferences in the Census with matching data from CCES respondents. Since this is essentially a prediction problem, we can use powerful tools developed in the machine learning literature to achieve this task. We use an algorithm proposed by Stekhoven and Bühlmann (2011), which uses chained random forests (Breiman 2001) to impute missing cells. Compared to commonly used multivariate normal or regression imputation techniques, this strategy has the advantage that it is fully nonparametric, allows for complex interactions between covariates, and deals with both continuous and categorical data (Tang and Ishwaran 2017). Our completed data set now contains preferences for 27 roll call items of synthetic 'Census individuals', which are a representative sample of each House district.
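The stacked data structure described above can be illustrated with a deliberately simplified sketch. It is not the chained random forest procedure of Stekhoven and Bühlmann (2011): as a plainly named stand-in, it fills each missing Census preference with the mean response of CCES donors in the same district and demographic cell. The record layout and the field names (`source`, `district`, `cell`, `vote`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical stacked records: CCES rows carry a vote (1 = yea, 0 = nay);
# Census rows share the same covariates but have vote=None.
stacked = [
    {"source": "cces", "district": "NC-04", "cell": ("f", "college"), "vote": 1},
    {"source": "cces", "district": "NC-04", "cell": ("f", "college"), "vote": 0},
    {"source": "cces", "district": "NC-04", "cell": ("m", "hs"), "vote": 1},
    {"source": "census", "district": "NC-04", "cell": ("f", "college"), "vote": None},
    {"source": "census", "district": "NC-04", "cell": ("m", "hs"), "vote": None},
    {"source": "census", "district": "NC-04", "cell": ("m", "college"), "vote": None},
]

def impute_by_cell(rows):
    """Fill missing votes with the donor mean of the same district and cell,
    falling back to the district mean when a cell has no CCES donors.
    Assumes every district has at least one CCES donor."""
    cell_votes = defaultdict(list)
    district_votes = defaultdict(list)
    for r in rows:
        if r["vote"] is not None:
            cell_votes[(r["district"], r["cell"])].append(r["vote"])
            district_votes[r["district"]].append(r["vote"])
    completed = []
    for r in rows:
        filled = dict(r)
        if filled["vote"] is None:
            donors = cell_votes.get((r["district"], r["cell"])) or district_votes[r["district"]]
            filled["vote"] = sum(donors) / len(donors)
        completed.append(filled)
    return completed

completed = impute_by_cell(stacked)
census_only = [r for r in completed if r["source"] == "census"]
```

A random-forest imputer replaces the cell means with predictions from trees grown on all common covariates, which is what allows continuous covariates and higher-order interactions to enter without defining cells by hand.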

With these data in hand, we assign individuals to income groups and calculate group-specific preferences for each roll call in each district. Following previous work in the representation literature (Bartels 2008, 2016), we delineate low- and high-income respondents using the 33rd and 67th percentiles of the distribution of family incomes. Note that, in line with theories of constituency representation in Congress, we specify these income thresholds separately by congressional district. This accounts for the substantial differences in both average income and income inequality between US districts. It also ensures that, within each district, income groups are of comparable size. Online Appendix Table A2 shows the distribution of income-group cutoffs. On average, our chosen cutoffs are close to those used in the established literature. The mean of our district-specific low-income cutoffs is around $39,000, while Bartels uses $40,000 (Bartels 2016, 240); our mean high-income cutoff is around $81,000, where Bartels employs a threshold of $80,000. However, beyond these averages lies considerable variation. In some districts the 33rd percentile cutoff is as low as $16,500, while the 67th percentile reaches almost $160,000 in others.[11]
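A minimal sketch of district-specific income grouping follows, assuming records with hypothetical `district` and `income` fields (incomes in thousands of dollars). The percentile rule uses linear interpolation between order statistics, which is an assumption; the paper does not state which percentile definition it uses.

```python
import math
from collections import defaultdict

def percentile(values, p):
    """Percentile with linear interpolation between order statistics."""
    vals = sorted(values)
    k = (len(vals) - 1) * p / 100.0
    lo = math.floor(k)
    hi = min(lo + 1, len(vals) - 1)
    return vals[lo] + (vals[hi] - vals[lo]) * (k - lo)

def assign_income_groups(people):
    """Label each person 'low', 'middle' or 'high' using the 33rd and 67th
    percentile of family income computed within that person's own district."""
    by_district = defaultdict(list)
    for p in people:
        by_district[p["district"]].append(p["income"])
    cutoffs = {d: (percentile(v, 33), percentile(v, 67)) for d, v in by_district.items()}
    labelled = []
    for p in people:
        low_cut, high_cut = cutoffs[p["district"]]
        if p["income"] <= low_cut:
            group = "low"
        elif p["income"] >= high_cut:
            group = "high"
        else:
            group = "middle"
        labelled.append({**p, "group": group})
    return labelled, cutoffs

# Two toy districts with very different income levels.
people = [{"district": "A", "income": x} for x in (10, 20, 30, 40, 50, 60, 70)]
people += [{"district": "B", "income": x} for x in (100, 150, 200, 250)]
labelled, cutoffs = assign_income_groups(people)
```

In this toy example an income of 100 counts as low in district B, while 60 already counts as high in district A, which is the point of computing the thresholds separately by district.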

[10] See Appendix B for more details on the construction of our Census sample and our matching/imputation procedure.

[11] Results are relatively invariant to using alternative income thresholds (see Table C1).

Figure I. District-level income gap in public support for 6 selected policies. [Six histograms, one per roll call: Increase Minimum Wage; Housing Crisis Assistance; Fair Pay Act; Affordable Care Act; CAFTA Ratification; Recovery and Reinvestment.]

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district's income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by $(\theta^l, \theta^h)$, as the proportion of individuals voting 'yea'. Since preference estimates are in $[0, 1]$, they can be directly related to legislators' probability of voting 'yea' on a given roll call. Our data show considerable variation in the distance between the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences $(\theta^h - \theta^l)$ in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, the vast majority of low-income constituents are more supportive than their high-income counterparts in each and every district. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly in favor. In all examples, we find considerable across-district variation in the preference gap between low- and high-income constituents.[12] We will employ this variation over both roll calls and districts to estimate legislators' differential

[12] Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.

responsiveness to changes in policy preferences of different income groups and how it might be moderated by union strength.
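The group-specific preference estimates and the preference gap can be sketched as follows, assuming individual records that already carry an income group label. The field names (`district`, `roll_call`, `group`, `vote`) are hypothetical, and the toy votes are invented for illustration only.

```python
from collections import defaultdict

def group_preferences(records):
    """Share of 'yea' votes by (district, roll_call) for the low- and
    high-income groups; middle-income records are ignored."""
    votes = defaultdict(list)
    for r in records:
        if r["group"] in ("low", "high"):
            votes[(r["district"], r["roll_call"], r["group"])].append(r["vote"])
    prefs = defaultdict(dict)
    for (district, roll_call, group), v in votes.items():
        prefs[(district, roll_call)][group] = sum(v) / len(v)
    return dict(prefs)

def make(group, votes):
    """Helper building toy records for one district and one roll call."""
    return [{"district": "A", "roll_call": "min_wage", "group": group, "vote": v}
            for v in votes]

records = make("low", [1, 1, 0, 1]) + make("high", [1, 0, 0, 0]) + make("middle", [1])
prefs = group_preferences(records)

# Absolute preference gap between the top and bottom income groups.
abs_gap = {key: abs(p["high"] - p["low"]) for key, p in prefs.items()}
```

Because both shares lie in [0, 1], the gap is directly readable in percentage points once multiplied by 100 (here 50 points).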

III.C. District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report, or reporting falsified information, is made a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit nonresponse (Southworth and Stepan-Norris 2009, 311; Card 1996).[13] Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.[14] Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
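The aggregation step can be sketched as below. This is only an illustration under stated assumptions: the report fields (`district`, `year`, `members`) are hypothetical, the numbers are invented, and averaging yearly district totals over the study period is one plausible reading of the text rather than a documented detail of the authors' pipeline.

```python
from collections import defaultdict

def district_membership(reports):
    """Sum reported members by district and year, then average over years."""
    yearly = defaultdict(float)
    for r in reports:
        yearly[(r["district"], r["year"])] += r["members"]
    per_district = defaultdict(list)
    for (district, _year), total in yearly.items():
        per_district[district].append(total)
    return {d: sum(totals) / len(totals) for d, totals in per_district.items()}

# Toy geocoded LM-form records, one row per local union report.
reports = [
    {"district": "MI-13", "year": 2007, "members": 1200},
    {"district": "MI-13", "year": 2007, "members": 800},
    {"district": "MI-13", "year": 2008, "members": 2200},
    {"district": "TX-02", "year": 2007, "members": 300},
    {"district": "TX-02", "year": 2008, "members": 500},
]
membership = district_membership(reports)
```

Keeping the district as the unit of aggregation preserves the within-state variation that a state-level measure would average away.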

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost

[13] Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

[14] The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.

Figure II. Union membership in House districts, 109th-112th Congress. [Legend: 1st quartile, 2nd quartile, 3rd quartile, 4th quartile.]

complete coverage, as during the latter part of the twentieth century unions are increasingly organizing workers across different sectors and occupations (Lichtenstein 2013, 249).[15]

III.D. Statistical specifications

For each roll call vote $j$ ($j = 1, \ldots, J$), we have measured preferences of low- and high-income citizens in a given congressional district $d$ ($d = 1, \ldots, D$), denoted by $(\theta^l_{jd}, \theta^h_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts' workers to organize collective action, (iv) as well as non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House

[15] While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with a widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, DC, which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 0.86).

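members is the following linear probability specification:

$$
\begin{aligned}
y_{ijd} ={} & \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l (U_d \times \theta^l_{jd}) + \eta^h (U_d \times \theta^h_{jd}) \\
& + \beta^l (X_d \times \theta^l_{jd}) + \beta^h (X_d \times \theta^h_{jd}) + \alpha_d + \epsilon_{ijd}
\end{aligned}
$$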

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d \theta^{h}_{jd}$ and $U_d \theta^{l}_{jd}$. Thus, when $\eta^{l}$ and $\eta^{h}$ are zero, the group-specific preference coefficients $\mu^{l}$ and $\mu^{h}$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^{l}$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^{h}$. Our theoretical expectation is that $\eta^{l} > 0$ and $\eta^{h} \le 0$.
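Differentiating the linear probability specification with respect to each group's preferences makes this interpretation explicit. The expressions below follow directly from the model as written and use its symbols; they are a worked restatement, not an additional result.

```latex
\frac{\partial \Pr(y_{ijd} = 1)}{\partial \theta^{l}_{jd}}
  = \mu^{l} + \eta^{l} U_d + \beta^{l} X_d ,
\qquad
\frac{\partial \Pr(y_{ijd} = 1)}{\partial \theta^{h}_{jd}}
  = \mu^{h} + \eta^{h} U_d + \beta^{h} X_d .
```

Because all inputs are standardized, responsiveness in a district with average union membership and average covariates ($U_d = 0$, $X_d = 0$) equals $\mu^{l}$ or $\mu^{h}$, and each additional standard deviation of union membership shifts it by $\eta^{l}$ or $\eta^{h}$.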

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects $\alpha_d$.¹⁶ Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and the state level) and group preferences, $X_d \theta^{l}_{jd}$ and $X_d \theta^{h}_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses, detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors, assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
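As an illustration of this estimation strategy, the following minimal sketch fits a linear probability model with district fixed effects (via the within transformation) and district-clustered standard errors. Everything in it is an assumption made for illustration: the data are simulated, the "true" coefficients are arbitrary, the district-level controls $X_d$ are omitted, and no small-sample correction is applied to the variance estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): D districts, each observed on J roll calls.
D, J = 200, 27
district = np.repeat(np.arange(D), J)
U = rng.normal(size=D)[district]                   # standardized union membership
theta_l = rng.normal(size=D * J)                   # low-income preferences
theta_h = rng.normal(size=D * J)                   # high-income preferences
alpha = rng.normal(scale=0.05, size=D)[district]   # district fixed effects

# Arbitrary coefficients chosen for the simulation only.
prob = (0.5 + alpha + 0.02 * theta_l + 0.12 * theta_h
        + 0.08 * U * theta_l - 0.05 * U * theta_h)
y = (rng.uniform(size=D * J) < np.clip(prob, 0.0, 1.0)).astype(float)

# Regressors: group preferences and their interactions with union membership.
X = np.column_stack([theta_l, theta_h, U * theta_l, U * theta_h])


def within(a, groups):
    """Subtract group means from every column of a 2-D array."""
    n_groups = groups.max() + 1
    sums = np.zeros((n_groups, a.shape[1]))
    np.add.at(sums, groups, a)
    counts = np.bincount(groups, minlength=n_groups)[:, None]
    return a - (sums / counts)[groups]


# The within transformation absorbs the district fixed effects.
Xw = within(X, district)
yw = within(y[:, None], district)[:, 0]

XtX_inv = np.linalg.inv(Xw.T @ Xw)
beta = XtX_inv @ Xw.T @ yw          # order: mu_l, mu_h, eta_l, eta_h
resid = yw - Xw @ beta

# Cluster-robust sandwich: sum the scores within each district first.
scores = np.zeros((D, X.shape[1]))
np.add.at(scores, district, Xw * resid[:, None])
V = XtX_inv @ (scores.T @ scores) @ XtX_inv
se = np.sqrt(np.diag(V))

for name, b, s in zip(["mu_l", "mu_h", "eta_l", "eta_h"], beta, se):
    print(f"{name}: {b:.3f} ({s:.3f})")
```

The main effect of union membership never enters the regressor matrix because it is constant within a district and would be removed by the within transformation.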

IV. Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above with district fixed effects, but without accounting for local union organization or any other moderators (setting $\beta^{l}$, $\beta^{h}$ and $\eta^{l}$, $\eta^{h}$ to zero), we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase of 13.6 (±1.2) percentage points in the probability that legislators cast a corresponding vote. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a

¹⁶ Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.


confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.
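The interval arithmetic behind these statements can be sketched as follows. The sketch assumes that the values in parentheses are standard errors and that intervals use the normal approximation with a critical value of 1.96; because it starts from the rounded figures quoted in the text, it reproduces the reported interval only approximately.

```python
def normal_ci(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval (95% when z = 1.96)."""
    return estimate - z * std_error, estimate + z * std_error


# Rounded point estimates and standard errors, in percentage points.
low_income = normal_ci(1.6, 1.4)     # responsiveness to the less well-off
high_income = normal_ci(13.6, 1.2)   # responsiveness to the affluent

print("low income: ", low_income)
print("high income:", high_income)

# An interval that covers zero means the null of no responsiveness
# cannot be rejected at the corresponding level.
print("low-income interval covers zero: ", low_income[0] < 0 < low_income[1])
print("high-income interval covers zero:", high_income[0] < 0 < high_income[1])
```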

IV.A. Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership with 95% confidence intervals.¹⁷ It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1) we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preferences-confounder interactions (setting $\beta^{l}$ and $\beta^{h}$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018;

¹⁷ Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors at the district level. See also specification (1) in Table I below.


[Figure III: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], from −1.6 to 1.6, with the p10, p25, p50, p75, and p90 percentiles marked; y-axis: Marginal effect, from −0.4 to 0.4.]

Figure III
District-level union membership as moderator of unequal representation

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

Flavin and Hartney 2015). In specification (2) we therefore add two measures of historical state union policy, the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^{l}$ and $\theta^{h}$. In specification (3) we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levi 2013) or


Table I
Union density and representation. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                  (1)       (2)       (3)       (4)       (5)       (6)
Low income preferences           0.106     0.082     0.098     0.084     0.068     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences         −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)
District fixed effects             X         X         X         X         X         X
Group preferences
  × union policy                             X                                       X
  × state constants                                    X
  × organizational capacity                                      X                   X
  × district covariates                                                    X         X

Note: N = 15,780; N_d = 534; 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^{l}$ and $\eta^{h}$. Specifications (2) to (5) include coefficients for the interaction ($\beta^{l}$, $\beta^{h}$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements. (3) interacts preferences with state fixed effects. (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections. (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).¹⁸ We use the NLRB's database to

¹⁸ Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59%


extract all attempts to certify (or de-certify) a local union.¹⁹ We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).
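The certification-election proxy can be sketched in a few lines. This is a minimal illustration under stated assumptions: the input format (one `(district, year)` pair per geocoded case), the district labels, and the function name are hypothetical, and districts without any case in the window are simply left out because the text does not say how zeros are handled before taking logs.

```python
import math
from collections import defaultdict


def log_average_cases(case_records, years):
    """Logged average number of cases per year for each district.

    case_records: iterable of (district, year) pairs, one per geocoded case.
    years: the years in the averaging window (seven in the text).
    """
    window = set(years)
    counts = defaultdict(int)
    for district, year in case_records:
        if year in window:
            counts[district] += 1
    # Districts with no cases are omitted (assumption, see lead-in).
    return {d: math.log(n / len(window)) for d, n in counts.items()}


# Hypothetical records: district "A" has two cases in each of seven years,
# district "B" has a single case.
records = [("A", y) for y in range(2004, 2011) for _ in range(2)]
records.append(("B", 2010))

proxy = log_average_cases(records, range(2004, 2011))
print(proxy)
```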

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5) we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.²⁰ Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

¹⁹ There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily".

²⁰ Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B. Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Using these alternative measures does not change our core results.
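One simple way to move a county-level index to congressional districts with population weights is sketched below. It is an illustration only, not the procedure documented in the appendix: the overlap table (population living in each county-district intersection), the identifiers, and the function name are hypothetical, and the weighted mean shown here suits an index rather than a raw count.

```python
def interpolate_to_districts(county_values, overlaps):
    """Population-weighted mean of county values for each district.

    county_values: {county: value}, e.g. a social capital index.
    overlaps: iterable of (county, district, population) rows, where
        population is the number of residents living in the
        intersection of that county and that district.
    """
    weighted_sum, total_pop = {}, {}
    for county, district, pop in overlaps:
        weighted_sum[district] = (weighted_sum.get(district, 0.0)
                                  + pop * county_values[county])
        total_pop[district] = total_pop.get(district, 0.0) + pop
    return {d: weighted_sum[d] / total_pop[d] for d in weighted_sum}


# Hypothetical example: district 1 draws 300 residents from county X and
# 100 from county Y; district 2 lies entirely within county Y.
index_by_county = {"X": 1.0, "Y": 3.0}
overlap_rows = [("X", 1, 300), ("Y", 1, 100), ("Y", 2, 200)]

district_index = interpolate_to_districts(index_by_county, overlap_rows)
print(district_index)
```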

Table II
Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

                                         Low income          High income
(1a) Social capital: bowling alleys     0.067 (0.014)      −0.051 (0.013)
(1b) Social capital index               0.065 (0.014)      −0.048 (0.013)
(2)  Redistricting                      0.067 (0.014)      −0.051 (0.013)
(3)  MRP estimated preferences          0.115 (0.022)      −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for $\eta^{l}$ and $\eta^{h}$. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts. N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred. N = 14,150. Specification (3) uses preferences estimated using MRP. See Appendix B for more details. N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here, we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low-income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high-income constituents are more pronounced as well. In Table B1 in the online appendix we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In


the same table we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C. Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher-order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools, such as the LASSO.²¹ The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
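The logic of double selection can be sketched on simulated data. The sketch below is deliberately simplified and rests on several assumptions: it uses a plain LASSO solved by coordinate descent with a fixed, hand-picked penalty instead of the root-LASSO named in the notes to Table III, it has a single treatment variable, it does no sample splitting, and all data and coefficient values are invented.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]       # add back feature j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta


def post_double_selection(y, d, X, lam):
    """Select controls from outcome and treatment equations, then run OLS."""
    Xc = X - X.mean(axis=0)
    yc, dc = y - y.mean(), d - d.mean()
    sel_y = np.flatnonzero(lasso_cd(Xc, yc, lam))   # controls predicting the outcome
    sel_d = np.flatnonzero(lasso_cd(Xc, dc, lam))   # controls predicting the treatment
    selected = np.union1d(sel_y, sel_d)
    Z = np.column_stack([dc, Xc[:, selected]])
    coef = np.linalg.lstsq(Z, yc, rcond=None)[0]
    return coef[0], selected


# Simulated example: 30 candidate controls, two of which confound d and y.
rng = np.random.default_rng(1)
n, p = 500, 30
X = rng.normal(size=(n, p))
d = X[:, 0] + X[:, 1] + rng.normal(size=n)
y = 1.0 * d + 2.0 * X[:, 0] + X[:, 1] + rng.normal(size=n)   # true effect: 1.0

effect, selected = post_double_selection(y, d, X, lam=0.2)
dc, yc = d - d.mean(), y - y.mean()
naive = (dc @ yc) / (dc @ dc)     # regression of y on d without controls

print("naive estimate:", round(naive, 3))
print("post-double-selection estimate:", round(effect, 3))
print("selected controls:", selected)
```

Taking the union of the two selected sets is what protects against omitted variable bias: a control dropped from the outcome equation can still enter through the treatment equation.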

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2) we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity

²¹ The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools, such as the LASSO, makes sense).


Table III
Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups.

                               (1)       (2)       (3)
Low income preferences        0.063     0.066     0.062
                             (0.014)   (0.017)   (0.016)
High income preferences      −0.054    −0.036    −0.040
                             (0.013)   (0.015)   (0.016)
Semi-parametric terms          144       312       624
Post-LASSO terms                18        45       112

Note: The double selection estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on a 50% sample, parameter estimates performed on the remaining 50% (N = 7,884). Table entries are estimates for $\eta^{l}$ and $\eta^{h}$ with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our $\eta$ terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive, linear form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent". It is thus prudent to consider an analysis that "lets the data


speak". In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.²² The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
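The idea can be illustrated with a bare-bones kernel ridge regression using a Gaussian kernel, together with the pointwise derivatives of the fitted surface. This is a sketch under assumptions, not the estimation used in the paper: the data are simulated with a built-in interaction, the bandwidth is set to the number of covariates, and the regularization parameter is fixed by hand rather than chosen by a data-driven rule.

```python
import numpy as np


def krls_fit(X, y, sigma2, lam):
    """Gaussian-kernel regularized least squares: returns kernel and weights."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return K, c


def pointwise_derivatives(X, K, c, sigma2, dim):
    """Derivative of the fitted surface w.r.t. covariate `dim` at each point."""
    diff = X[:, None, dim] - X[None, :, dim]      # diff[j, i] = x_j - x_i
    return (-2.0 / sigma2) * (K * diff) @ c


# Simulated data: the effect of preferences (theta) on the outcome grows
# with union membership (u), but no interaction term is given to the model.
rng = np.random.default_rng(2)
n = 200
theta = rng.uniform(-1.5, 1.5, size=n)
u = rng.uniform(-1.5, 1.5, size=n)
y = theta * (0.5 + 0.3 * u) + rng.normal(scale=0.1, size=n)

X = np.column_stack([theta, u])
sigma2 = X.shape[1]                  # bandwidth set to the number of covariates
K, c = krls_fit(X, y, sigma2, lam=0.1)
d_theta = pointwise_derivatives(X, K, c, sigma2, dim=0)

low_u = d_theta[u < 0].mean()
high_u = d_theta[u > 0].mean()
print("mean partial effect of theta, low u: ", round(low_u, 3))
print("mean partial effect of theta, high u:", round(high_u, 3))
```

Comparing the pointwise derivatives across levels of `u` recovers the moderation pattern even though the fitted model contains no product term.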

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high-income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], from −1.6 to 1.6, with the p10, p25, p50, p75, and p90 percentiles marked; y-axis: Partial effect, from −0.4 to 0.4.]

Figure IV
Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

²² See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model that does not include an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low-income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V. Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, as they would if public-sector unions were too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills.


The difference is larger for the preferences of high-income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low-income constituents. We trace this issue further in the next specification.

Table IV
Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups.

                                 Low income          High income
(A) Private vs. public unions
    Public unions               0.074 (0.016)      −0.058 (0.015)
    Non-public unions           0.054 (0.016)      −0.027 (0.016)
(B) Bill ideology
    Conservative bill           0.086 (0.017)      −0.086 (0.018)
    Liberal bill                0.052 (0.014)      −0.028 (0.013)
(C) AFL-CIO endorsement
    No position                 0.054 (0.014)      −0.054 (0.013)
    Endorsement                 0.077 (0.015)      −0.040 (0.014)

Note: Estimates for $\eta^{l}$ and $\eta^{h}$ with cluster-robust standard errors in parentheses. N = 15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.²³ AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).²⁴ The fact that districts with higher union membership see better representation of the less affluent

²³ Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

²⁴ For high-income preferences, the estimate for $\eta^{h}$ is smaller for endorsed bills, but still significantly different from zero.


more so when issues are salient to unions bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

VI. Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls) an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
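The retransformation from a logged outcome back to dollars can be sketched with the smearing estimator of Duan (1983), which multiplies the exponentiated log-scale prediction by the average exponentiated residual. The numbers below are hypothetical and only illustrate the mechanics; they are not the values behind the dollar figure reported in the text.

```python
import math


def smearing_factor(log_residuals):
    """Duan's smearing factor: mean of exponentiated log-scale residuals."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)


def level_change(base_log_prediction, coefficient, factor):
    """Change in the predicted level when the log-scale prediction
    rises by `coefficient` (e.g. a one standard deviation increase)."""
    before = math.exp(base_log_prediction) * factor
    after = math.exp(base_log_prediction + coefficient) * factor
    return after - before


# Hypothetical residuals from a log-contributions regression (they sum to
# zero, as OLS residuals do) and a hypothetical baseline prediction.
residuals = [-0.6, -0.1, 0.2, 0.5]
factor = smearing_factor(residuals)
change = level_change(base_log_prediction=11.0, coefficient=0.05, factor=factor)

print("smearing factor:", round(factor, 4))
print("implied change in dollars:", round(change, 2))
```

The factor exceeds one whenever the residuals vary, which is why simply exponentiating the log-scale prediction would understate the expected dollar amount.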

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low-income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V: Labor contributions and selection of Democratic legislators

                                       (1)        (2)        (3)        (4)

A: Contributions channel
                                       DV: Contrib.          DV: roll call
Union membership                       0.056      0.046
                                      (0.012)    (0.014)
Contributions × low income prefs.                            0.946      0.865
                                                            (0.036)    (0.034)
Contributions × high income prefs.                          −0.735     −0.714
                                                            (0.029)    (0.031)

B: Selection channel
                                       DV: Democrat          DV: roll call
Union membership                       0.161      0.106
                                      (0.024)    (0.023)
Democrat × low income prefs.                                 0.576      0.542
                                                            (0.012)    (0.015)
Democrat × high income prefs.                               −0.411     −0.423
                                                            (0.013)    (0.015)

District controls                                 X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N = 428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N = 15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N = 428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N = 15776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low-income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are on average more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
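The classification into district-specific income terciles can be sketched as follows. This is only an illustration under stated assumptions: the incomes and district labels are invented, the helper names are hypothetical, and the choice of quantile interpolation (`method="inclusive"`) is ours rather than something documented in the text.

```python
from statistics import quantiles

def tercile_thresholds(incomes):
    """Return the two cut points separating the lowest and highest income terciles."""
    low_cut, high_cut = quantiles(incomes, n=3, method="inclusive")
    return low_cut, high_cut

def income_group(income, low_cut, high_cut):
    """Classify an income relative to a district's own thresholds."""
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical household incomes for two districts.
districts = {
    "A": [12_000, 25_000, 31_000, 44_000, 58_000, 75_000, 120_000],
    "B": [20_000, 38_000, 52_000, 67_000, 90_000, 140_000, 210_000],
}
thresholds = {d: tercile_thresholds(v) for d, v in districts.items()}
groups = {d: [income_group(x, *thresholds[d]) for x in v] for d, v in districts.items()}

# The same income can fall into different groups depending on the district.
same_income = {d: income_group(40_000, *thresholds[d]) for d in districts}
print(thresholds, same_income)
```

The point of the design is visible in `same_income`: a household is "low income" only relative to the other households in its own district.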

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1: Matched CCES-House roll calls included in our analysis

Match  Bill       Date        House vote  Bill        Name
                              (Yea-Nay)   ideology†
(1)    HR 810     07/19/2006  235-193     L           Stem Cell Research Enhancement Act (Presidential Veto override)
(1)    HR 3       01/11/2007  253-174     L           Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5        06/07/2007  247-176     L           Stem Cell Research Enhancement Act of 2007
(2)    HR 2956    07/12/2007  223-201     L           Responsible Redeployment from Iraq Act
(3)    HR 2       01/10/2007  315-116     L           Fair Minimum Wage Act
(4)    HR 4297    12/08/2005  234-197     C           Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297    05/10/2006  244-185     C           Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045    07/28/2005  217-215     C           Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927     08/04/2007  227-183     C           Protect America Act
(6)    HR 6304    06/20/2008  293-129     C           FISA Amendments Act of 2008
(7)    HR 3162    08/01/2007  225-204     L           Children's Health and Medicare Protection Act
(7)    HR 976     10/18/2007  273-156     L           Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963    01/23/2008  260-152     L           Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2       02/04/2009  290-135     L           Children's Health Insurance Program Reauthorization Act
(8)    HR 3221    07/23/2008  272-152     L           Foreclosure Prevention Act of 2008
(9)    HR 3688    11/08/2007  285-132     C           United States-Peru Trade Promotion Agreement
(10)   HR 1424    10/03/2008  263-171     L           Emergency Economic Stabilization Act of 2008
(11)   HR 3080    10/12/2011  278-151     C           To implement the United States-Korea Trade Agreement
(12)   HR 3078    10/12/2011  262-167     C           To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346    06/16/2009  226-202     L           Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report)
(14)   HR 2831    07/31/2007  225-199     L           Lilly Ledbetter Fair Pay Act
(14)   HR 11      01/09/2009  247-171     L           Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181      01/27/2009  250-177     L           Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913    04/29/2009  249-175     L           Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1       02/13/2009  246-183     L           American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454    06/26/2009  219-212     L           American Clean Energy and Security Act
(18)   HR 3590    03/21/2010  220-212     L           Patient Protection and Affordable Care Act
(19)   HR 3962    11/07/2009  221-215     L           Affordable Health Care for America Act
(20)   HR 4173    06/30/2010  237-192     L           Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965    12/15/2010  250-175     L           Don't Ask Don't Tell Repeal Act of 2010
(22)   S 365      08/01/2011  269-161     C           Budget Control Act of 2011
(23)   H CR 34    04/15/2011  235-193     C           House Budget Plan of 2011
(24)   H CR 112   03/28/2012  38-382      C           Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8       08/01/2012  170-257     L           American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2       01/19/2011  245-189     C           Repealing the Job-Killing Health Care Law Act
(26)   HR 6079    07/11/2012  244-185     C           Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938    07/26/2011  279-147     C           North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

            33rd percentile             67th percentile
Congress    Mean     Min      Max       Mean     Min      Max
109         38123    16800    73675     77964    39612    146870
110         40127    18000    77000     83047    43600    155113
111         39021    17500    78262     82440    46000    160050
112         37381    16500    81000     79868    38500    158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

                                    Mean     SD       Min      Max      N
Roll-call vote: yea                 0.568    0.495    0.000    1.000    15780
Constituent preferences
  Low income                        0.593    0.220    0.047    0.979    15934
  High income                       0.555    0.198    0.037    0.967    15934
  Low-High Gap                      0.172    0.121    0.000    0.588    15934
Union membership [log]              9.705    1.046    6.094    13.619   15934
Population                          7.022    0.723    4.697    9.980    15934
Share African American              0.124    0.146    0.004    0.680    15934
Share Hispanic                      0.156    0.174    0.005    0.812    15934
Share BA or higher                  0.275    0.097    0.073    0.645    15934
Median income [$10,000]             5.177    1.356    2.282    10.439   15934
Share female                        0.508    0.010    0.462    0.543    15934
Manufacturing share                 0.110    0.047    0.025    0.281    15934
Urbanization                        0.790    0.199    0.213    1.000    15934
Certification elections [log]       3.347    0.861    0.000    5.100    15934
Congregations [per 1000 persons]    0.765    1.147    0.062    6.453    15934

Note: Calculated from American Community Survey 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from m individuals in the CCES and n individuals in the Census (with n ≫ m). Both sets of individuals share K common characteristics Z_k, such as age, race, or education. The first task at hand is then to match P roll call preferences Y_p that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about Y_p = f(Z_1, ..., Z_K) for p = 1, ..., P, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher-order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use f(Z_1, ..., Z_K) in combination with observed Z in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
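The train, predict, and aggregate logic described above can be sketched in a few lines. To keep the example dependency-free, the learner below is a simple cell-mean predictor standing in for f; it is explicitly not a random forest, and all data, field names, and function names are hypothetical.

```python
from collections import defaultdict

def train_cell_means(survey_rows, covariates, outcome):
    """Stand-in learner (NOT a random forest): mean outcome per covariate profile."""
    sums = defaultdict(lambda: [0.0, 0])
    total, n = 0.0, 0
    for row in survey_rows:
        key = tuple(row[c] for c in covariates)
        sums[key][0] += row[outcome]
        sums[key][1] += 1
        total += row[outcome]
        n += 1
    overall = total / n  # fallback for covariate profiles never seen in the survey
    cell_means = {k: s / cnt for k, (s, cnt) in sums.items()}
    return lambda row: cell_means.get(tuple(row[c] for c in covariates), overall)

def district_preferences(census_rows, predict):
    """Average predicted support by (district, income group)."""
    acc = defaultdict(lambda: [0.0, 0])
    for row in census_rows:
        key = (row["district"], row["income_group"])
        acc[key][0] += predict(row)
        acc[key][1] += 1
    return {k: s / cnt for k, (s, cnt) in acc.items()}

# Hypothetical survey sample (role of the CCES): covariates plus a 0/1 preference.
survey = [
    {"age": "young", "educ": "college", "support": 1},
    {"age": "young", "educ": "college", "support": 1},
    {"age": "young", "educ": "no_college", "support": 1},
    {"age": "young", "educ": "no_college", "support": 0},
    {"age": "old", "educ": "college", "support": 0},
    {"age": "old", "educ": "no_college", "support": 0},
]
# Hypothetical district-representative sample (role of the Census): no preferences observed.
census = [
    {"district": "D1", "income_group": "low", "age": "young", "educ": "no_college"},
    {"district": "D1", "income_group": "low", "age": "old", "educ": "no_college"},
    {"district": "D1", "income_group": "high", "age": "young", "educ": "college"},
    {"district": "D2", "income_group": "low", "age": "old", "educ": "college"},
]

f_hat = train_cell_means(survey, ["age", "educ"], "support")
prefs = district_preferences(census, f_hat)
print(prefs)
```

Swapping `train_cell_means` for a more flexible learner changes only the prediction step; the final averaging by district and income group stays the same.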

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


Figure B1: Illustration of Small Area Estimation of District Preferences

[Schematic: a stacked data matrix with units 1, ..., m from the CCES and units m+1, ..., m+n from the Census. Covariates Z_i1, ..., Z_iK are observed for all units. Preferences Y_i1, ..., Y_iP are observed for CCES units and missing (NA) for Census units. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census rows.]

We use a sample of m individuals from the CCES that is not necessarily representative on the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample consists of the 1% extracts of the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
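A minimal sketch of a midpoint Pareto scoring scheme is given below. The bins and counts are invented, the function names are hypothetical, and the Pareto step uses one common formulation (shape parameter estimated from the two highest bins); we do not claim it matches the exact implementation used here.

```python
import math

def pareto_top_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail.

    The shape parameter is estimated from the two highest bins; this is an
    assumption of the sketch, not a documented detail of the paper.
    """
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto shape must exceed 1 for a finite mean")
    return lower_top * alpha / (alpha - 1)

def bin_scores(bins, counts):
    """Assign one income score per bin.

    bins: list of (lower, upper) bounds with upper exclusive; the last bin has upper=None.
    counts: number of respondents in each bin.
    """
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]
    scores.append(pareto_top_mean(bins[-2][0], bins[-1][0], counts[-2], counts[-1]))
    return scores

# Hypothetical bins and counts (the CCES uses 14 bins; six are shown for brevity).
bins = [(0, 10_000), (10_000, 20_000), (20_000, 30_000),
        (30_000, 50_000), (50_000, 100_000), (100_000, None)]
counts = [50, 80, 120, 200, 150, 30]
scores = bin_scores(bins, counts)
print([round(s) for s in scores])
```

Treating the upper bound as exclusive reproduces the example in the text: the bin from $20,000 up to (but not including) $30,000 receives a score of $25,000.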

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$
• Missing values of $D_v$: $y^{(v)}_{mis}$
• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$
• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
  Start with initial guesses of missing values in $D$
  $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
  while not $c$ do
      $D^{imp}_{old} \leftarrow$ previously imputed $D$
      for $v$ in $w$ do
          Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
          Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
          $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
      Update stopping criterion $c$
  Return completed $D^{imp}$
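The control flow of Algorithm 1 can be sketched in plain Python. To stay self-contained, the sketch swaps the random forest for a k-nearest-neighbour predictor, so it shows the chained imputation loop rather than the learner itself; the data, the function names, the mean-based initial guess, and the squared-change stopping rule are all assumptions made for illustration.

```python
def initial_fill(data):
    """Initial guess: replace each missing entry (None) with its column mean."""
    filled = [row[:] for row in data]
    for v in range(len(data[0])):
        obs = [row[v] for row in data if row[v] is not None]
        mean = sum(obs) / len(obs)
        for row in filled:
            if row[v] is None:
                row[v] = mean
    return filled

def knn_predict(train_x, train_y, query_x, k=2):
    """Stand-in learner (NOT a random forest): mean outcome of the k nearest rows."""
    dist = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query_x)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dist[:k]
    return sum(y for _, y in nearest) / len(nearest)

def chained_impute(data, max_iter=10, tol=1e-8):
    ncol, nrow = len(data[0]), len(data)
    missing = {v: [i for i in range(nrow) if data[i][v] is None] for v in range(ncol)}
    # Visit columns in order of increasing share of missing values.
    order = sorted((v for v in range(ncol) if missing[v]), key=lambda v: len(missing[v]))
    imputed = initial_fill(data)
    for _ in range(max_iter):
        previous = [row[:] for row in imputed]
        for v in order:
            mis = set(missing[v])
            others = [c for c in range(ncol) if c != v]
            obs_rows = [i for i in range(nrow) if i not in mis]
            train_x = [[imputed[i][c] for c in others] for i in obs_rows]
            train_y = [imputed[i][v] for i in obs_rows]
            for i in missing[v]:
                query = [imputed[i][c] for c in others]
                imputed[i][v] = knn_predict(train_x, train_y, query)
        # Stopping criterion: total squared change in filled-in values.
        change = sum((a - b) ** 2
                     for new, old in zip(imputed, previous)
                     for a, b in zip(new, old))
        if change < tol:
            break
    return imputed

# Hypothetical data matrix with two missing entries.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, None],
    [4.0, None, 12.0],
    [5.0, 10.0, 15.0],
]
completed = chained_impute(data)
print(completed)
```

Observed entries are never overwritten; only the positions that were missing in the input are updated on each pass.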

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f.).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


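model using penalized maximum likelihood (Chung et al. 2013):

\[
\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\mathrm{race}}_{j[i]} + \alpha^{\mathrm{gender}}_{k[i]} + \alpha^{\mathrm{age}}_{l[i]} + \alpha^{\mathrm{educ}}_{m[i]} + \alpha^{\mathrm{income}}_{n[i]} + \alpha^{\mathrm{district}}_{d[i]}\right) \tag{B1}
\]

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

\begin{align}
\alpha^{\mathrm{race}}_j &\sim N(0, \sigma^2_{\mathrm{race}}), & j &= 1, \dots, 3 \tag{B2} \\
\alpha^{\mathrm{gender}}_k &\sim N(0, \sigma^2_{\mathrm{gender}}), & k &= 1, 2 \tag{B3} \\
\alpha^{\mathrm{age}}_l &\sim N(0, \sigma^2_{\mathrm{age}}), & l &= 1, \dots, 6 \tag{B4} \\
\alpha^{\mathrm{educ}}_m &\sim N(0, \sigma^2_{\mathrm{educ}}), & m &= 1, \dots, 5 \tag{B5} \\
\alpha^{\mathrm{income}}_n &\sim N(0, \sigma^2_{\mathrm{income}}), & n &= 1, \dots, 13 \tag{B6}
\end{align}

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\mathrm{state}}_{s[d]}$ and freely estimated variance:

\[
\alpha^{\mathrm{district}}_d \sim N\left(\alpha^{\mathrm{state}}_{s[d]}, \sigma^2_{\mathrm{state}}\right) \tag{B7}
\]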

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
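The poststratification step amounts to a population-weighted average of cell-level predictions. The sketch below illustrates it with a reduced two-attribute model; the effect values, cell counts, and function names are hypothetical and are not estimates from our data.

```python
import math

def inv_logit(x):
    """Inverse logit link, mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def poststratify(cells, effects, intercept):
    """Population-weighted average of cell-level predicted support."""
    total_weight = sum(c["count"] for c in cells)
    weighted = 0.0
    for c in cells:
        # Linear predictor: intercept plus the group effect of each attribute.
        eta = intercept + sum(effects[var][c[var]] for var in effects)
        weighted += c["count"] * inv_logit(eta)
    return weighted / total_weight

# Hypothetical group effects (the alphas) for a two-attribute model.
effects = {
    "race": {"white": -0.2, "black": 0.5, "other": 0.1},
    "educ": {"hs_or_less": 0.3, "college": -0.1},
}
# Hypothetical Census cell counts for low-income residents of one district.
cells = [
    {"race": "white", "educ": "hs_or_less", "count": 400},
    {"race": "white", "educ": "college", "count": 300},
    {"race": "black", "educ": "hs_or_less", "count": 200},
    {"race": "other", "educ": "college", "count": 100},
]
support = poststratify(cells, effects, intercept=0.0)

# Sanity check: with all effects set to zero, predicted support is exactly 0.5.
zero_effects = {"race": {"white": 0.0, "black": 0.0, "other": 0.0}}
baseline = poststratify(cells, zero_effects, intercept=0.0)
print(round(support, 3), baseline)
```

In the full model the linear predictor would also include the gender, age, income, and district terms of equation (B1); the weighting logic is unchanged.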

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low-income and −0.1 percentage points for high-income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases responsiveness of legislators towards the preferences of low-income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high-income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                           (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences     0.106     0.082     0.098     0.084     0.068     0.062
                          (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences   −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                          (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences     0.182     0.158     0.181     0.162     0.115     0.115
                          (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences   −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                          (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences     0.080     0.061     0.063     0.072     0.043     0.045
                          (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences   −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                          (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that, in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is then ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds.

In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
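The classification logic described in this section can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the nearest-rank percentile rule, the treatment of incomes exactly at a cutoff, and all high-income cutoffs are assumptions made for illustration. Only the approximate low-income thresholds ($30,000 for the 8th district, $40,000 to $50,000 elsewhere, about $47,000 statewide) come from the text.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the p-th percentile of `values` using the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def income_cutoffs(incomes, low_pct=33, high_pct=66):
    """Low and high cutoffs for one area (district or state).

    The defaults mirror the 33rd/66th percentile split; passing 20 and 80
    gives the shifted thresholds of panel (C).
    """
    return (nearest_rank_percentile(incomes, low_pct),
            nearest_rank_percentile(incomes, high_pct))

def income_group(income, low_cut, high_cut):
    """Classify an income relative to an area's cutoffs.

    Assumption: an income exactly at the low cutoff counts as low income.
    """
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical cutoffs. Low cutoffs follow the approximate values in the text;
# the high cutoffs are invented purely so the example runs.
cutoffs = {
    "typical MA district": (45_000, 90_000),
    "MA 8th district": (30_000, 70_000),
    "statewide": (47_000, 95_000),
}

voter_income = 40_000
groups = {area: income_group(voter_income, low, high)
          for area, (low, high) in cutoffs.items()}
print(groups)
```

The same voter is classified as low income under the typical district cutoff and the statewide cutoff, but not under the 8th district cutoff, which is the contrast the example in the text draws.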


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote. Standard errors in parentheses.

                               (1)       (2)       (3)       (4)       (5)       (6)

A. District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B. State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C. Shifted income thresholds (p20 - p80)
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. As proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good: everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there is also a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. The reports are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).

Figure D1: Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 0–13, 13–26, 26–39, 39–62, 62–164.]
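The interpolation step can be sketched in a few lines of Python. This is an illustrative reading of population-weighted areal interpolation, not the authors' implementation: the function name, the input layout, and the toy counties and districts are all hypothetical. Each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts by population share.

    county_counts: {county: count}
    intersections: list of (county, district, population) tuples, one per
                   county-district intersection.
    """
    # Total population of each county across all of its intersections.
    county_pop = defaultdict(float)
    for county, _district, pop in intersections:
        county_pop[county] += pop

    # Each intersection receives the county count times its population share.
    district_totals = defaultdict(float)
    for county, district, pop in intersections:
        share = pop / county_pop[county]
        district_totals[district] += share * county_counts[county]
    return dict(district_totals)

# Toy data: county A is split between two districts, county B is nested in D2.
county_counts = {"A": 100, "B": 40}
intersections = [
    ("A", "D1", 3000),
    ("A", "D2", 1000),
    ("B", "D2", 5000),
]
district_counts = interpolate_to_districts(county_counts, intersections)
print(district_counts)
```

When every county lies entirely within one district, as in a state with an at-large district, every share equals one and the function reduces to a simple sum of county counts.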


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                          Low income        High income
                                          preferences       preferences        N

(1) Injective preference-roll call map    0.063 (0.013)    −0.041 (0.013)    11,104
(2) Extreme preferences excl.             0.074 (0.016)    −0.048 (0.015)    13,308
(3) New York excluded                     0.070 (0.015)    −0.048 (0.014)    14,730
(4) Local Union Concentration             0.065 (0.014)    −0.047 (0.014)    15,780
(5) Trimmed LPM estimator                 0.074 (0.015)    −0.055 (0.014)    15,426
(6) Errors-in-variables                   0.062 (0.004)    −0.054 (0.004)    15,345

Note: Based on specification (5) of Table I. Row (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Row (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
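A minimal sketch of this per-roll-call trimming follows. The data layout, the function names, and the linearly interpolated percentile rule are assumptions for illustration; the text does not say which percentile definition was used.

```python
def percentile(values, p):
    """p-th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

def trim_by_roll_call(prefs, lower_pct=5, upper_pct=95):
    """Drop districts with extreme preference estimates, roll call by roll call.

    prefs: {roll_call: {district: preference_estimate}}
    Returns the same structure with districts outside the
    [lower_pct, upper_pct] percentile band of that roll call removed.
    """
    trimmed = {}
    for roll_call, by_district in prefs.items():
        lo = percentile(by_district.values(), lower_pct)
        hi = percentile(by_district.values(), upper_pct)
        trimmed[roll_call] = {d: v for d, v in by_district.items()
                              if lo <= v <= hi}
    return trimmed

# Hypothetical roll call with 21 districts and preferences 0.00, 0.05, ..., 1.00.
prefs = {"roll_call_1": {f"d{i}": i / 20 for i in range(21)}}
trimmed = trim_by_roll_call(prefs)
print(len(trimmed["roll_call_1"]))
```

In this toy case the two most extreme districts, d0 and d20, are removed and the remaining 19 are kept.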

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
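The following Python sketch illustrates one common reading of sequential trimming for the linear probability model: fit OLS, drop observations whose fitted values fall outside the unit interval, and refit until no fitted value lies outside it. It uses a single regressor and toy data; the function names and the stopping rule are assumptions, and it should not be taken as the exact procedure used for Table E1.

```python
def ols_line(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def trimmed_lpm(x, y):
    """Refit the LPM on observations whose fitted values lie in [0, 1].

    Returns (intercept, slope, kept_indices). The loop ends because the kept
    set shrinks on every pass that does not return; degenerate data that
    trims away all variation in x raises ZeroDivisionError.
    """
    keep = list(range(len(x)))
    while True:
        intercept, slope = ols_line([x[i] for i in keep], [y[i] for i in keep])
        inside = [i for i in keep if 0.0 <= intercept + slope * x[i] <= 1.0]
        if len(inside) == len(keep):
            return intercept, slope, keep
        keep = inside

# Toy data: ten ordinary observations plus one high-leverage point at x = 20.
x = list(range(10)) + [20]
y = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1]
intercept, slope, keep = trimmed_lpm(x, y)
print(round(intercept, 4), round(slope, 4), keep)
```

In this example the first fit predicts a probability above one for the observation at x = 20, so it is dropped; the refit on the remaining ten observations produces fitted values inside the unit interval and the procedure stops.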

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

[31] We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit district fixed effects in the notation and ignore $i$ subscripts):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation

$$
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}
$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
$$

$$
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}
$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
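As a worked illustration of the substitution of the treatment equation into the outcome equation, the reduced form can be written out explicitly. The shorthand $\kappa_{jd}$ is introduced here for exposition only and is not part of the original notation; the derivation is a sketch of the algebra under the sparse approximations (F3) and (F4), not a statement of the exact equations estimated. Collecting the treatment terms as $\kappa_{jd} \equiv \eta^{l}\theta^{l}_{jd} + \eta^{h}\theta^{h}_{jd}$ gives

$$
\begin{aligned}
y_{jd} &= \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \kappa_{jd} U_d + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \\
       &= \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \kappa_{jd}\left(w_d'\beta_{m0} + r_{md} + v_d\right) + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \\
       &= \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + w_d'\left(\beta_{g0} + \kappa_{jd}\beta_{m0}\right) + \left(r_{gd} + \kappa_{jd} r_{md}\right) + \left(\zeta_{jd} + \kappa_{jd} v_d\right).
\end{aligned}
$$

The middle bracket is the composite approximation error and the last bracket the composite disturbance. The coefficient on $w_d$ shows why controls matter through two channels: directly via $\beta_{g0}$ and indirectly via their association with union density, $\beta_{m0}$, which is what selecting on both equations is meant to capture.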

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).[32] Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

[32] For a very general discussion, see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\ \theta^{h}_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$
y = Kc \tag{G1}
$$

Here $K$ is an $N \times N$ Gaussian kernel matrix with entries

$$
K_{ij} = \exp\left(-\frac{\lVert z_i - z_j \rVert^{2}}{\sigma^{2}}\right) \tag{G2}
$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around each observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$
c^{*} = \operatorname*{argmin}_{c \in \mathbb{R}^{N}} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}
$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^{2}$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^{2} = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$
\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^{2}} \sum_{i} c_i \exp\left(-\frac{\lVert z_i - z_j \rVert^{2}}{\sigma^{2}}\right)\left(z^{U_d}_{j} - z^{U_d}_{i}\right) \tag{G4}
$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
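The KRLS steps in this section (Gaussian kernel, ridge-type solution for the weights, pointwise partial derivatives) can be sketched in plain Python. This is an illustrative toy, not the authors' implementation or the KRLS software: all function names and the data are hypothetical, and the penalty `lam` is fixed by assumption instead of being chosen by leave-one-out loss. The derivative uses the factor (evaluation point minus kernel center), which is what differentiating the Gaussian kernel with respect to the evaluation point yields.

```python
import math

def gaussian_kernel(a, b, sigma2):
    """exp(-||a - b||^2 / sigma^2) for two covariate vectors."""
    dist2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-dist2 / sigma2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def krls_fit(Z, y, lam):
    """Return weights c = (K + lam * I)^{-1} y and sigma^2 = number of columns."""
    sigma2 = float(len(Z[0]))
    n = len(Z)
    A = [[gaussian_kernel(Z[i], Z[j], sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, y), sigma2

def krls_predict(Z, c, sigma2, z_new):
    """Fitted value at z_new: weighted sum of kernels centered at the data."""
    return sum(ci * gaussian_kernel(zi, z_new, sigma2) for ci, zi in zip(c, Z))

def krls_partial(Z, c, sigma2, z_eval, d):
    """Partial derivative of the fitted surface w.r.t. column d at z_eval."""
    return (-2.0 / sigma2) * sum(
        ci * gaussian_kernel(zi, z_eval, sigma2) * (z_eval[d] - zi[d])
        for ci, zi in zip(c, Z))

# Hypothetical data: two covariates, six observations, binary-like outcomes.
Z = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [0.5, 2.0], [1.5, 1.5], [2.5, 0.2]]
y = [0.0, 0.3, 0.9, 0.4, 0.8, 1.0]
c, sigma2 = krls_fit(Z, y, lam=0.5)  # lam = 0.5 is an arbitrary choice
derivs = [krls_partial(Z, c, sigma2, zj, 0) for zj in Z]
print(derivs)
```

The list `derivs` holds one partial derivative per observation, which is the quantity that would then be smoothed and plotted against union membership. A quick way to check such a derivative routine is to compare it with a central finite difference of `krls_predict`.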

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University, CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and E. Washington (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5th ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam R (1993) Making Democracy Work Princeton NJ Princeton University PressPutnam R (2000) Bowling Alone e collapse and revival of american community New

York Simon and SchusterRatkovic M and D Tingley (2017) Sparse estimation and uncertainty with application to

subgroup analysis Political Analysis 25(1) 1ndash40Rhodes J H and B F Schaner (2017) Testing models of unequal representation Democratic

populists and republican oligarchs arterly Journal of Political Science 12(s) 185ndash204Richardson S and W R Gilks (1993) A bayesian approach to measurement error problems

in epidemiology using conditional independence models American Journal of Epidemiol-ogy 138(6) 430ndash442

Rigby E and G C Wright (2013) Political parties and representation of the poor in theamerican states American Journal of Political Science 57 (3) 552ndash565

Robinson P M (1988) Root-n-consistent semiparametric regression Econometrica 56(4)931ndash954

Rosenfeld J (2014) What Unions No Longer Do Cambridge Harvard University PressRupasingha A and S J Goetz (2008) US county-level social capital data 1990-2005 e

northeast regional center for rural development Penn State University University ParkPA

Samii C (2016) Causal empiricism in quantitative research Journal of Politics 78(3) 941ndash955Schlozman D (2015) When Movements Anchor Parties Princeton Princeton University

PressSchlozman K L S Verba and H E Brady (2012) e Unheavenly Chorus Unequal PoliticalVoice and the Broken Promise of American Democracy Princeton Princeton UniversityPress

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

III Data and Empirical Strategy

Any effort to test the relevance of unions for unequal representation confronts major challenges of measurement and causal interpretation. The dataset we have compiled allows us to address these issues to an extent previously impossible. We have created a panel of legislators' roll call votes matched to income-specific policy preferences at the district level and district-level measures of union membership. Our main empirical strategy to examine the influence of unions on unequal representation is built on two basic pillars: district fixed effects and interactive controls. The fact that we observe several roll calls within a given congressional district allows us to specify a model with district fixed effects, which capture unobservable characteristics of districts (and states) that are constant over roll calls, such as historical legacies or the strength of partisan organization. To provide for a stricter test of the moderating effect of unions, we also allow a rich set of other district characteristics to moderate the link between income groups and legislators' voting behavior. This amounts to estimating models including interactions between observed district characteristics and group preferences. In our most flexible specification, we allow these to be non-linear (we describe our models in more detail below).

The data required to implement these models were constructed in three steps. First, we match information on roll call items for 223,000 CCES respondents to actual roll call votes cast in the House of Representatives in the 109th to the 112th Congress.8 Second, we estimate policy preferences for low- and high-income constituents in each district for 27 roll calls. To deal with the fact that the CCES is not a representative sample of district populations, we use a small area estimation strategy combining the CCES sample with unit record Census data, matching the full distribution of age, education, gender, race, and income using a chained Random Forests algorithm (more below and in Appendix B). Third, we measure district-level union membership based on digitized administrative records from the Department of Labor.

IIIA CCES data and Congressional roll calls

The CCES is an ideal starting point for our analysis, since it is a nationally representative study, includes a considerable number of roll call questions, and provides us with a large enough sample size to decompose income-group preferences by district. It addresses several data concerns that plagued initial research on unequal responsiveness in Congress (Bhatti and Erikson 2011). The roll calls included in the CCES concern key votes as identified by Congressional Quarterly and the Washington Post and cover a broad range of issues (Ansolabehere and Jones 2010). Respondents are presented with the key wording of the bill (as used on the floor and in media reports) and are then asked to cast their own vote: "What about you? If you were faced with this decision, would you vote for, against, or not sure?" Contrary to widely used agree–disagree survey measures of issue preferences, matched roll call votes provide us with unequivocal evidence of policy congruence between respondent and legislator (Jessee 2009; Ansolabehere and Jones 2010, 585). We match 27 roll call items in the CCES to roll call votes cast in the House of the 109th to 112th Congress. These cover important legislative decisions, such as Dodd-Frank, the Affordable Care Act (and attempts to repeal it), the minimum wage increase, the ratification of the Central America Free Trade Agreement, or the Lilly Ledbetter Fair Pay Act. Table A1 in the Appendix lists all matched CCES items and House bills included in our estimation sample.

In this paper, we focus on union membership, but show in a robustness test that our results still obtain when accounting for union concentration (see Table E1).

8 Our analysis focuses on one apportionment period, which generally holds district boundaries constant (we show that the results are robust to cases of mid-period redistricting).

IIIB Measuring constituency preferences by income group

The CCES provides us with a comparatively large sample size per district. However, an important potential issue is that it is not designed to be representative for congressional district populations. Thus, individuals with certain characteristics, such as particular combinations of income, race, and education, may be underrepresented in the CCES sample for a given district. If this is the case, unadjusted policy preferences from the CCES will not reflect the target population, and using them can lead to biased estimates of unequal representation in Congress, as politicians are held to the wrong benchmark. The solution to this issue is to employ some form of small area estimation to rebalance the survey sample to represent the district population. The machine-learning solution we propose is relatively new to the representation literature in political science, but it has some attractive features that merit its application to this topic. It does not require distributional and functional form assumptions, it allows for arbitrary higher-order interactions of covariates, and it can fully leverage fine-grained census data to construct representative samples of congressional districts. However, we stress that our findings do not depend on this particular approach. As shown in Online Appendix B, our approach leads to somewhat more conservative estimates of the impact of unions on the representation of different income groups compared to the MRP approach widely used by political scientists (Lax and Phillips 2009). Qualitatively, both approaches yield the same conclusions.

Our approach, small area estimation using chained random forests, matches CCES survey respondents to corresponding cases from unit record Census data. The design of the Census ensures an accurate representation of the distribution of population characteristics in a given district (Torrieri et al. 2014, Ch. 4). Matching these two data sources is essentially a prediction problem, which we address using a flexible non-parametric machine learning approach based on random forests (Stekhoven and Bühlmann 2011).9 Put simply, the idea is that rich census data exist for every district, whereas survey data on preferences are scarce in some districts and may not be fully representative. Using general machine learning tools, we can attach preferences to the Census by matching it to CCES respondents based on common demographic characteristics. The resulting data set of public preferences is representative of congressional districts.

9 Honaker and Plutzer (2016) use a similar approach (but relying on multivariate normal imputations) and further discuss its empirical performance in estimating small area attitudes and preferences.

Concretely, we use about 3 million individual-level records from a synthetic sample of the Census Bureau's American Community Survey from 2006 to 2011. We stack both datasets, creating a structure where we have common district identifiers and individual covariates, while responses to policy preference questions are missing in the Census portion of the data. As common covariates bridging CCES and Census, we use the following demographic characteristics: gender, race (3 categories), education (5 categories), age (continuous), and family income (continuous).10 The latter is of particular relevance, as we are interested in producing district–income group specific preferences.

In the next step, we fill missing roll call preferences in the Census with matching data from CCES respondents. Since this is essentially a prediction problem, we can use powerful tools developed in the machine learning literature to achieve this task. We use an algorithm proposed by Stekhoven and Bühlmann (2011), which uses chained random forests (Breiman 2001) to impute missing cells. Compared to commonly used multivariate normal or regression imputation techniques, this strategy has the advantage that it is fully nonparametric, allows for complex interactions between covariates, and deals with both continuous and categorical data (Tang and Ishwaran 2017). Our completed data set now contains preferences for 27 roll call items of synthetic 'Census individuals', which are a representative sample of each House district.
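The stacking structure described here can be illustrated with a deliberately crude stand-in. The sketch below is not the chained random forest algorithm used in the paper: it fills missing preferences with the survey mean of the same demographic cell, which is the simplest possible "matching on common covariates". All field names (`gender`, `educ`, `prefers_yea`) and the toy records are hypothetical.

```python
from collections import defaultdict


def stack(survey_rows, census_rows):
    """Stack survey and census records; census preferences start as None."""
    stacked = [dict(r, source="survey") for r in survey_rows]
    stacked += [dict(r, prefers_yea=None, source="census") for r in census_rows]
    return stacked


def fill_by_cell_mean(stacked, keys):
    """Fill missing preferences with the survey mean of the same covariate cell.

    This exact-cell rule is an assumption made for illustration only; a
    random forest would instead predict from all covariates jointly.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for row in stacked:
        if row["prefers_yea"] is not None:
            cell = tuple(row[k] for k in keys)
            totals[cell][0] += row["prefers_yea"]
            totals[cell][1] += 1
    for row in stacked:
        if row["prefers_yea"] is None:
            cell = tuple(row[k] for k in keys)
            if cell in totals:
                cell_sum, cell_n = totals[cell]
                row["prefers_yea"] = cell_sum / cell_n
    return stacked


# Hypothetical toy records.
survey = [
    {"gender": "f", "educ": "ba", "prefers_yea": 1},
    {"gender": "f", "educ": "ba", "prefers_yea": 0},
    {"gender": "m", "educ": "hs", "prefers_yea": 1},
]
census = [
    {"gender": "f", "educ": "ba"},
    {"gender": "m", "educ": "hs"},
    {"gender": "m", "educ": "ba"},  # no matching survey cell
]
completed = fill_by_cell_mean(stack(survey, census), keys=("gender", "educ"))
```

The last census record stays unfilled because no survey respondent shares its cell. Sparse cells of this kind are one reason a model-based imputation that pools information across covariates is attractive.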

With these data in hand, we assign individuals to income groups and calculate group-specific preferences for each roll call in each district. Following previous work in the representation literature (Bartels 2008, 2016), we delineate low- and high-income respondents using the 33rd and 67th percentile of the distribution of family incomes. Note that, in line with theories of constituency representation in Congress, we specify these income thresholds separately by congressional district. This accounts for the substantial differences in both average income and income inequality between U.S. districts. It also ensures that within each district, income groups are of comparable size. Online Appendix Table A2 shows the distribution of income-group cutoffs. On average, our chosen cutoffs are close to those used in the established literature. The mean of our district-specific low-income cutoffs is around $39,000, while Bartels uses $40,000 (Bartels 2016, 240); our mean high-income cutoff is around $81,000, where Bartels employs a threshold of $80,000. However, beyond these averages lies considerable variation. In some districts, the 33rd percentile cutoff is as low as $16,500, while the 67th percentile reaches almost $160,000 in others.11
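To make the district-specific thresholds concrete, here is a minimal sketch of assigning income groups with cutoffs computed separately within each district. The linear-interpolation percentile rule, the treatment of ties at the cutoff, and the toy incomes are assumptions for illustration; the paper does not state these details.

```python
from collections import defaultdict


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def assign_income_groups(records, low_q=33, high_q=67):
    """Label each (district, income) record using its own district's cutoffs."""
    by_district = defaultdict(list)
    for district, income in records:
        by_district[district].append(income)
    cutoffs = {
        d: (percentile(sorted(v), low_q), percentile(sorted(v), high_q))
        for d, v in by_district.items()
    }
    labels = []
    for district, income in records:
        low_cut, high_cut = cutoffs[district]
        if income <= low_cut:
            labels.append("low")
        elif income >= high_cut:
            labels.append("high")
        else:
            labels.append("middle")
    return cutoffs, labels


# Hypothetical toy data: two districts with very different income levels.
records = [("A", x) for x in (10, 20, 30, 40, 50, 60, 70)]
records += [("B", x) for x in (100, 200, 300, 400, 500, 600, 700)]
cutoffs, labels = assign_income_groups(records)
```

In this toy example an income of 100 counts as low in district B even though it would be far above the high cutoff of district A, which is the point of setting thresholds within districts.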

10 See Appendix B for more details on the construction of our Census sample and our matching/imputation procedure.

11 Results are relatively invariant to using alternative income thresholds (see Table C1).


Figure I: District-level income gap in public support for 6 selected policies
[Six histogram panels: Increase Minimum Wage; Housing Crisis Assistance; Fair Pay Act; Affordable Care Act; CAFTA Ratification; Recovery and Reinvestment.]

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district's income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by $(\theta^l, \theta^h)$, as the proportion of individuals voting 'yea'. Since preference estimates are in $[0, 1]$, they can be directly related to legislators' probability of voting 'yea' on a given roll call. Our data show considerable variation in the distance between the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences $(\theta^h - \theta^l)$ in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, the vast majority of low-income constituents are more supportive than their high-income counterparts in each and every district. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly in favor. In all examples, we find considerable across-district variation in the preference gap between low- and high-income constituents.12 We will employ this variation over both roll calls and districts to estimate legislators' differential responsiveness to changes in policy preferences of different income groups, and how it might be moderated by union strength.

12 Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.
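The group preferences defined above are simple proportions, so they can be sketched in a few lines. The row layout, district and roll-call labels, and votes below are hypothetical; the absolute gap mirrors the summary statistic reported in the footnote rather than any particular sign convention.

```python
from collections import defaultdict


def group_preferences(rows):
    """Share voting 'yea' per (district, roll call, income group).

    rows: iterable of (district, roll_call, group, vote) with vote in {0, 1}.
    """
    yea = defaultdict(int)
    count = defaultdict(int)
    for district, roll_call, group, vote in rows:
        key = (district, roll_call, group)
        yea[key] += vote
        count[key] += 1
    return {key: yea[key] / count[key] for key in count}


def absolute_gap(theta, district, roll_call):
    """Absolute difference between low- and high-income support."""
    low = theta[(district, roll_call, "low")]
    high = theta[(district, roll_call, "high")]
    return abs(low - high)


# Hypothetical imputed preferences for one district and one roll call.
rows = [
    ("D1", "min_wage", "low", 1), ("D1", "min_wage", "low", 1),
    ("D1", "min_wage", "low", 1), ("D1", "min_wage", "low", 0),
    ("D1", "min_wage", "high", 1), ("D1", "min_wage", "high", 0),
    ("D1", "min_wage", "high", 0), ("D1", "min_wage", "high", 0),
]
theta = group_preferences(rows)
gap = absolute_gap(theta, "D1", "min_wage")
```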

IIIC District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report or reporting falsified information is made a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit nonresponse (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.14 Another advantage for the study of politics is that the presence of union locales is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
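The aggregation step is a sum of reported members by district. The sketch below adds the logging and standardization that the statistical specification applies to union membership. The district labels and member counts are hypothetical, and the use of the population standard deviation is an assumption, since the paper does not say which variant is used.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def district_membership(reports):
    """Sum reported members over all union reports located in each district."""
    totals = defaultdict(int)
    for district, members in reports:
        totals[district] += members
    return dict(totals)


def standardized_log(totals):
    """Log each district total, then scale to mean zero and unit SD.

    Assumes every district has at least one member (log of zero is undefined).
    """
    logged = {d: math.log(m) for d, m in totals.items()}
    mu = mean(logged.values())
    sd = pstdev(logged.values())
    return {d: (v - mu) / sd for d, v in logged.items()}


# Hypothetical geocoded reports: (district, reported members).
reports = [("D1", 1000), ("D1", 500), ("D2", 150), ("D3", 15000)]
totals = district_membership(reports)
z = standardized_log(totals)
```

Logging compresses the long right tail of membership counts, so the standardized measure compares districts in relative rather than absolute terms.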

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost complete coverage, as during the latter part of the twentieth century unions are increasingly organizing workers across different sectors and occupations (Lichtenstein 2013, 249).15

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.

15 While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with the widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, DC, which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 0.86).

Figure II: Union membership in House districts, 109th-112th Congress
[Map legend: 1st quartile, 2nd quartile, 3rd quartile, 4th quartile.]

IIID Statistical specifications

For each roll call vote $j$ ($j = 1, \dots, J$), we have measured preferences of low- and high-income citizens in a given congressional district $d$ ($d = 1, \dots, D$), denoted by $(\theta^l_{jd}, \theta^h_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts' workers to organize collective action, (iv) as well as non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House members is the following linear probability specification:

$$
\begin{aligned}
y_{ijd} ={}& \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l (U_d \times \theta^l_{jd}) + \eta^h (U_d \times \theta^h_{jd}) \\
&+ \beta^l (X_d \times \theta^l_{jd}) + \beta^h (X_d \times \theta^h_{jd}) + \alpha_d + \epsilon_{ijd}.
\end{aligned}
$$

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d \theta^h_{jd}$ and $U_d \theta^l_{jd}$. Thus, when $\eta^l$ and $\eta^h$ are zero, the group-specific preference coefficients $\mu^l$ and $\mu^h$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^l$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^h$. Our theoretical expectation is that $\eta^l > 0$ and $\eta^h \le 0$.
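Differentiating the linear probability specification with respect to each group's preference makes the interpretation of these coefficients explicit. The short derivation below uses only the symbols already defined; treating $\beta^l X_d$ and $\beta^h X_d$ as inner products over the elements of $X_d$ is a notational assumption.

```latex
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^l_{jd}}
  = \mu^l + \eta^l U_d + \beta^l X_d,
\qquad
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^h_{jd}}
  = \mu^h + \eta^h U_d + \beta^h X_d .
```

Because all inputs are standardized, a district with average union membership and average covariates has $U_d = 0$ and $X_d = 0$, so responsiveness there reduces to $\mu^l$ and $\mu^h$. The gap in responsiveness between the affluent and the poor is $(\mu^h - \mu^l) + (\eta^h - \eta^l) U_d + (\beta^h - \beta^l) X_d$, which shrinks as $U_d$ rises whenever $\eta^l > 0$ and $\eta^h \le 0$.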

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects, $\alpha_d$.16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and state level) and group preferences, $X_d \theta^l_{jd}$ and $X_d \theta^h_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses, detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
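The logic of district fixed effects can be seen in a stripped-down example with a single regressor. The sketch below demeans the data by district (the within transformation) and compares the resulting slope with a pooled regression. It is an illustration under simplifying assumptions, not the authors' estimation code: it has one regressor, no interactions, and no cluster-robust standard errors, and the data are made up.

```python
from collections import defaultdict


def within_slope(districts, x, y):
    """Fixed-effects (within) slope for one regressor.

    Demeans x and y by district, then fits OLS through the origin on the
    demeaned data, so any district-constant term drops out.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for d, xi, yi in zip(districts, x, y):
        acc = sums[d]
        acc[0] += xi
        acc[1] += yi
        acc[2] += 1
    sxy = sxx = 0.0
    for d, xi, yi in zip(districts, x, y):
        sum_x, sum_y, n = sums[d]
        dx = xi - sum_x / n
        dy = yi - sum_y / n
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


def pooled_slope(x, y):
    """Ordinary OLS slope that ignores district membership."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: y = alpha_d + 0.5 * x, where the district intercept
# alpha_d is higher in the district with lower x.
districts = ["A", "A", "A", "B", "B", "B"]
x = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
alpha = {"A": 0.5, "B": 0.0}
y = [alpha[d] + 0.5 * xi for d, xi in zip(districts, x)]

fe = within_slope(districts, x, y)
pooled = pooled_slope(x, y)
```

In this toy example the within estimator recovers the slope of 0.5, while the pooled slope even has the wrong sign because it confuses between-district differences in intercepts with the effect of x.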

IV Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above, with district fixed effects but without accounting for local union organization (setting $\beta^l$, $\beta^h$ and $\eta^l$, $\eta^h$ to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase in the probability of legislators to cast a corresponding vote of 13.6 (±1.2) percentage points. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.
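As a quick check on these numbers, the sketch below recomputes 95% intervals, reading the estimates as 13.6, 1.6 and 11.9 percentage points. It assumes that the values after ± are standard errors and that a normal approximation applies; both are inferences from the reported interval, not statements made in the text.

```python
def normal_ci(estimate, std_error, z=1.96):
    """Two-sided 95% interval under a normal approximation."""
    return estimate - z * std_error, estimate + z * std_error


# Estimates in percentage points, with the +/- values read as standard errors.
low_income_ci = normal_ci(1.6, 1.4)
high_income_ci = normal_ci(13.6, 1.2)
gap_ci = normal_ci(11.9, 2.5)

# Whether each interval excludes zero.
low_income_excludes_zero = not (low_income_ci[0] <= 0 <= low_income_ci[1])
high_income_excludes_zero = not (high_income_ci[0] <= 0 <= high_income_ci[1])
gap_excludes_zero = not (gap_ci[0] <= 0 <= gap_ci[1])
```

Under these assumptions the low-income interval runs from about −1.1 to about 4.3, which agrees with the reported interval up to rounding of the inputs and includes zero, while the intervals for the affluent and for the gap do not.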

IVA Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership, with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preference-confounder interactions (setting $\beta^l$ and $\beta^h$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018; Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^l$ and $\theta^h$. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

17 Calculated from a LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors on the district level. See also specification (1) in Table I below.

Figure III: District-level union membership as moderator of unequal representation
[Two panels: Low income constituents; High income constituents. x-axis: Union membership [std], with p10, p25, p50, p75, p90 marked; y-axis: Marginal effect.]

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes, conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

Table I: Union density and representation. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                                  (1)      (2)      (3)      (4)      (5)      (6)
Low income preferences           0.106    0.082    0.098    0.084    0.068    0.062
                                (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences         −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                                (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)
District fixed effects             X        X        X        X        X        X
Group preferences
  × union policy                            X                                   X
  × state constants                                  X
  × organizational capacity                                   X                 X
  × district covariates                                                X        X

Note: N = 15,780; N_d = 534; 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^l$ and $\eta^h$. Specifications (2) to (5) include coefficients for the interaction ($\beta^l$, $\beta^h$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements. (3) interacts preferences with state fixed effects. (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections. (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

18 Certification elections are not a foregone conclusion: during the 112th Congress unions won 59

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics, see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
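To illustrate what population-based weighting amounts to, the following is a minimal sketch rather than the procedure actually used for the analysis. It assumes a hypothetical county-by-district overlap matrix of population counts (`overlap_pop`) and treats the measure as an average-type quantity such as the index; the function name and all numbers are invented for illustration.

```python
import numpy as np

def aggregate_to_districts(county_values, overlap_pop):
    """Population-weighted average of county values for each district.

    county_values: shape (n_counties,), e.g. a county-level index.
    overlap_pop:   shape (n_counties, n_districts); entry [c, d] is the
                   (hypothetical) population of county c living in district d.
    """
    county_values = np.asarray(county_values, dtype=float)
    overlap_pop = np.asarray(overlap_pop, dtype=float)
    weighted_sum = county_values @ overlap_pop   # sum over c of x_c * pop_cd
    district_pop = overlap_pop.sum(axis=0)       # sum over c of pop_cd
    return weighted_sum / district_pop

# Toy example: three counties, two districts (made-up numbers).
county_index = [1.0, 3.0, 5.0]
overlap = [[100, 0], [50, 50], [0, 200]]
district_index = aggregate_to_districts(county_index, overlap)
```

A count-type measure (such as the number of establishments) would instead be allocated in proportion to each county's population share falling into a district; which variant applies here is not specified in the text.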

Table II
Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications

                                         Low income        High income
(1a) Social capital: bowling alleys      0.067 (0.014)     −0.051 (0.013)
(1b) Social capital: index               0.065 (0.014)     −0.048 (0.013)
(2)  Redistricting                       0.067 (0.014)     −0.051 (0.013)
(3)  MRP estimated preferences           0.115 (0.022)     −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for ηl and ηh. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts. N=15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred. N=14,150. Specification (3) uses preferences estimated using MRP. See Appendix B for more details. N=15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews, see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here, we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In


the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools, such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
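The logic of double selection can be sketched in a few lines. The example below is an illustration on simulated data, not the estimation code behind the results: it uses a plain LASSO with a fixed, hand-picked penalty (`lam`) instead of the root-LASSO with a data-driven penalty, omits sample splitting, and has a single treatment variable. All function and variable names are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO: minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]               # partial residual without x_j
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            resid -= X[:, j] * beta[j]
    return beta

def post_double_selection(y, d, X, lam):
    """Select controls in outcome and treatment equations, then run OLS."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    sel_y = np.flatnonzero(lasso_cd(Xs, y - y.mean(), lam))   # predicts outcome
    sel_d = np.flatnonzero(lasso_cd(Xs, d - d.mean(), lam))   # predicts treatment
    selected = np.union1d(sel_y, sel_d)                       # union of both sets
    Z = np.column_stack([np.ones(len(y)), d, Xs[:, selected]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)              # post-LASSO OLS
    return coef[1], selected

# Simulated data (assumption): true effect of d on y is 1.0; controls 0, 1, 2 matter.
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)              # "treatment" equation
y = 1.0 * d + 2.0 * X[:, 0] + X[:, 2] + rng.normal(size=n)    # outcome equation
effect, selected = post_double_selection(y, d, X, lam=0.1)
```

Taking the union of the two selected sets is what protects against moderate selection mistakes: a control that the outcome LASSO misses can still enter through the treatment equation, and vice versa.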

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).


Table III
Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups

                               (1)        (2)        (3)
Low income preferences         0.063      0.066      0.062
                               (0.014)    (0.017)    (0.016)
High income preferences        −0.054     −0.036     −0.040
                               (0.013)    (0.015)    (0.016)

Semi-parametric terms          144        312        624
Post-LASSO terms               18         45         112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on 50% of the sample, parameter estimates performed on the remaining 50% (N=7,884). Table entries are estimates for ηL and ηH with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data


speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
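The mechanics can be illustrated with a small sketch on simulated data. It is not the estimation code behind Figure IV: the regularization parameter `lam` is fixed by hand rather than selected from the data, the kernel bandwidth is assumed to equal the number of covariates, and the two simulated covariates merely stand in for preferences and union membership. The data-generating process contains an interaction that the model is never told about.

```python
import numpy as np

def krls_fit(X, y, sigma2=None, lam=0.1):
    """Fit y ~ K c with Gaussian kernel K_ij = exp(-||x_i - x_j||^2 / sigma2)."""
    n, dim = X.shape
    sigma2 = float(dim) if sigma2 is None else sigma2     # assumed default bandwidth
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / sigma2)
    coef = np.linalg.solve(K + lam * np.eye(n), y)        # ridge-regularized weights
    return K, coef, sigma2

def pointwise_derivatives(X, K, coef, sigma2):
    """Partial derivative of the fitted surface at each observation.

    d yhat(x_j) / d x_j[k] = (-2 / sigma2) * sum_i coef_i * K[j, i] * (x_j[k] - x_i[k])
    """
    diffs = X[:, None, :] - X[None, :, :]                 # diffs[j, i, k]
    return (-2.0 / sigma2) * np.einsum("i,ji,jid->jd", coef, K, diffs)

# Simulated data (assumption): column 0 plays the role of preferences,
# column 1 the role of union membership; y contains their product.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=n)

K, coef, sigma2 = krls_fit(X, y)
derivs = pointwise_derivatives(X, K, coef, sigma2)
# If the data contain an interaction, the marginal effect of column 0
# should vary systematically with column 1.
corr = np.corrcoef(derivs[:, 0], X[:, 1])[0, 1]
```

Plotting `derivs[:, 0]` against `X[:, 1]` with a smoother gives the kind of display summarized in Figure IV: a flat line would indicate no moderation, a sloped or curved line an interaction of whatever shape the data support.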

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], from −1.6 to 1.6, with the p10, p25, p50, p75, and p90 percentiles marked. y-axis: Partial effect, from −0.4 to 0.4.]

Figure IV
Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills.


The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV
Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups

                                   Low income        High income
(A) Private vs. public unions
    Public unions                  0.074 (0.016)     −0.058 (0.015)
    Non-public unions              0.054 (0.016)     −0.027 (0.016)

(B) Bill ideology
    Conservative bill              0.086 (0.017)     −0.086 (0.018)
    Liberal bill                   0.052 (0.014)     −0.028 (0.013)

(C) AFL-CIO endorsement
    No position                    0.054 (0.014)     −0.054 (0.013)
    Endorsement                    0.077 (0.015)     −0.040 (0.014)

Note: Estimates for ηL and ηH with cluster-robust standard errors in parentheses. N=15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent,

23 Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for ηh is smaller for endorsed bills, but still significantly different from zero.


more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress and (ii) that these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan 1983), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
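The conversion from a log-scale coefficient to dollar amounts can be sketched as follows. This is a minimal illustration of Duan's smearing retransformation on simulated data with a single regressor; the variable names, the coefficient values, and the noise level are assumptions made for the example and are unrelated to the estimates in Table V.

```python
import numpy as np

def duan_smearing_effect(x, log_y, x0, x1):
    """Change in the predicted level of y when x moves from x0 to x1.

    Fits log(y) = a + b*x by OLS and rescales exp(a + b*x) by the smearing
    factor, the sample mean of the exponentiated residuals.
    """
    Z = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(Z, log_y, rcond=None)
    resid = log_y - Z @ beta
    smear = np.exp(resid).mean()                      # Duan's smearing factor

    def level(v):
        return smear * np.exp(beta[0] + beta[1] * v)  # retransformed prediction

    return level(x1) - level(x0), smear

# Simulated data (assumption): standardized union membership and log contributions.
rng = np.random.default_rng(2)
n = 2000
union = rng.normal(size=n)
log_contrib = 10.0 + 0.1 * union + rng.normal(scale=0.5, size=n)

# Effect of a one standard deviation increase, expressed in levels.
effect_dollars, smear = duan_smearing_effect(union, log_contrib, 0.0, 1.0)
```

Simply exponentiating the fitted log values would understate expected contributions, because the mean of exp(error) exceeds one; the smearing factor corrects for this without assuming normally distributed errors.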

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians, if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V
Labor contributions and selection of Democratic legislators

                                          (1)        (2)        (3)        (4)

A: Contributions channel
                                          DV: Contrib.          DV: roll call
Union membership                          0.056      0.046
                                          (0.012)    (0.014)
Contributions × low income prefs                                0.946      0.865
                                                                (0.036)    (0.034)
Contributions × high income prefs                               −0.735     −0.714
                                                                (0.029)    (0.031)

B: Selection channel
                                          DV: Democrat          DV: roll call
Union membership                          0.161      0.106
                                          (0.024)    (0.023)
Democrat × low income prefs                                     0.576      0.542
                                                                (0.012)    (0.015)
Democrat × high income prefs                                    −0.411     −0.423
                                                                (0.013)    (0.015)

District controls                                    X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15,780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15,776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an


important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed, but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, we consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables, and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
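As a simple illustration of the classification step, the sketch below computes two tercile cut points from a vector of district incomes and assigns respondents to groups. It is a simplified, unweighted version written for exposition: survey weights, the treatment of respondents exactly at a threshold, and the income values themselves are assumptions, and the function names are hypothetical.

```python
import numpy as np

def tercile_thresholds(incomes):
    """Return the cut points separating the lowest and highest income terciles."""
    low, high = np.quantile(np.asarray(incomes, dtype=float), [1 / 3, 2 / 3])
    return low, high

def income_group(income, low, high):
    """Classify a respondent given district-specific thresholds."""
    if income <= low:
        return "low"
    if income > high:
        return "high"
    return "middle"

# Toy district (made-up household incomes, in thousands of dollars).
district_incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90]
low, high = tercile_thresholds(district_incomes)
groups = [income_group(v, low, high) for v in (25, 50, 85)]
```

Because the thresholds are computed separately for each district and year, the same household income can fall into different groups in different districts.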

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1: Matched CCES-House roll calls included in our analysis

| Match | Bill | Date | Name | House Vote (Yea-Nay) | Bill Ideology† |
|---|---|---|---|---|---|
| (1) | HR 810 | 07/19/2006 | Stem Cell Research Enhancement Act (Presidential Veto Override) | 235-193 | L |
| (1) | HR 3 | 01/11/2007 | Stem Cell Research Enhancement Act of 2007 (House) | 253-174 | L |
| (1) | S 5 | 06/07/2007 | Stem Cell Research Enhancement Act of 2007 | 247-176 | L |
| (2) | HR 2956 | 07/12/2007 | Responsible Redeployment from Iraq Act | 223-201 | L |
| (3) | HR 2 | 01/10/2007 | Fair Minimum Wage Act | 315-116 | L |
| (4) | HR 4297 | 12/08/2005 | Tax Relief Extension Reconciliation Act (Passage) | 234-197 | C |
| (4) | HR 4297 | 05/10/2006 | Tax Relief Extension Reconciliation Act (Agreeing to Conference Report) | 244-185 | C |
| (5) | HR 3045 | 07/28/2005 | Dominican Republic-Central America-United States Free Trade Agreement Implementation Act | 217-215 | C |
| (6) | S 1927 | 08/04/2007 | Protect America Act | 227-183 | C |
| (6) | HR 6304 | 06/20/2008 | FISA Amendments Act of 2008 | 293-129 | C |
| (7) | HR 3162 | 08/01/2007 | Children's Health and Medicare Protection Act | 225-204 | L |
| (7) | HR 976 | 10/18/2007 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 273-156 | L |
| (7) | HR 3963 | 01/23/2008 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 260-152 | L |
| (7) | HR 2 | 02/04/2009 | Children's Health Insurance Program Reauthorization Act | 290-135 | L |
| (8) | HR 3221 | 07/23/2008 | Foreclosure Prevention Act of 2008 | 272-152 | L |
| (9) | HR 3688 | 11/08/2007 | United States-Peru Trade Promotion Agreement | 285-132 | C |
| (10) | HR 1424 | 10/03/2008 | Emergency Economic Stabilization Act of 2008 | 263-171 | L |
| (11) | HR 3080 | 10/12/2011 | To implement the United States-Korea Trade Agreement | 278-151 | C |
| (12) | HR 3078 | 10/12/2011 | To implement the United States-Colombia Trade Promotion Agreement | 262-167 | C |
| (13) | HR 2346 | 06/16/2009 | Supplemental Appropriations, Fiscal Year 2009 (Agreeing to conference report) | 226-202 | L |
| (14) | HR 2831 | 07/31/2007 | Lilly Ledbetter Fair Pay Act | 225-199 | L |
| (14) | HR 11 | 01/09/2009 | Lilly Ledbetter Fair Pay Act of 2009 (House) | 247-171 | L |
| (14) | S 181 | 01/27/2009 | Lilly Ledbetter Fair Pay Act of 2009 | 250-177 | L |
| (15) | HR 1913 | 04/29/2009 | Local Law Enforcement Hate Crimes Prevention Act | 249-175 | L |
| (16) | HR 1 | 02/13/2009 | American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report) | 246-183 | L |
| (17) | HR 2454 | 06/26/2009 | American Clean Energy and Security Act | 219-212 | L |
| (18) | HR 3590 | 03/21/2010 | Patient Protection and Affordable Care Act | 220-212 | L |
| (19) | HR 3962 | 11/07/2009 | Affordable Health Care for America Act | 221-215 | L |
| (20) | HR 4173 | 06/30/2010 | Wall Street Reform and Consumer Protection Act of 2009 | 237-192 | L |
| (21) | HR 2965 | 12/15/2010 | Don't Ask, Don't Tell Repeal Act of 2010 | 250-175 | L |
| (22) | S 365 | 08/01/2011 | Budget Control Act of 2011 | 269-161 | C |
| (23) | H CR 34 | 04/15/2011 | House Budget Plan of 2011 | 235-193 | C |
| (24) | H CR 112 | 03/28/2012 | Simpson-Bowles/Cooper Amendment to House Budget Plan | 38-382 | C |
| (25) | HR 8 | 08/01/2012 | American Taxpayer Relief Act of 2012 (Levin Amendment) | 170-257 | L |
| (26) | HR 2 | 01/19/2011 | Repealing the Job-Killing Health Care Law Act | 245-189 | C |
| (26) | HR 6079 | 07/11/2012 | Repeal the Patient Protection and Affordable Care Act and [...] | 244-185 | C |
| (27) | HR 1938 | 07/26/2011 | North American-Made Energy Security Act | 279-147 | C |

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative, based on predominant support of the bill by Democratic or Republican representatives, respectively.


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

| Congress | 33rd percentile: Mean | Min | Max | 67th percentile: Mean | Min | Max |
|---|---|---|---|---|---|---|
| 109 | 38123 | 16800 | 73675 | 77964 | 39612 | 146870 |
| 110 | 40127 | 18000 | 77000 | 83047 | 43600 | 155113 |
| 111 | 39021 | 17500 | 78262 | 82440 | 46000 | 160050 |
| 112 | 37381 | 16500 | 81000 | 79868 | 38500 | 158654 |

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

| Variable | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Roll-call vote: yea | 0.568 | 0.495 | 0.000 | 1.000 | 15780 |
| Constituent preferences: Low income | 0.593 | 0.220 | 0.047 | 0.979 | 15934 |
| Constituent preferences: High income | 0.555 | 0.198 | 0.037 | 0.967 | 15934 |
| Constituent preferences: Low-High Gap | 0.172 | 0.121 | 0.000 | 0.588 | 15934 |
| Union membership [log] | 9.705 | 1.046 | 6.094 | 13.619 | 15934 |
| Population | 7.022 | 0.723 | 4.697 | 9.980 | 15934 |
| Share African American | 0.124 | 0.146 | 0.004 | 0.680 | 15934 |
| Share Hispanic | 0.156 | 0.174 | 0.005 | 0.812 | 15934 |
| Share BA or higher | 0.275 | 0.097 | 0.073 | 0.645 | 15934 |
| Median income [$10,000] | 5.177 | 1.356 | 2.282 | 10.439 | 15934 |
| Share female | 0.508 | 0.010 | 0.462 | 0.543 | 15934 |
| Manufacturing share | 0.110 | 0.047 | 0.025 | 0.281 | 15934 |
| Urbanization | 0.790 | 0.199 | 0.213 | 1.000 | 15934 |
| Certification elections [log] | 3.347 | 0.861 | 0.000 | 5.100 | 15934 |
| Congregations [per 1000 persons] | 0.765 | 1.147 | 0.062 | 6.453 | 15934 |

Note: Calculated from American Community Survey 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), and a second one that is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.[25]

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative of each Congressional district's population.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

[25] See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


Figure B1: Illustration of Small Area Estimation of District Preferences

[Schematic: a data matrix with units in rows, covariates $Z_{i1}, \dots, Z_{iK}$ and preferences $Y_{i1}, \dots, Y_{iP}$ in columns. Units $1, \dots, m$ (CCES) have observed covariates and observed preferences. Units $m+1, \dots, m+n$ (Census) have observed covariates but missing (NA) preferences. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict $Y^*_p = f(Z)$ for the Census rows.]

We use a sample of $m$ individuals from the CCES that is not necessarily representative at the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions in proportion to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, constructed from Census tract-level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample consists of the 1% extracts of the American Community Survey 2006-2011. We limit the sample to non-group-quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which comprises 3,040,265 cases. This provides us with a Census sample that includes Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
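The sampling step can be illustrated with a short sketch. It assumes sampling with replacement and a crosswalk expressed as the share of a district's population living in each PUMA; the function name, record labels and shares are hypothetical and do not reproduce our actual procedure.

```python
import random

def synthetic_district_sample(puma_records, district_shares, size, seed=0):
    """Draw `size` individuals for one district.

    puma_records:    dict mapping PUMA id -> list of individual records
    district_shares: dict mapping PUMA id -> share of the district's
                     population living in that PUMA (shares sum to 1)
    """
    rng = random.Random(seed)
    pumas = list(district_shares)
    weights = [district_shares[p] for p in pumas]
    sample = []
    # First pick the PUMA in proportion to its share, then an individual in it.
    for puma in rng.choices(pumas, weights=weights, k=size):
        sample.append(rng.choice(puma_records[puma]))
    return sample

# Hypothetical donor pool: two PUMAs overlapping one district.
records = {"P1": ["a1", "a2"], "P2": ["b1"]}
sample = synthetic_district_sample(records, {"P1": 0.75, "P2": 0.25}, size=2000, seed=1)
share_p1 = sum(r in records["P1"] for r in sample) / len(sample)
print(round(share_p1, 2))
```

Drawing the PUMA first and the individual second keeps each PUMA's contribution to the district sample close to its population share in the crosswalk.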

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES, which asks respondents to place their family's total household income into 14 income bins.[26] We transform this discretized measure of income into a continuous one using a nonparametric midpoint

[26] The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, is assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
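A minimal sketch of a midpoint-Pareto scoring rule is given below. It uses one common variant, in which the Pareto shape parameter is estimated from the counts in the two highest bins; the exact variant we use may differ, and the function name, bin bounds and counts are hypothetical.

```python
import math

def pareto_midpoint_scores(lower_bounds, counts):
    """Assign an income score to each bin.

    lower_bounds: lower bound of each bin in ascending order; the last bin
                  is open-ended
    counts:       number of respondents in each bin
    """
    # Closed bins: midpoint between this bin's and the next bin's lower bound.
    scores = [(lo + hi) / 2 for lo, hi in zip(lower_bounds[:-1], lower_bounds[1:])]
    # Open-ended top bin: Pareto shape estimated from the two highest bins.
    n_top, n_below = counts[-1], counts[-2]
    l_top, l_below = lower_bounds[-1], lower_bounds[-2]
    alpha = (math.log(n_top + n_below) - math.log(n_top)) / (
        math.log(l_top) - math.log(l_below)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    # Mean of a Pareto distribution truncated below at l_top.
    scores.append(l_top * alpha / (alpha - 1))
    return scores

# Hypothetical three-bin example: [0, 20000), [20000, 40000), 40000 and above.
scores = pareto_midpoint_scores([0, 20000, 40000], [60, 30, 10])
print(scores)
```

In this example the estimated shape parameter is 2, so the open-ended bin receives a score of twice its lower bound.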

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{\text{mis}} \subseteq \{1, \dots, N\}$, we can separate out four parts:[27]

• Observed values of $D_v$, denoted as $y^{(v)}_{\text{obs}}$

• Missing values of $D_v$: $y^{(v)}_{\text{mis}}$

• Variables other than $D_v$ with observations $i^{(v)}_{\text{obs}} = \{1, \dots, N\} \setminus i^{(v)}_{\text{mis}}$: $x^{(v)}_{\text{obs}}$

• Variables other than $D_v$ with observations $i^{(v)}_{\text{mis}}$: $x^{(v)}_{\text{mis}}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

Start with initial guesses of missing values in D
w ← vector of column indices, sorted by increasing fraction of NA
while not c do
    D_old^imp ← previously imputed D
    for v in w do
        Fit random forest: y_obs^(v) ~ x_obs^(v)
        Predict y_mis^(v) using x_mis^(v)
        D_new^imp ← updated imputed matrix, using predicted y_mis^(v)
    Update stopping criterion c
Return completed D^imp
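The control flow of Algorithm 1 can be sketched in a few lines of Python. To keep the example self-contained, a simple nearest-neighbour average stands in for the random forest, and the stopping rule is a small-change tolerance; both are simplifying assumptions, and all function names are hypothetical.

```python
def knn_predict(train_x, train_y, query_x, k=3):
    """Stand-in learner (NOT a random forest): mean target of the k nearest rows."""
    preds = []
    for q in query_x:
        ranked = sorted(
            range(len(train_x)),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(train_x[r], q)),
        )
        nearest = ranked[:k]
        preds.append(sum(train_y[r] for r in nearest) / len(nearest))
    return preds

def chained_impute(data, predict=knn_predict, tol=1e-6, max_iter=20):
    """data: list of rows (lists of floats); None marks a missing entry."""
    n_rows, n_cols = len(data), len(data[0])
    missing = {v: [i for i in range(n_rows) if data[i][v] is None] for v in range(n_cols)}
    # Initial guess: column mean of the observed values.
    imputed = [row[:] for row in data]
    for v, rows in missing.items():
        observed = [data[i][v] for i in range(n_rows) if data[i][v] is not None]
        mean = sum(observed) / len(observed)
        for i in rows:
            imputed[i][v] = mean
    # Visit columns by increasing number of missing values.
    order = sorted((v for v in range(n_cols) if missing[v]), key=lambda v: len(missing[v]))
    for _ in range(max_iter):
        change = 0.0
        for v in order:
            mis = set(missing[v])
            others = [c for c in range(n_cols) if c != v]
            obs_rows = [i for i in range(n_rows) if i not in mis]
            x_obs = [[imputed[i][c] for c in others] for i in obs_rows]
            y_obs = [imputed[i][v] for i in obs_rows]
            x_mis = [[imputed[i][c] for c in others] for i in missing[v]]
            for i, pred in zip(missing[v], predict(x_obs, y_obs, x_mis)):
                change += (pred - imputed[i][v]) ** 2
                imputed[i][v] = pred
        if change < tol:  # filled-in values no longer move
            break
    return imputed

# Toy data: the second column equals the first and is missing for two rows.
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, None], [1.0, None]]
completed = chained_impute(data, predict=lambda x, y, q: knn_predict(x, y, q, k=1))
print(completed[3], completed[4])
```

The `predict` argument is deliberately pluggable, so a random forest implementation could be substituted without changing the loop.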

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

[27] Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.[28]

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).[29] No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.[30] Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age comprises 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income comprises 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

[28] See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

[29] This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

[30] We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


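model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \text{logit}^{-1}\left(\beta_0 + \alpha^{\text{race}}_{j[i]} + \alpha^{\text{gender}}_{k[i]} + \alpha^{\text{age}}_{l[i]} + \alpha^{\text{educ}}_{m[i]} + \alpha^{\text{income}}_{n[i]} + \alpha^{\text{district}}_{d[i]}\right) \tag{B1}$$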

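We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{\text{race}}_{j} \sim N(0, \sigma^2_{\text{race}}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{\text{gender}}_{k} \sim N(0, \sigma^2_{\text{gender}}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{\text{age}}_{l} \sim N(0, \sigma^2_{\text{age}}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{\text{educ}}_{m} \sim N(0, \sigma^2_{\text{educ}}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{\text{income}}_{n} \sim N(0, \sigma^2_{\text{income}}), \quad n = 1, \dots, 13 \tag{B6}$$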

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\text{state}}_{s[d]}$ and freely estimated variance:

$$\alpha^{\text{district}}_{d} \sim N(\alpha^{\text{state}}_{s[d]}, \sigma^2_{\text{state}}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
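The weighting step amounts to a population-weighted average of cell predictions. The sketch below illustrates it for a single district and income group; the cell labels, predicted probabilities and counts are invented for illustration, and `poststratify` is a hypothetical helper name.

```python
def poststratify(cell_predictions, cell_counts):
    """Population-weighted average of cell-level predicted support.

    cell_predictions: dict mapping cell -> predicted probability of support
    cell_counts:      dict mapping cell -> number of people in that cell
                      (here: within one district and income group)
    """
    total = sum(cell_counts.values())
    return sum(cell_predictions[c] * n for c, n in cell_counts.items()) / total

# Hypothetical cells defined by (race, gender); a full application would also
# cross age, education, and income categories.
predictions = {("white", "female"): 0.2, ("black", "female"): 0.8}
counts = {("white", "female"): 300, ("black", "female"): 100}
estimate = poststratify(predictions, counts)
print(round(estimate, 3))
```

Cells that are rare in the survey but common in the district therefore receive the weight their Census frequency implies, not the weight of their survey sample size.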

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted with MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases the responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). The estimates for high income preferences are similarly larger in magnitude. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: Small Area Estimation via Chained Random Forests | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: Multilevel Regression & Poststratification | | | | | | |
| Low income preferences | 0.182 | 0.158 | 0.181 | 0.162 | 0.115 | 0.115 |
| | (0.021) | (0.024) | (0.026) | (0.020) | (0.022) | (0.022) |
| High income preferences | −0.136 | −0.119 | −0.139 | −0.122 | −0.091 | −0.091 |
| | (0.017) | (0.019) | (0.021) | (0.017) | (0.018) | (0.018) |
| C: Raw CCES means | | | | | | |
| Low income preferences | 0.080 | 0.061 | 0.063 | 0.072 | 0.043 | 0.045 |
| | (0.010) | (0.011) | (0.012) | (0.010) | (0.011) | (0.011) |
| High income preferences | −0.027 | −0.013 | −0.010 | −0.027 | −0.018 | −0.024 |
| | (0.008) | (0.008) | (0.008) | (0.008) | (0.008) | (0.009) |

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to dealing with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: District-specific income thresholds | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: State-specific income thresholds | | | | | | |
| Low income preferences | 0.105 | 0.082 | 0.097 | 0.083 | 0.067 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.062 | −0.036 | −0.052 | −0.050 | −0.049 | −0.039 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.013) |
| C: Shifted income thresholds, p20 - p80 | | | | | | |
| Low income preferences | 0.098 | 0.077 | 0.09 | 0.078 | 0.063 | 0.057 |
| | (0.012) | (0.013) | (0.014) | (0.012) | (0.013) | (0.013) |
| High income preferences | −0.054 | −0.031 | −0.046 | −0.044 | −0.044 | −0.034 |
| | (0.011) | (0.012) | (0.012) | (0.011) | (0.012) | (0.012) |

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. As proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good: everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there is also a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. The reports are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency


Figure D1: Total number of union certification elections in House districts (109th-112th Congress). [Map of House districts; legend bins: 0-13, 13-26, 26-39, 39-62, 62-164.]

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
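As a minimal sketch of the interpolation step, assume each weight is the share of a county's population that lives in a given district, so that county counts are apportioned to districts in proportion to population. The county and district labels, counts and weights below are hypothetical.

```python
def interpolate_to_districts(county_counts, weights):
    """Apportion county-level counts to districts.

    county_counts: dict mapping county -> count (e.g. number of congregations)
    weights:       dict mapping (county, district) -> share of the county's
                   population living in that district
    """
    district_totals = {}
    for (county, district), share in weights.items():
        district_totals[district] = (
            district_totals.get(district, 0.0) + share * county_counts[county]
        )
    return district_totals

# Hypothetical example: county A is split across two districts,
# county B lies entirely within district D2.
county_counts = {"A": 100, "B": 40}
weights = {("A", "D1"): 0.25, ("A", "D2"): 0.75, ("B", "D2"): 1.0}
totals = interpolate_to_districts(county_counts, weights)
print(totals)
```

When every county lies entirely within one district, all weights equal one and the calculation reduces to a simple sum of county counts.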


E Additional Robustness Tests

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 observations. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

| Specification | Low income preferences | High income preferences | N |
|---|---|---|---|
| (1) Injective preference roll call map | 0.063 (0.013) | −0.041 (0.013) | 11104 |
| (2) Extreme preferences excl. | 0.074 (0.016) | −0.048 (0.015) | 13308 |
| (3) New York excluded | 0.070 (0.015) | −0.048 (0.014) | 14730 |
| (4) Local Union Concentration | 0.065 (0.014) | −0.047 (0.014) | 15780 |
| (5) Trimmed LPM estimator | 0.074 (0.015) | −0.055 (0.014) | 15426 |
| (6) Errors-in-variables | 0.062 (0.004) | −0.054 (0.004) | 15345 |

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Table entries for this specification are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
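The idea behind the trimming procedure can be sketched as follows: fit the linear probability model by least squares, drop observations whose fitted values fall outside the unit interval, and refit until none remain outside. The sketch uses a single regressor and toy data. It is an illustrative reading of the approach rather than the implementation behind Table E1, and the function names are hypothetical.

```python
def ols(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def trimmed_lpm(x, y, max_iter=50):
    """Refit after dropping observations with fitted values outside [0, 1]."""
    keep = list(range(len(x)))
    for _ in range(max_iter):
        a, b = ols([x[i] for i in keep], [y[i] for i in keep])
        inside = [i for i in keep if 0.0 <= a + b * x[i] <= 1.0]
        if len(inside) == len(keep):
            break  # every retained fitted value is a valid probability
        keep = inside
    return a, b, keep

# Toy data: the last observation has an extreme covariate value.
x = [1, 2, 3, 4, 5, 6, 20]
y = [0, 1, 0, 1, 0, 1, 1]
intercept, slope, kept = trimmed_lpm(x, y)
print(round(intercept, 3), round(slope, 3), kept)
```

In the toy data, the first fit predicts a probability above one for the extreme observation. Once it is dropped, all remaining fitted values lie inside the unit interval and the procedure stops.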

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general the standard errors of our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
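The expected downward bias can be made explicit in the textbook single-regressor case. The following is a sketch under classical measurement error assumptions, not the Bayesian model estimated here, and the symbols $\theta^*$, $u$ and $\beta$ are illustrative rather than part of our main notation. Suppose the true preference $\theta^*$ is observed as $\theta = \theta^* + u$, where $u$ has variance $\sigma^2_u$ and is independent of $\theta^*$ and of the outcome error $\epsilon$. Then, for $y = \beta\theta^* + \epsilon$ estimated by least squares on the observed $\theta$,

$$\operatorname{plim}\, \hat{\beta} = \beta \, \frac{\sigma^2_{\theta^*}}{\sigma^2_{\theta^*} + \sigma^2_u}.$$

The attenuation factor is close to one whenever the measurement error variance $\sigma^2_u$ is small relative to the variance of the true preferences $\sigma^2_{\theta^*}$.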

F Post-Double-Selection Estimator

The post-double-selection models in the main text relax the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit

[31] We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean-field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom, mean 1, and SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

42

district xed eects in the notation and ignore i subscripts)

\[
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
\]

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district; $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

\[
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}
\]

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher-order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

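\[
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
\]

\[
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}
\]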

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
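The two selection steps and the final OLS fit can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind the reported results: it substitutes a plain LASSO with a fixed penalty (fitted by coordinate descent) for the root-LASSO, it uses a single treatment coefficient rather than the interactions with group preferences in (F1), and the simulated data, the penalty value, and the helper names `build_controls`, `lasso_cd` and `post_double_selection` are all hypothetical.

```python
import itertools
import numpy as np


def build_controls(Z):
    """Expand raw covariates into linear, squared, and pairwise-interaction terms."""
    k = Z.shape[1]
    cols = [Z[:, a] for a in range(k)]
    cols += [Z[:, a] ** 2 for a in range(k)]
    cols += [Z[:, a] * Z[:, b] for a, b in itertools.combinations(range(k), 2)]
    return np.column_stack(cols)


def lasso_cd(X, y, lam, n_iter=200):
    """Plain LASSO via cyclic coordinate descent: min (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    resid = y.astype(float).copy()
    scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for k in range(p):
            resid += X[:, k] * b[k]          # partial residual without coordinate k
            rho = X[:, k] @ resid / n
            b[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / scale[k]
            resid -= X[:, k] * b[k]          # restore the full residual
    return b


def post_double_selection(y, u, W, lam):
    """Select controls for y and for u separately, then OLS of y on u and their union."""
    Wc = (W - W.mean(axis=0)) / W.std(axis=0)
    yc, uc = y - y.mean(), u - u.mean()
    I1 = np.flatnonzero(lasso_cd(Wc, yc, lam))   # controls predicting the outcome
    I2 = np.flatnonzero(lasso_cd(Wc, uc, lam))   # controls predicting the treatment
    keep = np.union1d(I1, I2)
    X = np.column_stack([uc, Wc[:, keep]])
    coef = np.linalg.lstsq(X, yc, rcond=None)[0]
    return coef[0], keep


# Simulated data (assumption): the first covariate drives the treatment strongly.
rng = np.random.default_rng(1)
Z = rng.normal(size=(500, 8))
W = build_controls(Z)                            # 8 + 8 + 28 = 44 control terms
u = Z[:, 0] + rng.normal(size=500)
y = 0.5 * u + 0.2 * Z[:, 0] + Z[:, 1] + rng.normal(size=500)
eta_hat, keep = post_double_selection(y, u, W, lam=0.2)
print(round(float(eta_hat), 3), keep)
```

In this simulation the true treatment coefficient is 0.5. The first control enters the final regression because it predicts the treatment, which is the safeguard the second selection step is meant to provide.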

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\ \theta^{h}_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

\[
y = Kc \tag{G1}
\]

Here, $K$ is an $N \times N$ Gaussian kernel matrix

\[
K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \tag{G2}
\]

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

\[
c^{*} = \arg\min_{c \in \mathbb{R}^{N}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}
\]

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^{2}$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^{2} = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

32 For a very general discussion, see Belloni et al. (2017).
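The closed-form solution for the weights is short enough to write out directly. The sketch below is illustrative only: the data are simulated, the function names are hypothetical, and $\lambda$ is fixed at an arbitrary value rather than chosen by minimizing leave-one-out loss.

```python
import numpy as np


def gaussian_kernel(Z, sigma2):
    """N x N matrix of exp(-||z_a - z_b||^2 / sigma2) over all pairs of rows of Z."""
    sq_norm = (Z ** 2).sum(axis=1)
    sq_dist = sq_norm[:, None] + sq_norm[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma2)


def krls_fit(Z, y, lam):
    """Closed-form weights c* = (K + lam * I)^{-1} y, with sigma2 set to the column count."""
    sigma2 = float(Z.shape[1])
    K = gaussian_kernel(Z, sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return c, K


def objective(c, K, y, lam):
    """Penalized least squares criterion: squared error plus lam * c'Kc."""
    r = y - K @ c
    return r @ r + lam * c @ K @ c


rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))                    # hypothetical standardized inputs
y = np.sin(Z[:, 0]) + Z[:, 1] * Z[:, 2] + 0.1 * rng.normal(size=200)
lam = 0.5                                        # fixed here for illustration only
c, K = krls_fit(Z, y, lam)
y_hat = K @ c                                    # fitted values at the observed points
print(round(float(np.corrcoef(y, y_hat)[0, 1]), 2))
```

Because the criterion is convex in $c$, any perturbation of the returned weights should not lower the value of `objective`, which gives a simple check on the solve step.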

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check whether the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

\[
\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^{2}} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G4}
\]

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
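A short numerical sketch may help to fix ideas about the pointwise derivatives. It assumes the following convention, which is one reading of the notation above: the fitted surface is differentiated at observation $j$ with respect to its own value of the chosen covariate, holding the kernel centers fixed, so that the bracketed difference is the value at $j$ minus the value at center $i$. The data, function names and the fixed $\lambda$ are hypothetical, and the result is checked against a central finite difference rather than taken on trust.

```python
import numpy as np


def kernel(A, B, sigma2):
    """Entries exp(-||a - b||^2 / sigma2) for every row a of A and row b of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)


def predict(Z_new, Z, c, sigma2):
    """Fitted surface at new points: weighted sum of kernels centered on the data."""
    return kernel(Z_new, Z, sigma2) @ c


def pointwise_derivative(Z, c, sigma2, col):
    """Derivative of the fitted surface at each observation with respect to column `col`."""
    K = kernel(Z, Z, sigma2)                          # K[j, i] = k(z_j, z_i)
    diff = Z[:, col][:, None] - Z[:, col][None, :]    # diff[j, i] = z_j[col] - z_i[col]
    return (-2.0 / sigma2) * ((K * diff) @ c)


rng = np.random.default_rng(0)
Z = rng.normal(size=(150, 3))     # hypothetical inputs; the last column stands in for U_d
y = Z[:, 0] * Z[:, 2] + 0.1 * rng.normal(size=150)
sigma2, lam = 3.0, 0.5
c = np.linalg.solve(kernel(Z, Z, sigma2) + lam * np.eye(150), y)
deriv = pointwise_derivative(Z, c, sigma2, col=2)

# Compare with a central finite difference at the first five observations.
h = 1e-5
up, down = Z[:5].copy(), Z[:5].copy()
up[:, 2] += h
down[:, 2] -= h
numeric = (predict(up, Z, c, sigma2) - predict(down, Z, c, sigma2)) / (2 * h)
print(np.allclose(deriv[:5], numeric, atol=1e-5))
```

The vector `deriv` holds one derivative per observation, which is the quantity that would then be smoothed against union membership.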

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [wwwvanderbilteducsdiincludesWorking Paper 5 2017pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, D.C.: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [httphonakrpapersfilessmallAreaEstimationpdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [wwwnoamlupucomAampCpdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, D.C.: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.



(Ansolabehere and Jones 2010). Respondents are presented with the key wording of the bill (as used on the floor and in media reports) and are then asked to cast their own vote: "What about you? If you were faced with this decision, would you vote for, against, or not sure?" In contrast to the widely used agree–disagree survey measures of issue preferences, matched roll call votes provide us with unequivocal evidence of policy congruence between respondent and legislator (Jessee 2009; Ansolabehere and Jones 2010, 585). We match 27 roll call items in the CCES to roll call votes cast in the House of the 109th to 112th Congress. These cover important legislative decisions, such as Dodd-Frank, the Affordable Care Act (and attempts to repeal it), the minimum wage increase, the ratification of the Central America Free Trade Agreement, or the Lilly Ledbetter Fair Pay Act. Table A1 in the Appendix lists all matched CCES items and House bills included in our estimation sample.

III.B Measuring constituency preferences by income group

The CCES provides us with a comparatively large sample size per district. However, an important potential issue is that it is not designed to be representative of congressional district populations. Thus, individuals with certain characteristics, such as particular combinations of income, race, and education, may be underrepresented in the CCES sample for a given district. If this is the case, unadjusted policy preferences from the CCES will not reflect the target population, and using them can lead to biased estimates of unequal representation in Congress, as politicians are held to the wrong benchmark. The solution to this issue is to employ some form of small area estimation to rebalance the survey sample to represent the district population. The machine-learning solution we propose is relatively new to the representation literature in political science, but it has some attractive features that merit its application to this topic. It does not require distributional and functional form assumptions, it allows for arbitrary higher-order interactions of covariates, and it can fully leverage fine-grained census data to construct representative samples of congressional districts. However, we stress that our findings do not depend on this particular approach. As shown in Online Appendix B, our approach leads to somewhat more conservative estimates of the impact of unions on the representation of different income groups compared to the MRP approach widely used by political scientists (Lax and Phillips 2009). Qualitatively, both approaches yield the same conclusions.

Our approach, small area estimation using chained random forests, matches CCES survey respondents to corresponding cases from unit record Census data. The design of the Census ensures an accurate representation of the distribution of population characteristics in a given district (Torrieri et al. 2014, Ch. 4). Matching these two data sources is essentially a prediction problem, which we address using a flexible non-parametric machine learning approach based on random forests (Stekhoven and Bühlmann 2011).9 Put simply, the idea is that rich census data exist for every district, whereas survey data on preferences are scarce in some districts and may not be fully representative. Using general machine learning tools, we can attach preferences to the Census by matching it to CCES respondents based on common demographic characteristics. The resulting data set of public preferences is representative of congressional districts.

9 Honaker and Plutzer (2016) use a similar approach (but relying on multivariate normal imputations) and further discuss its empirical performance in estimating small area attitudes and preferences.

Concretely, we use about 3 million individual-level records from a synthetic sample of the Census Bureau's American Community Survey from 2006 to 2011. We stack both datasets, creating a structure where we have common district identifiers and individual covariates, while responses to policy preference questions are missing in the Census portion of the data. As common covariates bridging CCES and Census, we use the following demographic characteristics: gender, race (3 categories), education (5 categories), age (continuous), and family income (continuous).10 The latter is of particular relevance, as we are interested in producing district–income group specific preferences.

In the next step, we fill missing roll call preferences in the Census with matching data from CCES respondents. Since this is essentially a prediction problem, we can use powerful tools developed in the machine learning literature to achieve this task. We use an algorithm proposed by Stekhoven and Bühlmann (2011), which uses chained random forests (Breiman 2001) to impute missing cells. Compared to commonly used multivariate normal or regression imputation techniques, this strategy has the advantage that it is fully nonparametric, allows for complex interactions between covariates, and deals with both continuous and categorical data (Tang and Ishwaran 2017). Our completed data set now contains preferences for 27 roll call items of synthetic 'Census individuals', which are a representative sample of each House district.
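The logic of stacking the two sources and filling the missing preferences can be illustrated with a deliberately simplified stand-in. The sketch below does not use chained random forests: it imputes each Census record with the mean response of survey respondents in the same demographic cell, which is enough to show why the rebalanced estimate can differ from the raw survey mean. The cell coding, the toy data and the function name `impute_by_cell` are hypothetical.

```python
import numpy as np


def impute_by_cell(survey_cells, survey_votes, census_cells):
    """Give each census record the mean 'yea' share of survey respondents in its cell."""
    overall = survey_votes.mean()
    cell_mean = {int(cell): survey_votes[survey_cells == cell].mean()
                 for cell in np.unique(survey_cells)}
    # Cells without any survey respondent fall back to the overall survey mean.
    return np.array([cell_mean.get(int(cell), overall) for cell in census_cells])


# Hypothetical district: cell 0 is under-sampled in the survey relative to the census.
survey_cells = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
survey_votes = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
census_cells = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

imputed = impute_by_cell(survey_cells, survey_votes, census_cells)
raw_mean = float(survey_votes.mean())
adjusted_mean = float(imputed.mean())
print(raw_mean, adjusted_mean)   # 0.4 0.7
```

In this toy district the unadjusted survey understates support because the supportive cell makes up 20 percent of respondents but 60 percent of Census records. A random forest replaces the fixed cells with a learned prediction function, but the rebalancing step is the same.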

With these data in hand, we assign individuals to income groups and calculate group-specific preferences for each roll call in each district. Following previous work in the representation literature (Bartels 2008, 2016), we delineate low- and high-income respondents using the 33rd and 67th percentile of the distribution of family incomes. Note that, in line with theories of constituency representation in Congress, we specify these income thresholds separately by congressional district. This accounts for the substantial differences in both average income and income inequality between U.S. districts. It also ensures that within each district, income groups are of comparable size. Online Appendix Table A2 shows the distribution of income-group cutoffs. On average, our chosen cutoffs are close to those used in the established literature. The mean of our district-specific low-income cutoffs is around $39,000, while Bartels uses $40,000 (Bartels 2016, 240); our mean high-income cutoff is around $81,000, where Bartels employs a threshold of $80,000. However, beyond these averages lies considerable variation. In some districts the 33rd percentile cutoff is as low as $16,500, while the 67th percentile reaches almost $160,000 in others.11

10 See Appendix B for more details on the construction of our Census sample and our matching/imputation procedure.

11 Results are relatively invariant to using alternative income thresholds (see Table C1).
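The assignment of income groups with district-specific cutoffs, and the resulting group preferences for one roll call, can be sketched as follows. The example is illustrative only: the individuals are made up, incomes are in thousands of dollars, and the function name `group_preferences` is hypothetical. Percentiles use NumPy's default linear interpolation, which is an assumption about a detail the text leaves open.

```python
import numpy as np


def group_preferences(district, income, vote):
    """Map each district to (theta_low, theta_high) using district-specific cutoffs."""
    prefs = {}
    for d in np.unique(district):
        in_d = district == d
        low_cut, high_cut = np.percentile(income[in_d], [33, 67])
        low = in_d & (income <= low_cut)       # bottom third of the district
        high = in_d & (income >= high_cut)     # top third of the district
        prefs[int(d)] = (float(vote[low].mean()), float(vote[high].mean()))
    return prefs


# Hypothetical synthetic individuals in two districts (vote 1 = 'yea').
district = np.array([1] * 6 + [2] * 6)
income = np.array([10, 20, 30, 40, 50, 60, 50, 100, 150, 200, 250, 300], dtype=float)
vote = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1], dtype=float)

prefs = group_preferences(district, income, vote)
gaps = {d: low - high for d, (low, high) in prefs.items()}
print(prefs)   # {1: (1.0, 0.0), 2: (0.5, 1.0)}
print(gaps)    # {1: 1.0, 2: -0.5}
```

Note that an income of 50 falls in the top third of the first district but in the bottom third of the second, which is the point of setting thresholds separately by district.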


Figure I: District-level income gap in public support for 6 selected policies. Panels: Increase Minimum Wage; Housing Crisis Assistance; Fair Pay Act; Affordable Care Act; CAFTA Ratification; Recovery and Reinvestment.

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district's income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by $(\theta^{l}, \theta^{h})$, as the proportion of individuals voting 'yea'. Since preference estimates are in $[0, 1]$, they can be directly related to legislators' probability of voting 'yea' on a given roll call. Our data show considerable variation in the distance between the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences $(\theta^{l} - \theta^{h})$ in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, the vast majority of low-income constituents are more supportive than their high-income counterparts in each and every district. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly in favor. In all examples, we find considerable across-district variation in the preference gap between low- and high-income constituents.12 We will employ this variation over both roll calls and districts to estimate legislators' differential responsiveness to changes in policy preferences of different income groups and how it might be moderated by union strength.

12 Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.

III.C District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report or reporting falsified information is made a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit non-response (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.14 Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
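The aggregation from individual reports to a district-level measure amounts to a grouped sum followed by an average over Congresses. The sketch below is a minimal illustration under stated assumptions: the record layout (`district`, `congress`, `members`), the district labels and the membership counts are hypothetical, and the final step of logging and standardizing the measure is shown only as one plausible way to prepare it for a regression.

```python
import math
from collections import defaultdict


def district_membership(reports):
    """Sum reported members by (district, congress), then average over Congresses."""
    totals = defaultdict(int)
    for r in reports:
        totals[(r["district"], r["congress"])] += r["members"]
    by_district = defaultdict(list)
    for (district, _congress), members in totals.items():
        by_district[district].append(members)
    return {d: sum(v) / len(v) for d, v in by_district.items()}


def standardized_log(values):
    """Log each value, then rescale to mean zero and unit standard deviation."""
    logs = {d: math.log(m) for d, m in values.items()}
    mean = sum(logs.values()) / len(logs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs.values()) / len(logs))
    return {d: (x - mean) / sd for d, x in logs.items()}


# Hypothetical geocoded reports, already matched to districts.
reports = [
    {"district": "NY-01", "congress": 109, "members": 1200},
    {"district": "NY-01", "congress": 109, "members": 300},
    {"district": "NY-01", "congress": 110, "members": 1700},
    {"district": "TX-02", "congress": 109, "members": 400},
    {"district": "TX-02", "congress": 110, "members": 600},
]
membership = district_membership(reports)
union_strength = standardized_log(membership)
print(membership)   # {'NY-01': 1600.0, 'TX-02': 500.0}
```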

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.


Figure II: Union membership in House districts, 109th-112th Congress. [Map of House districts shaded by quartile of union membership, from 1st to 4th quartile.]

complete coverage, as during the latter part of the twentieth century unions increasingly organized workers across different sectors and occupations (Lichtenstein 2013, 249).15

III.D Statistical specifications

For each roll call vote $j$ ($j = 1, \ldots, J$), we have measured preferences of low- and high-income citizens in a given congressional district $d$ ($d = 1, \ldots, D$), denoted by $(\theta^l_{jd}, \theta^h_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures of the capability of districts' workers to organize collective action, and (iv) non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House

15 While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with the widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, D.C., which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 0.86).


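members is the following linear probability specification:

$$
y_{ijd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l (U_d \times \theta^l_{jd}) + \eta^h (U_d \times \theta^h_{jd}) + \beta^l (X_d \times \theta^l_{jd}) + \beta^h (X_d \times \theta^h_{jd}) + \alpha_d + \epsilon_{ijd}
$$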

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d \theta^h_{jd}$ and $U_d \theta^l_{jd}$. Thus, when $\eta^l$ and $\eta^h$ are zero, the group-specific preference coefficients $\mu^l$ and $\mu^h$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^l$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^h$. Our theoretical expectation is that $\eta^l > 0$ and $\eta^h \le 0$.
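To make the interpretation of the interaction terms concrete: holding the standardized controls $X_d$ at their mean of zero, the specification implies that responsiveness to a group's preferences at union membership level $U_d$ is $\mu + \eta U_d$. The sketch below evaluates this expression. The $\eta$ values are the specification (1) estimates reported in Table I. The $\mu$ values are illustrative assumptions only: they are borrowed from the model without union terms, since the $\mu$ estimates of the interaction model are not reported in this section. The function name is hypothetical.

```python
def responsiveness(mu, eta, union_std):
    """Marginal effect of a group's preferences on the vote probability
    at standardized (logged) union membership `union_std`: mu + eta * U_d.
    Controls X_d are held at their mean (zero after standardization)."""
    return mu + eta * union_std

# Table I, specification (1): marginal effects of union membership
ETA_LOW, ETA_HIGH = 0.106, -0.063
# ASSUMPTION: illustrative baseline responsiveness, not the authors'
# estimates from the interaction model
MU_LOW, MU_HIGH = 0.016, 0.136

for u in (-1.0, 0.0, 1.0):
    low = responsiveness(MU_LOW, ETA_LOW, u)
    high = responsiveness(MU_HIGH, ETA_HIGH, u)
    print(f"U_d = {u:+.1f}: low = {low:.3f}, high = {high:.3f}, "
          f"gap = {high - low:.3f}")
```

Under a linear interaction, each standard deviation of union membership changes the responsiveness gap (high minus low) by $\eta^h - \eta^l$, whatever the values of $\mu$.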

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects, $\alpha_d$.16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and state level) and group preferences, $X_d \theta^l_{jd}$ and $X_d \theta^h_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).

IV Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above with district fixed effects but without accounting for local union organization (setting $\beta^l$, $\beta^h$ and $\eta^l$, $\eta^h$ to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase of 13.6 (±1.2) percentage points in the probability that legislators cast a corresponding vote. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.


confidence interval ranging from -1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.
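The reported interval for low-income responsiveness can be roughly reproduced from the point estimate and the ± value. The sketch assumes that the ± values are standard errors and that the interval uses a normal approximation with a 1.96 critical value; neither is stated explicitly in the text. Because the inputs are the rounded figures quoted above, the computed bounds may differ from the reported ones in the last digit.

```python
def normal_ci(estimate, std_error, z=1.96):
    """Two-sided confidence interval under a normal approximation.
    ASSUMPTION: the ± values in the text are standard errors."""
    return estimate - z * std_error, estimate + z * std_error

# Rounded estimates from the text, in percentage points
low_lo, low_hi = normal_ci(1.6, 1.4)     # low-income preferences
high_lo, high_hi = normal_ci(13.6, 1.2)  # high-income preferences

print(f"low income:  [{low_lo:.1f}, {low_hi:.1f}]")
print(f"high income: [{high_lo:.1f}, {high_hi:.1f}]")
```

The low-income interval spans zero, while the high-income interval lies well above it, which is the asymmetry the paragraph describes.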

IV.A Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preferences-confounder interactions (setting $\beta^l$ and $\beta^h$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018;

17 Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors at the district level. See also specification (1) in Table I below.


Figure III: District-level union membership as moderator of unequal representation. [Two panels: low-income constituents and high-income constituents. x-axis: union membership (standardized), with the 10th, 25th, 50th, 75th, and 90th percentiles marked; y-axis: marginal effect.]

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^l$ and $\theta^h$. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levi 2013) or


Table I: Union density and representation. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)      (2)      (3)      (4)      (5)      (6)
Low income preferences        0.106    0.082    0.098    0.084    0.068    0.062
                             (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences      -0.063   -0.036   -0.053   -0.051   -0.050   -0.040
                             (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)
District fixed effects          X        X        X        X        X        X
Group preferences
  × union policy                         X                                   X
  × state constants                               X
  × organizational capacity                                X                 X
  × district covariates                                             X        X

Note: N = 15,780; N_d = 534; 27 roll call votes; 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^l$ and $\eta^h$. Specifications (2) to (5) include coefficients for interactions ($\beta^l$, $\beta^h$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements. (3) interacts preferences with state fixed effects. (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections. (5) includes a large set of district-level characteristics (population size; degree of urbanization; shares of female, Black, Hispanic, BA degrees, and employed in manufacturing; as well as median household income). Specification (6) includes all of the previously described measured variables.

social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district, and (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to

18 Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59


extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics, see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.

Table II: Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

                                         Low income         High income
(1a) Social capital: bowling alleys     0.067 (0.014)     -0.051 (0.013)
(1b) Social capital: index              0.065 (0.014)     -0.048 (0.013)
(2)  Redistricting                      0.067 (0.014)     -0.051 (0.013)
(3)  MRP estimated preferences          0.115 (0.022)     -0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for $\eta^l$ and $\eta^h$. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts. N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred. N = 14,150. Specification (3) uses preferences estimated using MRP. See Appendix B for more details. N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews, see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low-income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high-income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In


the same table, we also show that our core results are obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher-order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools, such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).


Table III: Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups.

                              (1)       (2)       (3)
Low income preferences       0.063     0.066     0.062
                            (0.014)   (0.017)   (0.016)
High income preferences     -0.054    -0.036    -0.040
                            (0.013)   (0.015)   (0.016)
Semi-parametric terms         144       312       624
Post-LASSO terms               18        45       112

Note: The double selection estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on a 50% sample, parameter estimates performed on the remaining 50% (N = 7,884). Table entries are estimates for $\eta^l$ and $\eta^h$ with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our $\eta$ terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data


speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low- and high-income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high-income constituents.

Figure IV: Nonparametric estimate of interaction between union membership and preferences. [Two panels: low-income constituents and high-income constituents. x-axis: union membership (standardized), with the 10th, 25th, 50th, 75th, and 90th percentiles marked; y-axis: partial effect.]

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low- and high-income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low-income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high-income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills.


The difference is larger for the preferences of high-income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low-income constituents. We trace this issue further in the next specification.

Table IV: Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups.

                                 Low income         High income
(A) Private vs. public unions
    Public unions               0.074 (0.016)     -0.058 (0.015)
    Non-public unions           0.054 (0.016)     -0.027 (0.016)
(B) Bill ideology
    Conservative bill           0.086 (0.017)     -0.086 (0.018)
    Liberal bill                0.052 (0.014)     -0.028 (0.013)
(C) AFL-CIO endorsement
    No position                 0.054 (0.014)     -0.054 (0.013)
    Endorsement                 0.077 (0.015)     -0.040 (0.014)

Note: Estimates for $\eta^l$ and $\eta^h$ with cluster-robust standard errors in parentheses. N = 15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low-income preferences and p = 0.027 for high-income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low- and p = 0.000 for high-income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low-income, p = 0.049 for high-income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.[23] AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).[24] The fact that districts with higher union membership see better representation of the less affluent,

[23] Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

[24] For high-income preferences, the estimate for ηH is smaller for endorsed bills but still significantly different from zero.


more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
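The conversion from a log-scale regression to dollar amounts can be sketched as follows. This is a minimal illustration of the smearing retransformation associated with Duan (1983), run on made-up numbers. The function name, the residuals, and the coefficient are hypothetical and are not the authors' code or data.

```python
import math

def smearing_prediction(linear_predictor, log_residuals):
    """Retransform a log-scale prediction to the original (dollar) scale.

    The smearing estimator multiplies exp(prediction) by the mean of the
    exponentiated regression residuals, so no normality assumption on the
    errors is needed.
    """
    smear = sum(math.exp(r) for r in log_residuals) / len(log_residuals)
    return math.exp(linear_predictor) * smear

# Made-up residuals from a hypothetical log-contributions regression.
residuals = [-0.4, -0.1, 0.0, 0.2, 0.3]

# Predicted contributions at a hypothetical baseline linear predictor.
baseline = smearing_prediction(11.0, residuals)

# Same district after a one-SD increase in union membership, using a
# hypothetical coefficient of 0.05 on the log scale.
shifted = smearing_prediction(11.0 + 0.05, residuals)

dollar_effect = shifted - baseline
```

Because the smearing factor is common to both predictions, the ratio of the two equals exp(0.05); the dollar difference, however, depends on the baseline level at which it is evaluated.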

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V. Labor contributions and selection of Democratic legislators

                                          (1)       (2)       (3)       (4)

A: Contributions channel
                                         DV: Contrib.        DV: roll call
Union membership                         0.056     0.046
                                        (0.012)   (0.014)
Contributions × low income prefs                             0.946     0.865
                                                            (0.036)   (0.034)
Contributions × high income prefs                           -0.735    -0.714
                                                            (0.029)   (0.031)

B: Selection channel
                                         DV: Democrat        DV: roll call
Union membership                         0.161     0.106
                                        (0.024)   (0.023)
Democrat × low income prefs                                  0.576     0.542
                                                            (0.012)   (0.015)
Democrat × high income prefs                                -0.411    -0.423
                                                            (0.013)   (0.015)

District controls                                    X                   X

Note: Panel (A), column (1) shows a district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15,780. Panel (B), columns (1) and (2) show district-level LPMs with state fixed effects of presence of a Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and Democratic representative. N=15,776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an


important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, we consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
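The tercile classification can be sketched as below. This is only an illustration under stated assumptions: it uses an unweighted, linearly interpolated percentile on made-up incomes, whereas the actual thresholds come from American Community Survey files (where survey weights would normally apply). The function names and the handling of incomes exactly at a cut-off are hypothetical.

```python
def percentile(values, q):
    """Unweighted percentile (q in [0, 100]) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac

def income_group(income, low_cut, high_cut):
    """Classify an income relative to a district's two thresholds.

    Assumption: incomes at or below the lower cut-off count as "low",
    incomes above the upper cut-off count as "high".
    """
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Made-up household incomes for one hypothetical district.
district_incomes = [12000, 18000, 25000, 31000, 40000, 52000, 67000, 80000, 95000, 150000]
low_cut = percentile(district_incomes, 33)
high_cut = percentile(district_incomes, 67)
groups = [income_group(x, low_cut, high_cut) for x in district_incomes]
```

Computing the cut-offs separately for each district means that the same dollar income can fall into different groups in different districts, which is why Table A2 reports the mean, minimum, and maximum threshold.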

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1. Matched CCES-House roll calls included in our analysis

Match  Bill      Date        House Vote  Bill       Name
                             (Yea-Nay)   Ideology†

(1)    HR 810    07/19/2006  235-193     L          Stem Cell Research Enhancement Act (Presidential Veto override)
(1)    HR 3      01/11/2007  253-174     L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176     L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201     L          Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116     L          Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197     C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185     C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215     C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183     C          Protect America Act
(6)    HR 6304   06/20/2008  293-129     C          FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204     L          Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135     L          Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152     L          Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132     C          United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171     L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151     C          To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167     C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202     L          Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report)
(14)   HR 2831   07/31/2007  225-199     L          Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171     L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177     L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175     L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183     L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212     L          American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212     L          Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215     L          Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192     L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175     L          Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161     C          Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193     C          House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382      C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257     L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189     C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185     C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147     C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative, based on predominant support of the bill by Democratic or Republican representatives, respectively.


Table A2. Distribution of district income-group reference points: average threshold over all districts, smallest and largest value

            33rd percentile               67th percentile
Congress    Mean     Min      Max         Mean     Min      Max

109         38,123   16,800   73,675      77,964   39,612   146,870
110         40,127   18,000   77,000      83,047   43,600   155,113
111         39,021   17,500   78,262      82,440   46,000   160,050
112         37,381   16,500   81,000      79,868   38,500   158,654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3. Descriptive statistics of analysis sample

                                      Mean     SD       Min      Max      N

Roll-call vote: yea                   0.568    0.495    0.000    1.000    15,780
Constituent preferences
    Low income                        0.593    0.220    0.047    0.979    15,934
    High income                       0.555    0.198    0.037    0.967    15,934
    Low-High Gap                      0.172    0.121    0.000    0.588    15,934
Union membership [log]                9.705    1.046    6.094    13.619   15,934
Population                            7.022    0.723    4.697    9.980    15,934
Share African American                0.124    0.146    0.004    0.680    15,934
Share Hispanic                        0.156    0.174    0.005    0.812    15,934
Share BA or higher                    0.275    0.097    0.073    0.645    15,934
Median income [$10,000]               5.177    1.356    2.282    10.439   15,934
Share female                          0.508    0.010    0.462    0.543    15,934
Manufacturing share                   0.110    0.047    0.025    0.281    15,934
Urbanization                          0.790    0.199    0.213    1.000    15,934
Certification elections [log]         3.347    0.861    0.000    5.100    15,934
Congregations [per 1000 persons]      0.765    1.147    0.062    6.453    15,934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.[25]

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from m individuals in the CCES and n individuals in the Census (with n ≫ m). Both sets of individuals share K common characteristics Z_k, such as age, race, or education. The first task at hand is then to match P roll call preferences Y_p that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about Y_p = f(Z_1, ..., Z_K) for p = 1, ..., P using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use f(Z_1, ..., Z_K) in combination with observed Z in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
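The train, predict, and average logic described above can be sketched in a few lines. To keep the sketch dependency-free, a simple cell-mean learner stands in for the random forest used in the paper; this substitution, the function names, the district label, and all data are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict

def fit_cell_means(survey_rows):
    """Stand-in learner: mean support within each covariate cell.

    The paper fits random forests at this step; cell means are used here
    only so the example runs without third-party libraries.
    survey_rows: iterable of (covariates_tuple, support) with support in {0, 1}.
    Returns a function mapping a covariates tuple to predicted support.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for covariates, support in survey_rows:
        sums[covariates] += support
        counts[covariates] += 1
    overall = sum(sums.values()) / sum(counts.values())
    cell_means = {cell: sums[cell] / counts[cell] for cell in sums}
    # Unseen covariate combinations fall back to the overall survey mean.
    return lambda covariates: cell_means.get(covariates, overall)

def district_preferences(predict, census_rows):
    """Average predicted preferences by (district, income group)."""
    totals, sizes = defaultdict(float), defaultdict(int)
    for district, group, covariates in census_rows:
        totals[(district, group)] += predict(covariates)
        sizes[(district, group)] += 1
    return {key: totals[key] / sizes[key] for key in totals}

# Made-up survey sample: (age, education) cells with observed roll call support.
survey = [(("young", "ba"), 1), (("young", "ba"), 1),
          (("old", "hs"), 0), (("old", "hs"), 1)]

# Made-up census sample with district identifiers but no preferences.
census = [("D1", "low", ("old", "hs")), ("D1", "low", ("old", "hs")),
          ("D1", "low", ("young", "ba")), ("D1", "high", ("young", "ba"))]

predict = fit_cell_means(survey)
prefs = district_preferences(predict, census)
```

Because the census rows are representative of each district by design, the final step is an unweighted mean; any reweighting is implicit in how many census individuals fall into each covariate cell.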

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

[25] See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


Figure B1. Illustration of Small Area Estimation of District Preferences

[Figure: a data matrix with units in rows and two column blocks, covariates Z_i1, ..., Z_iK and preferences Y_i1, ..., Y_iP. Units 1, ..., m (CCES) have observed covariates and observed preferences. Units m+1, ..., m+n (Census) have observed covariates, but their preferences are missing (NA). A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict the missing Census preferences, Y*_p = f(Z).]

We use a sample of m individuals from the CCES that is not necessarily representative on the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample consists of the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.[26] We transform this discretized measure of income into a continuous one using a nonparametric midpoint

[26] The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
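A midpoint Pareto scoring rule of this kind can be sketched as follows. The text does not spell out how the Pareto parameter is obtained, so the sketch assumes the common textbook variant in which the shape is estimated from the counts in the top two bins and the open-ended bin is assigned the implied Pareto mean. The function name, bin limits, and counts are made up for illustration.

```python
import math

def midpoint_pareto_scores(lower_bounds, counts):
    """Assign a dollar score to each income bin.

    lower_bounds: lower limit of each bin in ascending order; the last bin
                  is open-ended.
    counts:       number of respondents in each bin.
    Closed bins receive their midpoint. The open top bin receives the mean
    of a Pareto distribution whose shape parameter is estimated from the
    top two bins (an assumed, commonly used variant).
    """
    scores = [(lower_bounds[i] + lower_bounds[i + 1]) / 2.0
              for i in range(len(lower_bounds) - 1)]
    top, below = counts[-1], counts[-2]
    shape = ((math.log(top + below) - math.log(top)) /
             (math.log(lower_bounds[-1]) - math.log(lower_bounds[-2])))
    if shape <= 1:
        # The Pareto mean is infinite for shape <= 1.
        raise ValueError("Pareto mean undefined for shape <= 1")
    scores.append(lower_bounds[-1] * shape / (shape - 1))
    return scores

# Made-up bins: $0-9,999, $10,000-19,999, $20,000-29,999, $30,000-49,999,
# $50,000-99,999, and $100,000 or more.
bounds = [0, 10000, 20000, 30000, 50000, 100000]
counts = [5, 10, 20, 25, 30, 10]
scores = midpoint_pareto_scores(bounds, counts)
```

With these made-up counts the estimated shape is 2, so the open-ended bin is scored at twice its lower limit, and the third bin receives $25,000 as in the example in the text.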

Algorithm details. For easier exposition, define a matrix D that contains both individual characteristics and roll call preferences. Let N be the number of rows of D. For any given variable v of D, D_v, with missing entries at locations i_mis^(v) ⊆ {1, ..., N}, we can separate out four parts:[27]

• Observed values of D_v, denoted as y_obs^(v)

• Missing values of D_v: y_mis^(v)

• Variables other than D_v with available observations i_obs^(v) = {1, ..., N} \ i_mis^(v): x_obs^(v)

• Variables other than D_v with observations i_mis^(v): x_mis^(v)

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion c (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1: Start with initial guesses of missing values in D
 2: w ← vector of column indices sorted by increasing fraction of NA
 3: while not c do
 4:     D_old^imp ← previously imputed D
 5:     for v in w do
 6:         Fit Random Forest: y_obs^(v) ~ x_obs^(v)
 7:         Predict y_mis^(v) using x_mis^(v)
 8:         D_new^imp ← updated imputed matrix using predicted y_mis^(v)
 9:     Update stopping criterion c
10: Return completed D^imp

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

[27] Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes and find it to perform comparatively well with both continuous and categorical variables.[28]
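For orientation, a normalized root mean squared error can be computed as in the sketch below. The text does not state which normalization is used; the sketch assumes the mean squared error is divided by the variance of the true values, a common convention in the imputation literature. The function name and data are illustrative only.

```python
import math

def nrmse(true_values, predicted_values):
    """Normalized RMSE: sqrt(mean squared error / variance of true values).

    Assumed normalization; 0 means perfect prediction, and a value of 1
    means the predictions do no better than always predicting the mean.
    """
    n = len(true_values)
    mean_true = sum(true_values) / n
    mse = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values)) / n
    variance = sum((t - mean_true) ** 2 for t in true_values) / n
    return math.sqrt(mse / variance)

# Made-up held-out values and predictions.
truth = [0.0, 1.0, 0.0, 1.0]
perfect = nrmse(truth, [0.0, 1.0, 0.0, 1.0])
mean_only = nrmse(truth, [0.5, 0.5, 0.5, 0.5])
```

Under this convention, an error of about 0.11 would indicate predictions far closer to the held-out values than a constant-mean baseline.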

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).[29] No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.[30] Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

[28] See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

[29] This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

[30] We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


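model using penalized maximum likelihood (Chung et al. 2013):

\[
\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta^{0} + \alpha^{\mathrm{race}}_{j[i]} + \alpha^{\mathrm{gender}}_{k[i]} + \alpha^{\mathrm{age}}_{l[i]} + \alpha^{\mathrm{educ}}_{m[i]} + \alpha^{\mathrm{income}}_{n[i]} + \alpha^{\mathrm{district}}_{d[i]}\right) \tag{B1}
\]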

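We employ the notation of Gelman and Hill (2007) and denote by j[i] the category j to which individual i belongs. Here, β^0 is an intercept and the αs are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance σ^2:

\begin{align}
\alpha^{\mathrm{race}}_{j} &\sim N\left(0, \sigma^{2}_{\mathrm{race}}\right), \quad j = 1, \dots, 3 \tag{B2} \\
\alpha^{\mathrm{gender}}_{k} &\sim N\left(0, \sigma^{2}_{\mathrm{gender}}\right), \quad k = 1, 2 \tag{B3} \\
\alpha^{\mathrm{age}}_{l} &\sim N\left(0, \sigma^{2}_{\mathrm{age}}\right), \quad l = 1, \dots, 6 \tag{B4} \\
\alpha^{\mathrm{educ}}_{m} &\sim N\left(0, \sigma^{2}_{\mathrm{educ}}\right), \quad m = 1, \dots, 5 \tag{B5} \\
\alpha^{\mathrm{income}}_{n} &\sim N\left(0, \sigma^{2}_{\mathrm{income}}\right), \quad n = 1, \dots, 13 \tag{B6}
\end{align}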

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means α^state_{s[d]} and freely estimated variance:

\[
\alpha^{\mathrm{district}}_{d} \sim N\left(\alpha^{\mathrm{state}}_{s[d]}, \sigma^{2}_{\mathrm{state}}\right) \tag{B7}
\]

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
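The poststratification step can be illustrated with a small sketch. It assumes the hierarchical model has already been fitted; the intercept, group effects, cell labels, and census counts below are made up, and the function names are hypothetical rather than taken from the authors' code.

```python
import math

def inv_logit(x):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def cell_probability(intercept, effects):
    """Predicted support for one demographic cell, following the form of (B1):
    inverse logit of the intercept plus the cell's group effects."""
    return inv_logit(intercept + sum(effects))

def poststratify(cell_probs, cell_counts):
    """Population-weighted average of cell predictions for one district and
    income group. Both dicts are keyed by the same cell labels."""
    total = sum(cell_counts[cell] for cell in cell_probs)
    weighted = sum(cell_probs[cell] * cell_counts[cell] for cell in cell_probs)
    return weighted / total

# Made-up estimates for two cells in one hypothetical district and income group:
# effects are (race, gender, age, education, income, district) contributions.
probs = {
    "cell_a": cell_probability(0.2, [0.3, -0.1, 0.0, 0.4, -0.2, 0.1]),
    "cell_b": cell_probability(0.2, [-0.2, 0.1, 0.1, -0.3, -0.2, 0.1]),
}
# Made-up census frequencies of the two cells.
counts = {"cell_a": 1200, "cell_b": 3800}
district_estimate = poststratify(probs, counts)
```

The weighting is what corrects for a survey sample whose demographic mix differs from the district's: cells that are common in the census but rare in the survey still contribute in proportion to their population share.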

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B.1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit (standard deviation) increase in union membership increases responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Estimates of responsiveness to high income preferences are similarly larger. Note that, while larger, all MRP-based estimates also carry wider confidence intervals.

Table B.1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences         0.182     0.158     0.181     0.162     0.115     0.115
                              (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences       −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                              (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences         0.080     0.061     0.063     0.072     0.043     0.045
                              (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences       −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                              (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit (standard deviation) increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C.1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
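The threshold definitions discussed above can be made concrete with a short Python sketch. The helper names, the linear-interpolation percentile rule, the treatment of incomes exactly at a cut point, and the toy incomes are all assumptions for illustration; the original analysis may handle these details differently. Passing district-level groups gives district-specific thresholds, passing one pooled list per state gives state-specific thresholds, and `lower_q=20, upper_q=80` gives the shifted thresholds.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("empty input")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def classify(incomes_by_area, lower_q=33, upper_q=66):
    """Label each income 'low', 'middle' or 'high' relative to its own area."""
    labels = {}
    for area, incomes in incomes_by_area.items():
        ordered = sorted(incomes)
        lo_cut = percentile(ordered, lower_q)
        hi_cut = percentile(ordered, upper_q)
        labels[area] = [
            "low" if x <= lo_cut else "high" if x > hi_cut else "middle"
            for x in incomes
        ]
    return labels


# Hypothetical incomes: the same $40,000 voter falls into different groups
# depending on the income distribution of the surrounding district.
toy = {
    "richer_district": [40_000, 60_000, 80_000, 100_000],
    "poorer_district": [20_000, 25_000, 30_000, 40_000],
}
tercile_labels = classify(toy)
p20_p80_labels = classify(toy, lower_q=20, upper_q=80)
```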


Table C.1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences         0.105     0.082     0.097     0.083     0.067     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 – p80
Low income preferences         0.098     0.077     0.090     0.078     0.063     0.057
                              (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences       −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                              (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D.1 shows the resulting variation in certification elections over districts.

Figure D.1: Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 0–13, 13–26, 26–39, 39–62, 62–164.]

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
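A population-weighted interpolation of this kind can be sketched in a few lines of Python. This is one common variant (allocating each county's count to districts in proportion to population shares) and not necessarily the exact computation used for the paper; the function name, the data layout, and the toy numbers are assumptions for illustration.

```python
from collections import defaultdict


def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts by population share.

    county_counts: {county: number of congregations}
    weights: {(county, district): share of the county's population living in
              that district}; the shares of each county are assumed to sum to 1.
    """
    district_counts = defaultdict(float)
    for (county, district), share in weights.items():
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)


# Hypothetical example: county A is split between two districts,
# county B lies entirely within district D2.
counts = {"A": 100, "B": 40}
shares = {("A", "D1"): 0.25, ("A", "D2"): 0.75, ("B", "D2"): 1.0}
district_totals = interpolate_to_districts(counts, shares)
```

When every county lies entirely within one district, all shares equal one and the function reduces to a plain sum over counties, which matches the at-large case mentioned in the text.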


E Additional Robustness Tests

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 observations. We find that our results are not influenced by this change.

Table E.1: Additional robustness tests

                                          Low income          High income
                                          preferences         preferences          N

(1) Injective preference-roll call map    0.063 (0.013)      −0.041 (0.013)      11104
(2) Extreme preferences excl.             0.074 (0.016)      −0.048 (0.015)      13308
(3) New York excluded                     0.070 (0.015)      −0.048 (0.014)      14730
(4) Local Union Concentration             0.065 (0.014)      −0.047 (0.014)      15780
(5) Trimmed LPM estimator                 0.074 (0.015)      −0.055 (0.014)      15426
(6) Errors-in-variables                   0.062 (0.004)      −0.054 (0.004)      15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries for specification (6) are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
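As a minimal illustration of this trimming rule, the following Python sketch keeps, for a single roll call, only districts whose preference estimate lies between the 5th and 95th percentile. The function name, the dictionary layout, and the choice of the "inclusive" percentile method from the standard library are assumptions; the paper does not state which percentile definition was used.

```python
from statistics import quantiles


def trim_districts(prefs, lower=5, upper=95):
    """Keep districts whose estimate lies within [p_lower, p_upper].

    prefs: {district: estimated preference on one roll call}
    """
    # quantiles(..., n=100) returns the 99 percentile cut points p1, ..., p99.
    cuts = quantiles(prefs.values(), n=100, method="inclusive")
    lo_cut, hi_cut = cuts[lower - 1], cuts[upper - 1]
    return {d: p for d, p in prefs.items() if lo_cut <= p <= hi_cut}
```

In an analysis with several roll calls, the function would be applied separately to each roll call's set of district estimates.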

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E.1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
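The idea behind a trimmed linear probability model can be sketched as follows. This is a simplified illustration under stated assumptions, not the estimation code behind Table E.1: it refits OLS after dropping observations whose fitted values fall outside the unit interval and repeats until no further observations are dropped. The function name `trimmed_lpm`, the stopping rule, and the iteration cap are assumptions.

```python
import numpy as np


def trimmed_lpm(X, y, max_iter=50):
    """Iteratively refit OLS, dropping rows with fitted values outside [0, 1].

    X: (n, k) design matrix including a constant column.
    y: (n,) binary outcome.
    Returns the coefficient vector and a boolean mask of retained rows.
    """
    keep = np.ones(len(y), dtype=bool)
    for _ in range(max_iter):
        if keep.sum() <= X.shape[1]:
            raise ValueError("too few observations left after trimming")
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        inside = (fitted >= 0.0) & (fitted <= 1.0)
        new_keep = keep & inside
        if new_keep.sum() == keep.sum():
            break  # every retained row already has a fitted value in [0, 1]
        keep = new_keep
    return beta, keep
```

Comparing the trimmed coefficients with the untrimmed OLS coefficients gives a rough sense of how much out-of-range predictions matter for the estimates.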

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
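The expected downward bias can be seen in the textbook case of a single regressor with classical measurement error. The setup below is a simplified illustration and an assumption on our part; the model in the paper has several regressors and interaction terms, for which the bias pattern is more involved.

```latex
% Textbook case, assumed for illustration: one regressor, classical error.
% True model:      y = \beta x^{*} + \varepsilon
% Observed proxy:  x = x^{*} + u, with u independent of x^{*} and \varepsilon
\[
  \operatorname{plim}\, \hat{\beta}_{OLS}
  = \beta \cdot \frac{\sigma^2_{x^{*}}}{\sigma^2_{x^{*}} + \sigma^2_{u}}
\]
```

The attenuation factor is close to one whenever the measurement error variance $\sigma^2_u$ is small relative to the variance of the true regressor, which is in line with a correction that changes the estimates only slightly.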

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit district fixed effects in the notation and ignore $i$ subscripts):

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

$$ y_{jd} = \mu^{l} \theta^{l}_{jd} + \mu^{h} \theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1} $$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$ U_d = m(Z_d) + v_d, \quad E(v_d \mid Z_d) = 0 \tag{F2} $$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

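$$ y_{jd} = \mu^{l} \theta^{l}_{jd} + \mu^{h} \theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3} $$

$$ U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4} $$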

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
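The two selection steps and the final OLS step can be sketched in Python. This is a simplified illustration under several assumptions and not the implementation used in the paper: it uses a plain LASSO with a fixed, user-supplied penalty (written here as a small coordinate-descent routine) instead of the root-LASSO, it treats a single treatment coefficient rather than the preference interactions, and the function names and penalty values are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Plain LASSO via cyclic coordinate descent.

    Minimizes (1 / 2n) * ||y - X b||^2 + lam * ||b||_1, assuming a centered
    outcome and standardized columns of X.
    """
    n, p = X.shape
    b = np.zeros(p)
    resid = y.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    return b


def post_double_selection(y, u, W, lam_y, lam_u):
    """Effect of treatment u on y with controls chosen by two LASSO steps."""
    Ws = (W - W.mean(axis=0)) / W.std(axis=0)
    sel_y = np.flatnonzero(lasso_cd(Ws, y - y.mean(), lam_y))  # I_1: y on w
    sel_u = np.flatnonzero(lasso_cd(Ws, u - u.mean(), lam_u))  # I_2: U on w
    union = np.union1d(sel_y, sel_u)                           # I = I_1 u I_2
    design = np.column_stack([np.ones(len(y)), u, Ws[:, union]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)          # final OLS
    return coef[1], union


# Simulated example: control 0 drives the treatment strongly and the outcome
# only moderately, the situation the second selection step is meant to catch.
rng = np.random.default_rng(0)
n, p = 400, 30
W = rng.normal(size=(n, p))
u = W[:, 0] + rng.normal(size=n)
y = 0.5 * u + 0.3 * W[:, 0] + rng.normal(size=n)
eta_hat, selected = post_double_selection(y, u, W, lam_y=0.1, lam_u=0.1)
```

Taking the union of both selected sets means a control is dropped only if it predicts neither the outcome nor the treatment well, which is the property the text relies on to limit omitted variable bias.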

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd} \; \theta^{h}_{jd} \; U_d \; X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$ y = Kc \tag{G1} $$

Here, $K$ is an $N \times N$ Gaussian kernel matrix with entries

$$ K_{ij} = \exp\left(-\frac{\lVert z_i - z_j \rVert^2}{\sigma^2}\right) \tag{G2} $$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared $L_2$ penalty to the least squares criterion:

32 For a very general discussion, see Belloni et al. (2017).

$$ c^{*} = \operatorname*{argmin}_{c \in \mathbb{R}^{N}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3} $$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
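The closed-form solution can be written down directly. The sketch below is a minimal illustration and not the KRLS software used for the paper: it assumes the columns of `Z` are already standardized, takes $\lambda$ as a given input instead of choosing it by leave-one-out loss, and uses function names that are ours.

```python
import numpy as np


def gaussian_kernel(Z, sigma2):
    """Kernel matrix with entries K[i, j] = exp(-||z_i - z_j||^2 / sigma2)."""
    sq_norms = (Z ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma2)


def krls_fit(Z, y, lam):
    """Weights c* = (K + lam * I)^{-1} y, with sigma2 set to the column count D."""
    n, d = Z.shape
    K = gaussian_kernel(Z, sigma2=float(d))
    c = np.linalg.solve(K + lam * np.eye(n), y)
    return c, K


# Fitted values at the observed points are K @ c.
rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 4))
y = rng.normal(size=30)
c, K = krls_fit(Z, y, lam=0.5)
fitted = K @ c
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly; larger values of `lam` shrink the weights and give a smoother fitted surface.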

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effect of group preferences changes with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

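$$ \frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^2} \sum_{i} c_i \exp\left(-\frac{\lVert z_i - z_j \rVert^2}{\sigma^2}\right) \left(z^{U_d}_{j} - z^{U_d}_{i}\right) \tag{G4} $$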

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
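The pointwise derivatives of a Gaussian-kernel fit can be computed in vectorized form. The following sketch is an illustration under our own naming and data-layout assumptions, not the authors' code. It differentiates the fitted surface $f(z) = \sum_i c_i \exp(-\lVert z_i - z \rVert^2 / \sigma^2)$ with respect to one column of $z$ and evaluates the result at every observed point.

```python
import numpy as np


def krls_partial_derivatives(Z, c, sigma2, col):
    """Derivative of the fitted surface with respect to column `col`.

    Returns one value per observation j:
        (2 / sigma2) * sum_i c_i * K[j, i] * (z_i[col] - z_j[col]),
    which equals (-2 / sigma2) * sum_i c_i * K[j, i] * (z_j[col] - z_i[col]).
    """
    sq_norms = (Z ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    K = np.exp(-np.maximum(sq_dist, 0.0) / sigma2)
    diff = Z[None, :, col] - Z[:, None, col]  # diff[j, i] = z_i[col] - z_j[col]
    return (2.0 / sigma2) * (K * diff) @ c
```

A quick way to check such a routine is to compare one of its entries with a central finite difference of the fitted function at the same point; the resulting vector can then be smoothed against the covariate of interest.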

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, D.C.: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, D.C.: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). Missforest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

that rich census data exist for every district, whereas survey data on preferences are scarce in some districts and may not be fully representative. Using general machine learning tools, we can attach preferences to the Census by matching it to CCES respondents based on common demographic characteristics. The resulting data set of public preferences is representative of congressional districts.

Concretely, we use about 3 million individual-level records from a synthetic sample of the Census Bureau's American Community Survey from 2006 to 2011. We stack both datasets, creating a structure where we have common district identifiers and individual covariates, while responses to policy preference questions are missing in the Census portion of the data. As common covariates bridging CCES and Census, we use the following demographic characteristics: gender, race (3 categories), education (5 categories), age (continuous), and family income (continuous).10 The latter is of particular relevance, as we are interested in producing district–income group specific preferences.

In the next step, we fill missing roll call preferences in the Census with matching data from CCES respondents. Since this is essentially a prediction problem, we can use powerful tools developed in the machine learning literature to achieve this task. We use an algorithm proposed by Stekhoven and Bühlmann (2011), which uses chained random forests (Breiman 2001) to impute missing cells. Compared to commonly used multivariate normal or regression imputation techniques, this strategy has the advantage that it is fully nonparametric, allows for complex interactions between covariates, and deals with both continuous and categorical data (Tang and Ishwaran 2017). Our completed data set now contains preferences for 27 roll call items of synthetic 'Census individuals', which are a representative sample of each House district.
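As a rough illustration of this imputation step, the sketch below stacks a small simulated "CCES" block (preferences observed) on a simulated "Census" block (preferences missing) and fills the gaps with iterated random-forest predictions. It uses scikit-learn's `IterativeImputer` with a `RandomForestRegressor` as a stand-in for the algorithm of Stekhoven and Bühlmann (2011); that substitution is an assumption made for illustration and is not the authors' implementation. The covariates, sample sizes, and tuning values are all hypothetical, and binary preferences are treated as numeric so that imputed values can be read as predicted support.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_cces, n_census = 300, 300  # hypothetical sample sizes
n_total = n_cces + n_census

# Hypothetical bridging covariates: age, log family income, female indicator.
covariates = np.column_stack([
    rng.uniform(18, 90, n_total),
    rng.normal(10.8, 0.7, n_total),
    rng.integers(0, 2, n_total).astype(float),
])

# Two hypothetical roll-call items; support falls with income in this toy setup.
p_support = 1.0 / (1.0 + np.exp(covariates[:, 1] - 10.8))
prefs = np.column_stack([
    (rng.random(n_total) < p_support).astype(float),
    (rng.random(n_total) < p_support).astype(float),
])
prefs[n_cces:, :] = np.nan  # preferences are unobserved in the Census portion

stacked = np.column_stack([covariates, prefs])

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=25, random_state=0),
    max_iter=3,
    skip_complete=True,  # only model the columns that have missing cells
    random_state=0,
)
completed = imputer.fit_transform(stacked)

# Imputed values for the Census rows are predicted support in [0, 1].
census_prefs = completed[n_cces:, 3:]
```

Observed CCES cells are left untouched; only the missing Census cells are filled. A dedicated missForest implementation would additionally handle categorical variables natively rather than through numeric coding.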

With these data in hand, we assign individuals to income groups and calculate group-specific preferences for each roll call in each district. Following previous work in the representation literature (Bartels 2008, 2016), we delineate low- and high-income respondents using the 33rd and 67th percentiles of the distribution of family incomes. Note that, in line with theories of constituency representation in Congress, we specify these income thresholds separately by congressional district. This accounts for the substantial differences in both average income and income inequality between U.S. districts. It also ensures that within each district income groups are of comparable size. Online Appendix Table A2 shows the distribution of income-group cutoffs. On average, our chosen cutoffs are close to those used in the established literature. The mean of our district-specific low-income cutoffs is around $39,000, while Bartels uses $40,000 (Bartels 2016, 240); our mean high-income cutoff is around $81,000, where Bartels employs a threshold of $80,000. However, beyond these averages lies considerable variation. In some districts the 33rd percentile cutoff is as low as $16,500, while the 67th percentile reaches almost $160,000 in others.11

10 See Appendix B for more details on the construction of our Census sample and our matching/imputation procedure.

11 Results are relatively invariant to using alternative income thresholds (see Table C1).
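The district-specific thresholds described above can be sketched as follows. This is a minimal illustration, assuming a linear-interpolation percentile rule and hypothetical district labels and incomes; the paper does not state which percentile convention was used.

```python
from collections import defaultdict


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def assign_income_groups(records, low_q=33, high_q=67):
    """Label each (district, family_income) record using cutoffs computed
    within that record's own district."""
    by_district = defaultdict(list)
    for district, income in records:
        by_district[district].append(income)
    cutoffs = {
        d: (percentile(v, low_q), percentile(v, high_q))
        for d, v in by_district.items()
    }
    labels = []
    for district, income in records:
        low_cut, high_cut = cutoffs[district]
        if income <= low_cut:
            labels.append("low")
        elif income >= high_cut:
            labels.append("high")
        else:
            labels.append("middle")
    return labels, cutoffs


# Hypothetical data: district "A" is poorer than district "B".
records = [("A", 10_000 * k) for k in range(1, 11)]
records += [("B", 100_000 * k) for k in range(1, 11)]
labels, cutoffs = assign_income_groups(records)
```

In this toy example a family income of $100,000 falls in the high-income group in district "A" but in the low-income group in district "B", which is the point of defining thresholds separately by district.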


Figure I. District-level income gap in public support for 6 selected policies. [Six histogram panels: Increase Minimum Wage; Housing Crisis Assistance; Fair Pay Act; Affordable Care Act; CAFTA Ratification; Recovery and Reinvestment.]

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district's income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by $(\theta^l, \theta^h)$, as the proportion of individuals voting 'yea'. Since preference estimates are in $[0, 1]$, they can be directly related to legislators' probability of voting 'yea' on a given roll call. Our data show considerable variation in the distance between the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences $(\theta^h - \theta^l)$ in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, the vast majority of low-income constituents are more supportive than their high-income counterparts in each and every district. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly in favor. In all examples, we find considerable across-district variation in the preference gap between low- and high-income constituents.12 We will employ this variation over both roll calls and districts to estimate legislators' differential responsiveness to changes in the policy preferences of different income groups, and how it might be moderated by union strength.

12 Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.
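A minimal sketch of how such group-level preferences can be computed from individual records is given below. The district label, roll-call name, and responses are hypothetical, and the sign convention for the gap (low minus high, so that positive values mean the low-income group is more supportive) is chosen here for illustration.

```python
from collections import defaultdict


def group_preferences(rows):
    """Share voting 'yea' by (district, roll call) for the low and high groups.

    rows: iterable of (district, roll_call, income_group, supports), with
    income_group in {"low", "middle", "high"} and supports in {0, 1}.
    """
    tallies = defaultdict(lambda: [0, 0])  # (district, roll_call, group) -> [yeas, n]
    for district, roll_call, group, supports in rows:
        tally = tallies[(district, roll_call, group)]
        tally[0] += supports
        tally[1] += 1

    result = {}
    for (district, roll_call, group), (yeas, n) in tallies.items():
        if group in ("low", "high"):
            result.setdefault((district, roll_call), {})[group] = yeas / n
    return result


# Hypothetical individuals in one district on one roll call.
rows = [
    ("NC-04", "min_wage", "low", 1), ("NC-04", "min_wage", "low", 1),
    ("NC-04", "min_wage", "low", 1), ("NC-04", "min_wage", "low", 0),
    ("NC-04", "min_wage", "middle", 1),
    ("NC-04", "min_wage", "high", 1), ("NC-04", "min_wage", "high", 0),
]
theta = group_preferences(rows)[("NC-04", "min_wage")]
gap = theta["low"] - theta["high"]  # positive: low-income group more supportive
```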

III.C District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report, or reporting falsified information, is a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit nonresponse (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.14 Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
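The aggregation step can be sketched in a few lines. The example below sums reported members by district and then takes logs and standardizes, which anticipates the logged, standardized measure used in the statistical model. District labels and member counts are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def district_membership(reports):
    """Sum reported members over all geocoded reports in each district."""
    totals = defaultdict(int)
    for district, members in reports:
        totals[district] += members
    return dict(totals)


def standardized_log(totals):
    """Log membership, scaled to mean zero and unit standard deviation.
    Districts with zero reported members would need separate handling."""
    logged = {d: math.log(m) for d, m in totals.items()}
    center = mean(list(logged.values()))
    spread = pstdev(list(logged.values()))
    return {d: (v - center) / spread for d, v in logged.items()}


# Hypothetical geocoded reports: (district, reported members).
reports = [
    ("OH-09", 1200), ("OH-09", 350), ("OH-09", 4800),
    ("TX-13", 90), ("TX-13", 410),
    ("MI-12", 7300), ("MI-12", 2650),
]
totals = district_membership(reports)
U = standardized_log(totals)
```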

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost complete coverage, as during the latter part of the twentieth century unions have increasingly organized workers across different sectors and occupations (Lichtenstein 2013, 249).15

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.

15 While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with the widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, DC, which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 0.86).

Figure II. Union membership in House districts, 109th-112th Congress. [Legend: 1st quartile, 2nd quartile, 3rd quartile, 4th quartile.]

III.D Statistical specifications

For each roll call vote $j$ ($j = 1, \dots, J$), we have measured preferences of low- and high-income citizens in a given congressional district $d$ ($d = 1, \dots, D$), denoted by $(\theta^l_{jd}, \theta^h_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts' workers to organize collective action, (iv) as well as non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House members is the following linear probability specification:

$$
y_{ijd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l (U_d \times \theta^l_{jd}) + \eta^h (U_d \times \theta^h_{jd}) + \beta^l (X_d \times \theta^l_{jd}) + \beta^h (X_d \times \theta^h_{jd}) + \alpha_d + \epsilon_{ijd}
$$

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d \theta^h_{jd}$ and $U_d \theta^l_{jd}$. Thus, when $\eta^l$ and $\eta^h$ are zero, the group-specific preference coefficients $\mu^l$ and $\mu^h$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^l$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^h$. Our theoretical expectation is that $\eta^l > 0$ and $\eta^h \le 0$.
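To make this interpretation explicit, the conditional marginal effects implied by the specification can be written out. This is a direct differentiation of the linear probability model, added as a reading aid; it assumes only that $X_d$ and $\beta^l$, $\beta^h$ are conformable vectors.

$$
\frac{\partial\, \mathbb{E}[y_{ijd}]}{\partial \theta^l_{jd}} = \mu^l + \eta^l U_d + \beta^l X_d,
\qquad
\frac{\partial\, \mathbb{E}[y_{ijd}]}{\partial \theta^h_{jd}} = \mu^h + \eta^h U_d + \beta^h X_d .
$$

Because all inputs are standardized, a district with average union membership and average covariates ($U_d = 0$, $X_d = 0$) has responsiveness $\mu^l$ to the poor, and raising $U_d$ by one standard deviation shifts that responsiveness by $\eta^l$. The responsiveness gap, $(\mu^h - \mu^l) + (\eta^h - \eta^l) U_d + (\beta^h - \beta^l) X_d$, therefore narrows as union membership rises whenever $\eta^l > \eta^h$.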

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects $\alpha_d$.16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and state level) and group preferences, $X_d \theta^l_{jd}$ and $X_d \theta^h_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
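One way to estimate a specification of this kind is sketched below: district fixed effects are absorbed by demeaning within district, coefficients are obtained by least squares, and standard errors use a district-clustered sandwich formula without small-sample correction. This is an illustrative NumPy implementation under those assumptions, not the authors' code; the simulated data and the "true" coefficient values are hypothetical, and the control interactions $X_d \theta_{jd}$ are omitted for brevity.

```python
import numpy as np


def fit_lpm_within(y, X, district):
    """Linear probability model with district fixed effects (within
    transformation) and district-clustered standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    ids, inv = np.unique(np.asarray(district), return_inverse=True)
    counts = np.bincount(inv)

    # Demean y and every column of X within district; this absorbs alpha_d.
    y_dm = y - (np.bincount(inv, weights=y) / counts)[inv]
    X_dm = np.column_stack([
        X[:, k] - (np.bincount(inv, weights=X[:, k]) / counts)[inv]
        for k in range(X.shape[1])
    ])

    coefs, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    resid = y_dm - X_dm @ coefs

    # Cluster-robust sandwich: sum outer products of district-level scores.
    bread = np.linalg.inv(X_dm.T @ X_dm)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in range(len(ids)):
        rows = inv == g
        score = X_dm[rows].T @ resid[rows]
        meat += np.outer(score, score)
    vcov = bread @ meat @ bread
    return coefs, np.sqrt(np.diag(vcov))


# Simulated example with hypothetical parameter values.
rng = np.random.default_rng(1)
n_districts, n_rollcalls = 60, 27
district = np.repeat(np.arange(n_districts), n_rollcalls)
U = rng.standard_normal(n_districts)[district]   # standardized log membership
theta_l = rng.standard_normal(district.size)     # standardized preferences
theta_h = rng.standard_normal(district.size)
X = np.column_stack([theta_l, theta_h, U * theta_l, U * theta_h])
true_coefs = np.array([0.02, 0.14, 0.10, -0.06])  # mu_l, mu_h, eta_l, eta_h
alpha = 0.5 + 0.05 * rng.standard_normal(n_districts)[district]
prob = np.clip(alpha + X @ true_coefs, 0.0, 1.0)
y = (rng.random(district.size) < prob).astype(float)  # simulated 0/1 votes

coefs, ses = fit_lpm_within(y, X, district)
```

Note that the non-interacted union term does not appear in `X`: it is constant within district and would be removed by the demeaning step.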

IV Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above, with district fixed effects but without accounting for local union organization (setting $\beta^l$, $\beta^h$ and $\eta^l$, $\eta^h$ to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase in the probability of legislators casting a corresponding vote of 13.6 (±1.2) percentage points. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.
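As a quick arithmetic check on the interval quoted above, take the low-income estimate as 1.6 percentage points with a standard error of 1.4, and assume a conventional 95% normal-approximation interval. Both the reading of the ± figures as standard errors and the interval type are assumptions, since the text does not state them explicitly.

$$
1.6 \pm 1.96 \times 1.4 \approx [-1.1,\ 4.3],
$$

which matches the reported range of −1.1 to 4.4 up to rounding of the inputs. Under the same reading, the responsiveness gap has a z-ratio of roughly $11.9 / 2.5 \approx 4.8$, consistent with the statement that it differs significantly from zero.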

IV.A Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership, with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preference-confounder interactions (setting $\beta^l$ and $\beta^h$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.
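A plot of this kind can be built from the baseline coefficients by evaluating $\mu + \eta U_d$ over a grid of union membership values. The sketch below shows the calculation; the $\eta$ values are rounded from the baseline estimates in the text, while the $\mu$ values, the variances, and the zero covariance are hypothetical placeholders, so the resulting numbers are illustrative only.

```python
import math


def conditional_effect(mu, eta, u, var_mu, var_eta, cov_mu_eta=0.0, z=1.96):
    """Marginal effect of group preferences at union level u (in standard
    deviation units), with a normal-approximation confidence interval."""
    effect = mu + eta * u
    se = math.sqrt(var_mu + (u ** 2) * var_eta + 2.0 * u * cov_mu_eta)
    return effect, effect - z * se, effect + z * se


ETA_LOW, ETA_HIGH = 0.11, -0.06   # rounded from the baseline estimates
MU_LOW, MU_HIGH = 0.02, 0.14      # hypothetical placeholders
GRID = (-1.6, -0.8, 0.0, 0.8, 1.6)

curve_low = {
    u: conditional_effect(MU_LOW, ETA_LOW, u, 0.014 ** 2, 0.013 ** 2)
    for u in GRID
}
curve_high = {
    u: conditional_effect(MU_HIGH, ETA_HIGH, u, 0.012 ** 2, 0.012 ** 2)
    for u in GRID
}
```

With real estimates, the covariance between $\hat{\mu}$ and $\hat{\eta}$ from the clustered variance matrix should be passed in rather than set to zero.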

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws, regulating the formation and management of unions in the private or public sector, have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018; Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^l$ and $\theta^h$. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

17 Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors at the district level. See also specification (1) in Table I below.

Figure III. District-level union membership as moderator of unequal representation. [Two panels: Low income constituents; High income constituents. x-axis: Union membership [std], with p10, p25, p50, p75 and p90 marked; y-axis: Marginal effect.]

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

Table I. Union density and representation. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                  (1)      (2)      (3)      (4)      (5)      (6)
Low income preferences           0.106    0.082    0.098    0.084    0.068    0.062
                                (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences         −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                                (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)
District fixed effects             X        X        X        X        X        X
Group preferences
  × union policy                            X                                   X
  × state constants                                  X
  × organizational capacity                                   X                 X
  × district covariates                                                X        X

Note: N = 15,780, N_d = 534, 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^l$ and $\eta^h$. Specifications (2) to (5) include coefficients for interactions ($\beta^l$, $\beta^h$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements. (3) interacts preferences with state fixed effects. (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections. (5) includes a large set of district-level characteristics (population size; degree of urbanization; shares of female, Black, Hispanic, BA degrees, employed in manufacturing; as well as median household income). Specification (6) includes all of the previously described measured variables.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

18 Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59%.

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics, see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily".

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
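Population-based weighting of county data to districts can be sketched as a weighted average, where each county's value is weighted by the population it shares with the district. The example below is a simplified illustration with hypothetical counties, districts, and populations; it suits per-capita measures or indices, whereas raw counts would instead be apportioned by population share. The exact procedure used in the paper may differ.

```python
from collections import defaultdict


def interpolate_to_districts(county_values, overlaps):
    """Population-weighted average of county values for each district.

    county_values: {county: value}
    overlaps: iterable of (county, district, population), the population
              living in each county-district intersection.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for county, district, population in overlaps:
        weighted_sum[district] += population * county_values[county]
        weight_total[district] += population
    return {d: weighted_sum[d] / weight_total[d] for d in weighted_sum}


# Hypothetical county-level index values and county-district populations.
county_values = {"A": 2.0, "B": 4.0, "C": 2.0}
overlaps = [
    ("A", "D1", 300_000), ("B", "D1", 100_000),
    ("B", "D2", 200_000), ("C", "D2", 200_000),
]
district_values = interpolate_to_districts(county_values, overlaps)
```

County "B" is split across both districts and contributes to each in proportion to the population it shares with them.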

Table II. Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

                                       Low income        High income
(1a) Social capital: bowling alleys    0.067 (0.014)    −0.051 (0.013)
(1b) Social capital index              0.065 (0.014)    −0.048 (0.013)
(2)  Redistricting                     0.067 (0.014)    −0.051 (0.013)
(3)  MRP estimated preferences         0.115 (0.022)    −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for $\eta^l$ and $\eta^h$. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts; N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred; N = 14,150. Specification (3) uses preferences estimated using MRP; see Appendix B for more details; N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here, we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.[21] The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
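The logic of double selection can be sketched on simulated data. The example below is a stylized illustration under strong simplifying assumptions, not the estimator used in the paper: it uses a plain coordinate-descent LASSO with a fixed, hand-picked penalty (the paper reports root-LASSO with sample splitting), a single scalar treatment, and no clustering. All function names, the penalty value, and the simulated data are hypothetical.

```python
import random


def center(col):
    m = sum(col) / len(col)
    return [v - m for v in col]


def lasso(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO on centered data; X is a list of columns.
    Minimizes (1/2n) * ||y - Xb||^2 + lam * sum(|b_j|)."""
    n, p = len(y), len(X)
    z = [sum(v * v for v in col) / n for col in X]
    b, resid = [0.0] * p, list(y)
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(X[j][i] * resid[i] for i in range(n)) / n + z[j] * b[j]
            new = max(abs(rho) - lam, 0.0) / z[j] * (1 if rho > 0 else -1)
            if new != b[j]:
                diff = new - b[j]
                resid = [resid[i] - diff * X[j][i] for i in range(n)]
                b[j] = new
    return b


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(cols, y):
    """Least-squares coefficients for centered regressor columns."""
    p = len(cols)
    XtX = [[sum(a * c for a, c in zip(cols[i], cols[j])) for j in range(p)] for i in range(p)]
    Xty = [sum(a * c for a, c in zip(cols[i], y)) for i in range(p)]
    return solve(XtX, Xty)


def post_double_selection(y, d, W, lam):
    """Select controls that predict the outcome OR the treatment, then run OLS."""
    y, d, W = center(y), center(d), [center(w) for w in W]
    b_outcome = lasso(W, y, lam)      # step 1: outcome equation
    b_treatment = lasso(W, d, lam)    # step 2: treatment equation
    selected = [j for j in range(len(W)) if b_outcome[j] != 0.0 or b_treatment[j] != 0.0]
    coefs = ols([d] + [W[j] for j in selected], y)  # step 3: post-LASSO OLS
    return coefs[0], selected


# Simulated data: control 0 confounds the treatment d and the outcome y.
random.seed(0)
n, p = 300, 10
W = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
d = [W[0][i] + random.gauss(0, 1) for i in range(n)]
y = [1.0 * d[i] + 2.0 * W[0][i] + random.gauss(0, 0.5) for i in range(n)]
effect, selected = post_double_selection(y, d, W, lam=0.2)
print(round(effect, 2), selected)
```

Taking the union of the two selected sets is what gives the approach its robustness: a control that is missed in one equation can still enter through the other.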

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

[21] The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III. Post-double-selection estimator: Marginal effect of unionization on legislative responsiveness to low and high income groups

                              (1)        (2)        (3)
Low income preferences        0.063      0.066      0.062
                             (0.014)    (0.017)    (0.016)
High income preferences      -0.054     -0.036     -0.040
                             (0.013)    (0.015)    (0.016)
Semi-parametric terms         144        312        624
Post-LASSO terms              18         45         112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on 50% sample, parameter estimates performed on remaining 50% (N=7884). Table entries are estimates for η_L and η_H with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.[22] The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
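The idea of recovering an interaction from pointwise derivatives, without ever specifying it, can be sketched with a small kernel-regularized fit. The example below is a toy illustration and not the authors' implementation: the regularization parameter and bandwidth are fixed by hand (the paper selects parameters as described in its Appendix G), the data are a noiseless grid generated from a pure interaction, and all function names are hypothetical.

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gaussian_kernel(u, v, sigma2):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / sigma2)


def krls_fit(X, y, lam, sigma2):
    """Return weights c solving (K + lam * I) c = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)


def partial_derivative(x0, dim, X, c, sigma2):
    """Derivative of f(x) = sum_i c_i k(x, x_i) with respect to x[dim] at x0."""
    return sum(ci * gaussian_kernel(x0, xi, sigma2) * (-2.0 / sigma2) * (x0[dim] - xi[dim])
               for ci, xi in zip(c, X))


# Toy data: outcome = preference * union strength, i.e. a pure interaction
# that the model is never told about.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
X = [(pref, union) for pref in grid for union in grid]
y = [pref * union for pref, union in X]

sigma2 = 2.0  # bandwidth set to the number of covariates (an assumption here)
c = krls_fit(X, y, lam=0.05, sigma2=sigma2)

# Marginal effect of preferences at high versus low union strength.
effect_high_union = partial_derivative((0.0, 1.0), 0, X, c, sigma2)
effect_low_union = partial_derivative((0.0, -1.0), 0, X, c, sigma2)
print(effect_high_union, effect_low_union)
```

In this toy setting the estimated marginal effect of preferences is positive where union strength is high and negative where it is low, even though no product term was entered.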

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], with markers at the 10th, 25th, 50th, 75th, and 90th percentiles; y-axis: Partial effect.]

Figure IV. Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

[22] See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V. Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.
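One simple way to build an indicator of this kind is to compare support rates across the two party caucuses. The sketch below codes a roll call as conservative when Republicans voted yea at a higher rate than Democrats; this rule is an assumption made for illustration, since the text does not spell out the exact definition of the partisan vote margin. The function name and the party-level tallies are hypothetical and are not taken from the roll calls analyzed in the paper.

```python
def is_conservative(rep_yea, rep_total, dem_yea, dem_total):
    """Return 1 if Republicans supported the bill at a higher rate than Democrats."""
    return int(rep_yea / rep_total > dem_yea / dem_total)


# Hypothetical party-level tallies for two made-up roll calls.
roll_calls = {
    "bill_x": {"rep_yea": 210, "rep_total": 230, "dem_yea": 20, "dem_total": 200},
    "bill_y": {"rep_yea": 15, "rep_total": 230, "dem_yea": 195, "dem_total": 200},
}
conservative = {name: is_conservative(**tally) for name, tally in roll_calls.items()}
print(conservative)
```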

Table IV. Effect heterogeneity: Marginal effects of unionization on legislative responsiveness to low and high income groups

                                    Low income         High income
(A) Private vs. public unions
    Public unions                   0.074 (0.016)      -0.058 (0.015)
    Non-public unions               0.054 (0.016)      -0.027 (0.016)
(B) Bill ideology
    Conservative bill               0.086 (0.017)      -0.086 (0.018)
    Liberal bill                    0.052 (0.014)      -0.028 (0.013)
(C) AFL-CIO endorsement
    No position                     0.054 (0.014)      -0.054 (0.013)
    Endorsement                     0.077 (0.015)      -0.040 (0.014)

Note: Estimates for η_L and η_H with cluster-robust standard errors in parentheses. N=15780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.[23] AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).[24] The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

[23] Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

[24] For high-income preferences, the estimate for η_h is smaller for endorsed bills but still significantly different from zero.

VI. Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls) an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
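The conversion from a log-scale regression to dollar amounts can be illustrated with the smearing retransformation associated with Duan (1983), in which predictions on the original scale are multiplied by the average of the exponentiated residuals. The sketch below assumes a single regressor and no fixed effects or controls, so it is far simpler than the models in Table V; the function names and the toy data are hypothetical and unrelated to the contribution data.

```python
import math


def fit_log_linear(x, y):
    """OLS of log(y) on x with an intercept; returns (intercept, slope, residuals)."""
    logy = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(x) / n, sum(logy) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, logy))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, logy)]
    return intercept, slope, resid


def smearing_factor(resid):
    """Smearing factor: the sample average of exp(residual)."""
    return sum(math.exp(e) for e in resid) / len(resid)


def predict_level(x0, intercept, slope, smear):
    """Prediction on the original (dollar) scale."""
    return math.exp(intercept + slope * x0) * smear


# Toy data: log(y) = 1 + 0.5 * x + noise, with noise orthogonal to x.
x = [0.0, 1.0, 2.0, 3.0]
noise = [0.2, -0.2, -0.2, 0.2]
y = [math.exp(1.0 + 0.5 * a + e) for a, e in zip(x, noise)]

intercept, slope, resid = fit_log_linear(x, y)
smear = smearing_factor(resid)
# Change in the predicted level for a one-unit increase in x, starting at x = 0.
level_change = (predict_level(1.0, intercept, slope, smear)
                - predict_level(0.0, intercept, slope, smear))
print(slope, smear, level_change)
```

Simply exponentiating the log-scale prediction would understate the expected level whenever the residuals have positive variance; the smearing factor corrects for this without assuming normal errors.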

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians, if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V. Labor contributions and selection of Democratic legislators

                                       (1)        (2)        (3)        (4)

A: Contributions channel
                                       DV: Contrib.          DV: roll call
Union membership                       0.056      0.046
                                      (0.012)    (0.014)
Contributions × low income prefs                             0.946      0.865
                                                            (0.036)    (0.034)
Contributions × high income prefs                           -0.735     -0.714
                                                            (0.029)    (0.031)

B: Selection channel
                                       DV: Democrat          DV: roll call
Union membership                       0.161      0.106
                                      (0.024)    (0.023)
Democrat × low income prefs                                  0.576      0.542
                                                            (0.012)    (0.015)
Democrat × high income prefs                                -0.411     -0.423
                                                            (0.013)    (0.015)

District controls                                 X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII. Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing, and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A. Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds, separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
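The classification into income groups can be sketched as follows. This is a simplified illustration: it computes unweighted tercile cut points for a single district from a short list of hypothetical household incomes, whereas the thresholds in Table A2 come from the American Community Survey and would normally involve survey weights. The function names and the treatment of incomes that fall exactly on a threshold are assumptions.

```python
import statistics


def tercile_thresholds(incomes):
    """Return the two cut points separating the lowest and highest terciles."""
    lower, upper = statistics.quantiles(incomes, n=3)
    return lower, upper


def income_group(income, lower, upper):
    """Classify a respondent given the district-specific thresholds."""
    if income <= lower:
        return "low"
    if income > upper:
        return "high"
    return "middle"


# Hypothetical household incomes (in $1,000) for one district.
district_incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90]
lower, upper = tercile_thresholds(district_incomes)
groups = [income_group(v, lower, upper) for v in (30, 50, 90)]
print(lower, upper, groups)
```

Because the thresholds are computed within each district, the same household income can fall into different groups in different districts.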

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1. Matched CCES–House roll calls included in our analysis

Match  Bill      Date        Yea-Nay    Ideology†  Name
(1)    HR 810    07/19/2006  235-193    L          Stem Cell Research Enhancement Act (Presidential Veto override)
(1)    HR 3      01/11/2007  253-174    L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176    L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201    L          Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116    L          Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197    C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185    C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215    C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183    C          Protect America Act
(6)    HR 6304   06/20/2008  293-129    C          FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204    L          Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156    L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152    L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135    L          Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152    L          Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132    C          United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171    L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151    C          To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167    C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202    L          Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report)
(14)   HR 2831   07/31/2007  225-199    L          Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171    L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177    L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175    L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183    L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212    L          American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212    L          Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215    L          Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192    L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175    L          Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161    C          Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193    C          House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382     C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257    L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189    C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185    C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147    C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one. The Yea-Nay column reports the House vote.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2. Distribution of district income-group reference points: Average threshold over all districts, smallest and largest value

            33rd percentile               67th percentile
Congress    Mean     Min      Max         Mean     Min      Max
109         38123    16800    73675       77964    39612    146870
110         40127    18000    77000       83047    43600    155113
111         39021    17500    78262       82440    46000    160050
112         37381    16500    81000       79868    38500    158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3 Descriptive statistics of analysis sample

                                     Mean     SD       Min      Max       N
Roll-call vote: yea                  0.568    0.495    0.000    1.000     15780
Constituent preferences
  Low income                         0.593    0.220    0.047    0.979     15934
  High income                        0.555    0.198    0.037    0.967     15934
  Low-High Gap                       0.172    0.121    0.000    0.588     15934
Union membership [log]               9.705    1.046    6.094    13.619    15934
Population                           7.022    0.723    4.697    9.980     15934
Share African American               0.124    0.146    0.004    0.680     15934
Share Hispanic                       0.156    0.174    0.005    0.812     15934
Share BA or higher                   0.275    0.097    0.073    0.645     15934
Median income [$10,000]              5.177    1.356    2.282    10.439    15934
Share female                         0.508    0.010    0.462    0.543     15934
Manufacturing share                  0.110    0.047    0.025    0.281     15934
Urbanization                         0.790    0.199    0.213    1.000     15934
Certification elections [log]        3.347    0.861    0.000    5.100     15934
Congregations [per 1000 persons]     0.765    1.147    0.062    6.453     15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \ldots, Z_K)$ for $p = 1, \ldots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \ldots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1 schematic: a stacked data matrix with units 1, ..., m (CCES) and m+1, ..., m+n (Census) in rows, and covariates Z_1, ..., Z_K and preferences Y_1, ..., Y_P in columns. Preferences are observed for the CCES rows and missing (NA) for the Census rows. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census rows.]

Figure B1 Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative at the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample is the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
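The midpoint Pareto scoring described above can be sketched as follows. This is a minimal illustration rather than the code used in the paper: the function name, the bin limits, and the counts are hypothetical, and the Pareto shape parameter is estimated from the two highest bins, which is one common variant of the estimator and is an assumption here.

```python
import math

def midpoint_pareto_scores(lower_bounds, counts):
    """Return one income score per bin.

    lower_bounds: ascending lower limits of the bins; the last bin is open-ended.
    counts: number of respondents in each bin (hypothetical inputs).
    """
    if len(lower_bounds) != len(counts) or len(counts) < 2:
        raise ValueError("need matching bounds and counts for at least two bins")
    # Closed bins: midpoint between this bin's lower limit and the next one's.
    scores = [(lo + nxt) / 2 for lo, nxt in zip(lower_bounds[:-1], lower_bounds[1:])]
    # Open-ended top bin: Pareto shape from the two highest bins (assumed variant).
    n_top, n_prev = counts[-1], counts[-2]
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_bounds[-1]) - math.log(lower_bounds[-2])
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    # Mean of a Pareto distribution whose scale is the top bin's lower limit.
    scores.append(lower_bounds[-1] * alpha / (alpha - 1))
    return scores

bounds = [10_000, 20_000, 30_000, 150_000]  # hypothetical bin limits
counts = [120, 200, 600, 80]                # hypothetical respondent counts
scores = midpoint_pareto_scores(bounds, counts)
```

In this sketch the bin starting at $20,000 receives the score $25,000, matching the example in the text, while the open-ended bin receives a score above its lower limit.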

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \ldots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \ldots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1 Chained Random Forests
  Start with initial guesses of missing values in $D$
  $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
  while not $c$ do
    $D^{imp}_{old} \leftarrow$ previously imputed $D$
    for $v$ in $w$ do
      Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
      Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
      $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
    Update stopping criterion $c$
  Return completed $D^{imp}$
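The control flow of Algorithm 1 can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. To keep the example runnable without external libraries, the learner is a k-nearest-neighbour stand-in rather than a random forest; a random forest fit-and-predict routine could be passed through the hypothetical `fit_predict` argument instead. The column-mean initial guesses and the squared-change tolerance are also assumptions made for illustration.

```python
import math

def knn_fit_predict(X_train, y_train, X_new, k=3):
    """Stand-in learner (NOT a random forest): mean of the k nearest targets."""
    preds = []
    for x in X_new:
        dists = sorted((math.dist(x, xt), yt) for xt, yt in zip(X_train, y_train))
        nearest = [yt for _, yt in dists[:k]]
        preds.append(sum(nearest) / len(nearest))
    return preds

def chained_impute(D, fit_predict=knn_fit_predict, max_iter=10, tol=1e-6):
    """Chained imputation of a numeric matrix D (list of rows); None marks missing."""
    n_rows, n_cols = len(D), len(D[0])
    missing = [[D[i][v] is None for v in range(n_cols)] for i in range(n_rows)]
    imp = [row[:] for row in D]
    # Initial guesses: column means of the observed values (an assumption).
    for v in range(n_cols):
        obs = [D[i][v] for i in range(n_rows) if not missing[i][v]]
        mean_v = sum(obs) / len(obs)
        for i in range(n_rows):
            if missing[i][v]:
                imp[i][v] = mean_v
    # Visit columns in order of increasing number of missing entries.
    order = sorted(range(n_cols), key=lambda v: sum(missing[i][v] for i in range(n_rows)))
    for _ in range(max_iter):
        old = [row[:] for row in imp]
        for v in order:
            mis_rows = [i for i in range(n_rows) if missing[i][v]]
            if not mis_rows:
                continue
            obs_rows = [i for i in range(n_rows) if not missing[i][v]]
            others = [c for c in range(n_cols) if c != v]
            X_obs = [[imp[i][c] for c in others] for i in obs_rows]
            y_obs = [imp[i][v] for i in obs_rows]
            X_mis = [[imp[i][c] for c in others] for i in mis_rows]
            # Fit on rows where column v is observed, predict where it is missing.
            for i, pred in zip(mis_rows, fit_predict(X_obs, y_obs, X_mis)):
                imp[i][v] = pred
        # Stopping criterion: filled-in values no longer change.
        change = sum((imp[i][v] - old[i][v]) ** 2
                     for i in range(n_rows) for v in range(n_cols))
        if change < tol:
            break
    return imp

D = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None], [None, 10.0]]
completed = chained_impute(D)
```

Observed entries are never overwritten; only cells that were missing in the input are updated on each pass.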

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


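model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \mathrm{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$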

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \ldots, 3 \tag{B2}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \ldots, 6 \tag{B4}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \ldots, 5 \tag{B5}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \ldots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
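The poststratification step can be made concrete with a short sketch. It assumes that cell-level predicted support has already been obtained from a fitted model; the function name, the cell keys, and all numbers below are hypothetical and serve only to show the weighting arithmetic for one district and one income group.

```python
def poststratify(cell_predictions, cell_counts):
    """Population-weighted average of cell-level predicted support.

    cell_predictions: dict mapping a demographic cell to predicted Pr(support).
    cell_counts: dict mapping the same cells to their population frequency
                 in the district (hypothetical Census counts).
    """
    total = sum(cell_counts[cell] for cell in cell_predictions)
    if total == 0:
        raise ValueError("district has no population in the given cells")
    weighted = sum(cell_predictions[cell] * cell_counts[cell]
                   for cell in cell_predictions)
    return weighted / total

# Hypothetical cells keyed by (race, gender, age group, education, income bin).
predictions = {
    ("white", "female", "30-39", "4-year degree", 3): 0.20,
    ("black", "male", "50-59", "some college", 2): 0.80,
}
counts = {
    ("white", "female", "30-39", "4-year degree", 3): 300,
    ("black", "male", "50-59", "some college", 2): 100,
}
district_preference = poststratify(predictions, counts)
```

With these illustrative inputs the district estimate is (0.20 × 300 + 0.80 × 100) / 400 = 0.35.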

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger in magnitude. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1 Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences        0.182     0.158     0.181     0.162     0.115     0.115
                             (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences      −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                             (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences        0.080     0.061     0.063     0.072     0.043     0.045
                             (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences      −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                             (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results even obtain when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is then ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
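The effect of district-specific thresholds can be illustrated with a brief sketch. The function names, the linear-interpolation percentile rule, and the income values are all hypothetical; the example only shows how the same income can fall into different groups depending on the district's own distribution.

```python
from collections import defaultdict

def percentile(sorted_vals, q):
    """Linear-interpolation percentile for q in [0, 1] (one of several conventions)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def income_groups(records, low_q=1 / 3, high_q=2 / 3):
    """Label each (district, income) record relative to its own district."""
    by_district = defaultdict(list)
    for district, income in records:
        by_district[district].append(income)
    thresholds = {}
    for district, incomes in by_district.items():
        ordered = sorted(incomes)
        thresholds[district] = (percentile(ordered, low_q), percentile(ordered, high_q))
    labels = []
    for district, income in records:
        low_cut, high_cut = thresholds[district]
        if income <= low_cut:
            labels.append("low")
        elif income > high_cut:
            labels.append("high")
        else:
            labels.append("middle")
    return labels

# Hypothetical incomes: district "A" is richer than district "B".
a_incomes = [40_000, 60_000, 80_000, 100_000, 120_000, 140_000, 160_000]
b_incomes = [10_000, 15_000, 20_000, 25_000, 30_000, 35_000, 40_000]
records = [("A", v) for v in a_incomes] + [("B", v) for v in b_incomes]
labels = income_groups(records)
```

Here an income of $40,000 is labeled low in district A but high in district B. Passing `low_q=0.2, high_q=0.8` would correspond to the shifted thresholds used in panel (C).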


Table C1 Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency


[Map of House districts. Legend bins: 62 - 164, 39 - 62, 26 - 39, 13 - 26, 0 - 13]

Figure D1 Total number of union certification elections in House districts (109th-112th Congress)

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
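A population-weighted interpolation of this kind can be sketched as follows. The exact weighting scheme used in the paper is not spelled out in this section, so the allocation rule below (each county's count is split across districts in proportion to the county population living in each county-district intersection) is an assumption, and the function name and all inputs are hypothetical.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts using population weights.

    county_counts: dict mapping county to its number of congregations.
    intersections: list of (county, district, population) tuples, one per
                   county-district intersection (hypothetical weights).
    """
    county_pop = defaultdict(float)
    for county, _, pop in intersections:
        county_pop[county] += pop
    district_counts = defaultdict(float)
    for county, district, pop in intersections:
        # Share of the county's population that lives in this district.
        weight = pop / county_pop[county]
        district_counts[district] += county_counts[county] * weight
    return dict(district_counts)

county_counts = {"c1": 100, "c2": 50}
intersections = [
    ("c1", "d1", 750),  # three quarters of county c1 lies in district d1
    ("c1", "d2", 250),
    ("c2", "d2", 400),  # county c2 is nested entirely within district d2
]
district_counts = interpolate_to_districts(county_counts, intersections)
```

When every county falls entirely within one district, as in an at-large state, each weight equals one and the result is the simple sum of county counts.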


E Additional Robustness Tests

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11104 observations. We find that our results are not influenced by this change.

Table E1 Additional robustness tests

                                           Low income          High income
                                           preferences         preferences         N
(1) Injective preference-roll call map     0.063 (0.013)      −0.041 (0.013)      11104
(2) Extreme preferences excl.              0.074 (0.016)      −0.048 (0.015)      13308
(3) New York excluded                      0.070 (0.015)      −0.048 (0.014)      14730
(4) Local Union Concentration              0.065 (0.014)      −0.047 (0.014)      15780
(5) Trimmed LPM estimator                  0.074 (0.015)      −0.055 (0.014)      15426
(6) Errors-in-variables                    0.062 (0.004)      −0.054 (0.004)      15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries for this specification are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
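The idea of trimming can be shown with a deliberately simplified sketch. It assumes a single regressor and a sequential rule that refits the linear probability model after dropping observations whose fitted values fall outside the unit interval, repeating until none remain outside. This is an illustration of the general idea, not a reproduction of the specification estimated in the paper, and all names and data are hypothetical.

```python
def ols_simple(x, y):
    """Intercept and slope of a one-regressor least squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def trimmed_lpm(x, y, max_rounds=50):
    """Refit after dropping cases with fitted values outside [0, 1]."""
    keep = list(range(len(x)))
    for _ in range(max_rounds):
        intercept, slope = ols_simple([x[i] for i in keep], [y[i] for i in keep])
        inside = [i for i in keep if 0.0 <= intercept + slope * x[i] <= 1.0]
        if len(inside) == len(keep):
            break  # every retained fitted value lies in the unit interval
        keep = inside
    return intercept, slope, keep

# Hypothetical data: binary outcome with two high-leverage observations.
x = [-3.0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1, 1]
intercept, slope, kept = trimmed_lpm(x, y)
```

In this toy example the two extreme observations are dropped in successive rounds, and the final fit uses only the seven central cases.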

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}U_d\theta^{l}_{jd} + \eta^{h}U_d\theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0, \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}U_d\theta^{l}_{jd} + \eta^{h}U_d\theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}$$

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
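The selection logic described above can be sketched in a few lines of Python. This is a minimal illustration under strong simplifying assumptions, not the implementation used in the paper: it uses a plain coordinate-descent LASSO with a hand-picked penalty `lam` instead of the root-LASSO, a single treatment coefficient instead of the two preference interactions, and simulated data. All function and variable names are hypothetical.

```python
import random

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent LASSO for (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_ss = [sum(row[k] ** 2 for row in X) / n for k in range(p)]
    for _ in range(n_iter):
        for k in range(p):
            # covariance of column k with the residual that excludes column k
            rho = sum(X[i][k] * (resid[i] + X[i][k] * b[k]) for i in range(n)) / n
            shrunk = max(abs(rho) - lam, 0.0) / col_ss[k]  # soft-thresholding
            new = shrunk if rho > 0 else -shrunk
            if new != b[k]:
                for i in range(n):
                    resid[i] += X[i][k] * (b[k] - new)
                b[k] = new
    return b

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (no intercept)."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)

def post_double_selection(y, U, W, lam):
    b_y = lasso_cd(W, y, lam)  # step 1: predict the outcome from the controls
    b_u = lasso_cd(W, U, lam)  # step 2: predict the treatment from the controls
    I1 = {k for k, coef in enumerate(b_y) if coef != 0.0}
    I2 = {k for k, coef in enumerate(b_u) if coef != 0.0}
    keep = sorted(I1 | I2)     # union of both selected control sets
    X = [[U[i]] + [W[i][k] for k in keep] for i in range(len(y))]
    return ols(X, y)[0], I1, I2  # first coefficient belongs to the treatment

# Simulated example: control 0 drives the treatment strongly, the outcome mildly.
random.seed(1)
n, p, eta_true = 300, 15, 0.5
W = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
U = [w[0] + 0.8 * w[1] + random.gauss(0, 1) for w in W]
y = [eta_true * u + 0.3 * w[0] + w[2] + random.gauss(0, 1) for u, w in zip(U, W)]
eta_hat, I1, I2 = post_double_selection(y, U, W, lam=0.2)
print(round(eta_hat, 2), sorted(I1), sorted(I2))
```

The final OLS step is run on the union of both selected sets, so a control that matters mainly for the treatment equation still enters the outcome regression.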

32 For a very general discussion, see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\ \theta^{h}_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here, $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared $L_2$ penalty to the least squares criterion:

$$c^{*} = \operatorname*{argmin}_{c \in \mathbb{R}^{D}} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1}y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
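The closed-form KRLS solution can be illustrated with a small, self-contained Python sketch. It is an illustration under assumptions, not the authors' code: the toy data are invented, the helper names are hypothetical, and $\lambda$ is fixed by hand rather than chosen by leave-one-out loss. Only the kernel definition, the choice $\sigma^2 = D$, and the estimator $c^{*} = (K + \lambda I)^{-1}y$ are taken from the text.

```python
import math

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / sigma2)
             for zj in Z] for zi in Z]

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def krls_fit(Z, y, lam):
    """Return weights c solving (K + lam*I) c = y, and fitted values K c."""
    n, D = len(Z), len(Z[0])
    K = gaussian_kernel(Z, float(D))  # sigma^2 = D, the number of columns of z
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    c = solve(A, y)
    fitted = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
    return c, fitted

# Toy data (hypothetical): 21 points with D = 2 and a non-additive outcome.
Z = [[x / 10.0, (x % 3) - 1.0] for x in range(-10, 11)]
y = [math.sin(2 * z[0]) + 0.5 * z[0] * z[1] for z in Z]
c, fitted = krls_fit(Z, y, lam=0.1)
print([round(f, 3) for f in fitted[:3]])
```

Because the interaction between the two columns of `Z` is never specified explicitly, the sketch also shows in miniature why the approach does not require hand-built nonlinear terms.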

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right)\left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levy (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working Paper 5 2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public Opinion in State Politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The Collapse and Revival of American Community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs. Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

[Figure I: six histogram panels, one per roll call: Increase Minimum Wage; Housing Crisis Assistance; Fair Pay Act; Affordable Care Act; CAFTA Ratification; Recovery and Reinvestment.]

Figure I. District-level income gap in public support for 6 selected policies

Note: Each histogram plots the difference in support for a matched roll-call vote question between people in the lower third and people in the upper third of their district's income distribution, for all House districts.

For each roll call, we then estimate district-level preferences of low- and high-income constituents, which we denote by $(\theta^{l}, \theta^{h})$, as the proportion of individuals voting 'yea'. Since preference estimates are in $[0, 1]$, they can be directly related to legislators' probability of voting 'yea' on a given roll call. Our data show considerable variation in the distance between the policy preferences of those at the top and those at the bottom, as illustrated in Figure I. It plots histograms of the difference between low-income and high-income preferences $(\theta^{h} - \theta^{l})$ in congressional districts for six selected roll calls. For salient bills, such as increasing the minimum wage (the Fair Minimum Wage Act), housing crisis assistance (the Housing and Economic Recovery Act), or the Affordable Care Act, low-income constituents are more supportive than their high-income counterparts in the vast majority of districts. On other issues, such as the ratification of the Central America Free Trade Agreement, high-income constituents are clearly more in favor. In all examples, we find considerable across-district variation in the preference gap between low- and high-income constituents.12 We will employ this variation over both roll calls and districts to estimate legislators' differential responsiveness to changes in policy preferences of different income groups, and how it might be moderated by union strength.

12 Averaged over all districts and roll calls, there is a statistically significant gap between the preferences of the bottom third and the top. The mean of the (absolute) preference difference is 17 percentage points; the 10th percentile is 3 points, while the 90th percentile is 32 percentage points.
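As a rough illustration of what a group-specific preference estimate and the resulting gap look like, the following Python sketch computes raw proportions of 'yea' support among low- and high-income respondents in each district. It is a deliberate simplification: it uses unweighted sample proportions on invented records, whereas the paper's estimates rely on small area estimation. The function name, the record layout, and the district labels are all hypothetical.

```python
from collections import defaultdict

def preference_gap(respondents):
    """Map district -> (theta_l, theta_h, absolute gap).

    respondents: iterable of (district, income_group, supports), where
    income_group is "low", "mid" or "high" (terciles of the district's
    income distribution) and supports is True for a 'yea' preference.
    """
    counts = defaultdict(lambda: {"low": [0, 0], "high": [0, 0]})
    for district, group, supports in respondents:
        if group in ("low", "high"):  # the middle third is not used
            cell = counts[district][group]
            cell[0] += int(supports)  # number supporting
            cell[1] += 1              # number of respondents
    out = {}
    for district, g in counts.items():
        if g["low"][1] and g["high"][1]:  # need both groups observed
            theta_l = g["low"][0] / g["low"][1]
            theta_h = g["high"][0] / g["high"][1]
            out[district] = (theta_l, theta_h, abs(theta_l - theta_h))
    return out

# Invented survey records for one roll call.
sample = [
    ("NC-04", "low", True), ("NC-04", "low", True),
    ("NC-04", "low", False), ("NC-04", "low", True),
    ("NC-04", "high", True), ("NC-04", "high", False),
    ("NC-04", "mid", True),
    ("TX-21", "low", True),  # no high-income respondents: skipped
]
gaps = preference_gap(sample)
print(gaps)
```

The absolute gap is reported here because it does not depend on a sign convention; a signed difference would additionally indicate which group is more supportive.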

III.C District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report or reporting falsified information is made a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit-nonresponse (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.14 Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
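The aggregation step described above amounts to a grouped sum followed by an average. A minimal Python sketch, assuming each cleaned report has already been reduced to a (district, congress, members) record, could look as follows. The record layout, the function name, and the numbers are hypothetical; the actual pipeline (cleaning, validation, geocoding) is not reproduced here.

```python
from collections import defaultdict
from statistics import mean

def members_by_district(reports):
    """Sum reported members per district and Congress, then average over Congresses.

    reports: iterable of (district, congress, members) tuples, one per
    geocoded local-union report.
    """
    totals = defaultdict(int)
    for district, congress, members in reports:
        totals[(district, congress)] += members  # sum over local unions
    per_district = defaultdict(list)
    for (district, _congress), total in totals.items():
        per_district[district].append(total)
    return {d: mean(v) for d, v in per_district.items()}

# Invented reports for two districts and two Congresses.
reports = [
    ("OH-09", 109, 1200), ("OH-09", 109, 300), ("OH-09", 110, 1700),
    ("TX-21", 109, 400), ("TX-21", 110, 600),
]
avg_members = members_by_district(reports)
print(avg_members)
```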

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost complete coverage, as during the latter part of the twentieth century unions are increasingly organizing workers across different sectors and occupations (Lichtenstein 2013, 249).15

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.

[Figure II; legend: 1st quartile, 2nd quartile, 3rd quartile, 4th quartile.]

Figure II. Union membership in House districts, 109th-112th Congress

15 While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with the widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, DC, which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS, given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 0.86).

III.D Statistical specifications

For each roll call vote $j$ ($j = 1, \ldots, J$), we have measured preferences of low- and high-income citizens in a given congressional district $d$ ($d = 1, \ldots, D$), denoted by $(\theta^{l}_{jd}, \theta^{h}_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts' workers to organize collective action, as well as (iv) non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House members is the following linear probability specification:

$$y_{ijd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}(U_d \times \theta^{l}_{jd}) + \eta^{h}(U_d \times \theta^{h}_{jd}) + \beta^{l}(X_d \times \theta^{l}_{jd}) + \beta^{h}(X_d \times \theta^{h}_{jd}) + \alpha_d + \epsilon_{ijd}$$

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d\theta^{h}_{jd}$ and $U_d\theta^{l}_{jd}$. Thus, when $\eta^{l}$ and $\eta^{h}$ are zero, the group-specific preference coefficients $\mu^{l}$ and $\mu^{h}$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^{l}$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^{h}$. Our theoretical expectation is that $\eta^{l} > 0$ and $\eta^{h} \le 0$.
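To make the interpretation of the coefficients concrete, the marginal effect of low-income preferences implied by the linear probability specification can be written out explicitly. This is only a restatement of the model under its own assumptions; the evaluation points $U_d = 0$ and $U_d = 1$ are chosen for illustration and rely on all inputs being standardized.

```latex
\begin{align*}
\frac{\partial\, E[y_{ijd}]}{\partial\, \theta^{l}_{jd}}
  &= \mu^{l} + \eta^{l} U_d + \beta^{l} X_d \\
\text{at } U_d = 0,\ X_d = 0:\quad
  & \mu^{l} \\
\text{at } U_d = 1,\ X_d = 0:\quad
  & \mu^{l} + \eta^{l}
\end{align*}
```

Moving from average union membership to one standard deviation above it therefore shifts responsiveness to the poor by exactly $\eta^{l}$; the analogous expression for the affluent replaces the superscript $l$ with $h$.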

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects, $\alpha_d$.16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and state level) and group preferences, $X_d\theta^{l}_{jd}$ and $X_d\theta^{h}_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses, detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors, assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
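The role of the district fixed effects can be seen in a small Python sketch of the within transformation (demeaning by district). It uses invented numbers and hypothetical names, and it only illustrates the general mechanics of fixed effects, not the estimation code behind the results: a variable that is constant within a district is wiped out by demeaning, while its interaction with roll-call-specific preferences retains within-district variation.

```python
from collections import defaultdict

def demean_within(values, groups):
    """Subtract the group mean from each value (the within transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

# Two hypothetical districts, three roll calls each.
districts = ["A", "A", "A", "B", "B", "B"]
union = [0.5, 0.5, 0.5, -1.0, -1.0, -1.0]   # constant within district
theta_l = [0.2, 0.6, 0.4, 0.7, 0.1, 0.4]    # varies over roll calls
interaction = [u * t for u, t in zip(union, theta_l)]

union_within = demean_within(union, districts)
interaction_within = demean_within(interaction, districts)
print(union_within)
print(interaction_within)
```

After demeaning, `union_within` contains only zeros, so a non-interacted union coefficient could not be estimated, whereas `interaction_within` still varies across roll calls.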

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.

IV Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above with district fixed effects, but without accounting for local union organization (setting $\beta^{l}, \beta^{h}$ and $\eta^{l}, \eta^{h}$ to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase in the probability of legislators casting a corresponding vote of 13.6 (±1.2) percentage points. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.

IV.A Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership, with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preferences-confounder interactions (setting $\beta^{l}$ and $\beta^{h}$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.

Even aer accounting for district xed eects however our results are still vulnerable toomied variables that interact with group preferences Following accounts of winner-take-all politics (Hacker and Pierson 2010) one alternative interpretation is that the moderatingeect we have ascribed to unions mostly reects the fact that state governments have chosenpolicies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017Anzia and Moe 2016) If the likelihood of adapting pro- or anti union policies is correlatedwith biased representation our estimated eect of unions might be spurious In line withthis concern recent studies have demonstrated that right-to-work and collective bargaininglaws regulating the formation and management of unions in the private or public sectorhave clear political eects on turnout and partisan vote shares (Feigenbaum et al 2018

¹⁷ Calculated from a LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors at the district level. See also specification (1) in Table I below.


[Figure graphic: two panels, “Low income constituents” and “High income constituents”. x-axis: Union membership [std], from −1.6 to 1.6, with sample percentiles p10, p25, p50, p75, p90 marked; y-axis: Marginal effect, from −0.4 to 0.4.]

Figure III: District-level union membership as moderator of unequal representation

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives’ roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter X_d and are interacted with income group preferences θ_l and θ_h. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically-rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators’ vote choice by including state-specific constants in X_d, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or


Table I: Union density and representation. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote

                                 (1)       (2)       (3)       (4)       (5)       (6)
Low income preferences          0.106     0.082     0.098     0.084     0.068     0.062
                               (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences        −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                               (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

District fixed effects            X         X         X         X         X         X
Group preferences
  × union policy                            X                                       X
  × state constants                                   X
  × organizational capacity                                     X                   X
  × district covariates                                                   X         X

Note: N = 15,780; N_d = 534; 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, η_l and η_h. Specifications (2) to (5) include coefficients for interactions (β_l, β_h) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements; (3) interacts preferences with state fixed effects; (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections; (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).¹⁸ We use the NLRB’s database to

¹⁸ Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59


extract all attempts to certify (or de-certify) a local union.¹⁹ We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district’s organizational capacity in specification (4).

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators’ responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts’ socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics see Table A3). This set of covariates excludes “bad controls” (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.²⁰ Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

¹⁹ There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily.”

²⁰ Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in “Bowling Alone” (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
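The aggregation step can be illustrated with a small sketch. It assumes a hypothetical crosswalk listing, for each county-district overlap, the county's measure and the population living in that overlap; the district value is then the population-weighted mean. The data layout, the toy numbers, and the function name are illustrative assumptions, and the authors' exact interpolation procedure may differ from this sketch.

```python
from collections import defaultdict


def aggregate_to_districts(overlaps):
    """Population-weighted mean of a county-level measure per district.

    `overlaps` is an iterable of (district, county_value, overlap_population)
    tuples, one per county-district intersection (hypothetical layout).
    """
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for district, value, pop in overlaps:
        weighted_sum[district] += value * pop
        total_pop[district] += pop
    return {d: weighted_sum[d] / total_pop[d] for d in weighted_sum}


# Toy example: district "D1" draws 3,000 residents from a county with index
# value 1.0 and 1,000 residents from a county with index value 2.0.
toy_overlaps = [
    ("D1", 1.0, 3000),
    ("D1", 2.0, 1000),
    ("D2", 0.5, 2000),
]
district_index = aggregate_to_districts(toy_overlaps)
print(district_index)
```

Weighting by overlap population rather than by land area keeps sparsely populated parts of a county from dominating the district value.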

Table II: Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications

                                         Low income         High income
(1a) Social capital: bowling alleys      0.067 (0.014)     −0.051 (0.013)
(1b) Social capital index                0.065 (0.014)     −0.048 (0.013)
(2)  Redistricting                       0.067 (0.014)     −0.051 (0.013)
(3)  MRP estimated preferences           0.115 (0.022)     −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for η_l and η_h. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts. N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred. N = 14,150. Specification (3) uses preferences estimated using MRP. See Appendix B for more details. N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.
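The poststratification step of MRP can be sketched as follows: model-based predictions for demographic cells are averaged within a district, weighted by the number of people in each cell. The multilevel regression that would produce the cell predictions is omitted here; the cell values, counts, and names are made-up placeholders, not estimates from the paper.

```python
def poststratify(cells):
    """Weighted average of cell-level predicted support.

    `cells` is a list of (predicted_support, population_count) pairs for the
    demographic cells of one district and one income group.
    """
    total = sum(count for _, count in cells)
    return sum(pred * count for pred, count in cells) / total


# Hypothetical cell predictions (e.g., from a multilevel model) and counts.
district_cells = [
    (0.20, 100),  # cell 1: predicted support 20%, 100 residents
    (0.60, 300),  # cell 2: predicted support 60%, 300 residents
]
district_preference = poststratify(district_cells)
print(round(district_preference, 3))
```

Because the weights come from population counts rather than from the survey sample, sparsely sampled cells still contribute in proportion to their true size.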

Additional robustness tests. In Appendix E, we report additional ‘technical’ robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far and “treatment” equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.²¹ The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
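The double-selection logic can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single treatment variable rather than the paper's system of union-preference interaction equations, a plain coordinate-descent LASSO with a fixed penalty, and toy data. All function names and the penalty value are hypothetical. The toy columns are mutually orthogonal so the result can be checked by hand: the confounder x1 drives both treatment and outcome, and the true treatment coefficient is 1.

```python
import math


def lasso(X, y, alpha, n_iter=50):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that excludes j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = math.copysign(max(abs(rho) - alpha, 0.0), rho) / scale
    return beta


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(columns, y):
    """Least squares of y on the given columns via the normal equations."""
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    return solve(XtX, Xty)


def post_double_selection(d, X, y, alpha):
    """Select controls predicting y or d, then regress y on d and their union."""
    sel_y = {j for j, b in enumerate(lasso(X, y, alpha)) if abs(b) > 1e-8}
    sel_d = {j for j, b in enumerate(lasso(X, d, alpha)) if abs(b) > 1e-8}
    selected = sorted(sel_y | sel_d)
    columns = [[1.0] * len(y), d] + [[row[j] for row in X] for j in selected]
    return ols(columns, y)[1], selected


# Toy data: orthogonal +/-1 columns; x1 confounds treatment d and outcome y.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, 1, 1, 1, -1, -1, -1, -1]
v = [a * b * c for a, b, c in zip(x1, x2, x3)]    # treatment-specific variation
d = [float(a + b) for a, b in zip(x1, v)]         # d = x1 + v
y = [1.0 * di + 2.0 * a for di, a in zip(d, x1)]  # y = 1*d + 2*x1
X = [[float(a), float(b), float(c)] for a, b, c in zip(x1, x2, x3)]

naive = ols([[1.0] * len(y), d], y)[1]            # omits the confounder
pds_estimate, selected = post_double_selection(d, X, y, alpha=0.1)
print(naive, pds_estimate, selected)
```

In this toy setup the naive regression that omits x1 overstates the treatment coefficient, while the post-selection regression recovers it because x1 is picked up by the selection step. With real data, the penalty would need to be chosen in a principled way, and sample splitting of the kind reported in the note to Table III would separate selection from estimation.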

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity

²¹ The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).


Table III: Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups

                                (1)       (2)       (3)
Low income preferences         0.063     0.066     0.062
                              (0.014)   (0.017)   (0.016)
High income preferences       −0.054    −0.036    −0.040
                              (0.013)   (0.015)   (0.016)

Semi-parametric terms           144       312       624
Post-LASSO terms                 18        45       112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on a 50% sample, parameter estimates performed on the remaining 50% (N = 7,884). Table entries are estimates for η_L and η_H with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators’ responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is of course the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that “a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent.” It is thus prudent to consider an analysis that “lets the data speak.” In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.²² The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
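To illustrate the idea, the sketch below fits a Gaussian-kernel regularized least squares model to noise-free toy data and evaluates pointwise partial derivatives of the fitted function. It is a hedged illustration, not the authors' implementation: the bandwidth, the regularization parameter, the toy outcome function, and all names are assumptions chosen for readability. No interaction term is supplied to the fitting routine, even though the toy data contain one.

```python
import math


def kernel(a, b, sigma2):
    """Gaussian kernel exp(-||a - b||^2 / sigma2)."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / sigma2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def krls_fit(X, y, sigma2, lam):
    """Coefficients c = (K + lam * I)^(-1) y of the kernel expansion."""
    n = len(X)
    K = [
        [kernel(X[i], X[j], sigma2) + (lam if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return solve(K, y)


def partial_effect(x, dim, X, coefs, sigma2):
    """Derivative of the fitted function at x with respect to input `dim`."""
    return sum(
        c * kernel(x, xi, sigma2) * (-2.0 / sigma2) * (x[dim] - xi[dim])
        for c, xi in zip(coefs, X)
    )


# Toy grid of (preference, union membership). The outcome contains an
# interaction, so the effect of preferences rises with union membership.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
X = [(p, u) for p in grid for u in grid]
y = [0.5 + (0.2 + 0.1 * u) * p for p, u in X]

sigma2, lam = 2.0, 0.01  # assumed bandwidth and regularization
coefs = krls_fit(X, y, sigma2, lam)

# Partial effect of preferences (dim 0) at low and high union membership.
effect_low_union = partial_effect((0.0, -0.5), 0, X, coefs, sigma2)
effect_high_union = partial_effect((0.0, 0.5), 0, X, coefs, sigma2)
print(effect_low_union, effect_high_union)
```

Comparing the two partial effects shows how a moderating relationship can surface from the fitted surface alone, which is the check the paragraph above describes.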

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure graphic: two panels, “Low income constituents” and “High income constituents”. x-axis: Union membership [std], from −1.6 to 1.6, with sample percentiles p10, p25, p50, p75, p90 marked; y-axis: Partial effect, from −0.4 to 0.4.]

Figure IV: Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

²² See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators’ vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for “public” and the remaining “non-public” unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district’s public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining “non-public” unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV: Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups

                                  Low income         High income
(A) Private vs. public unions
    Public unions                 0.074 (0.016)     −0.058 (0.015)
    Non-public unions             0.054 (0.016)     −0.027 (0.016)

(B) Bill ideology
    Conservative bill             0.086 (0.017)     −0.086 (0.018)
    Liberal bill                  0.052 (0.014)     −0.028 (0.013)

(C) AFL-CIO endorsement
    No position                   0.054 (0.014)     −0.054 (0.013)
    Endorsement                   0.077 (0.015)     −0.040 (0.014)

Note: Estimates for η_L and η_H with cluster-robust standard errors in parentheses. N = 15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.²³ AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators’ responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).²⁴ The fact that districts with higher union membership see better representation of the less affluent

²³ Taken from the AFL-CIO “legislative scorecard”: https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

²⁴ For high-income preferences, the estimate for η_h is smaller for endorsed bills but still significantly different from zero.


more so when issues are salient to unions bolsters the interpretation that our main result is actually driven by unions’ capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress and (ii) that these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the “labor” sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
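The conversion from a log-scale coefficient to dollar amounts can be sketched with Duan's smearing estimator, which rescales the exponentiated log-scale prediction by the mean of the exponentiated residuals. The numbers below are made up purely for illustration: they are not the paper's data and do not reproduce the $81,000 figure. The function names are assumptions as well.

```python
import math


def smearing_factor(residuals):
    """Duan's smearing factor: mean of exp(residual) from the log-scale fit."""
    return sum(math.exp(r) for r in residuals) / len(residuals)


def predict_level(log_prediction, factor):
    """Retransform a log-scale prediction to the original (dollar) scale."""
    return math.exp(log_prediction) * factor


# Hypothetical log-scale residuals from a fitted contributions regression.
residuals = [math.log(2.0), 0.0, -math.log(2.0)]
factor = smearing_factor(residuals)

baseline_log = math.log(100_000.0)  # assumed prediction at mean union membership
coef_per_sd = 0.05                  # assumed log-scale effect of a one-SD increase

dollar_effect = (
    predict_level(baseline_log + coef_per_sd, factor)
    - predict_level(baseline_log, factor)
)
print(round(factor, 4), round(dollar_effect, 2))
```

The factor exceeds one whenever the residuals vary, which is why simply exponentiating a log-scale prediction tends to understate the expected level.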

The last two columns of Panel (A) examine how contributions moderate legislators’ responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator’s responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V: Labor contributions and selection of Democratic legislators

                                          (1)       (2)       (3)       (4)

A: Contributions channel
                                         DV: Contrib.         DV: roll call
Union membership                         0.056     0.046
                                        (0.012)   (0.014)
Contributions × low income prefs.                            0.946     0.865
                                                            (0.036)   (0.034)
Contributions × high income prefs.                          −0.735    −0.714
                                                            (0.029)   (0.031)

B: Selection channel
                                         DV: Democrat         DV: roll call
Union membership                         0.161     0.106
                                        (0.024)   (0.023)
Democrat × low income prefs.                                 0.576     0.542
                                                            (0.012)   (0.015)
Democrat × high income prefs.                               −0.411    −0.423
                                                            (0.013)   (0.015)

District controls                                    X                   X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N = 428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators’ vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N = 15,780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N = 428. Columns (3) and (4) show LPMs with district fixed effects for legislators’ vote as function of the interaction between (log) labor contributions and Democratic representative. N = 15,776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members’ contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers’ views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as acollective voice institution that limits unequal representation in the House of RepresentativesAgainst the wide-spread view that unions are either too weak or too narrow to mitigatepolitical inequality in the national arena we nd that the district-level strength of unionsis clearly linked to the responsiveness of legislators to dierent income groups Whilelegislators are on average more responsive to the preferences of the auent than to thepreferences of the poor this representation gap is highly variable It is much less pronouncedin districts where union membership is relatively higher is result is in line with evidenceon state-level policy responsiveness (Flavin 2018)


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
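To make the classification step concrete, the following minimal Python sketch computes district-specific tercile thresholds and assigns an income to a group. It is an illustration under stated assumptions, not our production code: the function names, the toy incomes, the unweighted linear-interpolation quantile, and the treatment of incomes exactly at a threshold are all hypothetical choices.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def district_thresholds(incomes_by_district, low_q=1 / 3, high_q=2 / 3):
    """Return {district: (low threshold, high threshold)}."""
    thresholds = {}
    for district, incomes in incomes_by_district.items():
        values = sorted(incomes)
        thresholds[district] = (quantile(values, low_q), quantile(values, high_q))
    return thresholds


def income_group(income, low, high):
    """Classify one income relative to its own district's thresholds.

    Assumption: incomes exactly at a threshold go to the outer group.
    """
    if income <= low:
        return "low"
    if income >= high:
        return "high"
    return "middle"


# Toy household incomes (hypothetical values, not ACS data).
incomes_by_district = {
    "district_a": [30000, 45000, 50000, 60000, 80000, 120000, 200000],
    "district_b": [12000, 18000, 25000, 30000, 38000, 55000, 90000],
}
thresholds = district_thresholds(incomes_by_district)

# The same income can fall into different groups in different districts.
group_a = income_group(40000, *thresholds["district_a"])
group_b = income_group(40000, *thresholds["district_b"])
```

Passing `low_q=0.2, high_q=0.8` to `district_thresholds` gives the narrower income groups instead of terciles.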

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1. Matched CCES-House roll calls included in our analysis

| Match | Bill | Date | Name | House Vote (Yea-Nay) | Bill Ideology† |
|---|---|---|---|---|---|
| (1) | HR 810 | 07/19/2006 | Stem Cell Research Enhancement Act (Presidential Veto override) | 235-193 | L |
| (1) | HR 3 | 01/11/2007 | Stem Cell Research Enhancement Act of 2007 (House) | 253-174 | L |
| (1) | S 5 | 06/07/2007 | Stem Cell Research Enhancement Act of 2007 | 247-176 | L |
| (2) | HR 2956 | 07/12/2007 | Responsible Redeployment from Iraq Act | 223-201 | L |
| (3) | HR 2 | 01/10/2007 | Fair Minimum Wage Act | 315-116 | L |
| (4) | HR 4297 | 12/08/2005 | Tax Relief Extension Reconciliation Act (Passage) | 234-197 | C |
| (4) | HR 4297 | 05/10/2006 | Tax Relief Extension Reconciliation Act (Agreeing to Conference Report) | 244-185 | C |
| (5) | HR 3045 | 07/28/2005 | Dominican Republic-Central America-United States Free Trade Agreement Implementation Act | 217-215 | C |
| (6) | S 1927 | 08/04/2007 | Protect America Act | 227-183 | C |
| (6) | HR 6304 | 06/20/2008 | FISA Amendments Act of 2008 | 293-129 | C |
| (7) | HR 3162 | 08/01/2007 | Children's Health and Medicare Protection Act | 225-204 | L |
| (7) | HR 976 | 10/18/2007 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 273-156 | L |
| (7) | HR 3963 | 01/23/2008 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 260-152 | L |
| (7) | HR 2 | 02/04/2009 | Children's Health Insurance Program Reauthorization Act | 290-135 | L |
| (8) | HR 3221 | 07/23/2008 | Foreclosure Prevention Act of 2008 | 272-152 | L |
| (9) | HR 3688 | 11/08/2007 | United States-Peru Trade Promotion Agreement | 285-132 | C |
| (10) | HR 1424 | 10/03/2008 | Emergency Economic Stabilization Act of 2008 | 263-171 | L |
| (11) | HR 3080 | 10/12/2011 | To implement the United States-Korea Trade Agreement | 278-151 | C |
| (12) | HR 3078 | 10/12/2011 | To implement the United States-Colombia Trade Promotion Agreement | 262-167 | C |
| (13) | HR 2346 | 06/16/2009 | Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report) | 226-202 | L |
| (14) | HR 2831 | 07/31/2007 | Lilly Ledbetter Fair Pay Act | 225-199 | L |
| (14) | HR 11 | 01/09/2009 | Lilly Ledbetter Fair Pay Act of 2009 (House) | 247-171 | L |
| (14) | S 181 | 01/27/2009 | Lilly Ledbetter Fair Pay Act of 2009 | 250-177 | L |
| (15) | HR 1913 | 04/29/2009 | Local Law Enforcement Hate Crimes Prevention Act | 249-175 | L |
| (16) | HR 1 | 02/13/2009 | American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report) | 246-183 | L |
| (17) | HR 2454 | 06/26/2009 | American Clean Energy and Security Act | 219-212 | L |
| (18) | HR 3590 | 03/21/2010 | Patient Protection and Affordable Care Act | 220-212 | L |
| (19) | HR 3962 | 11/07/2009 | Affordable Health Care for America Act | 221-215 | L |
| (20) | HR 4173 | 06/30/2010 | Wall Street Reform and Consumer Protection Act of 2009 | 237-192 | L |
| (21) | HR 2965 | 12/15/2010 | Don't Ask Don't Tell Repeal Act of 2010 | 250-175 | L |
| (22) | S 365 | 08/01/2011 | Budget Control Act of 2011 | 269-161 | C |
| (23) | H CR 34 | 04/15/2011 | House Budget Plan of 2011 | 235-193 | C |
| (24) | H CR 112 | 03/28/2012 | Simpson-Bowles/Copper Amendment to House Budget Plan | 38-382 | C |
| (25) | HR 8 | 08/01/2012 | American Taxpayer Relief Act of 2012 (Levin Amendment) | 170-257 | L |
| (26) | HR 2 | 01/19/2011 | Repealing the Job-Killing Health Care Law Act | 245-189 | C |
| (26) | HR 6079 | 07/11/2012 | Repeal the Patient Protection and Affordable Care Act and [...] | 244-185 | C |
| (27) | HR 1938 | 07/26/2011 | North American-Made Energy Security Act | 279-147 | C |

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2. Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

| Congress | 33rd percentile: Mean | Min | Max | 67th percentile: Mean | Min | Max |
|---|---|---|---|---|---|---|
| 109 | 38,123 | 16,800 | 73,675 | 77,964 | 39,612 | 146,870 |
| 110 | 40,127 | 18,000 | 77,000 | 83,047 | 43,600 | 155,113 |
| 111 | 39,021 | 17,500 | 78,262 | 82,440 | 46,000 | 160,050 |
| 112 | 37,381 | 16,500 | 81,000 | 79,868 | 38,500 | 158,654 |

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3. Descriptive statistics of analysis sample

| Variable | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Roll-call vote: yea | 0.568 | 0.495 | 0.000 | 1.000 | 15780 |
| Constituent preferences | | | | | |
| Low income | 0.593 | 0.220 | 0.047 | 0.979 | 15934 |
| High income | 0.555 | 0.198 | 0.037 | 0.967 | 15934 |
| Low-High Gap | 0.172 | 0.121 | 0.000 | 0.588 | 15934 |
| Union membership [log] | 9.705 | 1.046 | 6.094 | 13.619 | 15934 |
| Population | 7.022 | 0.723 | 4.697 | 9.980 | 15934 |
| Share African American | 0.124 | 0.146 | 0.004 | 0.680 | 15934 |
| Share Hispanic | 0.156 | 0.174 | 0.005 | 0.812 | 15934 |
| Share BA or higher | 0.275 | 0.097 | 0.073 | 0.645 | 15934 |
| Median income [$10,000] | 5.177 | 1.356 | 2.282 | 10.439 | 15934 |
| Share female | 0.508 | 0.010 | 0.462 | 0.543 | 15934 |
| Manufacturing share | 0.110 | 0.047 | 0.025 | 0.281 | 15934 |
| Urbanization | 0.790 | 0.199 | 0.213 | 1.000 | 15934 |
| Certification elections [log] | 3.347 | 0.861 | 0.000 | 5.100 | 15934 |
| Congregations [per 1000 persons] | 0.765 | 1.147 | 0.062 | 6.453 | 15934 |

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
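The final aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration that assumes preferences have already been filled in for every Census record; the record layout, the function name `district_group_means`, and the toy values are hypothetical.

```python
from collections import defaultdict


def district_group_means(completed_records):
    """Average imputed preferences within each (district, income group) cell.

    completed_records: iterable of (district, income_group, preference)
    tuples, where preference is the filled-in support for one roll call.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for district, group, preference in completed_records:
        totals[(district, group)] += preference
        counts[(district, group)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Toy completed Census rows (hypothetical values).
records = [
    ("district_1", "low", 1.0),
    ("district_1", "low", 0.0),
    ("district_1", "low", 1.0),
    ("district_1", "high", 0.0),
    ("district_1", "high", 1.0),
]
means = district_group_means(records)

# Absolute low-high preference gap for this district and roll call.
gap = abs(means[("district_1", "low")] - means[("district_1", "high")])
```

Because the Census sample is representative within each district, an unweighted mean per cell suffices in this sketch; a sample with unequal inclusion probabilities would need weights.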

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead it identifies 630 Public Use Microdata

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1 (schematic): rows are units. CCES respondents 1, ..., m have covariates Z_{i1}, ..., Z_{iK} and observed preferences Y_{i1}, ..., Y_{iP}. Census individuals m+1, ..., m+n have the same covariates but missing (NA) preferences. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census rows.]

Figure B1. Illustration of Small Area Estimation of District Preferences

We use a sample of m individuals from the CCES that is not necessarily representative at the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE/Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
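A minimal Python sketch of a midpoint-Pareto scoring rule follows. The text above does not spell out how the Pareto shape parameter is obtained, so the sketch assumes one common variant: the shape is estimated from the counts and lower bounds of the last two bins, and the top bin is scored with the implied Pareto mean. The function name, bin layout (upper bounds treated as exclusive), and counts are hypothetical.

```python
import math


def midpoint_pareto_scores(bins, counts):
    """Score income bins by midpoints, with a Pareto mean for the top bin.

    bins:   list of (lower, upper) bounds; the last bin has upper=None.
    counts: number of respondents in each bin (same order as bins).
    """
    # Closed bins: simple midpoints.
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]

    # Open-ended top bin: Pareto shape from the last two bins (assumed variant).
    top_lower = bins[-1][0]
    prev_lower = bins[-2][0]
    n_top, n_prev = counts[-1], counts[-2]
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(top_lower) - math.log(prev_lower)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    scores.append(top_lower * alpha / (alpha - 1))
    return scores


# Toy bins and counts (hypothetical, not the CCES categories).
bins = [
    (0, 10000),
    (10000, 20000),
    (20000, 30000),
    (30000, 100000),
    (100000, 150000),
    (150000, None),
]
counts = [50, 80, 120, 400, 300, 100]
scores = midpoint_pareto_scores(bins, counts)
```

With these toy inputs the third bin is scored at 25,000, matching the midpoint example in the text, and the top bin receives a value above its lower bound of 150,000.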

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$
• Missing values of $D_v$: $y^{(v)}_{mis}$
• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$
• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1. Start with initial guesses of missing values in $D$
 2. $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3. while not $c$ do
 4.     $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5.     for $v$ in $w$ do
 6.         Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7.         Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8.         $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9.     Update stopping criterion $c$
10. Return completed $D^{imp}$
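The control flow of Algorithm 1 can be sketched in plain Python as follows. To keep the example self-contained, the learner is pluggable and the default used here is a one-nearest-neighbor predictor, which is only a stand-in and not a random forest; in practice a random forest implementation would be passed in its place. The function names, the mean-based initial guess, and the toy matrix are assumptions made for illustration.

```python
def nearest_neighbor(x_obs, y_obs, x_mis):
    """Stand-in learner (NOT a random forest): copy y from the closest row."""
    predictions = []
    for x in x_mis:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xo)) for xo in x_obs]
        predictions.append(y_obs[dists.index(min(dists))])
    return predictions


def chained_impute(rows, fit_predict=nearest_neighbor, max_iter=10):
    """Chained imputation of a numeric matrix with None marking missing cells."""
    n_cols = len(rows[0])
    missing = [[row[v] is None for row in rows] for v in range(n_cols)]
    data = [list(row) for row in rows]

    # Step 1: initial guess = mean of the observed values in each column.
    for v in range(n_cols):
        observed = [row[v] for row in rows if row[v] is not None]
        guess = sum(observed) / len(observed)
        for i in range(len(data)):
            if missing[v][i]:
                data[i][v] = guess

    # Step 2: visit columns with missing cells by increasing share of NA.
    order = sorted(
        (v for v in range(n_cols) if any(missing[v])),
        key=lambda v: sum(missing[v]),
    )

    for _ in range(max_iter):
        previous = [list(row) for row in data]
        for v in order:
            def others(row, skip=v):
                return [row[k] for k in range(n_cols) if k != skip]

            obs_idx = [i for i in range(len(data)) if not missing[v][i]]
            mis_idx = [i for i in range(len(data)) if missing[v][i]]
            x_obs = [others(data[i]) for i in obs_idx]
            y_obs = [data[i][v] for i in obs_idx]
            x_mis = [others(data[i]) for i in mis_idx]
            for i, pred in zip(mis_idx, fit_predict(x_obs, y_obs, x_mis)):
                data[i][v] = pred
        # Stopping criterion: no filled-in value changed in this sweep.
        if data == previous:
            break
    return data


# Toy matrix (hypothetical): two columns, one missing cell in each.
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [3.0, None], [None, 10.0]]
completed = chained_impute(toy)
```

Observed cells are never overwritten; only positions that were missing in the input are updated on each sweep.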

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case this produces rather similar MRP estimates.


model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\text{race}}_{j[i]} + \alpha^{\text{gender}}_{k[i]} + \alpha^{\text{age}}_{l[i]} + \alpha^{\text{educ}}_{m[i]} + \alpha^{\text{income}}_{n[i]} + \alpha^{\text{district}}_{d[i]}\right) \tag{B1}$$

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{\text{race}}_{j} \sim N(0, \sigma^2_{\text{race}}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{\text{gender}}_{k} \sim N(0, \sigma^2_{\text{gender}}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{\text{age}}_{l} \sim N(0, \sigma^2_{\text{age}}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{\text{educ}}_{m} \sim N(0, \sigma^2_{\text{educ}}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{\text{income}}_{n} \sim N(0, \sigma^2_{\text{income}}), \quad n = 1, \dots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\text{state}}_{s[d]}$ and freely estimated variance:

$$\alpha^{\text{district}}_{d} \sim N(\alpha^{\text{state}}_{s[d]}, \sigma^2_{\text{state}}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
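The poststratification step can be sketched as a population-weighted average of cell predictions. The Python example below is a minimal illustration: the dictionary layout, the function name `poststratify`, the cell labels, and all numbers are hypothetical, and the predictions are assumed to come from an already fitted model.

```python
from collections import defaultdict


def poststratify(predictions, cell_counts):
    """Weight cell-level predictions by Census cell frequencies.

    predictions: {(district, cell): predicted probability of support}
    cell_counts: {(district, income_group, cell): population count}
    Returns {(district, income_group): weighted mean preference}.
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for (district, group, cell), count in cell_counts.items():
        weighted_sum[(district, group)] += count * predictions[(district, cell)]
        population[(district, group)] += count
    return {key: weighted_sum[key] / population[key] for key in weighted_sum}


# Toy inputs (hypothetical): two demographic cells in one district.
predictions = {
    ("district_1", "cell_a"): 0.8,
    ("district_1", "cell_b"): 0.4,
}
cell_counts = {
    ("district_1", "low", "cell_a"): 300,
    ("district_1", "low", "cell_b"): 100,
    ("district_1", "high", "cell_a"): 50,
    ("district_1", "high", "cell_b"): 150,
}
estimates = poststratify(predictions, cell_counts)
```

In this toy case the low-income estimate is (300 × 0.8 + 100 × 0.4) / 400 = 0.7, while the high-income estimate is (50 × 0.8 + 150 × 0.4) / 200 = 0.5.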

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted with MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1. Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A. Small Area Estimation via Chained Random Forests | | | | | | |
| Low income preferences | 0.106 (0.013) | 0.082 (0.015) | 0.098 (0.016) | 0.084 (0.013) | 0.068 (0.014) | 0.062 (0.014) |
| High income preferences | −0.063 (0.012) | −0.036 (0.013) | −0.053 (0.014) | −0.051 (0.013) | −0.050 (0.013) | −0.040 (0.014) |
| B. Multilevel Regression & Poststratification | | | | | | |
| Low income preferences | 0.182 (0.021) | 0.158 (0.024) | 0.181 (0.026) | 0.162 (0.020) | 0.115 (0.022) | 0.115 (0.022) |
| High income preferences | −0.136 (0.017) | −0.119 (0.019) | −0.139 (0.021) | −0.122 (0.017) | −0.091 (0.018) | −0.091 (0.018) |
| C. Raw CCES means | | | | | | |
| Low income preferences | 0.080 (0.010) | 0.061 (0.011) | 0.063 (0.012) | 0.072 (0.010) | 0.043 (0.011) | 0.045 (0.011) |
| High income preferences | −0.027 (0.008) | −0.013 (0.008) | −0.010 (0.008) | −0.027 (0.008) | −0.018 (0.008) | −0.024 (0.009) |
| District fixed effects | X | X | X | X | X | X |
| Group preferences × union policy | | X | | | | X |
| × state constants | | | X | | | |
| × organizational capacity | | | | X | | X |
| × district covariates | | | | | X | X |

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means, without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.


Table C1. Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A. District-specific income thresholds | | | | | | |
| Low income preferences | 0.106 (0.013) | 0.082 (0.015) | 0.098 (0.016) | 0.084 (0.013) | 0.068 (0.014) | 0.062 (0.014) |
| High income preferences | −0.063 (0.012) | −0.036 (0.013) | −0.053 (0.014) | −0.051 (0.013) | −0.050 (0.013) | −0.040 (0.014) |
| B. State-specific income thresholds | | | | | | |
| Low income preferences | 0.105 (0.013) | 0.082 (0.015) | 0.097 (0.016) | 0.083 (0.013) | 0.067 (0.014) | 0.062 (0.014) |
| High income preferences | −0.062 (0.012) | −0.036 (0.013) | −0.052 (0.014) | −0.050 (0.013) | −0.049 (0.013) | −0.039 (0.013) |
| C. Shifted income thresholds, p20 - p80 | | | | | | |
| Low income preferences | 0.098 (0.012) | 0.077 (0.013) | 0.09 (0.014) | 0.078 (0.012) | 0.063 (0.013) | 0.057 (0.013) |
| High income preferences | −0.054 (0.011) | −0.031 (0.012) | −0.046 (0.012) | −0.044 (0.011) | −0.044 (0.012) | −0.034 (0.012) |
| District fixed effects | X | X | X | X | X | X |
| Group preferences × union policy | | X | | | | X |
| × state constants | | | X | | | |
| × organizational capacity | | | | X | | X |
| × district covariates | | | | | X | X |

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text we use two proxies for the orga-nizational capacity of workers union certication elections and the number of religiouscongregations Here we provide some background and explain in more detail how wecalculate both variables

NLRB certication elections e formation of unions is regulated by the National LaborRelations Act (NLRB) enacted in 1935 (see Budd 2018 ch 6) A successful union organizationprocess usually requires an absolute majority of employees voting for the proposed union ina certication election held under the guidelines of the NLRB Geing the NLRB to conductan election requires that there is sucient interest among employees in an appropriatebargaining unit to be represented by a union For proof of sucient interest the NLRBrequires that at least 30 of employees sign an authorization card stating they authorize aparticular union to represent them for the purpose of collective bargaining Building supportand collecting the required signatures takes organizational eort For workers unionizationhas features of a public good Everybody may gain through beer conditions from collectivebargaining but contributing to the organizational drive is costly for each individual Beyondmere opportunity costs there also is a non-zero risk of being (illegally) red by the employerfor those especially active If more than 50 of employees sign authorization cards thenthe union can request voluntary recognition without a certication election However theemployer has the right to deny this in which case a certication election is held In hislabor relations textbook Budd (2018 199) notes that voluntary card check recognition isldquothe exception rather than the norm because employers typically refuse to recognize unionsvoluntarilyrdquo

We use the NLRBrsquos database on election reports to extract all aempts to certify (orde-certify) a local union ey are available from wwwnlrbgov Each database entry is avote concerning a bargaining unit the average unit size is 25 employees ere are about2200 elections each year Each individual case le usually provides address information onthe employer and the site where the election was held Using this information we geocodeeach individual case report and locate it in a congressional district Figure D1 shows theresulting variation in certication elections over districts

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency


Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Map legend bins: 0-13, 13-26, 26-39, 39-62, 62-164.

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
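The interpolation step can be illustrated with a minimal sketch. It assumes each county-district intersection carries a weight equal to the share of the county's population living in that district; the county names, district labels, and counts below are hypothetical and not taken from the study's data.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts.

    county_counts: {county: count}
    weights: {(county, district): share of the county's population
              that lives in the district}; shares per county sum to 1.
    """
    district_totals = defaultdict(float)
    for (county, district), share in weights.items():
        district_totals[district] += share * county_counts[county]
    return dict(district_totals)

# Hypothetical example: county A is split across two districts,
# county B is fully nested within district D2.
county_counts = {"A": 100, "B": 40}
weights = {("A", "D1"): 0.75, ("A", "D2"): 0.25, ("B", "D2"): 1.0}
district_counts = interpolate_to_districts(county_counts, weights)
```

When every county lies entirely within one district (all weights equal to 1), the function reduces to a plain sum over counties, which mirrors the at-large case described above.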


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate whether this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                          Low income       High income
                                          preferences      preferences       N
(1) Injective preference-roll call map    0.063 (0.013)    -0.041 (0.013)    11104
(2) Extreme preferences excl.             0.074 (0.016)    -0.048 (0.015)    13308
(3) New York excluded                     0.070 (0.015)    -0.048 (0.014)    14730
(4) Local Union Concentration             0.065 (0.014)    -0.047 (0.014)    15780
(5) Trimmed LPM estimator                 0.074 (0.015)    -0.055 (0.014)    15426
(6) Errors-in-variables                   0.062 (0.004)    -0.054 (0.004)    15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate whether extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check whether our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
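As a rough illustration of the trimming idea, the sketch below refits OLS after dropping observations whose fitted values fall outside the unit interval, repeating until none remain. This is a simplified reading of the sequential procedure under stated assumptions, not the paper's implementation: the simulated data, the single regressor, and the function name `trimmed_lpm` are all hypothetical.

```python
import numpy as np

def trimmed_lpm(X, y):
    """Refit OLS, dropping observations with fitted values outside [0, 1],
    until every retained observation has a fitted value inside [0, 1]."""
    keep = np.ones(len(y), dtype=bool)
    beta = np.zeros(X.shape[1])
    for _ in range(len(y)):  # the kept set only shrinks, so this terminates
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        new_keep = keep & (fitted >= 0.0) & (fitted <= 1.0)
        if new_keep.sum() == keep.sum():
            break
        keep = new_keep
    return beta, keep

# Simulated binary outcome whose true probability is linear only
# inside the unit interval (values here are illustrative assumptions).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(0.0, 1.5, n)
prob = np.clip(0.5 + 0.3 * x, 0.0, 1.0)
y = (rng.random(n) < prob).astype(float)
X = np.column_stack([np.ones(n), x])

beta_trim, kept = trimmed_lpm(X, y)
```

Comparing `beta_trim` with the untrimmed OLS coefficients gives a sense of how much the out-of-range predictions matter for the slope.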

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text relax the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore i subscripts):

$$ y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1} $$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$ U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2} $$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher-order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$ y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3} $$

$$ U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4} $$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques such as the LASSO are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.
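The two selection steps and the final OLS fit can be sketched as follows. This is a deliberately simplified illustration, not the implementation used in the paper: it uses a plain coordinate-descent LASSO with a fixed, hand-picked penalty rather than the root-LASSO with a data-driven penalty, a single treatment without preference interactions, and simulated data. All function names and parameter values are assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO minimizing (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for k in range(p):
            rho = X[:, k] @ resid / n + col_sq[k] * b[k]
            new_bk = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            resid += X[:, k] * (b[k] - new_bk)  # keep resid = y - X b
            b[k] = new_bk
    return b

def post_double_selection(y, u, W, lam):
    """Select controls for the outcome and for the treatment, then run OLS
    of y on the treatment and the union of both selected control sets."""
    Ws = (W - W.mean(axis=0)) / W.std(axis=0)
    I1 = np.flatnonzero(lasso_cd(Ws, y - y.mean(), lam))  # outcome equation
    I2 = np.flatnonzero(lasso_cd(Ws, u - u.mean(), lam))  # treatment equation
    I = np.union1d(I1, I2)
    X = np.column_stack([np.ones(len(y)), u, Ws[:, I]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], I

# Simulated example: control 0 drives the treatment strongly but the
# outcome only moderately; the true treatment effect is 0.5.
rng = np.random.default_rng(1)
n, p = 500, 50
W = rng.standard_normal((n, p))
u = W[:, 0] + rng.standard_normal(n)
y = 0.5 * u + 0.2 * W[:, 0] + W[:, 1] + rng.standard_normal(n)

eta_hat, selected = post_double_selection(y, u, W, lam=0.2)
```

The treatment-equation step is what guarantees that control 0 enters the final regression even if its coefficient in the outcome equation is too small to be picked up on its own.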

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}, \theta^h_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$ y = Kc \tag{G1} $$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$ K = \exp\left( -\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2} \right) \tag{G2} $$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

32 For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

$$ c^* = \underset{c \in \mathbb{R}^D}{\operatorname{argmin}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3} $$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
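A minimal numerical sketch of the closed-form solution may help. It assumes standardized inputs, sets the kernel bandwidth to the number of columns, and picks the penalty from a small hypothetical grid by minimizing the sum of squared leave-one-out errors, using the standard kernel ridge shortcut in which the leave-one-out residual for observation i equals c_i divided by the i-th diagonal element of (K + lambda I)^{-1}. The simulated data and the grid values are illustrative assumptions only.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """K[i, j] = exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def krls_fit(K, y, lam):
    """Return c* = (K + lam I)^{-1} y and the sum of squared
    leave-one-out errors for this value of lam."""
    G_inv = np.linalg.inv(K + lam * np.eye(len(y)))
    c = G_inv @ y
    loo_resid = c / np.diag(G_inv)  # leave-one-out residual shortcut
    return c, float(np.sum(loo_resid ** 2))

# Simulated data with a non-additive relationship (illustrative only).
rng = np.random.default_rng(0)
N = 60
Z = rng.standard_normal((N, 3))
y = np.sin(Z[:, 0]) * Z[:, 1] + 0.1 * rng.standard_normal(N)

K = gaussian_kernel(Z, sigma2=Z.shape[1])  # sigma^2 = number of columns
grid = [0.01, 0.1, 1.0, 10.0]              # hypothetical penalty grid
losses = {lam: krls_fit(K, y, lam)[1] for lam in grid}
best_lam = min(losses, key=losses.get)
c_star, _ = krls_fit(K, y, best_lam)
```

In practice a continuous search over the penalty would replace the coarse grid; the grid is used here only to keep the example short.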

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check whether the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

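$$ \frac{\partial y}{\partial z_j^{U_d}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left( -\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2} \right) \left( Z_d^{U_d} - z_j^{U_d} \right) \tag{G4} $$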

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
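The pointwise derivatives can be sketched numerically as follows. The example differentiates the fitted function with respect to one covariate at each observation and checks the analytic expression against central finite differences. Note that the sign of the difference term depends on which kernel argument is treated as the evaluation point; the sketch differentiates with respect to the evaluation point. The data, the fixed penalty, and the choice of column are hypothetical.

```python
import numpy as np

def kernel(A, B, sigma2):
    """k(a, b) = exp(-||a - b||^2 / sigma2) for all rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def predict(Z_new, Z, c, sigma2):
    """Fitted values f(z) = sum_i c_i k(z, z_i)."""
    return kernel(Z_new, Z, sigma2) @ c

def pointwise_derivative(Z, c, sigma2, col):
    """d f / d z[col] evaluated at every observation z_j."""
    K = kernel(Z, Z, sigma2)                   # K[j, i] = k(z_j, z_i)
    diff = Z[:, None, col] - Z[None, :, col]   # diff[j, i] = z_j - z_i
    return (-2.0 / sigma2) * (K * diff) @ c

rng = np.random.default_rng(2)
N, D = 50, 3
Z = rng.standard_normal((N, D))
y = Z[:, 0] * Z[:, 2] + 0.1 * rng.standard_normal(N)
sigma2, lam = float(D), 0.5                    # lam fixed for illustration
c = np.linalg.solve(kernel(Z, Z, sigma2) + lam * np.eye(N), y)

col = 2  # hypothetical position of the union-density column
grad = pointwise_derivative(Z, c, sigma2, col)
```

The vector `grad` has one entry per observation; smoothing it against the chosen covariate is what produces a plot of how the marginal effect varies.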

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working Paper 5 2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

responsiveness to changes in policy preferences of different income groups and how it might be moderated by union strength.

III.C District-level union membership

To measure district-level union membership, we draw on fine-grained administrative data. Based on the Labor-Management Reporting and Disclosure Act (LMRDA) of 1959, unions have to file mandatory yearly reports (called LM forms) with the Office of Labor-Management Standards (OLMS). The Civil Service Reform Act of 1978 introduced a similarly comprehensive system of reporting for federal employees (see Budd 2018). A mandatory part of each report is the number of members a union has. Failure to report, or reporting falsified information, is a criminal offense under the LMRDA, and reports filed by unions are audited by the OLMS. This makes LM forms a reliable source of information on unions and their members.

Using LM forms provides important advantages over using measures derived from surveys. First, mandatory administrative filings are likely more reliable than population surveys, which often suffer from over-reporting and unit-nonresponse (Southworth and Stepan-Norris 2009, 311; Card 1996).13 Second, they allow us to estimate union membership numbers for smaller geographical units, which are usually unavailable in population surveys (to protect respondents' confidentiality) or only covered with insufficient sample sizes.14 Another advantage for the study of politics is that the presence of union locals is observable to politicians on the ground, even in the absence of survey data.

The resulting database contains almost 30,000 local unions. It is based on 358,051 digitized individual reports that were cleaned, validated, geocoded, and matched to congressional districts. The number of union members in each congressional district can then be readily obtained as the sum of all reported union members. Figure II shows the distribution of union membership in House districts, averaged for the 109th to 112th Congress. It demonstrates that there is substantial variation in unionization between electoral districts, even within states, which would be ignored by a state-level analysis.
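The aggregation step amounts to a grouped sum over geocoded reports. The sketch below uses made-up records; the union identifiers, district labels, and member counts are hypothetical placeholders, not values from the LM-form database. It also takes the logarithm of each district total, since the statistical specification works with logged union membership.

```python
import math
from collections import defaultdict

# Hypothetical geocoded reports: (union id, congressional district, members).
reports = [
    ("local-1", "OH-09", 1200),
    ("local-2", "OH-09", 300),
    ("local-3", "OH-11", 4500),
]

def members_by_district(records):
    """Sum reported union members over all locals in each district."""
    totals = defaultdict(int)
    for _union_id, district, members in records:
        totals[district] += members
    return dict(totals)

totals = members_by_district(reports)
log_membership = {d: math.log(m) for d, m in totals.items()}
```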

A potential drawback of using LM forms is that some unions are exempt from filing requirements. Each and every private sector union is required to submit a report, but under some specific conditions public sector unions are exempt. Thus, while unions representing postal or federal employees are covered, unions that exclusively represent state, county, or municipal government employees are exempt. However, even these have to file if at least one of their members is a private sector employee. In practice, this leads to almost

13 Even the primary source for union data, the Current Population Survey (CPS), suffers from these issues, partly as a result of its rather broad question wording.

14 The most prominent data set on union membership, compiled by Hirsch et al. (2001), provides CPS-based estimates for states and metropolitan statistical areas; district identifiers are not available.


Figure II: Union membership in House districts, 109th-112th Congress. Map legend: 1st to 4th quartile of union membership.

complete coverage, as during the latter part of the twentieth century unions are increasingly organizing workers across different sectors and occupations (Lichtenstein 2013, 249).15

III.D Statistical specifications

For each roll call vote $j$ ($j = 1, \ldots, J$) we have measured preferences of low and high income citizens in a given congressional district $d$ ($d = 1, \ldots, D$), denoted by $(\theta^l_{jd}, \theta^h_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures of the capability of districts' workers to organize collective action, as well as (iv) non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House

15While there is no ldquogold standardrdquo of accurate union membership numbers we can compare aggregatemembership based on our LM form data with widely used survey-based measure from the CPS (Hirschet al 2001) is conrms that LM forms provide a rather comprehensive accounting of unions At thenational level the average number of union members in our dataset is 1321 million (excluding WashingtonDC which is not represented in Congress) e CPS gure for the same period is 1522 million ismodest dierence is consistent with some degree of over-reporting in the CPS given its broad questionwording (Southworth and Stepan-Norris 2009 311) It can also be interpreted as an upper bound for thenon-coverage of some public sector unions in our data A more detailed analysis by Becher et al (2018)shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 086)

12

members is the following linear probability specication

yijd =microlθ ljd + micro

hθhjd + ηl (Ud times θ

ljd) + η

h(Ud times θhjd)+

βl (Xd times θljd) + β

h(Xd times θhjd) + αd + ϵijd

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d \theta^{h}_{jd}$ and $U_d \theta^{l}_{jd}$. Thus, when $\eta^{l}$ and $\eta^{h}$ are zero, the group-specific preference coefficients $\mu^{l}$ and $\mu^{h}$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^{l}$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^{h}$. Our theoretical expectation is that $\eta^{l} > 0$ and $\eta^{h} \le 0$.
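To make the interpretation of the coefficients explicit, the responsiveness to each group can be written as the derivative of the expected vote with respect to that group's preferences. This is a short derivation from the specification as stated, not an additional result; it treats $\beta^{l} X_d$ and $\beta^{h} X_d$ as inner products when $X_d$ is a vector.

```latex
\begin{align*}
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^{l}_{jd}}
  &= \mu^{l} + \eta^{l} U_d + \beta^{l} X_d, &
\frac{\partial\, \mathrm{E}[y_{ijd}]}{\partial \theta^{h}_{jd}}
  &= \mu^{h} + \eta^{h} U_d + \beta^{h} X_d, \\[4pt]
\frac{\partial^{2}\, \mathrm{E}[y_{ijd}]}{\partial \theta^{l}_{jd}\, \partial U_d}
  &= \eta^{l}, &
\frac{\partial^{2}\, \mathrm{E}[y_{ijd}]}{\partial \theta^{h}_{jd}\, \partial U_d}
  &= \eta^{h}.
\end{align*}
```

Because all inputs are standardized, $\mu^{l}$ and $\mu^{h}$ are the responsiveness levels in a district with average union membership and average covariates ($U_d = 0$, $X_d = 0$), and $\eta^{l}$ and $\eta^{h}$ are the shifts in responsiveness per standard deviation of logged union membership.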

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables on the district level by including district fixed effects $\alpha_d$.16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both on the district and state level) and group preferences, $X_d \theta^{l}_{jd}$ and $X_d \theta^{h}_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
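The estimation strategy described in this subsection can be sketched on synthetic data. The following is a minimal illustration under stated assumptions, not the authors' code: all data and coefficient values are made up, the controls $X_d$ are omitted, and the small-sample correction applied to the cluster-robust covariance is one common choice among several. It shows the within transformation that absorbs $\alpha_d$ and a sandwich variance estimator clustered on districts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (all numbers are made up): D districts, J roll calls each.
D, J = 200, 27
district = np.repeat(np.arange(D), J)
U = np.repeat(rng.standard_normal(D), J)               # standardized log union membership
theta_l = rng.standard_normal(D * J)                   # low-income preferences
theta_h = 0.5 * theta_l + rng.standard_normal(D * J)   # high-income preferences (not re-standardized)
alpha = np.repeat(rng.normal(0.5, 0.05, D), J)         # district fixed effects

# Assumed coefficients for the simulation, ordered (mu_l, mu_h, eta_l, eta_h).
true_coef = np.array([0.02, 0.12, 0.08, -0.05])
X = np.column_stack([theta_l, theta_h, U * theta_l, U * theta_h])
y = rng.binomial(1, np.clip(alpha + X @ true_coef, 0.01, 0.99)).astype(float)


def demean_by_group(a, groups, n_groups):
    """Within transformation: subtract group means, which absorbs alpha_d."""
    a = a.reshape(len(groups), -1)
    sums = np.zeros((n_groups, a.shape[1]))
    np.add.at(sums, groups, a)
    counts = np.bincount(groups, minlength=n_groups)[:, None]
    return a - (sums / counts)[groups]


Xw = demean_by_group(X, district, D)
yw = demean_by_group(y, district, D).ravel()

# OLS on the demeaned data gives the fixed-effects estimates.
XtX_inv = np.linalg.inv(Xw.T @ Xw)
coef = XtX_inv @ (Xw.T @ yw)
resid = yw - Xw @ coef

# Cluster-robust sandwich: sum the scores x_i * e_i within each district.
scores = np.zeros((D, X.shape[1]))
np.add.at(scores, district, Xw * resid[:, None])
vcov = XtX_inv @ (scores.T @ scores) @ XtX_inv * D / (D - 1)
se = np.sqrt(np.diag(vcov))

for name, b, s in zip(["mu_l", "mu_h", "eta_l", "eta_h"], coef, se):
    print(f"{name}: {b: .3f} ({s:.3f})")
```

Note that $U_d$ does not enter as a regressor on its own: it is constant within a district, so the within transformation removes it, and only its interactions with preferences are identified.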

IV. Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above with district fixed effects but without accounting for local union organization (setting $\beta^{l}$, $\beta^{h}$ and $\eta^{l}$, $\eta^{h}$ to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase of 13.6 (±1.2) percentage points in the probability that legislators cast a corresponding vote. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.
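As a rough check on the baseline responsiveness figures, the intervals can be reproduced by hand. The arithmetic below assumes that the ± values are standard errors and that a normal critical value of 1.96 is used; neither is stated explicitly in the text.

```latex
\begin{align*}
\text{low income:}\quad  & 1.6 \pm 1.96 \times 1.4 = 1.6 \pm 2.744 \;\Rightarrow\; [-1.14,\; 4.34] \\
\text{high income:}\quad & 13.6 \pm 1.96 \times 1.2 = 13.6 \pm 2.352 \;\Rightarrow\; [11.25,\; 15.95] \\
\text{gap:}\quad         & 11.9 \pm 1.96 \times 2.5 = 11.9 \pm 4.9 \;\Rightarrow\; [7.0,\; 16.8]
\end{align*}
```

The first line agrees with the reported interval of −1.1 to 4.4 up to rounding of the inputs. Under the same assumptions, the intervals for the affluent and for the gap lie well above zero, consistent with the statement that the gap is significantly different from zero.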

IV.A Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preferences-confounder interactions (setting $\beta^{l}$ and $\beta^{h}$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018; Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^{l}$ and $\theta^{h}$. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

17 Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors on the district level. See also specification (1) in Table I below.

Figure III: District-level union membership as moderator of unequal representation. [Two panels: "Low income constituents" and "High income constituents". x-axis: Union membership [std], from −1.6 to 1.6, with p10, p25, p50, p75, and p90 marked; y-axis: Marginal effect, from −0.4 to 0.4.]

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

Table I: Union density and representation. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                   (1)       (2)       (3)       (4)       (5)       (6)
Low income preferences            0.106     0.082     0.098     0.084     0.068     0.062
                                 (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences          −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                                 (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)
District fixed effects              X         X         X         X         X         X
Group preferences
  × union policy                              X                                       X
  × state constants                                     X
  × organizational capacity                                       X                   X
  × district covariates                                                     X         X

Note: N = 15,780, N_d = 534. 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^{l}$ and $\eta^{h}$. Specifications (2) to (5) include coefficients for interactions ($\beta^{l}$, $\beta^{h}$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements; (3) interacts preferences with state fixed effects; (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections; (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

18 Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.

IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.

Table II: Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

                                         Low income          High income
(1a) Social capital: bowling alleys      0.067 (0.014)      −0.051 (0.013)
(1b) Social capital index                0.065 (0.014)      −0.048 (0.013)
(2)  Redistricting                       0.067 (0.014)      −0.051 (0.013)
(3)  MRP estimated preferences           0.115 (0.022)      −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for $\eta^{l}$ and $\eta^{h}$. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts, N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred, N = 14,150. Specification (3) uses preferences estimated using MRP. See Appendix B for more details. N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table, we also show that our core results are obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
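The logic of double selection can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it uses a single scalar "treatment" rather than the union-by-preference interactions, a plain LASSO solved by coordinate descent with a hand-picked penalty rather than the root-LASSO, no sample splitting, and entirely synthetic data. The helper `lasso_cd` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


def lasso_cd(W, y, lam, n_iter=100):
    """Minimize (1/2n)||y - W b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = W.shape
    b = np.zeros(p)
    col_sq = (W ** 2).sum(axis=0) / n
    r = y.copy()                        # current residual y - W b
    for _ in range(n_iter):
        for k in range(p):
            r += W[:, k] * b[k]         # take coordinate k out of the fit
            rho = W[:, k] @ r / n
            b[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            r -= W[:, k] * b[k]         # put the updated coordinate back
    return b


# Synthetic data (made up): 50 candidate controls, of which W[:, 0] confounds.
n, p, tau = 1000, 50, 0.5
W = rng.standard_normal((n, p))
d = 0.8 * W[:, 0] + 0.5 * W[:, 1] + rng.standard_normal(n)            # "treatment"
y = tau * d + 1.0 * W[:, 0] + 0.7 * W[:, 2] + rng.standard_normal(n)  # outcome
d, y = d - d.mean(), y - y.mean()

lam = 0.1  # fixed penalty level, an assumption made for this sketch
sel_outcome = np.flatnonzero(lasso_cd(W, y, lam))     # controls predicting y
sel_treatment = np.flatnonzero(lasso_cd(W, d, lam))   # controls predicting d
selected = np.union1d(sel_outcome, sel_treatment)

# Post-selection OLS of y on d, the union of selected controls, and a constant.
Z = np.column_stack([d, W[:, selected], np.ones(n)])
tau_hat = np.linalg.lstsq(Z, y, rcond=None)[0][0]

# For comparison: regressing y on d alone ignores the confounder.
tau_naive = (d @ y) / (d @ d)
print(f"naive: {tau_naive:.3f}, post-double-selection: {tau_hat:.3f}, true: {tau}")
```

Taking the union of the two selected sets is the key design choice: a control that is missed in the outcome equation can still be picked up in the treatment equation, which is what limits omitted variable bias under moderate selection mistakes.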

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III: Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups.

                                   (1)       (2)       (3)
Low income preferences            0.063     0.066     0.062
                                 (0.014)   (0.017)   (0.016)
High income preferences          −0.054    −0.036    −0.040
                                 (0.013)   (0.015)   (0.016)
Semi-parametric terms              144       312       624
Post-LASSO terms                    18        45       112

Note: The double selection estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on a 50% sample, parameter estimates performed on the remaining 50% (N = 7,884). Table entries are estimates for $\eta^{l}$ and $\eta^{h}$ with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our $\eta$ terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
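The mechanics of this approach can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the authors' procedure: the data are synthetic, the outcome is continuous rather than a binary vote, the kernel bandwidth is set to the number of covariates, and the ridge penalty is fixed by hand instead of being tuned. It fits a Gaussian-kernel regularized least squares model with no interaction term and then computes pointwise derivatives of the fitted surface with respect to preferences.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (made up): responsiveness to theta rises with union strength U.
n = 400
theta = rng.standard_normal(n)
U = rng.standard_normal(n)
y = (0.1 + 0.2 * U) * theta + rng.normal(0.0, 0.1, n)

X = np.column_stack([theta, U])
sigma2 = X.shape[1]   # kernel bandwidth set to the number of covariates (assumed)
lam = 0.1             # ridge penalty fixed by hand (assumed; normally tuned)

# Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / sigma2).
diff = X[:, None, :] - X[None, :, :]
K = np.exp(-(diff ** 2).sum(axis=2) / sigma2)

# Regularized least squares for the kernel weights c; fitted values are K @ c.
c = np.linalg.solve(K + lam * np.eye(n), y)
fitted = K @ c

# Pointwise derivative of the fitted surface with respect to theta:
# d f(x_i) / d theta = sum_j c_j * K_ij * (-2 / sigma2) * (theta_i - theta_j).
dtheta = (-2.0 / sigma2) * (K * diff[:, :, 0]) @ c

r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
corr = np.corrcoef(dtheta, U)[0, 1]
print(f"in-sample R^2: {r2:.2f}, corr(d f / d theta, U): {corr:.2f}")
```

If the pointwise derivatives with respect to preferences vary systematically with `U`, the data support an interaction even though none was specified; plotting `dtheta` against `U` with a smoother gives a picture analogous in spirit to the one described for Figure IV.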

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

Figure IV: Nonparametric estimate of interaction between union membership and preferences. [Two panels: "Low income constituents" and "High income constituents". x-axis: Union membership [std], from −1.6 to 1.6, with p10, p25, p50, p75, and p90 marked; y-axis: Partial effect, from −0.4 to 0.4.]

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.

However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V. Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV: Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups.

                                   Low income          High income
(A) Private vs. public unions
    Public unions                  0.074 (0.016)      −0.058 (0.015)
    Non-public unions              0.054 (0.016)      −0.027 (0.016)
(B) Bill ideology
    Conservative bill              0.086 (0.017)      −0.086 (0.018)
    Liberal bill                   0.052 (0.014)      −0.028 (0.013)
(C) AFL-CIO endorsement
    No position                    0.054 (0.014)      −0.054 (0.013)
    Endorsement                    0.077 (0.015)      −0.040 (0.014)

Note: Estimates for $\eta^{l}$ and $\eta^{h}$ with cluster-robust standard errors in parentheses. N = 15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

23 Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for $\eta^{h}$ is smaller for endorsed bills but still significantly different from zero.

VI. Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
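The conversion from a log-scale regression to dollar amounts can be sketched as follows. The text cites Duan (1983) without spelling out the steps, so this is a generic illustration of the smearing retransformation on made-up data, not a reconstruction of the reported $81,000 figure; the variable names and all numeric values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data (made up): log contributions are linear in union membership.
n = 5000
union = rng.standard_normal(n)                      # standardized union membership
log_contrib = 10.0 + 0.05 * union + rng.normal(0.0, 0.5, n)
contrib = np.exp(log_contrib)                       # dollars

# OLS of log contributions on a constant and union membership.
Z = np.column_stack([np.ones(n), union])
beta = np.linalg.lstsq(Z, log_contrib, rcond=None)[0]
fit = Z @ beta
resid = log_contrib - fit

# Duan's smearing factor: the average of exponentiated residuals.
smear = np.mean(np.exp(resid))

# Predicted dollars at observed membership and after a one-SD increase.
pred_base = smear * np.exp(fit)
pred_plus = smear * np.exp(fit + beta[1])
effect_dollars = np.mean(pred_plus - pred_base)

print(f"smearing factor: {smear:.3f}")
print(f"average dollar effect of a one-SD increase: {effect_dollars:,.0f}")
```

Simply exponentiating the fitted log values would understate expected contributions, because the mean of an exponentiated error term exceeds one; the smearing factor corrects for this without assuming normally distributed errors.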

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low-income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V
Labor contributions and selection of Democratic legislators

                                               (1)       (2)       (3)       (4)

A: Contributions channel
                                            DV: Contrib         DV: roll call
Union membership                             0.056     0.046
                                            (0.012)   (0.014)
Contributions × low income prefs                                 0.946     0.865
                                                                (0.036)   (0.034)
Contributions × high income prefs                               −0.735    −0.714
                                                                (0.029)   (0.031)

B: Selection channel
                                            DV: Democrat        DV: roll call
Union membership                             0.161     0.106
                                            (0.024)   (0.023)
Democrat × low income prefs                                      0.576     0.542
                                                                (0.012)   (0.015)
Democrat × high income prefs                                    −0.411    −0.423
                                                                (0.013)   (0.015)

District controls                                        X                   X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low-income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an


important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are on average more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
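As an illustration of district-specific thresholds, the following minimal sketch computes the two tercile cut points for each district and classifies a household relative to them. The district labels and incomes are hypothetical, and the quantile interpolation rule is the default of Python's `statistics.quantiles`, which is an assumption rather than the rule used for Table A2.

```python
from statistics import quantiles

def tercile_thresholds(incomes):
    """Return the two cut points that split incomes into three equal-sized groups."""
    low_cut, high_cut = quantiles(incomes, n=3)
    return low_cut, high_cut

def income_group(income, low_cut, high_cut):
    """Classify an income relative to the thresholds of its own district."""
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical household incomes for two districts (illustrative values only).
districts = {
    "district_a": [30000, 38000, 45000, 52000, 60000, 75000, 90000, 120000, 150000],
    "district_b": [12000, 15000, 21000, 27000, 33000, 40000, 52000, 61000, 83000],
}
thresholds = {name: tercile_thresholds(vals) for name, vals in districts.items()}

# The same $40,000 household falls into different groups in different districts.
groups = {name: income_group(40000, *thresholds[name]) for name in districts}
```

With these illustrative numbers, the same household is classified as low income in the richer district and as middle income in the poorer one, which is the sense in which thresholds are relative to the district.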

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.
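The standardization step can be sketched as follows. The input values are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption made for the illustration.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale values to mean zero and unit (population) standard deviation."""
    mean = fmean(values)
    sd = pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical log union membership values for five districts.
log_membership = [8.1, 9.3, 9.7, 10.4, 11.0]
z_scores = standardize(log_membership)
```

After this transformation, a one-unit change in an input corresponds to a one standard deviation change in the original variable.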

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees, National Education Association, American Federation of Teachers, American Federation of Government Employees, National Association of Government Employees, United Public Service Employees Union, National Treasury Employees Union, American Postal Workers Union, National Association of Letter Carriers, Rural Letter Carriers Association, National Postal Mail Handlers Union, National Alliance of Postal and Federal Employees, Patent Office Professional Association, National Labor Relations Board Union, International Association of Fire Fighters, Fraternal Order of Police, National Association of Police Organizations, various local police associations, and various local public school unions.


Table A1
Matched CCES-House roll calls included in our analysis

Match  Bill      Date        House Vote (Yea-Nay)  Bill Ideology†  Name

(1)    HR 810    07/19/2006  235-193               L               Stem Cell Research Enhancement Act (Presidential Veto Override)
(1)    HR 3      01/11/2007  253-174               L               Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176               L               Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201               L               Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116               L               Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197               C               Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185               C               Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215               C               Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183               C               Protect America Act
(6)    HR 6304   06/20/2008  293-129               C               FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204               L               Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156               L               Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152               L               Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135               L               Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152               L               Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132               C               United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171               L               Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151               C               To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167               C               To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202               L               Supplemental Appropriations, Fiscal Year 2009 (Agreeing to Conference Report)
(14)   HR 2831   07/31/2007  225-199               L               Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171               L               Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177               L               Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175               L               Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183               L               American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212               L               American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212               L               Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215               L               Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192               L               Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175               L               Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161               C               Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193               C               House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382                C               Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257               L               American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189               C               Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185               C               Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147               C               North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative, based on predominant support of the bill by Democratic or Republican representatives, respectively.


Table A2
Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

               33rd percentile                67th percentile
Congress     Mean      Min      Max        Mean      Min       Max

109         38123    16800    73675       77964    39612    146870
110         40127    18000    77000       83047    43600    155113
111         39021    17500    78262       82440    46000    160050
112         37381    16500    81000       79868    38500    158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3
Descriptive statistics of analysis sample

                                        Mean      SD     Min      Max       N

Roll-call vote: yea                    0.568   0.495   0.000    1.000   15780
Constituent preferences
  Low income                           0.593   0.220   0.047    0.979   15934
  High income                          0.555   0.198   0.037    0.967   15934
  Low-High Gap                         0.172   0.121   0.000    0.588   15934
Union membership [log]                 9.705   1.046   6.094   13.619   15934
Population                             7.022   0.723   4.697    9.980   15934
Share African American                 0.124   0.146   0.004    0.680   15934
Share Hispanic                         0.156   0.174   0.005    0.812   15934
Share BA or higher                     0.275   0.097   0.073    0.645   15934
Median income [$10000]                 5.177   1.356   2.282   10.439   15934
Share female                           0.508   0.010   0.462    0.543   15934
Manufacturing share                    0.110   0.047   0.025    0.281   15934
Urbanization                           0.790   0.199   0.213    1.000   15934
Certification elections [log]          3.347   0.861   0.000    5.100   15934
Congregations [per 1000 persons]       0.765   1.147   0.062    6.453   15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low- and high-income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
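The final aggregation step described above, averaging filled-in preferences by district and income group, can be sketched as follows. The sketch assumes the Census rows have already been completed with imputed preferences; the district labels, group labels, and values are hypothetical.

```python
from collections import defaultdict

def district_group_means(records):
    """Average imputed preferences by (district, income group)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for district, group, pref in records:
        sums[(district, group)] += pref
        counts[(district, group)] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical completed Census rows:
# (district, income group, imputed support for a bill coded 0/1).
completed = [
    ("district_a", "low", 1), ("district_a", "low", 0),
    ("district_a", "high", 0), ("district_a", "high", 0),
    ("district_b", "low", 1), ("district_b", "high", 1),
]
prefs = district_group_means(completed)
```

Each entry of `prefs` is the estimated share of an income group in a district that supports the bill, which is the quantity later related to the legislator's vote.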

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1: schematic of a stacked data matrix. Rows 1, ..., m are CCES units with observed covariates Z_{i1}, ..., Z_{iK} and observed preferences Y_{i1}, ..., Y_{iP}. Rows m+1, ..., m+n are Census units with observed covariates and missing (NA) preferences. A random forest is trained on the CCES block, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census block.]

Figure B1
Illustration of Small Area Estimation of District Preferences

We use a sample of m individuals from the CCES that is not necessarily representative at the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by reconstructing one from the other based on Census tract-level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
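The construction of a synthetic district sample can be illustrated with a minimal sketch. It assumes a crosswalk that gives, for one district, the share of its population living in each PUMA, and it draws individuals with replacement; the function name, identifiers, and shares are hypothetical, and the sampling scheme is a simplification of the procedure described above.

```python
import random

def synthetic_district_sample(puma_people, district_shares, size, seed=0):
    """Draw a synthetic district sample from PUMA-level microdata.

    puma_people: dict mapping PUMA id -> list of individual records
    district_shares: dict mapping PUMA id -> share of the district's
        population living in that PUMA (taken from a crosswalk)
    """
    rng = random.Random(seed)
    pumas = list(district_shares)
    weights = [district_shares[p] for p in pumas]
    # Pick a PUMA for every draw in proportion to its population share,
    # then pick one individual from that PUMA (with replacement).
    drawn = rng.choices(pumas, weights=weights, k=size)
    return [rng.choice(puma_people[p]) for p in drawn]

# Hypothetical microdata: three PUMAs, of which two overlap the district.
puma_people = {"puma_1": ["a1", "a2", "a3"], "puma_2": ["b1", "b2"], "puma_3": ["c1"]}
district_shares = {"puma_1": 0.75, "puma_2": 0.25}
sample = synthetic_district_sample(puma_people, district_shares, size=200)
```

Individuals from PUMAs outside the district never enter the sample, and PUMAs that contribute more of the district's population are drawn more often.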

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
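A minimal sketch of a midpoint Pareto scoring rule follows. It uses one common variant in which the Pareto shape parameter is estimated from the counts in the two highest bins and the open-ended bin is scored at the implied Pareto mean; this variant, the bin bounds, and the counts are assumptions for illustration and need not match our exact implementation.

```python
import math

def midpoint_pareto_scores(lower_bounds, counts):
    """Assign one income score per bin.

    Closed bins get their midpoint; the open-ended top bin gets the mean of
    a Pareto distribution whose shape is estimated from the top two bins.
    """
    scores = []
    for k in range(len(lower_bounds) - 1):
        scores.append((lower_bounds[k] + lower_bounds[k + 1]) / 2)
    top, below = counts[-1], counts[-2]
    alpha = (math.log(top + below) - math.log(top)) / (
        math.log(lower_bounds[-1]) - math.log(lower_bounds[-2])
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    scores.append(lower_bounds[-1] * alpha / (alpha - 1))
    return scores

# Hypothetical bins: [0, 10k), [10k, 20k), [20k, 30k), and 30k or more.
lower_bounds = [0, 10000, 20000, 30000]
counts = [10, 30, 40, 20]
scores = midpoint_pareto_scores(lower_bounds, counts)
```

The third bin in this toy example is scored at $25,000, matching the midpoint rule described above, while the score of the top bin lies above its lower bound by an amount that depends on how quickly the counts thin out.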

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

 1  Start with initial guesses of missing values in $D$
 2  $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3  while not $c$ do
 4      $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5      for $v$ in $w$ do
 6          Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7          Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8          $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9      Update stopping criterion $c$
10  Return completed $D^{imp}$
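The control flow of Algorithm 1 can be sketched in a few lines. To keep the example self-contained, a one-nearest-neighbor rule stands in for the random forest; this stand-in, the mean-based initial guesses, the exact-equality stopping rule, and the toy data are all assumptions made for illustration and are not the implementation used for our estimates.

```python
def nearest_neighbor(x_obs, y_obs, x_mis):
    """Stand-in learner (not a random forest): copy y from the closest observed row."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [
        y_obs[min(range(len(x_obs)), key=lambda i: sq_dist(x_obs[i], x))]
        for x in x_mis
    ]


def chained_impute(rows, fit_predict, max_iter=10):
    """Chained imputation for numeric rows in which None marks a missing entry."""
    n_cols = len(rows[0])
    missing = [[row[v] is None for row in rows] for v in range(n_cols)]
    data = [list(row) for row in rows]

    # Step 1: initial guesses (here: column means of the observed values).
    for v in range(n_cols):
        observed = [row[v] for row in rows if row[v] is not None]
        guess = sum(observed) / len(observed)
        for i in range(len(rows)):
            if missing[v][i]:
                data[i][v] = guess

    # Step 2: visit columns in order of increasing share of missing values.
    order = sorted(range(n_cols), key=lambda v: sum(missing[v]))

    for _ in range(max_iter):                     # Step 3
        previous = [list(row) for row in data]    # Step 4
        for v in order:                           # Step 5
            if not any(missing[v]):
                continue

            def others(row):
                return [x for j, x in enumerate(row) if j != v]

            x_obs = [others(r) for r, m in zip(data, missing[v]) if not m]
            y_obs = [r[v] for r, m in zip(data, missing[v]) if not m]
            x_mis = [others(r) for r, m in zip(data, missing[v]) if m]
            predicted = iter(fit_predict(x_obs, y_obs, x_mis))  # Steps 6-7
            for i, m in enumerate(missing[v]):                  # Step 8
                if m:
                    data[i][v] = next(predicted)
        # Step 9: stop once a full pass leaves the filled-in values unchanged.
        if data == previous:
            break
    return data                                   # Step 10


# Toy data: two numeric columns with one missing entry each.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [None, 10.0]]
completed = chained_impute(rows, nearest_neighbor)
```

Any learner with the same `fit_predict(x_obs, y_obs, x_mis)` interface, including a random forest, could be passed in place of the stand-in.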

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28
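For reference, a normalized root mean squared error can be computed as in the sketch below. It assumes the normalization used by Stekhoven and Bühlmann (2011), in which the mean squared error is divided by the variance of the true values; the input values are hypothetical.

```python
import math
from statistics import fmean, pvariance

def nrmse(true_values, predicted):
    """Root mean squared error, normalized by the variance of the true values."""
    mse = fmean((t - p) ** 2 for t, p in zip(true_values, predicted))
    return math.sqrt(mse / pvariance(true_values))

# Hypothetical held-out values and their predictions.
truth = [0.2, 0.4, 0.6, 0.8]
guess = [0.25, 0.35, 0.6, 0.85]
error = nrmse(truth, guess)
```

Under this definition, a value of 0 means perfect prediction and a value of 1 corresponds to always predicting the mean of the true values.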

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


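model using penalized maximum likelihood (Chung et al. 2013):

$$
\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\text{race}}_{j[i]} + \alpha^{\text{gender}}_{k[i]} + \alpha^{\text{age}}_{l[i]} + \alpha^{\text{educ}}_{m[i]} + \alpha^{\text{income}}_{n[i]} + \alpha^{\text{district}}_{d[i]}\right) \tag{B1}
$$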

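We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$
\begin{aligned}
\alpha^{\text{race}}_{j} &\sim N\left(0, \sigma^2_{\text{race}}\right), & j &= 1, \dots, 3 && \text{(B2)} \\
\alpha^{\text{gender}}_{k} &\sim N\left(0, \sigma^2_{\text{gender}}\right), & k &= 1, 2 && \text{(B3)} \\
\alpha^{\text{age}}_{l} &\sim N\left(0, \sigma^2_{\text{age}}\right), & l &= 1, \dots, 6 && \text{(B4)} \\
\alpha^{\text{educ}}_{m} &\sim N\left(0, \sigma^2_{\text{educ}}\right), & m &= 1, \dots, 5 && \text{(B5)} \\
\alpha^{\text{income}}_{n} &\sim N\left(0, \sigma^2_{\text{income}}\right), & n &= 1, \dots, 13 && \text{(B6)}
\end{aligned}
$$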

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\text{state}}_{s[d]}$ and freely estimated variance:

$$
\alpha^{\text{district}}_{d} \sim N\left(\alpha^{\text{state}}_{s[d]}, \sigma^2_{\text{state}}\right) \tag{B7}
$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
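The poststratification step can be sketched as a population-weighted average of cell-level predictions. The cell definitions, linear predictors, and population counts below are hypothetical; in practice the linear predictor of a cell would be the sum of the intercept and the relevant $\alpha$ terms from equation (B1).

```python
import math

def inv_logit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1 / (1 + math.exp(-x))

def poststratify(cell_logits, cell_counts):
    """Population-weighted average of cell-level predicted support."""
    total = sum(cell_counts.values())
    weighted = sum(inv_logit(cell_logits[c]) * n for c, n in cell_counts.items())
    return weighted / total

# Hypothetical demographic cells within one district and income group.
cell_logits = {("white", "female"): 0.0, ("black", "male"): math.log(3)}
cell_counts = {("white", "female"): 300, ("black", "male"): 100}
district_support = poststratify(cell_logits, cell_counts)
```

In this toy example the two cells have predicted support of 0.5 and 0.75, and the district estimate is pulled toward the larger cell.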

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low-income and −0.1 percentage points for high-income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted with MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases the responsiveness of legislators towards the preferences of low-income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high-income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1
Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                     (1)      (2)      (3)      (4)      (5)      (6)

A: Small Area Estimation via Chained Random Forests

Low income preferences             0.106    0.082    0.098    0.084    0.068    0.062
                                  (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences           −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                                  (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: Multilevel Regression & Poststratification

Low income preferences             0.182    0.158    0.181    0.162    0.115    0.115
                                  (0.021)  (0.024)  (0.026)  (0.020)  (0.022)  (0.022)
High income preferences           −0.136   −0.119   −0.139   −0.122   −0.091   −0.091
                                  (0.017)  (0.019)  (0.021)  (0.017)  (0.018)  (0.018)

C: Raw CCES means

Low income preferences             0.080    0.061    0.063    0.072    0.043    0.045
                                  (0.010)  (0.011)  (0.012)  (0.010)  (0.011)  (0.011)
High income preferences           −0.027   −0.013   −0.010   −0.027   −0.018   −0.024
                                  (0.008)  (0.008)  (0.008)  (0.008)  (0.008)  (0.009)

District fixed effects               X        X        X        X        X        X
Group preferences
  × union policy                              X                                   X
  × state constants                                    X
  × organizational capacity                                     X                 X
  × district covariates                                                  X        X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low-income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentiles). Thus, in our model voters are classified as ‘low income’ relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low-income group in most of Massachusetts’ districts (where low-income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low-income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
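The difference between district-specific and state-specific thresholds can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the incomes are invented, the two districts are hypothetical, and the nearest-rank percentile rule is our choice, since the text does not say how percentiles are computed.

```python
def percentile(values, q):
    """Nearest-rank percentile (q is an integer in 1..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # integer ceiling of q*n/100
    return ordered[rank - 1]


def income_group(income, reference_incomes, low_q=33, high_q=66):
    """Classify an income relative to the thresholds of a reference population."""
    low_cut = percentile(reference_incomes, low_q)
    high_cut = percentile(reference_incomes, high_q)
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"


# Hypothetical household incomes in two districts of one state.
districts = {
    "A": [41000, 44000, 47000, 60000, 75000, 90000, 110000, 140000, 200000],
    "B": [15000, 25000, 30000, 45000, 60000, 70000],
}
state = districts["A"] + districts["B"]

voter_income = 40000
group_in_a = income_group(voter_income, districts["A"])   # district-specific
group_in_b = income_group(voter_income, districts["B"])   # district-specific
group_in_state = income_group(voter_income, state)        # state-specific
print(group_in_a, group_in_b, group_in_state)
```

Passing `low_q=20, high_q=80` to `income_group` gives the shifted thresholds of panel (C) under the same assumptions.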


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. As proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily.”

We use the NLRB’s database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1: Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 0−13, 13−26, 26−39, 39−62, 62−164.]

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
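The interpolation step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the county and district names and all numbers are invented, and each weight is assumed to be the share of a county's population living in a given district, so that the weights of a county sum to one.

```python
def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts.

    county_counts: {county: count}
    weights: {(county, district): share of the county's population in the district}
    """
    district_totals = {}
    for (county, district), share in weights.items():
        allocated = share * county_counts[county]
        district_totals[district] = district_totals.get(district, 0.0) + allocated
    return district_totals


# Hypothetical example: County A straddles two districts, County B is nested in D2.
county_counts = {"County A": 100, "County B": 50}
weights = {
    ("County A", "D1"): 0.6,
    ("County A", "D2"): 0.4,
    ("County B", "D2"): 1.0,
}
district_counts = interpolate_to_districts(county_counts, weights)
print(district_counts)
```

When every county lies entirely within one district, all weights equal one and the function reduces to the simple summation mentioned for at-large districts.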


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll-call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                          Low income        High income
                                          preferences       preferences          N

(1) Injective preference-roll call map    0.063 (0.013)    −0.041 (0.013)     11,104
(2) Extreme preferences excl.             0.074 (0.016)    −0.048 (0.015)     13,308
(3) New York excluded                     0.070 (0.015)    −0.048 (0.014)     14,730
(4) Local Union Concentration             0.065 (0.014)    −0.047 (0.014)     15,780
(5) Trimmed LPM estimator                 0.074 (0.015)    −0.055 (0.014)     15,426
(6) Errors-in-variables                   0.062 (0.004)    −0.054 (0.004)     15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
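A minimal sketch of this trimming rule for a single roll call is given below. The district labels and preference values are invented, and the percentile definition (Python's `statistics.quantiles` with the inclusive method) is an assumption, since the text does not specify one.

```python
import statistics


def trim_extreme_preferences(prefs, n_cuts=20):
    """Keep districts whose preference lies between the 5th and 95th percentile.

    prefs: {district: preference estimate for one roll call}
    """
    cuts = statistics.quantiles(prefs.values(), n=n_cuts, method="inclusive")
    p05, p95 = cuts[0], cuts[-1]  # first and last of the 19 cut points
    return {d: v for d, v in prefs.items() if p05 <= v <= p95}


# Hypothetical support (in percent) for one roll call in 21 districts.
values = [3, 30, 32, 35, 37, 40, 42, 45, 47, 50, 52,
          55, 57, 60, 62, 65, 67, 70, 72, 75, 97]
prefs = {f"d{i}": v for i, v in enumerate(values)}
kept = trim_extreme_preferences(prefs)
print(len(prefs), len(kept))
```

In an application with several roll calls, the function would be applied separately to each roll call's set of district estimates.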

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low- and high-income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low-income preferences (0.029) than for high-income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean-field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore i subscripts):

\[
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
\]

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

\[
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}
\]

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher-order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

\[
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
\]

\[
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}
\]

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
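The two selection steps and the final OLS step can be sketched on simulated data. This is a deliberately simplified illustration, not the specification estimated in the paper: it uses a single treatment without preference interactions or fixed effects, a plain LASSO with a hand-picked penalty in place of the root-LASSO, mean-zero simulated variables so that no intercept is needed, and invented data-generating values.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso(X, y, lam, sweeps=50):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb; equals y because b starts at zero
    scale = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + scale[j] * b[j]
            new = soft_threshold(rho, lam) / scale[j]
            step = new - b[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                b[j] = new
    return b


def ols(X, y):
    """Least squares via the normal equations, solved by Gaussian elimination."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][c] * b[c] for c in range(i + 1, k))) / A[i][i]
    return b


def post_double_selection(y, u, W, lam):
    """Select controls from both reduced forms, then run OLS on the union."""
    sel_y = {j for j, b in enumerate(lasso(W, y, lam)) if b != 0.0}  # I_1
    sel_u = {j for j, b in enumerate(lasso(W, u, lam)) if b != 0.0}  # I_2
    keep = sorted(sel_y | sel_u)                                     # I = I_1 ∪ I_2
    design = [[u[i]] + [W[i][j] for j in keep] for i in range(len(y))]
    return ols(design, y)[0], keep


# Simulated data (assumed values): control 0 drives the treatment strongly
# but enters the outcome only moderately; the true treatment effect is 1.0.
random.seed(1)
n, p = 300, 8
W = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
u = [2.0 * w[0] + random.gauss(0, 1) for w in W]
y = [1.0 * u[i] + 0.3 * W[i][0] + random.gauss(0, 1) for i in range(n)]

eta_hat, keep = post_double_selection(y, u, W, lam=0.2)
print(round(eta_hat, 3), keep)
```

The design choice worth noting is that control 0 enters the final regression because it is picked up in the treatment equation, which is the situation the double-selection argument is concerned with.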

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\ \theta^{h}_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

\[
y = Kc \tag{G1}
\]

Here, $K$ is an $N \times N$ Gaussian kernel matrix

\[
K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \tag{G2}
\]

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

\[
c^{*} = \underset{c \in \mathbb{R}^{N}}{\operatorname{argmin}} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}
\]

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1}y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^{2}$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^{2} = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

32 For a very general discussion, see Belloni et al. (2017).
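As a brief check on the closed-form solution quoted for (G3), the first-order condition can be worked out directly. The derivation assumes that $K$ is symmetric and invertible, which holds for a Gaussian kernel evaluated at distinct observations; this assumption is ours and is not stated in the text.

```latex
\begin{align*}
\frac{\partial}{\partial c}\left[(y - Kc)'(y - Kc) + \lambda c'Kc\right]
  &= -2K'(y - Kc) + 2\lambda Kc \\
  &= -2K\left[y - (K + \lambda I)c\right] \qquad \text{(using } K' = K\text{)} \\
-2K\left[y - (K + \lambda I)c^{*}\right] = 0
  &\;\Longrightarrow\; (K + \lambda I)c^{*} = y
  \;\Longrightarrow\; c^{*} = (K + \lambda I)^{-1}y
\end{align*}
```

The second implication uses the invertibility of $K$; $K + \lambda I$ is invertible for any $\lambda > 0$ because $K$ is positive semi-definite.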

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

\[
\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^{2}} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right)\left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G4}
\]

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
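The KRLS fit and the pointwise derivatives can be sketched on a toy example. The sketch is an illustration under several assumptions: the data are invented and one-dimensional, $\lambda$ is fixed by hand rather than chosen by leave-one-out loss, and the derivative is taken with respect to the evaluation point with the difference written as $(x_j - x_i)$, an ordering that is our reading and is worth checking against a finite difference.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two covariate vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def krls_fit(X, y, sigma2, lam):
    """Return the kernel matrix K and weights c = (K + lam*I)^{-1} y."""
    n = len(X)
    K = [[math.exp(-sq_dist(X[i], X[j]) / sigma2) for j in range(n)] for i in range(n)]
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return K, solve(A, y)


def predict(X, c, sigma2, x_new):
    """Fitted value at x_new: kernel-weighted sum over training points."""
    return sum(ci * math.exp(-sq_dist(xi, x_new) / sigma2) for xi, ci in zip(X, c))


def partial_derivative(X, c, sigma2, j, d):
    """Derivative of the fitted function at observation j w.r.t. covariate d."""
    total = sum(ci * math.exp(-sq_dist(xi, X[j]) / sigma2) * (X[j][d] - xi[d])
                for xi, ci in zip(X, c))
    return -2.0 / sigma2 * total


# Toy data (assumed): one covariate, quadratic outcome.
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5]]
y = [x[0] ** 2 for x in X]
sigma2 = float(len(X[0]))  # sigma^2 = D, the number of columns
lam = 0.1                  # fixed by hand for this sketch

K, c = krls_fit(X, y, sigma2, lam)
slopes = [partial_derivative(X, c, sigma2, j, 0) for j in range(len(X))]
print([round(s, 3) for s in slopes])
```

Plotting `slopes` against a second covariate would mirror the idea of inspecting how marginal effects vary with union density, here without any smoother.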

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents’ responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working Paper 5 2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem Deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

Figure II: Union membership in House districts, 109th-112th Congress. [Map; legend: 1st quartile, 2nd quartile, 3rd quartile, 4th quartile.]

complete coverage, as during the latter part of the twentieth century, unions are increasingly organizing workers across different sectors and occupations (Lichtenstein 2013, 249).15

III.D. Statistical specifications

For each roll call vote $j$ ($j = 1, \dots, J$) we have measured the preferences of low- and high-income citizens in a given congressional district $d$ ($d = 1, \dots, D$), denoted by $(\theta^l_{jd}, \theta^h_{jd})$. For each district, the level of (logged) union membership is denoted by $U_d$. Given that population size is approximately identical in districts within states, we sometimes simply refer to this as union density. We specify relevant confounders in $X_d$. Depending on the particular specification (discussed in the next section), these will include (i) socio-economic district characteristics, (ii) measures of historical state union policies and state fixed effects, (iii) measures for the capability of districts' workers to organize collective action, (iv) as well as non-linear transformations of these. For ease of interpretation, we have scaled all inputs to have mean zero and unit standard deviation. Our model for the voting behavior of House members is the following linear probability specification:

$$
y_{ijd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd}
        + \eta^l (U_d \times \theta^l_{jd}) + \eta^h (U_d \times \theta^h_{jd})
        + \beta^l (X_d \times \theta^l_{jd}) + \beta^h (X_d \times \theta^h_{jd})
        + \alpha_d + \epsilon_{ijd}
$$

15 While there is no "gold standard" of accurate union membership numbers, we can compare aggregate membership based on our LM form data with the widely used survey-based measure from the CPS (Hirsch et al. 2001). This confirms that LM forms provide a rather comprehensive accounting of unions. At the national level, the average number of union members in our dataset is 13.21 million (excluding Washington, DC, which is not represented in Congress). The CPS figure for the same period is 15.22 million. This modest difference is consistent with some degree of over-reporting in the CPS given its broad question wording (Southworth and Stepan-Norris 2009, 311). It can also be interpreted as an upper bound for the non-coverage of some public sector unions in our data. A more detailed analysis by Becher et al. (2018) shows that state-level aggregates from LM forms and the CPS are strongly correlated (r = 0.86).

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d \theta^h_{jd}$ and $U_d \theta^l_{jd}$. Thus, when $\eta^l$ and $\eta^h$ are zero, the group-specific preference coefficients $\mu^l$ and $\mu^h$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^l$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^h$. Our theoretical expectation is that $\eta^l > 0$ and $\eta^h \le 0$.
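To make the interpretation of the interaction coefficients explicit, the short derivation below differentiates the linear probability specification with respect to low-income preferences. It only restates the model in the notation used here; the evaluation at $U_d = 0$ and $X_d = 0$ relies on the standardization of all inputs to mean zero and unit standard deviation.

```latex
\frac{\partial \, \mathrm{E}[y_{ijd}]}{\partial \theta^l_{jd}}
  = \mu^l + \eta^l U_d + \beta^l X_d ,
\qquad
\frac{\partial^2 \, \mathrm{E}[y_{ijd}]}{\partial \theta^l_{jd} \, \partial U_d}
  = \eta^l .
```

Under this reading, a district at mean union membership and mean covariates has responsiveness $\mu^l$ to the poor, and a district one standard deviation above mean union membership has responsiveness $\mu^l + \eta^l$. The expressions for the affluent follow by replacing the superscript $l$ with $h$.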

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables at the district level by including district fixed effects $\alpha_d$.16 Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both at the district and state level) and group preferences, $X_d \theta^l_{jd}$ and $X_d \theta^h_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).
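A minimal sketch of how a specification of this kind could be estimated, using simulated data rather than the data of the paper. The within transformation absorbs the district fixed effects, and the standard errors are clustered by district with a simple G/(G-1) correction. All names, sample sizes, and "true" parameter values are illustrative assumptions, and the control interactions $X_d \theta_{jd}$ are omitted for brevity; this is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data; every number here is a hypothetical stand-in.
D, J = 400, 27                                   # districts, roll calls
district = np.repeat(np.arange(D), J)            # district index per observation
U = rng.standard_normal(D)[district]             # standardized logged union membership
theta_l = rng.standard_normal(D * J)             # low-income preferences
theta_h = rng.standard_normal(D * J)             # high-income preferences
alpha = rng.normal(0.0, 0.05, D)[district]       # district fixed effects

mu_l, mu_h, eta_l, eta_h = 0.02, 0.14, 0.08, -0.05   # assumed "true" values
p = (0.5 + alpha + mu_l * theta_l + mu_h * theta_h
     + eta_l * U * theta_l + eta_h * U * theta_h)
y = (rng.uniform(size=D * J) < np.clip(p, 0.0, 1.0)).astype(float)


def demean_by_group(v, g, n_groups):
    """Subtract group means (the within transformation)."""
    sums = np.bincount(g, weights=v, minlength=n_groups)
    counts = np.bincount(g, minlength=n_groups)
    return v - (sums / counts)[g]


def fe_lpm_cluster(y, X, g):
    """Fixed-effects OLS with standard errors clustered on g."""
    n_groups = int(g.max()) + 1
    y_w = demean_by_group(y, g, n_groups)
    X_w = np.column_stack([demean_by_group(X[:, k], g, n_groups)
                           for k in range(X.shape[1])])
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    resid = y_w - X_w @ beta
    bread = np.linalg.inv(X_w.T @ X_w)
    scores = np.zeros((n_groups, X_w.shape[1]))
    np.add.at(scores, g, X_w * resid[:, None])   # sum x_i * e_i within each district
    meat = scores.T @ scores
    V = bread @ meat @ bread * n_groups / (n_groups - 1)
    return beta, np.sqrt(np.diag(V))


X = np.column_stack([theta_l, theta_h, U * theta_l, U * theta_h])
beta, se = fe_lpm_cluster(y, X, district)
for name, b, s in zip(["mu_l", "mu_h", "eta_l", "eta_h"], beta, se):
    print(f"{name}: {b: .3f} (se {s:.3f})")
```

Because the non-interacted union term is constant within districts, it drops out of the within transformation, so only the preference terms and their interactions with union membership are entered as regressors.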

IV. Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above with district fixed effects, but without accounting for local union organization (setting $\beta^l, \beta^h$ and $\eta^l, \eta^h$ to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase of 13.6 (±1.2) percentage points in the probability that legislators cast a corresponding vote. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.

16 Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.

IV.A. Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership with 95% confidence intervals.17 It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preferences-confounder interactions (setting $\beta^l$ and $\beta^h$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018; Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^l$ and $\theta^h$. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

Figure III: District-level union membership as moderator of unequal representation. [Two panels, "Low income constituents" and "High income constituents"; x-axis: Union membership [std], from −1.6 to 1.6, with percentile marks p10, p25, p50, p75, p90; y-axis: Marginal effect, from −0.4 to 0.4.]

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

17 Calculated from an LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors at the district level. See also specification (1) in Table I below.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

Table I: Union density and representation. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                (1)      (2)      (3)      (4)      (5)      (6)
Low income preferences         0.106    0.082    0.098    0.084    0.068    0.062
                              (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences       −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                              (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

District fixed effects           X        X        X        X        X        X
Group preferences
  × union policy                          X                                   X
  × state constants                                X
  × organizational capacity                                 X                 X
  × district covariates                                              X        X

Note: N = 15,780, N_d = 534. 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^l$ and $\eta^h$. Specifications (2) to (5) include coefficients for interactions ($\beta^l$, $\beta^h$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements. (3) interacts preferences with state fixed effects. (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections. (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections, conducted by the National Labor Relations Board (NLRB), are a useful proxy since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

18 Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics, see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B. Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
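The population-based aggregation from counties to districts can be illustrated with a small sketch. The exact weighting scheme is not spelled out in this section, so the version below is an assumption: each district value is the average of county values, weighted by the share of the district's population living in each county. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def aggregate_to_districts(county_values, overlap_pop):
    """Population-weighted average of county values for each district.

    county_values: length-C array with one measure per county.
    overlap_pop:   C x D array; entry [c, d] is the population of
                   county c that lives inside district d.
    """
    county_values = np.asarray(county_values, dtype=float)
    overlap_pop = np.asarray(overlap_pop, dtype=float)
    # Normalize so that the weights of each district (column) sum to one.
    weights = overlap_pop / overlap_pop.sum(axis=0, keepdims=True)
    return weights.T @ county_values


# Toy example: three counties, two districts; county 2 is split evenly.
county_index = [10.0, 20.0, 40.0]
overlap = [[100, 0],
           [50, 50],
           [0, 150]]
district_index = aggregate_to_districts(county_index, overlap)
print(district_index)
```

This form suits intensive measures such as an index or a per-capita rate; a count variable would instead be apportioned by the share of each county's population falling in the district.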

Table II: Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

                                        Low income        High income
(1a) Social capital: bowling alleys     0.067 (0.014)    −0.051 (0.013)
(1b) Social capital: index              0.065 (0.014)    −0.048 (0.013)
(2)  Redistricting                      0.067 (0.014)    −0.051 (0.013)
(3)  MRP estimated preferences          0.115 (0.022)    −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for $\eta^l$ and $\eta^h$. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital, the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts; N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred; N = 14,150. Specification (3) uses preferences estimated using MRP; see Appendix B for more details; N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews, see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C. Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
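The two-step logic of post-double-selection can be sketched in a few lines. The version below is a deliberately simplified illustration on simulated data: it uses a single scalar treatment instead of the union-preference interactions, a plain coordinate-descent LASSO with an ad hoc penalty instead of the root-LASSO, and no sample splitting. Function names, the penalty rule, and all parameter values are assumptions made for this example.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                  # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                   # remove column j's contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]                   # add the updated contribution back
    return b


def post_double_selection(y, d, X, c=1.0):
    """Select controls that predict y or d, then run OLS of y on d and them."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc, dc = y - y.mean(), d - d.mean()
    base = c * np.sqrt(2.0 * np.log(p) / n)       # simple penalty rule (assumption)
    sel_y = np.flatnonzero(lasso_cd(Xs, yc, base * yc.std()))   # outcome equation
    sel_d = np.flatnonzero(lasso_cd(Xs, dc, base * dc.std()))   # treatment equation
    keep = np.union1d(sel_y, sel_d)
    Z = np.column_stack([np.ones(n), d, X[:, keep]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)  # post-LASSO least squares
    return coef[1], keep


rng = np.random.default_rng(1)
n, p = 500, 50
X = rng.standard_normal((n, p))
d = 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)        # treatment
y = 0.5 * d + 1.0 * X[:, 0] + 0.3 * X[:, 2] + rng.standard_normal(n)

effect, kept = post_double_selection(y, d, X)
naive = np.polyfit(d, y, 1)[0]                    # slope ignoring all controls
print(f"naive: {naive:.2f}, post-double-selection: {effect:.2f}, controls kept: {kept}")
```

In this simulation the first control confounds treatment and outcome, so the regression without controls overstates the effect, while the estimate after taking the union of both selected sets should lie close to the assumed value of 0.5.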

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Table III: Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups.

                               (1)      (2)      (3)
Low income preferences        0.063    0.066    0.062
                             (0.014)  (0.017)  (0.016)
High income preferences      −0.054   −0.036   −0.040
                             (0.013)  (0.015)  (0.016)

Semi-parametric terms          144      312      624
Post-LASSO terms                18       45      112

Note: The double selection estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection is performed on 50% of the sample, parameter estimation on the remaining 50% (N = 7,884). Table entries are estimates for $\eta^l$ and $\eta^h$ with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our $\eta$ terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of the outcome with respect to district preferences and examine how they vary with levels of union membership (Hainmueller and Hazlett 2014, 156).
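The idea of recovering an interaction from pointwise derivatives can be sketched as follows. This is a bare-bones version of kernel regularized least squares on simulated data: the kernel bandwidth is set to the number of covariates (our understanding of a common default, treated here as an assumption) and the ridge penalty is fixed by hand rather than chosen by cross-validation. Function names and all numbers are illustrative.

```python
import numpy as np


def krls_fit(X, y, lam=0.1):
    """Fit f(x) = sum_i c_i * exp(-||x - x_i||^2 / sigma2) by ridge regression."""
    n, p = X.shape
    sigma2 = float(p)                              # bandwidth choice (assumption)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dist / sigma2)
    c = np.linalg.solve(K + lam * np.eye(n), y)    # fixed penalty (assumption)
    return K, c, sigma2


def krls_pointwise_derivatives(X, K, c, sigma2, k):
    """Partial derivative of the fitted function w.r.t. covariate k at each point."""
    diff = X[:, None, k] - X[None, :, k]           # diff[j, i] = x_jk - x_ik
    return (-2.0 / sigma2) * (K * diff) @ c


rng = np.random.default_rng(2)
n = 300
theta = rng.standard_normal(n)                     # group preferences (simulated)
u = rng.standard_normal(n)                         # union membership (simulated)
# The outcome contains an interaction that is NOT given to the estimator.
y = 0.2 * theta + 0.5 * u * theta + rng.normal(0.0, 0.1, n)

X = np.column_stack([theta, u])
K, c, sigma2 = krls_fit(X, y - y.mean())
d_theta = krls_pointwise_derivatives(X, K, c, sigma2, k=0)
print("corr(derivative w.r.t. preferences, union membership):",
      round(float(np.corrcoef(d_theta, u)[0, 1]), 2))
```

If the fitted surface captures the interaction, the pointwise derivative with respect to preferences rises with union membership even though no product term was entered, which is the pattern a smoothed plot of such derivatives against union membership is meant to reveal.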

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

Figure IV: Nonparametric estimate of interaction between union membership and preferences. [Two panels, "Low income constituents" and "High income constituents"; x-axis: Union membership [std], from −1.6 to 1.6, with percentile marks p10, p25, p50, p75, p90; y-axis: Partial effect, from −0.4 to 0.4.]

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result is that, using a non-parametric model that does not include an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V. Heterogeneity

Union type Is our nding driven by a particular type of union A recent strand of researchstresses the special characteristics of public unions and their political inuence (eg Anziaand Moe 2016 Flavin and Hartney 2015) Hence one may ask whether our ndings mainlyreect the inuence of private-sector unions since public sector unions are too narrow intheir interests to mitigate unequal responsiveness Panel (A) of Table IV provides someevidence on this question e administrative forms used to measure union membership donot distinguish between private and public unions and local unions may contain workersfrom both the private and the public sector To calculate an approximate measure of districtpublic union membership we identify unions with public sector members (based on theirname) and create separate union membership counts for ldquopublicrdquo and the remaining ldquonon-publicrdquo unions (see appendix A for details)

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public-sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills.


The difference is larger for the preferences of high-income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low-income constituents. We trace this issue further in the next specification.

Table IV. Effect heterogeneity: Marginal effects of unionization on legislative responsiveness to low and high income groups

                                  Low income        High income

(A) Private vs. public unions
    Public unions                 0.074 (0.016)     −0.058 (0.015)
    Non-public unions             0.054 (0.016)     −0.027 (0.016)

(B) Bill ideology
    Conservative bill             0.086 (0.017)     −0.086 (0.018)
    Liberal bill                  0.052 (0.014)     −0.028 (0.013)

(C) AFL-CIO endorsement
    No position                   0.054 (0.014)     −0.054 (0.013)
    Endorsement                   0.077 (0.015)     −0.040 (0.014)

Note: Estimates for ηL and ηH with cluster-robust standard errors in parentheses. N = 15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In Panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent,

23 Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for ηH is smaller for endorsed bills but still significantly different from zero.


more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that, under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
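As a rough illustration of the retransformation step (not the authors' code), the sketch below applies Duan's (1983) smearing factor to move a log-scale prediction back to dollars. The function names, the residuals and the baseline log prediction are hypothetical; only the coefficient 0.056 is taken from column (1) of Table V.

```python
import math


def duan_smearing_factor(residuals):
    """Mean of the exponentiated log-scale residuals (Duan's smearing factor)."""
    return sum(math.exp(e) for e in residuals) / len(residuals)


def predict_level(log_prediction, residuals):
    """Retransform a log-scale prediction to the original (dollar) scale."""
    return math.exp(log_prediction) * duan_smearing_factor(residuals)


# Hypothetical residuals and baseline, for illustration only (not the paper's data).
residuals = [-0.4, 0.1, 0.3, 0.0]
baseline = predict_level(11.0, residuals)
# Shift the log prediction by the union-membership coefficient from Table V, column (1).
shifted = predict_level(11.0 + 0.056, residuals)
effect_in_dollars = shifted - baseline
```

Simply exponentiating the log prediction would ignore the residual variance; the smearing factor corrects for that without assuming normally distributed errors.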

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low-income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V. Labor contributions and selection of Democratic legislators

                                         (1)        (2)        (3)        (4)

A: Contributions channel
                                         DV: Contrib.          DV: roll call
Union membership                         0.056      0.046
                                         (0.012)    (0.014)
Contributions × low income prefs.                              0.946      0.865
                                                               (0.036)    (0.034)
Contributions × high income prefs.                             −0.735     −0.714
                                                               (0.029)    (0.031)

B: Selection channel
                                         DV: Democrat          DV: roll call
Union membership                         0.161      0.106
                                         (0.024)    (0.023)
Democrat × low income prefs.                                   0.576      0.542
                                                               (0.012)    (0.015)
Democrat × high income prefs.                                  −0.411     −0.423
                                                               (0.013)    (0.015)

District controls                                   X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N = 428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N = 15,780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N = 428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N = 15,776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low-income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an


important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger, to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked: who governs in a polity where political rights are equally distributed, but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public-sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private-sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, we consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds, separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each Congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
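The classification step can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the quantile interpolation is whatever Python's `statistics.quantiles` uses by default, and the treatment of incomes exactly at a threshold is our choice, not something the text specifies.

```python
from statistics import quantiles


def tercile_thresholds(incomes):
    """Return the two cut points that split one district's incomes into terciles."""
    lower, upper = quantiles(incomes, n=3)
    return lower, upper


def income_group(income, lower, upper):
    """Classify a respondent relative to the district-specific thresholds."""
    if income <= lower:  # assumption: boundary values count as low income
        return "low"
    if income >= upper:  # assumption: boundary values count as high income
        return "high"
    return "middle"


# Hypothetical household incomes for a single district.
district_incomes = [10_000, 20_000, 30_000, 40_000, 50_000,
                    60_000, 70_000, 80_000, 90_000]
lower, upper = tercile_thresholds(district_incomes)
groups = [income_group(x, lower, upper) for x in district_incomes]
```

In practice the thresholds would be computed once per district and Congress from the ACS files and then applied to CCES respondents living in that district.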

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1. Matched CCES–House roll calls included in our analysis

Match  Bill      Date        House Vote  Bill       Name
                             (Yea-Nay)   Ideology†

(1)    HR 810    07/19/2006  235-193     L          Stem Cell Research Enhancement Act (Presidential Veto Override)
(1)    HR 3      01/11/2007  253-174     L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176     L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201     L          Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116     L          Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197     C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185     C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215     C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183     C          Protect America Act
(6)    HR 6304   06/20/2008  293-129     C          FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204     L          Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135     L          Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152     L          Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132     C          United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171     L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151     C          To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167     C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202     L          Supplemental Appropriations, Fiscal Year 2009 (Agreeing to Conference Report)
(14)   HR 2831   07/31/2007  225-199     L          Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171     L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177     L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175     L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183     L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212     L          American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212     L          Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215     L          Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192     L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175     L          Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161     C          Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193     C          House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382      C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257     L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189     C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185     C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147     C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2. Distribution of district income-group reference points: average threshold over all districts, smallest and largest value

            33rd percentile               67th percentile
Congress    Mean     Min      Max         Mean     Min      Max

109         38,123   16,800   73,675      77,964   39,612   146,870
110         40,127   18,000   77,000      83,047   43,600   155,113
111         39,021   17,500   78,262      82,440   46,000   160,050
112         37,381   16,500   81,000      79,868   38,500   158,654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3. Descriptive statistics of analysis sample

                                     Mean     SD       Min      Max       N

Roll-call vote: yea                  0.568    0.495    0.000    1.000     15,780
Constituent preferences
    Low income                       0.593    0.220    0.047    0.979     15,934
    High income                      0.555    0.198    0.037    0.967     15,934
    Low-High gap                     0.172    0.121    0.000    0.588     15,934
Union membership [log]               9.705    1.046    6.094    13.619    15,934
Population                           7.022    0.723    4.697    9.980     15,934
Share African American               0.124    0.146    0.004    0.680     15,934
Share Hispanic                       0.156    0.174    0.005    0.812     15,934
Share BA or higher                   0.275    0.097    0.073    0.645     15,934
Median income [$10,000]              5.177    1.356    2.282    10.439    15,934
Share female                         0.508    0.010    0.462    0.543     15,934
Manufacturing share                  0.110    0.047    0.025    0.281     15,934
Urbanization                         0.790    0.199    0.213    1.000     15,934
Certification elections [log]        3.347    0.861    0.000    5.100     15,934
Congregations [per 1000 persons]     0.765    1.147    0.062    6.453     15,934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter, based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from m individuals in the CCES and n individuals in the Census (with n ≫ m). Both sets of individuals share K common characteristics Z_k, such as age, race, or education. The first task at hand is then to match P roll call preferences Y_p that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about Y_p = f(Z_1, ..., Z_K) for p = 1, ..., P, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher-order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use f(Z_1, ..., Z_K) in combination with observed Z in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
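The final averaging step can be illustrated with a short sketch. It assumes the Census records already carry a predicted preference for one roll call; the record layout, the function name and the example district label are hypothetical.

```python
from collections import defaultdict


def district_group_means(records):
    """Average predicted preferences by (district, income group).

    records: iterable of (district, income_group, predicted_preference) tuples
    taken from the completed Census sample.
    """
    totals = defaultdict(lambda: [0.0, 0])  # running sum and count per cell
    for district, group, preference in records:
        cell = totals[(district, group)]
        cell[0] += preference
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical completed records for one district and one roll call item.
completed = [
    ("NC-04", "low", 1.0),
    ("NC-04", "low", 0.0),
    ("NC-04", "high", 1.0),
]
means = district_group_means(completed)
```

Because the synthetic Census sample is representative within each district by construction, no additional weighting is applied in this sketch.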

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1: schematic of the stacked data matrix. Rows 1, ..., m are CCES units with observed covariates Z_i1, ..., Z_iK and observed preferences Y_i1, ..., Y_iP. Rows m+1, ..., m+n are Census units with observed covariates and missing (NA) preferences. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict the Census preferences, Y*_p = f(Z).]

Figure B1. Illustration of Small Area Estimation of District Preferences

We use a sample of m individuals from the CCES that is not necessarily representative on the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract-level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts of the American Community Survey 2006-2011. We limit the sample to non-group-quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
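One way to read the sampling step is sketched below; this is our interpretation, not the authors' implementation. We assume a crosswalk that gives, for each district, the share of its population living in each PUMA, and we draw individuals from PUMA donor pools with those shares as weights. All names and inputs are hypothetical.

```python
import random


def synthetic_district_sample(puma_records, puma_shares, size, seed=0):
    """Draw a synthetic sample of individuals for one Congressional district.

    puma_records: dict mapping PUMA id -> list of individual records (donor pool)
    puma_shares:  dict mapping PUMA id -> share of the district's population
                  living in that PUMA (hypothetical crosswalk weights)
    """
    rng = random.Random(seed)
    pumas = list(puma_shares)
    weights = [puma_shares[p] for p in pumas]
    # First pick a PUMA for each draw in proportion to its share of the district,
    # then pick one individual at random from that PUMA's donor pool.
    drawn_pumas = rng.choices(pumas, weights=weights, k=size)
    return [rng.choice(puma_records[p]) for p in drawn_pumas]


# Hypothetical donor pools: the district lies entirely within PUMA "p1".
donors = {"p1": ["person_a", "person_b"], "p2": ["person_c"]}
sample = synthetic_district_sample(donors, {"p1": 1.0, "p2": 0.0}, size=50)
```

Sampling with replacement keeps the sketch simple; whether the original procedure samples with or without replacement is not stated in the text.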

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
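A minimal sketch of a midpoint Pareto scoring rule follows. It uses one common formulation, in which the Pareto shape parameter is estimated from the counts in the last two bins; whether this matches the exact variant used here is an assumption, and the bin bounds and counts are hypothetical.

```python
import math


def pareto_top_bin_mean(lower_prev, lower_top, count_prev, count_top):
    """Mean of the open-ended top bin under a Pareto tail.

    The shape parameter alpha is estimated from the lower bounds and counts
    of the two highest bins (assumed formulation, see lead-in).
    """
    alpha = (math.log(count_prev + count_top) - math.log(count_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    return lower_top * alpha / (alpha - 1)


def bin_scores(bounds, counts):
    """Assign a continuous score to each income bin.

    bounds: list of (lower, upper) pairs with upper exclusive; the last bin
            is open-ended and has upper=None.
    counts: number of respondents in each bin.
    """
    scores = [(lower + upper) / 2 for lower, upper in bounds[:-1]]
    top = pareto_top_bin_mean(bounds[-2][0], bounds[-1][0], counts[-2], counts[-1])
    return scores + [top]


# Hypothetical bins and counts, for illustration only.
bounds = [(20_000, 30_000), (100_000, 200_000), (200_000, None)]
scores = bin_scores(bounds, counts=[50, 30, 10])
```

With these made-up counts the estimated shape parameter is 2, so the open-ended bin receives twice its lower bound.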

Algorithm details. For easier exposition, define a matrix D that contains both individual characteristics and roll call preferences. Let N be the number of rows of D. For any given variable v of D, D_v, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of D_v, denoted as $y^{(v)}_{obs}$

• Missing values of D_v: $y^{(v)}_{mis}$

• Variables other than D_v with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than D_v with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion c (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

Start with initial guesses of missing values in D
w ← vector of column indices sorted by increasing fraction of NA
while not c do
    D^imp_old ← previously imputed D
    for v in w do
        Fit Random Forest: y^(v)_obs ~ x^(v)_obs
        Predict y^(v)_mis using x^(v)_mis
        D^imp_new ← updated imputed matrix using predicted y^(v)_mis
    Update stopping criterion c
Return completed D^imp
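The control flow of Algorithm 1 can be sketched in a few lines. To keep the example dependency-free, the learner is passed in as a function, and the demonstration uses a 1-nearest-neighbour predictor as a stand-in; it is not a random forest, and the stopping rule (no change between sweeps) is a simplification. All names are hypothetical.

```python
def chained_impute(rows, fit_predict, max_iter=10):
    """Chained imputation of a numeric matrix given as lists; None marks missing.

    fit_predict(x_obs, y_obs, x_mis) must return one prediction per row of x_mis.
    Assumes every column has at least one observed value.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    missing = [[row[v] is None for v in range(n_cols)] for row in rows]
    data = [list(row) for row in rows]
    # Initial guesses: fill each missing entry with its column mean.
    for v in range(n_cols):
        observed = [row[v] for row in rows if row[v] is not None]
        mean = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][v]:
                data[i][v] = mean
    # Visit columns in order of increasing number of missing entries.
    order = sorted(range(n_cols), key=lambda v: sum(m[v] for m in missing))

    def others(i, v):
        """All values of row i except column v (the current predictors)."""
        return [data[i][k] for k in range(n_cols) if k != v]

    for _ in range(max_iter):
        previous = [list(row) for row in data]
        for v in order:
            mis_idx = [i for i in range(n_rows) if missing[i][v]]
            if not mis_idx:
                continue
            obs_idx = [i for i in range(n_rows) if not missing[i][v]]
            predictions = fit_predict(
                [others(i, v) for i in obs_idx],
                [data[i][v] for i in obs_idx],
                [others(i, v) for i in mis_idx],
            )
            for i, value in zip(mis_idx, predictions):
                data[i][v] = value
        if data == previous:  # simplified stopping criterion: nothing changed
            break
    return data


def nearest_neighbour(x_obs, y_obs, x_mis):
    """Stand-in learner (1-nearest neighbour), NOT a random forest."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [y_obs[min(range(len(x_obs)), key=lambda j: dist(x_obs[j], x))]
            for x in x_mis]


completed = chained_impute([[1.0, 10.0], [2.0, 20.0], [1.0, None]], nearest_neighbour)
```

Swapping `nearest_neighbour` for a function that fits a random forest on the observed rows would bring the sketch closer to the procedure described in the text.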

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28
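For reference, a normalized root mean squared error can be computed as in the sketch below. The normalization by the variance of the true values follows one common definition; that this is the exact normalization behind the figures reported here is an assumption, and the function name is hypothetical.

```python
import math
from statistics import pvariance


def nrmse(true_values, predicted_values):
    """Root mean squared error, normalized by the variance of the true values."""
    mse = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values)) / len(true_values)
    return math.sqrt(mse / pvariance(true_values))


# A predictor that always returns the mean of the true values scores exactly 1.
baseline = nrmse([0.0, 2.0], [1.0, 1.0])
```

Under this definition, a value of 0 means perfect prediction and a value of 1 means the model does no better than predicting the mean.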

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.
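The weighting step shared by both strategies can be sketched as a population-weighted average of cell-level predictions. The cell labels, predictions and counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def poststratify(cell_predictions, cell_counts):
    """Weight cell-level predicted support by Census cell counts.

    cell_predictions: dict mapping demographic cell -> predicted support (0 to 1)
    cell_counts:      dict mapping demographic cell -> number of people in the
                      district who fall into that cell
    """
    total = sum(cell_counts[cell] for cell in cell_predictions)
    weighted = sum(cell_predictions[cell] * cell_counts[cell] for cell in cell_predictions)
    return weighted / total


# Hypothetical two-cell district: 300 people in one cell, 100 in the other.
estimate = poststratify(
    {"age 18-29, BA": 0.2, "age 60-69, no BA": 0.8},
    {"age 18-29, BA": 300, "age 60-69, no BA": 100},
)
```

Here the estimate is (0.2 × 300 + 0.8 × 100) / 400 = 0.35, so the larger cell pulls the district estimate toward its own predicted support.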

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


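model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\text{race}}_{j[i]} + \alpha^{\text{gender}}_{k[i]} + \alpha^{\text{age}}_{l[i]} + \alpha^{\text{educ}}_{m[i]} + \alpha^{\text{income}}_{n[i]} + \alpha^{\text{district}}_{d[i]}\right) \tag{B1}$$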

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta_0$ is an intercept, and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

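$$\alpha^{\text{race}}_{j} \sim N\left(0, \sigma^2_{\text{race}}\right), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{\text{gender}}_{k} \sim N\left(0, \sigma^2_{\text{gender}}\right), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{\text{age}}_{l} \sim N\left(0, \sigma^2_{\text{age}}\right), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{\text{educ}}_{m} \sim N\left(0, \sigma^2_{\text{educ}}\right), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{\text{income}}_{n} \sim N\left(0, \sigma^2_{\text{income}}\right), \quad n = 1, \dots, 13 \tag{B6}$$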

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\text{state}}_{s[d]}$ and freely estimated variance:

$$\alpha^{\text{district}}_{d} \sim N\left(\alpha^{\text{state}}_{s[d]}, \sigma^2_{\text{state}}\right) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
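The poststratification step can be illustrated with a minimal sketch. It assumes that the linear predictor of equation (B1) has already been computed for every demographic cell; the function names and the cell values below are hypothetical and serve only to show the population weighting.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def poststratify(cells):
    """Population-weighted mean of cell-level predicted support.

    `cells` holds (linear_predictor, population_count) pairs for one
    income group in one congressional district.
    """
    total = sum(count for _, count in cells)
    return sum(inv_logit(eta) * count for eta, count in cells) / total


# Hypothetical cells: (beta_0 + sum of alpha effects, Census count).
cells = [(0.0, 100), (math.log(3.0), 300)]
district_support = poststratify(cells)  # (0.5 * 100 + 0.75 * 300) / 400
```

Because the weights come from the Census rather than from the survey, a cell that is over-represented among respondents does not dominate the district estimate.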

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: Small Area Estimation via Chained Random Forests | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: Multilevel Regression & Poststratification | | | | | | |
| Low income preferences | 0.182 | 0.158 | 0.181 | 0.162 | 0.115 | 0.115 |
| | (0.021) | (0.024) | (0.026) | (0.020) | (0.022) | (0.022) |
| High income preferences | −0.136 | −0.119 | −0.139 | −0.122 | −0.091 | −0.091 |
| | (0.017) | (0.019) | (0.021) | (0.017) | (0.018) | (0.018) |
| C: Raw CCES means | | | | | | |
| Low income preferences | 0.080 | 0.061 | 0.063 | 0.072 | 0.043 | 0.045 |
| | (0.010) | (0.011) | (0.012) | (0.010) | (0.011) | (0.011) |
| High income preferences | −0.027 | −0.013 | −0.010 | −0.027 | −0.018 | −0.024 |
| | (0.008) | (0.008) | (0.008) | (0.008) | (0.008) | (0.009) |

District fixed effects: X X X X X X

Group preferences × union policy: X X

× state constants: X

× organizational capacity: X X

× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results even obtain when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
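The role of district-specific thresholds can be made concrete with a small sketch. It assumes tercile cut points computed with Python's `statistics.quantiles` (which splits at one third and two thirds, a close stand-in for the 33rd and 66th percentiles); the income values and function names are hypothetical.

```python
import statistics


def tercile_cutoffs(incomes):
    """Return the two cut points that split incomes into three equal groups."""
    low_cut, high_cut = statistics.quantiles(incomes, n=3)
    return low_cut, high_cut


def classify(income, cutoffs):
    """Assign an income to the low, middle, or high group of a district."""
    low_cut, high_cut = cutoffs
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"


# Hypothetical household incomes (in $1,000) for two districts.
district_a = [10, 20, 30, 40, 50, 60, 70, 80]
district_b = [30, 40, 50, 60, 70, 80, 90, 100]

cut_a = tercile_cutoffs(district_a)
cut_b = tercile_cutoffs(district_b)

# The same income lands in different groups depending on the district.
group_in_a = classify(40, cut_a)
group_in_b = classify(40, cut_b)
```

Replacing the per-district cut points with a single pair computed on the pooled state data gives the state-specific variant of panel (B).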


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: District-specific income thresholds | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: State-specific income thresholds | | | | | | |
| Low income preferences | 0.105 | 0.082 | 0.097 | 0.083 | 0.067 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.062 | −0.036 | −0.052 | −0.050 | −0.049 | −0.039 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.013) |
| C: Shifted income thresholds p20 - p80 | | | | | | |
| Low income preferences | 0.098 | 0.077 | 0.09 | 0.078 | 0.063 | 0.057 |
| | (0.012) | (0.013) | (0.014) | (0.012) | (0.013) | (0.013) |
| High income preferences | −0.054 | −0.031 | −0.046 | −0.044 | −0.044 | −0.034 |
| | (0.011) | (0.012) | (0.012) | (0.011) | (0.012) | (0.012) |

District fixed effects: X X X X X X

Group preferences × union policy: X X

× state constants: X

× organizational capacity: X X

× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the NLRB. Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency


Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Legend bins: 0–13, 13–26, 26–39, 39–62, 62–164.

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
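A minimal sketch of population-weighted interpolation is given below. It is one plausible reading of the procedure, not the authors' implementation: each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection. All names and numbers are hypothetical.

```python
from collections import defaultdict


def interpolate_counts(county_counts, weights):
    """Allocate county-level counts to districts.

    `weights` maps (county, district) to the share of the county's
    population living in that district; the shares of a county sum to 1.
    """
    district_totals = defaultdict(float)
    for (county, district), share in weights.items():
        district_totals[district] += share * county_counts[county]
    return dict(district_totals)


# Hypothetical example: county A is split across two districts,
# county B lies entirely within district D2.
county_counts = {"A": 100, "B": 40}
weights = {("A", "D1"): 0.75, ("A", "D2"): 0.25, ("B", "D2"): 1.0}
district_counts = interpolate_counts(county_counts, weights)
```

When every county lies entirely within one district (all shares equal 1), the allocation collapses to a simple sum over counties, as in the at-large case described above.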


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

| | Low income preferences | High income preferences | N |
|---|---|---|---|
| (1) Injective preference roll call map | 0.063 (0.013) | −0.041 (0.013) | 11104 |
| (2) Extreme preferences excl. | 0.074 (0.016) | −0.048 (0.015) | 13308 |
| (3) New York excluded | 0.070 (0.015) | −0.048 (0.014) | 14730 |
| (4) Local Union Concentration | 0.065 (0.014) | −0.047 (0.014) | 15780 |
| (5) Trimmed LPM estimator | 0.074 (0.015) | −0.055 (0.014) | 15426 |
| (6) Errors-in-variables | 0.062 (0.004) | −0.054 (0.004) | 15345 |

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
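The trimming rule for a single roll call can be sketched as follows. The percentile definition is an assumption (the default interpolation of `statistics.quantiles`), and the district identifiers and preference values are hypothetical.

```python
import statistics


def trim_extremes(prefs, lower_pct=5, upper_pct=95):
    """Keep districts whose preference estimate lies within the given percentiles.

    `prefs` maps a district identifier to its estimated preference on one roll call.
    """
    cuts = statistics.quantiles(prefs.values(), n=100)  # 99 percentile cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return {district: p for district, p in prefs.items() if lo <= p <= hi}


# Hypothetical preferences for 100 districts on one roll call.
prefs = {district: district / 100 for district in range(1, 101)}
kept = trim_extremes(prefs)
```

Applying the function separately to each roll call reproduces the per-item trimming described above.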

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
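The idea behind trimming can be sketched in a deliberately simplified setting with a single regressor. The sketch assumes a sequential procedure: fit OLS, drop observations whose fitted value falls outside the unit interval, and refit until none remain outside. The data and function names are hypothetical, and the specification in the paper contains many more terms.

```python
def ols(xs, ys):
    """Closed-form OLS for one regressor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trimmed_lpm(xs, ys, max_iter=100):
    """Refit OLS after dropping observations with fitted values outside [0, 1]."""
    for _ in range(max_iter):
        intercept, slope = ols(xs, ys)
        keep = [0.0 <= intercept + slope * x <= 1.0 for x in xs]
        if all(keep):
            break
        xs = [x for x, k in zip(xs, keep) if k]
        ys = [y for y, k in zip(ys, keep) if k]
    return intercept, slope, len(xs)


# Hypothetical data: the observation at x = 20 has a fitted value above 1.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 20]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1]
intercept, slope, n_kept = trimmed_lpm(xs, ys)
```

In this toy example only the outlying observation is dropped, and the slope on the remaining nine observations is larger than the slope on the full sample.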

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
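The expected direction of the bias can be seen in the textbook single-regressor case, which is a simplification of the interacted specification estimated here. Assume classical measurement error: the observed preference is $\tilde{\theta} = \theta + u$, with $u$ independent of $\theta$ and of the regression error in $y = \mu\theta + \epsilon$. The symbols $\tilde{\theta}$, $u$, $\sigma^2_\theta$ and $\sigma^2_u$ are introduced for illustration only. The OLS slope then converges to an attenuated version of the true coefficient:

```latex
\operatorname{plim} \hat{\mu}
  = \frac{\operatorname{Cov}(\tilde{\theta}, y)}{\operatorname{Var}(\tilde{\theta})}
  = \mu \, \frac{\sigma^2_\theta}{\sigma^2_\theta + \sigma^2_u}
```

The attenuation factor approaches one when the measurement error variance is small relative to the variance of the true preferences, which is consistent with the small changes reported in Table E1.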

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

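$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}$$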

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the square-root LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
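Why selection on both equations matters can be illustrated with a stylized sketch. It assumes an orthonormal design, in which the LASSO solution for each coefficient is the soft-thresholded OLS coefficient; this is a simplification and not the square-root LASSO used in the paper. The coefficient values and the penalty are hypothetical.

```python
def soft_threshold(b, lam):
    """LASSO solution for one coefficient under an orthonormal design."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def selected(coefs, lam):
    """Indices of controls with a non-zero LASSO coefficient."""
    return {k for k, b in enumerate(coefs) if soft_threshold(b, lam) != 0.0}


# Hypothetical reduced-form OLS coefficients on three control terms.
beta_outcome = [2.0, 0.3, 0.0]    # control 1 matters only moderately for y
beta_treatment = [0.0, 1.5, 0.0]  # but is strongly related to union density
lam = 0.5

I1 = selected(beta_outcome, lam)    # selection in the outcome equation
I2 = selected(beta_treatment, lam)  # selection in the treatment equation
controls = I1 | I2                  # union used in the final OLS step
```

Selecting on the outcome equation alone would drop control 1, although it is strongly related to the treatment; the union of both selected sets retains it.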

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}, \theta^{h}_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian Kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

32 For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

$$c^{*} = \underset{c \in \mathbb{R}^{D}}{\operatorname{argmin}} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
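The closed-form solution follows from the first-order condition of the penalized criterion in (G3). The short derivation below assumes that $K$ is symmetric and invertible, which holds for a Gaussian kernel matrix built from distinct observations:

```latex
\frac{\partial}{\partial c}\left[(y - Kc)'(y - Kc) + \lambda c'Kc\right]
  = -2K(y - Kc) + 2\lambda Kc = 0
\;\Longrightarrow\;
K\left[(K + \lambda I)c - y\right] = 0
\;\Longrightarrow\;
c^{*} = (K + \lambda I)^{-1} y
```

A larger $\lambda$ shrinks the weights $c$ and thereby produces a smoother fitted surface.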

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

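$$\frac{\partial y}{\partial z^{U_d}_{j}} = -\frac{2}{\sigma^2} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G4}$$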

These yield as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interest of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the U.S. Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the U.S. Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

48

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, D.C.: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

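members is the following linear probability specification:

\[
\begin{aligned}
y_{ijd} ={}& \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}\,(U_d \times \theta^{l}_{jd}) + \eta^{h}\,(U_d \times \theta^{h}_{jd}) \\
&+ \beta^{l}\,(X_d \times \theta^{l}_{jd}) + \beta^{h}\,(X_d \times \theta^{h}_{jd}) + \alpha_d + \epsilon_{ijd}
\end{aligned}
\]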

The key terms here are the interactions between union membership and the respective preferences of the affluent and the poor, $U_d\theta^{h}_{jd}$ and $U_d\theta^{l}_{jd}$. Thus, when $\eta^{l}$ and $\eta^{h}$ are zero, the group-specific preference coefficients $\mu^{l}$ and $\mu^{h}$ indicate the change in the probability of legislators casting a supportive vote induced by a standard deviation change in the respective preferences of the poor and the affluent. The coefficient $\eta^{l}$ indicates the marginal effect of a standard deviation change in logged union membership on the responsiveness of legislators' votes to the preferences of the poor. The corresponding marginal effect for the affluent is given by $\eta^{h}$. Our theoretical expectation is that $\eta^{l} > 0$ and $\eta^{h} \le 0$.
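To make the specification concrete, the following minimal sketch simulates roll-call votes from a model of this form and recovers the coefficients with a within-district (fixed effects) transformation and district-clustered standard errors. It is an illustration under stated assumptions, not the estimation code used in the paper: the simulated data, the "true" coefficient values, and the helper names `within_demean` and `fit_lpm_fe` are hypothetical, the controls $X_d$ are omitted, and no small-sample correction is applied to the clustered covariance matrix.

```python
import numpy as np

def within_demean(a, groups):
    """Subtract group means: the fixed-effects 'within' transformation."""
    a = np.asarray(a, dtype=float)
    sums = np.zeros((groups.max() + 1,) + a.shape[1:])
    np.add.at(sums, groups, a)
    counts = np.bincount(groups).reshape((-1,) + (1,) * (a.ndim - 1))
    return a - (sums / counts)[groups]

def fit_lpm_fe(y, theta_l, theta_h, union, district):
    """LPM with district fixed effects; returns (mu_l, mu_h, eta_l, eta_h) and SEs.

    The main effect of union membership is constant within a district and is
    therefore absorbed by the fixed effects; only its interactions enter.
    """
    X = np.column_stack([theta_l, theta_h, union * theta_l, union * theta_h])
    Xw, yw = within_demean(X, district), within_demean(y, district)
    coef, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ coef
    # Cluster-robust sandwich: sum the scores within each district first.
    bread = np.linalg.inv(Xw.T @ Xw)
    scores = np.zeros((district.max() + 1, X.shape[1]))
    np.add.at(scores, district, Xw * resid[:, None])
    cov = bread @ (scores.T @ scores) @ bread
    return coef, np.sqrt(np.diag(cov))

# Hypothetical simulated data: 300 districts, 27 roll calls each.
rng = np.random.default_rng(0)
n_districts, n_votes = 300, 27
district = np.repeat(np.arange(n_districts), n_votes)
union = rng.uniform(-1, 1, n_districts)[district]        # U_d
alpha = rng.uniform(-0.05, 0.05, n_districts)[district]  # alpha_d
theta_l = rng.uniform(-1, 1, district.size)              # preferences of the poor
theta_h = rng.uniform(-1, 1, district.size)              # preferences of the affluent
prob = (0.5 + alpha + 0.02 * theta_l + 0.14 * theta_h
        + union * (0.10 * theta_l - 0.06 * theta_h))      # assumed true model
y = (rng.uniform(size=district.size) < prob).astype(float)

coef, se = fit_lpm_fe(y, theta_l, theta_h, union, district)
for name, b, s in zip(["mu_l", "mu_h", "eta_l", "eta_h"], coef, se):
    print(f"{name}: {b:.3f} ({s:.3f})")
```

In this simulation the third and fourth coefficients play the role of $\eta^{l}$ and $\eta^{h}$; with the assumed values they should come out positive and negative, respectively, up to sampling noise.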

In order to mitigate the influence of unobserved confounders affecting legislators' voting behavior, we account for time-constant unobservables on the district level by including district fixed effects, $\alpha_d$.¹⁶ Despite this, one may be worried that changes in responsiveness attributed to unions are spurious. To provide a stricter test of the moderating effect of unions, we include the interactions between controls (both on the district and state level) and group preferences, $X_d\theta^{l}_{jd}$ and $X_d\theta^{h}_{jd}$. They use within-district variation over roll calls and preferences to estimate the conditional marginal effect of group preferences, making it less likely that our estimated effect of union membership is simply due to omitted confounders. In more sophisticated analyses detailed below, we allow these confounds to be strongly non-linear as well. Finally, $\epsilon_{ijd}$ are white-noise errors assumed independent of covariates. We account for heteroscedasticity and arbitrary within-district correlations when calculating standard errors (Abadie et al. 2017; Cameron and Miller 2015, 324).

IV Results

Before presenting evidence on the moderating effect of unions, we want to give a sense of the overall picture of legislators' responsiveness emerging from our data. Estimating a model as described above with district fixed effects, but without accounting for local union organization (setting $\beta^{l}$, $\beta^{h}$ and $\eta^{l}$, $\eta^{h}$ to zero) or any other moderators, we find a clear gap in the responsiveness of legislators to the preferences of low- versus high-income individuals. A standard deviation increase in the preferences of the affluent is linked to an increase of 13.6 (±1.2) percentage points in the probability that legislators cast a corresponding vote. In contrast, a standard deviation increase in the preferences of the less well-off induces a much smaller change in legislators' behavior of 1.6 (±1.4) percentage points. With a confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators' non-responsiveness depends crucially on the strength of local unions.

¹⁶ Note that non-interacted effects of district-level union membership and covariates (which vary between districts but are constant over roll calls) are absorbed in $\alpha_d$.
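As a rough check on the interval for low-income responsiveness, the short sketch below recomputes a 95% interval from the reported point estimate. It assumes that the ± values are standard errors and that a normal approximation applies; neither is stated explicitly in the text, and the function name `normal_ci` is hypothetical. Because the published inputs are rounded, small deviations from the reported bounds are to be expected.

```python
def normal_ci(estimate, std_error, z=1.96):
    """Two-sided normal-approximation confidence interval."""
    return estimate - z * std_error, estimate + z * std_error

# Point estimates and (assumed) standard errors in percentage points.
low_ci = normal_ci(1.6, 1.4)     # less well-off
high_ci = normal_ci(13.6, 1.2)   # affluent

print("low income :", tuple(round(v, 2) for v in low_ci))
print("high income:", tuple(round(v, 2) for v in high_ci))
print("low-income interval covers zero:", low_ci[0] < 0 < low_ci[1])
```

Under these assumptions the low-income interval covers zero while the high-income interval does not, which matches the conclusion drawn in the text.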

IV.A Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives' roll-call votes at varying levels of union membership with 95% confidence intervals.¹⁷ It shows that legislators' responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators' responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1) we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preferences-confounder interactions (setting $\beta^{l}$ and $\beta^{h}$ to zero). We find that a standard deviation increase in district union membership increases legislators' responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.
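Under the linear interaction specification, legislators' responsiveness to a group at union level $U$ is $\mu + \eta U$, so a change in union membership of $\Delta U$ standard deviations shifts responsiveness by $\eta \times \Delta U$. The sketch below illustrates this arithmetic with the baseline estimates from specification (1) in Table I. The function names are hypothetical, and the spacing of 0.75 standard deviations between two districts is an assumed value chosen only for illustration; the actual distance between percentiles of union membership is not reported in this section.

```python
def conditional_marginal_effect(mu, eta, union_std):
    """Responsiveness to a group's preferences at a given (standardized) union level."""
    return mu + eta * union_std

def change_in_responsiveness(eta, union_from, union_to):
    """Shift in responsiveness when union membership moves between two levels."""
    return eta * (union_to - union_from)

ETA_LOW, ETA_HIGH = 0.106, -0.063   # Table I, specification (1)
DELTA_FROM, DELTA_TO = 0.0, 0.75    # assumed illustrative union levels (in SDs)

change_low = change_in_responsiveness(ETA_LOW, DELTA_FROM, DELTA_TO)
change_high = change_in_responsiveness(ETA_HIGH, DELTA_FROM, DELTA_TO)
print(f"low income : {100 * change_low:+.1f} percentage points")
print(f"high income: {100 * change_high:+.1f} percentage points")
```

With the assumed spacing, the implied shifts are of the same order as the changes between the median and the 75th percentile described in the text.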

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018; Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta^{l}$ and $\theta^{h}$. In specification (3) we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

¹⁷ Calculated from a LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors on the district level. See also specification (1) in Table I below.

[Figure III: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], from −1.6 to 1.6, with the sample percentiles p10, p25, p50, p75 and p90 marked; y-axis: Marginal effect, from −0.4 to 0.4.]

Figure III. District-level union membership as moderator of unequal representation

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

Table I. Union density and representation: Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote

                                  (1)      (2)      (3)      (4)      (5)      (6)
Low income preferences           0.106    0.082    0.098    0.084    0.068    0.062
                                (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences         −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                                (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)
District fixed effects             X        X        X        X        X        X
Group preferences
  × union policy                            X                                   X
  × state constants                                  X
  × organizational capacity                                   X                 X
  × district covariates                                                X        X

Note: N = 15,780; N_d = 534; 27 roll call votes; 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta^{l}$ and $\eta^{h}$. Specifications (2) to (5) include coefficients for the interaction ($\beta^{l}$, $\beta^{h}$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements; (3) interacts preferences with state fixed effects; (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections; (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).¹⁸ We use the NLRB's database to extract all attempts to certify (or de-certify) a local union.¹⁹ We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

¹⁸ Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59%.

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5) we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.²⁰ Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

¹⁹ There are about 2200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

²⁰ Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.

IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
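One common way to move county-level data to congressional districts with population-based weights can be sketched as follows. This is a generic illustration, not the procedure documented in the appendix: the function names, the overlap table, and the toy numbers are all hypothetical. It distinguishes counts (such as the number of establishments), which are split across districts in proportion to where a county's population lives, from indices or rates, which are averaged with population weights.

```python
from collections import defaultdict

def interpolate_counts(county_counts, overlap_pop):
    """Allocate county counts to districts in proportion to overlapping population.

    overlap_pop maps (county, district) -> population living in that overlap.
    """
    county_pop = defaultdict(float)
    for (county, _), pop in overlap_pop.items():
        county_pop[county] += pop
    district_totals = defaultdict(float)
    for (county, district), pop in overlap_pop.items():
        district_totals[district] += county_counts[county] * pop / county_pop[county]
    return dict(district_totals)

def interpolate_index(county_index, overlap_pop):
    """Population-weighted district average of a county-level index or rate."""
    numer, denom = defaultdict(float), defaultdict(float)
    for (county, district), pop in overlap_pop.items():
        numer[district] += county_index[county] * pop
        denom[district] += pop
    return {d: numer[d] / denom[d] for d in numer}

# Toy example: county A is split 60/40 between districts d1 and d2,
# county B lies entirely in d2.
overlap = {("A", "d1"): 60.0, ("A", "d2"): 40.0, ("B", "d2"): 100.0}
district_counts = interpolate_counts({"A": 10, "B": 5}, overlap)
district_index = interpolate_index({"A": 1.0, "B": 0.0}, overlap)
print(district_counts)
print(district_index)
```

The choice between the two functions matters: allocating an index as if it were a count would make a district's value depend on how many counties it touches rather than on its residents.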

Table II. Robustness tests: Marginal effects of union membership on differential legislative responsiveness under alternative specifications

                                        Low income        High income
(1a) Social capital: bowling alleys     0.067 (0.014)     −0.051 (0.013)
(1b) Social capital index               0.065 (0.014)     −0.048 (0.013)
(2)  Redistricting                      0.067 (0.014)     −0.051 (0.013)
(3)  MRP estimated preferences          0.115 (0.022)     −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for $\eta^{l}$ and $\eta^{h}$. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital, the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts; N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred; N = 14,150. Specification (3) uses preferences estimated using MRP; see Appendix B for more details; N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far we have mainly studied the robustness of our results by adding potential confounders. In this subsection we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.²¹ The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
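The logic of double selection can be sketched in a few lines. The example below is a simplified illustration under stated assumptions rather than the estimator used in the paper: it uses a plain LASSO with a fixed, hand-picked penalty (not the root-LASSO), a single scalar "treatment" instead of the union-preference interactions, no sample splitting, and simulated data. The function names `lasso_cd` and `post_double_selection` are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO: minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                  # drop coordinate j from the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]                  # add the updated coordinate back
    return b

def post_double_selection(y, d, W, lam_y, lam_d):
    """Select controls that predict the outcome OR the treatment, then run OLS."""
    Ws = (W - W.mean(axis=0)) / W.std(axis=0)
    sel_y = np.flatnonzero(lasso_cd(Ws, y - y.mean(), lam_y))   # outcome equation
    sel_d = np.flatnonzero(lasso_cd(Ws, d - d.mean(), lam_d))   # treatment equation
    keep = np.union1d(sel_y, sel_d)
    Z = np.column_stack([np.ones(len(y)), d, Ws[:, keep]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)                # post-LASSO OLS
    return coef[1], keep

# Hypothetical data: 50 candidate controls, of which W0 confounds the effect of d on y.
rng = np.random.default_rng(2)
n, p = 1000, 50
W = rng.normal(size=(n, p))
d = 0.8 * W[:, 0] + 0.5 * W[:, 1] + rng.normal(size=n)
y = 0.5 * d + 1.0 * W[:, 0] + 0.3 * W[:, 2] + rng.normal(size=n)

pds_effect, keep = post_double_selection(y, d, W, lam_y=0.1, lam_d=0.1)
naive_effect = np.polyfit(d, y, 1)[0]   # ignores all controls
print(f"naive: {naive_effect:.2f}, post-double-selection: {pds_effect:.2f} (assumed truth: 0.50)")
print("selected controls:", keep)
```

Taking the union of the two selected sets is the step that protects against omitting a control that matters for the treatment but is only weakly related to the outcome, or vice versa.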

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2) we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

²¹ The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III. Post-double-selection estimator: Marginal effect of unionization on legislative responsiveness to low and high income groups

                               (1)       (2)       (3)
Low income preferences        0.063     0.066     0.062
                             (0.014)   (0.017)   (0.016)
High income preferences      −0.054    −0.036    −0.040
                             (0.013)   (0.015)   (0.016)
Semi-parametric terms          144       312       624
Post-LASSO terms                18        45       112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on a 50% sample, parameter estimates performed on the remaining 50% (N = 7,884). Table entries are estimates for $\eta^{l}$ and $\eta^{h}$ with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our $\eta$ terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.²² The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
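The idea of detecting an interaction through pointwise derivatives, without specifying a product term, can be sketched with a bare-bones Gaussian-kernel regularized least squares fit. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the bandwidth `sigma2` and the regularization parameter `lam` are fixed by hand here (their selection is described in the appendix), the data are simulated, and the function names are hypothetical.

```python
import numpy as np

def krls_fit(X, y, sigma2, lam):
    """Fit y ~ K c with Gaussian kernel K_ij = exp(-||x_i - x_j||^2 / sigma2)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dist / sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)   # ridge-type solution for the weights
    return K, c

def krls_pointwise_derivatives(X, K, c, sigma2, dim):
    """Derivative of the fitted surface with respect to covariate `dim`, at each observation.

    d yhat(x_j) / d x_j[dim] = (-2 / sigma2) * sum_i c_i * K_ji * (x_j[dim] - x_i[dim])
    """
    diff = X[:, None, dim] - X[None, :, dim]           # diff[j, i] = x_j - x_i
    return (-2.0 / sigma2) * (K * diff) @ c

# Hypothetical data: the effect of preferences on the outcome grows with union membership.
rng = np.random.default_rng(1)
n = 300
pref = rng.normal(size=n)
union = rng.normal(size=n)
y = 0.5 * pref * union + 0.1 * rng.normal(size=n)
X = np.column_stack([pref, union])

K, c = krls_fit(X, y, sigma2=2.0, lam=0.1)             # assumed tuning values
d_pref = krls_pointwise_derivatives(X, K, c, sigma2=2.0, dim=0)

# If union membership moderates the effect of preferences, the pointwise
# derivatives with respect to preferences should vary with union membership.
corr = np.corrcoef(d_pref, union)[0, 1]
print(f"correlation between d yhat / d pref and union membership: {corr:.2f}")
```

No product term enters the fit, yet the pointwise derivatives with respect to preferences track union membership in the simulated data, which is the kind of pattern a smoothed plot of partial effects against the moderator is meant to reveal.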

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, “Low income constituents” and “High income constituents”. x-axis: Union membership [std], with percentiles p10, p25, p50, p75, p90 marked; y-axis: Partial effect.]

Figure IV
Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

²² See Appendix G for details on the approach and parameter selection.

However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions might be too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for “public” and the remaining “non-public” unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining “non-public” unions is slightly smaller. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV
Effect heterogeneity: Marginal effects of unionization on legislative responsiveness to low and high income groups

                                Low income         High income

(A) Private vs. public unions
    Public unions               0.074 (0.016)      -0.058 (0.015)
    Non-public unions           0.054 (0.016)      -0.027 (0.016)

(B) Bill ideology
    Conservative bill           0.086 (0.017)      -0.086 (0.018)
    Liberal bill                0.052 (0.014)      -0.028 (0.013)

(C) AFL-CIO endorsement
    No position                 0.054 (0.014)      -0.054 (0.013)
    Endorsement                 0.077 (0.015)      -0.040 (0.014)

Note: Estimates for η_L and η_H with cluster-robust standard errors in parentheses. N=15780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.²³ AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).²⁴ The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

²³ Taken from the AFL-CIO “legislative scorecard”: https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

²⁴ For high-income preferences, the estimate for η_H is smaller for endorsed bills but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the “labor” sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
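The retransformation from a log-scale regression to dollar amounts can be illustrated with Duan's (1983) smearing estimator, which multiplies the naive back-transformed prediction by the mean of the exponentiated residuals. The sketch below uses a one-regressor OLS fit on made-up numbers; the data, the function names, and the single-covariate setup are assumptions for illustration and do not reproduce the specification behind Table V.

```python
import math

def ols_fit(x, y):
    """Intercept and slope of a one-regressor least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def smearing_factor(residuals):
    """Duan's factor: the average of exp(residual) on the log scale."""
    return sum(math.exp(r) for r in residuals) / len(residuals)

def predict_level(x_new, intercept, slope, smear):
    """Expected outcome on the original (dollar) scale."""
    return math.exp(intercept + slope * x_new) * smear

# Hypothetical data: standardized union membership and log contributions.
union_std = [-1.5, -0.5, 0.5, 1.5]
log_contrib = [9.9, 10.0, 10.1, 10.15]

a, b = ols_fit(union_std, log_contrib)
resid = [y - (a + b * x) for x, y in zip(union_std, log_contrib)]
smear = smearing_factor(resid)

# Dollar effect of moving from the mean to one standard deviation above it.
dollar_effect = predict_level(1.0, a, b, smear) - predict_level(0.0, a, b, smear)
```

Because the smearing factor does not rely on normally distributed errors, it is a common choice when the residual distribution on the log scale is unknown.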

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V
Labor contributions and selection of Democratic legislators

                                     (1)              (2)              (3)              (4)

A. Contributions channel
                                     DV: Contrib.                      DV: roll call
Union membership                     0.056 (0.012)    0.046 (0.014)
Contributions × low income prefs                                       0.946 (0.036)    0.865 (0.034)
Contributions × high income prefs                                     -0.735 (0.029)   -0.714 (0.031)

B. Selection channel
                                     DV: Democrat                      DV: roll call
Union membership                     0.161 (0.024)    0.106 (0.023)
Democrat × low income prefs                                            0.576 (0.012)    0.542 (0.015)
Democrat × high income prefs                                          -0.411 (0.013)   -0.423 (0.015)

District controls                                     X                                 X

Note: Standard errors in parentheses. Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between district preferences and Democratic representative. N=15776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked: who governs in a polity where political rights are equally distributed, but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).

Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).

Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds, separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
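The classification step can be sketched as follows, assuming district-specific thresholds at the 33rd and 67th percentiles (as labeled in Table A2) computed with simple linear interpolation. The incomes, the district label, and the tie-breaking rule at the thresholds are hypothetical choices for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def district_thresholds(incomes_by_district):
    """Map each district to its (lower, upper) income cut points."""
    return {
        d: (percentile(v, 0.33), percentile(v, 0.67))
        for d, v in incomes_by_district.items()
    }

def income_group(income, thresholds):
    """Assign a respondent to the low, middle, or high income group."""
    lower, upper = thresholds
    if income <= lower:
        return "low"
    if income > upper:
        return "high"
    return "middle"

# Hypothetical district with household incomes from $1,000 to $100,000.
acs_incomes = {"D1": [1000 * k for k in range(1, 101)]}
cuts = district_thresholds(acs_incomes)
groups = [income_group(x, cuts["D1"]) for x in (20000, 50000, 90000)]
```

Because the thresholds are district-specific, the same household income can fall into different groups in different districts.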

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include: the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.
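A name-based classification of this kind could be sketched as below. The fragment list is our own guess derived from the union names above, and the record layout (district, union name, members) is hypothetical; the actual matching rules used to build the data are not documented in this section.

```python
from collections import defaultdict

# Hypothetical name fragments, loosely derived from the list of public unions.
PUBLIC_NAME_FRAGMENTS = (
    "state county", "education association", "federation of teachers",
    "government employees", "public service employees", "treasury employees",
    "postal", "letter carriers", "patent office", "labor relations board",
    "fire fighters", "police", "public school",
)

def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop the word 'and'."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(w for w in cleaned.split() if w != "and")

def union_type(name):
    """Return 'public' if any fragment occurs in the normalized name."""
    text = normalize(name)
    return "public" if any(f in text for f in PUBLIC_NAME_FRAGMENTS) else "non-public"

def membership_by_type(records):
    """Sum members per district separately for public and non-public unions."""
    totals = defaultdict(lambda: {"public": 0, "non-public": 0})
    for district, name, members in records:
        totals[district][union_type(name)] += members
    return dict(totals)

# Hypothetical local union records.
records = [
    ("NY-01", "American Federation of Teachers Local 2", 1200),
    ("NY-01", "United Auto Workers Local 600", 800),
    ("NY-01", "Fraternal Order of Police Lodge 7", 300),
]
counts = membership_by_type(records)
```

A rule like this is only approximate, since locals with mixed public and private membership are assigned wholly to one category.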

Table A1
Matched CCES–House roll calls included in our analysis

Match  Bill       Date        House vote  Bill       Name
                              (Yea-Nay)   ideology†
(1)    HR 810     07/19/2006  235-193     L          Stem Cell Research Enhancement Act (Presidential Veto override)
(1)    HR 3       01/11/2007  253-174     L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5        06/07/2007  247-176     L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956    07/12/2007  223-201     L          Responsible Redeployment from Iraq Act
(3)    HR 2       01/10/2007  315-116     L          Fair Minimum Wage Act
(4)    HR 4297    12/08/2005  234-197     C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297    05/10/2006  244-185     C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045    07/28/2005  217-215     C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927     08/04/2007  227-183     C          Protect America Act
(6)    HR 6304    06/20/2008  293-129     C          FISA Amendments Act of 2008
(7)    HR 3162    08/01/2007  225-204     L          Children's Health and Medicare Protection Act
(7)    HR 976     10/18/2007  273-156     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963    01/23/2008  260-152     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2       02/04/2009  290-135     L          Children's Health Insurance Program Reauthorization Act
(8)    HR 3221    07/23/2008  272-152     L          Foreclosure Prevention Act of 2008
(9)    HR 3688    11/08/2007  285-132     C          United States-Peru Trade Promotion Agreement
(10)   HR 1424    10/03/2008  263-171     L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080    10/12/2011  278-151     C          To implement the United States-Korea Trade Agreement
(12)   HR 3078    10/12/2011  262-167     C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346    06/16/2009  226-202     L          Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report)
(14)   HR 2831    07/31/2007  225-199     L          Lilly Ledbetter Fair Pay Act
(14)   HR 11      01/09/2009  247-171     L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181      01/27/2009  250-177     L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913    04/29/2009  249-175     L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1       02/13/2009  246-183     L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454    06/26/2009  219-212     L          American Clean Energy and Security Act
(18)   HR 3590    03/21/2010  220-212     L          Patient Protection and Affordable Care Act
(19)   HR 3962    11/07/2009  221-215     L          Affordable Health Care for America Act
(20)   HR 4173    06/30/2010  237-192     L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965    12/15/2010  250-175     L          Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365      08/01/2011  269-161     C          Budget Control Act of 2011
(23)   H CR 34    04/15/2011  235-193     C          House Budget Plan of 2011
(24)   H CR 112   03/28/2012  38-382      C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8       08/01/2012  170-257     L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2       01/19/2011  245-189     C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079    07/11/2012  244-185     C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938    07/26/2011  279-147     C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.

Table A2
Distribution of district income-group reference points: Average threshold over all districts, smallest and largest value

            33rd percentile               67th percentile
Congress    Mean     Min      Max         Mean     Min      Max
109         38123    16800    73675       77964    39612    146870
110         40127    18000    77000       83047    43600    155113
111         39021    17500    78262       82440    46000    160050
112         37381    16500    81000       79868    38500    158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3
Descriptive statistics of analysis sample

                                     Mean     SD       Min      Max      N
Roll-call vote: yea                  0.568    0.495    0.000    1.000    15780
Constituent preferences
  Low income                         0.593    0.220    0.047    0.979    15934
  High income                        0.555    0.198    0.037    0.967    15934
  Low-High Gap                       0.172    0.121    0.000    0.588    15934
Union membership [log]               9.705    1.046    6.094    13.619   15934
Population                           7.022    0.723    4.697    9.980    15934
Share African American               0.124    0.146    0.004    0.680    15934
Share Hispanic                       0.156    0.174    0.005    0.812    15934
Share BA or higher                   0.275    0.097    0.073    0.645    15934
Median income [$10,000]              5.177    1.356    2.282    10.439   15934
Share female                         0.508    0.010    0.462    0.543    15934
Manufacturing share                  0.110    0.047    0.025    0.281    15934
Urbanization                         0.790    0.199    0.213    1.000    15934
Certification elections [log]        3.347    0.861    0.000    5.100    15934
Congregations [per 1000 persons]     0.765    1.147    0.062    6.453    15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).

B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.²⁵

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$, which are only observed in the CCES, to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
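The train, predict, and average sequence can be sketched in a few lines. To keep the example dependency-free, the learner below is a deliberately simple stand-in (the mean preference within each covariate cell) and not a random forest; all data, names, and the fallback rule for unseen cells are hypothetical.

```python
from collections import defaultdict

def fit_cell_means(cces_rows):
    """Stand-in learner (NOT a random forest): mean preference per covariate cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for z, y in cces_rows:
        sums[z][0] += y
        sums[z][1] += 1
    overall = sum(s for s, _ in sums.values()) / sum(n for _, n in sums.values())
    cell_mean = {z: s / n for z, (s, n) in sums.items()}
    # Cells never seen in the survey fall back to the overall mean.
    return lambda z: cell_mean.get(z, overall)

def district_preferences(census_rows, predict):
    """Average predicted preferences by (district, income group)."""
    acc = defaultdict(lambda: [0.0, 0])
    for district, group, z in census_rows:
        acc[(district, group)][0] += predict(z)
        acc[(district, group)][1] += 1
    return {key: s / n for key, (s, n) in acc.items()}

# Hypothetical survey rows: (covariates Z, roll call preference Y).
cces = [(("young", "BA"), 1), (("young", "BA"), 0), (("old", "HS"), 1)]
# Hypothetical Census rows: (district, income group, covariates Z).
census = [
    ("D1", "low", ("old", "HS")),
    ("D1", "low", ("young", "BA")),
    ("D1", "high", ("young", "BA")),
]

predict = fit_cell_means(cces)
prefs = district_preferences(census, predict)
```

Swapping the stand-in for a flexible learner changes only `fit_cell_means`; the averaging step over the district-representative rows stays the same.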

²⁵ See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

[Figure B1: schematic of the stacked data matrix. Rows are units 1, ..., m (CCES) followed by units m+1, ..., m+n (Census). Columns are covariates Z_i1, ..., Z_iK and preferences Y_i1, ..., Y_iP. Preferences are observed for CCES rows and missing (NA) for Census rows. A random forest is trained on the CCES block, Y_p = f(Z), and used to predict the Census block, Y*_p = f(Z).]

Figure B1
Illustration of Small Area Estimation of District Preferences

We use a sample of m individuals from the CCES that is not necessarily representative on the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The ‘donor pool’ for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
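One way to picture the construction of the synthetic district file is the following sketch, which draws from each PUMA's donor pool a number of individuals proportional to the share of that PUMA falling into a district. The crosswalk shares, identifiers, and the rule of sampling without replacement within each (PUMA, district) pair are assumptions for illustration; the exact sampling scheme used for the paper is not spelled out here.

```python
import random

def synthetic_district_sample(persons_by_puma, crosswalk, seed=0):
    """Draw, for each (PUMA, district) pair, a share-proportional random subset."""
    rng = random.Random(seed)
    sample = {}
    for (puma, district), share in crosswalk.items():
        pool = persons_by_puma[puma]
        k = round(share * len(pool))          # number of donors for this pair
        sample.setdefault(district, []).extend(rng.sample(pool, k))
    return sample

# Hypothetical donor pools (person identifiers) and crosswalk shares.
persons_by_puma = {"P1": list(range(100)), "P2": list(range(100, 150))}
crosswalk = {("P1", "D1"): 0.6, ("P1", "D2"): 0.4, ("P2", "D2"): 1.0}

districts = synthetic_district_sample(persons_by_puma, crosswalk)
sizes = {d: len(people) for d, people in districts.items()}
```

Note that, in this simplified version, draws for different districts are independent, so the same donor can appear in more than one district's sample.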

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.²⁶ We transform this discretized measure of income into a continuous one using a nonparametric midpoint

²⁶ The exact question wording is: “Thinking back over the last year, what was your family's annual income?” The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the

Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
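A minimal sketch of a midpoint Pareto scoring rule is given below. It assumes the common variant in which the Pareto shape parameter is estimated from the counts and lower bounds of the last two bins, and the open-ended bin is assigned the implied Pareto mean; whether this is the exact variant applied to the CCES bins is an assumption. The bins and counts are made up and do not correspond to the 14 CCES categories.

```python
import math

def pareto_top_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail fitted to two bins."""
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    return lower_top * alpha / (alpha - 1)

def bin_scores(bins, counts):
    """Midpoints for closed bins, Pareto mean for the final open-ended bin.

    bins: list of (lower, upper) bounds with upper exclusive; the last bin
    has upper=None.
    """
    scores = [(lo + hi) / 2 for lo, hi in bins[:-1]]
    scores.append(
        pareto_top_mean(bins[-2][0], bins[-1][0], counts[-2], counts[-1])
    )
    return scores

# Hypothetical income bins and respondent counts.
bins = [(0, 10000), (10000, 20000), (20000, 30000), (30000, None)]
counts = [30, 40, 20, 10]
scores = bin_scores(bins, counts)
```

With these bounds, the third bin receives a score of 25000, mirroring the example in the text, while the top bin receives a value above its lower bound of 30000.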

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:²⁷

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion c (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1: Start with initial guesses of missing values in $D$
 2: $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3: while not $c$ do
 4:     $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5:     for $v$ in $w$ do
 6:         Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7:         Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8:         $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9:     Update stopping criterion $c$
10: Return completed $D^{imp}$
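The loop in Algorithm 1 can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: a one-nearest-neighbour predictor (`nearest_neighbour`, a hypothetical stand-in) replaces the random forest so that the example has no dependencies, all variables are numeric, initial guesses are column means, and the stopping rule is "no change in filled-in values" up to a tolerance `tol` or after `max_iter` sweeps.

```python
def nearest_neighbour(X_obs, y_obs, X_mis):
    """Stand-in learner: predict with the closest observed row.
    A random forest would be plugged in here instead."""
    preds = []
    for x in X_mis:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xo)) for xo in X_obs]
        preds.append(y_obs[dists.index(min(dists))])
    return preds

def chained_impute(D, fit_predict=nearest_neighbour, max_iter=10, tol=1e-8):
    """Fill None entries of a numeric table D (list of rows) column by column."""
    n_rows, n_cols = len(D), len(D[0])
    missing = {v: [i for i in range(n_rows) if D[i][v] is None]
               for v in range(n_cols)}
    # Step 1: initial guesses (mean of the observed values in each column).
    imp = [row[:] for row in D]
    for v, rows in missing.items():
        observed = [D[i][v] for i in range(n_rows) if D[i][v] is not None]
        for i in rows:
            imp[i][v] = sum(observed) / len(observed)
    # Step 2: visit columns by increasing number of missing values.
    order = sorted((v for v in missing if missing[v]),
                   key=lambda v: len(missing[v]))
    for _ in range(max_iter):                      # Step 3: until c is met
        old = [row[:] for row in imp]              # Step 4
        for v in order:                            # Step 5
            mis = set(missing[v])
            others = [k for k in range(n_cols) if k != v]
            X_obs = [[imp[i][k] for k in others]
                     for i in range(n_rows) if i not in mis]
            y_obs = [imp[i][v] for i in range(n_rows) if i not in mis]
            X_mis = [[imp[i][k] for k in others] for i in missing[v]]
            preds = fit_predict(X_obs, y_obs, X_mis)   # Steps 6 and 7
            for i, pred in zip(missing[v], preds):     # Step 8
                imp[i][v] = pred
        change = sum(abs(imp[i][v] - old[i][v])
                     for v in order for i in missing[v])
        if change < tol:                           # Step 9
            break
    return imp                                     # Step 10

# Hypothetical toy data: two numeric variables with one gap each.
D = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [None, 8.0], [5.0, 10.0]]
D_imp = chained_impute(D)
```

Observed entries are never overwritten; only the originally missing cells are updated in each sweep.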

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age comprises 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income comprises 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
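The weighting step amounts to a population-weighted average of cell predictions. A minimal sketch, with hypothetical cell labels, predictions and counts (the real cells cross all demographic categories within a district and income group):

```python
def poststratify(cell_predictions, cell_counts):
    """Population-weighted average of cell-level predicted support.

    cell_predictions : dict mapping cell -> predicted probability of support
    cell_counts      : dict mapping cell -> population count from the Census file
    """
    total = sum(cell_counts.values())
    return sum(cell_predictions[cell] * n for cell, n in cell_counts.items()) / total

# Hypothetical two-cell district.
predictions = {("white", "female"): 0.60, ("black", "male"): 0.80}
counts = {("white", "female"): 300, ("black", "male"): 100}
district_support = poststratify(predictions, counts)
```

Because the weights come from the Census rather than from the survey, a cell that is over-represented among survey respondents does not dominate the district estimate.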

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1
Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)      (2)      (3)      (4)      (5)      (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences        0.106    0.082    0.098    0.084    0.068    0.062
                             (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences      −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                             (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: Multilevel Regression & Poststratification
Low income preferences        0.182    0.158    0.181    0.162    0.115    0.115
                             (0.021)  (0.024)  (0.026)  (0.020)  (0.022)  (0.022)
High income preferences      −0.136   −0.119   −0.139   −0.122   −0.091   −0.091
                             (0.017)  (0.019)  (0.021)  (0.017)  (0.018)  (0.018)

C: Raw CCES means
Low income preferences        0.080    0.061    0.063    0.072    0.043    0.045
                             (0.010)  (0.011)  (0.012)  (0.010)  (0.011)  (0.011)
High income preferences      −0.027   −0.013   −0.010   −0.027   −0.018   −0.024
                             (0.008)  (0.008)  (0.008)  (0.008)  (0.008)  (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as ‘low income’ relative to other voters in their congressional district. For example, during the 111th Congress a voter with an income of $40,000 would be part of the low income group in most of Massachusetts’ districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effect estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C) we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
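How the same income can fall into different groups depending on the reference population can be shown with a small sketch. The incomes and district labels are hypothetical, and both the linear-interpolation percentile and the treatment of incomes exactly at a cut-off are assumptions made for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def thresholds(incomes, low_q=33, high_q=66):
    """Cut-offs of the reference income distribution (district or state)."""
    return percentile(incomes, low_q), percentile(incomes, high_q)

def income_group(income, low_cut, high_cut):
    """Classify one income relative to the two cut-offs."""
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical income distributions for two districts.
district_incomes = {
    "A": [20_000, 35_000, 45_000, 60_000, 80_000, 120_000, 150_000],
    "B": [15_000, 22_000, 30_000, 41_000, 55_000, 70_000, 90_000],
}
groups = {d: income_group(40_000, *thresholds(v))
          for d, v in district_incomes.items()}
```

In this toy example a $40,000 income is "low" in the richer district A and "middle" in district B. Passing `low_q=20, high_q=80` to `thresholds` gives the narrower groups used in panel (C).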


Table C1
Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)      (2)      (3)      (4)      (5)      (6)

A: District-specific income thresholds
Low income preferences        0.106    0.082    0.098    0.084    0.068    0.062
                             (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences      −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                             (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: State-specific income thresholds
Low income preferences        0.105    0.082    0.097    0.083    0.067    0.062
                             (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences      −0.062   −0.036   −0.052   −0.050   −0.049   −0.039
                             (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences        0.098    0.077    0.09     0.078    0.063    0.057
                             (0.012)  (0.013)  (0.014)  (0.012)  (0.013)  (0.013)
High income preferences      −0.054   −0.031   −0.046   −0.044   −0.044   −0.034
                             (0.011)  (0.012)  (0.012)  (0.011)  (0.012)  (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily.”

We use the NLRB’s database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1
Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 0–13, 13–26, 26–39, 39–62, 62–164.]

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
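A minimal sketch of population-weighted interpolation follows. It assumes that each weight is the share of a county's population living in a given district (so the weights of one county sum to one); the county names, counts and weights are hypothetical, and the authors' exact weighting formula may differ.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts.

    county_counts : dict mapping county -> congregation count
    weights       : dict mapping (county, district) -> share of the county's
                    population that lives in the district (assumed definition)
    """
    district_totals = defaultdict(float)
    for (county, district), share in weights.items():
        district_totals[district] += share * county_counts[county]
    return dict(district_totals)

# Hypothetical example: county "A" is split across two districts,
# county "B" lies entirely within district 2.
county_counts = {"A": 100, "B": 50}
weights = {("A", 1): 0.75, ("A", 2): 0.25, ("B", 2): 1.0}
district_counts = interpolate_to_districts(county_counts, weights)
```

When every county is nested in a single district, all shares equal one and the function reduces to a plain sum over counties, which matches the at-large case mentioned in the text.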


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1
Additional robustness tests

                                          Low income        High income
                                          preferences       preferences       N

(1) Injective preference roll call map     0.063 (0.013)    −0.041 (0.013)    11,104
(2) Extreme preferences excl.              0.074 (0.016)    −0.048 (0.015)    13,308
(3) New York excluded                      0.070 (0.015)    −0.048 (0.014)    14,730
(4) Local Union Concentration              0.065 (0.014)    −0.047 (0.014)    15,780
(5) Trimmed LPM estimator                  0.074 (0.015)    −0.055 (0.014)    15,426
(6) Errors-in-variables                    0.062 (0.004)    −0.054 (0.004)    15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
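The idea of sequential trimming can be sketched for a single regressor. This is an illustration only: it assumes the procedure is "fit OLS, drop cases whose fitted value lies outside the unit interval, refit, and repeat until none remain", uses hypothetical data, and leaves out the covariates, interactions and fixed effects of the actual specification.

```python
def ols(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def trimmed_lpm(x, y, max_iter=50):
    """Refit OLS after dropping cases whose fitted value leaves [0, 1]."""
    x, y = list(x), list(y)
    for _ in range(max_iter):
        intercept, slope = ols(x, y)
        keep = [0.0 <= intercept + slope * xi <= 1.0 for xi in x]
        if all(keep):
            break
        x = [xi for xi, k in zip(x, keep) if k]
        y = [yi for yi, k in zip(y, keep) if k]
    return intercept, slope, len(x)

# Hypothetical binary outcome that switches from 0 to 1 as x grows.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
naive_intercept, naive_slope = ols(x, y)
intercept, slope, n_kept = trimmed_lpm(x, y)
```

In this toy example the untrimmed fit predicts values below 0 and above 1 at the extremes of `x`; after trimming, the slope is steeper than the naive one, which mirrors the direction of the change reported in the text.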

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
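The expected downward bias can be made concrete in the textbook single-regressor case with classical measurement error. This is an illustrative simplification; the model in the text has several regressors and interactions, and the symbols $\tilde{\theta}$, $u$, $\sigma^2_\theta$ and $\sigma^2_u$ are introduced here only for the example. Suppose the outcome is $y = \mu\theta + \epsilon$, but the true preference $\theta$ is observed only as $\tilde{\theta} = \theta + u$, where $u$ has variance $\sigma^2_u$ and is independent of $\theta$ and $\epsilon$. Regressing $y$ on $\tilde{\theta}$ by OLS then gives

$$\operatorname{plim} \hat{\mu} = \mu \, \frac{\sigma^2_\theta}{\sigma^2_\theta + \sigma^2_u}$$

so the estimate is pulled towards zero by the reliability ratio. When $\sigma^2_u$ is small relative to $\sigma^2_\theta$, the ratio is close to one and the correction changes little, which is consistent with the small adjustment reported above.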

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) (for ease of exposition we omit district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation,

$$U_d = m(Z_d) + v_d, \quad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context).

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4}$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

32 For a very general discussion, see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}, \theta^h_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$c^* = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

$$\frac{\partial y}{\partial z^{U_d}_j} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_d - z^{U_d}_j\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
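The closed-form solution following (G3) and the pointwise derivatives of (G4) can be sketched without external libraries. The data are hypothetical, $\lambda$ is fixed by hand rather than chosen by leave-one-out loss, and the linear system is solved with a small Gaussian elimination routine written for this example. The derivative is taken with respect to the evaluation point, so the sketch writes the difference inside the sum as evaluation point minus kernel centre; a finite-difference comparison is a convenient check on the sign convention.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two covariate vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def krls_fit(Z, y, lam):
    """Weights c* = (K + lam * I)^{-1} y with bandwidth sigma^2 = D."""
    sigma2 = float(len(Z[0]))          # D = number of columns in z
    n = len(Z)
    A = [[math.exp(-sq_dist(Z[i], Z[j]) / sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, y), sigma2

def predict(Z, c, sigma2, z):
    """Fitted surface at an arbitrary point z."""
    return sum(ci * math.exp(-sq_dist(zi, z) / sigma2) for zi, ci in zip(Z, c))

def partial_derivative(Z, c, sigma2, z, d):
    """Derivative of the fitted surface at z with respect to covariate d."""
    return (-2.0 / sigma2) * sum(
        ci * math.exp(-sq_dist(zi, z) / sigma2) * (z[d] - zi[d])
        for zi, ci in zip(Z, c)
    )

# Hypothetical data: column 0 plays the role of union density.
Z = [[0.0, 0.1], [0.5, 0.4], [1.0, 0.2], [1.5, 0.9], [2.0, 0.5]]
y = [0.0, 0.3, 0.5, 0.9, 1.0]
c, sigma2 = krls_fit(Z, y, lam=0.1)
slopes = [partial_derivative(Z, c, sigma2, z, d=0) for z in Z]
```

`slopes` holds one derivative per observation, which is the quantity that is subsequently smoothed and plotted against union membership.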

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents’ responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [wwwvanderbilteducsdiincludesWorking Paper 5 2017pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and E. Washington (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, D.C.: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

confidence interval ranging from −1.1 to 4.4 points, we cannot reject the null hypothesis that legislators do not respond to the preferences of low-income constituents in the average electoral district. The responsiveness gap between the two groups is sizable (at 11.9 (±2.5) percentage points) and significantly different from zero. We show below that the extent of legislators’ non-responsiveness depends crucially on the strength of local unions.

IV.A Unions and unequal legislative responsiveness

We start by summarizing our key finding graphically and then discuss more extensive model specifications. Figure III plots marginal effects of low- and high-income constituency preferences on representatives’ roll-call votes at varying levels of union membership with 95% confidence intervals.¹⁷ It shows that legislators’ responsiveness to the policy preferences of low-income and high-income constituents depends on district-level union membership: as unionization increases, legislators’ responsiveness to low-income constituents increases, while their responsiveness to high-income constituents declines by a similar amount. For example, moving from a district with median levels of union density to one at the 75th percentile increases the responsiveness of legislators to low-income preferences by 8 percentage points, while it decreases responsiveness to high-income preferences by about 5 points. Given the initial responsiveness gap, this change is large enough to substantially level the playing field between affluent and poor.

Are these findings robust to confounding factors? Table I presents parameter estimates from a number of increasingly rich specifications designed to capture potential confounds. In specification (1), we begin with a baseline model (also plotted in Figure III) that includes district fixed effects but no further preferences-confounder interactions (setting βl and βh to zero). We find that a standard deviation increase in district union membership increases legislators’ responsiveness to the poor by about 11 (±1) percentage points, while at the same time decreasing the advantage in responsiveness enjoyed by the affluent by about 6 (±1) points.
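As a minimal sketch of the kind of baseline specification described here (a linear probability model with district fixed effects, preferences interacted with union membership, and district-clustered standard errors), the following uses simulated data. The sample sizes, coefficient values, and variable names are illustrative assumptions and are not taken from the paper's data or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated panel: every size and coefficient below is an illustrative assumption.
n_districts, n_votes = 400, 27
district = np.repeat(np.arange(n_districts), n_votes)
union = np.repeat(rng.standard_normal(n_districts), n_votes)   # z-standardized, fixed within district
theta_l = rng.uniform(0, 1, district.size)   # low-income support for each bill
theta_h = rng.uniform(0, 1, district.size)   # high-income support for each bill
alpha = np.repeat(rng.normal(0, 0.05, n_districts), n_votes)   # district fixed effects

eta_l_true, eta_h_true = 0.10, -0.06
p = 0.35 + alpha + (0.05 + eta_l_true * union) * theta_l + (0.20 + eta_h_true * union) * theta_h
y = (rng.uniform(size=p.size) < np.clip(p, 0, 1)).astype(float)   # yes/no roll-call vote

def demean(v, groups, n_groups):
    """Subtract group means (the within transformation that absorbs fixed effects)."""
    means = np.bincount(groups, weights=v, minlength=n_groups) / np.bincount(groups, minlength=n_groups)
    return v - means[groups]

# The union main effect is constant within district, so the fixed effects absorb it.
X = np.column_stack([theta_l, theta_h, union * theta_l, union * theta_h])
Xw = np.column_stack([demean(X[:, j], district, n_districts) for j in range(X.shape[1])])
yw = demean(y, district, n_districts)
beta = np.linalg.lstsq(Xw, yw, rcond=None)[0]
resid = yw - Xw @ beta

# Cluster-robust sandwich variance, clustering on district (no small-sample correction).
bread = np.linalg.inv(Xw.T @ Xw)
meat = np.zeros((X.shape[1], X.shape[1]))
for g in range(n_districts):
    rows = district == g
    score = Xw[rows].T @ resid[rows]
    meat += np.outer(score, score)
se = np.sqrt(np.diag(bread @ meat @ bread))

eta_l_hat, eta_h_hat = beta[2], beta[3]
print(f"eta_l = {eta_l_hat:.3f} ({se[2]:.3f}), eta_h = {eta_h_hat:.3f} ({se[3]:.3f})")
```

In this sketch the two interaction coefficients play the role of ηl and ηh: each is the change in the marginal effect of a group's preferences for a one standard deviation increase in union membership.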

Even after accounting for district fixed effects, however, our results are still vulnerable to omitted variables that interact with group preferences. Following accounts of winner-take-all politics (Hacker and Pierson 2010), one alternative interpretation is that the moderating effect we have ascribed to unions mostly reflects the fact that state governments have chosen policies that strengthen or weaken the ability of unions to organize (also see Ahlquist 2017; Anzia and Moe 2016). If the likelihood of adopting pro- or anti-union policies is correlated with biased representation, our estimated effect of unions might be spurious. In line with this concern, recent studies have demonstrated that right-to-work and collective bargaining laws regulating the formation and management of unions in the private or public sector have clear political effects on turnout and partisan vote shares (Feigenbaum et al. 2018; Flavin and Hartney 2015). In specification (2), we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter Xd and are interacted with income group preferences θl and θh. In specification (3), we go one step further and allow for any state-level characteristic (such as institutions or historically rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators’ vote choice by including state-specific constants in Xd, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

¹⁷ Calculated from a LPM of vote choice on preferences and union membership. It includes district fixed effects and clusters standard errors on the district level. See also specification (1) in Table I below.

[Figure III: two panels, “Low income constituents” and “High income constituents”. x-axis: union membership [std], from −1.6 to 1.6, with sample percentiles p10, p25, p50, p75, p90 marked; y-axis: marginal effect, from −0.4 to 0.4.]

Figure III. District-level union membership as moderator of unequal representation

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives’ roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

Table I. Union density and representation. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                                  (1)       (2)       (3)       (4)       (5)       (6)
Low income preferences           0.106     0.082     0.098     0.084     0.068     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences         −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)
District fixed effects             X         X         X         X         X         X
Group preferences
  × union policy                             X                                       X
  × state constants                                    X
  × organizational capacity                                      X                   X
  × district covariates                                                    X         X

Note: N = 15780, Nd = 534. 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, ηl and ηh. Specifications (2) to (5) include coefficients for interaction (βl, βh) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements; (3) interacts preferences with state fixed effects; (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections; (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).¹⁸ We use the NLRB’s database to extract all attempts to certify (or de-certify) a local union.¹⁹ We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district’s organizational capacity in specification (4).

¹⁸ Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59%.
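The certification-election proxy can be illustrated with a short sketch that counts geocoded cases per district and takes the log of the yearly average over a seven-year window. The records, district identifiers, and the use of log(1 + mean) to handle districts without any cases are assumptions made for illustration; the source only states that the average is logged.

```python
import math
from collections import defaultdict

# Hypothetical geocoded case records: (congressional district id, year filed).
cases = [("NC-04", 2005), ("NC-04", 2006), ("NC-04", 2006), ("TX-21", 2008), ("NC-04", 2010)]

def logged_average_cases(records, first_year, last_year, districts):
    """Log of the average yearly number of cases per district in [first_year, last_year]."""
    n_years = last_year - first_year + 1
    totals = defaultdict(int)
    for district, year in records:
        if first_year <= year <= last_year:
            totals[district] += 1
    # log(1 + mean) keeps districts with zero cases defined; this offset is an assumption.
    return {d: math.log1p(totals[d] / n_years) for d in districts}

capacity = logged_average_cases(cases, 2004, 2010, ["NC-04", "TX-21", "WY-00"])
```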

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators’ responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts’ socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics see Table A3). This set of covariates excludes “bad controls” (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.²⁰ Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

¹⁹ There are about 2200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily.”

²⁰ Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in “Bowling Alone” (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
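Population-based weighting of county values to districts can be sketched as a weighted average, where each county contributes in proportion to the population living in its overlap with the district. The county values, overlap populations, and identifiers below are hypothetical, and a weighted mean is only one plausible reading of the aggregation step (count variables such as establishments might instead be apportioned).

```python
# Hypothetical inputs: a county-level index and the population of each county-district overlap.
county_value = {"A": 1.2, "B": -0.4, "C": 0.3}
overlap_pop = {("A", "D1"): 80_000, ("B", "D1"): 20_000, ("B", "D2"): 50_000, ("C", "D2"): 50_000}

def to_districts(values, overlaps):
    """Population-weighted mean of county values for every district."""
    num, den = {}, {}
    for (county, district), pop in overlaps.items():
        num[district] = num.get(district, 0.0) + pop * values[county]
        den[district] = den.get(district, 0.0) + pop
    return {d: num[d] / den[d] for d in num}

district_value = to_districts(county_value, overlap_pop)
```

Here district D1 draws 80 percent of its population from county A and 20 percent from county B, so its value is 0.8 × 1.2 + 0.2 × (−0.4) = 0.88.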

Table II. Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

                                        Low income         High income
(1a) Social capital: bowling alleys     0.067 (0.014)      −0.051 (0.013)
(1b) Social capital: index              0.065 (0.014)      −0.048 (0.013)
(2)  Redistricting                      0.067 (0.014)      −0.051 (0.013)
(3)  MRP estimated preferences          0.115 (0.022)      −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for ηl and ηh. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts. N=15420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred. N=14150. Specification (3) uses preferences estimated using MRP. See appendix B for more details. N=15647.

Redistricting. Our analysis is confined to a single apportionment period during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional ‘technical’ robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as “treatment” equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools, such as the LASSO.²¹ The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
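The two-step logic described above can be sketched on simulated data with a single treatment variable. This is a simplified illustration under stated assumptions: it uses a plain LASSO with a fixed penalty rather than a root-LASSO with a data-driven penalty, omits sample splitting and cluster-robust inference, and all dimensions and coefficients are invented for the example.

```python
import numpy as np

def lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (y and columns of X centered)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                   # current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                    # add back coordinate j's contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft threshold
            r -= X[:, j] * b[j]
    return b

rng = np.random.default_rng(42)
n, p, tau_true = 2000, 50, 0.5
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
d = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)                  # "treatment" equation
y = tau_true * d + X[:, 0] + 0.8 * X[:, 2] + rng.standard_normal(n)   # outcome equation

lam = 0.15   # fixed penalty chosen for this toy example, not a data-driven rule
sel_y = np.flatnonzero(lasso(X, y - y.mean(), lam))    # controls that predict the outcome
sel_d = np.flatnonzero(lasso(X, d - d.mean(), lam))    # controls that predict the treatment
selected = np.union1d(sel_y, sel_d)

# Post-selection OLS of the outcome on the treatment plus the union of selected controls.
Z = np.column_stack([np.ones(n), d, X[:, selected]])
tau_hat = np.linalg.lstsq(Z, y, rcond=None)[0][1]
tau_naive = np.polyfit(d, y, 1)[0]                     # slope without any controls
```

Taking the union of the two selected sets is what protects against dropping a control that matters for the treatment but is only weakly related to the outcome (or the reverse); the naive slope, which ignores the confounder in the first column, is biased upward in this simulation.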

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators’ responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

²¹ The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III. Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups.

                              (1)       (2)       (3)
Low income preferences       0.063     0.066     0.062
                            (0.014)   (0.017)   (0.016)
High income preferences     −0.054    −0.036    −0.040
                            (0.013)   (0.015)   (0.016)
Semi-parametric terms         144       312       624
Post-LASSO terms               18        45       112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on 50% sample, parameter estimates performed on remaining 50% (N=7884). Table entries are estimates for ηL and ηH with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that “a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent.” It is thus prudent to consider an analysis that “lets the data speak.” In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.²² The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
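The idea of recovering an interaction from pointwise derivatives, without ever specifying it, can be sketched with a small Gaussian-kernel ridge fit on simulated data. The bandwidth (set to the number of covariates), the fixed ridge penalty, and the data-generating process are assumptions for illustration only; in practice the penalty would be tuned rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = rng.standard_normal((n, 2))     # columns: group preferences, union membership (standardized)
y = X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n)   # pure interaction, never given to the model

sigma2 = X.shape[1]                 # kernel bandwidth, assumed equal to the number of covariates
lam = 0.1                           # ridge penalty, fixed by assumption rather than tuned

# Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / sigma2) and regularized weights c.
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-sq_dist / sigma2)
c = np.linalg.solve(K + lam * np.eye(n), y)

# Pointwise partial derivative of the fitted surface with respect to preferences (column 0):
# d f(x_j) / d x_j0 = (-2 / sigma2) * sum_i c_i * K_ji * (x_j0 - x_i0)
diff0 = X[:, 0][:, None] - X[:, 0][None, :]     # diff0[j, i] = x_j0 - x_i0
dfdx0 = (-2.0 / sigma2) * (K * diff0) @ c

# If union membership moderates the effect of preferences, the derivatives vary with column 1.
moderation = np.corrcoef(dfdx0, X[:, 1])[0, 1]
```

Plotting `dfdx0` against the second column (and smoothing) gives the kind of display the text describes: the marginal effect of preferences traced out over levels of union membership.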

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, “Low income constituents” and “High income constituents”. x-axis: Union membership [std], from -1.6 to 1.6, with percentile markers p10, p25, p50, p75, p90. y-axis: Partial effect, from -0.4 to 0.4.]

Figure IV
Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

²² See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators’ vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions may be too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for “public” and the remaining “non-public” unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district’s public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining “non-public” unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills.


The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV
Effect heterogeneity: Marginal effects of unionization on legislative responsiveness to low and high income groups

                                 Low income        High income
(A) Private vs. Public unions
    Public unions                0.074 (0.016)     -0.058 (0.015)
    Non-public unions            0.054 (0.016)     -0.027 (0.016)
(B) Bill ideology
    Conservative bill            0.086 (0.017)     -0.086 (0.018)
    Liberal bill                 0.052 (0.014)     -0.028 (0.013)
(C) AFL-CIO endorsement
    No position                  0.054 (0.014)     -0.054 (0.013)
    Endorsement                  0.077 (0.015)     -0.040 (0.014)

Note: Estimates for η_L and η_H with cluster-robust standard errors in parentheses. N = 15780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.²³ AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators’ responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).²⁴ The fact that districts with higher union membership see better representation of the less affluent

²³ Taken from the AFL-CIO “legislative scorecard”: https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

²⁴ For high-income preferences, the estimate for η_H is smaller for endorsed bills, but still significantly different from zero.


more so when issues are salient to unions bolsters the interpretation that our main result is actually driven by unions’ capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the “labor” sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
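The conversion from a log-scale coefficient to dollar amounts can be illustrated with a minimal sketch of the smearing retransformation of Duan (1983). The toy data, the single-regressor setup, and all function names below are hypothetical and serve only to show the mechanics. The reported estimate comes from the full district-level regressions with state fixed effects.

```python
import math


def fit_log_linear(x, log_y):
    """OLS of log(y) on a single regressor x; returns intercept, slope, residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(log_y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, log_y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, log_y)]
    return a, b, resid


def smearing_factor(resid):
    """Duan's smearing factor: the sample mean of the exponentiated residuals."""
    return sum(math.exp(r) for r in resid) / len(resid)


def predict_dollars(a, b, x0, smear):
    """Retransformed prediction E[y | x0] = exp(a + b * x0) * smearing factor."""
    return math.exp(a + b * x0) * smear


def effect_of_one_sd(a, b, smear, x0=0.0):
    """Change in predicted dollars when standardized x rises from x0 to x0 + 1."""
    return predict_dollars(a, b, x0 + 1.0, smear) - predict_dollars(a, b, x0, smear)


# Hypothetical toy data: standardized union membership and log contributions.
union_std = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
log_contrib = [11.2, 11.6, 11.5, 11.9, 12.1, 12.0, 12.5]

a, b, resid = fit_log_linear(union_std, log_contrib)
smear = smearing_factor(resid)
dollar_effect = effect_of_one_sd(a, b, smear)
```

Simply exponentiating the fitted log values would understate the expected dollar amount whenever the residuals vary. The smearing factor corrects for this without assuming normally distributed errors.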

The last two columns of Panel (A) examine how contributions moderate legislators’ responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator’s responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians, if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V
Labor contributions and selection of Democratic legislators

                                           (1)        (2)        (3)        (4)

A: Contributions channel
                                           DV: Contrib.          DV: roll call
Union membership                           0.056      0.046
                                           (0.012)    (0.014)
Contributions × low income prefs.                                0.946      0.865
                                                                 (0.036)    (0.034)
Contributions × high income prefs.                               -0.735     -0.714
                                                                 (0.029)    (0.031)

B: Selection channel
                                           DV: Democrat          DV: roll call
Union membership                           0.161      0.106
                                           (0.024)    (0.023)
Democrat × low income prefs.                                     0.576      0.542
                                                                 (0.012)    (0.015)
Democrat × high income prefs.                                    -0.411     -0.423
                                                                 (0.013)    (0.015)

District controls                                     X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators’ vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators’ vote as function of the interaction between (log) labor contributions and Democratic representative. N=15776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an


important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members’ contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers’ views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle, documented by previous studies, that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed, but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are on average more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers’ propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens’ interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens’ preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions’ influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1
Matched CCES-House roll calls included in our analysis

Match Bill      Date        Yea-Nay  Ideol.†  Name
(1)   HR 810    07/19/2006  235-193  L        Stem Cell Research Enhancement Act (Presidential Veto override)
(1)   HR 3      01/11/2007  253-174  L        Stem Cell Research Enhancement Act of 2007 (House)
(1)   S 5       06/07/2007  247-176  L        Stem Cell Research Enhancement Act of 2007
(2)   HR 2956   07/12/2007  223-201  L        Responsible Redeployment from Iraq Act
(3)   HR 2      01/10/2007  315-116  L        Fair Minimum Wage Act
(4)   HR 4297   12/08/2005  234-197  C        Tax Relief Extension Reconciliation Act (Passage)
(4)   HR 4297   05/10/2006  244-185  C        Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)   HR 3045   07/28/2005  217-215  C        Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)   S 1927    08/04/2007  227-183  C        Protect America Act
(6)   HR 6304   06/20/2008  293-129  C        FISA Amendments Act of 2008
(7)   HR 3162   08/01/2007  225-204  L        Children’s Health and Medicare Protection Act
(7)   HR 976    10/18/2007  273-156  L        Children’s Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)   HR 3963   01/23/2008  260-152  L        Children’s Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)   HR 2      02/04/2009  290-135  L        Children’s Health Insurance Program Reauthorization Act
(8)   HR 3221   07/23/2008  272-152  L        Foreclosure Prevention Act of 2008
(9)   HR 3688   11/08/2007  285-132  C        United States-Peru Trade Promotion Agreement
(10)  HR 1424   10/03/2008  263-171  L        Emergency Economic Stabilization Act of 2008
(11)  HR 3080   10/12/2011  278-151  C        To implement the United States-Korea Trade Agreement
(12)  HR 3078   10/12/2011  262-167  C        To implement the United States-Colombia Trade Promotion Agreement
(13)  HR 2346   06/16/2009  226-202  L        Supplemental Appropriations, Fiscal Year 2009 (Agreeing to conference report)
(14)  HR 2831   07/31/2007  225-199  L        Lilly Ledbetter Fair Pay Act
(14)  HR 11     01/09/2009  247-171  L        Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)  S 181     01/27/2009  250-177  L        Lilly Ledbetter Fair Pay Act of 2009
(15)  HR 1913   04/29/2009  249-175  L        Local Law Enforcement Hate Crimes Prevention Act
(16)  HR 1      02/13/2009  246-183  L        American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)  HR 2454   06/26/2009  219-212  L        American Clean Energy and Security Act
(18)  HR 3590   03/21/2010  220-212  L        Patient Protection and Affordable Care Act
(19)  HR 3962   11/07/2009  221-215  L        Affordable Health Care for America Act
(20)  HR 4173   06/30/2010  237-192  L        Wall Street Reform and Consumer Protection Act of 2009
(21)  HR 2965   12/15/2010  250-175  L        Don’t Ask, Don’t Tell Repeal Act of 2010
(22)  S 365     08/01/2011  269-161  C        Budget Control Act of 2011
(23)  H CR 34   04/15/2011  235-193  C        House Budget Plan of 2011
(24)  H CR 112  03/28/2012  38-382   C        Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)  HR 8      08/01/2012  170-257  L        American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)  HR 2      01/19/2011  245-189  C        Repealing the Job-Killing Health Care Law Act
(26)  HR 6079   07/11/2012  244-185  C        Repeal the Patient Protection and Affordable Care Act and [ ]
(27)  HR 1938   07/26/2011  279-147  C        North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one. The Yea-Nay column reports the House vote.
† Coding of a bill’s ideological character as (L)iberal or (C)onservative, based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2
Distribution of district income-group reference points: Average threshold over all districts, smallest and largest value

            33rd percentile              67th percentile
Congress    Mean     Min      Max        Mean     Min      Max
109         38123    16800    73675      77964    39612    146870
110         40127    18000    77000      83047    43600    155113
111         39021    17500    78262      82440    46000    160050
112         37381    16500    81000      79868    38500    158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3
Descriptive statistics of analysis sample

                                      Mean     SD       Min      Max      N
Roll-call vote: yea                   0.568    0.495    0.000    1.000    15780
Constituent preferences
    Low income                        0.593    0.220    0.047    0.979    15934
    High income                       0.555    0.198    0.037    0.967    15934
    Low-High Gap                      0.172    0.121    0.000    0.588    15934
Union membership [log]                9.705    1.046    6.094    13.619   15934
Population                            7.022    0.723    4.697    9.980    15934
Share African American                0.124    0.146    0.004    0.680    15934
Share Hispanic                        0.156    0.174    0.005    0.812    15934
Share BA or higher                    0.275    0.097    0.073    0.645    15934
Median income [$10,000]               5.177    1.356    2.282    10.439   15934
Share female                          0.508    0.010    0.462    0.543    15934
Manufacturing share                   0.110    0.047    0.025    0.281    15934
Urbanization                          0.790    0.199    0.213    1.000    15934
Certification elections [log]         3.347    0.861    0.000    5.100    15934
Congregations [per 1000 persons]      0.765    1.147    0.062    6.453    15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census’ definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.²⁵

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district’s population.
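The final averaging step can be sketched as follows, under the assumption that preferences have already been filled in for every individual in the Census sample. The record fields (`district`, `income`, `pred_pref`) and the function name are hypothetical. The sketch also computes the tercile cut points from the same records, which is a simplification for illustration.

```python
from collections import defaultdict
from statistics import quantiles


def district_group_preferences(records):
    """Average filled-in preferences by district and income tercile.

    Each record is a dict with the hypothetical keys 'district', 'income',
    and 'pred_pref' (the preference predicted for that individual).
    """
    by_district = defaultdict(list)
    for r in records:
        by_district[r["district"]].append(r)

    result = {}
    for district, rows in by_district.items():
        # District-specific thresholds separating lowest and highest terciles.
        lo_cut, hi_cut = quantiles([r["income"] for r in rows], n=3)
        sums = defaultdict(lambda: [0.0, 0])
        for r in rows:
            if r["income"] <= lo_cut:
                group = "low"
            elif r["income"] > hi_cut:
                group = "high"
            else:
                group = "middle"
            sums[group][0] += r["pred_pref"]
            sums[group][1] += 1
        result[district] = {g: total / count for g, (total, count) in sums.items()}
    return result
```

Because the thresholds are computed within each district, an individual with a given income may fall into different income groups depending on where they live.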

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

²⁵ See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1: schematic data matrix. Rows are units: CCES individuals 1, ..., m and Census individuals m+1, ..., m+n. Columns are covariates Z_i1, ..., Z_iK and preferences Y_i1, ..., Y_iP. Covariates are observed for all units; preferences are observed for CCES units and missing (NA) for Census units. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census rows.]

Figure B1
Illustration of Small Area Estimation of District Preferences

We use a sample of m individuals from the CCES that is not necessarily representative on the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district’s income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The ‘donor pool’ for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family’s total household income into 14 income bins.²⁶ We transform this discretized measure of income into a continuous one using a nonparametric midpoint

²⁶ The exact question wording is: “Thinking back over the last year, what was your family’s annual income?” The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
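A minimal sketch of a midpoint Pareto scoring scheme is given below. It uses one common variant in which the Pareto shape parameter is estimated from the counts in the last closed bin and the open-ended bin. Whether this matches the exact implementation used here is an assumption, and the bin boundaries and counts are hypothetical rather than the 14 CCES categories.

```python
import math


def pareto_alpha(lower_prev, lower_top, n_prev, n_top):
    """Pareto shape parameter estimated from the two highest income bins."""
    return (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )


def bin_scores(bins, counts):
    """Assign a continuous income score to each bin.

    `bins` is a list of (lower, upper) bounds with `upper` exclusive and
    None for the open-ended top bin; `counts` holds respondents per bin.
    """
    # Closed bins are scored at their midpoint.
    scores = [(lo + hi) / 2 for lo, hi in bins[:-1]]
    lower_prev, lower_top = bins[-2][0], bins[-1][0]
    alpha = pareto_alpha(lower_prev, lower_top, counts[-2], counts[-1])
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    # The open-ended bin is scored at the mean of a Pareto tail above lower_top.
    scores.append(lower_top * alpha / (alpha - 1))
    return scores


# Hypothetical bins and respondent counts, for illustration only.
example_bins = [(0, 10000), (10000, 20000), (20000, 30000), (30000, 50000),
                (50000, 100000), (100000, 150000), (150000, None)]
example_counts = [50, 80, 90, 150, 200, 60, 40]
example_scores = bin_scores(example_bins, example_counts)
```

With exclusive upper bounds, the third bin is scored at $25,000, in line with the example in the text. The score for the top bin depends on how quickly the counts thin out between the two highest bins.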

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:²⁷

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables iteratively ing random forest and lling in unobservedvalues until a stopping criterion c (indicating no further change in lled-in values) is metAlgorithmically we proceed as follows

Algorithm 1 Chained Random Forests1 Start with initial guesses of missing values in D

2 w larr vector of column indices sorted by increasing fraction of NA3 while not c do4 D

impoldlarr previously imputed D

5 for v in w do6 Fit Random Forest y(v)

obssim x (v)

obs

7 Predict y(v)mis using x (v)mis

8 Dimpnew larr updated imputed matrix using predicted y(v)mis

9 Updated stopping criterion c

10 Return completed Dimp
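The control flow of Algorithm 1 can be sketched in a few lines. To keep the example free of dependencies, the random forest is swapped for a k-nearest-neighbour stand-in (`knn_fit_predict`, a hypothetical helper); in practice any learner with the same fit-then-predict interface, such as a random forest, would be plugged in. The stopping rule (squared change below a tolerance) and the mean-based initial guesses are likewise assumptions made for illustration.

```python
import math

def knn_fit_predict(x_obs, y_obs, x_mis, k=3):
    """Stand-in learner (NOT a random forest): mean of k nearest neighbours."""
    preds = []
    for q in x_mis:
        order = sorted(range(len(x_obs)), key=lambda i: math.dist(x_obs[i], q))
        nearest = order[:k]
        preds.append(sum(y_obs[i] for i in nearest) / len(nearest))
    return preds

def chained_impute(data, fit_predict=knn_fit_predict, max_iter=10, tol=1e-6):
    """Fill in None entries column by column until the imputations settle."""
    n, p = len(data), len(data[0])
    missing = [[data[i][v] is None for v in range(p)] for i in range(n)]
    imp = [list(row) for row in data]
    # Step 1: initial guesses are the column means of the observed entries.
    for v in range(p):
        obs = [data[i][v] for i in range(n) if not missing[i][v]]
        for i in range(n):
            if missing[i][v]:
                imp[i][v] = sum(obs) / len(obs)
    # Step 2: visit columns in order of increasing number of missing values.
    order = sorted(
        (v for v in range(p) if any(missing[i][v] for i in range(n))),
        key=lambda v: sum(missing[i][v] for i in range(n)),
    )
    for _ in range(max_iter):
        old = [list(row) for row in imp]
        for v in order:
            obs_rows = [i for i in range(n) if not missing[i][v]]
            mis_rows = [i for i in range(n) if missing[i][v]]
            others = [u for u in range(p) if u != v]
            x_obs = [[imp[i][u] for u in others] for i in obs_rows]
            y_obs = [imp[i][v] for i in obs_rows]
            x_mis = [[imp[i][u] for u in others] for i in mis_rows]
            # Steps 6-8: fit on observed rows, predict and fill missing rows.
            for i, pred in zip(mis_rows, fit_predict(x_obs, y_obs, x_mis)):
                imp[i][v] = pred
        # Step 9: stop once the filled-in values no longer change.
        change = sum(
            (imp[i][v] - old[i][v]) ** 2 for i in range(n) for v in range(p)
        )
        if change < tol:
            break
    return imp

# Hypothetical toy data: the second column is roughly twice the first.
data = [[1, 2], [2, 4], [3, 6], [4, None], [5, 10], [6, 12]]
completed = chained_impute(data)
```

Observed entries are never overwritten; only the cells that were missing in the input are updated in each pass.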

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

²⁷ Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes and find it to perform comparatively well with both continuous and categorical variables.²⁸

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).²⁹ No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.³⁰ Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call we estimate the following hierarchical

²⁸ See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

²⁹ This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506 and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

³⁰ We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


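model using penalized maximum likelihood (Chung et al. 2013):

\[
\Pr(Y_i = 1) = \mathrm{logit}^{-1}\left(\beta^0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}
\]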

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta^0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

\[
\alpha^{race}_j \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}
\]
\[
\alpha^{gender}_k \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}
\]
\[
\alpha^{age}_l \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}
\]
\[
\alpha^{educ}_m \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}
\]
\[
\alpha^{income}_n \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}
\]

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

\[
\alpha^{district}_d \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}
\]

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
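The poststratification step amounts to a population-weighted average of cell-level predictions. The following sketch assumes that predictions and Census counts are stored in dictionaries keyed by a (group, cell) pair, where a group could be a district-by-income-group combination; the keys, linear predictors, and counts are hypothetical.

```python
import math

def inv_logit(x):
    """Inverse logit, mapping a linear predictor as in (B1) to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def poststratify(cell_pred, cell_pop):
    """Population-weighted mean of cell predictions within each group.

    cell_pred: {(group, cell): predicted probability of support}
    cell_pop:  {(group, cell): population count of that cell}
    """
    num, den = {}, {}
    for (group, cell), count in cell_pop.items():
        num[group] = num.get(group, 0.0) + count * cell_pred[(group, cell)]
        den[group] = den.get(group, 0.0) + count
    return {group: num[group] / den[group] for group in num}

# Hypothetical example: two demographic cells in one district-income group.
group = ("district 1", "low income")
cell_pred = {
    (group, "cell a"): inv_logit(-1.0),  # made-up linear predictor
    (group, "cell b"): inv_logit(0.5),   # made-up linear predictor
}
cell_pop = {(group, "cell a"): 300, (group, "cell b"): 100}
estimates = poststratify(cell_pred, cell_pop)
```

Because the weights come from the Census file rather than from the survey, cells that are over-represented among respondents do not dominate the group estimate.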

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1
Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences        0.182     0.158     0.181     0.162     0.115     0.115
                             (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences      −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                             (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences        0.080     0.061     0.063     0.072     0.043     0.045
                             (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences      −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                             (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

Controls (X = included):
District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C) we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
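How the reference population changes a voter's classification can be seen in a small sketch. The income lists are hypothetical, the percentile rule (linear interpolation) and the treatment of ties at a threshold are assumptions, and the same function covers the 33/66 and 20/80 variants through its arguments.

```python
import math

def percentile(values, p):
    """Percentile for p in [0, 1], using linear interpolation (an assumption)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def income_group(income, reference_incomes, low_p=0.33, high_p=0.66):
    """Classify an income relative to a reference population.

    reference_incomes: incomes of the district (or state) used for thresholds.
    """
    low_cut = percentile(reference_incomes, low_p)
    high_cut = percentile(reference_incomes, high_p)
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical income distributions for a poorer and a richer district.
district_a = [10_000, 20_000, 30_000, 40_000, 50_000, 60_000, 70_000]
district_b = [30_000, 40_000, 50_000, 60_000, 70_000, 80_000, 90_000]

group_in_a = income_group(40_000, district_a)  # relative to district A
group_in_b = income_group(40_000, district_b)  # relative to district B
```

Passing a state-wide income list as `reference_incomes` gives the state-specific variant, and `low_p=0.2, high_p=0.8` gives the shifted thresholds of panel (C).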


Table C1
Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

Controls (X = included):
District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency


Figure D1
Total number of union certification elections in House districts (109th-112th Congress). Map legend bins: 0 to 13, 13 to 26, 26 to 39, 39 to 62, 62 to 164.

database v133 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
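The interpolation step can be sketched as a weighted sum over county-district intersections. The sketch assumes that each weight is the share of a county's population living in the given district, so that weights for one county sum to one; county names, counts, and shares are hypothetical.

```python
def county_to_district(county_counts, pop_weights):
    """Allocate county-level counts to districts by population share.

    county_counts: {county: number of congregations}
    pop_weights:   {(county, district): share of the county's population
                    living in that district}  (assumed to sum to 1 per county)
    """
    totals = {}
    for (county, district), share in pop_weights.items():
        totals[district] = totals.get(district, 0.0) + share * county_counts[county]
    return totals

# Hypothetical example: county A is split across two districts,
# county B lies entirely within district 2.
county_counts = {"county A": 100, "county B": 50}
pop_weights = {
    ("county A", "district 1"): 0.6,
    ("county A", "district 2"): 0.4,
    ("county B", "district 2"): 1.0,
}
district_counts = county_to_district(county_counts, pop_weights)
```

When every county lies entirely within one district, all shares equal one and the function reduces to a plain sum, matching the at-large case mentioned above.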


E Additional Robustness Test

In this section we describe several additional robustness tests

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1
Additional robustness tests

                                           Low income        High income
                                           preferences       preferences          N

(1) Injective preference roll call map     0.063 (0.013)    −0.041 (0.013)    11,104
(2) Extreme preferences excl.              0.074 (0.016)    −0.048 (0.015)    13,308
(3) New York excluded                      0.070 (0.015)    −0.048 (0.014)    14,730
(4) Local Union Concentration              0.065 (0.014)    −0.047 (0.014)    15,780
(5) Trimmed LPM estimator                  0.074 (0.015)    −0.055 (0.014)    15,426
(6) Errors-in-variables                    0.062 (0.004)    −0.054 (0.004)    15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical, specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.³¹ We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

³¹ We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10, for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

\[
y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
\]

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

\[
U_d = m(Z_d) + v_d, \quad E(v_d \mid Z_d) = 0 \tag{F2}
\]

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

\[
y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
\]
\[
U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4}
\]

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.
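The logic of selecting on both equations and then running OLS on the union can be sketched on simulated data. This is a deliberately simplified illustration: it uses a plain coordinate-descent LASSO with a fixed, hand-picked penalty instead of the root-LASSO, a single scalar treatment without the preference interactions of (F1), and no intercept; all function names and the simulated data are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(columns, y):
    """Least-squares coefficients of y on the given columns (no intercept)."""
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

def lasso_support(columns, y, lam, sweeps=200):
    """Indices of non-zero coefficients from a coordinate-descent LASSO."""
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            z = sum(v * v for v in col) / n
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            # Soft-thresholding update for coordinate j.
            new = max(abs(rho) - lam, 0.0) / z * (1 if rho > 0 else -1)
            if new != beta[j]:
                resid = [r - v * (new - beta[j]) for v, r in zip(col, resid)]
                beta[j] = new
    return {j for j, b in enumerate(beta) if b != 0.0}

def post_double_selection(y, u, controls, lam):
    """Select controls for outcome and treatment, then OLS on their union."""
    i1 = lasso_support(controls, y, lam)  # controls that predict the outcome
    i2 = lasso_support(controls, u, lam)  # controls that predict the treatment
    selected = sorted(i1 | i2)
    coefs = ols([u] + [controls[j] for j in selected], y)
    return coefs[0], selected

# Simulated data: control 0 confounds treatment and outcome; true effect is 1.
rng = random.Random(0)
n, p = 400, 6
controls = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
u = [2.0 * controls[0][i] + rng.gauss(0, 1) for i in range(n)]
y = [u[i] + 0.3 * controls[0][i] + 3.0 * controls[1][i] + rng.gauss(0, 1)
     for i in range(n)]
eta_hat, selected = post_double_selection(y, u, controls, lam=0.3)
```

In this simulation the confounder enters the outcome only moderately but drives the treatment strongly, which is the situation the second selection step is meant to catch.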

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).³² Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}, \theta^h_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

\[
y = Kc \tag{G1}
\]

Here $K$ is an $N \times N$ Gaussian Kernel matrix

\[
K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}
\]

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

³² For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

\[
c^{*} = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[ (y - Kc)'(y - Kc) + \lambda c' K c \right] \tag{G3}
\]

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

\[
\frac{\partial y}{\partial z^{U_d}_j} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_d - z^{U_d}_j\right) \tag{G4}
\]

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
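The closed-form weights and the pointwise derivatives can be illustrated with a small sketch. It assumes a fixed, hand-picked $\lambda$ rather than the leave-one-out choice described above, sets $\sigma^2$ to the number of columns, and takes the derivative of the fitted surface with respect to the evaluation point; the data and all function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def gaussian_kernel(zi, zj, sigma2):
    """Gaussian kernel between two covariate vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / sigma2)

def krls_fit(z, y, lam):
    """Return weights c = (K + lam * I)^{-1} y and the bandwidth sigma^2."""
    sigma2 = float(len(z[0]))  # sigma^2 = D, the number of columns in z
    n = len(z)
    a = [[gaussian_kernel(z[i], z[j], sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(a, y), sigma2

def krls_predict(z_new, z, c, sigma2):
    """Fitted value at z_new: kernel-weighted sum over training points."""
    return sum(ci * gaussian_kernel(z_new, zi, sigma2) for ci, zi in zip(c, z))

def krls_derivative(z_new, z, c, sigma2, d):
    """Partial derivative of the fitted surface at z_new w.r.t. covariate d."""
    return (-2.0 / sigma2) * sum(
        ci * gaussian_kernel(z_new, zi, sigma2) * (z_new[d] - zi[d])
        for ci, zi in zip(c, z)
    )

# Hypothetical data: two covariates, outcome increasing in the first one.
z = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 0.2], [4.0, 0.8]]
y = [0.1, 0.9, 2.1, 2.9, 4.2]
lam = 0.1  # fixed penalty for illustration only
c, sigma2 = krls_fit(z, y, lam)
derivs = [krls_derivative(zj, z, c, sigma2, 0) for zj in z]
```

The list `derivs` holds one partial derivative per observation, which is the quantity that would then be smoothed and plotted against another covariate.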

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ilwu puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levy (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [wwwvanderbilteducsdiincludesWorking Paper 5 2017pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the us congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the us senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in congress: A study of the north american free trade agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

[Figure III: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], with sample percentiles p10, p25, p50, p75, p90 marked; y-axis: Marginal effect.]

Figure III. District-level union membership as moderator of unequal representation

Note: This figure plots changes in marginal effects of low- and high-income constituency preferences on representatives' roll-call votes conditional on district-level union membership. Shaded areas are 95% confidence intervals based on district-clustered standard errors. The sample distribution of (z-standardized) union membership is indicated above the x-axis.

Flavin and Hartney 2015). In specification (2) we therefore add two measures of historical state union policy: the share of years with right-to-work legislation and the share of years with mandatory collective bargaining laws for teachers since 1955, taken from Flavin and Hartney (2015). These enter $X_d$ and are interacted with income group preferences $\theta_l$ and $\theta_h$. In specification (3) we go one step further and allow for any state-level characteristic (such as institutions or historically-rooted popular anti-union sentiments) to moderate the marginal effect of income group preferences on legislators' vote choice by including state-specific constants in $X_d$, which are interacted with group preferences. The results from both extended specifications show that accounting for state-level policies and institutions as potential moderators does not change our core picture of the role of local union organization: where local unions are stronger, the responsiveness gap between the affluent and the poor is reduced.

A more subtle problem concerns a form of simultaneity bias at the district level. There may be district-level factors shaping both the propensity to be a union member and to be politically active. If less affluent individuals with a higher capacity to organize and solve collective action problems cluster in specific districts, our estimates of the marginal impact of district union membership on responsiveness will be overly optimistic. Such a propensity may reflect critical historical junctures in labor organizations (Ahlquist and Levy 2013) or

Table I. Union density and representation: Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote

                                 (1)      (2)      (3)      (4)      (5)      (6)
Low income preferences           0.106    0.082    0.098    0.084    0.068    0.062
                                (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences         −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                                (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)
District fixed effects             X        X        X        X        X        X
Group preferences
  × union policy                            X                                   X
  × state constants                                  X
  × organizational capacity                                   X                 X
  × district covariates                                                X        X

Note: N = 15,780; $N_d$ = 534; 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, $\eta_l$ and $\eta_h$. Specifications (2) to (5) include coefficients for interaction ($\beta_l$, $\beta_h$) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements; (3) interacts preferences with state fixed effects; (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections; (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB's database to

18 Certification elections are not a foregone conclusion: during the 112th Congress, unions won 59

extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.
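The quantity reported in Table I can be illustrated with a short simulation. The sketch below estimates a linear probability model in which income group preferences are interacted with district union membership, absorbs district fixed effects by within-district demeaning, and computes cluster-robust standard errors. It is an illustration under stated assumptions, not the authors' replication code: the data are simulated, every name (`fit_interaction_lpm`, `mu_l`, `eta_l`, and so on) is hypothetical, and the sandwich variance omits small-sample corrections.

```python
import numpy as np

def demean_by_group(v, groups):
    """Subtract group means (the within transformation that absorbs fixed effects)."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

def fit_interaction_lpm(y, theta_l, theta_h, union, district):
    """OLS of votes on preferences and preference-by-union interactions with
    district fixed effects; returns coefficients and cluster-robust SEs."""
    cols = [theta_l, theta_h, union * theta_l, union * theta_h]
    X = np.column_stack([demean_by_group(c, district) for c in cols])
    y_w = demean_by_group(y, district)
    beta, *_ = np.linalg.lstsq(X, y_w, rcond=None)
    resid = y_w - X @ beta
    # Sandwich variance: sum the score contributions within each district.
    scores = np.zeros((district.max() + 1, X.shape[1]))
    np.add.at(scores, district, X * resid[:, None])
    bread = np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(bread @ (scores.T @ scores) @ bread))
    return beta, se

# Simulated data; all parameter values are assumptions chosen for illustration.
rng = np.random.default_rng(0)
n_districts, n_votes = 500, 27
district = np.repeat(np.arange(n_districts), n_votes)
union = np.repeat(rng.standard_normal(n_districts), n_votes)  # z-standardized
theta_l = rng.uniform(0, 1, district.size)  # low-income support for each bill
theta_h = rng.uniform(0, 1, district.size)  # high-income support for each bill
alpha = np.repeat(rng.normal(0, 0.03, n_districts), n_votes)  # district effects
mu_l, mu_h, eta_l, eta_h = 0.20, 0.40, 0.08, -0.05
prob = 0.1 + alpha + (mu_l + eta_l * union) * theta_l + (mu_h + eta_h * union) * theta_h
y = rng.binomial(1, np.clip(prob, 0, 1)).astype(float)

beta, se = fit_interaction_lpm(y, theta_l, theta_h, union, district)
eta_l_hat, eta_h_hat = beta[2], beta[3]
# Marginal effect of low-income preferences at -1, 0, +1 SD of union membership.
me_low = beta[0] + eta_l_hat * np.array([-1.0, 0.0, 1.0])
print(eta_l_hat, eta_h_hat, se[2:], me_low)
```

Because union membership is constant within a district, its main effect is absorbed by the fixed effects; only the interaction coefficients, which play the role of $\eta_l$ and $\eta_h$ here, are identified from within-district variation in preferences across roll calls.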

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.

IV.B. Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
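The exact weighting scheme is not spelled out in this passage, so the following is only a generic sketch of population-weighted areal interpolation from counties to districts. It assumes a hypothetical matrix `pop_overlap` whose entry for county c and district d is the population living in their intersection, and it distinguishes counts (split across districts in proportion to county population) from indices or rates (averaged with population weights). The function name and the toy numbers are illustrative.

```python
import numpy as np

def interpolate_to_districts(county_value, pop_overlap, extensive):
    """Move county-level values to districts using population overlaps.

    pop_overlap[c, d] is the population in the intersection of county c and
    district d (a hypothetical input for this sketch).
    """
    county_value = np.asarray(county_value, dtype=float)
    pop_overlap = np.asarray(pop_overlap, dtype=float)
    if extensive:
        # Counts: allocate each county total by the share of its population
        # that falls into each district, then sum within districts.
        share = pop_overlap / pop_overlap.sum(axis=1, keepdims=True)
        return share.T @ county_value
    # Indices or rates: population-weighted average over overlapping counties.
    weight = pop_overlap / pop_overlap.sum(axis=0, keepdims=True)
    return weight.T @ county_value

# Three counties, two districts; the middle county is split evenly.
pop_overlap = np.array([[100, 0], [50, 50], [0, 200]])
bowling_alleys = [10, 4, 6]          # county counts (illustrative)
capital_index = [1.0, 0.0, -1.0]     # county index values (illustrative)

district_counts = interpolate_to_districts(bowling_alleys, pop_overlap, extensive=True)
district_index = interpolate_to_districts(capital_index, pop_overlap, extensive=False)
print(district_counts, district_index)
```

Treating counts and indices differently keeps county totals intact for the former (the district counts still sum to 20) while keeping the latter on its original scale.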

Table II. Robustness tests: Marginal effects of union membership on differential legislative responsiveness under alternative specifications

                                        Low income        High income
(1a) Social capital: bowling alleys    0.067 (0.014)    −0.051 (0.013)
(1b) Social capital: index             0.065 (0.014)    −0.048 (0.013)
(2)  Redistricting                     0.067 (0.014)    −0.051 (0.013)
(3)  MRP estimated preferences         0.115 (0.022)    −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for $\eta_l$ and $\eta_h$. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts. N = 15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred. N = 14,150. Specification (3) uses preferences estimated using MRP. See Appendix B for more details. N = 15,647.

Redistricting. Our analysis is confined to a single apportionment period during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show, in specification (3), that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In

the same table we also show that our core results are obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C. Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988), using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
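The logic of double selection can be sketched in a few lines. The example below is a deliberately simplified illustration, not the estimator used in the paper: it has a single generic treatment `d` rather than union-by-preference interactions, uses a plain LASSO with a fixed penalty (implemented here by coordinate descent) instead of the root-LASSO, and omits sample splitting and cluster-robust inference. All names and simulated values are assumptions made for the illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_ms = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]              # partial residual without term j
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ms[j]  # soft threshold
            resid -= X[:, j] * b[j]
    return b

def post_double_selection(y, d, W, lam):
    """Select controls from the outcome and the treatment equation, then run OLS."""
    Wz = (W - W.mean(axis=0)) / W.std(axis=0)
    sel_y = np.flatnonzero(lasso_cd(Wz, y - y.mean(), lam))  # predicts the outcome
    sel_d = np.flatnonzero(lasso_cd(Wz, d - d.mean(), lam))  # predicts the treatment
    keep = np.union1d(sel_y, sel_d)
    Z = np.column_stack([np.ones(len(y)), d, Wz[:, keep]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1], keep

# Simulated example: control 0 confounds treatment and outcome.
rng = np.random.default_rng(1)
n, p = 2000, 50
W = rng.standard_normal((n, p))
d = W[:, 0] + 0.5 * W[:, 1] + rng.standard_normal(n)
y = 0.5 * d + 1.0 * W[:, 0] + 0.8 * W[:, 2] + rng.standard_normal(n)

naive = np.polyfit(d, y, 1)[0]                   # slope ignoring all controls
effect, keep = post_double_selection(y, d, W, lam=0.1)
print(naive, effect, keep)
```

Taking the union of the two selected sets is the step that provides the robustness described above: a control that is only weakly related to the outcome but strongly related to the treatment is still retained.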

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III. Post-double-selection estimator: Marginal effect of unionization on legislative responsiveness to low and high income groups

                              (1)      (2)      (3)
Low income preferences       0.063    0.066    0.062
                            (0.014)  (0.017)  (0.016)
High income preferences     −0.054   −0.036   −0.040
                            (0.013)  (0.015)  (0.016)
Semi-parametric terms         144      312      624
Post-LASSO terms               18       45      112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on a 50% sample, parameter estimates performed on the remaining 50% (N = 7,884). Table entries are estimates for $\eta_L$ and $\eta_H$ with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our $\eta$ terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data

speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of the outcome with respect to district preferences and examine how they vary with levels of union membership (Hainmueller and Hazlett 2014, 156).
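A minimal sketch of this idea follows. It fits a Gaussian-kernel regularized least squares model to simulated data whose true marginal effect of preferences rises with union membership, then computes pointwise derivatives of the fitted surface with respect to preferences. The kernel bandwidth `sigma2` and penalty `lam` are fixed by hand rather than tuned, the data-generating process is invented for the illustration, and the function names are hypothetical; the parameter selection actually used is described in Appendix G.

```python
import numpy as np

def krls_fit(X, y, sigma2, lam):
    """Gaussian-kernel regularized least squares: c = (K + lam * I)^{-1} y."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dist / sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return K, c

def pointwise_derivatives(X, K, c, sigma2, dim):
    """d yhat_j / d x_j[dim] = (-2 / sigma2) * sum_i c_i K_ji (x_j[dim] - x_i[dim])."""
    diff = X[:, dim][:, None] - X[:, dim][None, :]
    return (-2.0 / sigma2) * ((K * diff) @ c)

# Simulated data: the effect of preferences on the outcome grows with union
# membership, but no interaction term is given to the estimator.
rng = np.random.default_rng(2)
n = 400
pref = rng.uniform(-1.5, 1.5, n)     # standardized group preference (assumed)
union = rng.uniform(-1.5, 1.5, n)    # standardized union membership (assumed)
y = (0.5 + 0.5 * union) * pref + rng.normal(0, 0.1, n)

X = np.column_stack([pref, union])
sigma2, lam = 2.0, 0.1               # illustrative values, not tuned
K, c = krls_fit(X, y, sigma2, lam)
dy_dpref = pointwise_derivatives(X, K, c, sigma2, dim=0)

# Average partial effect of preferences at low versus high union membership.
low = dy_dpref[union < 0].mean()
high = dy_dpref[union >= 0].mean()
print(low, high)
```

Plotting `dy_dpref` against `union` with a smoother would give a picture in the spirit of a nonparametric interaction plot: the slope with respect to preferences is recovered at each observation and only afterwards related to union membership.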

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], with sample percentiles p10, p25, p50, p75, p90 marked; y-axis: Partial effect.]

Figure IV. Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.

However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V. Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV: Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups

                                  Low income        High income

(A) Private vs. public unions
    Public unions                 0.074 (0.016)     -0.058 (0.015)
    Non-public unions             0.054 (0.016)     -0.027 (0.016)

(B) Bill ideology
    Conservative bill             0.086 (0.017)     -0.086 (0.018)
    Liberal bill                  0.052 (0.014)     -0.028 (0.013)

(C) AFL-CIO endorsement
    No position                   0.054 (0.014)     -0.054 (0.013)
    Endorsement                   0.077 (0.015)     -0.040 (0.014)

Note: Estimates for ηL and ηH with cluster-robust standard errors in parentheses. N = 15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

23 Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for ηH is smaller for endorsed bills but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that, under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
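The conversion from a log-scale regression to dollar amounts can be illustrated with Duan's (1983) smearing estimator. The following is a minimal sketch, not the authors' code: it assumes a single regressor, and the data, variable names, and helper functions are hypothetical. The key step is that a log-scale prediction is exponentiated and then multiplied by the average of the exponentiated residuals.

```python
import math

def ols_simple(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def smearing_factor(residuals):
    """Duan's smearing factor: the sample mean of exp(residual)."""
    return sum(math.exp(r) for r in residuals) / len(residuals)

def predict_level(x_new, intercept, slope, smear):
    """Retransform a log-scale prediction to the original (dollar) scale."""
    return math.exp(intercept + slope * x_new) * smear

# Hypothetical data: standardized union membership and labor contributions ($).
union = [-1.5, -0.5, 0.0, 0.5, 1.0, 2.0]
dollars = [40_000, 90_000, 70_000, 150_000, 120_000, 400_000]
log_dollars = [math.log(d) for d in dollars]

a, b = ols_simple(union, log_dollars)
resid = [ly - (a + b * u) for u, ly in zip(union, log_dollars)]
smear = smearing_factor(resid)

# Dollar effect of a one-SD increase, evaluated at average membership (0).
effect = predict_level(1.0, a, b, smear) - predict_level(0.0, a, b, smear)
```

Without the smearing factor, exponentiating the fitted log values would understate the expected dollar amount whenever the residuals have non-zero variance.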

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V: Labor contributions and selection of Democratic legislators

                                          (1)        (2)        (3)        (4)

A: Contributions channel
                                          DV: Contrib.          DV: roll call
Union membership                          0.056      0.046
                                         (0.012)    (0.014)
Contributions × low income prefs.                               0.946      0.865
                                                               (0.036)    (0.034)
Contributions × high income prefs.                             -0.735     -0.714
                                                               (0.029)    (0.031)

B: Selection channel
                                          DV: Democrat          DV: roll call
Union membership                          0.161      0.106
                                         (0.024)    (0.023)
Democrat × low income prefs.                                    0.576      0.542
                                                               (0.012)    (0.015)
Democrat × high income prefs.                                  -0.411     -0.423
                                                               (0.013)    (0.015)

District controls                                    X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N = 428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N = 15,780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N = 428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N = 15,776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).

Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).

Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
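The classification step described above can be sketched as follows. This is an illustration only: the district identifiers and incomes are invented, the function names are hypothetical, and tercile cut points from Python's `statistics.quantiles` stand in for the 33rd and 67th percentile thresholds, whose exact computation is not specified in the text.

```python
from statistics import quantiles

def tercile_thresholds(incomes):
    """Return the two cut points that split incomes into three equal groups."""
    low_cut, high_cut = quantiles(incomes, n=3, method="inclusive")
    return low_cut, high_cut

def income_group(income, low_cut, high_cut):
    """Assign a respondent to the low, middle, or high income group."""
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical household incomes for two districts (ACS-style microdata).
district_incomes = {
    "D1": [12000, 25000, 38000, 52000, 75000, 110000, 160000],
    "D2": [9000, 18000, 30000, 41000, 60000, 95000, 140000],
}

# District-specific thresholds, as in Table A2 (one pair per district).
thresholds = {d: tercile_thresholds(v) for d, v in district_incomes.items()}

# Classify a (hypothetical) survey respondent living in district D1.
group = income_group(52000, *thresholds["D1"])
```

Because the thresholds are district-specific, the same dollar income can fall into different groups in different districts.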

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.

Table A1: Matched CCES-House roll calls included in our analysis

Match  Bill       Date        House Vote   Bill        Name
                              (Yea-Nay)    Ideology†

(1)    HR 810     07/19/2006  235-193      L           Stem Cell Research Enhancement Act (Presidential Veto override)
(1)    HR 3       01/11/2007  253-174      L           Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5        06/07/2007  247-176      L           Stem Cell Research Enhancement Act of 2007
(2)    HR 2956    07/12/2007  223-201      L           Responsible Redeployment from Iraq Act
(3)    HR 2       01/10/2007  315-116      L           Fair Minimum Wage Act
(4)    HR 4297    12/08/2005  234-197      C           Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297    05/10/2006  244-185      C           Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045    07/28/2005  217-215      C           Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927     08/04/2007  227-183      C           Protect America Act
(6)    HR 6304    06/20/2008  293-129      C           FISA Amendments Act of 2008
(7)    HR 3162    08/01/2007  225-204      L           Children's Health and Medicare Protection Act
(7)    HR 976     10/18/2007  273-156      L           Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963    01/23/2008  260-152      L           Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2       02/04/2009  290-135      L           Children's Health Insurance Program Reauthorization Act
(8)    HR 3221    07/23/2008  272-152      L           Foreclosure Prevention Act of 2008
(9)    HR 3688    11/08/2007  285-132      C           United States-Peru Trade Promotion Agreement
(10)   HR 1424    10/03/2008  263-171      L           Emergency Economic Stabilization Act of 2008
(11)   HR 3080    10/12/2011  278-151      C           To implement the United States-Korea Trade Agreement
(12)   HR 3078    10/12/2011  262-167      C           To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346    06/16/2009  226-202      L           Supplemental Appropriations, Fiscal Year 2009 (Agreeing to conference report)
(14)   HR 2831    07/31/2007  225-199      L           Lilly Ledbetter Fair Pay Act
(14)   HR 11      01/09/2009  247-171      L           Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181      01/27/2009  250-177      L           Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913    04/29/2009  249-175      L           Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1       02/13/2009  246-183      L           American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454    06/26/2009  219-212      L           American Clean Energy and Security Act
(18)   HR 3590    03/21/2010  220-212      L           Patient Protection and Affordable Care Act
(19)   HR 3962    11/07/2009  221-215      L           Affordable Health Care for America Act
(20)   HR 4173    06/30/2010  237-192      L           Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965    12/15/2010  250-175      L           Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365      08/01/2011  269-161      C           Budget Control Act of 2011
(23)   H CR 34    04/15/2011  235-193      C           House Budget Plan of 2011
(24)   H CR 112   03/28/2012  38-382       C           Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8       08/01/2012  170-257      L           American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2       01/19/2011  245-189      C           Repealing the Job-Killing Health Care Law Act
(26)   HR 6079    07/11/2012  244-185      C           Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938    07/26/2011  279-147      C           North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

               33rd percentile              67th percentile
Congress    Mean     Min      Max        Mean     Min      Max

109         38123    16800    73675      77964    39612    146870
110         40127    18000    77000      83047    43600    155113
111         39021    17500    78262      82440    46000    160050
112         37381    16500    81000      79868    38500    158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

                                     Mean     SD       Min      Max       N

Roll-call vote: yea                  0.568    0.495    0.000    1.000     15780
Constituent preferences
    Low income                       0.593    0.220    0.047    0.979     15934
    High income                      0.555    0.198    0.037    0.967     15934
    Low-High Gap                     0.172    0.121    0.000    0.588     15934
Union membership [log]               9.705    1.046    6.094    13.619    15934
Population                           7.022    0.723    4.697    9.980     15934
Share African American               0.124    0.146    0.004    0.680     15934
Share Hispanic                       0.156    0.174    0.005    0.812     15934
Share BA or higher                   0.275    0.097    0.073    0.645     15934
Median income [$10,000]              5.177    1.356    2.282    10.439    15934
Share female                         0.508    0.010    0.462    0.543     15934
Manufacturing share                  0.110    0.047    0.025    0.281     15934
Urbanization                         0.790    0.199    0.213    1.000     15934
Certification elections [log]        3.347    0.861    0.000    5.100     15934
Congregations [per 1000 persons]     0.765    1.147    0.062    6.453     15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).

B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
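The train, predict, and aggregate logic described above can be sketched in a few lines. To keep the example self-contained, a simple lookup of mean preferences per covariate profile is swapped in for the random forest; this is a stand-in chosen for brevity, not the method used in the paper. All data, names, and categories below are hypothetical.

```python
from collections import defaultdict

def fit_cell_means(survey_rows):
    """Stand-in for the random forest: mean preference per covariate profile.

    survey_rows: list of (covariate_profile, preference) pairs.
    Unseen profiles fall back to the overall survey mean.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for z, y in survey_rows:
        sums[z][0] += y
        sums[z][1] += 1
    overall = sum(y for _, y in survey_rows) / len(survey_rows)
    table = {z: s / n for z, (s, n) in sums.items()}
    return lambda z: table.get(z, overall)

def district_preferences(census_rows, predict):
    """Average predicted preferences by (district, income group)."""
    acc = defaultdict(lambda: [0.0, 0])
    for district, group, z in census_rows:
        acc[(district, group)][0] += predict(z)
        acc[(district, group)][1] += 1
    return {key: s / n for key, (s, n) in acc.items()}

# Hypothetical CCES-style rows: (covariates, support for a roll call item).
survey = [
    (("young", "college"), 1), (("young", "college"), 1),
    (("young", "college"), 0), (("old", "no_college"), 0),
    (("old", "no_college"), 0), (("old", "no_college"), 1),
    (("old", "college"), 1),
]
# Hypothetical Census-style rows: (district, income group, covariates).
census = [
    ("D1", "low", ("old", "no_college")),
    ("D1", "low", ("young", "college")),
    ("D1", "high", ("old", "college")),
    ("D2", "low", ("young", "no_college")),
]

predict = fit_cell_means(survey)              # "train" on the survey
prefs = district_preferences(census, predict)  # predict and aggregate
```

The aggregation step relies on the Census-side sample being representative of each district, which is why a plain average per district and income group suffices.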

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

[Figure B1: schematic of the stacked data matrix. Rows $1, \dots, m$ are CCES units with observed covariates $Z_{i1}, \dots, Z_{iK}$ and observed preferences $Y_{i1}, \dots, Y_{iP}$. Rows $m+1, \dots, m+n$ are Census units with observed covariates and missing (NA) preferences. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict $Y^*_p = f(Z)$ for the Census rows.]

Figure B1: Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative on the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by reconstructing one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
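One way to read the sampling step is sketched below. It assumes that a crosswalk supplies, for each district, the share of its population living in each PUMA, and that individuals are drawn with replacement; neither detail is stated in the text, and all identifiers and shares are invented for illustration.

```python
import random

def synthetic_district_sample(puma_people, district_shares, size, rng):
    """Draw a district sample from PUMA donor pools.

    puma_people: {puma_id: [person, ...]} donor pool per PUMA.
    district_shares: {puma_id: share of the district's population in that PUMA}.
    Each draw first picks a PUMA in proportion to its share, then picks a
    person uniformly at random (with replacement) from that PUMA.
    """
    pumas = list(district_shares)
    weights = [district_shares[p] for p in pumas]
    chosen = rng.choices(pumas, weights=weights, k=size)
    return [rng.choice(puma_people[p]) for p in chosen]

# Hypothetical donor pools: three PUMAs with a handful of persons each.
puma_people = {
    "P1": [{"id": i, "puma": "P1"} for i in range(5)],
    "P2": [{"id": i, "puma": "P2"} for i in range(5, 9)],
    "P3": [{"id": i, "puma": "P3"} for i in range(9, 12)],
}
# Hypothetical crosswalk entry: district D1 is 80% in P1 and 20% in P2.
shares_d1 = {"P1": 0.8, "P2": 0.2}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = synthetic_district_sample(puma_people, shares_d1, 1000, rng)
```

Each sampled record can then be tagged with the district identifier, which yields a micro-data file with the Congressional district field that the public-use records lack.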

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).

26 The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the
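A midpoint Pareto scoring of binned income can be sketched as follows. The sketch uses the common two-bin variant, in which the Pareto shape parameter is estimated from the counts in the last two bins; whether the authors used exactly this variant is an assumption. The bin boundaries and counts are hypothetical and fewer than the 14 CCES bins, and upper bounds are treated as exclusive so that the $20,000 bin receives a score of $25,000.

```python
import math

def pareto_top_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail.

    The shape parameter alpha is estimated from the counts in the last
    closed bin (n_prev) and the open top bin (n_top).
    """
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    return lower_top * alpha / (alpha - 1)

def bin_scores(bins, counts):
    """Score each income bin: midpoints for closed bins, Pareto mean on top.

    bins: list of (lower, upper) pairs; the last bin has upper=None.
    counts: number of respondents in each bin.
    """
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]
    scores.append(
        pareto_top_mean(bins[-2][0], bins[-1][0], counts[-2], counts[-1])
    )
    return scores

# Hypothetical bins (upper bounds exclusive) and respondent counts.
bins = [(0, 10000), (10000, 20000), (20000, 30000), (30000, 50000),
        (50000, 100000), (100000, 150000), (150000, None)]
counts = [50, 80, 100, 120, 90, 30, 10]

scores = bin_scores(bins, counts)
```

A respondent's continuous income is then simply the score of the bin they selected.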

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

    Start with initial guesses of missing values in D
    w ← vector of column indices, sorted by increasing fraction of NA
    while not c do
        D_old^imp ← previously imputed D
        for v in w do
            Fit Random Forest: y_obs^(v) ∼ x_obs^(v)
            Predict y_mis^(v) using x_mis^(v)
            D_new^imp ← updated imputed matrix using predicted y_mis^(v)
        Update stopping criterion c
    Return completed D^imp
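The control flow of Algorithm 1 can be sketched in plain Python. To stay self-contained, the learner below is a k-nearest-neighbour average, which is swapped in for the random forest and is not the method used in the paper; the stopping rule (squared change in the imputed matrix below a tolerance) is likewise an assumption about how "no further change" is measured. The data and all function names are hypothetical, and only numeric columns are handled.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Stand-in learner (NOT a random forest): mean of the k nearest targets."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def chained_impute(data, max_iter=10, tol=1e-6):
    """Chained imputation; data is a list of rows with None for missing."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[row[v] is None for v in range(n_cols)] for row in data]
    # Initial guesses: column means of the observed values.
    imp = [row[:] for row in data]
    for v in range(n_cols):
        obs = [data[i][v] for i in range(n_rows) if not missing[i][v]]
        mean_v = sum(obs) / len(obs)
        for i in range(n_rows):
            if missing[i][v]:
                imp[i][v] = mean_v
    # Visit incomplete columns by increasing fraction of missing values.
    n_missing = [sum(missing[i][v] for i in range(n_rows)) for v in range(n_cols)]
    order = sorted((v for v in range(n_cols) if n_missing[v] > 0),
                   key=lambda v: n_missing[v])
    for _ in range(max_iter):
        old = [row[:] for row in imp]
        for v in order:
            others = [c for c in range(n_cols) if c != v]
            obs_rows = [i for i in range(n_rows) if not missing[i][v]]
            # Fit on rows where column v is observed ...
            train_x = [[imp[i][c] for c in others] for i in obs_rows]
            train_y = [imp[i][v] for i in obs_rows]
            # ... and predict the rows where it is missing.
            for i in range(n_rows):
                if missing[i][v]:
                    query = [imp[i][c] for c in others]
                    imp[i][v] = knn_predict(train_x, train_y, query)
        # Stopping criterion: total squared change in the imputed matrix.
        change = sum((imp[i][v] - old[i][v]) ** 2
                     for i in range(n_rows) for v in range(n_cols))
        if change < tol:
            break
    return imp

# Hypothetical data: two numeric columns with one missing value each.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None], [None, 10.0], [5.0, 10.0]]
completed = chained_impute(rows)
```

Because each column is predicted from the current state of all other columns, a missing covariate and a missing preference are handled by the same loop.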

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28
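For reference, a normalized root mean squared error of the kind reported above can be computed as in the following sketch. It assumes the normalization used by Stekhoven and Bühlmann (2011) for continuous variables, namely the mean squared error divided by the variance of the true values; the out-of-bag machinery is omitted and the numbers are invented.

```python
import math

def nrmse(true_values, predicted_values):
    """Normalized RMSE: sqrt(mean squared error / variance of true values)."""
    n = len(true_values)
    mean_true = sum(true_values) / n
    mse = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values)) / n
    var = sum((t - mean_true) ** 2 for t in true_values) / n
    return math.sqrt(mse / var)

# Hypothetical held-out values and two sets of predictions.
truth = [1.0, 2.0, 3.0, 4.0]
perfect = nrmse(truth, truth)                 # exact predictions
mean_only = nrmse(truth, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
```

Under this definition, a value of 0 means perfect prediction and a value of 1 means the predictions are no better than always guessing the mean, which gives a scale for reading an error of about 0.11.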

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.
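The weighting step shared by both strategies can be sketched as a Census-weighted average of model-based cell estimates. The cells, estimates, and counts below are hypothetical, and the modeling step that would produce the cell estimates is omitted.

```python
def poststratify(cell_estimates, cell_counts):
    """Census-weighted average of cell-level estimates for one district.

    cell_estimates: {cell: predicted support in that demographic cell}.
    cell_counts: {cell: number of district residents in that cell}.
    """
    total = sum(cell_counts.values())
    return sum(cell_estimates[c] * n for c, n in cell_counts.items()) / total

# Hypothetical model-based estimates for three demographic cells.
estimates = {
    ("white", "college"): 0.6,
    ("white", "no_college"): 0.4,
    ("black", "college"): 0.8,
}
# Hypothetical Census counts for the same cells in one district.
counts = {
    ("white", "college"): 200,
    ("white", "no_college"): 500,
    ("black", "college"): 300,
}

district_support = poststratify(estimates, counts)
```

The two approaches differ in how the cell estimates are produced (hierarchical model versus random forest), not in this re-weighting logic.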

MRP implementation For each roll call item in the CCES we estimate a separate modelexpressing the probability of supporting a proposal as a function of demographic character-istics e demographic aributes included in our model broadly follow Lax and Phillips(2009 2013) and are race gender education age and income30 Race is captured in threecategories (white black other) education in ve (high school or less some college 2-yearcollege degree 4-year college degree graduate degree) Age is comprised of 6 categories(18-29 30-39 40-49 50-59 60-69 70+) while income is comprised of 13 categories (withthresholds 10 15 20 25 30 40 50 60 70 80 100 120 150 [in $1000]) Our model alsoincludes district-specic intercepts For each roll-call we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


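model using penalized maximum likelihood (Chung et al. 2013):

$$
\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\mathrm{race}}_{j[i]} + \alpha^{\mathrm{gender}}_{k[i]} + \alpha^{\mathrm{age}}_{l[i]} + \alpha^{\mathrm{educ}}_{m[i]} + \alpha^{\mathrm{income}}_{n[i]} + \alpha^{\mathrm{district}}_{d[i]}\right) \tag{B.1}
$$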

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

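$$
\begin{align}
\alpha^{\mathrm{race}}_{j} &\sim N(0, \sigma^2_{\mathrm{race}}), \quad j = 1, \dots, 3 \tag{B.2} \\
\alpha^{\mathrm{gender}}_{k} &\sim N(0, \sigma^2_{\mathrm{gender}}), \quad k = 1, 2 \tag{B.3} \\
\alpha^{\mathrm{age}}_{l} &\sim N(0, \sigma^2_{\mathrm{age}}), \quad l = 1, \dots, 6 \tag{B.4} \\
\alpha^{\mathrm{educ}}_{m} &\sim N(0, \sigma^2_{\mathrm{educ}}), \quad m = 1, \dots, 5 \tag{B.5} \\
\alpha^{\mathrm{income}}_{n} &\sim N(0, \sigma^2_{\mathrm{income}}), \quad n = 1, \dots, 13 \tag{B.6}
\end{align}
$$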

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\mathrm{state}}_{s[d]}$ and freely estimated variance:

$$
\alpha^{\mathrm{district}}_{d} \sim N\left(\alpha^{\mathrm{state}}_{s[d]}, \sigma^2_{\mathrm{state}}\right) \tag{B.7}
$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
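The weighting step can be illustrated with a minimal Python sketch. It assumes that cell-level linear predictors from the hierarchical model are already available; the district label, cell labels, linear-predictor values, and Census counts below are hypothetical and only serve to show the arithmetic of poststratification.

```python
import math

def inv_logit(eta):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def poststratify(cell_eta, cell_pop):
    """Population-weighted mean of cell predictions per (district, income group).

    cell_eta: {(district, income_group, cell): linear predictor from the model}
    cell_pop: {(district, income_group, cell): Census population count}
    """
    num, den = {}, {}
    for key, eta in cell_eta.items():
        district, group, _cell = key
        weight = cell_pop[key]
        target = (district, group)
        num[target] = num.get(target, 0.0) + weight * inv_logit(eta)
        den[target] = den.get(target, 0.0) + weight
    return {target: num[target] / den[target] for target in num}

# Hypothetical cells for one district (labels and numbers are made up).
cell_eta = {
    ("D1", "low", "white/female/18-29"): 0.0,           # predicted support 0.50
    ("D1", "low", "black/male/30-39"): math.log(3.0),   # predicted support 0.75
    ("D1", "high", "white/male/50-59"): 0.0,            # predicted support 0.50
}
cell_pop = {
    ("D1", "low", "white/female/18-29"): 3000,
    ("D1", "low", "black/male/30-39"): 1000,
    ("D1", "high", "white/male/50-59"): 2000,
}
estimates = poststratify(cell_eta, cell_pop)
```

In this toy example the low income estimate is pulled towards the larger cell, because each cell contributes in proportion to its Census population rather than its survey sample size.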

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B.1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B.1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                   (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences           0.106     0.082     0.098     0.084     0.068     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences         −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences           0.182     0.158     0.181     0.162     0.115     0.115
                                (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences         −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                                (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences           0.080     0.061     0.063     0.072     0.043     0.045
                                (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences         −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                                (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects               X         X         X         X         X         X

Group preferences
  × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results even obtain when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C.1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
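To make the role of the reference distribution concrete, the following Python sketch classifies the same voter against two district income distributions. The income lists are invented for illustration, and the linear-interpolation percentile rule is an assumption of this sketch, not necessarily the rule used for the estimates reported here.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def income_group(income, reference_incomes, low_q=33, high_q=66):
    """Classify an income relative to a reference income distribution."""
    ordered = sorted(reference_incomes)
    if income < percentile(ordered, low_q):
        return "low"
    if income > percentile(ordered, high_q):
        return "high"
    return "middle"

# Hypothetical income distributions of a richer and a poorer district.
district_a = [20000, 35000, 45000, 60000, 80000, 120000, 150000]
district_b = [10000, 20000, 30000, 40000, 55000, 70000, 90000]

group_a = income_group(40000, district_a)              # 33rd percentile is 44,800
group_b = income_group(40000, district_b)              # 33rd percentile is 29,800
group_a_p20 = income_group(40000, district_a, 20, 80)  # shifted p20/p80 thresholds
```

Passing a state-wide income list as `reference_incomes` gives the state-specific variant, and the `low_q` and `high_q` arguments give the shifted p20/p80 thresholds.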


Table C.1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                   (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences           0.106     0.082     0.098     0.084     0.068     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences         −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences           0.105     0.082     0.097     0.083     0.067     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences         −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences           0.098     0.077     0.09      0.078     0.063     0.057
                                (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences         −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                                (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects               X         X         X         X         X         X

Group preferences
  × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. As proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good: everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D.1 shows the resulting variation in certification elections over districts.
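The step of locating a geocoded election site in a district can be sketched as a point-in-polygon test. The Python sketch below assumes the address has already been converted to coordinates and uses two made-up square "districts"; real district boundaries come from Census shapefiles and would normally be handled with a GIS library.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon given as (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge crosses the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def count_by_district(sites, districts):
    """Number of election sites falling into each district polygon."""
    counts = {name: 0 for name in districts}
    for lon, lat in sites:
        for name, polygon in districts.items():
            if point_in_polygon(lon, lat, polygon):
                counts[name] += 1
                break
    return counts

# Hypothetical unit-square districts and election sites.
districts = {
    "D1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "D2": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
sites = [(0.5, 0.5), (1.5, 0.2), (1.7, 0.9)]
election_counts = count_by_district(sites, districts)
```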

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency


Figure D.1: Total number of union certification elections in House districts (109th-112th Congress). Map legend classes: 0–13, 13–26, 26–39, 39–62, 62–164.

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
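The interpolation step can be sketched in a few lines of Python. The sketch assumes that each weight is the share of a county's population living in the given district, so that a county's congregations are allocated to districts in proportion to population; the county names, counts, and weights are hypothetical.

```python
def interpolate_counts(county_counts, weights):
    """Allocate county-level counts to districts using population-share weights.

    county_counts: {county: count}
    weights: {(county, district): share of county population in the district}
    """
    district_totals = {}
    for (county, district), share in weights.items():
        allocated = share * county_counts[county]
        district_totals[district] = district_totals.get(district, 0.0) + allocated
    return district_totals

# Hypothetical example: county A is split across two districts, county B is not.
county_counts = {"A": 100, "B": 40}
weights = {("A", "D1"): 0.75, ("A", "D2"): 0.25, ("B", "D2"): 1.0}
district_counts = interpolate_counts(county_counts, weights)

# With an at-large district every weight equals one, so the result is a plain sum.
at_large = interpolate_counts(county_counts, {("A", "AL"): 1.0, ("B", "AL"): 1.0})
```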


E Additional Robustness Tests

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E.1: Additional robustness tests

                                           Low income        High income
                                           preferences       preferences          N

(1) Injective preference-roll call map     0.063 (0.013)    −0.041 (0.013)    11104
(2) Extreme preferences excl.              0.074 (0.016)    −0.048 (0.015)    13308
(3) New York excluded                      0.070 (0.015)    −0.048 (0.014)    14730
(4) Local Union Concentration              0.065 (0.014)    −0.047 (0.014)    15780
(5) Trimmed LPM estimator                  0.074 (0.015)    −0.055 (0.014)    15426
(6) Errors-in-variables                    0.062 (0.004)    −0.054 (0.004)    15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. For this specification, table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
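A minimal Python sketch of this trimming rule for a single roll call is given below. The district labels and preference values are invented, and the linear-interpolation percentile definition is an assumption of the sketch.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def trim_extremes(prefs, lower=5, upper=95):
    """Keep districts whose preference estimate lies within [p_lower, p_upper]."""
    ordered = sorted(prefs.values())
    lo, hi = percentile(ordered, lower), percentile(ordered, upper)
    return {district: v for district, v in prefs.items() if lo <= v <= hi}

# Hypothetical preference estimates for 21 districts on one roll call.
prefs = {f"d{i}": i / 20 for i in range(21)}
kept = trim_extremes(prefs)
```

In the full analysis the same rule would be applied separately to every roll call, so a district can be dropped for one vote and retained for another.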

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E.1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
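The idea behind trimming can be sketched as follows: fit the linear probability model, drop observations whose fitted values fall outside the unit interval, and refit until no fitted value is out of range. The Python sketch below is one simple reading of that procedure with a single regressor and invented data; it is not the specification estimated in Table E.1.

```python
def ols_line(x, y):
    """Intercept and slope of a bivariate least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def trimmed_lpm(x, y, max_rounds=50):
    """Refit the LPM, dropping observations with fitted values outside [0, 1]."""
    keep = list(range(len(x)))
    for _ in range(max_rounds):
        intercept, slope = ols_line([x[i] for i in keep], [y[i] for i in keep])
        inside = [i for i in keep if 0.0 <= intercept + slope * x[i] <= 1.0]
        if len(inside) == len(keep):
            break  # all fitted values are valid probabilities
        keep = inside
    return intercept, slope, keep

# Hypothetical data: binary votes against a continuous covariate.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
intercept, slope, kept = trimmed_lpm(x, y)
```

Here the first fit produces fitted values below zero at x = 0 and above one at x = 9, so both observations are dropped and the second fit is retained.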

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
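The expected direction of the bias can be illustrated with a small simulation. The Python sketch below uses a classical method-of-moments correction with a known measurement error variance, which is a much simpler device than the Bayesian errors-in-variables model estimated here; all variances, the slope, and the sample size are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope of a bivariate least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(42)
n = 5000
true_slope = 1.0
var_pref, var_error = 0.04, 0.01  # hypothetical signal and error variances

theta = [rng.gauss(0.5, var_pref ** 0.5) for _ in range(n)]         # true preferences
theta_obs = [t + rng.gauss(0.0, var_error ** 0.5) for t in theta]   # estimated preferences
vote = [true_slope * t + rng.gauss(0.0, 0.1) for t in theta]

naive = ols_slope(theta_obs, vote)               # attenuated towards zero
reliability = var_pref / (var_pref + var_error)  # share of variance that is signal
corrected = naive / reliability                  # method-of-moments correction
```

The naive slope is shrunk by roughly the reliability ratio, so a small measurement error variance relative to the variance of the true preferences implies only a small correction.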

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

31 We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F.1}
$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F.2}
$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F.3}
$$

$$
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F.4}
$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F.3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F.4) for $U_d$ in (F.3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F.1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
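The three steps (selection in the outcome equation, selection in the treatment equation, OLS on the union of selected controls) can be sketched in Python. This is a simplified illustration under several assumptions: it uses a plain LASSO solved by coordinate descent instead of the root-LASSO, a single treatment without the preference interactions, a fixed hypothetical penalty of 0.2, and simulated data in which control 0 confounds the treatment.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        known = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - known) / M[k][k]
    return x

def ols(columns, y):
    """OLS coefficients of y on the given columns (intercept is element 0)."""
    cols = [[1.0] * len(y)] + columns
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(a * yi for a, yi in zip(ci, y)) for ci in cols]
    return solve(A, b)

def lasso_support(columns, y, lam, sweeps=100):
    """Indices of non-zero LASSO coefficients, found by coordinate descent."""
    n = len(y)
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    beta = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(centered):
            z = sum(v * v for v in col) / n
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / z  # soft threshold
            resid = [r - v * (new - beta[j]) for v, r in zip(col, resid)]
            beta[j] = new
    return {j for j, bj in enumerate(beta) if bj != 0.0}

# Simulated data (hypothetical): control 0 drives the treatment and the outcome.
rng = random.Random(1)
n, p = 300, 6
W = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
d = [W[0][i] + rng.gauss(0, 1) for i in range(n)]                       # treatment
y = [d[i] + 0.3 * W[0][i] + W[1][i] + rng.gauss(0, 0.5) for i in range(n)]

I1 = lasso_support(W, y, lam=0.2)   # step 1: outcome on controls
I2 = lasso_support(W, d, lam=0.2)   # step 2: treatment on controls
selected = sorted(I1 | I2)          # step 3: union of both control sets
eta_hat = ols([d] + [W[j] for j in selected], y)[1]   # coefficient on treatment
```

Because control 0 is picked up in the treatment equation, it enters the final regression even if an outcome-only selection step had missed it, which is the safeguard the double selection is meant to provide.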

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}, \theta^{h}_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$
y = Kc \tag{G.1}
$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$
K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G.2}
$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

32 For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared $L_2$ penalty to the least squares criterion:

$$
c^{*} = \underset{c \in \mathbb{R}^{D}}{\operatorname{argmin}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G.3}
$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
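The closed-form solution can be written out in a short Python sketch. It assumes a fixed, hypothetical value of the penalty (the leave-one-out search is omitted), sets the bandwidth to the number of columns as described above, and uses a toy one-dimensional data set; the linear solver is a plain Gaussian elimination included only to keep the sketch self-contained.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        known = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - known) / M[k][k]
    return x

def gaussian_kernel(X, Z, sigma2):
    """Matrix with entries exp(-||x - z||^2 / sigma2) for rows x of X and z of Z."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma2) for z in Z]
            for x in X]

def krls_fit(X, y, lam):
    """Weights c = (K + lam * I)^{-1} y with bandwidth sigma2 = number of columns."""
    sigma2 = float(len(X[0]))
    K = gaussian_kernel(X, X, sigma2)
    n = len(y)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, y), sigma2

def krls_predict(X_train, c, sigma2, X_new):
    """Predictions K_new c at new covariate points."""
    K_new = gaussian_kernel(X_new, X_train, sigma2)
    return [sum(k * ci for k, ci in zip(row, c)) for row in K_new]

# Toy data (hypothetical): a nonlinear function of one covariate.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.0, 1.0, 4.0, 9.0, 16.0]
c, sigma2 = krls_fit(X, y, lam=1e-6)
fitted = krls_predict(X, c, sigma2, X)
```

With a penalty this small the fit nearly interpolates the data; larger values of the penalty trade fit for smoothness, which is what the leave-one-out criterion balances.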

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$
\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^2} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G.4}
$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levy (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University, CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working Paper 5 2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 39–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public Opinion in State Politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The Collapse and Revival of American Community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction
Table I
Union density and representation. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                              (1)      (2)      (3)      (4)      (5)      (6)
Low income preferences       0.106    0.082    0.098    0.084    0.068    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences     −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

District fixed effects         X        X        X        X        X        X
Group preferences ×
  union policy                          X                                   X
  state constants                                X
  organizational capacity                                 X                 X
  district covariates                                              X        X

Note: N=15,780; N_d = 534; 27 roll call votes, 109th to 112th Congress. Linear probability models with standard errors robust to arbitrary within-district correlation and heteroscedasticity. All models include district fixed effects. Entries are marginal effects of union membership, η_l and η_h. Specifications (2) to (5) include coefficients for interaction (β_l, β_h) of income group preferences with state- or district-level confounders. Specification (2) includes two measures of historical state union policymaking: the share of years with right-to-work legislation and collective bargaining agreements; (3) interacts preferences with state fixed effects; (4) includes a measure of district-level capacity to organize collective action, captured by the number of churches per inhabitant and the number of NLRB union certification elections; (5) includes a large set of district-level characteristics (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, as well as median household income). Specification (6) includes all of the previously described measured variables.

social capital (Putnam 1993, 2000). Consistent with the latter, for instance, Nannicini et al. (2013) find that political accountability in Italy is higher in districts with higher social capital.

To tackle this problem, we gathered additional data capturing the organizational capacity of a district: (i) the capability of workers to organize collective action, measured via the average number of union certification elections in a district; (ii) the stock of social capital, captured by the number of congregations per 1,000 inhabitants (as well as two alternative measures of social capital, a behavioral index and the number of bowling alleys, used in robustness tests).

Union certification elections conducted by the National Labor Relations Board (NLRB) are a useful proxy, since holding such an election requires overcoming a costly organizational hurdle: at least 30 percent of employees have to sign authorization cards stating that they want to be represented by a union. Union organizers also face a non-trivial probability of being (illegally) fired by their employer (Budd 2018, ch. 6).18 We use the NLRB’s database to extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district’s organizational capacity in specification (4).

18 Certification elections are not a foregone conclusion: during the 112th Congress unions won 59
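The certification-election proxy described above amounts to a small piece of arithmetic per district. The following is a minimal sketch, assuming yearly case counts have already been geocoded and tallied by district; the function name, the example counts, and the handling of districts with zero cases are illustrative assumptions, not taken from the paper.

```python
import math

def log_average_cases(yearly_counts):
    """Natural log of the mean yearly case count for one district.

    Assumption: a district with no cases in the window returns None,
    because log(0) is undefined and the text does not say how such
    districts are treated.
    """
    if not yearly_counts:
        raise ValueError("need at least one year of counts")
    mean_cases = sum(yearly_counts) / len(yearly_counts)
    if mean_cases == 0:
        return None
    return math.log(mean_cases)

# Hypothetical district: seven years of geocoded NLRB case counts.
district_counts = [4, 6, 5, 3, 7, 5, 5]
proxy = log_average_cases(district_counts)
```

In a regression like specification (4), a value of this kind would then be interacted with the group preference measures.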

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators’ responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5) we measure a large number of districts’ socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics see Table A3). This set of covariates excludes “bad controls” (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2,200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily.”

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.

IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in “Bowling Alone” (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.
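One common way to carry out population-based weighting of county data to districts is a population-weighted mean over county-district overlaps. The sketch below assumes that approach; the overlap populations, identifiers, and function name are hypothetical, and the paper's exact interpolation procedure may differ.

```python
from collections import defaultdict

def aggregate_to_districts(county_values, overlaps):
    """Population-weighted mean of a county-level measure for each district.

    county_values: {county_id: value of the measure}
    overlaps: iterable of (county_id, district_id, population), where
              population is the number of people living in the
              intersection of that county and that district.
    """
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for county, district, pop in overlaps:
        weighted_sum[district] += county_values[county] * pop
        total_pop[district] += pop
    return {d: weighted_sum[d] / total_pop[d]
            for d in weighted_sum if total_pop[d] > 0}

# Hypothetical social capital index for two counties and two districts.
county_index = {"A": 1.0, "B": -0.5}
overlaps = [("A", "D1", 300), ("B", "D1", 100), ("B", "D2", 400)]
district_index = aggregate_to_districts(county_index, overlaps)
```

District D1 draws three quarters of its population from county A, so its value sits closer to A's index than to B's. For a count such as bowling establishments, one would more plausibly allocate counts in proportion to population shares rather than average them.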

Table II
Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

                                         Low income        High income
(1a) Social capital: bowling alleys      0.067 (0.014)    −0.051 (0.013)
(1b) Social capital index                0.065 (0.014)    −0.048 (0.013)
(2)  Redistricting                       0.067 (0.014)    −0.051 (0.013)
(3)  MRP estimated preferences           0.115 (0.022)    −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for η_l and η_h. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts; N=15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred; N=14,150. Specification (3) uses preferences estimated using MRP; see Appendix B for more details; N=15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E we report additional ‘technical’ robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far we have mainly studied the robustness of our results by adding potential confounders. In this subsection we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far, as well as “treatment” equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
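The logic of double selection can be sketched in a few lines: run one LASSO for the outcome on the controls, one for the treatment on the controls, and then fit least squares on the treatment plus the union of selected controls. The code below is a simplified illustration under several assumptions: it uses a plain coordinate-descent LASSO with a fixed, hand-picked penalty rather than the root-LASSO and sample splitting used in the paper, a single scalar treatment, and simulated data. All function names are hypothetical.

```python
import random

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the LASSO proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_select(columns, target, penalty, sweeps=100):
    """Indices of centered columns with nonzero LASSO coefficients.

    Coordinate descent on (1/2n) * RSS + penalty * sum(|b_j|).
    """
    n = len(target)
    coefs = [0.0] * len(columns)
    resid = list(target)  # residual when all coefficients are zero
    scale = [sum(v * v for v in col) / n for col in columns]
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # covariance of column j with the residual that excludes column j
            rho = sum(c * r for c, r in zip(col, resid)) / n + scale[j] * coefs[j]
            new = soft_threshold(rho, penalty) / scale[j]
            if new != coefs[j]:
                delta = new - coefs[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                coefs[j] = new
    return {j for j, b in enumerate(coefs) if b != 0.0}

def ols(columns, target):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(columns)
    a = []
    for i in range(k):
        row = [sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
        row.append(sum(u * t for u, t in zip(columns[i], target)))
        a.append(row)
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [x - factor * y for x, y in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]

def post_double_selection(outcome, treatment, controls, penalty):
    """Treatment coefficient after selecting controls from both equations."""
    y, d = center(outcome), center(treatment)
    xs = [center(col) for col in controls]
    kept = sorted(lasso_select(xs, y, penalty) | lasso_select(xs, d, penalty))
    coefs = ols([d] + [xs[j] for j in kept], y)
    return coefs[0], kept

# Simulated example: control 0 confounds treatment and outcome; true effect is 1.
random.seed(1)
n, p = 300, 8
controls = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
treatment = [controls[0][i] + random.gauss(0, 1) for i in range(n)]
outcome = [treatment[i] + 2.0 * controls[0][i] + random.gauss(0, 1) for i in range(n)]
effect, kept = post_double_selection(outcome, treatment, controls, penalty=0.2)
```

Because the confounding control predicts both the treatment and the outcome, it is picked up by both selection steps, and the final least-squares step recovers an effect close to the simulated value. Taking the union of the two selected sets is what guards against dropping a control that matters for only one of the two equations.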

Table III shows the resulting estimates from three specifications. In the first one we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2) we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III
Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups.

                              (1)      (2)      (3)
Low income preferences       0.063    0.066    0.062
                            (0.014)  (0.017)  (0.016)
High income preferences     −0.054   −0.036   −0.040
                            (0.013)  (0.015)  (0.016)

Semi-parametric terms         144      312      624
Post-LASSO terms               18       45      112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on 50% sample, parameter estimates performed on remaining 50% (N=7,884). Table entries are estimates for η_L and η_H with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators’ responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is of course the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that “a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent.” It is thus prudent to consider an analysis that “lets the data speak.” In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
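To make the idea of pointwise partial derivatives concrete, the following toy sketch fits a Gaussian-kernel regularized least squares model and evaluates the derivative of the fitted surface with respect to one covariate at two levels of the other. It is an illustration only: the kernel bandwidth, the regularization constant, the grid data, and all function names are assumptions, and the published KRLS procedure chooses its tuning parameters by a data-driven rule rather than by hand.

```python
import math

def gaussian_kernel(a, b, sigma2):
    """k(a, b) = exp(-||a - b||^2 / sigma2)."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / sigma2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(n):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [x - factor * y for x, y in zip(a[r], a[i])]
    return [a[i][n] / a[i][i] for i in range(n)]

def krls_fit(points, y, sigma2, lam):
    """Kernel weights c solving (K + lam * I) c = y."""
    n = len(points)
    k = [[gaussian_kernel(points[i], points[j], sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(k, y)

def partial_derivative(x, dim, points, coefs, sigma2):
    """Derivative of f(x) = sum_i c_i * k(x, x_i) with respect to x[dim]."""
    return sum(-2.0 / sigma2 * c * gaussian_kernel(x, p, sigma2) * (x[dim] - p[dim])
               for c, p in zip(coefs, points))

# Toy data (hypothetical): outcome = preference * union, so the true marginal
# effect of the preference equals the union level. No interaction is specified.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
points = [(pref, union) for pref in grid for union in grid]
y = [pref * union for pref, union in points]
coefs = krls_fit(points, y, sigma2=2.0, lam=0.01)

effect_low_union = partial_derivative((0.0, -1.0), 0, points, coefs, 2.0)
effect_high_union = partial_derivative((0.0, 1.0), 0, points, coefs, 2.0)
```

Although the model contains no explicit product term, the estimated derivative with respect to the preference variable is positive at high union levels and negative at low ones, which is the kind of pattern the pointwise derivatives are meant to reveal.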

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, “Low income constituents” and “High income constituents.” x-axis: Union membership [std], from −1.6 to 1.6, with percentile markers p10, p25, p50, p75, p90; y-axis: Partial effect, from −0.4 to 0.4.]

Figure IV
Nonparametric estimate of interaction between union membership and preferences.

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.

However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators’ vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for “public” and the remaining “non-public” unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district’s public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining “non-public” unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV
Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups.

                                 Low income        High income
(A) Private vs. public unions
    Public unions                0.074 (0.016)    −0.058 (0.015)
    Non-public unions            0.054 (0.016)    −0.027 (0.016)
(B) Bill ideology
    Conservative bill            0.086 (0.017)    −0.086 (0.018)
    Liberal bill                 0.052 (0.014)    −0.028 (0.013)
(C) AFL-CIO endorsement
    No position                  0.054 (0.014)    −0.054 (0.013)
    Endorsement                  0.077 (0.015)    −0.040 (0.014)

Note: Estimates for η_L and η_H with cluster-robust standard errors in parentheses. N=15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C) we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators’ responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions’ capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

23 Taken from the AFL-CIO “legislative scorecard,” https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for η_h is smaller for endorsed bills but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls) an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the “labor” sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
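The conversion from a log-scale regression back to dollar amounts can be illustrated with Duan's smearing estimator, which multiplies the naive back-transformed prediction by the average of the exponentiated residuals. The sketch below uses a single regressor and made-up numbers; the variable names and data are hypothetical and do not reproduce the paper's specification, which includes state fixed effects and district controls.

```python
import math

def fit_log_linear(x, y):
    """OLS of log(y) on x; returns intercept, slope, and residuals."""
    log_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    slope = (sum((a - mean_x) * (b - mean_ly) for a, b in zip(x, log_y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_ly - slope * mean_x
    residuals = [b - intercept - slope * a for a, b in zip(x, log_y)]
    return intercept, slope, residuals

def smearing_prediction(x_new, intercept, slope, residuals):
    """Duan (1983) smearing estimate of E[y | x_new] on the original scale."""
    smear = sum(math.exp(r) for r in residuals) / len(residuals)
    return math.exp(intercept + slope * x_new) * smear

# Hypothetical data: standardized union membership and labor contributions ($).
membership_std = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
contributions = [20000.0, 45000.0, 30000.0, 80000.0, 60000.0, 150000.0]

intercept, slope, residuals = fit_log_linear(membership_std, contributions)
at_mean = smearing_prediction(0.0, intercept, slope, residuals)
one_sd_up = smearing_prediction(1.0, intercept, slope, residuals)
dollar_effect = one_sd_up - at_mean  # change in expected dollars for +1 SD
```

Simply exponentiating the fitted log value would understate expected contributions, because the exponential of an average is smaller than the average of exponentials; the smearing factor corrects for this without assuming normal errors.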

The last two columns of Panel (A) examine how contributions moderate legislators’ responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator’s responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V. Labor contributions and selection of Democratic legislators

                                          (1)       (2)       (3)       (4)

A. Contributions channel
                                         DV: Contrib.        DV: roll call
Union membership                         0.056     0.046
                                        (0.012)   (0.014)
Contributions × low income prefs.                            0.946     0.865
                                                            (0.036)   (0.034)
Contributions × high income prefs.                          -0.735    -0.714
                                                            (0.029)   (0.031)

B. Selection channel
                                         DV: Democrat        DV: roll call
Union membership                         0.161     0.106
                                        (0.024)   (0.023)
Democrat × low income prefs.                                 0.576     0.542
                                                            (0.012)   (0.015)
Democrat × high income prefs.                               -0.411    -0.423
                                                            (0.013)   (0.015)

District controls                                    X                   X

Note: Panel (A), column (1) shows a district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15,780. Panel (B), columns (1) and (2) show district-level LPMs with state fixed effects of presence of a Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and Democratic representative. N=15,776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an


important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017) but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1. Matched CCES-House roll calls included in our analysis

Match  Bill      Date        House vote   Bill        Name
                             (Yea-Nay)    ideology†
(1)    HR 810    07/19/2006  235-193      L           Stem Cell Research Enhancement Act (Presidential Veto Override)
(1)    HR 3      01/11/2007  253-174      L           Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176      L           Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201      L           Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116      L           Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197      C           Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185      C           Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215      C           Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183      C           Protect America Act
(6)    HR 6304   06/20/2008  293-129      C           FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204      L           Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156      L           Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152      L           Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135      L           Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152      L           Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132      C           United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171      L           Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151      C           To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167      C           To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202      L           Supplemental Appropriations, Fiscal Year 2009 (Agreeing to Conference Report)
(14)   HR 2831   07/31/2007  225-199      L           Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171      L           Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177      L           Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175      L           Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183      L           American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212      L           American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212      L           Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215      L           Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192      L           Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175      L           Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161      C           Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193      C           House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382       C           Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257      L           American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189      C           Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185      C           Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147      C           North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative, based on predominant support of the bill by Democratic or Republican representatives, respectively.


Table A2. Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

                 33rd percentile               67th percentile
Congress     Mean      Min      Max        Mean      Min       Max
109         38123    16800    73675       77964    39612    146870
110         40127    18000    77000       83047    43600    155113
111         39021    17500    78262       82440    46000    160050
112         37381    16500    81000       79868    38500    158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3. Descriptive statistics of analysis sample

                                       Mean       SD      Min      Max       N
Roll-call vote: yea                   0.568    0.495    0.000    1.000   15780
Constituent preferences
  Low income                          0.593    0.220    0.047    0.979   15934
  High income                         0.555    0.198    0.037    0.967   15934
  Low-High Gap                        0.172    0.121    0.000    0.588   15934
Union membership [log]                9.705    1.046    6.094   13.619   15934
Population                            7.022    0.723    4.697    9.980   15934
Share African American                0.124    0.146    0.004    0.680   15934
Share Hispanic                        0.156    0.174    0.005    0.812   15934
Share BA or higher                    0.275    0.097    0.073    0.645   15934
Median income [$10,000]               5.177    1.356    2.282   10.439   15934
Share female                          0.508    0.010    0.462    0.543   15934
Manufacturing share                   0.110    0.047    0.025    0.281   15934
Urbanization                          0.790    0.199    0.213    1.000   15934
Certification elections [log]         3.347    0.861    0.000    5.100   15934
Congregations [per 1000 persons]      0.765    1.147    0.062    6.453   15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from m individuals in the CCES and n individuals in the Census (with n ≫ m). Both sets of individuals share K common characteristics Z_k, such as age, race, or education. The first task at hand is then to match P roll call preferences Y_p that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about Y_p = f(Z_1, ..., Z_K) for p = 1, ..., P, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use f(Z_1, ..., Z_K) in combination with observed Z in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1: schematic of the stacked data matrix. Rows 1, ..., m are CCES units with covariates Z_i1, ..., Z_iK and observed preferences Y_i1, ..., Y_iP. Rows m+1, ..., m+n are Census units with the same covariates and missing (NA) preferences. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census rows.]

Figure B1. Illustration of Small Area Estimation of District Preferences

We use a sample of m individuals from the CCES that is not necessarily representative on the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample consists of the 1% extracts of the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
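The text does not give the formula used for the open-ended bin. The following is a minimal sketch of one common variant of the midpoint Pareto estimator, assuming the Pareto shape is estimated from the counts in the two highest bins and the top bin is scored at the implied Pareto mean. The function names, bin bounds and counts are hypothetical; bins are treated as half-open intervals so that the third category is scored at exactly $25,000, as in the example above.

```python
import math


def pareto_shape(lower_prev, count_prev, lower_top, count_top):
    """Pareto shape implied by the two highest bins of a binned distribution."""
    return (math.log(count_prev + count_top) - math.log(count_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )


def bin_scores(bins, counts):
    """Return one income score per bin.

    bins:   list of (lower, upper) bounds; the last upper bound is None (open-ended).
    counts: number of respondents in each bin.
    """
    # Closed bins are scored at their midpoint.
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]
    lower_prev, lower_top = bins[-2][0], bins[-1][0]
    alpha = pareto_shape(lower_prev, counts[-2], lower_top, counts[-1])
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    # Open-ended bin: mean of a Pareto tail starting at its lower bound.
    scores.append(lower_top * alpha / (alpha - 1))
    return scores


# Hypothetical bins and counts (illustration only).
toy_bins = [(0, 10000), (10000, 20000), (20000, 30000), (30000, None)]
toy_counts = [30, 40, 20, 10]
print(bin_scores(toy_bins, toy_counts))
```

The guard on the shape parameter matters in practice: when the top bin is heavily populated, the implied Pareto mean does not exist and some other rule for the top score is needed.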

Algorithm details. For easier exposition, define a matrix D that contains both individual characteristics and roll call preferences. Let N be the number of rows of D. For any given variable v of D, D_v, with missing entries at locations i_mis^(v) ⊆ {1, ..., N}, we can separate out four parts:27

• Observed values of D_v, denoted as y_obs^(v)

• Missing values of D_v: y_mis^(v)

• Variables other than D_v with available observations i_obs^(v) = {1, ..., N} \ i_mis^(v): x_obs^(v)

• Variables other than D_v with observations i_mis^(v): x_mis^(v)

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion c (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

 1  Start with initial guesses of missing values in D
 2  w ← vector of column indices sorted by increasing fraction of NA
 3  while not c do
 4      D_old^imp ← previously imputed D
 5      for v in w do
 6          Fit Random Forest: y_obs^(v) ~ x_obs^(v)
 7          Predict y_mis^(v) using x_mis^(v)
 8          D_new^imp ← updated imputed matrix using predicted y_mis^(v)
 9      Update stopping criterion c
10  Return completed D^imp
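The control flow of Algorithm 1 can be sketched in a few lines of Python. To keep the sketch free of external dependencies, the random forest is replaced by a simple k-nearest-neighbour stand-in; this is an assumption made for illustration only and is not the learner used in the paper. All names (`chained_impute`, `knn_fit_predict`, `tol`, `max_iter`) are hypothetical, the table is assumed to be numeric with `None` marking missing entries, and the stopping rule is a simple "no further change" check.

```python
import math


def knn_fit_predict(x_obs, y_obs, x_mis, k=3):
    """Stand-in learner (NOT a random forest): mean of the k nearest neighbours."""
    preds = []
    for row in x_mis:
        ranked = sorted((math.dist(row, other), y) for other, y in zip(x_obs, y_obs))
        nearest = [y for _, y in ranked[:k]]
        preds.append(sum(nearest) / len(nearest))
    return preds


def chained_impute(data, fit_predict=knn_fit_predict, max_iter=10, tol=1e-8):
    """Fill None entries of a numeric table by cycling over its columns."""
    n_rows, n_cols = len(data), len(data[0])
    missing = {v: [i for i in range(n_rows) if data[i][v] is None]
               for v in range(n_cols)}
    imp = [row[:] for row in data]
    # Step 1: initial guesses (mean of the observed values in each column).
    for v, idx in missing.items():
        obs = [data[i][v] for i in range(n_rows) if data[i][v] is not None]
        for i in idx:
            imp[i][v] = sum(obs) / len(obs)
    # Step 2: visit columns by increasing share of missing values.
    order = sorted((v for v in missing if missing[v]), key=lambda v: len(missing[v]))
    for _ in range(max_iter):                       # Step 3
        old = [row[:] for row in imp]               # Step 4
        for v in order:                             # Step 5
            mis = set(missing[v])
            others = [c for c in range(n_cols) if c != v]
            x_obs = [[imp[i][c] for c in others] for i in range(n_rows) if i not in mis]
            y_obs = [imp[i][v] for i in range(n_rows) if i not in mis]
            x_mis = [[imp[i][c] for c in others] for i in missing[v]]
            preds = fit_predict(x_obs, y_obs, x_mis)    # Steps 6-7
            for i, p in zip(missing[v], preds):         # Step 8
                imp[i][v] = p
        # Step 9: stop once the filled-in values no longer change.
        change = sum((imp[i][v] - old[i][v]) ** 2
                     for v in order for i in missing[v])
        if change < tol:
            break
    return imp                                      # Step 10


# Hypothetical toy table: second column is missing for one row.
toy = [[1, 2], [2, 4], [3, 6], [4, None], [5, 10]]
print(chained_impute(toy))
```

Passing a different `fit_predict` callable (for example, a wrapper around a random forest implementation) changes the learner without touching the loop, which is the part Algorithm 1 describes.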

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$

We employ the notation of Gelman and Hill (2007) and denote by j[i] the category j to which individual i belongs. Here, β_0 is an intercept and the αs are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance σ^2:

$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means α_{s[d]} and freely estimated variance:

$$\alpha^{district}_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
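The weighting step described above amounts to a population-weighted average of cell-level predictions. A minimal sketch, assuming predictions and Census counts are stored in dictionaries keyed by demographic cell; the function name, cell keys and numbers are hypothetical and only illustrate the arithmetic for one district and one income group.

```python
def poststratify(cell_predictions, cell_counts):
    """Population-weighted average of cell-level predicted support."""
    total = sum(cell_counts[cell] for cell in cell_predictions)
    if total == 0:
        raise ValueError("no population in the supplied cells")
    weighted = sum(cell_predictions[cell] * cell_counts[cell]
                   for cell in cell_predictions)
    return weighted / total


# Hypothetical cells: (race, gender, age group) -> predicted probability of support.
predictions = {
    ("white", "female", "18-29"): 0.62,
    ("white", "male", "18-29"): 0.55,
    ("black", "female", "30-39"): 0.71,
}
# Hypothetical Census frequencies for the same cells.
counts = {
    ("white", "female", "18-29"): 1200,
    ("white", "male", "18-29"): 1100,
    ("black", "female", "30-39"): 400,
}
print(poststratify(predictions, counts))
```

Cells that are rare in the survey but common in the district receive a large weight here, which is why the quality of the cell-level model predictions matters for the final estimate.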

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and -0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1. Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                  (1)       (2)       (3)       (4)       (5)       (6)

A. Small Area Estimation via Chained Random Forests
Low income preferences           0.106     0.082     0.098     0.084     0.068     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences         -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B. Multilevel Regression & Poststratification
Low income preferences           0.182     0.158     0.181     0.162     0.115     0.115
                                (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences         -0.136    -0.119    -0.139    -0.122    -0.091    -0.091
                                (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C. Raw CCES means
Low income preferences           0.080     0.061     0.063     0.072     0.043     0.045
                                (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences         -0.027    -0.013    -0.010    -0.027    -0.018    -0.024
                                (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects             X         X         X         X         X         X
Group preferences
  × union policy                             X                                       X
  × state constants                                    X
  × organizational capacity                                      X                   X
  × district covariates                                                    X         X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that, in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum all three approaches lead to the same qualitative conclusions about the moderat-ing eect of unions on unequal representation in Congress e two alternative approachesto deal with the problem that CCS surveys are not representative of congressional districtsby design suggest that a larger eect of unions than the naive approach using the unadjustedsurvey data antitatively our preferred estimates are based on small area estimation viarandom forests as they are less reliant on normality assumptions and are systematicallymore conservative than those based on MRP

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
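The difference between district-specific and state-specific thresholds can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the two toy districts, their incomes, and the helper names `tercile_cutoffs` and `classify` are illustrative assumptions, and terciles from the standard library stand in for the 33rd and 66th percentiles. It is not the procedure used to build the paper's data.

```python
from statistics import quantiles

def tercile_cutoffs(incomes):
    """Return the two cut points that split incomes into three equal groups."""
    low_cut, high_cut = quantiles(incomes, n=3, method="inclusive")
    return low_cut, high_cut

def classify(income, low_cut, high_cut):
    """Assign an income to the low, middle, or high group."""
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical incomes in two districts of one hypothetical state.
district_incomes = {
    "A": [42000, 46000, 50000, 70000, 90000, 120000],
    "B": [15000, 25000, 30000, 41000, 55000, 80000],
}
state_incomes = [x for incomes in district_incomes.values() for x in incomes]

voter_income = 40000
by_district = {
    name: classify(voter_income, *tercile_cutoffs(incomes))
    for name, incomes in district_incomes.items()
}
by_state = classify(voter_income, *tercile_cutoffs(state_incomes))
```

In this toy example the same voter falls into the low income group in the richer district A, into the middle group in district B, and into the low income group under the pooled state threshold, which mirrors the Massachusetts illustration in the text.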


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the NLRB. Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. As proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).

Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Map legend bins: 0–13, 13–26, 26–39, 39–62, 62–164.
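The population-weighted interpolation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code behind the paper's variable: the function name `interpolate_to_districts`, the county and district labels, and the weights are all hypothetical, and each weight is assumed to be the share of a county's population living in the given district (so the weights of one county sum to one).

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts.

    county_counts: {county: count}
    weights: {(county, district): share of the county's population in district}
    """
    district_totals = defaultdict(float)
    for (county, district), share in weights.items():
        district_totals[district] += share * county_counts[county]
    return dict(district_totals)

# Hypothetical example: county "A" lies entirely in district "D1",
# county "B" is split between "D1" (25% of residents) and "D2" (75%).
county_counts = {"A": 100, "B": 60}
weights = {("A", "D1"): 1.0, ("B", "D1"): 0.25, ("B", "D2"): 0.75}

district_counts = interpolate_to_districts(county_counts, weights)
```

When every county is nested in a single district, all weights equal one and the function reduces to a plain sum over counties, which is the at-large case mentioned in the text. Dividing the result by district population would give a per-inhabitant measure.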


E Additional Robustness Test

In this section we describe several additional robustness tests

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                             Low income        High income
                                             preferences       preferences        N

(1) Injective preference-roll call map      0.063 (0.013)    −0.041 (0.013)    11104
(2) Extreme preferences excl.               0.074 (0.016)    −0.048 (0.015)    13308
(3) New York excluded                       0.070 (0.015)    −0.048 (0.014)    14730
(4) Local Union Concentration               0.065 (0.014)    −0.047 (0.014)    15780
(5) Trimmed LPM estimator                   0.074 (0.015)    −0.055 (0.014)    15426
(6) Errors-in-variables                     0.062 (0.004)    −0.054 (0.004)    15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
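One common reading of the trimming idea is sequential: fit the linear probability model by OLS, drop observations whose fitted values fall outside the unit interval, and refit until none of the retained observations violates the bounds. The Python sketch below follows that reading on hypothetical toy data with a single covariate; the function name `trimmed_lpm`, the data, and the stopping rule are assumptions made for illustration, not the specification estimated in Table E1.

```python
import numpy as np

def trimmed_lpm(x, y, max_iter=100):
    """Refit OLS after dropping cases with fitted values outside [0, 1]."""
    X = np.column_stack([np.ones(len(y)), x])      # intercept plus covariate
    keep = np.ones(len(y), dtype=bool)
    for _ in range(max_iter):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        inside = (fitted >= 0.0) & (fitted <= 1.0)
        if inside[keep].all():                     # no retained case violates the bounds
            break
        keep &= inside                             # dropped cases stay dropped
    return beta, keep

# Hypothetical binary data: 10 cases per covariate value, with the share of
# ones equal to a probability that is linear in x and capped at 0 and 1.
grid = np.arange(-2.0, 2.01, 0.25)
prob = np.clip(0.5 + 0.4 * grid, 0.0, 1.0)
x = np.repeat(grid, 10)
y = np.concatenate([np.r_[np.ones(k), np.zeros(10 - k)]
                    for k in np.rint(10 * prob).astype(int)])

beta_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), x]), y, rcond=None)
beta_trim, keep = trimmed_lpm(x, y)
```

In this toy setting plain OLS is flattened by the cases where the probability is capped, while the trimmed fit recovers the slope of the interior region, which is consistent with the trimmed estimates in the text being, if anything, slightly larger.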

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit

[31] We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

\[
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
\]

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

\[
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}
\]

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

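\[
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
\]

\[
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}
\]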

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
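The two selection steps followed by OLS on the union of selected controls can be sketched as follows. This is a simplified illustration, not the paper's implementation: it uses a plain LASSO solved by coordinate descent instead of the root-LASSO, a single treatment without preference interactions, fixed penalty levels, and simulated data. The names `lasso_cd` and `post_double_selection`, the penalties, and the data-generating process are all assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Plain LASSO, (1/2n)||y - Xb||^2 + lam*||b||_1, by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                  # current residual y - Xb
    for _ in range(n_iter):
        for k in range(p):
            r += X[:, k] * b[k]                 # remove column k's contribution
            rho = X[:, k] @ r / n
            b[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[k]
            r -= X[:, k] * b[k]                 # add the updated contribution back
    return b

def post_double_selection(y, d, W, lam_y, lam_d):
    """Select controls for outcome and treatment, then run OLS on their union."""
    Wc = (W - W.mean(axis=0)) / W.std(axis=0)   # standardized controls
    sel_y = np.flatnonzero(lasso_cd(Wc, y - y.mean(), lam_y))
    sel_d = np.flatnonzero(lasso_cd(Wc, d - d.mean(), lam_d))
    union = np.union1d(sel_y, sel_d)
    design = np.column_stack([np.ones(len(y)), d, Wc[:, union]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1], union                       # treatment effect, selected set

# Simulated data: control 0 drives the treatment strongly and the outcome
# only moderately; the true treatment effect is 1.0.
rng = np.random.default_rng(0)
n, p = 500, 50
W = rng.normal(size=(n, p))
d = 2.0 * W[:, 0] + rng.normal(size=n)
y = 1.0 * d + 0.3 * W[:, 0] + rng.normal(size=n)

effect, selected = post_double_selection(y, d, W, lam_y=0.1, lam_d=0.1)
```

The simulated confounder is deliberately of the kind discussed in the text: strongly related to the treatment but only moderately related to the outcome. The treatment-equation step is what guarantees that it enters the final OLS regression.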

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).[32] Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}, \theta^{h}_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

\[
y = Kc \tag{G1}
\]

Here, $K$ is an $N \times N$ Gaussian kernel matrix with entries

\[
K_{ij} = \exp\left(-\frac{\lVert z_i - z_j \rVert^{2}}{\sigma^{2}}\right) \tag{G2}
\]

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

[32] For a very general discussion, see Belloni et al. (2017).


regularization, by adding a squared L2 penalty to the least squares criterion:

\[
c^{*} = \operatorname*{argmin}_{c \in \mathbb{R}^{D}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}
\]

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^{2}$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^{2} = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
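The closed-form solution and the leave-one-out choice of $\lambda$ can be sketched in Python as follows. The sketch assumes simulated data, a small hypothetical grid of candidate $\lambda$ values, and the standard leave-one-out shortcut for regularized least squares, under which the leave-one-out residual of case $i$ equals $c_i$ divided by the $i$-th diagonal element of $(K + \lambda I)^{-1}$. The function names are illustrative and no variable standardization is applied, so this is not a reproduction of the analysis behind Figure IV.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def krls_fit(K, y, lam):
    """Return weights c = (K + lam*I)^{-1} y and leave-one-out residuals."""
    G_inv = np.linalg.inv(K + lam * np.eye(len(y)))
    c = G_inv @ y
    loo_resid = c / np.diag(G_inv)       # shortcut: no refitting needed
    return c, loo_resid

def choose_lambda(K, y, grid):
    """Pick the candidate lambda with the smallest leave-one-out squared loss."""
    losses = [np.sum(krls_fit(K, y, lam)[1] ** 2) for lam in grid]
    return grid[int(np.argmin(losses))]

# Simulated data with a nonlinear signal in the first column.
rng = np.random.default_rng(1)
Z = rng.normal(size=(40, 2))
y = np.sin(Z[:, 0]) + 0.1 * rng.normal(size=40)

K = gaussian_kernel(Z, sigma2=Z.shape[1])    # sigma^2 = D, the number of columns
lam = choose_lambda(K, y, grid=[0.01, 0.1, 1.0])
c, loo_resid = krls_fit(K, y, lam)
fitted = K @ c
```

A useful by-product of the closed form is that the fitted values satisfy $Kc = y - \lambda c$, so the residuals are simply the weights scaled by $\lambda$.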

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

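\[
\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^{2}} \sum_{i} c_i \exp\left(-\frac{\lVert z_i - z_j \rVert^{2}}{\sigma^{2}}\right) \left(z^{U_d}_{i} - z^{U_d}_{j}\right) \tag{G4}
\]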

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
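Pointwise derivatives of a fitted kernel surface can be computed and checked numerically with a short Python sketch. It rests on several assumptions: the data and weights are simulated, the function names are hypothetical, and the covariate difference is oriented as evaluation point minus kernel centre ($z_j - z_i$), which is the orientation under which the analytic expression agrees with a central finite-difference approximation of the surface $f(z) = \sum_i c_i \exp(-\lVert z_i - z \rVert^{2} / \sigma^{2})$.

```python
import numpy as np

def krls_derivatives(Z, c, sigma2, col):
    """Partial derivative of the kernel surface w.r.t. column col at every z_j."""
    diff = Z[None, :, :] - Z[:, None, :]             # diff[i, j] = z_j - z_i
    K = np.exp(-(diff ** 2).sum(axis=2) / sigma2)    # K[i, j]
    return (-2.0 / sigma2) * np.einsum("i,ij,ij->j", c, K, diff[:, :, col])

def krls_surface(z, Z, c, sigma2):
    """Evaluate f(z) = sum_i c_i exp(-||z_i - z||^2 / sigma2) at one point."""
    return float(c @ np.exp(-((Z - z) ** 2).sum(axis=1) / sigma2))

# Simulated covariates and weights; column 2 plays the role of union density.
rng = np.random.default_rng(2)
Z = rng.normal(size=(30, 3))
c = rng.normal(size=30)
sigma2 = float(Z.shape[1])
derivs = krls_derivatives(Z, c, sigma2, col=2)

# Central finite difference at one observation as a numerical check.
j, h = 5, 1e-6
step = np.zeros(Z.shape[1])
step[2] = h
finite_diff = (krls_surface(Z[j] + step, Z, c, sigma2)
               - krls_surface(Z[j] - step, Z, c, sigma2)) / (2 * h)
```

The result is one derivative per observation, which is the quantity that can then be smoothed against district-level union membership.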

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ilwu puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? the case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the us congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the us senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in congress: A study of the north american free trade agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "dem deutschen volke"? die ungleiche responsivität des bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the american states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of american politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? the impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using mrp? preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the us mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of american community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the american states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The northeast regional center for rural development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). Missforest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

extract all attempts to certify (or de-certify) a local union.19 We geocode each individual case report and locate it in a district. We then use the (logged) average number of cases in a district over the last seven years to proxy organizational potential. To count the number of congregations in a district, we use county-level data from the 2000 Religious Congregations and Membership Study and spatially interpolate it to districts. Appendix D provides more details. Both measures (interacted with group preferences) proxy a district's organizational capacity in specification (4).

Perhaps surprisingly, we find that accounting for organizational capacity only dampens the union effect by a modest amount. The estimated impact of unions on responsiveness is reduced by about 1 percentage point. Note that this may also reflect the fact that existing union strength shapes attempts to organize new firms or establishments. However, specification (4) in Table I makes clear that even after accounting for organizational capacity, we find that local union membership shapes responsiveness: a standard deviation increase in union membership still increases legislators' responsiveness to the preferences of the poor by 9 (±1) percentage points and lowers their responsiveness to the preferences of the affluent. This rules out the interpretation that the moderating effect of unions is merely an artifact of a broader propensity to overcome collective action problems.

In specification (5), we measure a large number of districts' socio-economic characteristics and allow them to interact with constituency preferences: population size, race (share of African Americans and Hispanics), education (share with BA or higher), the share of the working population employed in manufacturing, median household income, and the degree of urbanization (for descriptive statistics, see Table A3). This set of covariates excludes "bad controls" (Samii 2016), such as partisanship, that are a mechanism through which unions influence representation.20 Again, our results point towards the existence of a clear moderating effect of unions, albeit at a somewhat smaller magnitude of about 7 percentage points. Our final specification, column (6) of Table I, includes all previous covariates and again confirms our core finding.

19 There are about 2200 elections each year. Not included is voluntary card check recognition by employers. Despite several high-profile voluntary recognition campaigns in recent years, Budd (2018, 199) notes that this is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

20 Theoretically and empirically, unions shape voting and election outcomes (see our analysis of possible mechanisms below and the literature cited in the introduction). Union membership is mainly driven by economic considerations and state-level policies that are accounted for in the analysis (Feigenbaum et al. 2018). To the degree that historical district-level partisanship is linked to union organization beyond state-level policies and district socio-economic structure, this should be captured by our measure of certification elections.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.

Table II
Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications.

                                        Low income        High income
(1a) Social capital: bowling alleys     0.067 (0.014)     −0.051 (0.013)
(1b) Social capital: index              0.065 (0.014)     −0.048 (0.013)
(2)  Redistricting                      0.067 (0.014)     −0.051 (0.013)
(3)  MRP estimated preferences          0.115 (0.022)     −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for ηl and ηh. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts. N=15420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred. N=14150. Specification (3) uses preferences estimated using MRP. See Appendix B for more details. N=15647.

Redistricting. Our analysis is confined to a single apportionment period during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews, see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger, at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix, we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard Machine Learning tools, such as the LASSO.21 The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated to the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
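
The two-step logic of double selection can be illustrated with a minimal sketch on simulated data. It is not the specification estimated in the paper: it assumes a single scalar treatment `d` rather than the union-by-preference interactions, a plain coordinate-descent LASSO with a fixed, hand-picked penalty rather than the root-LASSO with a data-driven penalty, and no sample splitting or clustering. All names (`lasso_cd`, `post_double_selection`, the simulated variables) are hypothetical.

```python
import numpy as np

def soft_threshold(value, penalty):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(value) * max(abs(value) - penalty, 0.0)

def lasso_cd(X, y, penalty, n_sweeps=100):
    """Coordinate-descent LASSO minimising (1/2n)||y - Xb||^2 + penalty * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    residual = y.astype(float).copy()
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual (own contribution added back).
            rho = X[:, j] @ residual / n + col_scale[j] * beta[j]
            new_beta = soft_threshold(rho, penalty) / col_scale[j]
            residual += X[:, j] * (beta[j] - new_beta)
            beta[j] = new_beta
    return beta

def post_double_selection(y, d, X, penalty=0.1):
    """Select controls that predict y or d, then run OLS of y on d plus their union."""
    Xc, yc, dc = X - X.mean(axis=0), y - y.mean(), d - d.mean()
    selected_outcome = np.flatnonzero(lasso_cd(Xc, yc, penalty))
    selected_treatment = np.flatnonzero(lasso_cd(Xc, dc, penalty))
    selected = np.union1d(selected_outcome, selected_treatment).astype(int)
    design = np.column_stack([dc, Xc[:, selected]])
    coef, *_ = np.linalg.lstsq(design, yc, rcond=None)
    return coef[0], selected

# Simulated example (assumption): 50 candidate controls, three of which matter.
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)            # treatment equation
y = 0.5 * d + X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=n)  # outcome; true effect is 0.5
effect, selected = post_double_selection(y, d, X)
print(effect, selected)
```

Taking the union of both selection steps is what protects against omitting a control that is only weakly related to the outcome but strongly related to the treatment (here, column 1).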

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

21 The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III
Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups.

                              (1)        (2)        (3)
Low income preferences        0.063      0.066      0.062
                             (0.014)    (0.017)    (0.016)
High income preferences      −0.054     −0.036     −0.040
                             (0.013)    (0.015)    (0.016)
Semi-parametric terms         144        312        624
Post-LASSO terms              18         45         112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on 50% sample, parameter estimates performed on remaining 50% (N=7884). Table entries are estimates for ηL and ηH with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
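
A minimal sketch of this idea, under stated assumptions, is a Gaussian-kernel ridge regression whose fitted surface is differentiated analytically at every observation. The data are simulated, the bandwidth is set to the number of covariates and the ridge penalty is fixed by hand (the paper's parameter selection is described in Appendix G and is not reproduced here), and all function and variable names are hypothetical. The outcome is generated with an interaction that the estimator is never told about; the pointwise derivative with respect to `prefs` should nonetheless vary with `union`.

```python
import numpy as np

def gaussian_kernel(A, B, sigma2):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / sigma2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def fit_krls(X, y, sigma2, ridge):
    """Solve (K + ridge * I) c = y for the kernel weights c."""
    K = gaussian_kernel(X, X, sigma2)
    weights = np.linalg.solve(K + ridge * np.eye(len(y)), y)
    return K, weights

def pointwise_derivative(X, K, weights, sigma2, dim):
    """Partial derivative of the fitted surface w.r.t. column `dim` at each observation."""
    diff = X[:, None, dim] - X[None, :, dim]   # diff[j, i] = x_j[dim] - x_i[dim]
    return (-2.0 / sigma2) * (K * diff) @ weights

# Simulated data (assumption): the effect of preferences grows with union membership.
rng = np.random.default_rng(1)
n = 300
prefs = rng.normal(size=n)    # stand-in for standardized group preferences
union = rng.normal(size=n)    # stand-in for standardized union membership
y = prefs * union + 0.1 * rng.normal(size=n)
X = np.column_stack([prefs, union])

sigma2 = X.shape[1]           # bandwidth equal to the number of covariates (assumption)
K, weights = fit_krls(X, y, sigma2=sigma2, ridge=0.1)
d_prefs = pointwise_derivative(X, K, weights, sigma2, dim=0)
corr = np.corrcoef(d_prefs, union)[0, 1]
print(corr)
```

Plotting `d_prefs` against `union` with a smoother would give the kind of summary curve the text describes, without an interaction term ever entering the model.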

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

Figure IV
Nonparametric estimate of interaction between union membership and preferences.
[Two panels: "Low income constituents" and "High income constituents". x-axis: Union membership [std], from −1.6 to 1.6, with percentiles p10 to p90 marked; y-axis: Partial effect, from −0.4 to 0.4.]

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.

However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null-effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV
Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups.

                                 Low income        High income
(A) Private vs. Public unions
    Public unions                0.074 (0.016)     −0.058 (0.015)
    Non-public unions            0.054 (0.016)     −0.027 (0.016)
(B) Bill ideology
    Conservative bill            0.086 (0.017)     −0.086 (0.018)
    Liberal bill                 0.052 (0.014)     −0.028 (0.013)
(C) AFL-CIO endorsement
    No position                  0.054 (0.014)     −0.054 (0.013)
    Endorsement                  0.077 (0.015)     −0.040 (0.014)

Note: Estimates for ηL and ηH with cluster-robust standard errors in parentheses. N=15780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content and that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

23 Taken from the AFL-CIO "legislative scorecard", https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for ηh is smaller for endorsed bills but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) that these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to Dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from Labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
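
The conversion from a log-scale coefficient to dollar amounts can be sketched as follows. The text only states that the conversion follows Duan (1983); the sketch assumes this means the smearing retransformation, uses simulated data, omits the state fixed effects and district controls, and evaluates the effect at the mean of union membership. Variable names and the simulated coefficients are hypothetical, so the printed dollar figure has no relation to the $81,000 reported above.

```python
import numpy as np

# Simulated district-level data (assumption): log contributions rise with union membership.
rng = np.random.default_rng(2)
n = 428
union_std = rng.normal(size=n)   # standardized union membership
log_contrib = 10.0 + 0.3 * union_std + rng.normal(scale=0.5, size=n)

# OLS on the log scale.
design = np.column_stack([np.ones(n), union_std])
coef, *_ = np.linalg.lstsq(design, log_contrib, rcond=None)
residuals = log_contrib - design @ coef

# Duan's smearing factor: the mean of the exponentiated residuals.
smearing = np.exp(residuals).mean()

def predicted_dollars(x):
    """Retransformed prediction on the dollar scale at union membership x."""
    return np.exp(coef[0] + coef[1] * x) * smearing

# Dollar change for a one standard deviation increase, evaluated at the mean (x = 0).
dollar_effect = predicted_dollars(1.0) - predicted_dollars(0.0)
print(smearing, dollar_effect)
```

Simply exponentiating the log-scale prediction would understate expected contributions; the smearing factor corrects for this without assuming normally distributed errors.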

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians, if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V
Labor contributions and selection of Democratic legislators.

                                        (1)       (2)       (3)       (4)
A: Contributions channel
                                        DV: Contrib.        DV: roll call
Union membership                        0.056     0.046
                                       (0.012)   (0.014)
Contributions × low income prefs                            0.946     0.865
                                                           (0.036)   (0.034)
Contributions × high income prefs                          −0.735    −0.714
                                                           (0.029)   (0.031)
B: Selection channel
                                        DV: Democrat        DV: roll call
Union membership                        0.161     0.106
                                       (0.024)   (0.023)
Democrat × low income prefs                                 0.576     0.542
                                                           (0.012)   (0.015)
Democrat × high income prefs                               −0.411    −0.423
                                                           (0.013)   (0.015)
District controls                                 X                   X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle-class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the wide-spread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions (traditionally a strong force in many European countries) would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
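The threshold logic can be illustrated with a minimal sketch. The district names and income values below are hypothetical, the interpolation rule of `statistics.quantiles` is only one of several possible percentile definitions, and treating incomes exactly at a threshold as belonging to the lower group is an assumption made for illustration.

```python
from statistics import quantiles

def tercile_thresholds(incomes):
    """Return the (lower, upper) cut points that split incomes into terciles."""
    lower, upper = quantiles(incomes, n=3, method="inclusive")
    return lower, upper

def income_group(income, lower, upper):
    """Classify an income relative to a district's own thresholds."""
    if income <= lower:
        return "low"
    if income > upper:
        return "high"
    return "middle"

# Hypothetical household incomes for two districts (illustrative values only).
districts = {
    "district_a": [12000, 25000, 31000, 40000, 52000, 68000, 90000, 120000, 150000],
    "district_b": [9000, 15000, 18000, 22000, 30000, 35000, 41000, 60000, 75000],
}
thresholds = {name: tercile_thresholds(values) for name, values in districts.items()}

# The same income can fall into different groups depending on the district.
group_a = income_group(40000, *thresholds["district_a"])
group_b = income_group(40000, *thresholds["district_b"])
```

Because the cut points are computed within each district, a given income is always classified relative to the local income distribution.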

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1: Matched CCES–House roll calls included in our analysis

| Match | Bill | Date | Name | House Vote (Yea-Nay) | Bill Ideology† |
|---|---|---|---|---|---|
| (1) | HR 810 | 07/19/2006 | Stem Cell Research Enhancement Act (Presidential Veto override) | 235-193 | L |
| (1) | HR 3 | 01/11/2007 | Stem Cell Research Enhancement Act of 2007 (House) | 253-174 | L |
| (1) | S 5 | 06/07/2007 | Stem Cell Research Enhancement Act of 2007 | 247-176 | L |
| (2) | HR 2956 | 07/12/2007 | Responsible Redeployment from Iraq Act | 223-201 | L |
| (3) | HR 2 | 01/10/2007 | Fair Minimum Wage Act | 315-116 | L |
| (4) | HR 4297 | 12/08/2005 | Tax Relief Extension Reconciliation Act (Passage) | 234-197 | C |
| (4) | HR 4297 | 05/10/2006 | Tax Relief Extension Reconciliation Act (Agreeing to Conference Report) | 244-185 | C |
| (5) | HR 3045 | 07/28/2005 | Dominican Republic-Central America-United States Free Trade Agreement Implementation Act | 217-215 | C |
| (6) | S 1927 | 08/04/2007 | Protect America Act | 227-183 | C |
| (6) | HR 6304 | 06/20/2008 | FISA Amendments Act of 2008 | 293-129 | C |
| (7) | HR 3162 | 08/01/2007 | Children's Health and Medicare Protection Act | 225-204 | L |
| (7) | HR 976 | 10/18/2007 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 273-156 | L |
| (7) | HR 3963 | 01/23/2008 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 260-152 | L |
| (7) | HR 2 | 02/04/2009 | Children's Health Insurance Program Reauthorization Act | 290-135 | L |
| (8) | HR 3221 | 07/23/2008 | Foreclosure Prevention Act of 2008 | 272-152 | L |
| (9) | HR 3688 | 11/08/2007 | United States-Peru Trade Promotion Agreement | 285-132 | C |
| (10) | HR 1424 | 10/03/2008 | Emergency Economic Stabilization Act of 2008 | 263-171 | L |
| (11) | HR 3080 | 10/12/2011 | To implement the United States-Korea Trade Agreement | 278-151 | C |
| (12) | HR 3078 | 10/12/2011 | To implement the United States-Colombia Trade Promotion Agreement | 262-167 | C |
| (13) | HR 2346 | 06/16/2009 | Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report) | 226-202 | L |
| (14) | HR 2831 | 07/31/2007 | Lilly Ledbetter Fair Pay Act | 225-199 | L |
| (14) | HR 11 | 01/09/2009 | Lilly Ledbetter Fair Pay Act of 2009 (House) | 247-171 | L |
| (14) | S 181 | 01/27/2009 | Lilly Ledbetter Fair Pay Act of 2009 | 250-177 | L |
| (15) | HR 1913 | 04/29/2009 | Local Law Enforcement Hate Crimes Prevention Act | 249-175 | L |
| (16) | HR 1 | 02/13/2009 | American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report) | 246-183 | L |
| (17) | HR 2454 | 06/26/2009 | American Clean Energy and Security Act | 219-212 | L |
| (18) | HR 3590 | 03/21/2010 | Patient Protection and Affordable Care Act | 220-212 | L |
| (19) | HR 3962 | 11/07/2009 | Affordable Health Care for America Act | 221-215 | L |
| (20) | HR 4173 | 06/30/2010 | Wall Street Reform and Consumer Protection Act of 2009 | 237-192 | L |
| (21) | HR 2965 | 12/15/2010 | Don't Ask Don't Tell Repeal Act of 2010 | 250-175 | L |
| (22) | S 365 | 08/01/2011 | Budget Control Act of 2011 | 269-161 | C |
| (23) | H CR 34 | 04/15/2011 | House Budget Plan of 2011 | 235-193 | C |
| (24) | H CR 112 | 03/28/2012 | Simpson-Bowles/Cooper Amendment to House Budget Plan | 38-382 | C |
| (25) | HR 8 | 08/01/2012 | American Taxpayer Relief Act of 2012 (Levin Amendment) | 170-257 | L |
| (26) | HR 2 | 01/19/2011 | Repealing the Job-Killing Health Care Law Act | 245-189 | C |
| (26) | HR 6079 | 07/11/2012 | Repeal the Patient Protection and Affordable Care Act and [...] | 244-185 | C |
| (27) | HR 1938 | 07/26/2011 | North American-Made Energy Security Act | 279-147 | C |

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

| Congress | 33rd percentile: Mean | 33rd percentile: Min | 33rd percentile: Max | 67th percentile: Mean | 67th percentile: Min | 67th percentile: Max |
|---|---|---|---|---|---|---|
| 109 | 38123 | 16800 | 73675 | 77964 | 39612 | 146870 |
| 110 | 40127 | 18000 | 77000 | 83047 | 43600 | 155113 |
| 111 | 39021 | 17500 | 78262 | 82440 | 46000 | 160050 |
| 112 | 37381 | 16500 | 81000 | 79868 | 38500 | 158654 |

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

| Variable | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Roll-call vote: yea | 0.568 | 0.495 | 0.000 | 1.000 | 15780 |
| Constituent preferences: Low income | 0.593 | 0.220 | 0.047 | 0.979 | 15934 |
| Constituent preferences: High income | 0.555 | 0.198 | 0.037 | 0.967 | 15934 |
| Constituent preferences: Low-High Gap | 0.172 | 0.121 | 0.000 | 0.588 | 15934 |
| Union membership [log] | 9.705 | 1.046 | 6.094 | 13.619 | 15934 |
| Population | 7.022 | 0.723 | 4.697 | 9.980 | 15934 |
| Share African American | 0.124 | 0.146 | 0.004 | 0.680 | 15934 |
| Share Hispanic | 0.156 | 0.174 | 0.005 | 0.812 | 15934 |
| Share BA or higher | 0.275 | 0.097 | 0.073 | 0.645 | 15934 |
| Median income [$10,000] | 5.177 | 1.356 | 2.282 | 10.439 | 15934 |
| Share female | 0.508 | 0.010 | 0.462 | 0.543 | 15934 |
| Manufacturing share | 0.110 | 0.047 | 0.025 | 0.281 | 15934 |
| Urbanization | 0.790 | 0.199 | 0.213 | 1.000 | 15934 |
| Certification elections [log] | 3.347 | 0.861 | 0.000 | 5.100 | 15934 |
| Congregations [per 1000 persons] | 0.765 | 1.147 | 0.062 | 6.453 | 15934 |

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), and a second one that is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.²⁵

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
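The final averaging step described above can be sketched in a few lines. The rows, district label, and group names below are hypothetical; the sketch only assumes that each Census record already carries a filled-in preference value.

```python
from collections import defaultdict

def group_means(rows):
    """Average filled-in preferences by (district, income group)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of preferences, count]
    for district, group, pref in rows:
        totals[(district, group)][0] += pref
        totals[(district, group)][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

# Hypothetical completed Census rows: (district, income group, predicted support).
census_rows = [
    ("D1", "low", 1), ("D1", "low", 1), ("D1", "low", 1), ("D1", "low", 0),
    ("D1", "high", 0), ("D1", "high", 1), ("D1", "high", 0), ("D1", "high", 0),
]
means = group_means(census_rows)

# Absolute low-high preference gap for the district on this roll call.
gap = abs(means[("D1", "low")] - means[("D1", "high")])
```

Simple averages suffice here only because the rows are assumed to come from a sample that is representative within each district.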

²⁵ See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

[Figure B1: schematic of the stacked data matrix. Rows are units 1, ..., m (CCES) and m+1, ..., m+n (Census); columns are covariates Z_i1, ..., Z_iK and preferences Y_i1, ..., Y_iP. Preferences are observed for CCES rows and NA for Census rows. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census rows.]

Figure B1: Illustration of Small Area Estimation of District Preferences

We use a sample of m individuals from the CCES that is not necessarily representative on the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample consists of the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which comprises 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES, which asks respondents to place their family's total household income into 14 income bins.²⁶ We transform this discretized measure of income into a continuous one using a nonparametric midpoint Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).

²⁶ The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather which one the
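The midpoint and Pareto steps for binned income can be sketched as follows. The bin edges and counts are hypothetical (they are not the 14 CCES bins), and the shape formula is the commonly used two-bin Pareto estimate for an open-ended top category, which is assumed here rather than taken from the text.

```python
import math

def midpoint_pareto_scores(lower_bounds, counts):
    """One income score per bin: midpoints for closed bins, Pareto mean for the top bin.

    lower_bounds[i] is the lower edge of bin i; bin i ends where bin i + 1 starts,
    and the last bin is open-ended. counts[i] is the number of cases in bin i.
    """
    k = len(lower_bounds)
    scores = [(lower_bounds[i] + lower_bounds[i + 1]) / 2 for i in range(k - 1)]
    # Pareto shape estimated from the two highest bins (assumed textbook formula).
    n_top, n_prev = counts[-1], counts[-2]
    shape = (math.log(n_top + n_prev) - math.log(n_top)) / (
        math.log(lower_bounds[-1]) - math.log(lower_bounds[-2])
    )
    if shape <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    # Mean of a Pareto distribution that starts at the top bin's lower edge.
    scores.append(lower_bounds[-1] * shape / (shape - 1))
    return scores

# Hypothetical bins: 0-10k, 10-20k, 20-30k, 30-50k, 50-100k, 100k and above.
bounds = [0, 10000, 20000, 30000, 50000, 100000]
counts = [50, 120, 200, 250, 300, 100]
scores = midpoint_pareto_scores(bounds, counts)
```

Using lower edges of adjacent bins gives $25,000 for a $20,000 to $29,999 category, matching the example in the text. The top-bin score depends heavily on the counts in the two highest bins, so it is best treated as a rough imputation.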

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:²⁷

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

    Start with initial guesses of missing values in $D$
    $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
    while not $c$ do
        $D^{imp}_{old} \leftarrow$ previously imputed $D$
        for $v$ in $w$ do
            Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
            Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
            $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
        Update stopping criterion $c$
    Return completed $D^{imp}$
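The control flow of Algorithm 1 can be illustrated with a small, dependency-free sketch. Two simplifications are assumptions made for brevity: a k-nearest-neighbour average stands in for the random forest learner, and the stopping rule is a tolerance on the squared change in imputed values. All function names and the toy data are hypothetical, and only numeric columns are handled.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Stand-in learner (not a random forest): mean outcome of the k nearest rows."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def chained_impute(data, max_iter=10, tol=1e-6):
    """Fill None entries column by column until imputed values stop changing."""
    n_rows, n_cols = len(data), len(data[0])
    missing = {v: [i for i in range(n_rows) if data[i][v] is None] for v in range(n_cols)}
    filled = [row[:] for row in data]
    # Initial guess: column mean of the observed values.
    for v, rows in missing.items():
        observed = [data[i][v] for i in range(n_rows) if data[i][v] is not None]
        for i in rows:
            filled[i][v] = sum(observed) / len(observed)
    # Visit columns in order of increasing number of missing entries.
    order = sorted((v for v in missing if missing[v]), key=lambda v: len(missing[v]))
    for _ in range(max_iter):
        change = 0.0
        for v in order:
            miss = set(missing[v])
            others = [c for c in range(n_cols) if c != v]
            train_x = [[filled[i][c] for c in others] for i in range(n_rows) if i not in miss]
            train_y = [filled[i][v] for i in range(n_rows) if i not in miss]
            for i in missing[v]:
                query = [filled[i][c] for c in others]
                new_value = knn_predict(train_x, train_y, query)
                change += (new_value - filled[i][v]) ** 2
                filled[i][v] = new_value
        if change < tol:  # stopping criterion: imputations no longer move
            break
    return filled

# Toy data: the second column is twice the first, with one value missing.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, None], [6.0, 12.0]]
completed = chained_impute(data)
```

Swapping `knn_predict` for a random forest learner would leave the loop structure unchanged; only the fit and predict calls inside the inner loop differ.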

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

²⁷ Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes and find it to perform comparatively well with both continuous and categorical variables.²⁸
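For orientation, one common way to compute a normalized root mean squared error is sketched below. Normalizing by the variance of the true values is an assumption here, since the text does not spell out the formula, and the numbers are purely illustrative.

```python
import math
from statistics import pvariance

def nrmse(true_values, predicted_values):
    """Root of mean squared error divided by the variance of the true values."""
    n = len(true_values)
    mse = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values)) / n
    return math.sqrt(mse / pvariance(true_values))

truth = [1, 2, 3, 4, 5]
perfect = nrmse(truth, truth)        # predictions equal to the truth
mean_only = nrmse(truth, [3] * 5)    # always predicting the sample mean
```

Under this normalization, a value near 0 indicates close agreement, while a value near 1 means the predictions do no better than the overall mean.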

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).²⁹ No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

²⁸ See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

²⁹ This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

³⁰ We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.³⁰ Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical model using penalized maximum likelihood (Chung et al. 2013):

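$$
\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\text{race}}_{j[i]} + \alpha^{\text{gender}}_{k[i]} + \alpha^{\text{age}}_{l[i]} + \alpha^{\text{educ}}_{m[i]} + \alpha^{\text{income}}_{n[i]} + \alpha^{\text{district}}_{d[i]}\right) \tag{B1}
$$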

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta_0$ is an intercept, and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

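$$
\begin{aligned}
\alpha^{\text{race}}_{j} &\sim N\left(0, \sigma^2_{\text{race}}\right), & j &= 1, \dots, 3 && \text{(B2)} \\
\alpha^{\text{gender}}_{k} &\sim N\left(0, \sigma^2_{\text{gender}}\right), & k &= 1, 2 && \text{(B3)} \\
\alpha^{\text{age}}_{l} &\sim N\left(0, \sigma^2_{\text{age}}\right), & l &= 1, \dots, 6 && \text{(B4)} \\
\alpha^{\text{educ}}_{m} &\sim N\left(0, \sigma^2_{\text{educ}}\right), & m &= 1, \dots, 5 && \text{(B5)} \\
\alpha^{\text{income}}_{n} &\sim N\left(0, \sigma^2_{\text{income}}\right), & n &= 1, \dots, 13 && \text{(B6)}
\end{aligned}
$$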

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\text{state}}_{s[d]}$ and freely estimated variance:

$$
\alpha^{\text{district}}_{d} \sim N\left(\alpha^{\text{state}}_{s[d]}, \sigma^2_{\text{state}}\right) \tag{B7}
$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
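The poststratification step can be illustrated with a short sketch. The cell labels, effect sizes, and counts are hypothetical, and each cell effect stands in for the sum of the demographic terms of that cell; the model-fitting step itself is not shown.

```python
import math

def inv_logit(x):
    """Inverse logit link, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def poststratify(intercept, district_effect, cell_effects, cell_counts):
    """Population-weighted average of cell-level predicted support in one district."""
    total = sum(cell_counts.values())
    weighted = sum(
        count * inv_logit(intercept + district_effect + cell_effects[cell])
        for cell, count in cell_counts.items()
    )
    return weighted / total

# Hypothetical cells: summed demographic effects and Census population counts.
cell_effects = {"cell_a": 0.0, "cell_b": math.log(3)}
cell_counts = {"cell_a": 100, "cell_b": 300}
estimate = poststratify(0.0, 0.0, cell_effects, cell_counts)
```

With these illustrative inputs, the two cells have predicted support of 0.5 and 0.75, and the district estimate weights them by their population shares of one quarter and three quarters.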

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted with MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit (one standard deviation) increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote. Standard errors in parentheses.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: Small Area Estimation via Chained Random Forests | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: Multilevel Regression & Poststratification | | | | | | |
| Low income preferences | 0.182 | 0.158 | 0.181 | 0.162 | 0.115 | 0.115 |
| | (0.021) | (0.024) | (0.026) | (0.020) | (0.022) | (0.022) |
| High income preferences | −0.136 | −0.119 | −0.139 | −0.122 | −0.091 | −0.091 |
| | (0.017) | (0.019) | (0.021) | (0.017) | (0.018) | (0.018) |
| C: Raw CCES means | | | | | | |
| Low income preferences | 0.080 | 0.061 | 0.063 | 0.072 | 0.043 | 0.045 |
| | (0.010) | (0.011) | (0.012) | (0.010) | (0.011) | (0.011) |
| High income preferences | −0.027 | −0.013 | −0.010 | −0.027 | −0.018 | −0.024 |
| | (0.008) | (0.008) | (0.008) | (0.008) | (0.008) | (0.009) |
| District fixed effects | X | X | X | X | X | X |
| Group preferences × union policy | | X | | | | X |
| Group preferences × state constants | | | X | | | |
| Group preferences × organizational capacity | | | | X | | X |
| Group preferences × district covariates | | | | | X | X |

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller but still of a substantively relevant magnitude and statistically different from zero.


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote. Standard errors in parentheses.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: District-specific income thresholds | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: State-specific income thresholds | | | | | | |
| Low income preferences | 0.105 | 0.082 | 0.097 | 0.083 | 0.067 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.062 | −0.036 | −0.052 | −0.050 | −0.049 | −0.039 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.013) |
| C: Shifted income thresholds (p20 - p80) | | | | | | |
| Low income preferences | 0.098 | 0.077 | 0.09 | 0.078 | 0.063 | 0.057 |
| | (0.012) | (0.013) | (0.014) | (0.012) | (0.013) | (0.013) |
| High income preferences | −0.054 | −0.031 | −0.046 | −0.044 | −0.044 | −0.034 |
| | (0.011) | (0.012) | (0.012) | (0.011) | (0.012) | (0.012) |
| District fixed effects | X | X | X | X | X | X |
| Group preferences × union policy | | X | | | | X |
| Group preferences × state constants | | | X | | | |
| Group preferences × organizational capacity | | | | X | | X |
| Group preferences × district covariates | | | | | X | X |

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the NLRB. Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there is also a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Legend bins: 0-13, 13-26, 26-39, 39-62, 62-164.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
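The interpolation logic can be sketched with hypothetical inputs. The county and district labels, counts, and overlap populations below are illustrative, and allocating each county's count in proportion to the share of its population living in the overlap is an assumption about how the weights are applied.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts by population share.

    intersections holds (county, district, population in the overlap) triples.
    """
    county_pop = defaultdict(float)
    for county, _, pop in intersections:
        county_pop[county] += pop
    district_counts = defaultdict(float)
    for county, district, pop in intersections:
        weight = pop / county_pop[county]  # share of the county's residents in this district
        district_counts[district] += weight * county_counts[county]
    return dict(district_counts)

# Hypothetical example: county A is split across two districts, county B is not.
county_counts = {"A": 100, "B": 40}
intersections = [("A", "D1", 750), ("A", "D2", 250), ("B", "D2", 500)]
district_counts = interpolate_to_districts(county_counts, intersections)
```

When every county lies entirely within one district, all weights equal one and the procedure reduces to a simple sum, which is the at-large case mentioned in the text.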


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 observations. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

| Specification | Low income preferences | High income preferences | N |
|---|---|---|---|
| (1) Injective preference roll call map | 0.063 (0.013) | −0.041 (0.013) | 11104 |
| (2) Extreme preferences excl. | 0.074 (0.016) | −0.048 (0.015) | 13308 |
| (3) New York excluded | 0.070 (0.015) | −0.048 (0.014) | 14730 |
| (4) Local Union Concentration | 0.065 (0.014) | −0.047 (0.014) | 15780 |
| (5) Trimmed LPM estimator | 0.074 (0.015) | −0.055 (0.014) | 15426 |
| (6) Errors-in-variables | 0.062 (0.004) | −0.054 (0.004) | 15345 |

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
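The trimming rule for a single roll call can be sketched as follows. The function name and the simulated preference values are hypothetical; in an application the mask would be computed separately within each roll call.

```python
import numpy as np

def trim_mask(prefs, lower=5.0, upper=95.0):
    """Boolean mask keeping values between the lower and upper percentile."""
    prefs = np.asarray(prefs, dtype=float)
    lo, hi = np.percentile(prefs, [lower, upper])
    return (prefs >= lo) & (prefs <= hi)

# Hypothetical district preference estimates for one roll call.
rng = np.random.default_rng(0)
prefs = rng.uniform(0.0, 1.0, size=200)
keep = trim_mask(prefs)
trimmed = prefs[keep]
```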

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
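One way to read the trimming idea is as a sequential procedure: estimate the linear probability model by OLS, drop observations whose fitted values fall outside the unit interval, and re-estimate until no fitted value among the retained observations lies outside [0, 1]. The sketch below follows that reading; it is an illustration on simulated data, not the code used for Table E1, and the function name is hypothetical.

```python
import numpy as np

def trimmed_lpm(X, y):
    """Sequentially trimmed OLS for a linear probability model.

    Returns the coefficients and a mask of retained observations, all of
    which have fitted values inside the unit interval.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(y), dtype=bool)
    while True:
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        new_keep = keep & (fitted >= 0.0) & (fitted <= 1.0)
        if new_keep.sum() == keep.sum():
            return beta, keep            # nothing left to trim
        if new_keep.sum() < X.shape[1]:
            raise ValueError("too few observations left after trimming")
        keep = new_keep                  # strictly smaller set, so this ends

# Simulated binary outcome with a linear index that leaves [0, 1] in the tails.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
p_true = np.clip(0.5 + 0.3 * x, 0.0, 1.0)
y = (rng.uniform(size=500) < p_true).astype(float)
X = np.column_stack([np.ones(500), x])
beta, keep = trimmed_lpm(X, y)
```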

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

[31] We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model equation (Robinson 1988); for ease of exposition we omit district fixed effects in the notation and ignore i subscripts:

$$ y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1} $$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation

$$ U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2} $$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$ y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3} $$

$$ U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4} $$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques such as the LASSO are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.
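The selection logic can be sketched in a few lines. The example below is a deliberately simplified illustration under stated assumptions: it uses a single treatment variable rather than treatment-preference interactions, a plain coordinate-descent LASSO with a hand-picked penalty instead of the root-LASSO, and simulated data. All function names are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO minimizing (1/2n)||y - Xb||^2 + alpha*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)             # residual when beta = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]           # remove column j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]           # add the updated contribution back
    return beta

def post_double_selection(y, d, W, alpha):
    """Select controls from both reduced forms, then run OLS on their union."""
    Ws = (W - W.mean(axis=0)) / W.std(axis=0)    # standardized control terms
    sel_y = np.flatnonzero(lasso_cd(Ws, y - y.mean(), alpha))   # I_1
    sel_d = np.flatnonzero(lasso_cd(Ws, d - d.mean(), alpha))   # I_2
    union = np.union1d(sel_y, sel_d)                            # I = I_1 u I_2
    Z = np.column_stack([np.ones(len(y)), d, Ws[:, union]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1], union                        # treatment effect, control set

# Simulated data: column 0 of W confounds treatment and outcome.
rng = np.random.default_rng(1)
n, p = 400, 30
W = rng.normal(size=(n, p))
d = W[:, 0] + rng.normal(size=n)
y = 0.5 * d + W[:, 0] + 0.5 * W[:, 1] + rng.normal(size=n)
eta_hat, controls = post_double_selection(y, d, W, alpha=0.15)
```

Because the confounder enters the treatment equation strongly, the second selection step picks it up even if the outcome step were to miss it, which is the point of selecting on both equations.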

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).[32] Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}, \theta^h_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$ y = Kc \tag{G1} $$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$ K = \exp\left(-\frac{\|Z_d - z_j\|^2}{\sigma^2}\right) \tag{G2} $$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$ c^* = \underset{c \in \mathbb{R}^D}{\arg\min} \left[ (y - Kc)'(y - Kc) + \lambda c' K c \right] \tag{G3} $$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014) we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

[32] For a very general discussion, see Belloni et al. (2017).

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014: 156). For any given observation $j$ we calculate

$$ \frac{\partial y}{\partial z_j^{U_d}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\|Z_d - z_j\|^2}{\sigma^2}\right) \left(Z_d^{U_d} - z_j^{U_d}\right) \tag{G4} $$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
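The KRLS steps described in this appendix can be sketched compactly: build the Gaussian kernel, solve for the weights with the closed-form ridge solution, and evaluate pointwise derivatives. The sketch makes several assumptions that are not from the paper: the data are simulated, the penalty is fixed by hand rather than chosen by leave-one-out loss, and the difference inside the derivative is oriented as evaluation point minus kernel center, which is the orientation obtained by differentiating the Gaussian kernel directly. All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def krls_fit(Z, y, lam):
    """Closed-form weights c = (K + lam*I)^{-1} y with sigma2 = number of columns."""
    sigma2 = float(Z.shape[1])
    K = gaussian_kernel(Z, sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return K, c, sigma2

def krls_predict(Z, c, sigma2, z_new):
    """Fitted value at a new point: weighted sum of kernels centered on the data."""
    sq_dist = ((Z - z_new) ** 2).sum(axis=1)
    return float(c @ np.exp(-sq_dist / sigma2))

def pointwise_derivative(Z, K, c, sigma2, col):
    """Derivative of the fitted surface w.r.t. column `col` at every observation."""
    diff = Z[None, :, col] - Z[:, None, col]     # diff[i, j] = z_j - z_i
    return (-2.0 / sigma2) * (c[:, None] * K * diff).sum(axis=0)

# Simulated data; lam is fixed by hand here instead of tuned by leave-one-out loss.
rng = np.random.default_rng(2)
Z = rng.normal(size=(60, 2))
y = np.sin(Z[:, 0]) + 0.5 * Z[:, 1]
K, c, sigma2 = krls_fit(Z, y, lam=0.1)
derivs = pointwise_derivative(Z, K, c, sigma2, col=0)
```

A simple sanity check on such an implementation is to compare one analytic derivative with a central finite difference of `krls_predict` at the same observation.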

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, October). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial voting in the 2004 presidential election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings inequality and mobility in the United States: Evidence from Social Security data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the Hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public Opinion in State Politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The Collapse and Revival of American Community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


IV.B Further robustness tests

Alternative measures of social capital. We consider two additional measures of social capital. Our first measure is the number of bowling alleys in an area, popularized in "Bowling Alone" (Putnam 2000), based on data collected by Rupasingha and Goetz (2008). Our second measure is a composite social capital index combining information on membership in voluntary associations, voter turnout, the Census response rate, and the number of non-profit organizations (Rupasingha and Goetz 2008). We aggregate both measures to congressional districts (both refer to 2009 values) using spatial population-based weighting. Our results show that using these alternative measures does not change our core results.

Table II: Robustness tests. Marginal effects of union membership on differential legislative responsiveness under alternative specifications

                                         Low income       High income
(1a) Social capital: bowling alleys      0.067 (0.014)    −0.051 (0.013)
(1b) Social capital: index               0.065 (0.014)    −0.048 (0.013)
(2)  Redistricting                       0.067 (0.014)    −0.051 (0.013)
(3)  MRP estimated preferences           0.115 (0.022)    −0.091 (0.018)

Note: Based on specification (5) in Table I. Entries are parameter estimates for $\eta^l$ and $\eta^h$. Cluster-robust standard errors in parentheses. Specification (1) includes measures of social capital: the number of bowling establishments and the social capital index of Rupasingha and Goetz (2008), spatially interpolated to congressional districts, N=15,420. Specification (2) excludes both states (Texas and Georgia) where inter-census redistricting occurred, N=14,150. Specification (3) uses preferences estimated using MRP; see Appendix B for more details, N=15,647.

Redistricting. Our analysis is confined to a single apportionment period, during which district borders remain constant. The exceptions are several cases of court-ordered redistricting in Georgia and Texas. We exclude these two states in our second robustness test and find that our results are virtually unchanged.

MRP estimated preferences. An alternative approach to estimating district preferences is to use multilevel regression followed by poststratification (for recent overviews see Lax and Phillips 2009 or Gelman 2014). We discuss the differences in statistical assumptions made by the two approaches in detail in Appendix B. Here we show in specification (3) that using estimates based on the MRP methodology yields results that are qualitatively similar to ours. Estimated marginal effects for responsiveness towards low income constituents are somewhat larger at about 12 (±2) percentage points, while marginal effects for high income constituents are more pronounced as well. In Table B1 in the online appendix we estimate more specifications and show that responsiveness estimates based on MRP preferences are always somewhat larger than the ones based on matching using chained Random Forests. In the same table we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E we report additional 'technical' robustness tests, such as removing extreme district preferences for each roll call, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far we have mainly studied the robustness of our results by adding potential confounders. In this subsection we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far as well as "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard Machine Learning tools such as the LASSO.[21] The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

Table III shows the resulting estimates from three specications In the rst one weinclude all district variables their pairwise interactions and their interactions with districtpreferences all in both linear and quadratic form is leads to a vector of 144 covariateterms In specication (2) we extend the set of possible controls and additionally includeunion policy variables and our measures of organizational capacity (as well as all theirtransforms) leaving us with 312 terms Specication (3) allows for even more nonlinearity

21e key is to transform this system of equations into one that represents a predictive relationship (wherethe application of machine learning tools such as the LASSO make sense)


Table III. Post-double-selection estimator: Marginal effect of unionization on legislative responsiveness to low and high income groups

                               (1)        (2)        (3)
Low income preferences        0.063      0.066      0.062
                             (0.014)    (0.017)    (0.016)
High income preferences      -0.054     -0.036     -0.040
                             (0.013)    (0.015)    (0.016)
Semi-parametric terms          144        312        624
Post-LASSO terms                18         45        112

Note: The double-selection estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of the model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection is performed on a 50% sample, parameter estimates on the remaining 50% (N=7884). Table entries are estimates for ηL and ηH with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and might appear in our model estimates only because of functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check whether the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of the outcome with respect to district preferences and examine how they vary across levels of union membership (Hainmueller and Hazlett 2014, 156).
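For orientation, the pointwise derivatives have a closed form under the standard Gaussian-kernel formulation of KRLS. The notation below (choice coefficients c_i, kernel bandwidth σ², regularization parameter λ, kernel matrix K) is introduced here for illustration and is an assumption about the setup rather than a restatement of Appendix G.

```latex
\hat{f}(x) = \sum_{i=1}^{N} c_i \, k(x, x_i), \qquad
k(x, x_i) = \exp\!\left(-\frac{\lVert x - x_i \rVert^2}{\sigma^2}\right), \qquad
c = (K + \lambda I)^{-1} y

\frac{\partial \hat{f}(x_j)}{\partial x_j^{(d)}}
  = -\frac{2}{\sigma^2} \sum_{i=1}^{N} c_i \,
    \exp\!\left(-\frac{\lVert x_j - x_i \rVert^2}{\sigma^2}\right)
    \left(x_j^{(d)} - x_i^{(d)}\right)
```

Here x_j^{(d)} denotes covariate d (for example, low income preferences) for observation j. Because each observation has its own derivative, these values can be plotted against that observation's level of union membership without an interaction term ever entering the model.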

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], with percentile markers p10, p25, p50, p75, p90; y-axis: Partial effect.]

Figure IV. Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result is that a non-parametric model that does not include an a priori interaction between union membership and preferences still yields clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. For low income preferences, the difference between the two estimates is not statistically distinguishable from zero (p = 0.172). This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.
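The coding of bills by partisan vote margin can be sketched as follows. The exact rule is not spelled out in the text, so this sketch assumes that a bill counts as conservative when Republicans support it at a higher rate than Democrats; the function name and the vote counts are hypothetical and do not correspond to any roll call in Table A1.

```python
def classify_bill(dem_yea, dem_nay, rep_yea, rep_nay):
    """Label a roll call "C" (conservative) or "L" (liberal).

    Assumed rule: the bill is conservative if the share of Republicans
    voting yea exceeds the share of Democrats voting yea.
    """
    dem_support = dem_yea / (dem_yea + dem_nay)
    rep_support = rep_yea / (rep_yea + rep_nay)
    return "C" if rep_support > dem_support else "L"

# Hypothetical party breakdowns, for illustration only.
mostly_democratic = classify_bill(dem_yea=200, dem_nay=30, rep_yea=20, rep_nay=180)
mostly_republican = classify_bill(dem_yea=15, dem_nay=180, rep_yea=215, rep_nay=10)
```

The resulting indicator can then be interacted with the union-preference terms to obtain separate coefficients for each bill type.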

Table IV. Effect heterogeneity: Marginal effects of unionization on legislative responsiveness to low and high income groups

                                  Low income         High income
(A) Private vs. public unions
    Public unions                0.074 (0.016)     -0.058 (0.015)
    Non-public unions            0.054 (0.016)     -0.027 (0.016)
(B) Bill ideology
    Conservative bill            0.086 (0.017)     -0.086 (0.018)
    Liberal bill                 0.052 (0.014)     -0.028 (0.013)
(C) AFL-CIO endorsement
    No position                  0.054 (0.014)     -0.054 (0.013)
    Endorsement                  0.077 (0.015)     -0.040 (0.014)

Note: Estimates for ηL and ηH with cluster-robust standard errors in parentheses. N=15780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

23 Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for ηH is smaller for endorsed bills, but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
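The conversion from a log-scale regression to dollar amounts can be illustrated with Duan's (1983) smearing estimator, which rescales exponentiated predictions by the average exponentiated residual. The sketch below uses made-up residuals, a made-up baseline prediction, and a made-up coefficient; it does not reproduce the $81,000 figure, and the function names are hypothetical.

```python
import math

def smearing_factor(log_residuals):
    """Duan's smearing factor: the mean of the exponentiated log-scale residuals."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)

def level_change(log_pred_base, coef, delta, log_residuals):
    """Change in the expected outcome (in levels) when the regressor rises by delta.

    log_pred_base: fitted value of log(outcome) at the baseline
    coef:          log-scale regression coefficient on the regressor
    """
    factor = smearing_factor(log_residuals)
    base = math.exp(log_pred_base) * factor
    shifted = math.exp(log_pred_base + coef * delta) * factor
    return shifted - base

# Hypothetical inputs, for illustration only.
residuals = [-0.5, 0.0, 0.5]
factor = smearing_factor(residuals)
change = level_change(log_pred_base=10.0, coef=0.05, delta=1.0, log_residuals=residuals)
```

Simply exponentiating the fitted log value would understate the expected level whenever the residuals have positive variance, which is why the factor exceeds one in this example.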

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V. Labor contributions and selection of Democratic legislators

                                        (1)        (2)        (3)        (4)
A: Contributions channel
                                        DV: Contrib.          DV: roll call
Union membership                       0.056      0.046
                                      (0.012)    (0.014)
Contributions × low income prefs                             0.946      0.865
                                                            (0.036)    (0.034)
Contributions × high income prefs                           -0.735     -0.714
                                                            (0.029)    (0.031)
B: Selection channel
                                        DV: Democrat          DV: roll call
Union membership                       0.161      0.106
                                      (0.024)    (0.023)
Democrat × low income prefs                                  0.576      0.542
                                                            (0.012)    (0.015)
Democrat × high income prefs                                -0.411     -0.423
                                                            (0.013)    (0.015)
District controls                                   X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size; degree of urbanization; shares of female, Black, Hispanic, BA degrees, employed in manufacturing; median household income; organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed, but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are on average more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, we consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds, as well as the smallest and largest ones.
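The classification step can be sketched as follows. This is an illustration only: the income values are synthetic, the function and variable names are hypothetical, the percentile interpolation is whatever Python's `statistics.quantiles` uses by default, and the treatment of incomes falling exactly on a threshold is an assumption.

```python
import statistics

def district_thresholds(incomes):
    """Return the 33rd and 67th percentiles of a district's household incomes."""
    cuts = statistics.quantiles(incomes, n=100)  # 99 cut points; cuts[k-1] is the k-th percentile
    return cuts[32], cuts[66]

def income_group(income, thresholds):
    """Assign a respondent to the low, middle, or high income group of a district."""
    low_cut, high_cut = thresholds
    if income < low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Synthetic household incomes for two hypothetical districts.
acs_incomes = {
    "district_A": list(range(10_000, 100_001, 1_000)),
    "district_B": list(range(20_000, 200_001, 2_000)),
}
thresholds = {d: district_thresholds(v) for d, v in acs_incomes.items()}

# The same household income can fall into different groups in different districts.
group_in_a = income_group(60_000, thresholds["district_A"])
group_in_b = income_group(60_000, thresholds["district_B"])
```

Because thresholds are district-specific, a respondent's income group reflects their position within the local income distribution rather than the national one.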

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.
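Standardization is what allows coefficients to be read as the effect of a one standard deviation change. A minimal sketch follows; whether the population or the sample standard deviation is used is not stated, so the population version here is an assumption, and the input values are made up.

```python
import statistics

def standardize(values):
    """Rescale values to mean zero and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical raw values of a district-level variable.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = standardize(raw)
```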

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1. Matched CCES-House roll calls included in our analysis

Match  Bill      Date        House vote   Bill        Name
                             (Yea-Nay)    ideology†
(1)    HR 810    07/19/2006  235-193      L           Stem Cell Research Enhancement Act (Presidential Veto override)
(1)    HR 3      01/11/2007  253-174      L           Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176      L           Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201      L           Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116      L           Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197      C           Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185      C           Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215      C           Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183      C           Protect America Act
(6)    HR 6304   06/20/2008  293-129      C           FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204      L           Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156      L           Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152      L           Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135      L           Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152      L           Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132      C           United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171      L           Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151      C           To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167      C           To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202      L           Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report)
(14)   HR 2831   07/31/2007  225-199      L           Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171      L           Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177      L           Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175      L           Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183      L           American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212      L           American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212      L           Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215      L           Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192      L           Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175      L           Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161      C           Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193      C           House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382       C           Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257      L           American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189      C           Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185      C           Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147      C           North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2. Distribution of district income-group reference points: Average threshold over all districts, smallest and largest value

                 33rd percentile              67th percentile
Congress      Mean     Min      Max        Mean     Min      Max
109          38123   16800    73675       77964   39612   146870
110          40127   18000    77000       83047   43600   155113
111          39021   17500    78262       82440   46000   160050
112          37381   16500    81000       79868   38500   158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3. Descriptive statistics of analysis sample

                                       Mean      SD       Min      Max        N
Roll-call vote: yea                   0.568    0.495    0.000    1.000    15780
Constituent preferences
    Low income                        0.593    0.220    0.047    0.979    15934
    High income                       0.555    0.198    0.037    0.967    15934
    Low-High Gap                      0.172    0.121    0.000    0.588    15934
Union membership [log]                9.705    1.046    6.094   13.619    15934
Population                            7.022    0.723    4.697    9.980    15934
Share African American                0.124    0.146    0.004    0.680    15934
Share Hispanic                        0.156    0.174    0.005    0.812    15934
Share BA or higher                    0.275    0.097    0.073    0.645    15934
Median income [$10,000]               5.177    1.356    2.282   10.439    15934
Share female                          0.508    0.010    0.462    0.543    15934
Manufacturing share                   0.110    0.047    0.025    0.281    15934
Urbanization                          0.790    0.199    0.213    1.000    15934
Certification elections [log]         3.347    0.861    0.000    5.100    15934
Congregations [per 1000 persons]      0.765    1.147    0.062    6.453    15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), and a second that is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from m individuals in the CCES and n individuals in the Census (with n ≫ m). Both sets of individuals share K common characteristics Zk, such as age, race, or education. The first task at hand is then to match P roll call preferences Yp that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about Yp = f(Z1, ..., ZK) for p = 1, ..., P using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher-order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use f(Z1, ..., ZK) in combination with observed Z in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
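The train, predict, and aggregate workflow described above can be sketched in a few lines. To keep the example self-contained, a simple cell-mean predictor stands in for the random forest; this is a deliberate simplification and not the algorithm of Stekhoven and Bühlmann (2011). All data, covariate categories, and function names below are hypothetical.

```python
from collections import defaultdict

def train_cell_means(survey_rows):
    """Fit a stand-in predictor on survey data (NOT a random forest).

    survey_rows: list of (covariates, preference) pairs, with covariates a
    tuple of bridging characteristics Z and preference coded 0 or 1.
    Returns a function mapping covariates to a predicted preference.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for z, y in survey_rows:
        totals[z][0] += y
        totals[z][1] += 1
    overall = sum(y for _, y in survey_rows) / len(survey_rows)
    means = {z: s / k for z, (s, k) in totals.items()}
    # Fall back to the overall mean for covariate profiles unseen in the survey.
    return lambda z: means.get(z, overall)

def district_preferences(census_rows, predict):
    """Average predicted preferences by district and income group.

    census_rows: list of (district, income_group, covariates) triples.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for district, group, z in census_rows:
        sums[(district, group)][0] += predict(z)
        sums[(district, group)][1] += 1
    return {key: s / k for key, (s, k) in sums.items()}

# Hypothetical survey (CCES-like) data: covariates and support for one roll call.
survey = [
    (("young", "BA"), 1), (("young", "BA"), 1), (("young", "noBA"), 0),
    (("old", "noBA"), 0), (("old", "noBA"), 1), (("old", "BA"), 1),
]
# Hypothetical Census-like data with district and income group but no preferences.
census = [
    ("D1", "low", ("young", "noBA")), ("D1", "low", ("old", "noBA")),
    ("D1", "high", ("young", "BA")), ("D1", "high", ("old", "BA")),
    ("D2", "low", ("old", "noBA")), ("D2", "low", ("old", "noBA")),
]

predict = train_cell_means(survey)
estimates = district_preferences(census, predict)
```

The structure carries over when the stand-in is replaced by a more flexible learner: only `train_cell_means` changes, while the aggregation by district and income group stays the same.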

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


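[Figure B1: schematic of the stacked data matrix. Rows $1, \dots, m$ are CCES units with observed covariates $Z_{i1}, \dots, Z_{iK}$ and observed preferences $Y_{i1}, \dots, Y_{iP}$. Rows $m+1, \dots, m+n$ are Census units with observed covariates and missing (NA) preferences. A random forest is trained on the CCES block, $Y_p = f(Z)$, and used to predict $Y^*_p = f(Z)$ for the Census block.]

Figure B1: Illustration of Small Area Estimation of District Preferences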

We use a sample of $m$ individuals from the CCES that is not necessarily representative at the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group-specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample is the set of 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
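The midpoint-plus-Pareto scoring can be illustrated with a minimal Python sketch. It is not the authors' code: the bin edges and counts are hypothetical, the function name `midpoint_pareto_scores` is ours, and the two-point estimate of the Pareto shape from the top two bins is one common variant that we assume here for illustration.

```python
import math

def midpoint_pareto_scores(edges, counts):
    """Score income bins: midpoints for closed bins, Pareto mean for the top bin.

    edges  -- increasing lower bounds; bin i is [edges[i], edges[i+1]) and the
              last bin is open-ended: [edges[-1], infinity)
    counts -- number of respondents in each bin (same length as edges)
    """
    if len(edges) != len(counts) or len(edges) < 2:
        raise ValueError("need one count per bin and at least two bins")
    # Closed bins: simple midpoint of the two bounds.
    scores = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    # Open-ended bin: Pareto shape from the top two bins (assumed estimator).
    n_top, n_prev = counts[-1], counts[-2]
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(edges[-1]) - math.log(edges[-2]))
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    # Mean of a Pareto distribution truncated below at edges[-1].
    scores.append(edges[-1] * alpha / (alpha - 1))
    return scores

edges = [0, 10_000, 20_000, 30_000, 40_000]   # hypothetical bin lower bounds
counts = [20, 25, 25, 30, 10]                 # hypothetical respondent counts
scores = midpoint_pareto_scores(edges, counts)
```

The third bin is scored at 25,000, matching the example in the text, while the open-ended bin receives a value above its lower bound that depends on how quickly the counts thin out at the top.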

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

 1  Start with initial guesses of missing values in $D$
 2  $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3  while not $c$ do
 4      $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5      for $v$ in $w$ do
 6          Fit random forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7          Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8          $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9      Update stopping criterion $c$
10  Return completed $D^{imp}$
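The control flow of Algorithm 1 can be sketched in a few lines of Python. To keep the sketch self-contained it does not fit random forests: a simple nearest-neighbour averager (`knn_predict`, a hypothetical stand-in of ours) takes the place of the forest in steps 6 and 7, and any learner with the same call signature could be swapped in. The function names, the column-mean initial guess, and the tolerance-based stopping rule are likewise assumptions made for illustration.

```python
import math

def knn_predict(train_x, train_y, query_x, k=3):
    """Stand-in learner (NOT a random forest): mean of the k nearest targets."""
    preds = []
    for q in query_x:
        nearest = sorted(zip(train_x, train_y),
                         key=lambda xy: math.dist(xy[0], q))[:k]
        preds.append(sum(y for _, y in nearest) / len(nearest))
    return preds

def chained_impute(rows, predict=knn_predict, max_iter=10, tol=1e-8):
    """Fill None entries of a numeric table by cycling over its columns."""
    n, p = len(rows), len(rows[0])
    missing = [[rows[i][v] is None for v in range(p)] for i in range(n)]
    data = [list(r) for r in rows]
    # Step 1: initial guess = mean of the observed values in each column.
    for v in range(p):
        obs = [rows[i][v] for i in range(n) if not missing[i][v]]
        mean_v = sum(obs) / len(obs)
        for i in range(n):
            if missing[i][v]:
                data[i][v] = mean_v
    # Step 2: visit incomplete columns by increasing share of missing values.
    order = sorted((v for v in range(p) if any(missing[i][v] for i in range(n))),
                   key=lambda v: sum(missing[i][v] for i in range(n)))

    def others(i, v):
        return [data[i][u] for u in range(p) if u != v]

    for _ in range(max_iter):                      # step 3
        change = 0.0
        for v in order:                            # step 5
            obs_idx = [i for i in range(n) if not missing[i][v]]
            mis_idx = [i for i in range(n) if missing[i][v]]
            preds = predict([others(i, v) for i in obs_idx],   # step 6
                            [data[i][v] for i in obs_idx],
                            [others(i, v) for i in mis_idx])   # step 7
            for i, y_hat in zip(mis_idx, preds):   # step 8
                change += (y_hat - data[i][v]) ** 2
                data[i][v] = y_hat
        if change < tol:                           # step 9: no further change
            break
    return data                                    # step 10

table = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None]]  # toy data
completed = chained_impute(table)
```

Observed entries are never overwritten; only the cells that were missing in the input are updated on each pass.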

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28
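For readers unfamiliar with the metric, a normalized root mean squared error can be computed as below. This is a sketch under the assumption that the error is normalized by the variance of the observed values, so that a value of 1 corresponds to predicting the mean for every unit; the function name is ours.

```python
import math
from statistics import pvariance

def nrmse(observed, predicted):
    """Root of (mean squared prediction error / variance of observed values)."""
    if len(observed) != len(predicted):
        raise ValueError("inputs must have the same length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse / pvariance(observed))

perfect = nrmse([1, 2, 3, 4], [1, 2, 3, 4])            # exact predictions
baseline = nrmse([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5])   # always predict the mean
```

On this scale, a value around 0.11 means the prediction error is small relative to the spread of the variable being imputed.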

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


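model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \mathrm{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \qquad \text{(B1)}$$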

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

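$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \qquad \text{(B2)}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \qquad \text{(B3)}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \qquad \text{(B4)}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \qquad \text{(B5)}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \qquad \text{(B6)}$$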

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \qquad \text{(B7)}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
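The poststratification step amounts to a population-weighted average of cell-level predicted probabilities. The following Python sketch illustrates this for a single district and income group; the cell labels, linear-predictor values, and counts are hypothetical, and the function names are ours rather than the authors'.

```python
import math

def inv_logit(x):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def poststratify(cell_linear_predictor, cell_population):
    """Population-weighted mean of cell-level predicted support.

    cell_linear_predictor -- dict: cell -> sum of intercept and group effects
    cell_population       -- dict: cell -> Census count of that cell in the district
    """
    total = sum(cell_population.values())
    if total == 0:
        raise ValueError("district has no population in the listed cells")
    weighted = sum(inv_logit(cell_linear_predictor[cell]) * n
                   for cell, n in cell_population.items())
    return weighted / total

# Hypothetical cells defined by (race, gender, age group, education, income bin).
eta = {("white", "f", "30-39", "BA", 7): 0.0,
       ("black", "m", "18-29", "HS", 2): 0.0}
pop = {("white", "f", "30-39", "BA", 7): 10,
       ("black", "m", "18-29", "HS", 2): 30}
support = poststratify(eta, pop)
```

Because the weights come from the Census file rather than the survey, cells that are over- or under-represented among respondents are rebalanced to their share of the district population.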

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and -0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases the responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
  Low income preferences       0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
  High income preferences     -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
  Low income preferences       0.182     0.158     0.181     0.162     0.115     0.115
                              (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
  High income preferences     -0.136    -0.119    -0.139    -0.122    -0.091    -0.091
                              (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
  Low income preferences       0.080     0.061     0.063     0.072     0.043     0.045
                              (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
  High income preferences     -0.027    -0.013    -0.010    -0.027    -0.018    -0.024
                              (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means, without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
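The classification logic behind these threshold choices can be sketched as follows. This is an illustration only: the function names, the synthetic income list, and the treatment of incomes exactly at a cut point (counted as the lower group) are our assumptions, and the thresholds in the usage lines are the approximate figures quoted in the text.

```python
from statistics import quantiles

def income_cutpoints(incomes, low_pct=33, high_pct=66):
    """Return the (low, high) percentile cut points of an income distribution."""
    cuts = quantiles(incomes, n=100)   # 99 cut points; cuts[k-1] is percentile k
    return cuts[low_pct - 1], cuts[high_pct - 1]

def income_group(income, low_cut, high_cut):
    """Classify an income relative to the two cut points."""
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Cut points from a synthetic district income distribution (hypothetical data).
district_incomes = list(range(10_000, 110_000, 1_000))
low_cut, high_cut = income_cutpoints(district_incomes)
strict_low, strict_high = income_cutpoints(district_incomes, 20, 80)  # panel (C)

# The same voter under two different low-income thresholds; the high cut
# points here are made up for the example.
in_typical_district = income_group(40_000, 45_000, 90_000)   # -> "low"
in_eighth_district = income_group(40_000, 30_000, 70_000)    # -> "middle"
```

Passing district-level, state-level, or 20th/80th-percentile cut points to `income_group` reproduces the three panel definitions without changing the classification rule itself.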


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
  Low income preferences       0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
  High income preferences     -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
  Low income preferences       0.105     0.082     0.097     0.083     0.067     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
  High income preferences     -0.062    -0.036    -0.052    -0.050    -0.049    -0.039
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds (p20 - p80)
  Low income preferences       0.098     0.077     0.09      0.078     0.063     0.057
                              (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
  High income preferences     -0.054    -0.031    -0.046    -0.044    -0.044    -0.034
                              (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. As proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency


[Figure D1: map of House districts shaded by number of certification elections. Legend bins: 62 - 164, 39 - 62, 26 - 39, 13 - 26, 0 - 13.]

Figure D1: Total number of union certification elections in House districts (109th-112th Congress)

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
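A population-weighted interpolation of this kind can be sketched in Python as follows. The county and district labels, counts, and weights are hypothetical; we assume each weight is the share of a county's population living in the given district, so that the weights for a county sum to one. The function name is ours.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts using population shares.

    county_counts -- dict: county -> count (e.g., number of congregations)
    weights       -- dict: (county, district) -> share of the county's
                     population that lives in the district
    """
    district_counts = defaultdict(float)
    for (county, district), share in weights.items():
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)

counts = {"county_A": 100, "county_B": 50}          # hypothetical counts
shares = {("county_A", "district_1"): 0.25,         # county A is split
          ("county_A", "district_2"): 0.75,
          ("county_B", "district_2"): 1.0}          # county B is nested
by_district = interpolate_to_districts(counts, shares)
```

When every county lies entirely within one district, all shares equal one and the function reduces to the simple summation mentioned for at-large districts.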


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                            Low income         High income
                                            preferences        preferences           N

(1) Injective preference roll call map      0.063 (0.013)     -0.041 (0.013)     11,104
(2) Extreme preferences excl.               0.074 (0.016)     -0.048 (0.015)     13,308
(3) New York excluded                       0.070 (0.015)     -0.048 (0.014)     14,730
(4) Local Union Concentration               0.065 (0.014)     -0.047 (0.014)     15,780
(5) Trimmed LPM estimator                   0.074 (0.015)     -0.055 (0.014)     15,426
(6) Errors-in-variables                     0.062 (0.004)     -0.054 (0.004)     15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean-field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom, mean 1, and SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \qquad \text{(F1)}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \qquad \text{(F2)}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher-order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$), selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \qquad \text{(F3)}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \qquad \text{(F4)}$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}, \theta^{h}_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \qquad \text{(G1)}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \qquad \text{(G2)}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

32 For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

$$c^{*} = \underset{c \in \mathbb{R}^{D}}{\arg\min} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \qquad \text{(G3)}$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
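The closed-form solution can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the toy data are made up, $\lambda$ is fixed by hand instead of being chosen by leave-one-out loss, and the linear system is solved with a basic Gaussian elimination routine (`solve`, a helper of ours) rather than a numerical library.

```python
import math

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / sigma2)
             for zj in Z] for zi in Z]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def krls_fit(Z, y, lam):
    """Return weights c = (K + lam * I)^{-1} y and fitted values K c."""
    n = len(y)
    sigma2 = len(Z[0])                 # sigma^2 = D, the number of columns
    K = gaussian_kernel(Z, sigma2)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    c = solve(A, y)
    fitted = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
    return c, fitted

Z = [[0.0], [1.0], [2.0]]              # toy covariate matrix (one column)
y = [0.0, 1.0, 4.0]                    # toy outcomes
c, fitted = krls_fit(Z, y, lam=0.1)    # lam fixed by hand for illustration
```

Larger values of `lam` shrink the weights and produce smoother fitted surfaces, which is the trade-off that the leave-one-out criterion is meant to resolve.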

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

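$$\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^2} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \qquad \text{(G4)}$$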

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
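The pointwise derivatives can be sketched in a few lines of Python. The sketch differentiates the fitted surface f(z) = Σ_i c_i k(z_i, z) with respect to one coordinate of the evaluation point and returns one value per observation. All names and the simulated inputs are hypothetical. The sign convention of the difference term (evaluation point minus kernel center) is an assumption of this sketch, chosen so that the result agrees with a numerical finite-difference derivative of the fitted surface. The crude binning at the end merely stands in for the thin plate smoother.

```python
import numpy as np

def gaussian_kernel(A, B, sigma2):
    """Matrix with entries exp(-||a_i - b_j||^2 / sigma2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)

def krls_predict(Z_train, c, sigma2, Z_new):
    """Fitted surface f(z) = sum_i c_i * k(z_i, z), evaluated at the rows of Z_new."""
    return gaussian_kernel(Z_new, Z_train, sigma2) @ c

def pointwise_derivatives(Z, c, sigma2, col):
    """Derivative of the fitted surface w.r.t. column `col`, at each observation j."""
    K = gaussian_kernel(Z, Z, sigma2)            # K[i, j] = k(z_i, z_j)
    diff = Z[None, :, col] - Z[:, None, col]     # diff[i, j] = z_j[col] - z_i[col]
    return (-2.0 / sigma2) * (c[:, None] * K * diff).sum(axis=0)

# Hypothetical example: column 0 plays the role of union membership.
rng = np.random.default_rng(0)
Z = rng.normal(size=(80, 3))
y = Z[:, 0] * Z[:, 1] + 0.1 * rng.normal(size=80)
sigma2, lam = 3.0, 0.5
c = np.linalg.solve(gaussian_kernel(Z, Z, sigma2) + lam * np.eye(80), y)

derivs = pointwise_derivatives(Z, c, sigma2, col=1)   # d f / d z[1] at each case

# Crude summary: mean derivative below and above median "union membership".
low = derivs[Z[:, 0] <= np.median(Z[:, 0])].mean()
high = derivs[Z[:, 0] > np.median(Z[:, 0])].mean()
```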

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in congress: A study of the north american free trade agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and E. Washington (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "dem deutschen volke"? die ungleiche responsivität des bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the american states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of american politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of american community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the american states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The northeast regional center for rural development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). Missforest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction
the same table, we also show that our core results are also obtained when simply aggregating raw preference data from the CCES.

Additional robustness tests. In Appendix E, we report additional 'technical' robustness tests, such as removing extreme district preferences in each district, accounting for measurement error in district preferences, or using the robust trimmed linear probability estimator suggested by Horrace and Oaxaca (2006).

IV.C Relaxing modeling assumptions

So far, we have mainly studied the robustness of our results by adding potential confounders. In this subsection, we implement two rather different statistical specifications in order to deal with issues of omitted variable bias and functional form dependence.

Post-double-selection estimator. Our first model, using the post-double-selection estimator (Belloni et al. 2014; Chernozhukov et al. 2015), addresses bias arising from omitted variables using two strategies. First, it constructs a high-dimensional vector of controls by allowing functional transforms of observables and their higher-order interactions. It thus creates a partially linear model (Robinson 1988) using controls without the functional form restrictions commonly employed in the linear model. Second, it models both the legislative voting equation that we considered so far and "treatment" equations that model variation in the interaction of union membership and preferences. Importantly, the high-dimensional control vector enters both outcome and treatment equations. Out of the (possibly large) number of terms, one selects confounders that predict both preferences and roll call votes using standard machine learning tools such as the LASSO.[21] The selected set of covariates is used in a post-LASSO estimation step to account for relevant confounders. The resulting estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni et al. 2014). Appendix F provides more technical details. Responsible for this robustness property is the LASSO step selecting the control set from both treatment and outcome equations. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included are therefore at most mildly associated with the treatment and the outcome, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
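The logic of double selection can be illustrated with a deliberately simplified Python sketch: select controls that predict the outcome, select controls that predict the treatment, and then run least squares of the outcome on the treatment plus the union of both selected sets. The sketch departs from the estimator described above in several labeled ways: it uses a single treatment variable rather than the union-preferences interactions, a plain LASSO with a fixed, hypothetical penalty rather than the root-LASSO, and no sample splitting or cluster-robust inference. Function names and simulated data are likewise hypothetical.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO: minimise (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * b[j]                 # remove column j's contribution
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft threshold
            resid -= X[:, j] * b[j]                 # add updated contribution back
    return b

def post_double_selection(y, d, X, alpha):
    """Return (treatment coefficient, indices of selected controls)."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)       # assumes no constant columns
    sel_y = np.flatnonzero(lasso_cd(Xc, y - y.mean(), alpha))  # outcome equation
    sel_d = np.flatnonzero(lasso_cd(Xc, d - d.mean(), alpha))  # treatment equation
    selected = np.union1d(sel_y, sel_d)
    design = np.column_stack([np.ones(len(y)), d, Xc[:, selected]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)          # post-LASSO least squares
    return coef[1], selected

# Hypothetical data: control 0 confounds the treatment d and the outcome y.
rng = np.random.default_rng(0)
n, p = 1000, 30
X = rng.normal(size=(n, p))
d = X[:, 0] + rng.normal(size=n)
y = 0.5 * d + 2.0 * X[:, 0] + X[:, 1] + rng.normal(size=n)

effect, selected = post_double_selection(y, d, X, alpha=0.1)
naive = np.polyfit(d, y, 1)[0]                      # slope ignoring all controls
```

In this simulated setting the slope that ignores the controls absorbs the confounding from control 0, while the post-selection estimate should lie close to the true coefficient of 0.5.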

Table III shows the resulting estimates from three specifications. In the first one, we include all district variables, their pairwise interactions, and their interactions with district preferences, all in both linear and quadratic form. This leads to a vector of 144 covariate terms. In specification (2), we extend the set of possible controls and additionally include union policy variables and our measures of organizational capacity (as well as all their transforms), leaving us with 312 terms. Specification (3) allows for even more nonlinearity

[21] The key is to transform this system of equations into one that represents a predictive relationship (where the application of machine learning tools such as the LASSO makes sense).

Table III. Post-double-selection estimator: Marginal effect of unionization on legislative responsiveness to low and high income groups

                              (1)        (2)        (3)
Low income preferences        0.063      0.066      0.062
                              (0.014)    (0.017)    (0.016)
High income preferences       −0.054     −0.036     −0.040
                              (0.013)    (0.015)    (0.016)
Semi-parametric terms         144        312        624
Post-LASSO terms              18         45         112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations, and post-selection least squares estimation of model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on 50% sample, parameter estimates performed on remaining 50% (N=7884). Table entries are estimates for η_L and η_H with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data

speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.[22] The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], with the p10, p25, p50, p75, and p90 percentiles marked; y-axis: Partial effect.]

Figure IV. Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

[22] See Appendix G for details on the approach and parameter selection.

However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, on the premise that public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero (p = 0.172). This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills.

The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV. Effect heterogeneity: Marginal effects of unionization on legislative responsiveness to low and high income groups

                                   Low income          High income
(A) Private vs. public unions
    Public unions                  0.074 (0.016)       −0.058 (0.015)
    Non-public unions              0.054 (0.016)       −0.027 (0.016)
(B) Bill ideology
    Conservative bill              0.086 (0.017)       −0.086 (0.018)
    Liberal bill                   0.052 (0.014)       −0.028 (0.013)
(C) AFL-CIO endorsement
    No position                    0.054 (0.014)       −0.054 (0.013)
    Endorsement                    0.077 (0.015)       −0.040 (0.014)

Note: Estimates for η_L and η_H with cluster-robust standard errors in parentheses. N=15780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.[23] AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).[24] The fact that districts with higher union membership see better representation of the less affluent,

[23] Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

[24] For high-income preferences, the estimate for η_H is smaller for endorsed bills, but still significantly different from zero.

more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
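The conversion from a log-scale regression back to dollar amounts can be sketched as follows. The example fits least squares on the log scale and rescales the exponentiated prediction by the mean of the exponentiated residuals, which is the smearing factor of Duan (1983). Everything specific here is hypothetical: the simulated data, the variable names, and the single-regressor design are illustrative assumptions and do not reproduce the district-level models or the reported dollar figure.

```python
import numpy as np

def smearing_predict(X, log_y, X_new):
    """OLS on the log scale, then smearing retransformation to the original scale."""
    beta, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    residuals = log_y - X @ beta
    smear = float(np.mean(np.exp(residuals)))   # smearing factor
    return np.exp(X_new @ beta) * smear, smear

# Hypothetical data: logged contributions on standardized union membership.
rng = np.random.default_rng(0)
n = 5000
union_std = rng.normal(size=n)
log_contrib = 1.0 + 0.5 * union_std + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), union_std])    # intercept + regressor
X_new = np.array([[1.0, 0.0],                   # average union membership
                  [1.0, 1.0]])                  # one standard deviation higher
pred, smear = smearing_predict(X, log_contrib, X_new)

# Effect of a one-standard-deviation increase on the original (dollar) scale.
effect_original_scale = pred[1] - pred[0]
```

Simply exponentiating the log-scale prediction would understate the expected value on the original scale whenever the residuals vary, which is why the smearing factor exceeds one in a model with an intercept.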

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators

Table V. Labor contributions and selection of Democratic legislators

                                       (1)        (2)        (3)        (4)

A: Contributions channel
                                       DV: Contrib.          DV: roll call
Union membership                       0.056      0.046
                                       (0.012)    (0.014)
Contributions × low income prefs                             0.946      0.865
                                                             (0.036)    (0.034)
Contributions × high income prefs                            −0.735     −0.714
                                                             (0.029)    (0.031)

B: Selection channel
                                       DV: Democrat          DV: roll call
Union membership                       0.161      0.106
                                       (0.024)    (0.023)
Democrat × low income prefs                                  0.576      0.542
                                                             (0.012)    (0.015)
Democrat × high income prefs                                 −0.411     −0.423
                                                             (0.013)    (0.015)

District controls                                 X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an


important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger, to demonstrate that members’ contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers’ views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers’ propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens’ interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens’ preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions’ influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each Congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
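The threshold construction can be sketched in a few lines. This is only an illustration under stated assumptions: the district labels, incomes, and function names are hypothetical, and `statistics.quantiles` with its default interpolation method stands in for whatever percentile rule was applied to the American Community Survey files.

```python
from statistics import quantiles

def tercile_thresholds(incomes):
    """Return the two cut points that split a list of incomes into three equal-sized groups."""
    low_cut, high_cut = quantiles(incomes, n=3)
    return low_cut, high_cut

def district_thresholds(households):
    """Map each district to its (low_cut, high_cut) pair.

    `households` is an iterable of (district, income) pairs.
    """
    by_district = {}
    for district, income in households:
        by_district.setdefault(district, []).append(income)
    return {d: tercile_thresholds(v) for d, v in by_district.items()}

def income_group(income, low_cut, high_cut):
    """Classify one income relative to its district's thresholds."""
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical household incomes for two made-up districts.
households = [("D1", x * 10_000) for x in range(1, 10)]
households += [("D2", x * 20_000) for x in range(1, 10)]
cuts = district_thresholds(households)
```

Because the cut points are computed separately per district, the same income can fall into different groups depending on where the respondent lives.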

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include: the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1
Matched CCES–House roll calls included in our analysis

Match  Bill      Date        House vote (Yea-Nay)  Ideology†  Name
(1)    HR 810    07/19/2006  235-193               L          Stem Cell Research Enhancement Act (Presidential Veto override)
(1)    HR 3      01/11/2007  253-174               L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176               L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201               L          Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116               L          Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197               C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185               C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215               C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183               C          Protect America Act
(6)    HR 6304   06/20/2008  293-129               C          FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204               L          Children’s Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156               L          Children’s Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152               L          Children’s Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135               L          Children’s Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152               L          Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132               C          United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171               L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151               C          To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167               C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202               L          Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report)
(14)   HR 2831   07/31/2007  225-199               L          Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171               L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177               L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175               L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183               L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212               L          American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212               L          Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215               L          Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192               L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175               L          Don’t Ask Don’t Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161               C          Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193               C          House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382                C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257               L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189               C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185               C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147               C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill’s ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2
Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

                33rd percentile                67th percentile
Congress      Mean      Min      Max         Mean      Min      Max
109         38,123   16,800   73,675       77,964   39,612  146,870
110         40,127   18,000   77,000       83,047   43,600  155,113
111         39,021   17,500   78,262       82,440   46,000  160,050
112         37,381   16,500   81,000       79,868   38,500  158,654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3
Descriptive statistics of analysis sample

                                       Mean      SD     Min     Max        N
Roll-call vote: yea                   0.568   0.495   0.000   1.000   15,780
Constituent preferences
  Low income                          0.593   0.220   0.047   0.979   15,934
  High income                         0.555   0.198   0.037   0.967   15,934
  Low-High Gap                        0.172   0.121   0.000   0.588   15,934
Union membership [log]                9.705   1.046   6.094  13.619   15,934
Population                            7.022   0.723   4.697   9.980   15,934
Share African American                0.124   0.146   0.004   0.680   15,934
Share Hispanic                        0.156   0.174   0.005   0.812   15,934
Share BA or higher                    0.275   0.097   0.073   0.645   15,934
Median income [$10,000]               5.177   1.356   2.282  10.439   15,934
Share female                          0.508   0.010   0.462   0.543   15,934
Manufacturing share                   0.110   0.047   0.025   0.281   15,934
Urbanization                          0.790   0.199   0.213   1.000   15,934
Certification elections [log]         3.347   0.861   0.000   5.100   15,934
Congregations [per 1000 persons]      0.765   1.147   0.062   6.453   15,934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census’ definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.[25]

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from \(m\) individuals in the CCES and \(n\) individuals in the Census (with \(n \gg m\)). Both sets of individuals share \(K\) common characteristics \(Z_k\), such as age, race, or education. The first task at hand is then to match \(P\) roll call preferences \(Y_p\) that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about \(Y_p = f(Z_1, \ldots, Z_K)\) for \(p = 1, \ldots, P\), using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use \(f(Z_1, \ldots, Z_K)\) in combination with observed \(Z\) in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district’s population.
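The final aggregation step, averaging filled-in preferences by district and income group, can be sketched as follows. The record layout, district labels, and function name are hypothetical; the sketch assumes each Census record already carries an imputed 0/1 preference for one roll call.

```python
from collections import defaultdict

def district_group_means(records):
    """Average a 0/1 preference within each (district, income_group) cell.

    `records` is an iterable of (district, income_group, preference) triples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for district, group, preference in records:
        sums[(district, group)] += preference
        counts[(district, group)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Hypothetical completed Census records for one made-up district.
records = [
    ("D1", "low", 1),
    ("D1", "low", 0),
    ("D1", "high", 1),
]
means = district_group_means(records)
```

A plain unweighted mean is appropriate here only because the Census-based sample is treated as representative of each district; with a non-representative sample, weights would be needed.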

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

[25] See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1 (diagram): data layout with columns for covariates \(Z_{i1}, \ldots, Z_{iK}\) and preferences \(Y_{i1}, \ldots, Y_{iP}\). CCES units \(1, \ldots, m\) have both covariates and preferences observed; Census units \(m+1, \ldots, m+n\) have covariates only, with preferences marked NA. A Random Forest is trained on the CCES, \(Y_p = f(Z)\), and used to predict \(Y^{*}_p = f(Z)\) for the Census units.]

Figure B1
Illustration of Small Area Estimation of District Preferences

We use a sample of \(m\) individuals from the CCES that is not necessarily representative on the district level, while a sample of \(n\) individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates \(Z_k\) that are common to both samples, while roll call preferences \(Y_p\) are only observed in the CCES. We train a flexible non-parametric model relating \(Y_p\) to \(Z\) and use it to predict preferences \(Y^{*}_p\) for Census individuals with characteristics \(Z\). With preference values filled in, a district’s income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The ‘donor pool’ for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
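One way to picture the construction of the synthetic district sample is the sketch below. It is an assumption-laden illustration, not the authors' procedure: the crosswalk is represented as a hypothetical mapping from PUMA to that PUMA's share of the district population, and individuals are drawn with replacement.

```python
import random

def synthetic_district_sample(puma_records, district_shares, n, seed=0):
    """Draw n individuals for one district.

    puma_records: dict mapping PUMA id -> list of individual records.
    district_shares: dict mapping PUMA id -> share of the district's population
        located in that PUMA (hypothetical crosswalk weights).
    """
    rng = random.Random(seed)
    pumas = list(district_shares)
    weights = [district_shares[p] for p in pumas]
    sample = []
    # First pick a PUMA in proportion to its share, then pick a person within it.
    for puma in rng.choices(pumas, weights=weights, k=n):
        sample.append(rng.choice(puma_records[puma]))
    return sample

# Hypothetical donor pool and crosswalk weights.
puma_records = {"P1": ["person_a", "person_b"], "P2": ["person_c"]}
district_shares = {"P1": 0.75, "P2": 0.25}
sample = synthetic_district_sample(puma_records, district_shares, n=100, seed=1)
```

Fixing the seed makes the draw reproducible, which matters when the synthetic file feeds into later estimation steps.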

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family’s total household income into 14 income bins.[26] We transform this discretized measure of income into a continuous one using a nonparametric midpoint

[26] The exact question wording is “Thinking back over the last year, what was your family’s annual income?” The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
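A minimal sketch of a midpoint-Pareto scoring rule follows. It uses one common variant, in which the Pareto shape parameter is fitted from the two highest bins; whether this matches the exact variant applied to the CCES bins is an assumption, and the bin edges and counts below are hypothetical.

```python
import math

def pareto_top_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail fitted to the top two bins."""
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    return lower_top * alpha / (alpha - 1)

def bin_scores(lower_bounds, counts):
    """Score each income bin: midpoints for closed bins, Pareto mean for the top bin.

    lower_bounds[i] is the lower edge of bin i; bin i ends where bin i + 1 begins,
    and the last bin is open-ended.
    """
    scores = [
        (lower_bounds[i] + lower_bounds[i + 1]) / 2
        for i in range(len(lower_bounds) - 1)
    ]
    scores.append(
        pareto_top_mean(lower_bounds[-2], lower_bounds[-1], counts[-2], counts[-1])
    )
    return scores

# Hypothetical bin edges and respondent counts.
lower_bounds = [0, 10_000, 20_000, 30_000]
counts = [10, 20, 30, 10]
scores = bin_scores(lower_bounds, counts)
```

Treating each closed bin as running up to the next bin's lower edge reproduces the example in the text, where the $20,000 to $29,999 category is scored as $25,000.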

Algorithm details. For easier exposition, define a matrix \(D\) that contains both individual characteristics and roll call preferences. Let \(N\) be the number of rows of \(D\). For any given variable \(v\) of \(D\), \(D_v\), with missing entries at locations \(i^{(v)}_{mis} \subseteq \{1, \ldots, N\}\), we can separate out four parts:[27]

• Observed values of \(D_v\), denoted as \(y^{(v)}_{obs}\)

• Missing values of \(D_v\): \(y^{(v)}_{mis}\)

• Variables other than \(D_v\) with available observations \(i^{(v)}_{obs} = \{1, \ldots, N\} \setminus i^{(v)}_{mis}\): \(x^{(v)}_{obs}\)

• Variables other than \(D_v\) with observations \(i^{(v)}_{mis}\): \(x^{(v)}_{mis}\)

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion \(c\) (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

Start with initial guesses of missing values in D
w ← vector of column indices sorted by increasing fraction of NA
while not c do
    D_old^imp ← previously imputed D
    for v in w do
        Fit Random Forest: y_obs^(v) ~ x_obs^(v)
        Predict y_mis^(v) using x_mis^(v)
        D_new^imp ← updated imputed matrix using predicted y_mis^(v)
    Update stopping criterion c
Return completed D^imp
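The control flow of Algorithm 1 can be sketched in a few lines of Python. To keep the example self-contained, the random forest is replaced by a deliberately simple stand-in learner (copy the value from the most similar fully observed row); that substitution, the function names, and the toy data are all assumptions made for illustration, so the sketch shows the chaining and stopping logic rather than the authors' model.

```python
from collections import Counter

def nearest_value(data, train_rows, i, v):
    """Stand-in learner: copy column v from the training row most similar to row i."""
    def similarity(j):
        return sum(
            1 for c in range(len(data[i])) if c != v and data[i][c] == data[j][c]
        )
    return data[max(train_rows, key=similarity)][v]

def chained_impute(rows, max_iter=10):
    """Fill None entries column by column until the filled-in values stop changing."""
    n_cols = len(rows[0])
    missing = [[value is None for value in row] for row in rows]
    data = [list(row) for row in rows]

    # Initial guesses: fill each gap with the column's most frequent observed value.
    for v in range(n_cols):
        observed = [row[v] for row in rows if row[v] is not None]
        guess = Counter(observed).most_common(1)[0][0]
        for i in range(len(data)):
            if missing[i][v]:
                data[i][v] = guess

    # Visit columns in order of increasing share of missing entries.
    order = sorted(range(n_cols), key=lambda v: sum(m[v] for m in missing))

    for _ in range(max_iter):
        previous = [list(row) for row in data]
        for v in order:
            train_rows = [i for i in range(len(data)) if not missing[i][v]]
            for i in range(len(data)):
                if missing[i][v]:
                    data[i][v] = nearest_value(data, train_rows, i, v)
        if data == previous:  # stopping criterion: no further change
            break
    return data

# Toy data: column 0 is a covariate, column 1 a preference with two gaps.
rows = [["a", "x"], ["a", "x"], ["b", "y"], ["b", None], ["a", None]]
completed = chained_impute(rows)
```

Swapping `nearest_value` for a fitted random forest per column would recover the structure of the algorithm described above; the surrounding loop would not need to change.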

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

[27] Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.[28]
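For reference, a normalized root mean squared error can be computed as in the sketch below. The normalization by the variance of the observed values follows the general form used in the imputation literature; the use of the population variance and the toy numbers are assumptions made here.

```python
import math
from statistics import fmean, pvariance

def nrmse(observed, predicted):
    """Root mean squared error divided by the standard deviation of the observed values."""
    mse = fmean([(o - p) ** 2 for o, p in zip(observed, predicted)])
    return math.sqrt(mse / pvariance(observed))

# Hypothetical observed values and out-of-bag predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
error = nrmse(observed, predicted)
```

On this scale, a value of 0 indicates perfect prediction, and a value of 1 is what one would obtain by always predicting the mean of the observed values.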

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).[29] No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.[30] Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

[28] See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

[29] This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

[30] We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


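model using penalized maximum likelihood (Chung et al. 2013):

\[
\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left( \beta_0 + \alpha^{\mathrm{race}}_{j[i]} + \alpha^{\mathrm{gender}}_{k[i]} + \alpha^{\mathrm{age}}_{l[i]} + \alpha^{\mathrm{educ}}_{m[i]} + \alpha^{\mathrm{income}}_{n[i]} + \alpha^{\mathrm{district}}_{d[i]} \right) \tag{B1}
\]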

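We employ the notation of Gelman and Hill (2007) and denote by \(j[i]\) the category \(j\) to which individual \(i\) belongs. Here, \(\beta_0\) is an intercept and the \(\alpha\)s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance \(\sigma^2\):

\begin{align}
\alpha^{\mathrm{race}}_{j} &\sim N\left(0, \sigma^2_{\mathrm{race}}\right), & j &= 1, \ldots, 3 \tag{B2} \\
\alpha^{\mathrm{gender}}_{k} &\sim N\left(0, \sigma^2_{\mathrm{gender}}\right), & k &= 1, 2 \tag{B3} \\
\alpha^{\mathrm{age}}_{l} &\sim N\left(0, \sigma^2_{\mathrm{age}}\right), & l &= 1, \ldots, 6 \tag{B4} \\
\alpha^{\mathrm{educ}}_{m} &\sim N\left(0, \sigma^2_{\mathrm{educ}}\right), & m &= 1, \ldots, 5 \tag{B5} \\
\alpha^{\mathrm{income}}_{n} &\sim N\left(0, \sigma^2_{\mathrm{income}}\right), & n &= 1, \ldots, 13 \tag{B6}
\end{align}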

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means \(\alpha^{\mathrm{state}}_{s[d]}\) and freely estimated variance:

\[
\alpha^{\mathrm{district}}_{d} \sim N\left(\alpha^{\mathrm{state}}_{s[d]}, \sigma^2_{\mathrm{state}}\right) \tag{B7}
\]

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
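The poststratification step amounts to a population-weighted average of cell-level predictions. The sketch below illustrates it with hypothetical cells, effects, and Census counts; the function names are introduced here, and the fitted effects would in practice come from the hierarchical model rather than being typed in by hand.

```python
import math

def inv_logit(x):
    """Inverse logit link, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def cell_prediction(intercept, effects):
    """Predicted support in one demographic cell from an intercept plus group effects."""
    return inv_logit(intercept + sum(effects))

def poststratify(predictions, counts):
    """Population-weighted average of cell predictions within one district."""
    total = sum(counts[cell] for cell in predictions)
    return sum(predictions[cell] * counts[cell] for cell in predictions) / total

# Hypothetical cells (race, gender) with made-up effects and Census counts.
predictions = {
    ("white", "female"): cell_prediction(0.2, [0.1, 0.3]),
    ("black", "male"): cell_prediction(0.2, [0.5, -0.3]),
}
counts = {("white", "female"): 300, ("black", "male"): 100}
estimate = poststratify(predictions, counts)
```

Restricting the cells to one income group before calling `poststratify` would give the income-group specific district preference for a roll call.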

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1
Model results using different strategies to estimate district-level preferences. Entries are marginal effects of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                                      (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences               0.106     0.082     0.098     0.084     0.068     0.062
                                    (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences             −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                                    (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences               0.182     0.158     0.181     0.162     0.115     0.115
                                    (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences             −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                                    (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences               0.080     0.061     0.063     0.072     0.043     0.045
                                    (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences             −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                                    (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects                 X         X         X         X         X         X
Group preferences × union policy                 X                                       X
  × state constants                                        X
  × organizational capacity                                          X                   X
  × district covariates                                                        X         X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results even obtain when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as ‘low income’ relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts’ districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.


Table C1: Model results using different definitions of income groups. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences         0.105     0.082     0.097     0.083     0.067     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences         0.098     0.077     0.09      0.078     0.063     0.057
                              (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences       −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                              (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good: everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there is also a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily.”

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Map legend bins: 0−13, 13−26, 26−39, 39−62, 62−164.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
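The population-weighted interpolation step can be sketched as follows. This is one common way to allocate county counts to districts, assuming each county-district intersection carries its population; the function name, the allocation rule, and all numbers are hypothetical and not the authors' code.

```python
from collections import defaultdict

def interpolate_counts(county_counts, intersections):
    """Allocate county-level counts to districts by population share.

    county_counts: {county: count}
    intersections: list of (county, district, population) rows, one per
                   county-district overlap (hypothetical layout).
    """
    county_pop = defaultdict(float)
    for county, _district, pop in intersections:
        county_pop[county] += pop

    district_counts = defaultdict(float)
    for county, district, pop in intersections:
        share = pop / county_pop[county]   # share of the county living in this district
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)

# Illustrative data: county A lies fully in D1, county B is split 75/25.
counts = {"A": 100, "B": 60}
overlaps = [("A", "D1", 50_000), ("B", "D1", 30_000), ("B", "D2", 10_000)]
print(interpolate_counts(counts, overlaps))
```

When every county lies entirely inside one district, as with at-large districts, each share equals one and the result is a plain sum of county counts.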


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                           Low income        High income
                                           preferences       preferences        N

(1) Injective preference roll call map     0.063 (0.013)    −0.041 (0.013)    11,104
(2) Extreme preferences excl.              0.074 (0.016)    −0.048 (0.015)    13,308
(3) New York excluded                      0.070 (0.015)    −0.048 (0.014)    14,730
(4) Local Union Concentration              0.065 (0.014)    −0.047 (0.014)    15,780
(5) Trimmed LPM estimator                  0.074 (0.015)    −0.055 (0.014)    15,426
(6) Errors-in-variables                    0.062 (0.004)    −0.054 (0.004)    15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
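A minimal sketch of a sequential trimming procedure in the spirit of Horrace and Oaxaca (2006): fit OLS, drop observations whose fitted values fall outside the unit interval, and re-fit until none remain outside. The function name, the stopping rule, and the simulated data are assumptions for illustration and do not reproduce the specification in Table E1.

```python
import numpy as np

def trimmed_lpm(X, y):
    """OLS on a binary outcome, re-fit after dropping out-of-range fitted values."""
    X = np.column_stack([np.ones(len(y)), X])        # add an intercept column
    keep = np.ones(len(y), dtype=bool)
    while True:
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        inside = (fitted >= 0.0) & (fitted <= 1.0)
        new_keep = keep & inside                     # dropped rows stay dropped
        if new_keep.sum() == keep.sum():             # nothing left to trim
            return beta, keep
        keep = new_keep

# Simulated data (illustrative only): true probabilities are clipped to [0, 1].
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = rng.binomial(1, np.clip(0.5 + 0.3 * x, 0.0, 1.0))

beta, keep = trimmed_lpm(x, y)
print(beta, int(keep.sum()))
```

The loop terminates because the retained sample can only shrink, and on return every retained observation has a fitted value inside the unit interval.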

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.³¹ We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
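The expected downward bias can be illustrated with the textbook single-regressor case. The symbols below ($x^*$, $u$, $\sigma^2_{x^*}$, $\sigma^2_u$) are introduced here for illustration only and are not the paper's notation. Suppose the true preference $x^*$ is observed as $x = x^* + u$, where the error $u$ has mean zero and is independent of both $x^*$ and the regression error. Then OLS of $y = \beta x^* + \epsilon$ on the observed $x$ satisfies

$$\operatorname{plim}\, \hat{\beta} = \beta \, \frac{\sigma^2_{x^*}}{\sigma^2_{x^*} + \sigma^2_u},$$

so the estimate is shrunk toward zero by the reliability ratio. When the measurement error variance is small relative to the variance of the true regressor, the ratio is close to one and the correction changes little. With several mismeasured regressors and interaction terms, as in the specification here, the direction of the bias is not guaranteed by this simple formula.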

³¹ We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error, and a Student-t prior with 3 degrees of freedom, mean 1, and SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the double-post-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation,

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0, \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d^{\prime} \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d^{\prime} \beta_{m0} + r_{md} + v_d \tag{F4}$$

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).³² Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
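The two-step logic can be sketched in a simplified setting with a single treatment coefficient. This sketch swaps in a plain LASSO with a hand-picked penalty, fitted by coordinate descent, in place of the root-LASSO used in the paper, and it omits the preference interactions and fixed effects. All function names, penalty values, and the simulated data are assumptions for illustration.

```python
import numpy as np

def lasso_support(W, target, alpha, n_sweeps=100):
    """Indices of controls with non-zero coefficients in a plain LASSO fit."""
    Ws = (W - W.mean(axis=0)) / W.std(axis=0)        # standardized controls
    resid = target - target.mean()
    beta = np.zeros(Ws.shape[1])
    n = len(target)
    for _ in range(n_sweeps):                         # cyclic coordinate descent
        for k in range(Ws.shape[1]):
            resid = resid + Ws[:, k] * beta[k]        # remove k's contribution
            rho = Ws[:, k] @ resid / n
            beta[k] = np.sign(rho) * max(abs(rho) - alpha, 0.0)  # soft threshold
            resid = resid - Ws[:, k] * beta[k]
    return set(np.flatnonzero(beta).tolist())

def post_double_selection(y, u, W, alpha_y, alpha_u):
    """Select controls from both equations, then run OLS on the union."""
    selected = sorted(lasso_support(W, y, alpha_y) | lasso_support(W, u, alpha_u))
    X = np.column_stack([np.ones(len(y)), u, W[:, selected]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], selected                          # treatment coefficient, controls

# Simulated data (illustrative only): control 0 confounds treatment and outcome.
rng = np.random.default_rng(1)
n, p = 500, 20
W = rng.normal(size=(n, p))
u = W[:, 0] + 0.5 * W[:, 1] + rng.normal(size=n)
y = 0.5 * u + W[:, 0] + rng.normal(size=n)

eta_hat, selected = post_double_selection(y, u, W, alpha_y=0.1, alpha_u=0.1)
print(round(eta_hat, 3), selected)
```

Taking the union of both selected sets is what protects the treatment coefficient: a control that only weakly predicts the outcome is still retained when it strongly predicts the treatment.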

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}\ \theta^h_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014),

$$y = Kc. \tag{G1}$$

Here, $K$ is an $N \times N$ Gaussian kernel matrix with entries

$$K_{ij} = \exp\left(-\frac{\lVert z_i - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via regularization by adding a squared $L_2$ penalty to the least squares criterion,

$$c^* = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[ (y - Kc)^{\prime}(y - Kc) + \lambda c^{\prime} K c \right], \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

³² For a very general discussion, see Belloni et al. (2017).

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z_j^{U_d}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert z_i - z_j \rVert^2}{\sigma^2}\right) \left(z_i^{U_d} - z_j^{U_d}\right). \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
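The closed-form KRLS fit and the idea of pointwise marginal effects can be sketched as follows. Two simplifications are assumed here: $\lambda$ is fixed by hand rather than chosen by leave-one-out loss, and the pointwise derivatives are approximated by central finite differences on the fitted surface rather than by the analytic expression. Function names and the simulated data are hypothetical.

```python
import numpy as np

def kernel(A, B, sigma2):
    """Gaussian kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def krls_fit(Z, y, lam):
    """Return weights c = (K + lam * I)^{-1} y and the bandwidth sigma^2 = D."""
    sigma2 = float(Z.shape[1])
    K = kernel(Z, Z, sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return c, sigma2

def krls_predict(Z_new, Z, c, sigma2):
    """Fitted surface evaluated at the rows of Z_new."""
    return kernel(Z_new, Z, sigma2) @ c

def numerical_derivative(Z, c, sigma2, col, h=1e-4):
    """Pointwise partial derivative of the fitted surface w.r.t. column `col`."""
    up, down = Z.copy(), Z.copy()
    up[:, col] += h
    down[:, col] -= h
    return (krls_predict(up, Z, c, sigma2) - krls_predict(down, Z, c, sigma2)) / (2 * h)

# Simulated data (illustrative only): the outcome is a pure interaction, so the
# marginal effect of column 0 should vary with column 1.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
y = Z[:, 0] * Z[:, 1] + 0.1 * rng.normal(size=200)

lam = 0.1
c, sigma2 = krls_fit(Z, y, lam)
fitted = krls_predict(Z, Z, c, sigma2)
slopes = numerical_derivative(Z, c, sigma2, col=0)
print(np.corrcoef(slopes, Z[:, 1])[0, 1])
```

No interaction term is specified anywhere in the fit; the dependence of the slopes on the second column is recovered from the data, which is the property the section relies on.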

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ilwu puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University, CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the us congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the us senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in congress: A study of the north american free trade agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “dem deutschen volke”? die ungleiche responsivität des bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the american states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of american politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using mrp? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the us mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of american community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the american states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The northeast regional center for rural development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). Missforest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

Table III: Post-double-selection estimator. Marginal effect of unionization on legislative responsiveness to low and high income groups

                             (1)        (2)        (3)
Low income preferences      0.063      0.066      0.062
                           (0.014)    (0.017)    (0.016)
High income preferences    -0.054     -0.036     -0.040
                           (0.013)    (0.015)    (0.016)
Semi-parametric terms        144        312        624
Post-LASSO terms              18         45        112

Note: Double Selection Estimator (Belloni et al. 2014) consists of LASSO selection of confounders in both outcome and union-preferences equations and post-selection least squares estimation of model; see Appendix F for details. Selection performed using root-LASSO (Belloni et al. 2011). We employ sample splitting: LASSO selection performed on 50% sample, parameter estimates performed on remaining 50% (N=7884). Table entries are estimates for ηL and ηH with cluster-robust standard errors in parentheses. Specification (1) includes district characteristics in both linear and quadratic form and all their pairwise interactions. Specification (2) adds union policy and organizational capacity terms. Specification (3) additionally includes cubic splines (at four knots) of all terms.

by using cubic splines for all covariate terms, leading to a high-dimensional vector of 624 controls. As the last line of Table III shows, the estimator selects a subset of these, producing more flexible model specifications with the number of included controls ranging from 18 to 112. Even under these much more demanding specifications, we find that increasing unionization positively affects the representation of low-income constituents. A standard deviation increase in union membership increases legislators' responsiveness to low-income preferences by about 6 to 7 percentage points, while decreasing the responsiveness to the preferences of the affluent by about 4 points. The magnitude of our estimates is in line with the ones we obtained in the richer specifications of our previous linear model (compare specifications (4) and (5) in Table I).

Kernel Regularized Least Squares (KRLS). While the previous modeling strategy is rather flexible, it did not relax one key assumption: the existence of an interaction between district preferences and union membership (our η terms). This interaction is, of course, the center of our analysis, and one might ask why its exclusion should be considered at all. The issue here is that we specify this interaction in a restrictive (linear) form, which might not be supported by the data and only found in our model estimates due to functional form misspecification. In a recent replication survey, Hainmueller et al. (2018) warn that "a large portion of published findings based on multiplicative interaction models are artifacts of misspecification or are at best highly model dependent." It is thus prudent to consider an analysis that "lets the data speak." In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.22 The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check if the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).
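The logic of the pointwise derivatives can be illustrated with a minimal sketch. It is not the estimation code used for the paper: the data are simulated, the kernel bandwidth `sigma2` and the regularization parameter `lam` are fixed at assumed values (rather than selected as described in Appendix G), and all function names are hypothetical. The sketch fits a Gaussian-kernel regularized least squares surface without any interaction term and then checks whether the pointwise effect of preferences varies with union membership.

```python
import numpy as np

def gaussian_kernel(A, B, sigma2):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / sigma2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def krls_fit(X, y, sigma2, lam):
    """Coefficients c solving the regularized system (K + lam * I) c = y."""
    K = gaussian_kernel(X, X, sigma2)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def pointwise_derivative(X, coef, sigma2, d):
    """Partial derivative of the fitted surface w.r.t. covariate d at each observation."""
    K = gaussian_kernel(X, X, sigma2)
    diff = X[:, [d]] - X[:, d][None, :]  # diff[j, i] = x_j[d] - x_i[d]
    return (-2.0 / sigma2) * ((K * diff) @ coef)

# Simulated, standardized data (illustrative only, not the paper's data).
rng = np.random.default_rng(0)
n = 300
pref = rng.normal(size=n)    # stand-in for low-income preferences
union = rng.normal(size=n)   # stand-in for union membership [std]
y = pref * (0.5 + 0.5 * union) + rng.normal(scale=0.1, size=n)

X = np.column_stack([pref, union])
coef = krls_fit(X, y, sigma2=2.0, lam=0.1)  # sigma2 and lam are assumed values
dy_dpref = pointwise_derivative(X, coef, sigma2=2.0, d=0)

# If union membership moderates responsiveness, the pointwise effect of
# preferences rises with union membership although no interaction was specified.
corr = np.corrcoef(dy_dpref, union)[0, 1]
print(round(corr, 2))
```

Plotting `dy_dpref` against `union` with a smoother would give a figure of the same type as the one discussed next.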

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, "Low income constituents" and "High income constituents". x-axis: Union membership [std], from -1.6 to 1.6, with percentile markers p10, p25, p50, p75, p90. y-axis: Partial effect, from -0.4 to 0.4.]

Figure IV: Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

22 See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, since public sector unions are too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV: Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups

                                   Low income         High income
(A) Private vs. Public unions
    Public unions                 0.074 (0.016)     -0.058 (0.015)
    Non-public unions             0.054 (0.016)     -0.027 (0.016)
(B) Bill ideology
    Conservative bill             0.086 (0.017)     -0.086 (0.018)
    Liberal bill                  0.052 (0.014)     -0.028 (0.013)
(C) AFL-CIO endorsement
    No position                   0.054 (0.014)     -0.054 (0.013)
    Endorsement                   0.077 (0.015)     -0.040 (0.014)

Note: Estimates for ηL and ηH with cluster-robust standard errors in parentheses. N=15780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C) we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

23 Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for ηH is smaller for endorsed bills but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls) an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
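The conversion from a log-scale regression back to dollar amounts can be sketched as follows. This is an illustration of the smearing retransformation attributed to Duan (1983), not the paper's code: the data are simulated, the model has a single regressor, and the names `ols_simple`, `smearing_factor`, and `predicted_dollars` are hypothetical. The key step is that exponentiated predictions are multiplied by the average exponentiated residual, since exp(E[log y]) understates E[y].

```python
import math
import random

def ols_simple(x, y):
    """Intercept and slope of a one-regressor least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def smearing_factor(residuals):
    """Smearing estimate: the average of exp(residual) from the log-scale model."""
    return sum(math.exp(r) for r in residuals) / len(residuals)

# Simulated data (illustrative only): log contributions vs. standardized union membership.
random.seed(1)
union = [random.gauss(0, 1) for _ in range(1000)]
log_contrib = [11.0 + 0.2 * u + random.gauss(0, 0.5) for u in union]

a, b = ols_simple(union, log_contrib)
resid = [yi - (a + b * ui) for ui, yi in zip(union, log_contrib)]
smear = smearing_factor(resid)

def predicted_dollars(u):
    """Retransformed prediction on the dollar scale."""
    return math.exp(a + b * u) * smear

# Dollar effect of moving from the mean to one standard deviation above it.
effect = predicted_dollars(1.0) - predicted_dollars(0.0)
print(round(effect))
```

The simulated coefficients are arbitrary, so the printed dollar effect has no relation to the $81,000 figure reported above.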

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians, if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V: Labor contributions and selection of Democratic legislators

                                           (1)        (2)        (3)        (4)
A: Contributions channel
                                          DV: Contrib.           DV: roll call
Union membership                          0.056      0.046
                                         (0.012)    (0.014)
Contributions × low income prefs.                                0.946      0.865
                                                                (0.036)    (0.034)
Contributions × high income prefs.                              -0.735     -0.714
                                                                (0.029)    (0.031)
B: Selection channel
                                          DV: Democrat           DV: roll call
Union membership                          0.161      0.106
                                         (0.024)    (0.023)
Democrat × low income prefs.                                     0.576      0.542
                                                                (0.012)    (0.015)
Democrat × high income prefs.                                   -0.411     -0.423
                                                                (0.013)    (0.015)

District controls                                       X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are on average more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
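The district-specific classification can be sketched in a few lines. This is an illustrative, unweighted version with made-up incomes; the function names and the toy districts `D1` and `D2` are hypothetical, and the handling of survey weights or of incomes exactly at a threshold is an assumption. It shows that the same income can fall into different groups depending on the district.

```python
from statistics import quantiles

def tercile_thresholds(incomes):
    """Return the two cut points that split incomes into three equal-sized groups."""
    low_cut, high_cut = quantiles(incomes, n=3)
    return low_cut, high_cut

def income_group(income, low_cut, high_cut):
    """Classify an income relative to the district's tercile thresholds."""
    if income <= low_cut:
        return "low"
    if income >= high_cut:
        return "high"
    return "middle"

# Hypothetical household incomes (in $1000) from a district-representative sample.
acs_incomes = {
    "D1": [10, 20, 30, 40, 50, 60, 70, 80, 90],
    "D2": [40, 50, 60, 70, 80, 90, 100, 110, 120],
}
thresholds = {d: tercile_thresholds(v) for d, v in acs_incomes.items()}

# Hypothetical survey respondents: (district, income).
respondents = [("D1", 30), ("D1", 50), ("D1", 70), ("D2", 70)]
groups = [income_group(inc, *thresholds[d]) for d, inc in respondents]
print(groups)
```

An income of 70 is classified as "high" in the poorer district D1 but as "middle" in D2, which is the point of using district-specific rather than national thresholds.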

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1: Matched CCES-House roll calls included in our analysis

Match  Bill      Date        House Vote  Bill       Name
                             (Yea-Nay)   Ideology†
(1)    HR 810    07/19/2006  235-193     L          Stem Cell Research Enhancement Act (Presidential Veto override)
(1)    HR 3      01/11/2007  253-174     L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176     L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201     L          Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116     L          Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197     C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185     C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215     C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183     C          Protect America Act
(6)    HR 6304   06/20/2008  293-129     C          FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204     L          Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135     L          Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152     L          Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132     C          United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171     L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151     C          To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167     C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202     L          Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report)
(14)   HR 2831   07/31/2007  225-199     L          Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171     L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177     L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175     L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183     L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212     L          American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212     L          Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215     L          Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192     L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175     L          Don't Ask Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161     C          Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193     C          House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382      C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257     L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189     C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185     C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147     C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.
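The ideology coding in the last column can be sketched as a simple rule. The exact operationalization of "predominant support" is not spelled out here, so the rule below (compare the share of each party voting yea) is an assumption, and the function name and vote counts are hypothetical.

```python
def bill_ideology(dem_yea, dem_total, rep_yea, rep_total):
    """Code a roll call as 'L' if Democrats supported it at a higher rate than
    Republicans, and as 'C' otherwise (assumed rule, for illustration only)."""
    dem_support = dem_yea / dem_total
    rep_support = rep_yea / rep_total
    return "L" if dem_support > rep_support else "C"

# Hypothetical party breakdowns of two roll calls (not taken from Table A1).
mostly_democratic = bill_ideology(dem_yea=230, dem_total=233, rep_yea=20, rep_total=200)
mostly_republican = bill_ideology(dem_yea=15, dem_total=200, rep_yea=225, rep_total=230)
print(mostly_democratic, mostly_republican)
```

Comparing support rates rather than raw yea counts keeps the coding from depending on which party holds the majority in a given Congress.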


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value

              33rd percentile              67th percentile
Congress    Mean     Min     Max        Mean     Min      Max
109        38123   16800   73675       77964   39612   146870
110        40127   18000   77000       83047   43600   155113
111        39021   17500   78262       82440   46000   160050
112        37381   16500   81000       79868   38500   158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

                                      Mean      SD     Min      Max       N
Roll-call vote: yea                  0.568   0.495   0.000    1.000   15780
Constituent preferences
  Low income                         0.593   0.220   0.047    0.979   15934
  High income                        0.555   0.198   0.037    0.967   15934
  Low-High Gap                       0.172   0.121   0.000    0.588   15934
Union membership [log]               9.705   1.046   6.094   13.619   15934
Population                           7.022   0.723   4.697    9.980   15934
Share African American               0.124   0.146   0.004    0.680   15934
Share Hispanic                       0.156   0.174   0.005    0.812   15934
Share BA or higher                   0.275   0.097   0.073    0.645   15934
Median income [$10000]               5.177   1.356   2.282   10.439   15934
Share female                         0.508   0.010   0.462    0.543   15934
Manufacturing share                  0.110   0.047   0.025    0.281   15934
Urbanization                         0.790   0.199   0.213    1.000   15934
Certification elections [log]        3.347   0.861   0.000    5.100   15934
Congregations [per 1000 persons]     0.765   1.147   0.062    6.453   15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from m individuals in the CCES and n individuals in the Census (with n ≫ m). Both sets of individuals share K common characteristics Z_k, such as age, race, or education. The first task at hand is then to match P roll call preferences Y_p that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about Y_p = f(Z_1, ..., Z_K) for p = 1, ..., P, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use f(Z_1, ..., Z_K) in combination with observed Z in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
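The train, predict, and aggregate steps can be sketched with toy data. To keep the example self-contained, a simple covariate-cell mean stands in for the random forest; this is a deliberate simplification, not the chained random forest algorithm described above. All names (`fit_cell_means`, `district_preferences`) and all records are hypothetical.

```python
from collections import defaultdict

def fit_cell_means(train_rows, covariates, outcome):
    """Stand-in learner: predict the mean outcome within each covariate cell,
    falling back to the overall mean for cells not seen in training."""
    sums = defaultdict(lambda: [0.0, 0])
    for row in train_rows:
        key = tuple(row[c] for c in covariates)
        sums[key][0] += row[outcome]
        sums[key][1] += 1
    overall = sum(row[outcome] for row in train_rows) / len(train_rows)
    means = {key: total / count for key, (total, count) in sums.items()}
    return lambda row: means.get(tuple(row[c] for c in covariates), overall)

def district_preferences(census_rows, predict):
    """Average predicted preferences by (district, income group)."""
    agg = defaultdict(lambda: [0.0, 0])
    for row in census_rows:
        key = (row["district"], row["income_group"])
        agg[key][0] += predict(row)
        agg[key][1] += 1
    return {key: total / count for key, (total, count) in agg.items()}

# Toy survey sample: covariates Z and one observed roll call preference Y.
cces = [
    {"age": "young", "educ": "college", "support": 1},
    {"age": "young", "educ": "college", "support": 1},
    {"age": "young", "educ": "none", "support": 1},
    {"age": "young", "educ": "none", "support": 0},
    {"age": "old", "educ": "college", "support": 0},
    {"age": "old", "educ": "none", "support": 0},
]
# Toy district-representative sample: covariates Z observed, Y missing.
census = [
    {"district": "D1", "income_group": "low", "age": "young", "educ": "none"},
    {"district": "D1", "income_group": "low", "age": "old", "educ": "none"},
    {"district": "D1", "income_group": "high", "age": "young", "educ": "college"},
    {"district": "D1", "income_group": "high", "age": "old", "educ": "college"},
]

predict = fit_cell_means(cces, ["age", "educ"], "support")
prefs = district_preferences(census, predict)
print(prefs)
```

In the actual procedure the stand-in learner would be replaced by a flexible model such as a random forest, and predictive quality would be checked on held-out survey data before filling in the Census sample.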

Figure B1: Illustration of Small Area Estimation of District Preferences

[Figure: schematic of the stacked data matrix. Units 1, ..., m come from the CCES and have both covariates Z_{i1}, ..., Z_{iK} and preferences Y_{i1}, ..., Y_{iP} observed. Units m+1, ..., m+n come from the Census and have covariates observed but preferences missing (NA). A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census rows.]

We use a sample of m individuals from the CCES that is not necessarily representative on the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The ‘donor pool’ for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is “Thinking back over the last year, what was your family's annual income?” The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
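A minimal sketch of a midpoint Pareto scoring rule follows. The text does not spell out how the Pareto value for the top bin is computed, so the formula below is an assumption: it is the common variant that estimates the Pareto shape from the counts and lower bounds of the two highest bins and assigns the implied Pareto mean to the open-ended bin. The bin edges and counts are hypothetical, and upper bounds are treated as exclusive so that the bin from $20,000 to just under $30,000 receives $25,000.

```python
import math

def pareto_midpoint_scores(lower_bounds, upper_bounds, counts):
    """Score income bins: midpoints for closed bins, Pareto mean for the top bin.

    lower_bounds / upper_bounds: bin edges (upper bounds exclusive);
    the upper bound of the last, open-ended bin is None.
    counts: number of respondents in each bin.
    """
    scores = [(lo + hi) / 2 for lo, hi in zip(lower_bounds[:-1], upper_bounds[:-1])]
    # Assumed shape estimate based on the two highest bins.
    n_top, n_below = counts[-1], counts[-2]
    l_top, l_below = lower_bounds[-1], lower_bounds[-2]
    alpha = (math.log(n_top + n_below) - math.log(n_top)) / (
        math.log(l_top) - math.log(l_below)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    scores.append(l_top * alpha / (alpha - 1))
    return scores, alpha

# Hypothetical bins and counts, for illustration only.
lower = [0, 10000, 20000, 30000, 150000]
upper = [10000, 20000, 30000, 150000, None]
counts = [50, 80, 120, 240, 10]

scores, alpha = pareto_midpoint_scores(lower, upper, counts)
```

The guard against `alpha <= 1` matters in practice: a very heavy estimated tail has no finite mean, and a different rule for the top bin would then be needed.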

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$, $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$, $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$, $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1  Start with initial guesses of missing values in D
 2  w ← vector of column indices sorted by increasing fraction of NA
 3  while not c do
 4      D^imp_old ← previously imputed D
 5      for v in w do
 6          Fit Random Forest: y^(v)_obs ~ x^(v)_obs
 7          Predict y^(v)_mis using x^(v)_mis
 8          D^imp_new ← updated imputed matrix using predicted y^(v)_mis
 9      Update stopping criterion c
10  Return completed D^imp
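The control flow of Algorithm 1 can be sketched as follows. To keep the example dependency-free, the random forest is replaced by a simple cell-mean learner (an assumption made purely for illustration; any learner with the same fit/predict shape could be plugged in), and the stopping criterion is simplified to "imputed values no longer change". Function names and the toy data are hypothetical.

```python
from collections import defaultdict

def cell_mean_learner(x_obs, y_obs):
    """Stand-in for a random forest: predicts the mean of y within each x cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(x_obs, y_obs):
        sums[x] += y
        counts[x] += 1
    overall = sum(y_obs) / len(y_obs)
    # Cells never seen in training fall back to the overall mean.
    return lambda x: sums[x] / counts[x] if x in counts else overall

def chained_imputation(data, fit=cell_mean_learner, max_iter=10, tol=1e-8):
    """Impute missing entries (None) column by column, following Algorithm 1."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[data[i][v] is None for v in range(n_cols)] for i in range(n_rows)]
    imputed = [row[:] for row in data]
    # Step 1: initial guesses (column means of the observed values).
    for v in range(n_cols):
        observed = [data[i][v] for i in range(n_rows) if not missing[i][v]]
        col_mean = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][v]:
                imputed[i][v] = col_mean
    # Step 2: order columns by increasing share of missing values.
    order = sorted(range(n_cols), key=lambda v: sum(missing[i][v] for i in range(n_rows)))
    for _ in range(max_iter):                       # Step 3
        previous = [row[:] for row in imputed]      # Step 4
        for v in order:                             # Step 5
            if not any(missing[i][v] for i in range(n_rows)):
                continue
            def others(i):
                return tuple(imputed[i][k] for k in range(n_cols) if k != v)
            obs_rows = [i for i in range(n_rows) if not missing[i][v]]
            model = fit([others(i) for i in obs_rows],
                        [imputed[i][v] for i in obs_rows])   # Step 6
            for i in range(n_rows):
                if missing[i][v]:
                    imputed[i][v] = model(others(i))         # Steps 7-8
        change = sum((imputed[i][v] - previous[i][v]) ** 2
                     for i in range(n_rows) for v in range(n_cols))
        if change < tol:                            # Step 9 (simplified criterion)
            break
    return imputed                                  # Step 10

# Toy matrix: columns are (age group, education, support); None marks
# preferences that are unobserved, as for Census rows.
D = [
    [0, 1, 1],
    [0, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, None],
    [1, 0, None],
]
completed = chained_imputation(D)
```

In this toy case only the preference column has missing entries, which mirrors the CCES-to-Census setting; the same loop also handles missing covariates, as the column ordering in step 2 suggests.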

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

26 (cont.) respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

B.2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical model using penalized maximum likelihood (Chung et al. 2013):

\[
\Pr(Y_i = 1) = \mathrm{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}
\]

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506 and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case this produces rather similar MRP estimates.

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

\begin{align}
\alpha^{race}_j &\sim N(0, \sigma^2_{race}), & j &= 1, \dots, 3 \tag{B2} \\
\alpha^{gender}_k &\sim N(0, \sigma^2_{gender}), & k &= 1, 2 \tag{B3} \\
\alpha^{age}_l &\sim N(0, \sigma^2_{age}), & l &= 1, \dots, 6 \tag{B4} \\
\alpha^{educ}_m &\sim N(0, \sigma^2_{educ}), & m &= 1, \dots, 5 \tag{B5} \\
\alpha^{income}_n &\sim N(0, \sigma^2_{income}), & n &= 1, \dots, 13 \tag{B6}
\end{align}

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

\[
\alpha^{district}_d \sim N\left(\alpha^{state}_{s[d]}, \sigma^2_{state}\right) \tag{B7}
\]

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
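The poststratification step amounts to a population-weighted average of cell-level predictions. The sketch below assumes that the linear predictor of equation (B1) has already been evaluated for each demographic cell; the cell values and Census counts are hypothetical.

```python
import math

def inv_logit(x):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def poststratify(cells):
    """Population-weighted average of cell-level predicted support.

    cells: list of (linear_predictor, population_count) pairs for the
    demographic cells of one district and income group.
    """
    total = sum(count for _, count in cells)
    return sum(inv_logit(eta) * count for eta, count in cells) / total

# Hypothetical cells: (beta_0 + sum of alphas, Census count in that cell).
cells_low_income = [(0.0, 300), (1.2, 100), (-0.4, 600)]
estimate = poststratify(cells_low_income)
```

The weights enter only through relative cell sizes, so counts and population shares give the same estimate.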

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences        0.182     0.158     0.181     0.162     0.115     0.115
                             (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences      −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                             (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences        0.080     0.061     0.063     0.072     0.043     0.045
                             (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences      −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                             (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results even obtain when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that, in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as ‘low income’ relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
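The Massachusetts example can be made concrete with a small sketch. The low-income cutoffs follow the approximate figures given above; the district label `MA-other`, the high-income cutoffs, and the treatment of incomes exactly at a cutoff are assumptions made for illustration.

```python
def income_group(income, low_cut, high_cut):
    """Classify an income relative to a lower and an upper percentile cutoff."""
    if income < low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# (low cutoff, high cutoff) in dollars; high cutoffs are hypothetical.
district_cuts = {"MA-other": (45000, 90000), "MA-08": (30000, 70000)}
state_cuts = {"MA": (47000, 95000)}

voter_income = 40000
by_district = {d: income_group(voter_income, *cuts) for d, cuts in district_cuts.items()}
by_state = income_group(voter_income, *state_cuts["MA"])
```

The same voter is thus ‘low income’ under a typical district cutoff and under the state cutoff, but ‘middle income’ in the 8th district.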


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds: p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily.”

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1: Total number of union certification elections in House districts (109th-112th Congress)

[Figure: map of House districts shaded by number of elections; legend classes 0–13, 13–26, 26–39, 39–62, 62–164.]

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v133 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
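One plausible reading of the interpolation step is sketched below: each county's congregation count is allocated to districts in proportion to the share of the county's population living in each county-district intersection. The exact weighting scheme used in the paper is not spelled out, so this allocation rule, the function name, and the toy numbers are assumptions.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts using population weights.

    county_counts: {county: congregation count}
    weights: {(county, district): share of the county's population
              that lives in the district}; shares sum to 1 per county.
    """
    district_counts = defaultdict(float)
    for (county, district), share in weights.items():
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)

# Hypothetical example: county A is split across two districts,
# county B lies entirely within district D2.
county_counts = {"A": 100, "B": 50}
weights = {("A", "D1"): 0.75, ("A", "D2"): 0.25, ("B", "D2"): 1.0}

district_counts = interpolate_to_districts(county_counts, weights)
```

When every county is fully nested in one district (all shares equal 1), the function reduces to a plain sum of county counts, matching the at-large case described above.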


E Additional Robustness Tests

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                          Low income        High income
                                          preferences       preferences        N

(1) Injective preference roll call map    0.063 (0.013)    −0.041 (0.013)    11104
(2) Extreme preferences excl.             0.074 (0.016)    −0.048 (0.015)    13308
(3) New York excluded                     0.070 (0.015)    −0.048 (0.014)    14730
(4) Local Union Concentration             0.065 (0.014)    −0.047 (0.014)    15780
(5) Trimmed LPM estimator                 0.074 (0.015)    −0.055 (0.014)    15426
(6) Errors-in-variables                   0.062 (0.004)    −0.054 (0.004)    15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. For that specification, table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
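The logic of a trimming estimator can be sketched for a single regressor. The sketch assumes the commonly described sequential rule: estimate the linear probability model by OLS, drop observations whose fitted values fall outside the unit interval, and re-estimate until no fitted value lies outside it. Whether this matches the paper's implementation in every detail is not stated; the data are hypothetical.

```python
def ols(x, y):
    """Least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def trimmed_lpm(x, y, max_iter=50):
    """Re-estimate the LPM after dropping observations whose fitted
    probabilities fall outside [0, 1], until none remain."""
    keep = list(range(len(x)))
    for _ in range(max_iter):
        a, b = ols([x[i] for i in keep], [y[i] for i in keep])
        inside = [i for i in keep if 0.0 <= a + b * x[i] <= 1.0]
        if len(inside) == len(keep):
            break
        keep = inside
    return a, b, keep

# Hypothetical data: one covariate and a binary vote.
x = [-5, -1, 0, 1, 2, 3, 4, 9]
y = [0, 0, 1, 0, 1, 0, 1, 1]

a, b, kept = trimmed_lpm(x, y)
```

In this toy example the two most extreme covariate values are dropped over successive rounds, and the slope on the retained sample is somewhat larger than the initial OLS slope.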

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit district fixed effects in the notation and ignore $i$ subscripts):

\[
y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
\]

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation

\[
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}
\]

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context).

\begin{align}
y_{jd} &= \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3} \\
U_d &= w_d' \beta_{m0} + r_{md} + v_d \tag{F4}
\end{align}

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}\ \theta^h_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

\[
y = Kc \tag{G1}
\]

Here $K$ is an $N \times N$ Gaussian Kernel matrix

\[
K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}
\]

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via regularization by adding a squared $L_2$ penalty to the least squares criterion:

\[
c^* = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}
\]

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

32 For a very general discussion, see Belloni et al. (2017).

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check whether the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right)\left(Z^{U_d}_{d} - z^{U_d}_{j}\right). \tag{G.4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
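The pointwise derivatives can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the weights `c` are taken as given, the function names are hypothetical, and the sign is fixed by differentiating the fitted surface $f(z) = \sum_i c_i \exp(-\lVert z_i - z \rVert^2 / \sigma^2)$ directly, with the difference taken as evaluation point minus kernel center. How that index order maps onto the notation of (G.4) is an assumption of the sketch.

```python
import numpy as np


def krls_surface(Z, c, sigma2, z_new):
    """Fitted surface f(z) = sum_i c_i exp(-||z_i - z||^2 / sigma2) at one point."""
    sq_dist = np.sum((Z - z_new) ** 2, axis=1)
    return float(c @ np.exp(-sq_dist / sigma2))


def krls_pointwise_derivatives(Z, c, sigma2, col):
    """Partial derivative of the fitted surface w.r.t. column `col` at every row of Z.

    Builds an (N, N, D) array of differences, so it is meant for small N only.
    """
    sq_dist = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=2)  # entry [i, j]
    K = np.exp(-sq_dist / sigma2)
    # diff[i, j] = z_j^(col) - z_i^(col): evaluation point j minus kernel center i
    diff = Z[None, :, col] - Z[:, None, col]
    # Sum over kernel centers i for each evaluation point j
    return (-2.0 / sigma2) * (c @ (K * diff))
```

The function returns one derivative per observation, which is the quantity that would then be smoothed against the moderating covariate. A quick way to validate such a routine is to compare a few entries with central finite differences of `krls_surface`.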

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents’ responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem Deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, D.C.: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, D.C.: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs. Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


speak.” In the model below, estimated using KRLS (Hainmueller and Hazlett 2014), we do not specify any interaction a priori, nor do we specify any functional form.

Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights are chosen to produce the best fit to the data.²² The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms. Second, it allows us to check whether the marginal effects of group preferences change with levels of unionization without explicitly specifying this interaction term. To do the latter, we calculate pointwise partial derivatives of district preferences with respect to levels of union membership (Hainmueller and Hazlett 2014, 156).

Figure IV summarizes results from this approach. It plots a locally smoothed summary of pointwise partial effects for low and high income group preferences (on the y-axis) against levels of union membership (on the x-axis). Perhaps unsurprisingly, we find that the assumption of an exactly linear interaction specification is too restrictive, especially in the case of the preferences of high income constituents.

[Figure IV: two panels, “Low income constituents” and “High income constituents”. x-axis: Union membership [std], with percentile markers p10 to p90; y-axis: Partial effect.]

Figure IV. Nonparametric estimate of interaction between union membership and preferences

Note: This figure plots partial effects (summarized using thin-plate spline smoothing) of preferences of low and high income constituents on legislative votes at levels of district union membership. Estimates obtained via KRLS.

²² See Appendix G for details on the approach and parameter selection.


However, the most noteworthy result is that, using a non-parametric model that does not include an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators’ vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high income group preferences and union membership.

V. Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, on the grounds that public sector unions may be too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public sector members (based on their name) and create separate union membership counts for “public” and the remaining “non-public” unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district’s public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining “non-public” unions is slightly smaller (about 5 percentage points). The difference between the two estimates is not statistically distinguishable from zero. This finding does not support the hypothesis of a null effect of public sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV. Effect heterogeneity: Marginal effects of unionization on legislative responsiveness to low and high income groups

                                   Low income         High income
(A) Private vs. public unions
    Public unions                  0.074 (0.016)      −0.058 (0.015)
    Non-public unions              0.054 (0.016)      −0.027 (0.016)
(B) Bill ideology
    Conservative bill              0.086 (0.017)      −0.086 (0.018)
    Liberal bill                   0.052 (0.014)      −0.028 (0.013)
(C) AFL-CIO endorsement
    No position                    0.054 (0.014)      −0.054 (0.013)
    Endorsement                    0.077 (0.015)      −0.040 (0.014)

Note: Estimates for η_L and η_H with cluster-robust standard errors in parentheses. N=15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.²³ AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators’ responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).²⁴ The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions’ capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

²³ Taken from the AFL-CIO “legislative scorecard”: https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

²⁴ For high-income preferences, the estimate for η_h is smaller for endorsed bills but still significantly different from zero.

VI. Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the “labor” sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
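The retransformation from logged contributions to dollar amounts can be illustrated with Duan's (1983) smearing estimate, which multiplies the exponentiated linear prediction by the average of the exponentiated residuals. The sketch below assumes a plain OLS fit of the logged outcome; the paper does not spell out its exact calculation, so the function names, the data layout, and the way the effect of an increase is averaged are hypothetical.

```python
import numpy as np


def smearing_predict(X, log_y, X_new):
    """Predict y on its original scale after an OLS fit of log(y) on X.

    Returns the predictions and the smearing factor mean(exp(residuals)).
    """
    beta, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    residuals = log_y - X @ beta
    smearing_factor = np.mean(np.exp(residuals))  # Duan's nonparametric correction
    return np.exp(X_new @ beta) * smearing_factor, smearing_factor


def effect_of_increase(X, log_y, col, delta):
    """Average change in predicted y when column `col` is raised by `delta`.

    Illustrative summary only; the averaging scheme is an assumption.
    """
    X_shifted = X.astype(float).copy()
    X_shifted[:, col] += delta
    base, _ = smearing_predict(X, log_y, X)
    shifted, _ = smearing_predict(X, log_y, X_shifted)
    return float(np.mean(shifted - base))
```

When the regression includes an intercept, the residuals average to zero and the smearing factor is at least one, so the corrected predictions are never below the naive back-transform `exp(X @ beta)`.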

The last two columns of Panel (A) examine how contributions moderate legislators’ responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator’s responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.
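A stylized version of this kind of specification can be sketched as a linear probability model with district fixed effects, estimated by demeaning within districts. Because the moderator (here, contributions) is constant within a district, its main effect is absorbed by the fixed effects and only its interactions with preferences are estimated. The code below is an illustration on simulated data with hypothetical names and coefficient values; it reports point estimates only and omits the cluster-robust standard errors used in the paper.

```python
import numpy as np


def within_ols(y, X, groups):
    """OLS of y on X after demeaning both within groups (group fixed effects)."""
    _, idx = np.unique(groups, return_inverse=True)
    data = np.column_stack([y, X]).astype(float)
    sums = np.zeros((idx.max() + 1, data.shape[1]))
    np.add.at(sums, idx, data)  # per-group column sums
    demeaned = data - (sums / np.bincount(idx)[:, None])[idx]
    beta, *_ = np.linalg.lstsq(demeaned[:, 1:], demeaned[:, 0], rcond=None)
    return beta


def interaction_design(pref_low, pref_high, moderator):
    """Columns: low prefs, high prefs, moderator x low prefs, moderator x high prefs."""
    return np.column_stack([pref_low, pref_high,
                            moderator * pref_low, moderator * pref_high])


def simulate_votes(n_districts=300, n_votes=40, seed=0):
    """Simulate binary votes; all coefficient values are made up for illustration."""
    rng = np.random.default_rng(seed)
    district = np.repeat(np.arange(n_districts), n_votes)
    moderator = np.clip(rng.normal(size=n_districts), -2, 2)[district]
    alpha = rng.uniform(-0.1, 0.1, size=n_districts)[district]  # district effects
    low = rng.uniform(-0.5, 0.5, size=district.size)
    high = rng.uniform(-0.5, 0.5, size=district.size)
    # Probabilities stay inside [0.05, 0.95] by construction
    prob = (0.5 + alpha + 0.2 * low + 0.1 * high
            + 0.1 * moderator * low - 0.1 * moderator * high)
    vote = (rng.uniform(size=district.size) < prob).astype(float)
    return vote, low, high, moderator, district


if __name__ == "__main__":
    vote, low, high, moderator, district = simulate_votes()
    coefs = within_ols(vote, interaction_design(low, high, moderator), district)
    print(dict(zip(["low", "high", "mod_x_low", "mod_x_high"], coefs.round(3))))
```

In this setup, the coefficient on the moderator-by-low-income-preference term plays the role of the estimate reported in the text: it measures how much responsiveness to low income preferences changes per unit of the moderator.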

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V. Labor contributions and selection of Democratic legislators

                                        (1)        (2)        (3)        (4)
A: Contributions channel
                                        DV: Contrib.          DV: roll call
Union membership                        0.056      0.046
                                       (0.012)    (0.014)
Contributions × low income prefs.                             0.946      0.865
                                                             (0.036)    (0.034)
Contributions × high income prefs.                           −0.735     −0.714
                                                             (0.029)    (0.031)
B: Selection channel
                                        DV: Democrat          DV: roll call
Union membership                        0.161      0.106
                                       (0.024)    (0.023)
Democrat × low income prefs.                                  0.576      0.542
                                                             (0.012)    (0.015)
Democrat × high income prefs.                                −0.411     −0.423
                                                             (0.013)    (0.015)

District controls                                  X                     X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators’ vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15,780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators’ vote as function of the interaction between (log) labor contributions and Democratic representative. N=15,776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members’ contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers’ views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII. Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our ndings cast a somewhat less pessimistic light on democratic representation inCongress Despite high income inequality polarization expensive campaigns and a legisla-ture dominated by auent politicians (Carnes 2013 Gilens 2012 Hacker and Pierson 2010McCarty et al 2006) our evidence indicates that unequal representation is not hard-wiredinto the fabric of American democracy We also nd suggestive evidence that public sectorunions to whom union membership has been shiing over the last decades do not appearto be less of a collective voice for the less well-o than private sector unions

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and that it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. We therefore consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has focused on political institutions (Bartels 2017; Lupu and Warner 2017). Including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.
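The standardization mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `standardize` is hypothetical, the input values are made up, and the use of the population standard deviation (rather than the sample version) is an assumption, since the text does not say which one is used.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale a variable to mean zero and unit standard deviation."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD; the source does not specify
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mu) / sd for v in values]

# Illustrative input only (not data from the paper)
z = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
```

After this transformation, a coefficient on a standardized input can be read as the effect of a one standard deviation change in the original variable.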

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1: Matched CCES-House roll calls included in our analysis

| Match | Bill | Date | Name | House Vote (Yea-Nay) | Bill Ideology† |
|---|---|---|---|---|---|
| (1) | HR 810 | 07/19/2006 | Stem Cell Research Enhancement Act (Presidential Veto Override) | 235-193 | L |
| (1) | HR 3 | 01/11/2007 | Stem Cell Research Enhancement Act of 2007 (House) | 253-174 | L |
| (1) | S 5 | 06/07/2007 | Stem Cell Research Enhancement Act of 2007 | 247-176 | L |
| (2) | HR 2956 | 07/12/2007 | Responsible Redeployment from Iraq Act | 223-201 | L |
| (3) | HR 2 | 01/10/2007 | Fair Minimum Wage Act | 315-116 | L |
| (4) | HR 4297 | 12/08/2005 | Tax Relief Extension Reconciliation Act (Passage) | 234-197 | C |
| (4) | HR 4297 | 05/10/2006 | Tax Relief Extension Reconciliation Act (Agreeing to Conference Report) | 244-185 | C |
| (5) | HR 3045 | 07/28/2005 | Dominican Republic-Central America-United States Free Trade Agreement Implementation Act | 217-215 | C |
| (6) | S 1927 | 08/04/2007 | Protect America Act | 227-183 | C |
| (6) | HR 6304 | 06/20/2008 | FISA Amendments Act of 2008 | 293-129 | C |
| (7) | HR 3162 | 08/01/2007 | Children's Health and Medicare Protection Act | 225-204 | L |
| (7) | HR 976 | 10/18/2007 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 273-156 | L |
| (7) | HR 3963 | 01/23/2008 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 260-152 | L |
| (7) | HR 2 | 02/04/2009 | Children's Health Insurance Program Reauthorization Act | 290-135 | L |
| (8) | HR 3221 | 07/23/2008 | Foreclosure Prevention Act of 2008 | 272-152 | L |
| (9) | HR 3688 | 11/08/2007 | United States-Peru Trade Promotion Agreement | 285-132 | C |
| (10) | HR 1424 | 10/03/2008 | Emergency Economic Stabilization Act of 2008 | 263-171 | L |
| (11) | HR 3080 | 10/12/2011 | To implement the United States-Korea Trade Agreement | 278-151 | C |
| (12) | HR 3078 | 10/12/2011 | To implement the United States-Colombia Trade Promotion Agreement | 262-167 | C |
| (13) | HR 2346 | 06/16/2009 | Supplemental Appropriations, Fiscal Year 2009 (Agreeing to Conference Report) | 226-202 | L |
| (14) | HR 2831 | 07/31/2007 | Lilly Ledbetter Fair Pay Act | 225-199 | L |
| (14) | HR 11 | 01/09/2009 | Lilly Ledbetter Fair Pay Act of 2009 (House) | 247-171 | L |
| (14) | S 181 | 01/27/2009 | Lilly Ledbetter Fair Pay Act of 2009 | 250-177 | L |
| (15) | HR 1913 | 04/29/2009 | Local Law Enforcement Hate Crimes Prevention Act | 249-175 | L |
| (16) | HR 1 | 02/13/2009 | American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report) | 246-183 | L |
| (17) | HR 2454 | 06/26/2009 | American Clean Energy and Security Act | 219-212 | L |
| (18) | HR 3590 | 03/21/2010 | Patient Protection and Affordable Care Act | 220-212 | L |
| (19) | HR 3962 | 11/07/2009 | Affordable Health Care for America Act | 221-215 | L |
| (20) | HR 4173 | 06/30/2010 | Wall Street Reform and Consumer Protection Act of 2009 | 237-192 | L |
| (21) | HR 2965 | 12/15/2010 | Don't Ask, Don't Tell Repeal Act of 2010 | 250-175 | L |
| (22) | S 365 | 08/01/2011 | Budget Control Act of 2011 | 269-161 | C |
| (23) | H CR 34 | 04/15/2011 | House Budget Plan of 2011 | 235-193 | C |
| (24) | H CR 112 | 03/28/2012 | Simpson-Bowles/Cooper Amendment to House Budget Plan | 38-382 | C |
| (25) | HR 8 | 08/01/2012 | American Taxpayer Relief Act of 2012 (Levin Amendment) | 170-257 | L |
| (26) | HR 2 | 01/19/2011 | Repealing the Job-Killing Health Care Law Act | 245-189 | C |
| (26) | HR 6079 | 07/11/2012 | Repeal the Patient Protection and Affordable Care Act and [...] | 244-185 | C |
| (27) | HR 1938 | 07/26/2011 | North American-Made Energy Security Act | 279-147 | C |

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative is based on predominant support of the bill by Democratic or Republican representatives, respectively.


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

| Congress | 33rd percentile: Mean | Min | Max | 67th percentile: Mean | Min | Max |
|---|---|---|---|---|---|---|
| 109 | 38123 | 16800 | 73675 | 77964 | 39612 | 146870 |
| 110 | 40127 | 18000 | 77000 | 83047 | 43600 | 155113 |
| 111 | 39021 | 17500 | 78262 | 82440 | 46000 | 160050 |
| 112 | 37381 | 16500 | 81000 | 79868 | 38500 | 158654 |

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

| Variable | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Roll-call vote: yea | 0.568 | 0.495 | 0.000 | 1.000 | 15780 |
| Constituent preferences: Low income | 0.593 | 0.220 | 0.047 | 0.979 | 15934 |
| Constituent preferences: High income | 0.555 | 0.198 | 0.037 | 0.967 | 15934 |
| Constituent preferences: Low-High Gap | 0.172 | 0.121 | 0.000 | 0.588 | 15934 |
| Union membership [log] | 9.705 | 1.046 | 6.094 | 13.619 | 15934 |
| Population | 7.022 | 0.723 | 4.697 | 9.980 | 15934 |
| Share African American | 0.124 | 0.146 | 0.004 | 0.680 | 15934 |
| Share Hispanic | 0.156 | 0.174 | 0.005 | 0.812 | 15934 |
| Share BA or higher | 0.275 | 0.097 | 0.073 | 0.645 | 15934 |
| Median income [$10,000] | 5.177 | 1.356 | 2.282 | 10.439 | 15934 |
| Share female | 0.508 | 0.010 | 0.462 | 0.543 | 15934 |
| Manufacturing share | 0.110 | 0.047 | 0.025 | 0.281 | 15934 |
| Urbanization | 0.790 | 0.199 | 0.213 | 1.000 | 15934 |
| Certification elections [log] | 3.347 | 0.861 | 0.000 | 5.100 | 15934 |
| Congregations [per 1000 persons] | 0.765 | 1.147 | 0.062 | 6.453 | 15934 |

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher-order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative of each Congressional district's population.

[Figure B1: Illustration of Small Area Estimation of District Preferences. Schematic of a stacked data matrix with covariate columns $Z_{i1}, \dots, Z_{iK}$ and preference columns $Y_{i1}, \dots, Y_{iP}$. Units $1, \dots, m$ (CCES) have observed covariates and observed preferences; units $m+1, \dots, m+n$ (Census) have observed covariates and missing (NA) preferences. A random forest is trained on the CCES block, $Y_p = f(Z)$, and used to predict the Census block, $Y^*_p = f(Z)$.]

We use a sample of $m$ individuals from the CCES that is not necessarily representative at the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group-specific roll call preference can be estimated as the average of all units in that district.

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas (PUMAs). We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions in proportion to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, constructed from Census tract-level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample consists of the 1% extracts of the American Community Survey 2006-2011. We limit the sample to non-group-quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
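The proportional sampling step can be illustrated with a small Python sketch. It is a simplified reading of the procedure, not the authors' implementation: the function name, the toy PUMA labels, and the share values are hypothetical, and the sketch assumes sampling with replacement, which the text does not specify.

```python
import random

def synthetic_district_sample(puma_people, district_shares, size, seed=0):
    """Draw a synthetic district sample from PUMA-level donor records.

    puma_people:     dict mapping a PUMA id to a list of donor records
    district_shares: dict mapping a PUMA id to the share of the district's
                     population that lives in that PUMA (from a crosswalk)
    """
    rng = random.Random(seed)
    pumas = list(district_shares)
    weights = [district_shares[p] for p in pumas]
    # Pick a PUMA for each draw in proportion to its share of the district,
    # then pick one donor record from that PUMA (with replacement).
    drawn = rng.choices(pumas, weights=weights, k=size)
    return [rng.choice(puma_people[p]) for p in drawn]

# Toy example: the district overlaps PUMAs P1 (70%) and P2 (30%), but not P3
people = {"P1": ["a", "b"], "P2": ["c"], "P3": ["d"]}
sample = synthetic_district_sample(people, {"P1": 0.7, "P2": 0.3}, size=50)
```

Records from PUMAs that do not overlap the district never enter its sample, which is what makes the district identifier recoverable from PUMA-level data.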

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the

Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
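A minimal Python sketch of a midpoint Pareto estimator follows. It uses one common variant, in which the Pareto shape parameter is estimated from the counts in the top two bins; the text does not state which variant is used, so this choice, the function name, and the example bins and counts are all assumptions made for illustration.

```python
import math

def midpoint_pareto(bins, counts):
    """Assign a continuous income value to each income bin.

    bins:   list of (lower, upper) bounds; the last bin is open-ended, (lower, None)
    counts: number of respondents in each bin
    """
    # Closed bins are replaced by their midpoint
    values = [(lower + upper) / 2 for lower, upper in bins[:-1]]

    # Open-ended top bin: estimate the Pareto shape from the top two bins
    lower_top, lower_prev = bins[-1][0], bins[-2][0]
    n_top, n_prev = counts[-1], counts[-2]
    alpha = (math.log(n_top + n_prev) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    # Mean of a Pareto distribution truncated below at lower_top
    values.append(lower_top * alpha / (alpha - 1))
    return values

# Hypothetical bins and counts (not CCES data)
values = midpoint_pareto([(0, 50000), (50000, 100000), (100000, None)], [60, 30, 10])
```

In this toy example the estimated shape parameter is 2, so the open-ended bin is assigned twice its lower bound.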

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$, denoted as $y^{(v)}_{mis}$

• Variables other than $D_v$ with observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$, denoted as $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$, denoted as $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

Start with initial guesses of missing values in $D$
$w \leftarrow$ vector of column indices sorted by increasing fraction of NA
while not $c$ do
    $D^{imp}_{old} \leftarrow$ previously imputed $D$
    for $v$ in $w$ do
        Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
        Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
        $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
    Update stopping criterion $c$
Return completed $D^{imp}$
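The control flow of Algorithm 1 can be sketched in plain Python. To keep the example self-contained, the learner is a deliberately simple stand-in (`fit_cell_mean`, a cell-mean predictor) and is NOT a random forest; any function with the same fit/predict shape could be plugged in. The function names, the `max_iter` and `tol` parameters, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def fit_cell_mean(X, y):
    """Stand-in learner (not a random forest): mean of y among rows with identical x."""
    cells = defaultdict(list)
    for xi, yi in zip(X, y):
        cells[tuple(xi)].append(yi)
    overall = mean(y)  # fallback for covariate patterns never seen in training
    table = {key: mean(vals) for key, vals in cells.items()}
    return lambda xi: table.get(tuple(xi), overall)

def chained_impute(D, fit=fit_cell_mean, max_iter=10, tol=1e-9):
    """Fill in None entries of a numeric matrix D (list of rows) column by column."""
    n_rows, n_cols = len(D), len(D[0])
    missing = [[D[i][v] is None for v in range(n_cols)] for i in range(n_rows)]
    imp = [row[:] for row in D]  # work on a copy; D itself is left untouched
    # Initial guesses: mean of the observed values in each column
    for v in range(n_cols):
        observed = [D[i][v] for i in range(n_rows) if not missing[i][v]]
        for i in range(n_rows):
            if missing[i][v]:
                imp[i][v] = mean(observed)
    # Columns with missing entries, sorted by increasing number of NAs
    order = sorted(
        (v for v in range(n_cols) if any(missing[i][v] for i in range(n_rows))),
        key=lambda v: sum(missing[i][v] for i in range(n_rows)),
    )
    for _ in range(max_iter):
        old = [row[:] for row in imp]
        for v in order:
            others = [u for u in range(n_cols) if u != v]
            obs_rows = [i for i in range(n_rows) if not missing[i][v]]
            mis_rows = [i for i in range(n_rows) if missing[i][v]]
            # Fit on rows where v is observed, predict rows where it is missing
            predict = fit([[imp[i][u] for u in others] for i in obs_rows],
                          [imp[i][v] for i in obs_rows])
            for i in mis_rows:
                imp[i][v] = predict([imp[i][u] for u in others])
        # Stopping criterion: no further change in filled-in values
        change = max(abs(imp[i][v] - old[i][v])
                     for i in range(n_rows) for v in range(n_cols))
        if change <= tol:
            break
    return imp

# Toy data: column 0 is a covariate Z, column 1 a preference Y.
# The first four rows play the role of the CCES, the last two of the Census.
D = [[0, 0.0], [0, 0.0], [1, 1.0], [1, 1.0], [0, None], [1, None]]
completed = chained_impute(D)
```

Because the same loop also fills gaps in the covariate columns, missing individual characteristics are handled in the same pass as missing preferences.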

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes and find it to perform comparatively well with both continuous and categorical variables.28

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age consists of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income consists of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.

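model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\mathrm{race}}_{j[i]} + \alpha^{\mathrm{gender}}_{k[i]} + \alpha^{\mathrm{age}}_{l[i]} + \alpha^{\mathrm{educ}}_{m[i]} + \alpha^{\mathrm{income}}_{n[i]} + \alpha^{\mathrm{district}}_{d[i]}\right) \tag{B1}$$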

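We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{\mathrm{race}}_{j} \sim N(0, \sigma^2_{\mathrm{race}}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{\mathrm{gender}}_{k} \sim N(0, \sigma^2_{\mathrm{gender}}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{\mathrm{age}}_{l} \sim N(0, \sigma^2_{\mathrm{age}}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{\mathrm{educ}}_{m} \sim N(0, \sigma^2_{\mathrm{educ}}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{\mathrm{income}}_{n} \sim N(0, \sigma^2_{\mathrm{income}}), \quad n = 1, \dots, 13 \tag{B6}$$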

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\mathrm{state}}_{s[d]}$ and freely estimated variance:

$$\alpha^{\mathrm{district}}_{d} \sim N(\alpha^{\mathrm{state}}_{s[d]}, \sigma^2_{\mathrm{state}}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
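The poststratification step described above amounts to a population-weighted average of cell predictions. The following Python sketch shows the arithmetic for a single district, income group, and roll call; the function name, the cell labels, and all numbers are hypothetical.

```python
def poststratify(cell_predictions, cell_counts):
    """Population-weighted average of cell-level predicted support.

    cell_predictions: dict mapping a demographic cell to its predicted probability
    cell_counts:      dict mapping the same cells to their population frequency
    """
    total = sum(cell_counts[cell] for cell in cell_predictions)
    if total == 0:
        raise ValueError("no population in the supplied cells")
    weighted = sum(cell_predictions[cell] * cell_counts[cell]
                   for cell in cell_predictions)
    return weighted / total

# Hypothetical cells (race, gender, age group) within one district and income group
predictions = {("white", "female", "30-39"): 0.2, ("black", "male", "50-59"): 0.8}
counts = {("white", "female", "30-39"): 300, ("black", "male", "50-59"): 100}
estimate = poststratify(predictions, counts)  # (0.2*300 + 0.8*100) / 400
```

Cells that are rare in the district contribute little to the estimate even if they are over-represented among survey respondents.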

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted with MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases the responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). The estimates for high income preferences are similarly larger in magnitude. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote (standard errors in parentheses).

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: Small Area Estimation via Chained Random Forests | | | | | | |
| Low income preferences | 0.106 (0.013) | 0.082 (0.015) | 0.098 (0.016) | 0.084 (0.013) | 0.068 (0.014) | 0.062 (0.014) |
| High income preferences | −0.063 (0.012) | −0.036 (0.013) | −0.053 (0.014) | −0.051 (0.013) | −0.050 (0.013) | −0.040 (0.014) |
| B: Multilevel Regression & Poststratification | | | | | | |
| Low income preferences | 0.182 (0.021) | 0.158 (0.024) | 0.181 (0.026) | 0.162 (0.020) | 0.115 (0.022) | 0.115 (0.022) |
| High income preferences | −0.136 (0.017) | −0.119 (0.019) | −0.139 (0.021) | −0.122 (0.017) | −0.091 (0.018) | −0.091 (0.018) |
| C: Raw CCES means | | | | | | |
| Low income preferences | 0.080 (0.010) | 0.061 (0.011) | 0.063 (0.012) | 0.072 (0.010) | 0.043 (0.011) | 0.045 (0.011) |
| High income preferences | −0.027 (0.008) | −0.013 (0.008) | −0.010 (0.008) | −0.027 (0.008) | −0.018 (0.008) | −0.024 (0.009) |

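| Included in specification | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| District fixed effects | X | X | X | X | X | X |
| Group preferences × union policy | | X | | | | X |
| Group preferences × state constants | | | X | | | |
| Group preferences × organizational capacity | | | | X | | X |
| Group preferences × district covariates | | | | | X | X |

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.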

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that, in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to dealing with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is then ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
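The difference between district-specific and state-specific classification can be made concrete with a short Python sketch that loosely follows the Massachusetts example above. The function name and the district labels are hypothetical, the upper thresholds are invented for illustration, and the treatment of incomes exactly at a threshold is an assumption.

```python
def income_group(income, low_threshold, high_threshold):
    """Classify an income relative to a pair of thresholds."""
    if income < low_threshold:   # assumption: incomes exactly at a threshold are 'middle'
        return "low"
    if income > high_threshold:
        return "high"
    return "middle"

# Low thresholds roughly as in the text; upper thresholds are hypothetical
district_thresholds = {
    "typical MA district": (45000, 95000),
    "MA 8th district": (30000, 80000),
}
state_thresholds = (47000, 95000)

voter_income = 40000
by_district = {d: income_group(voter_income, *t) for d, t in district_thresholds.items()}
by_state = income_group(voter_income, *state_thresholds)
```

The same voter is 'low income' under the state-wide threshold and in the typical district, but 'middle' in the district with the lower threshold.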


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote (standard errors in parentheses).

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: District-specific income thresholds | | | | | | |
| Low income preferences | 0.106 (0.013) | 0.082 (0.015) | 0.098 (0.016) | 0.084 (0.013) | 0.068 (0.014) | 0.062 (0.014) |
| High income preferences | −0.063 (0.012) | −0.036 (0.013) | −0.053 (0.014) | −0.051 (0.013) | −0.050 (0.013) | −0.040 (0.014) |
| B: State-specific income thresholds | | | | | | |
| Low income preferences | 0.105 (0.013) | 0.082 (0.015) | 0.097 (0.016) | 0.083 (0.013) | 0.067 (0.014) | 0.062 (0.014) |
| High income preferences | −0.062 (0.012) | −0.036 (0.013) | −0.052 (0.014) | −0.050 (0.013) | −0.049 (0.013) | −0.039 (0.013) |
| C: Shifted income thresholds, p20 - p80 | | | | | | |
| Low income preferences | 0.098 (0.012) | 0.077 (0.013) | 0.09 (0.014) | 0.078 (0.012) | 0.063 (0.013) | 0.057 (0.013) |
| High income preferences | −0.054 (0.011) | −0.031 (0.012) | −0.046 (0.012) | −0.044 (0.011) | −0.044 (0.012) | −0.034 (0.012) |

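| Included in specification | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| District fixed effects | X | X | X | X | X | X |
| Group preferences × union policy | | X | | | | X |
| Group preferences × state constants | | | X | | | |
| Group preferences × organizational capacity | | | | X | | X |
| Group preferences × district covariates | | | | | X | X |

Note: Replicates Table I in the main text using income groups defined via different income thresholds.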


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. As proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

[Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Map of congressional districts; legend bins: 0-13, 13-26, 26-39, 39-62, 62-164.]

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
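One plausible reading of the interpolation step is sketched below in Python: each county's congregation count is allocated to districts in proportion to the share of the county's population living in each county-district intersection. This is an assumption about the weighting, not a description of the authors' exact procedure; the function name, identifiers, and numbers are hypothetical.

```python
def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts using population weights.

    county_counts: dict mapping a county to its congregation count
    weights:       dict mapping (county, district) to the share of the county's
                   population that lives in that district (shares sum to 1 per county)
    """
    district_counts = {}
    for (county, district), share in weights.items():
        district_counts[district] = (
            district_counts.get(district, 0.0) + share * county_counts[county]
        )
    return district_counts

# Toy example: county A is split between two districts, county B is nested in d2
counts = {"A": 100, "B": 40}
weights = {("A", "d1"): 0.75, ("A", "d2"): 0.25, ("B", "d2"): 1.0}
district_counts = interpolate_to_districts(counts, weights)
```

When every county lies entirely within one district, all shares equal one and the calculation reduces to a simple sum, matching the at-large case noted above.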


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate whether this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

Specification                              Low income preferences   High income preferences   N
(1) Injective preference-roll call map     0.063 (0.013)            -0.041 (0.013)            11,104
(2) Extreme preferences excl.              0.074 (0.016)            -0.048 (0.015)            13,308
(3) New York excluded                      0.070 (0.015)            -0.048 (0.014)            14,730
(4) Local Union Concentration              0.065 (0.014)            -0.047 (0.014)            15,780
(5) Trimmed LPM estimator                  0.074 (0.015)            -0.055 (0.014)            15,426
(6) Errors-in-variables                    0.062 (0.004)            -0.054 (0.004)            15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate whether extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top: for each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
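A minimal sketch of this trimming rule follows. The data layout (a dictionary of district preferences per roll call), the function names, and the linear-interpolation percentile are illustrative assumptions, not the authors' code.

```python
def percentile(values, q):
    """q-th percentile (0-100), interpolating linearly between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def trim_by_roll_call(prefs, lower=5, upper=95):
    """prefs: {roll_call: {district: preference}}; drops the extremes per roll call."""
    trimmed = {}
    for roll_call, by_district in prefs.items():
        lo = percentile(by_district.values(), lower)
        hi = percentile(by_district.values(), upper)
        trimmed[roll_call] = {d: p for d, p in by_district.items() if lo <= p <= hi}
    return trimmed


# Hypothetical example: 21 districts with preferences 0.00, 0.05, ..., 1.00.
prefs = {"rc1": {d: d / 20 for d in range(21)}}
kept = trim_by_roll_call(prefs)["rc1"]
```

Because the cutoffs are computed separately for each roll call, a district can be dropped for one vote and retained for another.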

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check whether our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
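The idea behind such a trimming estimator can be sketched as follows, under the assumption that trimming means repeatedly dropping observations whose fitted value falls outside [0, 1] and refitting. The one-regressor setup, the function names, and the toy data are hypothetical simplifications of the paper's much richer model.

```python
def ols(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def trimmed_lpm(x, y, max_iter=50):
    """Refit the LPM, dropping observations whose fitted value leaves [0, 1]."""
    for _ in range(max_iter):
        a, b = ols(x, y)
        keep = [0.0 <= a + b * xi <= 1.0 for xi in x]
        if all(keep):
            break
        x = [xi for xi, k in zip(x, keep) if k]
        y = [yi for yi, k in zip(y, keep) if k]
    return a, b, x


# Hypothetical toy data: binary outcome, one covariate.
x = [-4, -1, 0, 1, 2, 3, 4, 5, 8]
y = [0, 1, 0, 0, 1, 0, 1, 1, 1]
a, b, kept_x = trimmed_lpm(x, y)
```

In this toy example the first fit predicts a value above 1 for the observation at x = 8, so it is dropped; the refit on the remaining eight observations keeps all fitted values inside the unit interval and the loop stops.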

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates (see footnote 31). We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
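The expected downward bias can be illustrated with a small simulation of classical measurement error in a single regressor. This is a generic sketch, not the Bayesian model used in the paper; the variances, sample size, and variable names are arbitrary assumptions.

```python
import random


def ols_slope(x, y):
    """Slope of a one-regressor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(1)
n, beta = 20000, 1.0
theta = [random.gauss(0, 1) for _ in range(n)]           # true preference, variance 1
y = [beta * t + random.gauss(0, 1) for t in theta]       # outcome
theta_obs = [t + random.gauss(0, 0.5) for t in theta]    # noisy estimate, error variance 0.25

slope_true = ols_slope(theta, y)         # close to beta
slope_noisy = ols_slope(theta_obs, y)    # attenuated toward zero
reliability = 1.0 / (1.0 + 0.25)         # var(theta) / (var(theta) + var(error))
```

With classical measurement error, the slope on the noisy regressor converges to beta times the reliability ratio (0.8 here), which is why small measurement error variances imply little change in the estimates.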

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore i subscripts):

$$ y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1} $$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation

$$ U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2} $$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

Footnote 31: We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher-order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$ y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3} $$

$$ U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4} $$

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques such as the LASSO are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014; see footnote 32). Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
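The two selection steps and the final OLS step can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a plain LASSO with a fixed penalty (solved by coordinate descent) rather than the root-LASSO used in the paper, a single treatment variable without preference interactions, and simulated data; all function and variable names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Plain LASSO by coordinate descent: min (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Assumes the columns of X are standardized and y is centered."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    r = y.copy()                          # residual y - Xb, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]           # partial residual without column j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            r -= X[:, j] * b[j]
    return b


def post_double_selection(y, d, W, lam):
    """Select controls from both reduced forms, then run OLS on the union."""
    Ws = (W - W.mean(0)) / W.std(0)
    I1 = np.flatnonzero(lasso_cd(Ws, y - y.mean(), lam))   # controls predicting the outcome
    I2 = np.flatnonzero(lasso_cd(Ws, d - d.mean(), lam))   # controls predicting the treatment
    I = np.union1d(I1, I2)
    X = np.column_stack([np.ones(len(y)), d, W[:, I]])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return coef[1], I


# Simulated data: control 0 strongly predicts the treatment but has only a
# moderate direct effect on the outcome; the true treatment effect is 0.5.
rng = np.random.default_rng(0)
n, p = 1000, 50
W = rng.normal(size=(n, p))
d = W[:, 0] + 0.5 * W[:, 1] + rng.normal(size=n)
y = 0.5 * d + 0.3 * W[:, 0] + W[:, 2] + rng.normal(size=n)
eta_hat, selected = post_double_selection(y, d, W, lam=0.15)
```

The union step is what protects against omitting a control like column 0: even if the outcome LASSO were to miss it, the treatment LASSO picks it up.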

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}, \theta^h_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$ y = Kc \tag{G1} $$

Here, $K$ is an $N \times N$ Gaussian kernel matrix

$$ K = \exp\left( -\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2} \right) \tag{G2} $$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared $L_2$ penalty to the least squares criterion

$$ c^* = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3} $$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

Footnote 32: For a very general discussion, see Belloni et al. (2017).

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double-selection LASSO). Second, it allows us to check whether the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

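$$ \frac{\partial y}{\partial z^{U_d}_j} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left( -\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2} \right) \left( Z^{U_d}_d - z^{U_d}_j \right) \tag{G4} $$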

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
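The closed-form KRLS fit in (G1) to (G3) can be sketched in a few lines. This sketch assumes standardized covariates and a fixed, hand-picked penalty instead of the leave-one-out search described above; the function names and the simulated data are hypothetical.

```python
import numpy as np


def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)


def krls_fit(Z, y, lam):
    """Return the kernel matrix K and weights c = (K + lam * I)^{-1} y."""
    Zs = (Z - Z.mean(0)) / Z.std(0)       # standardize the covariates
    sigma2 = Zs.shape[1]                  # sigma^2 = D, the number of columns
    K = gaussian_kernel(Zs, sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return K, c


# Simulated example: a nonlinear function of the first of two covariates.
rng = np.random.default_rng(0)
Z = rng.normal(size=(60, 2))
y = np.sin(2 * Z[:, 0]) + 0.1 * rng.normal(size=60)
lam = 0.1                                 # fixed here; the paper picks it by leave-one-out loss
K, c = krls_fit(Z, y, lam)
fitted = K @ c                            # in-sample predictions, as in (G1)
```

Larger values of the penalty shrink the weights and yield smoother fitted surfaces, at the cost of a looser in-sample fit.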

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the U.S. Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the U.S. Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the U.S. mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the Hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


However, the most noteworthy result clearly is the fact that, using a non-parametric model not including an a priori interaction between union membership and preferences, we find clear evidence that union membership moderates the relationship between preferences and legislative voting. For low-income constituents, increasing district-level union membership steadily increases the marginal effect of their preferences on legislators' vote choice. Moving from low levels of union membership (at the 25th percentile) to median levels of union membership increases low-income preference responsiveness by about 5 percentage points. An equally sized increase from the median to the 75th percentile increases responsiveness by almost 8 percentage points. We also find similar (albeit weaker) evidence for an interaction between high-income group preferences and union membership.

V Heterogeneity

Union type. Is our finding driven by a particular type of union? A recent strand of research stresses the special characteristics of public unions and their political influence (e.g., Anzia and Moe 2016; Flavin and Hartney 2015). Hence, one may ask whether our findings mainly reflect the influence of private-sector unions, because public-sector unions may be too narrow in their interests to mitigate unequal responsiveness. Panel (A) of Table IV provides some evidence on this question. The administrative forms used to measure union membership do not distinguish between private and public unions, and local unions may contain workers from both the private and the public sector. To calculate an approximate measure of district public union membership, we identify unions with public-sector members (based on their name) and create separate union membership counts for "public" and the remaining "non-public" unions (see Appendix A for details).

Our findings suggest that the coefficient for the impact of a district's public union membership on the responsiveness of legislators to the preferences of the poor is sizable (at about 7 percentage points) and clearly statistically different from zero. At the same time, the coefficient for the remaining "non-public" unions is slightly reduced. For low-income preferences, the difference between the two estimates is not statistically distinguishable from zero (p = 0.172). This finding does not support the hypothesis of a null effect of public-sector unions. It also suggests that the changing private-public union composition will not necessarily lead to less collective voice in Congress.

Bill ideology. Panel (B) explores whether the effect of unions varies with the ideological direction of the bill that is voted on. Based on the partisan vote margin of the roll call vote, we define an indicator variable for conservative roll calls and estimate separate coefficients for each bill type. We find that union effects are relevant (and significant) for both bill types; they are larger for conservative votes. A standard deviation increase in union membership increases responsiveness to the preferences of low-income constituents by about 9 (±2) percentage points for conservative bills, compared to about 5 (±1) points for liberal bills. The difference is larger for the preferences of high-income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low-income constituents. We trace this issue further in the next specification.

Table IV: Effect heterogeneity. Marginal effects of unionization on legislative responsiveness to low and high income groups

                                 Low income        High income

(A) Private vs. public unions
    Public unions                0.074 (0.016)     -0.058 (0.015)
    Non-public unions            0.054 (0.016)     -0.027 (0.016)

(B) Bill ideology
    Conservative bill            0.086 (0.017)     -0.086 (0.018)
    Liberal bill                 0.052 (0.014)     -0.028 (0.013)

(C) AFL-CIO endorsement
    No position                  0.054 (0.014)     -0.054 (0.013)
    Endorsement                  0.077 (0.015)     -0.040 (0.014)

Note: Estimates for $\eta^l$ and $\eta^h$ with cluster-robust standard errors in parentheses. N = 15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO (see footnote 23). AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003; see footnote 24). The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

Footnote 23: Taken from the AFL-CIO "legislative scorecard," https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

Footnote 24: For high-income preferences, the estimate for $\eta^h$ is smaller for endorsed bills but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
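The conversion from a log-scale regression to dollar amounts can be illustrated with a minimal sketch of Duan's (1983) smearing retransformation. It assumes a log-linear model whose log-scale prediction and residuals are already available; the function names, the residuals, and the shift of 0.05 are hypothetical illustrations, not the paper's data or code.

```python
import math

def smearing_factor(log_residuals):
    """Duan's smearing factor: mean of the exponentiated log-scale residuals."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)

def predict_level(log_prediction, log_residuals):
    """Retransform a log-scale prediction to the original (dollar) scale."""
    return math.exp(log_prediction) * smearing_factor(log_residuals)

# Hypothetical residuals from a regression of log(contributions) on covariates.
residuals = [-0.4, 0.1, 0.3, -0.2, 0.2]

# Hypothetical log-scale predictions before and after shifting the
# linear predictor by an illustrative amount of 0.05.
baseline = predict_level(11.0, residuals)
shifted = predict_level(11.0 + 0.05, residuals)
dollar_change = shifted - baseline
```

The smearing factor corrects for the fact that exponentiating a predicted log value alone understates the expected value on the dollar scale whenever the residuals are not all zero.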

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians, if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V
Labor contributions and selection of Democratic legislators

                                          (1)       (2)       (3)       (4)

A: Contributions channel
                                          DV: Contrib         DV: roll call
Union membership                          0.056     0.046
                                         (0.012)   (0.014)
Contributions × low income prefs                              0.946     0.865
                                                             (0.036)   (0.034)
Contributions × high income prefs                            −0.735    −0.714
                                                             (0.029)   (0.031)

B: Selection channel
                                          DV: Democrat        DV: roll call
Union membership                          0.161     0.106
                                         (0.024)   (0.023)
Democrat × low income prefs                                   0.576     0.542
                                                             (0.012)   (0.015)
Democrat × high income prefs                                 −0.411    −0.423
                                                             (0.013)   (0.015)

District controls                                   X                   X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N = 428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N = 15,780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N = 428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as a function of the interaction between Democratic representative and district preferences. N = 15,776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed, but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).

Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, we consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).

Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each Congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
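The classification into income groups can be sketched as follows. This is a minimal illustration that assumes a plain list of household incomes for one district; the helper names and the income values are hypothetical, and the quantile interpolation used by Python's standard library may differ from the one used for Table A2.

```python
import statistics

def tercile_thresholds(incomes):
    """Return the two cut points that split incomes into three equal groups."""
    lower, upper = statistics.quantiles(incomes, n=3)
    return lower, upper

def income_group(income, lower, upper):
    """Classify one income as low, middle, or high relative to the thresholds."""
    if income <= lower:
        return "low"
    if income > upper:
        return "high"
    return "middle"

# Hypothetical household incomes for a single district.
district_incomes = [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000]
low_cut, high_cut = tercile_thresholds(district_incomes)
groups = [income_group(x, low_cut, high_cut) for x in district_incomes]
```

Computing the thresholds separately for every district means that "low income" is defined relative to the local income distribution rather than a single national cutoff.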

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.

Table A1
Matched CCES–House roll calls included in our analysis

Match  Bill      Date        House vote  Bill       Name
                             (Yea-Nay)   ideology†

(1)    HR 810    07/19/2006  235-193     L          Stem Cell Research Enhancement Act (Presidential Veto Override)
(1)    HR 3      01/11/2007  253-174     L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176     L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201     L          Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116     L          Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197     C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185     C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215     C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183     C          Protect America Act
(6)    HR 6304   06/20/2008  293-129     C          FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204     L          Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135     L          Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152     L          Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132     C          United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171     L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151     C          To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167     C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202     L          Supplemental Appropriations Fiscal Year 2009 (Agreeing to Conference Report)
(14)   HR 2831   07/31/2007  225-199     L          Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171     L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177     L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175     L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183     L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212     L          American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212     L          Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215     L          Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192     L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175     L          Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161     C          Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193     C          House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382      C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257     L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189     C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185     C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147     C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.

Table A2
Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

                 33rd percentile                 67th percentile
Congress     Mean      Min      Max         Mean      Min       Max

109        38,123   16,800   73,675       77,964   39,612   146,870
110        40,127   18,000   77,000       83,047   43,600   155,113
111        39,021   17,500   78,262       82,440   46,000   160,050
112        37,381   16,500   81,000       79,868   38,500   158,654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3
Descriptive statistics of analysis sample

                                      Mean      SD     Min      Max       N

Roll-call vote: yea                  0.568   0.495   0.000    1.000   15780
Constituent preferences
  Low income                         0.593   0.220   0.047    0.979   15934
  High income                        0.555   0.198   0.037    0.967   15934
  Low-High Gap                       0.172   0.121   0.000    0.588   15934
Union membership [log]               9.705   1.046   6.094   13.619   15934
Population                           7.022   0.723   4.697    9.980   15934
Share African American               0.124   0.146   0.004    0.680   15934
Share Hispanic                       0.156   0.174   0.005    0.812   15934
Share BA or higher                   0.275   0.097   0.073    0.645   15934
Median income [$10,000]              5.177   1.356   2.282   10.439   15934
Share female                         0.508   0.010   0.462    0.543   15934
Manufacturing share                  0.110   0.047   0.025    0.281   15934
Urbanization                         0.790   0.199   0.213    1.000   15934
Certification elections [log]        3.347   0.861   0.000    5.100   15934
Congregations [per 1,000 persons]    0.765   1.147   0.062    6.453   15934

Note: Calculated from American Community Survey 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1,000 inhabitants calculated from RCMS 2000 (spatially interpolated).

B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher-order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata Areas (PUMAs). We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, constructed by rebuilding one geography from the other using Census tract-level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group-quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which comprises 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

Figure B1
Illustration of Small Area Estimation of District Preferences

[Diagram: a stacked data matrix with covariates Z_i1, ..., Z_iK for all units 1, ..., m + n. Preferences Y_i1, ..., Y_iP are observed for the m CCES units and missing (NA) for the n Census units. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict the Census rows, Y*_p = f(Z).]

We use a sample of m individuals from the CCES that is not necessarily representative on the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.
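The construction of the synthetic district sample can be illustrated with a minimal sketch. It assumes the crosswalk has already been reduced to, for one district, the share of its population living in each PUMA; the function name, the record layout, and the toy shares are hypothetical and do not reproduce the MABLE Geocorr2K file format.

```python
import random

def synthetic_district_sample(puma_records, puma_shares, size, seed=0):
    """Draw individuals for one district, weighting PUMAs by their district share.

    puma_records: dict mapping PUMA id -> list of individual records
    puma_shares:  dict mapping PUMA id -> share of the district population in that PUMA
    """
    rng = random.Random(seed)
    pumas = list(puma_shares)
    weights = [puma_shares[p] for p in pumas]
    sample = []
    for _ in range(size):
        # Pick a PUMA in proportion to its share, then one individual within it.
        puma = rng.choices(pumas, weights=weights, k=1)[0]
        sample.append(rng.choice(puma_records[puma]))
    return sample

# Hypothetical donor pool: two PUMAs with five individuals each.
records = {
    "A": [{"puma": "A", "id": i} for i in range(5)],
    "B": [{"puma": "B", "id": i} for i in range(5)],
}
district_sample = synthetic_district_sample(records, {"A": 0.8, "B": 0.2}, size=1000)
```

Sampling with replacement is a simplification chosen for brevity; the text does not state whether the original procedure samples with or without replacement.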

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather which one the
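A midpoint Pareto scoring of binned income can be sketched as follows. This is a minimal illustration under stated assumptions: bins are given as half-open intervals, and the Pareto shape parameter is estimated from the counts in the last two bins, which is one common textbook variant and not necessarily the exact estimator used in the paper. The bin edges and counts are hypothetical.

```python
import math

def bin_scores(bins, counts):
    """Assign a dollar score to each income bin.

    bins:   list of (lower, upper) with upper exclusive; upper is None for the top bin
    counts: number of respondents in each bin
    """
    # Closed bins get their midpoint.
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]

    # Open-ended top bin: estimate a Pareto shape from the last two bins
    # (an assumed variant), then use the Pareto mean above the top threshold.
    top_lower = bins[-1][0]
    prev_lower = bins[-2][0]
    n_top, n_prev = counts[-1], counts[-2]
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(top_lower) - math.log(prev_lower)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    scores.append(top_lower * alpha / (alpha - 1))
    return scores

# Hypothetical bins (upper bounds exclusive) and respondent counts.
bins = [(0, 10000), (10000, 20000), (20000, 30000), (100000, 150000), (150000, None)]
counts = [50, 80, 120, 300, 100]
scores = bin_scores(bins, counts)
```

With exclusive upper bounds, the bin covering $20,000 to $29,999 is scored at $25,000, matching the example in the text.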

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{\mathrm{mis}} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{\mathrm{obs}}$

• Missing values of $D_v$: $y^{(v)}_{\mathrm{mis}}$

• Variables other than $D_v$ with available observations $i^{(v)}_{\mathrm{obs}} = \{1, \dots, N\} \setminus i^{(v)}_{\mathrm{mis}}$: $x^{(v)}_{\mathrm{obs}}$

• Variables other than $D_v$ with observations $i^{(v)}_{\mathrm{mis}}$: $x^{(v)}_{\mathrm{mis}}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1: Start with initial guesses of missing values in D
 2: w ← vector of column indices sorted by increasing fraction of NA
 3: while not c do
 4:     D_old^imp ← previously imputed D
 5:     for v in w do
 6:         Fit Random Forest: y_obs^(v) ~ x_obs^(v)
 7:         Predict y_mis^(v) using x_mis^(v)
 8:         D_new^imp ← updated imputed matrix using predicted y_mis^(v)
 9:     Update stopping criterion c
10: Return completed D^imp
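The loop structure of Algorithm 1 can be sketched in a few lines. To keep the example self-contained, a simple k-nearest-neighbour average stands in for the random forest, so this illustrates the chaining logic only and is not the estimator used in the paper. All names, the tolerance-based stopping rule, and the toy data are assumptions.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Stand-in learner: average the targets of the k nearest training rows."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def chained_impute(data, max_iter=10, tol=1e-6):
    """Fill None entries column by column until the imputed values stop changing."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[row[v] is None for row in data] for v in range(n_cols)]
    filled = [row[:] for row in data]

    # Step 1: initial guess is the mean of the observed values in each column.
    for v in range(n_cols):
        observed = [data[i][v] for i in range(n_rows) if not missing[v][i]]
        mean_v = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[v][i]:
                filled[i][v] = mean_v

    # Step 2: visit columns with missing values in order of increasing missingness.
    order = sorted((v for v in range(n_cols) if any(missing[v])),
                   key=lambda v: sum(missing[v]))

    for _ in range(max_iter):
        change = 0.0
        for v in order:
            others = [c for c in range(n_cols) if c != v]
            obs_rows = [i for i in range(n_rows) if not missing[v][i]]
            train_x = [[filled[i][c] for c in others] for i in obs_rows]
            train_y = [filled[i][v] for i in obs_rows]
            for i in range(n_rows):
                if missing[v][i]:
                    new = knn_predict(train_x, train_y, [filled[i][c] for c in others])
                    change += (new - filled[i][v]) ** 2
                    filled[i][v] = new
        if change < tol:  # stopping criterion: imputed values no longer change
            break
    return filled

# Toy matrix with one missing entry in each column.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None], [None, 10.0]]
completed = chained_impute(toy)
```

Swapping `knn_predict` for a random forest regressor or classifier would recover the structure of the algorithm described above; the surrounding loop would stay the same.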

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes and find it to perform comparatively well with both continuous and categorical variables.28

26 (continued) respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

B2 Multilevel Regression and Poststratication

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\mathrm{race}}_{j[i]} + \alpha^{\mathrm{gender}}_{k[i]} + \alpha^{\mathrm{age}}_{l[i]} + \alpha^{\mathrm{educ}}_{m[i]} + \alpha^{\mathrm{income}}_{n[i]} + \alpha^{\mathrm{district}}_{d[i]}\right) \tag{B1}$$

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.

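We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{\mathrm{race}}_{j} \sim N\left(0, \sigma^2_{\mathrm{race}}\right), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{\mathrm{gender}}_{k} \sim N\left(0, \sigma^2_{\mathrm{gender}}\right), \quad k = 1, \dots, 2 \tag{B3}$$

$$\alpha^{\mathrm{age}}_{l} \sim N\left(0, \sigma^2_{\mathrm{age}}\right), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{\mathrm{educ}}_{m} \sim N\left(0, \sigma^2_{\mathrm{educ}}\right), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{\mathrm{income}}_{n} \sim N\left(0, \sigma^2_{\mathrm{income}}\right), \quad n = 1, \dots, 13 \tag{B6}$$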

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\mathrm{state}}_{s[d]}$ and freely estimated variance:

$$\alpha^{\mathrm{district}}_{d} \sim N\left(\alpha^{\mathrm{state}}_{s[d]}, \sigma^2_{\mathrm{state}}\right) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
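The poststratification step described above amounts to a population-weighted average of cell-level predictions. The sketch below assumes the linear predictor for each demographic cell (intercept plus the relevant group effects) has already been computed from a fitted model; the cell labels, predictor values, and counts are hypothetical.

```python
import math

def inv_logit(x):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def poststratify(cell_linear_predictors, cell_counts):
    """Population-weighted average of predicted support across demographic cells.

    cell_linear_predictors: dict mapping cell -> linear predictor from the fitted model
    cell_counts:            dict mapping cell -> number of people in that cell in the district
    """
    total = sum(cell_counts[cell] for cell in cell_linear_predictors)
    weighted = sum(
        inv_logit(eta) * cell_counts[cell]
        for cell, eta in cell_linear_predictors.items()
    )
    return weighted / total

# Hypothetical cells for one district and income group.
predictors = {("white", "female", "18-29"): 0.4, ("black", "male", "30-39"): -0.2}
counts = {("white", "female", "18-29"): 1200, ("black", "male", "30-39"): 800}
district_preference = poststratify(predictors, counts)
```

Restricting the cells to one income group before averaging yields the income-specific district preference; using all cells yields the overall district mean.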

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted with MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)      (2)      (3)      (4)      (5)      (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences        0.106    0.082    0.098    0.084    0.068    0.062
                             (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences      −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                             (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: Multilevel Regression & Poststratification
Low income preferences        0.182    0.158    0.181    0.162    0.115    0.115
                             (0.021)  (0.024)  (0.026)  (0.020)  (0.022)  (0.022)
High income preferences      −0.136   −0.119   −0.139   −0.122   −0.091   −0.091
                             (0.017)  (0.019)  (0.021)  (0.017)  (0.018)  (0.018)

C: Raw CCES means
Low income preferences        0.080    0.061    0.063    0.072    0.043    0.045
                             (0.010)  (0.011)  (0.012)  (0.010)  (0.011)  (0.011)
High income preferences      −0.027   −0.013   −0.010   −0.027   −0.018   −0.024
                             (0.008)  (0.008)  (0.008)  (0.008)  (0.008)  (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as ‘low income’ relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts’ districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)      (2)      (3)      (4)      (5)      (6)

A: District-specific income thresholds
Low income preferences        0.106    0.082    0.098    0.084    0.068    0.062
                             (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences      −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                             (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: State-specific income thresholds
Low income preferences        0.105    0.082    0.097    0.083    0.067    0.062
                             (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences      −0.062   −0.036   −0.052   −0.050   −0.049   −0.039
                             (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences        0.098    0.077    0.09     0.078    0.063    0.057
                             (0.012)  (0.013)  (0.014)  (0.012)  (0.013)  (0.013)
High income preferences      −0.054   −0.031   −0.046   −0.044   −0.044   −0.034
                             (0.011)  (0.012)  (0.012)  (0.011)  (0.012)  (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily.”

We use the NLRB’s database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).

Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Legend bins: 0-13, 13-26, 26-39, 39-62, 62-164.
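A minimal sketch of population-weighted areal interpolation is given below. It assumes that each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection; the function name, identifiers, and all numbers are hypothetical and do not reproduce the equivalence file used in the paper.

```python
def interpolate_to_districts(county_counts, intersection_pop):
    """Allocate county-level counts to districts by population share.

    county_counts:    {county: count}
    intersection_pop: {(county, district): population of the intersection}
    """
    # Total population of each county, summed over its intersections.
    county_pop = {}
    for (county, _district), pop in intersection_pop.items():
        county_pop[county] = county_pop.get(county, 0) + pop

    district_counts = {}
    for (county, district), pop in intersection_pop.items():
        share = pop / county_pop[county]  # weight of this intersection
        district_counts[district] = (
            district_counts.get(district, 0.0) + share * county_counts[county]
        )
    return district_counts


# Hypothetical example: county A lies entirely in district D1,
# county B is split 75/25 between D1 and D2.
congregations = {"A": 100, "B": 60}
populations = {("A", "D1"): 1000, ("B", "D1"): 3000, ("B", "D2"): 1000}

district_congregations = interpolate_to_districts(congregations, populations)
print(district_congregations)
```

When every county falls entirely within one district, all shares equal one and the computation reduces to summing county counts, which matches the at-large case described above.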


E Additional Robustness Tests

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                          Low income        High income
                                          preferences       preferences        N

(1) Injective preference roll call map    0.063 (0.013)    −0.041 (0.013)    11104
(2) Extreme preferences excl.             0.074 (0.016)    −0.048 (0.015)    13308
(3) New York excluded                     0.070 (0.015)    −0.048 (0.014)    14730
(4) Local Union Concentration             0.065 (0.014)    −0.047 (0.014)    15780
(5) Trimmed LPM estimator                 0.074 (0.015)    −0.055 (0.014)    15426
(6) Errors-in-variables                   0.062 (0.004)    −0.054 (0.004)    15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
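The idea behind trimming can be sketched as follows. The sketch assumes a sequential procedure in which the linear probability model is fit by OLS, observations whose fitted values fall outside the unit interval are dropped, and the model is re-fit until no such observations remain. It uses a single regressor and made-up data; the function names are hypothetical and this is not the specification estimated in Table E1.

```python
def ols_line(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trimmed_lpm(x, y):
    """Re-fit the LPM until all fitted values of kept cases lie in [0, 1]."""
    keep = list(range(len(x)))
    while True:
        if len(keep) < 3:
            raise ValueError("too few observations left after trimming")
        a, b = ols_line([x[i] for i in keep], [y[i] for i in keep])
        inside = [i for i in keep if 0.0 <= a + b * x[i] <= 1.0]
        if len(inside) == len(keep):
            return a, b, keep
        keep = inside  # drop cases with predictions outside the unit interval


# Made-up data: binary outcome, one regressor with a single extreme value.
x = [0, 0, 0, 0, 1, 1, 1, 1, 5]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope, kept = trimmed_lpm(x, y)
print(intercept, slope, len(kept))
```

In this toy example the first fit predicts a probability above one for the observation with x = 5, so it is removed and the model is re-estimated on the remaining eight cases.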

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom, mean 1, and SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit district fixed effects in the notation and ignore $i$ subscripts):

$$ y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1} $$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$ U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2} $$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher-order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

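\begin{align}
y_{jd} &= \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3} \\
U_d &= w_d' \beta_{m0} + r_{md} + v_d \tag{F4}
\end{align}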

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.
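To make the substitution step concrete, consider a simplified version of (F3) and (F4) that suppresses the preference terms and writes a single treatment coefficient $\eta$. This simplification is an assumption made here for exposition only, and the symbols $\bar\beta_0$, $\bar r_d$, and $\bar\zeta_{jd}$ are shorthand introduced for this illustration:

```latex
\begin{align*}
y_{jd} &= \eta U_d + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \\
       &= \eta\,(w_d'\beta_{m0} + r_{md} + v_d) + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \\
       &= w_d'\underbrace{(\eta\beta_{m0} + \beta_{g0})}_{\bar\beta_0}
          + \underbrace{(\eta r_{md} + r_{gd})}_{\bar r_d}
          + \underbrace{(\eta v_d + \zeta_{jd})}_{\bar\zeta_{jd}}
\end{align*}
```

In this simplified form, the outcome depends on the controls only through $\bar\beta_0$, with a composite approximation error $\bar r_d$, so that predicting $y_{jd}$ from $w_d$ is a pure prediction problem of the kind the LASSO is designed for.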

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
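The two selection steps and the final OLS step can be sketched on simulated data. The sketch below makes several simplifying assumptions that depart from the paper's implementation: it uses a plain LASSO fit by coordinate descent rather than the root-LASSO, a fixed penalty `lam` rather than a data-driven one, and a single treatment coefficient without preference interactions. All function names and the simulated data are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return z - t if z > t else z + t if z < -t else 0.0


def lasso_select(X, y, lam, sweeps=100):
    """Indices of columns with non-zero LASSO coefficients.

    Minimises (1/2n)||y - Xb||^2 + lam * ||b||_1 by coordinate descent
    after centring y and the columns of X (plain LASSO, not root-LASSO).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(p)]
    Xc = [[row[k] - means[k] for k in range(p)] for row in X]
    ybar = sum(y) / n
    resid = [v - ybar for v in y]
    scale = [sum(row[k] ** 2 for row in Xc) / n for k in range(p)]
    beta = [0.0] * p
    for _ in range(sweeps):
        for k in range(p):
            rho = sum(Xc[i][k] * resid[i] for i in range(n)) / n + scale[k] * beta[k]
            new = soft_threshold(rho, lam) / scale[k]
            delta = new - beta[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][k] * delta
                beta[k] = new
    return {k for k in range(p) if beta[k] != 0.0}


def ols(X, y):
    """OLS coefficients via Gauss-Jordan elimination on the normal equations."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[k][p] / A[k][k] for k in range(p)]


def post_double_selection(y, U, W, lam):
    """Select controls for outcome and treatment, then run OLS on the union."""
    I1 = lasso_select(W, y, lam)  # controls that predict the outcome
    I2 = lasso_select(W, U, lam)  # controls that predict the treatment
    keep = sorted(I1 | I2)
    X = [[1.0, U[i]] + [W[i][k] for k in keep] for i in range(len(y))]
    return ols(X, y)[1], I1, I2


# Simulated data: control 1 strongly drives the treatment, but its
# reduced-form association with the outcome is weak (2.0 * 1.0 - 1.9 = 0.1).
rng = random.Random(1)
n, p = 200, 5
W = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
U = [2.0 * W[i][1] + rng.gauss(0, 1) for i in range(n)]
y = [1.0 * U[i] - 1.9 * W[i][1] + rng.gauss(0, 0.1) for i in range(n)]

eta_hat, I1, I2 = post_double_selection(y, U, W, lam=0.3)
print(eta_hat, sorted(I1), sorted(I2))
```

The simulated design mirrors the argument in the text: a control that matters a great deal for the treatment can have a small reduced-form coefficient in the outcome equation, so selection on the outcome equation alone may miss it, while the treatment-equation step picks it up and the final OLS recovers a treatment effect close to the true value of 1.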

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}, \theta^h_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$ y = Kc \tag{G1} $$

Here, $K$ is an $N \times N$ Gaussian kernel matrix

$$ K = \exp\left( -\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2} \right) \tag{G2} $$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$ c^* = \underset{c \in \mathbb{R}^D}{\arg\min} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3} $$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

32 For a very general discussion, see Belloni et al. (2017).
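The closed form for $c^*$ can be verified from the first-order condition of (G3). The short derivation below assumes that $K$ is symmetric and invertible (an assumption stated here for the sketch; symmetry holds for a Gaussian kernel matrix):

```latex
\begin{align*}
\frac{\partial}{\partial c}\left[(y - Kc)'(y - Kc) + \lambda c'Kc\right]
  &= -2K'(y - Kc) + 2\lambda Kc = 0 \\
\Rightarrow\quad K\left[(K + \lambda I)c - y\right] &= 0 \\
\Rightarrow\quad c^* &= (K + \lambda I)^{-1} y
\end{align*}
```

Larger values of $\lambda$ shrink the weights $c^*$ towards zero and thus produce smoother fitted surfaces, which is the trade-off governed by the leave-one-out criterion.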

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effect of group preferences changes with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

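$$ \frac{\partial y}{\partial z^{U_d}_j} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left( -\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2} \right) \left( Z^{U_d}_d - z^{U_d}_j \right) \tag{G4} $$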

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents’ responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working Paper 5 2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem Deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.



The difference is larger for the preferences of high income constituents. In both cases, the difference in marginal effects between liberal and conservative bills is statistically significant. Our findings suggest that union influence is more relevant for bills that have (potentially) adverse consequences for low income constituents. We trace this issue further in the next specification.

Table IV. Effect heterogeneity: marginal effects of unionization on legislative responsiveness to low and high income groups

                                   Low income        High income
(A) Private vs. public unions
    Public unions                  0.074 (0.016)     −0.058 (0.015)
    Non-public unions              0.054 (0.016)     −0.027 (0.016)
(B) Bill ideology
    Conservative bill              0.086 (0.017)     −0.086 (0.018)
    Liberal bill                   0.052 (0.014)     −0.028 (0.013)
(C) AFL-CIO endorsement
    No position                    0.054 (0.014)     −0.054 (0.013)
    Endorsement                    0.077 (0.015)     −0.040 (0.014)

Note: Estimates for $\eta_L$ and $\eta_H$ with cluster-robust standard errors in parentheses. N = 15,780. Panel (A) shows separate effects for district counts of union members for unions classified as public or non-public (see text). Statistical tests for the difference in union type yield p = 0.172 for low income preferences and p = 0.027 for high income ones. Panel (B) estimates separate effects for bills classified as conservative or liberal based on their predominant party vote. Tests for significance of difference: p = 0.009 for low and p = 0.000 for high income preferences. Panel (C) classifies bills with economic content where the AFL-CIO has taken a public stand for or against it (depending on bill content). Tests for significance of difference: p = 0.003 for low income, p = 0.049 for high income preferences.

Union voting recommendations. In panel (C), we consider bills with economic content that have (or have not) been endorsed explicitly by the largest union confederation, the AFL-CIO. Our definition of endorsement is based on voting recommendations made publicly by the AFL-CIO.23 AFL-CIO recommendations signal the salience of the issue to unions, and they were made for more than half of the votes in the analysis. Panel (C) shows that the impact of union membership on legislators' responsiveness for bills especially relevant to low-income citizens is about 2 percentage points larger for votes on which the AFL-CIO had taken a prior position. This difference is statistically different from zero (p = 0.003).24 The fact that districts with higher union membership see better representation of the less affluent, more so when issues are salient to unions, bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

23 Taken from the AFL-CIO "legislative scorecard": https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard

24 For high-income preferences, the estimate for $\eta_H$ is smaller for endorsed bills, but still significantly different from zero.

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) that these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
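The retransformation from logged contributions to dollar amounts can be illustrated with a minimal sketch of Duan's smearing approach. This is not the authors' code: it assumes a single-regressor OLS on the log scale (the paper's regressions also include state fixed effects and controls), and the data values and function names (`ols_fit`, `smearing_factor`, `predict_level`) are hypothetical.

```python
import math

def ols_fit(x, y):
    """One-regressor OLS; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def smearing_factor(x, log_y, intercept, slope):
    """Smearing factor: the sample mean of exp(residual) on the log scale."""
    resid = [ly - (intercept + slope * xi) for xi, ly in zip(x, log_y)]
    return sum(math.exp(r) for r in resid) / len(resid)

def predict_level(x_new, intercept, slope, smear):
    """Prediction retransformed to the original (dollar) scale."""
    return math.exp(intercept + slope * x_new) * smear

# Hypothetical toy data: standardized union membership, log contributions.
union_z = [-1.5, -0.5, 0.0, 0.5, 1.5]
log_contrib = [10.9, 11.0, 11.3, 11.2, 11.6]

a, b = ols_fit(union_z, log_contrib)
smear = smearing_factor(union_z, log_contrib, a, b)
# Dollar change implied by a one-SD increase, evaluated at the mean (z = 0).
delta = predict_level(1.0, a, b, smear) - predict_level(0.0, a, b, smear)
```

The smearing factor corrects for the fact that exponentiating a predicted log value alone understates the expected level when residuals are not identically zero.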

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects, modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators is then associated with higher responsiveness to the preferences of low income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Table V. Labor contributions and selection of Democratic legislators

                                        (1)       (2)       (3)       (4)

A: Contributions channel
                                        DV: Contrib.        DV: roll call
  Union membership                      0.056     0.046
                                       (0.012)   (0.014)
  Contributions × low income prefs                          0.946     0.865
                                                           (0.036)   (0.034)
  Contributions × high income prefs                        −0.735    −0.714
                                                           (0.029)   (0.031)

B: Selection channel
                                        DV: Democrat        DV: roll call
  Union membership                      0.161     0.106
                                       (0.024)   (0.023)
  Democrat × low income prefs                               0.576     0.542
                                                           (0.012)   (0.015)
  Democrat × high income prefs                             −0.411    −0.423
                                                           (0.013)   (0.015)

  District controls                               X                   X

Note: Panel (A), column (1) shows a district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N = 428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N = 15,780. Panel (B), columns (1) and (2) show district-level LPMs with state fixed effects of the presence of a Democratic representative on (log) union membership. N = 428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as a function of the interaction between district preferences and Democratic representative. N = 15,776. All specifications employ cluster-robust standard errors.

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are on average more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each Congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
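As a rough illustration of how two tercile cut points classify respondents, the following sketch computes unweighted thresholds for one district and assigns income groups. It is an assumption-laden example, not the authors' procedure: survey weights are ignored, the treatment of values exactly at a cut point is a guess, and the incomes and function names are hypothetical.

```python
from statistics import quantiles

def tercile_thresholds(incomes):
    """Return the two cut points that split incomes into three equal-sized groups."""
    low_cut, high_cut = quantiles(incomes, n=3)
    return low_cut, high_cut

def income_group(income, low_cut, high_cut):
    """Classify one respondent relative to the district thresholds (boundary rule assumed)."""
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical household incomes for a single district.
district_incomes = [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000]
low_cut, high_cut = tercile_thresholds(district_incomes)
groups = [income_group(x, low_cut, high_cut) for x in district_incomes]
```

Repeating this per district and year would yield district-specific thresholds of the kind summarized in Table A2.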

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1. Matched CCES–House roll calls included in our analysis

Match  Bill      Date        Yea-Nay   Ideology†  Name
(1)    HR 810    07/19/2006  235-193   L          Stem Cell Research Enhancement Act (Presidential Veto Override)
(1)    HR 3      01/11/2007  253-174   L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176   L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201   L          Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116   L          Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197   C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185   C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215   C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183   C          Protect America Act
(6)    HR 6304   06/20/2008  293-129   C          FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204   L          Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156   L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152   L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135   L          Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152   L          Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132   C          United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171   L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151   C          To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167   C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202   L          Supplemental Appropriations, Fiscal Year 2009 (Agreeing to Conference Report)
(14)   HR 2831   07/31/2007  225-199   L          Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171   L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177   L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175   L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183   L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212   L          American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212   L          Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215   L          Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192   L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175   L          Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161   C          Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193   C          House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382    C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257   L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189   C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185   C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147   C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative, based on predominant support of the bill by Democratic or Republican representatives, respectively.


Table A2. Distribution of district income-group reference points: average threshold over all districts, smallest and largest value

                  33rd percentile                 67th percentile
Congress    Mean      Min       Max         Mean      Min       Max
109         38,123    16,800    73,675      77,964    39,612    146,870
110         40,127    18,000    77,000      83,047    43,600    155,113
111         39,021    17,500    78,262      82,440    46,000    160,050
112         37,381    16,500    81,000      79,868    38,500    158,654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3. Descriptive statistics of analysis sample

                                     Mean     SD       Min      Max      N
Roll-call vote: yea                  0.568    0.495    0.000    1.000    15780
Constituent preferences
  Low income                         0.593    0.220    0.047    0.979    15934
  High income                        0.555    0.198    0.037    0.967    15934
  Low-High Gap                       0.172    0.121    0.000    0.588    15934
Union membership [log]               9.705    1.046    6.094    13.619   15934
Population                           7.022    0.723    4.697    9.980    15934
Share African American               0.124    0.146    0.004    0.680    15934
Share Hispanic                       0.156    0.174    0.005    0.812    15934
Share BA or higher                   0.275    0.097    0.073    0.645    15934
Median income [$10,000]              5.177    1.356    2.282    10.439   15934
Share female                         0.508    0.010    0.462    0.543    15934
Manufacturing share                  0.110    0.047    0.025    0.281    15934
Urbanization                         0.790    0.199    0.213    1.000    15934
Certification elections [log]        3.347    0.861    0.000    5.100    15934
Congregations [per 1000 persons]     0.765    1.147    0.062    6.453    15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
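The final step described above, averaging filled-in preferences by district and income group, might be sketched as follows. The record layout (district label, income group, 0/1 preference on one roll call) and the function name `group_means` are hypothetical choices for illustration, not taken from the paper.

```python
from collections import defaultdict

def group_means(records):
    """Average an imputed preference within each (district, income group) cell."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of preferences, count]
    for district, group, pref in records:
        cell = totals[(district, group)]
        cell[0] += pref
        cell[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

# Hypothetical completed Census records for one roll call item.
completed = [
    ("D1", "low", 1), ("D1", "low", 1), ("D1", "low", 0), ("D1", "high", 0),
    ("D2", "low", 1), ("D2", "high", 1), ("D2", "high", 0),
]
prefs = group_means(completed)
```

Each resulting value is the share of a district's income group predicted to support the bill.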

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

Figure B1. Illustration of Small Area Estimation of District Preferences
[Diagram: a stacked data matrix with covariate columns $Z_{i1}, \dots, Z_{iK}$ and preference columns $Y_{i1}, \dots, Y_{iP}$. CCES units $1, \dots, m$ have observed preferences and are used to train the random forest, $Y_p = f(Z)$; Census units $m+1, \dots, m+n$ have preferences recorded as NA, which are predicted as $Y^*_p = f(Z)$.]

We use a sample of $m$ individuals from the CCES that is not necessarily representative at the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample is the 1% extracts of the American Community Survey, 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this, we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
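One way to picture the construction of the synthetic district sample is the sketch below, which allocates individuals from each PUMA to districts in proportion to crosswalk shares. This is a simplified assumption about the procedure, not the authors' implementation: the crosswalk layout, the rounding rule, and all names and values are hypothetical.

```python
import random

def synthetic_district_sample(people_by_puma, crosswalk, seed=0):
    """Assign individuals in each PUMA to districts in proportion to crosswalk shares."""
    rng = random.Random(seed)
    sample = []
    for puma, pool in people_by_puma.items():
        shuffled = pool[:]
        rng.shuffle(shuffled)  # random order, so each slice is a random draw
        start = 0
        for district, share in crosswalk[puma].items():
            k = round(share * len(shuffled))  # rounding rule is an assumption
            for person in shuffled[start:start + k]:
                sample.append({**person, "district": district})
            start += k
    return sample

# Hypothetical micro-data and PUMA-to-district population shares.
people_by_puma = {
    "P1": [{"id": i} for i in range(100)],
    "P2": [{"id": 100 + i} for i in range(50)],
}
crosswalk = {"P1": {"D1": 0.7, "D2": 0.3}, "P2": {"D2": 1.0}}
synthetic = synthetic_district_sample(people_by_puma, crosswalk)
```

Partitioning a shuffled pool keeps any individual from being assigned to two districts, which is a design choice of this sketch rather than something the text specifies.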

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
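A minimal sketch of a midpoint Pareto scoring rule is given below. It assumes the common variant in which the Pareto shape parameter is estimated from the counts and lower bounds of the top two bins, which may differ in detail from the estimator the authors used; the four toy bins and counts are hypothetical and do not reproduce the 14 CCES categories.

```python
import math

def pareto_top_mean(lower_top, lower_prev, n_top, n_prev):
    """Mean of the open-ended top bin under a Pareto tail fitted to the top two bins."""
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    return lower_top * alpha / (alpha - 1)

def bin_scores(bins, counts):
    """Score closed bins by their midpoint and the open top bin by its Pareto mean.

    bins: list of (lower, upper) bounds, upper exclusive; the last bin has upper=None.
    """
    scores = [(lo + hi) / 2 for lo, hi in bins[:-1]]
    scores.append(pareto_top_mean(bins[-1][0], bins[-2][0], counts[-1], counts[-2]))
    return scores

# Hypothetical income bins (in dollars) and respondent counts per bin.
bins = [(0, 10000), (10000, 20000), (20000, 30000), (30000, None)]
counts = [30, 40, 20, 10]
scores = bin_scores(bins, counts)
```

With these toy inputs the third bin is scored at 25,000, matching the midpoint example in the text, while the top bin receives a value above its lower bound.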

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion c (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

Start with initial guesses of missing values in $D$
$w \leftarrow$ vector of column indices sorted by increasing fraction of NA
while not $c$ do
    $D^{imp}_{old} \leftarrow$ previously imputed $D$
    for $v$ in $w$ do
        Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
        Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
        $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
    Update stopping criterion $c$
Return completed $D^{imp}$
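The control flow of Algorithm 1 can be sketched in a few lines of Python. To keep the sketch dependency-free, the learner is a deliberately simple stand-in (1-nearest-neighbour prediction), not a random forest; the function names, the toy data, and the squared-change stopping rule are assumptions made for illustration only.

```python
from statistics import mean

def fit_predict(x_obs, y_obs, x_mis):
    """Stand-in learner (NOT a random forest): 1-nearest-neighbour prediction."""
    preds = []
    for row in x_mis:
        dists = [sum((a - b) ** 2 for a, b in zip(row, other)) for other in x_obs]
        preds.append(y_obs[dists.index(min(dists))])
    return preds

def chained_impute(D, max_iter=10, tol=1e-8):
    """D: list of numeric rows with None for missing entries; returns a completed copy."""
    n, p = len(D), len(D[0])
    miss = [[D[i][v] is None for v in range(p)] for i in range(n)]
    # Initial guesses: column means of the observed values.
    imp = [row[:] for row in D]
    for v in range(p):
        col_mean = mean(D[i][v] for i in range(n) if not miss[i][v])
        for i in range(n):
            if miss[i][v]:
                imp[i][v] = col_mean
    # Visit incomplete columns in order of increasing number of missing values.
    order = sorted((v for v in range(p) if any(miss[i][v] for i in range(n))),
                   key=lambda v: sum(miss[i][v] for i in range(n)))
    for _ in range(max_iter):
        old = [row[:] for row in imp]
        for v in order:
            others = [u for u in range(p) if u != v]
            obs = [i for i in range(n) if not miss[i][v]]
            mis = [i for i in range(n) if miss[i][v]]
            x_obs = [[imp[i][u] for u in others] for i in obs]
            y_obs = [imp[i][v] for i in obs]
            x_mis = [[imp[i][u] for u in others] for i in mis]
            for i, pred in zip(mis, fit_predict(x_obs, y_obs, x_mis)):
                imp[i][v] = pred
        # Stopping criterion c: filled-in values no longer change.
        change = sum((imp[i][v] - old[i][v]) ** 2 for i in range(n) for v in range(p))
        if change < tol:
            break
    return imp

# Toy data: two columns, one missing entry in each.
D = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [None, 8.0], [5.0, 10.0]]
completed = chained_impute(D)
```

Replacing `fit_predict` with a call to a random forest implementation would recover the scheme described in the text; the surrounding loop would stay the same.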

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


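model using penalized maximum likelihood (Chung et al. 2013):

$$
\Pr(Y_i = 1) = \mathrm{logit}^{-1}\left(\beta^{0} + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}
$$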

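We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta^{0}$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$
\begin{aligned}
\alpha^{race}_{j} &\sim N(0, \sigma^2_{race}), & j &= 1, \dots, 3 && \text{(B2)} \\
\alpha^{gender}_{k} &\sim N(0, \sigma^2_{gender}), & k &= 1, 2 && \text{(B3)} \\
\alpha^{age}_{l} &\sim N(0, \sigma^2_{age}), & l &= 1, \dots, 6 && \text{(B4)} \\
\alpha^{educ}_{m} &\sim N(0, \sigma^2_{educ}), & m &= 1, \dots, 5 && \text{(B5)} \\
\alpha^{income}_{n} &\sim N(0, \sigma^2_{income}), & n &= 1, \dots, 13 && \text{(B6)}
\end{aligned}
$$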

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$
\alpha^{district}_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}
$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
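The poststratification step amounts to a population-weighted average of cell-level predictions. The following Python sketch illustrates it under stated assumptions: the cell keys, linear predictors, and population counts are hypothetical, and only two cells are shown for brevity.

```python
import math

def inv_logit(x):
    """Inverse logit, mapping a linear predictor to a probability as in (B1)."""
    return 1.0 / (1.0 + math.exp(-x))

def poststratify(cell_pred, cell_pop):
    """Population-weighted average of cell-level predicted support.

    cell_pred: dict mapping a cell key to the predicted Pr(Y = 1) in that cell.
    cell_pop:  dict mapping the same keys to Census population counts.
    """
    total = sum(cell_pop[c] for c in cell_pred)
    return sum(cell_pred[c] * cell_pop[c] for c in cell_pred) / total

# Hypothetical cells for one income group in one district. The linear
# predictors stand in for the sum of the intercept and the relevant alphas.
cell_pred = {
    ("white", "female", "30-39"): inv_logit(0.0),
    ("black", "male", "50-59"): inv_logit(1.0),
}
cell_pop = {
    ("white", "female", "30-39"): 300,
    ("black", "male", "50-59"): 100,
}
estimate = poststratify(cell_pred, cell_pop)
```

Because the weights come from the Census file, cells that are over-represented among survey respondents do not dominate the district estimate.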

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). The responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A. Small Area Estimation via Chained Random Forests
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B. Multilevel Regression & Poststratification
Low income preferences        0.182     0.158     0.181     0.162     0.115     0.115
                             (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences      −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                             (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C. Raw CCES means
Low income preferences        0.080     0.061     0.063     0.072     0.043     0.045
                             (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences      −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                             (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results even obtain when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
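The role of the reference population in these definitions can be made concrete with a small Python sketch. The income lists are made-up toy data, the percentile rule (linear interpolation) is one of several conventions, and treating incomes exactly at a threshold as middle income is an assumption of this illustration.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) using linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def classify(income, reference_incomes, low_q=33, high_q=66):
    """Classify an income relative to a reference population (district or state)."""
    if income < percentile(reference_incomes, low_q):
        return "low"
    if income > percentile(reference_incomes, high_q):
        return "high"
    return "middle"

# Two hypothetical districts with different income distributions.
district_a = [10000 * k for k in range(1, 11)]   # $10,000 ... $100,000
district_b = [5000 * k for k in range(1, 11)]    # $5,000 ... $50,000

group_in_a = classify(35000, district_a)                        # terciles, panel (A) style
group_in_b = classify(35000, district_b)
group_p20_p80 = classify(35000, district_a, low_q=20, high_q=80)  # panel (C) style
```

The same income of $35,000 falls into the low group in the richer toy district and into the high group in the poorer one, which is the sense in which thresholds are relative. Passing a state-wide income list as `reference_incomes` would give the state-specific variant of panel (B).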


Table C1: Model results using different definitions of income groups. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A. District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B. State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C. Shifted income thresholds, p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency


Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Map legend classes: 0–13, 13–26, 26–39, 39–62, 62–164.

database v133 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
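A population-weighted areal interpolation of this kind can be sketched as follows in Python. The county names, district labels, and numbers are hypothetical, and allocating each county's count in proportion to the share of its population living in each district is an assumed reading of the weighting scheme, not a description of the exact procedure.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts using population weights.

    county_counts: dict mapping county -> count (e.g., number of congregations).
    intersections: list of (county, district, population) tuples, one per
                   county-district overlap.
    """
    county_pop = defaultdict(float)
    for county, _, pop in intersections:
        county_pop[county] += pop
    district_totals = defaultdict(float)
    for county, district, pop in intersections:
        share = pop / county_pop[county]  # share of the county's population in this district
        district_totals[district] += share * county_counts[county]
    return dict(district_totals)

# Hypothetical example: county A is split across two districts, county B is not.
county_counts = {"A": 100, "B": 40}
intersections = [("A", "D1", 75000), ("A", "D2", 25000), ("B", "D2", 50000)]
district_counts = interpolate_to_districts(county_counts, intersections)
```

When every county lies entirely within one district, each share equals one and the procedure reduces to summing county counts, which corresponds to the at-large case mentioned in the text.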


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                            Low income         High income
                                            preferences        preferences        N

(1) Injective preference roll call map      0.063 (0.013)     −0.041 (0.013)     11,104
(2) Extreme preferences excl.               0.074 (0.016)     −0.048 (0.015)     13,308
(3) New York excluded                       0.070 (0.015)     −0.048 (0.014)     14,730
(4) Local Union Concentration               0.065 (0.014)     −0.047 (0.014)     15,780
(5) Trimmed LPM estimator                   0.074 (0.015)     −0.055 (0.014)     15,426
(6) Errors-in-variables                     0.062 (0.004)     −0.054 (0.004)     15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
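The idea of trimming can be sketched in Python as an iterative procedure: fit the linear probability model, drop observations whose fitted values fall outside the unit interval, and refit until no such observations remain. This is a simplified reading of the approach with a single regressor and toy data; the function names are hypothetical, and the sketch assumes at least two distinct regressor values survive trimming.

```python
def ols_simple(x, y):
    """Intercept and slope of a least-squares line with a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def trimmed_lpm(x, y, max_rounds=50):
    """Refit after dropping observations with fitted values outside [0, 1]."""
    keep = list(range(len(x)))
    for _ in range(max_rounds):
        a, b = ols_simple([x[i] for i in keep], [y[i] for i in keep])
        inside = [i for i in keep if 0.0 <= a + b * x[i] <= 1.0]
        if len(inside) == len(keep):
            break  # all remaining fitted values lie in the unit interval
        keep = inside
    return a, b, keep

# Toy data: binary outcome, one covariate with a wide range.
x = [-2, -1, 0, 1, 2, 3, 4, 5, 6, 8]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
intercept, slope, kept = trimmed_lpm(x, y)
```

In the toy data the first fit predicts a negative probability at the smallest covariate value and a probability above one at the largest, so those observations are dropped before refitting.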

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

31 We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore i subscripts):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}
$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
$$

$$
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}
$$

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
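The two selection steps followed by OLS on the union of selected controls can be sketched in Python on simulated data. This is an illustration under strong simplifying assumptions: it uses a plain coordinate-descent LASSO with a fixed, hand-picked penalty in place of the root-LASSO, a single treatment effect without the preference interactions of (F1), and made-up data; all function and variable names are hypothetical.

```python
import random

def soft(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return z - t if z > t else z + t if z < -t else 0.0

def lasso_support(X, y, lam, n_iter=200):
    """Indices of non-zero coefficients from a coordinate-descent LASSO
    minimizing (1/2n)||y - Xb||^2 + lam * ||b||_1 on centered data."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    ybar = sum(y) / n
    resid = [yi - ybar for yi in y]
    beta = [0.0] * p
    col_ss = [sum(row[j] ** 2 for row in Xc) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) + col_ss[j] * beta[j]
            new = soft(rho, n * lam) / col_ss[j]
            if new != beta[j]:
                for i in range(n):
                    resid[i] -= Xc[i][j] * (new - beta[j])
                beta[j] = new
    return {j for j in range(p) if beta[j] != 0.0}

def ols(X, y):
    """Solve the normal equations X'X b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[c][p] / A[c][c] for c in range(p)]

def post_double_selection(y, u, W, lam_y, lam_u):
    I1 = lasso_support(W, y, lam_y)  # controls that predict the outcome
    I2 = lasso_support(W, u, lam_u)  # controls that predict the treatment
    selected = sorted(I1 | I2)
    X = [[1.0, u[i]] + [W[i][j] for j in selected] for i in range(len(y))]
    return ols(X, y)[1], selected    # coefficient on the treatment, control set

# Simulated data: control 0 drives the treatment, controls 0 and 1 drive the outcome.
random.seed(1)
n, p = 200, 10
W = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
u = [2.0 * w[0] + random.gauss(0, 1) for w in W]
y = [1.0 * ui + 0.5 * w[0] + 2.0 * w[1] + random.gauss(0, 1) for ui, w in zip(u, W)]
eta_hat, selected = post_double_selection(y, u, W, lam_y=0.5, lam_u=0.5)
```

In this simulation the true treatment coefficient is 1. Control 0 is the kind of variable the second selection step is designed to catch, since it is strongly related to the treatment.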

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}, \theta^{h}_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$
y = Kc \tag{G1}
$$

Here, $K$ is an $N \times N$ Gaussian Kernel matrix

$$
K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \tag{G2}
$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

32 For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

$$
c^{*} = \underset{c \in \mathbb{R}^{D}}{\operatorname{argmin}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}
$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^{2}$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^{2} = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
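The closed-form solution can be illustrated with a short Python sketch. It builds the Gaussian kernel matrix with $\sigma^2 = D$ and solves $(K + \lambda I)c = y$ by Gauss-Jordan elimination. The data are made up, the covariates are assumed to be on comparable scales already, and $\lambda$ is fixed by hand here instead of being chosen by leave-one-out loss.

```python
import math

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / sigma2)
             for zj in Z] for zi in Z]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def krls_fit(Z, y, lam):
    """Return the weight vector c* = (K + lam I)^{-1} y and the fitted values K c*."""
    n, D = len(Z), len(Z[0])
    K = gaussian_kernel(Z, sigma2=float(D))  # sigma^2 = D, as in the text
    G = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    c = solve(G, y)
    fitted = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
    return c, fitted

# Toy data with D = 2 covariates; lam is an arbitrary illustrative value.
Z = [[-1.0, 0.5], [0.0, -0.5], [1.0, 1.5], [2.0, 0.0], [0.5, 1.0]]
y = [0.0, 1.0, 1.0, 0.0, 1.0]
lam = 0.1
c, fitted = krls_fit(Z, y, lam)
```

Larger values of `lam` shrink the weights and give smoother fitted surfaces, at the cost of a looser fit to the observed outcomes.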

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$
\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^{2}} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G4}
$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and E. Washington (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

46

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

more so when issues are salient to unions bolsters the interpretation that our main result is actually driven by unions' capacity for political action. This finding is also consistent with micro-level studies of the effects of union position-taking (Ahlquist et al. 2014; Kim and Margalit 2017).

VI Exploring Possible Mechanisms

In this final empirical section, we assess two mechanisms of union influence discussed before: campaign contributions and partisan selection. If contributions are a channel of union influence, we should observe that (i) in districts where unions are stronger, local unions and their members contribute more to sitting members of Congress, and (ii) these contributions are positively linked to legislative responsiveness. We examine both relationships in Panel (A) of Table V. The first two columns show district-level regressions (with state fixed effects) relating union strength to (logged) contributions. We find that under two specifications (with and without extensive district controls), an increase in union membership systematically increases the amount of contributions from labor in that district. Converted to dollar amounts (following Duan (1983)), a standard deviation increase in union membership increases contributions from labor by about $81,000. Our measure of contributions is calculated from raw campaign finance contribution data obtained from the Center for Responsive Politics. We sum contributions reported to the Federal Election Commission to candidates from the "labor" sector (excluding single-issue donations). Our count includes both individuals and PACs (but using either alone does not change our results).
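The retransformation from logged contributions to dollar amounts can be illustrated with a minimal sketch of Duan's (1983) smearing estimate. Everything here is an assumption made for illustration: the data are synthetic, the function names (`ols_simple`, `smearing_factor`, `dollar_effect`) are hypothetical, and a one-regressor least-squares fit stands in for the paper's specification with state fixed effects and district controls.

```python
import math
import random
from statistics import fmean


def ols_simple(x, y):
    """Return (intercept, slope) of a one-regressor least-squares fit."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def smearing_factor(x, log_y, intercept, slope):
    """Duan's smearing factor: the mean of the exponentiated log-scale residuals."""
    residuals = [ly - (intercept + slope * xi) for xi, ly in zip(x, log_y)]
    return fmean(math.exp(r) for r in residuals)


def dollar_effect(x_from, x_to, intercept, slope, smear):
    """Change in the predicted outcome on the original (dollar) scale."""
    return smear * (math.exp(intercept + slope * x_to)
                    - math.exp(intercept + slope * x_from))


# Synthetic data: x is standardized union membership, log_y is log contributions.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
log_y = [12.0 + 0.3 * xi + rng.gauss(0.0, 0.8) for xi in x]

intercept, slope = ols_simple(x, log_y)
smear = smearing_factor(x, log_y, intercept, slope)
# Effect of moving from the mean (0) to one standard deviation above it (1).
effect = dollar_effect(0.0, 1.0, intercept, slope, smear)
```

Simply exponentiating the fitted log values would understate mean contributions on the dollar scale; the smearing factor corrects this without assuming normally distributed errors.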

The last two columns of Panel (A) examine how contributions moderate legislators' responsiveness. Following the specification used in Table I, we estimate linear probability models regressing roll call votes on contributions interacted with constituency preferences, district fixed effects, and, in column (4), district covariates interacted with preferences. We find that in districts where labor contributions are higher, the marginal effect capturing a legislator's responsiveness to the preferences of low-income constituents is significantly higher. This holds when accounting for district characteristics in the second specification, which also holds constant the amount donated by business interests.
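One way to read this interaction specification is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $y_{ijd}$ is the vote of legislator $i$ on roll call $j$ in district $d$, $\alpha_d$ is a district fixed effect, $\theta^{l}_{jd}$ and $\theta^{h}_{jd}$ are the preferences of low- and high-income constituents, and $C_d$ denotes (log) labor contributions.

```latex
\begin{align}
\Pr(y_{ijd} = 1) &= \alpha_d
  + \beta^{l}\,\theta^{l}_{jd} + \beta^{h}\,\theta^{h}_{jd}
  + \gamma^{l}\,\bigl(C_d \times \theta^{l}_{jd}\bigr)
  + \gamma^{h}\,\bigl(C_d \times \theta^{h}_{jd}\bigr), \\
\frac{\partial \Pr(y_{ijd} = 1)}{\partial \theta^{l}_{jd}} &= \beta^{l} + \gamma^{l}\,C_d .
\end{align}
```

Under this reading, a positive $\gamma^{l}$ means that responsiveness to low-income preferences rises with labor contributions, and a negative $\gamma^{h}$ means that responsiveness to high-income preferences falls. The interaction rows of Table V, Panel (A), report estimates of this kind.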

Turning to the selection of partisan politicians: if unions rally around Democratic candidates and manage to influence electoral outcomes through contributions and other mobilization efforts, we expect to find that higher union membership is associated with a higher probability of a Democratic candidate being elected. We examine this relationship in Panel (B). The first two columns show LPMs with state fixed effects modeling a Democrat being elected in a given district as a function of union membership (and district-level controls). We find our expectation to be borne out: an increase in union membership is significantly associated with an increase in the election probability of a Democratic candidate. Consistent with previous research (Rhodes and Schaffner 2017), the selection of Democratic legislators


Table V
Labor contributions and selection of Democratic legislators

                                          (1)        (2)        (3)        (4)

A: Contributions channel
                                        DV: Contrib.          DV: roll call
Union membership                         0.056      0.046
                                        (0.012)    (0.014)
Contributions × low income prefs                               0.946      0.865
                                                              (0.036)    (0.034)
Contributions × high income prefs                             −0.735     −0.714
                                                              (0.029)    (0.031)

B: Selection channel
                                        DV: Democrat          DV: roll call
Union membership                         0.161      0.106
                                        (0.024)    (0.023)
Democrat × low income prefs                                    0.576      0.542
                                                              (0.012)    (0.015)
Democrat × high income prefs                                  −0.411     −0.423
                                                              (0.013)    (0.015)

District controls                                     X                     X

Note: Panel (A), column (1) shows a district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15,780. Panel (B), columns (1) and (2) show district-level LPMs with state fixed effects of presence of a Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as a function of the interaction between (log) labor contributions and Democratic representative. N=15,776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low-income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress, relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an


important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked: who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the Task Force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are on average more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, we consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each Congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
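The threshold construction described above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names, the linear-interpolation percentile rule, and the treatment of incomes exactly at a threshold are all assumptions, and the input rows are made up.

```python
from collections import defaultdict


def percentile(sorted_values, q):
    """Percentile q (0 to 100) of a pre-sorted list, by linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def district_thresholds(acs_rows):
    """Map each district to its (33rd, 67th) percentile of household income.

    acs_rows is an iterable of (district_id, household_income) pairs.
    """
    incomes = defaultdict(list)
    for district, income in acs_rows:
        incomes[district].append(income)
    return {
        district: (percentile(sorted(values), 33), percentile(sorted(values), 67))
        for district, values in incomes.items()
    }


def income_group(income, thresholds):
    """Classify a respondent relative to their own district's thresholds."""
    low_cut, high_cut = thresholds
    if income <= low_cut:   # boundary handling is an assumption
        return "low"
    if income >= high_cut:
        return "high"
    return "middle"


# Hypothetical district with ten households earning 10,000 to 100,000.
rows = [("D1", 10000 * k) for k in range(1, 11)]
cuts = district_thresholds(rows)
groups = [income_group(y, cuts["D1"]) for y in (20000, 50000, 90000)]
```

Because the thresholds are district-specific, the same household income can fall into different groups in different districts, which is why Table A2 reports the minimum and maximum across districts alongside the mean.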

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.
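A minimal sketch of this standardization step is shown below. The function name and the example values are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the text does not say which is used.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to mean zero and unit (population) standard deviation."""
    mean = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mean) / sd for v in values]


# Hypothetical log union membership values for five districts.
log_union_membership = [8.2, 9.1, 9.7, 10.4, 11.3]
z = standardize(log_union_membership)
```

After this transformation a regression coefficient is read as the change associated with a one standard deviation increase in the input, which is the scale used when the text reports effects of "a standard deviation increase in union membership".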

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1
Matched CCES–House roll calls included in our analysis

                             House vote  Bill
Match  Bill      Date        (Yea-Nay)   ideology†  Name
(1)    HR 810    07/19/2006  235-193     L          Stem Cell Research Enhancement Act (Presidential Veto override)
(1)    HR 3      01/11/2007  253-174     L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176     L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201     L          Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116     L          Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197     C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185     C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215     C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183     C          Protect America Act
(6)    HR 6304   06/20/2008  293-129     C          FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204     L          Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152     L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135     L          Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152     L          Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132     C          United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171     L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151     C          To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167     C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202     L          Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report)
(14)   HR 2831   07/31/2007  225-199     L          Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171     L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177     L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175     L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183     L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212     L          American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212     L          Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215     L          Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192     L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175     L          Don't Ask Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161     C          Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193     C          House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382      C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257     L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189     C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185     C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147     C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of the bill by Democratic or Republican representatives, respectively.


Table A2
Distribution of district income-group reference points: average threshold over all districts, smallest and largest value

                33rd percentile             67th percentile
Congress     Mean     Min     Max       Mean     Min      Max
109         38123   16800   73675      77964   39612   146870
110         40127   18000   77000      83047   43600   155113
111         39021   17500   78262      82440   46000   160050
112         37381   16500   81000      79868   38500   158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3
Descriptive statistics of analysis sample

                                      Mean      SD     Min      Max       N
Roll-call vote: yea                  0.568   0.495   0.000    1.000   15780
Constituent preferences
  Low income                         0.593   0.220   0.047    0.979   15934
  High income                        0.555   0.198   0.037    0.967   15934
  Low-High Gap                       0.172   0.121   0.000    0.588   15934
Union membership [log]               9.705   1.046   6.094   13.619   15934
Population                           7.022   0.723   4.697    9.980   15934
Share African American               0.124   0.146   0.004    0.680   15934
Share Hispanic                       0.156   0.174   0.005    0.812   15934
Share BA or higher                   0.275   0.097   0.073    0.645   15934
Median income [$10,000]              5.177   1.356   2.282   10.439   15934
Share female                         0.508   0.010   0.462    0.543   15934
Manufacturing share                  0.110   0.047   0.025    0.281   15934
Urbanization                         0.790   0.199   0.213    1.000   15934
Certification elections [log]        3.347   0.861   0.000    5.100   15934
Congregations [per 1000 persons]     0.765   1.147   0.062    6.453   15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.²⁵

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from m individuals in the CCES and n individuals in the Census (with n ≫ m). Both sets of individuals share K common characteristics Z_k, such as age, race, or education. The first task at hand is then to match P roll call preferences Y_p that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about Y_p = f(Z_1, ..., Z_K) for p = 1, ..., P, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher-order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use f(Z_1, ..., Z_K) in combination with observed Z in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
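The train, predict, and aggregate steps described above can be sketched in a few lines. To keep the example dependency-free, a simple cell-mean predictor is swapped in for the random forest; this is explicitly not the Stekhoven and Bühlmann algorithm, and all function names and data below are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def fit_cell_means(survey_rows):
    """Learn f(Z) from the survey: mean preference within each covariate cell.

    survey_rows is a list of (covariates_tuple, preference) pairs. A cell-mean
    rule stands in for the random forest here (an assumption for illustration).
    """
    cells = defaultdict(list)
    for covariates, preference in survey_rows:
        cells[covariates].append(preference)
    overall = fmean(preference for _, preference in survey_rows)
    cell_means = {covariates: fmean(v) for covariates, v in cells.items()}
    # Covariate cells never seen in the survey fall back to the overall mean.
    return lambda covariates: cell_means.get(covariates, overall)


def district_preferences(census_rows, predict):
    """Average predicted preferences by (district, income group).

    census_rows is an iterable of (district, income_group, covariates_tuple).
    """
    filled = defaultdict(list)
    for district, group, covariates in census_rows:
        filled[(district, group)].append(predict(covariates))
    return {key: fmean(values) for key, values in filled.items()}


# Survey sample: preferences observed (1 = supports the bill).
survey = [
    (("young", "BA"), 1), (("young", "BA"), 1),
    (("old", "HS"), 0), (("old", "HS"), 1),
]
# Census sample: covariates and district known, preferences missing.
census = [
    ("D1", "low", ("old", "HS")),
    ("D1", "low", ("young", "BA")),
    ("D1", "high", ("young", "BA")),
    ("D2", "low", ("mid", "BA")),
]

predict = fit_cell_means(survey)
prefs = district_preferences(census, predict)
```

The final averaging step is valid only because the Census rows are representative within each district; the predictor itself can be trained on a sample that is not.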

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

[25] See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

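[Figure B1: schematic of the stacked data matrix. Rows $1, \dots, m$ are CCES units with observed covariates $Z_{i1}, \dots, Z_{iK}$ and observed preferences $Y_{i1}, \dots, Y_{iP}$. Rows $m+1, \dots, m+n$ are Census units with observed covariates and missing (NA) preferences. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict preferences for the Census rows, $Y^*_p = f(Z)$.]

Figure B1: Illustration of Small Area Estimation of District Preferences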

We use a sample of $m$ individuals from the CCES that is not necessarily representative on the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample consists of the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which comprises 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
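The sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the function name `draw_synthetic_district`, the toy shares, and the number of draws are all hypothetical, and the sketch assumes sampling with replacement, with each PUMA drawn with probability equal to its share of the district population.

```python
import random


def draw_synthetic_district(puma_records, district_shares, n_draws, seed=0):
    """Draw a synthetic district sample from PUMA-level donor records.

    puma_records: dict mapping PUMA id -> list of person records (hypothetical layout)
    district_shares: dict mapping PUMA id -> share of the district's population
                     living in that PUMA (taken from a crosswalk; shares sum to 1)
    """
    rng = random.Random(seed)
    pumas = list(district_shares)
    weights = [district_shares[p] for p in pumas]
    sample = []
    for _ in range(n_draws):
        # Pick a PUMA with probability equal to its share of the district,
        # then pick one donor record uniformly within that PUMA.
        puma = rng.choices(pumas, weights=weights, k=1)[0]
        sample.append(rng.choice(puma_records[puma]))
    return sample


# Toy donor pool: two PUMAs overlapping one district (made-up values).
records = {
    "puma_A": [{"puma": "puma_A", "age": 34}, {"puma": "puma_A", "age": 61}],
    "puma_B": [{"puma": "puma_B", "age": 45}],
}
shares = {"puma_A": 0.75, "puma_B": 0.25}
synthetic = draw_synthetic_district(records, shares, n_draws=4000)
share_A = sum(r["puma"] == "puma_A" for r in synthetic) / len(synthetic)
```

In the toy data, roughly three quarters of the synthetic records come from `puma_A`, mirroring its assumed share of the district population.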

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.[26] We transform this discretized measure of income into a continuous one using a nonparametric midpoint

[26] The exact question wording is “Thinking back over the last year, what was your family's annual income?” The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
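A short sketch of how binned income can be scored along these lines. The text does not spell out the exact tail formula, so the one used here is an assumption: a common variant that fits the Pareto shape parameter to the top two bins and assigns the open-ended bin the implied Pareto mean. The function name `bin_scores` and the toy bins and counts are hypothetical.

```python
import math


def bin_scores(lower_bounds, counts):
    """Assign a score to each income bin.

    lower_bounds: ascending lower bounds of the bins; the last bin is open-ended.
    counts: number of respondents in each bin.
    Closed bins get their midpoint; the open-ended top bin gets the mean of a
    Pareto distribution fitted to the top two bins (an assumed variant).
    """
    # Midpoints for all closed bins (upper bound = next bin's lower bound).
    scores = [(lo + hi) / 2 for lo, hi in zip(lower_bounds[:-1], lower_bounds[1:])]
    # Pareto shape from the number of respondents at or above the last two bounds.
    n_top, n_next = counts[-1], counts[-2]
    alpha = (math.log(n_top + n_next) - math.log(n_top)) / (
        math.log(lower_bounds[-1]) - math.log(lower_bounds[-2])
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    scores.append(lower_bounds[-1] * alpha / (alpha - 1))
    return scores


# Toy example with four bins; the last one ($150,000 and above) is open-ended.
bounds = [20000, 30000, 100000, 150000]
respondents = [50, 120, 30, 10]
scores = bin_scores(bounds, respondents)
```

The first score reproduces the example in the text ($20,000 to $29,999 is scored as $25,000), while the top score lies above the $150,000 bound by an amount that depends on how quickly the counts thin out in the upper tail.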

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:[27]

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

 1. Start with initial guesses of missing values in $D$
 2. $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3. while not $c$ do
 4.     $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5.     for $v$ in $w$ do
 6.         Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7.         Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8.         $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9.     Update stopping criterion $c$
10. Return completed $D^{imp}$
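The control flow of Algorithm 1 can be sketched in a few lines of Python. To keep the example dependency-free, the learner is a pluggable function and the default is a 1-nearest-neighbour stand-in rather than a random forest. The stopping rule (total squared change below a tolerance), the mean-based initial guesses, and all function names are assumptions made for illustration only.

```python
def nearest_neighbor_fit_predict(x_obs, y_obs, x_mis):
    """Stand-in learner: predict with the closest observed row (1-NN)."""
    preds = []
    for row in x_mis:
        dists = [sum((a - b) ** 2 for a, b in zip(row, other)) for other in x_obs]
        preds.append(y_obs[dists.index(min(dists))])
    return preds


def chained_imputation(data, fit_predict=nearest_neighbor_fit_predict,
                       max_iter=10, tol=1e-8):
    """Impute None entries column by column until filled-in values stop changing."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[data[i][v] is None for v in range(n_cols)] for i in range(n_rows)]
    imputed = [row[:] for row in data]
    # Step 1: initial guesses are the column means of the observed values.
    for v in range(n_cols):
        observed = [data[i][v] for i in range(n_rows) if not missing[i][v]]
        mean_v = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][v]:
                imputed[i][v] = mean_v
    # Step 2: visit columns in order of increasing missingness.
    order = sorted(range(n_cols),
                   key=lambda v: sum(missing[i][v] for i in range(n_rows)))
    for _ in range(max_iter):
        previous = [row[:] for row in imputed]
        for v in order:
            mis_rows = [i for i in range(n_rows) if missing[i][v]]
            if not mis_rows:
                continue
            obs_rows = [i for i in range(n_rows) if not missing[i][v]]
            others = [u for u in range(n_cols) if u != v]
            x_obs = [[imputed[i][u] for u in others] for i in obs_rows]
            y_obs = [imputed[i][v] for i in obs_rows]
            x_mis = [[imputed[i][u] for u in others] for i in mis_rows]
            # Steps 6-8: fit on observed rows, predict and fill in missing rows.
            for i, pred in zip(mis_rows, fit_predict(x_obs, y_obs, x_mis)):
                imputed[i][v] = pred
        # Step 9: stopping criterion based on the change in filled-in values.
        change = sum((imputed[i][v] - previous[i][v]) ** 2
                     for i in range(n_rows) for v in range(n_cols) if missing[i][v])
        if change < tol:
            break
    return imputed


# Rows 0-2 mimic CCES units (covariate and preference observed);
# rows 3-4 mimic Census units (preference missing).
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [1.1, None], [2.9, None]]
completed = chained_imputation(data)
```

Swapping `fit_predict` for a function that trains a random forest on `(x_obs, y_obs)` and predicts for `x_mis` would bring the sketch closer to the procedure in the text.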

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

[27] Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.[28]
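For reference, the normalized root mean squared error can be computed as below. This follows the definition used by Stekhoven and Bühlmann (2011), the square root of the mean squared error divided by the variance of the true values. Using the population variance and comparing against known true values (rather than out-of-bag predictions) are simplifying assumptions of this sketch, and the toy numbers are made up.

```python
import math


def nrmse(true_values, imputed_values):
    """Normalized RMSE: sqrt(mean squared error / variance of the true values)."""
    n = len(true_values)
    mean_true = sum(true_values) / n
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / n
    var_true = sum((t - mean_true) ** 2 for t in true_values) / n
    return math.sqrt(mse / var_true)


truth = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
error = nrmse(truth, predicted)
baseline = nrmse(truth, [0.5] * 4)  # always predicting the mean
```

A value of 0 indicates perfect predictions, and always predicting the mean yields a value of 1, so errors around 0.11 are small relative to the spread of the variable.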

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).[29] No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.[30] Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age comprises 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income comprises 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call we estimate the following hierarchical

[28] See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

[29] This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

[30] We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case this produces rather similar MRP estimates.


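model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta^{0} + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$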

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta^{0}$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

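$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$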

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
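The poststratification step amounts to a population-weighted average of cell-level predictions. A minimal sketch, assuming the linear predictor for each cell (the sum of the intercept and the relevant effects from equation B1) has already been computed; the cell labels, numeric values, and function names are hypothetical.

```python
import math


def inv_logit(x):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def poststratify(cell_linear_predictors, cell_counts):
    """Population-weighted average of cell-level predicted support.

    cell_linear_predictors: dict cell -> linear predictor for that cell
    cell_counts: dict cell -> number of people in that cell in the district
    """
    total = sum(cell_counts.values())
    weighted = sum(
        inv_logit(cell_linear_predictors[cell]) * count
        for cell, count in cell_counts.items()
    )
    return weighted / total


# Two made-up cells within one district and income group.
linear_predictors = {"cell_1": 0.0, "cell_2": math.log(3)}  # support 0.50 and 0.75
counts = {"cell_1": 300, "cell_2": 100}
district_support = poststratify(linear_predictors, counts)
```

Because the weights are Census frequencies rather than survey frequencies, cells that are over-represented in the survey do not dominate the district estimate.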

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and -0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted with MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences        0.182     0.158     0.181     0.162     0.115     0.115
                             (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences      -0.136    -0.119    -0.139    -0.122    -0.091    -0.091
                             (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences        0.080     0.061     0.063     0.072     0.043     0.045
                             (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences      -0.027    -0.013    -0.010    -0.027    -0.018    -0.024
                             (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects          X         X         X         X         X         X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
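The difference between relative and absolute income classification can be illustrated with a short sketch. The district labels, income values, and the function name `income_groups` are made up for illustration; percentiles are computed with the standard library's default interpolation, which need not match the authors' procedure.

```python
from statistics import quantiles


def income_groups(incomes_by_district, cuts=(33, 66)):
    """Classify each person as low/middle/high relative to their own district."""
    groups = {}
    for district, incomes in incomes_by_district.items():
        pct = quantiles(incomes, n=100)  # pct[k - 1] is the k-th percentile
        low_cut, high_cut = pct[cuts[0] - 1], pct[cuts[1] - 1]
        groups[district] = [
            "low" if x <= low_cut else "high" if x > high_cut else "middle"
            for x in incomes
        ]
    return groups


# Two toy districts: a richer one ("A") and a poorer one ("B").
incomes = {
    "A": list(range(10000, 110000, 1000)),  # $10,000 ... $109,000
    "B": list(range(5000, 55000, 500)),     # $5,000 ... $54,500
}
groups = income_groups(incomes)
group_in_A = groups["A"][incomes["A"].index(40000)]
group_in_B = groups["B"][incomes["B"].index(40000)]
```

In the toy data the same $40,000 income falls into the low group in district A but into the high group in district B. Passing `cuts=(20, 80)` gives the narrower groups used in panel (C).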


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      -0.062    -0.036    -0.052    -0.050    -0.049    -0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      -0.054    -0.031    -0.046    -0.044    -0.044    -0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects          X         X         X         X         X         X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there is also a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily”.

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency

[Figure D1: map of House districts shaded by the number of certification elections. Legend bins: 62-164, 39-62, 26-39, 13-26, 0-13.]

Figure D1: Total number of union certification elections in House districts (109th-112th Congress)

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
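A minimal sketch of the interpolation step, assuming each weight is the share of a county's population that lives in a given district, so that county counts are allocated to districts in proportion to population. The county and district labels, the counts, and the function name `interpolate_to_districts` are hypothetical, and the exact weighting used in the paper may differ.

```python
def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts using population weights.

    county_counts: dict county -> number of congregations
    weights: dict (county, district) -> share of the county's population living
             in that district (the shares for each county sum to 1)
    """
    district_counts = {}
    for (county, district), share in weights.items():
        allocated = share * county_counts[county]
        district_counts[district] = district_counts.get(district, 0.0) + allocated
    return district_counts


# County c1 is split between districts d1 and d2; county c2 lies entirely in d2.
congregations = {"c1": 100, "c2": 40}
pop_weights = {("c1", "d1"): 0.6, ("c1", "d2"): 0.4, ("c2", "d2"): 1.0}
by_district = interpolate_to_districts(congregations, pop_weights)
```

When every county lies entirely within one district, all shares equal 1 and the procedure reduces to summing county counts, matching the at-large case noted in the text.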


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                           Low income        High income
                                           preferences       preferences          N

(1) Injective preference roll call map     0.063 (0.013)    -0.041 (0.013)     11,104
(2) Extreme preferences excl.              0.074 (0.016)    -0.048 (0.015)     13,308
(3) New York excluded                      0.070 (0.015)    -0.048 (0.014)     14,730
(4) Local Union Concentration              0.065 (0.014)    -0.047 (0.014)     15,780
(5) Trimmed LPM estimator                  0.074 (0.015)    -0.055 (0.014)     15,426
(6) Errors-in-variables                    0.062 (0.004)    -0.054 (0.004)     15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries for specification (6) are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

[31] We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error, and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \quad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
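The two selection steps followed by OLS on the union of the selected controls can be sketched with NumPy. This is a simplified illustration and not the implementation used in the paper: it uses a plain LASSO with a hand-picked penalty instead of the root-LASSO with a data-driven penalty, it has a single treatment coefficient without the preference interactions of equation (F1), and the simulated data, the penalty value, and the function names are all assumptions.

```python
import numpy as np


def lasso_support(W, target, penalty, n_iter=200):
    """Indices of controls with non-zero LASSO coefficients (coordinate descent).

    Minimizes (1 / (2n)) * ||target - W b||^2 + penalty * ||b||_1 on centered data.
    """
    n, p = W.shape
    Wc = W - W.mean(axis=0)
    resid = target - target.mean()          # residual when all coefficients are zero
    coefs = np.zeros(p)
    col_scale = (Wc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for k in range(p):
            resid = resid + Wc[:, k] * coefs[k]      # remove k's current contribution
            rho = Wc[:, k] @ resid / n
            coefs[k] = np.sign(rho) * max(abs(rho) - penalty, 0.0) / col_scale[k]
            resid = resid - Wc[:, k] * coefs[k]      # add the updated contribution
    return set(np.flatnonzero(coefs).tolist())


def post_double_selection(y, d, W, penalty):
    """Treatment coefficient from OLS of y on d and the union of selected controls."""
    selected = sorted(lasso_support(W, y, penalty) | lasso_support(W, d, penalty))
    X = np.column_stack([np.ones(len(y)), d, W[:, selected]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], selected


# Simulated example: control 0 drives the treatment strongly but the outcome only
# moderately; control 1 affects the outcome only. The true treatment effect is 0.5.
rng = np.random.default_rng(0)
n, p = 500, 20
W = rng.normal(size=(n, p))
d = W[:, 0] + rng.normal(size=n)
y = 0.5 * d + 0.2 * W[:, 0] + W[:, 1] + 0.5 * rng.normal(size=n)
eta_hat, controls = post_double_selection(y, d, W, penalty=0.1)
```

The treatment-equation LASSO picks up control 0 because of its strong link to `d`, which is exactly the kind of confounder the second selection step is meant to retain.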

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).[32] Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}, \theta^{h}_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via

[32] For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

$$c^{*} = \operatorname*{argmin}_{c \in \mathbb{R}^{D}} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
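The closed-form solution is straightforward to compute. The sketch below builds the Gaussian kernel matrix and solves for the weights with NumPy. It fixes $\lambda$ by hand rather than choosing it by leave-one-out loss and skips any rescaling of covariates, so it illustrates the formula rather than a full KRLS implementation; the simulated data and function names are made up.

```python
import numpy as np


def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_norms = (Z ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    # Guard against tiny negative values caused by floating point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / sigma2)


def krls_fit(Z, y, lam):
    """Closed-form weights c* = (K + lam * I)^{-1} y with sigma2 = number of columns."""
    sigma2 = Z.shape[1]
    K = gaussian_kernel(Z, sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return K, c


# Simulated data: the outcome is a nonlinear function of the first covariate.
rng = np.random.default_rng(1)
Z = rng.normal(size=(50, 3))
y = np.sin(Z[:, 0]) + 0.1 * rng.normal(size=50)
lam = 0.5  # hand-picked for illustration
K, c = krls_fit(Z, y, lam)
fitted = K @ c
```

Larger values of `lam` shrink the weights and produce smoother fitted surfaces, while values close to zero let the fitted values track the observed outcomes closely.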

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

party

partzUdj=minus2σ 2

sumi

ci exp(minusZd minus zj

2

σ 2

) (ZUddminus zUdj

) (G4)

ese yields as many partial derivatives as there are cases We apply a thin plate smoother(with parameters chosen via cross-validation) to plot these against district-level unionmembership in Figure IV
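The estimation steps above can be illustrated with a small, dependency-free Python sketch. It is not the authors' code: the toy data, the fixed value of the penalty `lam`, and all function names are assumptions made for illustration, and the leave-one-out search for $\lambda$ is omitted. The pointwise derivative is obtained by differentiating the fitted surface $\hat{y}(z) = \sum_i c_i k(z_i, z)$ with respect to one coordinate of the evaluation point; the order of the difference inside the sum (evaluation point minus kernel centre) is the convention this sketch adopts.

```python
import math

def gaussian_kernel(a, b, sigma2):
    """k(a, b) = exp(-||a - b||^2 / sigma2)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / sigma2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krls_fit(Z, y, lam, sigma2=None):
    """Return (c, sigma2) with c = (K + lam * I)^{-1} y."""
    if sigma2 is None:
        sigma2 = float(len(Z[0]))  # sigma^2 = D, the number of columns in z
    n = len(Z)
    G = [[gaussian_kernel(Z[i], Z[j], sigma2) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(G, y), sigma2

def krls_predict(Z, c, sigma2, z):
    """Fitted value at z: weighted sum of kernels centred on the data."""
    return sum(ci * gaussian_kernel(zi, z, sigma2) for ci, zi in zip(c, Z))

def krls_derivative(Z, c, sigma2, z, d):
    """Partial derivative of the fitted surface at z w.r.t. coordinate d."""
    return (-2.0 / sigma2) * sum(
        ci * gaussian_kernel(zi, z, sigma2) * (z[d] - zi[d])
        for ci, zi in zip(c, Z))

# Toy data (hypothetical): a pure interaction surface y = z0 * z1 on a grid.
Z = [[i / 2 - 1.0, j / 2 - 1.0] for i in range(5) for j in range(5)]
y = [a * b for a, b in Z]
c, sigma2 = krls_fit(Z, y, lam=0.1)   # lam fixed by hand, not tuned
point = [0.5, 0.25]
slope = krls_derivative(Z, c, sigma2, point, d=0)
```

Evaluating `krls_derivative` at every observation and plotting the results against a moderator is the kind of check described in the text: no interaction term is specified, yet the derivative is free to vary with the other covariate.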

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

Table V
Labor contributions and selection of Democratic legislators

                                             (1)       (2)       (3)       (4)

A: Contributions channel
                                              DV: Contrib.       DV: roll call
Union membership                           0.056     0.046
                                         (0.012)   (0.014)
Contributions × low income prefs                               0.946     0.865
                                                             (0.036)   (0.034)
Contributions × high income prefs                             -0.735    -0.714
                                                             (0.029)   (0.031)

B: Selection channel
                                              DV: Democrat       DV: roll call
Union membership                           0.161     0.106
                                         (0.024)   (0.023)
Democrat × low income prefs                                    0.576     0.542
                                                             (0.012)   (0.015)
Democrat × high income prefs                                  -0.411    -0.423
                                                             (0.013)   (0.015)

District controls                                        X                   X

Note: Panel (A), column (1) shows district-level regression of (log) labor contributions on (log) union membership with state fixed effects. Column (2) adds district-level controls (population size, degree of urbanization, shares of female, Black, Hispanic, BA degrees, employed in manufacturing, median household income, organizational capacity). N=428 (at-large districts are excluded). Column (3) shows LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and district preferences. Column (4) adds district-level controls interacted with preferences. N=15780. Panel (B), columns (1) and (2) show district-level LPM with state fixed effects of presence of Democratic representative on (log) union membership. N=428. Columns (3) and (4) show LPMs with district fixed effects for legislators' vote as function of the interaction between (log) labor contributions and Democratic representative. N=15776. All specifications employ cluster-robust standard errors.

is then associated with higher responsiveness to the preferences of low-income constituents compared to their Republican counterparts, as shown in the last two columns of Panel (B).

Local unions are not necessarily the primary actor lobbying Congress relative to state associations or national/international affiliates (Dark 1999). The evidence that district-level union membership nonetheless matters for legislative responsiveness is consistent with the argument that local union strength underpins a credible threat of mobilization that shapes political equality through political selection and post-electoral incentives. The importance of electoral selection visible in our results is in line with a larger body of research on elections and representation (Bartels 2016; Lee et al. 2004; Miller and Stokes 1963). Mobilization efforts by unions remain strongly linked to available human resources on the ground (Rosenfeld 2014; Zullo 2008). As has already been shown by Berelson et al. (1954), local unions provide an important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies have documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are, on average, more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, our interpretation of the results is that it is unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
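The classification rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the income values are invented, the function names are hypothetical, the handling of incomes that fall exactly on a threshold is a choice made here, and `statistics.quantiles` uses its default interpolation method rather than whatever percentile definition was applied to the ACS files.

```python
from statistics import quantiles

def tercile_thresholds(incomes):
    """Return the two cut points that split `incomes` into terciles."""
    low_cut, high_cut = quantiles(incomes, n=3)
    return low_cut, high_cut

def income_group(income, low_cut, high_cut):
    """Assign a respondent to the low, middle, or high income group.

    Assumption: incomes exactly at a threshold go to the lower group.
    """
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical household incomes for one district in one year.
district_incomes = [12000, 18000, 25000, 31000, 40000, 52000,
                    60000, 75000, 98000, 140000, 210000]
low_cut, high_cut = tercile_thresholds(district_incomes)
groups = [income_group(x, low_cut, high_cut) for x in district_incomes]
```

Because the thresholds are computed per district and year, the same nominal income can fall into different groups in different districts, which is what the Min and Max columns of Table A2 reflect.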

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1
Matched CCES–House roll calls included in our analysis

Match  Bill      Date        Vote (Yea-Nay)  Ideology†  Name

(1)    HR 810    07/19/2006  235-193         L          Stem Cell Research Enhancement Act (Presidential Veto override)
(1)    HR 3      01/11/2007  253-174         L          Stem Cell Research Enhancement Act of 2007 (House)
(1)    S 5       06/07/2007  247-176         L          Stem Cell Research Enhancement Act of 2007
(2)    HR 2956   07/12/2007  223-201         L          Responsible Redeployment from Iraq Act
(3)    HR 2      01/10/2007  315-116         L          Fair Minimum Wage Act
(4)    HR 4297   12/08/2005  234-197         C          Tax Relief Extension Reconciliation Act (Passage)
(4)    HR 4297   05/10/2006  244-185         C          Tax Relief Extension Reconciliation Act (Agreeing to Conference Report)
(5)    HR 3045   07/28/2005  217-215         C          Dominican Republic-Central America-United States Free Trade Agreement Implementation Act
(6)    S 1927    08/04/2007  227-183         C          Protect America Act
(6)    HR 6304   06/20/2008  293-129         C          FISA Amendments Act of 2008
(7)    HR 3162   08/01/2007  225-204         L          Children's Health and Medicare Protection Act
(7)    HR 976    10/18/2007  273-156         L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 3963   01/23/2008  260-152         L          Children's Health Insurance Program Reauthorization Act (Presidential Veto Override)
(7)    HR 2      02/04/2009  290-135         L          Children's Health Insurance Program Reauthorization Act
(8)    HR 3221   07/23/2008  272-152         L          Foreclosure Prevention Act of 2008
(9)    HR 3688   11/08/2007  285-132         C          United States-Peru Trade Promotion Agreement
(10)   HR 1424   10/03/2008  263-171         L          Emergency Economic Stabilization Act of 2008
(11)   HR 3080   10/12/2011  278-151         C          To implement the United States-Korea Trade Agreement
(12)   HR 3078   10/12/2011  262-167         C          To implement the United States-Colombia Trade Promotion Agreement
(13)   HR 2346   06/16/2009  226-202         L          Supplemental Appropriations, Fiscal Year 2009 (Agreeing to conference report)
(14)   HR 2831   07/31/2007  225-199         L          Lilly Ledbetter Fair Pay Act
(14)   HR 11     01/09/2009  247-171         L          Lilly Ledbetter Fair Pay Act of 2009 (House)
(14)   S 181     01/27/2009  250-177         L          Lilly Ledbetter Fair Pay Act of 2009
(15)   HR 1913   04/29/2009  249-175         L          Local Law Enforcement Hate Crimes Prevention Act
(16)   HR 1      02/13/2009  246-183         L          American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report)
(17)   HR 2454   06/26/2009  219-212         L          American Clean Energy and Security Act
(18)   HR 3590   03/21/2010  220-212         L          Patient Protection and Affordable Care Act
(19)   HR 3962   11/07/2009  221-215         L          Affordable Health Care for America Act
(20)   HR 4173   06/30/2010  237-192         L          Wall Street Reform and Consumer Protection Act of 2009
(21)   HR 2965   12/15/2010  250-175         L          Don't Ask, Don't Tell Repeal Act of 2010
(22)   S 365     08/01/2011  269-161         C          Budget Control Act of 2011
(23)   H CR 34   04/15/2011  235-193         C          House Budget Plan of 2011
(24)   H CR 112  03/28/2012  38-382          C          Simpson-Bowles/Cooper Amendment to House Budget Plan
(25)   HR 8      08/01/2012  170-257         L          American Taxpayer Relief Act of 2012 (Levin Amendment)
(26)   HR 2      01/19/2011  245-189         C          Repealing the Job-Killing Health Care Law Act
(26)   HR 6079   07/11/2012  244-185         C          Repeal the Patient Protection and Affordable Care Act and [...]
(27)   HR 1938   07/26/2011  279-147         C          North American-Made Energy Security Act

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2
Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

               33rd percentile              67th percentile
Congress     Mean     Min     Max         Mean     Min      Max

109         38123   16800   73675        77964   39612   146870
110         40127   18000   77000        83047   43600   155113
111         39021   17500   78262        82440   46000   160050
112         37381   16500   81000        79868   38500   158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3
Descriptive statistics of analysis sample

                                       Mean       SD      Min      Max        N

Roll-call vote: yea                   0.568    0.495    0.000    1.000    15780
Constituent preferences
  Low income                          0.593    0.220    0.047    0.979    15934
  High income                         0.555    0.198    0.037    0.967    15934
  Low-High Gap                        0.172    0.121    0.000    0.588    15934
Union membership [log]                9.705    1.046    6.094   13.619    15934
Population                            7.022    0.723    4.697    9.980    15934
Share African American                0.124    0.146    0.004    0.680    15934
Share Hispanic                        0.156    0.174    0.005    0.812    15934
Share BA or higher                    0.275    0.097    0.073    0.645    15934
Median income [$10,000]               5.177    1.356    2.282   10.439    15934
Share female                          0.508    0.010    0.462    0.543    15934
Manufacturing share                   0.110    0.047    0.025    0.281    15934
Urbanization                          0.790    0.199    0.213    1.000    15934
Certification elections [log]         3.347    0.861    0.000    5.100    15934
Congregations [per 1000 persons]      0.765    1.147    0.062    6.453    15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section we describe how we estimate district-level preferences using threedierent strategies (i) small area estimation using a matching approach based on randomforests (which we use in the main text of our paper) (ii) estimation using multilevel regressionand post-stratication (MRP) and (iii) unadjusted cell means Each approach invokesdierent statistical and substantive assumptions In the spirit of consilience our aim here isto show that our substantive results do not depend on any particular choice

B1 Small Area Estimation via Chained Random Forests

e core idea of our small area estimation strategy is based on the fact that we have accessto two samples one that is likely not representative of the population of all Congressionaldistricts (the CCES) while the second one is representative of district populations by virtue ofits sampling design (the Census or American Community Survey) By matching or imputingpreferences from the former to the laer based on a common vector of observable individualcharacteristics we can use the district-representative sample to estimate the preferences ofindividuals in a given district25

Combining CCES and Census data using Random Forests. Figure B.1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $\hat{f}(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B.1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

[Figure B.1: schematic of a stacked data matrix. Rows 1, ..., m are CCES units with observed covariates Z_i1, ..., Z_iK and observed preferences Y_i1, ..., Y_iP. Rows m+1, ..., m+n are Census units with observed covariates and missing (NA) preferences. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census rows.]

Figure B.1: Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative on the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).

26 The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the
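As a rough illustration of the midpoint Pareto idea, the sketch below scores closed bins by their midpoints and the open-ended top bin by the mean of a Pareto tail. It assumes one common textbook variant in which the Pareto shape is estimated from the counts in the two highest bins; the exact variant used in the paper is not stated, and the bin edges, counts and function names are hypothetical.

```python
import math


def pareto_top_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail.

    Assumption: the shape parameter alpha is estimated from the two
    highest bins, using the share of cases above each lower bound.
    """
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    return lower_top * alpha / (alpha - 1)


def bin_scores(bins, counts):
    """bins: list of (lower, upper) pairs; the last bin has upper=None."""
    # Closed bins are replaced by their midpoints.
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]
    # The open-ended bin gets the Pareto-implied mean.
    prev_lower, top_lower = bins[-2][0], bins[-1][0]
    scores.append(pareto_top_mean(prev_lower, top_lower, counts[-2], counts[-1]))
    return scores


# Hypothetical income bins (in dollars) and respondent counts.
bins = [(0, 10_000), (10_000, 20_000), (20_000, 30_000),
        (30_000, 60_000), (60_000, None)]
counts = [120, 200, 260, 300, 100]
scores = bin_scores(bins, counts)
```

With these made-up inputs the third bin is scored at its midpoint, matching the $25,000 example in the text, while the top-bin score depends entirely on the assumed tail estimator.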

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

Start with initial guesses of missing values in $D$
$w \leftarrow$ vector of column indices, sorted by increasing fraction of NA
while not $c$ do
    $D^{imp}_{old} \leftarrow$ previously imputed $D$
    for $v$ in $w$ do
        Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
        Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
        $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
    Update stopping criterion $c$
Return completed $D^{imp}$
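The control flow of Algorithm 1 can be sketched in a few lines. To keep the example self-contained, the learner is swapped for a one-nearest-neighbour predictor, which is plainly not a random forest; any function with the same signature (for example a random forest from a library) could be passed in instead. The toy data, the mean-based initial guess and the exact-equality stopping rule are assumptions made for this illustration.

```python
def row_without(row, v):
    """All entries of a row except column v (the predictors for column v)."""
    return [value for u, value in enumerate(row) if u != v]


def nearest_neighbour(x_obs, y_obs, x_mis):
    """Stand-in learner (NOT a random forest): copy y from the closest row."""
    preds = []
    for x in x_mis:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xo)) for xo in x_obs]
        preds.append(y_obs[dists.index(min(dists))])
    return preds


def chained_impute(rows, fit_predict=nearest_neighbour, max_iter=10):
    """rows: list of numeric lists, with None marking missing entries."""
    n, k = len(rows), len(rows[0])
    missing = [[value is None for value in row] for row in rows]
    data = [list(row) for row in rows]

    # Initial guess: mean of the observed values in each column.
    for v in range(k):
        observed = [data[i][v] for i in range(n) if not missing[i][v]]
        for i in range(n):
            if missing[i][v]:
                data[i][v] = sum(observed) / len(observed)

    # Visit incomplete columns by increasing number of missing values.
    n_missing = [sum(missing[i][v] for i in range(n)) for v in range(k)]
    order = sorted((v for v in range(k) if n_missing[v] > 0),
                   key=lambda v: n_missing[v])

    for _ in range(max_iter):
        previous = [list(row) for row in data]
        for v in order:
            obs_idx = [i for i in range(n) if not missing[i][v]]
            mis_idx = [i for i in range(n) if missing[i][v]]
            preds = fit_predict([row_without(data[i], v) for i in obs_idx],
                                [data[i][v] for i in obs_idx],
                                [row_without(data[i], v) for i in mis_idx])
            for i, p in zip(mis_idx, preds):
                data[i][v] = p
        # Stopping criterion: no filled-in value changed in this sweep.
        if data == previous:
            break
    return data


# Hypothetical toy data: two columns, one missing entry in each.
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.1, None], [None, 29.0]]
completed = chained_impute(toy)
```

Observed entries are never overwritten; only the cells that were None in the input are updated on each sweep.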

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

B.2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta^{0} + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B.1}$$

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta^{0}$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B.2}$$
$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B.3}$$
$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B.4}$$
$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B.5}$$
$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B.6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B.7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.
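The final poststratification step amounts to a population-weighted average of cell-level predicted probabilities. The sketch below assumes that the fitted linear predictor for each demographic cell in one district and income group is already available; the cell names, linear predictors and counts are hypothetical, and fitting the hierarchical model itself is not shown.

```python
import math


def inv_logit(x):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def poststratify(cell_linear_predictors, cell_counts):
    """Weight cell-level predicted probabilities by Census cell counts.

    Both arguments are dicts keyed by cell, for a single district and
    income group.
    """
    total = sum(cell_counts.values())
    weighted = sum(inv_logit(cell_linear_predictors[cell]) * count
                   for cell, count in cell_counts.items())
    return weighted / total


# Hypothetical example with two demographic cells in one district.
eta = {"cell_a": 0.0, "cell_b": math.log(3)}   # probabilities 0.5 and 0.75
counts = {"cell_a": 100, "cell_b": 300}        # Census frequencies
share = poststratify(eta, counts)
```

In this made-up case the larger cell dominates, so the district estimate lies closer to 0.75 than to 0.5.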

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B.1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B.1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences        0.182     0.158     0.181     0.162     0.115     0.115
                             (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences      −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                             (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences        0.080     0.061     0.063     0.072     0.043     0.045
                             (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences      −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                             (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C.1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
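To make the difference between threshold definitions concrete, the following sketch classifies individuals into income groups using percentile cut-offs computed within each grouping unit (a district, or a state if state identifiers are passed instead). It assumes an unweighted nearest-rank percentile and made-up incomes; the paper's actual computation, including any weighting, is not described at this level of detail.

```python
import math
from collections import defaultdict


def percentile(sorted_values, q):
    """Nearest-rank percentile (q between 0 and 100) of a sorted list."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def classify(people, low_q=33, high_q=66):
    """people: list of (unit, income) pairs; returns one label per person.

    Cut-offs are computed separately within each unit, so the same
    income can fall into different groups in different units.
    """
    by_unit = defaultdict(list)
    for unit, income in people:
        by_unit[unit].append(income)
    cuts = {}
    for unit, incomes in by_unit.items():
        incomes.sort()
        cuts[unit] = (percentile(incomes, low_q), percentile(incomes, high_q))
    labels = []
    for unit, income in people:
        low_cut, high_cut = cuts[unit]
        if income <= low_cut:
            labels.append("low")
        elif income > high_cut:
            labels.append("high")
        else:
            labels.append("middle")
    return labels


# Hypothetical incomes (in $1,000) for a poorer district A and a richer district B.
people = [("A", x) for x in (20, 30, 40, 50, 60, 70)] + \
         [("B", x) for x in (40, 60, 80, 100, 120, 140)]
labels = dict(zip(people, classify(people)))
```

In this toy example an income of 40 is classified as middle income in district A but as low income in district B, mirroring the Massachusetts illustration above. Calling `classify(people, 20, 80)` would correspond to the shifted thresholds of panel (C).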

Table C.1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.

D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there is also a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D.1 shows the resulting variation in certification elections over districts.

[Figure D.1: map of House districts shaded by the number of certification elections; legend bins 0-13, 13-26, 26-39, 39-62, 62-164.]

Figure D.1: Total number of union certification elections in House districts (109th-112th Congress)

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
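A minimal sketch of population-weighted areal interpolation follows. It assumes that each county's congregation count is apportioned to districts in proportion to the share of the county's population living in each county-district intersection; whether this matches the exact weighting used in the paper is an assumption, and all identifiers and numbers are hypothetical.

```python
from collections import defaultdict


def interpolate_counts(county_counts, intersections):
    """Apportion county-level counts to districts by population share.

    county_counts: dict county -> count (e.g., congregations)
    intersections: list of (county, district, population) tuples, one
                   per county-district overlap
    Returns dict district -> (interpolated count, count per 1000 persons).
    """
    county_pop = defaultdict(float)
    for county, _, pop in intersections:
        county_pop[county] += pop

    district_count = defaultdict(float)
    district_pop = defaultdict(float)
    for county, district, pop in intersections:
        weight = pop / county_pop[county]      # share of county in district
        district_count[district] += weight * county_counts[county]
        district_pop[district] += pop

    return {d: (district_count[d], 1000 * district_count[d] / district_pop[d])
            for d in district_count}


# Hypothetical example: county X is split 80/20 between two districts,
# county Y lies entirely inside district 2.
county_counts = {"X": 100, "Y": 50}
intersections = [("X", 1, 80_000), ("X", 2, 20_000), ("Y", 2, 50_000)]
result = interpolate_counts(county_counts, intersections)
```

When every county lies entirely inside one district, all weights equal one and the procedure reduces to a simple sum over counties, as noted in the text for at-large districts.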


E Additional Robustness Tests

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E.1: Additional robustness tests

                                          Low income         High income
                                          preferences        preferences        N

(1) Injective preference-roll call map    0.063 (0.013)     −0.041 (0.013)     11104
(2) Extreme preferences excl.             0.074 (0.016)     −0.048 (0.015)     13308
(3) New York excluded                     0.070 (0.015)     −0.048 (0.014)     14730
(4) Local Union Concentration             0.065 (0.014)     −0.047 (0.014)     15780
(5) Trimmed LPM estimator                 0.074 (0.015)     −0.055 (0.014)     15426
(6) Errors-in-variables                   0.062 (0.004)     −0.054 (0.004)     15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. For specification (6), table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E.1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
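The idea of trimming can be sketched for a single regressor: fit the linear probability model by OLS, drop observations whose fitted values fall outside the unit interval, and refit until no fitted value lies outside it. This is a simplified reading of the sequential trimming idea in Horrace and Oaxaca (2006), not the specification estimated in the paper (which includes fixed effects and many covariates); the data and function names are hypothetical.

```python
def ols(x, y):
    """Intercept and slope of a one-regressor least squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trimmed_lpm(x, y, max_iter=50):
    """Refit the LPM after dropping cases with fitted values outside [0, 1]."""
    keep = list(range(len(x)))
    for _ in range(max_iter):
        intercept, slope = ols([x[i] for i in keep], [y[i] for i in keep])
        inside = [i for i in keep if 0.0 <= intercept + slope * x[i] <= 1.0]
        if len(inside) == len(keep):   # nothing left to trim
            break
        keep = inside
    return intercept, slope, keep


# Hypothetical binary outcome with one high-leverage observation at x = 20.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
intercept, slope, kept = trimmed_lpm(x, y)
```

In this toy example the outlying observation is dropped first, then the two extreme remaining cases, after which all fitted values lie inside the unit interval and the slope is somewhat larger than in the untrimmed fit.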

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
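The expected downward bias can be made concrete in the textbook single-regressor case, which is a simplification and not the model estimated here. Suppose the true preference $\theta$ is observed as $\tilde{\theta} = \theta + u$, with classical measurement error $u$ that has mean zero and variance $\sigma_u^2$ and is independent of $\theta$ and of the regression error. The symbols $\tilde{\theta}$, $u$, $\sigma_\theta^2$ and $\sigma_u^2$ are introduced for this illustration only. The OLS slope from regressing the outcome on $\tilde{\theta}$ then converges to an attenuated version of the true coefficient $\beta$:

```latex
\operatorname{plim}\,\hat{\beta}_{\mathrm{OLS}}
  = \beta \cdot \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_u^2},
\qquad
0 < \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_u^2} \le 1 .
```

When $\sigma_u^2$ is small relative to $\sigma_\theta^2$, the attenuation factor is close to one, so correcting for measurement error should change the estimate only slightly.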

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}U_d\theta^{l}_{jd} + \eta^{h}U_d\theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F.1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F.2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}U_d\theta^{l}_{jd} + \eta^{h}U_d\theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F.3}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F.4}$$

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F.3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F.4) for $U_d$ in (F.3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F.1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text we want to estimate a specication that makes as lile apriori assumptions about functional form relationships between variables (including theirinteractions) us we non-parametrically model yijd = f (z) with z = [θ l

jd θh

jdUdXd] by

approximating it via Kernel Regularized Least Squares (Hainmueller and Hazle 2014)

y = Kc (G1)

Here K is an N times N Gaussian Kernel matrix

K = exp(minusZd minus zj

2

σ 2

)(G2)

with an associated vector of weights c Intuitively one can think of KRLS as a local regressionmethod which predicts the outcome at each covariate point by calculating an optimallyweighted sum of locally ed functions e KRLS algorithm uses Gaussian kernels centeredaround an observation e weights c are chosen to produce the best t to the data Sincea possibly large number of c values provide (approximately) optimal weights it makessense to prefer values of c that produce ldquosmootherrdquo function surfaces is is achieved via

32 For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion,

$$ c^* = \underset{c \in \mathbb{R}^D}{\operatorname{argmin}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3} $$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
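The closed-form estimator and the choice of $\lambda$ can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the code used for the analysis: the function names, the small grid of candidate $\lambda$ values, and the standardization of inputs are choices made here for illustration. The leave-one-out residuals use the standard shortcut for ridge-type smoothers, $c_i / [(K + \lambda I)^{-1}]_{ii}$, so no refitting is needed.

```python
import numpy as np


def gaussian_kernel(Z, sigma2):
    """K[i, j] = exp(-||z_i - z_j||^2 / sigma2) for all pairs of rows of Z."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)


def krls_coefficients(K, y, lam):
    """Closed-form weights c* = (K + lam I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)


def loo_loss(K, y, lam):
    """Sum of squared leave-one-out residuals, c_i / [(K + lam I)^{-1}]_ii."""
    G_inv = np.linalg.inv(K + lam * np.eye(len(y)))
    return float(np.sum(((G_inv @ y) / np.diag(G_inv)) ** 2))


def fit_krls(Z, y, lam_grid=(0.01, 0.1, 1.0, 10.0)):
    """Fit KRLS with sigma^2 = D and lambda picked from a (hypothetical) grid."""
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # standardize covariates
    yc = y - y.mean()                           # center the outcome
    sigma2 = Zs.shape[1]                        # sigma^2 = number of columns D
    K = gaussian_kernel(Zs, sigma2)
    lam = min(lam_grid, key=lambda l: loo_loss(K, yc, l))
    c = krls_coefficients(K, yc, lam)
    return K, c, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(80, 3))
    y = Z[:, 0] * Z[:, 1] + 0.1 * rng.normal(size=80)   # non-additive surface
    K, c, lam = fit_krls(Z, y)
    fitted = K @ c + y.mean()
    print(lam, float(np.mean((y - fitted) ** 2)))
```

In practice one would search $\lambda$ over a finer grid or with a one-dimensional optimizer; the coarse grid here only keeps the example short.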

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

$$ \frac{\partial y}{\partial z^{U_d}_j} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_d - z^{U_d}_j\right) \tag{G4} $$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
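The pointwise derivatives can be computed directly from the fitted weights. The sketch below is illustrative: the function names, the fixed $\lambda = 0.5$ in the demo, and the simulated data are assumptions, not part of the original analysis. One caveat on orientation: the sketch takes the difference as evaluation point minus kernel centre, which is the ordering under which the leading $-2/\sigma^2$ factor reproduces a numerical derivative of the fitted surface; this reading is an assumption of the sketch and can be checked by finite differences.

```python
import numpy as np


def kernel_to_points(Z_train, z_eval, sigma2):
    """exp(-||Z_i - z_j||^2 / sigma2) for training rows i and evaluation rows j."""
    sq_dist = ((Z_train[:, None, :] - z_eval[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)


def predict(Z_train, c, sigma2, z_eval):
    """Fitted surface f(z) = sum_i c_i k(z, Z_i) at each evaluation row."""
    return kernel_to_points(Z_train, z_eval, sigma2).T @ c


def pointwise_derivatives(Z_train, c, sigma2, d):
    """Partial derivative of the fitted surface w.r.t. column d at each training point."""
    K = kernel_to_points(Z_train, Z_train, sigma2)
    # diff[i, j] = z_j^(d) - Z_i^(d): evaluation point minus kernel centre
    diff = Z_train[None, :, d] - Z_train[:, None, d]
    return (-2.0 / sigma2) * (c[:, None] * K * diff).sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(40, 2))
    y = np.sin(Z[:, 0]) + Z[:, 1] ** 2
    sigma2 = float(Z.shape[1])
    K = kernel_to_points(Z, Z, sigma2)
    c = np.linalg.solve(K + 0.5 * np.eye(len(y)), y)    # lambda = 0.5 (assumed)
    derivs = pointwise_derivatives(Z, c, sigma2, d=0)   # one value per case
    print(derivs[:5])
```

Plotting these values against the covariate of interest, with a smoother of one's choice, shows how the marginal effect varies across its range.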

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [wwwvanderbilteducsdiincludesWorking Paper 5 2017pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [wwwnberorgpapersw22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, D.C.: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [httphonakrpapersfilessmallAreaEstimationpdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [wwwnoamlupucomAampCpdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, D.C.: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [wwwcensusgovprograms-surveysacsmethodologydesign-and-methodologyhtml]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

important social basis for electoral mobilization. Furthermore, national associations may also have incentives to target contributions to districts where unions are stronger to demonstrate that members' contributions are used in an effective way. Finally, recent evidence also shows that the presence of local unions is linked to the perceptions of constituent preferences by congressional staffers. Hertel-Fernandez et al. (2018) find that congressional staffers' views are biased toward the preferences of conservative and business interest groups (also see Broockman and Skovron 2018). Strikingly, however, they find that this bias declines as district-level union membership increases. This is consistent with the (old) argument that the visible presence of an organized group in a district makes legislators more alert to its preferences (Arnold 1990; Miller and Stokes 1963).

In sum, we find that the political power of unions rests in part on their ability to mobilize campaign contributions and to help get Democratic candidates elected. Consistent with arguments based on mobilization threats and rational politicians, these results also help to explain the puzzle documented by previous studies that inequalities in turnout or contacting officials alone do not appear to explain most of the observed income gap in political responsiveness (Bartels 2008; Ellis 2013; Erikson 2015).

VII Conclusion

As Dahl (1961) famously asked, who governs in a polity where political rights are equally distributed but where large inequalities in income and wealth (may) bias representation? In the wake of rising income inequality in the United States and other advanced economies, scholars have identified the question of political inequality as one of the central challenges facing democracy in the twenty-first century (see, for example, the report of the task force on Inequality and Democracy of the American Political Science Association (APSA Task Force 2004)). While the scientific debate is ongoing and some results are open to different interpretations (Erikson 2015), a growing number of studies has documented striking patterns of unequal responsiveness by income. When policy preferences diverge across income groups, legislators and public policy are biased toward the affluent at the expense of the middle class and, especially, the poor. Many recent works conclude by asking what factors may improve political representation of the economically disadvantaged.

We contribute to this body of research by analyzing whether labor unions serve as a collective voice institution that limits unequal representation in the House of Representatives. Against the widespread view that unions are either too weak or too narrow to mitigate political inequality in the national arena, we find that the district-level strength of unions is clearly linked to the responsiveness of legislators to different income groups. While legislators are on average more responsive to the preferences of the affluent than to the preferences of the poor, this representation gap is highly variable. It is much less pronounced in districts where union membership is relatively higher. This result is in line with evidence on state-level policy responsiveness (Flavin 2018).


Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. We therefore consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).


Appendices

A Data

In this appendix, we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds, separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
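The threshold construction can be made concrete with a small sketch. It is illustrative only: the function names, the treatment of incomes exactly at a cut point, and the toy incomes are assumptions, and the percentile interpolation follows Python's `statistics.quantiles` default rather than whatever rule was applied to the ACS files.

```python
from statistics import mean, quantiles


def tercile_thresholds(incomes):
    """33rd and 67th percentile of one district's household incomes."""
    cuts = quantiles(incomes, n=100)       # cuts[k - 1] is the k-th percentile
    return cuts[32], cuts[66]


def income_group(income, low_cut, high_cut):
    """Classify a respondent; handling of ties at the cut points is an assumption."""
    if income <= low_cut:
        return "low"
    if income >= high_cut:
        return "high"
    return "middle"


def summarize_thresholds(incomes_by_district):
    """Mean, minimum and maximum of the district-specific thresholds."""
    cuts = [tercile_thresholds(v) for v in incomes_by_district.values()]
    lows, highs = zip(*cuts)
    return {"p33": (mean(lows), min(lows), max(lows)),
            "p67": (mean(highs), min(highs), max(highs))}


if __name__ == "__main__":
    # Hypothetical incomes for two districts (not real ACS data)
    districts = {"D1": list(range(10000, 110000, 1000)),
                 "D2": list(range(20000, 220000, 2000))}
    low, high = tercile_thresholds(districts["D1"])
    print(income_group(25000, low, high), summarize_thresholds(districts))
```

Because the thresholds are district-specific, the same household income can fall into different income groups in different districts.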

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models, we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1. Matched CCES–House roll calls included in our analysis

| Match | Bill | Date | Name | House Vote (Yea-Nay) | Bill Ideology† |
|---|---|---|---|---|---|
| (1) | HR 810 | 07/19/2006 | Stem Cell Research Enhancement Act (Presidential Veto override) | 235-193 | L |
| (1) | HR 3 | 01/11/2007 | Stem Cell Research Enhancement Act of 2007 (House) | 253-174 | L |
| (1) | S 5 | 06/07/2007 | Stem Cell Research Enhancement Act of 2007 | 247-176 | L |
| (2) | HR 2956 | 07/12/2007 | Responsible Redeployment from Iraq Act | 223-201 | L |
| (3) | HR 2 | 01/10/2007 | Fair Minimum Wage Act | 315-116 | L |
| (4) | HR 4297 | 12/08/2005 | Tax Relief Extension Reconciliation Act (Passage) | 234-197 | C |
| (4) | HR 4297 | 05/10/2006 | Tax Relief Extension Reconciliation Act (Agreeing to Conference Report) | 244-185 | C |
| (5) | HR 3045 | 07/28/2005 | Dominican Republic-Central America-United States Free Trade Agreement Implementation Act | 217-215 | C |
| (6) | S 1927 | 08/04/2007 | Protect America Act | 227-183 | C |
| (6) | HR 6304 | 06/20/2008 | FISA Amendments Act of 2008 | 293-129 | C |
| (7) | HR 3162 | 08/01/2007 | Children's Health and Medicare Protection Act | 225-204 | L |
| (7) | HR 976 | 10/18/2007 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 273-156 | L |
| (7) | HR 3963 | 01/23/2008 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 260-152 | L |
| (7) | HR 2 | 02/04/2009 | Children's Health Insurance Program Reauthorization Act | 290-135 | L |
| (8) | HR 3221 | 07/23/2008 | Foreclosure Prevention Act of 2008 | 272-152 | L |
| (9) | HR 3688 | 11/08/2007 | United States-Peru Trade Promotion Agreement | 285-132 | C |
| (10) | HR 1424 | 10/03/2008 | Emergency Economic Stabilization Act of 2008 | 263-171 | L |
| (11) | HR 3080 | 10/12/2011 | To implement the United States-Korea Trade Agreement | 278-151 | C |
| (12) | HR 3078 | 10/12/2011 | To implement the United States-Colombia Trade Promotion Agreement | 262-167 | C |
| (13) | HR 2346 | 06/16/2009 | Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report) | 226-202 | L |
| (14) | HR 2831 | 07/31/2007 | Lilly Ledbetter Fair Pay Act | 225-199 | L |
| (14) | HR 11 | 01/09/2009 | Lilly Ledbetter Fair Pay Act of 2009 (House) | 247-171 | L |
| (14) | S 181 | 01/27/2009 | Lilly Ledbetter Fair Pay Act of 2009 | 250-177 | L |
| (15) | HR 1913 | 04/29/2009 | Local Law Enforcement Hate Crimes Prevention Act | 249-175 | L |
| (16) | HR 1 | 02/13/2009 | American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report) | 246-183 | L |
| (17) | HR 2454 | 06/26/2009 | American Clean Energy and Security Act | 219-212 | L |
| (18) | HR 3590 | 03/21/2010 | Patient Protection and Affordable Care Act | 220-212 | L |
| (19) | HR 3962 | 11/07/2009 | Affordable Health Care for America Act | 221-215 | L |
| (20) | HR 4173 | 06/30/2010 | Wall Street Reform and Consumer Protection Act of 2009 | 237-192 | L |
| (21) | HR 2965 | 12/15/2010 | Don't Ask, Don't Tell Repeal Act of 2010 | 250-175 | L |
| (22) | S 365 | 08/01/2011 | Budget Control Act of 2011 | 269-161 | C |
| (23) | H CR 34 | 04/15/2011 | House Budget Plan of 2011 | 235-193 | C |
| (24) | H CR 112 | 03/28/2012 | Simpson-Bowles/Copper Amendment to House Budget Plan | 38-382 | C |
| (25) | HR 8 | 08/01/2012 | American Taxpayer Relief Act of 2012 (Levin Amendment) | 170-257 | L |
| (26) | HR 2 | 01/19/2011 | Repealing the Job-Killing Health Care Law Act | 245-189 | C |
| (26) | HR 6079 | 07/11/2012 | Repeal the Patient Protection and Affordable Care Act and [ ] | 244-185 | C |
| (27) | HR 1938 | 07/26/2011 | North American-Made Energy Security Act | 279-147 | C |

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2. Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

| Congress | 33rd percentile: Mean | Min | Max | 67th percentile: Mean | Min | Max |
|---|---|---|---|---|---|---|
| 109 | 38123 | 16800 | 73675 | 77964 | 39612 | 146870 |
| 110 | 40127 | 18000 | 77000 | 83047 | 43600 | 155113 |
| 111 | 39021 | 17500 | 78262 | 82440 | 46000 | 160050 |
| 112 | 37381 | 16500 | 81000 | 79868 | 38500 | 158654 |

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3. Descriptive statistics of analysis sample

| Variable | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Roll-call vote: yea | 0.568 | 0.495 | 0.000 | 1.000 | 15780 |
| Constituent preferences: Low income | 0.593 | 0.220 | 0.047 | 0.979 | 15934 |
| Constituent preferences: High income | 0.555 | 0.198 | 0.037 | 0.967 | 15934 |
| Constituent preferences: Low-High Gap | 0.172 | 0.121 | 0.000 | 0.588 | 15934 |
| Union membership [log] | 9.705 | 1.046 | 6.094 | 13.619 | 15934 |
| Population | 7.022 | 0.723 | 4.697 | 9.980 | 15934 |
| Share African American | 0.124 | 0.146 | 0.004 | 0.680 | 15934 |
| Share Hispanic | 0.156 | 0.174 | 0.005 | 0.812 | 15934 |
| Share BA or higher | 0.275 | 0.097 | 0.073 | 0.645 | 15934 |
| Median income [$10,000] | 5.177 | 1.356 | 2.282 | 10.439 | 15934 |
| Share female | 0.508 | 0.010 | 0.462 | 0.543 | 15934 |
| Manufacturing share | 0.110 | 0.047 | 0.025 | 0.281 | 15934 |
| Urbanization | 0.790 | 0.199 | 0.213 | 1.000 | 15934 |
| Certification elections [log] | 3.347 | 0.861 | 0.000 | 5.100 | 15934 |
| Congregations [per 1000 persons] | 0.765 | 1.147 | 0.062 | 6.453 | 15934 |

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \ldots, Z_K)$ for $p = 1, \ldots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \ldots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
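The final aggregation step is simple enough to show directly. The sketch below assumes that the prediction step has already filled in a preference value for every Census individual; it does not train a random forest. The record layout, with keys `district`, `income_group` and `pref`, and the toy values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def district_group_means(records):
    """Average filled-in preference by (district, income group).

    Each record is a dict with the hypothetical keys 'district',
    'income_group' and 'pref' (e.g. a predicted probability of support).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        key = (r["district"], r["income_group"])
        sums[key] += r["pref"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def preference_gap(means, district):
    """Absolute difference between low- and high-income preferences in a district."""
    return abs(means[(district, "low")] - means[(district, "high")])


if __name__ == "__main__":
    # Toy completed Census records (values are made up)
    census = [
        {"district": "D1", "income_group": "low", "pref": 1.0},
        {"district": "D1", "income_group": "low", "pref": 0.5},
        {"district": "D1", "income_group": "high", "pref": 0.25},
        {"district": "D1", "income_group": "high", "pref": 0.25},
    ]
    means = district_group_means(census)
    print(means, preference_gap(means, "D1"))
```

Simple unweighted averages are appropriate here only because the Census sample is taken to be representative within each district; with a non-representative sample the averages would need weights.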

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1: schematic of the stacked data matrix. Rows are units $1, \dots, m$ (CCES) and $m+1, \dots, m+n$ (Census); columns are covariates $Z_{i1}, \dots, Z_{iK}$ and preferences $Y_{i1}, \dots, Y_{iP}$. Preferences are observed for the CCES rows and missing (NA) for the Census rows. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict $Y^*_p = f(Z)$ for the Census rows.]

Figure B1. Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative on the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.²⁶ We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
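As a rough illustration of the midpoint-Pareto idea, the sketch below assigns midpoints to closed bins and a Pareto mean to the open-ended top bin. The exact variant used in the paper is not spelled out here, so the shape estimate from the top two bins is an assumption (one common textbook form), and the bin limits and counts are hypothetical.

```python
import math

def midpoint_pareto_scores(lower_bounds, counts):
    """Return one income score per bin.

    lower_bounds: ascending lower limits of the bins; the last bin is open-ended.
    counts: number of respondents in each bin.
    """
    scores = []
    # Closed bins: midpoint between a bin's lower limit and the next bin's.
    for lo, hi in zip(lower_bounds[:-1], lower_bounds[1:]):
        scores.append((lo + hi) / 2.0)

    # Open-ended bin: Pareto shape from the top two bins (assumed variant).
    # Under a Pareto tail, P(X > x) is proportional to x ** (-alpha).
    n_top, n_prev = counts[-1], counts[-2]
    l_top, l_prev = lower_bounds[-1], lower_bounds[-2]
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(l_top) - math.log(l_prev)
    )
    if alpha <= 1.0:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    # Mean of a Pareto distribution truncated below at l_top.
    scores.append(l_top * alpha / (alpha - 1.0))
    return scores

# Hypothetical bins: [0, 10k), [10k, 20k), [20k, 30k), [30k, 50k), [50k, 100k), 100k+
bounds = [0, 10_000, 20_000, 30_000, 50_000, 100_000]
counts = [5, 10, 20, 25, 30, 10]
scores = midpoint_pareto_scores(bounds, counts)
```

With these illustrative counts the estimated shape is 2, so the open-ended bin receives a score of about twice its lower limit, while the third bin receives its midpoint.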

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:²⁷

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

Start with initial guesses of missing values in $D$
$w \leftarrow$ vector of column indices sorted by increasing fraction of NA
while not $c$ do
    $D^{imp}_{old} \leftarrow$ previously imputed $D$
    for $v$ in $w$ do
        Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
        Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
        $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
    Update stopping criterion $c$
Return completed $D^{imp}$
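The control flow of Algorithm 1 can be sketched as follows. To keep the example dependency-free, the random forest is replaced by a deliberately crude stand-in learner (one-nearest-neighbour), so this shows the chaining logic only and not the authors' implementation. The function names, the `max_iter` cap, and the "no change" stopping test are assumptions made for illustration.

```python
def nearest_neighbor_predict(x_obs, y_obs, x_mis):
    """Stand-in learner: copy y from the closest observed row (not a random forest)."""
    preds = []
    for x in x_mis:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xo)) for xo in x_obs]
        preds.append(y_obs[dists.index(min(dists))])
    return preds

def chained_impute(data, learner=nearest_neighbor_predict, max_iter=10):
    """Chained imputation over the columns of a numeric matrix (None = missing)."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[data[i][v] is None for v in range(n_cols)] for i in range(n_rows)]

    # Initial guesses: column means of the observed values.
    imputed = [row[:] for row in data]
    for v in range(n_cols):
        obs = [data[i][v] for i in range(n_rows) if not missing[i][v]]
        mean_v = sum(obs) / len(obs)
        for i in range(n_rows):
            if missing[i][v]:
                imputed[i][v] = mean_v

    # Columns sorted by increasing number of missing entries.
    order = sorted(range(n_cols), key=lambda v: sum(missing[i][v] for i in range(n_rows)))

    for _ in range(max_iter):
        previous = [row[:] for row in imputed]
        for v in order:
            mis_idx = [i for i in range(n_rows) if missing[i][v]]
            if not mis_idx:
                continue
            obs_idx = [i for i in range(n_rows) if not missing[i][v]]

            def others(i):
                # All columns except v, using current imputations.
                return [imputed[i][k] for k in range(n_cols) if k != v]

            preds = learner(
                [others(i) for i in obs_idx],
                [imputed[i][v] for i in obs_idx],
                [others(i) for i in mis_idx],
            )
            for i, p in zip(mis_idx, preds):
                imputed[i][v] = p
        if imputed == previous:  # stopping criterion: filled-in values unchanged
            break
    return imputed

# Hypothetical toy matrix with one missing value in each column.
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [None, 20.0]]
result = chained_impute(toy)
```

Any learner with the same three-argument signature can be passed as `learner`, which is where a random forest would be plugged in.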

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.²⁸

B2 Multilevel Regression and Poststratication

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).²⁹ No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.³⁰ Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


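model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$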

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_{d} \sim N\left(\alpha^{state}_{s[d]}, \sigma^2_{state}\right) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
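The poststratification step can be illustrated with a small sketch: each demographic cell gets a predicted probability from the inverse-logit of its linear predictor as in (B1), and the district estimate is the population-weighted mean of those predictions. The coefficient values and cell counts below are made up for illustration, and the helper names are hypothetical.

```python
import math

def inv_logit(x):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def cell_linear_predictor(beta0, effects, cell):
    """Sum the intercept and the group effects that apply to one cell.

    effects: {"educ": {"hs": ..., "college": ...}, "district": {...}, ...}
    cell:    {"educ": "hs", "district": "D1", ...}
    """
    return beta0 + sum(effects[factor][level] for factor, level in cell.items())

def poststratify(cells, beta0, effects):
    """Population-weighted mean of the cell-specific predicted probabilities.

    cells: list of (cell description, population count) for one district
           and income group.
    """
    total = sum(count for _, count in cells)
    weighted = sum(
        inv_logit(cell_linear_predictor(beta0, effects, cell)) * count
        for cell, count in cells
    )
    return weighted / total

# Hypothetical estimates and Census cell counts for one district-income group.
beta0 = 0.0
effects = {
    "educ": {"hs": -0.5, "college": 0.5},
    "district": {"D1": 0.0},
}
cells = [
    ({"educ": "hs", "district": "D1"}, 300),
    ({"educ": "college", "district": "D1"}, 100),
]
estimate = poststratify(cells, beta0, effects)
```

Because the high-school cell is three times as large as the college cell in this toy district, the estimate is pulled towards the lower predicted support of that cell.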

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1. Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)      (2)      (3)      (4)      (5)      (6)

A: Small Area Estimation via Chained Random Forests
  Low income preferences     0.106    0.082    0.098    0.084    0.068    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
  High income preferences   −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: Multilevel Regression & Poststratification
  Low income preferences     0.182    0.158    0.181    0.162    0.115    0.115
                            (0.021)  (0.024)  (0.026)  (0.020)  (0.022)  (0.022)
  High income preferences   −0.136   −0.119   −0.139   −0.122   −0.091   −0.091
                            (0.017)  (0.019)  (0.021)  (0.017)  (0.018)  (0.018)

C: Raw CCES means
  Low income preferences     0.080    0.061    0.063    0.072    0.043    0.045
                            (0.010)  (0.011)  (0.012)  (0.010)  (0.011)  (0.011)
  High income preferences   −0.027   −0.013   −0.010   −0.027   −0.018   −0.024
                            (0.008)  (0.008)  (0.008)  (0.008)  (0.008)  (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
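The difference between district-specific and state-specific thresholds can be made concrete with a short sketch. The income values are invented, and the percentile rule (linear interpolation) and the treatment of incomes exactly at a cutoff are assumptions, since the text does not specify them.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a list, with q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def income_group(income, reference_incomes, low_q=1 / 3, high_q=2 / 3):
    """Classify an income relative to a reference population.

    Pass a district's incomes for district-specific thresholds, or the pooled
    state incomes for state-specific ones. Setting low_q=0.2 and high_q=0.8
    gives the shifted p20/p80 definition.
    """
    low_cut = percentile(reference_incomes, low_q)
    high_cut = percentile(reference_incomes, high_q)
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical incomes in two districts of the same state.
district_a = [30_000, 42_000, 50_000, 60_000, 80_000, 120_000, 200_000]
district_b = [10_000, 20_000, 30_000, 45_000, 55_000, 70_000, 90_000]
state = district_a + district_b

group_in_a = income_group(40_000, district_a)      # relative to richer district
group_in_b = income_group(40_000, district_b)      # relative to poorer district
group_in_state = income_group(40_000, state)       # relative to whole state
```

In this toy example the same voter is low income relative to the richer district and to the state as a whole, but middle income relative to the poorer district, which mirrors the Massachusetts illustration in the text.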


Table C1. Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)      (2)      (3)      (4)      (5)      (6)

A: District-specific income thresholds
  Low income preferences     0.106    0.082    0.098    0.084    0.068    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
  High income preferences   −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: State-specific income thresholds
  Low income preferences     0.105    0.082    0.097    0.083    0.067    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
  High income preferences   −0.062   −0.036   −0.052   −0.050   −0.049   −0.039
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.013)

C: Shifted income thresholds (p20 - p80)
  Low income preferences     0.098    0.077    0.09     0.078    0.063    0.057
                            (0.012)  (0.013)  (0.014)  (0.012)  (0.013)  (0.013)
  High income preferences   −0.054   −0.031   −0.046   −0.044   −0.044   −0.034
                            (0.011)  (0.012)  (0.012)  (0.011)  (0.012)  (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the NLRB. Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency


[Figure D1 legend, number of certification elections per district: 0−13, 13−26, 26−39, 39−62, 62−164.]

Figure D1. Total number of union certification elections in House districts (109th-112th Congress)

database v133 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
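A minimal sketch of the population-weighted interpolation is given below. It assumes that each county's congregations are allocated to districts in proportion to the share of the county's population living in each county-district intersection. That allocation rule, the function name, and all counts are illustrative assumptions and may differ from the exact weighting used with the equivalence file.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts using population weights.

    county_counts: {county: number of congregations}
    intersections: list of (county, district, population in the overlap)
    """
    # Total population of each county across all of its intersections.
    county_pop = defaultdict(float)
    for county, _, pop in intersections:
        county_pop[county] += pop

    district_counts = defaultdict(float)
    for county, district, pop in intersections:
        weight = pop / county_pop[county]  # share of the county living in the district
        district_counts[district] += weight * county_counts[county]
    return dict(district_counts)

# Hypothetical example: county A lies entirely in district D1,
# county B is split between D1 (25% of its population) and D2 (75%).
county_counts = {"A": 100, "B": 60}
intersections = [
    ("A", "D1", 1000),
    ("B", "D1", 300),
    ("B", "D2", 900),
]
district_counts = interpolate_to_districts(county_counts, intersections)
```

When every county is nested in a single district, as in at-large states, all weights equal one and the procedure reduces to summing county counts.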


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1. Additional robustness tests

                                          Low income        High income
                                          preferences       preferences        N

(1) Injective preference roll call map    0.063 (0.013)    −0.041 (0.013)    11,104
(2) Extreme preferences excl.             0.074 (0.016)    −0.048 (0.015)    13,308
(3) New York excluded                     0.070 (0.015)    −0.048 (0.014)    14,730
(4) Local Union Concentration             0.065 (0.014)    −0.047 (0.014)    15,780
(5) Trimmed LPM estimator                 0.074 (0.015)    −0.055 (0.014)    15,426
(6) Errors-in-variables                   0.062 (0.004)    −0.054 (0.004)    15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries for this model are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
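The logic of a trimmed linear probability model can be sketched as an iterative procedure: fit OLS, drop observations whose fitted values fall outside the unit interval, and refit until no further observations are dropped. The sketch below is a simplified single-regressor version written for illustration. It is my reading of the trimming idea rather than the specification estimated in the paper, and the inclusive treatment of fitted values exactly at 0 or 1 is an assumption.

```python
def ols(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def trimmed_lpm(x, y, max_iter=50):
    """Refit the LPM after dropping observations with fitted values outside [0, 1]."""
    keep = list(range(len(x)))
    for _ in range(max_iter):
        a, b = ols([x[i] for i in keep], [y[i] for i in keep])
        inside = [i for i in keep if 0.0 <= a + b * x[i] <= 1.0]
        # Stop when nothing is trimmed (or too few points remain to refit).
        if len(inside) == len(keep) or len(inside) < 2:
            break
        keep = inside
    return a, b, keep

# Hypothetical data: the last observation has an extreme covariate value,
# which pushes its fitted probability above one in the untrimmed fit.
x = [1, 2, 3, 4, 5, 6, 20]
y = [0, 1, 0, 1, 0, 1, 1]
a, b, keep = trimmed_lpm(x, y)
```

In this toy data set the first fit predicts a probability above one for the observation with `x = 20`; after it is dropped, all remaining fitted values lie inside the unit interval and the procedure stops.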

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.³¹ We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model equation (Robinson 1988) (for ease of exposition we omit

31 We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}U_d\theta^{l}_{jd} + \eta^{h}U_d\theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}U_d\theta^{l}_{jd} + \eta^{h}U_d\theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).³² Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
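The selection logic can be illustrated with a stripped-down sketch. It departs from the specification above in several labeled ways: the model has a single treatment coefficient and no preference interactions, a plain LASSO with a fixed penalty (fitted by coordinate descent) replaces the root-LASSO, and the data are a small constructed example in which a confounder has a strong effect on the treatment but only a weak reduced-form effect on the outcome. All function names and penalty values are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return z - t if z > t else z + t if z < -t else 0.0

def lasso_support(X, y, lam, n_iter=100):
    """Indices with non-zero LASSO coefficients (coordinate descent).
    Minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1 on centred data."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    return {j for j in range(p) if beta[j] != 0.0}

def ols_coefs(X, y):
    """Solve the normal equations X'X b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[c][p] / A[c][c] for c in range(p)]

def post_double_selection(y, u, W, lam_y, lam_u):
    """Select controls from both equations, then run OLS with their union."""
    i1 = lasso_support(W, y, lam_y)   # controls that predict the outcome
    i2 = lasso_support(W, u, lam_u)   # controls that predict the treatment
    selected = sorted(i1 | i2)
    X = [[1.0, u[i]] + [W[i][j] for j in selected] for i in range(len(y))]
    return ols_coefs(X, y)[1], selected

# Constructed example: true treatment effect is 1.0; control 0 drives the treatment.
w0 = [1, 1, 1, 1, -1, -1, -1, -1]
w1 = [1, 1, -1, -1, 1, 1, -1, -1]
w2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = [list(row) for row in zip(w0, w1, w2)]
u = [2.0 * a + 0.5 * a * b for a, b in zip(w0, w1)]
y = [ui - 1.9 * a + b + 0.2 * b * c for ui, a, b, c in zip(u, w0, w1, w2)]

eta_hat, controls = post_double_selection(y, u, W, lam_y=0.5, lam_u=0.5)

# Outcome-only selection for comparison: it misses control 0.
only_y = sorted(lasso_support(W, y, 0.5))
naive = ols_coefs([[1.0, ui] + [row[j] for j in only_y] for ui, row in zip(u, W)], y)[1]
```

In this constructed example, selection on the outcome equation alone keeps only control 1 and yields a badly biased treatment coefficient, while adding the control picked up by the treatment equation recovers the true value of one.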

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}, \theta^{h}_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

32 For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

$$c^{*} = \underset{c \in \mathbb{R}^{D}}{\operatorname{argmin}} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1}y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

party

partzUdj=minus2σ 2

sumi

ci exp(minusZd minus zj

2

σ 2

) (ZUddminus zUdj

) (G4)

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
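A small sketch may help make the pointwise derivatives concrete. It is an illustration under assumptions rather than the authors' code: the function name `krls_pointwise_derivatives` is hypothetical, the fitted surface is taken to be $\hat{y}(z) = \sum_i c_i \exp(-\lVert z - z_i \rVert^{2} / \sigma^{2})$, and the difference is written as evaluation point minus kernel center, which is the sign that results from differentiating this surface directly. The dense N x N x D array is used only for clarity and would be too large for the full analysis sample.

```python
import numpy as np


def krls_pointwise_derivatives(Z, c, d, sigma2):
    """Derivative of the fitted surface with respect to covariate d,
    evaluated at every observation z_j (one value per case)."""
    Z = np.asarray(Z, dtype=float)
    c = np.asarray(c, dtype=float)
    diffs = Z[:, None, :] - Z[None, :, :]            # diffs[j, i] = z_j - z_i
    K = np.exp(-np.sum(diffs ** 2, axis=2) / sigma2)  # K[j, i] = k(z_j, z_i)
    # d/dz_j^(d) of sum_i c_i k(z_j, z_i) = -2/sigma2 * sum_i c_i k (z_j^(d) - z_i^(d))
    return (-2.0 / sigma2) * np.sum(K * diffs[:, :, d] * c[None, :], axis=1)
```

The returned vector has one entry per observation, which is what would then be smoothed and plotted against union membership.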

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18(11-29).

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs. Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

Our findings cast a somewhat less pessimistic light on democratic representation in Congress. Despite high income inequality, polarization, expensive campaigns, and a legislature dominated by affluent politicians (Carnes 2013; Gilens 2012; Hacker and Pierson 2010; McCarty et al. 2006), our evidence indicates that unequal representation is not hard-wired into the fabric of American democracy. We also find suggestive evidence that public sector unions, to whom union membership has been shifting over the last decades, do not appear to be less of a collective voice for the less well-off than private sector unions.

Admittedly, the observational nature of our data makes it challenging to draw causal conclusions. However, our within-district research design, combined with rich data on possible confounds and flexible statistical specifications, allows us to rule out a host of alternative explanations. Going beyond the few existing studies that directly examine the effect of unions on unequal representation, we demonstrate that the moderating effect of unions on legislative responsiveness is not simply a result of state-level policies or institutions, district-level socio-economic structure, workers' propensity to organize, or broader patterns of associational life, and that it is robust to relaxing parametric modeling assumptions. Our empirical strategy was made possible by combining local-level administrative data on unions with extensive public opinion data capturing within-district variation in opinion polarization across numerous issues. As a result, we consider it unlikely that the effects of unions are spurious. More broadly, a focus on real-world variation in mass organizations is a necessary complement to field-experimental studies of unequal responsiveness and their ability to isolate biases in response to personal contacts as well as the effectiveness of particular strategies of influence (Butler 2014; Kalla and Broockman 2016).

Our findings have important implications for the direction of future research on representation. First, they encourage research on unequal representation to pay more attention to unions. Beyond Congress, our data on local unions can also be mapped to districts of state legislatures. Similarly, existing work in the nascent comparative literature on the topic has directed its focus on political institutions (Bartels 2017; Lupu and Warner 2017); including the role of labor unions, traditionally a strong force in many European countries, would paint a clearer picture of the drivers of equal versus unequal representation of citizens' interests in the political arena. Second, a fuller understanding of representation requires going beyond taking citizens' preferences as given. Unions are a prime target for studying how economic groups may shape mass preferences as well as political responses to those preferences. Unions' influence on preferences may work through leadership or socialization (Ahlquist et al. 2014; Kim and Margalit 2017), but also directly through labor markets and economic inequality (Ahlquist 2017).

Appendices

A Data

In this appendix we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.

Table A1: Matched CCES–House roll calls included in our analysis

| Match | Bill | Date | Name | House Vote (Yea-Nay) | Bill Ideology† |
|---|---|---|---|---|---|
| (1) | HR 810 | 07/19/2006 | Stem Cell Research Enhancement Act (Presidential Veto override) | 235-193 | L |
| (1) | HR 3 | 01/11/2007 | Stem Cell Research Enhancement Act of 2007 (House) | 253-174 | L |
| (1) | S 5 | 06/07/2007 | Stem Cell Research Enhancement Act of 2007 | 247-176 | L |
| (2) | HR 2956 | 07/12/2007 | Responsible Redeployment from Iraq Act | 223-201 | L |
| (3) | HR 2 | 01/10/2007 | Fair Minimum Wage Act | 315-116 | L |
| (4) | HR 4297 | 12/08/2005 | Tax Relief Extension Reconciliation Act (Passage) | 234-197 | C |
| (4) | HR 4297 | 05/10/2006 | Tax Relief Extension Reconciliation Act (Agreeing to Conference Report) | 244-185 | C |
| (5) | HR 3045 | 07/28/2005 | Dominican Republic-Central America-United States Free Trade Agreement Implementation Act | 217-215 | C |
| (6) | S 1927 | 08/04/2007 | Protect America Act | 227-183 | C |
| (6) | HR 6304 | 06/20/2008 | FISA Amendments Act of 2008 | 293-129 | C |
| (7) | HR 3162 | 08/01/2007 | Children's Health and Medicare Protection Act | 225-204 | L |
| (7) | HR 976 | 10/18/2007 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 273-156 | L |
| (7) | HR 3963 | 01/23/2008 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 260-152 | L |
| (7) | HR 2 | 02/04/2009 | Children's Health Insurance Program Reauthorization Act | 290-135 | L |
| (8) | HR 3221 | 07/23/2008 | Foreclosure Prevention Act of 2008 | 272-152 | L |
| (9) | HR 3688 | 11/08/2007 | United States-Peru Trade Promotion Agreement | 285-132 | C |
| (10) | HR 1424 | 10/03/2008 | Emergency Economic Stabilization Act of 2008 | 263-171 | L |
| (11) | HR 3080 | 10/12/2011 | To implement the United States-Korea Trade Agreement | 278-151 | C |
| (12) | HR 3078 | 10/12/2011 | To implement the United States-Colombia Trade Promotion Agreement | 262-167 | C |
| (13) | HR 2346 | 06/16/2009 | Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report) | 226-202 | L |
| (14) | HR 2831 | 07/31/2007 | Lilly Ledbetter Fair Pay Act | 225-199 | L |
| (14) | HR 11 | 01/09/2009 | Lilly Ledbetter Fair Pay Act of 2009 (House) | 247-171 | L |
| (14) | S 181 | 01/27/2009 | Lilly Ledbetter Fair Pay Act of 2009 | 250-177 | L |
| (15) | HR 1913 | 04/29/2009 | Local Law Enforcement Hate Crimes Prevention Act | 249-175 | L |
| (16) | HR 1 | 02/13/2009 | American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report) | 246-183 | L |
| (17) | HR 2454 | 06/26/2009 | American Clean Energy and Security Act | 219-212 | L |
| (18) | HR 3590 | 03/21/2010 | Patient Protection and Affordable Care Act | 220-212 | L |
| (19) | HR 3962 | 11/07/2009 | Affordable Health Care for America Act | 221-215 | L |
| (20) | HR 4173 | 06/30/2010 | Wall Street Reform and Consumer Protection Act of 2009 | 237-192 | L |
| (21) | HR 2965 | 12/15/2010 | Don't Ask, Don't Tell Repeal Act of 2010 | 250-175 | L |
| (22) | S 365 | 08/01/2011 | Budget Control Act of 2011 | 269-161 | C |
| (23) | H CR 34 | 04/15/2011 | House Budget Plan of 2011 | 235-193 | C |
| (24) | H CR 112 | 03/28/2012 | Simpson-Bowles/Cooper Amendment to House Budget Plan | 38-382 | C |
| (25) | HR 8 | 08/01/2012 | American Taxpayer Relief Act of 2012 (Levin Amendment) | 170-257 | L |
| (26) | HR 2 | 01/19/2011 | Repealing the Job-Killing Health Care Law Act | 245-189 | C |
| (26) | HR 6079 | 07/11/2012 | Repeal the Patient Protection and Affordable Care Act and [...] | 244-185 | C |
| (27) | HR 1938 | 07/26/2011 | North American-Made Energy Security Act | 279-147 | C |

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.

Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

| Congress | 33rd percentile: Mean | Min | Max | 67th percentile: Mean | Min | Max |
|---|---|---|---|---|---|---|
| 109 | 38123 | 16800 | 73675 | 77964 | 39612 | 146870 |
| 110 | 40127 | 18000 | 77000 | 83047 | 43600 | 155113 |
| 111 | 39021 | 17500 | 78262 | 82440 | 46000 | 160050 |
| 112 | 37381 | 16500 | 81000 | 79868 | 38500 | 158654 |

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

| Variable | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Roll-call vote: yea | 0.568 | 0.495 | 0.000 | 1.000 | 15780 |
| Constituent preferences: Low income | 0.593 | 0.220 | 0.047 | 0.979 | 15934 |
| Constituent preferences: High income | 0.555 | 0.198 | 0.037 | 0.967 | 15934 |
| Constituent preferences: Low-High Gap | 0.172 | 0.121 | 0.000 | 0.588 | 15934 |
| Union membership [log] | 9.705 | 1.046 | 6.094 | 13.619 | 15934 |
| Population | 7.022 | 0.723 | 4.697 | 9.980 | 15934 |
| Share African American | 0.124 | 0.146 | 0.004 | 0.680 | 15934 |
| Share Hispanic | 0.156 | 0.174 | 0.005 | 0.812 | 15934 |
| Share BA or higher | 0.275 | 0.097 | 0.073 | 0.645 | 15934 |
| Median income [$10,000] | 5.177 | 1.356 | 2.282 | 10.439 | 15934 |
| Share female | 0.508 | 0.010 | 0.462 | 0.543 | 15934 |
| Manufacturing share | 0.110 | 0.047 | 0.025 | 0.281 | 15934 |
| Urbanization | 0.790 | 0.199 | 0.213 | 1.000 | 15934 |
| Certification elections [log] | 3.347 | 0.861 | 0.000 | 5.100 | 15934 |
| Congregations [per 1000 persons] | 0.765 | 1.147 | 0.062 | 6.453 | 15934 |

Note: Calculated from American Community Survey, 2006-2013. When entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).

B Estimation of District Preferences

In this section we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B.1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), and a second one that is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \ldots, Z_K)$ for $p = 1, \ldots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $\hat{f}(Z_1, \ldots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
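The final averaging step can be sketched in a few lines. This is only an illustration: the function name `group_means`, the record layout, and the district and income-group labels are hypothetical, and any survey weights are ignored.

```python
from collections import defaultdict


def group_means(records):
    """Average predicted preferences within each (district, income group) cell.

    records: iterable of (district, income_group, predicted_preference) tuples,
    one per individual in the completed Census sample (hypothetical layout).
    """
    cells = defaultdict(lambda: [0.0, 0])  # running [sum, count] per cell
    for district, group, preference in records:
        cell = cells[(district, group)]
        cell[0] += preference
        cell[1] += 1
    return {key: total / count for key, (total, count) in cells.items()}
```

Each resulting value is the estimated share (or mean) of support for a roll call among one income group in one district.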

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

[Figure B1 (schematic): a stacked data matrix with units 1, ..., m from the CCES and units m+1, ..., m+n from the Census as rows, and covariates Z_i1, ..., Z_iK and preferences Y_i1, ..., Y_iP as columns. Preferences are observed for the CCES rows and missing (NA) for the Census rows. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census rows.]

Figure B1: Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative on the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^{*}_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

25 See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).

26 The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather which one the
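To illustrate the midpoint Pareto idea, the following sketch scores binned incomes. It rests on assumptions that go beyond the text: bins are treated as half-open intervals [lower, upper), so that the $20,000 to $29,999 category receives $25,000, and the Pareto shape parameter is estimated from the counts in the two highest bins, which is one common variant of the method and not necessarily the exact one used here. The function names and the example numbers are hypothetical.

```python
import math


def pareto_top_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail.

    The shape alpha is recovered from the share of cases above each of the
    two highest lower bounds (an assumed variant of the estimator).
    """
    alpha = math.log((n_prev + n_top) / n_top) / math.log(lower_top / lower_prev)
    if alpha <= 1.0:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    return lower_top * alpha / (alpha - 1.0)


def bin_scores(bounds, counts):
    """Assign a continuous score to each income bin.

    bounds: list of (lower, upper) pairs; the last pair has upper=None.
    counts: number of respondents in each bin, in the same order.
    """
    scores = [(lower + upper) / 2.0 for lower, upper in bounds[:-1]]
    scores.append(
        pareto_top_mean(bounds[-2][0], bounds[-1][0], counts[-2], counts[-1])
    )
    return scores


# Hypothetical three-bin example; the real CCES item has 14 bins.
example = bin_scores(
    [(10000, 20000), (20000, 30000), (30000, None)], [50, 30, 10]
)
```

Closed bins receive their midpoints, while the score of the top bin depends on how quickly the counts thin out in the upper tail.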

Algorithm details For easier exposition dene a matrix D that contains both individualcharacteristics and roll call preferences Let N be the number of rows of D For any givenvariable v of D Dv with missing entries at locations i(v)mis sube 1 N we can separate outfour parts27

bull Observed values of Dv denoted as y(v)obs

bull Missing values of Dv y(v)mis

bull Variables other than Dv with available observations i(v)obs= 1 N i(v)mis x

(v)obs

bull Variables other than Dv with observations i(v)mis x(v)mis

We now cycle through variables iteratively ing random forest and lling in unobservedvalues until a stopping criterion c (indicating no further change in lled-in values) is metAlgorithmically we proceed as follows

Algorithm 1: Chained Random Forests

 1. Start with initial guesses of missing values in $D$
 2. $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3. while not $c$ do
 4.     $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5.     for $v$ in $w$ do
 6.         Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7.         Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8.         $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9.     Update stopping criterion $c$
10. Return completed $D^{imp}$
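The control flow of Algorithm 1 can be sketched in a few lines. To keep the example free of dependencies, the learner below is a one-nearest-neighbour stand-in, not a random forest; in practice one would pass a function wrapping a random forest as `fit_predict`. All names, the mean-based initial guess, and the squared-change stopping rule are assumptions made for illustration.

```python
import math

def nearest_neighbour_predict(x_train, y_train, x_new):
    """Stand-in learner (NOT a random forest): copy y from the closest training row."""
    preds = []
    for row in x_new:
        best = min(range(len(x_train)), key=lambda i: math.dist(x_train[i], row))
        preds.append(y_train[best])
    return preds

def chained_imputation(data, fit_predict=nearest_neighbour_predict, max_iter=10, tol=1e-8):
    """Fill None entries column by column until the filled-in values stop changing."""
    n, p = len(data), len(data[0])
    missing = [[data[i][v] is None for v in range(p)] for i in range(n)]
    # Step 1: initial guesses are the column means of the observed values.
    filled = [row[:] for row in data]
    for v in range(p):
        observed = [data[i][v] for i in range(n) if not missing[i][v]]
        mean = sum(observed) / len(observed)
        for i in range(n):
            if missing[i][v]:
                filled[i][v] = mean
    # Step 2: visit columns in order of increasing share of missing entries.
    order = sorted(range(p), key=lambda v: sum(missing[i][v] for i in range(n)))
    for _ in range(max_iter):
        previous = [row[:] for row in filled]
        for v in order:
            mis_rows = [i for i in range(n) if missing[i][v]]
            if not mis_rows:
                continue
            obs_rows = [i for i in range(n) if not missing[i][v]]
            others = lambda i: [filled[i][u] for u in range(p) if u != v]
            preds = fit_predict(
                [others(i) for i in obs_rows],
                [filled[i][v] for i in obs_rows],
                [others(i) for i in mis_rows],
            )
            for i, value in zip(mis_rows, preds):
                filled[i][v] = value
        # Stopping criterion: total squared change in the imputed matrix.
        change = sum(
            (filled[i][v] - previous[i][v]) ** 2 for i in range(n) for v in range(p)
        )
        if change < tol:
            break
    return filled
```

Because the learner is passed in as a function, the loop itself does not depend on the choice of prediction method.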

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

[27] Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.[28]
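For readers unfamiliar with the error metric, a normalized root mean squared error can be computed as below. The normalization used here (mean squared error divided by the variance of the true values) is one common convention and is an assumption; the text does not spell out which normalization was applied.

```python
import math
import statistics

def nrmse(true_values, predicted_values):
    """Root mean squared error, normalized by the variance of the true values."""
    mse = statistics.fmean(
        [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    )
    return math.sqrt(mse / statistics.pvariance(true_values))
```

Under this convention a value of 1 corresponds to predicting the sample mean for every case, so values far below 1 indicate informative predictions.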

B.2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).[29] No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.[30] Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

[28] See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

[29] This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

[30] We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \text{logit}^{-1}\left(\beta^0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta^0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
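The poststratification step amounts to a population-weighted average of cell-level predicted probabilities. The sketch below illustrates this under stated assumptions: the dictionary layout, the key names (`district`, `income_group`, `count`), and the toy coefficients are hypothetical, and the coefficients would in practice come from the fitted hierarchical model.

```python
import math
from collections import defaultdict

def cell_probability(cell, intercept, effects):
    """Inverse logit of the intercept plus one estimated effect per cell attribute."""
    linear = intercept + sum(effects[attr][cell[attr]] for attr in effects)
    return 1.0 / (1.0 + math.exp(-linear))

def poststratified_support(cells, intercept, effects,
                           group_keys=("district", "income_group")):
    """Average cell predictions within each group, weighted by Census cell counts."""
    weighted = defaultdict(float)
    totals = defaultdict(float)
    for cell in cells:
        key = tuple(cell[k] for k in group_keys)
        weighted[key] += cell["count"] * cell_probability(cell, intercept, effects)
        totals[key] += cell["count"]
    return {key: weighted[key] / totals[key] for key in totals}

# Hypothetical toy input: two gender cells in one district and income group.
toy_effects = {"gender": {"f": math.log(3), "m": 0.0}}
toy_cells = [
    {"district": "d1", "income_group": "low", "gender": "f", "count": 100},
    {"district": "d1", "income_group": "low", "gender": "m", "count": 300},
]
toy_support = poststratified_support(toy_cells, 0.0, toy_effects)
```

Large cells dominate the district estimate regardless of how many survey respondents fell into them, which is the point of reweighting to the Census distribution.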

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)      (2)      (3)      (4)      (5)      (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences       0.106    0.082    0.098    0.084    0.068    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences     -0.063   -0.036   -0.053   -0.051   -0.050   -0.040
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: Multilevel Regression & Poststratification
Low income preferences       0.182    0.158    0.181    0.162    0.115    0.115
                            (0.021)  (0.024)  (0.026)  (0.020)  (0.022)  (0.022)
High income preferences     -0.136   -0.119   -0.139   -0.122   -0.091   -0.091
                            (0.017)  (0.019)  (0.021)  (0.017)  (0.018)  (0.018)

C: Raw CCES means
Low income preferences       0.080    0.061    0.063    0.072    0.043    0.045
                            (0.010)  (0.011)  (0.012)  (0.010)  (0.011)  (0.011)
High income preferences     -0.027   -0.013   -0.010   -0.027   -0.018   -0.024
                            (0.008)  (0.008)  (0.008)  (0.008)  (0.008)  (0.009)

District fixed effects         X        X        X        X        X        X

Group preferences × union policy X X
× state constants X
× organizational capacity X X
× district covariates X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
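The relative nature of the income classification can be illustrated with a short sketch. The percentile rule (linear interpolation), the treatment of boundary cases, and the two hypothetical income distributions are assumptions for illustration only. Passing `low_q=20, high_q=80` mimics the panel (C) definition.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (assumed rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def income_group(income, reference_incomes, low_q=33, high_q=66):
    """Classify an income relative to a reference (district or state) distribution."""
    if income <= percentile(reference_incomes, low_q):
        return "low"
    if income > percentile(reference_incomes, high_q):
        return "high"
    return "middle"

# Hypothetical districts: the same $40,000 income falls into different groups.
richer_district = [20000, 30000, 45000, 60000, 80000, 100000, 150000]
poorer_district = [10000, 20000, 30000, 35000, 50000, 60000, 90000]
group_in_richer = income_group(40000, richer_district)
group_in_poorer = income_group(40000, poorer_district)
```

Switching from district-specific to state-specific thresholds only changes which list is passed as `reference_incomes`.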


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)      (2)      (3)      (4)      (5)      (6)

A: District-specific income thresholds
Low income preferences       0.106    0.082    0.098    0.084    0.068    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences     -0.063   -0.036   -0.053   -0.051   -0.050   -0.040
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: State-specific income thresholds
Low income preferences       0.105    0.082    0.097    0.083    0.067    0.062
                            (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences     -0.062   -0.036   -0.052   -0.050   -0.049   -0.039
                            (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences       0.098    0.077    0.090    0.078    0.063    0.057
                            (0.012)  (0.013)  (0.014)  (0.012)  (0.013)  (0.013)
High income preferences     -0.054   -0.031   -0.046   -0.044   -0.044   -0.034
                            (0.011)  (0.012)  (0.012)  (0.011)  (0.012)  (0.012)

District fixed effects         X        X        X        X        X        X

Group preferences × union policy X X
× state constants X
× organizational capacity X X
× district covariates X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 and administered by the National Labor Relations Board (NLRB) (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the NLRB. Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency


Figure D1: Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 0-13, 13-26, 26-39, 39-62, 62-164]

database v133 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
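A minimal sketch of the interpolation step follows. It assumes the weights are stored as the share of each county's population living in each county-district intersection; the data layout and all names are hypothetical, and the exact weighting formula used in the analysis may differ.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, allocation):
    """Allocate county-level counts to districts using population shares.

    county_counts: {county: count}
    allocation: {(county, district): share of the county's population in the district}
    """
    district_totals = defaultdict(float)
    for (county, district), share in allocation.items():
        district_totals[district] += share * county_counts[county]
    return dict(district_totals)

# Hypothetical example: county A is split across two districts, county B is nested.
example = interpolate_to_districts(
    {"A": 100, "B": 50},
    {("A", "d1"): 0.25, ("A", "d2"): 0.75, ("B", "d2"): 1.0},
)
```

When every county lies entirely within one district, all shares equal one and the function reduces to a simple sum, matching the at-large case described above.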


E Additional Robustness Tests

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                         Low income        High income
                                         preferences       preferences        N

(1) Injective preference-roll call map   0.063 (0.013)    -0.041 (0.013)    11104
(2) Extreme preferences excl.            0.074 (0.016)    -0.048 (0.015)    13308
(3) New York excluded                    0.070 (0.015)    -0.048 (0.014)    14730
(4) Local Union Concentration            0.065 (0.014)    -0.047 (0.014)    15780
(5) Trimmed LPM estimator                0.074 (0.015)    -0.055 (0.014)    15426
(6) Errors-in-variables                  0.062 (0.004)    -0.054 (0.004)    15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
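The trimming rule can be sketched as follows for a single roll call. The percentile convention (linear interpolation) and the inclusive treatment of the cutoffs are assumptions; the function and variable names are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (assumed rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def trim_extreme(preferences, lower_q=5, upper_q=95):
    """Keep districts whose estimate lies within the percentile band for one roll call.

    preferences: {district: estimated preference}
    """
    low_cut = percentile(preferences.values(), lower_q)
    high_cut = percentile(preferences.values(), upper_q)
    return {d: v for d, v in preferences.items() if low_cut <= v <= high_cut}
```

Applying the function separately to each roll call reproduces the per-item trimming described in the text.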

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit

[31] We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \quad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4}$$

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
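The three steps (select controls for the outcome, select controls for the treatment, run OLS on the union) can be sketched with NumPy. This is a simplified illustration under stated assumptions: it uses a plain coordinate-descent LASSO with user-supplied penalties instead of the root-LASSO, a single treatment variable without the preference interactions, and hypothetical function names.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO: minimize (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]          # remove coordinate j from the fit
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * b[j]          # add the updated coordinate back
    return b

def post_double_selection(y, d, W, lam_y, lam_d):
    """Return the OLS treatment effect after selecting controls in both equations."""
    Wc = (W - W.mean(axis=0)) / W.std(axis=0)
    selected_y = np.flatnonzero(lasso_cd(Wc, y - y.mean(), lam_y))  # set I_1
    selected_d = np.flatnonzero(lasso_cd(Wc, d - d.mean(), lam_d))  # set I_2
    keep = np.union1d(selected_y, selected_d)                       # I = I_1 union I_2
    design = np.column_stack([np.ones(len(y)), d, Wc[:, keep]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1], keep
```

A control that matters mostly for the treatment enters through the second selection step even if the outcome LASSO drops it, which is the protection against omitted variable bias discussed above.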

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).[32] Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$, with $z = [\theta^l_{jd}\ \theta^h_{jd}\ U_d\ X_d]$, by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here, $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

[32] For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

$$c^* = \underset{c \in \mathbb{R}^D}{\text{argmin}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z^{U_d}_j} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_d - z^{U_d}_j\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
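The closed-form estimator and the pointwise derivatives can be sketched with NumPy as follows. Several choices here are assumptions for illustration: $\lambda$ is passed in as a fixed value instead of being chosen by leave-one-out loss, the function names are hypothetical, and the derivative is obtained by differentiating the fitted surface $f(z) = \sum_i c_i k(Z_i, z)$ directly, so the sign of the difference term follows from that calculation and is best checked against a finite difference.

```python
import numpy as np

def krls_fit(Z, y, lam):
    """Solve c = (K + lam * I)^{-1} y with a Gaussian kernel and sigma^2 = D."""
    sigma2 = Z.shape[1]                                   # D, the number of columns
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return K, c, sigma2

def krls_predict(Z, c, sigma2, z_new):
    """Fitted surface at a new point: kernel-weighted sum of the coefficients."""
    k = np.exp(-((Z - z_new) ** 2).sum(axis=1) / sigma2)
    return k @ c

def pointwise_derivative(Z, c, sigma2, j, d):
    """Derivative of the fitted surface with respect to column d at observation j."""
    k = np.exp(-((Z - Z[j]) ** 2).sum(axis=1) / sigma2)
    return (2.0 / sigma2) * np.sum(c * k * (Z[:, d] - Z[j, d]))
```

Computing `pointwise_derivative` for every observation `j` gives one marginal effect per case, which can then be smoothed against the union membership column.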

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 20, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [wwwvanderbilteducsdiincludesWorking Paper 5 2017pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and E. Washington (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.



Appendices

A Data

In this appendix we present additional details on our dataset, including details on the creation of some control variables and descriptive statistics.

Matched roll calls. Table A1 displays Congressional roll calls matched to CCES items. We selected congressional roll calls based on content and, when several choices were available, based on their proximity to CCES fieldwork periods.

Income thresholds. Table A2 presents an overview of the income thresholds we use to classify CCES respondents into income groups. We use two thresholds, separating the lowest and highest income terciles. We calculate them from yearly American Community Survey files, excluding individuals living in group quarters. For each congress, Table A2 shows the average of all district-specific thresholds as well as the smallest and largest ones.
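The threshold construction described above can be sketched as follows. This is a minimal illustration, not our production code: the function names, the inclusive quantile method, and the treatment of respondents who fall exactly on a threshold are all assumptions made for the example.

```python
from statistics import quantiles


def tercile_thresholds(incomes):
    """Return the two cut points that separate the bottom and top income terciles.

    Assumption: cut points are computed with the 'inclusive' quantile method;
    other percentile definitions would shift them slightly.
    """
    lower, upper = quantiles(incomes, n=3, method="inclusive")
    return lower, upper


def income_group(income, lower, upper):
    """Classify one respondent relative to the thresholds of his or her district.

    Assumption: respondents exactly on a threshold are assigned to the outer group.
    """
    if income <= lower:
        return "low"
    if income >= upper:
        return "high"
    return "middle"


# Hypothetical household incomes for a single district.
district_incomes = [10_000, 20_000, 30_000, 40_000, 50_000, 60_000, 70_000]
lower, upper = tercile_thresholds(district_incomes)
groups = [income_group(x, lower, upper) for x in district_incomes]
```

In an application of this kind the thresholds would be computed separately for each district and congress, and then applied to the survey respondents of that district.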

Descriptive statistics. Table A3 shows descriptive statistics for all variables used in our analysis. Note that these are for the untransformed variables. In our empirical models we standardize all inputs to have mean zero and unit standard deviation.

Public unions. Public unions captured (by name) in our data include the American Federation of State, County & Municipal Employees; National Education Association; American Federation of Teachers; American Federation of Government Employees; National Association of Government Employees; United Public Service Employees Union; National Treasury Employees Union; American Postal Workers Union; National Association of Letter Carriers; Rural Letter Carriers Association; National Postal Mail Handlers Union; National Alliance of Postal and Federal Employees; Patent Office Professional Association; National Labor Relations Board Union; International Association of Fire Fighters; Fraternal Order of Police; National Association of Police Organizations; various local police associations; and various local public school unions.


Table A1: Matched CCES–House roll calls included in our analysis

| Match | Bill | Date | Name | House Vote (Yea-Nay) | Bill Ideology† |
|---|---|---|---|---|---|
| (1) | HR 810 | 07/19/2006 | Stem Cell Research Enhancement Act (Presidential Veto override) | 235-193 | L |
| (1) | HR 3 | 01/11/2007 | Stem Cell Research Enhancement Act of 2007 (House) | 253-174 | L |
| (1) | S 5 | 06/07/2007 | Stem Cell Research Enhancement Act of 2007 | 247-176 | L |
| (2) | HR 2956 | 07/12/2007 | Responsible Redeployment from Iraq Act | 223-201 | L |
| (3) | HR 2 | 01/10/2007 | Fair Minimum Wage Act | 315-116 | L |
| (4) | HR 4297 | 12/08/2005 | Tax Relief Extension Reconciliation Act (Passage) | 234-197 | C |
| (4) | HR 4297 | 05/10/2006 | Tax Relief Extension Reconciliation Act (Agreeing to Conference Report) | 244-185 | C |
| (5) | HR 3045 | 07/28/2005 | Dominican Republic-Central America-United States Free Trade Agreement Implementation Act | 217-215 | C |
| (6) | S 1927 | 08/04/2007 | Protect America Act | 227-183 | C |
| (6) | HR 6304 | 06/20/2008 | FISA Amendments Act of 2008 | 293-129 | C |
| (7) | HR 3162 | 08/01/2007 | Children's Health and Medicare Protection Act | 225-204 | L |
| (7) | HR 976 | 10/18/2007 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 273-156 | L |
| (7) | HR 3963 | 01/23/2008 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 260-152 | L |
| (7) | HR 2 | 02/04/2009 | Children's Health Insurance Program Reauthorization Act | 290-135 | L |
| (8) | HR 3221 | 07/23/2008 | Foreclosure Prevention Act of 2008 | 272-152 | L |
| (9) | HR 3688 | 11/08/2007 | United States-Peru Trade Promotion Agreement | 285-132 | C |
| (10) | HR 1424 | 10/03/2008 | Emergency Economic Stabilization Act of 2008 | 263-171 | L |
| (11) | HR 3080 | 10/12/2011 | To implement the United States-Korea Trade Agreement | 278-151 | C |
| (12) | HR 3078 | 10/12/2011 | To implement the United States-Colombia Trade Promotion Agreement | 262-167 | C |
| (13) | HR 2346 | 06/16/2009 | Supplemental Appropriations Fiscal Year 2009 (Agreeing to conference report) | 226-202 | L |
| (14) | HR 2831 | 07/31/2007 | Lilly Ledbetter Fair Pay Act | 225-199 | L |
| (14) | HR 11 | 01/09/2009 | Lilly Ledbetter Fair Pay Act of 2009 (House) | 247-171 | L |
| (14) | S 181 | 01/27/2009 | Lilly Ledbetter Fair Pay Act of 2009 | 250-177 | L |
| (15) | HR 1913 | 04/29/2009 | Local Law Enforcement Hate Crimes Prevention Act | 249-175 | L |
| (16) | HR 1 | 02/13/2009 | American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report) | 246-183 | L |
| (17) | HR 2454 | 06/26/2009 | American Clean Energy and Security Act | 219-212 | L |
| (18) | HR 3590 | 03/21/2010 | Patient Protection and Affordable Care Act | 220-212 | L |
| (19) | HR 3962 | 11/07/2009 | Affordable Health Care for America Act | 221-215 | L |
| (20) | HR 4173 | 06/30/2010 | Wall Street Reform and Consumer Protection Act of 2009 | 237-192 | L |
| (21) | HR 2965 | 12/15/2010 | Don't Ask, Don't Tell Repeal Act of 2010 | 250-175 | L |
| (22) | S 365 | 08/01/2011 | Budget Control Act of 2011 | 269-161 | C |
| (23) | H CR 34 | 04/15/2011 | House Budget Plan of 2011 | 235-193 | C |
| (24) | H CR 112 | 03/28/2012 | Simpson-Bowles/Cooper Amendment to House Budget Plan | 38-382 | C |
| (25) | HR 8 | 08/01/2012 | American Taxpayer Relief Act of 2012 (Levin Amendment) | 170-257 | L |
| (26) | HR 2 | 01/19/2011 | Repealing the Job-Killing Health Care Law Act | 245-189 | C |
| (26) | HR 6079 | 07/11/2012 | Repeal the Patient Protection and Affordable Care Act and [...] | 244-185 | C |
| (27) | HR 1938 | 07/26/2011 | North American-Made Energy Security Act | 279-147 | C |

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of bill by Democratic or Republican representatives, respectively.


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

| Congress | 33rd percentile: Mean | Min | Max | 67th percentile: Mean | Min | Max |
|---|---|---|---|---|---|---|
| 109 | 38123 | 16800 | 73675 | 77964 | 39612 | 146870 |
| 110 | 40127 | 18000 | 77000 | 83047 | 43600 | 155113 |
| 111 | 39021 | 17500 | 78262 | 82440 | 46000 | 160050 |
| 112 | 37381 | 16500 | 81000 | 79868 | 38500 | 158654 |

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

| Variable | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Roll-call vote: yea | 0.568 | 0.495 | 0.000 | 1.000 | 15780 |
| Constituent preferences | | | | | |
| Low income | 0.593 | 0.220 | 0.047 | 0.979 | 15934 |
| High income | 0.555 | 0.198 | 0.037 | 0.967 | 15934 |
| Low-High Gap | 0.172 | 0.121 | 0.000 | 0.588 | 15934 |
| Union membership [log] | 9.705 | 1.046 | 6.094 | 13.619 | 15934 |
| Population | 7.022 | 0.723 | 4.697 | 9.980 | 15934 |
| Share African American | 0.124 | 0.146 | 0.004 | 0.680 | 15934 |
| Share Hispanic | 0.156 | 0.174 | 0.005 | 0.812 | 15934 |
| Share BA or higher | 0.275 | 0.097 | 0.073 | 0.645 | 15934 |
| Median income [$10,000] | 5.177 | 1.356 | 2.282 | 10.439 | 15934 |
| Share female | 0.508 | 0.010 | 0.462 | 0.543 | 15934 |
| Manufacturing share | 0.110 | 0.047 | 0.025 | 0.281 | 15934 |
| Urbanization | 0.790 | 0.199 | 0.213 | 1.000 | 15934 |
| Certification elections [log] | 3.347 | 0.861 | 0.000 | 5.100 | 15934 |
| Congregations [per 1000 persons] | 0.765 | 1.147 | 0.062 | 6.453 | 15934 |

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.[25]

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$, using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
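The final averaging step of this paragraph can be illustrated with a short sketch. It assumes that preferences have already been filled in for every Census record; the record layout (district, income group, predicted support) and the function name are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def district_group_means(records):
    """Average filled-in preferences by (district, income group).

    records: iterable of (district, income_group, preference) tuples, where
    preference is the predicted support (0/1 or a probability) for one roll call.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [running sum, count]
    for district, group, preference in records:
        cell = totals[(district, group)]
        cell[0] += preference
        cell[1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


# Hypothetical completed Census records for one roll call.
completed = [
    ("NC-04", "low", 1),
    ("NC-04", "low", 0),
    ("NC-04", "high", 1),
    ("TX-21", "low", 0),
]
means = district_group_means(completed)
```

Because each district's synthetic sample is treated as representative, no further weighting is applied in this sketch.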

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata

[25] See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.


[Figure B1 (diagram): a stacked data matrix with units $1, \dots, m$ (CCES) and $m+1, \dots, m+n$ (Census) in rows, and covariates $Z_{i1}, \dots, Z_{iK}$ and preferences $Y_{i1}, \dots, Y_{iP}$ in columns. Preferences are observed for the CCES rows and missing (NA) for the Census rows. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict the Census rows, $Y^*_p = f(Z)$.]

Figure B1: Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative on the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, which we construct from Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts of the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
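One way to read the sampling step above is sketched below. This is an illustration under stated assumptions, not a description of our exact procedure: the crosswalk is represented as a hypothetical mapping from (PUMA, district) pairs to the share of the district's population living in that PUMA, and individuals are drawn with replacement.

```python
import random


def synthetic_district_sample(puma_records, crosswalk, district, size, seed=0):
    """Draw a synthetic sample for one district from PUMA-level donor records.

    puma_records: dict mapping a PUMA id to a list of individual records.
    crosswalk: dict mapping (puma, district) to the share of the district's
        population that lives in that PUMA (hypothetical layout).
    Assumption: sampling is with replacement, with PUMAs drawn in proportion
    to their crosswalk share.
    """
    rng = random.Random(seed)
    pumas = [p for (p, d) in crosswalk if d == district]
    weights = [crosswalk[(p, district)] for p in pumas]
    sample = []
    for _ in range(size):
        puma = rng.choices(pumas, weights=weights, k=1)[0]
        sample.append(rng.choice(puma_records[puma]))
    return sample


# Hypothetical donor pool and crosswalk: district D1 draws 75% from PUMA A.
donors = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
shares = {("A", "D1"): 0.75, ("B", "D1"): 0.25, ("B", "D2"): 1.0}
d1_sample = synthetic_district_sample(donors, shares, "D1", size=1000)
```

Errors in the crosswalk shares carry over directly into the composition of the synthetic sample, which is the caveat noted in the text.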

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.[26] We transform this discretized measure of income into a continuous one using a nonparametric midpoint Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).

[26] The exact question wording is: "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather which one the
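The midpoint-Pareto recoding described in the paragraph above can be sketched as follows. The sketch uses one common variant, in which the Pareto shape parameter is estimated from the two highest bins; whether this matches the exact variant used in our data preparation is an assumption, and the four bins and counts below are hypothetical rather than the 14 CCES categories.

```python
import math


def pareto_top_bin_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail.

    The shape parameter alpha is estimated from the lower bounds and counts
    of the two highest bins (an assumed, commonly used variant).
    """
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1.0:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    return lower_top * alpha / (alpha - 1.0)


def bin_scores(bins, counts):
    """Assign one continuous score per income bin.

    bins: list of (lower, upper) bounds with exclusive upper bounds; the last
        bin is open-ended and has upper=None.
    counts: number of respondents in each bin.
    """
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]
    scores.append(
        pareto_top_bin_mean(bins[-2][0], bins[-1][0], counts[-2], counts[-1])
    )
    return scores


# Hypothetical four-bin example; the third bin ($20,000 to $29,999) maps to $25,000.
example_bins = [(0, 10_000), (10_000, 20_000), (20_000, 30_000), (30_000, None)]
example_counts = [30, 30, 30, 10]
example_scores = bin_scores(example_bins, example_counts)
```

The closed bins require no distributional assumption; only the open-ended top bin relies on the Pareto form.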

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:[27]

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values, until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

 1: Start with initial guesses of missing values in $D$
 2: $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3: while not $c$ do
 4:     $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5:     for $v$ in $w$ do
 6:         Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7:         Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8:         $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9:     Update stopping criterion $c$
10: Return completed $D^{imp}$
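The control flow of Algorithm 1 can be sketched in a few lines. To keep the example dependency-free, the sketch swaps in a 1-nearest-neighbour predictor as a stand-in learner; it is not a random forest, and any learner with the same fit-and-predict interface could be plugged in. The mean-based initial guesses and the stopping rule (total squared change below a tolerance, or a maximum number of iterations) are simplifying assumptions, as are all function names.

```python
def fit_predict_1nn(x_obs, y_obs, x_mis):
    """Stand-in learner (NOT a random forest): copy y from the nearest observed row."""
    predictions = []
    for x in x_mis:
        nearest = min(
            range(len(x_obs)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x_obs[i], x)),
        )
        predictions.append(y_obs[nearest])
    return predictions


def chained_impute(data, fit_predict=fit_predict_1nn, max_iter=10, tol=1e-9):
    """Chained imputation of a numeric matrix with None marking missing entries.

    Assumes every column has at least one observed value.
    """
    n_rows, n_cols = len(data), len(data[0])
    missing = [[row[v] is None for v in range(n_cols)] for row in data]
    imputed = [list(row) for row in data]
    # Step 1: initial guesses are the column means of the observed values.
    for v in range(n_cols):
        observed = [data[i][v] for i in range(n_rows) if not missing[i][v]]
        mean_v = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][v]:
                imputed[i][v] = mean_v
    # Step 2: visit columns in order of increasing number of missing entries.
    order = sorted(range(n_cols), key=lambda v: sum(missing[i][v] for i in range(n_rows)))
    # Steps 3-9: cycle through the columns until the filled-in values stop changing.
    for _ in range(max_iter):
        previous = [list(row) for row in imputed]
        for v in order:
            mis_rows = [i for i in range(n_rows) if missing[i][v]]
            if not mis_rows:
                continue
            obs_rows = [i for i in range(n_rows) if not missing[i][v]]
            others = [c for c in range(n_cols) if c != v]
            x_obs = [[imputed[i][c] for c in others] for i in obs_rows]
            y_obs = [imputed[i][v] for i in obs_rows]
            x_mis = [[imputed[i][c] for c in others] for i in mis_rows]
            for i, pred in zip(mis_rows, fit_predict(x_obs, y_obs, x_mis)):
                imputed[i][v] = pred
        change = sum(
            (a - b) ** 2 for ra, rb in zip(imputed, previous) for a, b in zip(ra, rb)
        )
        if change < tol:
            break
    # Step 10: return the completed matrix.
    return imputed


# Hypothetical toy matrix: one covariate and one preference column with a gap.
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.9, None]]
completed = chained_impute(toy)
```

In the stacked CCES and Census matrix, the preference columns are missing for all Census rows, so they are visited last, after any sparsely missing covariates have been filled in.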

[26, continued] respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

[27] Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.[28]
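For orientation, a normalized root mean squared error of the kind reported above can be computed as in the following sketch. It follows the common definition that divides the mean squared prediction error by the variance of the true values; the use of the population variance, and the assumption that this is exactly the normalization behind the OOB figure, are ours for the purpose of illustration.

```python
import math
from statistics import pvariance


def nrmse(true_values, predicted_values):
    """Normalized RMSE: sqrt(mean squared error / variance of the true values).

    A value of 0 indicates perfect prediction; a value of 1 is what one obtains
    by always predicting the mean of the true values.
    """
    mse = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values)) / len(true_values)
    return math.sqrt(mse / pvariance(true_values))


# Hypothetical held-out values and predictions.
truth = [0.0, 1.0, 0.0, 1.0]
score = nrmse(truth, [0.1, 0.9, 0.2, 0.8])
```

On this scale, values around 0.11 indicate that prediction errors are small relative to the spread of the variable being predicted.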

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).[29] No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

[28] See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

[29] This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

[30] We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case this produces rather similar MRP estimates.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.[30] Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical model using penalized maximum likelihood (Chung et al. 2013):

$$
\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\text{race}}_{j[i]} + \alpha^{\text{gender}}_{k[i]} + \alpha^{\text{age}}_{l[i]} + \alpha^{\text{educ}}_{m[i]} + \alpha^{\text{income}}_{n[i]} + \alpha^{\text{district}}_{d[i]}\right) \tag{B1}
$$

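We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$
\alpha^{\text{race}}_{j} \sim N\left(0, \sigma^2_{\text{race}}\right), \quad j = 1, \dots, 3 \tag{B2}
$$

$$
\alpha^{\text{gender}}_{k} \sim N\left(0, \sigma^2_{\text{gender}}\right), \quad k = 1, 2 \tag{B3}
$$

$$
\alpha^{\text{age}}_{l} \sim N\left(0, \sigma^2_{\text{age}}\right), \quad l = 1, \dots, 6 \tag{B4}
$$

$$
\alpha^{\text{educ}}_{m} \sim N\left(0, \sigma^2_{\text{educ}}\right), \quad m = 1, \dots, 5 \tag{B5}
$$

$$
\alpha^{\text{income}}_{n} \sim N\left(0, \sigma^2_{\text{income}}\right), \quad n = 1, \dots, 13 \tag{B6}
$$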

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\text{state}}_{s[d]}$ and freely estimated variance:

$$
\alpha^{\text{district}}_{d} \sim N\left(\alpha^{\text{state}}_{s[d]}, \sigma^2_{\text{state}}\right) \tag{B7}
$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
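The prediction and weighting steps can be made concrete with a small sketch. It takes the estimated intercept and group effects as given (fitting the hierarchical model is not shown), computes a cell's predicted support via the inverse logit, and then averages cell predictions using population counts as weights. All names, effect values, and cell labels are hypothetical.

```python
import math


def inv_logit(x):
    """Inverse logit: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_probability(beta0, effects):
    """Predicted support in one demographic-geographic cell.

    effects: the alpha terms that apply to the cell (race, gender, age,
    education, income, and district), taken from a fitted model.
    """
    return inv_logit(beta0 + sum(effects))


def poststratify(cell_probs, cell_counts):
    """Population-weighted average of cell predictions within one district and income group."""
    total = sum(cell_counts.values())
    return sum(cell_probs[cell] * cell_counts[cell] for cell in cell_counts) / total


# Hypothetical example: two cells in one district's low-income group.
probs = {
    "white/female/30-39/BA": cell_probability(0.2, [0.1, -0.05, 0.0, 0.3, -0.1, 0.15]),
    "black/male/50-59/HS": cell_probability(0.2, [0.4, 0.05, -0.1, -0.2, -0.1, 0.15]),
}
counts = {"white/female/30-39/BA": 1200, "black/male/50-59/HS": 800}
district_estimate = poststratify(probs, counts)
```

The weights come from the Census file, so cells that are over-represented among survey respondents do not dominate the district estimate.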

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit (one standard deviation) increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). The estimates for high income preferences are similarly larger in magnitude. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: Small Area Estimation via Chained Random Forests | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: Multilevel Regression & Poststratification | | | | | | |
| Low income preferences | 0.182 | 0.158 | 0.181 | 0.162 | 0.115 | 0.115 |
| | (0.021) | (0.024) | (0.026) | (0.020) | (0.022) | (0.022) |
| High income preferences | −0.136 | −0.119 | −0.139 | −0.122 | −0.091 | −0.091 |
| | (0.017) | (0.019) | (0.021) | (0.017) | (0.018) | (0.018) |
| C: Raw CCES means | | | | | | |
| Low income preferences | 0.080 | 0.061 | 0.063 | 0.072 | 0.043 | 0.045 |
| | (0.010) | (0.011) | (0.012) | (0.010) | (0.011) | (0.011) |
| High income preferences | −0.027 | −0.013 | −0.010 | −0.027 | −0.018 | −0.024 |
| | (0.008) | (0.008) | (0.008) | (0.008) | (0.008) | (0.009) |
| District fixed effects | X | X | X | X | X | X |

Interactions of group preferences included (markers as extracted; their column placement is not preserved in this copy): × union policy: X X; × state constants: X; × organizational capacity: X X; × district covariates: X X.

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
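The difference between district-specific and pooled (state-level) thresholds can be illustrated with a small sketch. Everything here is hypothetical: the incomes are made up, the nearest-rank percentile rule and the treatment of ties (an income equal to the lower cutoff counts as low) are assumptions for illustration, and `income_group` is not a function from the paper.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile (0 < q <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def income_group(income, incomes_in_area, low_q=33, high_q=66):
    """Classify an income relative to the income distribution of a reference area."""
    low_cut = percentile(incomes_in_area, low_q)
    high_cut = percentile(incomes_in_area, high_q)
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Two made-up districts in the same state, one poorer than the other.
district_a = [20_000, 30_000, 45_000, 60_000, 80_000, 120_000]
district_b = [35_000, 42_000, 50_000, 70_000, 90_000, 150_000]
state = district_a + district_b

group_a = income_group(40_000, district_a)    # relative to district A
group_b = income_group(40_000, district_b)    # relative to district B
group_state = income_group(40_000, state)     # relative to the pooled state distribution
```

The same $40,000 voter falls into different groups depending on the reference area, which mirrors the Massachusetts example. Passing `low_q=20, high_q=80` gives the shifted thresholds of panel (C).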


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily".

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1: Total number of union certification elections in House districts (109th-112th Congress). Legend bins: 0–13, 13–26, 26–39, 39–62, 62–164.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
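A minimal sketch of population-weighted areal interpolation is given below. It assumes that each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection; the function name, the county and district labels, and all numbers are hypothetical.

```python
from collections import defaultdict

def interpolate_counts(county_counts, intersections):
    """Allocate county-level counts to districts using population shares.

    county_counts: {county: count of congregations}
    intersections: list of (county, district, population living in the overlap)
    """
    county_pop = defaultdict(float)
    for county, _district, pop in intersections:
        county_pop[county] += pop

    district_counts = defaultdict(float)
    for county, district, pop in intersections:
        share = pop / county_pop[county]   # fraction of county residents in this district
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)

# County X is split between two districts; county Y lies entirely within D2.
county_counts = {"X": 100, "Y": 50}
intersections = [
    ("X", "D1", 80_000),
    ("X", "D2", 20_000),
    ("Y", "D2", 30_000),
]
district_counts = interpolate_counts(county_counts, intersections)
```

When every county is nested in a single district, all shares equal one and the procedure collapses to summing county counts, as noted for at-large states.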


E Additional Robustness Tests

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                           Low income        High income
                                           preferences       preferences          N

(1) Injective preference roll call map     0.063 (0.013)    −0.041 (0.013)    11104
(2) Extreme preferences excl.              0.074 (0.016)    −0.048 (0.015)    13308
(3) New York excluded                      0.070 (0.015)    −0.048 (0.014)    14730
(4) Local Union Concentration              0.065 (0.014)    −0.047 (0.014)    15780
(5) Trimmed LPM estimator                  0.074 (0.015)    −0.055 (0.014)    15426
(6) Errors-in-variables                    0.062 (0.004)    −0.054 (0.004)    15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
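The trimming idea can be sketched as follows, under the assumption that it is applied sequentially: fit the linear probability model by OLS, drop observations whose fitted values fall outside the unit interval, and refit until none remain outside. This is an illustration on simulated data with a hypothetical function name, not the specification estimated in Table E1.

```python
import numpy as np

def trimmed_lpm(X, y):
    """Sequentially refit OLS, dropping observations with fitted values outside [0, 1]."""
    keep = np.ones(len(y), dtype=bool)
    while True:
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        inside = keep & (fitted >= 0.0) & (fitted <= 1.0)
        if inside.sum() < X.shape[1]:
            raise ValueError("too few observations left after trimming")
        if inside.sum() == keep.sum():
            # Every retained observation now has a fitted value in the unit interval.
            return beta, keep
        keep = inside

# Simulated binary outcome whose true probability hits the 0/1 bounds in the tails.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
prob = np.clip(0.5 + 0.3 * x, 0.0, 1.0)
y = (rng.random(500) < prob).astype(float)
X = np.column_stack([np.ones(500), x])

beta_trimmed, kept = trimmed_lpm(X, y)
fitted_kept = X[kept] @ beta_trimmed
```

The retained sample shrinks only when some fitted values leave the unit interval, so the loop stops after finitely many refits.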

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.³¹ We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

³¹ We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4}$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques such as the LASSO are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the square-root LASSO (Belloni et al. 2011) in each selection step.
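The three steps (select controls for the outcome, select controls for the treatment, run OLS on the union) can be sketched on simulated data. Several simplifications are assumptions made for illustration only: a plain LASSO with a fixed penalty `lam` fitted by coordinate descent stands in for the square-root LASSO, the treatment enters without the preference interactions of (F1), and all names and numbers are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]               # partial residual without column j
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            resid -= X[:, j] * b[j]
    return b

def post_double_selection(y, U, W, lam):
    """Select controls in both reduced forms, then run OLS on their union."""
    Wc = W - W.mean(axis=0)
    I1 = np.flatnonzero(lasso_cd(Wc, y - y.mean(), lam))   # outcome equation
    I2 = np.flatnonzero(lasso_cd(Wc, U - U.mean(), lam))   # treatment equation
    selected = np.union1d(I1, I2)
    design = np.column_stack([np.ones(len(y)), U, W[:, selected]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1], selected                                # coefficient on the treatment

# Simulated data: control 0 drives the treatment, controls 0 and 1 drive the outcome.
rng = np.random.default_rng(0)
n, p = 1000, 50
W = rng.normal(size=(n, p))
U = W[:, 0] + rng.normal(size=n)
y = 0.5 * U + 0.3 * W[:, 0] + W[:, 1] + rng.normal(size=n)

eta_hat, selected = post_double_selection(y, U, W, lam=0.1)
```

Taking the union of both selected sets is what protects against dropping a control that matters mainly through its relationship with the treatment.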

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).³² Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

³² For a very general discussion, see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$, with $z = [\theta^l_{jd}, \theta^h_{jd}, U_d, X_d]$, by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$c^* = \underset{c \in \mathbb{R}^N}{\arg\min} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
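The closed-form solution and the choice of $\lambda$ can be sketched in a few lines. This is an illustration on simulated data, not the authors' implementation: the covariates are assumed to be standardized already, the candidate grid for $\lambda$ is hypothetical, and the leave-one-out residuals use the standard shortcut for linear smoothers, $e_i = c_i / [(K + \lambda I)^{-1}]_{ii}$.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def krls_weights(K, y, lam):
    """Closed-form weights c* = (K + lam * I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def loo_loss(K, y, lam):
    """Sum of squared leave-one-out residuals, computed without refitting N times."""
    G_inv = np.linalg.inv(K + lam * np.eye(len(y)))
    loo_resid = (G_inv @ y) / np.diag(G_inv)
    return float(loo_resid @ loo_resid)

# Simulated data with a nonlinear, non-additive surface.
rng = np.random.default_rng(0)
N, D = 60, 2
Z = rng.normal(size=(N, D))
y = np.sin(Z[:, 0]) * Z[:, 1] + 0.1 * rng.normal(size=N)

K = gaussian_kernel(Z, sigma2=D)             # bandwidth set to the number of columns
grid = [0.01, 0.1, 1.0, 10.0]                # hypothetical candidate values for lambda
lam = min(grid, key=lambda l: loo_loss(K, y, l))
c_star = krls_weights(K, y, lam)
fitted = K @ c_star
```

Solving the linear system directly avoids forming the inverse when only the weights are needed; the explicit inverse is used in `loo_loss` because its diagonal enters the shortcut.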

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

$$\frac{\partial y}{\partial z_j^{U_d}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z_d^{U_d} - z_j^{U_d}\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interest of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working Paper 5 2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and E. Washington (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, P. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodology 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947–1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964–2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public Opinion in State Politics, pp. 209–28. Stanford: Stanford University Press.

Putnam R (1993) Making Democracy Work Princeton NJ Princeton University PressPutnam R (2000) Bowling Alone e collapse and revival of american community New

York Simon and SchusterRatkovic M and D Tingley (2017) Sparse estimation and uncertainty with application to

subgroup analysis Political Analysis 25(1) 1ndash40Rhodes J H and B F Schaner (2017) Testing models of unequal representation Democratic

populists and republican oligarchs arterly Journal of Political Science 12(s) 185ndash204Richardson S and W R Gilks (1993) A bayesian approach to measurement error problems

in epidemiology using conditional independence models American Journal of Epidemiol-ogy 138(6) 430ndash442

Rigby E and G C Wright (2013) Political parties and representation of the poor in theamerican states American Journal of Political Science 57 (3) 552ndash565

Robinson P M (1988) Root-n-consistent semiparametric regression Econometrica 56(4)931ndash954

Rosenfeld J (2014) What Unions No Longer Do Cambridge Harvard University PressRupasingha A and S J Goetz (2008) US county-level social capital data 1990-2005 e

northeast regional center for rural development Penn State University University ParkPA

Samii C (2016) Causal empiricism in quantitative research Journal of Politics 78(3) 941ndash955Schlozman D (2015) When Movements Anchor Parties Princeton Princeton University

PressSchlozman K L S Verba and H E Brady (2012) e Unheavenly Chorus Unequal PoliticalVoice and the Broken Promise of American Democracy Princeton Princeton UniversityPress

Southworth C and J Stepan-Norris (2009) American trade unions and data limitations Anew agenda for labor studies Annual Review of Sociology 35 297ndash320

Stekhoven D J and P Buhlmann (2011) Missforest non-parametric missing value imputa-tion for mixed-type data Bioinformatics 28(1) 112ndash118

Stimson J A M B Mackuen and R S Erikson (1995) Dynamic representation AmericanPolitical Science Review 89(3) 543ndash565

Tang F and H Ishwaran (2017) Random forest missing data algorithms Statistical Analysisand Data Mining e ASA Data Science Journal 10 363ndash377

Tibshirani R (1996) Regression shrinkage and selection via the lasso Journal of the RoyalStatistical Society B 58(1) 267ndash288

Torrieri N ACSO DSSD and SEHSD Program Sta (2014) American communitysurvey design and methodology United States Census Bureau [wwwcensusgovprograms-surveysacsmethodologydesign-and-methodologyhtml]

Zullo R (2008) Union membership and political inclusion Industrial and Labor RelationsReview 62(1) 22ndash38

50


Table A1: Matched CCES–House roll calls included in our analysis

| Match | Bill | Date | Name | House Vote (Yea-Nay) | Bill Ideology† |
|---|---|---|---|---|---|
| (1) | H.R. 810 | 07/19/2006 | Stem Cell Research Enhancement Act (Presidential Veto override) | 235-193 | L |
| (1) | H.R. 3 | 01/11/2007 | Stem Cell Research Enhancement Act of 2007 (House) | 253-174 | L |
| (1) | S. 5 | 06/07/2007 | Stem Cell Research Enhancement Act of 2007 | 247-176 | L |
| (2) | H.R. 2956 | 07/12/2007 | Responsible Redeployment from Iraq Act | 223-201 | L |
| (3) | H.R. 2 | 01/10/2007 | Fair Minimum Wage Act | 315-116 | L |
| (4) | H.R. 4297 | 12/08/2005 | Tax Relief Extension Reconciliation Act (Passage) | 234-197 | C |
| (4) | H.R. 4297 | 05/10/2006 | Tax Relief Extension Reconciliation Act (Agreeing to Conference Report) | 244-185 | C |
| (5) | H.R. 3045 | 07/28/2005 | Dominican Republic-Central America-United States Free Trade Agreement Implementation Act | 217-215 | C |
| (6) | S. 1927 | 08/04/2007 | Protect America Act | 227-183 | C |
| (6) | H.R. 6304 | 06/20/2008 | FISA Amendments Act of 2008 | 293-129 | C |
| (7) | H.R. 3162 | 08/01/2007 | Children's Health and Medicare Protection Act | 225-204 | L |
| (7) | H.R. 976 | 10/18/2007 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 273-156 | L |
| (7) | H.R. 3963 | 01/23/2008 | Children's Health Insurance Program Reauthorization Act (Presidential Veto Override) | 260-152 | L |
| (7) | H.R. 2 | 02/04/2009 | Children's Health Insurance Program Reauthorization Act | 290-135 | L |
| (8) | H.R. 3221 | 07/23/2008 | Foreclosure Prevention Act of 2008 | 272-152 | L |
| (9) | H.R. 3688 | 11/08/2007 | United States-Peru Trade Promotion Agreement | 285-132 | C |
| (10) | H.R. 1424 | 10/03/2008 | Emergency Economic Stabilization Act of 2008 | 263-171 | L |
| (11) | H.R. 3080 | 10/12/2011 | To implement the United States-Korea Trade Agreement | 278-151 | C |
| (12) | H.R. 3078 | 10/12/2011 | To implement the United States-Colombia Trade Promotion Agreement | 262-167 | C |
| (13) | H.R. 2346 | 06/16/2009 | Supplemental Appropriations, Fiscal Year 2009 (Agreeing to conference report) | 226-202 | L |
| (14) | H.R. 2831 | 07/31/2007 | Lilly Ledbetter Fair Pay Act | 225-199 | L |
| (14) | H.R. 11 | 01/09/2009 | Lilly Ledbetter Fair Pay Act of 2009 (House) | 247-171 | L |
| (14) | S. 181 | 01/27/2009 | Lilly Ledbetter Fair Pay Act of 2009 | 250-177 | L |
| (15) | H.R. 1913 | 04/29/2009 | Local Law Enforcement Hate Crimes Prevention Act | 249-175 | L |
| (16) | H.R. 1 | 02/13/2009 | American Recovery and Reinvestment Act of 2009 (Agreeing to Conference Report) | 246-183 | L |
| (17) | H.R. 2454 | 06/26/2009 | American Clean Energy and Security Act | 219-212 | L |
| (18) | H.R. 3590 | 03/21/2010 | Patient Protection and Affordable Care Act | 220-212 | L |
| (19) | H.R. 3962 | 11/07/2009 | Affordable Health Care for America Act | 221-215 | L |
| (20) | H.R. 4173 | 06/30/2010 | Wall Street Reform and Consumer Protection Act of 2009 | 237-192 | L |
| (21) | H.R. 2965 | 12/15/2010 | Don't Ask, Don't Tell Repeal Act of 2010 | 250-175 | L |
| (22) | S. 365 | 08/01/2011 | Budget Control Act of 2011 | 269-161 | C |
| (23) | H.CR 34 | 04/15/2011 | House Budget Plan of 2011 | 235-193 | C |
| (24) | H.CR 112 | 03/28/2012 | Simpson-Bowles/Cooper Amendment to House Budget Plan | 38-382 | C |
| (25) | H.R. 8 | 08/01/2012 | American Taxpayer Relief Act of 2012 (Levin Amendment) | 170-257 | L |
| (26) | H.R. 2 | 01/19/2011 | Repealing the Job-Killing Health Care Law Act | 245-189 | C |
| (26) | H.R. 6079 | 07/11/2012 | Repeal the Patient Protection and Affordable Care Act and [...] | 244-185 | C |
| (27) | H.R. 1938 | 07/26/2011 | North American-Made Energy Security Act | 279-147 | C |

Note: The matching of roll calls to CCES items can be many-to-one.
† Coding of a bill's ideological character as (L)iberal or (C)onservative based on predominant support of the bill by Democratic or Republican representatives, respectively.

Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

| Congress | 33rd percentile: Mean | Min | Max | 67th percentile: Mean | Min | Max |
|---|---|---|---|---|---|---|
| 109 | 38,123 | 16,800 | 73,675 | 77,964 | 39,612 | 146,870 |
| 110 | 40,127 | 18,000 | 77,000 | 83,047 | 43,600 | 155,113 |
| 111 | 39,021 | 17,500 | 78,262 | 82,440 | 46,000 | 160,050 |
| 112 | 37,381 | 16,500 | 81,000 | 79,868 | 38,500 | 158,654 |

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

| | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Roll-call vote: yea | 0.568 | 0.495 | 0.000 | 1.000 | 15780 |
| Constituent preferences | | | | | |
| Low income | 0.593 | 0.220 | 0.047 | 0.979 | 15934 |
| High income | 0.555 | 0.198 | 0.037 | 0.967 | 15934 |
| Low-High Gap | 0.172 | 0.121 | 0.000 | 0.588 | 15934 |
| Union membership [log] | 9.705 | 1.046 | 6.094 | 13.619 | 15934 |
| Population | 7.022 | 0.723 | 4.697 | 9.980 | 15934 |
| Share African American | 0.124 | 0.146 | 0.004 | 0.680 | 15934 |
| Share Hispanic | 0.156 | 0.174 | 0.005 | 0.812 | 15934 |
| Share BA or higher | 0.275 | 0.097 | 0.073 | 0.645 | 15934 |
| Median income [$10,000] | 5.177 | 1.356 | 2.282 | 10.439 | 15934 |
| Share female | 0.508 | 0.010 | 0.462 | 0.543 | 15934 |
| Manufacturing share | 0.110 | 0.047 | 0.025 | 0.281 | 15934 |
| Urbanization | 0.790 | 0.199 | 0.213 | 1.000 | 15934 |
| Certification elections [log] | 3.347 | 0.861 | 0.000 | 5.100 | 15934 |
| Congregations [per 1000 persons] | 0.765 | 1.147 | 0.062 | 6.453 | 15934 |

Note: Calculated from American Community Survey, 2006-2013. When entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in the sample. Urbanization is calculated as the share of the district population living in an urban area, based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).

B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B.1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), and a second one that is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.[25]

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \dots, Z_K)$ for $p = 1, \dots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \dots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
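The train, predict, and aggregate steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: a covariate cell-mean predictor stands in for the random forest, and all variable names and toy records (`cces`, `census`, `support`, `income_group`) are hypothetical rather than taken from our data.

```python
from collections import defaultdict


def fit_cell_means(cces_rows, covariates, outcome):
    """Stand-in learner (NOT a random forest): mean outcome per covariate cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in cces_rows:
        cell = tuple(row[c] for c in covariates)
        sums[cell] += row[outcome]
        counts[cell] += 1
    grand_mean = sum(r[outcome] for r in cces_rows) / len(cces_rows)
    cell_means = {cell: sums[cell] / counts[cell] for cell in sums}
    # Cells never seen in the CCES fall back to the grand mean.
    return lambda row: cell_means.get(tuple(row[c] for c in covariates), grand_mean)


def district_group_means(census_rows, predict):
    """Average predicted preferences by (district, income group)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in census_rows:
        key = (row["district"], row["income_group"])
        sums[key] += predict(row)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical toy data: preferences observed only in the CCES rows.
cces = [
    {"age": "young", "educ": "college", "support": 1},
    {"age": "young", "educ": "college", "support": 1},
    {"age": "old", "educ": "college", "support": 0},
    {"age": "old", "educ": "hs", "support": 0},
    {"age": "young", "educ": "hs", "support": 1},
    {"age": "young", "educ": "hs", "support": 0},
]
census = [
    {"district": "D1", "income_group": "low", "age": "young", "educ": "hs"},
    {"district": "D1", "income_group": "low", "age": "old", "educ": "hs"},
    {"district": "D1", "income_group": "high", "age": "young", "educ": "college"},
    {"district": "D1", "income_group": "high", "age": "old", "educ": "college"},
]

predict = fit_cell_means(cces, ["age", "educ"], "support")
estimates = district_group_means(census, predict)
```

Swapping `fit_cell_means` for a more flexible learner leaves the aggregation step unchanged, which is the point of separating prediction from averaging.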

[25] See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

[Figure B1: schematic of the stacked data matrix. Units $1, \dots, m$ are CCES respondents with observed covariates $Z_{i1}, \dots, Z_{iK}$ and observed preferences $Y_{i1}, \dots, Y_{iP}$. Units $m+1, \dots, m+n$ are Census individuals with observed covariates and missing (NA) preferences. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict $Y^*_p = f(Z)$ for the Census rows.]

Figure B1: Illustration of Small Area Estimation of District Preferences. We use a sample of $m$ individuals from the CCES that is not necessarily representative at the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.
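The construction of the synthetic district sample can be illustrated as follows. This is a minimal sketch that assumes the crosswalk is available as a table of population shares; the function name, the record layout, and the toy PUMA and district identifiers are all hypothetical.

```python
import random


def synthetic_district_sample(puma_records, crosswalk, district, n_draws, seed=0):
    """Draw a synthetic sample of individuals for one district.

    puma_records: dict mapping PUMA id -> list of individual records.
    crosswalk:    dict mapping (puma, district) -> share of the district's
                  population living in that PUMA (assumed to sum to 1 per district).
    """
    rng = random.Random(seed)
    pumas = [p for (p, d) in crosswalk if d == district]
    shares = [crosswalk[(p, district)] for p in pumas]
    sample = []
    for _ in range(n_draws):
        # Pick a PUMA in proportion to its share of the district, then a person in it.
        puma = rng.choices(pumas, weights=shares, k=1)[0]
        sample.append(rng.choice(puma_records[puma]))
    return sample


# Hypothetical donor pool and crosswalk.
puma_records = {
    "P1": [{"puma": "P1", "age": 34}, {"puma": "P1", "age": 61}],
    "P2": [{"puma": "P2", "age": 45}, {"puma": "P2", "age": 23}],
    "P3": [{"puma": "P3", "age": 52}],
}
crosswalk = {("P1", "D1"): 0.75, ("P2", "D1"): 0.25, ("P3", "D2"): 1.0}

sample_d1 = synthetic_district_sample(puma_records, crosswalk, "D1", 4000, seed=1)
```

Sampling with replacement keeps the sketch short; whether the actual procedure samples with or without replacement is not stated here.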

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.[26] We transform this discretized measure of income into a continuous one using a nonparametric midpoint Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).

[26] The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather which one the
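A midpoint-Pareto scoring of binned income can be sketched as below. The text does not spell out the formula for the open-ended bin, so the sketch assumes a common variant: the Pareto shape is estimated from the counts and lower bounds of the two highest bins, and the top bin is assigned the implied Pareto mean. The bins and counts are hypothetical, and upper bounds are treated as exclusive so that $20,000 to $29,999 maps to $25,000.

```python
import math


def midpoint_pareto(bins, counts):
    """Assign a continuous income score to each bin.

    bins:   list of (lower, upper) bounds; the last bin is open-ended (upper=None).
    counts: number of respondents in each bin.
    """
    # Closed bins: plain midpoints.
    scores = [(lo + hi) / 2 for lo, hi in bins[:-1]]
    lower_top, lower_prev = bins[-1][0], bins[-2][0]
    n_top, n_prev = counts[-1], counts[-2]
    # Assumption: Pareto shape estimated from the two highest bins.
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    # Mean of a Pareto distribution truncated from below at lower_top.
    scores.append(lower_top * alpha / (alpha - 1))
    return scores


# Hypothetical four-bin example (the CCES item has 14 bins).
bins = [(10000, 20000), (20000, 30000), (30000, 60000), (60000, None)]
counts = [50, 30, 15, 5]
scores = midpoint_pareto(bins, counts)
```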

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:[27]

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$, $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
  Start with initial guesses of missing values in D
  w ← vector of column indices sorted by increasing fraction of NA
  while not c do
    D_imp^old ← previously imputed D
    for v in w do
      Fit Random Forest: y_obs^(v) ~ x_obs^(v)
      Predict y_mis^(v) using x_mis^(v)
      D_imp^new ← updated imputed matrix using predicted y_mis^(v)
    Update stopping criterion c
  Return completed D_imp
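The control flow of Algorithm 1 can be sketched in plain Python. To stay self-contained, the sketch swaps the random forest for a one-nearest-neighbour predictor, so it shows the chaining logic only and not the learner we use; the function names, the squared-change stopping rule, and the toy matrix are assumptions made for illustration.

```python
def knn_predict(train_x, train_y, query, k=1):
    """Stand-in learner (NOT a random forest): mean outcome of the k nearest rows."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def other_columns(row, v):
    """All entries of a row except column v."""
    return [value for u, value in enumerate(row) if u != v]


def chained_impute(data, max_iter=10, tol=1e-6):
    """Cycle through columns, refitting and refilling until values stabilise."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[data[i][v] is None for v in range(n_cols)] for i in range(n_rows)]
    filled = [row[:] for row in data]
    # Initial guesses: column means of the observed values.
    for v in range(n_cols):
        observed = [data[i][v] for i in range(n_rows) if not missing[i][v]]
        mean_v = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][v]:
                filled[i][v] = mean_v
    # Visit columns by increasing number of missing values.
    order = sorted(range(n_cols), key=lambda v: sum(m[v] for m in missing))
    for _ in range(max_iter):
        previous = [row[:] for row in filled]
        for v in order:
            obs = [i for i in range(n_rows) if not missing[i][v]]
            mis = [i for i in range(n_rows) if missing[i][v]]
            if not mis:
                continue
            train_x = [other_columns(filled[i], v) for i in obs]
            train_y = [filled[i][v] for i in obs]
            for i in mis:
                filled[i][v] = knn_predict(train_x, train_y, other_columns(filled[i], v))
        # Stopping criterion: total squared change in filled-in values.
        change = sum(
            (filled[i][v] - previous[i][v]) ** 2
            for i in range(n_rows)
            for v in range(n_cols)
        )
        if change < tol:
            break
    return filled


# Hypothetical toy matrix: the last row is missing its second entry.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [10.0, 20.0], [2.0, None]]
completed = chained_impute(data)
```

In the toy matrix the missing entry starts at the column mean and is replaced by the value of its nearest neighbour on the first pass; the second pass changes nothing, so the loop stops.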

[26, continued] respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

[27] Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.[28]
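For reference, a normalized root mean squared error can be computed as in the sketch below. It assumes the normalization by the variance of the true values that is commonly used for this type of imputation error; in the OOB setting the "imputed" values would be out-of-bag predictions for entries that are actually observed. The function name is hypothetical.

```python
import math


def nrmse(true_values, imputed_values):
    """Normalized RMSE: sqrt(mean squared error / variance of the true values)."""
    n = len(true_values)
    mean_true = sum(true_values) / n
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / n
    var_true = sum((t - mean_true) ** 2 for t in true_values) / n
    return math.sqrt(mse / var_true)


perfect = nrmse([0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0])
mean_only = nrmse([0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5])
```

Under this definition, a value of 1 corresponds to predicting the mean for every entry, so values near 0.11 indicate predictions far better than that baseline.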

B.2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).[29] No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

[28] See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

[29] This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

[30] We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.[30] Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical model using penalized maximum likelihood (Chung et al. 2013):

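$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B.1}$$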

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B.2}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B.3}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B.4}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B.5}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B.6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B.7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
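The poststratification step can be written out as a short sketch. It takes already-estimated effects as given (the numbers below are made up), forms the cell prediction through the inverse logit as in (B.1), and averages the cell predictions using population counts as weights. The function names and the two-attribute cell structure are hypothetical simplifications.

```python
import math


def inv_logit(x):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_prediction(cell, effects, intercept):
    """Predicted support for one demographic cell: intercept plus one effect per attribute."""
    linear = intercept + sum(effects[name][level] for name, level in cell.items())
    return inv_logit(linear)


def poststratify(cells, effects, intercept):
    """Population-weighted average of cell predictions.

    cells: list of (cell, population_count) pairs for one district and income group.
    """
    total = sum(count for _, count in cells)
    weighted = sum(cell_prediction(c, effects, intercept) * n for c, n in cells)
    return weighted / total


# Made-up effects (not estimates from our model) and Census cell counts.
effects = {
    "race": {"white": 0.0, "black": 0.0},
    "gender": {"female": math.log(3), "male": -math.log(3)},
}
cells = [
    ({"race": "white", "gender": "female"}, 300),
    ({"race": "black", "gender": "male"}, 100),
]
district_estimate = poststratify(cells, effects, intercept=0.0)
```

With these made-up numbers the two cells have predicted support of 0.75 and 0.25, and the 3:1 population weights give a district estimate of 0.625.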

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted with MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit (one standard deviation) increase in union membership increases the responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). The estimates for high income preferences are similarly larger in magnitude. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: Small Area Estimation via Chained Random Forests | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: Multilevel Regression & Poststratification | | | | | | |
| Low income preferences | 0.182 | 0.158 | 0.181 | 0.162 | 0.115 | 0.115 |
| | (0.021) | (0.024) | (0.026) | (0.020) | (0.022) | (0.022) |
| High income preferences | −0.136 | −0.119 | −0.139 | −0.122 | −0.091 | −0.091 |
| | (0.017) | (0.019) | (0.021) | (0.017) | (0.018) | (0.018) |
| C: Raw CCES means | | | | | | |
| Low income preferences | 0.080 | 0.061 | 0.063 | 0.072 | 0.043 | 0.045 |
| | (0.010) | (0.011) | (0.012) | (0.010) | (0.011) | (0.011) |
| High income preferences | −0.027 | −0.013 | −0.010 | −0.027 | −0.018 | −0.024 |
| | (0.008) | (0.008) | (0.008) | (0.008) | (0.008) | (0.009) |
| District fixed effects | X | X | X | X | X | X |

Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that, in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is then ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
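How the same income can fall into different groups depending on the reference population can be shown with a small sketch. The percentile rule (linear interpolation), the function names, and the two toy income distributions are assumptions for illustration; they are not the thresholds reported in Table A2.

```python
def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def income_group(income, reference_incomes, low_q=33, high_q=67):
    """Classify an income relative to the thresholds of a reference population."""
    low_cut = percentile(reference_incomes, low_q)
    high_cut = percentile(reference_incomes, high_q)
    if income <= low_cut:
        return "low"
    if income >= high_cut:
        return "high"
    return "middle"


# Hypothetical household incomes in two districts of the same state.
rich_district = [40000, 50000, 60000, 90000, 120000, 150000, 200000]
poor_district = [15000, 20000, 25000, 30000, 35000, 45000, 60000]
state = rich_district + poor_district

in_rich = income_group(40000, rich_district)   # district-specific thresholds
in_poor = income_group(40000, poor_district)   # district-specific thresholds
in_state = income_group(40000, state)          # state-specific thresholds
```

Passing `low_q=20, high_q=80` gives the narrower groups used in panel (C).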


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| A: District-specific income thresholds | | | | | | |
| Low income preferences | 0.106 | 0.082 | 0.098 | 0.084 | 0.068 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.063 | −0.036 | −0.053 | −0.051 | −0.050 | −0.040 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.014) |
| B: State-specific income thresholds | | | | | | |
| Low income preferences | 0.105 | 0.082 | 0.097 | 0.083 | 0.067 | 0.062 |
| | (0.013) | (0.015) | (0.016) | (0.013) | (0.014) | (0.014) |
| High income preferences | −0.062 | −0.036 | −0.052 | −0.050 | −0.049 | −0.039 |
| | (0.012) | (0.013) | (0.014) | (0.013) | (0.013) | (0.013) |
| C: Shifted income thresholds (p20 - p80) | | | | | | |
| Low income preferences | 0.098 | 0.077 | 0.09 | 0.078 | 0.063 | 0.057 |
| | (0.012) | (0.013) | (0.014) | (0.012) | (0.013) | (0.013) |
| High income preferences | −0.054 | −0.031 | −0.046 | −0.044 | −0.044 | −0.034 |
| | (0.011) | (0.012) | (0.012) | (0.011) | (0.012) | (0.012) |
| District fixed effects | X | X | X | X | X | X |

Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.

D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the NLRB. Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there is also a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1: Total number of union certification elections in House districts (109th-112th Congress). [Map of House districts shaded by number of elections; legend bins: 0-13, 13-26, 26-39, 39-62, 62-164.]

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
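The population-weighted interpolation can be illustrated with a short sketch. It assumes the weights are stored as the share of each county's population that lives in each district; the function name and the toy counties, districts, and counts are hypothetical.

```python
def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts using population shares.

    county_counts: dict county -> count (e.g. number of congregations).
    weights:       dict (county, district) -> share of the county's population
                   living in that district (assumed to sum to 1 per county).
    """
    district_counts = {}
    for (county, district), share in weights.items():
        district_counts[district] = (
            district_counts.get(district, 0.0) + share * county_counts[county]
        )
    return district_counts


# Hypothetical example: county A is split between two districts,
# county B lies entirely within district D2.
county_counts = {"A": 100, "B": 40}
weights = {("A", "D1"): 0.25, ("A", "D2"): 0.75, ("B", "D2"): 1.0}
district_counts = interpolate_to_districts(county_counts, weights)
```

When every county lies entirely within one district, all shares equal 1 and the calculation reduces to summing county counts, as in the at-large case.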


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by restricting our sample to a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate whether this affects our results, specification (1) uses a 1:1 map, dropping any additional roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.
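The first-match rule can be illustrated with a few lines of Python. The item and roll call identifiers are hypothetical placeholders, and the sketch assumes the links are already sorted in the order in which matches should be considered.

```python
def first_match_map(links):
    """Keep only the first roll call linked to each preference item."""
    mapping = {}
    for pref_item, roll_call in links:
        # setdefault leaves an already-mapped item untouched,
        # so later roll calls for the same item are dropped.
        mapping.setdefault(pref_item, roll_call)
    return mapping


# Hypothetical links: one preference item matched to two roll calls.
links = [
    ("minimum_wage", "rc_018"),
    ("minimum_wage", "rc_023"),
    ("stimulus", "rc_046"),
]
one_to_one = first_match_map(links)
```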

Table E1: Additional robustness tests

                                          Low income        High income
                                          preferences       preferences        N

(1) Injective preference roll call map    0.063 (0.013)     −0.041 (0.013)     11104
(2) Extreme preferences excl.             0.074 (0.016)     −0.048 (0.015)     13308
(3) New York excluded                     0.070 (0.015)     −0.048 (0.014)     14730
(4) Local Union Concentration             0.065 (0.014)     −0.047 (0.014)     15780
(5) Trimmed LPM estimator                 0.074 (0.015)     −0.055 (0.014)     15426
(6) Errors-in-variables                   0.062 (0.004)     −0.054 (0.004)     15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. For specification (6), table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate whether extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top: for each roll call, we exclude districts with preference estimates below the 5th or above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
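The trimming rule can be sketched as follows. The percentile definition (linear interpolation between order statistics) and the data layout are assumptions made for illustration; the paper does not state which percentile convention was used.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def trim_by_roll_call(rows, lower=5, upper=95):
    """Drop district preference estimates outside the [lower, upper] percentiles,
    computed separately for each roll call.

    rows: list of (roll_call, district, preference) tuples.
    """
    by_roll_call = {}
    for roll_call, _district, pref in rows:
        by_roll_call.setdefault(roll_call, []).append(pref)
    bounds = {
        rc: (percentile(prefs, lower), percentile(prefs, upper))
        for rc, prefs in by_roll_call.items()
    }
    return [
        (rc, d, p) for rc, d, p in rows
        if bounds[rc][0] <= p <= bounds[rc][1]
    ]


# Hypothetical data: 101 districts with evenly spaced preferences on one roll call.
rows = [("rc1", d, d / 100) for d in range(101)]
kept = trim_by_roll_call(rows)
```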

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check whether our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
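A common way to implement a trimmed linear probability model is sequential least squares: fit OLS, drop observations whose fitted value falls outside the unit interval, refit, and repeat until no further observations are dropped. The sketch below follows that reading of Horrace and Oaxaca (2006); the use of a closed interval, the simulated data, and all names are assumptions for illustration rather than the authors' code.

```python
import numpy as np


def trimmed_lpm(X, y):
    """Sequentially trimmed OLS for a binary outcome.

    Returns the coefficient vector and a boolean mask of retained observations.
    """
    keep = np.ones(len(y), dtype=bool)
    while True:
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        # Retain only observations whose linear predictor is a valid probability.
        inside = keep & (fitted >= 0.0) & (fitted <= 1.0)
        if inside.sum() == keep.sum():
            return beta, keep
        keep = inside


# Simulated example (hypothetical data-generating process).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
prob = np.clip(0.5 + 0.35 * x, 0.0, 1.0)
y = rng.binomial(1, prob).astype(float)
X = np.column_stack([np.ones(n), x])
beta, keep = trimmed_lpm(X, y)
```

The loop always terminates because the retained set can only shrink from one pass to the next.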

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. In general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables. We nonetheless estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text relax the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore i subscripts):

\[ y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1} \]

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation

\[ U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2} \]

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

31 We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher-order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

\[ y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3} \]

\[ U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4} \]

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.
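The two selection steps and the final OLS step can be sketched in a few lines. This is a simplified illustration under explicit assumptions: it uses a plain LASSO with a fixed, hand-picked penalty (not the root-LASSO used in the paper), a single treatment coefficient without the preference interactions of equation (F1), and simulated data; all function names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Plain LASSO by cyclic coordinate descent.

    Minimizes (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 for centered X and y.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]          # partial residual without coordinate j
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * b[j]
    return b


def post_double_selection(y, u, W, lam):
    """Select controls for the outcome and for the treatment, then run OLS
    of y on the treatment and the union of both control sets."""
    Wc = W - W.mean(axis=0)
    I1 = np.flatnonzero(lasso_cd(Wc, y - y.mean(), lam))  # predicts outcome
    I2 = np.flatnonzero(lasso_cd(Wc, u - u.mean(), lam))  # predicts treatment
    selected = np.union1d(I1, I2)
    X = np.column_stack([np.ones(len(y)), u, W[:, selected]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], selected


# Simulated example: control 0 confounds the treatment, control 1 affects y only.
rng = np.random.default_rng(1)
n, p = 500, 50
W = rng.normal(size=(n, p))
u = W[:, 0] + rng.normal(size=n)
y = 0.5 * u + 0.3 * W[:, 0] + W[:, 1] + rng.normal(size=n)
eta_hat, selected = post_double_selection(y, u, W, lam=0.25)
```

In this toy setup the true treatment effect is 0.5. The treatment-equation LASSO is what guarantees that the confounder (control 0) enters the final regression even if the outcome-equation LASSO had missed it.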

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\ \theta^{h}_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

\[ y = Kc \tag{G1} \]

Here $K$ is an $N \times N$ Gaussian kernel matrix with entries

\[ K_{ij} = \exp\left( -\frac{\lVert z_i - z_j \rVert^{2}}{\sigma^{2}} \right) \tag{G2} \]

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

\[ c^{*} = \operatorname*{argmin}_{c \in \mathbb{R}^{N}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3} \]

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^{2}$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^{2} = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

32 For a very general discussion see Belloni et al. (2017).
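The closed-form KRLS estimator and the leave-one-out choice of the penalty can be sketched with NumPy. The sketch assumes covariates are already on comparable scales, uses a small hypothetical grid of penalty values, and relies on the standard leave-one-out shortcut for linear smoothers; it is an illustration, not the implementation used for the paper's results.

```python
import numpy as np


def gaussian_kernel(Z, sigma2):
    """N x N Gaussian kernel matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)


def krls_fit(Z, y, lam):
    """Solve (K + lam * I) c = y, with sigma2 set to the number of columns of Z."""
    K = gaussian_kernel(Z, sigma2=Z.shape[1])
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return K, c


def loo_loss(K, y, lam):
    """Sum of squared leave-one-out residuals for the smoother H = K (K + lam I)^-1."""
    H = K @ np.linalg.inv(K + lam * np.eye(len(y)))
    loo_resid = (y - H @ y) / (1.0 - np.diag(H))
    return float((loo_resid ** 2).sum())


# Simulated example with a nonlinear, non-additive surface (hypothetical data).
rng = np.random.default_rng(2)
Z = rng.normal(size=(60, 3))
y = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] * Z[:, 2] + 0.1 * rng.normal(size=60)

grid = [0.01, 0.1, 1.0, 10.0]  # hypothetical candidate penalties
K_full = gaussian_kernel(Z, sigma2=Z.shape[1])
best_lam = min(grid, key=lambda lam: loo_loss(K_full, y, lam))
K, c = krls_fit(Z, y, best_lam)
fitted = K @ c
```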

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check whether the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

\[ \frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^{2}} \sum_{i} c_i \exp\left( -\frac{\lVert z_i - z_j \rVert^{2}}{\sigma^{2}} \right) \left( z^{U_d}_{j} - z^{U_d}_{i} \right) \tag{G4} \]

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
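The pointwise derivatives can be computed directly from the kernel expansion. The sketch below differentiates the fitted surface $\hat{y}(z) = \sum_i c_i \exp(-\lVert z_i - z \rVert^2 / \sigma^2)$ with respect to one coordinate of $z$ and evaluates the result at every observation, so the difference inside the sum is taken as evaluation point minus kernel center. The data, weights, and function names are hypothetical.

```python
import numpy as np


def krls_predict(Z, c, sigma2, z_new):
    """Fitted surface at a single point z_new, given kernel centers Z and weights c."""
    return float(c @ np.exp(-((Z - z_new) ** 2).sum(axis=1) / sigma2))


def krls_pointwise_derivatives(Z, c, sigma2, col):
    """Partial derivative of the fitted surface with respect to column `col`,
    evaluated at each observation z_j."""
    diff = Z[None, :, :] - Z[:, None, :]            # diff[i, j] = z_j - z_i
    K = np.exp(-(diff ** 2).sum(axis=2) / sigma2)   # K[i, j] = k(z_i, z_j)
    # -2 / sigma2 * sum_i c_i * K[i, j] * (z_j[col] - z_i[col]) for each j
    return (-2.0 / sigma2) * np.einsum("i,ij,ij->j", c, K, diff[:, :, col])


# Hypothetical centers and weights, only to exercise the formula.
rng = np.random.default_rng(3)
Z = rng.normal(size=(20, 2))
c = rng.normal(size=20)
sigma2 = 2.0
derivatives = krls_pointwise_derivatives(Z, c, sigma2, col=0)
```

A finite-difference comparison against `krls_predict` is a quick way to confirm the sign and scaling of the analytic expression before smoothing the derivatives against another covariate.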

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and E. Washington (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, P. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


Table A2: Distribution of district income-group reference points. Average threshold over all districts, smallest and largest value.

             33rd percentile               67th percentile
Congress     Mean     Min      Max         Mean     Min      Max

109          38123    16800    73675       77964    39612    146870
110          40127    18000    77000       83047    43600    155113
111          39021    17500    78262       82440    46000    160050
112          37381    16500    81000       79868    38500    158654

Note: Calculated from American Community Survey 1-year files. Household sample excluding group quarters. Missing income information imputed using Chained Random Forests.

Table A3: Descriptive statistics of analysis sample

                                      Mean     SD       Min      Max       N

Roll-call vote: yea                   0.568    0.495    0.000    1.000     15780
Constituent preferences
  Low income                          0.593    0.220    0.047    0.979     15934
  High income                         0.555    0.198    0.037    0.967     15934
  Low-High Gap                        0.172    0.121    0.000    0.588     15934
Union membership [log]                9.705    1.046    6.094    13.619    15934
Population                            7.022    0.723    4.697    9.980     15934
Share African American                0.124    0.146    0.004    0.680     15934
Share Hispanic                        0.156    0.174    0.005    0.812     15934
Share BA or higher                    0.275    0.097    0.073    0.645     15934
Median income [$10,000]               5.177    1.356    2.282    10.439    15934
Share female                          0.508    0.010    0.462    0.543     15934
Manufacturing share                   0.110    0.047    0.025    0.281     15934
Urbanization                          0.790    0.199    0.213    1.000     15934
Certification elections [log]         3.347    0.861    0.000    5.100     15934
Congregations [per 1000 persons]      0.765    1.147    0.062    6.453     15934

Note: Calculated from American Community Survey, 2006-2013. Note that when entered in models, variables are scaled to mean zero and unit SD. Preference gap is the absolute difference in preferences between low and high income constituents in sample. Urbanization is calculated as the share of the district population living in an urban area based on the Census' definition of urban Census blocks (matched to congressional districts using the MABLE database). Congregations per 1000 inhabitants calculated from RCMS 2000 (spatially interpolated).


B Estimation of District Preferences

In this section we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), while the second one is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.25

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from $m$ individuals in the CCES and $n$ individuals in the Census (with $n \gg m$). Both sets of individuals share $K$ common characteristics $Z_k$, such as age, race, or education. The first task at hand is then to match $P$ roll call preferences $Y_p$ that are only observed in the CCES to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about $Y_p = f(Z_1, \ldots, Z_K)$ for $p = 1, \ldots, P$ using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher-order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use $f(Z_1, \ldots, Z_K)$ in combination with observed $Z$ in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative for each Congressional district's population.
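The train, predict, and aggregate flow can be sketched with a deliberately simple stand-in for the predictive model. The example below does not use a random forest: it predicts preferences by the mean response within each covariate cell, purely to keep the sketch dependency-free, and every variable name and data value is hypothetical. Any flexible learner trained on the CCES rows could take the place of `train_cell_means`.

```python
from collections import defaultdict


def train_cell_means(cces):
    """Stand-in predictor (not a random forest): mean support per covariate cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for person in cces:
        cell = (person["age_group"], person["education"])
        sums[cell][0] += person["support"]
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


def district_group_preferences(census, model):
    """Fill in predicted preferences for Census rows, then average them
    by district and income group."""
    acc = defaultdict(lambda: [0.0, 0])
    for person in census:
        predicted = model[(person["age_group"], person["education"])]
        key = (person["district"], person["income_group"])
        acc[key][0] += predicted
        acc[key][1] += 1
    return {key: total / count for key, (total, count) in acc.items()}


# Hypothetical CCES rows: covariates plus an observed roll call preference.
cces = [
    {"age_group": "young", "education": "college", "support": 1},
    {"age_group": "young", "education": "college", "support": 1},
    {"age_group": "old", "education": "college", "support": 1},
    {"age_group": "old", "education": "college", "support": 0},
    {"age_group": "young", "education": "hs", "support": 0},
]
# Hypothetical Census rows: covariates, district, income group, no preference.
census = [
    {"age_group": "young", "education": "hs", "district": "D1", "income_group": "low"},
    {"age_group": "old", "education": "college", "district": "D1", "income_group": "low"},
    {"age_group": "young", "education": "college", "district": "D1", "income_group": "high"},
    {"age_group": "old", "education": "college", "district": "D1", "income_group": "high"},
]
model = train_cell_means(cces)
prefs = district_group_preferences(census, model)
```

A cell-mean predictor fails for covariate combinations that never appear in the training sample, which is one practical reason to prefer a tree-based learner that can generalize across cells.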

Data details Due to data condentially constraints the Census Bureau does not providedistrict identiers in its micro-data records Instead it identies 630 Public Use Microdata

25See Honaker and Plutzer (2016) for a more explicit exposition of this idea evidence for its empirical reliabilityand a comparison to MRP estimates

31

[Figure B1: schematic of the stacked data matrix. Rows are units 1, ..., m (CCES) and m+1, ..., m+n (Census). Columns are covariates Z_i1, ..., Z_iK and preferences Y_i1, ..., Y_iP. Preferences are observed for the CCES rows and missing (NA) for the Census rows. A random forest is trained on the CCES block, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census block.]

Figure B1: Illustration of Small Area Estimation of District Preferences

We use a sample of $m$ individuals from the CCES that is not necessarily representative at the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES. It asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
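The binned-to-continuous conversion can be illustrated with a minimal Python sketch. It assumes one common variant of the midpoint-Pareto rule, in which the Pareto shape is estimated from the counts in the two highest bins; the text does not state which variant was used, and the function names, bin bounds, and counts here are hypothetical. Upper bounds are treated as exclusive so that the $20,000 to $29,999 bin maps to $25,000, as in the text.

```python
import math


def pareto_top_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the open-ended top bin under a Pareto tail (assumed variant)."""
    # Shape parameter estimated from the counts in the two highest bins.
    alpha = (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    return lower_top * alpha / (alpha - 1)


def bin_scores(bins, counts):
    """Assign one income score per bin.

    bins:   list of (lower, upper) bounds, upper exclusive; the last bin
            has upper=None (open-ended).
    counts: number of respondents per bin, same order as bins.
    """
    scores = []
    for lower, upper in bins:
        if upper is not None:
            scores.append((lower + upper) / 2)  # closed bin: midpoint
        else:
            # Open-ended bin: Pareto mean based on the last two bins.
            scores.append(
                pareto_top_mean(bins[-2][0], lower, counts[-2], counts[-1])
            )
    return scores
```

Calling `bin_scores` with hypothetical bins such as `[(0, 10000), (10000, 20000), (20000, 30000), (30000, None)]` returns midpoints for the closed bins and a value above the lower bound for the open-ended one.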

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1  Start with initial guesses of missing values in D
 2  w ← vector of column indices sorted by increasing fraction of NA
 3  while not c do
 4      D_old^imp ← previously imputed D
 5      for v in w do
 6          Fit Random Forest: y_obs^(v) ~ x_obs^(v)
 7          Predict y_mis^(v) using x_mis^(v)
 8          D_new^imp ← updated imputed matrix using predicted y_mis^(v)
 9      Update stopping criterion c
10  Return completed D^imp
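The control flow of Algorithm 1 can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used in the paper: to stay within the standard library, a simple nearest-neighbour averaging rule (`knn_predict`, a hypothetical helper) stands in for the random forest, all variables are treated as numeric, and every column is assumed to have at least one observed value.

```python
def knn_predict(train_x, train_y, query_x, k=3):
    """Stand-in learner (NOT a random forest): mean outcome of the k nearest rows."""
    preds = []
    for q in query_x:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, q)), y)
            for row, y in zip(train_x, train_y)
        )
        nearest = [y for _, y in dists[:k]]
        preds.append(sum(nearest) / len(nearest))
    return preds


def chained_impute(data, max_iter=10, tol=1e-6, fit_predict=knn_predict):
    """Fill None entries column by column until imputed values stop changing."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[row[v] is None for v in range(n_cols)] for row in data]

    # Step 1: initial guesses (column means of the observed values).
    filled = [list(row) for row in data]
    for v in range(n_cols):
        observed = [row[v] for row in data if row[v] is not None]
        mean_v = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][v]:
                filled[i][v] = mean_v

    # Step 2: visit incomplete columns by increasing share of missing values.
    order = sorted(range(n_cols), key=lambda v: sum(m[v] for m in missing))
    order = [v for v in order if any(m[v] for m in missing)]

    for _ in range(max_iter):                     # Step 3
        previous = [list(row) for row in filled]  # Step 4
        for v in order:                           # Step 5
            others = [c for c in range(n_cols) if c != v]
            obs = [i for i in range(n_rows) if not missing[i][v]]
            mis = [i for i in range(n_rows) if missing[i][v]]
            x_obs = [[filled[i][c] for c in others] for i in obs]
            y_obs = [filled[i][v] for i in obs]
            x_mis = [[filled[i][c] for c in others] for i in mis]
            # Steps 6-8: fit on observed rows, predict and store missing ones.
            for i, pred in zip(mis, fit_predict(x_obs, y_obs, x_mis)):
                filled[i][v] = pred
        # Step 9: stop once the imputed values no longer change.
        change = sum(
            (filled[i][v] - previous[i][v]) ** 2
            for i in range(n_rows)
            for v in range(n_cols)
            if missing[i][v]
        )
        if change < tol:
            break
    return filled                                 # Step 10
```

Swapping `fit_predict` for a function that fits a random forest on `(x_obs, y_obs)` and predicts for `x_mis` would bring the sketch closer to the procedure described above.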

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal-conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


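model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta^0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$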

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta^0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

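$$\alpha^{race}_j \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_k \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_l \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_m \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_n \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$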

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_d \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
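The weighting step just described is a population-weighted average of cell-level predictions. A minimal Python sketch follows; the dictionary layout, the cell labels, and the function name `poststratify` are illustrative assumptions rather than the authors' code.

```python
from collections import defaultdict


def poststratify(cell_predictions, cell_counts):
    """Population-weighted mean of cell predictions per (district, income group).

    cell_predictions: {(district, income_group, cell): predicted support in [0, 1]}
    cell_counts:      {(district, income_group, cell): Census population count}
    """
    weighted_sum = defaultdict(float)
    total = defaultdict(float)
    for key, prediction in cell_predictions.items():
        district, group, _cell = key
        n = cell_counts.get(key, 0)  # cells absent from the Census file get weight 0
        weighted_sum[(district, group)] += n * prediction
        total[(district, group)] += n
    # Skip district-group pairs without any population weight.
    return {k: weighted_sum[k] / total[k] for k in total if total[k] > 0}
```

For instance, two hypothetical cells with predicted support 0.2 and 0.6 and Census counts of 300 and 100 yield a district-group estimate of 0.3, which lies closer to the larger cell.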

B3 Model Results under Various Preference Estimation Strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Estimates for high income preferences are similarly larger in magnitude. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences         0.182     0.158     0.181     0.162     0.115     0.115
                              (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences       -0.136    -0.119    -0.139    -0.122    -0.091    -0.091
                              (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences         0.080     0.061     0.063     0.072     0.043     0.045
                              (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences       -0.027    -0.013    -0.010    -0.027    -0.018    -0.024
                              (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects             X X X X X X
Group preferences × union policy   X X
  × state constants                X
  × organizational capacity        X X
  × district covariates            X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is then ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
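To make the role of district-specific thresholds concrete, the following Python sketch classifies individuals relative to percentiles of their own district's income distribution. The nearest-rank percentile rule, the handling of values exactly at a cutoff, and all names and income values are assumptions made for illustration; the text does not specify these details.

```python
def nearest_rank(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # integer ceiling of q*n/100
    return ordered[rank - 1]


def income_groups(incomes_by_district, low_q=33, high_q=66):
    """Label each individual 'low', 'middle' or 'high' within their district."""
    groups = {}
    for district, incomes in incomes_by_district.items():
        low_cut = nearest_rank(incomes, low_q)
        high_cut = nearest_rank(incomes, high_q)
        groups[district] = [
            "low" if x <= low_cut else "high" if x > high_cut else "middle"
            for x in incomes
        ]
    return groups
```

With `low_q=20` and `high_q=80` the same function produces the narrower groups of panel (C). Passing the pooled incomes of all districts in a state under a single key gives a state-specific classification instead.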


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences         0.105     0.082     0.097     0.083     0.067     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       -0.062    -0.036    -0.052    -0.050    -0.049    -0.039
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences         0.098     0.077     0.09      0.078     0.063     0.057
                              (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences       -0.054    -0.031    -0.046    -0.044    -0.044    -0.034
                              (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects             X X X X X X
Group preferences × union policy   X X
  × state constants                X
  × organizational capacity        X X
  × district covariates            X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency


[Figure D1: map of House districts shaded by number of certification elections. Legend bins: 0-13, 13-26, 26-39, 39-62, 62-164.]

Figure D1: Total number of union certification elections in House districts (109th-112th Congress)

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
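The interpolation step can be sketched as a population-weighted allocation of county counts to districts. The sketch below assumes that each weight is the share of a county's population living in the intersecting district; the exact weighting scheme and all names and numbers are illustrative assumptions, not taken from the authors' code.

```python
def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts.

    county_counts: {county: number of congregations}
    weights:       {(county, district): share of the county's population
                    living in that district}; shares for a county sum to 1.
    """
    district_totals = {}
    for (county, district), share in weights.items():
        district_totals[district] = (
            district_totals.get(district, 0.0) + share * county_counts[county]
        )
    return district_totals
```

When every county lies entirely inside one district (all shares equal 1), the function reduces to a plain sum over counties, matching the at-large case mentioned above.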


E Additional Robustness Tests

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                            Low income        High income
                                            preferences       preferences          N

(1) Injective preference-to-roll-call map   0.063 (0.013)    -0.041 (0.013)    11,104
(2) Extreme preferences excl.               0.074 (0.016)    -0.048 (0.015)    13,308
(3) New York excluded                       0.070 (0.015)    -0.048 (0.014)    14,730
(4) Local Union Concentration               0.065 (0.014)    -0.047 (0.014)    15,780
(5) Trimmed LPM estimator                   0.074 (0.015)    -0.055 (0.014)    15,426
(6) Errors-in-variables                     0.062 (0.004)    -0.054 (0.004)    15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. For that specification, table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
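A minimal Python sketch of this trimming rule for a single roll call is shown below. It assumes a nearest-rank percentile definition and keeps districts exactly at the cutoffs; both choices, along with the function names, are assumptions for illustration.

```python
def nearest_rank(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # integer ceiling of q*n/100
    return ordered[rank - 1]


def trim_roll_call(preferences, lower_q=5, upper_q=95):
    """Drop districts below the 5th or above the 95th percentile.

    preferences: {district: estimated preference on one roll call}
    """
    low = nearest_rank(preferences.values(), lower_q)
    high = nearest_rank(preferences.values(), upper_q)
    return {d: p for d, p in preferences.items() if low <= p <= high}
```

Applying the function separately to each roll call and stacking the results gives the trimmed estimation sample.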

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
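The expected downward bias can be illustrated with the textbook attenuation result. The derivation below assumes classical measurement error and a single regressor, which is a deliberate simplification of the interacted, multi-regressor specification estimated here; the symbols $x^*$, $\tilde{x}$, and $u$ are introduced only for this illustration.

```latex
% Classical errors-in-variables with one regressor (illustrative simplification).
% x^* : true district preference, \tilde{x} : its estimate, u : measurement error.
\begin{align*}
\tilde{x} &= x^* + u, \qquad u \perp x^*, \quad \operatorname{Var}(u) = \sigma_u^2, \\
y &= \beta x^* + \epsilon, \\
\operatorname{plim} \hat{\beta}_{OLS}
  &= \beta \cdot \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_u^2}.
\end{align*}
```

Because the ratio is below one whenever $\sigma_u^2 > 0$, the uncorrected coefficient is pulled towards zero, and the pull is weak when the measurement error variance is small relative to the variance of the true preferences.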

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean-field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4}$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd} \; \theta^h_{jd} \; U_d \; X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(\frac{-\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

32 For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared L2 penalty to the least squares criterion:

$$c^* = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[ (y - Kc)'(y - Kc) + \lambda c' K c \right] \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
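The closed-form solution $c^* = (K + \lambda I)^{-1} y$ can be traced in a short, dependency-free Python sketch. It is an illustration under stated assumptions: $\lambda$ is fixed by hand rather than chosen by leave-one-out loss, covariates are assumed to be already standardized, and the helper names (`gaussian_kernel`, `solve`, `krls_fit`) are hypothetical.

```python
import math


def gaussian_kernel(rows, sigma2):
    """K[i][j] = exp(-||z_i - z_j||^2 / sigma2)."""
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / sigma2) for zj in rows]
        for zi in rows
    ]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def krls_fit(rows, y, lam):
    """Return the weights c* = (K + lam * I)^(-1) y and the kernel matrix K."""
    sigma2 = len(rows[0])  # sigma^2 = D, the number of columns in z
    k = gaussian_kernel(rows, sigma2)
    n = len(y)
    ridge = [[k[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(ridge, y), k
```

Fitted values follow as $Kc^*$, i.e. `[sum(k[i][j] * c[j] for j in range(len(c))) for i in range(len(c))]`. For data of the size used here, a numerical linear algebra library would be the practical choice; the hand-written solver only keeps the sketch self-contained.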

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z^{U_d}_j} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(\frac{-\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_d - z^{U_d}_j\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [wwwvanderbilteducsdiincludesWorking Paper 5 2017pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [wwwnberorgpapersw22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [httphonakrpapersfilessmallAreaEstimationpdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [wwwnoamlupucomAampCpdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [wwwcensusgovprograms-surveysacsmethodologydesign-and-methodologyhtml]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


B Estimation of District Preferences

In this section, we describe how we estimate district-level preferences using three different strategies: (i) small area estimation using a matching approach based on random forests (which we use in the main text of our paper), (ii) estimation using multilevel regression and post-stratification (MRP), and (iii) unadjusted cell means. Each approach invokes different statistical and substantive assumptions. In the spirit of consilience, our aim here is to show that our substantive results do not depend on any particular choice.

B1 Small Area Estimation via Chained Random Forests

The core idea of our small area estimation strategy is based on the fact that we have access to two samples: one that is likely not representative of the population of all Congressional districts (the CCES), and a second one that is representative of district populations by virtue of its sampling design (the Census or American Community Survey). By matching or imputing preferences from the former to the latter based on a common vector of observable individual characteristics, we can use the district-representative sample to estimate the preferences of individuals in a given district.[25]

Combining CCES and Census data using Random Forests. Figure B1 illustrates this approach in more detail. We have data from m individuals in the CCES and n individuals in the Census (with n ≫ m). Both sets of individuals share K common characteristics Z_k, such as age, race, or education. The first task at hand is then to match P roll call preferences Y_p, which are only observed in the CCES, to the Census sample. This is a purely predictive task, and it is thus well suited for machine learning approaches. We use random forests (Breiman 2001) to learn about Y_p = f(Z_1, …, Z_K) for p = 1, …, P using the algorithm proposed by Stekhoven and Bühlmann (2011). This approach has two key advantages. First, as is typical for approaches based on regression trees, it deals with both categorical and continuous data, allows for arbitrary functional forms, and can include higher order interactions between covariates (such as age × race × education). Second, we can assess the quality of the predictions based on our model before we deploy it to predict preferences in the Census. With the trained model in hand, we can use f(Z_1, …, Z_K) in combination with observed Z in the Census sample to fill in preferences (i.e., completing the square in the lower right of Figure B1). Using the completed Census data, we can estimate constituent district preferences as simple averages by district and income group, since the Census sample is representative of each Congressional district's population.

Data details. Due to data confidentiality constraints, the Census Bureau does not provide district identifiers in its micro-data records. Instead, it identifies 630 Public Use Microdata areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, created by recreating one from the other based on Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts for the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

[Figure B1: schematic of the stacked data matrix. Rows 1, …, m are CCES units with observed covariates Z_i1, …, Z_iK and observed preferences Y_i1, …, Y_iP; rows m+1, …, m+n are Census units with observed covariates and preferences marked NA. A random forest is trained on the CCES rows, Y_p = f(Z), and used to predict Y*_p = f(Z) for the Census rows.]

Figure B1: Illustration of Small Area Estimation of District Preferences. We use a sample of m individuals from the CCES that is not necessarily representative at the district level, while a sample of n individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates Z_k that are common to both samples, while roll call preferences Y_p are only observed in the CCES. We train a flexible non-parametric model relating Y_p to Z and use it to predict preferences Y*_p for Census individuals with characteristics Z. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

[25] See Honaker and Plutzer (2016) for a more explicit exposition of this idea, evidence for its empirical reliability, and a comparison to MRP estimates.

We harmonize all covariates to be comparable between CCES and Census. For family income, this entails an adjustment to the measure provided in the CCES, which asks respondents to place their family's total household income into 14 income bins.[26] We transform this discretized measure of income into a continuous one using a nonparametric midpoint Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).

[26] The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the
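The midpoint-Pareto transformation can be sketched as follows. This is a minimal illustration, not the authors' code: the source does not spell out which Pareto variant is used, so the sketch assumes a common one that estimates the Pareto shape from the respondent counts in the top two bins and assigns the open-ended bin the implied Pareto mean. The function name, the bin layout (upper bounds treated as exclusive, so that $20,000 to $29,999 maps to $25,000), and the counts are all hypothetical.

```python
import math


def midpoint_pareto_scores(bins, counts):
    """Assign one continuous score per income bin.

    bins   -- list of (lower, upper) bounds, upper exclusive; the last bin
              is open-ended and has upper=None
    counts -- number of respondents in each bin (same order as bins)
    """
    # Closed bins: use the midpoint of the interval.
    scores = [(lower + upper) / 2 for lower, upper in bins[:-1]]

    # Open-ended top bin: estimate the Pareto shape from the top two bins,
    # using P(X > x) = (x_min / x) ** alpha at the two lower bounds.
    top_lower, prev_lower = bins[-1][0], bins[-2][0]
    n_top, n_prev = counts[-1], counts[-2]
    alpha = (math.log(n_top + n_prev) - math.log(n_top)) / (
        math.log(top_lower) - math.log(prev_lower)
    )
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for shape <= 1")
    # Mean of a Pareto distribution with scale top_lower and shape alpha.
    scores.append(top_lower * alpha / (alpha - 1))
    return scores


# Hypothetical bins and counts (not the actual CCES categories).
bins = [(20_000, 30_000), (100_000, 150_000), (150_000, None)]
counts = [50, 30, 10]
scores = midpoint_pareto_scores(bins, counts)
```

The closed bins receive their midpoints, while the score for the top bin always lies above its lower bound and grows as the estimated tail becomes heavier (smaller shape parameter).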

Algorithm details. For easier exposition, define a matrix D that contains both individual characteristics and roll call preferences. Let N be the number of rows of D. For any given variable v of D, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:[27]

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion c (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1. Start with initial guesses of missing values in D
 2. w ← vector of column indices sorted by increasing fraction of NA
 3. while not c do
 4.     D_old^imp ← previously imputed D
 5.     for v in w do
 6.         Fit Random Forest: y_obs^(v) ~ x_obs^(v)
 7.         Predict y_mis^(v) using x_mis^(v)
 8.         D_new^imp ← updated imputed matrix using predicted y_mis^(v)
 9.     Update stopping criterion c
10. Return completed D^imp

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.[28]

[26, continued] respondent employs). In line with the wording used in many other US surveys, we interpret it as referring to market income.

[27] Note that this setup deals transparently with missing values in individual characteristics (such as missing education).

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).[29] No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.[30] Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$

[28] See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

[29] This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

[30] We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case this produces rather similar MRP estimates.

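We employ the notation of Gelman and Hill (2007) and denote by j[i] the category j to which individual i belongs. Here $\beta_0$ is an intercept and the αs are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance σ²:

$$\alpha^{race}_{j} \sim N\left(0, \sigma^2_{race}\right), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_{k} \sim N\left(0, \sigma^2_{gender}\right), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_{l} \sim N\left(0, \sigma^2_{age}\right), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_{m} \sim N\left(0, \sigma^2_{educ}\right), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_{n} \sim N\left(0, \sigma^2_{income}\right), \quad n = 1, \dots, 13 \tag{B6}$$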

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_{d} \sim N\left(\alpha^{state}_{s[d]}, \sigma^2_{state}\right) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
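The poststratification step described above amounts to a population-weighted average of cell-level predictions. The sketch below is illustrative only: the cell labels, predicted probabilities, and Census counts are made up, and the function names are hypothetical. It shows the inverse-logit link from (B1) and the weighting of cell predictions by their population frequencies within one district and income group.

```python
import math


def inv_logit(x):
    """Inverse logit link used in (B1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_probability(beta0, effects):
    """Predicted support in one demographic cell: logit^{-1}(beta0 + sum of alphas)."""
    return inv_logit(beta0 + sum(effects))


def poststratify(cell_probs, cell_counts):
    """Population-weighted mean of cell predictions for one district and income group."""
    total = sum(cell_counts[cell] for cell in cell_probs)
    weighted = sum(cell_probs[cell] * cell_counts[cell] for cell in cell_probs)
    return weighted / total


# Hypothetical cells within one district and income group.
cell_probs = {"cell_a": 0.6, "cell_b": 0.4}    # model-based predictions
cell_counts = {"cell_a": 300, "cell_b": 100}   # Census frequencies
estimate = poststratify(cell_probs, cell_counts)  # (0.6*300 + 0.4*100) / 400
```

Cells that are common in the district's population dominate the estimate, regardless of how many survey respondents happened to fall into them.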

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                  (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences           0.106     0.082     0.098     0.084     0.068     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences         −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences           0.182     0.158     0.181     0.162     0.115     0.115
                                (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences         −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                                (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences           0.080     0.061     0.063     0.072     0.043     0.045
                                (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences         −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                                (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentiles). Thus, in our model, voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
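The difference between district-specific and state-specific thresholds can be made concrete with a small sketch based on the Massachusetts example. The low-income cut points follow the approximate values given in the text; the upper cut points, the district labels, the treatment of incomes exactly at a threshold, and the function name are hypothetical placeholders chosen for illustration.

```python
def income_group(income, low_cut, high_cut):
    """Classify an income relative to a lower and an upper cut point.

    Incomes at or below low_cut count as "low" (an assumed convention).
    """
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"


voter_income = 40_000

# (low cut, high cut) per district; upper cut points are hypothetical.
district_cuts = {
    "typical district": (45_000, 90_000),
    "8th district": (30_000, 70_000),
}
state_cuts = (47_000, 95_000)  # upper cut point is hypothetical

by_district = {
    name: income_group(voter_income, low, high)
    for name, (low, high) in district_cuts.items()
}
by_state = income_group(voter_income, *state_cuts)
```

Under district-specific thresholds the same voter is classified differently depending on where he or she lives, whereas a state-wide threshold assigns a single group throughout the state.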


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                  (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
  Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
  High income preferences       -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
  Low income preferences         0.105     0.082     0.097     0.083     0.067     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
  High income preferences       -0.062    -0.036    -0.052    -0.050    -0.049    -0.039
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds (p20 - p80)
  Low income preferences         0.098     0.077     0.090     0.078     0.063     0.057
                                (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
  High income preferences       -0.054    -0.031    -0.046    -0.044    -0.044    -0.034
                                (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1: Total number of union certification elections in House districts (109th-112th Congress). [Map legend bins: 0-13, 13-26, 26-39, 39-62, 62-164.]

Congregations. As a proxy for district level social capital we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
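The population-weighted interpolation of congregation counts can be sketched as follows. This is a minimal illustration under stated assumptions: the crosswalk format (one tuple per county-district intersection), the function name, and the rule of allocating each county's count in proportion to the share of its population in a district are all hypothetical choices, not the exact procedure used with the MABLE weights.

```python
from collections import defaultdict


def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts by population share.

    county_counts: {county: count}
    intersections: list of (county, district, population) tuples, one per
                   county-district overlap (hypothetical crosswalk format)
    """
    county_pop = defaultdict(float)
    for county, _, pop in intersections:
        county_pop[county] += pop

    district_counts = defaultdict(float)
    for county, district, pop in intersections:
        weight = pop / county_pop[county]  # share of the county living in this district
        district_counts[district] += weight * county_counts[county]
    return dict(district_counts)


# Hypothetical toy data: county A is split across two districts, county B is not.
counts = {"A": 100, "B": 40}
crosswalk = [("A", "D1", 30_000), ("A", "D2", 10_000), ("B", "D2", 5_000)]
result = interpolate_to_districts(counts, crosswalk)
```

When every county falls entirely within one district, each weight equals one and the computation reduces to a plain sum of county counts, which matches the at-large case noted in the text.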


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                           Low income         High income
                                           preferences        preferences          N

(1) Injective preference roll call map     0.063 (0.013)     -0.041 (0.013)     11,104
(2) Extreme preferences excl.              0.074 (0.016)     -0.048 (0.015)     13,308
(3) New York excluded                      0.070 (0.015)     -0.048 (0.014)     14,730
(4) Local Union Concentration              0.065 (0.014)     -0.047 (0.014)     15,780
(5) Trimmed LPM estimator                  0.074 (0.015)     -0.055 (0.014)     15,426
(6) Errors-in-variables                    0.062 (0.004)     -0.054 (0.004)     15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. For specification (6), table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
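The per-roll-call trimming rule can be written down compactly. The sketch below is illustrative only: the nested-dictionary data layout, the district labels, and the linear-interpolation percentile rule are assumptions, and the preference values are made up.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def trim_extremes(prefs, lower_q=5, upper_q=95):
    """Drop districts whose preference lies outside [p5, p95] of its roll call.

    prefs: {roll_call: {district: preference estimate}}
    """
    trimmed = {}
    for roll_call, by_district in prefs.items():
        values = list(by_district.values())
        lo = percentile(values, lower_q)
        hi = percentile(values, upper_q)
        trimmed[roll_call] = {d: p for d, p in by_district.items() if lo <= p <= hi}
    return trimmed


# Hypothetical roll call with 21 districts and preferences 0.00, 0.05, ..., 1.00.
prefs = {"rc1": {f"d{i}": i / 20 for i in range(21)}}
kept = trim_extremes(prefs)
```

Because the cutoffs are computed separately for each roll call, a district can be excluded on one vote and retained on another.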

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
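One common way to operationalize a trimmed linear probability model is sequential trimming: fit OLS, drop observations whose fitted values fall outside the unit interval, and refit until no retained observation violates the bounds. The sketch below illustrates that idea on simulated data. It is an assumption that this matches the implementation behind specification (5); the function name, the simulated design, and the stopping rule are hypothetical.

```python
import numpy as np


def trimmed_lpm(X, y, max_iter=50):
    """Iteratively refit OLS on observations whose fitted values lie in [0, 1]."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(max_iter):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        inside = (fitted >= 0.0) & (fitted <= 1.0)
        new_keep = keep & inside          # once dropped, an observation stays dropped
        if new_keep.sum() == keep.sum():  # nothing left to trim
            break
        keep = new_keep
    return beta, keep


# Simulated binary outcome whose true probability is clipped to the unit interval.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
p = np.clip(0.5 + 0.3 * x, 0.0, 1.0)   # hypothetical data-generating process
y = (rng.uniform(size=n) < p).astype(float)
X = np.column_stack([np.ones(n), x])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # untrimmed benchmark
beta_trim, kept = trimmed_lpm(X, y)
fitted_kept = X[kept] @ beta_trim
```

Comparing `beta_ols` with `beta_trim` gives a sense of how much observations with out-of-range predictions move the estimates.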

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

[31] We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore $i$ subscripts):

$$
y_{jd} = \mu^{l} \theta^{l}_{jd} + \mu^{h} \theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}
$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$
y_{jd} = \mu^{l} \theta^{l}_{jd} + \mu^{h} \theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d' \beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
$$

$$
U_d = w_d' \beta_{m0} + r_{md} + v_d \tag{F4}
$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.
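The two selection steps and the final OLS step can be illustrated with a stripped-down sketch. Several simplifications are assumptions made here for brevity: a single treatment coefficient stands in for the interacted terms $\eta^{l}$ and $\eta^{h}$, a plain LASSO solved by coordinate descent with a fixed, hand-picked penalty replaces the root-LASSO, and the data are simulated. Function names and parameter values are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]                      # remove column j's contribution
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft threshold
            resid -= X[:, j] * b[j]
    return b


def post_double_selection(y, u, W, alpha_y, alpha_u):
    """Select controls from both reduced forms, then run OLS with their union."""
    Wc = (W - W.mean(axis=0)) / W.std(axis=0)
    I1 = np.flatnonzero(lasso_cd(Wc, y - y.mean(), alpha_y))   # outcome equation
    I2 = np.flatnonzero(lasso_cd(Wc, u - u.mean(), alpha_u))   # treatment equation
    I = np.union1d(I1, I2)
    X = np.column_stack([np.ones(len(y)), u, Wc[:, I]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], I


# Simulated example: control 0 drives the treatment, controls 0 and 1 drive the outcome.
rng = np.random.default_rng(1)
n, p = 1000, 50
W = rng.normal(size=(n, p))
u = 1.0 * W[:, 0] + rng.normal(size=n)
y = 0.5 * u + 0.2 * W[:, 0] + 1.0 * W[:, 1] + rng.normal(size=n)

eta_hat, selected = post_double_selection(y, u, W, alpha_y=0.1, alpha_u=0.1)
```

Taking the union of both selected sets is what protects the treatment coefficient: a control that only weakly predicts the outcome still enters the final regression if it strongly predicts the treatment.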

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).[32] Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

[32] For a very general discussion see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}, \theta^{h}_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$
y = Kc \tag{G1}
$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$
K = \exp\left( -\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2} \right) \tag{G2}
$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared $L_2$ penalty to the least squares criterion:

$$
c^{*} = \operatorname*{argmin}_{c \in \mathbb{R}^{D}} \left[ (y - Kc)'(y - Kc) + \lambda c' K c \right] \tag{G3}
$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
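The closed-form estimator and the leave-one-out choice of $\lambda$ can be sketched in a few lines. The example below is a hedged illustration on simulated data, not the software used in the paper: the kernel is written in the standard pairwise form $K_{ij} = \exp(-\lVert z_i - z_j \rVert^2 / \sigma^2)$, the candidate grid for $\lambda$ is arbitrary, and the leave-one-out residuals use the usual ridge-type shortcut $c_i / [(K + \lambda I)^{-1}]_{ii}$ rather than refitting $N$ times.

```python
import numpy as np


def gaussian_kernel(Z, sigma2):
    """Pairwise Gaussian kernel matrix K_ij = exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)


def krls_weights(K, y, lam):
    """Closed-form solution c* = (K + lam * I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)


def loo_loss(K, y, lam):
    """Sum of squared leave-one-out residuals, computed without refitting."""
    G_inv = np.linalg.inv(K + lam * np.eye(len(y)))
    loo_resid = (G_inv @ y) / np.diag(G_inv)
    return float(loo_resid @ loo_resid)


# Simulated data with a pure interaction that is never specified explicitly.
rng = np.random.default_rng(0)
n, D = 150, 2
Z = rng.normal(size=(n, D))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # standardize covariates
y = Z[:, 0] * Z[:, 1] + 0.1 * rng.normal(size=n)
y = y - y.mean()

K = gaussian_kernel(Z, sigma2=D)           # sigma^2 = D as in the text
grid = [0.01, 0.1, 1.0, 10.0]              # hypothetical candidate values for lambda
best_lam = min(grid, key=lambda lam: loo_loss(K, y, lam))
c_star = krls_weights(K, y, best_lam)
fitted = K @ c_star
```

A grid search is used here only for transparency; any one-dimensional optimizer over $\lambda$ would serve the same purpose.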

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

$$
\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^2} \sum_{i} c_i \exp\left( -\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2} \right) \left( Z^{U_d}_{d} - z^{U_d}_{j} \right) \tag{G4}
$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

Figure B1: Illustration of Small Area Estimation of District Preferences

[Diagram: units $1, \dots, m$ (CCES) have observed covariates $Z_{i1}, \dots, Z_{iK}$ and observed preferences $Y_{i1}, \dots, Y_{iP}$; units $m+1, \dots, m+n$ (Census) have observed covariates but missing (NA) preferences. A random forest is trained on the CCES rows, $Y_p = f(Z)$, and used to predict $Y^*_p = f(Z)$ for the Census rows.]

We use a sample of $m$ individuals from the CCES that is not necessarily representative at the district level, while a sample of $n$ individuals from the Census is representative of district populations by design (Torrieri et al. 2014, Ch. 4). We have access to bridging covariates $Z_k$ that are common to both samples, while roll call preferences $Y_p$ are only observed in the CCES. We train a flexible non-parametric model relating $Y_p$ to $Z$ and use it to predict preferences $Y^*_p$ for Census individuals with characteristics $Z$. With preference values filled in, a district's income-group specific roll call preference can be estimated as the average of all units in that district.

areas. We create a synthetic Census sample for Congressional districts by sampling individuals from the full Census PUMA regions proportional to their relative share in a given district. This information is based on a crosswalk from PUMA regions to Congressional districts, constructed by deriving one geography from the other using Census tract level population data in the MABLE Geocorr2K database. The 'donor pool' for this synthetic sample are the 1% extracts of the American Community Survey 2006-2011. We limit the sample to non-group quarter households and to individuals aged 17 and older, providing us with data on 14 million (13,711,248) Americans. From this we create the synthetic district file, which is comprised of 3,040,265 cases. This provides us with a Census sample including Congressional district identifiers. The sample for each district is representative of the district population (save for errors induced by the crosswalk). We thus use the distribution of important population characteristics (age, gender, education, race, income) to match data on policy preferences from the CCES.

We harmonize all covariates to be comparable between the CCES and the Census. For family income, this entails an adjustment to the measure provided in the CCES, which asks respondents to place their family's total household income into 14 income bins.26 We transform this discretized measure of income into a continuous one using a nonparametric midpoint

26 The exact question wording is "Thinking back over the last year, what was your family's annual income?" The obvious issue here is that it is not clear which income concept this refers to (or rather, which one the


Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
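The midpoint scoring with a Pareto-imputed top bin can be sketched as follows. This is a minimal illustration, not the authors' code: the bin edges and counts are invented, and the shape parameter is estimated from the last two bins, which is one common variant of the midpoint-Pareto approach and is assumed here rather than taken from the text.

```python
import math

def pareto_shape(lower_prev, lower_top, n_prev, n_top):
    """Pareto shape implied by the counts in the last closed bin and the open top bin.

    Under a Pareto tail, the share above x is proportional to x**(-alpha), so
    (n_prev + n_top) / n_top = (lower_top / lower_prev) ** alpha.
    """
    return (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(lower_top) - math.log(lower_prev)
    )

def income_scores(lower_edges, counts):
    """Assign one score per bin: midpoints for closed bins, Pareto mean for the top bin.

    lower_edges[k] is the lower bound of bin k; the last bin is open-ended.
    """
    # Closed bins: midpoint between this bin's lower edge and the next one's.
    scores = [(lo + hi) / 2 for lo, hi in zip(lower_edges[:-1], lower_edges[1:])]
    alpha = pareto_shape(lower_edges[-2], lower_edges[-1], counts[-2], counts[-1])
    if alpha <= 1:
        raise ValueError("Pareto mean is undefined for alpha <= 1")
    # Mean of a Pareto distribution truncated from below at the top bin's lower edge.
    scores.append(lower_edges[-1] * alpha / (alpha - 1))
    return scores

# Hypothetical four-bin example (the CCES uses 14 bins).
edges = [0, 10_000, 20_000, 30_000]
counts = [150, 250, 300, 100]
scores = income_scores(edges, counts)
```

With these toy inputs the third bin receives the score 25,000, matching the example in the text, and the open-ended bin receives a value above its lower edge of 30,000.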

Algorithm details. For easier exposition, define a matrix $D$ that contains both individual characteristics and roll call preferences. Let $N$ be the number of rows of $D$. For any given variable $v$ of $D$, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion $c$ (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests
 1. Start with initial guesses of missing values in $D$
 2. $w \leftarrow$ vector of column indices sorted by increasing fraction of NA
 3. while not $c$ do
 4.     $D^{imp}_{old} \leftarrow$ previously imputed $D$
 5.     for $v$ in $w$ do
 6.         Fit Random Forest: $y^{(v)}_{obs} \sim x^{(v)}_{obs}$
 7.         Predict $y^{(v)}_{mis}$ using $x^{(v)}_{mis}$
 8.         $D^{imp}_{new} \leftarrow$ updated imputed matrix using predicted $y^{(v)}_{mis}$
 9.     Update stopping criterion $c$
10. Return completed $D^{imp}$
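The control flow of Algorithm 1 can be sketched in a few lines. This is an illustrative sketch under stated assumptions: to keep it dependency-free, a one-nearest-neighbour predictor stands in for the random forest (in practice a random forest learner would be passed as `fit_predict`), missing values are coded as `None`, initial guesses are column means, and the stopping rule is a squared-change tolerance. The function names are hypothetical.

```python
import math

def knn_fit_predict(x_obs, y_obs, x_mis):
    """Stand-in learner (NOT a random forest): copy the value of the nearest observed row."""
    preds = []
    for row in x_mis:
        nearest = min(range(len(x_obs)), key=lambda i: math.dist(x_obs[i], row))
        preds.append(y_obs[nearest])
    return preds

def chained_impute(D, fit_predict=knn_fit_predict, tol=1e-8, max_iter=20):
    """Chained imputation loop following the steps of Algorithm 1 (None marks a missing cell)."""
    n, p = len(D), len(D[0])
    miss = [[D[i][v] is None for v in range(p)] for i in range(n)]
    imp = [row[:] for row in D]
    # Step 1: initial guesses, here the mean of the observed values in each column.
    for v in range(p):
        obs = [D[i][v] for i in range(n) if not miss[i][v]]
        mean_v = sum(obs) / len(obs)
        for i in range(n):
            if miss[i][v]:
                imp[i][v] = mean_v
    # Step 2: visit columns in order of increasing number of missing entries.
    w = sorted(range(p), key=lambda v: sum(miss[i][v] for i in range(n)))
    for _ in range(max_iter):                      # Step 3
        old = [row[:] for row in imp]              # Step 4
        for v in w:                                # Step 5
            mis_rows = [i for i in range(n) if miss[i][v]]
            if not mis_rows:
                continue
            obs_rows = [i for i in range(n) if not miss[i][v]]
            others = [u for u in range(p) if u != v]
            x_obs = [[imp[i][u] for u in others] for i in obs_rows]
            y_obs = [imp[i][v] for i in obs_rows]
            x_mis = [[imp[i][u] for u in others] for i in mis_rows]
            preds = fit_predict(x_obs, y_obs, x_mis)   # Steps 6-7
            for i, pred in zip(mis_rows, preds):       # Step 8
                imp[i][v] = pred
        # Step 9: stop once the filled-in values no longer change.
        change = sum((imp[i][v] - old[i][v]) ** 2 for i in range(n) for v in range(p))
        if change < tol:
            break
    return imp                                     # Step 10

# Toy data: the second column is missing for the third unit.
toy = [[1.0, 2.0], [2.0, 4.0], [3.4, None], [4.0, 8.0]]
completed = chained_impute(toy)
```

Passing the learner as an argument keeps the loop itself independent of the prediction model, so the same skeleton applies whether the learner is a random forest or any other predictor.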

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

B.2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


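model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\mathrm{race}}_{j[i]} + \alpha^{\mathrm{gender}}_{k[i]} + \alpha^{\mathrm{age}}_{l[i]} + \alpha^{\mathrm{educ}}_{m[i]} + \alpha^{\mathrm{income}}_{n[i]} + \alpha^{\mathrm{district}}_{d[i]}\right) \qquad \text{(B1)}$$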

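We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{\mathrm{race}}_{j} \sim N(0, \sigma^2_{\mathrm{race}}), \quad j = 1, \dots, 3 \qquad \text{(B2)}$$

$$\alpha^{\mathrm{gender}}_{k} \sim N(0, \sigma^2_{\mathrm{gender}}), \quad k = 1, 2 \qquad \text{(B3)}$$

$$\alpha^{\mathrm{age}}_{l} \sim N(0, \sigma^2_{\mathrm{age}}), \quad l = 1, \dots, 6 \qquad \text{(B4)}$$

$$\alpha^{\mathrm{educ}}_{m} \sim N(0, \sigma^2_{\mathrm{educ}}), \quad m = 1, \dots, 5 \qquad \text{(B5)}$$

$$\alpha^{\mathrm{income}}_{n} \sim N(0, \sigma^2_{\mathrm{income}}), \quad n = 1, \dots, 13 \qquad \text{(B6)}$$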

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\mathrm{state}}_{s[d]}$ and freely estimated variance:

$$\alpha^{\mathrm{district}}_{d} \sim N\left(\alpha^{\mathrm{state}}_{s[d]}, \sigma^2_{\mathrm{state}}\right) \qquad \text{(B7)}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
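As a rough illustration of the prediction and weighting steps, the sketch below turns summed effects into a cell-level support probability via the inverse logit in (B1) and then averages cell predictions using population counts. All effect values, cell labels, and counts are invented for illustration; they are not estimates from the paper.

```python
import math

def inv_logit(x):
    """Inverse logit link used in equation (B1)."""
    return 1.0 / (1.0 + math.exp(-x))

def cell_prediction(beta0, effects):
    """Predicted support in one demographic cell: intercept plus the cell's group effects."""
    return inv_logit(beta0 + sum(effects))

def poststratify(cell_pred, cell_pop):
    """Population-weighted average of cell predictions within one district and income group."""
    total = sum(cell_pop.values())
    return sum(cell_pred[c] * cell_pop[c] for c in cell_pop) / total

# Hypothetical cells within one district-income group.
cell_pred = {
    "cell_a": cell_prediction(0.0, [0.4, -0.1]),
    "cell_b": cell_prediction(0.0, [-0.6, 0.2]),
}
cell_pop = {"cell_a": 300, "cell_b": 100}
district_estimate = poststratify(cell_pred, cell_pop)
```

Because the weights come from Census counts rather than from the survey, a cell that is over-represented among respondents does not dominate the district estimate.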

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences        0.182     0.158     0.181     0.162     0.115     0.115
                             (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences      −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                             (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences        0.080     0.061     0.063     0.072     0.043     0.045
                             (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences      −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                             (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C), we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
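The difference between district-specific and state-specific thresholds can be made concrete with a small sketch. The incomes below (in $1,000) are invented and do not correspond to any actual district; the nearest-rank percentile rule and the function names are assumptions chosen for simplicity.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile of a list, for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def income_group(income, reference_incomes, low_q=33, high_q=66):
    """Classify an income relative to thresholds computed from a reference population."""
    low_cut = percentile(reference_incomes, low_q)
    high_cut = percentile(reference_incomes, high_q)
    if income <= low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Two hypothetical districts in the same state (incomes in $1,000).
richer_district = [30, 45, 50, 60, 80, 100]
poorer_district = [15, 25, 30, 45, 50, 60]
state = richer_district + poorer_district

voter_income = 40
by_district = {
    "richer": income_group(voter_income, richer_district),
    "poorer": income_group(voter_income, poorer_district),
}
by_state = income_group(voter_income, state)
```

In this toy setup the same voter is classified as low income relative to the richer district but as middle income relative to the poorer one, while a state-wide threshold yields a single classification. Passing `low_q=20, high_q=80` gives the shifted thresholds used in panel (C).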


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds (p20 - p80)
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
× state constants: X
× organizational capacity: X X
× district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency


Figure D1: Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 0 to 13, 13 to 26, 26 to 39, 39 to 62, 62 to 164.]

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).
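The interpolation step can be illustrated with a short sketch. It assumes that each weight is the share of a county's population living in a given district and that county counts are allocated proportionally to those shares; the exact weighting used in the paper may differ, and the county names, district labels, and counts below are invented.

```python
def interpolate_to_districts(county_counts, weights):
    """Allocate county-level counts to districts using population-share weights.

    weights maps (county, district) to the share of the county's population
    that lives in the district; shares for one county sum to 1.
    """
    district_counts = {}
    for (county, district), share in weights.items():
        district_counts[district] = (
            district_counts.get(district, 0.0) + share * county_counts[county]
        )
    return district_counts

# Hypothetical example: county A straddles two districts, county B lies inside one.
county_counts = {"A": 100, "B": 40}
weights = {("A", "d1"): 0.25, ("A", "d2"): 0.75, ("B", "d2"): 1.0}
district_counts = interpolate_to_districts(county_counts, weights)

# At-large state: every county is nested in the single district, so all shares are 1.
at_large = interpolate_to_districts(county_counts, {("A", "AL"): 1.0, ("B", "AL"): 1.0})
```

In the at-large case every share equals one, so the district total is simply the sum of the county counts, which mirrors the special case noted in the text.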


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                           Low income         High income
                                           preferences        preferences          N

(1) Injective preference roll call map     0.063 (0.013)     −0.041 (0.013)     11,104
(2) Extreme preferences excl.              0.074 (0.016)     −0.048 (0.015)     13,308
(3) New York excluded                      0.070 (0.015)     −0.048 (0.014)     14,730
(4) Local Union Concentration              0.065 (0.014)     −0.047 (0.014)     15,780
(5) Trimmed LPM estimator                  0.074 (0.015)     −0.055 (0.014)     15,426
(6) Errors-in-variables                    0.062 (0.004)     −0.054 (0.004)     15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation


still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model equation (Robinson 1988) (for ease of exposition we omit

31 We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.


district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \qquad \text{(F1)}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \qquad \text{(F2)}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context).

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \qquad \text{(F3)}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \qquad \text{(F4)}$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
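The two selection steps followed by OLS on the union of selected controls can be sketched as below. This is a simplified illustration under several assumptions: a plain LASSO fitted by coordinate descent stands in for the root-LASSO, the penalty is fixed by hand, there is a single treatment without the preference interactions of (F1), and the data are simulated. All function names are hypothetical.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def soft(x, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def lasso_cd(cols, y, lam, n_iter=100):
    """Plain LASSO by coordinate descent on centered data (stand-in for root-LASSO)."""
    n = len(y)
    cols = [center(c) for c in cols]
    resid = center(y)
    b = [0.0] * len(cols)
    scale = [sum(x * x for x in c) / n for c in cols]
    for _ in range(n_iter):
        for j, c in enumerate(cols):
            rho = sum(x * r for x, r in zip(c, resid)) / n + scale[j] * b[j]
            new = soft(rho, lam) / scale[j]
            if new != b[j]:
                delta = b[j] - new
                resid = [r + x * delta for x, r in zip(c, resid)]
                b[j] = new
    return b

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(columns, y):
    """OLS coefficients via the normal equations; columns is a list of regressors."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)

def post_double_selection(y, u, controls, lam):
    """Select controls for y and for u separately, then run OLS of y on u plus their union."""
    i1 = {j for j, bj in enumerate(lasso_cd(controls, y, lam)) if bj != 0.0}
    i2 = {j for j, bj in enumerate(lasso_cd(controls, u, lam)) if bj != 0.0}
    union = sorted(i1 | i2)
    design = [[1.0] * len(y), u] + [controls[j] for j in union]
    return ols(design, y)[1], i1, i2

def simulate(seed=0, n=400, p=8):
    """Simulated data: control 0 drives the treatment, controls 0 and 1 drive the outcome."""
    rng = random.Random(seed)
    controls = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    u = [controls[0][i] + 0.5 * rng.gauss(0, 1) for i in range(n)]
    y = [0.5 * u[i] + 0.3 * controls[0][i] + controls[1][i] + 0.5 * rng.gauss(0, 1) for i in range(n)]
    return y, u, controls
```

A call such as `post_double_selection(*simulate(), lam=0.1)` returns the estimated treatment coefficient together with the two selected control sets. Keeping the control selected in the treatment equation is what protects the final OLS step against omitting a confounder whose direct effect on the outcome is only moderate.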

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\ \theta^{h}_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \qquad \text{(G1)}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \qquad \text{(G2)}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via

32 For a very general discussion, see Belloni et al. (2017).


regularization by adding a squared $L_2$ penalty to the least squares criterion:

$$c^{*} = \underset{c \in \mathbb{R}^{D}}{\arg\min}\ \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \qquad \text{(G3)}$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
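The closed-form solution $c^{*} = (K + \lambda I)^{-1} y$ can be illustrated with a minimal sketch. It assumes a fixed, hand-picked $\lambda$ (the text selects it by leave-one-out loss), uses unstandardized toy covariates, and relies on a small hand-written linear solver to stay dependency-free; the function names are hypothetical.

```python
import math

def gaussian_kernel(Z, sigma2):
    """N x N Gaussian kernel matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    n = len(Z)
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(Z[i], Z[j])) / sigma2) for j in range(n)]
        for i in range(n)
    ]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def krls_fit(Z, y, lam):
    """Return the weights c* = (K + lam * I)^(-1) y and the fitted values K c*."""
    n = len(y)
    K = gaussian_kernel(Z, sigma2=float(len(Z[0])))  # sigma^2 = number of columns in z
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    c = solve(A, y)
    fitted = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
    return c, fitted

# Toy data with a pure interaction pattern that an additive linear model cannot fit.
Z_toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_toy = [0.0, 1.0, 1.0, 0.0]
c_star, fitted = krls_fit(Z_toy, y_toy, lam=0.1)
```

Larger values of `lam` shrink the weights and smooth the fitted surface, while values close to zero let the fitted values track the observed outcomes almost exactly.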

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^2} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right)\left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \qquad \text{(G4)}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ilwu puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents’ responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [wwwvanderbilteducsdiincludesWorking Paper 5 2017pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the us congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics. Tenth World Congress. Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the us senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in congress: A study of the north american free trade agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the american states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of american politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [httphonakrpapersfilessmallAreaEstimationpdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using mrp? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [wwwnoamlupucomAampCpdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the us mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of american community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the american states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The northeast regional center for rural development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). Missforest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

Pareto estimator (Henson 1967). It replaces each bin with its midpoint (e.g., the third category, $20,000 to $29,999, gets assigned $25,000), while the value for the final, open-ended bin is imputed from a Pareto distribution (e.g., Kopczuk et al. 2010). Using midpoints has been recognized for some time as an appropriate way to create scores for income categories (without making explicit distributional modeling assumptions). They have been used extensively, for example, in the American politics literature analyzing General Social Survey (GSS) data (Hout 2004).
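The scoring rule can be illustrated with a small Python sketch. It shows one common variant of the midpoint/Pareto approach, in which the Pareto shape is estimated from the respondent counts in the last two bins; whether this matches the exact estimator used here is an assumption. The bin bounds, counts, and function names are hypothetical.

```python
import math

def midpoint(lower, upper):
    """Score for a closed income bin: the midpoint of its bounds."""
    return (lower + upper) / 2.0

def pareto_shape(l_prev, n_prev, l_top, n_top):
    """Pareto shape implied by the counts in the last two bins.

    l_prev, l_top: lower bounds of the second-to-last and the open-ended bin.
    n_prev, n_top: number of respondents in each of these two bins.
    """
    return (math.log(n_prev + n_top) - math.log(n_top)) / (
        math.log(l_top) - math.log(l_prev)
    )

def pareto_top_mean(l_prev, n_prev, l_top, n_top):
    """Mean income in the open-ended top bin under the fitted Pareto tail."""
    alpha = pareto_shape(l_prev, n_prev, l_top, n_top)
    if alpha <= 1.0:
        raise ValueError("Pareto shape <= 1: the tail mean is not finite")
    return l_top * alpha / (alpha - 1.0)

# Hypothetical bins; the upper bound is taken as the next bin's lower bound.
bins = [(10000, 20000), (20000, 30000), (120000, 150000)]
scores = [midpoint(lo, hi) for lo, hi in bins]

# Hypothetical counts: 300 respondents in [120k, 150k), 100 in the 150k+ bin.
alpha = pareto_shape(120000, 300, 150000, 100)
top_score = pareto_top_mean(120000, 300, 150000, 100)
```

The shape estimate is chosen so that the fitted tail reproduces the observed share of respondents above the top threshold among those above the previous one.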

Algorithm details. For easier exposition, define a matrix D that contains both individual characteristics and roll call preferences. Let N be the number of rows of D. For any given variable v of D, $D_v$, with missing entries at locations $i^{(v)}_{mis} \subseteq \{1, \dots, N\}$, we can separate out four parts:27

• Observed values of $D_v$, denoted as $y^{(v)}_{obs}$

• Missing values of $D_v$: $y^{(v)}_{mis}$

• Variables other than $D_v$ with available observations $i^{(v)}_{obs} = \{1, \dots, N\} \setminus i^{(v)}_{mis}$: $x^{(v)}_{obs}$

• Variables other than $D_v$ with observations $i^{(v)}_{mis}$: $x^{(v)}_{mis}$

We now cycle through variables, iteratively fitting random forests and filling in unobserved values until a stopping criterion c (indicating no further change in filled-in values) is met. Algorithmically, we proceed as follows:

Algorithm 1: Chained Random Forests

Start with initial guesses of missing values in D
w ← vector of column indices sorted by increasing fraction of NA
while not c do
    D^imp_old ← previously imputed D
    for v in w do
        Fit Random Forest: y^(v)_obs ∼ x^(v)_obs
        Predict y^(v)_mis using x^(v)_mis
        D^imp_new ← updated imputed matrix using predicted y^(v)_mis
    Update stopping criterion c
Return completed D^imp
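The control flow of Algorithm 1 can be sketched in Python as follows. This is an illustration of the chaining loop only, not the implementation used in the paper. To stay dependency-free, a deliberately simple nearest-neighbour learner stands in for the random forest; `nearest_neighbour_predict`, `chained_impute`, the mean-based initial guesses, and the toy matrix are all assumptions. Any learner with the same `(X_obs, y_obs, X_mis)` interface could be passed in instead.

```python
def nearest_neighbour_predict(X_obs, y_obs, X_mis):
    """Stand-in learner: copy the outcome of the closest observed row."""
    preds = []
    for x in X_mis:
        best = min(range(len(X_obs)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X_obs[i], x)))
        preds.append(y_obs[best])
    return preds

def chained_impute(D, fit_predict=nearest_neighbour_predict, max_iter=10, tol=1e-8):
    """Fill None entries of D column by column until the imputations stop changing."""
    n, p = len(D), len(D[0])
    missing = [[D[i][v] is None for v in range(p)] for i in range(n)]
    # Initial guesses: observed column means (assumes no column is fully missing).
    imp = [row[:] for row in D]
    for v in range(p):
        obs = [D[i][v] for i in range(n) if not missing[i][v]]
        mean_v = sum(obs) / len(obs)
        for i in range(n):
            if missing[i][v]:
                imp[i][v] = mean_v
    # Visit incomplete columns in order of increasing fraction of missing values.
    order = sorted((v for v in range(p) if any(missing[i][v] for i in range(n))),
                   key=lambda v: sum(missing[i][v] for i in range(n)))
    for _ in range(max_iter):
        old = [row[:] for row in imp]
        for v in order:
            others = [u for u in range(p) if u != v]
            obs_rows = [i for i in range(n) if not missing[i][v]]
            mis_rows = [i for i in range(n) if missing[i][v]]
            X_obs = [[imp[i][u] for u in others] for i in obs_rows]
            y_obs = [imp[i][v] for i in obs_rows]
            X_mis = [[imp[i][u] for u in others] for i in mis_rows]
            for i, pred in zip(mis_rows, fit_predict(X_obs, y_obs, X_mis)):
                imp[i][v] = pred
        # Stopping criterion: total squared change in the imputed matrix.
        change = sum((imp[i][v] - old[i][v]) ** 2 for i in range(n) for v in range(p))
        if change < tol:
            break
    return imp

# Toy matrix, made up for illustration; None marks a missing entry.
D = [[1.0, 2.0, 3.0], [2.0, None, 6.0], [3.0, 6.0, None], [4.0, 8.0, 12.0]]
completed = chained_impute(D)
```

Observed entries are never overwritten; only cells that were missing in the input are updated in each pass.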

To assess the quality of this scheme, we inspect the prediction error of the random forests using the out-of-bag (OOB) estimate (which can be obtained during the bootstrap for each

respondent employs). In line with the wording used in many other U.S. surveys, we interpret it as referring to market income.

27 Note that this setup deals transparently with missing values in individual characteristics (such as missing education).


tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

B2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP-estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case, this produces rather similar MRP estimates.


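model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$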

We employ the notation of Gelman and Hill (2007) and denote by j[i] the category j to which individual i belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, 2 \tag{B3}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha_{s[d]}$ and freely estimated variance:

$$\alpha_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
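The poststratification step amounts to a population-weighted average of cell predictions. The sketch below illustrates it in Python with a single demographic dimension; the coefficient values, cell labels, and Census counts are hypothetical, and the function names are ours rather than part of any library.

```python
import math

def inv_logit(x):
    """Inverse logit link, mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def poststratify(cell_probs, cell_counts):
    """Average cell-level predictions, weighted by population counts per cell."""
    total = sum(cell_counts.values())
    return sum(cell_probs[cell] * n for cell, n in cell_counts.items()) / total

# Hypothetical estimated effects for one roll call in one district.
beta0 = -0.2
alpha_educ = {"hs": -0.3, "college": 0.4}
alpha_district = 0.1

# Cell-specific predicted support from the (simplified) hierarchical model.
cell_probs = {e: inv_logit(beta0 + a + alpha_district) for e, a in alpha_educ.items()}

# Hypothetical Census population counts for the same cells in this district.
cell_counts = {"hs": 3000, "college": 1000}

district_support = poststratify(cell_probs, cell_counts)
```

In the full model the cells are defined by the cross-classification of all demographic attributes, and the same weighting is applied separately within each income group.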

B3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP-estimated preferences leads to more pronounced estimates in all specifications. Using specification (6),


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1: Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences       0.106     0.082     0.098     0.084     0.068     0.062
                            (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences     −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                            (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences       0.182     0.158     0.181     0.162     0.115     0.115
                            (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences     −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                            (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences       0.080     0.061     0.063     0.072     0.043     0.045
                            (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences     −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                            (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken


as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification, a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as ‘low income’ relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts’ districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C) we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
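The effect of the reference population on group assignment can be illustrated with a short Python sketch. The income lists are made up (they are not the Massachusetts figures quoted above), and the linear-interpolation percentile rule and the names `percentile` and `income_group` are assumptions for illustration.

```python
def percentile(values, q):
    """Percentile q (0 to 100) of a non-empty list, by linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def income_group(income, reference_incomes, lower_q=33, upper_q=66):
    """Classify an income relative to thresholds in a reference population."""
    low_cut = percentile(reference_incomes, lower_q)
    high_cut = percentile(reference_incomes, upper_q)
    if income < low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical incomes in a richer and a poorer district.
district_a = [20000 + 10000 * i for i in range(10)]
district_b = [10000 + 5000 * i for i in range(10)]

# District-specific thresholds: the same income lands in different groups.
group_in_a = income_group(40000, district_a)
group_in_b = income_group(40000, district_b)

# State-specific thresholds pool both districts into one reference population.
group_in_state = income_group(40000, district_a + district_b)

# Shifted thresholds (p20 / p80) as in panel (C).
group_in_a_p20 = income_group(40000, district_a, lower_q=20, upper_q=80)
```

Switching from district-specific to state-specific thresholds only changes the `reference_incomes` argument; the classification rule itself stays the same.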


Table C1: Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                              (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences       0.106     0.082     0.098     0.084     0.068     0.062
                            (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences     −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                            (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences       0.105     0.082     0.097     0.083     0.067     0.062
                            (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences     −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                            (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences       0.098     0.077     0.09      0.078     0.063     0.057
                            (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences     −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                            (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is “the exception rather than the norm because employers typically refuse to recognize unions voluntarily.”

We use the NLRB’s database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency


Figure D1: Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 0 − 13, 13 − 26, 26 − 39, 39 − 62, 62 − 164]

database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
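As an illustration of the interpolation step, the following Python sketch allocates each county's count to districts in proportion to the share of the county's population living in each county-district intersection. This is one simple variant of population-weighted areal interpolation and an assumption about the details; the county and district labels, counts, and populations are hypothetical.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts by population share.

    county_counts: {county: count}
    intersections: list of (county, district, population in the overlap)
    """
    county_pop = defaultdict(float)
    for county, _, pop in intersections:
        county_pop[county] += pop
    district_counts = defaultdict(float)
    for county, district, pop in intersections:
        share = pop / county_pop[county]  # weight of this intersection
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)

# Hypothetical example: county A is split between two districts,
# county B lies entirely within district 2.
county_counts = {"A": 100, "B": 40}
intersections = [("A", 1, 3000), ("A", 2, 1000), ("B", 2, 5000)]
by_district = interpolate_to_districts(county_counts, intersections)

# At-large case: every county is nested in one district, so weights equal one
# and the interpolation reduces to a simple sum of county counts.
at_large = interpolate_to_districts(county_counts, [("A", 1, 4000), ("B", 1, 5000)])
```

Because the weights within each county sum to one, the total count across districts equals the total across counties.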


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E.1: Additional robustness tests

                                           Low income        High income
                                           preferences       preferences        N
(1) Injective preference-roll call map     0.063 (0.013)     -0.041 (0.013)    11,104
(2) Extreme preferences excl.              0.074 (0.016)     -0.048 (0.015)    13,308
(3) New York excluded                      0.070 (0.015)     -0.048 (0.014)    14,730
(4) Local Union Concentration              0.065 (0.014)     -0.047 (0.014)    15,780
(5) Trimmed LPM estimator                  0.074 (0.015)     -0.055 (0.014)    15,426
(6) Errors-in-variables                    0.062 (0.004)     -0.054 (0.004)    15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
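The trimming rule can be sketched as follows. This is an illustrative sketch only: the linear-interpolation percentile rule, the function names, and the toy preference values are assumptions, not a description of the code used for the reported estimates.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def trim_districts(prefs, lower=5, upper=95):
    """Keep districts whose preference estimate for one roll call lies
    within the [lower, upper] percentile range."""
    vals = sorted(prefs.values())
    lo_cut, hi_cut = percentile(vals, lower), percentile(vals, upper)
    return {d: p for d, p in prefs.items() if lo_cut <= p <= hi_cut}

# Hypothetical preference estimates for 21 districts on one roll call.
prefs = {d: d / 20 for d in range(21)}
kept = trim_districts(prefs)
```

The rule is applied separately for each roll call, so a district can be dropped on one vote and retained on another.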

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical, specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E.1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

F Post-Double-Selection Estimator

31 We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean-field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom, mean 1, and SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore i subscripts):

\[
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F.1}
\]

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

\[
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F.2}
\]

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

\[
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F.3}
\]

\[
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F.4}
\]

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F.3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F.4) for $U_d$ in (F.3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F.1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.
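The two selection steps and the final OLS step can be sketched in a few lines. This is a simplified illustration under loudly stated assumptions: it uses a plain LASSO fitted by coordinate descent rather than the root-LASSO, a single treatment without preference interactions or fixed effects, a hand-picked penalty `lam`, and simulated toy data. All function names are hypothetical.

```python
import numpy as np

def lasso_support(W, target, lam, n_iter=200):
    """Indices of non-zero coefficients from a plain LASSO fitted by
    coordinate descent (columns of W are assumed to be standardized)."""
    n, p = W.shape
    beta = np.zeros(p)
    resid = target - target.mean()
    for _ in range(n_iter):
        for k in range(p):
            resid = resid + W[:, k] * beta[k]   # add back column k's contribution
            rho = W[:, k] @ resid / n
            beta[k] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-threshold
            resid = resid - W[:, k] * beta[k]
    return {int(k) for k in np.flatnonzero(beta)}

def post_double_selection(y, u, W, lam):
    """Select controls that predict the outcome or the treatment,
    then run OLS of y on the treatment and the union of both sets."""
    i1 = lasso_support(W, y, lam)   # controls predicting the outcome
    i2 = lasso_support(W, u, lam)   # controls predicting the treatment
    selected = sorted(i1 | i2)
    X = np.column_stack([np.ones(len(y)), u, W[:, selected]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], selected        # treatment coefficient, control set

# Simulated toy data (not our data): column 0 of W confounds treatment and outcome.
rng = np.random.default_rng(0)
n, p = 500, 30
W = rng.standard_normal((n, p))
W = (W - W.mean(axis=0)) / W.std(axis=0)
u = 1.0 * W[:, 0] + rng.standard_normal(n)
y = 0.5 * u + 0.3 * W[:, 0] + rng.standard_normal(n)
eta_hat, selected = post_double_selection(y, u, W, lam=0.2)
```

Taking the union of both selected sets is what protects the treatment coefficient: a control with a moderate outcome coefficient still enters the final regression if it strongly predicts the treatment.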

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\ \theta^{h}_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

\[
y = Kc \tag{G.1}
\]

Here $K$ is an $N \times N$ Gaussian kernel matrix

\[
K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \tag{G.2}
\]

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

\[
c^{*} = \underset{c \in \mathbb{R}^{D}}{\operatorname{argmin}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G.3}
\]

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^{2}$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^{2} = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

32 For a very general discussion see Belloni et al. (2017).
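The closed-form solution for the weights can be illustrated with a short sketch. It is a minimal illustration, assuming a fixed, hand-picked $\lambda$ (rather than one chosen by leave-one-out loss) and simulated toy data; the function names are hypothetical and this is not the KRLS software used for the reported results.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_norms = (Z ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2 * Z @ Z.T
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma2)

def krls_fit(Z, y, lam):
    """Solve c* = (K + lam * I)^{-1} y with sigma^2 set to the number of columns."""
    sigma2 = Z.shape[1]
    K = gaussian_kernel(Z, sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return K, c

# Simulated toy data (not our data): 50 observations, 3 covariates.
rng = np.random.default_rng(1)
Z = rng.standard_normal((50, 3))
y = np.sin(Z[:, 0]) + 0.1 * rng.standard_normal(50)
lam = 0.5                      # assumed value, fixed here for illustration
K, c = krls_fit(Z, y, lam)
y_hat = K @ c                  # fitted values implied by (G.1)
```

Larger values of `lam` shrink the weights and give smoother fitted surfaces, at the cost of a looser in-sample fit.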

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

\[
\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^{2}} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G.4}
\]

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents’ responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem Deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

tree). We find it to be rather small in our application: most normalized root mean squared errors are around 0.11. This result is in line with simulations by Stekhoven and Bühlmann (2011), who compare it to other prediction schemes based on K nearest neighbors, EM-type LASSO algorithms, or multivariate normal schemes, and find it to perform comparatively well with both continuous and categorical variables.28

B.2 Multilevel Regression and Poststratification

The approach described in the last section is closely related to MRP (Gelman and Little 1997; Park et al. 2006; Lax and Phillips 2013), which has become quite popular in political science. Both strategies involve fitting a model that is predictive of preferences given observed characteristics, followed by a weighting step that re-balances observed characteristics to their distribution in the Census. What differentiates MRP from the previous approach is that it imposes more structure in the modeling step, both in terms of functional form and distributional assumptions. By utilizing the advantages of hierarchical models with normally distributed random coefficients, it produces preference estimates that are shrunken towards group means (Gelman et al. 2013, 116f).29 No such structural assumptions are made when matching preferences to the Census using Random Forests. It will thus be instructive to compare how much our results depend on such modeling choices, which we do in the next section.

28 See Tang and Ishwaran (2017) for further empirical validation of this strategy. See also Honaker and Plutzer (2016), who compare a similar matching strategy (but based on a multivariate normal model) with MRP estimated preferences using the CCES.

29 This might be especially appropriate when some groups are small. The median number of respondents per district in the CCES is 506, and no district has fewer than 192 sampled respondents. But since we slice preferences further by income sub-groups, one may be worried that the sample size in some districts is small. MRP deals with this potential issue at the cost of making distributional assumptions.

30 We also estimated a version of the model including a macro-level predictor, which has been found to improve the quality of the model. We use the demographically purged state predictor of Lax and Phillips (2013, 15), that is, the average liberal–conservative variation in state-level public opinion that is not due to variation in demographic predictors. In our case this produces rather similar MRP estimates.

MRP implementation. For each roll call item in the CCES, we estimate a separate model expressing the probability of supporting a proposal as a function of demographic characteristics. The demographic attributes included in our model broadly follow Lax and Phillips (2009, 2013) and are race, gender, education, age, and income.30 Race is captured in three categories (white, black, other), education in five (high school or less, some college, 2-year college degree, 4-year college degree, graduate degree). Age is comprised of 6 categories (18-29, 30-39, 40-49, 50-59, 60-69, 70+), while income is comprised of 13 categories (with thresholds 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 150 [in $1,000]). Our model also includes district-specific intercepts. For each roll call, we estimate the following hierarchical model using penalized maximum likelihood (Chung et al. 2013):

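\[
\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{\text{race}}_{j[i]} + \alpha^{\text{gender}}_{k[i]} + \alpha^{\text{age}}_{l[i]} + \alpha^{\text{educ}}_{m[i]} + \alpha^{\text{income}}_{n[i]} + \alpha^{\text{district}}_{d[i]}\right) \tag{B.1}
\]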

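We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^{2}$:

\begin{align}
\alpha^{\text{race}}_{j} &\sim N(0, \sigma^{2}_{\text{race}}), & j &= 1, \dots, 3 \tag{B.2} \\
\alpha^{\text{gender}}_{k} &\sim N(0, \sigma^{2}_{\text{gender}}), & k &= 1, 2 \tag{B.3} \\
\alpha^{\text{age}}_{l} &\sim N(0, \sigma^{2}_{\text{age}}), & l &= 1, \dots, 6 \tag{B.4} \\
\alpha^{\text{educ}}_{m} &\sim N(0, \sigma^{2}_{\text{educ}}), & m &= 1, \dots, 5 \tag{B.5} \\
\alpha^{\text{income}}_{n} &\sim N(0, \sigma^{2}_{\text{income}}), & n &= 1, \dots, 13 \tag{B.6}
\end{align}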

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{\text{state}}_{s[d]}$ and freely estimated variance:

\[
\alpha^{\text{district}}_{d} \sim N\left(\alpha^{\text{state}}_{s[d]}, \sigma^{2}_{\text{state}}\right) \tag{B.7}
\]

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
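The poststratification step is a population-weighted average of cell-level predictions. The sketch below illustrates it for one income group in one district; the cell keys, predicted probabilities, and Census counts are hypothetical values chosen for illustration.

```python
def poststratify(cell_predictions, cell_counts):
    """Population-weighted mean of cell-level predicted support.

    cell_predictions: {cell: predicted probability of support}
    cell_counts: {cell: number of people in that cell in the district}
    (Both dictionaries are assumptions made for this sketch.)
    """
    total = sum(cell_counts.values())
    weighted = sum(cell_predictions[cell] * n for cell, n in cell_counts.items())
    return weighted / total

# Hypothetical cells (race, gender, age, education) within one income group.
cell_predictions = {
    ("white", "female", "30-39", "some college"): 0.6,
    ("black", "male", "50-59", "high school or less"): 0.4,
}
cell_counts = {
    ("white", "female", "30-39", "some college"): 3000,
    ("black", "male", "50-59", "high school or less"): 1000,
}
district_pref = poststratify(cell_predictions, cell_counts)
```

Because the weights come from Census frequencies rather than the survey sample, cells that are over- or under-sampled in the CCES are re-balanced to their share of the district population.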

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and -0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B.1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1. Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences         0.182     0.158     0.181     0.162     0.115     0.115
                              (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences       −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                              (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences         0.080     0.061     0.063     0.072     0.043     0.045
                              (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences       −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                              (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈ $47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds.

In panel (C) we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
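The difference between unit-specific thresholds can be made concrete with a small sketch. It classifies incomes relative to the distribution of their own unit (a district or a state) at configurable percentiles. All names and incomes are hypothetical, the percentile rule (linear interpolation over the listed incomes, with an inclusive lower cutoff) is an assumption, and the thresholds in the paper are derived from Census data rather than from a list like this.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("empty input")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def income_groups(incomes_by_unit, lower_q=33.0, upper_q=66.0):
    """Label each income 'low', 'middle' or 'high' relative to its own unit.

    incomes_by_unit: dict mapping a unit id (district or state) -> list of incomes.
    Returns a dict mapping unit id -> list of labels in the input order.
    """
    labels = {}
    for unit, incomes in incomes_by_unit.items():
        ordered = sorted(incomes)
        low_cut = percentile(ordered, lower_q)
        high_cut = percentile(ordered, upper_q)
        labels[unit] = [
            "low" if x <= low_cut else "high" if x > high_cut else "middle"
            for x in incomes
        ]
    return labels


# Hypothetical districts: the same $40,000 income falls into different groups.
districts = {
    "A": [20000, 40000, 60000, 80000, 100000, 120000, 140000],
    "B": [10000, 20000, 30000, 40000, 50000, 60000, 70000],
}
labels = income_groups(districts)
```

Here the $40,000 voter is labelled low income in the richer district A but middle income in district B. Calling `income_groups(districts, lower_q=20.0, upper_q=80.0)` corresponds to the narrower groups of panel (C).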


Table C1. Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences         0.106     0.082     0.098     0.084     0.068     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences         0.105     0.082     0.097     0.083     0.067     0.062
                              (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences       −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                              (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences         0.098     0.077     0.09      0.078     0.063     0.057
                              (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences       −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                              (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily".

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1. Total number of union certification elections in House districts (109th–112th Congress).

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
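The county-to-district interpolation can be sketched as follows. Each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection. This is one common form of population-weighted areal interpolation and may differ in detail from the authors' implementation; the function name, the input layout, and the numbers are illustrative assumptions.

```python
from collections import defaultdict


def interpolate_to_districts(county_counts, intersection_pop):
    """Allocate county-level counts to districts in proportion to population.

    county_counts:    dict county -> count (e.g. number of congregations)
    intersection_pop: dict (county, district) -> population in that intersection
    Returns a dict district -> interpolated count.
    """
    # Total population of each county, summed over its district intersections.
    county_pop = defaultdict(float)
    for (county, _district), pop in intersection_pop.items():
        county_pop[county] += pop

    district_counts = defaultdict(float)
    for (county, district), pop in intersection_pop.items():
        if county_pop[county] > 0:
            share = pop / county_pop[county]
            district_counts[district] += share * county_counts.get(county, 0.0)
    return dict(district_counts)


# Hypothetical example: county X is split 75/25 between two districts,
# county Y lies entirely inside district D2.
counts = {"X": 100, "Y": 40}
pops = {("X", "D1"): 7500, ("X", "D2"): 2500, ("Y", "D2"): 5000}
by_district = interpolate_to_districts(counts, pops)
```

In the toy data, D1 receives 75 congregations and D2 receives 25 + 40 = 65. When every county lies within a single district, all shares equal one and the procedure reduces to a simple sum, which matches the at-large case mentioned above.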


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1. Additional robustness tests

                                           Low income        High income
                                           preferences       preferences        N
(1) Injective preference roll call map     0.063 (0.013)     −0.041 (0.013)     11,104
(2) Extreme preferences excl.              0.074 (0.016)     −0.048 (0.015)     13,308
(3) New York excluded                      0.070 (0.015)     −0.048 (0.014)     14,730
(4) Local Union Concentration              0.065 (0.014)     −0.047 (0.014)     15,780
(5) Trimmed LPM estimator                  0.074 (0.015)     −0.055 (0.014)     15,426
(6) Errors-in-variables                    0.062 (0.004)     −0.054 (0.004)     15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
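A minimal sketch of this trimming step follows. It drops, separately for each roll call, the districts whose preference estimate lies outside the 5th to 95th percentile range. The data layout, the helper names, and the inclusive treatment of the cutoffs are assumptions made for illustration.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def trim_extreme(prefs_by_rollcall, lower_q=5.0, upper_q=95.0):
    """Keep, per roll call, districts whose estimate lies within the cutoffs.

    prefs_by_rollcall: dict roll call -> dict district -> preference estimate.
    """
    kept = {}
    for rollcall, prefs in prefs_by_rollcall.items():
        ordered = sorted(prefs.values())
        lo_cut = percentile(ordered, lower_q)
        hi_cut = percentile(ordered, upper_q)
        kept[rollcall] = {d: v for d, v in prefs.items() if lo_cut <= v <= hi_cut}
    return kept


# Hypothetical roll call with 21 districts and evenly spaced estimates.
toy = {"rc1": {f"d{i}": i / 20 for i in range(21)}}
trimmed = trim_extreme(toy)
```

In the toy data only the two most extreme districts (`d0` and `d20`) are removed, leaving 19 of 21 districts for that roll call.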

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
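One way to read the trimming idea is as a sequential procedure: estimate the linear probability model by OLS, drop observations whose fitted values fall outside the unit interval, and re-estimate until no retained observation is out of range. The sketch below implements that reading with NumPy on a toy data set. It is an illustration under these assumptions, not necessarily the exact procedure behind Table E1; the function name and data are hypothetical.

```python
import numpy as np


def trimmed_lpm(X, y, max_iter=50):
    """Sequentially trimmed OLS for a linear probability model.

    X: (n, k) design matrix including a constant column; y: (n,) array of 0/1.
    Returns (beta, keep), where keep is a boolean mask of retained observations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(y), dtype=bool)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        fitted = X @ beta
        inside = (fitted >= 0.0) & (fitted <= 1.0)
        new_keep = keep & inside
        if new_keep.sum() == keep.sum():
            break  # every retained observation has a fitted value in [0, 1]
        if new_keep.sum() <= X.shape[1]:
            break  # too few observations left to re-estimate; keep last fit
        keep = new_keep
        beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    return beta, keep


# Toy data: two extreme x values produce fitted values outside [0, 1].
x = np.array([-5, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 14], dtype=float)
y = np.array([0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])
beta, keep = trimmed_lpm(X, y)
```

In this example the first fit predicts about −0.11 at x = −5 and about 1.11 at x = 14, so both observations are dropped; the second fit on the remaining ten observations has slope 1/11 and all fitted values inside the unit interval, and the loop stops.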

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.³¹ We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore $i$ subscripts):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}
$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
$$

$$
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}
$$

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
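The three steps (select controls for the outcome, select controls for the treatment, run OLS on the union) can be sketched in a few lines. The sketch simplifies in several ways that should be read as assumptions: it uses a plain LASSO with fixed penalties, solved by coordinate descent, rather than the root-LASSO; it has a single treatment variable instead of treatment-preference interactions; and it runs on simulated data. All function and variable names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent.

    Minimises (1/2n) * ||y - X b||^2 + lam * ||b||_1 for centred X and y.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]            # remove coordinate j from the fit
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            resid -= X[:, j] * b[j]            # add the updated coordinate back
    return b


def post_double_selection(y, u, W, lam_y, lam_u):
    """Select controls for y and for u separately, then OLS of y on u and the union."""
    Wc = W - W.mean(axis=0)
    sel_y = np.flatnonzero(lasso_cd(Wc, y - y.mean(), lam_y))   # I1
    sel_u = np.flatnonzero(lasso_cd(Wc, u - u.mean(), lam_u))   # I2
    union = np.union1d(sel_y, sel_u)                            # I = I1 ∪ I2
    design = np.column_stack([np.ones(len(y)), u, W[:, union]])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    return coef[1], union


# Simulated data: control 0 confounds the treatment u and the outcome y.
rng = np.random.default_rng(0)
n, p = 200, 20
W = rng.normal(size=(n, p))
u = W[:, 0] + 0.5 * rng.normal(size=n)
y = 2.0 * u + 3.0 * W[:, 0] + 0.1 * rng.normal(size=n)
eta_hat, selected = post_double_selection(y, u, W, lam_y=0.1, lam_u=0.1)
```

Because the confounding control is picked up by the selection steps, the final OLS recovers a treatment coefficient close to the simulated value of 2. Regressing `y` on `u` alone would not, since the omitted control is correlated with both.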

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).³² Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

32 For a very general discussion, see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}, \theta^{h}_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$
y = Kc \tag{G1}
$$

Here, $K$ is an $N \times N$ Gaussian kernel matrix

$$
K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}
$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared $L_2$ penalty to the least squares criterion:

$$
c^{*} = \arg\min_{c \in \mathbb{R}^{D}} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}
$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effect of group preferences changes with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$
\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^2} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right)\left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G4}
$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
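The closed-form solution for the weights and the pointwise derivatives can be illustrated with a short NumPy sketch on simulated data. Several choices here are assumptions made for illustration: $\lambda$ is fixed rather than chosen by leave-one-out loss, covariates are not rescaled, and the function names are hypothetical. The derivative is obtained by differentiating the Gaussian kernel directly, which gives $(2/\sigma^2)\sum_i c_i K_{ij}(z_i - z_j)$, and can be checked against a finite difference; its sign and orientation conventions are worth comparing with (G4) before reuse.

```python
import numpy as np


def krls_fit(Z, y, lam):
    """Fit KRLS with a Gaussian kernel; returns (c, K, sigma2)."""
    Z = np.asarray(Z, dtype=float)
    n, D = Z.shape
    sigma2 = float(D)                                   # bandwidth: number of columns
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / sigma2)
    c = np.linalg.solve(K + lam * np.eye(n), y)         # c* = (K + lam I)^{-1} y
    return c, K, sigma2


def krls_predict(Z, c, sigma2, z_new):
    """Fitted value at a single new point z_new."""
    k = np.exp(-((Z - z_new) ** 2).sum(axis=1) / sigma2)
    return float(k @ c)


def pointwise_derivative(Z, c, K, sigma2, col):
    """Derivative of the fitted surface w.r.t. column `col` at each observation."""
    diff = Z[:, col][None, :] - Z[:, col][:, None]      # diff[j, i] = z_i - z_j
    # d yhat(z_j) / d z_j = (2 / sigma2) * sum_i c_i * K[j, i] * (z_i - z_j)
    return (2.0 / sigma2) * (K * diff) @ c


# Simulated data (hypothetical): the outcome depends nonlinearly on column 0.
rng = np.random.default_rng(1)
Z = rng.normal(size=(30, 2))
y = np.sin(Z[:, 0]) + 0.1 * rng.normal(size=30)
c, K, sigma2 = krls_fit(Z, y, lam=0.5)
dy_dz0 = pointwise_derivative(Z, c, K, sigma2, col=0)
```

The result contains one derivative per observation, which is the quantity that would then be smoothed and plotted against the covariate of interest.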

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and E. Washington (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodology 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public Opinion in State Politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of american community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the american states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The northeast regional center for rural development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). Missforest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

model using penalized maximum likelihood (Chung et al. 2013):

$$\Pr(Y_i = 1) = \operatorname{logit}^{-1}\left(\beta_0 + \alpha^{race}_{j[i]} + \alpha^{gender}_{k[i]} + \alpha^{age}_{l[i]} + \alpha^{educ}_{m[i]} + \alpha^{income}_{n[i]} + \alpha^{district}_{d[i]}\right) \tag{B1}$$

We employ the notation of Gelman and Hill (2007) and denote by $j[i]$ the category $j$ to which individual $i$ belongs. Here, $\beta_0$ is an intercept and the $\alpha$s are hierarchically modeled effects for the various demographic groups. Each is drawn from a common normal distribution with mean zero and estimated variance $\sigma^2$:

$$\alpha^{race}_{j} \sim N(0, \sigma^2_{race}), \quad j = 1, \dots, 3 \tag{B2}$$

$$\alpha^{gender}_{k} \sim N(0, \sigma^2_{gender}), \quad k = 1, \dots, 2 \tag{B3}$$

$$\alpha^{age}_{l} \sim N(0, \sigma^2_{age}), \quad l = 1, \dots, 6 \tag{B4}$$

$$\alpha^{educ}_{m} \sim N(0, \sigma^2_{educ}), \quad m = 1, \dots, 5 \tag{B5}$$

$$\alpha^{income}_{n} \sim N(0, \sigma^2_{income}), \quad n = 1, \dots, 13 \tag{B6}$$

This setup induces shrinkage estimates for the same demographic categories in different districts. Note that using fixed effects for characteristics with few categories (specifically, gender) does not impact our results. The district intercepts are drawn from a normal distribution with state-specific means $\alpha^{state}_{s[d]}$ and freely estimated variance:

$$\alpha^{district}_{d} \sim N(\alpha^{state}_{s[d]}, \sigma^2_{state}) \tag{B7}$$

Our final preference estimates for each income group on each roll call are obtained by using cell-specific predictions from the above hierarchical model, weighted by the population frequencies (obtained from our Census file) for each cell in each congressional district.
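The poststratification step just described can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `poststratify`, the cell keys, and all numbers are hypothetical, and the cell-level predictions are assumed to have already been produced by the hierarchical model in (B1).

```python
def poststratify(cell_predictions, cell_counts):
    """Population-weighted mean of cell-level predicted probabilities.

    cell_predictions: dict mapping a demographic cell to Pr(Y = 1) for that cell
    cell_counts: dict mapping the same cells to their Census population counts
    """
    total = sum(cell_counts[cell] for cell in cell_predictions)
    if total == 0:
        raise ValueError("no population in the supplied cells")
    weighted = sum(cell_predictions[cell] * cell_counts[cell]
                   for cell in cell_predictions)
    return weighted / total


# Hypothetical cells (race, gender, age group, education) for one district
# and one income group; values are made up for illustration.
preds = {
    ("white", "f", 3, 2): 0.60,
    ("white", "m", 3, 2): 0.40,
    ("black", "f", 2, 4): 0.80,
}
counts = {
    ("white", "f", 3, 2): 500,
    ("white", "m", 3, 2): 300,
    ("black", "f", 2, 4): 200,
}
district_preference = poststratify(preds, counts)
```

Cells with large populations dominate the district estimate, which is why sparse survey cells do less damage after poststratification than in raw cell means.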

B.3 Model results under various preference estimation strategies

The estimates of district-level preferences obtained via our SAE approach and MRP are in broad agreement. The median difference in district preferences between SAE and MRP is 2.5 percentage points for low income and −0.1 percentage points for high income constituents. A large part of this difference is due to the heavier tails of the distribution of district preferences for each roll call estimated by our approach, which is perhaps not surprising given the shrinkage characteristics of MRP. To what extent do these differences in the distribution of preferences affect our estimated union effects?

Table B1 shows estimates for our six main specifications using three different measurement strategies for district preferences. Panel (A) shows our approach, contrasted to MRP-based preferences in panel (B). The results are unequivocal: using MRP estimated preferences leads to more pronounced estimates in all specifications. Using specification (6), which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increases responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1
Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                  (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences           0.106     0.082     0.098     0.084     0.068     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences         -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences           0.182     0.158     0.181     0.162     0.115     0.115
                                (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences         -0.136    -0.119    -0.139    -0.122    -0.091    -0.091
                                (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences           0.080     0.061     0.063     0.072     0.043     0.045
                                (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences         -0.027    -0.013    -0.010    -0.027    -0.018    -0.024
                                (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups. Standard errors in parentheses.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In panel (C) we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
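To make the relative classification concrete, the sketch below assigns the same $40,000 income to different groups depending on the income distribution of the reference unit. It is only an illustration under assumptions: the function names, the linear-interpolation percentile rule, the treatment of ties at a cut point, and both income lists are hypothetical and are not taken from the paper's data.

```python
def percentile(values, q):
    """Percentile q (0-100) of a non-empty list, using linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def classify(income, reference_incomes, low_q=33, high_q=66):
    """Classify an income relative to the incomes of a reference unit.

    The reference unit can be a district (panels A and C) or a state (panel B);
    low_q=20, high_q=80 corresponds to the shifted thresholds of panel C.
    """
    low_cut = percentile(reference_incomes, low_q)
    high_cut = percentile(reference_incomes, high_q)
    if income <= low_cut:      # assumption: ties count as low income
        return "low"
    if income > high_cut:
        return "high"
    return "middle"


# Two made-up districts with different income distributions.
richer_district = [20000, 30000, 45000, 60000, 80000, 120000, 150000]
poorer_district = [15000, 20000, 28000, 35000, 50000, 70000, 90000]

group_in_richer = classify(40000, richer_district)
group_in_poorer = classify(40000, poorer_district)
```

Passing a state-wide income list as `reference_incomes` gives the state-specific variant, so the same voter can change groups purely because the comparison population changes.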


Table C1
Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                                  (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences           0.106     0.082     0.098     0.084     0.068     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences         -0.063    -0.036    -0.053    -0.051    -0.050    -0.040
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences           0.105     0.082     0.097     0.083     0.067     0.062
                                (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences         -0.062    -0.036    -0.052    -0.050    -0.049    -0.039
                                (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences           0.098     0.077     0.090     0.078     0.063     0.057
                                (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences         -0.054    -0.031    -0.046    -0.044    -0.044    -0.034
                                (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
Group preferences × state constants: X
Group preferences × organizational capacity: X X
Group preferences × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds. Standard errors in parentheses.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1
Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 62-164, 39-62, 26-39, 13-26, 0-13.]

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
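A population-weighted county-to-district interpolation of this kind can be sketched as follows. The sketch assumes that each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection; the function name, the input layout, and the numbers are hypothetical, and the paper's exact weighting may differ.

```python
from collections import defaultdict


def county_to_district(county_counts, intersections):
    """Allocate county-level counts to districts by population share.

    county_counts: dict county -> count (e.g., number of congregations)
    intersections: list of (county, district, population) tuples, one per
                   county-district intersection; populations assumed positive
    """
    county_population = defaultdict(float)
    for county, _, population in intersections:
        county_population[county] += population

    district_counts = defaultdict(float)
    for county, district, population in intersections:
        share = population / county_population[county]
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)


# Made-up example: county A straddles two districts, county B lies in one.
congregations = {"A": 100, "B": 40}
overlap = [("A", "D1", 3000), ("A", "D2", 1000), ("B", "D2", 2000)]
by_district = county_to_district(congregations, overlap)
```

When every county falls entirely inside one district, each share equals one and the function reduces to a plain sum, which matches the at-large case noted in the text.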


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1
Additional robustness tests

                                          Low income        High income
                                          preferences       preferences          N

(1) Injective preference-roll call map    0.063 (0.013)    -0.041 (0.013)     11,104
(2) Extreme preferences excl.             0.074 (0.016)    -0.048 (0.015)     13,308
(3) New York excluded                     0.070 (0.015)    -0.048 (0.014)     14,730
(4) Local Union Concentration             0.065 (0.014)    -0.047 (0.014)     15,780
(5) Trimmed LPM estimator                 0.074 (0.015)    -0.055 (0.014)     15,426
(6) Errors-in-variables                   0.062 (0.004)    -0.054 (0.004)     15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
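The per-roll-call trimming rule can be illustrated with the standard library alone. This is a sketch under assumptions: the function name `trim_extreme`, the inclusive percentile definition, and the toy preference values are hypothetical; the text does not say which percentile definition was used.

```python
from statistics import quantiles


def trim_extreme(prefs, lower_pct=5, upper_pct=95):
    """Keep districts whose preference estimate lies within the percentile band.

    prefs: dict district -> preference estimate for a single roll call
    """
    # quantiles(..., n=100) returns the 99 cut points between percentiles.
    cuts = quantiles(prefs.values(), n=100, method="inclusive")
    low_cut, high_cut = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return {d: p for d, p in prefs.items() if low_cut <= p <= high_cut}


# Toy roll call: 101 districts with support of 0, 1, ..., 100 percent.
roll_call_prefs = {f"district_{i}": i for i in range(101)}
kept = trim_extreme(roll_call_prefs)
```

Because the cut points are recomputed for each roll call, a district can be dropped for one vote and retained for another.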

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
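One common reading of the trimming idea in Horrace and Oaxaca (2006) is sequential: estimate the linear probability model by OLS, drop observations whose fitted values fall outside the unit interval, and re-estimate until no fitted value in the retained sample lies outside [0, 1]. The sketch below implements that reading on simulated data. The function name, the stopping rule, and the data-generating process are assumptions made for illustration; the text does not spell out the exact variant used.

```python
import numpy as np


def trimmed_lpm(X, y):
    """Sequentially trimmed OLS for a linear probability model.

    Returns the coefficient vector and a boolean mask of retained observations.
    """
    keep = np.ones(len(y), dtype=bool)
    while True:
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        inside = (fitted >= 0.0) & (fitted <= 1.0)
        new_keep = keep & inside
        if new_keep.sum() == keep.sum():
            return beta, keep            # nothing left to trim
        if new_keep.sum() <= X.shape[1]:
            raise ValueError("too few observations left after trimming")
        keep = new_keep


# Simulated binary outcome whose true probability is linear inside (0, 1).
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = np.clip(0.5 + 0.3 * x, 0.0, 1.0)
y = (rng.random(n) < p_true).astype(float)

beta_trimmed, kept_mask = trimmed_lpm(X, y)
```

The loop always terminates because the retained sample can only shrink, and on exit every retained observation has a fitted probability inside the unit interval.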

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

[31] We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore i subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}$$

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.
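The two selection steps followed by OLS on the union of selected controls can be sketched on simulated data. This is a deliberately simplified illustration, not the paper's implementation: it uses a single treatment without the preference interactions of (F1), a plain LASSO fitted by coordinate descent in place of the root-LASSO, and fixed penalty levels `lam_y` and `lam_d` that are hypothetical rather than data-driven. All function and variable names are made up for the example.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO minimizing (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()           # equals y - X @ b while b = 0
    for _ in range(n_iter):
        for k in range(p):
            resid += X[:, k] * b[k]           # remove coordinate k from the fit
            rho = X[:, k] @ resid / n
            b[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[k]
            resid -= X[:, k] * b[k]           # put the updated coordinate back
    return b


def post_double_selection(y, d, W, lam_y, lam_d):
    """Select controls for y and for d separately, then run OLS on their union."""
    Wc = (W - W.mean(axis=0)) / W.std(axis=0)
    selected_y = np.flatnonzero(lasso_cd(Wc, y - y.mean(), lam_y))   # set I1
    selected_d = np.flatnonzero(lasso_cd(Wc, d - d.mean(), lam_d))   # set I2
    union = np.union1d(selected_y, selected_d)                       # I = I1 ∪ I2
    design = np.column_stack([np.ones(len(y)), d, Wc[:, union]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1], union


# Simulated data: control 0 drives the treatment and also affects the outcome.
rng = np.random.default_rng(0)
n, p = 1000, 50
W = rng.normal(size=(n, p))
d = 1.0 * W[:, 0] + rng.normal(size=n)
y = 0.5 * d + 0.3 * W[:, 0] + rng.normal(size=n)

effect, controls = post_double_selection(y, d, W, lam_y=0.1, lam_d=0.1)
```

The treatment-equation LASSO is what guarantees that a control strongly tied to the treatment enters the final regression even if the outcome-equation LASSO would have skipped it.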

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).[32] Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

[32] For a very general discussion, see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\ \theta^{h}_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here, $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$c^{*} = \operatorname*{argmin}_{c \in \mathbb{R}^{D}} \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right] \tag{G3}$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^2} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right)\left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
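A compact sketch of the KRLS fit in (G1) to (G3) and of pointwise partial derivatives is given below. It is illustrative only: the data are simulated, the penalty `lam` is fixed at a hypothetical value instead of being chosen by leave-one-out loss, and the function names are invented. In the derivative, the code writes the difference as the evaluation point minus the kernel center ($z_j - z_i$), which is the orientation under which the $-2/\sigma^2$ prefactor agrees with a finite-difference check of the fitted function.

```python
import numpy as np


def krls_fit(Z, y, lam):
    """Fit KRLS with a Gaussian kernel and bandwidth sigma^2 = number of columns."""
    sigma2 = float(Z.shape[1])
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dist / sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)   # c* = (K + lam I)^{-1} y
    return K, c, sigma2


def krls_predict(Z_train, c, sigma2, z_new):
    """Fitted value at a new point: weighted sum of kernels centered on the data."""
    k = np.exp(-((Z_train - z_new) ** 2).sum(axis=1) / sigma2)
    return k @ c


def krls_partial(Z, K, c, sigma2, col):
    """Pointwise partial derivative of the fit w.r.t. column `col`, per observation."""
    diff = Z[None, :, col] - Z[:, None, col]           # diff[i, j] = z_j - z_i
    return (-2.0 / sigma2) * (c[:, None] * K * diff).sum(axis=0)


# Simulated data with a nonlinear term and an interaction.
rng = np.random.default_rng(1)
Z = rng.normal(size=(40, 3))
y = np.sin(Z[:, 0]) + Z[:, 1] * Z[:, 2] + 0.1 * rng.normal(size=40)

K, c, sigma2 = krls_fit(Z, y, lam=0.5)
dy_dz0 = krls_partial(Z, K, c, sigma2, col=0)
```

The result has one derivative per observation, so it can be plotted or smoothed against any other column of `Z` to see whether a marginal effect varies with that variable.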

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University, CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working Paper 5 2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in congress: A study of the north american free trade agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

46

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs. Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


which includes state policies, measures of district organizational capacity, and district covariates interacted with preferences, as well as district fixed effects, we find that a unit increase in union membership increased responsiveness of legislators towards the preferences of low income constituents by about 12 (±2) percentage points (compared to only 6 points using our measurement strategy). Responsiveness estimates for high income preferences are similarly larger. Note that, while larger, all estimates also carry wider confidence intervals.

Table B1
Model results using different strategies to estimate district-level preferences. Entries are marginal effects of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: Small Area Estimation via Chained Random Forests
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: Multilevel Regression & Poststratification
Low income preferences        0.182     0.158     0.181     0.162     0.115     0.115
                             (0.021)   (0.024)   (0.026)   (0.020)   (0.022)   (0.022)
High income preferences      −0.136    −0.119    −0.139    −0.122    −0.091    −0.091
                             (0.017)   (0.019)   (0.021)   (0.017)   (0.018)   (0.018)

C: Raw CCES means
Low income preferences        0.080     0.061     0.063     0.072     0.043     0.045
                             (0.010)   (0.011)   (0.012)   (0.010)   (0.011)   (0.011)
High income preferences      −0.027    −0.013    −0.010    −0.027    −0.018    −0.024
                             (0.008)   (0.008)   (0.008)   (0.008)   (0.008)   (0.009)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using different strategies to estimate district-level preferences of three income groups.

As a further point of comparison, panel (C) shows preferences estimated via raw cell means in the CCES. Due to the issues discussed above, the raw data should not be taken as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results obtain even when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress, a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000), but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds.

In panel (C) we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller, but still of a substantively relevant magnitude and statistically different from zero.
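The difference between district-specific and state-specific thresholds can be illustrated with a minimal Python sketch. The function name `income_group`, the district labels, and all cutoff values are hypothetical; the low-income cutoffs are only loosely patterned on the Massachusetts example in the text, and the high-income cutoffs are invented for illustration.

```python
def income_group(income, low_cut, high_cut):
    """Classify an income relative to a lower and an upper cutoff
    (e.g. the 33rd and 66th percentile of the reference population)."""
    if income < low_cut:
        return "low"
    if income > high_cut:
        return "high"
    return "middle"

# Hypothetical (low, high) cutoffs per district and for the state as a whole.
district_cuts = {"MA-01": (45000, 90000), "MA-08": (30000, 70000)}
state_cuts = (47000, 95000)

income = 40000
by_district = {d: income_group(income, lo, hi) for d, (lo, hi) in district_cuts.items()}
by_state = income_group(income, *state_cuts)
```

Under these assumed cutoffs the same voter is classified as low income in one district but not in the other, while a single state-wide cutoff classifies the voter as low income everywhere.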


Table C1
Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. As proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good: everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there is also a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v133 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).

Figure D1: Total number of union certification elections in House districts (109th-112th Congress). [Map; legend bins: 0–13, 13–26, 26–39, 39–62, 62–164]
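The population-weighted interpolation of county-level congregation counts to districts can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes that each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection, and the function name `interpolate_to_districts` and all numbers are hypothetical.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts by population share.

    county_counts: {county: count}
    intersections: {(county, district): population living in that overlap}
    """
    # total population of each county, summed over its district overlaps
    county_pop = defaultdict(float)
    for (county, _district), pop in intersections.items():
        county_pop[county] += pop

    district_counts = defaultdict(float)
    for (county, district), pop in intersections.items():
        weight = pop / county_pop[county]  # share of the county's population in this district
        district_counts[district] += weight * county_counts[county]
    return dict(district_counts)

# Hypothetical example: County A is split across two districts, County B is nested in D2.
counts = {"County A": 100, "County B": 60}
overlaps = {
    ("County A", "D1"): 30000,
    ("County A", "D2"): 10000,
    ("County B", "D2"): 20000,
}
result = interpolate_to_districts(counts, overlaps)
```

When every county lies entirely within one district, each weight equals one and the procedure reduces to summing county counts, which matches the at-large case mentioned in the text.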


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1
Additional robustness tests

                                          Low income        High income
                                          preferences       preferences         N

(1) Injective preference-roll call map    0.063 (0.013)     −0.041 (0.013)    11104
(2) Extreme preferences excl.             0.074 (0.016)     −0.048 (0.015)    13308
(3) New York excluded                     0.070 (0.015)     −0.048 (0.014)    14730
(4) Local Union Concentration             0.065 (0.014)     −0.047 (0.014)    15780
(5) Trimmed LPM estimator                 0.074 (0.015)     −0.055 (0.014)    15426
(6) Errors-in-variables                   0.062 (0.004)     −0.054 (0.004)    15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top: for each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
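A per-roll-call trimming step of this kind can be sketched in Python as below. The percentile definition (linear interpolation between order statistics), the function names, and the toy data are assumptions made for illustration; the paper does not state which percentile definition was used.

```python
def percentile(values, pct):
    """Percentile (pct in 0..100) with linear interpolation between order statistics."""
    s = sorted(values)
    pos = pct * (len(s) - 1) / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def trim_districts(prefs_by_rollcall, lower=5, upper=95):
    """For each roll call, keep districts whose preference estimate lies
    between the lower and upper percentile of that roll call's distribution."""
    kept = {}
    for rollcall, prefs in prefs_by_rollcall.items():
        values = list(prefs.values())
        lo_cut, hi_cut = percentile(values, lower), percentile(values, upper)
        kept[rollcall] = {d: p for d, p in prefs.items() if lo_cut <= p <= hi_cut}
    return kept

# Hypothetical data: 21 districts with evenly spaced preference estimates on one roll call.
toy = {"rollcall_1": {f"d{i}": i / 20 for i in range(21)}}
trimmed = trim_districts(toy)
```

Because the cutoffs are computed separately for every roll call, a district can be dropped on one vote and retained on another.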

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
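The idea of trimming can be illustrated with a small Python sketch. It follows our reading of the sequential trimming procedure in Horrace and Oaxaca (2006): fit the linear probability model, drop observations whose fitted values fall outside the unit interval, and refit until none remain. The single-regressor setup, the function names, and the data are hypothetical and far simpler than the paper's specification.

```python
def ols_line(x, y):
    """Intercept and slope of a bivariate least squares fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def trimmed_lpm(x, y, max_iter=100):
    """Refit the LPM after dropping observations with fitted values outside [0, 1]."""
    keep = list(range(len(x)))
    for _ in range(max_iter):
        a, b = ols_line([x[i] for i in keep], [y[i] for i in keep])
        inside = [i for i in keep if 0.0 <= a + b * x[i] <= 1.0]
        if len(inside) == len(keep):
            break  # every remaining fitted value lies in the unit interval
        keep = inside
    return a, b, keep

# Hypothetical data: ten ordinary observations plus one with an extreme covariate value.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1]
_, slope_all = ols_line(x, y)
_, slope_trimmed, kept = trimmed_lpm(x, y)
```

In this toy example the untrimmed fit predicts a probability above one for the extreme observation; once it is dropped, all fitted values lie inside the unit interval and the slope is noticeably steeper.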

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

[31] We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore i subscripts):

\[
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}U_{d}\theta^{l}_{jd} + \eta^{h}U_{d}\theta^{h}_{jd} + g(Z_{d}) + \epsilon_{jd} \tag{F1}
\]

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district, and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation

\[
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}
\]

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

\[
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l}U_{d}\theta^{l}_{jd} + \eta^{h}U_{d}\theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
\]

\[
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}
\]

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques such as the LASSO are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.
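The logic of selecting controls in both equations can be made concrete with a minimal Python sketch. It is not the authors' implementation: it uses a plain LASSO solved by coordinate descent instead of the root-LASSO, a single treatment coefficient instead of the two interaction terms in (F1), and a hypothetical eight-observation design with orthogonal controls, a penalty of 0.25, and a true treatment effect of 0.5, all chosen so the arithmetic is easy to check by hand.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_select(y, cols, lam, sweeps=20):
    """Coordinate-descent LASSO; returns indices of controls with non-zero
    coefficients. Assumes each column has mean zero and unit mean square."""
    n, p = len(y), len(cols)
    beta = [0.0] * p
    for _ in range(sweeps):
        for k in range(p):
            # partial residual that leaves out control k
            resid = [y[i] - sum(beta[j] * cols[j][i] for j in range(p) if j != k)
                     for i in range(n)]
            rho = sum(cols[k][i] * resid[i] for i in range(n)) / n
            beta[k] = soft_threshold(rho, lam)
    return {k for k in range(p) if beta[k] != 0.0}

def ols(y, cols):
    """OLS with intercept via the normal equations (Gauss-Jordan elimination)."""
    X = [[1.0] + [col[i] for col in cols] for i in range(len(y))]
    p = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[a][p] / A[a][a] for a in range(p)]

# Hypothetical design: three orthogonal controls with entries +1/-1.
w1 = [1, 1, 1, 1, -1, -1, -1, -1]
w2 = [1, 1, -1, -1, 1, 1, -1, -1]
w3 = [1, -1, 1, -1, 1, -1, 1, -1]
v = [0.5 * a * b for a, b in zip(w1, w2)]   # treatment variation unrelated to the controls
U = [a + e for a, e in zip(w1, v)]          # treatment depends strongly on w1
y = [0.5 * u - 0.3 * a + 2.0 * b for u, a, b in zip(U, w1, w2)]  # true effect of U is 0.5

controls = [w1, w2, w3]
lam = 0.25
I1 = lasso_select(y, controls, lam)   # selection in the outcome equation
I2 = lasso_select(U, controls, lam)   # selection in the treatment equation
I = sorted(I1 | I2)                   # union of both selected control sets

eta_pds = ols(y, [U] + [controls[k] for k in I])[1]
eta_outcome_only = ols(y, [U] + [controls[k] for k in sorted(I1)])[1]
```

In this constructed example the first control has only a modest reduced-form association with the outcome and is dropped by the outcome-equation LASSO, but it is picked up in the treatment equation. Using the union of both sets recovers the assumed effect, while relying on the outcome equation alone does not.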

is estimator has low bias and yields accurate condence intervals even under moderateselection mistakes (Belloni and Chernozhukov 2009 Belloni et al 2014)32 Responsible forthis robustness is the indirect LASSO step selecting the Ud-control set It nds controlswhose omission leads to ldquolargerdquo omied variable bias and includes them in the model Anyvariables that are not included (ldquoomiedrdquo) are therefore at most mildly associated to Ud andyjd which decidedly limits the scope of omied variable bias (Chernozhukov et al 2015)

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional-form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}\ \theta^h_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$ y = Kc. \tag{G1} $$

Here, $K$ is an $N \times N$ Gaussian kernel matrix,

$$ K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right), \tag{G2} $$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion,

$$ c^* = \underset{c \in \mathbb{R}^D}{\arg\min}\ \left[(y - Kc)'(y - Kc) + \lambda c'Kc\right], \tag{G3} $$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

[32] For a very general discussion, see Belloni et al. (2017).
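As a rough illustration of equations (G1) to (G3), the following Python sketch builds the Gaussian kernel, solves for the weights in closed form, and picks $\lambda$ from a small grid by leave-one-out loss. It is a minimal sketch on simulated data, not the implementation used for the analysis: the function names, the simulated inputs, and the $\lambda$ grid are assumptions, covariates are assumed to be standardized beforehand, and the leave-one-out shortcut is the standard identity for ridge-type estimators.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2), as in (G2)."""
    sq = np.sum(Z ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(dist2, 0.0) / sigma2)

def krls_weights(K, y, lam):
    """Closed-form solution c* = (K + lam * I)^{-1} y of (G3)."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def loo_loss(K, y, lam):
    """Sum of squared leave-one-out errors, e_i = c_i / [(K + lam I)^{-1}]_ii."""
    G_inv = np.linalg.inv(K + lam * np.eye(len(y)))
    return float(np.sum(((G_inv @ y) / np.diag(G_inv)) ** 2))

# Simulated stand-in data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
Z = rng.normal(size=(60, 4))
y = np.sin(Z[:, 0]) + Z[:, 1] * Z[:, 2] + 0.1 * rng.normal(size=60)

K = gaussian_kernel(Z, sigma2=Z.shape[1])   # sigma^2 = D, the number of columns
lam_grid = [0.01, 0.1, 1.0, 10.0]           # assumed candidate values
lam_best = min(lam_grid, key=lambda lam: loo_loss(K, y, lam))
c_star = krls_weights(K, y, lam_best)
y_hat = K @ c_star                          # fitted values, equation (G1)
```

In practice one would search $\lambda$ over a finer grid or with a one-dimensional optimizer; the grid here only keeps the example short.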

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double-selection LASSO). Second, it allows us to check whether the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$ \frac{\partial y}{\partial z^{U_d}_j} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right)\left(Z^{U_d}_d - z^{U_d}_j\right). \tag{G4} $$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
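The pointwise derivatives can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code behind Figure IV: the data are simulated, the function names and the fixed value of $\lambda$ are hypothetical, and the third column of `Z` merely stands in for union density. The difference is taken as $z_j - z_i$, which is the orientation under which the expression agrees with a finite-difference derivative of the fitted function $\sum_i c_i k(z_i, z)$.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq = np.sum(Z ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(dist2, 0.0) / sigma2)

def pointwise_derivatives(Z, c, sigma2, d):
    """Derivative of the fitted surface w.r.t. covariate d at every observation j."""
    K = gaussian_kernel(Z, sigma2)
    diff = Z[None, :, d] - Z[:, None, d]        # diff[i, j] = z_j^(d) - z_i^(d)
    return (-2.0 / sigma2) * np.sum(c[:, None] * K * diff, axis=0)

# Simulated stand-in data (hypothetical): column 2 plays the role of union density.
rng = np.random.default_rng(3)
Z = rng.normal(size=(80, 3))
y = Z[:, 0] * Z[:, 2] - 0.5 * Z[:, 1] + 0.1 * rng.normal(size=80)

sigma2 = Z.shape[1]                             # sigma^2 = D
lam = 0.5                                       # assumed; chosen by LOO loss in practice
c = np.linalg.solve(gaussian_kernel(Z, sigma2) + lam * np.eye(len(y)), y)

d = 2
derivs = pointwise_derivatives(Z, c, sigma2, d) # one derivative per observation
```

The vector `derivs` has one entry per case and could then be smoothed and plotted against `Z[:, d]`.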

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

as a yardstick, but it is nonetheless informative to see how much the results vary. Our core results even obtain when we simply use raw cell means without any statistical modeling to counter non-representative distributions of individual characteristics and small cell sizes. We find that in our strictest specification a unit increase in union membership still increases responsiveness towards low income constituents by about 5 (±1) percentage points.

In sum, all three approaches lead to the same qualitative conclusions about the moderating effect of unions on unequal representation in Congress. The two alternative approaches to deal with the problem that CCES surveys are not representative of congressional districts by design suggest a larger effect of unions than the naive approach using the unadjusted survey data. Quantitatively, our preferred estimates are based on small area estimation via random forests, as they are less reliant on normality assumptions and are systematically more conservative than those based on MRP.

C Alternative Income Thresholds

This section discusses the impact of different income thresholds on our results. Panel (A) of Table C1 replicates Table I in the main text. Here, preferences of income groups are based on district-specific income thresholds splitting the population into three groups (at the 33rd and 66th percentile). Thus, in our model voters are classified as 'low income' relative to other voters in their congressional district. For example, during the 111th Congress a voter with an income of $40,000 would be part of the low income group in most of Massachusetts' districts (where low income thresholds vary from about $40,000 to $50,000) but not in the 8th (where the threshold is about $30,000). If income thresholds were state-specific instead, he or she would be considered low income everywhere in the state (as the state-specific low income threshold is now ≈$47,000). Not all states display as much variation in income-group thresholds. Thus, using state- instead of district-specific thresholds does not alter our core results in an appreciable way. As Panel (B) shows, the resulting marginal effects estimates for all six model specifications are remarkably similar when using preferences of income groups defined by state-specific thresholds. In Panel (C) we no longer divide the population into three equally sized income groups. Instead, we restrict the low-income group to only those below the 20th percentile of the (district-specific) income distribution. Similarly, we classify as high income only those above the 80th percentile. Our resulting estimates for the union-responsiveness marginal effects are slightly smaller but still of a substantively relevant magnitude and statistically different from zero.
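To make the threshold logic concrete, here is a small Python sketch of how the same income can fall into different groups depending on the reference distribution. It is a hedged illustration only: the two income distributions are invented, the function name is hypothetical, and treating incomes strictly below the lower cutoff as 'low' is an assumption about how ties are handled.

```python
import numpy as np

def income_group(income, reference_incomes, lower=33, upper=66):
    """Classify an income relative to percentile cutoffs of a reference distribution."""
    lo, hi = np.percentile(reference_incomes, [lower, upper])
    if income < lo:
        return "low"
    if income > hi:
        return "high"
    return "middle"

# Two invented district income distributions (hypothetical values).
district_a = np.linspace(20_000, 110_000, 91)   # 33rd percentile is 49,700
district_b = np.linspace(10_000, 70_000, 61)    # 33rd percentile is 29,800

group_a = income_group(40_000, district_a)               # district-specific terciles
group_b = income_group(40_000, district_b)
group_a_p20 = income_group(40_000, district_a, 20, 80)   # shifted p20/p80 cutoffs
```

Passing a state-wide income vector as `reference_incomes` would give the state-specific variant of Panel (B).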


Table C1. Model results using different definitions of income groups. Marginal effect of standard deviation increase in union membership on marginal effect of income group preferences on legislator vote.

                             (1)      (2)      (3)      (4)      (5)      (6)

A: District-specific income thresholds
Low income preferences      0.106    0.082    0.098    0.084    0.068    0.062
                           (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences    −0.063   −0.036   −0.053   −0.051   −0.050   −0.040
                           (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.014)

B: State-specific income thresholds
Low income preferences      0.105    0.082    0.097    0.083    0.067    0.062
                           (0.013)  (0.015)  (0.016)  (0.013)  (0.014)  (0.014)
High income preferences    −0.062   −0.036   −0.052   −0.050   −0.049   −0.039
                           (0.012)  (0.013)  (0.014)  (0.013)  (0.013)  (0.013)

C: Shifted income thresholds (p20 - p80)
Low income preferences      0.098    0.077    0.09     0.078    0.063    0.057
                           (0.012)  (0.013)  (0.014)  (0.012)  (0.013)  (0.013)
High income preferences    −0.054   −0.031   −0.046   −0.044   −0.044   −0.034
                           (0.011)  (0.012)  (0.012)  (0.011)  (0.012)  (0.012)

District fixed effects: X X X X X X
Group preferences × union policy: X X
  × state constants: X
  × organizational capacity: X X
  × district covariates: X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.


D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).

[Figure D1: map of House districts; legend bins 0–13, 13–26, 26–39, 39–62, 62–164]

Figure D1. Total number of union certification elections in House districts (109th-112th Congress)
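The population-weighted interpolation of county counts to districts can be sketched as follows. This is one plausible reading of the procedure rather than the authors' code: it assumes each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection, and all names and numbers are made up for illustration.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts by population share.

    county_counts: {county: count}
    intersections: list of (county, district, population_in_intersection)
    """
    county_pop = defaultdict(float)
    for county, _, pop in intersections:
        county_pop[county] += pop

    district_counts = defaultdict(float)
    for county, district, pop in intersections:
        share = pop / county_pop[county]        # weight of this intersection
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)

# Hypothetical example: county A is split across two districts, county B is not.
county_counts = {"A": 100, "B": 40}
intersections = [
    ("A", "D1", 3000),
    ("A", "D2", 1000),
    ("B", "D2", 2000),
]
district_counts = interpolate_to_districts(county_counts, intersections)
```

When every county lies wholly inside one district, each share equals one and the procedure collapses to a simple sum, matching the at-large case mentioned in the text.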


E Additional Robustness Test

In this section we describe several additional robustness tests

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1. Additional robustness tests

                                          Low income        High income
                                          preferences       preferences          N

(1) Injective preference roll call map    0.063 (0.013)    −0.041 (0.013)    11,104
(2) Extreme preferences excl.             0.074 (0.016)    −0.048 (0.015)    13,308
(3) New York excluded                     0.070 (0.015)    −0.048 (0.014)    14,730
(4) Local Union Concentration             0.065 (0.014)    −0.047 (0.014)    15,780
(5) Trimmed LPM estimator                 0.074 (0.015)    −0.055 (0.014)    15,426
(6) Errors-in-variables                   0.062 (0.004)    −0.054 (0.004)    15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical, specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
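The idea behind the trimmed estimator can be illustrated with a short Python sketch. It is a simplified reading of the sequential trimming idea, not the exact specification estimated here: observations whose fitted values fall outside the unit interval are dropped and OLS is re-run until no such observations remain. The data are simulated and the function name, iteration cap, and closed-interval rule are assumptions.

```python
import numpy as np

def trimmed_lpm(X, y, max_iter=50):
    """Re-estimate OLS after dropping cases with fitted values outside [0, 1]."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(max_iter):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        inside = (fitted >= 0.0) & (fitted <= 1.0)
        if np.all(inside[keep]):    # every retained case is inside the unit interval
            break
        keep &= inside              # drop offending cases and re-estimate
    return beta, keep

# Simulated binary outcome (hypothetical): P(y = 1 | x) = 0.5 + 0.25 x, clipped to [0, 1].
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
prob = np.clip(0.5 + 0.25 * x, 0.0, 1.0)
y = (rng.uniform(size=n) < prob).astype(float)
X = np.column_stack([np.ones(n), x])

beta_trim, keep = trimmed_lpm(X, y)
fitted_trim = X @ beta_trim
```

Comparing `beta_trim` with the untrimmed OLS coefficients gives a quick sense of how much out-of-range predictions matter in a given application.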

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

[31] We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore $i$ subscripts):

$$ y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1} $$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here, $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation,

$$ U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0, \tag{F2} $$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$ y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3} $$

$$ U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4} $$

Here, $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variableselection errors To see this consider the consequence of applying variable selection tech-

43

niques to the outcome equation only In trying to predict y with w an algorithm (such asLASSO) will favor variables with large coecients in β0 but will ignore those of intermediateimpact However omied variables that are strongly related to the treatment ie with largecoecients in βm0 can lead to large omied variable bias in the estimate of η even whenthe size of their coecient in β0 is moderate e Post-double selection estimator suggestedby Belloni et al (2013) addresses this problem by basing selection on both reduced formequations Let I1 be the control set selected by LASSO of yjd on wd in the rst predictiveequation and let I2 be the control set selected by LASSO ofUd on wd in the second equationen parameter estimates for the eects of union density and the regularized control setare obtained by OLS estimation of equation (F1) with the set I = I1 cup I2 included as controls(replacing д(middot)) In our implementation we employ the root-LASSO (Belloni et al 2011) ineach selection step
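The two selection steps followed by OLS on the union of the selected controls can be illustrated with a minimal sketch. It is not the implementation used here: it assumes a single treatment and outcome (no preference interactions or fixed effects), swaps the root-LASSO for a plain coordinate-descent LASSO, and uses simulated data and a hypothetical penalty value `lam`; the function names are illustrative only.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain LASSO by coordinate descent: min (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()                # residual y - Xb (b starts at zero)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for k in range(p):
            r += X[:, k] * b[k]               # add back coordinate k's contribution
            rho = X[:, k] @ r / n
            b[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]  # soft-threshold
            r -= X[:, k] * b[k]
    return b

def post_double_selection(y, u, W, lam):
    """Select controls for y and for u separately, then run OLS on their union."""
    Ws = (W - W.mean(axis=0)) / W.std(axis=0)              # standardized control terms
    I1 = np.flatnonzero(lasso_cd(Ws, y - y.mean(), lam))   # controls predicting the outcome
    I2 = np.flatnonzero(lasso_cd(Ws, u - u.mean(), lam))   # controls predicting the treatment
    I = np.union1d(I1, I2)
    X = np.column_stack([np.ones(len(y)), u, W[:, I]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], I                                      # treatment coefficient, control set

# Simulated data (hypothetical): control 0 drives the treatment, controls 0 and 1 the outcome.
rng = np.random.default_rng(0)
n, p = 500, 50
W = rng.normal(size=(n, p))
u = W[:, 0] + rng.normal(size=n)
y = 0.5 * u + 0.3 * W[:, 0] + W[:, 1] + rng.normal(size=n)
eta_hat, selected = post_double_selection(y, u, W, lam=0.2)
```

Because the final step is an unpenalized OLS on the union set, a control that matters mainly for the treatment equation still enters the outcome regression, which is the point of selecting on both equations.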

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).$^{32}$ Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

32 For a very general discussion, see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd} \;\; \theta^h_{jd} \;\; U_d \;\; X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here, $K$ is an $N \times N$ Gaussian kernel matrix,

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$c^* = \underset{c \in \mathbb{R}^D}{\arg\min} \left[ (y - Kc)'(y - Kc) + \lambda c' K c \right] \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
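The closed-form estimator for the weights can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used here: the data are simulated, the covariates are standardized before the kernel is built, and $\lambda$ is fixed at a hypothetical value instead of being chosen by leave-one-out loss; the function names are illustrative.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / sigma2)

def krls_fit(Z, y, lam):
    """Closed-form weights c* = (K + lam * I)^{-1} y, with sigma^2 set to D."""
    n, D = Z.shape
    K = gaussian_kernel(Z, sigma2=D)
    c = np.linalg.solve(K + lam * np.eye(n), y)
    return K, c

# Simulated data (hypothetical): a nonlinear, non-additive surface in two of three covariates.
rng = np.random.default_rng(1)
Z = rng.normal(size=(60, 3))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)      # standardize covariates (assumption)
y = np.sin(Z[:, 0]) * Z[:, 1] + 0.1 * rng.normal(size=60)
lam = 0.5                                     # fixed for illustration only
K, c = krls_fit(Z, y, lam)
fitted = K @ c                                # in-sample predictions, as in (G1)
```

Solving the linear system directly avoids forming the matrix inverse explicitly; the weights satisfy the first-order condition of (G3), $K(y - Kc) = \lambda K c$.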

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double-selection LASSO). Second, it allows us to check whether the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z_j^{U_d}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z_d^{U_d} - z_j^{U_d}\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ilwu puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? the case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [wwwvanderbilteducsdiincludesWorking Paper 5 2017pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the us congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the us senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in congress: A study of the north american free trade agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "dem deutschen volke"? die ungleiche responsivität des bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the american states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of american politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [httphonakrpapersfilessmallAreaEstimationpdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? the impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using mrp? preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [wwwnoamlupucomAampCpdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the us mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of american community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the american states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The northeast regional center for rural development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). Missforest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
  • Results
    • Unions and unequal legislative responsiveness
    • Further robustness tests
    • Relaxing modeling assumptions
  • Heterogeneity
  • Exploring Possible Mechanisms
  • Conclusion
  • Data
  • Estimation of District Preferences
    • Small Area Estimation via Chained Random Forests
    • Multilevel Regression and Poststratification
    • Model results under various preference estimation strategies
  • Alternative Income Thresholds
  • Measures of District Organizational Capacity
  • Additional Robustness Test
  • Post-Double-Selection Estimator
  • Nonparametric Evidence for Union-Preferences Interaction

Table C1
Model results using different definitions of income groups. Marginal effect of a standard deviation increase in union membership on the marginal effect of income group preferences on legislator vote.

                               (1)       (2)       (3)       (4)       (5)       (6)

A: District-specific income thresholds
Low income preferences        0.106     0.082     0.098     0.084     0.068     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.063    −0.036    −0.053    −0.051    −0.050    −0.040
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.014)

B: State-specific income thresholds
Low income preferences        0.105     0.082     0.097     0.083     0.067     0.062
                             (0.013)   (0.015)   (0.016)   (0.013)   (0.014)   (0.014)
High income preferences      −0.062    −0.036    −0.052    −0.050    −0.049    −0.039
                             (0.012)   (0.013)   (0.014)   (0.013)   (0.013)   (0.013)

C: Shifted income thresholds, p20 - p80
Low income preferences        0.098     0.077     0.09      0.078     0.063     0.057
                             (0.012)   (0.013)   (0.014)   (0.012)   (0.013)   (0.013)
High income preferences      −0.054    −0.031    −0.046    −0.044    −0.044    −0.034
                             (0.011)   (0.012)   (0.012)   (0.011)   (0.012)   (0.012)

District fixed effects         X X X X X X
Group preferences ×
  union policy                 X X
  × state constants            X
  × organizational capacity    X X
  × district covariates        X X

Note: Replicates Table I in the main text using income groups defined via different income thresholds.

D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here, we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating that they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there is also a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D1 shows the resulting variation in certification elections over districts.

Figure D1
Total number of union certification elections in House districts (109th-112th Congress). Map legend bins: 0–13, 13–26, 26–39, 39–62, 62–164.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection, derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts this reduces to a simple summation, as counties are perfectly nested within districts).
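The population-weighted interpolation can be illustrated with a small sketch. It shows one plausible reading of the procedure, in which each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection; the county names, counts, and populations below are made up, and the function name is hypothetical.

```python
from collections import defaultdict

def counties_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts by population share.

    intersections: list of (county, district, population_in_intersection).
    """
    county_pop = defaultdict(float)
    for county, _, pop in intersections:
        county_pop[county] += pop                      # total population per county
    district_counts = defaultdict(float)
    for county, district, pop in intersections:
        share = pop / county_pop[county]               # weight of this intersection
        district_counts[district] += share * county_counts[county]
    return dict(district_counts)

# Hypothetical example: county A is split across two districts, county B is nested in D2.
county_counts = {"A": 100, "B": 40}
intersections = [("A", "D1", 3000), ("A", "D2", 1000), ("B", "D2", 2000)]
district_counts = counties_to_districts(county_counts, intersections)
```

When every county lies entirely inside one district, each share equals one and the computation collapses to a plain sum over counties, matching the at-large case noted above.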


E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate whether this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11104 respondents. We find that our results are not influenced by this change.

Table E1
Additional robustness tests

                                          Low income         High income
                                          preferences        preferences        N

(1) Injective preference-roll call map    0.063 (0.013)     −0.041 (0.013)     11104
(2) Extreme preferences excl.             0.074 (0.016)     −0.048 (0.015)     13308
(3) New York excluded                     0.070 (0.015)     −0.048 (0.014)     14730
(4) Local Union Concentration             0.065 (0.014)     −0.047 (0.014)     15780
(5) Trimmed LPM estimator                 0.074 (0.015)     −0.055 (0.014)     15426
(6) Errors-in-variables                   0.062 (0.004)     −0.054 (0.004)     15345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate whether extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
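The per-roll-call trimming rule can be written as a short mask computation. The sketch below is illustrative only: the preference values and roll call identifiers are simulated, and the function name is hypothetical.

```python
import numpy as np

def trim_within_roll_call(prefs, roll_call, lower=5, upper=95):
    """Boolean mask keeping estimates inside the [lower, upper] percentile band of their roll call."""
    keep = np.zeros(len(prefs), dtype=bool)
    for rc in np.unique(roll_call):
        idx = roll_call == rc
        lo, hi = np.percentile(prefs[idx], [lower, upper])   # cutoffs for this roll call only
        keep[idx] = (prefs[idx] >= lo) & (prefs[idx] <= hi)
    return keep

rng = np.random.default_rng(2)
roll_call = np.repeat(np.arange(3), 200)      # 3 hypothetical roll calls, 200 districts each
prefs = rng.uniform(size=600)                 # simulated district preference estimates
keep = trim_within_roll_call(prefs, roll_call)
```

Computing the cutoffs separately for each roll call means that a district can be dropped for one vote and retained for another, which is what trimming "for each roll call" implies.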

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check whether our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
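The idea behind trimming can be sketched as a sequential procedure: fit the linear probability model by OLS, drop observations whose fitted values fall outside the unit interval, and re-estimate until no such observations remain. This is one common reading of the trimming approach, not necessarily the exact variant estimated here; the data are simulated and the function name is hypothetical.

```python
import numpy as np

def trimmed_lpm(X, y):
    """Re-estimate OLS after dropping observations whose fitted value leaves [0, 1]."""
    keep = np.ones(len(y), dtype=bool)
    while True:
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        inside = keep & (fitted >= 0.0) & (fitted <= 1.0)
        if inside.sum() == keep.sum():        # nothing left to trim
            return beta, keep
        keep = inside                         # dropped observations stay dropped

# Simulated binary outcome (hypothetical) whose true probability is clipped to [0, 1].
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
prob = np.clip(0.5 + 0.25 * x, 0.0, 1.0)
y = (rng.random(n) < prob).astype(float)
X = np.column_stack([np.ones(n), x])
beta_trim, keep = trimmed_lpm(X, y)
fitted_kept = (X @ beta_trim)[keep]
```

The loop terminates because the retained sample shrinks strictly at every pass that trims anything.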

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While, in general, standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.$^{31}$ We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the double-post-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with E(ϵjd |ZsUd θjd) = 0 Here y is the vote of a representative in a given district Ud isthe level of union density e function д(Zd) captures the possibly high-dimensional andnonlinear inuence of confounders (interacted with income group preferences) e utilityof this specication as a robustness tests stems from the fact that it imposes no a priorirestriction on the functional form of confounding variables A second key ingredient in amodel capturing biases due to omied variables is the relationship between the treatment(union density) and confounders erefore we consider the following auxiliary treatmentequation

Ud =m(Zd) +vi E(vi |Zd = 0) (F2)

which relates treatment to covariates Zd e function m(Zd) summarizes the confoundingeect that potentially create omied variable bias if m 0 which is to be expected in anobservational study such as ours

e next step is to create approximations to both д(middot) and m(middot) by including a largenumber (p) of control terms wd = P(Zd) isin R

p ese control terms can be spline transformsof covariates higher order interaction terms etc Even with an initially limited set ofvariables the number of control terms can grow large say p gt 200 To limit the number ofestimated coecients we assume that д andm are approximately sparse (Belloni et al 2013)and can be modeled using s non-zero coecients (with s p) selected using regularizationtechniques such as the LASSO (see Tibshirani 1996 see Ratkovic and Tingley 2017 for arecent exposition in a political science context)

yjd = microlθ ljd + micro

hθhjd + ηlUdθ

ljd + η

hUdθhjd +w

primedβд0 + rдd + ζjd (F3)

Ud = wprimedβm0 + rmi +vd (F4)

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques such as the LASSO are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation, we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).[32] Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
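The two selection steps and the final OLS step can be sketched in a few lines. The following is a minimal illustration on simulated data, not the implementation used for the reported estimates: it assumes a single treatment coefficient rather than the interaction terms in (F1), and it swaps in a plain LASSO solved by coordinate descent with a fixed, hypothetical penalty `alpha` in place of the root-LASSO. The function names `lasso_cd` and `post_double_selection` are illustrative.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Plain LASSO by cyclic coordinate descent (a stand-in for the root-LASSO).

    Minimizes (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()              # residual when b = 0
    for _ in range(n_iter):
        for k in range(p):
            resid += X[:, k] * b[k]             # partial residual without column k
            rho = X[:, k] @ resid / n
            b[k] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[k]
            resid -= X[:, k] * b[k]             # restore the full residual
    return b

def post_double_selection(y, u, W, alpha):
    """Select controls for outcome and treatment, then run OLS on their union."""
    Ws = (W - W.mean(axis=0)) / W.std(axis=0)
    I1 = np.flatnonzero(lasso_cd(Ws, y - y.mean(), alpha))   # controls predicting y
    I2 = np.flatnonzero(lasso_cd(Ws, u - u.mean(), alpha))   # controls predicting U
    selected = np.union1d(I1, I2)
    X = np.column_stack([np.ones(len(y)), u, W[:, selected]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], selected

# Simulated illustration (not the paper's data): control 0 drives the
# treatment strongly but enters the outcome only moderately.
rng = np.random.default_rng(0)
n, p = 1000, 30
W = rng.normal(size=(n, p))
u = 1.0 * W[:, 0] + rng.normal(size=n)
y = 0.5 * u + 0.2 * W[:, 0] + rng.normal(size=n)
eta_hat, selected = post_double_selection(y, u, W, alpha=0.1)
```

Because the treatment equation is also subjected to selection, control 0 enters the final regression through its strong association with `u`, which is the case the double-selection logic is meant to cover.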

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}, \theta^h_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared $L_2$ penalty to the least squares criterion:

$$c^* = \operatorname*{argmin}_{c \in \mathbb{R}^D} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}$$

which yields an estimator for $c$ as $c^* = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

[32] For a very general discussion, see Belloni et al. (2017).
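To make the estimator in (G3) concrete, the sketch below fits KRLS weights on simulated data. It is an illustration under stated assumptions, not the code behind the reported results: $\lambda$ is chosen from a small hypothetical grid rather than by a continuous search, and the leave-one-out errors use the standard closed form for ridge-type smoothers, $c_i / [(K + \lambda I)^{-1}]_{ii}$. The function names are illustrative.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def krls_fit(Z, y, lambdas):
    """Return (K, c, lam), with lam picked from `lambdas` by leave-one-out loss."""
    n, D = Z.shape
    K = gaussian_kernel(Z, sigma2=float(D))        # sigma^2 = D, as in the text
    best = None
    for lam in lambdas:
        G_inv = np.linalg.inv(K + lam * np.eye(n))
        c = G_inv @ y                               # c* = (K + lam I)^{-1} y
        loo_resid = c / np.diag(G_inv)              # closed-form leave-one-out errors
        loss = float(loo_resid @ loo_resid)
        if best is None or loss < best[0]:
            best = (loss, lam, c)
    return K, best[2], best[1]

# Simulated illustration: a nonlinear, non-additive surface in two covariates.
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 2))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)            # standardize covariates
y = np.sin(Z[:, 0]) + 0.5 * Z[:, 0] * Z[:, 1] + 0.1 * rng.normal(size=200)
y = y - y.mean()
K, c, lam = krls_fit(Z, y, lambdas=[0.01, 0.1, 1.0, 10.0])
fitted = K @ c                                       # in-sample predictions, as in (G1)
```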

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double-selection LASSO). Second, it allows us to check whether the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

$$\frac{\partial y}{\partial z^{U_d}_j} = -\frac{2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z^{U_d}_d - z^{U_d}_j\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
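The pointwise derivatives can be illustrated on simulated data in which the marginal effect of one covariate depends on another. This is a hedged sketch, not the paper's code: $\lambda$ is fixed at a hypothetical value, a linear fit stands in for the thin plate smoother, and the difference inside the sum is written as $z_j - z_i$, which is what differentiating the Gaussian kernel with respect to the evaluation point $z_j$ gives. Readers should check the sign convention against the reference implementation they use.

```python
import numpy as np

def krls_pointwise_derivative(Z, c, sigma2, d):
    """Derivative of f(z_j) = sum_i c_i exp(-||z_j - z_i||^2 / sigma2) w.r.t. column d."""
    diff = Z[:, None, :] - Z[None, :, :]                  # diff[j, i] = z_j - z_i
    K = np.exp(-(diff ** 2).sum(axis=2) / sigma2)          # K[j, i]
    return -(2.0 / sigma2) * (K * diff[:, :, d]) @ c       # one value per observation j

# Simulated illustration: column 0 plays the role of a group preference,
# column 1 the role of union density; the outcome is their product plus noise,
# so the true marginal effect of column 0 equals column 1.
rng = np.random.default_rng(2)
n = 300
Z = rng.normal(size=(n, 2))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
y = Z[:, 0] * Z[:, 1] + 0.1 * rng.normal(size=n)

sigma2 = float(Z.shape[1])                                 # sigma^2 = D
lam = 0.1                                                  # hypothetical fixed penalty
K = np.exp(-((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2) / sigma2)
c = np.linalg.solve(K + lam * np.eye(n), y)

deriv = krls_pointwise_derivative(Z, c, sigma2, d=0)
# Linear stand-in for the smoother: how the marginal effect moves with column 1.
slope = np.polyfit(Z[:, 1], deriv, 1)[0]
```

A positive `slope` indicates that the estimated marginal effect of the first covariate rises with the second, without an interaction term having been specified.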

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.


Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.


Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.


Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947–1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.


Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964–2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.


Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990–2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

D Measures of District Organizational Capacity

In the empirical analysis reported in the main text, we use two proxies for the organizational capacity of workers: union certification elections and the number of religious congregations. Here we provide some background and explain in more detail how we calculate both variables.

NLRB certification elections. The formation of unions is regulated by the National Labor Relations Act (NLRA), enacted in 1935 (see Budd 2018, ch. 6). A successful union organization process usually requires an absolute majority of employees voting for the proposed union in a certification election held under the guidelines of the National Labor Relations Board (NLRB). Getting the NLRB to conduct an election requires that there is sufficient interest among employees in an appropriate bargaining unit to be represented by a union. For proof of sufficient interest, the NLRB requires that at least 30% of employees sign an authorization card stating they authorize a particular union to represent them for the purpose of collective bargaining. Building support and collecting the required signatures takes organizational effort. For workers, unionization has features of a public good. Everybody may gain through better conditions from collective bargaining, but contributing to the organizational drive is costly for each individual. Beyond mere opportunity costs, there also is a non-zero risk of being (illegally) fired by the employer for those especially active. If more than 50% of employees sign authorization cards, then the union can request voluntary recognition without a certification election. However, the employer has the right to deny this, in which case a certification election is held. In his labor relations textbook, Budd (2018, 199) notes that voluntary card check recognition is "the exception rather than the norm because employers typically refuse to recognize unions voluntarily."

We use the NLRB's database on election reports to extract all attempts to certify (or de-certify) a local union. They are available from www.nlrb.gov. Each database entry is a vote concerning a bargaining unit; the average unit size is 25 employees. There are about 2,200 elections each year. Each individual case file usually provides address information on the employer and the site where the election was held. Using this information, we geocode each individual case report and locate it in a congressional district. Figure D.1 shows the resulting variation in certification elections over districts.

Congregations. As a proxy for district-level social capital, we use the number of congregations per inhabitant. The number of congregations in a given district is not readily available for the years covered in our study. Therefore, we spatially aggregate county-level measures from the 2010 Religious Congregations and Membership Study to the congressional district level using areal interpolation techniques that take into account the population distribution between counties and districts. We use a geographic county-to-district equivalence file calculated from Census shapefiles. This is combined with population weights for each county-district intersection derived using the Master Area Block Level Equivalency database v1.3.3 (available from the Missouri Census Data Center), which calculates them based on about 5.3 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).

[Figure D.1: Total number of union certification elections in House districts (109th–112th Congress). Map legend bins: 0–13, 13–26, 26–39, 39–62, 62–164.]
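The interpolation step can be sketched as follows. This is a simplified illustration with made-up numbers, assuming that each county's count is allocated to districts in proportion to the share of the county's population living in each county-district intersection; the function name, the district labels, and the input layout are hypothetical and do not reflect the structure of the equivalence file.

```python
from collections import defaultdict

def interpolate_to_districts(county_counts, intersections):
    """Allocate county-level counts to districts using population weights.

    county_counts: {county: count}
    intersections: list of (county, district, population in the intersection)
    """
    county_pop = defaultdict(float)
    for county, _, pop in intersections:
        county_pop[county] += pop

    district_counts = defaultdict(float)
    for county, district, pop in intersections:
        weight = pop / county_pop[county]          # share of the county in this district
        district_counts[district] += weight * county_counts[county]
    return dict(district_counts)

# Made-up example: county A lies entirely in district CD-1,
# county B is split 75/25 (by population) between CD-1 and CD-2.
county_counts = {"A": 100, "B": 60}
intersections = [
    ("A", "CD-1", 50_000),
    ("B", "CD-1", 30_000),
    ("B", "CD-2", 10_000),
]
district_counts = interpolate_to_districts(county_counts, intersections)
```

When every county falls entirely inside one district, each weight equals one and the procedure reduces to a simple summation, as noted for at-large states.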

E Additional Robustness Test

In this section, we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.
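As a small illustration of the 1:1 restriction, the sketch below keeps only the first roll call matched to each preference item. The item and roll call labels are hypothetical, and "first" is assumed to mean first in the order in which the links are listed.

```python
def injective_map(links):
    """Keep the first roll call linked to each preference item.

    links: ordered iterable of (preference_item, roll_call) pairs.
    """
    first_match = {}
    for item, roll_call in links:
        first_match.setdefault(item, roll_call)    # later matches are dropped
    return first_match

# Hypothetical labels for illustration only.
links = [
    ("minimum_wage", "rc_018"),
    ("minimum_wage", "rc_023"),
    ("stem_cell", "rc_204"),
]
one_to_one = injective_map(links)
```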

Table E.1: Additional robustness tests

                                          Low income        High income
                                          preferences       preferences        N
(1) Injective preference-roll call map    0.063 (0.013)     −0.041 (0.013)     11,104
(2) Extreme preferences excl.             0.074 (0.016)     −0.048 (0.015)     13,308
(3) New York excluded                     0.070 (0.015)     −0.048 (0.014)     14,730
(4) Local Union Concentration             0.065 (0.014)     −0.047 (0.014)     15,780
(5) Trimmed LPM estimator                 0.074 (0.015)     −0.055 (0.014)     15,426
(6) Errors-in-variables                   0.062 (0.004)     −0.054 (0.004)     15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework; see text for details. Its table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2), we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
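The trimming rule can be written compactly. The sketch below uses simulated preference estimates and assumes percentiles are computed separately within each roll call with NumPy's default linear interpolation; the variable names are illustrative.

```python
import numpy as np

def trim_extreme(pref, roll_call, lower=5, upper=95):
    """Boolean mask keeping districts between the two percentiles, per roll call."""
    keep = np.zeros(len(pref), dtype=bool)
    for rc in np.unique(roll_call):
        idx = roll_call == rc
        lo, hi = np.percentile(pref[idx], [lower, upper])
        keep[idx] = (pref[idx] >= lo) & (pref[idx] <= hi)
    return keep

# Simulated illustration: 2 roll calls with 100 district estimates each.
rng = np.random.default_rng(4)
pref = rng.uniform(size=200)
roll_call = np.repeat([0, 1], 100)
keep = trim_extreme(pref, roll_call)
```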

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3), we show that our results are not affected by its exclusion.

Union concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4), we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E.1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
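One common reading of the trimming idea is sequential: estimate the linear probability model by OLS, drop observations whose fitted values fall outside the unit interval, and re-estimate until no retained observation lies outside it. The sketch below implements that reading on simulated data; it is an assumption about the procedure rather than a reproduction of the specification in Table E.1, and the function name is illustrative.

```python
import numpy as np

def trimmed_lpm(X, y, max_iter=200):
    """Iteratively re-fit OLS on observations with fitted values inside [0, 1]."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(max_iter):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        inside = keep & (fitted >= 0.0) & (fitted <= 1.0)
        if inside.sum() == keep.sum():
            break                      # every retained observation is inside [0, 1]
        keep = inside                  # trim and re-estimate
    return beta, keep

# Simulated illustration: the true probability is linear in x but clipped to [0, 1].
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
prob = np.clip(0.5 + 0.3 * x, 0.0, 1.0)
y = (rng.uniform(size=n) < prob).astype(float)
X = np.column_stack([np.ones(n), x])
beta, keep = trimmed_lpm(X, y)
fitted_kept = X[keep] @ beta
```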

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While standard errors for our district-level estimates are in general quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
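The expected downward bias can be made explicit in the simplest textbook case. The following worked equation assumes a single regressor measured with classical (mean-zero, independent) error; the symbols $x^*$, $x$, $u$, and $\beta$ are introduced here for illustration and are not part of the model above.

```latex
% Observed regressor x^* equals the true regressor x plus classical error u.
x^* = x + u, \qquad u \perp (x, \varepsilon), \qquad y = \beta x + \varepsilon
% OLS of y on x^* converges to an attenuated coefficient:
\operatorname{plim} \hat{\beta}_{OLS}
  = \frac{\operatorname{Cov}(x^*, y)}{\operatorname{Var}(x^*)}
  = \beta \, \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}
```

When the measurement error variance $\sigma_u^2$ is small relative to $\sigma_x^2$, the attenuation factor is close to one, which is consistent with the small change reported for this specification.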

[31] We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation, we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom, mean 1, and SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^l \theta^l_{jd} + \mu^h \theta^h_{jd} + \eta^l U_d \theta^l_{jd} + \eta^h U_d \theta^h_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with E(ϵjd |ZsUd θjd) = 0 Here y is the vote of a representative in a given district Ud isthe level of union density e function д(Zd) captures the possibly high-dimensional andnonlinear inuence of confounders (interacted with income group preferences) e utilityof this specication as a robustness tests stems from the fact that it imposes no a priorirestriction on the functional form of confounding variables A second key ingredient in amodel capturing biases due to omied variables is the relationship between the treatment(union density) and confounders erefore we consider the following auxiliary treatmentequation

Ud =m(Zd) +vi E(vi |Zd = 0) (F2)

which relates treatment to covariates Zd e function m(Zd) summarizes the confoundingeect that potentially create omied variable bias if m 0 which is to be expected in anobservational study such as ours

e next step is to create approximations to both д(middot) and m(middot) by including a largenumber (p) of control terms wd = P(Zd) isin R

p ese control terms can be spline transformsof covariates higher order interaction terms etc Even with an initially limited set ofvariables the number of control terms can grow large say p gt 200 To limit the number ofestimated coecients we assume that д andm are approximately sparse (Belloni et al 2013)and can be modeled using s non-zero coecients (with s p) selected using regularizationtechniques such as the LASSO (see Tibshirani 1996 see Ratkovic and Tingley 2017 for arecent exposition in a political science context)

yjd = microlθ ljd + micro

hθhjd + ηlUdθ

ljd + η

hUdθhjd +w

primedβд0 + rдd + ζjd (F3)

Ud = wprimedβm0 + rmi +vd (F4)

Here rдi and rmi are approximation errorsHowever before proceeding we need to consider the problem that variable selection

techniques such as the LASSO are intended for prediction not inference In fact a ldquonaiverdquoapplication of variable selection where one keeps only the signicantw variables in equation(F3) fails It relies on perfect model selection and can lead to biased inferences and misleadingcondence intervals (see Leeb and Potscher 2008) us one can re-express the problemas one of prediction by substituting the auxiliary treatment equation (F4) for Dd in (F3)yielding a reduced form equation with a composite approximation error (cf Belloni et al2013) Now both equations in the system represent predictive relationships and are thusamenable to high-dimensional selection techniques

Note that using this dual equation setup is also necessary to guard against variableselection errors To see this consider the consequence of applying variable selection tech-

43

niques to the outcome equation only In trying to predict y with w an algorithm (such asLASSO) will favor variables with large coecients in β0 but will ignore those of intermediateimpact However omied variables that are strongly related to the treatment ie with largecoecients in βm0 can lead to large omied variable bias in the estimate of η even whenthe size of their coecient in β0 is moderate e Post-double selection estimator suggestedby Belloni et al (2013) addresses this problem by basing selection on both reduced formequations Let I1 be the control set selected by LASSO of yjd on wd in the rst predictiveequation and let I2 be the control set selected by LASSO ofUd on wd in the second equationen parameter estimates for the eects of union density and the regularized control setare obtained by OLS estimation of equation (F1) with the set I = I1 cup I2 included as controls(replacing д(middot)) In our implementation we employ the root-LASSO (Belloni et al 2011) ineach selection step

is estimator has low bias and yields accurate condence intervals even under moderateselection mistakes (Belloni and Chernozhukov 2009 Belloni et al 2014)32 Responsible forthis robustness is the indirect LASSO step selecting the Ud-control set It nds controlswhose omission leads to ldquolargerdquo omied variable bias and includes them in the model Anyvariables that are not included (ldquoomiedrdquo) are therefore at most mildly associated to Ud andyjd which decidedly limits the scope of omied variable bias (Chernozhukov et al 2015)

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^l_{jd}, \theta^h_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$c^{*} = \underset{c \in \mathbb{R}^{D}}{\operatorname{argmin}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.

[32] For a very general discussion, see Belloni et al. (2017).
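The closed-form estimator in (G3) and the choice of $\lambda$ can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions rather than the KRLS software used for the analysis: the covariates are standardized before building the kernel (an assumption, the text does not say), $\lambda$ is searched over a small hand-picked grid, and the leave-one-out residuals use the standard ridge identity $c_i / [(K + \lambda I)^{-1}]_{ii}$. The names `gaussian_kernel` and `krls_fit` and the simulated data are hypothetical.

```python
import numpy as np

def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq = (Z ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-d2 / sigma2)

def krls_fit(Z, y, lambdas):
    """Fit c = (K + lambda I)^{-1} y, choosing lambda by leave-one-out loss."""
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # assumption: standardized inputs
    sigma2 = float(Zs.shape[1])                 # sigma^2 = D, the number of columns
    K = gaussian_kernel(Zs, sigma2)
    evals, evecs = np.linalg.eigh(K)            # reuse one decomposition for all lambdas
    best = None
    for lam in lambdas:
        G_inv = (evecs / (evals + lam)) @ evecs.T   # (K + lam I)^{-1}
        c = G_inv @ y
        loo_loss = np.sum((c / np.diag(G_inv)) ** 2)  # sum of squared LOO residuals
        if best is None or loo_loss < best[0]:
            best = (loo_loss, lam, c)
    _, lam, c = best
    return c, K, lam, sigma2

# Hypothetical data with a nonlinear, non-additive signal.
rng = np.random.default_rng(1)
n = 200
Z = rng.normal(size=(n, 3))
y = np.sin(Z[:, 0]) + Z[:, 1] * Z[:, 2] + 0.1 * rng.normal(size=n)

grid = [0.01, 0.1, 1.0, 10.0]
c, K, lam, sigma2 = krls_fit(Z, y, lambdas=grid)
fitted = K @ c
print("chosen lambda:", lam)
```

Working from the eigendecomposition of $K$ keeps the grid search cheap, since each candidate $\lambda$ only rescales the eigenvalues.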

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double-selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

$$\frac{\partial y}{\partial z_j^{U_d}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z_d^{U_d} - z_j^{U_d}\right) \tag{G4}$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
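A pointwise derivative of the kind used in (G4) can be computed directly from the fitted weights. The sketch below is illustrative only and assumes a fitted function of the form $f(z) = \sum_i c_i \exp(-\lVert z - z_i \rVert^2 / \sigma^2)$; the difference term is ordered as evaluation point minus kernel center, which is the ordering under which the factor $-2/\sigma^2$ gives the analytical derivative of the Gaussian kernel. The function names and the random inputs are hypothetical, and the thin plate smoothing step is not shown.

```python
import numpy as np

def krls_predict(z, Z, c, sigma2):
    """Evaluate f(z) = sum_i c_i * exp(-||z - z_i||^2 / sigma2) at a single point z."""
    k = np.exp(-((Z - z) ** 2).sum(axis=1) / sigma2)
    return k @ c

def krls_pointwise_derivative(Z, c, sigma2, col):
    """Derivative of f with respect to covariate `col`, evaluated at every row of Z."""
    sq = (Z ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    K = np.exp(-d2 / sigma2)                          # K[j, i] = k(z_j, z_i)
    diff = Z[:, col][:, None] - Z[:, col][None, :]    # diff[j, i] = z_j[col] - z_i[col]
    return (-2.0 / sigma2) * ((K * diff) @ c)

# Hypothetical inputs: 30 observations, 3 covariates, arbitrary weights.
rng = np.random.default_rng(2)
Z = rng.normal(size=(30, 3))
c = rng.normal(size=30)
sigma2 = 3.0

deriv = krls_pointwise_derivative(Z, c, sigma2, col=2)
print("one derivative per observation:", deriv.shape)
```

The result has one entry per observation, which is what makes it possible to plot the derivatives against another covariate such as union membership.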

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents’ responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem Deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers’ policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America’s Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public Opinion in State Politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The Collapse and Revival of American Community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


Figure D1. Total number of union certification elections in House districts (109th-112th Congress). [Map legend bins: 0–13, 13–26, 26–39, 39–62, 62–164.]

database v133 (available from the Missouri Census Data Center), which calculates them based on about 53 million Census blocks. With these weights in hand, we can interpolate county-level to district-level congregation counts using weighted means (for states with at-large districts, this reduces to a simple summation, as counties are perfectly nested within districts).

E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1. Additional robustness tests

                                            Low income        High income
                                            preferences       preferences         N
(1) Injective preference-roll call map      0.063 (0.013)     −0.041 (0.013)    11,104
(2) Extreme preferences excl.               0.074 (0.016)     −0.048 (0.015)    13,308
(3) New York excluded                       0.070 (0.015)     −0.048 (0.014)    14,730
(4) Local Union Concentration               0.065 (0.014)     −0.047 (0.014)    15,780
(5) Trimmed LPM estimator                   0.074 (0.015)     −0.055 (0.014)    15,426
(6) Errors-in-variables                     0.062 (0.004)     −0.054 (0.004)    15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
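The trimming rule can be sketched as a within-roll-call percentile filter. This is a minimal illustration, assuming one preference series and NumPy's default (linear interpolation) percentile definition; the function name, the array layout, and the toy data are hypothetical, and how the low and high income series are combined is not specified in the text.

```python
import numpy as np

def trim_within_roll_call(roll_call, pref, lower=5.0, upper=95.0):
    """Boolean mask keeping rows inside the [lower, upper] percentiles of their roll call."""
    roll_call = np.asarray(roll_call)
    pref = np.asarray(pref, dtype=float)
    keep = np.zeros(pref.shape[0], dtype=bool)
    for rc in np.unique(roll_call):
        idx = roll_call == rc
        # Cutoffs are computed separately for each roll call, not on the pooled data.
        lo, hi = np.percentile(pref[idx], [lower, upper])
        keep[idx] = (pref[idx] >= lo) & (pref[idx] <= hi)
    return keep

# Toy data: two roll calls with 100 districts each, on very different scales.
roll_call = np.repeat([1, 2], 100)
pref = np.concatenate([np.arange(100.0), 1000.0 + np.arange(100.0)])

keep = trim_within_roll_call(roll_call, pref)
print("districts kept:", int(keep.sum()), "of", keep.size)
```

Computing the cutoffs per roll call matters here: pooled percentiles would discard districts from only one end of each roll call's distribution.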

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
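One common reading of the trimming idea is a sequential procedure: fit the linear probability model by OLS, drop observations whose fitted values fall outside the unit interval, and refit until no kept observation violates the bounds. The sketch below follows that reading and should be taken as an assumption about the procedure, not as the exact estimator used for Table E1; the function name and the simulated data are hypothetical, and fixed effects and clustering are ignored.

```python
import numpy as np

def trimmed_lpm(x, y):
    """Refit OLS repeatedly, dropping rows whose fitted value lies outside [0, 1]."""
    X = np.column_stack([np.ones(len(y)), x])
    keep = np.ones(len(y), dtype=bool)
    for _ in range(len(y)):  # the kept set shrinks each round, so this terminates
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X @ beta
        new_keep = keep & (fitted >= 0.0) & (fitted <= 1.0)
        if new_keep.sum() == keep.sum():
            break  # every kept row now has a fitted value inside the unit interval
        keep = new_keep
    return beta, keep

# Hypothetical binary outcome whose true probability is linear only inside [0, 1].
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
p_true = np.clip(0.5 + 0.4 * x, 0.0, 1.0)
y = (rng.uniform(size=n) < p_true).astype(float)

beta, keep = trimmed_lpm(x, y)
print("rows kept:", int(keep.sum()), "slope:", round(float(beta[1]), 3))
```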

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
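The expected downward bias can be illustrated with a small simulation of classical measurement error in a single regressor, where the OLS slope shrinks by the reliability ratio $\mathrm{Var}(x) / (\mathrm{Var}(x) + \mathrm{Var}(u))$. This is a textbook sketch with made-up values (true slope 1, noise standard deviation 0.5); it is not the Bayesian errors-in-variables model estimated here and does not use the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 1.0        # hypothetical true slope
noise_sd = 0.5    # hypothetical measurement error SD

x_true = rng.normal(0.0, 1.0, n)
x_obs = x_true + rng.normal(0.0, noise_sd, n)   # regressor observed with error
y = beta * x_true + rng.normal(0.0, 1.0, n)

# OLS slopes from a regression of y on the true and on the noisy regressor.
slope_true = np.cov(x_true, y)[0, 1] / np.var(x_true, ddof=1)
slope_obs = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)

# Attenuation factor implied by classical measurement error theory.
reliability = 1.0 / (1.0 + noise_sd ** 2)
print("slope with true x:", round(slope_true, 3))
print("slope with noisy x:", round(slope_obs, 3), "theory:", round(beta * reliability, 3))
```

When the measurement error variance is small relative to the variance of the regressor, the reliability ratio is close to one and the correction changes little, which is consistent with the small adjustment reported above.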

[31] We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context).
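To see how quickly the number of control terms grows, consider a small sketch that expands a covariate matrix into linear terms, squares, and all pairwise interactions. It is only a stand-in for the dictionary $P(Z_d)$: the text also mentions spline transforms, which are not built here, and the function name and the choice of 20 covariates are hypothetical.

```python
import numpy as np
from itertools import combinations

def build_control_terms(Z):
    """Stack linear terms, squares, and all pairwise interactions of the columns of Z."""
    cols = [Z, Z ** 2]
    cols += [(Z[:, a] * Z[:, b])[:, None] for a, b in combinations(range(Z.shape[1]), 2)]
    return np.hstack(cols)

# Hypothetical example: 20 covariates already yield 20 + 20 + 190 = 230 control terms.
rng = np.random.default_rng(4)
Z = rng.normal(size=(100, 20))
W = build_control_terms(Z)
print("covariates:", Z.shape[1], "control terms:", W.shape[1])
```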

Substituting these approximations into (F1) and (F2) gives

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques such as the LASSO are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variableselection errors To see this consider the consequence of applying variable selection tech-

43

niques to the outcome equation only In trying to predict y with w an algorithm (such asLASSO) will favor variables with large coecients in β0 but will ignore those of intermediateimpact However omied variables that are strongly related to the treatment ie with largecoecients in βm0 can lead to large omied variable bias in the estimate of η even whenthe size of their coecient in β0 is moderate e Post-double selection estimator suggestedby Belloni et al (2013) addresses this problem by basing selection on both reduced formequations Let I1 be the control set selected by LASSO of yjd on wd in the rst predictiveequation and let I2 be the control set selected by LASSO ofUd on wd in the second equationen parameter estimates for the eects of union density and the regularized control setare obtained by OLS estimation of equation (F1) with the set I = I1 cup I2 included as controls(replacing д(middot)) In our implementation we employ the root-LASSO (Belloni et al 2011) ineach selection step

is estimator has low bias and yields accurate condence intervals even under moderateselection mistakes (Belloni and Chernozhukov 2009 Belloni et al 2014)32 Responsible forthis robustness is the indirect LASSO step selecting the Ud-control set It nds controlswhose omission leads to ldquolargerdquo omied variable bias and includes them in the model Anyvariables that are not included (ldquoomiedrdquo) are therefore at most mildly associated to Ud andyjd which decidedly limits the scope of omied variable bias (Chernozhukov et al 2015)

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text we want to estimate a specication that makes as lile apriori assumptions about functional form relationships between variables (including theirinteractions) us we non-parametrically model yijd = f (z) with z = [θ l

jd θh

jdUdXd] by

approximating it via Kernel Regularized Least Squares (Hainmueller and Hazle 2014)

y = Kc (G1)

Here K is an N times N Gaussian Kernel matrix

K = exp(minusZd minus zj

2

σ 2

)(G2)

with an associated vector of weights c Intuitively one can think of KRLS as a local regressionmethod which predicts the outcome at each covariate point by calculating an optimallyweighted sum of locally ed functions e KRLS algorithm uses Gaussian kernels centeredaround an observation e weights c are chosen to produce the best t to the data Sincea possibly large number of c values provide (approximately) optimal weights it makessense to prefer values of c that produce ldquosmootherrdquo function surfaces is is achieved via

32For a very general discussion see Belloni et al (2017)

44

regularization by adding a squared L2 penalty to the least squares criterion

clowast = argmincisinRD

[(y minus Kc)prime(y minus Kc) + λcprimeKc] (G3)

which yields an estimator for c as clowast = (K + λI )minus1y (see Hainmueller and Hazle 2014appendix) is leaves two parameters to be set σ 2 and λ Following Hainmueller andHazle (2014) we set σ 2 = D the number of columns in z and let λ be chosen by minimizingleave-one-out loss

e benet of this approach is twofold First it allows for an approximation of highlynonlinear and non-additive functional forms (without having to construct non-linear termsas we do in the post-double selection LASSO) Second it allows us to check if the marginaleects of group preferences changes with levels of union densitywithout explicitly specifyingthis interaction term (and instead learning it from the data) To do the laer one can calculatepointwise partial derivatives of y with respect to a chosen covariate z(d) (Hainmueller andHazle 2014 156) For any given observation j we calculate

party

partzUdj=minus2σ 2

sumi

ci exp(minusZd minus zj

2

σ 2

) (ZUddminus zUdj

) (G4)

ese yields as many partial derivatives as there are cases We apply a thin plate smoother(with parameters chosen via cross-validation) to plot these against district-level unionmembership in Figure IV

References

Abadie A S Athey G W Imbens and J Wooldridge (2017 November) When should youadjust standard errors for clustering NBER Working Paper No 24003

Ahlquist J (2017) Labor unions political representation and economic inequality AnnualReview of Political Science 17 409ndash432

Ahlquist J S A B Clayton and M Levi (2014) Provoking preferences Unionization tradepolicy and the ilwu puzzle International Organization 68(1) 33ndash75

Ahlquist J S and M Levy (2013) In the Interests of Others Princeton Princeton UniversityPress

Ansolabehere S and P E Jones (2010) Constituentsrsquo responses to congressional roll-callvoting American Journal of Political Science 54(3) 583ndash597

Anzia S F (2011) Election timing and the electoral inuence of interest groups Journal ofPolitics 73(2) 412ndash427

45

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University, CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "Dem Deutschen Volke"? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming, https://doi.org/10.1017/S0003055418000606.

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs. Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


E Additional Robustness Test

In this section we describe several additional robustness tests.

1:1 mapping of CCES preferences to roll calls. We begin by limiting our sample by creating a unique mapping between preferences and roll call votes. Some of our CCES preference estimates are linked to more than one Congressional roll call. To investigate if this affects our results, specification (1) uses a 1:1 map, dropping additionally available roll calls after the first match. This reduces the sample size to 11,104 respondents. We find that our results are not influenced by this change.

Table E1: Additional robustness tests

                                            Low income        High income
                                            preferences       preferences          N

(1) Injective preference-roll call map      0.063 (0.013)     −0.041 (0.013)     11,104
(2) Extreme preferences excl.               0.074 (0.016)     −0.048 (0.015)     13,308
(3) New York excluded                       0.070 (0.015)     −0.048 (0.014)     14,730
(4) Local Union Concentration               0.065 (0.014)     −0.047 (0.014)     15,780
(5) Trimmed LPM estimator                   0.074 (0.015)     −0.055 (0.014)     15,426
(6) Errors-in-variables                     0.062 (0.004)     −0.054 (0.004)     15,345

Note: Based on specification (5) of Table I. Specification (5) uses the trimmed estimator of Horrace and Oaxaca (2006). Specification (6) shows results from an errors-in-variables model implemented in a Bayesian framework. See text for details. Table entries are posterior means and standard deviations.

Extreme preferences excluded. In specification (2) we investigate if extreme district preferences on some roll calls drive our results. To do so, we trim the distribution of preferences at the bottom and the top. For each roll call, we exclude districts with preference estimates below the 5th and above the 95th percentile. Using only trimmed preferences has no appreciable impact on our estimates.
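The trimming rule can be illustrated with a minimal sketch. The data layout (a dictionary mapping each roll call to district-level preference estimates), the function name, and the percentile interpolation method are assumptions made for illustration only; the text does not say how the percentiles were computed.

```python
from statistics import quantiles


def trim_extreme_districts(prefs_by_roll_call, lower=5, upper=95):
    """For each roll call, keep only districts whose preference estimate lies
    between the `lower` and `upper` percentiles of that roll call.

    `prefs_by_roll_call` is a hypothetical layout:
    {roll_call_id: {district_id: preference_estimate}}.
    """
    kept = {}
    for roll_call, district_prefs in prefs_by_roll_call.items():
        values = list(district_prefs.values())
        # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99
        cuts = quantiles(values, n=100, method="inclusive")
        lo, hi = cuts[lower - 1], cuts[upper - 1]
        kept[roll_call] = {
            district: pref
            for district, pref in district_prefs.items()
            if lo <= pref <= hi
        }
    return kept
```

Because the cut points are computed separately for every roll call, a district can be dropped for one vote and retained for another, which matches the per-roll-call wording above.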

New York excluded. Another test estimates our model with the state of New York excluded from the sample. In earlier work, we found that our estimates of union strength correlate highly with aggregated state-level estimates derived from the Current Population Survey. One state where this correlation is lower is New York (cf. Becher et al. 2018). In specification (3) we show that our results are not affected by its exclusion.

Union Concentration. Our data on local unions are from Becher et al. (2018), who also find that the local concentration of unions is an important dimension. While Becher et al. (2018) show that both dimensions (membership and concentration) vary independently, it is prudent to check if our results on the impact of union membership on representation still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
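As a rough illustration of the idea, the sketch below fits a linear probability model by OLS, drops observations whose fitted value falls outside the unit interval, and refits until no fitted value violates the bounds. This iterative drop-and-refit reading of the trimming estimator, the single-regressor setup, and all function names are assumptions for exposition, not the specification actually estimated.

```python
def ols_simple(x, y):
    """Closed-form OLS for one regressor plus intercept: returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trimmed_lpm(x, y, max_iter=50):
    """Refit the LPM on the subsample whose fitted values lie in [0, 1].

    Returns (intercept, slope, kept_observations).
    """
    sample = list(zip(x, y))
    for _ in range(max_iter):
        xs, ys = zip(*sample)
        intercept, slope = ols_simple(xs, ys)
        inside = [
            (xi, yi) for xi, yi in sample
            if 0.0 <= intercept + slope * xi <= 1.0
        ]
        if len(inside) == len(sample):
            # No fitted value leaves the unit interval: the estimates are final
            return intercept, slope, sample
        if len(inside) < 2:
            raise ValueError("too few observations left after trimming")
        sample = inside
    raise RuntimeError("trimming did not converge")
```

The smaller N reported for this specification in Table E1 is consistent with some observations being dropped by a trimming step of this kind.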

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.[31] We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.

[31] We implement this model in a Bayesian framework where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v2.17.0) and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition, we omit district fixed effects in the notation and ignore i subscripts):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}
$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation,

$$
U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0, \tag{F2}
$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$
y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}
$$

$$
U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}
$$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding, we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.
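To make the substitution concrete, consider a deliberately simplified case with a single treatment coefficient $\eta$ and no preference interactions or fixed effects. This stripped-down setup is an assumption made here for exposition and is not the full specification in (F3). Substituting the treatment equation into the outcome equation gives

$$
\begin{aligned}
y_d &= \eta U_d + w_d'\beta_{g0} + r_{gd} + \zeta_d, \qquad U_d = w_d'\beta_{m0} + r_{md} + v_d \\
\Rightarrow\quad y_d &= w_d'\left(\eta\beta_{m0} + \beta_{g0}\right) + \left(\eta r_{md} + r_{gd}\right) + \left(\eta v_d + \zeta_d\right).
\end{aligned}
$$

The first term is a linear index in the controls alone, the second is the composite approximation error, and the third is a composite disturbance. Both the reduced form and the treatment equation are then pure prediction problems in $w_d$.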

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the square-root LASSO (Belloni et al. 2011) in each selection step.
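The three steps (select controls for the outcome, select controls for the treatment, run OLS on the union) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a plain LASSO solved by coordinate descent rather than the square-root LASSO, a single treatment coefficient without preference interactions or fixed effects, and user-supplied penalty levels `alpha_y` and `alpha_u`. All function names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=200):
    """Plain LASSO via cyclic coordinate descent, minimizing
    (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1.
    The target is centered internally; columns of X should be standardized."""
    n, p = X.shape
    y = y - y.mean()
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for k in range(p):
            resid += X[:, k] * beta[k]      # partial residual without column k
            rho = X[:, k] @ resid / n
            # Soft-thresholding update for coordinate k
            beta[k] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_ss[k]
            resid -= X[:, k] * beta[k]
    return beta


def post_double_selection(y, u, W, alpha_y, alpha_u):
    """Return (treatment coefficient, indices of selected controls)."""
    # Standardize controls (assumes no constant columns in W)
    Ws = (W - W.mean(axis=0)) / W.std(axis=0)
    I1 = np.flatnonzero(lasso_cd(Ws, y, alpha_y))   # controls predicting the outcome
    I2 = np.flatnonzero(lasso_cd(Ws, u, alpha_u))   # controls predicting the treatment
    selected = np.union1d(I1, I2)
    # Final step: OLS of y on intercept, treatment, and the union of controls
    design = np.column_stack([np.ones(len(y)), u, Ws[:, selected]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1], selected
```

Taking the union of both selected sets is what protects against dropping a control that matters only moderately for the outcome but strongly for the treatment.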

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).[32] Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

[32] For a very general discussion, see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}, \theta^{h}_{jd}, U_d, X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$
y = Kc. \tag{G1}
$$

Here $K$ is an $N \times N$ Gaussian kernel matrix with entries

$$
K_{ij} = \exp\left(-\frac{\lVert z_i - z_j \rVert^2}{\sigma^2}\right) \tag{G2}
$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared L2 penalty to the least squares criterion:

$$
c^{*} = \underset{c \in \mathbb{R}^{N}}{\operatorname{argmin}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}
$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
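A compact numerical sketch of this estimator is given below. It assumes the columns of `Z` have already been standardized, uses a small user-supplied grid for $\lambda$, and evaluates the leave-one-out loss with the standard shortcut for linear smoothers. The function names and the grid search are illustrative assumptions, not the routine used for the reported results.

```python
import numpy as np


def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)


def krls_fit(Z, y, lam, sigma2=None):
    """Closed-form weights c* = (K + lam I)^{-1} y, with sigma2 = D by default."""
    n, D = Z.shape
    K = gaussian_kernel(Z, float(D) if sigma2 is None else sigma2)
    c = np.linalg.solve(K + lam * np.eye(n), y)
    return K, c


def loo_loss(K, y, lam):
    """Sum of squared leave-one-out residuals.

    For fitted values H y with H = K (K + lam I)^{-1}, the leave-one-out
    residual of observation i equals (y_i - yhat_i) / (1 - H_ii).
    """
    n = len(y)
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    resid = (y - H @ y) / (1.0 - np.diag(H))
    return float(resid @ resid)


def choose_lambda(K, y, grid):
    """Pick the penalty on the grid with the smallest leave-one-out loss."""
    return min(grid, key=lambda lam: loo_loss(K, y, lam))
```

Solving the linear system directly costs on the order of $N^3$ operations, so a sketch like this is practical only for moderate sample sizes.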

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms, as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

$$
\frac{\partial y}{\partial z_j^{U_d}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert z_i - z_j \rVert^2}{\sigma^2}\right)\left(z_j^{U_d} - z_i^{U_d}\right) \tag{G4}
$$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
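The pointwise derivatives can be computed in a few lines once the weights $c$ are available. The sketch below is illustrative: the function name and the argument `col` (the column index of the covariate of interest) are assumptions. The difference in the last factor is taken as the evaluation point minus the kernel center, which is the form obtained by differentiating the Gaussian kernel with respect to the evaluation point.

```python
import numpy as np


def krls_partial_derivatives(Z, c, sigma2, col):
    """Derivative of the fitted KRLS surface with respect to column `col`,
    evaluated at every observation.

    Entry j equals (-2 / sigma2) * sum_i c_i * K_ji * (z_j[col] - z_i[col]).
    """
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / sigma2)
    diff = Z[:, None, col] - Z[None, :, col]   # diff[j, i] = z_j[col] - z_i[col]
    return (-2.0 / sigma2) * (K * diff) @ c
```

The returned vector has one entry per observation, so it can be passed directly to any smoother and plotted against district-level union membership.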


Kopczuk W E Saez and J Song (2010) Earnings Inequality and Mobility in the United StatesEvidence from Social Security Data since 1937 arterly Journal of Economics 125(1)91ndash128

Lax J R and J H Phillips (2009) How should we estimate public opinion in the statesAmerican Journal of Political Science 53(1) 107ndash121

Lax J R and J H Phillips (2013) How should we estimate sub-national opinion using mrppreliminary ndings and recommendations Paper presented at the Annual Meeting ofthe Midwest Political Science Association Chicago

Lee D S E Morei and M J Butler (2004) Do voters aect or elect policies evidencefrom the U S House arterly Journal of Economics 119(3) 807ndash859

Leeb H and B M Potscher (2008) Can one estimate the unconditional distribution ofpost-model-selection estimators Econometric eory 24(2) 338ndash376

Leighley J E and J Nagler (2007) Unions voter turnout and class bias in the US electorate1964-2004 Journal of Politics 69(2) pp 430ndash441

Lichtenstein N (2013) State of the Union A Century of American Labor (2nd ed) PrincetonPrinceton University Press

Lijphart A (1999) Paerns of Democracy Government Forms and Performance in irty-SixCountries New Haven Yale University Press

Lupu N and Z Warner (2017) Auence and congruence Unequal representation aroundthe world Manuscript [wwwnoamlupucomAampCpdf]

McCarty N K T Poole and H Rosenthal (2006) Polarized America Cambridge MA MITPress

Mian A A Su and F Trebbi (2010) e political economy of the us mortgage defaultcrisis American Economic Review 100(5) 1967ndash1998

Miler K C (2007) e view from the hill Legislative perceptions of the district LegislativeStudies arterly 32(4) 597ndash628

Miller W E and D E Stokes (1963) Constituency inuence in congress American PoliticalScience Review 57 (1) 45ndash56

Moe T M (2011) Special Interest Teachers Unions and Americarsquos Public Schools WashingtonDC Brookings Institution

Nannicini T A Stella G Tabellini and U Troiano (2013) Social capital and politicalaccountability American Economic Journal Economic Policy 5(2) 222ndash250

Park D K A Gelman and J Bafumi (2006) State-level opinions from national surveysPoststratication using multilevel logistic regression In J E Cohen (Ed) Public opinionin state politics pp 209ndash28 Stanford Stanford University Press

49

Putnam R (1993) Making Democracy Work Princeton NJ Princeton University PressPutnam R (2000) Bowling Alone e collapse and revival of american community New

York Simon and SchusterRatkovic M and D Tingley (2017) Sparse estimation and uncertainty with application to

subgroup analysis Political Analysis 25(1) 1ndash40Rhodes J H and B F Schaner (2017) Testing models of unequal representation Democratic

populists and republican oligarchs arterly Journal of Political Science 12(s) 185ndash204Richardson S and W R Gilks (1993) A bayesian approach to measurement error problems

in epidemiology using conditional independence models American Journal of Epidemiol-ogy 138(6) 430ndash442

Rigby E and G C Wright (2013) Political parties and representation of the poor in theamerican states American Journal of Political Science 57 (3) 552ndash565

Robinson P M (1988) Root-n-consistent semiparametric regression Econometrica 56(4)931ndash954

Rosenfeld J (2014) What Unions No Longer Do Cambridge Harvard University PressRupasingha A and S J Goetz (2008) US county-level social capital data 1990-2005 e

northeast regional center for rural development Penn State University University ParkPA

Samii C (2016) Causal empiricism in quantitative research Journal of Politics 78(3) 941ndash955Schlozman D (2015) When Movements Anchor Parties Princeton Princeton University

PressSchlozman K L S Verba and H E Brady (2012) e Unheavenly Chorus Unequal PoliticalVoice and the Broken Promise of American Democracy Princeton Princeton UniversityPress

Southworth C and J Stepan-Norris (2009) American trade unions and data limitations Anew agenda for labor studies Annual Review of Sociology 35 297ndash320

Stekhoven D J and P Buhlmann (2011) Missforest non-parametric missing value imputa-tion for mixed-type data Bioinformatics 28(1) 112ndash118

Stimson J A M B Mackuen and R S Erikson (1995) Dynamic representation AmericanPolitical Science Review 89(3) 543ndash565

Tang F and H Ishwaran (2017) Random forest missing data algorithms Statistical Analysisand Data Mining e ASA Data Science Journal 10 363ndash377

Tibshirani R (1996) Regression shrinkage and selection via the lasso Journal of the RoyalStatistical Society B 58(1) 267ndash288

Torrieri N ACSO DSSD and SEHSD Program Sta (2014) American communitysurvey design and methodology United States Census Bureau [wwwcensusgovprograms-surveysacsmethodologydesign-and-methodologyhtml]

Zullo R (2008) Union membership and political inclusion Industrial and Labor RelationsReview 62(1) 22ndash38

50

still obtain when accounting for the structure of union organization. In specification (4) we show this to be the case.

Trimmed LPM estimator. A fifth, more technical specification implements the trimmed estimator suggested by Horrace and Oaxaca (2006). It accounts for the fact that we fit a linear probability model to a binary dependent variable, which entails the possibility that the model-implied linear predictor lies outside the unit interval. Our results in Table E1 indicate that this change does not materially affect our core results (if anything, they become slightly larger).
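The trimming idea can be sketched in a few lines. This is a minimal illustration under assumptions, not the specification behind Table E1: it uses a single regressor, reads the procedure as "refit after dropping cases whose fitted value falls outside [0, 1], and repeat until none do", and the names `fit_line` and `trimmed_lpm` are hypothetical.

```python
def fit_line(x, y):
    """OLS of y on an intercept and a single regressor x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trimmed_lpm(x, y, max_iter=50):
    """Refit the LPM after dropping cases whose fitted value leaves [0, 1].

    Assumption: sequential trimming until every retained case has a fitted
    probability inside the unit interval (or too few cases remain to refit).
    """
    keep = list(range(len(x)))
    intercept, slope = fit_line(x, y)
    for _ in range(max_iter):
        inside = [i for i in keep if 0.0 <= intercept + slope * x[i] <= 1.0]
        if len(inside) == len(keep) or len(inside) < 2:
            break  # converged, or too few cases left to refit
        keep = inside
        intercept, slope = fit_line([x[i] for i in keep], [y[i] for i in keep])
    return intercept, slope, keep
```

With `x = [1, 2, 3, 4, 5, 6, 30]` and `y = [0, 1, 0, 1, 0, 1, 1]`, the first fit predicts a probability above one for the case at `x = 30`, so that case is dropped and the line is refit on the remaining six.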

Errors-in-variables. Our final test accounts for the errors-in-variables problem caused by the fact that our district preference measures are based on estimates. While in general standard errors for our district-level estimates are quite small relative to the quantity being measured, and one expects a downward bias in parameter estimates in a linear model with errors-in-variables, we estimate this specification to get a sense of the quantitative magnitude of the change in parameter estimates.31 We find that adjusting for measurement error produces very little quantitative change: both estimates are within the confidence bounds of our non-corrected estimates.
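The expected downward bias can be made concrete in the textbook single-regressor case. This is only an illustration under classical assumptions (measurement noise $u$ independent of the true preference $\theta$ and of the regression error); the symbols $\tilde\theta$, $u$, $\sigma_\theta^2$ and $\sigma_u^2$ are introduced here for exposition and are not part of the model above. With two error-prone preference measures and their interactions, the direction of the bias need not be this clean.

```latex
\tilde\theta = \theta + u, \qquad y = \beta\,\theta + \epsilon
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat\beta_{\mathrm{OLS}}
  = \beta \,\frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_u^2}
```

Because the ratio is below one whenever $\sigma_u^2 > 0$, the estimate is pulled toward zero, and the pull is small when the measurement error variance is small relative to the variance of the true preferences.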

31 We implement this model in a Bayesian framework, where we incorporate the measurement error model directly into the posterior distribution. To specify the variance of the measurement error for low and high income group preferences, we average the standard errors of the district-group means from the raw CCES data (pre-Census matching). Measurement error variance is slightly larger for low income preferences (0.029) than for high income preferences (0.025). We use the setup proposed in Richardson and Gilks (1993), implemented in Stan (v.2.17.0), and estimated (due to the size of our data set) using mean field variational inference. We use normal priors with mean zero and standard deviation (SD) of 100 for all regression coefficients and inverse Gamma priors with shape and scale 0.01 for residuals. In the measurement error equation we use normal priors with mean zero and SD of 10 for the mean of the measurement error and a Student-t prior with 3 degrees of freedom and mean 1, SD 10 for the standard deviation of the measurement error. The reported entries are posterior means and standard deviations.

F Post-Double-Selection Estimator

The post-double-selection models in the main text provide a relaxation of the linearity and exogeneity assumptions made in our main model. To do so, we use the post-double-selection estimator proposed by Belloni et al. (2013, 2017). Specifically, this model setup aims to reduce the possible impact of omitted variable bias by accounting for a large number of confounders in the most flexible way possible. This can be achieved by moving beyond restricting confounders to be linear and additive, and instead considering a flexible, unrestricted (non-parametric) function. This leads to the formulation of the following partially linear model (Robinson 1988) equation (for ease of exposition we omit district fixed effects in the notation and ignore $i$ subscripts):

$$ y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1} $$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$ U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2} $$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effect that potentially creates omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^p$. These control terms can be spline transforms of covariates, higher order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques, such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

$$ y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3} $$

$$ U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4} $$

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a "naive" application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double-selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.
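The three steps (select on the outcome, select on the treatment, run OLS on the union) can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: it uses a plain LASSO with a fixed penalty `lam` solved by coordinate descent instead of the root-LASSO, it treats the treatment as a single additive regressor without the preference interactions of (F1), and all function names and the toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_support(X, y, lam, sweeps=200):
    """Indices with non-zero coefficients in a LASSO of y on the columns of X.

    Minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1 on centered data.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residual at b = 0
    b = [0.0] * p
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue  # constant column carries no information
            rho = sum(v * (r + v * b[j]) for v, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / scale
            resid = [r - v * (new - b[j]) for v, r in zip(col, resid)]
            b[j] = new
    return {j for j in range(p) if b[j] != 0.0}


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[a] * r[c] for r in X) for c in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


def post_double_selection(y, u, W, lam):
    """Return (treatment coefficient, I1, I2) using controls in I1 union I2."""
    i1 = lasso_support(W, y, lam)  # controls that predict the outcome
    i2 = lasso_support(W, u, lam)  # controls that predict the treatment
    selected = sorted(i1 | i2)
    X = [[1.0, ui] + [row[j] for j in selected] for ui, row in zip(u, W)]
    return ols(X, y)[1], i1, i2


def toy_data():
    """Hypothetical data: control 0 drives the treatment, control 1 the outcome."""
    w0 = [1, 1, 1, 1, -1, -1, -1, -1]
    w1 = [1, 1, -1, -1, 1, 1, -1, -1]
    w2 = [1, -1, 1, -1, 1, -1, 1, -1]
    noise = [a * b for a, b in zip(w0, w1)]  # orthogonal to all controls
    u = [a + e for a, e in zip(w0, noise)]
    y = [1.0 * ui - 0.9 * a + 2.0 * b for ui, a, b in zip(u, w0, w1)]
    return y, u, [list(row) for row in zip(w0, w1, w2)]
```

In the toy data the true treatment coefficient is 1. Control 0 has only a small reduced-form association with the outcome, so selecting on the outcome equation alone (with `lam = 0.5`) misses it and the treatment coefficient is badly biased; the treatment-equation step picks it up, which is the scenario the paragraph above describes.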

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).32 Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to "large" omitted variable bias and includes them in the model. Any variables that are not included ("omitted") are therefore at most mildly associated to $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).

32 For a very general discussion, see Belloni et al. (2017).

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\ \theta^{h}_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$ y = Kc \tag{G1} $$

Here, $K$ is an $N \times N$ Gaussian kernel matrix

$$ K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \tag{G2} $$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce "smoother" function surfaces. This is achieved via regularization by adding a squared $L_2$ penalty to the least squares criterion:

$$ c^{*} = \operatorname*{argmin}_{c \in \mathbb{R}^{D}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3} $$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^2$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^2 = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
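The closed-form solution is short enough to write out. The following is a minimal sketch under assumptions, not the estimation code behind the paper's results: it takes the penalty `lam` as given rather than choosing it by leave-one-out loss, it does not standardize the covariates, and the function names are hypothetical.

```python
import math


def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / sigma2) for zj in Z]
        for zi in Z
    ]


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def krls_fit(Z, y, lam):
    """Return (K, c, fitted) with c = (K + lam * I)^{-1} y and sigma2 = D."""
    n, d = len(Z), len(Z[0])
    K = gaussian_kernel(Z, float(d))  # sigma^2 set to the number of columns
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    c = solve(A, list(y))
    fitted = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
    return K, c, fitted
```

For example, `krls_fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 0.0], 0.1)` returns the kernel matrix, the weights, and the fitted values `Kc`. Larger values of `lam` shrink the weights and flatten the fitted surface.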

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

$$ \frac{\partial y}{\partial z_j^{U_d}} = \frac{-2}{\sigma^2} \sum_i c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^2}{\sigma^2}\right) \left(Z_d^{U_d} - z_j^{U_d}\right) \tag{G4} $$

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
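A pointwise derivative of this kind can be sketched directly from the fitted function. The code below is an illustration under assumptions: it takes the weights `c` and kernel centers as given, writes the fitted surface as a weighted sum of Gaussian kernels, and differentiates that sum analytically. The difference inside the sum is oriented as evaluation point minus kernel center, which is stated explicitly because it fixes the sign. The function names are hypothetical, and the smoothing step used for the figure is not shown.

```python
import math


def krls_predict(z, centers, c, sigma2):
    """Fitted surface: sum_i c_i * exp(-||z - z_i||^2 / sigma2)."""
    return sum(
        ci * math.exp(-sum((a - b) ** 2 for a, b in zip(z, zi)) / sigma2)
        for ci, zi in zip(c, centers)
    )


def krls_partial(z, centers, c, sigma2, d):
    """Analytic partial derivative of the fitted surface w.r.t. coordinate d at z."""
    total = 0.0
    for ci, zi in zip(c, centers):
        k = math.exp(-sum((a - b) ** 2 for a, b in zip(z, zi)) / sigma2)
        total += ci * k * (z[d] - zi[d])  # evaluation point minus kernel center
    return -2.0 / sigma2 * total
```

Evaluating `krls_partial` at every observation, with `d` set to the column holding union density, gives one derivative per case. A quick sanity check is to compare each value with a central finite difference of `krls_predict`.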

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ilwu puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working Paper 5 2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the us congress. Journal of Politics 80(2), 39–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the us senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in congress: A study of the north american free trade agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). "dem deutschen volke"? die ungleiche responsivität des bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, A. (2012). Inequality and policy representation in the american states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of american politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, D.C.: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using mrp? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), pp. 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the us mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, D.C.: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of american community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs. Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the american states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The northeast regional center for rural development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). Missforest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

district fixed effects in the notation and ignore $i$ subscripts):

$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + g(Z_d) + \epsilon_{jd} \tag{F1}$$

with $E(\epsilon_{jd} \mid Z_d, U_d, \theta_{jd}) = 0$. Here $y$ is the vote of a representative in a given district and $U_d$ is the level of union density. The function $g(Z_d)$ captures the possibly high-dimensional and nonlinear influence of confounders (interacted with income group preferences). The utility of this specification as a robustness test stems from the fact that it imposes no a priori restriction on the functional form of confounding variables. A second key ingredient in a model capturing biases due to omitted variables is the relationship between the treatment (union density) and confounders. Therefore, we consider the following auxiliary treatment equation:

$$U_d = m(Z_d) + v_d, \qquad E(v_d \mid Z_d) = 0 \tag{F2}$$

which relates treatment to covariates $Z_d$. The function $m(Z_d)$ summarizes the confounding effects that potentially create omitted variable bias if $m \neq 0$, which is to be expected in an observational study such as ours.

The next step is to create approximations to both $g(\cdot)$ and $m(\cdot)$ by including a large number ($p$) of control terms $w_d = P(Z_d) \in \mathbb{R}^{p}$. These control terms can be spline transforms of covariates, higher-order interaction terms, etc. Even with an initially limited set of variables, the number of control terms can grow large, say $p > 200$. To limit the number of estimated coefficients, we assume that $g$ and $m$ are approximately sparse (Belloni et al. 2013) and can be modeled using $s$ non-zero coefficients (with $s \ll p$) selected using regularization techniques such as the LASSO (see Tibshirani 1996; see Ratkovic and Tingley 2017 for a recent exposition in a political science context):

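$$y_{jd} = \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd} + \eta^{l} U_d \theta^{l}_{jd} + \eta^{h} U_d \theta^{h}_{jd} + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \tag{F3}$$

$$U_d = w_d'\beta_{m0} + r_{md} + v_d \tag{F4}$$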

Here $r_{gd}$ and $r_{md}$ are approximation errors.

However, before proceeding we need to consider the problem that variable selection techniques, such as the LASSO, are intended for prediction, not inference. In fact, a “naive” application of variable selection, where one keeps only the significant $w$ variables in equation (F3), fails. It relies on perfect model selection and can lead to biased inferences and misleading confidence intervals (see Leeb and Pötscher 2008). Thus, one can re-express the problem as one of prediction by substituting the auxiliary treatment equation (F4) for $U_d$ in (F3), yielding a reduced form equation with a composite approximation error (cf. Belloni et al. 2013). Now both equations in the system represent predictive relationships and are thus amenable to high-dimensional selection techniques.
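To make the substitution step concrete, the following sketch writes out what plugging (F4) into (F3) gives. It is only the algebra implied by the two equations; the labels $\tilde{r}_{jd}$ and $\tilde{\zeta}_{jd}$ for the composite approximation error and composite disturbance are introduced here for illustration and are not notation from the original.

```latex
\begin{align*}
y_{jd} &= \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd}
          + \sum_{k \in \{l,h\}} \eta^{k}\theta^{k}_{jd}
            \bigl(w_d'\beta_{m0} + r_{md} + v_d\bigr)
          + w_d'\beta_{g0} + r_{gd} + \zeta_{jd} \\
       &= \mu^{l}\theta^{l}_{jd} + \mu^{h}\theta^{h}_{jd}
          + w_d'\beta_{g0}
          + \sum_{k \in \{l,h\}} \eta^{k}\theta^{k}_{jd}\, w_d'\beta_{m0}
          + \tilde{r}_{jd} + \tilde{\zeta}_{jd}, \\
\tilde{r}_{jd} &= r_{gd} + \sum_{k \in \{l,h\}} \eta^{k}\theta^{k}_{jd}\, r_{md},
\qquad
\tilde{\zeta}_{jd} = \zeta_{jd} + \sum_{k \in \{l,h\}} \eta^{k}\theta^{k}_{jd}\, v_d .
\end{align*}
```

In this form the outcome depends on the controls $w_d$ only through linear index terms plus error, which is what makes it a prediction problem suitable for LASSO-type selection.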

Note that using this dual equation setup is also necessary to guard against variable selection errors. To see this, consider the consequence of applying variable selection techniques to the outcome equation only. In trying to predict $y$ with $w$, an algorithm (such as LASSO) will favor variables with large coefficients in $\beta_{g0}$ but will ignore those of intermediate impact. However, omitted variables that are strongly related to the treatment, i.e., with large coefficients in $\beta_{m0}$, can lead to large omitted variable bias in the estimate of $\eta$ even when the size of their coefficient in $\beta_{g0}$ is moderate. The post-double selection estimator suggested by Belloni et al. (2013) addresses this problem by basing selection on both reduced form equations. Let $I_1$ be the control set selected by LASSO of $y_{jd}$ on $w_d$ in the first predictive equation, and let $I_2$ be the control set selected by LASSO of $U_d$ on $w_d$ in the second equation. Then, parameter estimates for the effects of union density and the regularized control set are obtained by OLS estimation of equation (F1) with the set $I = I_1 \cup I_2$ included as controls (replacing $g(\cdot)$). In our implementation we employ the root-LASSO (Belloni et al. 2011) in each selection step.

This estimator has low bias and yields accurate confidence intervals even under moderate selection mistakes (Belloni and Chernozhukov 2009; Belloni et al. 2014).³² Responsible for this robustness is the indirect LASSO step selecting the $U_d$-control set. It finds controls whose omission leads to “large” omitted variable bias and includes them in the model. Any variables that are not included (“omitted”) are therefore at most mildly associated with $U_d$ and $y_{jd}$, which decidedly limits the scope of omitted variable bias (Chernozhukov et al. 2015).
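The select-twice-then-OLS logic can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a plain LASSO solved by coordinate descent in place of the root-LASSO, it treats a single treatment variable without the interactions with group preferences, and it takes the penalty levels `alpha_y` and `alpha_u` as user-supplied inputs. All function and variable names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=200):
    """Plain LASSO by cyclic coordinate descent.

    Minimizes (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1.
    (Assumption: a stand-in for the root-LASSO used in the paper.)
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            resid += X[:, k] * b[k]            # remove variable k's contribution
            rho = X[:, k] @ resid / n
            # soft-thresholding update for coefficient k
            b[k] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[k]
            resid -= X[:, k] * b[k]            # add the updated contribution back
    return b


def post_double_selection(y, u, W, alpha_y, alpha_u):
    """Return the OLS treatment coefficient and the union of selected controls."""
    Ws = (W - W.mean(axis=0)) / W.std(axis=0)  # standardize controls for selection
    sel_y = np.flatnonzero(lasso_cd(Ws, y - y.mean(), alpha_y))  # I1: predicts outcome
    sel_u = np.flatnonzero(lasso_cd(Ws, u - u.mean(), alpha_u))  # I2: predicts treatment
    selected = np.union1d(sel_y, sel_u)        # I = I1 ∪ I2
    X = np.column_stack([np.ones(len(y)), u, W[:, selected]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS on treatment + selected controls
    return coef[1], selected
```

In simulated data where one control drives the treatment strongly but the outcome only moderately, the second selection step is what keeps that control in the final regression.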

G Nonparametric Evidence for Union-Preferences Interaction

As discussed in the main text, we want to estimate a specification that makes as few a priori assumptions as possible about functional form relationships between variables (including their interactions). Thus, we non-parametrically model $y_{ijd} = f(z)$ with $z = [\theta^{l}_{jd}\ \theta^{h}_{jd}\ U_d\ X_d]$ by approximating it via Kernel Regularized Least Squares (Hainmueller and Hazlett 2014):

$$y = Kc \tag{G1}$$

Here $K$ is an $N \times N$ Gaussian kernel matrix

$$K = \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \tag{G2}$$

with an associated vector of weights $c$. Intuitively, one can think of KRLS as a local regression method, which predicts the outcome at each covariate point by calculating an optimally weighted sum of locally fitted functions. The KRLS algorithm uses Gaussian kernels centered around an observation. The weights $c$ are chosen to produce the best fit to the data. Since a possibly large number of $c$ values provide (approximately) optimal weights, it makes sense to prefer values of $c$ that produce “smoother” function surfaces. This is achieved via regularization by adding a squared $L_2$ penalty to the least squares criterion:

³² For a very general discussion, see Belloni et al. (2017).

$$c^{*} = \underset{c \in \mathbb{R}^{D}}{\operatorname{argmin}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set, $\sigma^{2}$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^{2} = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
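A minimal NumPy sketch of this closed-form fit is given below. It assumes the columns of `Z` have already been standardized, sets $\sigma^{2} = D$, and picks $\lambda$ from a user-supplied grid using the standard leave-one-out shortcut for ridge-type estimators (the leave-one-out residual for observation $i$ equals $c_i$ divided by the $i$-th diagonal element of $(K + \lambda I)^{-1}$). The function names and the grid search are illustrative choices, not the authors' code.

```python
import numpy as np


def gaussian_kernel(Z, sigma2):
    """N x N matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / sigma2)


def krls_fit(Z, y, lambdas):
    """Fit c = (K + lambda I)^{-1} y, choosing lambda by leave-one-out loss.

    Assumes Z is already standardized; `lambdas` is a hypothetical search grid
    of strictly positive values.
    """
    n_obs, n_cols = Z.shape
    K = gaussian_kernel(Z, sigma2=float(n_cols))   # sigma^2 = D
    evals, evecs = np.linalg.eigh(K)               # reuse one decomposition for all lambdas
    best = None
    for lam in lambdas:
        # (K + lam I)^{-1} = V diag(1 / (e + lam)) V'
        G_inv = (evecs / (evals + lam)) @ evecs.T
        c = G_inv @ y
        loo_resid = c / np.diag(G_inv)             # leave-one-out residuals
        loss = float(loo_resid @ loo_resid)
        if best is None or loss < best[0]:
            best = (loss, lam, c)
    _, lam, c = best
    return K, c, lam
```

Fitted values are then `K @ c`, which corresponds to equation (G1) evaluated at the estimated weights.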

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$, we calculate

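$$\frac{\partial y}{\partial z^{U_d}_{j}} = \frac{-2}{\sigma^{2}} \sum_{i} c_i \exp\left(-\frac{\lVert Z_d - z_j \rVert^{2}}{\sigma^{2}}\right) \left(Z^{U_d}_{d} - z^{U_d}_{j}\right) \tag{G4}$$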

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
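The pointwise derivatives can be computed directly from the kernel matrix and the fitted weights. The sketch below is an illustration with hypothetical function names, not the authors' code. It writes the difference term as evaluation point minus kernel center, which is the sign convention under which the $-2/\sigma^{2}$ prefactor matches the analytic derivative of a Gaussian kernel expansion $\sum_i c_i \exp(-\lVert z - z_i \rVert^{2} / \sigma^{2})$; `krls_predict` is included only so that the result can be checked against a finite difference.

```python
import numpy as np


def krls_predict(z_new, Z, c, sigma2):
    """Evaluate f(z_new) = sum_i c_i * exp(-||z_new - z_i||^2 / sigma2)."""
    k = np.exp(-((Z - z_new) ** 2).sum(axis=1) / sigma2)
    return float(k @ c)


def krls_pointwise_derivatives(Z, c, sigma2, d):
    """Partial derivative of the fitted surface w.r.t. column d at every observation.

    Returns a length-N vector, one derivative per case.
    """
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dist / sigma2)
    # diff[j, i] = z_j^(d) - z_i^(d): evaluation point minus kernel center
    diff = Z[:, None, d] - Z[None, :, d]
    return (-2.0 / sigma2) * ((K * diff) @ c)
```

The resulting vector can then be smoothed against the union density column to see whether the marginal effect of a preference variable varies with union strength.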

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 17, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ilwu puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interests of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? the case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017 [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf].

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and lawmaking in the us congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the us senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russel Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in congress: A study of the north american free trade agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and W. Ebonya (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “dem deutschen volke”? die ungleiche responsivität des bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259 [www.nber.org/papers/w22637].

Flavin, P. (2012). Inequality and policy representation in the american states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russel Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of american politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947-1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in congress. American Political Science Review, Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript [http://hona.kr/papers/files/smallAreaEstimation.pdf].

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? the impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using mrp? preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964-2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript [www.noamlupu.com/A&C.pdf].

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the us mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of american community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the american states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The northeast regional center for rural development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). Missforest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American community survey design and methodology. United States Census Bureau [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html].

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.


Hertel-Fernandez A M Mildenberger and L Stokes (2018) Legislative staers andrepresentation in congress American Political Science Review Forthcoming https

doiorg101017S0003055418000606Hirsch B D Macpherson and W Vroman (2001) Estimates of union density by stateMonthly Labor Review 124(7) 51ndash55

Honaker J and E Plutzer (2016) Small area estimation with multiple overimputationManuscript [httphonakrpapersfilessmallAreaEstimationpdf]

Horrace W C and R L Oaxaca (2006) Results on the bias and inconsistency of ordinaryleast squares for the linear probability model Economics Leers 90 321ndash327

Hout M (2004) Geing the most out of the GSS income measures GSS MethodologicalReport 101

Jessee S A (2009) Spatial Voting in the 2004 Presidential Election American PoliticalScience Review 103(1) 59ndash81

48

Kalla J L and D E Broockman (2016) Campaign contributions facilitate access to congres-sional ocials A randomized eld experiment American Journal of Political Science 60(3)545ndash558

Kim S E and Y Margalit (2017) Informed preferences the impact of unions on workersrsquopolicy views American Journal of Political Science 61 728ndash743

Kopczuk W E Saez and J Song (2010) Earnings Inequality and Mobility in the United StatesEvidence from Social Security Data since 1937 arterly Journal of Economics 125(1)91ndash128

Lax J R and J H Phillips (2009) How should we estimate public opinion in the statesAmerican Journal of Political Science 53(1) 107ndash121

Lax J R and J H Phillips (2013) How should we estimate sub-national opinion using mrppreliminary ndings and recommendations Paper presented at the Annual Meeting ofthe Midwest Political Science Association Chicago

Lee D S E Morei and M J Butler (2004) Do voters aect or elect policies evidencefrom the U S House arterly Journal of Economics 119(3) 807ndash859

Leeb H and B M Potscher (2008) Can one estimate the unconditional distribution ofpost-model-selection estimators Econometric eory 24(2) 338ndash376

Leighley J E and J Nagler (2007) Unions voter turnout and class bias in the US electorate1964-2004 Journal of Politics 69(2) pp 430ndash441

Lichtenstein N (2013) State of the Union A Century of American Labor (2nd ed) PrincetonPrinceton University Press

Lijphart A (1999) Paerns of Democracy Government Forms and Performance in irty-SixCountries New Haven Yale University Press

Lupu N and Z Warner (2017) Auence and congruence Unequal representation aroundthe world Manuscript [wwwnoamlupucomAampCpdf]

McCarty N K T Poole and H Rosenthal (2006) Polarized America Cambridge MA MITPress

Mian A A Su and F Trebbi (2010) e political economy of the us mortgage defaultcrisis American Economic Review 100(5) 1967ndash1998

Miler K C (2007) e view from the hill Legislative perceptions of the district LegislativeStudies arterly 32(4) 597ndash628

Miller W E and D E Stokes (1963) Constituency inuence in congress American PoliticalScience Review 57 (1) 45ndash56

Moe T M (2011) Special Interest Teachers Unions and Americarsquos Public Schools WashingtonDC Brookings Institution

Nannicini T A Stella G Tabellini and U Troiano (2013) Social capital and politicalaccountability American Economic Journal Economic Policy 5(2) 222ndash250

Park D K A Gelman and J Bafumi (2006) State-level opinions from national surveysPoststratication using multilevel logistic regression In J E Cohen (Ed) Public opinionin state politics pp 209ndash28 Stanford Stanford University Press

49

Putnam R (1993) Making Democracy Work Princeton NJ Princeton University PressPutnam R (2000) Bowling Alone e collapse and revival of american community New

York Simon and SchusterRatkovic M and D Tingley (2017) Sparse estimation and uncertainty with application to

subgroup analysis Political Analysis 25(1) 1ndash40Rhodes J H and B F Schaner (2017) Testing models of unequal representation Democratic

populists and republican oligarchs arterly Journal of Political Science 12(s) 185ndash204Richardson S and W R Gilks (1993) A bayesian approach to measurement error problems

in epidemiology using conditional independence models American Journal of Epidemiol-ogy 138(6) 430ndash442

Rigby E and G C Wright (2013) Political parties and representation of the poor in theamerican states American Journal of Political Science 57 (3) 552ndash565

Robinson P M (1988) Root-n-consistent semiparametric regression Econometrica 56(4)931ndash954

Rosenfeld J (2014) What Unions No Longer Do Cambridge Harvard University PressRupasingha A and S J Goetz (2008) US county-level social capital data 1990-2005 e

northeast regional center for rural development Penn State University University ParkPA

Samii C (2016) Causal empiricism in quantitative research Journal of Politics 78(3) 941ndash955Schlozman D (2015) When Movements Anchor Parties Princeton Princeton University

PressSchlozman K L S Verba and H E Brady (2012) e Unheavenly Chorus Unequal PoliticalVoice and the Broken Promise of American Democracy Princeton Princeton UniversityPress

Southworth C and J Stepan-Norris (2009) American trade unions and data limitations Anew agenda for labor studies Annual Review of Sociology 35 297ndash320

Stekhoven D J and P Buhlmann (2011) Missforest non-parametric missing value imputa-tion for mixed-type data Bioinformatics 28(1) 112ndash118

Stimson J A M B Mackuen and R S Erikson (1995) Dynamic representation AmericanPolitical Science Review 89(3) 543ndash565

Tang F and H Ishwaran (2017) Random forest missing data algorithms Statistical Analysisand Data Mining e ASA Data Science Journal 10 363ndash377

Tibshirani R (1996) Regression shrinkage and selection via the lasso Journal of the RoyalStatistical Society B 58(1) 267ndash288

Torrieri N ACSO DSSD and SEHSD Program Sta (2014) American communitysurvey design and methodology United States Census Bureau [wwwcensusgovprograms-surveysacsmethodologydesign-and-methodologyhtml]

Zullo R (2008) Union membership and political inclusion Industrial and Labor RelationsReview 62(1) 22ndash38

50

regularization by adding a squared L2 penalty to the least squares criterion

$$c^{*} = \underset{c \in \mathbb{R}^{D}}{\operatorname{argmin}} \left[ (y - Kc)'(y - Kc) + \lambda c'Kc \right] \tag{G3}$$

which yields an estimator for $c$ as $c^{*} = (K + \lambda I)^{-1} y$ (see Hainmueller and Hazlett 2014, appendix). This leaves two parameters to be set: $\sigma^{2}$ and $\lambda$. Following Hainmueller and Hazlett (2014), we set $\sigma^{2} = D$, the number of columns in $z$, and let $\lambda$ be chosen by minimizing leave-one-out loss.
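The closed-form estimator and the leave-one-out choice of $\lambda$ can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the code used for the paper: the function names, the candidate grid for $\lambda$, and the simulated data are assumptions, and a Gaussian kernel $\exp(-\lVert z_i - z_j \rVert^{2} / \sigma^{2})$ on standardized covariates is assumed, following Hainmueller and Hazlett (2014). The leave-one-out residuals use the standard shortcut $c_i / [(K + \lambda I)^{-1}]_{ii}$, so no model is refitted.

```python
import numpy as np


def gaussian_kernel(Z, sigma2):
    """Kernel matrix with entries exp(-||z_i - z_j||^2 / sigma2)."""
    sq = np.sum(Z ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    np.maximum(dist2, 0.0, out=dist2)  # guard against tiny negative values
    return np.exp(-dist2 / sigma2)


def krls_fit(Z, y, lambdas):
    """Fit c* = (K + lambda I)^{-1} y, choosing lambda by leave-one-out loss.

    Z is assumed to be standardized; sigma2 is set to the number of columns.
    `lambdas` is a hypothetical candidate grid, not taken from the paper.
    """
    n, D = Z.shape
    sigma2 = float(D)
    K = gaussian_kernel(Z, sigma2)
    best = None
    for lam in lambdas:
        G_inv = np.linalg.inv(K + lam * np.eye(n))
        c = G_inv @ y
        loo_resid = c / np.diag(G_inv)  # leave-one-out residuals
        loss = float(np.sum(loo_resid ** 2))
        if best is None or loss < best[0]:
            best = (loss, lam, c)
    _, lam, c = best
    return c, lam, sigma2, K


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((200, 3))  # simulated, already standardized
    y = np.sin(Z[:, 0]) * Z[:, 1] + 0.1 * rng.standard_normal(200)
    c, lam, sigma2, K = krls_fit(Z, y, lambdas=[0.01, 0.1, 1.0, 10.0])
    print("chosen lambda:", lam)
```

Fitted values are then `K @ c`. A finer grid or a one-dimensional optimizer over $\lambda$ would be a natural refinement.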

The benefit of this approach is twofold. First, it allows for an approximation of highly nonlinear and non-additive functional forms (without having to construct non-linear terms as we do in the post-double selection LASSO). Second, it allows us to check if the marginal effects of group preferences change with levels of union density without explicitly specifying this interaction term (and instead learning it from the data). To do the latter, one can calculate pointwise partial derivatives of $y$ with respect to a chosen covariate $z^{(d)}$ (Hainmueller and Hazlett 2014, 156). For any given observation $j$ we calculate

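$$\frac{\partial y}{\partial z^{U}_{dj}} = \frac{-2}{\sigma^{2}} \sum_{i} c_{i} \exp\left( -\frac{\lVert Z_{d} - z_{j} \rVert^{2}}{\sigma^{2}} \right) \left( Z^{U}_{dd} - z^{U}_{dj} \right) \tag{G4}$$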

This yields as many partial derivatives as there are cases. We apply a thin plate smoother (with parameters chosen via cross-validation) to plot these against district-level union membership in Figure IV.
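A minimal sketch of the pointwise derivative computation follows, assuming the same Gaussian kernel and an already estimated coefficient vector `c`. The function name and arguments are hypothetical. The sketch differentiates the fitted function $\hat{y}(z) = \sum_i c_i \exp(-\lVert z_i - z \rVert^{2} / \sigma^{2})$ directly, which puts the difference term in the order $z_j - z_i$ under the $-2/\sigma^{2}$ prefactor. The ordering of that difference fixes the overall sign, so it is worth confirming against the printed equation before reuse.

```python
import numpy as np


def krls_pointwise_derivatives(Z, c, sigma2, d):
    """Derivative of the fitted KRLS surface with respect to column d,
    evaluated at every observation j.

    Z: (n, D) covariate matrix, c: (n,) coefficients, sigma2: bandwidth.
    Returns an (n,) array with one derivative per case.
    """
    sq = np.sum(Z ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    K = np.exp(-dist2 / sigma2)                   # K[i, j]
    diff = Z[:, d][None, :] - Z[:, d][:, None]    # diff[i, j] = z_j - z_i
    # Sum over i for each evaluation point j.
    return (-2.0 / sigma2) * np.sum(c[:, None] * K * diff, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Z = rng.standard_normal((100, 3))   # simulated covariates
    c = rng.standard_normal(100)        # stand-in for estimated coefficients
    derivs = krls_pointwise_derivatives(Z, c, sigma2=3.0, d=0)
    print(derivs.shape)
```

The resulting vector has one entry per case and can be passed to any smoother and plotted against the covariate of interest.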

References

Abadie, A., S. Athey, G. W. Imbens, and J. Wooldridge (2017, November). When should you adjust standard errors for clustering? NBER Working Paper No. 24003.

Ahlquist, J. (2017). Labor unions, political representation, and economic inequality. Annual Review of Political Science 20, 409–432.

Ahlquist, J. S., A. B. Clayton, and M. Levi (2014). Provoking preferences: Unionization, trade policy, and the ILWU puzzle. International Organization 68(1), 33–75.

Ahlquist, J. S. and M. Levi (2013). In the Interest of Others. Princeton: Princeton University Press.

Ansolabehere, S. and P. E. Jones (2010). Constituents' responses to congressional roll-call voting. American Journal of Political Science 54(3), 583–597.

Anzia, S. F. (2011). Election timing and the electoral influence of interest groups. Journal of Politics 73(2), 412–427.

Anzia, S. F. and T. M. Moe (2016). Do politicians use policy to make politics? The case of public-sector labor laws. American Political Science Review 110(4), 763–777.

APSA Task Force (2004). American democracy in an age of rising inequality. Report of the American Political Science Association Task Force on Inequality and American Democracy.

Arnold, D. R. (1990). The Logic of Congressional Action. New Haven: Yale University Press.

Bartels, L. (2008). Unequal Democracy: The Political Economy of the New Gilded Age (1st ed.). Princeton: Princeton University Press.

Bartels, L. (2016). Unequal Democracy: The Political Economy of the New Gilded Age (2nd ed.). Princeton: Princeton University Press.

Bartels, L. M. (2017). Political inequality in affluent democracies: The social welfare deficit. Vanderbilt University CSDI Working Paper 5-2017. [www.vanderbilt.edu/csdi/includes/Working_Paper_5_2017.pdf]

Becher, M., D. Stegmueller, and K. Kaeppner (2018). Local union organization and law making in the US Congress. Journal of Politics 80(2), 539–554.

Belloni, A. and V. Chernozhukov (2009). Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2), 521–547.

Belloni, A., V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2017). Program evaluation and causal inference with high-dimensional data. Econometrica 85(1), 233–298.

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection amongst high-dimensional controls. Review of Economic Studies 81, 608–650.

Belloni, A., V. Chernozhukov, and C. B. Hansen (2013). Inference for high-dimensional sparse econometric models. In D. Acemoglu, M. Arellano, and E. Dekel (Eds.), Advances in Economics and Econometrics: Tenth World Congress, Volume 3, pp. 245–295. Cambridge: Cambridge University Press.

Belloni, A., V. Chernozhukov, and L. Wang (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806.

Berelson, B. R., P. F. Lazarsfeld, and W. McPhee (1954). Voting: A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press.

Bhatti, Y. and R. S. Erikson (2011). How poorly are the poor represented in the US Senate? In P. K. Enns and C. Wlezien (Eds.), Who Gets Represented?, pp. 223–246. New York: Russell Sage Foundation.

Box-Steffensmeier, J. M., L. W. Arnold, and C. J. W. Zorn (1997). The strategic timing of position taking in Congress: A study of the North American Free Trade Agreement. American Political Science Review 91(2), 324–338.

Breiman, L. (2001, Oct). Random forests. Machine Learning 45(1), 5–32.

Broockman, D. E. and C. Skovron (2018). Bias in perceptions of public opinion among political elites. American Political Science Review 112(3), 542–563.

Brunner, E., S. L. Ross, and E. Washington (2013). Does less income mean less representation? American Economic Journal: Economic Policy 5(2), 53–76.

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.

Elsässer, L., S. Hense, and A. Schäfer (2017). “Dem deutschen Volke”? Die ungleiche Responsivität des Bundestags. Zeitschrift für Politikwissenschaft 27(2), 161–180.

Enns, P. K. (2015). Relative policy support and coincidental representation. Perspectives on Politics 13(4), 1053–1064.

Erikson, R. S. (2015). Income inequality and policy responsiveness. Annual Review of Political Science 18, 11–29.

Feigenbaum, J., A. Hertel-Fernandez, and V. Williamson (2018). From the bargaining table to the ballot box: Political effects of right to work laws. NBER Working Paper 24259. [www.nber.org/papers/w22637]

Flavin, P. (2012). Inequality and policy representation in the American states. American Politics Research 40(1), 29–59.

Flavin, P. (2018). Labor union strength and the equality of political representation. British Journal of Political Science 48(4), 1075–1091.

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodology 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947–1964. Washington, DC: US Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the US electorate, 1964–2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public Opinion in State Politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The Collapse and Revival of American Community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(2), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990–2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

Anzia S F and T M Moe (2016) Do politicians use policy to make politics the case ofpublic-sector labor laws American Political Science Review 110(4) 763ndash777

APSA Task Force (2004) American democracy in an age of rising inequality Report ofthe American Polictical Science Association Task Force on Inequality and AmericanDemocracy

Arnold D R (1990) e Logic of Congressional Action New Haven Yale University PressBartels L (2008) Unequal Democracy e Political Economy of the New Gilded Age (1st ed)

Princeton Princeton University PressBartels L (2016) Unequal Democracy e Political Economy of the New Gilded Age (2nd ed)

Princeton Princeton University PressBartels L M (2017) Political inequality in auent democracies e social welfare

decit Vanderbilt University CSDI Working Paper 5-2017 [wwwvanderbilteducsdiincludesWorking Paper 5 2017pdf]

Becher M D Stegmueller and K Kaeppner (2018) Local union organization and lawmaking in the us congress Journal of Politics 80(2) 39ndash554

Belloni A and V Chernozhukov (2009) Least squares aer model selection in high-dimensional sparse models Bernoulli 19(2) 521ndash547

Belloni A V Chernozhukov I Fernandez-Val and C Hansen (2017) Program evaluationand causal inference with high-dimensional data Econometrica 85(1) 233ndash298

Belloni A V Chernozhukov and C Hansen (2014) Inference on treatment eects aerselection amongst high-dimensional controls Review of Economic Studies 81 608ndash650

Belloni A V Chernozhukov and C B Hansen (2013) Inference for high-dimensionalsparse econometric models In D Acemoglu M Arellano and E Dekel (Eds) Advancesin Economics and Econometrics Tenth World Congress Volume 3 pp 245ndash295 CambridgeCambridge University Press

Belloni A V Chernozhukov and L Wang (2011) Square-root lasso pivotal recovery ofsparse signals via conic programming Biometrika 98(4) 791ndash806

Berelson B R P F Lazarsfeld and W McPhee (1954) Voting A Study of Opinion Formationin a Presidential Campaign Chicago University of Chicago Press

Bhai Y and R S Erikson (2011) How poorly are the poor represented in the us senateIn P K Enns and C Wlezien (Eds) Who Gets Represented pp 223ndash246 New York RusselSage Foundation

Box-Steensmeier J M L W Arnold and C J W Zorn (1997) e strategic timing ofposition taking in congress A study of the north american free trade agreement AmericanPolitical Science Review 91(2) 324ndash338

Breiman L (2001 Oct) Random forests Machine Learning 45(1) 5ndash32Broockman D E and C Skovron (2018) Bias in perceptions of public opinion among

political elites American Political Science Review 112(3) 542ndash563Brunner E S L Ross and W Ebonya (2013) Does less income mean less representationAmerican Economic Journal Economic Policy 5(2) 53ndash76

46

Budd J W (2018) Labor Relations Striking a Balance (5 ed) New York NY McGraw-HillEducation

Butler D M (2014) Representing the Advantaged New York Cambridge University PressButler D M and A M Dynes (2016) How politicians discount the opinions of constituents

with whom they disagree American Journal of Political Science 60(4) 975ndash989Butler D M and D W Nickerson (2011) Can learning constituency opinion aect how

legislators vote results from a eld experiment arterly Journal of Political Science 6(1)55ndash83

Cameron A C and D L Miller (2015) A practitionerrsquos guide to cluster-robust inferenceJournal of Human Resources 50(2) 317ndash372

Card D (1996) e eect of unions on the structure of wages A longitudinal analysisEconometrica 64(4) 957ndash979

Carnes N (2013) White-Collar Government e Hidden Role of Class in Economic PolicyMaking Chicago IL University of Chicago Press

Chernozhukov V C Hansen and M Spindler (2015) Valid post-selection and post-regularization inference An elementary general approach Annual Review of Eco-nomics 7 (1) 649ndash688

Chung Y S Rabe-Hesketh V Dorie A Gelman and J Liu (2013) A nondegenerate penalizedlikelihood estimator for variance parameters in multilevel models Psychometrika 78(4)685ndash709

Dahl R A (1961) Who Governs New Haven Yale University PressDark T E (1999) e Unions and the Democrats Ithaca Cornell University PressDuan N (1983) Smearing estimate A nonparametric retransformation method Journal ofthe American Statistical Association 78(383) 605ndash610

Ellis C (2013) Social context and economic biases in representation Journal of Politics 75(3)773ndash786

Elsasser L S Hense and A Schafer (2017) ldquodem deutschen volkerdquo die ungleiche respon-sivitat des bundestags Zeitschri fur Politikwissenscha 27 (2) 161ndash180

Enns P K (2015) Relative policy support and coincidental representation Perspectives onPolitics 13(4) 1053ndash1064

Erikson R S (2015) Income inequality and policy responsiveness Annual Review of PoliticalScience 18(11-29)

Feigenbaum J A Hertel-Fernandez and V Williamson (2018) From the bargaining tableto the ballot box Political eects of right to work laws NBER Working Paper 24259[wwwnberorgpapersw22637]

Flavin A (2012) Inequality and policy representation in the american states AmericanPolitics Research 40(1) 29ndash59

Flavin P (2018) Labor union strength and the equality of political representation BritishJournal of Political Science 48(4) 1075ndash1091

47

Flavin, P. and M. T. Hartney (2015). When government subsidizes its own: Collective bargaining laws as agents of political mobilization. American Journal of Political Science 59(4), 896–911.

Freeman, R. B. and J. Medoff (1984). What Do Unions Do? New York: Basic Books.

Gelman, A. (2014). How Bayesian analysis cracked the red-state, blue-state problem. Statistical Science 29(1), 26–35.

Gelman, A. and J. Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodologist 23, 127–135.

Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian data analysis (Third ed.). Boca Raton: CRC Press.

Gilens, M. (2012). Affluence and Influence: Economic Inequality and Political Power in America. Princeton: Princeton University Press and Russell Sage Foundation.

Gilens, M. and B. I. Page (2014). Testing theories of American politics: Elites, interest groups, and average citizens. Perspectives on Politics 12(3), 564–581.

Hacker, J. S. and P. Pierson (2010). Winner-Take-All Politics. New York, NY: Simon & Schuster.

Hainmueller, J. and C. Hazlett (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2), 143–168.

Hainmueller, J., J. Mummolo, and Y. Xu (2018). How much should we trust estimates from multiplicative interaction models? Simple tools to improve empirical practice. Forthcoming in Political Analysis.

Henson, M. F. (1967). Trends in the Income of Families and Persons in the United States, 1947–1964. Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Hertel-Fernandez, A., M. Mildenberger, and L. Stokes (2018). Legislative staffers and representation in Congress. American Political Science Review. Forthcoming. https://doi.org/10.1017/S0003055418000606

Hirsch, B., D. Macpherson, and W. Vroman (2001). Estimates of union density by state. Monthly Labor Review 124(7), 51–55.

Honaker, J. and E. Plutzer (2016). Small area estimation with multiple overimputation. Manuscript. [http://hona.kr/papers/files/smallAreaEstimation.pdf]

Horrace, W. C. and R. L. Oaxaca (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters 90, 321–327.

Hout, M. (2004). Getting the most out of the GSS income measures. GSS Methodological Report 101.

Jessee, S. A. (2009). Spatial Voting in the 2004 Presidential Election. American Political Science Review 103(1), 59–81.

Kalla, J. L. and D. E. Broockman (2016). Campaign contributions facilitate access to congressional officials: A randomized field experiment. American Journal of Political Science 60(3), 545–558.

Kim, S. E. and Y. Margalit (2017). Informed preferences? The impact of unions on workers' policy views. American Journal of Political Science 61, 728–743.

Kopczuk, W., E. Saez, and J. Song (2010). Earnings Inequality and Mobility in the United States: Evidence from Social Security Data since 1937. Quarterly Journal of Economics 125(1), 91–128.

Lax, J. R. and J. H. Phillips (2009). How should we estimate public opinion in the states? American Journal of Political Science 53(1), 107–121.

Lax, J. R. and J. H. Phillips (2013). How should we estimate sub-national opinion using MRP? Preliminary findings and recommendations. Paper presented at the Annual Meeting of the Midwest Political Science Association, Chicago.

Lee, D. S., E. Moretti, and M. J. Butler (2004). Do voters affect or elect policies? Evidence from the U.S. House. Quarterly Journal of Economics 119(3), 807–859.

Leeb, H. and B. M. Pötscher (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24(2), 338–376.

Leighley, J. E. and J. Nagler (2007). Unions, voter turnout, and class bias in the U.S. electorate, 1964–2004. Journal of Politics 69(2), 430–441.

Lichtenstein, N. (2013). State of the Union: A Century of American Labor (2nd ed.). Princeton: Princeton University Press.

Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven: Yale University Press.

Lupu, N. and Z. Warner (2017). Affluence and congruence: Unequal representation around the world. Manuscript. [www.noamlupu.com/A&C.pdf]

McCarty, N., K. T. Poole, and H. Rosenthal (2006). Polarized America. Cambridge, MA: MIT Press.

Mian, A., A. Sufi, and F. Trebbi (2010). The political economy of the US mortgage default crisis. American Economic Review 100(5), 1967–1998.

Miler, K. C. (2007). The view from the hill: Legislative perceptions of the district. Legislative Studies Quarterly 32(4), 597–628.

Miller, W. E. and D. E. Stokes (1963). Constituency influence in Congress. American Political Science Review 57(1), 45–56.

Moe, T. M. (2011). Special Interest: Teachers Unions and America's Public Schools. Washington, DC: Brookings Institution.

Nannicini, T., A. Stella, G. Tabellini, and U. Troiano (2013). Social capital and political accountability. American Economic Journal: Economic Policy 5(2), 222–250.

Park, D. K., A. Gelman, and J. Bafumi (2006). State-level opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public opinion in state politics, pp. 209–28. Stanford: Stanford University Press.

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The collapse and revival of American community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs? Quarterly Journal of Political Science 12(s), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990–2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction

Budd, J. W. (2018). Labor Relations: Striking a Balance (5 ed.). New York, NY: McGraw-Hill Education.

Butler, D. M. (2014). Representing the Advantaged. New York: Cambridge University Press.

Butler, D. M. and A. M. Dynes (2016). How politicians discount the opinions of constituents with whom they disagree. American Journal of Political Science 60(4), 975–989.

Butler, D. M. and D. W. Nickerson (2011). Can learning constituency opinion affect how legislators vote? Results from a field experiment. Quarterly Journal of Political Science 6(1), 55–83.

Cameron, A. C. and D. L. Miller (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Card, D. (1996). The effect of unions on the structure of wages: A longitudinal analysis. Econometrica 64(4), 957–979.

Carnes, N. (2013). White-Collar Government: The Hidden Role of Class in Economic Policy Making. Chicago, IL: University of Chicago Press.

Chernozhukov, V., C. Hansen, and M. Spindler (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Annual Review of Economics 7(1), 649–688.

Chung, Y., S. Rabe-Hesketh, V. Dorie, A. Gelman, and J. Liu (2013). A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 78(4), 685–709.

Dahl, R. A. (1961). Who Governs? New Haven: Yale University Press.

Dark, T. E. (1999). The Unions and the Democrats. Ithaca: Cornell University Press.

Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78(383), 605–610.

Ellis, C. (2013). Social context and economic biases in representation. Journal of Politics 75(3), 773–786.



Kalla J L and D E Broockman (2016) Campaign contributions facilitate access to congres-sional ocials A randomized eld experiment American Journal of Political Science 60(3)545ndash558

Kim S E and Y Margalit (2017) Informed preferences the impact of unions on workersrsquopolicy views American Journal of Political Science 61 728ndash743

Kopczuk W E Saez and J Song (2010) Earnings Inequality and Mobility in the United StatesEvidence from Social Security Data since 1937 arterly Journal of Economics 125(1)91ndash128

Lax J R and J H Phillips (2009) How should we estimate public opinion in the statesAmerican Journal of Political Science 53(1) 107ndash121

Lax J R and J H Phillips (2013) How should we estimate sub-national opinion using mrppreliminary ndings and recommendations Paper presented at the Annual Meeting ofthe Midwest Political Science Association Chicago

Lee D S E Morei and M J Butler (2004) Do voters aect or elect policies evidencefrom the U S House arterly Journal of Economics 119(3) 807ndash859

Leeb H and B M Potscher (2008) Can one estimate the unconditional distribution ofpost-model-selection estimators Econometric eory 24(2) 338ndash376

Leighley J E and J Nagler (2007) Unions voter turnout and class bias in the US electorate1964-2004 Journal of Politics 69(2) pp 430ndash441

Lichtenstein N (2013) State of the Union A Century of American Labor (2nd ed) PrincetonPrinceton University Press

Lijphart A (1999) Paerns of Democracy Government Forms and Performance in irty-SixCountries New Haven Yale University Press

Lupu N and Z Warner (2017) Auence and congruence Unequal representation aroundthe world Manuscript [wwwnoamlupucomAampCpdf]

McCarty N K T Poole and H Rosenthal (2006) Polarized America Cambridge MA MITPress

Mian A A Su and F Trebbi (2010) e political economy of the us mortgage defaultcrisis American Economic Review 100(5) 1967ndash1998

Miler K C (2007) e view from the hill Legislative perceptions of the district LegislativeStudies arterly 32(4) 597ndash628

Miller W E and D E Stokes (1963) Constituency inuence in congress American PoliticalScience Review 57 (1) 45ndash56

Moe T M (2011) Special Interest Teachers Unions and Americarsquos Public Schools WashingtonDC Brookings Institution

Nannicini T A Stella G Tabellini and U Troiano (2013) Social capital and politicalaccountability American Economic Journal Economic Policy 5(2) 222ndash250

Park D K A Gelman and J Bafumi (2006) State-level opinions from national surveysPoststratication using multilevel logistic regression In J E Cohen (Ed) Public opinionin state politics pp 209ndash28 Stanford Stanford University Press

49

Putnam, R. (1993). Making Democracy Work. Princeton, NJ: Princeton University Press.

Putnam, R. (2000). Bowling Alone: The Collapse and Revival of American Community. New York: Simon and Schuster.

Ratkovic, M. and D. Tingley (2017). Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1), 1–40.

Rhodes, J. H. and B. F. Schaffner (2017). Testing models of unequal representation: Democratic populists and Republican oligarchs. Quarterly Journal of Political Science 12(2), 185–204.

Richardson, S. and W. R. Gilks (1993). A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology 138(6), 430–442.

Rigby, E. and G. C. Wright (2013). Political parties and representation of the poor in the American states. American Journal of Political Science 57(3), 552–565.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56(4), 931–954.

Rosenfeld, J. (2014). What Unions No Longer Do. Cambridge: Harvard University Press.

Rupasingha, A. and S. J. Goetz (2008). US county-level social capital data, 1990-2005. The Northeast Regional Center for Rural Development, Penn State University, University Park, PA.

Samii, C. (2016). Causal empiricism in quantitative research. Journal of Politics 78(3), 941–955.

Schlozman, D. (2015). When Movements Anchor Parties. Princeton: Princeton University Press.

Schlozman, K. L., S. Verba, and H. E. Brady (2012). The Unheavenly Chorus: Unequal Political Voice and the Broken Promise of American Democracy. Princeton: Princeton University Press.

Southworth, C. and J. Stepan-Norris (2009). American trade unions and data limitations: A new agenda for labor studies. Annual Review of Sociology 35, 297–320.

Stekhoven, D. J. and P. Bühlmann (2011). MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118.

Stimson, J. A., M. B. Mackuen, and R. S. Erikson (1995). Dynamic representation. American Political Science Review 89(3), 543–565.

Tang, F. and H. Ishwaran (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining: The ASA Data Science Journal 10, 363–377.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1), 267–288.

Torrieri, N., ACSO, DSSD, and SEHSD Program Staff (2014). American Community Survey design and methodology. United States Census Bureau. [www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html]

Zullo, R. (2008). Union membership and political inclusion. Industrial and Labor Relations Review 62(1), 22–38.

  • Introduction
  • Moderating biased responsiveness in Congress
  • Data and Empirical Strategy
    • CCES data and Congressional roll calls
    • Measuring constituency preferences by income group
    • District-level union membership
    • Statistical specifications
      • Results
        • Unions and unequal legislative responsiveness
        • Further robustness tests
        • Relaxing modeling assumptions
          • Heterogeneity
          • Exploring Possible Mechanisms
          • Conclusion
          • Data
          • Estimation of District Preferences
            • Small Area Estimation via Chained Random Forests
            • Multilevel Regression and Poststratification
            • Model results under various preference estimation strategies
              • Alternative Income Thresholds
              • Measures of District Organizational Capacity
              • Additional Robustness Test
              • Post-Double-Selection Estimator
              • Nonparametric Evidence for Union-Preferences Interaction