Predicting State-Level 2016 Vote using MRP and a National Tracking Poll


Chad Kiewiet de Jonge and Gary Langer

Langer Research Associates

AAPOR Annual Meeting, New Orleans, Louisiana, May 20, 2017


https://www.washingtonpost.com/news/monkey-cage/wp/2016/12/05/which-was-the-most-accurate-national-poll-in-the-2016-presidential-election

ABC and ABC/Post pre-election estimates, 1984-2016 (major-party margin, points)

Year   ABC or ABC/Post   Actual vote   Diff.
2016   4                 2             2
2012   3                 4             1
2008   9                 7             2
2004   1                 3             2
2000   3                 0             3
1996   12                8             4
1992   7                 6             1
1988   8                 7             1
1984   15                18            3
Avg.                                   2
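The "Avg." row is the mean absolute difference between the ABC or ABC/Post estimate and the actual major-party margin across the nine elections. A quick check in Python, using the figures from the table:

```python
# Major-party margins in points, from the table: year -> (ABC or ABC/Post, actual vote)
estimates = {
    2016: (4, 2), 2012: (3, 4), 2008: (9, 7), 2004: (1, 3), 2000: (3, 0),
    1996: (12, 8), 1992: (7, 6), 1988: (8, 7), 1984: (15, 18),
}

# Absolute difference between the poll margin and the actual margin, per election
diffs = {year: abs(poll - actual) for year, (poll, actual) in estimates.items()}
avg_diff = sum(diffs.values()) / len(diffs)

print(f"Average absolute difference: {avg_diff:.1f} points")
# prints: Average absolute difference: 2.1 points
```

The unrounded average is 19/9, or about 2.1 points, which the table shows rounded to 2.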

The problem

The narrative


Won popular vote, lost Electoral College: 1876, 1888, 2000

The way forward*

*If this, indeed, is the destination

Meet Mr. P

Conceptual and statistical approach

Use large-area dataset to make small-area estimates

1) Fit a multilevel statistical model predicting the outcome of interest

• In this case, two outcomes: state-level turnout and vote preference

▫ Model with state-level variables

Past election turnout (for turnout model)

Census data per the ACS and past-election vote shares (for vote preference model)

2) Use model estimates to predict the outcome variable for groups defined in a poststratification dataset
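The two steps can be sketched in a few lines of Python. Here the fitted multilevel model from step 1 is stood in for by a hypothetical predicted probability per cell; step 2 is then a population-weighted average of those predictions within each state. The state labels, cells, counts and probabilities are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical poststratification frame: one row per (state, demographic cell).
# "pred" stands in for the step-1 model's predicted probability for that cell;
# "n" is the cell's population count (e.g., from the ACS).
cells = [
    {"state": "State A", "cell": "women 18-29 college", "n": 120_000, "pred": 0.62},
    {"state": "State A", "cell": "men 65+ no college", "n": 90_000, "pred": 0.35},
    {"state": "State B", "cell": "women 18-29 college", "n": 8_000, "pred": 0.48},
    {"state": "State B", "cell": "men 65+ no college", "n": 15_000, "pred": 0.22},
]

def poststratify(rows, by="state"):
    """Step 2: population-weighted average of cell predictions within each group."""
    num = defaultdict(float)
    den = defaultdict(float)
    for r in rows:
        num[r[by]] += r["n"] * r["pred"]
        den[r[by]] += r["n"]
    return {group: num[group] / den[group] for group in num}

estimates = poststratify(cells)
print(estimates)
```

Passing a different grouping key (for example `by="cell"`) aggregates the same predictions to subgroups instead of states.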

Conceptual and statistical approach

• Groups are partially pooled across states (depending on sample size)

• This leverages homogeneity of vote preference among subgroups

• In effect, a large cell-based weighting program

• And a promising alternative to uncertain state-level polls
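Partial pooling can be illustrated with the textbook normal-normal case, a simplification of the multilevel logistic models used here: a group's estimate is a precision-weighted compromise between its own sample mean and the overall mean, so thinly sampled groups are pulled harder toward the overall mean. The sample sizes 6 and 1,084 are the Alaska and California counts in this tracking poll; the means and variances are hypothetical.

```python
def partially_pooled_mean(group_mean, n, sigma2, grand_mean, tau2):
    """Normal-normal shrinkage estimate of a group mean.

    group_mean: the group's raw sample mean
    n: group sample size
    sigma2: within-group variance of individual responses
    grand_mean: mean across all groups
    tau2: between-group variance of the true group means
    """
    w_data = n / sigma2       # precision of the group's own sample mean
    w_prior = 1.0 / tau2      # precision contributed by the overall mean
    return (w_data * group_mean + w_prior * grand_mean) / (w_data + w_prior)

# Hypothetical: both states show a 60% raw mean against a 45% overall mean
alaska = partially_pooled_mean(0.60, 6, 0.25, 0.45, 0.0025)
california = partially_pooled_mean(0.60, 1084, 0.25, 0.45, 0.0025)
print(round(alaska, 3), round(california, 3))
```

With these inputs the Alaska estimate moves most of the way back toward 0.45 (about 0.46), while California's stays close to its raw mean (about 0.59).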

Vote preference estimate

• Multilevel logistic regression models predicting preference for Clinton or Trump, respectively (other/undecided set to 0).

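$$
\begin{aligned}
\Pr(\text{candidate}_{ij} = 1) = \operatorname{logit}^{-1}\big(&\alpha_0 + \beta_1\,\text{2012 party share} + \beta_2\,\text{black share} + \beta_3\,\text{Hispanic share} \\
&+ \beta_4\,\text{white evang. share} + \alpha_1^{\text{gender}} + \alpha_2^{\text{age5}} + \alpha_3^{\text{race4}} + \alpha_4^{\text{edu5}} \\
&+ \alpha_5^{\text{state}} + \alpha_6^{\text{region}} + \alpha_7^{\text{age5,edu5}} + \alpha_8^{\text{gender,edu5}} \\
&+ \alpha_9^{\text{race5,age5}} + \alpha_{10}^{\text{race5,edu5}}\big)
\end{aligned}
$$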

𝛼0 = baseline intercept
• Obama and Romney vote shares by state
• Black, Hispanic, evangelical white Protestant population by state
• Gender, age, education, race/ethnicity
• State, region
• Age*education, gender*education, race*age, race*education, race*gender, race*region
• Random effect for time periods
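To make the vote preference equation concrete, the sketch below evaluates the linear predictor for one poststratification cell (intercept, state-level fixed effects, and the varying intercepts for each group the cell belongs to) and maps it to a probability with the inverse logit. All coefficient and effect values are hypothetical placeholders, not the fitted estimates; in practice they would come from the multilevel fit.

```python
import math

def inv_logit(x: float) -> float:
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def cell_probability(alpha0, betas, state_vars, group_effects):
    """Pr(outcome = 1) for one cell.

    alpha0: baseline intercept
    betas, state_vars: fixed-effect coefficients and matching state-level predictors
    group_effects: varying intercepts for the cell's gender, age, race, etc.
    """
    eta = alpha0
    eta += sum(betas[name] * state_vars[name] for name in betas)
    eta += sum(group_effects.values())
    return inv_logit(eta)

# Hypothetical values for a single cell
betas = {"party_share_2012": 4.0, "black_share": 1.5,
         "hispanic_share": 0.8, "white_evang_share": -2.0}
state_vars = {"party_share_2012": 0.51, "black_share": 0.10,
              "hispanic_share": 0.12, "white_evang_share": 0.15}
effects = {"gender": 0.10, "age5": -0.05, "race4": -0.30, "edu5": 0.25,
           "state": 0.02, "region": -0.04, "age5:edu5": 0.03,
           "gender:edu5": 0.06, "race5:age5": -0.02, "race5:edu5": 0.08}

p_clinton = cell_probability(-2.0, betas, state_vars, effects)
print(round(p_clinton, 3))
```

Because Clinton and Trump preference are modeled separately, the same function would be applied twice per cell, once with each model's estimates.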

Turnout estimate

• Registered to vote at current address
• Voted early or definitely will vote
• Voted in 2012

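$$
\begin{aligned}
\Pr(\text{vote}_i = 1) = \operatorname{logit}^{-1}\big(&\alpha_0 + \beta_1\,\text{2012 VAP turnout} + \alpha_1^{\text{gender}} + \alpha_2^{\text{age5}} + \alpha_3^{\text{race4}} + \alpha_4^{\text{edu5}} \\
&+ \alpha_5^{\text{state}} + \alpha_6^{\text{region}} + \alpha_7^{\text{age5,edu5}} + \alpha_8^{\text{gender,edu5}} \\
&+ \alpha_9^{\text{race5,age5}} + \alpha_{10}^{\text{race5,edu5}} + \alpha_{11}^{\text{race5,gender}}\big)
\end{aligned}
$$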

Same demographic terms, but 2012 state VAP turnout replaces the vote model's state-level variables

Survey dataset:

ABC News and ABC News/Washington Post tracking poll

• 18 daily waves, standalone national samples

• RDD telephone, 65% cell, English/Spanish, live/fresh mix

• n=440 gen pop for first 14 waves, n=800 for final 4

• Total n=9,485 adults, n=7,778 self-reported RVs

• Avg. length 6.5 min.

• AAPOR COOP3 = 38.7%, AAPOR RR3 = 15.6%
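For readers less familiar with the two AAPOR rates, here is a sketch of the formulas as given in AAPOR's Standard Definitions: COOP3 = I / ((I + P) + R) and RR3 = I / ((I + P) + (R + NC + O) + e(UH + UO)), where I = complete interviews, P = partial interviews, R = refusals and break-offs, NC = non-contacts, O = other eligible non-interviews, UH and UO = cases of unknown eligibility, and e = the estimated share of unknown cases that are eligible. The disposition counts below are hypothetical, not the tracking poll's actual dispositions.

```python
def coop3(I, P, R):
    """AAPOR Cooperation Rate 3: completes over completes, partials and refusals."""
    return I / ((I + P) + R)

def rr3(I, P, R, NC, O, UH, UO, e):
    """AAPOR Response Rate 3: unknown-eligibility cases discounted by e."""
    return I / ((I + P) + (R + NC + O) + e * (UH + UO))

# Hypothetical disposition counts
counts = dict(I=1000, P=50, R=1500, NC=3000, O=100, UH=2000, UO=500)

print(f"COOP3 = {coop3(counts['I'], counts['P'], counts['R']):.1%}")
print(f"RR3   = {rr3(**counts, e=0.5):.1%}")
```

As in the tracking poll's figures, RR3 comes out well below COOP3, because its denominator also counts non-contacts and an estimated share of unknown-eligibility numbers.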

State-level distribution

[Figure: sample size per state, ordered from smallest (Alaska) to largest (California); x-axis 0 to 1,000, median marked at 133]

California 1,084; Alaska 6; Colorado (median) 133

Poststratification

• Poststratify turnout model estimates on the 2015 ACS one-year estimates
• Cells for every combination of demographic variables in the model
• Total 10,200 rows
• Result: estimated LV population in each subgroup

• Poststratify vote preference model estimates (Clinton, Trump) on the LV population in each subgroup
• Adjust for time trends using the final survey wave random effect estimate
• Assumes uniform swings; empirically justified

• Turnout and vote estimates can be aggregated to state as well as subgroup levels
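The two-stage poststratification can be sketched as follows: turnout predictions convert each ACS cell's population into estimated likely voters, and vote preference predictions are then averaged with those likely-voter counts as weights, by state or by any subgroup. The frame rows and probabilities are hypothetical. The cell-count line assumes the frame crosses 51 units (50 states plus DC) with the model's gender (2), age (5), race (4) and education (5) categories, which is one way to arrive at the 10,200 rows reported; region adds no rows because it is determined by state.

```python
from collections import defaultdict

# One row per poststratification cell. "pop" would come from the ACS;
# "p_turnout", "p_clinton" and "p_trump" from the fitted models. All hypothetical.
frame = [
    {"state": "State A", "educ": "college", "pop": 2_000_000,
     "p_turnout": 0.75, "p_clinton": 0.58, "p_trump": 0.36},
    {"state": "State A", "educ": "no college", "pop": 5_000_000,
     "p_turnout": 0.55, "p_clinton": 0.40, "p_trump": 0.55},
    {"state": "State B", "educ": "college", "pop": 1_800_000,
     "p_turnout": 0.74, "p_clinton": 0.55, "p_trump": 0.39},
    {"state": "State B", "educ": "no college", "pop": 4_500_000,
     "p_turnout": 0.52, "p_clinton": 0.37, "p_trump": 0.58},
]

def aggregate(rows, by):
    """Stage 1: likely voters = pop * p_turnout. Stage 2: LV-weighted vote shares."""
    lv = defaultdict(float)
    clinton = defaultdict(float)
    trump = defaultdict(float)
    for r in rows:
        voters = r["pop"] * r["p_turnout"]  # estimated likely voters in this cell
        lv[r[by]] += voters
        clinton[r[by]] += voters * r["p_clinton"]
        trump[r[by]] += voters * r["p_trump"]
    return {g: {"likely_voters": lv[g],
                "clinton": clinton[g] / lv[g],
                "trump": trump[g] / lv[g]} for g in lv}

by_state = aggregate(frame, "state")  # state-level estimates
by_educ = aggregate(frame, "educ")    # subgroup-level estimates from the same cells

# Frame size under the stated assumption: 51 x gender(2) x age(5) x race(4) x education(5)
n_cells = 51 * 2 * 5 * 4 * 5
print(n_cells)  # 10200
```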

Turnout model

Turnout model highlights

• Modeled turnout 60.9%; actual turnout 54.7% of the voting-age population (VAP), 60.2% of the voting-eligible population (VEP)

• Median absolute model error = 5.6 points

• Turnout errors decreased as actual turnout increased; model errors are correlated with actual turnout at r = -0.77

• Turnout errors were essentially uncorrelated with state-level Clinton-Trump margins (r = 0.01)

Perfect predictions = 0; overprediction for Clinton to the right, overprediction for Trump to the left. Overestimates of Clinton's margin below the line, underestimates of Clinton's margin above it.

Vote preference model

All states:
• Actual popular vote: 48.2 to 46.1 percent (2.1 points)
• MRP model: 46.8 to 44.3 percent (2.5 points)
• Correct winner in 49 states (+DC)
• RMSE on Clinton-Trump margin 3.5 points (excluding AK, HI, DC)
• Double-digit absolute errors on Clinton-Trump margins in HI, ND, WY, DC
• Errors of 5-10 points in AK, LA, MS, NM, OK, RI, SD, WV

Swing states:
• RMSE on the Clinton-Trump margin 2.5 points, including:
▫ 1 point in GA, WA, FL, MI
▫ 2 to 3 points in PA, CO, MN, AZ, VA, NC, OH
▫ 4 to 5 points in IA, NV
• MRP absolute errors on the margin were smaller than the polling-average errors in 10 of 14 swing states (all but CO, AZ, VA, NV)
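The two accuracy measures used here, RMSE on the Clinton-Trump margin and a state-by-state count of where MRP's absolute error beat the polling average, can be sketched as below. The states and margins are hypothetical; margins are Clinton minus Trump, in points.

```python
import math

def rmse(errors):
    """Root mean squared error."""
    errors = list(errors)
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical Clinton-minus-Trump margins, in points
actual = {"State A": 2.0, "State B": -4.0, "State C": 0.5}
mrp = {"State A": 3.0, "State B": -2.0, "State C": 1.5}
poll_avg = {"State A": 5.0, "State B": -1.0, "State C": 0.0}

mrp_err = {s: mrp[s] - actual[s] for s in actual}
poll_err = {s: poll_avg[s] - actual[s] for s in actual}

print(f"MRP RMSE on margin: {rmse(mrp_err.values()):.2f}")
print(f"Poll-average RMSE on margin: {rmse(poll_err.values()):.2f}")

# States where MRP's absolute error on the margin was smaller than the polling average's
mrp_better = [s for s in actual if abs(mrp_err[s]) < abs(poll_err[s])]
print(f"MRP closer in {len(mrp_better)} of {len(actual)} states")
```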

Vote model highlights

MRP model vs. state polling averages

State-by-state vote share estimates from other aggregators (New York Times Upshot; Princeton Election Consortium) and SurveyMonkey's 50-state poll were not available

RMSE for our Clinton estimates: 2.3 points; others: 2.7 to 3.6 points
RMSE for our Trump estimates: 3.5 points; others: 4 to 6.9 points

Our MRP model vs. aggregators/modelers

Vote preference over time: Two models

Left panel ("uniform swing"): single model, all data, random effects for period
Right panel: separate models for each period (smaller ns, so less stable)
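One way to read the "uniform swing" specification: because period enters as a varying intercept, moving from one period to another shifts every cell's linear predictor by the same amount on the logit scale. The sketch below applies that common shift to a set of hypothetical cell logits and weights and re-poststratifies for each period. The period labels echo the over-time results; the effect sizes are made up, and this is an illustration of the idea rather than the authors' code.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical cells: (logit-scale prediction excluding the period effect, weight such as likely voters)
cells = [(0.40, 300.0), (-0.60, 500.0), (1.10, 200.0)]

# Hypothetical period random effects on the logit scale
period_effects = {"first four days": 0.15, "rest of period": -0.05}

def share_in_period(cells, period_effect):
    """Weighted share after adding the same period shift to every cell's logit."""
    total = sum(w for _, w in cells)
    return sum(w * inv_logit(eta + period_effect) for eta, w in cells) / total

for period, effect in period_effects.items():
    print(period, round(share_in_period(cells, effect), 3))
```

Under this reading, the final-wave adjustment described for the poststratification step amounts to plugging the final survey wave's period effect into the same calculation.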

Over-time highlights

First four days:
• Clinton +5 (uniform swing) or +10 in popular vote
• Clinton ahead in EVs (narrowly in uniform swing)

Rest of period:
• Close in popular vote
• Trump ahead in EVs

Narrowing of the race and Trump's advantage in EVs preceded the Comey letter of 10/28

Comparisons to exit poll

• MRP finds an older electorate:
▫ 25% age 65+ vs. 16% in exit poll
▫ 11% age 18-29 vs. 19% in exit poll

• MRP finds a less-educated electorate:
▫ 37% college degree in MRP vs. 50% in exit poll

• MRP finds a larger margin for Clinton among college graduates:
▫ Clinton +18 points in MRP vs. +10 points in exit poll

• Including:
▫ Clinton +34 points among postgrads in MRP vs. +21 points in exit poll
▫ Trump +3 among college-educated white men in MRP vs. +14 in exit poll
▫ Clinton +18 among college-educated white women in MRP vs. +7 in exit poll

2000-2016 MRP estimates, ABC/Post data

Summary of 2000-2016 results

• MRP correctly predicts 48 of 51 contests on average
• Misses the EV and popular vote winner only in 2000 (flips them)
• Average RMSE on D-R margins 6.7 points; 5.7 points excluding AK, HI, DC
• Average RMSE on candidate percentages is 2.9-3.4 points excluding AK, HI, DC
• 2012 and 2016 are most accurate; may reflect increased early voting and greater predictive power of included demographic variables
• 2000-2012 models may be improved by taking greater account of contest-specific dynamics

Discussion: Limitations

• Relies on predictive power of demographic variables

• Predicts just one outcome (per model)

• Requires a large and high-quality dataset

• Does not replace substantive measurement, e.g., on voters’ pre-political dispositions, policy preferences and views of candidate attributes

Discussion: Advantages

• MRP is a robust tool for state-level EV predictions

• Consistent in 2000-2016 analyses

• Outperforms 2016 state polls

• Outperforms 2016 state-poll aggregators and non-probability modelers

▫ Consistent approach across states

▫ High-quality probability-based data (A+ from fivethirtyeight.com)

• Predicts national, group-level and state-level outcomes

• If EV estimates are required, MRP using a high-quality dataset is shown to provide them effectively

Thank you!

Chad Kiewiet de Jonge and Gary Langer, Langer Research Associates
