
Dimension Reduction in R

Essex Summer School in Data Analysis

Lecture 2: Spatial Models, Analyzing Issue Scales

Dave Armstrong

Department of Political Science, University of Wisconsin - Milwaukee

e: [email protected] | http://www.quantoid.net/teachessex/dimension/

August 11, 2015

1 / 66

Outline

Introduction to Spatial (Geometric) Models

Spatial Models of Voting

Analyzing Issue Scales

A-M Scaling in R

• Example

• Bootstrapping the A-M Result

Basic Space Scaling

• Example

• Blackbox Transpose

2 / 66

Spatial (Geometric) Models

• Not the same as (≠) models for geographically organized data.

• Produce a geometric representation (spatial map) of a quantity.

• Often, the goal is to recover the latent spatial map that gave rise to the data.

3 / 66

European City Distances

> load("euro_cities.rda")
> library(smacof)
> s <- smacofSym(euro_cities)
> plot(-s$conf, type="n")
> text(-s$conf[,1],
+      -s$conf[,2],
+      rownames(s$conf))

[Figure: two-dimensional MDS configuration (axes D1 and D2) of the European cities: Barcelona, Belgrade, Berlin, Brussels, Bucharest, Budapest, Copenhagen, Dublin, Hamburg, Istanbul, Kiev, London, Madrid, Milan, Moscow, Munich, Paris, Prague, Rome, Saint Petersburg, Sofia, Stockholm, Vienna, Warsaw.]
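A similar map can be sketched without the course data file. The example below is only an illustration: it swaps in classical (metric) MDS via base R's cmdscale() and the built-in eurodist road distances, rather than smacofSym() and euro_cities.rda used on the slide, so the set of cities and the exact configuration will differ.

```r
# Classical MDS on the built-in 'eurodist' distances (21 European cities).
# Assumption: 'eurodist' stands in for euro_cities.rda, which is not part of R.
fit <- cmdscale(eurodist, k = 2)          # n x 2 matrix of coordinates
colnames(fit) <- c("D1", "D2")

# An MDS solution is identified only up to rotation and reflection, so the
# second axis is flipped here purely to make the map easier to read.
plot(fit[, "D1"], -fit[, "D2"], type = "n", xlab = "D1", ylab = "D2", asp = 1)
text(fit[, "D1"], -fit[, "D2"], labels = rownames(fit), cex = 0.7)
```

The only input is the matrix of pairwise distances; the city coordinates are recovered from it, which is the sense in which the map is "latent".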

4 / 66

Spatial Models as Measurement

Spatial models can be used to measure latent quantities from observed indicators

• liberal-conservatism from issue scales

• ideal points from roll call or preference data

Spatial models are a good means for understanding how humans process data and the political world around them.

5 / 66


Spatial Model of Voting: History

• Hotelling (1929): the “linear” town. The optimum placement of a grocery store is in the middle, with implications for the political world.

• Black (1948, 1958) derived median voter theory for committees.

• Downs (1957) established spatial theory as a conceptual tool for political scientists.

7 / 66

Spatial Model of Voting

Figure: Gaussian (solid line), quadratic (dashed line), and linear (dotted line) utility functions for a voter with an ideal point of 0 (x-axis: Policy Location; y-axis: Utility).
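The three shapes named in the caption can be drawn with a few lines of R. This is a minimal sketch: the functional forms are the standard ones, but the scale constants (20, 2, 5) are assumptions picked only so the curves share a plot, not values taken from the slide.

```r
# Three common spatial utility functions for a voter with ideal point 0.
# Assumption: the constants below are illustrative, chosen for plotting only.
ideal  <- 0
policy <- seq(-3, 3, by = 0.01)
d      <- abs(policy - ideal)              # distance from the ideal point

u_gauss <- 20 * exp(-0.5 * d^2)            # Gaussian: bell-shaped, flattens out
u_quad  <- 20 - 2 * d^2                    # quadratic: loss grows with d^2
u_lin   <- 20 - 5 * d                      # linear: constant marginal loss

matplot(policy, cbind(u_gauss, u_quad, u_lin), type = "l",
        lty = c(1, 2, 3), col = 1,
        xlab = "Policy Location", ylab = "Utility")
```

All three are single-peaked at the ideal point; they differ in how quickly utility falls as the policy moves away from it.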

8 / 66

Basic Space Theory

• Citizens and legislators may have preferences on countless dimensions.

• Generally, those preferences are well-captured by a relatively small number of dimensions.

• Ideological constraint: the bundling or linkage of many different issue positions as part of a political ideology or “belief system” (Converse 1964).

• This suggests that the complex, high-dimensional issue space (where each attitude is represented by a separate dimension) maps onto an underlying low-dimensional basic space.

9 / 66

Table: Data Types and Appropriate Methods

Data Type | Example | Method | Chapter

Perceptual Data: Single Issue Scales | Individuals place themselves and/or parties on a liberal-conservative scale. | Maximum Likelihood and Bayesian Aldrich-McKelvey Scaling, Basic Space Scaling (blackbox_transpose()) | Three

Perceptual or Preferential Data: Multiple Issue Scales | Survey respondents register their attitudes on a series of ordinal policy scales. | Basic Space Scaling (blackbox()) and Ordinal Item Response Theory (IRT) | Three and Seven

Perceptual or Preferential Data: Single Square Matrices of Similarity Ratings of Objects | An agreement score matrix is created that shows how often each legislator voted on the same side as every other legislator. | Metric, Non-Metric and Bayesian Multidimensional Scaling (MDS) | Four

Perceptual or Preferential Data: Multiple Square Matrices of Similarity Ratings of Objects | Individuals or groups rate how similarly they view a series of political objects/stimuli (e.g., taxes and liberals). | Metric and Non-Metric Individual Differences Scaling (INDSCAL) | Four

Preferential Data: Rectangular Matrices with Preferential Ratings using Interval Ratio-Level Scales | Individuals rate parties on a 0-100 scale or rank candidates from most to least-preferred. | Least Squares and Bayesian Unfolding | Five

Preferential Data: Choices between Binary (Yea or Nay) Alternatives | Legislators cast a series of roll call votes. | Parametric Unfolding (NOMINATE and α-NOMINATE), Nonparametric Unfolding (Optimal Classification), and Bayesian IRT | Six

10 / 66


Issue Scale

Respondents are often asked to place themselves and political parties, candidates, and public figures on issue scales.

• Often have a 5-point or 7-point graded scale indicating strength of belief in or agreement with some unidirectional attitude or proposition.

• These are called Likert-type items. A Likert scale is a summated rating scale of Likert-type items.

• Used to collect both preferential (what is my position?) and perceptual (what are the positions of others?) data.

12 / 66

Examples

“Some people feel that the government in Washington should see to it that every person has a job and a good standard of living. Others think the government should just let each person get ahead on his/her own.”

• (1) = Government should provide jobs and a good standard of living.

• (7) = Government should stay out of it and let people get ahead on their own.

“We hear a lot of talk these days about liberals and conservatives. Here is a 7-point scale on which the political views that people might hold are arranged from extremely liberal to extremely conservative.” The endpoints are labeled:

• (1) = extremely liberal

• (7) = extremely conservative

Respondents use this scale to locate themselves and political stimuli.

13 / 66

Differential Item Functioning

Main challenge with issue scales: Differential Item Functioning / Interpersonal Incomparability / Scale Use Heterogeneity

• Respondents may interpret the meaning of the scale differently.

• Preferences and affective orientations can bias respondents’ evaluations of the political world.

• A very liberal survey participant may view President Barack Obama as insufficiently progressive, thus rating him as an ideological moderate or even conservative.

• Respondents’ affective orientations lead them to exaggerate the policy distances between themselves and stimuli they view unfavorably while understating the policy distance between themselves and stimuli they favor.

• A conservative Republican who holds President Obama in very low regard may place him at the extreme leftward end of the scale.

14 / 66

Low Information Context

Not all respondents are equally informed.

• Some respondents may reverse the positions of well-known entities (e.g., placing the Democratic Party as more conservative than the Republican Party).

• Low-information respondents will be more error-prone in their placement of stimuli on the scales.

15 / 66

Aldrich-McKelvey Scaling

• Even though respondents might not get the right absolute placement, they often have stimuli in the right order.

• We can assume, then, that the observed placements are linear transformations of the true underlying scale plus some idiosyncratic noise.

• The goal of A-M scaling is to estimate the perceptual distortion of each individual and use that information to get estimates of the true stimulus locations (net of all the individual perceptual nonsense).

16 / 66

Aldrich-McKelvey Scaling II

The A-M model assumes that individual i reports a noisy linear transformation of the true location of stimulus j ($z_j$). Writing $z_{ij}$ for the location of stimulus j as reported by individual i,

$$\alpha_i + \beta_i z_{ij} = z_j + u_{ij} \tag{1}$$

where $u_{ij}$ satisfies the usual Gauss-Markov assumptions of zero mean, homoscedasticity, and independence.

Conceptually, the A-M scaling method fits, for each respondent, a least squares regression that links the reported stimulus locations to the true (but unknown) stimulus locations.

17 / 66

Mechanics

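Define the q by 2 matrix $X_i$ as

$$X_i = \begin{bmatrix} 1 & z_{i1} \\ 1 & z_{i2} \\ \vdots & \vdots \\ 1 & z_{iq} \end{bmatrix} \tag{2}$$

Then the solution is:

$$\begin{bmatrix} \hat{\alpha}_i \\ \hat{\beta}_i \end{bmatrix} = \left[ X_i' X_i \right]^{-1} X_i' z \tag{3}$$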
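Equation (3) is ordinary least squares for a single respondent. The sketch below uses made-up numbers (z_true and z_reported are hypothetical, not from the course data) and treats the true locations as known, which they are not in practice, to show that the matrix formula and lm() return the same intercept and weight.

```r
# Hypothetical true locations of q = 5 stimuli and one respondent's placements
z_true     <- c(-0.6, -0.3, 0.0, 0.2, 0.7)
z_reported <- c(1, 3, 4, 5, 7)             # e.g., placements on a 7-point scale

X_i <- cbind(1, z_reported)                # the q x 2 matrix in Equation (2)
coef_matrix <- solve(t(X_i) %*% X_i) %*% t(X_i) %*% z_true   # Equation (3)

# The same estimates from lm(): regress true locations on reported placements
coef_lm <- coef(lm(z_true ~ z_reported))

cbind(matrix = as.vector(coef_matrix), lm = unname(coef_lm))
```

The first row is the respondent's intercept and the second is the weight; a negative weight would indicate a respondent who reverses the scale.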

18 / 66

Finding $z_j$

Define the q by q matrix A as

$$A = \left[ \sum_{i=1}^{n} X_i (X_i' X_i)^{-1} X_i' \right] \tag{4}$$

The $z_j$ estimates come from an eigendecomposition of A.

• Aldrich and McKelvey show that z is the eigenvector of $[A - nI_q]$ “with the highest (negative) nonzero” eigenvalue.
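The eigenvector result can be checked on simulated data. The following is a minimal sketch in base R, not the aldmck() implementation: the stimulus locations, sample size, and noise level are all assumptions made up for the illustration.

```r
set.seed(42)
q <- 6                                     # number of stimuli
n <- 200                                   # number of respondents
z_true <- c(-0.6, -0.4, -0.1, 0.1, 0.4, 0.6)   # hypothetical true locations

A <- matrix(0, q, q)
for (i in seq_len(n)) {
  alpha <- rnorm(1)                        # respondent-specific shift
  beta  <- runif(1, 0.5, 2)                # respondent-specific stretch
  u     <- rnorm(q, sd = 0.1)              # idiosyncratic noise
  # Reported placements satisfy: alpha + beta * z_rep = z_true + u
  z_rep <- (z_true + u - alpha) / beta
  X_i   <- cbind(1, z_rep)
  A     <- A + X_i %*% solve(crossprod(X_i)) %*% t(X_i)   # Equation (4)
}

eig <- eigen(A - n * diag(q), symmetric = TRUE)
# All eigenvalues are <= 0. The (numerically) zero one belongs to the constant
# vector, so take the largest eigenvalue that is clearly below zero.
keep  <- which(eig$values < -1e-8)[1]
z_hat <- eig$vectors[, keep]

# An eigenvector's sign is arbitrary; orient it so stimulus 1 is on the left
if (z_hat[1] > 0) z_hat <- -z_hat
round(cbind(z_true, z_hat), 3)
```

Orienting the solution with a stimulus known to be on the left is the role the polarity argument plays when the scaling is run through aldmck().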

19 / 66

Robustness

We noted above that the A-M model makes the usual Gauss-Markov assumptions.

• The operational model proposed by A-M is robust to massive violations of the Gauss-Markov assumptions. Even under these violations, the stimulus configuration is approximately right.

• Other operational models (such as taking the mean of respondent scores for each stimulus) are unlikely to produce similarly unbiased results in the presence of G-M violations.

• This is particularly true in low-information contexts, where a sizable proportion of respondents may reverse the ordering of stimuli.

20 / 66


A-M Scaling in R

The aldmck function in the basicspace package does A-M scaling.

aldmck(data, respondent = 0, missing=NULL, polarity, verbose=FALSE)

data         Matrix of numeric values (respondents in rows, stimuli in columns). No missing data is permitted.

respondent   Column number of respondent self-placement data (if not available, use 0).

missing      A vector of missing value codes (other than NA).

polarity     Column number of item that is on the left side of the scale.

verbose      Should the function print detailed output (yes if TRUE, no if FALSE).

22 / 66

A-M Scaling Output

stimuli       Estimated locations of the stimuli.

respondents   Estimates of the respondents:

    intercept   Perceptual distortion intercept term (α_i).
    weight      Perceptual distortion weight term (β_i).
    idealpt     Respondent ideal point; missing values are coded NA.
    R2          Respondent R2 of bivariate regression between estimated and reported stimulus locations.
    selfplace   Self-reported placement.
    polinfo     Respondent correlation between “true” and reported stimulus locations; used as a measure of political information.

eigenvalues   List of eigenvalues.

AMfit         Aldrich and McKelvey’s measure of fit (lower values indicate a better fit).

R2            Total R2.

N             Number of respondents included in the analysis.

N.neg         Number of respondents with negative weights.

N.pos         Number of respondents with positive weights.

23 / 66


Example: France 2009 (European Election Study)

> u <- url("http://www.quantoid.net/files/essex/franceees2009.rda")
> load(u)
> close(u)

or, if you have the file locally

> load("franceEES2009.rda")

We can run the function with:

> library(basicspace)
> result <- aldmck(franceEES2009, respondent=1, polarity=2,
+     missing=c(77,88,89), verbose=FALSE)

25 / 66

Summary

> summary(result)

SUMMARY OF ALDRICH-MCKELVEY OBJECT
----------------------------------

Number of Stimuli: 8
Number of Respondents Scaled: 611
Number of Respondents (Positive Weights): 583
Number of Respondents (Negative Weights): 28
Reduction of normalized variance of perceptions: 0.06

                 Location
Extreme Left       -0.467
Communist          -0.322
Left Party         -0.287
Socialist          -0.076
Greens             -0.020
UDF (Bayrou)        0.109
UMP (Sarkozy)       0.449
National Front      0.614

26 / 66

Respondents

> voters <- na.omit(result$respondents)
> head(voters)

     intercept      weight     idealpt selfplace    polinfo
12 -0.96125282 0.174773240 -0.08738662         5 0.86688389
23 -0.77849747 0.159691789  0.01996147         5 0.89609722
26 -0.70130179 0.155844843  0.07792242         5 0.92591578
27 -0.02837512 0.009080039  0.02610511         6 0.05181008
38 -0.64060911 0.155299178 -0.17471157         3 0.96953953
49 -0.51798188 0.142891553  0.19647589         5 0.89817794
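The six rows printed above appear consistent with each ideal point being the respondent's own linear transformation applied to the self-placement, idealpt = intercept + weight × selfplace. This is an inference from the printed numbers, not a statement of documented package behaviour; the check below simply redoes the arithmetic on the values copied from the output.

```r
# Values copied from the head(voters) output
voters_head <- data.frame(
  intercept = c(-0.96125282, -0.77849747, -0.70130179,
                -0.02837512, -0.64060911, -0.51798188),
  weight    = c(0.174773240, 0.159691789, 0.155844843,
                0.009080039, 0.155299178, 0.142891553),
  idealpt   = c(-0.08738662, 0.01996147, 0.07792242,
                0.02610511, -0.17471157, 0.19647589),
  selfplace = c(5, 5, 5, 6, 3, 5),
  row.names = c("12", "23", "26", "27", "38", "49")
)

# Apply each respondent's intercept and weight to the self-placement
implied <- with(voters_head, intercept + weight * selfplace)
round(cbind(reported = voters_head$idealpt, implied = implied), 6)
```

Note respondent 27: the weight is close to zero and polinfo is 0.05, so the placements carry little information about the ordering of the parties.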

27 / 66

Distribution of ideal points by original self-placement

> boxplot(idealpt ~ factor(selfplace), data=voters,
+     xlab = "Original Self-placement", ylab="Ideal Points")

[Figure: boxplots of ideal points (y-axis: Ideal Points, −1.0 to 1.0) by original self-placement (x-axis: Original Self-placement, 0 to 10).]

28 / 66

Distribution of Ideal Points

> source("http://www.quantoid.net/files/essex/aldmck_functions.r")
> pl <- plot2.aldmck(result, bw=.2)
> update(pl, ylim=c(-.05, 1.25))
> pl

[Figure: density of respondent ideal points (x-axis: idealpt; y-axis: Density) with the estimated stimulus locations marked: Extreme Left, Communist, Socialist, Greens, UDF (Bayrou), UMP (Sarkozy), National Front, Left Party. N = 605.]

29 / 66

Distribution of Ideal Points

> pl <- plot2.aldmck(result, bw=.2,
+     col=c("gray33", "gray66"), pch=15:18)
> update(pl, ylim=c(-.05, 1.25))
> pl

[Figure: the same density of ideal points (x-axis: idealpt; y-axis: Density), drawn in gray with different plotting symbols for the stimuli.]

●




30 / 66

By Sign of Weights

> resps <- na.omit(result$respondents)
> signw <- factor(sign(resps$weight), levels=c(-1,1),
+     labels=c("Negative", "Positive"))
> pl <- plot2.aldmck(result, bw=.05, by = "signw",
+     col=c("gray33", "gray66"), pch=15:18)
> pl

[Figure: densities of ideal points (x-axis: idealpt; y-axis: Density) in two panels: Negative (N = 28) and Positive (N = 577).]




31 / 66

Exercise

Data from the 2012 American National Election Study (ANES) is stored in the dataset ANES2012.Rda. ANES2012 is a list that stores respondents’ placements of themselves and four stimuli (Barack Obama, Mitt Romney, and the Democratic and Republican Parties) on seven-point liberal-conservative and government/private health insurance scales (ANES2012$libcon.placements and ANES2012$healthins.placements), 0-100 feeling thermometer ratings of eight stimuli (Barack Obama, Mitt Romney, Joe Biden, Paul Ryan, Hillary Clinton, George W. Bush, and the Democratic and Republican Parties) (ANES2012$thermometers) and presidential vote choice (ANES2012$presvote). Missing values are coded as 999.

1. Run aldmck() on the government/private health insurance scale data stored in ANES2012$healthins.placements.

2. Do a side-by-side plot with separate histograms showing the distribution of ideal points for respondents with positive and negative weights. Label the estimated positions of Barack Obama, Mitt Romney, and the Democratic and Republican Parties in both.

2.1 How do the two distributions differ? In this example, what does it mean for respondents to “flip” the space?

32 / 66


Stability of Stimuli

While Aldrich and McKelvey provide approximate standard errors for the stimulus locations, these have been shown to be far too conservative (i.e., too big).

• Bootstrapping provides a means for generating a sampling distribution without making distributional error assumptions.

• Bootstrapping is useful when (A) no analytical method exists for creating standard errors, (B) the assumptions used to derive or approximate the SE may not hold, or (C) you don’t feel like doing a bunch of math.

34 / 66

Bootstrapping: General Overview

• If we assume that a random variable X or statistic has a particular population value, we can study how a statistical estimator computed from samples behaves.

• We don’t always know, however, how a variable or statistic is distributed in the population.

• For example, there may be a statistic for which standard errors have not been formulated (e.g., imagine we wanted to test whether two additive scales have significantly different levels of internal consistency; Cronbach’s α doesn’t have an exact sampling distribution).

• Another example is the impact of missing data on a distribution: we don’t know how the missing data differ from the observed data.

• Bootstrapping is a technique for estimating standard errors and confidence intervals without making assumptions about the distributions that give rise to the data.

35 / 66

Bootstrapping: General Overview (2)

• Assume that we have a sample of size n for which we require more reliable standard errors for our estimates.

• Perhaps n is small or, alternatively, we have a statistic for which there is no known sampling distribution.

• The bootstrap provides one “solution”:

    • Take several new samples from the original sample, calculating the statistic each time.

    • Calculate the average and standard error (and maybe quantiles) from the empirical distribution of the bootstrap samples.

    • In other words, we find a standard error based on sampling (with replacement) from the original data.

• We apply principles of inference similar to those employed when sampling from the population.

• The population is to the sample as the sample is to the bootstrap samples.
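The recipe above takes only a few lines of base R. This is a minimal sketch on made-up data (the exponential sample and the choice of the median as the statistic are assumptions for illustration, not part of the course example).

```r
set.seed(123)
x <- rexp(50, rate = 1)                    # hypothetical "original sample"
R <- 2000                                  # number of bootstrap samples

# Resample the data with replacement and recompute the statistic each time
t_star <- replicate(R, median(sample(x, replace = TRUE)))

mean(t_star)                               # average of the bootstrap statistics
sd(t_star)                                 # bootstrap standard error
quantile(t_star, c(0.025, 0.975))          # empirical quantiles
```

The median is a convenient test case because its standard error has no simple closed form, which is the kind of situation the bootstrap is meant for.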

36 / 66

Characteristics of the Bootstrap Statistic

• The bootstrap sampling distribution around the original estimate of the statistic T is analogous to the sampling distribution of T around the population parameter θ.

• The average of the bootstrapped statistics is simply:

$$\bar{T}^* = E(T^*) \approx \frac{\sum_{b=1}^{R} T_b^*}{R}$$

where R is the number of bootstraps.

• The bias of T can be seen as its deviation from the bootstrap average (i.e., it estimates T − θ):

$$\hat{B}^* = \bar{T}^* - T$$

• The estimated bootstrap variance of $T^*$ is:

$$\hat{V}(T^*) = \frac{\sum_{b=1}^{R} (T_b^* - \bar{T}^*)^2}{R - 1}$$
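The three quantities on this slide map directly onto code. The sketch below is an illustration on simulated data (the chi-squared sample and the use of the sample variance as T are assumptions); the object names mirror the symbols above.

```r
set.seed(2015)
x <- rchisq(40, df = 3)                    # hypothetical sample
T_hat <- var(x)                            # statistic of interest, T
R <- 1500                                  # number of bootstraps

T_star <- replicate(R, var(sample(x, replace = TRUE)))

T_bar  <- sum(T_star) / R                        # bootstrap average
B_hat  <- T_bar - T_hat                          # bootstrap estimate of bias
V_hat  <- sum((T_star - T_bar)^2) / (R - 1)      # bootstrap variance
se_hat <- sqrt(V_hat)                            # bootstrap standard error

c(T_hat = T_hat, T_bar = T_bar, bias = B_hat, se = se_hat)
```

V_hat is exactly what var(T_star) returns, so in practice sd(T_star) is the usual shortcut for the bootstrap standard error.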

37 / 66

Evaluating Confidence Intervals

Accuracy:

• How quickly do coverage errors go to zero?

• $\text{Prob}\{\theta < \hat{T}_{lo}\} = \alpha$ and $\text{Prob}\{\theta > \hat{T}_{up}\} = \alpha$

• Errors go to zero at a rate of:

    • $1/n$ (second-order accurate)

    • $1/\sqrt{n}$ (first-order accurate)

Transformation Respecting:

• For any monotone transformation of θ, $\phi = m(\theta)$, can we obtain the right confidence interval on $\hat{\phi}$ with the confidence intervals on $\hat{\theta}$ mapped by m()? E.g.,

$$[\hat{\phi}_{lo}, \hat{\phi}_{up}] = [m(\hat{\theta}_{lo}), m(\hat{\theta}_{up})]$$

38 / 66

Bootstrap Confidence Intervals: Normal Theory Intervals

• Many statistics are asymptotically normally distributed

• Therefore, in large samples, we may be able to use a normality assumption to characterize the bootstrap distribution. E.g.,

$$\hat{T}^* \sim N(\hat{T}, se^2)$$

where se is $\sqrt{\hat{V}(T^*)}$.

• This approach works well for the bootstrap confidence interval, but only if the bootstrap sampling distribution is approximately normally distributed.

• In other words, it is important to look at the distribution before relying on the normal theory interval.

39 / 66

Bootstrap Confidence Intervals: Percentile Intervals

• Uses percentiles of the bootstrap sampling distribution to find the end-points of the confidence interval.

• If $\hat{G}$ is the CDF of $T^*$, then we can find the 100(1 − 2α)% confidence interval with:

$$\left[\hat{T}_{\%,lo}, \hat{T}_{\%,up}\right] = \left[\hat{G}^{-1}(\alpha), \hat{G}^{-1}(1 - \alpha)\right]$$

• The (1 − 2α) percentile interval can be approximated with:

$$\left[\hat{T}_{\%,lo}, \hat{T}_{\%,up}\right] \approx \left[T^{*(\alpha)}_B, T^{*(1-\alpha)}_B\right]$$

where $T^{*(\alpha)}_B$ and $T^{*(1-\alpha)}_B$ are the ordered (B) bootstrap replicates such that 100α% of them fall below the former and 100α% of them fall above the latter.

• These intervals do not assume a normal distribution, but they do not perform well unless we have a large original sample and at least 1000 bootstrap samples.
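The normal theory and percentile intervals can be put side by side on the same bootstrap replicates. This is a minimal sketch on simulated data (the exponential sample and the sample mean as the statistic are assumptions for illustration), with α = 0.025 in each tail.

```r
set.seed(99)
x <- rexp(60)                              # hypothetical skewed sample
T_hat  <- mean(x)
T_star <- replicate(2000, mean(sample(x, replace = TRUE)))
alpha  <- 0.025                            # 100(1 - 2*alpha)% = 95% interval

# Normal theory interval: T_hat +/- z * bootstrap standard error
se <- sd(T_star)
normal_ci <- T_hat + qnorm(c(alpha, 1 - alpha)) * se

# Percentile interval: the alpha and (1 - alpha) quantiles of the replicates
perc_ci <- quantile(T_star, c(alpha, 1 - alpha), names = FALSE)

rbind(normal = normal_ci, percentile = perc_ci)
hist(T_star, breaks = 40)                  # look at the distribution first
```

The normal interval is symmetric around T_hat by construction, while the percentile interval follows whatever skew the bootstrap distribution has.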

40 / 66

Bootstrap Confidence Intervals: Bias-Corrected, Accelerated Percentile Intervals ($BC_a$)

• The $BC_a$ CI adjusts the confidence intervals for bias due to small samples by employing a normalizing transformation through two correction factors.

• This is also a percentile interval, but the percentiles are not necessarily the ones you would think.

• Using strict percentile intervals,

$$\left[\hat{T}_{lo}, \hat{T}_{up}\right] \approx \left[T^{*(\alpha)}_B, T^{*(1-\alpha)}_B\right]$$

• Here,

$$\left[\hat{T}_{lo}, \hat{T}_{up}\right] \approx \left[T^{*(\alpha_1)}_B, T^{*(\alpha_2)}_B\right]$$

• $\alpha_1 \neq \alpha$ and $\alpha_2 \neq (1 - \alpha_1)$

$$\alpha_1 = \Phi\left(z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a(z_0 + z^{(\alpha)})}\right)$$

$$\alpha_2 = \Phi\left(z_0 + \frac{z_0 + z^{(1-\alpha)}}{1 - a(z_0 + z^{(1-\alpha)})}\right)$$

41 / 66

Bias correction: $z_0$

$$z_0 = \Phi^{-1}\left(\frac{\#\{T_b^* < T\}}{B}\right)$$

• This just gives the inverse of the normal CDF for the proportion of bootstrap replicates less than T.

• Note that if $\#\{T_b^* < T\}/B = 0.5$, then $z_0 = 0$.

• If T is unbiased, the proportion will be close to 0.5, meaning that the correction is close to 0.

42 / 66

Acceleration: a

$$a = \frac{\sum_{i=1}^{n} \left(T_{(\cdot)} - T_{(i)}\right)^3}{6\left[\sum_{i=1}^{n} \left(T_{(\cdot)} - T_{(i)}\right)^2\right]^{3/2}}$$

• $T_{(i)}$ is the calculation of the original estimate T with each observation i jackknifed out in turn.

• $T_{(\cdot)}$ is $\frac{\sum_{i=1}^{n} T_{(i)}}{n}$.

• The acceleration constant corrects for the fact that se(T) is not the same for all true parameters θ, as normal theory would suggest.
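The bias correction, the acceleration, and the adjusted percentile levels can be computed by hand. The sketch below is an illustration on simulated data (the exponential sample and the sample mean as T are assumptions); it follows the formulas on these slides rather than any particular package's implementation.

```r
set.seed(11)
x <- rexp(40)                              # hypothetical skewed sample
T_hat <- mean(x)
B <- 2000
T_star <- replicate(B, mean(sample(x, replace = TRUE)))

# Bias correction: inverse normal CDF of the share of replicates below T_hat
z0 <- qnorm(sum(T_star < T_hat) / B)

# Acceleration from the jackknife (leave-one-out) estimates
T_jack <- sapply(seq_along(x), function(i) mean(x[-i]))
T_dot  <- mean(T_jack)
a <- sum((T_dot - T_jack)^3) / (6 * sum((T_dot - T_jack)^2)^(3 / 2))

# Adjusted percentile levels for a nominal 95% interval (alpha = 0.025)
alpha <- 0.025
z_lo  <- qnorm(alpha)
z_hi  <- qnorm(1 - alpha)
alpha1 <- pnorm(z0 + (z0 + z_lo) / (1 - a * (z0 + z_lo)))
alpha2 <- pnorm(z0 + (z0 + z_hi) / (1 - a * (z0 + z_hi)))

c(z0 = z0, a = a, alpha1 = alpha1, alpha2 = alpha2)
quantile(T_star, c(alpha1, alpha2))        # BCa interval
```

When z0 and a are both zero, alpha1 and alpha2 reduce to α and 1 − α and the interval is the ordinary percentile interval.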

43 / 66

Evaluating Confidence Intervals

                             Normal       Percentile   BCa
Accuracy                     1st order    1st order    2nd order
Transformation-respecting    No           Yes          Yes
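The transformation-respecting row can be seen numerically. In this sketch (simulated data; the choice of exp() as the monotone transformation m() is an assumption for illustration), transforming the endpoints of a percentile interval gives the same answer as taking the percentile interval of the transformed replicates, while the two routes disagree for the normal interval.

```r
set.seed(7)
x <- rnorm(50, mean = 1)                   # hypothetical sample
T_hat  <- mean(x)
T_star <- replicate(2000, mean(sample(x, replace = TRUE)))
p <- c(0.025, 0.975)
m <- exp                                   # a monotone transformation

# Percentile interval: type = 1 quantiles are order statistics, so the
# equality below is exact for any increasing m()
perc_then_m <- m(quantile(T_star, p, type = 1, names = FALSE))
m_then_perc <- quantile(m(T_star), p, type = 1, names = FALSE)

# Normal interval: transform the endpoints vs. build the interval on m's scale
norm_then_m <- m(T_hat + qnorm(p) * sd(T_star))
m_then_norm <- m(T_hat) + qnorm(p) * sd(m(T_star))

rbind(perc_then_m, m_then_perc, norm_then_m, m_then_norm)
```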

44 / 66

Bootstrapping the A-M Solution

The function boot.aldmck() is in the file aldmck_functions.r that we sourced in above.

> library(boot)
> out <- boot.aldmck(franceEES2009, respondent=1,
+     missing=c(77,88,89), polarity=2, R=1500, ci="perc")

> ci.out <- ci.aldmck(out, ci="perc")
> ci.out$ci

                   idealpt       lower       upper
Extreme Left   -0.46668307 -0.47802001 -0.45504191
Communist      -0.32191902 -0.33389232 -0.31082758
Socialist      -0.07562178 -0.08878790 -0.06204369
Greens         -0.01978329 -0.03443452 -0.00518242
UDF (Bayrou)    0.10894542  0.09570104  0.12292225
UMP (Sarkozy)   0.44898305  0.43562253  0.46314344
National Front  0.61350937  0.59859946  0.62635554
Left Party     -0.28743067 -0.30129233 -0.27365262

45 / 66

Plotting the Stimuli with Confidence Bounds

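> plot.baldmck(ci.out, cex=.5, xlab="Ideal Points")

[Figure: dot plot of the estimated stimulus locations with bootstrap confidence bounds (x-axis: Ideal Points, −0.4 to 0.6). Stimuli listed from top to bottom: Extreme Left, Communist, Left Party, Socialist, Greens, UDF (Bayrou), UMP (Sarkozy), National Front.]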

46 / 66

Testing Difference in Stimuli

> test.baldmck(out)

                                 diff  lower  upper sig
Communist-Extreme Left          0.145  0.127  0.161   *
Socialist-Extreme Left          0.391  0.371  0.412   *
Greens-Extreme Left             0.447  0.426  0.468   *
UDF (Bayrou)-Extreme Left       0.576  0.556  0.595   *
UMP (Sarkozy)-Extreme Left      0.916  0.899  0.934   *
National Front-Extreme Left     1.080  1.064  1.094   *
Left Party-Extreme Left         0.179  0.159  0.199   *
Socialist-Communist             0.246  0.228  0.266   *
Greens-Communist                0.302  0.283  0.323   *
UDF (Bayrou)-Communist          0.431  0.413  0.449   *
UMP (Sarkozy)-Communist         0.771  0.753  0.790   *
National Front-Communist        0.935  0.919  0.951   *
Left Party-Communist            0.034  0.014  0.056   *
Greens-Socialist                0.056  0.036  0.074   *
UDF (Bayrou)-Socialist          0.185  0.165  0.203   *
UMP (Sarkozy)-Socialist         0.525  0.503  0.545   *
National Front-Socialist        0.689  0.667  0.707   *
Left Party-Socialist           -0.212 -0.232 -0.192   *
UDF (Bayrou)-Greens             0.129  0.108  0.150   *
UMP (Sarkozy)-Greens            0.469  0.449  0.488   *
National Front-Greens           0.633  0.608  0.657   *
Left Party-Greens              -0.268 -0.290 -0.246   *
UMP (Sarkozy)-UDF (Bayrou)      0.340  0.322  0.360   *
National Front-UDF (Bayrou)     0.505  0.481  0.526   *
Left Party-UDF (Bayrou)        -0.396 -0.417 -0.374   *
National Front-UMP (Sarkozy)    0.165  0.136  0.189   *
Left Party-UMP (Sarkozy)       -0.736 -0.757 -0.717   *
Left Party-National Front      -0.901 -0.918 -0.882   *

47 / 66

Exercise

Using the result you got in the previous exercise,

1. Bootstrap the results so you can get confidence intervals for the stimuli.

2. Plot the stimuli along with their confidence intervals.

3. Test to see which stimuli are different from each other.

48 / 66


Basic Space Scaling

Poole (1998) developed basic space scaling as an extension of A-M scaling with two important innovations.

1. Missing data are permitted. This allows bridging across time/space.

2. The latent space can be multidimensional.

Basic Space scaling finds the set of parameters that map respondents’ issue scale responses into the low-dimensional latent space.

50 / 66

Technical Details

$$X_0 = [\Psi W' + J_n c']_0 + E_0 \tag{5}$$

• $x_{ij}$ is the ith individual’s (i = 1, ..., n) reported position on the jth issue (j = 1, ..., q), organized into $X_0$ (which may contain missing data).

• $\psi_{ik}$ is the ith individual’s position on the kth (k = 1, ..., s) basic dimension, organized into the n × s matrix $\Psi$.

• W and c are parameters that map individuals from the basic space onto the individual issue dimensions.

• $J_n$ is an n-length vector of ones.

• $E_0$ is an n by q matrix of error terms.
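To make the pieces of Equation (5) concrete, the sketch below simulates complete data from the model and checks that a small number of dimensions accounts for most of the variation. It is only an illustration under stated assumptions: all sizes and parameter values are made up, there is no missing data, and the singular value decomposition used here is a stand-in for, not a reproduction of, the estimation procedure in blackbox().

```r
set.seed(1)
n <- 500; q <- 6; s <- 2                   # respondents, issues, basic dimensions

Psi  <- matrix(rnorm(n * s), n, s)         # positions in the basic space (n x s)
W    <- matrix(runif(q * s, -2, 2), q, s)  # issue weights (q x s)
cvec <- runif(q, 3, 5)                     # issue intercepts (length q)
J_n  <- matrix(1, n, 1)                    # n-length vector of ones
E    <- matrix(rnorm(n * q, sd = 0.3), n, q)   # error terms (n x q)

X <- Psi %*% t(W) + J_n %*% t(cvec) + E    # Equation (5) with complete data

# With no missing data, column-centre X and inspect the singular values
Xc <- sweep(X, 2, colMeans(X))
sv <- svd(Xc)
share <- sum(sv$d[1:s]^2) / sum(sv$d^2)    # variance captured by s dimensions
round(share, 3)
```

Six issue scales are generated from two underlying dimensions, so nearly all of the variation in X sits in the first two singular values.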

51 / 66

Implementation in R

Basic Space scaling is implemented in two functions in R:

• blackbox() is used to scale individuals from preference data (e.g., preferred policy outcomes on many issues).

• blackbox_transpose() is used to scale stimuli that are rated by individuals (i.e., based on perceptual data).

52 / 66

2000 Convention Delegate Study

> u <- url("http://www.quantoid.net/files/essex/cds2000.rda")
> load(u)
> close(u)
> head(CDS2000[,5:8])

     Lib-Con Abortion Govt Services Defense Spending
[1,]       3        4             5                4
[2,]       2        4             5                6
[3,]       1        4             7                6
[4,]       2       99             6                5
[5,]       4        4             7                1
[6,]       4        4             7                4

> issues <- as.matrix(CDS2000[,5:14])

53 / 66

Scaling

> result <- blackbox(issues, missing=99, dims=3,
+     minscale = 5, verbose=F)
> result$fits

                 SSE SSE.explained   percent       SE  singular
Dimension 1 31388.39      56868.27 64.435103 1.119187 235.21101
Dimension 2 24890.89      63365.78  7.362055 1.059063  82.83445
Dimension 3 20213.13      68043.54  5.300175 1.022721  70.64624
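The percent column can be reproduced from the two sum-of-squares columns: each value appears to be the additional sum of squares explained by that dimension, as a share of the total. This reading is inferred from the printed numbers rather than from the package documentation; the arithmetic below uses the values copied from the output.

```r
# Values copied from the result$fits output
sse_explained <- c(56868.27, 63365.78, 68043.54)
sse           <- c(31388.39, 24890.89, 20213.13)

total_ss <- sse[1] + sse_explained[1]      # total sum of squares
gain     <- diff(c(0, sse_explained))      # extra SS explained per dimension
percent  <- 100 * gain / total_ss

round(percent, 3)                          # compare with the 'percent' column
cumsum(percent)                            # cumulative percent explained
```

On this reading, the first dimension accounts for roughly 64% of the variation and the second and third add about 7% and 5%.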

54 / 66

One-dimensional Result

> result$stimuli[[1]]

                        N     c     w1    R2
Lib-Con              2804 3.700  4.980 0.736
Abortion             2693 3.239 -2.439 0.467
Govt Services        2805 4.267 -5.579 0.779
Defense Spending     2816 3.525 -4.224 0.507
Aid to Blacks        2789 3.633  4.944 0.587
Health Insurance     2804 3.476  7.010 0.760
Protect Homosexuals  2803 3.452  6.753 0.730
Affirmative Action   2806 3.213  3.797 0.563
Surplus for Tax Cuts 2811 3.284 -4.503 0.605
Free Trade           2805 3.168 -1.315 0.078

55 / 66

Two-dimensional Result


                        N     c     w1     w2    R2
Lib-Con              2804 3.700  4.981  0.552 0.739
Abortion             2693 3.239 -2.441 -0.143 0.467
Govt Services        2805 4.268 -5.581 -0.985 0.787
Defense Spending     2816 3.526 -4.219 -1.564 0.530
Aid to Blacks        2789 3.629  4.947 -4.179 0.731
Health Insurance     2804 3.477  7.004  4.275 0.857
Protect Homosexuals  2803 3.447  6.758 -4.358 0.834
Affirmative Action   2806 3.213  3.800 -1.927 0.614
Surplus for Tax Cuts 2811 3.283 -4.504 -1.509 0.628
Free Trade           2805 3.165 -1.303 -4.230 0.355

56 / 66

Three-dimensional Result


                        N     c     w1     w2     w3    R2
Lib-Con              2804 3.701  4.977 -0.632  0.688 0.744
Abortion             2693 3.236 -2.436  0.264 -1.284 0.508
Govt Services        2805 4.268 -5.588  0.905  0.899 0.793
Defense Spending     2816 3.526 -4.221  1.553  0.679 0.530
Aid to Blacks        2789 3.623  4.949  4.682 -4.825 0.921
Health Insurance     2804 3.475  7.001 -4.287  0.077 0.856
Protect Homosexuals  2803 3.447  6.747  3.954  4.409 0.907
Affirmative Action   2806 3.211  3.806  2.239 -2.628 0.705
Surplus for Tax Cuts 2811 3.281 -4.502  1.753 -2.142 0.675
Free Trade           2805 3.169 -1.303  3.888  3.609 0.499

57 / 66

Plotting the Individuals

> party <- factor(CDS2000[,1], levels=c(1,2),
+     labels=c("Democrat", "Republican"))
> plot.bb(result, issues, by="party", col=c("gray33", "gray66"),
+     pch=c("D", "R"), rug=T, cex=.5, nv=TRUE,
+     xlim=c(-1,1), ylim=c(-1,1))

[Figure: delegates plotted in the first two basic dimensions (x-axis: First Dimension; y-axis: Second Dimension; both from −1.0 to 1.0), with Democrats marked “D” and Republicans marked “R”. Issue labels shown on the plot: Lib-Con, Abortion, Govt Services, Defense Spending, Aid to Blacks, Health Insurance, Protect Homosexuals, Affirmative Action, Surplus for Tax Cuts, Free Trade.]

58 / 66

Distribution of First Dimension Scores

> x <- result$individuals[[2]][,1]
> densityplot(x, groups=party, pch=NA,
+   auto.key=list(space="top"))

Figure: density of x (first dimension scores, roughly −0.5 to 0.5) by party; legend: Democrat, Republican.
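As a numerical companion to the density plot, the sketch below computes group-wise summaries and overlays kernel densities in base R. The scores and party labels are simulated purely for illustration (they are not the lecture's data), and base `density()` is used here in place of lattice's `densityplot()`.

```r
set.seed(42)

# Simulated first-dimension scores for two hypothetical groups
# (illustrative values only, not the survey data used in the lecture)
x <- c(rnorm(300, mean = -0.3, sd = 0.2),
       rnorm(200, mean =  0.3, sd = 0.2))
party <- factor(rep(c("Democrat", "Republican"), times = c(300, 200)))

# Group-wise means and standard deviations of the scores
group_means <- tapply(x, party, mean)
group_sds   <- tapply(x, party, sd)

# One kernel density estimate per group
dens <- lapply(split(x, party), density)

# Draw both densities on a common set of axes
y_max <- max(sapply(dens, function(d) max(d$y)))
plot(dens[[1]], xlim = range(x), ylim = c(0, y_max),
     main = "", xlab = "x")
lines(dens[[2]], lty = 2)
legend("top", legend = names(dens), lty = 1:2, horiz = TRUE, bty = "n")
```

Comparing `group_means` and `group_sds` alongside the plot gives a quick sense of how far apart the two distributions sit and how much they overlap.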

59 / 66

Exercise

Use the blackbox() function to analyze the 2011 Canadian Election Study dataset (CES2011.Rda) using the Basic Space scaling procedure. CES2011 is a list that stores respondents’ preferred party (CES2011$party), attitudes on 11 policy issues (CES2011$issues), placements of five national parties on an 11-point left-right scale (CES2011$lrplacements), and propensity to vote for each of the five national parties on an 11-point scale (CES2011$propensity). [1] Missing values are coded as 999.

1. Run blackbox() on CES2011$issues and report the fit statistics for the one- and two-dimensional results.

2. In the two-dimensional estimated configuration, which issues are most strongly associated with the first and second dimensions? Based on this, what is your interpretation of the substantive meaning of each dimension?

3. Plot smoothed histograms of respondents’ first and second dimension coordinates for those who feel closest to the Conservative Party, the Liberal Party, and the NDP. What is the left-right ordering of the parties on each dimension? Is there greater overlap between the party coalitions on the first or second dimension?

[1] Question wordings for the issue questions are available on the book website.
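Before scaling, it can help to check how much of the data carries the missing-value code. The following is a minimal base-R sketch under stated assumptions: the `issues` matrix is a small hypothetical stand-in (not CES2011$issues), with 999 used as the missing code as in the exercise.

```r
# Hypothetical responses: 5 respondents x 4 issues, with 999 = missing
issues <- matrix(c(  1,   4, 999,   7,
                     2, 999, 999, 999,
                     5,   5,   6, 999,
                     3,   2,   1,   4,
                   999, 999, 999, 999),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(NULL, paste0("issue", 1:4)))

# Recode the missing-value code to NA on a copy of the data
issues_na <- issues
issues_na[issues_na == 999] <- NA

# Share of missing answers per issue
prop_missing <- colMeans(is.na(issues_na))

# Number of valid answers per respondent
n_valid <- rowSums(!is.na(issues_na))

prop_missing
n_valid
```

Respondents with very few valid answers contribute little information to the scaling, so inspecting `n_valid` is a reasonable first step before choosing a minimum number of responses.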

60 / 66


Blackbox Transpose

As we said above, blackbox_transpose() is meant to deal with individual rankings or perceptions of political stimuli.

> u1 <- url("http://www.quantoid.net/files/essex/mexicocses2000.rda")
> u2 <- url("http://www.quantoid.net/files/essex/mexicocses2006.rda")
> load(u1)
> load(u2)
> close(u1)
> close(u2)

> result_2000 <- blackbox_transpose(mexicoCSES2000, missing=99,
+   dims=3, minscale=5, verbose=TRUE)
> result_2006 <- blackbox_transpose(mexicoCSES2006, missing=99,
+   dims=3, minscale=5, verbose=TRUE)

62 / 66

Transforming Data

The data are meaningful in the sense that they give approximate relative distances between the stimuli in a low-dimensional space.

• To make them substantively more interesting as well, it might be useful to rotate the stimuli as a group in space.

• Usually, this amounts to a reflection (multiplying by −1) so that left parties appear physically on the left.

> x2000 <- -1 * result_2000$stimuli[[2]][,2]
> y2000 <- result_2000$stimuli[[2]][,3]
> x2006 <- -1 * result_2006$stimuli[[2]][,2]
> y2006 <- result_2006$stimuli[[2]][,3]
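The reflection is harmless because only the relative distances between stimuli are meaningful. The sketch below illustrates this with hypothetical coordinates (the stimulus names and values are invented for the example): flipping the sign of one dimension leaves every pairwise distance unchanged.

```r
# Hypothetical 2-D coordinates for four stimuli (illustrative values only)
coords <- cbind(dim1 = c(0.6, 0.1, -0.4, -0.3),
                dim2 = c(0.2, -0.5, 0.1, 0.2))
rownames(coords) <- c("A", "B", "C", "D")

# Reflect the first dimension by multiplying it by -1
reflected <- coords
reflected[, "dim1"] <- -1 * coords[, "dim1"]

# Pairwise Euclidean distances before and after the reflection
d_before <- dist(coords)
d_after  <- dist(reflected)

# The two sets of distances agree
same_distances <- isTRUE(all.equal(as.vector(d_before), as.vector(d_after)))
same_distances
```

The same holds for any rotation of the whole configuration, which is why the orientation can be chosen for interpretability.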

63 / 66

Plotting Stimuli

> plot(x2000, y2000, main="2000 Stimuli Locations",
+   xlab=paste("First Dimension: ",
+     round(result_2000$fits[1,3],2), "%", sep=""),
+   ylab=paste("Second Dimension: ",
+     round(result_2000$fits[2,3],2), "%", sep=""),
+   xlim=c(-1,1), ylim=c(-1,1), asp=1, type="n")
> points(x2000, y2000, pch=16, font=2, col="black")
> text(x2000, y2000, colnames(mexicoCSES2000), pos=c(4,4,4,4,4,2),
+   offset=0.40)

> plot(x2006, y2006, main="2006 Stimuli Locations",
+   xlab=paste("First Dimension: ",
+     round(result_2006$fits[1,3],2), "%", sep=""),
+   ylab=paste("Second Dimension: ",
+     round(result_2006$fits[2,3],2), "%", sep=""),
+   xlim=c(-1,1), ylim=c(-1,1), asp=1, type="n")
> points(x2006, y2006, pch=16, font=2)
> text(x2006, y2006, colnames(mexicoCSES2006), pos=c(4,4,1,4,4,4,2,3),
+   offset=0.40)

64 / 66

Figure: Latent Space. (a) 2000 Stimuli Locations (First Dimension: 56.7%, Second Dimension: 21.85%): PAN, PRI, PRD, PT, Greens, PARM. (b) 2006 Stimuli Locations (First Dimension: 61.56%, Second Dimension: 13.67%): PAN, PRD, PRI, Greens, PT, Convergencia, Nueva Alianza, PSD.

65 / 66

Exercise

Continuing with the CES2011 data, use the blackbox_transpose() function to analyze Canadian citizens’ placements of the Conservative Party, the Liberal Party, the NDP, the Bloc Quebecois and the Green Party on the left-right ideological scale in two dimensions. Recall that missing values are coded as 999.

1. If necessary, reverse the polarity of the party coordinates so that the Conservative Party has a positive score on the first dimension. Does the left-right order of the three largest parties correspond to the ordering estimated in Exercise 3?

2. Report each party’s R² value for each dimension. Which parties have the best and worst fits in one dimension?

3. Use the bootstrapping approach to estimate standard errors for the party coordinates in two dimensions and plot the parties using cross-hairs for the 95% confidence intervals (1.96 × standard error). Which parties have the highest standard errors?
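The cross-hair plot in question 3 can be sketched in base R as follows. This is only an illustration of the arithmetic and the plotting: the party names and point estimates are hypothetical, and the "bootstrap replicates" are simulated by adding random noise to those estimates. In a real analysis each replicate would come from re-running the scaling on a resampled set of respondents.

```r
set.seed(1)

# Hypothetical point estimates for three stimuli in two dimensions
est <- cbind(dim1 = c(-0.5, 0.0, 0.6),
             dim2 = c( 0.2, -0.3, 0.1))
rownames(est) <- c("Party A", "Party B", "Party C")

# Stand-in for bootstrap replicates: a B x stimuli x dimensions array.
# Here each slice is the estimate plus random noise (an assumption made
# purely so the example is self-contained).
B <- 200
boot <- array(NA_real_, dim = c(B, nrow(est), ncol(est)))
for (b in seq_len(B)) {
  boot[b, , ] <- est + rnorm(length(est), sd = 0.05)
}

# Bootstrap standard error: standard deviation across replicates
se <- apply(boot, c(2, 3), sd)
dimnames(se) <- dimnames(est)

# 95% intervals: estimate +/- 1.96 * standard error
lower <- est - 1.96 * se
upper <- est + 1.96 * se

# Plot points with horizontal and vertical cross-hairs
plot(est, type = "n", xlim = c(-1, 1), ylim = c(-1, 1), asp = 1,
     xlab = "First Dimension", ylab = "Second Dimension")
segments(lower[, 1], est[, 2], upper[, 1], est[, 2])  # horizontal bars
segments(est[, 1], lower[, 2], est[, 1], upper[, 2])  # vertical bars
points(est, pch = 16)
text(est, labels = rownames(est), pos = 3)
```

Because reflections can differ across bootstrap replicates, real replicates may need to be aligned to a common orientation before their standard deviations are taken.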

66 / 66
