Political Science meets Computer Science: Opportunities and Challenges

GESIS, 14 October 2015

Dr. Andreas Warntjen, University of Twente

Main points

• Social science demand for data and Computer Science tools

• Limited existing infrastructure in key areas of Political Science (e.g., legislative decision-making)

– Data (from online sources)

– Tools (webscraping, text parsing)

– Training

• Other areas are better developed (e.g., survey research, statistical modelling)

• Opportunity of mainstreaming Computer Science in the Social Sciences

Structure of the Presentation

1. Introduction

– My background

– Political Science and Computer Science

2. Studying Political Representation and Decision-making using Tools from Computer Science

3. (Practical) Challenges

4. Conclusion: Future Opportunities

My background

• M.A. (Political Science, Sociology, Law), Konstanz

• PhD (Political Science), LSE

• Assistant Professor of Political Science, Twente

• Research/teaching: European Union politics, comparative institutional analysis, international relations, social science research methods

• Programming experience: webscraping and text parsing (Perl, Python)

Social Science and Computer Science

Social science task: Computer science solution

• Data collection: webscraping

• Data manipulation and coding: text parsing

• Modelling: theoretical (agent-based models); quantitative (numeric optimization, MCMC, machine learning, quantitative text analysis)

• Archiving: relational databases, customizable data sets

• Dissemination of results: data visualization, interactive data exploration

Challenge: “Mainstreaming” Computer Science

Social science questions and existing data infrastructure

• Interests and institutions: public opinion (e.g., survey data), political composition of political bodies

– Established data infrastructure (e.g., survey data at GESIS, CMP, PDYB)

• Decision-making: legislative activity (speeches, requests, proposals)

– Data available, but not processed, standardized, documented, or routinely archived

• Policy outputs: results of policy changes (e.g., unemployment figure)

– Established data infrastructure (e.g., national statistics offices, Eurostat, OECD)

Structure of the Presentation

1. Introduction

2. Studying Political Representation and Decision-making using Tools from Computer Science

– Legislative activity in the EU and the Council Presidency

– Creating a comprehensive data set on EU legislative activity

– Voting at the United Nations

– Use of social media (Twitter) by MEPs

3. (Practical) Challenges

4. Conclusion: Future Opportunities

The European Multilevel System

[Diagram of the two-level system]

• Level 1 (member states): public opinion, parties and interest groups, linked to the governments (MS 1, MS 2) by arrows labelled “influence” and “form”

• Level 2 (Europe): Council (rotating Presidency, preference heterogeneity), Parliament, Commission

• Input: preferences, priorities

• Output: regulations (legislative activity)

Governmental Priorities and Legislative Activity in the EU

• Question: is legislative activity (attention to proposals) influenced by the party political priorities of the government holding the Council Presidency?

• Data (IV): party political priorities and government composition (CMP, party political composition of governments) (established data infrastructure, standard measurements)

• Data (DV): attention to topic in the EU Council (available via online database, but not part of established data infrastructure, no standard measurements)

Measuring legislative activity in the Council

• Ratio addressed/pending legislative proposals

• Webscraping PRELEX (official database, now defunct):

– All legislative proposals and other documents, 1965-2003

– Number of entries: 12,710

– Not in database format, text parsing to create variables

• Data processing:

– Filtering out documents other than proposals

– Proposals introduced (policy field, date)

– Proposals addressed (policy field, date)

– Control variables (procedure, voting threshold, etc.)
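A minimal Python sketch of the ratio addressed/pending proposals per Presidency term. The records, the field names and the definition of “pending” (introduced by the end of the term and not addressed before its start, with a single “addressed” date per proposal) are assumptions made for illustration; they are not taken from the PRELEX data or from Warntjen (2007).

```python
from datetime import date

# Hypothetical records: (policy_field, date_introduced, date_addressed or None).
# Made-up values for illustration only, not PRELEX data.
PROPOSALS = [
    ("environment", date(1984, 2, 10), date(1984, 5, 3)),
    ("environment", date(1983, 11, 1), date(1984, 9, 20)),
    ("environment", date(1984, 3, 15), None),
    ("transport", date(1984, 1, 20), date(1984, 4, 2)),
]


def presidency_window(year, half):
    """Return (start, end) dates of a six-month Council Presidency term."""
    if half == 1:
        return date(year, 1, 1), date(year, 6, 30)
    return date(year, 7, 1), date(year, 12, 31)


def addressed_pending_ratio(proposals, field, year, half):
    """Percentage of pending proposals in `field` addressed during the term.

    Assumption: a proposal is pending in a term if it was introduced on or
    before the term's end and had not been addressed before the term's start.
    Returns None when nothing is pending.
    """
    start, end = presidency_window(year, half)
    pending = addressed = 0
    for prop_field, introduced, done in proposals:
        if prop_field != field or introduced > end:
            continue  # other policy field, or not yet introduced
        if done is not None and done < start:
            continue  # already dealt with in an earlier term
        pending += 1
        if done is not None and done <= end:
            addressed += 1
    return 100.0 * addressed / pending if pending else None


ratio_1984_2 = addressed_pending_ratio(PROPOSALS, "environment", 1984, 2)
```

With a different definition of “pending” (for example, excluding proposals introduced during the term) only the two `continue` conditions would change.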

The Council Presidency and Legislative Activity

Figure 1: Salience of the Council Presidency and Legislative Activity, Environmental Policy 1984-2001

[Chart. Horizontal axis: Council Presidency (year, half of the year), from 1984/1 (France) to 2001/1 (Sweden). Vertical axes: ratio addressed/pending proposals (%) and salience. Series: ratio addressed/pending proposals (%); salience, Council Presidency.]

Source: Warntjen (2007)

Council Presidency and Legislative Activity

• Statistical analysis (Warntjen 2007):

– Poisson regression

– Control variables: size of member states, number of pending proposals, type of legislation, voting threshold, position (leader in environmental policy)

• Case study (Warntjen 2013b):

– Social policy

– Mechanisms: compromise proposals, additional meetings, reduction of complexity
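For readers unfamiliar with Poisson regression, a minimal pure-Python sketch with one predictor, fitted by Newton-Raphson. It only illustrates the model type: the data are toy values, the single predictor stands in for a salience score, and the control variables listed above are left out. It is not the specification used in Warntjen (2007).

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson and return (b0, b1).

    Sketch only: one predictor, fixed number of iterations, no step control.
    """
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only solution
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix [[h00, h01], [h01, h11]]
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy data (hypothetical): a count that doubles with each unit of the predictor.
salience = [0, 1, 2, 3]
addressed = [1, 2, 4, 8]
intercept, slope = fit_poisson(salience, addressed)
# exp(slope) is the multiplicative change in the expected count per unit of x.
```

In practice a statistics package would be used, since it also reports standard errors and handles several predictors.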

Responsiveness of the EU

[Diagram of the two-level system, extended with observable activities]

• Level 1 (member states): public opinion, parties and interest groups, linked to the governments (MS 1, MS 2) by arrows labelled “influence” and “form”

• Level 2 (Europe):

– Council: # meetings, member states’ requests, voting

– Parliament: speeches, parliamentary questions, amendments

– Commission: drafting proposals, content of proposals

• Input: preferences, priorities

• Output: regulations (legislative activity)

Creating a comprehensive data set for EU legislative activity

Input factor: Source

• Interest group statements: Commission online consultation

• Governmental priorities: speeches and official statements (agenda project)

• Party political preferences and public opinion: existing data structure (Eurobarometer, CMP, ParlGov)

Output: Source

• Proposal (type, field, # recitals): EURLEX

• Legislative process (duration, involvement of actors): EURLEX

• # Council working group meetings, member state requests, voting in the Council: public registry of the Council of the European Union

• Speeches and parliamentary questions, amendments, voting: EP Legislative Observatory, EP website (verbatim transcripts)

Extracting EURLEX (Warntjen and Smit 2015)

• Webscraping in July 2015

• Extraction: scrapy

• Text parsing, validation and data management: Perl regex

• 2,877 Variables; 37,272 Observations
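A minimal sketch of the text parsing and validation step, written in Python with the `re` module rather than the Perl regular expressions named above. The record text, the field labels and the identifier pattern are hypothetical and do not reproduce the actual EURLEX page layout.

```python
import re
from datetime import datetime

# Hypothetical raw text of one scraped record; labels and values are made up.
RAW = """Identifier: COM(2015) 0123
Date of document: 14/07/2015
Procedure: ordinary legislative procedure
Recitals: 27"""

# One "label: value" pair per line.
FIELD = re.compile(r"^(?P<name>[^:\n]+):[ \t]*(?P<value>.*)$", re.MULTILINE)
# Illustrative identifier pattern, an assumption for this sketch.
IDENT = re.compile(r"^COM\((\d{4})\)\s*(\d+)$")


def parse_record(text):
    """Turn 'label: value' lines into a dict with normalised variable names."""
    return {
        m.group("name").strip().lower().replace(" ", "_"): m.group("value").strip()
        for m in FIELD.finditer(text)
    }


def validate(record):
    """Return the names of fields that fail a check; empty list means OK."""
    problems = []
    if not IDENT.match(record.get("identifier", "")):
        problems.append("identifier")
    try:
        datetime.strptime(record.get("date_of_document", ""), "%d/%m/%Y")
    except ValueError:
        problems.append("date_of_document")
    if not record.get("recitals", "").isdigit():
        problems.append("recitals")
    return problems


record = parse_record(RAW)
issues = validate(record)
```

Records with a non-empty `issues` list would be set aside for manual inspection instead of entering the data set.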

Political Science Questions: Legislative Decision-making in the Council

• Member states frequently make requests to change legislative proposals

– How frequent are these requests (Euroscepticism, party politics, importance of policy field)?

– What kind of coalitions form (party politics, economics)?

• Sometimes decisions are made by votes

– How frequent are votes?

– Who votes how and why?

Figure 1: Number of requests by country and membership. Number of Member State requests (subset) in the field of environmental policy, 2004-2009. Source: Warntjen (2015)

• EU15: UK 76, DE 47, FR 46, NL 41, ES 40, EL 39, PT 35, IT 30, FI 26, AT 24, DK 23, SE 18, BE 12, IE 11, LU 6 (average 31.6)

• New member states: PL 65, CZ 32, LV 31, LT 28, SK 26, SI 25, MT 24, EE 22, HU 21, RO 19, BG 14, CY 9 (average 26.3)

Computer science solutions: Legislative Decision-making in the Council

• Extracting annotated proposals from public registry of the Council

• Text parsing to isolate member state requests (country code, footnotes)

• Data management: linking data to information from other data sources via proposal identifier
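The parsing and linking steps above can be sketched in Python as follows. The footnote texts, the proposal identifiers and the metadata are hypothetical; only the two-letter country codes are taken from the figure on member state requests.

```python
import re
from collections import Counter

# Two-letter codes as they appear in the figure on member state requests.
MEMBER_STATES = {
    "UK", "DE", "FR", "NL", "ES", "EL", "PT", "IT", "FI", "AT", "DK", "SE",
    "BE", "IE", "LU", "PL", "CZ", "LV", "LT", "SK", "SI", "MT", "EE", "HU",
    "RO", "BG", "CY",
}
CODE = re.compile(r"\b[A-Z]{2}\b")

# Hypothetical footnotes, keyed by a made-up proposal identifier.
FOOTNOTES = {
    "P-001": ["DE, FR and NL: delete this paragraph.", "UK: scrutiny reservation."],
    "P-002": ["PL and CZ suggested a longer transition period. DE supported this."],
}
# Hypothetical information from another data source, same identifiers.
METADATA = {"P-001": {"field": "environment"}, "P-002": {"field": "environment"}}


def countries_in(note):
    """Set of member state codes mentioned in one footnote."""
    return {code for code in CODE.findall(note) if code in MEMBER_STATES}


def requests_by_country(footnotes):
    """Count footnotes per country (each country at most once per footnote)."""
    counts = Counter()
    for notes in footnotes.values():
        for note in notes:
            counts.update(countries_in(note))
    return counts


def link(footnotes, metadata):
    """Attach metadata from the other source via the proposal identifier."""
    rows = []
    for proposal_id, notes in footnotes.items():
        countries = sorted(set().union(*(countries_in(n) for n in notes)))
        rows.append({"proposal": proposal_id, "countries": countries,
                     **metadata.get(proposal_id, {})})
    return rows


counts = requests_by_country(FOOTNOTES)
linked = link(FOOTNOTES, METADATA)
```

A code-based match can produce false positives when a footnote contains other two-letter upper-case tokens (for example “IT” as an abbreviation), so a manual check of a sample remains advisable.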

Political Science Questions: Voting Cohesion of the EU at the UN

One voice…

• Coordination of policies

• Statements on behalf of the Union

• High voting cohesion

… unless it really matters? Division on highly salient topics, e.g.,

• Libya (Germany abstains)

• Iraq war (France and Germany against, Great Britain in favour)

Computer Science Solutions: Voting Cohesion of the EU at the UN

• Extracting voting information from the official UN database (1946-2014): 4,497 observations (Warntjen 2015a)

• Extracting information about passed resolutions from the UN website (1946-2014): 17,359 observations (Warntjen 2015b)

• Extraction: scrapy, BeautifulSoup

• Text parsing, validation and data management: Perl regex
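Once the votes are extracted, voting cohesion can be summarised in several ways. The Python sketch below shows two simple measures on made-up roll calls; both the measures and the data layout are assumptions for illustration, not the indices used in Warntjen (2015a, 2015b).

```python
from collections import Counter

# Hypothetical roll calls: resolution id -> {country: vote}.
# Votes are "Y" (yes), "N" (no) or "A" (abstain). Made-up data.
VOTES = {
    "R1": {"DE": "Y", "FR": "Y", "UK": "Y"},
    "R2": {"DE": "A", "FR": "Y", "UK": "Y"},
    "R3": {"DE": "N", "FR": "N", "UK": "Y"},
}
EU = ["DE", "FR", "UK"]


def unanimous_share(votes, members):
    """Share of roll calls on which all listed members cast the same vote."""
    unanimous = 0
    for ballot in votes.values():
        cast = {ballot[m] for m in members if m in ballot}
        if len(cast) == 1:
            unanimous += 1
    return unanimous / len(votes)


def majority_agreement(votes, members):
    """Average share of listed members voting with the largest bloc.

    Roll calls in which none of the listed members voted are skipped.
    """
    shares = []
    for ballot in votes.values():
        cast = [ballot[m] for m in members if m in ballot]
        if not cast:
            continue
        largest_bloc = Counter(cast).most_common(1)[0][1]
        shares.append(largest_bloc / len(cast))
    return sum(shares) / len(shares)


share = unanimous_share(VOTES, EU)
agreement = majority_agreement(VOTES, EU)
```

The first measure treats an abstention as a break in unity, which matters for cases such as the Libya vote; whether that is appropriate is a substantive decision.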

Political Science Questions: Social Media Usage of MEPs

• Does social media usage of MEPs depend on

– party political ideology (participatory democracy, Euroscepticism),

– internet usage of core constituency, or

– electoral system?

• How responsive are MEPs to topics raised by followers, public opinion and the media?

• Does Twitter create a common European public sphere or are there national differences?

Opportunities (data)

• Social science demand for data that is available online but not processed for analysis

• Building standard data sets for key questions and variables (e.g., legislative activity, decision-making, topics trending on social media)

• Interaction of Social Science (domain knowledge, variable selection) and Computer Science (extraction, processing, dissemination)

Structure of the Presentation

1. Introduction

2. Studying Political Representation and Decision-making using Tools from Computer Science

3. (Practical) Challenges

– Empiricist challenge

– Practical issues

4. Conclusion

Big data: the end of physics envy?

“And while in the physical sciences the investigator will be able to measure what, on the basis of a prima facie theory, he thinks important, in the social sciences often that is treated as important which happens to be accessible to measurement.”

(Friedrich A. von Hayek, The Pretence of Knowledge, Nobel Prize Lecture, 1974)

http://www.nobelprize.org/nobel_prizes/economic-sciences/laureates/1974/hayek-lecture.html

“The Empiricist Challenge”

“While the potentials of the use of digital trace data have been a continuous focus in public debate, scientific contributions using such trace data in political science usually come in the form of research-manifestos or isolated proofs-of-concept, only marginally contributing to current debates in the social sciences.”

(Yannis Theocharis, Andreas Jungherr, conference call, MZES/Mannheim University, October 2015)

Mainstreaming Computer Science in the Social Sciences

• Social sciences are primarily theory- and question-driven

• Division of labour: limited computer science knowledge of social scientists

• Challenge for Computer Science

– Building useful tools for recurrent social science tasks

– Creating relevant data sets (or data set generators)

– Moderate learning curve for computer science tools if they are to be widely used

Practical Challenges

Task: Challenge

• Extracting information from websites (slightly changing formats; tabular form, but no field names): robust HTML parser, data cleaning (manual input)

• Extracting information from archives (scanned documents): conversion of pdf documents to text, reliable OCR

• Dissemination: customizable data set, data exploration and visualization, but: pre-selection of variables

HTML code:

• Separate row for new variable (decision mode)

• HTML parser (Beautiful Soup) does not separate elements in a reliable manner

• Extracted using regular expressions
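A minimal Python sketch of extracting table rows with regular expressions, as a fallback when a parser does not split the rows reliably. The HTML fragment, its class names and its values are hypothetical; the real pages are not reproduced here.

```python
import re
from html import unescape

# Hypothetical fragment: the second row introduces a new variable
# ("Decision mode") and is broken across lines. Values are made up.
HTML = """
<table>
<tr><td class="label">Council session</td><td>2988</td></tr>
<tr><td class="label">Decision&nbsp;mode</td>
    <td>Qualified majority</td></tr>
</table>
"""

ROW = re.compile(r"<tr[^>]*>(.*?)</tr>", re.DOTALL | re.IGNORECASE)
CELL = re.compile(r"<t[dh][^>]*>(.*?)</t[dh]>", re.DOTALL | re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def clean(cell):
    """Strip nested tags, decode entities and normalise whitespace."""
    text = unescape(TAG.sub("", cell)).replace("\xa0", " ")
    return " ".join(text.split())


def rows_to_dict(html):
    """Map the first cell of each row (variable name) to the second (value)."""
    record = {}
    for row in ROW.findall(html):
        cells = [clean(c) for c in CELL.findall(row)]
        if len(cells) >= 2:
            record[cells[0]] = cells[1]
    return record


record = rows_to_dict(HTML)
```

Regular expressions of this kind break on nested tables, so they suit pages with a flat, regular row structure.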

Practical Challenges

• UN voting

– Online database only includes passed resolutions

– Remaining information needs to be extracted from meeting records (pdf documents)

• Decision-making in the Council

– Voting information only available as pdf-file with icons

– Annotated proposals only available as pdf-file

• Practical challenge: reliable conversion of large quantities of pdf-files (some of them quite old)
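Part of the conversion problem is cleaning the text once it has been extracted. A minimal Python sketch of one such clean-up step, rejoining words and lines that were broken at line ends; the heuristics are assumptions for illustration and do not describe the processing actually applied to the Council or UN documents.

```python
import re


def rejoin_lines(text):
    """Undo common line-break damage in text converted from pdf.

    Heuristics (assumptions): a hyphen at a line end followed by a
    lower-case letter is treated as a word break, and single line breaks
    are treated as soft wraps. Blank lines (paragraph breaks) are kept.
    """
    # "unemploy-\nment" -> "unemployment"
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*([a-z])", r"\1\2", text)
    # Single line breaks inside a paragraph become spaces
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs
    return re.sub(r"[ \t]+", " ", text).strip()


sample = "Data available, but not standard-\nized, documented, or\nroutinely archived\n\nNext paragraph"
cleaned = rejoin_lines(sample)
```

The hyphen rule also removes genuine hyphens that happen to fall at a line end (for example “decision-” followed by “making”), so the output still needs spot checks.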

Annotated proposals: before conversion

After conversion (pdf miner)

Practical Challenges and Opportunities

• Previous examples: not insurmountable problems, but limited accessibility of computer science tools for the average social scientist

• Opportunities (tools):

– Developing/disseminating easy-to-use tools for routine social science tasks

– Reference resources (e.g. websites) and specialised support systems (e.g. forums) for social scientists

– Training tailored for social science needs (content, format)

Structure of the Presentation

1. Introduction

2. Opportunities: Studying Political Representation and Decision-making using Tools from Computer Science

3. (Practical) Challenges

4. Conclusion: Future Opportunities

Future Opportunities

• Computer science can create new opportunities to address established and new social science questions (new data, new methods, data visualization and exploration, dissemination of data)

• New data sources: longitudinal and more fine-grained data

• “Empiricist challenge”: Big data not necessarily useful/interesting data

• More dialogue between computer scientists (producers) and social scientists (consumers) needed
