Political Science meets Computer Science:
Opportunities and Challenges
GESIS, 14 October 2015
Dr. Andreas Warntjen, University of Twente
Main points
• Social science demand for data and Computer Science tools
• Limited existing infrastructure in key areas of Political Science (e.g. legislative decision-making)
– Data (from online sources)
– Tools (webscraping, text parsing)
– Training
• Other areas are better developed (e.g., survey research, statistical modelling)
• Opportunity of mainstreaming Computer Science in the Social Sciences
Structure of the Presentation
1. Introduction
– My background
– Political Science and Computer Science
2. Studying Political Representation and Decision-making using Tools from Computer Science
3. (Practical) Challenges
4. Conclusion: Future Opportunities
My background
• M.A. (Political Science, Sociology, Law), Konstanz
• PhD (Political Science), LSE
• Assistant Professor of Political Science, Twente
• Research/teaching: European Union politics, comparative institutional analysis, international relations, social science research methods
• Programming experience: webscraping and text parsing (Perl, Python)
Social Science and Computer Science
Social science task | Computer science solution
Data collection | Webscraping
Data manipulation and coding | Text parsing
Modelling | Theoretical: agent-based models; quantitative: numeric optimization, MCMC, machine learning, quantitative text analysis
Archiving | Relational databases; customizable data sets
Dissemination of results | Data visualization, interactive data exploration
Challenge: “Mainstreaming” Computer Science
Social science questions and existing data infrastructure
• Interests and institutions
– Data: public opinion (e.g., survey data); political composition of political bodies
– Infrastructure: established (e.g., survey data at GESIS, CMP, PDYB)
• Decision-making
– Data: legislative activity (speeches, requests, proposals)
– Infrastructure: data available, but not processed, standardized, documented, or routinely archived
• Policy outputs
– Data: results of policy changes (e.g., unemployment figures)
– Infrastructure: established (e.g., national statistics offices, Eurostat, OECD)
Structure of the Presentation
1. Introduction
2. Studying Political Representation and Decision-making using Tools from Computer Science
– Legislative activity in the EU and the Council Presidency
– Creating a comprehensive data set on EU legislative activity
– Voting at the United Nations
– Use of social media (Twitter) by MEPs
3. (Practical) Challenges
4. Conclusion: Future Opportunities
The European Multilevel System
Level 1: Member states
– Public opinion, parties and interest groups (influence, form) → governments of the member states (MS 1, MS 2)
Level 2: Europe
– Council (rotating Presidency; preference heterogeneity), Parliament, Commission
Input: preferences, priorities → legislative activity → Output: regulations
Governmental Priorities and Legislative Activity in the EU
• Question: is legislative activity (attention to proposals) influenced by the party political priorities of the government holding the Council Presidency?
• Data (independent variable): party political priorities and government composition (CMP, party political composition of governments); established data infrastructure, standard measurements
• Data (dependent variable): attention to the topic in the EU Council; available via an online database, but not part of the established data infrastructure and without standard measurements
Measuring legislative activity in the Council
• Ratio addressed/pending legislative proposals
• Webscraping PRELEX (official database, now defunct):
– All legislative proposals and other documents, 1965-2003
– Number of entries: 12,710
– Not in database format; text parsing to create variables
• Data processing:
– Filtering out documents other than proposals
– Proposals introduced (policy field, date)
– Proposals addressed (policy field, date)
– Control variables (procedure, voting threshold, etc.)
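The dependent variable, the ratio of addressed to pending proposals, can be sketched in Python once proposals have been parsed into records. This is a minimal illustration, not the procedure used in Warntjen (2007): the `Proposal` record, the `addressed_pending_ratio` helper and the rule that a proposal counts as pending during a Presidency term if it was introduced before the term ended and not yet addressed when it began are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Proposal:
    policy_field: str
    introduced: date
    addressed: Optional[date]  # None if the proposal was never addressed


def addressed_pending_ratio(proposals: List[Proposal], field: str,
                            start: date, end: date) -> Optional[float]:
    """Share (%) of pending proposals in `field` addressed during [start, end).

    Assumption: a proposal is pending during the term if it was introduced
    before the term ended and had not been addressed before the term began.
    """
    pending = [
        p for p in proposals
        if p.policy_field == field
        and p.introduced < end
        and (p.addressed is None or p.addressed >= start)
    ]
    if not pending:
        return None  # ratio undefined without pending proposals
    addressed = [p for p in pending
                 if p.addressed is not None and start <= p.addressed < end]
    return 100.0 * len(addressed) / len(pending)


# Hypothetical records; term = first half of 1990 (Irish Presidency)
proposals = [
    Proposal("environment", date(1989, 3, 1), date(1990, 2, 15)),
    Proposal("environment", date(1989, 9, 1), None),
    Proposal("environment", date(1990, 3, 1), date(1991, 1, 10)),
    Proposal("environment", date(1988, 1, 1), date(1989, 6, 1)),
    Proposal("agriculture", date(1989, 1, 1), date(1990, 3, 1)),
]
ratio = addressed_pending_ratio(proposals, "environment",
                                date(1990, 1, 1), date(1990, 7, 1))
print(f"{ratio:.1f}")  # 33.3
```

Computing one value per six-month Presidency term and policy field yields the kind of series used in the analysis; returning `None` rather than 0 keeps terms without pending proposals from being mistaken for inactivity.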
The Council Presidency and Legislative Activity
Figure 1: Salience of the Council Presidency and Legislative Activity, Environmental Policy 1984-2001 [chart: ratio of addressed to pending proposals (%) and salience of the Council Presidency, by Presidency (year, half of the year), from France 1984/1 to Sweden 2001/1]
Source: Warntjen (2007)
Council Presidency and Legislative Activity
• Statistical analysis (Warntjen 2007):
– Poisson regression
– Control variables: size of member states, number of pending proposals, type of legislation, voting threshold, position (leader in environmental policy)
• Case study (Warntjen 2013b):
– Social policy
– Mechanisms: compromise proposals, additional meetings, reduction of complexity
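A Poisson regression of this kind is typically fitted by Newton-Raphson on the log-likelihood. The sketch below, in plain Python, uses a single covariate (for instance a hypothetical salience score of the Presidency) and omits the control variables listed above, so it illustrates the estimator rather than the specification in Warntjen (2007). In practice one would use a statistics library, e.g. statsmodels' `GLM` with a Poisson family.

```python
import math


def poisson_fit(x, y, max_iter=50, tol=1e-10):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


# Two groups (x = 0, x = 1) with mean counts 3 and 8
b0, b1 = poisson_fit([0, 0, 1, 1], [2, 4, 6, 10])
print(round(math.exp(b0), 4), round(math.exp(b0 + b1), 4))  # 3.0 8.0
```

With the canonical log link, Newton-Raphson coincides with iteratively reweighted least squares, and exp(b1) reads as the multiplicative change in the expected count per unit of the covariate.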
Responsiveness of the EU
Level 1: Member states
– Public opinion, parties and interest groups (influence, form) → governments of the member states (MS 1, MS 2)
Level 2: Europe
– Council: # meetings, member states’ requests, voting
– Parliament: speeches, parliamentary questions, amendments
– Commission: drafting proposals, content of proposals
Input: preferences, priorities → legislative activity → Output: regulations
Creating a comprehensive data set for EU legislative activity
Input factors | Source
Interest group statements | Commission online consultation
Governmental priorities | Speeches and official statements (agenda project)
Party political preferences and public opinion | Existing data infrastructure (Eurobarometer, CMP, ParlGov)

Output | Source
Proposal (type, field, number of recitals) | EURLEX
Legislative process (duration, involvement of actors) | EURLEX
Number of Council working group meetings; member state requests; voting in the Council | Public registry of the Council of the European Union
Speeches and parliamentary questions; amendments; voting | EP Legislative Observatory; EP website (verbatim transcripts)
Extracting EURLEX (Warntjen and Smit 2015)
• Webscraping in July 2015
• Extraction: scrapy
• Text parsing, validation and data management: Perl regex
• 2,877 Variables; 37,272 Observations
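The text-parsing step can be illustrated with Python's `re` module in place of the Perl regular expressions used in the project; the patterns carry over with little change. The record layout, the field labels, the `parse_record` helper and the procedure reference format `2004/0045(COD)` below are hypothetical and do not reproduce the actual EUR-Lex page structure.

```python
import re

# "Label: value" lines; [ \t] rather than \s so a match never spans lines
FIELD_RE = re.compile(r"^[ \t]*([A-Za-z][A-Za-z ]*?)[ \t]*:[ \t]*(.+?)[ \t]*$",
                      re.MULTILINE)
# Assumed procedure reference format, e.g. 2004/0045(COD)
PROCEDURE_RE = re.compile(r"\b(\d{4})/(\d{4})\(([A-Z]{3})\)")


def parse_record(text):
    """Turn one scraped (hypothetical) record into a flat dict of variables."""
    fields = {label.lower(): value for label, value in FIELD_RE.findall(text)}
    record = {"form": fields.get("form"), "date": fields.get("date of document")}
    match = PROCEDURE_RE.search(fields.get("procedure number", ""))
    if match:
        record["procedure_year"] = int(match.group(1))
        record["procedure_type"] = match.group(3)
    return record


sample = """
Date of document: 12/03/2004
Author: European Commission
Form: Regulation
Procedure number: 2004/0045(COD)
"""
print(parse_record(sample))
```

Missing fields come back as `None` instead of raising, which makes incomplete records easy to flag in the validation step.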
Political Science Questions: Legislative Decision-making in the Council
• Member states frequently make requests to change legislative proposals
– How frequent are these requests, and do Euroscepticism, party politics or the importance of the policy field explain their frequency?
– What kind of coalitions form (party politics, economics)?
• Sometimes decisions are made by votes
– How frequent are votes?
– Who votes how and why?
Number of Member State Requests (subset) in the field of environmental policy, 2004-2009
Figure 1: Number of requests by country and membership (bar chart)
EU15: UK 76, DE 47, FR 46, NL 41, ES 40, EL 39, PT 35, average 31.6, IT 30, FI 26, AT 24, DK 23, SE 18, BE 12, IE 11, LU 6
New member states: PL 65, CZ 32, LV 31, LT 28, average 26.3, SK 26, SI 25, MT 24, EE 22, HU 21, RO 19, BG 14, CY 9
Source: Warntjen (2015)
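The "average" bars in the request figure match the unweighted mean of the country values (474/15 = 31.6 for the EU15, 316/12 ≈ 26.3 for the new member states), i.e. each member state counts once regardless of its size. A quick check, using the values as read off the chart:

```python
# Values read off the bar chart (environmental policy, 2004-2009, subset)
eu15 = {"UK": 76, "DE": 47, "FR": 46, "NL": 41, "ES": 40, "EL": 39, "PT": 35,
        "IT": 30, "FI": 26, "AT": 24, "DK": 23, "SE": 18, "BE": 12, "IE": 11,
        "LU": 6}
new_members = {"PL": 65, "CZ": 32, "LV": 31, "LT": 28, "SK": 26, "SI": 25,
               "MT": 24, "EE": 22, "HU": 21, "RO": 19, "BG": 14, "CY": 9}


def unweighted_mean(counts):
    """Average number of requests per member state (each state counts once)."""
    return sum(counts.values()) / len(counts)


print(round(unweighted_mean(eu15), 1), round(unweighted_mean(new_members), 1))  # 31.6 26.3
```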
Computer science solutions: Legislative Decision-making in the Council
• Extracting annotated proposals from public registry of the Council
• Text parsing to isolate member state requests (country code, footnotes)
• Data management: linking data to information from other data sources via proposal identifier
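The text-parsing and linking steps can be sketched as follows, assuming footnotes in which member states are identified by two-letter codes (EL for Greece, UK for the United Kingdom) and a second data source keyed by the same proposal identifier. The footnote texts, the identifier, the `policy_field` column and the helper names are invented for illustration. A bare code match is crude (it would also catch "IT" in "IT systems"), so real Council documents would need further validation.

```python
import re
from collections import Counter

# EU-27 member state codes (2007-2013); EL = Greece, UK = United Kingdom
MEMBER_STATES = {
    "AT", "BE", "BG", "CY", "CZ", "DE", "DK", "EE", "EL", "ES", "FI", "FR",
    "HU", "IE", "IT", "LT", "LU", "LV", "MT", "NL", "PL", "PT", "RO", "SE",
    "SI", "SK", "UK",
}
CODE_RE = re.compile(r"\b[A-Z]{2}\b")


def count_requests(footnotes):
    """Count footnotes mentioning each member state (once per footnote)."""
    counts = Counter()
    for note in footnotes:
        counts.update({c for c in CODE_RE.findall(note) if c in MEMBER_STATES})
    return counts


def link(requests_by_proposal, proposal_info):
    """Join request counts to other data via the shared proposal identifier."""
    rows = []
    for proposal_id, counts in requests_by_proposal.items():
        extra = proposal_info.get(proposal_id, {})
        for country, n in sorted(counts.items()):
            rows.append({"proposal": proposal_id, "country": country,
                         "requests": n, **extra})
    return rows


footnotes = [
    "DE, FR: scrutiny reservation.",
    "UK suggested deleting this paragraph.",
    "DE: prefers the Commission proposal.",
    "EU-wide threshold to be discussed.",
]
rows = link({"2004/0045(COD)": count_requests(footnotes)},
            {"2004/0045(COD)": {"policy_field": "environment"}})
for row in rows:
    print(row)
```

Keeping one row per proposal and country gives a long-format table that can be merged with other sources on the proposal identifier.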
Political Science Questions: Voting Cohesion of the EU at the UN
One voice…
• Coordination of policies
• Statements on behalf of the Union
• High voting cohesion
… unless it really matters? Division on highly salient topics, e.g.,
• Libya (Germany abstains)
• Iraq war (France and Germany against, Great Britain in favour)
Computer Science Solutions: Voting Cohesion of the EU at the UN
• Extracting voting information from the official UN database (1946-2014): 4,497 observations (Warntjen 2015a)
• Extracting information about passed resolutions from the UN website (1946-2014): 17,359 observations (Warntjen 2015b)
• Extraction: scrapy, BeautifulSoup
• Text parsing, validation and data management: Perl regex
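One common way to summarise cohesion from such roll-call data is the agreement index of Hix, Noury and Roland, developed for voting in the European Parliament. The slides do not say which measure the UN project uses, so treat this as an assumed choice. For one resolution, with Y, N and A the numbers of EU members voting yes, no and abstaining, AI = (max(Y,N,A) - 0.5 · (Y+N+A - max(Y,N,A))) / (Y+N+A), which is 1 when all members vote alike and 0 when they split evenly across the three options.

```python
from collections import Counter


def agreement_index(votes):
    """Agreement index for one roll call; votes are 'Y', 'N' or 'A' (abstain).

    Other values (e.g. absent) are ignored. Returns None if nobody voted.
    """
    counts = Counter(votes)
    y, n, a = counts["Y"], counts["N"], counts["A"]
    total = y + n + a
    if total == 0:
        return None
    top = max(y, n, a)
    return (top - 0.5 * (total - top)) / total


def mean_cohesion(roll_calls):
    """Average agreement index over several resolutions."""
    scores = [s for s in map(agreement_index, roll_calls) if s is not None]
    return sum(scores) / len(scores) if scores else None


print(agreement_index(["Y"] * 4))             # 1.0
print(agreement_index(["Y", "Y", "N", "A"]))  # 0.25
print(agreement_index(["Y", "N", "A"]))       # 0.0
```

Counting abstentions as a separate option matters for the UN cases: a single abstention, as in the Libya vote, lowers the index even when no member votes against.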
Political Science Questions: Social Media Usage of MEPs
• Does social media usage of MEPs depend on
– party political ideology (participatory democracy, Euroscepticism),
– internet usage of core constituency, or
– electoral system?
• How responsive are MEPs to topics raised by followers, public opinion and the media?
• Does Twitter create a common European public sphere or are there national differences?
Opportunities (data)
• Social science demand for data that is available online but not processed for analysis
• Building standard data sets for key variables (e.g., legislative activity, decision-making, topics trending on social media)
• Interaction of Social Science (domain knowledge, variable selection) and Computer Science (extraction, processing, dissemination)
Structure of the Presentation
1. Introduction
2. Studying Political Representation and Decision-making using Tools from Computer Science
3. (Practical) Challenges
– Empiricist challenge
– Practical issues
4. Conclusion
Big data: the end of physics envy?
“And while in the physical sciences the investigator will be able to measure what, on the basis of a prima facie theory, he thinks important, in the social sciences often that is treated as important which happens to be accessible to measurement.”
(Friedrich A. von Hayek, The Pretence of Knowledge, Nobel Prize Lecture, 1974)
http://www.nobelprize.org/nobel_prizes/economic-sciences/laureates/1974/hayek-lecture.html
“The Empiricist Challenge”
“While the potentials of the use of digital trace data have been a continuous focus in public debate, scientific contributions using such trace data in political science usually come in the form of research-manifestos or isolated proofs-of-concept, only marginally contributing to current debates in the social sciences.”
(Yannis Theocharis, Andreas Jungherr, conference call, MZES/Mannheim University, October 2015)
Mainstreaming Computer Science in the Social Sciences
• Social sciences are primarily theory- and question-driven
• Division of labour: limited computer science knowledge of social scientists
• Challenge for Computer Science
– Building useful tools for recurrent social science tasks
– Creating relevant data sets (or data set generators)
– Moderate learning curve for computer science tools if they are to be widely used
Practical Challenges
Task | Challenge
Extracting information from websites | Slightly changing formats; tabular form, but without field names; requires a robust HTML parser and data cleaning (manual input)
Extracting information from archives (scanned documents) | Conversion of PDF documents to text; reliable OCR
Dissemination | Customizable data sets; data exploration and visualization; but: pre-selection of variables
Annotated HTML code (screenshot): a separate row holds a new variable (decision mode); the HTML parser (Beautiful Soup) does not separate the elements reliably, so the variable was extracted using regular expressions.
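The regular-expression fallback can be sketched as below. The HTML fragment, the label text "Decision mode" and the `extract_decision_mode` helper are assumptions for illustration. The pattern skips any run of tags and whitespace between the label and the value, so it does not depend on how the cells and rows are nested; `html.unescape` turns entities such as `&nbsp;` back into characters before whitespace is normalised.

```python
import re
from html import unescape

# Label, optional colon, then any run of tags/whitespace, then the value text.
# The label wording "Decision mode" is an assumption for this example.
DECISION_MODE_RE = re.compile(
    r"decision\s+mode\s*:?\s*(?:</?[^>]+>\s*)*([^<]+)", re.IGNORECASE)


def extract_decision_mode(page):
    """Return the text following the 'Decision mode' label, or None."""
    match = DECISION_MODE_RE.search(page)
    if not match:
        return None
    value = unescape(match.group(1))
    return " ".join(value.split()) or None  # normalise spaces, incl. &nbsp;


page = """<tr><td><b>Decision mode:</b></td></tr>
<tr><td>Qualified&nbsp;majority</td></tr>"""
print(extract_decision_mode(page))  # Qualified majority
```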
Practical Challenges
• UN voting
– Online database only includes passed resolutions
– Remaining information needs to be extracted from meeting records (pdf documents)
• Decision-making in the Council
– Voting information only available as pdf-file with icons
– Annotated proposals only available as pdf-file
• Practical challenge: reliable conversion of large quantities of pdf-files (some of them quite old)
Practical Challenges and Opportunities
• Previous examples: the problems are not insurmountable, but computer science tools are of limited accessibility for the average social scientist
• Opportunities (tools):
– Developing and disseminating easy-to-use tools for routine social science tasks
– Reference resources (e.g., websites) and specialised support systems (e.g., forums) for social scientists
– Training tailored to social science needs (content, format)
Structure of the Presentation
1. Introduction
2. Opportunities: Studying Political Representation and Decision-making using Tools from Computer Science
3. (Practical) Challenges
4. Conclusion: Future Opportunities
Future Opportunities
• Computer science can create new opportunities to address established and new social science questions (new data, new methods, data visualization and exploration, dissemination of data)
• New data sources: longitudinal and more fine-grained data
• “Empiricist challenge”: big data is not necessarily useful or interesting data
• More dialogue between computer scientists (producers) and social scientists (consumers) needed