INTRODUCTION TO COMPUTATIONAL SOCIAL SCIENCE (CSS01), LECTURE 1, 1.9.2015, LAURI ELORANTA

Introduction to Computational Social Science - Lecture 1



IT IS A JUNGLE OUT THERE

DATA MINING, DATA AND SOCIETY, BIG DATA, PREDICTIVE ANALYSIS, DIGITAL METHODS, DIGITAL HUMANITIES, SOCIAL NETWORK ANALYSIS, PROGRAMMING IN SOCIAL SCIENCE, COMPLEX SYSTEMS, DATA SCIENCE, HADOOP / MAPREDUCE, REACTIVE PROGRAMMING, PERSONAL DATA, MY DATA, OPEN DATA, IOT / WEARABLES, BUZZ, HYPE

THE BACKGROUND IMAGE “JUNGLE” BY LUKE JONES IS UNDER A CREATIVE COMMONS LICENSE.

NOT THAT MUCH TALKING AND EVEN LESS DOING: ONLY A FEW PIONEERS IN THE DESERTED CSS SCENE IN FINLAND

THE BACKGROUND IMAGE “DESERT” BY MOYAN BRENN IS UNDER A CREATIVE COMMONS LICENSE.

LECTURE 1: OVERVIEW

• Practicalities
• What is computational social science?
• Areas of Computational Social Science
  • (Big) Data & automated information extraction
  • Social Networks
  • Social Complexity
  • Simulation
• Research examples
• Lecture 1 Reading

PRACTICALITIES

PRACTICALITIES: GENERAL

• The slides and all materials will be online at http://blogs.helsinki.fi/computationalsocialscience
• The course consists of
  • 8 lectures
  • A research plan assignment (required if you want the 5 op study credits)
• Any questions?
  • Contact the lecturer Lauri Eloranta at firstname dot lastname (at) helsinki.fi

LECTURES: SCHEDULE

• LECTURE 1: Introduction to Computational Social Science [TODAY]
  • Tuesday 01.09, 16:00–18:00, U35 Seminar room 114
• LECTURE 2: Basics of Computation and Modeling
  • Wednesday 02.09, 16:00–18:00, U35 Seminar room 113
• LECTURE 3: Big Data and Information Extraction
  • Monday 07.09, 16:00–18:00, U35 Seminar room 114
• LECTURE 4: Network Analysis
  • Monday 14.09, 16:00–18:00, U35 Seminar room 114
• LECTURE 5: Complex Systems
  • Tuesday 15.09, 16:00–18:00, U35 Seminar room 114
• LECTURE 6: Simulation in Social Science
  • Wednesday 16.09, 16:00–18:00, U35 Seminar room 113
• LECTURE 7: Ethical and Legal Issues in CSS
  • Monday 21.09, 16:00–18:00, U35 Seminar room 114
• LECTURE 8: Summary
  • Tuesday 22.09, 17:00–19:00, U35 Seminar room 114

LITERATURE: COURSE BOOK

• Course book
  • Cioffi-Revilla, Claudio (2014). Introduction to Computational Social Science. Springer-Verlag, London.
• Further reading
• The full eBook is available via Helsinki University Library: https://helka.linneanet.fi/cgi-bin/Pwebrecon.cgi?BBID=2753081

LITERATURE: ADDITIONAL READING

• Additional reading will be given for each lecture
  • Research articles on the topic at hand; some will be given as “homework reading”
• The full list of articles can be found at http://blogs.helsinki.fi/computationalsocialscience

ASSIGNMENT: GENERAL

• Write a short research plan in which you apply a computational social science method to a research problem
• Length: 8 pages for Master’s students, 10 pages for PhD students
• Focus on research method <-> research data <-> research problem
• How to write a research plan, general instructions:
  • http://www.uta.fi/cmt/en/doctoralstudies/apply/Tutkimussuunnitelmaohjeet_EN%5B1%5D.pdf
  • https://into.aalto.fi/display/endoctoraltaik/Research+Plan

ASSIGNMENT: HOW TO RETURN THE ASSIGNMENT

• The assignment deadline (DL) is Friday 2.10.2015 at EOD/midnight
• All assignments are returned in PDF format
• How do I save my work in PDF format? You can “Save as PDF” or “Print to PDF” in MS Word
• Include your name, student ID and contact details
• Assignments are returned to the lecturer Lauri Eloranta via email: firstname dot lastname (at) helsinki.fi
• Grading is done within one month, and you will receive the study credits on or before 30.10.2015

COMPUTATIONAL SOCIAL SCIENCE STUDY BLOCK

• Contains six courses covering different aspects of computational social science
• Full study block: 25–30 op
• Basic courses (mandatory)
  • Introduction to Computational Social Science (5 op) (I period)
  • Introduction to Programming in Social Science (5 op) (II period)
• Special courses
  • Data Extraction (5 op) (IV period)
  • Network Analysis (5 op) (in 2016–2017)
  • Complex Systems (5 op) (III period)
  • Simulation (5 op) (in 2016–2017)

WHAT IS COMPUTATIONAL SOCIAL SCIENCE?

LAZER ET AL. 2009

“In short, a computational social science is emerging [field] that leverages the capacity to collect and analyze data with an unprecedented breadth and depth and scale.” (Lazer et al. 2009)

Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721–723.


CSS MANIFESTO (CONTE ET AL. 2012)

• “The increasing integration of technology into our lives has created unprecedented volumes of data on society’s everyday behaviour. Such data opens up exciting new opportunities to work towards a quantitative understanding of our complex social systems, within the realms of a new discipline known as Computational Social Science. Against a background of financial crises, riots and international epidemics, the urgent need for a greater comprehension of the complexity of our interconnected global society and an ability to apply such insights in policy decisions is clear.” (Conte et al. 2012)
• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325–346.

WIKIPEDIA

• “Computational social science refers to the academic sub-disciplines concerned with computational approaches to the social sciences. Fields include computational economics and computational sociology. It is a multi-disciplinary and integrated approach to social survey focusing on information processing by means of advanced information technology. The computational tasks include the analysis of social networks and social geographic systems.”
• (Wikipedia 2015, http://en.wikipedia.org/wiki/Computational_social_science)

CIOFFI-REVILLA 2014

• “The new field of Computational Social Science can be defined as the interdisciplinary investigation of the social universe of many scales, ranging from individual actors to the largest groupings, through the medium of computation.” (Cioffi-Revilla 2014)

Cioffi-Revilla, Claudio (2014). Introduction to Computational Social Science. Springer-Verlag, London.

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE “POINT AND LINE TO (MULTIPLE) PLANE(S)” BY RODRIGO CARVALHO IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE.

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE “TATEL TELESCOPE” BY EP_JHU IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE.

IT IS FOREMOST AN

[Figure: COMPUTER SCIENCE, SOCIAL SCIENCE and STATISTICS combining into COMPUTATIONAL SOCIAL SCIENCE]

FUNDAMENTAL CHANGES IN RESEARCH SETUP

[Figure: change over time, on an axis from less to more, of:]
• Speed and performance of IT (CPU, RAM, network)
• Access to IT, Internet
• Amount of data generated
• Cost of IT

THE BACKGROUND IMAGE “HOME VISIT” BY NICOLAS NOVA IS UNDER A CREATIVE COMMONS LICENSE.

MAJOR QUESTIONS REGARDING RESEARCH ETHICS

THE BACKGROUND IMAGE “CAMÉRA DE SURVEILLANCE” BY TRISTAN NITOT IS UNDER A CREATIVE COMMONS LICENSE.

COMPUTATIONAL SOCIAL SCIENCE IS NOT A SILVER BULLET

THE BACKGROUND IMAGE “9MM BULLET BW” BY AN NGUYEN IS UNDER A CREATIVE COMMONS LICENSE.

Computational Social Science offers revolutionary opportunities for the social sciences, but it still faces challenges in methods, interdisciplinary cooperation and research ethics.

CSS COMPONENTS

1. Solving increasingly complex problems: the problems of the globalised world are complex, and computational methods might help solve them
2. The rise of data: the amount of data has exploded during the 21st century
3. IT and the instrumental revolution: all the new tools and possibilities
4. Complex systems: modeling our dynamic organisations and societies
5. Social networks: modeling human behavior as networks
6. Making predictions and simulations: predicting the future from the past
7. Interdisciplinary field (social sciences, math, computer science…)
8. Many problems and challenges, especially regarding research ethics

COMPUTATIONAL PARADIGM OF SOCIETY
(Cioffi-Revilla 2014)

• The information processing paradigm has two aspects in relation to CSS:
  1. Information processing is substantive to the complex systems of society that CSS researches. This means that information processing takes part in the formation and evolution of complex systems.
  2. Information processing is methodological in the sense that it serves as the core instrument of CSS.

THE MAIN AREAS OF CSS

1. BIG DATA & AUTOMATED INFORMATION EXTRACTION
2. SOCIAL NETWORK ANALYSIS
3. COMPLEX SYSTEMS & MODELING
4. SIMULATION

FOUR MAIN AREAS OF CSS
(Cioffi-Revilla 2014)

• Areas of Computational Social Science
  1. (Big) Data & automated data extraction
     • Generate, retrieve, sort, modify, transform… data
  2. Social Networks
     • Network analysis and social networks
  3. Social Complexity
     • Social complexity, complex adaptive systems, complex systems modeling
  4. Simulation

BIG DATA & AUTOMATED INFORMATION EXTRACTION
(Cioffi-Revilla 2014)

• Data and automated information extraction can be seen as the foundation for the other areas of CSS
• Raw data can be used as:
  1. Data for its own sake, as research data -> the data itself is the subject of research
  2. Data for modeling or validating other phenomena via e.g. network analysis, complex systems analysis or simulation
• Data is generated, retrieved, modified, transformed… for research purposes via computational automation
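The generate / retrieve / sort / transform chain above can be made concrete with a small sketch. It assumes raw data arrives as text lines in a hypothetical `timestamp|user|text` format (for example, a discussion-forum dump); `extract`, `posts_per_user` and `keyword_share` are illustrative names, not part of any library, and the sample lines are invented.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    user: str
    posted: datetime
    text: str


def extract(raw_lines):
    """Retrieve and parse raw 'timestamp|user|text' lines (a hypothetical format).

    Malformed records are skipped, and the result is sorted by time.
    """
    posts = []
    for line in raw_lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not a record in the assumed format
        stamp, user, text = parts
        try:
            posted = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # unparseable timestamp
        posts.append(Post(user=user, posted=posted, text=text))
    return sorted(posts, key=lambda p: p.posted)


def posts_per_user(posts):
    """Transform: count how many posts each user wrote."""
    return Counter(p.user for p in posts)


def keyword_share(posts, keyword):
    """Transform: fraction of posts mentioning a keyword (case-insensitive)."""
    if not posts:
        return 0.0
    hits = sum(keyword.lower() in p.text.lower() for p in posts)
    return hits / len(posts)


raw = [
    "2015-09-01T16:05:00|alice|Big data is changing social science",
    "2015-09-01T16:02:00|bob|What is computational social science?",
    "not a valid record",
    "2015-09-01T16:10:00|alice|Network analysis is older than big data",
]
posts = extract(raw)
print([p.user for p in posts])  # ['bob', 'alice', 'alice']
print(posts_per_user(posts))    # Counter({'alice': 2, 'bob': 1})
```

Skipping malformed records instead of raising reflects a common choice with messy real-world data; a real project would also log what was dropped.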

SOCIAL NETWORKS
(Cioffi-Revilla 2014)

• A long tradition in network analysis (a much older field than CSS)
• Social networks (Facebook, Twitter etc.) are just one part of network analysis
• Many other social interactions can be modeled as networks -> thus social networks are not technology-dependent as such
  • -> e.g. modeling a family as a network
  • -> e.g. modeling a project as a network
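To illustrate that a social network need not come from a social media platform, here is a minimal sketch that models a hypothetical family as an undirected network and computes two standard measures: degree centrality (ties per person, normalised by the maximum possible) and shortest-path distance via breadth-first search. The family members and their ties are invented for illustration.

```python
from collections import deque


def build_graph(edges):
    """Undirected graph as an adjacency dict: node -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of ties per node, normalised by the maximum possible (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of ties between two nodes, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


# Hypothetical family: a tie stands for regular, close contact.
family = build_graph([
    ("grandmother", "mother"),
    ("mother", "father"),
    ("mother", "daughter"),
    ("father", "daughter"),
    ("father", "son"),
])
print(degree_centrality(family)["mother"])                # 0.75
print(shortest_path_length(family, "grandmother", "son"))  # 3
```

The same code works unchanged for a project organisation: replace family members with team members and ties with, say, regular communication.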

SOCIAL COMPLEXITY
(Cioffi-Revilla 2014)

• Society seen as a complex adaptive system
  • Phase transitions
  • Adaptation (a multi-stage process)
    • Need -> intent -> capacity -> implementation
  • Goal
  • Information processing in many parts of complex adaptive systems
    • To help adaptation: allocating resources, coordination…
• Family as a complex adaptive system
  • Development, hardships, births, deaths, successes, failures
  • Adaptation over decades
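The multi-stage adaptation process (need -> intent -> capacity -> implementation) can be read as a chain in which every stage must succeed. Under the simplifying assumption that the stages succeed independently of one another, the probability of successful adaptation is the product of the stage probabilities, so a chain of individually likely steps can still fail fairly often. The sketch below encodes exactly that assumption; the stage probabilities for the family are hypothetical.

```python
from math import prod

# Stages of adaptation as listed on the slide, in order.
STAGES = ("need", "intent", "capacity", "implementation")


def adaptation_probability(stage_probs):
    """Probability that every stage succeeds, ASSUMING the stages are
    required in sequence and succeed independently of one another."""
    for stage in STAGES:
        p = stage_probs.get(stage)
        if p is None or not 0.0 <= p <= 1.0:
            raise ValueError(f"missing or invalid probability for stage {stage!r}")
    return prod(stage_probs[s] for s in STAGES)


def weakest_stage(stage_probs):
    """The stage with the lowest success probability (the bottleneck)."""
    return min(STAGES, key=lambda s: stage_probs[s])


# Hypothetical values: each stage on its own is fairly likely to succeed...
family = {"need": 0.95, "intent": 0.9, "capacity": 0.7, "implementation": 0.9}
print(adaptation_probability(family))  # ...yet the whole chain is noticeably less likely
print(weakest_stage(family))           # capacity
```

Real adaptation stages are rarely independent (capacity often shapes intent), so this is a thinking aid rather than a model to estimate from data.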

SIMON’S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY
(Cioffi-Revilla 2014)

• Three types of systems:
  1. Natural systems
  2. Human systems
  3. Artificial systems
• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature
• Humans pursue the strategy of building artifacts to achieve goals
• Two kinds of artificial systems working in synergy:
  • Tangible (e.g. roads, buildings)
  • Intangible (e.g. organisations, social structures)

SIMULATION
(Cioffi-Revilla 2014)

• A large (and old) research field
• Two main areas of simulation:
  1. Variable-oriented models
     • System dynamics models (e.g. modeling a nuclear plant)
     • Queuing models (e.g. modeling how a box office line behaves)
  2. Object-oriented models
     • Cellular automata (e.g. Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life, http://pmav.eu/stuff/javascript-game-of-life-v3.1.1/)
     • Agent-based models (e.g. modeling the communication of a project organisation of many individuals)
• Also evolutionary models
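Cellular automata such as Conway’s Game of Life are short to express in code. This sketch stores only the live cells as a set of (row, col) coordinates on an unbounded grid and applies the standard rules: a live cell survives with 2 or 3 live neighbours, and a dead cell becomes live with exactly 3. The function name `step` is our own choice.

```python
from collections import Counter


def step(live):
    """One synchronous Game of Life update on an unbounded grid.

    `live` is a set of (row, col) tuples for the currently live cells.
    """
    # Count, for every cell adjacent to at least one live cell, its live neighbours.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival with 2 or 3 (live cells only).
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


blinker = {(1, 0), (1, 1), (1, 2)}  # a horizontal line of three cells
print(sorted(step(blinker)))        # [(0, 1), (1, 1), (2, 1)]
```

Storing only live cells keeps the grid unbounded and makes each step proportional to the number of live cells rather than to a fixed board size.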

SUMMARY

• 4 main areas of Computational Social Science:
  1. Big data and automatic information extraction
  2. Social networks
  3. Social complexity
  4. Simulation
• Typically, all of these work together
• CSS has a lot of problems, especially concerning privacy and ethics
• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods.

SOME RESEARCH EXAMPLES

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC

• Tracking and predicting how flu or other contagious diseases spread
  • Based on network and social media analysis and modeling
  • Many different variations; one of the first was Google Flu Trends, based on flu-related search queries
• For example:
  • Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., Liu, B. 2011. Predicting Flu Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702–707, 10–15 April 2011.
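The classic epidemiological starting point for modeling spread is the compartmental SIR model (susceptible, infected, recovered). It is a different technique from the Twitter-based prediction in the cited paper and is shown here only to make “modeling the spread” concrete. This minimal sketch uses forward-Euler time steps; the parameter values (population, transmission rate `beta`, recovery rate `gamma`) are hypothetical.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=1.0):
    """Discrete-time (forward Euler) SIR model.

    beta: transmissions per infected person per day (scaled by the susceptible share),
    gamma: recoveries per infected person per day.
    Returns a list of (susceptible, infected, recovered) tuples, one per step.
    """
    if not 0 < initial_infected <= population:
        raise ValueError("initial_infected must be in (0, population]")
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    series = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        series.append((s, i, r))
    return series


run = simulate_sir(population=10_000, initial_infected=10, beta=0.3, gamma=0.1, days=160)
peak_day, (_, peak_infected, _) = max(enumerate(run), key=lambda item: item[1][1])
print(f"R0 = {0.3 / 0.1:.1f}, peak of about {peak_infected:.0f} infected on day {peak_day}")
```

The ratio `beta / gamma` (the basic reproduction number R0) decides whether an outbreak grows at all; network-based models refine this by replacing random mixing with who-contacts-whom.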

GOOGLE FLU TRENDS

• http://www.google.org/flutrends/intl/en_us/
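The basic idea behind query-based flu tracking is to relate the share of flu-related searches to officially reported flu activity, then use fresh search data to estimate the current week before official figures arrive. The sketch below fits an ordinary least-squares line to hypothetical weekly numbers; it illustrates the idea only and is not Google’s actual model.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical weekly data: share of all searches that are flu-related (x)
# and the reported percentage of doctor visits for flu-like illness (y).
query_share = [0.010, 0.012, 0.020, 0.031, 0.025]
reported_ili = [1.1, 1.3, 2.0, 3.2, 2.4]

intercept, slope = fit_line(query_share, reported_ili)
# "Nowcast" a week whose official figure is not yet available.
print(intercept + slope * 0.028)
```

A model like this is only as good as the stability of the query-to-illness relationship: if search behaviour shifts (media attention, changes to the search engine), the estimates drift.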

MODELING NEWS CYCLE DYNAMICS

• Leskovec, J., Backstrom, L., Kleinberg, J. 2009. Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497–506. dl.acm.org
• Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events.
• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.
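The core task in meme-tracking is grouping textual variants of the same short phrase. As a much simplified stand-in for the authors’ algorithm (not a reproduction of it), this sketch links two phrases when the shorter one appears word-for-word inside the longer one, and then takes connected components with a union-find structure. The phrases and the `min_words` threshold are hypothetical.

```python
import re


def normalize(phrase):
    """Lowercase and split into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", phrase.lower())


def contains(longer, shorter):
    """True if the word list `shorter` occurs contiguously inside `longer`."""
    k = len(shorter)
    return k > 0 and any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def cluster_variants(phrases, min_words=3):
    """Group phrases whose shorter member is contained in the longer one."""
    tokens = [normalize(p) for p in phrases]
    parent = list(range(len(phrases)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i, a in enumerate(tokens):
        for j, b in enumerate(tokens):
            # Very short phrases are too generic to count as evidence of a shared meme.
            if i < j and min(len(a), len(b)) >= min_words:
                short, long_ = (a, b) if len(a) <= len(b) else (b, a)
                if contains(long_, short):
                    union(i, j)

    clusters = {}
    for i, phrase in enumerate(phrases):
        clusters.setdefault(find(i), []).append(phrase)
    return list(clusters.values())


phrases = [
    "We will rebuild the bridge",
    "together we will rebuild the bridge",
    "Together, we will rebuild the bridge by spring!",
    "yes we can",
]
for cluster in cluster_variants(phrases):
    print(cluster)
```

The pairwise loop is quadratic in the number of phrases, which is fine for a classroom example but not at Web scale; that scalability problem is exactly what the paper addresses.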

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A., Mylopoulos, Y. A. 2005. A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, 81, 175–187. doi:10.1177/0037549705053172
• Picardi, A. C. and Saeed, K. 1979. The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, 4, pp. 109–118.
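Agent-based models like the one in the first citation represent each household as an agent with its own behaviour. The following toy model is our own illustration, not the cited model: each household drifts toward its neighbours’ average water use (social influence) and trims its demand by a fixed fraction each step in response to a hypothetical pricing policy. All numbers, names and behavioural rules are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Household:
    demand: float      # daily water use (hypothetical units)
    conformity: float  # 0..1: how strongly the household copies its neighbours


def abm_step(households, neighbours, price_response):
    """Advance the toy model by one synchronous step.

    Each household first moves toward the mean demand of its neighbours
    (social influence), then cuts its demand by `price_response`
    (a fraction, e.g. 0.02 for 2 %) in reaction to a pricing policy.
    All agents read the previous state, so update order does not matter.
    """
    previous = [h.demand for h in households]
    for idx, h in enumerate(households):
        peers = neighbours.get(idx, [])
        peer_mean = sum(previous[j] for j in peers) / len(peers) if peers else previous[idx]
        social = (1 - h.conformity) * previous[idx] + h.conformity * peer_mean
        h.demand = social * (1 - price_response)


def total_demand(households):
    return sum(h.demand for h in households)


town = [Household(120.0, 0.3), Household(200.0, 0.5), Household(90.0, 0.1)]
contacts = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # everyone observes everyone else
history = [total_demand(town)]
for _ in range(12):
    abm_step(town, contacts, price_response=0.02)
    history.append(total_demand(town))
print([round(v, 1) for v in history])
```

Even this toy version shows a typical agent-based result: the aggregate outcome depends on who observes whom, not only on the policy itself.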

CLIMATE DIPLOMACY MAPPING

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V., De Pryck, K. 2014. Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July–December 2014, 1, 2053951714543804, first published on August 5, 2014. doi:10.1177/2053951714543804

PREDICTING ELECTIONS

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, pp. 246–253, ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: 10.1515/itit-2014-1046, September 2014.
• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.
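A first step in the page-view approach described in the abstract is to turn raw counts into relative popularity. This sketch converts hypothetical daily Wikipedia page-view counts per party into daily shares of attention and smooths them with a trailing moving average. It does not reproduce the paper’s method and makes no claim about predictive power; fetching real page-view data is out of scope, and the party names and counts are invented.

```python
def daily_shares(views):
    """Convert raw daily page-view counts into each party's share of the total.

    `views` maps party -> list of daily counts (all lists the same length).
    """
    lengths = {len(counts) for counts in views.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one party, all with the same number of days")
    days = lengths.pop()
    shares = {party: [] for party in views}
    for day in range(days):
        total = sum(counts[day] for counts in views.values())
        for party, counts in views.items():
            shares[party].append(counts[day] / total if total else 0.0)
    return shares


def rolling_mean(series, window):
    """Trailing moving average; early days average over the days available so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


views = {  # hypothetical daily page views during a campaign
    "Party A": [1200, 1500, 1300, 2100, 2500],
    "Party B": [1800, 1700, 1600, 1500, 1400],
    "Party C": [600, 500, 700, 900, 1100],
}
smoothed = {party: rolling_mean(s, window=3) for party, s in daily_shares(views).items()}
for party, series in smoothed.items():
    print(party, round(series[-1], 3))
```

Attention is not the same as support: a scandal can drive page views up while votes go down, which is one of the representativeness problems the abstract mentions.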

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• DIGIVAALIT 2015
  • http://www.hiit.fi/digivaalit-2015
  • Researching the 2015 Finnish parliamentary elections, focusing on digital media data (Twitter, Facebook)
  • Trying to understand how media is used and how the public agenda is set
• CITIZEN MINDSCAPES
  • http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila
  • “Diving deep into the unscoped virtual territories of a nation’s collective consciousness may reveal something remarkable. The Finnish, hugely popular Suomi24 discussion forum has 1.9 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed into research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum’s owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.”

ETHICS

• Listen to “The Trust Engineers” podcast by Radiolab
  • http://www.radiolab.org/story/trust-engineers/
• Think about and discuss the ethical research issues raised by what you heard

LECTURE 1 READING

• Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721–723.
• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325–346.
• Anderson, C. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• Einav, L. and Levin, J. 2014. The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf
• King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719–721.
• Wallach, H. 2014. Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

Thank You

Questions and comments?

Twitter: @laurieloranta

Page 2: Introduction to Computational Social Science - Lecture 1

LAURIELORANTA

DATA MINING

DATA AND SOCIETY

BIG DATA

PREDICTIVE ANALYSIS

DIGITAL METHODS

DIGITAL HUMANITIES

SOCIAL NETWORK ANALYSIS

PROGRAMMING IN SOCIAL SCIENCE

IT IS A JUNGLE OUT THERE

COMPLEX SYSTEMS

DATA SCIENCE

HADOOPMAP REDUCE

REACTIVE PROGRAMMING

PERSONAL DATA

MY DATA

OPEN DATA

IOT WEARABLES

BUZZ

HYPE

BUZZ

HYPE

BUZZ

HYPE

THE BACKGROUND IMAGE ldquoJUNGLErdquo BY LUKE JONESIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT THAT MUCH TALKING ANDEVEN LESS DOINGONLY A FEW PIONEERS IN THE DESERTED CSS SCENE IN FINLAND

THE BACKGROUND IMAGE ldquoDESERTrdquo BY MOYAN BRENNIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

bull Practicalities

bull What is computational social science

bull Areas of Computational Social Science

bull (Big) Data amp automated information extraction

bull Social Networks

bull Social Complexity

bull Simulation

bull Research examples

bull Lecture 1 Reading

LECTURE 1OVERVIEW

PRACTICALITIES

bull The slides and all materials will be online at

httpblogshelsinkificomputationalsocialscience

bull Course consists of

bull 8 Lectures

bull A Research Plan Assignment (required if you want study credits 5op)

bull Any questions

bull Contact lecturer Lauri Eloranta at firstname dot lastname helsinkifi

PRACTICALITIESGENERAL

bull LECTURE 1 Introduction to Computational Social Science [TODAY]

bull Tuesday 0109 1600 ndash 1800 U35 Seminar room114

bull LECTURE 2 Basics of Computation and Modeling

bull Wednesday 0209 1600 ndash 1800 U35 Seminar room 113

bull LECTURE 3 Big Data and Information Extraction

bull Monday 0709 1600 ndash 1800 U35 Seminar room 114

bull LECTURE 4 Network Analysis

bull Monday 1409 1600 ndash 1800 U35 Seminar room 114

bull LECTURE 5 Complex Systems

bull Tuesday 1509 1600 ndash 1800 U35 Seminar room 114

bull LECTURE 6 Simulation in Social Science

bull Wednesday 1609 1600 ndash 1800 U35 Seminar room 113

bull LECTURE 7 Ethical and Legal issues in CSS

bull Monday 2109 1600 ndash 1800 U35 Seminar room 114

bull LECTURE 8 Summary

bull Tuesday 2209 1700 ndash 1900 U35 Seminar room 114

LECTURESSCHEDULE

bull Course Book

bull Cioffi-Revilla Claudio (2014) Introduction to

Computational Social Science Springer-

Verlag London

bull Further

Reading

LITERATURECOURSE BOOK

bull The full eBook is available via Helsinki

University Library

httpshelkalinneanetficgi-

binPwebreconcgiBBID=2753081

LITERATURECOURSE BOOK

LITERATUREADDITIONAL READING

bull There will be additional reading given for each lecture

bull Research articles on the topic at hand some will be given for ldquohomework

readingrdquo

bull The full list of articles can be found at

httpblogshelsinkificomputationalsocialscience

bull Write a short research plan where you apply a computational social

science method to a research problem

bull Length 8 pages for Masterrsquos students 10 pages for PhD students

bull Focus on research method lt-gt research data lt-gt research problem

bull How to write a research plan general instructions

bull httpwwwutaficmtendoctoralstudiesapplyTutkimussuunnitelmaohje

et_EN5B15Dpdf

bull httpsintoaaltofidisplayendoctoraltaikResearch+Plan

ASSIGNMENTGENERAL

bull Assignment DL is Friday 2102015 at EODMidnight

bull All assignments are returned in PDF-format

bull How to save my work in pdf-format You can rdquoSave as PDFrdquo or rdquoPrint to PDFrdquo in MS

Word

bull Include your name student ID and contact details

bull Assignments are returned to the lecturer Lauri Eloranta via email

firstname dot lastname helsinkifi

bull Grading is done in one monthrsquos time and you will receive the study

credits on or before 30102015

ASSIGNMENTHOW TO RETURN THE ASSIGNMENT

bull Contains six course covering different aspects of computational social

science

bull Full stydy block 25-30 op

bull Basic courses (mandatory)

bull Introduction to Computational Social Science (5 op) (I period)

bull Introduction to Programming in Social Science (5 op) (II period)

bull Special courses

bull Data extraction (5 op) (IV period)

bull Network Analysis (5 op) (in 2016 ndash 2017)

bull Complex Systems (5 op) (III period)

bull Simulation (5 op) (in 2016 ndash 2017)

COMPUTATIONAL SOCIAL SCIENCE STUDY BLOCK

WHAT IS COMPUTATIONAL SOCIAL SCIENCE

ldquoIn short a computational social science is

emerging [field] that leverages the capacity

to collect and analyze data with an

unprecedented breadth and depth and

scalerdquo (Lazer et al 2009)

Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull ldquoIn short a computational social science is emerging [field] that

leverages the capacity to collect and analyze data with an

unprecedented breadth and depth and scalerdquo

bull Lazer D et al 2009 Computational Social Science Science 6 February

2009 Vol 323 no 5915 pp 721-723

LAZER ET AL 2009

bull ldquoThe increasing integration of technology into our lives has created

unprecedented volumes of data on societyrsquos everyday behaviour Such

data opens up exciting new opportunities to work towards a quantitative

understanding of our complex social systems within the realms of a

new discipline known as Computational Social Science Against a

background of financial crises riots and international epidemics the

urgent need for a greater comprehension of the complexity of our

interconnected global society and an ability to apply such insights in

policy decisions is clear (Conte et al 2012)

bull Conte R 2012 Manifesto of Computational Social Science The

European Physical Journal Special Topics November 2012 Vol 214

Issue 1 pp 325-346

CSS MANIFESTO(CONTE ET AL 2012)

bull ldquoComputational social science refers to the academic sub-disciplines

concerned with computational approaches to the social sciences Fields

include computational economics and computational sociology

It is a multi-disciplinary and integrated approach to social survey

focusing on information processing by means of advanced information

technology The computational tasks include the analysis of social

networks and social geographic systemsrdquo

bull (Wikipedia 2015 httpenwikipediaorgwikiComputational_social_science)

WIKIPEDIA

bull ldquoThe new field of Computational Social Science can be

defined as the interdisciplinary investigation of the social

universe of many scales ranging from individual actors to

the largest groupings through the medium of computationrdquo

(Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla Claudio (2014) Introduction to Computational Social Science Springer-Verlag London

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE ldquoPOINT AND LINE TO (MULTIPLE) PLANE(S)rdquo RODRIGO CARVALHO

IS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE ldquoTATEL TELESCOPErdquo BY EP_JHUIS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

• Tracking and predicting how flu or other contagious diseases spread

• Based on network and social media analysis and modeling

• Many different variations; one of the first was Google Flu Trends, based on flu-related search queries

• For example:

• Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., and Liu, B. 2011. Predicting Flu Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702–707, 10–15 April 2011.

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC

• http://www.google.org/flutrends/intl/en_us/

GOOGLE FLU TRENDS
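Google Flu Trends related the share of flu-related search queries to officially reported rates of influenza-like illness (ILI). The sketch below illustrates that idea under simplifying assumptions:

- a single aggregated weekly query fraction;
- a straight-line fit on the log-odds (logit) scale, estimated by ordinary least squares.

The function names and the synthetic numbers are hypothetical. They are not Google’s data or its production model.

```python
import math
from typing import List, Tuple


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_logit_line(query_frac: List[float], ili_frac: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for logit(ili) = b0 + b1 * logit(query)."""
    xs = [logit(q) for q in query_frac]
    ys = [logit(p) for p in ili_frac]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def nowcast(b0: float, b1: float, query_frac: float) -> float:
    """Estimate the ILI proportion for a new week from its query fraction."""
    return inv_logit(b0 + b1 * logit(query_frac))


if __name__ == "__main__":
    # Synthetic, made-up weekly fractions of searches that are flu-related.
    weeks_q = [0.001, 0.002, 0.004, 0.008]
    # Synthetic ILI proportions generated exactly from b0 = 1.0, b1 = 0.9.
    weeks_ili = [nowcast(1.0, 0.9, q) for q in weeks_q]
    b0, b1 = fit_logit_line(weeks_q, weeks_ili)
    print(round(b0, 3), round(b1, 3))  # recovers 1.0 0.9
    print(nowcast(b0, b1, 0.005))      # estimate for an unseen week
```

Fitting on the log-odds scale keeps every estimate between 0 and 1. Any real nowcasting model of this kind would still need out-of-sample validation against official surveillance data before being trusted.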

• Leskovec, J., Backstrom, L., and Kleinberg, J. 2009. Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497–506 (dl.acm.org).

• Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days - the time scale at which we perceive news and events.

• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.

MODELING NEWS CYCLE DYNAMICS
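The abstract mentions clustering textual variants of short phrases. The snippet below is only a simplified illustration of that idea. It is not the authors’ algorithm, which the paper describes as graph-based. Here, each quote is greedily assigned to the first cluster whose seed phrase is within a small word-level edit distance. The threshold `max_dist`, the function names, and the example quotes are all hypothetical.

```python
from typing import List


def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two phrases, counted in whole words."""
    x, y = a.lower().split(), b.lower().split()
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        cur = [i]
        for j, wy in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,                  # delete a word from a
                           cur[j - 1] + 1,               # insert a word into a
                           prev[j - 1] + (wx != wy)))    # substitute or match
        prev = cur
    return prev[-1]


def cluster_phrases(phrases: List[str], max_dist: int = 2) -> List[List[str]]:
    """Greedily assign each phrase to the first cluster whose seed is close enough."""
    clusters: List[List[str]] = []
    for phrase in phrases:
        for cluster in clusters:
            if word_edit_distance(phrase, cluster[0]) <= max_dist:
                cluster.append(phrase)
                break
        else:  # no existing cluster is close enough: start a new one
            clusters.append([phrase])
    return clusters


if __name__ == "__main__":
    quotes = [  # hypothetical quote variants
        "we are not going to raise taxes",
        "we are not going to raise your taxes",
        "not going to raise taxes",
        "the economy is growing again",
    ]
    for cluster in cluster_phrases(quotes):
        print(len(cluster), cluster[0])  # prints "3 we are ..." then "1 the economy ..."
```

Greedy seed-based grouping depends on input order and compares every phrase with every cluster. This is one reason large-scale meme tracking needs the more scalable methods the paper refers to.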

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A., and Mylopoulos, Y. A. 2005. A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, vol. 81, pp. 175–187. doi:10.1177/0037549705053172

• Picardi, A. C. and Saeed, K. 1979. The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109–118.

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING
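To give a feel for the variable-oriented (system dynamics) style of modeling mentioned under Simulation, here is a toy stock-and-flow sketch of a water resource. It has a single groundwater stock with a recharge inflow and a demand-driven withdrawal outflow, updated in yearly steps. It is neither of the cited papers’ models. The function name, the parameters, and all numbers are hypothetical. It also omits the feedback loops (for example pricing or policy responses) that real system dynamics models emphasise.

```python
from typing import List


def simulate_aquifer(stock: float, recharge: float, demand: float,
                     growth: float, years: int) -> List[float]:
    """Yearly stock-and-flow update: next_stock = stock + recharge - withdrawal."""
    history = [stock]
    for _ in range(years):
        # Withdrawal meets demand only while enough water is available,
        # so the stock can never become negative.
        withdrawal = min(demand, stock + recharge)
        stock = stock + recharge - withdrawal
        # Demand grows at a fixed yearly rate (an exogenous driver in this toy).
        demand *= 1.0 + growth
        history.append(stock)
    return history


if __name__ == "__main__":
    # Hypothetical units, e.g. millions of cubic metres and per-year flows.
    path = simulate_aquifer(stock=100.0, recharge=10.0, demand=20.0, growth=0.0, years=12)
    print(path)
```

With `growth=0.0`, the stock falls by 10 units per year until it is exhausted after ten years. From then on, withdrawals are capped at the recharge rate.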

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V., and De Pryck, K. 2014. Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July–December 2014, vol. 1, 2053951714543804, first published on August 5, 2014. doi:10.1177/2053951714543804

CLIMATE DIPLOMACY MAPPING

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, pages 246–253, September 2014. ISSN (Online) 2196-7032, ISSN (Print) 1611-2776. DOI: 10.1515/itit-2014-1046

• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

PREDICTING ELECTIONS
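The abstract describes tracking the popularity of candidates and parties through Wikipedia page views. The sketch below assumes daily view counts have already been collected per candidate. It uses one simple popularity measure: each candidate’s share of the day’s total views across the tracked candidates. The function name and the counts are hypothetical, and the paper itself is the place to check which measures it actually uses.

```python
from typing import Dict, List


def daily_shares(views: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Convert per-candidate daily page-view counts into daily shares of attention."""
    if not views:
        return {}
    names = list(views)
    n_days = len(views[names[0]])  # assumes every candidate has the same number of days
    shares: Dict[str, List[float]] = {name: [] for name in names}
    for day in range(n_days):
        total = sum(views[name][day] for name in names)
        for name in names:
            # Guard against days on which no views were recorded at all.
            shares[name].append(views[name][day] / total if total else 0.0)
    return shares


if __name__ == "__main__":
    counts = {"Candidate A": [300, 500], "Candidate B": [100, 500]}  # hypothetical
    print(daily_shares(counts))  # {'Candidate A': [0.75, 0.5], 'Candidate B': [0.25, 0.5]}
```

Because the shares sum to one each day, overall traffic trends cancel out. The measure still inherits the representativeness and generalisability problems the abstract warns about.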

• DIGIVAALIT 2015

• http://www.hiit.fi/digivaalit-2015

• Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)

• Trying to understand how media is used and how the public agenda is set

• CITIZEN MINDSCAPES

• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila

• Diving deep into the unscoped virtual territories of a nation’s collective consciousness may reveal something remarkable. The hugely popular Finnish Suomi24 discussion forum has 1.9 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed for research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum’s owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• Listen to “The Trust Engineers” podcast by Radiolab

• http://www.radiolab.org/story/trust-engineers/

• Think about and discuss the research-ethics issues raised by what you heard

ETHICS

• Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721–723.

• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325–346.

• Anderson, C. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

• Einav, L. and Levin, J. 2014. The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf

• King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719–721.

• Wallach, H. 2014. Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta

Page 4: Introduction to Computational Social Science - Lecture 1

NOT THAT MUCH TALKING ANDEVEN LESS DOINGONLY A FEW PIONEERS IN THE DESERTED CSS SCENE IN FINLAND

THE BACKGROUND IMAGE ldquoDESERTrdquo BY MOYAN BRENNIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

bull Practicalities

bull What is computational social science

bull Areas of Computational Social Science

bull (Big) Data amp automated information extraction

bull Social Networks

bull Social Complexity

bull Simulation

bull Research examples

bull Lecture 1 Reading

LECTURE 1OVERVIEW

PRACTICALITIES

bull The slides and all materials will be online at

httpblogshelsinkificomputationalsocialscience

bull Course consists of

bull 8 Lectures

bull A Research Plan Assignment (required if you want study credits 5op)

bull Any questions

bull Contact lecturer Lauri Eloranta at firstname dot lastname helsinkifi

PRACTICALITIESGENERAL

bull LECTURE 1 Introduction to Computational Social Science [TODAY]

bull Tuesday 0109 1600 ndash 1800 U35 Seminar room114

bull LECTURE 2 Basics of Computation and Modeling

bull Wednesday 0209 1600 ndash 1800 U35 Seminar room 113

bull LECTURE 3 Big Data and Information Extraction

bull Monday 0709 1600 ndash 1800 U35 Seminar room 114

bull LECTURE 4 Network Analysis

bull Monday 1409 1600 ndash 1800 U35 Seminar room 114

bull LECTURE 5 Complex Systems

bull Tuesday 1509 1600 ndash 1800 U35 Seminar room 114

bull LECTURE 6 Simulation in Social Science

bull Wednesday 1609 1600 ndash 1800 U35 Seminar room 113

bull LECTURE 7 Ethical and Legal issues in CSS

bull Monday 2109 1600 ndash 1800 U35 Seminar room 114

bull LECTURE 8 Summary

bull Tuesday 2209 1700 ndash 1900 U35 Seminar room 114

LECTURESSCHEDULE

bull Course Book

bull Cioffi-Revilla Claudio (2014) Introduction to

Computational Social Science Springer-

Verlag London

bull Further

Reading

LITERATURECOURSE BOOK

bull The full eBook is available via Helsinki

University Library

httpshelkalinneanetficgi-

binPwebreconcgiBBID=2753081

LITERATURECOURSE BOOK

LITERATUREADDITIONAL READING

bull There will be additional reading given for each lecture

bull Research articles on the topic at hand some will be given for ldquohomework

readingrdquo

bull The full list of articles can be found at

httpblogshelsinkificomputationalsocialscience

bull Write a short research plan where you apply a computational social

science method to a research problem

bull Length 8 pages for Masterrsquos students 10 pages for PhD students

bull Focus on research method lt-gt research data lt-gt research problem

bull How to write a research plan general instructions

bull httpwwwutaficmtendoctoralstudiesapplyTutkimussuunnitelmaohje

et_EN5B15Dpdf

bull httpsintoaaltofidisplayendoctoraltaikResearch+Plan

ASSIGNMENTGENERAL

bull Assignment DL is Friday 2102015 at EODMidnight

bull All assignments are returned in PDF-format

bull How to save my work in pdf-format You can rdquoSave as PDFrdquo or rdquoPrint to PDFrdquo in MS

Word

bull Include your name student ID and contact details

bull Assignments are returned to the lecturer Lauri Eloranta via email

firstname dot lastname helsinkifi

bull Grading is done in one monthrsquos time and you will receive the study

credits on or before 30102015

ASSIGNMENTHOW TO RETURN THE ASSIGNMENT

bull Contains six course covering different aspects of computational social

science

bull Full stydy block 25-30 op

bull Basic courses (mandatory)

bull Introduction to Computational Social Science (5 op) (I period)

bull Introduction to Programming in Social Science (5 op) (II period)

bull Special courses

bull Data extraction (5 op) (IV period)

bull Network Analysis (5 op) (in 2016 ndash 2017)

bull Complex Systems (5 op) (III period)

bull Simulation (5 op) (in 2016 ndash 2017)

COMPUTATIONAL SOCIAL SCIENCE STUDY BLOCK

WHAT IS COMPUTATIONAL SOCIAL SCIENCE

ldquoIn short a computational social science is

emerging [field] that leverages the capacity

to collect and analyze data with an

unprecedented breadth and depth and

scalerdquo (Lazer et al 2009)

Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull ldquoIn short a computational social science is emerging [field] that

leverages the capacity to collect and analyze data with an

unprecedented breadth and depth and scalerdquo

bull Lazer D et al 2009 Computational Social Science Science 6 February

2009 Vol 323 no 5915 pp 721-723

LAZER ET AL 2009

bull ldquoThe increasing integration of technology into our lives has created

unprecedented volumes of data on societyrsquos everyday behaviour Such

data opens up exciting new opportunities to work towards a quantitative

understanding of our complex social systems within the realms of a

new discipline known as Computational Social Science Against a

background of financial crises riots and international epidemics the

urgent need for a greater comprehension of the complexity of our

interconnected global society and an ability to apply such insights in

policy decisions is clear (Conte et al 2012)

bull Conte R 2012 Manifesto of Computational Social Science The

European Physical Journal Special Topics November 2012 Vol 214

Issue 1 pp 325-346

CSS MANIFESTO(CONTE ET AL 2012)

bull ldquoComputational social science refers to the academic sub-disciplines

concerned with computational approaches to the social sciences Fields

include computational economics and computational sociology

It is a multi-disciplinary and integrated approach to social survey

focusing on information processing by means of advanced information

technology The computational tasks include the analysis of social

networks and social geographic systemsrdquo

bull (Wikipedia 2015 httpenwikipediaorgwikiComputational_social_science)

WIKIPEDIA

bull ldquoThe new field of Computational Social Science can be

defined as the interdisciplinary investigation of the social

universe of many scales ranging from individual actors to

the largest groupings through the medium of computationrdquo

(Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla Claudio (2014) Introduction to Computational Social Science Springer-Verlag London

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE ldquoPOINT AND LINE TO (MULTIPLE) PLANE(S)rdquo RODRIGO CARVALHO

IS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE ldquoTATEL TELESCOPErdquo BY EP_JHUIS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

CSS COMPONENTS

1. Solving increasingly complex problems: the problems of the global world are complex, and computational methods might be able to solve them.
2. The rise of data: the amount of data has exploded during the 21st century.
3. IT and the instrumental revolution: all the new tools and possibilities.
4. Complex systems: modeling our dynamic organisations and societies.
5. Social networks: modeling human behavior as networks.
6. Making predictions and simulations: predicting the future from the past.
7. An interdisciplinary field (social sciences, math, computer science…).
8. Many problems and challenges, especially regarding research ethics.

COMPUTATIONAL PARADIGM OF SOCIETY
(Cioffi-Revilla 2014)

• The information processing paradigm has two aspects in relation to CSS:
  1. Information processing is substantive to the complex systems of society that CSS researches. This means that information processing takes part in the formation and evolution of complex systems.
  2. Information processing is methodological in the sense that it serves as the core instrument of CSS.

THE MAIN AREAS OF CSS

1. BIG DATA & AUTOMATED INFORMATION EXTRACTION
2. SOCIAL NETWORK ANALYSIS
3. COMPLEX SYSTEMS & MODELING
4. SIMULATION

FOUR MAIN AREAS OF CSS
(Cioffi-Revilla 2014)

• Areas of Computational Social Science:
  1. (Big) data & automated data extraction
     • Generate, retrieve, sort, modify, transform… data
  2. Social networks
     • Network analysis and social networks
  3. Social complexity
     • Social complexity, complex adaptive systems, complex systems modeling
  4. Simulation

BIG DATA & AUTOMATED INFORMATION EXTRACTION
(Cioffi-Revilla 2014)

• Data and automated information extraction can be seen as the foundation for the other areas of CSS.
• Raw data can be used as:
  1. Data for its own sake, as research data -> the data itself is the subject of research
  2. Data for modeling or validating other phenomena via, e.g., network analysis, complex systems analysis or simulation
• Data is generated, retrieved, modified, transformed… for research purposes via computational automation.
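The following Python sketch is a minimal illustration of automated information extraction. It turns raw, unstructured posts into structured records by pulling out hashtags and @-mentions with regular expressions, then counts hashtags across the whole set. The sample posts, the RAW_POSTS name and the helper functions are hypothetical. A real research pipeline would also need data collection, cleaning and storage steps.

```python
import re
from collections import Counter

# Hypothetical raw posts standing in for collected social media text.
RAW_POSTS = [
    "Voting today! #vaalit2015 @yle",
    "Debate tonight #vaalit2015 #politics",
    "Nice weather in Helsinki",
]

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def extract_features(post):
    """Transform one raw post into a structured record (lowercased tags and mentions)."""
    return {
        "text": post,
        "hashtags": [tag.lower() for tag in HASHTAG.findall(post)],
        "mentions": [name.lower() for name in MENTION.findall(post)],
    }


def hashtag_counts(posts):
    """Count how often each hashtag occurs across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(extract_features(post)["hashtags"])
    return counts


if __name__ == "__main__":
    for record in (extract_features(p) for p in RAW_POSTS):
        print(record)
    print(hashtag_counts(RAW_POSTS).most_common(2))
```

The structured records produced here could then serve as input to the other areas, for example as edges in a mention network.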

SOCIAL NETWORKS
(Cioffi-Revilla 2014)

• Network analysis has a long tradition (it is a much older field than CSS).
• Online social networks (Facebook, Twitter, etc.) are just one part of network analysis.
• Many other social interactions can be modeled as networks -> thus social networks are not technology-dependent as such:
  • e.g., modeling a family as a network
  • e.g., modeling a project as a network
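The idea of "modeling a family as a network" can be made concrete with a minimal sketch. People are nodes, and direct family ties are undirected edges. The sketch computes two basic network measures: degree, which is the number of direct ties, and distance, which is the fewest ties between two people, found by breadth-first search. The family members, the ties and the function names are invented for illustration.

```python
from collections import deque

# Hypothetical family: each pair is a direct tie (parent-child or spouse).
FAMILY_TIES = [
    ("Grandma", "Mother"),
    ("Grandma", "Uncle"),
    ("Mother", "Father"),
    ("Mother", "Child"),
    ("Father", "Child"),
    ("Uncle", "Cousin"),
]


def build_graph(edges):
    """Build an undirected adjacency list from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Number of direct ties for each person."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def distance(graph, source, target):
    """Fewest ties between two people (breadth-first search); None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


if __name__ == "__main__":
    family = build_graph(FAMILY_TIES)
    print(degree(family))
    print(distance(family, "Child", "Cousin"))
```

Modeling a project organisation works the same way: only the nodes and the meaning of a tie change.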

SOCIAL COMPLEXITY
(Cioffi-Revilla 2014)

• Society seen as a complex adaptive system
  • Phase transitions
  • Adaptation (a multi-stage process)
    • Need -> intent -> capacity -> implementation
    • Goal
  • Information processing in many parts of complex adaptive systems
    • To help adaptation: allocating resources, coordination…
• Family as a complex adaptive system
  • Development: hardships, births, deaths, successes, failures
  • Adaptation over decades

SIMON’S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY
(Cioffi-Revilla 2014)

• Three types of systems:
  1. Natural systems
  2. Human systems
  3. Artificial systems
• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature.
• Humans pursue the strategy of building artifacts to achieve goals.
• Two kinds of artificial systems work in synergy:
  • Tangible (e.g., roads, buildings)
  • Intangible (e.g., organisations, social structures)

SIMULATION
(Cioffi-Revilla 2014)

• A large (and old) research field
• Two main areas of simulation:
  1. Variable-oriented models
     • System dynamics models (e.g., modeling a nuclear plant)
     • Queuing models (e.g., modeling how a box office line behaves)
  2. Object-oriented models
     • Cellular automata (e.g., the Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life and http://pmav.eu/stuff/javascript-game-of-life-v3.1.1/)
     • Agent-based models (e.g., modeling the communication of a project organisation of many individuals)
• Also evolutionary models
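The Game of Life is a cellular automaton with very short rules. The sketch below assumes an unbounded grid stored as a set of live (row, col) cells, which is a representation chosen here for brevity. In each generation, a dead cell with exactly three live neighbours is born, and a live cell with two or three live neighbours survives. Every other cell is dead in the next generation.

```python
from collections import Counter


def step(live_cells):
    """Advance Conway's Game of Life by one generation.

    live_cells is a set of (row, col) tuples on an unbounded grid.
    """
    # For every cell next to at least one live cell, count its live neighbours.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live_cells
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 live neighbours; survival with 2 or 3.
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


if __name__ == "__main__":
    blinker = {(1, 0), (1, 1), (1, 2)}  # a horizontal line of three cells
    print(sorted(step(blinker)))  # → [(0, 1), (1, 1), (2, 1)]
```

The "blinker" in the example flips between a horizontal and a vertical line every generation. Global patterns like this emerge from purely local rules, which is what makes cellular automata interesting for social simulation.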

SUMMARY

• 4 main areas of Computational Social Science:
  1. Big data and automatic information extraction
  2. Social networks
  3. Social complexity
  4. Simulation
• Typically, all of these work together.
• CSS has a lot of problems, especially concerning privacy and ethics.
• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods.

SOME RESEARCH EXAMPLES

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC

• Tracking and predicting how flu or other contagious diseases spread
  • Based on network and social media analysis and modeling
  • Many different variations; one of the first was Google Flu Trends, based on flu-related search queries
• For example:
  • Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., Liu, B. (2011). Predicting Flu Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702–707, 10–15 April 2011.
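In epidemiology, disease spread is classically modeled with the SIR compartmental model. People move from susceptible (S) to infected (I) to recovered (R). Each day, beta·S·I/N people become infected and gamma·I people recover, where N is the population size. The sketch below is a minimal discrete-time version with one-day steps. It is not the Twitter-based method of Achrekar et al. or the Google Flu Trends approach. The parameter values in the example are hypothetical.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model with one-day steps.

    beta:  transmission rate per day
    gamma: recovery rate per day
    Returns a list of (susceptible, infected, recovered) tuples, one per day,
    starting with the initial state.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters: beta / gamma = 3.
    run = simulate_sir(population=10_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=160)
    peak_day = max(range(len(run)), key=lambda d: run[d][1])
    print(f"Infections peak on day {peak_day} at about {run[peak_day][1]:.0f} people")
```

When beta / gamma is clearly above 1, infections first grow and then decline as the pool of susceptible people shrinks. Well below 1, the outbreak tends to die out.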

GOOGLE FLU TRENDS

• http://www.google.org/flutrends/intl/en_us/

MODELING NEWS CYCLE DYNAMICS

• Leskovec, J., Backstrom, L., Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497–506. dl.acm.org
• Tracking new topics, ideas and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days – the time scale at which we perceive news and events.
• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.
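The abstract mentions clustering textual variants of short phrases. As we understand the paper, the authors build a graph of phrase inclusions and partition it. The sketch below is a much-simplified, hypothetical stand-in that shows only the core intuition: a longer phrase that contains a shorter phrase word for word is treated as a variant of it. Phrases are processed from shortest to longest. Each phrase joins the first existing cluster whose root occurs inside it, or else it starts a new cluster. The function names and sample phrases are invented for illustration.

```python
import re


def normalize(phrase):
    """Lowercase a phrase and split it into word tokens (punctuation dropped)."""
    return tuple(re.findall(r"[a-z0-9']+", phrase.lower()))


def contains(longer, shorter):
    """True if the token sequence `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[k:k + n] == shorter for k in range(len(longer) - n + 1))


def cluster_variants(phrases):
    """Group phrase variants into clusters keyed by a root phrase."""
    tokens = {p: normalize(p) for p in phrases}
    clusters = {}
    for phrase in sorted(phrases, key=lambda p: len(tokens[p])):
        if not tokens[phrase]:
            continue  # skip phrases with no word tokens
        # Roots were added shortest first, so the first match is a shortest root.
        root = next(
            (r for r in clusters if contains(tokens[phrase], tokens[r])),
            phrase,
        )
        clusters.setdefault(root, []).append(phrase)
    return clusters


if __name__ == "__main__":
    quotes = [
        "Lipstick on a pig",
        "you can put lipstick on a pig",
        "You can put lipstick on a pig, it's still a pig",
        "Yes we can",
    ]
    for root, members in cluster_variants(quotes).items():
        print(root, "->", members)
```

Exact word-for-word containment is a strong simplification. The paper also tolerates small edits between variants.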

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A., Mylopoulos, Y. A. (2005). A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, 81, 175–187. doi:10.1177/0037549705053172
• Picardi, A. C. and Saeed, K. (1979). The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109–118.

CLIMATE DIPLOMACY MAPPING

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V., De Pryck, K. (2014). Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July–December 2014, 1, 2053951714543804, first published on August 5, 2014. doi:10.1177/2053951714543804

PREDICTING ELECTIONS

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, pages 246–253. ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: 10.1515/itit-2014-1046, September 2014.
• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.
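The abstract describes monitoring popularity dynamics from Wikipedia page views. The sketch below shows one basic preprocessing step and assumes the daily view counts have already been collected. It converts each party's raw views into its share of all views that day, so that days with generally higher traffic become comparable. The party names and counts are hypothetical. As the abstract notes, such shares still say nothing about how representative the people behind the views are.

```python
def attention_shares(daily_views):
    """Convert daily page-view counts per party into daily shares of all views.

    daily_views maps party name -> list of daily view counts (equal lengths).
    Returns party name -> list of daily shares; each day's shares sum to 1
    (or are all 0.0 on a day with no views at all).
    """
    parties = list(daily_views)
    n_days = len(daily_views[parties[0]])
    shares = {party: [] for party in parties}
    for day in range(n_days):
        total = sum(daily_views[p][day] for p in parties)
        for p in parties:
            shares[p].append(daily_views[p][day] / total if total else 0.0)
    return shares


if __name__ == "__main__":
    # Hypothetical daily page views for two parties over three days.
    views = {"Party A": [120, 150, 300], "Party B": [80, 50, 100]}
    print(attention_shares(views))
```

On the third day, Party A's raw views double but its share stays the same. This illustrates why normalising by total attention matters when tracking popularity dynamics.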

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• DIGIVAALIT 2015
  • http://www.hiit.fi/digivaalit-2015
  • Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)
  • Trying to understand how media is used and how the public agenda is set
• CITIZEN MINDSCAPES
  • http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila
  • Diving deep into the unscoped virtual territories of a nation’s collective consciousness may reveal something remarkable. The Finnish, hugely popular Suomi24 discussion forum has 1.9 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed into research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum’s owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.

ETHICS

• Listen to “The Trust Engineers” podcast by Radiolab
  • http://www.radiolab.org/story/trust-engineers/
• Think about and discuss different ethical research issues in relation to what you heard.

LECTURE 1 READING

• Lazer, D. et al. (2009). Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721–723.
• Conte, R. et al. (2012). Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325–346.
• Anderson, C. (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• Einav, L. and Levin, J. (2014). The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf
• King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719–721.
• Wallach, H. (2014). Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

Thank You

Questions and comments?

Twitter: @laurieloranta


SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

bull Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull Conte R 2012 Manifesto of Computational Social Science The European Physical Journal Special Topics November 2012 Vol 214 Issue 1 pp 325-346

bull Anderson C 2008 The End of Theory The Data Deluge Makes the Scientific Method Obsolete Wired httparchivewiredcomsciencediscoveriesmagazine16-07pb_theory

bull Einav L and Levin J 2014 The Data Revolution and Economic Analysis In Innovation Policy and the Economy edited by Josh Lerner and Scott Stern httpwebstanfordedu~leinavpubsIPE2014pdf

bull King G 2011 Ensuring the Data-Rich Future of the Social Sciences Science 11 February 2011 Vol 331 no 6018 pp 719-721

bull Wallach H 2014 Big Data Machine Learning and the Social Sciences Fairness Accountability and Transparency Mediumcom httpsmediumcomhannawallachbig-data-machine-learning-and-thesocial-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

twitter laurieloranta


Page 9: Introduction to Computational Social Science - Lecture 1

bull Course Book

bull Cioffi-Revilla Claudio (2014) Introduction to

Computational Social Science Springer-

Verlag London

bull Further

Reading

LITERATURECOURSE BOOK

bull The full eBook is available via Helsinki

University Library

httpshelkalinneanetficgi-

binPwebreconcgiBBID=2753081

LITERATURECOURSE BOOK

LITERATUREADDITIONAL READING

bull There will be additional reading given for each lecture

bull Research articles on the topic at hand some will be given for ldquohomework

readingrdquo

bull The full list of articles can be found at

httpblogshelsinkificomputationalsocialscience

bull Write a short research plan where you apply a computational social

science method to a research problem

bull Length 8 pages for Masterrsquos students 10 pages for PhD students

bull Focus on research method lt-gt research data lt-gt research problem

bull How to write a research plan general instructions

bull httpwwwutaficmtendoctoralstudiesapplyTutkimussuunnitelmaohje

et_EN5B15Dpdf

bull httpsintoaaltofidisplayendoctoraltaikResearch+Plan

ASSIGNMENTGENERAL

bull Assignment DL is Friday 2102015 at EODMidnight

bull All assignments are returned in PDF-format

bull How to save my work in pdf-format You can rdquoSave as PDFrdquo or rdquoPrint to PDFrdquo in MS

Word

bull Include your name student ID and contact details

bull Assignments are returned to the lecturer Lauri Eloranta via email

firstname dot lastname helsinkifi

bull Grading is done in one monthrsquos time and you will receive the study

credits on or before 30102015

ASSIGNMENTHOW TO RETURN THE ASSIGNMENT

bull Contains six course covering different aspects of computational social

science

bull Full stydy block 25-30 op

bull Basic courses (mandatory)

bull Introduction to Computational Social Science (5 op) (I period)

bull Introduction to Programming in Social Science (5 op) (II period)

bull Special courses

bull Data extraction (5 op) (IV period)

bull Network Analysis (5 op) (in 2016 ndash 2017)

bull Complex Systems (5 op) (III period)

bull Simulation (5 op) (in 2016 ndash 2017)

COMPUTATIONAL SOCIAL SCIENCE STUDY BLOCK

WHAT IS COMPUTATIONAL SOCIAL SCIENCE

ldquoIn short a computational social science is

emerging [field] that leverages the capacity

to collect and analyze data with an

unprecedented breadth and depth and

scalerdquo (Lazer et al 2009)

Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull ldquoIn short a computational social science is emerging [field] that

leverages the capacity to collect and analyze data with an

unprecedented breadth and depth and scalerdquo

bull Lazer D et al 2009 Computational Social Science Science 6 February

2009 Vol 323 no 5915 pp 721-723

LAZER ET AL 2009

bull ldquoThe increasing integration of technology into our lives has created

unprecedented volumes of data on societyrsquos everyday behaviour Such

data opens up exciting new opportunities to work towards a quantitative

understanding of our complex social systems within the realms of a

new discipline known as Computational Social Science Against a

background of financial crises riots and international epidemics the

urgent need for a greater comprehension of the complexity of our

interconnected global society and an ability to apply such insights in

policy decisions is clear (Conte et al 2012)

bull Conte R 2012 Manifesto of Computational Social Science The

European Physical Journal Special Topics November 2012 Vol 214

Issue 1 pp 325-346

CSS MANIFESTO(CONTE ET AL 2012)

bull ldquoComputational social science refers to the academic sub-disciplines

concerned with computational approaches to the social sciences Fields

include computational economics and computational sociology

It is a multi-disciplinary and integrated approach to social survey

focusing on information processing by means of advanced information

technology The computational tasks include the analysis of social

networks and social geographic systemsrdquo

bull (Wikipedia 2015 httpenwikipediaorgwikiComputational_social_science)

WIKIPEDIA

bull ldquoThe new field of Computational Social Science can be

defined as the interdisciplinary investigation of the social

universe of many scales ranging from individual actors to

the largest groupings through the medium of computationrdquo

(Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla Claudio (2014) Introduction to Computational Social Science Springer-Verlag London

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE ldquoPOINT AND LINE TO (MULTIPLE) PLANE(S)rdquo RODRIGO CARVALHO

IS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE ldquoTATEL TELESCOPErdquo BY EP_JHUIS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

• Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.

• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325-346.

• Anderson, C. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

• Einav, L. and Levin, J. 2014. The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf

• King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719-721.

• Wallach, H. 2014. Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments?

Twitter: @laurieloranta


• The full eBook is available via Helsinki University Library: https://helka.linneanet.fi/cgi-bin/Pwebrecon.cgi?BBID=2753081

LITERATURE: COURSE BOOK

LITERATURE: ADDITIONAL READING

• There will be additional reading for each lecture

• Research articles on the topic at hand; some will be given as “homework reading”

• The full list of articles can be found at http://blogs.helsinki.fi/computationalsocialscience

• Write a short research plan in which you apply a computational social science method to a research problem

• Length: 8 pages for Master’s students, 10 pages for PhD students

• Focus: research method <-> research data <-> research problem

• How to write a research plan (general instructions):

• http://www.uta.fi/cmt/en/doctoralstudies/apply/Tutkimussuunnitelmaohjeet_EN%5B1%5D.pdf

• https://into.aalto.fi/display/endoctoraltaik/Research+Plan

ASSIGNMENT: GENERAL

• The assignment deadline is Friday 2.10.2015 at EOD (midnight)

• All assignments are returned in PDF format

• How to save your work in PDF format: you can use “Save as PDF” or “Print to PDF” in MS Word

• Include your name, student ID and contact details

• Assignments are returned to the lecturer Lauri Eloranta via email: firstname dot lastname at helsinki.fi

• Grading will be done within one month, and you will receive the study credits on or before 30.10.2015

ASSIGNMENT: HOW TO RETURN THE ASSIGNMENT

• Contains six courses covering different aspects of computational social science

• Full study block: 25–30 op

• Basic courses (mandatory):

• Introduction to Computational Social Science (5 op) (I period)

• Introduction to Programming in Social Science (5 op) (II period)

• Special courses:

• Data extraction (5 op) (IV period)

• Network Analysis (5 op) (in 2016–2017)

• Complex Systems (5 op) (III period)

• Simulation (5 op) (in 2016–2017)

COMPUTATIONAL SOCIAL SCIENCE STUDY BLOCK

WHAT IS COMPUTATIONAL SOCIAL SCIENCE?

“In short, a computational social science is emerging [field] that leverages the capacity to collect and analyze data with an unprecedented breadth and depth and scale.” (Lazer et al. 2009)

Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.

• “In short, a computational social science is emerging [field] that leverages the capacity to collect and analyze data with an unprecedented breadth and depth and scale.”

• Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.

LAZER ET AL. 2009

• “The increasing integration of technology into our lives has created unprecedented volumes of data on society’s everyday behaviour. Such data opens up exciting new opportunities to work towards a quantitative understanding of our complex social systems, within the realms of a new discipline known as Computational Social Science. Against a background of financial crises, riots and international epidemics, the urgent need for a greater comprehension of the complexity of our interconnected global society and an ability to apply such insights in policy decisions is clear.” (Conte et al. 2012)

• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325-346.

CSS MANIFESTO (CONTE ET AL. 2012)

• “Computational social science refers to the academic sub-disciplines concerned with computational approaches to the social sciences. Fields include computational economics and computational sociology. It is a multi-disciplinary and integrated approach to social survey focusing on information processing by means of advanced information technology. The computational tasks include the analysis of social networks and social geographic systems.”

• (Wikipedia 2015, http://en.wikipedia.org/wiki/Computational_social_science)

WIKIPEDIA

• “The new field of Computational Social Science can be defined as the interdisciplinary investigation of the social universe of many scales, ranging from individual actors to the largest groupings, through the medium of computation.” (Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla, Claudio (2014). Introduction to Computational Social Science. Springer-Verlag, London.

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE “POINT AND LINE TO (MULTIPLE) PLANE(S)” BY RODRIGO CARVALHO IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE.

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE “TATEL TELESCOPE” BY EP_JHU IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE.

IT IS FOREMOST AN

[Figure: Computer Science, Social Science and Statistics, with Computational Social Science where the three fields meet]

[Figure: change over time, on a scale from less to more, in:]

• Speed and performance of IT (CPU, RAM, network)

• Access to IT and the Internet

• Amount of data generated

• Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE “HOME VISIT” BY NICOLAS NOVA IS UNDER A CREATIVE COMMONS LICENSE.

MAJOR QUESTIONS REGARDING RESEARCH ETHICS

THE BACKGROUND IMAGE “CAMÉRA DE SURVEILLANCE” BY TRISTAN NITOT IS UNDER A CREATIVE COMMONS LICENSE.

COMPUTATIONAL SOCIAL SCIENCE IS NOT A SILVER BULLET

THE BACKGROUND IMAGE “9MM BULLET BW” BY AN NGUYEN IS UNDER A CREATIVE COMMONS LICENSE.

Computational Social Science offers revolutionary opportunities for the social sciences, but it still faces challenges in relation to methods, interdisciplinary cooperation and research ethics.

1. Solving increasingly complex problems: the problems of the global world are complex, and computational methods might help to solve them

2. The rise of data: the amount of data has exploded during the 21st century

3. IT and the instrumental revolution: all the new tools and possibilities

4. Complex systems: modeling our dynamic organisations and societies

5. Social networks: modeling human behavior as networks

6. Making predictions and simulations: predicting the future from the past

7. Interdisciplinary field (social sciences, math, computer science…)

8. Many problems and challenges, especially regarding research ethics

CSS COMPONENTS

• The information processing paradigm has two aspects in relation to CSS:

1. Information processing is substantive to the complex systems of society that CSS researches. This means that information processing takes part in the formation and evolution of complex systems.

2. Information processing is methodological in the sense that it serves as the core instrument of CSS.

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

1. BIG DATA & AUTOMATED INFORMATION EXTRACTION

2. SOCIAL NETWORK ANALYSIS

3. COMPLEX SYSTEMS & MODELING

4. SIMULATION

THE MAIN AREAS OF CSS

• Areas of Computational Social Science:

1. (Big) data & automated data extraction

• Generate, retrieve, sort, modify, transform… data

2. Social networks

• Network analysis and social networks

3. Social complexity

• Social complexity, complex adaptive systems, complex systems modeling

4. Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

• Data and automated information extraction can be seen as the foundation for the other areas of CSS

• Raw data can be used as:

1. Data for its own sake, as research data -> data is the subject of research

2. Data for modeling or validating other phenomena via, e.g., network analysis, complex systems analysis or simulation

• Data is generated, retrieved, modified, transformed… for research purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)
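As a minimal sketch of what such automation can look like, the Python snippet below assumes the raw data are short social-media posts already held as plain strings (a hypothetical input; no particular platform API is implied). It retrieves the hashtags in each post, normalises them to lower case and aggregates them into counts, the kind of transformed data that could then feed network analysis or simulation.

```python
import re
from collections import Counter

# A hashtag is taken to be "#" followed by word characters (letters, digits, underscore).
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(post):
    """Retrieve the hashtags in one post, normalised to lower case."""
    return [tag.lower() for tag in HASHTAG.findall(post)]


def count_hashtags(posts):
    """Transform a list of raw posts into aggregated hashtag counts."""
    counts = Counter()
    for post in posts:
        counts.update(extract_hashtags(post))
    return counts


if __name__ == "__main__":
    # Hypothetical posts; real data would come from an API or a data dump.
    posts = [
        "Election night! #vaalit2015 #Finland",
        "Debate tonight #vaalit2015",
        "Sauna weather #finland",
    ]
    print(count_hashtags(posts).most_common(2))
```

Lower-casing before counting merges variants such as `#Finland` and `#finland`; whether that is appropriate depends on the research question.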

• A long tradition in network analysis (a much older field than CSS)

• Social networks (Facebook, Twitter, etc.) are just one part of network analysis

• Many other social interactions can be modeled as networks -> thus social networks are not technology-dependent as such

• -> e.g., modeling a family as a network

• -> e.g., modeling a project as a network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)
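To make the idea of modeling a project as a network concrete, the sketch below uses plain Python (no network library is assumed) to store an undirected network as a mapping from each person to the set of people they are tied to, and computes normalised degree centrality, one of the simplest measures in network analysis. The roles and ties are hypothetical.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected network as a mapping: node -> set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


if __name__ == "__main__":
    # Hypothetical project organisation: who communicates with whom.
    project = build_network([
        ("manager", "designer"),
        ("manager", "developer"),
        ("manager", "tester"),
        ("developer", "tester"),
    ])
    for person, score in sorted(degree_centrality(project).items()):
        print(f"{person}: {score:.2f}")
```

Established libraries such as NetworkX offer the same measure as `networkx.degree_centrality`, along with many others.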

• Society seen as a complex adaptive system

• Phase transitions

• Adaptation (multi-stage process)

• Need -> intent -> capacity -> implementation

• Goal

• Information processing in many parts of complex adaptive systems

• To help adaptation: allocating resources, coordination…

• Family as a complex adaptive system

• Development: hardships, births, deaths, successes, failures

• Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• Three types of systems:

1. Natural systems

2. Human systems

3. Artificial systems

• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature

• Humans pursue the strategy of building artifacts to achieve goals

• Two kinds of artificial systems working in synergy:

• Tangible (e.g., roads, buildings)

• Intangible (e.g., organisations, social structures)

SIMON’S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• A large (and old) research field

• Two main areas of simulation:

1. Variable-Oriented Models

• System dynamics models (e.g., modeling a nuclear plant)

• Queuing models (e.g., modeling how a box office line behaves)

2. Object-Oriented Models

• Cellular automata (e.g., Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life, http://pmav.eu/stuff/javascript-game-of-life-v3.1.1/)

• Agent-based models (e.g., modeling the communication of a project organisation of many individuals)

• Also evolutionary models

SIMULATION

(Cioffi-Revilla 2014)
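Conway's Game of Life, the cellular automaton example above, is small enough to write out in full. The Python sketch below is an independent illustration, not the linked JavaScript version: live cells are stored as a set of (row, column) pairs on an unbounded grid, and each step applies the standard rules, under which a live cell survives with two or three live neighbours and a dead cell comes alive with exactly three.

```python
from collections import Counter


def step(live):
    """Advance Conway's Game of Life by one generation.

    `live` is a set of (row, col) tuples for the live cells; the grid is unbounded.
    """
    # Count live neighbours for every cell adjacent to at least one live cell.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival with 2 or 3 (3 is covered by the first test).
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


if __name__ == "__main__":
    blinker = {(1, 0), (1, 1), (1, 2)}  # a horizontal bar of three cells
    print(sorted(step(blinker)))  # [(0, 1), (1, 1), (2, 1)], a vertical bar
```

Storing only the live cells keeps each step proportional to the number of live cells rather than to the size of a fixed grid.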

• Four main areas of Computational Social Science:

1. Big data and automatic information extraction

2. Social networks

3. Social complexity

4. Simulation

• Typically, all of these work together

• CSS has a lot of problems, especially concerning privacy and ethics

• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods.

SUMMARY

SOME RESEARCH EXAMPLES

• Tracking and predicting how flu or other contagious diseases spread

• Based on network and social media analysis and modeling

• Many different variations; one of the first was Google Flu Trends, based on flu-related search queries

• For example:

• Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H. and Liu, B. 2011. Predicting Flu Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702-707, 10-15 April 2011.

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC
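The epidemiological classic behind this line of work is the compartmental model, which splits a population into susceptible (S), infected (I) and recovered (R) groups. The Python sketch below is a generic discrete-time SIR model, not the Twitter-based method of Achrekar et al.; the transmission rate `beta`, the recovery rate `gamma` and the population figures in the example are illustrative assumptions.

```python
def simulate_sir(s, i, r, beta, gamma, days):
    """Discrete-time SIR model.

    s, i, r: initial susceptible, infected and recovered counts.
    beta: transmission rate per day; gamma: recovery rate per day.
    Returns a list of (S, I, R) tuples, one per day including day 0.
    """
    n = s + i + r  # total population, constant in this model
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / n  # S-I contacts that lead to transmission
        new_recoveries = gamma * i         # a fixed fraction of I recovers each day
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative values only: 10,000 people, 10 initially infected.
    run = simulate_sir(s=9_990, i=10, r=0, beta=0.3, gamma=0.1, days=160)
    peak_day, (_, peak_infected, _) = max(enumerate(run), key=lambda day: day[1][1])
    print(f"Peak on day {peak_day} with about {peak_infected:.0f} infected")
```

Because each update moves people between groups without creating or removing any, S + I + R stays constant; a smaller time step or an ODE solver would give smoother curves.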

• http://www.google.org/flutrends/intl/en_us/

GOOGLE FLU TRENDS

• Leskovec, J., Backstrom, L. and Kleinberg, J. 2009. Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497-506.

• Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days – the time scale at which we perceive news and events.

• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.

MODELING NEWS CYCLE DYNAMICS
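Leskovec et al. cluster textual variants with a graph-based algorithm that links shorter phrases to longer ones that approximately contain them and then partitions that graph. A full reimplementation is out of scope here. As a much simpler stand-in that only conveys the idea of grouping variants, the sketch below normalises each phrase into lower-case words and adds it to the first existing group whose representative phrase shares enough of its words. The word-overlap measure and the 0.6 threshold are assumptions, not parameters from the paper.

```python
import re


def normalise(phrase):
    """Lower-case a phrase and keep only its word tokens."""
    return re.findall(r"\w+", phrase.lower())


def overlap(shorter, longer):
    """Fraction of the shorter phrase's words that also occur in the longer one."""
    if not shorter:
        return 0.0
    longer_words = set(longer)
    return sum(word in longer_words for word in shorter) / len(shorter)


def cluster_variants(phrases, threshold=0.6):
    """Greedily group each phrase with the first cluster whose representative it overlaps."""
    clusters = []  # each entry: (representative tokens, list of member phrases)
    for phrase in phrases:
        tokens = normalise(phrase)
        for representative, members in clusters:
            short, long_ = sorted((tokens, representative), key=len)
            if overlap(short, long_) >= threshold:
                members.append(phrase)
                break
        else:  # no cluster matched: this phrase starts a new one
            clusters.append((tokens, [phrase]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Hypothetical phrases.
    for group in cluster_variants(["Lipstick on a pig", "you can put lipstick on a pig", "yes we can"]):
        print(group)
```

The greedy, order-dependent grouping is a deliberate simplification; it does not reproduce the scalable clustering described in the paper.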

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A. and Mylopoulos, Y. A. 2005. A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, vol. 81, pp. 175-187. doi:10.1177/0037549705053172

• Picardi, A. C. and Saeed, K. 1979. The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109-118.

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING
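Stock-and-flow structures are the building block of the system dynamics models listed earlier under variable-oriented simulation. As a hypothetical illustration of how a water-policy question can be framed that way, and not a reconstruction of either cited study, the Python sketch below tracks a single groundwater stock with a constant yearly recharge (inflow) and a withdrawal (outflow) driven by demand that grows at a fixed rate each year. All parameter values are assumptions.

```python
def simulate_aquifer(stock, recharge, demand, growth, years):
    """Minimal stock-and-flow model of one groundwater stock (hypothetical).

    stock: initial water in the aquifer; recharge: inflow per year;
    demand: desired withdrawal in the first year; growth: yearly growth rate of demand.
    Returns the stock level at the end of each year (never below zero).
    """
    levels = []
    for _ in range(years):
        withdrawal = min(demand, stock + recharge)  # cannot pump more than is available
        stock = stock + recharge - withdrawal       # stock changes by inflow minus outflow
        levels.append(stock)
        demand *= 1 + growth                        # demand grows each year
    return levels


if __name__ == "__main__":
    # Illustrative units (e.g. million cubic metres); not data from the cited studies.
    levels = simulate_aquifer(stock=1000, recharge=50, demand=60, growth=0.05, years=40)
    depleted = next((year for year, level in enumerate(levels, 1) if level == 0), None)
    print("Aquifer depleted in year:", depleted)
```

Policy experiments in this style change a parameter, for example slowing demand growth, and compare the resulting trajectories of the stock.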

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V. and De Pryck, K. 2014. Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July-December 2014, vol. 1, 2053951714543804, first published on August 5, 2014. doi:10.1177/2053951714543804

CLIMATE DIPLOMACY MAPPING

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, pages 246–253. ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: 10.1515/itit-2014-1046, September 2014.

• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

PREDICTING ELECTIONS
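The paper compares popularity signals such as Wikipedia page views and Google search volumes with electoral outcomes. One simple way to make such counts comparable with vote shares, sketched below in Python as a hypothetical illustration rather than the authors' method, is to convert the counts into shares and measure their mean absolute error against the actual vote shares. The party names and numbers in the example are invented.

```python
def to_shares(counts):
    """Convert raw counts (e.g. page views per party) into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not all be zero")
    return {party: value / total for party, value in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual shares."""
    parties = predicted.keys() & actual.keys()  # compare only parties present in both
    if not parties:
        raise ValueError("no parties in common")
    return sum(abs(predicted[p] - actual[p]) for p in parties) / len(parties)


if __name__ == "__main__":
    # Invented figures, for illustration only.
    page_views = {"Party A": 5200, "Party B": 3100, "Party C": 1700}
    vote_shares = {"Party A": 0.45, "Party B": 0.35, "Party C": 0.20}
    predicted = to_shares(page_views)
    print({party: round(share, 2) for party, share in predicted.items()})
    print("Mean absolute error:", round(mean_absolute_error(predicted, vote_shares), 3))
```

A low error on one election says little on its own; as the abstract notes, representativeness and the political context determine how far such signals generalise.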

• DIGIVAALIT 2015

• http://www.hiit.fi/digivaalit-2015

• Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)

• Trying to understand how media is used and how the public agenda is set

• CITIZEN MINDSCAPES

• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila

• Diving deep into the unscoped virtual territories of a nation’s collective consciousness may reveal something remarkable. The Finnish, hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed into research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum’s owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• Listen to “The Trust Engineers” podcast by Radiolab

• http://www.radiolab.org/story/trust-engineers

• Think about and discuss the ethical research issues raised by what you heard

ETHICS

• Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.

• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325-346.

• Anderson, C. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

• Einav, L. and Levin, J. 2014. The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf

• King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719-721.

• Wallach, H. 2014. Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments?

Twitter: @laurieloranta

Page 11: Introduction to Computational Social Science - Lecture 1

LITERATUREADDITIONAL READING

bull There will be additional reading given for each lecture

bull Research articles on the topic at hand some will be given for ldquohomework

readingrdquo

bull The full list of articles can be found at

httpblogshelsinkificomputationalsocialscience

bull Write a short research plan where you apply a computational social

science method to a research problem

bull Length 8 pages for Masterrsquos students 10 pages for PhD students

bull Focus on research method lt-gt research data lt-gt research problem

bull How to write a research plan general instructions

bull httpwwwutaficmtendoctoralstudiesapplyTutkimussuunnitelmaohje

et_EN5B15Dpdf

bull httpsintoaaltofidisplayendoctoraltaikResearch+Plan

ASSIGNMENTGENERAL

bull Assignment DL is Friday 2102015 at EODMidnight

bull All assignments are returned in PDF-format

bull How to save my work in pdf-format You can rdquoSave as PDFrdquo or rdquoPrint to PDFrdquo in MS

Word

bull Include your name student ID and contact details

bull Assignments are returned to the lecturer Lauri Eloranta via email

firstname dot lastname helsinkifi

bull Grading is done in one monthrsquos time and you will receive the study

credits on or before 30102015

ASSIGNMENTHOW TO RETURN THE ASSIGNMENT

bull Contains six course covering different aspects of computational social

science

bull Full stydy block 25-30 op

bull Basic courses (mandatory)

bull Introduction to Computational Social Science (5 op) (I period)

bull Introduction to Programming in Social Science (5 op) (II period)

bull Special courses

bull Data extraction (5 op) (IV period)

bull Network Analysis (5 op) (in 2016 ndash 2017)

bull Complex Systems (5 op) (III period)

bull Simulation (5 op) (in 2016 ndash 2017)

COMPUTATIONAL SOCIAL SCIENCE STUDY BLOCK

WHAT IS COMPUTATIONAL SOCIAL SCIENCE

ldquoIn short a computational social science is

emerging [field] that leverages the capacity

to collect and analyze data with an

unprecedented breadth and depth and

scalerdquo (Lazer et al 2009)

Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull ldquoIn short a computational social science is emerging [field] that

leverages the capacity to collect and analyze data with an

unprecedented breadth and depth and scalerdquo

bull Lazer D et al 2009 Computational Social Science Science 6 February

2009 Vol 323 no 5915 pp 721-723

LAZER ET AL 2009

bull ldquoThe increasing integration of technology into our lives has created

unprecedented volumes of data on societyrsquos everyday behaviour Such

data opens up exciting new opportunities to work towards a quantitative

understanding of our complex social systems within the realms of a

new discipline known as Computational Social Science Against a

background of financial crises riots and international epidemics the

urgent need for a greater comprehension of the complexity of our

interconnected global society and an ability to apply such insights in

policy decisions is clear (Conte et al 2012)

bull Conte R 2012 Manifesto of Computational Social Science The

European Physical Journal Special Topics November 2012 Vol 214

Issue 1 pp 325-346

CSS MANIFESTO(CONTE ET AL 2012)

bull ldquoComputational social science refers to the academic sub-disciplines

concerned with computational approaches to the social sciences Fields

include computational economics and computational sociology

It is a multi-disciplinary and integrated approach to social survey

focusing on information processing by means of advanced information

technology The computational tasks include the analysis of social

networks and social geographic systemsrdquo

bull (Wikipedia 2015 httpenwikipediaorgwikiComputational_social_science)

WIKIPEDIA

bull ldquoThe new field of Computational Social Science can be

defined as the interdisciplinary investigation of the social

universe of many scales ranging from individual actors to

the largest groupings through the medium of computationrdquo

(Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla Claudio (2014) Introduction to Computational Social Science Springer-Verlag London

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE ldquoPOINT AND LINE TO (MULTIPLE) PLANE(S)rdquo RODRIGO CARVALHO

IS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE ldquoTATEL TELESCOPErdquo BY EP_JHUIS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

• Listen to "The Trust Engineers" podcast by Radiolab
• http://www.radiolab.org/story/trust-engineers
• Think about and discuss the ethical research issues raised by what you heard

ETHICS

• Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.
• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325-346.
• Anderson, C. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• Einav, L. and Levin, J. 2014. The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf
• King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719-721.
• Wallach, H. 2014. Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta


• Write a short research plan in which you apply a computational social science method to a research problem
• Length: 8 pages for Master's students, 10 pages for PhD students
• Focus on research method <-> research data <-> research problem
• How to write a research plan, general instructions:
• httpwwwutaficmtendoctoralstudiesapplyTutkimussuunnitelmaohjeet_EN5B15Dpdf
• https://into.aalto.fi/display/endoctoraltaik/Research+Plan

ASSIGNMENT: GENERAL

• The assignment deadline is Friday 2.10.2015 at EOD (midnight)
• All assignments are returned in PDF format
• How to save my work in PDF format? You can "Save as PDF" or "Print to PDF" in MS Word
• Include your name, student ID and contact details
• Assignments are returned to the lecturer Lauri Eloranta via email: firstname dot lastname at helsinki.fi
• Grading is done within one month, and you will receive the study credits on or before 30.10.2015

ASSIGNMENT: HOW TO RETURN THE ASSIGNMENT

• Contains six courses covering different aspects of computational social science
• Full study block: 25-30 op
• Basic courses (mandatory)
• Introduction to Computational Social Science (5 op) (I period)
• Introduction to Programming in Social Science (5 op) (II period)
• Special courses
• Data extraction (5 op) (IV period)
• Network Analysis (5 op) (in 2016-2017)
• Complex Systems (5 op) (III period)
• Simulation (5 op) (in 2016-2017)

COMPUTATIONAL SOCIAL SCIENCE STUDY BLOCK

WHAT IS COMPUTATIONAL SOCIAL SCIENCE?

"In short, a computational social science is emerging [field] that leverages the capacity to collect and analyze data with an unprecedented breadth and depth and scale." (Lazer et al. 2009)

Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.

• "In short, a computational social science is emerging [field] that leverages the capacity to collect and analyze data with an unprecedented breadth and depth and scale."
• Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.

LAZER ET AL. 2009

• "The increasing integration of technology into our lives has created unprecedented volumes of data on society's everyday behaviour. Such data opens up exciting new opportunities to work towards a quantitative understanding of our complex social systems, within the realms of a new discipline known as Computational Social Science. Against a background of financial crises, riots and international epidemics, the urgent need for a greater comprehension of the complexity of our interconnected global society and an ability to apply such insights in policy decisions is clear." (Conte et al. 2012)
• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325-346.

CSS MANIFESTO (CONTE ET AL. 2012)

• "Computational social science refers to the academic sub-disciplines concerned with computational approaches to the social sciences. Fields include computational economics and computational sociology. It is a multi-disciplinary and integrated approach to social survey focusing on information processing by means of advanced information technology. The computational tasks include the analysis of social networks and social geographic systems."
• (Wikipedia 2015, http://en.wikipedia.org/wiki/Computational_social_science)

WIKIPEDIA

• "The new field of Computational Social Science can be defined as the interdisciplinary investigation of the social universe of many scales, ranging from individual actors to the largest groupings, through the medium of computation." (Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla, Claudio (2014). Introduction to Computational Social Science. Springer-Verlag, London.

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE "POINT AND LINE TO (MULTIPLE) PLANE(S)" BY RODRIGO CARVALHO IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE. SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE "TATEL TELESCOPE" BY EP_JHU IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE. SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.

IT IS FOREMOST AN

[Figure: diagram relating COMPUTER SCIENCE, SOCIAL SCIENCE and STATISTICS to COMPUTATIONAL SOCIAL SCIENCE]

[Figure: trends over time (x-axis: Time; y-axis: from Less to More) for the following:]
• Speed and performance of IT (CPU, RAM, network)
• Access to IT and the Internet
• Amount of data generated
• Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE "HOME VISIT" BY NICOLAS NOVAIS IS UNDER A CREATIVE COMMONS LICENSE. SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.

MAJOR QUESTIONS REGARDING RESEARCH ETHICS

THE BACKGROUND IMAGE "CAMÉRA DE SURVEILLANCE" BY TRISTAN NITOT IS UNDER A CREATIVE COMMONS LICENSE. SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.

COMPUTATIONAL SOCIAL SCIENCE IS NOT A SILVER BULLET

THE BACKGROUND IMAGE "9MM BULLET BW" BY AN NGUYEN IS UNDER A CREATIVE COMMONS LICENSE. SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.

Computational Social Science offers revolutionary opportunities for the social sciences, but it still faces challenges in relation to methods, interdisciplinary cooperation and research ethics.

1. Solving increasingly complex problems: the problems of the global world are complex, and computational methods might help to solve them
2. The rise of data: the amount of data has exploded during the 21st century
3. IT and the instrumental revolution: all the new tools and possibilities
4. Complex systems: modeling our dynamic organisations and societies
5. Social networks: modeling human behavior as networks
6. Making predictions and simulations: predicting the future from the past
7. Interdisciplinary field (social sciences, math, computer science…)
8. Many problems and challenges, especially regarding research ethics

CSS COMPONENTS

• The information processing paradigm has two aspects in relation to CSS:
1. Information processing is substantive to the complex systems of society that CSS researches. This means that information processing takes part in the formation and evolution of complex systems.
2. Information processing is methodological in the sense that it serves as the core instrument of CSS.

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

1. BIG DATA & AUTOMATED INFORMATION EXTRACTION
2. SOCIAL NETWORK ANALYSIS
3. COMPLEX SYSTEMS & MODELING
4. SIMULATION

THE MAIN AREAS OF CSS

• Areas of Computational Social Science:
1. (Big) data & automated data extraction
• Generate, retrieve, sort, modify, transform… data
2. Social networks
• Network analysis and social networks
3. Social complexity
• Social complexity, complex adaptive systems, complex systems modeling
4. Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

• Data and automated information extraction can be seen as the foundation for the other areas of CSS
• Raw data can be used as:
1. Data for its own sake, as research data -> data is the subject of research
2. Data for modeling or validating other phenomena via e.g. network analysis, complex systems analysis or simulation
• Data is generated, retrieved, modified, transformed… for research purposes via computational automation

BIG DATA & AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)
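As a minimal sketch of the retrieve, transform and sort steps described above, the Python snippet below extracts hashtags from a few raw text posts and counts them. The posts in `RAW_POSTS` are invented placeholders, and the hashtag pattern `#\w+` is an assumption; a real pipeline would first retrieve its data from an API or a scraped corpus.

```python
import re
from collections import Counter

# Hypothetical raw posts standing in for retrieved social media data.
RAW_POSTS = [
    "Voting today! #vaalit2015 #finland",
    "Debate tonight #vaalit2015",
    "Nice weather in Helsinki #finland",
]

# Assumed definition of a hashtag: '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")


def extract_hashtags(post: str) -> list[str]:
    """Transform step: pull lower-cased hashtags out of one raw post."""
    return [tag.lower() for tag in HASHTAG.findall(post)]


def count_hashtags(posts: list[str]) -> Counter:
    """Aggregate step: count hashtag occurrences across all posts."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(extract_hashtags(post))
    return counts


if __name__ == "__main__":
    # Sort step: most frequent hashtags first.
    for tag, n in count_hashtags(RAW_POSTS).most_common():
        print(tag, n)
```

The resulting counts could then serve either as research data in their own right or as input to network analysis or simulation, matching the two uses of raw data listed above.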

• A long tradition in network analysis (a much older field than CSS)
• Social networks (Facebook, Twitter etc.) are just one part of network analysis
• Many other social interactions can be modeled as networks -> thus social networks are not technology-dependent as such
• -> e.g. modeling a family as a network
• -> e.g. modeling a project as a network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)
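To illustrate the idea of modeling a family as a network, the sketch below stores a small, entirely hypothetical family (`FAMILY_TIES`) as an undirected adjacency structure and computes each member's degree, i.e. the number of direct ties. Dedicated libraries such as NetworkX provide the same operations, and many more, at scale.

```python
from collections import defaultdict

# Hypothetical family ties, treated as undirected edges.
FAMILY_TIES = [
    ("Grandma", "Mother"),
    ("Mother", "Father"),
    ("Mother", "Child"),
    ("Father", "Child"),
    ("Mother", "Aunt"),
]


def build_network(edges):
    """Return an adjacency mapping: person -> set of directly tied people."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degrees(graph):
    """Degree = number of direct ties each person has."""
    return {person: len(neighbours) for person, neighbours in graph.items()}


if __name__ == "__main__":
    family = build_network(FAMILY_TIES)
    # Print members from most to least connected.
    for person, d in sorted(degrees(family).items(), key=lambda kv: -kv[1]):
        print(person, d)
```

The same structure works unchanged for a project organisation: only the list of edges changes, which is the sense in which social networks are not tied to any particular technology.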

• Society seen as a complex adaptive system
• Phase transitions
• Adaptation (a multi-stage process)
• Need -> intent -> capacity -> implementation
• Goal
• Information processing in many parts of complex adaptive systems
• To help adaptation: allocating resources, coordination…
• Family as a complex adaptive system
• Development, hardships, births, deaths, successes, failures
• Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• Three types of systems:
1. Natural systems
2. Human systems
3. Artificial systems
• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature
• Humans pursue the strategy of building artifacts to achieve goals
• Two kinds of artificial systems working in synergy:
• Tangible (e.g. roads, buildings)
• Intangible (e.g. organisations, social structures)

SIMON'S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• A large (and old) research field
• Two main areas of simulation:
1. Variable-oriented models
• System dynamics models (e.g. modeling a nuclear plant)
• Queuing models (e.g. modeling how a box office line behaves)
2. Object-oriented models
• Cellular automata (e.g. Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life, http://pmav.eu/stuff/javascript-game-of-life-v3.1.1)
• Agent-based models (e.g. modeling the communication of a project organisation of many individuals)
• Also evolutionary models

SIMULATION

(Cioffi-Revilla 2014)
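The cellular automaton mentioned above, Conway's Game of Life, can be sketched in a few lines of Python. This version is one possible implementation among many: it represents live cells as a set of (row, column) coordinates on an unbounded grid and applies the standard rules. A live cell survives with two or three live neighbours, and a dead cell comes alive with exactly three.

```python
from collections import Counter

Cell = tuple[int, int]


def step(live: set[Cell]) -> set[Cell]:
    """Advance the Game of Life by one generation."""
    # Count the live neighbours of every cell adjacent to a live cell.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival with 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


if __name__ == "__main__":
    # A "blinker": three cells in a row flip between horizontal and vertical.
    blinker = {(1, 0), (1, 1), (1, 2)}
    print(sorted(step(blinker)))  # [(0, 1), (1, 1), (2, 1)]
```

Each cell is an "object" following purely local rules, and global patterns emerge from their interaction. This is the same object-oriented logic that agent-based models extend with richer agents.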

• 4 main areas of Computational Social Science:
1. Big data and automatic information extraction
2. Social networks
3. Social complexity
4. Simulation
• Typically, all of these work together
• CSS has many problems, especially concerning privacy and ethics
• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods.

SUMMARY

SOME RESEARCH EXAMPLES

• Tracking and predicting how flu or other contagious diseases spread
• Based on network and social media analysis and modeling
• Many different variations; one of the first was Google Flu Trends, based on flu-related search queries
• For example:
• Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., Liu, B. 2011. Predicting Flu Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702-707, 10-15 April 2011.

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC
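A classic epidemiological starting point for modeling disease spread is the SIR (susceptible-infected-recovered) compartment model. In it, new infections per day are beta·S·I/N and new recoveries are gamma·I. The sketch below integrates the model with simple daily Euler steps. The function name `simulate_sir` and the values of `beta`, `gamma` and the population size are illustrative assumptions, not parameters from the cited studies.

```python
def simulate_sir(population: int, infected: int, beta: float, gamma: float, days: int):
    """Discrete-time (Euler, dt = 1 day) SIR model.

    Returns a list of (S, I, R) tuples, one per day including day 0.
    """
    s, i, r = float(population - infected), float(infected), 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameters only: R0 = beta / gamma = 3.
    run = simulate_sir(population=10_000, infected=10, beta=0.3, gamma=0.1, days=160)
    peak_day, (_, peak_i, _) = max(enumerate(run), key=lambda item: item[1][1])
    print(f"Peak of about {peak_i:.0f} infected on day {peak_day}")
```

Network- and social-media-based approaches refine this picture, for example by replacing the uniform mixing term beta·S·I/N with contacts along an observed network, or by calibrating the curve against real-time signals.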

• http://www.google.org/flutrends/intl/en_us

GOOGLE FLU TRENDS
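A commonly described approach for query-based flu "nowcasting" is a linear fit on the logit scale: logit(P) = b0 + b1·logit(Q). Here Q is the share of flu-related search queries and P is the share of doctor visits for influenza-like illness (ILI). The sketch below fits this model by ordinary least squares on made-up weekly numbers (`QUERY_SHARE`, `ILI_SHARE`). It illustrates the idea only and is not Google's production system.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical weekly data: share of flu-related queries (Q)
# and share of doctor visits for influenza-like illness (P).
QUERY_SHARE = [0.0010, 0.0015, 0.0022, 0.0030, 0.0041]
ILI_SHARE = [0.010, 0.014, 0.021, 0.029, 0.040]

b0, b1 = fit_line([logit(q) for q in QUERY_SHARE], [logit(p) for p in ILI_SHARE])


def nowcast(query_share: float) -> float:
    """Estimate this week's ILI visit share from this week's query share."""
    return inv_logit(b0 + b1 * logit(query_share))


if __name__ == "__main__":
    print(f"b0={b0:.3f}, b1={b1:.3f}, estimate for Q=0.0025: {nowcast(0.0025):.4f}")
```

Query data are available days before official surveillance figures, which is the appeal of this approach. The fitted relationship can drift, however, when search behaviour changes for reasons unrelated to illness.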

• Leskovec, J., Backstrom, L., Kleinberg, J. 2009. Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497-506, 2009 (dl.acm.org).
• Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days - the time scale at which we perceive news and events.
• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.

MODELING NEWS CYCLE DYNAMICS
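Leskovec et al. cluster textual variants of phrases with their own scalable algorithm, described in the paper. As a much simpler stand-in that shows the general idea of grouping variants, the sketch below links two phrases when their word sets overlap strongly. The overlap measure is Jaccard similarity, with an assumed threshold of 0.5, and linked phrases are merged by single-linkage union-find. The function names, threshold and example quotes are all hypothetical.

```python
import re
from itertools import combinations


def words(phrase: str) -> frozenset[str]:
    """Lower-cased set of words in a phrase, ignoring punctuation."""
    return frozenset(re.findall(r"[a-z']+", phrase.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_variants(phrases: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Single-linkage clustering: phrases share a cluster if a chain of
    pairwise Jaccard similarities >= threshold connects them."""
    parent = list(range(len(phrases)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [words(p) for p in phrases]
    for i, j in combinations(range(len(phrases)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters: dict[int, list[str]] = {}
    for i, phrase in enumerate(phrases):
        clusters.setdefault(find(i), []).append(phrase)
    return list(clusters.values())


if __name__ == "__main__":
    quotes = [
        "lipstick on a pig",
        "you can put lipstick on a pig",
        "you can put lipstick on a pig but it's still a pig",
        "yes we can",
    ]
    for group in cluster_variants(quotes):
        print(group)
```

Comparing every pair is quadratic in the number of phrases, which is exactly why web-scale meme tracking needs the more scalable algorithms the paper develops.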

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A., Mylopoulos, Y. A. 2005. A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, 81: 175-187. doi:10.1177/0037549705053172
• Picardi, A. C. and Saeed, K. 1979. The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109-118.

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING
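A hybrid agent-based water demand model is far richer than can be shown here, but the core agent-based idea can be sketched. In this sketch, each household agent adjusts its consumption in response to a price signal and to the average behaviour of the other households. Everything here is a hypothetical assumption chosen for illustration and is not the model of Athanasiadis et al.: the `Household` class, the price elasticity of -0.3, the social-influence weight of 0.2 and the baseline demands.

```python
import random
from dataclasses import dataclass


@dataclass
class Household:
    """Hypothetical household agent with a baseline daily demand in litres."""
    baseline: float
    demand: float = 0.0

    def update(self, price: float, neighbour_mean: float,
               elasticity: float = -0.3, influence: float = 0.2) -> None:
        # Price response: constant-elasticity adjustment relative to price 1.0.
        own = self.baseline * price ** elasticity
        # Social influence: move part of the way towards the average demand.
        self.demand = (1 - influence) * own + influence * neighbour_mean


def simulate(n_households: int, prices: list[float], seed: int = 1) -> list[float]:
    """Return total demand per time step for a given price path."""
    rng = random.Random(seed)
    agents = [Household(baseline=rng.uniform(100, 200)) for _ in range(n_households)]
    for agent in agents:
        agent.demand = agent.baseline
    totals = []
    for price in prices:
        # Synchronous update: all agents react to last step's average.
        mean_demand = sum(a.demand for a in agents) / len(agents)
        for agent in agents:
            agent.update(price, mean_demand)
        totals.append(sum(a.demand for a in agents))
    return totals


if __name__ == "__main__":
    # Price rises by 50 % after three steps.
    print([round(t) for t in simulate(100, [1.0, 1.0, 1.0, 1.5, 1.5, 1.5])])
```

Unlike a single aggregate equation, each agent here carries its own state. Heterogeneous households and interaction rules can therefore be added without changing the overall simulation loop.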

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V., De Pryck, K. 2014. Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July-December 2014, 1: 2053951714543804, first published on August 5, 2014. doi:10.1177/2053951714543804

CLIMATE DIPLOMACY MAPPING

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, pp. 246-253. ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: 10.1515/itit-2014-1046, September 2014.
• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data we present popularity dynamics from real case examples of recent elections in three different countries.

PREDICTING ELECTIONS
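One simple way to turn page-view or search-query counts into a popularity signal, in the spirit of the study above, is to divide each party's daily count by the total across all parties. The sketch below does this and smooths the shares with a trailing moving average. The party names, the counts in `DAILY_VIEWS` and the 3-day window are hypothetical, and the authors' actual methodology may differ.

```python
from statistics import mean

# Hypothetical daily Wikipedia page views per party.
DAILY_VIEWS = {
    "Party A": [1200, 1500, 1300, 1800, 2100],
    "Party B": [800, 700, 900, 1000, 900],
    "Party C": [400, 500, 450, 600, 700],
}


def daily_shares(views: dict[str, list[int]]) -> dict[str, list[float]]:
    """Share of total attention each party receives on each day."""
    days = len(next(iter(views.values())))
    totals = [sum(series[d] for series in views.values()) for d in range(days)]
    return {
        party: [series[d] / totals[d] for d in range(days)]
        for party, series in views.items()
    }


def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Trailing moving average; early days use the values available so far."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


if __name__ == "__main__":
    for party, shares in daily_shares(DAILY_VIEWS).items():
        print(party, [f"{s:.2f}" for s in moving_average(shares)])
```

Attention shares like these are only a proxy. The representativeness problems the abstract mentions (who uses Wikipedia or Google, and why) apply directly to any such signal.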

• DIGIVAALIT 2015
• http://www.hiit.fi/digivaalit-2015
• Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)
• Trying to understand how media is used and how the public agenda is set

• CITIZEN MINDSCAPES
• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila
• Diving deep into the unscoped virtual territories of a nation's collective consciousness may reveal something remarkable. The hugely popular Finnish Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed for research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum's owner company, Aller, and researchers at the National Consumer Research Center plan to make use of this immense database.

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• Listen to "The Trust Engineers" podcast by Radiolab
• http://www.radiolab.org/story/trust-engineers
• Think about and discuss the ethical research issues raised by what you heard

ETHICS

• Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.
• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325-346.
• Anderson, C. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• Einav, L. and Levin, J. 2014. The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf
• King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719-721.
• Wallach, H. 2014. Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta

Page 13: Introduction to Computational Social Science - Lecture 1

bull Assignment DL is Friday 2102015 at EODMidnight

bull All assignments are returned in PDF-format

bull How to save my work in pdf-format You can rdquoSave as PDFrdquo or rdquoPrint to PDFrdquo in MS

Word

bull Include your name student ID and contact details

bull Assignments are returned to the lecturer Lauri Eloranta via email

firstname dot lastname helsinkifi

bull Grading is done in one monthrsquos time and you will receive the study

credits on or before 30102015

ASSIGNMENTHOW TO RETURN THE ASSIGNMENT

bull Contains six course covering different aspects of computational social

science

bull Full stydy block 25-30 op

bull Basic courses (mandatory)

bull Introduction to Computational Social Science (5 op) (I period)

bull Introduction to Programming in Social Science (5 op) (II period)

bull Special courses

bull Data extraction (5 op) (IV period)

bull Network Analysis (5 op) (in 2016 ndash 2017)

bull Complex Systems (5 op) (III period)

bull Simulation (5 op) (in 2016 ndash 2017)

COMPUTATIONAL SOCIAL SCIENCE STUDY BLOCK

WHAT IS COMPUTATIONAL SOCIAL SCIENCE

ldquoIn short a computational social science is

emerging [field] that leverages the capacity

to collect and analyze data with an

unprecedented breadth and depth and

scalerdquo (Lazer et al 2009)

Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull ldquoIn short a computational social science is emerging [field] that

leverages the capacity to collect and analyze data with an

unprecedented breadth and depth and scalerdquo

bull Lazer D et al 2009 Computational Social Science Science 6 February

2009 Vol 323 no 5915 pp 721-723

LAZER ET AL 2009

bull ldquoThe increasing integration of technology into our lives has created

unprecedented volumes of data on societyrsquos everyday behaviour Such

data opens up exciting new opportunities to work towards a quantitative

understanding of our complex social systems within the realms of a

new discipline known as Computational Social Science Against a

background of financial crises riots and international epidemics the

urgent need for a greater comprehension of the complexity of our

interconnected global society and an ability to apply such insights in

policy decisions is clear (Conte et al 2012)

bull Conte R 2012 Manifesto of Computational Social Science The

European Physical Journal Special Topics November 2012 Vol 214

Issue 1 pp 325-346

CSS MANIFESTO(CONTE ET AL 2012)

bull ldquoComputational social science refers to the academic sub-disciplines

concerned with computational approaches to the social sciences Fields

include computational economics and computational sociology

It is a multi-disciplinary and integrated approach to social survey

focusing on information processing by means of advanced information

technology The computational tasks include the analysis of social

networks and social geographic systemsrdquo

bull (Wikipedia 2015 httpenwikipediaorgwikiComputational_social_science)

WIKIPEDIA

bull ldquoThe new field of Computational Social Science can be

defined as the interdisciplinary investigation of the social

universe of many scales ranging from individual actors to

the largest groupings through the medium of computationrdquo

(Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla Claudio (2014) Introduction to Computational Social Science Springer-Verlag London

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE ldquoPOINT AND LINE TO (MULTIPLE) PLANE(S)rdquo RODRIGO CARVALHO

IS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE ldquoTATEL TELESCOPErdquo BY EP_JHUIS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

bull Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull Conte R 2012 Manifesto of Computational Social Science The European Physical Journal Special Topics November 2012 Vol 214 Issue 1 pp 325-346

bull Anderson C 2008 The End of Theory The Data Deluge Makes the Scientific Method Obsolete Wired httparchivewiredcomsciencediscoveriesmagazine16-07pb_theory

bull Einav L and Levin J 2014 The Data Revolution and Economic Analysis In Innovation Policy and the Economy edited by Josh Lerner and Scott Stern httpwebstanfordedu~leinavpubsIPE2014pdf

bull King G 2011 Ensuring the Data-Rich Future of the Social Sciences Science 11 February 2011 Vol 331 no 6018 pp 719-721

bull Wallach H 2014 Big Data Machine Learning and the Social Sciences Fairness Accountability and Transparency Mediumcom httpsmediumcomhannawallachbig-data-machine-learning-and-thesocial-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta


• The study block contains six courses covering different aspects of computational social science

• Full study block: 25–30 op

• Basic courses (mandatory)

• Introduction to Computational Social Science (5 op) (I period)

• Introduction to Programming in Social Science (5 op) (II period)

• Special courses

• Data Extraction (5 op) (IV period)

• Network Analysis (5 op) (in 2016–2017)

• Complex Systems (5 op) (III period)

• Simulation (5 op) (in 2016–2017)

COMPUTATIONAL SOCIAL SCIENCE STUDY BLOCK

WHAT IS COMPUTATIONAL SOCIAL SCIENCE?

“In short, a computational social science is emerging [field] that leverages the capacity to collect and analyze data with an unprecedented breadth and depth and scale” (Lazer et al. 2009)

Lazer, D. et al. (2009). Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721–723.


LAZER ET AL 2009

• “The increasing integration of technology into our lives has created unprecedented volumes of data on society’s everyday behaviour. Such data opens up exciting new opportunities to work towards a quantitative understanding of our complex social systems, within the realms of a new discipline known as Computational Social Science. Against a background of financial crises, riots and international epidemics, the urgent need for a greater comprehension of the complexity of our interconnected global society and an ability to apply such insights in policy decisions is clear.” (Conte et al. 2012)

• Conte, R. et al. (2012). Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325–346.

CSS MANIFESTO (CONTE ET AL. 2012)

• “Computational social science refers to the academic sub-disciplines concerned with computational approaches to the social sciences. Fields include computational economics and computational sociology. It is a multi-disciplinary and integrated approach to social survey focusing on information processing by means of advanced information technology. The computational tasks include the analysis of social networks and social geographic systems.”

• (Wikipedia 2015, http://en.wikipedia.org/wiki/Computational_social_science)

WIKIPEDIA

• “The new field of Computational Social Science can be defined as the interdisciplinary investigation of the social universe of many scales, ranging from individual actors to the largest groupings, through the medium of computation.” (Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla, Claudio (2014). Introduction to Computational Social Science. Springer-Verlag, London.

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE “POINT AND LINE TO (MULTIPLE) PLANE(S)” BY RODRIGO CARVALHO IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE.

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE “TATEL TELESCOPE” BY EP_JHU IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE.

IT IS FOREMOST AN …

[Figure: computational social science shown in relation to computer science, social science and statistics]

[Figure: change over time (vertical axis from less to more) in: speed and performance of IT (CPU, RAM, network); access to IT and the Internet; amount of data generated; cost of IT]

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE “HOME VISIT” BY NICOLAS NOVAIS IS UNDER A CREATIVE COMMONS LICENSE.

MAJOR QUESTIONS REGARDING RESEARCH ETHICS

THE BACKGROUND IMAGE “CAMÉRA DE SURVEILLANCE” BY TRISTAN NITOT IS UNDER A CREATIVE COMMONS LICENSE.

COMPUTATIONAL SOCIAL SCIENCE IS NOT A SILVER BULLET

THE BACKGROUND IMAGE “9MM BULLET BW” BY AN NGUYEN IS UNDER A CREATIVE COMMONS LICENSE.

Computational social science offers revolutionary opportunities for the social sciences, but it still faces challenges regarding methods, interdisciplinary cooperation and research ethics.

1. Solving increasingly complex problems: the problems of the global world are complex, and computational methods might be able to help solve them.

2. The rise of data: the amount of data has exploded during the 21st century.

3. IT and the instrumental revolution: new tools and possibilities.

4. Complex systems: modeling our dynamic organisations and societies.

5. Social networks: modeling human behavior as networks.

6. Making predictions and simulations: predicting the future from the past.

7. An interdisciplinary field (social sciences, math, computer science…).

8. Many problems and challenges, especially regarding research ethics.

CSS COMPONENTS

• The information processing paradigm has two aspects in relation to CSS:

1. Information processing is substantive to the complex systems of society that CSS researches. This means that information processing takes part in the formation and evolution of complex systems.

2. Information processing is methodological, in the sense that it serves as the core instrument of CSS.

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

1. BIG DATA & AUTOMATED INFORMATION EXTRACTION

2. SOCIAL NETWORK ANALYSIS

3. COMPLEX SYSTEMS & MODELING

4. SIMULATION

THE MAIN AREAS OF CSS

• Areas of Computational Social Science:

1. (Big) data & automated data extraction

• Generate, retrieve, sort, modify, transform… data

2. Social networks

• Network analysis and social networks

3. Social complexity

• Social complexity, complex adaptive systems, complex systems modeling

4. Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

• Data and automated information extraction can be seen as the foundation for the other areas of CSS.

• Raw data can be used as:

1. Data for its own sake, as research data -> the data itself is the subject of research

2. Data for modeling or validating other phenomena, via e.g. network analysis, complex systems analysis or simulation

• Data is generated, retrieved, modified, transformed… for research purposes via computational automation.

BIG DATA & AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)
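The slide describes data as being generated, retrieved, transformed and sorted "via computational automation". A small, hedged sketch of that idea follows. It turns raw text posts into structured records and counts hashtags. The sample posts, the hashtag "vaalit2015" and the helper names `extract` and `hashtag_counts` are hypothetical, made up for illustration. In real research the raw posts would come from an API or an archive, with that platform's own terms and formats.

```python
import re
from collections import Counter

# Hypothetical raw posts (illustration only, not real data).
RAW_POSTS = [
    "Voting today! #vaalit2015 @yle",
    "Debate tonight #vaalit2015 #politics",
    "Nothing to do with elections",
]

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def extract(post):
    """Transform one raw post into a structured record (hashtags and mentions lowercased)."""
    return {
        "text": post,
        "hashtags": [h.lower() for h in HASHTAG.findall(post)],
        "mentions": [m.lower() for m in MENTION.findall(post)],
    }


def hashtag_counts(posts):
    """Count hashtags across all posts, most common first."""
    counts = Counter()
    for record in map(extract, posts):
        counts.update(record["hashtags"])
    return counts.most_common()


if __name__ == "__main__":
    for tag, n in hashtag_counts(RAW_POSTS):
        print(tag, n)
```

The result of `extract` could either be studied in its own right (data as the subject of research) or passed on as input to network analysis or simulation. These are the two uses listed above.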

• Network analysis has a long tradition (it is a much older field than CSS).

• Online social networks (Facebook, Twitter, etc.) are just one part of network analysis.

• Many other social interactions can be modeled as networks -> social networks as such are not technology-dependent.

• -> e.g. modeling a family as a network

• -> e.g. modeling a project as a network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)
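To make "modeling a family as a network" concrete, here is a minimal sketch. It uses a plain adjacency map and no graph library. The family members and ties are hypothetical. Degree centrality is used here only as one simple network measure. It is not taken from the lecture.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected network as an adjacency map: node -> set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


# Hypothetical family: each tie stands for regular contact between two members.
FAMILY_TIES = [
    ("grandmother", "mother"),
    ("mother", "father"),
    ("mother", "daughter"),
    ("father", "daughter"),
    ("father", "son"),
]

if __name__ == "__main__":
    family = build_network(FAMILY_TIES)
    ranked = sorted(degree_centrality(family).items(), key=lambda kv: -kv[1])
    for person, centrality in ranked:
        print(f"{person:12s} {centrality:.2f}")
```

The same code works for a project organisation: swap the family members for team members and the ties for communication links. No social media platform is involved, which is the point of the slide.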

• Society seen as a complex adaptive system

• Phase transitions

• Adaptation (a multi-stage process)

• Need -> intent -> capacity -> implementation

• Goal

• Information processing in many parts of complex adaptive systems

• To help adaptation: allocating resources, coordination…

• Family as a complex adaptive system

• Development: hardships, births, deaths, successes, failures

• Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• Three types of systems:

1. Natural systems

2. Human systems

3. Artificial systems

• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature.

• Humans pursue the strategy of building artifacts to achieve goals.

• Two kinds of artificial systems work in synergy:

• Tangible (e.g. roads, buildings)

• Intangible (e.g. organisations, social structures)

SIMON’S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• A large (and old) research field

• Two main areas of simulation:

1. Variable-oriented models

• System dynamics models (e.g. modeling a nuclear plant)

• Queuing models (e.g. modeling how a box office line behaves)

2. Object-oriented models

• Cellular automata (e.g. Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life, http://pmav.eu/stuff/javascript-game-of-life-v3.1.1)

• Agent-based models (e.g. modeling the communication in a project organisation of many individuals)

• Also: evolutionary models

SIMULATION

(Cioffi-Revilla 2014)
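The Game of Life listed above is a standard example of an object-oriented (cellular automaton) model. Each cell follows simple local rules, and global patterns emerge from them. The minimal sketch below applies Conway's usual rules: a dead cell with exactly three live neighbours is born, and a live cell with two or three live neighbours survives. Live cells are stored as a set of coordinates on an unbounded grid. That storage choice is a convenience of this sketch, not part of the linked implementations.

```python
from collections import Counter


def step(live):
    """Advance Conway's Game of Life by one generation.

    `live` is a set of (row, col) coordinates of live cells on an unbounded grid.
    """
    # Count live neighbours for every cell adjacent to at least one live cell.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival of a live cell with 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


if __name__ == "__main__":
    blinker = {(1, 0), (1, 1), (1, 2)}  # horizontal line of three cells
    print(sorted(step(blinker)))  # [(0, 1), (1, 1), (2, 1)]: the line turns vertical
```

Each cell reacts only to its neighbours, and nothing in the code describes the oscillating "blinker" pattern directly. That is the sense in which such models generate macro-level behaviour from micro-level rules.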

• Four main areas of computational social science:

1. Big data and automated information extraction

2. Social networks

3. Social complexity

4. Simulation

• Typically, all of these work together.

• CSS has many problems, especially concerning privacy and ethics.

• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods.

SUMMARY

SOME RESEARCH EXAMPLES

• Tracking and predicting how flu or other contagious diseases spread

• Based on network and social media analysis and modeling

• Many different variations; one of the first was Google Flu Trends, based on flu-related search queries

• For example:

• Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., Liu, B. (2011). Predicting Flu Trends using Twitter Data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702–707, 10–15 April 2011.

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC
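The "epidemiology classic" behind disease-spread modeling is the compartmental SIR model, in which people move from susceptible (S) to infected (I) to recovered (R). The sketch below is a discrete-time version, with one simple Euler step per day. It is not the method of the Twitter or Google Flu Trends studies cited above. Those studies estimate flu activity from data, whereas this sketch only shows the mechanistic model such data are often compared with. The parameter values are hypothetical.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model: one Euler step per day.

    beta  -- transmission rate (infections per infected person per day,
             scaled by the susceptible fraction of the population)
    gamma -- recovery rate (fraction of infected people who recover per day)
    Returns a list of (S, I, R) tuples, one per day including day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters: beta / gamma = 3 is the basic reproduction number R0.
    run = simulate_sir(population=10_000, initial_infected=10, beta=0.3, gamma=0.1, days=160)
    peak_day = max(range(len(run)), key=lambda d: run[d][1])
    print(f"peak on day {peak_day}: {run[peak_day][1]:.0f} infected")
```

Data-driven approaches like those above are often used to estimate or check the timing of the peak that such a model predicts.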

• http://www.google.org/flutrends/intl/en_us

GOOGLE FLU TRENDS

• Leskovec, J., Backstrom, L., Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497–506.

• “Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days – the time scale at which we perceive news and events.

• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.”

MODELING NEWS CYCLE DYNAMICS
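The abstract mentions "clustering textual variants" of short phrases. The sketch below shows only the basic intuition behind that idea. Each phrase is grouped under the longest phrase that contains it as a contiguous word sequence. It is a deliberately simplified stand-in, not the algorithm of Leskovec et al., whose method is more involved and built to scale. The sample phrases and the helper names are hypothetical.

```python
import re


def normalise(phrase):
    """Lowercase, drop punctuation and split into a tuple of words."""
    return tuple(re.findall(r"[a-z0-9']+", phrase.lower()))


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n = len(shorter)
    return any(longer[k:k + n] == shorter for k in range(len(longer) - n + 1))


def cluster_variants(phrases):
    """Group each phrase under the longest earlier phrase that contains it."""
    norm = {p: normalise(p) for p in phrases}
    roots, clusters = [], {}
    # Longest phrases first, so they become the cluster roots.
    for p in sorted(phrases, key=lambda q: -len(norm[q])):
        root = next((r for r in roots if contains(norm[r], norm[p])), None)
        if root is None:
            roots.append(p)
            clusters[p] = [p]
        else:
            clusters[root].append(p)
    return clusters


# Hypothetical quoted phrases as they might appear across news sites and blogs.
PHRASES = [
    "You can put lipstick on a pig, but it's still a pig",
    "lipstick on a pig",
    "put lipstick on a pig",
    "Yes we can",
]

if __name__ == "__main__":
    for root, members in cluster_variants(PHRASES).items():
        print(root, "->", members)
```

Once variants are grouped like this, counting how often each cluster appears per day gives a rough picture of the rise and fade described in the abstract.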

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A., Mylopoulos, Y. A. (2005). A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, 81: 175–187. doi:10.1177/0037549705053172

• Picardi, A. C. and Saeed, K. (1979). The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109–118.

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V., De Pryck, K. (2014). Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July–December 2014, 1: 2053951714543804, first published on August 5, 2014. doi:10.1177/2053951714543804

CLIMATE DIPLOMACY MAPPING

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, pp. 246–253. ISSN (Online) 2196-7032, ISSN (Print) 1611-2776. DOI: 10.1515/itit-2014-1046, September 2014.

• “Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.”

PREDICTING ELECTIONS
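The abstract talks about monitoring "popularity dynamics" from Wikipedia page views. A minimal, hedged way to turn raw view counts into something comparable across parties is to do two things. First, normalise each day's views into a share of total attention. Second, smooth the daily series with a moving average. The party names and counts below are hypothetical. This is an illustration, not the paper's method. As the abstract itself warns, attention shares are not vote shares and may be unrepresentative.

```python
def attention_share(page_views):
    """Normalise raw page-view counts into shares of total attention (summing to 1)."""
    total = sum(page_views.values())
    if total == 0:
        raise ValueError("no page views recorded")
    return {name: views / total for name, views in page_views.items()}


def moving_average(series, window):
    """Trailing moving average; the first points average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for k in range(len(series)):
        chunk = series[max(0, k - window + 1): k + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical daily Wikipedia page-view counts for three parties.
DAILY_VIEWS = {
    "Party A": [4800, 5100, 5600, 5200],
    "Party B": [3300, 3200, 2900, 3100],
    "Party C": [1900, 1700, 1500, 1700],
}

if __name__ == "__main__":
    last_day = {party: views[-1] for party, views in DAILY_VIEWS.items()}
    for party, share in attention_share(last_day).items():
        smoothed = moving_average(DAILY_VIEWS[party], window=3)[-1]
        print(f"{party}: {share:.0%} of views on the last day, 3-day average {smoothed:.0f}")
```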

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

bull Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull Conte R 2012 Manifesto of Computational Social Science The European Physical Journal Special Topics November 2012 Vol 214 Issue 1 pp 325-346

bull Anderson C 2008 The End of Theory The Data Deluge Makes the Scientific Method Obsolete Wired httparchivewiredcomsciencediscoveriesmagazine16-07pb_theory

bull Einav L and Levin J 2014 The Data Revolution and Economic Analysis In Innovation Policy and the Economy edited by Josh Lerner and Scott Stern httpwebstanfordedu~leinavpubsIPE2014pdf

bull King G 2011 Ensuring the Data-Rich Future of the Social Sciences Science 11 February 2011 Vol 331 no 6018 pp 719-721

bull Wallach H 2014 Big Data Machine Learning and the Social Sciences Fairness Accountability and Transparency Mediumcom httpsmediumcomhannawallachbig-data-machine-learning-and-thesocial-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

twitter laurieloranta

Page 15: Introduction to Computational Social Science - Lecture 1

WHAT IS COMPUTATIONAL SOCIAL SCIENCE

ldquoIn short a computational social science is

emerging [field] that leverages the capacity

to collect and analyze data with an

unprecedented breadth and depth and

scalerdquo (Lazer et al 2009)

Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull ldquoIn short a computational social science is emerging [field] that

leverages the capacity to collect and analyze data with an

unprecedented breadth and depth and scalerdquo

bull Lazer D et al 2009 Computational Social Science Science 6 February

2009 Vol 323 no 5915 pp 721-723

LAZER ET AL 2009

bull ldquoThe increasing integration of technology into our lives has created

unprecedented volumes of data on societyrsquos everyday behaviour Such

data opens up exciting new opportunities to work towards a quantitative

understanding of our complex social systems within the realms of a

new discipline known as Computational Social Science Against a

background of financial crises riots and international epidemics the

urgent need for a greater comprehension of the complexity of our

interconnected global society and an ability to apply such insights in

policy decisions is clear (Conte et al 2012)

bull Conte R 2012 Manifesto of Computational Social Science The

European Physical Journal Special Topics November 2012 Vol 214

Issue 1 pp 325-346

CSS MANIFESTO(CONTE ET AL 2012)

bull ldquoComputational social science refers to the academic sub-disciplines

concerned with computational approaches to the social sciences Fields

include computational economics and computational sociology

It is a multi-disciplinary and integrated approach to social survey

focusing on information processing by means of advanced information

technology The computational tasks include the analysis of social

networks and social geographic systemsrdquo

bull (Wikipedia 2015 httpenwikipediaorgwikiComputational_social_science)

WIKIPEDIA

bull ldquoThe new field of Computational Social Science can be

defined as the interdisciplinary investigation of the social

universe of many scales ranging from individual actors to

the largest groupings through the medium of computationrdquo

(Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla Claudio (2014) Introduction to Computational Social Science Springer-Verlag London

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE ldquoPOINT AND LINE TO (MULTIPLE) PLANE(S)rdquo RODRIGO CARVALHO

IS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE ldquoTATEL TELESCOPErdquo BY EP_JHUIS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

bull Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull Conte R 2012 Manifesto of Computational Social Science The European Physical Journal Special Topics November 2012 Vol 214 Issue 1 pp 325-346

bull Anderson C 2008 The End of Theory The Data Deluge Makes the Scientific Method Obsolete Wired httparchivewiredcomsciencediscoveriesmagazine16-07pb_theory

bull Einav L and Levin J 2014 The Data Revolution and Economic Analysis In Innovation Policy and the Economy edited by Josh Lerner and Scott Stern httpwebstanfordedu~leinavpubsIPE2014pdf

bull King G 2011 Ensuring the Data-Rich Future of the Social Sciences Science 11 February 2011 Vol 331 no 6018 pp 719-721

bull Wallach H 2014 Big Data Machine Learning and the Social Sciences Fairness Accountability and Transparency Mediumcom httpsmediumcomhannawallachbig-data-machine-learning-and-thesocial-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

twitter laurieloranta

Page 16: Introduction to Computational Social Science - Lecture 1

ldquoIn short a computational social science is

emerging [field] that leverages the capacity

to collect and analyze data with an

unprecedented breadth and depth and

scalerdquo (Lazer et al 2009)

Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull ldquoIn short a computational social science is emerging [field] that

leverages the capacity to collect and analyze data with an

unprecedented breadth and depth and scalerdquo

bull Lazer D et al 2009 Computational Social Science Science 6 February

2009 Vol 323 no 5915 pp 721-723

LAZER ET AL 2009

bull ldquoThe increasing integration of technology into our lives has created

unprecedented volumes of data on societyrsquos everyday behaviour Such

data opens up exciting new opportunities to work towards a quantitative

understanding of our complex social systems within the realms of a

new discipline known as Computational Social Science Against a

background of financial crises riots and international epidemics the

urgent need for a greater comprehension of the complexity of our

interconnected global society and an ability to apply such insights in

policy decisions is clear (Conte et al 2012)

bull Conte R 2012 Manifesto of Computational Social Science The

European Physical Journal Special Topics November 2012 Vol 214

Issue 1 pp 325-346

CSS MANIFESTO(CONTE ET AL 2012)

bull ldquoComputational social science refers to the academic sub-disciplines

concerned with computational approaches to the social sciences Fields

include computational economics and computational sociology

It is a multi-disciplinary and integrated approach to social survey

focusing on information processing by means of advanced information

technology The computational tasks include the analysis of social

networks and social geographic systemsrdquo

bull (Wikipedia 2015 httpenwikipediaorgwikiComputational_social_science)

WIKIPEDIA

bull ldquoThe new field of Computational Social Science can be

defined as the interdisciplinary investigation of the social

universe of many scales ranging from individual actors to

the largest groupings through the medium of computationrdquo

(Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla Claudio (2014) Introduction to Computational Social Science Springer-Verlag London

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE ldquoPOINT AND LINE TO (MULTIPLE) PLANE(S)rdquo RODRIGO CARVALHO

IS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE ldquoTATEL TELESCOPErdquo BY EP_JHUIS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

[Diagram: COMPUTER SCIENCE, SOCIAL SCIENCE, STATISTICS, COMPUTATIONAL SOCIAL SCIENCE]

[Chart: axes Time and Less/More]

• Speed and performance of IT (CPU, RAM, network)

• Access to IT, Internet

• Amount of data generated

• Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE “HOME VISIT” BY NICOLAS NOVA IS UNDER A CREATIVE COMMONS LICENSE. SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.

MAJOR QUESTIONS REGARDING RESEARCH ETHICS

THE BACKGROUND IMAGE “CAMÉRA DE SURVEILLANCE” BY TRISTAN NITOT IS UNDER A CREATIVE COMMONS LICENSE. SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.

COMPUTATIONAL SOCIAL SCIENCE IS NOT A SILVER BULLET

THE BACKGROUND IMAGE “9MM BULLET BW” BY AN NGUYEN IS UNDER A CREATIVE COMMONS LICENSE. SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.

Computational Social Science offers revolutionary opportunities for the social sciences, but it still faces challenges concerning methods, interdisciplinary cooperation and research ethics.

1. Solving increasingly complex problems: the problems of the global world are complex, and computational methods might be able to solve these complex issues.

2. The rise of data: the amount of data has exploded during the 21st century.

3. IT and instrumental revolution: all the new tools and possibilities.

4. Complex systems: modeling our dynamic organisations and societies.

5. Social networks: modeling human behavior as networks.

6. Making predictions and simulations: predicting the future from the past.

7. Interdisciplinary field (social sciences, math, computer science…).

8. Many problems and challenges, especially regarding research ethics.

CSS COMPONENTS

• The information processing paradigm has two aspects in relation to CSS:

1. Information processing is substantive to the complex systems of society that CSS researches. This means that information processing takes part in the formation and evolution of complex systems.

2. Information processing is methodological in the sense that it serves as the core instrument of CSS.

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

1. BIG DATA & AUTOMATED INFORMATION EXTRACTION

2. SOCIAL NETWORK ANALYSIS

3. COMPLEX SYSTEMS & MODELING

4. SIMULATION

THE MAIN AREAS OF CSS

• Areas of Computational Social Science:

1. (Big) Data & automated data extraction

• Generate, retrieve, sort, modify, transform… data

2. Social Networks

• Network analysis and social networks

3. Social Complexity

• Social complexity, complex adaptive systems, complex systems modeling

4. Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

• Data and automated information extraction can be seen as the foundation for the other areas of CSS

• Raw data can be used as:

1. Data for its own sake, as research data -> the data itself is the subject of research

2. Data for modeling or validating other phenomena via e.g. network analysis, complex systems analysis or simulation

• Data is generated, retrieved, modified, transformed… for research purposes via computational automation

BIG DATA & AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)
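Here is a minimal sketch of the "retrieve, sort, transform" step. It turns a handful of raw text posts into a hashtag frequency table. The posts in `RAW_POSTS` are invented, and the function names `extract_hashtags` and `hashtag_counts` are hypothetical. A real pipeline would first retrieve the records from an archive or API.

```python
from __future__ import annotations

import re
from collections import Counter

# Invented example records; in a real study these would be retrieved from an API or archive.
RAW_POSTS = [
    "Voting today! #vaalit2015 #Finland",
    "Debate tonight #Vaalit2015",
    "Nice weather in Helsinki",
]

# A hashtag is '#' followed by one or more word characters.
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> list[str]:
    """Extract: pull the hashtags out of one post, normalised to lower case."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def hashtag_counts(posts: list[str]) -> Counter:
    """Transform: turn unstructured posts into a hashtag frequency table."""
    counts = Counter()
    for post in posts:
        counts.update(extract_hashtags(post))
    return counts


if __name__ == "__main__":
    for tag, n in hashtag_counts(RAW_POSTS).most_common():
        print(tag, n)
```

Run as a script on the three sample posts, this prints `vaalit2015 2` and then `finland 1`. The third post contributes nothing.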

• A long tradition in network analysis (a much older field than CSS)

• Social networks (Facebook, Twitter etc.) are just one part of network analysis

• Many other social interactions can be modeled as networks -> thus social networks are not technology-dependent as such

• -> e.g. modeling a family as a network

• -> e.g. modeling a project as a network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)
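Modeling a family as a network can be sketched in plain Python with an adjacency map. The people and ties in `FAMILY_EDGES` are hypothetical. The sketch computes each person's degree (number of direct ties) and the shortest path between two people with breadth-first search. Dedicated libraries such as NetworkX provide these measures and many more.

```python
from collections import deque

# Hypothetical family network: each edge is a relationship between two people.
FAMILY_EDGES = [
    ("Anna", "Bertil"),   # spouses
    ("Anna", "Carla"),    # parent-child
    ("Bertil", "Carla"),  # parent-child
    ("Carla", "David"),   # spouses
    ("David", "Eero"),    # parent-child
]


def build_graph(edges):
    """Undirected adjacency sets: person -> set of directly connected people."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Number of direct ties per person."""
    return {person: len(ties) for person, ties in graph.items()}


def distance(graph, start, goal):
    """Fewest steps between two people (breadth-first search); None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


if __name__ == "__main__":
    g = build_graph(FAMILY_EDGES)
    print(degree(g))
    print("Anna -> Eero:", distance(g, "Anna", "Eero"))
```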

• Society seen as a complex adaptive system

• Phase transitions

• Adaptation (a multi-stage process)

• Need -> intent -> capacity -> implementation

• Goal

• Information processing in many parts of complex adaptive systems

• To help adaptation: allocating resources, coordination…

• The family as a complex adaptive system

• Development, hardships, births, deaths, successes, failures

• Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)
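The multi-stage adaptation process (need -> intent -> capacity -> implementation) can be read as a chain in which each stage must succeed before the next one matters. The toy model below encodes that reading. The `Situation` fields and the true/false simplification are assumptions made for illustration; this is not Cioffi-Revilla's formal model. A richer version could attach a probability to each stage instead of a boolean.

```python
from dataclasses import dataclass

# The stages of adaptation, in the order they must be passed.
STAGES = ("need", "intent", "capacity", "implementation")


@dataclass
class Situation:
    need: bool            # has a change created a need to adapt?
    intent: bool          # do the actors intend to respond?
    capacity: bool        # do they have the resources to respond?
    implementation: bool  # was the response actually carried out?


def adaptation_outcome(s: Situation) -> str:
    """Return 'adapted' if every stage is met, otherwise name the first stage that fails."""
    for stage in STAGES:
        if not getattr(s, stage):
            return f"stops at {stage}"
    return "adapted"


# Hypothetical family facing a loss of income: it wants to adapt but lacks the means.
print(adaptation_outcome(Situation(need=True, intent=True, capacity=False, implementation=False)))
# stops at capacity
```

A situation with no need at all simply stops at the first stage, which here stands for "no adaptation required".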

• Three types of systems:

1. Natural systems

2. Human systems

3. Artificial systems

• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature

• Humans pursue the strategy of building artifacts to achieve goals

• Two kinds of artificial systems work in synergy:

• Tangible (e.g. roads, buildings)

• Intangible (e.g. organisations, social structures)

SIMON’S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• Large (and old) research field

• Two main areas of simulation:

1. Variable-oriented models

• System dynamics models (e.g. modeling a nuclear plant)

• Queuing models (e.g. modeling how a box office line behaves)

2. Object-oriented models

• Cellular automata (e.g. Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life, http://pmav.eu/stuff/javascript-game-of-life-v3.1.1)

• Agent-based models (e.g. modeling the communication of a project organisation of many individuals)

• Also evolutionary models

SIMULATION

(Cioffi-Revilla 2014)
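Cellular automata are the easiest object-oriented models to try out. The sketch below computes one generation of Conway's Game of Life using the standard rules:

- A dead cell with exactly three live neighbours is born.
- A live cell with two or three live neighbours survives.
- All other cells die or stay dead.

Only the coordinates of live cells are stored, so the grid is unbounded. The function name `life_step` is our own.

```python
from __future__ import annotations

from collections import Counter


def life_step(live: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """One generation of Conway's Game of Life; `live` holds (row, col) of living cells."""
    # Count, for every cell next to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival with 2 or 3; everything else is dead.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# A "blinker": three cells in a row that flip between horizontal and vertical.
blinker = {(1, 0), (1, 1), (1, 2)}
print(sorted(life_step(blinker)))  # [(0, 1), (1, 1), (2, 1)]
```

Each cell follows only a local rule, yet global patterns (oscillators, gliders) emerge. This is the property that makes such models interesting for social simulation.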

• Four main areas of Computational Social Science:

1. Big data and automatic information extraction

2. Social networks

3. Social complexity

4. Simulation

• Typically, all of these work together

• CSS has many problems, especially concerning privacy and ethics

• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods.

SUMMARY

SOME RESEARCH EXAMPLES

• Tracking and predicting how flu or other contagious diseases spread

• Based on network and social media analysis and modeling

• Many different variations; one of the first was Google Flu Trends, based on flu-related search queries

• For example:

• Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., Liu, B. (2011). Predicting Flu Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702-707, 10-15 April 2011.

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC
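The classic way to model contagion is the compartmental SIR model, in which people move from Susceptible to Infected to Recovered. Below is a minimal discrete-time sketch that takes one Euler step per day. The population size and the rates `beta` (transmission) and `gamma` (recovery) are invented values, not estimates from any study cited here.

```python
def simulate_sir(population, initially_infected, beta, gamma, days):
    """Discrete-time SIR model using one Euler step per day.

    beta  -- transmission rate per day (hypothetical value supplied by the caller)
    gamma -- recovery rate per day; 1 / gamma is roughly the mean infectious period
    Returns a list of (S, I, R) tuples, one per day including day 0.
    """
    s = float(population - initially_infected)
    i = float(initially_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population  # contacts between S and I
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    trajectory = simulate_sir(population=1000, initially_infected=1, beta=0.3, gamma=0.1, days=160)
    peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
    print("infections peak on day", peak_day)
```

A daily step is only a rough approximation. With very large rates it can overshoot, so a smaller time step or an ODE solver would be the safer choice in real work.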

• http://www.google.org/flutrends/intl/en_us

GOOGLE FLU TRENDS
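Query-based flu tracking rests on the intuition that search volume for flu-related terms rises and falls with actual flu activity. A first, very rough check of that intuition is the Pearson correlation between the two weekly series, sketched below. The numbers are invented, and this is not the model Google Flu Trends actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant series has zero spread and raises ZeroDivisionError here.
    return cov / (spread_x * spread_y)


# Invented weekly figures for illustration only; not Google Flu Trends data.
flu_query_volume = [120, 150, 310, 520, 480, 260, 140]
reported_flu_cases = [30, 41, 80, 140, 125, 70, 35]

if __name__ == "__main__":
    print(round(pearson(flu_query_volume, reported_flu_cases), 3))
```

A high correlation on past data does not by itself guarantee accurate predictions. Search behaviour can change for reasons unrelated to illness, such as media coverage.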

• Leskovec, J., Backstrom, L., Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497-506. dl.acm.org

• Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days - the time scale at which we perceive news and events.

• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.

MODELING NEWS CYCLE DYNAMICS
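Leskovec et al. cluster textual variants of short quoted phrases. Their algorithm is more sophisticated: it works on a graph that links shorter phrases to longer phrases containing them. The sketch below swaps in a much simpler technique, greedy clustering by word overlap (Jaccard similarity), just to show what grouping variants means in practice. The phrases, the 0.5 threshold and the function names are illustrative assumptions.

```python
import re

# Invented example quotes, two groups of textual variants.
PHRASES = [
    "yes we can",
    "Yes, we can!",
    "yes we can, yes we can",
    "lipstick on a pig",
    "you can put lipstick on a pig",
]


def tokens(phrase: str) -> set:
    """Lower-case word set of a phrase, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", phrase.lower()))


def jaccard(a: set, b: set) -> float:
    """Share of words two phrases have in common (1.0 for two empty phrases)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def cluster_variants(phrases, threshold=0.5):
    """Greedy single pass: each phrase joins the first cluster whose founding
    phrase is similar enough, otherwise it founds a new cluster."""
    clusters = []  # list of (token set of the founding phrase, member phrases)
    for phrase in phrases:
        toks = tokens(phrase)
        for founder, members in clusters:
            if jaccard(toks, founder) >= threshold:
                members.append(phrase)
                break
        else:  # no existing cluster was close enough
            clusters.append((toks, [phrase]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    for group in cluster_variants(PHRASES):
        print(group)
```

The result depends on phrase order and on the threshold. That sensitivity is one reason the original paper uses a more careful graph-based method.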

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A., Mylopoulos, Y. A. (2005). A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, 81: 175-187. doi:10.1177/0037549705053172

• Picardi, A. C. and Saeed, K. (1979). The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109-118.

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING
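Water demand is a natural candidate for a system dynamics (stock-and-flow) model. The sketch below is a generic one-stock example, not the model from either paper above. A reservoir has a constant inflow and an outflow driven by a growing population's demand, and it is integrated with simple Euler steps. All parameter names and values are hypothetical.

```python
def simulate_reservoir(initial_storage, inflow, demand_per_capita, population,
                       growth_rate, years, dt=0.25):
    """Euler integration of one stock (stored water) with a constant inflow and
    an outflow equal to a growing population's demand.

    Units are arbitrary but must be consistent. Returns (time, storage) pairs.
    """
    t, storage, pop = 0.0, float(initial_storage), float(population)
    history = [(t, storage)]
    for _ in range(int(round(years / dt))):
        outflow = demand_per_capita * pop
        # Stock changes by net flow * dt; clamped so it cannot go below zero.
        storage = max(0.0, storage + (inflow - outflow) * dt)
        pop += growth_rate * pop * dt  # exponential population growth
        t += dt
        history.append((t, storage))
    return history


if __name__ == "__main__":
    run = simulate_reservoir(initial_storage=100, inflow=10, demand_per_capita=1,
                             population=10, growth_rate=0.1, years=20)
    for t, storage in run[::8]:  # print every second year
        print(f"year {t:4.1f}: storage {storage:7.2f}")
```

The `max(0.0, ...)` clamp is a crude way to keep the stock non-negative. A fuller model would add feedback, for example rationing or pricing that lowers demand as storage falls.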

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V., De Pryck, K. (2014). Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July-December 2014, 1: 2053951714543804, first published on August 5, 2014. doi:10.1177/2053951714543804

CLIMATE DIPLOMACY MAPPING

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, pp. 246-253. ISSN (Online) 2196-7032, ISSN (Print) 1611-2776. DOI: 10.1515/itit-2014-1046, September 2014.

• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

PREDICTING ELECTIONS
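The simplest version of the idea in the abstract treats each party's share of Wikipedia page views as a forecast of its vote share and then measures the error. The sketch below does this with invented numbers for hypothetical parties. As the abstract stresses, such data suffer from representativeness problems, so this naive mapping is at most a baseline for comparison, not a forecasting method.

```python
def view_shares(page_views: dict) -> dict:
    """Convert raw page-view counts per party into shares that sum to 1."""
    total = sum(page_views.values())
    if total == 0:
        raise ValueError("no page views to normalise")
    return {name: views / total for name, views in page_views.items()}


def mean_absolute_error(predicted: dict, actual: dict) -> float:
    """Average absolute gap between predicted and actual shares, over the parties in `actual`."""
    return sum(abs(predicted[name] - actual[name]) for name in actual) / len(actual)


# Invented numbers for three hypothetical parties; not real election data.
page_views = {"Party A": 52_000, "Party B": 31_000, "Party C": 17_000}
vote_shares = {"Party A": 0.45, "Party B": 0.35, "Party C": 0.20}

if __name__ == "__main__":
    forecast = view_shares(page_views)
    print(forecast)
    print("mean absolute error:", round(mean_absolute_error(forecast, vote_shares), 3))
```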

• DIGIVAALIT 2015

• http://www.hiit.fi/digivaalit-2015

• Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)

• Trying to understand how media is used and how the public agenda is set

• CITIZEN MINDSCAPES

• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila

• Diving deep into the unscoped virtual territories of a nation’s collective consciousness may reveal something remarkable. The hugely popular Finnish Suomi24 discussion forum has 1.9 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed for research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum's owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• Listen to “The Trust Engineers” podcast by Radiolab

• http://www.radiolab.org/story/trust-engineers

• Think about and discuss the research ethics issues raised by what you heard

ETHICS

• Lazer, D. et al. (2009). Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.

• Conte, R. et al. (2012). Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325-346.

• Anderson, C. (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

• Einav, L. and Levin, J. (2014). The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf

• King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719-721.

• Wallach, H. (2014). Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta


Page 18: Introduction to Computational Social Science - Lecture 1

bull ldquoThe increasing integration of technology into our lives has created

unprecedented volumes of data on societyrsquos everyday behaviour Such

data opens up exciting new opportunities to work towards a quantitative

understanding of our complex social systems within the realms of a

new discipline known as Computational Social Science Against a

background of financial crises riots and international epidemics the

urgent need for a greater comprehension of the complexity of our

interconnected global society and an ability to apply such insights in

policy decisions is clear (Conte et al 2012)

bull Conte R 2012 Manifesto of Computational Social Science The

European Physical Journal Special Topics November 2012 Vol 214

Issue 1 pp 325-346

CSS MANIFESTO(CONTE ET AL 2012)

bull ldquoComputational social science refers to the academic sub-disciplines

concerned with computational approaches to the social sciences Fields

include computational economics and computational sociology

It is a multi-disciplinary and integrated approach to social survey

focusing on information processing by means of advanced information

technology The computational tasks include the analysis of social

networks and social geographic systemsrdquo

bull (Wikipedia 2015 httpenwikipediaorgwikiComputational_social_science)

WIKIPEDIA

bull ldquoThe new field of Computational Social Science can be

defined as the interdisciplinary investigation of the social

universe of many scales ranging from individual actors to

the largest groupings through the medium of computationrdquo

(Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla Claudio (2014) Introduction to Computational Social Science Springer-Verlag London

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE ldquoPOINT AND LINE TO (MULTIPLE) PLANE(S)rdquo RODRIGO CARVALHO

IS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE ldquoTATEL TELESCOPErdquo BY EP_JHUIS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• Three types of systems:

1. Natural systems

2. Human systems

3. Artificial systems

• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature

• Humans pursue the strategy of building artifacts to achieve goals

• Two kinds of artificial systems working in synergy:

• Tangible (e.g., roads, buildings)

• Intangible (e.g., organisations, social structures)

SIMON’S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• A large (and old) research field

• Two main areas of simulation:

1. Variable-oriented models

• System dynamics models (e.g., modeling a nuclear plant)

• Queuing models (e.g., modeling how a box office line behaves)

2. Object-oriented models

• Cellular automata (e.g., Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life, http://pmav.eu/stuff/javascript-game-of-life-v3.1.1/)

• Agent-based models (e.g., modeling the communication of a project organisation of many individuals)

• Also evolutionary models

SIMULATION

(Cioffi-Revilla 2014)
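Cellular automata are the simplest of the object-oriented models above. Here is a minimal sketch of one update step of Conway's Game of Life on a finite grid. Cells beyond the edge are treated as dead, which is an assumption; many implementations wrap the grid around instead. The rules: a live cell survives with 2 or 3 live neighbours, and a dead cell becomes alive with exactly 3.

```python
Grid = list[list[int]]  # 1 = live cell, 0 = dead cell


def live_neighbours(grid: Grid, r: int, c: int) -> int:
    """Count live cells among the 8 neighbours; cells outside the grid count as dead."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                total += grid[rr][cc]
    return total


def step(grid: Grid) -> Grid:
    """One synchronous update: every new state depends only on the current grid."""
    new_grid = []
    for r, row in enumerate(grid):
        new_row = []
        for c, cell in enumerate(row):
            n = live_neighbours(grid, r, c)
            survives = cell == 1 and n in (2, 3)
            born = cell == 0 and n == 3
            new_row.append(1 if survives or born else 0)
        new_grid.append(new_row)
    return new_grid


# A "blinker": three cells in a row that oscillate with period 2.
BLINKER = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    for row in step(BLINKER):
        print("".join("#" if cell else "." for cell in row))
```

One step turns the vertical blinker horizontal and a second step restores it. Complex global patterns emerge from purely local rules, which is what makes cellular automata interesting for social modeling.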

• 4 main areas of computational social science:

1. Big data and automatic information extraction

2. Social networks

3. Social complexity

4. Simulation

• Typically all of these work together

• CSS has a lot of problems, especially concerning privacy and ethics

• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods.

SUMMARY

SOME RESEARCH EXAMPLES

• Tracking and predicting how flu or other contagious diseases spread

• Based on network and social media analysis and modeling

• Many different variations; one of the first was Google Flu Trends, based on flu-related search queries

• For example:

• Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., Liu, B. (2011). Predicting Flu Trends using Twitter Data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702-707, 10-15 April 2011.

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC
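The classic epidemiological starting point is the SIR compartment model: susceptible (S), infected (I) and recovered (R) people, with dS/dt = -βSI/N, dI/dt = βSI/N - γI and dR/dt = γI. Below is a minimal discrete-time (Euler) sketch, assuming a closed, well-mixed population. The values of beta (transmission rate) and gamma (recovery rate) are illustrative, not estimates for any real disease. Network-based approaches relax the well-mixed assumption by letting infection travel only along contact ties.

```python
def simulate_sir(n: int, i0: int, beta: float, gamma: float,
                 days: int, dt: float = 0.1) -> list[tuple[float, float, float, float]]:
    """Euler integration of the SIR equations.

    Returns one (day, S, I, R) tuple per whole day. Assumes 1/dt is an integer.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    history = [(0.0, s, i, r)]
    steps_per_day = round(1 / dt)
    for step_no in range(1, days * steps_per_day + 1):
        new_infections = beta * s * i / n * dt   # S -> I flow during this step
        new_recoveries = gamma * i * dt          # I -> R flow during this step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        if step_no % steps_per_day == 0:
            history.append((float(step_no // steps_per_day), s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameters: basic reproduction number beta / gamma = 3.
    hist = simulate_sir(n=1000, i0=1, beta=0.3, gamma=0.1, days=160)
    peak_day, _, peak_i, _ = max(hist, key=lambda row: row[2])
    print(f"peak of about {peak_i:.0f} infected on day {peak_day:.0f}")
```

Because every flow leaves one compartment and enters another, S + I + R stays equal to N, which is a useful sanity check on any such simulation.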

• http://www.google.org/flutrends/intl/en_us

GOOGLE FLU TRENDS
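The method published for Google Flu Trends (Ginsberg et al., Nature, 2009) fitted a single linear relation on the log-odds scale, logit(P) = β0 + β1·logit(Q). Here P is the share of physician visits for influenza-like illness and Q is the share of flu-related search queries. Below is a minimal ordinary-least-squares sketch of that idea. The data are synthetic, generated from known coefficients only to check the fit; they are not Flu Trends data.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def fit_logit_linear(query_share: list[float], ili_share: list[float]) -> tuple[float, float]:
    """Least-squares fit of logit(ILI) = b0 + b1 * logit(query share)."""
    xs = [logit(q) for q in query_share]
    ys = [logit(p) for p in ili_share]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict_ili(b0: float, b1: float, query_share: float) -> float:
    """Predicted ILI share for a new week's query share."""
    return inv_logit(b0 + b1 * logit(query_share))


# Synthetic weekly query shares and ILI shares from b0 = 0.5, b1 = 0.9 (no noise).
QUERIES = [0.001, 0.002, 0.005, 0.01, 0.02]
ILI = [inv_logit(0.5 + 0.9 * logit(q)) for q in QUERIES]

if __name__ == "__main__":
    b0, b1 = fit_logit_linear(QUERIES, ILI)
    print(b0, b1, predict_ili(b0, b1, 0.015))
```

A fit like this only captures correlation with past search behaviour. If search habits change, the predictions drift, one concrete reason CSS results need validation against traditional data.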

• Leskovec, J., Backstrom, L., Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497-506 (dl.acm.org).

• Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events.

• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.

MODELING NEWS CYCLE DYNAMICS
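Leskovec et al. cluster textual variants of quoted phrases using a phrase graph that they partition. The sketch below substitutes a much simpler technique and is not the paper's algorithm. Two phrases are linked when they share a run of at least k consecutive words, and clusters are the connected components, found with union-find. The phrases and the choice k = 4 are hypothetical.

```python
import re


def shingles(phrase: str, k: int) -> set[tuple[str, ...]]:
    """All runs of k consecutive lower-cased words (empty if the phrase is shorter than k)."""
    words = re.findall(r"\w+", phrase.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def cluster_phrases(phrases: list[str], k: int = 4) -> list[list[str]]:
    """Group phrases that are connected through shared k-word runs."""
    parent = list(range(len(phrases)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index phrases by shingle so only phrases sharing a shingle get merged.
    owner: dict[tuple[str, ...], int] = {}
    for idx, phrase in enumerate(phrases):
        for sh in shingles(phrase, k):
            if sh in owner:
                union(owner[sh], idx)
            else:
                owner[sh] = idx

    groups: dict[int, list[str]] = {}
    for idx, phrase in enumerate(phrases):
        groups.setdefault(find(idx), []).append(phrase)
    return list(groups.values())


# Hypothetical quoted phrases collected from news articles and blogs.
QUOTES = [
    "We will build a better future together",
    "build a better future together for all",
    "a better future together",
    "the weather is nice today",
]

if __name__ == "__main__":
    for cluster in cluster_phrases(QUOTES):
        print(cluster)
```

Connected components can chain unrelated phrases through a common fragment. That weakness is one reason more careful partitioning methods are used at Web scale.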

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A., Mylopoulos, Y. A. (2005). A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, 81, 175-187. doi:10.1177/0037549705053172

• Picardi, A. C. and Saeed, K. (1979). The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109-118.

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V., De Pryck, K. (2014). Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July-December 2014, 1, 2053951714543804. First published on August 5, 2014. doi:10.1177/2053951714543804

CLIMATE DIPLOMACY MAPPING
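Digital maps of a debate are often built from co-occurrence. Two terms (issues, actors, countries) are linked when they appear in the same document, and the link is weighted by how often this happens. Below is a minimal sketch of that general technique, not the study's exact pipeline. The documents are hypothetical and already reduced to sets of tagged terms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical negotiation reports, each reduced to the set of issues it mentions.
DOCS = [
    {"adaptation", "finance", "developing countries"},
    {"mitigation", "finance"},
    {"adaptation", "finance", "loss and damage"},
]


def cooccurrence(docs: list[set[str]]) -> Counter:
    """Count, for each unordered pair of terms, the documents mentioning both."""
    weights: Counter = Counter()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):  # sorting gives each pair one key
            weights[(a, b)] += 1
    return weights


def strongest_links(weights: Counter, min_weight: int = 2) -> list:
    """Edges at or above a threshold, heaviest first (keeps a map readable)."""
    kept = [(pair, w) for pair, w in weights.items() if w >= min_weight]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(strongest_links(cooccurrence(DOCS)))
```

The resulting weighted edge list can be fed to any network layout tool. The threshold is an analytical choice that shapes what the map shows.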

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, Pages 246-253. ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: 10.1515/itit-2014-1046, September 2014.

• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large-scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

PREDICTING ELECTIONS
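A minimal sketch of turning a data stream such as Wikipedia page views into a popularity signal. It converts daily views per party into shares, then smooths each party's series with a trailing moving average. The party names and counts are invented. As the abstract stresses, such shares are not a representative sample of voters.

```python
def shares(views_by_party: dict[str, int]) -> dict[str, float]:
    """Convert one day's page-view counts into shares of the day's total."""
    total = sum(views_by_party.values())
    if total == 0:
        raise ValueError("no page views on this day")
    return {party: views / total for party, views in views_by_party.items()}


def moving_average(series: list[float], window: int) -> list[float]:
    """Trailing mean over up to `window` most recent values (damps daily noise)."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical daily page views for three parties' Wikipedia articles.
DAILY_VIEWS = [
    {"Party A": 600, "Party B": 300, "Party C": 100},
    {"Party A": 500, "Party B": 400, "Party C": 100},
    {"Party A": 400, "Party B": 500, "Party C": 100},
]

if __name__ == "__main__":
    party_a = [shares(day)["Party A"] for day in DAILY_VIEWS]
    print(moving_average(party_a, window=2))
```

Normalising to shares removes overall changes in attention, such as a spike on debate night, so that the series reflects relative interest between parties rather than total traffic.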

• DIGIVAALIT 2015

• http://www.hiit.fi/digivaalit-2015

• Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)

• Trying to understand how media is used and how the public agenda is set

• CITIZEN MINDSCAPES

• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila

• Diving deep into the unscoped virtual territories of a nation’s collective consciousness may reveal something remarkable. The hugely popular Finnish Suomi24 discussion forum has 1.9 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed for research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum’s owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• Listen to the Radiolab podcast “The Trust Engineers”

• http://www.radiolab.org/story/trust-engineers

• Think about and discuss different ethical research issues in relation to what you heard

ETHICS

• Lazer, D. et al. (2009). Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.

• Conte, R. (2012). Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325-346.

• Anderson, C. (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

• Einav, L. and Levin, J. (2014). The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf

• King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719-721.

• Wallach, H. (2014). Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta


• “Computational social science refers to the academic sub-disciplines concerned with computational approaches to the social sciences. Fields include computational economics and computational sociology. It is a multi-disciplinary and integrated approach to social survey focusing on information processing by means of advanced information technology. The computational tasks include the analysis of social networks and social geographic systems.”

• (Wikipedia 2015, http://en.wikipedia.org/wiki/Computational_social_science)

WIKIPEDIA

• “The new field of Computational Social Science can be defined as the interdisciplinary investigation of the social universe of many scales, ranging from individual actors to the largest groupings, through the medium of computation.”

(Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla, Claudio (2014). Introduction to Computational Social Science. Springer-Verlag, London.

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE “POINT AND LINE TO (MULTIPLE) PLANE(S)” BY RODRIGO CARVALHO IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE “TATEL TELESCOPE” BY EP_JHU IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE

IT IS FOREMOST AN

[Diagram: COMPUTATIONAL SOCIAL SCIENCE where COMPUTER SCIENCE, SOCIAL SCIENCE, and STATISTICS overlap]

[Chart over time: more speed and performance of IT (CPU, RAM, network), more access to IT and the Internet, and more data generated; less cost of IT]

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE “HOME VISIT” BY NICOLAS NOVA IS UNDER A CREATIVE COMMONS LICENSE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS

THE BACKGROUND IMAGE “CAMÉRA DE SURVEILLANCE” BY TRISTAN NITOT IS UNDER A CREATIVE COMMONS LICENSE

COMPUTATIONAL SOCIAL SCIENCE IS NOT A SILVER BULLET

THE BACKGROUND IMAGE “9MM BULLET BW” BY AN NGUYEN IS UNDER A CREATIVE COMMONS LICENSE

Computational social science offers revolutionary opportunities for the social sciences, but it still faces some challenges in relation to methods, interdisciplinary cooperation, and research ethics.

1. Solving increasingly complex problems: the problems of a global world are complex, and computational methods might be able to solve these complex issues

2. The rise of data: the amount of data has exploded during the 21st century

3. IT and the instrumental revolution: all the new tools and possibilities

4. Complex systems: modeling our dynamic organisations and societies

5. Social networks: modeling human behavior as networks

6. Making predictions and simulations: predicting the future from the past

7. Interdisciplinary field (social sciences, math, computer science…)

8. Many problems and challenges, especially regarding research ethics

CSS COMPONENTS

• The information processing paradigm has two aspects in relation to CSS:

1. Information processing is substantive to the complex systems of society that CSS researches. This means that information processing takes part in the formation and evolution of complex systems.

2. Information processing is methodological in the sense that it serves as the core instrument of CSS.

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

[Diagram: 1. BIG DATA & AUTOMATED INFORMATION EXTRACTION; 2. SOCIAL NETWORK ANALYSIS; 3. COMPLEX SYSTEMS & MODELING; 4. SIMULATION]

THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

bull Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull Conte R 2012 Manifesto of Computational Social Science The European Physical Journal Special Topics November 2012 Vol 214 Issue 1 pp 325-346

bull Anderson C 2008 The End of Theory The Data Deluge Makes the Scientific Method Obsolete Wired httparchivewiredcomsciencediscoveriesmagazine16-07pb_theory

bull Einav L and Levin J 2014 The Data Revolution and Economic Analysis In Innovation Policy and the Economy edited by Josh Lerner and Scott Stern httpwebstanfordedu~leinavpubsIPE2014pdf

bull King G 2011 Ensuring the Data-Rich Future of the Social Sciences Science 11 February 2011 Vol 331 no 6018 pp 719-721

bull Wallach H 2014 Big Data Machine Learning and the Social Sciences Fairness Accountability and Transparency Mediumcom httpsmediumcomhannawallachbig-data-machine-learning-and-thesocial-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

twitter laurieloranta

Page 20: Introduction to Computational Social Science - Lecture 1

bull ldquoThe new field of Computational Social Science can be

defined as the interdisciplinary investigation of the social

universe of many scales ranging from individual actors to

the largest groupings through the medium of computationrdquo

(Cioffi-Revilla 2014)

CIOFFI-REVILLA 2014

Cioffi-Revilla Claudio (2014) Introduction to Computational Social Science Springer-Verlag London

INCREASINGLY COMPLEX SOCIETY

THE BACKGROUND IMAGE ldquoPOINT AND LINE TO (MULTIPLE) PLANE(S)rdquo RODRIGO CARVALHO

IS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE ldquoTATEL TELESCOPErdquo BY EP_JHUIS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015



INCREASINGLY COMPLEX SOCIETY
THE BACKGROUND IMAGE "POINT AND LINE TO (MULTIPLE) PLANE(S)" BY RODRIGO CARVALHO IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE

IT IS FOREMOST AN INSTRUMENTAL REVOLUTION
THE BACKGROUND IMAGE "TATEL TELESCOPE" BY EP_JHU IS UNDER A NON-COMMERCIAL CREATIVE COMMONS LICENSE

Diagram: COMPUTER SCIENCE + SOCIAL SCIENCE + STATISTICS → COMPUTATIONAL SOCIAL SCIENCE

Diagram (axes: Time; More / Less):
• Speed and performance of IT (CPU, RAM, network)
• Access to IT, Internet
• Amount of data generated
• Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP
THE BACKGROUND IMAGE "HOME VISIT" BY NICOLAS NOVA IS UNDER A CREATIVE COMMONS LICENSE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS
THE BACKGROUND IMAGE "CAMÉRA DE SURVEILLANCE" BY TRISTAN NITOT IS UNDER A CREATIVE COMMONS LICENSE

COMPUTATIONAL SOCIAL SCIENCE IS NOT A SILVER BULLET
THE BACKGROUND IMAGE "9MM BULLET BW" BY AN NGUYEN IS UNDER A CREATIVE COMMONS LICENSE

Computational social science offers revolutionary opportunities for the social sciences, but it still faces challenges in relation to methods, interdisciplinary cooperation and research ethics.

CSS COMPONENTS
1. Solving increasingly complex problems: the problems of a global world are complex, and computational methods might help to solve them
2. The rise of data: the amount of data has exploded during the 21st century
3. IT and the instrumental revolution: all the new tools and possibilities
4. Complex systems: modeling our dynamic organisations and societies
5. Social networks: modeling human behavior as networks
6. Making predictions and simulations: predicting the future from the past
7. Interdisciplinary field (social sciences, math, computer science…)
8. Many problems and challenges, especially regarding research ethics

COMPUTATIONAL PARADIGM OF SOCIETY
• The information processing paradigm has two aspects in relation to CSS:
1. Information processing is substantive to the complex systems of society that CSS studies. This means that information processing takes part in the formation and evolution of complex systems.
2. Information processing is methodological, in the sense that it serves as the core instrument of CSS.
(Cioffi-Revilla 2014)

THE MAIN AREAS OF CSS
1. BIG DATA & AUTOMATED INFORMATION EXTRACTION
2. SOCIAL NETWORK ANALYSIS
3. COMPLEX SYSTEMS & MODELING
4. SIMULATION

FOUR MAIN AREAS OF CSS
• Areas of Computational Social Science:
1. (Big) data & automated data extraction
   • Generate, retrieve, sort, modify, transform… data
2. Social networks
   • Network analysis and social networks
3. Social complexity
   • Social complexity, complex adaptive systems, complex systems modeling
4. Simulation
(Cioffi-Revilla 2014)

BIG DATA & AUTOMATED INFORMATION EXTRACTION
• Data and automated information extraction can be seen as the foundation for the other areas of CSS
• Raw data can be used:
1. For its own sake, as research data → data is the subject of research
2. For modeling or validating other phenomena via e.g. network analysis, complex systems analysis or simulation
• Data is generated, retrieved, modified, transformed… for research purposes via computational automation
(Cioffi-Revilla 2014)
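The generate/retrieve/transform cycle described above can be sketched as a small automated pipeline. This is a minimal sketch. It assumes the raw data are short social-media-style posts already held in memory. The posts and the helper names (`retrieve`, `extract_hashtags`, `count_tags`) are hypothetical, and a real study would retrieve its data from an API or an archive instead.

```python
import re
from collections import Counter

# Hypothetical raw data; a real study would retrieve these from an API or archive.
RAW_POSTS = [
    "Turnout looks high today #vaalit2015 #election",
    "Debate tonight! #vaalit2015",
    "Nothing to do with politics",
    "   ",
]

HASHTAG = re.compile(r"#(\w+)")

def retrieve(posts):
    """Retrieve step: drop empty entries and trim surrounding whitespace."""
    return [p.strip() for p in posts if p.strip()]

def extract_hashtags(post):
    """Extraction step: pull lower-cased hashtags out of one post."""
    return [tag.lower() for tag in HASHTAG.findall(post)]

def count_tags(posts):
    """Transform step: turn raw text into a frequency table for analysis."""
    counts = Counter()
    for post in retrieve(posts):
        counts.update(extract_hashtags(post))
    return counts

print(count_tags(RAW_POSTS).most_common())
```

The output of such a step, a frequency table, is the kind of "data for its own sake" that can then feed network analysis or simulation.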

SOCIAL NETWORKS
• Network analysis has a long tradition (it is a much older field than CSS)
• Online social networks (Facebook, Twitter etc.) are just one part of network analysis
• Many other social interactions can be modeled as networks → social networks are not technology-dependent as such
  • e.g. modeling a family as a network
  • e.g. modeling a project as a network
(Cioffi-Revilla 2014)
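You can make the idea of modeling a family (or a project) as a network concrete with an adjacency list, where people are nodes and ties are edges. This is a minimal sketch that assumes a small hypothetical family with undirected ties. It uses degree centrality (a person's number of ties divided by the maximum possible number) only as one simple example measure.

```python
from collections import defaultdict

# Hypothetical family network: each pair is an undirected social tie.
TIES = [
    ("grandmother", "mother"),
    ("mother", "father"),
    ("mother", "daughter"),
    ("father", "daughter"),
    ("father", "son"),
]

def build_graph(edges):
    """Build an undirected adjacency list: person -> set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Number of ties divided by the maximum possible (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

family = build_graph(TIES)
for person, score in sorted(degree_centrality(family).items(), key=lambda kv: -kv[1]):
    print(f"{person:12s} {score:.2f}")
```

The same structure works for a project organisation: replace family members with team members and ties with regular communication.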

SOCIAL COMPLEXITY
• Society seen as a complex adaptive system
  • Phase transitions
  • Adaptation (a multi-stage process)
    • Need → intent → capacity → implementation
  • Goal
• Information processing takes place in many parts of complex adaptive systems
  • To help adaptation: allocating resources, coordination…
• Family as a complex adaptive system
  • Development, hardships, births, deaths, successes, failures
  • Adaptation over decades
(Cioffi-Revilla 2014)
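One way to read the multi-stage adaptation process (need → intent → capacity → implementation) quantitatively is to assume that adaptation succeeds only when every stage succeeds. If the stages are also independent, the probability of successful adaptation is the product of the stage probabilities. This is a minimal sketch. The stage probabilities are made-up illustrative values, and the independence assumption is ours, not the slide's.

```python
from math import prod

# Illustrative (made-up) probabilities that each adaptation stage succeeds.
# Independence between stages is an assumption of this sketch.
STAGES = {
    "need recognised": 0.9,
    "intent formed": 0.8,
    "capacity available": 0.7,
    "implementation carried out": 0.9,
}

def adaptation_probability(stages):
    """Probability that every stage succeeds, assuming independent stages."""
    return prod(stages.values())

p = adaptation_probability(STAGES)
print(f"P(successful adaptation) = {p:.4f}")
```

Under these assumptions, the overall probability is lower than that of any single stage. This is one way to see why adaptation over many stages can be fragile.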

SIMON'S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY
• Three types of systems:
1. Natural systems
2. Human systems
3. Artificial systems
• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature
• Humans pursue the strategy of building artifacts to achieve goals
• Two kinds of artificial systems work in synergy:
  • Tangible (e.g. roads, buildings)
  • Intangible (e.g. organisations, social structures)
(Cioffi-Revilla 2014)

SIMULATION
• A large (and old) research field
• Two main areas of simulation:
1. Variable-oriented models
   • System dynamics models (e.g. modeling a nuclear plant)
   • Queuing models (e.g. modeling how a box office line behaves)
2. Object-oriented models
   • Cellular automata (e.g. Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life, http://pmav.eu/stuff/javascript-game-of-life-v3.1.1)
   • Agent-based models (e.g. modeling the communication of a project organisation of many individuals)
• Also evolutionary models
(Cioffi-Revilla 2014)
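Conway's Game of Life, the cellular automaton example above, shows what "object-oriented" means here: every cell follows the same local rule. A dead cell with exactly three live neighbours comes alive, and a live cell with two or three live neighbours survives. All other cells die or stay dead. This is a minimal sketch that assumes an unbounded grid stored as a set of live-cell coordinates. `life_step` is a hypothetical helper name.

```python
from collections import Counter

def life_step(live):
    """Advance Conway's Game of Life by one generation.

    `live` is a set of (row, col) coordinates of live cells on an unbounded grid.
    """
    # Count live neighbours for every cell adjacent to at least one live cell.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival with 2 or 3; everything else dies.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }

# A "blinker": three cells in a row flip between horizontal and vertical.
blinker = {(1, 0), (1, 1), (1, 2)}
print(sorted(life_step(blinker)))  # [(0, 1), (1, 1), (2, 1)]
```

No global behaviour is programmed in. The oscillation emerges from repeated local rules, which is the same logic that agent-based models apply to individuals.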

SUMMARY
• 4 main areas of Computational Social Science:
1. Big data and automatic information extraction
2. Social networks
3. Social complexity
4. Simulation
• Typically all of these work together
• CSS faces many problems, especially concerning privacy and ethics
• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods.

SOME RESEARCH EXAMPLES

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC
• Tracking and predicting how flu or other contagious diseases spread
• Based on network and social media analysis and modeling
• Many different variations; one of the first was Google Flu Trends, based on flu-related search queries
• For example:
  • Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H. and Liu, B. (2011). Predicting Flu Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702-707, 10-15 April 2011.
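The classic epidemiological starting point for modeling spread is the compartmental SIR model. It splits a population into susceptible (S), infected (I) and recovered (R) people. This is a minimal discrete-time SIR sketch built on our own assumptions: a well-mixed population, daily steps, and made-up parameters `beta` and `gamma`. It is not the method of the Twitter-based study cited above, which works from observed social media data.

```python
def simulate_sir(population, infected0, beta, gamma, days):
    """Discrete-time SIR model for a well-mixed population.

    beta:  new infections per infected person per day (contact rate x transmission)
    gamma: fraction of infected people who recover each day
    Returns a list of (S, I, R) tuples, one per day, including day 0.
    """
    s, i, r = population - infected0, float(infected0), 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# Illustrative parameters only: basic reproduction number R0 = beta / gamma = 2.5.
run = simulate_sir(population=10_000, infected0=10, beta=0.5, gamma=0.2, days=120)
peak_day, (_, peak_i, _) = max(enumerate(run), key=lambda t: t[1][1])
print(f"peak on day {peak_day} with about {peak_i:.0f} infected")
```

Network-based variants replace the "well-mixed" assumption with an explicit contact network, such as the adjacency-list representation used for social networks.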

GOOGLE FLU TRENDS
• http://www.google.org/flutrends/intl/en_us
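Query-based flu tracking of the Google Flu Trends kind is often described as a regression between the share of flu-related searches and the rate of flu-like illness, fitted on the log-odds (logit) scale. This is a minimal sketch that assumes that logit-linear form. It uses synthetic weekly numbers generated from a known relation, so the fit should recover that relation. None of the values are real Google or surveillance data, and the helper names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Synthetic weekly data (assumption): flu-related share of searches, and an illness
# rate generated exactly from logit(rate) = -1 + 0.9 * logit(query_share).
query_share = [0.001, 0.002, 0.004, 0.008, 0.006, 0.003]
ili_rate = [inv_logit(-1 + 0.9 * logit(q)) for q in query_share]

a, b = fit_line([logit(q) for q in query_share], [logit(p) for p in ili_rate])
nowcast = inv_logit(a + b * logit(0.005))  # estimate for a new week's query share
print(f"a={a:.2f}, b={b:.2f}, nowcast={nowcast:.4f}")
```

Such a fit assumes that the link between searching and being ill stays stable. If search behaviour changes, the estimates drift, which is one of the methodological challenges CSS faces.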

MODELING NEWS CYCLE DYNAMICS
• Leskovec, J., Backstrom, L. and Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497-506. dl.acm.org
• Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events.
• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.
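A much simpler technique than the paper's can illustrate the idea of clustering textual variants of a phrase: group phrases whose word sets overlap strongly (Jaccard similarity). This is a minimal sketch that assumes a greedy single pass and a hypothetical threshold of 0.5. It is not the algorithm of Leskovec et al., only an illustration of the idea.

```python
import re

def tokens(phrase):
    """Lower-case word set of a phrase, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", phrase.lower()))

def jaccard(a, b):
    """Share of words two sets have in common (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_variants(phrases, threshold=0.5):
    """Greedy grouping: a phrase joins the first cluster whose seed it resembles."""
    clusters = []  # list of (seed_tokens, member_phrases)
    for phrase in phrases:
        toks = tokens(phrase)
        for seed, members in clusters:
            if jaccard(toks, seed) >= threshold:
                members.append(phrase)
                break
        else:  # no similar cluster found: start a new one
            clusters.append((toks, [phrase]))
    return [members for _, members in clusters]

quotes = [
    "lipstick on a pig",
    "you can put lipstick on a pig",
    "yes we can",
    "Yes, we can!",
]
for cluster in group_variants(quotes):
    print(cluster)
```

A greedy pass depends on input order and on the threshold. Approaches designed for Web-scale data need more robust and scalable clustering.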

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING
• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A. and Mylopoulos, Y. A. (2005). A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, vol. 81, pp. 175-187. doi:10.1177/0037549705053172
• Picardi, A. C. and Saeed, K. (1979). The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109-118.

CLIMATE DIPLOMACY MAPPING
• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V. and De Pryck, K. (2014). Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July-December 2014, vol. 1, 2053951714543804. First published on August 5, 2014. doi:10.1177/2053951714543804

PREDICTING ELECTIONS
• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, pp. 246-253. ISSN (Online) 2196-7032, ISSN (Print) 1611-2776. DOI: 10.1515/itit-2014-1046. September 2014.
• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.
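The abstract above mentions tracking popularity dynamics through Wikipedia page views. A first step is usually to normalise raw counts into daily shares, so that days with different total traffic can be compared. This is a minimal sketch with made-up party names and counts. `daily_shares` is a hypothetical helper, and it does nothing to address the representativeness problems the abstract warns about.

```python
# Hypothetical daily Wikipedia page views per party (made-up numbers).
PAGE_VIEWS = {
    "day 1": {"Party A": 1200, "Party B": 800, "Party C": 400},
    "day 2": {"Party A": 900, "Party B": 1100, "Party C": 500},
}

def daily_shares(views_by_day):
    """Normalise raw counts to each party's share of that day's total views."""
    shares = {}
    for day, views in views_by_day.items():
        total = sum(views.values())
        shares[day] = {party: count / total for party, count in views.items()}
    return shares

for day, share in daily_shares(PAGE_VIEWS).items():
    print(day, {party: round(s, 2) for party, s in share.items()})
```

Shares remove the effect of overall traffic changes (e.g. a debate night), but they still measure attention rather than voting intention.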

DIGIVAALIT 2015 & CITIZEN MINDSCAPES
• DIGIVAALIT 2015
  • http://www.hiit.fi/digivaalit-2015
  • Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)
  • Trying to understand how media is used and how the public agenda is set
• CITIZEN MINDSCAPES
  • http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila
  • Diving deep into the unscoped virtual territories of a nation's collective consciousness may reveal something remarkable. The hugely popular Finnish Suomi24 discussion forum has 1.9 million monthly visitors, who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed for research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum's owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.

ETHICS
• Listen to "The Trust Engineers" podcast by Radiolab
  • http://www.radiolab.org/story/trust-engineers
• Think about and discuss the ethical research issues raised by what you heard

LECTURE 1 READING
• Lazer, D. et al. (2009). Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721-723.
• Conte, R. (2012). Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325-346.
• Anderson, C. (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• Einav, L. and Levin, J. (2014). The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf
• King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719-721.
• Wallach, H. (2014). Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

Thank You
Questions and comments
Twitter: @laurieloranta

Page 22: Introduction to Computational Social Science - Lecture 1

INSTRUMENTAL REVOLUTION

THE BACKGROUND IMAGE ldquoTATEL TELESCOPErdquo BY EP_JHUIS UNDER NON COMMERCIAL CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

IT IS FOREMOST AN

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

bull Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull Conte R 2012 Manifesto of Computational Social Science The European Physical Journal Special Topics November 2012 Vol 214 Issue 1 pp 325-346

bull Anderson C 2008 The End of Theory The Data Deluge Makes the Scientific Method Obsolete Wired httparchivewiredcomsciencediscoveriesmagazine16-07pb_theory

bull Einav L and Levin J 2014 The Data Revolution and Economic Analysis In Innovation Policy and the Economy edited by Josh Lerner and Scott Stern httpwebstanfordedu~leinavpubsIPE2014pdf

bull King G 2011 Ensuring the Data-Rich Future of the Social Sciences Science 11 February 2011 Vol 331 no 6018 pp 719-721

bull Wallach H 2014 Big Data Machine Learning and the Social Sciences Fairness Accountability and Transparency Mediumcom httpsmediumcomhannawallachbig-data-machine-learning-and-thesocial-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

twitter laurieloranta

Page 23: Introduction to Computational Social Science - Lecture 1

COMPUTER SCIENCE

SOCIAL SCIENCE

STATISTICS

COMPUTATIONAL SOCIAL SCIENCE

Time

More

Less

bull Speed and performance of IT (CPU RAM Network)

bull Access to IT Internet

bull Amount of data generated

bull Cost of IT

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, pages 246-253. ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: 10.1515/itit-2014-1046, September 2014.

• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

PREDICTING ELECTIONS
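The comparison at the heart of the abstract, web attention versus electoral outcome, can be sketched as turning page views into attention shares and measuring how far they are from vote shares. This is a minimal illustration under assumed inputs (daily view counts per party, a 7-day window, invented numbers); it is not the paper's method.

```python
def attention_share(page_views: dict[str, list[int]], window: int = 7) -> dict[str, float]:
    """Share of total page views each party received over the last `window` days."""
    recent = {party: sum(views[-window:]) for party, views in page_views.items()}
    total = sum(recent.values())
    if total == 0:
        raise ValueError("no page views in the window")
    return {party: count / total for party, count in recent.items()}


def mean_absolute_error(predicted: dict[str, float], actual: dict[str, float]) -> float:
    """Average absolute gap between predicted and actual shares over the same parties."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


views = {"Party A": [100] * 7, "Party B": [300] * 7}   # invented daily counts
shares = attention_share(views)
print(shares)                                          # {'Party A': 0.25, 'Party B': 0.75}
print(mean_absolute_error(shares, {"Party A": 0.3, "Party B": 0.7}))
```

As the abstract warns, attention is not the same as support: a scandal can push page views up while votes go down, which is one reason such signals generalise poorly across political systems.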

• DIGIVAALIT 2015

• http://www.hiit.fi/digivaalit-2015

• Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)

• Trying to understand how media is used and how the public agenda is set

• CITIZEN MINDSCAPES

• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila

• Diving deep into the unscoped virtual territories of a nation's collective consciousness may reveal something remarkable. The hugely popular Finnish Suomi24 discussion forum has 1.9 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed for research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum's owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• Listen to “The Trust Engineers” podcast by Radiolab

• http://www.radiolab.org/story/trust-engineers

• Think about and discuss different ethical research issues in relation to what you heard

ETHICS

• Lazer, D. et al. (2009). Computational Social Science. Science, 6 February 2009, vol. 323, no. 5915, pp. 721-723.

• Conte, R. (2012). Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, vol. 214, issue 1, pp. 325-346.

• Anderson, C. (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

• Einav, L. and Levin, J. (2014). The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf

• King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, vol. 331, no. 6018, pp. 719-721.

• Wallach, H. (2014). Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta


[Figure: trends over time (horizontal axis: Time; vertical axis: Less to More) for speed and performance of IT (CPU, RAM, network), access to IT and the Internet, amount of data generated, and cost of IT.]

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE “HOME VISIT” BY NICOLAS NOVA IS UNDER CREATIVE COMMONS LICENSE.


MAJOR QUESTIONS REGARDING RESEARCH ETHICS

THE BACKGROUND IMAGE “CAMÉRA DE SURVEILLANCE” BY TRISTAN NITOT IS UNDER CREATIVE COMMONS LICENSE.


COMPUTATIONAL SOCIAL SCIENCE IS NOT A SILVER BULLET

THE BACKGROUND IMAGE “9MM BULLET BW” BY AN NGUYEN IS UNDER CREATIVE COMMONS LICENSE.


Computational Social Science offers revolutionary opportunities for the social sciences, but it still faces challenges related to methods, interdisciplinary cooperation, and research ethics.

1. Solving increasingly complex problems: The problems of the global world are complex, and computational methods might help solve them.

2. The rise of data: The amount of data has exploded during the 21st century.

3. IT and instrumental revolution: all the new tools and possibilities.

4. Complex systems: modeling our dynamic organisations and societies.

5. Social networks: modeling human behavior as networks.

6. Making predictions and simulations: predicting the future from the past.

7. Interdisciplinary field (social sciences, math, computer science...).

8. Many problems and challenges, especially regarding research ethics.

CSS COMPONENTS

• The information processing paradigm has two aspects in relation to CSS:

1. Information processing is substantive to the complex systems of society that CSS researches. This means that information processing takes part in the formation and evolution of complex systems.

2. Information processing is methodological, in the sense that it serves as the core instrument of CSS.

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

1. BIG DATA & AUTOMATED INFORMATION EXTRACTION

2. SOCIAL NETWORK ANALYSIS

3. COMPLEX SYSTEMS & MODELING

4. SIMULATION

THE MAIN AREAS OF CSS

• Areas of Computational Social Science:

1. (Big) Data & automated data extraction

• Generate, retrieve, sort, modify, transform... data

2. Social Networks

• Network analysis and social networks

3. Social Complexity

• Social complexity, complex adaptive systems, complex systems modeling

4. Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

• Data and automated information extraction can be seen as the foundation for the other areas of CSS

• Raw data can be used as:

1. Data for its own sake, as research data -> the data itself is the subject of research

2. Data for modeling or validating other phenomena, e.g. via network analysis, complex systems analysis, or simulation

• Data is generated, retrieved, modified, transformed... for research purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)
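As a small illustration of retrieving, sorting, and transforming data by computational automation, the sketch below extracts hashtags from a list of posts, normalizes their case, and sorts them by frequency. The posts are invented, and real retrieval (for example from a platform API) is deliberately left out.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")   # a '#' followed by letters, digits or underscores


def hashtag_counts(posts: list[str]) -> list[tuple[str, int]]:
    """Extract hashtags, lowercase them, and return (tag, count) pairs, most frequent first."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(tag.lower() for tag in HASHTAG.findall(post))
    return counts.most_common()


posts = [
    "Polls open today #elections2015",
    "#Elections2015 debate tonight #parliament",
    "no tags here",
]
print(hashtag_counts(posts))   # [('elections2015', 2), ('parliament', 1)]
```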

• Network analysis has a long tradition (it is a much older field than CSS)

• Social networking services (Facebook, Twitter, etc.) are just one part of network analysis

• Many other social interactions can be modeled as networks -> thus social networks are not technology-dependent as such

• -> e.g. modeling a family as a network

• -> e.g. modeling a project as a network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)
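Modeling a family as a network, as suggested above, only needs a list of ties. The sketch below stores an undirected network as an adjacency dictionary and computes normalized degree centrality (number of ties divided by the number of other people); the family members and ties are hypothetical.

```python
from collections import defaultdict


def build_network(ties: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from a list of (person, person) ties."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in ties:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    """Share of the other nodes each node is directly tied to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


family = build_network([
    ("mother", "father"), ("mother", "child1"), ("mother", "child2"),
    ("father", "child1"), ("father", "child2"), ("child1", "child2"),
    ("grandmother", "mother"),
])
for person, score in sorted(degree_centrality(family).items(), key=lambda kv: -kv[1]):
    print(f"{person:12s} {score:.2f}")
```

The same code applies unchanged to a project network (people tied by who communicates with whom); for larger analyses, a dedicated library such as NetworkX provides this and many other centrality measures.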

• Society seen as a complex adaptive system

• Phase transitions

• Adaptation (a multi-stage process)

• Need -> intent -> capacity -> implementation

• Goal

• Information processing in many parts of complex adaptive systems

• To help adaptation: allocating resources, coordination...

• Family as a complex adaptive system

• Development, hardships, births, deaths, successes, failures

• Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• Three types of systems:

1. Natural systems

2. Human systems

3. Artificial systems

• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature

• Humans pursue the strategy of building artifacts to achieve goals

• Two kinds of artificial systems work in synergy:

• Tangible (e.g. roads, buildings)

• Intangible (e.g. organisations, social structures)

SIMON'S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• A large (and old) research field

• Two main areas of simulation:

1. Variable-Oriented Models

• System Dynamics Models (e.g. modeling a nuclear plant)

• Queuing Models (e.g. modeling how a box office line behaves)

2. Object-Oriented Models

• Cellular automata (e.g. Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life, http://pmav.eu/stuff/javascript-game-of-life-v3.1.1)

• Agent-based models (e.g. modeling the communication of a project organisation of many individuals)

• Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)
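Conway's Game of Life, the cellular automaton example above, fits in a few lines: a live cell survives with two or three live neighbours, and a dead cell comes alive with exactly three. This sketch stores only the live cells as coordinates on an unbounded grid, which is one common implementation choice among several.

```python
from collections import Counter

Cell = tuple[int, int]


def step(live: set[Cell]) -> set[Cell]:
    """Apply one generation of Conway's rules (B3/S23) on an unbounded grid."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {cell for cell, n in neighbour_counts.items()
            if n == 3 or (n == 2 and cell in live)}


blinker = {(0, 1), (1, 1), (2, 1)}           # horizontal line of three cells
print(sorted(step(blinker)))                 # [(1, 0), (1, 1), (1, 2)]
```

The "blinker" oscillates between a horizontal and a vertical line, a small example of global patterns emerging from purely local rules, which is the property that makes cellular automata interesting for social simulation.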

• 4 main areas of Computational Social Science:

1. Big data and automatic information extraction

2. Social networks

3. Social complexity

4. Simulation

• Typically, all of these work together

• CSS has a lot of problems, especially concerning privacy and ethics

• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods

SUMMARY

SOME RESEARCH EXAMPLES

• Tracking and predicting how flu or other contagious diseases spread

• Based on network and social media analysis and modeling

• Many different variations; one of the first was Google Flu Trends, based on flu-related search queries

• For example:

• Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., Liu, B. (2011). Predicting Flu Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702-707, 10-15 April 2011.

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC

• http://www.google.org/flutrends/intl/en_us

GOOGLE FLU TRENDS
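The original Google Flu Trends paper (Ginsberg et al., Nature, 2009) related the log-odds of the share of influenza-like-illness (ILI) doctor visits to the log-odds of the share of flu-related search queries with a single linear term, logit(P) = b0 + b1 * logit(Q). The sketch below fits that form by ordinary least squares on synthetic numbers; choosing which queries count as flu-related, the hard part in practice, is not shown, and the function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(z: float) -> float:
    return 1 / (1 + math.exp(-z))


def fit_logit_model(query_share: list[float], ili_share: list[float]) -> tuple[float, float]:
    """OLS fit of logit(ili) = b0 + b1 * logit(query); returns (b0, b1)."""
    xs = [logit(q) for q in query_share]
    ys = [logit(p) for p in ili_share]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def nowcast(b0: float, b1: float, query_share: float) -> float:
    """Estimated ILI share for a new week's query share."""
    return inv_logit(b0 + b1 * logit(query_share))


# Synthetic weekly data generated from b0 = -1.0, b1 = 0.9 (no noise), for illustration only.
weeks_q = [0.001, 0.002, 0.004, 0.008]
weeks_p = [inv_logit(-1.0 + 0.9 * logit(q)) for q in weeks_q]
b0, b1 = fit_logit_model(weeks_q, weeks_p)
print(round(b0, 3), round(b1, 3))                  # -1.0 0.9
print(f"{nowcast(b0, b1, 0.003):.4%}")
```

With real data the fit is never this clean, and a model trained on past seasons can drift as search behaviour changes, which is why such estimates need continual validation against clinical surveillance.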

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

bull Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull Conte R 2012 Manifesto of Computational Social Science The European Physical Journal Special Topics November 2012 Vol 214 Issue 1 pp 325-346

bull Anderson C 2008 The End of Theory The Data Deluge Makes the Scientific Method Obsolete Wired httparchivewiredcomsciencediscoveriesmagazine16-07pb_theory

bull Einav L and Levin J 2014 The Data Revolution and Economic Analysis In Innovation Policy and the Economy edited by Josh Lerner and Scott Stern httpwebstanfordedu~leinavpubsIPE2014pdf

bull King G 2011 Ensuring the Data-Rich Future of the Social Sciences Science 11 February 2011 Vol 331 no 6018 pp 719-721

bull Wallach H 2014 Big Data Machine Learning and the Social Sciences Fairness Accountability and Transparency Mediumcom httpsmediumcomhannawallachbig-data-machine-learning-and-thesocial-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

twitter laurieloranta

Page 25: Introduction to Computational Social Science - Lecture 1

FUNDAMENTAL CHANGES IN RESEARCH SETUP

THE BACKGROUND IMAGE ldquoHOME VISITrdquo BY NICOLAS NOVAIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

bull Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull Conte R 2012 Manifesto of Computational Social Science The European Physical Journal Special Topics November 2012 Vol 214 Issue 1 pp 325-346

bull Anderson C 2008 The End of Theory The Data Deluge Makes the Scientific Method Obsolete Wired httparchivewiredcomsciencediscoveriesmagazine16-07pb_theory

bull Einav L and Levin J 2014 The Data Revolution and Economic Analysis In Innovation Policy and the Economy edited by Josh Lerner and Scott Stern httpwebstanfordedu~leinavpubsIPE2014pdf

bull King G 2011 Ensuring the Data-Rich Future of the Social Sciences Science 11 February 2011 Vol 331 no 6018 pp 719-721

bull Wallach H 2014 Big Data Machine Learning and the Social Sciences Fairness Accountability and Transparency Mediumcom httpsmediumcomhannawallachbig-data-machine-learning-and-thesocial-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

twitter laurieloranta

Page 26: Introduction to Computational Social Science - Lecture 1

MAJOR QUESTIONS REGARDING RESEARCH ETHICS THE BACKGROUND IMAGE ldquoCAMEacuteRA DE SURVEILLANCErdquo BY TRISTAN NITOT

IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

NOT A SILVER BULLET

COMPUTATIONAL SOCIAL SCIENCE IS

THE BACKGROUND IMAGE ldquo9MM BULLET BWrdquo BY AN NGUYENIS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE SEE LICENSE TERMS HERE

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

• Leskovec, J., Backstrom, L., Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497–506.

• Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events.

• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.

MODELING NEWS CYCLE DYNAMICS
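
The phrase-clustering idea in the abstract above can be illustrated, in a much simplified form, by grouping each quote under a longer quote that contains it word for word. This is not the paper's algorithm; the normalisation, the containment rule, the example phrases and the names `normalise`, `contains` and `cluster_variants` are assumptions made for this sketch.

```python
import re


def normalise(phrase: str) -> tuple[str, ...]:
    # Lower-case and keep only word characters so punctuation variants match.
    return tuple(re.findall(r"\w+", phrase.lower()))


def contains(longer: tuple[str, ...], shorter: tuple[str, ...]) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))


def cluster_variants(phrases: list[str]) -> dict[str, list[str]]:
    """Group each phrase under the longest cluster root that contains it."""
    norm = {p: normalise(p) for p in phrases}
    # Longest phrases first, so they become cluster roots.
    ordered = sorted(phrases, key=lambda p: len(norm[p]), reverse=True)
    clusters: dict[str, list[str]] = {}
    for p in ordered:
        root = next((r for r in clusters if contains(norm[r], norm[p])), None)
        if root is None:
            clusters[p] = [p]
        else:
            clusters[root].append(p)
    return clusters


quotes = [
    "You can put lipstick on a pig, but it's still a pig",
    "lipstick on a pig",
    "It's still a pig",
    "Yes we can",
]
clusters = cluster_variants(quotes)
```

Counting how often each cluster appears per day would then give the rise-and-fade curves the paper studies.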

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A., Mylopoulos, Y. A. (2005). A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, vol. 81, pp. 175–187. doi:10.1177/0037549705053172

• Picardi, A. C. and Saeed, K. (1979). The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109–118.

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING
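
The agent-based approach to residential water demand can be illustrated with a toy model. The sketch below is not the model of Athanasiadis et al.; it assumes hypothetical household agents with random base demands and a simple threshold rule: a household starts conserving once the share of conserving households reaches its personal threshold. Aggregate demand then emerges from individual decisions, which is the core idea of the agent-based view.

```python
import random


class Household:
    """A hypothetical household agent with a fixed base water demand."""

    def __init__(self, base_demand: float, threshold: float) -> None:
        self.base_demand = base_demand  # litres per day (illustrative)
        self.threshold = threshold      # share of conserving peers needed to join in
        self.conserving = False

    def demand(self, saving: float) -> float:
        # Conserving households cut their demand by the fraction `saving`.
        return self.base_demand * (1 - saving) if self.conserving else self.base_demand


def run(n_households: int = 200, steps: int = 20, initial_adopters: float = 0.05,
        saving: float = 0.2, seed: int = 1) -> list[float]:
    """Simulate total daily demand under a threshold-based social-influence rule."""
    rng = random.Random(seed)
    homes = [Household(rng.uniform(100, 300), rng.uniform(0.0, 0.5))
             for _ in range(n_households)]
    for h in rng.sample(homes, int(initial_adopters * n_households)):
        h.conserving = True
    totals = []
    for _ in range(steps):
        # Synchronous update: every agent sees the share from the previous step.
        share = sum(h.conserving for h in homes) / n_households
        for h in homes:
            if not h.conserving and share >= h.threshold:
                h.conserving = True
        totals.append(sum(h.demand(saving) for h in homes))
    return totals


daily_totals = run()
```

Changing the threshold distribution or the seed shows how sensitive the aggregate outcome can be to assumptions about individual behaviour.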

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V., De Pryck, K. (2014). Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July–December 2014, 1: 2053951714543804, first published on August 5, 2014. doi:10.1177/2053951714543804

CLIMATE DIPLOMACY MAPPING

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, Pages 246–253, September 2014. ISSN (Online) 2196-7032, ISSN (Print) 1611-2776. DOI: 10.1515/itit-2014-1046

• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

PREDICTING ELECTIONS
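
The abstract above describes monitoring candidates' popularity through Wikipedia page views. A minimal first step, sketched here under assumptions, is to turn raw daily view counts into each candidate's share of total attention per day. The function name, candidate names and counts are hypothetical, and, as the abstract warns, such shares say nothing about how representative the viewers are of voters.

```python
def popularity_shares(daily_views: dict[str, list[int]]) -> list[dict[str, float]]:
    """Convert per-candidate daily page-view counts into daily shares of attention."""
    if not daily_views:
        return []
    candidates = list(daily_views)
    n_days = len(daily_views[candidates[0]])
    shares = []
    for day in range(n_days):
        total = sum(daily_views[c][day] for c in candidates)
        # Guard against days with no views at all.
        shares.append({c: daily_views[c][day] / total if total else 0.0
                       for c in candidates})
    return shares


# Hypothetical counts for two hypothetical candidates over two days.
views = {"Candidate A": [30, 10], "Candidate B": [70, 30]}
print(popularity_shares(views)[1])  # {'Candidate A': 0.25, 'Candidate B': 0.75}
```

Comparing these shares with polls or results is where the representativeness question raised in the abstract becomes testable.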

• DIGIVAALIT 2015

• http://www.hiit.fi/digivaalit-2015

• Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)

• Trying to understand how media is used and how the public agenda is set

• CITIZEN MINDSCAPES

• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila

• Diving deep into the unscoped virtual territories of a nation's collective consciousness may reveal something remarkable. Finland's hugely popular Suomi24 discussion forum has 1.9 million monthly visitors, who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed for research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum's owner company, Aller, and researchers at the National Consumer Research Center plan to make use of this immense database.

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• Listen to “The Trust Engineers” podcast by Radiolab

• http://www.radiolab.org/story/trust-engineers/

• Think about and discuss the ethical research issues raised by what you heard

ETHICS

• Lazer, D. et al. (2009). Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721–723.

• Conte, R. (2012). Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325–346.

• Anderson, C. (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

• Einav, L. and Levin, J. (2014). The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf

• King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719–721.

• Wallach, H. (2014). Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta


COMPUTATIONAL SOCIAL SCIENCE IS NOT A SILVER BULLET

THE BACKGROUND IMAGE “9MM BULLET BW” BY AN NGUYEN IS UNDER CREATIVE COMMONS LICENSE

SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.

Computational Social Science offers revolutionary opportunities for the social sciences, but it still faces challenges in methods, interdisciplinary cooperation and research ethics.

1. Solving increasingly complex problems: the problems of the global world are complex, and computational methods might be able to solve these complex issues

2. The rise of data: the amount of data has exploded during the 21st century

3. IT and instrumental revolution: all the new tools and possibilities

4. Complex systems: modeling our dynamic organisations and societies

5. Social networks: modeling human behavior as networks

6. Making predictions and simulations: predicting the future from the past

7. Interdisciplinary field (social sciences, math, computer science…)

8. Many problems and challenges, especially regarding research ethics

CSS COMPONENTS

• The information processing paradigm has two aspects in relation to CSS:

1. Information processing is substantive to the complex systems of society that CSS researches. This means that information processing takes part in the formation and evolution of complex systems.

2. Information processing is methodological in the sense that it serves as the core instrument of CSS.

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

1. BIG DATA & AUTOMATED INFORMATION EXTRACTION

2. SOCIAL NETWORK ANALYSIS

3. COMPLEX SYSTEMS & MODELING

4. SIMULATION

THE MAIN AREAS OF CSS

• Areas of Computational Social Science:

1. (Big) Data & automated data extraction

• Generate, retrieve, sort, modify, transform… data

2. Social Networks

• Network analysis and social networks

3. Social Complexity

• Social complexity, complex adaptive systems, complex systems modeling

4. Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

• Data and automated information extraction can be seen as the foundation for the other areas of CSS

• Raw data can be used as:

1. Data for its own sake, as research data -> data is the subject of research

2. Data for modeling or validating other phenomena via e.g. network analysis, complex systems analysis or simulation

• Data is generated, retrieved, modified, transformed… for research purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)
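
As a small illustration of automated information extraction from social media text, the sketch below pulls hashtags and @-mentions out of posts and counts them. The regular expressions, the function name `extract_tags` and the example posts are hypothetical; real platform data would normally come through an API and need far more careful parsing (for example, e-mail addresses would be miscounted as mentions here).

```python
import re
from collections import Counter

# Deliberately naive patterns: a word directly after '#' or '@'.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def extract_tags(posts: list[str]) -> tuple[Counter, Counter]:
    """Count hashtags and @-mentions (case-insensitively) across a list of posts."""
    hashtags: Counter = Counter()
    mentions: Counter = Counter()
    for text in posts:
        hashtags.update(tag.lower() for tag in HASHTAG.findall(text))
        mentions.update(user.lower() for user in MENTION.findall(text))
    return hashtags, mentions


posts = [
    "Voting today! #vaalit2015 @newsdesk",
    "#Vaalit2015 debate tonight",
    "No tags here",
]
hashtags, mentions = extract_tags(posts)
print(hashtags.most_common(1))  # [('vaalit2015', 2)]
```

The resulting counts are "data for its own sake" in the slide's terms, but the same extracted mentions could also serve as edges for network analysis.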

• Network analysis has a long tradition (it is a much older field than CSS)

• Social networks (Facebook, Twitter, etc.) are just one part of network analysis

• Many other social interactions can be modeled as networks -> thus social networks are not technology-dependent as such

• -> e.g. modeling a family as a network

• -> e.g. modeling a project as a network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)
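
The slide's example of modeling a family as a network can be sketched with plain Python dictionaries. Below, a hypothetical family is stored as an undirected graph of adjacency sets, and each member's degree centrality (number of ties divided by the maximum possible, n - 1) is computed. The names, the ties and the choice of degree centrality as the measure are illustrative assumptions.

```python
from collections import defaultdict


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected graph as an adjacency-set dictionary."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


# A hypothetical family: a tie means "regularly in contact with".
family = build_graph([
    ("Grandmother", "Mother"),
    ("Mother", "Father"),
    ("Mother", "Child 1"), ("Mother", "Child 2"),
    ("Father", "Child 1"), ("Father", "Child 2"),
    ("Child 1", "Child 2"),
])
print(degree_centrality(family)["Mother"])  # 1.0
```

Degree centrality is only one of many network measures; NetworkX, for example, provides `networkx.degree_centrality` with the same n - 1 normalisation, alongside betweenness, closeness and others. The same code works unchanged for a project network of people and communication ties.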

• Society seen as a complex adaptive system

• Phase transitions

• Adaptation (multi-stage process)

• Need -> intent -> capacity -> implementation

• Goal

• Information processing in many parts of complex adaptive systems

• To help adaptation, allocating resources, coordination…

• Family as a complex adaptive system

• Development, hardships, births, deaths, successes, failures

• Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• Three types of systems:

1. Natural systems

2. Human systems

3. Artificial systems

• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature

• Humans pursue the strategy of building artifacts to achieve goals

• Two kinds of artificial systems working in synergy:

• Tangible (e.g. roads, buildings)

• Intangible (e.g. organisations, social structures)

SIMON'S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

bull Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull Conte R 2012 Manifesto of Computational Social Science The European Physical Journal Special Topics November 2012 Vol 214 Issue 1 pp 325-346

bull Anderson C 2008 The End of Theory The Data Deluge Makes the Scientific Method Obsolete Wired httparchivewiredcomsciencediscoveriesmagazine16-07pb_theory

bull Einav L and Levin J 2014 The Data Revolution and Economic Analysis In Innovation Policy and the Economy edited by Josh Lerner and Scott Stern httpwebstanfordedu~leinavpubsIPE2014pdf

bull King G 2011 Ensuring the Data-Rich Future of the Social Sciences Science 11 February 2011 Vol 331 no 6018 pp 719-721

bull Wallach H 2014 Big Data Machine Learning and the Social Sciences Fairness Accountability and Transparency Mediumcom httpsmediumcomhannawallachbig-data-machine-learning-and-thesocial-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

twitter laurieloranta

Page 28: Introduction to Computational Social Science - Lecture 1

Computational Social Science

proposes revolutionary opportunities

for the social sciences but it has still

some challenges in relation to

methods interdisciplinary

cooperation and research ethics

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

bull Listen the ldquoThe Trust Engineersrdquo podcast by Radiolab

bull httpwwwradiolaborgstorytrust-engineers

bull Think about and discuss different ethical research issues in relation to

what you heard

ETHICS

bull Lazer D et al 2009 Computational Social Science Science 6 February 2009 Vol 323 no 5915 pp 721-723

bull Conte R 2012 Manifesto of Computational Social Science The European Physical Journal Special Topics November 2012 Vol 214 Issue 1 pp 325-346

bull Anderson C 2008 The End of Theory The Data Deluge Makes the Scientific Method Obsolete Wired httparchivewiredcomsciencediscoveriesmagazine16-07pb_theory

bull Einav L and Levin J 2014 The Data Revolution and Economic Analysis In Innovation Policy and the Economy edited by Josh Lerner and Scott Stern httpwebstanfordedu~leinavpubsIPE2014pdf

bull King G 2011 Ensuring the Data-Rich Future of the Social Sciences Science 11 February 2011 Vol 331 no 6018 pp 719-721

bull Wallach H 2014 Big Data Machine Learning and the Social Sciences Fairness Accountability and Transparency Mediumcom httpsmediumcomhannawallachbig-data-machine-learning-and-thesocial-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

twitter laurieloranta

Page 29: Introduction to Computational Social Science - Lecture 1

1 Solving increasingly complex problems The problems of global

world are complex computational methods might be able to solve

these complex issues

2 The rise of data The amounts of data has exploded during the 21st

century

3 IT and Instrumental revolution all the new tools and possibilities

4 Complex systems modeling our dynamic organisations and societies

5 Social networks modeling human behavior as networks

6 Making predictions and simulations predicting future from the past

7 Interdisciplinary field (social sciences math computer sciencehellip)

8 Many problems and challenges especially regarding research

ethics

CSS COMPONENTS

bull Information processing paradigm has two aspects in relation

to CSS

1 Information processing is substantive to the complex

systems of society that CSS researches This means that

information processing is takes part in forming and

evolution of complex systems

2 Information processing is methodological in the sense

that it serves as the core instrument of CSS

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

BIG DATA amp AUTOMATED INFROMATION EXTRACTION

SOCIAL NETWORK ANALYSIS

COMPLEX SYSTEMS amp MODELING

SIMULATION

1

2

3

4THE MAIN AREAS OF CSS

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

• Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events.

• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.

MODELING NEWS CYCLE DYNAMICS
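Leskovec et al. cluster textual variants of short quoted phrases. Their method builds a graph of phrase inclusions and partitions it. The Python sketch below is a much simpler stand-in. It assumes two phrases are variants when the shorter one appears word for word inside the longer one, and it merges linked phrases with a union-find structure. The function names and example phrases are hypothetical.

```python
import re


def tokens(phrase):
    """Lower-case word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", phrase.lower())


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run of words inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def cluster_variants(phrases):
    """Group phrases into clusters of variants using union-find."""
    parent = list(range(len(phrases)))

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    toks = [tokens(p) for p in phrases]
    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            short, longer = sorted((toks[i], toks[j]), key=len)
            # Link the two phrases if the shorter is a word-for-word excerpt.
            if short and contains(longer, short):
                parent[find(i)] = find(j)

    clusters = {}
    for idx, phrase in enumerate(phrases):
        clusters.setdefault(find(idx), []).append(phrase)
    return list(clusters.values())


if __name__ == "__main__":
    quotes = [
        "You can put lipstick on a pig, but it's still a pig",
        "lipstick on a pig",
        "you can put lipstick on a pig",
        "yes we can",
    ]
    for cluster in cluster_variants(quotes):
        print(cluster)
```

Real quotations are often slightly altered rather than just shortened. A practical system would therefore need fuzzier matching, such as word-level edit distance, and a way to cope with the quadratic number of comparisons.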

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A., Mylopoulos, Y. A. (2005). A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, vol. 81, pp. 175–187. doi:10.1177/0037549705053172

• Picardi, A. C. and Saeed, K. (1979). The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109–118.

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING
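The simulation section above lists system dynamics among the variable-oriented models. As an illustration of that style, here is a one-stock model of an aquifer in Python. Natural recharge flows in, withdrawals driven by growing demand flow out, and the stock is integrated with simple Euler steps. All parameter names and values are hypothetical and are not taken from either paper above.

```python
def simulate_aquifer(stock, recharge, demand, demand_growth, years, dt=1.0):
    """One-stock system dynamics model of a groundwater aquifer.

    stock         : initial groundwater volume (hypothetical units)
    recharge      : natural inflow per year (held constant)
    demand        : initial water demand per year
    demand_growth : fractional growth of demand per year (0.03 = 3 %)
    Returns the stock level at the end of every time step (Euler integration).
    """
    levels = []
    for _ in range(round(years / dt)):
        # Outflow: withdrawals cannot exceed what is available in this step.
        withdrawal = min(demand, stock / dt + recharge)
        # Stock changes by (inflow - outflow) * dt; clamp tiny negative
        # values caused by floating-point rounding.
        stock = max(0.0, stock + (recharge - withdrawal) * dt)
        # Demand grows exponentially over time.
        demand *= 1 + demand_growth * dt
        levels.append(stock)
    return levels


if __name__ == "__main__":
    levels = simulate_aquifer(stock=100, recharge=10, demand=12,
                              demand_growth=0.05, years=30)
    depleted = next((year for year, level in enumerate(levels, 1)
                     if level < 1e-9), None)
    print("Aquifer depleted in year:", depleted)
```

The feedback structure matters more than the numbers. When demand stays below recharge the stock holds steady. Once growing demand exceeds recharge, the stock drains at an accelerating rate, and after depletion withdrawals are capped by recharge.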

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V., De Pryck, K. (2014). Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July–December 2014, vol. 1, 2053951714543804, first published on August 5, 2014. doi:10.1177/2053951714543804

CLIMATE DIPLOMACY MAPPING

• Can electoral popularity be predicted using socially generated big data? Information Technology, Volume 56, Issue 5, Pages 246–253, ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: 10.1515/itit-2014-1046, September 2014.

• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large-scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

PREDICTING ELECTIONS
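The abstract above relies on data streams such as Wikipedia page views. A first processing step for such data could look like the Python sketch below. It converts raw daily counts per candidate into shares of total attention and averages those shares over several days. The candidate names and numbers are hypothetical. As the abstract stresses, such shares are not a representative sample of voters.

```python
def popularity_shares(views):
    """Turn raw counts per candidate (e.g. daily Wikipedia page views)
    into each candidate's share of the total attention that day."""
    total = sum(views.values())
    if total == 0:
        return {name: 0.0 for name in views}
    return {name: count / total for name, count in views.items()}


def average_shares(daily_views):
    """Average the daily shares over several days to smooth out spikes.

    Assumes every day lists the same candidates.
    """
    daily_shares = [popularity_shares(day) for day in daily_views]
    names = daily_shares[0].keys()
    return {
        name: sum(day[name] for day in daily_shares) / len(daily_shares)
        for name in names
    }


if __name__ == "__main__":
    # Hypothetical page-view counts for three candidates over three days.
    days = [
        {"Candidate A": 5200, "Candidate B": 3100, "Candidate C": 1700},
        {"Candidate A": 4800, "Candidate B": 3900, "Candidate C": 1300},
        {"Candidate A": 6100, "Candidate B": 2800, "Candidate C": 1100},
    ]
    for name, share in average_shares(days).items():
        print(f"{name}: {share:.1%}")
```

Averaging shares rather than raw counts keeps each day's weight equal, so one day with unusually high traffic does not dominate the result.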

• DIGIVAALIT 2015

• http://www.hiit.fi/digivaalit-2015

• Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)

• Trying to understand how media is used and how the public agenda is set

• CITIZEN MINDSCAPES

• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila

• Diving deep into the unscoped virtual territories of a nation’s collective consciousness may reveal something remarkable. The Finnish, hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed into research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum’s owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• Listen to “The Trust Engineers” podcast by Radiolab

• http://www.radiolab.org/story/trust-engineers

• Think about and discuss the different ethical research issues raised by what you heard

ETHICS

• Lazer, D. et al. (2009). Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721–723.

• Conte, R. (2012). Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325–346.

• Anderson, C. (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

• Einav, L. and Levin, J. (2014). The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf

• King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719–721.

• Wallach, H. (2014). Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank you!

Questions and comments?

Twitter: @laurieloranta


• The information processing paradigm has two aspects in relation to CSS:

1. Information processing is substantive to the complex social systems that CSS researches. This means that information processing takes part in the formation and evolution of those complex systems.

2. Information processing is methodological, in the sense that it serves as the core instrument of CSS.

COMPUTATIONAL PARADIGM OF SOCIETY

(Cioffi-Revilla 2014)

1. BIG DATA & AUTOMATED INFORMATION EXTRACTION

2. SOCIAL NETWORK ANALYSIS

3. COMPLEX SYSTEMS & MODELING

4. SIMULATION

THE MAIN AREAS OF CSS

• Areas of Computational Social Science:

1. (Big) data & automated information extraction

• Generate, retrieve, sort, modify, transform… data

2. Social networks

• Network analysis and social networks

3. Social complexity

• Social complexity, complex adaptive systems, complex systems modeling

4. Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

• Data and automated information extraction can be seen as the foundation for the other areas of CSS

• Raw data can be used as:

1. Data for its own sake, as research data -> data is the subject of research

2. Data for modeling or validating other phenomena via e.g. network analysis, complex systems analysis or simulation

• Data is generated, retrieved, modified, transformed… for research purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)
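As a small example of automated information extraction, the Python sketch below turns raw social media posts into structured counts of hashtags and @-mentions using regular expressions. The example posts are invented. Real platform data often arrives through an API with such entities already parsed, so the regular expressions here are a simplifying assumption.

```python
import re
from collections import Counter

# A '#' or '@' only starts a hashtag/mention when it is not preceded by a
# word character, so e-mail addresses such as bob@example.com are skipped.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")
MENTION = re.compile(r"(?<!\w)@(\w+)")


def extract_entities(posts):
    """Extract hashtag and mention counts (case-insensitive) from raw post texts."""
    hashtags, mentions = Counter(), Counter()
    for text in posts:
        hashtags.update(tag.lower() for tag in HASHTAG.findall(text))
        mentions.update(user.lower() for user in MENTION.findall(text))
    return hashtags, mentions


if __name__ == "__main__":
    posts = [  # invented example posts
        "Watching the election debate #Vaalit2015 with @alice",
        "#vaalit2015 turnout looks high today",
        "Questions? Mail me at bob@example.com",
    ]
    hashtags, mentions = extract_entities(posts)
    print(hashtags.most_common(3))  # [('vaalit2015', 2)]
    print(mentions.most_common(3))  # [('alice', 1)]
```

Once counted, these entities can serve either as research data in their own right or as input to network analysis, for example by linking users who mention each other.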

• A long tradition in network analysis (a much older field than CSS)

• Social networks (Facebook, Twitter, etc.) are just one part of network analysis

• Many other social interactions can be modeled as networks -> thus social networks are not technology-dependent as such

• -> e.g. modeling a family as a network

• -> e.g. modeling a project as a network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)
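Modeling a project or a family as a network only requires people (nodes) and the ties between them. The Python sketch below stores a hypothetical project team as an undirected adjacency map. It then computes two standard measures. Degree centrality is the share of other members each person is directly tied to. Breadth-first-search distance is the number of hops that separate two people.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected network as an adjacency map: person -> set of contacts."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Share of the other members each person is directly tied to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(contacts) / (n - 1) for node, contacts in graph.items()}


def distances_from(graph, source):
    """Breadth-first search: number of hops from `source` to each reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for contact in graph.get(node, ()):
            if contact not in dist:
                dist[contact] = dist[node] + 1
                queue.append(contact)
    return dist


if __name__ == "__main__":
    # A hypothetical project organisation: who communicates with whom.
    team = build_graph([
        ("manager", "dev1"), ("manager", "dev2"), ("manager", "tester"),
        ("dev1", "dev2"), ("tester", "client"),
    ])
    print(degree_centrality(team))       # the manager has the most direct ties
    print(distances_from(team, "dev1"))  # the client is three hops from dev1
```

The same code works unchanged for a family network. Only the edge list differs, which illustrates the point above that social networks are not tied to any particular technology.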

• Society seen as a complex adaptive system

• Phase transitions

• Adaptation (a multi-stage process)

• Need -> intent -> capacity -> implementation

• Goal

• Information processing in many parts of complex adaptive systems

• To help adaptation: allocating resources, coordination…

• Family as a complex adaptive system

• Development: hardships, births, deaths, successes, failures

• Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

• Three types of systems:

1. Natural systems

2. Human systems

3. Artificial systems

• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature

• Humans pursue the strategy of building artifacts to achieve goals

• Two kinds of artificial systems working in synergy:

• Tangible (e.g. roads, buildings)

• Intangible (e.g. organisations, social structures)

SIMON’S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)



Page 32: Introduction to Computational Social Science - Lecture 1

bull Areas of Computational Social Science

1 (Big) Data amp automated data extraction

bull Generate retrieve sort modify transform hellip data

2 Social Networks

bull Network analysis and social networks

3 Social Complexity

bull Social complexity complex adaptive systems complex

systems modeling

4 Simulation

FOUR MAIN AREAS OF CSS

(Cioffi-Revilla 2014)

bull Data and automated information extraction can be seen as foundation

for the other areas of CSS

bull Raw data can be used as

1 Data for its own sake as research data -gt data is the subject of

research

2 Data for modeling or validating other phenomena via eg network

analysis complex systems analysis or simulation

bull Data is generated retrieved modified transformedhellip for research

purposes via computational automation

BIG DATA amp AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)

bull A long tradition in network analysis (much older field than CSS)

bull Social Networks (Facebook Twitter etc) just one part of network

analysis

bull Many other social interactions can be modeled as networks -gt thus

social networks are not technology dependent as such

bull -gt eg modeling family as network

bull -gt eg modeling a project as network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)

bull Society seen as a complex adaptive system

bull Phase transitions

bull Adaptation (multi stage process)

bull Need -gt intent -gt capacity -gt implementation

bull Goal

bull Information processing in many parts of Complex adaptive systems

bull To help adaptation allocating resources coordination hellip

bull Family as and complex adaptive system

bull Development hardships births deaths successes failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

• DIGIVAALIT 2015

• http://www.hiit.fi/digivaalit-2015

• Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)

• Trying to understand how media is used and how the public agenda is set

• CITIZEN MINDSCAPES

• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila

• Diving deep into the unscoped virtual territories of a nation’s collective consciousness may reveal something remarkable. The hugely popular Finnish Suomi24 discussion forum has 1.9 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed for research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum’s owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• Listen to “The Trust Engineers” podcast by Radiolab

• http://www.radiolab.org/story/trust-engineers

• Think about and discuss different ethical research issues in relation to what you heard

ETHICS

• Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721–723.

• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325–346.

• Anderson, C. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

• Einav, L. and Levin, J. 2014. The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf

• King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719–721.

• Wallach, H. 2014. Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta


• Data and automated information extraction can be seen as the foundation for the other areas of CSS

• Raw data can be used as:

1. Data for its own sake, as research data -> data is the subject of research

2. Data for modeling or validating other phenomena via e.g. network analysis, complex systems analysis or simulation

• Data is generated, retrieved, modified, transformed… for research purposes via computational automation

BIG DATA & AUTOMATED INFORMATION EXTRACTION

(Cioffi-Revilla 2014)
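The idea of turning raw data into research data through computational automation can be illustrated with a very small sketch. It assumes, purely for illustration, that the raw data are short social media posts and that the information we want to extract is hashtags and @-mentions; the posts are made up.

```python
import re
from collections import Counter
from typing import Dict, List

# Simple patterns for hashtags and @-mentions (an illustrative assumption,
# not a complete tokenizer for any real platform).
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def extract(post: str) -> Dict[str, List[str]]:
    """Turn one raw text post into a structured record."""
    return {
        "hashtags": [h.lower() for h in HASHTAG.findall(post)],
        "mentions": [m.lower() for m in MENTION.findall(post)],
    }

# Made-up raw posts.
posts = [
    "Voting today! #vaalit2015 @yle",
    "Debate tonight #vaalit2015 #politics",
    "No tags here",
]

records = [extract(p) for p in posts]
hashtag_counts = Counter(tag for r in records for tag in r["hashtags"])
print(hashtag_counts.most_common(1))  # [('vaalit2015', 2)]
```

The structured records can then serve either use listed above: as the object of study in themselves, or as input to network analysis or simulation.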

• A long tradition in network analysis (a much older field than CSS)

• Social networks (Facebook, Twitter, etc.) are just one part of network analysis

• Many other social interactions can be modeled as networks -> thus social networks are not technology-dependent as such

• -> e.g. modeling a family as a network

• -> e.g. modeling a project as a network

SOCIAL NETWORKS

(Cioffi-Revilla 2014)
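As a sketch of how a non-technological social structure such as a family can be modeled as a network, the example below stores ties in an adjacency list and computes two basic measures: degree (number of direct ties) and shortest-path distance via breadth-first search. The family members and ties are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph as an adjacency list (node -> set of neighbours)."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree(graph: Dict[str, Set[str]], node: str) -> int:
    """Number of direct ties a node has."""
    return len(graph[node])

def distance(graph: Dict[str, Set[str]], start: str, goal: str) -> int:
    """Breadth-first search: number of ties on the shortest path, or -1 if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return -1

# Hypothetical family ties.
family = build_graph([
    ("grandmother", "mother"),
    ("mother", "father"),
    ("mother", "daughter"),
    ("father", "daughter"),
    ("father", "son"),
])

print(degree(family, "father"))                # 3
print(distance(family, "grandmother", "son"))  # 3
```

The same code works unchanged for a project organisation: only the node names and ties differ, which is the point that social networks are not tied to any particular technology.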

bull Society seen as a complex adaptive system

bull Phase transitions

• Adaptation (multi-stage process)

• Need -> intent -> capacity -> implementation

• Goal

• Information processing in many parts of complex adaptive systems

• To help adaptation: allocating resources, coordination…

• Family as a complex adaptive system

• Development: hardships, births, deaths, successes, failures

bull Adaptation over decades

SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)
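Because adaptation in this view only succeeds if every stage in the chain Need -> intent -> capacity -> implementation is completed, it can be reasoned about as a chain of conditions. A minimal sketch, assuming each stage succeeds independently with some probability (the probabilities below are hypothetical). Under that assumption the chance of full adaptation is the product of the stage probabilities, and a Monte Carlo run should agree with it.

```python
import math
import random
from typing import Dict, Optional

STAGES = ["need", "intent", "capacity", "implementation"]

def adaptation_probability(p: Dict[str, float]) -> float:
    """Probability that all stages succeed, assuming the stages are independent."""
    return math.prod(p[s] for s in STAGES)

def simulate_once(p: Dict[str, float], rng: random.Random) -> Optional[str]:
    """Run the chain once; return the first failing stage, or None if adaptation succeeds."""
    for stage in STAGES:
        if rng.random() >= p[stage]:
            return stage
    return None

# Hypothetical stage probabilities.
probs = {"need": 0.9, "intent": 0.8, "capacity": 0.5, "implementation": 0.5}
print(adaptation_probability(probs))  # approximately 0.18

rng = random.Random(42)
runs = 10_000
successes = sum(simulate_once(probs, rng) is None for _ in range(runs))
print(successes / runs)  # close to the analytic value
```

Even fairly likely individual stages can multiply into a low overall probability, which is one way to see why adaptation is a demanding, multi-stage process.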

• Three types of systems

1. Natural systems

2. Human systems

3. Artificial systems

• Artificial systems (or artifacts) exist because they have a function: they serve as adaptive buffers between humans and nature

• Humans pursue the strategy of building artifacts to achieve goals

• Two kinds of artificial systems working in synergy

• Tangible (e.g. roads, buildings)

• Intangible (e.g. organisations, social structures)

SIMON’S THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1. Variable-Oriented Models

• System Dynamics Models (e.g. modeling a nuclear plant)

• Queuing Models (e.g. modeling how a box office line behaves)

2. Object-Oriented Models

• Cellular automata (e.g. Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life, http://pmav.eu/stuff/javascript-game-of-life-v3.1.1)

• Agent-based models (e.g. modeling the communication of a project organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)
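The Game of Life linked above is the textbook cellular automaton: every cell is alive or dead, and its next state depends only on how many of its eight neighbours are alive (a dead cell with exactly three live neighbours is born; a live cell with two or three survives; all others die or stay dead). A minimal sketch on an unbounded grid, storing only the coordinates of live cells:

```python
from collections import Counter
from typing import Set, Tuple

Cell = Tuple[int, int]

def step(live: Set[Cell]) -> Set[Cell]:
    """Advance Conway's Game of Life by one generation on an unbounded grid."""
    # Count live neighbours for every cell adjacent to at least one live cell.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival with 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

# A "blinker": a bar of three cells that flips between horizontal and vertical.
blinker = {(0, 1), (1, 1), (2, 1)}
gen1 = step(blinker)
print(sorted(gen1))           # [(1, 0), (1, 1), (1, 2)]
print(step(gen1) == blinker)  # True
```

Storing only live cells keeps the grid unbounded and makes each step proportional to the number of live cells rather than to the size of a fixed board.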

• 4 main areas of Computational Social Science

1. Big data and automatic information extraction

2. Social networks

3. Social complexity

4. Simulation

• Typically, all of these work together

• CSS has a lot of problems, especially concerning privacy and ethics

• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods

SUMMARY

SOME RESEARCH EXAMPLES

• Tracking and predicting how flu or other contagious diseases spread

• Based on network and social media analysis and modeling

• Many different variations; one of the first was Google Flu Trends, based on flu-related search queries

• For example:

• Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H. and Liu, B. 2011. Predicting Flu Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702–707, 10–15 April 2011.

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC
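The classic baseline in epidemiological modeling is the compartmental SIR model, in which the population moves from Susceptible to Infected to Recovered. A minimal sketch using a forward-Euler integration; the parameters (transmission rate beta = 0.3, recovery rate gamma = 0.1, so R0 = beta / gamma = 3) are hypothetical, and this is not the model used in the Twitter-based study cited above.

```python
from typing import List, Tuple

State = Tuple[float, float, float]  # (susceptible, infected, recovered) as population fractions

def sir(beta: float, gamma: float, s0: float, i0: float,
        days: int, steps_per_day: int = 10) -> List[State]:
    """Integrate the classic SIR equations with a forward-Euler scheme.

    ds/dt = -beta*s*i,  di/dt = beta*s*i - gamma*i,  dr/dt = gamma*i
    Returns one (s, i, r) state per day, starting with day 0.
    """
    dt = 1.0 / steps_per_day
    s, i, r = s0, i0, 1.0 - s0 - i0
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory

# Hypothetical parameters: 1% of the population initially infected.
traj = sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, days=160)
peak_day = max(range(len(traj)), key=lambda d: traj[d][1])
print("peak on day", peak_day, "with infected share", round(traj[peak_day][1], 3))
print("never infected by day 160:", round(traj[-1][0], 3))
```

Data-driven approaches such as search-query or Twitter monitoring are typically used to estimate or nowcast the infected share, which is the quantity this kind of mechanistic model tries to explain.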

• http://www.google.org/flutrends/intl/en_us/

GOOGLE FLU TRENDS
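The idea behind Google Flu Trends is that the share of flu-related search queries can track how many people are actually ill. A minimal sketch, assuming a logit-linear relation logit(P) = b0 + b1·logit(Q) between the query share Q and the share P of doctor visits for influenza-like illness (the original Google Flu Trends work reported a model of this general form). The data are synthetic, generated from known coefficients so that the fit can be checked.

```python
import math
from typing import List, Tuple

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def fit(query_share: List[float], flu_rate: List[float]) -> Tuple[float, float]:
    """Ordinary least squares of logit(flu_rate) on logit(query_share); returns (b0, b1)."""
    xs = [logit(q) for q in query_share]
    ys = [logit(p) for p in flu_rate]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def predict(b0: float, b1: float, q: float) -> float:
    """Estimated flu-visit share for a given query share."""
    return inv_logit(b0 + b1 * logit(q))

# Synthetic data generated from b0 = 1.0, b1 = 0.8 (made-up, not real flu data).
qs = [0.001, 0.002, 0.004, 0.008, 0.016]
ps = [inv_logit(1.0 + 0.8 * logit(q)) for q in qs]

b0, b1 = fit(qs, ps)
print(round(b0, 3), round(b1, 3))  # 1.0 0.8
```

Because the synthetic points lie exactly on the assumed curve, the fit recovers the generating coefficients. With real data such a model has to be re-validated regularly, since changes in search behaviour can break the relation.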

• Leskovec, J., Backstrom, L. and Kleinberg, J. 2009. Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497–506. dl.acm.org

• Tracking new topics, ideas, and memes across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days – the time scale at which we perceive news and events.

• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.

MODELING NEWS CYCLE DYNAMICS
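The abstract mentions clustering textual variants of the same phrase. The paper uses its own, more elaborate graph-based algorithm; the sketch below is a deliberately simplified stand-in. It assumes two quotes are variants when the shorter one appears word for word inside the longer one, and groups linked quotes with a union-find structure. The phrases are made-up examples.

```python
from typing import Dict, List

def words(phrase: str) -> List[str]:
    """Lower-case word tokens with surrounding punctuation stripped."""
    return [w.strip(".,!?\"'").lower() for w in phrase.split()]

def contains(longer: List[str], shorter: List[str]) -> bool:
    """True if `shorter` appears as a contiguous run of words inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))

def cluster(phrases: List[str]) -> List[List[str]]:
    """Group phrases into variant clusters using union-find."""
    parent = list(range(len(phrases)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tokens = [words(p) for p in phrases]
    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            short, long_ = sorted((tokens[i], tokens[j]), key=len)
            if contains(long_, short):
                parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, phrase in enumerate(phrases):
        groups.setdefault(find(i), []).append(phrase)
    return list(groups.values())

phrases = [
    "you can put lipstick on a pig",
    "lipstick on a pig",
    "you can put lipstick on a pig but it is still a pig",
    "yes we can",
]
for group in cluster(phrases):
    print(group)
```

The pairwise comparison is quadratic in the number of phrases, so it only illustrates the idea; the scalability the abstract emphasises requires a more efficient approach.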

• Athanasiadis, I. N., Mentes, A. K., Mitkas, P. A. and Mylopoulos, Y. A. 2005. A Hybrid Agent-Based Model for Estimating Residential Water Demand. SIMULATION, March 2005, Vol. 81, pp. 175–187. doi:10.1177/0037549705053172

• Picardi, A. C. and Saeed, K. 1979. The dynamics of water policy in southwestern Saudi Arabia. SIMULATION, October 1979, Vol. 33, no. 4, pp. 109–118.

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING
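To illustrate the agent-based idea behind such water demand models, here is a toy model. It is a hypothetical sketch, not the hybrid model of Athanasiadis et al. Households sit on a ring. A household that is not yet conserving water adopts conservation with a probability that grows with the share of its two neighbours who already conserve. All parameters (150 litres per day, a 20% saving, influence 0.5, 5 initial adopters) are made-up assumptions.

```python
import random
from typing import List

def simulate(n_households: int = 100, steps: int = 20, seed: int = 1,
             base_demand: float = 150.0, saving: float = 0.2,
             initial_adopters: int = 5, influence: float = 0.5) -> List[float]:
    """Return total daily demand (litres) at each step of a toy agent-based model."""
    rng = random.Random(seed)
    conserving = [False] * n_households
    for i in rng.sample(range(n_households), initial_adopters):
        conserving[i] = True

    totals = []
    for _ in range(steps):
        # Record total demand for the current state.
        totals.append(sum(base_demand * (1 - saving) if c else base_demand
                          for c in conserving))
        # Synchronous update: each non-adopter looks at its two ring neighbours.
        new_state = conserving[:]
        for i in range(n_households):
            if not conserving[i]:
                left = conserving[(i - 1) % n_households]
                right = conserving[(i + 1) % n_households]
                neighbour_share = (left + right) / 2
                if rng.random() < influence * neighbour_share:
                    new_state[i] = True
        conserving = new_state
    return totals

totals = simulate()
print(totals[0], totals[-1])
```

Because adopters never revert in this toy version, total demand can only fall over time. Richer models would add price signals, weather, or households abandoning conservation.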

• Venturini, T., Laffite, N. B., Cointet, J.-P., Gray, I., Zabban, V. and De Pryck, K. 2014. Three maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data & Society, July–December 2014, Vol. 1, 2053951714543804, first published on August 5, 2014. doi:10.1177/2053951714543804

CLIMATE DIPLOMACY MAPPING




Page 36: Introduction to Computational Social Science - Lecture 1

bull Three types of systems

1 Natural systems

2 Human systems

3 Artificial systems

bull Artificial systems (or artifacts) exist because they have a function they

serve as adaptive buffers between humans and nature

bull Humans pursue the strategy of building artifacts to achieve goals

bull Two kinds of artificial systems working in synergy

bull Tanglible (eg roads buildings)

bull Intanglibe ( eg organisations social structures)

SIMONrsquoS THEORY OF ARTIFACTS AND SOCIAL COMPLEXITY

(Cioffi-Revilla 2014)

bull Large (and old) research field

bull Two main areas of simulation

1 Variable-Oriented Models

bull System Dynamics Models (eg modeling a nuclear plant)

bull Queuing Models (eg modeling how a box office line behaves)

2 Object-Oriented Models

bull Cellular automate (eg Game of life httpenwikipediaorgwikiConway27s_Game_of_Life

httppmaveustuffjavascript-game-of-life-v311)

bull Agent based models (eg Modeling the communication of a project

organisation of many individuals)

bull Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)

bull 4 main areas of Computational Social Science

1 Big data and automatic information extraction

2 Social networks

3 Social complexity

4 Simulation

bull Typically all of these working together

bull CSS has a lot of problems especially concerning privacy and ethics

bull CSS is not a silver bullet and it does not replace other social science

fields or methods Instead CSS complements other research fields and

methods

SUMMARY

SOME RESEARCH EXAMPLES

bull Tracking and predicting how flu or other contagious diseases spread

bull Based on network and social media analysis and modeling

bull Many different variations one of the first Google Flu Trends based on

flu related search queries

bull For example

bull Achrekar H Gandhe A Lazarus R Ssu-Hsin Yu Benyuan Liu 2011 Predicting Flu

Trends using Twitter data Computer Communications Workshops (INFOCOM

WKSHPS) 2011 IEEE Conference on vol no pp702707 10-15 April 2011

MODELING THE SPREAD OF DISEASESALREADY AN EPIDEMOLOGY CLASSIC

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams: Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.

PREDICTING ELECTIONS
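The abstract does not spell out the forecasting procedure. The sketch below is a naive baseline in the spirit of the idea. It assumes daily Wikipedia page-view counts per party have already been collected. It turns each party's page views into a share of total attention, then measures how far those shares are from actual vote shares. The function names and all numbers are hypothetical and are not data or methods from the paper:

```python
from typing import Dict


def attention_shares(page_views: Dict[str, int]) -> Dict[str, float]:
    """Convert raw page-view counts per party into shares of total attention."""
    total = sum(page_views.values())
    if total == 0:
        raise ValueError("no page views recorded")
    return {party: views / total for party, views in page_views.items()}


def mean_absolute_error(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Average absolute gap between predicted and actual vote shares."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


# Hypothetical example data (not from the paper).
views = {"Party A": 5000, "Party B": 3000, "Party C": 2000}
votes = {"Party A": 0.40, "Party B": 0.35, "Party C": 0.25}

shares = attention_shares(views)
print(shares)                                          # {'Party A': 0.5, 'Party B': 0.3, 'Party C': 0.2}
print(round(mean_absolute_error(shares, votes), 3))    # 0.067
```

As the abstract stresses, attention data suffer from representativeness problems. A small error in one election therefore says little about how well the same signal would work in a different political system or context.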

• DIGIVAALIT 2015
• http://www.hiit.fi/digivaalit-2015
• Researching the 2015 parliamentary elections in Finland, focusing on digital media data (Twitter, Facebook)
• Trying to understand how media is used and how the public agenda is set

• CITIZEN MINDSCAPES
• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila
• Diving deep into the unscoped virtual territories of a nation’s collective consciousness may reveal something remarkable. The Finnish, hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about anything and everything close to their hearts. If this data could be harnessed into research use, what amazing things could we learn about Finnish society? A team of media professionals at the forum’s owner company Aller and researchers at the National Consumer Research Center plan to make use of this immense database.

DIGIVAALIT 2015 & CITIZEN MINDSCAPES

• Listen to “The Trust Engineers” podcast by Radiolab
• http://www.radiolab.org/story/trust-engineers
• Think about and discuss different ethical research issues in relation to what you heard

ETHICS

• Lazer, D. et al. (2009). Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721–723.
• Conte, R. et al. (2012). Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325–346.
• Anderson, C. (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• Einav, L. and Levin, J. (2014). The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf
• King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719–721.
• Wallach, H. (2014). Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta


• Large (and old) research field
• Two main areas of simulation
1. Variable-Oriented Models
• System Dynamics Models (e.g. modeling a nuclear plant)
• Queuing Models (e.g. modeling how a box office line behaves)
2. Object-Oriented Models
• Cellular automata (e.g. Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life, http://pmav.eu/stuff/javascript-game-of-life-v3.1.1)
• Agent-based models (e.g. modeling the communication of a project organisation of many individuals)
• Also Evolutionary Models

SIMULATION

(Cioffi-Revilla 2014)
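Conway's Game of Life, the cellular automaton linked above, updates every cell from its eight neighbours. A dead cell with exactly three live neighbours becomes alive. A live cell with two or three live neighbours survives. All other cells die or stay dead. The minimal Python sketch below performs one update step. It stores only the live cells, as a set of (row, column) coordinates on an unbounded grid. This representation was chosen here for brevity and is not taken from the linked pages:

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]


def step(live: Set[Cell]) -> Set[Cell]:
    """Advance Conway's Game of Life by one generation on an unbounded grid."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    counts: Dict[Cell, int] = {}
    for (r, c) in live:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr or dc:
                    n = (r + dr, c + dc)
                    counts[n] = counts.get(n, 0) + 1
    # Birth with exactly 3 neighbours; survival with 2 or 3.
    return {cell for cell, k in counts.items() if k == 3 or (k == 2 and cell in live)}


blinker = {(1, 0), (1, 1), (1, 2)}   # a horizontal line of three cells
print(sorted(step(blinker)))          # [(0, 1), (1, 1), (2, 1)]
```

The "blinker" above oscillates between a horizontal and a vertical line. Simple local rules like these are what make cellular automata a useful illustration of object-oriented simulation.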

• 4 main areas of Computational Social Science
1. Big data and automatic information extraction
2. Social networks
3. Social complexity
4. Simulation
• Typically, all of these work together
• CSS has a lot of problems, especially concerning privacy and ethics
• CSS is not a silver bullet, and it does not replace other social science fields or methods. Instead, CSS complements other research fields and methods.

SUMMARY

SOME RESEARCH EXAMPLES

• Tracking and predicting how flu or other contagious diseases spread
• Based on network and social media analysis and modeling
• Many different variations; one of the first was Google Flu Trends, based on flu-related search queries
• For example:
• Achrekar, H., Gandhe, A., Lazarus, R., Yu, S.-H., Liu, B. (2011). Predicting Flu Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 702–707, 10–15 April 2011.

MODELING THE SPREAD OF DISEASES: ALREADY AN EPIDEMIOLOGY CLASSIC
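The classic starting point for modeling disease spread is the compartmental SIR model. It splits a population of size N into susceptible (S), infected (I) and recovered (R) groups: dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI. It is also a small example of the variable-oriented (system dynamics) simulations listed on the simulation slide. The minimal sketch below uses forward-Euler integration. The transmission rate β = 0.3/day, the recovery rate γ = 0.1/day and the population size are illustrative assumptions, not values from the cited studies:

```python
from typing import List, Tuple


def simulate_sir(n: int = 1_000_000, i0: int = 10, beta: float = 0.3, gamma: float = 0.1,
                 days: int = 160, dt: float = 0.1) -> List[Tuple[float, float, float]]:
    """Integrate the SIR equations with forward Euler; return (S, I, R) at the end of each day."""
    s, i, r = float(n - i0), float(i0), 0.0
    steps_per_day = int(round(1 / dt))
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # flow S -> I during this time step
            new_recoveries = gamma * i * dt          # flow I -> R during this time step
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


hist = simulate_sir()
peak_day = max(range(len(hist)), key=lambda d: hist[d][1])
print(f"peak on day {peak_day} with {hist[peak_day][1]:.0f} infected")
```

With β/γ = 3, this deterministic model eventually infects roughly 94% of the population. Studies such as the Twitter-based one cited above try to estimate or forecast the observed counts that models like this describe.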

• http://www.google.org/flutrends/intl/en_us

GOOGLE FLU TRENDS
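Query-based flu trackers relate how often flu-related searches occur to officially reported influenza-like illness (ILI) rates. The minimal sketch below assumes a single aggregate query fraction per week. It fits a straight line on the log-odds (logit) scale by ordinary least squares. The data are synthetic, and the model form is an assumption made here, not a description of Google's production system:

```python
import math
from typing import List, Tuple


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def fit(query_frac: List[float], ili_rate: List[float]) -> Tuple[float, float]:
    """Ordinary least squares of logit(ILI rate) on logit(query fraction); returns (intercept, slope)."""
    xs = [logit(q) for q in query_frac]
    ys = [logit(p) for p in ili_rate]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def predict(intercept: float, slope: float, q: float) -> float:
    """Estimated ILI rate for a new week's query fraction q."""
    return inv_logit(intercept + slope * logit(q))


# Synthetic weekly data generated from a known relationship (hypothetical, not real surveillance data).
weeks_q = [0.001, 0.002, 0.004, 0.008]
weeks_ili = [inv_logit(1.0 + 0.9 * logit(q)) for q in weeks_q]

b0, b1 = fit(weeks_q, weeks_ili)
print(round(b0, 3), round(b1, 3))   # 1.0 0.9
```

Search behaviour shifts over time, so a fit like this needs regular re-validation against official surveillance data.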

• Leskovec, J., Backstrom, L., Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 497–506, 2009. dl.acm.org

• Tracking new topics, ideas, and "memes" across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events.
• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis.

MODELING NEWS CYCLE DYNAMICS
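Leskovec et al. cluster textual variants of quoted phrases with their own scalable algorithm, which is not reproduced here. The sketch below is a much simpler stand-in. It links two phrases when most words of the shorter one also occur in the longer one. It then treats each connected group, found with a union-find structure, as one meme. The `threshold` value and the example quotes are hypothetical:

```python
import re
from itertools import combinations
from typing import Dict, List


def tokens(phrase: str) -> List[str]:
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", phrase.lower())


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the shorter phrase's words appear in the longer phrase."""
    ta, tb = set(tokens(a)), set(tokens(b))
    short, long_ = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    return bool(short) and len(short & long_) / len(short) >= threshold


def cluster(phrases: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Group phrase variants into connected components of the 'similar' relation."""
    parent = list(range(len(phrases)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(phrases)), 2):
        if similar(phrases[i], phrases[j], threshold):
            parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, p in enumerate(phrases):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


quotes = [
    "you can put lipstick on a pig",
    "you can put lipstick on a pig but it's still a pig",
    "lipstick on a pig",
    "yes we can",
    "yes we can yes we can",
]
for group in cluster(quotes):
    print(group)
```

Unlike a scalable method, this sketch compares every pair of phrases, so its cost grows quadratically with the number of phrases.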





Page 41: Introduction to Computational Social Science - Lecture 1

bull httpwwwgoogleorgflutrendsintlen_us

GOOGLE FLU TRENDS

bull Leskovec J Backstrom L Kleinberg J 2009 Meme-tracking and the dynamics of

the news cycle Proceedings of the 15th ACM ACM SIGKDD international conference

on Knowledge discovery and data mining Pages 497-506 2009 - dlacmorg

bull Tracking new topics ideas and memes across the Web has been an issue of considerable interest

Recent work has developed methods for tracking topic shifts over long time scales as well as abrupt

spikes in the appearance of particular named entities However these approaches are less well suited to

the identification of content that spreads widely and then fades over time scales on the order of days -

the time scale at which we perceive news and events

bull We develop a framework for tracking short distinctive phrases that travel relatively intact through on-line

text developing scalable algorithms for clustering textual variants of such phrases we identify a broad

class of memes that exhibit wide spread and rich variation on a daily basis

MODELING NEWS CYCLE DYNAMICS

bull Athanasiadis I N Mentes A K Mitkas P A Mylopoulos Y A 2005 A Hybrid Agent-

Based Model for Estimating Residential Water Demand SIMULATION March 2005 81

175-187 doi1011770037549705053172

bull Picardi C and Saeed K 1979The dynamics of water policy in southwestern Saudi

Arabia Anthony SIMULATION October 1979 vol 33 4 pp 109-118

SUSTAINABLE WATER DEMAND MANAGEMENT MODELING

bull Venturini T Laffite N B Cointet J-P Gray I Zabban V De Pryck K 2014Three

maps and three misunderstandings A digital mapping of climate diplomacy Big Data

amp Society July-December 2014 1 2053951714543804 first published on August 5 2014

doi1011772053951714543804

CLIMATE DIPLOMACY MAPPING

bull Can electoral popularity be predicted using socially generated big

data Information Technology Volume 56 Issue 5 Pages 246ndash253

ISSN (Online) 2196-7032 ISSN (Print) 1611-2776 DOI 101515itit-

2014-1046 September 2014

bull Today our more-than-ever digital lives leave significant footprints in cyberspace Large scale collections

of these socially generated footprints often known as big data could help us to re-investigate different

aspects of our social collective behaviour in a quantitative framework In this contribution we discuss one

such possibility the monitoring and predicting of popularity dynamics of candidates and parties through

the analysis of socially generated data on the web during electoral campaigns Such data offer

considerable possibility for improving our awareness of popularity dynamics However they also suffer

from significant drawbacks in terms of representativeness and generalisability In this paper we discuss

potential ways around such problems suggesting the nature of different political systems and contexts

might lend differing levels of predictive power to certain types of data source We offer an initial

exploratory test of these ideas focussing on two data streams Wikipedia page views and Google

search queries On the basis of this data we present popularity dynamics from real case examples of

recent elections in three different countries

PREDICTING ELECTIONS

bull DIGIVAALIT 2015

bull httpwwwhiitfidigivaalit-2015

bull Researching the parliamentary elections 2015 in Finland focusing on

digital media data (Twitter Facebook)

bull Trying to understand how media is used and how public agenda is set

bull CITIZEN MINDSCAPES

bull httpchallengehelsinkifiblogcitizen-mindscapes-kansakunnan-

mielentilabull Diving deep into the unscoped virtual territories of a nationrsquos collective consciousness may reveal something remarkable The

Finnish hugely popular Suomi24 discussion forum has 19 million monthly visitors who use the online town square to talk about

anything and everything close to their hearts If this data could be harnessed into research use what amazing things could we learn

about Finnish society A team of media professionals at the forums owner company Aller and researchers at the National Consumer

Research Center plan to make use of this immense database

DIGIVAALIT 2015 amp CITIZENMINDSCAPES

• Listen to "The Trust Engineers" episode of the Radiolab podcast

• http://www.radiolab.org/story/trust-engineers

• Think about and discuss the ethical research issues raised by what you heard

ETHICS

• Lazer, D. et al. 2009. Computational Social Science. Science, 6 February 2009, Vol. 323, no. 5915, pp. 721–723.

• Conte, R. 2012. Manifesto of Computational Social Science. The European Physical Journal Special Topics, November 2012, Vol. 214, Issue 1, pp. 325–346.

• Anderson, C. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

• Einav, L. and Levin, J. 2014. The Data Revolution and Economic Analysis. In Innovation Policy and the Economy, edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf

• King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science, 11 February 2011, Vol. 331, no. 6018, pp. 719–721.

• Wallach, H. 2014. Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency. Medium.com. https://medium.com/@hannawallach/big-data-machine-learning-and-the-social-sciences-927a8e20460d

LECTURE 1 READING

Thank You

Questions and comments

Twitter: @laurieloranta
