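Sheng-Wei Chen (陳昇瑋), Chairman of the Taiwan Data Science Association and Research Fellow at the Institute of Information Science, Academia Sinica. An Introduction to Computational Social Science: When Computer Scientists Meet Social Science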

An Introduction to Computational Social Science: When Computer Scientists Meet Social Science



1

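Sheng-Wei Chen (陳昇瑋), Chairman, Taiwan Data Science Association

Research Fellow, Institute of Information Science, Academia Sinica

An Introduction to Computational Social Science: When Computer Scientists Meet Social Science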


2

Sheng-Wei (Kuan-Ta) Chen

Institute of Information Science, Academia Sinica

Computational Social Science: The Collaborative Futures of Big Data, Computer Science, and Social Sciences


3

PEOPLE


4

The Favorite Major for US College Athletes

(Source: USA Today, http://usatoday30.usatoday.com/sports/college/2008-11-18-majors-graphic_N.htm)


5

Social Science


6

Social Life is Hard to See

We can interview friends, but we cannot interview a friendship

Fleeting interaction
In private
Tedious to record over time, especially in large groups


7

Bigger Problems

Social phenomena involve many individuals interacting to produce collective entities
  Firms, markets, cultures, political parties, social movements, audiences
  “Micro-Macro” problem (aka “Emergence”)

Micro-macro problems are hard to study empirically
  Difficult to collect observational data about individuals, networks, and populations at the same time
  Even more difficult to do “macro”-scale experiments


8

1890 US Census

1st time Hollerith machines were used to tabulate US Census data (population: 62,947,714)


9

The Era of Big Data

Past: Government data, national survey data
Today: A variety of new data sources
  Economic data: trade, finance, e-cash / e-wallet, ...
  GIS data: satellite, GPS loggers, laser scanning cars, ...
  Sensor data: video surveillance, smart phones, wearable devices, mobile apps, beacons, ...


10

New kinds of data


Sheng-Wei Chen / The First Lesson in Data Science

Computer vision for healthcare

Video magnification (Slide credit: Jia-Bin Huang)


15

New kinds of data


16

Engagement and Exploration

Standing face-to-face?
Physical distance
Hand gesture, posture
Conversation patterns
Frequency of interruptions


18

Web as a Record of Social Interaction

Public web pages / discussions
Twitter, Facebook, blogs, news groups, wikis, MMOGs, Instagram, LastFM, Flickr, Spotify
Private email, WhatsApp, LINE, Slack
Text, images, sounds: speeches, commercials


19

New kinds of data


20

Computational Social Science

The science that investigates social phenomena through the medium of computing and statistical data processing.


23

Computational Social Science
An instrument-enabled scientific discipline

Microbiology: microscope
Radio astronomy: radar
Nanoscience: electron microscope


27

Technical Challenges

Computational infrastructures for dealing with
  More data: analyzing large amounts of data
  Fuzzy data: cleaning up imprecise and noisy data
  New kinds of data: processing real-time sensor streams and web data

Need for new substantive ideas
Need for new statistical methods (WHY in addition to WHAT and HOW)


28

3 Common Approaches

Macroscope
Virtual Lab
Empirical Modeling


29

APPROACHES
#1 MACROSCOPE
#2 VIRTUAL LAB
#3 EMPIRICAL MODELING


31

WE ARE WHAT WE SAY
Linguistics

Schwartz, H. Andrew, et al. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PLoS ONE 8.9 (2013): e73791.

Macroscope


32

Dataset

700 million words, phrases, and topic instances collected from 75,000 volunteers’ Facebook posts
Each user's personality (five-factor model), gender, and age were also recorded


33

What Words Do You Use?

male

female


34

How Old Are You? (#1)

13 - 18

19 - 22


35

How Old Are You? (#2)

23 - 29

30 - 65


36

Personality Traits

Extraversion

Introversion


38

Topics Across 4 Age-groups


39

Warm and Negative Words


40

Usage of “I” & “We”

Huge-volume data + simple analysis = crystal-clear language use patterns
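The "simple analysis" idea can be sketched in a few lines: compute each user's relative word frequencies and correlate every word with a user attribute. This is only an illustration. The toy posts and age labels below are invented, and a plain Pearson correlation is an assumed stand-in for the regression-based analysis in the cited paper.

```python
from collections import Counter
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(vx * vy)

def relative_frequencies(posts):
    """Map each word to its share of all words in one user's posts."""
    words = " ".join(posts).lower().split()
    total = len(words) or 1
    return {w: c / total for w, c in Counter(words).items()}

def word_attribute_correlations(users):
    """users: list of (posts, attribute) pairs, attribute numeric (e.g. 0/1)."""
    freqs = [relative_frequencies(posts) for posts, _ in users]
    attrs = [attr for _, attr in users]
    vocab = set().union(*freqs)
    return {w: pearson([f.get(w, 0.0) for f in freqs], attrs) for w in vocab}

# Invented toy data: attribute 1 = age 13-18, attribute 0 = age 30-65.
toy_users = [
    (["school was boring today", "homework after school"], 1),
    (["so much homework"], 1),
    (["long day at work", "work meeting tomorrow"], 0),
    (["kids are at daycare"], 0),
]
correlations = word_attribute_correlations(toy_users)
# Words most associated with the younger group come first.
ranked = sorted(correlations, key=correlations.get, reverse=True)
```

With real data the same loop would run over millions of posts, and the words at the two ends of `ranked` are what a word cloud per group would display.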


42

APPROACHES
#1 MACROSCOPE
#2 VIRTUAL LAB
#3 EMPIRICAL MODELING


43

Scaling up the Lab

Social science experiments have been heavily constrained by scale and speed
  Unit of analysis was individuals or small groups
  Experiments took months to design and run

Potentially, “virtual labs” lift both constraints
  State of the art is ~5,000 workers, but in principle one could construct a subject panel of ~100K to 1M
  Could shrink the hypothesis-testing cycle to days or hours


44

MOOD CONTAGION (& MANIPULATION) ON FACEBOOK

Social Psychology

Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. "Experimental evidence of massive-scale emotional contagion through social networks." Proceedings of the National Academy of Sciences 111.24 (2014): 8788-8790.

Virtual Lab


45

Facebook Mood Contagion

0.7 million (~ 0.04%) users on Facebook
3 million posts manipulated in one week
Hide some “positive” or “negative” emotional posts from users (in the experimental group)


46

Observations

Negative posts hidden

People who see more positive posts tend to post more positively, and vice versa.
Facebook users’ emotions can be easily manipulated by changing ALGORITHMS

Positive posts hidden


47

Ethical Issues (!)

Unethical experiment because it was conducted without users’ consent

Serious intrusion on users’ perceptions of their friend circles (and of society)

Well, Facebook's data use policy states that users' information will be used "for internal operations, including troubleshooting, data analysis, testing, research and service improvement," meaning that any user can become a lab rat.


48

FACEBOOK “I VOTED” BUTTON
Social Psychology & Politics

Bond, Robert M., et al. "A 61-million-person experiment in social influence and political mobilization." Nature 489.7415 (2012): 295-298.

Virtual Lab


49

“I Voted” Button

Direct messages to 61 million users on Facebook
  Informational: received by 1% of users
  Social: received by 98% of users
  Control group: 1% (no message received)

Informational

Social
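A minimal sketch of how users could be split into the three groups with the shares stated above. Hashing the user ID is an assumption made here for reproducibility, not the procedure reported in the paper, and the salt string and integer user IDs are hypothetical.

```python
import hashlib

# Group shares from the slide: 1% informational, 98% social, 1% control.
GROUPS = [("informational", 0.01), ("social", 0.98), ("control", 0.01)]

def assign_group(user_id, salt="i-voted-demo"):
    """Deterministically map a user ID to a group with the target shares."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a number in [0, 1).
    u = int(digest[:8], 16) / 16 ** 8
    cumulative = 0.0
    for name, share in GROUPS:
        cumulative += share
        if u < cumulative:
            return name
    return GROUPS[-1][0]  # guard against floating-point rounding

# Assign 100,000 hypothetical user IDs and count the group sizes.
counts = {name: 0 for name, _ in GROUPS}
for user_id in range(100_000):
    counts[assign_group(user_id)] += 1
```

Because the assignment depends only on the ID and the salt, it does not correlate with any user characteristic, which is the property randomization is meant to guarantee.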


51

Effect of Manipulation

[Figure: probability that a user claimed to have voted, plotted against the ratio of friends who voted]


53

2% more likely to click the “I voted” button, 0.3% more likely to seek information about a polling place, and 0.4% more likely to head to the polls.


54

Real-world Consequence (!)

In total, the messages directly increased turnout by about 60,000 votes, plus an estimated 280,000 votes indirectly through social contagion (out of 61 million users)

What if Facebook did not randomize the control/experimental groups?
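The turnout figures on this slide can be put in proportion with a short calculation. It uses only the three numbers quoted above; the variable names are illustrative.

```python
USERS = 61_000_000       # users who were part of the experiment
DIRECT_VOTES = 60_000    # votes attributed directly to the message
INDIRECT_VOTES = 280_000 # votes attributed to friends of recipients

total_votes = DIRECT_VOTES + INDIRECT_VOTES
share_of_users = total_votes / USERS
indirect_per_direct = INDIRECT_VOTES / DIRECT_VOTES

print(f"total additional votes: {total_votes:,}")           # 340,000
print(f"share of users: {share_of_users:.2%}")              # 0.56%
print(f"indirect per direct: {indirect_per_direct:.1f}")    # 4.7
```

The effect per user is tiny, yet the indirect effect is several times the direct one, and a few hundred thousand votes steered toward one side could matter in a close election.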


62

APPROACHES
#1 MACROSCOPE
#2 VIRTUAL LAB
#3 EMPIRICAL MODELING


63

Empirical Modeling

Traditional mathematical or computational modeling
  Tends to rely on many, often unrealistic, assumptions
  Not generally tested in detail against data
  Result is a proliferation of models that exist in parallel and are often incompatible with each other

New sources/scales of data allow us both to learn/test models and to calibrate them

[Diagram: Observations, Models, Lab, Field Observations]


65

Google Flu Trends

Nature 457, 1012-1014 (2009)


66

PREDICTION OF COUNTY-LEVEL HEART DISEASE MORTALITY

Medicine and Linguistics

Empirical Modeling

Eichstaedt, Johannes C., et al. "Psychological language on Twitter predicts county-level heart disease mortality." Psychological Science 26.2 (2015): 159-169.


68

Datasets

Heart disease
  Atherosclerotic heart disease mortality rates during 2009-2010

Predictors
  826 million tweets collected between June 2009 and March 2010
  Socioeconomic (income and education)
  Demographic (percentages of Black, Hispanic, married, and female residents)
  Health status (diabetes, obesity, smoking, and hypertension)


69

Prediction Accuracy


74

Language Use in Tweets


75

Social media opens up a new window of what humans actually feel and think


76

YOU ARE WHAT YOU LIKE
Social Psychology

Empirical Modeling

Kosinski, Michal, David Stillwell, and Thore Graepel. "Private traits and attributes are predictable from digital records of human behavior." Proceedings of the National Academy of Sciences 110.15 (2013): 5802-5805.


78

Personality Prediction

Personality traits
Gender, age, relationship status, # friends
Sexual orientation, ethnicity, religion, political inclination
Addictive substances (alcohol, drugs, cigarettes), parental separation
IQ, 5-Factor model, satisfaction with life


79

Data Collection

9,939,220 Likes (55,814 unique ones) from 58,466 Facebook volunteers

Sports
Music
Books
Restaurants
Popular websites


80

Ground truth

Political Inclination: Democrat vs. Republican, coded 1 / 0
  Democrat: "Democrat", "Democratic", "Democratic Party"
  Republican: "Republican", "GOP (Grand Old Party)", "Republican Party"

Sexual Orientation: Homosexual vs. Heterosexual, coded 1 / 0


82

Ground truth

5-Factor Model
  Openness
  Conscientiousness
  Extraversion
  Agreeableness
  Stability


83

Ground truth

Satisfaction with Life (SWL)


85

Methodology

User-Like matrix dimension reduction: Singular Value Decomposition (SVD)
Prediction models: Logistic Regression & Linear Regression
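The two-step pipeline (reduce the User-Like matrix, then fit a classifier on the reduced scores) can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the toy matrix and labels are invented, power iteration with deflation stands in for a full SVD routine, only two components are kept, and the learning rate and epoch count are arbitrary.

```python
from math import exp, sqrt

def top_components(matrix, k, iters=200):
    """Approximate the top-k right singular vectors via power iteration with deflation."""
    n_cols = len(matrix[0])
    # Gram matrix G = M^T M; its eigenvectors are the right singular vectors of M.
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n_cols)]
            for i in range(n_cols)]
    components = []
    for c in range(k):
        v = [1.0 + 0.1 * (i + c) for i in range(n_cols)]  # fixed start for reproducibility
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(gram_row, v)) for gram_row in gram]
            for u in components:  # project out directions already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        components.append(v)
    return components

def project(matrix, components):
    """Represent each user by their scores on the retained components."""
    return [[sum(a * b for a, b in zip(row, comp)) for comp in components]
            for row in matrix]

def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Plain batch gradient descent for logistic regression."""
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + exp(-z)) - y
            grad_b += err
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(features, weights, bias):
    """Predict class 1 when the linear score is non-negative (probability >= 0.5)."""
    return [1 if bias + sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0
            for x in features]

# Invented toy User-Like matrix: rows are users, columns are Likes (1 = liked).
likes = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
labels = [1, 1, 1, 0, 0, 0]  # invented binary trait

components = top_components(likes, k=2)
features = project(likes, components)
weights, bias = fit_logistic(features, labels)
predictions = predict(features, weights, bias)
```

For a numeric trait such as age, the last step would be replaced by a linear regression on the same reduced features.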


87

Prediction Results
Solid: Pearson correlation coefficient between predicted and actual values
Transparent: baseline accuracy of the questionnaire, in terms of test-retest reliability


89

Discriminative Likes (#1)


90

Discriminative Likes (#2)


91

Discriminative Likes (#3)


92

Likes are Culture-Dependent (#1)

卡提諾正妹抱報 Catworld小舖

Garena《英雄聯盟 LOL》 QUEEN FASHION SHOP

遊戲大亂鬥 范范范瑋琪

Garena-TW 撿便宜特賣會

好色龍 衣芙日系

放棄治療 王大陸

這樣變型男 就愛網拍特賣會

Taipei Assassins (台北暗殺星) 86小舖商城

你為什麼要放棄治療呢 H.H先生

Toyz LOVFEE


93

Likes are Culture-Dependent (#2)

Married (left column) / Unmarried (right column)

嬰兒與母親懷孕生產情報站 學生愛打工

味全MyWei Duncan

Estee Lauder Taiwan 雅詩蘭黛 林俊傑 JJ Lin

光泉"HOT"鮮奶 Cherng

舒潔溫柔心感動 Byebyechuchu

綠巨人 Dcard

AVON Taiwan 雅芳粉絲團 田馥甄Hebe

人人玩遊戲 彭于晏 Eddie Peng

Creative Baby -台灣 Dorothy

阿默典藏蛋糕 韋禮安Weibird

Can we have real privacy on social media?
Unprecedented opportunity to observe individuals in a society


100

How can data analysis help us better understand donors?


101

x 3,518

in 10.5 years (since May 2003)


103

AppleDaily Charity Case Dataset

3000+ cases along with detailed description and donation records


104

[Figure: Distribution of donation amounts (per case family)]


106

DATA COLLECTION


107

Crawling

http://search.appledaily.com.tw/charity/projlist/


108

Web page parsing
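A minimal sketch of the parsing step, using Python's standard `html.parser`. The sample markup, the `case_id=` URL pattern, and the class name `CaseListParser` are hypothetical: the real page structure is not given in the slides, and no network request is made here.

```python
from html.parser import HTMLParser

class CaseListParser(HTMLParser):
    """Collect (case_id, title) pairs from links whose href contains 'case_id='."""

    def __init__(self):
        super().__init__()
        self.cases = []
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if "case_id=" in href:
                # Keep the value up to the next query parameter, if any.
                self._current_id = href.split("case_id=")[1].split("&")[0]
                self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            self.cases.append((self._current_id, "".join(self._text).strip()))
            self._current_id = None

# Hypothetical listing page, standing in for a crawled HTML document.
SAMPLE_HTML = """
<ul>
  <li><a href="/charity/detail?case_id=A0001">Case one title</a></li>
  <li><a href="/charity/detail?case_id=A0002&amp;page=1">Case two title</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

parser = CaseListParser()
parser.feed(SAMPLE_HTML)
```

After `feed`, `parser.cases` holds one tuple per case link, and unrelated links such as the "About" page are ignored.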


113

# donors


114

# donors w/ linear fitting


115

Adjusted Time Series
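One plausible reading of the linear fit and the adjusted series is sketched below: fit a least-squares line to the donor counts over time, then subtract it so that early and late cases become comparable. Subtracting the trend (rather than, say, dividing by it) is an assumption, and the monthly counts are invented.

```python
def fit_line(ys):
    """Least-squares line y = intercept + slope * t for t = 0, 1, ..., n-1."""
    n = len(ys)
    mean_t = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(ys))
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope

def detrend(ys):
    """Subtract the fitted trend, leaving residuals centered on zero."""
    intercept, slope = fit_line(ys)
    return [y - (intercept + slope * t) for t, y in enumerate(ys)]

# Invented monthly donor counts with an upward trend.
monthly_donors = [100, 130, 140, 170, 180, 210]
intercept, slope = fit_line(monthly_donors)
adjusted = detrend(monthly_donors)
```

A positive value in `adjusted` then means a period attracted more donors than the long-term growth alone would predict.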


116

ANNOTATION


119

http://bountyworkers.net/


120

Human annotation results

431 annotators

6,532 person-times

255 hours

8,436 family members

1,590 cases


121

Sample Annotations


122

Variables we got (290+)


124

Methodology

Predict # donors and donation amount
  Feature selection based on mutual information
  Using libsvm to do 2-class classification
    Classifying top 25% and bottom 25% cases by removing the middle 50% cases
    10-fold cross validation

Find out significant factors that determine the dependent variable(s)
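The preparation steps around the classifier can be sketched as follows: label the top and bottom quartiles, score features by mutual information with that label, and build 10-fold splits. The classifier itself (libsvm in the slides) is deliberately left out. The feature names and the eight toy cases are invented for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def top_bottom_quartiles(cases, key):
    """Keep the bottom 25% (label 0) and top 25% (label 1); drop the middle 50%."""
    ranked = sorted(cases, key=key)
    q = len(ranked) // 4
    if q == 0:
        return []
    return [(c, 0) for c in ranked[:q]] + [(c, 1) for c in ranked[-q:]]

def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield train, test

# Invented toy cases; feature names are hypothetical.
donors = [10, 20, 30, 40, 50, 60, 70, 80]
poverty = [0, 0, 0, 1, 0, 1, 1, 1]
weekend = [0, 1, 0, 1, 0, 1, 0, 1]
cases = [{"donors": d, "below_poverty_line": p, "published_on_weekend": w}
         for d, p, w in zip(donors, poverty, weekend)]

selected = top_bottom_quartiles(cases, key=lambda c: c["donors"])
labels = [label for _, label in selected]
mi = {name: mutual_information([c[name] for c, _ in selected], labels)
      for name in ("below_poverty_line", "published_on_weekend")}
folds = list(k_fold_indices(20, k=10))
```

Features with the highest scores in `mi` would be kept, and each `(train, test)` pair in `folds` would drive one round of training and evaluation.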


126

Factor Categories

Subject
Structure
Finance
Member
Presentation
Meta


127

Factor – Members Category

Subject & Member
  Age, gender, marital status
  Disability, disease, accident, habit, status


129

Factor – Structure Category

Structure
  Count and ratio of particular types of family members
  Relationships between members


130

Factor – Finance Category

Finance
  Is the family below the poverty line?
  Regular income & expense


131

Factor – Presentation

Presentation
  Currently, only title and images are evaluated
  Subjective ratings from human subjects


133

Title & picture rating

http://mmnet.iis.sinica.edu.tw/~cslin/rating/welcome.php


134

Factor – Meta Information

Meta information
  Information unrelated to the family & its situation
  E.g., the article's writer and when the article was published


136

Willingness to donate is highly correlated with timing


137

Day of the week matters

Sun Mon Tue Wed Thu Fri


138

The month matters too

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec


139

The interviewee's body size (heavy or thin) affects donation decisions


142

Who receives more donations?


144

Donors treat different diseases and disabilities differently


149

Causes beyond one's control draw more sympathy


150

[Figure labels: accident, unemployment, divorce, imprisonment, human-caused accident, dropping out of school]


154

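Donations are inversely related to fixed expenses

[Figure: donation amount plotted against the case family's fixed expenses]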


155

Donors expect to see “hope”


156

CASE STUDY


157

Successful Case


158

Less Successful Case


162

TEXT MINING APPROACH


163

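Introduction to C-LIWC
Developed from James Pennebaker's LIWC (Linguistic Inquiry and Word Count)
Compiled by teams from National Taiwan University of Science and Technology and the National Taiwan University psychology group, who added and removed categories and words to suit the characteristics of Chinese
88 categories in total, with 6,862 words and word stems
Linguistic features and writing style reflect personal traits to some degree and influence how readers feel
This text-analysis method is increasingly used in psychology research, e.g., apology and forgiveness, lie detection, language change during therapy, and psychological displacement
C-LIWC website: http://cliwc.weebly.com/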


164

Chinese Linguistic Inquiry and Word Count dictionary (C-LIWC)


165

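Family, Death, and Health Words
Correlation: family, death, and health words are all broadly positively correlated with donations
Inference: cases whose themes fit traditional values attract donations more easily

(r, p-value) | Family words | Death words | Health words
log(total donations) | r=0.148, p=0.000 | r=0.101, p=0.000 | r=0.056, p=0.026
Number of donors | r=0.131, p=0.000 | r=0.113, p=0.000 | r=0.058, p=0.021
Average donation per donor | r=0.129, p=0.000 | r=0.084, p=0.001 | r=0.007, p=0.771
Examples | mother (母親), mother-in-law (婆婆), grandfather (阿公), family members (家屬), cousin (堂妹), stepfather (繼父), parents (雙親) | cremation (火化), the deceased (死者), suicide (自殺), funeral service (告別式), passing away (往生), fatal (致死) | stroke (中風), diabetes (糖尿病), stones (結石), hospitalization (住院), sleeping pills (安眠藥)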
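A minimal sketch of the kind of computation behind these figures: measure how often category words appear in each article, then correlate that rate with log(total donations). The word lists contain only the example words from the slide, not the real C-LIWC dictionary. Substring matching per character is an assumed simplification of proper word segmentation, and the four articles with their donation totals are invented.

```python
from math import log, sqrt

# Example words from the slide only; NOT the actual C-LIWC dictionary.
CATEGORIES = {
    "family": ["母親", "婆婆", "阿公", "家屬"],
    "death": ["火化", "死者", "往生"],
    "health": ["中風", "糖尿病", "住院"],
}

def category_rate(text, words):
    """Occurrences of category words per character of text (substring matching)."""
    if not text:
        return 0.0
    return sum(text.count(w) for w in words) / len(text)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(vx * vy)

# Invented articles paired with invented total donations.
articles = [
    ("母親中風住院,家屬無力負擔", 500_000),
    ("阿公往生,婆婆獨力扶養孫子", 400_000),
    ("工廠裁員後失業,房租無力負擔", 150_000),
    ("生意失敗,積欠債務", 100_000),
]

family_rates = [category_rate(text, CATEGORIES["family"]) for text, _ in articles]
log_totals = [log(total) for _, total in articles]
r_family = pearson(family_rates, log_totals)
```

Repeating the last three lines for each category and each outcome (number of donors, average donation) would fill in a table of the same shape as the one above.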


166

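Total Word Count of the Article
Correlation: total word count is positively correlated with donations
Inference: the more thoroughly a case is described, the easier it is to raise funds

(r, p-value) | Total word count
log(total donations) | r=0.101, p=0.000
Number of donors | r=0.056, p=0.027
Average donation per donor | r=0.143, p=0.000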


167

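Work, Achievement, and Money Words

Correlation: work, achievement, and money words are all broadly negatively correlated with donations
Inference: work-related themes are comparatively harder to raise funds for

(r, p-value) | Work words | Achievement words | Money words
log(total donations) | r=-0.079, p=0.002 | r=-0.064, p=0.011 | r=-0.072, p=0.004
Number of donors | r=-0.099, p=0.000 | r=-0.085, p=0.000 | r=-0.025, p=0.319
Average donation per donor | r=-0.022, p=0.380 | r=-0.020, p=0.001 | r=-0.101, p=0.000
Examples | laborer (勞工), contract (契約), payment (付費), layoff (裁員), business (生意), employee (員工), occupation (職業) | promotion (升遷), official authority (職權), authority (權威), commendation (嘉獎), capable (能幹), senior management (高層), glory (榮耀) | account (帳戶), rent (租金), store (商店), cash (現金), spending (消費), donation (捐贈)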


168

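Others

Negation words
  Examples: dissatisfied (不滿), unfortunate (不幸), cannot (不能), unrelated (無關), unexpectedly (不料), need not (不須)
  Correlation: negatively correlated with average donation per donor (r=-0.063, p=0.013)
  Inference: positive descriptions work better

Adverbs
  Examples: really (真的), finally (終於), indeed (確實), definitely (一定), always (一向), regardless (不管), entirely (全然)
  Correlation: negatively correlated with average donation per donor (r=-0.084, p=0.001)
  Inference: plain description is enough; exaggeration or padding tends to backfire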


170

ONGOING WORK


174

Opportunities to explore

Incentive provisioning
  Let donors keep track of their own donation records
  Donor profile, like Kiva
  Re-visit the families being helped

Viral marketing

Cognitive biases
  Anchoring effect
  Endowed progress effect

(see https://en.wikipedia.org/wiki/List_of_cognitive_biases)


176

CONCLUSION & OUTLOOK


177

WE ARE STILL AT THE VERY START


178

LOTS of Big Questions

The polarization of global economic inequality
What explains the success of social movements?
The emergence of prosocial behavior
The causality between video gaming and the propensity for violence?
The politics of censorship
The causality between social selection and social influence?
...


179

The Data Divide

Social scientists have good questions but...
  IT tools are not part of their toolkits
  Not clear that we will/should make the investment

Computer scientists have powerful methods but...
  Trained to resolve technical problems
  It seems there are fewer “methodological” contributions


180

The Challenges

Education and habits of social and computer scientists

  Different ways of thinking
  Different methodologies
  Differences in framing questions and defining contributions

Data access and fragmentation issue
Data privacy issue
Ethics issue
Organizational issue


182

Institutional Innovations

New platforms and protocols for data management

  Better coordination of data collection, storage, sharing
  Recruitment and management of subject pools, field panels

Integrated research designs
  Coordination across theoretical, experimental and observational studies

Collaborative interdisciplinary teams
  For a given data set, often unclear what the most interesting question is
  For a given question, often unclear how to collect the right data


184

Techniques to collect, manage, and process large datasets

Knowledge about social theories, methods, and issues

Computational Social Science


185

Sheng-Wei Chen, Academia Sinica