Web Science & Technologies
University of Koblenz ▪ Landau, Germany
Exploring the challenge of linking scientific publications
and studies with crowd workers instead of domain experts
Cristina Sarasua
Computational Social Science workshop
Köln, 16.12.2013
WeST ▪ Cristina Sarasua ▪ Exploring the challenge of linking scientific publications and
studies with crowd workers instead of domain experts
Ideal workflow
Peter Schumacher (social scientist) would like to analyse
the voting patterns of Germans in the last 20 years
Past observations
New analysis, new findings
1. Read publications
2. Access data
3. Reuse data
Reality
Publications and research data (coming from surveys and
studies) are published independently
The link between them is missing
Researchers cannot easily access the research data
Scenario
We need a method to process publications and studies in order to be able to
1. Find references to studies inside publications
2. Identify which publication is connected to which study
3. Identify the type of relation between publication and study
publications
research data (studies)
Problem
Computers cannot perform these three tasks automatically with perfect accuracy
We need human intervention
Domain experts are often not available for this kind of task
Incorrect link between a
publication and a study
Solution: Crowdsourcing
“The process of outsourcing a task to a (potentially) large and
undefined group of people in an open call” (Jeff Howe, 2006)
Microtask crowdsourcing
- Simple and independent tasks
- Paid crowdsourcing
- Online labor marketplaces (e.g. MTurk)
Amazon Mechanical Turk
Hybrid solution
1) Automatic processing of publications and studies
2) Ask crowd workers to review links
- Correct errors
- Identify primary literature / secondary literature
3) Generate Linked Data
Diagram: Crowdsourced interlinking: the GESIS case study
Diagram labels: SSOAR, da|ra, InfoLink, CrowdLINK, links, corrected links, publications, research data, Web portal, researcher; steps 1, 2, 3
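Step 3 of the hybrid solution (generating Linked Data from the reviewed links) could be sketched roughly as follows. This is only an illustration: the namespaces, the predicate names and the identifiers are hypothetical placeholders, since the slides do not state which vocabulary or URIs the project uses. The output format is N-Triples, one statement per line.

```python
# Hypothetical namespaces: the slides do not name the real URIs or vocabulary.
PUB_NS = "http://example.org/publication/"
STUDY_NS = "http://example.org/study/"
VOCAB_NS = "http://example.org/vocab/"


def link_to_ntriple(publication_id, study_id, is_primary):
    """Serialize one reviewed publication-study link as an N-Triples line.

    Assumes the identifiers are already safe to embed in a URI.
    The two predicate names are invented for this sketch.
    """
    predicate = "isPrimaryLiteratureOf" if is_primary else "isSecondaryLiteratureOf"
    return (
        f"<{PUB_NS}{publication_id}> "
        f"<{VOCAB_NS}{predicate}> "
        f"<{STUDY_NS}{study_id}> ."
    )


# Toy input: (publication, study, primary literature?) as decided by the crowd.
corrected_links = [
    ("pub-001", "study-A", True),
    ("pub-002", "study-A", False),
]

triples = [link_to_ntriple(p, s, primary) for p, s, primary in corrected_links]
for line in triples:
    print(line)
```

Keeping the primary / secondary distinction in the predicate means that a portal can later filter links by relation type without a separate lookup.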
How is this related to CSS?
On the one hand …
The GESIS case study
In collaboration with GESIS colleagues
Katarina Boland, Daniel Hienert et al.
On the other hand …
How can we manage such a group of people to maximize
their efficiency and keep them happy?
Chart: Ipeirotis, 2010
Different background
Open call
We can impose some restrictions (e.g. language, country,
reputation gained)
Spam
Charts: Ross et al., 2010
Different motivations
Different behaviour
CrowdFlower 11.12.2013
The tasks at hand
They are not the “most exciting tasks” in the world
The data is in German
The domain is very specific
First experiments of the GESIS case study
Adopted measures
Used majority voting
Included verification questions (e.g. “please type the date shown for the
publication”)
Defined gold standard links to check who could be trusted
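The combination of gold standard links and majority voting listed above can be illustrated with a minimal sketch. It is not the setup used in the case study: the 0.7 accuracy threshold, the function names and the toy judgments are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def trusted_workers(gold_answers, judgments, min_accuracy=0.7):
    """Return the workers whose accuracy on gold standard links is high enough.

    judgments is a list of (worker, link_id, answer) tuples.
    min_accuracy is an assumed threshold, not a value from the slides.
    """
    seen = defaultdict(int)
    correct = defaultdict(int)
    for worker, link_id, answer in judgments:
        if link_id in gold_answers:
            seen[worker] += 1
            correct[worker] += int(answer == gold_answers[link_id])
    return {w for w in seen if correct[w] / seen[w] >= min_accuracy}


def majority_vote(judgments, trusted):
    """Aggregate the answers of trusted workers per link; ties yield None."""
    votes = defaultdict(Counter)
    for worker, link_id, answer in judgments:
        if worker in trusted:
            votes[link_id][answer] += 1
    decisions = {}
    for link_id, counter in votes.items():
        ranked = counter.most_common(2)
        tie = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        decisions[link_id] = None if tie else ranked[0][0]
    return decisions


# Toy data: g1 and g2 are gold standard links, l1 is a link under review.
gold = {"g1": True, "g2": False}
judgments = [
    ("w1", "g1", True), ("w1", "g2", False), ("w1", "l1", True),
    ("w2", "g1", True), ("w2", "g2", False), ("w2", "l1", True),
    ("w3", "g1", False), ("w3", "g2", True), ("w3", "l1", False),
    ("w4", "g1", True), ("w4", "g2", True), ("w4", "l1", False),
]

trusted = trusted_workers(gold, judgments)
decisions = majority_vote(judgments, trusted)
print(sorted(trusted), decisions["l1"])
```

In this toy run the raw votes on link l1 are split two against two; once the workers who fail the gold standard check are excluded, the remaining votes agree.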
Highlights of findings
We managed to get trusted workers quite quickly (e.g. 490 links reviewed
in ~24 hours) and were able to improve the precision of the automatic software
without losing considerable recall
The cases which required background knowledge showed worse results
The task of “relating publication and study” was solved with much better
recall than the task of deciding “whether a publication is
primaryLiterature of a study or not”. The precision was very high, though.
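For readers less familiar with the two measures, precision and recall of a set of links against a reference set can be computed as in the sketch below. The link sets are invented toy data chosen to show the pattern described above (higher precision, unchanged recall); they are not the results of the experiments.

```python
def precision_recall(predicted_links, reference_links):
    """Compute precision and recall of predicted links against reference links.

    Each link is a (publication, study) pair. Empty sets yield 0.0.
    """
    predicted, reference = set(predicted_links), set(reference_links)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy reference links (what a domain expert would confirm).
reference = {("p1", "A"), ("p2", "A"), ("p3", "B"), ("p4", "B")}

# Toy automatic output: three correct links and two wrong ones.
automatic = {("p1", "A"), ("p2", "A"), ("p3", "B"), ("p5", "B"), ("p6", "A")}

# Toy crowd-reviewed output: the two wrong links were rejected.
reviewed = {("p1", "A"), ("p2", "A"), ("p3", "B")}

print(precision_recall(automatic, reference))
print(precision_recall(reviewed, reference))
```

Because the crowd here only reviews links that the software proposed, it can remove wrong links (raising precision) but cannot recover links the software missed, so recall can stay the same or drop, but not rise.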
Ongoing research work
Can we improve their results by including mixed
incentives? Not only money, but also competition at a
microtask level, e.g. “there are only X links left, be
quick!” or “there are three workers who were faster in
reviewing links!”
How can we better instruct crowd workers in 1) the type of
tasks we are running and 2) the domain we are working
with?
Take-home message
We can employ crowd workers to connect scientific
publications and studies in the social sciences. Doing so can improve
automatically generated links.
How can we transfer the knowledge of domain
experts to the crowd?
Call for discussion
Who?
1. Psychologists
2. Social Scientists
3. Computer scientists
Possible topics
Any feedback about the aforementioned ideas
Well-established methodologies in psychology to instruct
or train a large group of people
Any suggestion on how to analyse crowd workers (i.e.
criteria)
Thank you.
Vielen Dank.