My ESWC 2017 keynote: Disrupting the Semantic Comfort Zone



Disrupting the Semantic Comfort Zone

Lora Aroyo

Web & Media Group

Web & Media Group @laroyo

Bulgaria, The Netherlands



Personal Semantics


Riva del Garda, Italy, 2014

Semantic Social Life


To understand the value of Semantic Web for e-learning, you have to understand people, e.g. how they learn, interact & consume information


To understand the value of Semantic Web for cultural heritage

you have to understand people, e.g. how they interact & consume information


To understand the value of Semantic Web for digital humanities, you have to

understand people, e.g. how they interact & consume information


people are at the center of everything

people & their semantics, i.e. their real-world behavior, online interactions, information needs, information consumption habits, personal preferences ...

CrowdTruth team

the evolution of the semantic web: great moments from the 1980s to ESWC 2017

50’s: AI more or less begins ...

80’s: expert systems

90’s: knowledge acquisition from experts

00’s: standards & interoperability

10’s: big data & large crowds

A long time ago in a galaxy far, far away …

80’s - empire of the experts

Advances in hardware and SDEs

PCs, workstations, Symbolics, Sun

New architectures like the Hypercube

LISP, Prolog, OPS

AI can now BUILD SYSTEMS

Primary focus on experts and rules

What is the knowledge of experts?

What is the form of this knowledge? Graphs, logic, rules, frames

How do experts reason? Deduction, induction

80’s - empire of the experts

Work on form & process remained academic

what happened inside the system, to make the reasoning inside the system proper and as good as possible

industry forged ahead with ad-hoc & proprietary systems and actually tried to build expert systems

Origins of uncertain KR: fuzzy, probabilistic

Piero Bonissone and the DELTA/CATS expert system for

locomotive repair with David Smith, a locomotive repair expert

Buchanan and Shortliffe’s MYCIN project at Stanford built a huge rule base for medical diagnosis, working with an extensive team of medical experts.

90’s - knowledge acquisition from experts

The 90’s brought attention to knowledge acquisition. Knowing that expert systems could by then functionally work, the focus (in practice as well as in scientific research and technology development) shifted to the then-bigger challenge of how to acquire knowledge in real-world scenarios.

It seems natural that, after looking inside the systems, one then needed to pay attention to how to actually get the knowledge from the world outside and frame it into properly structured knowledge inside the system.

Dream of the 90’s

00’s - interoperability & standards odyssey

10’s - AI Awakens

• Machine Learning

• Neural networks

• Solving basic perceptual problems instead of high-expertise ones

• Ambiguity tolerant reasoning

• Non-taxonomic ordering → non-taxonomic reasoning

• folksonomies, clustering, diversity of perspectives, embeddings

2011

10’s – Big Data

Human Annotation

Central in Machine Learning

Training & Evaluation

10’s – Crowds

Team BellKor wins Netflix Prize

Timeline: 1998, 2006, 2007, 2009

the semantic comfort zone

One truth: knowledge acquisition for the semantic web assumes one correct interpretation for every example

All examples are created equal: triples are triples, one is not more important than another, they are all either true or false

Disagreement bad: when people disagree, they don’t understand the problem

Experts rule: knowledge is captured from domain experts

One is enough: knowledge by a single expert is sufficient

Detailed explanations help: if examples cause disagreement - add instructions

Once done, forever valid: knowledge is not updated; new data not aligned with old

“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty

Use Case: video archive enrichment

Search Behavior of Media Professionals at an Audiovisual Archive: A Transaction Log Analysis (2009).

B. Huurnink, L. Hollink, W. van den Heuvel, M. de Rijke.

Goal: make the multimedia content of the Dutch National Video Archive accessible to large audiences

Comfort Zone Solution: media professionals watch & annotate videos. Of course!


but ...

Expensive: doesn’t scale

time-consuming: 5 times the video duration

professional vocabulary: experts use a specific vocabulary that is unknown to general audiences


… and

people search for fragments, experts annotate full videos

not finding: 35% of search queries result in not found

Use Case: real-world QA for Watson

Crowdsourcing ground truth for Question Answering using CrowdTruth (2015). B. Timmermans, L. Aroyo, C. Welty

Goal: gather questions that real people ask for training & evaluating Watson

Data: 30K questions + candidate answers from Yahoo! Answers

Comfort Zone Solution: ask people if the passage answers the question (Y/N). Simple!

Contradicting evidence

Is coral a plant?

• “Coral almost could be considered half-plant [..]”

• “[..] organism, such as a coral, resembling a stony plant.”

Unanswerable questions

• Can I take a pill if you don't have a child yet?

• Is the spelling for being drunk right?

• Is napster black?

Unclear answer type

Is paper animal plant or man made?

Multiple right answers to a question

What is the best university in NY? (subjective)

YES or NO?

Use Case: medical relation extraction for Watson

Crowdsourcing Ground Truth for Medical Relation Extraction (2017). A. Dumitrache, L. Aroyo, C. Welty

Goal: gather data to train Watson to read medical text & automatically extract a medical relations KB

Comfort Zone Solution: having medical experts read & annotate examples

ANTIBIOTICS are the first line treatment for indications of TYPHUS. treats(ANTIBIOTICS, TYPHUS)? Expert: yes

Patients with TYPHUS who were given ANTIBIOTICS exhibited side-effects. treats(ANTIBIOTICS, TYPHUS)? Expert: yes

With ANTIBIOTICS in short supply, DDT was used during WWII to control the insect vectors of TYPHUS. treats(ANTIBIOTICS, TYPHUS)? Expert: yes.

Are these three really all the same???

Use Case: map music to moods

Goal: annotate songs with emotional tags

Comfort Zone Solution: people assign the prevalent mood of a song

Cluster 1: passionate, rousing, confident, boisterous, rowdy

Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured

Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding

Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry

Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral

Other: does not fit into any of the 5 clusters

Choose one:

Which is the mood most appropriate for each song?


(Lee and Hu 2012)

1 song - 1 mood???


Semantic Comfort Zone


interestingly …


• collective decisions of large groups of people

• a group of error-prone decision-makers can be surprisingly good at picking the best choice

• for a thumbs-up or thumbs-down decision, each person's chance of picking the right answer needs to be > 50%

• the odds that most of them will pick the right answer are greater than the odds that any one of them will pick it on their own

• performance gets better as size grows

1785 Marquis de Condorcet

“wisdom of crowds”
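The bullets above can be checked with a little arithmetic. The following is a minimal sketch, assuming independent voters who each pick the right answer with the same probability; the function name and the value 0.6 are illustrative choices, not taken from the talk.

```python
from math import comb


def majority_correct_probability(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent voters is right.

    Assumes every voter is correct with the same probability p_correct
    and that n_voters is odd, so there are no ties.
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    smallest_majority = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(smallest_majority, n_voters + 1)
    )


# Hypothetical individual accuracy, only slightly better than a coin flip.
p = 0.6
for n in (1, 3, 11, 101):
    print(n, round(majority_correct_probability(n, p), 3))
```

With three voters at 0.6 the majority is right with probability 0.648, already better than any single voter, and the value keeps growing with group size. At exactly 0.5 the group gains nothing, which is why the individual chance has to exceed 50%.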


• asked 787 people to guess the weight of an ox

• none got the right answer

• their collective guess was almost perfect

1906 Sir Francis Galton

“wisdom of crowds”

WWII Math Rosies

1942: Ballistics calculations and flight trajectories

NASA’s Computer Room

transcribe raw flight data from celluloid film & oscillograph paper

can we harness it?

CrowdTruth

Three basic causes of disagreement: workers, examples, target semantics

Disagreement is signal, not noise.

It is indicative of the variation in human semantic interpretation

It can indicate ambiguity, vagueness, similarity, over-generality, etc, as well as quality

CrowdTruth: Machine-human computation framework for harnessing disagreement in gathering annotated data (2014). O. Inel, A. Dumitrache, L. Aroyo, C. Welty

one truth: multiple truths

all examples are created equal: each example is unique

disagreement bad: disagreement is good

experts rule: crowd rules

one is enough: the more the better

detailed explanations help: keep it simple stupid

once done, forever valid: maintenance is necessary

“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty

changes needed: video archive enrichment

improve support for fragment search

time-based annotations

bridging vocabulary gap between searcher & cataloguer

crowdsourcing video tagging

two video tagging pilots

engage crowds through continuous gaming

“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al., K-CAP 2011


just “tags”


objects (57%)

westminster abbey, abbey, priester (priest), geestelijken (clergy), koets (carriage), kroning (coronation), mensenmassa (crowd), parade, kroon (crown), regen (rain)


persons (31%)




user vocabulary: 8% in professional vocabulary, 23% in Dutch lexicon, 89% found on Google

locations (7%)



describe mainly short segments

often not very specific

don’t describe programmes as a whole


crowdsourcing medical relation extraction

diversity of opinions

independent perspectives

multitude of contexts

we exposed a richer set of possibilities that help in identifying, processing & understanding context


Does this sentence express TREATS(Antibiotics, Typhus)?

Patients with TYPHUS who were given ANTIBIOTICS exhibited several side-effects.

With ANTIBIOTICS in short supply, DDT was used during World War II to control the insect vectors of TYPHUS.

ANTIBIOTICS are the first line treatment for indications of TYPHUS. 95%



The crowd results capture the natural ambiguity

What is the relation between the highlighted terms?

He was the first physician to identify the relationship between HEMOPHILIA and HEMOPHILIC ARTHROPATHY.

Experts Hallucinate

Crowd reads text literally - provides better examples to machine

experts: cause

crowd: no relation

Unclear relationship between the two arguments reflected in the disagreement

Medical Relation Extraction

Clearly expressed relation between the two arguments reflected in the agreement

Medical Relation Extraction

Learning Curves

(crowd with pos./neg. threshold at 0.5)

above 400 sent.: crowd consistently over baseline & single

above 600 sent.: crowd out-performs experts

Learning Curves Extended

(crowd with pos./neg. threshold at 0.5)

crowd consistently performs better than baseline

# of Workers: Impact on Sentence-Relation Score

Training a Relation Extraction Classifier

Annotation source    F1      Cost per sentence
CrowdTruth           0.642   $0.66
Expert Annotator     0.638   $2.00
Single Annotator     0.492   $0.08

“wisdom of the crowd” provides training data that is at least as good as, if not better than, that of experts, but only with a proper analytic framework for harnessing disagreement from the crowd
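To make the trade-off in the F1 and cost figures concrete, here is a small sketch that only re-derives ratios from the three rows reported on the slide. The dictionary layout and variable names are illustrative assumptions; no new measurements are introduced.

```python
# (F1, cost per sentence in USD) exactly as reported on the slide.
results = {
    "CrowdTruth": (0.642, 0.66),
    "Expert Annotator": (0.638, 2.00),
    "Single Annotator": (0.492, 0.08),
}

crowd_f1, crowd_cost = results["CrowdTruth"]
expert_f1, expert_cost = results["Expert Annotator"]
single_f1, single_cost = results["Single Annotator"]

# How many times more expensive an expert-annotated sentence is.
expert_vs_crowd_cost = expert_cost / crowd_cost

# F1 differences between the annotation sources.
crowd_minus_expert_f1 = crowd_f1 - expert_f1
crowd_minus_single_f1 = crowd_f1 - single_f1

print(f"expert cost / crowd cost: {expert_vs_crowd_cost:.2f}x")
print(f"F1 gain of crowd over expert: {crowd_minus_expert_f1:.3f}")
print(f"F1 gain of crowd over single annotator: {crowd_minus_single_f1:.3f}")
```

Read this way, the expert annotation costs roughly three times as much per sentence for a marginally lower F1, while the single annotator is far cheaper but loses about 0.15 F1.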


map music to moods

Goal: tag songs with emotional clusters

Comfort Zone Solution: people assign the prevalent mood of a song

Is this song ….

If “One Truth” & “No Disagreement”

Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5

W1 1

W2 1

W3 1

W4 1

W5 1

W6 1



W9 1

W10 1

Totals 1 3 1 2 1


Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other

W1 1 1 1

W2 1 1 1

W3 1 1 1

W4 1 1

W5 1 1

W6 1 1 1

W7 1 1 1

W8 1 1 1

W9 1 1

W10 1 1 1 1 1

Totals 3 5 6 5 2 8

If “Many Truths” & “Disagreement”
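The two Totals rows can be compared directly. The sketch below is one possible reading, assuming the totals are treated as a vector over the mood clusters and each cluster is scored by the cosine between that vector and the cluster's unit vector. The cosine measure and all function names are assumptions made for illustration; the slides themselves only give the counts.

```python
from math import sqrt

ONE_TRUTH_CLUSTERS = ["Mood-C1", "Mood-C2", "Mood-C3", "Mood-C4", "Mood-C5"]
MANY_TRUTHS_CLUSTERS = ONE_TRUTH_CLUSTERS + ["Other"]

# Totals rows copied from the two tables above.
one_truth_totals = [1, 3, 1, 2, 1]
many_truths_totals = [3, 5, 6, 5, 2, 8]


def majority_label(names, totals):
    """Single-answer view: keep only the most frequently chosen cluster."""
    best_index = max(range(len(totals)), key=lambda i: totals[i])
    return names[best_index]


def cluster_scores(names, totals):
    """Cosine between the totals vector and each cluster's unit vector.

    For a unit vector this reduces to totals[i] / ||totals||.
    """
    norm = sqrt(sum(value * value for value in totals))
    return {name: value / norm for name, value in zip(names, totals)}


print(majority_label(ONE_TRUTH_CLUSTERS, one_truth_totals))
for name, score in cluster_scores(MANY_TRUTHS_CLUSTERS, many_truths_totals).items():
    print(f"{name}: {score:.2f}")
```

The majority vote collapses the first table to a single cluster and discards the rest, whereas the graded scores keep every cluster in play, including the strong "Other" signal that suggests the five clusters do not fit this song well.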


can indicate alternative interpretations

Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other

W10 1 1 1 1 1

Totals 3 5 6 5 2 8

Disagreement as Signal

can indicate ambiguity in the


can indicate low quality workers

so …

getting comfortable again

Take Home Message

People first, experts second

True and False is not enough

There is diversity in human interpretation

CrowdTruth introduces a spatial representation of meaning that harnesses disagreement

With CrowdTruth untrained workers can be just as reliable as highly trained experts