My ESWC 2017 keynote: Disrupting the Semantic Comfort Zone

http://lora-aroyo.org @laroyo

Disrupting the Semantic Comfort Zone

Lora Aroyo

Web & Media Group



Bulgaria, The Netherlands

Sofia

NYC

Personal Semantics



Riva del Garda, Italy, 2014

Semantic Social Life


To understand the value of Semantic Web for e-learning, you have to understand people, e.g. how they learn, interact & consume information



To understand the value of Semantic Web for cultural heritage, you have to understand people, e.g. how they interact & consume information



To understand the value of Semantic Web for digital humanities, you have to understand people, e.g. how they interact & consume information



people are at the center of everything: people & their semantics, i.e. their real-world behavior, online interactions, information needs, information consumption habits, personal preferences ...


CrowdTruth team



the evolution of the semantic web: great moments from the 1980s to ESWC 2017


50’s: AI more or less begins ...

80’s: expert systems
90’s: knowledge acquisition from experts

00’s: standards & interoperability
10’s: big data & large crowds

A long time ago in a galaxy far, far away …


80’s - empire of the experts


Advances in hardware and SDEs: PCs, workstations, Symbolics, Sun
New architectures like the Hypercube
LISP, Prolog, OPS
AI can now BUILD SYSTEMS

Primary focus on experts and rules

What is the knowledge of experts?
What is the form of this knowledge? Graphs, logic, rules, frames

How do experts reason? Deduction, induction

80’s - empire of the experts

Work on form & process remained academic:

what happened inside the system, and how to make its reasoning as sound as possible

industry forged ahead with ad-hoc & proprietary systems and actually tried to build expert systems

Origins of uncertain KR: fuzzy, probabilistic


Piero Bonissone and the DELTA/CATS expert system for locomotive repair, developed with David Smith, a locomotive repair expert

Buchanan and Shortliffe’s MYCIN project at Stanford built a huge rule base for medical diagnosis, working with an extensive team of medical experts.


90’s - knowledge acquisition from experts


http://www.youtube.com/watch?v=TZt-pOc3moc&t=47


90’s - knowledge acquisition from experts

The 90’s brought attention to knowledge acquisition. With expert systems by then shown to work functionally, the focus (in practice as well as in scientific research and technology development) shifted to the then-bigger challenge of how to acquire knowledge in real-world scenarios.

It seems natural that, after looking inside the systems, one needed to pay attention to how to actually get the knowledge from the outside world and frame it as properly structured knowledge inside the system.

Dream of the 90’s

https://www.youtube.com/watch?v=HX8BsX3IIa4




00’s - interoperability & standards odyssey


10’s - AI Awakens
• Machine Learning
• Neural networks
• Solving basic perceptual problems instead of high-expertise ones
• Ambiguity-tolerant reasoning
• Non-taxonomic ordering → non-taxonomic reasoning
• folksonomies, clustering, diversity of perspectives, embeddings



2011


10’s – Big Data



Human Annotation: Central in Machine Learning

Training & Evaluation

10’s – Crowds



Team BellKor wins Netflix Prize

1998 2006 2007 2009



the semantic comfort zone



One truth: knowledge acquisition for the semantic web assumes one correct interpretation for every example

All examples are created equal: triples are triples, one is not more important than another, they are all either true or false

Disagreement bad: when people disagree, they don’t understand the problem

Experts rule: knowledge is captured from domain experts

One is enough: knowledge by a single expert is sufficient

Detailed explanations help: if examples cause disagreement - add instructions

Once done, forever valid: knowledge is not updated; new data not aligned with old

“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty



Use Case: video archive enrichment

Search Behavior of Media Professionals at an Audiovisual Archive: A Transaction Log Analysis (2009).

B. Huurnink, L. Hollink, W. van den Heuvel, M. de Rijke.



Use Case: video archive enrichment

Goal: make the multimedia content of the Dutch National Video Archive accessible to large audiences

Comfort Zone Solution: media professionals watch & annotate videos. Of course!



but ...

expensive: doesn’t scale

time-consuming: 5 times the video duration

professional vocabulary: experts use a specific vocabulary that is unknown to general audiences



… and

people search for fragments; experts annotate full videos

not finding: 35% of search queries return no results



Use Case: real-world QA for Watson

Crowdsourcing ground truth for Question Answering using CrowdTruth (2015). B Timmermans, L Aroyo, C Welty



Goal: gather questions that real people ask, for training & evaluating Watson

Data: 30K questions + candidate answers from Yahoo! Answers

Comfort Zone Solution: ask people if the passage answers the question (Y/N). Simple!



Contradicting evidence: Is Coral a plant?
• “Coral almost could be considered half-plant [..]”
• “[..] organism, such as a coral, resembling a stony plant.”

Unanswerable questions:
• Can I take a pill if you don't have a child yet?
• Is the spelling for being drunk right?
• Is napster black?

Unclear answer type: Is paper animal plant or man made?

Multiple right answers to a question: What is the best university in NY? (subjective)

YES or NO?



Use Case: medical relation extraction for Watson

Crowdsourcing Ground Truth for Medical Relation Extraction (2017). A Dumitrache, L Aroyo, C Welty



Goal: gather data to train Watson to read medical text & automatically extract a medical relations KB

Comfort Zone Solution: having medical experts read & annotate examples



ANTIBIOTICS are the first line treatment for indications of TYPHUS. treats(ANTIBIOTICS, TYPHUS)? Expert: yes

Patients with TYPHUS who were given ANTIBIOTICS exhibited side-effects. treats(ANTIBIOTICS, TYPHUS)? Expert: yes

With ANTIBIOTICS in short supply, DDT was used during WWII to control the insect vectors of TYPHUS. treats(ANTIBIOTICS, TYPHUS)? Expert: yes.

Are these three really all the same???



Use Case: map music to moods

Goal: annotate songs with emotional tags

Comfort Zone Solution: people assign the prevalent mood of a song

Choose one: Which is the mood most appropriate for each song?

Cluster 1: passionate, rousing, confident, boisterous, rowdy
Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured
Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding
Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral
Other: does not fit into any of the 5 clusters

(Lee and Hu 2012)

1 song - 1 mood???

Semantic Comfort Zone

Semantic Comfort Zone disrupted



interestingly …



• collective decisions of large groups of people

• a group of error-prone decision-makers can be surprisingly good at picking the best choice

• for a thumbs-up/thumbs-down decision, each individual’s chance of picking the right answer needs to be > 50%

• the odds that a majority of them will pick the right answer are greater than the odds that any one of them will pick it on their own

• performance gets better as the group grows

1785 Marquis de Condorcet

“wisdom of crowds”
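Condorcet’s argument can be checked numerically. The sketch below is an idealized illustration, assuming n independent voters (n odd, so there are no ties) who are each right with the same probability p; it sums the binomial probabilities of every strict majority being right. Equal, independent competence is a simplifying assumption, not a claim about real crowds.

```python
from math import comb


def majority_correct_probability(n: int, p: float) -> float:
    """Probability that a strict majority of n voters picks the right answer.

    Assumption (idealized Condorcet setting): voters are independent and
    each one is right with the same probability p.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number so that ties cannot occur")
    # Sum P(exactly k voters are right) over every k that forms a strict majority.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 11, 101):
        print(n, round(majority_correct_probability(n, 0.6), 3))
```

With p = 0.6 a single voter is right 60% of the time and three voters 64.8% of the time, and the majority keeps improving as n grows. With p below 0.5 the same sum shrinks as n grows, which is why the > 50% condition matters.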



• asked 787 people to guess the weight of an ox

• none got the right answer

• their collective guess was almost perfect

1906 Sir Francis Galton

“wisdom of crowds”
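Why a collective guess can beat every individual: independent errors tend to cancel out. The simulation below is a hypothetical sketch, not Galton’s data; it assumes 787 unbiased guesses with Gaussian noise around an arbitrary true weight, and uses the median as the crowd’s estimate. If errors were biased or correlated (for example, everyone copying a neighbour), the effect would weaken.

```python
import random
import statistics

# Hypothetical values for illustration: these are NOT Galton's data.
TRUE_WEIGHT = 1000.0  # assumed true weight of the ox (arbitrary units)
N_GUESSERS = 787      # crowd size taken from the slide
SPREAD = 100.0        # assumed standard deviation of individual errors


def simulate_guesses(n: int, true_value: float, spread: float, seed: int = 42) -> list[float]:
    """Draw n independent, unbiased noisy guesses around the true value."""
    rng = random.Random(seed)
    return [rng.gauss(true_value, spread) for _ in range(n)]


def crowd_estimate(guesses: list[float]) -> float:
    """Aggregate the crowd with the median, which is robust to outlying guesses."""
    return statistics.median(guesses)


guesses = simulate_guesses(N_GUESSERS, TRUE_WEIGHT, SPREAD)
crowd_error = abs(crowd_estimate(guesses) - TRUE_WEIGHT)
mean_individual_error = statistics.mean(abs(g - TRUE_WEIGHT) for g in guesses)
print(f"crowd error: {crowd_error:.1f}, mean individual error: {mean_individual_error:.1f}")
```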


WWII Math Rosies

1942: Ballistics calculations and flight trajectories


NASA’s Computer Room

transcribe raw flight data from celluloid film & oscillograph paper



can we harness it?


CrowdTruth

http://crowdtruth.org/



CrowdTruth
Three basic causes of disagreement: workers, examples, target semantics

Disagreement is signal, not noise.

It is indicative of the variation in human semantic interpretation

It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as worker quality

CrowdTruth: Machine-human computation framework for harnessing disagreement in gathering annotated data (2014). O Inel, A Dumitrache, L Aroyo, C Welty
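The idea that disagreement carries signal about both the example and the worker can be made concrete with vectors. The sketch below is a simplified, unweighted reading of that idea, assuming: each worker’s judgement of one example is a binary vector over the answer options; the example’s vector is the sum of its workers’ vectors; an option’s score is the cosine between that sum and the option’s unit vector; and a worker’s agreement is the cosine between their vector and the sum of everyone else’s. The function names and the toy data are hypothetical and are not the CrowdTruth software’s API; the published framework includes refinements not shown here.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def example_vector(worker_vectors: list[Vector]) -> Vector:
    """Element-wise sum of all worker vectors for one example."""
    return [float(sum(column)) for column in zip(*worker_vectors)]


def option_scores(worker_vectors: list[Vector]) -> list[float]:
    """Score each answer option by its cosine with the example vector."""
    total = example_vector(worker_vectors)
    size = len(total)
    return [cosine(total, [1.0 if j == i else 0.0 for j in range(size)]) for i in range(size)]


def worker_agreement(worker_vectors: list[Vector], k: int) -> float:
    """Cosine between worker k and the summed vectors of all other workers."""
    others = [w for i, w in enumerate(worker_vectors) if i != k]
    return cosine(worker_vectors[k], example_vector(others))


# Hypothetical toy data: options are [treats, causes, none] for one sentence.
# Four workers pick "treats"; the fifth ticks every option.
workers = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 1]]
print([round(s, 3) for s in option_scores(workers)])  # "treats" scores highest
print([round(worker_agreement(workers, k), 3) for k in range(len(workers))])
```

The worker who ticks every option gets the lowest agreement, while the sentence still gets a clear "treats" score. For a genuinely ambiguous sentence, several options would score moderately instead, without any single worker standing out.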



one truth: multiple truths

all examples are created equal: each example is unique

disagreement bad: disagreement is good

experts rule: crowd rules

one is enough: the more the better

detailed explanations help: keep it simple, stupid

once done, forever valid: maintenance is necessary




changes needed: video archive enrichment

improve support for fragment search

time-based annotations

bridging vocabulary gap between searcher & cataloguer



crowdsourcing video tagging

two video tagging pilots



@waisda http://waisda.nl

engage crowds through continuous gaming



“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al., K-CAP 2011


time-based, just “tags”

objects (57%): westminster abbey, abbey, priester (priest), geestelijken (clergy), hek (fence), aankomst (arrival), koets (carriage), kroning (coronation), mensenmassa (crowd), parade, kroon (crown), regen (rain)

persons (31%): bernhard, juliana

locations (7%): engeland (England)

user tags describe mainly short segments, are often not very specific, and don’t describe programmes as a whole

user vocabulary: 8% in professional vocabulary, 23% in Dutch lexicon, 89% found on Google


crowdsourcing medical relation extraction

diversity of opinions, independent perspectives

multitude of contexts

we exposed a richer set of possibilities that help in identifying, processing & understanding context



Does this sentence express TREATS(Antibiotics, Typhus)?

ANTIBIOTICS are the first line treatment for indications of TYPHUS. 95%

Patients with TYPHUS who were given ANTIBIOTICS exhibited several side-effects. 75%

With ANTIBIOTICS in short supply, DDT was used during World War II to control the insect vectors of TYPHUS. 50%

The crowd results capture the natural ambiguity



What is the relation between the highlighted terms?

He was the first physician to identify the relationship between HEMOPHILIA and HEMOPHILIC ARTHROPATHY.

Experts Hallucinate

Crowd reads text literally - provide better examples to machine

experts: cause; crowd: no relation



Unclear relationship between the two arguments reflected in the disagreement

Medical Relation Extraction



Clearly expressed relation between the two arguments reflected in the agreement




Learning Curves

(crowd with pos./neg. threshold at 0.5)

above 400 sentences: crowd consistently above baseline & single annotator
above 600 sentences: crowd outperforms experts
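The learning curves binarize crowd sentence-relation scores at 0.5 before training. A minimal sketch of that step, assuming hypothetical sentence IDs and reusing the three scores from the TYPHUS example (0.95, 0.75, 0.50). The slides do not say how a score exactly at the threshold is treated, so this sketch assumes it counts as negative.

```python
def to_training_labels(scores: dict[str, float], threshold: float = 0.5) -> dict[str, bool]:
    """Map sentence-relation scores to positive (True) / negative (False) labels.

    Assumption: a score must be strictly above the threshold to count as positive.
    """
    return {sentence_id: score > threshold for sentence_id, score in scores.items()}


# Hypothetical sentence IDs; scores taken from the TYPHUS example slide.
crowd_scores = {"s1": 0.95, "s2": 0.75, "s3": 0.50}
print(to_training_labels(crowd_scores))
```

Moving the threshold changes which ambiguous sentences end up as positive training examples, which is one reason the choice of threshold is stated alongside the learning curves.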



Learning Curves Extended

(crowd with pos./neg. threshold at 0.5)

crowd consistently performs better than baseline



# of Workers: Impact on Sentence-Relation Score



Training a Relation Extraction Classifier

F1 Cost per sentence

CrowdTruth 0.642 $0.66

Expert Annotator 0.638 $2.00

Single Annotator 0.492 $0.08

“wisdom of the crowd” provides training data that is at least as good as, if not better than, expert annotation

but only with a proper analytic framework for harnessing disagreement from the crowd
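The per-sentence costs in the table can be turned into a budget for a whole corpus. This sketch only rearranges the table’s numbers; it assumes cost scales linearly with the number of sentences (a simplification) and uses a hypothetical 1,000-sentence corpus.

```python
# F1 and cost per sentence (USD), copied from the table above.
RESULTS = {
    "CrowdTruth": (0.642, 0.66),
    "Expert Annotator": (0.638, 2.00),
    "Single Annotator": (0.492, 0.08),
}


def annotation_budget(method: str, n_sentences: int) -> float:
    """Total cost in dollars; assumes cost grows linearly with corpus size."""
    _, cost_per_sentence = RESULTS[method]
    return n_sentences * cost_per_sentence


def relative_cost(method: str, baseline: str) -> float:
    """How many times more expensive `method` is than `baseline`, per sentence."""
    return RESULTS[method][1] / RESULTS[baseline][1]


if __name__ == "__main__":
    n = 1000  # hypothetical corpus size
    for method, (f1, _) in RESULTS.items():
        print(f"{method:17s} F1={f1:.3f}  cost for {n} sentences: ${annotation_budget(method, n):,.2f}")
    print(f"expert / CrowdTruth cost ratio: {relative_cost('Expert Annotator', 'CrowdTruth'):.2f}")
```

Under that assumption, expert annotation costs about three times as much per sentence as CrowdTruth for a nearly identical F1 (0.638 vs 0.642), while the single annotator is cheapest but clearly worse.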



map music to moods

Goal: tag songs with emotional clusters

Comfort Zone Solution: people assign the prevalent mood of a song



http://www.youtube.com/watch?v=v1c2OfAzDTI&t=34



Is this song …?

Passionate, Rousing, Confident, Boisterous, Rowdy

Rollicking, Cheerful, Fun, Sweet, Amiable, Good-natured

Literate, Poignant, Wistful, Bittersweet, Autumnal, Brooding

Humorous, Silly, Campy, Whimsical, Witty, Wry

Aggressive, Fiery, Tense, Anxious, Intense, Volatile



If “One Truth” & “No Disagreement”

Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5

W1 1

W2 1

W3 1

W4 1

W5 1

W6 1

W7

W8

W9 1

W10 1

Totals 1 3 1 2 1



If “Many Truths” & “Disagreement”

Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other

W1 1 1 1

W2 1 1 1

W3 1 1 1

W4 1 1

W5 1 1

W6 1 1 1

W7 1 1 1

W8 1 1 1

W9 1 1

W10 1 1 1 1 1

Totals 3 5 6 5 2 8



Disagreement as Signal

Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other

W10 1 1 1 1 1

Totals 3 5 6 5 2 8

can indicate alternative interpretations

can indicate ambiguity in the categorisation

can indicate low-quality workers
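The Totals rows are enough to score each mood cluster for the song, because summing the workers’ vectors gives the song’s vector. A minimal sketch, assuming each cluster’s score is the cosine between that summed vector and the cluster’s unit vector (a simplified CrowdTruth-style score; the function name is hypothetical). Note that the “one truth” table has no Other column.

```python
import math


def cluster_scores(totals: list[int]) -> list[float]:
    """Cosine between the summed annotation vector and each cluster's unit vector.

    Assumption: simplified CrowdTruth-style scoring. For a unit vector e_i,
    cos(v, e_i) = v[i] / ||v||, so only the Totals row is needed.
    """
    length = math.sqrt(sum(t * t for t in totals))
    return [t / length if length else 0.0 for t in totals]


# Totals rows from the two tables above.
one_truth = [1, 3, 1, 2, 1]       # Mood-C1 .. Mood-C5
many_truths = [3, 5, 6, 5, 2, 8]  # Mood-C1 .. Mood-C5, Other

print([round(s, 2) for s in cluster_scores(one_truth)])  # [0.25, 0.75, 0.25, 0.5, 0.25]
print([round(s, 2) for s in cluster_scores(many_truths)])
```

When workers may pick several clusters, “Other” scores highest and no single mood cluster reaches 0.5, the kind of spread the slide reads as signal rather than noise. Telling an ambiguous song apart from a low-quality worker such as W10, who ticked five clusters, needs the per-worker rows as well, not just the totals.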


so …


getting comfortable again


Take Home Message

People first, experts second

True and False is not enough: there is diversity in human interpretation

CrowdTruth introduces a spatial representation of meaning that harnesses disagreement

With CrowdTruth, untrained workers can be just as reliable as highly trained experts


http://data.crowdtruth.org/
