http://lora-aroyo.org @laroyo
Disrupting the Semantic
Lora Aroyo
Web & Media Group
Bulgaria
The Netherlands
Sofia
NYC
Personal Semantics
Riva del Garda, Italy, 2014
Semantic Social Life
To understand the value of Semantic Web for e-learning
you have to understand people, e.g. how they learn, interact &
consume information
To understand the value of Semantic Web for cultural heritage
you have to understand people, e.g. how they interact & consume information
To understand the value of Semantic Web for digital humanities, you have to
understand people, e.g. how they interact & consume information
people are at the center of everything
people & their semantics, i.e. their real-world behavior, online interactions, information needs, information consumption habits, personal preferences ...
CrowdTruth team
the evolution of the semantic web: great moments from the 1980s to ESWC 2017
50’s - AI more or less begins ...
80’s - expert systems
90’s - knowledge acquisition from experts
00’s - standards & interoperability
10’s - big data & large crowds
A long time ago in a galaxy far, far away …
80’s - empire of the experts
Advances in hardware and SDEs
PCs, workstations, Symbolics, Sun
New architectures like the Hypercube
LISP, Prolog, OPS
AI can now BUILD SYSTEMS
Primary focus on experts and rules
What is the knowledge of experts?
What is the form of this knowledge? Graphs, logic, rules, frames
How do experts reason? Deduction, induction
Work on form & process remained academic: what happened inside the system, and how to make the reasoning inside the system proper and as good as possible
Industry forged ahead with ad-hoc & proprietary systems and actually tried to build expert systems
Origins of uncertain KR: fuzzy, probabilistic
Piero Bonissone and the DELTA/CATS expert system for locomotive repair, built with David Smith, a locomotive repair expert
Buchanan and Shortliffe’s MYCIN project at Stanford built a huge rule base for medical diagnosis, working with an extensive team of medical experts.
90’s - knowledge acquisition from experts
The 90’s brought attention to knowledge acquisition. Once expert systems had been shown to work functionally, the focus in practice, scientific research, and technology development shifted to the then-bigger challenge of how to acquire knowledge in real-world scenarios.
It seems natural that, after the look inside the systems, attention turned to how to actually get the knowledge from the world outside and frame it as properly structured knowledge inside the system.
Dream of the 90’s
00’s - interoperability & standards odyssey
10’s - AI Awakens
• Machine Learning
• Neural networks
• Solving basic perceptual problems instead of high-expertise ones
• Ambiguity tolerant reasoning
• Non-taxonomic ordering → non-taxonomic reasoning
• folksonomies, clustering, diversity of perspectives, embeddings
2011
10’s – Big Data
Human Annotation
Central in Machine Learning Training & Evaluation
10’s – Crowds
Team BellKor wins Netflix Prize
Timeline: 1998, 2006, 2007, 2009
the semantic comfort zone
One truth: knowledge acquisition for the semantic web assumes one correct interpretation for every example
All examples are created equal: triples are triples, one is not more important than another, they are all either true or false
Disagreement bad: when people disagree, they don’t understand the problem
Experts rule: knowledge is captured from domain experts
One is enough: knowledge by a single expert is sufficient
Detailed explanations help: if examples cause disagreement - add instructions
Once done, forever valid: knowledge is not updated; new data not aligned with old
“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
Use Case: video archive enrichment
Search Behavior of Media Professionals at an Audiovisual Archive: A Transaction Log Analysis (2009).
B. Huurnink, L. Hollink, W. van den Heuvel, M. de Rijke.
Goal: make the multimedia content of the Dutch National Video Archive accessible to large audiences
Comfort Zone Solution: media professionals watch & annotate videos. Of course!
but ...
Expensive: doesn’t scale
Time-consuming: 5 times the video duration
Professional vocabulary: experts use a specific vocabulary that is unknown to general audiences
… and
People search for fragments: experts annotate full videos
Not finding: 35% of search queries return no results
Use Case: real world QA for Watson
Crowdsourcing ground truth for Question Answering using CrowdTruth (2015). B. Timmermans, L. Aroyo, C. Welty
Goal: gather questions that real people ask for training & evaluating Watson
Data: 30K questions + candidate answers from Yahoo! Answers
Comfort Zone Solution: ask people if the passage answers the question (Y/N). Simple!
Contradicting evidence
Is Coral a plant?
• “Coral almost could be considered half-plant [..]”
• “[..] organism, such as a coral, resembling a stony plant.”
Unanswerable questions
• Can I take a pill if you don't have a child yet?
• Is the spelling for being drunk right?
• Is napster black?
Unclear answer type
• Is paper animal plant or man made?
Multiple right answers to a question
• What is the best university in NY? (subjective)
YES or NO?
Use Case: medical relation extraction for Watson
Crowdsourcing Ground Truth for Medical Relation Extraction (2017). A. Dumitrache, L. Aroyo, C. Welty
Goal: gather data to train Watson to read medical text & automatically extract a medical relations KB
Comfort Zone Solution: having medical experts read & annotate examples
ANTIBIOTICS are the first line treatment for indications of TYPHUS. treats(ANTIBIOTICS, TYPHUS)? Expert: yes
Patients with TYPHUS who were given ANTIBIOTICS exhibited side-effects. treats(ANTIBIOTICS, TYPHUS)? Expert: yes
With ANTIBIOTICS in short supply, DDT was used during WWII to control the insect vectors of TYPHUS. treats(ANTIBIOTICS, TYPHUS)? Expert: yes.
Are these three really all the same???
Use Case: map music to moods
Goal: annotate songs with emotional tags
Comfort Zone Solution: people assign the prevalent mood of a song
Which is the mood most appropriate for each song? Choose one:
Cluster 1: passionate, rousing, confident, boisterous, rowdy
Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured
Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding
Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral
Other: does not fit into any of the 5 clusters
(Lee and Hu 2012)
1 song - 1 mood???
Semantic Comfort Zone: disrupted
interestingly …
• collective decisions of large groups of people
• a group of error-prone decision-makers can be surprisingly good at picking the best choice
• for a thumbs-up or thumbs-down decision, each person's chance of picking the right answer needs to be > 50%
• the odds that most of them will pick the right answer are greater than the odds that any one of them will pick it on their own
• performance gets better as size grows
1785 Marquis de Condorcet
“wisdom of crowds”
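Condorcet's argument can be checked numerically. The following is a minimal sketch, assuming independent voters who each pick the right answer with the same probability; the function name `majority_correct_probability` and the sample values are illustrative and not taken from the talk.

```python
from math import comb


def majority_correct_probability(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of independent voters is right.

    Assumes an odd number of voters (no ties) and that every voter has the
    same probability p_correct of choosing the right answer.
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    smallest_majority = n_voters // 2 + 1
    # Sum the binomial probabilities of every outcome with a correct majority.
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(smallest_majority, n_voters + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 11, 101):
        print(n, round(majority_correct_probability(n, 0.6), 3))
```

With an individual accuracy above 0.5 the majority becomes more reliable as the group grows; with an accuracy below 0.5 the same formula shows the majority getting worse, which is why the > 50% condition matters.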
• asked 787 people to guess the weight of an ox
• none got the right answer
• their collective guess was almost perfect
1906 Sir Francis Galton
“wisdom of crowds”
WWII Math Rosies
1942: Ballistics calculations and flight trajectories
NASA’s Computer Room
transcribe raw flight data from celluloid film & oscillograph paper
can we harness it?
CrowdTruth
http://crowdtruth.org/
CrowdTruth
Three basic causes of disagreement: workers, examples, target semantics
Disagreement is signal, not noise.
It is indicative of the variation in human semantic interpretation
It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality
CrowdTruth: Machine-human computation framework for harnessing disagreement in gathering annotated data (2014)
O. Inel, A. Dumitrache, L. Aroyo, C. Welty
one truth: multiple truths
all examples are created equal: each example is unique
disagreement bad: disagreement is good
experts rule: crowd rules
one is enough: the more the better
detailed explanations help: keep it simple stupid
once done, forever valid: maintenance is necessary
“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
changes needed: video archive enrichment
improve support for fragment search
time-based annotations
bridging vocabulary gap between searcher & cataloguer
crowdsourcing video tagging
two video tagging pilots
@waisda http://waisda.nl
engage crowds through continuous gaming
“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
time-based
bernhard
just “tags”
objects (57%): westminster abbey, abbey, priester (priest), geestelijken (clergy), hek (fence), paarden (horses), tocht (procession), aankomst (arrival), koets (carriage), kroning (coronation), mensenmassa (crowd), parade, kroon (crown), regen (rain)
persons (31%): bernhard, juliana
locations (7%): engeland (England)
user vocabulary: 8% in professional vocabulary, 23% in Dutch lexicon, 89% found on Google
user tags:
describe mainly short segments
are often not very specific
don’t describe programmes as a whole
crowdsourcing medical relation extraction
diversity of opinions
independent perspectives
multitude of contexts
we exposed a richer set of possibilities that help in identifying, processing & understanding context
Does this sentence express TREATS(Antibiotics, Typhus)?
ANTIBIOTICS are the first line treatment for indications of TYPHUS. 95%
Patients with TYPHUS who were given ANTIBIOTICS exhibited several side-effects. 75%
With ANTIBIOTICS in short supply, DDT was used during World War II to control the insect vectors of TYPHUS. 50%
The crowd results capture the natural ambiguity
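One way to turn many crowd judgments into a graded score per sentence and relation is sketched below. It is a minimal illustration, assuming a cosine-style sentence-relation score between the aggregated annotation vector and a single relation; the worker annotations, relation names, and function names are hypothetical and are not the data or code behind the percentages above.

```python
import math
from collections import Counter


def sentence_vector(worker_annotations):
    """Add up, per relation, how many workers selected it for one sentence."""
    counts = Counter()
    for selected_relations in worker_annotations:
        counts.update(selected_relations)
    return counts


def sentence_relation_score(vector, relation):
    """Cosine between the sentence vector and the unit vector of one relation."""
    norm = math.sqrt(sum(count * count for count in vector.values()))
    return vector[relation] / norm if norm else 0.0


# Hypothetical judgments from six workers on one sentence.
annotations = [
    {"treats"},
    {"treats"},
    {"treats", "side_effect"},
    {"side_effect"},
    {"side_effect"},
    {"side_effect"},
]
vector = sentence_vector(annotations)
scores = {
    relation: sentence_relation_score(vector, relation)
    for relation in ("treats", "side_effect", "prevents")
}
# A positive/negative threshold (0.5 here) turns graded scores into labels.
positive = {relation for relation, score in scores.items() if score >= 0.5}
print(scores, positive)
```

Keeping the graded score next to the thresholded label preserves how clearly a sentence expresses a relation, instead of collapsing every example to yes or no.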
What is the relation between the highlighted terms?
He was the first physician to identify the relationship between HEMOPHILIA and HEMOPHILIC ARTHROPATHY.
Experts Hallucinate
Crowd reads text literally and provides better examples to the machine
experts: cause; crowd: no relation
Unclear relationship between the two arguments reflected in the disagreement
Medical Relation Extraction
Clearly expressed relation between the two arguments reflected in the agreement
Medical Relation Extraction
Learning Curves
(crowd with pos./neg. threshold at 0.5)
above 400 sentences: crowd consistently over baseline & single annotator
above 600 sentences: crowd outperforms experts
Learning Curves Extended
(crowd with pos./neg. threshold at 0.5)
crowd consistently performs better than baseline
# of Workers: Impact on Sentence-Relation Score
Training a Relation Extraction Classifier
Annotation source   F1      Cost per sentence
CrowdTruth          0.642   $0.66
Expert Annotator    0.638   $2.00
Single Annotator    0.492   $0.08
“wisdom of the crowd” provides training data that is at least as good as, if not better than, that of experts, but only with a proper analytic framework for harnessing disagreement from the crowd
map music to moods
Goal: tag songs with emotional clusters
Comfort Zone Solution: people assign the prevalent mood of a song
Is this song …?
Passionate, Rousing, Confident, Boisterous, Rowdy
Literate, Poignant, Wistful, Bittersweet, Autumnal, Brooding
Rollicking, Cheerful, Fun, Sweet, Amiable, Good-natured
Humorous, Silly, Campy, Whimsical, Witty, Wry
Aggressive, Fiery, Tense, Anxious, Intense, Volatile
If “One Truth” & “No Disagreement”
Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5
W1 1
W2 1
W3 1
W4 1
W5 1
W6 1
W7
W8
W9 1
W10 1
Totals 1 3 1 2 1
Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other
W1 1 1 1
W2 1 1 1
W3 1 1 1
W4 1 1
W5 1 1
W6 1 1 1
W7 1 1 1
W8 1 1 1
W9 1 1
W10 1 1 1 1 1
Totals 3 5 6 5 2 8
If “Many Truths” & “Disagreement”
Disagreement as Signal
Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other
W10 1 1 1 1 1
Totals 3 5 6 5 2 8
can indicate alternative interpretations
can indicate ambiguity in the categorisation
can indicate low quality workers
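A worker-level view of disagreement can be sketched as follows. This is a minimal illustration, assuming each worker's choices are compared by cosine similarity with the combined choices of all other workers; the annotation matrix, worker ids, and function names are hypothetical, since the column positions of the table above are not recoverable from the slide text.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def worker_agreement(annotations):
    """Score each worker against the sum of everyone else's annotations."""
    totals = [sum(column) for column in zip(*annotations.values())]
    scores = {}
    for worker, vector in annotations.items():
        rest = [total - own for total, own in zip(totals, vector)]
        scores[worker] = cosine(vector, rest)
    return scores


# Hypothetical choices for one song over three mood clusters (C1, C2, C3).
annotations = {
    "W1": [0, 1, 0],
    "W2": [0, 1, 0],
    "W3": [0, 1, 1],
    "W4": [1, 0, 0],
}
scores = worker_agreement(annotations)
least_aligned = min(scores, key=scores.get)
print(scores, least_aligned)
```

A low score on one song does not by itself mark a low quality worker: it may equally reflect an alternative interpretation or an ambiguous category, so such scores are more informative when looked at across many songs.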
so …
getting comfortable again
Take Home Message
People first, experts second
True and False is not enough: there is diversity in human interpretation
CrowdTruth introduces a spatial representation of meaning that harnesses disagreement
With CrowdTruth, untrained workers can be just as reliable as highly trained experts
http://data.crowdtruth.org/