Upload
dyanne
View
41
Download
1
Embed Size (px)
DESCRIPTION
Approximate subgraph matching for detection of topic variations. Mitja Trampu š Dunja Mladenić AI Lab, Jožef Stefan Institute , Slovenia. Mining Diversity. Web content varies in many aspects , e.g. Topical Social (author, target audience, people written about) - PowerPoint PPT Presentation
Citation preview
ailab.ijs.si
Approximate subgraph matching for detection of topic variations
Mitja TrampušDunja MladenićAI Lab, Jožef Stefan Institute, Slovenia
AI Lab, Jozef Stefan Institute ailab.ijs.si
Mining Diversity• Web content varies in many aspects, e.g.
o Topicalo Social (author, target audience, people written about)o Geographical (publisher, places written about)o Sentiment (positive/negative)o Writing style (structure, vocabulary)o Coverage bias
• This work: (micro-)topical diversityo Macroscopic = largely solvedo Microscopic = challenge
AI Lab, Jozef Stefan Institute ailab.ijs.si
Task:Given a collection of texts on a topic,• identify a common template • align texts to the template
AI Lab, Jozef Stefan Institute ailab.ijs.si
Template representation
• Syntactico info1: X people were killed / killed X people / resulted in
X casualtieso info2: blew up Y / destroyed Y / attacked in a Y
• Semantico kill(bomber, people); count(people, X)o destroy(bomber, Y)
people bomber Ykill destroyX count
AI Lab, Jozef Stefan Institute ailab.ijs.si
patients terrorist hospitalkill demolish100 count
treatment attackreceive withstand
execute
policeofficer bomber police
stationslaughter blow up2 count
Prer
equi
site
: Se
man
tic G
raph
believers suicidebomber churchkill destroy12 count
vestexit
cardrive
wearrun
AI Lab, Jozef Stefan Institute ailab.ijs.si
Mining Templates• Template := subgraph with frequent
specializationso Specializations implied by background taxonomy
(WordNet)o Threshold frequency manually defined
believers suicidebomber churchkill
teardown12 count
policeofficer bomber police
stationslaughter blow up2 count
people bomber buildingkill destroyX count
AI Lab, Jozef Stefan Institute ailab.ijs.si
Semantic Graph Construction
1. Data: Google News crawl2. HTML cleanup3. Named entity tagging4. Pronoun resolution (he/she/him/her)5. Named entity consolidation (Barack
Obama vs President Obama)6. Parsing, triple/fact/assertion extraction
(for now: subj-verb-obj only)7. Ontology/taxonomy alignment8. Merging triples into a graph
AI Lab, Jozef Stefan Institute ailab.ijs.si
Approximate subgraph matching
believers suicidebomber churchkill
teardown12 count
policeofficer bomber police
stationslaughter blow up2 count
people person locationkill destroynumbercount
people person locationkilldestroynumbercount
people person locationkill destroynumbercount
GENERALIZE
FREQUENT SUBTREE MINING
AI Lab, Jozef Stefan Institute ailab.ijs.si
Approximate subgraph matching
people bomber buildingkill destroynumbercount
people person locationkill destroynumbercount
SPECIALIZE
believers suicidebomber churchkill
teardown12 count
policeofficer bomber police
stationslaughter blow up2 count
AI Lab, Jozef Stefan Institute ailab.ijs.si
Preliminary results• 5 test domains; for each:
o ~10 graphs, ~10000 nodeso 10-60 seconds
• At min. support 30%o 20 maximal patterns, 9 manually judged as interesting
AI Lab, Jozef Stefan Institute ailab.ijs.si
AI Lab, Jozef Stefan Institute ailab.ijs.si
Conclusion• Future work:
o Mapping text -> semantics• Other ontologies?
o Interestingness measure for assertions and patternso Evaluation (precision, recall; multiple domains)o Alternative approaches to generalizing subgraphs
• Template extraction is achievable, but not easy
• Human filtering of results hard to avoid• Current approach reasonably fast
AI Lab, Jozef Stefan Institute ailab.ijs.si
Q?
Thank you.
Can we extract all relations?Kind of …
Thousands of small quakes resumed 18 months ago and continue to rattle Mammoth Lakes, June Lake and other Mono County resort towns. The temblors, most measuring 1 to 3 on the Richter scale, started beneath Mammoth Mountain.
Subject – Verb – ObjectTriplets
Semantic Graph
AI Lab, Jozef Stefan Institute ailab.ijs.si
AI Lab, Jozef Stefan Institute ailab.ijs.si
Templates - why• Interpret content
o news archives: structure/annotate old texts, enable semantic search
o wikipedia: suggestions for infobox entries
• Generate contento wikipedia: a starting point for new articles / a checklist of
information to be included
• No normative definition of “good template”
AI Lab, Jozef Stefan Institute ailab.ijs.si
Evaluation• Qualitative
o Usage-specifico Not useful for tuning algorithms
• Quantitativeo Precisiono Recall