17
ailab.ijs.si Approximate subgraph matching for detection of topic variations Mitja Trampuš Dunja Mladenić AI Lab, Jožef Stefan Institute, Slovenia

Approximate subgraph matching for detection of topic variations

  • Upload
    dyanne

  • View
    41

  • Download
    1

Embed Size (px)

DESCRIPTION

Approximate subgraph matching for detection of topic variations. Mitja Trampu š Dunja Mladenić AI Lab, Jožef Stefan Institute , Slovenia. Mining Diversity. Web content varies in many aspects , e.g. Topical Social (author, target audience, people written about) - PowerPoint PPT Presentation

Citation preview

Page 1: Approximate subgraph matching for detection of topic variations

ailab.ijs.si

Approximate subgraph matching for detection of topic variations

Mitja TrampušDunja MladenićAI Lab, Jožef Stefan Institute, Slovenia

Page 2: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Mining Diversity• Web content varies in many aspects, e.g.

o Topicalo Social (author, target audience, people written about)o Geographical (publisher, places written about)o Sentiment (positive/negative)o Writing style (structure, vocabulary)o Coverage bias

• This work: (micro-)topical diversityo Macroscopic = largely solvedo Microscopic = challenge

Page 3: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Task:Given a collection of texts on a topic,• identify a common template • align texts to the template

Page 4: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Template representation

• Syntactico info1: X people were killed / killed X people / resulted in

X casualtieso info2: blew up Y / destroyed Y / attacked in a Y

• Semantico kill(bomber, people); count(people, X)o destroy(bomber, Y)

people bomber Ykill destroyX count

Page 5: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

patients terrorist hospitalkill demolish100 count

treatment attackreceive withstand

execute

policeofficer bomber police

stationslaughter blow up2 count

Prer

equi

site

: Se

man

tic G

raph

believers suicidebomber churchkill destroy12 count

vestexit

cardrive

wearrun

Page 6: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Mining Templates• Template := subgraph with frequent

specializationso Specializations implied by background taxonomy

(WordNet)o Threshold frequency manually defined

believers suicidebomber churchkill

teardown12 count

policeofficer bomber police

stationslaughter blow up2 count

people bomber buildingkill destroyX count

Page 7: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Semantic Graph Construction

1. Data: Google News crawl2. HTML cleanup3. Named entity tagging4. Pronoun resolution (he/she/him/her)5. Named entity consolidation (Barack

Obama vs President Obama)6. Parsing, triple/fact/assertion extraction

(for now: subj-verb-obj only)7. Ontology/taxonomy alignment8. Merging triples into a graph

Page 8: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Approximate subgraph matching

believers suicidebomber churchkill

teardown12 count

policeofficer bomber police

stationslaughter blow up2 count

people person locationkill destroynumbercount

people person locationkilldestroynumbercount

people person locationkill destroynumbercount

GENERALIZE

FREQUENT SUBTREE MINING

Page 9: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Approximate subgraph matching

people bomber buildingkill destroynumbercount

people person locationkill destroynumbercount

SPECIALIZE

believers suicidebomber churchkill

teardown12 count

policeofficer bomber police

stationslaughter blow up2 count

Page 10: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Preliminary results• 5 test domains; for each:

o ~10 graphs, ~10000 nodeso 10-60 seconds

• At min. support 30%o 20 maximal patterns, 9 manually judged as interesting

Page 11: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Page 12: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Conclusion• Future work:

o Mapping text -> semantics• Other ontologies?

o Interestingness measure for assertions and patternso Evaluation (precision, recall; multiple domains)o Alternative approaches to generalizing subgraphs

• Template extraction is achievable, but not easy

• Human filtering of results hard to avoid• Current approach reasonably fast

Page 13: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Q?

Thank you.

Page 14: Approximate subgraph matching for detection of topic variations

Can we extract all relations?Kind of …

Thousands of small quakes resumed 18 months ago and continue to rattle Mammoth Lakes, June Lake and other Mono County resort towns. The temblors, most measuring 1 to 3 on the Richter scale, started beneath Mammoth Mountain.

Subject – Verb – ObjectTriplets

Semantic Graph

Page 15: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Page 16: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Templates - why• Interpret content

o news archives: structure/annotate old texts, enable semantic search

o wikipedia: suggestions for infobox entries

• Generate contento wikipedia: a starting point for new articles / a checklist of

information to be included

• No normative definition of “good template”

Page 17: Approximate subgraph matching for detection of topic variations

AI Lab, Jozef Stefan Institute ailab.ijs.si

Evaluation• Qualitative

o Usage-specifico Not useful for tuning algorithms

• Quantitativeo Precisiono Recall