Introduction
• WCA searches for missing information (fragments) on the Web
• WCA structures information into an ontology, e.g. “place_of_birth” (Person, Place)
• Techniques used: NLP (Natural Language Processing), information extraction, relation extraction, question answering
Overview
Ontology
class - relation - class
Person - work_in - Place
Person - date_of_birth - Place
…..
Ontology instance
work_in (Alonso, ‘Granada’)
date_of_birth (Rembrandt, ?)
Web Crawler Agent
missing instance: date_of_birth (Rembrandt, ?)
search the Web, extract “15-July-1606” as answer
Start Ontotriple
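The overview can be read as a small data-structure exercise: ontology instances are relation triples, and a "?" marks a value the agent still has to find. The following is only a minimal sketch of that idea, not the WCA implementation. The names `Triple`, `missing_instances` and `fill` are hypothetical, and the search-and-extract step is replaced by a hard-coded answer taken from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triple:
    """One ontology instance: relation(subject, obj)."""
    relation: str
    subject: str
    obj: Optional[str]  # None stands for the "?" shown on the slide


def missing_instances(triples: List[Triple]) -> List[Triple]:
    """Return the instances whose value is still unknown."""
    return [t for t in triples if t.obj is None]


def fill(triple: Triple, answer: str) -> Triple:
    """Return a copy of the triple with the extracted answer filled in."""
    return Triple(triple.relation, triple.subject, answer)


instances = [
    Triple("work_in", "Alonso", "Granada"),
    Triple("date_of_birth", "Rembrandt", None),
]

todo = missing_instances(instances)

# Assumption: a real agent would search the Web and extract the answer here;
# this sketch simply plugs in the value given on the slide.
filled = [fill(t, "15-July-1606") for t in todo]
```

Here `todo` holds only the `date_of_birth` triple for Rembrandt, which is the instance the agent would go on to search for.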
Searching information with Google
• The “old” Web search (e.g. Google) is good for retrieving documents but NOT for extracting concise answers (e.g. “15-July-1606”)
• No analysis is done to “understand” the documents (e.g. “Rembrandt” can also be the name of a hotel or a bookstore)
Information extraction on the Web
• Data may be low quality and repeated
– e.g. Georges Seurat’s date of death
– 29 March 1891 (http://www.ibiblio.org/wm/paint/auth/seurat/)
– 19 March 1891 (http://www.rickdoble.net/influence/20seurat.htm)
• WCA depends on:
– Well-structured sentences and documents
– Good named-entity recognisers
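The Seurat example shows why extracted values cannot be taken at face value. One simple way to surface such conflicts, which the slides do not describe and is offered here only as an assumption, is to normalise each candidate date and group the sources by the value they report. The helper names `parse_date` and `group_by_value` are hypothetical, and the month-name parsing assumes an English locale.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Tuple


def parse_date(text: str) -> date:
    """Parse strings such as '29, March 1891' or '29 March 1891'."""
    cleaned = " ".join(text.replace(",", " ").split())
    return datetime.strptime(cleaned, "%d %B %Y").date()


def group_by_value(candidates: List[Tuple[str, str]]) -> Dict[date, List[str]]:
    """Map each normalised date to the list of sources reporting it."""
    groups: Dict[date, List[str]] = defaultdict(list)
    for raw, source in candidates:
        groups[parse_date(raw)].append(source)
    return dict(groups)


# The two candidate answers and their sources, as given on the slide.
candidates = [
    ("29, March 1891", "http://www.ibiblio.org/wm/paint/auth/seurat/"),
    ("19, March 1891", "http://www.rickdoble.net/influence/20seurat.htm"),
]

groups = group_by_value(candidates)

# More than one distinct value means the sources disagree.
conflict = len(groups) > 1
```

With only one source per value, counting alone cannot settle which date is right. The sketch only flags the disagreement so that it can be resolved with further evidence.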