Annotating Documents for the Semantic Web Using Data-Extraction Ontologies
Dissertation Proposal
Yihong Ding
2
Motivation
• The representation of web content limits its usability
• A machine-understandable web
– Shared, explicit, formal conceptualizations (ontologies)
– The semantic web
3
A Problem
• How can the current web be transformed into the semantic web?
4
A Solution: Semantic Annotation
• Add explicit, formal, and unambiguous metadata to web documents
• Explicit: publicly accessible
• Formal: publicly agreeable
• Unambiguous: publicly identifiable
5
Annotation Representation
Explicit Annotation
Implicit Annotation
6
Semantic Annotation: Current Research Status
• Manual annotation through friendly interfaces [Annotea, etc.]
• Automatic annotation with ontology generation [SCORE]
• Automatic annotation using automated IE tools based on pre-defined ontologies [SemTag, MnM, etc.]
7
Current Automatic Annotator: A Typical Paradigm
[Diagram: Document; Non-ontology-based IE Wrapper (rules and extracting categories); Domain Ontology]
(1) Extraction
(2) Alignment
(3) Annotation
8
Current Automatic Annotator: Problems
[Diagram: Document; Non-ontology-based IE Wrapper (rules and extracting categories); Domain Ontology]
(1) Problem of data recognition
(2) Problem of concept disambiguation
(3) Problem of annotation formatting, storing, indexing, sharing
(4) Problem of assembling ontologies
9
“Main Drawback of Using Automated IE”
[Kiryakov04]
• “none of these approaches expects an input or produces output with respect to ontologies”
• “a set of heuristics for post-processing and mapping of the IE results to an ontology … not sufficient for large-scale, domain-independent semantic annotation.”
• “IE and wrapper induction techniques need to use the ontology more directly during the process of extraction.”
10
Ontology-driven Paradigm (Data-Extraction Ontology) for Semantic Annotation
[Diagram: Document; Non-ontology-based IE Wrapper; Ontology-based IE Wrapper]
11
Ontology-driven Paradigm for Semantic Annotation: Some Arguments
• Resiliency w.r.t. web page layouts (helps scale to a large set of web pages)
• Adaptiveness w.r.t. domain specifications (helps scale to large domains)
• Creation of ontologies: still a problem but no longer a drawback
• Speed of execution: still a drawback (but we are going to propose a solution next)
12
Two-Layer Annotation Model
[Diagram: Conceptual Annotator using an ontology-based IE tool; Structural Annotator; Document; Similar Documents; Sample Annotation Process; Massive Annotation Process]
13
Structural Annotator
• Major components
– HTML hierarchical path that leads to concept locations
– Local context around locations
– Dependencies among multiple semantic categories
• Significance
– Identify both categories and their semantic meanings
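The first two components above (hierarchical path plus local context) can be illustrated with a small sketch. This is not the proposal's implementation: the rule format, the function names `learn_rule` and `apply_rule`, and the choice of "the preceding text node" as local context are all assumptions made for illustration. It learns one rule from a value that a conceptual annotator is assumed to have already labeled on a sample page, then reuses the rule on a similarly laid-out page.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they are kept off the path stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # (path, text) in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def text_nodes(html):
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_rule(sample_html, concept, value):
    """Hypothetical rule: tag path of the value plus the text node before it."""
    nodes = text_nodes(sample_html)
    for i, (path, text) in enumerate(nodes):
        if text == value:
            context = nodes[i - 1][1] if i > 0 else None
            return {"concept": concept, "path": path, "context": context}
    raise ValueError("annotated value not found in sample page")


def apply_rule(rule, html):
    """Annotate every text node whose path and local context match the rule."""
    nodes = text_nodes(html)
    hits = []
    for i, (path, text) in enumerate(nodes):
        context = nodes[i - 1][1] if i > 0 else None
        if path == rule["path"] and context == rule["context"]:
            hits.append((rule["concept"], text))
    return hits


# Made-up pages: one sample page and one page with the same layout.
sample = ("<html><body><table><tr><td>Price</td><td>$5,000</td></tr>"
          "<tr><td>Year</td><td>1999</td></tr></table></body></html>")
similar = ("<html><body><table><tr><td>Price</td><td>$7,250</td></tr>"
           "<tr><td>Year</td><td>2001</td></tr></table></body></html>")

rule = learn_rule(sample, "Price", "$5,000")
annotations = apply_rule(rule, similar)
```

The path alone matches every table cell here, so the local context is what separates the price cell from the year cell. Dependencies among multiple categories, the third component, are not modeled in this sketch.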
14
Ontology Factors in Semantic Annotation Tasks
• Knowledge specification
– Semantic web community
– Web Ontology Language (OWL)
• Knowledge instantiation
– IE and database community
– Object-oriented System Model in XML (OSMX)
15
Ontology Conversion
• Similarities (OWL vs. OSMX)
– Class vs. object set
– ObjectProperty vs. relationship set
– Cardinality restriction vs. participation constraint
– subclassOf vs. is-a relationship
• Unique features
– OWL: subpropertyOf, symmetric and transitive property, namespace declaration, ontology importing
– OSMX: arbitrary n-ary relationship sets, data frames, general constraints
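As a rough illustration of how a converter might use the similarities listed above, the sketch below stores the four correspondences in a lookup table and separates constructs that translate directly from those that do not. The table contents come from the slide; the function name `convert_constructs` and the plain-string representation of constructs are assumptions, not part of OWL or OSMX tooling.

```python
# Correspondences listed on the slide (OWL construct -> OSMX construct).
OWL_TO_OSMX = {
    "Class": "object set",
    "ObjectProperty": "relationship set",
    "Cardinality restriction": "participation constraint",
    "subclassOf": "is-a relationship",
}


def convert_constructs(constructs, to="OSMX"):
    """Split constructs into (translated, leftover) for the target language.

    Leftover items have no direct counterpart in the table and would need
    special handling in a real converter.
    """
    if to == "OSMX":
        table = OWL_TO_OSMX
    else:
        table = {osmx: owl for owl, osmx in OWL_TO_OSMX.items()}
    translated = [table[c] for c in constructs if c in table]
    leftover = [c for c in constructs if c not in table]
    return translated, leftover


to_osmx = convert_constructs(["Class", "subpropertyOf"])
to_owl = convert_constructs(["relationship set", "data frame"], to="OWL")
```

The leftover lists correspond to the unique features on each side, which is where a converter in either direction has to lose or encode information.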
16
Ontology Construction: An Unavoidable Problem
• Semantic annotation tasks require ontologies.
• The ontology for a specific semantic annotation task is not guaranteed to be available.
17
Ontology Construction: General and Special
• Generally speaking
– Until now, the mainstream approach is manual construction
– Automatic and semi-automatic ontology generation: many research papers, few if any practical, a very hard problem
• Special to the semantic annotation purpose
– Very dynamic and variant domains
– Much overlapping information
– Limited scope for one web page
– Flat structure
18
Ontology Construction: Knowledge Reuse
• “What has been will be again, what has been done will be done again; there is nothing new under the sun.” (The Holy Bible, Ecclesiastes, 1:9, NIV translation)
• A “new” ontology is a new assembly of unions and projections of several pre-existing ontologies.
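The idea of assembling by union and projection can be sketched with sets. The representation below (an ontology as a set of concept names plus a set of relationship triples) and the toy car and contact ontologies are assumptions for illustration only. The sketch also assumes that identical names in different ontologies denote the same concept, which a real assembler would have to verify.

```python
def project(ontology, keep):
    """Keep only the named concepts and the relationships among them."""
    concepts = ontology["concepts"] & set(keep)
    relations = {(a, r, b) for (a, r, b) in ontology["relations"]
                 if a in concepts and b in concepts}
    return {"concepts": concepts, "relations": relations}


def union(*ontologies):
    """Merge concepts and relationships; equal names are treated as equal."""
    concepts, relations = set(), set()
    for ontology in ontologies:
        concepts |= ontology["concepts"]
        relations |= ontology["relations"]
    return {"concepts": concepts, "relations": relations}


# Two made-up pre-existing ontologies.
car = {
    "concepts": {"Car", "Price", "Make", "Mileage"},
    "relations": {("Car", "has", "Price"), ("Car", "has", "Make"),
                  ("Car", "has", "Mileage")},
}
contact = {
    "concepts": {"Dealer", "Phone", "Address"},
    "relations": {("Dealer", "has", "Phone"), ("Dealer", "has", "Address")},
}

# A "new" car-ad ontology assembled from projections of the two.
car_ad = union(project(car, {"Car", "Price", "Make"}),
               project(contact, {"Dealer", "Phone"}))
```

Projection drops any relationship that touches a removed concept, so the assembled ontology never refers to a concept it does not contain.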
19
Architecture for Dynamically Assembling Ontologies
[Diagram: Domain of Interest; Web Page; Collection of Knowledge; Selected Knowledge Components; Assembled Ontology]
(1) Knowledge-component selection
(2) Ontology assembly
20
Thesis Statement
Propose a new solution for performing semantic annotation on ordinary HTML web pages, specifically:
1. apply ontology-based automatic IE techniques
2. augment OWL with a knowledge-recognition extension
3. combine a conceptual annotator and a layout-based annotator
4. dynamically assemble a new domain ontology for an annotation task
21
Standard Evaluation
• Annotation performance
– Precision
– Recall
– Speed of execution
• Test bed
– 5 to 10 different domains, with over 10 lexical concepts in each domain ontology
– 20 to 50 web pages in each domain
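For reference, precision is the fraction of produced annotations that are correct, and recall is the fraction of expected annotations that were produced. A minimal sketch follows, assuming each annotation is represented as a (concept, value) pair and compared by exact match; the sample data is made up and is not a result from this work.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up annotations for one page.
predicted = {("Price", "$5,000"), ("Year", "1999"),
             ("Make", "Ford"), ("Year", "2005")}
gold = {("Price", "$5,000"), ("Year", "1999"), ("Make", "Ford"),
        ("Model", "Focus"), ("Color", "red")}

precision, recall = precision_recall(predicted, gold)
```

Here three of the four predicted annotations are correct, and three of the five expected annotations are found.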
22
Ontology Converter Test
• A complete and sound check is costly and difficult to implement.
• Our simple test
– Start with an OSMX ontology A
– Convert it to OWL and then transform it back into an OSMX ontology B
– Use both A and B to annotate the same set of web pages (say 30 to 50 web pages)
– Annotation results should be identical
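The round-trip procedure above can be sketched as a small test harness. Everything concrete here is a stand-in: the dictionary-based "ontologies", the two toy converters, and the regex-based `annotate` function are hypothetical placeholders for the real OSMX/OWL converter and annotator, which the slides do not specify.

```python
import re


def osmx_to_owl(osmx):
    """Toy forward conversion (stand-in for the real converter)."""
    return {"classes": dict(osmx["object_sets"])}


def owl_to_osmx(owl):
    """Toy backward conversion (stand-in for the real converter)."""
    return {"object_sets": dict(owl["classes"])}


def annotate(osmx, page):
    """Toy annotator: tag every regex match with its concept name."""
    return sorted((concept, match.group(0))
                  for concept, pattern in osmx["object_sets"].items()
                  for match in re.finditer(pattern, page))


def round_trip_ok(ontology_a, pages, to_owl=osmx_to_owl, to_osmx=owl_to_osmx):
    """True if A and its round-tripped copy B annotate every page identically."""
    ontology_b = to_osmx(to_owl(ontology_a))
    return all(annotate(ontology_a, page) == annotate(ontology_b, page)
               for page in pages)


# Made-up ontology A and test pages.
ontology_a = {"object_sets": {"Price": r"\$\d[\d,]*",
                              "Year": r"\b(?:19|20)\d\d\b"}}
pages = ["1999 Ford, $5,000", "2001 Honda for $7,250 obo"]


def lossy_to_owl(osmx):
    """A deliberately broken converter that drops the Year concept."""
    kept = {c: p for c, p in osmx["object_sets"].items() if c != "Year"}
    return {"classes": kept}


faithful = round_trip_ok(ontology_a, pages)
broken = round_trip_ok(ontology_a, pages, to_owl=lossy_to_owl)
```

Identical annotation results do not prove the conversion is complete and sound; they only show that nothing the annotator depends on was lost for the pages tested.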
23
Two-Layer Annotation Model Evaluation
• Standard evaluation
• In addition
– About five large web sites with machine-generated web pages, each of which contains at least dozens of web pages
24
Dynamic Ontology Assembler Evaluation
• Regular precision and recall study according to selected knowledge components
• A pilot study on when the ontology assembler works better than manual ontology construction
– Record the time to use a tool to create an ontology from scratch
– Record the time to assemble the same ontology
– Compare their differences and the special conditions for each case
– Make empirical suggestions about how to build a knowledge base that favors ontology assembly
25
Delimitations
• Automatic ontology creation from scratch
• Annotation storing, indexing, and sharing mechanisms
• Semantic annotation for multimedia content
• Parallel or distributed computing to further scale the semantic annotation system to a large number of web pages
26
Contributions
• Converting current web pages into machine-understandable semantic web pages
• Producing a purely ontology-driven semantic annotator using an ontology-based IE wrapper
• Proposing a novel two-layer annotation model for fast, accurate, and resilient annotation
• Studying a dynamic ontology assembler that helps maximize the reuse of existing knowledge and minimize the load of manual ontology creation
• Implementing an ontology converter so that this work is useful to the rest of the semantic web community.