8/2/2019 Personalized Ontology Learning and Mining for Web Information Gathering
1/46
Personalised Ontology Learning and Mining
for Web Information Gathering
Xiaohui (Daniel) Tao
Agenda
- Introduction
- Ontology Learning for User Background Knowledge
- Ontology Mining for Personalisation
- Evaluation
- Conclusions
Challenge and Solution
- Web information is growing rapidly and, as a result, Web information gathering (WIG) is becoming challenging;
- Most systems are based on keyword-matching techniques
  - Feature vectors based on the statistics of terms and documents;
  - Information mismatching and overloading problems
- Capturing user information needs can benefit Web information gathering
  - Personalised Web information gathering
User Profiles in Personalised WIG
- What is a user profile?
  - The concepts of interest that describe a user's information needs
- User profile acquisition
  - Global analysis techniques use global knowledge bases;
  - Local analysis techniques use user feedback or observe user behaviour.
- Interviewing, non-interviewing, and semi-interviewing techniques
  - Interviewing: e.g. TREC-11 Filtering Track training sets;
  - Non-interviewing: e.g. the OBIWAN model;
  - Semi-interviewing: e.g. the Foxtrot recommender system.
User Profiles in Other Fields
- Biological/medical professionals have widely varying levels of computer knowledge;
- Many other distinguishing factors apply when it comes to patients;
- By adapting a personalised computer application using user profiles, we can improve the usability of biological/medical systems and thus the work of professionals:
  - Provide simpler, more efficient user interfaces for bio/medical professionals;
  - Tailor user interfaces to a professional's or patient's needs and impairments.
- In a smart clinic, bio/medical professionals' preferences and priorities can be considered. Their input can be simplified and human errors can be significantly reduced.
Ontologies for User Profiles
- Ontologies can be used to describe user background knowledge and so represent user profiles.
- Ontology definition
  - A formal description and explicit specification of a conceptualisation;
  - Consists of concepts, instances, semantic relations, and axioms;
  - Domain ontologies and generic ontologies
- Ontology learning
  - Manual ontology learning (efficiency needs to be improved)
  - Automated ontology learning (accuracy needs to be improved)
- Knowledge specification in ontologies.
Problem and Motivation
- Acquiring user profiles using ontology-based methods is an important hypothesis in personalised Web information gathering. However, the existing approaches have limitations.
- A breakthrough is necessary for clear and complete specification of knowledge in ontologies through mining local information;
- This thesis addresses these problems by
  - proposing a novel ontology learning and mining model, and
  - evaluating the model against numerous existing personalised WIG models.
Ontology Learning for User Background
Knowledge
World Knowledge Representation
- World knowledge is the commonsense knowledge possessed by people and is acquired through their experience and education;
- A world knowledge base (WKB) is a global ontology that formally describes and specifies world knowledge;
- With a WKB, user-interesting concepts are extracted, including both the relevant and non-relevant concepts according to user information needs.
- Library of Congress Subject Headings (LCSH)
World Knowledge Base Construction
- MARC 21 authority records of the LCSH system
  - 130 MB in a single data stream
Ontology Taxonomy Construction
- The WKB works as a global ontology from which user-interesting concepts are extracted to bootstrap the user's personalised ontologies;
- For a given topic, three different sets of concepts need to be extracted, along with their associated semantic relationships:
  - Positive subjects: the concepts that are interesting to the user with respect to the topic;
  - Negative subjects: the concepts that may lead to paradoxical or ambiguous interpretations of the topic, making it difficult to capture the information needs;
  - Neutral subjects: the concepts that give no indication of being either positive or negative.
Semi-automatic Taxonomy Construction
- Ontology Learning Environment (OLE)
  - extracts candidate subjects from the WKB, and
  - lets users identify the positive and negative subjects;
- Support values of selected subjects
  - The subjects are selected by the user, so their support values are approved by the user:
    - sup(s, T) = 1 for positive subjects;
    - sup(s, T) = -1 for negative subjects;
    - sup(s, T) = 0 for neutral subjects.
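The support assignment for the semi-automatic method can be sketched in a few lines of Python. This is only an illustration: the function name `assign_support` and the example subject labels are hypothetical, and the OLE itself is not reproduced here.

```python
# Hypothetical helper: turn a user's manual selections into sup(s, T) values.
def assign_support(candidates, positives, negatives):
    """Return {subject: sup(s, T)} for every candidate subject."""
    positives, negatives = set(positives), set(negatives)
    clash = positives & negatives
    if clash:
        raise ValueError(f"marked both positive and negative: {sorted(clash)}")
    support = {}
    for s in candidates:
        if s in positives:
            support[s] = 1      # user marked the subject as positive
        elif s in negatives:
            support[s] = -1     # user marked the subject as negative
        else:
            support[s] = 0      # left unmarked, treated as neutral
    return support


# Made-up candidate subjects, for illustration only.
sup = assign_support(
    candidates=["Espionage", "Trade secrets", "Spy films", "Commerce"],
    positives=["Espionage", "Trade secrets"],
    negatives=["Spy films"],
)
```

Any candidate the user leaves unmarked falls through to the neutral value 0, matching the three cases listed on the slide.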
Automatic Taxonomy Construction
- Web users may not like to burden themselves with providing feedback, so automatic taxonomy construction is necessary;
- The subjects whose terms overlap with the terms in the given topic are extracted as positive subjects;
- The neighbours of positive subjects are extracted as the negative subjects.
- The support value of identified subjects:
  - Positive subjects:
  - sup(s, T) = -1 for negative subjects;
  - No neutral subjects in this method at this stage.
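The extraction rule described above (term overlap for positive subjects, neighbours for negative subjects) might look roughly like the following Python sketch. The toy WKB, the whitespace tokeniser, and the function names are assumptions made for illustration; stop-word handling is omitted, and the support formula for positive subjects is not reproduced because it is not given in this text.

```python
def tokenize(text):
    """Lower-case a label and split it into a set of terms (naive tokeniser)."""
    return set(text.lower().replace("-", " ").split())


def extract_subjects(topic, wkb):
    """wkb maps each subject label to the set of its neighbouring subjects.

    Positive subjects share at least one term with the topic; negative
    subjects are neighbours of positive subjects that are not positive.
    """
    topic_terms = tokenize(topic)
    positives = {s for s in wkb if tokenize(s) & topic_terms}
    negatives = set()
    for s in positives:
        negatives |= wkb[s]
    negatives -= positives
    return positives, negatives


# Toy knowledge base with made-up neighbour links.
toy_wkb = {
    "Espionage": {"Spies", "Intelligence service"},
    "Economic history": {"Economics"},
    "Spies": {"Espionage"},
    "Intelligence service": {"Espionage"},
    "Economics": {"Economic history"},
    "Cooking": set(),
}
pos, neg = extract_subjects("economic espionage", toy_wkb)
neg_support = {s: -1 for s in neg}  # sup(s, T) = -1 for negative subjects
```

Subjects that are neither positive nor neighbours of a positive subject ("Cooking" here) are simply not extracted, which is consistent with there being no neutral subjects at this stage.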
Illustration of Constructed Ontologies
Specificity and Exhaustivity
- Ontology mining aims to discover interesting concepts from the ontologies.
- The difficulty is how to emphasise the specific relations of is-a and part-of in a single computational model;
- A multidimensional ontology mining method, Specificity and Exhaustivity, is introduced to solve this problem
  - Specificity describes the focus of a subject on a given topic,
    - Semantic and topic specificity
  - Exhaustivity restricts the semantic extent covered by a subject that deals with the topic.
- Specificity and Exhaustivity are designed to investigate the concepts and the strength of associations between them in ontologies.
Semantic Specificity
- Semantic specificity: the focus of a subject on the concepts it refers to.
  - Influenced by the subject's locality in the taxonomic structure of the ontology;
  - Upper-level subjects are more abstract, cover more descendant subjects, and refer to more concepts than lower-level subjects;
  - Thus, lower-level subjects have a stronger focus because they refer to fewer concepts.
- The semantic specificity of a subject, spe_a(s), is measured by
  - investigating the subject's locality in the taxonomic structure,
  - taking the associated is-a and part-of semantic relations into account.
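The slide does not give the formula for spe_a(s). The Python sketch below is one possible bottom-up scheme that satisfies the stated property (leaves are the most specific, and specificity drops towards the top of the taxonomy). The decay coefficient `theta`, the rule of taking the minimum over is-a children and the mean over part-of children, and the toy taxonomy are all assumptions, not the thesis's definition.

```python
def semantic_specificity(children, theta=0.9):
    """Assign a specificity score to every subject in an acyclic taxonomy.

    children maps a subject to {"is_a": [...], "part_of": [...]} child lists.
    Assumed rule: a leaf scores 1.0; a parent scores theta * min over its
    is-a children, or the mean over its part-of children; if it has both
    kinds of children, the smaller of the two values is used.
    """
    scores = {}

    def spe(s):
        if s in scores:
            return scores[s]
        kids = children.get(s, {})
        is_a = kids.get("is_a", [])
        part_of = kids.get("part_of", [])
        options = []
        if is_a:
            options.append(theta * min(spe(c) for c in is_a))
        if part_of:
            options.append(sum(spe(c) for c in part_of) / len(part_of))
        scores[s] = min(options) if options else 1.0
        return scores[s]

    for s in children:
        spe(s)
    return scores


# Toy taxonomy with made-up subjects.
toy_taxonomy = {
    "Science": {"is_a": ["Physics", "Chemistry"]},
    "Physics": {"is_a": ["Optics"]},
}
spe_a = semantic_specificity(toy_taxonomy)
```

In this toy run the leaves "Optics" and "Chemistry" get the highest score and "Science", at the top, the lowest, which mirrors the upper-level versus lower-level argument on the slide.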
Topic Specificity
- Topic specificity: the focus of a subject on a given topic.
- Topic specificity can be discovered from a user's personal information collection, the Local Instance Repository (LIR).
  - LIR: documents stored by the user, browsed Web pages, composed/received emails, and so on;
  - These items have content-related descriptors associated with the concepts specified in a knowledge base;
  - Documents of this kind, carrying semantic metadata, are becoming increasingly popular on the Web and are argued to be the mainstream of Semantic Web documents.
- A user's LIR is simulated by a collection of user-visited information items in a library catalogue.
Topic Specificity
- Mappings between the subjects in the WKB and the instances in an LIR:
Exhaustivity
- Exhaustivity: the extent of concepts dealt with by a subject with respect to a given topic.
  - The extent of interesting concepts referred to by a subject grows if the subject has more descendants that are positive to the topic.
  - In contrast, if the subject has more negative descendants, the extent of user-interesting concepts referred to by the subject shrinks.
- The exhaustivity of a subject, exh(s, T), is measured by investigating the strength with which its descendant subjects support or oppose the given topic.
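The exact formula for exh(s, T) is not shown here. As a rough illustration of the idea that positive descendants enlarge the extent and negative descendants shrink it, the Python sketch below simply sums the support values over a subject and its descendants. The plain sum, the function name, and the toy data are assumptions; the thesis may weight the terms differently.

```python
def exhaustivity(subject, children, sup):
    """Sum sup(s, T) over a subject and all of its descendants.

    children maps a subject to a list of its child subjects; sup maps a
    subject to its support value (missing subjects count as 0).
    """
    total = 0.0
    seen = set()
    stack = [subject]
    while stack:
        s = stack.pop()
        if s in seen:          # guard against a subject reachable twice
            continue
        seen.add(s)
        total += sup.get(s, 0.0)
        stack.extend(children.get(s, []))
    return total


# Toy taxonomy and support values, made up for illustration.
toy_children = {"A": ["B", "C"], "B": ["D"]}
toy_sup = {"A": 1, "B": 1, "C": -1, "D": 1}
exh_a = exhaustivity("A", toy_children, toy_sup)
exh_c = exhaustivity("C", toy_children, toy_sup)
```

Here subject "A" keeps a positive value because its positive descendants outweigh the negative one, while "C", which is itself negative and has no descendants, ends up below zero.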
User Profiles Refinement
- Subjects are considered user-interesting only if they have positive specificity and exhaustivity values.
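As a small illustration of this refinement rule, the filter can be written as follows. The dictionaries `spe` and `exh` and their values are made up; how the two measures are actually computed is outside this sketch.

```python
def refine_profile(spe, exh):
    """Keep only subjects whose specificity and exhaustivity are both positive."""
    return {s for s in spe if spe[s] > 0 and exh.get(s, 0) > 0}


# Made-up scores for four subjects.
spe = {"A": 0.8, "B": 0.5, "C": -0.2, "D": 0.9}
exh = {"A": 2.0, "B": -1.0, "C": 1.0, "D": 0.0}
kept = refine_profile(spe, exh)
```

Only "A" survives in this example: "B" fails on exhaustivity, "C" on specificity, and "D" has an exhaustivity of exactly zero, which is not positive.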
Interesting Subject Discovery
- The interesting subjects are discovered from the user's LIR, based on the subjects cited by its instances.
Interesting Subject Discovery
- The interestingness level of these discovered subjects is measured
  - Based on the size of the overlap with the positive subjects.
- A discovered subject is more interesting if
  - it has more related-to positive subjects, and
  - these related-to positive subjects have stronger support for the topic.
- Methods are proposed to calculate:
  - The min_interest threshold, which is determined by the positive subjects and designed to prune the weak discoveries;
  - The interest(s, T) of the discovered interesting subjects.
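The formulas for interest(s, T) and min_interest are not reproduced in this text. The Python sketch below only illustrates the two stated properties: a discovered subject scores higher when it is related to more positive subjects and when those subjects have stronger support, and weak discoveries are pruned by a threshold. Scoring by a plain sum and choosing the threshold as the smallest positive support are both assumptions.

```python
def discover_interesting(related_to, sup, positives, min_interest):
    """Score each discovered subject and prune those below min_interest.

    related_to maps a discovered subject to the set of subjects it is
    related to; only the positive ones contribute to its score.
    """
    interest = {}
    for s, related in related_to.items():
        score = sum(sup[p] for p in related & positives)
        if score >= min_interest:
            interest[s] = score
    return interest


# Made-up data: three positive subjects and three discovered subjects.
positives = {"P1", "P2", "P3"}
sup = {"P1": 1.0, "P2": 0.6, "P3": 0.4}
related_to = {
    "X": {"P1", "P2"},      # related to two strong positives
    "Y": {"P3", "Other"},   # related to one weak positive
    "Z": {"Other"},         # related to no positive subject
}
# Hypothetical threshold derived from the positive subjects.
min_interest = min(sup[p] for p in positives)
interest = discover_interesting(related_to, sup, positives, min_interest)
```

With this threshold "X" and "Y" are kept and "Z", which overlaps with no positive subject, is pruned.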
User Profiles Refinement (2)
- The ontology-based user profiles are further refined for personalisation, using the interesting subjects discovered from users' LIRs.
Evaluation
Evaluation Experiments
- Objective evaluation method
  - Standard test data, topics, and measuring methods;
  - No human participants are involved;
  - Possible to compare with state-of-the-art models;
  - Fully repeatable.
- Testing data set
  - The Reuters Corpus Volume 1 (RCV1), with 806,791 documents;
  - The dataset was also used in the TREC-11 Filtering Track;
- Experimental topics (50 topics)
  - TREC-11 Filtering Track topics R101-R150, manually created by NIST experts;
  - High stability of the evaluation experiment
    - C. Buckley and E. M. Voorhees [16] state that 25 topics is just barely enough for an experiment, but that 50 topics is stable.
Experiment Dataflow
Common Platform: WIG System
- Commonly used by all the experimental models;
- An implementation of a model developed by Li and Zhong [8], which uses user profiles for Web information gathering.
- It is chosen because it is
  - Verified to perform better than the Rocchio and Dempster-Shafer models,
  - Extensible in using support values of training documents.
- Input: user profiles containing positive and negative documents associated with support(d) values.
- Output: ranked documents gathered from the RCV1 testing set.
Ontology Models
- Implementation of the proposed ontology learning and mining model
  - Ontology-I model: ontologies were learnt using the semi-automatic method;
  - Ontology-II model: ontologies were learnt using the automatic method.
- World Knowledge Base
  - Constructed based on the LCSH system;
  - Contains about 491,250 topical, geographic, and corporate subjects;
  - Three semantic relations specified: is-a, part-of, and related-to.
- Local Instance Repository
  - The catalogue information of the QUT library, containing 448,590 items;
  - Available for public access.
- Documents in user profiles were extracted from the LIRs using the user-interesting subjects
  - The training documents were weighted by
TREC Model
- Demonstrated the manual user profile acquisition methods;
- For a given topic,
  - Linguists read a set of documents and marked them positive or negative against the topic;
  - Positive documents: support(d) = 1/|D+|;
  - Negative documents: support(d) = 0;
- The TREC model serves as a target model against which our proposed model is marked.
  - the judgements of positive and negative were made by users manually;
  - assumption: only users know their interests and preferences perfectly.
- Experiment hypothesis
  - The Ontology models can achieve the same or close performance to that of this TREC model.
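The support values used in the TREC model follow directly from the two rules above and can be sketched in Python as below. The function name and the document identifiers are hypothetical.

```python
def trec_support(judgements):
    """Compute support(d) from manual relevance judgements.

    judgements maps a document id to True (positive) or False (negative).
    Positive documents share the support equally, 1/|D+| each;
    negative documents get 0.
    """
    n_pos = sum(1 for is_pos in judgements.values() if is_pos)
    if n_pos == 0:
        return {d: 0.0 for d in judgements}
    return {d: (1.0 / n_pos if is_pos else 0.0)
            for d, is_pos in judgements.items()}


# Made-up judgements for one topic.
support = trec_support({"d1": True, "d2": False, "d3": True, "d4": False})
```

Because every positive document receives the same value, the supports of a topic's positive documents always sum to 1.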
Category Model
- Demonstrated the non-interviewing user profile acquisition techniques
  - In particular, Gauch et al.'s OBIWAN model and Sieg et al.'s ontological user profile model.
- The user-interesting concepts were represented by a set of weighted positive subjects, constructed in ontology form with the super-class and sub-class relations specified.
  - Positive subjects were the same as those used in the Ontology-I model and were obtained via the OLE;
  - Training sets were extracted from the user LIRs, by the same process as in the Ontology models.
  - support(d) was determined by the number of positive subjects cited by the document.
- The experiment hypothesis was that the Ontology models can outperform this baseline Category model.
Web Model
- Implementation of the preliminary study model and of typical semi-interviewing user profile acquisition models.
  - The positive and negative subjects were identified by users manually;
  - The subjects were used to acquire training sets from the Web via Google;
  - support(d) was determined by
    - The beliefs of the referring positive subjects;
    - The ranking position on the returned list;
    - Google's precision performance.
- Experiment hypothesis
  - The Ontology models can outperform this preliminary study model.
Performance Measuring
- Common and modern performance measuring methods
  - The precision averages at eleven standard recall levels (11SPR);
  - The mean average precision (MAP);
  - Macro-F1 and Micro-F1 measures.
- Statistical significance tests
  - Student's paired t-test
    - Largely agrees with the bootstrap and randomisation tests in information gathering evaluations [181].
- Percentage change in performance.
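Two of the measures above, average precision (averaged over topics to give MAP) and precision at the eleven standard recall levels, can be sketched in plain Python using their standard textbook definitions. The function names and the toy ranking are assumptions, and the exact evaluation scripts used in the experiments are not reproduced here.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs is a list of (ranked, relevant) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def eleven_point_precision(ranked, relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return [0.0] * 11
    points, hits = [], 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    # Interpolation: best precision at any recall at or above the level.
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]


# Toy ranking: two of the three relevant documents are retrieved.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
ap = average_precision(ranked, relevant)
curve = eleven_point_precision(ranked, relevant)
```

The 11SPR figure for a model is then obtained by averaging these eleven values over all topics, and MAP by averaging the per-topic average precision.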
11SPR Results
MAP and F1 Measure Results
Statistical Significance Test Results
Ontology vs. TREC
- The coverage of user profiles
  - Limited readings in the TREC user profile acquisition process;
- The representation of user profiles
  - Formal definitions vs. no definitions
  - Ontology taxonomy structure vs. no structure
- The specification of semantic relations
  - is-a, part-of, and related-to vs. no specification
- The support value of training documents
  - Float values vs. binary values
- Manual acquisition still maintained the accuracy of TREC user profiles.
Ontology vs. Web
- The definition of concepts
  - The use of the WKB in the Ontology model;
  - Users had no clear definition of concepts for computational models.
- The accuracy and coverage of user profiles
  - Interesting concept discovery in the Ontology model;
  - The refining phases in the Ontology user profile acquisition;
- Semantic relations specification
  - is-a, part-of, and related-to in the Ontology model
  - No semantic relations taken into account in the Web model
- Training documents extraction
  - Abstracted information in the Ontology user profiles
  - Free contribution to Web documents that were used by the Web model.
Ontology vs. Category
- The coverage of user profiles
  - Interesting concept discovery phase in the Ontology model;
- The accuracy of user profiles
  - The refining phase in the Ontology model for user profile acquisition;
- Knowledge specification
  - is-a, part-of, and related-to vs. super-class and sub-class
- The representation of user profiles
  - Positive, negative, and neutral subjects in the Ontology user profiles;
  - Positive subjects only in the Category user profiles.
Conclusions
Ontology Learning and Mining Model
- An ontology learning and mining model is proposed, aiming to acquire user profiles for personalised Web information gathering;
- A large world knowledge base is constructed based on the LCSH system;
- Two ontology learning methods, automatic and semi-automatic, are proposed to learn personalised ontologies for user profiles;
- A multidimensional ontology mining method, specificity and exhaustivity, is introduced to investigate the concepts and their semantic relationships in ontologies;
- The ontologies are personalised based on the user-interesting concepts discovered from the user Local Instance Repositories.
- The model is evaluated by comparing the acquired user profiles with those acquired by benchmark models in experiments, and the evaluation result is promising.
Contributions
- Contributions to knowledge engineering:
  - Proposed a computational model that emphasises the specific semantic relations of is-a, part-of, and related-to in a single model, using the multidimensional specificity and exhaustivity;
  - Explored a breakthrough for clear and complete specification of knowledge in ontologies through mining local information;
  - Provided an ideal world knowledge base for knowledge models developed by other scientific researchers.
  - Introduced a reliable objective ontology evaluation method for other ontology researchers to evaluate their work;
- Contributions to Web information gathering:
  - Proposed a concept-based approach to acquire ontology-based user profiles for personalised WIG;
  - Provided a new benchmark for other researchers.
Future Work
- Extending user profile acquisition from short term to long term.
  - It will be interesting to investigate the change of user interests over a long-term period and to measure its influence on WIG performance by extending the work presented in this thesis.
- Extending the work on user profiles to other fields, such as:
  - Mining biological data to create profiles for cancer patients so that we can provide them with better care as well as systematically collect more effective medical/biological data.
References
- For the cited references, see the References of the thesis.
- X. Tao, Y. Li, and N. Zhong. A Personalized Ontology Model for Web Information Gathering. IEEE Transactions on Knowledge and Data Engineering, 23(4):496-511, 2011.
- X. Tao, Y. Li, and N. Zhong. A Knowledge-based Model Using Ontologies for Personalized Web Information Gathering. Web Intelligence and Agent Systems, an International Journal, 8(3):235-254, 2010.
Thanks for Listening
Questions?