Ontology learning from textual resources using text clustering

Ana Rios-Alvarado
Information Technology Laboratory
CINVESTAV - Tamaulipas

November 7th, 2011

Thesis advisor: Ivan Lopez-Arevalo


Outline

1 Introduction

2 The research problem

3 The general approach

4 Advances
    Taxonomy construction using linguistic patterns and WordNet on web search
    Towards acquisition of axioms from text

5 Conclusions and further work


Introduction

Motivation

• The extensive use of e-mail, word processing, digital presentations, and audio/video has shifted the balance toward unstructured data in many organizations (corporations, governments, cooperatives, universities)

• The need for efficient mechanisms to access, use, and exchange information resources

• The emergence of the Semantic Web (web of data) and its goal of making information understandable and usable by computers


• Ontologies have emerged in recent years as a fundamental tool for formalizing and representing knowledge


Some definitions...
What is an ontology?

• A rigorous and exhaustive organization of some knowledge domain that is usually hierarchical and contains all the relevant entities and their relations [1]

What are the elements of an ontology?

Element                       Description
Concepts (classes)            Ideas to formalize
Taxonomic relationships       Relation is-a or subClassOf
Non-taxonomic relationships   Interaction between elements
Instances                     Given object
Axioms                        Theorems on relations to be satisfied by elements

[1] http://wordnetweb.princeton.edu/perl/webwn?s=ontology


Example of an ontology: Travel ontology [2]

• Taxonomic relation: Beach is a Destination
• Non-taxonomic relation: Activity is offered at Destination
• Axiom: disjointClass(RuralArea, UrbanArea)

[2] http://www.owl-ontologies.com/travel.owl


Some definitions...

• What is ontology learning?
  • The set of methods and techniques used for building an ontology from scratch, or for enriching or adapting an existing ontology, in a semi-automatic fashion using several knowledge and information sources

• What is ontology learning from text?
  • It is essentially the process of deriving high-level concepts and relations, as well as the occasional axioms, from information to form an ontology
  • Techniques such as information retrieval, data mining, and natural language processing can obtain:
    • the vocabulary of a domain
    • the relationships between elements


Classification of ontology learning approaches [3]

[3] Tatyana Ivanova, An approach to extend e-learning resource IDE by adding ontology learning and management capabilities, Journal of Information, Control and Management Systems, vol. 8, num. 3, 2010


Ontology learning from text

The aspects and tasks of ontology development are organized into a set of layers [4]

[4] P. Buitelaar, P. Cimiano and B. Magnini, Ontology learning from text: An overview, in Ontology learning from text: Methods, evaluation and applications, 2005, pp. 3-12


Background
Some techniques for obtaining the vocabulary

• Linguistic analysis
  • Example: “Hiking is an outdoor activity”
  • Tokenization: [Hiking] [is] [an] [outdoor] [activity]...
  • Part of Speech Tagging: [Hiking/NNP] [is/VBZ] [an/DT] ...
  • Parsing: (parse tree figure not available)


• Statistical analysis
  • Term weighting (TF-IDF)
  • Mutual Information
  • Chi-square

• Based on WordNet

• Latent Semantic Indexing (LSI)

• Text clustering

Clustering techniques rely on the assumption that similar terms share similar syntactic contexts.
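
As an illustration of the term-weighting technique listed above, the following is a minimal sketch of TF-IDF over a toy corpus. The function name `tf_idf`, the three example sentences, and the specific variant (relative term frequency multiplied by the natural log of N/df) are assumptions chosen for the example; the slides do not say which TF-IDF variant is used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Toy corpus (hypothetical sentences, already tokenized by whitespace).
docs = [
    "hiking is an outdoor activity".split(),
    "surfing is an outdoor sport".split(),
    "a museum is an indoor destination".split(),
]
weights = tf_idf(docs)
```

Terms that occur in every document (here "is" and "an") receive weight zero, while terms specific to one document (such as "hiking") receive the highest weight, which is why this kind of measure is useful for picking domain vocabulary.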


Background
Some techniques for obtaining taxonomic relations (hierarchical structure)

• Lexical databases (for example, WordNet)

• Linguistic approaches

• Co-occurrence analysis

• Lexico-syntactic patterns

Lexical patterns occur frequently in many text genres. The idea is to match the patterns in the text to retrieve hypernym/hyponym relations

• Use of web search and Wikipedia as knowledge source

Web search uses the whole Web as its knowledge base


Background
Some techniques for automatic axiom extraction

• Lexical patterns

• Transforming rules

• Heuristics

• Inductive logic programming

Axiom              DL Syntax         Example
subClassOf         C1 ⊑ C2           Human ⊑ Animal ⊓ Biped
equivalentClass    C1 ≡ C2           Man ≡ Human ⊓ Male
disjointWith       C1 ⊑ ¬C2          Male ⊑ ¬Female
sameIndividualAs   {x1} ≡ {x2}       {President Bush} ≡ {G. W. Bush}
differentFrom      {x1} ⊑ ¬{x2}      {john} ⊑ ¬{peter}
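
To make the meaning of the class axioms in the table concrete, here is a small sketch that reads each class as a set of individuals, so that subClassOf becomes subset, conjunction becomes intersection, and disjointWith becomes an empty intersection. The individuals and the helper names are hypothetical and serve only to illustrate the set-based reading; this is not a reasoner and not the method used in the thesis.

```python
def sub_class_of(c1, c2):
    """C1 subClassOf C2: every member of C1 is also a member of C2."""
    return c1 <= c2


def equivalent_class(c1, c2):
    """C1 equivalentClass C2: both classes have exactly the same members."""
    return c1 == c2


def disjoint_with(c1, c2):
    """C1 disjointWith C2: the classes share no member."""
    return c1.isdisjoint(c2)


# Hypothetical individuals, invented for this example.
human = {"ana", "ivan"}
animal = {"ana", "ivan", "rex"}
biped = {"ana", "ivan", "tweety"}
male = {"ivan", "rex"}
female = {"ana"}
man = {"ivan"}

checks = {
    "Human subClassOf (Animal and Biped)": sub_class_of(human, animal & biped),
    "Man equivalentClass (Human and Male)": equivalent_class(man, human & male),
    "Male disjointWith Female": disjoint_with(male, female),
}
```

Under this toy interpretation all three axioms hold; adding "ana" to `male` would violate the disjointness axiom, which is the kind of inconsistency an axiom lets a reasoner detect.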


Related work
Clustering-based ontology learning approaches

Author     Description

Agglomerative clustering
[Lin98]    Uses agglomerative clustering to create a thesaurus, and defines a similarity measure between two words based on each word's number of occurrences in syntactic dependency triples
[Car99]    Represents a word's context as a vector, then applies agglomerative clustering based on the cosine similarity between the vectors
[RF07]     Uses surface patterns as features, selects the features by statistical measures, and then applies agglomerative clustering based on the cosine similarity between the feature vectors


Divisive clustering
[PL02, PR04]   Use CBC (Clustering By Committee), which calculates cluster centroids by averaging the feature vectors of a subset of chosen cluster members and then iteratively splits clusters

Formal Concept Analysis
[CHS04]    Propose FCA, which produces not only clusters but also intensional descriptions of the obtained clusters
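
Several of the agglomerative approaches in the related-work table cluster terms by the cosine similarity of their context vectors. The sketch below shows that general idea with average-link agglomerative clustering and a stopping threshold. The context counts, the threshold value, and the function names are assumptions made for the example; it does not reproduce any of the cited systems.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerate(vectors, threshold):
    """Average-link agglomerative clustering.

    Starts with one cluster per term and repeatedly merges the most
    similar pair, stopping when no pair reaches the threshold.
    """
    clusters = [[term] for term in vectors]

    def link(a, b):
        total = sum(cosine(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, i, j = max(
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical counts of syntactic contexts in which each term occurs.
vectors = {
    "beach": {"visit": 3, "swim_at": 2},
    "coast": {"visit": 2, "swim_at": 3},
    "museum": {"visit": 2, "exhibit_in": 4},
    "gallery": {"visit": 1, "exhibit_in": 3},
}
clusters = agglomerate(vectors, threshold=0.5)
```

With these toy vectors the terms group into {beach, coast} and {museum, gallery}, reflecting the assumption that similar terms share similar syntactic contexts.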


The research problem

The problem

• Ontology development is typically carried out by knowledge engineers and domain experts, resulting in long and tedious development stages

• Given the massive scope of the Semantic Web, the manual approach is not scalable enough

• Limitations in previous works:
  • Clustering techniques have only been applied to discover is-a relations
  • The level of expressiveness is limited to non-taxonomic relationships
  • Dependency on linguistic databases and the domain


Research questions...

• Is it possible to build an ontology with a high level of expressiveness from textual resources?

• Is it possible to extract the vocabulary and its relationships by web search?

• Is it possible to extract axioms using natural language processing techniques?

Hypothesis

Considering that text clustering can obtain the vocabulary of a corpus, web search can obtain taxonomic and non-taxonomic relationships, and natural language processing techniques allow the discovery of axioms, it is possible that these techniques can be adapted and integrated for ontology learning.


Objectives

Goal             Term extraction, taxonomy building, and axiom discovery
Techniques       Hybrid clustering, web search, NLP techniques
Expressiveness   Axioms
Resources        Text documents (pdf, doc, docx, html, pps, odt)


General Objective

Obtain a model for ontology learning that extracts the vocabulary, relationships, and key axioms from textual resources

Particular objectives

• Obtain a method to get the vocabulary from text using a clustering algorithm

• Study and analyze the techniques to get taxonomic relations and propose a method to extract the taxonomic relationships between terms

• Study and analyze the techniques to extract axioms from text

• Design a model to represent axioms from text by using Description Logic


The general approach


Advances

Taxonomy construction using linguistic patterns and WordNet on web search

Background

• The ontology learning process involves identifying hypernymy/hyponymy relations between terms, which is mandatory for building a taxonomy


The method
The representation model

For obtaining the vocabulary:

• Using linguistic analysis, the pairs <verb, subject> and <verb, object> are considered for building a pair-term matrix

• Pointwise Mutual Information (PMI) is the measure used for the association strength between two words (w1, w2)

\[
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1 \ \mathrm{AND}\ w_2)}{p(w_1)\, p(w_2)} \tag{1}
\]
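
Equation (1) can be illustrated with a short sketch that estimates the probabilities from observed <verb, noun> pairs by relative frequency. The estimation by simple counting, the function name `pmi`, and the four example pairs are assumptions for illustration; the slides do not specify how the probabilities are estimated.

```python
import math
from collections import Counter


def pmi(pairs):
    """Compute PMI for every observed (w1, w2) pair.

    Probabilities are estimated as relative frequencies over the
    list of observed pairs.
    """
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(w1 for w1, _ in pairs)
    right = Counter(w2 for _, w2 in pairs)
    return {
        (w1, w2): math.log2(
            (n / total) / ((left[w1] / total) * (right[w2] / total))
        )
        for (w1, w2), n in joint.items()
    }


# Hypothetical <verb, object> observations.
pairs = [
    ("visit", "museum"),
    ("visit", "museum"),
    ("visit", "beach"),
    ("swim", "beach"),
]
scores = pmi(pairs)
```

A positive value means the two words co-occur more often than expected if they were independent, and a negative value means less often. In this toy data ("swim", "beach") is the most strongly associated pair.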


In order to increase the relevance of the terms obtained syntactically, the definitory (definitional) context in which a relevant term appears is also considered.

• Example: a dominant person was defined as someone who looked like they could ’get what they wanted’

• dominant person is a candidate to be a concept

Some patterns of definitory contexts are:

• call

• known as

• defined as

• denote

• named

• refers to

• designate

• means
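
The definitory patterns listed above can be matched with a simple regular expression, as in the following sketch. The exact verb forms, the restriction to terms of one or two words preceded by an article, and the name `candidate_terms` are simplifying assumptions for the example; a real system would rely on the linguistic analysis described earlier rather than on a plain regex.

```python
import re

# Surface forms of the definitory patterns (inflections are assumed).
DEFINITORY_VERBS = [
    r"(?:is|are|was|were) called",
    r"(?:is|are|was|were) known as",
    r"(?:is|are|was|were) defined as",
    r"denotes?",
    r"(?:is|are|was|were) named",
    r"refers? to",
    r"designates?",
    r"means?",
]

# An article, then a one- or two-word term, then a definitory pattern.
PATTERN = re.compile(
    r"\b(?:an?|the)\s+(?P<term>[a-z]+(?:\s[a-z]+)?)\s+(?:"
    + "|".join(DEFINITORY_VERBS)
    + r")\b",
    re.IGNORECASE,
)


def candidate_terms(sentence):
    """Return the terms that appear right before a definitory pattern."""
    return [m.group("term").lower() for m in PATTERN.finditer(sentence)]


sentence = (
    "a dominant person was defined as someone who looked like "
    "they could 'get what they wanted'"
)
found = candidate_terms(sentence)
```

Applied to the example sentence from the slide, the sketch proposes "dominant person" as a candidate concept.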


For topic extraction:

• Clustering by Committee can assign words to different clusters using sets of representative elements (committees) that try to discover unambiguous centroids for describing the members of a possible class

• This method only creates clusters of terms, but it does not create a hierarchical structure


The method
Querying the Web

Hyponymy can be defined as:

• An expression A is a hyponym of an expression B if the meaning of B is part of the meaning of A and A is a subordinate of B

Hearst Patterns
A, and other B
A, or other B
A is a B
B, such as A
B, including A
B, specially A
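
The Hearst patterns above can be sketched as regular expressions that return (hyponym, hypernym) pairs. This example handles single-word terms only and accepts both "specially" and "especially"; these simplifications and the name `hearst_pairs` are assumptions for illustration, not the matching procedure used in the thesis.

```python
import re

# Each entry: (compiled pattern, True if the hyponym is the first group).
HEARST = [
    (re.compile(r"(\w+),? (?:and|or) other (\w+)"), True),   # A, and/or other B
    (re.compile(r"(\w+) is an? (\w+)"), True),               # A is a B
    (re.compile(r"(\w+),? (?:such as|including|e?specially) (\w+)"), False),
]


def hearst_pairs(text):
    """Return (hyponym, hypernym) pairs found by the patterns."""
    text = text.lower()
    pairs = []
    for regex, hyponym_first in HEARST:
        for first, second in regex.findall(text):
            pairs.append((first, second) if hyponym_first else (second, first))
    return pairs


examples = [
    hearst_pairs("Museums, and other attractions"),
    hearst_pairs("destinations, such as beaches"),
    hearst_pairs("A beach is a destination"),
]
```

Because the terms are limited to one word, a sentence with a modifier (for instance an adjective before the hypernym) would need noun-phrase chunking to be handled correctly.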


For finding hypernym relations between elements in each topic:

• A set of queries is built considering the following:
  • Lexical patterns + contextual information
  • Lexical patterns + related information

• Each query set is executed on a web search engine

• Candidate hypernyms are extracted from the retrieved pages

• The score is computed using:

\[
\mathrm{Score}_{\mathit{CandHypernym}} = \frac{\mathrm{hits}(\mathrm{LexicalPattern}(\mathit{term}, \mathit{CandHypernym}))}{\mathrm{hits}(\mathit{CandHypernym})}
\]

• The candidate hypernym with the highest score is selected
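
The scoring step can be sketched as follows. The `hits` function here is a hypothetical stand-in for a search-engine hit count, backed by an in-memory table so the example runs offline; the hit counts are invented, and the single "such as" query format is an assumption rather than the exact query set used in the experiments.

```python
# Invented hit counts, used only to make the example runnable offline.
FAKE_HITS = {
    '"attraction such as museum"': 120,
    "attraction": 1000,
    '"building such as museum"': 30,
    "building": 2000,
}


def hits(query):
    """Hypothetical hit-count lookup (a real system would query the Web)."""
    return FAKE_HITS.get(query, 0)


def score(term, candidate):
    """hits(pattern(term, candidate)) / hits(candidate), 0 if no hits."""
    candidate_hits = hits(candidate)
    if candidate_hits == 0:
        return 0.0
    return hits(f'"{candidate} such as {term}"') / candidate_hits


def best_hypernym(term, candidates):
    """Return the highest-scoring candidate and all the scores."""
    scores = {c: score(term, c) for c in candidates}
    return max(scores, key=scores.get), scores


best, scores = best_hypernym("museum", ["attraction", "building"])
```

Dividing by the hits of the candidate alone penalizes very frequent words, which would otherwise win simply because they appear in many pages.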


• Using lexical patterns + contextual information

term + lexical pattern + terms with the highest frequencies in the input corpus

• Example: Query set for term “museum”

museum + and other + cash + travel + product
museum + or other + cash + travel + product
such as + museum + cash + travel + product
including + museum + cash + travel + product
specially + museum + cash + travel + product


• Using lexical patterns + related information

term + lexical pattern + the most representative terms in the WordNet synset

• Example: Query set for term “museum”

museum + and other + collection + object + display
museum + or other + collection + object + display
such as + museum + collection + object + display
including + museum + collection + object + display
specially + museum + collection + object + display
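
The two query sets shown for "museum" follow the same template, so they can be generated by one small function. The sketch below is an assumption about how such sets might be built; the name `build_queries` and the template strings are chosen to reproduce the examples on the slides, and the extra terms are passed in rather than computed from a corpus or from WordNet.

```python
# Templates mirror the query sets shown on the slides; {t} is the term.
TEMPLATES = [
    "{t} + and other",
    "{t} + or other",
    "such as + {t}",
    "including + {t}",
    "specially + {t}",
]


def build_queries(term, extra_terms):
    """Combine a term, each lexical pattern, and the extra terms."""
    suffix = " + ".join(extra_terms)
    return [f"{tpl.format(t=term)} + {suffix}" for tpl in TEMPLATES]


# Contextual information: most frequent terms in the input corpus.
contextual = build_queries("museum", ["cash", "travel", "product"])
# Related information: representative terms from the WordNet synset.
related = build_queries("museum", ["collection", "object", "display"])
```

Only the list of extra terms changes between the two strategies, which makes it straightforward to compare them on the same term.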


Experiments & Results

Input corpus       Size
Biology News       691 pages
Lonely Planet      1801 files
Diabetes’ tweets   13800 tweets

Considering the Biology News Corpus:

• 205388 words (lexical diversity 20.9057)

• Pair-term matrix of size 2140 x 4998

• Some terms extracted using linguistic patterns are: mule deer, plant, evidence...


For example, the scores of the candidate hypernyms for the term “cell”:

Candidate hypernym   Score
room                 1.6810
device               1.2745
membrane             1.0272
subunit              0.9599
place                0.9508
container            0.5181
structure            0.4356

“...the basic structural and functional unit of all organisms ... independent units of life (as in monads) or may form colonies or tissues as ... animals” [5]

[5] WordNet synset (Biology)


Example of taxonomies obtained from the information extracted from the input corpus

Figure: Group of terms related to cell

Figure: Group of terms related to plant


Compared with WordNet, the hypernyms obtained for the group of terms related to “plant” are:

Term          Hypernym obtained   WordNet hypernym
plant         organism            organism, being
park          plant               tract, piece of land
garden        park                vegetation
region        park                location
safari        garden              expedition, travel
environment   garden              geographical area
vegetation    plant               collection, aggregation


Performance: 152 Biology News

                      Precision   Recall   F-measure
Terms                 0.67        0.71     0.69
Taxonomic relations   0.24        0.89     0.38

152 Biology News + 100 documents (articles, journals)

                      Precision   Recall   F-measure
Terms                 0.69        0.72     0.70
Taxonomic relations   0.33        0.91     0.48

• The results depend directly on the size and quality of the input corpus
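
The F-measure columns above are consistent with the balanced F-measure (the harmonic mean of precision and recall); assuming that definition, the reported values can be recomputed as in this short sketch. The function name `f_measure` is chosen for the example.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the two result tables.
reported = {
    "terms, 152 news": (0.67, 0.71),
    "relations, 152 news": (0.24, 0.89),
    "terms, 152 news + 100 docs": (0.69, 0.72),
    "relations, 152 news + 100 docs": (0.33, 0.91),
}
recomputed = {name: round(f_measure(p, r), 2) for name, (p, r) in reported.items()}
```

The low F-measure for taxonomic relations comes from low precision despite high recall: the method retrieves most of the expected relations but also proposes many incorrect ones.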


Publications

• Ana B. Rios-Alvarado, Ivan Lopez-Arevalo, and Victor Sosa-Sosa. Discovering hypernyms using linguistic patterns on web search. In the Proceedings of International Conference on Next Generation Web Services Practices (NWeSP’11), Salamanca, Spain, October 19-21, 2011, pp. 302-307.

• Ana B. Rios-Alvarado, Ivan Lopez-Arevalo, and Victor Sosa-Sosa. Structuring taxonomies by using linguistic patterns and Wordnet on web search. In the Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD 2011), Paris, France, October 26-29, 2011, pp. 273-278.


Taking into account the Social Networks
Motivation

• Activity on social networks is growing quickly

• Twitter is one of the most popular social networks

• Tweets can be analyzed to extract useful information, which can be organized into a knowledge structure

• Early work on Twitter data mining has been done in areas such as classification and opinion/sentiment mining


The proposed approach


Example of a taxonomy obtained from the Diabetes’ tweets corpus

Figure: Group of terms related to diabetes


Towards acquisition of axioms from text

Work in progress...

• The ontology learning process involves identifying relations as axioms and representing them in a formal language, which is important to reach the next level of expressiveness


Work in progress...

The proposed method to identify axioms:


Conclusions and further work

Conclusions

• The advances and work in progress have been presented

• The main goal is to obtain a model for ontology learning from textual resources about a specific domain

• Text clustering combined with linguistic analysis seems to be a good technique for obtaining the representative vocabulary of a specific domain

• Using additional information and lexical patterns in query sets executed on web search has shown good evidence for identifying hypernymy and hyponymy relationships

• Natural language processing techniques can be useful in the axiom extraction task
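
As an illustration of the lexical-pattern idea, the following is a minimal sketch of how candidate hypernym/hyponym pairs might be turned into exact-phrase web queries and checked against returned text. The templates are generic Hearst-style patterns chosen for illustration, not necessarily the ones used in this work, and the function names `build_queries` and `count_support` are hypothetical. No real search API is called; the snippets are assumed to come from whatever search engine is used.

```python
# Hypothetical Hearst-style templates (an assumption; the slides do not
# list the exact lexical patterns used in the method).
HYPERNYM_TEMPLATES = [
    "{hyper} such as {hypo}",
    "{hypo} and other {hyper}",
    "{hypo} is a kind of {hyper}",
]


def build_queries(hypo, hyper):
    """Return one exact-phrase query per template for a candidate pair."""
    return ['"' + t.format(hypo=hypo, hyper=hyper) + '"'
            for t in HYPERNYM_TEMPLATES]


def count_support(hypo, hyper, snippets):
    """Count the snippets that contain at least one instantiated template."""
    phrases = [t.format(hypo=hypo, hyper=hyper).lower()
               for t in HYPERNYM_TEMPLATES]
    return sum(any(p in s.lower() for p in phrases) for s in snippets)


if __name__ == "__main__":
    # Illustrative snippets only; they are not results from the experiments.
    snippets = [
        "Chronic diseases such as diabetes are common.",
        "Diabetes and other diseases need care.",
        "Insulin is a hormone.",
    ]
    print(build_queries("diabetes", "diseases"))
    print(count_support("diabetes", "diseases", snippets))
```

A higher support count for a pair would be read as stronger evidence of a hypernymy relation; how counts are thresholded or combined with WordNet is left open in this sketch.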


Schedule of activities


Further work

• Evaluate the method for extracting taxonomic relationships using different input corpora

• Continue the PhD stay at DERI, NUIG, Ireland

• Adapt and implement the method for extracting relations as axioms

• Integrate the methods to obtain an ontology learning model


Thanks a lot for your attention!

Ana Rios-Alvarado


