
Automatic Extraction and Incorporation of Purpose Data into PurposeNet P. Kiran Mayee Rajeev Sangal Soma Paul SCONLI3 JNU NEW DELHI


Page 1

Automatic Extraction and Incorporation of Purpose Data into PurposeNet

P. Kiran Mayee

Rajeev Sangal

Soma Paul

SCONLI3 JNU NEW DELHI

Page 2

INTRODUCTION

Purpose

Need for a knowledge base of objects and actions in which the knowledge is organized around purpose.

Page 3

PurposeNet

PurposeNet is an intelligent knowledge-based system dealing with specialized attributes of artifacts: their purpose, the purpose of their types, components and accessories, as well as data about their birth, processes, side-effects, maintenance, and result on destruction.

Page 4

PurposeNet

Page 5

Building the PurposeNet

Template Designing

Revision & Refinement of template

Selection of Domain

Information Retrieval from Web

Ontology population

Testing

Page 6

Need for Automation

Acquisition bottleneck

Massive availability of text

Availability of purpose cues

Page 7

Purpose data required

Artifact -- garage

Purpose

    Action -- store

    Upon -- vehicle
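As a minimal sketch, the garage example above can be held in a small record type. The class name `PurposeRecord` and its field names are illustrative assumptions, not PurposeNet's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurposeRecord:
    """One purpose entry: an artifact, the action it serves, and what the action applies to."""
    artifact: str
    action: str
    upon: str


# The example from the slide: a garage is for storing vehicles.
garage = PurposeRecord(artifact="garage", action="store", upon="vehicle")
print(garage)
```

Freezing the dataclass makes records hashable, so duplicates could be collapsed by putting them in a set.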

Page 8

Purpose Cues

Word(s)

Lexical entities in a particular order

Classification:

    Sentences beginning with artifact name

    Sentences ending with artifact name

    Sentences containing artifact name

    Hidden cues
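The first three classes depend only on where the artifact name sits in the sentence, which can be sketched as follows. This is an illustration under stated assumptions, not the authors' classifier: the names `CueClass`, `DETERMINERS` and `classify` are hypothetical, determiners are ignored when testing position, hidden cues are not handled, and the garage sentence is a made-up example.

```python
import re
from enum import Enum


class CueClass(Enum):
    BEGIN = "sentence begins with artifact name"
    END = "sentence ends with artifact name"
    EMBEDDED = "sentence contains artifact name"
    ABSENT = "artifact name not present"


DETERMINERS = {"a", "an", "the"}


def classify(sentence: str, artifact: str) -> CueClass:
    # Lowercase and tokenize; '+' is kept so joined names like "air+pump" stay whole.
    tokens = re.findall(r"[a-z0-9+]+", sentence.lower())
    target = artifact.lower()
    if target not in tokens:
        return CueClass.ABSENT
    # Drop determiners so "A garage ..." still counts as beginning with the artifact.
    content = [t for t in tokens if t not in DETERMINERS]
    if content[0] == target:
        return CueClass.BEGIN
    if content[-1] == target:
        return CueClass.END
    return CueClass.EMBEDDED


print(classify("A garage is used to store vehicles.", "garage"))
print(classify("We cut trees with an axe.", "axe"))
print(classify("Use the air+pump to fill the tyre.", "air+pump"))
```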

Page 9

Sentences commencing with artifact name

Page 10

Sentences ending with artifact name

We cut trees with an axe.

cut = action, trees = upon, axe = artifact
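One way to capture this ending cue is a regular expression with a named group per slot. The pattern below is a hypothetical sketch that covers only sentences shaped like the example (subject, action, upon, "with a/an/the", artifact); it is not taken from the authors' cue table.

```python
import re

# Hypothetical ending-cue pattern: "<subject> <action> <upon> with a/an/the <artifact>."
END_CUE = re.compile(
    r"^\w+\s+(?P<action>\w+)\s+(?P<upon>\w+)\s+with\s+(?:an?|the)\s+(?P<artifact>[\w+]+)\.?$",
    re.IGNORECASE,
)

match = END_CUE.match("We cut trees with an axe.")
if match:
    print(match.groupdict())
```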

Page 11

Sentences containing artifact name

Use the air+pump to fill the tyre.

Use the <artifact> to <action> the <upon>
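A slot template of this kind can be turned into a regular expression mechanically. The helper `template_to_regex` below is a hypothetical illustration, assuming each slot is filled by a single token (with `+` allowed, as in `air+pump`); the slides do not say how the authors compiled their cues.

```python
import re

TEMPLATE = "Use the <artifact> to <action> the <upon>"


def template_to_regex(template: str) -> re.Pattern:
    # re.split with a capture group alternates literal text and slot names.
    parts = re.split(r"<(\w+)>", template)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pieces.append(re.escape(part))  # literal text between slots
        else:
            pieces.append("(?P<" + part + r">[\w+]+)")  # one token per slot
    # Allow an optional full stop at the end of the sentence.
    return re.compile("".join(pieces) + r"\.?$", re.IGNORECASE)


cue = template_to_regex(TEMPLATE)
match = cue.match("Use the air+pump to fill the tyre.")
if match:
    print(match.groupdict())
```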

Page 12

Methodology for purpose data extraction

Page 13

Algorithm for Purpose Data Extraction

Algorithm PurpDataExtract(corpus)

Step 1: Read the first sentence in corpus.
Step 2: Loop until end-of-corpus:
    2a. If contains(sentence, artifact) and match(sentence, cuetable) then
            extract(sentence, artifact)
            extract(sentence, to_action)
            extract(sentence, to_upon)
            add_to_ontology(artifact, to_action, to_upon)
    2b. Read the next sentence.
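The loop can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the two-entry `CUE_TABLE`, the `Purpose` tuple, the artifact lookup set and the in-memory list standing in for the ontology are all assumptions made for the example.

```python
import re
from typing import Iterable, List, NamedTuple, Set


class Purpose(NamedTuple):
    artifact: str
    action: str
    upon: str


# Hypothetical cue table: one regex per cue, each with artifact/action/upon groups.
CUE_TABLE = [
    re.compile(r"use the (?P<artifact>[\w+]+) to (?P<action>\w+) the (?P<upon>\w+)", re.I),
    re.compile(r"(?P<action>\w+) (?P<upon>\w+) with an? (?P<artifact>[\w+]+)", re.I),
]


def purp_data_extract(corpus: Iterable[str], artifacts: Set[str]) -> List[Purpose]:
    ontology: List[Purpose] = []  # stands in for add_to_ontology
    for sentence in corpus:  # the for-loop reads the next sentence each iteration
        for cue in CUE_TABLE:
            found = cue.search(sentence)
            # Keep the match only if the captured name is a known artifact.
            if found and found.group("artifact").lower() in artifacts:
                ontology.append(Purpose(
                    found.group("artifact").lower(),
                    found.group("action").lower(),
                    found.group("upon").lower(),
                ))
                break  # one record per sentence in this sketch
    return ontology


corpus = [
    "We cut trees with an axe.",
    "Use the air+pump to fill the tyre.",
    "The sky is blue.",
]
result = purp_data_extract(corpus, {"axe", "air+pump"})
for record in result:
    print(record)
```

Stopping at the first matching cue mirrors the one-to-one mapping between artifact and purpose that the deck lists among its issues.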

Page 14

Data

Wikipedia: 249 files

WordNet: 81,837 descriptions

Princeton noun-artifact corpus: 82,115 sentences

Page 15

Observations: summary results

Corpus Name   Corpus size   Purpose sentences (purpsen)   PurpData Density (%)
WordNet       81837         1251                          1.53
Princeton     82115         1023                          1.25
Wikipedia     243           109                           44.86
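The density column matches the number of purpose sentences divided by the corpus size, expressed as a percentage. The short check below recomputes it from the table's own figures; the function name `density` is just a label chosen for this sketch.

```python
# (corpus name, corpus size, purpose sentences) as reported in the table
ROWS = [
    ("WordNet", 81837, 1251),
    ("Princeton", 82115, 1023),
    ("Wikipedia", 243, 109),
]


def density(corpus_size: int, purpose_sentences: int) -> float:
    """Purpose-data density as a percentage of the corpus."""
    return 100.0 * purpose_sentences / corpus_size


for name, size, purpsen in ROWS:
    print(f"{name}: {density(size, purpsen):.2f}%")
```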

Page 16

Purpose Data Extraction Misses

Corpus Name   PurpHits   Purpmiss (artifact name absent)   Purpmiss (action_upon absent)
WordNet       1251       nil                               4
Princeton     1023       41                                17
Wikipedia     109        44                                3

Page 17

IE Metrics for Extraction

Corpus Name   Precision   F-measure
WordNet       99.6        99.79
Princeton     94.6        97.22
Wikipedia     69.8        82.21
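The slides do not state how these metrics were computed. One reading that reproduces the figures, offered here only as an inference from the numbers, is that the precision column equals hits / (hits + misses) using the counts on the misses slide, and that the F-measure is the harmonic mean of that reported value and a recall of 100%. The sketch below checks this reading; the function names and tolerances are choices made for the example.

```python
# (corpus, hits, total misses, reported precision, reported F-measure)
ROWS = [
    ("WordNet", 1251, 0 + 4, 99.6, 99.79),
    ("Princeton", 1023, 41 + 17, 94.6, 97.22),
    ("Wikipedia", 109, 44 + 3, 69.8, 82.21),
]


def hit_ratio(hits: int, misses: int) -> float:
    # Assumed formula: share of hits among hits plus misses, in percent.
    return 100.0 * hits / (hits + misses)


def f_measure(precision: float, recall: float) -> float:
    # Standard balanced F-measure (harmonic mean), with inputs in percent.
    return 2.0 * precision * recall / (precision + recall)


for name, hits, misses, reported_p, reported_f in ROWS:
    ratio = hit_ratio(hits, misses)
    f_value = f_measure(reported_p, 100.0)  # recall of 100% is an assumption
    print(f"{name}: ratio={ratio:.2f} (reported {reported_p}), "
          f"F={f_value:.2f} (reported {reported_f})")
```

Under these assumptions the recomputed values agree with the reported precision to within 0.1 and with the reported F-measure to within 0.02.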

Page 18

Result BreakUp per Cue Class

Corpus Name   Class1 (begin cue)   Class2 (ending cue)   Class3 (embedded cue)
WordNet       70.19                0.01                  24.7
Princeton     71.4                 1.21                  21.22
Wikipedia     84.2                 1.6                   12.21

Page 19

Comparison with manually built ontology

Exponential increase in speed

High error rate

Page 20

Issues

Redundancy

Primary purpose not always obtained

Pronouns and brand names

Correctness and consistency not guaranteed

One-to-one mapping assumed

Other sentence manifestations

Page 21

Further Enhancements

Parsed input

Cues for hidden case

Better artifact lookup list

Multipage lookup for consistency

Cloud computing

Automating other attributes of PurposeNet

Page 22

Conclusions

A methodology was proposed for automated ontology population of PurposeNet.

The methodology was implemented on three corpora.

The time taken to populate the PurposeNet 'purpose' ontology was a fraction of that required by manual methods.

The error rate was found to be high.

Page 23

Thank You