Overview
What is WSD?
How WordNet is analyzed as a complex network
What are the results
Project Methodology
  Area of study
Key Findings/Results
  New approaches
  Improvement techniques
Conclusion
Project Description
Objective
  Study on WSD
  Effects of WSD in word sense ontology
  Characteristics of WordNet
Results
  How to match words with other words
  Parameters taken for the study of word sense
  Improve them by making necessary changes
  Study network characteristics
WordNet - overview
Machine-readable semantic dictionary interlinked by semantic relations
Developed at Princeton University as a large lexical database for the English language
Most widely used linguistic resource
Free for public use (WordNet license)
Forms a scale-free network with a small average shortest path, having words as nodes and concepts as links
Easily navigable
WordNet (Structure)
Shows the relations for nouns, verbs, adjectives and adverbs
  Synonym
  Hypernym (is a kind of …)
  Hyponym (… is a kind of)
  Troponym (particular ways to …)
  Meronym (parts of …)
  ---- about 25 relations
Also available for online navigation
WordNet (working)
WSD:
Corpus-based approaches
  Set of samples that enables the system
Knowledge-based approaches
  Machine-readable dictionary with relations
WordNet Research
Open source
Ranking of synsets derived from word frequencies in the British National Corpus
  Top 1000
Content manipulation of text
  Dataset I – controlled and calibrated study
  Dataset II – collected using Mechanical Turk, using pairs
Word Sense Disambiguation (WSD)
Task of determining the meaning of an ambiguous word in the given context
  Bank
    Edge of a river, or
    Financial institution that accepts money
Refers to the resolution of lexical semantic ambiguity; its goal is to attribute the correct senses to words (AI-complete problem)
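The "bank" example can be made concrete with a small gloss-overlap sketch in the spirit of the simplified Lesk algorithm, one common knowledge-based approach. This is not presented as the method used in this project. The two senses come from the slide; the gloss wording, the stopword list and the function names are assumptions made for illustration only.

```python
import re

# Toy sense inventory for "bank". The two senses come from the slide; the
# gloss wording is a hypothetical stand-in for real dictionary definitions.
SENSES = {
    "river_bank": "edge of a river sloping land beside water",
    "financial_bank": "financial institution that accepts money deposits and makes loans",
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "that", "and", "to", "in", "at", "on", "i", "we", "my"}


def tokens(text):
    """Return the set of lowercase word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context, senses=SENSES):
    """Return the sense whose gloss shares the most words with the context."""
    ctx = tokens(context)
    return max(senses, key=lambda s: len(ctx & tokens(senses[s])))


print(disambiguate("She deposits money at the bank every Friday"))
print(disambiguate("We sat on the bank of the river and watched the water"))
```

The first context overlaps with the financial gloss on "deposits" and "money"; the second overlaps with the river gloss on "river" and "water". Ties fall back to the first sense in the dictionary, which a real system would need to handle more carefully.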
WSD: Area of Research
Assigning the correct sense to words, using an electronic dictionary as the source of word definitions
Open research field in Natural Language Processing (NLP)
Hard problem, and a popular area of research
Used in speech synthesis by identifying the correct sense of the word
WordNet – Theoretical aspects
WordNet – word sense ontology
  Symbols are words
  Synset: list of words and semantic relations between them
Word sense disambiguation
  WordNet structure using latent semantics
  Variable lexical notation for a concept
  Citibase – Thesaurus
  Semantic relatedness
  And a few others…
WSD: using latent semantics
Measures the semantic distance of concepts
Relatedness and betweenness are calculated
Matrix form of the WordNet data structure is used
Can be integrated with other applications
Uses the Singular Value Decomposition (SVD) algorithm
Example: multiple synsets are
  {car, gondola} {car, railway car} {car, automobile}
  {Motor vehicle}, {Coupe}, {Sedan}, {Taxi}
MDS-example
Geodesic Distance Matrix
      1  2  3  4  5  6  7  8  9 10 11 12 13
  1   0  1  1  1  2  2  3  1  1  2  4  2  2
  2   1  0  2  2  1  2  3  2  2  3  4  3  3
  3   1  2  0  2  3  3  4  2  2  3  5  3  3
  4   1  2  2  0  3  2  3  2  2  1  4  1  3
  5   2  1  3  3  0  1  2  2  2  2  3  3  3
  6   2  2  3  2  1  0  1  1  1  1  2  2  2
  7   3  3  4  3  2  1  0  2  2  2  1  3  3
  8   1  2  2  2  2  1  2  0  2  2  3  3  1
  9   1  2  2  2  2  1  2  2  0  2  3  3  1
 10   2  3  3  1  2  1  2  2  2  0  3  1  3
 11   4  4  5  4  3  2  1  3  3  3  0  4  4
 12   2  3  3  1  3  2  3  3  3  1  4  0  4
 13   2  3  3  3  3  2  3  1  1  3  4  4  0
[Figure: geodesic distance matrix, then MDS, then k-means]
Resulting groups:
  1, 2, 3, 4, 10, 12
  5, 6, 7, 8, 9, 11, 13
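A geodesic distance matrix of this kind can be computed with one breadth-first search per node. The sketch below assumes an undirected, unweighted 13-node graph whose edges are inferred from the matrix entries equal to 1; the slide shows only the distances, so the edge list and the function name are assumptions. The MDS and k-means steps are not reproduced here.

```python
from collections import deque

# Undirected edges inferred from the matrix entries equal to 1 (an assumption:
# the slide gives only the distances, not the underlying graph).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 8), (1, 9), (2, 5), (4, 10), (4, 12),
         (5, 6), (6, 7), (6, 8), (6, 9), (6, 10), (7, 11), (8, 13), (9, 13),
         (10, 12)]


def geodesic_matrix(edges, n):
    """Return all-pairs shortest-path lengths for nodes 1..n (one BFS per node)."""
    adj = {v: [] for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    matrix = []
    for start in range(1, n + 1):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Raises KeyError if the graph is disconnected; this toy graph is not.
        matrix.append([dist[v] for v in range(1, n + 1)])
    return matrix


D = geodesic_matrix(EDGES, 13)
for row in D:
    print(" ".join(f"{x:2d}" for x in row))
```

Under these assumptions, the rows printed by the sketch can be compared entry by entry with the matrix on the slide.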
WSD: variable lexical notations for a concept
Generic concept notation:
$D = I \cup J \cup K$
$\therefore J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)}$
$J = D \cap (\overline{I} \cap \overline{K})$
since $B = D \cup E \cup F$,
$D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)}$
$D = B \cap (\overline{E} \cap \overline{F})$
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
$J = D \cap (\overline{I} \cap \overline{K}) = (B \cap (\overline{E} \cap \overline{F})) \cap (\overline{I} \cap \overline{K})$
$J = B \cap ((\overline{E} \cap \overline{F}) \cap (\overline{I} \cap \overline{K}))$
when J = fly, D = fish lure, I = spinner, K = troll
Introducing Boolean operators: AND for $\cap$, OR for $\cup$, NOT for the overbar (complement)
(“fly”) becomes:
(“fisherman's lure” OR “fish lure”) AND ((NOT “spinner”) AND (NOT “troll”))
then, with B = lure, E = ground bait, F = stool pigeon,
(“fly”) becomes:
(“bait” OR “decoy” OR “lure”) AND (((NOT “ground bait”) AND (NOT “stool pigeon”)) AND ((NOT “spinner”) AND (NOT “troll”)))
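The expansion of “fly” into a Boolean expression can be sketched as a small string builder. The function name `expand` and its argument layout are hypothetical; the terms are the ones on the slide. The excluded groups are joined with a flat AND, which is equivalent to the nested form because intersection is associative.

```python
def expand(include, exclude_groups):
    """Build '(a OR b ...) AND ((NOT x) AND (NOT y)) AND ...' as a string.

    include        -- words of the parent concept's synset, joined with OR
    exclude_groups -- one list of sibling concepts per level, each negated
    """
    positive = "(" + " OR ".join(f'"{t}"' for t in include) + ")"
    negatives = [
        "(" + " AND ".join(f'(NOT "{t}")' for t in group) + ")"
        for group in exclude_groups
    ]
    return " AND ".join([positive] + negatives)


# One level up: D = fish lure, siblings I = spinner, K = troll.
print(expand(["fisherman's lure", "fish lure"], [["spinner", "troll"]]))

# Two levels up: B = lure, siblings E = ground bait, F = stool pigeon.
query = expand(["bait", "decoy", "lure"],
               [["ground bait", "stool pigeon"], ["spinner", "troll"]])
print(query)
```

Each additional level in the hierarchy adds one more negated group, so the expression grows linearly with the number of levels climbed.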
Thesaurus as a complex network
As a directed graph
  Sinks: the 73,046 terms with kout = 0
  Sources (root words): the 30,260 terms with at least one outgoing link (kout > 0)
    Absolute source: without incoming links (kin = 0)
    Normal source: kout > 0 and kin > 0
    Bridge source: without outgoing links to root words (kout(source) = 0)
[Figure labels: 1 – Normal source, 2 – Bridge source, 3 – Absolute source, 4 – Sink]
Source: arXiv:cond-mat/0312586 v1 2003
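The four node types can be illustrated with a short classifier over a directed edge list. The toy edges below are hypothetical and are not taken from the thesaurus data. The order in which the checks are applied (sink, then absolute, then bridge, then normal) is an assumption, since the slide does not say how overlapping cases are resolved.

```python
def classify(edges):
    """Map each node of a directed graph to one of the four node types."""
    nodes = {n for edge in edges for n in edge}
    out_links = {n: set() for n in nodes}
    in_degree = {n: 0 for n in nodes}
    for src, dst in edges:
        if dst not in out_links[src]:  # ignore duplicate edges
            out_links[src].add(dst)
            in_degree[dst] += 1
    sources = {n for n in nodes if out_links[n]}  # root words: kout > 0
    kinds = {}
    for n in nodes:
        if not out_links[n]:
            kinds[n] = "sink"             # kout = 0
        elif in_degree[n] == 0:
            kinds[n] = "absolute source"  # kin = 0
        elif not (out_links[n] & sources):
            kinds[n] = "bridge source"    # no outgoing link to a root word
        else:
            kinds[n] = "normal source"    # kout > 0 and kin > 0
    return kinds


# Hypothetical toy thesaurus: an edge (a, b) means the entry for a lists b.
EDGES = [("merry", "happy"), ("happy", "glad"), ("glad", "joyful"),
         ("glad", "content"), ("content", "cheerful"), ("joyful", "happy")]

kinds = classify(EDGES)
for word in sorted(kinds):
    print(word, "->", kinds[word])
```

In this toy graph "cheerful" has no entry of its own and is a sink, "merry" is never listed by another entry and is an absolute source, and "content" points only to a sink, which makes it a bridge source.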
WSD: Semantic relatedness and word sense disambiguation
Concepts that occur more frequently and closer to each other are “more related” than concepts that appear less frequently and farther apart
WordNet Relationship
Semantic relatedness
  Involves relationships among words
    car-wheel (meronym)
    hot-cold (antonym)
    pencil-paper (functional)
    penguin-Antarctica (association)
    bank-trust company (synonym)
Probability and distance calculation
  Frequency of synsets or words
Performance in NLP applications
WordNet Connect
Program to find all possible connections between two words in WordNet
Used in computing semantic opposition within the word sense ontology
The WordNet lexical database is used to read the semantic relations
Capabilities such as the number of paths, the shortest path and the overall network structure are studied
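The idea of listing every connection between two words can be sketched as an enumeration of simple paths in an undirected graph. The edge list below is a hypothetical toy fragment; the actual program reads its relations from the WordNet database, and its traversal strategy is not described in the slides, so the depth limit `max_len` and the function names are assumptions.

```python
def build_graph(edges):
    """Build an undirected adjacency list from (word, word) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def all_paths(graph, start, goal, max_len=6):
    """Yield every simple path from start to goal with at most max_len edges."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            yield path
            continue
        if len(path) > max_len:  # path already uses max_len edges
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no repeated words)
                stack.append(path + [nxt])


# Hypothetical toy relations, not read from WordNet.
EDGES = [("car", "motor vehicle"), ("taxi", "car"), ("taxi", "motor vehicle"),
         ("motor vehicle", "vehicle"), ("car", "wheel"), ("wheel", "vehicle")]

graph = build_graph(EDGES)
paths = list(all_paths(graph, "car", "vehicle"))
shortest = min(paths, key=len)

print("number of paths:", len(paths))
print("shortest path length (edges):", len(shortest) - 1)
for p in paths:
    print(" -> ".join(p))
```

The number of simple paths grows very quickly with graph size, so a depth limit of this kind is usually needed on anything larger than a toy fragment.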
Future work
WordNet structure in terms of a complex network
Key assumptions
  WordNet lexical dictionary analyzed under the scope of a source node and a target node, with an additional reference node
  Achieve a cost-effective path which is conditionally related to the mean reference node
  Control the path traversal with a relation of focus
  Include Common File Number to make it more efficient
Conclusion
A single visualization cannot reveal the entire structure of WordNet
There are different ways of analyzing the effectiveness of the overall system
A new method to evaluate the usefulness of the WordNet network structure