CONCEPT HIERARCHY INDUCTION
By Philipp Cimiano
Presented by Joseph Park
CONCEPT HIERARCHIES
• Structure information into categories
• Provide a level of generalization
• Form the backbone of any ontology
COMMON APPROACHES
• Machine readable dictionaries
• Lexico-syntactic patterns
• Distributional similarity
• Co-occurrence analysis
MACHINE READABLE DICTIONARIES
• Exploit regularity of dictionaries
• Find a hypernym for the defined word
• Head of the first NP (genus or kernel term)
• spring "the season between winter and summer and in which leaves and flowers appear"
• hornbeam "a type of tree with a hard wood, sometimes used in hedges"
• launch "a large usu. motor-driven boat used for carrying people on rivers, lakes, harbors, etc."
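The genus-term idea can be sketched without a parser. The following is a rough heuristic, not the method used in the presented work: it assumes the first noun phrase ends at the first word from a small hand-picked boundary list, and that classifier nouns such as "type of" should be skipped. The names `BOUNDARY`, `CLASSIFIERS` and `genus_term` are hypothetical.

```python
import re

# Words assumed to end the first noun phrase of a definition (hand-picked, not exhaustive).
BOUNDARY = {"between", "with", "used", "that", "which", "in", "on", "for", "of", "and"}
# Classifier nouns assumed to be skipped, as in "a type of tree".
CLASSIFIERS = {"type", "kind", "sort"}


def genus_term(definition):
    """Return a guess at the head of the first NP of a dictionary definition."""
    tokens = re.findall(r"[A-Za-z][A-Za-z\-]*", definition.lower())
    head = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Skip "type of", "kind of", ... so the real genus term is reached.
        if tok in CLASSIFIERS and i + 1 < len(tokens) and tokens[i + 1] == "of":
            i += 2
            continue
        # Stop at the first word that is assumed to close the noun phrase.
        if tok in BOUNDARY:
            break
        head = tok  # the last token seen before the boundary is taken as the head
        i += 1
    return head


definitions = {
    "spring": "the season between winter and summer and in which leaves and flowers appear",
    "hornbeam": "a type of tree with a hard wood, sometimes used in hedges",
    "launch": "a large usu. motor-driven boat used for carrying people on rivers, lakes, harbors, etc.",
}
hypernyms = {word: genus_term(text) for word, text in definitions.items()}
```

A real system would use a part-of-speech tagger or chunker to find the first NP; the boundary list here only happens to cover the three definitions above.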
LEXICO-SYNTACTIC PATTERNS
• Hearst patterns
• Hearst1: NP such as {NP,}* {(and | or)} NP
• Hearst2: such NP as {NP,}* {(and | or)} NP
• Hearst3: NP {,NP}* {,} or other NP
• Hearst4: NP {,NP}* {,} and other NP
• Hearst5: NP including {NP,}* NP {(and | or)} NP
• Hearst6: NP especially {NP,}* {(and | or)} NP
• They should occur frequently and in many text genres
• They should accurately indicate the relation of interest
• They should be recognizable with little or no pre-encoded knowledge
EXAMPLE OF USING HEARST PATTERN
• 'Such injuries as bruises, wounds and broken bones...'
• hyponym(bruise, injury)
• hyponym(wound, injury)
• hyponym(broken bone, injury)
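A minimal sketch of matching the "such NP as NP, NP and NP" pattern with a regular expression. It assumes noun phrases can be approximated by runs of words (no chunker) and it does not lemmatize, so it returns the plural surface forms rather than the singular forms shown above. The function name `hearst_such_as` is hypothetical.

```python
import re


def hearst_such_as(sentence):
    """Extract (hyponym, hypernym) pairs from 'such NP as NP, NP and NP'."""
    match = re.search(r"\bsuch ([\w ]+?) as (.+)", sentence, re.IGNORECASE)
    if not match:
        return []
    hypernym = match.group(1).strip()
    # Drop trailing punctuation, then split the enumeration on commas and on and/or.
    enumeration = match.group(2).rstrip(" .")
    parts = re.split(r",\s*|\s+(?:and|or)\s+", enumeration)
    return [(part.strip(), hypernym) for part in parts if part.strip()]


pairs = hearst_such_as("Such injuries as bruises, wounds and broken bones...")
# pairs == [("bruises", "injuries"), ("wounds", "injuries"), ("broken bones", "injuries")]
```

Mapping "bruises" to "bruise" would need a lemmatizer, and real text needs NP chunking so that the enumeration does not swallow the rest of the sentence.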
DISTRIBUTIONAL SIMILARITY
• Distributional hypothesis
• Words are similar to the extent they share the same context
• ‘you shall know a word by the company it keeps’ (Firth)
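One common way to make the distributional hypothesis concrete is to count the contexts each word occurs with and compare the count vectors by cosine similarity. The sketch below assumes (word, context) pairs have already been extracted; the toy pairs and the names `context_vectors` and `cosine` are illustrative, not taken from the presentation.

```python
import math
from collections import Counter


def context_vectors(pairs):
    """Build a context-count vector for each word from (word, context) pairs."""
    vectors = {}
    for word, context in pairs:
        vectors.setdefault(word, Counter())[context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data (assumed): each pair is a noun and a verb it was seen with.
pairs = [("car", "drive"), ("car", "rent"), ("bike", "ride"), ("bike", "rent"), ("hotel", "book")]
vectors = context_vectors(pairs)
car_bike = cosine(vectors["car"], vectors["bike"])    # one shared context out of two each
car_hotel = cosine(vectors["car"], vectors["hotel"])  # no shared context
```

Words that share more contexts score closer to 1, and words with no shared context score 0.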
EXAMPLE
CO-OCCURRENCE ANALYSIS
• Collocation
• Document-based subsumption
• a certain term t1 is more special than a term t2 if t2 also appears in all the documents in which t1 appears
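Document-based subsumption can be sketched as a set-inclusion test: term a is treated as more specific than term b when every document containing a also contains b. The sketch below implements this strict version on toy data; the document collection and the names `doc_sets` and `is_more_specific` are assumptions for illustration.

```python
def doc_sets(documents):
    """Map each term to the set of document ids it appears in."""
    index = {}
    for doc_id, terms in documents.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def is_more_specific(a, b, index):
    """True if every document containing term a also contains term b."""
    docs_a = index.get(a, set())
    docs_b = index.get(b, set())
    # Require that a actually occurs, so unseen terms do not match trivially.
    return a != b and bool(docs_a) and docs_a <= docs_b


# Toy collection (assumed): document id -> set of terms in that document.
documents = {
    "d1": {"hotel", "accommodation"},
    "d2": {"hostel", "accommodation"},
    "d3": {"accommodation", "city"},
}
index = doc_sets(documents)
hotel_under_accommodation = is_more_specific("hotel", "accommodation", index)
accommodation_under_hotel = is_more_specific("accommodation", "hotel", index)
```

On real corpora the strict subset test is rarely satisfied, so a relaxed version that tolerates a fraction of exceptions is a natural variation.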
THREE MORE APPROACHES
• Formal Concept Analysis (FCA)
• Guided Clustering
• Learning from heterogeneous sources of evidence
FORMAL CONCEPT ANALYSIS
• Set-theoretical approach
• Parse corpus (extract dependencies)
• Verb-pp-complement
• Verb-object
• Verb-subject
• Extract surface dependencies (section 4.1.4)
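To illustrate the set-theoretical core of FCA, the sketch below enumerates the formal concepts of a small object-attribute table. It assumes the extracted dependencies have already been turned into a table from nouns (objects) to the verb-derived attributes they occur with; the toy table and the name `formal_concepts` are hypothetical, and the brute-force enumeration is only suitable for tiny inputs.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all (extent, intent) pairs of a formal context.

    context maps each object to the set of attributes it has.
    """
    objects = list(context)
    all_attributes = set().union(*context.values())
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Intent: attributes shared by every object in the subset.
            intent = set(all_attributes)
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = {obj for obj in objects if intent <= context[obj]}
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Toy context (assumed): nouns with attributes derived from the verbs they occur with.
context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
}
concepts = formal_concepts(context)
```

Ordering the resulting concepts by inclusion of their extents gives the concept lattice, from which a hierarchy can be read off.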
PSEUDOCODE
EXAMPLE
RESULTS
GUIDED CLUSTERING
• Uses hypernyms from WordNet and Hearst patterns
EXAMPLE
RESULTS
MORE RESULTS
HETEROGENEOUS SOURCES OF EVIDENCE
• Naïve threshold classifier
• Uses Hearst patterns for corpus patterns
• Uses Google API for web patterns
• Uses Hearst patterns over downloaded pages
• Uses WordNet senses
• Uses ‘head’-heuristic (r-match)
• Uses corpus based subsumption
• Uses document based subsumption
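Of the evidence sources listed, the head heuristic is simple enough to sketch. The version below is one plausible reading, assumed here rather than taken from the slides: a multi-word term is proposed as a hyponym of any known term that forms its right-hand end, for example "international conference" under "conference". The name `head_isa` and the candidate terms are hypothetical.

```python
def head_isa(term, known_terms):
    """Return the known terms that match the right-hand end of a multi-word term."""
    words = term.lower().split()
    hypernyms = []
    # Try every proper suffix of the term, longest first.
    for start in range(1, len(words)):
        suffix = " ".join(words[start:])
        if suffix in known_terms:
            hypernyms.append(suffix)
    return hypernyms


known_terms = {"conference", "event"}
candidates = head_isa("international conference", known_terms)
# candidates == ["conference"]
```

Single-word terms yield no candidates, since they have no proper suffix to match.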
RESULTS
MORE RESULTS