Automatic Construction of Situation Ontology
LOD2 Workshop, February 18-19, 2011
Sung-Hyon MYAENG
http://ir.kaist.ac.kr
Division of Web Science & Technology, Dept. of Computer Science
KAIST
Web Science & Technology (WebST) Division
• Supported by the WCU (World Class University) Program
  • Ministry of Education & Science, Korea
  • About USD 2.5 million per year
  • For three years, renewable for an additional two or more years
• Faculty: 8 from Dept. of CS, KAIST, and 5 from abroad, plus a few adjuncts
• Very first academic program at graduate level in Web Science (& Engineering) in Korea
Copyright © 2011 Sung-Hyon Myaeng
Research Landscape in WebST
[Figure: research landscape relating Web trends, computational characteristics, and key areas of investigation]
• Web Trends: Blogosphere, Collective Intelligence, Social Network, Linked Open Data, Semantic Web Service, RDF/OWL/SPARQL, Mashup, Ontology, Cloud Computing, Internet of Things
• Additional labels: participation, virtual linking, semantics, physical linking
• Computational Characteristics: diversity & significance of content types; dynamic changes to massive databases; linking distributed heterogeneous data; personalization of software/programs; extraction & processing of semantics; global analysis of trends and patterns
• Key Areas of Investigation: Processing & Utilization of Web Contents; Web Platform; Human-centric Web Exploration; Web SW Engineering
Four Major Curriculum Areas in WebST
• Fundamentals
  • Algorithm design & analysis, information theory, …
• Enabling Technology
  • Web software engineering, web architecture, high-performance computing, …
• Information Contents Access & Manipulation
  • Web data analysis and mining, ontology engineering, web information security, …
• Social and Collaborative Applications and Analyses
  • Social networking, mobile web applications, web economy & business, …
Research Areas of IR&NLP Lab
[Figure labels: Information Retrieval, Natural Language Processing, Artificial Intelligence, HCI, Semantic Web, Text Mining, Digital Content Technology, Proactive Search, Human Activity & Experience Mining, Query-Free Search in Mobile Env.]
Our Current Research Focus
• Automatic Construction of “Situation Ontology” from Semi-structured Text
  • Automatic Processing of eHow and wikiHow for Human Activity Knowledge
  • Knowledge Enrichment with Multiple Resources
• Experience Mining from Free Text in Social Media
  • Experience-containing Sentence Identification
  • Experience Pattern Mining (forthcoming)
• Application-Oriented Research
  • Physical Object-Driven Search
  • Context-aware Suggestions of Medical Advice
Why Activity-Based Experiences?
Contextual Factors for Mobile Information Needs [Sohn et al., CHI 2008]
Why Activity-Based Experiences?
Everybody talks about context-aware X.
[Figure labels: Location, Time, Object, Actions; User Context; User Goals & Intentions; LBS; Intention-based Services; Info from sensors; Experiences from Text]
An Example Application
• Situation-aware Action Recommendation System
  1. Recognize context or query-driven situation
  2. Recommend potentially useful services (activities)
  3. Receive a user selection
  4. Recommend a set of actions to follow
Example services (activities):
• Change a Tire like a Real Woman
• Get a Road Service
Example actions to follow:
1. Set your emergency brake.
2. Loosen lug nuts on tire.
3. Place jack at the most solid point on the car.
4. Remove loosened lug nuts.
5. Install spare tire.
6. Reinstall lug nuts.
7. Let the jack down.
Situation “Ontology” Schema [Jung et al., JWS 2010]
Situation “Ontology”
Activity Extraction from “How-to” Articles
How-to article:
Title) How to Make Omelet Soup
Step 1) Place the water or canned chicken broth in a large saucepan. Boil the sweet yellow onion for several minutes.
Step 2) Add the powdered chicken broth along with the canned mushrooms. Boil the soup for a few more minutes, and then add the chopped green onion.
Step 3) Drop the eggs into the simmering broth a few minutes before you're ready to serve the omelet soup.
Extracted:
• Goal: Make Omelet Soup
• Action Sequences: (place, water), (place, canned chicken broth), (boil, sweet yellow onion), (add, powdered chicken broth), (boil, soup), (add, chopped green onion), (drop, eggs)
• Ingredients: water, chicken broth, onion, soup, eggs
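The extraction result above can be held in a small record type. The following is a minimal sketch only: the class name `HowToInstance` and its field names are hypothetical and are not taken from the slides; the values are the ones shown for the omelet soup article.

```python
from dataclasses import dataclass, field


@dataclass
class HowToInstance:
    """One (Goal, Actions, Ingredients) triple extracted from a how-to article.

    Hypothetical container for illustration; not the authors' schema.
    """
    goal: str
    actions: list = field(default_factory=list)      # ordered (verb, object) pairs
    ingredients: list = field(default_factory=list)  # objects involved in the actions


omelet_soup = HowToInstance(
    goal="Make Omelet Soup",
    actions=[
        ("place", "water"),
        ("place", "canned chicken broth"),
        ("boil", "sweet yellow onion"),
        ("add", "powdered chicken broth"),
        ("boil", "soup"),
        ("add", "chopped green onion"),
        ("drop", "eggs"),
    ],
    ingredients=["water", "chicken broth", "onion", "soup", "eggs"],
)

# The order of the actions is preserved, which matters later when
# transitions between consecutive actions are counted.
verbs = [verb for verb, _ in omelet_soup.actions]
```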
e-How Statistics
Aug. 26, 2009 (1 million articles); May 19, 2010 (1.5 million articles)

Category                          # Articles   Percentage
Arts & Entertainment                  68,165         6.7%
Business                              31,846         3.1%
Careers & Work                        39,291         3.9%
Cars                                  30,900         3.1%
Computers                             47,450         4.7%
Culture & Society                     26,508         2.6%
Education                             30,677         3.0%
Electronics                           18,876         1.9%
Fashion, Style & Personal Care        49,270         4.9%
Food & Drink                          75,842         7.5%
Health                               122,152        12.1%
Hobbies, Games & Toys                 74,216         7.3%
Holidays & Celebrations               22,632         2.2%
Home & Garden                        102,843        10.2%
Internet                              24,938         2.5%
Legal                                  9,805         1.0%
Parenting                             19,427         1.9%
Parties & Entertaining                 8,874         0.9%
Personal Finance                      41,086         4.1%
Pets                                  30,017         3.0%
Relationships & Family                25,220         2.5%
Sports & Fitness                      74,930         7.4%
Travel                                29,359         2.9%
Weddings                               8,449         0.8%
Total                              1,012,773       100.0%
eHow: Hierarchy
[Figure: topic tree rooted at eHow]
• 24 topics: Weddings, Travel, Education, …, Arts & Entertainment
• 194 sub-topics, e.g. Wedding Planning, Transportation, Travel & Transportation, …
• 1665 3rd level topics, e.g.
  • Marriage License, Wedding Basics, Wedding Budgets, Wedding Cake, Wedding Centerpieces, Wedding Decorations, Wedding Favors, Wedding Flowers, Wedding Ideas, Wedding Receptions, …
  • Air Travel, Airports, Buses, Car Rentals, Cruises, Public Transportation, RV, Subways, Trains, …
Statistics of wikiHow

Category                       # Articles   Percentage
Arts & Entertainment                2,965        5.39%
Cars & Other Vehicles               1,057        1.92%
Computers & Electronics             8,821       16.03%
Education & Communications          3,334        6.06%
Family Life                         1,047        1.90%
Finance, Business & Legal           1,071        1.95%
Food & Entertaining                 6,198       11.26%
Health                              3,459        6.29%
Hobbies & Crafts                    5,919       10.76%
Holidays & Traditions                 726        1.32%
Home & Garden                       2,854        5.19%
Personal Care & Style               3,054        5.55%
Pets & Animals                      2,101        3.82%
Philosophy & Religion                 663        1.20%
Relationships                       2,247        4.08%
Sports & Fitness                    3,858        7.01%
Travel                                608        1.11%
Work World                            775        1.41%
Youth                               4,264        7.75%
Total                              55,021      100.00%
Extraction from “How-To” Articles
• Target
  • (Goal, Actions, Ingredients) from each article
• Extraction of Goals
  • Simple rule-based pattern recognition
  • Goal normalization
• Extraction of Actions & Ingredients from Imperative Sentences
  • Pattern-based approaches
    • Obvious patterns → high precision
    • But coverage is limited.
  • Machine learning (CRF) based
    • For more complete coverage
• Action Normalization
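As an illustration of what a simple rule-based goal pattern might look like, the sketch below strips a leading "How to" from an article title. The function name `extract_goal` and the single regular expression are assumptions made for this example; the slides do not state the actual rules used.

```python
import re

# Hypothetical rule: a how-to title has the form "How to <goal>".
HOW_TO_TITLE = re.compile(r"\s*how\s+to\s+(.+?)\s*$", flags=re.IGNORECASE)


def extract_goal(title):
    """Return the goal phrase of a how-to title, or None if the rule does not match."""
    match = HOW_TO_TITLE.match(title)
    return match.group(1) if match else None


goal = extract_goal("How to Make Omelet Soup")
```

Titles that do not follow the "How to ..." form return `None` here, so a real system would need further rules plus the goal normalization step listed above.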
Situation “Ontology” Population
[Figure: population pipeline]
• Input: how-to doc (wikiHow, eHow)
• Action & Ingredient Extraction
  • Syntactic Pattern-based Approach
  • Probabilistic CRF-based Approach
• Instance Generation
• Action Normalization
• Goal Normalization
• Action Transition Probability Calculation
Action + Ingredient Extraction
• Processing of Action Steps in eHow Articles
  • Sentence boundary detection
  • Identification of imperative sentences
• Parsing
  • Using Stanford NLP library
• Dependency tree generation
  • Simplify parse trees
    • Eliminate adverbial phrases, determiners, and articles
  • Convert a parse tree to a dependency structure
    • E.g. (VP (VBP start) (PRT (RP up)) (NP (NN car))) → prt(start-1, up-2), dobj(start-1, car-3)
Syntactic Pattern-based Method
• Pattern discovery → Pattern Rules
  • Mask words → generate dependency relation driven patterns
    • E.g. prt(start-1, up-2) & dobj(start-1, car-3) → prt(a, b) & dobj(a, c)
  • Identify frequent patterns (f ≥ 3)
    • E.g. prt(a, b) & dobj(a, c) → verb(‘a b’) / ingredient(c, ‘a b’)
  • Compute confidence for each pattern using manually annotated data
  • 184 patterns were generated (confidence > 85%)
• Instance generation by applying rules
  • Extract (action, ingredient) instances based on pattern matching
  • E.g. check out the engine → (Action: check out, Ingredient: engine)
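The masking and rule-application steps can be sketched as follows. This is an illustration under assumptions: dependency relations are given as hand-written `(relation, head, dependent)` tuples rather than produced by a parser, only the one rule shown on the slide is implemented, and the names `mask`, `frequent_patterns`, and `apply_prt_dobj_rule` are hypothetical.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def mask(relations):
    """Replace each distinct token (e.g. 'start-1') with a placeholder letter."""
    names = {}
    masked = []
    for rel, head, dep in relations:
        for token in (head, dep):
            if token not in names:
                names[token] = LETTERS[len(names)]
        masked.append((rel, names[head], names[dep]))
    return tuple(masked), names


def frequent_patterns(sentences, min_freq=3):
    """Keep the masked patterns that occur at least min_freq times (f >= 3)."""
    counts = Counter(mask(relations)[0] for relations in sentences)
    return {pattern for pattern, freq in counts.items() if freq >= min_freq}


def apply_prt_dobj_rule(relations):
    """prt(a, b) & dobj(a, c) -> verb('a b') / ingredient(c, 'a b')."""
    pattern, names = mask(relations)
    if pattern != (("prt", "a", "b"), ("dobj", "a", "c")):
        return None
    # Map each placeholder back to its surface word, dropping the "-index" suffix.
    word = {letter: token.rsplit("-", 1)[0] for token, letter in names.items()}
    return (f"{word['a']} {word['b']}", word["c"])


start_up_car = [("prt", "start-1", "up-2"), ("dobj", "start-1", "car-3")]
check_out_engine = [("prt", "check-1", "out-2"), ("dobj", "check-1", "engine-4")]

pattern, _ = mask(start_up_car)
instance = apply_prt_dobj_rule(check_out_engine)
```

Here `instance` is the pair `("check out", "engine")`, matching the slide's example. The confidence computation against annotated data is not covered by this sketch.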
Machine Learning (CRF) based Method
• For more complete coverage
• Training data
  • Sentences extracted by applying each of the selected pattern rules
  • POS and dependency features used for the classifier
• Example labeling:

Token/POS     Label
You/PRP       none
remove/VBP    verb
timing/VBG    ingredient
belt/NN       ingredient
.             none
Evaluation of the Population Method
• Based on a manually constructed test collection
  • 2400 eHow articles randomly chosen from 24 domains

Method                                        Average Accuracy   Average Coverage
Baseline 1 (based on Shah and Gupta)                    0.7866             0.9821
Baseline 2 (based on M. Perkowitz et al.)               0.5432             0.9897
Syntactic Pattern-based Method                          0.9130             0.5660
CRF-based Method                                        0.8192             0.9499
Pattern-based & CRF-based                               0.8261             0.9501

Baseline 1: Extract every first verb and first noun phrase as an action
Baseline 2: Extract every first verb and every noun phrase under 'object' and 'substance' categories in WordNet
Action Normalization
• Replace “similar” actions with a representative action → Build an equivalence class of actions with a representative name
[Figure: actions under the goals “Fix a flat tire” and “Change a flat tire” mapped into the same cluster]
• Example actions: (pump, brake pedal), (pump, brake foot pedal), (check, equipment), (jack up, car), (raise, vehicle), (take, spare tire)
• Similarity types shown as (a), (b), (c): Additional Descriptor, Contextual Similarity, Synonyms
Goal normalization
• wikiHow articles serve as a backbone (unique goals)
• eHow articles as a source of action instances to enrich goal classes (e.g. additional steps)
[Figure: eHow goals mapped into the same goal class as a wikiHow goal]
• wikiHow goal: Change a tire
• eHow goals: Change a tire, Fix a flat tire, Change a flat tire safely
Action transition probability
• A goal is achieved by a set of action sequences in order.
• A normalized action sequence has weights that indicate the strength of the occurrence of next/previous actions.
• G: a given normalized goal
• isNextStep(·): a binary function
  • 1: when Ai → Aj appears
  • 0: when Ai → Aj does not appear
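A minimal sketch of how such transition weights could be computed, assuming a relative-frequency estimate over the normalized action sequences of one goal G. The estimate itself, the function names, and the toy sequences are assumptions for illustration and are not the formula from the slide; only the binary isNextStep indicator follows the definition above.

```python
from collections import defaultdict


def is_next_step(sequence, a_i, a_j):
    """Binary indicator: 1 if a_j directly follows a_i in the sequence, else 0."""
    pairs = zip(sequence, sequence[1:])
    return int(any(x == a_i and y == a_j for x, y in pairs))


def transition_probabilities(sequences):
    """Assumed estimate: share of observed a_i -> a_j transitions among all
    transitions leaving a_i, over the action sequences of one normalized goal."""
    actions = {action for seq in sequences for action in seq}
    counts = defaultdict(dict)
    for a_i in actions:
        for a_j in actions:
            n = sum(is_next_step(seq, a_i, a_j) for seq in sequences)
            if n:
                counts[a_i][a_j] = n
    return {
        a_i: {a_j: n / sum(nxt.values()) for a_j, n in nxt.items()}
        for a_i, nxt in counts.items()
    }


# Toy normalized action sequences for a single goal (hypothetical data).
sequences = [
    ["set emergency brake", "loosen lug nuts", "place jack"],
    ["set emergency brake", "loosen lug nuts", "remove lug nuts"],
    ["set emergency brake", "place jack"],
]
probs = transition_probabilities(sequences)
```

With these toy sequences, "set emergency brake" is followed by "loosen lug nuts" in two of three sequences and by "place jack" in one, so the two weights are 2/3 and 1/3.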
Human Experience/Activity Mining from Blog Posts
Experience Mining [Park et al., ACL 2010]
• Goal
  • Extract place- and time-anchored activities from Web documents (blogs, tweets, etc.) → Experience Knowledge Base
• Activity Lexicon Construction
  • Automatic construction of a lexicon of verbs related to activities or events
  • Classify all V, VP in WordNet into activity / state verbs
• Experience Sentence Detection
  • Formulate the problem as a classification task using various linguistic features
• Experience Pattern Mining
  • Extract experience constituents with plausibility
Why Experience Mining?
• Urban/Spontaneous Computing
• New Generation Recommendation Systems
  • Experience sharing
  • Place-, time-, & object-aware recommendation
• Web Search
  • Experience retrieval
• Mobile Search
  • Place- and task-dependent, query-free search
  • E.g. As you approach the KAIST campus, it shows you how to get to the workshop place.
Experience Mining: Experiences
§ Definition
• Knowledge embedded in a collection of activities or events which an individual or group has actually undergone (Wikipedia)
§ Characteristics
• Experience-revealing sentences have a certain linguistic style [Jijkoun et al., 2010]
  • I ran with my wife 3 times a week until we moved …
  • We went to a restaurant near the central park

  • If Jason arrives on time, I’ll buy him a drink
  • Probably, she will laugh and dance in his funeral
  • Don’t play soccer on the streets!
Experience Mining: Activity Lexicon Construction
• Task
  • Automatic construction of a lexicon of verbs related to activities or events
  • Classify all V, VP in WordNet into activity / state verbs
• Approach
  • Based on linguistic theory (Vendler) and properties

Class             Example
State             Like, know, believe, …
Activity          Run, swim, walk, …
Achievement       Recognize, realize, …
Accomplishment    Paint (a picture), build (a house), …

Target classes: Activity, State
Experience Mining: Sentence Classification
§ Linguistic Features
• Using POS, Dependency Parsing, NER and some heuristics

Feature        Description
Verb class     Class of predicate from the lexicon we’ve constructed
Tense          Experience-revealing sentences tend to use past / present tense
Mood           Modal status of the sentence: {indicative, imperative, subjunctive}
Voice          {Active voice, passive voice}
Aspect         Temporal flow of verb: {progressive, perfective}
Modality       Existence of modal verbs (e.g., can, shall, may, will, …)
Experiencer    Whether the subject of the experience is a person or not
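As a small illustration of one of these features, the Modality feature (existence of modal verbs) could be approximated with a word list. The list `MODALS`, the tokenizer, and the function name `has_modal` are assumptions for this sketch; the slides only say that POS tagging, dependency parsing, NER and some heuristics were used.

```python
import re

# Hypothetical modal list; "'ll" covers contractions such as "I'll".
MODALS = {"can", "could", "shall", "should", "may", "might",
          "will", "would", "must", "'ll"}


def has_modal(sentence):
    """Return True if the sentence contains a modal verb from MODALS."""
    text = sentence.lower().replace("’", "'")  # normalize curly apostrophes
    tokens = re.findall(r"[a-z]+|'[a-z]+", text)
    return any(token in MODALS for token in tokens)


examples = {
    "We went to a restaurant near the central park": has_modal(
        "We went to a restaurant near the central park"),
    "If Jason arrives on time, I’ll buy him a drink": has_modal(
        "If Jason arrives on time, I’ll buy him a drink"),
    "Probably, she will laugh and dance in his funeral": has_modal(
        "Probably, she will laugh and dance in his funeral"),
}
```

A word list like this ignores ambiguity (for example "can" as a noun), which is one reason a POS-based check would be preferable in practice.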
Experience Mining: Sentence Classification Results
§ Detection Performance (10-fold cross validation)

                 Logistic Regression        SVM
Feature          Precision    Recall        Precision    Recall
Baseline           32.0%      55.1%           25.3%      44.4%
Lexicon            77.5%      76.0%           77.5%      76.0%
Tense              75.1%      75.1%           75.1%      75.1%
Mood               75.8%      60.3%           75.8%      60.3%
Aspect             26.7%      51.7%           26.7%      51.7%
Modality           79.8%      70.5%           79.8%      70.5%
Experiencer        54.3%      53.5%           54.3%      53.5%
All included       91.9%      91.7%           91.7%      91.4%
Conclusion
• To deal with diverse situations of human activities for context-aware applications → Need a large-scale situation knowledge base and experiential knowledge
• Take advantage of the Web, the brain of mankind
• Applications
  • Proactive Search & Recommendations [Jang et al., 2010]
  • Physical Object-Driven Search
  • Situation-Aware Medical Assistance
References (papers in PDF available on http://ir.kaist.ac.kr)
• Yuchul Jung, Jihee Ryu, Kyung-min Kim, and Sung-Hyon Myaeng (2010). "Automatic Construction of a Large-Scale Situation Ontology by Mining How-to Instructions from the Web", Journal of Web Semantics.
• Jihee Ryu, Yuchul Jung, Kyung-min Kim, and Sung-Hyon Myaeng (2010). "Automatic Extraction of Human Activity Knowledge from Method-Describing Web Articles", In Proceedings of the 1st Workshop on Automated Knowledge Base Construction (AKBC 2010).
• Keun Chan Park, Yoonjae Jeong, and Sung-Hyon Myaeng (2010). "Detecting Experiences from Weblogs", The 48th Annual Meeting of the Association for Computational Linguistics (ACL).