A Method for Automatically Constructing Case Frames for English. Daisuke Kawahara and Kiyotaka Uchimoto. National Institute of Information and Communications Technology. (LREC2008, 2008/05/29).
1
A Method for Automatically Constructing Case Frames for English
Daisuke Kawahara and Kiyotaka Uchimoto
(LREC2008, 2008/05/29)
National Institute of Information and Communications Technology
2
Background
• NLP analyzers so far
  – (Mainly) supervised, (relatively) knowledge-poor
    • e.g., PP-attachment or parsing
      Mary ate the salad with a fork
      Mary ate the salad with mushrooms
  – Only 1.5% of bilexical dependencies were learned [Bikel, 04]
• Toward knowledge-oriented NLP
  – Automatically compile case frames and integrate them into NLP analyzers/applications
3
Related work
• Subcategorization frames
  – [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] …
  – e.g., She greeted me.
    • NP(sbj) greet NP(obj)
  – e.g., She gave him a book.
    • NP(sbj) give NP(obj) NP(obj)

Reference                  # of SCFs  # of verbs  Corpus size  Acc.
[Brent, 1993]                      6          63         1.2M   85%
[Ushioda et al., 1993]             6          33         0.3M   86%
[Manning, 1993]                   19         200         4.1M   82%
[Ersan & Charniak, 1996]          16          30          36M   70%
[Carroll & Rooth, 1998]           15         100          30M   77%
[Briscoe & Carroll, 1997]        161           7         1.2M   81%
[Sarkar & Zeman, 2000]           137         914         0.3M   88%
4
Related work
• Subcategorization frames
  – [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] …
• (Handmade) frames
  – FrameNet [Baker et al., 98], PropBank [Palmer et al., 05]
• Japanese case frames
  – Semantics-based: [Haruno, 95] [Utsuro et al., 96]
  – Example-based: [Kawahara and Kurohashi, 06]
5
Construction of case frames for Japanese [Kawahara and Kurohashi, LREC2006]

Verb                        CS  Examples (in English)
yaku (1) (bake)             ga  I:18, person:15, craftsman:10, …
                            wo  bread:2484, meat:1521, cake:1283, …
                            de  oven:1630, frying pan:1311, …
yaku (2) (have difficulty)  ga  teacher:3, government:3, person:3, …
                            wo  hand:2950
                            ni  attack:18, action:15, son:15, …
yaku (3) (burn)             ga  company:1, distributor:1, …
                            wo  data:178, file:107, copy:9, …
                            ni  R:1583, CD:664, CDR:3, …
…

ga: nominative, wo: accusative, ni: dative, de: instrument
6
Construction of case frames for English

100M sentences (English Gigaword)
→ Filtering and parsing (MSTParser): 47M sents.
→ Predicate-argument structures
→ Clustering (WordNet)
→ Case frames for 10K predicates

Predicate-argument structures:
sbj:you pred:borrow obj:idea pp:from:artist
sbj:she pred:borrow obj:idea pp:over:year
sbj:i pred:borrow obj:dollar pp:from:friend
sbj:farmer pred:borrow obj:money pp:for:supply
sbj:he pred:borrow obj:money pp:from:company

Grouped:
sbj:{you,she} pred:borrow obj:idea pp:from:artist pp:over:year
sbj:i pred:borrow obj:dollar pp:from:friend
sbj:{farmer,he} pred:borrow obj:money pp:for:supply pp:from:company

Clustered:
sbj:{you,she} pred:borrow obj:idea pp:from:artist pp:over:year
sbj:{farmer,he} pred:borrow obj:{money,dollar} pp:for:supply pp:from:{company,friend}
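
The grouping step in the borrow example can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the `slot:word` tokens are whitespace-separated as on the slide and that structures are grouped by predicate and obj word. The names `parse_pas` and `group_by_object` are hypothetical.

```python
from collections import defaultdict

def parse_pas(line):
    """Split 'sbj:you pred:borrow pp:from:artist' into (slot, word) pairs."""
    pairs = []
    for token in line.split():
        # The word is everything after the last colon: 'pp:from:artist' -> ('pp:from', 'artist').
        slot, _, word = token.rpartition(":")
        pairs.append((slot, word))
    return pairs

def group_by_object(lines):
    """Merge structures that share a predicate and an obj word (assumed grouping key)."""
    groups = defaultdict(lambda: defaultdict(set))
    for line in lines:
        pairs = parse_pas(line)
        slots = dict(pairs)
        key = (slots["pred"], slots["obj"])
        for slot, word in pairs:
            if slot != "pred":
                groups[key][slot].add(word)
    return groups

PAS = [
    "sbj:you pred:borrow obj:idea pp:from:artist",
    "sbj:she pred:borrow obj:idea pp:over:year",
    "sbj:i pred:borrow obj:dollar pp:from:friend",
    "sbj:farmer pred:borrow obj:money pp:for:supply",
    "sbj:he pred:borrow obj:money pp:from:company",
]

grouped = group_by_object(PAS)
for (pred, obj), slots in grouped.items():
    print(pred, obj, {slot: sorted(words) for slot, words in slots.items()})
```

The five input structures collapse into three groups (idea, dollar, money), matching the grouped lines of the example. Merging money with dollar is left to the WordNet-based clustering step and is not covered by this sketch.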
7
Specification of our case frames
• Case slots
  – surface cases (dependency labels) and prepositions
    • sbj, obj, obj2, pp:for, pp:in, …
• Instances
  – words
  – several semantic markers
    • <time>, <num>, <clause>
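
One way to hold such a case frame in memory is a mapping from case slot to instance frequencies. The sketch below is an assumption for illustration only: the class `CaseFrame`, the helper `to_instance`, and the digit-based rule for `<num>` are hypothetical, and the slides do not say how `<time>` or `<clause>` are assigned, so those markers are not handled.

```python
import re
from collections import Counter, defaultdict

def to_instance(word):
    """Map a word to an instance; numeric tokens become <num> (hypothetical rule)."""
    if re.fullmatch(r"\d+([.,]\d+)*", word):
        return "<num>"
    return word.lower()

class CaseFrame:
    """A predicate plus, for each case slot, a frequency count of its instances."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.slots = defaultdict(Counter)  # slot name -> Counter of instances

    def add(self, slot, word, count=1):
        self.slots[slot][to_instance(word)] += count

frame = CaseFrame("burn")
frame.add("obj", "flag", 247)
frame.add("pp:in", "1999")
frame.add("pp:in", "2,000")
frame.add("pp:in", "Ramallah", 14)
print(dict(frame.slots["pp:in"]))
```

Slot names such as `pp:in` keep the preposition inside the key, so surface cases and prepositional slots share one namespace, as in the specification above.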
8
Details of case frame construction
• Use only reliable parses
  – Sentence length <= 20 words
  – MSTParser [McDonald et al., 06]
    • Labeled dependency acc.: 89.9% → 91.5%
    • Complete rate: 36.3% → 56.4%
• Extract predicate-argument structures
  – From labeled dependency parses
• Group and cluster p-a structures
  – Grouping by a dominant case slot
    • pre-defined order: obj, sbj, pp:*
  – Clustering based on WordNet
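
The length filter and the choice of a dominant case slot can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and when several `pp:*` slots are present the sketch simply takes the first one it sees, which is an assumption the slides do not confirm.

```python
MAX_SENTENCE_LENGTH = 20        # from the slide: sentence length <= 20 words
SLOT_PRIORITY = ("obj", "sbj")  # pre-defined order; any pp:* slot comes last

def is_short_enough(tokens):
    """Keep only sentences short enough to be parsed reliably."""
    return len(tokens) <= MAX_SENTENCE_LENGTH

def dominant_slot(slots):
    """Return the first slot present in the order obj, sbj, pp:*, or None."""
    for name in SLOT_PRIORITY:
        if name in slots:
            return name
    for name in slots:
        if name.startswith("pp:"):
            return name  # assumption: first pp:* slot in insertion order
    return None

print(dominant_slot({"sbj": "i", "obj": "dollar", "pp:from": "friend"}))
print(dominant_slot({"sbj": "candle", "pp:on": "altar"}))
```

With this order, a structure that has an object is grouped by its object, and a structure without one falls back to its subject.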
9
Clustering of case frames
CF1  sbj: { i }  obj: { dollar }  pp:from: { friend }
CF2  sbj: { farmer, he }  obj: { money }  pp:from: { company }  pp:for: { supply }
[Figure: worked example of the similarity between case frames CF1 and CF2. The figure labels three quantities: similarity between instances (words), ratio of common cases, and similarity between case frames. Word similarity values 0.82, 0.73 and 1.0 appear in the figure; the rest of the arithmetic did not survive extraction.]
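
A sketch of how the three quantities named on this slide might combine. Everything specific here is an assumption rather than the authors' formula: the similarity between case frames is taken as the product of an instance similarity and a ratio of common cases, the word similarities come from a toy lookup table instead of WordNet, and all counts are hypothetical.

```python
import math

# Toy word-similarity table; hypothetical values standing in for WordNet.
WORD_SIM = {
    frozenset(["dollar", "money"]): 0.8,
    frozenset(["friend", "company"]): 0.5,
}

def word_sim(a, b):
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset([a, b]), 0.0)

def slot_sim(inst1, inst2):
    """Best word similarity between the instances of two slots (assumed definition)."""
    return max(word_sim(a, b) for a in inst1 for b in inst2)

def frame_similarity(cf1, cf2):
    """Assumed form: instance similarity times ratio of common cases."""
    common = [slot for slot in cf1 if slot in cf2]
    if not common:
        return 0.0
    # Weight each common slot by the geometric mean of its two frequency totals.
    weights = {s: math.sqrt(sum(cf1[s].values()) * sum(cf2[s].values())) for s in common}
    instance_sim = sum(weights[s] * slot_sim(cf1[s], cf2[s]) for s in common) / sum(weights.values())

    def share(cf):
        # Fraction of a frame's total frequency that lies in the common slots.
        return sum(sum(cf[s].values()) for s in common) / sum(sum(c.values()) for c in cf.values())

    common_ratio = math.sqrt(share(cf1) * share(cf2))
    return instance_sim * common_ratio

# Hypothetical counts, not the figures from the slide.
cf1 = {"sbj": {"i": 1}, "obj": {"dollar": 4}, "pp:from": {"friend": 1}}
cf2 = {"sbj": {"farmer": 1, "he": 1}, "obj": {"money": 4},
       "pp:from": {"company": 1}, "pp:for": {"supply": 1}}
print(round(frame_similarity(cf1, cf2), 3))
```

Under these assumptions a frame compared with itself scores 1.0, and the extra pp:for slot in CF2 lowers the score through the ratio of common cases. A clustering step would then merge frame pairs whose score exceeds some threshold, which the slides do not state.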
10
Results
• Obtained case frames for 9,300 verbs
• Evaluated case frames of 20 verbs
  – Criteria:
    • Verb usage is disambiguated by dominant arguments
    • Case frames must have obligatory case slots
    • Case slots, except a dominant one, may contain an ineligible example
  – Accuracy: 88.4%
11
Examples of obtained case frames
Verb      CS      Examples
burn (1)  sbj     they:262, it:113, protester:99, …
          obj     flag:247, effigy:81, house:67, …
          pp:in   <num>:29, ramallah:14, brisbane:11, …
          pp:for  week:15, hour:6, month:5, …
burn (2)  sbj     candle:26, lamp:5
          pp:on   motor-scooter:7, altar:3, platform:1, …
          pp:for  day:2, steinhaeuser:1
…
12
Conclusion and future work
• Constructed broad-coverage case frames for English
  – Described real use of English verbs
• Future work
  – Use more sophisticated methods for extracting reliable parses [Kawahara and Uchimoto, 08]
  – Integrate case frames into parsing (and other applications)
    • cf. [Zeman, 02] for subcategorization frames, [Kawahara and Kurohashi, 06] for case frames